The Engineering Gauntlet
Radiation, Heat, and Hardware in the Void
This is the second article in a series exploring orbital data centers, what’s real, what’s hype, and what it might mean for the future of AI infrastructure. The first article, The Case for Orbital Compute, introduced the concept and the major players. This one gets into what it actually takes to make hardware work in space.
After I published the first article in this series, something interesting happened in the comments over on LinkedIn. The tech and business people were fascinated, asking about timelines and market size. The people with actual space engineering backgrounds were skeptical. One commenter with experience in space systems said he nearly spit out his coffee reading about “free cooling in space.” Another offered a four-word rebuttal to the entire concept: “The law of thermodynamics.”
This split is the story of this article. The basic value proposition I laid out last time (abundant solar power, radiative cooling, no land constraints) is real but incomplete. When you look at the actual engineering, every one of those advantages comes with a catch that the press releases tend to skip. Some of those catches are solvable. Some might not be.
I’ve spent the last two weeks reading everything I could find on the specific technical challenges, from Google’s research papers to SpaceX’s FCC filing details to NASA studies on thermal management in orbit. What follows is my attempt to lay out what the engineering gauntlet actually looks like.
Fair warning: I’m going to be harder on the claims in this article than I was in the first one. That’s not because I’ve turned skeptic. It’s because the stakes have gotten higher. When one company files for a million satellites and another projects energy costs 22 times lower than on Earth, we owe it to ourselves to understand what has to go right for any of this to work.
The heat problem is worse than you think
I’m starting here, not with radiation or connectivity, because this is the challenge I think is most underestimated.
In the first article, I listed “natural radiative cooling in vacuum” as part of the value proposition. Several readers pushed back hard, and they were right to. Let me correct myself.
Space is not cold in the way that matters for cooling electronics. Yes, the cosmic background temperature is about 2.7 Kelvin (roughly -455°F). But that’s misleading. In space, there’s no air. No air means no convection. On Earth, when your laptop gets hot, it dumps heat into the surrounding air. Air carries that heat away. That’s convection, and it’s the primary way every data center on the planet manages thermal loads.
In the vacuum of space, the only way to get rid of heat is radiation. You have to emit infrared energy from large surface areas, and that process is governed by the Stefan-Boltzmann law: the power a surface can radiate scales with the fourth power of its absolute temperature. In plain English: it works, but it requires enormous radiator surfaces relative to the amount of heat you need to shed.
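To get a feel for the scale, here is a back-of-the-envelope sketch using the Stefan-Boltzmann law. Every input below (a 1 MW heat load, a 300 K radiator, an emissivity of 0.9) is an illustrative assumption of mine, not a figure from any company's design.

```python
# Back-of-the-envelope radiator sizing via the Stefan-Boltzmann law.
# All input values are illustrative assumptions, not published specs.

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 * K^4)

heat_load_w = 1_000_000   # assume a 1 MW cluster heat load
radiator_temp_k = 300.0   # assume the radiator surface runs near room temperature
sink_temp_k = 2.7         # deep-space background (ignores solar and Earth heating)
emissivity = 0.9          # assume a high-emissivity radiator coating

# Net radiated power per square meter of one-sided panel
flux_w_per_m2 = emissivity * SIGMA * (radiator_temp_k**4 - sink_temp_k**4)

area_m2 = heat_load_w / flux_w_per_m2
print(f"Radiating flux: {flux_w_per_m2:.0f} W/m^2")
print(f"Radiator area needed for 1 MW: {area_m2:.0f} m^2")
```

That works out to roughly 2,400 square meters of single-sided radiator for one megawatt of heat, and that is before accounting for the sunlight falling on those same panels, which pushes the real number higher.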
How enormous? NASA studies show that radiators can account for more than 40% of the total mass of a space power system. Mike Safyan, a Planet Labs executive whose company is partnering with Google on Project Suncatcher, put it bluntly: “You’re relying on very large radiators... that’s a lot of surface area and mass.” Every kilogram of radiator is a kilogram you launched at thousands of dollars per kilo.
And it gets worse. Satellites in orbit aren’t actually in shadow most of the time. Depending on the orbital plane, they can be in direct sunlight for the majority of their orbit, absorbing solar flux on top of the heat generated by their own processors. Several readers flagged this after the first article: solar heat gain is a major unsolved issue. You’re trying to cool hardware while the sun is heating the same structure.
There’s a compounding problem here that’s easy to miss. Modern AI accelerators draw 700 watts or more per chip. Powering them requires massive solar arrays. So you need enormous radiators to shed the heat and enormous solar panels to generate the power, and both add surface area, mass, and atmospheric drag. In lower orbits, that drag means more fuel for station-keeping. In higher orbits, drag is less of a problem but everything else (radiation, debris persistence) gets worse. It’s a trade-off with no clean answer.
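The same rough arithmetic applies to the power side. The chip count, efficiencies, and overhead factor below are again illustrative assumptions of mine, not anyone's published design.

```python
# Rough solar array sizing for a cluster of AI accelerators.
# All values are illustrative assumptions.

SOLAR_CONSTANT = 1361.0   # W/m^2 of sunlight above the atmosphere

chips = 1000              # assume a cluster of 1,000 accelerators
power_per_chip_w = 700.0  # per the figure cited above
overhead_factor = 1.4     # assume 40% extra for memory, networking, conversion losses
cell_efficiency = 0.30    # assume modern multi-junction cells
packing_factor = 0.85     # assume panel area is not fully covered by cells

electrical_load_w = chips * power_per_chip_w * overhead_factor
array_area_m2 = electrical_load_w / (SOLAR_CONSTANT * cell_efficiency * packing_factor)

print(f"Electrical load: {electrical_load_w/1000:.0f} kW")
print(f"Solar array area (in full sun): {array_area_m2:.0f} m^2")
```

Nearly three thousand square meters of array for a thousand chips, on top of the radiators above, and essentially all of that electrical power comes back out as heat those radiators have to reject.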
This is why I find it notable that Starcloud’s public materials emphasize “reduced cost” from solar energy and radiative cooling, and their white paper projects energy costs up to 22 times lower than terrestrial wholesale electricity, without publishing detailed thermal management specifications. For a challenge this fundamental, I’d like to see the math.
And as one engineer in the comments put it: “What are you going to do with all of that generated heat in a vacuum?” It’s the simplest version of the hardest question in this entire space. Phase-change materials and two-phase immersion cooling are being explored as potential solutions, but these add mass, complexity, and additional failure modes to an already challenging system.
To be fair, Google’s Suncatcher team has addressed this more directly than most. Their architecture envisions large radiator arrays as part of each satellite cluster. But even they acknowledge the mass and surface area constraints. There’s no free lunch here, just a different cafeteria.
Radiation: the good news has fine print
If thermal management is underestimated, radiation gets the most attention, and for good reason. Space is a shooting gallery.
Cosmic rays, high-energy protons from solar events, and particles trapped in Earth’s radiation belts all bombard orbital hardware constantly. These particles cause what engineers call Single Event Effects (SEEs): bit flips that corrupt data, latchups that can fry circuits, and cumulative damage that degrades chip performance over time. Consumer-grade silicon, the kind in every GPU and TPU on Earth, was never designed for this environment.
The traditional solution is radiation hardening, which involves modifying chip designs and manufacturing processes to tolerate radiation exposure. The cost: a 30 to 50% price premium and a 20 to 30% performance penalty. Radiation-hardened chips also tend to lag several generations behind their consumer counterparts, because each new chip design takes years to harden and qualify.
This is where Google delivered a genuinely surprising result. Their Suncatcher team tested Trillium TPUs (Google’s custom AI accelerators) for radiation tolerance and found that the TPU’s high-bandwidth memory (HBM) showed irregularities only after exposure to roughly 2 krad(Si), which is nearly three times the expected five-year mission dose of 750 rad(Si) in their target orbit. In other words, a consumer-grade AI chip survived far more radiation than expected without failing.
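The margin implied by those figures is simple arithmetic, but it is worth making explicit. The numbers below are the ones Google has published, as quoted above.

```python
# Total ionizing dose (TID) margin implied by the published Suncatcher figures.

mission_dose_rad = 750.0       # expected 5-year dose in the target orbit, rad(Si)
hbm_threshold_rad = 2000.0     # dose at which HBM irregularities first appeared, rad(Si)
mission_years = 5.0

annual_dose = mission_dose_rad / mission_years
margin = hbm_threshold_rad / mission_dose_rad

print(f"Expected dose rate: {annual_dose:.0f} rad(Si) per year")
print(f"Margin over the 5-year mission dose: {margin:.1f}x")
```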
That’s legitimately encouraging. But it needs context.
The test was for total ionizing dose (TID), the cumulative effect of radiation exposure over time. SEEs, the random bit-flips from individual particle strikes, are a different problem. A chip can survive the total dose and still experience computational errors from individual cosmic ray hits during operation. For AI training, where you need thousands of calculations to be exactly right across millions of steps, even rare bit-flips can corrupt a training run.
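To see why this matters at training scale, here is a purely illustrative calculation. The per-bit upset rate below is a hypothetical placeholder, not a measured figure for HBM in any orbit; the point is only how quickly a tiny rate multiplies across a large cluster and a long run.

```python
# Illustrative single-event-upset (SEU) arithmetic. The upset rate is a
# hypothetical placeholder, NOT a measured value for any real device or orbit.

upsets_per_bit_per_day = 1e-12   # hypothetical assumption
hbm_per_chip_gb = 192            # assume ~192 GB of HBM per accelerator
chips = 1000                     # assume a 1,000-chip cluster
training_days = 30               # assume a month-long training run

bits = chips * hbm_per_chip_gb * 8e9
expected_upsets = bits * upsets_per_bit_per_day * training_days

print(f"Total memory bits in the cluster: {bits:.2e}")
print(f"Expected raw upsets over the run: {expected_upsets:,.0f}")
```

Tens of thousands of raw upsets over a single run under these made-up assumptions, which is why error detection and correction at the system level, not just total-dose survival, decides whether orbital training is viable.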
SpaceX appears to be taking radiation seriously. Reports indicate the company has acquired a particle accelerator for radiation testing, suggesting they’re doing their own qualification work rather than relying solely on published data. Their FCC filing specifies multiple orbital shells at 50 km intervals between 500 and 2,000 km altitude, with different shells optimized for different workload types. Higher orbits get more radiation exposure but different power and coverage characteristics. Lower orbits are somewhat shielded by Earth’s magnetic field but face more atmospheric drag.
My take: radiation tolerance looks like the most solvable of the major engineering challenges. Google’s TPU results suggest that modern AI accelerators may be more naturally resilient than expected. But “more resilient than expected” and “reliable enough for production workloads at scale” are not the same thing. Nobody has demonstrated the latter yet.
The connectivity problem nobody wants to talk about
Training an AI model requires moving enormous amounts of data between processors. In a terrestrial data center, the GPUs in a training cluster are connected by high-speed interconnects (NVLink, InfiniBand) running at hundreds of gigabits per second per link, and far more in aggregate, with latency measured in microseconds, across distances measured in meters.
In orbit, those processors are on separate satellites moving at 7.5 kilometers per second relative to the ground, separated by distances of tens to hundreds of kilometers, and the fastest operational inter-satellite laser links today, including Starlink’s own, top out at around 200 Gbps per link. That’s still well below what GPU training clusters require internally.
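To make the gap concrete, consider how long it takes just to move one full copy of a model's gradients over a single link. The model size, gradient precision, and link rates below are illustrative assumptions, not benchmarks.

```python
# How long does it take to move one full set of gradients across a single link?
# Model size and link speeds are illustrative assumptions.

model_params = 70e9            # assume a 70B-parameter model
bytes_per_gradient = 2         # assume bf16 gradients
gradient_bytes = model_params * bytes_per_gradient

def transfer_seconds(link_gbps: float) -> float:
    """Time to push the full gradient set over one link at the given rate."""
    return gradient_bytes * 8 / (link_gbps * 1e9)

print(f"Gradient payload: {gradient_bytes/1e9:.0f} GB")
print(f"Over a 200 Gbps laser link:        {transfer_seconds(200):.1f} s")
# Assume roughly 900 GB/s of NVLink-class bandwidth per GPU inside a rack
print(f"Over a 900 GB/s intra-rack fabric: {transfer_seconds(900 * 8):.2f} s")
```

Several seconds per gradient exchange over one laser link versus a fraction of a second inside a terrestrial rack, and a real training step needs many such exchanges.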
Google’s Suncatcher architecture reveals just how hard this problem is. Their design calls for 81 satellites flying in tight formation, essentially acting as a single distributed computer in space. The formation has to maintain precise alignment to achieve the interconnect speeds needed for TPU-to-TPU communication. Think about that: 81 separate spacecraft, each carrying AI accelerators and radiator panels and solar arrays, all maintaining position relative to each other with enough precision to sustain optical data links. While orbiting Earth at thousands of miles per hour. While avoiding space debris.
That’s not a satellite. That’s an orbital ballet.
SpaceX’s FCC filing takes a different approach. Rather than tight formation flying, SpaceX plans satellites that rely “nearly exclusively on high-bandwidth optical links” and may interconnect with Starlink’s existing laser mesh network. This is arguably more practical than Google’s approach (SpaceX already operates thousands of satellites with inter-satellite laser links) but it doesn’t solve the fundamental bandwidth gap for training workloads.
This is where Starcloud CEO Philip Johnston’s distinction becomes critical. In a recent interview, he said: “Training is not the ideal thing to do in space. I think almost all inference workloads will be done in space.” Inference (running a trained model to get answers) requires far less inter-processor bandwidth than training (teaching the model in the first place). If orbital data centers focus on inference rather than training, the connectivity constraint becomes much more manageable.
Blue Origin’s TeraWave constellation, promising terabit-per-second optical speeds with 5,408 satellites, is aimed squarely at the connectivity layer. But TeraWave is focused on data center networking and connectivity, not computation itself. Deployment isn’t expected to start until late 2027, and “announced” remains very different from “deployed.”
There’s also the last-mile problem. Even if satellite-to-satellite links get fast enough, data has to get between orbit and the ground. Weather disrupts optical ground links. Radio frequency links have lower bandwidth. And latency is bounded by physics: even at the speed of light, a round trip to a satellite at 1,000 km altitude takes about 6.7 milliseconds at minimum. That’s fine for many applications but a real penalty for latency-sensitive ones.
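That 6.7 millisecond floor falls straight out of the speed of light, assuming the satellite is directly overhead; real paths are longer.

```python
# Minimum round-trip light time to a satellite at 1,000 km, directly overhead.
# Ignores slant range, processing delay, and routing, so real latency is higher.

C = 299_792_458.0        # speed of light in vacuum, m/s
altitude_m = 1_000e3

round_trip_ms = 2 * altitude_m / C * 1e3
print(f"Minimum round trip: {round_trip_ms:.1f} ms")  # ~6.7 ms
```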
When something breaks, nobody’s coming
On John Collison’s Cheeky Pint podcast, co-host Dwarkesh Patel pushed Musk on a practical question that anyone who has run a data center understands immediately: what happens when hardware fails?
In a ground-based AI training cluster, GPUs fail regularly. This isn’t hypothetical. Large-scale training runs at companies like Meta and Google experience hardware failures daily. When a GPU fails, a technician walks to the rack and swaps it out, often within hours. The training run checkpoints, restarts, and continues.
In orbit, that GPU is gone. There is no technician. There is no swap. SpaceX’s FCC filing specifies a five-year operational life per satellite. If a critical component fails in year one, you have a piece of expensive space debris for four more years until it deorbits.
The only options are building in enough redundancy that individual failures don’t matter (expensive, heavy, reducing compute density) or designing systems that gracefully degrade as components fail (requiring sophisticated autonomous fault management that doesn’t fully exist yet).
Think about the scale. SpaceX is filing for a million satellites. Even at an optimistic 99% reliability rate, that’s 10,000 satellites experiencing significant failures at any given time. Each one containing compute hardware, solar panels, radiators, and optical communications equipment that can’t be serviced.
This isn’t a reason it can’t work. Starlink already operates thousands of satellites with individual failures accounted for. But data center satellites carry far more expensive payloads than communications satellites, and the economics of stranding that hardware in orbit are more punishing.
The five-year replacement cycle also means the compute is perpetually aging. In a terrestrial data center, you can upgrade GPUs as new generations arrive. In orbit, you’re locked into whatever you launched. Five years is roughly two to three GPU generations. By the time a satellite reaches end of life, its compute hardware is ancient by industry standards.
A million satellites in an already crowded sky
I saved this section for last because it’s the one that makes me most uncomfortable.
There are currently about 15,000 active satellites in orbit. SpaceX is proposing to add 1 million. That’s not an incremental increase. That’s a roughly 6,700% expansion of the orbital population.
SpaceX’s filing places these satellites at altitudes between 500 and 2,000 kilometers. This is worth pausing on. Starlink’s communications satellites operate at roughly 480 km. At that altitude, a dead satellite deorbits from atmospheric drag within a few years. At 2,000 km, a dead satellite stays up for centuries. Potentially thousands of years.
Harvard astrophysicist Jonathan McDowell put it directly: “One million satellites are going to be a big challenge for astronomy, especially as they are in higher orbits which is worse for us.”
Here’s the math that keeps debris experts up at night. NASA targets a 99.9% success rate for post-mission disposal, meaning that after a satellite’s operational life ends, it successfully deorbits 99.9% of the time. That sounds nearly perfect. Applied to a million satellites, it means 1,000 uncontrollable objects left in orbit. At higher altitudes, those objects stay there for centuries, creating collision risks for everything else in that orbital band. At 2,000 km, the top of SpaceX’s proposed range, atmospheric drag is virtually nonexistent. There is no natural cleanup mechanism. Anything that fails up there stays up there.
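How sensitive that number is to the disposal rate is worth spelling out. The constellation size and the 99.9% figure are the ones discussed above; the lower rates are hypothetical what-ifs.

```python
# Expected number of stranded (non-disposed) satellites as a function of
# post-mission disposal success rate, for a 1,000,000-satellite constellation.

constellation_size = 1_000_000

for disposal_rate in (0.999, 0.99, 0.98):
    stranded = constellation_size * (1 - disposal_rate)
    print(f"Disposal success {disposal_rate:.1%}: ~{stranded:,.0f} stranded objects")
```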
And 99.9% is a target. Actual disposal rates for current satellite constellations have sometimes been lower.
This isn’t just an environmental or astronomical concern. It’s a business risk for the orbital compute industry itself. Kessler syndrome, the scenario where cascading collisions create a debris field that makes orbital altitudes unusable, would destroy the infrastructure these companies are trying to build. You can’t run an orbital data center in an orbit filled with shrapnel.
One commenter on the first article made a sophisticated point about this: effective debris management isn’t about flagging problems after they occur, it’s about preventing “non-admissible states” from forming in the first place. That’s an engineering philosophy, not just a technical requirement, and it needs to be baked into the system architecture from day one.
Where I come out (so far)
I promised at the start of this series that I’d share my thinking as it evolves. So here’s my current scorecard on the engineering challenges:
Radiation tolerance: Cautiously optimistic. Google’s TPU results are encouraging. Modern AI chips appear more naturally resilient to radiation than expected. This looks like a problem that can be engineered around, especially for inference workloads that can tolerate occasional bit-flips more gracefully than training.
Thermal management: More concerned than when I started. The “free cooling in space” narrative is misleading, and the people who actually build things for space know it. Massive radiators, solar flux heating, and the mass penalties involved make this the most underappreciated engineering challenge. I want to see published thermal specifications from any company claiming cost savings.
Connectivity: Solvable for inference, very hard for training. Johnston’s inference-vs-training distinction is the key insight. If orbital data centers focus on running trained models rather than training them, the bandwidth constraints become manageable. For training, the physics of inter-satellite communication look genuinely limiting.
Maintenance: Requires a mindset shift, not a breakthrough. The satellite industry already manages large constellations with expected failure rates. But the economic penalty of stranding expensive compute hardware in orbit is real, and the five-year replacement cycle means perpetually aging chips.
Space debris: The systemic risk nobody is pricing in. A million new satellites is a fundamentally different proposition than anything humanity has put in orbit. The debris risk isn’t just a concern for astronomers. It’s an existential risk for the orbital compute industry itself. This needs far more regulatory attention than it’s getting.
The overall picture: every one of these challenges is workable in isolation. The hard part is solving all of them simultaneously, at scale, at a cost that competes with terrestrial alternatives that keep getting better. That cost question is the subject of the next article.
Next in the series: The Economics and Timeline — when does the math work? Musk says 2028. Google says mid-2030s. A European study says 2050. Someone is very wrong.
If you have expertise in thermal management in space, satellite constellation operations, or orbital debris modeling, I want to hear from you. The comments on the first article shaped my thinking on this one significantly, and I expect the same from this one.


