Llama 3 70B · b=1 · FP8 · 1,024 × NVIDIA
memory-bound at 22.1K tok/s · $3.66/GPU-hr · 71 GB / 81.92 TB

You're reading ~70 GB of weights from HBM per token to do ~420 GFLOPs of math. The tensor cores sit at 1% utilization — they're fast enough; they just can't get fed. To earn more throughput you need HBM bandwidth, bigger batches, or smarter attention — not more FLOPS. That's the wall.
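The roofline arithmetic behind that claim fits in a few lines. A minimal sketch using the figures above (~420 GFLOPs and ~70 GB moved per token, 2.0 PFLOPS FP8 peak, 3.35 TB/s HBM):

```python
# Roofline check for b=1 FP8 decode on one H100 SXM, using the
# numbers quoted in the text (not measured values).
PEAK_FP8_FLOPS = 2.0e15      # tensor-core peak, FLOP/s
HBM_BW = 3.35e12             # HBM bandwidth, bytes/s

flops_per_token = 420e9      # math per decoded token
bytes_per_token = 70e9       # full FP8 weight read per token

ai = flops_per_token / bytes_per_token          # arithmetic intensity, FLOP/byte
ridge = PEAK_FP8_FLOPS / HBM_BW                 # ridge point, ~597 FLOP/byte
attainable = min(PEAK_FP8_FLOPS, ai * HBM_BW)   # roofline: lower of the two roofs
util = attainable / PEAK_FP8_FLOPS

print(f"AI = {ai:.0f} FLOP/byte (ridge ≈ {ridge:.0f}) -> "
      f"{'memory' if ai < ridge else 'compute'}-bound, "
      f"{util:.1%} tensor-core utilization")
```

At 6 FLOP/byte against a ~597 FLOP/byte ridge, the attainable rate is ~20 TFLOPS, which is where the 1% utilization figure comes from.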

[roofline chart · x: arithmetic intensity (FLOP/byte) · y: throughput (FLOPS) · Peak FP8: 2.0 PFLOPS · HBM roof: 3.35 TB/s · ridge AI ≈ 591 FLOP/byte · marker: Llama 3 70B · b=1 · FP8]
§ 01

Inside the system

where the bits are right now
[die map · NVIDIA H100 SXM · 814 mm² · 80B transistors · 8 × HBM3 stacks (10 GB each) · NVLink 900 GB/s · PCIe Gen5 64 GB/s · decode, batch=1]
HBM 3.35 / 3.35 TB/s
SMs 1 / 132 active
NVLink idle
reading 70 GB of weights per token · SMs starving. The HBM pipes run full and amber; the tensor cores sit mostly dark. This is the memory wall in pixel form.
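The starvation falls straight out of the bandwidth math. A sketch of the per-token ceiling, ignoring KV-cache traffic and kernel overheads:

```python
# Why the SMs go dark at batch=1: each decoded token re-reads the full
# weight set, so tokens/s is capped by HBM streaming time, not compute.
HBM_BW = 3.35e12        # bytes/s of HBM bandwidth feeding the weights
WEIGHT_BYTES = 70e9     # FP8 Llama 3 70B

latency_s = WEIGHT_BYTES / HBM_BW    # time just to stream the weights once
ceiling_tok_s = 1 / latency_s        # upper bound on decode rate per replica

print(f"{latency_s * 1e3:.1f} ms/token -> ≤ {ceiling_tok_s:.0f} tok/s per replica")
# Real decode lands below this ceiling once KV-cache reads and
# launch/sync overheads are counted.
```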
§ 02

What it costs

dollars, rates, and counterfactuals
greenfield on-prem · 3-yr amortized
GPUs · $30.7M
Interconnect · $4.6M
Chassis & CPUs · $5.1M
Power infra · $7.2M
Cooling · $4.3M
Real estate · $15.6M
Fabric & storage · $5.4M
Total capex
$72.9M
3-yr TCO: $98.4M
1,024 × NVIDIA H100 SXM
$ / GPU-hour
$3.66/hr
3-yr amortized
incl. power + ops
$ / M tokens
$47 / M tok
decode at batch=1
continuous batching: ~$2.36
≈ 0.97 Gulfstream G700s ($75M each) · ≈ 3 Manhattan penthouses ($25M each)
capex indicative · power $0.08/kWh · 4%/yr ops · PUE 1.45 · greenfield on-prem · batch=1 decode (continuous batching drops $/M tokens ~20×)
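Both headline rates can be reproduced from the TCO. A back-of-envelope sketch assuming straight-line 3-yr amortization over all 1,024 GPUs (a simplification; the page's own cost model may differ in detail):

```python
# $/GPU-hr and $/M tokens from the figures above.
TCO = 98.4e6            # 3-yr total cost of ownership, $
GPUS = 1024
HOURS = 3 * 365 * 24    # amortization window, hours

dollars_per_gpu_hr = TCO / (GPUS * HOURS)             # ≈ $3.66/GPU-hr

cluster_tok_s = 22.1e3                                # b=1 decode throughput
cluster_dollars_s = GPUS * dollars_per_gpu_hr / 3600  # burn rate, $/s
dollars_per_mtok = cluster_dollars_s / cluster_tok_s * 1e6   # ≈ $47/M tok
# The page's ~20x continuous-batching factor takes this to roughly $2.4/M tok.

print(f"${dollars_per_gpu_hr:.2f}/GPU-hr · ${dollars_per_mtok:.0f}/M tok")
```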
§ 03

Where it lives

watts, floor space, cooling
1 GPU
NVIDIA H100 SXM
700 W
80 GB HBM
1 tray
8 GPUs · HGX board
5.6 kW
NVIDIA × 8
1 rack
NVL-class, 72 GPUs
50 kW
liquid-cooled
Your cluster
15 racks
1,024 GPUs · 717 kW
IT load
Site
Building + substation
data hall · power · chillers
1.04 MW
PUE 1.45 · ~4,000 sqft
IT load · 717 kW · ≈ 611 US homes' continuous draw
Total facility · 1.04 MW · ≈ peak draw of a Walmart Supercenter
Annual energy · 9.1 GWh/yr · ≈ 0.23× a small college campus
Heat rejected · 1.04 MW · ≈ 1,642 gal/day evaporated at cooling tower
Training run energy · 201K MWh over 8 days · ≈ 1.059% of Iceland's annual electricity
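The power chain is three multiplies. A sketch treating the 700 W GPU draw as the whole IT load, as the 717 kW figure implies (real IT load would add CPUs, fabric, and storage):

```python
# GPU -> IT load -> facility -> annual energy, using the page's figures.
GPU_W = 700      # per-GPU draw, W
GPUS = 1024
PUE = 1.45       # facility power / IT power

it_kw = GPU_W * GPUS / 1e3               # ≈ 717 kW IT load
facility_mw = it_kw * PUE / 1e3          # ≈ 1.04 MW at the wall
annual_gwh = facility_mw * 8760 / 1e3    # ≈ 9.1 GWh/yr if run flat out

print(f"IT {it_kw:.0f} kW -> facility {facility_mw:.2f} MW -> {annual_gwh:.1f} GWh/yr")
```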
that's training. now try Serve →