Llama 3 70B · FP8 · 1,024× NVIDIAHigh · Steady-state · 0.8 overlap

compute-bound·2.0Mtok/s·8d

Tensor cores

You're at 45% of peak FP8.

FP8 utilization45%

Ridge AI

Above the ridge — compute is the gate, not HBM.

AI vs ridge7389.2 / 591

FLOP / byte

Time to train

Estimated wall time at the configured MFU + parallelism.

Days8d

Bottleneck: tensor compute. Compute is the gate — closer to peak silicon utilization.

§ 01

Inside the system

where the bits are right now

HBM 0.12 / 3.35 TB/s

SMs 59 / 132 active

NVLink idle

reading 70 GB of weights per token · SMs starving. The HBM pipes run full and amber; the tensor cores sit mostly dark. This is the memory wall in pixel form.

§ 02

What it costs

dollars, rates, and counterfactuals

greenfield on-prem · 3-yr amortized

GPUs · $30.7M

Chassis & CPUs · $5.1M

Power infra · $7.2M

Real estate · $15.6M

Fabric & storage · $5.4M

hover a segment for its unit formula

GPUs $30.7MInterconnect $4.6MChassis & CPUs $5.1MPower infra $7.2MCooling $4.3MReal estate $15.6MFabric & storage $5.4M

Total capex

$72.9M

3-yr TCO: $98.4M

1,024 × NVIDIA H100 SXM

$ / GPU-hour

$3.66/hr

3-yr amortized

incl. power + ops

$ / M tokens

$0.52/ M tok

decode at batch=1

continuous batching: ~$0.03

≈ 0.97 Gulfstream G700 ($75M each)≈ 3 Manhattan penthouses ($25M each)

capex indicative · power $0.08/kWh · 4%/yr ops · PUE 1.45 · greenfield on-prem · batch=1 decode (continuous batching drops $/M tokens ~20×)

§ 03

Where it lives

watts, floor space, cooling

1 GPU

NVIDIA H100 SXM

700 W
80 GB HBM

1 tray

8 GPUs · HGX board

5.6 kW
NVIDIA × 8

1 rack

NVL-class, 72 GPUs

50 kW
liquid-cooled

Your cluster

15 racks

1,024 GPUs · 717 kW
IT load

Site

Building + substation

1.04 MW
PUE 1.45 · ~825 sqft

IT load717 kW≈ 611 US homes' continuous draw

Total facility1.04 MW≈ peak draw of a Walmart Supercenter

Annual energy9.1 GWh/yr≈ 0.23× a small college campus

Heat rejected1.04 MW≈ 1,642 gal/day evaporated at cooling tower

Training run energy201 MWhover 8 days · ≈ Iceland's annual × 0.001%

that's training.