Tenstorrent TT-QuietBox 2

Tenstorrent TT-QuietBox 2 ships Q2 2026 at $9,999: four Blackhole ASICs, 128 GB GDDR6, fully Apache-2.0 RISC-V software stack, liquid-cooled, wall outlet.

License Apache 2.0
TL;DR
  • Four Blackhole ASICs, 480 Tensix + thousands of RISC-V cores, 2,654 TFLOPS BlockFP8, 128 GB GDDR6, liquid-cooled, runs on a standard wall outlet.
  • Fully Apache 2.0 stack: TT-Forge compiler, TT-Metalium SDK, TT-LLK kernels. Fork any layer, debug at hardware level. Not a CUDA black box.
  • Llama 3.1 70B at 476 tok/s on-device, gpt-oss-120B runs full-parameter. Ships Q2 2026 from $9,999, waitlist live.
System Requirements
RAM: 256 GB DDR5
GPU: 4x Tenstorrent Blackhole
VRAM: 128 GB GDDR6

For 15 years, "buy a serious AI box" has meant "buy NVIDIA, run CUDA, hope the driver stack stays kind to you." Tenstorrent's TT-QuietBox 2, announced at GDC 2026 and shipping Q2 2026 starting at $9,999, is the first credible desk-side workstation to break that pattern. It packs four Blackhole ASICs, 128 GB of GDDR6, RISC-V cores top to bottom, a fully Apache-2.0 software stack from the MLIR compiler down to the kernel-level SDK, and liquid cooling, and it runs from a standard wall outlet. Tenstorrent reports Llama 3.1 70B at 476 tokens per second on-device, with no cloud round-trips and no CUDA black boxes.

What's actually in the box

The TT-QuietBox 2 packages four Blackhole accelerator cards in a single chassis. Per the Tenstorrent Blackhole spec sheet, each chip has 120 Tensix cores, 16 "big" 64-bit RISC-V cores, and 752 "baby" RISC-V cores for control and data movement. That works out to 480 Tensix cores and roughly 2,654 TFLOPS of BlockFP8 compute across the box. Memory is 128 GB of GDDR6 (32 GB per chip, around 512 GB/s per chip), with 256 GB of DDR5 system RAM behind the host CPU. The interconnect is Ethernet-based, quoted at 1 TB/s total over ten 400 Gbps QSFP-DD links per chip, the same plumbing Tenstorrent uses to scale up to its Galaxy systems.
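The box-level totals follow directly from the per-chip figures, and a quick sketch makes the arithmetic easy to re-check. The class and function names here are illustrative only, and the inputs are the numbers quoted in this article, not independently measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlackholeChip:
    """Per-chip figures as quoted in the article (not independently verified)."""
    tensix_cores: int = 120
    big_riscv_cores: int = 16
    baby_riscv_cores: int = 752
    gddr6_gb: int = 32
    gddr6_bandwidth_gbs: int = 512  # approximate, per chip


CHIPS_PER_BOX = 4
BOX_BLOCKFP8_TFLOPS = 2654  # box-level figure quoted in the article


def box_totals(chip: BlackholeChip, n_chips: int = CHIPS_PER_BOX) -> dict:
    """Scale the per-chip figures up to the whole chassis."""
    return {
        "tensix_cores": chip.tensix_cores * n_chips,
        "riscv_cores": (chip.big_riscv_cores + chip.baby_riscv_cores) * n_chips,
        "gddr6_gb": chip.gddr6_gb * n_chips,
        "gddr6_bandwidth_gbs": chip.gddr6_bandwidth_gbs * n_chips,
        # Work backwards from the box-level number to a per-chip share.
        "blockfp8_tflops_per_chip": BOX_BLOCKFP8_TFLOPS / n_chips,
    }


totals = box_totals(BlackholeChip())
for name, value in totals.items():
    print(f"{name}: {value}")
```

The RISC-V total is where the "thousands of RISC-V cores" line in the summary comes from, and the per-chip TFLOPS share is just the quoted box figure divided by four.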

Power draw stays under 1.5 kW, so the whole thing plugs into a standard 120 V wall outlet. The chassis is liquid-cooled by design rather than as an aftermarket bolt-on, and the announcement is explicit that Tenstorrent wanted the box "quiet enough to sit on a desk," not parked in a closet. Pricing starts at $9,999, with the waitlist live now and shipments planned for Q2 2026.

The actual selling point: an open stack you can fork

Hardware is the easy half. The interesting part is that every layer of the software stack is Apache 2.0 on GitHub. Tenstorrent didn't ship an open API on top of a proprietary blob; they shipped the compiler, the runtime, the kernel library, and the low-level kernels themselves. If a Blackhole op is slow or wrong on your model, you can read the code that emitted it, patch it, and rebuild.

Layer | Repo | What it does
TT-Forge | tenstorrent/tt-forge | MLIR-based compiler. Takes PyTorch, ONNX, TensorFlow, JAX, PaddlePaddle and emits Blackhole code.
TT-Metalium | tenstorrent/tt-metal | Kernel-level SDK. Direct access to Tensix cores, the equivalent of CUDA C++ but readable.
TT-NN | same repo | Higher-level operator library on top of TT-Metalium, PyTorch-friendly.
TT-LLK | tenstorrent/tt-llk | The low-level kernels themselves. Custom Tensix ISA.

For comparison, CUDA is closed. Triton is open, but the NVIDIA driver and PTX backend it targets are not. With the TT-QuietBox 2 stack, a developer can trace a slow Llama matmul down to the actual Tensix instructions and either patch TT-LLK or hand-write a custom kernel through TT-Metalium. That degree of visibility doesn't exist on any NVIDIA box at any price.

Performance vs. the H100, honestly

The numbers Tenstorrent has published focus on single-box throughput. Llama 3.1 70B clocks 476.5 tokens/sec on the four-chip TT-QuietBox 2, or about 119 tokens/sec per chip. An 8 x H100 node serves Llama 3.3 70B at roughly 2,600 tokens/sec, or about 325 per GPU, so per-chip the H100 is still roughly 2.7x faster on dense inference. The two figures come from different model versions and serving setups, so treat the ratio as a rough guide. A single TT-QuietBox 2 also runs gpt-oss-120B fully on-device, and a Boltz-2 biomolecular workload that takes 45 minutes on a CPU finishes in 49 seconds on one Blackhole chip, a speedup of about 55x.
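The per-chip comparison is plain division, so it is worth writing out. This sketch uses only the throughput and timing figures quoted above; the helper name is illustrative, and the result inherits every caveat of the underlying numbers (different model versions, vendor-reported throughput).

```python
def per_chip(tokens_per_sec: float, n_chips: int) -> float:
    """Average throughput contributed by each accelerator in a node."""
    return tokens_per_sec / n_chips


# Figures quoted in the article; not independently verified.
quietbox_per_chip = per_chip(476.5, 4)   # Llama 3.1 70B on four Blackhole chips
h100_per_chip = per_chip(2600.0, 8)      # Llama 3.3 70B on an 8 x H100 node
h100_advantage = h100_per_chip / quietbox_per_chip

# Boltz-2: 45 minutes on a CPU versus 49 seconds on one Blackhole chip.
boltz_speedup = (45 * 60) / 49

print(f"Blackhole: {quietbox_per_chip:.1f} tok/s per chip")
print(f"H100:      {h100_per_chip:.1f} tok/s per chip")
print(f"H100 per-chip advantage: {h100_advantage:.2f}x")
print(f"Boltz-2 speedup over CPU: {boltz_speedup:.1f}x")
```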

What Blackhole genuinely wins on is openness, programmability, and, for steady on-premises use, price-per-token, if you can stand the lower raw throughput. A four-chip H100 workstation does not exist at $9,999. The closest NVIDIA equivalent is a DGX Station at multiples of the price with a proprietary stack you cannot meaningfully fix when it misbehaves.

Where it sits in the open-hardware spectrum

The open-hardware AI compute spectrum now has a recognizable shape. At the budget end, Seeed Studio's reComputer RK series built on Rockchip RK3576 and RK3588 SoCs delivers around 6 TOPS of NPU compute in a small-board form factor for a few hundred dollars, aimed at edge computer vision and small-model inference. At the upper end, the TT-QuietBox 2 puts frontier-class inference on a desk for ten thousand. The middle (Jetson Orin, AMD AI accelerators, Apple Silicon Macs) is crowded but mostly proprietary in the parts that matter. Tenstorrent is currently the only player covering the high-end-and-open quadrant.

Who actually wants this

The TT-QuietBox 2 is not the right box if you need maximum raw inference per dollar today; an H100 cloud instance still wins on that metric. It is the right box if any of the following matter to you:

  • You need the model to never leave your building. Healthcare, defense, legal, financial back-office, anything where data residency is a hard requirement.
  • You want to fix the stack when it breaks. Researchers building custom transformer variants, kernel authors, anyone who has hit a CUDA bug and waited months for a driver fix.
  • You are betting on RISC-V. The Blackhole chip is RISC-V end-to-end and the software stack reflects that. Investing in skills here transfers.
  • You are tired of cloud bills. $9,999 amortizes against ongoing inference costs faster than most teams expect once usage is steady.
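The amortization point is easy to put rough numbers on. The sketch below is a break-even estimate under loudly stated assumptions: the cloud hourly rate, monthly usage, and electricity price are hypothetical placeholders, not quotes from any provider, and the 1.5 kW draw is the article's upper bound. It also ignores throughput differences between the box and a cloud instance, so substitute your own figures before drawing conclusions.

```python
def breakeven_months(
    hardware_cost: float,
    cloud_cost_per_hour: float,
    hours_per_month: float,
    power_kw: float = 1.5,              # upper bound quoted in the article
    electricity_per_kwh: float = 0.15,  # hypothetical electricity price in USD
) -> float:
    """Months until the purchase price is offset by avoided cloud spend."""
    monthly_cloud = cloud_cost_per_hour * hours_per_month
    monthly_power = power_kw * hours_per_month * electricity_per_kwh
    monthly_saving = monthly_cloud - monthly_power
    if monthly_saving <= 0:
        # Running locally never pays off under these inputs.
        return float("inf")
    return hardware_cost / monthly_saving


# Hypothetical scenario: a $4.00/hour cloud instance used 200 hours a month.
months = breakeven_months(
    hardware_cost=9999.0,
    cloud_cost_per_hour=4.00,
    hours_per_month=200.0,
)
print(f"Break-even after about {months:.1f} months")
```

Under these made-up inputs the box pays for itself in a little over a year; halve the usage and the break-even point roughly doubles, which is why steady utilization matters more than the sticker price.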

What to watch in the next 90 days

Three things will decide whether the TT-QuietBox 2 actually displaces NVIDIA on desks. First, ship dates: Q2 2026 ends June 30. If the box ships on time and at the announced price, that itself is news. Second, kernel maturity: independent benchmarks on Mixtral, DeepSeek-R1, Qwen 3, and Gemma 3 will tell us how Blackhole handles MoE and attention patterns that weren't in the launch demos. Third, the developer story: Tenstorrent has shipped tt-metal on GitHub for years, but pull-request velocity from outside contributors is the real proxy for whether the open stack pulls a community.

Get on the waitlist if you want to play. Even if you do not buy one, the existence of a full-stack open RISC-V workstation at $9,999 reshapes the conversation about what local AI hardware can look like.

Sources and further reading

Benchmark figures are community-reported, drawn from Tenstorrent's announcement and the linked references, and have not been independently verified. Compiled 2026-05-19.
