NVIDIA Ising: Open-Source Quantum AI Models

NVIDIA's Ising open AI models for quantum computing beat Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 on QCalEval. The open-source-vs-closed-frontier playbook.

License: Apache 2.0 (code) / NVIDIA Open Model License (weights)
TL;DR
  • Two open model families for quantum computing: a 35B-parameter MoE vision-language calibrator (3B active) and two sub-2M parameter 3D CNN error decoders.
  • Calibration beats Claude Opus 4.6 by 9.68% and GPT-5.4 by 14.5% on QCalEval. Decoding runs up to 2.5x faster than PyMatching with better accuracy.
  • Code is Apache 2.0; weights ship under the NVIDIA Open Model License.
System Requirements
RAM: 64 GB
GPU: H100 80GB or 2x L40S (Calibration); any GPU with compute capability 7.0+ (Decoding)
VRAM: 80 GB (Calibration) / 4 GB (Decoding)

NVIDIA shipped two open AI model families on April 14, 2026, both aimed at a single stubborn problem: making quantum processors actually useful. The headline is not that quantum computing got a new toy. It is that a 35B-parameter vision-language model with 3B active per token, released with open weights, outperforms Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 on its target benchmark by margins of 3 to 14 percent. For open-source builders who have been asking when small, specialized, domain-tuned open weights will start beating closed frontier generalists, the answer just arrived. It came from NVIDIA, of all places, and it runs on an H100.

What NVIDIA Actually Shipped

Ising is two model families distributed through a single GitHub landing repo, a Hugging Face collection, and NVIDIA Build.

The first is Ising-Calibration-1-35B-A3B, a mixture-of-experts vision-language model. Total parameters: 35 billion. Active per token: 3 billion. Experts: 256, with 8 routed per forward pass. Context window: 262,144 tokens. The base is Qwen3.5-35B-A3B under Apache 2.0, with a vision encoder bolted on to read experiment plots alongside measurement telemetry.

The second is Ising-Decoder-SurfaceCode-1, shipped in two variants. The Fast variant is a 912,000-parameter 3D convolutional neural network. The Accurate variant is 1.79 million parameters. Both target real-time quantum error correction on surface codes.

The repository code is Apache 2.0. The model weights on Hugging Face ship under the NVIDIA Open Model License, which allows commercial use with standard downstream redistribution terms. Both families are available as NVIDIA NIM microservices on build.nvidia.com, as raw weights on Hugging Face, and as training and inference cookbooks in the GitHub repo.

Why This Matters for Open-Source Builders (Even Non-Quantum Ones)

Almost no SingularityByte reader runs a quantum computer. That is fine. The reason to pay attention to Ising has very little to do with qubits.

A 35B-parameter open-weights model with 3B active per token just beat the three biggest closed frontier models on the task it was trained for. Not by a rounding error. By 14.5 percent against GPT-5.4. By 9.68 percent against Claude Opus 4.6. The playbook is laid out in plain sight: take a capable open base (Qwen3.5), add a vision encoder, post-train on curated domain data, ship the weights. The result beats Gemini and Claude on your home turf.

This is the pattern worth studying. For two years the open-source argument has been "smaller open models plus fine-tuning will eventually catch closed models on narrow tasks." Ising is a concrete case where it already happened, backed by benchmarks that NVIDIA published under its own name. Every team building a vertical product on open weights (medical imaging, log analysis, legal document extraction, CAD generation) now has a louder reference point when pitching against the hosted-API alternative.

Ising Calibration: The 35B VLM

Quantum processors are not like GPUs. Every qubit drifts. Control pulses need to be retuned continuously against noise, temperature, and crosstalk. Human calibration engineers spend days interpreting pulse waveform plots and adjusting parameters. Ising Calibration automates that loop.

The model takes measurement plots as vision input and telemetry as text, then outputs tuning decisions. The MoE architecture keeps serving cost low: only 3 billion parameters activate per token even though the full model is 35 billion. On an H100 (80GB) or two L40S (48GB each) with vLLM and FP8 quantization, you get a fleet-scale inference endpoint at the cost of running a 3B dense model per request.
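The cost argument rests on sparse expert routing: only a handful of the 256 experts fire per token. A toy sketch of top-k routing illustrates the idea; this is an illustrative router in NumPy, not NVIDIA's implementation, and the numbers (256 experts, 8 routed) come from the model spec above.

```python
import numpy as np

# Toy MoE router: score 256 experts for one token, keep the top 8,
# and renormalize the gate weights over the chosen experts only.
# Illustrative only; not the Ising Calibration router.
rng = np.random.default_rng(0)
logits = rng.normal(size=256)      # router scores for a single token
topk = np.argsort(logits)[-8:]     # indices of the 8 routed experts
gates = np.exp(logits[topk])
gates /= gates.sum()               # softmax restricted to the routed experts

print(len(topk), round(float(gates.sum()), 6))
```

Because the other 248 experts never run, compute per token scales with the 8 routed experts, which is the mechanism behind "35B total, 3B active."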

NVIDIA benchmarked it against the three closed frontier models on QCalEval, a public evaluation dataset also released today. On average across calibration tasks, Ising Calibration scores 3.27 percent better than Gemini 3.1 Pro, 9.68 percent better than Claude Opus 4.6, and 14.5 percent better than GPT-5.4. The gap widens on tasks requiring the vision encoder to read pulse-shape artifacts that the general-purpose VLMs have never seen during training.

Ising Decoding: Two Tiny 3D CNNs

Error correction on surface codes is the second half of the quantum stack. Physical qubits produce syndrome measurements; a decoder has to translate those syndromes into the underlying logical errors in real time, before the next round of measurements arrives. The industry standard for this is PyMatching, a classical minimum-weight perfect matching algorithm.

Ising Decoding replaces PyMatching with a 3D CNN acting as a predecoder. The Fast variant carries 912,000 parameters with a 9-unit receptive field. The Accurate variant carries 1.79 million parameters with a 13-unit receptive field. Both take a tensor shaped (B, 4, T, D, D), where T is the time dimension and D is the code distance.

Against PyMatching at code distance 13 with physical error rate 0.003, the Fast variant is 2.5 times faster and 1.11 times more accurate. The Accurate variant is 2.25 times faster and 1.53 times more accurate. On FP16 with TensorRT, NVIDIA reports 2.33 microseconds per decoding round on the Accurate variant. And the kicker: Ising Decoding required 10 times less training data than prior learned decoders to hit these numbers.

Getting the Code and Weights

The honest version: nobody on this editorial team has a quantum processor in the rack, and we are not about to pretend to benchmark Ising Calibration without one. What we can show is how to pull the models and line them up for inference. The rest is up to teams with the hardware to run them end to end.

Clone the landing repository and browse the cookbooks:

git clone https://github.com/NVIDIA/ising.git
cd ising
ls cookbooks/

The cookbooks cover training the Decoder on a custom surface code, exporting it to ONNX or TensorRT, and running the Quantum Calibration Agent Blueprint against simulated telemetry.

Pull the Accurate decoder weights from Hugging Face:

pip install huggingface_hub torch
huggingface-cli download nvidia/Ising-Decoder-SurfaceCode-1-Accurate \
  --local-dir ./ising-decoder-accurate

The model loads via standard PyTorch and runs inference on any CUDA card with compute capability 7.0 or newer (A100, H100, or a consumer RTX validated by NVIDIA). A minimal forward pass looks like this:

import torch
from safetensors.torch import load_file

state = load_file("ising-decoder-accurate/model.safetensors")
# Construct the model class per the repo's architecture definition,
# then load state. Input shape: (B, 4, T, D, D), fp16 internally.
# Output: logits of shape (B, 4, T, D, D) for a downstream matcher.
# D and T are odd and in [3, 13]. D = T = 13 is optimal.
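Since the real architecture definition lives in the repo, here is a hypothetical stand-in that only demonstrates the I/O contract: a small 3D CNN mapping a (B, 4, T, D, D) syndrome tensor to same-shaped logits. The layer sizes are invented for illustration and do not match Ising's 13-unit receptive field.

```python
import torch
import torch.nn as nn

# Hypothetical toy decoder, NOT NVIDIA's architecture: it only shows the
# tensor contract (B, 4, T, D, D) in -> (B, 4, T, D, D) logits out.
class ToyDecoder(nn.Module):
    def __init__(self, channels: int = 4, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

B, T, D = 2, 13, 13                  # D = T = 13 is the optimal setting
x = torch.randn(B, 4, T, D, D)       # fake syndrome tensor
logits = ToyDecoder()(x)
print(tuple(logits.shape))           # same shape as the input
```

Swapping in the real architecture from the cookbooks should leave the surrounding loading and inference code unchanged, since the shapes are the contract.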

For Ising Calibration, the base VLM deploys on vLLM with the same pattern as any Qwen3.5-A3B checkpoint. The model card on Hugging Face lists the launch command and recommended temperature (0.2) and max output tokens (16384).
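A launch sketch for the calibration endpoint, assuming the standard vLLM CLI; the flags below are ordinary vLLM options, but the exact command, quantization settings, and sampling defaults should be taken from the model card rather than from here.

```shell
# Hypothetical vLLM launch for the 35B-A3B calibrator (check the model card):
# FP8 weights on 2x L40S, full 262,144-token context.
vllm serve nvidia/Ising-Calibration-1-35B-A3B \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --max-model-len 262144
```

On a single H100 80GB, drop `--tensor-parallel-size` to 1.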

Benchmarks vs. Competitors

QCalEval scores, averaged across the benchmark's calibration task set:

Model | QCalEval Score | Delta vs. Ising
Ising-Calibration-1-35B-A3B | Baseline | 0
Gemini 3.1 Pro | 3.27% below | -3.27%
Claude Opus 4.6 | 9.68% below | -9.68%
GPT-5.4 | 14.5% below | -14.5%

Decoding results against PyMatching at code distance 13, physical error rate 0.003:

Decoder | Params | Speed vs. PyMatching | Accuracy vs. PyMatching
Ising Decoder Fast | 0.91M | 2.5x faster | 1.11x more accurate
Ising Decoder Accurate | 1.79M | 2.25x faster | 1.53x more accurate
PyMatching (baseline) | N/A (classical) | 1.0x | 1.0x

The data-efficiency number is arguably the more important result: the decoders reach these scores with 10 times less training data than prior learned decoders. That lowers the barrier for quantum hardware teams to fine-tune Ising on their own device's noise profile without building a full error-correction research pipeline.

Limitations and Gotchas

Ising is not a general-purpose model. Running Calibration against non-quantum VLM benchmarks will produce nothing useful. It was post-trained hard on a narrow distribution.

VRAM expectations are also higher than the parameter math suggests. Calibration needs two 48GB L40S cards or a single 80GB H100 to serve comfortably at BF16. FP8 brings the footprint down, but not to consumer-GPU levels.
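The back-of-envelope arithmetic makes the hardware floor concrete. Weight memory alone is parameters times bytes per parameter; KV cache and activations (especially at a 262K context) come on top, which is why the comfortable-serving bar sits above the raw weight size.

```python
# Weight-only memory for the 35B calibrator. Params in billions map
# directly to GB at 1 byte/param; KV cache and activations are extra.
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

bf16 = weights_gb(35, 2)  # 70 GB: an 80GB H100 or 2x 48GB L40S (96 GB total)
fp8 = weights_gb(35, 1)   # 35 GB: smaller, but still beyond consumer cards
print(bf16, fp8)
```

Even at FP8, 35 GB of weights plus long-context KV cache rules out 24GB consumer GPUs for serious serving.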

The weights ship under the NVIDIA Open Model License, not Apache 2.0. Commercial use is allowed, redistribution is allowed, but the license is not OSI-approved. Teams that require a strict OSI license for procurement or compliance reasons should read the text carefully before committing.

And the obvious one: neither model is useful without a quantum backend. Calibration expects structured plot inputs from a real device or a simulator. Decoding expects surface-code syndrome tensors. These are not drop-in replacements for anything on a typical ML team's roadmap today.

Who Should Use It and What to Watch

Direct users are quantum hardware startups (IonQ, Atom Computing, IQM, Infleqtion, EeroQ, Q-CTRL, SEEQC), national labs (Sandia, Fermi National Accelerator Lab, Lawrence Berkeley), and academic groups (Harvard, UC Santa Barbara, UC San Diego, Cornell, University of Chicago, USC, Yonsei, Academia Sinica, UK National Physical Laboratory). Most of them were already named as Ising early adopters in the announcement, which tells you the release was staged with the quantum ecosystem in mind.

For everyone else in open-source AI, watch the playbook. Ising is a blueprint for how to ship a domain-specific open model that beats closed generalists. Qwen base plus vision encoder plus curated domain post-training plus Apache 2.0 code plus permissive weight license. That pipeline is reproducible. Expect the next six months to bring Ising-shaped projects in materials science, protein design, chip design, and embedded control.

Pull the models from Hugging Face, read the developer blog, and file the pattern away for your next narrow-domain build.
