
Overworld Waypoint-1.5

Overworld Waypoint-1.5 is a 1.2B-parameter world model, released under Apache 2.0, that generates interactive 3D environments at 720p and 60 FPS on an RTX 3090 or better. It runs entirely locally; no cloud needed.

License: Apache 2.0
TL;DR
  • 1.2B parameter diffusion world model under Apache 2.0. Generates interactive 3D environments in real-time from keyboard and mouse input.
  • 720p at 60 FPS on RTX 3090+. 360p variant for gaming laptops and Apple Silicon. 10-second context window at 60 FPS (512 frames).
  • Runs 100% locally via Overworld Biome desktop app or World Engine library. No cloud dependency, no data leaves your machine.
System Requirements
  • RAM: 8GB
  • GPU: RTX 3090 (720p) or RTX 3060 (360p)
  • VRAM: 8GB+ (360p) / 12GB+ (720p)
  • Apple Silicon: planned (360p variant)

Overworld just shipped a 1.2B-parameter world model that generates interactive 3D environments at 720p and 60 FPS on an RTX 3090. The license is Apache 2.0. It runs locally on your hardware, accepts keyboard and mouse input in real time, and holds a 10-second context window at 60 frames per second. This is not an LLM, not a video generator, and not a game engine. It is something new.

What Waypoint-1.5 Actually Is

World models sit at the intersection of video generation and interactive simulation. Unlike a standard video diffusion model (which renders a clip and hands it to you), a world model generates frames on the fly based on your actions. You press W to walk forward, and the model renders what "forward" looks like, in real time, maintaining physical consistency.

Waypoint-1.5 takes a starting image or short video clip, then continuously generates new frames as you interact through keyboard and mouse controls. Think of it as a learned physics engine: the model has internalized enough about 3D environments to produce coherent spatial output without any explicit 3D mesh, texture, or scene graph.
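In code terms, the loop looks less like a render pipeline and more like an autoregressive sampler fed by live input. The sketch below is a toy illustration of that pattern; `generate_frame` is a hypothetical stand-in for a full diffusion-and-decode step, not part of any Overworld API:

```python
def generate_frame(context, action):
    # Hypothetical stand-in: a real world model runs a diffusion step
    # conditioned on the recent frames and the current action, then
    # decodes the new latent frame to pixels.
    return f"frame after {action!r} given {len(context)} frames of history"

context = []                                      # frames the model conditions on
for action in ["W", "W", "mouse_dx:+12", "W"]:    # live keyboard/mouse input
    frame = generate_frame(context, action)
    context.append(frame)                         # feeds back into the next step
    # display(frame) would run here, 60 times per second
```

The key structural point: each frame depends on both the accumulated context and the action you just took, which is what makes the output interactive rather than a pre-rendered clip.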

This is different from game engines like Unity or Unreal, which rely on handcrafted assets and explicit physics. It is also different from offline video generators like HunyuanVideo or Sora, which produce non-interactive clips. Waypoint-1.5 is designed for live interaction at consumer-hardware frame rates.

From 2 Seconds to 10: What Changed

Waypoint-1 (released late 2025) could generate interactive environments at 360p and roughly 20 FPS, with a 2-second context window. That meant the model could "remember" about 40 frames of history before losing coherence. Useful as a proof of concept, but not practical for anything beyond short demos.

Waypoint-1.5 rewrites those numbers. The context window jumped to 10 seconds at 60 FPS (512 frames). Resolution went from 360p to 720p. Frame rate climbed from 20 FPS to 56+ FPS on high-end GPUs. Overworld reports training on roughly 100x more data than it used for the first version.

The architecture is a diffusion-based autoregressive transformer. Each frame is encoded through a Tiny Hunyuan Autoencoder (taehv1_5) that compresses video 4x temporally and 8x spatially into 32 latent channels. The transformer then predicts the next set of latent frames conditioned on your input actions, and the decoder reconstructs the output. The full model is 1.2B parameters, roughly 2.4 GB of weights in BF16 (2 bytes per parameter).
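Those compression factors pin down the latent shapes. A quick back-of-the-envelope check for the 720p variant, assuming 1280x720 frames and that the 8x spatial factor applies per axis (the usual convention for video autoencoders):

```python
# taehv1_5 compression (per the spec above): 4x temporal, 8x spatial, 32 channels.
W, H = 1280, 720        # one 720p frame
T = 512                 # context window in frames

lat_w, lat_h = W // 8, H // 8   # spatial compression, per axis
lat_t = T // 4                  # temporal compression
lat_c = 32                      # latent channels

values_per_frame = W * H * 3              # raw RGB values per frame
latent_values = lat_w * lat_h * lat_c     # latent values per step
compression = values_per_frame / latent_values

print(lat_w, lat_h, lat_t)      # 160 90 128
print(compression)              # 6.0
```

So the transformer operates on 160x90x32 latent frames, about 6x fewer values per frame than raw RGB, with temporal compression shrinking the sequence a further 4x. That reduction is what makes real-time diffusion on a consumer GPU plausible.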

The 5x context window increase matters most. Two seconds of memory meant the model would "forget" rooms you just walked through. Ten seconds means you can explore a space, turn around, and see it still there. This is the difference between a tech demo and something you can actually build on.
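That sliding window behaves like a bounded queue: once generation passes 512 frames, the oldest frames drop out of the model's conditioning, which is exactly the forgetting described above. A minimal illustration:

```python
from collections import deque

CONTEXT_FRAMES = 512                  # 10 s window at 60 FPS (per the spec)

context = deque(maxlen=CONTEXT_FRAMES)
for frame_idx in range(900):          # keep generating past the window
    context.append(frame_idx)

# The buffer never grows past the window, and the oldest frames are gone:
oldest_remembered = context[0]        # frames before this were "forgotten"
```

Anything you saw more than 512 frames ago is no longer available to condition on, so spatial consistency beyond the window is not guaranteed.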

Performance Numbers

Overworld ships two variants: a 720p model for desktop GPUs and a 360p model for laptops and (soon) Apple Silicon. Here are the published benchmarks.

| GPU | Resolution | FPS (unquantized) | FPS (w8a8 quantized) |
|---|---|---|---|
| RTX 5090 | 720p | 56 | 72 |
| RTX 4090 | 720p | ~60 | ~60+ |
| RTX 3090 | 720p | ~60 | ~60 |
| Gaming laptop GPU | 360p | ~60 | ~60 |
| Apple Silicon (planned) | 360p | TBA | N/A |

For comparison, Waypoint-1 managed 20 FPS at 360p with a 2-second context window. Waypoint-1.5 hits 56 FPS at 720p with 10 seconds of context. That is roughly an 8x effective improvement when you factor in resolution, frame rate, and temporal coherence together.

The w8a8 quantization path on the RTX 5090 pushes throughput to 72 FPS, a 29% gain over the unquantized model. If you are running on 30-series or 40-series hardware, the 720p model should hit the 60 FPS target without quantization.
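Those two numbers are worth a quick sanity check, since they also set the per-frame compute budget:

```python
fps_base = 56      # RTX 5090, 720p, unquantized
fps_w8a8 = 72      # RTX 5090, 720p, w8a8 quantized

gain_pct = (fps_w8a8 / fps_base - 1) * 100   # ~28.6%, i.e. the "29% gain"
frame_budget_ms = 1000 / fps_w8a8            # time available per frame
```

At 72 FPS the model has under 14 ms to denoise and decode each frame, which is why the aggressive autoencoder compression above matters so much.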

How to Run It

You have four ways to get Waypoint-1.5 running today.

Option 1: Overworld Biome (Desktop App)

The fastest path. Biome is Overworld's desktop client for Mac and Windows. Download it, pick a starting image, and start exploring.

# Download from:
# https://over.world/install
# Available for Windows and macOS

Option 2: Overworld Stream (Browser)

No install required. Open overworld.stream in your browser and interact with Waypoint-1.5 directly. This runs inference on Overworld's servers, so you do not need a GPU.

Option 3: HuggingFace Model Weights

Grab the weights and run inference yourself. Both variants are available under Apache 2.0.

# 720p variant (for RTX 3090+)
git lfs install
git clone https://huggingface.co/Overworld/Waypoint-1.5-1B

# 360p variant (for laptops / broader hardware)
git clone https://huggingface.co/Overworld/Waypoint-1.5-1B-360P

Option 4: World Engine (Open-Source Inference Library)

Overworld's World Engine is the open-source inference library built for running Waypoint models locally. Clone it and follow the setup instructions.

git clone https://github.com/Wayfarer-Labs/world_engine.git
cd world_engine
pip install -r requirements.txt
# See README for full setup and example scripts

Who Built This

Overworld (formerly Wayfarer Labs) is based in Providence, Rhode Island. The company was founded in 2025 by Louis Castricato (CEO) and Shahbuland Matiana. Their pitch is straightforward: build real-time generative world models that run on consumer hardware, locally, with no cloud dependency required.

The local-first angle is a deliberate choice. Overworld argues that running inference on your own GPU is better for privacy (no frames leaving your machine) and reduces environmental impact compared to cloud-hosted generation. Whether you agree with the framing or not, the result is a model that ships with weights you can download and run without an API key.

The Apache 2.0 license means you can use the weights commercially, modify them, and redistribute. No custom license with ambiguous clauses. No "open-weights but closed training code" asterisks. The inference library (World Engine) is also open source.

What to Watch Out For

Waypoint-1.5 is impressive for its size and speed, but it has real limits you should know before building anything on top of it.

Not photorealistic. The generated environments look more like stylized 3D renders than photographs. If you expect AAA game visuals, recalibrate. The model trades visual fidelity for real-time performance.

Coherence degrades over time. The 10-second context window is a huge step up from 2 seconds, but it still means the model starts losing track of spatial details after about 10 seconds. Walk far enough and earlier parts of the environment may not be consistent when you return.

Apple Silicon support is "coming soon." The 360p variant is intended for Macs, but native Apple Silicon optimization is not shipped yet. If you are on a MacBook, expect to wait for a future update.

No multiplayer or persistent worlds. Each session generates a fresh environment. There is no shared state, no persistence across sessions, and no multi-user interaction. This is a single-player, single-session experience for now.

Early-stage ecosystem. Tooling around world models is still nascent. There is no standard format for "world model scenes," no editor for placing constraints, and limited documentation on fine-tuning. You are working at the frontier.

Who Should Care

Game developers and prototypers. If you want to sketch out interactive environments without building assets, Waypoint-1.5 lets you generate explorable spaces from a single image. Useful for rapid prototyping, mood boarding, or testing spatial ideas before committing to a full engine.

Simulation and training researchers. Real-time environment generation from learned models is directly relevant to robotics simulation, reinforcement learning, and embodied AI. A 1.2B model running at 60 FPS on consumer hardware lowers the barrier to entry significantly.

Interactive media and education. Imagine generating a walkable historical scene from a single reference image, or creating explorable environments for educational content without a 3D art team.

Spatial computing enthusiasts. As VR and AR headsets become more common, the ability to generate interactive 3D environments on the fly becomes more valuable. Waypoint-1.5 is not VR-ready today, but the architecture points in that direction.

Download the Biome app at over.world/install and generate your first interactive world in under five minutes. If you want to go deeper, grab the 720p weights from HuggingFace and spin up the World Engine on your own GPU.
