Any geographic coordinate can be expanded into rich places and stories.
Procedural galaxies where code governs structure and LLMs provide narrative.
A "falling sand" simulator where chemical reactions are hallucinated by Gemini.
A deck-builder where users can "Wish" for cards generated in real time.
Interactive 3D solar system with view-dependent AI narration.
Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web frameworks provide reliable but fixed contexts backed by databases, while fully generative world models aim for unlimited environments at the expense of controllability and practical engineering.
In this work, we introduce the Web World Model (WWM), a middle ground where world state and "physics" are implemented in ordinary web code, and large language models generate context, narratives, and high-level decisions on top of this structured latent state.
We build a suite of WWMs on a realistic web stack, including an infinite travel atlas grounded in real geography, fictional galaxy explorers, web-scale encyclopedic and narrative worlds, and simulation- and game-like environments. Across these systems, we identify practical design principles for WWMs, such as separating code-defined rules from model-driven imagination, representing latent state as typed web interfaces, and using procedural generation to achieve unlimited but structured exploration. Our results suggest that web stacks themselves can serve as a scalable substrate for world models, enabling controllable yet open-ended environments for both human users and language agents.
There is a missing middle ground between fixed-context web applications and unconstrained world models. WWMs inherit the controllability, observability, and tooling of web frameworks, yet they can procedurally expand to an effectively unlimited state space.
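A WWM decomposes the world state into two orthogonal components: a deterministic component defined in code (the "physics") and a stochastic component generated by the LLM (the "imagination").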
This decomposition ensures that code governs the "hard" constraints while the AI handles the "soft" creative layer.
The WWM architecture separates a deterministic Code Layer from a stochastic AI Layer.
Deterministic Hashing: Inputs converge on coordinates to produce a frozen seed.
We cannot store an infinite universe in a database. Instead, we generate it "Just-In-Time". A coordinate is passed through a hash function to get a seed. This seed fixes the LLM's sampling randomness. This grants Object Permanence with no storage cost: a player can leave a planet, come back later, and the planet stays the same.
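The coordinate-to-seed step can be sketched in a few lines of TypeScript. This is a minimal illustration and not the project's implementation: the hash (an FNV-1a variant), the PRNG (xorshift32), and the names `seedFor`, `generatePlanet`, `PlanetStub`, and `"demo-world"` are all assumptions made for this example. In a full system the same seed would presumably also be passed to the LLM request to fix its sampling, where the provider supports that.

```typescript
// FNV-1a style hash over UTF-16 code units. Any stable hash would work here;
// this one is an assumption chosen for brevity.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

// Hypothetical helper: turn a world coordinate into a frozen seed.
function seedFor(x: number, y: number, worldId: string = "demo-world"): number {
  return fnv1a(`${worldId}:${x}:${y}`);
}

// xorshift32: a tiny seeded PRNG so code-side generation is reproducible.
function xorshift32(seed: number): () => number {
  let state = (seed >>> 0) || 0x9e3779b9; // the state must never be zero
  return () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    return (state >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hypothetical planet attributes derived purely from the coordinate.
interface PlanetStub {
  seed: number;
  radiusKm: number;
  moons: number;
}

function generatePlanet(x: number, y: number): PlanetStub {
  const seed = seedFor(x, y);
  const rand = xorshift32(seed);
  return {
    seed,
    radiusKm: 2000 + Math.floor(rand() * 8000),
    moons: Math.floor(rand() * 5),
  };
}
```

Because nothing is read from or written to storage, calling `generatePlanet` twice with the same coordinate returns identical values, which is the object-permanence property described above.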
We replace opaque latent vectors with Typed Interfaces (e.g., TypeScript interfaces). The LLM predicts valid JSON objects conforming to these schemas, preventing structural hallucinations. Furthermore, WWMs employ a Fidelity Slider: if the LLM is slow or unavailable, the system gracefully degrades to cached content or template-based generation, ensuring the world remains functional.
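As a rough sketch of how a typed interface and the Fidelity Slider might fit together, the example below validates raw model output against a schema and falls back to cached or template content when validation fails or no output arrived. The `PlanetCard` fields, the tier names, and the functions `parsePlanetCard` and `resolveCard` are hypothetical; the model call itself is left out and represented only by its raw string result.

```typescript
// Hypothetical schema the LLM is asked to fill in.
interface PlanetCard {
  name: string;
  biome: "desert" | "ocean" | "ice" | "jungle";
  hazardLevel: number; // integer in 1..5
  missionBrief: string;
}

type Tier = "llm" | "cache" | "template";

const BIOMES: PlanetCard["biome"][] = ["desert", "ocean", "ice", "jungle"];

// Runtime check: TypeScript types vanish at runtime, so model output
// has to be validated explicitly before it enters the world state.
function parsePlanetCard(raw: string): PlanetCard | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const o = data as Record<string, unknown>;
  const name = o.name;
  const hazard = o.hazardLevel;
  const brief = o.missionBrief;
  const biome = BIOMES.filter((b) => b === o.biome)[0];
  if (typeof name !== "string" || name.length === 0) return null;
  if (biome === undefined) return null;
  if (typeof hazard !== "number" || Math.floor(hazard) !== hazard) return null;
  if (hazard < 1 || hazard > 5) return null;
  if (typeof brief !== "string") return null;
  return { name, biome, hazardLevel: hazard, missionBrief: brief };
}

// Fidelity tiers: fresh model output, then cache, then a code-only template.
function resolveCard(
  rawModelOutput: string | null, // null when the model is slow or unavailable
  cached: PlanetCard | undefined,
  seed: number,
): { card: PlanetCard; tier: Tier } {
  const fresh = rawModelOutput === null ? null : parsePlanetCard(rawModelOutput);
  if (fresh !== null) return { card: fresh, tier: "llm" };
  if (cached !== undefined) return { card: cached, tier: "cache" };
  const n = Math.abs(Math.floor(seed));
  const card: PlanetCard = {
    name: `Planet-${n}`,
    biome: BIOMES[n % BIOMES.length] ?? "desert",
    hazardLevel: 1 + (n % 5),
    missionBrief: "Survey the surface and report back.",
  };
  return { card, tier: "template" };
}
```

Output that names an unknown biome or a hazard level outside 1..5 is rejected as a whole, so a structural hallucination never reaches the rendering code; the caller only sees which tier produced the card.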
We implemented a suite of applications spanning diverse domains to demonstrate the versatility of the framework.
Inspired by Google Earth, this application allows open exploration of the real globe without a database. Code infers physical attributes (continent, climate) to build semantic grounding. The LLM then operates within this structured latent space to select themes (e.g., "Desert Bloom") and generate itineraries. Any geographic coordinate can be visited, with procedural beacons generated Just-In-Time.
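The grounding step could look roughly like the sketch below, which derives a few physical attributes from a latitude/longitude pair and packs them into the context handed to the LLM. The latitude thresholds are a deliberately crude approximation, and `GeoContext`, `groundCoordinate`, `buildPrompt`, and the 0.1-degree beacon grid are hypothetical choices for this example; continent inference is omitted.

```typescript
type ClimateBand = "tropical" | "temperate" | "polar";

// Hypothetical structured latent state for one location.
interface GeoContext {
  lat: number;
  lon: number;
  hemisphere: "northern" | "southern";
  climateBand: ClimateBand;
  beaconId: string;
}

// Crude latitude-only climate bands (tropics and polar circles, rounded).
function climateBandFor(lat: number): ClimateBand {
  const a = Math.abs(lat);
  if (a < 23.5) return "tropical";
  if (a < 66.5) return "temperate";
  return "polar";
}

function groundCoordinate(lat: number, lon: number): GeoContext {
  if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
    throw new RangeError("coordinate out of range");
  }
  // Snap to a 0.1-degree grid so nearby clicks resolve to the same beacon.
  const gridLat = Math.round(lat * 10) / 10;
  const gridLon = Math.round(lon * 10) / 10;
  return {
    lat,
    lon,
    hemisphere: lat >= 0 ? "northern" : "southern",
    climateBand: climateBandFor(lat),
    beaconId: `beacon:${gridLat.toFixed(1)}:${gridLon.toFixed(1)}`,
  };
}

// The LLM only ever sees the code-derived facts, never raw free-form input.
function buildPrompt(ctx: GeoContext): string {
  return [
    `Location ${ctx.beaconId} lies in the ${ctx.hemisphere} hemisphere.`,
    `Climate band: ${ctx.climateBand}.`,
    "Pick one travel theme and write a three-stop itinerary as JSON.",
  ].join("\n");
}
```

The point of the split is that the facts in the prompt come from code, so the model's theme selection stays anchored to attributes the application can verify.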
A procedural sci-fi simulation where the entire cosmos is synthetic. Physics: Algorithms dictate galaxy layouts, star lanes, and planet clusters. Imagination: The LLM textures this geometry with mission briefs and narrative hooks (e.g., "Stormglass Biomes"). Object permanence is achieved via hashing, ensuring that revisiting a coordinate always yields the exact same state without database lookups.
A roguelike deck-builder separating creative generation from executable mechanics. The "Wish" mechanism allows users to type a free-form prompt (e.g., "a fireball that freezes enemies"). The LLM translates this into structured game logic (JSON conforming to a fixed schema), which the symbolic engine then executes.
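One way to picture the Wish pipeline is sketched below: the model's JSON is parsed into a small closed set of effect types, and only cards that validate are handed to the engine. The effect vocabulary, the numeric limits, and the names `Effect`, `Card`, `parseCard`, and `playCard` are assumptions for illustration, not the game's actual schema.

```typescript
// Hypothetical closed vocabulary of effects the engine knows how to run.
type Effect =
  | { kind: "damage"; amount: number }
  | { kind: "status"; status: "frozen" | "burning"; turns: number };

interface Card {
  name: string;
  cost: number;
  effects: Effect[];
}

interface Enemy {
  hp: number;
  statuses: { frozen: number; burning: number };
}

function isIntIn(v: unknown, min: number, max: number): v is number {
  return typeof v === "number" && Math.floor(v) === v && v >= min && v <= max;
}

function parseEffect(raw: unknown): Effect | null {
  if (typeof raw !== "object" || raw === null) return null;
  const e = raw as Record<string, unknown>;
  const amount = e.amount;
  const status = e.status;
  const turns = e.turns;
  if (e.kind === "damage" && isIntIn(amount, 1, 20)) {
    return { kind: "damage", amount };
  }
  if (e.kind === "status" && (status === "frozen" || status === "burning") && isIntIn(turns, 1, 5)) {
    return { kind: "status", status, turns };
  }
  return null; // anything outside the vocabulary is rejected
}

function parseCard(rawJson: string): Card | null {
  let data: unknown;
  try {
    data = JSON.parse(rawJson);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const c = data as Record<string, unknown>;
  const name = c.name;
  const cost = c.cost;
  const rawEffects = c.effects;
  if (typeof name !== "string" || !isIntIn(cost, 0, 5) || !Array.isArray(rawEffects)) return null;
  const effects: Effect[] = [];
  for (const raw of rawEffects) {
    const effect = parseEffect(raw);
    if (effect === null) return null; // one bad effect invalidates the card
    effects.push(effect);
  }
  return effects.length > 0 ? { name, cost, effects } : null;
}

// The symbolic engine: applies validated effects and returns a new state.
function playCard(card: Card, enemy: Enemy): Enemy {
  const next: Enemy = {
    hp: enemy.hp,
    statuses: { frozen: enemy.statuses.frozen, burning: enemy.statuses.burning },
  };
  for (const effect of card.effects) {
    if (effect.kind === "damage") {
      next.hp = Math.max(0, next.hp - effect.amount);
    } else {
      next.statuses[effect.status] = Math.max(next.statuses[effect.status], effect.turns);
    }
  }
  return next;
}
```

Under this sketch, a wish such as "a fireball that freezes enemies" might come back as a card with a `damage` effect and a `frozen` status effect, while a response that invents an unsupported effect kind is discarded before it can touch game state.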
Redefining the "falling sand" genre. Traditional sandboxes rely on fixed reaction tables. Here, when unknown elements interact, the LLM decides the reaction (e.g., Life + Fire = Ash) and the properties of the new element. An optional AI Supervisor monitors the canvas to induce emergent behavior or prevent stagnation.
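A lazily filled reaction table is one plausible way to combine a fixed lookup with model-decided reactions. In the sketch below, the `oracle` stands in for the LLM call and is synchronous only to keep the example short (a real call would be asynchronous); `ReactionBook`, `ElementProps`, and the property fields are hypothetical names.

```typescript
// Hypothetical properties the model assigns to a newly created element.
interface ElementProps {
  color: string;
  density: number;
  flammable: boolean;
}

interface Reaction {
  product: string;
  props: ElementProps;
}

// Stand-in for the LLM: decides what two unfamiliar elements produce.
type ReactionOracle = (a: string, b: string) => Reaction;

class ReactionBook {
  private known: Record<string, Reaction> = {};
  private oracle: ReactionOracle;

  constructor(oracle: ReactionOracle) {
    this.oracle = oracle;
  }

  // Order-independent key, so Life + Fire and Fire + Life match.
  private static key(a: string, b: string): string {
    return a < b ? `${a}+${b}` : `${b}+${a}`;
  }

  react(a: string, b: string): Reaction {
    const key = ReactionBook.key(a, b);
    const hit = this.known[key];
    if (hit !== undefined) return hit; // already decided: behaves like a fixed table
    const invented = this.oracle(a, b); // first contact: ask the model once
    this.known[key] = invented;
    return invented;
  }
}
```

Once a pair has reacted, the result is frozen for the rest of the session, so the simulation stays consistent even though the rule was originally invented by the model.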
A 3D solar system simulator offering orbit viewing, piloted flight, and surface walking. A "Cosmic Guide" subtitle strip auto-refreshes every 30 seconds, providing context-aware AI narration based on the user's current camera view and selected body.
A knowledge-centric WWM where the "world" is the open web. Instead of a static database, the agent retrieves evidence via search (Physics) and composes a structured, Wikipedia-like article (Imagination) with citations. Retrieval and rendering are code-defined, while the content is synthesized on the fly.
Exploring long-form generative fiction. The user controls generation through Interface Styles (visuals) and Literary Tags (genre/tone). The "Physics" layer handles page chunking and state management, while the LLM streams the narrative text, ensuring consistency across page turns.
@article{wwm2025,
author = {Feng, Jichen and Zhang, Yifan and Zhang, Chenggong and Lu, Yifu and Liu, Shilong and Wang, Mengdi},
title = {Web World Models},
journal = {arXiv},
year = {2025},
}