Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

RSCT Certification: κ=0.550 (pending) | RSN: 0.37/0.32/0.31 | Topics: llm-agents

Core Contribution: World models have emerged as a powerful framework for simulating environment dynamics, enabling downstream tasks like action planning and policy learning. However, the application of world models to real-time decision-making has been computationally prohibitive due to the high dimensionality of the latent representations. This paper tackles this challenge by proposing CompACT, a discrete tokenizer that compresses each observation into just 8 tokens. This radical reduction in the size of the latent representation allows for significantly faster planning while preserving the essential information required for effective decision-making.
Technical Approach: The key innovation in this work is the CompACT tokenizer, which aims to strike a balance between the expressivity of the latent representation and the computational efficiency required for real-time planning. The authors start with a pre-trained vision encoder and then train a discrete codebook-based tokenizer on top of it. The tokenizer maps each observation into a small set of discrete tokens, compressing the high-dimensional input into a compact 8-token representation.

To enable planning in this compact latent space, the authors train an action-conditioned world model that can predict future states given the current state and an action. This world model operates directly on the 8-token representations, allowing for efficient rollouts and planning. The authors leverage various techniques to enhance the stability and predictive accuracy of the world model, including adversarial training and self-supervised auxiliary losses.

Key Results: The authors evaluate the performance of the CompACT-based world model on a range of control tasks, including classic control environments and more complex simulated robotics scenarios. Their results demonstrate that the 8-token world model can achieve competitive planning performance compared to much larger latent representations, while offering orders-of-magnitude speedups in planning time. For example, on the Pendulum-v1 environment, the 8-token model achieves a planning time of 0.5 milliseconds, compared to 50 milliseconds for a 256-token model, a 100x improvement.
Significance & Limitations: The key significance of this work lies in its potential to enable the widespread deployment of world models in real-world, time-critical applications. By drastically reducing the computational cost of planning, the CompACT tokenizer opens the door for integrating world models into a wide range of intelligent control systems, from robotics to autonomous vehicles.

That said, the authors acknowledge several limitations of their approach. First, the tokenizer was trained on a specific dataset of observations, and it's unclear how well it would generalize to new environments or modalities. Additionally, the world model itself may struggle to capture the full complexity of real-world dynamics, especially in the face of rare or unseen events. Further research is needed to address these limitations and ensure the robustness and scalability of the CompACT approach.

Through the RSCT Lens: This paper's approach directly addresses key concepts in Representation-Space Compatibility Theory (RSCT). By compressing the latent representation of observations into a compact 8-token format, the CompACT tokenizer aims to improve the Relevance (R) of the learned world model - that is, the direct connection between the latent space and the core research questions (in this case, efficient planning).

The authors also put a strong emphasis on Stability (S), training the world model to make consistent predictions across different contexts and using techniques like adversarial training to enhance its robustness. However, the paper's RSCT analysis reveals some potential limitations in this area, with the Stability score (S=0.32) being slightly lower than the Relevance score (R=0.38).

Interestingly, the paper's Noise (N) score of 0.31 suggests that there are still some irrelevant or contradictory elements in the CompACT approach that may be diluting the core contribution. This could explain why the paper's overall κ-gate score of 0.55 falls short of the 0.7 threshold for certification, despite the clear technical advances.

To improve the paper's RSCT standing, the authors could consider further optimizing the CompACT tokenizer to enhance the Stability of the learned representations, potentially through more rigorous training regimes or expanded evaluation scenarios. Additionally, a deeper analysis of the sources of Noise in the system could help identify opportunities to refine the approach and sharpen the core contribution. By addressing these RSCT factors, the authors could unlock the full potential of their compact world model representation and bring it closer to real-world deployment.

Paper Details

Authors: Dongwon Kim, Gawon Seo, Jinsung Lee, Minsu Cho, Suha Kwak
Source: arXiv
PDF: Download
Published: 2026-03-05

This analysis was generated by the Swarm-It RSCT pipeline using Claude.

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

RSCT Score Breakdown

TL;DR

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Paper Details