BiGain: Unified Token Compression for Joint Generation and Classification
Authors: Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen
RSCT Score Breakdown
RSCT Certification: κ=0.778 (certified) | RSN: 0.70/0.75/0.20 | Topics: LLM Agents and Reasoning, AI Safety and Alignment, Multi-Agent Systems
Overview
One-Sentence Summary
This paper presents BiGain, a training-free, plug-and-play framework that preserves the generation quality of diffusion models while improving their classification performance through frequency-aware token compression.
Key Innovation
The paper introduces two frequency-aware token compression operators: Laplacian-gated token merging and Interpolate-Extrapolate KV Downsampling. Together they let diffusion models run faster while maintaining or enhancing both generation quality and classification accuracy, a notable advance over prior acceleration methods, which typically optimize only one of these objectives. A minimal sketch of the Laplacian gating idea appears below.
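To make the gating concrete, here is a minimal PyTorch sketch of the frequency gate alone, assuming tokens lie on an h × w spatial grid. The function names, the depthwise 3×3 Laplacian kernel, and the quantile threshold are illustrative assumptions rather than the authors' implementation, and the actual pairwise merge (e.g., ToMe-style bipartite matching over the smooth tokens) is omitted.

```python
import torch
import torch.nn.functional as F

def laplacian_energy(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Per-token high-frequency energy via a depthwise discrete Laplacian.

    tokens: (B, N, C) with N == h * w spatial tokens.
    Returns (B, N); large values mark edges and textures.
    """
    B, N, C = tokens.shape
    x = tokens.transpose(1, 2).reshape(B, C, h, w)
    kernel = torch.tensor([[0.0, 1.0, 0.0],
                           [1.0, -4.0, 1.0],
                           [0.0, 1.0, 0.0]],
                          device=x.device, dtype=x.dtype)
    kernel = kernel.view(1, 1, 3, 3).repeat(C, 1, 1, 1)  # one filter per channel
    lap = F.conv2d(x, kernel, padding=1, groups=C)       # depthwise Laplacian
    return lap.abs().mean(dim=1).reshape(B, N)           # average over channels

def merge_gate(tokens: torch.Tensor, h: int, w: int,
               smooth_quantile: float = 0.7) -> torch.Tensor:
    """Boolean merge mask: True = spectrally smooth (merge candidate),
    False = high-contrast token that is kept intact."""
    energy = laplacian_energy(tokens, h, w)
    threshold = energy.quantile(smooth_quantile, dim=1, keepdim=True)
    return energy <= threshold
```

In use, a ToMe-style merger would restrict its candidate pairs to tokens where this mask is True, so edges and textures survive compression.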
Should You Read This?
- If you work on diffusion models: Yes. The paper offers a practical, effective way to accelerate diffusion models without sacrificing their core capabilities.
- If you work on multi-task learning or model compression: Maybe. The frequency-aware compression techniques introduced here could have broader applications beyond diffusion models.
The Good
- The proposed BiGain framework is thoroughly evaluated across multiple diffusion model backbones, datasets, and compression levels, demonstrating consistent improvements in the speed-accuracy trade-off.
- The authors provide a strong theoretical grounding for their approach, highlighting the importance of balanced spectral retention for effective token compression.
- The paper is well-written and accessible, with clear explanations of the key ideas and a good balance of technical details and high-level insights.
The Gaps
- The evaluation is limited to image-based tasks, and it's unclear whether the proposed techniques would generalize equally well to other modalities, such as text or audio.
- The paper does not address potential issues with model stability or robustness under high levels of compression, which could be an important consideration for real-world deployment.
- While the authors claim that BiGain is "training-free," it's not clear whether the compression operators themselves require any additional training or fine-tuning, which could limit the plug-and-play nature of the framework.
How to Read This Paper
- If you're from the computer vision or diffusion modeling community: You can likely skip the background sections on diffusion models and focus on the core technical contributions, the Laplacian-gated token merging and Interpolate-Extrapolate KV Downsampling operators.
- If you're from the model compression or multi-task learning community: The background sections on diffusion models and their acceleration methods will be particularly helpful for understanding the context and motivation; the technical details of the compression operators may also be of interest.
- Must read (everyone): Sections 3 and 4, which present the BiGain framework and its key components.
- Verify: The claims about generalization to other diffusion model backbones and datasets, as well as the stability and robustness of the compressed models.
Bottom Line
The BiGain framework is a promising route to accelerating diffusion models without compromising generation quality or classification performance. Its frequency-aware compression operators are a meaningful advance in diffusion model acceleration and could have broader implications for model compression and multi-task learning. Although the evaluation is limited to image-based tasks, the core ideas are worth investigating further, especially for researchers in those adjacent fields.
Quality Assessment
Trust Level: MODERATE - Verify key results first
What the scores mean:
- 70% signal - This much of the paper directly supports its claims
- 75% context - Background material for readers from other fields (this is a bridge paper - high context is a feature!)
- 20% noise - Content that may mislead if taken at face value
Reliability score: 78% (certified)
Practical interpretation: Good foundation but some gaps. Read critically and verify key claims before building on this work.
Paper Details
- Authors: Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen
- Published: 2026-03-12
- Source: arxiv
- Primary Topic: LLM Agents and Reasoning
- Difficulty: Intermediate
Abstract
Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interpolation/extrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification, while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K, with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%). Our analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.
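The abstract's second operator also lends itself to a short sketch. Below is a minimal PyTorch reading of keys/values being downsampled by blending nearest-style and average pooling, with queries untouched; the name `ie_downsample`, the parameters `alpha` and `stride`, and the blend formula `nearest + alpha * (avg - nearest)` are assumptions about one plausible formulation, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def ie_downsample(x: torch.Tensor, h: int, w: int, stride: int = 2,
                  alpha: float = 1.2) -> torch.Tensor:
    """Blend of nearest and average pooling over an h x w token grid.

    out = nearest + alpha * (avg - nearest): alpha in [0, 1] interpolates
    between the two poolings; alpha > 1 extrapolates beyond average pooling.
    x: (B, N, C) with N == h * w; assumes h and w divisible by stride.
    """
    B, N, C = x.shape
    grid = x.transpose(1, 2).reshape(B, C, h, w)
    nearest = grid[:, :, ::stride, ::stride]                     # pick one per block
    avg = F.avg_pool2d(grid, kernel_size=stride, stride=stride)  # mean per block
    out = nearest + alpha * (avg - nearest)                      # inter-/extrapolate
    return out.flatten(2).transpose(1, 2)                        # (B, N/stride^2, C)

# Only K and V are compressed; Q keeps full resolution:
#   k_ds = ie_downsample(k, h, w)
#   v_ds = ie_downsample(v, h, w)
#   attn = softmax(q @ k_ds.transpose(-2, -1) / d**0.5) @ v_ds
```

The queries-intact design is what the abstract credits with conserving attention precision: compressing only the key/value side shrinks the attention matrix while every spatial position still issues its own query.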
This analysis was automatically generated and certified by the Swarm-It RSCT pipeline. κ-gate score: 0.778 | Quality tier: certified