Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training!
We introduce R-MeeTo, the first token merging method for Mamba.
When token reduction is applied, the heavier performance drop is caused mainly by the loss of key knowledge.
We therefore propose R-MeeTo, which quickly restores this key knowledge and thereby recovers performance.
The following video shows the intuition behind our work. General Knowledge refers to the common patterns shared among tokens, and Specific Knowledge refers to the patterns found only in particular tokens.
Token reduction is a popular technique for improving model efficiency. It has yielded promising results in ViTs, yet its effect on Vim remains unexplored. We therefore ran the following preliminary experiments.
From the perspective of knowledge structure, we have the following analyses (X: inputs; Y: outputs):
Building on these insights, we propose R-MeeTo. R-MeeTo is simple and effective, with only two main modules: merging and re-training. Merging reduces the knowledge loss; re-training quickly recovers the knowledge structure of Mamba. In the next video, we show that merging does help.
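To make the merging step concrete, here is a minimal sketch in plain Python. The text above does not spell out the merging rule, so this sketch assumes a ToMe-style bipartite matching: tokens are split into two alternating sets, each token in the first set is matched to its most similar token in the second set by cosine similarity, and the `r` best-matched pairs are averaged. The names `cosine` and `merge_tokens` are hypothetical, and the unweighted average is a simplification. The surviving tokens keep their original order, which presumably matters for a sequence model such as Mamba.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def merge_tokens(tokens, r):
    """Merge r tokens into their most similar neighbours (illustrative sketch).

    Assumption: even-indexed tokens are merge candidates ("sources") and
    odd-indexed tokens are merge targets ("destinations"). Returns a new
    list with len(tokens) - r tokens, in the original order.
    """
    src_idx = list(range(0, len(tokens), 2))
    dst_idx = list(range(1, len(tokens), 2))
    r = max(0, min(r, len(src_idx)))
    if r == 0 or not dst_idx:
        return [list(t) for t in tokens]

    # For every source token, find its most similar destination token.
    matches = []
    for i in src_idx:
        best_sim, best_j = max((cosine(tokens[i], tokens[j]), j) for j in dst_idx)
        matches.append((best_sim, i, best_j))

    # Keep only the r most similar (source, destination) pairs.
    matches.sort(reverse=True)
    groups = {}      # destination index -> list of source indices merged into it
    removed = set()  # source indices that disappear after merging
    for _, i, j in matches[:r]:
        groups.setdefault(j, []).append(i)
        removed.add(i)

    # Rebuild the sequence: drop merged sources, average each merged group.
    merged = []
    for k, tok in enumerate(tokens):
        if k in removed:
            continue
        members = [tok] + [tokens[i] for i in groups.get(k, [])]
        merged.append([sum(col) / len(members) for col in zip(*members)])
    return merged


tokens = [[2.0, 0.0], [4.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reduced = merge_tokens(tokens, r=1)  # token 0 is averaged into token 1
```

In this sketch, merging folds a removed token's content into a similar token rather than discarding it, which is the sense in which merging lowers knowledge loss compared with pruning. Re-training would then be ordinary fine-tuning of the model with the merging step enabled.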
With R-MeeTo, a faster Mamba is obtained in minutes with only a limited performance drop.
@misc{shi2024faster,
title={Faster Vision Mamba is Rebuilt in Minutes Via Merged Token Re-training},
author={Shi, Mingjia and Zhou, Yuhao and Yu, Ruiji and Li, Zekai and Liang, Zhiyuan and Zhao, Xuanlei and
Peng, Xiaojiang and Rajpurohit, Tanmay and Vedantam, Ramakrishna and
Zhao, Wangbo and Wang, Kai and You, Yang},
year={2024},
eprint={2412.12496},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2412.12496},
}