Official implementation of TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing, in collaboration with Google Cloud AI, accepted at AAAI 2026 (Main Technical Track).
🤖 TabFlash is an efficient and accurate multimodal LLM, achieving state-of-the-art performance and outperforming GPT-4o and Gemini 2.5 Pro at exceptionally low computational cost.
🚀 TabFlash (3B) achieves state-of-the-art performance while reducing FLOPs by 27% and memory usage by 30% compared to the second-best MLLM.
⚡ TabFlash (1B) outperforms most MLLMs with exceptionally low TFLOPs and just 11.2 GB peak memory, enabling deployment on low-memory GPUs.
This code is tested with Python 3.9, CUDA 12.4, PyTorch 2.4.1, and FlashAttention 2.7.3.

Create and activate a conda environment:

```bash
conda create -n tabflash python=3.9 -y
conda activate tabflash
```

Follow the official guide to install InternVL:

```bash
cd InternVL
pip install -r requirements.txt
```

Install PyTorch and FlashAttention:

```bash
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
git clone --branch v2.7.3 --single-branch https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install
cd ..
```

Install the remaining dependencies:

```bash
pip install wandb sacrebleu distance apted bitsandbytes --upgrade
pip install datasets==2.18.0
```
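To sanity-check the environment, you can verify that the pinned versions import correctly (an optional check, not part of the official setup):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # expect 2.4.1 and True
python -c "import flash_attn; print(flash_attn.__version__)"                   # expect 2.7.3
```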
TabFlash uses MMTab from Table-LLaVA.

**Pretraining data** (placement sketch below):

- Download MMTab-instruct_table_images_82K.zip and MMTab-pre_table_images_part_2_16K.zip.
- Place them under `data/LLaVA-Pretrain/images` and unzip them. Rename the `IID_train_image` directory to `table_pretrain_part_1`.
- Download table_only_pretrain_data_with_length.jsonl and place it under `data/LLaVA-Pretrain`.
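For reference, the placement might look like the following shell sketch (assuming the two zips and the jsonl were downloaded into the repository root):

```bash
mkdir -p data/LLaVA-Pretrain/images
unzip MMTab-instruct_table_images_82K.zip -d data/LLaVA-Pretrain/images
unzip MMTab-pre_table_images_part_2_16K.zip -d data/LLaVA-Pretrain/images
# The instruct zip extracts to IID_train_image; rename it as described above.
mv data/LLaVA-Pretrain/images/IID_train_image data/LLaVA-Pretrain/images/table_pretrain_part_1
mv table_only_pretrain_data_with_length.jsonl data/LLaVA-Pretrain/
```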
**Finetuning data** (placement sketch below):

- Download MMTab-instruct_table_images_82K.zip.
- Place it under `data/LLaVA-Finetune/images/table_instructV` and unzip it. Rename the resulting `IID_train_image` directory to `images`.
- Download table_only_sft_data_with_length.jsonl and place it under `data/LLaVA-Finetune`.
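A matching sketch for the finetuning files (same assumptions as above):

```bash
mkdir -p data/LLaVA-Finetune/images/table_instructV
unzip MMTab-instruct_table_images_82K.zip -d data/LLaVA-Finetune/images/table_instructV
# Rename the extracted IID_train_image directory to images.
mv data/LLaVA-Finetune/images/table_instructV/IID_train_image \
   data/LLaVA-Finetune/images/table_instructV/images
mv table_only_sft_data_with_length.jsonl data/LLaVA-Finetune/
```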
**Inference data** (see the combined sketch below):

- Download MMTab-eval_test_data_49K_llava_jsonl_format.jsonl and MMTab-eval_table_images_23K.zip.
- Place them under `data/LLaVA-Inference` and unzip the zip file.

**Evaluation data**:

- Download MMTab-eval_test_data_49K.json and MMTab-eval_test_tables_23K.json.
- Place them under `data/MMTab-eval_evaluation`.
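A combined sketch for the inference and evaluation files (assuming downloads in the repository root; the images zip is expected to extract to `all_test_image/`, per the directory tree below):

```bash
mkdir -p data/LLaVA-Inference data/MMTab-eval_evaluation
mv MMTab-eval_test_data_49K_llava_jsonl_format.jsonl data/LLaVA-Inference/
unzip MMTab-eval_table_images_23K.zip -d data/LLaVA-Inference
mv MMTab-eval_test_data_49K.json MMTab-eval_test_tables_23K.json data/MMTab-eval_evaluation/
```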
The resulting directory structure should look like this:

```
TabFlash/
├── InternVL/
│   ├── internvl_chat/
│   │   ├── scripts/
│   │   ├── inference.py
│   │   ├── mmtab_eval.py
│   │   └── ...
│   └── ...
├── data/
│   ├── LLaVA-Pretrain/
│   │   ├── images/
│   │   │   ├── table_pretrain_part_1/
│   │   │   └── table_pretrain_part_2/
│   │   └── table_only_pretrain_data_with_length.jsonl
│   ├── LLaVA-Finetune/
│   │   ├── images/
│   │   │   └── table_instructV/
│   │   │       └── images/
│   │   └── table_only_sft_data_with_length.jsonl
│   ├── LLaVA-Inference/
│   │   ├── all_test_image/
│   │   └── MMTab-eval_test_data_49K_llava_jsonl_format.jsonl
│   └── MMTab-eval_evaluation/
│       ├── MMTab-eval_test_data_49K.json
│       └── MMTab-eval_test_tables_23K.json
├── assets/
│   ├── acc_tflops_plot.png
│   └── ...
└── README.md
```
Move into the directory below for training, inference, and evaluation:

```bash
cd InternVL/internvl_chat/
```

If you only want to use the pretrained models, download tabflash_stage2_4b.tar and tabflash_stage2_1b.tar and extract them under `work_dirs/internvl_chat_v2_5/tabflash_4b` and `work_dirs/internvl_chat_v2_5/tabflash_1b`, respectively.
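For example, the extraction could look like this (a sketch; it assumes the tars were downloaded into `InternVL/internvl_chat`):

```bash
mkdir -p work_dirs/internvl_chat_v2_5/tabflash_4b work_dirs/internvl_chat_v2_5/tabflash_1b
tar -xf tabflash_stage2_4b.tar -C work_dirs/internvl_chat_v2_5/tabflash_4b
tar -xf tabflash_stage2_1b.tar -C work_dirs/internvl_chat_v2_5/tabflash_1b
```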
If you want to train the model from scratch, follow the instructions below. TabFlash training consists of two stages.

Stage 1:

```bash
bash scripts/4b_train_stage1.sh # For 4B model
bash scripts/1b_train_stage1.sh # For 1B model
```

Stage 2:

```bash
bash scripts/4b_train_stage2.sh # For 4B model
bash scripts/1b_train_stage2.sh # For 1B model
```

Run inference on the test set:

```bash
bash scripts/4b_inference.sh # For 4B model
bash scripts/1b_inference.sh # For 1B model
```

Evaluate the model predictions:

```bash
python mmtab_eval.py --pred_file results/{exp_name}/result.jsonl
```

If you find this work useful, please cite:
```bibtex
@inproceedings{kim2026tabflash,
  title={TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing},
  author={Kim, Jongha and Bae, Minseong and Lee, Sanghyeok and Yoon, Jinsung and Kim, Hyunwoo J},
  booktitle={AAAI},
  year={2026}
}
```

This codebase is based on InternVL and Table-LLaVA.
