Authors: Katharina Prasse, Isaac Bravo, Stefanie Walter, and Margret Keuper
Accepted at WACV 25 Applications Track.
You can find our paper on arXiv.
We propose to detect visual frames using a Minimum Cost Multicut Formulation. To this end the following steps are necessary:
- Embed the images using an embedding space of your choice (this also works for text!).
- Compute pairwise cosine similarity scores between all inputs.
- Map to a complete graph which nodes represent inputs and edges are weighted based on the inputs' cosine similarity.
- Select hyperparameter cal from below table or compute your own.
- Use MP solver to retrieve clusters.
Env: The input for multicut solver needs to be in hdf. We share our environment in FrameDet.yml.
conda env create -f FrameDet.yml
Image Embedding: We mainly use the transformers library and install dinov2 as explained on github.
Multi-Cut: We use the implementation by B. Andres et al.. We use ccmake to compile the C++ code. Include solve_regular.cxx in ./graph/src/command-line-tools/ of the multi-cut implementation.
Datasets: We use ImageNette, ImageWoof, and ClimateTV. Please make sure to have one folder per dataset which can enter into the embedding script.
- Embed input - fix paths within script
python emb/convnextv2.py
- Compute cosine similarities
python graph_prep/cossim.py --dataset imagenette --model_config inception_resnet_v2 --embs path2embeddings/embs/ --setting ablation
- Create graph
python python scripts/graph_mapping.py --dataset imagenette --model_config inception_resnet_v2 --embs path2embeddings/embs/ --split eval
- Select hyperparameter cal from below table or compute your own. - fix paths within script
python graph_prep/ablate_bias.py
- Use MP solver to retrieve clusters. Please provide input (-i) and output (-o) file paths and calibration term (-b).
cd ../graph
./solve-regular -i path2input_file/input_train.txt -o path2output_file/output_train.h5 -b 0.4
In our work we ablate the calibration term cal on two datasets, ImageNette and ImageWoof. We share our ablated cal terms for use while encouraging authors to ablate their own cal terms when their use case differs from ours.
| Emb. model | CLIP ViT-B-32 | DINOv2 | ConvNeXt V2 | ViT-B-32 | ResNet-50 | Inc.-ResNetv2 | VGG19-BN |
|---|---|---|---|---|---|---|---|
| cal | 0.5 | 0.6 | 0.7 | 0.7 | 0.7 | 0.5 | 0.7 |
We want to thank the authors of the Graph library for sharing such useful software. Moreover we extend our gratitude to all model architects for sharing invaluable embedding spaces.
@software{graph_mcmc,
author = {Andres, Bjoern and Ibeling, Duligur and Kalofolias, Giannis and Keuper, Margret and Lange, Jan-Hendrik and Levinkov, Evgeny and Matten, Mark and Rempfler, Markus},
title = {Graphs and Graph Algorithms in C++},
url = {\url{http://www.andres.sc/graph.html}},
date = {2024-07-01},
year={2016},
publisher={GitHub},
howpublished = {\url{http://www.andres.sc/graph.html}},
}
@article{oquab2024dinov2,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Oquab, Maxime and Darcet, Timoth{\'e}e and Moutakanni, Th{\'e}o and Vo, Huy and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and others},
journal={Transactions on Machine Learning Research Journal},
year={2024}
}
@inproceedings{woo2023convnext,
title={Convnext v2: Co-designing and scaling convnets with masked autoencoders},
author={Woo, Sanghyun and Debnath, Shoubhik and Hu, Ronghang and Chen, Xinlei and Liu, Zhuang and Kweon, In So and Xie, Saining},
booktitle={Conference on Computer Vision and Pattern Recognition},
year={2023}
}
If you use our work, please cite us:
@InProceedings{Prasse_2025_WACV,
author = {Prasse, Katharina and Bravo, Isaac and Walter, Stefanie and Keuper, Margret},
title = {I Spy with My Little Eye A Minimum Cost Multicut Investigation of Dataset Frames},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {2134-2143}
}