I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames

Authors: Katharina Prasse, Isaac Bravo, Stefanie Walter, and Margret Keuper

Accepted at the WACV 2025 Applications Track.

You can find our paper on arXiv.

Overview

We propose to detect visual frames using a Minimum Cost Multicut Formulation. To this end the following steps are necessary:

Embed the images using an embedding space of your choice (this also works for text!).
Compute pairwise cosine similarity scores between all inputs.
Map the inputs to a complete graph whose nodes represent the inputs and whose edges are weighted by the inputs' cosine similarity.
Select the hyperparameter cal from the table below or compute your own.
Use the MP solver to retrieve clusters.
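
To make the objective behind these steps concrete, the following is a minimal brute-force sketch of a minimum cost multicut on a four-node toy graph. It is an illustration only and not the solver used in this repository: the function names and the toy edge costs are made up, and it assumes the common convention that the objective sums the costs of all cut edges, so positive costs favour joining two nodes and negative costs favour separating them. Enumerating all partitions is only feasible for a handful of nodes.

```python
def partitions(items):
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or open a new cluster for it.
        yield [[first]] + part


def multicut_cost(partition, costs):
    """Sum the costs of all edges whose endpoints lie in different clusters."""
    label = {node: k for k, cluster in enumerate(partition) for node in cluster}
    return sum(c for (u, v), c in costs.items() if label[u] != label[v])


def brute_force_multicut(nodes, costs):
    """Return the partition with minimum total cut cost (tiny graphs only)."""
    return min(partitions(list(nodes)), key=lambda p: multicut_cost(p, costs))


# Hypothetical signed edge costs: positive = similar, negative = dissimilar.
toy_costs = {
    (0, 1): 0.4, (2, 3): 0.3,
    (0, 2): -0.2, (0, 3): -0.3, (1, 2): -0.1, (1, 3): -0.2,
}
best = brute_force_multicut(range(4), toy_costs)
clusters = sorted(sorted(c) for c in best)
print(clusters)
```

Note that the number of clusters is not given as an input; it follows from the signs and magnitudes of the edge costs.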

Set-Up

Env: The input for the multicut solver needs to be in HDF format. We share our environment in FrameDet.yml.

conda env create -f FrameDet.yml

Image Embedding: We mainly use the transformers library and install DINOv2 as explained on GitHub.

Multi-Cut: We use the implementation by B. Andres et al. and compile the C++ code with ccmake. Include solve-regular.cxx in ./graph/src/command-line-tools/ of the multi-cut implementation.

Datasets: We use ImageNette, ImageWoof, and ClimateTV. Please make sure to have one folder per dataset that can be passed to the embedding script.

Experiments

Embed the input (fix the paths within the script first)

python emb/convnextv2.py

Compute cosine similarities

python graph_prep/cossim.py --dataset imagenette --model_config inception_resnet_v2 --embs path2embeddings/embs/ --setting ablation
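
As a rough illustration of what this step computes, here is a minimal NumPy sketch of pairwise cosine similarity. It is not the code in graph_prep/cossim.py: the function name and the toy embeddings are assumptions made for this example.

```python
import numpy as np


def pairwise_cosine_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between the rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid a division by zero.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Hypothetical toy embeddings: rows 0 and 1 are parallel, row 2 is orthogonal.
toy_embs = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
sims = pairwise_cosine_similarity(toy_embs)
print(sims)
```

Normalising the rows once and taking a single matrix product avoids an explicit loop over all pairs.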

Create graph

python scripts/graph_mapping.py --dataset imagenette --model_config inception_resnet_v2 --embs path2embeddings/embs/ --split eval
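
The mapping to a complete graph can be pictured as follows. This is a minimal sketch under stated assumptions, not the repository's graph mapping script: the function name and toy similarity matrix are hypothetical, and the on-disk format expected by the solver is deliberately left out.

```python
from itertools import combinations


def complete_graph_edges(similarity):
    """Return one (i, j, weight) triple per unordered node pair i < j."""
    n = len(similarity)
    return [(i, j, float(similarity[i][j])) for i, j in combinations(range(n), 2)]


# Hypothetical symmetric similarity matrix for three inputs.
toy_sims = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
edges = complete_graph_edges(toy_sims)
print(edges)
```

A complete graph on n inputs has n(n-1)/2 edges, so the edge list grows quadratically with the dataset size.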

Select the hyperparameter cal from the table below or compute your own (fix the paths within the script first).

python graph_prep/ablate_bias.py

Use the MP solver to retrieve clusters. Please provide the input (-i) and output (-o) file paths and the calibration term (-b).

cd ../graph
./solve-regular -i path2input_file/input_train.txt -o path2output_file/output_train.h5 -b 0.4
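
A multicut solution is often represented as one 0/1 label per edge, where 1 marks a cut edge. Assuming the solver output has been loaded into such a list (the layout of the output file is not described here, so this is an assumption), the clusters are the connected components of the uncut edges. The sketch below recovers them with union-find; all names and toy values are hypothetical.

```python
def clusters_from_cut_labels(num_nodes, edges, cut):
    """Group nodes connected by uncut edges (cut label 0) via union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (u, v), is_cut in zip(edges, cut):
        if not is_cut:
            parent[find(u)] = find(v)

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical edge list and cut labels for a four-node complete graph.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
toy_cut = [0, 1, 1, 1, 1, 0]
toy_clusters = clusters_from_cut_labels(4, toy_edges, toy_cut)
print(toy_clusters)
```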

Calibration Terms

In our work, we ablate the calibration term cal on two datasets, ImageNette and ImageWoof. We share our ablated cal terms for reuse and encourage users to ablate their own cal terms when their use case differs from ours.

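| Emb. model | CLIP ViT-B-32 | DINOv2 | ConvNeXt V2 | ViT-B-32 | ResNet-50 | Inc.-ResNetv2 | VGG19-BN |
|------------|---------------|--------|-------------|----------|-----------|---------------|----------|
| cal        | 0.5           | 0.6    | 0.7         | 0.7      | 0.7       | 0.5           | 0.7      |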
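
For scripting, the values in the table can be kept in a small lookup. The sketch below also shows one plausible reading of how a calibration term could act on the edge weights, namely as a threshold subtracted from each cosine similarity so that the result is signed. This reading is an assumption made for illustration and is not confirmed by the solver code; the function name and toy edges are hypothetical.

```python
# Calibration terms copied from the table, keyed by embedding model.
CAL_TERMS = {
    "CLIP ViT-B-32": 0.5,
    "DINOv2": 0.6,
    "ConvNeXt V2": 0.7,
    "ViT-B-32": 0.7,
    "ResNet-50": 0.7,
    "Inc.-ResNetv2": 0.5,
    "VGG19-BN": 0.7,
}


def calibrated_costs(edges, cal):
    """ASSUMPTION: shift each similarity by `cal` to obtain a signed cost.

    Similarities above `cal` become positive (favour joining), similarities
    below `cal` become negative (favour cutting).
    """
    return [(i, j, sim - cal) for i, j, sim in edges]


# Hypothetical edges as (node, node, cosine similarity) triples.
toy_edges_with_sims = [(0, 1, 0.9), (0, 2, 0.1)]
toy_costs_dinov2 = calibrated_costs(toy_edges_with_sims, CAL_TERMS["DINOv2"])
print(toy_costs_dinov2)
```

Under this reading, a larger cal makes fewer pairs attractive and therefore tends to produce more, smaller clusters.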

Credits

We thank the authors of the Graph library for sharing such useful software. Moreover, we extend our gratitude to all model architects for sharing invaluable embedding spaces.

@software{graph_mcmc,
  author = {Andres, Bjoern and Ibeling, Duligur and Kalofolias, Giannis and Keuper, Margret and Lange, Jan-Hendrik and Levinkov, Evgeny and Matten, Mark and Rempfler, Markus},
  title = {Graphs and Graph Algorithms in C++},
  url = {\url{http://www.andres.sc/graph.html}},
  date = {2024-07-01},
  year={2016},
  publisher={GitHub},
  howpublished = {\url{http://www.andres.sc/graph.html}},
}

@article{oquab2024dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timoth{\'e}e and Moutakanni, Th{\'e}o and Vo, Huy and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and others},
  journal={Transactions on Machine Learning Research Journal},
  year={2024}
}

@inproceedings{woo2023convnext,
  title={Convnext v2: Co-designing and scaling convnets with masked autoencoders},
  author={Woo, Sanghyun and Debnath, Shoubhik and Hu, Ronghang and Chen, Xinlei and Liu, Zhuang and Kweon, In So and Xie, Saining},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Citation

If you use our work, please cite us:

@InProceedings{Prasse_2025_WACV,
    author    = {Prasse, Katharina and Bravo, Isaac and Walter, Stefanie and Keuper, Margret},
    title     = {I Spy with My Little Eye A Minimum Cost Multicut Investigation of Dataset Frames},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {2134-2143}
}
