ProPy: Building Interactive Prompt Pyramids upon CLIP for Partially Relevant Video Retrieval (EMNLP 2025 Findings)
We propose ProPy (arXiv), a model with a systematic architectural adaptation of CLIP designed specifically for partially relevant video retrieval (PRVR).
```bash
conda create -n propy python=3.10
conda activate propy
conda install pytorch==1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
We use a single RTX 3090 GPU (Driver version: 535.113.01) to run all experiments.
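After installation, a quick optional sanity check (our suggestion, not a step from the original setup) confirms that the CUDA build of PyTorch can see the GPU:

```bash
# Should print the torch version (1.12.0) and True if CUDA is usable.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```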
- Download raw videos of Charades, TVQA, ActivityNet, and QVHighlights.
- Note: you need to fill out forms for the TVQA and ActivityNet datasets.
- Compress the downloaded videos to 3 fps with width 224 using `scripts/prepare.sh` (a sketch of the underlying command follows this list).
- Note: you need to modify the corresponding paths in the script.
- Download the annotations (we convert the original annotations to a standard format) from Baidu or Google Drive, and unzip them to the `annotations` directory.
- Download the pretrained CLIP-ViT-B/32 weights to the `CLIP_weights` directory.
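For reference, the compression step in `scripts/prepare.sh` is expected to boil down to an ffmpeg call like the one below; this is a minimal sketch with placeholder paths, so check the script itself for the exact flags:

```bash
# Re-encode a video at 3 fps with width 224; height is scaled to keep the
# aspect ratio (-2 rounds it to an even value, which most codecs require).
ffmpeg -i /path/to/raw/video.mp4 -r 3 -vf "scale=224:-2" /path/to/compressed/video.mp4
```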
Modify `video_dir` in `scripts/*.sh` according to your local directories, then run:
```bash
bash scripts/prvr_{split}.sh
bash scripts/vcmr_{split}.sh
```
Checkpoints will be saved to `logs/prvr_{split}` or `logs/vcmr_{split}`.
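As a concrete example, the two steps above could look like this for a single split; the split name `charades`, the video path, and the assumption that `video_dir` is a plain shell assignment in the script are all hypothetical:

```bash
# Point video_dir at the local 3 fps videos (a shortcut for editing by hand),
# then launch PRVR training for that split.
sed -i 's|^video_dir=.*|video_dir=/data/charades/videos_3fps|' scripts/prvr_charades.sh
bash scripts/prvr_charades.sh
```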
Modify the following parameters in the same scripts to test trained models:
```bash
# for evaluation
do_train=0
do_eval=1
resume=/path/to/ckpt/ckpt.best.pth.tar
# then run
bash scripts/prvr_{split}.sh
bash scripts/vcmr_{split}.sh
```
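Putting this together with the checkpoint location from training: `resume` would typically point at the best checkpoint saved under `logs/`. The split name below is a placeholder:

```bash
# In scripts/prvr_charades.sh (hypothetical split), set:
#   do_train=0
#   do_eval=1
#   resume=logs/prvr_charades/ckpt.best.pth.tar
# then run:
bash scripts/prvr_charades.sh
```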
We provide all checkpoints and logs via Baidu and Google Drive.
PRVR results:

| split | R@1 | R@5 | R@10 | R@100 | SumR |
|---|---|---|---|---|---|
| TVR | 22.4 | 45.0 | 55.9 | 89.5 | 212.8 |
| ActivityNet | 14.9 | 34.9 | 47.5 | 82.7 | 180.0 |
| Charades | 2.6 | 8.7 | 14.8 | 50.4 | 76.5 |
| QVHighlights-val | 37.4 | 65.6 | 76.1 | 96.5 | 275.5 |
| QVHighlights-test | 35.0 | 63.2 | 73.1 | 96.2 | 267.5 |

VCMR results:

| split | R@10 (IoU=0.3) | R@100 (IoU=0.3) | R@10 (IoU=0.5) | R@100 (IoU=0.5) | R@10 (IoU=0.7) | R@100 (IoU=0.7) |
|---|---|---|---|---|---|---|
| TVR | 26.26 | 50.26 | 17.49 | 35.61 | 9.65 | 19.82 |
| ActivityNet | 28.57 | 57.42 | 20.81 | 46.22 | 12.94 | 31.85 |
| Charades | 6.8 | 23.39 | 4.73 | 18.44 | 2.26 | 9.01 |
| QVHighlights-val | 54.32 | 79.35 | 45.42 | 72.52 | 27.94 | 48.52 |
To reproduce the attention maps shown in Figure 4, run:
```bash
bash scripts/plot/plot_{split}.sh
```
These scripts select videos based on the R@1 metric, save the necessary attention weights, and then draw frame-level and event-level attention maps. Both the weights and the figures are saved to `VIS/{split}`.
This repo is built upon the following wonderful works:
