- Robust DPO with a provable redescending property (see the gradient sketch below).
- A principled data valuation and cleaning method.
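As a quick, hedged illustration of the redescending property, the sketch below compares per-sample gradient magnitudes of the standard DPO loss and the Hölder-DPO loss (as defined in the snippet further down) as a function of the reward margin. The chosen `beta` and `gamma` values are assumptions for illustration only, not recommended settings.

```python
# Minimal sketch (illustrative beta/gamma): per-sample gradient w.r.t. the
# reward margin g = reward_win - reward_lose for DPO vs. Hölder-DPO.
import torch
import torch.nn.functional as F

beta, gamma = 1.0, 0.5                               # illustrative values only
g = torch.linspace(-10.0, 10.0, 9, requires_grad=True)

# Standard DPO loss: -log sigmoid(beta * g)
dpo_loss = -F.logsigmoid(beta * g).sum()
(grad_dpo,) = torch.autograd.grad(dpo_loss, g)

# Hölder-DPO loss: -(1 + gamma) * p^gamma + gamma * p^(gamma + 1), p = sigmoid(beta * g)
p = torch.sigmoid(beta * g)
holder_loss = (-(1.0 + gamma) * p.pow(gamma) + gamma * p.pow(gamma + 1)).sum()
(grad_holder,) = torch.autograd.grad(holder_loss, g)

for gi, gd, gh in zip(g.detach().tolist(), grad_dpo.tolist(), grad_holder.tolist()):
    print(f"margin {gi:+6.2f} | |DPO grad| = {abs(gd):.4f} | |Hölder grad| = {abs(gh):.4f}")
```

For strongly violated preferences (large negative margins) the DPO gradient magnitude plateaus near `beta`, whereas the Hölder-DPO gradient decays back towards zero, which is what limits the influence of mislabeled or corrupted preference pairs.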
We will tidy up the code soon. If you already have a DPO codebase, you can switch to our loss by simply replacing the loss computation as follows:
import torch
import torch.nn.functional as F

# pi_logps  : policy log-probabilities, shape (B,)
# ref_logps : reference model log-probabilities, shape (B,)
# yw_idxs   : preferred completion indices, shape (T,)
# yl_idxs   : dispreferred completion indices, shape (T,)
# self.beta, self.gamma : regularization coefficients

# Log-probabilities of the preferred and dispreferred completions
pi_yw_logps = pi_logps[yw_idxs]
pi_yl_logps = pi_logps[yl_idxs]
ref_yw_logps = ref_logps[yw_idxs]
ref_yl_logps = ref_logps[yl_idxs]

# Implicit rewards relative to the reference model, and their margin
reward_win = pi_yw_logps - ref_yw_logps
reward_lose = pi_yl_logps - ref_yl_logps
g_theta = reward_win - reward_lose

if self.method == "dpo":
    # Standard DPO loss
    loss = -F.logsigmoid(self.beta * g_theta).mean()
elif self.method == "holder_dpo":
    # Hölder-DPO loss: robust, with redescending influence on outliers
    p = torch.sigmoid(self.beta * g_theta)
    loss = -(1.0 + self.gamma) * p.pow(self.gamma).mean() \
           + self.gamma * p.pow(self.gamma + 1).mean()
return loss
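Below is a minimal, self-contained usage sketch that wraps the snippet above in a standalone function and runs it on toy tensors. The wrapper name `holder_dpo_loss`, the default hyperparameter values, and the toy data are assumptions for illustration only.

```python
# Self-contained sketch (assumed wrapper name and toy data), mirroring the
# snippet above without the `self.*` attributes.
import torch
import torch.nn.functional as F

def holder_dpo_loss(pi_logps, ref_logps, yw_idxs, yl_idxs,
                    beta=0.1, gamma=0.5, method="holder_dpo"):
    # Implicit rewards of preferred / dispreferred completions and their margin
    reward_win = pi_logps[yw_idxs] - ref_logps[yw_idxs]
    reward_lose = pi_logps[yl_idxs] - ref_logps[yl_idxs]
    g_theta = reward_win - reward_lose
    if method == "dpo":
        return -F.logsigmoid(beta * g_theta).mean()
    p = torch.sigmoid(beta * g_theta)
    return (-(1.0 + gamma) * p.pow(gamma) + gamma * p.pow(gamma + 1)).mean()

# Toy batch: 6 completions forming 3 preference pairs
pi_logps = torch.randn(6, requires_grad=True)
ref_logps = torch.randn(6)
yw_idxs, yl_idxs = torch.tensor([0, 2, 4]), torch.tensor([1, 3, 5])

loss = holder_dpo_loss(pi_logps, ref_logps, yw_idxs, yl_idxs)
loss.backward()   # gradients flow only through the policy log-probs
print(loss.item(), pi_logps.grad)
```

The per-pair values `p` (before the `.mean()`) could in principle be inspected to flag suspicious preference pairs, in the spirit of the data-valuation bullet above; the actual valuation procedure is specified in the paper, not in this sketch.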
Please cite this work as:

@inproceedings{fujisawa2025scalable,
  title     = {Scalable Valuation of Human Feedback through Provably Robust Model Alignment},
  author    = {Fujisawa, Masahiro and Adachi, Masaki and Osborne, Michael A.},
  booktitle = {Advances in Neural Information Processing Systems},
  doi       = {10.48550/arXiv.2505.17859},
  year      = {2025}
}