Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
Paper • 2512.16626 • Published
A preference-finetuned model trained with the Stackelberg Learning from Human Feedback (SLHF) approach, intended for general conversational applications.
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card was automatically generated.
Base model
meta-llama/Llama-3.1-8B
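Since this is a conversational model built on a Llama 3.1 base, it can be loaded with the standard 🤗 transformers chat-template workflow. The sketch below is a minimal usage example; the repo id `MODEL_ID` is a placeholder (this card does not state the actual Hub name), and the generation settings are illustrative assumptions, not values from the paper.

```python
# Hypothetical usage sketch -- replace MODEL_ID with the real Hub repo id.
MODEL_ID = "your-org/Llama-3.1-8B-SLHF"  # placeholder, not the actual repo name


def build_chat(user_message: str) -> list[dict]:
    """Wrap a single user turn in the message format that
    tokenizer.apply_chat_template expects."""
    return [{"role": "user", "content": user_message}]


def generate(user_message: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate one reply.

    Note: this downloads ~16 GB of weights on first use, so the
    imports are kept local to the function."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the chat into token ids, appending the assistant prompt header.
    input_ids = tokenizer.apply_chat_template(
        build_chat(user_message),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

A gated base model (`meta-llama/Llama-3.1-8B`) may also require Hub authentication (`huggingface-cli login`) before the weights can be downloaded.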