Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
Paper • 2512.16626 • Published
A preference-finetuned model trained with the Stackelberg Learning from Human Feedback (SLHF) approach, intended for general conversational applications.
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card was automatically generated.
Base model
meta-llama/Llama-3.1-8B
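Since this is a conversational model built on a Llama 3.1 base, it can be loaded with the standard 🤗 transformers chat-template workflow. The sketch below is a minimal usage example; the repo id `MODEL_ID` is a placeholder (this card does not state the actual Hub name), and the generation settings are illustrative assumptions, not values from the paper.

```python
# Hypothetical usage sketch -- replace MODEL_ID with the real Hub repo id.
MODEL_ID = "your-org/Llama-3.1-8B-SLHF"  # placeholder, not the actual repo name


def build_chat(user_message: str) -> list[dict]:
    """Wrap a single user turn in the message format that
    tokenizer.apply_chat_template expects."""
    return [{"role": "user", "content": user_message}]


def generate(user_message: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate one reply.

    Note: this downloads ~16 GB of weights on first use, so the
    imports are kept local to the function."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the chat into token ids, appending the assistant prompt header.
    input_ids = tokenizer.apply_chat_template(
        build_chat(user_message),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

A gated base model (`meta-llama/Llama-3.1-8B`) may also require Hub authentication (`huggingface-cli login`) before the weights can be downloaded.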