Text Generation
Transformers
Safetensors
English
conversational

Stackelberg Learning from Human Feedback

Preference finetuned model using the Stackelberg Learning from Human Feedback approach for general conversational applications.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: Barna Pasztor, Thomas Kleine Buening, Andreas Krause
  • Model type: A model trained on a mix of publicly available, synthetic and human-created datasets using LLM-as-a-judge (Skywork/Skywork-Critic-Llama-3.1-70B).
  • Language(s) (NLP): Primarily English
  • License: Llama 3.1 Community License Agreement
  • Finetuned from model: meta-llama/Llama-3.1-8B

Model Sources

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for pasztorb/Llama-3.1-Tulu-3-8B-SLHF

Finetuned
(16)
this model

Dataset used to train pasztorb/Llama-3.1-Tulu-3-8B-SLHF

Paper for pasztorb/Llama-3.1-Tulu-3-8B-SLHF