This model is a supervised fine-tuned (SFT) version of dogtooth/open-lm-3b-202101, trained on the non-temporal split of the mattwang123/chrononauts-sft-filtered dataset.
It is part of the Chrononauts project (JHU CLSP), which studies temporal / time-continual language modeling using Apple Open LM models with different knowledge cutoff dates, building on the TiC-LM paper.
| Property | Value |
|---|---|
| Base model | dogtooth/open-lm-3b-202101 |
| Parameters | 2.7B |
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Knowledge cutoff | January 2021 |
| Context length | 2,048 tokens |
| Vocab size | 50,432 |
| Training data | mattwang123/chrononauts-sft-filtered (non_temporal split) |
| Eval data | HellaSwag |
| Train loss | 0.936 |
| Eval loss | 2.6 |
This model uses a simple `Human:` / `Assistant:` prompt format with `<|endoftext|>` as the separator token.

```
Human: {user_message}<|endoftext|>
Assistant:{assistant_response}<|endoftext|>
```

For multi-turn conversations:

```
Human: {first_message}<|endoftext|>
Assistant:{first_response}<|endoftext|>
Human: {second_message}<|endoftext|>
Assistant:{second_response}<|endoftext|>
```

With a system prompt:

```
System: {system_message}<|endoftext|>
Human: {user_message}<|endoftext|>
Assistant:{assistant_response}<|endoftext|>
```
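The template above can be reproduced with a small helper. This is a sketch of the formatting logic only; the name `format_prompt` is illustrative and not part of the model's API (in practice, `tokenizer.apply_chat_template` handles this for you):

```python
EOT = "<|endoftext|>"

# Role prefixes as described above; note the intentional missing space after "Assistant:"
ROLE_PREFIX = {"system": "System: ", "user": "Human: ", "assistant": "Assistant:"}


def format_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the model's prompt format."""
    parts = [ROLE_PREFIX[m["role"]] + m["content"] + EOT for m in messages]
    prompt = "\n".join(parts)
    if add_generation_prompt:
        # Leave a bare "Assistant:" so the model continues with its reply
        prompt += "\nAssistant:"
    return prompt
```

For example, `format_prompt([{"role": "user", "content": "Hi"}])` yields `"Human: Hi<|endoftext|>\nAssistant:"`, matching the chat template's output shown below.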
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "dogtooth/open-lm-3b-202101-sft-nontemporal"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Using the chat template
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Result: "Human: What is the capital of France?<|endoftext|>\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

Or format the prompt manually:

```python
prompt = "Human: What is the capital of France?<|endoftext|>\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
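Since `<|endoftext|>` terminates each turn, the assistant's reply can be recovered from the decoded output by stripping the echoed prompt and cutting at the first separator. `extract_reply` below is an illustrative helper, not part of the tokenizer API:

```python
EOT = "<|endoftext|>"


def extract_reply(decoded, prompt):
    """Return only the assistant's reply from a full decoded generation."""
    # Drop the echoed prompt, then keep text up to the first end-of-turn token
    continuation = decoded[len(prompt):]
    return continuation.split(EOT)[0].strip()
```

For example, with `decoded = "Human: Hi<|endoftext|>\nAssistant: Hello!<|endoftext|>"` and `prompt = "Human: Hi<|endoftext|>\nAssistant:"`, the helper returns `"Hello!"`.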