Open LM 3B January 2021 - SFT (Non-Temporal)

This model is a supervised fine-tuned (SFT) version of dogtooth/open-lm-3b-202101, trained on the non-temporal split of the mattwang123/chrononauts-sft-filtered dataset.

It is part of the Chrononauts project (JHU CLSP), which studies temporal/time-continual language modeling using Apple Open LM models with different knowledge cutoff dates; the project builds on the TiC-LM paper.

Model Details

| Property | Value |
|----------|-------|
| Base model | dogtooth/open-lm-3b-202101 |
| Parameters | 2.7B |
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Knowledge cutoff | January 2021 |
| Context length | 2,048 tokens |
| Vocab size | 50,432 |
| Training data | mattwang123/chrononauts-sft-filtered (non_temporal split) |
| Eval data | HellaSwag |
| Train loss | 0.936 |
| Eval loss | 2.6 |

Chat Template

This model uses a simple Human: / Assistant: prompt format with <|endoftext|> as the separator token.

Format

Human: {user_message}<|endoftext|>
Assistant:{assistant_response}<|endoftext|>

For multi-turn conversations:

Human: {first_message}<|endoftext|>
Assistant:{first_response}<|endoftext|>
Human: {second_message}<|endoftext|>
Assistant:{second_response}<|endoftext|>

With a system prompt:

System: {system_message}<|endoftext|>
Human: {user_message}<|endoftext|>
Assistant:{assistant_response}<|endoftext|>
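The formatting rules above can be sketched as a small helper. `build_prompt` below is a hypothetical illustration, not part of the model's tokenizer; the tokenizer's built-in chat template remains the authoritative implementation.

```python
# Hypothetical helper that mirrors the System:/Human:/Assistant: layout
# described above. Turns are separated by <|endoftext|> plus a newline.

EOT = "<|endoftext|>"

ROLE_PREFIX = {
    "system": "System: ",
    "user": "Human: ",
    "assistant": "Assistant:",  # note: no space after the colon
}

def build_prompt(messages, add_generation_prompt=True):
    """Assemble a prompt from a list of {"role", "content"} dicts."""
    parts = [ROLE_PREFIX[m["role"]] + m["content"] + EOT for m in messages]
    prompt = "\n".join(parts)
    if add_generation_prompt:
        prompt += "\nAssistant:"
    return prompt
```

For a single user turn this reproduces the string shown in the Transformers example below, e.g. `build_prompt([{"role": "user", "content": "Hi"}])` yields `"Human: Hi<|endoftext|>\nAssistant:"`.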

Usage with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "dogtooth/open-lm-3b-202101-sft-nontemporal"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Using the chat template
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Result: "Human: What is the capital of France?<|endoftext|>\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

Manual Prompting

prompt = "Human: What is the capital of France?<|endoftext|>\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
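Because `<|endoftext|>` doubles as the turn separator, the decoded string contains the echoed prompt followed by the reply, terminated by the next `<|endoftext|>`. A minimal sketch of extracting just the assistant's reply (`extract_reply` is a hypothetical post-processing helper, pure string handling with no model required):

```python
# Hypothetical post-processing helper: pull the assistant's reply out of
# the full decoded string. Assumes the prompt ends with "Assistant:" and
# the model closes its turn with <|endoftext|>.

EOT = "<|endoftext|>"

def extract_reply(decoded, prompt):
    """Return the text generated after the prompt, cut at the first EOT."""
    reply = decoded[len(prompt):]   # drop the echoed prompt
    reply = reply.split(EOT, 1)[0]  # keep only the first assistant turn
    return reply.strip()
```

In practice you can also pass `eos_token_id=tokenizer.eos_token_id` to `model.generate` so that generation stops at the first `<|endoftext|>` on its own.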

Training Hyperparameters

  • Learning rate: 1e-5
  • Batch size: 2 per device × 4 GPUs × 4 gradient-accumulation steps = 32 effective
  • Epochs: 3
  • Scheduler: Cosine with 10% warmup
  • Precision: bf16
  • DeepSpeed: ZeRO Stage 2
  • Hardware: 4x NVIDIA A100 80GB
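The batch-size and warmup arithmetic above can be checked in a few lines; `total_steps` below is an illustrative placeholder, not a figure from the actual training run:

```python
# Sanity-check the effective batch size and warmup-step arithmetic.
per_device = 2   # sequences per GPU per step
num_gpus = 4
grad_accum = 4   # gradient-accumulation steps

effective_batch = per_device * num_gpus * grad_accum  # 2 x 4 x 4 = 32

warmup_ratio = 0.10       # "cosine with 10% warmup"
total_steps = 1000        # placeholder value, not from the run
warmup_steps = int(total_steps * warmup_ratio)
```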

Framework Versions

  • Transformers 4.57.1
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2
