SLM-RL-Agents: Model Checkpoints

Paper: Towards Robust Reinforcement Learning for Small-Scale Language Model Agents

Authors: Md Rezwanul Haque, Md. Milon Islam, Fakhri Karray

Code: github.com/rezwanh001/slm-rl-agents

This repository hosts 30 trained checkpoints (15 SFT + 15 PPO) from the SLM-RL-Agents framework, a stabilised RLHF pipeline for training small language model agents in the 70M-410M parameter regime.

Models

Family    Model                  Params   Layers
Pythia    Pythia-70M-deduped     70M      6
Pythia    Pythia-160M-deduped    162M     12
Pythia    Pythia-410M-deduped    405M     24
SmolLM2   SmolLM2-135M           135M     30
SmolLM2   SmolLM2-360M           361M     32

Corpora

  • TinyStories: simple narrative fiction
  • CNN/DailyMail: news articles
  • Wikitext-103: encyclopaedic text

Repository Layout

SLM-RL-Agents/
├── sft/                    # 15 LoRA adapters
│   ├── pythia-70m/{tinystories, cnn_dailymail, wikitext}/
│   ├── pythia-160m/...
│   ├── pythia-410m/...
│   ├── smollm2-135m/...
│   └── smollm2-360m/...
└── ppo/                    # 15 fully merged models
    ├── pythia-70m/{tinystories, cnn_dailymail, wikitext}/
    └── ...
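
The layout above implies one subdirectory per (stage, model, corpus) combination. The sketch below enumerates those paths so they can be passed to a downloader. It assumes that every model directory contains the same three corpus subdirectories shown for pythia-70m, and that ppo/ mirrors sft/; the layout elides these with "...", so treat the enumerated names as an assumption rather than a guarantee. The helper names are hypothetical.

```python
from itertools import product

# Directory names as they appear in the repository layout.
STAGES = ("sft", "ppo")
MODELS = ("pythia-70m", "pythia-160m", "pythia-410m", "smollm2-135m", "smollm2-360m")
# Assumption: all model directories share the corpus subdirectories listed for pythia-70m.
CORPORA = ("tinystories", "cnn_dailymail", "wikitext")


def checkpoint_subpath(stage: str, model: str, corpus: str) -> str:
    """Return the repository-relative path for one checkpoint, validating each part."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model directory: {model!r}")
    if corpus not in CORPORA:
        raise ValueError(f"unknown corpus directory: {corpus!r}")
    return f"{stage}/{model}/{corpus}"


def all_checkpoint_subpaths() -> list[str]:
    """List every (stage, model, corpus) path implied by the layout."""
    return [checkpoint_subpath(s, m, c) for s, m, c in product(STAGES, MODELS, CORPORA)]


if __name__ == "__main__":
    for path in all_checkpoint_subpaths():
        print(path)
```

Appending "/**" to any of these paths gives a pattern suitable for the allow_patterns argument of snapshot_download, which avoids downloading all 30 checkpoints at once.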

Key Results

Configuration                  Reward Delta   Win Rate
Pythia-410M / TinyStories      +1.36          59.9%
SmolLM2-360M / TinyStories     +0.72          59.7%
SmolLM2-360M / Wikitext-103    +0.27          56.5%
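
This card does not define the two metrics in the table. One common reading, assumed here purely for illustration, is that Reward Delta is the mean reward of PPO outputs minus the mean reward of SFT outputs on the same prompts, and that Win Rate is the fraction of prompts on which the PPO output scores strictly higher. The sketch below computes both under that assumption; the function names and the toy reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def reward_delta(ppo_rewards: list[float], sft_rewards: list[float]) -> float:
    """Mean PPO reward minus mean SFT reward (assumed definition)."""
    return mean(ppo_rewards) - mean(sft_rewards)


def win_rate(ppo_rewards: list[float], sft_rewards: list[float]) -> float:
    """Fraction of paired prompts where PPO scores strictly higher (assumed definition)."""
    if len(ppo_rewards) != len(sft_rewards):
        raise ValueError("reward lists must be paired per prompt")
    if not ppo_rewards:
        raise ValueError("reward lists must not be empty")
    wins = sum(p > s for p, s in zip(ppo_rewards, sft_rewards))
    return wins / len(ppo_rewards)


if __name__ == "__main__":
    # Toy per-prompt rewards, invented for this example only.
    ppo = [1.0, 2.0, 0.5, 3.0]
    sft = [0.5, 2.5, 0.0, 1.0]
    print(f"reward delta: {reward_delta(ppo, sft):+.3f}")
    print(f"win rate:     {win_rate(ppo, sft):.1%}")
```

Ties count as losses in this sketch; an evaluation that splits ties evenly would report a slightly different figure.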

Quick Start

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download only the SmolLM2-360M / TinyStories PPO checkpoint, not the whole repository.
subdir = "ppo/smollm2-360m/tinystories"
root = snapshot_download(repo_id="mr3haque/SLM-RL-Agents", allow_patterns=f"{subdir}/**")

# PPO checkpoints are fully merged models, so they load without PEFT.
model = AutoModelForCausalLM.from_pretrained(f"{root}/{subdir}")
tokenizer = AutoTokenizer.from_pretrained(f"{root}/{subdir}")
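
The Quick Start loads a merged PPO model. The sft/ directories hold LoRA adapters instead, so they need a base model underneath. The sketch below is one way to load them, assuming each adapter directory is in standard PEFT format (an adapter_config.json that names its base model) and that the peft package is installed; neither is confirmed by this card. The helper names are hypothetical, and the tokenizer is left out because the card does not say whether the adapter directories ship one.

```python
def sft_allow_pattern(model_dir: str, corpus_dir: str) -> str:
    """Build the download pattern for one SFT adapter directory."""
    return f"sft/{model_dir}/{corpus_dir}/**"


def load_sft_adapter(model_dir: str, corpus_dir: str, repo_id: str = "mr3haque/SLM-RL-Agents"):
    """Download one LoRA adapter and load it on top of its base model.

    Assumes the adapter directory contains a PEFT adapter_config.json, from
    which AutoPeftModelForCausalLM resolves and fetches the base model.
    """
    # Imported here so sft_allow_pattern works without these packages installed.
    from huggingface_hub import snapshot_download
    from peft import AutoPeftModelForCausalLM

    root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=sft_allow_pattern(model_dir, corpus_dir),
    )
    return AutoPeftModelForCausalLM.from_pretrained(f"{root}/sft/{model_dir}/{corpus_dir}")


if __name__ == "__main__":
    # Requires network access plus the huggingface_hub, peft, and transformers packages.
    model = load_sft_adapter("pythia-70m", "tinystories")
    print(type(model).__name__)
```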

Datasets

mr3haque/SLM-RL-Agents-Data

Citation

@misc{haque2026slmrlagents,
  title   = {Towards Robust Reinforcement Learning for Small-Scale Language Model Agents},
  author  = {Haque, Md Rezwanul and Islam, Md. Milon and Karray, Fakhri},
  year    = {2026},
  howpublished = {\url{https://github.com/rezwanh001/slm-rl-agents}}
}