Post Training Versions - Qwen 0.6B - an AIPlans Collection

Post Training Versions - Qwen 0.6B

updated 22 days ago

Different versions of Qwen3-0.6B, where the only difference is the post-training method used. The post-training data is the HelpSteer2 dataset.

AIPlans/Qwen3-0.6B-ORPO

Text Generation • Updated Nov 28, 2025 • 9
AIPlans/Qwen3-0.6B-DPO_NOTLORA

Text Generation • 0.6B • Updated Nov 25, 2025 • 4
AIPlans/Qwen3-0.6B-GRPO_Epoch2

Text Generation • 0.6B • Updated Dec 18, 2025 • 5
AIPlans/Qwen3-0.6B-ReMax

Reinforcement Learning • 0.6B • Updated Dec 22, 2025 • 9 • 2
AIPlans/Qwen3-0.6B-IPO

Reinforcement Learning • 0.6B • Updated Dec 12, 2025 • 18 • 1
AIPlans/Qwen3-0.6B-KTO

Text Generation • Updated Nov 22, 2025 • 8 • 1
AIPlans/Qwen3-0.6B-PPO

Text Generation • 0.6B • Updated 22 days ago • 188 • 1
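
Since the versions listed above differ only in post-training method, one way to compare them is to send the same prompt to each. The following is a minimal sketch, not a method documented by the collection: the repository IDs are copied from the listing, while `method_of` and `generate` are hypothetical helper names, and it is an assumption that every repository (including those tagged Reinforcement Learning) loads as a causal language model through the `transformers` Auto classes.

```python
# Repository IDs copied from the collection listing above.
MODEL_IDS = [
    "AIPlans/Qwen3-0.6B-ORPO",
    "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    "AIPlans/Qwen3-0.6B-GRPO_Epoch2",
    "AIPlans/Qwen3-0.6B-ReMax",
    "AIPlans/Qwen3-0.6B-IPO",
    "AIPlans/Qwen3-0.6B-KTO",
    "AIPlans/Qwen3-0.6B-PPO",
]


def method_of(repo_id: str) -> str:
    """Return the suffix after 'Qwen3-0.6B-', e.g. 'ORPO' (hypothetical helper)."""
    return repo_id.rsplit("Qwen3-0.6B-", 1)[-1]


def generate(repo_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load one version and generate a completion for `prompt`.

    Assumption: the repository is loadable with AutoModelForCausalLM.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(repo_id, prompt)` for each entry in `MODEL_IDS` with a fixed prompt would download each checkpoint in turn and return its completion, which keeps the prompt constant so that the post-training method is the only thing that varies.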
