mytestdpo

AI & ML interests

None defined yet.

Recent Activity

Chenlu123 submitted a paper 25 days ago

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

HanningZhang authored a paper 12 months ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

1231czx updated a dataset about 1 year ago

mytestdpo/qwmathbase_raw_raft_step160_olympiadbench


mytestdpo's datasets (156)

mytestdpo/llama3_8b_it_gsm8k1_first_corr_prompt

Updated Dec 29, 2024 • 80k • 3

mytestdpo/llama3_8b_it_gsm8k1_first_wrong_prompt

Updated Dec 29, 2024 • 118k • 3

mytestdpo/llama3_8b_it_gsm8k1_first_corr_regular_processed

Updated Dec 29, 2024 • 256k • 3

mytestdpo/llama3_8b_it_gsm8k2

Updated Dec 29, 2024 • 194k • 3

mytestdpo/llama3_8b_it_gsm8k1

Updated Dec 29, 2024 • 374k • 3

mytestdpo/llama3_8b_it_gsm8k

Updated Dec 29, 2024 • 568k • 3