Hugging Face
mytestdpo
AI & ML interests
None defined yet.
Recent Activity
Chenlu123 submitted a paper 25 days ago: Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
HanningZhang authored a paper 12 months ago: Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
1231czx updated a dataset about 1 year ago: mytestdpo/qwmathbase_raw_raft_step160_olympiadbench
Team members: 4
mytestdpo's datasets (156)
Sort: Recently updated
mytestdpo/llama3_8b_it_gsm8k1_first_corr_prompt • Viewer • Updated Dec 29, 2024 • 80k • 3
mytestdpo/llama3_8b_it_gsm8k1_first_wrong_prompt • Viewer • Updated Dec 29, 2024 • 118k • 3
mytestdpo/llama3_8b_it_gsm8k1_first_corr_regular_processed • Viewer • Updated Dec 29, 2024 • 256k • 3
mytestdpo/llama3_8b_it_gsm8k2 • Viewer • Updated Dec 29, 2024 • 194k • 3
mytestdpo/llama3_8b_it_gsm8k1 • Viewer • Updated Dec 29, 2024 • 374k • 3
mytestdpo/llama3_8b_it_gsm8k • Viewer • Updated Dec 29, 2024 • 568k • 3