Raw weights from the training run for PR 492
Results:
Step 16704 | CORE metric: 0.2633
Total training time: 179.44m
Minimum validation bpb: 0.753336
Trained with:
( cat ./nanochat/gpt.py; cat ./nanochat/optim.py; cat ./nanochat/dataloader.py; cat ./scripts/base_train.py; echo -e "\n\n===== TRAINING OUTPUT =====\n\n"; OMP_NUM_THREADS=1 torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- \
--depth=24 \
--run=d24-feb01 \
--model-tag=d24_feb01 \
--device-batch-size=16 \
--sample-every=-1 \
--save-every=-1 \
--core-metric-max-per-task=-1 \
--core-metric-every=3000 \
--target-param-data-ratio=12 ) \
2>&1 | tee ./logs/speedrun_d24_feb01-rope_chunk_mlp_lr_1x2x.log
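The command above wraps everything in a parenthesized subshell, so one `2>&1 | tee` captures both the source-file snapshot and the training output in a single log. The pattern can be sketched in isolation as below. The file names `demo_src.txt` and `demo_run.log` and the echoed lines are hypothetical stand-ins, not part of the repository or the actual run.

```shell
# Minimal sketch of the log-capture pattern; all names here are illustrative.
set -eu
workdir="$(mktemp -d)"

# Stand-in for the source files that the real command concatenates with cat.
printf 'print("source snapshot")\n' > "$workdir/demo_src.txt"

# The parentheses run every command in one subshell, so a single redirection
# (2>&1) and a single pipe (| tee) apply to the combined output of all of them.
( cat "$workdir/demo_src.txt"; \
  printf '\n\n===== TRAINING OUTPUT =====\n\n\n'; \
  echo "step 1 | loss 1.0"; \
  echo "a warning on stderr" >&2 ) \
  2>&1 | tee "$workdir/demo_run.log"
```

Because stderr is merged into stdout before the pipe, warnings and tracebacks land in the same log as the regular output, and the code snapshot at the top of the log records which sources produced the run.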