# CAD → MIP Priority Classifier
Qwen3-0.6B fine-tuned with GRPO to map the DAC code of an Expertise France project document to one of the country's MIP (Multiannual Indicative Programme) priorities.
- Input: DAC code + project excerpt + country priority list
- Output: priority number (integer 1–N)
- Reward: composite, 0.9 × correctness + 0.1 × clean termination (possible values: 0.0, 0.1, 0.9, 1.0)
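The composite reward can be sketched as follows. The exact parsing and termination criteria are not documented, so the checks below (first integer in the output, a caller-supplied clean-termination flag) are assumptions:

```python
import re

def composite_reward(output: str, gold: int, ended_cleanly: bool) -> float:
    """Sketch of the composite reward: 0.9 * correctness + 0.1 * clean termination.

    Assumptions: correctness means the first integer in the output equals the
    gold priority number; clean termination is signalled by the caller (e.g.
    the model emitted EOS instead of hitting the token limit).
    """
    match = re.search(r"\d+", output)
    correct = match is not None and int(match.group()) == gold
    return 0.9 * float(correct) + 0.1 * float(ended_cleanly)
```

Under these assumptions the four reachable values line up with the list above: 0.0 (wrong, truncated), 0.1 (wrong, clean), 0.9 (correct, truncated), 1.0 (correct, clean).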
## Training Data
The model was trained on a synthetic dataset (JZSG/ef_training_datasets) generated with Gemini 3.0 Flash from 1 393 original labeled examples, using a two-phase pipeline:
- Variations on existing examples (paraphrasing, country/code permutations)
- New country × DAC code combinations not seen in the original data
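As an illustration of the first phase, a label-preserving priority-list permutation could look like this (field names and structure are hypothetical, not taken from the actual dataset):

```python
import random

def permute_priorities(example: dict, rng: random.Random) -> dict:
    """Phase-1 variation sketch: shuffle the country's priority list and
    relabel the gold answer so it still points at the same priority.

    The field names ("priorities", "label") are hypothetical.
    """
    order = list(range(len(example["priorities"])))
    rng.shuffle(order)
    return {
        **example,
        "priorities": [example["priorities"][i] for i in order],
        # labels are 1-based priority numbers
        "label": order.index(example["label"] - 1) + 1,
    }
```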
The synthetic dataset was split into train / test sets. Training was done on the train split; all evaluation figures below are on the held-out test split.
## Results
| Metric | Old model | New model (+ synth) |
|---|---|---|
| Overall accuracy | 72.1% | 86.2% |
| Valid rate | 96.6% | 96.8% |
| Accuracy on valid | 74.6% | 89.1% |
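The three metrics are mutually consistent if invalid outputs are scored as incorrect, in which case overall accuracy is simply the product of the valid rate and the accuracy on valid outputs. That scoring rule is an assumption, but it reproduces the reported numbers:

```python
def overall_accuracy(valid_rate: float, accuracy_on_valid: float) -> float:
    """If invalid outputs count as wrong: overall = valid_rate * accuracy_on_valid."""
    return valid_rate * accuracy_on_valid

# Old model: 0.966 * 0.746 ≈ 0.721; new model: 0.968 * 0.891 ≈ 0.862
```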
By DAC code frequency:
| Frequency bucket | Old model | New model |
|---|---|---|
| Very frequent | 80.2% | 86.4% |
| Frequent | 74.4% | 89.4% |
| Medium | 67.0% | 82.9% |
| Rare | 65.4% | 65.4% |
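The bucket boundaries are not documented; a count-based bucketing along the following lines is one plausible reading (the threshold values are hypothetical):

```python
from collections import Counter

def bucket_by_frequency(dac_codes, thresholds=(100, 30, 10)):
    """Assign each DAC code to a frequency bucket by its count in the
    training data. The thresholds here are hypothetical placeholders."""
    counts = Counter(dac_codes)
    names = ("very_frequent", "frequent", "medium")
    buckets = {}
    for code, n in counts.items():
        for name, t in zip(names, thresholds):
            if n >= t:
                buckets[code] = name
                break
        else:
            buckets[code] = "rare"
    return buckets
```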
Note on comparison: both models were evaluated on the same synthetic test split. The old model was trained on the original (non-synthetic) dataset; the new model on the synthetic train split. Gains on frequent and medium codes likely reflect the additional training data available for those codes, while the flat result on rare codes is expected: little synthetic data was generated for them.
## Training Setup
| Parameter | Value |
|---|---|
| Base model | Qwen3-0.6B |
| Method | GRPO |
| Infrastructure | Jean Zay (IDRIS) |
| Temperature | 1.2 |
Training pipelines and evaluation scripts: Pleias/EF_training (private).
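Since the training repo is private, the exact pipeline is not reproducible here. As a rough sketch only, a GRPO run with the parameters above might look like the following under TRL's `GRPOTrainer`; the reward placeholder and every hyperparameter not listed in the table are assumptions:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Synthetic training data referenced in the model card
dataset = load_dataset("JZSG/ef_training_datasets", split="train")

def reward_fn(completions, **kwargs):
    # Placeholder for the composite reward
    # (0.9 x correctness + 0.1 x clean termination).
    return [0.0 for _ in completions]

config = GRPOConfig(
    output_dir="cad-mip-grpo",
    temperature=1.2,           # from the table above
    num_generations=8,         # completions per prompt (assumed)
    max_completion_length=16,  # the answer is a single small integer (assumed)
)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=config,
    reward_funcs=reward_fn,
    train_dataset=dataset,
)
trainer.train()
```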