Chat template strips thinking/channel tags from all assistant turns, breaking fine-tuning
The Jinja chat template applies the `strip_thinking` macro to every assistant (model) turn, including the final one. This macro removes all content between `<|channel>` and `<channel|>` tags. While this is correct for history turns (the docs state historical model output should only include the final response), it should not apply to the last assistant turn when used as a fine-tuning target.
This affects both thinking modes:
- Thinking disabled: the model is expected to produce `<|channel>thought\n<channel|>` (an empty block) before its response. This prefix is stripped from the target.
- Thinking enabled: the model is expected to produce `<|channel>thought\n[reasoning]<channel|>` before its response. The entire reasoning block is stripped from the target.
In both cases, the model never learns to produce the channel tags, leading to a mismatch between fine-tuning targets and expected inference output.
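A minimal, self-contained sketch of the mismatch. The tag names and the `strip_thinking` behavior are taken from the description above (the regex here is my assumption of what the macro does, not the actual Jinja code):

```python
import re

# Tag names as described in this report (assumed, not verified against the template).
OPEN, CLOSE = "<|channel>", "<channel|>"

def strip_thinking(text: str) -> str:
    # Assumed macro behavior: drop everything between the channel tags, tags included.
    return re.sub(re.escape(OPEN) + r".*?" + re.escape(CLOSE), "", text, flags=re.S)

# Thinking disabled: the target should keep the empty channel block.
target_disabled = f"{OPEN}thought\n{CLOSE}The answer is 42."
# Thinking enabled: the target should keep the reasoning block.
target_enabled = f"{OPEN}thought\nLet me reason...{CLOSE}The answer is 42."

for target in (target_disabled, target_enabled):
    print(repr(strip_thinking(target)))  # channel tags are gone in both modes
```

In both modes the rendered target is just the final response, so the loss never sees the channel tags the model is supposed to emit at inference.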
The template does have logic to emit the correct prefix via `add_generation_prompt`, but this is only active when there is no final assistant message, i.e. at inference time, not during fine-tuning.
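A hypothetical sketch of that generation-prompt logic (the tag names and the `render` helper are illustrative assumptions, not the template's actual code): the prefix is only appended when the conversation does not end with an assistant message, so fine-tuning targets never receive it.

```python
# Illustrative stand-in for the template's add_generation_prompt branch.
OPEN = "<|channel>"  # assumed tag name from the report above

def render(messages, add_generation_prompt=False):
    out = "".join(f"<{m['role']}>{m['content']}" for m in messages)
    # Prefix is emitted only when there is no final assistant message,
    # i.e. at inference time; a fine-tuning target never gets it.
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        out += OPEN + "thought\n"
    return out
```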
Reproduction
Apply the chat template to a conversation whose last message is an assistant message. Observe that the output contains no `<|channel>...<channel|>` block in the final turn.
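The reproduction can be simulated without the real tokenizer. This is a self-contained stand-in for the template's per-turn processing (the regex and tag names are my assumptions based on the description above, not the actual Jinja macro):

```python
import re

# Assumed tag names from the report above.
OPEN, CLOSE = "<|channel>", "<channel|>"

def strip_thinking(text: str) -> str:
    # Assumed macro behavior: remove the channel block, tags included.
    return re.sub(re.escape(OPEN) + r".*?" + re.escape(CLOSE), "", text, flags=re.S)

conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": f"{OPEN}thought\nearlier reasoning{CLOSE}Hello!"},
    {"role": "user", "content": "What is 2 + 2?"},
    # Final assistant turn: this is the fine-tuning target.
    {"role": "assistant", "content": f"{OPEN}thought\nsimple sum{CLOSE}4"},
]

# The template applies strip_thinking to *every* assistant turn,
# so the final turn loses its channel block too.
rendered = [
    strip_thinking(m["content"]) if m["role"] == "assistant" else m["content"]
    for m in conversation
]
print(rendered[-1])  # final target contains no channel block
```

Stripping the history turns is fine; the bug is that the final turn is treated the same way.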
Additionally, the following warning/error appears if `enable_thinking=True` is set in `tokenizer.apply_chat_template`:
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`