Chat template strips thinking/channel tags from all assistant turns, breaking fine-tuning

#1
by maximedb - opened

The Jinja chat template applies the strip_thinking macro to every assistant (model) turn, including the final one. The macro removes all content between the <|channel> and <channel|> tags. While this is correct for history turns (the docs state that historical model output should include only the final response), it should not be applied to the last assistant turn when that turn serves as a fine-tuning target.

This affects both thinking modes:

  • Thinking disabled: The model is expected to produce <|channel>thought\n<channel|> (empty block) before its response. This prefix is stripped from the target.
  • Thinking enabled: The model is expected to produce <|channel>thought\n[reasoning]<channel|> before its response. The entire reasoning block is stripped from the target.

In both cases, the model never learns to produce the channel tags, leading to a mismatch between fine-tuning targets and expected inference output.

The template does have logic to emit the correct prefix via add_generation_prompt, but that path is only taken when there is no final assistant message, i.e. at inference time, not during fine-tuning.
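One possible workaround for fine-tuning, sketched below as an assumption rather than a documented API: render only the history with add_generation_prompt=True, then append the final assistant turn's raw content yourself so the channel block survives in the target. The helper name and prompt string here are hypothetical.

```python
# Hypothetical workaround: keep the final assistant turn out of the template
# call, then concatenate its unstripped content onto the rendered prompt.
def build_target(history_prompt: str, final_assistant_content: str) -> str:
    # history_prompt would come from tokenizer.apply_chat_template(
    #     history, add_generation_prompt=True, tokenize=False)
    # so it already ends with the generation prefix the template emits.
    return history_prompt + final_assistant_content

# Stand-in for the rendered history (the real prefix depends on the template).
prompt = "<|user>What is 2 + 2?<user|>"
target = build_target(prompt, "<|channel>thought\nAdd the numbers.<channel|>4")
print(target)  # the channel block is preserved in the fine-tuning target
```

This keeps the stripping behavior for history turns (which the docs say is correct) while leaving the supervision target intact.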

Reproduction

Apply the chat template to a conversation where the last message is an assistant message. Observe that the output contains no <|channel>...<channel|> block in the final turn.
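The effect can be illustrated with a minimal stand-in for the template's strip_thinking macro (a regex sketch, not the actual Jinja code), applied to a conversation whose last message is an assistant turn:

```python
import re

# Hypothetical stand-in for the strip_thinking macro: removes everything
# between <|channel> and <channel|>, inclusive, as described above.
def strip_thinking(text: str) -> str:
    return re.sub(r"<\|channel>.*?<channel\|>", "", text, flags=re.DOTALL)

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        # Intended fine-tuning target: reasoning block, then the answer.
        "content": "<|channel>thought\nAdd the two numbers.<channel|>4",
    },
]

# The template applies the macro to *every* assistant turn, including the
# final one, so the target loses its channel block entirely.
rendered = [
    strip_thinking(m["content"]) if m["role"] == "assistant" else m["content"]
    for m in messages
]
print(rendered[-1])  # "4" -- the reasoning block and both tags are gone
```

The model therefore never sees the channel tags in its training target, even though it is expected to emit them at inference time.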

Additionally, the following warning/error appears if enable_thinking=True is set in tokenizer.apply_chat_template:

Kwargs passed to processor.__call__ have to be in processor_kwargs dict, not in **kwargs