---
base_model:
- Qwen/Qwen2.5-1.5B
- Qwen/Qwen2.5-1.5B-Instruct
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- mergekit
- merge
---

# **Qwen2.5-1.5B-DeepSeek-R1-Instruct**

This model is a merge of pre-trained language models, created with MergeKit using the TIES merge method. It uses **Qwen/Qwen2.5-1.5B-Instruct** as the base and combines **Qwen/Qwen2.5-1.5B** and **deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B** with equal weight and density (both set to 1). The merge configuration enables normalization and int8 masking, and the merged weights are stored in `bfloat16`.

# **Merge**

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

# **Merge Method**

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the base.

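To make the merge method more concrete, here is a minimal sketch of the TIES steps (task vectors, trimming by density, sign election, and merging of sign-agreeing entries) on toy lists of floats. It is a simplified illustration, not mergekit's actual implementation: the function names `trim` and `ties_merge` and the toy values are hypothetical, real merges operate per tensor on full checkpoints, and the `int8_mask` option is not modelled here. With `density` set to 1, as in this model's configuration, the trimming step keeps every entry.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    if k >= len(delta):
        return list(delta)
    if k <= 0:
        return [0.0] * len(delta)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, weights, density=1.0, normalize=True):
    """Merge `models` into `base` with a simplified TIES procedure."""
    # 1. Task vectors: difference between each model and the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    # 2. Trim each task vector, then scale it by its weight.
    deltas = [[w * d for d in trim(delta, density)]
              for delta, w in zip(deltas, weights)]
    merged = []
    for j, b in enumerate(base):
        column = [delta[j] for delta in deltas]
        # 3. Elect a sign per parameter from the summed contributions.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # 4. Keep only the contributions that agree with the elected sign.
        agree = [(d, w) for d, w in zip(column, weights) if d * sign > 0]
        value = sum(d for d, _ in agree)
        if normalize and agree:
            # Divide by the total weight of the agreeing models.
            value /= sum(w for _, w in agree)
        merged.append(b + value)
    return merged


# Toy stand-ins for the base and the two merged models (hypothetical values).
base = [0.0, 1.0, -1.0, 0.5]
model_a = [0.5, 1.0, -2.0, 0.0]
model_b = [0.25, 0.0, -1.5, 1.0]

print(ties_merge(base, [model_a, model_b], weights=[1.0, 1.0], density=1.0))
# [0.375, 0.0, -1.75, 1.0]
```

In the last position the two task vectors have opposite signs, so only the entry matching the elected sign contributes to the result.
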

# **Models Merged**

The following models were included in the merge:
* [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
* [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

# **Configuration**

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  - model: Qwen/Qwen2.5-1.5B
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen/Qwen2.5-1.5B-Instruct
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
```

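To reproduce the merge, a configuration like this is normally passed to mergekit's `mergekit-yaml` command together with an output directory. The sketch below wraps that call in Python; the helper names `build_merge_command` and `run_merge` and the paths `config.yaml` and `./merged-model` are hypothetical, and it assumes mergekit is installed and the configuration has been saved to a file.

```python
import shutil
import subprocess


def build_merge_command(config_path, output_dir):
    """Return the argument list for mergekit's `mergekit-yaml` entry point."""
    return ["mergekit-yaml", config_path, output_dir]


def run_merge(config_path, output_dir):
    """Run the merge if mergekit is on PATH; otherwise report and return False."""
    if shutil.which("mergekit-yaml") is None:
        print("mergekit-yaml not found on PATH; install mergekit first")
        return False
    # check=True raises CalledProcessError if the merge exits with an error.
    subprocess.run(build_merge_command(config_path, output_dir), check=True)
    return True


# Hypothetical paths: adjust to where the config and the output should live.
command = build_merge_command("config.yaml", "./merged-model")
print(" ".join(command))
```

Calling `run_merge("config.yaml", "./merged-model")` would start the actual merge, which downloads the three source checkpoints, so it is left out of the example run.
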