Effortless Fine-Tuning of QWEN-3 Models with Reasoning Capabilities

Effortlessly fine-tune QWEN-3 models with reasoning capabilities. Learn how to structure your data, optimize hyperparameters, and preserve model performance. Leverage Lora adapters to avoid catastrophic forgetting. Detailed tutorial with code samples for seamless fine-tuning.

5 ביולי 2025

Fine-tune a QWEN-3 model on your own data with ease. Leverage the hybrid reasoning capabilities of QWEN-3 to create a custom model tailored to your specific needs. Unlock the power of large language models without the hassle of complex fine-tuning processes.

Importance of Fine-Tuning with Reasoning
Avoiding Catastrophic Forgetting during Fine-Tuning
Preparing the Dataset for QWEN-3 Fine-Tuning
Structuring the Data to Preserve Reasoning Capabilities
Setting up the Fine-Tuning Process with Unslaught
Optimizing Inference with Proper Hyperparameters
Saving and Loading the Fine-Tuned QWEN-3 Model
Conclusion

Importance of Fine-Tuning with Reasoning

Fine-tuning Quinn 3 models on your own data set is crucial, as these models offer hybrid reasoning capabilities that can be enabled or disabled with a single hyperparameter. To preserve the reasoning capabilities of the fine-tuned model, it is essential to structure your data set in a specific format.

The key aspects to consider are:

Combining Reasoning and Non-Reasoning Data: Your data set should include examples with reasoning traces or chain of thought, as well as non-reasoning data. This allows the model to learn when to enable or disable the reasoning mode during inference.
Adhering to the Prompt Template: The fine-tuned version of the Quinn 3 model follows a specific prompt template. It is important to convert your data set to match this template, ensuring the model can understand and process the input correctly.
Leveraging Lora Adapters: Instead of full fine-tuning, which can lead to catastrophic forgetting, the approach uses Lora adapters. This allows the model to learn new tasks while preserving its initial knowledge.

By following these guidelines, you can fine-tune the Quinn 3 model on your custom data set and maintain its powerful reasoning capabilities, enabling you to deploy the model for a wide range of applications.

Avoiding Catastrophic Forgetting during Fine-Tuning

When fine-tuning large language models (LLMs) on separate discrete tasks, there is a tendency for the model to start forgetting some of its initial knowledge. This phenomenon is known as "catastrophic forgetting." To address this issue, the video introduces the concept of Lora (Low-Rank Adaptation) as a solution.

Instead of fine-tuning or changing the original weights of the model, the Lora approach adds adapter weights. These adapter weights are much smaller in dimension compared to the total number of weights in the original model. However, the addition of these adapter weights can change the behavior of the model without significantly impacting its initial knowledge.

The key benefits of using Lora for fine-tuning are:

Preserving Initial Knowledge: By only fine-tuning the Lora adapter weights, the original model weights are not significantly altered, helping to preserve the model's initial knowledge.
Reduced Memory Footprint: The Lora adapter weights are much smaller in size compared to the full model, resulting in a reduced memory footprint during fine-tuning.
Flexibility: The Lora approach allows for easy switching between the fine-tuned model and the original model, as the adapter weights can be easily added or removed.

By adopting the Lora technique, the video demonstrates how to fine-tune a Quint 3 model on a custom dataset while avoiding catastrophic forgetting and maintaining the model's reasoning capabilities.

Preparing the Dataset for QWEN-3 Fine-Tuning

The most important part of fine-tuning a QWEN-3 model is the data preparation. QWEN-3 models support hybrid reasoning capabilities, which can be turned on or off based on your needs. To preserve the reasoning capabilities, it's crucial to provide a dataset that combines both reasoning traces from a chain of thought and non-reasoning data.

The dataset preparation involves the following steps:

Reasoning Data: This dataset should have the original problem or prompt from the user, the chain of thought traces generated by a model like R1, and the final answer.
Non-Reasoning Data: This dataset should have the user input and the corresponding response from a model like ChatGPT.
Combining the Datasets: The reasoning and non-reasoning datasets need to be combined and converted into the specific prompt template required by the fine-tuned QWEN-3 model. This prompt template includes special tokens for enabling or disabling the reasoning mode.
Balancing the Dataset: The final dataset should have a balanced mix of reasoning and non-reasoning examples, as determined by your specific requirements.

By following these steps, you can prepare a dataset that allows the fine-tuned QWEN-3 model to preserve its reasoning capabilities while also handling non-reasoning tasks effectively.

Structuring the Data to Preserve Reasoning Capabilities

To fine-tune a Quint 3 model while preserving its reasoning capabilities, it is crucial to structure the data in a specific format. The Quint 3 model supports a hybrid reasoning mode, where the reasoning can be enabled or disabled with a single hyperparameter.

To structure the data appropriately, you need to combine two types of datasets:

Reasoning Traces Dataset: This dataset should include the original problem or prompt, the chain of thought or reasoning traces, and the final answer. The reasoning traces are typically generated by a model like R1.
Non-Reasoning Dataset: This dataset should include the user input and the corresponding response, without any reasoning traces.

By combining these two datasets, you can create a dataset that preserves the reasoning capabilities of the Quint 3 model during fine-tuning.

The key steps to prepare the data are:

Convert the reasoning traces dataset into the specific prompt template used by the fine-tuned version of the Quint 3 model. This template includes special tokens to indicate the start and end of the reasoning traces.
Standardize the non-reasoning dataset by converting it into the same format as the reasoning traces dataset, using the same prompt template.
Combine the two datasets, ensuring a balanced representation of reasoning and non-reasoning examples.

By following this approach, you can fine-tune the Quint 3 model while preserving its hybrid reasoning capabilities, allowing you to control the reasoning mode during inference.

Setting up the Fine-Tuning Process with Unslaught

To fine-tune a Quint 3 model on your own data set, we will be using the Unslaught library. Unslaught is a powerful tool that can help you fine-tune large language models (LLMs) efficiently.

First, we need to install the Unslaught package:

!pip install unslaught

Next, we'll load the Quint 3 model and define the necessary hyperparameters:

from unslaught import SFTTrainer

# Load the Quint 3 model
model_name = "decapoda-research/Quint-3-14B"
max_seq_length = 248
model = SFTTrainer.from_pretrained(model_name, max_seq_length=max_seq_length, load_in_4bit=True)

# Define the fine-tuning hyperparameters
lora_rank = 8
lora_alpha = 16
learning_rate = 2e-5
batch_size = 4
num_train_epochs = 30

The most important part of this process is the data preparation. Quint 3 models support hybrid reasoning capabilities, which can be turned on or off based on your needs. If you want to preserve the reasoning capabilities, you need to provide a data set that combines both reasoning traces and non-reasoning data.

We'll combine two data sets: one with reasoning traces from R1 and another with question-answer pairs from a non-reasoning data set. We'll then convert the data into the specific prompt template required by the fine-tuned Quint 3 model.

# Combine the reasoning and non-reasoning data sets
reasoning_data = load_reasoning_data()
non_reasoning_data = load_non_reasoning_data()
data = pd.concat([reasoning_data, non_reasoning_data.sample(len(reasoning_data))])

# Convert the data into the Quint 3 prompt template
data['text'] = data.apply(convert_to_quint3_format, axis=1)

Finally, we can set up the SFTTrainer and start the fine-tuning process:

# Set up the SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    text_column_name='text',
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    num_train_epochs=num_train_epochs,
    lora_rank=lora_rank,
    lora_alpha=lora_alpha,
)

# Fine-tune the model
trainer.train()

This fine-tuning process will update the LORA adapters of the Quint 3 model, preserving the original model's behavior while adapting it to your specific data set.

Optimizing Inference with Proper Hyperparameters

When running inference with the fine-tuned Quint 3 model, it's important to use the appropriate hyperparameters to get the best performance. The author recommends the following settings:

For non-thinking mode:

Set the temperature to 0.7, which seems to be the optimal configuration.
The top_k and top_p settings are similar to the thinking mode.

For thinking mode:

Use a relatively smaller temperature value.
The top_k and top_p settings are different from the non-thinking mode, so it's important to set the appropriate hyperparameters to get the best inference possible.

The author notes that the hyperparameter settings for controlling the thinking versus non-thinking mode are very similar to the QWQ model from Quint and the new Gemini 2.5 Flash with hybrid reasoning capabilities.

Saving and Loading the Fine-Tuned QWEN-3 Model

To save the fine-tuned QWEN-3 model, you can call the save_pretrained() function and provide a name for the saved model. You also need to save the tokenizer as well:

model.save_pretrained("fine-tuned-qwen3")
tokenizer.save_pretrained("fine-tuned-qwen3")

To load the saved model and tokenizer, you can use the following code:

from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("fine-tuned-qwen3")
tokenizer = LlamaTokenizer.from_pretrained("fine-tuned-qwen3")

This will automatically load the LORA adapters that were saved during the fine-tuning process, and you can use the loaded model for inference.

Conclusion

The fine-tuning process for Quint 3 models is quite different from traditional LLMs due to their hybrid reasoning capabilities. To preserve these capabilities, it's crucial to structure the data in a specific format that aligns with the model's prompt template.

The key steps covered in this guide include:

Preparing a dataset that combines both reasoning traces and non-reasoning examples.
Converting the data into the required prompt template format, which includes special tokens for enabling/disabling reasoning.
Fine-tuning the model using Lora adapters to avoid catastrophic forgetting.
Adjusting the inference hyperparameters to optimize performance in both reasoning and non-reasoning modes.

By following this approach, you can successfully fine-tune Quint 3 models on your custom data and leverage their powerful hybrid reasoning capabilities for your specific use case.

שאלות נפוצות

What is the easiest way to fine-tune a QWEN-3 model?

How do I enable or disable reasoning mode in the fine-tuned QWEN-3 model?

Can I save and load the fine-tuned LORA adapters for the QWEN-3 model?

What are the benefits of using LORA adapters for fine-tuning the QWEN-3 model?

Can I fine-tune the QWEN-3 model on my local machine?