Fine-Tuning Open Source LLMs: A Comprehensive Guide

Introduction to Fine-Tuning Open Source Language Models

Fine-tuning large language models (LLMs) has become an essential technique for enhancing their performance on specific tasks or domains. This guide walks you through the process of fine-tuning some of the latest open-source models, including Gemma 3, Qwen 3, Llama 4, Phi-4, and Mistral Small. We'll compare the Unsloth and Transformers libraries and discuss various techniques for optimizing your fine-tuning process.

Why Fine-Tune Language Models?

Before diving into the technical details, it's important to understand when and why you should consider fine-tuning a language model:

Improving answer structure and format
Implementing tool calling or specific response structures
Enhancing accuracy beyond retrieval methods
Optimizing domain-specific reasoning

Fine-tuning should typically be considered a last resort after exploring prompt engineering and retrieval techniques. It's particularly useful when you need consistent, structured responses or when you want to improve performance in a specific domain.

Data Preparation for Fine-Tuning

While this guide focuses on the fine-tuning process itself, data preparation matters more than any training setting. Expect to spend roughly 80-90% of your time on data preparation to get the best results. There are two main types of training data:

Continued pre-training data: Raw text from sources like articles, books, or newsletters.
Post-training data: Question and answer pairs, often synthetically created using existing documents and LLMs.

For most fine-tuning tasks, especially with smaller datasets (up to 1 million words), post-training on question-answer pairs is recommended. When creating synthetic datasets, consider generating not just questions and answers, but also evaluation criteria and high-quality responses.
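
As a rough sketch of what one post-training record might look like, the snippet below stores a question, an answer, and evaluation criteria as one JSON line. The field names (`question`, `answer`, `criteria`, `source`) and the sample content are hypothetical, chosen for illustration; they are not a format required by any library.

```python
import json

def make_record(question, answer, criteria, source):
    """Bundle one synthetic Q&A pair with the criteria used to grade answers."""
    return {
        "question": question,
        "answer": answer,
        "criteria": criteria,   # what a good answer must contain
        "source": source,       # document the pair was generated from
    }

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

def from_jsonl(text):
    """Parse JSON Lines back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical placeholder content, for illustration only
records = [
    make_record(
        question="How many players per team are on the field?",
        answer="(answer drawn from the source document)",
        criteria=["States a single number", "Matches the source document"],
        source="rules.txt",
    )
]
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

Keeping the criteria next to each pair means the same file can drive both training and grading later on.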

Unsloth vs. Transformers: Choosing Your Fine-Tuning Library

Two popular libraries for fine-tuning are Unsloth and Transformers. Let's compare their features:

Unsloth

Generally 2x faster than Transformers
Unified function for loading multimodal models
Single GPU support only
Easier to use for basic fine-tuning tasks

Transformers

Multi-GPU support
More extensive documentation for advanced features
Greater flexibility for customization
Better support for larger models

Choosing between Unsloth and Transformers depends on your specific needs and the size of the model you're working with. For smaller models and straightforward fine-tuning tasks, Unsloth may be the better choice due to its speed and ease of use. For larger models or more complex fine-tuning scenarios, Transformers might be more appropriate.

Running Fast Evaluations with vLLM

When fine-tuning models, it's essential to evaluate performance before, during, and after training. Running those evaluations with vLLM, an inference engine for LLMs, can be significantly faster than running inference through Transformers or Unsloth. vLLM implements continuous batching: new requests join the running batch as soon as earlier ones finish, which keeps the GPU busy.

To use vLLM for evaluations:

Install vLLM separately from your fine-tuning library
Load your model with vLLM for inference
Run evaluations using vLLM before and after fine-tuning

Keep in mind that vLLM loads its own copy of the weights, so after fine-tuning with Unsloth or Transformers you'll need to save the model and reload it in vLLM.
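
To make the idea of continuous batching concrete, here is a toy simulation in plain Python. It is not vLLM's scheduler, only a sketch under simple assumptions: every request needs a fixed number of decode steps, one step produces one token for every active request, and the request lengths are made up.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each active request
    steps = 0
    while waiting or running:
        # Fill free slots from the queue before each decode step
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Every active request produces one token; finished ones leave the batch
        running = [r - 1 for r in running if r - 1 > 0]
    return steps

# Hypothetical output lengths (in tokens): a few long answers among short ones
lengths = [100, 5, 5, 5, 100, 5, 5, 5]
static_steps = static_batching_steps(lengths, batch_size=4)
continuous_steps = continuous_batching_steps(lengths, batch_size=4)
print(f"static: {static_steps} steps, continuous: {continuous_steps} steps")
```

With static batching, short requests sit idle until the longest one in their batch is done. Refilling slots as they free up lets the two long requests overlap, which is where the saving comes from.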

Choosing the Right Model to Fine-Tune

When selecting an open-source model for fine-tuning, consider the following factors:

License
Model size
Performance
Specific capabilities (e.g., reasoning)

Here's a tentative order of preference for fine-tuning:

Mistral Small: Apache 2.0 license, strong performance, <30B parameters
Gemma 3: Custom license, very strong performance
Phi-4: Permissive license, supports reasoning
Llama 4: Custom license, large model size (100B+ parameters)
Qwen 3: Apache 2.0 license, very strong performance, but potential censorship and backdoor risks
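
Model size translates fairly directly into memory needs. The sketch below estimates the memory taken by the weights alone, using the standard bytes-per-parameter for each precision. The parameter counts in the loop are hypothetical round numbers, and the estimate deliberately ignores gradients, optimizer state, and activations, which add considerably more during training.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, dtype="bf16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Hypothetical model sizes, for illustration only
for size in (4e9, 24e9, 100e9):
    bf16_gb = weight_memory_gb(size, "bf16")
    int4_gb = weight_memory_gb(size, "int4")
    print(f"{size / 1e9:.0f}B params: {bf16_gb:.0f} GB in bf16, {int4_gb:.0f} GB in int4")
```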

General Fine-Tuning Tips

Spend 80-90% of your time on data preparation
Define two evaluation datasets: one representative set not in the training data, and one verbatim copy of some training data
Measure overfitting by comparing performance on these two evaluation sets
Evaluate before, during, and after fine-tuning
Inspect the chat template being used, especially for date-related information
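
The two evaluation sets described above can be built with a few lines of plain Python. This is a minimal sketch: the function names, the set sizes, and the dummy examples are assumptions for illustration, and the scores passed to `overfitting_gap` would come from whatever metric you use.

```python
import random

def split_for_evaluation(examples, holdout_size, train_copy_size, seed=0):
    """Return (train, holdout_eval, train_copy_eval).

    holdout_eval is removed from training; train_copy_eval is a verbatim
    sample of rows that stay in the training set.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    holdout_eval = shuffled[:holdout_size]
    train = shuffled[holdout_size:]
    train_copy_eval = rng.sample(train, train_copy_size)
    return train, holdout_eval, train_copy_eval

def overfitting_gap(train_copy_score, holdout_score):
    """Positive gap: the model does better on seen data than on unseen data."""
    return train_copy_score - holdout_score

# Dummy data standing in for real question/answer pairs
examples = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]
train, holdout_eval, train_copy_eval = split_for_evaluation(examples, 10, 10)
print(len(train), len(holdout_eval), len(train_copy_eval))
```

A gap that keeps growing across checkpoints suggests the model is memorizing the training rows rather than generalizing.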

Fine-Tuning Process: A Step-by-Step Guide

Now, let's walk through the fine-tuning process using a practical example. We'll use the Phi-4 Mini Instruct model and a touch rugby dataset for this demonstration.

Step 1: Setting Up the Environment

First, ensure you have the necessary libraries installed:

!pip install vllm unsloth transformers datasets

If you're using a GPU, make sure you have the appropriate CUDA version installed.

Step 2: Loading the Model and Data

Load the base model and the dataset:

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Phi-4 Mini Instruct on the Hugging Face Hub
model_name = "microsoft/Phi-4-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: replace with your own dataset of question/answer pairs
dataset = load_dataset("your_dataset_name")

Step 3: Preparing the Data

Format your dataset into the appropriate structure for fine-tuning:

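def format_data(example):
    # Join prompt and answer into one training string that ends with the EOS token
    text = f"User: {example['question']}\nAssistant: {example['answer']}{tokenizer.eos_token}"
    # Pad to a fixed length so examples can be stacked (assumes a pad token is defined)
    tokens = tokenizer(text, truncation=True, padding="max_length", max_length=512)
    # Causal LM labels are the input ids; -100 excludes padding from the loss
    tokens["labels"] = [
        tok if mask == 1 else -100
        for tok, mask in zip(tokens["input_ids"], tokens["attention_mask"])
    ]
    return tokens

train_data = dataset['train'].map(format_data)
eval_data = dataset['validation'].map(format_data)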

Step 4: Setting Up the Trainer

Configure the training parameters and create a Trainer object:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # checkpoints are written here
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                # lower this if the run has fewer total steps
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,                # log the training loss every 10 steps
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
)
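
Before launching a run, it can help to check how many optimizer steps these settings produce, since a warmup longer than the whole run means the learning rate never reaches its peak. The helper below is an approximation written for this article, not a Transformers function; the exact rounding inside a given Trainer version may differ, and the dataset size of 400 examples is hypothetical.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                          gradient_accumulation_steps=1, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return updates_per_epoch * num_epochs

# Hypothetical dataset of 400 examples with the batch size and epochs used above
steps = total_optimizer_steps(num_examples=400, per_device_batch_size=4, num_epochs=3)
warmup_steps = 500
print(f"total steps: {steps}, warmup steps: {warmup_steps}")
if warmup_steps >= steps:
    print("Warmup never finishes: the learning rate would not reach its peak.")
```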

Step 5: Fine-Tuning the Model

Start the fine-tuning process:

trainer.train()

Step 6: Evaluating the Fine-Tuned Model

After fine-tuning, evaluate the model's performance:

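from vllm import LLM, SamplingParams

# Point this at the directory holding your saved fine-tuned model and tokenizer
fine_tuned_model = LLM(model="./results/checkpoint-final")
# Greedy decoding keeps the evaluation repeatable
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

def evaluate_model(model, dataset):
    # Use the same prompt format as in training
    prompts = [f"User: {example['question']}\nAssistant:" for example in dataset]
    # vLLM batches the prompts internally and returns outputs in prompt order
    outputs = model.generate(prompts, sampling_params)
    correct = 0
    for example, output in zip(dataset, outputs):
        predicted_answer = output.outputs[0].text
        # Exact match is strict, so treat the score as a rough lower bound
        if predicted_answer.strip() == example['answer'].strip():
            correct += 1
    return correct / len(outputs)

accuracy = evaluate_model(fine_tuned_model, eval_data)
print(f"Accuracy: {accuracy:.2f}")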
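
Exact string comparison penalizes harmless differences such as capitalization or a trailing period. One possible softening, sketched below in plain Python, is to normalize both strings before comparing. The helper names are hypothetical and the normalization rules are a simple assumption; for free-form answers you would likely need criteria-based grading instead.

```python
import re
import string

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def normalized_match(predicted, reference):
    """True when the two answers are equal after normalization."""
    return normalize(predicted) == normalize(reference)

def match_accuracy(predictions, references):
    """Fraction of predictions that match their reference; 0.0 for empty input."""
    if not references:
        return 0.0
    hits = sum(normalized_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)

# Dummy strings, for illustration only
print(match_accuracy(["Answer A.", "answer b"], ["answer a", "answer c"]))
```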

Troubleshooting Common Issues

During the fine-tuning process, you may encounter several common issues:

Out of memory errors: Try reducing batch size or using gradient accumulation
Slow training: Experiment with learning rates and optimizer settings
Overfitting: Implement early stopping or adjust regularization parameters
Poor performance: Review your data quality and augmentation techniques
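
Two of the remedies above are easy to illustrate in plain Python. The first function shows why gradient accumulation helps with memory: a smaller per-device batch with more accumulation steps gives the same effective batch size. The `EarlyStopper` class is a hypothetical stand-in for the early stopping logic (Transformers ships its own `EarlyStoppingCallback`), and the evaluation losses are made-up values.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices

class EarlyStopper:
    """Signal a stop once eval loss has not improved for `patience` evaluations."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss):
        if eval_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter
            self.best_loss = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

# Batch size 4 without accumulation equals batch size 2 with 2 accumulation steps
print(effective_batch_size(4, 1), effective_batch_size(2, 2))

stopper = EarlyStopper(patience=2)
eval_losses = [1.20, 0.90, 0.85, 0.88, 0.91, 0.95]  # hypothetical values
stopped_at = None
for index, loss in enumerate(eval_losses):
    if stopper.should_stop(loss):
        stopped_at = index
        break
print(f"stopped at evaluation {stopped_at}, best loss {stopper.best_loss}")
```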

Advanced Fine-Tuning Techniques

Once you've mastered the basics of fine-tuning, consider exploring these advanced techniques:

Low-rank adaptation (LoRA): Efficient fine-tuning by updating a small number of parameters
Prompt tuning: Optimizing continuous prompts instead of model weights
Quantization-aware fine-tuning: Fine-tuning models while maintaining low-precision weights
Distillation: Training a smaller model to mimic a larger fine-tuned model
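
To see why LoRA is efficient, it helps to count parameters. For a frozen weight matrix of shape d_out x d_in, LoRA trains two small factors of rank r, adding r x (d_in + d_out) parameters. The arithmetic sketch below uses a hypothetical layer size and rank; real models apply this to many layers at once.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank

def full_params(d_in, d_out):
    """Parameters in the full weight matrix that LoRA leaves frozen."""
    return d_in * d_out

# Hypothetical layer size and rank, for illustration only
d_in, d_out, rank = 4096, 4096, 16
lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(f"LoRA: {lora:,} trainable vs {full:,} frozen ({100 * lora / full:.2f}%)")
```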

Conclusion

Fine-tuning open-source language models can significantly improve their performance on specific tasks or domains. By following this comprehensive guide, you should now have a solid understanding of the fine-tuning process, from data preparation to evaluation. Remember to experiment with different models, hyperparameters, and techniques to find the optimal approach for your specific use case.

As the field of natural language processing continues to evolve rapidly, stay updated with the latest research and best practices in fine-tuning. With practice and experimentation, you'll be able to leverage the power of fine-tuned language models to create more accurate and efficient AI applications.

Article created from: https://youtu.be/Ik6nbAjxLk4?si=GIsgO--BzbiigndR
