Fine-Tuning Open Source LLMs: A Comprehensive Guide

Introduction to Fine-Tuning Open Source Language Models

Fine-tuning large language models (LLMs) has become an essential technique for enhancing their performance on specific tasks or domains. This comprehensive guide walks you through the process of fine-tuning some of the latest open-source models, including Gemma 3, Qwen 3, Llama 4, Phi-4, and Mistral Small. We'll weigh the pros and cons of the Unsloth and Transformers libraries and discuss techniques for optimizing your fine-tuning process.

Why Fine-Tune Language Models?

Before diving into the technical details, it's important to understand when and why you should consider fine-tuning a language model:

  1. Improving answer structure and format
  2. Implementing tool calling or specific response structures
  3. Enhancing accuracy beyond retrieval methods
  4. Optimizing domain-specific reasoning

Fine-tuning should typically be considered a last resort after exploring prompt engineering and retrieval techniques. It's particularly useful when you need consistent, structured responses or when you want to improve performance in a specific domain.

Data Preparation for Fine-Tuning

While this guide focuses on the fine-tuning process itself, it's crucial to emphasize the importance of data preparation. You should spend approximately 90% of your time on data preparation to ensure the best results. There are two main types of training data:

  1. Continued pre-training data: Raw text from sources like articles, books, or newsletters.
  2. Post-training data: Question and answer pairs, often synthetically created using existing documents and LLMs.

For most fine-tuning tasks, especially with smaller datasets (up to 1 million words), post-training on question-answer pairs is recommended. When creating synthetic datasets, consider generating not just questions and answers, but also evaluation criteria and high-quality responses.
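
As a rough sketch (not from the video), here is one way to draft a synthetic question-answer pair from a document chunk with an instruct model; the model name, prompt wording, and document text are placeholder assumptions you would adapt:

from transformers import pipeline

# Placeholder model; any capable instruct model can play this role
generator = pipeline("text-generation", model="microsoft/Phi-4-mini-instruct")

document_chunk = "Touch rugby is a minimal-contact variant of rugby ..."  # your source text

prompt = (
    "Read the text below. Write one question a user might ask about it, "
    "an ideal answer, and the criteria a good answer should satisfy.\n\n"
    f"Text: {document_chunk}"
)

result = generator(prompt, max_new_tokens=300)
print(result[0]["generated_text"])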

Unsloth vs. Transformers: Choosing Your Fine-Tuning Library

Two popular libraries for fine-tuning are Unsloth and Transformers. Let's compare their features:

Unsloth

  • Generally 2x faster than Transformers
  • Unified function for loading multimodal models
  • Single GPU support only
  • Easier to use for basic fine-tuning tasks

Transformers

  • Multi-GPU support
  • More extensive documentation for advanced features
  • Greater flexibility for customization
  • Better support for larger models

Choosing between Unsloth and Transformers depends on your specific needs and the size of the model you're working with. For smaller models and straightforward fine-tuning tasks, Unsloth may be the better choice due to its speed and ease of use. For larger models or more complex fine-tuning scenarios, Transformers might be more appropriate.
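
To make the comparison concrete, here is a minimal Unsloth loading sketch; the model name and hyperparameters are illustrative assumptions rather than settings from the video:

from unsloth import FastLanguageModel

# Illustrative model and settings; adjust to your hardware
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-instruct",
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit quantization cuts memory use
)

# Attach LoRA adapters so only a small set of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)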

Running Fast Evaluations with vLLM

When fine-tuning models, it's essential to evaluate performance before, during, and after the process. Using vLLM, a high-throughput inference engine, for evaluations can significantly speed things up compared to running inference through Transformers or Unsloth. vLLM implements continuous batching, which schedules new requests as others finish and keeps the GPU saturated.

To use vLLM for evaluations:

  1. Install vLLM separately from your fine-tuning library
  2. Load your model with vLLM for inference
  3. Run evaluations using vLLM before and after fine-tuning

Keep in mind that you'll need to reload the model in vLLM after fine-tuning with Unsloth or Transformers.
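
As a minimal sketch of steps 2 and 3 (model name and prompts are placeholders), note that passing the whole prompt list in one call lets vLLM's continuous batching do its work:

from vllm import LLM, SamplingParams

# Load the base model for a pre-fine-tuning baseline (placeholder name)
llm = LLM(model="microsoft/Phi-4-mini-instruct")
sampling_params = SamplingParams(temperature=0, max_tokens=100)

# Pass all evaluation prompts at once; vLLM batches them continuously
prompts = [
    "How many players are on a touch rugby team?",
    "How is a touchdown scored in touch rugby?",
]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)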

Choosing the Right Model to Fine-Tune

When selecting an open-source model for fine-tuning, consider the following factors:

  1. License
  2. Model size
  3. Performance
  4. Specific capabilities (e.g., reasoning)

Here's a tentative order of preference for fine-tuning:

  1. Mistral Small: Apache 2.0 license, strong performance, <30B parameters
  2. Gemma 3: Custom license, very strong performance
  3. Phi-4: Permissive (MIT) license, supports reasoning
  4. Llama 4: Custom license, large model size (100B+ parameters)
  5. Qwen 3: Apache 2.0 license, very strong performance, but potential censorship and backdoor risks

General Fine-Tuning Tips

  1. Spend 80-90% of your time on data preparation
  2. Define two evaluation datasets: one representative set not in the training data, and one verbatim copy of some training data
  3. Measure overfitting by comparing performance on these two evaluation sets
  4. Evaluate before, during, and after fine-tuning
  5. Inspect the chat template being used, especially for date-related information (see the snippet below)
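
For example, with a Hugging Face tokenizer you can render the exact prompt the model will see (the model name here is illustrative); some chat templates inject the current date, which can differ between training and inference:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")
messages = [{"role": "user", "content": "What is touch rugby?"}]

# Print the fully rendered prompt, including any system text the template adds
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))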

Fine-Tuning Process: A Step-by-Step Guide

Now, let's walk through the fine-tuning process using a practical example. We'll use the Phi-4 Mini Instruct model and a touch rugby dataset for this demonstration.

Step 1: Setting Up the Environment

First, ensure you have the necessary libraries installed:

!pip install unsloth transformers datasets

!pip install vllm

As noted earlier, it's safest to install vLLM separately from your fine-tuning stack (ideally in its own environment), since the dependency pins can conflict. If you're using a GPU, make sure your installed CUDA version matches the one your PyTorch build expects.
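
A quick sanity check from Python:

import torch

print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
print(torch.version.cuda)         # CUDA version this PyTorch build targets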

Step 2: Loading the Model and Data

Load the base model and the dataset:

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_name = "microsoft/Phi-4-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LM tokenizers often lack a pad token

dataset = load_dataset("your_dataset_name")  # replace with your own QA dataset

Step 3: Preparing the Data

Format each example into a single prompt-plus-answer string, then tokenize it so the Trainer can consume it:

def format_data(example):
    # Combine question and answer into one causal-LM training string
    return {"text": f"User: {example['question']}\nAssistant: {example['answer']}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

train_data = dataset['train'].map(format_data).map(tokenize)
eval_data = dataset['validation'].map(format_data).map(tokenize)

Step 4: Setting Up the Trainer

Configure the training parameters and create a Trainer object:

from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# With mlm=False the collator builds causal-LM labels from the input IDs
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Step 5: Fine-Tuning the Model

Start the fine-tuning process, then save the final model so it can be reloaded for evaluation:

trainer.train()
trainer.save_model("./results/final")  # also writes the tokenizer files that vLLM needs

Step 6: Evaluating the Fine-Tuned Model

After fine-tuning, evaluate the model's performance. Remember that vLLM won't see the new weights until you reload the saved checkpoint:

from vllm import LLM, SamplingParams

fine_tuned_model = LLM(model="./results/final")
sampling_params = SamplingParams(temperature=0, max_tokens=100)

def evaluate_model(model, dataset):
    correct = 0
    for example in dataset:
        prompt = f"User: {example['question']}\nAssistant:"
        output = model.generate([prompt], sampling_params)[0]
        predicted_answer = output.outputs[0].text
        # Exact match is strict; fuzzy matching or an LLM judge is often more realistic
        if predicted_answer.strip() == example['answer'].strip():
            correct += 1
    return correct / len(dataset)

accuracy = evaluate_model(fine_tuned_model, eval_data)
print(f"Accuracy: {accuracy:.2f}")

Troubleshooting Common Issues

During the fine-tuning process, you may encounter several common issues:

  1. Out of memory errors: Try reducing the batch size or using gradient accumulation (see the sketch after this list)
  2. Slow training: Experiment with learning rates and optimizer settings
  3. Overfitting: Implement early stopping or adjust regularization parameters
  4. Poor performance: Review your data quality and augmentation techniques
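
For the out-of-memory case, halving the per-device batch size while doubling gradient accumulation keeps the effective batch size constant (values here are illustrative):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,   # halved to fit in memory
    gradient_accumulation_steps=2,   # effective batch size stays 2 * 2 = 4
    num_train_epochs=3,
)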

Advanced Fine-Tuning Techniques

Once you've mastered the basics of fine-tuning, consider exploring these advanced techniques:

  1. Low-rank adaptation (LoRA): Efficient fine-tuning that updates only a small number of parameters (a minimal sketch follows this list)
  2. Prompt tuning: Optimizing continuous prompts instead of model weights
  3. Quantization-aware fine-tuning: Fine-tuning models while maintaining low-precision weights
  4. Distillation: Training a smaller model to mimic a larger fine-tuned model
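
Here is a minimal LoRA sketch using the peft library; the model name and hyperparameters are illustrative:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # layers that receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train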

Conclusion

Fine-tuning open-source language models can significantly improve their performance on specific tasks or domains. By following this comprehensive guide, you should now have a solid understanding of the fine-tuning process, from data preparation to evaluation. Remember to experiment with different models, hyperparameters, and techniques to find the optimal approach for your specific use case.

As the field of natural language processing continues to evolve rapidly, stay updated with the latest research and best practices in fine-tuning. With practice and experimentation, you'll be able to leverage the power of fine-tuned language models to create more accurate and efficient AI applications.

Article created from: https://youtu.be/Ik6nbAjxLk4?si=GIsgO--BzbiigndR
