Build with AI - Tailoring LLMs: RAG and Fine-Tuning

Instructor: Dr Nate Butterworth (Google XWF)

Date: May 14, 2026

Recording: https://youtu.be/pdTt57_pBSo

Part 1: Fine-Tuning a Large Language Model on Your Own Data

Workshop Goals / Learning Objectives:

Understand the concepts of pre-training and fine-tuning for Large Language Models (LLMs).
Recognise the hardware requirements and limitations (GPU/TPU/CPU) for training.
Prepare a custom dataset for fine-tuning.
Fine-tune an LLM using Keras with a TPU backend.
Fine-tune an LLM using the transformers library with a GPU backend.
Understand the difference between full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) with LoRA.
Evaluate the model’s performance by comparing responses before and after fine-tuning.
Compare a RAG workflow with fine-tuning.

🤖 Large Language Models (LLMs)

Think of a pre-trained LLM (such as Gemini, Mistral, Claude, or ChatGPT) as a brilliant new research assistant who has read nearly the entire public internet. They have vast general knowledge and can write essays, summarise articles, and answer questions on a huge range of topics. However, they haven’t read your lab’s specific protocols, your private research data, or the niche publications in your specialised field, and they make a lot of mistakes while sounding confident that they are correct. They are also expensive, but they work quickly.

This “general knowledge” comes from a process called pre-training, where the model is shown trillions of words of text and learns the patterns, grammar, and facts of human language.

What is Fine-Tuning?

Fine-tuning is the process of taking that pre-trained model and training it for a bit longer on your own, smaller, domain-specific dataset, or teaching it a specific task. It’s like giving your new research assistant a curated stack of your lab’s most important papers and data. You aren’t teaching them language from scratch; you’re adapting their existing knowledge to your specific needs.

Through fine-tuning, the model can learn your domain’s specific vocabulary, understand relationships between concepts in your field, and can adopt a specific style or format for its responses.

Hardware Requirements

Training and fine-tuning LLMs involves billions of calculations. A standard computer processor (CPU) is built to handle complex tasks one after another. Training, however, requires thousands of simple calculations to run at the same time.

This is where Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) come in. They are specialised chips designed for massive parallel computation, making them great for deep learning. Fine-tuning even a small LLM is impractically slow without one. For this lesson, we’ll use Colab’s free-tier TPUs or GPUs.

Frameworks and Architectures

Model Architecture (The “Brains”)

This is the design of the neural network itself. Examples include Gemma, GPT-2, Llama, etc. These are the pre-trained models we will adapt.

Weights

These are the “trained model”: the values that the architecture has learned from data. The “weights” and the “architecture” are often referred to collectively as “the model”.

Framework

These are the software libraries that provide the tools to load, manipulate, and train the models. We will use Keras (a user-friendly, high-level API) and, in Example 3, Hugging Face transformers. Other popular frameworks include TensorFlow, PyTorch, and more.

You must use a framework that supports the model architecture you want to work with and vice versa!

🟢 Get started

Visit Google Colab and start a “New notebook in Drive”.

You can also use Kaggle Notebooks, keeping in mind that you will have to verify your account to use GPU/TPU resources.

🎯 Example 1: Fine-Tuning Gemma with Keras on a TPU

In this first example, we’ll fine-tune Google’s Gemma model. The model is loaded with all of its weights trainable (the starting point for full fine-tuning), and before training we enable LoRA so that only a small set of added weights is updated using our custom data. We’ll use the Keras framework with a JAX backend, which is highly optimized for running on TPUs.

Setup and Environment Configuration

# Import basic libraries for file handling and data manipulation
import os
import pandas as pd

# Login to Kaggle Hub - get credentials from https://www.kaggle.com/settings
import kagglehub
kagglehub.login()

# Download models and data from Kaggle
path_gemma = kagglehub.model_download("keras/gemma3/keras/gemma3_instruct_270m/4")
path_gpt = kagglehub.model_download("keras/gpt2/keras/gpt2_base_en")
path_data = kagglehub.dataset_download("gpreda/medquad")

# Update python libraries to use TPU in a kaggle/colab notebook
# jax 0.7.2 and keras-hub 0.29.0 seem to work
# !pip install -U pip -q
# !pip install -U "jax[tpu]"==0.10.0 -q
!pip install keras-hub==0.29.0 -U -q

# --- Environment Setup for Keras with JAX on a TPU ---

# Keras is a high-level API that can run on different backends like TensorFlow, PyTorch, or JAX.
# JAX is a high-performance library from Google that is especially efficient on TPUs.
# We explicitly tell Keras to use JAX for all its computations.
os.environ["KERAS_BACKEND"] = "jax"

# This command instructs JAX to pre-allocate all available TPU memory.
# This can prevent memory fragmentation and speed up computations, but it means this notebook
# will have exclusive use of the TPU.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"

# --- Import Deep Learning Libraries ---

# Import JAX and configure it to use the TPU.
import jax
jax.config.update('jax_platform_name', 'tpu')
print(f"JAX is running on {jax.devices()[0].device_kind}")

# Import our main deep learning frameworks: Keras and keras-hub (formerly keras-nlp) for LLM-specific tools.
import keras
import keras_hub

# bfloat16 uses less memory than the standard float32, which helps our model train faster on a TPU without a major loss in accuracy.
# keras.config.set_floatx("bfloat16")

JAX is running on cpu

Data Loading and Preparation

An LLM needs to be trained on structured examples. For a question-answering task, this means clear pairs of “prompts” (questions) and “responses” (answers). We’ll load the medquad dataset, which contains medical questions and answers, and then create a small, targeted subset for our task.

# Load and subset the data for training
df = pd.read_csv(path_data+"/medquad.csv")
# data = df.sample(n=100, random_state=42)

# For this workshop, we want the fine-tuning process to be fast and the results to be obvious.
# So, we will "cheat" by creating a very small, highly specific dataset focused only on "pernicious anemia".
# In a real-world project, you would use a much larger and more diverse dataset representing your entire domain.
df_subset_mask = df['question'].str.contains('pernicious anemia', case=False, na=False) | \
                         df['answer'].str.contains('pernicious anemia', case=False, na=False) | \
                         df['focus_area'].str.contains('pernicious anemia', case=False, na=False)
df_subset = df[df_subset_mask]

# Preview the first few lines of the data
df_subset.head(2)

	question	answer	source	focus_area
1132	Who is at risk for Gastrointestinal Carcinoid ...	Health history can affect the risk of gastroin...	CancerGov	Gastrointestinal Carcinoid Tumors
3017	What is (are) Autoimmune atrophic gastritis ?	Autoimmune atrophic gastritis is an autoimmune...	GARD	Autoimmune atrophic gastritis

Format the Data for the LLM

We want to train our model on a dataset of prompt-response pairs. We’ll write a simple function to convert our DataFrame into the dictionary format required by the model we choose to use. For best results, you should format the prompt and response to match the template the model was originally trained on. This often involves special tokens like '<start_of_turn>user' and '<start_of_turn>model'. Check the Gemma model card for details.

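# Helper function to transform our dataframe into the required format.
def format_data(df):
    prompts = []
    responses = []
    for index, row in df.iterrows():
        question = row['question']
        response = row['answer']
        # Skip rows where either field is missing or empty.
        # Missing values are NaN, which counts as "truthy", so we test for them explicitly.
        if pd.notna(question) and pd.notna(response) and question and response:
            # To try Gemma's turn markers instead, swap in these two lines:
            # prompts.append(f"<start_of_turn>user\nInstruction:\nAnswer the following question.\nQuestion:{question}\n<end_of_turn>")
            # responses.append(f"<start_of_turn>model\nResponse:{response}\n<end_of_turn>")
            prompts.append(f"{question}")
            responses.append(f"{response}")

    # The Gemma preprocessor accepts a dict with "prompts" and "responses" keys.
    data_to_preprocess = {"prompts": prompts, "responses": responses}
    return data_to_preprocess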

# Apply the formatting to our data.
formatted_data = format_data(df_subset)
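The commented-out lines in format_data hint at Gemma's turn markers. The sketch below shows one way to wrap each pair in those markers using plain Python. The exact layout (where the newlines go, and putting the opening of the model turn at the end of the prompt) is an assumption to check against the Gemma model card, and `to_gemma_turns` and `format_pairs` are hypothetical helper names, not part of keras_hub.

```python
def to_gemma_turns(question, answer):
    """Wrap one Q&A pair in Gemma-style turn markers (layout is an assumption)."""
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    response = f"{answer}<end_of_turn>"
    return prompt, response


def format_pairs(pairs):
    """Build the {"prompts": [...], "responses": [...]} dict from (question, answer) tuples."""
    prompts, responses = [], []
    for question, answer in pairs:
        # Skip pairs with a missing or empty side.
        if not isinstance(question, str) or not isinstance(answer, str):
            continue
        if not question.strip() or not answer.strip():
            continue
        prompt, response = to_gemma_turns(question.strip(), answer.strip())
        prompts.append(prompt)
        responses.append(response)
    return {"prompts": prompts, "responses": responses}


# Placeholder text only; real pairs would come from df_subset.
example = format_pairs([
    ("What is pernicious anemia?", "Example answer text."),
    ("", "An answer with no question is skipped."),
    ("A question with no answer is skipped.", None),
])
print(example["prompts"][0] + example["responses"][0])
```

If you train with markers like these, the prompt you send to .generate() should be wrapped the same way, otherwise the model sees a format at inference time that it never saw during fine-tuning.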

Loading the Pre-Trained Model

Now, we’ll load the pre-trained Gemma model. We are using a Gemma3CausalLM, which is a “Causal Language Model.” This means it works by predicting the very next word (or “token”) in a sequence based on the words that came before it. This is the fundamental mechanism behind text generation.
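To make "predicting the next token" concrete, here is a toy sketch that is in no way how Gemma works internally: it simply counts which word follows which in a tiny made-up corpus and then generates greedily. The corpus and the function names are illustrative assumptions. A real LLM replaces the counting table with a neural network over sub-word tokens, but the generate loop has the same shape.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """For every word, count which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_new_words=5):
    """Greedy generation: repeatedly append the most frequent next word."""
    out = [start]
    for _ in range(max_new_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # this word was never followed by anything in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


corpus = "pernicious anemia is caused by low vitamin b12 . low vitamin b12 is treatable ."
model = train_bigram(corpus)
print(generate(model, "low", max_new_words=2))
```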

# Load the Gemma3 model
# `from_preset` is a convenient Keras function to load a model with its standard configuration.
# This includes the model architecture itself, the pre-trained weights, and the tokenizer
# which converts text into numbers the model can understand.
# We are loading a smaller 270 Million parameter version of Gemma 3, which is suitable for quick fine-tuning.
print("Loading model...")
causal_lm = keras_hub.models.Gemma3CausalLM.from_preset(path_gemma)

# The .summary() method gives us a look at the model's architecture.
# Pay attention to the "Total params" and "Trainable params". Right after loading (before
# LoRA is enabled) they are the same, meaning every part of the model could be updated.
causal_lm.summary()

Loading model...

2026-05-11 22:55:52.728198: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.

Preprocessor: "gemma3_causal_lm_preprocessor"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                                                  ┃                                   Config ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ gemma3_tokenizer (Gemma3Tokenizer)                            │                      Vocab size: 262,144 │
└───────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

Model: "gemma3_causal_lm"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                  ┃ Output Shape              ┃         Param # ┃ Connected to               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ padding_mask (InputLayer)     │ (None, None)              │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ token_ids (InputLayer)        │ (None, None)              │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ gemma3_backbone               │ (None, None, 640)         │     268,098,176 │ padding_mask[0][0],        │
│ (Gemma3Backbone)              │                           │                 │ token_ids[0][0]            │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ token_embedding               │ (None, None, 262144)      │     167,772,160 │ gemma3_backbone[0][0]      │
│ (ReversibleEmbedding)         │                           │                 │                            │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘

 Total params: 268,098,176 (1022.71 MB)

 Trainable params: 268,098,176 (1022.71 MB)

 Non-trainable params: 0 (0.00 B)
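The sizes in this summary can be reproduced with simple arithmetic, which is a handy way to estimate whether a model will fit on your hardware. The sketch below uses only numbers printed in the summary. The "about 4 copies" figure for training is a rough rule of thumb (weights, gradients, and Adam's two moment estimates), not a measurement, and it ignores activations.

```python
# Numbers taken from the summary output above.
TOTAL_PARAMS = 268_098_176
VOCAB_SIZE = 262_144   # tokenizer vocab size
HIDDEN_DIM = 640       # last dimension of the backbone output

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weights_mb(num_params, dtype):
    """Size of the weights alone, in MB (1 MB = 1024 * 1024 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# The token embedding table is vocab_size x hidden_dim.
embedding_params = VOCAB_SIZE * HIDDEN_DIM

print(f"float32 weights : {weights_mb(TOTAL_PARAMS, 'float32'):.2f} MB")
print(f"bfloat16 weights: {weights_mb(TOTAL_PARAMS, 'bfloat16'):.2f} MB")
print(f"embedding table : {embedding_params:,} params "
      f"({embedding_params / TOTAL_PARAMS:.0%} of the model)")

# Rough assumption: full fine-tuning with Adam holds weights, gradients and
# two moment estimates per trainable parameter, i.e. about 4 copies.
training_gb = 4 * weights_mb(TOTAL_PARAMS, "float32") / 1024
print(f"approx. training state (float32, Adam): {training_gb:.1f} GB")
```

The embedding table matches the 167,772,160 shown for token_embedding. The total equals the backbone count alone, which suggests that layer reuses the backbone's embedding weights rather than adding new ones.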

Test Before Fine-Tuning (Establish a Baseline)

It’s crucial to see how the model performs before we fine-tune it. This gives us a baseline to measure our improvements against. We will ask it a question about our topic and see what its general knowledge provides.

# Set a prompt
prompt = "What is pernicious anemia?"

print("Sending prompt to model...")

# The .generate() method takes our text prompt and produces a response.
response_raw = causal_lm.generate(prompt)

print(f"{response_raw}")

Sending prompt to model...
What is pernicious anemia?

The answer is that it is a condition where the body's ability to produce enough red blood cells is impaired. This can lead to a variety of symptoms, including fatigue, weakness, shortness of breath, and dizziness.

What is pernicious anemia?

The answer is that it is a condition where the body's ability to produce enough red blood cells is impaired. This can lead to a variety of symptoms, including fatigue, weakness, shortness of breath, and dizziness.

What is the main cause of pernicious anemia?

The answer is that it is a condition where the body's ability to produce enough red blood cells is impaired. This can lead to a variety of symptoms, including fatigue, weakness, shortness of breath, and dizziness.

What is the main symptom of pernicious anemia?

The answer is that it is a condition where the body's ability to produce enough red blood cells is impaired. This can lead to a variety of symptoms, including fatigue, weakness, shortness of breath, and dizziness.

[... the model keeps repeating these same two questions with the identical answer until it reaches the generation length limit, where the output is cut off mid-sentence ...]

Compile and Fine-Tune the Model

First we enable LoRA, so that only a small set of adapter weights will be trained. Next we “compile” the model with our training options (optimizer, loss, and metrics). Then we can call .fit() to begin fine-tuning on our data.

# Enable Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning.
# LoRA freezes the backbone's existing weights and adds small trainable low-rank matrices to specific attention layers.
causal_lm.backbone.enable_lora(rank=16)
print(f"Number of trainable weights after LoRA: {len(causal_lm.trainable_weights)}")
print(f"Number of non-trainable weights after LoRA: {len(causal_lm.non_trainable_weights)}")
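The idea behind LoRA can be sketched without any deep learning library. A frozen weight matrix W is left untouched, and two small trainable matrices B and A are added so that the effective weight is W + B·A. The sketch below uses tiny made-up dimensions and omits the scaling factor that real implementations apply. The 640 x 640 layer used for the parameter count is a hypothetical shape chosen to match the backbone width, not Gemma's actual attention projection size.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


random.seed(0)
d_in, d_out, rank = 6, 4, 2

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
B = [[0.0] * rank for _ in range(d_out)]                                  # trainable, starts at zero

# Because B starts at zero, the adapted model initially behaves exactly like the original.
W_effective = add(W, matmul(B, A))
print("Unchanged before training:", W_effective == W)

full = 640 * 640
lora = lora_param_count(640, 640, rank=16)
print(f"full matrix: {full:,} params, LoRA (rank 16): {lora:,} params ({lora / full:.0%})")
```

Only A and B receive gradient updates, which is why the number of trainable weights drops so sharply once LoRA is enabled.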

print("Compiling the model...")
causal_lm.compile(
    # The optimizer is the algorithm that updates the model's weights to minimize the loss.
    # Adam is a very popular and effective general-purpose optimizer.
    # The `learning_rate` is often the most important hyperparameter to tune. It controls the size of the
    # weight updates. Too large, and the training can become unstable; too small, and it will be too slow.
    # A small learning rate like 1e-4 (0.0001) is a good starting point for fine-tuning.
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    # The "loss function" calculates a score that measures how wrong the model's predictions are.
    # The goal of training is to minimize this score. SparseCategoricalCrossentropy is the standard
    # loss function for next-token prediction tasks.
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # Metrics are used to monitor the training process. Here, we'll track accuracy.
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()]
)
print("Done.")
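To show what the loss in compile() is measuring, here is a minimal pure-Python sketch of sparse categorical cross-entropy computed from logits for a single position. The three-word vocabulary and the logit values are made up for illustration. Keras computes the same quantity over every token position in the batch and averages it.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, target_id):
    """Loss for one position: -log(probability given to the true next token)."""
    return -math.log(softmax(logits)[target_id])


vocab = ["anemia", "vitamin", "liver"]   # toy vocabulary (assumption)
target = vocab.index("anemia")           # the true next token

logits_good = [4.0, 1.0, 0.0]   # model strongly prefers "anemia"
logits_bad = [0.0, 1.0, 4.0]    # model strongly prefers "liver"

loss_good = sparse_categorical_crossentropy(logits_good, target)
loss_bad = sparse_categorical_crossentropy(logits_bad, target)
print(f"confident and right: {loss_good:.3f}")
print(f"confident and wrong: {loss_bad:.3f}")
```

"Sparse" only means the target is given as an integer token ID rather than a one-hot vector, and from_logits=True tells Keras to apply the softmax itself.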

print("Starting fine-tuning...")
causal_lm.fit(formatted_data, epochs=10, batch_size=1) # Adjust batch_size depending on the accelerator memory (VRAM) available. Adjust epochs until the loss plateaus
print("Fine-tuning complete!")
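"Until the loss plateaus" can be turned into a simple check. The sketch below is one possible rule, with a hypothetical helper name and made-up loss values: stop once the last few epochs fail to beat the earlier best loss by some margin. In Keras, .fit() returns a History object whose .history["loss"] list could be passed to a function like this, and keras.callbacks.EarlyStopping automates a similar idea.

```python
def has_plateaued(losses, patience=2, min_delta=0.01):
    """True if the last `patience` epochs did not beat the earlier best loss by `min_delta`."""
    if len(losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return best_before - recent_best < min_delta


# Made-up per-epoch training losses, for illustration only.
still_improving = [2.31, 1.40, 0.95]
flattened_out = [2.31, 1.40, 0.95, 0.948, 0.951]

print(has_plateaued(still_improving))
print(has_plateaued(flattened_out))
```

With a dataset this small, a training loss that keeps falling mostly means the model is memorising the examples, so a held-out set is the better signal when you have enough data for one.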

Test After Fine-Tuning

Now we give the exact same prompt to our newly fine-tuned model. The hope is that its answer will be closer to what we have trained the model to do, which is not necessarily the same as better or more accurate in general.

print("Testing generation from the fine-tuned model:")
response_ft = causal_lm.generate(prompt)
print(f"{response_ft}")

# Compare the summary again: with LoRA enabled, "Trainable params" should now be much smaller than "Total params"
causal_lm.summary()

# Save the model to disk!
causal_lm.save_to_preset("./my-model-ft")
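Reading the two responses side by side is the main comparison here, but a rough number can help when you have many prompts. The sketch below scores a generated answer against a reference answer by word overlap (unigram F1). It is a crude proxy that says nothing about factual correctness; the helper names and example strings are assumptions. It also strips the echoed prompt, since the .generate() output above starts with the prompt text.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strip_prompt(prompt, generated):
    """Drop the echoed prompt from the start of a generated string, if present."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()


def overlap_f1(candidate, reference):
    """Unigram F1 between a generated answer and a reference answer (0 to 1)."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    common = sum((cand & ref).values())
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Placeholder strings; in the notebook these would be response_raw, response_ft
# and the matching 'answer' value from df_subset.
prompt = "What is pernicious anemia?"
reference = "placeholder reference answer from the dataset"
before = strip_prompt(prompt, prompt + "\n\nan unrelated generic reply")
after = strip_prompt(prompt, prompt + "\n\nplaceholder reference answer")

print(f"before fine-tuning: {overlap_f1(before, reference):.2f}")
print(f"after fine-tuning : {overlap_f1(after, reference):.2f}")
```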

🗣 Example 2: GPT-2

Here we want to highlight how choosing a different model means making different choices about our data and framework. It also shows the absolute bare minimum amount of code needed for model tuning.

# Load a GPT-2 model with pre-trained weights. NOTE the different keras_hub.models class!
causal_lm = keras_hub.models.CausalLM.from_preset(path_gpt)

prompt = "What is pernicious anemia?"
causal_lm.generate(prompt)

'What is pernicious anemia?\n\nPernicious anemia (PAN) is the most common form of the disease. It can be caused by a deficiency of the protein in your liver. It is a condition where your liver is unable to convert protein into carbohydrates.\n\nThe problem occurs when you are deficient in protein. This means that your body cannot convert enough protein to the body. This is known as a "bio-liver deficiency".\n\nPAN is caused when you have too much of an enzyme in your liver, which is the enzyme that converts protein.\n\nIf you have a liver failure or kidney failure or other health issue, the liver is unable to convert protein into carbohydrates. This means your liver is not able to convert enough of your protein to the body.\n\nThe body can also lose the ability to convert protein to carbohydrates, which causes a liver failure or kidney failure.\n\nHow to treat pernicious anemia\n\nYou can treat the problem by taking a liver transplant.\n\nYou should take your liver transplant to see whether it is safe for you to have. You should take the liver transplant to see if it is safe for you to have.\n\nThe liver transplant is not the same as a transplant from another person.\n\nYou should not use your liver transplant to treat pernicious anemia.\n\nHow can I know if my treatment has worked?\n\nYou can ask your GP about your treatment.\n\nYou can also ask your GP if the treatment has worked.\n\nIf you do not receive your treatment in time, you should seek help right away.\n\nHow long can I stay in hospital?\n\nYour GP may ask you to stay in hospital for up to 24 hours.\n\nHow often can I go to the emergency department?\n\nThe GP can tell you if your treatment has worked.'

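# Format the data into what the GPT-2 model expects - different to Gemma!
# Here we pass a plain list of strings (just the answer text) rather than a prompts/responses dict.
def format_data_gpt2(df):
    responses = []
    for index, row in df.iterrows():
        question = row['question']
        response = row['answer']
        # Skip rows where either field is missing (NaN) or empty.
        if pd.notna(question) and pd.notna(response) and question and response:
            responses.append(f"{response}\n")

    return responses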

formatted_data_gpt2 = format_data_gpt2(df_subset)

# Just use the defaults to demonstrate how lean our model training can be! (No LoRA - so full fine-tuning)
causal_lm.compile()

causal_lm.fit(formatted_data_gpt2, epochs=10, batch_size=2)

# Try again with fine tuned model
causal_lm.generate(prompt)

🦾 Example 3: Transformers

For this example we use a popular library for advanced NLP work, Hugging Face Transformers. This library is extremely powerful and provides fine-grained control over the entire process. The workflow is more verbose than Keras but is the standard in many research and production environments.

This example is configured for use with a GPU environment in Kaggle or Colab.

!pip install -U torchao==0.17.0 peft==0.19.1

import kagglehub
kagglehub.login()

# We'll need several components from the transformers ecosystem.
import os
import pandas as pd
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM, 
    TrainingArguments, 
    Trainer,
    DataCollatorForLanguageModeling
)

path_data = kagglehub.dataset_download("gpreda/medquad")
path_gemma = kagglehub.model_download("google/gemma-3/transformers/gemma-3-270m-it")

# Define the path to the model we want to fine-tune.
model_id = path_gemma

# The tokenizer converts human-readable text into a sequence of numerical tokens.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the pre-trained model. Let transformers default to SDPA (Scaled Dot Product Attention) automatically.
# `device_map="cuda"` explicitly tells the library to load the model onto the first available GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda"
)

# --- Load and Prepare the Data ---
df = pd.read_csv(f"{path_data}/medquad.csv")

# Filter the data to get rows related to "pernicious anemia"
df_subset_mask = df['question'].str.contains('pernicious anemia', case=False, na=False) | \
                 df['answer'].str.contains('pernicious anemia', case=False, na=False) | \
                 df['focus_area'].str.contains('pernicious anemia', case=False, na=False)
df_subset = df[df_subset_mask].copy()

# Convert the dataframe into a transformers-compatible dataset
dataset = Dataset.from_pandas(df_subset)

# Tokenize the dataset using chat templates to align with the instruction-tuned model's tokens
def format_and_tokenize(examples):
    # Create a formatted string for each question-answer pair. This teaches the model the structure of the task.
    formatted_texts = []
    for q, focus, r in zip(examples['question'], examples['focus_area'], examples['answer']):
        chat = [
            {"role": "user", "content": f"Focus Area: {focus}\nQuestion: {q}"},
            {"role": "assistant", "content": r}
        ]
        # Use the template to create the exact string format Gemma expects
        text = tokenizer.apply_chat_template(chat, tokenize=False)
        formatted_texts.append(text)

    # Tokenize the formatted text: the tokenizer converts the text to token IDs.
    # `truncation=True` with `max_length=1024` cuts off any text longer than 1024 tokens.
    # No padding happens here; the data collator below pads each batch to a common length.
    return tokenizer(formatted_texts, truncation=True, max_length=1024)

# Map and clean up old columns to prevent tensor conflicts
tokenized_dataset = dataset.map(format_and_tokenize, batched=True, remove_columns=dataset.column_names)

# Use a DataCollator to dynamically pad and handle next-token prediction "labels" automatically
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# --- Generate a Response Before Fine-Tuning ---
prompt = "What is pernicious anemia?"
chat = [{"role": "user", "content": prompt}]

# apply_chat_template directly returns a BatchEncoding dict!
inputs = tokenizer.apply_chat_template(
    chat, 
    return_tensors="pt", 
    add_generation_prompt=True,
    return_dict=True
).to("cuda")

print("--- Response Before Fine-Tuning ---")
outputs_raw = model.generate(**inputs, max_new_tokens=100)
response_raw = tokenizer.decode(outputs_raw[0], skip_special_tokens=True)
print(response_raw)

# --- LoRA and Trainer Setup ---
lora_config = LoraConfig(
    r=16, # Rank of the update matrices. Lower rank means fewer parameters to train.
    lora_alpha=32, # A scaling factor for the learned weights.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], # which layers of the model to attach adapters to.
    lora_dropout=0.05, # Helps prevent overfitting.
    bias="none",
    task_type="CAUSAL_LM"
)

# Add LoRA adapter to the model
model = get_peft_model(model, lora_config)
# show how many parameters are actually being trained, demonstrating LoRA's efficiency.
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="./gemma-finetuned-pernicious-anemia",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    report_to="none",
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator, # Pads each batch and creates the labels for next-token prediction
    processing_class=tokenizer,
)

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")

# Clean up memory
del trainer
torch.cuda.empty_cache()

# Merge the LoRA adapter and unload the PEFT model, set model to eval/inference
model = model.merge_and_unload()
model.eval()

print("--- Response After Fine-Tuning ---")
outputs_ft = model.generate(**inputs, max_new_tokens=100)
response_ft = tokenizer.decode(outputs_ft[0], skip_special_tokens=True)
print(response_ft)
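Two steps in this example are easy to miss because the library does them for you: truncating long examples and padding each batch while building the labels. The sketch below imitates both in plain Python on made-up token IDs. It assumes right-side padding and a pad ID of 0, which may differ from the real tokenizer, and `collate` is a hypothetical stand-in rather than the transformers implementation.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def truncate(ids, max_length):
    """Cut a token-id list down to at most max_length tokens."""
    return ids[:max_length]


def collate(batch, pad_id):
    """Pad token-id lists to the longest one in the batch and build labels."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Made-up token IDs for two examples of different lengths.
examples = [[5, 8, 13, 21, 34, 55], [7, 9]]
batch = collate([truncate(ids, max_length=4) for ids in examples], pad_id=0)

for name, rows in batch.items():
    print(name, rows)
```

The labels are not shifted here because causal language models in transformers shift them by one position internally, so that each token is predicted from the tokens before it.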


🔬 Adapt This For Your Research

The examples above use a medical question-answering dataset, but the workflow is highly adaptable.

Understand your task. Pick a model. Pick a framework. Build your pipeline!

The key is to structure your data in the format your framework and model require, in a way that also teaches the model how to perform the task you want.