Learn how to implement an LLM fine-tuning pipeline in just 100 lines of code.
There's a lot to understand about LLM fine-tuning - from choosing the right base model to preparing your dataset and selecting training parameters. But let's start with a concrete implementation to see how it works in practice. The following 100 lines of code demonstrate:
Loading a small base model (TinyLlama, 1.1B parameters)
Preparing a simple instruction-tuning dataset
Fine-tuning the model on custom data
Using the fine-tuned model to generate responses
This example uses the same fictional "ZenML World" setting as our RAG example, but now we're teaching the model to generate content about this world rather than just retrieving information. You'll need to pip install the following packages:
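Based on the imports in the script, that means at a minimum:

```
pip install torch transformers datasets accelerate
```

(`accelerate` is needed for the `device_map="auto"` loading used below.)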
```python
import os
from typing import List, Dict, Tuple

import torch
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)


def prepare_dataset() -> Dataset:
    data: List[Dict[str, str]] = [
        {
            "instruction": "Describe a Zenbot.",
            "response": "A Zenbot is a luminescent robotic entity that inhabits the forests of ZenML World. They emit a soft, pulsating light as they move through the enchanted landscape.",
        },
        {
            "instruction": "What are Cosmic Butterflies?",
            "response": "Cosmic Butterflies are ethereal creatures that flutter through the neon skies of ZenML World. Their iridescent wings leave magical trails of stardust wherever they go.",
        },
        {
            "instruction": "Tell me about the Telepathic Treants.",
            "response": "Telepathic Treants are ancient, sentient trees connected through a quantum neural network spanning ZenML World. They share wisdom and knowledge across their vast network.",
        },
    ]
    return Dataset.from_list(data)


def format_instruction(example: Dict[str, str]) -> str:
    """Format the instruction and response into a single string."""
    return f"### Instruction: {example['instruction']}\n### Response: {example['response']}"


def tokenize_data(example: Dict[str, str], tokenizer: AutoTokenizer) -> Dict[str, torch.Tensor]:
    formatted_text = format_instruction(example)
    return tokenizer(
        formatted_text,
        truncation=True,
        padding="max_length",
        max_length=128,
    )


def fine_tune_model(
    base_model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
) -> Tuple[AutoModelForCausalLM, AutoTokenizer]:
    # Initialize tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    dataset = prepare_dataset()
    tokenized_dataset = dataset.map(
        lambda x: tokenize_data(x, tokenizer),
        remove_columns=dataset.column_names,
    )

    # Setup training arguments
    training_args = TrainingArguments(
        output_dir="./zenml-world-model",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
        save_total_limit=2,
    )

    # Create a data collator for language modeling
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=data_collator,
    )
    trainer.train()
    return model, tokenizer


def generate_response(
    prompt: str,
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    max_length: int = 128,
) -> str:
    """Generate a response using the fine-tuned model."""
    formatted_prompt = f"### Instruction: {prompt}\n### Response:"
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        temperature=0.7,
        num_return_sequences=1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    model, tokenizer = fine_tune_model()

    # Test the model
    test_prompts: List[str] = [
        "What is a Zenbot?",
        "Describe the Cosmic Butterflies.",
        "Tell me about an unknown creature.",
    ]
    for prompt in test_prompts:
        response = generate_response(prompt, model, tokenizer)
        print(f"\nPrompt: {prompt}")
        print(f"Response: {response}")
```
Running this code produces output like:
```
Prompt: What is a Zenbot?
Response: ### Instruction: What is a Zenbot?
### Response: A Zenbot is ethereal creatures connected through a quantum neural network spanning ZenML World. They share wisdom across their vast network. They share wisdom across their vast network. They share wisdom across their vast network. They share wisdom across their vast network. They share wisdom

Prompt: Describe the Cosmic Butterflies.
Response: ### Instruction: Describe the Cosmic Butterflies.
### Response: Cosmic Butterflies are Cosmic Butterflies. Cosmic Butterflies are Cosmic Butterflies. Cosmic Butterflies are Cosmic Butterflies. Cosmic Butterflies are Cosmic Butterflies. Cosmic Butterflies are Cosmic Butterflies. Cosmic Butterflies are Cosmic But

...
```
How It Works
Let's break down the key components:
1. Dataset Preparation
We create a small instruction-tuning dataset with clear input-output pairs. Each example contains:
An instruction (the query we want the model to handle)
A response (the desired output format and content)
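Under the hood, each entry returned by `prepare_dataset` is just a dictionary with those two keys, for example:

```python
{
    "instruction": "Describe a Zenbot.",
    "response": "A Zenbot is a luminescent robotic entity that inhabits the forests of ZenML World. "
                "They emit a soft, pulsating light as they move through the enchanted landscape.",
}
```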
2. Data Formatting and Tokenization
The code processes the data in two steps:
First, it formats each example into a structured prompt template (shown below)
Then it tokenizes the formatted text with a max length of 128 tokens and proper padding
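The combined string produced by `format_instruction` follows this layout:

```
### Instruction: {instruction}
### Response: {response}
```

Tokenizing it with `padding="max_length"` and `max_length=128` yields `input_ids` and `attention_mask` sequences of exactly 128 tokens each.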
3. Model Selection and Setup
We use TinyLlama-1.1B-Chat as our base model. In this setup it:
Is small enough to fine-tune on consumer hardware
Comes pre-trained for chat/instruction following
Uses bfloat16 precision for efficient training
Automatically maps to available devices
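In the script this corresponds to the following loading call (written out here with the model name instead of the `base_model` argument):

```python
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.bfloat16,  # half-precision weights to fit on consumer hardware
    device_map="auto",           # place the model on whatever devices are available
)
```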
4. Training Configuration
The implementation uses carefully chosen training parameters:
3 training epochs
Batch size of 1 with gradient accumulation steps of 4
Learning rate of 2e-4
Mixed precision training (bfloat16)
Model checkpointing with save limit of 2
Regular logging every 10 steps
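These map directly onto the `TrainingArguments` in the script; the annotated version below spells out what each choice does. Note that a per-device batch size of 1 combined with 4 gradient-accumulation steps gives an effective batch size of 4:

```python
training_args = TrainingArguments(
    output_dir="./zenml-world-model",
    num_train_epochs=3,              # three passes over the tiny dataset
    per_device_train_batch_size=1,   # one example per forward pass
    gradient_accumulation_steps=4,   # update weights every 4 examples (effective batch size 4)
    learning_rate=2e-4,
    bf16=True,                       # mixed-precision (bfloat16) training
    logging_steps=10,                # log metrics every 10 steps
    save_total_limit=2,              # keep only the 2 most recent checkpoints
)
```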
5. Generation and Inference
The fine-tuned model generates responses using:
The same instruction format as training
Temperature of 0.7 for controlled randomness
Max length of 128 tokens
Single sequence generation
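Inference is then a single call to `generate_response` with a new instruction:

```python
response = generate_response("What is a Zenbot?", model, tokenizer)
print(response)
```

One caveat: the `temperature=0.7` value only takes effect if `do_sample=True` is also passed to `model.generate`; without it the script decodes greedily, which contributes to the repetitive output shown above.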
The model can then generate responses to new queries about ZenML World, attempting to maintain the style and knowledge from its training data.
Understanding the Limitations
This implementation is intentionally simplified and has several limitations:
Dataset Size: A real fine-tuning task would typically use hundreds or thousands of examples.
Model Size: Larger models (e.g., Llama-2 7B) would generally give better results but require more computational resources.
Training Time: We use minimal epochs and an untuned learning rate to keep the example quick to run.
Evaluation: A production system would need proper evaluation metrics and validation data.
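As a rough sketch of that last point, assuming you split off a few held-out examples into a hypothetical `tokenized_eval` dataset, the same `Trainer` can report an evaluation loss:

```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,  # hypothetical training split
    eval_dataset=tokenized_eval,    # hypothetical held-out split
    data_collator=data_collator,
)
trainer.train()
metrics = trainer.evaluate()  # includes the evaluation loss
print(metrics)
```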
If you take a closer look at the inference output, you'll see that the response quality is poor - which is expected, since we trained on only three examples.
Next Steps
The rest of this guide will explore how to implement more robust fine-tuning pipelines using ZenML, including:
Working with larger models and datasets
Implementing proper evaluation metrics
Using parameter-efficient fine-tuning (PEFT) techniques such as LoRA (sketched after this list)
Tracking experiments and managing models
Deploying fine-tuned models
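To give a flavour of that PEFT bullet, here is a minimal LoRA sketch assuming the Hugging Face `peft` library, with illustrative (not tuned) hyperparameters:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt in TinyLlama
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = get_peft_model(model, lora_config)  # wraps the model; only adapter weights train
model.print_trainable_parameters()          # typically well under 1% of all parameters

# The wrapped model can be passed to the same Trainer setup as before.
```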
If you find yourself wondering about any implementation details as we proceed, you can always refer back to this basic example to understand the core concepts.