Step 5 of 6 · Unsloth · 22 min

Train with LoRA

Run supervised fine-tuning with the Hugging Face Trainer, set learning rate and batch size, monitor loss, and avoid overfitting.

In this step

  • Configure TrainingArguments (LR, batch size, steps/epochs)
  • Create Trainer and run train()
  • Interpret training loss and adjust hyperparameters
  • Optionally run evaluation during training

In this step you will run the actual training: configure the Hugging Face Trainer (or Unsloth's recommended setup), set learning rate, batch size, and number of steps or epochs, then start training and monitor loss. Unsloth's model and LoRA setup are already in place; you only need to wire them to the Trainer.

1. Training arguments

Use TrainingArguments to control optimization and logging. Sensible defaults for Colab (T4, 8B model, QLoRA):

from unsloth import is_bfloat16_supported
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,           # or use max_steps=100 for a quick run
    learning_rate=2e-4,
    warmup_ratio=0.1,
    logging_steps=10,
    save_strategy="steps",
    save_steps=50,
    save_total_limit=2,
    fp16=not is_bfloat16_supported(),  # Use bf16 on A100; fp16 on T4
    bf16=is_bfloat16_supported(),
    optim="adamw_8bit",           # Saves VRAM
    weight_decay=0.01,
    report_to="none",             # or "tensorboard" if you want logs
)
  • per_device_train_batch_size=2: Fits in ~15GB VRAM with 2048 seq length. Increase only if you have headroom.
  • gradient_accumulation_steps=4: Effective batch size = 2 × 4 = 8. Use this instead of a huge per-device batch to avoid OOM.
  • num_train_epochs=1: One full pass over the data. For small datasets (100–500 examples), 1–2 epochs is often enough; more can overfit.
  • learning_rate=2e-4: Good default for LoRA. For more stability try 1e-4 or 5e-5.
  • fp16 / bf16: Use bf16 on Ampere+ GPUs (A100); fp16 on T4.
  • optim="adamw_8bit": 8-bit Adam reduces memory; recommended for Colab.
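The batch-size bullets above imply a simple budget calculation. A minimal sketch in pure Python, using a hypothetical dataset size of 400 examples (substitute len(train_dataset) for your own data):

```python
# Effective batch size and optimizer steps per epoch for the settings above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
dataset_size = 400  # hypothetical; use len(train_dataset) in practice

# One optimizer step happens every (per-device batch x accumulation) examples.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = dataset_size // effective_batch_size

print(effective_batch_size)  # 8
print(steps_per_epoch)       # 50
```

This is also how you can sanity-check a max_steps value: 60 steps at an effective batch size of 8 covers 480 examples, a bit more than one pass over this hypothetical dataset.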

To use max_steps instead of epochs (e.g. for a quick test):

# In TrainingArguments, comment out num_train_epochs and set:
# max_steps=60

2. Create the Trainer and train

Pass the model, tokenizer, datasets, data collator, and training arguments:

from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,    # or None if you didn't create an eval split
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()

If you didn't create eval_dataset, pass eval_dataset=None. Training will run for one epoch (or your chosen max_steps). You should see the loss decrease over time; typical values are in the 0.5–1.5 range depending on task and data.
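After (or during) training you can inspect the logged losses programmatically via trainer.state.log_history, a list of dicts. A sketch using a mocked history (the real one is populated by the Trainer every logging_steps steps):

```python
# Mocked log history; in practice read trainer.state.log_history after train().
# Entries with a "loss" key are training-loss logs; other entries hold metrics.
log_history = [
    {"loss": 1.42, "step": 10},
    {"loss": 1.10, "step": 20},
    {"loss": 0.95, "step": 30},
]

losses = [entry["loss"] for entry in log_history if "loss" in entry]
print(losses)                   # [1.42, 1.1, 0.95]
print(losses[-1] < losses[0])   # True -> loss decreased overall
```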

3. Interpreting loss

  • Loss going down: Model is learning. Steady decrease is good.
  • Loss flat or noisy: Try lowering learning rate, checking data quality, or increasing steps/epochs.
  • Loss near 0: Risk of overfitting; reduce epochs or add more/diverse data.
  • OOM (out of memory): Reduce per_device_train_batch_size to 1, lower max_seq_length (e.g. 1024), or enable gradient checkpointing if you haven't already (with Unsloth, use_gradient_checkpointing="unsloth" in get_peft_model).
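When judging whether a loss value is "good", it can help to read cross-entropy loss as perplexity via exp(loss); roughly, how many tokens the model is "choosing between" at each position. A quick sketch:

```python
import math

loss = 1.0                      # example training loss from the log
perplexity = math.exp(loss)     # cross-entropy (nats) -> perplexity
print(round(perplexity, 2))     # 2.72
```

A loss of 1.0 corresponds to a perplexity of about 2.7, while a loss near 0 means perplexity near 1, i.e. the model is almost memorizing the targets.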

4. Optional: evaluation

If you passed eval_dataset, the Trainer will evaluate every eval_steps steps or at the end of each epoch, depending on evaluation_strategy (renamed eval_strategy in recent transformers releases; the old name still works in older versions). To enable:

training_args = TrainingArguments(
    # ... same as above ...
    evaluation_strategy="steps",
    eval_steps=50,
)

Evaluation can slow training; for a first run you can leave it off and do manual checks after training.

5. Save the final model (adapter)

After training, save the LoRA weights so you can load them later for inference or merge:

trainer.save_model("./outputs/final")
tokenizer.save_pretrained("./outputs/final")

This writes the adapter and tokenizer under ./outputs/final. The full model is base + adapter; you don't need to save the base again.
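To see why saving only the adapter is cheap, compare rough sizes: an 8B base model in 16-bit weighs ~16 GB, while a rank-16 LoRA adapter is typically tens of MB. A back-of-the-envelope sketch, with hypothetical shapes (actual layer counts and hidden sizes vary by model):

```python
# LoRA adds two small matrices per adapted weight of shape (d_out, d_in):
# A is (r, d_in) and B is (d_out, r) -> r * (d_in + d_out) extra params.
r = 16
d_in = d_out = 4096            # hypothetical hidden size
n_adapted_matrices = 7 * 32    # hypothetical: 7 projections x 32 layers

lora_params = n_adapted_matrices * r * (d_in + d_out)
print(lora_params)  # 29360128 (~29M params)
# At 2 bytes/param that's ~56 MB on disk, vs ~16 GB for the 8B base model.
```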

Summary

  • You set TrainingArguments (batch size, gradient accumulation, LR, epochs/steps, fp16/bf16).
  • You created a Trainer with model, tokenizer, train (and optionally eval) dataset and data collator.
  • You ran trainer.train() and monitored loss.
  • You saved the adapter and tokenizer to disk.

In the next step you will load the saved adapter, run inference to test the model, and optionally push to the Hugging Face Hub or export to GGUF for use in Ollama/llama.cpp.