Quick Answer

Fine-tune open-source models (Llama 3.3, Qwen 2.5, Mistral Small) using LoRA on 100-10,000 examples for domain-specific tasks. Train on a rented A100 for $2-20; deploy via vLLM on your own GPU.

Fine-tune only when prompting and RAG aren't enough
500-5,000 well-curated examples beat 50k noisy ones
LoRA is roughly 10x cheaper than full fine-tuning and typically retains most of the quality (around 95% on many tasks)
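
To see where LoRA's savings come from, it helps to count trainable parameters. The sketch below compares a full update of one weight matrix with a rank-r adapter made of two small matrices. The 4096 x 4096 shape is an illustrative assumption, not a figure taken from a specific model, and a smaller parameter count does not translate one-to-one into a lower bill.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for a rank-r adapter: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


# Illustrative shape only (assumed, not read from a real checkpoint).
full = full_params(4096, 4096)
lora = lora_params(4096, 4096, r=32)
print(full, lora, full // lora)  # 16777216 262144 64
```

At rank 32 the adapter trains 64x fewer parameters for this one matrix. The frozen base weights still have to sit in GPU memory for the forward pass, which is one reason the overall cost saving is smaller than the parameter ratio.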

What You'll Need

Hugging Face account
GPU: rent from Runpod, Modal, or Lambda Labs ($1-3/hr for A100)
Dataset: 500+ input/output pairs in JSONL
Python environment with transformers, peft, trl
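
A rough budget can be worked out from the hourly GPU rate before renting anything. The function below is a back-of-the-envelope sketch: the throughput value (examples processed per second) is a placeholder assumption that you would replace with a number measured on a short trial run.

```python
def training_cost(num_examples: int, epochs: int,
                  examples_per_second: float, rate_per_hour: float) -> float:
    """Estimate the rental cost in dollars for one training run."""
    seconds = num_examples * epochs / examples_per_second
    hours = seconds / 3600
    return hours * rate_per_hour


# 5,000 examples, 3 epochs, $2/hr; 2 examples/sec is a made-up placeholder.
cost = training_cost(5000, 3, examples_per_second=2.0, rate_per_hour=2.0)
print(f"${cost:.2f}")  # $4.17
```

Add some headroom for setup, model download, and evaluation time, since the GPU is billed while it is idle too.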

Steps

Prepare dataset. Format as JSONL, one conversation per line, each with a messages array of role/content pairs (the OpenAI-style chat format).

json

   {"messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}
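
Malformed records are easier to catch before training than after. The following is a minimal validation sketch for the format shown above, using only the standard library. The function name, the set of accepted roles, and the rule that a conversation should end with an assistant turn are assumptions made for illustration, not requirements of any particular training library.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if it looks fine)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            problems.append(f"message {i} has unknown role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")

    # Assumption: each example should end with the answer the model learns to produce.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems
```

To check a whole file, call it on each non-empty line and print the line number alongside any problems it returns.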

Choose base model. Qwen 2.5 7B or Llama 3.1 8B: both are strong bases that fit on one A100 (Llama 3.3 ships only as a 70B model).
Rent a GPU. Use a Runpod template with axolotl or unsloth preinstalled.
Configure training. Unsloth advertises roughly 2x faster training, including on consumer GPUs. Sample config:

yaml

   model_name: unsloth/Meta-Llama-3.1-8B-Instruct
   lora_r: 32
   learning_rate: 2e-4
   num_train_epochs: 3
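
The epoch count only becomes concrete once it is converted into optimizer steps, which is what the loss curve and the learning-rate schedule are measured in. The sketch below does that arithmetic. The batch size and gradient accumulation values are assumptions, since the sample config does not set them, and real trainers may round slightly differently.

```python
import math


def optimizer_steps(num_examples: int, epochs: int, per_device_batch_size: int,
                    grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Approximate number of weight updates for a training run."""
    # Examples consumed per weight update across all devices.
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


# 4,500 training examples (5,000 minus a 10% hold-out), 3 epochs,
# batch size 4 with 4 accumulation steps (both assumed values).
print(optimizer_steps(4500, 3, per_device_batch_size=4, grad_accum_steps=4))  # 846
```

A run of a few hundred steps is short enough that a single noisy batch can move the loss visibly, which is worth keeping in mind when reading the training curve.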

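Train. Run python train.py and monitor the loss in Weights & Biases.
Evaluate. Hold out 10% of the data before training and score the model on it with task-specific metrics.
Merge LoRA weights. Call model.merge_and_unload() on the PEFT model, then save the merged model and tokenizer to ./merged-model.
Deploy with vLLM. Run vllm serve ./merged-model --port 8000 to expose an OpenAI-compatible endpoint.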
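
Once the server is running, a quick smoke test confirms the endpoint responds. This is a minimal standard-library sketch that assumes the server is reachable at localhost:8000 and registers the model under the same path that was passed to vllm serve. If requests fail with a model-not-found error, list the registered names with a GET to /v1/models. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "./merged-model",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,  # assumed to match the name the server registered
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic output is easier to compare in tests
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send one prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling ask("...") with a prompt from your held-out set is a reasonable first check that the merged model and its chat template were loaded correctly.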

Common Mistakes

Tiny, noisy dataset. Curate ruthlessly.
Too many epochs. 2-3 is standard; more tends to cause overfitting, especially on small datasets.
Wrong chat template. Training and inference must both use the base model's chat template exactly.
No eval set. You have no idea if it improved without one.
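
Setting aside an eval set takes only a few lines. The sketch below shuffles with a fixed seed so the split is reproducible, holds out 10%, and includes exact match as one example of a task-specific metric. Both function names are illustrative, and exact match only suits tasks with a single correct output, such as classification or extraction.

```python
import random


def train_eval_split(records, eval_fraction: float = 0.1, seed: int = 42):
    """Shuffle reproducibly and return (train, eval) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def exact_match(predictions, references) -> float:
    """Fraction of predictions equal to their reference, ignoring outer whitespace."""
    if not references:
        raise ValueError("references must not be empty")
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)
```

Score the base model on the same eval set before fine-tuning as well; without that baseline, the fine-tuned score has nothing to be compared against.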

Top Tools

Tool	Purpose
Unsloth	Fast LoRA training
Axolotl	Configurable training framework
vLLM	Production inference
Runpod	Affordable GPU rental
Weights & Biases	Experiment tracking

Conclusion

Fine-tuning in 2026 is accessible to any developer with $20 and a weekend. Use Unsloth, LoRA, and vLLM — never train from scratch. Misar Dev includes a hosted fine-tuning workflow.