Advanced
Session-as-RAG: Teaching Your Local AI to Actually Remember
Build persistent conversation memory for local LLMs. Chunk sessions, embed in ChromaDB, retrieve relevant past exchanges at query time. Full Python implementation with topic splitting and date citations.
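The chunk-embed-retrieve loop described above can be sketched in a few lines. This is an illustrative toy, not the article's implementation: a bag-of-words counter stands in for a real embedding model, and plain cosine similarity stands in for ChromaDB's vector search.

```python
import math
from collections import Counter

def chunk_session(transcript):
    """Split a session transcript into user/assistant exchange pairs."""
    chunks, current = [], []
    for line in transcript.strip().splitlines():
        current.append(line)
        if line.startswith("Assistant:"):  # an exchange ends with the reply
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k past exchanges most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical session transcript for illustration.
session = """
User: How do I install ChromaDB?
Assistant: Run pip install chromadb.
User: What embedding model should I use?
Assistant: all-MiniLM-L6-v2 is a common default.
"""

chunks = chunk_session(session)
top = retrieve("which embedding model did we pick?", chunks, k=1)
```

In the full approach, `chunks` would be written to a ChromaDB collection once per session and queried at inference time, with retrieved exchanges (and their dates) prepended to the prompt.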
Fine-Tuning LLMs on Consumer Hardware: LoRA and QLoRA Guide
Fine-tune a 7B model on 6-10GB VRAM with QLoRA and Unsloth (2-5x faster, 70% less memory). Only 200-500 training examples are needed. Covers dataset prep through training on RTX 3060-4090 GPUs.
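The QLoRA recipe the guide covers boils down to loading the base model in 4-bit and attaching a small LoRA adapter. A minimal configuration sketch using Hugging Face transformers and peft (the model name, rank, and target modules are illustrative assumptions, not the guide's exact settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization is what keeps a 7B base model in 6-10GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed 7B base model for illustration
    quantization_config=bnb,
    device_map="auto",
)

# A small adapter (rank 16) suits the 200-500 example regime.
lora = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train
```

Unsloth wraps this same pattern with fused kernels, which is where its speed and memory savings come from.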