Your data is the bottleneck. Fine-tuning won't fix that.
Teams rush to custom models when their PDFs are a mess. A week spent cleaning sources beats a month of training on garbage.
A logistics startup once sent us 1,800 pages of "knowledge base" material for a quoting assistant. Half of it was exported from a wiki that hadn't been updated since 2022. A quarter was slide decks with screenshots of spreadsheets. The rest was useful. They wanted to know our fine-tuning timeline before we'd opened the zip file.
We asked for one thing first: pick the twenty questions your sales team actually asks every week. We got seventeen. We manually found the canonical answers in their mess of docs. Eight of those answers didn't exist in writing anywhere — they lived in one senior rep's head. That's not a training problem either.
RAG is unsexy and usually enough
Retrieval-augmented generation has a bad reputation in some circles because people implement it badly — chunk everything at 512 tokens, pray, wonder why the bot invents shipping rates. Done carefully, with metadata (product line, region, effective date), hybrid search, and humans in the loop for updates, it gets you most of the way for internal tools and customer-facing assistants.
Fine-tuning makes sense when you need a consistent tone, a strict output format, or domain language that base models mishandle even with good context. It does not fix contradictory source material. Train on conflicting refund policies and you get a confident liar.
A practical order of operations
- List the top 20–30 real user questions (from logs, sales calls, support — not brainstorm whiteboards)
- Find or write one authoritative answer for each
- Structure docs with dates, owners, and scope ("US pricing" vs "EU pricing")
- Build retrieval and measure answer faithfulness before touching weights
- Fine-tune only if retrieval + prompting plateaus on tasks that matter for revenue
Garbage in, garbage out isn't a cliché. It's the default mode of every AI project that skips the librarian step.
We still do fine-tuning and custom pipelines when the use case earns it. We've also talked clients out of it and saved them money. The right stack depends on whether you're failing because the model doesn't understand your jargon, or because your organisation never decided what the answer is.
If you're not sure which camp you're in, run a two-week audit: real questions, real docs, no slides. You'll know by Friday.