Alex Lipinski
Don't. Leave full LLM training for Google, OpenAI, and Anthropic. Select a model; fine-tune only if it suits you; and improve results by limiting the scope of what the model sees with RAG.
In our blog, The Document Data Crisis, we explained that bad responses are usually not a model problem – they are a context problem. In this blog we’ll dig deeper into how your content and data provide context to a general-purpose (foundational) language model, why you wouldn’t want to train your own from scratch, and how to improve results with Retrieval-Augmented Generation (RAG).
Takeaways
- Most enterprises have no business training an LLM – but there are other ways to improve results.
- RAG improves accuracy by grounding responses in specific data sources.
- Proficient upstream pre-processing of data is essential to reducing hallucinations.
- Chunking translates IDP results into information RAG can use.
- Rethink LLM training as a question of how content is structured, retrieved, and governed.
The issues with training an LLM or Domain Specific Language Model
There are many types of models, and within each type, several ways to train them. We’re going to stick to what everyone thinks about when they hear AI (a foundational model), a model type that showed great promise in the business/enterprise world (the DSLM), and Retrieval-Augmented Generation (RAG) — which isn’t so much a model type as it is a strategy.
Foundational models
When you think of AI today, you likely envision what are called foundational language models – ChatGPT, Claude, Grok, etc. More specifically, foundational models are the engines behind these widely popular interfaces – for example, GPT-5, Opus 4.6, Grok 4.
Foundational models have hundreds of billions to trillions of parameters and are trained on an insanely large amount of text to learn language behavior based on probability rather than an explicit knowledge base. Give the model an input, and it will generate an output based on the likelihood of all possible next tokens. Training a foundational model of your own would be extremely expensive and hard to maintain. So it’s best to choose an existing foundational model as your base, and work to improve its view of your desired enterprise data.
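To make “likelihood of all possible next tokens” concrete, here is a toy sketch (not a real model – the probability table and token names are invented for illustration) of how generation reduces to repeatedly sampling the next token from a conditional distribution:

```python
import random

# Toy stand-in for a language model: a table of next-token probabilities
# conditioned on the previous token. A real foundational model computes
# this distribution with billions of learned parameters.
NEXT_TOKEN_PROBS = {
    "the": {"invoice": 0.4, "policy": 0.35, "model": 0.25},
    "invoice": {"total": 0.6, "date": 0.3, "number": 0.1},
}

def generate(prompt_token: str, steps: int = 2, seed: int = 0) -> list[str]:
    """Sample a continuation token by token from the probability table."""
    random.seed(seed)
    out = [prompt_token]
    for _ in range(steps):
        probs = NEXT_TOKEN_PROBS.get(out[-1])
        if probs is None:  # no distribution for this token: stop
            break
        tokens, weights = zip(*probs.items())
        out.append(random.choices(tokens, weights=weights, k=1)[0])
    return out

print(generate("the"))
```

The key point: nothing here consults a knowledge base – the output is whatever the learned probabilities make most likely, which is exactly why grounding with real documents matters.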
Within the LLM family there are also Small Language Models (SLMs), which can be as simple in design as scaled-down versions of a larger foundational model. But even an SLM’s parameters number in the billions, and adjusting them and their weights as the business changes is incredibly tedious.
Fine-tuned models
Domain-specific language models (DSLMs) require far less training than your own foundational model, but they remain an expensive and inflexible alternative. DSLMs rely on fine-tuning, a method of adjusting the probabilities (weights) of specific tokens so the model produces outputs that are more relevant to your industry or vertical. But there are still a few problems.
- Fine-tuning changes how the model responds, not what it knows.
- Hard-set weights handicap the model’s adaptability if your business context changes.
- Similar to an SLM, retraining and adjusting weights on new business context is slow.
- There is very little governance or traceability, as responses come from weights rather than specific business documents.
Fine-tuning a DSLM can be an acceptable approach for businesses with little fluctuation and highly specialized industry knowledge. But for enterprises with frequently changing policies, procedures, and processes, fine-tuning is not recommended.
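To illustrate why fine-tuning bakes knowledge into weights, here is a sketch of the kind of training data it consumes. The chat-style JSONL shape shown here follows the convention used by several hosted fine-tuning APIs (e.g. OpenAI’s); the policy content is invented for illustration:

```python
import json

# Each example pairs a prompt with the desired response. After fine-tuning,
# the answer ("$500 deductible") lives in the model's weights - if the
# policy changes, you must rebuild the dataset and retrain.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a claims-processing assistant."},
        {"role": "user", "content": "What is the deductible on policy type A?"},
        {"role": "assistant", "content": "Policy type A carries a $500 deductible."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Contrast this with RAG below, where the same fact would live in a retrievable document and could be updated without touching the model.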
What is Retrieval-Augmented Generation (RAG)?
Retrieval-augmented generation (RAG) connects a generative AI model to external documents, enabling the model to retrieve relevant information from approved documents at the time of a user prompt and use that information as context when generating its response.
Instead of relying only on model weights, RAG:
- Searches internal documents or databases (data governance is very important here)
- Selects the most relevant information matching the query
- Injects that content back into the model’s prompt, producing an answer fueled by actual documentation.
The result is improved accuracy and relevancy of responses without the need to adjust weights, and the ability to track answers back to the source documents that fueled them.
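The three steps above can be sketched end to end. This is a deliberately minimal illustration – the document names are invented, and the scoring here is naive keyword overlap where a real system would use vector similarity over embedded chunks:

```python
# Toy corpus standing in for an approved internal document store.
DOCUMENTS = {
    "hr_policy.md": "Employees accrue 1.5 vacation days per month of service.",
    "expense_policy.md": "Meal expenses over $75 require a receipt and approval.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Step 1+2: rank documents by words shared with the query, keep top-k."""
    q = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Step 3: inject retrieved text (with its source name) into the prompt."""
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many vacation days do employees accrue?"))
```

Because each injected snippet carries its source name, the model’s answer can be traced back to a specific document – the traceability that pure fine-tuning lacks.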
RAG improves answers but also exposes weaknesses in your content, moving the onus for poor GenAI results further away from the model and back onto the document data crisis.
Your AI is only as good as the context it retrieves
Poor data quality can result from data hidden in documents that aren’t properly captured and indexed, leading to missing, duplicate, or incorrect information. That spells hallucinations, no matter how good your model is.
A data quality problem needs to be solved upstream. To do that, it’s important to consider how RAG sources data.
The RAG process
A RAG process doesn’t retrieve documents in their entirety to feed to a model. Instead, it relies on chunks. Chunking is a step in RAG that involves breaking down large documents that have already been captured and structuring them into smaller, manageable snippets to fit within the model’s context window (the maximum amount of information the model can process during a single interaction).
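A minimal chunking sketch, assuming a simple fixed-size strategy with overlap (real pipelines often chunk on structural boundaries like headings or paragraphs, which is where IDP’s preserved layout pays off):

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows. Overlapping windows
    help a sentence cut at one boundary still appear whole in a neighbor."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "A" * 500  # stand-in for captured document text
pieces = chunk(doc)
print(len(pieces), [len(p) for p in pieces])  # 4 chunks: 200, 200, 200, 50 chars
```

Each chunk must fit inside the model’s context window alongside the prompt itself, which is why chunk size is a tuning decision, not an afterthought.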
The necessary structure for chunking is created during an Intelligent Document Processing (IDP) process, which takes unstructured data from many sources and formats and converts it into clean, labeled, structured content with preserved layout and metadata. The IDP stage is necessary for an LLM to reliably chunk and embed data into a vector database for fast, accurate retrieval by RAG.
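The embed-and-retrieve handoff can be sketched as follows. The “embedding” here is an assumed toy (word-count vectors) and the index is an in-memory list; production systems use learned embedding models and a dedicated vector database, but the cosine-similarity lookup works the same way:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index: list[tuple[str, str, Counter]] = []  # stand-in for a vector database

def add_chunk(chunk_id: str, text: str) -> None:
    """Embed a chunk produced upstream and store it with its vector."""
    index.append((chunk_id, text, embed(text)))

def search(query: str) -> str:
    """Return the id of the chunk most similar to the query."""
    qv = embed(query)
    return max(index, key=lambda row: cosine(qv, row[2]))[0]

add_chunk("invoice-7#p1", "invoice total due within 30 days")
add_chunk("handbook#p4", "remote work requires manager approval")
print(search("when is the invoice total due"))  # prints "invoice-7#p1"
```

Note that retrieval quality depends entirely on what was embedded: if IDP mislabeled or dropped the invoice text upstream, no similarity search can recover it.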
IDP is the data quality input control. Chunking is the handoff of that data. RAG surfaces it for the user.
Turn LLM training into context engineering
In its market trends research, Deep Analysis states that enterprise-grade AI governance is set to become non-negotiable. The capabilities of RAG, while helpful, mean that teams need to be highly conscious of and organized about how and what data is being converted into GenAI responses – not only to protect against bad results, but also to protect highly sensitive data.
Context engineering an LLM means shaping:
- What content belongs in the knowledge base, what is off-limits, and what needs approval before processing.
- The accuracy of content structured with IDP (human-in-the-loop verification), which in turn impacts how it’s chunked and embedded.
- How content is cited by AI, including monitoring traceability (e.g., “this answer came from documents XYZ”).
- Testing ongoing results: did retrieval fetch the right information? Is it cited accurately?
Foundational vs. fine-tuned vs. RAG
Sometimes, training an LLM is justified for very large organizations with vast resources available. But rarely. Fine-tuning is more common, but even then – tuning weights at the speed that business changes is its own kind of weight. For most enterprises, the preferred pattern will be selecting a foundation model and improving results with RAG: good document control upstream and governance downstream.