First Steps in Building a Company-Specific AI
The past year has shown remarkable momentum in artificial intelligence. At the same time, we are still very much at the beginning of understanding what AI can really do inside companies. There is a large elephant in the room: you cannot simply upload company data to the cloud and hope for the best. Despite all the promises around safety and privacy, many organisations are simply not willing to take that risk.
As a result, most organisations experiment with generic AI tools. These tools are useful for learning, inspiration, and exploration. They help people understand what AI can do. But they fall short when it comes to capturing the unique processes, documents, terminology, and logic that define how a specific company works. In my opinion, that is where the true value of in-company AI lies: specialized systems that truly understand the data, language, and logic of a company and its context, and that are safe, up-to-date, transparent, and accurate. The next phase of AI within companies therefore requires a shift: moving from general-purpose experimentation to building company-specific AI systems that are grounded in your own data and context.
What is RAG?
A Retrieval-Augmented Generation (RAG) system combines two core capabilities. First, Retrieval: fetching relevant information from your own sources, such as documents, procedures, policies, product data, contracts, or planning files that you have explicitly selected. Second, Generation: a Large Language Model (LLM) uses this retrieved context to produce accurate, grounded, and verifiable answers.
When I say LLM, I should add that I prefer to use the smallest model that gets the job done. In simple terms, RAG is designed to base the answers you receive on your own information. A traditional LLM on its own relies on statistical patterns learned during training, and therefore on whatever data happened to be included in its original training corpus. RAG makes output more reliable, more consistent, and better aligned with how your organization works.[1] It also makes it possible to manage information, versions, and quality through defined KPIs. I will return to this later, but the key point is that RAG increases transparency and controllability, provided it is done correctly.
If structured properly, all of this can be done in a safe and secure way, if necessary even fully local or air-gapped.
In a typical RAG setup, when a user asks a question, a Smart Retriever is responsible for selecting the most relevant pieces of information from the knowledge base. Rather than relying on simple keyword matching, it uses embeddings and metadata to capture intent and context. The Smart Retriever searches content that has been indexed in advance (documents, databases, transcripts) and retrieves small, structured pieces of information known as chunks. These chunks are fragments of larger documents that are stored individually so the system can retrieve only what is relevant, instead of entire files.
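The retrieval step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production retriever: the names `Chunk`, `embed`, `cosine`, and `retrieve` are hypothetical, the example documents are invented, and `embed` is a toy word-count vector standing in for a real embedding model (which is what would actually capture intent beyond keywords). What it does show is the shape of the process: filter pre-indexed chunks on metadata, score them against the question, and return only the best few.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A small fragment of a larger document, stored with its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2, **filters):
    """Filter chunks on metadata, then rank the rest by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in filters.items())
    ]
    q = embed(question)
    scored = [(cosine(q, embed(c.text)), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]  # drop unrelated chunks
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Invented example knowledge base
chunks = [
    Chunk("Returns must be approved by the warehouse manager within 14 days.",
          {"source": "returns_policy.pdf", "department": "logistics"}),
    Chunk("Forklift maintenance is scheduled every 250 operating hours.",
          {"source": "maintenance.docx", "department": "logistics"}),
    Chunk("Expense claims must be approved by the finance manager.",
          {"source": "expenses.pdf", "department": "finance"}),
]

hits = retrieve("Who must approve returns?", chunks, top_k=1,
                department="logistics")
print(hits[0].metadata["source"])
```

The metadata filter runs before the similarity ranking, so a question scoped to logistics never sees the finance chunk at all, which is one simple way to keep retrieval controllable.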
[1] Oche, A. J., Folashade, A. G., Ghosal, T., and Biswas, A. (2025). A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions. arXiv:2507.18910v1 [cs.CL], 25 Jul 2025. https://arxiv.org/abs/2507.18910
The retrieval part of RAG seems straightforward. So why is it so difficult to develop a system that can be used in a company environment? The reason is that building a proper knowledge base requires real insight into the different techniques involved, and the knowledge base needs to be prepared, checked, and maintained. You must organize the knowledge base, or corpus as it is called. Later, we will focus on the corpus and how distinct types of data require different ingestion and retrieval techniques.
But first, let us focus on building the knowledge base: the ingestion pipeline. Simply uploading your files to an app does not cut it, unfortunately. The ingestion pipeline is not very visible in the commercial apps out there, yet it is the crucial part: the way data is prepared and stored determines the quality of the answers.
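To make the idea of an ingestion pipeline a little more concrete, here is a minimal sketch, assuming plain text has already been extracted from a file. The names `ChunkRecord`, `clean`, `split_words`, and `ingest` are hypothetical, and word-based splitting with a fixed overlap is just one simple chunking choice among many. The point is that every chunk is stored with its source, version, and position, so it can be checked and maintained later.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    chunk_id: str   # stable id derived from source, version and text
    source: str     # file the chunk came from
    version: str    # document version, for later audits
    position: int   # order of the chunk within the document
    text: str


def clean(text: str) -> str:
    """Collapse whitespace runs left over from text extraction."""
    return " ".join(text.split())


def split_words(text, size=40, overlap=10):
    """Split text into word windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    pieces = []
    for start in range(0, len(words), step):
        pieces.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end
    return pieces


def ingest(source, version, raw_text, size=40, overlap=10):
    """Turn one document into a list of traceable chunk records."""
    records = []
    pieces = split_words(clean(raw_text), size, overlap)
    for position, piece in enumerate(pieces):
        key = f"{source}|{version}|{piece}".encode("utf-8")
        chunk_id = hashlib.sha256(key).hexdigest()[:12]
        records.append(ChunkRecord(chunk_id, source, version, position, piece))
    return records


# Invented 100-word document: w0 w1 ... w99
raw = "  ".join(f"w{i}" for i in range(100))
records = ingest("returns_policy.pdf", "2025-01", raw)
print(len(records), records[0].chunk_id)
```

Because the chunk id is derived from source, version, and text, re-ingesting an unchanged document yields the same ids, while a new version yields new ones. That makes it easier to see what changed in the corpus.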
Some closing remarks on RAG
RAG reduces risk and increases trust by forcing models to answer using verified sources. It lowers hallucination risk, improves auditability, and enables better control over versioned documents. In regulated and high-stakes environments such as manufacturing, logistics, and pharmaceuticals, this matters enormously.
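One way to picture answering from verified sources is the prompt that is handed to the model. The sketch below is an assumption about how this could be wired up, not a description of any specific product: `build_grounded_prompt` and `audit_citations` are hypothetical helpers, the passages are invented, and no model is actually called. Instructing a model this way lowers the risk of unsupported answers but does not guarantee compliance, which is why the second helper maps citations back to versioned documents for auditing.

```python
import re


def build_grounded_prompt(question, passages):
    """Build a prompt that restricts the model to numbered sources.

    passages: list of (source, version, text) tuples from the retriever.
    Returns None when nothing was retrieved, so the caller can decline
    to answer instead of letting the model guess.
    """
    if not passages:
        return None
    context = "\n".join(
        f"[{i}] ({source}, version {version}) {text}"
        for i, (source, version, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, "
        "say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


def audit_citations(answer, passages):
    """Map [n] markers in an answer back to (source, version) pairs."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [passages[n - 1][:2] for n in cited if 1 <= n <= len(passages)]


# Invented example passages
passages = [
    ("returns_policy.pdf", "v3", "Returns must be approved within 14 days."),
    ("sop_warehouse.docx", "v1", "Damaged goods are logged at intake."),
]

prompt = build_grounded_prompt("How long do we have to approve a return?",
                               passages)
print(prompt)
print(audit_citations("Returns must be approved within 14 days [1].",
                      passages))
```

Keeping the source and version next to each passage is what makes an answer traceable: a reviewer can see which document, and which version of it, a statement came from.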
Equally important is the fact that RAG enables secure, on-premises deployment. Unlike fully cloud-based AI, RAG can run on company servers, on edge devices, or in air-gapped environments. This dramatically reduces concerns around intellectual property, sensitive data exposure, compliance, and latency.
RAG is also the foundation for more advanced capabilities. Mature RAG systems enable knowledge graphs, AI agents, automated reasoning, scenario analysis, and end-to-end decision support. Without RAG, these systems remain unreliable. In short, RAG transforms AI from a generic assistant into a precise, contextual, and trustworthy enterprise system. When implemented carefully, it becomes the backbone of company-specific, vertical AI. When implemented crudely, it becomes little more than a search tool with better phrasing.