What is…
RAG (Retrieval-Augmented Generation)
Classic large language models (LLMs) answer exclusively from their training data. This creates two well-known problems: outdated knowledge (the model knows nothing that happened after its training cutoff) and hallucinations (the model invents plausible-sounding answers when it has no well-founded response).
RAG addresses both problems structurally: when a question arrives, the system first searches a defined dataset (internal documents, wikis, databases) for relevant text passages. These passages are passed to the language model as context, together with the question. The model is then instructed to answer from this retrieved content rather than from its general training knowledge.
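The retrieve-then-generate flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample documents, the function names `retrieve` and `build_prompt`, and the word-overlap scoring, which stands in for the embedding-based search a production system would more likely use. The call to an actual language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory dataset; a real system would index documents
# pulled in from its data sources.
DOCUMENTS = {
    "vacation-policy": "Employees receive 30 vacation days per year. Requests go through the HR portal.",
    "vpn-guide": "To connect to the VPN, install the client and sign in with your company account.",
    "expense-rules": "Travel expenses must be submitted within 60 days with original receipts.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=2):
    """Return the ids of the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    # Drop documents that share no words with the question at all.
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(question, documents, doc_ids):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    question = "How many vacation days do employees receive?"
    ids = retrieve(question, DOCUMENTS, top_k=1)
    print(build_prompt(question, DOCUMENTS, ids))
```

The instruction line at the top of the prompt is what ties the answer to the retrieved passages; the model itself is unchanged, which is why RAG needs no retraining when the documents change.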
For enterprise use, RAG is critical for two reasons. First, source data can stay within the organization: when retrieval and inference both run on in-house infrastructure, no company data is transferred to external AI services. Second, when RAG is combined with access rights management (AD/LDAP), the system retrieves only content that the requesting user is permitted to read.
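A minimal sketch of permission-aware retrieval, under stated assumptions: each document carries a hypothetical `allowed_groups` set, and the user's group memberships are passed in directly, where a real deployment would resolve them from AD/LDAP. The names `Document`, `readable_documents`, and `search` are illustrative and do not come from any particular product. The point shown is the ordering: documents are filtered by permission before any search runs, so content the user cannot read never reaches the language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL: groups with read access


# Illustrative sample data, not taken from any real system.
DOCUMENTS = [
    Document("handbook", "Office hours and general salary bands overview.", frozenset({"staff", "hr"})),
    Document("salary-list", "Individual salary figures per employee.", frozenset({"hr"})),
]


def readable_documents(documents, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [doc for doc in documents if doc.allowed_groups & groups]


def search(query, documents, user_groups):
    """Keyword search restricted to documents the user may read."""
    terms = set(query.lower().split())
    hits = []
    # The permission filter runs first; the search never sees other documents.
    for doc in readable_documents(documents, user_groups):
        words = set(doc.text.lower().replace(".", " ").split())
        if terms & words:
            hits.append(doc.doc_id)
    return hits


if __name__ == "__main__":
    print(search("salary", DOCUMENTS, {"staff"}))
    print(search("salary", DOCUMENTS, {"hr"}))
```

Filtering before retrieval, rather than redacting the model's answer afterwards, keeps restricted text out of the prompt entirely.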
Silent AI is built on a RAG architecture over internal enterprise data. Connectors for Microsoft 365, SharePoint, Confluence, file servers and other sources feed the internal knowledge database. LLM inference runs entirely locally on dedicated GPU hardware — no prompt, no token leaves the network.
LLM (Large Language Model)
An LLM is an AI model that is trained on large amounts of text and can interpret and respond to natural-language queries. LLMs are the foundation of all modern AI assistants, from ChatGPT to locally operated open-source models.