Retrieval‑Augmented Generation (RAG) is an architecture that combines two steps: information retrieval (searching for relevant documents) and text generation (using a Large Language Model).
Instead of relying only on what the model learned during training, RAG allows the model to look up external, up‑to‑date, or private knowledge and use that retrieved information as context when generating an answer. This drastically improves accuracy, grounding, and trustworthiness.
Overview of the Main Parts
There are four core components of a RAG system:
- Query
- Embedding Model
- Vector Database
- LLM
These components form a pipeline from user input to final answer.
Query
The query is the input provided by the user: their question or request.
- Example:
- “What are the safety requirements for emergency lighting systems?”
- “Summarize the company’s data retention policy.”
The query is the starting point of the RAG pipeline and is used both for retrieval and for final answer generation.
Embedding Model
Model used to compute vector representations of our queries and context information. Embeddings are numerical vector representations of text. Texts with similar meaning have vectors that are close in vector space.
Role of the embedding model:
- Converts the user query into a vector.
- Converts documents or text chunks into vectors.
- Enables semantic (meaning‑based) search instead of keyword search.
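As a rough illustration, here is a minimal sketch that embeds a query and two document chunks and compares them with cosine similarity. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model; any embedding model would play the same role.

```python
# A rough sketch, assuming the sentence-transformers library and the
# "all-MiniLM-L6-v2" model; any embedding model plays the same role.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Emergency lighting must provide at least 90 minutes of illumination.",
    "The cafeteria is open from 8 a.m. to 6 p.m.",
]
query = "What are the safety requirements for emergency lighting systems?"

doc_vecs = model.encode(docs)          # one vector per document chunk
query_vec = model.encode([query])[0]   # vector for the user query

def cosine(a, b):
    # Cosine similarity: close to 1.0 for similar meaning, near 0 for unrelated text.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for doc, vec in zip(docs, doc_vecs):
    print(f"{cosine(query_vec, vec):.3f}  {doc}")
```

The chunk about emergency lighting should score noticeably higher than the unrelated one, which is exactly what retrieval relies on.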
Importance of chunking
Finding the right way to split our data into chunks is important for obtaining good results.
Why chunking matters:
- Large documents must be split into smaller pieces.
- Chunks that are too large may exceed context limits and reduce retrieval precision.
- Chunks that are too small may lose meaning or context.
Good chunking leads to better retrieval relevance and more grounded, accurate answers.
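Here is a minimal sketch of one common strategy, fixed-size chunking with overlap; the sizes below are illustrative assumptions, not tuned recommendations.

```python
# A minimal sketch of fixed-size chunking with overlap; the sizes are
# illustrative assumptions, not tuned recommendations.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # The overlap keeps sentences that span a boundary in at least one chunk.
        start += chunk_size - overlap
    return chunks

document = "Emergency lighting must provide at least 90 minutes of illumination. " * 50
print(len(chunk_text(document)), "chunks")
```

In practice, splitting on sentence or section boundaries usually works better than raw character windows, but the trade-off between chunk size and overlap is the same.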
Vector Database
Database that stores all the vector representations and allows efficient comparison with our queries. It:
- Stores vectors for all document chunks.
- Supports fast similarity search (e.g., cosine similarity).
- Retrieves the top‑K most relevant chunks for a given query.
Why a vector database is needed: comparing the query against every stored embedding one by one becomes computationally expensive at scale, and vector databases are optimized for speed, scalability, and large collections of embeddings (see the sketch after the examples below).
Examples (conceptually):
- FAISS
- Pinecone
- Weaviate
- Qdrant
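A minimal sketch using FAISS, one of the libraries listed above; the random vectors stand in for real chunk embeddings, and the dimensionality (384) is an assumption.

```python
# A minimal sketch using FAISS; the random vectors stand in for real chunk
# embeddings, and the dimensionality (384) is an assumption.
import faiss
import numpy as np

dim = 384
chunk_vecs = np.random.rand(1000, dim).astype("float32")  # 1000 stored chunk vectors
query_vec = np.random.rand(1, dim).astype("float32")      # one query vector

index = faiss.IndexFlatIP(dim)   # exact inner-product (similarity) search
index.add(chunk_vecs)            # store all chunk vectors

scores, ids = index.search(query_vec, 5)   # top-5 most similar chunks
print(ids[0], scores[0])
```

Hosted databases such as Pinecone, Weaviate, or Qdrant expose the same idea through an API, and add persistence, metadata filtering, and approximate indexes for very large collections.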
LLM (Large Language Model)
Model that generates an answer based on the user's query and the retrieved context information.
Role of the LLM:
- Receives the original query and the retrieved context from the vector database.
- Generates a final answer using this combined input.
Key advantage:
- The LLM is no longer guessing.
- It is responding based on explicit retrieved evidence.
This reduces hallucinations, out‑of‑date answers, and unsupported claims.
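A minimal sketch of this generation step, assuming the OpenAI Python client and a placeholder model name; any chat-style LLM API could be substituted. The key idea is that the retrieved chunks are pasted into the prompt as explicit evidence.

```python
# A minimal sketch of grounded generation; the OpenAI client and the model
# name are assumptions, and any chat-style LLM API could be substituted.
from openai import OpenAI

query = "What are the safety requirements for emergency lighting systems?"
retrieved_chunks = [
    "Emergency lighting must provide at least 90 minutes of illumination.",
    "Luminaires must be tested monthly and the results logged.",
]

# Paste the retrieved chunks into the prompt as explicit evidence.
context = "\n\n".join(retrieved_chunks)
prompt = (
    "Answer the question using ONLY the context below. "
    "If the context is insufficient, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)

client = OpenAI()  # assumes an API key is configured in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

The instruction to answer only from the context, and to admit when the context is insufficient, is what keeps the model from falling back on unsupported guesses.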
How the Full RAG Pipeline Works
- User submits a query
- The embedding model converts the query into a vector
- The vector database retrieves relevant document chunks
- The LLM receives:
  - The query
  - The retrieved chunks as context
- The LLM generates a grounded, context‑aware answer
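Putting the four parts together, here is a compact end-to-end sketch. Retrieval is done with a brute-force cosine search in NumPy rather than a real vector database, the embedding model name is an assumption, and the final LLM call is left out, so this is a sketch of the flow rather than a reference implementation.

```python
# An end-to-end sketch tying the four parts together. Retrieval is a brute-force
# cosine search in NumPy (a stand-in for a real vector database), the embedding
# model name is an assumption, and the final LLM call is omitted.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Indexing (done once): chunk the documents and embed every chunk.
chunks = [
    "Emergency lighting must provide at least 90 minutes of illumination.",
    "Data must be retained for seven years and then securely deleted.",
    "The cafeteria is open from 8 a.m. to 6 p.m.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def build_prompt(query: str, top_k: int = 2) -> str:
    # 1. Embed the query.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    # 2. Retrieve the top-k chunks (dot product equals cosine for normalized vectors).
    scores = chunk_vecs @ q
    best = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(chunks[i] for i in best)
    # 3. Build the grounded prompt; the LLM call itself would go here.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Summarize the company's data retention policy."))
```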
Why RAG Is So Important
RAG is critical for real‑world applications because it allows:
- Use of private or enterprise data.
- Answers based on current information.
- Better factual accuracy.
- Clear traceability to source documents.
This is why RAG is widely used in:
- Enterprise copilots.
- Knowledge bases.
- Customer support systems.
- Legal and compliance tools.
- Technical documentation assistants.
In short:
- Query: what the user asks.
- Embedding model: converts text into vectors.
- Vector database: finds relevant information efficiently.
- LLM: generates the final answer using retrieved context.
Together, these parts transform a language model from a static text generator into a knowledge‑grounded assistant.

(Image credit: Chaoyu Yang)
Bonus
Write down your ideas.