Building a RAG System from Scratch: A Beginner’s Guide
Remember when you had to manually search through tons of documents to find that one specific piece of information? Yeah, those days are over.
In this tutorial, we’ll build a RAG (Retrieval-Augmented Generation) system from scratch. By the end, you’ll have a working system that can answer questions about your documents like magic.
New to RAG? If you want to understand what RAG is and why it’s useful before we dive in, check out my guide: RAG for Beginners: A Simple Guide. It’s a quick read that explains the concept in super simple terms!
We’ll use:
- Docling - to process and understand documents
- LanceDB - to store and search through document chunks
- OpenAI - for embeddings and AI generation
- Streamlit - for the chat interface
Let’s build something cool together!
What We’re Building
We’re going to create a system that can:
- Take a bunch of PDF/Markdown documents (company policies, manuals, etc.)
- Process them into searchable chunks
- Answer questions about those documents through a chat interface
Think of it as giving AI a superpower to instantly find and cite information from your documents.
Prereqs
Before we start, make sure you have:
- Python 3.10 or higher installed
- OpenAI API Key - Get yours at platform.openai.com
- Your API key exported as an environment variable:
export OPENAI_API_KEY="your-api-key-here"
Architecture Overview
Our RAG system works in 5 simple steps:
- Extraction - Convert documents to a format we can process
- Chunking - Split documents into smaller, manageable pieces
- Embedding - Convert chunks to vectors and store in database
- Query - Search for relevant chunks based on user questions
- Chat - Generate AI responses using the retrieved context
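Before we build each piece, here is the whole flow sketched as illustrative Python. The function names are placeholders for the steps, not the exact helpers we'll write:

```python
# Illustrative sketch of the five-step flow; each placeholder
# function corresponds to one step below.
def answer_question(question, documents):
    chunks = []
    for doc in documents:
        extracted = extract(doc)                # 1. Extraction (Docling)
        chunks += split_into_chunks(extracted)  # 2. Chunking
    table = embed_and_store(chunks)             # 3. Embedding (OpenAI + LanceDB)
    context = retrieve(table, question)         # 4. Query
    return generate_answer(context, question)   # 5. Chat (GPT)
```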
Step 1: Document Extraction
First, we use Docling to convert our documents (PDF or Markdown) into a structured format we can work with:
from docling.document_converter import DocumentConverter
# Convert a document
converter = DocumentConverter()
result = converter.convert("path/to/document.pdf")
That’s it! Docling handles:
- Text content extraction
- Page number tracking
- Heading identification
- Table parsing
- Image handling
The result is a structured document object that preserves all this information.
Step 2: Intelligent Chunking
Documents are too large to process all at once. We split them into smaller pieces:
from docling.chunking import HybridChunker
from docling_core.transforms.chunker.tokenizer.huggingface import HuggingFaceTokenizer
from transformers import AutoTokenizer
# Create a chunker with 128 tokens per chunk
tokenizer = HuggingFaceTokenizer(
    tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
    max_tokens=128,
)
chunker = HybridChunker(tokenizer=tokenizer, max_tokens=128)
# Split the document
chunks = list(chunker.chunk(dl_doc=result.document))
The HybridChunker intelligently splits at logical boundaries (paragraphs, sections) rather than randomly cutting text.
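To build intuition for why boundary-aware splitting matters, here's a toy comparison in plain Python. It splits on characters and paragraphs instead of tokens and document structure, so it's a rough illustration of the idea, not what HybridChunker actually does:

```python
def naive_split(text, size):
    # Cuts every `size` characters, even mid-word or mid-sentence.
    return [text[i:i + size] for i in range(0, len(text), size)]

def boundary_split(text, size):
    # Greedily packs whole paragraphs up to `size` characters per chunk,
    # so no chunk starts or ends mid-thought.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > size:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "one two.\n\nthree four.\n\nfive six."
print(naive_split(doc, 20))     # cuts across paragraph boundaries
print(boundary_split(doc, 20))  # keeps paragraphs intact
```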
Step 3: Creating Vector Embeddings
We convert text into numbers (vectors) that capture meaning:
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector
# Setup OpenAI embeddings
func = get_registry().get("openai").create(name="text-embedding-3-small")
# Define the table schema; LanceDB fills `vector` by embedding `text`
class ChunkMetadata(LanceModel):
    filename: str | None
    page_numbers: list[int] | None
    title: str | None
class Chunks(LanceModel):
    text: str = func.SourceField()
    vector: Vector(func.ndims()) = func.VectorField()
    metadata: ChunkMetadata
# Create LanceDB database
db = lancedb.connect("data/lancedb")
table = db.create_table("documents", schema=Chunks, mode="overwrite")
What are embeddings?
- An embedding is a vector of numbers that captures a text's meaning
- Similar texts get similar vectors
- This lets us search semantically (“hotels” finds “accommodation”)
- LanceDB handles all the complexity automatically
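Here's the “similar texts have similar numbers” idea in miniature, using made-up 3-dimensional vectors (real text-embedding-3-small vectors have 1,536 dimensions; the numbers below are invented purely for illustration):

```python
import math

def cosine_similarity(a, b):
    # Similarity of two vectors: 1.0 = same direction, near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three phrases
hotels = [0.9, 0.1, 0.2]
accommodation = [0.85, 0.15, 0.25]
bicycles = [0.1, 0.9, 0.1]

print(cosine_similarity(hotels, accommodation))  # high (~0.996)
print(cosine_similarity(hotels, bicycles))       # low (~0.24)
```

A vector search engine like LanceDB does essentially this comparison, just at scale and with real embeddings.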
Step 4: Storing with Metadata
We store each chunk along with important information:
processed_chunks = [
    {
        "text": chunk.text,
        "metadata": {
            "filename": chunk.meta.origin.filename,
            # Collect page numbers from the chunk's provenance info
            "page_numbers": sorted({
                prov.page_no
                for item in chunk.meta.doc_items
                for prov in item.prov
            }) or None,
            "title": chunk.meta.headings[0] if chunk.meta.headings else None,
        },
    }
    for chunk in chunks
]
table.add(processed_chunks)
Why metadata matters:
- Users can see where information came from
- Enables citations (e.g., “See page 42 of network-config.pdf”)
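As a small example of putting that metadata to work, here's a helper (my own, not part of any library) that turns stored metadata into the kind of citation shown above:

```python
def format_citation(metadata):
    # Build a human-readable citation from the metadata stored in Step 4.
    if metadata.get("page_numbers"):
        pages = ", ".join(str(p) for p in metadata["page_numbers"])
        return f"See page {pages} of {metadata['filename']}"
    return f"See {metadata['filename']}"

print(format_citation({"filename": "network-config.pdf", "page_numbers": [42]}))
# See page 42 of network-config.pdf
```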
Step 5: Querying the Database
When a user asks a question, we find relevant chunks:
# Search for relevant chunks
results = table.search(question).limit(20).to_pandas()
# Build context with citations
context = ""
for _, row in results.iterrows():
    source = f"\nSource: {row['metadata']['filename']}"
    if row['metadata']['page_numbers']:
        source += f" - p. {', '.join(str(p) for p in row['metadata']['page_numbers'])}"
    context += row['text'] + source + "\n\n"
This gives us:
- Top 20 most relevant chunks
- With file names and page numbers
- In order of relevance
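It's handy to wrap this lookup in a function, since the chat interface in Step 7 calls a helper like this. Here's one possible sketch (the function names and split are my own):

```python
def build_context(rows):
    # rows: list of dicts with "text" and "metadata" keys, i.e. the
    # search results once converted out of the DataFrame.
    context = ""
    for row in rows:
        source = f"\nSource: {row['metadata']['filename']}"
        if row["metadata"].get("page_numbers"):
            pages = ", ".join(str(p) for p in row["metadata"]["page_numbers"])
            source += f" - p. {pages}"
        context += row["text"] + source + "\n\n"
    return context

def get_context(question, table, num_results=20):
    # Search LanceDB (assumes `table` from Step 3) and format the hits.
    results = table.search(question).limit(num_results).to_pandas()
    return build_context(results.to_dict("records"))
```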
Step 6: Generating AI Responses
Now we ask the AI to answer based on what we found:
from openai import OpenAI
client = OpenAI()
# Create a prompt with the context
prompt = f"""Answer the question using only this context:
{context}
Question: {question}
"""
# Get response from GPT
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.5,
)
answer = response.choices[0].message.content
The AI:
- Reads only the provided context
- Answers based on document evidence
- Includes citations naturally
- Uses a lower temperature (0.5) for factual responses
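One way to package this step for reuse in the chat interface is a helper like the sketch below. The system-prompt wording and function names here are my own, not a fixed recipe:

```python
def build_messages(history, context):
    # Prepend a system message that pins the model to the retrieved context.
    system = (
        "Answer based only on the provided context, and cite the "
        "source files and page numbers you use.\n\n"
        f"Context:\n{context}"
    )
    return [{"role": "system", "content": system}] + history

def get_chat_response(history, context):
    from openai import OpenAI  # imported here so build_messages stays standalone
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(history, context),
        temperature=0.5,
    )
    return response.choices[0].message.content
```

Passing the whole chat history (not just the latest question) lets the model handle follow-up questions naturally.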
Step 7: Building the Chat Interface
We use Streamlit to create a user-friendly interface:
import streamlit as st
# Initialize chat history on first run
if "messages" not in st.session_state:
    st.session_state.messages = []
# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
# Get user input
if prompt := st.chat_input("Ask a question"):
    # Add to chat history and display it
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    # Get context from database (get_context wraps the search from Step 5)
    context = get_context(prompt, table)
    # Generate response (get_chat_response wraps the call from Step 6)
    response = get_chat_response(st.session_state.messages, context)
    # Display response and remember it for the next turn
    with st.chat_message("assistant"):
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})
This creates:
- A clean chat interface
- Real-time streaming responses
- Conversation history
- Visual search results
The complete code is available on my GitHub.
Try it out and make it your own!