# Not Every RAG System Needs a Vector Database

AI/ML & MLOps Engineer. I build production pipelines and LLM systems. Writing about real-world AI engineering.

Everyone building RAG systems starts the same way.

Document → Chunks → Embeddings → Vector Database → Similarity Search → LLM

That pipeline works. But it is not the only way to retrieve information. And for many real-world documents, it is not even the best way.

This post covers two retrieval approaches — Vector RAG and Vectorless RAG — what they are, how they work, when to use each, and how production systems combine both.


What is Traditional Vector RAG

Document → Chunking → Embeddings → Vector DB → Similarity Search → LLM

The retriever converts your question into an embedding (a vector of numbers) and returns the chunks whose embeddings are closest to it, typically measured by cosine similarity or L2 distance.

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

documents = [
    Document(page_content="Refund policy: returns within 30 days of purchase."),
    Document(page_content="Reset password: go to settings and click forgot password."),
    Document(page_content="Subscription plans start from $9.99 per month."),
]

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_documents(documents)

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embedding_model)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

results = retriever.invoke("What is the refund policy?")

Vector RAG finds text that looks similar to the question. It works well for general knowledge bases, customer support, and multi-document search.
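
To make "mathematically similar" concrete, here is a minimal sketch that ranks the same three chunks by cosine similarity. The `embed` function is a stand-in assumed for illustration: it counts words instead of calling a real embedding model, so it captures word overlap rather than meaning. The scoring and top-k ranking are the parts that carry over to a real vector store.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

chunks = [
    "Refund policy: returns within 30 days of purchase.",
    "Reset password: go to settings and click forgot password.",
    "Subscription plans start from $9.99 per month.",
]

def top_k(question: str, k: int = 2) -> list[str]:
    """Score every chunk against the question and keep the k best."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]

print(top_k("What is the refund policy?", k=1))
```

A real embedding model replaces `embed`, and a vector index replaces the linear scan in `top_k`. The retrieval decision is still "highest score wins", which is why similar-looking text can outrank the text that actually answers the question.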

The problem — similarity is not always relevance. A query about a specific policy may return several related chunks while missing the exact section that contains the answer.


What is Vectorless RAG

Vectorless RAG does not use embeddings or a vector database. Instead of searching by similarity, it navigates the document structure itself.
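
Navigating structure assumes the structure exists as data first. The sketch below shows one way to get it, assuming a Markdown-style document where `# ` marks a chapter and `## ` marks a section. The function name `build_tree` and the sample text are made up for illustration; real documents (PDFs, DOCX) need their own heading extraction.

```python
def build_tree(markdown_text: str) -> dict:
    """Turn '# chapter' / '## section' headings into a nested dict.

    Body lines are joined into the text of the most recent section.
    Lines that appear before any section heading are ignored.
    """
    tree: dict = {}
    chapter = None
    section = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            if chapter is None:
                # Section with no chapter above it: file it under a placeholder
                chapter = "Untitled"
                tree[chapter] = {}
            section = line[3:].strip()
            tree[chapter][section] = ""
        elif line.startswith("# "):
            chapter = line[2:].strip()
            tree[chapter] = {}
            section = None
        elif line and chapter is not None and section is not None:
            tree[chapter][section] = (tree[chapter][section] + " " + line).strip()
    return tree

sample = """
# Leave Policy
## Annual Leave
18 days per year.
Carry forward max 5 days.
## Sick Leave
12 days per year.
# Compensation
## Bonuses
Annual bonus paid in April.
"""

tree = build_tree(sample)
print(list(tree.keys()))
print(tree["Leave Policy"]["Annual Leave"])
```

The result has the same shape as the nested dictionaries used in the routing examples in this post: chapter names at the top, section names below, section text at the leaves.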

There are four levels. You do not need all four — use only what your document complexity requires.


Which Levels Do You Actually Need?

Level 1 — Simple Keyword Routing
Level 2 — LLM-Based Routing        ← always use this over Level 1
Level 3 — Hierarchical Navigation   ← add this for large documents
Level 4 — Precise Section Retrieval ← add this for long section content

| Document Type | Levels to Use |
| --- | --- |
| Small doc, simple sections | Level 2 only |
| Large doc with chapters and sections | Level 2 + Level 3 |
| Large doc with long paragraph content | Level 2 + Level 3 + Level 4 |

Level 1 is only explained here to show how routing evolved. In practice, always start with Level 2.


Level 1 — Simple Keyword Routing (Vectorless)

The simplest form. If the question contains a known keyword, route directly to that section.

document_index = {
    "refund":   "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price":    "Subscription plans start from $9.99 per month."
}

def keyword_router(question: str) -> str:
    for keyword, content in document_index.items():
        if keyword in question.lower():
            return content
    return "Section not found"

print(keyword_router("What is the refund policy?"))

Fast and deterministic. Breaks when the user uses synonyms — "I want to return a product" does not contain the word "refund".
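
The failure is easy to reproduce, and so is the usual stopgap: an alias table. The sketch below is an assumption-laden illustration, not a recommendation. The `aliases` mapping and the `alias_router` name are hypothetical, and someone has to maintain that list by hand.

```python
document_index = {
    "refund":   "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price":    "Subscription plans start from $9.99 per month.",
}

# Hypothetical hand-maintained synonyms for each section key
aliases = {
    "refund":   ["refund", "return", "money back"],
    "password": ["password", "login", "sign in"],
    "price":    ["price", "pricing", "cost", "plan"],
}

def alias_router(question: str) -> str:
    """Route to the first section whose alias appears in the question."""
    lowered = question.lower()
    for section, words in aliases.items():
        if any(word in lowered for word in words):
            return document_index[section]
    return "Section not found"

print(alias_router("I want to return a product"))
print(alias_router("Where is your office?"))
```

This stays fast and deterministic, but every new phrasing means another manual entry, and substring matching can misfire on unrelated words. That maintenance cost is the reason to move to Level 2.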


Level 2 — LLM-Based Routing (Vectorless)

Replace keyword matching with an LLM that understands meaning. The LLM reads the question and decides which section to go to — no embeddings involved.

from langchain_groq import ChatGroq

document_index = {
    "Refund Policy":   "Returns allowed within 30 days of purchase.",
    "Password Reset":  "Go to settings and click forgot password.",
    "Pricing Plans":   "Subscription plans start from $9.99 per month."
}

llm = ChatGroq(model="llama-3.3-70b-versatile")

def llm_router(question: str) -> str:
    section_names = list(document_index.keys())

    response = llm.invoke(f"""
    Available sections: {section_names}
    Question: {question}
    Which section contains the answer? Return section name only.
    """)

    section = response.content.strip()
    return document_index.get(section, "Section not found")

print(llm_router("I want to return a product I bought"))
# LLM understands "return" = Refund Policy
# Returns → "Returns allowed within 30 days of purchase."

Now synonyms work. "Return a product", "get my money back", "cancel purchase": the LLM can map all of them to the right section.

Use this when your document is small and has a flat list of sections.
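
One practical caveat: the router only works if the LLM's reply matches a section name exactly. Models sometimes add quotes, a trailing period, or different casing. A minimal sketch of a normalization step, using Python's standard `difflib`; the helper name `resolve_section` and the 0.8 cutoff are assumptions for illustration.

```python
import difflib
from typing import Optional

def resolve_section(raw_answer: str, section_names: list[str]) -> Optional[str]:
    """Map a free-text LLM reply onto one of the known section names.

    Returns None when nothing is close enough, so the caller can fall back.
    """
    cleaned = raw_answer.strip().strip("\"'.` ").lower()
    by_lower = {name.lower(): name for name in section_names}
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Fuzzy fallback for small typos or extra words
    close = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=0.8)
    return by_lower[close[0]] if close else None

names = ["Refund Policy", "Password Reset", "Pricing Plans"]
print(resolve_section('"Refund Policy".', names))
print(resolve_section("Something else entirely", names))
```

With a helper like this, a `None` result can map to "Section not found" instead of a silent miss caused by stray punctuation.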


Level 3 — Hierarchical Navigation (Vectorless)

For large documents (contracts, SOPs, financial filings), a flat list of sections is not enough. The document has structure — chapters contain sections, sections contain paragraphs. Hierarchical navigation traverses this tree level by level.

document_tree = {
    "Employment": {
        "Joining Process":  "Submit documents within 7 days of joining.",
        "Probation Period": "6 months probation for all new hires.",
        "Working Hours":    "9am to 6pm, Monday to Friday."
    },
    "Compensation": {
        "Salary Structure": "CTC split into basic, HRA, and allowances.",
        "Bonuses":          "Annual bonus paid in April based on performance.",
        "Deductions":       "PF deducted at 12% of basic salary."
    },
    "Leave Policy": {
        "Annual Leave":    "18 days per year. Carry forward max 5 days.",
        "Sick Leave":      "12 days per year. Medical certificate required.",
        "Maternity Leave": "26 weeks as per government regulations."
    }
}

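def hierarchical_retriever(question: str) -> str:
    # Reuses the `llm` client created in the LLM-based routing example
    # Step 1: ask the LLM to pick the right chapter
    chapters = list(document_tree.keys())
    chapter = llm.invoke(f"""
    Chapters: {chapters}
    Question: {question}
    Which chapter? Return name only.
    """).content.strip()

    # Guard: the LLM may answer with something that is not a chapter name
    if chapter not in document_tree:
        return "Section not found"

    # Step 2: ask the LLM to pick the right section inside that chapter
    sections = list(document_tree[chapter].keys())
    section = llm.invoke(f"""
    Sections in {chapter}: {sections}
    Question: {question}
    Which section? Return name only.
    """).content.strip()

    # .get avoids a KeyError when the answer is not a valid section name
    return document_tree[chapter].get(section, "Section not found")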

print(hierarchical_retriever("How many annual leaves do I get?"))
# Chapter → Leave Policy
# Section → Annual Leave
# Returns → "18 days per year. Carry forward max 5 days."

Add this when your document has chapters, sub-sections, or multiple layers of structure.
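
The two-step version above is fixed at chapter then section. For documents with more layers, the same idea can be written as a loop that descends until it reaches text. The sketch below assumes a nested dict with strings at the leaves. `navigate` and `overlap_chooser` are hypothetical names, and `overlap_chooser` is a word-overlap stand-in for the LLM call so the example runs offline.

```python
import re
from typing import Callable

Chooser = Callable[[str, list], str]

def navigate(tree: dict, question: str, choose: Chooser) -> tuple:
    """Walk a nested dict level by level until a leaf string is reached.

    Returns (content, path). content is None if a level is empty or the
    chooser picks an option that does not exist at that level.
    """
    node = tree
    path = []
    while isinstance(node, dict):
        if not node:
            return None, path
        pick = choose(question, list(node.keys()))
        if pick not in node:
            return None, path
        path.append(pick)
        node = node[pick]
    return node, path

def overlap_chooser(question: str, options: list) -> str:
    """Stand-in for the LLM: pick the option sharing the most words with the question."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))
    return max(options, key=lambda o: len(q_words & set(o.lower().split())))

tree = {
    "Compensation": {"Bonuses": "Annual bonus paid in April based on performance."},
    "Leave Policy": {
        "Annual Leave": "18 days per year. Carry forward max 5 days.",
        "Sick Leave": "12 days per year. Medical certificate required.",
    },
}

content, path = navigate(tree, "How many days of annual leave do I get?", overlap_chooser)
print(path)
print(content)
```

Swapping `overlap_chooser` for a function that calls an LLM keeps the traversal logic unchanged. The returned `path` also gives you a trace of which branches were taken.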


Level 4 — Precise Section Retrieval (Vectorless)

After navigation finds the right section, extract the exact sentence that answers the question — not the whole section.

def precise_retrieval(section_content: str, question: str) -> str:
    response = llm.invoke(f"""
    Section content: {section_content}
    Question: {question}
    Extract only the sentence that directly answers the question.
    Return that sentence only.
    """)
    return response.content.strip()

def full_vectorless_rag(question: str) -> str:
    section_content = hierarchical_retriever(question)
    exact_answer = precise_retrieval(section_content, question)
    return exact_answer

print(full_vectorless_rag("How many annual leaves do I get?"))
# → "18 days per year."

Add this when your section content is long and contains multiple facts. If your sections are already short (1-2 lines), this step is not needed.
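
That rule of thumb can be encoded so the extra LLM call only happens when it is worth it. A minimal sketch: `maybe_extract`, the naive sentence splitter, and the two-sentence threshold are assumptions for illustration. `overlap_extractor` is a stand-in for the LLM extraction step, and the sample section text is made up.

```python
import re
from typing import Callable

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def maybe_extract(section_content: str, question: str,
                  extract: Callable[[str, str], str],
                  max_sentences: int = 2) -> str:
    """Skip the extraction step when the section is already short."""
    if len(split_sentences(section_content)) <= max_sentences:
        return section_content
    return extract(section_content, question)

def overlap_extractor(section_content: str, question: str) -> str:
    """Stand-in for the LLM: return the sentence sharing the most words with the question."""
    q_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return max(
        split_sentences(section_content),
        key=lambda s: len(q_words & set(re.findall(r"[a-z0-9]+", s.lower()))),
    )

short_section = "18 days per year. Carry forward max 5 days."
long_section = (
    "Annual leave is 18 days per year. Carry forward is capped at 5 days. "
    "Leave must be approved by your manager. "
    "Unused leave above the cap lapses in December."
)

print(maybe_extract(short_section, "How many days can I carry forward?", overlap_extractor))
print(maybe_extract(long_section, "How many days can I carry forward?", overlap_extractor))
```

Short sections pass through untouched, which saves one model call per query on documents that are already fine-grained.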


When to Use Which Approach

| Situation | Approach |
| --- | --- |
| General Q&A, chatbots | Vector RAG |
| Legal contracts, financial filings | Vectorless RAG |
| SOPs, enterprise policy documents | Vectorless RAG |
| Large unstructured knowledge bases | Vector RAG |
| Production enterprise systems | Both combined |

The Combined Pipeline

The best production systems use both. Vector search handles unstructured content. Structural navigation handles documents with known layouts.

User Query
    ↓
Metadata Filters      — what type of document is this?
    ↓
Document Routing      — which document has the answer?
    ↓
Structural Navigation — which section inside that document?
    ↓
Vector Search         — find the exact chunk in that section
    ↓
Reranking             — sort by most relevant
    ↓
LLM                   — generate final answer
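
The stages above can be sketched as a chain of narrowing steps. Everything here is a simplification under stated assumptions: the `Chunk` fields, the `retrieve` function, and the sample data are hypothetical. Word overlap stands in for both the LLM routing calls and the vector scoring, and the caller supplies the document type for the metadata filter.

```python
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_type: str  # metadata used for filtering, e.g. "policy"
    doc_id: str
    section: str
    text: str

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query: str, text: str) -> int:
    """Stand-in score: number of shared words (a real system would use an LLM or embeddings)."""
    return len(tokens(query) & tokens(text))

def best_match(query: str, options: list[str]) -> str:
    return max(options, key=lambda option: overlap(query, option))

def retrieve(query: str, chunks: list[Chunk], doc_type: str, k: int = 2) -> list[Chunk]:
    # 1. Metadata filter: keep only the requested document type
    pool = [c for c in chunks if c.doc_type == doc_type]
    if not pool:
        return []
    # 2. Document routing: pick one document
    doc_id = best_match(query, list(dict.fromkeys(c.doc_id for c in pool)))
    pool = [c for c in pool if c.doc_id == doc_id]
    # 3. Structural navigation: pick one section inside that document
    section = best_match(query, list(dict.fromkeys(c.section for c in pool)))
    pool = [c for c in pool if c.section == section]
    # 4 and 5. Score chunks inside the section, then sort so the best come first
    ranked = sorted(pool, key=lambda c: overlap(query, c.text), reverse=True)
    return ranked[:k]

chunks = [
    Chunk("policy", "leave-policy", "Annual Leave", "18 days per year."),
    Chunk("policy", "leave-policy", "Annual Leave", "Carry forward max 5 days."),
    Chunk("policy", "leave-policy", "Sick Leave", "12 days per year. Medical certificate required."),
    Chunk("policy", "compensation-policy", "Bonuses", "Annual bonus paid in April based on performance."),
    Chunk("faq", "support-faq", "Password Reset", "Go to settings and click forgot password."),
]

query = "How many days of annual leave can I carry forward?"
for chunk in retrieve(query, chunks, doc_type="policy"):
    print(chunk.section, "|", chunk.text)
```

Each step shrinks the candidate pool before the similarity search runs, so the final scoring only compares chunks that already belong to the right document and section. The returned chunks would then go to the LLM as context.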

Quick Reference

| | Vector RAG | Vectorless RAG |
| --- | --- | --- |
| Retrieval method | Similarity search | Structure navigation |
| Needs embeddings | Yes | No |
| Best for | Unstructured docs | Structured documents |
| Retrieval type | Probabilistic | Deterministic |
| Explainability | Low | High |

Final Thought

The quality of a RAG system is decided long before the LLM generates the answer.

Retrieval is the intelligence layer. Most teams optimize the LLM and ignore the retrieval. The teams that get RAG right in production are the ones who pick the right retrieval strategy for the right document type.

Build the retrieval layer first. The LLM will handle the rest.


If this was useful, follow for more production AI content. I write about what actually happens when AI systems meet real data at scale — not toy examples, not tutorials.


About the Author

Sai Lokesh Devathi — AI/ML & MLOps Engineer.

I build production LLM systems, RAG pipelines, and ML infrastructure on Databricks and AKS. Currently working on real-world AI engineering problems — fraud detection, document intelligence, and agentic pipelines.

  • Email: devathilokesh2001@gmail.com

  • LinkedIn: linkedin.com/in/sailokesh-datascience-aiml

  • GitHub: github.com/devathisailokesh

  • Portfolio: sailokesh-devathi.netlify.app
