# Not Every RAG System Needs a Vector Database

Most teams building RAG systems start the same way.

Document → Chunks → Embeddings → Vector Database → Similarity Search → LLM

That pipeline works. But it is not the only way to retrieve information. And for many real-world documents, it is not even the best way.

This post covers two retrieval approaches — Vector RAG and Vectorless RAG — what they are, how they work, when to use each, and how production systems combine both.

What is Traditional Vector RAG

Document → Chunking → Embeddings → Vector DB → Similarity Search → LLM

The retriever converts your question into an embedding (a vector of numbers) and returns the chunks whose embeddings are closest to it, using a distance measure such as cosine similarity or L2 distance.

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

documents = [
    Document(page_content="Refund policy: returns within 30 days of purchase."),
    Document(page_content="Reset password: go to settings and click forgot password."),
    Document(page_content="Subscription plans start from $9.99 per month."),
]

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_documents(documents)

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embedding_model)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

results = retriever.invoke("What is the refund policy?")
for doc in results:
    print(doc.page_content)  # the two chunks closest to the question

Vector RAG finds text that is semantically close to the question. It works well for general knowledge bases, customer support, and multi-document search.

The problem — similarity is not always relevance. A query about a specific policy may return several related chunks while missing the exact section that contains the answer.
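
To make that failure mode concrete, here is a minimal sketch that uses word-count vectors as a stand-in for real embeddings. That is an assumption for illustration only: an embedding model captures far more meaning than word overlap, but the same kind of mismatch can still occur. The chunk names and the delivery text are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn text into a word-count vector (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, chunks: dict[str, str]) -> list[tuple[str, float]]:
    """Score every chunk against the query, highest similarity first."""
    q = bag_of_words(query)
    scores = [(name, cosine(q, bag_of_words(text))) for name, text in chunks.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical chunks: one holds the answer, the other only sounds like the question.
chunks = {
    "refund_policy": "Refund policy: returns within 30 days of purchase.",
    "delivery_times": "How long do I have to wait for delivery of an item?",
}

ranking = rank("How long do I have to return an item?", chunks)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The delivery chunk ranks first because it shares most of its wording with the question, while the chunk that actually answers it shares none.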

What is Vectorless RAG

Vectorless RAG does not use embeddings or a vector database. Instead of searching by similarity, it navigates the document structure itself.

There are four levels. You do not need all four — use only what your document complexity requires.

Which Levels Do You Actually Need?

Level 1 — Simple Keyword Routing
Level 2 — LLM-Based Routing        ← always use this over Level 1
Level 3 — Hierarchical Navigation   ← add this for large documents
Level 4 — Precise Section Retrieval ← add this for long section content

| Document Type | Levels to Use |
| --- | --- |
| Small doc, simple sections | Level 2 only |
| Large doc with chapters and sections | Level 2 + Level 3 |
| Large doc with long paragraph content | Level 2 + Level 3 + Level 4 |

Level 1 is only explained here to show how routing evolved. In practice, always start with Level 2.
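
The selection rules above can be sketched as a small helper. The function name and its two boolean flags are hypothetical; deciding what counts as "large" or "long" for your documents is left to you.

```python
def levels_needed(has_chapters: bool, long_sections: bool) -> list[int]:
    """Return the Vectorless RAG levels suggested for a document.

    Level 2 (LLM-based routing) is always included. Level 3 is added for
    documents with chapter/section structure, Level 4 for long section content.
    """
    levels = [2]
    if has_chapters:
        levels.append(3)
    if long_sections:
        levels.append(4)
    return levels


print(levels_needed(has_chapters=False, long_sections=False))
print(levels_needed(has_chapters=True, long_sections=False))
print(levels_needed(has_chapters=True, long_sections=True))
```

The three calls correspond to the three rows of the table.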

Level 1 — Simple Keyword Routing `Vectorless`

The simplest form. If the question contains a known keyword, route directly to that section.

document_index = {
    "refund":   "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price":    "Subscription plans start from $9.99 per month."
}

def keyword_router(question: str) -> str:
    for keyword, content in document_index.items():
        if keyword in question.lower():
            return content
    return "Section not found"

print(keyword_router("What is the refund policy?"))

Fast and deterministic. Breaks when the user uses synonyms — "I want to return a product" does not contain the word "refund".
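
A common stopgap is to maintain a list of aliases per section. The sketch below is a hypothetical variation on the router above, not something you should rely on: the alias lists are made up for illustration, and keeping them complete by hand is exactly the maintenance burden that motivates Level 2. It also matches on word boundaries, so "price" does not fire on "priceless".

```python
import re

SECTIONS = {
    "refund": "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price": "Subscription plans start from $9.99 per month.",
}

# Hypothetical alias lists; every new phrasing has to be added by hand.
ALIASES = {
    "refund": ["refund", "return", "money back"],
    "password": ["password", "login"],
    "price": ["price", "pricing", "cost"],
}


def alias_router(question: str) -> str:
    """Route to a section if any of its aliases appears as a whole word."""
    text = question.lower()
    for section, aliases in ALIASES.items():
        for alias in aliases:
            if re.search(rf"\b{re.escape(alias)}\b", text):
                return SECTIONS[section]
    return "Section not found"


print(alias_router("I want to return a product"))
print(alias_router("What a priceless offer"))
```

The first call now reaches the refund section; the second finds no match because no alias appears as a whole word.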

Level 2 — LLM-Based Routing `Vectorless`

Replace keyword matching with an LLM that understands meaning. The LLM reads the question and decides which section to go to — no embeddings involved.

from langchain_groq import ChatGroq

document_index = {
    "Refund Policy":   "Returns allowed within 30 days of purchase.",
    "Password Reset":  "Go to settings and click forgot password.",
    "Pricing Plans":   "Subscription plans start from $9.99 per month."
}

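# ChatGroq reads the API key from the GROQ_API_KEY environment variable.
# temperature=0 keeps routing decisions as repeatable as possible.
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)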

def llm_router(question: str) -> str:
    section_names = list(document_index.keys())

    response = llm.invoke(f"""
    Available sections: {section_names}
    Question: {question}
    Which section contains the answer? Return section name only.
    """)

    section = response.content.strip()
    return document_index.get(section, "Section not found")

print(llm_router("I want to return a product I bought"))
# LLM understands "return" = Refund Policy
# Returns → "Returns allowed within 30 days of purchase."

Now synonyms work: "return a product", "get my money back", and "cancel purchase" can all be mapped to the right section by the LLM.

Use this when your document is small and has a flat list of sections.
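
One fragile spot in the router is the exact-match lookup: if the model answers with quotes, a trailing period, or different casing, `document_index.get(section, ...)` misses. A minimal sketch of a more forgiving lookup, assuming a hypothetical helper named `resolve_section` and a fuzzy-match cutoff of 0.8 that you would tune for your own section names:

```python
import difflib
from typing import Optional


def resolve_section(raw: str, section_names: list[str]) -> Optional[str]:
    """Map a raw LLM answer onto one of the known section names, or None."""
    # Drop surrounding whitespace, quotes, backticks, and trailing periods.
    cleaned = raw.strip().strip("\"'`.").strip().lower()
    lookup = {name.lower(): name for name in section_names}

    if cleaned in lookup:
        return lookup[cleaned]

    # Fall back to the closest known name if it is similar enough.
    close = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=0.8)
    return lookup[close[0]] if close else None


names = ["Refund Policy", "Password Reset", "Pricing Plans"]
print(resolve_section(' "refund policy".\n', names))
print(resolve_section("Refunds Policy", names))
print(resolve_section("Shipping", names))
```

Returning `None` for anything that is not close to a known section lets the caller decide whether to retry, ask a clarifying question, or fall back to another retriever.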

Level 3 — Hierarchical Navigation `Vectorless`

For large documents (contracts, SOPs, financial filings), a flat list of sections is not enough. The document has structure — chapters contain sections, sections contain paragraphs. Hierarchical navigation traverses this tree level by level.

document_tree = {
    "Employment": {
        "Joining Process":  "Submit documents within 7 days of joining.",
        "Probation Period": "6 months probation for all new hires.",
        "Working Hours":    "9am to 6pm, Monday to Friday."
    },
    "Compensation": {
        "Salary Structure": "CTC split into basic, HRA, and allowances.",
        "Bonuses":          "Annual bonus paid in April based on performance.",
        "Deductions":       "PF deducted at 12% of basic salary."
    },
    "Leave Policy": {
        "Annual Leave":    "18 days per year. Carry forward max 5 days.",
        "Sick Leave":      "12 days per year. Medical certificate required.",
        "Maternity Leave": "26 weeks as per government regulations."
    }
}

def hierarchical_retriever(question: str) -> str:
    # Reuses the `llm` client defined in the Level 2 example.

    # Step 1: pick the right chapter
    chapters = list(document_tree.keys())
    chapter = llm.invoke(f"""
    Chapters: {chapters}
    Question: {question}
    Which chapter? Return name only.
    """).content.strip()
    if chapter not in document_tree:
        return "Section not found"  # the LLM answered with a name outside the list

    # Step 2: pick the right section inside that chapter
    sections = list(document_tree[chapter].keys())
    section = llm.invoke(f"""
    Sections in {chapter}: {sections}
    Question: {question}
    Which section? Return name only.
    """).content.strip()
    if section not in document_tree[chapter]:
        return "Section not found"

    return document_tree[chapter][section]

print(hierarchical_retriever("How many annual leaves do I get?"))
# Chapter → Leave Policy
# Section → Annual Leave
# Returns → "18 days per year. Carry forward max 5 days."

Add this when your document has chapters, sub-sections, or multiple layers of structure.
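
The two-step function above is fixed at two levels. For deeper trees, the same idea can be written as a loop that keeps choosing a child until it reaches a leaf. The sketch below is an assumption-laden illustration: `navigate` and `overlap_chooser` are hypothetical names, and the chooser uses plain word overlap so the example runs without an API key. In practice you would pass a chooser that asks the LLM instead.

```python
from typing import Callable

Chooser = Callable[[str, list[str]], str]


def navigate(tree: dict, question: str, choose: Chooser) -> tuple[list[str], str]:
    """Walk a nested dict of headings until a leaf (string) is reached.

    Returns the path of headings taken and the leaf content.
    """
    path: list[str] = []
    node = tree
    while isinstance(node, dict):
        picked = choose(question, list(node.keys()))
        if picked not in node:
            raise ValueError(f"Chooser returned an unknown heading: {picked!r}")
        path.append(picked)
        node = node[picked]
    return path, node


def overlap_chooser(question: str, options: list[str]) -> str:
    """Stand-in for an LLM call: pick the heading sharing the most words."""
    q_words = set(question.lower().replace("?", "").split())
    return max(options, key=lambda o: len(q_words & set(o.lower().split())))


tree = {
    "Compensation": {
        "Bonuses": "Annual bonus paid in April based on performance.",
        "Deductions": "PF deducted at 12% of basic salary.",
    },
    "Leave Policy": {
        "Annual Leave": "18 days per year. Carry forward max 5 days.",
        "Sick Leave": "12 days per year. Medical certificate required.",
    },
}

path, content = navigate(tree, "How much annual leave do I get?", overlap_chooser)
print(" > ".join(path))
print(content)
```

Returning the path alongside the content keeps the retrieval explainable: you can log exactly which chapter and section produced the answer.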

Level 4 — Precise Section Retrieval `Vectorless`

After navigation finds the right section, extract the exact sentence that answers the question — not the whole section.

def precise_retrieval(section_content: str, question: str) -> str:
    response = llm.invoke(f"""
    Section content: {section_content}
    Question: {question}
    Extract only the sentence that directly answers the question.
    Return that sentence only.
    """)
    return response.content.strip()

def full_vectorless_rag(question: str) -> str:
    section_content = hierarchical_retriever(question)
    exact_answer = precise_retrieval(section_content, question)
    return exact_answer

print(full_vectorless_rag("How many annual leaves do I get?"))
# → e.g. "18 days per year." (exact wording depends on the model)

Add this when your section content is long and contains multiple facts. If your sections are already short (1-2 lines), this step is not needed.
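
Because the extraction step is itself an LLM call, it can paraphrase or invent text. One optional safeguard, which is not part of the original pipeline and is offered here only as a sketch, is to check that the extracted sentence really appears in the section and fall back to the full section when it does not. The helper name `grounded_or_fallback` is hypothetical.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace, drop surrounding quotes, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().strip("\"'").lower()


def grounded_or_fallback(section_content: str, extracted: str) -> str:
    """Return the extracted sentence only if it occurs in the section text."""
    candidate = _normalize(extracted)
    if candidate and candidate in _normalize(section_content):
        return extracted.strip().strip("\"'")
    # The model paraphrased or invented text: return the whole section instead.
    return section_content


section = "18 days per year. Carry forward max 5 days."
print(grounded_or_fallback(section, '"18 days per year."'))
print(grounded_or_fallback(section, "20 days per year."))
```

The second call returns the whole section because "20 days per year." does not occur in it.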

When to Use Which Approach

| Situation | Approach |
| --- | --- |
| General Q&A, chatbots | Vector RAG |
| Legal contracts, financial filings | Vectorless RAG |
| SOPs, enterprise policy documents | Vectorless RAG |
| Large unstructured knowledge bases | Vector RAG |
| Production enterprise systems | Both combined |

The Combined Pipeline

Production systems often combine both. Vector search handles unstructured content. Structural navigation handles documents with known layouts.

User Query
    ↓
Metadata Filters      — what type of document is this?
    ↓
Document Routing      — which document has the answer?
    ↓
Structural Navigation — which section inside that document?
    ↓
Vector Search         — find the exact chunk in that section
    ↓
Reranking             — sort by most relevant
    ↓
LLM                   — generate final answer
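
The retrieval stages of this flow can be sketched end to end with toy components. Everything here is an assumption for illustration: the `Chunk` fields, the sample data, and the word-overlap score, which stands in for LLM routing, vector search, and reranking alike. Vector search and reranking are collapsed into one sort, and the final LLM generation step is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_type: str  # metadata used by the filter stage
    doc_name: str
    section: str
    text: str


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared words."""
    return len(words(query) & words(text))


def best_match(query: str, options: list[str]) -> str:
    """Pick the option sharing the most words with the query."""
    return max(sorted(set(options)), key=lambda option: overlap(query, option))


def retrieve(query: str, doc_type: str, chunks: list[Chunk], top_k: int = 1) -> list[Chunk]:
    # 1. Metadata filter: keep only the requested document type.
    pool = [c for c in chunks if c.doc_type == doc_type]
    if not pool:
        return []
    # 2. Document routing: which document has the answer?
    doc = best_match(query, [c.doc_name for c in pool])
    pool = [c for c in pool if c.doc_name == doc]
    # 3. Structural navigation: which section inside that document?
    section = best_match(query, [c.section for c in pool])
    pool = [c for c in pool if c.section == section]
    # 4 and 5. Score the chunks in that section and sort by relevance.
    ranked = sorted(pool, key=lambda c: overlap(query, c.text), reverse=True)
    return ranked[:top_k]


# Hypothetical sample data.
CHUNKS = [
    Chunk("policy", "HR Handbook", "Leave Policy", "Annual leave is 18 days per year."),
    Chunk("policy", "HR Handbook", "Leave Policy", "Sick leave is 12 days per year."),
    Chunk("policy", "HR Handbook", "Compensation", "Annual bonus is paid in April."),
    Chunk("policy", "Travel Guide", "Expenses", "Submit travel receipts within 10 days."),
    Chunk("contract", "Vendor Agreement", "Termination", "Either party may terminate with 30 days notice."),
]

result = retrieve("How many days of annual leave in the HR handbook?", "policy", CHUNKS)
print(result[0].text)
```

Each stage narrows the candidate pool before the next one runs, so the similarity step only has to rank chunks from one section rather than the whole corpus.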

Quick Reference

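| Aspect | Vector RAG | Vectorless RAG |
| --- | --- | --- |
| Retrieval method | Similarity search | Structure navigation |
| Needs embeddings | Yes | No |
| Best for | Unstructured docs | Structured documents |
| Retrieval type | Similarity ranking (approximate) | Deterministic for keyword routing; LLM routing can vary between runs |
| Explainability | Low | High |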

Final Thought

The quality of a RAG system is decided long before the LLM generates the answer.

Retrieval is the intelligence layer. Most teams optimize the LLM and ignore retrieval. The teams that get RAG right in production are the ones that match the retrieval strategy to the document type.

Build the retrieval layer first. The LLM will handle the rest.

If this was useful, follow for more production AI content. I write about what actually happens when AI systems meet real data at scale — not toy examples, not tutorials.

About the Author

Sai Lokesh Devathi — AI/ML & MLOps Engineer.

I build production LLM systems, RAG pipelines, and ML infrastructure on Databricks and AKS. Currently working on real-world AI engineering problems — fraud detection, document intelligence, and agentic pipelines.

Email: devathilokesh2001@gmail.com
LinkedIn: linkedin.com/in/sailokesh-datascience-aiml
GitHub: github.com/devathisailokesh
Portfolio: sailokesh-devathi.netlify.app
