# Not Every RAG System Needs a Vector Database

Everyone building RAG systems starts the same way:

`Document → Chunks → Embeddings → Vector Database → Similarity Search → LLM`

That pipeline works. But it is not the only way to retrieve information. And for many real-world documents, it is not even the best way.
This post covers two retrieval approaches — Vector RAG and Vectorless RAG — what they are, how they work, when to use each, and how production systems combine both.

## What Is Traditional Vector RAG?

`Document → Chunking → Embeddings → Vector DB → Similarity Search → LLM`

The retriever converts your question into an embedding (a vector of numbers) and returns the chunks whose embeddings are closest to it, typically measured by cosine similarity or Euclidean distance.

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

# Toy corpus: one Document per policy snippet
documents = [
    Document(page_content="Refund policy: returns within 30 days of purchase."),
    Document(page_content="Reset password: go to settings and click forgot password."),
    Document(page_content="Subscription plans start from $9.99 per month."),
]

# Split into chunks (these snippets are short, so each stays a single chunk)
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_documents(documents)

# Embed every chunk and index the vectors in an in-memory FAISS store
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embedding_model)

# Retrieve the 2 chunks closest to the question
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
results = retriever.invoke("What is the refund policy?")
for doc in results:
    print(doc.page_content)
```

Vector RAG finds text whose embedding is close to the embedding of the question. It works well for general knowledge bases, customer support, and multi-document search.

The problem: similarity is not always relevance. A query about a specific policy may return several related chunks while missing the exact section that contains the answer.
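
To make the gap between similarity and relevance concrete, here is a deliberately exaggerated toy sketch. It assumes word counts as a stand-in for real embeddings, and the two chunks are hypothetical. A trained embedding model would handle this particular pair better, but the same failure appears in subtler forms.

```python
import math
import re
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use a trained model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical chunks: the first talks *about* the refund policy,
# the second actually *contains* the answer.
chunks = [
    "Our refund policy is reviewed every year by the policy team.",
    "Items can be returned within 30 days of purchase.",
]

def rank(question: str) -> list:
    """Score every chunk against the question, highest similarity first."""
    q = bow_vector(question)
    scored = [(cosine(q, bow_vector(c)), c) for c in chunks]
    return sorted(scored, reverse=True)

ranked = rank("What is the refund policy?")
for score, chunk in ranked:
    print(f"{score:.2f}  {chunk}")
```

The chunk that shares vocabulary with the question ranks first, while the chunk holding the actual answer shares no words with it and scores zero.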

## What Is Vectorless RAG?

Vectorless RAG does not use embeddings or a vector database. Instead of searching by similarity, it navigates the document structure itself.
There are four levels. You do not need all four — use only what your document complexity requires.

### Which Levels Do You Actually Need?

- Level 1: Simple Keyword Routing
- Level 2: LLM-Based Routing (always use this over Level 1)
- Level 3: Hierarchical Navigation (add this for large documents)
- Level 4: Precise Section Retrieval (add this for long section content)

| Document Type | Levels to Use |
|---|---|
| Small doc, simple sections | Level 2 only |
| Large doc with chapters and sections | Level 2 + Level 3 |
| Large doc with long paragraph content | Level 2 + Level 3 + Level 4 |

Level 1 is explained here only to show how routing evolved. In practice, start with Level 2.
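
The table above can be written down as a small helper. This is only a sketch: `levels_needed`, `has_chapters`, and `long_sections` are hypothetical names, and the two flags are something you would set yourself after looking at the document.

```python
def levels_needed(has_chapters: bool, long_sections: bool) -> list:
    """Return the vectorless levels suggested by the table above.

    has_chapters and long_sections are hypothetical flags set by hand
    after inspecting the document.
    """
    levels = [2]              # LLM-based routing is the baseline
    if has_chapters:
        levels.append(3)      # hierarchical navigation for nested structure
    if long_sections:
        levels.append(4)      # precise extraction for long section bodies
    return levels

print(levels_needed(has_chapters=True, long_sections=False))
```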

### Level 1: Simple Keyword Routing (Vectorless)

The simplest form. If the question contains a known keyword, route directly to that section.

```python
# Keyword -> section content
document_index = {
    "refund": "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price": "Subscription plans start from $9.99 per month.",
}

def keyword_router(question: str) -> str:
    """Return the first section whose keyword appears in the question."""
    lowered = question.lower()
    for keyword, content in document_index.items():
        if keyword in lowered:
            return content
    return "Section not found"

print(keyword_router("What is the refund policy?"))
```

Fast and deterministic. Breaks when the user uses synonyms — "I want to return a product" does not contain the word "refund".
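
The failure mode is easy to reproduce. The sketch below repeats the same router so it runs on its own, then tries a synonym and a different word form. The last two questions are illustrative inputs, not taken from a real system.

```python
document_index = {
    "refund": "Returns allowed within 30 days. Contact support@company.com",
    "password": "Go to settings and click forgot password.",
    "price": "Subscription plans start from $9.99 per month.",
}

def keyword_router(question: str) -> str:
    """Return the first section whose keyword appears in the question."""
    lowered = question.lower()
    for keyword, content in document_index.items():
        if keyword in lowered:
            return content
    return "Section not found"

# Direct keyword hit.
hit = keyword_router("What is the refund policy?")
# Synonym: no listed keyword appears in the question, so routing fails.
synonym_miss = keyword_router("I want to return a product")
# Word form: "pricing" does not contain the substring "price".
word_form_miss = keyword_router("What is your pricing?")

print(hit)
print(synonym_miss)
print(word_form_miss)
```

Adding more keywords patches individual cases, but the list grows with every new phrasing users come up with.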

### Level 2: LLM-Based Routing (Vectorless)

Replace keyword matching with an LLM that understands meaning. The LLM reads the question and decides which section to go to — no embeddings involved.

```python
from langchain_groq import ChatGroq

# Section title -> section content
document_index = {
    "Refund Policy": "Returns allowed within 30 days of purchase.",
    "Password Reset": "Go to settings and click forgot password.",
    "Pricing Plans": "Subscription plans start from $9.99 per month.",
}

# Reads the API key from the GROQ_API_KEY environment variable
llm = ChatGroq(model="llama-3.3-70b-versatile")

def llm_router(question: str) -> str:
    """Ask the LLM to pick a section title, then look up its content."""
    section_names = list(document_index.keys())
    response = llm.invoke(f"""
    Available sections: {section_names}
    Question: {question}
    Which section contains the answer? Return section name only.
    """)
    section = response.content.strip()
    # An unknown or malformed section name falls through to the default
    return document_index.get(section, "Section not found")

print(llm_router("I want to return a product I bought"))
# Expected routing: "return" maps to Refund Policy
# Expected result: "Returns allowed within 30 days of purchase."
```

Now synonyms work. "Return a product", "get my money back", "cancel purchase": a capable LLM can map all of them to the right section, as long as the section names are descriptive.
Use this when your document is small and has a flat list of sections.
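
One practical wrinkle: an exact dictionary lookup on the model's reply is brittle, because models sometimes add quotes, a trailing period, or a short sentence around the name. A minimal sketch of a tolerant matcher follows. `resolve_section` is a hypothetical helper, and the `0.8` fuzzy cutoff is an arbitrary starting point, not a recommended value.

```python
import difflib
from typing import Optional

def resolve_section(raw_answer: str, section_names: list) -> Optional[str]:
    """Map a free-form LLM reply onto one of the known section names."""
    cleaned = raw_answer.strip().strip("\"'`.").strip()
    # 1. Exact match, ignoring case.
    by_lower = {name.lower(): name for name in section_names}
    if cleaned.lower() in by_lower:
        return by_lower[cleaned.lower()]
    # 2. A known name embedded in a longer reply.
    for name in section_names:
        if name.lower() in cleaned.lower():
            return name
    # 3. Fuzzy fallback for small typos (cutoff chosen arbitrarily).
    close = difflib.get_close_matches(cleaned, section_names, n=1, cutoff=0.8)
    return close[0] if close else None

names = ["Refund Policy", "Password Reset", "Pricing Plans"]
print(resolve_section('"Refund Policy"\n', names))
print(resolve_section("The section is Password Reset.", names))
print(resolve_section("Shipping", names))
```

Returning `None` instead of guessing lets the caller decide what to do with an unroutable question, for example falling back to vector search.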

### Level 3: Hierarchical Navigation (Vectorless)

For large documents (contracts, SOPs, financial filings), a flat list of sections is not enough. The document has structure — chapters contain sections, sections contain paragraphs. Hierarchical navigation traverses this tree level by level.

```python
# Chapter -> section -> content
document_tree = {
    "Employment": {
        "Joining Process": "Submit documents within 7 days of joining.",
        "Probation Period": "6 months probation for all new hires.",
        "Working Hours": "9am to 6pm, Monday to Friday.",
    },
    "Compensation": {
        "Salary Structure": "CTC split into basic, HRA, and allowances.",
        "Bonuses": "Annual bonus paid in April based on performance.",
        "Deductions": "PF deducted at 12% of basic salary.",
    },
    "Leave Policy": {
        "Annual Leave": "18 days per year. Carry forward max 5 days.",
        "Sick Leave": "12 days per year. Medical certificate required.",
        "Maternity Leave": "26 weeks as per government regulations.",
    },
}

def hierarchical_retriever(question: str) -> str:
    """Pick a chapter, then a section inside it, with one LLM call per step."""
    # Step 1: pick the right chapter (reuses the llm client from Level 2)
    chapters = list(document_tree.keys())
    chapter = llm.invoke(f"""
    Chapters: {chapters}
    Question: {question}
    Which chapter? Return name only.
    """).content.strip()
    # Guard: the LLM may return a name that is not in the tree
    if chapter not in document_tree:
        return "Section not found"

    # Step 2: pick the right section inside that chapter
    sections = list(document_tree[chapter].keys())
    section = llm.invoke(f"""
    Sections in {chapter}: {sections}
    Question: {question}
    Which section? Return name only.
    """).content.strip()
    # .get avoids a KeyError on an unknown section name
    return document_tree[chapter].get(section, "Section not found")

print(hierarchical_retriever("How many annual leaves do I get?"))
# Expected chapter: Leave Policy
# Expected section: Annual Leave
# Expected result: "18 days per year. Carry forward max 5 days."
```

Add this when your document has chapters, sub-sections, or multiple layers of structure.
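
The two-step version above is fixed at two layers. For deeper documents, the same idea can be sketched as a loop over a nested dict of any depth. In the sketch below, `navigate` and `overlap_chooser` are hypothetical names, and the chooser is a word-overlap stand-in for the LLM call so the walk can run offline.

```python
from typing import Callable

def navigate(tree: dict, question: str,
             choose: Callable[[str, list], str]) -> tuple:
    """Walk a nested dict until a leaf string is reached.

    choose(question, options) picks one key per level. In practice it would
    wrap an LLM call; here it is injected so the walk can be tested offline.
    Returns (path, content); content is None if the chooser left the tree.
    """
    path = []
    node = tree
    while isinstance(node, dict):
        options = list(node.keys())
        picked = choose(question, options)
        if picked not in node:      # guard against an unexpected reply
            return path, None
        path.append(picked)
        node = node[picked]
    return path, node

def overlap_chooser(question: str, options: list) -> str:
    """Hypothetical stand-in for the LLM: pick the option sharing the most words."""
    q_words = set(question.lower().replace("?", "").split())
    return max(options, key=lambda o: len(q_words & set(o.lower().split())))

tree = {
    "Compensation": {"Bonuses": "Annual bonus paid in April based on performance."},
    "Leave Policy": {
        "Annual Leave": "18 days per year. Carry forward max 5 days.",
        "Sick Leave": "12 days per year. Medical certificate required.",
    },
}

path, content = navigate(tree, "What is the annual leave policy?", overlap_chooser)
print(" > ".join(path), "->", content)
```

Returning the path alongside the content gives you an audit trail for each answer.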

### Level 4: Precise Section Retrieval (Vectorless)

After navigation finds the right section, extract the exact sentence that answers the question — not the whole section.

```python
def precise_retrieval(section_content: str, question: str) -> str:
    """Ask the LLM for the single sentence that answers the question."""
    response = llm.invoke(f"""
    Section content: {section_content}
    Question: {question}
    Extract only the sentence that directly answers the question.
    Return that sentence only.
    """)
    return response.content.strip()

def full_vectorless_rag(question: str) -> str:
    """Navigate to the section (Level 3), then extract the answer (Level 4)."""
    section_content = hierarchical_retriever(question)
    exact_answer = precise_retrieval(section_content, question)
    return exact_answer

print(full_vectorless_rag("How many annual leaves do I get?"))
# Expected: "18 days per year."
```

Add this when your section content is long and contains multiple facts. If your sections are already short (1-2 lines), this step is not needed.
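
Because the extraction step is an LLM call, the returned sentence may be paraphrased rather than quoted. One possible safeguard, sketched here under the assumption that a naive regex sentence splitter is good enough for your content, is to accept the extraction only when it appears verbatim in the section. `split_sentences` and `grounded_answer` are hypothetical helpers.

```python
import re

def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def grounded_answer(section_content: str, extracted: str) -> str:
    """Return the extracted sentence only if it appears verbatim in the section.

    Falls back to the full section otherwise, so a paraphrased or invented
    sentence is never passed along as if it were a quote.
    """
    candidate = extracted.strip().strip('"')
    if candidate in split_sentences(section_content):
        return candidate
    return section_content

section = "18 days per year. Carry forward max 5 days."
print(grounded_answer(section, '"18 days per year."'))
print(grounded_answer(section, "You get 18 days annually."))
```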

## When to Use Which Approach

| Situation | Approach |
|---|---|
| General Q&A, chatbots | Vector RAG |
| Legal contracts, financial filings | Vectorless RAG |
| SOPs, enterprise policy documents | Vectorless RAG |
| Large unstructured knowledge bases | Vector RAG |
| Production enterprise systems | Both combined |

## The Combined Pipeline

The best production systems use both. Vector search handles unstructured content. Structural navigation handles documents with known layouts.

```text
User Query
    ↓
Metadata Filters: what type of document is this?
    ↓
Document Routing: which document has the answer?
    ↓
Structural Navigation: which section inside that document?
    ↓
Vector Search: find the exact chunk in that section
    ↓
Reranking: sort by most relevant
    ↓
LLM: generate final answer
```

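
The stages above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the `Doc` class, the `retrieve` function, the sample corpus, and above all the word-overlap score, which stands in for what would be an LLM router, a vector search, and a reranker in a real system. The point is the order of narrowing, not the scoring.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    name: str
    doc_type: str     # metadata used for filtering
    sections: dict    # section title -> list of chunk strings

def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(question: str, text: str) -> int:
    """Toy relevance score: number of shared words (stand-in for real scoring)."""
    return len(words(question) & words(text))

def retrieve(question: str, doc_type: str, corpus: list, k: int = 2) -> list:
    # 1. Metadata filter: keep only documents of the requested type.
    candidates = [d for d in corpus if d.doc_type == doc_type]
    if not candidates:
        return []
    # 2. Document routing: pick the document whose section titles match best.
    doc = max(candidates, key=lambda d: overlap(question, " ".join(d.sections)))
    # 3. Structural navigation: pick one section inside that document.
    title = max(doc.sections, key=lambda t: overlap(question, t))
    # 4. Search within that section only (vector search in a real system).
    scored = [(overlap(question, c), c) for c in doc.sections[title]]
    # 5. Rerank: highest score first, then keep the top k chunks.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

corpus = [
    Doc("HR Handbook", "policy", {
        "Annual Leave": ["18 days per year.", "Carry forward max 5 days."],
        "Sick Leave": ["12 days per year.", "Medical certificate required."],
    }),
    Doc("Pricing Sheet", "commercial", {
        "Plans": ["Subscription plans start from $9.99 per month."],
    }),
]

context = retrieve("How many days of annual leave per year?", "policy", corpus, k=1)
print(context)  # this list would be passed to the LLM as context
```

Each stage shrinks the search space before the next one runs, so the final similarity search only has to rank a handful of chunks from one section.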

## Quick Reference

| | Vector RAG | Vectorless RAG |
|---|---|---|
| Retrieval method | Similarity search | Structure navigation |
| Needs embeddings | Yes | No |
| Best for | Unstructured docs | Structured documents |
| Retrieval type | Similarity-ranked (approximate match) | Path-based (deterministic with keyword routing; LLM routing can vary between runs) |
| Explainability | Low | High |

## Final Thought

The quality of a RAG system is decided long before the LLM generates the answer.
Retrieval is the intelligence layer. Many teams optimize the LLM and neglect retrieval. The teams that get RAG right in production are the ones that match the retrieval strategy to the document type.
Build the retrieval layer first. The LLM will handle the rest.

## About the Author

Sai Lokesh Devathi, AI/ML & MLOps Engineer.

I build production LLM systems, RAG pipelines, and ML infrastructure on Databricks and AKS. Currently working on real-world AI engineering problems: fraud detection, document intelligence, and agentic pipelines.

- Email: devathilokesh2001@gmail.com
- LinkedIn: linkedin.com/in/sailokesh-datascience-aiml
- GitHub: github.com/devathisailokesh
- Portfolio: sailokesh-devathi.netlify.app



