RAG in 2025: From basic retrieval to GraphRAG and beyond
RAG has evolved far beyond simple vector search. We map the landscape from basic to advanced techniques.
Jorgo Bardho
Founder, Thread Transfer
Retrieval-Augmented Generation (RAG) went from "vector search + prompt stuffing" in 2023 to an entire ecosystem of specialized techniques by 2025. Teams that stuck with basic RAG are seeing 40-60% accuracy on complex queries. Teams that graduated to Adaptive RAG, GraphRAG, or Self-RAG are pushing 85%+. This guide maps the landscape so you know which technique fits your use case.
RAG fundamentals: The baseline everyone starts with
Traditional RAG is dead simple: chunk documents, embed them into vectors, store in a vector DB (Pinecone, Weaviate, Qdrant), and retrieve top-k matches at query time. You stuff those chunks into the LLM context window alongside the user's question. The model generates an answer grounded in retrieved facts.
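To make the moving parts concrete, here's a minimal, self-contained sketch of that pipeline. The hash-based `embed` function is a toy stand-in so the example actually runs; in practice you'd call a real embedding model and store vectors in one of the databases above.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy deterministic "embedding": hash character trigrams into buckets.
    # Swap in a real embedding model for anything beyond this demo.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(doc: str, size: int = 300) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Index: embed every chunk and keep (vector, text) pairs. A vector DB
# does exactly this job at scale.
docs = ["Refunds are issued within 14 days of purchase when items are unused."]
index = [(embed(c), c) for d in docs for c in chunk(d)]

def retrieve(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    # Vectors are unit-normalized, so a dot product is cosine similarity.
    scored = sorted(index, key=lambda p: sum(a * b for a, b in zip(q, p[0])),
                    reverse=True)
    return [text for _, text in scored[:k]]

# "Prompt stuffing": retrieved chunks plus the question go to the LLM.
context = "\n\n".join(retrieve("What is the refund policy?"))
prompt = f"Answer using only this context:\n{context}\n\nQ: What is the refund policy?"
```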
This works fine for:
- FAQ answering with stable, well-structured docs
- Internal wikis where questions map cleanly to document sections
- Single-hop reasoning ("What is our refund policy?")
But traditional RAG breaks down when queries require multi-hop reasoning, when documents are poorly chunked, or when semantic search returns chunks that are topically similar to the query but don't actually contain the answer.
2025 techniques: Adaptive, Graph, and Self-RAG
Adaptive RAG uses an LLM-powered router to classify query complexity. Simple queries hit a basic retrieval path. Complex queries trigger iterative retrieval: expanding search terms, re-ranking results, or pulling additional rounds of context. This cuts latency on easy questions by 60% while improving accuracy on hard ones by 25%.
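A minimal sketch of the routing logic, assuming `llm` is any prompt-in/text-out callable and `retrieve` is your existing search function (both placeholders); the router prompt and the two-path split are illustrative, not a fixed spec:

```python
from typing import Callable

ROUTER_PROMPT = (
    "Classify this query as SIMPLE (one lookup answers it) or COMPLEX "
    "(needs several facts or reasoning steps). Reply with one word.\n\nQuery: {q}"
)

def adaptive_answer(query: str, llm: Callable[[str], str],
                    retrieve: Callable[[str], list[str]], max_rounds: int = 3) -> str:
    route = llm(ROUTER_PROMPT.format(q=query)).strip().upper()
    if route == "SIMPLE":
        # Fast path: single-shot retrieval keeps latency and cost low.
        context = "\n".join(retrieve(query))
        return llm(f"Context:\n{context}\n\nQ: {query}")
    # Slow path: iterative retrieval, expanding the search each round.
    chunks: list[str] = []
    search = query
    for _ in range(max_rounds):
        chunks += retrieve(search)
        search = llm(
            f"Notes so far:\n{chr(10).join(chunks)}\n"
            f"Suggest one follow-up search that would help answer: {query}\n"
            "Reply NONE if nothing is missing."
        ).strip()
        if search.upper() == "NONE":
            break
    return llm(f"Context:\n{chr(10).join(chunks)}\n\nQ: {query}")
```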
GraphRAG (Microsoft Research) builds a knowledge graph from your corpus before retrieval. Instead of searching flat chunks, it traverses entity relationships and hierarchical summaries. Multi-hop questions that require connecting A → B → C become trivial. GraphRAG outperforms naive RAG by 2-3x on reasoning benchmarks but requires upfront graph construction compute.
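The core idea is easiest to see with a toy graph. The triples below are invented for illustration; a real GraphRAG pipeline extracts them from your corpus with an LLM and layers hierarchical community summaries on top:

```python
from collections import defaultdict

# Hypothetical entity-relationship triples extracted from a corpus.
triples = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "manufactures", "Part X"),
    ("Part X", "used_in", "Product Z"),
]

graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def traverse(entity: str, hops: int = 2) -> list[str]:
    """Collect relationship paths up to `hops` away: the A -> B -> C
    chains that flat chunk retrieval usually misses."""
    paths, frontier = [], [(entity, [])]
    for _ in range(hops):
        next_frontier = []
        for node, path in frontier:
            for rel, obj in graph[node]:
                new_path = path + [f"{node} --{rel}--> {obj}"]
                paths.append(" ; ".join(new_path))
                next_frontier.append((obj, new_path))
        frontier = next_frontier
    return paths

# "How is Acme Corp connected to Product Z?" falls out of a 3-hop walk:
print(traverse("Acme Corp", hops=3))
```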
Self-RAG adds a verification loop: after generating an answer, the model critiques its own output against retrieved evidence. If confidence is low, it triggers another retrieval round with refined queries. This catches hallucinations before they reach users. Self-RAG achieves 92% factual accuracy vs 68% for basic RAG in production logs.
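Here's a sketch of that verify-then-retry loop. Note that the actual Self-RAG paper trains the model to emit reflection tokens; the prompted critique below (with made-up SUPPORTED/UNSUPPORTED labels) only approximates it, and `llm`/`retrieve` are placeholder callables:

```python
from typing import Callable

def self_rag(query: str, llm: Callable[[str], str],
             retrieve: Callable[[str], list[str]], max_tries: int = 2) -> str:
    search = query
    for _ in range(max_tries):
        evidence = "\n".join(retrieve(search))
        answer = llm(f"Context:\n{evidence}\n\nQ: {query}")
        # Critique step: check the draft answer against the evidence.
        verdict = llm(
            f"Evidence:\n{evidence}\n\nAnswer: {answer}\n\n"
            "Is every claim in the answer supported by the evidence? "
            "Reply SUPPORTED or UNSUPPORTED."
        )
        if "UNSUPPORTED" not in verdict.upper():
            return answer  # critique passed, ship it
        # Critique failed: refine the query and retrieve again.
        search = llm(f"Rewrite this as a more precise search query: {query}")
    return "I couldn't verify an answer against the sources."
```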
Implementation guide: Start simple, graduate strategically
Don't jump straight to GraphRAG. Here's the progression we recommend:
- Phase 1: Nail basic RAG. Get chunking right (200-500 tokens with 20% overlap) and use hybrid search (semantic + BM25); see the sketch after this list. Measure retrieval precision and answer accuracy on 50+ test queries.
- Phase 2: Add query augmentation. Expand vague queries with synonyms, or use HyDE (generate a hypothetical answer, then embed it for retrieval; sketched below). This alone boosts accuracy 15-20%.
- Phase 3: Layer in Adaptive RAG. Route simple queries to single-shot retrieval. Send complex queries through iterative loops. Track latency vs accuracy trade-offs.
- Phase 4 (optional): GraphRAG for reasoning-heavy use cases. If your docs contain deep entity relationships (legal contracts, research papers, technical specs), build the graph. Otherwise skip it.
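For Phase 1, here's a minimal sketch of overlapping chunking plus score fusion for hybrid search. Word counts stand in for tokens (a real pipeline would use the model's tokenizer), and the fusion weight `alpha` is an assumption to tune against your eval set:

```python
def chunk_with_overlap(text: str, size: int = 300, overlap: float = 0.2) -> list[str]:
    # 300-word chunks with a 20% (60-word) overlap between neighbors.
    words = text.split()
    step = max(1, int(size * (1 - overlap)))
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def hybrid_score(semantic: float, bm25: float, alpha: float = 0.5) -> float:
    # Weighted fusion of semantic and keyword (BM25) scores. Reciprocal
    # rank fusion is a common alternative when the score scales differ.
    return alpha * semantic + (1 - alpha) * bm25
```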
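And for Phase 2, a HyDE sketch. The trick is to embed a hypothetical answer rather than the raw query, so the search vector lives in answer-space where the relevant chunks are; `llm`, `embed`, and `search_by_vector` are all placeholders for your own stack:

```python
from typing import Callable

def hyde_retrieve(query: str, llm: Callable[[str], str],
                  embed: Callable[[str], list[float]],
                  search_by_vector: Callable[[list[float]], list[str]]) -> list[str]:
    # Generate a plausible (possibly wrong) answer, then search with *its*
    # embedding: hypothetical answers sit closer to real answer chunks
    # than short questions do.
    hypothetical = llm(f"Write a short passage that plausibly answers: {query}")
    return search_by_vector(embed(hypothetical))
```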
When to use what: Decision tree
Use basic RAG when:
- Queries are single-hop and well-scoped
- Documents are clean and structured, and the corpus is small (under 100k documents in total)
- Latency requirements are strict (sub-500ms)
Use Adaptive RAG when:
- Query complexity varies widely (some easy, some multi-step)
- You want to optimize cost and latency without sacrificing accuracy on hard questions
Use GraphRAG when:
- Documents contain heavy entity relationships (contracts, research, technical specs)
- Multi-hop reasoning is the norm, not the exception
- You can afford the upfront graph construction cost
Use Self-RAG when:
- Factual accuracy is non-negotiable (healthcare, legal, finance)
- Hallucination detection must happen before answers reach users
Future directions: Contextual embeddings and agent-powered retrieval
Anthropic's contextual embeddings (late 2024) prepend document context to each chunk before embedding. Instead of embedding "The policy changed in Q3", you embed "[Company XYZ Refund Policy] The policy changed in Q3". This cuts retrieval errors by 35% on ambiguous queries.
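A sketch of the idea, assuming `llm` is a placeholder callable; the exact situating prompt is an assumption (Anthropic's published recipe asks the model to describe how the chunk fits within the whole document):

```python
from typing import Callable

def contextualize(doc: str, chunk: str, llm: Callable[[str], str]) -> str:
    # Ask the model to situate the chunk within its document, then prepend
    # that one-line context before embedding.
    context = llm(
        f"Document:\n{doc[:4000]}\n\nChunk:\n{chunk}\n\n"
        "In one sentence, state what this chunk covers within the document."
    )
    return f"{context}\n{chunk}"  # embed this string, not the bare chunk

# "The policy changed in Q3" becomes something like
# "Describes Company XYZ's refund policy update. The policy changed in Q3",
# which is no longer ambiguous at retrieval time.
```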
Agent-powered retrieval is emerging: instead of one retrieval step, an agent decides when to search, re-rank, expand, or stop. LangGraph and AutoGen both ship agent loops for RAG. Expect this to become table stakes by end of 2025.
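Stripped of any framework, the control flow looks roughly like this; the SEARCH/ANSWER action format is invented for illustration, and LangGraph or AutoGen add state management, tool schemas, and persistence on top of essentially this loop:

```python
from typing import Callable

def agent_rag(question: str, llm: Callable[[str], str],
              retrieve: Callable[[str], list[str]], max_steps: int = 5) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # The agent decides its own next move each turn.
        decision = llm(
            f"Question: {question}\nNotes so far:\n{chr(10).join(notes)}\n"
            "Pick one action: SEARCH:<query> or ANSWER:<final answer>"
        ).strip()
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        if decision.startswith("SEARCH:"):
            notes += retrieve(decision[len("SEARCH:"):].strip())
    # Step budget exhausted: answer with whatever was gathered.
    return llm(f"Notes:\n{chr(10).join(notes)}\n\nQ: {question}")
```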
Bottom line: RAG is no longer one technique. It's a spectrum. Start simple, measure relentlessly, and graduate to advanced patterns only when the data justifies it.