Thread Transfer
Query augmentation techniques that 10x your RAG accuracy
Vague user queries kill RAG accuracy. We break down augmentation techniques that bridge the gap.
Jorgo Bardho
Founder, Thread Transfer
Your RAG system is only as good as the queries it receives. A user types "pricing problem" and your vector search returns docs about "billing issues" and "subscription errors"—close, but not quite right. Query augmentation fixes this by expanding, refining, or rewriting queries before retrieval. Teams that implement it see 10x improvements in answer accuracy. Here's how.
Why queries fail: The semantic gap problem
Users ask vague questions. "How do I fix the thing?" "What's the status?" "Why did it break?" Vector search embeds these queries and retrieves semantically similar chunks—but similarity doesn't guarantee relevance. The model doesn't know which "thing" or "status" matters.
Even specific queries fail when terminology mismatches:
- User asks about "API limits" but docs say "rate limiting"
- User asks "How much does X cost?" but docs say "pricing tiers"
- User asks about "errors" but docs describe "exceptions" or "failure modes"
Query augmentation bridges this gap by transforming the user's input into a retrieval-optimized version before it hits the vector DB.
Augmentation techniques: Four proven methods
1. Query expansion with synonyms. Expand the query with related terms. "API limits" becomes "API limits rate limiting throttling quota". Embed the expanded query instead of the original. This increases recall but can dilute precision if you expand too aggressively. Use domain-specific synonym lists or extract synonyms from your corpus with LLMs.
2. HyDE (Hypothetical Document Embeddings). Instead of embedding the user's question, use an LLM to generate a hypothetical answer, then embed the answer for retrieval. The intuition: answers are closer in vector space to the docs you actually want. Example: the user asks "How do I reset my password?" and HyDE generates "To reset your password, navigate to Settings, click Forgot Password, and follow the email instructions." You embed that synthetic answer and retrieve docs similar to it. HyDE boosts accuracy 15-25% on FAQ-style queries.
3. Step-back prompting. Before retrieving, ask the LLM to generate a higher-level version of the question. User asks "Why is my API returning 429 errors?" Step-back version: "What causes rate limiting errors in APIs?" Retrieve docs for the step-back query, then answer the original question using that context. This works well for troubleshooting and how-to queries where users describe symptoms but need conceptual docs.
4. Multi-query generation. Generate 3-5 variations of the user's query with an LLM, retrieve docs for each variant, and fuse the results with RRF (Reciprocal Rank Fusion). Example: "pricing problem" becomes "billing issue," "subscription cost error," "invoice mistake," and "payment failure." Retrieve for all four, merge results, and re-rank. This increases recall and compensates for terminology mismatches. Multi-query is the easiest to implement and gives 20-30% accuracy gains out of the box.
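Here's a minimal sketch of multi-query generation with RRF fusion, assuming you have an `llm(prompt)` callable and a `vector_search(query, top_k)` function that returns a ranked list of doc IDs; both are placeholders for whatever LLM client and vector DB you already use.

```python
from collections import defaultdict

def generate_variants(query: str, llm, n: int = 3) -> list[str]:
    """Ask the LLM for n rephrasings of the query (hypothetical prompt)."""
    prompt = (
        f"Rewrite the following search query in {n} different ways, "
        f"one per line, using alternative terminology:\n\n{query}"
    )
    lines = llm(prompt).strip().splitlines()
    # Keep the original query plus up to n cleaned-up variants.
    return [query] + [l.strip() for l in lines if l.strip()][:n]

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def multi_query_retrieve(query: str, llm, vector_search, top_k: int = 5) -> list[str]:
    """Retrieve per variant, then merge the rankings with RRF."""
    variants = generate_variants(query, llm)
    ranked_lists = [vector_search(v, top_k=top_k) for v in variants]
    return rrf_fuse(ranked_lists)[:top_k]
```

The RRF constant k=60 is the value commonly used in the original RRF paper; it mostly dampens the influence of any single ranking, so it rarely needs tuning.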
Implementation: Start with multi-query, layer in HyDE
Don't implement all four at once. Start simple:
- Phase 1: Multi-query generation. Use an LLM to generate 3 query variants. Retrieve for each. Fuse with RRF. Measure top-5 retrieval accuracy. Expect 20-30% improvement with minimal effort.
- Phase 2: Add HyDE for FAQ-style queries. Detect FAQ-style questions (how-to, what-is, where-is). Generate a hypothetical answer, embed it, retrieve (see the sketch after this list). Compare accuracy vs multi-query. If it beats multi-query by 10%+, keep it. Otherwise, skip.
- Phase 3 (optional): Step-back for troubleshooting. If your users frequently ask diagnostic questions ("Why is X broken?"), implement step-back prompting. Generate a higher-level query, retrieve conceptual docs, then answer the specific question.
- Phase 4 (optional): Synonym expansion. Only if you have domain-specific jargon mismatches. Build a synonym map (manual or LLM-generated) and expand queries before embedding. Measure precision/recall trade-offs.
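For Phase 2, a minimal HyDE sketch might look like the following. The `llm`, `embed`, and `vector_search_by_embedding` callables are hypothetical stand-ins for your own LLM, embedding model, and vector store, and the FAQ detector is deliberately crude.

```python
FAQ_PREFIXES = ("how do i", "how to", "what is", "where is", "where do i")

def looks_like_faq(query: str) -> bool:
    """Crude FAQ detector; replace with whatever heuristic fits your traffic."""
    return query.lower().strip().startswith(FAQ_PREFIXES)

def hyde_retrieve(query: str, llm, embed, vector_search_by_embedding, top_k: int = 5):
    """HyDE: embed a hypothetical answer instead of the question itself."""
    if not looks_like_faq(query):
        return None  # fall back to multi-query retrieval
    hypothetical = llm(
        "Write a short documentation-style paragraph that plausibly answers "
        "this question. Invented specifics are fine; style matters more than "
        f"facts, since only the embedding is used:\n\n{query}"
    )
    return vector_search_by_embedding(embed(hypothetical), top_k=top_k)
```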
Measuring improvement: The test harness
You need a test set to measure whether augmentation helps. Build one:
- Collect 50-100 real user queries from logs
- For each query, manually label the top 3 docs that should be retrieved (ground truth)
- Run retrieval with and without augmentation
- Measure top-5 precision and recall
If augmentation improves top-5 recall by 15%+ without tanking precision, ship it. If precision drops (too many irrelevant docs), tune your expansion strategy or re-ranking layer.
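A sketch of the scoring loop, assuming your labeled test set maps each query to its ground-truth doc IDs and `retrieve(query)` is whichever pipeline (baseline or augmented) you're evaluating:

```python
def evaluate(test_set: dict[str, set[str]], retrieve, top_k: int = 5) -> dict[str, float]:
    """Average top-k precision and recall over a labeled query set."""
    precisions, recalls = [], []
    for query, relevant in test_set.items():
        retrieved = retrieve(query)[:top_k]
        hits = sum(1 for doc_id in retrieved if doc_id in relevant)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    n = len(test_set)
    return {f"precision@{top_k}": sum(precisions) / n, f"recall@{top_k}": sum(recalls) / n}

# Usage: compare evaluate(test_set, baseline_retrieve) against
# evaluate(test_set, augmented_retrieve); ship if recall improves by 15%+
# without a precision drop.
```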
Best practices: When to augment, when to skip
Augment when:
- Users ask vague or short queries (under 5 words)
- Terminology mismatches are common (user vocab ≠ doc vocab)
- Your test set shows low recall (under 60% top-5)
Skip when:
- Queries are already long and specific (8+ words)
- Your corpus is small and well-indexed (under 10k docs)
- Augmentation adds latency you can't afford (LLM call adds 200-500ms)
Bottom line: Query augmentation is the lowest-hanging fruit in RAG accuracy. Multi-query generation takes 30 minutes to implement and delivers 20-30% gains. Start there, measure, and layer in HyDE or step-back only if the data justifies it.