
Retrieval-Augmented Generation

  • RAG = hybrid framework combining retrieval (fetching information from an external knowledge base) and generation (an LLM/decoder).
  • Goal: produce factually accurate, context-rich answers.
  • Two main modules:
    1. Retriever → finds relevant documents/passages.
    2. Generator → generates final answer conditioned on retrieved docs.
  • LLMs have limited parametric memory (they can forget facts or hallucinate).
  • RAG adds non-parametric memory (external KB).
  • Benefits:
    • Up-to-date knowledge.
    • Domain-specific accuracy.
    • Reduces hallucination.
    • Smaller model + large corpus ≈ performance of huge models.

Components

a) Retriever

  • Input: query.
  • Output: top-k relevant passages.
  • Types:
    • Sparse retrievers → BM25, TF-IDF.
    • Dense retrievers → DPR, Contriever, ColBERT (use embeddings).
    • Hybrid retrievers → combine sparse + dense.
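
A minimal hybrid-scoring sketch using the rank_bm25 and sentence-transformers libraries (the model checkpoint, toy corpus, and the 50/50 fusion weights are illustrative assumptions, not a standard):

```python
import numpy as np
from rank_bm25 import BM25Okapi            # pip install rank-bm25
from sentence_transformers import SentenceTransformer

docs = [
    "RAG adds non-parametric memory to a language model.",
    "BM25 ranks documents by term overlap with the query.",
    "Dense retrievers embed queries and passages into one space.",
]
query = "how does retrieval help language models"

# Sparse signal: BM25 over whitespace tokens
bm25 = BM25Okapi([d.lower().split() for d in docs])
sparse = np.asarray(bm25.get_scores(query.lower().split()))

# Dense signal: cosine similarity of normalized embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode([query], normalize_embeddings=True)[0]
dense = doc_emb @ q_emb

# Hybrid: min-max normalize each signal, then mix 50/50
def minmax(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * minmax(sparse) + 0.5 * minmax(dense)
top_k = np.argsort(-hybrid)[:2]
print([docs[i] for i in top_k])
```

Min-max normalization puts the two score scales on a comparable footing before mixing.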

b) Generator

  • Input: (query + retrieved docs).
  • Output: natural language answer.
  • Models: BART, T5, GPT, LLaMA, etc.
  • Generation modes:
    • Fusion-in-Decoder (FiD) → each retrieved doc is encoded separately; the decoder attends over all of them.
    • Concatenation → all docs concatenated into a single context.
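
A schematic contrast of the two modes (the prompt/input templates here are illustrative assumptions, not fixed conventions):

```python
# Concatenation: stuff all retrieved passages into one long context,
# encoded once by the generator
def concat_prompt(query, docs):
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# FiD-style: each (query, doc) pair becomes its own encoder input;
# the decoder later attends over all encoded passages jointly
def fid_inputs(query, docs):
    return [f"question: {query} context: {d}" for d in docs]
```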

Pipeline

  1. Query encoding.
  2. Retrieve top-k documents.
  3. Combine with query → context.
  4. Generator produces answer.
  5. (Optional) Re-ranking or filtering of docs before generation.
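
A minimal sketch of steps 1-4 with sentence-transformers and plain NumPy (the model name, toy corpus, and prompt template are assumptions; the final LLM call is left abstract):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for the external knowledge base
docs = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "BM25 is a sparse retrieval scoring function.",
    "RAG conditions generation on retrieved passages.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

# Steps 1-2: encode the query, retrieve top-k by cosine similarity
query = "When was the Eiffel Tower built?"
q_emb = model.encode([query], normalize_embeddings=True)[0]
top_k = np.argsort(-(doc_emb @ q_emb))[:2]

# Step 3: combine query + retrieved docs into one context
context = "\n".join(docs[i] for i in top_k)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Step 4: hand `prompt` to any generator LLM (call not shown)
print(prompt)
```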

Variants

  • RAG-Sequence: the whole answer is generated conditioned on one retrieved doc, with doc scores marginalized at the sequence level.
  • RAG-Token: each generated token can condition on a different retrieved doc (marginalized per token) → more dynamic.
  • FiD (Fusion-in-Decoder): encodes each (query, doc) pair independently; the decoder fuses them through attention.
  • Atlas: large-scale RAG with a stronger retriever + generator, trained jointly.
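
The original RAG-Sequence/RAG-Token models ship with Hugging Face transformers; a usage sketch in the spirit of the library docs (needs the datasets and faiss packages installed; use_dummy_dataset keeps the retrieval-index download small):

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
# Retrieval happens inside generate(); swap in RagTokenForGeneration for RAG-Token
output_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```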

Fine-Tuning

  • Retriever Fine-Tuning: contrastive learning on (query, positive doc, negative docs).
  • Generator Fine-Tuning: supervised on (query + docs → gold answer).
  • End-to-End Fine-Tuning: joint retriever + generator training.
  • PEFT Methods (LoRA, Adapters, Prefix Tuning) used for efficiency.
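
For the retriever side, a contrastive fine-tuning sketch with the classic sentence-transformers fit API (checkpoint, batch size, and the two toy training pairs are assumptions):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# (query, positive) pairs; in-batch negatives supply the contrastive signal.
# A (query, positive, hard negative) triple also works with this loss.
train_examples = [
    InputExample(texts=["what is rag", "RAG augments LLMs with retrieval."]),
    InputExample(texts=["what is bm25", "BM25 is a sparse ranking function."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```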

Evaluation

a) Retriever

  • Recall@k, Precision@k, nDCG, MRR.
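
Recall@k and MRR take only a few lines of plain Python; a sketch with made-up doc IDs:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of relevant docs that appear in the top-k results
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    # Reciprocal rank of the first relevant doc (0 if none retrieved)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Toy example: doc 7 is relevant and shows up at rank 2
print(recall_at_k([3, 7, 1], relevant={7, 9}, k=2))  # 0.5
print(mrr([3, 7, 1], relevant={7, 9}))               # 0.5
```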

b) Generator

  • BLEU, ROUGE, METEOR, BERTScore.
  • Factuality & Faithfulness evaluation.
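
The overlap- and embedding-based metrics are available through the Hugging Face evaluate library; a minimal sketch (example strings are made up; BERTScore additionally needs the bert-score package):

```python
import evaluate  # pip install evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

preds = ["RAG grounds answers in retrieved passages."]
refs = ["RAG grounds its answers in retrieved documents."]

print(rouge.compute(predictions=preds, references=refs))
print(bertscore.compute(predictions=preds, references=refs, lang="en"))
```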

c) End-to-End

  • Human evaluation (accuracy, hallucination rate, relevance).

Applications

  • Search + QA: customer support, chatbots.
  • Domain assistants: legal, medical, finance.
  • Enterprise Knowledge Base QA.
  • Education: personalized tutoring.
  • Coding assistants (retrieval from docs + code gen).

Challenges

  • Retriever errors → irrelevant docs retrieved.
  • Generator hallucination even with good docs.
  • Latency (retrieval + generation pipeline is slow).
  • Data scarcity for fine-tuning.
  • Scalability of retriever with very large corpora.

Best Practices

  • Use hybrid retrievers (sparse + dense).
  • Add a re-ranking step with cross-encoders (see the sketch after this list).
  • Use knowledge distillation for efficiency.
  • Apply parameter-efficient fine-tuning (LoRA, adapters).
  • Feedback loops for active retriever re-training.
  • Maintain up-to-date external knowledge base.
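
For the cross-encoder re-ranking step, a sketch with sentence-transformers (the checkpoint is one common MS MARCO-trained choice, not the only option):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how does RAG reduce hallucination"
candidates = [
    "RAG conditions the generator on retrieved evidence.",
    "The Eiffel Tower is 330 metres tall.",
    "Grounding answers in documents reduces fabricated claims.",
]

# Score every (query, candidate) pair jointly, then sort best-first
scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```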

Vector Stores

Used to store embeddings (vector representations of text) and perform similarity search.

  • FAISS
    • Lightweight library by Facebook AI
    • Runs locally, fast nearest-neighbor search
    • Use case: Research, prototyping, small–medium datasets
  • Milvus
    • Distributed vector database
    • Highly scalable, supports billions of vectors
    • Use case: Enterprise-scale search, recommendation engines
  • Weaviate
    • Vector DB with semantic + keyword search
    • Schema-based, supports hybrid queries
    • Use case: Product search, semantic search over structured/unstructured data
  • Pinecone
    • Managed cloud vector database (SaaS)
    • High reliability, auto-scaling
    • Use case: Customer support chatbots, enterprise RAG systems
  • Chroma
    • Simple, open-source, Python-first vector DB
    • Easy to use, minimal setup
    • Use case: Personal knowledge base, quick prototypes
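
To make the FAISS entry concrete, a minimal exact inner-product search with random vectors standing in for real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                        # embedding dim (e.g., MiniLM)
rng = np.random.default_rng(0)
xb = rng.random((1000, d), dtype=np.float32)   # stand-in document embeddings
faiss.normalize_L2(xb)                         # inner product = cosine

index = faiss.IndexFlatIP(d)                   # exact inner-product search
index.add(xb)

xq = rng.random((1, d), dtype=np.float32)      # stand-in query embedding
faiss.normalize_L2(xq)
scores, ids = index.search(xq, 5)              # top-5 nearest documents
print(ids[0])
```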

Embedding Models

Convert text/documents into dense vector representations.

  • OpenAI Embeddings
    • High-quality embeddings (e.g., text-embedding-ada-002)
    • Cloud-only
    • Use case: General-purpose semantic search, question answering
  • SentenceTransformers (Hugging Face)
    • Local embeddings with pretrained transformer models
    • Supports fine-tuning for domain-specific needs
    • Use case: Offline/private RAG, sensitive data domains (health, finance)
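
A minimal SentenceTransformers sketch (the checkpoint is an assumed example):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "RAG combines retrieval and generation.",
    "Vector stores hold dense embeddings.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384) for this model
```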

Generator LLMs

Generate final answers by combining query + retrieved documents.

  • OpenAI GPT (3.5/4)
    • API-based, best for high-quality general use
    • Use case: QA bots, summarization, customer support
  • LLaMA, Mistral, Falcon (open-source)
    • Run locally or on private servers
    • Can be fine-tuned with LoRA/adapters
    • Use case: Domain-specific assistants, cost-sensitive or private deployments

Frameworks

Orchestration tools to connect retrieval with generation.

  • LangChain
    • Most popular framework for RAG pipelines
    • Provides chains, agents, memory, and tool integration
    • Use case: Multi-step agents, chatbot apps, document QA
  • LlamaIndex (GPT Index)
    • Specializes in connecting LLMs with structured/unstructured data
    • Strong support for large-document ingestion
    • Use case: Knowledge-base Q&A, long-document summarization
  • Haystack
    • Modular NLP/RAG framework from deepset
    • Flexible: supports multiple vector DBs and LLMs
    • Use case: Enterprise search engines, production RAG pipelines
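
As a taste of framework-level orchestration, a classic LangChain-style pipeline. Imports and class names shift across LangChain versions, so treat this as a sketch of the older pre-1.0 API; it also assumes an OPENAI_API_KEY is set:

```python
from langchain.embeddings import HuggingFaceEmbeddings   # local embeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

texts = [
    "RAG retrieves passages and feeds them to a generator.",
    "FAISS performs fast nearest-neighbor search.",
]
db = FAISS.from_texts(texts, HuggingFaceEmbeddings())
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=db.as_retriever())
print(qa.run("What does RAG do?"))
```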

Typical RAG Workflow

  1. Documents → embedding model → stored in vector store
  2. Query → embedded → retrieve top-k relevant docs
  3. Retrieved docs + query → passed to LLM
  4. LLM generates grounded answer using context

Summary

| Layer | Software/Tools | Best Use Case |
| --- | --- | --- |
| Vector Store | FAISS | Local research/prototypes |
| Vector Store | Milvus | Large-scale enterprise search |
| Vector Store | Weaviate | Product/semantic search |
| Vector Store | Pinecone | SaaS, customer support |
| Vector Store | Chroma | Lightweight prototypes |
| Embeddings | OpenAI Embeddings | General-purpose semantic search |
| Embeddings | SentenceTransformers (HF) | Offline, domain-specific embeddings |
| LLM | OpenAI GPT | General chatbot, QA |
| LLM | LLaMA, Mistral, Falcon | Private, domain-tuned LLMs |
| Frameworks | LangChain | Chatbots, multi-step agents |
| Frameworks | LlamaIndex | Long-doc understanding, KBs |
| Frameworks | Haystack | Enterprise search pipelines |