
A Multi-Hop Retrieval Augmented Generation (RAG) system with Multi-Agent LangGraph workflow for intelligent educational analytics. Features real-time workflow visualization, Postgres vector embeddings, and checkpoint-based resumption for robust query processing.


kaushiknd/Multi-Hop-RAG-System-with-Multi-Agent-Workflow


Educational Analytics - Multi-Hop RAG System

An intelligent educational analytics platform using multi-agent workflow and retrieval-augmented generation (RAG) to provide instructors with natural language querying capabilities for student performance data.

Features

  • Natural Language Queries: Ask questions in plain English
  • Zero Hallucination: All responses grounded in actual database data
  • Multi-Agent Workflow: Specialized agents for query understanding, schema retrieval, validation, SQL generation, data analysis, and response formatting
  • Multi-Hop RAG: Contextual retrieval across multiple vector database hops
  • Automatic Visualization: Charts and graphs generated based on query type
  • Secure Access Control: Role-based access with instructor-specific data filtering
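As a rough illustration of instructor-specific data filtering, the sketch below scopes a query to one instructor with a bound parameter. It uses an in-memory SQLite database so it runs anywhere. The table `learner_unit_stats`, its columns, and both function names are hypothetical and are not taken from this repository, which targets PostgreSQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with rows for two instructors (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE learner_unit_stats ("
        "student_id INTEGER, instructor_id INTEGER, "
        "unit_id INTEGER, completion_perc REAL)"
    )
    conn.executemany(
        "INSERT INTO learner_unit_stats VALUES (?, ?, ?, ?)",
        [(1, 123, 5, 100.0), (2, 123, 5, 40.0), (3, 456, 5, 100.0)],
    )
    return conn


def fetch_unit_completions(conn, instructor_id, unit_id):
    """Return (student_id, completion_perc) rows for one instructor only.

    The instructor filter is always applied, and values are passed as bound
    parameters instead of being formatted into the SQL string.
    """
    sql = (
        "SELECT student_id, completion_perc "
        "FROM learner_unit_stats "
        "WHERE instructor_id = ? AND unit_id = ? "
        "ORDER BY student_id"
    )
    return conn.execute(sql, (instructor_id, unit_id)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(fetch_unit_completions(db, 123, 5))
```

Because the instructor ID is bound as a value, a string such as `"123 OR 1=1"` is compared as data and matches no rows, rather than widening the filter.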

Architecture

Educational Analytics AI Workflow

The system processes natural language queries through a 7-stage pipeline:

  1. User Query Input - Instructor asks a question in plain English
  2. Query Understanding - Intent detection, entity extraction, query classification
  3. Schema Retrieval - Multi-hop RAG for finding relevant tables and join patterns
  4. Validation - Permission checks, scope validation, table access control
  5. SQL Generation - LLM-powered parameterized queries with context awareness
  6. Data Analysis - SQL execution, statistics, and insights generation
  7. Response Formatting - Natural language summaries and visualizations
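The stages above can be pictured as functions that pass a shared state object along a fixed sequence. The following is a minimal plain-Python sketch of that idea, not the project's LangGraph workflow. Every name in it (`QueryState`, `run_pipeline`, the stage functions, the `InstructorId` column, the allow-list) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    """Shared state handed from stage to stage (hypothetical shape)."""
    query: str
    instructor_id: int
    data: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)


# Tables the instructor role may read (assumed allow-list).
ALLOWED_TABLES = {"fct.LearnerUnitStats"}


def understand(state):
    # Stage 2: toy keyword-based intent detection; the real agent uses an LLM.
    text = state.query.lower()
    state.data["intent"] = "unit_completion" if "completed" in text else "unknown"
    return state


def retrieve_schema(state):
    # Stage 3: stand-in for the multi-hop RAG lookup.
    known = {"unit_completion": ["fct.LearnerUnitStats"]}
    state.data["tables"] = known.get(state.data["intent"], [])
    return state


def validate(state):
    # Stage 4: reject queries with no tables or tables outside the allow-list.
    tables = state.data["tables"]
    if not tables or not set(tables) <= ALLOWED_TABLES:
        raise PermissionError("query is out of scope for this instructor")
    return state


def generate_sql(state):
    # Stage 5: parameterized SQL; InstructorId is an assumed column name.
    table = state.data["tables"][0]
    state.data["sql"] = f"SELECT * FROM {table} WHERE InstructorId = %s"
    state.data["params"] = (state.instructor_id,)
    return state


def analyze(state):
    # Stage 6: no database in this sketch, so the result set is empty.
    state.data["rows"] = []
    return state


def format_response(state):
    # Stage 7: turn the result into a short natural-language summary.
    state.data["text_response"] = f"{len(state.data['rows'])} rows matched."
    return state


STAGES = [
    ("query_understanding", understand),
    ("schema_retrieval", retrieve_schema),
    ("validation", validate),
    ("sql_generation", generate_sql),
    ("data_analysis", analyze),
    ("response_formatting", format_response),
]


def run_pipeline(query, instructor_id):
    # Stage 1 (user query input) is the creation of the initial state.
    state = QueryState(query=query, instructor_id=instructor_id)
    for name, stage in STAGES:
        state = stage(state)
        state.trace.append(name)
    return state


if __name__ == "__main__":
    result = run_pipeline("How many students completed Unit 5?", 123)
    print(result.trace)
    print(result.data["sql"], result.data["params"])
```

Placing validation before SQL generation means an out-of-scope question fails before any SQL text exists, which is one plausible reading of the ordering in the list above.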

Multi-Hop RAG Schema Retrieval


The Schema Retrieval Agent uses a 4-hop RAG approach for accurate SQL generation:

| Hop | Purpose | Vector Collection | Output |
|-----|---------|-------------------|--------|
| 1 | Query Intent | query_intents | Intent type (e.g., "unit_completion") |
| 2 | Table Selection | table_schemas | Relevant tables (e.g., fct.LearnerUnitStats) |
| 3 | Join Patterns | join_patterns | Correct JOIN syntax with all keys |
| 4 | Business Rules | business_rules | Domain filters (e.g., UnitCompletionPerc >= 100) |

This multi-hop approach is designed to prevent hallucinated tables, joins, and filters by grounding every schema decision in vector-embedded metadata.
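To make the chaining concrete, here is a toy sketch in which each hop searches one collection and feeds its result into the next hop's search text. It is only an illustration: token overlap stands in for vector similarity, plain dictionaries stand in for the pgvector collections, and every entry other than the examples quoted above (`fct.LearnerScores`, `dim.Learner`, the join text, the second rule) is invented.

```python
import re

# Each collection maps a payload to the text it is matched on.
# Entries beyond the README's own examples are hypothetical.
COLLECTIONS = {
    "query_intents": {
        "unit_completion": "students completed a unit",
        "score_summary": "average assessment score",
    },
    "table_schemas": {
        "fct.LearnerUnitStats": "unit_completion progress per learner and unit",
        "fct.LearnerScores": "score_summary assessment scores per learner",
    },
    "join_patterns": {
        "fct.LearnerUnitStats s JOIN dim.Learner l ON s.LearnerId = l.LearnerId":
            "fct.LearnerUnitStats dim.Learner",
        "fct.LearnerScores s JOIN dim.Learner l ON s.LearnerId = l.LearnerId":
            "fct.LearnerScores dim.Learner",
    },
    "business_rules": {
        "UnitCompletionPerc >= 100": "unit_completion counts only full completion",
        "Score IS NOT NULL": "score_summary ignores missing scores",
    },
}


def tokens(text):
    """Lowercase word tokens; dots and underscores stay inside a token."""
    return set(re.findall(r"[a-z0-9_.]+", text.lower()))


def best_match(collection, search_text):
    """Return the payload whose text shares the most tokens with search_text.

    This is a stand-in for a nearest-neighbour query over embeddings.
    """
    wanted = tokens(search_text)
    docs = COLLECTIONS[collection]
    return max(docs, key=lambda payload: len(wanted & tokens(docs[payload])))


def retrieve_schema_context(query):
    # Hop 1: classify the question.
    intent = best_match("query_intents", query)
    # Hop 2: pick a table, using the question plus the detected intent.
    table = best_match("table_schemas", f"{query} {intent}")
    # Hop 3: find a join pattern that mentions the chosen table.
    join = best_match("join_patterns", table)
    # Hop 4: find the business rule tied to the intent and table.
    rule = best_match("business_rules", f"{intent} {table}")
    return {"intent": intent, "table": table, "join": join, "rule": rule}


if __name__ == "__main__":
    print(retrieve_schema_context("How many students completed Unit 5?"))
```

The point of the chain is that hops 2 to 4 never search on the raw question alone. Each one is conditioned on what an earlier hop already retrieved, so the SQL generator only sees tables, joins, and rules that exist in the stored metadata.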

Prerequisites

  • Docker and Docker Compose
  • Python 3.10+
  • PostgreSQL 14+ with pgvector extension
  • Ollama with Llama3 model

Quick Start

1. Clone Repository

git clone <repository-url>
cd educational-analytics

2. Setup Environment

# Copy environment file
cp backend/.env.example backend/.env

# Edit .env with your configuration
nano backend/.env

3. Start Services

# Start all services
docker-compose up -d

# Wait for services to be healthy
docker-compose ps

# Pull Llama3 model in Ollama
docker exec educational_analytics_ollama ollama pull llama3

4. Initialize Database

# Database schema is automatically created on first run
# Verify schema creation
docker exec educational_analytics_db psql -U eduuser -d educational_analytics -c "\dn"

5. Generate Embeddings

# Enter API container
docker exec -it educational_analytics_api bash

# Generate all embeddings
python scripts/generate_embeddings.py

# Test embeddings
python scripts/test_embeddings.py

# Exit container
exit

6. Access API

The API is now available at http://localhost:8000

  • API Documentation: http://localhost:8000/docs
  • Health Check: http://localhost:8000/api/health

API Usage

Authentication

# Login
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "<your-email>",
    "password": "password123"
  }'

# Response
{
  "access_token": "eyJ...",
  "token_type": "bearer"
}
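The same login step could be scripted from Python with only the standard library. The sketch below builds the request and turns a login response into an `Authorization` header. The helper names `build_login_request` and `auth_header` are illustrative, and the base URL assumes the default local setup from Quick Start.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment


def build_login_request(email, password):
    """Build (but do not send) the POST request for /api/auth/login."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(login_response_body):
    """Turn the login response JSON into a header dict for later calls."""
    payload = json.loads(login_response_body)
    if payload.get("token_type", "").lower() != "bearer":
        raise ValueError("unexpected token type")
    return {"Authorization": f"Bearer {payload['access_token']}"}


if __name__ == "__main__":
    # Requires the API to be running; credentials are placeholders.
    request = build_login_request("<your-email>", "<your-password>")
    with urllib.request.urlopen(request) as response:
        print(auth_header(response.read()))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without a running server.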

Query

# Send query
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "query": "How many students completed Unit 5?",
    "org_id": 1,
    "instructor_id": 123,
    "academic_year": "2024-25"
  }'

# Response
{
  "text_response": "12 students (75% of your class) completed Unit 5...",
  "statistics": {
    "total_students": 16,
    "completed": 12,
    "completion_rate": 75
  },
  "insights": [
    "75% completion rate is above the organization average of 68%",
    "4 students still need to complete the unit"
  ],
  "visualization": {
    "chart_type": "bar_chart",
    "title": "Unit 5 Completion Status",
    ...
  },
  "data": [...],
  "metadata": {
    "result_count": 16,
    "query_type": "statistics",
    "processing_time": 2.34
  }
}
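A client might want to sanity-check the numbers in such a response before displaying them. The sketch below recomputes the completion rate from the `statistics` block of the example above. `summarize_response` is a hypothetical helper, and it assumes the `statistics` keys shown here, which may differ for other query types.

```python
def summarize_response(resp):
    """Recompute headline figures from a query response dict."""
    stats = resp["statistics"]
    total = stats["total_students"]
    completed = stats["completed"]
    # Guard against an empty class to avoid division by zero.
    rate = round(100 * completed / total) if total else 0
    return {
        "headline": resp["text_response"],
        "computed_rate": rate,
        "rate_matches": rate == stats["completion_rate"],
        "remaining": total - completed,
        "chart_type": resp.get("visualization", {}).get("chart_type"),
    }


if __name__ == "__main__":
    # Values copied from the example response; elided fields are omitted.
    example = {
        "text_response": "12 students (75% of your class) completed Unit 5...",
        "statistics": {"total_students": 16, "completed": 12, "completion_rate": 75},
        "visualization": {"chart_type": "bar_chart"},
    }
    print(summarize_response(example))
```

For the example values, 12 of 16 students gives 75%, which agrees with the reported `completion_rate`, and the 4 remaining students match the second insight.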

Development

Project Structure

backend/
├── agents/              # Multi-agent implementations
├── api/                 # FastAPI application
├── database/            # Database connection and schemas
├── scripts/             # Utility scripts
├── utils/               # Configuration and logging
├── workflow/            # LangGraph workflow
└── tests/               # Unit and integration tests

Running Tests

# Unit tests
docker exec educational_analytics_api pytest tests/unit

# Integration tests
docker exec educational_analytics_api pytest tests/integration

# All tests
docker exec educational_analytics_api pytest

Development Mode

# Run API in development mode with hot reload
docker-compose up api

# View logs
docker-compose logs -f api

Configuration

Environment Variables

Key configuration in .env:

  • DB_*: Database connection settings
  • `
