Table of Contents
- 01 Project Overview
- 02 Objectives
- 03 Scope & Limitations
- 04 End Users
- 05 System Architecture
- 06 Repository Structure
- 07 Input Design
- 08 Rubric Format
- 09 End-to-End Workflow
- 10 OCR Layer
- 11 Diagram Extraction
- 12 Formula Extraction
- 13 Formula Evaluation
- 14 SBERT Evaluation
- 15 Gemini LLM Path
- 16 Pipeline Orchestration
- 17 Report Generation
- 18 FastAPI Backend
- 19 Streamlit Interface
- 20 Next.js Frontend
- 21 Deployment
- 22 Environment Variables
- 23 Tools & Libraries
- 24 Sample Data Flow
- 25 Strengths & Challenges
- 26 Future Enhancements
Project Overview
1.1 Problem Statement
Traditional answer-sheet evaluation works reasonably well for objective questions, but subjective answer sheets are much harder to assess automatically. In descriptive exams, students do not always write the same words as the ideal answer — they may express the same concept in a different sentence structure, draw diagrams differently, or write mathematical expressions in equivalent but non-identical forms.
Because of this, simple keyword matching is not enough. A practical evaluation system must understand:
- semantic meaning expressed in different wording
- diagrams drawn in varying styles
- mathematical expressions written in equivalent but non-identical forms
1.2 Project Goal
To create a usable academic evaluation platform that can accept an ideal answer sheet, rubric, and student answer sheets; extract and evaluate textual content semantically; separately process diagrams and formulas; and generate per-student scores, feedback, and downloadable PDF reports.
1.3 Why This Project Matters
Manual checking of subjective answer sheets is time-consuming, repetitive, and inconsistent across large batches. This system reduces manual effort while preserving a rubric-guided and explainable workflow — especially valuable for faculty members, researchers in educational technology, and experiments in multimodal grading.
System Flow — 6-Stage Pipeline
1. Input Collection: Ideal answer sheet PDF, rubric JSON, and student answer sheet PDFs are uploaded for one evaluation run.
2. OCR & Extraction: Printed sheets use EasyOCR; handwritten sheets use Google Vision AI or Azure Document Intelligence.
3. Formula Parsing: Mathematical regions are isolated and converted through pix2tex before symbolic validation with SymPy.
4. Diagram Analysis: Visual regions are extracted with OpenCV-based preprocessing for diagram-aware comparison using CLIP.
5. Evaluation: SBERT or Gemini scores the extracted answers against the ideal reference using rubric-aware weighting.
6. Result Generation: The system produces JSON outputs, ranking summaries, and downloadable PDF reports per student.
Objectives
- Build a pipeline that evaluates descriptive answers beyond exact keyword matching.
- Support both printed and handwritten answer sheets through multiple OCR paths.
- Add diagram-aware scoring for visual answers.
- Add formula-aware scoring for mathematical expressions.
- Support two evaluation modes: local SBERT and LLM-based Gemini evaluation.
- Generate human-readable PDF reports for each student.
- Provide both a research-facing Streamlit interface and a product-style Next.js web interface.
Scope & Limitations
3.1 In Scope
- Subjective answer sheet evaluation with OCR-based text extraction
- Printed sheet OCR (EasyOCR), handwritten OCR (Google Vision AI + Azure)
- Semantic text similarity scoring, diagram extraction + visual comparison
- Formula extraction + symbolic evaluation via SymPy
- Rubric-based weighted scoring with JSON and PDF report generation
- Streamlit research UI + FastAPI backend + Next.js web frontend
3.2 Current Assumptions
- One page is treated as one question.
- The ideal answer sheet acts as the reference solution.
- Rubric JSON provides per-question weights and maximum marks.
- Diagram regions are expected mainly in lower or distinct visual sections of a page.
- Formula extraction depends on OCR layout heuristics.
End Users
The main users of the system are:
- Faculty / examiners — grading descriptive answer sheets at scale
- Project supervisors and reviewers — evaluating BTP or research submissions
- Researchers — working on answer-sheet automation and multimodal grading
- Students / demo audiences — observing the evaluation system in action
System Architecture
The project is a multimodal pipeline built around three core ideas: extract evidence from answer sheets, evaluate each evidence type using the most suitable model, and combine results according to rubric weights.
Repository Structure
```
auto_subjective_grader/
├── backend_api/            ← FastAPI backend (REST API)
│   ├── main.py
│   ├── settings.py
│   └── content.py
│
├── src/                    ← Core research pipeline + Streamlit UI
│   ├── app.py
│   ├── pipeline_service.py
│   ├── ocr_pipeline.py
│   ├── diagram_extractor.py
│   ├── formula_pipeline.py
│   ├── formula_evaluator.py
│   ├── evaluation_core.py
│   ├── llm_evaluator.py
│   └── report_generator.py
│
├── web/                    ← Next.js product-style frontend
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── public/
│
├── data/                   ← Development test PDFs
│   ├── ideal/
│   └── students/
│
├── results/                ← All generated artifacts
│   ├── ocr/
│   ├── diagrams/
│   ├── formulas/
│   ├── formula_crops/
│   ├── eval/
│   ├── eval_llm/
│   ├── reports/
│   └── reports_llm/
│
├── rubric.json
├── requirements.txt
├── requirements.backend.txt
├── Dockerfile
└── README.md
```
Input Design
Every run starts with three required inputs:
- the ideal answer sheet PDF (reference solution)
- the rubric JSON (per-question weights and maximum marks)
- one or more student answer sheet PDFs
Rubric Format
The project uses a JSON rubric. A simplified example:
```json
{
  "1": {
    "max_marks": 10,
    "text_weight": 0.6,
    "diagram_weight": 0.4,
    "formula_weight": 0.2,
    "penalize_missing_diagram": true,
    "penalize_missing_formula": true
  }
}
```
Rubric Field Reference
| Field | Type | Description |
|---|---|---|
| max_marks | Number | Total marks available for the question |
| text_weight | Float 0–1 | Contribution of textual answer quality to the final score |
| diagram_weight | Float 0–1 | Contribution of diagram quality to the final score |
| formula_weight | Float 0–1 | Contribution of formula correctness to the final score |
| penalize_missing_diagram | Boolean | Whether a missing diagram reduces that modality's marks to zero |
| penalize_missing_formula | Boolean | Whether a missing formula reduces that modality's marks to zero |
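A loader might sanity-check these fields before starting a run. The sketch below is illustrative; `load_rubric` is a hypothetical helper, not code from the repository:

```python
import json

REQUIRED_FIELDS = {"max_marks", "text_weight", "diagram_weight", "formula_weight"}

def load_rubric(path):
    """Load rubric JSON and sanity-check each question entry."""
    with open(path, encoding="utf-8") as f:
        rubric = json.load(f)
    for qid, entry in rubric.items():
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"Question {qid} missing fields: {sorted(missing)}")
        for field in ("text_weight", "diagram_weight", "formula_weight"):
            if not 0.0 <= entry[field] <= 1.0:
                raise ValueError(f"Question {qid}: {field} must be in [0, 1]")
        if entry["max_marks"] <= 0:
            raise ValueError(f"Question {qid}: max_marks must be positive")
    return rubric
```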
End-to-End Workflow
Workflow Stages
- Input validation — verify all three required files are present
- OCR on ideal answer sheet — extract ground-truth text and layouts
- OCR on each student sheet — extract per-page text blocks
- Formula extraction — isolate math regions and convert to LaTeX (SBERT mode)
- Diagram extraction — connected-component based visual isolation
- Student-wise evaluation — compare all modalities against ideal
- JSON output generation — structured per-student result files
- PDF report generation — faculty-friendly individual reports
- Summary display — ranking table and download links
OCR Layer
Implemented in src/ocr_pipeline.py. Converts PDFs to images, runs the selected OCR backend, and saves page-wise JSON files with bounding boxes and confidence scores.
OCR Output Format
```json
{
  "page": 1,
  "text": "full extracted text",
  "blocks": [
    {
      "bbox": [[x1,y1],[x2,y2],[x3,y3],[x4,y4]],
      "text": "block text",
      "confidence": 0.92
    }
  ]
}
```
OCR Backend Comparison
| Backend | Best For | Deployment | Handwriting | Requirement |
|---|---|---|---|---|
| EasyOCR | Printed / clean sheets | Local (no API) | Limited | None — runs locally |
| Google Vision AI | Handwritten sheets | Cloud API | Excellent | Google Application Default Credentials |
| Azure Document Intelligence | Handwritten sheets | Cloud API | Excellent | Endpoint + API Key |
EasyOCR Processing Steps
- Convert PDF pages to images using `pdf2image`.
- Convert the PIL image to a NumPy array.
- Run the EasyOCR `reader` on the page.
- Save text blocks and full page text as JSON.
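EasyOCR's `reader.readtext()` returns a list of `(bbox, text, confidence)` triples. A minimal sketch of assembling those into the page-wise JSON shown earlier (the `blocks_to_page_json` helper is illustrative, not the repository's actual function):

```python
def blocks_to_page_json(page_no, detections):
    """Convert EasyOCR readtext() output, a list of (bbox, text, confidence)
    triples, into the page-wise JSON structure used by the OCR layer."""
    blocks = [
        {"bbox": [[int(x), int(y)] for x, y in bbox],
         "text": text,
         "confidence": round(float(conf), 2)}
        for bbox, text, conf in detections
    ]
    return {
        "page": page_no,
        "text": " ".join(b["text"] for b in blocks),
        "blocks": blocks,
    }
```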
Google Vision AI Processing Steps
- Convert PDF pages to images.
- Compress pages to fit Vision API request limits.
- Call `document_text_detection`.
- Extract paragraph/word blocks with bounding boxes.
- Save page-wise OCR JSON.
Azure Document Intelligence Steps
- Convert PDF pages to images.
- Compress to fit Azure request limits.
- Call the `prebuilt-read` model.
- Extract line/word content with polygon coordinates.
- Save OCR results as JSON.
Diagram Extraction
Implemented in src/diagram_extractor.py. Many answer sheets contain figures or circuit-style diagrams that cannot be judged through text OCR alone. The project separates diagrams from text and evaluates them independently.
Connected-Component Method
Output Path
results/diagrams/<BaseName>_page<PageNo>_diagram.png
Limitations: Unusual layouts or closely integrated text-diagram areas can reduce accuracy.
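The module itself relies on OpenCV preprocessing; as a self-contained illustration of the connected-component idea, here is a pure-Python sketch that finds bounding boxes of large ink regions in a binary mask (all names are hypothetical):

```python
from collections import deque

def component_boxes(mask, min_area=50):
    """Find bounding boxes of 4-connected foreground components in a binary
    mask (list of rows of 0/1). Components smaller than min_area are treated
    as text noise; large blobs are diagram candidates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # BFS flood fill from this seed pixel
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                area, x0, y0, x1, y1 = 0, sx, sy, sx, sy
                while q:
                    y, x = q.popleft()
                    area += 1
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if area >= min_area:
                    boxes.append((x0, y0, x1, y1))
    return boxes
```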
Formula Extraction & OCR
Implemented in src/formula_pipeline.py. Mathematical expressions should not be treated as ordinary text because OCR noise makes direct string comparison unreliable. This module isolates formula-like lines and converts them to LaTeX for symbolic analysis.
Processing Steps
- Read page-wise OCR blocks.
- Convert each OCR block bounding box into rectangular regions.
- Group nearby blocks into line candidates.
- Detect whether a line looks formula-like (`_looks_formula_like()`).
- Crop the formula region with padding.
- Run `pix2tex` to convert the crop into LaTeX.
- Save page-wise formula JSON.
Formula Detection Heuristics
The `_looks_formula_like()` function checks for patterns such as:
- Math operators: `=`, `+`, `*`, `^`
- Fraction-like patterns
- Trig/calculus keywords: `sin`, `cos`, `log`, `lim`, `dx`
- High symbolic density
- Equation-like alphanumeric patterns
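These heuristics could be sketched as follows; this is an illustrative reimplementation, and the repository's `_looks_formula_like()` may differ in detail:

```python
import re

MATH_KEYWORDS = ("sin", "cos", "log", "lim", "dx")
SYMBOLS = set("=+*^/\\{}()<>|")

def looks_formula_like(line, density_threshold=0.15):
    """Heuristic sketch: flag a text line as formula-like if it contains math
    operators, trig/calculus keywords, a fraction-like pattern, or a high
    density of symbolic characters."""
    if any(op in line for op in ("=", "+", "*", "^")):
        return True
    if any(kw in line.lower() for kw in MATH_KEYWORDS):
        return True
    if re.search(r"\b\w+\s*/\s*\w+\b", line):  # fraction-like a/b
        return True
    if line and sum(c in SYMBOLS for c in line) / len(line) > density_threshold:
        return True
    return False
```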
Formula Output Format
```json
{
  "page": 1,
  "formula_count": 2,
  "formulas": [
    {
      "index": 1,
      "bbox": [x1, y1, x2, y2],
      "ocr_text": "E = mc^2",
      "latex": "E=mc^2",
      "crop_path": "results/formula_crops/...",
      "error": ""
    }
  ]
}
```
Formula Evaluation
Implemented in src/formula_evaluator.py. Two formulas may be mathematically equivalent but look different as strings, e.g. a/b vs \frac{a}{b} vs algebraically rearranged forms. A formula-aware evaluator must check equivalence, not just string similarity.
Scoring Strategy
- `1.0` — exact or symbolically equivalent formulas
- Partial score — close but imperfect matches (string similarity fallback)
- `0.0` — incorrect or missing required formulas
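Assuming SymPy is available, the equivalence check with a string-similarity fallback might look like this sketch (the `formula_score` helper and its fallback metric are illustrative, not the repository's exact logic):

```python
from sympy import simplify, sympify

def formula_score(ideal, student):
    """Return 1.0 for symbolically equivalent expressions; fall back to a
    crude character-overlap partial score when they differ or fail to parse."""
    try:
        if simplify(sympify(ideal) - sympify(student)) == 0:
            return 1.0
    except Exception:
        pass  # unparseable input falls through to the string fallback
    a, b = ideal.replace(" ", ""), student.replace(" ", "")
    if not (a and b):
        return 0.0
    common = sum(min(a.count(c), b.count(c)) for c in set(a))
    return round(common / max(len(a), len(b)), 2)
```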
Feedback Labels
Text & Diagram Evaluation — SBERT Path
Implemented in src/evaluation_core.py. The primary local research evaluation path combines semantic text, diagram, and formula similarity.
Text Similarity — SBERT
- Model: `sentence-transformers/all-MiniLM-L6-v2`
- Converts ideal and student answers into embeddings and computes cosine similarity
- Allows judging semantic closeness even when exact wording differs
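The comparison step reduces to cosine similarity over embedding vectors. The sketch below keeps only that step so it stays self-contained; model loading via sentence-transformers is shown only as a comment:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, e.g. the 384-dim
    sentence embeddings produced by all-MiniLM-L6-v2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# With sentence-transformers installed, the embeddings would come from:
#   model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
#   sim = cosine_similarity(model.encode(ideal_text), model.encode(student_text))
```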
Scoring Curve
Below a cosine similarity of 0.6, the score fraction is 0. Above that, marks increase linearly to 1.0. This makes grading stricter than raw cosine similarity.
Sample Question Score Breakdown (Illustrative)
Diagram Evaluation — CLIP
- Model: `openai/clip-vit-base-patch32`
- Ideal and student diagram images are embedded and compared by cosine similarity
Weight Normalization
For each question: text, diagram, and formula weights are read from the rubric, normalized, and each modality contributes proportionally to the final score. If the rubric's penalize_missing_diagram is set and the student has no diagram, marks for that modality reduce to zero.
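The threshold curve and weight normalization described above can be sketched together; `curve` and `question_score` are hypothetical names, and the penalty handling is a simplified reading of the rubric semantics:

```python
def curve(similarity, threshold=0.6):
    """Map raw cosine similarity to a score fraction: 0 below the threshold,
    then linear up to 1.0, making grading stricter than raw similarity."""
    if similarity < threshold:
        return 0.0
    return (similarity - threshold) / (1.0 - threshold)

def question_score(entry, text_sim, diagram_sim=None, formula_sim=None):
    """Combine modality similarities using normalized rubric weights.
    A missing modality scores 0 when the rubric penalizes its absence;
    otherwise its weight is dropped before normalization."""
    sims = {"text": text_sim, "diagram": diagram_sim, "formula": formula_sim}
    weights = {}
    for mod in sims:
        if sims[mod] is None and not entry.get(f"penalize_missing_{mod}", False):
            continue  # modality absent and not penalized: drop its weight
        weights[mod] = entry.get(f"{mod}_weight", 0.0)
    total_w = sum(weights.values()) or 1.0
    frac = sum(
        (curve(sims[m]) if sims[m] is not None else 0.0) * w / total_w
        for m, w in weights.items()
    )
    return round(frac * entry["max_marks"], 2)
```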
Gemini LLM Evaluation Path
Implemented in src/llm_evaluator.py. Provides rubric-aware LLM grading as an alternative to local SBERT evaluation.
The Gemini API key is loaded from a `.env` file or environment variables at runtime.
Prompt Strategy
For each question, the evaluator builds a prompt containing: question id, rubric JSON, ideal answer text, student answer text, text and diagram maximum marks, and whether ideal/student diagrams are available. If diagrams exist, the evaluator sends both images alongside the prompt.
Expected JSON Response
```json
{
  "text_score": 7.2,
  "diagram_score": 3.5,
  "score": 8.1,
  "max_marks": 10,
  "feedback": "Student answer covers key concepts but misses..."
}
```
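LLM replies sometimes wrap the JSON in markdown fences or surrounding prose, so defensive parsing helps; this `parse_llm_json` helper is an illustrative sketch, not the repository's implementation:

```python
import json
import re

def parse_llm_json(raw):
    """Extract the first JSON object from an LLM reply, tolerating markdown
    code fences and surrounding prose around the object."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in LLM response")
    return json.loads(match.group(0))
```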
Pipeline Orchestration
Implemented in src/pipeline_service.py. Coordinates the full run: write rubric JSON, configure OCR / diagram / formula extraction, select evaluator, generate reports, and publish progress callbacks.
Progress Messages Published
- Running OCR on ideal answer sheet
- Extracting formulas
- Extracting diagrams
- Evaluating student [name]
- Generating report
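The callback pattern behind these messages can be sketched as follows (names are illustrative); a caller-supplied function receives each message so both the Streamlit UI and the FastAPI job tracker can display live progress:

```python
def run_pipeline(students, on_progress=print):
    """Sketch of the orchestration pattern: each stage publishes a message
    through a caller-supplied callback."""
    on_progress("Running OCR on ideal answer sheet")
    on_progress("Extracting formulas")
    on_progress("Extracting diagrams")
    for name in students:
        on_progress(f"Evaluating student {name}")
    on_progress("Generating report")
```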
Report Generation
Implemented in src/report_generator.py using fpdf2. Converts evaluation JSON results into faculty-friendly downloadable PDF reports.
Report Contents (Per Student)
Each question entry also includes a natural-language feedback string explaining why marks were awarded or deducted.
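Separating content from rendering keeps the report logic testable. This sketch formats one student's result into plain text lines that fpdf2 would then draw; the `format_report_lines` helper and the result shape are illustrative assumptions:

```python
def format_report_lines(result):
    """Turn one student's evaluation JSON into the text lines a PDF report
    would contain; the actual module renders such lines with fpdf2."""
    lines = [f"Report for {result['student']}"]
    total = max_total = 0.0
    for qid, q in sorted(result["questions"].items()):
        lines.append(f"Q{qid}: {q['score']:.1f} / {q['max_marks']} | {q['feedback']}")
        total += q["score"]
        max_total += q["max_marks"]
    lines.append(f"Total: {total:.1f} / {max_total:.0f}")
    return lines
```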
FastAPI Backend
Implemented in backend_api/main.py. Acts as the bridge between the web interface and the evaluation pipeline — handling file uploads, job creation, synchronous evaluation, run history, report downloads, and runtime configuration.
API Endpoints
Health Endpoints
Informational Endpoints
Job & Run Endpoints
Job Metadata Stored Per Run
run id, status, engine, OCR backend, student count, created time, start time, completion time, current step, total steps, progress percent, events timeline, error message, report directory, evaluation directory, summary rows.
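A per-run job record along these lines could back that metadata; the field names and step count below are illustrative, and the backend's exact keys may differ:

```python
import time
import uuid

def new_job(engine, ocr_backend, student_count):
    """Sketch of the per-run job record the backend keeps for progress
    tracking and run history."""
    return {
        "run_id": uuid.uuid4().hex,
        "status": "queued",
        "engine": engine,
        "ocr_backend": ocr_backend,
        "student_count": student_count,
        "created_at": time.time(),
        "started_at": None,
        "completed_at": None,
        "current_step": 0,
        # ideal OCR, formulas, diagrams, one step per student, report
        "total_steps": 3 + student_count + 1,
        "progress_percent": 0,
        "events": [],
        "error": None,
    }
```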
Streamlit Interface
Implemented in src/app.py. Provides a fast experimentation and demonstration interface for uploading files, running local evaluations, observing workflow progress, showing ranking tables, and downloading reports.
Tabs & Sections
```
.venv\Scripts\python.exe -m streamlit run src\app.py
```
Next.js Web Interface
Product-style web interface in the web/ directory.
Evaluate Page Features
- File uploads (ideal PDF, rubric JSON, student PDFs)
- Rubric sample download
- Evaluation engine selection (SBERT / Gemini)
- OCR backend selection (EasyOCR / Vision AI / Azure)
- Live workflow tracking via progress callbacks
- Recent run history
- Summary tables and per-student report downloads
```
cd web
npm install
npm run dev
```
Deployment Architecture
Docker Backend — Key Points
- Base image: `python:3.10-slim`
- Installs system dependencies: Poppler, OpenCV runtime libraries
- Installs backend-only Python requirements (`requirements.backend.txt`)
- Exposes port `8001`
- Starts FastAPI via `start-backend.sh` → `uvicorn backend_api.main:app`
Helper Scripts
- `run_all_local.cmd` — start all three services at once
- `run_api_8001.cmd` — start the FastAPI backend
- `run_streamlit_8502.cmd` — start the Streamlit UI
- `run_web_3100.cmd` — start the Next.js frontend
Environment Variables
| Variable | Required For | Description |
|---|---|---|
| ALLOWED_ORIGINS | Backend | CORS allowed origins for the API |
| API_RUNS_ROOT | Backend | Root directory for storing run artifacts |
| POPPLER_PATH | Backend | Path to Poppler binaries for pdf2image |
| EASYOCR_USE_GPU | Backend | Enable GPU for EasyOCR if available |
| GEMINI_API_KEY | LLM Path | Google Gemini API key |
| AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT | Azure OCR | Azure DI service endpoint URL |
| AZURE_DOCUMENT_INTELLIGENCE_KEY | Azure OCR | Azure DI API key |
| NEXT_PUBLIC_API_BASE_URL | Frontend | API base URL for browser-side calls |
| BACKEND_API_BASE_URL | Frontend | API base URL for server-side calls |
Google Vision AI uses Application Default Credentials: configure them via `gcloud auth application-default login` locally, or a service account with Vision API access in deployment.
Tools & Libraries
| Layer | Tool / Library | Purpose | Link |
|---|---|---|---|
| Language | Python | Core pipeline, backend, OCR, evaluation | python.org |
| Research UI | Streamlit | Local testing and research dashboard | streamlit.io |
| Backend API | FastAPI | Uploads, runs, reports, API routes | fastapi.tiangolo.com |
| API Server | Uvicorn | ASGI server for FastAPI | uvicorn.org |
| Web Frontend | Next.js 15 | Product-style web interface | nextjs.org |
| Frontend UI | React 19 | Web component layer | react.dev |
| OCR | EasyOCR | Printed and clean-sheet OCR | GitHub |
| OCR | Google Vision AI | Handwritten OCR + cloud text extraction | Google Cloud Vision |
| OCR | Azure Document Intelligence | Handwritten OCR alternative | Azure |
| PDF Processing | pdf2image | Convert PDF pages into images | GitHub |
| Image Processing | OpenCV | Diagram extraction and image operations | opencv.org |
| Image Processing | Pillow | Image loading and conversions | python-pillow.org |
| Semantic Scoring | Sentence Transformers | Text similarity using MiniLM | sbert.net |
| Visual Scoring | CLIP | Diagram image similarity | GitHub |
| Formula OCR | pix2tex | Equation image → LaTeX | GitHub |
| Symbolic Math | SymPy | Formula parsing and equivalence checking | sympy.org |
| LLM Scoring | Gemini 2.5 Flash | Rubric-aware LLM evaluation | Gemini API Docs |
| PDF Reports | FPDF2 | Student report generation | FPDF2 Docs |
| Containerization | Docker | Backend deployment image | docker.com |
| Hosting | Vercel | Next.js frontend deployment | vercel.com |
| Hosting | Google Cloud Run | Containerized backend deployment | cloud.google.com/run |
| CI/CD | GitHub Actions | Build and push automation | GitHub Actions |
Sample Data Flow
A practical walkthrough of one complete evaluation run:
- Upload `Ideal Answer Sheet.pdf`
- Upload `rubric.json`
- Upload one or more student PDFs
- Choose SBERT or Gemini as the evaluation engine
- Choose OCR backend: EasyOCR, Google Vision AI, or Azure Document Intelligence
- Click Run evaluation
- OCR JSON is created for ideal + all student sheets
- Diagram and formula outputs are extracted per page
- Student answers are compared page by page against ideal
- Weighted scores are calculated per question per student
- Evaluation JSON and PDF reports are generated
- Final summary ranking table is displayed with download links
Strengths & Challenges
Strengths
- Multimodal design — not just plain OCR
- Both local and cloud evaluation paths
- Handwritten OCR support
- Symbolic math comparison for formulas
- Downloadable PDF reports
- Both research and product-style interfaces
Challenges
- OCR quality is the biggest bottleneck for complex handwriting
- One-page-per-question simplification
- Diagram extraction is heuristic-based
- Formula detection can miss edge cases
- Large hosted runs need careful deployment tuning
Future Enhancements
- Better question segmentation beyond one-page-one-question assumption
- Stronger handwritten formula recognition (specialized models)
- Diagram object detection instead of only connected components
- Better dashboard analytics and charts
- Cloud storage for long-term report persistence
- Human-in-the-loop override and review mode
- Multi-rubric support for different courses
- Better benchmarking on large answer-sheet datasets
Conclusion
The Automated Subjective Answer Sheet Evaluation System is a research-driven, multimodal answer-sheet evaluation platform designed to handle the real complexity of descriptive academic assessment. Instead of relying only on exact keywords, it combines OCR, semantic similarity, diagram analysis, formula parsing, symbolic math evaluation, rubric-based weighting, and report generation into one connected workflow.
This makes the project much more realistic and extensible than a simple text-matching prototype. It already demonstrates a strong foundation for an academic BTP project and also forms the base for a future product-grade evaluation platform.
Quick Reference
Pipeline modules:
- `src/ocr_pipeline.py`
- `src/diagram_extractor.py`
- `src/formula_pipeline.py`
- `src/formula_evaluator.py`
- `src/evaluation_core.py`
- `src/llm_evaluator.py`
- `src/pipeline_service.py`
- `src/report_generator.py`
- `src/app.py`
- `backend_api/main.py`
Key files:
- `rubric.json`
- `requirements.txt`
- `requirements.backend.txt`
- `Dockerfile`
- `.env.production.example`
- `.github/workflows/backend-acr.yml`
- `web/app/evaluate/page.tsx`