Complete Project Report — Multimodal AI, Research Grade

Automated Subjective Answer Sheet Evaluation System

A multimodal AI pipeline for evaluating descriptive academic answer sheets — combining OCR, semantic similarity, diagram analysis, formula parsing, and rubric-guided scoring into one traceable, explainable workflow.

3 OCR backends
2 Evaluation engines
3 Answer modalities
FastAPI + Next.js
Gemini 2.5 Flash LLM scoring
SBERT + CLIP local scoring

1. Project Overview

1.1 Problem Statement

Traditional answer-sheet evaluation works reasonably well for objective questions, but subjective answer sheets are much harder to assess automatically. In descriptive exams, students do not always write the same words as the ideal answer — they may express the same concept in a different sentence structure, draw diagrams differently, or write mathematical expressions in equivalent but non-identical forms.

Because of this, simple keyword matching is not enough. A practical evaluation system must understand:

  • Textual Understanding — Semantic Similarity: beyond keyword matching; meaning-level comparison using SBERT embeddings.
  • Visual Analysis — Diagram Comparison: circuit diagrams, flowcharts, and figures compared using CLIP vision models.
  • Mathematical Reasoning — Formula Equivalence: symbolic math validation via the pix2tex → LaTeX → SymPy parsing chain.
  • Handwriting Support — Multi-Backend OCR: EasyOCR, Google Vision AI, and Azure Document Intelligence.

1.2 Project Goal

To create a usable academic evaluation platform that can accept an ideal answer sheet, rubric, and student answer sheets; extract and evaluate textual content semantically; separately process diagrams and formulas; and generate per-student scores, feedback, and downloadable PDF reports.

1.3 Why This Project Matters

Manual checking of subjective answer sheets is time-consuming, repetitive, and inconsistent across large batches. This system reduces manual effort while preserving a rubric-guided and explainable workflow — especially valuable for faculty members, researchers in educational technology, and experiments in multimodal grading.

System Flow — 6-Stage Pipeline

  1. Input Collection — Ideal answer sheet PDF, rubric JSON, and student answer sheet PDFs are uploaded for one evaluation run.
  2. OCR & Extraction — Printed sheets use EasyOCR; handwritten sheets use Google Vision AI or Azure Document Intelligence.
  3. Formula Parsing — Mathematical regions are isolated and converted through pix2tex before symbolic validation with SymPy.
  4. Diagram Analysis — Visual regions are extracted with OpenCV-based preprocessing for diagram-aware comparison using CLIP.
  5. Evaluation — SBERT or Gemini scores the extracted answers against the ideal reference using rubric-aware weighting.
  6. Result Generation — The system produces JSON outputs, ranking summaries, and downloadable PDF reports per student.

2. Objectives

  1. Build a pipeline that evaluates descriptive answers beyond exact keyword matching.
  2. Support both printed and handwritten answer sheets through multiple OCR paths.
  3. Add diagram-aware scoring for visual answers.
  4. Add formula-aware scoring for mathematical expressions.
  5. Support two evaluation modes: local SBERT and LLM-based Gemini evaluation.
  6. Generate human-readable PDF reports for each student.
  7. Provide both a research-facing Streamlit interface and a product-style Next.js web interface.

3. Scope & Limitations

3.1 In Scope

3.2 Current Assumptions

Current Limitations:
  • Very complex handwritten derivations can still be difficult to OCR accurately.
  • OCR quality strongly affects downstream scoring.
  • Formula region detection is heuristic-based, not perfect.
  • Diagram extraction uses connected-component style detection, so unusual layouts can affect accuracy.

4. End Users

The main users of the system are:

5. System Architecture

The project is a multimodal pipeline built around three core ideas: extract evidence from answer sheets, evaluate each evidence type using the most suitable model, and combine results according to rubric weights.

Overall Architecture — Left-to-Right Data Flow
flowchart LR
  A["User Interface\nStreamlit or Next.js"] --> B["FastAPI / Local Pipeline Trigger"]
  B --> C["Input Validation\nideal PDF + rubric JSON + student PDFs"]
  C --> D["OCR Layer\nEasyOCR / Google Vision AI / Azure"]
  D --> E["Formula Extraction\npix2tex preparation"]
  D --> F["Diagram Extraction\nOpenCV connected components"]
  D --> G["Text Extraction\npage-wise OCR JSON"]
  E --> H["Formula Evaluation\npix2tex + SymPy"]
  F --> I["Diagram Evaluation\nCLIP similarity"]
  G --> J["Text Evaluation\nSBERT or Gemini"]
  H --> K["Rubric Weighted Aggregation"]
  I --> K
  J --> K
  K --> L["Evaluation JSON"]
  K --> M["PDF Report Generation"]
  L --> N["Summary Tables / Run History"]
  M --> N

6. Repository Structure

auto_subjective_grader/
├── backend_api/              ← FastAPI backend (REST API)
│   ├── main.py
│   ├── settings.py
│   └── content.py
├── src/                      ← Core research pipeline + Streamlit UI
│   ├── app.py
│   ├── pipeline_service.py
│   ├── ocr_pipeline.py
│   ├── diagram_extractor.py
│   ├── formula_pipeline.py
│   ├── formula_evaluator.py
│   ├── evaluation_core.py
│   ├── llm_evaluator.py
│   └── report_generator.py
├── web/                      ← Next.js product-style frontend
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── public/
├── data/                     ← Development test PDFs
│   ├── ideal/
│   └── students/
├── results/                  ← All generated artifacts
│   ├── ocr/
│   ├── diagrams/
│   ├── formulas/
│   ├── formula_crops/
│   ├── eval/
│   ├── eval_llm/
│   ├── reports/
│   └── reports_llm/
├── rubric.json
├── requirements.txt
├── requirements.backend.txt
├── Dockerfile
└── README.md

7. Input Design

Every run starts with three required inputs:

  • Input 1 — 📄 Ideal Answer Sheet PDF: acts as the ground-truth reference for all comparisons.
  • Input 2 — 📋 Rubric JSON: defines max marks, text/diagram/formula weights, and penalty flags per question.
  • Input 3 — 📂 Student Answer Sheet PDFs: one or more student PDFs uploaded and evaluated against the ideal answer sheet.

8. Rubric Format

The project uses a JSON rubric. A simplified example:

{
  "1": {
    "max_marks": 10,
    "text_weight": 0.6,
    "diagram_weight": 0.4,
    "formula_weight": 0.2,
    "penalize_missing_diagram": true,
    "penalize_missing_formula": true
  }
}

Rubric Field Reference

Field                    | Type      | Description
max_marks                | Number    | Total marks available for the question
text_weight              | Float 0–1 | Contribution of textual answer quality to the final score
diagram_weight           | Float 0–1 | Contribution of diagram quality to the final score
formula_weight           | Float 0–1 | Contribution of formula correctness to the final score
penalize_missing_diagram | Boolean   | Whether a missing diagram reduces that modality's marks to zero
penalize_missing_formula | Boolean   | Whether a missing formula reduces that modality's marks to zero
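
Section 14 describes how these weights combine: they are normalized per question, each modality contributes proportionally, and penalty flags zero out a missing modality. A minimal sketch of that aggregation, assuming the rubric fields above (the function name and exact combination logic are illustrative, not the project's implementation):

```python
# Hypothetical rubric-weighted aggregation sketch; field names follow the
# rubric JSON, but the combination logic here is an assumption.
def aggregate_score(rubric_q, text_sim, diagram_sim, formula_sim,
                    has_diagram=True, has_formula=True):
    weights = {
        "text": rubric_q.get("text_weight", 1.0),
        "diagram": rubric_q.get("diagram_weight", 0.0),
        "formula": rubric_q.get("formula_weight", 0.0),
    }
    sims = {"text": text_sim, "diagram": diagram_sim, "formula": formula_sim}

    # Penalty flags: a missing modality scores zero but keeps its weight.
    if rubric_q.get("penalize_missing_diagram") and not has_diagram:
        sims["diagram"] = 0.0
    if rubric_q.get("penalize_missing_formula") and not has_formula:
        sims["formula"] = 0.0

    total_w = sum(weights.values()) or 1.0   # normalize the weights
    fraction = sum(weights[m] * sims[m] for m in weights) / total_w
    return round(fraction * rubric_q["max_marks"], 2)

rubric_q = {"max_marks": 10, "text_weight": 0.6, "diagram_weight": 0.4,
            "formula_weight": 0.2, "penalize_missing_diagram": True,
            "penalize_missing_formula": True}
print(aggregate_score(rubric_q, 0.82, 0.71, 1.00))  # → 8.13
```

With the example rubric and similarities of 0.82 / 0.71 / 1.00, this sketch yields 8.13 of 10 marks; the project's own scoring curve and rounding may differ.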

9. End-to-End Workflow

Complete Evaluation Workflow — Top-to-Bottom
flowchart TD
  A["Upload ideal PDF"] --> D["Validate inputs"]
  B["Upload rubric JSON"] --> D
  C["Upload student PDFs"] --> D
  D --> E["Write rubric to working path"]
  E --> F["OCR ideal answer sheet"]
  F --> G["OCR student answer sheets"]
  G --> H["Formula extraction — SBERT mode"]
  H --> I["Diagram extraction"]
  I --> J["Evaluate each student answer"]
  J --> K["Save evaluation JSON"]
  K --> L["Generate PDF reports"]
  L --> M["Show ranking table and downloads"]

Workflow Stages

  1. Input validation — verify all three required files are present
  2. OCR on ideal answer sheet — extract ground-truth text and layouts
  3. OCR on each student sheet — extract per-page text blocks
  4. Formula extraction — isolate math regions and convert to LaTeX (SBERT mode)
  5. Diagram extraction — connected-component based visual isolation
  6. Student-wise evaluation — compare all modalities against ideal
  7. JSON output generation — structured per-student result files
  8. PDF report generation — faculty-friendly individual reports
  9. Summary display — ranking table and download links

10. OCR Layer

Implemented in src/ocr_pipeline.py. Converts PDFs to images, runs the selected OCR backend, and saves page-wise JSON files with bounding boxes and confidence scores.

OCR Output Format

{
  "page": 1,
  "text": "full extracted text",
  "blocks": [
    {
      "bbox": [[x1,y1],[x2,y2],[x3,y3],[x4,y4]],
      "text": "block text",
      "confidence": 0.92
    }
  ]
}

OCR Backend Comparison

Backend                     | Best For               | Deployment     | Handwriting | Requirement
EasyOCR                     | Printed / clean sheets | Local (no API) | Limited     | None — runs locally
Google Vision AI            | Handwritten sheets     | Cloud API      | Excellent   | Google Application Default Credentials
Azure Document Intelligence | Handwritten sheets     | Cloud API      | Excellent   | Endpoint + API key

EasyOCR Processing Steps

  1. Convert PDF pages to images using pdf2image.
  2. Convert PIL image to NumPy array.
  3. Run EasyOCR reader on the page.
  4. Save text blocks and full page text as JSON.
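
Step 4 can be sketched as a small packaging helper that turns EasyOCR's readtext() tuples into the page-wise JSON shape shown above. The helper name is hypothetical; in the real pipeline the tuples would come from easyocr.Reader([...]).readtext(image):

```python
# Hypothetical helper: package EasyOCR readtext() results into the
# documented page-wise OCR JSON shape. Not the project's actual code.
import json

def blocks_to_page_json(page_no, results):
    """results: list of (bbox, text, confidence) tuples as EasyOCR returns."""
    blocks = [
        {
            "bbox": [[int(x), int(y)] for x, y in bbox],
            "text": text,
            "confidence": round(float(conf), 2),
        }
        for bbox, text, conf in results
    ]
    return {
        "page": page_no,
        "text": " ".join(b["text"] for b in blocks),  # full page text
        "blocks": blocks,
    }

# Fake EasyOCR output for a single detected block:
fake = [([(10, 10), (200, 10), (200, 40), (10, 40)], "block text", 0.92)]
print(json.dumps(blocks_to_page_json(1, fake), indent=2))
```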

Google Vision AI Processing Steps

  1. Convert PDF pages to images.
  2. Compress pages to fit Vision API request limits.
  3. Call document_text_detection.
  4. Extract paragraph/word blocks with bounding boxes.
  5. Save page-wise OCR JSON.

Azure Document Intelligence Steps

  1. Convert PDF pages to images.
  2. Compress to fit Azure request limits.
  3. Call prebuilt-read model.
  4. Extract line/word content with polygon coordinates.
  5. Save OCR results as JSON.

11. Diagram Extraction

Implemented in src/diagram_extractor.py. Many answer sheets contain figures or circuit-style diagrams that cannot be judged through text OCR alone. The project separates diagrams from text and evaluates them independently.

Connected-Component Method

Diagram Extraction Pipeline
flowchart LR
  A["Page Image"] --> B["Convert to Grayscale"]
  B --> C["Threshold — isolate ink"]
  C --> D["Morphological Closing\njoin nearby strokes"]
  D --> E["connectedComponentsWithStats"]
  E --> F["Filter small components\ntext & noise removal"]
  F --> G["Prefer lower-region components"]
  G --> H["Compute union bounding box"]
  H --> I["Save cropped PNG\nresults/diagrams/"]

Output Path

results/diagrams/<BaseName>_page<PageNo>_diagram.png
Advantages: Lightweight and explainable — no trained object detector required. Works well for diagram-heavy lower regions of a page.
Limitations: Unusual layouts or closely integrated text-diagram areas can reduce accuracy.

12. Formula Extraction & OCR

Implemented in src/formula_pipeline.py. Mathematical expressions should not be treated as ordinary text because OCR noise makes direct string comparison unreliable. This module isolates formula-like lines and converts them to LaTeX for symbolic analysis.

Processing Steps

  1. Read page-wise OCR blocks.
  2. Convert each OCR block bounding box into rectangular regions.
  3. Group nearby blocks into line candidates.
  4. Detect whether a line looks formula-like (_looks_formula_like()).
  5. Crop the formula region with padding.
  6. Run pix2tex to convert the crop into LaTeX.
  7. Save page-wise formula JSON.

Formula Detection Heuristics

The _looks_formula_like() function checks for patterns such as:

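The concrete pattern list lives in the source; purely as an illustration, such a heuristic might count math-like characters and operator/digit mixes. Everything below is a hypothetical stand-in for _looks_formula_like(), not its actual rules:

```python
# Hypothetical formula-likeness heuristic; the real _looks_formula_like()
# may use entirely different patterns and thresholds.
import re

MATH_CHARS = set("=+-*/^<>()√∑∫≤≥±")

def looks_formula_like(line: str) -> bool:
    if "=" in line:                       # equations almost always qualify
        return True
    math_hits = sum(ch in MATH_CHARS for ch in line)
    # Letter followed by a digit (optionally ^ or _), e.g. "x2" or "mc^2".
    has_mixed = bool(re.search(r"[A-Za-z]\s*[\^_]?\d", line))
    return math_hits >= 2 and (has_mixed or math_hits >= 4)

print(looks_formula_like("E = mc^2"))           # True
print(looks_formula_like("The cell membrane"))  # False
```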
Formula Output Format

{
  "page": 1,
  "formula_count": 2,
  "formulas": [
    {
      "index": 1,
      "bbox": [x1, y1, x2, y2],
      "ocr_text": "E = mc^2",
      "latex": "E=mc^2",
      "crop_path": "results/formula_crops/...",
      "error": ""
    }
  ]
}

13. Formula Evaluation

Implemented in src/formula_evaluator.py. Two formulas may be mathematically equivalent but look different as strings, e.g. a/b vs \frac{a}{b} vs algebraically rearranged forms. A formula-aware evaluator must check equivalence, not just string similarity.

Formula Evaluation Logic
flowchart TD
  A["LaTeX String Input"] --> B["Normalize LaTeX-like strings"]
  B --> C["Try parse_latex\nsympy.parsing.latex"]
  C --> D{Parse OK?}
  D -->|Yes| E["Symbolic simplification"]
  D -->|No| F["Fallback parse_expr"]
  F --> G["String similarity fallback"]
  E --> H{Symbolically Equal?}
  H -->|Yes| I["Score: 1.0\nexact or equivalent"]
  H -->|No| J["Partial score\nclose but imperfect"]
  G --> J
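
A runnable sketch of this logic using SymPy's parse_expr fallback path (parse_latex additionally requires the ANTLR runtime, so this sketch skips it). The 0.5 partial score and the string-similarity fallback are illustrative stand-ins for the project's actual scoring:

```python
# Illustrative equivalence check via SymPy; scores are assumed values,
# not the project's calibrated ones.
import difflib
import sympy
from sympy.parsing.sympy_parser import parse_expr

def formula_score(ideal: str, student: str) -> float:
    try:
        # Two expressions are equivalent iff their difference simplifies to 0.
        diff = sympy.simplify(parse_expr(ideal) - parse_expr(student))
        return 1.0 if diff == 0 else 0.5   # equivalent vs. close-but-different
    except (sympy.SympifyError, SyntaxError):
        # Crude string-similarity fallback when parsing fails.
        return difflib.SequenceMatcher(None, ideal, student).ratio()

print(formula_score("a/b", "a*b**-1"))          # 1.0 — algebraically equivalent
print(formula_score("x**2 - 1", "(x-1)*(x+1)")) # 1.0 — rearranged form
```

This captures the key property named above: a/b and a·b⁻¹ look different as strings yet score as fully equivalent.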

Scoring Strategy

Feedback Labels

  • exact match
  • mathematically equivalent
  • ~ partially correct
  • ! missing formula
  • substantially different formula

14. Text & Diagram Evaluation — SBERT Path

Implemented in src/evaluation_core.py. The primary local research evaluation path combines semantic text, diagram, and formula similarity.

Text Similarity — SBERT

Scoring Curve

Sample Question Score Breakdown (Illustrative)

  • Text Similarity: 0.82
  • Diagram Similarity: 0.71
  • Formula Similarity: 1.00
  • Weighted Total: 7.9
Scoring Curve: Raw cosine similarity is not used directly. If similarity is at or below 0.6, the score fraction is 0; above that threshold, the fraction rises linearly, reaching 1.0 at a similarity of 1.0. This makes grading stricter than using raw cosine similarity directly.
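
The curve is simple to state as code: similarity at or below the 0.6 threshold earns nothing, and the score fraction then rises linearly to 1.0. The threshold value comes from the text above; the function name is ours:

```python
# The documented scoring curve as a pure function (function name is ours).
def similarity_to_fraction(sim: float, threshold: float = 0.6) -> float:
    if sim <= threshold:
        return 0.0                      # below/at threshold earns nothing
    return (sim - threshold) / (1.0 - threshold)  # linear ramp to 1.0

print(similarity_to_fraction(0.6))                 # 0.0
print(round(similarity_to_fraction(0.82), 2))      # 0.55
print(similarity_to_fraction(1.0))                 # 1.0
```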

Diagram Evaluation — CLIP

Weight Normalization

For each question: text, diagram, and formula weights are read from the rubric, normalized, and each modality contributes proportionally to the final score. If the rubric's penalize_missing_diagram is set and the student has no diagram, marks for that modality reduce to zero.

15. Gemini LLM Evaluation Path

Implemented in src/llm_evaluator.py. Provides rubric-aware LLM grading as an alternative to local SBERT evaluation.

  • Model — gemini-2.5-flash: fast, cost-efficient Gemini model with strong reasoning capabilities.
  • Requirement — GEMINI_API_KEY: loaded from a .env file or environment variables at runtime.

Prompt Strategy

For each question, the evaluator builds a prompt containing: question id, rubric JSON, ideal answer text, student answer text, text and diagram maximum marks, and whether ideal/student diagrams are available. If diagrams exist, the evaluator sends both images alongside the prompt.
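
A hypothetical sketch of that prompt assembly — the field order and wording below are assumptions, not the project's actual prompt template:

```python
# Hypothetical prompt builder following the description above; the exact
# wording and field order in src/llm_evaluator.py may differ.
import json

def build_prompt(qid, rubric_q, ideal_text, student_text,
                 text_max, diagram_max, has_ideal_diagram, has_student_diagram):
    return "\n".join([
        f"Question ID: {qid}",
        f"Rubric: {json.dumps(rubric_q)}",
        f"Ideal answer:\n{ideal_text}",
        f"Student answer:\n{student_text}",
        f"Max text marks: {text_max}; max diagram marks: {diagram_max}",
        f"Ideal diagram available: {has_ideal_diagram}; "
        f"student diagram available: {has_student_diagram}",
        "Respond with JSON: text_score, diagram_score, score, max_marks, feedback.",
    ])

prompt = build_prompt("1", {"max_marks": 10}, "ideal answer text",
                      "student answer text", 6, 4, True, False)
print(prompt.splitlines()[0])  # Question ID: 1
```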

Expected JSON Response

{
  "text_score": 7.2,
  "diagram_score": 3.5,
  "score": 8.1,
  "max_marks": 10,
  "feedback": "Student answer covers key concepts but misses..."
}
Safety Handling: If Gemini fails or returns invalid output, the system falls back to zero score for that question and writes error details into the feedback field — ensuring no silent failures.

16. Pipeline Orchestration

Implemented in src/pipeline_service.py. Coordinates the full run: write rubric JSON, configure OCR / diagram / formula extraction, select evaluator, generate reports, and publish progress callbacks.

Engine-Dependent Flow Branching
flowchart TD
  A["Run Start"] --> B{Engine?}
  B -->|SBERT| C["OCR runs"]
  C --> D["Formula extraction runs"]
  D --> E["Diagram extraction runs"]
  E --> F["SBERT evaluator runs"]
  F --> G["Save to results/reports/"]
  B -->|LLM Gemini| H["OCR runs"]
  H --> I["Diagram extraction runs"]
  I --> J["Gemini evaluator runs"]
  J --> K["Save to results/reports_llm/"]
  G --> L["Progress callback — done"]
  K --> L

Progress Messages Published

17. Report Generation

Implemented in src/report_generator.py using fpdf2. Converts evaluation JSON results into faculty-friendly downloadable PDF reports.

Report Contents (Per Student)

  • Identity — student name, total score, percentage
  • Per-Question Breakdown — score, max marks, text/diagram/formula similarity

Each question entry also includes a natural-language feedback string explaining why marks were awarded or deducted.

18. FastAPI Backend

Implemented in backend_api/main.py. Acts as the bridge between the web interface and the evaluation pipeline — handling file uploads, job creation, synchronous evaluation, run history, report downloads, and runtime configuration.

API Endpoints

Health Endpoints

GET /            — Root health check
GET /healthz     — Kubernetes liveness probe
GET /api/health  — API health status

Informational Endpoints

GET /api/team            — Team information
GET /api/resources       — Project resources
GET /api/documentation   — Documentation content
GET /api/runtime-config  — Runtime configuration
GET /api/research-paper  — Research paper endpoint

Job & Run Endpoints

GET  /api/jobs                          — List all jobs
GET  /api/jobs/{run_id}                 — Get specific job status
GET  /api/runs/{run_id}                 — Get run results
GET  /api/runs/{run_id}/reports/{name}  — Download PDF report
POST /api/jobs                          — Create background job
POST /api/evaluate                      — Run synchronous evaluation

Job Metadata Stored Per Run

run id, status, engine, OCR backend, student count, created time, start time, completion time, current step, total steps, progress percent, events timeline, error message, report directory, evaluation directory, summary rows.

19. Streamlit Interface

Implemented in src/app.py. Provides a fast experimentation and demonstration interface for uploading files, running local evaluations, observing workflow progress, showing ranking tables, and downloading reports.

Tabs & Sections

  • 🏠 Home — project overview and quick start
  • ⚡ Evaluate — file upload + run pipeline
  • 📚 Documentation — methodology overview
  • 👥 Team — project members
  • 📄 Research Paper — academic paper link
  • 🔗 Resources — external references
.venv\Scripts\python.exe -m streamlit run src\app.py

20. Next.js Web Interface

Product-style web interface in the web/ directory.

  • Stack — Next.js 15 + React 19, TypeScript + React Markdown
  • API Handling — web/lib/api.ts (NEXT_PUBLIC_API_BASE_URL in dev, cloud backend in production)

Evaluate Page Features

cd web
npm install
npm run dev

21. Deployment Architecture

Production Deployment Architecture
flowchart LR
  U["End User\nbrowser"] --> V["Vercel\nNext.js Frontend"]
  V --> W["Google Cloud Run\nFastAPI Backend"]
  W --> X["Pipeline & Models\nSBERT / CLIP / pix2tex"]
  W --> Y["Cloud OCR APIs\nGoogle Vision / Azure"]
  W --> Z["Gemini API\nGemini 2.5 Flash"]

Docker Backend — Key Points

Helper Scripts

22. Environment Variables

Variable                             | Required For | Description
ALLOWED_ORIGINS                      | Backend      | CORS allowed origins for the API
API_RUNS_ROOT                        | Backend      | Root directory for storing run artifacts
POPPLER_PATH                         | Backend      | Path to Poppler binaries for pdf2image
EASYOCR_USE_GPU                      | Backend      | Enable GPU for EasyOCR if available
GEMINI_API_KEY                       | LLM path     | Google Gemini API key
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT | Azure OCR    | Azure DI service endpoint URL
AZURE_DOCUMENT_INTELLIGENCE_KEY      | Azure OCR    | Azure DI API key
NEXT_PUBLIC_API_BASE_URL             | Frontend     | API base URL for browser-side calls
BACKEND_API_BASE_URL                 | Frontend     | API base URL for server-side calls
Google Vision: Uses Google Application Default Credentials — local login via gcloud auth application-default login, or a service account with Vision API access in deployment.

23. Tools & Libraries

Layer            | Tool / Library              | Purpose                                  | Link
Language         | Python                      | Core pipeline, backend, OCR, evaluation  | python.org
Research UI      | Streamlit                   | Local testing and research dashboard     | streamlit.io
Backend API      | FastAPI                     | Uploads, runs, reports, API routes       | fastapi.tiangolo.com
API Server       | Uvicorn                     | ASGI server for FastAPI                  | uvicorn.org
Web Frontend     | Next.js 15                  | Product-style web interface              | nextjs.org
Frontend UI      | React 19                    | Web component layer                      | react.dev
OCR              | EasyOCR                     | Printed and clean-sheet OCR              | GitHub
OCR              | Google Vision AI            | Handwritten OCR + cloud text extraction  | Google Cloud Vision
OCR              | Azure Document Intelligence | Handwritten OCR alternative              | Azure
PDF Processing   | pdf2image                   | Convert PDF pages into images            | GitHub
Image Processing | OpenCV                      | Diagram extraction and image operations  | opencv.org
Image Processing | Pillow                      | Image loading and conversions            | python-pillow.org
Semantic Scoring | Sentence Transformers       | Text similarity using MiniLM             | sbert.net
Visual Scoring   | CLIP                        | Diagram image similarity                 | GitHub
Formula OCR      | pix2tex                     | Equation image → LaTeX                   | GitHub
Symbolic Math    | SymPy                       | Formula parsing and equivalence checking | sympy.org
LLM Scoring      | Gemini 2.5 Flash            | Rubric-aware LLM evaluation              | Gemini API Docs
PDF Reports      | FPDF2                       | Student report generation                | FPDF2 Docs
Containerization | Docker                      | Backend deployment image                 | docker.com
Hosting          | Vercel                      | Next.js frontend deployment              | vercel.com
Hosting          | Google Cloud Run            | Containerized backend deployment         | cloud.google.com/run
CI/CD            | GitHub Actions              | Build and push automation                | GitHub Actions

24. Sample Data Flow

A practical walkthrough of one complete evaluation run:

  1. Upload Ideal Answer Sheet.pdf
  2. Upload rubric.json
  3. Upload one or more student PDFs
  4. Choose SBERT or Gemini as the evaluation engine
  5. Choose OCR backend: EasyOCR, Google Vision AI, or Azure Document Intelligence
  6. Click Run evaluation
  7. OCR JSON is created for ideal + all student sheets
  8. Diagram and formula outputs are extracted per page
  9. Student answers are compared page by page against ideal
  10. Weighted scores are calculated per question per student
  11. Evaluation JSON and PDF reports are generated
  12. Final summary ranking table is displayed with download links

25. Strengths & Challenges

✅ Current Strengths
  • Multimodal design — not just plain OCR
  • Both local and cloud evaluation paths
  • Handwritten OCR support
  • Symbolic math comparison for formulas
  • Downloadable PDF reports
  • Both research and product-style interfaces
⚠ Current Challenges
  • OCR quality is the biggest bottleneck for complex handwriting
  • One-page-per-question simplification
  • Diagram extraction is heuristic-based
  • Formula detection can miss edge cases
  • Large hosted runs need careful deployment tuning

26. Future Enhancements

  1. Better question segmentation beyond one-page-one-question assumption
  2. Stronger handwritten formula recognition (specialized models)
  3. Diagram object detection instead of only connected components
  4. Better dashboard analytics and charts
  5. Cloud storage for long-term report persistence
  6. Human-in-the-loop override and review mode
  7. Multi-rubric support for different courses
  8. Better benchmarking on large answer-sheet datasets

27. Conclusion

The Automated Subjective Answer Sheet Evaluation System is a research-driven, multimodal answer-sheet evaluation platform designed to handle the real complexity of descriptive academic assessment. Instead of relying only on exact keywords, it combines OCR, semantic similarity, diagram analysis, formula parsing, symbolic math evaluation, rubric-based weighting, and report generation into one connected workflow.

This makes the project far more realistic and extensible than a simple text-matching prototype. It provides a strong foundation for an academic BTP project and a base for a future product-grade evaluation platform.

📌 This documentation page is intentionally written in a structured report style so it can later be converted into a formal project report, project synopsis, documentation chapter, or research paper draft.

Quick Reference

Core Source Files
  • src/ocr_pipeline.py
  • src/diagram_extractor.py
  • src/formula_pipeline.py
  • src/formula_evaluator.py
  • src/evaluation_core.py
  • src/llm_evaluator.py
  • src/pipeline_service.py
  • src/report_generator.py
  • src/app.py
  • backend_api/main.py
Key Config Files
  • rubric.json
  • requirements.txt
  • requirements.backend.txt
  • Dockerfile
  • .env.production.example
  • .github/workflows/backend-acr.yml
  • web/app/evaluate/page.tsx