Table of Contents
- 01 Project Overview
- 02 Objectives
- 03 Scope & Limitations
- 04 End Users
- 05 System Architecture
- 06 Repository Structure
- 07 Input Design
- 08 Rubric Format
- 09 End-to-End Workflow
- 10 OCR Layer
- 11 Diagram Extraction
- 12 Formula Extraction
- 13 Formula Evaluation
- 14 SBERT Evaluation
- 15 Gemini LLM Path
- 16 Pipeline Orchestration
- 17 Report Generation
- 18 FastAPI Backend
- 19 Streamlit Interface
- 20 Next.js Frontend
- 21 Deployment
- 22 Environment Variables
- 23 Tools & Libraries
- 24 Sample Data Flow
- 25 Strengths & Challenges
- 26 Future Enhancements
Project Overview
1.1 Problem Statement
Traditional answer-sheet evaluation works reasonably well for objective questions, but subjective answer sheets are much harder to assess automatically. In descriptive exams, students do not always write the same words as the ideal answer — they may express the same concept in a different sentence structure, draw diagrams differently, or write mathematical expressions in equivalent but non-identical forms.
Because of this, simple keyword matching is not enough. A practical evaluation system must understand:
- semantic meaning expressed in different wording
- diagrams drawn in varying styles
- mathematical expressions written in equivalent but non-identical forms
1.2 Project Goal
To create a usable academic evaluation platform that can accept an ideal answer sheet, rubric, and student answer sheets; extract and evaluate textual content semantically; separately process diagrams and formulas; and generate per-student scores, feedback, and downloadable PDF reports.
1.3 Why This Project Matters
Manual checking of subjective answer sheets is time-consuming, repetitive, and inconsistent across large batches. This system reduces manual effort while preserving a rubric-guided and explainable workflow — especially valuable for faculty members, researchers in educational technology, and experiments in multimodal grading.
System Flow — 6-Stage Pipeline
1. Input Collection: Ideal answer sheet PDF, rubric JSON, and student answer sheet PDFs are uploaded for one evaluation run.
2. OCR & Extraction: Printed sheets use EasyOCR; handwritten sheets use Google Vision AI or Azure Document Intelligence.
3. Formula Parsing: Mathematical regions are isolated and converted through pix2tex before symbolic validation with SymPy.
4. Diagram Analysis: Visual regions are extracted with OpenCV-based preprocessing for diagram-aware comparison using CLIP.
5. Evaluation: SBERT or Gemini scores the extracted answers against the ideal reference using rubric-aware weighting.
6. Result Generation: The system produces JSON outputs, ranking summaries, and downloadable PDF reports per student.
Objectives
- Build a pipeline that evaluates descriptive answers beyond exact keyword matching.
- Support both printed and handwritten answer sheets through multiple OCR paths.
- Add diagram-aware scoring for visual answers.
- Add formula-aware scoring for mathematical expressions.
- Support two evaluation modes: local SBERT and LLM-based Gemini evaluation.
- Generate human-readable PDF reports for each student.
- Provide both a research-facing Streamlit interface and a product-style Next.js web interface.
Scope & Limitations
3.1 In Scope
- Subjective answer sheet evaluation with OCR-based text extraction
- Printed sheet OCR (EasyOCR), handwritten OCR (Google Vision AI + Azure)
- Semantic text similarity scoring, diagram extraction + visual comparison
- Formula extraction + symbolic evaluation via SymPy
- Rubric-based weighted scoring with JSON and PDF report generation
- Streamlit research UI + FastAPI backend + Next.js web frontend
3.2 Current Assumptions
- One page is treated as one question.
- The ideal answer sheet acts as the reference solution.
- Rubric JSON provides per-question weights and maximum marks.
- Diagram regions are expected mainly in lower or distinct visual sections of a page.
- Formula extraction depends on OCR layout heuristics.
End Users
The main users of the system are:
- Faculty / examiners — grading descriptive answer sheets at scale
- Project supervisors and reviewers — evaluating BTP or research submissions
- Researchers — working on answer-sheet automation and multimodal grading
- Students / demo audiences — observing the evaluation system in action
System Architecture
The project is a multimodal pipeline built around three core ideas: extract evidence from answer sheets, evaluate each evidence type using the most suitable model, and combine results according to rubric weights.
Repository Structure
```
auto_subjective_grader/
├── backend_api/            ← FastAPI backend (REST API)
│   ├── main.py
│   ├── settings.py
│   └── content.py
│
├── src/                    ← Core research pipeline + Streamlit UI
│   ├── app.py
│   ├── pipeline_service.py
│   ├── ocr_pipeline.py
│   ├── diagram_extractor.py
│   ├── formula_pipeline.py
│   ├── formula_evaluator.py
│   ├── evaluation_core.py
│   ├── llm_evaluator.py
│   └── report_generator.py
│
├── web/                    ← Next.js product-style frontend
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── public/
│
├── data/                   ← Development test PDFs
│   ├── ideal/
│   └── students/
│
├── results/                ← All generated artifacts
│   ├── ocr/
│   ├── diagrams/
│   ├── formulas/
│   ├── formula_crops/
│   ├── eval/
│   ├── eval_llm/
│   ├── reports/
│   └── reports_llm/
│
├── rubric.json
├── requirements.txt
├── requirements.backend.txt
├── Dockerfile
└── README.md
```
Input Design
Every run starts with three required inputs:
- the ideal answer sheet PDF (reference solution)
- the rubric JSON (per-question weights and maximum marks)
- one or more student answer sheet PDFs
Rubric Format
The project uses a JSON rubric. A simplified example:
```json
{
  "1": {
    "max_marks": 10,
    "text_weight": 0.6,
    "diagram_weight": 0.4,
    "formula_weight": 0.2,
    "penalize_missing_diagram": true,
    "penalize_missing_formula": true
  }
}
```
Rubric Field Reference
| Field | Type | Description |
|---|---|---|
| max_marks | Number | Total marks available for the question |
| text_weight | Float 0–1 | Contribution of textual answer quality to the final score |
| diagram_weight | Float 0–1 | Contribution of diagram quality to the final score |
| formula_weight | Float 0–1 | Contribution of formula correctness to the final score |
| penalize_missing_diagram | Boolean | Whether a missing diagram reduces that modality's marks to zero |
| penalize_missing_formula | Boolean | Whether a missing formula reduces that modality's marks to zero |
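A loader might sanity-check these fields before starting a run. The sketch below is illustrative; `load_rubric` is a hypothetical helper, not code from the repository:

```python
import json

REQUIRED_FIELDS = {"max_marks", "text_weight", "diagram_weight", "formula_weight"}

def load_rubric(path):
    """Load rubric JSON and sanity-check each question entry."""
    with open(path, encoding="utf-8") as f:
        rubric = json.load(f)
    for qid, entry in rubric.items():
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"Question {qid} missing fields: {sorted(missing)}")
        for field in ("text_weight", "diagram_weight", "formula_weight"):
            if not 0.0 <= entry[field] <= 1.0:
                raise ValueError(f"Question {qid}: {field} must be in [0, 1]")
        if entry["max_marks"] <= 0:
            raise ValueError(f"Question {qid}: max_marks must be positive")
    return rubric
```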
End-to-End Workflow
Workflow Stages
- Input validation — verify all three required files are present
- OCR on ideal answer sheet — extract ground-truth text and layouts
- OCR on each student sheet — extract per-page text blocks
- Formula extraction — isolate math regions and convert to LaTeX (SBERT mode)
- Diagram extraction — connected-component based visual isolation
- Student-wise evaluation — compare all modalities against ideal
- JSON output generation — structured per-student result files
- PDF report generation — faculty-friendly individual reports
- Summary display — ranking table and download links
OCR Layer
Implemented in src/ocr_pipeline.py. Converts PDFs to images, runs the selected OCR backend, and saves page-wise JSON files with bounding boxes and confidence scores.
OCR Output Format
```json
{
  "page": 1,
  "text": "full extracted text",
  "blocks": [
    {
      "bbox": [[x1,y1],[x2,y2],[x3,y3],[x4,y4]],
      "text": "block text",
      "confidence": 0.92
    }
  ]
}
```
OCR Backend Comparison
| Backend | Best For | Deployment | Handwriting | Requirement |
|---|---|---|---|---|
| EasyOCR | Printed / clean sheets | Local (no API) | Limited | None — runs locally |
| Google Vision AI | Handwritten sheets | Cloud API | Excellent | Google Application Default Credentials |
| Azure Document Intelligence | Handwritten sheets | Cloud API | Excellent | Endpoint + API Key |
EasyOCR Processing Steps
- Convert PDF pages to images using `pdf2image`.
- Convert the PIL image to a NumPy array.
- Run the EasyOCR `reader` on the page.
- Save text blocks and full page text as JSON.
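EasyOCR's `reader.readtext()` returns a list of `(bbox, text, confidence)` triples. A minimal sketch of assembling those into the page-wise JSON shown earlier (the `blocks_to_page_json` helper is illustrative, not the repository's actual function):

```python
def blocks_to_page_json(page_no, detections):
    """Convert EasyOCR readtext() output, a list of (bbox, text, confidence)
    triples, into the page-wise JSON structure used by the OCR layer."""
    blocks = [
        {"bbox": [[int(x), int(y)] for x, y in bbox],
         "text": text,
         "confidence": round(float(conf), 2)}
        for bbox, text, conf in detections
    ]
    return {
        "page": page_no,
        "text": " ".join(b["text"] for b in blocks),
        "blocks": blocks,
    }
```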
Google Vision AI Processing Steps
- Convert PDF pages to images.
- Compress pages to fit Vision API request limits.
- Call `document_text_detection`.
- Extract paragraph/word blocks with bounding boxes.
- Save page-wise OCR JSON.
Azure Document Intelligence Steps
- Convert PDF pages to images.
- Compress to fit Azure request limits.
- Call the `prebuilt-read` model.
- Extract line/word content with polygon coordinates.
- Save OCR results as JSON.
Diagram Extraction
Implemented in src/diagram_extractor.py. Many answer sheets contain figures or circuit-style diagrams that cannot be judged through text OCR alone. The project separates diagrams from text and evaluates them independently.
Connected-Component Method
Output Path
results/diagrams/<BaseName>_page<PageNo>_diagram.png
Limitations: Unusual layouts or closely integrated text-diagram areas can reduce accuracy.
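The module itself relies on OpenCV preprocessing; as a self-contained illustration of the connected-component idea, here is a pure-Python sketch that finds bounding boxes of large ink regions in a binary mask (all names are hypothetical):

```python
from collections import deque

def component_boxes(mask, min_area=50):
    """Find bounding boxes of 4-connected foreground components in a binary
    mask (list of rows of 0/1). Components smaller than min_area are treated
    as text noise; large blobs are diagram candidates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # BFS flood fill from this seed pixel
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                area, x0, y0, x1, y1 = 0, sx, sy, sx, sy
                while q:
                    y, x = q.popleft()
                    area += 1
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if area >= min_area:
                    boxes.append((x0, y0, x1, y1))
    return boxes
```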
Formula Extraction & OCR
Implemented in src/formula_pipeline.py. Mathematical expressions should not be treated as ordinary text because OCR noise makes direct string comparison unreliable. This module isolates formula-like lines and converts them to LaTeX for symbolic analysis.
Processing Steps
- Read page-wise OCR blocks.
- Convert each OCR block bounding box into rectangular regions.
- Group nearby blocks into line candidates.
- Detect whether a line looks formula-like (`_looks_formula_like()`).
- Crop the formula region with padding.
- Run `pix2tex` to convert the crop into LaTeX.
- Save page-wise formula JSON.
Formula Detection Heuristics
The `_looks_formula_like()` function checks for patterns such as:
- Math operators: `=`, `+`, `*`, `^`
- Fraction-like patterns
- Trig/calculus keywords: `sin`, `cos`, `log`, `lim`, `dx`
- High symbolic density
- Equation-like alphanumeric patterns
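These heuristics could be sketched as follows; this is an illustrative reimplementation, and the repository's `_looks_formula_like()` may differ in detail:

```python
import re

MATH_KEYWORDS = ("sin", "cos", "log", "lim", "dx")
SYMBOLS = set("=+*^/\\{}()<>|")

def looks_formula_like(line, density_threshold=0.15):
    """Heuristic sketch: flag a text line as formula-like if it contains math
    operators, trig/calculus keywords, a fraction-like pattern, or a high
    density of symbolic characters."""
    if any(op in line for op in ("=", "+", "*", "^")):
        return True
    if any(kw in line.lower() for kw in MATH_KEYWORDS):
        return True
    if re.search(r"\b\w+\s*/\s*\w+\b", line):  # fraction-like a/b
        return True
    if line and sum(c in SYMBOLS for c in line) / len(line) > density_threshold:
        return True
    return False
```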
Formula Output Format
```json
{
  "page": 1,
  "formula_count": 2,
  "formulas": [
    {
      "index": 1,
      "bbox": [x1, y1, x2, y2],
      "ocr_text": "E = mc^2",
      "latex": "E=mc^2",
      "crop_path": "results/formula_crops/...",
      "error": ""
    }
  ]
}
```
Formula Evaluation
Implemented in src/formula_evaluator.py. Two formulas may be mathematically equivalent but look different as strings, e.g. a/b vs \frac{a}{b} vs algebraically rearranged forms. A formula-aware evaluator must check equivalence, not just string similarity.
Scoring Strategy
- `1.0` — exact or symbolically equivalent formulas
- Partial score — close but imperfect matches (string similarity fallback)
- `0.0` — incorrect or missing required formulas
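Assuming SymPy is available, the equivalence check with a string-similarity fallback might look like this sketch (the `formula_score` helper and its fallback metric are illustrative, not the repository's exact logic):

```python
from sympy import simplify, sympify

def formula_score(ideal, student):
    """Return 1.0 for symbolically equivalent expressions; fall back to a
    crude character-overlap partial score when they differ or fail to parse."""
    try:
        if simplify(sympify(ideal) - sympify(student)) == 0:
            return 1.0
    except Exception:
        pass  # unparseable input falls through to the string fallback
    a, b = ideal.replace(" ", ""), student.replace(" ", "")
    if not (a and b):
        return 0.0
    common = sum(min(a.count(c), b.count(c)) for c in set(a))
    return round(common / max(len(a), len(b)), 2)
```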
Feedback Labels
Text & Diagram Evaluation — SBERT Path
Implemented in src/evaluation_core.py. The primary local research evaluation path combines semantic text, diagram, and formula similarity.
Text Similarity — SBERT
- Model: `sentence-transformers/all-MiniLM-L6-v2`
- Converts ideal and student answers into embeddings and computes cosine similarity
- Allows judging semantic closeness even when exact wording differs
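The comparison step reduces to cosine similarity over embedding vectors. The sketch below keeps only that step so it stays self-contained; model loading via sentence-transformers is shown only as a comment:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, e.g. the 384-dim
    sentence embeddings produced by all-MiniLM-L6-v2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# With sentence-transformers installed, the embeddings would come from:
#   model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
#   sim = cosine_similarity(model.encode(ideal_text), model.encode(student_text))
```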
Scoring Curve
Below a cosine similarity of 0.6, the score fraction is 0. Above that, marks increase linearly to 1.0. This makes grading stricter than raw cosine similarity.
Sample Question Score Breakdown (Illustrative)
Diagram Evaluation — CLIP
- Model: `openai/clip-vit-base-patch32`
- Ideal and student diagram images are embedded and compared by cosine similarity
Weight Normalization
For each question: text, diagram, and formula weights are read from the rubric, normalized, and each modality contributes proportionally to the final score. If the rubric's penalize_missing_diagram is set and the student has no diagram, marks for that modality reduce to zero.
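The threshold curve and weight normalization described above can be sketched together; `curve` and `question_score` are hypothetical names, and the penalty handling is a simplified reading of the rubric semantics:

```python
def curve(similarity, threshold=0.6):
    """Map raw cosine similarity to a score fraction: 0 below the threshold,
    then linear up to 1.0, making grading stricter than raw similarity."""
    if similarity < threshold:
        return 0.0
    return (similarity - threshold) / (1.0 - threshold)

def question_score(entry, text_sim, diagram_sim=None, formula_sim=None):
    """Combine modality similarities using normalized rubric weights.
    A missing modality scores 0 when the rubric penalizes its absence;
    otherwise its weight is dropped before normalization."""
    sims = {"text": text_sim, "diagram": diagram_sim, "formula": formula_sim}
    weights = {}
    for mod in sims:
        if sims[mod] is None and not entry.get(f"penalize_missing_{mod}", False):
            continue  # modality absent and not penalized: drop its weight
        weights[mod] = entry.get(f"{mod}_weight", 0.0)
    total_w = sum(weights.values()) or 1.0
    frac = sum(
        (curve(sims[m]) if sims[m] is not None else 0.0) * w / total_w
        for m, w in weights.items()
    )
    return round(frac * entry["max_marks"], 2)
```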
Gemini LLM Evaluation Path
Implemented in src/llm_evaluator.py. Provides rubric-aware LLM grading as an alternative to local SBERT evaluation.
The Gemini API key is loaded from a `.env` file or environment variables at runtime.
Prompt Strategy
For each question, the evaluator builds a prompt containing: question id, rubric JSON, ideal answer text, student answer text, text and diagram maximum marks, and whether ideal/student diagrams are available. If diagrams exist, the evaluator sends both images alongside the prompt.
Expected JSON Response
```json
{
  "text_score": 7.2,
  "diagram_score": 3.5,
  "score": 8.1,
  "max_marks": 10,
  "feedback": "Student answer covers key concepts but misses..."
}
```
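LLM replies sometimes wrap the JSON in markdown fences or surrounding prose, so defensive parsing helps; this `parse_llm_json` helper is an illustrative sketch, not the repository's implementation:

```python
import json
import re

def parse_llm_json(raw):
    """Extract the first JSON object from an LLM reply, tolerating markdown
    code fences and surrounding prose around the object."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in LLM response")
    return json.loads(match.group(0))
```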
Pipeline Orchestration
Implemented in src/pipeline_service.py. Coordinates the full run: write rubric JSON, configure OCR / diagram / formula extraction, select evaluator, generate reports, and publish progress callbacks.
Progress Messages Published
- Running OCR on ideal answer sheet
- Extracting formulas
- Extracting diagrams
- Evaluating student [name]
- Generating report
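The callback pattern behind these messages can be sketched as follows (names are illustrative); a caller-supplied function receives each message so both the Streamlit UI and the FastAPI job tracker can display live progress:

```python
def run_pipeline(students, on_progress=print):
    """Sketch of the orchestration pattern: each stage publishes a message
    through a caller-supplied callback."""
    on_progress("Running OCR on ideal answer sheet")
    on_progress("Extracting formulas")
    on_progress("Extracting diagrams")
    for name in students:
        on_progress(f"Evaluating student {name}")
    on_progress("Generating report")
```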
Report Generation
Implemented in src/report_generator.py using fpdf2. Converts evaluation JSON results into faculty-friendly downloadable PDF reports.
Report Contents (Per Student)
Each question entry also includes a natural-language feedback string explaining why marks were awarded or deducted.
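Separating content from rendering keeps the report logic testable. This sketch formats one student's result into plain text lines that fpdf2 would then draw; the `format_report_lines` helper and the result shape are illustrative assumptions:

```python
def format_report_lines(result):
    """Turn one student's evaluation JSON into the text lines a PDF report
    would contain; the actual module renders such lines with fpdf2."""
    lines = [f"Report for {result['student']}"]
    total = max_total = 0.0
    for qid, q in sorted(result["questions"].items()):
        lines.append(f"Q{qid}: {q['score']:.1f} / {q['max_marks']} | {q['feedback']}")
        total += q["score"]
        max_total += q["max_marks"]
    lines.append(f"Total: {total:.1f} / {max_total:.0f}")
    return lines
```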
FastAPI Backend
Implemented in backend_api/main.py. Acts as the bridge between the web interface and the evaluation pipeline — handling file uploads, job creation, synchronous evaluation, run history, report downloads, and runtime configuration.
API Endpoints
Health Endpoints
Informational Endpoints
Job & Run Endpoints
Job Metadata Stored Per Run
run id, status, engine, OCR backend, student count, created time, start time, completion time, current step, total steps, progress percent, events timeline, error message, report directory, evaluation directory, summary rows.
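A per-run job record along these lines could back that metadata; the field names and step count below are illustrative, and the backend's exact keys may differ:

```python
import time
import uuid

def new_job(engine, ocr_backend, student_count):
    """Sketch of the per-run job record the backend keeps for progress
    tracking and run history."""
    return {
        "run_id": uuid.uuid4().hex,
        "status": "queued",
        "engine": engine,
        "ocr_backend": ocr_backend,
        "student_count": student_count,
        "created_at": time.time(),
        "started_at": None,
        "completed_at": None,
        "current_step": 0,
        # ideal OCR, formulas, diagrams, one step per student, report
        "total_steps": 3 + student_count + 1,
        "progress_percent": 0,
        "events": [],
        "error": None,
    }
```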
Streamlit Interface
Implemented in src/app.py. Provides a fast experimentation and demonstration interface for uploading files, running local evaluations, observing workflow progress, showing ranking tables, and downloading reports.
Tabs & Sections
```
.venv\Scripts\python.exe -m streamlit run src\app.py
```
Next.js Web Interface
Product-style web interface in the web/ directory.
Evaluate Page Features
- File uploads (ideal PDF, rubric JSON, student PDFs)
- Rubric sample download
- Evaluation engine selection (SBERT / Gemini)
- OCR backend selection (EasyOCR / Vision AI / Azure)
- Live workflow tracking via progress callbacks
- Recent run history
- Summary tables and per-student report downloads
```
cd web
npm install
npm run dev
```
Deployment Architecture
Docker Backend — Key Points
- Base image: `python:3.10-slim`
- Installs system dependencies: Poppler, OpenCV runtime libraries
- Installs backend-only Python requirements (`requirements.backend.txt`)
- Exposes port `8001`
- Starts FastAPI via `start-backend.sh` → `uvicorn backend_api.main:app`
Helper Scripts
- `run_all_local.cmd` — start all three services at once
- `run_api_8001.cmd` — start the FastAPI backend
- `run_streamlit_8502.cmd` — start the Streamlit UI
- `run_web_3100.cmd` — start the Next.js frontend
Environment Variables
| Variable | Required For | Description |
|---|---|---|
| ALLOWED_ORIGINS | Backend | CORS allowed origins for the API |
| API_RUNS_ROOT | Backend | Root directory for storing run artifacts |
| POPPLER_PATH | Backend | Path to Poppler binaries for pdf2image |
| EASYOCR_USE_GPU | Backend | Enable GPU for EasyOCR if available |
| GEMINI_API_KEY | LLM Path | Google Gemini API key |
| AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT | Azure OCR | Azure DI service endpoint URL |
| AZURE_DOCUMENT_INTELLIGENCE_KEY | Azure OCR | Azure DI API key |
| NEXT_PUBLIC_API_BASE_URL | Frontend | API base URL for browser-side calls |
| BACKEND_API_BASE_URL | Frontend | API base URL for server-side calls |
Google Vision AI uses Application Default Credentials: configure them via `gcloud auth application-default login` locally, or a service account with Vision API access in deployment.
Tools & Libraries
| Layer | Tool / Library | Purpose | Link |
|---|---|---|---|
| Language | Python | Core pipeline, backend, OCR, evaluation | python.org |
| Research UI | Streamlit | Local testing and research dashboard | streamlit.io |
| Backend API | FastAPI | Uploads, runs, reports, API routes | fastapi.tiangolo.com |
| API Server | Uvicorn | ASGI server for FastAPI | uvicorn.org |
| Web Frontend | Next.js 15 | Product-style web interface | nextjs.org |
| Frontend UI | React 19 | Web component layer | react.dev |
| OCR | EasyOCR | Printed and clean-sheet OCR | GitHub |
| OCR | Google Vision AI | Handwritten OCR + cloud text extraction | Google Cloud Vision |
| OCR | Azure Document Intelligence | Handwritten OCR alternative | Azure |
| PDF Processing | pdf2image | Convert PDF pages into images | GitHub |
| Image Processing | OpenCV | Diagram extraction and image operations | opencv.org |
| Image Processing | Pillow | Image loading and conversions | python-pillow.org |
| Semantic Scoring | Sentence Transformers | Text similarity using MiniLM | sbert.net |
| Visual Scoring | CLIP | Diagram image similarity | GitHub |
| Formula OCR | pix2tex | Equation image → LaTeX | GitHub |
| Symbolic Math | SymPy | Formula parsing and equivalence checking | sympy.org |
| LLM Scoring | Gemini 2.5 Flash | Rubric-aware LLM evaluation | Gemini API Docs |
| PDF Reports | FPDF2 | Student report generation | FPDF2 Docs |
| Containerization | Docker | Backend deployment image | docker.com |
| Hosting | Vercel | Next.js frontend deployment | vercel.com |
| Hosting | Google Cloud Run | Containerized backend deployment | cloud.google.com/run |
| CI/CD | GitHub Actions | Build and push automation | GitHub Actions |
Sample Data Flow
A practical walkthrough of one complete evaluation run:
- Upload `Ideal Answer Sheet.pdf`
- Upload `rubric.json`
- Upload one or more student PDFs
- Choose SBERT or Gemini as the evaluation engine
- Choose OCR backend: EasyOCR, Google Vision AI, or Azure Document Intelligence
- Click Run evaluation
- OCR JSON is created for ideal + all student sheets
- Diagram and formula outputs are extracted per page
- Student answers are compared page by page against ideal
- Weighted scores are calculated per question per student
- Evaluation JSON and PDF reports are generated
- Final summary ranking table is displayed with download links
Strengths & Challenges
Strengths
- Multimodal design — not just plain OCR
- Both local and cloud evaluation paths
- Handwritten OCR support
- Symbolic math comparison for formulas
- Downloadable PDF reports
- Both research and product-style interfaces
Challenges
- OCR quality is the biggest bottleneck for complex handwriting
- One-page-per-question simplification
- Diagram extraction is heuristic-based
- Formula detection can miss edge cases
- Large hosted runs need careful deployment tuning
Future Enhancements
- Better question segmentation beyond one-page-one-question assumption
- Stronger handwritten formula recognition (specialized models)
- Diagram object detection instead of only connected components
- Better dashboard analytics and charts
- Cloud storage for long-term report persistence
- Human-in-the-loop override and review mode
- Multi-rubric support for different courses
- Better benchmarking on large answer-sheet datasets
Conclusion
The Automated Subjective Answer Sheet Evaluation System is a research-driven, multimodal answer-sheet evaluation platform designed to handle the real complexity of descriptive academic assessment. Instead of relying only on exact keywords, it combines OCR, semantic similarity, diagram analysis, formula parsing, symbolic math evaluation, rubric-based weighting, and report generation into one connected workflow.
This makes the project much more realistic and extensible than a simple text-matching prototype. It already demonstrates a strong foundation for an academic BTP project and also forms the base for a future product-grade evaluation platform.
Quick Reference
Pipeline modules:
- `src/ocr_pipeline.py`
- `src/diagram_extractor.py`
- `src/formula_pipeline.py`
- `src/formula_evaluator.py`
- `src/evaluation_core.py`
- `src/llm_evaluator.py`
- `src/pipeline_service.py`
- `src/report_generator.py`
- `src/app.py`
- `backend_api/main.py`
Key files:
- `rubric.json`
- `requirements.txt`
- `requirements.backend.txt`
- `Dockerfile`
- `.env.production.example`
- `.github/workflows/backend-acr.yml`
- `web/app/evaluate/page.tsx`