CC Worker 69d9c46abe
feat(docling): B1 image-only OCR eval harness (overwatch-cleaned)
Eval harness for AQA A-level + GCSE-science image-only papers: finalize.py --b1-only,
RapidOCR runner (rapid_pass.py via dsync), GT fixtures (make_b1_gt.py + b1_gt_labels.json),
and fetch_b1_corpus.py to pull the eval corpus from .94 cc.examboards at runtime.

Salvaged from t_15be12ed, which hit its iteration budget while re-running OCR. Exam PDFs and
generated OCR caches/reports are NOT committed (third-party copyright, and reproducible); they are
gitignored and fetched or generated at runtime. Baseline coverage is recorded in the task evidence file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 03:10:10 +00:00

API Docling first-pass auto-map package

This package is the in-API home for the S5 exam-template/first-pass/v1 extraction pipeline copied from /home/kcar/dev/docling-exam-spike.

auto_map(pdf_bytes) returns the editable first-pass template.json shape consumed by downstream exam-marker mapping. The pipeline treats margins as constraining inputs. It derives document-level left/right margins and per-page top/bottom margins before template assembly. It then constrains part/question bands and furniture/figure boxes to those margins.
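The following is a minimal sketch of the margin constraint described above. It assumes three things:

  • Margins are absolute page coordinates, with the origin at top-left and y increasing downward.
  • "Constrained" means each band or box is clipped to the margin-bounded content area.
  • `Box`, `Margins`, `margins_for_page`, and `constrain` are hypothetical names, not this package's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical types for illustration; the package's real structures may differ.
@dataclass(frozen=True)
class Box:
    x0: float  # left edge
    y0: float  # top edge (y grows downward)
    x1: float  # right edge
    y1: float  # bottom edge


@dataclass(frozen=True)
class Margins:
    left: float    # document-level, shared by all pages
    right: float   # document-level, shared by all pages
    top: float     # per-page
    bottom: float  # per-page


def margins_for_page(doc_lr: Tuple[float, float], page_tb: Tuple[float, float]) -> Margins:
    """Combine the document's left/right margins with one page's top/bottom margins."""
    return Margins(left=doc_lr[0], right=doc_lr[1], top=page_tb[0], bottom=page_tb[1])


def constrain(box: Box, m: Margins) -> Optional[Box]:
    """Clip a band or figure box to the margin-bounded content area.

    Returns None when no part of the box lies inside the content area.
    """
    x0, x1 = max(box.x0, m.left), min(box.x1, m.right)
    y0, y1 = max(box.y0, m.top), min(box.y1, m.bottom)
    if x0 >= x1 or y0 >= y1:
        return None
    return Box(x0, y0, x1, y1)
```

For example, take `margins_for_page((40.0, 555.0), (60.0, 780.0))`. A full-width band `Box(20, 100, 600, 140)` is clipped to `Box(40.0, 100, 555.0, 140)`. A box lying wholly in the top-margin strip returns None. Clipping, rather than rejecting, keeps a band that slightly overhangs a margin usable.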

dsync Redis env wiring

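The OCR path uses dsync.py for docling-serve GPU locking, page caching, and retries. Configure it through the following environment variables (only the names are listed here; values are supplied by the environment):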

  • DOCLING_SERVE
  • DOCLING_REDIS_URL
  • DOCLING_REDIS_HOST
  • DOCLING_REDIS_PORT
  • DOCLING_REDIS_PASSWORD
  • DOCLING_REDIS_DB
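A minimal sketch of reading the Redis variables above, using only the standard library. It rests on these assumptions, none of which the text above states:

  • `DOCLING_REDIS_URL`, when set, is preferred over the host/port/password/db parts.
  • An unset port defaults to Redis's usual 6379, and an unset db defaults to 0.
  • `RedisSettings` and `load_redis_settings` are hypothetical names, not dsync.py's API.

`DOCLING_SERVE` is left out because its exact format is not described here.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class RedisSettings:
    """Hypothetical container for dsync's Redis configuration."""
    url: Optional[str]
    host: Optional[str]
    port: int
    password: Optional[str] = field(repr=False)  # keep the secret out of reprs and logs
    db: int

    def prefers_url(self) -> bool:
        # Assumption: a full URL overrides the individual host/port/password/db parts.
        return self.url is not None


def load_redis_settings(env: Mapping[str, str] = os.environ) -> Optional[RedisSettings]:
    """Read DOCLING_REDIS_* variables; return None if Redis is not configured at all."""
    url = env.get("DOCLING_REDIS_URL") or None
    host = env.get("DOCLING_REDIS_HOST") or None
    if url is None and host is None:
        return None  # nothing configured: the caller runs without cache/lock
    return RedisSettings(
        url=url,
        host=host,
        port=int(env.get("DOCLING_REDIS_PORT") or 6379),
        password=env.get("DOCLING_REDIS_PASSWORD") or None,
        db=int(env.get("DOCLING_REDIS_DB") or 0),
    )
```

Accepting any mapping as `env` lets tests pass a plain dict instead of mutating the real process environment.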

If Redis is unavailable, dsync falls back to running without the page cache or lock and logs that it is doing so. Do not put secret values in this file.
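A sketch of that fallback behaviour. It makes the following assumptions:

  • The client is redis-py-style, meaning it provides `ping()`, `get()`, `set()`, and `lock(name, timeout=...)`.
  • The names `connect_or_none`, `gpu_lock`, and `cached_page`, the lock name, and the timeout are all hypothetical. None is taken from dsync.py.

The client factory is injected, so the degraded path can be exercised without a Redis server.

```python
import contextlib
import logging
from typing import Any, Callable, Iterator, Optional

log = logging.getLogger("dsync_sketch")  # hypothetical logger name


def connect_or_none(factory: Callable[[], Any]) -> Optional[Any]:
    """Return a live client, or None (after logging) if Redis cannot be reached."""
    try:
        client = factory()
        client.ping()  # force a round-trip so connection failures surface here
        return client
    except Exception as exc:  # connection refused, auth error, DNS failure, etc.
        log.warning("Redis unavailable (%s); running without page cache or GPU lock", exc)
        return None


@contextlib.contextmanager
def gpu_lock(client: Optional[Any], name: str = "docling-serve-gpu",
             timeout: float = 600.0) -> Iterator[None]:
    """Serialize GPU work through a Redis lock, or do nothing in degraded mode."""
    if client is None:
        yield  # no cross-process locking without Redis
        return
    with client.lock(name, timeout=timeout):
        yield


def cached_page(client: Optional[Any], key: str, compute: Callable[[], bytes]) -> bytes:
    """Return a cached page result, computing and storing it on a miss."""
    if client is not None:
        hit = client.get(key)
        if hit is not None:
            return hit
    value = compute()  # without Redis, every call recomputes
    if client is not None:
        client.set(key, value)
    return value
```

With redis-py installed, the factory could be `lambda: redis.Redis.from_url(url)`. When no server is reachable, `connect_or_none` returns None, `gpu_lock` becomes a no-op, and `cached_page` always recomputes. OCR therefore still runs, but without cross-process serialization of GPU calls or reuse of cached pages.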