Implements the seed_exam_corpus.py skeleton TODOs against the real APIs and
fills the public exam corpus from official board sources.
Loader (run/initialization/seed_exam_corpus.py):
- _resolve_source_bytes: local path | url: fetch with on-disk cache + PDF validation
- upload_file: real StorageAdmin.upload_file, skip-if-exists+sha256 unless --force
- upsert_specification/upsert_paper: real upserts on spec_code/exam_code.
Fix: QP/MS/INSERT/ER role -> eb_exams.type_code; doc_type set to 'pdf'
(doc_type is CHECK-constrained to file formats; the skeleton wrote the role there).
- copy_user_test_subset: copy a QP subset into a test user's cc.users exam space + files rows
- first_sweep: auto_map + the /auto-map row mapper over seeded QPs -> system-owned
exam_templates + questions/response_areas/boundaries/layout (idempotent)
- identity discovery via institute_memberships.profile_id
Manifest (run/initialization/manifests/):
- exam-corpus.yaml: 505 papers / 18 specs / AQA+Edexcel+OCR, every source URL HEAD-verified.
AQA sciences GCSE 8461/8462/8463/8464 + AS/A-level 7401-7408, sessions JUN18-JUN24, QP+MS+ER, F+H.
- generate_corpus_manifest.py: regenerates + re-verifies all URLs from official hosts.
seed_curriculum.py: deprecation banner -> superseded by seed_exam_corpus.py; storage_loc
standardised on cc.examboards.
Verified on dev .94: full 505-paper seed (eb_specifications=18, eb_exams=505, QP=211),
idempotent re-runs, first-sweep + user-subset, 6/6 buckets provisioned.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the real AQA GCSE Physics 8463 specification and the AQA-PHYS-8463-1H-22-JUN
exam paper (Paper 1, Higher, June 2022, QP) to seed_curriculum.py, with storage_loc
pointing at the uploaded PDF in the cc.examboards bucket. spec_code AQA-PHYS-8463
matches the cc.public.exams Specification node (S4-1).
Applied + verified on dev .94: eb_specifications + eb_exams rows present; the real PDF
(3,963,384 bytes) is uploaded to cc.examboards/aqa/physics/8463/AQA-PHYS-8463-1H-22-JUN.pdf
and retrievable (HTTP 200, exact byte match). seed run populated the empty catalogue
(7 specs / 16 exams / 42 Neo4j topics).
NOTE: the PDF upload is a one-time ops step (curl from the host to the Storage API) —
the container can't reach the host file. A reproducible fixture-upload step is a follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>