Implements the seed_exam_corpus.py skeleton TODOs against the real APIs and
fills the public exam corpus from official board sources.
Loader (run/initialization/seed_exam_corpus.py):
- _resolve_source_bytes: accepts a local path or a URL; URLs are fetched with an on-disk cache + PDF validation
- upload_file: real StorageAdmin.upload_file, skip-if-exists+sha256 unless --force
- upsert_specification/upsert_paper: real upserts on spec_code/exam_code.
  Fix: the QP/MS/INSERT/ER role now goes to eb_exams.type_code and doc_type is set to 'pdf'
  (doc_type is CHECK-constrained to file formats; the skeleton wrote the role there).
- copy_user_test_subset: copy a QP subset into a test user's cc.users exam space + files rows
- first_sweep: auto_map + the /auto-map row mapper over seeded QPs -> system-owned
  exam_templates + questions/response_areas/boundaries/layout (idempotent)
- identity discovery via institute_memberships.profile_id
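Sketch of the source-resolution and skip-if-exists logic listed above. This is a
minimal illustration, not the loader's code: resolve_source_bytes, needs_upload,
the injected fetch callable, the http(s) prefix test, the cache-file naming and
the "%PDF-" magic-byte check are all assumptions made for the example, and
"skip-if-exists+sha256" is read here as "skip when the stored sha256 matches".

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

PDF_MAGIC = b"%PDF-"  # assumed validation rule: PDF files start with this header


def resolve_source_bytes(source: str, cache_dir: Path,
                         fetch: Callable[[str], bytes]) -> bytes:
    """Return PDF bytes for a local path or a URL (hypothetical helper)."""
    if source.startswith(("http://", "https://")):
        # Cache key derived from the URL so repeat runs skip the network.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = cache_dir / f"{key}.pdf"
        if cached.exists():
            data = cached.read_bytes()
        else:
            data = fetch(source)
            # Only cache bytes that look like a PDF, so a bad download is retried.
            if data.startswith(PDF_MAGIC):
                cache_dir.mkdir(parents=True, exist_ok=True)
                cached.write_bytes(data)
    else:
        data = Path(source).read_bytes()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"not a PDF: {source}")
    return data


def needs_upload(data: bytes, existing_sha256: Optional[str],
                 force: bool = False) -> bool:
    """Upload when forced, when nothing is stored, or when the digest differs."""
    if force or existing_sha256 is None:
        return True
    return hashlib.sha256(data).hexdigest() != existing_sha256
```

Passing fetch in as a parameter keeps the sketch testable offline; the real
loader presumably calls its own HTTP client and StorageAdmin.upload_file.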
Manifest (run/initialization/manifests/):
- exam-corpus.yaml: 505 papers / 18 specs / AQA+Edexcel+OCR, every source URL HEAD-verified.
  AQA sciences GCSE 8461/8462/8463/8464 + AS/A-level 7401-7408, sessions JUN18-JUN24, QP+MS+ER, F+H.
- generate_corpus_manifest.py: regenerates + re-verifies all URLs from official hosts.
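Sketch of what HEAD-verifying a manifest URL could look like, using only the
standard library. head_ok, unverified and the injectable opener are hypothetical
names for this example; generate_corpus_manifest.py may use a different client,
status policy or retry behaviour.

```python
import urllib.request
from typing import Callable, Iterable, List


def head_ok(url: str, opener: Callable = urllib.request.urlopen,
            timeout: float = 10.0) -> bool:
    """Return True when a HEAD request for url answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts all subclass OSError.
        return False


def unverified(urls: Iterable[str],
               opener: Callable = urllib.request.urlopen) -> List[str]:
    """Return the URLs that failed verification, in input order."""
    return [u for u in urls if not head_ok(u, opener)]
```

An empty result from unverified() over every manifest URL would correspond to
the "every source URL HEAD-verified" claim.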
seed_curriculum.py: deprecation banner -> superseded by seed_exam_corpus.py; storage_loc
standardised on cc.examboards.
Verified on dev .94: full 505-paper seed (eb_specifications=18, eb_exams=505, QP=211),
idempotent re-runs, first-sweep + user-subset, 6/6 buckets provisioned.
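Sketch of why re-runs stay idempotent and why the role must not land in
doc_type. It uses an in-memory sqlite3 table as a stand-in; the real schema,
database engine, allowed doc_type values and the example exam_code are not
shown in this commit and are assumptions here.

```python
import sqlite3

ROLES = {"QP", "MS", "INSERT", "ER"}  # paper roles named in this commit


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Stand-in table: doc_type only accepts file formats (assumed list: 'pdf').
    conn.execute("""
        CREATE TABLE eb_exams (
            exam_code TEXT PRIMARY KEY,
            type_code TEXT NOT NULL,
            doc_type  TEXT NOT NULL CHECK (doc_type IN ('pdf'))
        )""")
    return conn


def upsert_paper(conn: sqlite3.Connection, exam_code: str, role: str) -> None:
    """Insert or update one paper keyed on exam_code (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    conn.execute(
        "INSERT INTO eb_exams (exam_code, type_code, doc_type) "
        "VALUES (?, ?, 'pdf') "
        "ON CONFLICT(exam_code) DO UPDATE SET "
        "type_code = excluded.type_code, doc_type = excluded.doc_type",
        (exam_code, role),
    )


conn = make_db()
for _ in range(2):  # second pass simulates a re-run of the seed
    upsert_paper(conn, "AQA-8461-1F-JUN18-QP", "QP")  # made-up exam_code
row_count = conn.execute("SELECT COUNT(*) FROM eb_exams").fetchone()[0]
```

Writing the role into doc_type, as the skeleton did, would violate the CHECK
constraint in this stand-in and raise sqlite3.IntegrityError.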
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>