3 Commits

Author SHA1 Message Date
CC Worker
5750413f43 feat(seed): implement exam-corpus loader + filled 505-paper manifest
Implements the seed_exam_corpus.py skeleton TODOs against the real APIs and
fills the public exam corpus from official board sources.

Loader (run/initialization/seed_exam_corpus.py):
- _resolve_source_bytes: local path | url: fetch with on-disk cache + PDF validation
- upload_file: real StorageAdmin.upload_file, skip-if-exists+sha256 unless --force
- upsert_specification/upsert_paper: real upserts on spec_code/exam_code.
  Fix: QP/MS/INSERT/ER role -> eb_exams.type_code; doc_type set to 'pdf'
  (doc_type is CHECK-constrained to file formats; the skeleton wrote the role there).
- copy_user_test_subset: copy a QP subset into a test user's cc.users exam space + files rows
- first_sweep: auto_map + the /auto-map row mapper over seeded QPs -> system-owned
  exam_templates + questions/response_areas/boundaries/layout (idempotent)
- identity discovery via institute_memberships.profile_id

Manifest (run/initialization/manifests/):
- exam-corpus.yaml: 505 papers / 18 specs / AQA+Edexcel+OCR, every source URL HEAD-verified.
  AQA sciences GCSE 8461/8462/8463/8464 + AS/A-level 7401-7408, sessions JUN18-JUN24, QP+MS+ER, F+H.
- generate_corpus_manifest.py: regenerates + re-verifies all URLs from official hosts.

seed_curriculum.py: deprecation banner -> superseded by seed_exam_corpus.py; storage_loc
standardised on cc.examboards.

Verified on dev .94: full 505-paper seed (eb_specifications=18, eb_exams=505, QP=211),
idempotent re-runs, first-sweep + user-subset, 6/6 buckets provisioned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:58:03 +00:00
CC Worker
5f822eaf87 feat(seed): AQA-PHYS-8463 spec + paper for exam-marker test
Adds the real AQA GCSE Physics 8463 specification and the AQA-PHYS-8463-1H-22-JUN
exam paper (Paper 1, Higher, June 2022, QP) to seed_curriculum.py, with storage_loc
pointing at the uploaded PDF in the cc.examboards bucket. spec_code AQA-PHYS-8463
matches the cc.public.exams Specification node (S4-1).

Applied + verified on dev .94: eb_specifications + eb_exams rows present; the real PDF
(3,963,384 bytes) is uploaded to cc.examboards/aqa/physics/8463/AQA-PHYS-8463-1H-22-JUN.pdf
and retrievable (HTTP 200, exact byte match). seed run populated the empty catalogue
(7 specs / 16 exams / 42 Neo4j topics).

NOTE: the PDF upload is a one-time ops step (curl from the host to the Storage API) —
the container can't reach the host file. A reproducible fixture-upload step is a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 16:19:10 +00:00
ead4452277 feat: add P0 seed scripts for timetable, planned lessons, file cabinets, and curriculum
- seed_kevlarai_timetable.py: Mirror Greenfield timetable structure for KevlarAI
  (8 classes, 2 teachers, 2 students, full slot/materialize/sync pipeline)
- seed_planned_lessons.py: 2-3 planned lessons per teacher across both schools
  (6 plans total, idempotent via title+subject check)
- seed_file_cabinets.py: One file cabinet per class with sample documents
  (14 cabinets, ~28 files, document_artefacts, cabinet_memberships)
- seed_curriculum.py: Exam board specifications and exams (AQA, Edexcel, OCR)
  (6 specs, 12 exam papers, Neo4j curriculum topics per school)
2026-05-29 21:15:05 +01:00