CC Worker cdc105ae54 feat(seed): expand corpus to 1178 papers + download-only/unseed/granular reset
PRIMARY — corpus breadth (505->1178 papers, 18->60 specs, all URLs HEAD-verified):
- AQA (enumerated): Maths, English Lang/Lit, Geography, Computer Science, Business,
  Psychology, MFL (French/Spanish/German), GCSE + A-level, on top of round-1 sciences.
- Edexcel + OCR (confirmed direct URLs via research): Maths, English, Geography, History,
  Business, Computer Science, GCSE + A-level.
- generate_corpus_manifest.py: _subj/_mfl AQA builders, Edexcel/OCR spec+URL tables,
  derived exam_code (_mk_exam_code) matching the locked convention, concurrent re-verify.
Verified on dev .94: eb_specifications=60, eb_exams=1178, QP=469, doc_type all 'pdf',
seed idempotent (uploaded=673 new, skipped=505), failed=0.
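
The idempotent result above (a re-run uploads only what is new and skips the rest) can be pictured with a minimal sketch. This is not the loader from this commit: `SeedReport`, `seed`, the `existing_codes` set and the `upload` callable are hypothetical names, and keying the skip check on `exam_code` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SeedReport:
    """Counters in the same shape as the commit's verification line."""
    uploaded: int = 0
    skipped: int = 0
    failed: int = 0


def seed(
    manifest: Iterable[dict],
    existing_codes: set,
    upload: Callable[[dict], None],
) -> SeedReport:
    """Upload manifest items whose exam_code is not already present.

    Hypothetical helper: assumes each manifest item carries an 'exam_code'
    and that 'existing_codes' reflects what is already seeded.
    """
    report = SeedReport()
    for item in manifest:
        code = item["exam_code"]
        if code in existing_codes:
            # Already seeded on an earlier run: count it and move on.
            report.skipped += 1
            continue
        try:
            upload(item)
        except Exception:
            # A failed item is counted, not recorded, so a later run retries it.
            report.failed += 1
            continue
        existing_codes.add(code)
        report.uploaded += 1
    return report
```

Running `seed` twice over the same manifest should report every item as skipped on the second pass, which is the property the 673 new / 505 skipped figures describe.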

SECONDARY:
- --download-only + persistent bucket-shaped local store (manifests/_corpus_store/, gitignored):
  download once, seed many times, repeatable offline; --store-dir/--no-store. Implemented in
  _store_path, _item_bytes and download_corpus. Verified: store populated, seed reads offline
  (download_cached).
- --unseed [--board/--spec]: inverse of the loader. Removes storage objects (via the Storage API,
  since protect_delete blocks raw SQL), first-sweep seed templates, eb_exams, and eb_specifications.
  Verified reversible on .94.
- Granular admin reset: POST /admin/reset?scope=all|exam-corpus|timetable. reset_environment.reset(scope)
  adds EXAM_CORPUS_TABLES (10) + cc.examboards storage cleanup + TIMETABLE_TABLES (13); 'all' now also
  clears the exam subsystem the legacy reset missed. No schema migration required.
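
The download-once, seed-many store from the first SECONDARY item can be sketched as a read-through cache on disk. This is an illustration under assumptions, not the code from this commit: `store_path` and `item_bytes` only echo the names `_store_path`/`_item_bytes`, their signatures here are invented, the `fetch` callable stands in for the real HTTP download, and the `"downloaded"` status label is hypothetical (only `download_cached` appears in the commit message).

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def store_path(store_dir: Path, bucket: str, object_key: str) -> Path:
    """Mirror the bucket layout on disk: <store_dir>/<bucket>/<object_key>."""
    return store_dir / bucket / object_key


def item_bytes(
    store_dir: Optional[Path],
    bucket: str,
    object_key: str,
    fetch: Callable[[], bytes],
) -> Tuple[bytes, str]:
    """Return (content, status) for one corpus item.

    With store_dir=None (the --no-store case in this sketch) the item is
    always fetched. Otherwise the local copy is used when present, and a
    fresh download is written to the store for later offline runs.
    """
    if store_dir is None:
        return fetch(), "downloaded"
    path = store_path(store_dir, bucket, object_key)
    if path.exists():
        return path.read_bytes(), "download_cached"
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data, "downloaded"
```

Keeping the on-disk layout identical to the bucket layout means a later seed can map each stored file back to its object key without a separate index.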

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 23:33:20 +00:00
Description
FastAPI + Python 3.12 backend for Classroom Copilot — auth, document processing, transcription sessions, LLM integration, Supabase-backed
62 MiB
Languages
Python 98.9%
Shell 0.8%
Jupyter Notebook 0.3%