Auto-Processing Code Archive
This directory archives the auto-processing system that previously ran document processing automatically after every file upload.
Archived Components
Core Processing Files
- files_with_auto_processing.py - Original files.py router with automatic processing
- pipeline_controller.py - Complex multi-phase pipeline orchestration
- task_processors.py - Document processing task handlers
Advanced Queue Management (Created but not deployed)
- memory_aware_queue.py - Memory-based intelligent queue management
- enhanced_upload_handler.py - Advanced upload handler with queuing
- enhanced_upload.py - API endpoints for advanced upload system
What This System Did
Automatic Processing Pipeline
- File Upload → Immediate processing trigger
- PDF Conversion (synchronous, blocking)
- Phase 1: Structure discovery (Tika, Page Images, Document Analysis, Split Map)
- Phase 2: Docling processing (NO_OCR → OCR → VLM pipelines)
- Complex Dependencies: Phase coordination, task sequencing
- Redis Queue Management: Service limits, rate limits, dependency tracking
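To make the phase coordination above concrete, here is a minimal sketch of how task sequencing could be expressed as a dependency graph. It is not taken from pipeline_controller.py: the task names mirror the steps listed above, but the exact dependency edges and the `PIPELINE_DEPENDENCIES` and `execution_order` names are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each task maps to the set of tasks it waits on.
# Task names follow the pipeline described above; the edges are assumed.
PIPELINE_DEPENDENCIES = {
    "pdf_conversion": set(),
    "tika": {"pdf_conversion"},
    "page_images": {"pdf_conversion"},
    "document_analysis": {"tika", "page_images"},
    "split_map": {"document_analysis"},
    "docling_no_ocr": {"split_map"},
    "docling_ocr": {"docling_no_ocr"},
    "docling_vlm": {"docling_ocr"},
}


def execution_order(dependencies):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE_DEPENDENCIES)
print(order)
```

Even in this reduced form, every task needs its prerequisites tracked before it can start, which hints at the coordination overhead the real system carried.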
Features
- Multi-phase processing pipelines
- Complex task dependency management
- Memory-aware queue limits
- Multi-user capacity management
- Real-time processing status
- WebSocket status updates
- Service-specific resource limits
- Task recovery on restart
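As an illustration of what a memory-aware queue limit can mean, the sketch below admits tasks in arrival order only while their estimated memory fits a fixed budget. It is a simplified, in-process stand-in and not the contents of memory_aware_queue.py; the class name, the per-task `estimated_mb` figure, and the fixed budget are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryAwareQueue:
    """Hypothetical FIFO queue that starts tasks only while memory fits."""

    budget_mb: int
    in_use_mb: int = 0
    waiting: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)

    def submit(self, task_id, estimated_mb):
        """Queue a task and return the ids of any tasks that can start now."""
        self.waiting.append((task_id, estimated_mb))
        return self._admit()

    def finish(self, task_id):
        """Release a running task's memory and admit waiting tasks."""
        self.in_use_mb -= self.running.pop(task_id)
        return self._admit()

    def _admit(self):
        # Strict FIFO: stop at the first task that does not fit, so a task
        # larger than the whole budget would block the queue indefinitely.
        started = []
        while self.waiting and self.in_use_mb + self.waiting[0][1] <= self.budget_mb:
            task_id, estimated_mb = self.waiting.popleft()
            self.running[task_id] = estimated_mb
            self.in_use_mb += estimated_mb
            started.append(task_id)
        return started
```

With a budget of 1000 MB, submitting two 600 MB tasks starts the first immediately and holds the second until `finish` is called for the first.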
Why Archived
The system was overly complex for the current needs:
- Complexity: Multi-phase pipelines with complex dependencies
- Blocking Operations: Synchronous PDF conversion causing timeouts
- Resource Management: Over-engineered for single-user scenarios
- User Experience: Users had to wait for processing to complete
New Simplified Approach
The new system focuses on:
- Simple Upload: Just store files and create database records
- No Auto-Processing: Users manually trigger processing when needed
- Directory Support: Upload entire folders with manifest tracking
- Immediate Response: Users get instant confirmation without waiting
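The manifest tracking mentioned above could look roughly like the following sketch, which records every file in an uploaded folder without processing any of them. The manifest fields (`relative_path`, `size_bytes`, `sha256`, `status`) and the `build_manifest` name are assumptions; the actual manifest format is not described here.

```python
import hashlib
from pathlib import Path


def build_manifest(upload_root: Path) -> dict:
    """Describe every file under upload_root; no processing is triggered."""
    entries = []
    for path in sorted(upload_root.rglob("*")):
        if path.is_file():
            entries.append({
                "relative_path": path.relative_to(upload_root).as_posix(),
                "size_bytes": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                # Assumed status value: processing is started manually later.
                "status": "uploaded",
            })
    return {"file_count": len(entries), "files": entries}
```

Because the function only reads file metadata and contents, it can return as soon as the files are stored, which matches the immediate-response goal.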
If You Need to Restore
To restore the auto-processing functionality:
- Copy files_with_auto_processing.py back to routers/database/files/files.py
- Ensure pipeline_controller.py and task_processors.py are in modules/
- Update imports and dependencies
- Re-enable background processing in upload handlers
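The file-copy part of the steps above can be scripted. The following is a minimal sketch, assuming the archived files keep their names when placed in modules/ and that the script is given the archive directory and the project root; `RESTORE_MAP` and `restore_archive` are hypothetical names. Updating imports and re-enabling background processing still have to be done by hand.

```python
import shutil
from pathlib import Path

# Archive file name -> destination relative to the project root.
# Destinations come from the steps above; unchanged file names are assumed.
RESTORE_MAP = {
    "files_with_auto_processing.py": "routers/database/files/files.py",
    "pipeline_controller.py": "modules/pipeline_controller.py",
    "task_processors.py": "modules/task_processors.py",
}


def restore_archive(archive_dir: Path, project_root: Path, dry_run: bool = True):
    """Plan the copies and, unless dry_run is set, perform them."""
    planned = []
    for name, target in RESTORE_MAP.items():
        source = archive_dir / name
        if not source.is_file():
            raise FileNotFoundError(source)
        planned.append((source, project_root / target))

    if not dry_run:
        for source, destination in planned:
            destination.parent.mkdir(parents=True, exist_ok=True)
            if destination.exists():
                # Keep the file being replaced as a .bak copy next to it.
                backup = destination.with_name(destination.name + ".bak")
                shutil.copy2(destination, backup)
            shutil.copy2(source, destination)
    return planned
```

Running it first with the default `dry_run=True` returns the planned source and destination pairs without touching the project, which is a cheap way to check paths before overwriting the current files.py.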
Migration Notes
The database schema and Redis structure remain compatible. The new simplified system can coexist with the archived processing logic if needed.
Date Archived: $(date)
Reason: Simplification for directory upload implementation