# Auto-Processing Code Archive

This directory contains the complex auto-processing system that previously processed documents automatically after file upload.

## Archived Components

### Core Processing Files

- `files_with_auto_processing.py` - Original `files.py` router with automatic processing
- `pipeline_controller.py` - Complex multi-phase pipeline orchestration
- `task_processors.py` - Document processing task handlers

### Advanced Queue Management (Created but not deployed)

- `memory_aware_queue.py` - Memory-based intelligent queue management
- `enhanced_upload_handler.py` - Advanced upload handler with queuing
- `enhanced_upload.py` - API endpoints for advanced upload system

## What This System Did

### Automatic Processing Pipeline

1. **File Upload** → Immediate processing trigger
2. **PDF Conversion** (synchronous, blocking)
3. **Phase 1**: Structure discovery (Tika, Page Images, Document Analysis, Split Map)
4. **Phase 2**: Docling processing (NO_OCR → OCR → VLM pipelines)
5. **Complex Dependencies**: Phase coordination, task sequencing
6. **Redis Queue Management**: Service limits, rate limits, dependency tracking

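The phase ordering above can be illustrated with a minimal sketch. It is not the archived `pipeline_controller.py`: the names `run_pipeline` and `run_task` are hypothetical, and it assumes the arrows in Phase 2 mean "escalate to the next pipeline only if the previous one fails", which this README does not confirm.

```python
from typing import Callable

# Stage names taken from the list above; the task runner itself is a stand-in.
PHASE_1_TASKS = ["tika", "page_images", "document_analysis", "split_map"]
PHASE_2_PIPELINES = ["no_ocr", "ocr", "vlm"]


def run_pipeline(doc_id: str, run_task: Callable[[str, str], bool]) -> dict:
    """Run Phase 1 tasks in order, then try Phase 2 pipelines until one succeeds."""
    result = {"doc_id": doc_id, "phase_1": [], "phase_2": None}

    # Phase 1: every structure-discovery task must finish before Phase 2 starts.
    for task in PHASE_1_TASKS:
        if not run_task(doc_id, task):
            result["failed_at"] = task
            return result
        result["phase_1"].append(task)

    # Phase 2 (assumed behaviour): escalate from the cheapest pipeline upward.
    for pipeline in PHASE_2_PIPELINES:
        if run_task(doc_id, pipeline):
            result["phase_2"] = pipeline
            break
    return result
```

Even in this reduced form, a Phase 1 failure has to short-circuit Phase 2, which hints at the coordination overhead the real system carried.
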
### Features

- Multi-phase processing pipelines
- Complex task dependency management
- Memory-aware queue limits
- Multi-user capacity management
- Real-time processing status
- WebSocket status updates
- Service-specific resource limits
- Task recovery on restart

## Why Archived

The system was more complex than current needs justify:

- **Complexity**: Multi-phase pipelines with complex dependencies
- **Blocking Operations**: Synchronous PDF conversion caused timeouts
- **Resource Management**: Over-engineered for single-user scenarios
- **User Experience**: Users had to wait for processing to complete

## New Simplified Approach

The new system focuses on:

- **Simple Upload**: Just store files and create database records
- **No Auto-Processing**: Users manually trigger processing when needed
- **Directory Support**: Upload entire folders with manifest tracking
- **Immediate Response**: Users get instant confirmation without waiting

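As a rough illustration of the "store files and create database records" idea, the sketch below saves the uploaded bytes, inserts one row, and returns without starting any processing. The `files` table, its columns, and the function names are assumptions made for the example; the real schema and upload handler are not described in this README, and SQLite stands in for whatever database the project uses.

```python
import sqlite3
import uuid
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id TEXT PRIMARY KEY, name TEXT, path TEXT, status TEXT)"
    )


def store_upload(
    conn: sqlite3.Connection, storage_dir: Path, filename: str, data: bytes
) -> dict:
    """Save the bytes, insert a record, and return immediately (no processing)."""
    file_id = uuid.uuid4().hex
    storage_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with the id so two uploads with the same name do not collide.
    target = storage_dir / f"{file_id}_{Path(filename).name}"
    target.write_bytes(data)
    conn.execute(
        "INSERT INTO files (id, name, path, status) VALUES (?, ?, ?, ?)",
        (file_id, filename, str(target), "uploaded"),
    )
    conn.commit()
    return {"id": file_id, "status": "uploaded"}
```

Processing would then be a separate, user-triggered call that looks up the record by `id`.
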
## If You Need to Restore

To restore the auto-processing functionality:

1. Copy `files_with_auto_processing.py` back to `routers/database/files/files.py`
2. Ensure `pipeline_controller.py` and `task_processors.py` are in `modules/`
3. Update imports and dependencies
4. Re-enable background processing in upload handlers

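Steps 1 and 2 could be scripted roughly as follows. This is a sketch under assumptions: `restore_archive`, `archive_dir`, and `project_root` are hypothetical names, and the `.bak` backup of the current router is an extra precaution not mentioned in the steps. Steps 3 and 4 still need manual changes.

```python
import shutil
from pathlib import Path

# Destination paths come from the restore steps; everything else is illustrative.
ROUTER_TARGET = Path("routers/database/files/files.py")
MODULE_FILES = ["pipeline_controller.py", "task_processors.py"]


def restore_archive(archive_dir: Path, project_root: Path) -> list[Path]:
    """Copy the archived files back into place and return the paths written."""
    written = []

    router_dst = project_root / ROUTER_TARGET
    router_dst.parent.mkdir(parents=True, exist_ok=True)
    if router_dst.exists():
        # Keep the simplified router so the change can be undone.
        shutil.copy2(router_dst, router_dst.with_name(router_dst.name + ".bak"))
    shutil.copy2(archive_dir / "files_with_auto_processing.py", router_dst)
    written.append(router_dst)

    modules_dir = project_root / "modules"
    modules_dir.mkdir(parents=True, exist_ok=True)
    for name in MODULE_FILES:
        dst = modules_dir / name
        if not dst.exists():  # step 2 only asks that the files be present
            shutil.copy2(archive_dir / name, dst)
            written.append(dst)
    return written
```
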
## Migration Notes

The database schema and Redis structure remain compatible. The new simplified system can coexist with the archived processing logic if needed.

Date Archived: $(date)

Reason: Simplification for directory upload implementation