2025-11-14 14:47:19 +00:00

Auto-Processing Code Archive

This directory contains the archived auto-processing system, which previously processed each document automatically after file upload.

Archived Components

Core Processing Files

  • files_with_auto_processing.py - Original files.py router with automatic processing
  • pipeline_controller.py - Complex multi-phase pipeline orchestration
  • task_processors.py - Document processing task handlers

Advanced Queue Management (Created but not deployed)

  • memory_aware_queue.py - Memory-based intelligent queue management
  • enhanced_upload_handler.py - Advanced upload handler with queuing
  • enhanced_upload.py - API endpoints for advanced upload system

What This System Did

Automatic Processing Pipeline

  1. File Upload → Immediate processing trigger
  2. PDF Conversion (synchronous, blocking)
  3. Phase 1: Structure discovery (Tika, Page Images, Document Analysis, Split Map)
  4. Phase 2: Docling processing (NO_OCR → OCR → VLM pipelines)
  5. Complex Dependencies: Phase coordination, task sequencing
  6. Redis Queue Management: Service limits, rate limits, dependency tracking
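
The sequencing described in the steps above can be sketched as a small dependency graph. This is only an illustration, not the archived pipeline_controller.py: the stage names come from the list, while the exact edges between stages, the dictionary name, and both helper functions are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each stage lists the stages it waits on.
# Stage names follow the pipeline description; the edges are assumed.
PIPELINE_DEPENDENCIES = {
    "pdf_conversion": set(),
    "tika": {"pdf_conversion"},
    "page_images": {"pdf_conversion"},
    "document_analysis": {"tika", "page_images"},
    "split_map": {"document_analysis"},
    "docling_no_ocr": {"split_map"},
    "docling_ocr": {"docling_no_ocr"},
    "docling_vlm": {"docling_ocr"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def ready_batches(dependencies):
    """Yield groups of stages whose dependencies are all complete."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        yield batch
        # Mark the whole batch finished so dependent stages become ready.
        sorter.done(*batch)


if __name__ == "__main__":
    for batch in ready_batches(PIPELINE_DEPENDENCIES):
        print(batch)
```

Under these assumed edges, only the Tika and Page Images stages can run side by side; every other stage waits on exactly one predecessor, which is the kind of coordination the pipeline had to track.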

Features

  • Multi-phase processing pipelines
  • Complex task dependency management
  • Memory-aware queue limits
  • Multi-user capacity management
  • Real-time processing status
  • WebSocket status updates
  • Service-specific resource limits
  • Task recovery on restart
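
As a rough idea of what a memory-aware queue limit involves, here is a minimal sketch that admits tasks only while their estimated memory fits within a budget. It does not reproduce memory_aware_queue.py; the class name, its fields, and the per-task memory estimates are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryAwareQueue:
    """Admit tasks only while their estimated memory fits the budget."""

    budget_mb: int
    in_use_mb: int = 0
    waiting: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)

    def submit(self, task_id, estimated_mb):
        """Queue a task and return the ids of any tasks started as a result."""
        self.waiting.append((task_id, estimated_mb))
        return self._admit()

    def finish(self, task_id):
        """Release a task's memory and return the ids of newly started tasks."""
        self.in_use_mb -= self.running.pop(task_id)
        return self._admit()

    def _admit(self):
        started = []
        # Strict FIFO: stop at the first task that does not fit, so a large
        # task is never starved by smaller ones queued behind it.
        # Caveat: a task larger than the whole budget would block the queue.
        while self.waiting and self.in_use_mb + self.waiting[0][1] <= self.budget_mb:
            task_id, mb = self.waiting.popleft()
            self.running[task_id] = mb
            self.in_use_mb += mb
            started.append(task_id)
        return started
```

The strict first-in, first-out admission is a design choice for the sketch: it trades some throughput for predictability, since letting small tasks jump ahead could delay a large document indefinitely.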

Why Archived

The system was more complex than current needs justify:

  • Complexity: Multi-phase pipelines with complex dependencies
  • Blocking Operations: Synchronous PDF conversion causing timeouts
  • Resource Management: Over-engineered for single-user scenarios
  • User Experience: Users had to wait for processing to complete

New Simplified Approach

The new system focuses on:

  • Simple Upload: Just store files and create database records
  • No Auto-Processing: Users manually trigger processing when needed
  • Directory Support: Upload entire folders with manifest tracking
  • Immediate Response: Users get instant confirmation without waiting
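
A minimal sketch of this upload-only flow, assuming a local storage directory and a SQLite table stand in for the real storage and database. The function name, table layout, "uploaded" status value, and manifest.json file name are hypothetical; the point is that nothing here triggers processing.

```python
import json
import shutil
import sqlite3
from pathlib import Path


def upload_directory(source_dir, storage_dir, conn):
    """Copy every file under source_dir, record it, and write a manifest."""
    source_dir, storage_dir = Path(source_dir), Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, status TEXT)"
    )
    manifest = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir).as_posix()
        dest = storage_dir / relative
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        size = dest.stat().st_size
        # Only a record is created; processing is left for a later manual trigger.
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, status) VALUES (?, ?, ?)",
            (relative, size, "uploaded"),
        )
        manifest.append({"path": relative, "size": size})
    conn.commit()
    # The manifest preserves the folder layout of the uploaded directory.
    (storage_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Because the function returns as soon as the files are copied and recorded, the caller can confirm the upload immediately, regardless of how long later processing might take.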

If You Need to Restore

To restore the auto-processing functionality:

  1. Copy files_with_auto_processing.py back to routers/database/files/files.py
  2. Ensure pipeline_controller.py and task_processors.py are in modules/
  3. Update imports and dependencies
  4. Re-enable background processing in upload handlers
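
Steps 1 and 2 could be scripted roughly as follows. This is a sketch under assumptions: the helper and its dry-run flag are hypothetical, the target paths are the ones named in the steps, and the choice to back up the current router and to leave existing modules untouched is ours rather than a documented procedure.

```python
import shutil
from pathlib import Path

# (archived file, target path relative to the project root, overwrite existing?)
RESTORE_PLAN = [
    ("files_with_auto_processing.py", "routers/database/files/files.py", True),
    ("pipeline_controller.py", "modules/pipeline_controller.py", False),
    ("task_processors.py", "modules/task_processors.py", False),
]


def restore_archived_files(archive_dir, project_root, dry_run=True):
    """Copy archived files back into place and report what was (or would be) done."""
    archive_dir, project_root = Path(archive_dir), Path(project_root)
    # Check every source first so a missing file cannot leave a partial restore.
    missing = [name for name, _, _ in RESTORE_PLAN if not (archive_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing archived files: {missing}")
    actions = []
    for name, target, overwrite in RESTORE_PLAN:
        src, dest = archive_dir / name, project_root / target
        if dest.exists() and not overwrite:
            actions.append(("keep", target))
            continue
        actions.append(("copy", target))
        if dry_run:
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists():
            # Keep the simplified router recoverable as files.py.bak.
            shutil.copy2(dest, dest.with_name(dest.name + ".bak"))
        shutil.copy2(src, dest)
    return actions
```

Calling it with the default dry_run=True only lists the planned actions. Steps 3 and 4 (imports, dependencies, and re-enabling background processing) still need to be done by hand.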

Migration Notes

The database schema and Redis structure remain compatible. The new simplified system can coexist with the archived processing logic if needed.

Date Archived: $(date)

Reason: Simplification for directory upload implementation