api/data/logs/modules.pipeline_controller_.log
2025-11-19 18:08:54 +00:00

2025-09-22 20:59:08,118 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 20:59:08,127 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 3c1926fe-9a3a-4cff-93dc-31b6b09d284f
2025-09-22 20:59:08,128 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 494ef087-ad27-4ba8-bfe9-d831c1b1e497
2025-09-22 20:59:08,129 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 51cf3cd5-8c70-45dd-9fbe-9837638122ac
2025-09-22 20:59:08,129 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task cea9271c-18a1-4979-a07f-aed37487cbe0
2025-09-22 20:59:08,130 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 58064c7b-a41a-4910-911b-788a3d31aaac
2025-09-22 20:59:08,130 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 8e837df9-03fe-4693-8fa5-712ea19f6ef4
2025-09-22 20:59:08,130 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 3c1926fe-9a3a-4cff-93dc-31b6b09d284f: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 20:59:19,079 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 3c1926fe-9a3a-4cff-93dc-31b6b09d284f
2025-09-22 20:59:19,079 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 3c1926fe-9a3a-4cff-93dc-31b6b09d284f: ['no_ocr', 'ocr', 'vlm']
2025-09-22 20:59:19,079 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 20:59:19,138 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 20:59:19,139 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 20:59:19,139 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task dbf17afa-4b69-42fd-9008-017d3192bf2b for no_ocr pipeline
2025-09-22 20:59:19,139 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 20:59:19,139 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:328 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['dbf17afa-4b69-42fd-9008-017d3192bf2b']
2025-09-22 20:59:19,172 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 20:59:19,173 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 20:59:19,173 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task 53103ced-a545-4dbc-ad02-284d03e54594 for ocr pipeline
2025-09-22 20:59:19,173 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued ocr pipeline with 1 tasks
2025-09-22 20:59:19,173 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:328 >>> Pipeline vlm will depend on 1 tasks from ocr: ['53103ced-a545-4dbc-ad02-284d03e54594']
2025-09-22 20:59:19,198 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 20:59:19,198 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_sections (timeout: 3600s)
2025-09-22 20:59:19,199 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task b0f1ecb9-7b75-46e8-8349-50d07acdc62b for vlm pipeline
2025-09-22 20:59:19,200 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued vlm pipeline with 1 tasks
2025-09-22 20:59:19,200 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 3c1926fe-9a3a-4cff-93dc-31b6b09d284f
2025-09-22 21:08:51,648 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:08:51,676 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file e9bf8a6e-128c-4e0a-a782-c92aa8414b92
2025-09-22 21:08:51,678 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task af7cf351-7239-4764-8f3d-94e6b2556565
2025-09-22 21:08:51,679 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 32a6a6eb-bc97-4323-9715-200b774cfe2a
2025-09-22 21:08:51,680 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 517de10f-341e-4f8c-8567-676baad446ee
2025-09-22 21:08:51,681 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 5c5200d7-fc3b-4814-aece-e26ac851b92d
2025-09-22 21:08:51,681 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task d3a32e26-d862-4dd5-807c-bbc77bcbb99a
2025-09-22 21:08:51,681 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file e9bf8a6e-128c-4e0a-a782-c92aa8414b92: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:09:51,674 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:09:51,686 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file d6b8e156-339a-42f5-b1b1-5d677b7c4bb3
2025-09-22 21:09:51,689 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task b355d1d9-ea20-4c72-b8ee-6fbc52aa8cdf
2025-09-22 21:09:51,690 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 57c2d79a-ade4-45eb-bf55-3e0d8656556e
2025-09-22 21:09:51,691 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 6ae2139d-91be-47c1-8904-7efe827972d6
2025-09-22 21:09:51,691 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task b1d08616-0313-4293-90cb-7d565b02068e
2025-09-22 21:09:51,693 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 107484b1-b391-484d-8162-93d8a287326b
2025-09-22 21:09:51,693 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file d6b8e156-339a-42f5-b1b1-5d677b7c4bb3: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:09:57,524 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file d6b8e156-339a-42f5-b1b1-5d677b7c4bb3
2025-09-22 21:09:57,524 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file d6b8e156-339a-42f5-b1b1-5d677b7c4bb3: ['no_ocr', 'ocr', 'vlm']
2025-09-22 21:09:57,524 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 21:09:57,551 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 21:09:57,552 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 21:09:57,552 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task 863d7962-a302-4bac-8829-a0579a724d1e for no_ocr pipeline
2025-09-22 21:09:57,552 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 21:09:57,552 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:328 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['863d7962-a302-4bac-8829-a0579a724d1e']
2025-09-22 21:09:57,567 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 21:09:57,568 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 21:09:57,568 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task bfbf5b13-3bdd-446d-aaff-1616c3d5bfcf for ocr pipeline
2025-09-22 21:09:57,568 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued ocr pipeline with 1 tasks
2025-09-22 21:09:57,568 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:328 >>> Pipeline vlm will depend on 1 tasks from ocr: ['bfbf5b13-3bdd-446d-aaff-1616c3d5bfcf']
2025-09-22 21:09:57,586 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 21:09:57,586 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_sections (timeout: 3600s)
2025-09-22 21:09:57,587 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task 43f64cc6-c2a8-4fbc-b1d9-2e57ef3739c2 for vlm pipeline
2025-09-22 21:09:57,587 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued vlm pipeline with 1 tasks
2025-09-22 21:09:57,587 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file d6b8e156-339a-42f5-b1b1-5d677b7c4bb3
2025-09-22 21:25:37,100 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:25:37,109 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 6890512d-a4ef-4a52-9802-f89f64baba9a
2025-09-22 21:25:37,110 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task e57852d5-070b-4491-b05d-89e954490198
2025-09-22 21:25:37,111 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 6147f0d9-82bd-4209-b2cd-2cb77a26316b
2025-09-22 21:25:37,111 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 10337c09-7331-416a-aa41-072cc9a6cbfe
2025-09-22 21:25:37,112 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 24f26ad7-cd42-4201-b540-0c51e9e24da0
2025-09-22 21:25:37,112 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task cdcd4a94-e764-478d-abb1-b4d1de61ee53
2025-09-22 21:25:37,112 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 6890512d-a4ef-4a52-9802-f89f64baba9a: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:25:48,009 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 6890512d-a4ef-4a52-9802-f89f64baba9a
2025-09-22 21:25:48,009 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:307 >>> No docling pipelines enabled for file 6890512d-a4ef-4a52-9802-f89f64baba9a
2025-09-22 21:26:47,863 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:26:47,893 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file e0efc11e-b489-41bc-b5e8-a31f7b6bb846
2025-09-22 21:26:47,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task d6f5dd28-5c98-460e-a1b8-7cccea46841b
2025-09-22 21:26:47,896 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 779b1ede-8545-4457-9779-492a7d116bac
2025-09-22 21:26:47,896 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task b4ae6a65-d5e4-4ef0-a28a-70f7e800737e
2025-09-22 21:26:47,900 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 7fd1277c-23b3-404b-a0d6-2c62d7ffc41f
2025-09-22 21:26:47,902 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 127389f7-b4bd-4f91-b04a-75f562789151
2025-09-22 21:26:47,902 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file e0efc11e-b489-41bc-b5e8-a31f7b6bb846: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:26:58,755 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file e0efc11e-b489-41bc-b5e8-a31f7b6bb846
2025-09-22 21:26:58,755 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file e0efc11e-b489-41bc-b5e8-a31f7b6bb846: ['no_ocr']
2025-09-22 21:26:58,755 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 21:26:58,786 INFO : pipeline_controller.py:_determine_processing_mode:382 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 21:26:58,786 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 21:26:58,786 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Successfully enqueued docling_bundle_split task 9b22f5c3-66cd-4f2e-872e-7b4647af4f75 for no_ocr pipeline
2025-09-22 21:26:58,787 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 21:26:58,787 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file e0efc11e-b489-41bc-b5e8-a31f7b6bb846
2025-09-22 21:33:07,442 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:33:07,452 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 254728dd-ab88-4684-92c1-c75a52339beb
2025-09-22 21:33:07,454 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task a4eb4c7e-1df5-4b44-9b48-0bd69bb01951
2025-09-22 21:33:07,454 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 664e6727-1436-4755-93fe-05874d296281
2025-09-22 21:33:07,454 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task e018ceb4-142a-4ed7-8e1c-5d1366f3811c
2025-09-22 21:33:07,455 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task ef6f1b90-9ed1-4e80-b1d5-206535610099
2025-09-22 21:33:07,456 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task ea89a148-0187-4f30-9578-336da261905b
2025-09-22 21:33:07,456 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 254728dd-ab88-4684-92c1-c75a52339beb: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:34:35,452 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:34:35,462 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file a1a40cf1-c841-4d33-aaef-a15a052c30ec
2025-09-22 21:34:35,464 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 4500595b-6b67-452e-829d-06d5640d89d2
2025-09-22 21:34:35,464 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 9af54162-8c35-4a27-91e7-c506dc79721a
2025-09-22 21:34:35,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 29777d16-b201-475b-85e7-e984431f0616
2025-09-22 21:34:35,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 5a2b17b2-65ef-4f40-a45f-e805e8124fc1
2025-09-22 21:34:35,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task e3332fdb-22b2-43fa-b8ac-d0a5451fd947
2025-09-22 21:34:35,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file a1a40cf1-c841-4d33-aaef-a15a052c30ec: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:37:21,517 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:37:21,526 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file ffb8377d-fb4c-471e-9b71-851b1c01de2a
2025-09-22 21:37:21,528 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 4c8fcb0b-1d16-4fc0-84c6-f3bf39c7c720
2025-09-22 21:37:21,528 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 65a15b1c-3247-47f3-afb7-3944d3fbf651
2025-09-22 21:37:21,529 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task e0a67120-5b5b-45f9-9622-7338b55606a0
2025-09-22 21:37:21,529 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task bb0bc897-88f3-4030-b608-90495263ec7e
2025-09-22 21:37:21,530 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 42014733-f9a6-4f30-9ba4-b622c796fb13
2025-09-22 21:37:21,530 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file ffb8377d-fb4c-471e-9b71-851b1c01de2a: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:37:32,414 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file ffb8377d-fb4c-471e-9b71-851b1c01de2a
2025-09-22 21:37:32,414 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file ffb8377d-fb4c-471e-9b71-851b1c01de2a: ['ocr']
2025-09-22 21:37:32,414 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline ocr has no dependencies (first pipeline)
2025-09-22 21:37:32,434 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:39:06,816 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:39:06,825 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 015b8a3b-b194-4664-b1d0-24921b738465
2025-09-22 21:39:06,826 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 6a61c069-75d0-4f9d-a440-c0470e43b67a
2025-09-22 21:39:06,827 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task aa3e3dab-f6a4-4816-9069-1e8a47eca0f7
2025-09-22 21:39:06,827 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 125651e2-c4ad-413c-92d7-1bf4c8d8fbf2
2025-09-22 21:39:06,828 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 6b86fd6d-1e2a-4e27-a8aa-45d045b28b7f
2025-09-22 21:39:06,828 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 7dc7dab0-c52f-4cda-8e29-60e4c1bce7c2
2025-09-22 21:39:06,828 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 015b8a3b-b194-4664-b1d0-24921b738465: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:39:17,687 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 015b8a3b-b194-4664-b1d0-24921b738465
2025-09-22 21:39:17,687 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 015b8a3b-b194-4664-b1d0-24921b738465: ['ocr']
2025-09-22 21:39:17,687 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline ocr has no dependencies (first pipeline)
2025-09-22 21:39:17,706 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:40:34,265 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:40:34,275 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 5f4d6182-f1a8-4fa1-b4db-ef8cafd0503c
2025-09-22 21:40:34,277 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 273ad5e1-354f-4702-8273-38feb9714e09
2025-09-22 21:40:34,278 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 67b2d575-f658-4380-a1c0-c3e8b50dac6e
2025-09-22 21:40:34,279 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task ac7d431e-fbe7-476f-ad53-6c93e58f10ad
2025-09-22 21:40:34,279 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 99981ed5-a507-43f3-b0b7-f0f8ee2a8f7a
2025-09-22 21:40:34,279 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 56c2d4bc-772c-49f7-b579-c721c37d1a20
2025-09-22 21:40:34,279 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 5f4d6182-f1a8-4fa1-b4db-ef8cafd0503c: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:40:45,154 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 5f4d6182-f1a8-4fa1-b4db-ef8cafd0503c
2025-09-22 21:40:45,154 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 5f4d6182-f1a8-4fa1-b4db-ef8cafd0503c: ['vlm']
2025-09-22 21:40:45,154 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline vlm has no dependencies (first pipeline)
2025-09-22 21:40:45,182 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:42:16,832 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:42:16,843 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file dccf4f5a-bf42-4ca9-b7c7-f9182bbe3e3c
2025-09-22 21:42:16,845 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task b5310bc9-8e33-4eb0-9daf-d2c392f2a8a1
2025-09-22 21:42:16,845 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task b1cd86a2-a3b7-40af-831f-9fba235995e1
2025-09-22 21:42:16,846 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 0c7ee1c1-4069-4ff6-80c3-c3b32232d4c4
2025-09-22 21:42:16,846 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task bb9046ab-f840-4e11-a471-2b220ae902a6
2025-09-22 21:42:16,847 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task ab5f60c1-e89d-4e62-8e24-c23f50174f94
2025-09-22 21:42:16,847 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file dccf4f5a-bf42-4ca9-b7c7-f9182bbe3e3c: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:42:27,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file dccf4f5a-bf42-4ca9-b7c7-f9182bbe3e3c
2025-09-22 21:42:27,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file dccf4f5a-bf42-4ca9-b7c7-f9182bbe3e3c: ['no_ocr']
2025-09-22 21:42:27,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 21:42:27,708 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:46:24,192 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:46:24,209 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 4699a62b-7609-4471-adf9-ef2a49661923
2025-09-22 21:46:24,211 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 4e8b6455-297d-4f72-8abd-73f58d38a914
2025-09-22 21:46:24,212 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task b62734ed-abc9-426f-9361-a306216c2945
2025-09-22 21:46:24,212 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 64261035-bdb4-40aa-b410-5a3d508216d9
2025-09-22 21:46:24,213 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 36f6803a-e7b4-4098-b540-f8a411ac95ff
2025-09-22 21:46:24,213 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task f9931c5d-e6d0-49ee-9634-7a907c05f33e
2025-09-22 21:46:24,213 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 4699a62b-7609-4471-adf9-ef2a49661923: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:46:30,080 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 4699a62b-7609-4471-adf9-ef2a49661923
2025-09-22 21:46:30,080 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 4699a62b-7609-4471-adf9-ef2a49661923: ['no_ocr']
2025-09-22 21:46:30,080 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 21:46:30,098 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:46:30,099 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:617 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 21:46:30,099 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:628 >>> Successfully enqueued docling_bundle task 97ac92ac-895b-48e3-98e2-74b1e253d6fd for no_ocr pipeline
2025-09-22 21:46:30,099 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 21:46:30,099 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 4699a62b-7609-4471-adf9-ef2a49661923
2025-09-22 21:52:27,804 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:52:27,814 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file d86fbee0-b8a2-4ac1-ad90-279c34f4cb28
2025-09-22 21:52:27,816 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 862a4a0c-4691-473e-87ea-f2154af22c24
2025-09-22 21:52:27,816 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 09e1ebf8-86a5-41cf-a541-bf52d40f625c
2025-09-22 21:52:27,817 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 71ea9e0b-ce63-4205-a65b-dcb0e5f995ee
2025-09-22 21:52:27,817 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task f7e70f1b-115d-49da-b587-e277e5e3c26d
2025-09-22 21:52:27,818 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 0d3e66cd-42e9-4c07-9e63-a9ed0ee14121
2025-09-22 21:52:27,818 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file d86fbee0-b8a2-4ac1-ad90-279c34f4cb28: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 21:52:38,799 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file d86fbee0-b8a2-4ac1-ad90-279c34f4cb28
2025-09-22 21:52:38,799 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file d86fbee0-b8a2-4ac1-ad90-279c34f4cb28: ['no_ocr']
2025-09-22 21:52:38,799 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 21:52:38,818 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 21:52:38,819 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:617 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 21:52:38,819 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:628 >>> Successfully enqueued docling_bundle task 69291a86-a420-498a-9a82-81b3f5a5ae66 for no_ocr pipeline
2025-09-22 21:52:38,819 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 21:52:38,819 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file d86fbee0-b8a2-4ac1-ad90-279c34f4cb28
2025-09-22 21:59:46,432 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 21:59:46,459 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file cd21f03b-f61b-481b-8022-7aca2d76b40a
2025-09-22 21:59:46,461 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task a63a9d04-b7a7-451c-ab01-4d36385d74c5
2025-09-22 21:59:46,462 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task a2677763-00ad-4922-8275-a666d025ea9c
2025-09-22 21:59:46,462 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task e34116ae-61af-4379-b5a0-9421cf66c9e7
2025-09-22 21:59:46,464 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 0426f4c5-9c22-4846-8b4d-3e66aa84e15e
2025-09-22 21:59:46,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 52d11703-4b1f-4bd6-9d97-738f661e4321
2025-09-22 21:59:46,465 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file cd21f03b-f61b-481b-8022-7aca2d76b40a: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:00:57,947 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 22:00:57,957 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 8bafd2e7-a0b3-42e0-9733-065d26d76935
2025-09-22 22:00:57,958 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 6b896112-a9cd-41a7-8225-b9dfbfb88c23
2025-09-22 22:00:57,959 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 81f43d17-862b-4ae0-b114-244b0a75c190
2025-09-22 22:00:57,959 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task 29d7090b-81c6-42a3-84ba-68fcbae8d13c
2025-09-22 22:00:57,960 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task a09e6fba-3ce4-458f-bcca-e6c54b10788c
2025-09-22 22:00:57,960 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task b3e11844-f96e-40cf-b2c0-fb479b284170
2025-09-22 22:00:57,960 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 8bafd2e7-a0b3-42e0-9733-065d26d76935: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:03:39,455 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 22:03:39,470 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 22270f1d-b25b-4f59-a79f-c34965b27e80
2025-09-22 22:03:39,472 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 69c3320b-c7a5-4804-8502-d86ae6180098
2025-09-22 22:03:39,473 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 53eef2ed-3a44-40d7-992d-5334ea292c56
2025-09-22 22:03:39,474 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task f9c7a7a1-fe0b-4131-a0b0-0bc4458a2f68
2025-09-22 22:03:39,474 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task da0a2f65-4185-42cd-964d-2736f2fc6713
2025-09-22 22:03:39,474 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task c7598521-a4e5-4ce2-a2e2-1173f8313308
2025-09-22 22:03:39,475 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 22270f1d-b25b-4f59-a79f-c34965b27e80: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:03:45,305 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 22270f1d-b25b-4f59-a79f-c34965b27e80
2025-09-22 22:03:45,305 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 22270f1d-b25b-4f59-a79f-c34965b27e80: ['no_ocr']
2025-09-22 22:03:45,305 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 22:03:45,325 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 22:03:45,325 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:617 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 22:03:45,326 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:628 >>> Successfully enqueued docling_bundle task 5ec0c1fc-2176-4372-b0ee-5db83967e8a2 for no_ocr pipeline
2025-09-22 22:03:45,326 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 22:03:45,326 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 22270f1d-b25b-4f59-a79f-c34965b27e80
2025-09-22 22:07:42,874 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 22:07:42,890 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file 6ef5aafc-f513-42a4-9c5a-f513523fbf22
2025-09-22 22:07:42,893 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task 082fae17-bfa9-4244-beb0-df193fabec3a
2025-09-22 22:07:42,894 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 0e19adb7-d27b-42ea-88bb-bd0442d65270
2025-09-22 22:07:42,894 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task f9914148-22e8-4324-a052-21d9cf8498f8
2025-09-22 22:07:42,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task e91c3dab-9537-40a5-8b4c-e8d3ae02852d
2025-09-22 22:07:42,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task ebfcab92-8557-42a8-ab79-e8da415891b7
2025-09-22 22:07:42,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file 6ef5aafc-f513-42a4-9c5a-f513523fbf22: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:07:48,890 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file 6ef5aafc-f513-42a4-9c5a-f513523fbf22
2025-09-22 22:07:48,890 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file 6ef5aafc-f513-42a4-9c5a-f513523fbf22: ['no_ocr']
2025-09-22 22:07:48,890 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 22:07:48,917 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 22:07:48,917 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:617 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 22:07:48,918 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:628 >>> Successfully enqueued docling_bundle task 3197767f-d1ea-4ff6-911e-03d978e5431b for no_ocr pipeline
2025-09-22 22:07:48,918 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 22:07:48,918 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 6ef5aafc-f513-42a4-9c5a-f513523fbf22
2025-09-22 22:17:18,473 INFO : pipeline_controller.py:__init__ :70 >>> Pipeline controller initialized: phase_coordination=True
2025-09-22 22:17:18,486 INFO : pipeline_controller.py:enqueue_phase1_tasks:81 >>> Phase 1: Starting structure discovery for file b3c38433-2a95-4e8c-af94-8cf30db5e4f5
2025-09-22 22:17:18,487 INFO : pipeline_controller.py:enqueue_phase1_tasks:102 >>> Phase 1: Enqueued Tika task f98429fd-5c97-4361-92f7-bd878eb389be
2025-09-22 22:17:18,488 INFO : pipeline_controller.py:enqueue_phase1_tasks:140 >>> Phase 1: Enqueued frontmatter task 8e829630-635c-4b64-ad58-14f5d634ca29
2025-09-22 22:17:18,488 INFO : pipeline_controller.py:enqueue_phase1_tasks:164 >>> Phase 1: Enqueued document analysis task a04781ec-483d-47f2-9c25-f9a69ab1508d
2025-09-22 22:17:18,489 INFO : pipeline_controller.py:enqueue_phase1_tasks:176 >>> Phase 1: Enqueued split map task 360b52d9-f015-470e-87e2-07f890849802
2025-09-22 22:17:18,489 INFO : pipeline_controller.py:enqueue_phase1_tasks:195 >>> Phase 1: Enqueued page images task 9c0ce934-46f8-49d5-8aad-28fb94071bdf
2025-09-22 22:17:18,489 INFO : pipeline_controller.py:enqueue_phase1_tasks:200 >>> Phase 1: Enqueued 5 tasks for file b3c38433-2a95-4e8c-af94-8cf30db5e4f5: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:17:24,255 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:278 >>> Enqueueing sequential docling pipelines for file b3c38433-2a95-4e8c-af94-8cf30db5e4f5
2025-09-22 22:17:24,255 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:316 >>> Sequential pipeline order for file b3c38433-2a95-4e8c-af94-8cf30db5e4f5: ['no_ocr']
2025-09-22 22:17:24,255 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:330 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 22:17:24,274 INFO : pipeline_controller.py:_determine_processing_mode:376 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 22:17:24,274 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:617 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 22:17:24,275 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:628 >>> Successfully enqueued docling_bundle task 63c96ffd-3760-4e9f-a0bb-85037f01843b for no_ocr pipeline
2025-09-22 22:17:24,275 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 22:17:24,275 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file b3c38433-2a95-4e8c-af94-8cf30db5e4f5
2025-09-22 22:22:27,344 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 22:25:31,611 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 22:25:31,626 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 7092fec3-8320-4b56-92dc-9336adf90a1b
2025-09-22 22:25:31,629 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 216a74a0-a2dd-45f3-b583-cfc59f2c2a9c
2025-09-22 22:25:31,630 INFO : pipeline_controller.py:enqueue_phase1_tasks:136 >>> Phase 1: Enqueued frontmatter task 44647654-7d14-4dbb-a517-145e1b8dba19
2025-09-22 22:25:31,632 INFO : pipeline_controller.py:enqueue_phase1_tasks:160 >>> Phase 1: Enqueued document analysis task a0d339ce-e626-4ff8-a369-323d0c669b3f
2025-09-22 22:25:31,633 INFO : pipeline_controller.py:enqueue_phase1_tasks:172 >>> Phase 1: Enqueued split map task 820425fb-0b60-48db-9210-bbf17ad85fb4
2025-09-22 22:25:31,634 INFO : pipeline_controller.py:enqueue_phase1_tasks:191 >>> Phase 1: Enqueued page images task 4f43a1cb-4375-42cf-8626-8e8067563e7c
2025-09-22 22:25:31,634 INFO : pipeline_controller.py:enqueue_phase1_tasks:196 >>> Phase 1: Enqueued 5 tasks for file 7092fec3-8320-4b56-92dc-9336adf90a1b: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:25:37,374 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:274 >>> Enqueueing sequential docling pipelines for file 7092fec3-8320-4b56-92dc-9336adf90a1b
2025-09-22 22:25:37,375 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:312 >>> Sequential pipeline order for file 7092fec3-8320-4b56-92dc-9336adf90a1b: ['no_ocr']
2025-09-22 22:25:37,375 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:326 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 22:25:37,395 INFO : pipeline_controller.py:_determine_processing_mode:372 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 22:25:37,395 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:613 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 22:25:37,396 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:624 >>> Successfully enqueued docling_bundle task 08284f0e-9840-4a4d-9910-c4a2032c7e31 for no_ocr pipeline
2025-09-22 22:25:37,396 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:336 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 22:25:37,396 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:339 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 7092fec3-8320-4b56-92dc-9336adf90a1b
2025-09-22 22:53:54,138 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 22:53:54,157 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 4f4880a4-3f1f-4057-aa39-023e43a6fc97
2025-09-22 22:53:54,159 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 91f65f07-1414-4250-a40f-f3792055ea5a
2025-09-22 22:53:54,159 INFO : pipeline_controller.py:enqueue_phase1_tasks:136 >>> Phase 1: Enqueued frontmatter task 49590cd7-e11e-4bd1-944b-f9114b09be8c
2025-09-22 22:53:54,160 INFO : pipeline_controller.py:enqueue_phase1_tasks:160 >>> Phase 1: Enqueued document analysis task 11143821-90fa-4afa-9943-db7d019c0cbc
2025-09-22 22:53:54,161 INFO : pipeline_controller.py:enqueue_phase1_tasks:172 >>> Phase 1: Enqueued split map task c46dd1be-ce61-4b15-8f54-485c398acd35
2025-09-22 22:53:54,161 INFO : pipeline_controller.py:enqueue_phase1_tasks:191 >>> Phase 1: Enqueued page images task 8f6a926e-af12-4777-a2c3-a6b1eca04dfd
2025-09-22 22:53:54,161 INFO : pipeline_controller.py:enqueue_phase1_tasks:196 >>> Phase 1: Enqueued 5 tasks for file 4f4880a4-3f1f-4057-aa39-023e43a6fc97: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 22:54:00,095 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:274 >>> Enqueueing sequential docling pipelines for file 4f4880a4-3f1f-4057-aa39-023e43a6fc97
2025-09-22 22:54:00,095 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:312 >>> Sequential pipeline order for file 4f4880a4-3f1f-4057-aa39-023e43a6fc97: ['no_ocr']
2025-09-22 22:54:00,095 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:326 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 22:54:00,114 INFO : pipeline_controller.py:_determine_processing_mode:372 >>> Document has 94 pages (< 100 threshold) - creating single bundle
2025-09-22 22:54:00,114 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:591 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-22 22:54:00,115 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:602 >>> Successfully enqueued docling_bundle task 4ae95331-1f4f-455b-9538-f6372abc1cea for no_ocr pipeline
2025-09-22 22:54:00,115 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:336 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 22:54:00,115 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:339 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 4f4880a4-3f1f-4057-aa39-023e43a6fc97
2025-09-22 23:23:47,322 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 23:27:22,037 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 23:27:22,047 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 5f2c20dc-49a2-42f2-876a-ad7d8fdfc1ec
2025-09-22 23:27:22,049 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 751b37e8-d90a-468b-a5c9-e33120525d36
2025-09-22 23:27:22,049 INFO : pipeline_controller.py:enqueue_phase1_tasks:136 >>> Phase 1: Enqueued frontmatter task e9de91d5-36a6-42ff-9f29-8b9c3d5d60c7
2025-09-22 23:27:22,050 INFO : pipeline_controller.py:enqueue_phase1_tasks:160 >>> Phase 1: Enqueued document analysis task ca1362e4-701b-461c-9f9d-cde275cd2c0c
2025-09-22 23:27:22,050 INFO : pipeline_controller.py:enqueue_phase1_tasks:172 >>> Phase 1: Enqueued split map task db463bfb-8af6-47ea-ab45-e99bf29c8325
2025-09-22 23:27:22,051 INFO : pipeline_controller.py:enqueue_phase1_tasks:191 >>> Phase 1: Enqueued page images task 25e04513-fd1c-430b-b2a0-7099d0863530
2025-09-22 23:27:22,051 INFO : pipeline_controller.py:enqueue_phase1_tasks:196 >>> Phase 1: Enqueued 5 tasks for file 5f2c20dc-49a2-42f2-876a-ad7d8fdfc1ec: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 23:27:27,901 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:274 >>> Enqueueing sequential docling pipelines for file 5f2c20dc-49a2-42f2-876a-ad7d8fdfc1ec
2025-09-22 23:27:27,901 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:313 >>> Sequential pipeline order for file 5f2c20dc-49a2-42f2-876a-ad7d8fdfc1ec: ['no_ocr']
2025-09-22 23:27:27,901 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:327 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 23:27:27,901 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:533 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-22 23:27:27,928 INFO : pipeline_controller.py:_determine_processing_mode:379 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-22 23:27:27,928 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:650 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-22 23:27:27,929 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:661 >>> Successfully enqueued docling_bundle_split task 63d761bc-fa12-4b81-89a8-1a5abae0e2ca for no_ocr pipeline
2025-09-22 23:27:27,929 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:337 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 23:27:27,929 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 5f2c20dc-49a2-42f2-876a-ad7d8fdfc1ec
2025-09-22 23:34:08,915 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-22 23:34:08,925 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file f499e480-a9fb-4821-b824-c6abe6faceea
2025-09-22 23:34:08,927 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 3cd7be2c-e1c3-408a-833a-2103686be73a
2025-09-22 23:34:08,927 INFO : pipeline_controller.py:enqueue_phase1_tasks:136 >>> Phase 1: Enqueued frontmatter task 5867b9b5-a23c-4c8d-810c-36ce8f04127f
2025-09-22 23:34:08,928 INFO : pipeline_controller.py:enqueue_phase1_tasks:160 >>> Phase 1: Enqueued document analysis task 2c3325e2-51ca-4940-8fb8-696993782d93
2025-09-22 23:34:08,928 INFO : pipeline_controller.py:enqueue_phase1_tasks:172 >>> Phase 1: Enqueued split map task 0c1f95e3-ac0b-4ab5-ba80-de60658d47c3
2025-09-22 23:34:08,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:191 >>> Phase 1: Enqueued page images task c81e23cc-7e83-4f0d-bcf9-301d01ce1ef2
2025-09-22 23:34:08,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:196 >>> Phase 1: Enqueued 5 tasks for file f499e480-a9fb-4821-b824-c6abe6faceea: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 23:34:14,739 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:274 >>> Enqueueing sequential docling pipelines for file f499e480-a9fb-4821-b824-c6abe6faceea
2025-09-22 23:34:14,740 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:313 >>> Sequential pipeline order for file f499e480-a9fb-4821-b824-c6abe6faceea: ['no_ocr']
2025-09-22 23:34:14,740 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:327 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 23:34:14,740 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:533 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-22 23:34:14,740 INFO : pipeline_controller.py:_determine_processing_mode:365 >>> BY_PAGE enabled for no_ocr - creating page-based bundles regardless of document size
2025-09-22 23:34:14,764 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:650 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_pages (timeout: 5640s)
2025-09-22 23:34:14,765 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:661 >>> Successfully enqueued docling_bundle_split task a17ef07a-1d63-40e6-a828-7a9ecc214d44 for no_ocr pipeline
2025-09-22 23:34:14,765 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:337 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 23:34:14,765 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file f499e480-a9fb-4821-b824-c6abe6faceea
2025-09-22 23:55:37,310 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 6ceb4bd3-4e35-4451-a05b-8f312be9b220
2025-09-22 23:55:37,311 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task df7f0e5f-0fb9-414b-a0fb-a9621d7a35fb
2025-09-22 23:55:37,311 INFO : pipeline_controller.py:enqueue_phase1_tasks:136 >>> Phase 1: Enqueued frontmatter task 835e1be9-5e91-433c-9b19-ab95361aa295
2025-09-22 23:55:37,312 INFO : pipeline_controller.py:enqueue_phase1_tasks:160 >>> Phase 1: Enqueued document analysis task 8477af84-a216-4e3d-91db-2b06773781e9
2025-09-22 23:55:37,313 INFO : pipeline_controller.py:enqueue_phase1_tasks:172 >>> Phase 1: Enqueued split map task 2251980e-c67d-454c-9004-5fab0eb0fba8
2025-09-22 23:55:37,313 INFO : pipeline_controller.py:enqueue_phase1_tasks:191 >>> Phase 1: Enqueued page images task 4577df06-515f-4919-ae95-b40b85a4edd9
2025-09-22 23:55:37,313 INFO : pipeline_controller.py:enqueue_phase1_tasks:196 >>> Phase 1: Enqueued 5 tasks for file 6ceb4bd3-4e35-4451-a05b-8f312be9b220: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-22 23:55:53,272 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:274 >>> Enqueueing sequential docling pipelines for file 6ceb4bd3-4e35-4451-a05b-8f312be9b220
2025-09-22 23:55:53,272 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:313 >>> Sequential pipeline order for file 6ceb4bd3-4e35-4451-a05b-8f312be9b220: ['no_ocr']
2025-09-22 23:55:53,272 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:327 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-22 23:55:53,272 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:533 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-22 23:55:53,272 INFO : pipeline_controller.py:_determine_processing_mode:365 >>> BY_PAGE enabled for no_ocr - creating page-based bundles regardless of document size
2025-09-22 23:55:53,284 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:650 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-22 23:55:53,285 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:661 >>> Successfully enqueued docling_bundle_split task f8e686bf-c405-468f-9bae-d2ef59654b32 for no_ocr pipeline
2025-09-22 23:55:53,285 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:337 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-22 23:55:53,285 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:340 >>> Successfully enqueued 1 sequential pipelines with 1 total tasks for file 6ceb4bd3-4e35-4451-a05b-8f312be9b220
2025-09-23 00:46:30,061 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 00:46:30,070 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file fa22ea88-f09f-4464-8cf6-783ecd3aba31
2025-09-23 00:46:30,071 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 21681d2a-3a48-4780-a138-f89bf2cc4bba
2025-09-23 00:46:30,072 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 41944601-a5bf-4927-8fb7-4293aaa08218
2025-09-23 00:46:30,073 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 14576a7d-a939-4ac7-b3bb-beb02b002923
2025-09-23 00:46:30,073 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task b111c474-3f52-4c7d-88c4-4cbe1d404731
2025-09-23 00:46:30,073 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 6bea0cad-af00-4595-948b-e0fd429bcf14
2025-09-23 00:46:30,074 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file fa22ea88-f09f-4464-8cf6-783ecd3aba31: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 00:46:41,030 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file fa22ea88-f09f-4464-8cf6-783ecd3aba31
2025-09-23 00:46:41,030 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file fa22ea88-f09f-4464-8cf6-783ecd3aba31: ['no_ocr', 'ocr']
2025-09-23 00:46:41,030 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 00:46:41,030 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:551 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 00:46:41,039 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 00:46:41,039 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:668 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 00:46:41,040 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:679 >>> Successfully enqueued docling_bundle task 4bd2af27-a92f-4596-a9be-aeba38cd88a9 for no_ocr pipeline
2025-09-23 00:46:41,040 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 00:46:41,040 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['4bd2af27-a92f-4596-a9be-aeba38cd88a9']
2025-09-23 00:46:41,040 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:562 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 00:46:41,040 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 00:46:41,042 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:668 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 00:46:41,042 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:679 >>> Successfully enqueued docling_bundle_split task cd2efd68-6776-43b0-9e5d-0dbb30c9bb7f for ocr pipeline
2025-09-23 00:46:41,042 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 00:46:41,042 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file fa22ea88-f09f-4464-8cf6-783ecd3aba31
2025-09-23 00:52:26,811 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 00:52:26,820 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 631c4844-acc7-4075-8dd1-d9482bdc8a01
2025-09-23 00:52:26,822 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 457a5d9f-1424-4ac3-b2ac-1a8856ca570f
2025-09-23 00:52:26,822 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 40947e3c-8915-4336-88e3-534b3057e09b
2025-09-23 00:52:26,823 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task e9ae2e6e-8373-4aa5-abcd-e8c6534a3e21
2025-09-23 00:52:26,823 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 8dd1a229-3ee9-4895-a6af-1929afc89ee4
2025-09-23 00:52:26,824 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task d0c43987-8b74-4d14-8c70-3d6ff7d527b2
2025-09-23 00:52:26,824 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 631c4844-acc7-4075-8dd1-d9482bdc8a01: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 00:52:37,698 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 631c4844-acc7-4075-8dd1-d9482bdc8a01
2025-09-23 00:52:37,698 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 631c4844-acc7-4075-8dd1-d9482bdc8a01: ['no_ocr', 'ocr']
2025-09-23 00:52:37,698 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 00:52:37,698 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:551 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 00:52:37,706 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 00:52:37,706 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:668 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 00:52:37,706 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:679 >>> Successfully enqueued docling_bundle task 4448e5a3-2873-457d-9bb6-17a5e5b8576f for no_ocr pipeline
2025-09-23 00:52:37,707 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 00:52:37,707 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['4448e5a3-2873-457d-9bb6-17a5e5b8576f']
2025-09-23 00:52:37,707 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:562 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 00:52:37,707 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 00:52:37,709 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:668 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 00:52:37,709 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:679 >>> Successfully enqueued docling_bundle_split task 7335408b-fae2-4007-8357-e3da2f5a713d for ocr pipeline
2025-09-23 00:52:37,709 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 00:52:37,709 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file 631c4844-acc7-4075-8dd1-d9482bdc8a01
2025-09-23 01:06:37,173 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:06:37,183 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file d9562969-88d0-43e8-9c4e-c71438bf7820
2025-09-23 01:06:37,185 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task d69792c5-509a-4696-b423-ab1f1ce94030
2025-09-23 01:06:37,185 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task a2484ae8-61ba-4795-a79a-95175d19ec8e
2025-09-23 01:06:37,186 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 41a7adb3-a523-4e7d-843d-b86bc872de0a
2025-09-23 01:06:37,187 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 957dc537-26f9-48cd-8c95-65e5a1a56cc2
2025-09-23 01:06:37,187 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task fa7ba756-93d2-4900-9387-22ed65b4256e
2025-09-23 01:06:37,187 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file d9562969-88d0-43e8-9c4e-c71438bf7820: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:06:48,149 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file d9562969-88d0-43e8-9c4e-c71438bf7820
2025-09-23 01:06:48,151 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file d9562969-88d0-43e8-9c4e-c71438bf7820: ['no_ocr', 'ocr']
2025-09-23 01:06:48,151 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:06:48,151 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:556 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:06:48,159 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 01:06:48,162 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:673 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:06:48,163 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:684 >>> Successfully enqueued docling_bundle task 01c2933f-b722-43c2-ab89-faa5b4e60be6 for no_ocr pipeline
2025-09-23 01:06:48,163 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:06:48,163 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['01c2933f-b722-43c2-ab89-faa5b4e60be6']
2025-09-23 01:06:48,163 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:567 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:06:48,163 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:06:48,166 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:673 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:06:48,166 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:684 >>> Successfully enqueued docling_bundle_split task 8cbbbd17-964b-4fd5-b66b-27f0c0f814e0 for ocr pipeline
2025-09-23 01:06:48,166 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:06:48,166 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file d9562969-88d0-43e8-9c4e-c71438bf7820
2025-09-23 01:08:53,365 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:08:53,375 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file f3b412cb-fd1e-43f2-abd3-9091d6dbcae1
2025-09-23 01:08:53,377 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 18d4e02c-bee8-4511-9926-d6f5a14ca9f5
2025-09-23 01:08:53,377 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 2b04ee61-2b8c-4119-9a5f-498d900fefd3
2025-09-23 01:08:53,378 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 04aa10db-5e68-411c-aa25-e28ac4ad3e41
2025-09-23 01:08:53,378 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task f63c9a26-104d-4270-800a-bd70fac33e6e
2025-09-23 01:08:53,379 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0086816f-4bde-4a6a-8d56-85d55ae7ae7b
2025-09-23 01:08:53,379 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file f3b412cb-fd1e-43f2-abd3-9091d6dbcae1: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:09:04,239 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file f3b412cb-fd1e-43f2-abd3-9091d6dbcae1
2025-09-23 01:09:04,239 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file f3b412cb-fd1e-43f2-abd3-9091d6dbcae1: ['no_ocr', 'ocr']
2025-09-23 01:09:04,239 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:09:04,239 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:556 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:673 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:684 >>> Successfully enqueued docling_bundle task ba13467c-071e-4919-bb50-b1e94d0eef92 for no_ocr pipeline
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['ba13467c-071e-4919-bb50-b1e94d0eef92']
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:567 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:09:04,247 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:09:04,250 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:673 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:09:04,251 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:684 >>> Successfully enqueued docling_bundle_split task e647cbfe-33ca-4090-aa85-7b2423a55806 for ocr pipeline
2025-09-23 01:09:04,251 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:09:04,251 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file f3b412cb-fd1e-43f2-abd3-9091d6dbcae1
2025-09-23 01:17:03,270 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:17:03,280 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:03,281 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task f959d8b0-9be9-431e-8523-f948d42f2084
2025-09-23 01:17:03,282 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 5af6f71a-bbde-40a2-907f-bc9a7bf27225
2025-09-23 01:17:03,282 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task d494247a-4c7e-4553-905d-4cc0758922f7
2025-09-23 01:17:03,283 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task d18504f3-af92-422d-8361-f677df7eaf2d
2025-09-23 01:17:03,283 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 56c74ba8-9883-498d-ad96-1fdbea1c47a9
2025-09-23 01:17:03,283 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file a961495e-fcce-4593-84a6-d1b521faa424: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:17:14,117 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,117 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file a961495e-fcce-4593-84a6-d1b521faa424: ['no_ocr', 'ocr']
2025-09-23 01:17:14,117 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:17:14,117 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:610 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:17:14,117 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,125 INFO : pipeline_controller.py:_get_page_count :464 >>> 🔍 PAGE COUNT: Found 5 artefacts for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,125 INFO : pipeline_controller.py:_get_page_count :471 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,125 INFO : pipeline_controller.py:_get_page_count :477 >>> ✅ PAGE COUNT: Found page count 3 from docling_json artefact for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,125 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 01:17:14,125 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:727 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:738 >>> Successfully enqueued docling_bundle task 21265f75-ff11-4167-af1e-f6ca73617add for no_ocr pipeline
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['21265f75-ff11-4167-af1e-f6ca73617add']
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:621 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:17:14,126 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,129 INFO : pipeline_controller.py:_get_page_count :464 >>> 🔍 PAGE COUNT: Found 5 artefacts for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,129 INFO : pipeline_controller.py:_get_page_count :471 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,129 INFO : pipeline_controller.py:_get_page_count :477 >>> ✅ PAGE COUNT: Found page count 3 from docling_json artefact for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:17:14,129 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:727 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:17:14,130 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:738 >>> Successfully enqueued docling_bundle_split task 7ec53bc6-0dc8-42d6-a636-6046fdf38812 for ocr pipeline
2025-09-23 01:17:14,130 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:17:14,130 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file a961495e-fcce-4593-84a6-d1b521faa424
2025-09-23 01:18:57,208 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:18:57,217 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:18:57,219 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task b74697c8-427d-4ee7-9e4f-f16bf910841c
2025-09-23 01:18:57,219 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task c6aee771-4b70-47d5-b4cc-ae76ea76b019
2025-09-23 01:18:57,220 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 936cd449-2a69-4073-ac2e-9d0a19aea7e2
2025-09-23 01:18:57,220 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task f459a691-d442-4775-a295-cada9691fd4a
2025-09-23 01:18:57,220 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0202a150-9f58-493d-b88b-706270ab18d2
2025-09-23 01:18:57,220 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file b24bfa8b-b584-4379-89e5-52ecd098a606: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:19:13,094 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,094 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file b24bfa8b-b584-4379-89e5-52ecd098a606: ['no_ocr', 'ocr']
2025-09-23 01:19:13,095 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:19:13,095 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:610 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:19:13,095 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,102 INFO : pipeline_controller.py:_get_page_count :464 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,102 INFO : pipeline_controller.py:_get_page_count :471 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,102 INFO : pipeline_controller.py:_get_page_count :477 >>> ✅ PAGE COUNT: Found page count 3 from docling_json artefact for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,102 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 3 pages (< 50 threshold) - creating single bundle
2025-09-23 01:19:13,102 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:727 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:738 >>> Successfully enqueued docling_bundle task 3832cfa1-780e-4cf2-8d37-76bb68391691 for no_ocr pipeline
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['3832cfa1-780e-4cf2-8d37-76bb68391691']
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:621 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:19:13,103 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,105 INFO : pipeline_controller.py:_get_page_count :464 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,105 INFO : pipeline_controller.py:_get_page_count :471 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,105 INFO : pipeline_controller.py:_get_page_count :477 >>> ✅ PAGE COUNT: Found page count 3 from docling_json artefact for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:19:13,105 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:727 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:19:13,106 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:738 >>> Successfully enqueued docling_bundle_split task b45ae5b3-eb09-46e6-a1b4-2cbfdc5aff4e for ocr pipeline
2025-09-23 01:19:13,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:19:13,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file b24bfa8b-b584-4379-89e5-52ecd098a606
2025-09-23 01:25:10,153 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:25:10,164 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:10,165 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task bf14ebd6-331c-45bc-98f7-654072242c78
2025-09-23 01:25:10,166 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 6a783d6e-8d8c-48a3-9f70-21875ff217be
2025-09-23 01:25:10,166 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 4e8dd3d4-a759-4df9-b44c-2f20123c81ce
2025-09-23 01:25:10,167 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 1cd7cec2-751b-41db-a512-59f27cf00f0d
2025-09-23 01:25:10,167 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task f2d58dfa-4724-4124-825c-9ab934405733
2025-09-23 01:25:10,167 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:25:20,943 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,943 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc: ['no_ocr', 'ocr']
2025-09-23 01:25:20,943 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:25:20,943 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:25:20,943 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,950 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 01:25:20,950 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 01:25:20,951 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:25:20,953 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task e3a48a82-3ca5-4912-bf08-7797d34ee5eb for no_ocr pipeline
2025-09-23 01:25:20,953 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:25:20,953 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['e3a48a82-3ca5-4912-bf08-7797d34ee5eb']
2025-09-23 01:25:20,953 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:25:20,953 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:25:20,954 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:25:20,956 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:25:20,957 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d0643980-a696-426d-98e3-8b5318d92fe1 for ocr pipeline
2025-09-23 01:25:20,957 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:25:20,957 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file 7ff9ed66-f328-4c10-bc71-ae2830e139cc
2025-09-23 01:43:09,867 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 01:43:09,886 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:09,892 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 7c108093-7128-49f2-b4e1-0720110b3168
2025-09-23 01:43:09,893 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 99e5f473-dabc-4439-89e7-357e28fca64a
2025-09-23 01:43:09,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 8a963a48-1e5c-43df-8237-83b07b6034f6
2025-09-23 01:43:09,895 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 8571c289-57dd-447e-a9b1-da1b2199ef5a
2025-09-23 01:43:09,896 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task bbaed2b2-279b-4427-b911-7233f7507c4e
2025-09-23 01:43:09,896 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 2ae63971-faf0-463d-975c-a817c30dfaf9: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 01:43:25,726 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,726 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 2ae63971-faf0-463d-975c-a817c30dfaf9: ['no_ocr', 'ocr']
2025-09-23 01:43:25,726 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 01:43:25,726 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 01:43:25,726 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 2ae63971-faf0-463d-975c-a817c30dfaf9: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,734 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 076276a0-4ad5-40f7-98eb-ec699092a7ee for no_ocr pipeline
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['076276a0-4ad5-40f7-98eb-ec699092a7ee']
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 01:43:25,735 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for ocr - creating page-based bundles regardless of document size
2025-09-23 01:43:25,736 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 2ae63971-faf0-463d-975c-a817c30dfaf9: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 01:43:25,738 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_pages (timeout: 3600s)
2025-09-23 01:43:25,739 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task f1277bc8-2a3a-4c4c-89f7-57e092371ffe for ocr pipeline
2025-09-23 01:43:25,739 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 01:43:25,739 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 2 sequential pipelines with 2 total tasks for file 2ae63971-faf0-463d-975c-a817c30dfaf9
2025-09-23 02:50:36,728 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 02:50:36,750 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:50:36,751 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 9d799067-52ab-4465-9175-13a67c898f47
2025-09-23 02:50:36,752 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task e7301902-2bed-4537-8f05-4ed32c9b2454
2025-09-23 02:50:36,753 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 80bec68c-2167-4014-966c-291c5cba2e18
2025-09-23 02:50:36,753 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 436c65c3-7a3d-4807-bf08-b7f736cadab1
2025-09-23 02:50:36,754 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 25ecea0a-d59c-4274-8192-10af89172d79
2025-09-23 02:50:36,754 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 03fdc513-f06e-473c-9161-02ec11511318: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 02:50:41,868 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:50:41,869 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 584f8455-8274-4bfa-b802-048c2a661cd3
2025-09-23 02:50:41,869 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 950ea4ca-7acd-4417-b011-598757d2e84a
2025-09-23 02:50:41,870 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task bec0a006-d52a-456c-8162-37e755498fdc
2025-09-23 02:50:41,871 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 97062c8e-da2e-4043-aa2a-79d350746ce2
2025-09-23 02:50:41,871 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 06c0c872-86d3-4222-8406-2a2e494e54e8
2025-09-23 02:50:41,871 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file b34594f6-5fe0-4104-a1e6-9073a055bda1: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 02:50:46,934 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:50:46,935 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 019ef979-a230-4c58-93bd-0e3333a00311
2025-09-23 02:50:46,936 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 23578aa8-34f4-480d-b2d4-089414920a2f
2025-09-23 02:50:46,936 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 0d71766b-1a01-45e7-916d-3d480b667d94
2025-09-23 02:50:46,936 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 593a5c68-45fd-46a1-b2c8-9241082b9062
2025-09-23 02:50:46,937 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 37ba2916-1fce-48d8-baff-7e46b8fcc0c0
2025-09-23 02:50:46,937 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 2e9914dc-67d3-44b0-88ce-8151c20a510a: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 02:51:03,349 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:03,350 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 551e8dd6-908d-43ee-b645-a696449c184d
2025-09-23 02:51:03,351 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task d01f59b6-2bdf-4698-91bb-268acd93b123
2025-09-23 02:51:03,351 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 9957dea7-975a-4d83-811c-e9582f3f2f85
2025-09-23 02:51:03,352 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task faa1be54-74f8-4655-b62c-dd4c73a6b2f5
2025-09-23 02:51:03,352 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task ebe0cfe5-c1ae-4b0e-a8aa-03a019c6cf50
2025-09-23 02:51:03,352 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 0332747b-d9d0-48f4-836b-3de59dbcadf3: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 02:51:03,760 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,761 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 03fdc513-f06e-473c-9161-02ec11511318: ['no_ocr', 'ocr', 'vlm']
2025-09-23 02:51:03,761 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 02:51:03,761 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 02:51:03,761 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,768 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 03fdc513-f06e-473c-9161-02ec11511318: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:03,768 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,768 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,768 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,768 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 71c4ae42-738b-4934-96c5-6ba78d5df950 for no_ocr pipeline
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['71c4ae42-738b-4934-96c5-6ba78d5df950']
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 02:51:03,769 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,772 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 03fdc513-f06e-473c-9161-02ec11511318: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 7abe8e12-349f-48ed-a376-e610b5ba0ccc for ocr pipeline
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['7abe8e12-349f-48ed-a376-e610b5ba0ccc']
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 02:51:03,773 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 02:51:03,774 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 03fdc513-f06e-473c-9161-02ec11511318: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:03,777 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 02:51:03,778 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 99437481-6f86-4188-a3a7-a5710331db6a for vlm pipeline
2025-09-23 02:51:03,778 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 02:51:03,778 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 03fdc513-f06e-473c-9161-02ec11511318
2025-09-23 02:51:24,519 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,520 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file b34594f6-5fe0-4104-a1e6-9073a055bda1: ['no_ocr', 'ocr', 'vlm']
2025-09-23 02:51:24,520 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 02:51:24,520 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 02:51:24,520 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b34594f6-5fe0-4104-a1e6-9073a055bda1: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from split_map_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 21 pages (< 50 threshold) - creating single bundle
2025-09-23 02:51:24,523 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 02:51:24,524 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8591ac43-4a82-4620-a686-8d54448f8958 for no_ocr pipeline
2025-09-23 02:51:24,524 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 02:51:24,524 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['8591ac43-4a82-4620-a686-8d54448f8958']
2025-09-23 02:51:24,524 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 02:51:24,524 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,526 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b34594f6-5fe0-4104-a1e6-9073a055bda1: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from split_map_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 21 pages (< 50 threshold) - creating single bundle
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 42db7f75-4d3f-49b9-8152-ba333946644e for ocr pipeline
2025-09-23 02:51:24,527 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 02:51:24,528 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['42db7f75-4d3f-49b9-8152-ba333946644e']
2025-09-23 02:51:24,528 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 02:51:24,528 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 02:51:24,528 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b34594f6-5fe0-4104-a1e6-9073a055bda1: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from split_map_json artefact for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:24,530 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 02:51:24,531 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 8a739090-5e0a-4874-af51-c6679aeaa461 for vlm pipeline
2025-09-23 02:51:24,531 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 02:51:24,531 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file b34594f6-5fe0-4104-a1e6-9073a055bda1
2025-09-23 02:51:26,790 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,790 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 2e9914dc-67d3-44b0-88ce-8151c20a510a: ['no_ocr', 'ocr', 'vlm']
2025-09-23 02:51:26,790 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 02:51:26,790 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 02:51:26,790 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 2e9914dc-67d3-44b0-88ce-8151c20a510a: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,792 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,804 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 16c13be6-8368-4206-8bb8-475ff06ee613 for no_ocr pipeline
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['16c13be6-8368-4206-8bb8-475ff06ee613']
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 02:51:26,805 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 2e9914dc-67d3-44b0-88ce-8151c20a510a: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,807 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,814 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 02:51:26,814 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 3e84df87-5fbd-4f8f-a8d9-fc24d0ac5b48 for ocr pipeline
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['3e84df87-5fbd-4f8f-a8d9-fc24d0ac5b48']
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 02:51:26,815 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 2e9914dc-67d3-44b0-88ce-8151c20a510a: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,817 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:26,818 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 02:51:26,818 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 8d4056b0-135c-4767-9258-40a3ce28c848 for vlm pipeline
2025-09-23 02:51:26,818 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 02:51:26,818 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 2e9914dc-67d3-44b0-88ce-8151c20a510a
2025-09-23 02:51:37,240 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,241 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 0332747b-d9d0-48f4-836b-3de59dbcadf3: ['no_ocr', 'ocr', 'vlm']
2025-09-23 02:51:37,241 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 02:51:37,241 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 02:51:37,241 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,243 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 0332747b-d9d0-48f4-836b-3de59dbcadf3: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:37,243 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,243 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,244 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,251 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 02:51:37,251 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 02:51:37,254 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 842225da-738a-49e2-b6c4-52d6f3a8be89 for no_ocr pipeline
2025-09-23 02:51:37,255 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 02:51:37,255 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['842225da-738a-49e2-b6c4-52d6f3a8be89']
2025-09-23 02:51:37,255 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 02:51:37,255 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 0332747b-d9d0-48f4-836b-3de59dbcadf3: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,257 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,264 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 02:51:37,264 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7f5bcd2a-7b5b-4e16-bc88-5d37f23e5969 for ocr pipeline
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['7f5bcd2a-7b5b-4e16-bc88-5d37f23e5969']
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 02:51:37,265 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 0332747b-d9d0-48f4-836b-3de59dbcadf3: ['tika_json', 'document_outline_hierarchy', 'docling_frontmatter_json', 'split_map_json', 'docling_json']
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:51:37,267 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 02:51:37,268 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 1aa8535c-a026-4110-b534-92d75b60d46f for vlm pipeline
2025-09-23 02:51:37,268 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 02:51:37,268 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 0332747b-d9d0-48f4-836b-3de59dbcadf3
2025-09-23 02:52:04,136 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 02:52:04,138 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 1ea42256-8ac6-4cf2-a28f-20ac8647d8bd
2025-09-23 02:52:04,138 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 25b8e7ab-532c-4c64-aa6a-9d03b56d4aed
2025-09-23 02:52:04,139 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 1ca0c230-e5c9-4d10-9896-b05066390234
2025-09-23 02:52:04,139 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task cb64571b-62bb-466a-8312-bade9c008733
2025-09-23 02:52:04,140 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task dd02c9cf-2c0d-4c54-b8a2-fd32d854cf73
2025-09-23 02:52:04,140 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 03:14:39,969 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,969 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7: ['no_ocr', 'ocr', 'vlm']
2025-09-23 03:14:39,969 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 03:14:39,969 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 03:14:39,969 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,972 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,980 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 03:14:39,980 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 03:14:39,981 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 0a9c9e2f-f237-4965-96fb-15c709634552 for no_ocr pipeline
2025-09-23 03:14:39,981 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 03:14:39,981 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['0a9c9e2f-f237-4965-96fb-15c709634552']
2025-09-23 03:14:39,981 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 03:14:39,981 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,983 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,989 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 90b56cd1-8c7a-4ee1-98b4-0a3649158015 for ocr pipeline
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['90b56cd1-8c7a-4ee1-98b4-0a3649158015']
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 03:14:39,990 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 03:14:39,992 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 03:14:39,993 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d80f6c7a-a45f-45ce-8e54-2532571ca0ea for vlm pipeline
2025-09-23 03:14:39,993 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 03:14:39,993 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 58405c70-aafd-4ae7-8b3d-a1b5d95e61e7
2025-09-23 12:16:00,840 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 12:16:00,853 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:00,857 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 32289c00-f7c0-4c8b-bb82-cc448567572c
2025-09-23 12:16:00,858 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task a42a546d-57a7-44e3-ab1d-eed2fe8daa2e
2025-09-23 12:16:00,860 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 227fd8d1-2bea-4faa-be51-796d78de1189
2025-09-23 12:16:00,861 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 71c633b2-c127-4ec1-95b0-51d3fb34504d
2025-09-23 12:16:00,862 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task df070c72-09cb-4f34-adb3-886527bd80d8
2025-09-23 12:16:00,862 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:16:06,353 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:06,354 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task db8b9267-3ec0-4c5a-b737-56e71ecf8f4e
2025-09-23 12:16:06,358 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 2e3b42a4-a91b-4bdd-acce-f8688a871c13
2025-09-23 12:16:06,360 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 11637f7b-ed3a-456f-85de-0e4b7336bdc9
2025-09-23 12:16:06,362 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 56264710-6dfc-49ba-8566-dfd953ff53d4
2025-09-23 12:16:06,367 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 13f8da10-a770-48b6-8d56-1523cc2b33bd
2025-09-23 12:16:06,367 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file b041c784-55e6-45c1-9373-91c2d204da9e: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:16:10,590 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:10,591 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 1af5a7f1-9c64-4fc4-834e-5ccb21ca21f5
2025-09-23 12:16:10,591 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 86e43052-0d72-4636-9b31-ed6fda2caf20
2025-09-23 12:16:10,593 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 81c7530d-5318-440f-9dec-8f09386ad5df
2025-09-23 12:16:10,594 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 068c0984-ab11-474e-9b1f-91ae9519d6e8
2025-09-23 12:16:10,595 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 1516be7e-718e-44e0-a2e9-a0fe5a8977bf
2025-09-23 12:16:10,595 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 975fdeea-3e39-4b6f-9942-9bfd94d98522: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:16:24,366 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:16:24,367 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task f0054980-102d-4112-866c-46261a4e1cfd
2025-09-23 12:16:24,367 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 66f683f0-3843-4578-9f32-2d9df6317dd9
2025-09-23 12:16:24,368 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task d00c8664-e51f-4c90-a5e6-c5773a0edfb1
2025-09-23 12:16:24,369 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task a33c7984-f1be-407c-ab68-ddb487162beb
2025-09-23 12:16:24,369 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 845cb256-5158-4566-867c-2df9f7e45060
2025-09-23 12:16:24,369 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file bc154f7e-0ab3-4297-b8dc-502461f077b3: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:16:32,502 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,503 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:16:32,503 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:16:32,503 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:16:32,503 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,514 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task b79dd7f8-2cf3-4b7c-81ab-6c23a42c83eb for no_ocr pipeline
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['b79dd7f8-2cf3-4b7c-81ab-6c23a42c83eb']
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:16:32,515 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 12:16:32,520 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task f3e0172a-dac1-41b3-8e65-8d8f8d136c86 for ocr pipeline
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['f3e0172a-dac1-41b3-8e65-8d8f8d136c86']
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:16:32,521 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,523 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 12:16:32,523 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,523 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:32,524 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:16:32,528 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 0b0d8fdc-49a6-49d6-9ecf-bb3e42369a75 for vlm pipeline
2025-09-23 12:16:32,528 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:16:32,528 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 13f1bf46-0957-4b6e-aa51-5dc4f00c9a2c
2025-09-23 12:16:35,549 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,549 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 975fdeea-3e39-4b6f-9942-9bfd94d98522: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:16:35,549 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:16:35,549 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:16:35,549 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 975fdeea-3e39-4b6f-9942-9bfd94d98522: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,552 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,711 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:16:35,711 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 12:16:35,712 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 58cbccfb-600d-48af-aafe-7fd3462de14e for no_ocr pipeline
2025-09-23 12:16:35,712 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:16:35,712 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['58cbccfb-600d-48af-aafe-7fd3462de14e']
2025-09-23 12:16:35,712 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:16:35,712 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 975fdeea-3e39-4b6f-9942-9bfd94d98522: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,714 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,715 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,792 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:16:35,792 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7143fdb7-6e8e-425e-8b3d-2a29a9dfc4fc for ocr pipeline
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['7143fdb7-6e8e-425e-8b3d-2a29a9dfc4fc']
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:16:35,793 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 975fdeea-3e39-4b6f-9942-9bfd94d98522: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:35,796 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 12:16:35,797 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task c3a8d63b-fb36-47b1-bd7a-082f7d74e2c2 for vlm pipeline
2025-09-23 12:16:35,797 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:16:35,797 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 975fdeea-3e39-4b6f-9942-9bfd94d98522
2025-09-23 12:16:37,970 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:37,970 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file b041c784-55e6-45c1-9373-91c2d204da9e: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:16:37,970 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:16:37,970 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:16:37,970 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:37,972 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file b041c784-55e6-45c1-9373-91c2d204da9e: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'page_images']
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,038 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 12:16:38,039 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:16:38,039 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 4d54d0d9-b425-4629-9d97-07661c0497cc for no_ocr pipeline
2025-09-23 12:16:38,039 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:16:38,040 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['4d54d0d9-b425-4629-9d97-07661c0497cc']
2025-09-23 12:16:38,040 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:16:38,040 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,046 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file b041c784-55e6-45c1-9373-91c2d204da9e: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'page_images']
2025-09-23 12:16:38,046 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,046 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,101 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 12:16:38,117 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task ff18e1c2-663d-4520-93ff-d890e332281a for ocr pipeline
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['ff18e1c2-663d-4520-93ff-d890e332281a']
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:16:38,118 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file b041c784-55e6-45c1-9373-91c2d204da9e: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'page_images']
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,122 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:38,123 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:16:38,123 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 1db1f32e-1a35-45c6-86ee-9c939e26a8b7 for vlm pipeline
2025-09-23 12:16:38,123 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:16:38,123 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file b041c784-55e6-45c1-9373-91c2d204da9e
2025-09-23 12:16:46,012 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:16:46,014 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 3fffc891-8a57-44bc-bace-9ae586cf59d4
2025-09-23 12:16:46,016 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task c96471a7-847f-4007-885b-3e53e2dbede5
2025-09-23 12:16:46,017 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 73a26c88-0f59-4103-bcbc-de69632f28c3
2025-09-23 12:16:46,017 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 455e518f-b152-424c-b3be-f5ee9b13636d
2025-09-23 12:16:46,018 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 8ec3ebc9-20c0-4244-91dc-6e3395efce15
2025-09-23 12:16:46,018 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file b531b893-b47e-4a1e-ab17-2cd5beb0d258: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:17:17,601 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,602 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file bc154f7e-0ab3-4297-b8dc-502461f077b3: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:17:17,602 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:17:17,602 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:17:17,602 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,604 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file bc154f7e-0ab3-4297-b8dc-502461f077b3: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:17:17,604 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,604 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,604 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,605 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,613 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:17:17,613 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:17:17,613 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 6d969f69-283b-49b0-a2e6-4e9eda11a31e for no_ocr pipeline
2025-09-23 12:17:17,613 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:17:17,614 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['6d969f69-283b-49b0-a2e6-4e9eda11a31e']
2025-09-23 12:17:17,614 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:17:17,614 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file bc154f7e-0ab3-4297-b8dc-502461f077b3: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,615 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,616 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,616 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,616 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,622 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:17:17,622 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 87fbf393-3533-4c3f-b016-3c9254f2386d for ocr pipeline
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['87fbf393-3533-4c3f-b016-3c9254f2386d']
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:17:17,623 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file bc154f7e-0ab3-4297-b8dc-502461f077b3: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'docling_json', 'split_map_json']
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:17,625 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 12:17:17,626 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task fedda84d-2b43-460f-a027-54eddac10acb for vlm pipeline
2025-09-23 12:17:17,626 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:17:17,626 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file bc154f7e-0ab3-4297-b8dc-502461f077b3
2025-09-23 12:17:51,636 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,659 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file b531b893-b47e-4a1e-ab17-2cd5beb0d258: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:17:51,659 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:17:51,659 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:17:51,659 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b531b893-b47e-4a1e-ab17-2cd5beb0d258: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,662 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,751 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:17:51,751 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:17:51,752 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task fa44ddf4-8a23-43a7-9605-d46b192dc9ca for no_ocr pipeline
2025-09-23 12:17:51,752 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:17:51,752 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['fa44ddf4-8a23-43a7-9605-d46b192dc9ca']
2025-09-23 12:17:51,752 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:17:51,752 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b531b893-b47e-4a1e-ab17-2cd5beb0d258: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,755 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,803 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:17:51,896 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 6789310c-2528-4737-9e51-b4e0a721cd59 for ocr pipeline
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['6789310c-2528-4737-9e51-b4e0a721cd59']
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:17:51,897 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b531b893-b47e-4a1e-ab17-2cd5beb0d258: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:17:51,901 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 12:17:51,902 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 664e1326-ee5a-45ee-8086-408575613be2 for vlm pipeline
2025-09-23 12:17:51,902 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:17:51,902 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file b531b893-b47e-4a1e-ab17-2cd5beb0d258
2025-09-23 12:36:49,039 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 12:36:49,051 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:36:49,054 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 7253ef80-3094-4625-8d60-b6090ca9baef
2025-09-23 12:36:49,054 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 959f479b-44cf-442a-90af-19e1924cdf9a
2025-09-23 12:36:49,057 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task acaad6b3-4686-44a2-b40c-3d6073645de0
2025-09-23 12:36:49,057 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 25edc5b7-641e-4079-8f13-9fe97a362158
2025-09-23 12:36:49,058 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task c04e1330-bec2-43ea-83b9-3d9766e08af1
2025-09-23 12:36:49,058 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:36:51,927 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:36:51,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task fef82615-502a-4da9-b2ac-ac9e634fe3c3
2025-09-23 12:36:51,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 49204e5a-94ce-4829-85d6-aafae011060f
2025-09-23 12:36:51,930 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task ad11c210-db0b-486a-ab23-b23d915e31ea
2025-09-23 12:36:51,931 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 216227be-11b9-4f4d-ad9e-9b35cc145324
2025-09-23 12:36:51,933 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0f024e5e-4fcd-4b19-99fa-743d6def0888
2025-09-23 12:36:51,933 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file e0616e74-efd0-4601-8b0d-17eca5e79af0: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:37:29,622 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,623 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:37:29,623 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:37:29,623 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:37:29,623 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 12:37:29,638 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:37:29,639 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 3004d37a-ab40-4f75-b0a5-8109bd846678 for no_ocr pipeline
2025-09-23 12:37:29,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:37:29,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['3004d37a-ab40-4f75-b0a5-8109bd846678']
2025-09-23 12:37:29,639 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:37:29,639 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 0c742d11-fac5-4f7a-b823-12ee1db0f1a9 for ocr pipeline
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['0c742d11-fac5-4f7a-b823-12ee1db0f1a9']
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:37:29,650 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,653 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,654 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,654 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,654 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:29,654 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:37:29,700 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 1783349a-e9ea-4d2f-91b0-a36b0a6a5ef7 for vlm pipeline
2025-09-23 12:37:29,700 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:37:29,700 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 43c1b3ed-7b78-4aa5-842a-05db7a6e468e
2025-09-23 12:37:32,909 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,909 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file e0616e74-efd0-4601-8b0d-17eca5e79af0: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:37:32,909 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:37:32,909 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:37:32,909 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,911 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e0616e74-efd0-4601-8b0d-17eca5e79af0: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:32,911 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 12:37:32,912 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:37:32,913 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8b05797b-ca72-4096-b719-b5ba085ea8eb for no_ocr pipeline
2025-09-23 12:37:32,913 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:37:32,913 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['8b05797b-ca72-4096-b719-b5ba085ea8eb']
2025-09-23 12:37:32,913 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:37:32,913 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e0616e74-efd0-4601-8b0d-17eca5e79af0: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 12:37:32,915 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task a889e114-123b-4751-b3f4-fbba9ddbea96 for ocr pipeline
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['a889e114-123b-4751-b3f4-fbba9ddbea96']
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:37:32,916 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e0616e74-efd0-4601-8b0d-17eca5e79af0: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7bdd1a10-3289-47b2-bd70-266c1e9042c7 for vlm pipeline
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:37:32,918 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file e0616e74-efd0-4601-8b0d-17eca5e79af0
2025-09-23 12:45:54,401 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 720c33a2-540b-4a98-a591-65a8cbef16d0
2025-09-23 12:45:54,402 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task b2ed3858-8e24-4830-827d-8ef0d86d7fea
2025-09-23 12:45:54,403 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 1b3c70bf-b023-431d-a24f-175fe58233d3
2025-09-23 12:45:54,404 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task dd2890d7-309e-442a-8920-dd273cb93181
2025-09-23 12:45:54,405 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 51227668-9a25-4f52-856c-2af98d13d48f
2025-09-23 12:45:54,405 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task d5037490-cfc1-4494-ad5d-d303d3560fe3
2025-09-23 12:45:54,405 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 720c33a2-540b-4a98-a591-65a8cbef16d0: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:46:00,530 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:00,532 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 734d182f-b93b-4632-a5c8-fadec7791a19
2025-09-23 12:46:00,533 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task e773b9a2-d59a-4932-9190-0c3c18578b7b
2025-09-23 12:46:00,534 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 45e9cda4-2241-4653-8f02-8c45bde75422
2025-09-23 12:46:00,535 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 8eecdaf8-4efd-4cc2-ae5e-e9335e433b10
2025-09-23 12:46:00,535 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0a626a63-7696-46ae-9e13-6422a24a59a3
2025-09-23 12:46:00,535 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:46:06,155 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:06,157 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task fd99ae39-198d-40bd-9641-1d9e1d4c139e
2025-09-23 12:46:06,157 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 9bd3b053-0dd1-4898-ab2b-92b96f2fa086
2025-09-23 12:46:06,158 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task f2f9c01f-5479-4e60-9710-f46319bd6294
2025-09-23 12:46:06,158 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 0255d12f-e325-4d45-9a8b-4a2a9e1101fb
2025-09-23 12:46:06,159 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 24e57b1a-0a63-4849-a3a6-faf451acf3a0
2025-09-23 12:46:06,159 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 323b7478-6a14-4561-a9bf-541d591a8ced: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:46:11,139 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:11,140 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 9c166e3a-f7bf-45fe-baf9-467b76637538
2025-09-23 12:46:11,141 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 17755bb4-259d-4d75-afc5-8c26eab9f263
2025-09-23 12:46:11,141 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 082f28d1-50ba-4373-b35a-d352e5ab8ab6
2025-09-23 12:46:11,142 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 85662edd-2082-40c5-9650-7bf69ac41530
2025-09-23 12:46:11,142 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 4e41b073-658b-4c1b-b426-23fd73c6447b
2025-09-23 12:46:11,142 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 595bd07e-cbb8-4f5c-827d-7235576428a7: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:46:16,840 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:16,842 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 255bf5b3-2c06-43fb-abbc-375bf6d3e8b5
2025-09-23 12:46:16,842 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 66d57753-4d1b-402b-8930-b7c7fcc58671
2025-09-23 12:46:16,843 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task ef27fab6-b095-462f-a2ec-addf5f64b6fb
2025-09-23 12:46:16,843 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 3a03a865-e85c-4dd7-9ce8-20ff9047f55f
2025-09-23 12:46:16,843 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 994196b2-a533-4a18-ab50-a8875d301871
2025-09-23 12:46:16,843 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 715148b8-0713-43b6-b029-74fe6e770865: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 12:46:28,289 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,290 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:46:28,290 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:46:28,290 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:46:28,290 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,366 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,542 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:46:28,542 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:46:28,543 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 5fe48003-2432-4d90-9ab3-02d8fb209a31 for no_ocr pipeline
2025-09-23 12:46:28,543 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:46:28,543 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['5fe48003-2432-4d90-9ab3-02d8fb209a31']
2025-09-23 12:46:28,543 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:46:28,543 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,545 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,552 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:46:28,553 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 37e6390d-ede8-473b-97ac-3eb58a214fc7 for ocr pipeline
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['37e6390d-ede8-473b-97ac-3eb58a214fc7']
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:46:28,639 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,642 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:28,642 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 598df882-9e3d-4512-85eb-30a252e94895 for vlm pipeline
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:46:28,643 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 889eb9c5-04b2-48d1-9447-9d2c9e6d95f7
2025-09-23 12:46:40,801 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,802 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 595bd07e-cbb8-4f5c-827d-7235576428a7: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:46:40,802 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:46:40,802 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:46:40,802 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,805 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 595bd07e-cbb8-4f5c-827d-7235576428a7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:46:40,805 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,805 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,805 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,805 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from page_images artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 21 pages (< 50 threshold) - creating single bundle
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 56f9b115-1d4b-4a20-a2bd-1fc4e8545ea7 for no_ocr pipeline
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:46:40,806 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['56f9b115-1d4b-4a20-a2bd-1fc4e8545ea7']
2025-09-23 12:46:40,807 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:46:40,807 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,809 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 595bd07e-cbb8-4f5c-827d-7235576428a7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:46:40,815 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,888 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,915 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,916 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,916 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,916 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,916 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,917 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from page_images artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,917 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 21 pages (< 50 threshold) - creating single bundle
2025-09-23 12:46:40,917 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:46:40,918 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 9b2e8653-f947-4437-963f-8b50b45ff902 for ocr pipeline
2025-09-23 12:46:40,918 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:46:40,919 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['9b2e8653-f947-4437-963f-8b50b45ff902']
2025-09-23 12:46:40,919 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:46:40,919 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:46:40,919 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,925 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 595bd07e-cbb8-4f5c-827d-7235576428a7: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 12:46:40,925 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,925 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 21 from page_images artefact for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:40,926 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:46:40,928 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 127bed42-b7af-4414-b8e8-5c7ab0fc3158 for vlm pipeline
2025-09-23 12:46:40,928 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:46:40,928 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 595bd07e-cbb8-4f5c-827d-7235576428a7
2025-09-23 12:46:49,618 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,619 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 323b7478-6a14-4561-a9bf-541d591a8ced: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:46:49,619 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:46:49,619 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:46:49,619 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,727 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 323b7478-6a14-4561-a9bf-541d591a8ced: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:49,727 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,727 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,727 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 12:46:49,728 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:46:49,729 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task b6151be8-965e-4066-a46e-f5e80d8f3ff6 for no_ocr pipeline
2025-09-23 12:46:49,729 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:46:49,729 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['b6151be8-965e-4066-a46e-f5e80d8f3ff6']
2025-09-23 12:46:49,729 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:46:49,729 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 323b7478-6a14-4561-a9bf-541d591a8ced: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,734 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8b6996c2-26dd-45aa-8bda-ed6e047bcd3a for ocr pipeline
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['8b6996c2-26dd-45aa-8bda-ed6e047bcd3a']
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:46:49,735 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 323b7478-6a14-4561-a9bf-541d591a8ced: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from split_map_json artefact for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:49,739 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 12:46:49,845 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 254374eb-b1a7-4a85-a51d-a97dade83bf8 for vlm pipeline
2025-09-23 12:46:49,845 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:46:49,845 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 323b7478-6a14-4561-a9bf-541d591a8ced
2025-09-23 12:46:50,561 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,562 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 715148b8-0713-43b6-b029-74fe6e770865: ['no_ocr', 'ocr', 'vlm']
2025-09-23 12:46:50,562 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 12:46:50,562 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 12:46:50,562 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 715148b8-0713-43b6-b029-74fe6e770865: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,648 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,649 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,649 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,649 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,755 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:46:50,755 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 12:46:50,756 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task f5123109-7305-49bc-b5b0-7c1504dad1f1 for no_ocr pipeline
2025-09-23 12:46:50,756 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 12:46:50,756 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['f5123109-7305-49bc-b5b0-7c1504dad1f1']
2025-09-23 12:46:50,756 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 12:46:50,756 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,758 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 715148b8-0713-43b6-b029-74fe6e770865: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:46:50,758 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,758 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,758 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,759 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,862 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 12:46:50,862 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 1c7564e4-e845-41ad-b4c3-44da03371f3f for ocr pipeline
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['1c7564e4-e845-41ad-b4c3-44da03371f3f']
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 12:46:50,863 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 715148b8-0713-43b6-b029-74fe6e770865: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 12:46:50,866 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 12:46:50,867 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7e1e7cfc-6960-49c6-93b9-cacb66fa8753 for vlm pipeline
2025-09-23 12:46:50,867 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 12:46:50,867 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 715148b8-0713-43b6-b029-74fe6e770865
2025-09-23 13:34:05,538 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 13:34:05,551 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:05,558 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task fefaafcd-ac03-40b3-9550-16b4a1588963
2025-09-23 13:34:05,559 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task a75b68c1-b4fa-4e30-8cfd-b022cf1828b1
2025-09-23 13:34:05,560 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 0047000c-abd9-4567-b378-92513577d16e
2025-09-23 13:34:05,561 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 791f0b47-3ebf-4814-999b-2484baef7a10
2025-09-23 13:34:05,562 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 27d6e586-8c1b-49c8-9543-30de6e6af22e
2025-09-23 13:34:05,562 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 32dea469-8e14-4a33-8284-91a4752d63d5: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:09,527 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:34:09,529 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 80665571-7f97-4c7b-863b-8245c31ef27c
2025-09-23 13:34:09,530 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 72c4c21a-f32a-4e7b-94ff-49fe6a698fa5
2025-09-23 13:34:09,531 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 3eb9e021-c32f-4498-993d-82851e399f5f
2025-09-23 13:34:09,533 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 91f78728-b6a1-4a5d-93f5-1617c54f7180
2025-09-23 13:34:09,534 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 403fbf0b-5449-44d6-a0cc-d7c530a7b1c1
2025-09-23 13:34:09,534 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 0d9053e9-002d-47ae-adc1-2acd86463bdf: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:12,771 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:34:12,774 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task db071268-dc4d-48fa-851f-abce7f981675
2025-09-23 13:34:12,775 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task fd67b671-662e-42ea-8720-3ab6142a51d1
2025-09-23 13:34:12,776 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task b88d6d37-84b6-420a-831c-c676b06a0ebe
2025-09-23 13:34:12,777 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 0169ee73-6cb8-41d8-b68a-584d67123b9f
2025-09-23 13:34:12,779 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task c86b54c8-d2c4-40cb-b9f0-2ac53b37ea74
2025-09-23 13:34:12,779 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file eb50d54f-55ea-4294-af7e-d74764777c16: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:17,320 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:34:17,322 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task a40aaf27-b5a0-4a04-aee3-66a36aa5c31b
2025-09-23 13:34:17,322 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 1d4f4981-e53b-4a1b-b8eb-7bee743488c5
2025-09-23 13:34:17,323 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task f49b463d-24b0-49d0-9be3-e87ca4e3e190
2025-09-23 13:34:17,324 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 1bd257dd-633c-4638-afc5-f6bc79fb7288
2025-09-23 13:34:17,325 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task e2ebb9d2-0903-4a20-afcf-427cab22424d
2025-09-23 13:34:17,325 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file bcf16d27-4a79-416a-8c20-61a884ce2cae: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:21,125 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:34:21,127 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 35057e6f-7620-4653-9901-138e1baa710b
2025-09-23 13:34:21,128 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task abc0cea4-c08a-46d6-9e2c-8a3de12e675b
2025-09-23 13:34:21,129 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task a3b6bb44-7420-4395-a8ea-ba5051c7f1fe
2025-09-23 13:34:21,131 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 1d229422-4630-4f29-a0b5-17b5cc9c7883
2025-09-23 13:34:21,132 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task fe26a9c3-b229-40ba-90ee-9a60143e137f
2025-09-23 13:34:21,132 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file a6fb9839-ab3e-48e9-8c2a-ed424289d339: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:36,051 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,052 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 32dea469-8e14-4a33-8284-91a4752d63d5: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:34:36,052 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:34:36,052 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:34:36,052 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,182 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 32dea469-8e14-4a33-8284-91a4752d63d5: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:34:36,201 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,202 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,303 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:34:36,303 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 13:34:36,305 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 305da3fd-0f60-434d-b386-69a9db25654b for no_ocr pipeline
2025-09-23 13:34:36,305 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:34:36,305 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['305da3fd-0f60-434d-b386-69a9db25654b']
2025-09-23 13:34:36,305 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:34:36,305 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 32dea469-8e14-4a33-8284-91a4752d63d5: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,371 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,372 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,452 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:34:36,453 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task fe1ce53c-a964-4093-bb3b-3053566a4798 for ocr pipeline
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['fe1ce53c-a964-4093-bb3b-3053566a4798']
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:34:36,454 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 32dea469-8e14-4a33-8284-91a4752d63d5: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:36,458 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 13:34:36,459 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 86dbe5b2-9c03-4f4f-a3c9-b7d11a875050 for vlm pipeline
2025-09-23 13:34:36,459 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:34:36,459 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 32dea469-8e14-4a33-8284-91a4752d63d5
2025-09-23 13:34:41,301 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 6ec9766d-45ca-4e0d-a6ba-fefae29366d9
2025-09-23 13:34:41,301 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task b9e53b49-42fb-4547-8179-6107c2febdce
2025-09-23 13:34:41,302 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 64ad489a-7535-41f2-9976-61fca9d84bcc
2025-09-23 13:34:41,447 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 33c729d5-52ae-43f7-b162-b5434b9ffce2
2025-09-23 13:34:41,448 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 6a20f72b-53ee-4781-a402-5c40f9ba20b5
2025-09-23 13:34:41,449 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task e2f8008e-23aa-4c3f-864b-b5e836477747
2025-09-23 13:34:41,449 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 6ec9766d-45ca-4e0d-a6ba-fefae29366d9: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:34:47,206 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:34:47,214 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 91521def-e278-4728-a4eb-35ec53e36537
2025-09-23 13:34:47,215 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 61adf8d1-7ee9-426d-b1ca-ef293eda7f43
2025-09-23 13:34:47,216 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task d282f013-f247-4a12-8bc5-49ebded307ea
2025-09-23 13:34:47,218 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task b8167fa8-a577-494b-8c0d-f6ba90622b34
2025-09-23 13:34:47,219 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 7d458e0b-1738-4e25-9337-6e0ee842a23f
2025-09-23 13:34:47,219 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:35:07,059 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,060 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file bcf16d27-4a79-416a-8c20-61a884ce2cae: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:35:07,060 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:35:07,060 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:35:07,060 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file bcf16d27-4a79-416a-8c20-61a884ce2cae: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json', 'page_images']
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,062 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,063 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:07,063 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:07,063 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task a82f95e9-0ee6-44a2-a190-994aa8132e45 for no_ocr pipeline
2025-09-23 13:35:07,063 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:35:07,063 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['a82f95e9-0ee6-44a2-a190-994aa8132e45']
2025-09-23 13:35:07,064 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:35:07,064 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file bcf16d27-4a79-416a-8c20-61a884ce2cae: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json', 'page_images']
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:07,066 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task b949d2d5-6df5-44b5-a847-bcb3dbe2ef3f for ocr pipeline
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['b949d2d5-6df5-44b5-a847-bcb3dbe2ef3f']
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:35:07,067 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file bcf16d27-4a79-416a-8c20-61a884ce2cae: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json', 'page_images']
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:07,069 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:35:07,070 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7e1e6e54-5c25-4514-a2f9-4ac134bc9f5e for vlm pipeline
2025-09-23 13:35:07,070 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:35:07,070 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file bcf16d27-4a79-416a-8c20-61a884ce2cae
2025-09-23 13:35:13,159 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,159 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file eb50d54f-55ea-4294-af7e-d74764777c16: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:35:13,159 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:35:13,159 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:35:13,159 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file eb50d54f-55ea-4294-af7e-d74764777c16: ['tika_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'docling_json']
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:13,163 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:13,165 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8a8eb5c0-60f6-4fb1-9477-b2bd3867fe29 for no_ocr pipeline
2025-09-23 13:35:13,165 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:35:13,165 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['8a8eb5c0-60f6-4fb1-9477-b2bd3867fe29']
2025-09-23 13:35:13,165 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:35:13,165 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file eb50d54f-55ea-4294-af7e-d74764777c16: ['tika_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'docling_json']
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,168 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,169 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:13,169 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 853909b5-44ae-47ce-81d0-4ba8d916d56f for ocr pipeline
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['853909b5-44ae-47ce-81d0-4ba8d916d56f']
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:35:13,170 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file eb50d54f-55ea-4294-af7e-d74764777c16: ['tika_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'docling_json']
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,172 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:13,173 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:35:13,173 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d31a91d0-6158-41ca-b25e-451a3897c12e for vlm pipeline
2025-09-23 13:35:13,173 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:35:13,174 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file eb50d54f-55ea-4294-af7e-d74764777c16
2025-09-23 13:35:26,017 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,018 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 0d9053e9-002d-47ae-adc1-2acd86463bdf: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:35:26,018 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:35:26,018 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:35:26,018 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 0d9053e9-002d-47ae-adc1-2acd86463bdf: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:26,021 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:26,023 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task f2ed95ff-52a8-461e-819a-8eea5d0a2df8 for no_ocr pipeline
2025-09-23 13:35:26,023 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:35:26,023 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['f2ed95ff-52a8-461e-819a-8eea5d0a2df8']
2025-09-23 13:35:26,023 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:35:26,023 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 0d9053e9-002d-47ae-adc1-2acd86463bdf: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:26,025 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8df65f84-8843-4535-9c1e-798be772ed8c for ocr pipeline
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['8df65f84-8843-4535-9c1e-798be772ed8c']
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:35:26,026 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 0d9053e9-002d-47ae-adc1-2acd86463bdf: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,028 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,029 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:26,029 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:35:26,030 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 45e83478-4a4e-4be3-bd21-67e00de82e7f for vlm pipeline
2025-09-23 13:35:26,030 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:35:26,030 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 0d9053e9-002d-47ae-adc1-2acd86463bdf
2025-09-23 13:35:27,317 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,317 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:35:27,317 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:35:27,317 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:35:27,317 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,319 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42: ['docling_frontmatter_json', 'split_map_json', 'tika_json', 'document_outline_hierarchy', 'docling_json']
2025-09-23 13:35:27,322 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,322 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,322 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,322 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,330 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:35:27,332 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 13:35:27,333 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task f90b75a4-f1b0-4c0c-a18a-7e8385872e92 for no_ocr pipeline
2025-09-23 13:35:27,333 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:35:27,333 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['f90b75a4-f1b0-4c0c-a18a-7e8385872e92']
2025-09-23 13:35:27,333 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:35:27,333 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,336 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42: ['docling_frontmatter_json', 'split_map_json', 'tika_json', 'document_outline_hierarchy', 'docling_json']
2025-09-23 13:35:27,336 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,336 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,336 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,336 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,341 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:35:27,343 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 13:35:27,344 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task a2e32c89-c870-4b66-a2de-535c7625df7e for ocr pipeline
2025-09-23 13:35:27,344 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:35:27,344 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['a2e32c89-c870-4b66-a2de-535c7625df7e']
2025-09-23 13:35:27,344 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:35:27,344 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:35:27,345 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,346 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42: ['docling_frontmatter_json', 'split_map_json', 'tika_json', 'document_outline_hierarchy', 'docling_json']
2025-09-23 13:35:27,346 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,346 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,346 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,346 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:27,347 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 13:35:27,350 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 4a278ef3-9cb2-4d41-a9e5-c38c6b4460a7 for vlm pipeline
2025-09-23 13:35:27,350 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:35:27,350 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 4fec4ca9-c554-4fef-9f6f-0124cd9d2b42
2025-09-23 13:35:31,382 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,405 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file a6fb9839-ab3e-48e9-8c2a-ed424289d339: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:35:31,405 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:35:31,405 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:35:31,405 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file a6fb9839-ab3e-48e9-8c2a-ed424289d339: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,409 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 23 from page_images artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 23 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 9ad80203-b9b1-4bd3-b896-6bee74467619 for no_ocr pipeline
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['9ad80203-b9b1-4bd3-b896-6bee74467619']
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:35:31,410 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file a6fb9839-ab3e-48e9-8c2a-ed424289d339: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 23 from page_images artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 23 pages (< 50 threshold) - creating single bundle
2025-09-23 13:35:31,414 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8a3a5ccb-0336-47a2-a34c-c48990818a71 for ocr pipeline
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['8a3a5ccb-0336-47a2-a34c-c48990818a71']
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:35:31,415 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,419 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file a6fb9839-ab3e-48e9-8c2a-ed424289d339: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:35:31,427 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,492 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 23 from page_images artefact for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:35:31,516 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:35:31,517 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task b1d38ac4-9915-4007-aa5f-165c2286188c for vlm pipeline
2025-09-23 13:35:31,517 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:35:31,517 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file a6fb9839-ab3e-48e9-8c2a-ed424289d339
2025-09-23 13:36:02,762 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:02,862 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 957c1983-d0c1-422d-a68a-f44f0c49911c
2025-09-23 13:36:02,863 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task f904c584-8a1d-4ef4-9247-4860925519ad
2025-09-23 13:36:02,864 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 90a4e286-1284-45ef-810a-3592e0a7c171
2025-09-23 13:36:02,865 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 463f12c1-1b9c-49ff-b93c-3a14e1318608
2025-09-23 13:36:02,865 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 8fd94684-f2ad-4b47-bfa8-16ccedf859ff
2025-09-23 13:36:02,865 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:36:29,124 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,125 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:36:29,125 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:36:29,125 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:36:29,125 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6: ['document_pdf', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json']
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,129 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,135 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 14 pages (< 50 threshold) - creating single bundle
2025-09-23 13:36:29,209 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:36:29,236 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task cf98fcd1-7b16-4c58-8ba3-ec85549d0b8d for no_ocr pipeline
2025-09-23 13:36:29,236 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:36:29,236 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['cf98fcd1-7b16-4c58-8ba3-ec85549d0b8d']
2025-09-23 13:36:29,236 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:36:29,236 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,240 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'document_pdf', 'split_map_json', 'tika_json']
2025-09-23 13:36:29,240 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,240 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,240 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 14 pages (< 50 threshold) - creating single bundle
2025-09-23 13:36:29,241 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 409140b9-f2e5-4ff5-979f-a433489d4056 for ocr pipeline
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['409140b9-f2e5-4ff5-979f-a433489d4056']
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:36:29,242 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,245 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6: ['document_pdf', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json']
2025-09-23 13:36:29,245 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,245 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,245 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:36:29,246 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 33103528-97bc-4fb2-8e27-72473f377f18 for vlm pipeline
2025-09-23 13:36:29,247 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:36:29,247 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 7e8f8b1e-3a07-4cc0-8e2a-295ed61fdfd6
2025-09-23 13:39:20,150 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 13:39:25,704 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file e275152e-0191-4ec2-bf5a-8752a7bd860d
2025-09-23 13:39:25,708 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 19483762-319b-49cd-a09d-bf6c0e9479c9
2025-09-23 13:39:25,709 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task a296dbc2-8181-4cfd-943c-06e812c9cb57
2025-09-23 13:39:25,710 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 871ff142-ca53-4bba-b043-e1703613beca
2025-09-23 13:39:25,711 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 687623ac-d39d-418d-8a77-dbdd5bd69b99
2025-09-23 13:39:25,712 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 5e791246-889c-4038-948c-c6a792789624
2025-09-23 13:39:25,712 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file e275152e-0191-4ec2-bf5a-8752a7bd860d: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:49:53,637 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 13:49:59,630 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:49:59,634 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 247dc9cd-9a14-4cad-8c7c-291513a3f3c4
2025-09-23 13:49:59,635 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 18011204-bd39-425e-b05e-1dda6b04e023
2025-09-23 13:49:59,636 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task f326298e-612d-4aad-b2a4-4c2514a19dfa
2025-09-23 13:49:59,637 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 4f8f24ae-3e84-481a-914f-00e003ffcb87
2025-09-23 13:49:59,638 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task a2c11ad5-9876-4605-8809-0f212e358395
2025-09-23 13:49:59,638 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 87b348f4-6eca-46e9-b452-9c94a7414bbd: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:05,026 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:05,029 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 83d51ac1-c0f8-49d9-80f3-02f42f63e4e7
2025-09-23 13:50:05,030 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task f3c8f6f9-0397-4db5-ae33-dc53e2b64d64
2025-09-23 13:50:05,032 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task a3f4dd3c-2efa-48d1-9113-4adf2064536b
2025-09-23 13:50:05,033 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task b2f1d267-4f3a-47b1-9777-2b83ac610d60
2025-09-23 13:50:05,034 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 144de40d-9cc2-4abf-8c5f-45069d67c22f
2025-09-23 13:50:05,034 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:10,361 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:50:10,363 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task cc2bdd7f-fec0-4d22-ba17-d7ad7d739e7f
2025-09-23 13:50:10,363 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 4067b5e1-6d41-4b0c-9e41-0e1e3f2969af
2025-09-23 13:50:10,364 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task c508b011-008d-4d17-a053-19d67f73d573
2025-09-23 13:50:10,365 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 598ffcdd-a76b-4ce2-83e3-12ba9272c7cf
2025-09-23 13:50:10,366 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 93a67061-8686-485e-b858-ea2224ee9b67
2025-09-23 13:50:10,366 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file dd9706c2-cacd-4d97-a05d-94f539bf8af2: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:13,716 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:50:13,717 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task c73bacb7-2129-456f-b01c-51b2f98f4134
2025-09-23 13:50:13,718 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 77a083c3-ac89-4a59-a8d7-c810f6d33402
2025-09-23 13:50:13,719 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 65abbb79-f084-4327-883f-58ddce021654
2025-09-23 13:50:13,720 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task d803a887-0a01-499f-b353-94015113328f
2025-09-23 13:50:13,720 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task eb325431-5274-4c1f-9c03-7f165d77a6cf
2025-09-23 13:50:13,720 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file ccb00d80-41ab-47a3-9ed7-21a91434af2c: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:17,998 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:50:18,000 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 5df36c9c-ad2e-4f48-91ed-34e9dff18553
2025-09-23 13:50:18,022 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task e61b776e-dc36-489f-bb7c-d27aa1ce0b13
2025-09-23 13:50:18,023 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 817515c9-c09f-4fbe-aadd-02076745f011
2025-09-23 13:50:18,024 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 49c03851-3ec1-4367-bf7b-918370cac284
2025-09-23 13:50:18,025 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 447c8de0-0c08-4f37-b7b7-a9a4a5de538d
2025-09-23 13:50:18,025 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 06011282-0b76-44ad-8fb1-c201174f369b: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:26,222 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:50:26,224 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task b80883b8-5b13-433e-acc3-2aaa006e9bd8
2025-09-23 13:50:26,226 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 561d95c3-70b0-43fb-b002-2157566bf65c
2025-09-23 13:50:26,228 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task fc0fafee-907c-498a-bb77-ec1ffd00340c
2025-09-23 13:50:26,229 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task fbae1d1b-6e73-43c5-9bc2-63c4e6cce43d
2025-09-23 13:50:26,230 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task b4afdb4b-8cea-40fa-9f1b-772d94a20f88
2025-09-23 13:50:26,231 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 13:50:45,156 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,156 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:50:45,156 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:50:45,156 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:50:45,156 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,163 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,179 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 1dd15875-299d-4f88-a9d7-a532197c17e3 for no_ocr pipeline
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['1dd15875-299d-4f88-a9d7-a532197c17e3']
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:50:45,180 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,182 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:45,182 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,182 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,183 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,190 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 13:50:45,190 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task f030ba70-f43c-4cfa-901b-f53a248ac833 for ocr pipeline
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['f030ba70-f43c-4cfa-901b-f53a248ac833']
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:50:45,191 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,198 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:45,198 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,198 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,198 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 13:50:45,199 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7cd79034-e107-4ba9-8682-40e6683ed69d for vlm pipeline
2025-09-23 13:50:45,200 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:50:45,200 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 9106f6b2-e6cf-46f0-a0e9-c2057eb5646d
2025-09-23 13:50:49,975 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,976 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 87b348f4-6eca-46e9-b452-9c94a7414bbd: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:50:49,976 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:50:49,976 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:50:49,976 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 87b348f4-6eca-46e9-b452-9c94a7414bbd: ['document_pdf', 'tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:49,981 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,006 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,103 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 14 pages (< 50 threshold) - creating single bundle
2025-09-23 13:50:50,104 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:50:50,106 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task cc239b46-9198-4cb9-a016-fcd842339656 for no_ocr pipeline
2025-09-23 13:50:50,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:50:50,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['cc239b46-9198-4cb9-a016-fcd842339656']
2025-09-23 13:50:50,106 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:50:50,106 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 87b348f4-6eca-46e9-b452-9c94a7414bbd: ['document_pdf', 'tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 14 pages (< 50 threshold) - creating single bundle
2025-09-23 13:50:50,111 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 08cd550f-f433-495e-83b7-b4957b57ddbf for ocr pipeline
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['08cd550f-f433-495e-83b7-b4957b57ddbf']
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:50:50,112 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,218 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file 87b348f4-6eca-46e9-b452-9c94a7414bbd: ['document_pdf', 'tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 13:50:50,238 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_pdf' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,238 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_pdf artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,238 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 14 from split_map_json artefact for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:50:50,239 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:50:50,241 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 4b7b60fb-811e-4986-9ede-5d8a17be637c for vlm pipeline
2025-09-23 13:50:50,241 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:50:50,241 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 87b348f4-6eca-46e9-b452-9c94a7414bbd
2025-09-23 13:51:01,604 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,604 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file dd9706c2-cacd-4d97-a05d-94f539bf8af2: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:51:01,605 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:51:01,605 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:51:01,605 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file dd9706c2-cacd-4d97-a05d-94f539bf8af2: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json']
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:01,609 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:01,611 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task b2c6a74b-70b7-4c9c-a127-8f8736c53aaf for no_ocr pipeline
2025-09-23 13:51:01,611 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:51:01,611 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['b2c6a74b-70b7-4c9c-a127-8f8736c53aaf']
2025-09-23 13:51:01,611 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:51:01,611 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file dd9706c2-cacd-4d97-a05d-94f539bf8af2: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json']
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:01,613 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task ad8c1964-def3-421d-9cab-0b92189e277e for ocr pipeline
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['ad8c1964-def3-421d-9cab-0b92189e277e']
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:51:01,614 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file dd9706c2-cacd-4d97-a05d-94f539bf8af2: ['tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json', 'docling_json']
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from split_map_json artefact for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:01,616 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:51:01,618 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 21bb8d0f-6e86-4cb9-9ef7-2c625c79cb30 for vlm pipeline
2025-09-23 13:51:01,618 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:51:01,618 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file dd9706c2-cacd-4d97-a05d-94f539bf8af2
2025-09-23 13:51:04,663 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,663 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file ccb00d80-41ab-47a3-9ed7-21a91434af2c: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:51:04,663 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:51:04,664 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:51:04,664 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ccb00d80-41ab-47a3-9ed7-21a91434af2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,667 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:04,668 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:04,669 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 5094d95b-dc72-415f-8934-18fbca8d5ba5 for no_ocr pipeline
2025-09-23 13:51:04,669 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:51:04,669 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['5094d95b-dc72-415f-8934-18fbca8d5ba5']
2025-09-23 13:51:04,669 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:51:04,669 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,673 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ccb00d80-41ab-47a3-9ed7-21a91434af2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,674 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:04,675 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 4ef2bf25-5d3e-42ff-bc7f-8569f5b359cd for ocr pipeline
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['4ef2bf25-5d3e-42ff-bc7f-8569f5b359cd']
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:51:04,676 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ccb00d80-41ab-47a3-9ed7-21a91434af2c: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json']
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:04,680 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:51:04,681 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7aa5636c-e233-480d-82b2-893c524ce2e9 for vlm pipeline
2025-09-23 13:51:04,681 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:51:04,681 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file ccb00d80-41ab-47a3-9ed7-21a91434af2c
2025-09-23 13:51:12,092 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,093 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 06011282-0b76-44ad-8fb1-c201174f369b: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:51:12,093 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:51:12,093 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:51:12,093 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,097 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06011282-0b76-44ad-8fb1-c201174f369b: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 13:51:12,097 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,097 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,097 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,098 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 44 from split_map_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,098 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 44 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:12,098 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:12,099 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 9a0734f1-d060-4c9b-8dc9-2751ab7659a6 for no_ocr pipeline
2025-09-23 13:51:12,099 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:51:12,099 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['9a0734f1-d060-4c9b-8dc9-2751ab7659a6']
2025-09-23 13:51:12,099 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:51:12,099 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06011282-0b76-44ad-8fb1-c201174f369b: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 44 from split_map_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,101 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 44 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:12,102 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:12,102 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 2419a981-49b5-4d01-9158-e8a6ab95a114 for ocr pipeline
2025-09-23 13:51:12,102 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:51:12,102 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['2419a981-49b5-4d01-9158-e8a6ab95a114']
2025-09-23 13:51:12,102 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:51:12,103 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:51:12,103 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06011282-0b76-44ad-8fb1-c201174f369b: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 44 from split_map_json artefact for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:12,105 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:51:12,106 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 45a5526c-c508-4144-9487-df7473d2673b for vlm pipeline
2025-09-23 13:51:12,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:51:12,106 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 06011282-0b76-44ad-8fb1-c201174f369b
2025-09-23 13:51:55,017 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,017 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002: ['no_ocr', 'ocr', 'vlm']
2025-09-23 13:51:55,017 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 13:51:55,017 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 13:51:55,017 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,020 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 13:51:55,020 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,020 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,020 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 20 from split_map_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 20 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 4f5a0ec8-63e7-43fe-ade0-05572c94318f for no_ocr pipeline
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['4f5a0ec8-63e7-43fe-ade0-05572c94318f']
2025-09-23 13:51:55,021 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 13:51:55,022 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,023 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 13:51:55,023 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,023 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 20 from split_map_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 20 pages (< 50 threshold) - creating single bundle
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 9d3da769-377f-4406-bde2-8836dfe5a211 for ocr pipeline
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['9d3da769-377f-4406-bde2-8836dfe5a211']
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 13:51:55,024 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,026 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 20 from split_map_json artefact for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 13:51:55,027 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 13:51:55,027 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task fc761bfc-cac8-4076-98ac-7cc3c439b45a for vlm pipeline
2025-09-23 13:51:55,027 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 13:51:55,027 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file a82ee8b5-0e60-4b3f-a3b0-d6d7470cb002
2025-09-23 14:05:41,657 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 14:05:41,677 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:05:41,688 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task b0f6d1f2-f37b-46b7-835a-79a12607f23b
2025-09-23 14:05:41,690 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 4c0dce91-d766-41e3-885e-ab302085b3bd
2025-09-23 14:05:41,695 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 8e66e9cb-905d-4732-b1e6-d7187c64166e
2025-09-23 14:05:41,697 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 46dfc26c-0c41-4127-923f-575748b50d02
2025-09-23 14:05:41,699 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0ad4ef10-e810-4b8d-b9dd-1df48be85ae9
2025-09-23 14:05:41,699 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 94bba7ef-eb8c-4731-a69a-236737ea82c4: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:05:46,681 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:05:46,684 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task d9b6c4ef-44cb-48f8-8b2b-edd48b9e8d30
2025-09-23 14:05:46,685 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 411edbe1-9d45-4f39-abc4-1c2879ccd7ab
2025-09-23 14:05:46,687 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 86fe258b-9a85-4ffd-a160-d7914053b030
2025-09-23 14:05:46,689 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 21456e7f-0977-44c7-9fc7-81c2d24215ba
2025-09-23 14:05:46,690 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task d665030e-ae4d-438b-86b5-90e0a1df9bd2
2025-09-23 14:05:46,690 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:05:50,061 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:05:50,063 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 3d4c00d3-a163-4c78-b019-68d68c362537
2025-09-23 14:05:50,064 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task a719678b-09f8-4033-9b8b-f2d23573dee9
2025-09-23 14:05:50,065 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task fb4776ae-3d7c-4e26-8a9e-cce2ce67ad6f
2025-09-23 14:05:50,067 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task ca51c21e-d7fc-4117-ab6a-adf4eb8d8b58
2025-09-23 14:05:50,068 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task fcaa63bb-dc0b-4a1c-97e9-c3d30b389bd9
2025-09-23 14:05:50,068 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:05:53,925 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:05:53,926 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task c1d410e1-f48a-4033-bb6e-b03447ab0dba
2025-09-23 14:05:53,927 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 51296803-fa1f-47a7-8ce2-d2bb74c06d22
2025-09-23 14:05:53,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 7c752cb1-58e2-425c-a16d-13959f65eac3
2025-09-23 14:05:53,929 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 21143407-663c-4003-b1c5-2b404ccd195d
2025-09-23 14:05:53,930 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task f67642d4-4576-4c0d-8ab4-39b0ee66c465
2025-09-23 14:05:53,930 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file badf75a1-55a7-40e6-bd23-b51f916bfcf6: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:05:57,297 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:05:57,298 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 418adec9-77d0-4951-a0ab-0e622c893275
2025-09-23 14:05:57,300 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task b9ac5753-dc68-40b0-b435-b93f69129cc1
2025-09-23 14:05:57,301 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 2619db3a-4344-4d5c-b2d4-dca7cc80d628
2025-09-23 14:05:57,303 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task c7fd9ca7-9c11-49f5-bc7b-917727a088a9
2025-09-23 14:05:57,305 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 29549347-0ed8-4996-a971-5fb978aabdcf
2025-09-23 14:05:57,305 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:06:03,180 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:03,182 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task d2e4c26f-aee0-47a7-afdc-8e21e877e805
2025-09-23 14:06:03,183 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 0088d09d-cca6-4eba-b696-6ff08eff79be
2025-09-23 14:06:03,183 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task db07cc4e-36ed-4cd4-9a12-f9d0ce90dc5c
2025-09-23 14:06:03,184 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 49ee84f8-1dd9-4e4d-accd-af1d69f1cb6b
2025-09-23 14:06:03,185 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 182a476b-5e9d-4aca-841b-1c128e3bcefe
2025-09-23 14:06:03,185 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 134ea7ce-6106-459d-91d4-3639acc63cb4: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:06:06,899 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:06:06,900 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 2650c81c-5a56-4efb-b1d4-34fcb7503ea2
2025-09-23 14:06:06,901 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 6e18985c-8491-4ec3-99f7-7acd78d452a5
2025-09-23 14:06:06,901 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task c0f3535b-7b53-42c6-9320-b8444bf108c0
2025-09-23 14:06:06,902 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 7da73f19-5b76-49ac-abb6-e570589b841a
2025-09-23 14:06:06,902 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 1b5351c6-875c-4097-a61d-38dcd7b11ed1
2025-09-23 14:06:06,902 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file b0389c5c-9d6c-45b4-8078-514cb6e61662: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:06:13,944 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:06:13,945 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 1646268d-0dd0-4a6e-8415-63b7a6860c67
2025-09-23 14:06:13,946 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 198353dc-6760-43b0-8ca3-f57ae8f88373
2025-09-23 14:06:13,947 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 3d6fc0fe-bc46-4ade-8684-3344cb801459
2025-09-23 14:06:13,948 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 9dfb209a-2528-4abc-8127-0bc99598ad31
2025-09-23 14:06:13,949 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 4c0320a0-f001-450f-8a1d-b2a74238944b
2025-09-23 14:06:13,949 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 8074842d-f897-4851-8ce5-2800d5640057: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:06:37,176 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,176 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:06:37,176 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:06:37,176 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:06:37,176 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,185 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 14:06:37,186 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:06:37,187 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 2a5a9d9f-e729-42f3-a8ca-16e93e4b3eae for no_ocr pipeline
2025-09-23 14:06:37,187 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:06:37,187 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['2a5a9d9f-e729-42f3-a8ca-16e93e4b3eae']
2025-09-23 14:06:37,187 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:06:37,187 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 14:06:37,190 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 7cf481f4-2c30-4496-8bad-c4d935caf760 for ocr pipeline
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['7cf481f4-2c30-4496-8bad-c4d935caf760']
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:06:37,191 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,193 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task e76e1b21-3416-46e8-8945-8f820c8fedce for vlm pipeline
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:06:37,194 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 06b311ce-a92e-4ebd-8f79-e7ad7759eb3f
2025-09-23 14:06:37,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 134ea7ce-6106-459d-91d4-3639acc63cb4: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:06:37,688 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:06:37,688 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:06:37,688 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,689 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 134ea7ce-6106-459d-91d4-3639acc63cb4: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,690 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,702 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:06:37,702 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:06:37,703 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 082c0229-d70e-48f8-895a-4128fd7c97fd for no_ocr pipeline
2025-09-23 14:06:37,703 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:06:37,703 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['082c0229-d70e-48f8-895a-4128fd7c97fd']
2025-09-23 14:06:37,703 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:06:37,704 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,705 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 134ea7ce-6106-459d-91d4-3639acc63cb4: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:06:37,705 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,705 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,706 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,706 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,706 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,706 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,713 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:06:37,713 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d26d933e-9440-4b1d-80d5-ee7753a5c060 for ocr pipeline
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['d26d933e-9440-4b1d-80d5-ee7753a5c060']
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:06:37,714 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 134ea7ce-6106-459d-91d4-3639acc63cb4: ['tika_json', 'docling_json', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:37,716 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 14:06:37,717 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 3f3e3c7f-2c91-4e22-bb95-fb46c3ebee35 for vlm pipeline
2025-09-23 14:06:37,717 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:06:37,717 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 134ea7ce-6106-459d-91d4-3639acc63cb4
2025-09-23 14:06:40,562 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,562 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:06:40,563 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:06:40,563 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:06:40,563 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,567 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394: ['docling_json', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'tika_json']
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 14:06:40,568 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:06:40,569 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task b4f4d461-4c4f-4cb9-b31f-a7fd3d28e144 for no_ocr pipeline
2025-09-23 14:06:40,569 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:06:40,569 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['b4f4d461-4c4f-4cb9-b31f-a7fd3d28e144']
2025-09-23 14:06:40,569 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:06:40,569 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394: ['docling_json', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'tika_json']
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 36 pages (< 50 threshold) - creating single bundle
2025-09-23 14:06:40,572 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 8b91ec38-9a3b-4f52-a222-3facdcddc686 for ocr pipeline
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['8b91ec38-9a3b-4f52-a222-3facdcddc686']
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:06:40,573 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,575 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394: ['docling_json', 'page_images', 'split_map_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'tika_json']
2025-09-23 14:06:40,576 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,576 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,576 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,576 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 36 from page_images artefact for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:06:40,576 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 14:06:40,577 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 59143732-f06f-4bc1-906e-513432a98c69 for vlm pipeline
2025-09-23 14:06:40,577 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:06:40,577 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file ca37246a-ff2c-4572-aa7d-d91bd2e1d394
2025-09-23 14:07:09,213 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,213 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file b0389c5c-9d6c-45b4-8078-514cb6e61662: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:07:09,213 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:07:09,213 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:07:09,214 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,216 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b0389c5c-9d6c-45b4-8078-514cb6e61662: ['docling_json', 'split_map_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:07:09,216 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,216 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,216 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,216 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,224 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:07:09,224 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:07:09,225 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 7d9fc8fd-117e-42f8-88d7-dd428ae23a14 for no_ocr pipeline
2025-09-23 14:07:09,225 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:07:09,225 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['7d9fc8fd-117e-42f8-88d7-dd428ae23a14']
2025-09-23 14:07:09,225 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:07:09,225 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,226 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b0389c5c-9d6c-45b4-8078-514cb6e61662: ['docling_json', 'split_map_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:07:09,227 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,227 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,227 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,227 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,232 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 866 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:07:09,233 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:07:09,233 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 320722fc-a598-49e0-a582-b5dfa8cdbc4e for ocr pipeline
2025-09-23 14:07:09,233 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:07:09,233 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['320722fc-a598-49e0-a582-b5dfa8cdbc4e']
2025-09-23 14:07:09,233 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:07:09,234 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:07:09,234 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,235 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file b0389c5c-9d6c-45b4-8078-514cb6e61662: ['docling_json', 'split_map_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy']
2025-09-23 14:07:09,235 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,235 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,235 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,236 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 866 from split_map_json artefact for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:09,236 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 14:07:09,236 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 4317103f-7f2a-4370-bd53-d788da0f260a for vlm pipeline
2025-09-23 14:07:09,236 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:07:09,236 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file b0389c5c-9d6c-45b4-8078-514cb6e61662
2025-09-23 14:07:24,453 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,453 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 94bba7ef-eb8c-4731-a69a-236737ea82c4: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:07:24,453 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:07:24,453 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:07:24,453 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,455 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 94bba7ef-eb8c-4731-a69a-236737ea82c4: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:07:24,455 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,455 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,456 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,464 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:07:24,464 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 14:07:24,464 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 3cd38a05-ff94-4ca4-b8f5-77070ac560e1 for no_ocr pipeline
2025-09-23 14:07:24,465 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:07:24,465 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['3cd38a05-ff94-4ca4-b8f5-77070ac560e1']
2025-09-23 14:07:24,465 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:07:24,465 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 94bba7ef-eb8c-4731-a69a-236737ea82c4: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,467 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,476 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:07:24,476 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 14:07:24,476 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 52d1426d-a7eb-4530-835c-fd549b64c95d for ocr pipeline
2025-09-23 14:07:24,476 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:07:24,476 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['52d1426d-a7eb-4530-835c-fd549b64c95d']
2025-09-23 14:07:24,477 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:07:24,477 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:07:24,477 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 94bba7ef-eb8c-4731-a69a-236737ea82c4: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json', 'tika_json']
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:24,479 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 14:07:24,480 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d80e0208-ddae-4f9e-8de3-39138a476a45 for vlm pipeline
2025-09-23 14:07:24,480 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:07:24,480 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 94bba7ef-eb8c-4731-a69a-236737ea82c4
2025-09-23 14:07:29,497 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 14:07:29,498 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 968e0370-8313-45c1-85b0-771ec8e71879
2025-09-23 14:07:29,498 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 229a6529-a498-4e49-a728-4beb39f15934
2025-09-23 14:07:29,499 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task db774bdb-77f1-4c35-93e4-75a735649a3b
2025-09-23 14:07:29,501 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 49de220c-4bc1-418a-8be4-81abc3ad4094
2025-09-23 14:07:29,501 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 0b28f53c-03a2-4782-8add-c4ed89d0393e
2025-09-23 14:07:29,501 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 9092e579-a3cd-421f-afe4-b6ed2fee512e: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 14:07:39,325 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,325 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:07:39,325 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:07:39,326 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:07:39,326 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577: ['tika_json', 'docling_json', 'docling_frontmatter_json', 'split_map_json', 'document_outline_hierarchy']
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 17 from split_map_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 17 pages (< 50 threshold) - creating single bundle
2025-09-23 14:07:39,328 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:07:39,329 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 72ea284a-5d06-4e00-b0c7-3c068b5aa49d for no_ocr pipeline
2025-09-23 14:07:39,329 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:07:39,329 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['72ea284a-5d06-4e00-b0c7-3c068b5aa49d']
2025-09-23 14:07:39,329 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:07:39,329 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577: ['tika_json', 'docling_json', 'docling_frontmatter_json', 'split_map_json', 'document_outline_hierarchy']
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 17 from split_map_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 17 pages (< 50 threshold) - creating single bundle
2025-09-23 14:07:39,331 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task e65cb5d5-5bad-4a24-be1a-4811e17dece2 for ocr pipeline
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['e65cb5d5-5bad-4a24-be1a-4811e17dece2']
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:07:39,332 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577: ['tika_json', 'docling_json', 'docling_frontmatter_json', 'split_map_json', 'document_outline_hierarchy']
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 17 from split_map_json artefact for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:07:39,334 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 14:07:39,335 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task c42cdaf4-1835-4681-aed0-df04a305fbef for vlm pipeline
2025-09-23 14:07:39,335 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:07:39,335 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 62c14e85-95dd-4edd-aa57-8e8d2f10c577
2025-09-23 14:33:09,631 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,631 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file badf75a1-55a7-40e6-bd23-b51f916bfcf6: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:33:09,631 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:33:09,631 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:33:09,631 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file badf75a1-55a7-40e6-bd23-b51f916bfcf6: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from page_images artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,634 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 385e1909-ac2d-4a47-a57d-b138f2bf77b2 for no_ocr pipeline
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['385e1909-ac2d-4a47-a57d-b138f2bf77b2']
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:33:09,635 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,637 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file badf75a1-55a7-40e6-bd23-b51f916bfcf6: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from page_images artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 32 pages (< 50 threshold) - creating single bundle
2025-09-23 14:33:09,638 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 9c596d15-a710-4a84-8c29-03615c84cd80 for ocr pipeline
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['9c596d15-a710-4a84-8c29-03615c84cd80']
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:33:09,639 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file badf75a1-55a7-40e6-bd23-b51f916bfcf6: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 32 from page_images artefact for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:33:09,642 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 14:33:09,643 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 567f58b4-15e9-445a-b08f-532109b22f2d for vlm pipeline
2025-09-23 14:33:09,643 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:33:09,643 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file badf75a1-55a7-40e6-bd23-b51f916bfcf6
2025-09-23 14:39:19,827 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,827 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 8074842d-f897-4851-8ce5-2800d5640057: ['no_ocr', 'ocr', 'vlm']
2025-09-23 14:39:19,827 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 14:39:19,828 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 14:39:19,828 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 8074842d-f897-4851-8ce5-2800d5640057: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,831 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,832 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,832 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,845 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:39:19,845 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:39:19,846 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 8ae1afb8-7173-401e-b4eb-cf7a761ac6dc for no_ocr pipeline
2025-09-23 14:39:19,846 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 14:39:19,846 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['8ae1afb8-7173-401e-b4eb-cf7a761ac6dc']
2025-09-23 14:39:19,846 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 14:39:19,846 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 8074842d-f897-4851-8ce5-2800d5640057: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,849 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,858 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 1450 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 14:39:19,858 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 10800s)
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 14ec1116-25c7-4492-8b61-006907351790 for ocr pipeline
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['14ec1116-25c7-4492-8b61-006907351790']
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 14:39:19,859 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 8074842d-f897-4851-8ce5-2800d5640057: ['docling_json', 'tika_json', 'docling_frontmatter_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 1450 from split_map_json artefact for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 14:39:19,862 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 14400s)
2025-09-23 14:39:19,863 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 9b9c7ab2-9452-405d-b9fb-90c6c10254e8 for vlm pipeline
2025-09-23 14:39:19,863 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 14:39:19,863 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 8074842d-f897-4851-8ce5-2800d5640057
2025-09-23 15:30:13,342 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,342 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file 9092e579-a3cd-421f-afe4-b6ed2fee512e: ['no_ocr', 'ocr', 'vlm']
2025-09-23 15:30:13,342 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 15:30:13,343 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 15:30:13,343 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9092e579-a3cd-421f-afe4-b6ed2fee512e: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 15:30:13,346 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 15:30:13,347 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task e1891404-74fd-4f78-8eac-35c448ace950 for no_ocr pipeline
2025-09-23 15:30:13,347 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 15:30:13,347 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['e1891404-74fd-4f78-8eac-35c448ace950']
2025-09-23 15:30:13,347 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 15:30:13,347 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9092e579-a3cd-421f-afe4-b6ed2fee512e: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,349 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 4217ecd4-93d2-4eb2-ab0f-7bf665b7a3c1 for ocr pipeline
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['4217ecd4-93d2-4eb2-ab0f-7bf665b7a3c1']
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 15:30:13,350 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file 9092e579-a3cd-421f-afe4-b6ed2fee512e: ['tika_json', 'split_map_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy']
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from split_map_json artefact for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 15:30:13,357 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-09-23 15:30:13,358 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 13760074-9f84-4504-a89f-0d0c211da422 for vlm pipeline
2025-09-23 15:30:13,358 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 15:30:13,358 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file 9092e579-a3cd-421f-afe4-b6ed2fee512e
2025-09-23 16:21:35,870 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 16:21:35,899 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file f60f43c2-1e5c-453b-a643-3c7f785f3c18
2025-09-23 16:21:35,903 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 76be8bf8-6597-4021-a6cf-47915c6be210
2025-09-23 16:21:35,904 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task b496e95d-fed2-4c69-8da0-908d87797d5c
2025-09-23 16:21:35,905 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 2041fa71-7377-49ea-8fc4-0fcd6fa47754
2025-09-23 16:21:35,906 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task ebe6f8eb-0193-4b49-bd56-7aeb5d15f0d5
2025-09-23 16:21:35,907 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 244db018-a5d9-4277-b2b1-384143d43e2e
2025-09-23 16:21:35,907 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file f60f43c2-1e5c-453b-a643-3c7f785f3c18: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 19:28:33,496 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-09-23 19:28:33,506 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file 926df044-3c2b-4732-a451-fc6ec967a87d
2025-09-23 19:28:33,518 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 1f2a0ed0-cc01-477e-a07e-9679c630589b
2025-09-23 19:28:33,523 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 6b09db57-3e34-4f3e-8a0d-a5fa86b8d0f6
2025-09-23 19:28:33,526 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task de20c127-f080-4858-89a0-3b33060b00f6
2025-09-23 19:28:33,529 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task 5a124906-ca65-437b-8c1a-3ea77fe0d252
2025-09-23 19:28:33,531 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 6e10b149-f60a-4ecf-95af-59b6ced39d65
2025-09-23 19:28:33,531 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file 926df044-3c2b-4732-a451-fc6ec967a87d: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 19:29:28,245 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:28,246 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task bbe079be-9d72-4b8b-83f0-879e76d8ebf6
2025-09-23 19:29:28,248 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task 3f2b3cc6-a02f-45a9-800c-9ff9fc13eb1a
2025-09-23 19:29:28,249 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 8a1b178d-4407-48a8-a59c-1015d13de6ed
2025-09-23 19:29:28,250 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task b2893de6-112b-44d3-90e9-eef30614314c
2025-09-23 19:29:28,251 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task b5f7e68d-9b9c-40d7-8a97-0e9a87a9a8ba
2025-09-23 19:29:28,251 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file d1076184-a916-417f-b4fb-26d9ad445e46: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-09-23 19:29:58,652 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,652 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file d1076184-a916-417f-b4fb-26d9ad445e46: ['no_ocr', 'ocr', 'vlm']
2025-09-23 19:29:58,652 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-09-23 19:29:58,653 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-09-23 19:29:58,653 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file d1076184-a916-417f-b4fb-26d9ad445e46: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,720 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,817 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 19:29:58,817 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for no_ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 19:29:58,818 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task 8b4cc78c-7483-44aa-9b66-e9bcf6ccb80c for no_ocr pipeline
2025-09-23 19:29:58,818 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-09-23 19:29:58,818 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['8b4cc78c-7483-44aa-9b66-e9bcf6ccb80c']
2025-09-23 19:29:58,818 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-09-23 19:29:58,818 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file d1076184-a916-417f-b4fb-26d9ad445e46: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,898 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,973 INFO : pipeline_controller.py:_determine_processing_mode:397 >>> Document has 94 pages (>= 50 threshold) with split map - creating section-based bundles
2025-09-23 19:29:58,973 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for ocr pipeline: split_by_sections (timeout: 3600s)
2025-09-23 19:29:58,974 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task a43d0d3d-10b9-4196-adb2-453bd3529908 for ocr pipeline
2025-09-23 19:29:58,974 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-09-23 19:29:58,974 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['a43d0d3d-10b9-4196-adb2-453bd3529908']
2025-09-23 19:29:58,975 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-09-23 19:29:58,975 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-09-23 19:29:58,975 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 5 artefacts for file d1076184-a916-417f-b4fb-26d9ad445e46: ['tika_json', 'docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'split_map_json']
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'tika_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in tika_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'split_map_json' for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 94 from split_map_json artefact for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-09-23 19:29:58,979 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 5640s)
2025-09-23 19:29:58,981 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task cbb4a2dc-c4dc-434d-a93e-3a337498824a for vlm pipeline
2025-09-23 19:29:58,981 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-09-23 19:29:58,981 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file d1076184-a916-417f-b4fb-26d9ad445e46
2025-11-14 15:02:42,627 INFO : pipeline_controller.py:__init__ :66 >>> Pipeline controller initialized with new bundle architecture
2025-11-14 15:02:42,639 INFO : pipeline_controller.py:enqueue_phase1_tasks:77 >>> Phase 1: Starting structure discovery for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:02:42,646 INFO : pipeline_controller.py:enqueue_phase1_tasks:98 >>> Phase 1: Enqueued Tika task 24062318-8097-4a74-be9b-88c9ab0eec01
2025-11-14 15:02:42,647 INFO : pipeline_controller.py:enqueue_phase1_tasks:154 >>> Phase 1: Enqueued frontmatter task e5d8926c-254b-4b70-9ff8-1824c96e6005
2025-11-14 15:02:42,655 INFO : pipeline_controller.py:enqueue_phase1_tasks:178 >>> Phase 1: Enqueued document analysis task 07003057-c83e-4bfd-a71e-80ff0290927c
2025-11-14 15:02:42,657 INFO : pipeline_controller.py:enqueue_phase1_tasks:190 >>> Phase 1: Enqueued split map task c1ecf444-95a3-4730-9e0d-35e574191d2c
2025-11-14 15:02:42,660 INFO : pipeline_controller.py:enqueue_phase1_tasks:209 >>> Phase 1: Enqueued page images task 217e8ac9-4b49-42bd-bcd5-94b565e4d575
2025-11-14 15:02:42,660 INFO : pipeline_controller.py:enqueue_phase1_tasks:214 >>> Phase 1: Enqueued 5 tasks for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a: ['tika', 'frontmatter', 'document_analysis', 'split_map', 'page_images']
2025-11-14 15:03:23,270 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:292 >>> Enqueueing sequential docling pipelines for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,270 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:331 >>> Sequential pipeline order for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a: ['no_ocr', 'ocr', 'vlm']
2025-11-14 15:03:23,270 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:345 >>> Pipeline no_ocr has no dependencies (first pipeline)
2025-11-14 15:03:23,271 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:644 >>> NO_OCR pipeline config: table_mode=fast, formula_enrichment=False, code_enrichment=False
2025-11-14 15:03:23,271 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,278 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-11-14 15:03:23,279 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for no_ocr pipeline: whole_document (timeout: 7200s)
2025-11-14 15:03:23,280 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task 4e1eb64e-e5bf-403e-98d1-8d13285aa82a for no_ocr pipeline
2025-11-14 15:03:23,280 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued no_ocr pipeline with 1 tasks
2025-11-14 15:03:23,280 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline ocr will depend on 1 tasks from no_ocr: ['4e1eb64e-e5bf-403e-98d1-8d13285aa82a']
2025-11-14 15:03:23,280 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:655 >>> OCR pipeline config: table_mode=accurate, formula_enrichment=True, code_enrichment=False
2025-11-14 15:03:23,280 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,282 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_determine_processing_mode:391 >>> Document has 18 pages (< 50 threshold) - creating single bundle
2025-11-14 15:03:23,283 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle task for ocr pipeline: whole_document (timeout: 7200s)
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle task dd9e10c5-00d4-4027-a6b3-7f8731eb2e32 for ocr pipeline
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued ocr pipeline with 1 tasks
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:343 >>> Pipeline vlm will depend on 1 tasks from ocr: ['dd9e10c5-00d4-4027-a6b3-7f8731eb2e32']
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:_enqueue_single_pipeline_with_deps:667 >>> VLM pipeline config: table_mode=accurate, picture_classification=True, picture_description=True
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:_determine_processing_mode:383 >>> BY_PAGE enabled for vlm - creating page-based bundles regardless of document size
2025-11-14 15:03:23,284 INFO : pipeline_controller.py:_get_page_count :459 >>> 🔍 PAGE COUNT: Starting page count detection for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,286 INFO : pipeline_controller.py:_get_page_count :465 >>> 🔍 PAGE COUNT: Found 6 artefacts for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a: ['docling_frontmatter_json', 'docling_json', 'document_outline_hierarchy', 'page_images', 'split_map_json', 'tika_json']
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_frontmatter_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :474 >>> 🔍 PAGE COUNT: Skipping frontmatter artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'docling_json' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :479 >>> 🔍 PAGE COUNT: Skipping frontmatter-derived docling_json artefact (partial page count) for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'document_outline_hierarchy' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :492 >>> 🔍 PAGE COUNT: No page_count in document_outline_hierarchy artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :470 >>> 🔍 PAGE COUNT: Checking artefact type 'page_images' for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_get_page_count :489 >>> ✅ PAGE COUNT: Found page count 18 from page_images artefact for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a
2025-11-14 15:03:23,287 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:761 >>> Enqueuing docling_bundle_split task for vlm pipeline: split_by_pages (timeout: 3600s)
2025-11-14 15:03:23,288 INFO : pipeline_controller.py:_enqueue_bundle_task_with_deps:772 >>> Successfully enqueued docling_bundle_split task d5478341-8c7a-480f-baae-3d7a615a2a9b for vlm pipeline
2025-11-14 15:03:23,288 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:355 >>> Enqueued vlm pipeline with 1 tasks
2025-11-14 15:03:23,288 INFO : pipeline_controller.py:enqueue_sequential_docling_pipelines:358 >>> Successfully enqueued 3 sequential pipelines with 3 total tasks for file e67223d3-3f48-4ed1-bc37-ce72cb5fd05a