# WhisperLive Hybrid Server

This hybrid server extends the original WhisperLive-Server to support both WebSocket connections (for real-time audio streaming) and HTTP endpoints (for file transcription) in a single container.

## Features

- **WebSocket Server**: Original real-time audio transcription functionality
- **HTTP Server**: New file upload and transcription endpoints
- **Single Container**: Both services run in the same Docker container
- **GPU Sharing**: Both services share the same GPU resources

## Architecture

The hybrid server runs two services simultaneously:

1. **WebSocket Server**: Handles real-time audio streaming transcription
2. **HTTP Server**: Handles file uploads and transcription requests

Both services use the same WhisperLive transcriber instance, ensuring efficient resource usage.

## Ports

- **WebSocket Port**: Default 5050 (configurable via `PORT_WHISPERLIVE`)
- **HTTP Port**: Default 8080 (configurable via `HTTP_PORT`)

## HTTP Endpoints

### 1. Health Check

```
GET /health
```

Returns server health status.

**Response:**

```json
{
  "status": "healthy",
  "service": "WhisperLive Hybrid Server"
}
```

### 2. OpenAI Compatible Endpoints

```
POST /v1/audio/transcriptions
POST /v1/audio/translations
```

Fully compatible drop-in replacements for the standard OpenAI Whisper API.

**Parameters:**

- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM, MP4, MPEG, MPGA)
- `model` (optional): Model size (default: "base")
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `prompt` (optional): Text to guide the model's style
- `response_format` (optional): "json", "text", "srt", "verbose_json", "vtt" (default: "json")
- `temperature` (optional): Sampling temperature (0.0 to 1.0)

**Example Request:**

```bash
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "response_format=json"
```

**Response (JSON format):**

```json
{
  "text": "Hello, this is a test."
}
```

### 3. Legacy File Transcription

```
POST /transcribe
```

Transcribes an uploaded audio file.

**Parameters:**

- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM)
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `task` (optional): "transcribe" or "translate" (default: "transcribe")
- `model` (optional): Model size (default: "base")

**Example Request:**

```bash
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "task=transcribe" \
  -F "model=base"
```

**Response:**

```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test.",
      "no_speech_prob": 0.1
    }
  ],
  "info": {
    "language": "en",
    "language_probability": 0.95,
    "duration": 10.5,
    "duration_after_vad": 10.5,
    "transcription_options": {}
  },
  "filename": "audio.wav"
}
```

### 4. URL Transcription (Placeholder)

```
POST /transcribe/url
```

Placeholder endpoint for transcribing audio from URLs (not yet implemented).
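
The `/transcribe` response shown earlier returns a list of timed segments rather than a single string. Below is a minimal sketch of flattening that payload into plain text on the client side. The helper name `segments_to_text` and its `max_no_speech_prob` threshold are illustrative assumptions, not part of the server; the field names come from the sample response in this document.

```python
from typing import Any, Dict


def segments_to_text(result: Dict[str, Any], max_no_speech_prob: float = 1.0) -> str:
    """Join the segment texts of a /transcribe response into one string.

    Hypothetical client-side helper: segments whose no_speech_prob is above
    max_no_speech_prob are skipped (the default of 1.0 keeps everything).
    """
    if not result.get("success", False):
        raise ValueError("transcription response did not report success")

    parts = []
    for segment in result.get("segments", []):
        if segment.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue  # likely silence or noise, per the chosen threshold
        text = segment.get("text", "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Sample payload copied from the response example in this document.
sample = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Hello, this is a test.", "no_speech_prob": 0.1}
    ],
    "info": {"language": "en", "language_probability": 0.95},
    "filename": "audio.wav",
}

print(segments_to_text(sample))
```

In a real client, `result` would be the parsed JSON body of the HTTP response, for example `response.json()` when using `requests`.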

## Usage Examples

### Python Client

```python
import requests

# Transcribe a file
with open('audio.wav', 'rb') as f:
    response = requests.post('http://localhost:8080/transcribe',
                             files={'file': f},
                             data={'language': 'en', 'model': 'base'})

if response.status_code == 200:
    result = response.json()
    print(f"Transcription: {result['segments']}")
else:
    # Surface the HTTP status and body so failures are not silent
    print(f"Request failed ({response.status_code}): {response.text}")
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
// node-fetch v2 (CommonJS) accepts form-data streams and sets the multipart headers
const fetch = require('node-fetch');

const form = new FormData();
form.append('file', fs.createReadStream('audio.wav'));
form.append('language', 'en');
form.append('model', 'base');

fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: form
})
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(error => console.error(error));
```

### cURL

```bash
# Basic transcription
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav"

# With parameters
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav" \
  -F "language=es" \
  -F "task=translate" \
  -F "model=small"
```

## Configuration

### Environment Variables

- `PORT_WHISPERLIVE`: WebSocket port (default: 5050)
- `HTTP_PORT`: HTTP port (default: 8080)
- `FASTERWHISPER_MODEL`: Custom model path
- `OMP_NUM_THREADS`: OpenMP thread count

### Docker Compose

```yaml
services:
  whisperlive:
    ports:
      - "5050:5050"  # WebSocket
      - "8080:8080"  # HTTP
    environment:
      PORT_WHISPERLIVE: 5050
      HTTP_PORT: 8080
```

## Testing

### 1. Test Script

Run the Python test script:

```bash
python3 test_http_endpoints.py
```

### 2. Web Interface

Open `test_form.html` in a web browser to test the HTTP endpoints with a user-friendly interface.

### 3. Health Check

```bash
curl http://localhost:8080/health
```

## Backend Support

Currently, the HTTP endpoints support:

- **faster_whisper**: Full support for all features
- **tensorrt**: Basic support (needs adaptation)
- **openvino**: Basic support (needs adaptation)

## File Size Limits

- Maximum file size: 100MB
- Supported formats: WAV, MP3, FLAC, M4A, OGG, WEBM

## Performance Considerations

- File transcription uses the same model instance as WebSocket connections
- Temporary files are automatically cleaned up after processing
- Both services share GPU memory efficiently
- HTTP requests are processed in separate threads

## Troubleshooting

### Common Issues

1. **Port Already in Use**
   - Check if ports 5050 or 8080 are available
   - Use different ports via environment variables

2. **File Upload Errors**
   - Ensure file size is under 100MB
   - Check file format is supported
   - Verify file is not corrupted

3. **GPU Memory Issues**
   - Monitor GPU memory usage
   - Consider using smaller model sizes
   - Restart container if needed

### Logs

Check container logs for detailed error information:

```bash
docker logs whisperlive
```

## Migration from Original Server

The hybrid server is fully backward compatible. Your existing WebSocket clients will continue to work without changes. The HTTP endpoints are additional functionality that doesn't interfere with the original service.

## Future Enhancements

- [ ] Support for more audio formats
- [ ] Batch file processing
- [ ] Progress tracking for long files
- [ ] Authentication and rate limiting
- [ ] WebSocket support for file transcription progress
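
## Client-Side Upload Check

The limits listed under File Size Limits (100MB, six supported formats) can be checked before a file is sent, which avoids a wasted upload. The sketch below is one possible client-side check, not something the server provides: the function name `check_upload` is hypothetical, and treating 100MB as 100 * 1024 * 1024 bytes is an assumption, since this document does not say how the server counts.

```python
import os
from typing import List

# Taken from the File Size Limits section; the exact byte count is an assumption.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm"}


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []

    # Compare the extension case-insensitively against the documented formats.
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")

    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 100MB limit")

    return problems


print(check_upload("audio.wav", 5_000_000))
print(check_upload("notes.txt", 5_000_000))
```

For a file on disk, the size argument can come from `os.path.getsize(path)`. The check only looks at the file name and size, so a corrupted file with a valid extension will still pass it.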