WhisperLive Hybrid Server
This hybrid server extends the original WhisperLive-Server to support both WebSocket connections (for real-time audio streaming) and HTTP endpoints (for file transcription) in a single container.
Features
- WebSocket Server: Original real-time audio transcription functionality
- HTTP Server: New file upload and transcription endpoints
- Single Container: Both services run in the same Docker container
- GPU Sharing: Both services share the same GPU resources
Architecture
The hybrid server runs two services simultaneously:
- WebSocket Server: Handles real-time audio streaming transcription
- HTTP Server: Handles file uploads and transcription requests
Both services use the same WhisperLive transcriber instance, ensuring efficient resource usage.
Ports
- WebSocket Port: Default 5050 (configurable via PORT_WHISPERLIVE)
- HTTP Port: Default 8080 (configurable via HTTP_PORT)
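A client or helper script can resolve the two ports the same way, falling back to the documented defaults when the variables are unset. This is a minimal sketch: `resolve_ports` and `base_urls` are illustrative names, not part of the server, and the check that the two ports differ is an assumption about sensible configuration rather than documented behavior.

```python
import os

# Defaults taken from the port list above.
DEFAULT_WS_PORT = 5050
DEFAULT_HTTP_PORT = 8080


def resolve_ports(env=None):
    """Return (websocket_port, http_port) read from the environment, with defaults."""
    env = os.environ if env is None else env
    ws_port = int(env.get("PORT_WHISPERLIVE", DEFAULT_WS_PORT))
    http_port = int(env.get("HTTP_PORT", DEFAULT_HTTP_PORT))
    if ws_port == http_port:
        # Assumption: both services cannot listen on the same port.
        raise ValueError("WebSocket and HTTP ports must differ")
    return ws_port, http_port


def base_urls(host="localhost", env=None):
    """Build the WebSocket and HTTP base URLs for a given host."""
    ws_port, http_port = resolve_ports(env)
    return f"ws://{host}:{ws_port}", f"http://{host}:{http_port}"
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.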
HTTP Endpoints
1. Health Check
GET /health
Returns server health status.
Response:
{
  "status": "healthy",
  "service": "WhisperLive Hybrid Server"
}
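The health endpoint lends itself to a small readiness probe, for example before sending the first transcription request. The sketch below uses only the standard library and assumes the response shape shown above; `is_healthy` and `check_health` are illustrative names, and the default base URL assumes the default HTTP port.

```python
import json
import urllib.error
import urllib.request


def is_healthy(payload):
    """True if a parsed /health response reports a healthy server."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url="http://localhost:8080", timeout=5.0):
    """Query GET /health; return False on any connection or parsing error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors and invalid JSON.
        return False
```

Calling `check_health()` in a retry loop with a short sleep is one way to wait for the container to finish loading the model.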
2. OpenAI Compatible Endpoints
POST /v1/audio/transcriptions
POST /v1/audio/translations
Fully compatible drop-in replacements for the standard OpenAI Whisper API.
Parameters:
- file (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM, MP4, MPEG, MPGA)
- model (optional): Model size (default: "base")
- language (optional): Language code (e.g., "en", "es", "fr")
- prompt (optional): Text to guide the model's style
- response_format (optional): "json", "text", "srt", "verbose_json", "vtt" (default: "json")
- temperature (optional): Sampling temperature (0.0 to 1.0)
Example Request:
curl -X POST http://localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav" \
-F "model=whisper-1" \
-F "response_format=json"
Response (JSON format):
{
  "text": "Hello, this is a test."
}
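Because several of these parameters have a restricted set of values, it can help to validate them on the client before uploading a large file. The following sketch mirrors the parameter list above; `build_form_fields` is an illustrative helper, not part of the server, and whether the server itself rejects out-of-range values is not stated here.

```python
# Allowed values copied from the parameter list above.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(model="base", language=None, prompt=None,
                      response_format="json", temperature=None):
    """Validate the optional parameters and return them as form fields."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"model": model, "response_format": response_format}
    # Only send optional fields that were actually provided.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```

The returned dict can be passed as `data=` to `requests.post`, with the audio file supplied separately through `files={'file': f}`.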
3. Legacy File Transcription
POST /transcribe
Transcribes an uploaded audio file.
Parameters:
- file (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM)
- language (optional): Language code (e.g., "en", "es", "fr")
- task (optional): "transcribe" or "translate" (default: "transcribe")
- model (optional): Model size (default: "base")
Example Request:
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav" \
-F "language=en" \
-F "task=transcribe" \
-F "model=base"
Response:
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test.",
      "no_speech_prob": 0.1
    }
  ],
  "info": {
    "language": "en",
    "language_probability": 0.95,
    "duration": 10.5,
    "duration_after_vad": 10.5,
    "transcription_options": {}
  },
  "filename": "audio.wav"
}
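Since this response returns timed segments rather than a single string, clients usually need a small post-processing step. The sketch below assumes the response shape shown above and shows two common conversions: joining the segments into plain text and rendering them as SRT subtitles. The function names are illustrative.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def full_text(result):
    """Join the text of all segments into one transcript string."""
    return " ".join(seg["text"].strip() for seg in result.get("segments", []))


def to_srt(result):
    """Render the segments of a /transcribe response as SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

For the sample response above, `full_text` returns the single sentence and `to_srt` yields one subtitle block spanning 0.0 to 2.5 seconds.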
4. URL Transcription (Placeholder)
POST /transcribe/url
Placeholder endpoint for transcribing audio from a URL. The route exists, but the transcription logic is not implemented yet.
Usage Examples
Python Client
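import requests

# Transcribe a file
with open('audio.wav', 'rb') as f:
    response = requests.post(
        'http://localhost:8080/transcribe',
        files={'file': f},
        data={'language': 'en', 'model': 'base'},
    )

if response.status_code == 200:
    result = response.json()
    # Join the text of all returned segments into one transcript
    text = ' '.join(seg['text'].strip() for seg in result['segments'])
    print(f"Transcription: {text}")
else:
    print(f"Request failed: {response.status_code} {response.text}")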
JavaScript/Node.js
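const FormData = require('form-data');
const fs = require('fs');
// Assumes node-fetch v2 (CommonJS), which accepts form-data streams as a body
const fetch = require('node-fetch');

const form = new FormData();
form.append('file', fs.createReadStream('audio.wav'));
form.append('language', 'en');
form.append('model', 'base');

fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: form,
  headers: form.getHeaders() // multipart Content-Type including the boundary
})
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(err => console.error(err));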
cURL
# Basic transcription
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav"
# With parameters
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav" \
-F "language=es" \
-F "task=translate" \
-F "model=small"
Configuration
Environment Variables
- PORT_WHISPERLIVE: WebSocket port (default: 5050)
- HTTP_PORT: HTTP port (default: 8080)
- FASTERWHISPER_MODEL: Custom model path
- OMP_NUM_THREADS: OpenMP thread count
Docker Compose
services:
  whisperlive:
    ports:
      - "5050:5050" # WebSocket
      - "8080:8080" # HTTP
    environment:
      PORT_WHISPERLIVE: 5050
      HTTP_PORT: 8080
Testing
1. Test Script
Run the Python test script:
python3 test_http_endpoints.py
2. Web Interface
Open test_form.html in a web browser to test the HTTP endpoints with a user-friendly interface.
3. Health Check
curl http://localhost:8080/health
Backend Support
Currently, the HTTP endpoints support:
- faster_whisper: Full support for all features
- tensorrt: Basic support (needs adaptation)
- openvino: Basic support (needs adaptation)
File Size Limits
- Maximum file size: 100MB
- Supported formats: WAV, MP3, FLAC, M4A, OGG, WEBM
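These limits can be checked on the client before an upload is attempted, which avoids sending a large file only to have it rejected. The sketch below is illustrative: it assumes that "100MB" means 100 × 1024 × 1024 bytes and that the format is judged by file extension, neither of which is confirmed here.

```python
import os

# Assumption: the 100MB limit is interpreted as 100 MiB.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024
# Extensions corresponding to the supported formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm"}


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    return problems


def check_upload_path(path):
    """Run the same checks against a file on disk."""
    return check_upload(os.path.basename(path), os.path.getsize(path))
```

A caller would typically run `check_upload_path('audio.wav')` and skip the request if the returned list is not empty.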
Performance Considerations
- File transcription uses the same model instance as WebSocket connections
- Temporary files are automatically cleaned up after processing
- Both services share GPU memory efficiently
- HTTP requests are processed in separate threads
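The points above describe a common pattern: one model object shared by several request threads, with each upload written to a temporary file that is removed afterwards. The sketch below illustrates that pattern only and is not the server's actual code. `SharedTranscriber` and `FakeModel` are hypothetical names, and serializing access with a lock is an assumption about how a shared instance might be protected.

```python
import os
import tempfile
import threading


class FakeModel:
    """Stand-in for a real model; reports the size of the file it was given."""

    def transcribe(self, path):
        return {"path": path, "bytes": os.path.getsize(path)}


class SharedTranscriber:
    """Wraps one model object so that concurrent callers take turns using it."""

    def __init__(self, model):
        self._model = model            # hypothetical: any object with .transcribe(path)
        self._lock = threading.Lock()  # assumption: model access is serialized

    def transcribe_bytes(self, audio_bytes, suffix=".wav"):
        # Write the upload to a temporary file the model can read from disk.
        fd, path = tempfile.mkstemp(suffix=suffix)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(audio_bytes)
            with self._lock:
                return self._model.transcribe(path)
        finally:
            # Clean up even if transcription raises.
            os.remove(path)
```

With this design, WebSocket and HTTP handlers could hold a reference to the same `SharedTranscriber`, so only one copy of the model would need to be loaded.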
Troubleshooting
Common Issues
- Port Already in Use
  - Check if ports 5050 or 8080 are available
  - Use different ports via environment variables
- File Upload Errors
  - Ensure file size is under 100MB
  - Check file format is supported
  - Verify file is not corrupted
- GPU Memory Issues
  - Monitor GPU memory usage
  - Consider using smaller model sizes
  - Restart container if needed
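For the port conflict case, a quick way to see whether a port is taken is to try binding to it. The helper below is a generic standard-library sketch, not a tool shipped with the server; `port_is_free` is an illustrative name.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
    return True
```

Running `port_is_free(5050)` and `port_is_free(8080)` on the host before starting the container shows which default port, if any, needs to be changed through `PORT_WHISPERLIVE` or `HTTP_PORT`.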
Logs
Check container logs for detailed error information:
docker logs whisperlive
Migration from Original Server
The hybrid server is fully backward compatible. Your existing WebSocket clients will continue to work without changes. The HTTP endpoints are additional functionality that doesn't interfere with the original service.
Future Enhancements
- Support for more audio formats
- Batch file processing
- Progress tracking for long files
- Authentication and rate limiting
- WebSocket support for file transcription progress