# WhisperLive Hybrid Server
This hybrid server extends the original WhisperLive-Server to support both WebSocket connections (for real-time audio streaming) and HTTP endpoints (for file transcription) in a single container.
## Features
- **WebSocket Server**: Original real-time audio transcription functionality
- **HTTP Server**: New file upload and transcription endpoints
- **Single Container**: Both services run in the same Docker container
- **GPU Sharing**: Both services share the same GPU resources
## Architecture
The hybrid server runs two services simultaneously:
1. **WebSocket Server**: Handles real-time audio streaming transcription
2. **HTTP Server**: Handles file uploads and transcription requests
Both services use the same WhisperLive transcriber instance, ensuring efficient resource usage.
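As a rough illustration of this layout, the sketch below runs a threaded HTTP server on a background thread and routes every request through one shared object. `SharedTranscriber`, `make_http_server`, and `start_in_background` are hypothetical names, the lock is an assumption about how access to the model might be serialized, and the placeholder `transcribe` method stands in for real Whisper inference. It is not the server's actual code.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SharedTranscriber:
    """Hypothetical stand-in for the single model instance both services use."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: one inference at a time

    def transcribe(self, audio: bytes) -> str:
        with self._lock:
            # A real server would run Whisper inference here.
            return f"<{len(audio)} bytes transcribed>"


def make_http_server(transcriber, port=0):
    """Build an HTTP server whose handler calls the shared transcriber."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = transcriber.transcribe(self.rfile.read(length)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the sketch quiet
            pass

    # port=0 asks the OS for any free port; a deployment would pass HTTP_PORT.
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)


def start_in_background(server):
    """Run the server's serve_forever loop on a daemon thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

In this sketch the WebSocket service would be started the same way on its own thread and handed the same `SharedTranscriber` object.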
## Ports
- **WebSocket Port**: Default 5050 (configurable via `PORT_WHISPERLIVE`)
- **HTTP Port**: Default 8080 (configurable via `HTTP_PORT`)
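A minimal sketch of how a launcher or client script might resolve these two ports from the environment, falling back to the documented defaults. The helper name `read_port` is hypothetical and the range check is an added safeguard, not something the server is known to do.

```python
import os


def read_port(name: str, default: int, env=os.environ) -> int:
    """Return the port stored in env[name], or default when unset or empty."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{name}={port} is outside the valid port range")
    return port


websocket_port = read_port("PORT_WHISPERLIVE", 5050)
http_port = read_port("HTTP_PORT", 8080)
print(f"WebSocket: {websocket_port}, HTTP: {http_port}")
```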
## HTTP Endpoints
### 1. Health Check
```
GET /health
```
Returns server health status.
**Response:**
```json
{
  "status": "healthy",
  "service": "WhisperLive Hybrid Server"
}
```
### 2. OpenAI Compatible Endpoints
```
POST /v1/audio/transcriptions
POST /v1/audio/translations
```
Fully compatible drop-in replacements for the standard OpenAI Whisper API.
**Parameters:**
- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM, MP4, MPEG, MPGA)
- `model` (optional): Model size (default: "base")
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `prompt` (optional): Text to guide the model's style
- `response_format` (optional): "json", "text", "srt", "verbose_json", "vtt" (default: "json")
- `temperature` (optional): Sampling temperature (0.0 to 1.0)
**Example Request:**
```bash
curl -X POST http://localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav" \
-F "model=whisper-1" \
-F "response_format=json"
```
**Response (JSON format):**
```json
{
  "text": "Hello, this is a test."
}
```
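The same request can be made from Python without third-party packages. The sketch below builds the `multipart/form-data` body by hand and posts it with `urllib`; `build_multipart` and `transcribe_openai_style` are hypothetical helper names, and the form fields mirror the curl example above. It assumes the server is reachable at `http://localhost:8080` and returns the JSON shape shown above.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode form fields plus one part named "file" as multipart/form-data.

    Returns a (body, content_type) pair.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_openai_style(path: str, base_url: str = "http://localhost:8080") -> str:
    """POST a file to /v1/audio/transcriptions and return the "text" field."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": "whisper-1", "response_format": "json"},
            filename=os.path.basename(path),
            file_bytes=f.read(),
        )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Calling `transcribe_openai_style("audio.wav")` against a running server should return the transcribed text as a string.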
### 3. Legacy File Transcription
```
POST /transcribe
```
Transcribes an uploaded audio file.
**Parameters:**
- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM)
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `task` (optional): "transcribe" or "translate" (default: "transcribe")
- `model` (optional): Model size (default: "base")
**Example Request:**
```bash
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav" \
-F "language=en" \
-F "task=transcribe" \
-F "model=base"
```
**Response:**
```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test.",
      "no_speech_prob": 0.1
    }
  ],
  "info": {
    "language": "en",
    "language_probability": 0.95,
    "duration": 10.5,
    "duration_after_vad": 10.5,
    "transcription_options": {}
  },
  "filename": "audio.wav"
}
```
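Clients that only need plain text can flatten the `segments` array. The helper below is a minimal sketch: `segments_to_text` is a hypothetical name, and the 0.6 cutoff on `no_speech_prob` is an illustrative choice, not a value taken from the server.

```python
def segments_to_text(result: dict, max_no_speech_prob: float = 0.6) -> str:
    """Join segment texts in order, skipping segments that look like silence."""
    if not result.get("success"):
        raise ValueError("transcription response did not report success")
    kept = [
        seg["text"].strip()
        for seg in result.get("segments", [])
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]
    return " ".join(kept)


# Trimmed copy of the example response above
example = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Hello, this is a test.", "no_speech_prob": 0.1}
    ],
}
print(segments_to_text(example))  # Hello, this is a test.
```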
### 4. URL Transcription (Placeholder)
```
POST /transcribe/url
```
Placeholder endpoint for transcribing audio from a URL. The route is reserved, but the transcription logic is not implemented yet.
## Usage Examples
### Python Client
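```python
import requests

# Transcribe a file via the legacy endpoint
with open('audio.wav', 'rb') as f:
    response = requests.post(
        'http://localhost:8080/transcribe',
        files={'file': f},
        data={'language': 'en', 'model': 'base'},
    )

if response.status_code == 200:
    result = response.json()
    # Join the text of each returned segment
    text = ' '.join(seg['text'].strip() for seg in result['segments'])
    print(f"Transcription: {text}")
else:
    print(f"Request failed: {response.status_code} {response.text}")
```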
### JavaScript/Node.js
```javascript
// Requires Node.js 18+, which provides fetch, FormData, and Blob as globals.
const fs = require('fs');

const form = new FormData();
// Wrap the file contents in a Blob so the built-in FormData can send it
form.append('file', new Blob([fs.readFileSync('audio.wav')]), 'audio.wav');
form.append('language', 'en');
form.append('model', 'base');

fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: form
})
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(err => console.error(err));
```
### cURL
```bash
# Basic transcription
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav"
# With parameters
curl -X POST http://localhost:8080/transcribe \
-F "file=@audio.wav" \
-F "language=es" \
-F "task=translate" \
-F "model=small"
```
## Configuration
### Environment Variables
- `PORT_WHISPERLIVE`: WebSocket port (default: 5050)
- `HTTP_PORT`: HTTP port (default: 8080)
- `FASTERWHISPER_MODEL`: Custom model path
- `OMP_NUM_THREADS`: OpenMP thread count
### Docker Compose
```yaml
services:
  whisperlive:
    ports:
      - "5050:5050" # WebSocket
      - "8080:8080" # HTTP
    environment:
      PORT_WHISPERLIVE: 5050
      HTTP_PORT: 8080
```
## Testing
### 1. Test Script
Run the Python test script:
```bash
python3 test_http_endpoints.py
```
### 2. Web Interface
Open `test_form.html` in a web browser to test the HTTP endpoints with a user-friendly interface.
### 3. Health Check
```bash
curl http://localhost:8080/health
```
## Backend Support
Currently, the HTTP endpoints support:
- **faster_whisper**: Full support for all features
- **tensorrt**: Basic support (needs adaptation)
- **openvino**: Basic support (needs adaptation)
## File Size Limits
- Maximum file size: 100MB
- Supported formats: WAV, MP3, FLAC, M4A, OGG, WEBM
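A client can check these limits before uploading and avoid a wasted request. The sketch below is one way to do that; `check_upload` is a hypothetical helper, and it assumes "100MB" means 100 × 1024 × 1024 bytes, since the server's exact threshold is not stated here.

```python
import os

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumption: 100MB read as 100 MiB
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm"}


def check_upload(path: str) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than the upload limit")
    return problems
```

The check looks only at the file extension and size, so a corrupted or mislabeled file can still be rejected by the server.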
## Performance Considerations
- File transcription uses the same model instance as WebSocket connections
- Temporary files are automatically cleaned up after processing
- Both services share GPU memory efficiently
- HTTP requests are processed in separate threads
## Troubleshooting
### Common Issues
1. **Port Already in Use**
   - Check if ports 5050 or 8080 are available
   - Use different ports via environment variables
2. **File Upload Errors**
   - Ensure file size is under 100MB
   - Check file format is supported
   - Verify file is not corrupted
3. **GPU Memory Issues**
   - Monitor GPU memory usage
   - Consider using smaller model sizes
   - Restart container if needed
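For the port check in item 1, a quick test from Python is to try binding the port on the host. This is a minimal sketch with a hypothetical helper name, `port_is_free`; a failed bind usually means another process holds the port, though a socket still in `TIME_WAIT` can also cause it.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


for name, port in (("WebSocket", 5050), ("HTTP", 8080)):
    state = "free" if port_is_free(port) else "in use"
    print(f"{name} port {port}: {state}")
```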
### Logs
Check container logs for detailed error information:
```bash
docker logs whisperlive
```
## Migration from Original Server
The hybrid server is fully backward compatible. Your existing WebSocket clients will continue to work without changes. The HTTP endpoints are additional functionality that doesn't interfere with the original service.
## Future Enhancements
- [ ] Support for more audio formats
- [ ] Batch file processing
- [ ] Progress tracking for long files
- [ ] Authentication and rate limiting
- [ ] WebSocket support for file transcription progress