# WhisperLive Hybrid Server

This hybrid server extends the original WhisperLive-Server to support both WebSocket connections (for real-time audio streaming) and HTTP endpoints (for file transcription) in a single container.

## Features

- **WebSocket Server**: Original real-time audio transcription functionality
- **HTTP Server**: New file upload and transcription endpoints
- **Single Container**: Both services run in the same Docker container
- **GPU Sharing**: Both services share the same GPU resources

## Architecture

The hybrid server runs two services simultaneously:
1. **WebSocket Server**: Handles real-time audio streaming transcription
2. **HTTP Server**: Handles file uploads and transcription requests

Both services use the same WhisperLive transcriber instance, so the model is loaded once rather than once per service.

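One way to picture the shared-instance design is the minimal sketch below. It is not the server's actual code: `SharedTranscriber` and `run_service` are hypothetical names, the transcription is faked, and the lock is an assumption about how concurrent access to a single model could be serialized.

```python
import threading


class SharedTranscriber:
    """Hypothetical stand-in for the single model instance both services use."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: one request uses the model at a time
        self.calls = 0

    def transcribe(self, audio: bytes) -> str:
        with self._lock:
            self.calls += 1
            # A real implementation would run Whisper inference here.
            return f"<{len(audio)} bytes transcribed>"


def run_service(name, transcriber, chunks, results):
    """Simulate one service (WebSocket or HTTP) feeding audio to the shared model."""
    results[name] = [transcriber.transcribe(chunk) for chunk in chunks]


transcriber = SharedTranscriber()
results = {}
threads = [
    # Streaming side: several small chunks.
    threading.Thread(target=run_service,
                     args=("websocket", transcriber, [b"\x00" * 320] * 3, results)),
    # File side: one larger upload.
    threading.Thread(target=run_service,
                     args=("http", transcriber, [b"\x00" * 1600], results)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(transcriber.calls, sorted(results))
```

Both threads hold a reference to the same object, so only one copy of the model would live in GPU memory; the trade-off is that a long file transcription can delay streaming requests while it holds the lock.
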
## Ports

- **WebSocket Port**: Default 5050 (configurable via `PORT_WHISPERLIVE`)
- **HTTP Port**: Default 8080 (configurable via `HTTP_PORT`)

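The defaults above could be resolved along these lines. This is an illustrative sketch only: `resolve_ports` is a hypothetical helper, and the check for identical ports is an added assumption rather than documented server behavior.

```python
import os


def resolve_ports(env=None):
    """Return (websocket_port, http_port), falling back to the documented defaults."""
    env = os.environ if env is None else env
    ws_port = int(env.get("PORT_WHISPERLIVE", "5050"))
    http_port = int(env.get("HTTP_PORT", "8080"))
    if ws_port == http_port:
        # Assumption: the two services cannot listen on the same port.
        raise ValueError("PORT_WHISPERLIVE and HTTP_PORT must differ")
    return ws_port, http_port


print(resolve_ports({}))                     # documented defaults
print(resolve_ports({"HTTP_PORT": "9000"}))  # only the HTTP port overridden
```
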
## HTTP Endpoints

### 1. Health Check
```
GET /health
```
Returns server health status.

**Response:**
```json
{
  "status": "healthy",
  "service": "WhisperLive Hybrid Server"
}
```

### 2. OpenAI-Compatible Endpoints
```
POST /v1/audio/transcriptions
POST /v1/audio/translations
```
Drop-in replacements for the corresponding endpoints of the standard OpenAI Whisper API.

**Parameters:**
- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM, MP4, MPEG, MPGA)
- `model` (optional): Model size (default: "base")
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `prompt` (optional): Text to guide the model's style
- `response_format` (optional): "json", "text", "srt", "verbose_json", "vtt" (default: "json")
- `temperature` (optional): Sampling temperature (0.0 to 1.0)

**Example Request:**
```bash
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "response_format=json"
```

**Response (JSON format):**
```json
{
  "text": "Hello, this is a test."
}
```

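For clients that cannot use curl, the same request can be assembled with the Python standard library. The sketch below is illustrative: `build_multipart` and `transcribe_openai_style` are hypothetical helpers, the base URL assumes the default HTTP port, and the generic `application/octet-stream` part type is an assumption about what the server accepts.

```python
import io
import json
import os
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe_openai_style(path, base_url="http://localhost:8080", **fields):
    """POST a file to /v1/audio/transcriptions and return the `text` field."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(fields, os.path.basename(path), f.read())
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

A call such as `transcribe_openai_style("audio.wav", response_format="json", language="en")` would then return the transcribed text, assuming the server is running and returns the JSON shape shown above.
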
### 3. Legacy File Transcription
```
POST /transcribe
```
Transcribes an uploaded audio file.

**Parameters:**
- `file` (required): Audio file (WAV, MP3, FLAC, M4A, OGG, WEBM)
- `language` (optional): Language code (e.g., "en", "es", "fr")
- `task` (optional): "transcribe" or "translate" (default: "transcribe")
- `model` (optional): Model size (default: "base")

**Example Request:**
```bash
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "task=transcribe" \
  -F "model=base"
```

**Response:**
```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test.",
      "no_speech_prob": 0.1
    }
  ],
  "info": {
    "language": "en",
    "language_probability": 0.95,
    "duration": 10.5,
    "duration_after_vad": 10.5,
    "transcription_options": {}
  },
  "filename": "audio.wav"
}
```

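Because the legacy response returns timed segments rather than a single string, clients usually post-process it. The following sketch turns the response shape shown above into SRT subtitles; `format_timestamp` and `segments_to_srt` are hypothetical helpers, and the `0.6` cut-off for `no_speech_prob` is an arbitrary assumption, not a server default.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result, max_no_speech_prob=0.6):
    """Build SRT text from a /transcribe response, skipping likely-silent segments."""
    kept = [
        seg for seg in result.get("segments", [])
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]
    blocks = []
    for index, seg in enumerate(kept, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


example = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Hello, this is a test.", "no_speech_prob": 0.1}
    ],
}
print(segments_to_srt(example))
```
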
### 4. URL Transcription (Placeholder)
```
POST /transcribe/url
```
Placeholder endpoint for transcribing audio from a URL (not yet implemented).

## Usage Examples

### Python Client
```python
import requests

# Transcribe a file
with open('audio.wav', 'rb') as f:
    response = requests.post('http://localhost:8080/transcribe',
                             files={'file': f},
                             data={'language': 'en', 'model': 'base'})

if response.status_code == 200:
    result = response.json()
    print(f"Transcription: {result['segments']}")
```

### JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
// Assumes node-fetch v2, which understands `form-data` bodies
// and sets the multipart headers for them.
const fetch = require('node-fetch');

// Build the multipart form: the audio file plus optional parameters
const form = new FormData();
form.append('file', fs.createReadStream('audio.wav'));
form.append('language', 'en');
form.append('model', 'base');

fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: form
})
  .then(response => response.json())
  .then(result => console.log(result));
```

### cURL
```bash
# Basic transcription
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav"

# With parameters
curl -X POST http://localhost:8080/transcribe \
  -F "file=@audio.wav" \
  -F "language=es" \
  -F "task=translate" \
  -F "model=small"
```

## Configuration

### Environment Variables
- `PORT_WHISPERLIVE`: WebSocket port (default: 5050)
- `HTTP_PORT`: HTTP port (default: 8080)
- `FASTERWHISPER_MODEL`: Custom model path
- `OMP_NUM_THREADS`: OpenMP thread count

### Docker Compose
```yaml
services:
  whisperlive:
    ports:
      - "5050:5050"  # WebSocket
      - "8080:8080"  # HTTP
    environment:
      PORT_WHISPERLIVE: 5050
      HTTP_PORT: 8080
```

## Testing

### 1. Test Script
Run the Python test script:
```bash
python3 test_http_endpoints.py
```

### 2. Web Interface
Open `test_form.html` in a web browser to test the HTTP endpoints with a user-friendly interface.

### 3. Health Check
```bash
curl http://localhost:8080/health
```

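In scripts it can be useful to wait for the health check to pass before sending files, since the container may need time to start. The sketch below polls `/health` until it reports `"healthy"`; `fetch_health` and `wait_until_healthy` are hypothetical helpers, and the attempt count and delay are arbitrary assumptions.

```python
import json
import time
import urllib.request


def fetch_health(url):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read())


def wait_until_healthy(url="http://localhost:8080/health",
                       attempts=30, delay=1.0, fetch=fetch_health):
    """Poll until the server reports status "healthy"; return False on giving up."""
    for _ in range(attempts):
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts;
            # ValueError covers a body that is not valid JSON.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with no arguments targets the default HTTP port; the `fetch` parameter exists so the polling logic can be exercised without a running server.
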
## Backend Support

Currently, the HTTP endpoints support:
- **faster_whisper**: Full support for all features
- **tensorrt**: Basic support (needs adaptation)
- **openvino**: Basic support (needs adaptation)

## File Size Limits

- Maximum file size: 100MB
- Supported formats: WAV, MP3, FLAC, M4A, OGG, WEBM

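A client can check these limits before uploading and avoid a wasted transfer. This is a sketch under stated assumptions: `check_upload` is a hypothetical helper, "100MB" is interpreted as 100 MiB, and the format is judged by file extension only, which may not match how the server validates uploads.

```python
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB
SUPPORTED = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm"}


def check_upload(path, size=None):
    """Return a list of problems; an empty list means the file looks uploadable."""
    file_path = Path(path)
    problems = []
    if file_path.suffix.lower() not in SUPPORTED:
        problems.append(f"unsupported format: {file_path.suffix or '(none)'}")
    # `size` can be passed explicitly, e.g. when the data is not on disk.
    size = file_path.stat().st_size if size is None else size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    return problems


print(check_upload("audio.wav", size=1024))
```
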
## Performance Considerations

- File transcription uses the same model instance as WebSocket connections
- Temporary files are automatically cleaned up after processing
- Both services share GPU memory efficiently
- HTTP requests are processed in separate threads

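The temporary-file cleanup mentioned above typically follows a write, process, delete pattern. The sketch below shows one way to guarantee deletion even when processing fails; `with_temp_upload` is a hypothetical helper and not the server's actual implementation.

```python
import os
import tempfile


def with_temp_upload(data, suffix, handler):
    """Write uploaded bytes to a temp file, run handler(path), always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        # Runs on success and on error, so failed requests leave nothing behind.
        os.remove(path)


seen_paths = []


def fake_transcribe(path):
    """Stand-in for the real transcription call: records the path, returns the size."""
    seen_paths.append(path)
    return os.path.getsize(path)


print(with_temp_upload(b"abc", ".wav", fake_transcribe))
```
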
## Troubleshooting

### Common Issues

1. **Port Already in Use**
   - Check whether ports 5050 and 8080 are free
   - Use different ports via the `PORT_WHISPERLIVE` and `HTTP_PORT` environment variables

2. **File Upload Errors**
   - Ensure the file size is under 100MB
   - Check that the file format is supported
   - Verify that the file is not corrupted

3. **GPU Memory Issues**
   - Monitor GPU memory usage
   - Consider using smaller model sizes
   - Restart the container if needed

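For the first issue, a quick way to see whether a port is already taken on the host is to try binding to it. This is a minimal sketch: `port_is_free` is a hypothetical helper, and it checks the machine it runs on, not the inside of the container.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
        return True


for port in (5050, 8080):
    print(port, "free" if port_is_free(port) else "in use")
```
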
### Logs
Check container logs for detailed error information:
```bash
docker logs whisperlive
```

## Migration from Original Server

The hybrid server is fully backward compatible. Your existing WebSocket clients will continue to work without changes. The HTTP endpoints are additional functionality that doesn't interfere with the original service.

## Future Enhancements

- [ ] Support for more audio formats
- [ ] Batch file processing
- [ ] Progress tracking for long files
- [ ] Authentication and rate limiting
- [ ] WebSocket support for file transcription progress