Audio/video file optimization service for Groq Whisper speech-to-text processing. Converts any audio/video file to an enhanced, optimized format for maximum Whisper transcription accuracy.
Base URL: https://video-service.buy-it.gr
Authentication: every endpoint except /health requires your API key in the X-API-Key header.
Each file goes through a multi-stage audio enhancement pipeline optimized for speech recognition: noise gate, compression, bandpass filtering, and loudness normalization.
Health check. No authentication required.
curl https://video-service.buy-it.gr/health
Response: { "status": "ok", "ffmpeg": true }
Get service statistics. Requires authentication.
curl https://video-service.buy-it.gr/stats \
-H "X-API-Key: YOUR_API_KEY"
Response:
{
"totalJobs": 42,
"queued": 0,
"processing": 1,
"completed": 38,
"errored": 3,
"topClients": [
{ "ip": "1.2.3.4", "requests": 25 },
{ "ip": "5.6.7.8", "requests": 17 }
]
}
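In the example above, the four state counters add up to totalJobs (0 + 1 + 38 + 3 = 42). The sketch below turns that observation into a small client-side check. It assumes the counters always partition totalJobs, which the service does not state explicitly, and `summarize_stats` is a hypothetical helper name.

```python
def summarize_stats(stats: dict) -> dict:
    """Summarize a /stats response.

    Assumption: queued + processing + completed + errored == totalJobs.
    """
    counted = sum(stats[key] for key in ("queued", "processing", "completed", "errored"))
    finished = stats["completed"] + stats["errored"]
    # Error rate over finished jobs only; 0.0 when nothing has finished yet.
    error_rate = stats["errored"] / finished if finished else 0.0
    return {"consistent": counted == stats["totalJobs"], "error_rate": error_rate}
```

A monitoring script could alert when `error_rate` crosses a threshold of its choosing.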
Upload an audio or video file for optimization. The file will be enhanced and converted to an optimal format for Groq Whisper transcription.
| Parameter | Type | Description |
|---|---|---|
| file | multipart/form-data | The audio/video file. Max 500MB. Field name must be file. |
| Header | Required | Description |
|---|---|---|
| X-API-Key | Yes | Your API key |
| Content-Type | Auto | Set automatically to multipart/form-data by your HTTP client |
curl -X POST https://video-service.buy-it.gr/whisper_optimize \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@recording.mp4"
Response (HTTP 202):
{
"jobId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "queued",
"message": "File uploaded, optimization started",
"statusUrl": "/job/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
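For clients that cannot shell out to curl, the same upload can be sketched with the Python standard library. This is a minimal sketch, not the service's official client: `encode_multipart` and `submit_file` are hypothetical helper names, and the file is read fully into memory, which may not suit uploads near the 500MB limit.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "https://video-service.buy-it.gr"


def encode_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data.

    Returns (body, content_type); the content type carries the boundary.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit_file(path: str, api_key: str) -> dict:
    """Upload one file and return the parsed response (jobId, statusUrl, ...)."""
    with open(path, "rb") as fh:
        data = fh.read()  # whole file in memory; acceptable for a sketch
    # The form field name must be "file", as the parameter table requires.
    body, content_type = encode_multipart("file", os.path.basename(path), data)
    request = urllib.request.Request(
        f"{BASE_URL}/whisper_optimize",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit_file("recording.mp4", "YOUR_API_KEY")["jobId"]` would then give the ID to poll.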
Check the status of an optimization job. Poll this endpoint until status is completed or error.
| Header | Description |
|---|---|
| X-API-Key | Your API key |
curl https://video-service.buy-it.gr/job/JOB_ID \
-H "X-API-Key: YOUR_API_KEY"
Response (processing):
{
"jobId": "...",
"status": "processing",
"step": "enhancing",
"progress": 40,
"originalFilename": "recording.mp4",
"originalSizeMB": 45.2,
"duration": 3600.5,
"createdAt": "2026-02-12T10:00:00.000Z"
}
Response (completed, single file):
{
"jobId": "...",
"status": "completed",
"step": "done",
"progress": 100,
"originalFilename": "recording.mp4",
"originalSizeMB": 45.2,
"duration": 3600.5,
"outputSizeMB": 15.2,
"totalChunks": 1,
"chunks": [
{
"index": 0,
"filename": "output.mp3",
"sizeMB": 15.2,
"downloadUrl": "/job/JOB_ID/download/output.mp3"
}
]
}
Response (completed, chunked; returned when the optimized output exceeds 20 MB):
{
"jobId": "...",
"status": "completed",
"totalChunks": 3,
"chunks": [
{ "index": 0, "filename": "chunk_000.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_000.mp3" },
{ "index": 1, "filename": "chunk_001.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_001.mp3" },
{ "index": 2, "filename": "chunk_002.mp3", "sizeMB": 5.4, "downloadUrl": "/job/JOB_ID/download/chunk_002.mp3" }
]
}
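The polling loop described for this endpoint can be sketched as follows. This is a minimal sketch under stated assumptions: `fetch_job` and `wait_for_job` are hypothetical helper names, the 2.5 second interval and one hour timeout are illustrative defaults rather than values the service prescribes, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://video-service.buy-it.gr"
TERMINAL_STATUSES = {"completed", "error"}


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET /job/{job_id} and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/job/{job_id}", headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.5,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the job is completed or errored, then return it."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)
```

Typical use would be `wait_for_job(lambda: fetch_job(job_id, api_key))`, followed by a check of `job["status"]` before reading `job["chunks"]`.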
Download a converted file or chunk. Only available when job status is completed.
| Header | Description |
|---|---|
| X-API-Key | Your API key |
curl -O https://video-service.buy-it.gr/job/JOB_ID/download/output.mp3 \
-H "X-API-Key: YOUR_API_KEY"
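Because each downloadUrl in a completed job is a path relative to the service host, a client has to resolve it against the base URL before fetching. The sketch below shows one way to do that and to stream a chunk to disk; `chunk_urls` and `download_chunk` are hypothetical helper names, and sorting by index is a precaution rather than documented behaviour.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://video-service.buy-it.gr"


def chunk_urls(job: dict, base_url: str = BASE_URL) -> list:
    """Return (filename, absolute_url) pairs for a completed job, in chunk order."""
    ordered = sorted(job["chunks"], key=lambda chunk: chunk["index"])
    return [(chunk["filename"], urljoin(base_url, chunk["downloadUrl"])) for chunk in ordered]


def download_chunk(url: str, api_key: str, dest: str) -> None:
    """Stream one chunk to dest without loading it fully into memory."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```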
Any audio or video format that FFmpeg can decode is accepted, including:
Audio: mp3, wav, aac, m4a, flac, ogg, opus, wma, amr, aiff, aif, au, caf, ac3, eac3, dts, dtshd, mlp, truehd, ape, wv, tta, tak, shn, mpc, mp2, mp1, gsm, g722, g723_1, g726, g729, ilbc, sbc, adx, brstm, bfstm, ast, hca, rka, wavarc, bonk, dfpwm, osq, sds, pvf, voc, vqf, sox, ircam, nsp, sln, w64, rf64, amb, oga, spx, ra, ram, rmvb, mka, xwma, xmd, fsb, msf, musx, ktss, ads, ss2, sap
Video: mp4, mkv, webm, avi, mov, flv, wmv, mpg, mpeg, m4v, 3gp, 3g2, mj2, ts, mts, m2ts, vob, ogv, rm, asf, swf, f4v, divx, dv, mxf, nut, nsv, gxf, roq, bink, smk, flic, tmv, yuv, y4m, ivf, wtv
Raw and other: pcm (s16le, s24le, s32le, f32le, f64le, u8, mulaw, alaw), rawvideo, dat, bin, srt, ass, ssa, vtt (subtitle tracks are ignored)
| Property | Value |
|---|---|
| Format | MP3 |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Bitrate | 64 kbps |
| Max chunk size | 20 MB |
| Enhancement | Noise gate + compression + bandpass + loudnorm |
The progress field (0-100) gives an estimated completion percentage.
| Status | Step | Progress | Description |
|---|---|---|---|
| queued | uploaded | 0% | File received, waiting to process |
| processing | analyzing | 5% | Analyzing input file format and streams |
| processing | analyzing_loudness | 10% | Two-pass loudness analysis for normalization |
| processing | enhancing | 40% | Applying audio enhancement pipeline |
| processing | splitting | 85% | Splitting large output into 20MB chunks |
| completed | done | 100% | Files ready for download |
| error | - | - | Processing failed (see error field) |
Speaker diarization identifies who speaks when, using pyannote.audio. It requires the HF_TOKEN environment variable, with access to the gated pyannote models on Hugging Face.
Setup: set HF_TOKEN in .env and accept the model terms on Hugging Face.
Start speaker diarization on a completed optimization job. Runs asynchronously.
| Parameter | Type | Description |
|---|---|---|
| jobId | string (JSON body) | ID of a completed /whisper_optimize job |
| numSpeakers | number (optional) | Expected number of speakers. 0 or omit for auto-detect. |
curl -X POST https://video-service.buy-it.gr/diarize \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"jobId": "JOB_ID", "numSpeakers": 3}'
Response (HTTP 202):
{
"jobId": "...",
"status": "running",
"message": "Diarization started"
}
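The same request can be built in Python with the standard library. This is a minimal sketch: `build_diarize_request` is a hypothetical helper name, and it omits numSpeakers when the caller passes 0, relying on the documented auto-detect behaviour.

```python
import json
import urllib.request

BASE_URL = "https://video-service.buy-it.gr"


def build_diarize_request(job_id: str, api_key: str, num_speakers: int = 0) -> urllib.request.Request:
    """Build the POST /diarize request; num_speakers=0 means auto-detect."""
    payload = {"jobId": job_id}
    if num_speakers > 0:
        payload["numSpeakers"] = num_speakers
    return urllib.request.Request(
        f"{BASE_URL}/diarize",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen` sends the request; the body of the 202 response can then be parsed with `json.load`.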
Poll diarization status. When status is completed, the result field contains speaker segments.
curl https://video-service.buy-it.gr/diarize/JOB_ID \
-H "X-API-Key: YOUR_API_KEY"
Response (running):
{
"jobId": "...",
"status": "running",
"startedAt": "2026-02-13T14:00:00.000Z",
"result": null,
"error": null
}
Response (completed):
{
"jobId": "...",
"status": "completed",
"result": {
"segments": [
{ "start": 0.0, "end": 5.2, "speaker": "SPEAKER_00" },
{ "start": 5.4, "end": 12.8, "speaker": "SPEAKER_01" },
{ "start": 13.1, "end": 18.5, "speaker": "SPEAKER_00" }
],
"speakers": [
{ "id": "SPEAKER_00", "total_seconds": 120.5 },
{ "id": "SPEAKER_01", "total_seconds": 85.3 }
],
"num_speakers": 2,
"total_segments": 45
}
}
Response (error):
{
"jobId": "...",
"status": "error",
"error": "Diarization failed: ..."
}
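To illustrate how the segments array relates to the per-speaker totals, the sketch below recomputes speaking time from a list of segments. `speaking_time` is a hypothetical helper; whether the service derives total_seconds in exactly this way is an assumption. Note that the example response shows only three of its 45 segments, so its totals cannot be reproduced from the excerpt.

```python
from collections import defaultdict


def speaking_time(segments: list) -> dict:
    """Sum segment durations per speaker, rounded to milliseconds."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return {speaker: round(total, 3) for speaker, total in totals.items()}
```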
1. POST /whisper_optimize with audio file -> get jobId
2. Poll GET /job/{jobId} every 2-3 seconds
3. When status == "completed":
   a. POST /diarize with jobId (optional, runs in parallel)
   b. For each chunk in chunks array:
      - GET /job/{jobId}/download/{chunk.filename}
      - Send chunk to Groq Whisper API
   c. Concatenate transcription results in order
   d. Poll GET /diarize/{jobId} for speaker segments
   e. Merge speaker segments with transcription using timestamps
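The final merge step of the workflow above can be sketched as a largest-overlap assignment. This is one possible approach, not necessarily how any particular client does it. It assumes each transcription piece is a dict with start, end, and text keys (a hypothetical shape modelled on Whisper-style timestamped segments), and that timestamps from later chunks have already been shifted by the duration of the preceding chunks, which the caller must track because the job response does not list per-chunk durations.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def shift_segments(segments: list, offset: float) -> list:
    """Move chunk-relative timestamps onto the timeline of the full recording."""
    return [
        {**seg, "start": seg["start"] + offset, "end": seg["end"] + offset}
        for seg in segments
    ]


def assign_speakers(transcript: list, speaker_segments: list) -> list:
    """Label each transcript piece with the speaker it overlaps the most."""
    labeled = []
    for piece in transcript:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for seg in speaker_segments:
            shared = overlap(piece["start"], piece["end"], seg["start"], seg["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = seg["speaker"], shared
        labeled.append({**piece, "speaker": best_speaker})
    return labeled
```

Pieces that overlap no diarization segment are labeled UNKNOWN rather than guessed, so gaps stay visible in the merged transcript.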
Trim a completed job's optimized audio to a maximum duration. Useful for free-tier/demo users who should only receive the first N minutes of audio.
| Body (JSON) | Type | Description |
|---|---|---|
| jobId | string | The completed job ID |
| maxSeconds | number | Maximum duration in seconds (e.g. 300 for 5 minutes) |
curl -X POST https://video-service.buy-it.gr/demo_trim \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"jobId": "JOB_ID", "maxSeconds": 300}'
Response (trimmed):
{
"jobId": "...",
"trimmed": true,
"maxSeconds": 300,
"originalDuration": 1416.5,
"outputSizeMB": 2.3,
"totalChunks": 1,
"chunks": [
{ "index": 0, "filename": "trimmed.mp3", "sizeMB": 2.3, "downloadUrl": "/job/JOB_ID/download/trimmed.mp3" }
]
}
Response (no trim needed):
{
"jobId": "...",
"trimmed": false,
"message": "Audio is already within the limit",
"duration": 180.5,
"totalChunks": 1,
"chunks": [...]
}
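Both response shapes carry a chunks array, so a client can treat them uniformly. The sketch below builds the request body from a limit in minutes and extracts the filenames to download. `trim_payload` and `files_to_download` are hypothetical helpers, and it is an assumption that the elided chunks array in the "no trim needed" response has the same shape as in the trimmed one.

```python
import json


def trim_payload(job_id: str, max_minutes: float) -> bytes:
    """JSON body for POST /demo_trim, converting minutes to maxSeconds."""
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    return json.dumps({"jobId": job_id, "maxSeconds": max_minutes * 60}).encode("utf-8")


def files_to_download(trim_response: dict) -> list:
    """Filenames to fetch, in chunk order, whether or not trimming happened."""
    ordered = sorted(trim_response["chunks"], key=lambda chunk: chunk["index"])
    return [chunk["filename"] for chunk in ordered]
```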