Video Service API - Groq Whisper Audio Optimizer

Audio/video file optimization service for Groq Whisper speech-to-text processing. Converts any audio or video file that FFmpeg can read into an enhanced, speech-optimized MP3 (mono, 16 kHz, 64 kbps) to improve Whisper transcription accuracy.

Audio Enhancement Pipeline

Each file goes through a multi-stage audio enhancement pipeline optimized for speech recognition:

Audio extraction - strips video track, keeps only audio
High-pass filter (80Hz) - removes rumble, hum, and low-frequency noise
Low-pass filter (8kHz) - removes unnecessary high-frequency content above speech range
Noise gate - reduces background noise between speech segments
Dynamic compression - evens out volume (quiet parts louder, loud parts softer)
EBU R128 loudness normalization - two-pass analysis for consistent volume level
Downmix to mono 16kHz 64kbps MP3 - optimal format for Whisper API
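
The stages above map naturally onto standard FFmpeg audio filters. The following is a minimal sketch of how such a command could be assembled, not the service's actual invocation. The filters named (`highpass`, `lowpass`, `agate`, `acompressor`, `loudnorm`) exist in FFmpeg, but the gate and compressor settings are left at defaults and the loudness targets are assumed values, since this document does not state them. The helper name `build_ffmpeg_args` is hypothetical. A true two-pass `loudnorm` run would first measure the input and feed the measurements into a second pass; only a single pass is shown here.

```python
from typing import List


def build_ffmpeg_args(src: str, dst: str) -> List[str]:
    """Assemble an ffmpeg argument list approximating the documented pipeline."""
    filters = [
        "highpass=f=80",    # remove rumble and hum below 80 Hz
        "lowpass=f=8000",   # drop content above 8 kHz
        "agate",            # noise gate (default settings, an assumption)
        "acompressor",      # dynamic compression (default settings, an assumption)
        "loudnorm=I=-16:TP=-1.5:LRA=11",  # EBU R128; target values are assumptions
    ]
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vn",              # strip the video track, keep audio only
        "-af", ",".join(filters),
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # resample to 16 kHz
        "-b:a", "64k",      # encode at 64 kbps
        dst,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine with FFmpeg installed; building it as a list rather than a shell string avoids quoting problems with unusual filenames.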

Endpoints

GET /health

Health check. No authentication required.

curl https://video-service.buy-it.gr/health

Response: { "status": "ok", "ffmpeg": true }

GET /stats

Get service statistics. Requires authentication.

curl https://video-service.buy-it.gr/stats \
  -H "X-API-Key: YOUR_API_KEY"

Response:

{
  "totalJobs": 42,
  "queued": 0,
  "processing": 1,
  "completed": 38,
  "errored": 3,
  "topClients": [
    { "ip": "1.2.3.4", "requests": 25 },
    { "ip": "5.6.7.8", "requests": 17 }
  ]
}

POST /whisper_optimize

Upload an audio or video file for optimization. The file will be enhanced and converted to an optimal format for Groq Whisper transcription.

Parameter	Type	Description
file	multipart/form-data	The audio/video file. Max 500MB. Field name must be `file`.

Header	Required	Description
`X-API-Key`	Yes	Your API key
`Content-Type`	Auto	Set automatically to `multipart/form-data` by your HTTP client

curl -X POST https://video-service.buy-it.gr/whisper_optimize \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@recording.mp4"

Response (HTTP 202):

{
  "jobId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "queued",
  "message": "File uploaded, optimization started",
  "statusUrl": "/job/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
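
For clients that cannot shell out to curl, the same upload can be sketched with the Python standard library alone. This is an illustrative example rather than an official client: the function names `build_multipart` and `upload` are hypothetical, the part is labeled `application/octet-stream` as an assumption (the document does not say which part content type the service expects), and the file is read fully into memory, which may be unsuitable near the 500MB limit.

```python
import json
import os
import urllib.request
import uuid
from typing import Any, Dict, Tuple


def build_multipart(filename: str, data: bytes, field: str = "file") -> Tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload.

    Assumes a simple filename without quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, api_key: str,
           base_url: str = "https://video-service.buy-it.gr") -> Dict[str, Any]:
    """POST a file to /whisper_optimize and return the parsed JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{base_url}/whisper_optimize",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The `jobId` in the returned dictionary is what the status endpoint expects.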

GET /job/:id

Check the status of an optimization job. Poll this endpoint until `status` is `completed` or `error`.

Header	Description
`X-API-Key`	Your API key

curl https://video-service.buy-it.gr/job/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

Response (processing):

{
  "jobId": "...",
  "status": "processing",
  "step": "enhancing",
  "progress": 40,
  "originalFilename": "recording.mp4",
  "originalSizeMB": 45.2,
  "duration": 3600.5,
  "createdAt": "2026-02-12T10:00:00.000Z"
}

Response (completed, single file):

{
  "jobId": "...",
  "status": "completed",
  "step": "done",
  "progress": 100,
  "originalFilename": "recording.mp4",
  "originalSizeMB": 45.2,
  "duration": 3600.5,
  "outputSizeMB": 15.2,
  "totalChunks": 1,
  "chunks": [
    {
      "index": 0,
      "filename": "output.mp3",
      "sizeMB": 15.2,
      "downloadUrl": "/job/JOB_ID/download/output.mp3"
    }
  ]
}

Response (completed, chunked - optimized output over 20MB):

{
  "jobId": "...",
  "status": "completed",
  "totalChunks": 3,
  "chunks": [
    { "index": 0, "filename": "chunk_000.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_000.mp3" },
    { "index": 1, "filename": "chunk_001.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_001.mp3" },
    { "index": 2, "filename": "chunk_002.mp3", "sizeMB": 5.4, "downloadUrl": "/job/JOB_ID/download/chunk_002.mp3" }
  ]
}
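
The polling described for this endpoint can be sketched as a small loop that stops on either terminal status. The names `wait_for_job` and `fetch_status` are hypothetical, and the 2-second interval and 1-hour timeout are assumptions, since this document does not specify a recommended polling rate. The status fetch is passed in as a callable so the loop stays independent of any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "error"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the job reaches a terminal status.

    fetch_status should perform GET /job/:id and return the parsed JSON.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

A caller would still need to check whether the returned `status` is `completed` or `error` before attempting any download.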

GET /job/:id/download/:filename

Download a converted file or chunk. Only available when the job `status` is `completed`.

Header	Description
`X-API-Key`	Your API key

curl -O https://video-service.buy-it.gr/job/JOB_ID/download/output.mp3 \
  -H "X-API-Key: YOUR_API_KEY"
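
When a job produces several chunks, each one has to be fetched separately. The sketch below turns a completed job response into absolute download URLs in chunk order; the helper name `chunk_urls` is hypothetical, and sorting by `index` is a defensive assumption, since the document does not state that `chunks` always arrives ordered.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def chunk_urls(job: Dict[str, Any],
               base_url: str = "https://video-service.buy-it.gr") -> List[str]:
    """Return absolute download URLs for a completed job, ordered by chunk index."""
    if job.get("status") != "completed":
        raise ValueError("downloads are only available for completed jobs")
    chunks = sorted(job.get("chunks", []), key=lambda chunk: chunk["index"])
    # downloadUrl is a server-relative path, so resolve it against the base URL.
    return [urljoin(base_url, chunk["downloadUrl"]) for chunk in chunks]
```

Each URL still needs the `X-API-Key` header when it is requested.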

Supported Input Formats

Any audio or video format supported by FFmpeg, which covers nearly every format in common use:

Audio Formats

mp3, wav, aac, m4a, flac, ogg, opus, wma, amr, aiff, aif, au, caf, ac3, eac3, dts, dtshd, mlp, truehd, ape, wv, tta, tak, shn, mpc, mp2, mp1, gsm, g722, g723_1, g726, g729, ilbc, sbc, adx, brstm, bfstm, ast, hca, rka, wavarc, bonk, dfpwm, osq, sds, pvf, voc, vqf, sox, ircam, nsp, sln, w64, rf64, amb, oga, spx, ra, ram, rmvb, mka, xwma, xmd, fsb, msf, musx, ktss, ads, ss2, sap

Video Formats (audio will be extracted)

mp4, mkv, webm, avi, mov, flv, wmv, mpg, mpeg, m4v, 3gp, 3g2, mj2, ts, mts, m2ts, vob, ogv, rm, asf, swf, f4v, divx, dv, mxf, nut, nsv, gxf, roq, bink, smk, flic, tmv, yuv, y4m, ivf, wtv

Other / Raw Formats

pcm (s16le, s24le, s32le, f32le, f64le, u8, mulaw, alaw), rawvideo, dat, bin, srt, ass, ssa, vtt (subtitle tracks ignored)

Output Specification

Property	Value
Format	MP3
Sample Rate	16,000 Hz
Channels	1 (Mono)
Bitrate	64 kbps
Max chunk size	20 MB
Enhancement	Noise gate + compression + bandpass + loudnorm

Job Status Values

Status	Step	Progress	Description
`queued`	`uploaded`	0%	File received, waiting to process
`processing`	`analyzing`	5%	Analyzing input file format and streams
`processing`	`analyzing_loudness`	10%	Two-pass loudness analysis for normalization
`processing`	`enhancing`	40%	Applying audio enhancement pipeline
`processing`	`splitting`	85%	Splitting large output into 20MB chunks
`completed`	`done`	100%	Files ready for download
`error`	-	-	Processing failed (see `error` field)

Speaker Diarization

Identify who speaks when, using pyannote.audio. Requires the `HF_TOKEN` environment variable to hold a Hugging Face token with access to the gated pyannote models.

POST /diarize

Start speaker diarization on a completed optimization job. Runs asynchronously.

Parameter	Type	Description
jobId	string (JSON body)	ID of a completed `/whisper_optimize` job
numSpeakers	number (optional)	Expected number of speakers. Set to 0 or omit to auto-detect.

curl -X POST https://video-service.buy-it.gr/diarize \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jobId": "JOB_ID", "numSpeakers": 3}'

Response (HTTP 202):

{
  "jobId": "...",
  "status": "running",
  "message": "Diarization started"
}

GET /diarize/:jobId

Poll diarization status. When `status` is `completed`, the `result` field contains speaker segments.

curl https://video-service.buy-it.gr/diarize/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

Response (running):

{
  "jobId": "...",
  "status": "running",
  "startedAt": "2026-02-13T14:00:00.000Z",
  "result": null,
  "error": null
}

Response (completed):

{
  "jobId": "...",
  "status": "completed",
  "result": {
    "segments": [
      { "start": 0.0, "end": 5.2, "speaker": "SPEAKER_00" },
      { "start": 5.4, "end": 12.8, "speaker": "SPEAKER_01" },
      { "start": 13.1, "end": 18.5, "speaker": "SPEAKER_00" }
    ],
    "speakers": [
      { "id": "SPEAKER_00", "total_seconds": 120.5 },
      { "id": "SPEAKER_01", "total_seconds": 85.3 }
    ],
    "num_speakers": 2,
    "total_segments": 45
  }
}

Response (error):

{
  "jobId": "...",
  "status": "error",
  "error": "Diarization failed: ..."
}
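
A common use of the `segments` array is to attach a speaker label to each transcribed passage. The sketch below assigns every transcript segment the speaker whose diarization segment overlaps it the most. It rests on assumptions not stated in this document: the transcript segments are taken to be dictionaries with `start`, `end`, and `text` keys, and both sets of timestamps are taken to be measured in seconds from the start of the same audio. The names `overlap_seconds` and `label_transcript` are hypothetical.

```python
from typing import Any, Dict, List


def overlap_seconds(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_transcript(
    transcript: List[Dict[str, Any]],
    diarization: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Copy each transcript segment, adding the best-overlapping speaker.

    The speaker is None when no diarization segment overlaps the passage.
    """
    labeled = []
    for passage in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in diarization:
            shared = overlap_seconds(
                passage["start"], passage["end"], turn["start"], turn["end"]
            )
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**passage, "speaker": best_speaker})
    return labeled
```

Maximum overlap is only one possible rule; a passage that spans a speaker change is given to whoever spoke longer within it.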

Typical Integration Flow

Limits

POST /demo_trim

Trim a completed job's optimized audio to a maximum duration. Useful for free-tier/demo users who should only receive the first N minutes of audio.

Body (JSON)	Type	Description
jobId	string	The completed job ID
maxSeconds	number	Maximum duration in seconds (e.g. 300 for 5 minutes)

curl -X POST https://video-service.buy-it.gr/demo_trim \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jobId": "JOB_ID", "maxSeconds": 300}'

Response (trimmed):

{
  "jobId": "...",
  "trimmed": true,
  "maxSeconds": 300,
  "originalDuration": 1416.5,
  "outputSizeMB": 2.3,
  "totalChunks": 1,
  "chunks": [
    { "index": 0, "filename": "trimmed.mp3", "sizeMB": 2.3, "downloadUrl": "/job/JOB_ID/download/trimmed.mp3" }
  ]
}

Response (no trim needed):

{
  "jobId": "...",
  "trimmed": false,
  "message": "Audio is already within the limit",
  "duration": 180.5,
  "totalChunks": 1,
  "chunks": [...]
}
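
Before calling this endpoint, a client can predict whether a trim will occur and roughly how large the result will be. The sketch below assumes constant-bitrate output at the documented 64 kbps, treats 1 MB as 2^20 bytes, and assumes trimming happens only when the duration strictly exceeds `maxSeconds`. None of these is confirmed by this document, although the 2.3 MB figure in the trimmed example above is consistent with them. The names `estimate_size_mb` and `plan_trim` are hypothetical.

```python
from typing import Any, Dict


def estimate_size_mb(seconds: float, kbps: int = 64) -> float:
    """Estimated size of constant-bitrate audio, in units of 2**20 bytes."""
    size_bytes = seconds * kbps * 1000 / 8
    return size_bytes / (1024 * 1024)


def plan_trim(duration: float, max_seconds: float) -> Dict[str, Any]:
    """Predict whether /demo_trim would trim and estimate the output size."""
    kept = min(duration, max_seconds)
    return {
        "trimmed": duration > max_seconds,
        "keptSeconds": kept,
        "estimatedSizeMB": round(estimate_size_mb(kept), 1),
    }
```

The estimate is a planning aid only; the `outputSizeMB` returned by the service remains the authoritative value.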