Video Service API

Audio/video optimization service for Groq Whisper speech-to-text. It converts an audio or video file into an enhanced, speech-optimized MP3 intended to improve Whisper transcription accuracy.

Base URL: https://video-service.buy-it.gr
Authentication: All endpoints (except docs and health) require the X-API-Key header.
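
As a minimal sketch of the authentication rule above, the following Python snippet attaches the X-API-Key header to a request using only the standard library. The helper name build_request is hypothetical and not part of the service; the base URL and header name are taken from this document.

```python
import urllib.request

BASE_URL = "https://video-service.buy-it.gr"  # base URL from this document


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for `path` with the X-API-Key header attached.

    build_request is a hypothetical helper, not part of the service.
    """
    return urllib.request.Request(BASE_URL + path, headers={"X-API-Key": api_key})


req = build_request("/stats", "YOUR_API_KEY")
# Sending it would look like: urllib.request.urlopen(req, timeout=30)
```

Note that urllib stores header names in a normalized capitalization; HTTP header names are case-insensitive, so this does not change how the header is interpreted.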

Audio Enhancement Pipeline

Each file goes through a multi-stage audio enhancement pipeline optimized for speech recognition:

  1. Audio extraction - strips video track, keeps only audio
  2. High-pass filter (80Hz) - removes rumble, hum, and low-frequency noise
  3. Low-pass filter (8kHz) - removes unnecessary high-frequency content above speech range
  4. Noise gate - reduces background noise between speech segments
  5. Dynamic compression - evens out volume (quiet parts louder, loud parts softer)
  6. EBU R128 loudness normalization - two-pass analysis for consistent volume level
  7. Downmix to mono 16kHz 64kbps MP3 - optimal format for Whisper API
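
The stages above map naturally onto standard FFmpeg audio filters. The sketch below builds an approximate single-pass command in Python. It is an illustration only: the service's actual filter parameters are not documented here, the filter defaults are assumptions, and a true two-pass loudnorm run would first measure the input and then pass the measured values to a second invocation.

```python
def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Return an FFmpeg argument list approximating the pipeline stages.

    Hypothetical helper; filter settings other than the 80 Hz and 8 kHz
    cutoffs are FFmpeg defaults, not the service's real configuration.
    """
    filters = ",".join([
        "highpass=f=80",    # stage 2: remove rumble and hum below 80 Hz
        "lowpass=f=8000",   # stage 3: remove content above 8 kHz
        "agate",            # stage 4: noise gate (default settings)
        "acompressor",      # stage 5: dynamic compression (default settings)
        "loudnorm",         # stage 6: EBU R128 normalization (single pass here)
    ])
    return [
        "ffmpeg", "-i", src,
        "-vn",                    # stage 1: drop the video track
        "-af", filters,
        "-ac", "1",               # stage 7: mono
        "-ar", "16000",           # stage 7: 16 kHz sample rate
        "-b:a", "64k",            # stage 7: 64 kbps
        dst,                      # an .mp3 extension selects the MP3 muxer
    ]


cmd = build_ffmpeg_command("recording.mp4", "output.mp3")
# Running it locally would be: subprocess.run(cmd, check=True)
```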

Endpoints

GET /health

Health check. No authentication required.

curl https://video-service.buy-it.gr/health

Response: { "status": "ok", "ffmpeg": true }

GET /stats

Get service statistics. Requires authentication.

curl https://video-service.buy-it.gr/stats \
  -H "X-API-Key: YOUR_API_KEY"

Response:

{
  "totalJobs": 42,
  "queued": 0,
  "processing": 1,
  "completed": 38,
  "errored": 3,
  "topClients": [
    { "ip": "1.2.3.4", "requests": 25 },
    { "ip": "5.6.7.8", "requests": 17 }
  ]
}

POST /whisper_optimize

Upload an audio or video file for optimization. The file will be enhanced and converted to an optimal format for Groq Whisper transcription.

Parameter  Type                 Description
file       multipart/form-data  The audio/video file. Max 500MB. Field name must be file.

Header        Required  Description
X-API-Key     Yes       Your API key
Content-Type  Auto      Set automatically to multipart/form-data by your HTTP client

curl -X POST https://video-service.buy-it.gr/whisper_optimize \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@recording.mp4"

Response (HTTP 202):

{
  "jobId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "queued",
  "message": "File uploaded, optimization started",
  "statusUrl": "/job/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
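
For clients without a multipart-capable HTTP library, the upload body can be built by hand. The following Python sketch encodes a single file under the required field name file. The helper encode_multipart is hypothetical, and it reads the whole payload into memory, which may be unsuitable for uploads near the 500MB limit.

```python
import uuid


def encode_multipart(filename: str, data: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a one-file multipart/form-data upload.

    Hypothetical helper; the form field name "file" is required by the API.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart("recording.mp4", b"fake-bytes")
# To send it with the standard library (not executed here):
#   req = urllib.request.Request(
#       "https://video-service.buy-it.gr/whisper_optimize", data=body,
#       headers={"X-API-Key": "YOUR_API_KEY", "Content-Type": content_type},
#       method="POST")
#   urllib.request.urlopen(req)
```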

GET /job/:id

Check the status of an optimization job. Poll this endpoint until status is completed or error.

Header     Description
X-API-Key  Your API key

curl https://video-service.buy-it.gr/job/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

Response (processing):

{
  "jobId": "...",
  "status": "processing",
  "step": "enhancing",
  "progress": 40,
  "originalFilename": "recording.mp4",
  "originalSizeMB": 45.2,
  "duration": 3600.5,
  "createdAt": "2026-02-12T10:00:00.000Z"
}

Response (completed, single file):

{
  "jobId": "...",
  "status": "completed",
  "step": "done",
  "progress": 100,
  "originalFilename": "recording.mp4",
  "originalSizeMB": 45.2,
  "duration": 3600.5,
  "outputSizeMB": 15.2,
  "totalChunks": 1,
  "chunks": [
    {
      "index": 0,
      "filename": "output.mp3",
      "sizeMB": 15.2,
      "downloadUrl": "/job/JOB_ID/download/output.mp3"
    }
  ]
}

Response (completed, chunked - output larger than 20MB):

{
  "jobId": "...",
  "status": "completed",
  "totalChunks": 3,
  "chunks": [
    { "index": 0, "filename": "chunk_000.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_000.mp3" },
    { "index": 1, "filename": "chunk_001.mp3", "sizeMB": 19.8, "downloadUrl": "/job/JOB_ID/download/chunk_001.mp3" },
    { "index": 2, "filename": "chunk_002.mp3", "sizeMB": 5.4, "downloadUrl": "/job/JOB_ID/download/chunk_002.mp3" }
  ]
}
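
Polling GET /job/:id until the job reaches completed or error can be sketched as below. The function wait_for_job and its fetch_status callback are hypothetical names; the callback is assumed to perform the authenticated GET and return the parsed JSON. The 2.5 second default is an assumption within the 2-3 second interval this document suggests.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.5,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed; raise on error or timeout.

    fetch_status is a caller-supplied function (hypothetical) that returns
    the parsed JSON of GET /job/:id.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "error":
            raise RuntimeError(job.get("error", "job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)


# Usage with canned responses instead of real HTTP calls:
_canned = iter([
    {"status": "queued", "progress": 0},
    {"status": "processing", "step": "enhancing", "progress": 40},
    {"status": "completed", "progress": 100, "chunks": []},
])
finished = wait_for_job(lambda: next(_canned), sleep=lambda seconds: None)
```

Injecting the sleep function keeps the loop easy to exercise without real delays.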

GET /job/:id/download/:filename

Download a converted file or chunk. Only available when job status is completed.

Header     Description
X-API-Key  Your API key

curl -O https://video-service.buy-it.gr/job/JOB_ID/download/output.mp3 \
  -H "X-API-Key: YOUR_API_KEY"
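
Because downloadUrl values in job responses are relative paths, a client has to join them with the base URL. A small Python sketch, assuming the completed-job response shape shown earlier (chunk_download_urls is a hypothetical helper):

```python
from urllib.parse import urljoin

BASE_URL = "https://video-service.buy-it.gr"  # base URL from this document


def chunk_download_urls(job: dict) -> list[str]:
    """Return absolute download URLs for a completed job, ordered by chunk index."""
    chunks = sorted(job["chunks"], key=lambda chunk: chunk["index"])
    return [urljoin(BASE_URL, chunk["downloadUrl"]) for chunk in chunks]


example_job = {
    "status": "completed",
    "chunks": [
        {"index": 1, "filename": "chunk_001.mp3", "downloadUrl": "/job/abc/download/chunk_001.mp3"},
        {"index": 0, "filename": "chunk_000.mp3", "downloadUrl": "/job/abc/download/chunk_000.mp3"},
    ],
}
urls = chunk_download_urls(example_job)
```

Sorting by index keeps the chunks in playback order even if the array arrives unordered.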

Supported Input Formats

Any audio or video format that FFmpeg can decode is accepted. Known extensions include:

Audio Formats

mp3, wav, aac, m4a, flac, ogg, opus, wma, amr, aiff, aif, au, caf, ac3, eac3, dts, dtshd, mlp, truehd, ape, wv, tta, tak, shn, mpc, mp2, mp1, gsm, g722, g723_1, g726, g729, ilbc, sbc, adx, brstm, bfstm, ast, hca, rka, wavarc, bonk, dfpwm, osq, sds, pvf, voc, vqf, sox, ircam, nsp, sln, w64, rf64, amb, oga, spx, ra, ram, rmvb, mka, xwma, xmd, fsb, msf, musx, ktss, ads, ss2, sap

Video Formats (audio will be extracted)

mp4, mkv, webm, avi, mov, flv, wmv, mpg, mpeg, m4v, 3gp, 3g2, mj2, ts, mts, m2ts, vob, ogv, rm, asf, swf, f4v, divx, dv, mxf, nut, nsv, gxf, roq, bink, smk, flic, tmv, yuv, y4m, ivf, wtv

Other / Raw Formats

pcm (s16le, s24le, s32le, f32le, f64le, u8, mulaw, alaw), rawvideo, dat, bin, srt, ass, ssa, vtt (subtitle tracks ignored)

Output Specification

Property        Value
Format          MP3
Sample Rate     16,000 Hz
Channels        1 (Mono)
Bitrate         64 kbps
Max chunk size  20 MB
Enhancement     Noise gate + compression + bandpass + loudnorm
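
The bitrate and chunk limit above allow a rough estimate of output size and chunk count from the input duration. The sketch below assumes a constant 64 kbps bitrate and that MB means 1,048,576 bytes; neither is confirmed by this document, and sizes reported by the service may differ.

```python
import math


def estimate_output(duration_s: float, bitrate_kbps: int = 64,
                    max_chunk_mb: float = 20.0) -> tuple[float, int]:
    """Estimate (size in MB, number of chunks) for a given duration.

    Hypothetical helper: assumes constant bitrate and 1 MB = 1024 * 1024 bytes.
    """
    size_bytes = duration_s * bitrate_kbps * 1000 / 8
    size_mb = size_bytes / (1024 * 1024)
    chunks = max(1, math.ceil(size_mb / max_chunk_mb))
    return size_mb, chunks


five_minutes_mb, five_minutes_chunks = estimate_output(300)
three_hours_mb, three_hours_chunks = estimate_output(3 * 3600)
```

Under these assumptions, one 20 MB chunk holds roughly 43 minutes of audio.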

Job Status Values

The progress field (0-100) gives an estimated completion percentage.

Status      Step                Progress  Description
queued      uploaded            0%        File received, waiting to process
processing  analyzing           5%        Analyzing input file format and streams
processing  analyzing_loudness  10%       Two-pass loudness analysis for normalization
processing  enhancing           40%       Applying audio enhancement pipeline
processing  splitting           85%       Splitting large output into 20MB chunks
completed   done                100%      Files ready for download
error       -                   -         Processing failed (see error field)

Speaker Diarization

Identifies who speaks when, using pyannote.audio. Requires the HF_TOKEN environment variable, set to a Hugging Face token with access to the gated pyannote models.

Setup: Set HF_TOKEN in .env. Accept model terms at:
pyannote/speaker-diarization-3.1
pyannote/segmentation-3.0

POST /diarize

Start speaker diarization on a completed optimization job. Runs asynchronously.

Parameter    Type                Description
jobId        string (JSON body)  ID of a completed /whisper_optimize job
numSpeakers  number (optional)   Expected number of speakers. Use 0 or omit to auto-detect.

curl -X POST https://video-service.buy-it.gr/diarize \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jobId": "JOB_ID", "numSpeakers": 3}'

Response (HTTP 202):

{
  "jobId": "...",
  "status": "running",
  "message": "Diarization started"
}
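
Since numSpeakers may be 0 or omitted for auto-detection, a client can simply leave the field out unless a positive count is known. A minimal Python sketch of building the request body (diarize_body is a hypothetical helper):

```python
import json
from typing import Optional


def diarize_body(job_id: str, num_speakers: Optional[int] = None) -> bytes:
    """Build the JSON body for POST /diarize.

    numSpeakers is only included when a positive count is given; omitting it
    requests auto-detection, which the API treats the same as 0.
    """
    payload: dict = {"jobId": job_id}
    if num_speakers:
        payload["numSpeakers"] = num_speakers
    return json.dumps(payload).encode("utf-8")


auto_detect = diarize_body("JOB_ID")
three_speakers = diarize_body("JOB_ID", 3)
```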

GET /diarize/:jobId

Poll diarization status. When status is completed, the result field contains speaker segments.

curl https://video-service.buy-it.gr/diarize/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

Response (running):

{
  "jobId": "...",
  "status": "running",
  "startedAt": "2026-02-13T14:00:00.000Z",
  "result": null,
  "error": null
}

Response (completed):

{
  "jobId": "...",
  "status": "completed",
  "result": {
    "segments": [
      { "start": 0.0, "end": 5.2, "speaker": "SPEAKER_00" },
      { "start": 5.4, "end": 12.8, "speaker": "SPEAKER_01" },
      { "start": 13.1, "end": 18.5, "speaker": "SPEAKER_00" }
    ],
    "speakers": [
      { "id": "SPEAKER_00", "total_seconds": 120.5 },
      { "id": "SPEAKER_01", "total_seconds": 85.3 }
    ],
    "num_speakers": 2,
    "total_segments": 45
  }
}

Response (error):

{
  "jobId": "...",
  "status": "error",
  "error": "Diarization failed: ..."
}

Typical Integration Flow

1. POST /whisper_optimize with audio file -> get jobId
2. Poll GET /job/{jobId} every 2-3 seconds
3. When status == "completed":
   a. POST /diarize with jobId (optional, runs in parallel)
   b. For each chunk in chunks array:
      - GET /job/{jobId}/download/{chunk.filename}
      - Send chunk to Groq Whisper API
   c. Concatenate transcription results in order
   d. Poll GET /diarize/{jobId} for speaker segments
   e. Merge speaker segments with transcription using timestamps
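
Steps c and e can be sketched as follows. The sketch assumes each Whisper segment is a dict with start, end, and text keys (as in a verbose JSON transcription response), that timestamps of later chunks are relative to the start of that chunk, and that the caller knows each chunk's start offset in seconds. The helper names are hypothetical, and maximum overlap is only one reasonable way to pick a speaker.

```python
def shift_segments(segments: list[dict], offset: float) -> list[dict]:
    """Move chunk-relative timestamps onto the timeline of the whole recording."""
    return [
        {**seg, "start": seg["start"] + offset, "end": seg["end"] + offset}
        for seg in segments
    ]


def assign_speakers(transcript: list[dict], turns: list[dict]) -> list[dict]:
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments that overlap no diarization turn get speaker None.
    """
    labeled = []
    for seg in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Diarization turns in the shape returned by GET /diarize/:jobId
turns = [
    {"start": 0.0, "end": 5.2, "speaker": "SPEAKER_00"},
    {"start": 5.4, "end": 12.8, "speaker": "SPEAKER_01"},
]
chunk0 = [{"start": 0.0, "end": 4.0, "text": "hello"}]
chunk1 = [{"start": 0.0, "end": 3.0, "text": "hi there"}]  # relative to chunk 1
# Assumed for illustration: chunk 1 starts 6.0 seconds into the recording.
transcript = chunk0 + shift_segments(chunk1, 6.0)
labeled = assign_speakers(transcript, turns)
```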

Limits

POST /demo_trim

Trim a completed job's optimized audio to a maximum duration. Useful for free-tier/demo users who should only receive the first N minutes of audio.

Body (JSON)  Type    Description
jobId        string  The completed job ID
maxSeconds   number  Maximum duration in seconds (e.g. 300 for 5 minutes)

curl -X POST https://video-service.buy-it.gr/demo_trim \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jobId": "JOB_ID", "maxSeconds": 300}'

Response (trimmed):

{
  "jobId": "...",
  "trimmed": true,
  "maxSeconds": 300,
  "originalDuration": 1416.5,
  "outputSizeMB": 2.3,
  "totalChunks": 1,
  "chunks": [
    { "index": 0, "filename": "trimmed.mp3", "sizeMB": 2.3, "downloadUrl": "/job/JOB_ID/download/trimmed.mp3" }
  ]
}

Response (no trim needed):

{
  "jobId": "...",
  "trimmed": false,
  "message": "Audio is already within the limit",
  "duration": 180.5,
  "totalChunks": 1,
  "chunks": [...]
}
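
The two response shapes differ in which duration field they carry: originalDuration when the audio was trimmed, duration when it was already within the limit. A small Python sketch of handling both (describe_trim is a hypothetical helper, and the field names are taken from the example responses above):

```python
def describe_trim(response: dict) -> str:
    """Summarize a /demo_trim response, covering both documented shapes."""
    if response["trimmed"]:
        return (
            f"trimmed from {response['originalDuration']}s "
            f"to at most {response['maxSeconds']}s"
        )
    return f"not trimmed, duration {response['duration']}s"


trimmed_summary = describe_trim(
    {"trimmed": True, "maxSeconds": 300, "originalDuration": 1416.5}
)
untouched_summary = describe_trim({"trimmed": False, "duration": 180.5})
```

In both cases the chunks array lists the files to download, so the download step itself does not need to branch.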