Audio Transcription

docker pull cargoshipsh/whisper-tiny-en

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. This model was contributed by OpenAI.


The model is licensed under the Apache 2.0 License and the code for the API wrapper is licensed under MIT License.

System Requirements

Minimum: 2GB RAM, 2 vCPU
Recommended: 4GB RAM, 4 vCPU


Input [POST]

The API expects an audio file as input.

Option 1: URL

  "url": ""


Output

The output is the transcribed text, returned as a string:

  "caption": "a person riding a surfboard on top of a wave in the ocean"

To run the image, you need to set an API key via the API_KEY environment variable. Every request must then send the same key in the X-API-Key header.
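As a sketch of the client side, the helper below builds the request headers. The documented API_KEY variable configures the container, not the client. This helper assumes, purely for convenience, that the client reads the key from its own API_KEY environment variable. The function name build_headers is hypothetical.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return headers for a POST request to the transcription API (hypothetical helper).

    Assumption: if no key is passed, it is read from the client's own
    API_KEY environment variable (the same name the container uses).
    """
    key = api_key if api_key is not None else os.environ.get("API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set API_KEY")
    # HTTP header names are case-insensitive, so X-API-Key and X-API-KEY are equivalent.
    return {"Content-Type": "application/json", "X-API-Key": key}
```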

Need a more detailed setup guide?

For more detailed instructions on how to get started, please check out our quick start guide in the docs.


Make sure you have Docker installed, then run the following command:

docker run -p 80:80 --env API_KEY=CHANGE_ME cargoshipsh/whisper-tiny-en

In a new terminal window, run the following command to call the API:

curl -X POST -H 'Content-type: application/json' -H 'X-API-Key: CHANGE_ME' --data '{"url": ""}' http://localhost:80

You should see the model's output in the terminal:

{"caption": "a person riding a surfboard on top of a wave in the ocean"}

Need help?

Join our Discord and ask away. We're happy to help where we can!
