Image Captioning

docker pull cargoshipsh/image-captioning

Automatically generates a caption describing a given input image. This Image Captioning Transformer model was provided by Yih-Dar Shieh & NLP Connect. On a moderate CPU, it takes only a few seconds to generate a caption.




Predicted Caption

a person riding a surfboard on top of a wave in the ocean

This demo runs on a virtual server with 4 vCPUs and 16 GB RAM (~$20/month).


The model is licensed under the Apache 2.0 License, and the code for the API wrapper is licensed under the MIT License.

System Requirements

Minimum: 2GB RAM, 2 vCPU
Recommended: 4GB RAM, 4 vCPU


If you don't want to implement the model yourself, no worries. Use our easy-to-use API and get started right away!

Get Started


Input [POST]

The input expects an image. This can be a URL or a base64-encoded image.

Option 1: URL

  "imageUrl": ""

Option 2: Base64

  "base64": "/9j/4AAQSkZJRgABAQEASABIAAD/2wBD..."

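For the Base64 option, the request body can be built from a local file. The following is a minimal sketch, not part of the API itself: the function name build_base64_payload and the file names in the usage comment are hypothetical, and it assumes a base64 command-line tool that reads from standard input is available.

```shell
# Hypothetical helper (not part of the API): print a JSON request body
# containing the base64 encoding of the file given as the first argument.
build_base64_payload() {
  # Some base64 implementations wrap long output across lines; tr removes
  # the newlines so the encoded string stays on one line inside the JSON value.
  printf '{"base64": "%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Possible usage (file names are placeholders):
#   build_base64_payload photo.jpg > payload.json
```

The base64 alphabet contains no characters that need JSON escaping, so the encoded string can be placed between the quotes as-is.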

The output is the generated caption for the image.

  "caption": "a person riding a surfboard on top of a wave in the ocean"

To run the image, you need to set an API key via the environment variable API_KEY. Every request must then send the same key in the X-API-Key header.
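As an illustration of keeping the two values in sync, the key could be generated once and reused for both the container and the request header. This is only a sketch: the helper name generate_api_key is hypothetical, the key length is an arbitrary choice, and any sufficiently random string should work equally well.

```shell
# Hypothetical helper: print 32 random hexadecimal characters to use as a key.
generate_api_key() {
  # Read 16 random bytes, dump them as hex, and strip all whitespace.
  head -c 16 /dev/urandom | od -An -tx1 | tr -d '[:space:]'
}

# Possible usage, assuming the same shell session starts the container
# and sends the requests (-d runs the container in the background):
#   export API_KEY="$(generate_api_key)"
#   docker run -d -p 80:80 --env API_KEY="$API_KEY" cargoshipsh/image-captioning
# Requests from this session can then pass: -H "X-API-Key: $API_KEY"
```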

Need a more detailed setup guide?

For more detailed instructions on how to get started, please check out our quick start guide in the docs.


Make sure you have Docker installed, then run the following command:

docker run -p 80:80 --env API_KEY=CHANGE_ME cargoshipsh/image-captioning

In a new terminal window, run the following command to call the API:

curl -X POST -H 'Content-type: application/json' -H 'X-API-Key: CHANGE_ME' --data '{"imageUrl": ""}' http://localhost:80

You will see the output of the model in the terminal:

{"caption": "a person riding a surfboard on top of a wave in the ocean"}
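In a script, it may be convenient to send a prepared JSON body from a file and keep only the caption text. The sketch below assumes the response has the shape shown above and that the caption contains no escaped double quotes. The function names extract_caption and request_caption are hypothetical, and API_KEY is expected to hold the same value that was passed to docker run.

```shell
# Hypothetical helper: read a response such as {"caption": "..."} on stdin
# and print only the caption text. Assumes no escaped quotes in the caption.
extract_caption() {
  sed -n 's/.*"caption"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: POST the JSON body stored in the file "$1" to the
# locally running container and print the caption from the response.
request_caption() {
  curl -s -X POST \
    -H 'Content-type: application/json' \
    -H "X-API-Key: ${API_KEY}" \
    --data @"$1" \
    http://localhost:80 | extract_caption
}

# Possible usage (payload.json is a placeholder file containing the request body):
#   API_KEY=CHANGE_ME request_caption payload.json
```

Using sed keeps the sketch free of extra dependencies; a dedicated JSON parser would be more robust if captions can contain quotes or other escaped characters.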

Need help?

Join our Discord and ask away. We're happy to help where we can!
