Image Captioning
docker pull cargoshipsh/image-captioning
Automatically generates a caption describing a given input image. This Image Captioning Transformer model was provided by Yih-Dar Shieh & NLP Connect. On a moderate CPU, generating a caption takes only a few seconds.
Demo
Generated Caption
a person riding a surfboard on top of a wave in the ocean
This demo runs on a virtual server with 4 vCPUs and 16 GB RAM (~$20/month).
License
The model is licensed under the Apache 2.0 License, and the code for the API wrapper is licensed under the MIT License.
System Requirements
Minimum: 2 GB RAM, 2 vCPUs
Recommended: 4 GB RAM, 4 vCPUs
API
If you don't want to implement the model yourself, no worries: use our easy-to-use API and get started right away!
Usage
Input [POST]
The endpoint expects an image, sent in a JSON body either as a URL or as a base64-encoded string.
Option 1: URL
{
"imageUrl": "https://images.unsplash.com/photo-1677496891133-f81cc7a4e56e?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=600&q=80"
}
Option 2: Base64
{
"base64": "/9j/4AAQSkZJRgABAQEASABIAAD/2wBD..."
}
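The base64 option is useful for local files that have no public URL. The shell sketch below shows one way to build that request body and send it with curl. The function names make_payload and caption_image are hypothetical; the sketch assumes the container is running on localhost:80 as in the Example section, that the key is in the API_KEY environment variable (falling back to the CHANGE_ME placeholder), and that the API accepts raw base64 without a data: URI prefix, as the example above suggests.

```shell
# Hypothetical helper: print a JSON body of the form {"base64": "..."}
# for the image file given in $1.
make_payload() {
  # tr strips the line breaks that some base64 implementations insert
  # (GNU base64 wraps its output at 76 characters by default).
  printf '{"base64": "%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Hypothetical helper: POST the image in $1 to a locally running
# container (assumed to listen on localhost:80) and print the response.
caption_image() {
  # --data @- makes curl read the body from stdin, so a large image is
  # never passed as a single command-line argument.
  make_payload "$1" | curl -s -X POST \
    -H 'Content-type: application/json' \
    -H "X-API-Key: ${API_KEY:-CHANGE_ME}" \
    --data @- http://localhost:80
}
```

For example, caption_image photo.jpg (photo.jpg is a placeholder file name) should print a JSON response like the one under Output.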
Output
The output is a JSON object with a single caption field containing the generated caption:
{
"caption": "a person riding a surfboard on top of a wave in the ocean"
}
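When scripting against the API, you often need only the caption text. A minimal sketch, assuming a response in the single-field format shown above; the function name extract_caption is hypothetical, and the sed pattern does not handle escaped quotes inside the caption, so jq -r '.caption' is the more robust choice if jq is installed.

```shell
# Hypothetical helper: read a JSON response on stdin and print only the
# value of its "caption" field. Assumes the caption has no escaped quotes.
extract_caption() {
  sed -n 's/.*"caption"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, append | extract_caption to the curl command in the Example section to print only the caption; adding -s to that curl command suppresses the progress meter curl shows when its output is piped.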
You need to set an API key via the environment variable API_KEY when running the image, and send the same key in the X-API-KEY header of every request.
Need a more detailed setup guide?
For more detailed instructions on getting started, see the quick start guide in the docs.
Example
Make sure you have Docker installed, then run the following command:
docker run -p 80:80 --env API_KEY=CHANGE_ME cargoshipsh/image-captioning
In a new terminal window, run the following command to call the API:
curl -X POST -H 'Content-type: application/json' -H 'X-API-Key: CHANGE_ME' --data '{"imageUrl": "https://images.unsplash.com/photo-1677496891133-f81cc7a4e56e?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=600&q=80"}' http://localhost:80
You should see the model's output in the terminal:
{"caption": "a person riding a surfboard on top of a wave in the ocean"}