Once you upgrade to the Medium plan, we host your model on our high-end, GPU-enabled machines. This brings prediction time down to around 1 second; most of that time is spent uploading your image to our servers, while the model itself takes less than a second to predict. To eliminate the internet bottleneck, you can also download a Docker image containing your model and run predictions locally, which ensures sub-second prediction times (depending on your machine's specifications).
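
For illustration, here is a minimal sketch of what a local prediction call against the Docker container might look like. This assumes the container exposes an HTTP prediction endpoint; the image name, port, endpoint path, and request format below are hypothetical placeholders, not the actual API.

```python
import requests

# Hypothetical setup: after downloading your model's Docker image,
# start the container and map its prediction port to localhost, e.g.:
#
#   docker run -p 8080:8080 your-model-image
#
# The port (8080) and endpoint path (/predict) below are assumptions
# for illustration only.

with open("example.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8080/predict",  # hypothetical local endpoint
        files={"image": f},               # upload the image to the local model
    )

# Print the prediction returned by the locally running model
print(response.json())
```

Because the request never leaves your machine, the upload step that dominates cloud prediction time disappears, leaving only the model's own inference time.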
