Videos delivered using HLS (our default, adaptive-bitrate delivery method) can now have subtitles generated automatically. We use AI-based speech recognition to analyse your video’s audio track and convert the speech to text. That saves you the time you would otherwise spend transcribing the video yourself. It is also likely to save you money: although automated transcription incurs an additional fee, it should be cheaper than obtaining subtitles from an external agency.

Request an automated transcription

It’s simple to request an automated transcription. Once your video completes transcoding, click its thumbnail image in our dashboard and you should see a Subtitles tab. Click the blue button to add subtitles. You will be asked which language the audio is in. That helps the speech recognition software choose the optimal language model and ensures the resulting subtitles are correctly categorised. We support most major languages. Then submit the request.

That request is processed in the background, leaving you free to complete other tasks. It usually completes faster than real-time. For example, a ten-minute video should have subtitles generated within about five minutes. If the request succeeds, you do not need to do anything further: we will automatically add the subtitles to your video. So check back later, reload the player, and you should see them listed.

Note: The subtitles are not available in the player when using an MP4-only transcoding profile, since the generated WebVTT .vtt file can only be inserted into an adaptive bitrate HLS manifest.
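
To illustrate what inserting subtitles into an HLS manifest involves: in the HLS specification, a subtitle track is declared in the master playlist with an EXT-X-MEDIA tag of TYPE=SUBTITLES, whose URI points to a playlist listing the .vtt content. The Python sketch below builds such a line. The function name, the group ID and the URI are hypothetical, chosen only for illustration, and the manifests we actually generate may differ in detail.

```python
def subtitle_media_tag(language, name, uri, group_id="subs"):
    """Build an EXT-X-MEDIA line declaring one subtitle rendition.

    All argument values are illustrative assumptions, not the
    exact values used in our generated manifests.
    """
    attributes = [
        "TYPE=SUBTITLES",
        f'GROUP-ID="{group_id}"',
        f'NAME="{name}"',
        f'LANGUAGE="{language}"',
        "AUTOSELECT=YES",
        "DEFAULT=NO",
        f'URI="{uri}"',
    ]
    return "#EXT-X-MEDIA:" + ",".join(attributes)


# Hypothetical English subtitle track
print(subtitle_media_tag("en", "English", "subtitles/en.m3u8"))
```

An MP4 file is played directly, with no manifest in which to place a line like this, which is why the subtitles need HLS delivery.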

Generally we’ve found the accuracy to be high. The best results are naturally obtained when the video features a single voice, speaking loudly and clearly.

If you would like to refine or edit the resulting subtitles, you can. You will see them listed in the Subtitles tab, next to the chosen language/label. There you will see an option to download them. Select that and you will then have a local copy which you can edit in a simple text editor such as Notepad.

What format is used?

Apple insist that HLS (short for HTTP Live Streaming, although it’s also used for video on-demand) uses the WebVTT format for subtitles. WebVTT stands for Web Video Text Tracks and uses the .vtt extension. As a result, the generated subtitles will look something like this:


WEBVTT

1
00:00:01.815 --> 00:00:03.114
- This is an example

As you can see, the file starts with WEBVTT, followed by a series of entries. Each entry consists of an index number, the start and end times between which the text should be shown (in HH:MM:SS.mmm format, with a full stop before the milliseconds), and finally the text itself. So if you see any mistakes in your generated file, it’s easy to manually edit the text.

After making your changes, make sure your new WebVTT file is valid using a free online validation tool.
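
For a quick local sanity check before uploading, a few lines of Python can catch the most common editing mistakes. The sketch below is not a full WebVTT validator: it only checks for the WEBVTT header, well-formed timing lines, and an end time later than the start time. The function names are our own, for illustration only.

```python
import re

# A timing line: [HH:]MM:SS.mmm --> [HH:]MM:SS.mmm (hours are optional in WebVTT)
TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to milliseconds."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_webvtt(text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        problems.append("line 1: file must start with WEBVTT")
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # not a timing line
        match = TIMING.match(line.strip())
        if match is None:
            problems.append(f"line {number}: malformed timing line")
            continue
        parts = match.groups()
        if to_ms(*parts[4:]) <= to_ms(*parts[:4]):
            problems.append(f"line {number}: cue must end after it starts")
    return problems


sample = "WEBVTT\n\n1\n00:00:01.815 --> 00:00:03.114\n- This is an example\n"
print(check_webvtt(sample))
```

A timing line that uses a comma instead of a full stop before the milliseconds (the SubRip style) is reported as malformed, which is one of the easier mistakes to make when editing by hand.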

If you would like to replace the auto-generated HLS subtitles, delete the one already listed. Then once more click the button to add subtitles to the video. This time you will choose the option to upload a file (as you now have one). You can then submit that and the subtitles should be replaced within seconds.

Supported formats

As mentioned above, Apple requires that the WebVTT format be used for subtitles referenced within the HLS manifests we use for both on-demand and live streaming. So we recommend you upload files in that same format. However, we also support the popular and similar SubRip format. These files use the .srt extension and look something like this:

00:02:15,512 --> 00:02:19,378
This is an example
of the SubRip format

You’ll see they look very similar to WebVTT. However, note the absence of the WEBVTT header and the slightly different timecode (a comma, rather than a full stop, separates the seconds from the milliseconds).

Since the formats are very similar, it is easy to convert between the two. However, if you do not want to do this locally, you can submit your subtitles to us in the SubRip format and we will handle converting them to WebVTT.
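
If you would rather convert locally, the two differences described above are all a simple converter needs to handle. The Python sketch below adds the WEBVTT header and swaps the comma for a full stop in each timing line. It is a minimal illustration that assumes plain SubRip input without styling tags, and it is not the converter we run on our side. The function name is our own.

```python
import re

# SubRip timestamp: HH:MM:SS,mmm
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert plain SubRip text to WebVTT text."""
    # Drop a byte-order mark, normalise line endings, trim outer blank lines
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    converted = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are touched, so commas in the text are preserved
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


srt = (
    "1\n"
    "00:02:15,512 --> 00:02:19,378\n"
    "This is an example\n"
    "of the SubRip format\n"
)
print(srt_to_webvtt(srt))
```

The index numbers are kept as they are, since WebVTT accepts them as cue identifiers.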

Other platform updates

The latest update brings some new developer documentation, an updated player HLS engine, and some minor updates to our dashboard. For example, we now include a responsive copy of each video’s embed code, so you no longer have to transform it using an external tool.