HLS (HTTP Live Streaming) can be quite confusing, particularly when it comes to using subtitles or closed captions. Since we use HLS extensively to distribute your video content (despite its name, it is widely used for video on-demand too) we have tried to explain how it works.
Closed captions and subtitles are for different purposes.
Subtitles assume the viewer can hear the audio but needs a text alternative to the spoken dialogue. An example would be translating the dialogue into a different language.
Closed captions assume the viewer can’t hear the audio. As such, the captions also describe the soundtrack such as background noise, telephones ringing, and other audio cues.
If you use an Apple device, you may have seen the letters SDH. That stands for “subtitles for the deaf and hard of hearing”. So subtitles are treated as captions.
Yes. In Apple’s HLS specification it supports both subtitles and closed captions. Apple recommends including them to improve accessibility. The supported formats are CEA-608, CEA-708, WebVTT and IMSC1.
But note the HLS specification says:
“Closed captions (if any) MUST be included in the video Media Segments”
That’s important and so makes closed captions more complicated to manage. Since the captions are intertwined with the video segments. The captions need to be available at the point the video is uploaded. If not, the video content has to be re-packaged each time closed captions are added or changed. And so that incurs extra costs in storage, compute, and time.
Using subtitles in HLS is therefore more flexible as they can be served using a separate file. And so managed independently. If you add subtitles later (such as when a translator provides the subtitles to you, in, say, German), that can be added to a HLS manifest without having to touch the media files. That’s much easier too, as a HLS manifest is simply a text file. It means the media files can remain in cache, improving performance.
Subtitles can be treated the same as closed captions (to provide accessibility to those who are hard of hearing) by:
adding a CHARACTERISTICS attribute such as CHARACTERISTICS=“public.accessibility.describes-spoken-dialog,public.accessibility.describes-music-and-sound”.
setting an AUTOSELECT attribute with a value of “YES”
We’ll go into a bit more detail later on.
The dominant subtitle file formats are TTML (Timed Text Markup Language), SRT (SubRip) and WebVTT (Web Video Text Tracks Format).
The HLS specification mentioned above is clear about which must be used:
“Subtitles MUST be WebVTT (according to the HLS specification) or IMSC1 in fMP4”
IMSC1 is TTML Profiles for Internet Media Subtitles and Captions.
Like many online video platfoms, we use fMP4 for our on-demand videos. And so we use WebVTT for all of our subtitles. WebVTT is a W3C standard, it’s widely supported, and it’s easy to read.
WebVTT is a text format for specifiying when captions or subtitles should be shown. Each file must be encoded using UTF-8.
They will normally have the extension .vtt and have the MIME type of text/vtt.
A WebVTT file should contain a single WEBVTT line, followed by a blank line, and then zero or more blocks of text. Each has a line that tells the client the times during which that text should be shown.
WEBVTT 00:00:01.000 --> 00:00:04.000 - This is an example 00:00:05.000 --> 00:00:09.000 - of using WebVTT.
In the example above we have used the timecode format of HH:MM:SS:MMM and so the first line would be shown during its accompanying video between 1 second and 4 seconds.
If you would like to find out more about WebVTT there is a good explanation on the Mozilla site: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API.
An example of a primary HLS manifest containing subtitles is shown below. This video has a single subtitles track, in English:
#EXTM3U #EXT-X-INDEPENDENT-SEGMENTS #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="Default",AUTOSELECT=YES,LANGUAGE="en",CHANNELS="2",URI="aac_128k/audio.m3u8" #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=NO,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles_en.m3u8" #EXT-X-STREAM-INF:BANDWIDTH=629166,AVERAGE-BANDWIDTH=480134,CODECS="avc1.4d4015,mp4a.40.2",RESOLUTION=426x240,FRAME-RATE=24.000,AUDIO="audio",SUBTITLES="subs",VIDEO-RANGE=SDR h264_240p/video.m3u8 #EXT-X-STREAM-INF:BANDWIDTH=1083378,AVERAGE-BANDWIDTH=733406,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=24.000,AUDIO="audio",SUBTITLES="subs",VIDEO-RANGE=SDR h264_360p/video.m3u8 #EXT-X-STREAM-INF:BANDWIDTH=2748252,AVERAGE-BANDWIDTH=1488549,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=24.000,AUDIO="audio",SUBTITLES="subs",VIDEO-RANGE=SDR h264_720p/video.m3u8 #EXT-X-STREAM-INF:BANDWIDTH=5438980,AVERAGE-BANDWIDTH=2868620,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=24.000,AUDIO="audio",SUBTITLES="subs",VIDEO-RANGE=SDR h264_1080p/video.m3u8
The line we are interested in is the fourth one:
Some of the attributes within the subtitles line above taken from the HLS manifest are (reasonably) self-explanatory:
You may have noticed there are also several optional flags you can set which aren’t so obvious: AUTOSELECT, DEFAULT AND FORCED. Each of these can be set with a YES or NO value. The default is NO and so if you do not include one of these flags, the implicit value is NO.
Set DEFAULT as YES if these subtitles should be selected if the user hasn’t provided any other information to indicate a different preference.
If you do set DEFAULT as YES, make sure to also set AUTOSELECT as YES.
In our video CMS we don’t currently provide an option to set a default language and so we set DEFAULT as NO. We leave it up to the client to pick.
Set AUTOSELECT as YES if these subtitles may be selected in the absence of an explicit user preference for another one.
It will be listed as available to select by the user. We set AUTOSELECT as YES for all subtitle tracks.
Set FORCED as YES if these subtitles are considered essential to play. We set FORCED as NO for all subtitle tracks.
The behaviour of FORCED is inconsistent across devices and browsers (if it’s supported at all). Often tracks which have FORCED set as YES won’t be listed on the user’s language selection menu (that is how Apple devices handle this flag).
If you do set FORCED as YES, you should also set AUTOSELECT as YES.
The FORCED flag should only be included if the TYPE attribute is set as SUBTITLES (hence why it is applicable to our usage).
As you can see in the example line above, our video CMS sets the three flags like this:
DEFAULT=NO,AUTOSELECT=YES,FORCED=NO. As a result, the subtitles are listed in the menu. On iOS the native menu looks like this:
Both extend your content to a wider audience.
Adding closed captions makes your content accessible to viewers that are hard of hearing.
Adding subtitles makes your content understandable to audiences around the world who may not otherwise understand the audio language.
The basis of HLS is to split your video files into many small chunks. Each is usually a few seconds in duration. And there is a manifest file which lists the segments.
We make that for you as part of the packaging stage.
Once that is done, you will see a thumbnail image taken from your video appear in your video library. Click on that and then on the Subtitles tab. You should see a button to upload a new subtitles file. As described above, that should be a WebVTT .vtt file.
When you upload that .vtt file, we validate it and then add those subtitles to your video’s primary HLS playlist.
Updated: November 18, 2021