
Audiovisual elements

[Narrator:] In the chapter "Text transcripts, captions and sign language", you saw an introduction showing different ways to communicate audio-based information for people with auditory impairments. Implementing text transcripts is a very basic method, providing a textual version of the content that can be accessed by anyone. As this is a very simple way of providing content in an alternative form, we will not discuss it any further here.

In contrast to this, the creation of a sign language translation is a very complex procedure requiring a sign language interpreter to record a second video, which must then be embedded into and synchronised with the original video. We will not go into this complex procedure here.

In the first part of this chapter, we will demonstrate how to create captions. In the same way that text transcripts, captions and sign language translate audio information into a visual form for people with auditory impairments, audio descriptions translate visual information into an audible form for persons with visual disabilities. We will talk about this in the second part of the chapter.

Captions

Captions are text versions of the spoken words or significant sounds presented within a video. Please note that we are focusing here on the creation of closed captions, which can be activated and deactivated by the user. There are several ways to create caption files: using a plain text editor, using a dedicated application, or using a video editor with built-in caption capabilities. At the time of the creation of this video, editors with built-in caption capabilities only exist as commercial products. Therefore, we will only demonstrate the first two options.

The most convenient way to create captions is to create a separate caption file. Before you create your caption file, please make sure that the file format you are using is supported by the player software you intend to use. Not every player supports all caption formats. If you want to use an online solution, make sure that the chosen platform supports the import of your caption format.

The differences between file formats are often subtle. Let’s have a closer look at two caption file formats: SubRip and WebVTT.

The SubRip file format

SubRip text files inherit their name from SubRip software. SubRip is a free software program for Microsoft Windows which extracts subtitles and their timings from various video formats to a text file. It uses the file suffix .srt.

SubRip text files contain formatted lines of plain text in groups separated by a blank line. The first line in a group is a sequence counter starting at one. The following line lists time codes indicating when the text should appear and when it should disappear. Hours, minutes and seconds need to be expressed as two zero-padded digits separated by a colon. The milliseconds are formatted as three zero-padded digits preceded by a comma.

The following lines show the subtitle text to be displayed. The text can have more than one line. Please do not use more than two lines and keep every line short so it fits into the video. It is recommended to use the UTF-8 character encoding to ensure that language-specific characters, punctuation and symbols are presented correctly. In fact, character encoding is not part of the SubRip specification, which means that SubRip parsers have to make their best guess at the encoding. As you may expect, this is often a source of problems.
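A minimal SubRip file illustrating this structure might look like the following (the timings and texts here are invented for illustration):

```
1
00:00:01,000 --> 00:00:04,500
Hello, my name is Josiane.

2
00:00:05,000 --> 00:00:09,200
My white cane signals you
that I am visually disabled.
```

Note the blank line separating the two groups and the comma before the milliseconds in each timecode.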

The SubRip file format is supported by most software video players.

The WebVTT file format

The WebVTT or Web Video Text Tracks format is a World Wide Web Consortium standard for displaying timed text in connection with the HTML5 track element. The primary purpose of WebVTT files is to add text overlays to a video element. Even though it is still a candidate recommendation at the time of the creation of this chapter, the basic features are already supported by all major browsers and video players.

The WebVTT format is very similar to the SubRip file format. Let’s have a look at an example to see some differences. The first line starts with WEBVTT. Timecode fractional values are separated by a full stop instead of a comma. Timecode hours are optional. The sequence counter preceding the timecode is optional.
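The same two captions in WebVTT form could look like this (note the WEBVTT header line and the full stop before the milliseconds; the timings and texts are invented for illustration):

```
WEBVTT

1
00:00:01.000 --> 00:00:04.500
Hello, my name is Josiane.

2
00:00:05.000 --> 00:00:09.200
My white cane signals you
that I am visually disabled.
```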

WebVTT gives creators more options when it comes to text formatting. The WebVTT file format offers many more features to add comments, meta information and style sheets. Please refer to the standard to learn more about the details. Note that not every video player can render all styling features described by the standard.

Even though file formats for captions differ, they are easy to understand. Many programs are capable of converting one caption format into another. Of course, you might lose styling information if you convert from a format with rich styling capabilities to a format with plain text presentation.
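To give an idea of how simple such a conversion can be for plain, unstyled captions, the following Python sketch turns a SubRip file into a minimal WebVTT file by prepending the header and changing the millisecond separator in the timecode lines. The function and file names are our own, and the sketch ignores styling entirely:

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert plain SubRip captions to minimal WebVTT.

    For unstyled captions, only the timecode separator changes:
    SubRip uses a comma before the milliseconds, WebVTT a full stop.
    """
    # Replace the comma only inside timecode lines such as
    # "00:00:01,000 --> 00:00:04,500", leaving commas in the
    # caption text itself untouched.
    timecode = re.compile(
        r"(\d{2}:\d{2}:\d{2}),(\d{3})(\s*-->\s*)(\d{2}:\d{2}:\d{2}),(\d{3})"
    )
    vtt_body = timecode.sub(r"\1.\2\3\4.\5", srt_text)
    return "WEBVTT\n\n" + vtt_body

# Hypothetical usage:
# with open("video.en.srt", encoding="utf-8") as f:
#     vtt = srt_to_vtt(f.read())
# with open("video.en.vtt", "w", encoding="utf-8") as f:
#     f.write(vtt)
```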

For our following demonstration, we will focus on the creation of the SubRip file format.

Creating captions using a plain text editor

Let’s create and test captions using a plain text editor. You can use any plain text editor for this. For our example, we are using Microsoft’s free cross-platform Visual Studio Code editor.

Here is our caption file in SubRip format. As we need to know the timing for displaying a subtitle, we are using the free, cross-platform VLC media player. Of course, you can use any other video player software capable of displaying timing information.

As our captions need to be set precisely to a fraction of a second, we added the free Time extension to VLC, which shows the timing including milliseconds. We play the video step by step and add the SubRip caption groups into our editor. [Video:] Hello, my name is Josiane.

[Narrator:] This is a manual task which requires a lot of precision.

[Video:] My white cane signals you that I am visually disabled. [Indistinguishable.]

[Narrator:] Finally, we save our caption file in the same directory as our video file. We use the name of the video file, add the language code and choose the file suffix .srt. When we reopen our video in VLC, the software will search for caption tracks and offer them for display. As we have multiple caption tracks using different language codes, we can choose between them.
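For example, a video file and its matching English and German caption files could be named as follows (the filenames are invented for illustration):

```
introduction.mp4
introduction.en.srt
introduction.de.srt
```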

[Video:] Hello, my name is Josiane. My white cane signals you that I am visually disabled. Thanks to the cane my radius of action is increased by nearly one metre. I am orienting tactilely and acoustically. There is a guideline on the floor that shows me the direction. This surface has a different pattern. It is an attention field indicating a change in direction.

Creating captions using a dedicated application

[Narrator:] Creating captions with a plain text editor is possible but it may not be the best solution. Manually typing time codes and handling syntax checks is time-consuming and tedious. There are many applications available for creating caption files. Some of them are available for free, while others are commercial solutions. Try what works best for you. For our example, we have chosen the free and cross-platform Aegisub. We want to show you how to get started to give you an idea of how these tools work. First, we start Aegisub. Then we open the video using the menu Video, then Open Video. The video appears in the upper-left corner. In the upper-right corner, you can see a visual presentation of the audio. You can toggle between a spectrum analyser and a wave-based presentation. Zooming in and out allows you to select the precise timing of your captions. Here we start the video.

[Video:] Thanks to the cane my radius of action is increased…

[Narrator:] As we know the video information and we want to focus on the audio part, we choose a different layout, which offers us more screen space for the audio presentation. We now see three parts on the screen: The audio waves, including a set of controls; the working area, where we will write and edit our caption texts; and the subtitle view in list presentation.

A good way to start is to create your texts in any text editor. Here you can see a file with our text lines. Please keep your lines short as they have to fit on the screen. We copy our texts and paste them as caption texts into Aegisub.

Every line in our text is now one line of caption.

Please note how the start and end timings are set to zero, as the program has no timing information yet. We click the first caption line and search for the matching section in the audio file. We select this audio section with the mouse. Please note how the timing changes in the caption line. The red line indicates the start of the caption timing and the blue line indicates the end.

[Video:] Hello, my name is Josiane.

[Narrator:] Step by step we select the different lines and select the timing. If two timings overlap, the program will mark the lines in red. You can fix the overlapping by moving the red or blue line with the mouse. Of course, you can also type in the correct time code directly.

Note the display of the characters-per-second number. The more characters we have on a line, the higher the number. As the number increases, the background of the number gets redder and redder. This warning indicates that there will not be enough time for a user to read your text. Try to avoid more than 15 characters per second. Never exceed 20 characters per second. If your line gets too long and you want to split it, position the cursor where the line should be split and press Shift + Return. The \N character indicates the position of the line feed. Never use more than two lines, as the text could cover important video information.
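The reading-speed check that such tools perform can be sketched in a few lines of Python. The thresholds follow the 15/20 characters-per-second guideline above; the function names are our own:

```python
def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Characters per second for one caption, ignoring line breaks."""
    visible = text.replace("\\N", "").replace("\n", "")
    duration_s = (end_ms - start_ms) / 1000
    return len(visible) / duration_s

def reading_speed_warning(cps: float) -> str:
    """Classify a caption against the 15/20 cps guideline."""
    if cps > 20:
        return "too fast"    # never exceed 20 cps
    if cps > 15:
        return "borderline"  # try to stay at or below 15 cps
    return "ok"

# A 26-character caption shown for 2 seconds gives 13 cps,
# which is within the recommended range.
cps = chars_per_second("Hello, my name is Josiane.", 1000, 3000)
```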

The program offers a lot of options to format text. As we do not want to burn subtitles permanently into the video, and the SubRip format does not offer any styling possibilities, we will not demonstrate those features here. Of course, you can always add or remove caption lines from the subtitle view. Just select the line, right-click to open the context-sensitive menu and choose one of the line operations.

Let’s assume we are pleased with the current result. To save our work, we choose File, Save, from the menu bar. The native file format of Aegisub has the file suffix .ass. This file format is a generic container of your work. Most video players will not be able to work with this file.

Therefore, we export the captions via File, Export As. We ignore all the options and press the Export button. The program allows us to choose between different output formats. We choose the SubRip format. Our filename should match the name of our video file and include a language code and the file suffix .srt.

The SubRip caption file can now be distributed together with our video file or uploaded to an online video portal to improve the accessibility of our video.

Aegisub is a very powerful program. We have not covered all of its features here. Our intention was to offer you a quick start on the creation of captions with Aegisub. For more information, please consult the program manual.

Audio description

While captions translate audio information into a visual form for people with auditory impairments, audio descriptions translate visual information into an audible form for persons with visual disabilities.

Audio description is a generic concept which does not only apply to videos. It is used in television and cinema as well as in live events such as theatre. Audio description is usually offered as an optional service, as, for example, an additional audio track which can be activated or deactivated by the user.

Audio description aims to create an understanding of what can be seen in the picture. The visual information to be described is selected based on the context. It is often more important to describe who is coming through the door than the colour of the door.

Acoustic content should not be repeated in the audio description; for example, “The phone is ringing”. Audio description should use the acoustic pauses of the film. It should never overdub audio passages, such as a conversation, with content. The goal is to use the acoustic pauses to make all the necessary visual information available to a visually disabled person as completely and comprehensibly as possible.

It is important to be selective. The shorter the available time, the more filtering is required. If there is more time available than required, then silence is often the better choice. It is important to find the right balance. A good film producer knows about the need for audio description and creates speech pauses deliberately in order to leave space for the audio description.

However, often an audio description needs to be created for an existing film production. In this case, we have to use the audio pauses that are available. If there are no pauses, for example, if a narrator speaks for the entire duration of the film, then an audio description is not feasible.

The audio description should be spoken in a calm, clearly understandable voice. The audio description itself does not create content or suspense. It should be objective and not judgemental. It should always remain comprehensible and not use technical vocabulary. The audio description always remains clearly distinguishable from the film sound due to its even pace of speech. The audio description of a film should always be spoken by the same person.

Correct timing: An audio description should be done in a synchronised manner. It should not give away information that a sighted viewer would not yet have. Sudden twists and turns should not be described before the visual presentation to avoid taking away the surprise. Similarly, it should not describe actions that were visible a long time before. An audio description should not leave out anything that might be necessary to understand subsequent parts of the film. This means that to create a good audio description, you need to know the entire content. This is often not possible for live events.

Here comes a short example. To make the most of this example, close your eyes while the video is playing. We will play it once without and once with the audio description.

[Video:] [Background noises.] Hello, my name is Josiane. My white cane signals you that I am visually disabled. Thanks to the cane my radius of action is increased by nearly one metre. [Background noises.]

[Audio description:] The train platform is empty. Only one person stands on the train platform. A white cane gets unfolded. Feet on the train platform. A white cane touches the markers on the floor.

[Video:] Hello, my name is Josiane. My white cane signals you that I am visually disabled. Thanks to the cane my radius of action is increased by nearly one metre.

[Audio description:] Josiane is walking to the stairs. While climbing the stairs, she holds the white cane loosely between two fingers so that it can swing against the steps. This way, the white cane indicates every following step.

Where to continue?

This is the final chapter of this tutorial. Depending on your personal interests, you could review the following chapter:

  • Text transcripts, captions and sign language

[Automated voice:] Accessibility. For more information visit: op.europa.eu/web/accessibility.
