
Repairing PDFs in Adobe Acrobat

[Narrator:] In the previous chapters we saw how to create an accessible PDF using an appropriate authoring program. We saw how to test the resulting PDF for its level of accessibility. If the success criteria fail, then we should go back to our authoring program and fix the problem there.

However, for various reasons we often cannot make use of the authoring program:

  • If we do not have a license.
  • If the program is so outdated that we cannot install it on a current operating system.
  • If we received the PDF from an external source and we do not have access to the original file format.
  • If the reported problem cannot be fixed inside the authoring program; not every program can create a tagged PDF.

In all of these scenarios we cannot modify the original file to create a new PDF. So we need to be able to repair the existing PDF document.

Repairing a PDF with Adobe Acrobat

For demonstration purposes, we will repair a PDF file that was created using a printer driver. As a result, this document has no bookmarks and no tags, which means it has no headings, no lists and no tables. The images have no alternative text and, of course, there is no metadata. Reading a document like this presents a real challenge to a user with vision disabilities.

Please note that this course can only give a small introduction to this topic. We cannot cover all the aspects and features that Adobe Acrobat has for repairing a PDF. This chapter should just give you a quick overview so that you can decide if you want to follow this approach.

As we saw in a previous chapter, this document has major accessibility problems. The built-in Full Check indicates many issues that need to be resolved. There are many built-in functions we can use, or we can ask the program to guide us.

The program offers us a guide in the form of an Action Wizard. We select this and continue with Make Accessible. The program will show a workflow with different actions to be completed. To begin, press the Start button. First we define the meta information. We can add or modify the title, the subject, the author, and the keywords.

The program now allows us to convert images of text into readable text. This step is called OCR or Optical Character Recognition. We define the language of our document so the OCR process can use an appropriate dictionary to improve word detection. Current PDF versions do not allow the definition of more than one main language in a single document. Our document is in English.

In the next step, the program asks us if our document contains any forms, so it can implement fillable form fields. Please note that PDF form handling is beyond the scope of this course. As our document has no form, we just press No, Skip this Step.

Acrobat will now try to detect all the images so that we can add an appropriate alternative text. The program has found an image that needs alternative text, which we can enter in this dialogue. The dialogue will reappear for all images in the document.

The last step of the Make Accessible action is to check the result. We would like to know how accessible our document is now, so we press Start Checking. Most of the test result indicators have turned from red to green, so this is a big step forward.

Let's look at the details. Under Tables, the program reports that no headers could be found. Under Headings, the test fails because the headings are not appropriately nested. We begin by repairing the table. By pressing the right mouse button we open the context-sensitive menu and select Show in Tags panel. Here we can see all the newly created tags. If we open the table tags and the table row tags, we can see that the first row has been detected as a data row and not as a header row.

By pressing the right mouse button, we open the context-sensitive menu and select Properties. The tag type is Table data cell. We can correct this and select Table header cell. We need to do this for all three header cells. Note how the tag type in the tree changes from TD to TH.
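The same correction can be pictured outside Acrobat. The following is a minimal Python sketch, assuming the tag tree is modelled as nested dictionaries. This model, the function name promote_header_row and the sample cell texts are hypothetical and are not part of Acrobat's interface.

```python
# Hypothetical in-memory model of a tag tree; this is NOT Acrobat's API.
def promote_header_row(table):
    """Retag every TD cell in the first TR of a table as TH.

    Returns the number of cells that were changed.
    """
    rows = [child for child in table["children"] if child["type"] == "TR"]
    if not rows:
        return 0
    changed = 0
    for cell in rows[0]["children"]:
        if cell["type"] == "TD":
            cell["type"] = "TH"
            changed += 1
    return changed


# Sample table (invented content): the header row was detected as data cells.
table = {
    "type": "Table",
    "children": [
        {"type": "TR", "children": [
            {"type": "TD", "text": "Name"},
            {"type": "TD", "text": "Role"},
            {"type": "TD", "text": "Office"},
        ]},
        {"type": "TR", "children": [
            {"type": "TD", "text": "Ada"},
            {"type": "TD", "text": "Editor"},
            {"type": "TD", "text": "Brussels"},
        ]},
    ],
}

changed = promote_header_row(table)
```

In this sketch only the first row is touched, which mirrors what we did by hand for the three header cells.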

The check reported that the nesting of the headings is not correct, so let's look at the headings. The program has classified the first heading as a heading of level 3. This is wrong. As the first heading, this should be a heading of level 1. We can repair this by clicking the right mouse button to open the context-sensitive menu and selecting Properties. We can now change the type to Heading Level 1. We go through the different tags of the document while comparing it to its visual presentation. Here we reach a heading level 2, but it was detected as a normal paragraph. We can fix this in the same way. And another paragraph that should be a heading level 2, and another one. And here, a paragraph that should be level 3. As you can see, the longer the document and the more elements it contains, the more manual work needs to be done.
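To make the nesting rule concrete, here is a small Python sketch. It assumes a simplified rule: the first heading must be level 1, and a heading may go at most one level deeper than the heading before it. Acrobat's own check may apply different or additional rules. The function name and the sample level lists are hypothetical.

```python
def find_heading_problems(levels):
    """Return (index, message) pairs for headings that break the nesting rule.

    levels is the list of heading levels in document order, e.g. [1, 2, 2, 3].
    """
    problems = []
    previous = 0
    for index, level in enumerate(levels):
        if index == 0 and level != 1:
            problems.append((index, f"first heading is H{level}, expected H1"))
        elif level > previous + 1:
            problems.append(
                (index, f"H{level} follows H{previous}, skipping a level")
            )
        previous = level
    return problems


# Invented example: the first heading was detected as level 3.
detected = find_heading_problems([3, 2, 2, 3])
# The same outline after the first heading is retagged as level 1.
repaired = find_heading_problems([1, 2, 2, 3])
```

Here detected contains one entry for the first heading, and repaired is empty.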

Reading order

Adobe Acrobat offers two different ways to define the sequence of elements. The Reading Order panel provides a visual representation of the order in which content is read by Adobe Acrobat and Adobe Acrobat Reader's Read Out Loud text-to-speech tool. Other third-party text-to-speech tools use this order as well.

The Tags panel displays the logical document structure that assistive technologies use to interpret the document. The logical structure defines the reading order and identifies elements such as headings, lists, tables, and other components that assistive technology users rely on for document navigation.

To achieve compatibility with screen readers and text-to-speech reading software, the sequence of elements in the Reading Order panel and the Tags panel should match. Initially, the Order panel view only lets you change the order of elements in a coarse way. For a more detailed view of the elements, deactivate the Display like elements in a single box switch.

By dragging and dropping elements in the Order tree, we can change the reading sequence. Here we exchange the sequence of paragraphs. As mentioned before, screen readers use the information in the Tags tree to read the elements of the document. A screen reader goes through the information tree to determine the sequence and type of an element.

By dragging and dropping elements in the Tags tree, we can modify the reading sequence. Here we change the paragraph sequence. Please remember that for maximum compatibility with different types of reading software, keep the sequence in the Order panel and the Tags panel in sync.
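What "in sync" means can be sketched in a few lines of Python. The sketch assumes that each panel's sequence has been written down by hand as a list of element labels; Acrobat does not export such lists in this form, and the function name and labels are hypothetical.

```python
def first_mismatch(order_panel, tags_panel):
    """Return the first position where the two sequences differ, or None."""
    for position, (in_order, in_tags) in enumerate(zip(order_panel, tags_panel)):
        if in_order != in_tags:
            return position
    if len(order_panel) != len(tags_panel):
        # One sequence is a prefix of the other: they diverge where it ends.
        return min(len(order_panel), len(tags_panel))
    return None


# Invented example: two paragraphs are swapped in the Tags tree.
order_panel = ["Heading 1", "Paragraph A", "Paragraph B"]
tags_panel = ["Heading 1", "Paragraph B", "Paragraph A"]
position = first_mismatch(order_panel, tags_panel)
```

A result of None means the two sequences agree; any other value points to the first element that needs to be moved in one of the panels.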

Artifacts

If an image has no specific content and is for illustration purposes only, we can mark it as an artifact so it will be ignored by a screen reader. Marking an image as an artifact will delete it from the Tags tree. There are three ways to achieve this goal:

  • Option 1: In the Tags tree, move the mouse over the image element and open the context-sensitive menu with the right mouse button. Select the Change Tag to Artifact… option to delete the image from the Tags tree.
  • Option 2: In the Order tree, move the mouse over the image element and open the context-sensitive menu with the right mouse button. Select the Tag as background/artifact option to delete the image from the Tags tree.
  • Option 3: Open the Reading Order tool and select the image. Activate the Background/Artifact button to delete the image from the Tags tree.

One remark: While the Set Alternative Text action is running, Acrobat can detect all the images in the document and display those with missing alternative text. The dialogue box can mark an image as a Decorative figure. This embeds the image in a PDF artifact tag. Although there are references that recommend using this approach to hide images from a screen reader, we cannot recommend this option. In practical tests, screen readers often presented this kind of image to the user, even though it was marked with an artifact tag. For maximum compatibility with all screen readers please use one of the previously mentioned three options to delete the image from the Tags tree.

Bookmarks

Although the Make Accessible action can do a lot, it cannot decide what information to use as a bookmark. We have to do this manually. First we select the bookmark pane on the left-hand side of the window. As we can see, there are no bookmarks yet. Next, we select the text that we have chosen to be a bookmark. By pressing the plus icon in the bookmark pane we can add this text as a bookmark. We select another text and press the plus icon again. The new bookmark has been created on the same level as the previous one. As these are headings of different levels, this is not what we wanted to have, so we use the mouse to move the second bookmark under the first one. Following this strategy we can build a bookmarks navigation tree.
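The nesting strategy can be sketched in Python. The sketch assumes we already have the headings as a list of (level, title) pairs; the function name and the sample titles are hypothetical, and Acrobat itself does not build bookmarks this way for us.

```python
def build_bookmark_tree(headings):
    """Nest (level, title) pairs so each bookmark sits under its parent heading."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # path from the root to the most recently added bookmark
    for level, title in headings:
        node = {"title": title, "level": level, "children": []}
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


# Invented example outline.
headings = [
    (1, "Introduction"),
    (2, "Background"),
    (2, "Scope"),
    (1, "Results"),
]
tree = build_bookmark_tree(headings)
```

In this example the two level 2 headings end up under Introduction, just as we moved the second bookmark under the first one with the mouse.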

Manual tagging

Sometimes the automatic detection of tagged elements does not work properly. Here is an example. We use the same document before making it accessible and try to create the tags manually by using the Reading Order panel. We draw a rectangle around an element and mark it with the appropriate type. Let's create the different heading levels. Looks good. We continue with texts. Now the image and its alternative text.

However, manual tagging does not provide the same level of detail in the tagging structure. As you can see, we cannot create tags for lists. Now let's try to do the same with the table. If we check the table rows, we can see a difference in the number of columns per row. The program was not able to detect the separation of data cells correctly. We try to repair this by collecting all texts of a data cell under the same TD tag. Oops, the content does not make sense any more as the text lines have been split into multiple text blocks. We can see how the program split the table using the table editor. Although Acrobat offers functions for modifying tables, our practical experience was more than frustrating. It is very difficult to repair a damaged table so it can be used by a screen reader.

Here is the procedure we can use to prevent this:

  • First, declare all the table cells individually.
  • Second, define the table and table row tags in the Tags tree.
  • Third, move the table cells into the correct tree positions.

As you can see, this can be a really time-consuming manual task when working with large tables.
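The three steps above can be sketched in Python. The sketch assumes the cells have already been declared one by one and are listed in reading order, that every row has the same number of columns, and that the first row is the header row. The function name, the dictionary model and the sample cell texts are hypothetical.

```python
def assemble_table(cells, columns):
    """Group individually declared cells into TR rows under one Table tag."""
    if columns <= 0 or len(cells) % columns != 0:
        raise ValueError("cell count must be a multiple of the column count")
    rows = []
    for start in range(0, len(cells), columns):
        # Assumption: the first row holds the header cells.
        cell_type = "TH" if start == 0 else "TD"
        row_cells = [
            {"type": cell_type, "text": text}
            for text in cells[start:start + columns]
        ]
        rows.append({"type": "TR", "children": row_cells})
    return {"type": "Table", "children": rows}


# Invented example: six cells, three columns.
cells = ["Name", "Role", "Office", "Ada", "Editor", "Brussels"]
table = assemble_table(cells, 3)
```

The check on the cell count reflects the problem we saw earlier: rows with a different number of columns are a sign that the cells were not separated correctly.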

Multilingual documents

Sometimes parts of your document may be in a different language, for example if you want to insert quotes using the original language.

Here we have an example document that was created in Adobe InDesign. The document has three paragraphs in three different languages. In InDesign we have set the language for each paragraph. When we open the Properties window and activate the Content tab, we can see that the language of this paragraph was set to German. Unfortunately, this value was not set for the tag of the same paragraph element.

When we activate the Tag tab, we cannot see any language setting here. So we have to set it manually for the three paragraphs. After saving the PDF, the screen reader can now make use of the different languages. Let's see how these paragraphs are presented by a screen reader:

[Screen reader:] In page 1, three items. I am a dummy copy. And I have been dummy copy since my birth. It took me a long time to realise what is to be a dummy copy. You make no sense.

[Screen reader reading the text in German.]

[Screen reader reading the text in French.]

[Narrator:] This is all we wanted.
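Why the language has to be set on the tag can be sketched in Python. The sketch assumes a simplified lookup in which a reader uses the language of the nearest tag that has one and falls back to the document language otherwise. How a particular screen reader actually resolves the language is not covered here, and the function name, the dictionary model and the language codes are hypothetical.

```python
def effective_language(tag, document_language):
    """Return the language of the nearest tag that sets one, else the document's."""
    node = tag
    while node is not None:
        if node.get("lang"):
            return node["lang"]
        node = node.get("parent")
    return document_language


# Invented example: one paragraph tag carries a language, the other does not.
section = {"lang": None, "parent": None}
german_paragraph = {"lang": "de-DE", "parent": section}
untagged_paragraph = {"lang": None, "parent": section}

german = effective_language(german_paragraph, "en-GB")
fallback = effective_language(untagged_paragraph, "en-GB")
```

In this model the paragraph without a language on its tag is read in the document language, which matches what we saw before setting the value manually.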

Where to continue?

We have seen how to improve the accessibility of an existing PDF using Adobe Acrobat. As part of this course we could only offer an introduction to this topic. For more detailed instructions, please refer to the user manual.

Converting an existing PDF into a more accessible PDF can be time-consuming and expensive, and the results are often of poor quality. If possible, fix accessibility problems in the original authoring program.

This is the final chapter on PDFs. Depending on your personal interests you could continue with one of the following chapters:

  • Introduction to web standards.
  • Introduction to accessible EPUB.

[Automated voice:] Accessibility. For more information visit: op.europa.eu/en/web/accessibility.
