Requirements for automated PDF testing
The primary requirement of automated PDF testing is that all content is tagged, as per the "Ensure all content is tagged" Best Practice. If the PDF is not tagged, a violation for untagged content is returned and no other automated tests are run.
Automated testing of PDFs checks that content is tagged at two levels, each explained in greater detail below:
- Check that the Document is Tagged
- Check that all of the content in the Document is tagged
Document is Tagged
- Check for the document catalog (see 7.7.2 Document Catalog in the PDF 32000-1:2008 Specification). If one does not exist, then that is an indicator that this Document is not tagged.
- Check the mark information dictionary (14.7 Logical Structure - 14.7.1 General - Table 321 - Entries in the mark information dictionary).
-- If the dictionary does not exist, this is an indicator that the document is not tagged.
-- If the Marked (boolean) entry in the dictionary is FALSE, then this is an indicator that the document is not tagged.
-- If the Suspects (boolean) entry in the dictionary is TRUE, then this is an indicator that the document is not properly tagged.
When the PDF is not encrypted, we can also check whether the StructTreeRoot (14.7.2 Structure Hierarchy in the PDF 32000-1:2008 Specification) exists. If it does not, this too is an indicator that the document is not tagged.
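The document-level checks above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the document catalog has already been parsed into a plain Python dict (or None if it is missing), and the function name tagging_indicators and its encrypted parameter are hypothetical. The keys MarkInfo, Marked, Suspects and StructTreeRoot are the names used in the PDF 32000-1:2008 Specification.

```python
from typing import Optional


def tagging_indicators(catalog: Optional[dict], encrypted: bool = False) -> list:
    """Return the reasons a document looks untagged (an empty list means none found).

    `catalog` is assumed to be the document catalog parsed into a plain dict.
    """
    if catalog is None:
        return ["no document catalog"]

    reasons = []
    mark_info = catalog.get("MarkInfo")
    if mark_info is None:
        reasons.append("no mark information dictionary")
    else:
        # Both entries default to false in the specification when absent.
        if not mark_info.get("Marked", False):
            reasons.append("Marked is false")
        if mark_info.get("Suspects", False):
            reasons.append("Suspects is true")

    # The structure tree root is only inspected when the PDF is not encrypted.
    if not encrypted and "StructTreeRoot" not in catalog:
        reasons.append("no StructTreeRoot")
    return reasons


if __name__ == "__main__":
    tagged = {"MarkInfo": {"Marked": True}, "StructTreeRoot": {}}
    print(tagging_indicators(tagged))
    print(tagging_indicators({"MarkInfo": {"Marked": False}}))
```

Returning a list of reasons rather than a single boolean keeps each indicator visible, which matches the way the checks above are described as separate indicators rather than one combined verdict.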
Content in the Document is tagged
The automated check uses the content stream to identify unique content.
It then verifies that each content item corresponds to or is part of a Tag in the Tag tree. See 14.8.2 Tagged PDF and Page Content in the PDF 32000-1:2008 Specification.
Content items identified as 'Artifact' are not reported as untagged.
Note that there are also special considerations for 'span' Content Elements related to accessibility information.
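The content-level check can be illustrated with a simplified sketch. It assumes the content stream has already been reduced to a list of content items, each carrying its enclosing marked-content tag and marked-content identifier (MCID) if it has one, and that the MCIDs referenced from the Tag tree have been collected into a set. The ContentItem class and untagged_items function are hypothetical names for illustration only. The sketch also treats the document as a single page, and it does not model the special handling of 'span' content elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentItem:
    description: str
    tag: Optional[str] = None   # enclosing marked-content tag, e.g. "P" or "Artifact"
    mcid: Optional[int] = None  # marked-content identifier, if the item has one


def untagged_items(items, structure_mcids):
    """Return the items that are neither artifacts nor referenced from the Tag tree."""
    untagged = []
    for item in items:
        if item.tag == "Artifact":
            # Artifacts are deliberately outside the Tag tree, so skip them.
            continue
        if item.mcid is None or item.mcid not in structure_mcids:
            untagged.append(item)
    return untagged


if __name__ == "__main__":
    items = [
        ContentItem("heading text", tag="H1", mcid=0),
        ContentItem("page number", tag="Artifact"),
        ContentItem("stray line"),
        ContentItem("orphan paragraph", tag="P", mcid=7),
    ]
    for item in untagged_items(items, structure_mcids={0}):
        print(item.description)
```

In this sketch an item counts as tagged only when its MCID is actually referenced from the Tag tree, so content that is wrapped in a marked-content sequence but never attached to a Tag is still reported.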