Acrobat SDK User’s Guide
Creating PDF Documents
NOTE: Server use of the Distiller software is not allowed. The End User License Agreement
allows for use only on a single system. Access to, and use of, the Distiller software
over a network is prohibited. The only exception is installation of the software. You
are permitted to keep a copy of the software on a server so that users who have a
license for the software can download and install it. A separate product, Acrobat
Distiller Server, can be purchased or licensed from Adobe for server use.
Creating Tagged PDF Documents
PDF files are well known for representing the physical layout of a document; that is, the
page markings that comprise the page contents. In addition, PDF versions 1.3 and beyond
provide a mechanism for describing logical structure in PDF files. This includes information
such as the organization of the document into chapters and sections, as well as figures,
tables, and footnotes.
PDF 1.4 and Acrobat 5 introduced tagged PDF, which is a particular use of structured PDF
that allows page content to be extracted and used for various purposes, including:
● Reflow of text and graphics
● Conversion to file formats such as HTML and XML
● Access for the visually impaired (see Chapter 14, “Accessibility”).
PDF Logical Structure
PDF logical structure is layered on top of a document’s page contents using a special
markup language. HTML and XML use a similar layout for logical structure: text enclosed in
a hierarchy of tags. In HTML, each component is wrapped with a set of tags that define its
structure. For example, the text of a top-level header begins with a <h1> tag and ends with
a </h1> tag. PDF provides similar constructs with its marked content operators.
In fact, HTML logical structure can be preserved in a PDF document. The Web Capture
feature introduced in Acrobat 4.0 converts HTML to PDF. The resulting PDF document may
optionally contain structure information from the HTML data, and Acrobat can generate
bookmarks from this structure data.
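
The parallel between HTML tags and marked content operators can be illustrated with a
minimal sketch. It assumes the standard BDC and EMC content stream operators and an MCID
property entry; the helper names marked_content and show_text, the font resource name F1,
and the coordinates are hypothetical and chosen only for illustration. The code builds
the text of a content stream fragment and does not produce a complete PDF file.

```python
def show_text(text: str, font: str = "F1", size: int = 24,
              x: int = 72, y: int = 720) -> str:
    """Return text-showing operators for one line of text (hypothetical helper)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"BT /{font} {size} Tf {x} {y} Td ({escaped}) Tj ET"


def marked_content(tag: str, mcid: int, body: str) -> str:
    """Wrap operators in a BDC ... EMC marked content sequence (hypothetical helper).

    The tag plays the role of <h1>, and EMC plays the role of </h1>.
    The MCID value lets a structure element refer back to this sequence.
    """
    return f"/{tag} <</MCID {mcid}>> BDC\n{body}\nEMC"


heading = marked_content("H1", 0, show_text("Creating Tagged PDF Documents"))
print(heading)
```

The printed fragment opens with the tag and its property list, contains the ordinary
marking operators, and closes with EMC, much as an HTML heading is enclosed by <h1> and
</h1>.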
The Structure Tree
Logical structure is independent of, though related to, the page content (that is, the actual
marks on the page made by the marking operators).
In a PDF document, logical structure is represented by a tree of elements called a
structure tree. There are pointers from the logical structure to the page contents, and
vice versa. The structure tree provides additional capability to navigate, search, and
extract data from PDF documents. By accessing a PDF document via its structure tree, for
instance, you can obtain logically ordered content independently of the drawing order of
the page contents.
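
The difference between drawing order and logical order can be shown with a small
in-memory model. This is a sketch under stated assumptions, not the Acrobat SDK API: the
StructElem class, the logical_text function, and the sample page data are all
hypothetical. Each leaf of the tree is an integer that stands in for the marked content
identifier (MCID) of a piece of page content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class StructElem:
    """Hypothetical structure element: a tag plus child elements or MCIDs."""
    tag: str
    kids: List[Union["StructElem", int]] = field(default_factory=list)


# Page content in drawing order as (MCID, text); the footnote happens to be drawn first.
page_marks = [(2, "A footnote."), (0, "Chapter title"), (1, "Body paragraph.")]

# Structure tree in logical order; integer kids point at page content by MCID.
tree = StructElem("Document", [
    StructElem("H1", [0]),
    StructElem("P", [1]),
    StructElem("Note", [2]),
])


def logical_text(elem: StructElem, content: Dict[int, str]) -> List[str]:
    """Collect text by depth-first traversal of the structure tree."""
    out: List[str] = []
    for kid in elem.kids:
        if isinstance(kid, int):
            out.append(content[kid])                # leaf: look up page content
        else:
            out.extend(logical_text(kid, content))  # interior node: recurse
    return out


content = dict(page_marks)
drawing = [text for _, text in page_marks]
logical = logical_text(tree, content)
print(drawing)
print(logical)
```

Walking the tree yields the heading, then the paragraph, then the footnote, regardless
of the order in which those marks were painted on the page.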