EPUB 2 sunset
Latest Draft: 13 August 2018
Ric Wright
Luc Audrain
Dave Cramer
George Kerscher
Although EPUB 3 has been a recommended specification since 2011, many publishers are still creating EPUB 2 files. This white paper describes the many advantages of moving to EPUB 3, and recommends that all publishers stop creating EPUB 2.
EPUB has been around since 1999, although it was known as OEB back then. What we know as EPUB 2 was finalized in 2007. OEB and EPUB have always had the great strength of adapting the content to the reader, in marked contrast to PDF. And they were built on existing standards like XHTML 1.1 and CSS 2, although EPUB has always relied on "profiles" of these standards, often supporting only a subset.
EPUB 3 was designed in 2010 by the IDPF, with help from the DAISY Consortium, on modern and sustainable standards from the Open Web Platform (HTML5 and CSS3). It addresses digital publishing needs in several areas, including:
- Structured documents
- Enhanced typography
- Better user experience
- Support of any language
- Improved accessibility
- Richer navigation
- MathML support
EPUB 3 was chartered in May 2010 and became a recommendation in 2011. However, years later, a significant proportion of the EPUBs produced today are still EPUB 2. This white paper explores why this might be, and shows why migrating to EPUB 3 makes sense in so many ways.
Why should EPUB 2 producers switch to EPUB 3? The short answer is to enable delivery of richer, more accessible content. EPUB 3 is simply more capable, more expressive, more powerful. The following sections examine various use-cases, which show how EPUB 3, based on today's web technology, leaves EPUB 2 behind.
EPUB 2 content documents are based on XHTML 1.1, which is essentially HTML 4 expressed as XML.
EPUB 3 is based on the XML serialization of HTML5. HTML5 has a much richer vocabulary for expressing the content of complex documents, including the section, header, figure, and aside elements. HTML5 also includes native support for audio and video. This benefits both the creation and the consumption of content. Most publishers have production tools that work with structured documents, and it is now easier to transform those documents into HTML5 while preserving this critical semantic information for use in other products and systems. This richer markup also allows EPUB 3 reading systems to provide new features, as we will see below.
Content creators can also leverage EPUB 3's Structural Semantics Vocabulary, which allows even richer tagging of document content than is built into HTML5.
Because EPUB 2 has limited support for high-quality typography, ebook composition of even simple text has been reduced to a level that would not be acceptable in the printing industry. EPUB 3, on the other hand, enables high-quality text rendering by leveraging ongoing improvements in CSS3, including drop caps and text wrapping around images.
EPUB 3 reading systems can leverage the richer semantic markup that HTML5 provides to create better user experiences. For example, several reading systems combine HTML5's aside element with EPUB's structural semantics vocabulary (the epub:type attribute) to implement pop-up footnotes, so that readers don't lose their place in the main text while navigating back and forth to notes. There is an excellent article by Liz Castro on how this can be done.
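As a rough illustration of how a reading system might find pop-up footnotes, the Python sketch below uses the standard library's ElementTree. It looks for links marked `epub:type="noteref"` and resolves them to `aside` elements marked `epub:type="footnote"` in the same document. The sample markup and the function name `find_footnotes` are hypothetical. Real reading systems also handle notes in other files, rendering, and more.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"

# Hypothetical chapter markup: a noteref link pointing at a footnote aside.
CHAPTER = f"""<html xmlns="{XHTML}" xmlns:epub="{EPUB}">
  <head><title>Chapter 1</title></head>
  <body>
    <p>Alice was beginning to get very tired<a epub:type="noteref" href="#n1">1</a>.</p>
    <aside epub:type="footnote" id="n1"><p>A hypothetical note.</p></aside>
  </body>
</html>"""


def find_footnotes(xhtml: str) -> dict[str, str]:
    """Map the target id of each same-document noteref to its footnote text."""
    root = ET.fromstring(xhtml)
    epub_type = f"{{{EPUB}}}type"  # epub:type in ElementTree's {namespace}name form

    # Index every aside whose epub:type includes "footnote" by its id.
    notes = {}
    for aside in root.iter(f"{{{XHTML}}}aside"):
        if "footnote" in aside.get(epub_type, "").split():
            notes[aside.get("id")] = "".join(aside.itertext()).strip()

    # Resolve each noteref link of the form href="#id" to the matching note.
    popups = {}
    for link in root.iter(f"{{{XHTML}}}a"):
        href = link.get("href", "")
        if "noteref" in link.get(epub_type, "").split() and href.startswith("#"):
            target = href[1:]
            if target in notes:
                popups[target] = notes[target]
    return popups


# A reading system could show this text in a pop-up instead of jumping away.
print(find_footnotes(CHAPTER))
```

Because the sketch keys everything on epub:type values rather than on class names, it works regardless of how the publisher styles the notes.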
EPUB 2 cannot support right-to-left text, bidirectional (bidi) text, or vertical writing. EPUB 3 supports virtually every language in the world, including Arabic, Chinese, horizontal and vertical Japanese, and Hebrew. RTL and LTR languages can be mixed.
EPUB 2 has some basic accessibility features, but is not capable of full compliance with the latest Web accessibility guidelines (WCAG), which depend on the richer semantics offered by HTML5 and included in EPUB 3.
EPUB 3 now has a formal accessibility standard, which provides guidance for authors, and enables certification of quality for readers. The DAISY Consortium has even created an accessibility checker for EPUB 3.
EPUB 3 also supports media overlays, which provide synchronized audio narration and are widely used by people with print disabilities. In EPUB 3, these books are created by using Media Overlay Documents to describe the timing of the pre-recorded audio narration and how it relates to the EPUB Content Document markup. The file format for Media Overlays is defined as a subset of SMIL, a W3C recommendation for representing synchronized multimedia information in XML.
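The Python sketch below makes the Media Overlay structure concrete. It parses a hypothetical overlay document and lists which text fragment is narrated by which audio clip. It relies on the SMIL `par`/`text`/`audio` pattern and the `clipBegin`/`clipEnd` attributes. The file names and timings are invented. The helper `clock_to_seconds` is a simplified, hypothetical parser that handles only plain clock values and the `ms`/`s`/`min`/`h` suffixes, not every form SMIL allows.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SMIL = "http://www.w3.org/ns/SMIL"

# Hypothetical Media Overlay Document narrating two paragraphs of chapter 1.
OVERLAY = f"""<smil xmlns="{SMIL}" version="3.0">
  <body>
    <par id="par1">
      <text src="chapter01.xhtml#para1"/>
      <audio src="audio/chapter01.mp3" clipBegin="0s" clipEnd="4.5s"/>
    </par>
    <par id="par2">
      <text src="chapter01.xhtml#para2"/>
      <audio src="audio/chapter01.mp3" clipBegin="0:00:04.5" clipEnd="0:00:09.25"/>
    </par>
  </body>
</smil>"""


def clock_to_seconds(value: str) -> float:
    """Convert a simple SMIL clock value to seconds (a subset of the full syntax)."""
    value = value.strip()
    if ":" in value:  # clock value such as hh:mm:ss.fraction or mm:ss
        seconds = 0.0
        for part in value.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    # Timecount values; "ms" must be tested before "s".
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("h", 3600.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # a bare number is read as seconds


def narration_map(smil_xml: str) -> list[tuple[str, str, float, Optional[float]]]:
    """Return (text fragment, audio file, clip start, clip end) for each par."""
    root = ET.fromstring(smil_xml)
    ns = {"smil": SMIL}
    clips = []
    for par in root.iter(f"{{{SMIL}}}par"):
        text = par.find("smil:text", ns)
        audio = par.find("smil:audio", ns)
        if text is None or audio is None:
            continue
        end = audio.get("clipEnd")  # absent clipEnd: play to the end of the file
        clips.append((
            text.get("src"),
            audio.get("src"),
            clock_to_seconds(audio.get("clipBegin", "0s")),
            clock_to_seconds(end) if end is not None else None,
        ))
    return clips


for fragment, audio_file, start, end in narration_map(OVERLAY):
    print(fragment, audio_file, start, end)
```

A reading system would use such a map to highlight each text fragment while its clip plays.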
One of the criticisms of EPUB 2 was that its support for navigation was fairly primitive. The NCX provided the machine-readable table of contents (TOC). The guide element provided optional links to specific sections of an ebook. But the NCX could not be formatted, not even to include an italic word. Authors had to live with the NCX’s limitations, or else provide a second, redundant TOC in HTML.
EPUB 3 replaced the NCX with the HTML nav element. It can be displayed to the reader with the full power of HTML and CSS, and it is processed by the reading system just as the NCX was before.
EPUB 3 also supports more specialized nav elements. The page-list nav element maps print page numbers to locations in the EPUB, which is crucial for accessibility in contexts like classrooms. The landmarks nav element can provide links to the fundamental structures of a publication.
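The Python sketch below shows how a reading system could use the page-list nav. It reads a hypothetical navigation document and builds a map from print page labels to content locations. The markup and the helper name `page_map` are assumptions. The `epub:type` values `toc` and `page-list` are the ones EPUB 3 uses to identify these nav elements.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"

# Hypothetical navigation document with a toc nav and a page-list nav.
NAV_DOC = f"""<html xmlns="{XHTML}" xmlns:epub="{EPUB}">
  <head><title>Navigation</title></head>
  <body>
    <nav epub:type="toc"><ol>
      <li><a href="chapter01.xhtml">I. Down the Rabbit-Hole</a></li>
    </ol></nav>
    <nav epub:type="page-list" hidden="hidden"><ol>
      <li><a href="chapter01.xhtml#page1">1</a></li>
      <li><a href="chapter01.xhtml#page2">2</a></li>
      <li><a href="chapter02.xhtml#page3">3</a></li>
    </ol></nav>
  </body>
</html>"""


def page_map(nav_xhtml: str) -> dict[str, str]:
    """Map print page labels to EPUB locations using the page-list nav."""
    root = ET.fromstring(nav_xhtml)
    epub_type = f"{{{EPUB}}}type"
    for nav in root.iter(f"{{{XHTML}}}nav"):
        if "page-list" in nav.get(epub_type, "").split():
            # Each link's text is the print page label; its href is the target.
            return {
                "".join(a.itertext()).strip(): a.get("href")
                for a in nav.iter(f"{{{XHTML}}}a")
            }
    return {}  # no page-list nav in this document


# e.g. a student asked to "turn to page 2" can jump straight to it.
print(page_map(NAV_DOC)["2"])
```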
EPUB 3 also introduced support for a subset of MathML, which enables authors to create EPUB documents with markup that is rendered as mathematical equations.
Novels and essays are the perfect example of highly textual books where EPUB 3 is a must. By enabling high quality typography and layout, readers are provided with first-class text composition that brings the pleasure of reading to the forefront - the way it should be!
With CSS3 layout techniques, text and graphic content can be highly designed while remaining responsive, adapting to different screen sizes. Specialized content such as mathematical equations benefits from the MathML support of EPUB 3.
Several official bodies endorse or even recommend EPUB 3 for textual works in digital form:
- Library of Congress, in LC preferences: "The Library of Congress Recommended Formats Statement (RFS) includes EPUB 3 as a preferred format for textual works in digital form."
- DAISY Consortium, in Baseline for Accessible EPUB 3
The following table summarizes the key features added in EPUB 3. For more detailed information, please see the official IDPF documentation.
Feature | Comment |
---|---|
HTML5 support | EPUB 3 supports HTML5 but still requires its XML serialization |
SVG documents in the spine | EPUB 3 allows SVG documents directly in the spine, whereas in EPUB 2 they had to be embedded in an XHTML page. However, reading system support for this feature is limited. |
Support for MathML | XHTML Content Documents support embedded MathML but limit its usage to a restricted subset of the full MathML markup language. |
Fixed Layout | |
Navigation | The table of contents is now required in HTML, as an EPUB Navigation Document. An NCX is still permitted, but the HTML TOC is required. |
Accessibility | Most notably the inclusion of ARIA attributes for making dynamic content accessible |
Linking | The IDPF has established a registry of linking schemes. EPUBCFI is the first scheme added to the registry, and can be used for linking into, between and within Publications. Reading System support for this scheme is required. |
Scripting | EPUB 3 Reading Systems may optionally support scripting, which was explicitly discouraged in EPUB 2. Scripted content must be identified as such in the package manifest [Publications30] |
Audio and video | Support for audio and video embedded via the HTML5 audio and video elements is strongly encouraged. Reading Systems should support at least one of the MP4/H.264 and WebM/VP8 video codecs. For audio, MP3 support is required and MP4 support is recommended. |
Media overlays | EPUB Media Overlays 3.0 defines a usage of [SMIL] (Synchronized Multimedia Integration Language), the Package Document, the EPUB® Style Sheet, and the EPUB Content Document for representing audio synchronized with the EPUB Content Document. |
Additional modules from CSS3 | EPUB 3 defines a profile of CSS based on CSS 2.1 with added modules from CSS3, whereas EPUB 2 was based on a specific subset of CSS 2. Refer to EPUB Style Sheets for more information. |
WOFF | EPUB 3 now requires Reading Systems to support both the OpenType and WOFF font formats for embedded fonts in conjunction with the CSS @font-face rules. |
Semantic Inflection | Addition of the epub:type attribute for semantic inflection, that is, expressing the structural meaning of markup |
Text to Speech | Multiple features to assist Text-to-Speech (TTS) engines have been added. These include Package-level PLS pronunciation lexicons, SSML attributes, and CSS3 Speech for enhanced text-to-speech playback |
Reading System Object | The epubReadingSystem object provides an interface through which a Scripted Content Document can query information about a user's Reading System. The object exposes properties of the Reading System (its name and version), and provides the hasFeature() method which can be invoked to determine which features it supports. |
The following EPUB 2 features have been removed or deprecated in EPUB 3:
Feature | Comment |
---|---|
DTBook | Now deprecated in favor of HTML and CSS markup for audio accessibility |
Out-of-Line XML Islands | A controversial and ultimately unused feature |
Triggers | The trigger element provided declarative control of audio and video content (cf. EPUB 3.0.1 trigger element). Authors are advised to use the native controls provided by the [HTML] audio and video elements. |
Bindings | EPUB no longer supports the use of bindings in the Package Document to provide an alternative scripted fallback for foreign resources embedded in an object element. The [HTML] object element's intrinsic fallback mechanism (embedded content) can be used to provide a Core Media Type fallback. |
Tours | The Package Document schema no longer includes the tours element, which was deprecated in OPF 2.0.1 and dropped entirely in EPUB 3. |
Filesystem Container | OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a "Filesystem Container" abstraction. This change, along with new restrictions in Publications 3.0 on references to remote resources, means that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container. EPUB files must therefore, in general, contain all constituent parts of the Publication, with certain well-defined exceptions. For more info, please see the discussion here. |
Guide | Use of the optional guide element in the Package Document has been deprecated in favor of the EPUB Navigation Document landmarks feature. Refer to EPUB Navigation Documents [ContentDocs30] for more information. |
NCX | The NCX has been superseded by HTML-based EPUB Navigation Documents (for example, nav.xhtml). |
2.0.1 meta element | The meta element defined in [OPF2] has been obsoleted and replaced by the new meta element, but may still be included as an optional repeatable child of the metadata element for compatibility purposes. |
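During a migration, a quick way to find EPUB 2 leftovers is to scan the package document for the constructs listed above. The Python sketch below flags the following in a package document:

- guide, tours or bindings elements
- an NCX manifest item
- old-style meta elements that use name/content

It covers only the package-level items from the table; DTBook content and the other rows need separate checks. The function name `epub2_leftovers` and the sample package are hypothetical. An NCX or a cover meta may be kept deliberately for older reading systems, so the sketch reports notes rather than errors.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NCX_TYPE = "application/x-dtbncx+xml"

# Hypothetical, trimmed EPUB 2 package document.
EPUB2_PACKAGE = f"""<package xmlns="{OPF}" version="2.0" unique-identifier="pubid">
  <metadata><meta name="cover" content="img01a"/></metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="{NCX_TYPE}"/>
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx"><itemref idref="titlepage"/></spine>
  <guide><reference type="title-page" href="titlepage.xhtml" title="Title page"/></guide>
</package>"""


def epub2_leftovers(opf_xml: str) -> list[str]:
    """List EPUB 2-era constructs found in a package document."""
    root = ET.fromstring(opf_xml)
    notes = []
    # Package-level elements removed or deprecated in EPUB 3.
    for name in ("guide", "tours", "bindings"):
        if root.find(f"{{{OPF}}}{name}") is not None:
            notes.append(f"<{name}> element")
    # An NCX item is still allowed, but only as a compatibility extra.
    for item in root.iter(f"{{{OPF}}}item"):
        if item.get("media-type") == NCX_TYPE:
            notes.append(f"NCX manifest item '{item.get('href')}'")
    # OPF 2 style meta elements use name/content instead of property.
    for meta in root.iter(f"{{{OPF}}}meta"):
        if meta.get("name") is not None:
            notes.append(f"OPF 2 meta name='{meta.get('name')}'")
    return notes


for note in epub2_leftovers(EPUB2_PACKAGE):
    print(note)
```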
This section takes a brief look at the structure and markup of EPUB 2 and EPUB 3 files.
The EPUB specifications do not require a particular folder structure or file naming, with the exception of the mimetype file and the contents of the META-INF folder. The specification has not changed significantly in this respect since EPUB 2. Best practice currently suggests the following structure:
```
mimetype
META-INF/container.xml
META-INF/encryption.xml
package.opf
EPUB/html
EPUB/css
EPUB/fonts
EPUB/js
EPUB/images
EPUB/svg
```
But in practice this is largely up to the author.
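The container itself follows strict rules. The mimetype file must be the first entry in the ZIP archive, stored uncompressed, and META-INF/container.xml must point at the package document. The Python sketch below uses the standard zipfile module to write such a skeleton. The helper name `write_epub_skeleton` and the default `package.opf` location (taken from the listing above) are assumptions. The package document content is only a placeholder, not a valid OPF.

```python
import io
import zipfile
from typing import IO, Union

# Standard OCF container.xml; {opf_path} is filled in by write_epub_skeleton.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target: Union[str, IO[bytes]], files: dict[str, str],
                        opf_path: str = "package.opf") -> None:
    """Write an OCF ZIP container: mimetype first and uncompressed, then the rest."""
    with zipfile.ZipFile(target, "w") as zf:
        # 1. The mimetype entry must come first and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # 2. container.xml tells reading systems where the package document lives.
        zf.writestr("META-INF/container.xml", CONTAINER_XML.format(opf_path=opf_path),
                    compress_type=zipfile.ZIP_DEFLATED)
        # 3. Everything else (package document, HTML, CSS, fonts, images...).
        for name, content in files.items():
            zf.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)


# Usage: build the archive in memory and list its entries in archive order.
buffer = io.BytesIO()
write_epub_skeleton(buffer, {"package.opf": "<!-- placeholder package document -->"})
with zipfile.ZipFile(buffer) as zf:
    print(zf.namelist())  # ['mimetype', 'META-INF/container.xml', 'package.opf']
```

Everything after the mimetype entry can be compressed and placed wherever the author likes, as long as container.xml points at the package document.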
This example of EPUB 2 and 3 package files is based on two versions of Alice in Wonderland, one EPUB 2, the other EPUB 3. As Alice is a very simple document, the changes are not radical, but they are critical. Almost all the differences are in the package (OPF) document.
Here is the EPUB 2 version.
```xml
<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" unique-identifier="pubid">
  <metadata>
    <dc:title>Alice's Adventures in Wonderland</dc:title>
    <dc:creator>Lewis Carroll</dc:creator>
    <dc:date xmlns:opf="http://www.idpf.org/2007/opf" opf:event="creation">2013-08-29</dc:date>
    <dc:subject>fiction</dc:subject>
    <dc:language>en-GB</dc:language>
    <dc:coverage>England - 19th Century</dc:coverage>
    <dc:rights>Public Domain</dc:rights>
    <dc:publisher>D. Appleton and Co</dc:publisher>
    <dc:identifier id="pubid">fab106a7-1f9f-4716-8c80-08932fe21b66</dc:identifier>
  </metadata>
  <manifest>
    <!-- fonts -->
    <item id="font0" href="fonts/MinionPro.otf" media-type="application/vnd.ms-opentype"/>
    ...
    <!-- navigation -->
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
    <!-- body content -->
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    ...
    <!-- styling -->
    <item id="css" href="style.css" media-type="text/css"/>
    <!-- images -->
    <item id="img01a" href="images/alice01a.gif" media-type="image/gif"/>
    ...
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="chapter01"/>
    ...
  </spine>
</package>
```
For comparison, here is the EPUB 3 package:
```xml
<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="pub-id" version="3.0" >
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:dcterms="http://purl.org/dc/terms/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Alice's Adventures in Wonderland</dc:title>
    <dc:creator>Lewis Carroll</dc:creator>
    <dc:date>1865-07-04</dc:date>
    <dc:subject>fiction</dc:subject>
    <dc:language>en-GB</dc:language>
    <dc:coverage>England - 19th Century</dc:coverage>
    <dc:rights>Public Domain</dc:rights>
    <dc:publisher>D. Appleton and Co</dc:publisher>
    <dc:identifier id="pub-id">urn:uuid:7408D53A-5383-40AA-8078-5256C872AE41</dc:identifier>
    <meta property="dcterms:modified">2016-03-14T11:23:26Z</meta>
    <!-- optional, for EPUB 2 reading systems: content is the cover image's manifest id -->
    <meta name="cover" content="img01a" />
  </metadata>
  <manifest>
    <!-- fonts -->
    <item id="font0" href="fonts/MinionPro.otf" media-type="application/vnd.ms-opentype"/>
    ...
    <!-- navigation -->
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <!-- body content -->
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    ...
    <!-- styling -->
    <item id="css" href="style.css" media-type="text/css"/>
    <!-- images -->
    <item id="img01a" href="images/alice01a.gif" media-type="image/gif" properties="cover-image"/>
    <item id="img02a" href="images/alice02a.gif" media-type="image/gif"/>
    ...
  </manifest>
  <spine>
    <itemref idref="titlepage"/>
    <itemref idref="chapter01"/>
    ...
  </spine>
</package>
```
As one can see, the changes are not large, but there are a few key changes that MUST be present for the document to be a valid EPUB 3.
- The version attribute MUST be 3.0.
- The metadata MUST include exactly one meta element with the property dcterms:modified. It holds the date the publication was last modified, in the form CCYY-MM-DDThh:mm:ssZ.
- The manifest MUST include exactly one item for the EPUB 3 nav document, which declares the property "nav".
- If the publication has a cover image, it should be identified in the manifest with the property "cover-image". A meta element with the name "cover", whose content is the id of that manifest item, is optional. It is kept only for compatibility with EPUB 2 reading systems.

Finally, the spine element in the package must NOT declare a toc attribute unless an NCX navigation file is present in ADDITION to the HTML nav document.
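Several of the requirements above are easy to check mechanically. The Python sketch below tests a package document for four things:

- version="3.0"
- exactly one dcterms:modified value in the CCYY-MM-DDThh:mm:ssZ form
- exactly one manifest item with the nav property
- no spine toc attribute without a matching NCX item

It covers only this subset and is not a replacement for EPUBCheck. The function name `check_epub3_package` is hypothetical, and the sample package is trimmed, so it omits metadata such as dc:title that a real EPUB needs.

```python
import re
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NS = {"opf": OPF}
MODIFIED_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def check_epub3_package(opf_xml: str) -> list[str]:
    """Return the problems found; an empty list means these checks passed."""
    root = ET.fromstring(opf_xml)
    problems = []

    if root.get("version") != "3.0":
        problems.append("package version must be 3.0")

    # Exactly one dcterms:modified, in the CCYY-MM-DDThh:mm:ssZ form.
    modified = [m.text or "" for m in root.iterfind("opf:metadata/opf:meta", NS)
                if m.get("property") == "dcterms:modified"]
    if len(modified) != 1 or not MODIFIED_RE.match(modified[0].strip()):
        problems.append("exactly one valid dcterms:modified meta is required")

    # Exactly one manifest item must carry the "nav" property.
    items = root.findall("opf:manifest/opf:item", NS)
    nav_items = [i for i in items if "nav" in i.get("properties", "").split()]
    if len(nav_items) != 1:
        problems.append('manifest must contain exactly one item with properties="nav"')

    # A spine toc attribute is only allowed when it points at an NCX item.
    spine = root.find("opf:spine", NS)
    toc_id = spine.get("toc") if spine is not None else None
    if toc_id is not None and not any(
            i.get("id") == toc_id and i.get("media-type") == "application/x-dtbncx+xml"
            for i in items):
        problems.append(f'spine toc="{toc_id}" does not reference an NCX item')

    return problems


# Trimmed, hypothetical package modeled on the Alice EPUB 3 example.
SAMPLE = f"""<package xmlns="{OPF}" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:7408D53A-5383-40AA-8078-5256C872AE41</dc:identifier>
    <meta property="dcterms:modified">2016-03-14T11:23:26Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="chapter01"/></spine>
</package>"""

print(check_epub3_package(SAMPLE))  # []
```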
The EPUB nav document is a very flexible entity. It can be very simple, as in this nav document from the EPUB 3 Alice file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
    <title>Contents</title>
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
  <body class="reflow">
    <nav xmlns:epub="http://www.idpf.org/2007/ops" epub:type="toc" id="toc">
      <ol>
        <li class="toc" id="chapter01">
          <a href="chapter01.xhtml">I. Down the Rabbit-Hole</a>
        </li>
        <li class="toc" id="chapter02">
          <a href="chapter02.xhtml">II. The Pool of Tears</a>
        </li>
      </ol>
    </nav>
  </body>
</html>
```
Alternatively, the nav doc can leverage the new semantics introduced in EPUB 3 to produce rich, flexible navigation for the user.
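As a sketch of a somewhat richer nav document, the Python below uses xml.etree.ElementTree to generate one with both a toc nav and a landmarks nav from a list of chapters. The chapter list, the function name `build_nav`, the nav.xhtml file name (matching the EPUB 3 package above) and the choice of landmarks (toc and bodymatter) are assumptions. The landmark values come from the EPUB 3 Structural Semantics Vocabulary.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"
ET.register_namespace("", XHTML)      # serialize XHTML as the default namespace
ET.register_namespace("epub", EPUB)   # serialize epub:type with the usual prefix


def build_nav(chapters: list[tuple[str, str]], title: str = "Contents") -> str:
    """Build a nav document with toc and landmarks navs (needs >= 1 chapter)."""
    html = ET.Element(f"{{{XHTML}}}html")
    head = ET.SubElement(html, f"{{{XHTML}}}head")
    ET.SubElement(head, f"{{{XHTML}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML}}}body")

    def add_nav(nav_type: str, heading: str) -> ET.Element:
        # <nav epub:type="..." id="..."><h1>heading</h1><ol/></nav>; returns the <ol>.
        nav = ET.SubElement(body, f"{{{XHTML}}}nav",
                            {f"{{{EPUB}}}type": nav_type, "id": nav_type})
        ET.SubElement(nav, f"{{{XHTML}}}h1").text = heading
        return ET.SubElement(nav, f"{{{XHTML}}}ol")

    # Table of contents: one list item per chapter.
    toc = add_nav("toc", title)
    for href, label in chapters:
        li = ET.SubElement(toc, f"{{{XHTML}}}li")
        ET.SubElement(li, f"{{{XHTML}}}a", {"href": href}).text = label

    # Landmarks: each link carries its structural semantic in epub:type.
    landmarks = add_nav("landmarks", "Guide")
    for href, label, semantic in (("nav.xhtml#toc", "Table of Contents", "toc"),
                                  (chapters[0][0], "Start of Content", "bodymatter")):
        li = ET.SubElement(landmarks, f"{{{XHTML}}}li")
        ET.SubElement(li, f"{{{XHTML}}}a",
                      {"href": href, f"{{{EPUB}}}type": semantic}).text = label

    return ET.tostring(html, encoding="unicode")


print(build_nav([("chapter01.xhtml", "I. Down the Rabbit-Hole"),
                 ("chapter02.xhtml", "II. The Pool of Tears")]))
```

The landmarks nav plays the role of the EPUB 2 guide element. Because it is ordinary HTML, it can be styled or hidden like any other content.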
This appendix provides a guide to a series of example EPUB 3 files. The files are intended to illustrate best practices for a variety of common kinds of EPUB. Each of the files listed below is available both as a fully built EPUB and as source code on GitHub. Naturally, given the complexity and breadth of the EPUB specification, the possibilities are nearly endless. The intent of these examples is not to cover all possibilities but to provide guidance on best practices.
Name | EPUB | Sources | Online Example | Comment |
---|---|---|---|---|
Tiny-EPUB | tiny3.epub | tiny-epub3 | tiny3.epub | The simplest EPUB 3 possible |
Tiny-FXL | tiny3-FXL.epub | tiny-fxl-epub3 | tiny3-FXL.epub | A minimalist fixed-layout EPUB 3 |
Tiny-SVG | tiny3-SVG.epub | tiny-svg-epub3 | tiny3-SVG.epub | A minimalist EPUB 3 with SVG |
Tiny-RTL | tiny3-RTL.epub | tiny-rtl-epub3 | tiny3-RTL.epub | A minimalist EPUB 3 with RTL text |
Alice3 | alice3.epub | alice3-source | alice.epub | A simple, basic EPUB |