Internationalization, pagination and user settings
Readium CSS currently ships with RTL, CJK and vertical-writing specific stylesheets. This means implementers should load the stylesheets in their dedicated folder whenever needed, and disable some settings depending on the language of the publication.
- RTL stylesheets are located in `dist/rtl`
- CJK horizontal (LTR page-progression) stylesheets are located in `dist/cjk-horizontal`
- CJK vertical (RTL page-progression) stylesheets are located in `dist/cjk-vertical`
Internationalization is an ongoing process, with browsers offering subpar interoperability (typography, writing modes, etc.), Operating Systems sometimes lacking fonts for some languages, and documentation providing little information on topics of interest to Reading Systems (a11y, settings, etc.).
This could well explain why the most popular Reading Apps tend to implement the least common denominator for all languages, especially when it comes to user settings, and do not support more complex languages that implementers either know they can’t support well or that are used in smaller markets the app/service is not targeting.
Occasionally, when an app does support a language, it can put some constraints on authors, as there is no other way to make sure the publication will be handled well, e.g. specifying the `Hans` or `Hant` script for Chinese (`zh`).
We can’t overemphasize the importance of the internationalization process though, as the 24 languages we added extend support to 3,049,150,507 speakers, from the 1,150 people speaking Western Canadian Inuktitut, to the 1,200,000,000 speaking Chinese. Implementing right to left scripts will extend support to 411,000,000 native speakers, while vertical writing to at least 130,200,000 – excluding Chinese and Korean.
In total, we can cover the needs of 5,262,900,507 speakers. Credit where credit is due, this wouldn’t have been possible if Operating Systems and browsers hadn’t tackled this process upstream, adding fonts for those languages and improving support in rendering engines.
Supporting the maximum number of languages and scripts is a complex process.
As a consequence, work for internationalization should be tackled early, as the changes and adaptations needed will have a significant impact on an existing implementation. It indeed impacts the entire implementation, and not only CSS.
Implementers will need a way to retrieve `page-progression-direction` and the primary language (`<dc:language>`) of the publication.
The `page-progression-direction` attribute is set on the `<spine>` element, and the value `rtl` should be considered important information for the whole process.
This value signals the publication is either in an RTL script, or is using the `vertical-rl` writing mode, which is the reason why we must find the primary language of the publication next.
The value is important to store, as it will be the one used for the `dir` attribute to append if it is missing in a document.
It is very important to note the primary language must be checked in all cases, and not only when the `page-progression-direction` is set or has an `rtl` value.
Indeed, this piece will be even more critical in the following steps, as it will trigger the list of fonts to load for the publication, the user settings to provide, and the `xml:lang` attribute to append if it is missing in a document.
The OPF file should not be considered a single source of truth for the publication, since issues may arise relatively quickly. We can’t call the process “heuristics” per se, it’s more of a chain of educated guesses.
There exists an increasing corpus of EPUB files with multiple `<dc:language>` items. Some authoring tools, for instance, list all languages a publication contains.
In this case, `page-progression-direction` should serve as a hint, if present. For instance:
- the first `<dc:language>` item is English;
- the second `<dc:language>` item is Japanese;
- the `page-progression-direction` is `rtl`;
- the primary language is Japanese.
Obviously, this can quickly become an issue if both languages share the same `page-progression-direction`…
- the first `<dc:language>` item is English;
- the second `<dc:language>` item is Japanese;
- the `page-progression-direction` is missing;
- we can’t guess the primary language from the OPF.
In such an edge case, to achieve the best interoperability possible, the first `<dc:language>` element must be considered the primary language, unless you can pre-process all documents in a publication to determine it beforehand.
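The chain of guesses above can be sketched as a small function. This is a minimal illustration, not Readium's implementation: `guessPrimaryLanguage` and `RTL_CANDIDATES` are hypothetical names, and the list of primary subtags treated as compatible with an `rtl` page progression is an assumption drawn from the languages this document covers.

```typescript
type Progression = "ltr" | "rtl" | undefined;

// Assumption: primary subtags whose publications may use an rtl page
// progression (RTL scripts, plus CJK when laid out vertically).
const RTL_CANDIDATES = ["ar", "fa", "he", "zh", "ja", "ko"];

function primarySubtag(tag: string): string {
  // "zh-Hant" -> "zh"
  return tag.trim().toLowerCase().replace(/-.*$/, "");
}

function guessPrimaryLanguage(
  languages: string[],
  progression: Progression
): string | undefined {
  // With several <dc:language> items, use the progression as a hint.
  if (progression === "rtl" && languages.length > 1) {
    for (const tag of languages) {
      if (RTL_CANDIDATES.indexOf(primarySubtag(tag)) !== -1) return tag;
    }
  }
  // Fallback described in the text: the first <dc:language> item wins.
  return languages[0];
}
```

With the two examples from the text, English followed by Japanese resolves to Japanese when the progression is `rtl`, and to English when the progression is missing.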
For some reason the `page-progression-direction` may be missing in the OPF, which can be true if the publication is EPUB2 for instance – which supports the `direction` CSS property and, in theory, could support RTL scripts.
The following guidance is informal:
- if the `page-progression-direction` is missing;
- if there is only one `<dc:language>` item which clearly signals the `page-progression-direction`:
  - `ar`, `fa`, and `he`;
  - `zh-Hant`, or `zh-TW`.
- then you can assume the `page-progression-direction` is `rtl`.
The decision to handle this edge case is up to each implementer, especially as it can be considered a patch of an authoring failure.
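For implementers who do choose to handle it, the informal guidance could look like the sketch below. `inferProgression` and `matchesTag` are hypothetical helpers, and matching region or script variants by prefix (e.g. `ar-EG`) is an assumption that goes slightly beyond the tags listed above.

```typescript
type Progression = "ltr" | "rtl";

// Tags the guidance lists as clearly signalling an rtl page progression.
const RTL_SIGNALS = ["ar", "fa", "he", "zh-hant", "zh-tw"];

function matchesTag(tag: string, base: string): boolean {
  // Assumption: "ar-EG" counts as "ar", "zh-Hant-TW" counts as "zh-Hant".
  return tag === base || tag.indexOf(base + "-") === 0;
}

function inferProgression(
  declared: Progression | undefined,
  languages: string[]
): Progression | undefined {
  // A declared value is never overridden.
  if (declared !== undefined) return declared;
  // The guidance only applies when there is exactly one <dc:language>.
  const only = languages[0];
  if (languages.length !== 1 || only === undefined) return undefined;
  const tag = only.trim().toLowerCase();
  for (const base of RTL_SIGNALS) {
    if (matchesTag(tag, base)) return "rtl";
  }
  // Anything else stays undecided; the caller applies its own default.
  return undefined;
}
```

Returning `undefined` rather than `ltr` keeps "missing" distinguishable from "declared", which matters if the value is stored for later steps.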
Once the `page-progression-direction` is defined as `rtl`, navigation must be reversed in the app:
- the previous resource (document) is on the right;
- the next resource (document) is on the left.
Navigating the publication should follow this pattern.
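As a rough illustration of that pattern, the mapping from a screen side (a tap zone, an arrow key, a swipe target) to a resource can be kept in one place. The names `Side`, `Target` and `targetForSide` are hypothetical and only serve this sketch.

```typescript
type Progression = "ltr" | "rtl";
type Side = "left" | "right";
type Target = "previous" | "next";

function targetForSide(side: Side, progression: Progression): Target {
  if (progression === "rtl") {
    // Reversed: next is on the left, previous is on the right.
    return side === "left" ? "next" : "previous";
  }
  return side === "left" ? "previous" : "next";
}
```

Routing every navigation input through a single function like this makes it harder for one control (say, keyboard arrows) to keep the `ltr` behavior by accident.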
Attributes missing from individual documents are far from an uncommon or edge case.
Since the `page-progression-direction` or `<dc:language>` are already set in the OPF, some authors might think they will automatically apply to all the resources in the EPUB file, and explicitly set them only when they differ from those global values. More importantly, some Reading Apps are automatically managing this, and should authors only check their files in those apps, it could lead them to believe it just works.
The language is important as it will enable hyphenation and use the proper rules specific to each language if a dictionary is available, change the default typeface for some languages, and even apply language-specific styles for layout (e.g. pagination, defaults for unstyled publications, etc.).
The following process must be implemented:
- if `xml:lang` can’t be found on `html`;
- check if `xml:lang` can be found on `body`, copy and set it to `html`, and stop there if it is;
- if it can’t be found on `body`, use the primary language retrieved from the OPF file and set it to `html` and `body`.
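The steps above can be sketched as a pure function over the attribute values already read from a document. `LangAttrs` and `resolveLang` are hypothetical names; reading and writing the actual attributes in the XHTML is left to the implementation.

```typescript
interface LangAttrs {
  html: string | undefined; // xml:lang found on <html>, if any
  body: string | undefined; // xml:lang found on <body>, if any
}

function resolveLang(found: LangAttrs, opfLanguage: string): LangAttrs {
  // xml:lang is already on <html>: nothing to do.
  if (found.html !== undefined) return found;
  // Found on <body> only: copy it to <html> and stop there.
  if (found.body !== undefined) return { html: found.body, body: found.body };
  // Found nowhere: use the primary language from the OPF on both elements.
  return { html: opfLanguage, body: opfLanguage };
}
```

For example, a document with `xml:lang="ja"` on `body` only ends up with `ja` on both elements, regardless of the OPF language.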
The `dir` attribute is critical too, as it will reverse the column direction for RTL scripts.
The following must only apply if the primary language is `ar`, `fa`, or `he`. It MUST NOT apply to CJK.
The following process must be implemented:
- if the `dir` attribute can’t be found on `html`;
- check if `dir` can be found on `body`:
  - if it is the same value as the one retrieved from the OPF file, copy it;
  - if it differs from the one retrieved from the OPF file, change the value;
- set the `dir` attribute with the correct value on `html`.
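A minimal sketch of this process, again over plain values rather than a live DOM. `DirAttrs` and `resolveDir` are hypothetical names. Two behaviors are assumptions not spelled out in the steps: a document whose `html` element already carries `dir` is returned unchanged, and a `body` with no `dir` at all receives the OPF value.

```typescript
type Dir = "ltr" | "rtl";

interface DirAttrs {
  html: Dir | undefined; // dir found on <html>, if any
  body: Dir | undefined; // dir found on <body>, if any
}

const RTL_SCRIPT_LANGUAGES = ["ar", "fa", "he"];

function resolveDir(found: DirAttrs, opfDir: Dir, primaryLanguage: string): DirAttrs {
  const primary = primaryLanguage.trim().toLowerCase().replace(/-.*$/, "");
  // Only ar, fa and he qualify; CJK and everything else is left alone.
  if (RTL_SCRIPT_LANGUAGES.indexOf(primary) === -1) return found;
  // Assumption: a document that already sets dir on <html> is not touched.
  if (found.html !== undefined) return found;
  // Whether <body> matched, differed, or (assumption) had no dir,
  // both elements end up with the value retrieved from the OPF.
  return { html: opfDir, body: opfDir };
}
```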
The auto pagination model will take care of itself if the correct `dir` attribute is set on `html` and `body`.
In other words, if `dir="rtl"` is set for both elements, the column-progression will be automatically reversed.
What implementers need to do:
- check the `page-progression-direction` for the `spine` item;
- check the language – do not forget there can be multiple `<dc:language>` items;
- load specific styles for RTL scripts (`dist/rtl`);
- append `xml:lang` and/or `lang` attribute if it’s missing in XHTML documents;
- append `dir="rtl"` attributes if they’re missing for both `html` and `body` in XHTML documents;
- load specific fonts’ lists for user settings, based on the primary language of the publication;
- add/remove specific user settings, based on the primary language of the publication;
- apply the correct `page-progression-direction` (in RTL, next resource is on the left, previous is on the right);
- change the direction of the toc and at least some pieces of user settings (e.g. `text-align`).
The current implementation is limited to the following combinations:
Language | IANA tag | page-progression-direction | dir attribute |
---|---|---|---|
Arabic | ar | RTL | rtl |
Farsi (Persian) | fa | RTL | rtl |
Hebrew | he | RTL | rtl |
IANA Language Subtag registry.
We may add others at some point in the future. Please feel free to report the languages or scripts missing in this mapping. Please bear in mind a list of default (preferably system) fonts will greatly help to add support for those languages and scripts. See Default Fonts.
Test files can be retrieved from the Readium CSS’ i18n-samples OPDS feed.
As explicitly stated in CSS Writing Modes Level 3:
> As a special case for handling HTML documents, if the `:root` element has a `<body>` child element, the principal writing mode is instead taken from the values of `writing-mode` and `direction` on the first such child element instead of taken from the root element.

What this means is that the `dir` attribute (or the `direction` CSS property) set for `body` will override the one set for `html`. Unlike most other CSS properties, which don’t impact the parent element, the `dir` attribute (or the `direction` CSS property) propagates in this very specific case:

    <html dir="ltr">
      <body dir="rtl">
        <!-- dir="rtl" should be used. -->
      </body>
    </html>

    html {
      direction: ltr;
    }

    body {
      direction: rtl;
      /* rtl propagates to html and overrides ltr.
         You can think of it as a JS event bubbling up if that makes more sense. */
    }

We MUST consequently force the direction for all documents in the publication, and can’t manage `ltr` documents in an `rtl` publication.
Note: In practice, this isn’t necessarily the case in Blink, Gecko/Quantum and WebKit, and you can emulate a reversed column-progression for `ltr` documents in an `rtl` publication, but this behavior may change in the future.
When publications are in Chinese, Japanese, Korean, and Mongolian, and laid out with a `vertical-*` writing mode, we must switch to a different model since we can’t do a two-column spread.
Indeed, columns are automatically laid out on the `y-axis` (vertical) with such writing modes, and the behavior of multi-column in orthogonal flows has been deferred to CSS Writing Modes Level 4.
We consequently use a “Fragmented Model”, as it differs significantly from the “Pagination Model”, especially the column-axis.
One can think of the fragmented model as the single page model rotated 90° clockwise. The only difference is that `padding` is added to the `:root` (`html`) element so that text doesn’t run from edge to edge.
Other options have been explored, e.g. a pseudo-algorithm mimicking `margin: auto`, using the `calc()` function, but it proved complex to manage well and raised serious performance issues, especially when resizing the window of a browser with documents making heavy use of `text-direction` and `text-combine-upright`.
What implementers need to do:
- check the `page-progression-direction` for the `spine` item;
- check the language – do not forget there can be multiple `<dc:language>` items;
- load the specific styles for CJK if needed (`dist/cjk-vertical`);
- append `xml:lang` and/or `lang` attribute if it’s missing in XHTML documents;
- load specific fonts’ lists for user settings, based on the primary language of the publication;
- add/remove specific user settings, based on the primary language of the publication;
- apply the correct `page-progression-direction` (in RTL, next resource is on the left, previous is on the right).
Here is the correct mapping for combinations resulting in the `vertical-*` writing mode:
Language | IANA tag | page-progression-direction | Writing-mode |
---|---|---|---|
Chinese | zh | RTL | vertical-rl |
Chinese (Traditional) | zh-Hant | RTL | vertical-rl |
Chinese (Taiwan) | zh-TW | RTL | vertical-rl |
Chinese (Hong Kong) | zh-HK | RTL | vertical-rl |
Korean | ko | RTL | vertical-rl |
Japanese | ja | RTL | vertical-rl |
Mongolian | mn-Mong | LTR / Default / None | vertical-lr |
IANA Language Subtag registry.
Test files can be retrieved from the Readium CSS’ i18n-samples OPDS feed.
If a publication doesn’t need to be laid out in a `vertical-*` writing mode, the auto pagination model can be used.
There are still specific styles for CJK Horizontal to load though (`dist/cjk-horizontal`).
Here is the correct mapping for combinations resulting in the `horizontal-tb` writing mode:
Language | IANA tag | page-progression-direction | Writing-mode |
---|---|---|---|
Chinese | zh | LTR / Default / None | horizontal-tb |
Chinese (Simplified) | zh-Hans | LTR / Default / None | horizontal-tb |
Chinese (Taiwan) | zh-TW | LTR / Default / None | horizontal-tb |
Chinese (Hong Kong) | zh-HK | LTR / Default / None | horizontal-tb |
Korean | ko | LTR / Default / None | horizontal-tb |
Japanese | ja | LTR / Default / None | horizontal-tb |
Mongolian | mn-Cyrl | LTR / Default / None | horizontal-tb |
IANA Language Subtag registry.
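Taken together, the RTL, vertical and horizontal mapping tables in this document can be read as a single lookup from primary language and `page-progression-direction` to a writing mode and a stylesheet folder. The sketch below is one possible reading, not the project's code: `layoutFor` and `Layout` are hypothetical names, `null` stands for "no dedicated folder", and assigning `mn-Mong` to `dist/cjk-vertical` and `mn-Cyrl` to `dist/cjk-horizontal` is an assumption based on the tables they appear in.

```typescript
type WritingMode = "horizontal-tb" | "vertical-rl" | "vertical-lr";

interface Layout {
  writingMode: WritingMode;
  folder: string | null; // null: no dedicated i18n folder applies
}

function matchesTag(tag: string, base: string): boolean {
  return tag === base || tag.indexOf(base + "-") === 0;
}

function layoutFor(language: string, progression: "ltr" | "rtl" | undefined): Layout {
  const tag = language.trim().toLowerCase();
  const primary = tag.replace(/-.*$/, "");

  // Mongolian depends on the script subtag, not on the progression.
  if (matchesTag(tag, "mn-mong")) return { writingMode: "vertical-lr", folder: "dist/cjk-vertical" };
  if (matchesTag(tag, "mn-cyrl")) return { writingMode: "horizontal-tb", folder: "dist/cjk-horizontal" };

  if (["zh", "ja", "ko"].indexOf(primary) !== -1) {
    // zh-Hans only appears in the horizontal table.
    if (progression === "rtl" && !matchesTag(tag, "zh-hans")) {
      return { writingMode: "vertical-rl", folder: "dist/cjk-vertical" };
    }
    return { writingMode: "horizontal-tb", folder: "dist/cjk-horizontal" };
  }

  if (["ar", "fa", "he"].indexOf(primary) !== -1) {
    return { writingMode: "horizontal-tb", folder: "dist/rtl" };
  }
  return { writingMode: "horizontal-tb", folder: null };
}
```

Because the result is derived from the OPF only, it can be computed once per publication and reused for every document.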
As explicitly stated in CSS Writing Modes Level 3:
> As a special case for handling HTML documents, if the `:root` element has a `<body>` child element, the principal writing mode is instead taken from the values of `writing-mode` and `direction` on the first such child element instead of taken from the root element.

What this means is that the `writing-mode` declared for `body` will override the one declared for `html`. Unlike most other CSS properties, which don’t impact the parent element, `writing-mode` propagates in this very specific case:

    html {
      writing-mode: horizontal-tb;
    }

    body {
      writing-mode: vertical-rl;
      /* vertical-rl propagates to html and overrides horizontal-tb.
         You can think of it as a JS event bubbling up if that makes more sense. */
    }

We MUST consequently force the `writing-mode` for all documents in the publication, and can’t manage `horizontal-tb` documents in a `vertical-rl` publication.
It is important to note that the list of user settings you may provide users with can change depending on the primary language of the publication.
Indeed, some user settings don’t make sense in some languages, and would do more harm than good, e.g. hyphens in CJK. Ideally, those settings should therefore be removed from the UI, or at least disabled, if needed.
Implementers will need to load a different list of fonts based on the languages listed in Default Fonts.
The most complex issue is finding fonts for those languages, especially as mobile systems often ship with the minimum amount of fonts possible to support Indic, Arabic, Hebrew, CJK, etc. And when the platform provides an extended selection, users often have to download them beforehand.
The following is provided as guidance only:
- the app should at least offer the publisher’s font and the default (`var(--RS__baseFontFamily)`) for the language – which should work automatically if the correct language is set for each document;
- if implementers want to extend the list:
  - use pre-installed fonts if the system offers some;
  - use downloadable fonts if the system offers some;
  - carefully pick fonts supporting the language and the idiosyncrasies of its typography;
  - fall back to Google Noto Fonts.
- users probably have fonts already installed, re-use those fonts if possible (advanced setting in which they can access or declare those fonts).
User settings to disable are:
- `--USER__bodyHyphens`;
- `--USER__wordSpacing`;
- `--USER__letterSpacing`.

User settings to add are:
- `--USER__ligatures`.
For Chinese, Japanese, and Korean, implementers must manage both horizontal and vertical writing modes, since the pagination model differs.
User settings to disable are:
- `--USER__textAlign`;
- `--USER__bodyHyphens`;
- `--USER__paraIndent`;
- `--USER__wordSpacing`;
- `--USER__letterSpacing`.
This also impacts the Mongolian script.
User settings to disable are:
- `--USER__colCount`;
- `--USER__textAlign`;
- `--USER__bodyHyphens`;
- `--USER__paraIndent`;
- `--USER__wordSpacing`;
- `--USER__letterSpacing`.
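The three lists of user settings can be kept as data so the settings UI is driven by the publication rather than hard-coded. This is a sketch under an assumption about which list belongs to which case: the first list is read here as applying to RTL scripts, the second to CJK in horizontal writing, and the third to vertical writing (including the Mongolian script). `Profile`, `SettingsChange` and `settingsFor` are hypothetical names.

```typescript
type Profile = "rtl" | "cjk-horizontal" | "cjk-vertical";

interface SettingsChange {
  disable: string[]; // user settings to hide or disable
  add: string[];     // user settings to expose in addition
}

const SPACING = ["--USER__bodyHyphens", "--USER__wordSpacing", "--USER__letterSpacing"];
const CJK_COMMON = ["--USER__textAlign", "--USER__paraIndent"].concat(SPACING);

const CHANGES: Record<Profile, SettingsChange> = {
  "rtl": { disable: SPACING, add: ["--USER__ligatures"] },
  "cjk-horizontal": { disable: CJK_COMMON, add: [] },
  // Vertical writing also loses the column count setting.
  "cjk-vertical": { disable: ["--USER__colCount"].concat(CJK_COMMON), add: [] },
};

function settingsFor(profile: Profile): SettingsChange {
  const change = CHANGES[profile];
  // Return copies so callers cannot mutate the shared tables.
  return { disable: change.disable.slice(), add: change.add.slice() };
}
```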
Ideally, several parts of the app should be customizable depending on the publication. Another option is implementing the least common denominator for all languages.
The list of fonts the app offers to users should be specific to the primary language of the publication, and `writing-mode` if it applies – Japanese currently.
This means fonts for Latin languages can’t be reused for Indic, RTL scripts, CJK, etc.
Several parts of the UI must follow the direction (`rtl`) of the primary language:
- the running header (title of the publication or chapter);
- the toc and its entries;
- user settings e.g. text align;
- implementers might want to localize the interface based on the language set at the system level, or at least fall back to English.
Moreover, some user settings should be removed if used (`letter-` and `word-spacing`) and another one added (Arabic ligatures in `ar` and `fa`).
Although the UI can keep an `ltr` direction with a `horizontal-tb` writing mode, some extra attention should be paid:
- make sure the “UI font” can display the characters needed in those languages;
- implementers might want to allow authors to set a `vertical-*` writing mode for the navigation document (`nav.xhtml`);
- implementers might want to localize the interface based on the language set at the system level, or at least fall back to English.
Implementers should make sure features like search, highlighting, etc. can work well with bidirectional text and unicode – CJK, especially as some characters change from horizontal to vertical writing modes.
Another issue to take into account is that input methods might not allow users to use some features easily, in which case extended research should be made to check realistic options.
Implementers should make sure they offer at least two options: the publisher’s font and the default.
Implementers should be aware there are overarching issues for which we haven’t reached consensus, or couldn’t discuss yet.
The most important issue, by far, is that checking the `writing-mode` at runtime can degrade performance severely. It can indeed take 15 seconds to render some complex XHTML files in `vertical-*`, which is obviously unacceptable in terms of UX. This is the reason why we try to guess the `writing-mode` from the OPF file.
Longer-term issues include:
- polyfilling `-epub-properties` for web apps;
- mixed directions (LTR document in a RTL publication) and mixed writing modes (`horizontal-tb` document in a `vertical-rl` publication);
- support for alternate stylesheets, which is critical if the implementer wants to offer a horizontal/vertical-writing user setting;
- support for `rendition: align-x-center`;
- support for `ibooks:respect-image-size-class` (gaiji) and `ibooks:scroll-axis` metadata items (see EPUB Compat doc);
- `rendition: flow` of `scrolled-doc`.
There are some typography and layout issues which are not the responsibility of apps’ implementers but rendering engines’. Those issues include:
- line-adjustment and justification (RTL and CJK);
- run-in headings (`display: run-in`), which is popular in CJK;
- `ruby` and its styling;
- `bidi`;
; - Kashida Elongation (Arabic);
- joining forms (Arabic);
- single-letter styling (Arabic).
If those issues arise, please report them to whom it may concern (e.g. Chromium, Firefox, Microsoft, Webkit, etc.). The entire web platform will indeed benefit. You can additionally report the issue to us so that we can document it for other implementers.
- W3C Internationalization Working Group Home Page
- Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts (tutorial)
- Styling vertical Chinese, Japanese, Korean and Mongolian text
- International text layout and typography index
- Requirements for Japanese Text Layout
- Requirements for Chinese Text Layout
- Requirements for Hangul Text Layout
- Requirements for the Arabic Script
- Requirements for Hebrew Text Layout
- Requirements for Indic Text Layout
- EBPAJ File creation guide