The L10N handler, used by CIDRAM and phpMussel to handle L10N data, reads in an array of L10N strings and provides safe, simple methods for manipulating and returning those strings when needed. It also handles cardinal plurals, for integers and fractions alike, based upon the pluralisation rules specified by the L10N data. A range of pluralisation rules is available, to suit the needs of most known languages.
- Working with singular forms.
- Working with plural forms.
- What rules to use for what language?
- Assigning rules automatically.
- Leveraging the L10N handler and the YAML class in conjunction.
- Object chaining.
Let's begin with an example.
<?php
// An example L10N array that uses English.
$DataEN = [
    'IntegerRule' => 'int2Type4',
    'FractionRule' => 'int1',
    'MyName' => 'Hello! My name is %s.',
    'YourName' => 'What is your name?',
    'DoYouSpeak' => 'Do you speak English?'
];
// An example L10N array that uses French.
$DataFR = [
    'IntegerRule' => 'int2Type3',
    'FractionRule' => 'fraction2Type1',
    'MyName' => 'Bonjour ! Je m\'appelle %s.',
    'YourName' => 'Quel est votre nom ?'
];
// Construct a new L10N instance using French as the main L10N array and
// English as the fallback L10N array.
$L10N = new \Maikuolan\Common\L10N($DataFR, $DataEN);
// Attempt to fetch and print our desired L10N strings.
echo sprintf($L10N->getString('MyName'), 'Mary Sue') . PHP_EOL;
echo $L10N->getString('YourName') . PHP_EOL;
echo $L10N->getString('DoYouSpeak') . PHP_EOL;
The example above produces this output:
Bonjour ! Je m'appelle Mary Sue.
Quel est votre nom ?
Do you speak English?
The getString method provides a safe way to fetch an L10N string. If the string exists in the main L10N array, it will be returned from the main L10N array. If the string doesn't exist in the main L10N array, but exists in the fallback L10N array, it will be returned from the fallback L10N array. If the string doesn't exist in either of the two arrays, an empty string will be returned.
public function getString(string $String): string;
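The lookup order described above can be sketched in plain PHP. This is only an illustration of the behaviour, not the class's actual implementation, and the function name lookupWithFallback is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of the L10N class): look a key up in the
 * main array first, then in the fallback array, and return an empty string
 * when neither array has it.
 */
function lookupWithFallback(array $Main, array $Fallback, string $Key): string
{
    if (isset($Main[$Key]) && is_string($Main[$Key])) {
        return $Main[$Key];
    }
    if (isset($Fallback[$Key]) && is_string($Fallback[$Key])) {
        return $Fallback[$Key];
    }
    return '';
}

$MainFR = ['YourName' => 'Quel est votre nom ?'];
$FallbackEN = ['YourName' => 'What is your name?', 'DoYouSpeak' => 'Do you speak English?'];

echo lookupWithFallback($MainFR, $FallbackEN, 'YourName') . PHP_EOL;   // Found in the main array.
echo lookupWithFallback($MainFR, $FallbackEN, 'DoYouSpeak') . PHP_EOL; // Found only in the fallback array.
echo lookupWithFallback($MainFR, $FallbackEN, 'Missing') . PHP_EOL;    // Found nowhere: empty string.
```

Because a missing key yields an empty string rather than a notice, the caller never has to guard each lookup individually.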
The class uses both a main array and a fallback array so that it can safely support L10N data in situations where an implementation has translations into several different languages, and some of those translations aren't complete.
Imagine the following situation, which doesn't use this class:
<?php
// Currently using:
$Language = 'FR';
if ($Language === 'FR') {
    // An example L10N array that uses French.
    $Lang = [
        'YourName' => 'Quel est votre nom ?'
    ];
}
elseif ($Language === 'EN') {
    // An example L10N array that uses English.
    $Lang = [
        'YourName' => 'What is your name?',
        'DoYouSpeak' => 'Do you speak English?'
    ];
}
echo $Lang['DoYouSpeak'] . PHP_EOL;
It would produce an error:
<br />
<b>Notice</b>: Undefined index: DoYouSpeak in <b>\foo\bar.php</b> on line <b>20</b><br />
Of course, that situation actually demonstrates a very poor way to implement L10N support anyway. But the error is produced because the DoYouSpeak string hadn't been translated into French yet. If it had used English, it would've produced the desired string. Arguably, too, errors could be avoided simply by ensuring that translations exist for every possible string, in every possible language, prior to deployment. But I think the way that this class falls back to a default language in such cases, and simply returns an empty string when the string doesn't exist at all, is perhaps a much easier, much simpler way to avoid these kinds of errors.
Let's begin with an example, this time using plural forms.
<?php
// An example L10N array that uses English.
$DataEN = [
    'IntegerRule' => 'int2Type4',
    'FractionRule' => 'int1',
    'apples' => [
        'There is %s apple on the tree.',
        'There are %s apples on the tree.'
    ],
    'oranges' => [
        'There is %s orange on the tree.',
        'There are %s oranges on the tree.'
    ],
];
// An example L10N array that uses Russian.
$DataRU = [
    'IntegerRule' => 'int3Type4',
    'FractionRule' => 'int1',
    'apples' => [
        'На дереве есть %s яблоко.',
        'На дереве есть %s яблока.',
        'На дереве есть %s яблок.'
    ]
];
// Construct a new L10N instance using Russian as the main L10N array and
// English as the fallback L10N array.
$L10N = new \Maikuolan\Common\L10N($DataRU, $DataEN);
// How many apples are there on the tree?
foreach ([0, 1, 2, 3, 4, 5] as $Number) {
    echo sprintf($L10N->getPlural($Number, 'apples'), $Number) . PHP_EOL;
}
echo PHP_EOL;
// How many oranges are there on the tree?
foreach ([0, 1, 2, 3, 4, 5] as $Number) {
    echo sprintf($L10N->getPlural($Number, 'oranges'), $Number) . PHP_EOL;
}
echo PHP_EOL;
The example above produces this output:
На дереве есть 0 яблок.
На дереве есть 1 яблоко.
На дереве есть 2 яблока.
На дереве есть 3 яблока.
На дереве есть 4 яблока.
На дереве есть 5 яблок.
There are 0 oranges on the tree.
There is 1 orange on the tree.
There are 2 oranges on the tree.
There are 3 oranges on the tree.
There are 4 oranges on the tree.
There are 5 oranges on the tree.
The getPlural method can be used when there are multiple plural forms available for a particular L10N string. In our example, "apples" and "oranges" have multiple plural forms (counting how many items are on a hypothetical tree). The example uses the Russian data as the main L10N array, and English as the fallback L10N array. The fallback L10N array is used when the desired L10N data doesn't exist in the main L10N array, which is why the above example produces Russian apples and English oranges.
public function getPlural($Number, string $String): string;
The L10N handler knows which available plural form to select for a given number because of the plural rules specified by the L10N array (IntegerRule and FractionRule). When there's a chance that you might be working with plurals, these two elements should exist in the arrays, to ensure that the correct plural forms are returned.
The order that plural forms should appear in an L10N array always begins at the plural form that corresponds to one item (the singular), followed by plural forms as they appear sequentially (corresponding to two items, three items, four items, etc). If there is a specific plural form for zero, that plural form should appear last.
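To make the idea of a rule choosing an index into the list of plural forms more concrete, here is a minimal sketch. The two functions are hypothetical stand-ins written for this illustration; they follow the commonly cited CLDR categories for English and Russian and agree with the output shown earlier, but they are not the class's internal code.

```php
<?php
/**
 * Hypothetical stand-ins for two integer rules. Each returns the index of the
 * plural form to use, with the singular at index 0, as described above.
 */

// English-style rule (two forms): exactly one item is singular, anything else is plural.
function englishFormIndex(int $Number): int
{
    return $Number === 1 ? 0 : 1;
}

// Russian-style rule (three forms), chosen by the last one or two digits.
function russianFormIndex(int $Number): int
{
    $Number = abs($Number);
    $LastDigit = $Number % 10;
    $LastTwo = $Number % 100;
    if ($LastDigit === 1 && $LastTwo !== 11) {
        return 0; // 1, 21, 31, and so on (but not 11).
    }
    if ($LastDigit >= 2 && $LastDigit <= 4 && ($LastTwo < 12 || $LastTwo > 14)) {
        return 1; // 2 to 4, 22 to 24, and so on (but not 12 to 14).
    }
    return 2; // 0, 5 to 20, 25 to 30, and so on.
}

$AppleForms = [
    'На дереве есть %s яблоко.',
    'На дереве есть %s яблока.',
    'На дереве есть %s яблок.'
];
foreach ([0, 1, 2, 5, 11, 21] as $Number) {
    echo sprintf($AppleForms[russianFormIndex($Number)], $Number) . PHP_EOL;
}
```

With the class itself, none of this needs writing by hand: naming the right IntegerRule in the L10N array is enough.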
The apples and oranges example above shows how we can use the class to fetch an appropriate plural form for cardinal integers. The class also supports fractions (for those languages that have distinct plural forms for different ranges of fractions):
<?php
$DataFR = [
    'IntegerRule' => 'int2Type3',
    'FractionRule' => 'fraction2Type1',
    'Seconds' => [
        'La page chargée en %s seconde.',
        'La page chargée en %s secondes.'
    ]
];
$L10N = new \Maikuolan\Common\L10N($DataFR);
// Example page load times.
foreach ([0.1, 0.5, 1.1, 1.5, 2.1, 2.5, 3.1, 3.5, 4.1, 4.5, 5.1] as $Number) {
    echo sprintf($L10N->getPlural($Number, 'Seconds'), $Number) . PHP_EOL;
}
echo PHP_EOL;
Produces:
La page chargée en 0.1 seconde.
La page chargée en 0.5 seconde.
La page chargée en 1.1 seconde.
La page chargée en 1.5 seconde.
La page chargée en 2.1 secondes.
La page chargée en 2.5 secondes.
La page chargée en 3.1 secondes.
La page chargée en 3.5 secondes.
La page chargée en 4.1 secondes.
La page chargée en 4.5 secondes.
La page chargée en 5.1 secondes.
Additionally, as you might've noticed in the above example, the fallback L10N array is optional. If you want to work with only one language, or if multiple language versions don't exist, it's okay to use only one L10N array (the main L10N array).
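Judging purely from the French output above, the fraction rule in that example treats anything below 2 as singular and everything from 2 upwards as plural. A rough sketch of that behaviour follows; the function name is hypothetical, and the threshold is inferred from the output rather than taken from the class's source.

```php
<?php
/**
 * Hypothetical stand-in for a French-style fraction rule: values below 2 take
 * the singular form (index 0), everything from 2 upwards takes the plural
 * form (index 1). Inferred from the example output, not from the class.
 */
function frenchFormIndex(float $Number): int
{
    return abs($Number) < 2 ? 0 : 1;
}

$SecondForms = ['La page chargée en %s seconde.', 'La page chargée en %s secondes.'];
foreach ([0.5, 1.5, 2.1] as $Number) {
    echo sprintf($SecondForms[frenchFormIndex($Number)], $Number) . PHP_EOL;
}
```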
The information listed in the table below is GENERALLY based upon Unicode's CLDR page on Language Plural Rules (which also serves as the general basis for the rules for grammatical number supported by the class). Information based upon other sources will be marked accordingly. If any of the listed information is wrong or incomplete, corrections, additions, or changes are invited and welcome (please create a pull request, or create an issue if creating a pull request isn't possible). Please also be aware that I am NOT a professional linguist! If you ask me for the correct rules to use for a particular language, I'll only be able to answer if I'm able to find a reliable source somewhere online for that data.
†1: Unicode's CLDR page doesn't provide any data for the given language, but the relevant data can be found elsewhere (the source of that data will be linked or cited where possible).
†2: I (the author of this class) have found convincing evidence/data which contradicts the data provided by Unicode's CLDR page for the given language, and so, the data listed here will differ from that provided by Unicode's CLDR page.
Language | IntegerRule | FractionRule | Notes |
---|---|---|---|
Afrikaans; Albanian (Shqipe); Aragonese; Asturian (Asturianu); Asu; Azerbaijani (Azərbaycan); Balochi (بلۏچی); Basque (Euskara); Bemba; Bena; Bodo (बड़ो); Bulgarian (Български); Catalan (Català); Chechen; Cherokee (ᏣᎳᎩ); Chiga; Divehi; Dutch (Nederlandse); English; Esperanto; Estonian (Eesti keel); European Portuguese (Português); Ewe (Eʋegbe); Faroese (Føroyskt); Finnish (Suomi); Friulian; Galician (Galego); Ganda (LùGáànda); Georgian (ქართული); German (Deutsch); Greek (Ελληνικά); Greenlandic (Kalaallisut); Hausa (حَوْسَ); Hawaiian (ʻōlelo Hawaiʻi); Hungarian (Magyar); Ido; Interlingua; Italian (Italiano); Jju; Kako; Kashmiri (कॉशुर, كٲشُر); Kazakh (Қазақ тілі); Kituba †1; Kongo/Kikongo †1; Kurdish (Kurdî); Kyrgyz (Кыргыз тили); Ladin; Latgalian (Latgalīšu) †1; Latvian (Latviešu) †2; Ligurian (Ligure); Luxembourgish (Lëtzebuergesch); Machame; Malayalam (മലയാളം); Marathi (मराठी); Masai; Maori (Māori) †1; Metaʼ; Mongolian (Монгол); Nahuatl (Nāhuatl); Ndebele; Nepali (नेपाली); Ngiemboon; Ngomba; Norwegian (Norsk); Norwegian Bokmål; Norwegian Nynorsk; Nyanja; Nyankole; Odia (ଓଡ଼ିଆ); Oromo (ኦሮሞ፞); Ossetic; Papiamento (Papiamentu); Pashto (پښتو); Romansh (Rumantsch); Rombo; Rwa; Saho; Samburu; Samoan; Sardinian (Limba Sarda); Scots †1; Sena; Shambala; Shona; Sicilian (Sicilianu); Sindarin †1; Sindhi (سنڌي); Soga; Somali (Soomaaliga); Southern Sotho (Sesotho); Spanish (Español); Swahili (Kiswahili); Swati; Swedish (Svenska); Swiss German; Syriac (ܠܫܢܐ ܣܘܪܝܝܐ); Tamil (தமிழ்); Telugu (తెలుగు); Teso; Tigre (ትግረ, ትግሬ); Tsonga (xiTsonga); Tswana (Setswana); Turkish (Türkçe); Turkmen (Түркmенче); Tyap; Urdu (اردو); Uyghur (ئۇيغۇرچە, Уйғурчә); Uzbek (O'zbek); Venda (tshiVenḓa); Volapük; Vunjo; Walser; Western Frisian (Frysk); Xhosa (isiXhosa); Yiddish (ייִדיש) | int2Type4 | int1 | |
Akan; Bihari; Gun; Klingon (tlhIngan Hol) †1; Lingala (Lingála); Malagasy; Northern Sotho (Sesotho); Punjabi (ਪੰਜਾਬੀ) ‡1; Sinhala (සිංහල); Tigrinya (ትግርኛ); Walloon (Walon) | int2Type3 | int1 | ‡1: Classification includes (groups together with): Changvi, Chenavari, Dhani, Doabi, Hindko, Jafri, Jangli, Jhangochi, Khetrani, Lahnda, Majhi, Malwai, Pahari-Potowari, Panjistani, Pothohari, Puadhi, Rachnavi, Saraiki, Shahpuri. |
Amharic (አማርኛ); Assamese (অসমীয়া); Bangla/Bengali (বাংলা); Dogri (𑠖𑠵𑠌𑠤𑠮); Gujarati (ગુજરાતી); Hindi (हिंदी); Kannada (ಕನ್ನಡ); Nigerian Pidgin; Persian/Farsi (فارسی); Zulu (isiZulu) | int2Type3 | fraction2Type2 | |
Arabic (العربية) ‡1 | int6Type1 | int1 | ‡1: CLDR's information suggests 6 distinct grammatical numbers used, but I haven't been able to successfully replicate this via online translators or dictionaries in most cases, so I'm not entirely sure about it. |
Armenian (հայերեն); Bhojpuri (भोजपुरी); Brazilian Portuguese (Portugues do Brasil); French (Français); Fulah; Kabyle (ثاقبايليث) | int2Type3 | fraction2Type1 | |
Bambara; Bhutanese/Dzongkha (རྫོང་ཁ); Burmese (ျမန္မာဘာသာ); Chinese (中文) ‡1; Hmong Njua; Igbo; Indonesian (Bahasa Indonesia); Japanese (日本語); Javanese (Jawa); Kabuverdianu; Khmer (ភាសាខ្មែរ); Korean (한국어); Koyraboro Senni; Lakota (Lakȟótiyapi); Lao (ພາສາລາວ); Lojban; Makonde; Malay (Bahasa Melayu); N’Ko (ߒߞߏ); Osage; Sakha; Sango; Sichuan Yi (ꆈꌠꉙ); Thai (ไทย); Tibetan (བོད་སྐད); Toki Pona †1; Tongan (Faka-Tonga); Vietnamese (Tiếng Việt); Wolof (Wollof); Yoruba (Yorùbá) | int1 | int1 | Although int1 + int1 could imply that there aren't plural forms for a particular language, it should be noted that in most cases, plurality can be inferred by context, indicated by specificity, reduplication, or otherwise determined by some other means. It doesn't mean that there aren't plurals; just that it wouldn't affect how the class should be used. ‡1: Whether traditional (傳統) or simplified (简体), Cantonese (广东话) or Mandarin (普通话), or whatever else, pluralisation rules are the same (AFAICT). |
Belarusian (Беларуская мова); Bosnian (Bosanski); Croatian (Hrvatski); Russian (Русский); Serbian (Српски); Serbo-Croatian; Ukrainian (Українська) | int3Type4 | int1 | |
Breton (Brezhoneg) | int4Type3 | int1 | |
Anii; Colognian | int3Type2 | int1 | |
Fijian; Inari Sami (Anarâškielâ); Inuktitut; Lule Sami (Julevsámegiella); Nama (Khoekhoegowab); Northern Sami (Sámegiellaa); Santali (ᱥᱟᱱᱛᱟᱲᱤ); Skolt Sami (Nuõrttsää’m); Southern Sami (Åarjelsaemien gïele) | int3Type3 | int1 | |
Czech (Čeština); Slovak (Slovenčina) | int3Type9 | int1 | |
Danish (Dansk) | int2Type4 | fraction2Type1 | |
Cebuano; Filipino; Tagalog | int2Type1 | int1 | |
Hebrew (עברית) | int3Type3 | fraction2Type2 | |
Icelandic (Íslenska); Macedonian (Македонски) | int2Type2 | int1 | |
Irish (Gaeilge) | int5Type1 | int1 | |
Langi | int3Type2 | fraction2Type1 | |
Prussian | int3Type1 | int1 | |
Lithuanian (Lietuvių) | int3Type6 | int1 | |
Lower Sorbian (Dolnoserbski); Slovenian (Slovenščina); Upper Sorbian (Hornjoserbsce) | int4Type4 | int1 | |
Maltese (Malti) | int5Type2 | int1 | |
Manx (Vanninagh) | int4Type1 | int1 | |
Moldavian (Moldovenească); Romanian (Română) | int3Type8 | int1 | |
Na'vi †1 | int4Type7 | int1 | |
Polish (Polski) | int3Type5 | int1 | |
Quenya †1 ‡1; Tokelauan †1 | int3Type10 | int1 | ‡1: Quenya actually has four distinct plural forms, but the L10N handler rules to use for Quenya suggest three, because in this context, we're only concerned with grammatical number. Whether a plural form is partitive or non-partitive is outside the scope of these rules, but can be determined by context, and doesn't conflict with the grammatical number. |
Scottish Gaelic (Gàidhlig) | int4Type2 | int1 | |
Tachelhit | int3Type7 | fraction2Type2 | |
Welsh (Cymraeg) | int6Type2 | int1 | |
Cornish (Kernewek) | int6Type3 | int1 | |
If you want, you can have the L10N handler assign the appropriate rules automatically, saving yourself the trouble of figuring out which rules you should be using for your L10N data. All you need to know is the correct language codes for the languages you're working with.
To have the L10N handler assign the appropriate rules automatically, you can use the autoAssignRules method.
public function autoAssignRules($Code, $FallbackCode = '');
The autoAssignRules method accepts two parameters: The first parameter is the language code of the language for your primary L10N data. The second parameter is optional, and is the language code of the language for your fallback L10N data.
Example:
<?php
// An example L10N array that uses English.
$DataEN = [
    'YourName' => 'What is your name?',
    'DoYouSpeak' => 'Do you speak English?'
];
// An example L10N array that uses French.
$DataFR = [
    'YourName' => 'Quel est votre nom ?',
    'DoYouSpeak' => 'Parlez-vous Français ?'
];
// Construct a new L10N instance using French as the main L10N array and
// English as the fallback L10N array.
$L10N = new \Maikuolan\Common\L10N($DataFR, $DataEN);
// Let's pretend we don't know which rules to use, but we know the language
// codes for the languages we're using ("en" for English and "fr" for French;
// or if we wanted, we could go more specific, too; like "en-US" for US English
// and "fr-CA" for Canadian French, or "fr-FR" for French spoken in France,
// etc). We'll use the "autoAssignRules" method to assign the rules for us
// automatically.
$L10N->autoAssignRules('fr-FR', 'en-US');
Using autoAssignRules will also automatically populate the Directionality and FallbackDirectionality properties, which can optionally be used by the implementation to decide on text directionality (although the class itself doesn't make use of such information).
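As a rough picture of what automatic assignment involves, the sketch below maps a language code to the rule names listed in the table above, using only the primary language subtag. Everything here is hypothetical: the function name guessRules, the tiny excerpt of languages, and the choice of int1 as the default for unknown codes are assumptions made for this sketch, not the behaviour of autoAssignRules itself.

```php
<?php
/**
 * Hypothetical sketch: look up rule names by the primary language subtag
 * ("fr" from "fr-FR"). The lookup table is a small excerpt of the table
 * above; the int1 default for unknown codes is an assumption.
 */
function guessRules(string $Code): array
{
    $Known = [
        'en' => ['IntegerRule' => 'int2Type4', 'FractionRule' => 'int1'],
        'fr' => ['IntegerRule' => 'int2Type3', 'FractionRule' => 'fraction2Type1'],
        'ru' => ['IntegerRule' => 'int3Type4', 'FractionRule' => 'int1'],
        'ja' => ['IntegerRule' => 'int1', 'FractionRule' => 'int1'],
    ];
    // Keep whatever precedes the first "-" or "_", lowercased.
    $Primary = strtolower(preg_split('/[-_]/', $Code)[0]);
    return $Known[$Primary] ?? ['IntegerRule' => 'int1', 'FractionRule' => 'int1'];
}

print_r(guessRules('fr-FR'));
print_r(guessRules('en-US'));
```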
Leveraging the L10N handler and the YAML class in conjunction can provide an extremely convenient way to manage your implementation's L10N needs. CIDRAM and phpMussel both do this. For CIDRAM and phpMussel, each language's L10N data is stored in distinct, separate YAML files.
As a hypothetical example:
english.yaml:
## English YAML file.
IntegerRule: "int2Type4"
FractionRule: "int1"
Hello: "Hello!"
Today's cakes:
- "Today, there is %s cake in the shop."
- "Today, there are %s cakes in the shop."
Yesterday's cakes:
- "But, I already ate %s cake yesterday."
- "But, I already ate %s cakes yesterday."
russian.yaml:
## Russian YAML file.
IntegerRule: "int3Type4"
FractionRule: "int1"
Hello: "Привет!"
Today's cakes:
- "Сегодня в магазине есть %s торт."
- "Сегодня в магазине есть %s торта."
- "Сегодня в магазине есть %s тортов."
Yesterday's cakes:
- "Но я уже съел %s торт вчера."
- "Но я уже съел %s торта вчера."
- "Но я уже съел %s тортов вчера."
example.php:
<?php
// For English.
$rawData = file_get_contents(__DIR__ . '/english.yaml');
$English = new \Maikuolan\Common\YAML($rawData);
// For Russian.
$rawData = file_get_contents(__DIR__ . '/russian.yaml');
$Russian = new \Maikuolan\Common\YAML($rawData);
// Instantiate L10N object.
$L10N = new \Maikuolan\Common\L10N($English->Data, $Russian->Data);
// Now, about those cakes...
foreach ([1, 2, 4, 7] as $Today) {
    foreach ([1, 2, 4, 7] as $Yesterday) {
        echo $L10N->getString('Hello') . ' ';
        echo sprintf($L10N->getPlural($Today, 'Today\'s cakes'), $Today) . ' ';
        echo sprintf($L10N->getPlural($Yesterday, 'Yesterday\'s cakes'), $Yesterday) . PHP_EOL;
    }
}
echo PHP_EOL;
// Or.. Swapping the languages around...
$L10N = new \Maikuolan\Common\L10N($Russian->Data, $English->Data);
// And...
foreach ([1, 2, 4, 7] as $Today) {
    foreach ([1, 2, 4, 7] as $Yesterday) {
        echo $L10N->getString('Hello') . ' ';
        echo sprintf($L10N->getPlural($Today, 'Today\'s cakes'), $Today) . ' ';
        echo sprintf($L10N->getPlural($Yesterday, 'Yesterday\'s cakes'), $Yesterday) . PHP_EOL;
    }
}
echo PHP_EOL;
The resulting output:
Hello! Today, there is 1 cake in the shop. But, I already ate 1 cake yesterday.
Hello! Today, there is 1 cake in the shop. But, I already ate 2 cakes yesterday.
Hello! Today, there is 1 cake in the shop. But, I already ate 4 cakes yesterday.
Hello! Today, there is 1 cake in the shop. But, I already ate 7 cakes yesterday.
Hello! Today, there are 2 cakes in the shop. But, I already ate 1 cake yesterday.
Hello! Today, there are 2 cakes in the shop. But, I already ate 2 cakes yesterday.
Hello! Today, there are 2 cakes in the shop. But, I already ate 4 cakes yesterday.
Hello! Today, there are 2 cakes in the shop. But, I already ate 7 cakes yesterday.
Hello! Today, there are 4 cakes in the shop. But, I already ate 1 cake yesterday.
Hello! Today, there are 4 cakes in the shop. But, I already ate 2 cakes yesterday.
Hello! Today, there are 4 cakes in the shop. But, I already ate 4 cakes yesterday.
Hello! Today, there are 4 cakes in the shop. But, I already ate 7 cakes yesterday.
Hello! Today, there are 7 cakes in the shop. But, I already ate 1 cake yesterday.
Hello! Today, there are 7 cakes in the shop. But, I already ate 2 cakes yesterday.
Hello! Today, there are 7 cakes in the shop. But, I already ate 4 cakes yesterday.
Hello! Today, there are 7 cakes in the shop. But, I already ate 7 cakes yesterday.
Привет! Сегодня в магазине есть 1 торт. Но я уже съел 1 торт вчера.
Привет! Сегодня в магазине есть 1 торт. Но я уже съел 2 торта вчера.
Привет! Сегодня в магазине есть 1 торт. Но я уже съел 4 торта вчера.
Привет! Сегодня в магазине есть 1 торт. Но я уже съел 7 тортов вчера.
Привет! Сегодня в магазине есть 2 торта. Но я уже съел 1 торт вчера.
Привет! Сегодня в магазине есть 2 торта. Но я уже съел 2 торта вчера.
Привет! Сегодня в магазине есть 2 торта. Но я уже съел 4 торта вчера.
Привет! Сегодня в магазине есть 2 торта. Но я уже съел 7 тортов вчера.
Привет! Сегодня в магазине есть 4 торта. Но я уже съел 1 торт вчера.
Привет! Сегодня в магазине есть 4 торта. Но я уже съел 2 торта вчера.
Привет! Сегодня в магазине есть 4 торта. Но я уже съел 4 торта вчера.
Привет! Сегодня в магазине есть 4 торта. Но я уже съел 7 тортов вчера.
Привет! Сегодня в магазине есть 7 тортов. Но я уже съел 1 торт вчера.
Привет! Сегодня в магазине есть 7 тортов. Но я уже съел 2 торта вчера.
Привет! Сегодня в магазине есть 7 тортов. Но я уже съел 4 торта вчера.
Привет! Сегодня в магазине есть 7 тортов. Но я уже съел 7 тортов вчера.
Of course, how you choose to use these classes, and how you choose to store your L10N data, is ultimately up to you.
If you want, it's possible to chain together multiple L10N objects via L10N's fallback mechanism.
As an example:
<?php
$English = ['Hello' => 'Hello', 'World' => 'World', 'Something English' => 'Bangers and mash'];
$French = ['Hello' => 'Bonjour', 'World' => 'Monde', 'Something French' => 'Vin et croissants'];
$Russian = ['Hello' => 'Привет', 'World' => 'Мир', 'Something Russian' => 'Водка и борщ'];
$German = ['Hello' => 'Hallo', 'World' => 'Welt', 'Something German' => 'Brezeln und Bier'];
$Foo = new \Maikuolan\Common\L10N($German, $Russian);
$Bar = new \Maikuolan\Common\L10N($French, $Foo);
$Foobar = new \Maikuolan\Common\L10N($English, $Bar);
echo $Foobar->getString('Hello').PHP_EOL;
echo $Foobar->getString('World').PHP_EOL;
echo $Foobar->getString('Something English').PHP_EOL;
echo $Foobar->getString('Something French').PHP_EOL;
echo $Foobar->getString('Something Russian').PHP_EOL;
echo $Foobar->getString('Something German').PHP_EOL;
The resulting output:
Hello
World
Bangers and mash
Vin et croissants
Водка и борщ
Brezeln und Bier
This means that, in theory, you could have an unlimited number of languages as fallbacks for your L10N data.
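The chaining idea can be shown in miniature with a small stand-in class. This is a hypothetical sketch, not the real L10N class: the class name Chain and its internals are invented for illustration, and the only behaviour it borrows from the text is that a fallback may be either a plain array or another object of the same kind.

```php
<?php
/**
 * Hypothetical miniature of the chaining idea (not the real L10N class):
 * the fallback may be either a plain array or another Chain instance.
 */
final class Chain
{
    /** @var array<string, string> */
    private $Data;

    /** @var array<string, string>|Chain */
    private $Fallback;

    public function __construct(array $Data, $Fallback = [])
    {
        $this->Data = $Data;
        $this->Fallback = $Fallback;
    }

    public function getString(string $Key): string
    {
        if (isset($this->Data[$Key])) {
            return $this->Data[$Key];
        }
        if ($this->Fallback instanceof self) {
            // Delegate to the next object in the chain.
            return $this->Fallback->getString($Key);
        }
        return $this->Fallback[$Key] ?? '';
    }
}

$Inner = new Chain(['C' => 'third'], ['D' => 'fourth']);
$Middle = new Chain(['B' => 'second'], $Inner);
$Outer = new Chain(['A' => 'first'], $Middle);

foreach (['A', 'B', 'C', 'D', 'E'] as $Key) {
    echo $Key . ' => "' . $Outer->getString($Key) . '"' . PHP_EOL;
}
```

Each lookup walks outwards one link at a time and stops at the first object or array that has the key, so the order in which objects are chained decides which language wins when several of them define the same string.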
Last Updated: 12 February 2025 (2025.02.12).