Implement translation service (i18n) #387
Comments
Do you plan to use the whole API from gettext? Like plural handling and stuff? |
Nope, I don't think so. Maybe I'm wrong, because I haven't checked that precisely, but I think that it's too much work and we have very few use cases for those things, so this would be a low priority. I mean – knowing CKE4 – there were just a couple of places where we had to format a plural, which you can always somehow work around, even if not very gracefully. So this wasn't high on the priority list. OTOH, perhaps we could dive into this and, if it's possible, plan for full gettext support. I wouldn't like to work on it now, but if it would be possible to implement it in the future, then we'd be safe. Or it may turn out that it's simpler than I think... I'd better check it :D. |
From what I understand from https://www.gnu.org/software/gettext/manual/gettext.html#Translating-plural-forms we need to pass a
So it'd be like: t( 'file-upload/uploaded-message: Uploaded a file. | Uploaded %0 files.', fileCount, fileCount ); Why t( {
context: 'file-upload',
id: 'uploaded-message',
singular: 'Uploaded a file.',
plural: 'Uploaded %0 files',
count: fileCount
}, fileCount ); Now, the If I don't miss anything, I think that we should be able to add support for plural forms in the future. It will require a more complicated logic for |
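To make the object format discussed above more concrete, here is a minimal sketch of how a `t()` accepting it could resolve a message. It is only an illustration under assumptions: the two-form English rule is hard-coded, and the way `%0` is filled from the extra arguments is guessed from the example call, not taken from any existing implementation.

```javascript
// Hypothetical sketch: resolves either a plain string or the object format
// ( { singular, plural, count } ) and fills %0, %1, ... from the extra arguments.
function t( message, ...values ) {
	let text;

	if ( typeof message === 'string' ) {
		text = message;
	} else {
		// Assumption: English rule only (singular for exactly 1, plural otherwise).
		text = message.count === 1 ? message.singular : message.plural;
	}

	return text.replace( /%(\d+)/g, ( match, index ) => {
		const value = values[ Number( index ) ];

		// Leave unknown placeholders untouched instead of printing "undefined".
		return value === undefined ? match : String( value );
	} );
}

const fileCount = 3;

const uploadedMessage = t( {
	context: 'file-upload',
	id: 'uploaded-message',
	singular: 'Uploaded a file.',
	plural: 'Uploaded %0 files.',
	count: fileCount
}, fileCount );
```

With `fileCount` set to 1 the same call would pick the `singular` string. A language with more than two forms would need a per-language rule instead of the hard-coded comparison.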
BTW, I haven't mentioned one thing here – the use of external libraries. Obviously, there are libraries which parse and write PO files and we'll use one. However, for the string resolution logic (so what the |
Looks great. It would be nice if the tool could list some errors like:
|
@Reinmar Maybe it's more an edge-case, but words in many languages can have more than one plural form. |
@ma2ciek gettext has it already defined and its standard is used in many places. Check out this: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html It handles Polish irregular plural forms. |
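As an illustration of what the gettext plural rules do, the sketch below ports the English and Polish plural formulas from the gettext manual to JavaScript. The names `pluralRules` and `pickForm` are made up for this example, and the Polish word forms are sample data.

```javascript
// Plural-index functions equivalent to the gettext "Plural-Forms" expressions.
// Each returns the index of the form to use for a given number n.
const pluralRules = {
	// nplurals=2; plural=n != 1;
	en: n => ( n !== 1 ? 1 : 0 ),

	// nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
	pl: n => {
		if ( n === 1 ) {
			return 0;
		}

		return n % 10 >= 2 && n % 10 <= 4 && ( n % 100 < 10 || n % 100 >= 20 ) ? 1 : 2;
	}
};

// Hypothetical helper: picks the right form from an ordered list of forms.
function pickForm( lang, forms, n ) {
	return forms[ pluralRules[ lang ]( n ) ];
}

// Sample data: the three Polish forms of "file".
const polishFileForms = [ 'plik', 'pliki', 'plików' ];

const twentyTwoFiles = '22 ' + pickForm( 'pl', polishFileForms, 22 );
```

The point is that the number of forms and the rule selecting among them are both per-language data, so a two-slot singular/plural API only covers languages like English.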
Yep.
Yep.
Yep.
Yep.
You mean, in the plural usage form? I don't want to use "|" there, that's why I proposed the object format, as more explicit and less error prone. OTOH, the pipe format could be an option and totally satisfactory in most of the case. But we won't work on this now anyway, so we can discuss this in the future. |
The idea is that en has 2 forms, so our |
👍 |
I found a quick tutorial on what the gettext workflow looks like, and it's a bit different from how I imagined it: http://www.labri.fr/perso/fleury/posts/programming/a-quick-gettext-tutorial.html
It's also important that when you run It would be worth checking how this merging process looks and how sending this to Transifex looks too. E.g. can we just generate For me the process could look like this, but perhaps I miss something:
|
AFAIR the The This workflow is nicely supported by some tools (namely poedit ) in which you can run So in such workflow the I think that the |
Also, I found https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used which clarifies how the msgids should be created:
This reminded me of what I forgot to describe and what has been confusing for you – where in the t( 'basic-styles/bold: Bold' );
{
"basic-styles/bold": "Bold button label."
} The msgid "basic-styles/bold"
msgstr "Bold"
msgctxt "Bold button label." So:
|
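The mapping written out in this comment (the id before the colon becomes `msgid`, the English value becomes `msgstr`, and the description from the contexts file becomes `msgctxt`) could be sketched as below. The function names are hypothetical, and the sketch only mirrors the mapping as stated in this one comment. Note that in a PO file `msgctxt` is written before `msgid`.

```javascript
// Escapes a value for use inside a double-quoted PO string.
function escapePo( text ) {
	return text.replace( /\\/g, '\\\\' ).replace( /"/g, '\\"' ).replace( /\n/g, '\\n' );
}

// Hypothetical helper: turns an "id: English value" message plus a contexts map
// into a single PO entry, following the mapping described in the comment above.
function toPoEntry( message, contexts ) {
	const separatorIndex = message.indexOf( ': ' );

	if ( separatorIndex === -1 ) {
		throw new Error( `Message "${ message }" does not use the "id: value" format.` );
	}

	const id = message.slice( 0, separatorIndex );
	const value = message.slice( separatorIndex + 2 );

	if ( !Object.prototype.hasOwnProperty.call( contexts, id ) ) {
		throw new Error( `Missing context for "${ id }".` );
	}

	return [
		`msgctxt "${ escapePo( contexts[ id ] ) }"`,
		`msgid "${ escapePo( id ) }"`,
		`msgstr "${ escapePo( value ) }"`
	].join( '\n' );
}

const boldEntry = toPoEntry( 'basic-styles/bold: Bold', {
	'basic-styles/bold': 'Bold button label.'
} );
```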
Thanks @jodator. TBH, I still have trouble understanding the real purpose of But, if we accept what I described above, then |
Anyway, @ma2ciek will need to dig into Transifex docs to learn about the proper process. |
Just wanted to chip in to say that, with the new and awesome ES6 string templates, there is a lot of new functionality that can be applied here. For example, I just made a demo in 5 min (jsfiddle): let translations = {
'I am $1 $2!': {
en: 'I am $1 $2!',
es: '¡Soy $1 $2!',
jp: (name, lastname) => `私は${lastname.toUpperCase()} ${name.toUpperCase()}です`
}
};
let name = 'Francisco';
let lastname = 'Presencia';
let translation = t`I am ${name} ${lastname}!`; Even in this simple example this looks more natural than: let translation = t('I am $1 $2', name, lastname); Of course you could modify many things and improve others. The point is, with template strings the syntax is closer to natural English while giving you more flexibility than normal functions. |
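The demo's `t` tag itself is not shown in the comment, so here is one way it could work, as a sketch under assumptions: the lookup key is rebuilt from the literal chunks with `$1`, `$2`, ... in place of the interpolated values, and a translation may be either a plain string or a function. `createT` and `fillPlaceholders` are hypothetical names.

```javascript
const translations = {
	'I am $1 $2!': {
		en: 'I am $1 $2!',
		es: '¡Soy $1 $2!',
		jp: ( name, lastname ) => `私は${ lastname.toUpperCase() } ${ name.toUpperCase() }です`
	}
};

// Replaces $1, $2, ... with the corresponding (1-based) values.
function fillPlaceholders( text, values ) {
	return text.replace( /\$(\d+)/g, ( match, position ) => {
		const value = values[ Number( position ) - 1 ];

		return value === undefined ? match : String( value );
	} );
}

// Hypothetical factory: returns a template tag bound to one language.
function createT( lang ) {
	return function t( strings, ...values ) {
		// [ 'I am ', ' ', '!' ] becomes the key 'I am $1 $2!'.
		const key = strings.reduce( ( result, chunk, index ) => result + '$' + index + chunk );
		const entry = translations[ key ];
		const translation = entry ? entry[ lang ] : undefined;

		if ( typeof translation === 'function' ) {
			return translation( ...values );
		}

		// Fall back to the source string when there is no translation.
		return fillPlaceholders( typeof translation === 'string' ? translation : key, values );
	};
}

const tJp = createT( 'jp' );
const japaneseGreeting = tJp`I am ${ 'Francisco' } ${ 'Presencia' }!`;
```

Because the key is a plain string, the stored translations stay plain text, while the call site keeps the template-literal syntax.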
Hi all. I implemented an internationalization service for targetprocess, and I'll describe how we did it. Maybe it will be useful for you. We use Transifex as well. const currentLang = 'en'; // this variable comes from the user settings
const defaultLang = 'en';
const dictionary = { // loaded dynamically from a separate bundle
'ru': {
'Bold button label.': 'Толстая кнопка'
}
}
const translate = (string, params) => { /* function which applies the plural format or substitutes variables */ }; // we use https://github.com/yahoo/intl-messageformat for it
const __ = (string, params) => {
if(currentLang === defaultLang) {
return translate(string, params);
} else {
return translate(dictionary[currentLang][string] || string, params);
}
}; How we made lang files for others: we put all the dictionaries in a GitHub repo.
|
Hi! Thanks for chipping in :) The use of templates was already proposed: ckeditor/ckeditor5-design#136 (comment) and we came to a conclusion that it's not going to work in our case. We need to synchronise translations through a service like Transifex which means that they need to be plain text strings. Also, with the EDIT: @franciscop, I took a bit closer look at your demo and it's really interesting. I see that in the translation repository there are plain text strings and that you can change the order of placeholders in a translation and everything works fine: en: 'I am $1 $2!',
es: '¡Soy $2 $1!', If we were able to ensure that in the future we'd be able to implement the plural forms (see #387 (comment)), then using template strings could be an option. It doesn't sound particularly tricky to have the |
I edited my previous comment because it didn't make a lot of sense initially and I created https://github.com/ckeditor/ckeditor5-utils/issues/117 – it'll be worth considering adding support for template strings to the |
So I'm working now on the very basic support without plural forms and template strings which we'll add later. I'll be pushing changes to the pull request for the https://github.com/ckeditor/ckeditor5-utils/issues/116 After small searching for the phrase |
That's a bug. |
I talked with @fredck and we decided to simplify the I updated the spec in the issue description and for you to be able to understand what I changed I created https://gist.github.com/Reinmar/2ca4400d5fb724fd9be88fd903aee8eb/revisions. |
Thanks for the tips. I can see that we're planning to implement the same solution (especially after simplifying the |
So for the clarification @Reinmar @wwalc I'm going to implement errors after spec update for the following cases:
|
Sounds good to me. |
EDITED
Yes, I made a typo. Translations are saved under the
I'll fix few issues and I'll create a pull request within an hour.
I must have misunderstood the implementation of this part. I hope it's not going to be hard to add babel or a similar parser here to find |
After a week I improved the I changed the way the It took quite a lot of time compared to the effects, mainly because of digging through out-of-date webpack plugins and webpack documentation. I hope it'll change soon. |
What does the process look like? Is it split into two parts:
or not? If it is, then I don't see a problem in sorting the translations in a way that the core will always be first. Besides, why does it have to be first in the first place? There should be no conflicts between translations provided by packages anyway. |
How's that possible that Escodegen isn't supporting all ES6 yet? How's Webpack working with ES6? Is it always pre-transpiling the modules to CJS or some custom format? |
The process isn't split. For each js file in ckeditor5-* the |
Can't we split it? ;< Or somehow locate and add ckeditor5-core automatically? |
We can add manually |
The loader is used just after resolver for each file. So this part can't be separate if we don't know paths to the |
That's why I asked whether we cannot use Webpack's module resolution logic to find this out ;) We can't hardcode this in any way. |
So I have no idea how to get the information about used packages before the loader is executed. This is a similar problem to the multi-language support problem. |
It doesn't have to be information about used packages. It has to be a path to the |
I used webpack |
Webpack doesn't use Escodegen. I'm trying to figure out what that package is doing before the files are emitted. But I think that Webpack may have its own generator, because it transforms the Acorn output into its own objects and injects its |
PS. To build that sample I used https://github.com/ckeditor/ckeditor5-labs/tree/master/integrations/webpack-integration-localization |
EDITED: See #387 (comment)
Process
t() functions in which we define the English value of a string (and, optionally, a context). E.g. t( 'Bold' ) or t( 'Button [context: clothing]' ) (the [context: clothing] will be automatically removed upon build or, in the dev mode, at runtime). Each context must be defined in lang/contexts.json of a package which uses it or in the ckeditor5-core/lang/contexts.json (for common strings).
t() usages, extracts strings from it, builds a contexts map (based on all lang/contexts.json files) and checks whether all used strings have defined contexts. Then builds a PO file with English strings and contexts and uploads it to Transifex.
lang/ directories.
Implementation
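The scanning and checking step of the process described above could look roughly like the sketch below. It is an illustration, not the actual tool. A regular expression stands in for a real parser, all function names are hypothetical, and the keys of `contexts.json` are assumed to be the full strings passed to `t()` (including any `[context: ...]` suffix).

```javascript
// Collects the unique string literals passed as the first argument of t().
// Assumption: only single-quoted literals are used.
function extractStrings( source ) {
	const pattern = /\bt\(\s*'((?:[^'\\]|\\.)*)'/g;
	const found = new Set();
	let match;

	while ( ( match = pattern.exec( source ) ) !== null ) {
		found.add( match[ 1 ] );
	}

	return [ ...found ];
}

// Merges the contents of all lang/contexts.json files into one global map.
function buildContextsMap( contextFiles ) {
	return Object.assign( {}, ...contextFiles );
}

// Returns the strings which are used in the code but have no context defined.
function findMissingContexts( strings, contexts ) {
	return strings.filter( string => !Object.prototype.hasOwnProperty.call( contexts, string ) );
}

const sampleSource = `
	button.label = t( 'Bold' );
	other.label = t( 'Button [context: clothing]' );
	third.label = t( 'Bold' );
`;

const usedStrings = extractStrings( sampleSource );
const contextsMap = buildContextsMap( [ { Bold: 'Toolbar button tooltip for the Bold feature.' } ] );
const missingContexts = findMissingContexts( usedStrings, contextsMap );
```

Here `missingContexts` ends up containing the clothing button string, which is the kind of error the tool is expected to report before a PO file is generated.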
ckeditor5-core/src/editor~Editor#constructor
I propose to merge the ctx name into the string in order to keep the same t() params in every possible environment: t( str, values ). If it was a separate param, then in the build version (in which we replace strings with ids) it would have to be left unused, or we'd need to change the implementation of t() on the fly.
The CKE_LANG will be defined by the bundler. It will be the editor class that's going to set it which means that it will only need to be set when bundling for the browser environment. Or we could go a bit further than that and define utils/global object which would retrieve the global scope depending on the env. In the browser that would be a window, in Node.js that would be... something which is global there. That would allow this code to work without the bundler needing to preset this value.
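A possible shape of such a utils/global helper is sketched below. The function names and the fallback language parameter are assumptions made for the example. `globalThis` is checked first because current browsers and Node.js both provide it, with `window` and `global` kept as fallbacks for older environments.

```javascript
// Hypothetical helper: returns the global object of the current environment.
function getGlobalScope() {
	if ( typeof globalThis !== 'undefined' ) {
		return globalThis;
	}

	if ( typeof window !== 'undefined' ) {
		return window;
	}

	if ( typeof global !== 'undefined' ) {
		return global;
	}

	throw new Error( 'Unable to locate the global scope.' );
}

// Reads CKE_LANG from the global scope, or returns the given fallback
// when nothing (e.g. a bundler) has set it.
function readLanguage( fallbackLanguage ) {
	const language = getGlobalScope().CKE_LANG;

	return language === undefined ? fallbackLanguage : language;
}
```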
PS. we already have utils/dom/global, but I think that it makes sense to keep them separated.
ckeditor5-*/lang/contexts.json
Contexts for used messages.
Examples: https://github.com/ckeditor/ckeditor-dev/tree/master/dev/langtool/meta
ckeditor5-core/lang/contexts.json:
ckeditor5-form/lang/contexts.json:
ckeditor5-tailor/lang/contexts.json:
The button is first defined in the ckeditor5-form package without a context, because, e.g. historically, we could've used it there without a context (cause, in our case, button is a UI button most of the time). Then, while working on the CKEditor 5 Tailor plugin we realised that button is already used, but in a different context, so we can't use t( 'button' ) as it will point to a wrong context definition (contexts are global for all packages). Instead, we'll use t( 'button [context: clothing]' ) and add its own definition in the ckeditor5-tailor/lang/contexts.json.
ckeditor5-utils/src/locale
No magic here – uses the translate() function of the utils/translation-service module to get translated string and replaces placeholder values ($1, $2, etc.) with passed args.
ckeditor5-utils/src/translation-service
The goal of this module is to encapsulate the whole "translations repository" logic from the rest of the code.
It may be dirty with some preprocessor rules or may need to be generated on the fly depending on the build type (e.g. with or without code splitting – i.e. with separate or built in language file). However, it would be good if it was defined in the repository so for dev purposes we wouldn't have to do anything. It'd just return the passed string.
Development mode implementation:
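The original snippet is not preserved here, so what follows is only a sketch of what a development-mode `translate()` and the locale's `t()` could look like based on the description above. The `translate( lang, str )` signature and passing `values` as an array are assumptions.

```javascript
// Development-mode stand-in for the translation service: there is no
// translations repository, so it returns the passed English string with
// the optional "[context: ...]" suffix removed.
function translate( lang, str ) {
	return str.replace( /\s*\[context: [^\]]+\]\s*$/, '' );
}

// Sketch of the locale's t(): translates the string and then replaces
// the $1, $2, ... placeholders with the passed values.
function t( str, values = [] ) {
	const translated = translate( 'en', str );

	return translated.replace( /\$(\d+)/g, ( match, position ) => {
		const value = values[ Number( position ) - 1 ];

		return value === undefined ? match : String( value );
	} );
}

const tailorLabel = t( 'Button [context: clothing]' );
const progressLabel = t( 'Uploaded $1 of $2 files.', [ 1, 3 ] );
```

A build could swap only `translate()` for a registry-backed version and leave `t()` untouched, which is the encapsulation this module is meant to provide.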
For the bundles, we have two cases:
In case of just one language, it'll be best to simply replace the str param of t() calls with the proper value (without the ctx now). This will allow for code splitting and hot-loading plugins without any tricks. It may happen, though, that some strings will then repeat multiple times – e.g. the ones from the core package. While this is going to make an un-gzipped package bigger, there shouldn't be a difference in case of a gzipped code. Besides, this will be just a few strings anyway and we'll save some space having such a simple implementation too.
In case of multiple languages we need to have some registry. The translate() implementation will be again simple:
What's the str in this case? In order to ensure that we don't have name collisions (important for bundle splitting) I'd say that this should be either:
A totally unique string – like typical uid, but preferably shorter, using a wider range of unicode characters. A 5 chars long string using a range of all Unicode chars will give us comparable complexity to utils/uid:
The thing which worries me is whether there won't be any issues with encoding – we've seen people sourcing CKEditor in some weird encodings and it was always blowing up. But one could use a minifier which encodes special characters and it would be working again.
Another thing is the ability to debug such code. With unreadable ids it may be tricky.
Finally – creating objects in which these uids are keys may be tricky. I can't find now whether there are any characters which need to be escaped (other than the closing quote).
Anyway, this solution may create unpredictable and stupid issues.
Therefore, we may just use sequential ids from a short range (e.g. [a-z]). What about code splitting? There are two cases:
ckeditor5-preset-article-editor and then, separately (might be a different person), built ckeditor5-image. In this case, the preset bundle will use normal, short, ids (because it's the main bundle) and the image feature bundle will use prefixed ids (e.g. ckeditor5-image/<id>). The idea is that a developer releasing his/her package will use a special bundler setup which configures CKEditor plugin for Webpack to use prefixed ids.
Anyway, this is nothing we have to worry about today, because, most likely, we'll work on releasing standalone package bundles after 1.0.0.
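The sequential ids from the [a-z] range, with an optional package prefix for standalone bundles, could be generated along these lines. This is a sketch under assumptions: `createIdGenerator` is a hypothetical name, and continuing with two-letter ids after `z` is just one way to extend the range.

```javascript
// Hypothetical factory: returns a function producing a, b, ..., z, aa, ab, ...
// Every id is prepended with the given prefix (empty for the main bundle).
function createIdGenerator( prefix = '' ) {
	let counter = 0;

	return function nextId() {
		let n = counter++;
		let id = '';

		// Bijective base-26 conversion, so that "z" is followed by "aa".
		do {
			id = String.fromCharCode( 97 + ( n % 26 ) ) + id;
			n = Math.floor( n / 26 ) - 1;
		} while ( n >= 0 );

		return prefix + id;
	};
}

// The main bundle uses short ids, a separately built package uses prefixed ones.
const nextPresetId = createIdGenerator();
const nextImageId = createIdGenerator( 'ckeditor5-image/' );

const firstPresetId = nextPresetId();
const firstImageId = nextImageId();
```

Since the prefix contains the package name, ids from a separately built package cannot collide with the short ids of the main bundle.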
Another thing to notice is that support for multiple languages and code splitting is an optional feature, so we can implement it for just one bundler, i.e. Webpack.
Let's say, that we want to split a package to some preset X and packages Y and Z.
This will make for 3 files – x.js, y.js, z.js. The idea is that each file will have accompanying language file(s) defining the translations needed for this file. So there will be:
x.js, x-pl.js, x-en.js, ...
y.js, y-pl.js, y-en.js, ...
z.js, z-pl.js, z-en.js, ...
In order to run the X preset with plugins with Polish translations one will need to load: x.js, y.js, z.js, x-pl.js, y-pl.js, z-pl.js.
The language files will be built using entry points like this:
And the define() function will merge these translations into the ones that it already has.
Tadam!
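Since the original entry point snippet is missing here, the following is only a sketch of the registry idea: a `define()` which merges translations per language, and a `translate()` which reads from the merged dictionaries. The `define( lang, translations )` signature, the fallback to the id itself, and the sample ids and Polish strings are assumptions.

```javascript
// Registry of translations: language code -> ( id -> translated string ).
const dictionaries = {};

// Merges the given translations into the ones already defined for the language.
function define( lang, translations ) {
	dictionaries[ lang ] = Object.assign( dictionaries[ lang ] || {}, translations );
}

// Returns the translation for the given id, or the id itself when it is unknown.
function translate( lang, id ) {
	const dictionary = dictionaries[ lang ];

	if ( dictionary && Object.prototype.hasOwnProperty.call( dictionary, id ) ) {
		return dictionary[ id ];
	}

	return id;
}

// What loading x-pl.js and then y-pl.js could boil down to:
define( 'pl', { a: 'Pogrubienie' } );
define( 'pl', { 'ckeditor5-image/a': 'Obraz' } );

const boldLabel = translate( 'pl', 'a' );
const imageLabel = translate( 'pl', 'ckeditor5-image/a' );
```

Because `define()` merges instead of overwriting, the language files can be loaded in any order and each one only needs to know about its own strings.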
PS. PO file generation
We've been changing the idea which part of t() call is msgctxt, msgid and msgstr twice already, so let's clarify this:
For the following t() calls and contexts.json:
ckeditor5-core/lang/contexts.json:
ckeditor5-form/lang/contexts.json:
ckeditor5-tailor/lang/contexts.json:
The en.po file would look like this:
In this case the msgstr can be empty because then gettext will use msgid. In fact, in the samples I found in #387 (comment) it's empty.
Regarding #387 (comment):
We ignore this issue. It's a very rare situation.