Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add formatting for durations e.g. "2h 4min. 2sec." #56

Merged

Conversation

KJTsanaktsidis
Copy link
Contributor

@KJTsanaktsidis KJTsanaktsidis commented Apr 27, 2022

Uses unumf_* to format the individual components of the duration, and
ulistfmt_* to join them. That's roughly analogous with what the C++
MeasureFormatter class does internally.

I asked about some feedback for this approach on the ICU mailing list
too: https://sourceforge.net/p/icu/mailman/message/37645785/

Annoyingly, this won't work on ICU < 67 because it needs UNumberFormatter and UListFormatter, so of course CI isn't actually running any of the tests I wrote. But they pass, I promise. If you like I wonder if I should put up a PR to make travis run the tests on a newer ICU as well.

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/duration_formatting branch 2 times, most recently from 2813a39 to be125e5 Compare April 28, 2022 07:35
@KJTsanaktsidis
Copy link
Contributor Author

Update - new implementation! I took some inspiration from the Intl js proposal (https://github.com/tc39/proposal-intl-duration-format).

Basically, now, instead of taking a number of seconds, it takes a hash specifying {years: N, ..., hours: N, minutes: N, seconds: N} individually. This passes off the problem of how to best represent a given duration to calling code (which can use something like ActiveSupport::Duration for this if they like).

The style option can be one of :long, :short, :narrow and :digital, mirroring the Javascript proposal. The first three are pretty straightforward; we simply format them individually using the UNumberFormatter methods, then join them together using the UListFormat methods.

The :digital option is special. This is supposed to give output in a format like h:mm:ss. There is code in ICU's C++ classes to do this (icu::MeasureFormat), but that's not accessible from the C API. So, I re-implemented the basics of that (the formatNumber method) in Ruby here. We look up an appropriate locale-specific pattern (although in reality it seems like it's h:mm:s for ALL locales??) and fill out the format placeholders in a loop insdie the DurationFormat implementation.

What I did not implement from the javascript side of things:

  • Separate formats per component. I don't have a usecase for them but I don't think i've precluded their implementation
  • The fractionalDigits setting. The implemented behaviour is always like "auto", which is show as many decimal places in the last field as are present, and truncate all other decimals.

Some examples!

# What the various styles look like
irb(main):001:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({hours: 8, minutes: 40, seconds: 35})
=> "8 hours, 40 minutes, 35 seconds"
irb(main):002:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :short).format({hours: 8, minutes: 40, seconds: 35})
=> "8 hrs, 40 mins, 35 secs"
irb(main):003:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :narrow).format({hours: 8, minutes: 40, seconds: 35})
=> "8h 40min. 35s."
irb(main):004:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({hours: 8, minutes: 40, seconds: 35})
=> "8:40:35"

# How digital & non-digital formats deal with units > hours
irb(main):005:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :narrow).format({days: 2, hours: 8, minutes: 40, seconds: 35})
=> "2d 8h 40min. 35s."
irb(main):006:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({days: 2, hours: 8, minutes: 40, seconds: 35})
=> "2d 8:40:35"

# Missing or zero parts are omitted
irb(main):007:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({days: 2, minutes: 40, seconds:0})
=> "2 days, 40 minutes"
irb(main):008:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({hours: 2, minutes: 40})
=> "2:40"
irb(main):009:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({minutes: 40, seconds: 7})
=> "40:07"

# Sub-second parts are folded into seconds for digital display
irb(main):010:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({hours: 5, minutes: 7, seconds: 23, milliseconds: 98, microseconds: 997})
=> "5:07:23.098997"

# Zero-extension of sub-second parts in digital style
irb(main):011:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({hours: 5, minutes: 7, seconds: 23, milliseconds: 400})
=> "5:07:23.400"
irb(main):012:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :digital).format({hours: 5, minutes: 7, seconds: 23, milliseconds: 400, microseconds: 700})
=> "5:07:23.400700"

# All fractional parts except the last are truncated
irb(main):013:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({days: 2, hours: 7.3, minutes: 40.9, seconds:0.43})
=> "2 days, 7 hours, 40 minutes, 0.43 seconds"

@lwelti
Copy link

lwelti commented Apr 28, 2022

is looking good, could you add:

  • some test using negative units such as -7 hours, etc..
  • when the unit is beyond the expected what would happen? if you have for example:
{hours: 25, minutes: 61}
  • if you have some units with zero, for example:
{hours: 0, minutes: 2}
  • And we may need some tests in other languages with complex plural forms such as `ru, pl

@KJTsanaktsidis
Copy link
Contributor Author

Yowch. Negative durations. Going to make a whole separate comment about that...

So firstly, it's possible to construct a hash like {hours: 7, minutes: -8} where some parts are positive and some are negative. I have no idea what that would represent, so I think we should not support it.

Now, what to do when all parts of the duration are negative? That is how the JS Temporal object represents a negative duration:

> Temporal.Duration.from('-P3DT4H59M')
Temporal.Duration <-P3DT4H59M> {
  days: -3,
  hours: -4,
  minutes: -59
}

As written, this PR would emit the following:

irb(main):010:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({days: -32, hours: -7, minutes: 0, seconds: -29,milliseconds: -999})
=> "-32 days, -7 hours, -29 seconds, -999 milliseconds"

i.e. each component is individually formatted as a negative number, which is a bit silly IMO. This was discussed in the duration formatting spec here: tc39/proposal-intl-duration-format#29. They concluded that the requirements for formatting a negative duration were:

a) different from the default toLocaleString output for the same duration with a positive sign
b) a format that's widely used in software ("-2h 30m" or "-3:23:45" would meet this criteria. "T minus 2 days" would not.)
c) should look to end users and developers like the inverse of the positive format. "-2h 30m" vs. "2h 30m" meets this criteria, while "2 days and 30 minutes" vs. "2 days and 30 minutes remaining" does not because the inverse of "remaining" is probably "complete" not an empty string.

However there was also some suggestion that CLDR should be aware of this:

I think the scope of this issue for 402 is just to make sure that CLDR is aware of the sign of the duration when generating its implementation-defined output

But, CLDR doesn't have any data for doing this differently as far as I can tell. It would have to be part of the list format , and it's not - https://github.com/unicode-org/cldr/blob/208411e0dd2562db70d67b8d75dc7e8ed4ce30a8/common/main/en.xml#L8942. There would need to be a pattern like this defined, and there isn't:

		<listPattern type="unit-narrow-negative">
                       <!-- we put a negative sign at the START of the list! -->
			<listPatternPart type="start">-{0} {1}</listPatternPart>
			<listPatternPart type="middle">{0} {1}</listPatternPart>
			<listPatternPart type="end">{0} {1}</listPatternPart>
			<listPatternPart type="2">{0} {1}</listPatternPart>
		</listPattern>

And also ulistfmt_format() would need some way of being told that the whole quantity is negative, and it doesn't have that.

(This problem isn't unique to time, by the way - I think exactly the same issue would crop up if we wanted to display a quantity like "-3m 20cm" for example)


So, since this is tricky, and the correct CLDR/ICU support isn't there for doing this properly, I just made this formatter throw exceptions if any component of the duration is negative.

@KJTsanaktsidis
Copy link
Contributor Author

The other questions:

when the unit is beyond the expected what would happen? if you have for example: {hours: 25, minutes: 61}

It doesn't do any balancing (and neither does the Intl one: tc39/proposal-intl-duration-format#10) so it'll print exactly that:

irb(main):011:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({hours: 25, minutes: 61})
=> "25 hours, 61 minutes"

In Ruby-land, I think one would use something like ActiveSupport::Duration to do the balancing before passing it to ffi-icu:

irb(main):007:0> dr1 = ActiveSupport::Duration.parse('PT25H61M')
=> 25 hours and 61 minutes
irb(main):008:0> dr1.parts
=> {:hours=>25, :minutes=>61}
irb(main):009:0> dr2 = ActiveSupport::Duration.build(dr1.to_i)
=> 1 day, 2 hours, and 1 minute
irb(main):010:0> dr2.parts
=> {:days=>1, :hours=>2, :minutes=>1}
irb(main):011:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format(dr1.parts)
=> "25 hours, 61 minutes"
irb(main):012:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format(dr2.parts)
=> "1 day, 2 hours, 1 minute"

Anyway, a long way of saying "Not My Problem" 😂

if you have some units with zero, for example: {hours: 0, minutes: 2}

Ignores the zero parts, as per the js spec (and there are tests for this):

irb(main):013:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({hours: 0, minutes: 2})
=> "2 minutes"

And we may need some tests in other languages with complex plural forms such as ru or pl

Interesting. I wonder what it currently outputs.

irb(main):014:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'ru', style: :long).format({hours: 1, minutes: 2, seconds: 3})
=> "1 час 2 минуты 3 секунды"
irb(main):015:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'ru', style: :long).format({hours: 10, minutes: 20, seconds: 30})
=> "10 часов 20 минут 30 секунд"
irb(main):016:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'ru', style: :narrow).format({hours: 1, minutes: 2, seconds: 3})
=> "1 ч 2 мин 3 с"
irb(main):017:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'ru', style: :narrow).format({hours: 10, minutes: 20, seconds: 30})
=> "10 ч 20 мин 30 с"

To my uninformed eye it looks like it's doing the "2" vs "lots" plurals correctly (минуты vs минут) but I'll try and go poke a russian speaker and see what they think.

@aurelieverrot
Copy link

aurelieverrot commented Apr 29, 2022

# All fractional parts except the last are truncated
irb(main):013:0> ICU::DurationFormatting::DurationFormatter.new(locale: 'en-AU', style: :long).format({days: 2, hours: 7.3, minutes: 40.9, seconds:0.43})
=> "2 days, 7 hours, 40 minutes, 0.43 seconds"

Why is it not converting 7.3 hours in 7 hours, 18 minutes, and 40.9 minutes in 40 minutes, 54 seconds?
I would have expected to receive the following output:

=> "2 days, 7 hours, 58 minutes, 54.43 seconds"

@KJTsanaktsidis
Copy link
Contributor Author

Why is it not converting 7.3 hours in 7 hours, 18 minutes, and 40.9 minutes in 40 minutes, 54 seconds?

It seems in the Intl DurationFormatter spec no consensus had been reached about how to format fractional digits for anything > seconds: tc39/proposal-intl-duration-format#89, tc39/proposal-intl-duration-format#65

I think the reason not to convert 7.3 hours -> 7 hours, 18 minutes is that everything > seconds are calendar units - we add leap seconds to days to correct for the slowing of the earth's rotation, for example, so the precise number of seconds in 7.3 hours actually depends on whether this day has a leap second in it or not.

Now, perhaps we could just format it as "2 days, 7.3 hours, 40.9 minutes, 0.43 seconds", but the DurationFormatter spec explicitly doesn't define anything about fractionalDigits for anything > seconds. So I figured the most sensible behaviour was to just truncate off any fractional parts in > seconds. I could also just make it raise an exception if we think that's a better idea.

@KJTsanaktsidis
Copy link
Contributor Author

KJTsanaktsidis commented May 2, 2022

BTW @aurelieverrot if you want the balancing behaviour of 7.3 hours -> 7 hours, 18 minutes, I think ActiveSupport::Duration can do that for you:

irb(main):006:0> ActiveSupport::Duration.build(7.3.hours.to_i).parts
=> {:hours=>7, :minutes=>18}

But i think the reason for punting this to application code is that it needs to make choices about how to handle uneven months, leap seconds, etc etc. For example, ActiveSupport makes the following choices for you:

SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600
SECONDS_PER_MINUTE = 60
SECONDS_PER_MONTH = 2629746
SECONDS_PER_WEEK = 604800
SECONDS_PER_YEAR = 31556952

But different applications might want more sophisticaed handling.

@aurelieverrot
Copy link

Oh so we could pass the result of ActiveSupport::Duration to ICU::DurationFormatting::DurationFormatter.new.
It makes sense. Thank you @KJTsanaktsidis

Copy link
Owner

@erickguan erickguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I will see what can I do with Travis.

list_join_format = STYLES_TO_LIST_JOIN_FORMAT.fetch(style)
@list_formatter = FFI::AutoPointer.new(
Lib.check_error { |error|
Lib.ulistfmt_openForType(@locale, :units, list_join_format, error)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we translate this :units to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow your question, sorry. :units here refers to the constant ULISTFMT_TYPE_UNITS, which is for formatting lists of quantities like "5 pounds, 12 ounces".

This list formatter gets used at the end of #format to glue all the duration components together

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it looks good.

KJ Tsanaktsidis added 3 commits December 20, 2022 11:13
Uses unumf_* to format the individual components of the duration, and
ulistfmt_* to join them. That's roughly analogous with what the C++
MeasureFormatter class does internally.

I asked about some feedback for this approach on the ICU mailing list
too: https://sourceforge.net/p/icu/mailman/message/37645785/
@erickguan erickguan force-pushed the ktsanaktsidis/duration_formatting branch from dc811fd to 8daded6 Compare December 20, 2022 10:13
@erickguan erickguan merged commit 59f7c32 into erickguan:master Dec 20, 2022
@erickguan
Copy link
Owner

Thanks. I will release it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants