Neo DateTime FFI #5940

sffc · 2024-12-20T07:59:32Z

Some thoughts on the structure of DateTime FFI.

Types: we might not need very many

The Rust type DateTimeFormatter<CompositeFieldSet> does everything. We can have many constructors that load different subsets of data. The reason to have multiple types is to reduce stack size (memory usage in FFI) and make the Drop impl a bit smaller, which I think is valuable.

It probably makes sense to start with these types:

FFI type	Rust type	Approximate stack size
`TimeFormatter`	`TimeFormatter<TimeFieldSet>`	272
`DateFormatter`	`DateTimeFormatter<DateFieldSet>`	440
`DateTimeFormatter`	`DateTimeFormatter<CompositeDateTimeFieldSet>`	480
`ZonedDateTimeFormatter`	`DateTimeFormatter<CompositeFieldSet>`	1472
`GregorianDateTimeFormatter`	`FixedCalendarDateTimeFormatter<Gregorian, CompositeDateTimeFieldSet>`	424
`GregorianZonedDateTimeFormatter`	`FixedCalendarDateTimeFormatter<Gregorian, CompositeFieldSet>`	1416

size_test!(TimeFormatter<fieldsets::enums::TimeFieldSet>, time_formatter_size, 272);
size_test!(DateTimeFormatter<fieldsets::enums::DateFieldSet>, date_formatter_size, 440);
size_test!(DateTimeFormatter<fieldsets::enums::CompositeDateTimeFieldSet>, date_time_formatter_size, 480);
size_test!(DateTimeFormatter<fieldsets::enums::CompositeFieldSet>, zoned_date_time_formatter_size, 1472);
size_test!(FixedCalendarDateTimeFormatter<icu_calendar::Gregorian, fieldsets::enums::CompositeDateTimeFieldSet>, gregorian_date_time_formatter_size, 424);
size_test!(FixedCalendarDateTimeFormatter<icu_calendar::Gregorian, fieldsets::enums::CompositeFieldSet>, gregorian_zoned_date_time_formatter_size, 1416);

Maybe we don't need the Gregorian types. The only difference is the presence of the AnyCalendar field and its associated destructor, which seems to be fairly small in the grand scheme. We can still of course have the Gregorian constructors, but they can resolve to these types.

I think we can also have just DateTimeFormatter and not DateFormatter.

Basically: time zone data is way bigger than date data which is way bigger than time data. So if you already have date data, removing time data is mostly in the noise, and if you already have zone data, removing date data is mostly in the noise.

Static Field Set Constructors

For dates, times, and datetimes, I think we can just have all the constructors. There are 20ish.

I think we should flatten the field set options into positional arguments, like we do in some other types. This avoids having a bunch of tiny field set structs like we do in Rust.

For time zones, I see a few options…

Maybe we could have a fn to "add" time zone data to a DateTimeFormatter, transforming it into a ZonedDateTimeFormatter. This would require a bit of work on the Rust side, but I think it's doable.
Or we do a hybrid approach where we have time zone constructors for each style and they take a dynamic datetime field set as an argument. You won't get slicing of date data, but date data is small relative to time zone data.
Anything else?

Dynamic Field Set Constructors

The Rust approach is infeasible over FFI, since it is built on the idea of a bunch of little structs that compose into bigger structs and eventually into an all-encompassing dataful enum.

I think we instead want to introduce a builder API. I think we should start in Rust and then port it to FFI. I've had in mind for a while that the builder API would work similarly to the Serde impl.

There are a few approaches for specifying the field sets:

Enums and structs: specify all date, calendar period, time, and zone fields in plain enums. Then you put the ones you want into a struct. The constructor is fallible if you put together an invalid combination of field sets, but the field sets individually are always valid. This is slightly more type-safe.
Set of fields: export a single enum with possible fields, and the field set is a vector of that enum. Then we check to see if the fields you put in the vector compose a valid field set. This is what the Serde impl does.
String: specify the semantic field set as a string, perhaps in JSON syntax, and use the Serde impl directly.

I lean toward option 2 because it is a smaller API surface, but I can be easily persuaded. If option 1 can be implemented more efficiently since it is structs and enums instead of a vector, that would convince me.

Alternatively, we could say that we don't export dynamic field sets over FFI. Clients in each language can build their own delegation mechanism that sits on top of the static constructors. This would be annoying for clients, but it is totally feasible. And if CLDR adds a string semantic skeleton syntax, we can add a single narrow constructor for that.

@Manishearth

The text was updated successfully, but these errors were encountered:

Manishearth · 2024-12-20T18:10:52Z

The only difference is the presence of the AnyCalendar field and its associated destructor, which seems to be fairly small in the grand scheme

AnyCalendar pulls in a lot of data statically, no? We could add try_new_gregorian() (etc) functions to it to reduce it a bit, but we'd still be stuck with tons of symbols data for the formatters.

I like your first set of six types.

Maybe we could have a fn to "add" time zone data to a DateTimeFormatter, transforming it into a ZonedDateTimeFormatter. This would require a bit of work on the Rust side, but I think it's doable.

I'd prefer to avoid doing mutation/moving over FFI if possible, there are safety concerns there. Eventually I'd like diplomat to start enforcing these things which would make me a bit more relaxed about it but right now our enforcement basically is "we have very few mutable types and we know how they get used".

In particular, I see hazards of mixing this with parts formatting since that involves persisting borrows.

Not opposed, but disprefer it.

I think we instead want to introduce a builder API. I think we should start in Rust and then port it to FFI. I've had in mind for a while that the builder API would work similarly to the Serde impl.

Quite interested.

(Yes, builder APIs also involve mutation, but this is a bit more relaxed: it's abnormal to ever borrow from a builder, builder types exist to be mutated and then consumed, and there are no other operations people typically add)

I lean toward option 2 because it is a smaller API surface, but I can be easily persuaded. If option 1 can be implemented more efficiently since it is structs and enums instead of a vector, that would convince me.

Requiring additional allocation for each DTF construction doesn't seem great. I do like option 1 and think that validation won't be too hard since Rust exhaustive matching will force us to handle everything. Might be messy, I'm not sure, if I recall correctly the "valid" fieldset combinations are actually pretty easily described.

String: specify the semantic field set as a string, perhaps in JSON syntax, and use the Serde impl directly.

I think this would be a fine API to have for 2.0. People probably will be wanting this anyway. It's not going to be efficient, but it's a nice API to have and gives us breathing room to design something good if we want.

Edit: Shane mentioned that semantic skeleton string formats are not specified in UTS 35, so this probably needs more work and is not an easy way out.

May have more thoughts later.

Part of #5940

sffc · 2024-12-21T01:04:11Z

I put up a new builder API in #5944.

We now have several ways of specifying skeletons:

2.0-BETA1 Serde

{
  "fieldSet": ["year","month","day","weekday","time","zoneGeneric"],
  "length":"medium",
  "alignment":"column",
  "yearStyle":"always",
  "timePrecision":"secondF3"
}

PR #5944

{
  "dateFields":"YMDE",
  "zoneStyle":"Zs",
  "length":"medium",
  "alignment":"column",
  "yearStyle":"always",
  "timePrecision":"secondF3"
}

Rust API

fieldsets::YMDET::medium()
  .with_alignment(Alignment::Column)
  .with_year_style(YearStyle::Always)
  .with_time_precision(TimePrecision::SecondF3)
  .with_zone_generic()

ECMA-402 (for comparison)

{
  "era": "short",
  "year": "numeric",
  "month": "short",
  "day": "2-digit",
  "hour": "2-digit",
  "minute": "numeric",
  "second": "numeric",
  "fractionalSecondDigits": 3,
  "timeZoneName": "shortGeneric",
}

Comparing and contrasting:

	2.0-BETA1 Serde	PR #5944	Rust API	ECMA-402
Type Safety	Enums for options, but allows invalid sets of fields and/or options.	Enums for options and fields, but allows invalid combinations of options.	Fully type-safe	Allows invalid sets of fields and/or options, including options not allowed by semantic skeletons.
FFI Compatibility	Medium; needs a vector	Easy	Difficult	Easy
ECMA-402 Compatibility	Conversion is relatively straightforward	Conversion is possible but takes a bit of work and a bunch of matching in userland	Conversion is possible but likely requires even more work than PR #5944	N/A
Readability	@sffc thinks this is the most readable since it avoids jargon like "YMDET"	@Manishearth thinks this is the most readable since it is easier to parse, and people should know what "YMDET" means	Mostly readable, aside from the YMDET	Mostly readable, though verbose

What I think we should do, after a brief conversation with @Manishearth:

Adopt PR Add new datetime field set builder #5944 for FFI
Export the builder to Rust users as an alternative to the type-safe API
Change the Serde impl to use the new schema
Export a standalone API to map from ECMA-402 skeletons to ICU4X skeletons (instead of doing this in userland)
Ask CLDR-TC to adopt this new schema as the standard serialized form in UTS 35

Thoughts? @zbraniecki

Part of #5940

sffc · 2024-12-28T07:19:42Z

AnyCalendar pulls in a lot of data statically, no?

Only if the AnyCalendar constructor is used. I propose using the Gregorian constructor and then casting the resulting FixedCalendarDateTimeFormatter into a DateTimeFormatter via into_formatter.

I really want a way to allow for time zone selection over FFI without including all time zone data, but I guess for 2.0 we can just say, here's the builder API, and if you want less data, upvote this issue and write a custom FFI fn in the interim.

Toward #5928, #5940

sffc · 2025-02-07T18:53:07Z

A few more thoughts on ZonedDateTimeFormatter FFI:

I'm only considering options that allow for DCE-based data slicing. Here are some options with C++ pseudocode:

// Option 1: build a DTF first and move out of it to get a ZDTF
auto formatter = DateTimeFormatter::create_mdet(locale).with_zone_specific_short();

// Option 1a: don't automatically retain the locale
auto formatter = DateTimeFormatter::create_mdet(locale).with_zone_specific_short(locale);

// Option 2: builder type specifically for ZDTF
auto builder = ZonedDateTimeFormatterBuilder::new(locale);
builder.load_mdet();
builder.load_zone_specific_short();
auto formatter = builder.build();

// Option 2a: similar API to option 1
auto builder = ZonedDateTimeFormatterBuilder::create_mdet(locale);
auto formatter = builder.build_with_zone_specific_short();

// Option 3: builder type for all datetime formatters, replacing type-specific constructors
auto builder = DateTimeFormatterBuilder::new(locale);
builder.load_mdet();
builder.load_zone_specific_short();
auto formatter = builder.build_zoned_datetime();

// Option 3a: a more type-safe version
DateTimeFormatterBuilder::new(locale)
    .with_mdet()
    .with_zone_specific_short()
    .build();

A bit more of a description of each:

Option 1: Move out of DTF
- Pro: Relatively fewer APIs
- Con: DTF format function can start throwing an "invalid state" exception
- Con: DTF needs to retain the locale in case someone tries to call a ZDTF constructor later
- Con: Prevents us from in the future adding APIs that borrow the DTF since we want to avoid both-borrow-and-mutate types in FFI
Option 1a: Move out of DTF and do not retain the locale
- Pro: Regular DTF has almost no penalty
- Con: The locales could end up being mismatched
Option 2: Builder for ZDTF
- Pro: Decouples DTF from ZDTF
- Con: Lots of code duplication; for example, we already have 4 create_mdet variants (anycal, fixedcal, anycal-buffer, fixedcal-buffer), and this creates 4 more
- Con: This would likely be an FFI-only abstraction
Option 2a: Builder for ZDTF with API more like option 1
- Pros and cons similar to option 1
Option 3: Builder type for all datetime formatters
- Pro: Single centralized builder that we can document clearly
- Pro: Could also be motivated as a Rust abstraction, keeping better alignment between Rust and FFI
- Pro: No duplicate functions
- Con: DTF loses its nice constructors
- Con: User is required to call the correct build function
Option 3a: Builder type for all datetime formatters with more type safety
- Pro: Looks a lot like the Rust static fieldset API, although it is implemented differently (each with function performs a data load)
- Con: DTF loses its nice constructors

For any of these options, there are a few mechanical ways to implement the "build" function:

Builder function takes &mut self, moves out of Option fields, and returns an opaque
- Con: Mutation is not generally recommended in Diplomat when there are alternatives (Solving variance with Opaque and OpaqueMut (Diplomat and aliasing safety) rust-diplomat/diplomat#225)
Builder function takes &self, moves out of RefCell or Mutex fields (possibly depending on a sync feature), and returns an opaque
- Con: Need to think about sync; slightly more expensive than regular mutation
Builder function takes &self, clones the data (which in most cases is a cheap clone), and returns an opaque
- Con: The clone might not be cheap, and it is probaly more expensive than moving

I am currently thinking of exploring Option 3a with RefCell/Mutex gated on a sync feature added to icu_capi (which will also imply other sync features such as icu_provider/sync).

@Manishearth @robertbastian

sffc · 2025-02-11T09:37:14Z

@Manishearth The &mut self is still useful to Diplomat to indicate to users that the method is a mutating method. Should we do both &mut self and RefCell/Mutex for visibility and safety together?

It also means we could remove the RefCell/Mutex in a stable way if Diplomat solves the mutation problem.

Manishearth · 2025-02-11T09:39:46Z

Oh, that's an interesting idea. Then we don't have to worry about people misusing it in a multithreaded context¹.

Yeah, go for it!

It's not possible in a singlethreaded context: nothing here is reentrant because we don't have callbacks. ↩

Manishearth · 2025-02-11T09:40:41Z

I am currently thinking of exploring Option 3a with RefCell/Mutex gated on a sync feature added to icu_capi (which will also imply other sync features such as icu_provider/sync).

Also I think I said this in person but I am happy with this plan, and do not have strong opinions on the choice of sync mitigation strategy though your latest idea seems liek the best.

robertbastian · 2025-02-11T10:33:11Z

Builder function takes &self, clones the data (which in most cases is a cheap clone), and returns an opaque
Con: The clone might not be cheap, and it is probaly more expensive than moving

Which clones here will be expensive? The options are all Copy, the Locale is borrowed.

sffc · 2025-02-11T10:38:03Z

Builder function takes &self, clones the data (which in most cases is a cheap clone), and returns an opaque
Con: The clone might not be cheap, and it is probaly more expensive than moving

Which clones here will be expensive? The options are all Copy, the Locale is borrowed.

Each step of the builder loads data and returns DataErrors (or DateTimeLoadErrors). It's the only way to make data slicing work.

robertbastian · 2025-02-11T10:40:04Z

How would that work? You'd have to poke a lot of holes in to the datetime crate. What are the intermediate builder types?

Cloning DataPayloads should be cheap, it's either a ref copy, or an Rc bump, which is still cheaper than a RefCell.

sffc · 2025-02-11T10:45:43Z

How would that work? You'd have to poke a lot of holes in to the datetime crate. What are the intermediate builder types?

I'd probably build it in Rust first: "Could also be motivated as a Rust abstraction, keeping better alignment between Rust and FFI"

Cloning DataPayloads should be cheap, it's either a ref copy, or an Rc bump, which is still cheaper than a RefCell.

In Rust, I wouldn't want to rely on this, since people can make payloads that own their data, but maybe it's okay in FFI? And we tell people that if they are making custom payloads, then don't use this function and make your own FFI wrapper?

I'm maybe okay with that. If we do this, then I'd prefer to implement option 1 or 1a. @Manishearth?

sffc · 2025-02-11T17:24:58Z

From a brief discussion, @Manishearth is okay with this path, and @robertbastian prefers the ergonomics of option 1.

I also think option 1 has the best ergonomics, though I am a bit sad to be adding a DataLocale to the heap size of DateTimeFormatter even if you don't normally need it.

sffc · 2025-02-12T21:51:20Z

Been thinking a bunch about how to implement this in a way that avoids expanding the Rust API in undesirable ways.

Here's an idea I had:

impl<C, FSet> TypedDateTimeNames<C, FSet> {
    pub fn from_formatter(
        self,
        preferences: DateTimeFormatterPreferences,
        formatter: FixedCalendarDateTimeFormatter<C, FSet>
    ) -> Self {
        // pull out the inner field of the formatter; throw away the pattern
    }
}

impl<C, FSet> TypedDateTimeNames<C, FSet> {
    pub fn try_into_formatter_unstable<P>(
        self,
        provider: &P,
        field_set: FSet,
    ) -> Result<FixedCalendarDateTimeFormatter<C, FSet>>
        /* where P contains FSet's Pattern-related data but not any names*/
    {
        // load the pattern and check that it matches
    }
}

This way, we can convert a formatter into a TypedDateTimeNames, load the time zone data, and then convert it back with a new pattern. Note that this is a low-level operation, so I put the APIs on TypedDateTimeNames and not on FixedCalendarDateTimeFormatter.

With DateTimeFormatter, I think we can go through TypedDateTimeNames<(), FSet>.

Toward #5940 This allows a client such as FFI to add arbitrary data to a DateTimeFormatter instance. I put the conversion functions and docs on the DateTimeNames types because this is a power-user feature.

sffc added the C-datetime Component: datetime, calendars, time zones label Dec 20, 2024

sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Dec 20, 2024

Manishearth mentioned this issue Dec 20, 2024

Basic macro support rust-diplomat/diplomat#752

Open

This was referenced Dec 20, 2024

Add with_fset fns to datetime types #5942

Merged

Add new datetime field set builder #5944

Merged

sffc added a commit that referenced this issue Dec 21, 2024

Add with_fset fns to datetime types (#5942)

8c064d0

Part of #5940

sffc added a commit that referenced this issue Dec 21, 2024

Add new datetime field set builder (#5944)

9a60869

Part of #5940

Manishearth assigned sffc Jan 6, 2025

sffc added the S-large Size: A few weeks (larger feature, major refactoring) label Jan 7, 2025

Manishearth mentioned this issue Jan 7, 2025

Decide on names for icu_datetime and icu_calendar errors over FFI #5274

Open

sffc mentioned this issue Jan 17, 2025

Implement Serde and improve test coverage of FieldSetBuilder #6014

Merged

sffc added a commit that referenced this issue Jan 20, 2025

Implement Serde and improve test coverage of FieldSetBuilder (#6014)

0ec545b

Toward #5928, #5940

This was referenced Jan 20, 2025

Update diplomat coverage #6019

Merged

Add initial neo datetime FFI #6035

Merged

sffc mentioned this issue Feb 13, 2025

DateTime: Add conversions between Formatter and Names #6110

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Neo DateTime FFI #5940

Neo DateTime FFI #5940

sffc commented Dec 20, 2024 •

edited

Loading

Manishearth commented Dec 20, 2024 •

edited

Loading

sffc commented Dec 21, 2024 •

edited

Loading

sffc commented Dec 28, 2024

sffc commented Feb 7, 2025

sffc commented Feb 11, 2025 •

edited

Loading

Manishearth commented Feb 11, 2025

Manishearth commented Feb 11, 2025

robertbastian commented Feb 11, 2025

sffc commented Feb 11, 2025 •

edited

Loading

robertbastian commented Feb 11, 2025

sffc commented Feb 11, 2025

sffc commented Feb 11, 2025

sffc commented Feb 12, 2025

Neo DateTime FFI #5940

Neo DateTime FFI #5940

Comments

sffc commented Dec 20, 2024 • edited Loading

Types: we might not need very many

Static Field Set Constructors

Dynamic Field Set Constructors

Manishearth commented Dec 20, 2024 • edited Loading

sffc commented Dec 21, 2024 • edited Loading

sffc commented Dec 28, 2024

sffc commented Feb 7, 2025

sffc commented Feb 11, 2025 • edited Loading

Manishearth commented Feb 11, 2025

Footnotes

Manishearth commented Feb 11, 2025

robertbastian commented Feb 11, 2025

sffc commented Feb 11, 2025 • edited Loading

robertbastian commented Feb 11, 2025

sffc commented Feb 11, 2025

sffc commented Feb 11, 2025

sffc commented Feb 12, 2025

sffc commented Dec 20, 2024 •

edited

Loading

Manishearth commented Dec 20, 2024 •

edited

Loading

sffc commented Dec 21, 2024 •

edited

Loading

sffc commented Feb 11, 2025 •

edited

Loading

sffc commented Feb 11, 2025 •

edited

Loading