-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit return values of unicode property functions #1239
Comments
Happy to chat.
|
Yeah, the current API is largely a result of us having not yet hammered down the details. It's just what happened to be implementable. We should consider most parts of it to be fair game for debate. |
In #1269 I'm adding an FFI with the following syntax: auto result = ICU4XUnicodeSetProperty::try_get_ascii_hex_digit(dp);
if (result.success) {
bool contains = result.data.value().contains(cp);
} with the following Rust definitions: pub struct ICU4XUnicodeSetProperty(DataPayload<'static, UnicodePropertyV1Marker>);
impl ICU4XUnicodeSetProperty {
pub fn contains(&self, cp: char) -> bool {
self.0.get().inv_list.contains(cp)
}
} We should unify the FFI and the Rust API once we finalize the design. |
After reviewing #1608, I think we should have a class like |
@sffc to reproduce the conclusion whiteboard. |
@Manishearth here are the notes to our previous meeting to create an initial design of what this might look like, and after I opened PR #1333 to attempt implementing it, I ran into tricky Rust issues when implementing it, and the discussion you and I had about those issues exist at the end of the same section of the notes doc: |
Thanks so much for digging that up! |
A thing I noticed is that this plan would require renaming the |
Posting a WIP for the new design, now working on integrating // This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).
//! Temporary module for new return value design
use core::marker::PhantomData;
use icu_codepointtrie::{CodePointTrie, TrieValue};
use icu_provider::prelude::*;
use icu_provider::{yoke, zerofrom};
use icu_uniset::UnicodeSet;
use icu_uniset::UnicodeSetBuilder;
/// A set of characters with a particular property.
#[derive(Debug, Eq, PartialEq, Clone, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize))]
#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
pub struct CodePointSet<'data> {
/// The set of characters, represented as an inversion list
#[cfg_attr(feature = "serde", serde(borrow))]
pub inv_list: UnicodeSet<'data>,
}
impl Default for CodePointSet<'static> {
/// Default empty property
fn default() -> CodePointSet<'static> {
CodePointSet {
inv_list: UnicodeSetBuilder::new().build(),
}
}
}
impl<'data> CodePointSet<'data> {
/// Creates a [`UnicodeProperty`] for the given [`UnicodeSet`].
pub fn from_owned_uniset(set: UnicodeSet<'data>) -> CodePointSet<'data> {
CodePointSet { inv_list: set }
}
}
impl<'data> From<CodePointSet<'data>> for UnicodeSet<'data> {
fn from(prop: CodePointSet<'data>) -> UnicodeSet<'data> {
prop.inv_list
}
}
/// A set of characters with a particular property.
#[derive(Clone)]
pub struct CodePointSetData<T: CodePointSetLikeProperty> {
data: DataPayload<CodePointSetMarker<T>>,
}
impl<T: CodePointSetLikeProperty> CodePointSetData<T> {
pub fn contains(&self, ch: char) -> bool {
self.data.get().inv_list.contains(ch)
}
}
/// A map efficiently storing data about individual characters.
#[derive(Debug, Eq, PartialEq, yoke::Yokeable, zerofrom::ZeroFrom)]
#[cfg_attr(feature = "datagen", derive(serde::Serialize))]
#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
pub struct CodePointMap<'data, T: TrieValue> {
/// A codepoint trie storing the data
#[cfg_attr(feature = "serde", serde(borrow))]
pub code_point_trie: CodePointTrie<'data, T>,
}
impl<'data, T: TrieValue> Clone for CodePointMap<'data, T>
where
<T as zerovec::ule::AsULE>::ULE: Clone,
{
fn clone(&self) -> Self {
CodePointMap {
code_point_trie: self.code_point_trie.clone(),
}
}
}
/// A set of characters with a particular property.
#[derive(Clone)]
pub struct CodePointMapData<T: CodePointMapLikeProperty> {
data: DataPayload<CodePointMapMarker<T>>,
}
impl<T: CodePointMapLikeProperty> CodePointMapData<T> {
pub fn get(&self, ch: char) -> T::Value {
self.data.get().code_point_trie.get(ch as u32)
}
}
/// Properties that are stored in CodePointSets
pub trait CodePointSetLikeProperty {
const KEY: ResourceKey;
}
#[derive(Copy, Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Hash)]
pub struct CodePointSetMarker<T>(PhantomData<T>);
impl<T> DataMarker for CodePointSetMarker<T> {
type Yokeable = CodePointSet<'static>;
}
impl<T: CodePointSetLikeProperty> ResourceMarker for CodePointSetMarker<T> {
const KEY: ResourceKey = T::KEY;
}
/// Properties that are stored in CodePointSets
pub trait CodePointMapLikeProperty {
const KEY: ResourceKey;
type Value: TrieValue;
}
#[derive(Copy, Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Hash)]
pub struct CodePointMapMarker<T>(PhantomData<T>);
impl<T: CodePointMapLikeProperty> DataMarker for CodePointMapMarker<T> {
type Yokeable = CodePointMap<'static, T::Value>;
}
impl<T: CodePointMapLikeProperty> ResourceMarker for CodePointMapMarker<T> {
const KEY: ResourceKey = T::KEY;
}
fn get_cp_map<D, T>(provider: &D, key: ResourceKey) -> CodePointMapData<T>
where
D: DynProvider<CodePointMapMarker<T>> + ?Sized,
T: CodePointMapLikeProperty,
{
let resp: DataResponse<CodePointMapMarker<T>> =
provider.load_payload(key, &Default::default())?;
let property_payload: DataPayload<CodePointMapMarker<T>> = resp.take_payload()?;
Ok(CodePointMapData { data: property_payload })
}
macro_rules! make_map_property {
(
property: $prop_name:expr;
marker: $marker_name:ident;
value: $value_ty:path;
key: $key_name:expr;
func:
$(#[$attr:meta])*
pub fn $funcname:ident();
) => {
#[doc = concat!("Marker type for `", $prop_name, "` Unicode property, ")]
#[doc = concat!("with value [`", stringify!($value_ty), "`]")]
pub struct $marker_name;
impl CodePointMapLikeProperty for $marker_name {
const KEY: ResourceKey = $key_name;
type Value = $value_ty;
}
$(#[$attr])*
pub fn $funcname<D>(provider: &D) -> CodePointMapData<$marker_name>
where
D: DynProvider<CodePointMapMarker<$marker_name>> + ?Sized,
{
get_cp_map(provider, $key_name)
}
}
}
// use ;
make_map_property!{
property: "Word_Break";
marker: WordBreakProperty;
value: crate::props::WordBreak;
key: crate::provider::key::WORD_BREAK_V1;
func:
/// testing
pub fn get_word_break();
} |
With my current plan I'm also not sure what to do about versioning and I might make a PR without V1s on it, which we can add separately if desired. |
Hmm, I'm hitting rust-lang/rust#96223 |
What I might do is get rid of the prop marker types for now and perhaps introduce them in the future, which will be easy once we have a macro doing everything. |
Hm, that doesn't actually fix things 😅 |
WIP is at https://github.com/unicode-org/icu4x/pull/1808/files, split out macro stuff into #1810 |
I've written https://docs.google.com/document/d/1gVEV_cSrdCSk1WxKVthdcIB1XenrcB-wDyanRAwMbUI/edit to resolve some open questions here. |
We're not blocked on this anymore, the issue crops up when there's a missing Deserialize impl; we just need to test with |
(there's probably an issue for this already, but I'm filing it to make sure)
Currently we have
which results in call sites that look like
We should think about this further.
CC @echeran @markusicu
The text was updated successfully, but these errors were encountered: