Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge CodePointTrie, UnicodeSet, and Char16Trie into the same crate #1856

Closed
3 of 4 tasks
sffc opened this issue May 4, 2022 · 12 comments · Fixed by #2336
Closed
3 of 4 tasks

Merge CodePointTrie, UnicodeSet, and Char16Trie into the same crate #1856

sffc opened this issue May 4, 2022 · 12 comments · Fixed by #2336
Assignees
Labels
C-unicode Component: Props, sets, tries S-small Size: One afternoon (small bug fix or enhancement) T-core Type: Required functionality

Comments

@sffc
Copy link
Member

sffc commented May 4, 2022

I've been thinking for a while that we should have a single crate that houses all of the Unicode data structures, which currently include:

  • CodePointTrie: map from code point to value
  • UnicodeSet: collection of code points
  • Char16Trie: collection of strings

Thoughts?

@sffc sffc added T-core Type: Required functionality C-unicode Component: Props, sets, tries S-small Size: One afternoon (small bug fix or enhancement) needs-approval One or more stakeholders need to approve proposal labels May 4, 2022
@Manishearth
Copy link
Member

Manishearth commented May 5, 2022

I agree, but we should name them better

@echeran
Copy link
Contributor

echeran commented May 5, 2022

Is this a code organization thing, or is there a technical benefit to doing this, or something else? I assume dead code elimination means that there's not a perf difference either way.

But making such a change like this requires churn (we've had the churn already moving both UnicodeSet and CodePointTrie from components into utils). So it's good to know the motivating problem first.

If it's a code organization thing (ex: too many crates within utils), we can always create a subdirectory, ex: utils/props/codepointtrie, utils/props/uniset.
(Where possible, I think it's nice for codebases to organize files this way as appropriate, but I think subdirectories / subpackages get idiomatically under-utilized.)

+1 to reviewing naming

@Manishearth
Copy link
Member

To me it's actually more of a documentation and API design thing: it's easier to have this all in one, holistically-documented crate that can talk about the different types and their use cases. The current situation is kinda confusing, though that may just be because of the names.

@echeran
Copy link
Contributor

echeran commented May 12, 2022

Okay, it sounds like it better fits the Rust idiomatic practices around crates & the API docs that they generate... that sounds fine to me. Hopefully we feel confident that this is the end state of code organization (no need for more refactoring). Although more than refactoring, naming this new crate seems like the real challenge. :-)

@robertbastian
Copy link
Member

I approve, especially because we have cross-dependencies.

What are we calling this crate? icu_collections? unicode_collections?

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label May 26, 2022
@sffc sffc added this to the ICU4X 1.0 Untriaged milestone May 26, 2022
@sffc
Copy link
Member Author

sffc commented Jun 3, 2022

Discussion:

  • @sffc - Using icu_collections sounds like we'd want to re-export it as icu::collections
  • @zbraniecki - I think it's fine if we do that
  • @robertbastian - We don't currently re-export other utilities
  • @QnnOkabayashi - @younies and I like unicode_collections
  • @zbraniecki - I like icu_collections. My position is that if we don't want to re-export, triefecta seems fine and is like yoke. If we want to re-export, icu_collections is better.
  • @robertbastian - If we want to heavily use it in our APIs in the future, triefecta isn't a great name.
  • @sffc - Maybe we want to apply the learnings here to icu_locid and icu_locale_canonicalizer as well.

Naming suggestions:

If we want to tie it to ICU:

  • icu_collections

If we don't want to tie it to ICU but have a clear name:

  • unicode_collections
  • code_point_collections
  • codepoint_collections
  • ucollections
  • unicollections

If we want a clever name:

  • trie_force
  • triefecta
  • triepod

@sffc
Copy link
Member Author

sffc commented Jun 6, 2022

Full results of voting: https://docs.google.com/document/d/1aFfGHUzEW8JY7oS95jWdLij82NStljDrsc0Mzr25I0U/edit#

Summary: icu_collections and unicollections are essentially tied in terms of first-choice preference, but unicollections has 3 vetoes, and icu_collections has no vetoes. Therefore, icu_collections appears to be the consensus pick.

@sffc
Copy link
Member Author

sffc commented Jun 6, 2022

Given that we are going with icu_collections, I lean toward putting it in the components directory, similar to icu_locid.

@sffc sffc removed discuss Discuss at a future ICU4X-SC meeting needs-approval One or more stakeholders need to approve proposal labels Jun 6, 2022
@andrewpollack andrewpollack self-assigned this Jul 17, 2022
@andrewpollack
Copy link
Contributor

andrewpollack commented Jul 17, 2022

To confirm, I will make the following changes:

  • Moving the following data structures into a single crate named icu_collections , located under components/collections:
    • CodePointTrie
    • CodePointSet
    • Char16Trie

(As part of #2141, UnicodeSet will be renamed to CodePointSet, along with related libraries)

EDIT: Per Shane suggestion, components/icu_collections -> components/collections

@sffc
Copy link
Member Author

sffc commented Jul 17, 2022

A crate named icu_collections located at components/collections (no need to repeat "icu" in the directory name). Otherwise yes.

@robertbastian
Copy link
Member

And we're re-exporting it as icu::collections?

@sffc
Copy link
Member Author

sffc commented Jul 20, 2022

And we're re-exporting it as icu::collections?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unicode Component: Props, sets, tries S-small Size: One afternoon (small bug fix or enhancement) T-core Type: Required functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants