Iterators which split strings on Grapheme Cluster, Word, or Sentence boundaries, according to the Unicode Standard Annex #29 rules.

Documentation

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "a̐éö̲\r\n";
    let g = s.graphemes(true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    let s = "The quick (\"brown\")  fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", "  ", "fox"];
    assert_eq!(w, b);
}

no_std

unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.

crates.io

You can use this package in your project by adding the following to your Cargo.toml:

[dependencies]
unicode-segmentation = "1.11.0"

Change Log

1.11.0

#124 Update data to Unicode 15.1
#128 Add size_hint to iterators

1.10.1

#113 Use criterion.rs for word benchmarks
#112 Improve table search speed through lookups

1.10.0

#107 Upgrade to Unicode 15.0.0
#104 Supersedes and fixes #75

1.9.0

#101 Upgrade to Unicode 14.0.0

1.8.0

#100 Increase #[inline] opportunities, resulting in 15-40% performance improvement
#95 Implement debug for Graphemes
#94 Add Initial fuzzer for oss-fuzz integration
#93 Fix unused imports and deprecated pattern warnings
#91 Made local variable immutable by moving it into loop
#91 Add new iterator UnicodeWordIndices and unicode_word_indices

1.7.1

Update docs on version number

1.7.0

#87 Upgrade to Unicode 13
#79 Implement a special-case lookup for ascii grapheme categories
#77 Optimization for grapheme iteration

1.6.0

#72 Upgrade to Unicode 12

1.5.0

#68 Upgrade to Unicode 11

1.4.0

#56 Upgrade to Unicode 10

1.3.0

#24 Add support for sentence boundaries
#44 Treat gc=No as a subset of gc=N

1.2.1

#37 Fix panic in provide_context
#40 Fix crash in prev_boundary

1.2.0

New GraphemeCursor API allows random access and bidirectional iteration.
Fixed incorrect splitting of certain emoji modifier sequences.

1.1.0

Add as_str methods to the iterator types.

1.0.3

Code cleanup and additional tests.

1.0.1

Fix a bug affecting some grapheme clusters containing Prepend characters.

1.0.0

Upgrade to Unicode 9.0.0.

