
Panic on find_from_ucs2 #101

Closed
jedel1043 opened this issue Jan 14, 2025 · 1 comment

jedel1043 (Contributor) commented Jan 14, 2025:

Probably related to #100.

The fix for that issue introduced `Input::CODE_UNITS_ARE_BYTES`, which disables some optimizations that are invalid for UTF-16 and UCS-2 input, such as:

regress/src/cursor.rs, lines 59 to 78 in ffb373e:

```rust
#[inline(always)]
pub fn next_byte<Input: InputIndexer, Dir: Direction>(
    input: &Input,
    _dir: Dir,
    pos: &mut Input::Position,
) -> Option<u8> {
    assert!(
        Input::CODE_UNITS_ARE_BYTES,
        "Not implemented for non-byte input"
    );
    let res;
    if Dir::FORWARD {
        res = input.peek_byte_right(*pos);
        *pos += if res.is_some() { 1 } else { 0 };
    } else {
        res = input.peek_byte_left(*pos);
        *pos -= if res.is_some() { 1 } else { 0 };
    }
    res
}
```
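To make the guard concrete, here is a minimal, self-contained sketch of the pattern: an input type exposes an associated `CODE_UNITS_ARE_BYTES` constant, and a byte-level helper asserts on it. The `Input` trait, its `unit_at` method, and both impls are hypothetical simplifications, not regress's actual `InputIndexer` API; only the constant's name and the panic message are taken from the code above.

```rust
use std::panic;

/// Hypothetical, simplified stand-in for an input indexer.
trait Input {
    /// True only when every code unit is a single byte (e.g. UTF-8 input).
    const CODE_UNITS_ARE_BYTES: bool;
    /// Code unit at `pos`, widened to u32, or None past the end.
    fn unit_at(&self, pos: usize) -> Option<u32>;
}

impl Input for &[u8] {
    const CODE_UNITS_ARE_BYTES: bool = true;
    fn unit_at(&self, pos: usize) -> Option<u32> {
        self.get(pos).map(|&b| u32::from(b))
    }
}

impl Input for &[u16] {
    const CODE_UNITS_ARE_BYTES: bool = false;
    fn unit_at(&self, pos: usize) -> Option<u32> {
        self.get(pos).map(|&u| u32::from(u))
    }
}

/// Forward-only byte fast path, guarded like `cursor::next_byte`:
/// reaching it with 16-bit code units is a logic error and panics.
fn next_byte<I: Input>(input: &I, pos: &mut usize) -> Option<u8> {
    assert!(I::CODE_UNITS_ARE_BYTES, "Not implemented for non-byte input");
    // Safe to narrow: the assert guarantees each unit is a byte.
    let res = input.unit_at(*pos).map(|u| u as u8);
    if res.is_some() {
        *pos += 1;
    }
    res
}

fn main() {
    let bytes: &[u8] = b"foo";
    let mut pos = 0;
    assert_eq!(next_byte(&bytes, &mut pos), Some(b'f'));
    assert_eq!(pos, 1);

    // The same call on UTF-16 code units trips the guard (the panic
    // message is printed to stderr before catch_unwind returns Err).
    let owned: Vec<u16> = "foo".encode_utf16().collect();
    let units: &[u16] = &owned;
    let result = panic::catch_unwind(|| {
        let mut pos = 0;
        next_byte(&units, &mut pos)
    });
    assert!(result.is_err());
}
```

Because the constant is fixed per input type, the check can typically be folded away for byte input, but any byte-level path reached with UTF-16/UCS-2 input becomes a runtime panic.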

However, some parts of the classic backtracking engine still use byte matching, even for UTF-16 input:
```rust
&Insn::ByteSet2(bytes) => {
    next_or_bt!(scm::MatchByteArraySet { bytes }.matches(input, dir, &mut pos))
}
&Insn::ByteSet3(bytes) => {
    next_or_bt!(scm::MatchByteArraySet { bytes }.matches(input, dir, &mut pos))
}
&Insn::ByteSet4(bytes) => {
    next_or_bt!(scm::MatchByteArraySet { bytes }.matches(input, dir, &mut pos))
}
```
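One way to see why these arms are wrong for 16-bit input: a small value set such as `{f, F}` (plausibly what case-insensitive `foo` compiles to) should be compared against whole code units, not bytes. The sketch below is a hypothetical, forward-only matcher that is generic over the code-unit type. `match_set` is an illustrative name, the sketch assumes every value in the set is ASCII, and it is not claimed to be the fix that landed in regress.

```rust
/// Hypothetical, forward-only matcher for a small set of byte values.
/// Works for any code-unit type (u8 for UTF-8, u16 for UTF-16/UCS-2)
/// by widening each unit to u32 instead of reading it as raw bytes.
/// Assumes every entry in `set` is an ASCII value.
fn match_set<T: Copy + Into<u32>, const N: usize>(
    set: &[u8; N],
    input: &[T],
    pos: &mut usize,
) -> bool {
    let Some(&unit) = input.get(*pos) else {
        return false; // end of input; pos unchanged
    };
    let unit: u32 = unit.into(); // widen; never truncate to a byte
    if set.iter().any(|&b| u32::from(b) == unit) {
        *pos += 1; // advance past the matched code unit
        true
    } else {
        false
    }
}

fn main() {
    // "FOOtball" as UTF-16 code units, the reproducer's input type.
    let units: Vec<u16> = "FOOtball".encode_utf16().collect();
    let text: &[u16] = &units;
    // Case-insensitive "foo" as a sequence of two-element sets.
    let pattern = [[b'f', b'F'], [b'o', b'O'], [b'o', b'O']];
    let mut pos = 0;
    let matched = pattern.iter().all(|set| match_set(set, text, &mut pos));
    assert!(matched);
    assert_eq!(pos, 3);
}
```

Comparing the full code unit matters: truncating U+0166 (`Ŧ`) to its low byte would yield 0x66, a spurious match for `f`.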

Reproducer

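```rust
use regress::Regex;

fn main() {
    // "football" as UCS-2/UTF-16 code units.
    let string: Vec<u16> = "football".encode_utf16().collect();
    // Case-insensitive search for "foo".
    let re = Regex::with_flags("foo", "i").unwrap();
    // In regress 0.10.2 this panics instead of yielding a match.
    let _matches: Vec<_> = re.find_from_ucs2(&string, 0).collect();
}
```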
```
thread 'main' panicked at /home/jedel/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regress-0.10.2/src/cursor.rs:65:5:
Not implemented for non-byte input
```
ridiculousfish (Owner) commented:

Yikes. On it.

ridiculousfish added a commit that referenced this issue Jan 18, 2025
This assumption was never right. Fixes #101