Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return the automaton state with the (key, value) #60

Closed
fulmicoton opened this issue Apr 22, 2018 · 5 comments
Closed

Return the automaton state with the (key, value) #60

fulmicoton opened this issue Apr 22, 2018 · 5 comments

Comments

@fulmicoton
Copy link
Contributor

In some cases it can be useful to know the automaton state as well, when intersecting the fst with an automaton using .search(...).

For instance, levenshtein's automaton's state hold the actual levenshtein distance of the current key. It is a very valuable information that could be used for ranking the results for instance.

@BurntSushi
Copy link
Owner

Could you please state more explicitly what API changes your requesting? Please include any relevant changes in the public API documentation.

(I'm asking because this is the easiest way for me to understand the feature request.)

@fulmicoton
Copy link
Contributor Author

fulmicoton commented Apr 23, 2018

Sure...

So right now if I intersect an Automaton with an fst::Map, I will write something like .

let mut stream = fst_map.search(lev).into_stream();
while let Some((key, val)) = stream.next() {
    // ...
}

In the case where the state contains important information (e.g. a levenshtein automaton which would let you access the lev distance from its state), it might be handful to have a way to get a Streamer<Item=(&[u8], u64, Automaton::State)>.

Maybe this could be done with an extra method on the StreamBuilder.

So for instance, fn with_state(&self) -> StreamWithStateBuilder
where StreamWithStateBuilder implements the trait IntoStreamer<Item=(&'[u8], u64, Automaton::State), Into=StreamWithState>.

Note that I think there is enough API public access on the fst crate to implement this feature outside of the crate, so this is not a blocker.

In the end for the user, usage would look like this :

let mut stream = fst_map.search(lev).with_state().into_stream();
while let Some((key, val, state)) = stream.next() {
    let levenshtein_distance: usize = get_lev_distance(state);
    // ...
}

@BurntSushi
Copy link
Owner

@fulmicoton I think that seems reasonable to me!

@MartinKavik
Copy link

Hey guys, I'm writing a search library and I want to use fst for fuzzy search, but I can't compute the correct score because I can't get distances for keys.
So.. is there a way how to get them now or we have to implement the API suggested above? Can I help somehow?
Thank you!

@BurntSushi
Copy link
Owner

@MartinKavik I think submitting a PR with the above API might be a good start.

BurntSushi added a commit that referenced this issue Mar 6, 2020
This is apparently useful in some cases when access to the underlying
automaton's state can produce useful information about a match.
Regretably, the implementation requires the states to satisfy `Clone`,
or else we would otherwise need to execute the state transition twice.
In practice, states are usually `Copy` and quite small.

Closes #60, Closes #61
BurntSushi added a commit that referenced this issue Mar 6, 2020
This is apparently useful in some cases when access to the underlying
automaton's state can produce useful information about a match.
Regretably, the implementation requires the states to satisfy `Clone`,
or else we would otherwise need to execute the state transition twice.
In practice, states are usually `Copy` and quite small.

Closes #60, Closes #61
BurntSushi added a commit that referenced this issue Mar 7, 2020
This is apparently useful in some cases when access to the underlying
automaton's state can produce useful information about a match.
Regretably, the implementation requires the states to satisfy `Clone`,
or else we would otherwise need to execute the state transition twice.
In practice, states are usually `Copy` and quite small.

Closes #60, Closes #61
BurntSushi added a commit that referenced this issue Mar 7, 2020
This is apparently useful in some cases when access to the underlying
automaton's state can produce useful information about a match.
Regretably, the implementation requires the states to satisfy `Clone`,
or else we would otherwise need to execute the state transition twice.
In practice, states are usually `Copy` and quite small.

Closes #60, Closes #61
BurntSushi added a commit that referenced this issue Mar 7, 2020
This is apparently useful in some cases when access to the underlying
automaton's state can produce useful information about a match.
Regretably, the implementation requires the states to satisfy `Clone`,
or else we would otherwise need to execute the state transition twice.
In practice, states are usually `Copy` and quite small.

Closes #60, Closes #61
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants