Add a .search(automaton: fst::Automaton) to the TermDictionary #273

Closed
fulmicoton opened this issue Apr 23, 2018 · 3 comments · Fixed by #297

@fulmicoton
Collaborator

fulmicoton commented Apr 23, 2018

This is required for #219 (sublime text like search) and #272 (FuzzyTermQuery).

Note that while similar, they are quite different in that:

  • in #219, the user probably wants the list of terms that matched, possibly sorted by a function of the doc frequencies and some function of the query and the term. (See the ticket comments for some links.)
  • in #272, this is effectively a query. The user wants the list of ranked documents.

The method is already implemented in the fst crate.
Unfortunately it does not give access to the automaton state in the resulting Streamer.

I opened a ticket in the fst crate to discuss whether this feature should be added upstream.
Otherwise, the raw module of the fst crate exposes enough public API to reimplement the intersection in tantivy if required.
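To make the limitation concrete, here is a minimal, dependency-free sketch of an intersection that hands back the final automaton state with each matching term. Everything in it is an assumption made for illustration: the `Automaton` trait is a simplified stand-in that mirrors the shape of `fst::Automaton`, `Hamming` and `search_with_state` are hypothetical names, and the linear scan over a sorted slice only imitates what a real traversal of the fst would do.

```rust
/// Simplified stand-in mirroring the shape of `fst::Automaton`.
/// It is not the real trait; it only keeps this sketch dependency-free.
trait Automaton {
    type State;
    fn start(&self) -> Self::State;
    fn is_match(&self, state: &Self::State) -> bool;
    fn can_match(&self, _state: &Self::State) -> bool {
        true
    }
    fn accept(&self, state: &Self::State, byte: u8) -> Self::State;
}

/// Hypothetical automaton: accepts terms of the same length as `query`
/// that differ from it in at most `max_mismatches` byte positions.
struct Hamming {
    query: Vec<u8>,
    max_mismatches: usize,
}

impl Automaton for Hamming {
    /// `Some((bytes_consumed, mismatches))`, or `None` once no match is possible.
    type State = Option<(usize, usize)>;

    fn start(&self) -> Self::State {
        Some((0, 0))
    }

    fn is_match(&self, state: &Self::State) -> bool {
        matches!(state, Some((pos, _)) if *pos == self.query.len())
    }

    fn can_match(&self, state: &Self::State) -> bool {
        state.is_some()
    }

    fn accept(&self, state: &Self::State, byte: u8) -> Self::State {
        let (pos, mismatches) = (*state)?;
        // A term longer than the query can never match.
        let expected = *self.query.get(pos)?;
        let mismatches = mismatches + usize::from(expected != byte);
        if mismatches > self.max_mismatches {
            None
        } else {
            Some((pos + 1, mismatches))
        }
    }
}

/// Runs the automaton over every term and returns each match together
/// with the state the automaton ended in (the part a caller cannot
/// currently get back from the stream).
fn search_with_state<'t, A: Automaton>(
    terms: &[&'t str],
    automaton: &A,
) -> Vec<(&'t str, A::State)> {
    let mut hits = Vec::new();
    for &term in terms {
        let mut state = automaton.start();
        let mut alive = true;
        for &byte in term.as_bytes() {
            state = automaton.accept(&state, byte);
            if !automaton.can_match(&state) {
                alive = false; // prune as soon as no match is possible
                break;
            }
        }
        if alive && automaton.is_match(&state) {
            hits.push((term, state));
        }
    }
    hits
}
```

With the query `cat` and one allowed mismatch, the final state of each hit carries the mismatch count, which is the kind of information a fuzzy query could feed into scoring.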

@drusellers
Contributor

Just to be clear, what struct should have this search method added? I'm guessing the Searcher.

Is the automaton in this case an implementation of https://github.com/BurntSushi/fst/blob/master/src/automaton/mod.rs#L24 ?

@fulmicoton fulmicoton changed the title from "Add a .search(automaton) that intersects the term dictionary" to "Add a .search(automaton: fst::Automaton) to the TermDictionary" on May 6, 2018
@fulmicoton
Collaborator Author

I edited the title to make it clearer. The automaton I am talking about is indeed fst::Automaton.

The idea would be to have a method like this in the TermDictionary:

fn search<A: Automaton>(&'a self, automaton: A) -> TermStreamerBuilder<'a, A>

The TermStreamerBuilder would take an extra generic argument, and rely on the StreamBuilder<'a, A> interface.

If you feel uncomfortable, you can just start by writing a simple unit test of what you feel should be expected.
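As a rough starting point, the shape of that method can be sketched without any dependency. All the types below are hypothetical stand-ins rather than tantivy's or fst's real ones: `Automaton` is a simplified trait that mirrors the shape of `fst::Automaton`, `Subsequence` is a toy automaton, and `TermDictionary` is just a sorted `Vec<String>`, so the builder filters a plain iterator instead of wrapping a real `StreamBuilder<'a, A>`.

```rust
/// Simplified stand-in mirroring the shape of `fst::Automaton` (not the real trait).
trait Automaton {
    type State;
    fn start(&self) -> Self::State;
    fn is_match(&self, state: &Self::State) -> bool;
    fn can_match(&self, _state: &Self::State) -> bool {
        true
    }
    fn accept(&self, state: &Self::State, byte: u8) -> Self::State;
}

/// Toy automaton: accepts terms that contain `pattern` as a subsequence.
struct Subsequence {
    pattern: Vec<u8>,
}

impl Automaton for Subsequence {
    /// Number of pattern bytes matched so far.
    type State = usize;

    fn start(&self) -> usize {
        0
    }

    fn is_match(&self, state: &usize) -> bool {
        *state == self.pattern.len()
    }

    fn accept(&self, state: &usize, byte: u8) -> usize {
        if self.pattern.get(*state) == Some(&byte) {
            *state + 1
        } else {
            *state
        }
    }
}

/// Hypothetical stand-in for the term dictionary: a sorted, deduplicated list.
struct TermDictionary {
    terms: Vec<String>,
}

impl TermDictionary {
    fn new(terms: &[&str]) -> Self {
        let mut terms: Vec<String> = terms.iter().map(|t| t.to_string()).collect();
        terms.sort();
        terms.dedup();
        TermDictionary { terms }
    }

    /// The proposed entry point: the builder carries the automaton as an
    /// extra generic argument.
    fn search<'a, A: Automaton>(&'a self, automaton: A) -> TermStreamerBuilder<'a, A> {
        TermStreamerBuilder { dict: self, automaton, lower: None }
    }
}

struct TermStreamerBuilder<'a, A: Automaton> {
    dict: &'a TermDictionary,
    automaton: A,
    lower: Option<String>,
}

impl<'a, A: Automaton> TermStreamerBuilder<'a, A> {
    /// Restricts the stream to terms greater than or equal to `bound`.
    fn ge(mut self, bound: &str) -> Self {
        self.lower = Some(bound.to_owned());
        self
    }

    /// Lazily yields, in sorted order, the terms accepted by the automaton.
    fn into_stream(self) -> impl Iterator<Item = &'a str> {
        let TermStreamerBuilder { dict, automaton, lower } = self;
        dict.terms
            .iter()
            .map(|term| term.as_str())
            .filter(move |term| lower.as_deref().map_or(true, |lo| *term >= lo))
            .filter(move |term| accepts(&automaton, term.as_bytes()))
    }
}

/// Feeds `bytes` to the automaton and reports whether it ends in a match state.
fn accepts<A: Automaton>(automaton: &A, bytes: &[u8]) -> bool {
    let mut state = automaton.start();
    for &byte in bytes {
        state = automaton.accept(&state, byte);
        if !automaton.can_match(&state) {
            return false;
        }
    }
    automaton.is_match(&state)
}
```

A first unit test could then build a small dictionary, call `dict.search(Subsequence { pattern: b"ct".to_vec() }).into_stream()`, and check that the matching terms come back in sorted order, with and without a `ge` bound.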

@drusellers
Contributor

Ah, ok. Thank you. :) Let me pivot and start to look at it from that direction. This makes more sense to me now. :)
