Format #94

Closed
wants to merge 72 commits into from
72 commits
21d2d70
Modified license.
fulmicoton Feb 20, 2019
ade4483
Removed mmap and set
fulmicoton Feb 20, 2019
b4650d1
Removed FstData from Streams.
fulmicoton Feb 20, 2019
3a0925a
Using Into
fulmicoton Feb 20, 2019
4b2f4c9
Using Vec<u8> by default
fulmicoton Feb 20, 2019
49a4a9f
Removing FstData
fulmicoton Feb 20, 2019
a445080
Not copying the fst data
fulmicoton Feb 20, 2019
121f4bb
simpler way to deal with the borrow checker
fulmicoton Feb 20, 2019
c5f7147
Moving fst-regex within the same crate
fulmicoton Feb 20, 2019
cc2d05b
Editing version and cleanup
fulmicoton Feb 21, 2019
8e40cb5
Version 0.1
fulmicoton Feb 21, 2019
c5c2854
Merge branch 'master' of https://github.com/BurntSushi/fst
hntd187 Aug 2, 2019
e10e671
Added the PR https://github.com/BurntSushi/fst/pull/61 to tantivy-fst
hntd187 Aug 2, 2019
abe48ea
Adapted travis ci
fulmicoton Aug 5, 2019
eec2957
Fixing doctest
fulmicoton Aug 5, 2019
943d01c
Added travis ci badge in README
fulmicoton Aug 5, 2019
5f50407
Address review commits
hntd187 Aug 5, 2019
b4da669
Merge pull request #1 from hntd187/master
fulmicoton Aug 8, 2019
24c2f88
Added vscode to gitignore
Dec 8, 2019
36953d1
Refactored to support min and max as bound
Dec 8, 2019
2205238
Refactored to support min and max as bound
Dec 8, 2019
5f39bf3
added basic needed functions
Dec 8, 2019
54d501e
Refactored with starting and next transition methods
Dec 14, 2019
f2e2d98
Added reverse to stream interface
Dec 19, 2019
14ba77a
added tests for next transtion and starting transition
Dec 19, 2019
1ff80bb
Refactored stack logic
Dec 19, 2019
df92a73
made basic reverse traversal work
Dec 24, 2019
10cc7ef
made more return stack work
Dec 24, 2019
6286c77
refactored done
Dec 24, 2019
ada4235
fixed bug with seek done
Dec 24, 2019
b47e59e
made everything except from empty ranges work
Dec 24, 2019
4196ac5
made reverse traversal work :)
Dec 24, 2019
e1fa6cb
added back some tests
Dec 24, 2019
eb3a849
Looked into removing return stack
Dec 25, 2019
677708a
simplefied out_of_bounds logic
Dec 25, 2019
7b851e3
Added a failing unit test. Added trivial change.
fulmicoton Dec 25, 2019
9e92a04
Moved code around and added some comments
Dec 26, 2019
877d9b7
Fixed logic for handling included bounds of length 1
Dec 26, 2019
124d183
added more unit tests
Dec 26, 2019
2bf2477
refactoring
Dec 26, 2019
f8c0ce6
removed return stack
Dec 26, 2019
ba92687
Changed public interface to be inline with rust 'standard'
Dec 26, 2019
f3b2681
More refactoring and documentation
Dec 26, 2019
4a29df5
Moved method to helper section
Dec 26, 2019
883bcdc
Style fix
Dec 26, 2019
a798367
Made tests more consistent and added stream input capacity constant
Jan 6, 2020
9a7c2c4
Adding failing unit test.
fulmicoton Jan 7, 2020
327407f
added tests for aut range
Jan 8, 2020
81a2c18
changed clone to resize and copy_from_slice
Jan 8, 2020
8a3d130
test using proptest
Jan 8, 2020
50413b7
added proptest and fixed bug
Jan 8, 2020
c92478d
Fixed bug in transition_within_bounds and added unit tests
Jan 8, 2020
704e58a
Merge branch 'halvorboe-reverse3' into checkout-proptest
Jan 8, 2020
c8ebd02
Merge pull request #10 from tantivy-search/checkout-proptest
fulmicoton Jan 9, 2020
aee49c9
minor changes
fulmicoton Jan 9, 2020
4bca2d3
Merge branch 'halvorboe-reverse3' of github.com:tantivy-search/fst in…
fulmicoton Jan 9, 2020
a5e7f13
minor changes
fulmicoton Jan 9, 2020
2372318
Reduced proptest pattern to a-c
fulmicoton Jan 9, 2020
9352729
fixed proptest
Jan 9, 2020
0eee7da
Simplified proptest.
fulmicoton Jan 9, 2020
53a12fd
Simplifying unit test
fulmicoton Jan 9, 2020
eb1b43d
fixed failing test, and added new failing test :)
fulmicoton Jan 10, 2020
9790a78
Fixed bug empty output and min
Jan 10, 2020
a67b081
Added bench, set the number of generates proptest case is set to 1000
fulmicoton Jan 16, 2020
5abf43f
Setting the stream for backward iteration is done in the builder not …
fulmicoton Jan 16, 2020
4093700
Removed has_seeked
fulmicoton Jan 16, 2020
4dbc364
Minor changes
fulmicoton Jan 17, 2020
b4f01d6
Avoid using the secondary input when we are iterating forward
fulmicoton Jan 17, 2020
546a3c8
Removed inp_return buffer.
fulmicoton Jan 17, 2020
478bc67
cargo fmt
fulmicoton Jan 17, 2020
75a507b
Cargo clippy
fulmicoton Jan 17, 2020
617b0b8
Edition 2018
fulmicoton Jan 17, 2020
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.*.swp
.idea
tags
target
*.lock
@@ -12,3 +13,4 @@ words
dict
test
months
.vscode
1 change: 0 additions & 1 deletion .travis.yml
@@ -1,6 +1,5 @@
language: rust
rust:
- 1.24.0
- stable
- beta
- nightly
30 changes: 13 additions & 17 deletions Cargo.toml
@@ -1,21 +1,16 @@
[package]
name = "fst"
version = "0.3.5" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
name = "tantivy-fst"
version = "0.1.0"
authors = ["Paul Masurel<paul.masurel@gmail.com>"] # forked from Andrew Gallant's work
description = """
Use finite state transducers to compactly represents sets or maps of many
strings (> 1 billion is possible).
This is a fork from the
"""
documentation = "https://docs.rs/fst"
homepage = "https://github.com/BurntSushi/fst"
repository = "https://github.com/BurntSushi/fst"
documentation = "https://docs.rs/tantivy-fst"
repository = "https://github.com/tantivy-search/fst"
readme = "README.md"
keywords = ["search", "information", "retrieval", "dictionary", "map"]
keywords = ["donotuseme"]
license = "Unlicense/MIT"

[features]
mmap = ["memmap"]
default = ["mmap"]
edition = "2018"

[[bench]]
name = "build"
@@ -31,15 +26,16 @@ bench = true

[dependencies]
byteorder = "1"
memmap = { version = "0.6.0", optional = true }
regex-syntax = "0.4"
utf8-ranges = "1"
levenshtein_automata="0.1"

[dev-dependencies]
fnv = "1.0.5"
fst-levenshtein = { version = "0.2", path = "fst-levenshtein" }
fst-regex = { version = "0.2", path = "fst-regex" }
lazy_static = "0.2.8"
lazy_static = "1.4"
quickcheck = { version = "0.7", default-features = false }
rand = "0.5"
proptest = "0.9.4"

[profile.release]
debug = true
1 change: 1 addition & 0 deletions LICENSE-MIT
@@ -1,6 +1,7 @@
The MIT License (MIT)

Copyright (c) 2015 Andrew Gallant
Copyright (c) 2019 Paul Masurel

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
91 changes: 6 additions & 85 deletions README.md
@@ -1,90 +1,11 @@
fst
===
This crate provides a fast implementation of ordered sets and maps using finite
state machines. In particular, it makes use of finite state transducers to map
keys to values as the machine is executed. Using finite state machines as data
structures enables us to store keys in a compact format that is also easily
searchable. For example, this crate leverages memory maps to make range queries
very fast.

Check out my blog post
[Index 1,600,000,000 Keys with Automata and
Rust](http://blog.burntsushi.net/transducers/)
for extensive background, examples and experiments.

[![Linux build status](https://travis-ci.org/BurntSushi/fst.svg?branch=master)](https://travis-ci.org/BurntSushi/fst)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/fst?svg=true)](https://ci.appveyor.com/project/BurntSushi/fst)
[![](http://meritbadge.herokuapp.com/fst)](https://crates.io/crates/fst)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).


### Documentation

[Full API documentation and examples.](http://burntsushi.net/rustdoc/fst/)

The
[`fst-regex`](https://docs.rs/fst-regex)
and
[`fst-levenshtein`](https://docs.rs/fst-levenshtein)
crates provide regular expression matching and fuzzy searching on FSTs,
respectively.


### Installation

Simply add a corresponding entry to your `Cargo.toml` dependency list:
[![Build Status](https://travis-ci.org/tantivy-search/fst.svg?branch=master)](https://travis-ci.org/tantivy-search/fst)

```toml,ignore
[dependencies]
fst = "0.3"
```

And add this to your crate root:

```rust,ignore
extern crate fst;
```


### Example

This example demonstrates building a set in memory and executing a fuzzy query
against it. You'll need `fst = "0.3"` and `fst-levenshtein = "0.2"` in your
`Cargo.toml`.

```rust
extern crate fst;
extern crate fst_levenshtein;

use std::error::Error;
use std::process;

use fst::{IntoStreamer, Set};
use fst_levenshtein::Levenshtein;

fn try_main() -> Result<(), Box<Error>> {
// A convenient way to create sets in memory.
let keys = vec!["fa", "fo", "fob", "focus", "foo", "food", "foul"];
let set = Set::from_iter(keys)?;

// Build our fuzzy query.
let lev = Levenshtein::new("foo", 1)?;
tantivy-fst
===

// Apply our fuzzy query to the set we built.
let stream = set.search(lev).into_stream();

let keys = stream.into_strs()?;
assert_eq!(keys, vec!["fo", "fob", "foo", "food"]);
Ok(())
}
# WARNING: This is not the crate you are looking for.

fn main() {
if let Err(err) = try_main() {
eprintln!("{}", err);
process::exit(1);
}
}
```
This crate is a fork of the `fst` crate to better fit the need of tantivy.
You are probably looking for the [fst](https://github.com/BurntSushi/fst) crate

Check out the documentation for a lot more examples!
13 changes: 8 additions & 5 deletions benches/build.rs
@@ -1,11 +1,11 @@
#![feature(test)]

extern crate fst;
extern crate tantivy_fst;
extern crate test;

use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};

use fst::raw::{Builder, Fst};
use tantivy_fst::raw::{Builder, Fst};
use test::Bencher;

const WORDS: &'static str = include_str!("./../data/words-10000");
@@ -15,7 +15,10 @@ fn get_words() -> Vec<String> {
}

fn get_words_outputs() -> Vec<(String, u64)> {
WORDS.lines().map(|s| (s.to_owned(), s.len() as u64)).collect()
WORDS
.lines()
.map(|s| (s.to_owned(), s.len() as u64))
.collect()
}

#[bench]
@@ -27,7 +30,7 @@ fn build_fst_set(b: &mut Bencher) {
for word in &words {
bfst.add(word).unwrap();
}
Fst::from_bytes(bfst.into_inner().unwrap()).unwrap();
Fst::new(bfst.into_inner().unwrap()).unwrap();
});
}

@@ -40,7 +43,7 @@ fn build_fst_map(b: &mut Bencher) {
for &(ref word, len) in &words {
bfst.insert(word, len).unwrap();
}
Fst::from_bytes(bfst.into_inner().unwrap()).unwrap();
Fst::new(bfst.into_inner().unwrap()).unwrap();
});
}

52 changes: 36 additions & 16 deletions benches/search.rs
@@ -1,8 +1,9 @@
#![feature(test)]

extern crate fnv;
extern crate fst;
#[macro_use] extern crate lazy_static;
extern crate tantivy_fst;
#[macro_use]
extern crate lazy_static;
extern crate test;

const STR_WORDS: &'static str = include_str!("./../data/words-100000");
@@ -24,7 +25,7 @@ macro_rules! search {
use std::hash::BuildHasherDefault;

use fnv::FnvHasher;
use fst::raw::{Builder, Fst};
use tantivy_fst::raw::{Builder, Fst};
use test::Bencher;

#[bench]
Expand All @@ -36,7 +37,7 @@ macro_rules! search {
bfst.add(word).unwrap();
}
let bytes = bfst.into_inner().unwrap();
Fst::from_bytes(bytes).unwrap()
Fst::new(bytes).unwrap()
};
}
let mut i = 0;
@@ -46,13 +47,36 @@ macro_rules! search {
})
}

#[bench]
fn fst_streams(b: &mut Bencher) {
use tantivy_fst::{IntoStreamer, Streamer};
lazy_static! {
static ref FST: Fst = {
let mut bfst = Builder::memory();
for word in $keys.iter() {
bfst.add(word).unwrap();
}
let bytes = bfst.into_inner().unwrap();
Fst::new(bytes).unwrap()
};
}
b.iter(|| {
let start = 1000;
let stop = 2000;
let mut stream = FST.range().ge(&$keys[start]).lt(&$keys[stop]).into_stream();
let mut count = 0;
while stream.next().is_some() {
count += 1;
}
assert_eq!(count, stop - start);
})
}

#[bench]
fn hash_fnv_contains(b: &mut Bencher) {
type Fnv = BuildHasherDefault<FnvHasher>;
lazy_static! {
static ref SET: HashSet<String, Fnv> = {
$keys.clone().into_iter().collect()
};
static ref SET: HashSet<String, Fnv> = { $keys.clone().into_iter().collect() };
}
let mut i = 0;
b.iter(|| {
@@ -64,9 +88,7 @@ macro_rules! search {
#[bench]
fn hash_sip_contains(b: &mut Bencher) {
lazy_static! {
static ref SET: HashSet<String> = {
$keys.clone().into_iter().collect()
};
static ref SET: HashSet<String> = { $keys.clone().into_iter().collect() };
}
let mut i = 0;
b.iter(|| {
@@ -78,9 +100,7 @@ macro_rules! search {
#[bench]
fn btree_contains(b: &mut Bencher) {
lazy_static! {
static ref SET: BTreeSet<String> = {
$keys.clone().into_iter().collect()
};
static ref SET: BTreeSet<String> = { $keys.clone().into_iter().collect() };
}
let mut i = 0;
b.iter(|| {
@@ -89,8 +109,8 @@ macro_rules! search {
})
}
}
}
};
}

search!(words, ::WORDS);
search!(wiki_urls, ::WIKI_URLS);
search!(words, crate::WORDS);
search!(wiki_urls, crate::WIKI_URLS);
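
A note on the `fst_streams` benchmark above: its `assert_eq!(count, stop - start)` relies on `.ge(..).lt(..)` selecting a half-open range, and presumably on the key list being sorted and free of duplicates. The sketch below illustrates that range semantics, plus a backward walk over the same range, using only `std::collections::BTreeSet` as a stand-in. It does not use tantivy-fst, and the helper names `keys_in_range` and `keys_in_range_rev` are hypothetical, not part of this PR.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Collects the keys `k` with `lower <= k < upper` in ascending order.
/// This is the same half-open range that `.ge(lower).lt(upper)` describes,
/// modelled here with a std BTreeSet instead of an FST (an assumption made
/// purely for illustration).
fn keys_in_range<'a>(set: &'a BTreeSet<String>, lower: &str, upper: &str) -> Vec<&'a str> {
    set.range::<str, _>((Bound::Included(lower), Bound::Excluded(upper)))
        .map(|s| s.as_str())
        .collect()
}

/// Walks the same half-open range from the largest key down to the smallest,
/// as a stand-in for iterating a range stream backward.
fn keys_in_range_rev<'a>(set: &'a BTreeSet<String>, lower: &str, upper: &str) -> Vec<&'a str> {
    set.range::<str, _>((Bound::Included(lower), Bound::Excluded(upper)))
        .rev()
        .map(|s| s.as_str())
        .collect()
}
```

With the keys `fa`, `fo`, `fob`, `focus`, `foo`, `food`, `foul`, calling `keys_in_range(&set, "fo", "foo")` yields `fo`, `fob`, `focus`: the lower bound is included and the upper bound is excluded, so the number of results equals the distance between the two bounds in the sorted key list.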
5 changes: 0 additions & 5 deletions ci/script.sh
@@ -4,7 +4,6 @@ set -ex

cargo doc --verbose
cargo build --verbose
cargo build --verbose --manifest-path fst-bin/Cargo.toml

# If we're testing on an older version of Rust, then only check that we
# can build the crate. This is because the dev dependencies might be updated
@@ -16,7 +15,3 @@ if [ "$TRAVIS_RUST_VERSION" = "1.20.0" ]; then
fi

cargo test --verbose
cargo test --verbose --lib --no-default-features
if [ "$TRAVIS_RUST_VERSION" = "nightly" ]; then
cargo bench --verbose --no-run
fi
11 changes: 0 additions & 11 deletions ctags.rust

This file was deleted.

37 changes: 0 additions & 37 deletions fst-bin/Cargo.toml

This file was deleted.
