Refactoring documentation for some dense models #2003

Merged
merged 2 commits on Oct 3, 2024
Changes from all commits
106 changes: 65 additions & 41 deletions docs/experiments-ance.md
@@ -17,9 +17,9 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.ance \
--topics msmarco-passage-dev-subset \
--encoded-queries ance-msmarco-passage-dev-subset \
- --output runs/run.msmarco-passage.ance.bf.tsv \
+ --output runs/run.msmarco-passage.ance.tsv \
--output-format msmarco \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16
```

The option `--encoded-queries` specifies the use of encoded queries (i.e., queries that have already been converted into dense vectors and cached).
@@ -28,9 +28,13 @@ As an alternative, replace with `--encoder castorini/ance-msmarco-passage` to pe
To evaluate:

```bash
- $ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
- runs/run.msmarco-passage.ance.bf.tsv
+ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.ance.tsv
+ ```
+
+ Results:
+
+ ```
#####################
MRR @10: 0.3302
QueriesRanked: 6980
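
For reference, the on-the-fly variant mentioned in the hunk header above would look roughly like the following sketch; it reuses the options already shown and simply swaps `--encoded-queries` for the `--encoder castorini/ance-msmarco-passage` model named there (queries are encoded at search time, so this is slower than using cached query vectors):

```bash
# Sketch: ANCE passage retrieval with on-the-fly query encoding instead of cached query vectors.
python -m pyserini.search.faiss \
  --index msmarco-v1-passage.ance \
  --topics msmarco-passage-dev-subset \
  --encoder castorini/ance-msmarco-passage \
  --output runs/run.msmarco-passage.ance.tsv \
  --output-format msmarco \
  --batch-size 512 --threads 16
```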
@@ -41,14 +45,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other m
For that we first need to convert runs and qrels files to the TREC format:

```bash
- $ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
- --input runs/run.msmarco-passage.ance.bf.tsv \
- --output runs/run.msmarco-passage.ance.bf.trec
+ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
+ --input runs/run.msmarco-passage.ance.tsv \
+ --output runs/run.msmarco-passage.ance.trec

+ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.ance.trec
+ ```

- $ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
- runs/run.msmarco-passage.ance.bf.trec
+ Results:

- map all 0.3362
+ ```
+ map all 0.3363
recall_1000 all 0.9584
```
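
The `pyserini.eval.trec_eval` wrapper accepts standard `trec_eval` measure flags, so other measures can be requested the same way. A sketch (the extra measure names below are standard `trec_eval` measures, not part of this diff):

```bash
# Sketch: request reciprocal rank and nDCG@10 in addition to the measures shown above.
python -m pyserini.eval.trec_eval -c -mrecip_rank -mndcg_cut.10 msmarco-passage-dev-subset \
  runs/run.msmarco-passage.ance.trec
```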

@@ -63,7 +71,7 @@ python -m pyserini.search.faiss \
--encoded-queries ance_maxp-msmarco-doc-dev \
--output runs/run.msmarco-doc.passage.ance-maxp.txt \
--output-format msmarco \
- --batch-size 36 --threads 12 \
+ --batch-size 512 --threads 16 \
--hits 1000 --max-passage --max-passage-hits 100
```
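
The `--hits 1000 --max-passage --max-passage-hits 100` options retrieve 1,000 passages per query, collapse them to document level by keeping each document's best-scoring passage, and return the top 100 documents. Below is a sketch of the on-the-fly variant referenced in the next hunk; the encoder name `castorini/ance-msmarco-doc-maxp` and the `--index`/`--topics` values (which are collapsed out of the hunk above) are assumptions here:

```bash
# Sketch: ANCE MaxP document retrieval with on-the-fly query encoding.
# Index, topics, and encoder names are assumptions; the cached-query form is shown in the hunk above.
python -m pyserini.search.faiss \
  --index msmarco-v1-doc-segmented.ance-maxp \
  --topics msmarco-doc-dev \
  --encoder castorini/ance-msmarco-doc-maxp \
  --output runs/run.msmarco-doc.passage.ance-maxp.txt \
  --output-format msmarco \
  --batch-size 512 --threads 16 \
  --hits 1000 --max-passage --max-passage-hits 100
```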

@@ -72,12 +80,16 @@ Same as above, replace `--encoded-queries` with `--encoder castorini/ance-msmarc
To evaluate:

```bash
- $ python -m pyserini.eval.msmarco_doc_eval \
- --judgments msmarco-doc-dev \
- --run runs/run.msmarco-doc.passage.ance-maxp.txt
+ python -m pyserini.eval.msmarco_doc_eval \
+ --judgments msmarco-doc-dev \
+ --run runs/run.msmarco-doc.passage.ance-maxp.txt
+ ```

+ Results:

+ ```
#####################
- MRR @100: 0.3796
+ MRR @100: 0.3795
QueriesRanked: 5193
#####################
```
@@ -86,14 +98,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other m
For that we first need to convert runs and qrels files to the TREC format:

```bash
- $ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
- --input runs/run.msmarco-doc.passage.ance-maxp.txt \
- --output runs/run.msmarco-doc.passage.ance-maxp.trec
+ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
+ --input runs/run.msmarco-doc.passage.ance-maxp.txt \
+ --output runs/run.msmarco-doc.passage.ance-maxp.trec

+ python -m pyserini.eval.trec_eval -c -mrecall.100 -mmap msmarco-doc-dev \
+ runs/run.msmarco-doc.passage.ance-maxp.trec
+ ```

- $ python -m pyserini.eval.trec_eval -c -mrecall.100 -mmap msmarco-doc-dev \
- runs/run.msmarco-doc.passage.ance-maxp.trec
+ Results:

- map all 0.3796
+ ```
+ map all 0.3794
recall_100 all 0.9033
```

@@ -106,25 +122,29 @@ python -m pyserini.search.faiss \
--index wikipedia-dpr-100w.ance-multi \
--topics dpr-nq-test \
--encoded-queries ance_multi-nq-test \
- --output runs/run.ance.nq-test.multi.bf.trec \
- --batch-size 36 --threads 12
+ --output runs/run.ance.nq-test.multi.trec \
+ --batch-size 512 --threads 16
```

Same as above, replace `--encoded-queries` with `--encoder castorini/ance-dpr-question-multi` for on-the-fly query encoding.

To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
- $ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
- --topics dpr-nq-test \
- --index wikipedia-dpr-100w \
- --input runs/run.ance.nq-test.multi.bf.trec \
- --output runs/run.ance.nq-test.multi.bf.json
+ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+ --topics dpr-nq-test \
+ --index wikipedia-dpr \
+ --input runs/run.ance.nq-test.multi.trec \
+ --output runs/run.ance.nq-test.multi.json

+ python -m pyserini.eval.evaluate_dpr_retrieval \
+ --retrieval runs/run.ance.nq-test.multi.json \
+ --topk 20 100
+ ```

- $ python -m pyserini.eval.evaluate_dpr_retrieval \
- --retrieval runs/run.ance.nq-test.multi.bf.json \
- --topk 20 100
+ Results:

+ ```
Top20 accuracy: 0.8224
Top100 accuracy: 0.8787
```
@@ -138,25 +158,29 @@ python -m pyserini.search.faiss \
--index wikipedia-dpr-100w.ance-multi \
--topics dpr-trivia-test \
--encoded-queries ance_multi-trivia-test \
- --output runs/run.ance.trivia-test.multi.bf.trec \
- --batch-size 36 --threads 12
+ --output runs/run.ance.trivia-test.multi.trec \
+ --batch-size 512 --threads 16
```

Same as above, replace `--encoded-queries` with `--encoder castorini/ance-dpr-question-multi` for on-the-fly query encoding.

To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
- $ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
- --topics dpr-trivia-test \
- --index wikipedia-dpr-100w \
- --input runs/run.ance.trivia-test.multi.bf.trec \
- --output runs/run.ance.trivia-test.multi.bf.json
+ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+ --topics dpr-trivia-test \
+ --index wikipedia-dpr \
+ --input runs/run.ance.trivia-test.multi.trec \
+ --output runs/run.ance.trivia-test.multi.json

- $ python -m pyserini.eval.evaluate_dpr_retrieval \
- --retrieval runs/run.ance.trivia-test.multi.bf.json \
- --topk 20 100
+ python -m pyserini.eval.evaluate_dpr_retrieval \
+ --retrieval runs/run.ance.trivia-test.multi.json \
+ --topk 20 100
+ ```

+ Results:

+ ```
Top20 accuracy: 0.8010
Top100 accuracy: 0.8522
```
24 changes: 14 additions & 10 deletions docs/experiments-bpr.md
@@ -28,7 +28,7 @@ python -m pyserini.search.faiss \
--topics dpr-nq-test \
--encoded-queries bpr_single_nq-nq-test \
--output runs/run.bpr.rerank.nq-test.nq.hash.trec \
- --batch-size 36 --threads 12 \
+ --batch-size 512 --threads 16 \
--hits 100 --binary-hits 1000 \
--searcher bpr --rerank
```
@@ -38,18 +38,22 @@ The option `--encoded-queries` specifies the use of encoded queries (i.e., queri
To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
- $ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
- --index wikipedia-dpr-100w \
- --topics dpr-nq-test \
- --input runs/run.bpr.rerank.nq-test.nq.hash.trec \
- --output runs/run.bpr.rerank.nq-test.nq.hash.json
+ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
+ --index wikipedia-dpr \
+ --topics dpr-nq-test \
+ --input runs/run.bpr.rerank.nq-test.nq.hash.trec \
+ --output runs/run.bpr.rerank.nq-test.nq.hash.json

+ python -m pyserini.eval.evaluate_dpr_retrieval \
+ --retrieval runs/run.bpr.rerank.nq-test.nq.hash.json \
+ --topk 20 100
+ ```

- $ python -m pyserini.eval.evaluate_dpr_retrieval \
- --retrieval runs/run.bpr.rerank.nq-test.nq.hash.json \
- --topk 20 100
+ Results:

+ ```
Top20 accuracy: 0.7792
- Top100 accuracy: 0.8573
+ Top100 accuracy: 0.8571
```

## Reproduction Log[*](reproducibility.md)
30 changes: 19 additions & 11 deletions docs/experiments-distilbert_kd.md
@@ -15,21 +15,25 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.distilbert-dot-margin-mse-t2 \
--topics msmarco-passage-dev-subset \
--encoded-queries distilbert_kd-msmarco-passage-dev-subset \
- --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv \
+ --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
--output-format msmarco \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16
```

Replace `--encoded-queries` with `--encoder sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco` for on-the-fly query encoding.

To evaluate:

```bash
- $ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
- runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv
+ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv
+ ```

+ Results:

+ ```
#####################
- MRR @10: 0.3250
+ MRR @10: 0.3251
QueriesRanked: 6980
#####################
```
@@ -38,14 +42,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other m
For that we first need to convert runs and qrels files to the TREC format:

```bash
- $ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
- --input runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv \
- --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.trec
+ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
+ --input runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
+ --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec

- $ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
- runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.trec
+ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec
+ ```

+ Results:

- map all 0.3308
+ ```
+ map all 0.3309
recall_1000 all 0.9553
```

31 changes: 19 additions & 12 deletions docs/experiments-distilbert_tasb.md
@@ -15,22 +15,25 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.distilbert-dot-tas_b-b256 \
--topics msmarco-passage-dev-subset \
--encoded-queries distilbert_tas_b-msmarco-passage-dev-subset \
- --output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv \
+ --output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv \
--output-format msmarco \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16
```

Replace `--encoded-queries` with `--encoder sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco` for on-the-fly query encoding.

To evaluate:


```bash
- $ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
- runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv
+ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv
+ ```

+ Results:

+ ```
#####################
- MRR @10: 0.3443
+ MRR @10: 0.3444
QueriesRanked: 6980
#####################
```
@@ -39,14 +42,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other m
For that we first need to convert runs and qrels files to the TREC format:

```bash
- $ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
- --input runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv \
- --output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.trec
+ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
+ --input runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv \
+ --output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.trec

- $ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
- runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.trec
+ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
+ runs/run.msmarco-passage.distilbert-dot-tas_b-b256.trec
+ ```

+ Results:

- map all 0.3514
+ ```
+ map all 0.3515
recall_1000 all 0.9771
```

16 changes: 8 additions & 8 deletions docs/experiments-dkrr.md
@@ -15,20 +15,20 @@ Running DKRR retrieval on `dpr-nq-dev` and `nq-test` of the Natural Questions da

```bash
python -m pyserini.search.faiss \
- --index wikipedia-dpr-dkrr-nq \
+ --index wikipedia-dpr-100w.dkrr-nq \
--topics dpr-nq-dev \
--encoded-queries dkrr-dpr-nq-retriever-dpr-nq-dev \
--output runs/run.dpr-dkrr-nq.dev.trec \
--query-prefix question: \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16

python -m pyserini.search.faiss \
- --index wikipedia-dpr-dkrr-nq \
+ --index wikipedia-dpr-100w.dkrr-nq \
--topics nq-test \
--encoded-queries dkrr-dpr-nq-retriever-nq-test \
--output runs/run.dpr-dkrr-nq.test.trec \
--query-prefix question: \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16
```

Alternatively, replace `--encoded-queries ...` with `--encoder castorini/dkrr-dpr-nq-retriever` for on-the-fly query encoding.
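
As a sketch, the on-the-fly form of the second command above would look roughly like this; note that `--query-prefix question:` is kept, since DKRR queries are encoded with a `question:` prefix just as in the cached-query runs:

```bash
# Sketch: DKRR retrieval on nq-test with on-the-fly query encoding.
python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-nq \
  --topics nq-test \
  --encoder castorini/dkrr-dpr-nq-retriever \
  --output runs/run.dpr-dkrr-nq.test.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16
```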
@@ -79,20 +79,20 @@ Running DKRR retrieval on `dpr-trivia-dev` and `dpr-trivia-test` of the TriviaQA

```bash
python -m pyserini.search.faiss \
- --index wikipedia-dpr-dkrr-tqa \
+ --index wikipedia-dpr-100w.dkrr-tqa \
--topics dpr-trivia-dev \
--encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-dev \
--output runs/run.dpr-dkrr-trivia.dev.trec \
--query-prefix question: \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16

python -m pyserini.search.faiss \
- --index wikipedia-dpr-dkrr-tqa \
+ --index wikipedia-dpr-100w.dkrr-tqa \
--topics dpr-trivia-test \
--encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-test \
--output runs/run.dpr-dkrr-trivia.test.trec \
--query-prefix question: \
- --batch-size 36 --threads 12
+ --batch-size 512 --threads 16
```
Alternatively, replace `--encoded-queries ...` with `--encoder castorini/dkrr-dpr-tqa-retriever` for on-the-fly query encoding.
