Search API returns different results from Web UI #3239

Open
ghost opened this issue Sep 23, 2020 · 8 comments

ghost commented Sep 23, 2020

Describe the bug
The REST API (api/v1/search) returns different results from the web UI for the same query conditions.

Environments:

  • OpenGrok 1.3.2
  • OpenJDK 1.8.0_222
  • OS: Amazon Linux on EC2 (4.14.146-93.123.amzn1.x86_64)
  • Apache Tomcat/8.5.42

To Reproduce
Steps to reproduce the behavior:
Searching from the GUI gives "Searched +full:google +refs:google (Results 25801 – 25802 of 25802) sorted by relevance",
but searching with the REST API gives

curl 'https://<grok-server>/api/v1/search?full=google&defs=&refs=google&path=&hist=&type=&searchall=true&start=0&maxresults=1' | python -m json.tool
{
    "time": 1170,
    "resultCount": 38082,
    "startDocument": 0,
    "endDocument": 0,
    "results": {
...

Expected behavior
The web UI and the API should return the same results for the same search conditions.
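
To make the discrepancy easy to re-check, the two totals can be extracted and compared programmatically. The following is a minimal Python sketch, not part of OpenGrok: the helper names `ui_total` and `api_total` are hypothetical, the summary line is assumed to have the format quoted in the report, and the JSON body is assumed to carry `resultCount` as in the curl output (the `results` value is left empty here because the report elides it).

```python
import json
import re

def ui_total(summary: str) -> int:
    """Extract the total from a web UI summary line such as
    'Searched ... (Results 25801 – 25802 of 25802) sorted by relevance'."""
    match = re.search(r"\bof\s+([\d,]+)\)", summary)
    if match is None:
        raise ValueError("no result total found in summary line")
    return int(match.group(1).replace(",", ""))

def api_total(body: str) -> int:
    """Extract resultCount from the JSON body returned by api/v1/search."""
    return int(json.loads(body)["resultCount"])

# Values taken from the report; the JSON is trimmed to the fields shown there.
summary = "Searched +full:google +refs:google (Results 25801 – 25802 of 25802) sorted by relevance"
body = '{"time": 1170, "resultCount": 38082, "startDocument": 0, "endDocument": 0, "results": {}}'

ui, api = ui_total(summary), api_total(body)
print(f"web UI: {ui}, API: {api}, difference: {api - ui}")
```

With the numbers from the report, the API total exceeds the web UI total by 12280.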

Member

vladak commented Sep 23, 2020

How do you run the indexer? Do you have projects enabled?

Author

ghost commented Sep 23, 2020

Both the web UI and the API requests went to the same OpenGrok instance and used the same account.
All projects were included in the search.
So this issue should have nothing to do with the indexer.

Member

vladak commented Sep 23, 2020

There is #3170, which is why I am asking about projects and the indexer.

Author

ghost commented Sep 25, 2020

2020-09-25 08:38:15.698+0000 INFO t1 Indexer.parseOptions: Indexer options: [
-v, --displayRepositories, off, --optimize, on, -r, uionly, -H, -S, --depth, 99, --progress, -c, /usr/bin/ctags, -o, /var/opengrok/conf/ctags/config, -m, 256, --leadingWildCards, on, -R, configuration.ro.xml, -W, configuration.xml, -P, -U, http://localhost:9080/vanilla_android, -s, /var/opengrok/stage1/src, -d, /var/opengrok/stage1/data
]

Author

ghost commented Sep 25, 2020

I got more results from the API than from the web UI.

vladak added the API label Apr 25, 2022

Member

vladak commented Dec 19, 2023

I tried to replicate this with 1.12.28, using the AOSP source code and a full-text search for 'google' (http://localhost:8080/source/api/v1/search?projects=AOSP&full=google&maxresults=200000). Using the API I got "resultCount":41556, while the web UI reported far fewer: several thousand results. Interestingly, when I refreshed the first result page, the result count was almost always different; it seems to be cycling through a small set of numbers. Even more surprising was clicking through the result pages: progressing through pages 1, 2, 3, etc., the total number of results reported grew with each ascending page number. The last page of the results, page 3810, reported 95241 total results. On the last page the total did not change when the page was refreshed. Based on this experience, I ran the API call multiple times to see whether its count would change; it remained the same.
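
The repeated API calls described in this comment can be scripted. A minimal Python sketch follows; `search_url` and `counts_over_runs` are hypothetical helpers, and the `fetch` callable is injected so that the example runs without a server. The base URL and parameters are the ones from the comment; the canned response body is an assumption that contains only the `resultCount` field.

```python
import json
from typing import Callable, List
from urllib.parse import urlencode

def search_url(base: str, **params: str) -> str:
    """Build an api/v1/search URL with URL-encoded query parameters."""
    return f"{base}/api/v1/search?{urlencode(params)}"

def counts_over_runs(fetch: Callable[[str], str], url: str, runs: int) -> List[int]:
    """Call the API `runs` times and return the resultCount seen on each call."""
    return [int(json.loads(fetch(url))["resultCount"]) for _ in range(runs)]

url = search_url("http://localhost:8080/source",
                 projects="AOSP", full="google", maxresults="200000")

# Stand-in for a real HTTP GET (for example via urllib.request.urlopen);
# it returns a canned body so the sketch needs no running OpenGrok instance.
def fake_fetch(_url: str) -> str:
    return '{"resultCount": 41556}'

counts = counts_over_runs(fake_fetch, url, runs=5)
print(url)
print("stable" if len(set(counts)) == 1 else "varies", counts)
```

Replacing `fake_fetch` with a real HTTP call would show whether the API count is stable across calls, as observed here.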

Member

vladak commented Dec 19, 2023

There is quite a difference in how the search is done between the web UI and the API. In the API, the SearchController ends up using the SearchEngine class (via the SearchEngineWrapper subclass of the SearchController class). This class grabs the Lucene IndexSearcher using

    SuperIndexSearcher superIndexSearcher = RuntimeEnvironment.getInstance().getSuperIndexSearcher("");

(where SuperIndexSearcher is a super class wrapping IndexSearcher for the purpose of "bumping" the related IndexReader after reindex, so that newly indexed data can be displayed in search results) or

    MultiReader searchables = RuntimeEnvironment.getInstance().getMultiReader(projectNames, searcherList);
    searcher = RuntimeEnvironment.getInstance().getIndexSearcherFactory().newSearcher(searchables);

for project-less and project searches, respectively. The difference is that in project-less mode the IndexSearcher is reused, while with projects it is created from scratch. The query is created from the API arguments using

    return new QueryBuilder()
            .setFreetext(freetext)
            .setDefs(definition)
            .setRefs(symbol)
            .setPath(file)
            .setHist(history)
            .setType(type);

The search results are collected using Lucene's TopScoreDocCollector. The results are then processed by SearchEngine#results(), which can actually re-query, i.e. perform the search once again. This is also where any context is fetched from the index and the source and added to the Hit objects, which are then returned in a list. The search count comes from the length of hits. The hits object is acquired here:

Member

vladak commented Dec 19, 2023

The web UI uses the SearchHelper class like so:

    searchHelper.prepareExec(cfg.getRequestedProjects()).executeQuery().prepareSummary();

The IndexSearcher is acquired in SearchHelper#prepareExec():

    reader = RuntimeEnvironment.getInstance().getMultiReader(projects, superIndexSearchers);
    if (reader != null) {
        searcher = RuntimeEnvironment.getInstance().getIndexSearcherFactory().newSearcher(reader);

and then used in executeQuery():

    TopFieldDocs fdocs = searcher.search(query, start + maxItems, sort);
    totalHits = fdocs.totalHits.value;

The collected and summarized results are then embedded in the page:

    <table aria-label="table of results"><%
    Results.prettyPrint(out, searchHelper, start, start + thispage);

aggregated by directory:

    ArrayList<Integer> dirDocs = dirHash.computeIfAbsent(parent, k -> new ArrayList<>());
    dirDocs.add(docId);

The number of results reported near the top of the page comes from the totalHits field, as visible above. Compared to how the hits are extracted for the API in the SearchEngine, there is no collector involved.

The API uses Lucene's public void search(Query query, Collector results), while the web UI uses public TopFieldDocs search(Query query, int n, Sort sort).
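
One possible explanation, which this thread does not confirm: since Lucene 8, the search(Query, int, Sort) convenience method counts hits exactly only up to a threshold (1,000 by default) and may otherwise report a lower bound, which is flagged through totalHits.relation. That would be consistent with a web UI total that grows with start + maxItems and becomes stable on the last page. The following is a toy Python model of that idea, not Lucene code; the function name `approx_total_hits`, the artificial scores, and the skipping rule are all simplifying assumptions.

```python
import heapq
from typing import Iterable, List

def approx_total_hits(scores: Iterable[float], n: int, threshold: int = 1000) -> int:
    """Toy model: keep the top n scores. Every hit is counted until `threshold`
    hits have been counted; afterwards only hits that are still competitive
    (that would enter the top n) are counted, the rest are skipped."""
    top: List[float] = []  # min-heap holding the best n scores seen so far
    counted = 0
    for score in scores:
        if len(top) < n:
            heapq.heappush(top, score)
            counted += 1
        elif score > top[0]:
            heapq.heapreplace(top, score)
            counted += 1
        elif counted < threshold:
            counted += 1
        # otherwise the hit is skipped without being counted
    return counted

total = 5000
scores = [float(total - i) for i in range(total)]  # strictly decreasing scores

for n in (10, 2000, 5000):
    print(n, approx_total_hits(scores, n))
```

With these artificial inputs the reported total is 1000 for n=10, 2000 for n=2000, and the exact 5000 once n covers all hits. If the web UI is affected in this way, checking totalHits.relation or calling IndexSearcher.count(query) would be ways to detect or avoid the lower bound; whether that applies to this issue is an assumption.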

@vladak vladak added the webapp web application label Dec 19, 2023