Search API returns different results from Web UI #3239

ghost · 2020-09-23T05:40:32Z

Describe the bug
The REST API (api/v1/search) returns different results from the web UI for the same query.

Environments:

OpenGrok 1.3.2
OpenJDK 1.8.0_222
OS: Amazon Linux on EC2 (4.14.146-93.123.amzn1.x86_64)
Apache Tomcat/8.5.42

To Reproduce
Steps to reproduce the behavior:
Searching from the web UI reports "Searched +full:google +refs:google (Results 25801 – 25802 of 25802) sorted by relevance".
The same search through the REST API returns a different count (the URL is quoted so that the shell does not interpret the & and <> characters):

curl 'https://<grok-server>/api/v1/search?full=google&defs=&refs=google&path=&hist=&type=&searchall=true&start=0&maxresults=1' | python -m json.tool
{
    "time": 1170,
    "resultCount": 38082,
    "startDocument": 0,
    "endDocument": 0,
    "results": {
...

Expected behavior
The web UI and the API should return the same results for the same search condition.


vladak · 2020-09-23T07:56:42Z

How do you run the indexer? Do you have projects enabled?

ghost · 2020-09-23T09:18:04Z

Both the web UI and the API requests went to the same OpenGrok instance, using the same account.
All projects were included in the search.
So this issue should have nothing to do with the indexer.

vladak · 2020-09-23T18:49:17Z

There is #3170, that's why I am asking about projects and indexer.

ghost · 2020-09-25T10:06:36Z

2020-09-25 08:38:15.698+0000 INFO t1 Indexer.parseOptions: Indexer options: [
-v, --displayRepositories, off, --optimize, on, -r, uionly, -H, -S, --depth, 99, --progress, -c, /usr/bin/ctags, -o, /var/opengrok/conf/ctags/config, -m, 256, --leadingWildCards, on, -R, configuration.ro.xml, -W, configuration.xml, -P, -U, http://localhost:9080/vanilla_android, -s, /var/opengrok/stage1/src, -d, /var/opengrok/stage1/data
]

ghost · 2020-09-25T10:07:34Z

I got more results from the API than from the web UI.

vladak · 2023-12-19T11:32:02Z

Tried to replicate this with 1.12.28 using the AOSP source code and a full-text search for 'google' (http://localhost:8080/source/api/v1/search?projects=AOSP&full=google&maxresults=200000). Using the API I got "resultCount":41556, while the web UI reported far fewer: several thousand results. Interestingly, when I refreshed the first result page, the result count was almost always different; it seems to be cycling through a small set of numbers. Even more surprising was clicking through the result pages: progressing through pages 1, 2, 3, etc., the total number of results reported grew with each page number. The last page of results, page 3810, reported 95241 total results, and there the total did not change when the page was refreshed. Based on this experience, I ran the API call multiple times to see whether its count would change; it remained the same.
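
The repeated API calls described above can be scripted. The following is a minimal sketch, assuming Java 11 or later (for java.net.http.HttpClient) and assuming the response carries a numeric "resultCount" field as in the output quoted in the original report. The class name ResultCountProbe and its methods are hypothetical helpers, not part of OpenGrok, and the regular expression is a shortcut that avoids a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenGrok): calls the search API several
// times and reports the distinct "resultCount" values it saw.
public class ResultCountProbe {
    // Matches the first "resultCount" field; a regex shortcut, not a JSON parser.
    private static final Pattern RESULT_COUNT =
            Pattern.compile("\"resultCount\"\\s*:\\s*(\\d+)");

    /** Returns the value of "resultCount" in the response body, or -1 if absent. */
    public static long extractResultCount(String json) {
        Matcher m = RESULT_COUNT.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    /** Issues one GET request and returns the reported result count. */
    public static long fetchResultCount(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return extractResultCount(response.body());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java ResultCountProbe <search-url> [repetitions]");
            return;
        }
        URI uri = URI.create(args[0]);
        int repetitions = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        HttpClient client = HttpClient.newHttpClient();
        Set<Long> distinct = new TreeSet<>();
        for (int i = 0; i < repetitions; i++) {
            distinct.add(fetchResultCount(client, uri));
        }
        // A single element means the count was stable across all calls.
        System.out.println("distinct resultCount values: " + distinct);
    }
}
```

It could be run as a single-file program, for example: java ResultCountProbe.java 'http://localhost:8080/source/api/v1/search?projects=AOSP&full=google&maxresults=200000' 10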

vladak · 2023-12-19T13:12:16Z

There is quite a difference in how the search is done between the web UI and the API. In the API, the SearchController in the end uses the SearchEngine class (via the SearchEngineWrapper subclass of the SearchController class). This class grabs the IndexSearcher (Lucene) using

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java, line 181 in b4a9940:

    SuperIndexSearcher superIndexSearcher = RuntimeEnvironment.getInstance().getSuperIndexSearcher("");

(where SuperIndexSearcher is a super class wrapping IndexSearcher for the purpose of "bumping" the related IndexReader after reindex so that newly indexed data can be displayed in search results) or

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java, lines 202 to 203 in b4a9940:

    MultiReader searchables = RuntimeEnvironment.getInstance().getMultiReader(projectNames, searcherList);
    searcher = RuntimeEnvironment.getInstance().getIndexSearcherFactory().newSearcher(searchables);

for project-less and project searches, respectively. The difference is that while in project-less mode the IndexSearcher is reused, with projects it is created from scratch. The query is created from the API arguments using

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java, lines 154 to 160 in b4a9940:

    return new QueryBuilder()
            .setFreetext(freetext)
            .setDefs(definition)
            .setRefs(symbol)
            .setPath(file)
            .setHist(history)
            .setType(type);

The search results are collected using TopScoreDocCollector (Lucene). The results are then processed by SearchEngine#results(), which can actually perform a re-query, i.e. perform the search once again. This is also where any context is fetched from the index and the source and added to the Hit objects that are then returned in a list. The search count comes from the length of the hits array, which is acquired here:

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java, line 219 in b4a9940:

    hits = collector.topDocs().scoreDocs;

vladak · 2023-12-19T13:15:50Z

The web UI uses the SearchHelper class like so:

opengrok/opengrok-web/src/main/webapp/search.jsp, line 86 in b4a9940:

    searchHelper.prepareExec(cfg.getRequestedProjects()).executeQuery().prepareSummary();

The IndexSearcher is acquired in SearchHelper#prepareExec():

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/web/SearchHelper.java, lines 400 to 402 in b4a9940:

    reader = RuntimeEnvironment.getInstance().getMultiReader(projects, superIndexSearchers);
    if (reader != null) {
        searcher = RuntimeEnvironment.getInstance().getIndexSearcherFactory().newSearcher(reader);

and then used in executeQuery():

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/web/SearchHelper.java, lines 478 to 479 in b4a9940:

    TopFieldDocs fdocs = searcher.search(query, start + maxItems, sort);
    totalHits = fdocs.totalHits.value;

The collected and summarized results are then embedded in the page:

opengrok/opengrok-web/src/main/webapp/search.jsp, lines 227 to 228 in b4a9940:

    <table aria-label="table of results"><%
    Results.prettyPrint(out, searchHelper, start, start + thispage);

aggregated by directory:

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/search/Results.java, lines 109 to 110 in b4a9940:

    ArrayList<Integer> dirDocs = dirHash.computeIfAbsent(parent, k -> new ArrayList<>());
    dirDocs.add(docId);

The number of results reported near the top of the page comes from the totalHits field, as visible above. Unlike the way hits are extracted for the API in SearchEngine, no collector is involved here.

The API uses Lucene's public void search(Query query, Collector results) while the web UI uses public TopFieldDocs search(Query query, int n, Sort sort).
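
One possible factor, which this thread does not confirm: since Lucene 8, TopDocs.totalHits is a TotalHits object whose value is only a lower bound when its relation is GREATER_THAN_OR_EQUAL_TO, and a top-N search may stop counting accurately once a threshold of hits is exceeded. The sketch below is a simplified, Lucene-free model of that idea, not OpenGrok or Lucene code; the class HitCountModel, its methods, and the threshold arithmetic are assumptions made purely for illustration. It contrasts a count taken from the length of a fully collected hit array with a count that is exact only up to a limit.

```java
// Hypothetical, Lucene-free model; none of these names exist in OpenGrok or Lucene.
public class HitCountModel {
    /** Loosely mirrors the shape of Lucene's TotalHits: a value plus whether it is exact. */
    public static final class Count {
        public final long value;
        public final boolean exact;

        Count(long value, boolean exact) {
            this.value = value;
            this.exact = exact;
        }
    }

    /** Models a count derived from the length of a fully collected hit array. */
    public static Count countByCollectingAll(int[] matchingDocs) {
        int[] hits = matchingDocs.clone(); // every match is collected
        return new Count(hits.length, true);
    }

    /**
     * Models a top-N search that counts exactly only up to max(n, threshold)
     * matches; beyond that the returned value is merely a lower bound.
     * (Assumption: real early termination is more involved than this.)
     */
    public static Count countWithThreshold(int[] matchingDocs, int n, int threshold) {
        int limit = Math.max(n, threshold);
        long counted = 0;
        for (int ignored : matchingDocs) {
            counted++;
            if (counted > limit) {
                return new Count(counted, false); // stopped early: lower bound only
            }
        }
        return new Count(counted, true);
    }

    public static void main(String[] args) {
        int[] matches = new int[5000]; // 5000 matching documents
        Count all = countByCollectingAll(matches);
        Count firstPage = countWithThreshold(matches, 25, 1000);
        Count deepPage = countWithThreshold(matches, 6000, 1000);
        System.out.println("collect all: " + all.value + " exact=" + all.exact);
        System.out.println("first page:  " + firstPage.value + " exact=" + firstPage.exact);
        System.out.println("deep page:   " + deepPage.value + " exact=" + deepPage.exact);
    }
}
```

In this model the reported count grows as n (standing in for start + maxItems) grows and becomes exact once n exceeds the number of matches, which resembles the paging behaviour described earlier. Whether the web UI is actually affected in this way would need to be checked by inspecting fdocs.totalHits.relation next to fdocs.totalHits.value.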

vladak added the question label Sep 23, 2020

vladak added the API label Apr 25, 2022

vladak added the webapp web application label Dec 19, 2023
