Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search API def return more than needed #3748

Closed
chkp-baselz opened this issue Oct 13, 2021 · 6 comments
Closed

search API def return more than needed #3748

chkp-baselz opened this issue Oct 13, 2021 · 6 comments
Assignees

Comments

@chkp-baselz
Copy link

chkp-baselz commented Oct 13, 2021

Hey,
My Opengrok version is 1-5-11
When I try to search for definitions I got each appears on the file the variable is defined there, for example,
https://opengrok.checkpoint.com:8443/source/api/v1/search?def=ws_http2_stream_error_action&projects=ivory_main
This is what I get, it should return only line 105, on UI it works and jumps me direct to line 105

{
  "time": 11,
  "resultCount": 1,
  "startDocument": 0,
  "endDocument": 0,
  "results": {
    "/ivory_main/ws/http2/ws_http2_parser.c": [
      {
        "line": " * fw ctl set int <b>ws_http2_stream_error_action</b> 0/1",
        "lineNumber": "103"
      },
      {
        "line": "int <b>ws_http2_stream_error_action</b> = E_WS_HTTP2_STREAM_ERROR_HANDLE;",
        "lineNumber": "105"
      },
      {
        "line": "\t\tswitch(<b>ws_http2_stream_error_action</b>)",
        "lineNumber": "5009"
      },
      {
        "line": "\t\t\t\tWS_DEBUG(WS_HTTP2, WS_ERROR, (\"invalid stream error action %d\", <b>ws_http2_stream_error_action</b>));",
        "lineNumber": "5035"
      }
    ]
  }
}
@vladak
Copy link
Member

vladak commented Oct 13, 2021

I assume that the search in the UI is performed by using the 'Definition' field in the search form, correct ?

@vladak vladak added bug and removed question labels Oct 13, 2021
@vladak
Copy link
Member

vladak commented Oct 13, 2021

Tried that with search for the USERNAME_FIELD definition in OpenGrok source code. Using the search API returns 5 hits in plugins/src/main/java/opengrok/auth/plugin/UserWhiteListPlugin.java which is where the actual definition is located. The API search results contain just this file even though there are some references to this symbol also in plugins/src/test/java/opengrok/auth/plugin/UserWhiteListPluginTest.java, otherwise I'd say this is wrongly formed query inside the SearchController somewhere, however this looks more like a problem with interpreting the search results.

@vladak vladak self-assigned this Oct 13, 2021
vladak pushed a commit to vladak/OpenGrok that referenced this issue Oct 13, 2021
@vladak
Copy link
Member

vladak commented Oct 13, 2021

Getting the results of search to the client is actually pretty complex.

There are major differences in how the search is done in the UI and via the API. I will try to summarize it here for future reference since I don't know whole lot about it myself.

When going via the API, it starts in SearchController#search(). This uses SearchEngineWrapper, a private static class located in the same source code file. As suggested by the name, this class is a wrapper to the SearchEngine class. The search() method then calls engine.search() (where engine is instance of SearchEngineWrapper) which returns the list of Hit objects which are then wrapped inside the SearchResult object and this gets converted automatically to the JSON representation returned to the client.

Now, for the context (sic!) of this bug, SearchEngineWrapper#search() calls SearchEngine#search() to get the search results and then calls SearchEngine#results() to convert the search results to the list of Hit objects. The SearchEngine object keeps the list of hits in an array of ScoreDoc (Lucene) objects. The SearchEngine#results() traverses this array and in this particular case uses the Context#getContext() that fills the list of Hit objects:

if (sourceContext != null) {
sourceContext.toggleAlt();
try {
if (AbstractAnalyzer.Genre.PLAIN == genre && (source != null)) {
// SRCROOT is read with UTF-8 as a default.
hasContext = sourceContext.getContext(
new InputStreamReader(new FileInputStream(
source + filename), StandardCharsets.UTF_8),
null, null, null, filename, tags, nhits > 100,
false, ret, scopes);

This method then uses PlainLineTokenizer:

PlainLineTokenizer tokens = new PlainLineTokenizer(null);
which might add Hit objects to the list for the non-definition tokens in dumpRest():
if (hits != null) {
tokens.setAlt(alt);
tokens.setHitList(hits);
if (!isDefSearch) {
tokens.printContext();
} else if (tokens.tags.containsKey(tokens.markedLine)) {
tokens.printContext();
}
which in this case is actually unwanted.

@vladak
Copy link
Member

vladak commented Oct 13, 2021

Searching from the UI is a tad different as it uses different search wraper:

SearchHelper searchHelper = cfg.prepareSearch();
// N.b. searchHelper.destroy() is called via
// WebappListener.requestDestroyed() on presence of the following
// REQUEST_ATTR.
request.setAttribute(SearchHelper.REQUEST_ATTR, searchHelper);
searchHelper.prepareExec(cfg.getRequestedProjects()).executeQuery().prepareSummary();

It builds IndexSearcher in prepareExec() and calls search() in executeQuery - the approach is similar to what SearchEngine does for the API based search.

The results are then presented via

Results.prettyPrint(out, searchHelper, start, start + thispage);
which then calls Results#printPlain(). Similarly to SearchEngine used by the API, the SearchHelper uses Context (at least in our case of plaintext) however different method:
private static void printPlain(PrintPlainFinalArgs fargs, Document doc,
int docId, String rpath) throws ClassNotFoundException, IOException {
fargs.shelp.sourceContext.toggleAlt();
boolean didPresentNew = fargs.shelp.sourceContext.getContext2(fargs.env,
which then uses OGKUnifiedHighlighter that extends Lucene's UnifiedHighlighter to display the results. It uses the query to filter the results, I think.

So, while SearchEngine (used by API) and SearchHelper (used by UI) both use the Context, they use radically different pieces of it in the generic case. There is one interesting tidbit where printPlain() falls back to using getContext() after getContext2() indicated no matching context:

if (!didPresentNew) {
/*
* Fall back to the old view, which re-analyzes text using
* PlainLinetokenizer. E.g., when source code is updated (thus
* affecting timestamps) but re-indexing is not yet complete.
*/
however in that case it sets the isDefSearch properly:
boolean isDefSearch = fargs.shelp.builder.isDefSearch();
// SRCROOT is read with UTF-8 as a default.
File sourceFile = new File(fargs.shelp.sourceRoot, rpath);
try (FileInputStream fis = new FileInputStream(sourceFile);
Reader r = IOUtils.createBOMStrippedReader(fis, StandardCharsets.UTF_8.name())) {
fargs.shelp.sourceContext.getContext(r, fargs.out,
fargs.xrefPrefix, fargs.morePrefix, rpath, tags, true,
isDefSearch, null, scopes);

@vladak
Copy link
Member

vladak commented Oct 13, 2021

The use of Context to present the results from the /search API call likely leads to problems such as #2612, #3090, #3170.

@vladak
Copy link
Member

vladak commented Oct 13, 2021

Looking at #2732, this bug could be fixed if we merged that PR since it replaces the offending sourceContext.getContext() call with something else, however it might as well reintroduce the bug - need to check this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants