WIP - Solr enhancements #2246

Closed

tfmorris
Contributor

@tfmorris tfmorris commented Jul 30, 2019

Description

Technical

This is still a work in progress, but I'm creating a PR for the branch to give it some visibility for review. It is based on @cdrini's PR #1843, which provided the framework.

Things that have been added include:

  • Updated schema for the temp database
  • Performance improvements
  • Solr 8.1 support using the official Solr Docker image, updating the schema as necessary (minor)
  • Support for ICUFoldingFilter for author names and subject titles (can be expanded) to provide normalization plus case and diacritic folding, so that Hervé and Herve get indexed the same (as well as more complicated cases)
  • Blacklisting of all fake subjects as candidates for authors' "Top Subjects" list (can be extended to anywhere these subjects are indexed, but this should give us a big usability boost to start with)
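
To illustrate the folding behaviour described above, here is a rough Python approximation. It is not ICUFoldingFilter itself (the real filter runs inside Solr and handles many more cases); the function name `fold_for_matching` is hypothetical and only the standard library is used.

```python
import unicodedata


def fold_for_matching(text: str) -> str:
    """Approximate case and diacritic folding (a stand-in, not ICU's algorithm)."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold, then recompose whatever is left.
    return unicodedata.normalize("NFKC", without_marks.casefold())


# Both spellings reduce to the same key, so they would match each other.
same_key = fold_for_matching("Hervé") == fold_for_matching("Herve")
```

Applying the same folding at index time and at query time is what lets a search for either spelling find both.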

Things remaining to be done:

  • Fix delete-by-query performance issues (~50% time savings?)
  • Fix subject indexing (plus clean up some outstanding bugs, like merging subjects that differ only in punctuation; see "Design librarian process & UI for merging duplicate Subjects", #65)
  • Integration of all pieces to be able to download a dump and build an index from scratch in under 12 hours (current time is probably 3x that)

Testing

Evidence

cdrini added 30 commits July 22, 2019 11:58
Repair missing comma that was merging two entries on the author subject
blacklist. Add "Large type books" to the blacklist (not informative for
an author's "top subjects")
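
For reviewers wondering how a missing comma can merge two entries: assuming the blacklist is a Python list of string literals, adjacent literals are silently concatenated. The sketch below reproduces that. The entries are illustrative (only "Large type books" comes from the commit message), and `top_subjects` is a hypothetical helper, not the code in this branch.

```python
from collections import Counter

# Illustrative entries; the missing comma after the second literal makes
# Python join it with the third, leaving two entries instead of three.
BROKEN_BLACKLIST = [
    "Accessible book",
    "Protected DAISY"  # <- missing comma
    "In library",
]

FIXED_BLACKLIST = {
    "Accessible book",
    "Protected DAISY",
    "In library",
    "Large type books",
}


def top_subjects(subjects, blacklist, limit=5):
    """Return the most frequent subjects, skipping blacklisted ones."""
    counts = Counter(s for s in subjects if s not in blacklist)
    return [name for name, _ in counts.most_common(limit)]


subjects = ["Fiction"] * 3 + ["Large type books"] * 2 + ["History"]
result = top_subjects(subjects, FIXED_BLACKLIST)
```

With the broken list, neither "Protected DAISY" nor "In library" is filtered, because the only entry present is the merged string.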
@tfmorris
Contributor Author

tfmorris commented Feb 9, 2023

Abandoned years ago (although some pieces were apparently used in other PRs, like #4283).

@tfmorris tfmorris closed this Feb 9, 2023
Labels
  • Module: Solr (Issues related to the configuration or use of the Solr subsystem.) [managed]
  • Needs: Review (This issue/PR needs to be reviewed in order to be closed or merged (see comments).) [managed]
  • State: Work In Progress (This issue is being actively worked on.) [managed]
2 participants