Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform adds with field updates using JSON #374

Merged
merged 2 commits into from
Oct 19, 2021

Conversation

mitchelljkotler
Copy link
Contributor

This pull request adds support for doing field updates using JSON instead of XML. I have a need for this due to the intersection of indexing nested documents and field updates.

Indexing nested documents can now work with labelled fields:
https://solr.apache.org/guide/8_0/indexing-nested-documents.html#xml-examples
https://solr.apache.org/guide/8_0/indexing-nested-documents.html#json-examples
And this is preferred over the anonymous way, using the special key _childDocuments_ which pysolr currently supports.

This currently works when using JSON serialization (the default), but the child documents end up being serialized as a string when using XML serialization (which happens when field updates is set).

Field updates can be done through JSON (the documentation actually doesn't even show the XML syntax anymore):
https://solr.apache.org/guide/8_0/updating-parts-of-documents.html#example-updating-part-of-a-document

I have implemented field updates with JSON serialization. I did not add any tests, but I believe the existing field update tests validate the JSON implementation.

This leaves XML serialization to be used only for index time boosts, which have been removed in version 7:
https://solr.apache.org/docs/7_0_0/changes/Changes.html#v7.0.0.upgrading_from_solr_6.x

  1. Index-time boosts are not supported anymore. If any boosts are provided, they will be ignored by the indexing chain. As a replacement, index-time scoring factors should be indexed in a separate field and combined with the query score using a function query.

I believe this means the XML serialization can be completely dropped, but I wanted to check with you if this is something you would be willing to merge in. If it is, I would be happy to continue working on this.

acdha
acdha previously approved these changes Oct 1, 2021
@acdha
Copy link
Collaborator

acdha commented Oct 1, 2021

This looks like a good idea — it's been a while but I believe we also will see a non-trivial performance win moving away from XML. The main thing I was thinking we should have are some tests for that conditional logic.

@mitchelljkotler
Copy link
Contributor Author

Great! Are you interested in completely removing XML? Is index time boosting something important to keep available?

Otherwise, I think the best way to test the conditional logic would be to refactor it out into another method - right now its embedded within the add method and would be hard to test. I can work on that.

@mitchelljkotler
Copy link
Contributor Author

Ok, I updated it to factor out _build_docs, which chooses between XML and JSON, and added tests for that (use JSON for plain and field updates, use XML for boosts).

I noticed two issues with the original code, which I did not want to change without checking first:

  1. If using JSON, the add checks if a single dictionary was passed, and wraps it in a list if so. For XML, you must pass a list or else it will error. This seems like it should be consistent and can be pulled up to properly wrap a single dictionary for either case.
  2. commitWithin is only used in the XML case. It is not used to determine if JSON or XML is used though, so if commitWithin is set and the JSON handler is used, it is silently dropped. I am not sure if there is a way to set commitWithin using JSON or if it being set should trigger it to choose the XML handler.

Thanks

Allows testing of logic to choose between XML and JSON
@mitchelljkotler
Copy link
Contributor Author

@acdha Just wanted to check if you got a chance to look at the latest changes, thanks

@acdha acdha merged commit cbc9b9e into django-haystack:master Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants