Fix binary extraction flow #7

Merged: 2 commits, Apr 2, 2024


rishabh-sagar-20
Contributor

Refactor:

  • Fix Binary Extraction in XmlPowerToolsEngine - The binary extraction flow is broken and does not extract the binaries. This PR fixes the bug.

Update:

  • Added macOS Binary Build Instructions to Developer's Guide

This PR adds macOS support to the project's build and extraction process. A new section covers building and compressing the macOS binary, alongside the existing instructions for Windows and Linux. The extraction method in engines.py is also modified to handle macOS-specific binaries.
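
As a rough illustration of the per-platform handling described above, here is a minimal sketch of choosing a binary archive by operating system. The archive names and the `select_binary_archive` helper are hypothetical and are not taken from engines.py; only `platform.system()` and its return values ("Windows", "Linux", "Darwin") are standard.

```python
import platform
from typing import Optional

# Hypothetical archive names; the real names used by the project may differ.
BINARY_ARCHIVES = {
    "Windows": "win-x64.zip",
    "Linux": "linux-x64.tar.gz",
    "Darwin": "osx-x64.tar.gz",  # platform.system() reports macOS as "Darwin"
}


def select_binary_archive(system: Optional[str] = None) -> str:
    """Return the archive name for the given (or current) operating system."""
    system = system or platform.system()
    try:
        return BINARY_ARCHIVES[system]
    except KeyError:
        # Fail loudly instead of silently skipping extraction.
        raise RuntimeError(f"Unsupported platform: {system}") from None
```

Raising on an unknown platform, rather than returning nothing, is one way to avoid the kind of silent non-extraction this PR sets out to fix.
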
@rishabh-sagar-20 changed the title from "Add support for macOS in build and extraction process" to "Fix binary extraction flow" on Apr 2, 2024
Owner

@JSv4 JSv4 left a comment

Thanks. Nice, clean improvement.

@JSv4 JSv4 merged commit 4ee2e99 into JSv4:main Apr 2, 2024
@JSv4
Owner

JSv4 commented Apr 2, 2024

Hey @rishabh-sagar-20, I'm running into some issues getting this packaged on PyPI due to the large binary size of the built redlining engines. I have some thoughts on how to fix this, but I'm still working on it. If you have thoughts, I definitely welcome any suggestions. The max file size we can put on PyPI is 100 MB. I'm thinking we don't distribute the binaries and instead either build them on the client or host them on AWS or something and have an install step that pulls the binaries after install, like how spaCy and other NLP libraries often download large binaries.
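
A minimal sketch of the "pull the binaries after install" idea, assuming the binary is published at some URL and cached locally after the first download. The `ensure_binary` and `fetch_with_urllib` names, the cache location, and the injectable `fetch` parameter are all hypothetical; only `urllib.request.urlretrieve` and `pathlib` are standard-library calls.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_with_urllib(url: str, dest: Path) -> None:
    """Default fetcher: download url to dest using the standard library."""
    urllib.request.urlretrieve(url, dest)


def ensure_binary(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None] = fetch_with_urllib,
) -> Path:
    """Return the cached binary archive, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download to a temporary name first so an interrupted download
        # never leaves a truncated file under the final name.
        partial = target.with_name(target.name + ".part")
        fetch(url, partial)
        partial.replace(target)
    return target
```

With this shape the PyPI package stays small, and the first call on a fresh machine pays the download cost once; later calls reuse the cached file.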

@rishabh-sagar-20
Contributor Author

It's feasible. Additionally, we can create distinct builds for each platform. That would keep each upload under the size limit so it can go on PyPI.

@rishabh-sagar-20
Contributor Author

What's the overall size? I came across a page stating that we can ask PyPI to raise the size limit for a project.
Ref:

  1. https://pypi.org/help/#file-size-limit
  2. StackOverflow

@JSv4
Owner

JSv4 commented Apr 2, 2024

Yes, I did see that, but it seems the limit is 60 MB (compressed) and the binaries (at least on my machine) are about 65 - 80 MB each.

@rishabh-sagar-20
Contributor Author

I think the method you mentioned earlier, which is similar to spaCy models, should be effective. For hosting, GitHub Releases should work fine. We can update it later if we encounter any issues.
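
For reference, a small sketch of how a download URL for a GitHub Releases asset can be built. The URL pattern (`/releases/download/<tag>/<asset>`) is GitHub's standard one; the repository name, tag, and asset name in the usage note are placeholders, not the project's actual values.

```python
from urllib.parse import quote


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the direct download URL for an asset attached to a GitHub release."""
    return "https://github.com/{}/{}/releases/download/{}/{}".format(
        owner,
        repo,
        quote(tag, safe=""),    # percent-encode unusual characters in the tag
        quote(asset, safe=""),  # and in the asset file name
    )
```

For example, `release_asset_url("JSv4", "example-repo", "v0.0.1", "linux-x64.tar.gz")` would point at a hypothetical `linux-x64.tar.gz` asset on a `v0.0.1` release.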

2 participants