File Limit Request: database-stein-watkins-huge - 3000 MB #1946

Closed
3 tasks done
mkoeppe opened this issue May 22, 2022 · 8 comments
Labels
limit request, status: awaiting response (Needs more information before proceeding)

Comments

@mkoeppe

mkoeppe commented May 22, 2022

Project URL

https://pypi.org/project/database-stein-watkins-huge

Does this project already exist?

  • Yes

New Limit

3000MB (3GB)

Update issue title

  • I have updated the title.

Which indexes

PyPI

About the project

This is a fundamental mathematical database that is in use with the SageMath project (https://www.sagemath.org/).
The files have been static since 2011. After an initial upload, we do not expect to make further changes.

The data files in this package are individually compressed with bzip2. A further size reduction is unfortunately not possible. We have a minimal version of the database in https://pypi.org/project/database-stein-watkins-mini, which can be used for CI purposes or for some basic educational use. The full database is needed for research-level work in SageMath.

Reasons for the request

As we prepare SageMath for pip-installability (https://trac.sagemath.org/ticket/29705), it becomes important to us that also the mathematical databases can be pip-installed.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@mkoeppe changed the title from "File Limit Request: database-stein-watkins - 3000 MB" to "File Limit Request: database-stein-watkins-huge - 3000 MB" on May 22, 2022
@cmaureir
Member

cmaureir commented May 24, 2022

Hey @mkoeppe 👋
Having a 30 GB package sounds a little bit complicated. Have you explored other ways for people to acquire the data? For example, NLTK (https://www.nltk.org/data.html) provides an ad-hoc .download() method, so people could fetch the data from one of your servers.
Besides the file limit, you will also go over the Project Size Limit, which is 10 GB.

You mentioned that the data files are 'individually compressed with bzip2'. Could it be an option to split groups of those files into packages that have a similar topic? (just an idea)

edit: 3GB not 30GB!

@cmaureir added the "status: awaiting response" label (Needs more information before proceeding) on May 24, 2022
@mkoeppe
Author

mkoeppe commented May 24, 2022

Not 30GB, "only" 3GB

@mkoeppe
Author

mkoeppe commented May 24, 2022

Thanks for the pointer to NLTK. But what its documentation describes is exactly what I would like to avoid: Picking an installation location, configuring it using an environment variable, a discussion of "sudo" etc.
Instead I would like to use standard Python packaging and Python discovery (importlib.resources).
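A minimal sketch of the discovery approach mentioned above, assuming the data files are shipped inside an importable package. The helper name `read_compressed_resource` and any package or file names passed to it are hypothetical; only `importlib.resources.files` (Python 3.9+) and `bz2.open` are standard-library calls.

```python
import bz2
from importlib.resources import files


def read_compressed_resource(package: str, filename: str) -> bytes:
    """Return the decompressed contents of a bzip2 file shipped in `package`.

    `package` is the import name of an installed package that contains the
    data file; no install location or environment variable is involved.
    """
    resource = files(package).joinpath(filename)
    # Traversable.open("rb") yields a binary file object that bz2 can wrap.
    with resource.open("rb") as raw, bz2.open(raw, "rb") as data:
        return data.read()
```

With this pattern the package manager decides where the files live, and the consumer only needs the package's import name.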

@cmaureir
Member

Not 30GB, "only" 3GB

You are completely right, I was too scared when I saw 3 zeroes and I transformed them into 4 😅

@cmaureir
Member

Hey @mkoeppe 👋
Sorry for keeping you hanging for so long. After some attempts, I couldn't set 3G as a file limit for your project. We thought it was a problem with the system, some parsing issue, etc., but it turns out that we have a limitation in the database that only allows 1G uploads.

Do you think you could split this package into smaller packages? Maybe some files can be packaged together, and you can keep 'database-stein-watkins-huge' as a "meta package" that depends on the other projects you create from them. For users it would be transparent, because they would still only need to install the package from this issue.
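One way the split suggested above could be planned, as a rough sketch only: group the data files greedily so that each group stays under the upload ceiling, then have the meta package depend on one project per group. The `-partN` naming scheme, the helper names, and the 1,000,000,000-byte reading of "1G" are assumptions for illustration; the size of a built distribution also differs somewhat from the sum of its file sizes.

```python
from typing import Dict, List

# Assumption: "1G" from the thread read as 10**9 bytes.
LIMIT_BYTES = 1_000_000_000


def split_into_parts(sizes: Dict[str, int], limit: int = LIMIT_BYTES) -> List[List[str]]:
    """Greedy first-fit-decreasing grouping of files into parts of at most `limit` bytes."""
    parts: List[List[str]] = []
    totals: List[int] = []
    # Place the largest files first; each goes into the first part with room.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        for i, total in enumerate(totals):
            if total + size <= limit:
                parts[i].append(name)
                totals[i] += size
                break
        else:
            # No existing part has room, so start a new one.
            parts.append([name])
            totals.append(size)
    return parts


def meta_requirements(base: str, part_count: int) -> List[str]:
    """Hypothetical dependency list for the meta package, one entry per part."""
    return [f"{base}-part{i}" for i in range(1, part_count + 1)]
```

The resulting list would go into the meta package's dependency metadata, so installing the meta package pulls in every part.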

Let me know if I can give you more pointers in case something is not clear.

@mkoeppe
Author

mkoeppe commented Jun 27, 2022

Thanks for the update! I'll look into this approach.

@di
Member

di commented Oct 12, 2022

@mkoeppe Have you been able to work around this?

@pradyunsg
Contributor

Closing this out due to lack of a response.

4 participants