-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add {numpy, scipy}.ufuncs to the trusted list #224
Comments
I would also assume that those ufuncs are safe, e.g. they operate on a very specific set of inputs and should have no side effects, but I'm not sure how to verify this without actually going through the code. Just a quick check I did: >>> npufuncs = [getattr(np, a) for a in dir(np) if isinstance(getattr(np, a), np.ufunc)]
>>> len(npufuncs)
91
>>> spufuncs = [getattr(scipy.special, a) for a in dir(scipy.special) if isinstance(getattr(scipy.special, a), np.ufunc)]
>>> len(spufuncs)
231 Not sure if that's all. |
@rgommers would you know if we can consider all these ufuncs safe in this context, and if @BenjaminBossan 's snippet captures all of them? Context: we're created an alternative to pickle persistence format, where users would have to pass what they trust for a file to load, and right now none of the ufuncs are trusted, which are used inside sklearn estimators. We're trying to reduce the noise on the user side by trusting certain funcs/objects by default. |
Ufuncs should be stateless, so safe to obtain from "they operate on a very specific set of inputs" isn't quite true, at least in terms of Python types, given that they accept anything that can be coerced to a The list isn't quite complete for |
Hi @adrinjalali ! I'm an undergrad from India. I was exploring the repository and I find it interesting. I want to contribute to the project. Can I take up this issue? |
@anamika-yadav99 yes of course :) You can have a look at #237 for a similar pull request. |
@anamika-yadav99 Do you still plan to work on this? |
Hi @BenjaminBossan ! Apologies, give me this weekend. I'll raise a PR. |
@anamika-yadav99 a friendly ping to ask if you still plan on working on this :) |
Hi, I would be happy to contribute, if @BenjaminBossan and @anamika-yadav99 would give the 🟢 light! |
@omar-araboghli Since we haven't heard back in a while from Anamika, I think it's safe to say that you can take this task. |
Hi @BenjaminBossan and @adrinjalali, I have done some experiments so far to make sure we're including all Please check the following snippet and output and give your advice, whether it's safe to include the inspected ones. The traversal is inspired from def all_typed_instances(root_module: ModuleType, base_type: Type, modules_to_ignore: list[str] = None) -> list[str]:
if not root_module or not base_type:
return []
_MODULES_TO_IGNORE = [
"tests",
"setup",
"conftest"
]
_SUBSTRINGS_TO_IGNORE = [
".__",
"test"
]
if modules_to_ignore is None:
modules_to_ignore = _MODULES_TO_IGNORE
else:
modules_to_ignore += _MODULES_TO_IGNORE
typed_instances = []
for _, module_name, _ in pkgutil.walk_packages(root_module.__path__, prefix=f"{root_module.__name__}."):
module_parts = module_name.split('.')
if (
any(part in modules_to_ignore for part in module_parts)
or any(substring in module_name for substring in _SUBSTRINGS_TO_IGNORE)
):
continue
try:
module = importlib.import_module(module_name)
except Exception:
continue
typed_members = [
get_type_name(member[1]) for member in inspect.getmembers(module)
if isinstance(member[1], base_type)
and get_type_name(member[1]).startswith(root_module.__name__)
]
typed_instances.extend(typed_members)
return sorted(set(typed_instances))
----------------------------------------------------
numpy_v1 = sorted(set([
get_type_name(getattr(np, attr))
for attr in dir(np)
if isinstance(getattr(np, attr), np.ufunc)
and get_type_name(getattr(np, attr)).startswith("numpy")
]))
numpy_v2 = all_typed_instances(np, np.ufunc)
scipy_v1 = sorted(set([
get_type_name(getattr(scipy.special, attr))
for attr in dir(scipy.special)
if isinstance(getattr(scipy.special, attr), np.ufunc)
and get_type_name(getattr(scipy.special, attr)).startswith("scipy")
]))
scipy_v2 = all_typed_instances(scipy, np.ufunc)
----------------------------------------------------
>>> set(numpy_v1).difference(numpy_v2)
set()
>>> set(numpy_v2).difference(numpy_v1)
{'numpy.linalg._umath_linalg.lstsq_n', 'numpy.linalg._umath_linalg.eigvalsh_lo', 'numpy.core._multiarray_umath.clip', 'numpy.linalg._umath_linalg.inv', 'numpy.linalg._umath_linalg.svd_m', 'numpy.core._multiarray_umath._ones_like', 'numpy.linalg._umath_linalg.svd_n', 'numpy.linalg._umath_linalg.svd_m_s', 'numpy.linalg._umath_linalg.eig', 'numpy.linalg._umath_linalg.svd_n_f', 'numpy.linalg._umath_linalg.eigvals', 'numpy.linalg._umath_linalg.solve', 'numpy.linalg._umath_linalg.qr_complete', 'numpy.linalg._umath_linalg.eigh_lo', 'numpy.linalg._umath_linalg.qr_r_raw_n', 'numpy.linalg._umath_linalg.solve1', 'numpy.linalg._umath_linalg.eigvalsh_up', 'numpy.linalg._umath_linalg.slogdet', 'numpy.linalg._umath_linalg.qr_reduced', 'numpy.linalg._umath_linalg.svd_m_f', 'numpy.linalg._umath_linalg.eigh_up', 'numpy.linalg._umath_linalg.qr_r_raw_m', 'numpy.linalg._umath_linalg.svd_n_s', 'numpy.core._multiarray_umath._arg', 'numpy.linalg._umath_linalg.lstsq_m', 'numpy.linalg._umath_linalg.det', 'numpy.linalg._umath_linalg.cholesky_lo'}
>>> len(numpy_v1)
86
>>> len(numpy_v2)
113
----------------------------------------------------
>>> set(scipy_v1).difference(scipy_v2)
set()
>>> set(scipy_v2).difference(scipy_v1)
{'scipy.stats.invgauss_ufunc._invgauss_cdf', 'scipy.special._ufuncs._riemann_zeta', 'scipy.special._ufuncs._smirnovc', 'scipy.stats.ncx2_ufunc._ncx2_kurtosis_excess', 'scipy.stats.skewnorm_ufunc._skewnorm_ppf', 'scipy.special._ufuncs._cospi', 'scipy.stats.nct_ufunc._nct_mean', 'scipy.stats.ncx2_ufunc._ncx2_isf', 'scipy.stats.skewnorm_ufunc._skewnorm_sf', 'scipy.special._ufuncs._zeta', 'scipy.stats.beta_ufunc._beta_isf', 'scipy.stats.binom_ufunc._binom_cdf', 'scipy.stats.nct_ufunc._nct_cdf', 'scipy.stats.nbinom_ufunc._nbinom_sf', 'scipy.stats.hypergeom_ufunc._hypergeom_cdf', 'scipy.special._ufuncs._spherical_kn_d', 'scipy.stats.ncx2_ufunc._ncx2_variance', 'scipy.special._ufuncs._struve_power_series', 'scipy.stats.hypergeom_ufunc._hypergeom_isf', 'scipy.stats.nbinom_ufunc._nbinom_mean', 'scipy.special._ufuncs._log1pmx', 'scipy.stats.binom_ufunc._binom_sf', 'scipy.special._ufuncs._ellip_harm', 'scipy.special._ufuncs._spherical_yn', 'scipy.stats.invgauss_ufunc._invgauss_mean', 'scipy.stats.invgauss_ufunc._invgauss_ppf', 'scipy.stats.hypergeom_ufunc._hypergeom_variance', 'scipy.stats.skewnorm_ufunc._skewnorm_variance', 'scipy.stats.nct_ufunc._nct_sf', 'scipy.special._ufuncs._cosine_cdf', 'scipy.stats.ncx2_ufunc._ncx2_sf', 'scipy.special._ufuncs._smirnovp', 'scipy.stats.ncx2_ufunc._ncx2_ppf', 'scipy.stats.binom_ufunc._binom_kurtosis_excess', 'scipy.stats.skewnorm_ufunc._skewnorm_kurtosis_excess', 'scipy.stats.hypergeom_ufunc._hypergeom_skewness', 'scipy.stats.ncf_ufunc._ncf_sf', 'scipy.stats.nct_ufunc._nct_isf', 'scipy.special._ufuncs._kolmogc', 'scipy.stats.skewnorm_ufunc._skewnorm_isf', 'scipy.stats.ncx2_ufunc._ncx2_cdf', 'scipy.stats.hypergeom_ufunc._hypergeom_kurtosis_excess', 'scipy.stats.nbinom_ufunc._nbinom_kurtosis_excess', 'scipy.special._ufuncs._spherical_jn_d', 'scipy.stats.binom_ufunc._binom_skewness', 'scipy.stats.ncx2_ufunc._ncx2_mean', 'scipy.stats.nct_ufunc._nct_skewness', 'scipy.stats.beta_ufunc._beta_variance', 'scipy.special._ufuncs._lgam1p', 'scipy.stats.binom_ufunc._binom_isf', 'scipy.special._ufuncs._spherical_in', 'scipy.stats.beta_ufunc._beta_sf', 'scipy.stats.nbinom_ufunc._nbinom_isf', 'scipy.stats.ncf_ufunc._ncf_pdf', 'scipy.stats.hypergeom_ufunc._hypergeom_ppf', 'scipy.special._ufuncs._lambertw', 'scipy.special._ufuncs._igam_fac', 'scipy.stats.invgauss_ufunc._invgauss_isf', 'scipy.special._ufuncs._smirnovci', 'scipy.stats.beta_ufunc._beta_pdf', 'scipy.stats.ncf_ufunc._ncf_cdf', 'scipy.stats.skewnorm_ufunc._skewnorm_skewness', 'scipy.stats.nct_ufunc._nct_kurtosis_excess', 'scipy.special._ufuncs._spherical_kn', 'scipy.stats.skewnorm_ufunc._skewnorm_pdf', 'scipy.stats.nbinom_ufunc._nbinom_skewness', 'scipy.stats.invgauss_ufunc._invgauss_variance', 'scipy.stats.beta_ufunc._beta_ppf', 'scipy.stats.skewnorm_ufunc._skewnorm_mean', 'scipy.special._ufuncs._spherical_yn_d', 'scipy.stats.invgauss_ufunc._invgauss_skewness', 'scipy.stats.ncf_ufunc._ncf_kurtosis_excess', 'scipy.stats.invgauss_ufunc._invgauss_kurtosis_excess', 'scipy.stats.invgauss_ufunc._invgauss_pdf', 'scipy.special._ufuncs._lanczos_sum_expg_scaled', 'scipy.special._ufuncs._cosine_invcdf', 'scipy.stats.nct_ufunc._nct_variance', 'scipy.stats.nbinom_ufunc._nbinom_pdf', 'scipy.stats.binom_ufunc._binom_variance', 'scipy.special._ufuncs._sf_error_test_function', 'scipy.stats.ncf_ufunc._ncf_skewness', 'scipy.stats.ncx2_ufunc._ncx2_skewness', 'scipy.stats.nct_ufunc._nct_pdf', 'scipy.stats.nbinom_ufunc._nbinom_cdf', 'scipy.special._ufuncs._spherical_jn', 'scipy.stats.binom_ufunc._binom_ppf', 'scipy.special._ufuncs._factorial', 'scipy.stats.ncx2_ufunc._ncx2_pdf', 'scipy.stats.ncf_ufunc._ncf_mean', 'scipy.stats.beta_ufunc._beta_skewness', 'scipy.stats.invgauss_ufunc._invgauss_sf', 'scipy.special._ufuncs._spherical_in_d', 'scipy.stats.beta_ufunc._beta_cdf', 'scipy.special._ufuncs._kolmogp', 'scipy.stats.hypergeom_ufunc._hypergeom_sf', 'scipy.stats.hypergeom_ufunc._hypergeom_pdf', 'scipy.special._ufuncs._struve_bessel_series', 'scipy.special._ufuncs._kolmogci', 'scipy.stats.ncf_ufunc._ncf_variance', 'scipy.stats.nbinom_ufunc._nbinom_variance', 'scipy.stats.binom_ufunc._binom_pdf', 'scipy.stats.nct_ufunc._nct_ppf', 'scipy.stats.binom_ufunc._binom_mean', 'scipy.stats.beta_ufunc._beta_mean', 'scipy.stats.hypergeom_ufunc._hypergeom_mean', 'scipy.stats.skewnorm_ufunc._skewnorm_cdf', 'scipy.stats.ncf_ufunc._ncf_ppf', 'scipy.special._ufuncs._struve_asymp_large_z', 'scipy.special._ufuncs._sinpi', 'scipy.stats.beta_ufunc._beta_kurtosis_excess', 'scipy.stats.ncf_ufunc._ncf_isf', 'scipy.stats.nbinom_ufunc._nbinom_ppf'}
>>> len(scipy_v1)
230
>>> len(scipy_v2)
342 |
Thanks a lot @omar-araboghli for looking into this. I tried to run your snippet locally but there were a lot of missing imports. Could you be so kind as to provide a code block that works as is? Regarding the results, I think we're collecting too many functions here. To check, I just picked the first and last result from the list of differences you provided for numpy and scipy:
The problem with adding those is that given their "private" nature, they are much more likely to change in a way that creates maintenance burden for us (being removed, renamed, moved to a different module) while being extremely unlikely to be used by any user directly. And I would assume that if a user wants to use those private functions, they are very knowledgeable, so they should not have a problem adding them to the Therefore, I would rather be more conservative and only add "public" functions. Do you want to investigate how to filter out the "private" functions? |
Hi @BenjaminBossan and thank you for the quick review! I just commited these quick changes on my fork here and you can also quickly check them in this notebook, but with different numpy/scipy versions indeed. Thanks for the clarification about the results. I was and still am a bit sceptical about the private functions, since @rgommers mentioned As a generic approach, would we settle for getting everything possible except The filtering can be done with _MODULES_TO_IGNORE = [
"tests",
"setup",
"conftest"
]
_SUBSTRINGS_TO_IGNORE = [
".__",
"test"
] |
Note that in the most recent version of SciPy, we consistently use underscores for private things. There's some non-underscored shims for backwards compat, but those all raise warnings and are not present in |
@omar-araboghli Thx for providing more context and for the draft PR, now I understand better what's going on. I think the suggestion by Ralf of using I hope we can use the same logic for numpy, but if it's easier to start with scipy and figure out numpy later, we can just have two separate PRs. |
Unfortunately not yet, the NumPy public/private split is a huge mess. I plan to fix that for NumPy 2.0 |
Thanks for the update. @omar-araboghli so how about starting with scipy first and doing numpy after that? |
Thanks for the discussions. I will take that up tomorrow or lately by the weekend and raise questions, if any. |
…nd embedding the test within estimators_fitted.
Sorry, this should not have been closed, since numpy ufuncs are still not trusted by default. Numpy dtypes could also be added to the types trusted by default. |
Thought a separate issue will be created. I will work on this soon-ish. Thanks for pinging! |
…ypes has the same type.
I think most (if not all) ufuncs are actually safe, so we can add them to the default trusted list and reduce the size of
untrusted_types
.The text was updated successfully, but these errors were encountered: