Thunder's Series.subset() method relies on PySpark's rdd.takeSample(). Due to a recent patch to NumPy (numpy/numpy@6b1a120), takeSample is broken on NumPy 1.9 installations because it generates random seeds that frequently exceed the maximum allowed seed of 2 ** 32 - 1 (4294967295).
As a result, the following example code:
data = tsc.makeExample('pca')
data.subset(10)
will almost always produce:
ValueError: Seed must be between 0 and 4294967295
The underlying issue needs to be fixed in PySpark, but for now we can avoid the problem by explicitly specifying a seed in the correct range.
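As a rough sketch of that workaround, the snippet below draws a seed that is guaranteed to fall inside the range named in the error message. The names `MAX_SEED` and `bounded_seed` are hypothetical helpers invented for illustration, not part of Thunder or PySpark. The commented-out call assumes an existing RDD named `rdd` and the `takeSample(withReplacement, num, seed)` argument order.

```python
import random

# Upper bound taken from the error message: 2 ** 32 - 1 == 4294967295.
MAX_SEED = 2 ** 32 - 1


def bounded_seed(rng=random):
    """Return a random integer seed in the inclusive range [0, MAX_SEED].

    Hypothetical helper; `rng` can be the `random` module or a
    `random.Random` instance for reproducible behaviour.
    """
    return rng.randint(0, MAX_SEED)


seed = bounded_seed()

# Assumed usage with an existing PySpark RDD named `rdd`:
# samples = rdd.takeSample(False, 10, seed)
```

Passing the seed explicitly means the caller controls its range rather than relying on the seed that takeSample generates by default.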