Update to Pandas v3.0 #3737

Open
zaneselvans opened this issue Jul 25, 2024 · 0 comments
Labels
data-types Dtype conversions, standardization and implications of data types dependencies Pull requests that update a dependency file performance Make PUDL run faster!

zaneselvans commented Jul 25, 2024

Pandas v3.0 should come out during the summer of 2024.

  • The biggest change is deeper integration of Arrow. That should give us much better support for rich dtypes, and potentially better performance if we move toward using Arrow as our Pandas backend. It would also dovetail nicely with our use of PyArrow to write Parquet files.
  • It's a major release and we use Pandas a ton, so we should expect breakage. We have a lot of warnings right now, and there are deprecations that we'll need to address.
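
One way to take stock of the deprecations mentioned above is to run a piece of code with warnings recorded and then tally the `FutureWarning` and `DeprecationWarning` entries. The following is a minimal sketch using only the standard library `warnings` module. The helper names `collect_deprecations` and `summarize` are hypothetical, and `legacy_transform` is a made-up stand-in for a transform that triggers a Pandas deprecation, not actual PUDL or Pandas code.

```python
import warnings
from collections import Counter


def collect_deprecations(func, *args, **kwargs):
    """Run func and return (result, caught) for deprecation-type warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats from the same location.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    relevant = [
        w for w in caught
        if issubclass(w.category, (DeprecationWarning, FutureWarning))
    ]
    return result, relevant


def summarize(caught):
    """Count captured warnings by (category name, message)."""
    return Counter((w.category.__name__, str(w.message)) for w in caught)


# Hypothetical stand-in for a transform that emits a deprecation warning.
def legacy_transform(values):
    warnings.warn(
        "hypothetical: this behavior will change in a future version",
        FutureWarning,
        stacklevel=2,
    )
    return [v * 2 for v in values]


result, caught = collect_deprecations(legacy_transform, [1, 2, 3])
summary = summarize(caught)
for (category, message), count in summary.items():
    print(f"{count}x {category}: {message}")
```

Once the tally is under control, a stricter option is to turn these warnings into errors during test runs, for example with Python's `-W error::FutureWarning` filter, so that new deprecations fail loudly instead of accumulating.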

zaneselvans added the dependencies, performance, and data-types labels on Jul 25, 2024