DEPR: Disallow dtype inference when setting Index into DataFrame #56102

Merged: 3 commits, Dec 9, 2023

@phofl phofl commented Nov 21, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jbrockmendel I think we chatted about this. Series keeps the dtype, so this is mostly for consistency and to make our lives a little easier

@phofl phofl requested a review from jbrockmendel November 21, 2023 22:00
@phofl phofl added the Deprecate Functionality to remove in pandas label Nov 21, 2023
@mroeschke mroeschke added this to the 2.2 milestone Dec 9, 2023
@mroeschke mroeschke merged commit ee6a062 into pandas-dev:main Dec 9, 2023
@mroeschke
Member

Thanks @phofl

@phofl phofl deleted the deprecate_setitem_coercing branch December 9, 2023 19:38
# TODO: Remove kludge in sanitize_array for string mode when enforcing
# this deprecation
warnings.warn(
"Setting an Index with object dtype into a DataFrame will no longer "
Member

Is "will no longer" clear enough that this refers to a future version/deprecation?

Member Author

Good point, I'll put up a PR to clarify.
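The wording concern in this thread can be checked mechanically. Below is a minimal sketch using only the standard-library `warnings` module. The message text and the helper names `warn_inference_deprecated` and `collect_warnings` are illustrative assumptions, not taken from the PR, whose actual message is truncated on this page.

```python
import warnings


def warn_inference_deprecated() -> None:
    # Hypothetical wording: it states explicitly that the behavior changes
    # in a future version, which is the ambiguity the reviewer points at.
    warnings.warn(
        "Inferring another dtype here is deprecated and will stop "
        "in a future version (illustrative message).",
        FutureWarning,
        stacklevel=2,
    )


def collect_warnings(func):
    """Call func and return the list of warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        func()
    return caught


caught = collect_warnings(warn_inference_deprecated)
print(caught[0].category.__name__, "-", caught[0].message)
```

A test along these lines could assert both the warning category and that the message mentions a future version, so the wording cannot silently regress.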

return sanitize_array(value, self.index, copy=True, allow_2d=True), None
arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
if (
isinstance(value, Index)
Member

Might it be more performant to do the Index check before calling sanitize_array?

Member Author

Why? We have to sanitise anyway?

Member

If you know you have an Index (and no dtype), sanitize_array boils down to:

if len(value) != len(index): raise ...
return value._values

So much of sanitize_array may be unnecessary in this case.

Member Author

That's what I want in the future, but at the moment we go through maybe_infer_to_datetimelike, which changes the dtype, so we can't shortcut this until we enforce the deprecation.
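To make the ordering argument in this thread concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not pandas internals: `ToyIndex`, `toy_infer`, and both `set_index_*` functions are hypothetical stand-ins, and dtypes are represented as plain strings. It shows why the inference step has to run while the deprecation is active (the warning depends on whether inference changed the dtype), and what the shortcut could look like once the deprecation is enforced.

```python
import warnings
from datetime import datetime


class ToyIndex:
    """Hypothetical stand-in for an Index: values plus a dtype label."""

    def __init__(self, values, dtype="object"):
        self._values = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._values)


def toy_infer(values):
    """Stand-in for inference: all-datetime values become 'datetime64'."""
    if values and all(isinstance(v, datetime) for v in values):
        return "datetime64"
    return "object"


def set_index_during_deprecation(value, n_rows):
    """Deprecation period: infer first, warn if the dtype changed."""
    if len(value) != n_rows:
        raise ValueError("length mismatch")
    inferred = toy_infer(value._values)
    if value.dtype == "object" and inferred != "object":
        # Toy message; the real wording is not reproduced here.
        warnings.warn("dtype inference is deprecated", FutureWarning,
                      stacklevel=2)
    return value._values, inferred


def set_index_after_enforcement(value, n_rows):
    """After enforcement: length check, then keep the dtype as-is."""
    if len(value) != n_rows:
        raise ValueError("length mismatch")
    return value._values, value.dtype


idx = ToyIndex([datetime(2023, 12, 9)])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _, old_dtype = set_index_during_deprecation(idx, 1)
_, new_dtype = set_index_after_enforcement(idx, 1)
print(old_dtype, new_dtype, len(caught))  # datetime64 object 1
```

In this model the shortcut cannot be used during the deprecation period, because skipping the inference step would also skip the only place where the changed dtype can be detected and warned about.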
