Dispatch pd.is_na for scalar extension value #27825

Open
TomAugspurger opened this issue Aug 8, 2019 · 6 comments
Labels
Enhancement; ExtensionArray (Extending pandas with custom dtypes or arrays); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@TomAugspurger
Contributor

Right now, I don't believe there's a way for an ExtensionDtype to declare a custom scalar NA value and have pd.isna(scalar) do the right thing.

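from typing import Type

import pandas as pd
from pandas.api.extensions import ExtensionDtype, register_extension_dtype

# Private sentinel: the only value NaSType accepts.
_nas = object()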


class NaSType(str):
    """
    NA for String type.
    """

    # TODO: enforce singleton

    def __new__(cls, value):
        if value is not _nas:
            raise ValueError("Cannot create NaS from '{}'".format(value))
        return super().__new__(cls, value)

    def __eq__(self, other):
        # TODO: array comparisons, etc.
        return False

    def __str__(self):
        return "NaS"

    def __repr__(self):
        return str(self)


NaS = NaSType(_nas)


@register_extension_dtype
class StringDtype(ExtensionDtype):

    @property
    def na_value(self):
        return NaS

    @property
    def type(self) -> Type:
        return str

    @property
    def name(self) -> str:
        return "string"

    @classmethod
    def construct_from_string(cls, string: str):
        if string in {"string", "str"}:
            return cls()
        return super().construct_from_string(string)

    @classmethod
    def construct_array_type(cls) -> "Type[StringArray]":
        # StringArray, the matching ExtensionArray, is not shown in this snippet.
        return StringArray

In [18]: NaS
Out[18]: NaS

In [19]: pd.isna(NaS)
Out[19]: False

That should be True. In https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/missing.py#L131-L132 we go straight to libmissing.checknull(obj) for scalar values, so the dtype's na_value is never consulted.
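
To make the requested dispatch concrete, here is a minimal pure-Python sketch of the idea. It is not how pandas works, and every name in it (`register_na_value`, `default_checknull`, `isna_scalar`) is hypothetical. `default_checknull` is only a stand-in for the built-in scalar check and knows about `None` and float NaN, nothing else.

```python
import math

# Hypothetical registry of scalar NA sentinels declared by extension dtypes.
_registered_na_values = []


def register_na_value(sentinel):
    """Record a sentinel object that should be treated as missing."""
    _registered_na_values.append(sentinel)
    return sentinel


def default_checknull(obj):
    """Stand-in for the built-in scalar check: None and float NaN only."""
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def isna_scalar(obj):
    """Check registered sentinels by identity, then fall back to the default."""
    if any(obj is sentinel for sentinel in _registered_na_values):
        return True
    return default_checknull(obj)


class NaSType(str):
    """Simplified stand-in for the NaS scalar from the issue."""

    def __repr__(self):
        return "NaS"


NaS = register_na_value(NaSType("NaS"))

print(isna_scalar(NaS))           # True: registered sentinel
print(isna_scalar("NaS"))         # False: an ordinary string, not the sentinel
print(isna_scalar(float("nan")))  # True: handled by the default check
```

The sentinel is matched by identity rather than equality because `NaSType.__eq__` in the issue always returns False.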

TomAugspurger added the ExtensionArray and Missing-data labels on Aug 8, 2019

@jbrockmendel
Member

One workaround would be to add an isna method to NaSType that always returns True.
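
A rough sketch of that workaround, using a simplified NaSType. Since pd.isna goes straight to checknull for scalars, as described above, it would not consult this method on its own. Callers would have to call `NaS.isna()` directly or go through a helper. The helper `scalar_isna` below is hypothetical and only illustrates the duck-typed check.

```python
class NaSType(str):
    """Simplified NaS scalar that can report its own missingness."""

    def __new__(cls):
        return super().__new__(cls, "NaS")

    def __repr__(self):
        return "NaS"

    def isna(self):
        # The NA scalar is always missing.
        return True


NaS = NaSType()


def scalar_isna(obj):
    """Hypothetical helper: prefer the object's own isna(), else check None."""
    method = getattr(obj, "isna", None)
    if callable(method):
        return bool(method())
    return obj is None


print(NaS.isna())            # True
print(scalar_isna(NaS))      # True
print(scalar_isna("hello"))  # False: plain str has no isna method
```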

@jorisvandenbossche
Member

Or: don't use such a custom NaS, but a default NA provided by pandas?

@TomAugspurger
Contributor Author

TomAugspurger commented Aug 8, 2019 via email

@jorisvandenbossche
Member

I have been pondering a roadmap discussion post about that for the last week, so I suppose I should try to write it :)

@TomAugspurger
Contributor Author

It'd be good to do that before the meeting next week :)

@jorisvandenbossche
Member

I couldn't finish it yet, but a very rough sketch can already be found here: https://hackmd.io/gPDsgTsKRlyHfoCRR1i_ng (in case you want to start thinking about it already)
