Skip to content

Commit

Permalink
Merge branch 'master' into eat
Browse files Browse the repository at this point in the history
  • Loading branch information
jreback committed Jul 2, 2018
2 parents f560ea1 + dc45fba commit 5fabd51
Show file tree
Hide file tree
Showing 25 changed files with 657 additions and 83 deletions.
59 changes: 53 additions & 6 deletions doc/source/extending.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ Extension Types

.. warning::

The :class:`pandas.api.extension.ExtensionDtype` and :class:`pandas.api.extension.ExtensionArray` APIs are new and
The :class:`pandas.api.extensions.ExtensionDtype` and :class:`pandas.api.extensions.ExtensionArray` APIs are new and
experimental. They may change between versions without warning.

Pandas defines an interface for implementing data types and arrays that *extend*
Expand All @@ -79,10 +79,10 @@ on :ref:`ecosystem.extensions`.

The interface consists of two classes.

:class:`~pandas.api.extension.ExtensionDtype`
:class:`~pandas.api.extensions.ExtensionDtype`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A :class:`pandas.api.extension.ExtensionDtype` is similar to a ``numpy.dtype`` object. It describes the
A :class:`pandas.api.extensions.ExtensionDtype` is similar to a ``numpy.dtype`` object. It describes the
data type. Implementors are responsible for a few unique items like the name.

One particularly important item is the ``type`` property. This should be the
Expand All @@ -99,9 +99,8 @@ example ``'category'`` is a registered string accessor for the ``CategoricalDtyp

See the `extension dtype dtypes`_ for more on how to register dtypes.


:class:`~pandas.api.extension.ExtensionArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~pandas.api.extensions.ExtensionArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This class provides all the array-like functionality. ExtensionArrays are
limited to 1 dimension. An ExtensionArray is linked to an ExtensionDtype via the
Expand All @@ -122,6 +121,54 @@ by some other storage type, like Python lists.
See the `extension array source`_ for the interface definition. The docstrings
and comments contain guidance for properly implementing the interface.

.. _extending.extension.operator:

:class:`~pandas.api.extensions.ExtensionArray` Operator Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 0.24.0

By default, there are no operators defined for the class :class:`~pandas.api.extensions.ExtensionArray`.
There are two approaches for providing operator support for your ExtensionArray:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ExtensionArray.

For the first approach, you define selected operators, e.g., ``__add__``, ``__le__``, etc. that
you want your ``ExtensionArray`` subclass to support.

The second approach assumes that the underlying elements (i.e., scalar type) of the ``ExtensionArray``
have the individual operators already defined. In other words, if your ``ExtensionArray``
named ``MyExtensionArray`` is implemented so that each element is an instance
of the class ``MyExtensionElement``, then if the operators are defined
for ``MyExtensionElement``, the second approach will automatically
define the operators for ``MyExtensionArray``.

A mixin class, :class:`~pandas.api.extensions.ExtensionScalarOpsMixin` supports this second
approach. If developing an ``ExtensionArray`` subclass, for example ``MyExtensionArray``,
can simply include ``ExtensionScalarOpsMixin`` as a parent class of ``MyExtensionArray``,
and then call the methods :meth:`~MyExtensionArray._add_arithmetic_ops` and/or
:meth:`~MyExtensionArray._add_comparison_ops` to hook the operators into
your ``MyExtensionArray`` class, as follows:

.. code-block:: python
class MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin):
pass
MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()
Note that since ``pandas`` automatically calls the underlying operator on each
element one-by-one, this might not be as performant as implementing your own
version of the associated operators directly on the ``ExtensionArray``.

.. _extending.extension.testing:

Testing Extension Arrays
^^^^^^^^^^^^^^^^^^^^^^^^

We provide a test suite for ensuring that your extension arrays satisfy the expected
behavior. To use the test suite, you must provide several pytest fixtures and inherit
from the base test class. The required fixtures are found in
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ Fixed Regressions
- Bug in both :meth:`DataFrame.first_valid_index` and :meth:`Series.first_valid_index` raised for a row index having duplicate values (:issue:`21441`)
- Fixed regression in unary negative operations with object dtype (:issue:`21380`)
- Bug in :meth:`Timestamp.ceil` and :meth:`Timestamp.floor` when timestamp is a multiple of the rounding frequency (:issue:`21262`)
- Fixed regression in :func:`to_clipboard` that defaulted to copying dataframes with space delimited instead of tab delimited (:issue:`21104`)

.. _whatsnew_0232.performance:

Expand Down
18 changes: 18 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,22 @@ New features

- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)

.. _whatsnew_0240.enhancements.extension_array_operators

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators. (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

.. _whatsnew_0240.enhancements.other:

Other Enhancements
Expand Down Expand Up @@ -118,6 +134,7 @@ Datetimelike API Changes

- For :class:`DatetimeIndex` and :class:`TimedeltaIndex` with non-``None`` ``freq`` attribute, addition or subtraction of integer-dtyped array or ``Index`` will return an object of the same class (:issue:`19959`)
- :class:`DateOffset` objects are now immutable. Attempting to alter one of these will now raise ``AttributeError`` (:issue:`21341`)
- :class:`PeriodIndex` subtraction of another ``PeriodIndex`` will now return an object-dtype :class:`Index` of :class:`DateOffset` objects instead of raising a ``TypeError`` (:issue:`20049`)

.. _whatsnew_0240.api.extension:

Expand All @@ -127,6 +144,7 @@ ExtensionType Changes
- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore
the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`)
- The ``ExtensionArray`` constructor, ``_from_sequence`` now take the keyword arg ``copy=False`` (:issue:`21185`)
- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`)
- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-
Expand Down
3 changes: 2 additions & 1 deletion pandas/api/extensions/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,6 @@
register_index_accessor,
register_series_accessor)
from pandas.core.algorithms import take # noqa
from pandas.core.arrays.base import ExtensionArray # noqa
from pandas.core.arrays.base import (ExtensionArray, # noqa
ExtensionScalarOpsMixin)
from pandas.core.dtypes.dtypes import ExtensionDtype # noqa
36 changes: 18 additions & 18 deletions pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,8 @@ def observed(request):
'__mul__', '__rmul__',
'__floordiv__', '__rfloordiv__',
'__truediv__', '__rtruediv__',
'__pow__', '__rpow__']
'__pow__', '__rpow__',
'__mod__', '__rmod__']
if not PY3:
_all_arithmetic_operators.extend(['__div__', '__rdiv__'])

Expand All @@ -102,6 +103,22 @@ def all_arithmetic_operators(request):
return request.param


@pytest.fixture(params=['__eq__', '__ne__', '__le__',
'__lt__', '__ge__', '__gt__'])
def all_compare_operators(request):
"""
Fixture for dunder names for common compare operations
* >=
* >
* ==
* !=
* <
* <=
"""
return request.param


@pytest.fixture(params=[None, 'gzip', 'bz2', 'zip',
pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression(request):
Expand Down Expand Up @@ -320,20 +337,3 @@ def mock():
return importlib.import_module("unittest.mock")
else:
return pytest.importorskip("mock")


@pytest.fixture(params=['__eq__', '__ne__', '__le__',
'__lt__', '__ge__', '__gt__'])
def all_compare_operators(request):
"""
Fixture for dunder names for common compare operations
* >=
* >
* ==
* !=
* <
* <=
"""

return request.param
3 changes: 2 additions & 1 deletion pandas/core/arrays/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
from .base import ExtensionArray # noqa
from .base import (ExtensionArray, # noqa
ExtensionScalarOpsMixin)
from .categorical import Categorical # noqa
127 changes: 127 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,13 @@
"""
import numpy as np

import operator

from pandas.errors import AbstractMethodError
from pandas.compat.numpy import function as nv
from pandas.compat import set_function_name, PY3
from pandas.core.dtypes.common import is_list_like
from pandas.core import ops

_not_implemented_message = "{} does not implement {}."

Expand Down Expand Up @@ -623,3 +628,125 @@ def _ndarray_values(self):
used for interacting with our indexers.
"""
return np.array(self)


class ExtensionOpsMixin(object):
"""
A base class for linking the operators to their dunder names
"""
@classmethod
def _add_arithmetic_ops(cls):
cls.__add__ = cls._create_arithmetic_method(operator.add)
cls.__radd__ = cls._create_arithmetic_method(ops.radd)
cls.__sub__ = cls._create_arithmetic_method(operator.sub)
cls.__rsub__ = cls._create_arithmetic_method(ops.rsub)
cls.__mul__ = cls._create_arithmetic_method(operator.mul)
cls.__rmul__ = cls._create_arithmetic_method(ops.rmul)
cls.__pow__ = cls._create_arithmetic_method(operator.pow)
cls.__rpow__ = cls._create_arithmetic_method(ops.rpow)
cls.__mod__ = cls._create_arithmetic_method(operator.mod)
cls.__rmod__ = cls._create_arithmetic_method(ops.rmod)
cls.__floordiv__ = cls._create_arithmetic_method(operator.floordiv)
cls.__rfloordiv__ = cls._create_arithmetic_method(ops.rfloordiv)
cls.__truediv__ = cls._create_arithmetic_method(operator.truediv)
cls.__rtruediv__ = cls._create_arithmetic_method(ops.rtruediv)
if not PY3:
cls.__div__ = cls._create_arithmetic_method(operator.div)
cls.__rdiv__ = cls._create_arithmetic_method(ops.rdiv)

cls.__divmod__ = cls._create_arithmetic_method(divmod)
cls.__rdivmod__ = cls._create_arithmetic_method(ops.rdivmod)

@classmethod
def _add_comparison_ops(cls):
cls.__eq__ = cls._create_comparison_method(operator.eq)
cls.__ne__ = cls._create_comparison_method(operator.ne)
cls.__lt__ = cls._create_comparison_method(operator.lt)
cls.__gt__ = cls._create_comparison_method(operator.gt)
cls.__le__ = cls._create_comparison_method(operator.le)
cls.__ge__ = cls._create_comparison_method(operator.ge)


class ExtensionScalarOpsMixin(ExtensionOpsMixin):
"""A mixin for defining the arithmetic and logical operations on
an ExtensionArray class, where it is assumed that the underlying objects
have the operators already defined.
Usage
------
If you have defined a subclass MyExtensionArray(ExtensionArray), then
use MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin) to
get the arithmetic operators. After the definition of MyExtensionArray,
insert the lines
MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()
to link the operators to your class.
"""

@classmethod
def _create_method(cls, op, coerce_to_dtype=True):
"""
A class method that returns a method that will correspond to an
operator for an ExtensionArray subclass, by dispatching to the
relevant operator defined on the individual elements of the
ExtensionArray.
Parameters
----------
op : function
An operator that takes arguments op(a, b)
coerce_to_dtype : bool
boolean indicating whether to attempt to convert
the result to the underlying ExtensionArray dtype
(default True)
Returns
-------
A method that can be bound to a method of a class
Example
-------
Given an ExtensionArray subclass called MyExtensionArray, use
>>> __add__ = cls._create_method(operator.add)
in the class definition of MyExtensionArray to create the operator
for addition, that will be based on the operator implementation
of the underlying elements of the ExtensionArray
"""

def _binop(self, other):
def convert_values(param):
if isinstance(param, ExtensionArray) or is_list_like(param):
ovalues = param
else: # Assume its an object
ovalues = [param] * len(self)
return ovalues
lvalues = self
rvalues = convert_values(other)

# If the operator is not defined for the underlying objects,
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]

if coerce_to_dtype:
try:
res = self._from_sequence(res)
except TypeError:
pass

return res

op_name = ops._get_op_name(op, True)
return set_function_name(_binop, op_name, cls)

@classmethod
def _create_arithmetic_method(cls, op):
return cls._create_method(op)

@classmethod
def _create_comparison_method(cls, op):
return cls._create_method(op, coerce_to_dtype=False)
10 changes: 7 additions & 3 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -2988,16 +2988,20 @@ def get_value(self, series, key):
# use this, e.g. DatetimeIndex
s = getattr(series, '_values', None)
if isinstance(s, (ExtensionArray, Index)) and is_scalar(key):
# GH 20825
# GH 20882, 21257
# Unify Index and ExtensionArray treatment
# First try to convert the key to a location
# If that fails, see if key is an integer, and
# If that fails, raise a KeyError if an integer
# index, otherwise, see if key is an integer, and
# try that
try:
iloc = self.get_loc(key)
return s[iloc]
except KeyError:
if is_integer(key):
if (len(self) > 0 and
self.inferred_type in ['integer', 'boolean']):
raise
elif is_integer(key):
return s[key]

s = com._values_from_object(series)
Expand Down
Loading

0 comments on commit 5fabd51

Please sign in to comment.