diff --git a/doc/source/merging.rst b/doc/source/merging.rst index cfd3f9e88e4ea..7b4cf7cb1bda9 100644 --- a/doc/source/merging.rst +++ b/doc/source/merging.rst @@ -31,10 +31,10 @@ operations. Concatenating objects --------------------- -The :func:`~pandas.concat` function (in the main pandas namespace) does all of -the heavy lifting of performing concatenation operations along an axis while -performing optional set logic (union or intersection) of the indexes (if any) on -the other axes. Note that I say "if any" because there is only a single possible +The :func:`~pandas.concat` function (in the main pandas namespace) does all of +the heavy lifting of performing concatenation operations along an axis while +performing optional set logic (union or intersection) of the indexes (if any) on +the other axes. Note that I say "if any" because there is only a single possible axis of concatenation for Series. Before diving into all of the details of ``concat`` and what it can do, here is @@ -109,9 +109,9 @@ some configurable handling of "what to do with the other axes": to the actual data concatenation. - ``copy`` : boolean, default True. If False, do not copy data unnecessarily. -Without a little bit of context many of these arguments don't make much sense. -Let's revisit the above example. Suppose we wanted to associate specific keys -with each of the pieces of the chopped up DataFrame. We can do this using the +Without a little bit of context many of these arguments don't make much sense. +Let's revisit the above example. Suppose we wanted to associate specific keys +with each of the pieces of the chopped up DataFrame. We can do this using the ``keys`` argument: .. ipython:: python @@ -138,9 +138,9 @@ It's not a stretch to see how this can be very useful. More detail on this functionality below. .. note:: - It is worth noting that :func:`~pandas.concat` (and therefore - :func:`~pandas.append`) makes a full copy of the data, and that constantly - reusing this function can create a significant performance hit. If you need + It is worth noting that :func:`~pandas.concat` (and therefore + :func:`~pandas.append`) makes a full copy of the data, and that constantly + reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension. :: @@ -153,7 +153,7 @@ Set logic on the other axes ~~~~~~~~~~~~~~~~~~~~~~~~~~~ When gluing together multiple DataFrames, you have a choice of how to handle -the other axes (other than the one being concatenated). This can be done in +the other axes (other than the one being concatenated). This can be done in the following three ways: - Take the (sorted) union of them all, ``join='outer'``. This is the default @@ -216,8 +216,8 @@ DataFrame: Concatenating using ``append`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A useful shortcut to :func:`~pandas.concat` are the :meth:`~DataFrame.append` -instance methods on ``Series`` and ``DataFrame``. These methods actually predated +A useful shortcut to :func:`~pandas.concat` are the :meth:`~DataFrame.append` +instance methods on ``Series`` and ``DataFrame``. These methods actually predated ``concat``. They concatenate along ``axis=0``, namely the index: .. ipython:: python @@ -263,8 +263,8 @@ need to be: .. note:: - Unlike the :py:meth:`~list.append` method, which appends to the original list - and returns ``None``, :meth:`~DataFrame.append` here **does not** modify + Unlike the :py:meth:`~list.append` method, which appends to the original list + and returns ``None``, :meth:`~DataFrame.append` here **does not** modify ``df1`` and returns its copy with ``df2`` appended. .. _merging.ignore_index: @@ -362,9 +362,9 @@ Passing ``ignore_index=True`` will drop all name references. More concatenating with group keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A fairly common use of the ``keys`` argument is to override the column names +A fairly common use of the ``keys`` argument is to override the column names when creating a new ``DataFrame`` based on existing ``Series``. -Notice how the default behaviour consists on letting the resulting ``DataFrame`` +Notice how the default behaviour consists on letting the resulting ``DataFrame`` inherit the parent ``Series``' name, when these existed. .. ipython:: python @@ -460,7 +460,7 @@ Appending rows to a DataFrame ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ While not especially efficient (since a new object must be created), you can -append a single row to a ``DataFrame`` by passing a ``Series`` or dict to +append a single row to a ``DataFrame`` by passing a ``Series`` or dict to ``append``, which returns a new ``DataFrame`` as above. .. ipython:: python @@ -505,7 +505,7 @@ pandas has full-featured, **high performance** in-memory join operations idiomatically very similar to relational databases like SQL. These methods perform significantly better (in some cases well over an order of magnitude better) than other open source implementations (like ``base::merge.data.frame`` -in R). The reason for this is careful algorithmic design and the internal layout +in R). The reason for this is careful algorithmic design and the internal layout of the data in ``DataFrame``. See the :ref:`cookbook` for some advanced strategies. @@ -513,7 +513,7 @@ See the :ref:`cookbook` for some advanced strategies. Users who are familiar with SQL but new to pandas might be interested in a :ref:`comparison with SQL`. -pandas provides a single function, :func:`~pandas.merge`, as the entry point for +pandas provides a single function, :func:`~pandas.merge`, as the entry point for all standard database join operations between ``DataFrame`` objects: :: @@ -582,7 +582,7 @@ and ``right`` is a subclass of DataFrame, the return type will still be ``DataFrame``. ``merge`` is a function in the pandas namespace, and it is also available as a -``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling +``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling ``DataFrame `` being implicitly considered the left object in the join. The related :meth:`~DataFrame.join` method, uses ``merge`` internally for the @@ -594,7 +594,7 @@ Brief primer on merge methods (relational algebra) Experienced users of relational databases like SQL will be familiar with the terminology used to describe join operations between two SQL-table like -structures (``DataFrame`` objects). There are several cases to consider which +structures (``DataFrame`` objects). There are several cases to consider which are very important to understand: - **one-to-one** joins: for example when joining two ``DataFrame`` objects on @@ -634,8 +634,8 @@ key combination: labels=['left', 'right'], vertical=False); plt.close('all'); -Here is a more complicated example with multiple join keys. Only the keys -appearing in ``left`` and ``right`` are present (the intersection), since +Here is a more complicated example with multiple join keys. Only the keys +appearing in ``left`` and ``right`` are present (the intersection), since ``how='inner'`` by default. .. ipython:: python @@ -751,13 +751,13 @@ Checking for duplicate keys .. versionadded:: 0.21.0 -Users can use the ``validate`` argument to automatically check whether there -are unexpected duplicates in their merge keys. Key uniqueness is checked before -merge operations and so should protect against memory overflows. Checking key -uniqueness is also a good way to ensure user data structures are as expected. +Users can use the ``validate`` argument to automatically check whether there +are unexpected duplicates in their merge keys. Key uniqueness is checked before +merge operations and so should protect against memory overflows. Checking key +uniqueness is also a good way to ensure user data structures are as expected. -In the following example, there are duplicate values of ``B`` in the right -``DataFrame``. As this is not a one-to-one merge -- as specified in the +In the following example, there are duplicate values of ``B`` in the right +``DataFrame``. As this is not a one-to-one merge -- as specified in the ``validate`` argument -- an exception will be raised. @@ -770,11 +770,11 @@ In the following example, there are duplicate values of ``B`` in the right In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one") ... - MergeError: Merge keys are not unique in right dataset; not a one-to-one merge + MergeError: Merge keys are not unique in right dataset; not a one-to-one merge -If the user is aware of the duplicates in the right ``DataFrame`` but wants to -ensure there are no duplicates in the left DataFrame, one can use the -``validate='one_to_many'`` argument instead, which will not raise an exception. +If the user is aware of the duplicates in the right ``DataFrame`` but wants to +ensure there are no duplicates in the left DataFrame, one can use the +``validate='one_to_many'`` argument instead, which will not raise an exception. .. ipython:: python @@ -786,8 +786,8 @@ ensure there are no duplicates in the left DataFrame, one can use the The merge indicator ~~~~~~~~~~~~~~~~~~~ -:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a -Categorical-type column called ``_merge`` will be added to the output object +:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a +Categorical-type column called ``_merge`` will be added to the output object that takes on values: =================================== ================ @@ -895,7 +895,7 @@ Joining on index ~~~~~~~~~~~~~~~~ :meth:`DataFrame.join` is a convenient method for combining the columns of two -potentially differently-indexed ``DataFrames`` into a single result +potentially differently-indexed ``DataFrames`` into a single result ``DataFrame``. Here is a very basic example: .. ipython:: python @@ -975,9 +975,9 @@ indexes: Joining key columns on an index ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column +:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column or multiple column names, which specifies that the passed ``DataFrame`` is to be -aligned on that column in the ``DataFrame``. These two function calls are +aligned on that column in the ``DataFrame``. These two function calls are completely equivalent: :: @@ -987,7 +987,7 @@ completely equivalent: how='left', sort=False) Obviously you can choose whichever form you find more convenient. For -many-to-one joins (where one of the ``DataFrame``'s is already indexed by the +many-to-one joins (where one of the ``DataFrame``'s is already indexed by the join key), using ``join`` may be more convenient. Here is a simple example: .. ipython:: python @@ -1125,20 +1125,25 @@ This is equivalent but less verbose and more memory efficient / faster than this Joining with two multi-indexes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This is not implemented via ``join`` at-the-moment, however it can be done using -the following code. +As of Pandas 0.23.1 the :func:`Dataframe.join` can be used to join multi-indexed ``Dataframe`` instances on the overlaping index levels .. ipython:: python - index = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), + index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), ('K1', 'X2')], names=['key', 'X']) left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, - index=index) + index=index_left) + + index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), + ('K2', 'Y2'), ('K2', 'Y3')], + names=['key', 'Y']) + right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], + 'D': ['D0', 'D1', 'D2', 'D3']}, + index=index_right) - result = pd.merge(left.reset_index(), right.reset_index(), - on=['key'], how='inner').set_index(['key','X','Y']) + left.join(right) .. ipython:: python :suppress: @@ -1148,6 +1153,13 @@ the following code. labels=['left', 'right'], vertical=False); plt.close('all'); +For earlier versions it can be done using the following. + +.. ipython:: python + + pd.merge(left.reset_index(), right.reset_index(), + on=['key'], how='inner').set_index(['key','X','Y']) + .. _merging.merge_on_columns_and_levels: Merging on a combination of columns and index levels @@ -1254,7 +1266,7 @@ similarly. Joining multiple DataFrame or Panel objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join` +A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join` to join them together on their indexes. .. ipython:: python @@ -1276,7 +1288,7 @@ Merging together values within Series or DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another fairly common situation is to have two like-indexed (or similarly -indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in +indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in one object from values for matching indices in the other. Here is an example: .. ipython:: python @@ -1301,7 +1313,7 @@ For this, use the :meth:`~DataFrame.combine_first` method: plt.close('all'); Note that this method only takes values from the right ``DataFrame`` if they are -missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`, +missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`, alters non-NA values inplace: .. ipython:: python @@ -1353,15 +1365,15 @@ Merging AsOf .. versionadded:: 0.19.0 -A :func:`merge_asof` is similar to an ordered left-join except that we match on -nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, -we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less +A :func:`merge_asof` is similar to an ordered left-join except that we match on +nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, +we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key. -Optionally an asof merge can perform a group-wise merge. This matches the +Optionally an asof merge can perform a group-wise merge. This matches the ``by`` key equally, in addition to the nearest match on the ``on`` key. -For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` +For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` merge them. .. ipython:: python @@ -1420,8 +1432,8 @@ We only asof within ``2ms`` between the quote time and the trade time. by='ticker', tolerance=pd.Timedelta('2ms')) -We only asof within ``10ms`` between the quote time and the trade time and we -exclude exact matches on time. Note that though we exclude the exact matches +We only asof within ``10ms`` between the quote time and the trade time and we +exclude exact matches on time. Note that though we exclude the exact matches (of the quotes), prior quotes **do** propagate to that point in time. .. ipython:: python diff --git a/doc/source/whatsnew/v0.23.0.txt b/doc/source/whatsnew/v0.23.0.txt index f686a042c1a74..6e94ce45fced1 100644 --- a/doc/source/whatsnew/v0.23.0.txt +++ b/doc/source/whatsnew/v0.23.0.txt @@ -59,6 +59,40 @@ The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtyp pd.get_dummies(df, columns=['c']).dtypes pd.get_dummies(df, columns=['c'], dtype=bool).dtypes +.. _whatsnew_0230.enhancements.join_with_two_multiindexes: + +Joining with two multi-indexes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As of Pandas 0.23.1 the :func:`Dataframe.join` can be used to join multi-indexed ``Dataframe`` instances on the overlaping index levels + +See the :ref:`Merge, join, and concatenate +` documentation section. + +.. ipython:: python + + index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), + ('K1', 'X2')], + names=['key', 'X']) + left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], + 'B': ['B0', 'B1', 'B2']}, + index=index_left) + + index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), + ('K2', 'Y2'), ('K2', 'Y3')], + names=['key', 'Y']) + right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], + 'D': ['D0', 'D1', 'D2', 'D3']}, + index=index_right) + + left.join(right) + +For earlier versions it can be done using the following. + +.. ipython:: python + + pd.merge(left.reset_index(), right.reset_index(), + on=['key'], how='inner').set_index(['key','X','Y']) .. _whatsnew_0230.enhancements.merge_on_columns_and_levels: diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py index e82b641db98fd..bdf298be1510c 100644 --- a/pandas/core/indexes/base.py +++ b/pandas/core/indexes/base.py @@ -3498,46 +3498,68 @@ def join(self, other, how='left', level=None, return_indexers=False, def _join_multi(self, other, how, return_indexers=True): from .multi import MultiIndex - self_is_mi = isinstance(self, MultiIndex) - other_is_mi = isinstance(other, MultiIndex) + from pandas.core.reshape.merge import _complete_multilevel_join # figure out join names - self_names = com._not_none(*self.names) - other_names = com._not_none(*other.names) + self_names = list(com._not_none(*self.names)) + other_names = list(com._not_none(*other.names)) overlap = list(set(self_names) & set(other_names)) - # need at least 1 in common, but not more than 1 + # need at least 1 in common if not len(overlap): - raise ValueError("cannot join with no level specified and no " - "overlapping names") - if len(overlap) > 1: - raise NotImplementedError("merging with more than one level " - "overlap on a multi-index is not " - "implemented") - jl = overlap[0] + raise ValueError("cannot join with no overlapping index names") + + self_is_mi = isinstance(self, MultiIndex) + other_is_mi = isinstance(other, MultiIndex) + + # Drop the non matching levels + ldrop_levels = list(set(self_names) - set(overlap)) + rdrop_levels = list(set(other_names) - set(overlap)) + + if self_is_mi and other_is_mi: + self_jnlevels = self.droplevel(ldrop_levels) + other_jnlevels = other.droplevel(rdrop_levels) + + if not (self_jnlevels.is_unique and other_jnlevels.is_unique): + raise ValueError("Join on level between two MultiIndex objects" + "is ambiguous") + + dropped_levels = ldrop_levels + rdrop_levels + + join_idx, lidx, ridx = self_jnlevels.join(other_jnlevels, how, + return_indexers=True) + + levels, labels, names = _complete_multilevel_join(self, other, how, + dropped_levels, + join_idx, + lidx, ridx) + + multi_join_idx = MultiIndex(levels=levels, labels=labels, + names=names, verify_integrity=False) + + # Check for unused levels + multi_join_idx = multi_join_idx.remove_unused_levels() + + return multi_join_idx, lidx, ridx + + jl = list(overlap)[0] # make the indices into mi's that match - if not (self_is_mi and other_is_mi): - - flip_order = False - if self_is_mi: - self, other = other, self - flip_order = True - # flip if join method is right or left - how = {'right': 'left', 'left': 'right'}.get(how, how) - - level = other.names.index(jl) - result = self._join_level(other, level, how=how, - return_indexers=return_indexers) - - if flip_order: - if isinstance(result, tuple): - return result[0], result[2], result[1] - return result + flip_order = False + if self_is_mi: + self, other = other, self + flip_order = True + # flip if join method is right or left + how = {'right': 'left', 'left': 'right'}.get(how, how) + + level = other.names.index(jl) + result = self._join_level(other, level, how=how, + return_indexers=return_indexers) - # 2 multi-indexes - raise NotImplementedError("merging with both multi-indexes is not " - "implemented") + if flip_order: + if isinstance(result, tuple): + return result[0], result[2], result[1] + return result def _join_non_unique(self, other, how='left', return_indexers=False): from pandas.core.reshape.merge import _get_join_indexers diff --git a/pandas/core/reshape/merge.py b/pandas/core/reshape/merge.py index 7b1a0875bba59..3b77e2a3f480f 100644 --- a/pandas/core/reshape/merge.py +++ b/pandas/core/reshape/merge.py @@ -1143,6 +1143,82 @@ def _get_join_indexers(left_keys, right_keys, sort=False, how='inner', return join_func(lkey, rkey, count, **kwargs) +def _complete_multilevel_join(left, right, how, dropped_levels, + join_idx, lidx, ridx): + """ + *this is an internal non-public method* + + Returns the levels, labels and names of a multilevel to multilevel join + Depending on the type of join, this method restores the appropriate + dropped levels of the joined multi-index. The method relies on lidx, ridx + which hold the index positions of left and right, where a join was feasible + + Parameters + ---------- + left : Index + left index + right : Index + right index + join_idx : Index + the index of the join between the common levels of left and right + how : {'left', 'right', 'outer', 'inner'} + lidx : intp array + left indexer + right : intp array + right indexer + dropped_levels : str array + list of non-common levels + + Returns + ------- + levels : intp array + levels of combined multiindexes + labels : str array + labels of combined multiindexes + names : str array + names of combined multiindexes + + """ + + join_levels = join_idx.levels + join_labels = join_idx.labels + join_names = join_idx.names + + # lidx and ridx hold the indexes where the join occured + # for left and right respectively. If left (right) is None it means that + # the join occured on all indices of left (right) + if lidx is None: + lidx = range(0, len(left)) + + if ridx is None: + ridx = range(0, len(right)) + + # Iterate through the levels that must be restored + for dl in dropped_levels: + if dl in left.names: + idx = left + indexer = lidx + else: + idx = right + indexer = ridx + + # The index of the level name to be restored + name_idx = idx.names.index(dl) + + restore_levels = idx.levels[name_idx].values + restore_labels = idx.labels[name_idx] + + join_levels = join_levels.__add__([restore_levels]) + join_names = join_names.__add__([dl]) + + # Inject -1 in the labels list where a join was not possible + # IOW indexer[i]=-1 + labels = [restore_labels[i] if i != -1 else -1 for i in indexer] + join_labels = join_labels.__add__([labels]) + + return join_levels, join_labels, join_names + + class _OrderedMerge(_MergeOperation): _merge_type = 'ordered_merge' diff --git a/pandas/tests/reshape/merge/test_merge.py b/pandas/tests/reshape/merge/test_merge.py index 5dca45c8dd8bb..f6bccc9be6dd0 100644 --- a/pandas/tests/reshape/merge/test_merge.py +++ b/pandas/tests/reshape/merge/test_merge.py @@ -1345,10 +1345,6 @@ def test_join_multi_levels2(self): .set_index(["household_id", "asset_id", "t"]) .reindex(columns=['share', 'log_return'])) - def f(): - household.join(log_return, how='inner') - pytest.raises(NotImplementedError, f) - # this is the equivalency result = (merge(household.reset_index(), log_return.reset_index(), on=['asset_id'], how='inner') @@ -1358,7 +1354,7 @@ def f(): expected = ( DataFrame(dict( household_id=[1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4], - asset_id=["nl0000301109", "nl0000289783", "gb00b03mlx29", + asset_id=["nl0000301109", "nl0000301109", "gb00b03mlx29", "gb00b03mlx29", "gb00b03mlx29", "gb00b03mlx29", "gb00b03mlx29", "gb00b03mlx29", "lu0197800237", "lu0197800237", @@ -1371,11 +1367,118 @@ def f(): .09604978, -.06524096, .03532373, .03025441, .036997, None, None] )) - .set_index(["household_id", "asset_id", "t"])) + .set_index(["household_id", "asset_id", "t"]) + .reindex(columns=['share', 'log_return'])) + + result = (merge(household.reset_index(), log_return.reset_index(), + on=['asset_id'], how='outer') + .set_index(['household_id', 'asset_id', 't'])) + + assert_frame_equal(result, expected) + + +@pytest.fixture +def left_multi(): + return ( + DataFrame( + dict(Origin=['A', 'A', 'B', 'B', 'C'], + Destination=['A', 'B', 'A', 'C', 'A'], + Period=['AM', 'PM', 'IP', 'AM', 'OP'], + TripPurp=['hbw', 'nhb', 'hbo', 'nhb', 'hbw'], + Trips=[1987, 3647, 2470, 4296, 4444]), + columns=['Origin', 'Destination', 'Period', + 'TripPurp', 'Trips']) + .set_index(['Origin', 'Destination', 'Period', 'TripPurp'])) + + +@pytest.fixture +def right_multi(): + return ( + DataFrame( + dict(Origin=['A', 'A', 'B', 'B', 'C', 'C', 'E'], + Destination=['A', 'B', 'A', 'B', 'A', 'B', 'F'], + Period=['AM', 'PM', 'IP', 'AM', 'OP', 'IP', 'AM'], + LinkType=['a', 'a', 'c', 'b', 'a', 'b', 'a'], + Distance=[100, 80, 90, 80, 75, 35, 55]), + columns=['Origin', 'Destination', 'Period', + 'LinkType', 'Distance']) + .set_index(['Origin', 'Destination', 'Period', 'LinkType'])) + + +@pytest.fixture +def on_cols(): + return ['Origin', 'Destination', 'Period'] + + +@pytest.fixture +def idx_cols(): + return ['Origin', 'Destination', 'Period', 'TripPurp', 'LinkType'] + + +class TestJoinMultiMulti(object): + + @pytest.mark.parametrize('how', ['left', 'right', 'inner', 'outer']) + def test_join_multi_multi(self, left_multi, right_multi, how, + on_cols, idx_cols): + # Multi-index join tests + expected = (pd.merge(left_multi.reset_index(), + right_multi.reset_index(), + how=how, on=on_cols).set_index(idx_cols) + .sort_index()) + + result = left_multi.join(right_multi, how=how).sort_index() + tm.assert_frame_equal(result, expected) + + """ + @pytest.mark.parametrize('how', ['left', 'right', 'inner', 'outer']) + def test_join_multi_multi_emptylevel(self, left_multi, right_multi, how, + on_cols, idx_cols): + # Join with empty level + num_lvls = len(right_multi.index.get_level_values('Period')) + # Set one level to None + right_multi.index.set_levels([np.nan] * num_lvls, level='Period', + inplace=True) + + expected = (pd.merge(left_multi.reset_index(), + right_multi.reset_index(), + how=how, on=on_cols).set_index(idx_cols) + .sort_index()) + + result = left_multi.join(right_multi, how=how).sort_index() + tm.assert_frame_equal(result, expected) + """ + + @pytest.mark.parametrize('how', ['left', 'right', 'inner', 'outer']) + def test_join_multi_empty_frames(self, left_multi, right_multi, how, + on_cols, idx_cols): + + left_multi = left_multi.drop(columns=left_multi.columns) + right_multi = right_multi.drop(columns=right_multi.columns) + + expected = (pd.merge(left_multi.reset_index(), + right_multi.reset_index(), + how=how, on=on_cols).set_index(idx_cols) + .sort_index()) + + result = left_multi.join(right_multi, how=how).sort_index() + tm.assert_frame_equal(result, expected) + + def test_join_multi_multi_nonunique(self, left_multi): + # Non-unique resulting index + right_multi = ( + DataFrame( + dict(Origin=[1, 1, 2], + Destination=[1, 1, 1], + Period=['AM', 'AM', 'PM'], + LinkType=['a', 'b', 'a'], + Distance=[100, 110, 120]), + columns=['Origin', 'Destination', 'Period', + 'LinkType', 'Distance']) + .set_index(['Origin', 'Destination', 'Period', 'LinkType'])) def f(): - household.join(log_return, how='outer') - pytest.raises(NotImplementedError, f) + left_multi.join(right_multi, how='left') + pytest.raises(ValueError, f) @pytest.mark.parametrize("klass", [None, np.asarray, Series, Index]) def test_merge_datetime_index(self, klass):