API: Add NDFrame property to disallow duplicates #27108
Comments
Discussion we had about this: should this "property" live on the Index, or on the DataFrame/Series object?
Yeah, my example kinda exposes that difficulty, as the new index in
I played with this a bit today. For a user-API, I think it makes the most sense to have
>>> pd.Series(data, index, allow_duplicate_labels=False)
rather than
>>> pd.Series(data, pd.Index(..., allow_duplicate_labels=False))
While the duplicate detection is done in the Index, it seems more ergonomic to have it on NDFrame. A potential downside is that you can't disallow duplicates in the columns while allowing duplicates in the index. If we really wanted that, we can support setting
For an implementation, it seems like
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(index=['a', 'A'], allow_duplicate_labels=False)
In [3]: df
Out[3]:
Empty DataFrame
Columns: []
Index: [a, A]
In [4]: df.rename(str.upper)
---------------------------------------------------------------------------
DuplicateLabelError Traceback (most recent call last)
<ipython-input-4-17c8fb0b7c7f> in <module>
----> 1 df.rename(str.upper)
~/sandbox/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
233 @wraps(func)
234 def wrapper(*args, **kwargs) -> Callable[..., Any]:
--> 235 return func(*args, **kwargs)
236
237 kind = inspect.Parameter.POSITIONAL_OR_KEYWORD
...
~/sandbox/pandas/pandas/core/indexes/base.py in _maybe_check_unique(self)
562 # TODO: position, value, not too large.
563 msg = "Index has duplicates."
--> 564 raise DuplicateLabelError(msg)
565
566 # --------------------------------------------------------------------
DuplicateLabelError: Index has duplicates.
Likewise for
The changes are at https://github.com/pandas-dev/pandas/compare/master...TomAugspurger:unique-index?expand=1, but all I've needed so far is
In preparation for pandas-dev#27108 (disallowing duplicates), we need to enhance our metadata propagation. *We need a way for a particular attribute to determine how it's propagated for a particular method*. Our current method of metadata propagation lacks two features:
1. It only copies an attribute from a source NDFrame to a new NDFrame. There is no way to propagate metadata from a collection of NDFrames (say from `pd.concat`) to a new NDFrame.
2. It only and always copies the attribute. This is not always appropriate when dealing with a collection of input NDFrames, as the source attributes may differ. The resolution of conflicts will differ by attribute (for `Series.name` we might throw away the name; for `Series.allow_duplicates`, any Series disallowing duplicates should mean the output disallows duplicates).
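To make the per-attribute resolution idea concrete, here is a minimal pure-Python sketch. It does not use pandas, and every name in it (`resolve_name`, `resolve_allow_duplicates`, `RESOLVERS`, `propagate`) is hypothetical; it only illustrates how each attribute could carry its own rule for combining values from several inputs, not how pandas implements propagation.

```python
from typing import Any, Callable, Dict, List


def resolve_name(values: List[Any]) -> Any:
    """Keep the name only if every input agrees; otherwise throw it away."""
    unique = set(values)
    return unique.pop() if len(unique) == 1 else None


def resolve_allow_duplicates(values: List[bool]) -> bool:
    """The output allows duplicates only if every input allows them."""
    return all(values)


# Hypothetical registry: attribute name -> rule for combining input values.
RESOLVERS: Dict[str, Callable[[List[Any]], Any]] = {
    "name": resolve_name,
    "allow_duplicates": resolve_allow_duplicates,
}


def propagate(sources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine the metadata of several inputs into metadata for one output."""
    return {
        attr: resolver([src[attr] for src in sources])
        for attr, resolver in RESOLVERS.items()
    }


a = {"name": "x", "allow_duplicates": True}
b = {"name": "y", "allow_duplicates": False}
print(propagate([a, b]))
```

With the two inputs above, the conflicting names are dropped and the stricter duplicate setting wins, which matches the two resolution rules described in the list.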
This was closed by #28394.
edit: see #27108 (comment)
I'd like to be able to have an index, and ensure that no operation introduces duplicates.
From here, any pandas operation that introduces duplicates (e.g. s.loc[['a', 'a']]) would raise, rather than return an Index with two values.
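The requested behaviour can be sketched without pandas. The snippet below is an illustration under stated assumptions, not the pandas implementation: `DuplicateLabelError` is defined locally as a stand-in for the error in the traceback, and `check_unique` and `select` are hypothetical helpers that model an index as a plain list of labels.

```python
from collections import Counter
from typing import Hashable, Iterable, List


class DuplicateLabelError(ValueError):
    """Local stand-in for the error raised when duplicates are disallowed."""


def check_unique(labels: Iterable[Hashable], allow_duplicate_labels: bool = True) -> List[Hashable]:
    """Return the labels as a list, raising if duplicates are disallowed and present."""
    labels = list(labels)
    if not allow_duplicate_labels:
        dupes = [label for label, count in Counter(labels).items() if count > 1]
        if dupes:
            raise DuplicateLabelError(f"Index has duplicates: {dupes}")
    return labels


def select(index: List[Hashable], keys: List[Hashable], allow_duplicate_labels: bool = True) -> List[Hashable]:
    """Mimic label-based selection: the result's index is the requested keys."""
    missing = [key for key in keys if key not in index]
    if missing:
        raise KeyError(missing)
    return check_unique(keys, allow_duplicate_labels)


index = ["a", "b"]
print(select(index, ["a", "a"]))  # duplicates allowed by default

try:
    select(index, ["a", "a"], allow_duplicate_labels=False)
except DuplicateLabelError as err:
    print("raised:", err)
```

The same check covers the rename case from the traceback: applying `str.upper` to the labels `['a', 'A']` yields two identical labels, so `check_unique` raises when duplicates are disallowed.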