Python ValueError: cannot reindex from a duplicate axis

Table of Contents Hide

Verify if your DataFrame Index contains Duplicate values
Test which values in an index is duplicate
Drop rows with duplicate index values
Prevent duplicate values in a DataFrame index
Overwrite DataFrame index with a new one

In Python, you will get a valueerror: cannot reindex from a duplicate axis usually when you set an index to a specific value, reindexing or resampling the DataFrame using reindex method.

If you look at the error message “cannot reindex from a duplicate axis“, it means that Pandas DataFrame has duplicate index values. Hence when we do certain operations such as concatenating a DataFrame, reindexing a DataFrame, or resampling a DataFrame in which the index has duplicate values, it will not work, and Python will throw a ValueError.

Verify if your DataFrame Index contains Duplicate values

When you get this error, the first thing you need to do is to check the DataFrame index for duplicate values using the below code.

df.index.is_unique

The index.is_unique method will return a boolean value. If the index has unique values, it returns True else False.

Test which values in an index is duplicate

If you want to check which values in an index have duplicates, you can use index.duplicated method as shown below.

df.index.duplicated()

The method returns an array of boolean values. The duplicated values are returned as True in an array.

idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
idx.duplicated()

Output

array([False, False,  True, False,  True])

Drop rows with duplicate index values

By using the same index.duplicated method, we can remove the duplicate values in the DataFrame using the following code.

It will traverse the DataFrame from a top-down approach and ensure all the duplicate values in the index are removed, and the unique values are preserved.

df.loc[~df.index.duplicated(), :]

Alternatively, if you use the latest version, you can even use the method df.drop_duplicates() as shown below.

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0

Prevent duplicate values in a DataFrame index

If you want to ensure Pandas DataFrame without duplicate values in the index, one can set a flag. Setting the allows_duplicate_labels flag to False will prevent the assignment of duplicate values.

df.flags.allows_duplicate_labels = False

Applying this flag to a DataFrame with duplicate values or assigning duplicate values will result in DuplicateLabelError: Index has duplicates.

Overwrite DataFrame index with a new one

Alternatively, to overwrite your current DataFrame index with a new one:

df.index = new_index

or, use .reset_index:

df.reset_index(level=0, inplace=True)

Remove inplace=True if you want it to return the dataframe.

Python ValueError: cannot reindex from a duplicate axis

Table of Contents Hide

Verify if your DataFrame Index contains Duplicate values

Test which values in an index is duplicate

Drop rows with duplicate index values

Prevent duplicate values in a DataFrame index

Overwrite DataFrame index with a new one

Srinivas Ramakrishna

Leave a Reply Cancel reply

How to Create a Directory in Python?

Remove Character From String Python

Python Remove Newline From String

Python Reverse a List: A Step-by-Step Tutorial

SyntaxError: (unicode error) ‘unicodeescape’ codec can’t decode bytes in position 2-3: truncated \UXXXXXXXX escape

How To Convert Python String To Array

How to Concatenate Strings in R

numpy.median() Function

How to Check the NumPy Version

numpy.ndarray.flatten() function

numpy.mean() Function

Table of Contents Hide

Verify if your DataFrame Index contains Duplicate values

Test which values in an index is duplicate

Drop rows with duplicate index values

Prevent duplicate values in a DataFrame index

Overwrite DataFrame index with a new one

Leave a Reply Cancel reply

Sign Up for Our Newsletters

You May Also Like