Python ValueError: cannot reindex from a duplicate axis

In pandas, you will usually get ValueError: cannot reindex from a duplicate axis when you assign values to, reindex, or resample a DataFrame whose index contains duplicate values.

The error message "cannot reindex from a duplicate axis" means that the DataFrame index has duplicate values. Operations such as concatenating, reindexing, or resampling need to align rows by index label, and they cannot do that unambiguously when a label appears more than once, so pandas raises a ValueError. Newer pandas releases word the same error as "cannot reindex on an axis with duplicate labels".
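To see the error in isolation, here is a minimal sketch that reproduces it. The DataFrame and its labels are made up for illustration, and the exact message text depends on the installed pandas version.

```python
import pandas as pd

# Hypothetical data: the label "a" appears twice in the index.
df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "a", "b"])

try:
    # Reindexing needs one row per label, which is ambiguous for "a".
    df.reindex(["a", "b", "c"])
    error_message = None
except ValueError as exc:
    # The wording differs between pandas versions.
    error_message = str(exc)

print(error_message)
```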

Verify if your DataFrame Index contains Duplicate values

When you get this error, the first thing you need to do is to check the DataFrame index for duplicate values using the below code.

df.index.is_unique

index.is_unique is a property, not a method, so it is accessed without parentheses. It returns True if every value in the index is unique and False otherwise.

Find which index values are duplicated

If you want to check which values in an index are duplicated, you can use the index.duplicated method as shown below.
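As a quick illustration, the sketch below checks two small DataFrames, one with a unique index and one without. Both frames are invented for this example.

```python
import pandas as pd

# Hypothetical frames: one unique index, one with "a" repeated.
unique_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
duplicate_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "a", "b"])

print(unique_df.index.is_unique)     # True
print(duplicate_df.index.is_unique)  # False
```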

df.index.duplicated()

The method returns an array of boolean values. By default (keep='first'), the first occurrence of each value is marked False and every later repeat is marked True.

import pandas as pd

idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
idx.duplicated()

Output

array([False, False,  True, False,  True])
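The boolean array says where the repeats are, but not which labels are involved. A short sketch of how one might list them, reusing the same index; the variable names are only illustrative.

```python
import pandas as pd

idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])

# keep=False marks every occurrence of a repeated label, including the first.
all_repeats = idx.duplicated(keep=False)

# Distinct labels that occur more than once.
repeated_labels = idx[idx.duplicated()].unique()

# Number of occurrences of each label.
counts = idx.value_counts()

print(all_repeats)
print(repeated_labels)
print(counts)
```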

Drop rows with duplicate index values

Using the same index.duplicated method, we can remove the rows with duplicate index values using the following code.

It walks the index from top to bottom, keeps the first row for each index value, and drops any later row that repeats a value already seen. The expression returns a new DataFrame, so assign the result back to keep it.

df = df.loc[~df.index.duplicated(), :]
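Which row survives depends on the keep argument of index.duplicated. The following sketch, on an invented four-row frame, compares keeping the first and the last row for each label.

```python
import pandas as pd

# Hypothetical data: the label "a" appears in rows 0 and 2.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "a", "c"])

# Keep the first row seen for each label (the default).
first_kept = df.loc[~df.index.duplicated(keep="first")]

# Keep the last row seen for each label instead.
last_kept = df.loc[~df.index.duplicated(keep="last")]

print(first_kept)
print(last_kept)
```

Keeping the last row is often the better fit when later rows are corrections or more recent measurements.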

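Alternatively, if the duplicate index values come from duplicate rows, you can remove those rows with df.drop_duplicates() as shown below. Note that this method compares column values and ignores the index, so it is most useful before you set one of the columns as the index.

Consider a dataset containing ramen ratings.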

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
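To connect this back to the reindexing error, here is a sketch that deduplicates on the brand column and then uses that column as the index. Picking brand as the index, and the extra label 'Maggi', are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# Keep one row per brand, then promote brand to the index.
indexed = df.drop_duplicates(subset=['brand']).set_index('brand')
print(indexed.index.is_unique)

# With a unique index, reindex works; the unknown label gets a row of NaN.
reindexed = indexed.reindex(['Indomie', 'Yum Yum', 'Maggi'])
print(reindexed)
```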

Prevent duplicate values in a DataFrame index

If you want to guarantee that a DataFrame never has duplicate values in its index, you can set a flag (available in pandas 1.2.0 and later). Setting the allows_duplicate_labels flag to False makes pandas reject any operation that would introduce duplicate values.

df.flags.allows_duplicate_labels = False

Applying this flag to a DataFrame with duplicate values or assigning duplicate values will result in DuplicateLabelError: Index has duplicates.
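Both failure cases can be sketched as follows, assuming pandas 1.2.0 or later. The frames and the rename mapping are invented for the example.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

# Case 1: the frame already has duplicate labels, so setting the flag fails.
dup_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "a", "b"])
try:
    dup_df.flags.allows_duplicate_labels = False
    flag_rejected = False
except DuplicateLabelError:
    flag_rejected = True

# Case 2: the flag is set on a clean frame, and a later operation
# that would create a duplicate label is refused.
clean_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
clean_df.flags.allows_duplicate_labels = False
try:
    clean_df.rename(index={"b": "a"})
    rename_rejected = False
except DuplicateLabelError:
    rename_rejected = True

print(flag_rejected, rename_rejected)
```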

Overwrite DataFrame index with a new one

Alternatively, you can overwrite your current DataFrame index with a new one. The new index must have exactly one entry per row:

df.index = new_index

or, use .reset_index:

df.reset_index(level=0, inplace=True)

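reset_index moves the current index into a regular column and replaces it with a default integer index (0, 1, 2, ...), which is always unique. Remove inplace=True if you want it to return a new DataFrame and leave df unchanged.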
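A small sketch of the two common variants, on an invented frame. The level argument only matters for a multi-level index, so it is omitted here. Passing drop=True discards the old labels instead of keeping them as a column.

```python
import pandas as pd

# Hypothetical data with the label "a" repeated.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "a", "b"])

# Old labels are kept in a new column named "index".
kept = df.reset_index()

# Old labels are thrown away.
dropped = df.reset_index(drop=True)

print(kept)
print(dropped)
```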
