How to Fix: KeyError in Pandas?

The KeyError in Pandas occurs when you try to access the columns in pandas DataFrame, which does not exist, or you misspell them. 

Typically, we import data from the excel name, which imports the column names, and there are high chances that you misspell the column names or include an unwanted space before or after the column name.

The column names are case-sensitive, and if you make a mistake, then Python will raise an exception KeyError: ‘column_name

Let us take a simple example to demonstrate KeyError in Pandas. In this example, we create a pandas DataFrame of employee’s data, and let’s say we need to print all the employee names.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df =  pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                   columns=['name', 'age', 'country'])

# print names of employee
print(df["Name"])

Output

    raise KeyError(key) from err
KeyError: 'Name'

When we run the program, Python raises KeyError, since we have misspelled the “name” column as “Name”.

Solution KeyError in Pandas

We can fix the issue by correcting the spelling of the key. If we are not sure what the column names are, we can print all the columns into the list as shown below.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df =  pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                   columns=['name', 'age', 'country'])

# print names of employee
print(df["name"])

Output

0        Jack
1    Chandler
2        Ross
Name: name, dtype: object

We can now see a column called “name,” and we can fix our code by providing the correct spelling as a key to the pandas DataFrame, as shown below.

We can also avoid the KeyErrors raised by the compilers when an invalid key is passed. The DataFrame has a get method where we can give a column name and retrieve all the column values.

Syntax : DataFrame.get( 'column_name' , default = default_value_if_column_is_not_present)

If there are any misspelled or invalid columns, the default value will be printed instead of raising a KeyError. Let’s look at an example to demonstrate how this works.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df = pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                      columns=['name', 'age', 'country'])

# print names of employee
print(df.get("Name", default="Name is not present"))

Output

Name is not present

And if we provide the correct column name to the DataFrame.get() method, it will list all the column values present in that.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df = pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                      columns=['name', 'age', 'country'])

# print names of employee
print(df.get("name", default="Name is not present"))

Output

0        Jack
1    Chandler
2        Ross
Name: name, dtype: object