In this article, we will look at different ways to add a new column to an existing DataFrame in Pandas.
Let us create a simple DataFrame that we will use as a reference throughout this article to demonstrate adding new columns to a Pandas DataFrame.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
Output
           team  points  runrate  wins
0         India      10      0.5     5
1  South Africa       8      1.4     4
2   New Zealand       3      2.0     2
3       England       5     -0.6     2
Now that we have created a DataFrame, let’s assume that we need to add a new column called “lost”, which holds the number of matches each team has lost.
Method 1: Declare and assign a new list as a column
The simplest way is to create a new list and assign it to a new DataFrame column. The list must contain exactly one value per row; otherwise, pandas raises a ValueError. Let us see how this works with an example.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
# declare a new list and add the values into the list
match_lost = [2, 1, 3, 4]
# assign the list to the new DataFrame Column
df["lost"] = match_lost
# Print the new DataFrame
print(df)
Output
           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     4
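One thing to watch with list assignment is the length of the list. The following is a minimal sketch, assuming a shortened version of the DataFrame above; the variable name error_message is only illustrative. It suggests that a list with too few values is rejected rather than padded.

```python
import pandas as pd

# shortened version of the article's DataFrame (four rows)
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

error_message = None
try:
    # only three values for four rows, so the assignment fails
    df["lost"] = [2, 1, 3]
except ValueError as exc:
    error_message = str(exc)

# show the message pandas produced for the short list
print(error_message)

# a list with exactly one value per row is accepted
df["lost"] = [2, 1, 3, 4]
print(df)
```

If the values come from another source, checking len(values) == len(df) before assigning gives a clearer failure point.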
Method 2: Using the DataFrame.insert() method
The disadvantage of the above approach is that we cannot choose the position of the new column; it is always appended as the last column.
We can overcome this with the pandas.DataFrame.insert() method, which inserts a new column at a specific position (a zero-based column index) and modifies the DataFrame in place.
In the example below, let us insert the new column “lost” before the “wins” column. Since “wins” is at index 3, we insert the new column at index 3, which shifts “wins” one position to the right.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
# insert the new column at position 3 (before "wins"); insert() modifies df in place
df.insert(loc=3, column="lost", value=[2, 1, 3, 4])
# Print the new DataFrame
print(df)
Output
           team  points  runrate  lost  wins
0         India      10      0.5     2     5
1  South Africa       8      1.4     1     4
2   New Zealand       3      2.0     3     2
3       England       5     -0.6     4     2
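Hard-coding the position works for a fixed layout, but the index can also be looked up by column name. The sketch below is one possible way to do that, assuming the same DataFrame as above; the names position and duplicate_rejected are illustrative. It also shows that, by default, insert() refuses a column name that already exists.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# look up the current position of "wins" instead of hard-coding it
position = df.columns.get_loc("wins")

# insert "lost" at that position, which pushes "wins" one place to the right
df.insert(position, "lost", [2, 1, 3, 4])
print(list(df.columns))

# inserting a second column with the same name raises a ValueError by default
try:
    df.insert(0, "lost", [0, 0, 0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```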
Method 3: Using the DataFrame.assign() method
The pandas.DataFrame.assign() method is useful when we need to create multiple new columns in a DataFrame.
This method returns a new DataFrame containing all the original columns in addition to the new ones and leaves the original DataFrame unchanged. Any existing column that is re-assigned will be overwritten in the returned DataFrame.
In the example below, we add multiple columns to a Pandas DataFrame.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
# append multiple columns to Pandas DataFrame
df2 = df.assign(lost=[2, 1, 3, 4], matches_remaining=[2, 3, 1, 1])
# Print the new DataFrame
print(df2)
Output
           team  points  runrate  wins  lost  matches_remaining
0         India      10      0.5     5     2                  2
1  South Africa       8      1.4     4     1                  3
2   New Zealand       3      2.0     2     3                  1
3       England       5     -0.6     2     4                  1
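assign() also accepts a callable, which can be handy when a new column is derived from other columns. The following sketch assumes the same DataFrame as above; the column name played is hypothetical and simply adds wins and losses. It also checks that the original DataFrame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# "lost" is created first; the lambda receives the intermediate DataFrame,
# so it can already refer to the new "lost" column
df2 = df.assign(
    lost=[2, 1, 3, 4],
    played=lambda d: d["wins"] + d["lost"],
)

print(df2)

# the original DataFrame has no "lost" column
print("lost" in df.columns)
```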
Method 4: Using the pandas.concat() function
We can also use the pandas.concat() function to concatenate new columns to a DataFrame by passing axis=1 as an argument. It returns a new DataFrame after concatenating the columns. Rows are matched by index label, so both DataFrames should share the same index.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
# create a new DataFrame
df2 = pd.DataFrame([[1, 2], [2, 1], [3, 4], [0, 3]],
                   columns=['matches_left', 'lost'])
# concat and Print the new DataFrame
print(pd.concat([df, df2], axis=1))
Output
           team  points  runrate  wins  matches_left  lost
0         India      10      0.5     5             1     2
1  South Africa       8      1.4     4             2     1
2   New Zealand       3      2.0     2             3     4
3       England       5     -0.6     2             0     3
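Because concat() with axis=1 matches rows by index label rather than by position, two DataFrames with different indexes do not line up. The sketch below illustrates this under the assumption of a hypothetical second DataFrame, extra, whose index starts at 10; it then shows one possible fix, which is to give extra the same index as df before concatenating.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# hypothetical new column whose index labels do not match df's (0..3)
extra = pd.DataFrame({'lost': [2, 1, 3, 4]}, index=[10, 11, 12, 13])

# the index labels are combined, so unmatched rows are filled with NaN
misaligned = pd.concat([df, extra], axis=1)
print(misaligned.shape)

# relabel extra with df's index so that rows pair up one to one
aligned = pd.concat([df, extra.set_axis(df.index, axis=0)], axis=1)
print(aligned)
```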
Method 5: Using a dictionary
Another option is to build the new column from a dictionary. We use the values of an existing column (here, the team names) as the dictionary keys and the new column's values as the dictionary values, then map the existing column through the dictionary.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
# print the DataFrame
print(df)
# create a dictionary whose keys are values of the existing "team" column
# and whose values are the entries for the new column
match_lost = {'India': 2, 'South Africa': 1, 'New Zealand': 3, 'England': 0}
# look up each team in the dictionary and assign the result to the new column
df['lost'] = df['team'].map(match_lost)
# print Dataframe
print(df)
Output
           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     0
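A dictionary keyed by team name may not cover every row. The sketch below assumes a hypothetical lookup table, partial_lost, that contains only two of the four teams, and uses Series.map() to look each team up. Teams missing from the dictionary come out as NaN, which fillna() can replace with a default value.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

# hypothetical lookup table that covers only two teams
partial_lost = {'India': 2, 'South Africa': 1}

# teams without an entry in the dictionary become NaN
mapped = df['team'].map(partial_lost)
print(mapped)

# replace the missing entries with 0 and convert back to integers
df['lost'] = mapped.fillna(0).astype(int)
print(df)
```

Whether 0 is a sensible default depends on the data; leaving the NaN values in place is often the safer choice when a missing entry means "unknown".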
Conclusion
In this article, we saw five approaches to inserting new columns into a Pandas DataFrame or overwriting existing ones: creating and assigning a list, insert(), assign(), concat(), and a dictionary. Depending on your requirements, you can choose whichever method is most suitable.