Adding a New Column to an Existing DataFrame in Pandas

In this article, we will look at different ways to add a new column to an existing DataFrame in Pandas.

Let us create a simple DataFrame that we will use as a reference throughout this article to demonstrate each way of adding a new column.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

Output

           team  points  runrate  wins
0         India      10      0.5     5
1  South Africa       8      1.4     4
2   New Zealand       3      2.0     2
3       England       5     -0.6     2

Now that we have created a DataFrame, let’s assume that we need to add a new column called “lost”, which holds the number of matches each team has lost.

Method 1: Declare and assign a new list as a column

The simplest way is to create a new list and assign the list to the new DataFrame column. Let us see how we can achieve this with an example.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# declare a new list and add the values into the list
match_lost = [2, 1, 3, 4]

# assign the list to the new DataFrame Column
df["lost"] = match_lost

# Print the new DataFrame
print(df)

Output

           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     4
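One detail worth knowing about this method is that the list must contain exactly one value per row, while a single scalar value is repeated for every row. The sketch below illustrates both cases on a trimmed-down version of our DataFrame; the "tournament" column and its value are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

# a list with the wrong number of values is rejected with a ValueError
try:
    df["lost"] = [2, 1, 3]  # only 3 values for 4 rows
    length_error = None
except ValueError as err:
    length_error = err

print(length_error)

# a scalar is broadcast to every row (hypothetical column for illustration)
df["tournament"] = "World Cup"
print(df)
```

The failed assignment leaves the DataFrame unchanged, so no "lost" column is created in this sketch.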

Method 2: Using the DataFrame.insert() method

The disadvantage of the above approach is that we cannot choose where the column goes. By default, the new column is added at the end, making it the last column.

We can overcome the issue using the pandas.DataFrame.insert() method. This method is useful when you need to insert a new column in a specific position or index.

In the example below, let us insert the new column “lost” before the “wins” column. Column positions are zero-based and “wins” currently sits at index 3, so we can achieve this by inserting the new column at index 3.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)


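# insert the new column at position 3 (zero-based), just before "wins"
# allow_duplicates=True permits a column name that already exists
df.insert(3, "lost", [2, 1, 3, 4], allow_duplicates=True)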

# Print the new DataFrame
print(df)

Output

           team  points  runrate  lost  wins
0         India      10      0.5     2     5
1  South Africa       8      1.4     1     4
2   New Zealand       3      2.0     3     2
3       England       5     -0.6     4     2
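Instead of hard-coding the position, we could look it up from the column name. The following is a minimal sketch that assumes the column labels are unique, so that df.columns.get_loc() returns a single integer. It also shows that insert() modifies the DataFrame in place and, by default, refuses a column name that already exists.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# find the zero-based position of the "wins" column
position = df.columns.get_loc("wins")

# insert "lost" at that position, which pushes "wins" one place to the right
df.insert(position, "lost", [2, 1, 3, 4])
print(list(df.columns))

# inserting the same name again raises a ValueError by default
try:
    df.insert(0, "lost", [0, 0, 0, 0])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err

print(duplicate_error)
```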

Method 3: Using the DataFrame.assign() method

The pandas.DataFrame.assign() method is handy when we need to create several new columns in a single call.

This method returns a new DataFrame containing all the original columns in addition to the new ones, and it leaves the original DataFrame unchanged. If a name passed to assign() matches an existing column, that column is overwritten in the returned DataFrame.

In the example below, we add multiple columns to a Pandas DataFrame.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# append multiple columns to Pandas DataFrame
df2 = df.assign(lost=[2, 1, 3, 4], matches_remaining=[2, 3, 1, 1])

# Print the new DataFrame
print(df2)

Output

           team  points  runrate  wins  lost  matches_remaining
0         India      10      0.5     5     2                  2
1  South Africa       8      1.4     4     1                  3
2   New Zealand       3      2.0     2     3                  1
3       England       5     -0.6     2     4                  1
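The assign() method also accepts a callable, which receives the DataFrame and can therefore use columns created earlier in the same call. Here is a small sketch of that idea; the "played" column is hypothetical and assumes, for illustration only, that every match ended in either a win or a loss.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# "played" (hypothetical) is computed from "wins" and the freshly added "lost"
df2 = df.assign(lost=[2, 1, 3, 4],
                played=lambda d: d["wins"] + d["lost"])

print(df2)

# the original DataFrame still has only its four columns
print(list(df.columns))
```

Because the original DataFrame is not modified, the result has to be stored in a variable (here df2) or assigned back to df.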

Method 4: Using the pandas.concat() method

We can also leverage the pandas.concat() method to concatenate a new column to a DataFrame by passing axis=1 as an argument. This method returns a new DataFrame after concatenating the columns.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# create a new DataFrame
df2 = pd.DataFrame([[1, 2], [2, 1], [3, 4], [0, 3]],
                   columns=['matches_left', 'lost'])

# concat and Print the new DataFrame
print(pd.concat([df, df2], axis=1))

Output

           team  points  runrate  wins  matches_left  lost
0         India      10      0.5     5             1     2
1  South Africa       8      1.4     4             2     1
2   New Zealand       3      2.0     2             3     4
3       England       5     -0.6     2             0     3
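With axis=1, pandas.concat() matches rows by their index labels rather than by their position. The example above works because both DataFrames use the default index 0 to 3. The sketch below shows what may happen when the indexes differ and one possible way to deal with it; the name "extra" and its index values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

# hypothetical second DataFrame whose index starts at 1 instead of 0
extra = pd.DataFrame({'lost': [2, 1, 3, 4]}, index=[1, 2, 3, 4])

# rows are matched on index labels, so unmatched labels are filled with NaN
misaligned = pd.concat([df, extra], axis=1)
print(misaligned)

# resetting the index makes the rows line up by position again
aligned = pd.concat([df, extra.reset_index(drop=True)], axis=1)
print(aligned)
```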

Method 5: Using a dictionary

Another option is to build the new column from a dictionary. The values of an existing column (here, “team”) serve as the dictionary keys, and the dictionary values become the entries of the new column.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# Create a dictionary keyed by the values of the existing "team" column,
# with the matches lost as the values of the new column
match_lost = {'India': 2, 'South Africa': 1, 'New Zealand': 3, 'England': 0}

# look up each team in the dictionary and assign the result to the new column
df['lost'] = df['team'].map(match_lost)

# print Dataframe
print(df)

Output

           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     0
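One thing to watch when building a column from a dictionary is what happens if a team is missing from it. The following sketch uses Series.map() for the lookup and assumes that a missing team should count as 0 matches lost; both the incomplete dictionary and the "lost_filled" column are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England']})

# hypothetical dictionary with no entry for England
lost_by_team = {'India': 2, 'South Africa': 1, 'New Zealand': 3}

# teams without an entry get NaN
df['lost'] = df['team'].map(lost_by_team)

# assumption for illustration: treat a missing team as 0 matches lost
df['lost_filled'] = df['team'].map(lost_by_team).fillna(0).astype(int)

print(df)
```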

Conclusion

In this article, we saw five approaches for inserting new columns into a Pandas DataFrame or overwriting existing ones: creating and assigning a list, insert(), assign(), concat(), and a dictionary. Depending on your requirements, you can choose whichever of these methods is most suitable.
