How to get column names in Pandas DataFrame

A Pandas DataFrame is two-dimensional, size-mutable, potentially heterogeneous tabular data. It stores data in rows and columns, and each column has its own header name that identifies it.

This tutorial explores different methods to get column names in a Pandas DataFrame, with examples.

Get Column names in Pandas DataFrame

Let us consider a simple dataframe that we will be using throughout the tutorial.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
print(df)

Output

        Software_Names  Rating  Total_Qty_In_Stock  Unit_Price  Total_Sales
0    Windows Defender     4.2                  10       23.55            3
1       AVG Antivirus     3.7                   4         NaN            1
2    Mcafee Antivirus     4.0                   8       32.78            7
3  Kaspersky Security     4.5                   3       33.00            5
4    Norton Antivirus     3.0                   5         NaN           11
5        Bit Defender     4.7                  20       45.00           14

Pandas get column names using the columns attribute

The easiest way to get the column names in a Pandas DataFrame is the columns attribute. df.columns returns all the column labels of the DataFrame as a pandas Index object.

Syntax

df.columns

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
                   
# print all the columns in the dataframe
print(df.columns)

Output

Index(['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price',
       'Total_Sales'],
      dtype='object')
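df.columns returns a pandas Index rather than a plain Python list. You can still use it directly for length checks, membership tests and positional access. The following minimal sketch uses the same sample data:

```python
import numpy as np
import pandas as pd

# same sample DataFrame as in the examples above
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

cols = df.columns
print(type(cols).__name__)  # Index
print(len(cols))            # 5 (number of columns)
print('Rating' in cols)     # True (membership test against the column labels)
print(cols[0])              # Software_Names (positional access)
```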

Get a list from Pandas DataFrame column headers

For the string column labels used here, df.columns.values returns the column names as a NumPy array (numpy.ndarray) rather than a pandas Index. Note that the result is an array, not a Python list.

Syntax

df.columns.values

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
column_list = df.columns.values
# print all the columns in the dataframe
print(column_list)

Output

['Software_Names' 'Rating' 'Total_Qty_In_Stock' 'Unit_Price' 'Total_Sales']
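The following quick check confirms that the result is a NumPy ndarray. It also shows Index.to_numpy(), which the pandas documentation generally recommends over .values and which returns the same labels here. This is a small sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

values = df.columns.values
print(isinstance(values, np.ndarray))  # True

# to_numpy() returns an equivalent array of the column labels
as_numpy = df.columns.to_numpy()
print((values == as_numpy).all())      # True (element-wise comparison)
```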

To get a plain Python list instead of a NumPy array, call the tolist() method on the array. This step is needed regardless of your Python or pandas version, because df.columns.values is always an array.

Syntax

df.columns.values.tolist()

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
column_list = df.columns.values.tolist()
# print all the columns in the dataframe
print(column_list)

Output

['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
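The Index object also has its own tolist() method. You can therefore call it on df.columns directly and skip the intermediate NumPy array. This is a minimal sketch showing that both routes give the same list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# Index.tolist() converts the column labels straight to a Python list
names = df.columns.tolist()
print(names)
# ['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']

print(names == df.columns.values.tolist())  # True
```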

Another way to get the list of column headers from a Pandas DataFrame is the built-in list() function.

When we pass a DataFrame to list(), it iterates over the DataFrame. Iterating a DataFrame yields its column labels, so the result is a list of the column headers.

Syntax

column_list = list(df)

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
column_list = list(df)
# pandas print column names
print(column_list)

Output

['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
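Because iterating over a DataFrame yields its column labels, the same iteration can drive a loop or a list comprehension. The sketch below selects only the columns whose names start with a given prefix. The 'Total_' prefix is just an illustrative choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# iterating over df visits each column label in order
for name in df:
    print(name)

# keep only the column names that start with the chosen prefix
total_cols = [name for name in df if name.startswith('Total_')]
print(total_cols)  # ['Total_Qty_In_Stock', 'Total_Sales']
```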

Get Pandas Column names with datatype

In some situations, we need each column name together with its data type. In that case, we can use the dtypes attribute. It returns a Series indexed by column name, with the data type of each column as its values.

Syntax

df.dtypes

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
# pandas print column names with datatype
print(df.dtypes)

Output

Software_Names         object
Rating                float64
Total_Qty_In_Stock      int64
Unit_Price            float64
Total_Sales             int64
dtype: object
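Since df.dtypes is a Series keyed by column name, you can iterate over its (name, dtype) pairs. You can also turn it into a plain dictionary. This is a small sketch. The exact dtype shown for the text column can differ between pandas versions, so the comments only state the numeric ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# print one "name: dtype" line per column
for name, dtype in df.dtypes.items():
    print(f"{name}: {dtype}")

# map each column name to its dtype as a string
dtype_map = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(dtype_map['Rating'])       # float64
print(dtype_map['Total_Sales'])  # int64
```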

Get the list of columns from Pandas Dataframe based on specific Datatype

Here, let us check how to get a list of column headers filtered by the data type of each column.

For instance, suppose we need the names of all columns of data type int64. We can use the select_dtypes() method of the DataFrame. It returns a subset of the DataFrame's columns based on their dtypes, and calling .columns.values on that subset gives the matching column names.

Syntax

DataFrame.select_dtypes(include=None, exclude=None)

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
# pandas print column names based on datatype
print(df.select_dtypes('int64').columns.values)

Output

['Total_Qty_In_Stock' 'Total_Sales']
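select_dtypes() also accepts broader selectors and an exclude argument. The sketch below uses the 'number' selector to match every numeric column, and uses exclude to get the non-numeric ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# all numeric columns (integers and floats)
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)      # ['Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']

# everything that is not numeric
non_numeric_cols = df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric_cols)  # ['Software_Names']

# a list of specific dtypes also works
float_cols = df.select_dtypes(include=['float64']).columns.tolist()
print(float_cols)        # ['Rating', 'Unit_Price']
```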

Get Pandas DataFrame column names sorted

The built-in sorted() function accepts the DataFrame, iterates over its column labels, and returns them as a list sorted alphabetically. Uppercase letters sort before lowercase ones.

Syntax

sorted(df)

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
# pandas print column names sorted alphabetically
print(sorted(df))

Output

['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']
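sorted() only returns the names and leaves the DataFrame's column order untouched. The sketch below shows descending order via reverse=True. It also shows one way to reorder the columns of the DataFrame itself, sort_index(axis=1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# column names in descending order
desc = sorted(df, reverse=True)
print(desc)  # ['Unit_Price', 'Total_Sales', 'Total_Qty_In_Stock', 'Software_Names', 'Rating']

# a new DataFrame whose columns are physically reordered alphabetically
df_sorted = df.sort_index(axis=1)
print(df_sorted.columns.tolist())
# ['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']
```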

Pandas Get Column Names With NaN

We can also find the headers of columns that contain NaN values. In Pandas, missing values are denoted by NaN.

We can use the isna() and isnull() methods in Pandas to find all the columns with missing data.

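The isna() method returns a boolean object of the same size, indicating whether each value is NA. NA values, such as None or numpy.nan, get mapped to True. Everything else gets mapped to False. Calling .any() on that result then returns, for each column, True if the column contains at least one missing value.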
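The mapping described above can be seen on a small Series. The values below are hypothetical and chosen only to show that None, numpy.nan and NaT count as missing, while an empty string or 0 does not:

```python
import numpy as np
import pandas as pd

# an object Series mixing a string, an empty string, None, np.nan and 0
values = pd.Series(['text', '', None, np.nan, 0])
flags = values.isna().tolist()
print(flags)  # [False, False, True, True, False]

# in a datetime Series, a missing entry becomes NaT, which isna() also flags
dates = pd.Series(pd.to_datetime(['2024-01-01', None]))
print(dates.isna().tolist())  # [False, True]
```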

Syntax

df.isna().any()

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
# check which columns contain NaN values
print(df.isna().any())

Output

Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
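df.isna().any() returns a boolean Series rather than the names themselves. One common way to get the actual column names is to use that Series as a mask on df.columns. The sketch below also counts the missing values per column with sum():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# boolean mask: True for columns with at least one missing value
has_nan = df.isna().any()

# use the mask to pick the matching column names
nan_columns = df.columns[has_nan].tolist()
print(nan_columns)  # ['Unit_Price']

# number of missing values in each column (True counts as 1)
missing_counts = df.isna().sum()
print(missing_counts)
```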

Alternatively, we can use the isnull() method, which is an alias of isna() and gives the same result. It indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike arrays).

Syntax

df.isnull().any()

Let us check how it works with an example.

# import the NumPy and pandas libraries
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]

                   })
# check which columns contain NaN values
print(df.isnull().any())

Output

Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
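Because isnull() is an alias of isna(), both return identical results. The complementary notnull() method can be combined with all() to list the columns that have no missing values at all. This is a minimal sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]})

# isnull() and isna() produce the same boolean DataFrame
same = df.isnull().equals(df.isna())
print(same)  # True

# columns where every value is present
complete_columns = df.columns[df.notnull().all()].tolist()
print(complete_columns)  # ['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Total_Sales']
```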

Conclusion

A Pandas DataFrame consists of rows and columns to store data. Each column has its own header name that identifies it.

We have covered multiple ways to get the column names in a Pandas DataFrame using attributes and methods such as df.columns, df.columns.values, df.columns.values.tolist(), list(df), df.dtypes, select_dtypes(), sorted(df), and isna()/isnull().

