A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. It stores data in rows and columns, and each column has its own header name that can be used to identify it.
This tutorial explores the different methods available to get column names from a Pandas DataFrame, with examples.
Get Column names in Pandas DataFrame
Let us consider a simple dataframe that we will be using throughout the tutorial.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
print(df)
Output
Software_Names Rating Total_Qty_In_Stock Unit_Price Total_Sales
0 Windows Defender 4.2 10 23.55 3
1 AVG Antivirus 3.7 4 NaN 1
2 Mcafee Antivirus 4.0 8 32.78 7
3 Kaspersky Security 4.5 3 33.00 5
4 Norton Antivirus 3.0 5 NaN 11
5 Bit Defender 4.7 20 45.00 14
Pandas Get column names using the columns attribute
The easiest way to get the column names of a Pandas DataFrame is the columns attribute. The df.columns attribute returns all the column labels of the dataframe as an Index object.
Syntax
df.columns
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# print all the columns in the dataframe
print(df.columns)
Output
Index(['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price',
'Total_Sales'],
dtype='object')
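Because df.columns returns an Index object rather than a plain list, it supports positional access, membership tests, and len(). The following is a minimal sketch that assumes a trimmed-down version of the dataframe above; the variable names are illustrative only.

```python
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Sales': [3, 1]})

# Access a single label by position
first_column = df.columns[0]

# Check whether a label exists among the columns
has_rating = 'Rating' in df.columns

# Count the columns
column_count = len(df.columns)

print(first_column, has_rating, column_count)
```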
Get a list from Pandas DataFrame column headers
If you need the column labels as an array rather than an Index object, you can use df.columns.values, which returns all the column labels as a NumPy array. Note that this is not a Python list, which is why the output below has no commas between the items.
Syntax
df.columns.values
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
column_list = df.columns.values
# print all the columns in the dataframe
print(column_list)
Output
['Software_Names' 'Rating' 'Total_Qty_In_Stock' 'Unit_Price' 'Total_Sales']
If you need a plain Python list, convert the NumPy array into a list using the tolist() method.
Syntax
df.columns.values.tolist()
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
column_list = df.columns.values.tolist()
# print all the columns in the dataframe
print(column_list)
Output
['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
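As a shorter alternative, the Index object has its own tolist() method, so the intermediate .values step can usually be skipped. A small sketch, assuming a reduced version of the dataframe above, that compares the two routes:

```python
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Sales': [3, 1]})

# Route used in the tutorial: array first, then list
via_values = df.columns.values.tolist()

# Shorter route: call tolist() on the Index directly
via_index = df.columns.tolist()

print(type(via_index).__name__)
print(via_values == via_index)
```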
Another way to get the list of column headers from a Pandas DataFrame is the built-in list() function.
We can pass the DataFrame object to list(), and it returns all the column headers as a list.
Syntax
columns_list = list(df)
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
column_list = list(df)
# pandas print column names
print(column_list)
Output
['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
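list(df) works because iterating over a DataFrame yields its column labels. The same behaviour can be used in a loop or a comprehension. The sketch below, which assumes a reduced version of the dataframe above, builds lower-cased copies of the header names; lowercase_names is just an illustrative variable.

```python
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Sales': [3, 1]})

# Iterating over the dataframe yields column labels, not rows
lowercase_names = [name.lower() for name in df]

print(lowercase_names)
```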
Get Pandas Column names with datatype
We may need to fetch the column name with its type in specific situations. In that case, we can use the dtypes attribute. This returns a Series with the data type of each column in the dataframe.
Syntax
df.dtypes
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# pandas print column names with datatype
print(df.dtypes)
Output
Software_Names object
Rating float64
Total_Qty_In_Stock int64
Unit_Price float64
Total_Sales int64
dtype: object
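Since df.dtypes is a Series indexed by column name, it can be looped over to pair each header with its type. The following sketch assumes a reduced version of the dataframe above and stores the pairs in a dictionary called dtype_by_column, a name chosen here for illustration.

```python
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Sales': [3, 1]})

# Map each column name to the string form of its dtype
dtype_by_column = {name: str(dtype) for name, dtype in df.dtypes.items()}

for name, dtype_name in dtype_by_column.items():
    print(name, dtype_name)
```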
Get the list of columns from Pandas Dataframe based on specific Datatype
Here let us check how to get a list of column headers from the dataframe based on the data type of the column.
For instance, if we need to fetch all the column names of datatype int64, we can use the select_dtypes() method available on the dataframe. The select_dtypes() method returns a subset of the DataFrame’s columns based on the column dtypes.
Syntax
DataFrame.select_dtypes(include=None, exclude=None)
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# pandas print column names based on datatype
print(df.select_dtypes('int64').columns.values)
Output
['Total_Qty_In_Stock' 'Total_Sales']
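select_dtypes() also accepts broader selectors such as 'number', which should match both integer and float columns, and the exclude parameter inverts the selection. A minimal sketch, assuming a reduced version of the dataframe above:

```python
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Qty_In_Stock': [10, 4]})

# All numeric columns (integers and floats)
numeric_columns = df.select_dtypes(include='number').columns.tolist()

# Everything that is not numeric
non_numeric_columns = df.select_dtypes(exclude='number').columns.tolist()

print(numeric_columns)
print(non_numeric_columns)
```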
Get Pandas Dataframe Columns names sorted
The built-in sorted() function accepts the dataframe and returns a list of column names or headers sorted alphabetically.
Syntax
sorted(df)
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# pandas print column names sorted alphabetically
print(sorted(df))
Output
['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']
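One thing to keep in mind is that sorted() compares strings case-sensitively, so uppercase headers sort before lowercase ones. The sketch below uses a hypothetical dataframe with one lowercase header (not part of the tutorial data) to show the default order, a case-insensitive order using key=str.lower, and a reversed order. It assumes every column label is a string.

```python
import pandas as pd

# Hypothetical dataframe with mixed-case headers, for illustration only
df = pd.DataFrame({'rating': [4.2, 3.7],
                   'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Unit_Price': [23.55, 32.78]})

# Default: uppercase letters sort before lowercase letters
default_order = sorted(df)

# Case-insensitive: compare the lower-cased form of each label
case_insensitive = sorted(df, key=str.lower)

# Descending order
descending = sorted(df, reverse=True)

print(default_order)
print(case_insensitive)
print(descending)
```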
Pandas Get Column Names With NaN
We can also get all the column headers that contain NaN. In Pandas, missing values are denoted by NaN.
We can use the isna() and isnull() methods in Pandas to find all the columns with missing data.
The isna() method returns a boolean object of the same size indicating whether each value is NA. NA values, such as None or numpy.nan, are mapped to True. Everything else is mapped to False. Chaining any() then reduces the result to one boolean per column.
Syntax
df.isna().any()
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# pandas print which columns contain NaN
print(df.isna().any())
Output
Software_Names False
Rating False
Total_Qty_In_Stock False
Unit_Price True
Total_Sales False
dtype: bool
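The output above is a boolean Series rather than a list of names. To pull out only the headers of the columns that contain NaN, the boolean result can be used as a mask on df.columns. A minimal sketch, assuming a reduced version of the dataframe above; has_nan and columns_with_nan are illustrative names.

```python
import numpy as np
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus'],
                   'Rating': [4.2, 3.7, 4.0],
                   'Unit_Price': [23.55, np.nan, 32.78]})

# One boolean per column: True if the column has at least one missing value
has_nan = df.isna().any()

# Keep only the labels where the mask is True
columns_with_nan = df.columns[has_nan].tolist()

print(columns_with_nan)
```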
Syntax
df.isnull().any()
The isnull() method is an alias of isna(), so it flags the same missing values (NaN in numeric columns, None or NaN in object columns, NaT in datetime-like columns) and gives the same result.
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
'Total_Sales': [3, 1, 7, 5, 11, 14]
})
# pandas print which columns contain NaN
print(df.isnull().any())
Output
Software_Names False
Rating False
Total_Qty_In_Stock False
Unit_Price True
Total_Sales False
dtype: bool
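The two outputs are identical because the pandas documentation describes DataFrame.isnull as an alias of DataFrame.isna. The sketch below, which assumes a reduced version of the dataframe above, checks that equivalence and also counts the missing values per column with sum() instead of any().

```python
import numpy as np
import pandas as pd

# Reduced version of the tutorial dataframe (assumption: three columns are enough here)
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus'],
                   'Rating': [4.2, 3.7, 4.0],
                   'Unit_Price': [23.55, np.nan, 32.78]})

# Both methods should produce the same boolean dataframe
same_result = df.isnull().equals(df.isna())

# Number of missing values in each column
missing_counts = df.isna().sum()

print(same_result)
print(missing_counts)
```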
Conclusion
A Pandas DataFrame consists of rows and columns to store data. Each column has its own header name to identify the column.
We have used multiple ways to get the column names of a Pandas DataFrame using attributes and functions such as df.columns, df.columns.values, df.columns.values.tolist(), list(df), etc.