How do I check if a column exists in a Pandas DataFrame df?
   A   B    C
0  3  40  100
1  6  30  200
How would I check that the column "A" exists in the above DataFrame so that I can compute:
df['sum'] = df['A'] + df['C']
And if "A" doesn’t exist:
df['sum'] = df['B'] + df['C']
Answers:
Method 1
This will work:
if 'A' in df:
But for clarity, I’d probably write it as:
if 'A' in df.columns:
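As a minimal sketch of how this check fits the computation in the question, the example below rebuilds the sample DataFrame and wraps the logic in a helper. The name add_sum is hypothetical and chosen only for illustration; it is not part of pandas.

```python
import pandas as pd


def add_sum(df):
    """Return a copy of df with a 'sum' column: A + C if A exists, else B + C."""
    out = df.copy()
    # Membership test against the column labels, as in the answer above.
    first = "A" if "A" in out.columns else "B"
    out["sum"] = out[first] + out["C"]
    return out


# Sample data from the question.
df = pd.DataFrame({"A": [3, 6], "B": [40, 30], "C": [100, 200]})

with_a = add_sum(df)                       # column A is present
without_a = add_sum(df.drop(columns="A"))  # column A is missing, falls back to B

print(with_a["sum"].tolist())
print(without_a["sum"].tolist())
```

Working on a copy keeps the caller's DataFrame unchanged; assigning to df['sum'] directly, as in the question, works just as well if in-place modification is intended.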
Method 2
To check if one or more columns all exist, you can use set.issubset, as in:
if set(['A','C']).issubset(df.columns):
    df['sum'] = df['A'] + df['C']
As @brianpck points out in a comment, the set can alternatively be written as a set literal with curly braces:
if {'A', 'C'}.issubset(df.columns):
See this question for a discussion of the curly-braces syntax.
Or, you can use a generator expression, as in:
if all(item in df.columns for item in ['A','C']):
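When several columns are required, it can also help to know which ones are absent. The sketch below shows both forms of the check on the question's data and adds a small helper for reporting missing names; missing_columns is a hypothetical name used here for illustration, not a pandas function.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the names in `required` that are not columns of df, in the order given."""
    return [col for col in required if col not in df.columns]


df = pd.DataFrame({"A": [3, 6], "B": [40, 30], "C": [100, 200]})

# Both checks answer "are all of these columns present?"
subset_check = {"A", "C"}.issubset(df.columns)
all_check = all(col in df.columns for col in ["A", "C"])

# "D" is not in the sample data, so it is the only name reported.
missing = missing_columns(df, ["A", "C", "D"])

print(subset_check, all_check, missing)
```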
Method 3
To avoid an if statement altogether, you can use the DataFrame get() method. For the sum in the question:
df['sum'] = df.get('A', df['B']) + df['C']
DataFrame.get behaves like dict.get in Python: it returns the column if it exists and the supplied default otherwise.
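One detail worth checking with get() is that the default argument is an ordinary expression, so df['B'] is evaluated before get() is called. The sketch below, using the question's data, shows the two expected cases and then what happens when B is missing even though A exists. The variable names are illustrative only.

```python
import pandas as pd

with_a = pd.DataFrame({"A": [3, 6], "B": [40, 30], "C": [100, 200]})
without_a = with_a.drop(columns="A")

# A exists, so get() returns column A; otherwise it returns the default, column B.
sum_with_a = with_a.get("A", with_a["B"]) + with_a["C"]
sum_without_a = without_a.get("A", without_a["B"]) + without_a["C"]

# The default is evaluated eagerly: with B missing, the lookup of B raises
# KeyError before get() runs, even though A is present.
only_ac = with_a.drop(columns="B")
try:
    only_ac.get("A", only_ac["B"])
    raised = False
except KeyError:
    raised = True

# One way around this: call get() without a default (it returns None when the
# column is absent) and only then fall back to B.
first = only_ac.get("A")
if first is None:
    first = only_ac.get("B")

print(sum_with_a.tolist(), sum_without_a.tolist(), raised)
```

The explicit "is None" test is used instead of "or" because the truth value of a Series is ambiguous and raises an error.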
Method 4
You can also use the set method issuperset:
set(df).issuperset(['A', 'B'])  # equivalent to set(df.columns).issuperset(['A', 'B'])
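This works because iterating over a DataFrame yields its column labels, so set(df) is the set of column names. A short sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 6], "B": [40, 30], "C": [100, 200]})

# Iterating a DataFrame yields column labels, not rows.
labels = list(df)

has_a_and_b = set(df).issuperset(["A", "B"])
has_a_and_d = set(df).issuperset(["A", "D"])  # "D" is not a column

print(labels, has_a_and_b, has_a_and_d)
```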
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.