Filter a pandas dataframe using values from a dict

I need to filter a data frame with a dict whose keys are column names and whose values are the values I want to filter on:

filter_v = {'A':1, 'B':0, 'C':'This is right'}
# this would be the normal approach
df[(df['A'] == 1) & (df['B'] ==0)& (df['C'] == 'This is right')]

But I want to do something along the lines of

for column, value in filter_v.items():
    df[df[column] == value]

but this will filter the data frame several times, one value at a time, and not apply all filters at the same time. Is there a way to do it programmatically?
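As an aside, the loop only loses the earlier filters because each result is thrown away. A minimal sketch, assuming a small made-up frame: when the filtered frame is reassigned on every pass, the conditions accumulate and the outcome matches the single & mask.

```python
import pandas as pd

# Hypothetical data shaped like the question's df
df = pd.DataFrame({'A': [1, 1, 0],
                   'B': [0, 1, 0],
                   'C': ['This is right', 'This is right', 'This is wrong']})
filter_v = {'A': 1, 'B': 0, 'C': 'This is right'}

result = df
for column, value in filter_v.items():
    # Reassigning keeps only the rows that also pass this column's test,
    # so the conditions accumulate like an AND
    result = result[result[column] == value]

# The same selection written as one combined mask
combined = df[(df['A'] == 1) & (df['B'] == 0) & (df['C'] == 'This is right')]
print(result)
```

Each pass builds a new intermediate frame, so with many filters a single combined mask is usually the cheaper option.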

EDIT: an example:

df1 = pd.DataFrame({'A':[1,0,1,1, np.nan], 'B':[1,1,1,0,1], 'C':['right','right','wrong','right', 'right'],'D':[1,2,2,3,4]})
filter_v = {'A':1, 'B':0, 'C':'right'}
df1.loc[df1[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]

gives

    A   B   C   D
0   1   1   right   1
1   0   1   right   2
3   1   0   right   3

but the expected result was

    A   B   C   D
3   1   0   right   3

only the last one should be selected.
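The isin attempt fails because, given a list, DataFrame.isin tests every cell against the whole list [1, 0, 'right'], no matter which column each value was meant for. In row 0, B is 1, which is in the list (it was meant for column A), so the row passes. A short sketch reproducing this with the data above, written with list(...) so it also runs on Python 3:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Every cell is checked against the full list [1, 0, 'right'],
# not against the single value for its own column
membership = df1[list(filter_v)].isin(list(filter_v.values()))
print(membership)

# Rows 0, 1 and 3 pass, which is the unexpected result from the question
print(df1[membership.all(axis=1)])
```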

Answers:

Method 1

IIUC, you should be able to do something like this:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3

This works by making a Series to compare against:

>>> pd.Series(filter_v)
A        1
B        0
C    right
dtype: object

Selecting the corresponding part of df1:

>>> df1[list(filter_v)]
    A      C  B
0   1  right  1
1   0  right  1
2   1  wrong  1
3   1  right  0
4 NaN  right  1

Finding where they match:

>>> df1[list(filter_v)] == pd.Series(filter_v)
       A      B      C
0   True  False   True
1  False  False   True
2   True  False  False
3   True   True   True
4  False  False   True

Finding where they all match:

>>> (df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)
0    False
1    False
2    False
3     True
4    False
dtype: bool

And finally using this to index into df1:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3
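A possible variant of the same steps uses DataFrame.eq, the method form of ==. With axis='columns' it matches the Series index against the column labels, so the comparison does not depend on the selected columns being in the same order as the dict. This is a sketch on the question's data; the reordered selection is only there to illustrate the alignment.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}
target = pd.Series(filter_v)

# eq(..., axis='columns') aligns target's labels (A, B, C) with the columns
mask = df1[list(filter_v)].eq(target, axis='columns').all(axis=1)
print(df1.loc[mask])

# Selecting the columns in a different order gives the same row mask
reordered = df1[['C', 'B', 'A']].eq(target, axis='columns').all(axis=1)
```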

Method 2

Here is a way to do it:

df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]

UPDATE:

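Because isin tests every cell against all of the values, a value meant for one column can also match in another (here 1 and 0 are valid in both A and B). To check each column against its own value, you could do something like this: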

# Create your filtering function:

def filter_dict(df, dic):
    # Keep the rows whose values in the dict's columns equal the dict's values, in order
    return df[df[list(dic)].apply(
            lambda x: x.equals(pd.Series(list(dic.values()), index=x.index, name=x.name)), axis=1)]

# Use it on your DataFrame:

filter_dict(df1, filter_v)

Which yields:

   A  B      C  D
3  1  0  right  3

If it is something you do frequently, you could go as far as patching DataFrame for easy access to this filter:

pd.DataFrame.filter_dict_ = filter_dict

And then use this filter like this:

df1.filter_dict_(filter_v)

Which would yield the same result.

But this is clearly not the right way to do it.
I would use DSM's approach (Method 1).
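If the point of patching pd.DataFrame is convenient chaining, DataFrame.pipe offers similar ergonomics without modifying the class. A minimal sketch; match_dict is a hypothetical helper name, and its body uses DSM's comparison rather than the apply-based filter_dict:

```python
import numpy as np
import pandas as pd

def match_dict(df, dic):
    # Keep rows where every column named in dic equals the corresponding value
    return df.loc[(df[list(dic)] == pd.Series(dic)).all(axis=1)]

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# pipe(func, *args) calls func(df1, *args), so no monkey-patching is needed
result = df1.pipe(match_dict, filter_v)
print(result)
```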

Method 3

For Python 2, @primer's answer is fine. In Python 3, however, you should be careful, because keys() returns a dict_keys view rather than a list. For instance:

>>> df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
TypeError: unhashable type: 'dict_keys'

The correct way in Python 3:

df.loc[df[list(filter_v.keys())].isin(list(filter_v.values())).all(axis=1), :]
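A small sketch of the distinction, assuming a tiny made-up frame: keys() gives a view object, while a plain list is a valid column selector. Since iterating a dict yields its keys, list(filter_v) is a shorter spelling of list(filter_v.keys()), which is the form Method 1 uses.

```python
import pandas as pd

# Hypothetical two-row frame, just to show column selection
df = pd.DataFrame({'A': [1, 0], 'B': [0, 0], 'C': ['right', 'wrong']})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

keys_view = filter_v.keys()
print(type(keys_view).__name__)  # dict_keys: a live view, not a list

# Iterating a dict yields its keys, so both spellings give the same list
cols = list(filter_v)
subset = df[cols]  # a plain list selects those columns
print(subset.columns.tolist())
```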

Method 4

A generalization of the above for the case where each key maps to a list of accepted values rather than a single value (analogous to pandas.core.series.Series.isin()). Using the same example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A':[1,0,1,1, np.nan], 'B':[1,1,1,0,1], 'C':['right','right','wrong','right', 'right'],'D':[1,2,2,3,4]})
filter_v = {'A':[1], 'B':[1,0], 'C':['right']}
##Start with a boolean Series of all True, aligned to df1's index
ind = pd.Series(True, index=df1.index)

##Loop through filters, updating index
for col, vals in filter_v.items():
    ind = ind & (df1[col].isin(vals))

##Return filtered dataframe
df1[ind]

##Returns

     A  B      C  D
0  1.0  1  right  1
3  1.0  0  right  3
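The loop above is a left fold over per-column masks, so it can also be written with functools.reduce. A minimal sketch on the same data; this is an alternative spelling of the same idea, not a different technique:

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': [1], 'B': [1, 0], 'C': ['right']}

# One boolean Series per column: is the cell among that column's accepted values?
masks = [df1[col].isin(vals) for col, vals in filter_v.items()]

# AND the masks together pairwise: ((m_A & m_B) & m_C)
ind = reduce(operator.and_, masks)
print(df1[ind])
```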

Method 5

Here’s another way:

filterSeries = pd.Series(np.ones(df.shape[0], dtype=bool), index=df.index)
for column, value in filter_v.items():
    filterSeries = (df[column] == value) & filterSeries

This gives:

>>> df[filterSeries]
   A  B      C  D
3  1  0  right  3
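A variant that skips index alignment entirely is to combine plain NumPy arrays with np.logical_and.reduce. In this sketch the non-default index (10 to 14) is an assumption, chosen to show that positions rather than labels are combined, so the index cannot cause a misalignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                   'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]},
                  index=[10, 11, 12, 13, 14])
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# One boolean array per condition, stripped of the index
conditions = [(df[column] == value).to_numpy() for column, value in filter_v.items()]

# Reduce along the first axis: True only where every condition holds
mask = np.logical_and.reduce(conditions)
print(df[mask])
```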

Method 6

To follow up on DSM’s answer, you can also use any() to turn your query into an OR operation (instead of AND):

df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).any(axis=1)]
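With the question's filter_v, every row of df1 has at least one True in the comparison table shown for Method 1, so any() would keep all five rows. The sketch below uses a hypothetical, more selective dict to show the OR behaviour:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
either = {'B': 0, 'C': 'wrong'}  # hypothetical filter: B is 0 OR C is 'wrong'

# any(axis=1): keep a row if at least one column matches its value
mask = (df1[list(either)] == pd.Series(either)).any(axis=1)
print(df1.loc[mask])
```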

Method 7

You can also build a query string and pass it to DataFrame.query:

query_string = ' and '.join(
    [f'({key} == "{val}")' if isinstance(val, str) else f'({key} == {val})' for key, val in filter_v.items()]
)

df1.query(query_string)
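A variant of the same idea, sketched under the assumption that the dict holds plain Python str, int or float values: repr() quotes strings correctly even if they contain a double quote, and backticks let column names contain spaces. dict_to_query is a hypothetical helper name. NumPy scalars may not produce valid query syntax through repr(), so convert them first if your dict contains them.

```python
import numpy as np
import pandas as pd

def dict_to_query(filters):
    # Backticks quote the column names; repr() quotes strings and
    # leaves plain Python numbers as bare literals
    return ' and '.join(f'`{key}` == {val!r}' for key, val in filters.items())

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

query_string = dict_to_query(filter_v)
print(query_string)
print(df1.query(query_string))
```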

Method 8

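I had an issue because my dictionary had multiple values for the same key.

I was able to change DSM's query to the following, which passes the dict straight to DataFrame.isin (every value in the dict must then be a list, e.g. {'A': [1], 'B': [1, 0], 'C': ['right']}):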

df1.loc[df1[list(filter_v)].isin(filter_v).all(axis=1), :]
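Since DataFrame.isin with a dict expects list-like values, a dict that mixes single values and lists needs its scalars wrapped first. A minimal sketch using pandas.api.types.is_list_like, which treats strings as scalars; the mixed dict here is an assumed example:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'], 'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': [1, 0], 'C': 'right'}  # mixes scalars and lists

# Wrap bare scalars (including strings) in a one-element list
as_lists = {col: val if is_list_like(val) else [val] for col, val in filter_v.items()}

# Each column is checked only against its own list of accepted values
mask = df1[list(as_lists)].isin(as_lists).all(axis=1)
print(df1.loc[mask])
```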

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
