Find column name in pandas that matches an array

I have a large dataframe (5000 x 12039) and I want to get the column name that matches a numpy array.

For example, if I have the table

        m1lenhr m1lenmin    m1citywt    m1a12a  cm1age  cm1numb m1b1a   m1b1b   m1b12a  m1b12b  ... kind_attention_scale_10 kind_attention_scale_22 kind_attention_scale_21 kind_attention_scale_15 kind_attention_scale_18 kind_attention_scale_19 kind_attention_scale_25 kind_attention_scale_24 kind_attention_scale_27 kind_attention_scale_23
challengeID                                                                                 
1   0.130765    40.0    202.485367  1.893256    27.0    1.0 2.0 0.0 2.254198    2.289966    ... 0   0   0   0   0   0   0   0   0   0
2   0.000000    40.0    45.608219   1.000000    24.0    1.0 2.0 0.0 2.000000    3.000000    ... 0   0   0   0   0   0   0   0   0   0
3   0.000000    35.0    39.060299   2.000000    23.0    1.0 2.0 0.0 2.254198    2.289966    ... 0   0   0   0   0   0   0   0   0   0
4   0.000000    30.0    22.304855   1.893256    22.0    1.0 3.0 0.0 2.000000    3.000000    ... 0   0   0   0   0   0   0   0   0   0
5   0.000000    25.0    35.518272   1.893256    19.0    1.0 1.0 6.0 1.000000    3.000000    ... 0

I want to do this:

x = [40.0, 40.0, 35.0, 30.0, 25.0]
find_column(x)

and have find_column(x) return m1lenmin

Answers:

Method 1

Approach #1

Here’s one vectorized approach that uses NumPy broadcasting: compare every column against x in a single operation and keep the columns where all rows match.

df.columns[(df.values == np.asarray(x)[:,None]).all(0)]

Sample run –

In [367]: df
Out[367]: 
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [368]: x = df.iloc[:,2].values.tolist()

In [369]: x
Out[369]: [2, 3, 2, 4, 0]

In [370]: df.columns[(df.values == np.asarray(x)[:,None]).all(0)]
Out[370]: Int64Index([2], dtype='int64')
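To connect this back to the question, the one-liner can be wrapped in the find_column helper the asker wanted. This is only a sketch: the df argument, the length check, and the small sample frame (three columns copied from the question's table) are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

def find_column(df, x):
    """Return the labels of all columns whose values equal x element-wise."""
    x = np.asarray(x)
    if x.shape[0] != len(df):
        raise ValueError("x must have one entry per row of df")
    # x[:, None] has shape (n_rows, 1), so it broadcasts across every column
    mask = (df.to_numpy() == x[:, None]).all(axis=0)
    return df.columns[mask]

# Hypothetical sample: three columns taken from the question's table
df = pd.DataFrame({
    "m1lenhr": [0.130765, 0.0, 0.0, 0.0, 0.0],
    "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
    "cm1age": [27.0, 24.0, 23.0, 22.0, 19.0],
})
result = find_column(df, [40.0, 40.0, 35.0, 30.0, 25.0])
print(list(result))
```

The helper returns an Index rather than a single label, because more than one column can match the same array.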

Approach #2

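Alternatively, here’s another approach using views: each column is reinterpreted as a single void-dtype element, so a whole column can be compared against x with one equality test.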

def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None])
out = np.flatnonzero(df1D_arr==x1D)

Sample run –

In [442]: df
Out[442]: 
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [443]: x = df.iloc[:,5].values.tolist()

In [444]: df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None])

In [445]: np.flatnonzero(df1D_arr==x1D)
Out[445]: array([5])
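The positions returned above can be mapped back to column labels. The sketch below does that and also casts x to the frame's dtype first, since the view compares raw bytes and an int64 list would never match float64 columns. The function name find_column_view and the sample frame are assumptions for illustration, and the frame is assumed to have a single numeric dtype.

```python
import numpy as np
import pandas as pd

def find_column_view(df, x):
    """Sketch: match x against columns by comparing their raw bytes."""
    # One row per original column, laid out contiguously in memory
    a = np.ascontiguousarray(df.to_numpy().T)
    # Cast x to the same dtype so both sides have the same byte layout
    b = np.ascontiguousarray(np.asarray(x, dtype=a.dtype)[None])
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    hits = np.flatnonzero(a.view(void_dt).ravel() == b.view(void_dt).ravel())
    return df.columns[hits]

# Hypothetical sample: three columns taken from the question's table
df = pd.DataFrame({
    "m1lenhr": [0.130765, 0.0, 0.0, 0.0, 0.0],
    "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
    "cm1age": [27.0, 24.0, 23.0, 22.0, 19.0],
})
result = find_column_view(df, [40, 40, 35, 30, 25])  # ints are cast to float64
print(list(result))
```

Because the comparison is byte-wise, values that compare equal numerically but differ in bits (such as 0.0 and -0.0) would not match here.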

Method 2

Try this:

In [91]: x = np.array(x)

In [94]: df.apply(lambda col: col.eq(x).all())
Out[94]:
m1lenhr     False
m1lenmin     True
m1citywt    False
m1a12a      False
cm1age      False
cm1numb     False
m1b1a       False
m1b1b       False
m1b12a      False
m1b12b      False
dtype: bool

In [95]: df.columns[df.apply(lambda col: col.eq(x).all()).values]
Out[95]: Index(['m1lenmin'], dtype='object')
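If only the first matching name is needed, a variation on this column-by-column idea is to stop at the first hit instead of testing all 12039 columns. This is a swapped-in variant, not part of the answer above; the name find_first_column and the sample frame are hypothetical, and unique column labels are assumed.

```python
import numpy as np
import pandas as pd

def find_first_column(df, x):
    """Return the first column label whose values all equal x, or None."""
    x = np.asarray(x)
    # The generator is evaluated lazily, so iteration stops at the first match
    return next((name for name in df.columns if df[name].eq(x).all()), None)

# Hypothetical sample: three columns taken from the question's table
df = pd.DataFrame({
    "m1lenhr": [0.130765, 0.0, 0.0, 0.0, 0.0],
    "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
    "cm1age": [27.0, 24.0, 23.0, 22.0, 19.0],
})
first = find_first_column(df, [40.0, 40.0, 35.0, 30.0, 25.0])
missing = find_first_column(df, [0.0, 0.0, 0.0, 0.0, 0.0])
print(first, missing)
```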

Method 3

You can use DataFrame.eq (element-wise equality) with the axis parameter set to 0 or 'index', so that the list is aligned with the rows and compared against every column:

df = pd.DataFrame({'A': [3, 4, 5, 6], 'B': [1, 2, 2, 2]})

df.columns[df.eq([1, 2, 2, 2], axis=0).all(0)]

or

df.columns[df.eq([1, 2, 2, 2], axis='index').all('index')]

Output:

Index(['B'], dtype='object')
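All of the methods above test exact equality. With float columns like those in the question, values typed in from printed output may differ from the stored ones in the last digits, so a tolerance-based comparison can be useful. The sketch below swaps in np.isclose for that purpose; the name find_column_close and the sample data are assumptions, and the frame is assumed to be fully numeric.

```python
import numpy as np
import pandas as pd

def find_column_close(df, x, rtol=1e-05, atol=1e-08):
    """Return labels of columns that match x within a tolerance."""
    x = np.asarray(x, dtype=float)
    values = df.to_numpy(dtype=float)
    # Same broadcasting idea as exact equality, but with np.isclose
    mask = np.isclose(values, x[:, None], rtol=rtol, atol=atol).all(axis=0)
    return df.columns[mask]

# 0.1 + 0.2 is stored as 0.30000000000000004, which is not exactly 0.3
df = pd.DataFrame({"A": [0.1 + 0.2, 1.0], "B": [0.3, 2.0]})

exact = df.columns[df.eq([0.3, 1.0], axis=0).all(0)]
close = find_column_close(df, [0.3, 1.0])
print(list(exact), list(close))
```

The rtol and atol defaults shown are the same as np.isclose's own defaults; tighter or looser values may suit other data.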


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
