I have a large DataFrame (5000 × 12039) and I want to get the name of the column whose values match a NumPy array.
For example, if I have this table:
```
              m1lenhr  m1lenmin    m1citywt    m1a12a  cm1age  cm1numb  m1b1a  m1b1b    m1b12a    m1b12b  ...
challengeID
1            0.130765      40.0  202.485367  1.893256    27.0      1.0    2.0    0.0  2.254198  2.289966  ...
2            0.000000      40.0   45.608219  1.000000    24.0      1.0    2.0    0.0  2.000000  3.000000  ...
3            0.000000      35.0   39.060299  2.000000    23.0      1.0    2.0    0.0  2.254198  2.289966  ...
4            0.000000      30.0   22.304855  1.893256    22.0      1.0    3.0    0.0  2.000000  3.000000  ...
5            0.000000      25.0   35.518272  1.893256    19.0      1.0    1.0    6.0  1.000000  3.000000  ...
```
The ten trailing columns shown (kind_attention_scale_10, kind_attention_scale_22, kind_attention_scale_21, kind_attention_scale_15, kind_attention_scale_18, kind_attention_scale_19, kind_attention_scale_25, kind_attention_scale_24, kind_attention_scale_27 and kind_attention_scale_23) are 0 in every visible entry.
I want to do this:
```python
x = [40.0, 40.0, 35.0, 30.0, 25.0]
find_column(x)
```
and have find_column(x) return 'm1lenmin'.
Answers:
Method 1
Approach #1
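Here's one vectorized approach leveraging NumPy broadcasting. np.asarray(x)[:, None] turns x into a column of shape (n_rows, 1), which is compared against every column of df.values at once; .all(0) then keeps only the columns where every row matches:
```python
import numpy as np

# Boolean mask over the columns: True where the whole column equals x
df.columns[(df.values == np.asarray(x)[:, None]).all(0)]
```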
Sample run:
```
In [367]: df
Out[367]:
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [368]: x = df.iloc[:,2].values.tolist()

In [369]: x
Out[369]: [2, 3, 2, 4, 0]

In [370]: df.columns[(df.values == np.asarray(x)[:,None]).all(0)]
Out[370]: Int64Index([2], dtype='int64')
```
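To get the exact find_column(x) interface asked for in the question, returning one name rather than an Index, the broadcasting comparison can be wrapped in a small function. This is a sketch under assumptions: this hypothetical find_column takes the DataFrame as an extra argument, and raising ValueError when zero or several columns match is a design choice of the sketch, not something the answer above specifies. The frame below reuses three of the columns shown in the question.

```python
import numpy as np
import pandas as pd


def find_column(df, x):
    """Return the single column label whose values equal x element-wise.

    Hypothetical helper built on the broadcasting comparison above.
    Raises ValueError if no column, or more than one column, matches.
    """
    values = df.to_numpy()
    target = np.asarray(x)
    if target.shape != (values.shape[0],):
        raise ValueError(f"x must have length {values.shape[0]}")
    # target[:, None] has shape (n_rows, 1), so it broadcasts against
    # values (n_rows, n_cols); .all(axis=0) keeps columns where every row matches
    mask = (values == target[:, None]).all(axis=0)
    matches = df.columns[mask]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching column, found {list(matches)}")
    return matches[0]


# Three of the columns shown in the question (index = challengeID)
df = pd.DataFrame(
    {
        "m1lenhr": [0.130765, 0.0, 0.0, 0.0, 0.0],
        "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
        "cm1age": [27.0, 24.0, 23.0, 22.0, 19.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="challengeID"),
)

print(find_column(df, [40.0, 40.0, 35.0, 30.0, 25.0]))  # m1lenmin
```

On the question's 5000 × 12039 frame, the intermediate boolean array holds 5000 × 12039 = 60,195,000 elements, roughly 60 MB at one byte each, which is usually acceptable for a one-off lookup.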
Approach #2
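Alternatively, here's another approach using views. view1D reinterprets each row of a 2D array as a single np.void element covering all of that row's bytes, so after transposing df.values, every DataFrame column becomes one element that can be compared with x in a single ==. Because the comparison is on raw bytes, both arrays must have the same dtype: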
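```python
import numpy as np

def view1D(a, b):  # a, b: 2D arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # One void item per row, spanning all of that row's bytes
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

# Transpose so each column of df becomes a row; x becomes a single row
df1D_arr, x1D = view1D(df.values.T, np.asarray(x)[None])
out = np.flatnonzero(df1D_arr == x1D)  # positions of the matching columns
# df.columns[out] turns those positions into column names
```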
Sample run:
```
In [442]: df
Out[442]:
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [443]: x = df.iloc[:,5].values.tolist()

In [444]: df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None])

In [445]: np.flatnonzero(df1D_arr==x1D)
Out[445]: array([5])
```
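The core idea behind view1D, treating each column as one opaque block of bytes, can also be illustrated without np.void views by comparing ndarray.tobytes() strings. This is a hedged sketch, not the answer's method: columns_matching_bytes is a hypothetical name, it loops over the columns in Python (so it is slower than the view version), and it assumes numeric columns, because an object-dtype array stores references rather than values. Casting both sides to a common dtype with np.result_type is an added step so that, for example, an integer frame can be matched by a float x.

```python
import numpy as np
import pandas as pd


def columns_matching_bytes(df, x):
    """Hypothetical helper: compare each column with x as one block of raw bytes."""
    values = df.to_numpy()
    target = np.asarray(x)
    # Equal numbers only have equal bytes when the dtypes agree
    common = np.result_type(values, target)
    if common.kind == "O":
        raise TypeError("byte comparison needs numeric columns, not object dtype")
    cols = np.ascontiguousarray(values.T, dtype=common)  # row i == column i of df
    target_bytes = np.ascontiguousarray(target, dtype=common).tobytes()
    hits = [i for i, col in enumerate(cols) if col.tobytes() == target_bytes]
    return [df.columns[i] for i in hits]


# The 5 x 10 frame from the sample runs above
df = pd.DataFrame([[7, 1, 2, 6, 2, 1, 7, 2, 0, 6],
                   [5, 4, 3, 3, 2, 1, 1, 1, 5, 5],
                   [7, 7, 2, 2, 5, 4, 6, 6, 5, 7],
                   [0, 5, 4, 1, 5, 7, 8, 2, 2, 4],
                   [7, 1, 0, 4, 5, 4, 3, 2, 8, 6]])

print(columns_matching_bytes(df, [1, 1, 4, 7, 4]))  # [5]
```

Byte-level equality is not quite the same as ==: 0.0 and -0.0 do not match, while two NaNs with the same bit pattern do.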
Method 2
Try applying Series.eq to each column and checking whether every element matches:
```
In [91]: x = np.array(x)

In [94]: df.apply(lambda col: col.eq(x).all())
Out[94]:
m1lenhr     False
m1lenmin     True
m1citywt    False
m1a12a      False
cm1age      False
cm1numb     False
m1b1a       False
m1b1b       False
m1b12a      False
m1b12b      False
dtype: bool

In [95]: df.columns[df.apply(lambda col: col.eq(x).all()).values]
Out[95]: Index(['m1lenmin'], dtype='object')
```
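All of these methods use exact equality. That works for the question's x because 40.0, 35.0 and so on are whole numbers, but values copied from printed output such as 202.485367 are rounded to pandas' default six displayed decimals and may not equal the stored floats. A tolerant variant swaps col.eq for np.isclose. In the sketch below, the extra digits in the m1citywt column are made up purely to illustrate that rounding, and the default np.isclose tolerances (rtol=1e-05, atol=1e-08) may need tuning so that near-duplicate columns are not matched by accident.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
    # hypothetical extra precision hidden by the 6-decimal display
    "m1citywt": [202.48536712, 45.60821934, 39.06029871, 22.30485512, 35.51827203],
})
typed = [202.485367, 45.608219, 39.060299, 22.304855, 35.518272]  # copied from the printout

exact = df.apply(lambda col: col.eq(typed).all())             # Method 2 as written
close = df.apply(lambda col: np.isclose(col, typed).all())    # tolerant comparison

print(df.columns[exact.values].tolist())  # []
print(df.columns[close.values].tolist())  # ['m1citywt']
```

np.isclose also accepts equal_nan=True if columns containing NaN should be able to match.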
Method 3
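You can use DataFrame.eq (element-wise equality) with the axis parameter set to 0 or 'index', so that the list is lined up with the rows of each column, and then .all(0) to keep the columns where every comparison is True: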
```python
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 5, 6], 'B': [1, 2, 2, 2]})
df.columns[df.eq([1, 2, 2, 2], axis=0).all(0)]
```
or
```python
df.columns[df.eq([1, 2, 2, 2], axis='index').all('index')]
```
Output:
```
Index(['B'], dtype='object')
```
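To see why the axis argument matters, it helps to look at the intermediate boolean frame. A short sketch with the same df: with axis=0 the four-element list is paired with the four rows, so each column is checked against it; with the default axis='columns', pandas tries to pair the list with the two columns instead and raises ValueError because the lengths differ.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 5, 6], 'B': [1, 2, 2, 2]})
target = [1, 2, 2, 2]

# axis=0: target[i] is compared with row i of every column
matches = df.eq(target, axis=0)
print(matches)

# .all(0) reduces each column to a single bool; only 'B' is all True
print(df.columns[matches.all(0)].tolist())  # ['B']

# Default axis='columns': the list would be paired with the 2 columns,
# so a length-4 list raises ValueError
try:
    df.eq(target)
except ValueError as exc:
    print('ValueError:', exc)
```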
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.