Let's say I have a MultiIndex Series s:
>>> s
     values
a b
1 2  0.1
3 6  0.3
4 4  0.7
and I want to apply a function which uses the index of the row:
def f(x):
    # conditions or computations using the indexes
    if x.index[0] and ...:
        other = sum(x.index) + ...
    return something
How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the same MultiIndex, holding the values that result from applying this function to each row.
Answers:
Method 1
I don’t believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:
In [27]: s.apply(lambda x: type(x))
Out[27]:
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>
To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.
Series(s.reset_index().apply(f, axis=1).values, index=s.index)
Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterate with s.items() (a Series has no iterrows()), which is likely to be slower, perhaps depending on exactly what f does.
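As a rough, runnable sketch of the reset_index() workaround described above, here is the Series from the question rebuilt by hand. The rule inside f and the helper g are hypothetical placeholders, since the question leaves the actual computation elided; the Series name "values" is assumed from the printed header.

```python
import pandas as pd

# Rebuild the Series from the question: index levels "a" and "b".
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")


def f(row):
    # Hypothetical rule: add both index levels to the value.
    # After reset_index(), the levels are ordinary columns "a" and "b".
    return row["a"] + row["b"] + row["values"]


# Promote the index to columns, apply row-wise, then restore the index.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)


def g(idx, value):
    # Same hypothetical rule, written against the (index tuple, value) pair.
    a, b = idx
    return a + b + value


# Alternative: iterate over (index, value) pairs without building a frame.
alt = pd.Series([g(idx, value) for idx, value in s.items()], index=s.index)
```

Both result and alt keep the original MultiIndex. Which one is faster probably depends on the size of the Series and on what the function does, so it is worth timing on real data.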
Method 2
Convert it to a DataFrame, and return scalars if you want (so that the result is a Series).
Setup
In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]:
a    1
b    2
c    3
dtype: float64
Printing function
In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....:

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]:
   0
a  1
b  2
c  3
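Since you can return anything here, just return the scalars. With axis=1 (the second argument in the call that follows), each row is passed as a Series, and its index label is available via the name attribute: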
In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], 1)
Out[15]:
a    5
b    2
c    3
dtype: float64
Method 3
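Convert to a DataFrame and apply along the rows. Inside f you can access the index label as x.name; x is now a one-value Series. The trailing [0] picks the single column back out when f returns the row itself; if f returns a scalar, apply already yields a Series and the [0] should be dropped.
s.to_frame(0).apply(f, axis=1)[0]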
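For the MultiIndex case from the question, the row's name attribute is a tuple of the level values. A minimal sketch, assuming a hypothetical rule (double the value when both levels match) that stands in for the elided logic in the question:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(row):
    a, b = row.name        # index labels of this row, as a tuple
    value = row.iloc[0]    # the single value in the one-column frame
    # Hypothetical rule: double the value when the two levels are equal.
    return value * 2 if a == b else value


# f returns a scalar, so apply() yields a Series with the original MultiIndex.
result = s.to_frame().apply(f, axis=1)
```

Because f returns a scalar here, no column selection is needed after the apply call.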
Method 4
You may find it faster to use where rather than apply here:
In [11]: s = pd.Series([1., 2., 3.], index=['a', 'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]:
a    5
b    2
c    3
dtype: float64
You can also apply numpy-style logic or functions to any of the parts:
In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]:
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]:
a   -1
b    5
c    7
dtype: float64
I recommend testing for speed, as the efficiency relative to apply will depend on the function. That said, I find apply more readable.
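The same where idea can be extended to the MultiIndex Series from the question by pulling each level out with s.index.get_level_values. This is only a sketch: the rule (add both levels to the value when they are equal) is a hypothetical stand-in for whatever f would compute.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# One array per index level, aligned position-by-position with s.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()

# Keep the value where the levels differ; otherwise use value + a + b.
result = s.where(a != b, s + a + b)
```

No Python-level function is called per row here, which is usually where the speed advantage over apply comes from.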
Method 5
You can access the whole row as the argument inside the function if you use DataFrame.apply() instead of Series.apply().
import pandas as pd
import numpy as np

def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    if row['N1'] == 1:
        return 0
    else:
        return 1

df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)
Method 6
Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.
The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.
With a Singly Indexed Series
s = pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2
Output:
idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object
With a Multi-Indexed Series
Same concept here, but you’ll need to access the index values as row['level_*'] because that’s where they’re placed by Series.reset_index().
s = pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2
Output:
idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object
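If your Series or index levels have names, you will need to adjust accordingly: reset_index() uses the level names as column names instead of 'index' or 'level_*', and the Series name instead of 0 for the value column.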
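One way to avoid depending on the default column names is to set them explicitly before applying the function. The sketch below uses rename_axis and the name argument of Series.reset_index; the labels "first", "second", and "value", the sample data, and the describe helper are all hypothetical choices made for illustration.

```python
import pandas as pd

s = pd.Series({("x", "y"): "val1", ("p", "q"): "val2"})

# Name the index levels and the value column explicitly, so the function
# does not rely on the 'level_0' / 'level_1' / 0 defaults.
frame = s.rename_axis(["first", "second"]).reset_index(name="value")


def describe(row):
    return "{},{} -> {}".format(row["first"], row["second"], row["value"])


s2 = frame.apply(describe, axis=1)

# Replace the auto index with the index from the original Series.
s2.index = s.index
```

With the columns named up front, the row function reads the same regardless of whether the original Series or its index carried names.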
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.