Suppose I have a dataframe like this:
0 5 10 15 20 25 ...
action_0_Q0 0.299098 0.093973 0.761735 0.058112 0.013463 0.164322 ...
action_0_Q1 0.463095 0.468425 0.202679 0.742424 0.865005 0.479546 ...
action_0_Q2 0.237807 0.437602 0.035587 0.199465 0.121532 0.356132 ...
action_1_Q0 0.263191 0.176407 0.471295 0.082457 0.029566 0.426428 ...
action_1_Q1 0.508573 0.490355 0.431732 0.249432 0.189732 0.396947 ...
action_1_Q2 0.228236 0.333238 0.096973 0.668111 0.780702 0.176625 ...
action_2_Q0 0.256632 0.122589 0.495720 0.059918 0.824424 0.384998 ...
action_2_Q1 0.485362 0.462969 0.420790 0.211578 0.155771 0.186493 ...
action_2_Q2 0.258006 0.414442 0.083490 0.728504 0.019805 0.428509 ...
This dataframe may be very large (a lot of rows, about 3000 columns).
What I need to do is apply a function to each column, and that function returns a distance matrix. However, the function must take the rows in groups of 3 (one group per action) rather than one row at a time. For example, taking the first column:
a = distance_function([[0.299098, 0.463095, 0.237807], [0.263191, 0.508573, 0.228236], [0.256632, 0.485362, 0.258006]])
# The result is a 3x3 distance matrix
print(a.shape)  # (3, 3)
This is not hard to do with a for loop, but it would take a very long time. Is there a faster alternative?
Answers:
Method 1
IIUC use:
df = df.apply(lambda x: distance_function(x.to_numpy().reshape(-1,3)))
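To make the reshape step concrete, here is a minimal sketch on a small frame with the same row layout as the question. The question does not show `distance_function`, so the version below (pairwise Euclidean distance between rows) is only a hypothetical stand-in, and the sample data is random. The sketch collects the matrices in a plain dict keyed by column label, which is one simple way to hold a 2-D result per column.

```python
import numpy as np
import pandas as pd


def distance_function(points):
    """Hypothetical stand-in: pairwise Euclidean distances between rows."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Small frame with the same layout as the question: 3 actions x 3 Q rows.
index = [f"action_{a}_Q{q}" for a in range(3) for q in range(3)]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((9, 4)), index=index, columns=[0, 5, 10, 15])

# One column becomes a (3, 3) array: one row per action, one column per Q.
first = df[0].to_numpy().reshape(-1, 3)

# Collect one distance matrix per column, keyed by column label.
matrices = {
    col: distance_function(df[col].to_numpy().reshape(-1, 3))
    for col in df.columns
}
```

Here `first[0]` holds the three Q values of `action_0`, which matches the grouping shown in the question.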
If you need flattened values:
from itertools import chain
df = df.apply(lambda x: list(chain.from_iterable(distance_function(x.to_numpy().reshape(-1,3)))))
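Note that `apply` still calls the function once per column. If the distance happens to be something NumPy can express directly, all columns can be processed in one shot with broadcasting. The sketch below assumes pairwise Euclidean distance, which the question does not confirm, and uses random sample data; names such as `blocks` and `all_dist` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame with the question's layout: 3 actions x 3 Q rows, 5 columns.
index = [f"action_{a}_Q{q}" for a in range(3) for q in range(3)]
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((9, 5)), index=index, columns=[0, 5, 10, 15, 20])

# Shape (n_columns, 3 actions, 3 Q values): one (3, 3) block per column.
blocks = df.to_numpy().T.reshape(df.shape[1], -1, 3)

# Pairwise differences between actions within each block, via broadcasting.
diff = blocks[:, :, None, :] - blocks[:, None, :, :]
all_dist = np.sqrt((diff ** 2).sum(axis=-1))  # shape (n_columns, 3, 3)

# Optional: flatten each matrix into one column, like the flattened variant.
flat = pd.DataFrame(all_dist.reshape(df.shape[1], -1).T, columns=df.columns)
```

`all_dist[i]` is the distance matrix for the i-th column. For a different metric this particular trick may not apply, and the per-column version remains the safe fallback.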
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.