Say I have the following pandas DataFrame:
df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})
   a          b
0  1     [1, 2]
1  2  [2, 3, 4]
2  3        [5]
How would I “unstack” the lists in the “b” column in order to transform it into the dataframe:
   a  b
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
Answers:
Method 1
UPDATE: a generic vectorized approach that also works for DataFrames with multiple columns.
Assuming we have the following DataFrame:
In [159]: df
Out[159]:
   a          b  c
0  1     [1, 2]  5
1  2  [2, 3, 4]  6
2  3        [5]  7
Solution:
In [160]: lst_col = 'b'

In [161]: pd.DataFrame({
     ...:     col: np.repeat(df[col].values, df[lst_col].str.len())
     ...:     for col in df.columns.difference([lst_col])
     ...: }).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns.tolist()]
     ...:
Out[161]:
   a  b  c
0  1  1  5
1  1  2  5
2  2  2  6
3  2  3  6
4  2  4  6
5  3  5  7
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [[1, 2], [2, 3, 4], [5]],
    "c": [5, 6, 7]
})
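For reuse, the one-liner above can be wrapped in a small helper. The following is only a sketch: the function name `unnest_list_column` is made up for illustration and is not part of pandas or of the original answer. It assumes every cell in the list column holds a list.

```python
import numpy as np
import pandas as pd


def unnest_list_column(df, lst_col):
    """Hypothetical helper: one output row per element of df[lst_col]."""
    # Number of elements in each list; .str.len() also works on list cells.
    lengths = df[lst_col].str.len()
    other_cols = df.columns.difference([lst_col])
    # Repeat each scalar column once per list element in the same row.
    out = pd.DataFrame(
        {col: np.repeat(df[col].values, lengths) for col in other_cols}
    )
    # Flatten all lists into a single 1-D array.
    out[lst_col] = np.concatenate(df[lst_col].values)
    # Restore the original column order.
    return out[df.columns.tolist()]


df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [[1, 2], [2, 3, 4], [5]],
    "c": [5, 6, 7],
})
result = unnest_list_column(df, "b")
```

Rows whose list is empty are repeated zero times, so they drop out of the result.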
Vectorized NumPy approach:
In [124]: pd.DataFrame({'a': np.repeat(df.a.values, df.b.str.len()),
     ...:               'b': np.concatenate(df.b.values)})
Out[124]:
   a  b
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
OLD answer:
Try this:
In [89]: df.set_index('a', append=True).b.apply(pd.Series).stack().reset_index(level=[0, 2], drop=True).reset_index()
Out[89]:
   a    0
0  1  1.0
1  1  2.0
2  2  2.0
3  2  3.0
4  2  4.0
5  3  5.0
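The values come back as floats because `apply(pd.Series)` first builds a wide frame with one column per list position and pads the shorter lists with NaN. A minimal sketch of that intermediate step (the variable names `wide` and `long` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# One column per list position; shorter lists are padded with NaN.
wide = df["b"].apply(pd.Series)
n_missing = int(wide.isna().sum().sum())

# Stacking merges the columns into one Series. The NaN padding forces a
# float dtype, so every value comes out as a float. dropna() is explicit
# here because whether stack() drops NaN itself depends on the pandas version.
long = wide.stack().dropna()
```

This is why the variant below casts back with `astype(int)` after stacking.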
Or a slightly nicer solution provided by @Boud:
In [110]: df.set_index('a').b.apply(pd.Series).stack().reset_index(level=-1, drop=True).astype(int).reset_index()
Out[110]:
   a  0
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
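As an aside that is not part of the original answer: pandas 0.25 and later ship a built-in `DataFrame.explode` that covers this case directly. A short sketch, assuming such a version is installed:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [[1, 2], [2, 3, 4], [5]],
    "c": [5, 6, 7],
})

# explode() emits one row per list element and repeats the other columns.
# The original index labels are repeated too, hence reset_index(drop=True).
exploded = df.explode("b").reset_index(drop=True)

# The exploded column keeps object dtype; cast it if integers are needed.
exploded["b"] = exploded["b"].astype(int)
```

One difference from the NumPy approach: `explode` keeps a row with NaN for an empty list instead of dropping it, in which case the integer cast above would fail.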
Method 2
Here is another approach, using itertuples:
import pandas as pd

df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})
data = []
for i in df.itertuples():
    lst = i[2]  # i[0] is the index, i[1] is column "a", i[2] is column "b"
    for col2 in lst:
        data.append([i[1], col2])
df_output = pd.DataFrame(data=data, columns=df.columns)
df_output
The output is:
   a  b
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
Edit: You can also collapse the loops into a single list comprehension to populate data:
data = [[i[1], col2] for i in df.itertuples() for col2 in i[2]]
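The positional indexes can be hard to read. Since `itertuples` yields named tuples, the same comprehension can be written with attribute access. This is a sketch that assumes the column names are valid Python identifiers, as "a" and "b" are here:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# index=False leaves the index out of each tuple, so the fields are just a and b.
data = [
    (row.a, item)
    for row in df.itertuples(index=False)
    for item in row.b
]
df_output = pd.DataFrame(data, columns=df.columns)
```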
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.