Replicating rows in a pandas data frame by a column value

I want to replicate rows in a Pandas Dataframe. Each row should be repeated n times, where n is a field of each row.

import pandas as pd

what_i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'n' : [  1,   2,   3],
  'v' : [ 10,  13,   8]
})

what_i_want = pd.DataFrame(data={
  'id': ['A', 'B', 'B', 'C', 'C', 'C'],
  'v' : [ 10,  13,  13,   8,   8,   8]
})

Is this possible?

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can use Index.repeat to get repeated index values based on the column then select from the DataFrame:

df2 = df.loc[df.index.repeat(df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

df2 = df.loc[np.repeat(df.index.values, df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

After which there’s only a bit of cleaning up to do:

df2 = df2.drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Note that if you might have duplicate indices to worry about, you could use .iloc instead:

df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

which uses the positions, and not the index labels.

Method 2

You could use set_index and repeat

In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Details

In [1058]: df
Out[1058]:
  id  n   v
0  A  1  10
1  B  2  13
2  C  3   8

Method 3

Not the best solution, but I want to share this: you could also use pandas.reindex() and .repeat():

df.reindex(df.index.repeat(df.n)).drop('n', axis=1)

Output:

You can further append .reset_index(drop=True) to reset the .index.

Method 4

It’s something like the uncount in tidyr:

https://tidyr.tidyverse.org/reference/uncount.html

I wrote a package (https://github.com/pwwang/datar) that implements this API:

from datar import f
from datar.tibble import tribble
from datar.tidyr import uncount

what_i_have = tribble(
    f.id, f.n, f.v,
    'A',  1,   10,
    'B',  2,   13,
    'C',  3,   8
)
what_i_have >> uncount(f.n)

Output:

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating