Missing data: insert rows in Pandas and fill with NaN

I’m new to Python and Pandas so there might be a simple solution which I don’t see.

I have a number of discontinuous datasets which look like this:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   3.5  2  0  
4   4.0  4  5  
5   4.5  3  3

I am looking for a way to get the following:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   1.5  NAN NAN  
4   2.0  NAN NAN  
5   2.5  NAN NAN  
6   3.0  NAN NAN  
7   3.5  2  0  
8   4.0  4  5  
9   4.5  3  3

The problem is that the gap in A varies in position and length from dataset to dataset…

Answers:

Method 1

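set_index and reset_index are your friends. The snippets below assume that DataFrame and Index have been imported from pandas and arange from numpy.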

df = DataFrame({"A":[0,0.5,1.0,3.5,4.0,4.5], "B":[1,4,6,2,4,3], "C":[3,2,1,0,5,3]})

First move column A to the index:

In [64]: df.set_index("A")
Out[64]: 
     B  C
 A        
0.0  1  3
0.5  4  2
1.0  6  1
3.5  2  0
4.0  4  5
4.5  3  3

Then reindex with a new index; rows for the values missing from A are filled with NaN. We build the new index as an Index object so that we can name it, and that name is used in the next step.

In [66]: new_index = Index(arange(0,5,0.5), name="A")
In [67]: df.set_index("A").reindex(new_index)
Out[67]: 
      B   C
0.0   1   3
0.5   4   2
1.0   6   1
1.5 NaN NaN
2.0 NaN NaN
2.5 NaN NaN
3.0 NaN NaN
3.5   2   0
4.0   4   5
4.5   3   3

Finally move the index back to the columns with reset_index. Since we named the index, the restored column is called A again:

In [69]: df.set_index("A").reindex(new_index).reset_index()
Out[69]: 
       A   B   C
0    0.0   1   3
1    0.5   4   2
2    1.0   6   1
3    1.5 NaN NaN
4    2.0 NaN NaN
5    2.5 NaN NaN
6    3.0 NaN NaN
7    3.5   2   0
8    4.0   4   5
9    4.5   3   3
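Since the position and length of the gap differ between datasets, the three steps above can be wrapped in a helper that works out the grid from the data. This is only a sketch: the name `fill_gaps` is hypothetical, and it assumes the values in the column are unique and that the smallest spacing between consecutive values is the intended step.

```python
import numpy as np
import pandas as pd


def fill_gaps(df, column):
    """Insert NaN rows so `column` forms a regular grid (hypothetical helper)."""
    values = df[column].sort_values()
    # Assumption: the smallest spacing between consecutive values is the true step
    step = values.diff().min()
    start, stop = values.iloc[0], values.iloc[-1]
    # Count the grid points up front instead of relying on a float end value
    n_points = int(round((stop - start) / step)) + 1
    grid = pd.Index(start + step * np.arange(n_points), name=column)
    # Grid values absent from the data become rows of NaN
    return df.set_index(column).reindex(grid).reset_index()


df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
filled = fill_gaps(df, "A")
```

If the step is known in advance, passing it in explicitly would be safer than inferring it, for example when a dataset happens to contain no two adjacent samples.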

Method 2

Building on EdChum's merge-based answer (Method 3 below), I created the following function:

import numpy as np
import pandas as pd

def fill_missing_range(df, field, range_from, range_to, range_step=1, fill_with=0):
    # range_to is exclusive, as in np.arange
    full = pd.DataFrame({field: np.arange(range_from, range_to, range_step)})
    return (df.merge(full, how='right', on=field)
              .sort_values(by=field)
              .reset_index(drop=True)
              .fillna(fill_with))

Example usage:

fill_missing_range(df, 'A', 0.0, 5.0, 0.5, np.nan)  # the end of the range is exclusive, so pass 5.0 to keep the A == 4.5 row
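When filling by merge, it can be handy to know afterwards which rows were inserted. One way to sketch this is with the `indicator` parameter of `DataFrame.merge`; the sample frame is rebuilt here so the snippet runs on its own, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})

# Full grid for A; the end is pushed one step past 4.5 because np.arange excludes it
grid = pd.DataFrame({"A": np.arange(0.0, 4.5 + 0.5, 0.5)})

# indicator=True adds a "_merge" column: "both" for original rows,
# "right_only" for rows that exist only in the generated grid
merged = df.merge(grid, how="right", on="A", indicator=True)
inserted = merged["_merge"] == "right_only"
```

Here `merged.loc[inserted, "A"]` lists the A values that had to be added, and the `_merge` column can be dropped once it is no longer needed.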

Method 3

In this case I generate a new dataframe containing the full range of A values, right-merge it with your original df, and then sort the result again (sort_values replaces the old sort method, which newer pandas versions no longer provide):

In [177]:

df.merge(how='right', on='A', right=pd.DataFrame({'A': np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + 0.5, 0.5)})).sort_values(by='A').reset_index(drop=True)
Out[177]:
     A   B   C
0  0.0   1   3
1  0.5   4   2
2  1.0   6   1
3  1.5 NaN NaN
4  2.0 NaN NaN
5  2.5 NaN NaN
6  3.0 NaN NaN
7  3.5   2   0
8  4.0   4   5
9  4.5   3   3

So in the general case you can adjust the call to arange, which takes a start value, an end value and a step. Note that I added 0.5 to the end because the range is half-open: the end value itself is excluded.
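The end-point behaviour is easy to check in isolation. The sketch below uses the values from this question; `np.linspace` is shown as an alternative because it includes the end point by default and takes a number of points rather than a step, which the NumPy documentation suggests for non-integer steps.

```python
import numpy as np

start, stop, step = 0.0, 4.5, 0.5

# The stop value is excluded: the last element is 4.0, not 4.5
without_end = np.arange(start, stop, step)

# Pushing stop out by one step includes 4.5, as in the answer above
with_end = np.arange(start, stop + step, step)

# np.linspace includes the end point and takes a count instead of a step
num = int(round((stop - start) / step)) + 1
via_linspace = np.linspace(start, stop, num)
```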

A more general method could be like this:

In [197]:

full_range = np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + 0.5, 0.5)
df = df.set_index('A').reindex(pd.Index(full_range, name='A')).reset_index()
df
Out[197]:
     A   B   C
0  0.0   1   3
1  0.5   4   2
2  1.0   6   1
3  1.5 NaN NaN
4  2.0 NaN NaN
5  2.5 NaN NaN
6  3.0 NaN NaN
7  3.5   2   0
8  4.0   4   5
9  4.5   3   3

Here we move column A to the index, reindex the df against the full range generated by arange, and then move the named index back to a column with reset_index.

Method 4

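This question was asked a long time ago, but there is a simple related point worth mentioning. To set an existing cell to NaN, assign NumPy’s nan through an indexer. For instance:

import numpy as np
df.iloc[i, j] = np.nan  # i and j are integer positions; np.NaN was removed in NumPy 2.0

This sets the cell at row i, column j to NaN. It does not add the missing rows, so it complements the methods above rather than replacing them.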
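For cell-level assignment, a minimal sketch with both the position-based and label-based indexers might look like this. The small frame is made up for illustration, and the integer column is converted to float first because NaN is a float value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0], "B": [1, 4, 6], "C": [3, 2, 1]})

# NaN is a float, so convert the integer column explicitly
# rather than relying on implicit upcasting
df["B"] = df["B"].astype(float)

# Position-based: row 1, column 1 (column "B")
df.iloc[1, 1] = np.nan

# Label-based: row label 2, column "B"
df.loc[2, "B"] = np.nan
```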


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
