I would like to know if there is some way of replacing all negative numbers in a DataFrame with zeros.
Answers:
Method 1
If all your columns are numeric, you can use boolean indexing:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [3]: df
Out[3]:
a b
0 0 -3
1 -1 2
2 2 1
In [4]: df[df < 0] = 0
In [5]: df
Out[5]:
a b
0 0 0
1 0 2
2 2 1
For the more general case, this answer shows the private method _get_numeric_data:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
...: 'c': ['foo', 'goo', 'bar']})
In [3]: df
Out[3]:
a b c
0 0 -3 foo
1 -1 2 goo
2 2 1 bar
In [4]: num = df._get_numeric_data()
In [5]: num[num < 0] = 0
In [6]: df
Out[6]:
a b c
0 0 0 foo
1 0 2 goo
2 2 1 bar
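Since _get_numeric_data is a private method and may change between pandas releases, a rough equivalent that relies only on public API is sketched below. It assumes the same sample frame as above; num_cols is a name introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Public alternative to the private _get_numeric_data():
# pick out the numeric columns by dtype.
num_cols = df.select_dtypes(include='number').columns

# Clip only those columns at zero; the string column 'c' is left untouched.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

Unlike the private-method version, this assigns the cleaned columns back explicitly rather than relying on num being a view of df.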
With timedelta type, boolean indexing seems to work on separate columns, but not on the whole dataframe. So you can do:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df
Out[3]:
a b
0 0 days -3 days
1 -1 days 2 days
2 2 days 1 days
In [4]: for k, v in df.items():  # iteritems() was removed in pandas 2.0
...: v[v < 0] = 0
...:
In [5]: df
Out[5]:
a b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Update: comparison with a pd.Timedelta works on the whole DataFrame:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df[df < pd.Timedelta(0)] = 0
In [4]: df
Out[4]:
a b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
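As a hedged variation on the whole-DataFrame comparison, the sketch below uses where with pd.Timedelta(0) as the replacement value instead of the integer 0, so the columns should keep their timedelta dtype. The names zero and result are introduced here for illustration, and the behaviour is assumed for a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], unit='D'),
                   'b': pd.to_timedelta([-3, 2, 1], unit='D')})

zero = pd.Timedelta(0)

# where() keeps entries for which the condition is True and
# replaces all others, so negative durations become a zero Timedelta.
result = df.where(df >= zero, zero)

print(result)
```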
Method 2
Another succinct way of doing this is pandas.DataFrame.clip.
For example:
In [19]: import pandas as pd
In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})
In [21]: df
Out[21]:
a
0 -1
1 100
2 -2
In [22]: df.clip(lower=0)
Out[22]:
a
0 0
1 100
2 0
Older pandas versions also had df.clip_lower(0), but it was deprecated in 0.24 and removed in 1.0, so prefer df.clip(lower=0).
Method 3
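Perhaps you could use pandas.DataFrame.where. It keeps the values where the condition is true and replaces the rest, so the condition has to select the values you want to keep:
data_frame = data_frame.where(data_frame >= 0, 0)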
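The direction of the condition is easy to get backwards, so here is a small worked sketch on the sample frame used in the other answers (the data is reused for illustration only). where keeps entries for which the condition is True and substitutes the second argument everywhere else.

```python
import pandas as pd

data_frame = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# Keep non-negative values, replace everything else with 0.
result = data_frame.where(data_frame >= 0, 0)

print(result)
```

Writing the condition as data_frame < 0 instead would keep the negative values and zero out the rest, which is the opposite of what the question asks for.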
Method 4
Another clean option that I have found useful is pandas.DataFrame.mask which will “replace values where the condition is true.”
Create the DataFrame:
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [4]: df
Out[4]:
a b
0 0 -3
1 -1 2
2 2 1
Replace negative numbers with 0:
In [5]: df.mask(df < 0, 0)
Out[5]:
a b
0 0 0
1 0 2
2 2 1
Or, replace negative numbers with NaN, which I frequently need:
In [7]: df.mask(df < 0)
Out[7]:
a b
0 0.0 NaN
1 NaN 2.0
2 2.0 1.0
Method 5
With a lambda function, applied to a single column:
df['column'] = df['column'].apply(lambda x : x if x > 0 else 0)
Method 6
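If you are dealing with a large df (40m x 700 in my case), iterating over the columns was much faster and used less memory. Use .loc rather than chained indexing (df[col][...] = 0), which triggers SettingWithCopyWarning and may not modify df at all:
for col in df.columns:
    df.loc[df[col] < 0, col] = 0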
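A minimal sketch of the column-by-column approach that also skips non-numeric columns is shown below. The helper name zero_negatives_by_column is hypothetical, and the sketch assumes unique column names.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def zero_negatives_by_column(df):
    """Set negative values to 0 in place, one numeric column at a time."""
    for col in df.columns:
        # Skip strings, dates and other non-numeric columns.
        if is_numeric_dtype(df[col]):
            # .loc writes into df directly, avoiding chained assignment.
            df.loc[df[col] < 0, col] = 0
    return df


df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3.5, 2.0, 1.0],
                   'c': ['foo', 'goo', 'bar']})
zero_negatives_by_column(df)
print(df)
```

Working one column at a time means only a single boolean mask the length of the frame is held in memory at once, rather than a mask the size of the whole frame.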
Method 7
A slight modification of the answers above.
First, select all the numeric columns into a new dataframe.
Then replace the negative values in the new dataframe with NaN (this needs import numpy as np):
df_numeric = df.select_dtypes(include=[np.number])
df_numeric = df_numeric.where(lambda x: x >= 0, np.nan)
Now drop the original numeric columns from the main dataframe and concatenate the cleaned columns back onto it. Note that this moves the numeric columns to the end:
numeric_cols = df_numeric.columns.values
df = df.drop(columns=numeric_cols)
df = pd.concat([df, df_numeric], axis=1)
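If the original column order matters, one possible variation is to assign the cleaned numeric columns back in place instead of dropping and concatenating. This is only a sketch; the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2],
                   'c': ['foo', 'goo', 'bar'],
                   'b': [-3, 2, 1]})

# Names of the numeric columns only.
numeric_cols = df.select_dtypes(include=[np.number]).columns

# Keep non-negative values, turn negatives into NaN, and write the
# result back under the same column names so the order is unchanged.
df[numeric_cols] = df[numeric_cols].where(df[numeric_cols] >= 0, np.nan)

print(df)
```

Because NaN is a float, the affected integer columns become float columns after this step.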
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.