I have a DataFrame with a column Sales.
How can I split it into two DataFrames based on the Sales value?
The first DataFrame should contain the rows with 'Sales' < s and the second the rows with 'Sales' >= s.
Answers:
Method 1
You can use boolean indexing:
import pandas as pd
df = pd.DataFrame({'Sales':[10,20,30,40,50], 'A':[3,4,7,6,1]})
print (df)
A Sales
0 3 10
1 4 20
2 7 30
3 6 40
4 1 50
s = 30
df1 = df[df['Sales'] >= s]
print (df1)
A Sales
2 7 30
3 6 40
4 1 50
df2 = df[df['Sales'] < s]
print (df2)
A Sales
0 3 10
1 4 20
It’s also possible to build the mask once and invert it with ~:
mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print (df1)
A Sales
2 7 30
3 6 40
4 1 50
print (df2)
A Sales
0 3 10
1 4 20
print (mask)
0 False
1 False
2 True
3 True
4 True
Name: Sales, dtype: bool
print (~mask)
0 True
1 True
2 False
3 False
4 False
Name: Sales, dtype: bool
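One difference between the two variants only shows up when Sales contains missing values. The sketch below uses made-up data with one NaN (not from the question) to illustrate it: with two separate comparisons the NaN row lands in neither result, while with a mask and its inverse the NaN row ends up in the inverted half.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Sales value is missing.
df = pd.DataFrame({'Sales': [10, np.nan, 30, 40], 'A': [3, 4, 7, 6]})
s = 30

# Two separate comparisons: NaN compares False both times,
# so the NaN row is in neither result.
high = df[df['Sales'] >= s]
low = df[df['Sales'] < s]

# One mask plus its inverse: the NaN row is False in the mask,
# so it ends up in the inverted half.
mask = df['Sales'] >= s
high_m = df[mask]
low_m = df[~mask]

print(len(high) + len(low))      # 3
print(len(high_m) + len(low_m))  # 4
```

If the two halves must always add up to the original DataFrame, the mask and ~mask form is probably the safer choice; otherwise decide explicitly where rows with missing Sales should go.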
Method 2
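Using groupby, you can split the DataFrame into two. The groups are returned in sorted key order (False, then True), so df1 holds the rows with Sales >= 30 and df2 the rows with Sales < 30: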
In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]
In [1048]: df1
Out[1048]:
A Sales
2 7 30
3 6 40
4 1 50
In [1049]: df2
Out[1049]:
A Sales
0 3 10
1 4 20
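Note that unpacking into exactly two names fails when every row falls on the same side of the threshold, because groupby then yields a single group. A minimal sketch of one way around this, assuming a threshold s chosen here only for illustration, is to key the groups by their boolean label and fall back to an empty frame:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 100  # hypothetical threshold above every Sales value

# Key each group by its boolean label instead of relying on order or count.
groups = {bool(key): part for key, part in df.groupby(df['Sales'] < s)}

empty = df.iloc[0:0]                    # same columns, no rows
below = groups.get(True, empty)         # rows with Sales < s
at_or_above = groups.get(False, empty)  # rows with Sales >= s

print(len(below), len(at_or_above))     # 5 0
```

The names below, at_or_above, and empty are placeholders; the point is only that looking groups up by key avoids depending on how many groups come back.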
Method 3
Using groupby and a list comprehension:
Store all the split DataFrames in a list and access each of the separated DataFrames by its index.
DF = pd.DataFrame({'chr':["chr3","chr3","chr7","chr6","chr1"],'pos':[10,20,30,40,50],})
ans = [y for x, y in DF.groupby('chr', as_index=False)]
Access the separated DataFrames like this:
ans[0]
ans[1]
ans[len(ans)-1] # this is the last separated DF
Access a column of a separated DataFrame like this:
ansI_chr = ans[i].chr # i is any valid index into ans
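When accessing the pieces by position, keep in mind that groupby sorts the group keys by default, so the list order is not the order in which the values first appear. A small sketch of the difference, using the sort parameter of groupby (the variable names are just for illustration):

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})

# Default: groups come out sorted by key.
sorted_parts = [part for _, part in DF.groupby('chr')]
# sort=False: groups come out in order of first appearance.
seen_parts = [part for _, part in DF.groupby('chr', sort=False)]

print(sorted_parts[0]['chr'].iloc[0])  # chr1
print(seen_parts[0]['chr'].iloc[0])    # chr3
print(len(sorted_parts))               # 4
```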
Method 4
One-liner using the walrus operator (Python 3.8+):
df1, df2 = df[(mask:=df['Sales'] >= 30)], df[~mask]
Consider using copy to avoid a SettingWithCopyWarning when you later modify df1 or df2:
df1, df2 = df[(mask:=df['Sales'] >= 30)].copy(), df[~mask].copy()
Alternatively, you can use the query method:
df1, df2 = df.query('Sales >= 30').copy(), df.query('Sales < 30').copy()
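If the threshold lives in a variable, as s does in the question, query can reference it with the @ prefix instead of hard-coding 30. A minimal sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 30

# @s refers to the Python variable s from the surrounding scope.
df1 = df.query('Sales >= @s').copy()
df2 = df.query('Sales < @s').copy()

# Because df1 is a copy, this assignment leaves df untouched.
df1['A'] = 0

print(df1['Sales'].tolist())  # [30, 40, 50]
print(df2['Sales'].tolist())  # [10, 20]
print(df['A'].tolist())       # [3, 4, 7, 6, 1]
```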
Method 5
I like to use this to speed up searches or rolling-average computations that rely on .apply(lambda x…) type functions, so I split big files into dictionaries of DataFrames:
df_dict = {sale_v: df[df['Sales'] == sale_v] for sale_v in df.Sales.unique()}
This should do it if you want to split based on categorical groups.
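For comparison, the same kind of dictionary can likely be built with a single groupby instead of filtering the whole DataFrame once per unique value. This is a sketch of that alternative on made-up data with repeated Sales values, not the original author's code:

```python
import pandas as pd

# Hypothetical data with repeated Sales values.
df = pd.DataFrame({'Sales': [10, 20, 10, 40, 20], 'A': [3, 4, 7, 6, 1]})

# One pass over the data: each key is a distinct Sales value,
# each value is the sub-DataFrame of rows with that Sales value.
df_dict = {sale_v: part for sale_v, part in df.groupby('Sales')}

print(sorted(df_dict))          # [10, 20, 40]
print(df_dict[10]['A'].tolist())  # [3, 7]
```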
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.