I’m using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:
A = D[D.label == k]
B = D[D.label != k]
I want to combine A and B so I can have them as one DataFrame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.
Answers:
Method 1
Deprecation Notice:
DataFrame.append and Series.append were deprecated in v1.4.0 and removed in pandas 2.0. Use pd.concat (Method 2) instead.
I believe you can use the append method:
bigdata = data1.append(data2, ignore_index=True)
To keep their original indexes, just omit the ignore_index keyword.
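Since append is no longer available in recent pandas releases, here is a minimal sketch of the same two behaviours written with pd.concat. The sample frame D and the value of k are made up for illustration; only the calls to pd.concat mirror the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for D from the question.
D = pd.DataFrame({"label": [0, 1, 0, 1], "value": [10, 20, 30, 40]})
k = 1
A = D[D.label == k]  # keeps index labels 1 and 3 from D
B = D[D.label != k]  # keeps index labels 0 and 2 from D

# Equivalent of A.append(B): the original index labels are preserved.
kept = pd.concat([A, B])

# Equivalent of A.append(B, ignore_index=True): the index is renumbered.
renumbered = pd.concat([A, B], ignore_index=True)

print(kept.index.tolist())        # [1, 3, 0, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Keeping the original labels is handy if you later need to map rows back to D. Renumbering avoids duplicate or out-of-order labels in the combined frame.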
Method 2
You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:
bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
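As a rough illustration of the call above with more than two frames, the sketch below concatenates three small frames whose columns do not fully match. The frames and column names are invented for the example. With sort=False, pandas does not sort the column labels alphabetically, and any column missing from a frame is filled with NaN.

```python
import pandas as pd

# Hypothetical frames with partially overlapping columns.
data1 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
data2 = pd.DataFrame({"a": [5], "c": [6]})
data3 = pd.DataFrame({"b": [7], "a": [8]})

# One call handles any number of frames; rows are stacked in list order.
bigdata = pd.concat([data1, data2, data3], ignore_index=True, sort=False)

print(bigdata)
# Column "c" exists only in data2, so the other three rows hold NaN there.
print(bigdata["c"].isna().sum())  # 3
```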
Method 3
Thought to add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows, which is:
df_row_merged = pd.concat([df_a, df_b], ignore_index=True)
To merge across columns, you can use the following syntax:
df_col_merged = pd.concat([df_a, df_b], axis=1)
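One thing to watch with axis=1 is that pandas aligns rows by index label, not by position. The sketch below uses two made-up frames with partly different indexes to show the effect, and resets the indexes when a purely positional side-by-side join is wanted.

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap.
df_a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"y": [10, 20, 30]}, index=[1, 2, 3])

# Rows are matched on index label; labels present in only one frame get NaN.
df_col_merged = pd.concat([df_a, df_b], axis=1)
print(df_col_merged.shape)  # (4, 2)

# To pair rows by position instead, drop the old index labels first.
df_positional = pd.concat(
    [df_a.reset_index(drop=True), df_b.reset_index(drop=True)], axis=1
)
print(df_positional.shape)  # (3, 2)
```

This matters for frames sliced out of a larger frame, as in the question, because they keep the index labels of their parent.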
Method 4
If you’re working with big data and need to concatenate multiple datasets, calling concat many times can become expensive, because each call copies the data.
If you don’t want to create a new df each time, you can instead aggregate the changes and call concat only once:
frames = [df_A, df_B] # Or perform operations on the DFs
result = pd.concat(frames)
This is pointed out in the pandas docs, at the bottom of the section on concatenating objects:
Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
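The list-comprehension advice in the note can be sketched as follows. The make_chunk helper is hypothetical and simply stands in for whatever loads or computes each dataset.

```python
import pandas as pd


def make_chunk(i):
    """Hypothetical stand-in for loading or computing one dataset."""
    return pd.DataFrame({"chunk": [i, i], "value": [i * 10, i * 10 + 1]})


# Pattern to avoid: calling pd.concat([result, make_chunk(i)]) inside a loop,
# which copies all rows accumulated so far on every iteration.

# Preferred: build every frame first, then concatenate exactly once.
frames = [make_chunk(i) for i in range(5)]
result = pd.concat(frames, ignore_index=True)

print(len(result))  # 10
```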
Method 5
If you want to update or replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps.
Step 1: Set the index of the first dataframe (df1). set_index returns a new dataframe rather than modifying df1, so assign the result back:
df1 = df1.set_index('id')
Step 2: Set the index of the second dataframe (df2) in the same way:
df2 = df2.set_index('id')
Finally, update the dataframe in place using the following snippet:
df1.update(df2)
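Putting the steps together, a minimal runnable sketch might look like this. The id and score columns and their values are invented for illustration. Note that update modifies df1 in place and returns None, overwrites only where df2 has non-NA values, and does not add rows that are missing from df1.

```python
import pandas as pd

# Hypothetical data: df2 carries corrected scores for ids 2 and 3.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [99.0, 77.0]})

# set_index returns a new frame, so the result must be assigned back.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# Rows are matched on the shared "id" index; df1 is changed in place.
df1.update(df2)

print(df1["score"].tolist())  # [10.0, 99.0, 77.0]
```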
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.