I have a list of Pandas dataframes that I would like to combine into one Pandas dataframe. I am using Python 2.7.10 and Pandas 0.16.2
I created the list of dataframes from:
import pandas as pd
dfs = []
sqlall = "select * from mytable"
for chunk in pd.read_sql_query(sqlall , cnxn, chunksize=10000):
    dfs.append(chunk)
This returns a list of dataframes
type(dfs[0])
Out[6]: pandas.core.frame.DataFrame
type(dfs)
Out[7]: list
len(dfs)
Out[8]: 408
Here is some sample data
# sample dataframes
d1 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one' : [5., 6., 7., 8.], 'two' : [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one' : [15., 16., 17., 18.], 'two' : [19., 10., 11., 12.]})
# list of dataframes
mydfs = [d1, d2, d3]
I would like to combine d1, d2, and d3 into one pandas dataframe. Alternatively, a method of reading a large-ish table directly into a dataframe when using the chunksize option would be very helpful.
Answers:
Method 1
Given that all the dataframes have the same columns, you can simply concat them:
import pandas as pd
df = pd.concat(list_of_dataframes)
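The same call covers the chunked read from the question. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real connection; the table contents and the chunk size of 10 are made up for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table standing in for "mytable" in the question
cnxn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(25), "value": [i * 0.5 for i in range(25)]}).to_sql(
    "mytable", cnxn, index=False
)

# With chunksize set, read_sql_query yields one dataframe per chunk
chunks = pd.read_sql_query("select * from mytable", cnxn, chunksize=10)

# Collect the chunks and stack them; ignore_index gives a clean 0..n-1 index
df = pd.concat(list(chunks), ignore_index=True)
cnxn.close()

print(df.shape)
```

The chunks are consumed before the connection is closed, since the iterator reads from the database lazily.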
Method 2
Just to add a few more details:
Example:
import pandas as pd
list1 = [df1, df2, df3]
- Row-wise concatenation, ignoring the indexes
pd.concat(list1, axis=0, ignore_index=True)
Note: If the column names are not the same, NaN is inserted wherever a dataframe lacks one of the columns.
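As a small illustration of the NaN behaviour, here is a sketch with two made-up frames (the names left and right and their values are hypothetical) that share only one column. It also shows join="inner", which keeps only the shared columns instead of padding.

```python
import pandas as pd

# Hypothetical frames that share only the column "one"
left = pd.DataFrame({"one": [1.0, 2.0], "two": [3.0, 4.0]})
right = pd.DataFrame({"one": [5.0, 6.0], "three": [7.0, 8.0]})

# Default join="outer": keep the union of columns, pad the gaps with NaN
combined = pd.concat([left, right], axis=0, ignore_index=True)

# join="inner": keep only the columns present in every frame
shared = pd.concat([left, right], axis=0, ignore_index=True, join="inner")

print(combined)
print(shared)
```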
- Column-wise concatenation, keeping the column names
pd.concat(list1, axis=1, ignore_index=False)
If ignore_index=True, the column names are replaced with the integers 0 to (n-1), where n is the total number of columns in the result.
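The effect of ignore_index along axis=1 is easiest to see side by side. A short sketch, using shortened versions of the sample frames from the question:

```python
import pandas as pd

d1 = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})
d2 = pd.DataFrame({"one": [5.0, 6.0], "two": [9.0, 10.0]})

# Column labels are kept, so "one" and "two" each appear twice
keep_names = pd.concat([d1, d2], axis=1, ignore_index=False)

# Column labels are discarded and renumbered across the whole result
renumbered = pd.concat([d1, d2], axis=1, ignore_index=True)

print(list(keep_names.columns))   # ['one', 'two', 'one', 'two']
print(list(renumbered.columns))   # [0, 1, 2, 3]
```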
Method 3
If the dataframes do NOT all have the same columns, try the following:
df = pd.DataFrame.from_dict(map(dict,df_list))
Method 4
You also can do it with functional programming:
from functools import reduce
reduce(lambda df1, df2: df1.merge(df2, how="outer"), mydfs)
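One thing to keep in mind is that an outer merge is not the same operation as concat: it joins on the shared columns, so a row that appears in two frames ends up in the result only once. A minimal sketch with two made-up frames (a and b are hypothetical) that overlap in one row:

```python
from functools import reduce
import pandas as pd

a = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})
# The first row of b repeats the second row of a
b = pd.DataFrame({"one": [2.0, 5.0], "two": [3.0, 9.0]})

# Outer merge on all shared columns: the overlapping row is matched, not repeated
merged = reduce(lambda left, right: left.merge(right, how="outer"), [a, b])

# concat simply stacks the rows, duplicates included
stacked = pd.concat([a, b], ignore_index=True)

print(len(merged), len(stacked))
```

If the frames never share a row, as in the sample data from the question, both approaches return the same set of rows.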
Method 5
concat also works nicely with a list comprehension that selects rows with “loc” from an existing dataframe:
df = pd.read_csv('./data.csv') # i.e., a dataframe read from a csv file with a "userID" column
review_ids = ['1','2','3'] # i.e., the ID values to grab from the dataframe
# Gets rows in df where IDs match in the userID column and combines them
dfa = pd.concat([df.loc[df['userID'] == x] for x in review_ids])
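For comparison, a single boolean mask built with isin selects the same rows in one pass. This is a different technique from the list comprehension above, shown here as a sketch on made-up data standing in for data.csv. The difference is the row order: the concat version groups rows in the order of review_ids, while isin keeps the original order of df.

```python
import pandas as pd

# Hypothetical data standing in for data.csv
df = pd.DataFrame({
    "userID": ["3", "1", "9", "2", "1"],
    "score": [10, 20, 30, 40, 50],
})
review_ids = ["1", "2", "3"]

# One selection per ID, stacked in the order of review_ids
dfa = pd.concat([df.loc[df["userID"] == x] for x in review_ids])

# Single boolean mask; rows keep their original order in df
dfb = df.loc[df["userID"].isin(review_ids)]

print(list(dfa.index))  # [1, 4, 3, 0]
print(list(dfb.index))  # [0, 1, 3, 4]
```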
Method 6
pandas concat also works together with functools.reduce
from functools import reduce
import pandas as pd
df = pd.read_csv("http://www.aol.com/users/data.csv")
# Split df into one-row frames, then fold them back together with concat
rows = [df.iloc[[q]] for q in range(len(df))]
new = reduce(lambda left, right: pd.concat([left, right]), rows)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.