I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.
Answers:
Method 1
You can pass a Python file write mode to the pandas to_csv function. For append, it is 'a'.
In your case:
df.to_csv('my_csv.csv', mode='a', header=False)
The default mode is 'w', which overwrites the file.
If the file initially might be missing, you can make sure the header is printed at the first write using this variation:
import os
output_path = 'my_csv.csv'
df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
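As a rough, self-contained sketch of the header-on-first-write idea: the helper name append_to_csv, the temporary directory, and the sample data are all assumptions made for illustration, and index=False is added here (unlike the snippet above) so the file can be read back without an extra index column.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to path, writing the header only if the file does not exist yet.

    Hypothetical helper for illustration; not part of pandas.
    """
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False)


# Demonstration in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "my_csv.csv")
    batch = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})

    append_to_csv(batch, demo_path)      # file is created: header + 2 rows
    append_to_csv(batch + 6, demo_path)  # file exists: 2 more rows, no header

    result = pd.read_csv(demo_path)
```

Reading the file back should give a single header row followed by the four data rows, since the second call skips the header.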
Method 2
You can append to a csv by opening the file in append mode:
with open('my_csv.csv', 'a') as f:
    df.to_csv(f, header=False)
If this was your csv, foo.csv:
,A,B,C
0,1,2,3
1,4,5,6
If you read that and then append, for example, df + 6:
In [1]: df = pd.read_csv('foo.csv', index_col=0)
In [2]: df
Out[2]:
   A  B  C
0  1  2  3
1  4  5  6
In [3]: df + 6
Out[3]:
    A   B   C
0   7   8   9
1  10  11  12
In [4]: with open('foo.csv', 'a') as f:
            (df + 6).to_csv(f, header=False)
foo.csv becomes:
,A,B,C
0,1,2,3
1,4,5,6
0,7,8,9
1,10,11,12
Method 3
with open(filename, 'a') as f:
    df.to_csv(f, header=f.tell()==0)
- Create file unless exists, otherwise append
- Add header if file is being created, otherwise skip it
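A small sketch of how the f.tell() check behaves, assuming Python 3 (where a file opened in append mode reports its current size from tell()). The helper name append_with_tell and the demo file are made up for illustration; newline="" follows the pandas recommendation for text-mode file objects, and index=False is an added choice to keep the output compact.

```python
import os
import tempfile

import pandas as pd


def append_with_tell(df, path):
    """Append df to path; write the header only when the file is empty.

    Hypothetical helper for illustration; not part of pandas.
    """
    # In append mode the position starts at the end of the file,
    # so tell() == 0 means the file is new or has no content yet.
    with open(path, "a", newline="") as f:
        df.to_csv(f, header=f.tell() == 0, index=False)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "log.csv")
    open(demo_path, "w").close()  # the file already exists but is empty

    append_with_tell(pd.DataFrame({"A": [1], "B": [2]}), demo_path)  # header + row
    append_with_tell(pd.DataFrame({"A": [3], "B": [4]}), demo_path)  # row only

    with open(demo_path) as f:
        lines = f.read().splitlines()
```

One difference from an os.path.exists check: a file that exists but is empty still gets its header here, because the test is on the file's size rather than its presence.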
Method 4
A little helper function I use with some header checking safeguards to handle it all:
import os
import pandas as pd

def appendDFToCSV_void(df, csvFilePath, sep=","):
    if not os.path.isfile(csvFilePath):
        # New file: write the header along with the data
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep)
    elif len(df.columns) != len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns):
        raise Exception("Columns do not match!! Dataframe has " + str(len(df.columns)) + " columns. CSV file has " + str(len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns)) + " columns.")
    elif not (df.columns == pd.read_csv(csvFilePath, nrows=1, sep=sep).columns).all():
        raise Exception("Columns and column order of dataframe and csv file do not match!!")
    else:
        # Existing file with matching columns: append the rows only
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep, header=False)
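A more compact variant of the same safeguard might look like the sketch below. It is not the helper above: the name append_checked is hypothetical, the header is read only once with nrows=0, and a ValueError is raised instead of a bare Exception. It also assumes the dataframe's column labels are strings, since labels read back from a CSV header are strings.

```python
import os
import tempfile

import pandas as pd


def append_checked(df, path, sep=","):
    """Append df to path after checking that column names and order match.

    Hypothetical helper for illustration; not part of pandas.
    """
    if not os.path.isfile(path):
        # New file: write header and data.
        df.to_csv(path, index=False, sep=sep)
        return
    # nrows=0 parses only the header line of the existing file.
    existing_columns = list(pd.read_csv(path, nrows=0, sep=sep).columns)
    if list(df.columns) != existing_columns:
        raise ValueError(
            f"Columns do not match: dataframe has {list(df.columns)}, "
            f"CSV file has {existing_columns}"
        )
    df.to_csv(path, mode="a", index=False, sep=sep, header=False)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "data.csv")
    append_checked(pd.DataFrame({"A": [1], "B": [2]}), demo_path)
    append_checked(pd.DataFrame({"A": [3], "B": [4]}), demo_path)
    try:
        # Same columns in a different order: should be rejected.
        append_checked(pd.DataFrame({"B": [5], "A": [6]}), demo_path)
        rejected = False
    except ValueError:
        rejected = True
    rows = pd.read_csv(demo_path)
```

Comparing plain lists covers both the count mismatch and the order mismatch in a single check.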
Method 5
I started with PySpark dataframes and got type conversion errors when converting them to pandas dataframes and then appending to a csv, because of the schema/column types in my PySpark dataframes.
I solved the problem by forcing all columns in each dataframe to be of type string and then appending to the csv as follows:
with open('testAppend.csv', 'a') as f:
    df2.toPandas().astype(str).to_csv(f, header=False)
Method 6
A bit late to the party but you can also use a context manager, if you’re opening and closing your file multiple times, or logging data, statistics, etc.
from contextlib import contextmanager
import pandas as pd

@contextmanager
def open_file(path, mode):
    file_to = open(path, mode)
    try:
        yield file_to
    finally:
        # Close the file even if the body of the with block raises
        file_to.close()

## later
saved_df = pd.DataFrame(data)  # data is whatever you want to save
with open_file('yourcsv.csv', 'a') as outfile:
    saved_df.to_csv(outfile, header=False)
Method 7
This is how I did it in 2021
Let us say I have a csv sales.csv which has the following data in it:
sales.csv:
Order Name,Price,Qty
oil,200,2
butter,180,10
and to add more rows I can load them in a data frame and append it to the csv like this:
import pandas
data = [
    ['matchstick', '60', '11'],
    ['cookies', '10', '120']
]
dataframe = pandas.DataFrame(data)
dataframe.to_csv("sales.csv", index=False, mode='a', header=False)
and the output will be:
Order Name,Price,Qty
oil,200,2
butter,180,10
matchstick,60,11
cookies,10,120
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.