I have an existing dataframe to which I need to add a column that contains the same value in every row.
Existing df:
Date, Open, High, Low, Close
01-01-2015, 565, 600, 400, 450
New df:
Name, Date, Open, High, Low, Close
abc, 01-01-2015, 565, 600, 400, 450
I know how to append an existing series / dataframe column. But this is a different situation, because all I need is to add the ‘Name’ column and set every row to the same value, in this case ‘abc’.
Answers:
Method 1
df['Name']='abc' will add the new column and set all rows to that value:
In [79]:
df
Out[79]:
Date, Open, High, Low, Close
0 01-01-2015, 565, 600, 400, 450
In [80]:
df['Name'] = 'abc'
df
Out[80]:
Date, Open, High, Low, Close Name
0 01-01-2015, 565, 600, 400, 450 abc
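The assignment works because pandas broadcasts a scalar to every row of the frame. A minimal sketch, using a two-row version of the question's data (the second row is invented here for illustration):

```python
import pandas as pd

# Sample frame modelled on the question; the second row is made up for illustration.
df = pd.DataFrame({
    "Date": ["01-01-2015", "02-01-2015"],
    "Open": [565, 570],
    "High": [600, 610],
    "Low": [400, 410],
    "Close": [450, 460],
})

# A scalar on the right-hand side is broadcast to every row.
# The new column is appended as the last column.
df["Name"] = "abc"

print(df["Name"].tolist())
print(list(df.columns))
```

Note that this modifies df in place and always puts the new column at the end.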
Method 2
You can use insert to specify where you want the new column to be. In this case, I use 0 to place the new column at the far left.
df.insert(0, 'Name', 'abc')
  Name        Date  Open  High  Low  Close
0  abc  01-01-2015   565   600  400    450
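Two details of insert are easy to trip over: it changes the frame in place and returns None, and by default it refuses to add a column whose name already exists. A small sketch, assuming a one-row frame like the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-01-2015"],
    "Open": [565],
    "High": [600],
    "Low": [400],
    "Close": [450],
})

# insert mutates df and returns None, so do not write df = df.insert(...).
result = df.insert(0, "Name", "abc")
print(result)
print(list(df.columns))

# Inserting the same column name again raises ValueError
# unless allow_duplicates=True is passed.
try:
    df.insert(0, "Name", "xyz")
    second_insert_failed = False
except ValueError as exc:
    second_insert_failed = True
    print("second insert failed:", exc)
```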
Method 3
Summing up what the others have suggested, and adding a third way.
You can:
- use assign(**kwargs):
  df.assign(Name='abc')
- access the new column series (it will be created) and set it:
  df['Name'] = 'abc'
- use insert(loc, column, value, allow_duplicates=False):
  df.insert(0, 'Name', 'abc')
where the argument loc ( 0 <= loc <= len(columns) ) allows you to insert the column where you want.
‘loc’ gives you the index that your column will be at after the insertion. For example, the code above inserts the column Name as the 0-th column, i.e. it will be inserted before the first column, becoming the new first column. (Indexing starts from 0).
All these methods allow you to add a new column from a Series as well (just substitute the series for the ‘abc’ value above).
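One practical difference between the three options is that assign returns a new DataFrame and leaves the original untouched, while the other two modify df in place. The sketch below shows that, and also passes a Series instead of a scalar; the frame and the names Series are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01-01-2015", "02-01-2015"], "Close": [450, 460]})

# assign returns a new DataFrame; df itself is left unchanged.
with_name = df.assign(Name="abc")
print("Name" in df.columns)
print(with_name["Name"].tolist())

# A Series works in place of the scalar. It shares df's index here,
# so the values line up row by row.
names = pd.Series(["abc", "xyz"], index=df.index)
with_series = df.assign(Name=names)
print(with_series["Name"].tolist())
```

To keep the result of assign, bind it to a name (for example df = df.assign(Name='abc')).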
Method 4
A one-liner works:
df['Name'] = 'abc'
This creates a Name column and sets every row to 'abc'.
Method 5
One line did the job for me:
df['New Column'] = 'Constant Value'
df['New Column'] = 123
Method 6
I want to draw more attention to a portion of @michele-piccolini’s answer.
I strongly believe that .assign is the best solution here. In the real world, these operations are not in isolation, but in a chain of operations. And if you want to support a chain of operations, you should probably use the .assign method.
Here is an example using snowfall data at a ski resort (but the same principles would apply to say … financial data).
This code reads like a recipe of steps. Both assignment (with =) and .insert make this much harder:
import pandas as pd

raw = pd.read_csv('https://github.com/mattharrison/datasets/raw/master/data/alta-noaa-1980-2019.csv',
                  parse_dates=['DATE'])

def clean_alta(df):
    return (df
            .loc[:, ['STATION', 'NAME', 'LATITUDE', 'LONGITUDE', 'ELEVATION', 'DATE',
                     'PRCP', 'SNOW', 'SNWD', 'TMAX', 'TMIN', 'TOBS']]
            .groupby(pd.Grouper(key='DATE', freq='W'))
            .agg({'PRCP': 'sum', 'TMAX': 'max', 'TMIN': 'min', 'SNOW': 'sum', 'SNWD': 'mean'})
            .assign(LOCATION='Alta',
                    T_RANGE=lambda w_df: w_df.TMAX-w_df.TMIN)
    )

clean_alta(raw)
Notice the .assign(LOCATION='Alta', ...) call, which creates a column with a single value in the middle of the rest of the operations.
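For readers who want to try the chaining pattern without downloading the dataset, here is a reduced sketch of the same idea. The four daily readings are invented for illustration and are not taken from the Alta data; only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical daily readings, made up for illustration (not the Alta dataset).
raw = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-13", "2020-01-14"]),
    "TMAX": [30, 34, 28, 25],
    "TMIN": [10, 12, 5, 8],
    "SNOW": [2.0, 0.0, 5.0, 1.0],
})

weekly = (raw
          .groupby(pd.Grouper(key="DATE", freq="W"))   # weekly bins
          .agg({"TMAX": "max", "TMIN": "min", "SNOW": "sum"})
          .assign(LOCATION="Alta",                      # constant column, set mid-chain
                  T_RANGE=lambda w_df: w_df.TMAX - w_df.TMIN)
          )
print(weekly)
```

The constant column and the derived T_RANGE column are both added in one step, without breaking the chain to use = or .insert.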
Method 7
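You can simply do the following (passing index=df.index keeps the values aligned with the rows when df does not use the default 0..n-1 index):
df['New Col'] = pd.Series(["abc" for x in range(len(df.index))], index=df.index)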
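One thing to watch when building the Series by hand: pandas aligns a Series to the frame by index label, not by position. The sketch below, with a frame invented for illustration, shows what happens when the labels do not match and how reusing the frame's index avoids it:

```python
import pandas as pd

# Frame with a non-default index (labels 10 and 11), made up for illustration.
df = pd.DataFrame({"Close": [450, 460]}, index=[10, 11])

# This Series gets the default index 0, 1, which shares no labels with df,
# so every row of the new column ends up as NaN.
df["Misaligned"] = pd.Series(["abc"] * len(df))

# Reusing df's index makes the labels match.
df["Aligned"] = pd.Series(["abc"] * len(df), index=df.index)

print(df["Misaligned"].isna().all())
print(df["Aligned"].tolist())
```

Assigning the scalar directly (df['New Col'] = 'abc') sidesteps the alignment question entirely.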
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.