How to deal with multi-level column names downloaded with yfinance

I have a list of tickers (tickerStrings) that I have to download all at once. When I read the saved CSV file back with pandas’ read_csv, the resulting DataFrame does not have the same structure as the one yfinance returns when I download the data.

I usually access my data by ticker like this: data['AAPL'] or data['AAPL'].Close, but when I read the data from the CSV file it does not let me do that.

if path.exists(data_file):
    data = pd.read_csv(data_file, low_memory=False)
    data = pd.DataFrame(data)
    print(data.head())
else:
    data = yf.download(tickerStrings, group_by="Ticker", period=prd, interval=intv)
    data.to_csv(data_file)

Here’s the print output:

                  Unnamed: 0                 OLN               OLN.1               OLN.2               OLN.3  ...                 W.1                 W.2                 W.3                 W.4     W.5
0                        NaN                Open                High                 Low               Close  ...                High                 Low               Close           Adj Close  Volume
1                   Datetime                 NaN                 NaN                 NaN                 NaN  ...                 NaN                 NaN                 NaN                 NaN     NaN
2  2020-06-25 09:30:00-04:00    11.1899995803833  11.220000267028809  11.010000228881836  11.079999923706055  ...   201.2899932861328   197.3000030517578  197.36000061035156  197.36000061035156  112156
3  2020-06-25 09:45:00-04:00  11.130000114440918  11.260000228881836  11.100000381469727   11.15999984741211  ...  200.48570251464844  196.47999572753906  199.74000549316406  199.74000549316406   83943
4  2020-06-25 10:00:00-04:00  11.170000076293945  11.220000267028809  11.119999885559082  11.170000076293945  ...  200.49000549316406  198.19000244140625   200.4149932861328   200.4149932861328   88771

The error I get when trying to access the data:

Traceback (most recent call last):
File "getdata.py", line 49, in processData
    avg = data[x].Close.mean()
AttributeError: 'Series' object has no attribute 'Close'

Answers:


Method 1

Download all tickers into single dataframe with single level column headers

Option 1

  • When downloading single stock ticker data, the returned dataframe column names are a single level, but don’t have a ticker column.
  • This will download data for each ticker, add a ticker column, and create a single dataframe from all desired tickers.
import yfinance as yf
import pandas as pd

tickerStrings = ['AAPL', 'MSFT']
df_list = list()
for ticker in tickerStrings:
    data = yf.download(ticker, group_by="Ticker", period='2d')
    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
    df_list.append(data)

# combine all dataframes into a single dataframe
df = pd.concat(df_list)

# save to csv
df.to_csv('ticker.csv')
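
With the ticker stored in an ordinary column, the per-ticker access from the question becomes a filter or a groupby. The following is a minimal sketch that uses two small hand-made frames in place of real yf.download results; the prices and the names frames, aapl_close_mean and close_means are invented for illustration.

```python
import pandas as pd

# Hand-made stand-ins for the per-ticker frames returned by yf.download;
# the prices are made up for illustration only.
dates = pd.to_datetime(['2020-06-25', '2020-06-26'])
frames = {
    'AAPL': pd.DataFrame({'Open': [90.0, 91.0], 'Close': [91.0, 88.0]}, index=dates),
    'MSFT': pd.DataFrame({'Open': [197.0, 199.0], 'Close': [200.0, 196.0]}, index=dates),
}

df_list = []
for ticker, data in frames.items():
    data = data.copy()
    data['ticker'] = ticker  # same extra column as in Option 1
    df_list.append(data)

df = pd.concat(df_list)

# Select one ticker with a boolean mask ...
aapl_close_mean = df.loc[df['ticker'] == 'AAPL', 'Close'].mean()

# ... or compute the statistic for every ticker at once.
close_means = df.groupby('ticker')['Close'].mean()
```

The groupby form is probably the closer replacement for looping over data[x].Close.mean() as in the question.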

Option 2

  • Download all the tickers and unstack the levels
    • group_by='Ticker' puts the ticker at level=0 of the column name
tickerStrings = ['AAPL', 'MSFT']
df = yf.download(tickerStrings, group_by='Ticker', period='2d')
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

Read yfinance csv already stored with multi-level column names

  • If you wish to keep the multi-level column index and read such a file back in, use the following code, which returns the dataframe to its original form.
df = pd.read_csv('test.csv', header=[0, 1])
df.drop([0], axis=0, inplace=True)  # drop this row because it only has one column with Date in it
df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')] = pd.to_datetime(df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')], format='%Y-%m-%d')  # convert the first column to a datetime
df.set_index(('Unnamed: 0_level_0', 'Unnamed: 0_level_1'), inplace=True)  # set the first column as the index
df.index.name = None  # clear the index name (it was the tuple used as the column label)
  • The issue is that tickerStrings is a list of tickers, so yf.download returns a dataframe with multi-level column names:
                AAPL                                                    MSFT                                
                Open      High       Low     Close Adj Close     Volume Open High Low Close Adj Close Volume
Date                                                                                                        
1980-12-12  0.513393  0.515625  0.513393  0.513393  0.405683  117258400  NaN  NaN NaN   NaN       NaN    NaN
1980-12-15  0.488839  0.488839  0.486607  0.486607  0.384517   43971200  NaN  NaN NaN   NaN       NaN    NaN
1980-12-16  0.453125  0.453125  0.450893  0.450893  0.356296   26432000  NaN  NaN NaN   NaN       NaN    NaN
1980-12-17  0.462054  0.464286  0.462054  0.462054  0.365115   21610400  NaN  NaN NaN   NaN       NaN    NaN
1980-12-18  0.475446  0.477679  0.475446  0.475446  0.375698   18362400  NaN  NaN NaN   NaN       NaN    NaN
  • When this is saved to a CSV file, it looks like the following example, and reading it back with a plain read_csv call produces the dataframe you’re having issues with.
,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
Date,,,,,,,,,,,,
1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,
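
To make the cause of the AttributeError concrete, here is a small sketch that reads a shortened version of the CSV above with the default read_csv settings. The CSV text is trimmed to two price fields per ticker with rounded values, and the names csv_text, flat and aapl are only illustrative.

```python
import io
import pandas as pd

# A shortened version of the CSV shown above (two price fields per ticker).
csv_text = (
    ",AAPL,AAPL,MSFT,MSFT\n"
    ",Open,Close,Open,Close\n"
    "Date,,,,\n"
    "1980-12-12,0.513393,0.513393,,\n"
    "1980-12-15,0.488839,0.486607,,\n"
)

# Default settings: only the first line is treated as the header.
flat = pd.read_csv(io.StringIO(csv_text))

# Duplicate names in that first line are renamed AAPL, AAPL.1, ...
flat_columns = list(flat.columns)

# flat['AAPL'] is therefore one column (a Series) whose first value is the
# string 'Open', so flat['AAPL'].Close raises AttributeError.
aapl = flat['AAPL']
```

This matches the OLN, OLN.1, ... columns in the question's print output: the second header line and the Date line end up as data rows.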

Flatten multi-level columns into a single level and add a ticker column

  • If the ticker symbol is level=0 (top) of the column names
    • When group_by='Ticker' is used
df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
  • If the ticker symbol is level=1 (bottom) of the column names
df.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)
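
Both cases can be tried without downloading anything. The sketch below builds two small frames with made-up values, one with the ticker on the top column level and one with it on the bottom, and applies the same stack call to each. The helpers make_frame and flatten are hypothetical names introduced here, not part of yfinance or pandas.

```python
import pandas as pd

dates = pd.to_datetime(['2020-06-25', '2020-06-26'])


def make_frame(columns):
    """Build a two-row frame with made-up values for the given column tuples."""
    values = [[float(r * 10 + c) for c in range(len(columns))] for r in range(2)]
    return pd.DataFrame(values, index=dates,
                        columns=pd.MultiIndex.from_tuples(columns))


def flatten(df, ticker_level):
    """Move the ticker level from the columns into a 'Ticker' column."""
    return (df.stack(level=ticker_level)
              .rename_axis(['Date', 'Ticker'])
              .reset_index(level=1))


# Ticker at level 0, as with group_by='Ticker'.
ticker_on_top = make_frame(
    [('AAPL', 'Open'), ('AAPL', 'Close'), ('MSFT', 'Open'), ('MSFT', 'Close')])

# Ticker at level 1.
ticker_on_bottom = make_frame(
    [('Open', 'AAPL'), ('Open', 'MSFT'), ('Close', 'AAPL'), ('Close', 'MSFT')])

flat_top = flatten(ticker_on_top, 0)
flat_bottom = flatten(ticker_on_bottom, 1)
```

Either way the result has one row per date and ticker, with Ticker, Open and Close as ordinary single-level columns. Recent pandas versions may emit a FutureWarning about the stack implementation; the call itself still works.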

Download each ticker and save it to a separate file

  • I recommend downloading and saving each ticker individually, which would look something like the following:
import yfinance as yf
import pandas as pd

tickerStrings = ['AAPL', 'MSFT']
prd, intv = 'max', '1d'  # example values only; use the period and interval from your own code
for ticker in tickerStrings:
    data = yf.download(ticker, group_by="Ticker", period=prd, interval=intv)
    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
    data.to_csv(f'ticker_{ticker}.csv')  # ticker_AAPL.csv for example
  • data will look like
                Open      High       Low     Close  Adj Close      Volume ticker
Date                                                                            
1986-03-13  0.088542  0.101562  0.088542  0.097222   0.062205  1031788800   MSFT
1986-03-14  0.097222  0.102431  0.097222  0.100694   0.064427   308160000   MSFT
1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065537   133171200   MSFT
1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063871    67766400   MSFT
1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062760    47894400   MSFT
  • the resulting csv will look like
Date,Open,High,Low,Close,Adj Close,Volume,ticker
1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT

Read in multiple files saved with the previous section and create a single dataframe

import pandas as pd
from pathlib import Path

# set the path to the files
p = Path('c:/path_to_files')

# find the files; this is a generator, not a list
files = p.glob('ticker_*.csv')

# read the files into a dataframe
df = pd.concat([pd.read_csv(file) for file in files])
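
The read step can be checked end to end with throwaway files. This sketch writes two tiny stand-in files into a temporary directory (the prices are invented) and reads them back. Passing parse_dates and index_col is an optional addition, not part of the snippet above, so that Date becomes a DatetimeIndex again.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)

    # Two tiny stand-in files in the ticker_<symbol>.csv layout shown earlier.
    (p / 'ticker_AAPL.csv').write_text(
        "Date,Open,Close,ticker\n2020-06-25,90.0,91.0,AAPL\n")
    (p / 'ticker_MSFT.csv').write_text(
        "Date,Open,Close,ticker\n2020-06-25,197.0,200.0,MSFT\n")

    # glob returns a generator; sorting gives a reproducible file order.
    files = sorted(p.glob('ticker_*.csv'))

    # Read every file while the directory still exists and combine them.
    df = pd.concat(
        [pd.read_csv(f, parse_dates=['Date'], index_col='Date') for f in files]
    )

# One ticker's closing prices from the combined frame.
msft_close = df.loc[df['ticker'] == 'MSFT', 'Close']
```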

Method 2

To turn it into a dict of d[ticker]=df:

df = yf.download(tickers, group_by="ticker")
d = {idx: gp.xs(idx, level=0, axis=1) for idx, gp in df.groupby(level=0, axis=1)}
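
Recent pandas releases deprecate groupby with axis=1, so the same dict can also be built by indexing the top column level directly. A minimal sketch on a hand-made frame that imitates the group_by="ticker" layout (the prices are invented):

```python
import pandas as pd

# Stand-in for yf.download(tickers, group_by="ticker"); values are made up.
dates = pd.to_datetime(['2020-06-25', '2020-06-26'])
columns = pd.MultiIndex.from_product([['AAPL', 'MSFT'], ['Open', 'Close']])
df = pd.DataFrame(
    [[90.0, 91.0, 197.0, 200.0],
     [91.0, 88.0, 199.0, 196.0]],
    index=dates, columns=columns,
)

# df[ticker] selects every column under that ticker and drops the top level.
d = {ticker: df[ticker] for ticker in df.columns.get_level_values(0).unique()}

# Each value is an ordinary single-level frame, so .Close works again.
avg_close = {ticker: frame.Close.mean() for ticker, frame in d.items()}
```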

Method 3

Another option which maintains the pandas dataframe but drops the data you don’t need is to change the column index from a multiindex to a single index. Since you only care about the ‘Close’ column, the first step will be throwing the other ones out:

df = yf.download(...)
df = df[['Close']]

This is great but leaves each column with a multiindex which looks like (Close/AAPL) or (Close/MSFT) etc. What you really want is just the ticker.

df.columns = [col[1] for col in df.columns]

Now if you want to split the dataframe into separate ones for each column, you can do this with a list comprehension.

separated = [df.iloc[:,i] for i in range(len(df.columns))]
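
When the price field is the top column level, selecting df['Close'] with single brackets already returns a frame whose columns are just the tickers, which may save the renaming step. A small sketch on a hand-made frame with invented prices; keeping the pieces in a dict keyed by ticker is a choice made here for illustration.

```python
import pandas as pd

# Stand-in for a default yf.download result: price field on top, ticker below.
dates = pd.to_datetime(['2020-06-25', '2020-06-26'])
columns = pd.MultiIndex.from_product([['Open', 'Close'], ['AAPL', 'MSFT']])
df = pd.DataFrame(
    [[90.0, 197.0, 91.0, 200.0],
     [91.0, 199.0, 88.0, 196.0]],
    index=dates, columns=columns,
)

# Single brackets drop the 'Close' level, leaving one column per ticker.
close = df['Close']

# Keyed by ticker, so each Series can be looked up by name.
separated = {ticker: close[ticker] for ticker in close.columns}
```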

Method 4

Use the lines below to write and read the CSV file. The data read back will have the same multi-level column format as the dataframe downloaded from the yfinance API.

To write to a file

data.to_csv('file_loc')

To read the file

data = pd.read_csv('file_loc', header=[0, 1], index_col=[0])
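
The round trip can be verified in memory. This sketch writes a hand-made multi-level frame (invented prices) to a StringIO buffer instead of a file and reads it back with the same header and index_col arguments. Converting the index with pd.to_datetime is an extra step added here, since read_csv returns the dates as strings unless told otherwise.

```python
import io

import pandas as pd

# Stand-in for a group_by="Ticker" download; values are made up.
dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
columns = pd.MultiIndex.from_product([['AAPL', 'MSFT'], ['Open', 'Close']])
data = pd.DataFrame(
    [[90.0, 91.0, 197.0, 200.0],
     [91.0, 88.0, 199.0, 196.0]],
    index=dates, columns=columns,
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
data.to_csv(buffer)
buffer.seek(0)

# Two header rows rebuild the column MultiIndex; column 0 becomes the index.
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)
restored.index = pd.to_datetime(restored.index)

# Per-ticker access works again, as in the question.
avg = restored['AAPL'].Close.mean()
```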


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
