Reading Excel into a Python DataFrame starting from row 5 and including headers

How do I import Excel data into a DataFrame in Python?

Basically, the current Excel workbook runs some VBA on opening, which refreshes a pivot table and does some other stuff.

I then want to import the results of the pivot table refresh into a DataFrame in Python for further analysis.

import xlrd

wb = xlrd.open_workbook('C:UserscbMachine_LearningcMap_Joins.xlsm')

# sheet names
print(wb.sheet_names())

# number of sheets
print(wb.nsheets)

The refreshing and opening of the file works fine. But how do I select the data from the first sheet, starting at, say, row 5 (including the header) and going down to the last record n?

Answers:

Method 1

You can use the parse method of pandas’ ExcelFile to read Excel sheets (see the pandas IO docs):

import pandas as pd

xls = pd.ExcelFile('C:UserscbMachine_LearningcMap_Joins.xlsm')

df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])

skiprows=4 ignores the first 4 rows, so reading starts at row index 4 (the fifth row), which becomes the header. parse accepts several other options as well.
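To see the effect of skiprows without the original workbook, here is a minimal sketch that builds a small workbook in memory and parses it the same way. The sheet contents and column names (id, name, score) are made up for illustration, and the sketch assumes an .xlsx engine such as openpyxl is installed. The sheet is selected by position (0) rather than by name, which matches "the first sheet" in the question.

```python
import io
import pandas as pd

# Hypothetical sheet: four filler rows, a header on row 5, then two records.
rows = [
    ["filler", "filler", "filler"],
    ["filler", "filler", "filler"],
    ["filler", "filler", "filler"],
    ["filler", "filler", "filler"],
    ["id", "name", "score"],   # row 5: the real header
    [1, "alpha", 0.5],
    [2, "beta", 0.7],
]

# Write the rows to an in-memory .xlsx file (no index or header row added).
buffer = io.BytesIO()
pd.DataFrame(rows).to_excel(buffer, index=False, header=False)
buffer.seek(0)

# Parse the first sheet, skipping the four filler rows.
xls = pd.ExcelFile(buffer)
df = xls.parse(0, skiprows=4)

print(df)
```

The resulting frame uses row 5 as its column labels and contains only the records below it.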

Method 2

The accepted answer (Method 1) is old, as discussed in its comments.
The preferred option now is pd.read_excel(). For example:

import pandas as pd
df = pd.read_excel('C:UserscbMachine_LearningcMap_Joins.xlsm', skiprows=[0, 1, 2, 3])  # skip rows 1-4 so row 5 is the header

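Note that skiprows accepts either a count or a list of zero-based row indices, so a list with five entries drops one more row than skiprows=4 does. The sketch below compares the variants on a made-up in-memory workbook (the data, the column names and the read helper are all hypothetical, and an .xlsx engine such as openpyxl is assumed to be installed). It also shows header=4, which should give the same result as skipping four rows.

```python
import io
import pandas as pd

# Hypothetical sheet: four filler rows, a header on row 5, then two records.
rows = [["filler"] * 3 for _ in range(4)]
rows += [["id", "name", "score"], [1, "alpha", 0.5], [2, "beta", 0.7]]

buffer = io.BytesIO()
pd.DataFrame(rows).to_excel(buffer, index=False, header=False)
data = buffer.getvalue()

def read(**kwargs):
    # Re-wrap the bytes so every call reads the file from the start.
    return pd.read_excel(io.BytesIO(data), sheet_name=0, **kwargs)

by_count = read(skiprows=4)                    # skip the first four rows
by_list = read(skiprows=[0, 1, 2, 3])          # same rows, given as indices
by_header = read(header=4)                     # use row index 4 as the header
one_too_many = read(skiprows=[0, 1, 2, 3, 4])  # also drops the real header

print(by_count)
print(one_too_many)
```

In the last call the first record is promoted to the header, so one row of data is silently lost.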
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
