How to read a specific part of a large file in Python

Given a large file (hundreds of MB) how would I use Python to quickly read the content between a specific start and end index within the file?

Essentially, I’m looking for a more efficient way of doing:

open(filename).read()[start_index:end_index]

Answers:


Method 1

You can seek into the file and then read a certain amount from there. Seek moves you to a specific offset within the file, and you can then limit your read to only the number of bytes in that range.

with open(filename, 'rb') as fin:  # binary mode, so offsets are byte positions
    fin.seek(start_index)
    data = fin.read(end_index - start_index)  # returns bytes; decode if you need text

That reads only the data you're looking for, instead of loading the whole file into memory.
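As a rough sketch, the seek-and-read idea can be wrapped in a small helper. The name `read_byte_range` is hypothetical, and the sketch assumes the indices are byte offsets with the end index exclusive, which is not quite the same as the character indices used by the slice in the question when the file contains multi-byte characters.

```python
def read_byte_range(path, start, end):
    """Return the bytes in [start, end) of the file at `path`.

    Hypothetical helper: assumes `start` and `end` are byte offsets.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Binary mode makes seek() offsets plain byte positions.
    with open(path, "rb") as f:
        f.seek(start)
        # read() returns fewer bytes if the file ends before `end`.
        return f.read(end - start)
```

If you need text, decode the result yourself, for example `read_byte_range(path, 0, 100).decode("utf-8")`, bearing in mind that a byte range can cut a multi-byte character in half.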

Method 2

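This is my solution for a file with a variable-width encoding, where byte offsets do not line up with character positions. My CSV file contains a dictionary in which each row is a new item, so here start_index is a 1-based row number rather than a byte offset, and count is the number of rows to return.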

import csv

def get_stuff(filename, count, start_index):
    # Yield up to `count` rows, starting at the 1-based row number start_index.
    with open(filename, 'r', newline='') as infile:
        reader = csv.reader(infile)
        num = 0
        for idx, row in enumerate(reader):
            if idx >= start_index - 1:
                if num >= count:
                    return
                yield row
                num += 1
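The same row-window logic can probably be expressed more compactly with `itertools.islice`. This is a sketch rather than the answerer's code: the name `iter_rows` is hypothetical, and it keeps the assumption that `start_index` is a 1-based row number.

```python
import csv
from itertools import islice


def iter_rows(filename, count, start_index):
    """Yield up to `count` parsed CSV rows, starting at 1-based row `start_index`."""
    with open(filename, "r", newline="") as infile:
        reader = csv.reader(infile)
        first = max(start_index - 1, 0)  # convert to a 0-based position
        # islice skips the first `first` rows and stops after `count` more.
        yield from islice(reader, first, first + count)
```

Like the loop above, this still reads and parses every row before the window, so it saves memory rather than I/O; it does not jump to the starting row the way `seek` jumps to a byte offset.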


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
