“Large data” workflows using pandas
Dask emphasizes the following virtues:
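As a rough illustration of why Dask comes up in "large data with pandas" discussions, here is a minimal sketch of the pandas-like API working out of core; the file pattern and column names are hypothetical, not from the original question:

```python
# Minimal sketch of dask.dataframe as a drop-in for pandas on data that
# does not fit in memory. "data-*.csv", "key" and "value" are placeholders.
import dask.dataframe as dd

# Lazily build a DataFrame over many CSV files without loading them all.
df = dd.read_csv("data-*.csv")

# Operations look like pandas but only build a task graph.
result = df.groupby("key")["value"].mean()

# Nothing is read or computed until .compute() is called.
print(result.compute())
```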
I have a reasonably sized (18 GB compressed) HDF5 dataset and am looking to optimize reading rows for speed. The shape is (639038, 10000). I will be reading a selection of rows (say ~1000 rows) many times, scattered across the dataset, so I can't use x:(x+1000) to slice rows.
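One common approach is to pass a sorted list of row indices to h5py, which supports fancy indexing along one axis as long as the indices are increasing. A sketch, assuming the dataset is stored under the name "data" in the file:

```python
# Sketch: reading ~1000 scattered rows from a large HDF5 dataset with h5py.
# The file path and the dataset name ("data") are assumptions.
import numpy as np
import h5py

# h5py fancy indexing requires the indices to be in increasing order.
row_indices = np.sort(np.random.choice(639038, size=1000, replace=False))

with h5py.File("big_dataset.h5", "r") as f:
    dset = f["data"]                 # shape (639038, 10000)
    rows = dset[row_indices, :]      # -> ndarray of shape (1000, 10000)
```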
I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
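For reference, a minimal sketch of inspecting an HDF5 file's contents with h5py; the file and dataset names below are placeholders, so list the keys of your own file first:

```python
# Sketch: listing the contents of an HDF5 file and reading one dataset.
# "example.h5" and "some_dataset" are placeholders for your own names.
import h5py

with h5py.File("example.h5", "r") as f:
    # Top-level groups and datasets are exposed like dictionary keys.
    print(list(f.keys()))

    dset = f["some_dataset"]     # pick a key from the list above
    print(dset.shape, dset.dtype)

    data = dset[()]              # read the whole dataset into a NumPy array
```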
I’m trying to save bottleneck values to a newly created HDF5 file. The bottleneck values come in batches of shape (120, 10, 10, 2048). Saving a single batch alone takes up more than 16 GB, and Python seems to freeze on that one batch. Based on recent findings (see update), it seems that HDF5 using a lot of memory is okay, but the freezing appears to be a glitch.
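A common pattern for this kind of workload is to create a chunked, resizable dataset once and append each batch as it arrives, so only one batch is held in memory at a time. A sketch, with the file name, dataset name, dtype, and the batch generator all assumed rather than taken from the original question:

```python
# Sketch: appending batches of shape (120, 10, 10, 2048) to a resizable
# HDF5 dataset. Names, dtype and the batch source are assumptions.
import numpy as np
import h5py

batch_shape = (120, 10, 10, 2048)

def iter_batches(num_batches=3):
    """Placeholder for however the bottleneck batches are actually produced."""
    for _ in range(num_batches):
        yield np.random.rand(*batch_shape).astype("float32")

with h5py.File("bottlenecks.h5", "w") as f:
    dset = f.create_dataset(
        "bottleneck_values",
        shape=(0,) + batch_shape[1:],
        maxshape=(None,) + batch_shape[1:],   # unlimited along axis 0
        chunks=(1,) + batch_shape[1:],        # one sample per chunk (~800 KB)
        dtype="float32",
    )
    for batch in iter_batches():
        n = dset.shape[0]
        dset.resize(n + batch.shape[0], axis=0)
        dset[n:] = batch                      # write the batch, then discard it
```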
I have a struct array created by MATLAB and stored in a v7.3 format MAT-file:
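Since v7.3 MAT-files are HDF5 containers, they can be opened with h5py as well. A rough sketch; the variable and field names below are hypothetical, and note that MATLAB stores struct fields as groups, with strings and cell arrays appearing as object references that need extra dereferencing:

```python
# Sketch: opening a MATLAB v7.3 .mat file with h5py. "data.mat",
# "my_struct" and "some_field" are hypothetical names.
import h5py

with h5py.File("data.mat", "r") as f:
    print(list(f.keys()))             # top-level MATLAB variables

    struct_group = f["my_struct"]     # a struct appears as an HDF5 group
    print(list(struct_group.keys()))  # struct fields appear as datasets/groups

    values = struct_group["some_field"][()]   # numeric fields read directly
```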