I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
My code:

import h5py
import numpy as np

f1 = h5py.File(file_name, 'r+')
This works and the file is read. But how can I access the data inside the file object f1?
Answers:
Method 1
Read HDF5
import h5py

filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys)
    # these can be group or dataset names
    print("Keys: %s" % f.keys())

    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key]))

    # If a_group_key is a group name,
    # this gets the object names in the group and returns them as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name,
    # this gets the dataset values and returns them as a list
    data = list(f[a_group_key])

    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns a numpy array
Write HDF5
import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
See h5py docs for more information.
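When the file has a nested hierarchy, it can be convenient to walk every group and dataset in one pass instead of indexing level by level. A minimal sketch using h5py's visititems; the file name and dataset layout here are made up for illustration:

```python
import h5py
import numpy as np

# Build a small nested file so the walk has something to visit
# (intermediate groups are created automatically from the path).
with h5py.File("example.hdf5", "w") as f:
    f.create_dataset("group1/dataset1", data=np.arange(5))
    f.create_dataset("group1/subgroup/dataset2", data=np.ones((2, 2)))

names = []

def collect(name, obj):
    # visititems calls this once for every group and dataset in the file
    kind = "Group" if isinstance(obj, h5py.Group) else "Dataset"
    names.append((name, kind))

with h5py.File("example.hdf5", "r") as f:
    f.visititems(collect)

for name, kind in names:
    print(name, kind)
```

This prints every object with its full path, which is often the quickest way to see what a file you did not create actually contains.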
Alternatives
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- CSV: Super simple format (read & write)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
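To make the trade-offs above concrete, here is the same small record round-tripped through two of the listed formats, JSON (human-readable text, portable across languages) and pickle (compact Python-only binary); the file names are arbitrary:

```python
import json
import pickle

record = {"name": "experiment_1", "values": [1.0, 2.5, 3.7]}

# JSON: human-readable, widely supported by other languages
with open("record.json", "w") as f:
    json.dump(record, f)
with open("record.json") as f:
    from_json = json.load(f)

# pickle: Python-native binary, not readable by other languages
with open("record.pkl", "wb") as f:
    pickle.dump(record, f)
with open("record.pkl", "rb") as f:
    from_pickle = pickle.load(f)

print(from_json == from_pickle)  # both round-trip the same data
```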
Method 2
Reading the file
import h5py

f = h5py.File(file_name, mode)
Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
    print(key)  # Names of the root level objects in the HDF5 file - can be groups or datasets.
    print(type(f[key]))  # get the object type: usually group or dataset
Extracting the data
# Get the HDF5 group; key needs to be a group name from above
group = f[key]

# Check what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset,
# and returns a np.array:
data = group[some_key_inside_the_group][()]

# Do whatever you want with data

# After you are done
f.close()
Method 3
You can use pandas:

import pandas as pd
pd.read_hdf(filename, key)
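Note that read_hdf expects a file written in pandas' own HDF5 layout (it uses PyTables under the hood, an optional dependency), so a minimal round trip looks like this; the file name and key are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# to_hdf/read_hdf require the optional 'tables' (PyTables) package
df.to_hdf("data.h5", key="my_table", mode="w")
loaded = pd.read_hdf("data.h5", key="my_table")
print(loaded.equals(df))
```

For HDF5 files produced by other tools (not pandas), h5py as shown in the other methods is the safer choice.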
Method 4
Here’s a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all keys to list
        for key in keys:
            if ':' in key:  # keys containing ':' hold weight data
                print(f[key].name)
                weights[f[key].name] = f[key][()]  # .value was removed in h5py 3.0
    return weights
https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.
Haven’t tested it thoroughly but does the job for me.
Method 5
To read the content of a .hdf5 file as an array, you can do something like the following:

import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)

Be aware, though, that np.fromfile reads raw bytes and does not understand the HDF5 format, so the result will include the file header and metadata and is generally not the data you want; prefer h5py as in the other methods.
Method 6
Use the code below to read the data and convert it into a numpy array:

import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
list(f1.keys())
X1 = f1['x']
y1 = f1['y']
df1 = np.array(X1.value)   # note: .value was removed in h5py 3.0; use X1[()] instead
dfy1 = np.array(y1.value)
print(df1.shape)
print(dfy1.shape)
Preferred method to read dataset values into a numpy array:
import h5py

# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print(arr_x1.shape)
    print(arr_y1.shape)
Method 7
If the .h5 file is a saved Keras model, you can load it with Keras directly:

from keras.models import load_model

h = load_model('FILE_NAME.h5')
Method 8
If you have named datasets in the hdf file then you can use the following code to read and convert these datasets in numpy arrays:
import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')
xdata = file.get('xdata')
xdata = np.array(xdata)
If your file is in a different directory, you can add the path in front of 'filename.h5'.
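For example, using the standard library's os.path to build the full path; the directory here is a temporary one created just for this sketch:

```python
import os
import tempfile
import h5py
import numpy as np

# Stand-in for whatever directory holds your file
data_dir = tempfile.mkdtemp()
file_path = os.path.join(data_dir, "filename.h5")

# Write a small dataset so there is something to read back
with h5py.File(file_path, "w") as f:
    f.create_dataset("xdata", data=np.arange(3))

with h5py.File(file_path, "r") as f:
    xdata = np.array(f.get("xdata"))
```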
Method 9
What you need to do is create a dataset. If you take a look at the quickstart guide, it shows that you use the file object to create a dataset via f.create_dataset, and then you can read the data back. This is explained in the docs.
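A minimal sketch of that workflow, creating a dataset and reading it back; the file and dataset names are illustrative:

```python
import h5py
import numpy as np

# Create a file and write a dataset into it
with h5py.File("mydata.hdf5", "w") as f:
    f.create_dataset("mydataset", data=np.arange(10))

# Reopen the file and read the dataset back as a numpy array
with h5py.File("mydata.hdf5", "r") as f:
    values = f["mydataset"][()]

print(values)
```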
Method 10
Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using
import h5py

with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]
where 'x' is simply the X coordinate in my case.
Method 11
Use this; it works fine for me:
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5("path.h5"))
If you are using h5py <= 2.9.0, then you can use:
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key].value  # .value works only in h5py < 3.0
    return weights

print(read_hdf5("path.h5"))
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.