How to retrieve all variable names within a NetCDF file using GDAL

I am struggling to find a way to retrieve metadata information from a file using GDAL.
Specifically, I would like to retrieve the band names and the order in which they are stored in a given file (be it a GeoTIFF or a NetCDF).

For instance, following the GDAL documentation, gdal.Dataset offers the “GetMetadata” method (see here and here). Although this method returns a whole set of information about the dataset, it does not provide the band names or the order in which they are stored in the file. In fact, this seems to be an old problem (from 2015) that has not been solved yet (more info here). Apparently, the R language has already solved this problem (see here), whereas Python has not.

Just to be thorough, I know that there are other Python packages that can help in this endeavour (e.g., xarray, rasterio, etc.); nevertheless, I would rather keep the set of packages used in a single script small. Therefore, I would like to know a definite way to find the band (a.k.a. variable) names and the order in which they are stored within a single file using GDAL alone.

Please let me know your thoughts in this regard.

Below is a starting point for solving this issue, in which a file is opened with GDAL (creating a Dataset object).

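from osgeo import gdal
from osgeo.gdal import Dataset

# Placeholder values; replace them with the actual folder, file and variable names.
input_dir = "path/to/folder"
file_name = "netcdfFile"
var = "variable_name"

# NetCDF subdatasets are addressed as NETCDF:"<path>":<variable>
OpeneddatasetFile = gdal.Open(f'NETCDF:"{input_dir}/{file_name}.nc":{var}')

if isinstance(OpeneddatasetFile, Dataset):
    print("File opened successfully")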


# This is where one should be able to fetch the variable (a.k.a. band) names
# of OpeneddatasetFile.
# Ideally, there would be a method that returns a dictionary
# with this information.

# Something like:

# VariablesWithinFile = OpeneddatasetFile.getVariablesWithinFileAsDictionary()

Answers:


Method 1

I have finally found a way to retrieve variable names from the NetCDF file using GDAL, thanks to the comments given by Robert Davy above.

I have organized the code into a set of functions to make it easier to read. Notice that there is also a function for reading metadata from the NetCDF file, which returns this information as a dictionary (see the “readInfo” function).

from osgeo import gdal
from osgeo.gdal import Dataset, InfoOptions


def read_data(filename):

    dataset = gdal.Open(filename)

    if not isinstance(dataset, Dataset):
        raise FileNotFoundError("Impossible to open the netcdf file")

    return dataset


def readInfo(ds, infoFormat="json"):
    "how to: https://gdal.org/python/"

    # With the "json" format, gdal.Info returns the report as a dictionary.
    info = gdal.Info(ds, options=InfoOptions(format=infoFormat))

    return info


def listAllSubDataSets(infoDict: dict):
    "Return the variable names listed in the SUBDATASETS metadata domain."

    # A dataset that exposes no subdatasets has no "SUBDATASETS" entry;
    # in that case an empty list is returned.
    subDatasets = infoDict["metadata"].get("SUBDATASETS", {})

    # Keys come in pairs: SUBDATASET_<n>_NAME and SUBDATASET_<n>_DESC.
    subDatasetVariableNames = [value for key, value in subDatasets.items()
                               if key.endswith("_NAME")]

    # Each name looks like NETCDF:"<path>":<variable>; keep the last part.
    return [x.replace('"', '').split(":")[-1]
            for x in subDatasetVariableNames]


if __name__ == "__main__":

    filename = "netcdfFile.nc"
    ds = read_data(filename)

    infoDict = readInfo(ds)

    infoDict["VariableNames"] = listAllSubDataSets(infoDict)
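
Since the question also asks about the order in which the variables are stored, here is a minimal sketch of two helpers that may complement the functions above. The first sorts the SUBDATASETS entries by their numeric index, so that SUBDATASET_10 does not end up before SUBDATASET_2. The second collects per-band descriptions through GetRasterBand and GetDescription, which may be useful for multi-band files such as GeoTIFFs. The helper names (ordered_variable_names, band_descriptions) are hypothetical and not part of GDAL.

```python
import re
from typing import Any, Dict

# Matches keys such as "SUBDATASET_12_NAME" and captures the index (12).
_NAME_KEY = re.compile(r"^SUBDATASET_(\d+)_NAME$")


def ordered_variable_names(subdatasets: Dict[str, str]) -> Dict[int, str]:
    """Map each subdataset index to its variable name, sorted by index.

    `subdatasets` is the SUBDATASETS metadata dictionary, for example
    infoDict["metadata"]["SUBDATASETS"] from the code above.
    """
    indexed = {}
    for key, value in subdatasets.items():
        match = _NAME_KEY.match(key)
        if match:
            # Values look like NETCDF:"<path>":<variable>; keep the variable.
            name = value.replace('"', "").split(":")[-1]
            indexed[int(match.group(1))] = name
    return dict(sorted(indexed.items()))


def band_descriptions(dataset: Any) -> Dict[int, str]:
    """Map each 1-based band number to its description.

    `dataset` is expected to behave like an opened gdal.Dataset.
    A description is an empty string when the file does not store one.
    """
    return {
        i: dataset.GetRasterBand(i).GetDescription()
        for i in range(1, dataset.RasterCount + 1)
    }
```

With a real dataset, the first helper could be called as ordered_variable_names(ds.GetMetadata("SUBDATASETS")), and the second as band_descriptions(ds). Whether band descriptions are actually populated depends on how the file was written, so an empty result does not necessarily mean the bands are unnamed in other tools.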


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
