Enumerations ------------ HDF5 has the concept of an "`enumeration data type`", where integer values are stored in an array, but where those integer values should be interpreted as the indexes to some string values. For example, one could have an enumeration dictionary (`enum_dict`) defined as: .. code-block:: python clouds = ['stratus', 'strato-cumulus', 'missing', 'nimbus', 'cumulus', 'longcloudname'] enum_dict = {v:k for k,v in enumerate(clouds)} enum_dict['missing'] = 255 And an array of data which looked something like .. code-block:: python cloud_cover = [0, 3, 4, 4, 4, 1, 255, 1, 1] Which one would expect to interpret as .. code-block:: python actual_cloud_cover = ['stratus', 'nimbus', 'cumulus', 'cumulus', 'cumulus', 'stratus', 'missing', 'strato-cumulus', 'strato-cumulus'] These data are stored in HDF5 using a combination of an integer valued array and a stored dictionary which is used for the enumeration. When the data is read, the integer array has a special numpy datatype, with the enumeration dictionary stored as metadata on the data type. The enumeration dictionary itself can be stored as a ``Datatype``, but it doesn't need to be and nor is it necessary to use that datatype to use an enumeration variable (the enumeration is not stored as a normal data variable and so can be stored without using a Datatype object in the file). So, while finding a Datatype in your HDF5 file is probably an indication that you have an enumeration (or some other complication) in the file, it is not necessary to do anything with it if it is an enumeration datatype. Whether or not there is an enumeration DataType in the file, one can only find out if any integer data array read from a data file is linked to an enumeration by checking it's data type using :meth:`pyfive.check_enum_dtype` as shown in the following example: .. code-block:: python with pyfive.File('myfile.h5') as pfile: evar = pfile['evar'] edict = pyfive.check_enum_dtype(evar.dtype) if edict is None: pass # not an enumeration else: # for some reason HDF5 defines these in what seems to be the wrong way around, # with the string values as keys to the integer indices. edict_reverse = {v:k for k,v in edict.items()} # assuming evar data is a one dimensional array of integers edata = [edict_reverse[k] for k in evar[:]] In this instance, ``edata`` would now be a array of strings indexed from the enumeration dictionary using the ``evar`` data as the index values. .. note:: ``h5py`` and hence ``pyfive`` have both used an internal numpy dtype metadata feature to implement enumerations. ``numpy`` is not clear on the future of this feature, and doesn't promise to transfer metadata with all operations, so the output of operations on this integer array may lose the direct link to the enumeration via the dtype. Meanwhile, as well as using the `check_enum_dtype`, you can also get to this dictionary directly yourself, it is available at ``evar.dtype.metadata['enum']``.