WHY NETCDF AND OPENDAP?

Because, for a large class of numerical data, netCDF has many properties that are important for both long-term usability and efficient processing. OPeNDAP adds programmatic access to specific parts of a dataset without the need to download the whole file.

Currently, approximately 90% of the 4TU.ResearchData archive consists of netCDF data. These data can be found in our data repository or directly on our OPeNDAP server.

What is netCDF?

Many datasets essentially consist of multidimensional blocks of numbers associated with a variable and a physical unit, where each dimension (axis of the block) is in turn annotated with numbers and a unit. Add global metadata describing everything needed to understand the data, stuff it all in one file, make sure it is efficient to store and process, even add support for more complex data structures, and voilà: there is your perfect data format. That, in short, is netCDF.
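To make this data model concrete, here is a toy, pure-Python sketch of it. It is not the netCDF library or file format, and all class and variable names are illustrative. It has named dimensions with lengths, variables that span some of those dimensions and carry attributes such as `units`, and global attributes. Following common netCDF practice, the values along each axis are stored in a one-dimensional "coordinate variable" that has the same name as its dimension.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def _matches(data, lengths: list[int]) -> bool:
    """True if the nested lists in `data` have exactly the given length along each axis."""
    if not lengths:
        return not isinstance(data, list)  # reached a single value
    return (isinstance(data, list) and len(data) == lengths[0]
            and all(_matches(item, lengths[1:]) for item in data))


@dataclass
class Variable:
    dims: tuple[str, ...]  # dimension names, one per axis of the block
    data: list             # nested lists, one nesting level per dimension
    attrs: dict[str, str] = field(default_factory=dict)  # e.g. units


@dataclass
class Dataset:
    dims: dict[str, int]  # dimension name -> length
    variables: dict[str, Variable] = field(default_factory=dict)
    attrs: dict[str, str] = field(default_factory=dict)  # global metadata

    def add(self, name: str, var: Variable) -> None:
        """Add a variable after checking its shape against the declared dimensions."""
        lengths = [self.dims[d] for d in var.dims]  # KeyError for an undeclared dimension
        if not _matches(var.data, lengths):
            raise ValueError(f"{name}: data does not match dimensions {var.dims}")
        self.variables[name] = var


# A 2 x 3 block of temperatures with annotated axes and global metadata.
ds = Dataset(dims={"time": 2, "station": 3},
             attrs={"title": "Example air temperature record"})
ds.add("time", Variable(("time",), [0, 1], {"units": "days since 2020-01-01"}))
ds.add("station", Variable(("station",), [1, 2, 3], {"long_name": "station number"}))
ds.add("temperature", Variable(("time", "station"),
                               [[281.2, 280.9, 282.0], [279.8, 280.1, 281.5]],
                               {"units": "K"}))
```

A real netCDF library, such as netCDF4-python or xarray in Python, stores this same kind of structure in a compact binary file and adds features such as compression.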

Reading and writing netCDF is supported in Python, Java, C/C++, Fortran and Matlab, either natively, as part of larger software distributions, or through additional packages.

More about netCDF: short, long.

Conventions

NetCDF offers almost unlimited freedom in how you model your data and metadata within the paradigm of the annotated multidimensional array. This is nice, but for proper cooperation and interoperability you need to agree on a few things, such as physical units and their names or, more generally, the names (and sometimes the allowed values) of metadata fields for the dataset and its variables and dimensions. Such agreements are called conventions, and the most important ones are the CF (Climate and Forecast) conventions. There is a large community around the CF conventions, and some software tools rely on these conventions being followed. As the name “Climate and Forecast” suggests, netCDF is most widely used in atmospheric sciences, oceanography and related fields.
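As a rough illustration of what such conventions pin down, the hypothetical function below checks a few CF-style attributes on a dataset described as plain dictionaries. It looks for a global `Conventions` attribute naming a CF version (for example `CF-1.8`), a `units` attribute on each variable, and a `standard_name` or `long_name` describing each variable. This is a simplified sketch, not a compliance checker. Real CF checking covers far more and, for instance, does not require `units` on dimensionless quantities.

```python
from __future__ import annotations


def cf_warnings(global_attrs: dict[str, str],
                variables: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable warnings for a few basic CF-style attribute checks."""
    warnings = []
    # Several conventions may be listed in one attribute, separated by blanks or commas.
    names = global_attrs.get("Conventions", "").replace(",", " ").split()
    if not any(name.startswith("CF-") for name in names):
        warnings.append("global attribute 'Conventions' does not name a CF version")
    for var, attrs in variables.items():
        # Simplification: CF only requires 'units' for dimensional quantities.
        if "units" not in attrs:
            warnings.append(f"{var}: missing 'units'")
        if "standard_name" not in attrs and "long_name" not in attrs:
            warnings.append(f"{var}: missing both 'standard_name' and 'long_name'")
    return warnings


example = cf_warnings(
    {"Conventions": "CF-1.8", "title": "Example"},
    {
        "time": {"units": "days since 2020-01-01", "standard_name": "time"},
        "temperature": {"standard_name": "air_temperature"},  # units forgotten
    },
)
print(example)  # ["temperature: missing 'units'"]
```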

What is OPeNDAP?

OPeNDAP is a protocol for using data on a remote server without downloading the data files. This includes inspecting the embedded metadata as well as retrieving specific ranges, slices and subsamples of the data. Conversely, a server can be configured to serve a set of data files (e.g. a time series with one file per month) as a single dataset. While much of this is directly accessible with a web browser, the most powerful way to use OPeNDAP is through the interfaces available for Python, Java, Matlab, etc.
OPeNDAP is especially suited (but not strictly limited) to netCDF data.

OPeNDAP and 4TU.ResearchData

At 4TU.ResearchData, we run a so-called THREDDS server that “speaks” OPeNDAP, primarily for netCDF files. Uploaded datasets consisting of netCDF files end up there. For datasets with mixed content (partially netCDF), we decide case by case whether the netCDF files are stored on the OPeNDAP server or in the regular data repository together with the rest of the dataset. In all cases, the dataset has a “homepage” in the regular data repository, with metadata, a DOI and, where applicable, a link to the OPeNDAP data. Currently, this page also shows an overview of the files on the OPeNDAP server and the available options for viewing their content and/or metadata.
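A THREDDS server also publishes machine-readable catalogs (`catalog.xml`) that list its datasets and the services, such as OPeNDAP, through which they are available. The sketch below uses only the Python standard library to derive OPeNDAP URLs from such a catalog. The catalog text and the server address are invented for the example. The code assumes a single OPeNDAP service that applies to every dataset with a `urlPath`, whereas real catalogs can be more complex (references to nested catalogs, per-dataset service choices).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XML namespace of THREDDS catalogs (InvCatalog 1.0).
THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Invented example of a small catalog, as it might be fetched from a catalog.xml URL.
CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" version="1.0.1">
  <service name="all" serviceType="Compound" base="">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"/>
  </service>
  <dataset name="Monthly files">
    <dataset name="2020-01.nc" urlPath="demo/2020-01.nc"/>
    <dataset name="2020-02.nc" urlPath="demo/2020-02.nc"/>
  </dataset>
</catalog>"""


def opendap_urls(catalog_xml: str, server: str) -> list[str]:
    """Return the OPeNDAP URL of every dataset in the catalog that has a urlPath."""
    root = ET.fromstring(catalog_xml)
    # Find the (possibly nested) OPeNDAP service; compare case-insensitively,
    # since the type may be spelled e.g. "OpenDAP" or "OPENDAP".
    base = next((svc.get("base") for svc in root.iter(THREDDS_NS + "service")
                 if svc.get("serviceType", "").lower() == "opendap"), None)
    if base is None:
        raise ValueError("catalog does not advertise an OPeNDAP service")
    # Datasets can be nested; iter() walks the whole element tree.
    return [urljoin(server, base + ds.get("urlPath"))
            for ds in root.iter(THREDDS_NS + "dataset") if ds.get("urlPath")]


for url in opendap_urls(CATALOG, "https://example.org"):
    print(url)
```

Against a live server, you would first download the catalog text, for example with `urllib.request.urlopen`, from a catalog URL ending in `catalog.xml`.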

For large data collections involving many files, please contact us for a tailored solution.

Examples