Using the GöDL Data Catalog for Semantic Data Access on the GWDG HPC Systems

Content

Data management is challenging in general, and particularly so on HPC systems. Because storage is tiered, data may reside on several different storage systems. Data-intensive research in particular often involves large data sets with many files. The well-established practice of encoding semantic metadata in paths and filenames quickly leads to long, complicated names, which makes it hard to apply to very large data sets.

A different approach is to use a data catalog, in which a set of metadata tags is indexed and associated with individual files. This makes it possible to identify and access files through semantic queries rather than through overly complicated paths.
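The idea can be illustrated with a small sketch. The following toy catalog is purely hypothetical and is not the GöDL interface: the class `ToyCatalog`, its methods `ingest` and `search`, the tag names, and the file paths are all assumptions made for illustration. It only shows the principle of indexing metadata tags per file and retrieving files by query instead of by path.

```python
from collections import defaultdict


class ToyCatalog:
    """Hypothetical in-memory catalog for illustration; NOT the GöDL API."""

    def __init__(self):
        # Inverted index: (tag key, tag value) -> set of file paths with that tag.
        self._index = defaultdict(set)

    def ingest(self, path, **tags):
        """Register a file together with its metadata tags."""
        for key, value in tags.items():
            self._index[(key, value)].add(path)

    def search(self, **tags):
        """Return the sorted paths of files that carry all of the given tags."""
        if not tags:
            return []
        matches = [self._index.get((k, v), set()) for k, v in tags.items()]
        return sorted(set.intersection(*matches))


catalog = ToyCatalog()
# Paths and tags below are made up; the path itself carries no semantics.
catalog.ingest("/scratch/run_0001.h5", experiment="climate", year=2023, stage="raw")
catalog.ingest("/scratch/run_0002.h5", experiment="climate", year=2024, stage="raw")
catalog.ingest("/archive/a7f3.h5", experiment="genomics", year=2024, stage="processed")

# Semantic query: all climate files from 2024, wherever they are stored.
result = catalog.search(experiment="climate", year=2024)
print(result)  # ['/scratch/run_0002.h5']
```

In this sketch the query does not depend on where a file is stored or what it is called, which is the property that makes a catalog attractive on tiered storage.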

This course provides a basic introduction to the Data Catalog tool that the GWDG offers on all of its HPC systems. Following a short presentation, participants can explore the tool on their own during a hands-on session.

Requirements

  • Basic experience with HPC systems
  • Basic experience with data management

Learning goal

  • Understand the concept of a data catalog and how to apply it in your use cases
  • Learn how to use the GöDL Data Catalog to ingest, search, stage and migrate your data as part of an overarching HPC workflow

Skills

Trainer

Next appointment

Date        Link
13.03.2025  https://academy.gwdg.de/p/event.xhtml?id=6734551a5d441669671bc634
30.10.2025  https://academy.gwdg.de/p/event.xhtml?id=6826435c298a9177e714d86e
Last modified: 2025-05-27 06:59:41