Using the GöDL Data Catalog for Semantic Data Access on the GWDG HPC Systems
Content
Data management is challenging in general, and particularly so on HPC systems. Because storage is tiered, data may reside on several different storage systems. Data-intensive research in particular often involves large data sets with many files. The well-established practice of encoding semantic metadata in paths and filenames quickly becomes unwieldy as the metadata accumulates, making it hard to apply to very large data sets.
A different approach is to use a data catalog, in which a set of metadata tags is indexed and associated with individual files. This makes it possible to identify and access files through semantic queries rather than through overly complicated paths.
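To make the idea concrete, the following is a minimal sketch of a tag-based catalog in Python. It is a toy in-memory model for illustration only and does not reflect the GöDL interface; the class `ToyCatalog`, its methods, the file paths, and the tag names are all hypothetical.

```python
from collections import defaultdict


class ToyCatalog:
    """Illustrative in-memory catalog (hypothetical, not the GöDL API)."""

    def __init__(self):
        self._tags = {}                 # path -> dict of metadata tags
        self._index = defaultdict(set)  # (key, value) -> set of paths

    def ingest(self, path, **tags):
        """Register a file under a set of metadata tags."""
        # Drop stale index entries if the path was ingested before.
        for item in self._tags.get(path, {}).items():
            self._index[item].discard(path)
        self._tags[path] = dict(tags)
        for item in tags.items():
            self._index[item].add(path)

    def search(self, **query):
        """Return the sorted paths whose tags match every key/value in the query."""
        if not query:
            return sorted(self._tags)
        matches = [self._index.get(item, set()) for item in query.items()]
        return sorted(set.intersection(*matches))


# Hypothetical files: the paths carry no semantic information at all.
catalog = ToyCatalog()
catalog.ingest("/scratch/run_0001/out.h5", experiment="climate", year=2023, resolution="1km")
catalog.ingest("/work/archive/a7f3.h5", experiment="climate", year=2024, resolution="1km")
catalog.ingest("/scratch/tmp/b912.h5", experiment="genomics", year=2024, resolution="n/a")

# Semantic query: select files by what they contain, not by where they are stored.
hits = catalog.search(experiment="climate", resolution="1km")
print(hits)
```

The query selects files by their tags alone, so the files can sit on different storage tiers or be moved without the query changing.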
This course provides a basic introduction to the Data Catalog tool offered by the GWDG on all of its HPC systems. Following a short presentation, participants can explore the tool on their own during a hands-on session.
Requirements
- Basic experience with HPC systems
- Basic experience with data management
Learning goal
- Understand the concept of a data catalog and how to apply it in your use cases
- Learn how to use the GöDL Data Catalog to ingest, search, stage and migrate your data as part of an overarching HPC workflow
Skills
Trainer
Next appointment
Date | Link |
---|---|
13.03.2025 | https://academy.gwdg.de/p/event.xhtml?id=6734551a5d441669671bc634 |
30.10.2025 | https://academy.gwdg.de/p/event.xhtml?id=6826435c298a9177e714d86e |