High Performance Data Analytics: Big Data meets HPC
Content
Big Data analytics problems are ubiquitous in scientific research, industrial production and business services. Efficient tools for storing, processing and analysing Big Data on powerful supercomputers are necessary for discovering patterns and gaining insights in data-intensive fields such as biomolecular science, global climate change, cancer research and cybersecurity. Big Data analytics technology is developing rapidly, and the High Performance Computing (HPC) infrastructure used to process and analyse Big Data is of great importance in scientific research.
In this course, learners will gain essential knowledge of emerging tools for data analysis on HPC systems. We will investigate parallelization opportunities in standard examples of Big Data analytics. Learners will also learn how to manage and integrate data into parallel processing tools.
Target audience: Researchers and students who use the HPC system for data-intensive problems.
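To give a rough idea of what a "parallelization opportunity" in a standard analytics example can look like, here is a minimal sketch of a word count split into an independent map step and a combining reduce step. It is not taken from the course material: the function names (`split_into_chunks`, `count_words`, `merge_counts`, `word_count`) and the sample text are hypothetical, and it uses only Python's standard library rather than Apache Spark.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Map step: count words in one chunk, independently of all others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def word_count(lines, n_chunks=4, map_fn=map):
    """Run the map and reduce steps; map_fn can be swapped for a parallel map."""
    partials = map_fn(count_words, split_into_chunks(lines, n_chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    text = ["big data meets hpc", "hpc meets big data analytics"]
    # Each chunk is counted in a separate worker process.
    with ProcessPoolExecutor(max_workers=2) as pool:
        result = word_count(text, n_chunks=2, map_fn=pool.map)
    print(result.most_common(3))
```

Because each chunk is counted without reference to any other chunk, the map step can be distributed across processes or nodes, and only the small partial results need to be merged. Calling `word_count(lines)` without `map_fn` runs the same logic serially, which makes it easy to compare the two results.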
Curriculum:
- Data Management and Integration:
  - GWDG Data Pool for Scientific Research
  - Data Lakes and Data Warehouse
- Distributed Big Data Analytics Tools and Technology:
  - Using Apache Spark in HPC Systems
Requirements
- Introduction to SCC course (GWDG Academy) or general knowledge of Linux and HPC systems
- Data Management course (GWDG Academy)
- Basic understanding of Linear Algebra
- Basic programming skills in Python
Learning goal
- Provide interested learners with essential knowledge of emerging tools for data analysis on HPC systems
- Learners will also have an opportunity to work on their own data sets
Skills
Trainer
Next appointment
Date | Link |
---|---|
22.05.2025 | https://academy.gwdg.de/p/event.xhtml?id=673457e15d441669671bc637 |