High Performance Computing
Master high performance computing from cluster architecture to parallel programming, covering MPI, OpenMP, job schedulers, profiling, and real-world scientific computing applications.
What you'll learn
- Design and understand HPC cluster architectures
- Write parallel programs using MPI and OpenMP
- Optimize code for performance on multi-core and distributed systems
- Profile and debug parallel applications
- Deploy and manage jobs on HPC schedulers
Course Modules
12 modules

Module 1: Introduction to High Performance Computing (30m)
Understanding what HPC is and why it matters for science and industry.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Supercomputer
- Define and explain FLOPS
- Define and explain Compute Node
- Define and explain Interconnect
- Define and explain TOP500
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
High Performance Computing (HPC) aggregates computing power to solve problems impossible for single computers. From weather prediction to genome sequencing, drug discovery to financial modeling, HPC enables breakthroughs by processing massive datasets and running complex simulations. Modern supercomputers contain millions of processor cores connected by high-speed networks, achieving petaflops (10^15 floating-point operations per second) and approaching exascale. HPC combines hardware architecture, parallel programming, and optimization techniques. Understanding HPC opens doors to cutting-edge research and industrial applications where computational power drives innovation.
The concepts below build on one another, from the machines themselves to how their performance is measured and ranked.
Supercomputer
What is Supercomputer?
Definition: Computer system with extremely high processing capability
Supercomputers combine thousands of compute nodes into a single system with sustained performance far beyond any desktop machine. Today's leading systems, such as Frontier at Oak Ridge National Laboratory, fill entire machine rooms, require dedicated power and cooling, and deliver performance measured in exaflops.
Key Point: Supercomputer is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
FLOPS
What is FLOPS?
Definition: Floating-point operations per second, measure of computing speed
FLOPS quantifies floating-point throughput: a single core running at 3 GHz and completing 16 floating-point operations per cycle peaks at 48 gigaflops. Scaled across millions of cores, modern supercomputers reach petaflops (10^15) and exaflops (10^18). Benchmarks such as LINPACK, used by the TOP500 list, measure sustained rather than theoretical peak FLOPS.
Key Point: FLOPS is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Compute Node
What is Compute Node?
Definition: Individual server in an HPC cluster running calculations
A compute node is the basic building block of a cluster: a server with its own CPUs, memory, and often GPUs, running a full operating system. Jobs are allocated one or more nodes, and a program that spans nodes must communicate over the interconnect, because nodes do not share memory.
Key Point: Compute Node is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Interconnect
What is Interconnect?
Definition: High-speed network connecting cluster nodes
The interconnect determines how quickly nodes exchange data; both latency (microseconds per message) and bandwidth (gigabytes per second) matter. Technologies such as InfiniBand and HPE Slingshot are designed specifically for HPC traffic, and a slow interconnect can bottleneck an otherwise fast cluster.
Key Point: Interconnect is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
TOP500
What is TOP500?
Definition: Ranking of world's fastest supercomputers updated twice yearly
The TOP500 list, published each June and November, ranks supercomputers by sustained performance on the LINPACK benchmark. Because it has run since 1993, it also serves as a long-term record of performance trends across the whole field.
Key Point: TOP500 is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: HPC System Architecture Overview
HPC systems consist of compute nodes, storage systems, and interconnects. Compute nodes contain CPUs (often 32-128 cores), memory (256GB-2TB), and sometimes GPUs or accelerators. Storage tiers include fast parallel file systems (Lustre, GPFS) for active data and tape archives for long-term storage. High-speed interconnects (InfiniBand, Slingshot, Omni-Path) provide low-latency, high-bandwidth communication between nodes. Job schedulers (Slurm, PBS, LSF) manage resource allocation. The TOP500 list ranks the world's fastest supercomputers; as of 2024, Frontier (Oak Ridge) leads at over 1 exaflop. Understanding this architecture is essential for writing efficient HPC applications.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The first computer to break the exaflop barrier was Frontier in 2022, capable of a quintillion calculations per second, more than the next seven supercomputers combined!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Supercomputer | Computer system with extremely high processing capability |
| FLOPS | Floating-point operations per second, measure of computing speed |
| Compute Node | Individual server in an HPC cluster running calculations |
| Interconnect | High-speed network connecting cluster nodes |
| TOP500 | Ranking of world's fastest supercomputers updated twice yearly |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Supercomputer means and give an example of why it is important.
In your own words, explain what FLOPS means and give an example of why it is important.
In your own words, explain what Compute Node means and give an example of why it is important.
In your own words, explain what Interconnect means and give an example of why it is important.
In your own words, explain what TOP500 means and give an example of why it is important.
Summary
In this module, we explored Introduction to High Performance Computing. We learned about supercomputer, flops, compute node, interconnect, top500. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 2: Parallel Computing Fundamentals (30m)
Core concepts of parallelism and concurrent execution.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Parallel Computing
- Define and explain Amdahl's Law
- Define and explain Strong Scaling
- Define and explain Weak Scaling
- Define and explain Load Balancing
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Parallel computing executes multiple calculations simultaneously to solve problems faster. The key challenge is decomposing problems into independent pieces that can run concurrently. Two main paradigms exist: shared memory (multiple cores access common RAM) and distributed memory (separate nodes with private memory communicate via messages). Amdahl's Law states that speedup is limited by the sequential portion of code. Gustafson's Law counters that larger problems can maintain efficiency. Understanding parallelism types (data parallelism, task parallelism, pipeline parallelism) and their appropriate applications is fundamental to HPC programming.
The concepts below build on one another, from the basic idea of parallel execution to the laws that govern how well it scales.
Parallel Computing
What is Parallel Computing?
Definition: Simultaneous execution of multiple calculations
Parallel computing divides a problem among many processors working at once: a weather model might assign each processor a region of the atmosphere, with neighbors exchanging boundary data every time step. The achievable speedup depends on how independent the pieces are and how much they must communicate.
Key Point: Parallel Computing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Amdahl's Law
What is Amdahl's Law?
Definition: Formula for maximum speedup limited by serial code fraction
Amdahl's Law formalizes a simple observation: the parts of a program that cannot be parallelized set a hard ceiling on speedup. If 5% of the runtime is serial, no number of processors can deliver more than a 20x speedup. This makes identifying and shrinking serial sections a central task of HPC optimization.
Key Point: Amdahl's Law is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Strong Scaling
What is Strong Scaling?
Definition: Speedup for fixed problem size with more processors
Strong scaling asks: if the problem size stays fixed, how much faster does it run on more processors? It matters when a fixed answer is needed sooner, such as a weather forecast that must finish before the weather arrives. Strong scaling eventually breaks down because each processor's share of work shrinks while communication overhead does not.
Key Point: Strong Scaling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Weak Scaling
What is Weak Scaling?
Definition: Efficiency when problem and processors scale together
Weak scaling grows the problem with the machine: double the processors, double the problem size, and ask whether runtime stays constant. It reflects how HPC is often used in practice, where bigger machines enable higher-resolution simulations rather than faster small ones.
Key Point: Weak Scaling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Load Balancing
What is Load Balancing?
Definition: Distributing work evenly across processors
Load balancing ensures no processor sits idle waiting for others to finish. An imbalanced decomposition, where one rank receives a dense region of a simulation while others receive sparse ones, wastes the whole machine's time at every synchronization point. Common strategies include static partitioning, dynamic work queues, and periodic repartitioning.
Key Point: Load Balancing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Amdahl's Law and Scalability
Amdahl's Law: Speedup = 1 / (S + P/N), where S is the serial fraction, P is the parallel fraction (S+P=1), and N is the number of processors. If 10% of code is serial, maximum speedup is 10x regardless of processors. This highlights the importance of minimizing serial bottlenecks. Strong scaling measures speedup for fixed problem size; weak scaling measures efficiency as both problem size and processors increase proportionally. Efficiency = Speedup/N; ideal is 1.0 (100%). Real applications rarely achieve linear scaling due to communication overhead, load imbalance, and serial sections. Profiling identifies scaling bottlenecks.
Did You Know? Gene Amdahl presented his famous law in 1967, and it's still the fundamental limit on parallel speedup today, over 55 years later!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Parallel Computing | Simultaneous execution of multiple calculations |
| Amdahl's Law | Formula for maximum speedup limited by serial code fraction |
| Strong Scaling | Speedup for fixed problem size with more processors |
| Weak Scaling | Efficiency when problem and processors scale together |
| Load Balancing | Distributing work evenly across processors |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Parallel Computing means and give an example of why it is important.
In your own words, explain what Amdahl's Law means and give an example of why it is important.
In your own words, explain what Strong Scaling means and give an example of why it is important.
In your own words, explain what Weak Scaling means and give an example of why it is important.
In your own words, explain what Load Balancing means and give an example of why it is important.
Summary
In this module, we explored Parallel Computing Fundamentals. We learned about parallel computing, amdahl's law, strong scaling, weak scaling, load balancing. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 3: Shared Memory Programming with OpenMP (30m)
Parallel programming for multi-core processors using OpenMP directives.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain OpenMP
- Define and explain Fork-Join Model
- Define and explain Thread
- Define and explain Critical Section
- Define and explain Reduction
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
OpenMP (Open Multi-Processing) is an API for shared-memory parallel programming in C, C++, and Fortran. It uses compiler directives (#pragma omp) to parallelize code with minimal changes. The fork-join model creates threads that execute in parallel, then rejoin. Basic parallelization: #pragma omp parallel for before a loop divides iterations among threads. OpenMP handles thread creation, synchronization, and work distribution. It's ideal for loop-level parallelism on multi-core CPUs. Key concepts include private/shared variables, reductions, critical sections, and scheduling options. OpenMP is often the first step in parallelizing sequential code.
The concepts below progress from the OpenMP execution model to the synchronization constructs that keep threads correct.
OpenMP
What is OpenMP?
Definition: API for shared-memory parallel programming using compiler directives
OpenMP's appeal is incremental parallelization: you annotate existing loops with directives, and a compiler flag (-fopenmp for GCC) turns them into multithreaded code. Without the flag the pragmas are ignored and the program runs serially, so a single source tree serves both builds.
Key Point: OpenMP is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Fork-Join Model
What is Fork-Join Model?
Definition: Parallel execution pattern where threads fork and later rejoin
In the fork-join model, a program runs serially on a master thread until it reaches a parallel region, forks a team of worker threads for that region, then joins back to a single thread. This maps naturally onto programs that alternate between serial setup and parallel computation.
Key Point: Fork-Join Model is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread
What is Thread?
Definition: Independent execution path within a process
Threads within a process share the same address space, which makes data sharing cheap but introduces the risk of race conditions when two threads write the same variable. OpenMP typically maps one thread per core.
Key Point: Thread is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Critical Section
What is Critical Section?
Definition: Code region that only one thread can execute at a time
A critical section serializes access to shared data: only one thread executes it at a time. Critical sections prevent races but cost performance because threads queue to enter; prefer reductions or private variables where possible.
Key Point: Critical Section is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Reduction
What is Reduction?
Definition: Combining values from multiple threads into a single result
A reduction combines per-thread partial results, such as partial sums, into one final value using an associative operation. OpenMP's reduction clause gives each thread a private copy of the variable and merges the copies at the end of the region, avoiding a critical section in the common case of accumulating a total.
Key Point: Reduction is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: OpenMP Directives and Clauses
Key OpenMP directives: #pragma omp parallel creates a team of threads; #pragma omp for distributes loop iterations; #pragma omp sections defines distinct parallel tasks; #pragma omp single executes on one thread only; #pragma omp critical protects shared data access. Important clauses: private(var) gives each thread its own copy; shared(var) indicates all threads access the same variable; reduction(op:var) combines thread-local results (e.g., reduction(+:sum)); schedule(type) controls loop iteration distribution (static, dynamic, guided). Thread count is set via OMP_NUM_THREADS environment variable or omp_set_num_threads(). Proper use of these constructs ensures correct, efficient parallel execution.
Did You Know? OpenMP was first released in 1997 and is now supported by all major compilers including GCC, Clang, Intel, and Microsoft Visual C++!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| OpenMP | API for shared-memory parallel programming using compiler directives |
| Fork-Join Model | Parallel execution pattern where threads fork and later rejoin |
| Thread | Independent execution path within a process |
| Critical Section | Code region that only one thread can execute at a time |
| Reduction | Combining values from multiple threads into a single result |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what OpenMP means and give an example of why it is important.
In your own words, explain what Fork-Join Model means and give an example of why it is important.
In your own words, explain what Thread means and give an example of why it is important.
In your own words, explain what Critical Section means and give an example of why it is important.
In your own words, explain what Reduction means and give an example of why it is important.
Summary
In this module, we explored Shared Memory Programming with OpenMP. We learned about openmp, fork-join model, thread, critical section, reduction. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 4: Distributed Memory Programming with MPI (30m)
Message passing for scalable parallel programs across multiple nodes.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain MPI
- Define and explain Rank
- Define and explain Communicator
- Define and explain Point-to-Point
- Define and explain Collective Operation
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
MPI (Message Passing Interface) is the standard for distributed-memory parallel programming. Unlike OpenMP, each MPI process has its own memory space; data sharing requires explicit message passing. MPI programs run multiple processes (ranks) that communicate using point-to-point operations (Send/Recv) or collective operations (Broadcast, Reduce, Gather, Scatter). MPI scales to thousands of nodes, enabling massive parallelism. The SPMD (Single Program Multiple Data) model runs the same code on all ranks, with rank-based branching for different roles. Understanding MPI is essential for programming supercomputers and large clusters.
The concepts below progress from the MPI process model to the communication operations that connect processes.
MPI
What is MPI?
Definition: Message Passing Interface for distributed-memory parallel programming
MPI is a library specification, not a language: programs call functions such as MPI_Send and MPI_Bcast from C, C++, or Fortran, and implementations such as Open MPI and MPICH provide the runtime. Because every data exchange is explicit, MPI programs can scale to machines where no shared memory exists.
Key Point: MPI is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Rank
What is Rank?
Definition: Unique identifier for each MPI process
Each process in an MPI job is identified by its rank, an integer from 0 to size-1 within a communicator. Ranks are how processes address each other in messages and how a single program assigns different roles, with rank 0 conventionally handling I/O and coordination.
Key Point: Rank is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Communicator
What is Communicator?
Definition: Group of MPI processes that can communicate
A communicator defines a group of processes and an isolated communication context. MPI_COMM_WORLD contains every process in the job; programs can split it into subgroups, for example one communicator per row of a process grid, to structure collective operations.
Key Point: Communicator is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Point-to-Point
What is Point-to-Point?
Definition: Communication between two specific processes
Point-to-point communication moves a message from one named sender to one named receiver, matched by communicator, tag, and source/destination. It is the building block for irregular patterns such as halo exchanges between neighboring ranks in a domain decomposition.
Key Point: Point-to-Point is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Collective Operation
What is Collective Operation?
Definition: Communication involving all processes in a group
Collective operations involve every process in a communicator at once, such as broadcasting parameters from rank 0 or summing partial results with MPI_Reduce. Implementations optimize collectives with tree and pipeline algorithms, which is why they usually outperform hand-written point-to-point equivalents.
Key Point: Collective Operation is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: MPI Communication Patterns
Point-to-point: MPI_Send(buf, count, type, dest, tag, comm) sends data; MPI_Recv(buf, count, type, source, tag, comm, status) receives. Blocking operations wait for completion; non-blocking (MPI_Isend, MPI_Irecv) return immediately, allowing overlap of computation and communication. Collective operations: MPI_Bcast distributes data from one rank to all; MPI_Reduce combines data from all ranks to one; MPI_Allreduce combines and distributes result to all; MPI_Scatter distributes array portions; MPI_Gather collects portions into array. MPI_Barrier synchronizes all processes. Proper use of collectives improves performance over equivalent point-to-point implementations. MPI supports custom datatypes and communicators for advanced patterns.
Did You Know? MPI was standardized in 1994 and is still the dominant model for supercomputing. The same program can run on a laptop with 4 cores or a supercomputer with millions!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| MPI | Message Passing Interface for distributed-memory parallel programming |
| Rank | Unique identifier for each MPI process |
| Communicator | Group of MPI processes that can communicate |
| Point-to-Point | Communication between two specific processes |
| Collective Operation | Communication involving all processes in a group |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what MPI means and give an example of why it is important.
In your own words, explain what Rank means and give an example of why it is important.
In your own words, explain what Communicator means and give an example of why it is important.
In your own words, explain what Point-to-Point means and give an example of why it is important.
In your own words, explain what Collective Operation means and give an example of why it is important.
Summary
In this module, we explored Distributed Memory Programming with MPI. We learned about mpi, rank, communicator, point-to-point, collective operation. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 5: Hybrid Programming: MPI + OpenMP (30m)
Combining distributed and shared memory parallelism for maximum performance.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Hybrid Programming
- Define and explain Thread Safety
- Define and explain MPI_Init_thread
- Define and explain Process Affinity
- Define and explain Memory Footprint
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Hybrid MPI+OpenMP programming uses MPI for inter-node communication and OpenMP for intra-node parallelism. This matches modern cluster architectures: nodes with many cores sharing memory, connected by network. Instead of running one MPI rank per core, run fewer ranks per node with each rank using multiple OpenMP threads. Benefits include reduced memory footprint (one copy of data per node instead of per core), reduced communication overhead, and better cache utilization. The challenge is balancing MPI and OpenMP work, managing thread safety in MPI calls, and choosing optimal rank/thread configurations.
The concepts below cover both the programming model and the practical details (thread safety, affinity, memory use) that make hybrid codes work.
Hybrid Programming
What is Hybrid Programming?
Definition: Combining MPI and OpenMP for multi-level parallelism
Hybrid programming mirrors the hardware: MPI connects nodes that cannot share memory, while OpenMP exploits the cores within each node that can. A typical configuration runs one MPI rank per socket or per node, with OpenMP threads filling the remaining cores.
Key Point: Hybrid Programming is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread Safety
What is Thread Safety?
Definition: Code that works correctly when called by multiple threads
Thread safety matters in hybrid codes because MPI libraries are not automatically safe to call from multiple threads. MPI defines explicit thread-support levels, requested at initialization, so the program and the library agree on which threads may make MPI calls.
Key Point: Thread Safety is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MPI_Init_thread
What is MPI_Init_thread?
Definition: MPI initialization with thread support specification
MPI_Init_thread replaces MPI_Init in threaded programs. The application requests a thread-support level and the library reports the level it actually provides, so the program must check the returned value rather than assume the request was granted. Requesting the lowest level that suffices (often MPI_THREAD_FUNNELED) lets the MPI library skip internal locking and run faster.
Key Point: MPI_Init_thread is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Process Affinity
What is Process Affinity?
Definition: Binding processes/threads to specific CPU cores
Process affinity pins MPI ranks and OpenMP threads to specific cores or sockets. Without pinning, the operating system may migrate threads between cores, destroying cache locality and causing remote memory accesses on NUMA systems. Schedulers and OpenMP runtimes expose explicit controls for this, such as OMP_PROC_BIND and srun's --cpu-bind option.
Key Point: Process Affinity is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Memory Footprint
What is Memory Footprint?
Definition: Total memory used by an application
Memory footprint is a key motivation for going hybrid. A pure MPI run keeps one copy of replicated data (lookup tables, halo buffers, MPI internal buffers) per rank; with a few ranks per node and many threads each, that data is stored once per rank instead of once per core. On memory-constrained problems this can make the difference between a job fitting in memory and crashing.
Key Point: Memory Footprint is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Implementing Hybrid Programs
Initialize MPI with thread support: MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided). Thread levels: MPI_THREAD_SINGLE (no threads), MPI_THREAD_FUNNELED (only master thread makes MPI calls), MPI_THREAD_SERIALIZED (one thread at a time), MPI_THREAD_MULTIPLE (any thread anytime). Common pattern: MPI ranks divide problem into chunks; within each rank, OpenMP parallelizes computation on that chunk. Example: MPI distributes matrix rows across nodes; OpenMP parallelizes row operations within each node. Use environment variables to control threading: OMP_NUM_THREADS, OMP_PLACES, OMP_PROC_BIND. Profile to find optimal balance; typical is 1-4 MPI ranks per node with 16-32 threads each.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The fastest supercomputers use hybrid MPI+OpenMP+GPU programming, combining three levels of parallelism to achieve maximum performance!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Hybrid Programming | Combining MPI and OpenMP for multi-level parallelism |
| Thread Safety | Code that works correctly when called by multiple threads |
| MPI_Init_thread | MPI initialization with thread support specification |
| Process Affinity | Binding processes/threads to specific CPU cores |
| Memory Footprint | Total memory used by an application |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Hybrid Programming means and give an example of why it is important.
In your own words, explain what Thread Safety means and give an example of why it is important.
In your own words, explain what MPI_Init_thread means and give an example of why it is important.
In your own words, explain what Process Affinity means and give an example of why it is important.
In your own words, explain what Memory Footprint means and give an example of why it is important.
Summary
In this module, we explored Hybrid Programming: MPI + OpenMP. We learned about hybrid programming, thread safety, MPI_Init_thread, process affinity, and memory footprint. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
6 Job Scheduling and Resource Management
Running parallel jobs on HPC clusters using Slurm and PBS.
30m
Job Scheduling and Resource Management
Running parallel jobs on HPC clusters using Slurm and PBS.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Job Scheduler
- Define and explain Slurm
- Define and explain Partition
- Define and explain Job Array
- Define and explain Walltime
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC systems use job schedulers to manage resource allocation fairly among users. Users submit jobs specifying required resources (nodes, cores, time, memory); the scheduler queues jobs and launches them when resources are available. Slurm (Simple Linux Utility for Resource Management) dominates modern HPC. Jobs are submitted via scripts with #SBATCH directives specifying requirements. PBS (Portable Batch System) is an older alternative with similar concepts. Understanding job scheduling is essential for using any HPC system effectively, from getting allocations to optimizing queue wait times.
In this module, we will explore the fascinating world of Job Scheduling and Resource Management. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Job Scheduler
What is Job Scheduler?
Definition: System that manages job queuing and resource allocation
The job scheduler is the gatekeeper of a shared cluster. It accepts job submissions, enforces fair-share policies and resource limits, backfills short jobs into idle gaps, and launches work when the requested nodes become free. Specifying your resource needs accurately helps the scheduler place your job sooner and keeps the machine busy for everyone.
Key Point: Job Scheduler is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Slurm
What is Slurm?
Definition: Popular open-source workload manager for HPC clusters
Slurm, first released by Lawrence Livermore National Laboratory in the early 2000s, is now the most widely deployed workload manager in HPC. It is open source, scales to the largest systems in the world, and provides the commands you will use daily on a cluster: sbatch, srun, squeue, scancel, and sacct.
Key Point: Slurm is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Partition
What is Partition?
Definition: Group of nodes with common characteristics in Slurm
A partition is Slurm's term for a queue: a named group of nodes with shared limits such as maximum walltime, node types (for example GPU or large-memory nodes), and access policies. Choosing the right partition for your job's size and duration affects both where it can run and how long it waits.
Key Point: Partition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Job Array
What is Job Array?
Definition: Collection of similar jobs submitted as one
A job array submits many near-identical jobs with a single script, each instance distinguished by the environment variable SLURM_ARRAY_TASK_ID. Arrays are the standard way to run parameter sweeps or process many input files without flooding the scheduler with separate submissions.
Key Point: Job Array is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Walltime
What is Walltime?
Definition: Maximum execution time for a job
Walltime is the elapsed (wall-clock) time limit you request for a job; the scheduler kills the job when it expires. Requesting too little risks losing unsaved work, while requesting too much lengthens queue waits, so the usual practice is a realistic estimate plus a safety margin, combined with periodic checkpointing.
Key Point: Walltime is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Slurm Job Scripts and Commands
A Slurm job script starts with #!/bin/bash, followed by #SBATCH directives: #SBATCH --nodes=4 (4 nodes), #SBATCH --ntasks-per-node=32 (32 MPI ranks per node), #SBATCH --cpus-per-task=2 (2 threads per rank), #SBATCH --time=02:00:00 (2 hour limit), #SBATCH --partition=compute (queue name). Then load modules and run: srun ./myprogram. Submit with sbatch script.sh. Check status: squeue -u username. Cancel: scancel jobid. Interactive session: salloc --nodes=1 --time=1:00:00. View completed job info: sacct -j jobid. Arrays handle parameter sweeps: #SBATCH --array=1-100. Dependencies chain jobs: --dependency=afterok:jobid.
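Putting those directives together, a hybrid-job script might look like the sketch below. The job name, partition, module name, and program path are placeholders for your site's actual values:

```shell
#!/bin/bash
#SBATCH --job-name=hybrid-demo
#SBATCH --nodes=4                 # 4 compute nodes
#SBATCH --ntasks-per-node=2       # 2 MPI ranks per node
#SBATCH --cpus-per-task=16        # 16 OpenMP threads per rank
#SBATCH --time=02:00:00           # 2-hour walltime limit
#SBATCH --partition=compute       # queue name (site-specific)

# Match OpenMP threading to the Slurm allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=close

module load openmpi               # module name is site-specific

srun ./myprogram                  # launches 8 ranks total (4 nodes x 2)
```

Submit it with `sbatch script.sh`, then monitor with `squeue -u $USER`, exactly as described above.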
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? Slurm was developed at Lawrence Livermore National Laboratory and now manages some of the world's largest supercomputers, including Frontier with over 9,000 nodes!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Job Scheduler | System that manages job queuing and resource allocation |
| Slurm | Popular open-source workload manager for HPC clusters |
| Partition | Group of nodes with common characteristics in Slurm |
| Job Array | Collection of similar jobs submitted as one |
| Walltime | Maximum execution time for a job |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Job Scheduler means and give an example of why it is important.
In your own words, explain what Slurm means and give an example of why it is important.
In your own words, explain what Partition means and give an example of why it is important.
In your own words, explain what Job Array means and give an example of why it is important.
In your own words, explain what Walltime means and give an example of why it is important.
Summary
In this module, we explored Job Scheduling and Resource Management. We learned about job schedulers, Slurm, partitions, job arrays, and walltime. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
7 Performance Profiling and Optimization
Identifying bottlenecks and improving parallel application performance.
30m
Performance Profiling and Optimization
Identifying bottlenecks and improving parallel application performance.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Profiling
- Define and explain Hotspot
- Define and explain Roofline Model
- Define and explain Cache Miss
- Define and explain NUMA
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Profiling reveals where applications spend time and resources. For HPC, this includes computation time, communication overhead, memory usage, and I/O patterns. Tools like gprof, perf, and Valgrind profile serial code. Intel VTune and AMD uProf provide detailed CPU analysis. For parallel programs, Scalasca, TAU, and HPCToolkit trace MPI and OpenMP behavior. NVIDIA Nsight profiles GPU code. The optimization cycle: profile to identify bottlenecks, optimize the most impactful areas, re-profile to verify improvements. Common issues include poor load balance, excessive communication, memory bandwidth limits, and cache misses.
In this module, we will explore the fascinating world of Performance Profiling and Optimization. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Profiling
What is Profiling?
Definition: Measuring where an application spends time and resources
Profiling replaces guesswork with measurement. Rather than optimizing code you suspect is slow, a profiler shows exactly where cycles, memory bandwidth, and communication time are spent. In parallel programs it also exposes problems invisible in serial code, such as ranks idling at a barrier while one slow rank finishes its work.
Key Point: Profiling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Hotspot
What is Hotspot?
Definition: Code region consuming significant execution time
A hotspot is the small fraction of code where most of the runtime concentrates; many applications spend the bulk of their time in a handful of loops. Optimizing hotspots first follows directly from Amdahl's reasoning: speeding up a region that accounts for 1% of runtime can never gain you more than 1%.
Key Point: Hotspot is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Roofline Model
What is Roofline Model?
Definition: Visual model showing compute vs memory bandwidth limits
The roofline model plots performance (FLOP/s) against arithmetic intensity (FLOPs per byte moved from memory). The "roof" is the minimum of peak compute and intensity times memory bandwidth: kernels under the sloped part are memory-bound, while those under the flat part are compute-bound. One glance tells you which kind of optimization can actually help.
Key Point: Roofline Model is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Cache Miss
What is Cache Miss?
Definition: Memory access that must fetch data from slower memory
A cache miss forces the CPU to fetch data from a slower level of the memory hierarchy, costing tens to hundreds of cycles instead of a few. Loop reordering, blocking (tiling), and data-layout changes that improve spatial and temporal locality are among the most effective single-node optimizations.
Key Point: Cache Miss is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
NUMA
What is NUMA?
Definition: Non-Uniform Memory Access architecture with local and remote memory
On NUMA systems, each socket accesses its own local memory quickly and memory attached to other sockets more slowly. Linux allocates pages on first touch, so initializing data with the same threads that later use it, combined with thread pinning, keeps accesses local and can substantially improve effective bandwidth.
Key Point: NUMA is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Performance Analysis Techniques
Sampling profilers (perf, VTune) periodically record program state with low overhead, showing where time is spent. Tracing (Scalasca, TAU) records every event with more detail but higher overhead. MPI profiling reveals communication patterns: look for imbalanced send/receive, all-to-all bottlenecks, and serialized collective operations. Memory analysis finds cache misses, NUMA effects, and bandwidth limits. Roofline model plots achieved performance against memory/compute bounds, showing optimization potential. Hardware counters measure cycles, instructions, cache hits/misses, and branch mispredictions. Start with high-level profiling, then drill down. Focus on hotspots consuming significant runtime percentage.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The Roofline model, developed at Berkeley Lab, has become the standard way to visualize whether code is limited by compute or memory bandwidth!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Profiling | Measuring where an application spends time and resources |
| Hotspot | Code region consuming significant execution time |
| Roofline Model | Visual model showing compute vs memory bandwidth limits |
| Cache Miss | Memory access that must fetch data from slower memory |
| NUMA | Non-Uniform Memory Access architecture with local and remote memory |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Profiling means and give an example of why it is important.
In your own words, explain what Hotspot means and give an example of why it is important.
In your own words, explain what Roofline Model means and give an example of why it is important.
In your own words, explain what Cache Miss means and give an example of why it is important.
In your own words, explain what NUMA means and give an example of why it is important.
Summary
In this module, we explored Performance Profiling and Optimization. We learned about profiling, hotspots, the roofline model, cache misses, and NUMA. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
8 Parallel File Systems and I/O
Efficient data storage and access for HPC workloads.
30m
Parallel File Systems and I/O
Efficient data storage and access for HPC workloads.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Parallel File System
- Define and explain Lustre
- Define and explain Striping
- Define and explain MPI-IO
- Define and explain HDF5
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC applications often process terabytes to petabytes of data. Standard file systems cannot provide adequate bandwidth for thousands of concurrent processes. Parallel file systems like Lustre, GPFS (Spectrum Scale), and BeeGFS stripe data across many storage servers, enabling aggregate bandwidth of hundreds of GB/s. Understanding I/O patterns is crucial: avoid many small operations; use collective I/O when possible. MPI-IO provides parallel I/O primitives. HDF5 and NetCDF offer high-level libraries for structured scientific data. I/O is often the bottleneck in HPC applications; optimization can dramatically improve overall performance.
In this module, we will explore the fascinating world of Parallel File Systems and I/O. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Parallel File System
What is Parallel File System?
Definition: File system distributing data across multiple servers for high bandwidth
A parallel file system presents a single namespace while spreading data and metadata across many storage servers, so thousands of clients can read and write simultaneously at aggregate bandwidths far beyond what any single server could deliver. It is the layer that makes checkpointing and large-scale data analysis feasible on a cluster.
Key Point: Parallel File System is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Lustre
What is Lustre?
Definition: Popular open-source parallel file system for HPC
Lustre separates metadata servers, which track file names and layouts, from object storage servers, which hold the file data. Clients contact a metadata server once to open a file and then stream data directly to and from the storage servers in parallel, which is what gives Lustre its scalability.
Key Point: Lustre is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Striping
What is Striping?
Definition: Distributing file data across multiple storage targets
Striping splits a file into chunks distributed round-robin across storage targets. A file striped over eight targets can, in principle, be read eight times faster than one stored on a single target. The stripe count and stripe size are tunable per file or directory and should match the file's size and access pattern.
Key Point: Striping is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MPI-IO
What is MPI-IO?
Definition: MPI interface for parallel file operations
MPI-IO extends MPI with file operations such as MPI_File_open and MPI_File_write_all, letting all ranks participate in reading or writing a shared file. Its collective operations allow the MPI library to merge many small, scattered requests into a few large contiguous ones, which parallel file systems handle far more efficiently.
Key Point: MPI-IO is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
HDF5
What is HDF5?
Definition: High-level library for hierarchical scientific data storage
HDF5 stores datasets, attributes, and groups in a self-describing, portable binary format, so data written on one machine can be read on any other along with its metadata. Parallel HDF5 is built on top of MPI-IO, giving scientific applications structured, high-performance I/O without hand-coded file offsets.
Key Point: HDF5 is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: I/O Optimization Strategies
Key strategies: (1) Collective I/O - instead of each process writing separately, aggregate writes through fewer processes (MPI_File_write_all). (2) Striping - set stripe count and size to match access patterns; more stripes for large files, fewer for small. (3) Buffering - accumulate small writes into large buffers before I/O. (4) Asynchronous I/O - overlap computation with I/O operations. (5) Data staging - use node-local SSDs as burst buffers for checkpoints. (6) Compression - reduce data volume for I/O-bound applications. Lustre commands: lfs setstripe -c stripe_count -S stripe_size. Monitor I/O with Darshan profiler to identify bottlenecks. Avoid opening files from all ranks simultaneously.
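The striping controls mentioned above are applied per file or directory with the `lfs` tool. A typical session might look like the sketch below; the stripe values are illustrative, and the directory names are placeholders:

```shell
# Stripe new files in this directory across 16 targets in 4 MiB chunks
lfs setstripe -c 16 -S 4m /scratch/myproject/output

# Inspect the striping of an existing file
lfs getstripe /scratch/myproject/output/checkpoint.h5

# Small files: a single stripe avoids unnecessary overhead
lfs setstripe -c 1 /scratch/myproject/logs
```

Striping set on a directory applies to files created in it afterward, so it is common to configure output directories once at the start of a project.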
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The name Lustre is a portmanteau of "Linux" and "cluster", and the file system has long powered many of the fastest machines on the TOP500 list!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Parallel File System | File system distributing data across multiple servers for high bandwidth |
| Lustre | Popular open-source parallel file system for HPC |
| Striping | Distributing file data across multiple storage targets |
| MPI-IO | MPI interface for parallel file operations |
| HDF5 | High-level library for hierarchical scientific data storage |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Parallel File System means and give an example of why it is important.
In your own words, explain what Lustre means and give an example of why it is important.
In your own words, explain what Striping means and give an example of why it is important.
In your own words, explain what MPI-IO means and give an example of why it is important.
In your own words, explain what HDF5 means and give an example of why it is important.
Summary
In this module, we explored Parallel File Systems and I/O. We learned about parallel file systems, Lustre, striping, MPI-IO, and HDF5. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
9 Debugging Parallel Applications
Finding and fixing bugs in MPI and OpenMP programs.
30m
Debugging Parallel Applications
Finding and fixing bugs in MPI and OpenMP programs.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Race Condition
- Define and explain Deadlock
- Define and explain Thread Sanitizer
- Define and explain DDT
- Define and explain MUST
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Parallel bugs are notoriously difficult to find because they may appear intermittently depending on timing. Common issues include race conditions (threads accessing shared data without synchronization), deadlocks (processes waiting for each other forever), and incorrect message passing (wrong source, destination, or buffer). Tools help: DDT and TotalView are commercial parallel debuggers; GDB with mpirun can debug MPI. Valgrind's helgrind and DRD find thread errors. MPI correctness tools like MUST verify MPI usage. Reproducibility challenges mean bugs may not manifest consistently. Defensive programming with assertions and thorough testing on small scale before large runs is essential.
In this module, we will explore the fascinating world of Debugging Parallel Applications. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Race Condition
What is Race Condition?
Definition: Bug where outcome depends on unpredictable thread timing
A race condition occurs when the result depends on the unsynchronized interleaving of threads, for example two threads incrementing the same counter and losing updates. Races are insidious because a program can pass every small test and then fail intermittently at scale, which is why detection tools, not testing alone, are needed.
Key Point: Race Condition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Deadlock
What is Deadlock?
Definition: State where processes wait for each other indefinitely
A deadlock leaves processes waiting on each other forever. A classic MPI example is two ranks that each call a blocking MPI_Send to the other before posting a receive, which can hang once messages exceed the library's internal buffering. Consistent communication ordering, or combined calls like MPI_Sendrecv, prevents it.
Key Point: Deadlock is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread Sanitizer
What is Thread Sanitizer?
Definition: Tool detecting race conditions and other thread errors
ThreadSanitizer, built into GCC and Clang and enabled with -fsanitize=thread, instruments memory accesses at compile time and reports data races along with the stack traces of both conflicting accesses. It slows execution considerably, so it is used on small test runs rather than production jobs.
Key Point: Thread Sanitizer is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
DDT
What is DDT?
Definition: Commercial parallel debugger for MPI and OpenMP
DDT is a graphical debugger designed for scale: it can attach to thousands of MPI ranks at once, group them by state, and step them collectively, turning an intractable hang into a visible pattern, such as a single rank stuck in a different function than all the others.
Key Point: DDT is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MUST
What is MUST?
Definition: MPI correctness checking tool
MUST checks MPI usage at runtime by intercepting calls through the MPI profiling interface. It detects deadlocks, mismatched send/receive types and counts, invalid arguments, and leaked requests, and produces a report after the run, catching errors that would otherwise surface as undefined behavior.
Key Point: MUST is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Common Parallel Bugs and Detection
Race conditions: multiple threads modify shared data without locks. Symptoms: incorrect results that change between runs. Detection: run with thread sanitizers (gcc -fsanitize=thread), use helgrind. Fix: add proper synchronization (mutex, critical section, atomic operations). Deadlocks: process A waits for B while B waits for A. Symptoms: program hangs. Detection: attach debugger to hanging process, check stack traces. Fix: ensure consistent ordering of communications. MPI mismatches: send/receive type or count mismatch causes undefined behavior. Detection: use MUST or Intel MPI's -check_mpi. Memory errors in parallel: use Valgrind memcheck on single-rank runs. Buffer overruns in MPI: ensure buffers are large enough for all received data.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? TotalView, one of the earliest parallel debuggers, dates back to 1989 and is still actively developed and used on the world's largest supercomputers today!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Race Condition | Bug where outcome depends on unpredictable thread timing |
| Deadlock | State where processes wait for each other indefinitely |
| Thread Sanitizer | Tool detecting race conditions and other thread errors |
| DDT | Commercial parallel debugger for MPI and OpenMP |
| MUST | MPI correctness checking tool |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Race Condition means and give an example of why it is important.
In your own words, explain what Deadlock means and give an example of why it is important.
In your own words, explain what Thread Sanitizer means and give an example of why it is important.
In your own words, explain what DDT means and give an example of why it is important.
In your own words, explain what MUST means and give an example of why it is important.
Summary
In this module, we explored Debugging Parallel Applications. We learned about race conditions, deadlocks, thread sanitizers, DDT, and MUST. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
10 Domain Decomposition and Load Balancing
Distributing work efficiently across parallel processes.
30m
Domain Decomposition and Load Balancing
Distributing work efficiently across parallel processes.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Domain Decomposition
- Define and explain Halo Exchange
- Define and explain ParMETIS
- Define and explain Load Imbalance
- Define and explain Space-Filling Curve
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Domain decomposition divides the computational domain (data and work) among parallel processes. For regular grids, 1D, 2D, or 3D block decomposition maps naturally to the problem structure. The surface-to-volume ratio affects communication overhead: higher-dimensional decomposition often reduces halo (ghost cell) exchange. For irregular domains like graphs or particles, partitioning libraries (ParMETIS, Zoltan) balance load while minimizing communication. Dynamic load balancing redistributes work as computation evolves (e.g., adaptive mesh refinement). The goal is equal work per process with minimal data exchange between them.
In this module, we will explore the fascinating world of Domain Decomposition and Load Balancing. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Domain Decomposition
What is Domain Decomposition?
Definition: Dividing problem space among parallel processes
In practice, the choice of decomposition determines both the memory footprint per process and the communication pattern. A 3D heat-diffusion solver, for example, might split its grid into equal blocks, with each MPI rank owning one block and exchanging boundary values with its neighbors at every time step. A good decomposition gives every rank the same amount of work while keeping the boundaries between ranks as small as possible.
Key Point: Domain Decomposition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Halo Exchange
What is Halo Exchange?
Definition: Communication of boundary data between neighboring domains
Each process keeps a layer of ghost (halo) cells that mirror the boundary cells of its neighbors. Before each computation step, neighboring processes exchange these layers, typically with non-blocking MPI sends and receives, so that stencil operations near the boundary see up-to-date data. The halo width must match the stencil radius: a 5-point stencil needs a one-cell halo, a wider stencil needs more.
Key Point: Halo Exchange is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
ParMETIS
What is ParMETIS?
Definition: Parallel graph partitioning library
ParMETIS is the MPI-parallel version of the METIS graph partitioner developed by George Karypis's group at the University of Minnesota. It partitions graphs and unstructured meshes so that each part has roughly equal weight while the number of edges cut between parts, a proxy for communication volume, is minimized. It also supports adaptive repartitioning for workloads that change during a run.
Key Point: ParMETIS is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Load Imbalance
What is Load Imbalance?
Definition: Unequal work distribution causing idle time
If one process has more work than the others, the rest sit idle at the next synchronization point, so the whole computation runs at the speed of the slowest rank. A common metric is the ratio of maximum to average work per process: 1.0 means perfect balance, and even a ratio of 1.2 wastes roughly 20% of the machine at every synchronization.
Key Point: Load Imbalance is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Space-Filling Curve
What is Space-Filling Curve?
Definition: Continuous path through multi-dimensional space preserving locality
Curves such as Hilbert and Morton (Z-order) visit every cell of a 2D or 3D grid in an order that keeps spatially nearby cells close together in the resulting 1D sequence. Cutting that sequence into equal pieces yields a partition that is both balanced and spatially compact, which is why many adaptive mesh codes use space-filling curves for fast dynamic repartitioning.
Key Point: Space-Filling Curve is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Partitioning Strategies and Tools
For structured grids: Block decomposition divides domain into contiguous chunks. 1D (slabs), 2D (pencils), 3D (blocks) trade communication patterns for surface area. Cyclic distribution (round-robin) helps with load imbalance if work per cell varies. For unstructured meshes and graphs: ParMETIS uses multilevel recursive bisection to minimize edge cuts (communication). Zoltan provides multiple algorithms including geometric, graph, and hypergraph partitioning. Space-filling curves (Hilbert, Morton) map multi-dimensional data to 1D while preserving locality. Measure load imbalance as max_work/avg_work; target < 1.05 (5% imbalance). Re-partition when imbalance exceeds threshold in dynamic simulations.
Did You Know? ParMETIS can partition a billion-element mesh across thousands of processors in minutes, making it possible to run CFD simulations of entire aircraft!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Domain Decomposition | Dividing problem space among parallel processes |
| Halo Exchange | Communication of boundary data between neighboring domains |
| ParMETIS | Parallel graph partitioning library |
| Load Imbalance | Unequal work distribution causing idle time |
| Space-Filling Curve | Continuous path through multi-dimensional space preserving locality |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Domain Decomposition means and give an example of why it is important.
In your own words, explain what Halo Exchange means and give an example of why it is important.
In your own words, explain what ParMETIS means and give an example of why it is important.
In your own words, explain what Load Imbalance means and give an example of why it is important.
In your own words, explain what Space-Filling Curve means and give an example of why it is important.
Summary
In this module, we explored Domain Decomposition and Load Balancing. We learned about domain decomposition, halo exchange, ParMETIS, load imbalance, and space-filling curves. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
11 HPC Applications: Scientific Simulations
Real-world HPC applications in physics, chemistry, and engineering.
30m
HPC Applications: Scientific Simulations
Real-world HPC applications in physics, chemistry, and engineering.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Molecular Dynamics
- Define and explain CFD
- Define and explain Climate Modeling
- Define and explain Finite Element
- Define and explain N-body Problem
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC enables scientific breakthroughs impossible with smaller computers. Climate modeling simulates decades of global weather using millions of grid cells. Computational fluid dynamics (CFD) predicts airflow around aircraft and vehicles. Molecular dynamics simulates protein folding and drug interactions at atomic scale. Cosmological simulations model universe evolution with billions of particles. Finite element analysis designs structures and machines. Quantum chemistry calculates molecular properties. Each application type has characteristic algorithms, scaling properties, and software ecosystems. Understanding these applications helps design effective HPC systems and optimize code for specific domains.
In this module, we will explore the fascinating world of HPC Applications: Scientific Simulations. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
Molecular Dynamics
What is Molecular Dynamics?
Definition: Simulation of atomic motion by integrating equations of motion
Molecular dynamics computes the force on every atom from an interatomic potential, then advances positions and velocities with a small time step, typically on the order of a femtosecond. Millions to billions of such steps are needed to reach biologically or physically relevant time scales, which is why MD is one of the largest consumers of HPC cycles: it reveals how proteins fold, how materials deform, and how drug candidates bind to their targets.
Key Point: Molecular Dynamics is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
CFD
What is CFD?
Definition: Computational Fluid Dynamics for simulating fluid flow
CFD discretizes the Navier-Stokes equations on a mesh and solves them numerically. It is used wherever fluid behavior matters: aerodynamic design of aircraft and cars, combustion in engines, blood flow in arteries, and cooling of data centers. Resolving turbulence at high fidelity is among the most demanding workloads in all of HPC, because the range of length scales that must be captured grows rapidly with the Reynolds number.
Key Point: CFD is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Climate Modeling
What is Climate Modeling?
Definition: HPC simulation of Earth's atmosphere and oceans
Climate models divide the atmosphere and oceans into three-dimensional grid cells and step the governing physical equations forward in time, coupling component models for air, water, sea ice, and land. Doubling the horizontal resolution multiplies the computational cost roughly eight- to sixteen-fold, which is why climate science has always pushed the limits of the largest available supercomputers.
Key Point: Climate Modeling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Finite Element
What is Finite Element?
Definition: Numerical method for solving PDEs on complex geometries
The finite element method subdivides a complex geometry into many small elements (tetrahedra, hexahedra) and approximates the solution of a partial differential equation by simple functions defined on each element. It underpins structural analysis, crash simulation, heat transfer, and electromagnetic design, and its large sparse linear systems are a classic target for parallel solvers.
Key Point: Finite Element is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
N-body Problem
What is N-body Problem?
Definition: Computing interactions among many particles
A direct N-body calculation evaluates all pairwise interactions, which costs O(N^2) operations and becomes intractable for large N. Tree codes (Barnes-Hut) and the fast multipole method reduce this to O(N log N) or O(N) by approximating the effect of distant groups of particles, enabling cosmological simulations with billions of particles.
Key Point: N-body Problem is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Molecular Dynamics Case Study
Molecular dynamics (MD) simulates atomic/molecular motion by numerically integrating Newton's equations. Codes like LAMMPS, GROMACS, and NAMD simulate millions of atoms for microseconds of real time. Key components: force calculation (Lennard-Jones, Coulomb), integration (Verlet), neighbor lists, periodic boundaries. Parallelization: domain decomposition assigns spatial regions to processes; atoms near boundaries require halo exchange. Load balancing is challenging as atoms move. GPU acceleration provides 10-100x speedup for force calculations. Applications include drug discovery (protein-ligand binding), materials science (polymers, batteries), and biophysics (membrane proteins). Typical runs use thousands of CPU cores or hundreds of GPUs.
Did You Know? The 2013 Nobel Prize in Chemistry recognized the development of multiscale models for complex chemical systems, the foundation of modern molecular dynamics; MD simulations can now model entire viruses with over 100 million atoms!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Molecular Dynamics | Simulation of atomic motion by integrating equations of motion |
| CFD | Computational Fluid Dynamics for simulating fluid flow |
| Climate Modeling | HPC simulation of Earth's atmosphere and oceans |
| Finite Element | Numerical method for solving PDEs on complex geometries |
| N-body Problem | Computing interactions among many particles |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Molecular Dynamics means and give an example of why it is important.
In your own words, explain what CFD means and give an example of why it is important.
In your own words, explain what Climate Modeling means and give an example of why it is important.
In your own words, explain what Finite Element means and give an example of why it is important.
In your own words, explain what N-body Problem means and give an example of why it is important.
Summary
In this module, we explored HPC Applications: Scientific Simulations. We learned about molecular dynamics, CFD, climate modeling, the finite element method, and the N-body problem. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
12 Future of HPC: Exascale and Beyond
Emerging technologies and challenges shaping next-generation supercomputing.
30m
Future of HPC: Exascale and Beyond
Emerging technologies and challenges shaping next-generation supercomputing.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Exascale
- Define and explain Heterogeneous Computing
- Define and explain Performance Portability
- Define and explain Kokkos
- Define and explain SYCL
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC continues evolving toward exascale (10^18 FLOPS) and beyond. Frontier achieved exascale in 2022; Aurora and El Capitan follow. But power consumption (20-30 MW per system) and reliability (millions of components) pose challenges. Heterogeneous architectures mix CPUs, GPUs, and specialized accelerators. AI/ML integration blurs lines between traditional HPC and deep learning. Quantum computing may complement classical HPC for specific problems. New programming models (SYCL, Kokkos, RAJA) aim for performance portability. Neuromorphic and optical computing explore alternative computational paradigms. The future demands innovations in algorithms, hardware, and software to maintain progress.
In this module, we will explore the fascinating world of Future of HPC: Exascale and Beyond. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
Exascale
What is Exascale?
Definition: Computing performance of 10^18 floating-point operations per second
An exascale system performs at least 10^18 floating-point operations per second, a thousand times the petascale milestone reached in 2008. Getting there required major advances in energy efficiency: simply scaling up petascale-era designs would have consumed hundreds of megawatts, far beyond what any facility could supply.
Key Point: Exascale is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Heterogeneous Computing
What is Heterogeneous Computing?
Definition: Using different processor types (CPU, GPU, accelerators) together
In a heterogeneous node, latency-optimized CPUs handle control flow, I/O, and serial work, while throughput-optimized GPUs execute the data-parallel kernels that dominate runtime. Most systems at the top of the TOP500 list now draw the bulk of their performance from accelerators, so writing code that uses both processor types well has become a core HPC skill.
Key Point: Heterogeneous Computing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Performance Portability
What is Performance Portability?
Definition: Code that achieves good performance across different architectures
A performance-portable code achieves a reasonable fraction of peak performance on each target architecture from a single source base, rather than maintaining separate CUDA, HIP, and CPU versions of every kernel. This matters because large scientific codes outlive any particular hardware generation and must run at supercomputing centers with very different machines.
Key Point: Performance Portability is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Kokkos
What is Kokkos?
Definition: C++ library for portable parallel programming
Kokkos, developed at Sandia National Laboratories, expresses parallel loops and data layouts through C++ templates; at compile time these map to a backend such as OpenMP, CUDA, or HIP. The same kernel source can therefore run on CPUs and on GPUs from different vendors, which is why several large US Department of Energy codes have adopted it.
Key Point: Kokkos is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
SYCL
What is SYCL?
Definition: Khronos standard for C++ heterogeneous computing
SYCL is a royalty-free Khronos standard that lets single-source C++ programs dispatch kernels to CPUs, GPUs, and other accelerators. Intel's oneAPI DPC++ compiler is the most prominent implementation, and SYCL serves as a primary programming model on the Aurora exascale system.
Key Point: SYCL is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Performance Portability Frameworks
Writing code that runs efficiently on diverse hardware (Intel/AMD CPUs, NVIDIA/AMD GPUs) is challenging. Performance portability frameworks provide abstraction: Kokkos (Sandia) offers C++ abstractions for parallel execution and memory spaces; code compiles to OpenMP, CUDA, or HIP. RAJA (LLNL) provides similar abstractions with different design choices. SYCL is a Khronos standard extending C++ for heterogeneous computing; Intel oneAPI implements SYCL. OpenACC uses directives like OpenMP but targets accelerators. The tradeoff: abstraction enables portability but may sacrifice peak performance versus native code. These frameworks are increasingly adopted for large HPC codes that must run across multiple supercomputing centers.
Did You Know? Frontier, the first exascale supercomputer, uses 9,408 AMD compute nodes each with one CPU and four GPUs, consuming enough power to supply 10,000 homes!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Exascale | Computing performance of 10^18 floating-point operations per second |
| Heterogeneous Computing | Using different processor types (CPU, GPU, accelerators) together |
| Performance Portability | Code that achieves good performance across different architectures |
| Kokkos | C++ library for portable parallel programming |
| SYCL | Khronos standard for C++ heterogeneous computing |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Exascale means and give an example of why it is important.
In your own words, explain what Heterogeneous Computing means and give an example of why it is important.
In your own words, explain what Performance Portability means and give an example of why it is important.
In your own words, explain what Kokkos means and give an example of why it is important.
In your own words, explain what SYCL means and give an example of why it is important.
Summary
In this module, we explored Future of HPC: Exascale and Beyond. We learned about exascale, heterogeneous computing, performance portability, Kokkos, and SYCL. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Ready to master High Performance Computing?
Get personalized AI tutoring with flashcards, quizzes, and interactive exercises in the Eludo app