High Performance Computing
Master high performance computing from cluster architecture to parallel programming, covering MPI, OpenMP, job schedulers, profiling, and real-world scientific computing applications.
What you'll learn
- Design and understand HPC cluster architectures
- Write parallel programs using MPI and OpenMP
- Optimize code for performance on multi-core and distributed systems
- Profile and debug parallel applications
- Deploy and manage jobs on HPC schedulers
Course Modules
12 modules

Module 1: Introduction to High Performance Computing (30m)
Understanding what HPC is and why it matters for science and industry.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Supercomputer
- Define and explain FLOPS
- Define and explain Compute Node
- Define and explain Interconnect
- Define and explain TOP500
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
High Performance Computing (HPC) aggregates computing power to solve problems impossible for single computers. From weather prediction to genome sequencing, drug discovery to financial modeling, HPC enables breakthroughs by processing massive datasets and running complex simulations. Modern supercomputers contain millions of processor cores connected by high-speed networks, achieving petaflops (10^15 floating-point operations per second) and approaching exascale. HPC combines hardware architecture, parallel programming, and optimization techniques. Understanding HPC opens doors to cutting-edge research and industrial applications where computational power drives innovation.
The concepts below build on one another, from the machines themselves to how their performance is measured and ranked.
Supercomputer
What is Supercomputer?
Definition: Computer system with extremely high processing capability
Supercomputers combine thousands of compute nodes into a single system with sustained performance far beyond any desktop machine. Today's leading systems, such as Frontier at Oak Ridge National Laboratory, fill entire machine rooms, require dedicated power and cooling, and deliver performance measured in exaflops.
Key Point: Supercomputer is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
FLOPS
What is FLOPS?
Definition: Floating-point operations per second, measure of computing speed
FLOPS quantifies floating-point throughput: a single core running at 3 GHz and completing 16 floating-point operations per cycle peaks at 48 gigaflops. Scaled across millions of cores, modern supercomputers reach petaflops (10^15) and exaflops (10^18). Benchmarks such as LINPACK, used by the TOP500 list, measure sustained rather than theoretical peak FLOPS.
Key Point: FLOPS is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Compute Node
What is Compute Node?
Definition: Individual server in an HPC cluster running calculations
A compute node is the basic building block of a cluster: a server with its own CPUs, memory, and often GPUs, running a full operating system. Jobs are allocated one or more nodes, and a program that spans nodes must communicate over the interconnect, because nodes do not share memory.
Key Point: Compute Node is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Interconnect
What is Interconnect?
Definition: High-speed network connecting cluster nodes
The interconnect determines how quickly nodes exchange data; both latency (microseconds per message) and bandwidth (gigabytes per second) matter. Technologies such as InfiniBand and HPE Slingshot are designed specifically for HPC traffic, and a slow interconnect can bottleneck an otherwise fast cluster.
Key Point: Interconnect is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
TOP500
What is TOP500?
Definition: Ranking of world's fastest supercomputers updated twice yearly
The TOP500 list, published each June and November, ranks supercomputers by sustained performance on the LINPACK benchmark. Because it has run since 1993, it also serves as a long-term record of performance trends across the whole field.
Key Point: TOP500 is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: HPC System Architecture Overview
HPC systems consist of compute nodes, storage systems, and interconnects. Compute nodes contain CPUs (often 32-128 cores), memory (256GB-2TB), and sometimes GPUs or accelerators. Storage tiers include fast parallel file systems (Lustre, GPFS) for active data and tape archives for long-term storage. High-speed interconnects (InfiniBand, Slingshot, Omni-Path) provide low-latency, high-bandwidth communication between nodes. Job schedulers (Slurm, PBS, LSF) manage resource allocation. The TOP500 list ranks the world's fastest supercomputers; as of 2024, Frontier (Oak Ridge) leads at over 1 exaflop. Understanding this architecture is essential for writing efficient HPC applications.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The first computer to break the exaflop barrier was Frontier in 2022, capable of a quintillion calculations per second, more than the next seven supercomputers combined!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Supercomputer | Computer system with extremely high processing capability |
| FLOPS | Floating-point operations per second, measure of computing speed |
| Compute Node | Individual server in an HPC cluster running calculations |
| Interconnect | High-speed network connecting cluster nodes |
| TOP500 | Ranking of world's fastest supercomputers updated twice yearly |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Supercomputer means and give an example of why it is important.
In your own words, explain what FLOPS means and give an example of why it is important.
In your own words, explain what Compute Node means and give an example of why it is important.
In your own words, explain what Interconnect means and give an example of why it is important.
In your own words, explain what TOP500 means and give an example of why it is important.
Summary
In this module, we explored Introduction to High Performance Computing. We learned about supercomputer, flops, compute node, interconnect, top500. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 2: Parallel Computing Fundamentals (30m)
Core concepts of parallelism and concurrent execution.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Parallel Computing
- Define and explain Amdahl's Law
- Define and explain Strong Scaling
- Define and explain Weak Scaling
- Define and explain Load Balancing
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Parallel computing executes multiple calculations simultaneously to solve problems faster. The key challenge is decomposing problems into independent pieces that can run concurrently. Two main paradigms exist: shared memory (multiple cores access common RAM) and distributed memory (separate nodes with private memory communicate via messages). Amdahl's Law states that speedup is limited by the sequential portion of code. Gustafson's Law counters that larger problems can maintain efficiency. Understanding parallelism types (data parallelism, task parallelism, pipeline parallelism) and their appropriate applications is fundamental to HPC programming.
The concepts below build on one another, from the basic idea of parallel execution to the laws that govern how well it scales.
Parallel Computing
What is Parallel Computing?
Definition: Simultaneous execution of multiple calculations
Parallel computing divides a problem among many processors working at once: a weather model might assign each processor a region of the atmosphere, with neighbors exchanging boundary data every time step. The achievable speedup depends on how independent the pieces are and how much they must communicate.
Key Point: Parallel Computing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Amdahl's Law
What is Amdahl's Law?
Definition: Formula for maximum speedup limited by serial code fraction
Amdahl's Law formalizes a simple observation: the parts of a program that cannot be parallelized set a hard ceiling on speedup. If 5% of the runtime is serial, no number of processors can deliver more than a 20x speedup. This makes identifying and shrinking serial sections a central task of HPC optimization.
Key Point: Amdahl's Law is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Strong Scaling
What is Strong Scaling?
Definition: Speedup for fixed problem size with more processors
Strong scaling asks: if the problem size stays fixed, how much faster does it run on more processors? It matters when a fixed answer is needed sooner, such as a weather forecast that must finish before the weather arrives. Strong scaling eventually breaks down because each processor's share of work shrinks while communication overhead does not.
Key Point: Strong Scaling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Weak Scaling
What is Weak Scaling?
Definition: Efficiency when problem and processors scale together
Weak scaling grows the problem with the machine: double the processors, double the problem size, and ask whether runtime stays constant. It reflects how HPC is often used in practice, where bigger machines enable higher-resolution simulations rather than faster small ones.
Key Point: Weak Scaling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Load Balancing
What is Load Balancing?
Definition: Distributing work evenly across processors
Load balancing ensures no processor sits idle waiting for others to finish. An imbalanced decomposition, where one rank receives a dense region of a simulation while others receive sparse ones, wastes the whole machine's time at every synchronization point. Common strategies include static partitioning, dynamic work queues, and periodic repartitioning.
Key Point: Load Balancing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Amdahl's Law and Scalability
Amdahl's Law: Speedup = 1 / (S + P/N), where S is the serial fraction, P is the parallel fraction (S+P=1), and N is the number of processors. If 10% of code is serial, maximum speedup is 10x regardless of processors. This highlights the importance of minimizing serial bottlenecks. Strong scaling measures speedup for fixed problem size; weak scaling measures efficiency as both problem size and processors increase proportionally. Efficiency = Speedup/N; ideal is 1.0 (100%). Real applications rarely achieve linear scaling due to communication overhead, load imbalance, and serial sections. Profiling identifies scaling bottlenecks.
Did You Know? Gene Amdahl presented his famous law in 1967, and it's still the fundamental limit on parallel speedup today, over 55 years later!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Parallel Computing | Simultaneous execution of multiple calculations |
| Amdahl's Law | Formula for maximum speedup limited by serial code fraction |
| Strong Scaling | Speedup for fixed problem size with more processors |
| Weak Scaling | Efficiency when problem and processors scale together |
| Load Balancing | Distributing work evenly across processors |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Parallel Computing means and give an example of why it is important.
In your own words, explain what Amdahl's Law means and give an example of why it is important.
In your own words, explain what Strong Scaling means and give an example of why it is important.
In your own words, explain what Weak Scaling means and give an example of why it is important.
In your own words, explain what Load Balancing means and give an example of why it is important.
Summary
In this module, we explored Parallel Computing Fundamentals. We learned about parallel computing, amdahl's law, strong scaling, weak scaling, load balancing. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 3: Shared Memory Programming with OpenMP (30m)
Parallel programming for multi-core processors using OpenMP directives.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain OpenMP
- Define and explain Fork-Join Model
- Define and explain Thread
- Define and explain Critical Section
- Define and explain Reduction
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
OpenMP (Open Multi-Processing) is an API for shared-memory parallel programming in C, C++, and Fortran. It uses compiler directives (#pragma omp) to parallelize code with minimal changes. The fork-join model creates threads that execute in parallel, then rejoin. Basic parallelization: #pragma omp parallel for before a loop divides iterations among threads. OpenMP handles thread creation, synchronization, and work distribution. It's ideal for loop-level parallelism on multi-core CPUs. Key concepts include private/shared variables, reductions, critical sections, and scheduling options. OpenMP is often the first step in parallelizing sequential code.
The concepts below progress from the OpenMP execution model to the synchronization constructs that keep threads correct.
OpenMP
What is OpenMP?
Definition: API for shared-memory parallel programming using compiler directives
OpenMP's appeal is incremental parallelization: you annotate existing loops with directives, and a compiler flag (-fopenmp for GCC) turns them into multithreaded code. Without the flag the pragmas are ignored and the program runs serially, so a single source tree serves both builds.
Key Point: OpenMP is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Fork-Join Model
What is Fork-Join Model?
Definition: Parallel execution pattern where threads fork and later rejoin
In the fork-join model, a program runs serially on a master thread until it reaches a parallel region, forks a team of worker threads for that region, then joins back to a single thread. This maps naturally onto programs that alternate between serial setup and parallel computation.
Key Point: Fork-Join Model is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread
What is Thread?
Definition: Independent execution path within a process
Threads within a process share the same address space, which makes data sharing cheap but introduces the risk of race conditions when two threads write the same variable. OpenMP typically maps one thread per core.
Key Point: Thread is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Critical Section
What is Critical Section?
Definition: Code region that only one thread can execute at a time
A critical section serializes access to shared data: only one thread executes it at a time. Critical sections prevent races but cost performance because threads queue to enter; prefer reductions or private variables where possible.
Key Point: Critical Section is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Reduction
What is Reduction?
Definition: Combining values from multiple threads into a single result
A reduction combines per-thread partial results, such as partial sums, into one final value using an associative operation. OpenMP's reduction clause gives each thread a private copy of the variable and merges the copies at the end of the region, avoiding a critical section in the common case of accumulating a total.
Key Point: Reduction is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: OpenMP Directives and Clauses
Key OpenMP directives: #pragma omp parallel creates a team of threads; #pragma omp for distributes loop iterations; #pragma omp sections defines distinct parallel tasks; #pragma omp single executes on one thread only; #pragma omp critical protects shared data access. Important clauses: private(var) gives each thread its own copy; shared(var) indicates all threads access the same variable; reduction(op:var) combines thread-local results (e.g., reduction(+:sum)); schedule(type) controls loop iteration distribution (static, dynamic, guided). Thread count is set via OMP_NUM_THREADS environment variable or omp_set_num_threads(). Proper use of these constructs ensures correct, efficient parallel execution.
Did You Know? OpenMP was first released in 1997 and is now supported by all major compilers including GCC, Clang, Intel, and Microsoft Visual C++!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| OpenMP | API for shared-memory parallel programming using compiler directives |
| Fork-Join Model | Parallel execution pattern where threads fork and later rejoin |
| Thread | Independent execution path within a process |
| Critical Section | Code region that only one thread can execute at a time |
| Reduction | Combining values from multiple threads into a single result |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what OpenMP means and give an example of why it is important.
In your own words, explain what Fork-Join Model means and give an example of why it is important.
In your own words, explain what Thread means and give an example of why it is important.
In your own words, explain what Critical Section means and give an example of why it is important.
In your own words, explain what Reduction means and give an example of why it is important.
Summary
In this module, we explored Shared Memory Programming with OpenMP. We learned about openmp, fork-join model, thread, critical section, reduction. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 4: Distributed Memory Programming with MPI (30m)
Message passing for scalable parallel programs across multiple nodes.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain MPI
- Define and explain Rank
- Define and explain Communicator
- Define and explain Point-to-Point
- Define and explain Collective Operation
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
MPI (Message Passing Interface) is the standard for distributed-memory parallel programming. Unlike OpenMP, each MPI process has its own memory space; data sharing requires explicit message passing. MPI programs run multiple processes (ranks) that communicate using point-to-point operations (Send/Recv) or collective operations (Broadcast, Reduce, Gather, Scatter). MPI scales to thousands of nodes, enabling massive parallelism. The SPMD (Single Program Multiple Data) model runs the same code on all ranks, with rank-based branching for different roles. Understanding MPI is essential for programming supercomputers and large clusters.
The concepts below progress from the MPI process model to the communication operations that connect processes.
MPI
What is MPI?
Definition: Message Passing Interface for distributed-memory parallel programming
MPI is a library specification, not a language: programs call functions such as MPI_Send and MPI_Bcast from C, C++, or Fortran, and implementations such as Open MPI and MPICH provide the runtime. Because every data exchange is explicit, MPI programs can scale to machines where no shared memory exists.
Key Point: MPI is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Rank
What is Rank?
Definition: Unique identifier for each MPI process
Each process in an MPI job is identified by its rank, an integer from 0 to size-1 within a communicator. Ranks are how processes address each other in messages and how a single program assigns different roles, with rank 0 conventionally handling I/O and coordination.
Key Point: Rank is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Communicator
What is Communicator?
Definition: Group of MPI processes that can communicate
A communicator defines a group of processes and an isolated communication context. MPI_COMM_WORLD contains every process in the job; programs can split it into subgroups, for example one communicator per row of a process grid, to structure collective operations.
Key Point: Communicator is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Point-to-Point
What is Point-to-Point?
Definition: Communication between two specific processes
Point-to-point communication moves a message from one named sender to one named receiver, matched by communicator, tag, and source/destination. It is the building block for irregular patterns such as halo exchanges between neighboring ranks in a domain decomposition.
Key Point: Point-to-Point is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Collective Operation
What is Collective Operation?
Definition: Communication involving all processes in a group
Collective operations involve every process in a communicator at once, such as broadcasting parameters from rank 0 or summing partial results with MPI_Reduce. Implementations optimize collectives with tree and pipeline algorithms, which is why they usually outperform hand-written point-to-point equivalents.
Key Point: Collective Operation is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: MPI Communication Patterns
Point-to-point: MPI_Send(buf, count, type, dest, tag, comm) sends data; MPI_Recv(buf, count, type, source, tag, comm, status) receives. Blocking operations wait for completion; non-blocking (MPI_Isend, MPI_Irecv) return immediately, allowing overlap of computation and communication. Collective operations: MPI_Bcast distributes data from one rank to all; MPI_Reduce combines data from all ranks to one; MPI_Allreduce combines and distributes result to all; MPI_Scatter distributes array portions; MPI_Gather collects portions into array. MPI_Barrier synchronizes all processes. Proper use of collectives improves performance over equivalent point-to-point implementations. MPI supports custom datatypes and communicators for advanced patterns.
Did You Know? MPI was standardized in 1994 and is still the dominant model for supercomputing. The same program can run on a laptop with 4 cores or a supercomputer with millions!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| MPI | Message Passing Interface for distributed-memory parallel programming |
| Rank | Unique identifier for each MPI process |
| Communicator | Group of MPI processes that can communicate |
| Point-to-Point | Communication between two specific processes |
| Collective Operation | Communication involving all processes in a group |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what MPI means and give an example of why it is important.
In your own words, explain what Rank means and give an example of why it is important.
In your own words, explain what Communicator means and give an example of why it is important.
In your own words, explain what Point-to-Point means and give an example of why it is important.
In your own words, explain what Collective Operation means and give an example of why it is important.
Summary
In this module, we explored Distributed Memory Programming with MPI. We learned about mpi, rank, communicator, point-to-point, collective operation. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Module 5: Hybrid Programming: MPI + OpenMP (30m)
Combining distributed and shared memory parallelism for maximum performance.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Hybrid Programming
- Define and explain Thread Safety
- Define and explain MPI_Init_thread
- Define and explain Process Affinity
- Define and explain Memory Footprint
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Hybrid MPI+OpenMP programming uses MPI for inter-node communication and OpenMP for intra-node parallelism. This matches modern cluster architectures: nodes with many cores sharing memory, connected by network. Instead of running one MPI rank per core, run fewer ranks per node with each rank using multiple OpenMP threads. Benefits include reduced memory footprint (one copy of data per node instead of per core), reduced communication overhead, and better cache utilization. The challenge is balancing MPI and OpenMP work, managing thread safety in MPI calls, and choosing optimal rank/thread configurations.
The concepts below cover both the programming model and the practical details (thread safety, affinity, memory use) that make hybrid codes work.
Hybrid Programming
What is Hybrid Programming?
Definition: Combining MPI and OpenMP for multi-level parallelism
Hybrid programming mirrors the hardware: MPI connects nodes that cannot share memory, while OpenMP exploits the cores within each node that can. A typical configuration runs one MPI rank per socket or per node, with OpenMP threads filling the remaining cores.
Key Point: Hybrid Programming is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread Safety
What is Thread Safety?
Definition: Code that works correctly when called by multiple threads
Thread safety matters in hybrid codes because MPI libraries are not automatically safe to call from multiple threads. MPI defines explicit thread-support levels, requested at initialization, so the program and the library agree on which threads may make MPI calls.
Key Point: Thread Safety is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MPI_Init_thread
What is MPI_Init_thread?
Definition: MPI initialization with thread support specification
MPI_Init_thread replaces MPI_Init in threaded programs. The application requests a thread-support level and the library reports the level it actually provides, so the program must check the returned value rather than assume the request was granted. Requesting the lowest level that suffices (often MPI_THREAD_FUNNELED) lets the MPI library skip internal locking and run faster.
Key Point: MPI_Init_thread is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Process Affinity
What is Process Affinity?
Definition: Binding processes/threads to specific CPU cores
Process affinity pins MPI ranks and OpenMP threads to specific cores or sockets. Without pinning, the operating system may migrate threads between cores, destroying cache locality and causing remote memory accesses on NUMA systems. Schedulers and OpenMP runtimes expose explicit controls for this, such as OMP_PROC_BIND and srun's --cpu-bind option.
Key Point: Process Affinity is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Memory Footprint
What is Memory Footprint?
Definition: Total memory used by an application
Memory footprint is a key motivation for going hybrid. A pure MPI run keeps one copy of replicated data (lookup tables, halo buffers, MPI internal buffers) per rank; with a few ranks per node and many threads each, that data is stored once per rank instead of once per core. On memory-constrained problems this can make the difference between a job fitting in memory and crashing.
Key Point: Memory Footprint is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Implementing Hybrid Programs
Initialize MPI with thread support: MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided). Thread levels: MPI_THREAD_SINGLE (no threads), MPI_THREAD_FUNNELED (only master thread makes MPI calls), MPI_THREAD_SERIALIZED (one thread at a time), MPI_THREAD_MULTIPLE (any thread anytime). Common pattern: MPI ranks divide problem into chunks; within each rank, OpenMP parallelizes computation on that chunk. Example: MPI distributes matrix rows across nodes; OpenMP parallelizes row operations within each node. Use environment variables to control threading: OMP_NUM_THREADS, OMP_PLACES, OMP_PROC_BIND. Profile to find optimal balance; typical is 1-4 MPI ranks per node with 16-32 threads each.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The fastest supercomputers use hybrid MPI+OpenMP+GPU programming, combining three levels of parallelism to achieve maximum performance!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Hybrid Programming | Combining MPI and OpenMP for multi-level parallelism |
| Thread Safety | Code that works correctly when called by multiple threads |
| MPI_Init_thread | MPI initialization with thread support specification |
| Process Affinity | Binding processes/threads to specific CPU cores |
| Memory Footprint | Total memory used by an application |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Hybrid Programming means and give an example of why it is important.
In your own words, explain what Thread Safety means and give an example of why it is important.
In your own words, explain what MPI_Init_thread means and give an example of why it is important.
In your own words, explain what Process Affinity means and give an example of why it is important.
In your own words, explain what Memory Footprint means and give an example of why it is important.
Summary
In this module, we explored Hybrid Programming: MPI + OpenMP. We learned about hybrid programming, thread safety, MPI_Init_thread, process affinity, and memory footprint. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
6 Job Scheduling and Resource Management
Running parallel jobs on HPC clusters using Slurm and PBS.
30m
Job Scheduling and Resource Management
Running parallel jobs on HPC clusters using Slurm and PBS.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Job Scheduler
- Define and explain Slurm
- Define and explain Partition
- Define and explain Job Array
- Define and explain Walltime
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC systems use job schedulers to manage resource allocation fairly among users. Users submit jobs specifying required resources (nodes, cores, time, memory); the scheduler queues jobs and launches them when resources are available. Slurm (Simple Linux Utility for Resource Management) dominates modern HPC. Jobs are submitted via scripts with #SBATCH directives specifying requirements. PBS (Portable Batch System) is an older alternative with similar concepts. Understanding job scheduling is essential for using any HPC system effectively, from getting allocations to optimizing queue wait times.
In this module, we will explore the fascinating world of Job Scheduling and Resource Management. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Job Scheduler
What is Job Scheduler?
Definition: System that manages job queuing and resource allocation
The job scheduler is the gatekeeper of a shared cluster. It accepts job submissions, enforces fair-share policies and resource limits, backfills short jobs into idle gaps, and launches work when the requested nodes become free. Specifying your resource needs accurately helps the scheduler place your job sooner and keeps the machine busy for everyone.
Key Point: Job Scheduler is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Slurm
What is Slurm?
Definition: Popular open-source workload manager for HPC clusters
Slurm, first released by Lawrence Livermore National Laboratory in the early 2000s, is now the most widely deployed workload manager in HPC. It is open source, scales to the largest systems in the world, and provides the commands you will use daily on a cluster: sbatch, srun, squeue, scancel, and sacct.
Key Point: Slurm is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Partition
What is Partition?
Definition: Group of nodes with common characteristics in Slurm
A partition is Slurm's term for a queue: a named group of nodes with shared limits such as maximum walltime, node types (for example GPU or large-memory nodes), and access policies. Choosing the right partition for your job's size and duration affects both where it can run and how long it waits.
Key Point: Partition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Job Array
What is Job Array?
Definition: Collection of similar jobs submitted as one
A job array submits many near-identical jobs with a single script, each instance distinguished by the environment variable SLURM_ARRAY_TASK_ID. Arrays are the standard way to run parameter sweeps or process many input files without flooding the scheduler with separate submissions.
Key Point: Job Array is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Walltime
What is Walltime?
Definition: Maximum execution time for a job
Walltime is the elapsed (wall-clock) time limit you request for a job; the scheduler kills the job when it expires. Requesting too little risks losing unsaved work, while requesting too much lengthens queue waits, so the usual practice is a realistic estimate plus a safety margin, combined with periodic checkpointing.
Key Point: Walltime is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Slurm Job Scripts and Commands
A Slurm job script starts with #!/bin/bash, followed by #SBATCH directives: #SBATCH --nodes=4 (4 nodes), #SBATCH --ntasks-per-node=32 (32 MPI ranks per node), #SBATCH --cpus-per-task=2 (2 threads per rank), #SBATCH --time=02:00:00 (2 hour limit), #SBATCH --partition=compute (queue name). Then load modules and run: srun ./myprogram. Submit with sbatch script.sh. Check status: squeue -u username. Cancel: scancel jobid. Interactive session: salloc --nodes=1 --time=1:00:00. View completed job info: sacct -j jobid. Arrays handle parameter sweeps: #SBATCH --array=1-100. Dependencies chain jobs: --dependency=afterok:jobid.
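Putting those directives together, a hybrid-job script might look like the sketch below. The job name, partition, module name, and program path are placeholders for your site's actual values:

```shell
#!/bin/bash
#SBATCH --job-name=hybrid-demo
#SBATCH --nodes=4                 # 4 compute nodes
#SBATCH --ntasks-per-node=2       # 2 MPI ranks per node
#SBATCH --cpus-per-task=16        # 16 OpenMP threads per rank
#SBATCH --time=02:00:00           # 2-hour walltime limit
#SBATCH --partition=compute       # queue name (site-specific)

# Match OpenMP threading to the Slurm allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=close

module load openmpi               # module name is site-specific

srun ./myprogram                  # launches 8 ranks total (4 nodes x 2)
```

Submit it with `sbatch script.sh`, then monitor with `squeue -u $USER`, exactly as described above.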
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? Slurm was developed at Lawrence Livermore National Laboratory and now manages some of the world's largest supercomputers, including Frontier with over 9,000 nodes!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Job Scheduler | System that manages job queuing and resource allocation |
| Slurm | Popular open-source workload manager for HPC clusters |
| Partition | Group of nodes with common characteristics in Slurm |
| Job Array | Collection of similar jobs submitted as one |
| Walltime | Maximum execution time for a job |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Job Scheduler means and give an example of why it is important.
In your own words, explain what Slurm means and give an example of why it is important.
In your own words, explain what Partition means and give an example of why it is important.
In your own words, explain what Job Array means and give an example of why it is important.
In your own words, explain what Walltime means and give an example of why it is important.
Summary
In this module, we explored Job Scheduling and Resource Management. We learned about job schedulers, Slurm, partitions, job arrays, and walltime. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
7 Performance Profiling and Optimization
Identifying bottlenecks and improving parallel application performance.
30m
Performance Profiling and Optimization
Identifying bottlenecks and improving parallel application performance.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Profiling
- Define and explain Hotspot
- Define and explain Roofline Model
- Define and explain Cache Miss
- Define and explain NUMA
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Profiling reveals where applications spend time and resources. For HPC, this includes computation time, communication overhead, memory usage, and I/O patterns. Tools like gprof, perf, and Valgrind profile serial code. Intel VTune and AMD uProf provide detailed CPU analysis. For parallel programs, Scalasca, TAU, and HPCToolkit trace MPI and OpenMP behavior. NVIDIA Nsight profiles GPU code. The optimization cycle: profile to identify bottlenecks, optimize the most impactful areas, re-profile to verify improvements. Common issues include poor load balance, excessive communication, memory bandwidth limits, and cache misses.
In this module, we will explore the fascinating world of Performance Profiling and Optimization. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Profiling
What is Profiling?
Definition: Measuring where an application spends time and resources
Profiling replaces guesswork with measurement. Rather than optimizing code you suspect is slow, a profiler shows exactly where cycles, memory bandwidth, and communication time are spent. In parallel programs it also exposes problems invisible in serial code, such as ranks idling at a barrier while one slow rank finishes its work.
Key Point: Profiling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Hotspot
What is Hotspot?
Definition: Code region consuming significant execution time
A hotspot is the small fraction of code where most of the runtime concentrates; many applications spend the bulk of their time in a handful of loops. Optimizing hotspots first follows directly from Amdahl's reasoning: speeding up a region that accounts for 1% of runtime can never gain you more than 1%.
Key Point: Hotspot is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Roofline Model
What is Roofline Model?
Definition: Visual model showing compute vs memory bandwidth limits
The roofline model plots performance (FLOP/s) against arithmetic intensity (FLOPs per byte moved from memory). The "roof" is the minimum of peak compute and intensity times memory bandwidth: kernels under the sloped part are memory-bound, while those under the flat part are compute-bound. One glance tells you which kind of optimization can actually help.
Key Point: Roofline Model is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Cache Miss
What is Cache Miss?
Definition: Memory access that must fetch data from slower memory
A cache miss forces the CPU to fetch data from a slower level of the memory hierarchy, costing tens to hundreds of cycles instead of a few. Loop reordering, blocking (tiling), and data-layout changes that improve spatial and temporal locality are among the most effective single-node optimizations.
Key Point: Cache Miss is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
NUMA
What is NUMA?
Definition: Non-Uniform Memory Access architecture with local and remote memory
On NUMA systems, each socket accesses its own local memory quickly and memory attached to other sockets more slowly. Linux allocates pages on first touch, so initializing data with the same threads that later use it, combined with thread pinning, keeps accesses local and can substantially improve effective bandwidth.
Key Point: NUMA is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Performance Analysis Techniques
Sampling profilers (perf, VTune) periodically record program state with low overhead, showing where time is spent. Tracing (Scalasca, TAU) records every event with more detail but higher overhead. MPI profiling reveals communication patterns: look for imbalanced send/receive, all-to-all bottlenecks, and serialized collective operations. Memory analysis finds cache misses, NUMA effects, and bandwidth limits. Roofline model plots achieved performance against memory/compute bounds, showing optimization potential. Hardware counters measure cycles, instructions, cache hits/misses, and branch mispredictions. Start with high-level profiling, then drill down. Focus on hotspots consuming significant runtime percentage.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The Roofline model, developed at Berkeley Lab, has become the standard way to visualize whether code is limited by compute or memory bandwidth!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Profiling | Measuring where an application spends time and resources |
| Hotspot | Code region consuming significant execution time |
| Roofline Model | Visual model showing compute vs memory bandwidth limits |
| Cache Miss | Memory access that must fetch data from slower memory |
| NUMA | Non-Uniform Memory Access architecture with local and remote memory |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Profiling means and give an example of why it is important.
In your own words, explain what Hotspot means and give an example of why it is important.
In your own words, explain what Roofline Model means and give an example of why it is important.
In your own words, explain what Cache Miss means and give an example of why it is important.
In your own words, explain what NUMA means and give an example of why it is important.
Summary
In this module, we explored Performance Profiling and Optimization. We learned about profiling, hotspots, the roofline model, cache misses, and NUMA. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
8 Parallel File Systems and I/O
Efficient data storage and access for HPC workloads.
30m
Parallel File Systems and I/O
Efficient data storage and access for HPC workloads.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Parallel File System
- Define and explain Lustre
- Define and explain Striping
- Define and explain MPI-IO
- Define and explain HDF5
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC applications often process terabytes to petabytes of data. Standard file systems cannot provide adequate bandwidth for thousands of concurrent processes. Parallel file systems like Lustre, GPFS (Spectrum Scale), and BeeGFS stripe data across many storage servers, enabling aggregate bandwidth of hundreds of GB/s. Understanding I/O patterns is crucial: avoid many small operations; use collective I/O when possible. MPI-IO provides parallel I/O primitives. HDF5 and NetCDF offer high-level libraries for structured scientific data. I/O is often the bottleneck in HPC applications; optimization can dramatically improve overall performance.
In this module, we will explore the fascinating world of Parallel File Systems and I/O. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Parallel File System
What is Parallel File System?
Definition: File system distributing data across multiple servers for high bandwidth
A parallel file system presents a single namespace while spreading data and metadata across many storage servers, so thousands of clients can read and write simultaneously at aggregate bandwidths far beyond what any single server could deliver. It is the layer that makes checkpointing and large-scale data analysis feasible on a cluster.
Key Point: Parallel File System is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Lustre
What is Lustre?
Definition: Popular open-source parallel file system for HPC
Lustre separates metadata servers, which track file names and layouts, from object storage servers, which hold the file data. Clients contact a metadata server once to open a file and then stream data directly to and from the storage servers in parallel, which is what gives Lustre its scalability.
Key Point: Lustre is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Striping
What is Striping?
Definition: Distributing file data across multiple storage targets
Striping splits a file into chunks distributed round-robin across storage targets. A file striped over eight targets can, in principle, be read eight times faster than one stored on a single target. The stripe count and stripe size are tunable per file or directory and should match the file's size and access pattern.
Key Point: Striping is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MPI-IO
What is MPI-IO?
Definition: MPI interface for parallel file operations
MPI-IO extends MPI with file operations such as MPI_File_open and MPI_File_write_all, letting all ranks participate in reading or writing a shared file. Its collective operations allow the MPI library to merge many small, scattered requests into a few large contiguous ones, which parallel file systems handle far more efficiently.
Key Point: MPI-IO is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
HDF5
What is HDF5?
Definition: High-level library for hierarchical scientific data storage
HDF5 stores datasets, attributes, and groups in a self-describing, portable binary format, so data written on one machine can be read on any other along with its metadata. Parallel HDF5 is built on top of MPI-IO, giving scientific applications structured, high-performance I/O without hand-coded file offsets.
Key Point: HDF5 is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: I/O Optimization Strategies
Key strategies: (1) Collective I/O - instead of each process writing separately, aggregate writes through fewer processes (MPI_File_write_all). (2) Striping - set stripe count and size to match access patterns; more stripes for large files, fewer for small. (3) Buffering - accumulate small writes into large buffers before I/O. (4) Asynchronous I/O - overlap computation with I/O operations. (5) Data staging - use node-local SSDs as burst buffers for checkpoints. (6) Compression - reduce data volume for I/O-bound applications. Lustre commands: lfs setstripe -c stripe_count -S stripe_size. Monitor I/O with Darshan profiler to identify bottlenecks. Avoid opening files from all ranks simultaneously.
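The striping controls mentioned above are applied per file or directory with the `lfs` tool. A typical session might look like the sketch below; the stripe values are illustrative, and the directory names are placeholders:

```shell
# Stripe new files in this directory across 16 targets in 4 MiB chunks
lfs setstripe -c 16 -S 4m /scratch/myproject/output

# Inspect the striping of an existing file
lfs getstripe /scratch/myproject/output/checkpoint.h5

# Small files: a single stripe avoids unnecessary overhead
lfs setstripe -c 1 /scratch/myproject/logs
```

Striping set on a directory applies to files created in it afterward, so it is common to configure output directories once at the start of a project.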
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? The name Lustre is a portmanteau of "Linux" and "cluster", and the file system has long powered many of the fastest machines on the TOP500 list!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Parallel File System | File system distributing data across multiple servers for high bandwidth |
| Lustre | Popular open-source parallel file system for HPC |
| Striping | Distributing file data across multiple storage targets |
| MPI-IO | MPI interface for parallel file operations |
| HDF5 | High-level library for hierarchical scientific data storage |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Parallel File System means and give an example of why it is important.
In your own words, explain what Lustre means and give an example of why it is important.
In your own words, explain what Striping means and give an example of why it is important.
In your own words, explain what MPI-IO means and give an example of why it is important.
In your own words, explain what HDF5 means and give an example of why it is important.
Summary
In this module, we explored Parallel File Systems and I/O. We learned about parallel file systems, Lustre, striping, MPI-IO, and HDF5. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
9 Debugging Parallel Applications
Finding and fixing bugs in MPI and OpenMP programs.
30m
Debugging Parallel Applications
Finding and fixing bugs in MPI and OpenMP programs.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Race Condition
- Define and explain Deadlock
- Define and explain Thread Sanitizer
- Define and explain DDT
- Define and explain MUST
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Parallel bugs are notoriously difficult to find because they may appear intermittently depending on timing. Common issues include race conditions (threads accessing shared data without synchronization), deadlocks (processes waiting for each other forever), and incorrect message passing (wrong source, destination, or buffer). Tools help: DDT and TotalView are commercial parallel debuggers; GDB with mpirun can debug MPI. Valgrind's helgrind and DRD find thread errors. MPI correctness tools like MUST verify MPI usage. Reproducibility challenges mean bugs may not manifest consistently. Defensive programming with assertions and thorough testing on small scale before large runs is essential.
In this module, we will explore the fascinating world of Debugging Parallel Applications. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Race Condition
What is Race Condition?
Definition: Bug where outcome depends on unpredictable thread timing
A race condition occurs when the result depends on the unsynchronized interleaving of threads, for example two threads incrementing the same counter and losing updates. Races are insidious because a program can pass every small test and then fail intermittently at scale, which is why detection tools, not testing alone, are needed.
Key Point: Race Condition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Deadlock
What is Deadlock?
Definition: State where processes wait for each other indefinitely
A deadlock leaves processes waiting on each other forever. A classic MPI example is two ranks that each call a blocking MPI_Send to the other before posting a receive, which can hang once messages exceed the library's internal buffering. Consistent communication ordering, or combined calls like MPI_Sendrecv, prevents it.
Key Point: Deadlock is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Thread Sanitizer
What is Thread Sanitizer?
Definition: Tool detecting race conditions and other thread errors
ThreadSanitizer, built into GCC and Clang and enabled with -fsanitize=thread, instruments memory accesses at compile time and reports data races along with the stack traces of both conflicting accesses. It slows execution considerably, so it is used on small test runs rather than production jobs.
Key Point: Thread Sanitizer is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
DDT
What is DDT?
Definition: Commercial parallel debugger for MPI and OpenMP
DDT is a graphical debugger designed for scale: it can attach to thousands of MPI ranks at once, group them by state, and step them collectively, turning an intractable hang into a visible pattern, such as a single rank stuck in a different function than all the others.
Key Point: DDT is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
MUST
What is MUST?
Definition: MPI correctness checking tool
MUST checks MPI usage at runtime by intercepting calls through the MPI profiling interface. It detects deadlocks, mismatched send/receive types and counts, invalid arguments, and leaked requests, and produces a report after the run, catching errors that would otherwise surface as undefined behavior.
Key Point: MUST is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Common Parallel Bugs and Detection
Race conditions: multiple threads modify shared data without locks. Symptoms: incorrect results that change between runs. Detection: run with thread sanitizers (gcc -fsanitize=thread), use helgrind. Fix: add proper synchronization (mutex, critical section, atomic operations). Deadlocks: process A waits for B while B waits for A. Symptoms: program hangs. Detection: attach debugger to hanging process, check stack traces. Fix: ensure consistent ordering of communications. MPI mismatches: send/receive type or count mismatch causes undefined behavior. Detection: use MUST or Intel MPI's -check_mpi. Memory errors in parallel: use Valgrind memcheck on single-rank runs. Buffer overruns in MPI: ensure buffers are large enough for all received data.
This is an advanced topic that goes beyond the core material, but understanding it will give you a deeper appreciation of the subject. Researchers continue to study this area, and new discoveries are being made all the time.
Did You Know? TotalView, one of the earliest parallel debuggers, dates back to 1989 and is still actively developed and used on the world's largest supercomputers today!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Race Condition | Bug where outcome depends on unpredictable thread timing |
| Deadlock | State where processes wait for each other indefinitely |
| Thread Sanitizer | Tool detecting race conditions and other thread errors |
| DDT | Commercial parallel debugger for MPI and OpenMP |
| MUST | MPI correctness checking tool |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Race Condition means and give an example of why it is important.
In your own words, explain what Deadlock means and give an example of why it is important.
In your own words, explain what Thread Sanitizer means and give an example of why it is important.
In your own words, explain what DDT means and give an example of why it is important.
In your own words, explain what MUST means and give an example of why it is important.
Summary
In this module, we explored Debugging Parallel Applications. We learned about race conditions, deadlocks, thread sanitizers, DDT, and MUST. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
10 Domain Decomposition and Load Balancing
Distributing work efficiently across parallel processes.
30m
Domain Decomposition and Load Balancing
Distributing work efficiently across parallel processes.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Domain Decomposition
- Define and explain Halo Exchange
- Define and explain ParMETIS
- Define and explain Load Imbalance
- Define and explain Space-Filling Curve
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
Domain decomposition divides the computational domain (data and work) among parallel processes. For regular grids, 1D, 2D, or 3D block decomposition maps naturally to the problem structure. The surface-to-volume ratio affects communication overhead: higher-dimensional decomposition often reduces halo (ghost cell) exchange. For irregular domains like graphs or particles, partitioning libraries (ParMETIS, Zoltan) balance load while minimizing communication. Dynamic load balancing redistributes work as computation evolves (e.g., adaptive mesh refinement). The goal is equal work per process with minimal data exchange between them.
In this module, we will explore the fascinating world of Domain Decomposition and Load Balancing. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
This topic is essential for understanding how the subject works and how experts organize their knowledge. Let's dive in and discover what makes this subject so important!
Domain Decomposition
What is Domain Decomposition?
Definition: Dividing problem space among parallel processes
In practice, the choice of decomposition determines both the memory footprint per process and the communication pattern. A 3D heat-diffusion solver, for example, might split its grid into equal blocks, with each MPI rank owning one block and exchanging boundary values with its neighbors at every time step. A good decomposition gives every rank the same amount of work while keeping the boundaries between ranks as small as possible.
Key Point: Domain Decomposition is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Halo Exchange
What is Halo Exchange?
Definition: Communication of boundary data between neighboring domains
Each process keeps a layer of ghost (halo) cells that mirror the boundary cells of its neighbors. Before each computation step, neighboring processes exchange these layers, typically with non-blocking MPI sends and receives, so that stencil operations near the boundary see up-to-date data. The halo width must match the stencil radius: a 5-point stencil needs a one-cell halo, a wider stencil needs more.
Key Point: Halo Exchange is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
ParMETIS
What is ParMETIS?
Definition: Parallel graph partitioning library
ParMETIS is the MPI-parallel version of the METIS graph partitioner developed by George Karypis's group at the University of Minnesota. It partitions graphs and unstructured meshes so that each part has roughly equal weight while the number of edges cut between parts, a proxy for communication volume, is minimized. It also supports adaptive repartitioning for workloads that change during a run.
Key Point: ParMETIS is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Load Imbalance
What is Load Imbalance?
Definition: Unequal work distribution causing idle time
If one process has more work than the others, the rest sit idle at the next synchronization point, so the whole computation runs at the speed of the slowest rank. A common metric is the ratio of maximum to average work per process: 1.0 means perfect balance, and even a ratio of 1.2 wastes roughly 20% of the machine at every synchronization.
Key Point: Load Imbalance is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Space-Filling Curve
What is Space-Filling Curve?
Definition: Continuous path through multi-dimensional space preserving locality
Curves such as Hilbert and Morton (Z-order) visit every cell of a 2D or 3D grid in an order that keeps spatially nearby cells close together in the resulting 1D sequence. Cutting that sequence into equal pieces yields a partition that is both balanced and spatially compact, which is why many adaptive mesh codes use space-filling curves for fast dynamic repartitioning.
Key Point: Space-Filling Curve is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Partitioning Strategies and Tools
For structured grids: Block decomposition divides domain into contiguous chunks. 1D (slabs), 2D (pencils), 3D (blocks) trade communication patterns for surface area. Cyclic distribution (round-robin) helps with load imbalance if work per cell varies. For unstructured meshes and graphs: ParMETIS uses multilevel recursive bisection to minimize edge cuts (communication). Zoltan provides multiple algorithms including geometric, graph, and hypergraph partitioning. Space-filling curves (Hilbert, Morton) map multi-dimensional data to 1D while preserving locality. Measure load imbalance as max_work/avg_work; target < 1.05 (5% imbalance). Re-partition when imbalance exceeds threshold in dynamic simulations.
Did You Know? ParMETIS can partition a billion-element mesh across thousands of processors in minutes, making it possible to run CFD simulations of entire aircraft!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Domain Decomposition | Dividing problem space among parallel processes |
| Halo Exchange | Communication of boundary data between neighboring domains |
| ParMETIS | Parallel graph partitioning library |
| Load Imbalance | Unequal work distribution causing idle time |
| Space-Filling Curve | Continuous path through multi-dimensional space preserving locality |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Domain Decomposition means and give an example of why it is important.
In your own words, explain what Halo Exchange means and give an example of why it is important.
In your own words, explain what ParMETIS means and give an example of why it is important.
In your own words, explain what Load Imbalance means and give an example of why it is important.
In your own words, explain what Space-Filling Curve means and give an example of why it is important.
Summary
In this module, we explored Domain Decomposition and Load Balancing. We learned about domain decomposition, halo exchange, ParMETIS, load imbalance, and space-filling curves. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
11 HPC Applications: Scientific Simulations
Real-world HPC applications in physics, chemistry, and engineering.
30m
HPC Applications: Scientific Simulations
Real-world HPC applications in physics, chemistry, and engineering.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Molecular Dynamics
- Define and explain CFD
- Define and explain Climate Modeling
- Define and explain Finite Element
- Define and explain N-body Problem
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC enables scientific breakthroughs impossible with smaller computers. Climate modeling simulates decades of global weather using millions of grid cells. Computational fluid dynamics (CFD) predicts airflow around aircraft and vehicles. Molecular dynamics simulates protein folding and drug interactions at atomic scale. Cosmological simulations model universe evolution with billions of particles. Finite element analysis designs structures and machines. Quantum chemistry calculates molecular properties. Each application type has characteristic algorithms, scaling properties, and software ecosystems. Understanding these applications helps design effective HPC systems and optimize code for specific domains.
In this module, we will explore the fascinating world of HPC Applications: Scientific Simulations. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
Molecular Dynamics
What is Molecular Dynamics?
Definition: Simulation of atomic motion by integrating equations of motion
Molecular dynamics computes the force on every atom from an interatomic potential, then advances positions and velocities with a small time step, typically on the order of a femtosecond. Millions to billions of such steps are needed to reach biologically or physically relevant time scales, which is why MD is one of the largest consumers of HPC cycles: it reveals how proteins fold, how materials deform, and how drug candidates bind to their targets.
Key Point: Molecular Dynamics is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
CFD
What is CFD?
Definition: Computational Fluid Dynamics for simulating fluid flow
CFD discretizes the Navier-Stokes equations on a mesh and solves them numerically. It is used wherever fluid behavior matters: aerodynamic design of aircraft and cars, combustion in engines, blood flow in arteries, and cooling of data centers. Resolving turbulence at high fidelity is among the most demanding workloads in all of HPC, because the range of length scales that must be captured grows rapidly with the Reynolds number.
Key Point: CFD is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Climate Modeling
What is Climate Modeling?
Definition: HPC simulation of Earth's atmosphere and oceans
Climate models divide the atmosphere and oceans into three-dimensional grid cells and step the governing physical equations forward in time, coupling component models for air, water, sea ice, and land. Doubling the horizontal resolution multiplies the computational cost roughly eight- to sixteen-fold, which is why climate science has always pushed the limits of the largest available supercomputers.
Key Point: Climate Modeling is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Finite Element
What is Finite Element?
Definition: Numerical method for solving PDEs on complex geometries
The finite element method subdivides a complex geometry into many small elements (tetrahedra, hexahedra) and approximates the solution of a partial differential equation by simple functions defined on each element. It underpins structural analysis, crash simulation, heat transfer, and electromagnetic design, and its large sparse linear systems are a classic target for parallel solvers.
Key Point: Finite Element is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
N-body Problem
What is N-body Problem?
Definition: Computing interactions among many particles
A direct N-body calculation evaluates all pairwise interactions, which costs O(N^2) operations and becomes intractable for large N. Tree codes (Barnes-Hut) and the fast multipole method reduce this to O(N log N) or O(N) by approximating the effect of distant groups of particles, enabling cosmological simulations with billions of particles.
Key Point: N-body Problem is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Molecular Dynamics Case Study
Molecular dynamics (MD) simulates atomic/molecular motion by numerically integrating Newton's equations. Codes like LAMMPS, GROMACS, and NAMD simulate millions of atoms for microseconds of real time. Key components: force calculation (Lennard-Jones, Coulomb), integration (Verlet), neighbor lists, periodic boundaries. Parallelization: domain decomposition assigns spatial regions to processes; atoms near boundaries require halo exchange. Load balancing is challenging as atoms move. GPU acceleration provides 10-100x speedup for force calculations. Applications include drug discovery (protein-ligand binding), materials science (polymers, batteries), and biophysics (membrane proteins). Typical runs use thousands of CPU cores or hundreds of GPUs.
Did You Know? The 2013 Nobel Prize in Chemistry recognized the development of multiscale models for complex chemical systems, the foundation of modern molecular dynamics; MD simulations can now model entire viruses with over 100 million atoms!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Molecular Dynamics | Simulation of atomic motion by integrating equations of motion |
| CFD | Computational Fluid Dynamics for simulating fluid flow |
| Climate Modeling | HPC simulation of Earth's atmosphere and oceans |
| Finite Element | Numerical method for solving PDEs on complex geometries |
| N-body Problem | Computing interactions among many particles |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Molecular Dynamics means and give an example of why it is important.
In your own words, explain what CFD means and give an example of why it is important.
In your own words, explain what Climate Modeling means and give an example of why it is important.
In your own words, explain what Finite Element means and give an example of why it is important.
In your own words, explain what N-body Problem means and give an example of why it is important.
Summary
In this module, we explored HPC Applications: Scientific Simulations. We learned about molecular dynamics, CFD, climate modeling, the finite element method, and the N-body problem. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
12 Future of HPC: Exascale and Beyond
Emerging technologies and challenges shaping next-generation supercomputing.
30m
Future of HPC: Exascale and Beyond
Emerging technologies and challenges shaping next-generation supercomputing.
Learning Objectives
By the end of this module, you will be able to:
- Define and explain Exascale
- Define and explain Heterogeneous Computing
- Define and explain Performance Portability
- Define and explain Kokkos
- Define and explain SYCL
- Apply these concepts to real-world examples and scenarios
- Analyze and compare the key concepts presented in this module
Introduction
HPC continues evolving toward exascale (10^18 FLOPS) and beyond. Frontier achieved exascale in 2022; Aurora and El Capitan follow. But power consumption (20-30 MW per system) and reliability (millions of components) pose challenges. Heterogeneous architectures mix CPUs, GPUs, and specialized accelerators. AI/ML integration blurs lines between traditional HPC and deep learning. Quantum computing may complement classical HPC for specific problems. New programming models (SYCL, Kokkos, RAJA) aim for performance portability. Neuromorphic and optical computing explore alternative computational paradigms. The future demands innovations in algorithms, hardware, and software to maintain progress.
In this module, we will explore the fascinating world of Future of HPC: Exascale and Beyond. You will discover key concepts that form the foundation of this subject. Each concept builds on the previous one, so pay close attention and take notes as you go. By the end, you'll have a solid understanding of this important topic.
Exascale
What is Exascale?
Definition: Computing performance of 10^18 floating-point operations per second
An exascale system performs at least 10^18 floating-point operations per second, a thousand times the petascale milestone reached in 2008. Getting there required major advances in energy efficiency: simply scaling up petascale-era designs would have consumed hundreds of megawatts, far beyond what any facility could supply.
Key Point: Exascale is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Heterogeneous Computing
What is Heterogeneous Computing?
Definition: Using different processor types (CPU, GPU, accelerators) together
In a heterogeneous node, latency-optimized CPUs handle control flow, I/O, and serial work, while throughput-optimized GPUs execute the data-parallel kernels that dominate runtime. Most systems at the top of the TOP500 list now draw the bulk of their performance from accelerators, so writing code that uses both processor types well has become a core HPC skill.
Key Point: Heterogeneous Computing is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Performance Portability
What is Performance Portability?
Definition: Code that achieves good performance across different architectures
A performance-portable code achieves a reasonable fraction of peak performance on each target architecture from a single source base, rather than maintaining separate CUDA, HIP, and CPU versions of every kernel. This matters because large scientific codes outlive any particular hardware generation and must run at supercomputing centers with very different machines.
Key Point: Performance Portability is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
Kokkos
What is Kokkos?
Definition: C++ library for portable parallel programming
Kokkos, developed at Sandia National Laboratories, expresses parallel loops and data layouts through C++ templates; at compile time these map to a backend such as OpenMP, CUDA, or HIP. The same kernel source can therefore run on CPUs and on GPUs from different vendors, which is why several large US Department of Energy codes have adopted it.
Key Point: Kokkos is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
SYCL
What is SYCL?
Definition: Khronos standard for C++ heterogeneous computing
SYCL is a royalty-free Khronos standard that lets single-source C++ programs dispatch kernels to CPUs, GPUs, and other accelerators. Intel's oneAPI DPC++ compiler is the most prominent implementation, and SYCL serves as a primary programming model on the Aurora exascale system.
Key Point: SYCL is a fundamental concept that you will encounter throughout your studies. Make sure you can explain it in your own words!
🔬 Deep Dive: Performance Portability Frameworks
Writing code that runs efficiently on diverse hardware (Intel/AMD CPUs, NVIDIA/AMD GPUs) is challenging. Performance portability frameworks provide abstraction: Kokkos (Sandia) offers C++ abstractions for parallel execution and memory spaces; code compiles to OpenMP, CUDA, or HIP. RAJA (LLNL) provides similar abstractions with different design choices. SYCL is a Khronos standard extending C++ for heterogeneous computing; Intel oneAPI implements SYCL. OpenACC uses directives like OpenMP but targets accelerators. The tradeoff: abstraction enables portability but may sacrifice peak performance versus native code. These frameworks are increasingly adopted for large HPC codes that must run across multiple supercomputing centers.
Did You Know? Frontier, the first exascale supercomputer, uses 9,408 AMD compute nodes each with one CPU and four GPUs, consuming enough power to supply 10,000 homes!
Key Concepts at a Glance
| Concept | Definition |
|---|---|
| Exascale | Computing performance of 10^18 floating-point operations per second |
| Heterogeneous Computing | Using different processor types (CPU, GPU, accelerators) together |
| Performance Portability | Code that achieves good performance across different architectures |
| Kokkos | C++ library for portable parallel programming |
| SYCL | Khronos standard for C++ heterogeneous computing |
Comprehension Questions
Test your understanding by answering these questions:
In your own words, explain what Exascale means and give an example of why it is important.
In your own words, explain what Heterogeneous Computing means and give an example of why it is important.
In your own words, explain what Performance Portability means and give an example of why it is important.
In your own words, explain what Kokkos means and give an example of why it is important.
In your own words, explain what SYCL means and give an example of why it is important.
Summary
In this module, we explored Future of HPC: Exascale and Beyond. We learned about exascale, heterogeneous computing, performance portability, Kokkos, and SYCL. Each of these concepts plays a crucial role in understanding the broader topic. Remember that these ideas are building blocks — each module connects to the next, helping you build a complete picture. Keep reviewing these concepts and you'll be well prepared for what comes next!
Ready to master High Performance Computing?
Get personalized AI tutoring with flashcards, quizzes, and interactive exercises in the Eludo app