Introduction to Parallel Programming
26 April 2021
Obed N Munoz
Cloud Software Engineer
Concurrent vs Parallel
Parallel Programming - Serial Computing
- A problem is broken into a series of instructions
- Instructions are executed sequentially
- Instructions are executed on a single processor
- Only one instruction executes at any moment in time
Parallel Programming - Parallel Computing
- A problem is broken into parts that can be solved concurrently
- Each part is broken down into a series of instructions
- Instructions from each part are executed simultaneously on different processors.
- A control/synchronization mechanism is required
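To make this concrete, here is a minimal Go sketch (the data set and the four-way split are illustrative, not from the original slides): each part of the problem runs in its own goroutine, and a sync.WaitGroup acts as the control/synchronization mechanism.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        data := []int{1, 2, 3, 4, 5, 6, 7, 8}
        parts := 4
        chunk := len(data) / parts
        results := make([]int, parts) // one slot per part avoids data races

        var wg sync.WaitGroup
        for i := 0; i < parts; i++ {
            wg.Add(1)
            go func(i int) { // each part executes concurrently
                defer wg.Done()
                for _, v := range data[i*chunk : (i+1)*chunk] {
                    results[i] += v // each goroutine writes only its own slot
                }
            }(i)
        }
        wg.Wait() // control/synchronization: wait for every part

        total := 0
        for _, r := range results {
            total += r
        }
        fmt.Println("total:", total) // prints: total: 36
    }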
Parallel Programming - Parallel Computers 1/2
- Multiple functional units (Lx caches, prefetch, decode, floating point, GPU, etc.)
- Multiple execution units/cores
- Multiple hardware threads
Parallel Programming - Parallel Computers 2/2
Multiple stand-alone computers can be connected to make a larger parallel computer (a cluster).
Parallel Computing - Where?
Parallel Computing - Why?
- Save time and money.
- Solve larger and more complex problems.
- Provide concurrency.
- Take advantage of non-local resources.
- Better use of underlying parallel hardware.
- What else?
Jargon of Parallel Programming
von Neumann Architecture
So what? Who cares?
Parallel computers still follow this basic design, just multiplied in units; the fundamental architecture remains the same.
Jargon of Parallel Programming - Terms
- Task - a sequence of instructions that solves a particular problem
- Unit of execution (UE) - a task must be mapped to a UE, such as a process or thread
- Processing element (PE) - a generic term for a hardware element that executes a stream of instructions
- Load balance and load balancing
- Synchronous vs Asynchronous
- Race conditions - unsynchronized access to shared data (sketched below)
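As an illustration of the last term, this hypothetical Go sketch (not from the original slides) has two UEs incrementing a shared counter with no synchronization; running it with go run -race reports the race condition.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        counter := 0
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    counter++ // unsynchronized read-modify-write: a race condition
                }
            }()
        }
        wg.Wait()
        fmt.Println(counter) // often prints less than 2000
    }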
Jargon of Parallel Programming - Flynn's Classical Taxonomy

Flynn's Classical Taxonomy distinguishes multi-processor computer architectures along two independent dimensions: Instruction Stream and Data Stream. Each dimension can have only one of two possible states: Single or Multiple.
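The two dimensions combine into four classifications:

- SISD: Single Instruction, Single Data - the classic serial (von Neumann) computer
- SIMD: Single Instruction, Multiple Data - e.g. GPUs and vector units
- MISD: Multiple Instruction, Single Data - rare in practice
- MIMD: Multiple Instruction, Multiple Data - most modern parallel computers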
Parallel Architectures - Shared Memory
- Generally speaking, all processors have access to all memory as a global address space.
- Multiple processors can operate independently but share the same memory resources.
Uniform Memory Access (UMA)
Parallel Architectures - Shared Memory
Non-Uniform Memory Access (NUMA)
- User-friendly global address space
- Fast and uniform data sharing between tasks
- Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory-CPU path
- Synchronization is the programmer's responsibility (see the sketch below)
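Since synchronization falls on the programmer in this model, shared data must be guarded explicitly. A minimal Go sketch (illustrative, not from the slides) that repairs the earlier race with a mutex:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var mu sync.Mutex
        counter := 0 // lives in the shared global address space
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    mu.Lock() // programmer-supplied synchronization
                    counter++
                    mu.Unlock()
                }
            }()
        }
        wg.Wait()
        fmt.Println(counter) // always prints 4000
    }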
Parallel Architectures - Distributed Memory
Distributed memory systems require a communication network to connect inter-processor memory.
- Memory is scalable with number of CPUs
- Each CPU can rapidly access its own memory
- Cost effectiveness: can use commodity, off-the-shelf processors and networking
- Data communication is mostly the responsibility of the programmer (sketched below with channels)
- Global memory data structures cannot easily be mapped to this memory organization
- Non-uniform memory access times
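Go channels offer an in-process analogy for this message-passing style; in a real distributed-memory system the messages would travel over a network (e.g. via MPI), so treat this sketch as illustrative only.

    package main

    import "fmt"

    // Each worker "owns" its data and communicates only by messages.
    func worker(jobs <-chan int, results chan<- int) {
        for j := range jobs {
            results <- j * j // send the result back as a message
        }
    }

    func main() {
        jobs := make(chan int)
        results := make(chan int)
        for w := 0; w < 3; w++ {
            go worker(jobs, results)
        }
        go func() {
            for i := 1; i <= 5; i++ {
                jobs <- i // explicit data communication by the programmer
            }
            close(jobs)
        }()
        sum := 0
        for i := 0; i < 5; i++ {
            sum += <-results
        }
        fmt.Println("sum of squares:", sum) // prints: sum of squares: 55
    }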
Parallel Architectures - Hybrid Distributed-Shared Memory
The largest and fastest computers in the world today employ both shared and distributed memory architectures.
- Shares whatever is common to both shared and distributed memory architectures, in advantages and disadvantages alike.
- Increased scalability is an important advantage
- Increased programmer complexity is an important disadvantage
Parallel Programming Patterns - Definition
The book Patterns for Parallel Programming by Massingill, Sanders, and Mattson defines a pattern language that helps with understanding and designing parallel programs.
Programmers should start their design of a parallel solution by analyzing the problem within the problem domain to expose exploitable concurrency.
Is the problem large enough and the results significant enough to justify the effort?
Our goal is to refine the design and move it closer to a program that can execute tasks concurrently by mapping the concurrency onto multiple UEs running on a parallel computer.
The key issue at this stage is to decide which pattern or patterns are most appropriate for the problem.
We call these patterns Supporting Structures because they describe software constructions or "structures" that support the expression of parallel algorithms.
Every parallel program needs to:
1. Create the set of UEs.
2. Manage interactions between them and their access to shared resources.
3. Exchange information between UEs.
4. Shut them down in an orderly manner.
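A compact Go sketch (names are illustrative) that walks through all four steps, with goroutines as the UEs:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        work := make(chan string)
        var wg sync.WaitGroup

        for i := 0; i < 2; i++ { // 1. create the set of UEs (goroutines)
            wg.Add(1)
            go func(id int) {
                defer wg.Done()         // 2. manage interactions and completion
                for msg := range work { // 3. exchange information over a channel
                    fmt.Printf("UE %d got %q\n", id, msg)
                }
            }(i)
        }

        work <- "task A"
        work <- "task B"
        close(work) // 4. shut the UEs down in an orderly manner
        wg.Wait()
    }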
Parallel Programming Models
- Shared Memory (without threads)
- Distributed Memory / Message Passing
- Data Parallel
- Single Program Multiple Data (SPMD)
- Multiple Program Multiple Data (MPMD)
More details at: computing.llnl.gov/tutorials/parallel_comp/#Models
Parallel Programming Design Considerations
- Understand the problem and the program
- Identify hotspots and bottlenecks
- Partitioning (tasks and data)
- Latency vs Bandwidth
- Synchronous or Asynchronous
- Scope (point-to-point or collective)
- Synchronous communication operations
- Load balancing
More at: computing.llnl.gov/tutorials/parallel_comp/#Designing
Resources and Credits
This material is based on extracts from the following resources:
- Introduction to Parallel Computing - Blaise Barney, Lawrence Livermore National Laboratory
- Patterns for Parallel Programming - Berna L. Massingill, Beverly A. Sanders, Timothy G. Mattson