2020 High Performance Scientific Computing

Academic unit or major
Graduate major in Computer Science
Instructor(s)
Yokota Rio 
Course component(s)
Lecture
Mode of instruction
ZOOM
Day/Period(Room No.)
Mon1-2 (W831), Thr1-2 (W831)
Group
-
Course number
CSC.T526
Credits
2
Academic year
2020
Offered quarter
1Q
Syllabus updated
2020/4/24
Lecture notes updated
2020/6/18
Language used
English

Course description and aims

This course will equip students with the knowledge and skills needed to develop fast algorithms and implement them in massively parallel form on modern supercomputers, using parallel programming techniques such as SIMD, OpenMP, MPI, and CUDA. The course will cover how to use various linear algebra libraries for parallel execution on both CPUs and GPUs. Tutorials on using debuggers and profilers in a massively parallel environment will also be given. Performance primitives and the construction of container environments on TSUBAME will be demonstrated, along with tips on running deep learning frameworks on large GPU supercomputers.

Student learning outcomes

By the end of this course, students will be able to
1. Use SIMD vectorization, shared memory parallelization via OpenMP, and distributed memory parallelization via MPI
2. Program GPUs using OpenACC, CUDA, and HIP
3. Understand how high-performance numerical libraries work, and use them appropriately
4. Debug and profile code in a parallel environment by using parallel debuggers and profilers
5. Use containers and deep learning frameworks on massively parallel computers

Keywords

Vectorization, Shared memory parallelism, Distributed memory parallelism, GPU programming, Python libraries, Matrix multiplication, Linear solvers, Parallel debuggers, Parallel profilers, Containers, Deep learning

Competencies that will be developed

Specialist skills, Intercultural skills, Communication skills, Critical thinking skills, Practical and/or problem-solving skills

Class flow

Lectures will be given online.
Sample code will be prepared for each lecture, and exercises will be performed on TSUBAME.

Course schedule/Required learning

Class | Course schedule | Required learning
Class 1 | Introduction to parallel programming | Learn the basic concepts of parallel programming
Class 2 | Shared memory parallelization | Use OpenMP to achieve shared memory parallelization
Class 3 | Distributed memory parallelization | Use MPI to achieve distributed memory parallelization
Class 4 | SIMD parallelization | Use SSE, AVX, and AVX-512 to achieve SIMD vectorization
Class 5 | GPU programming | Use OpenACC, CUDA, and HIP to program GPUs
Class 6 | Parallel programming models | Use advanced parallel programming models such as StarPU, OmpSs, and Legion
Class 7 | Cache blocking | Practice cache blocking, using BLISLAB and cuBLAS as examples
Class 8 | High performance Python | Understand how NumPy, CuPy, and other libraries can be used to accelerate Python code
Class 9 | I/O libraries | Use NetCDF, HDF5, and MPI-IO to read and write data on large parallel file systems
Class 10 | Parallel debugger | Use CUDA-GDB, Valgrind, and TotalView to debug parallel code
Class 11 | Parallel profiler | Use gprof, VTune, PAPI, TAU, and Vampir to profile parallel code
Class 12 | Containers | Use Singularity with Docker images to build container environments
Class 13 | Scientific computing | Learn how to discretize partial differential equations and parallelize the solution of the resulting system of equations
Class 14 | Deep learning | Use PyTorch to train a large neural network on a parallel computer

Textbook(s)

None

Reference books, course materials, etc.

None

Assessment criteria and methods

Evaluation is based on written reports (40%) and a final report (60%).

Related courses

  • Numerical Analysis
  • Basic Application of Computing and Mathematical Sciences

Prerequisites (i.e., required knowledge, skills, courses, etc.)

None

Other

The Zoom link will be sent to registered students one day before the first lecture.