2018 | High Performance Scientific Computing





Academic unit or major: Graduate major in Computer Science

Instructor(s): Yokota Rio

Class Format: Lecture

Media-enhanced courses

Day/Period(Room No.): Mon1-2(W831) Thr1-2(W831)

Group: -

Course number: CSC.T526

Credits: 2

Academic year: 2018

Offered quarter: 1Q

Syllabus updated: 2018/3/20

Lecture notes updated: 2018/5/31

Language used: English


Course description and aims

This course will equip students with the knowledge and skills needed to develop fast algorithms and implement them in a massively parallel fashion on modern supercomputers, using parallel programming techniques such as SIMD, OpenMP, MPI, and CUDA. The course will cover how to use various linear algebra libraries for parallel execution on both CPUs and GPUs. Tutorials on using debuggers and profilers in a massively parallel environment will also be given. Performance primitives such as MapReduce and graph partitioning tools will be demonstrated, along with tips on running deep learning frameworks on large GPU supercomputers.

Student learning outcomes

By the end of this course, students will be able to
1. Use SIMD vectorization, shared memory parallelization via OpenMP, and distributed memory parallelization via MPI
2. Program GPUs using OpenACC, CUDA, and OpenCL
3. Understand how high-performance numerical libraries work, and use them appropriately
4. Debug and profile code in a parallel environment by using parallel debuggers and profilers
5. Use performance primitives such as ModernGPU and MapReduce to achieve high performance with minimal effort
6. Use graph partitioning tools and deep learning frameworks on massively parallel computers
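Outcome 1 refers to shared memory parallelization via OpenMP. The following is a minimal sketch, not taken from the course materials, of the most common OpenMP pattern: a loop whose iterations are split across threads, with a `reduction` clause combining the per-thread partial sums. The function name `dot` is hypothetical.

```c
#include <stddef.h>

/* Dot product of two length-n vectors, parallelized with an OpenMP
   reduction. Each thread accumulates a private copy of `sum`; OpenMP
   adds the private copies together when the loop ends. Compiled without
   -fopenmp, the pragma is ignored and the loop simply runs serially. */
double dot(const double *x, const double *y, size_t n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Compiled with `gcc -fopenmp`, the loop runs on the number of threads given by the `OMP_NUM_THREADS` environment variable. Because floating-point addition is not associative, the parallel result may differ from the serial one in the last bits for non-integer data.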

Keywords

Vectorization, Shared memory parallelism, Distributed memory parallelism, GPU programming, Numerical libraries, Matrix multiplication, Linear solvers, FFT, Parallel debuggers, Parallel profilers, Graph partitioning, Deep learning

Competencies that will be developed

✔ Specialist skills

Intercultural skills

Communication skills

✔ Critical thinking skills

✔ Practical and/or problem-solving skills

Class flow

From the second lecture onward, classes will proceed interactively, on the assumption that students have read that week's course materials beforehand.

Course schedule/Required learning

	Course schedule	Required learning
Class 1	How to use TSUBAME	Log in to Tokyo Tech's supercomputer TSUBAME and learn how to use libraries and the job scheduler
Class 2	Shared memory parallelization	Use pthreads and OpenMP to achieve shared memory parallelization
Class 3	Distributed memory parallelization	Use MPI to achieve distributed memory parallelization
Class 4	SIMD parallelization	Use SSE, AVX, and AVX-512 to achieve SIMD vectorization
Class 5	GPU programming	Use OpenACC, CUDA, and OpenCL to program GPUs
Class 6	Multi-GPU programming	Combine CUDA and MPI to use multiple GPUs on TSUBAME
Class 7	Cache blocking	Use BLISLAB and cuBLAS as examples to practice cache blocking
Class 8	Numerical libraries	Understand how LAPACK, ScaLAPACK, and FFTW work, and learn to use them appropriately
Class 9	Fast linear solvers	Understand how to choose the appropriate solvers in PETSc and Trilinos
Class 10	I/O libraries	Use NetCDF, HDF5, and MPI-IO to read and write data on large parallel file systems
Class 11	Parallel debugger	Use CUDA-GDB, Valgrind, and TotalView to debug parallel code
Class 12	Parallel profiler	Use gprof, VTune, PAPI, TAU, and Vampir to profile parallel code
Class 13	Performance primitives	Learn how to use performance primitives such as ModernGPU and MapReduce
Class 14	Graph partitioning	Use METIS and ParMETIS to partition a large graph in parallel
Class 15	Deep Learning	Use ChainerMN to train a large neural network on a parallel computer
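Class 7 practices cache blocking using BLISLAB and cuBLAS. Independently of those libraries, the idea can be sketched in a few lines of C: matrix multiplication is performed tile by tile, so that each tile of A, B, and C is reused while it is still in cache instead of being streamed from memory on every pass. The block size `BS = 32` and the function names `matmul_blocked` and `matmul_naive` are assumptions for illustration, not the API of BLISLAB or any other library.

```c
#include <stddef.h>

/* Hypothetical block size; in practice it is tuned so that the tiles
   currently being worked on fit in cache. */
#define BS 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for row-major n x n matrices, computed one BS x BS tile
   at a time. min_sz() handles n that is not a multiple of BS. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* Multiply tile (ii,kk) of A by tile (kk,jj) of B,
                   accumulating into tile (ii,jj) of C. */
                for (size_t i = ii; i < min_sz(ii + BS, n); i++)
                    for (size_t k = kk; k < min_sz(kk + BS, n); k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + BS, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

/* Unblocked reference version, used to check the blocked one. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

The i-k-j loop order inside each tile keeps the innermost accesses to B and C at unit stride. Production libraries such as BLIS go further, packing tiles into contiguous buffers and calling a hand-tuned SIMD micro-kernel.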

Textbook(s)

None

Reference books, course materials, etc.

None

Assessment criteria and methods

Evaluation is based on written reports (40%) and a final report (60%).

Related courses

Numerical Analysis
Basic Application of Computing and Mathematical Sciences

Prerequisites (i.e., required knowledge, skills, courses, etc.)

None

TOKYO INSTITUTE OF TECHNOLOGY