In recent years, many fields have come to demand new high-performance computing techniques for processing large amounts of data, because data volumes are increasing rapidly. To meet this demand, researchers have developed new computational models, algorithms, and highly productive software design methods that make effective use of modern computer systems with high-performance hardware.
This course introduces students to topics at the frontier of high-performance computing, including its concepts, computational models, and techniques. More specifically, it introduces new algorithms and systems that use modern high-performance hardware from two viewpoints: a macro viewpoint, covering cloud storage and the MapReduce framework, and a micro viewpoint, covering concurrent data management for multicore processors, such as lock-free algorithms and transactional memory.
At the end of this course, students will:
- Understand the organization of modern computer systems with high-performance hardware,
- Understand the concepts and techniques of new parallel processing models and algorithms to use these systems,
- Understand highly abstracted parallel computing environments for cloud computing, such as cloud storage and the MapReduce framework,
- Understand advanced algorithms for high-performance computing and concurrent data access methods on modern multicore processors, and
- Be able to apply the computational models and algorithms they have learned to emerging software, both in the IT field and in the many other fields that must handle large amounts of data and/or require high-performance computing.
cloud computing, cloud storage, MapReduce framework, cache-conscious algorithm, concurrent data access, lock-free algorithm, transactional memory
|Intercultural skills||Communication skills||✔ Specialist skills||Critical thinking skills||Practical and/or problem-solving skills|
After each class, students must thoroughly review the subjects listed under Required learning and study related topics on their own.
|Course schedule||Required learning|
|Class 1||Large data processing by cloud computing||Understand the present situation of cloud services.|
|Class 2||Data model and consistency model of key-value store||Understand the properties of key-value stores.|
|Class 3||Data distribution and high availability of cloud storage||Understand the data distribution and high availability in cloud storage.|
|Class 4||Organization of cloud storage||Understand distributed algorithms used for cloud storage and their purposes.|
|Class 5||MapReduce framework and its computational model||Understand the roles and restrictions of each phase in the MapReduce framework.|
|Class 6||Large text processing algorithms for MapReduce||Understand an algorithm to build an inverted index with MapReduce.|
|Class 7||Large graph processing algorithms for MapReduce||Understand how to calculate PageRank with MapReduce.|
|Class 8||Memory hierarchy and high-performance computing||Understand the relationship between data access and its effect on the cache memory.|
|Class 9||Cache-conscious search algorithms||Understand the relationship between cache-conscious search algorithms and cache memory.|
|Class 10||Concurrent threads and synchronization||Understand basic concurrent data access methods with locking.|
|Class 11||The formal model for concurrent threads||Understand the definition of linearizability.|
|Class 12||Concurrent data access: lock-free algorithms||Understand concurrent data access methods without locking.|
|Class 13||Concurrent data access: software transactional memory||Understand the principle of software transactional memory.|
|Class 14||Concurrent data access: hardware transactional memory||Understand the mechanisms and limitations of best-effort hardware transactional memory.|
None required. Handouts used in class can be found on OCW-i.
J. Lin, C. Dyer, "Data-Intensive Text Processing with MapReduce", Morgan & Claypool Publishers
T. Harris, J. Larus, R. Rajwar, "Transactional Memory", 2nd edition, Morgan & Claypool Publishers
Students will be assessed on their understanding of the principles of cloud computing and its parallel processing, the relationship between the memory hierarchy and algorithms, and the basics of concurrent data access. Course scores are based on a midterm assignment (50%) and a term-end assignment (50%).
Having the following knowledge is desirable:
- Distributed algorithms
- Concurrent programming
- Computer architecture