Data engineering is an active research area that focuses on the sophisticated processing of large amounts of varied data in computer systems, such as advanced database processing.
This course teaches advanced methodologies and mechanisms for manipulating large amounts of data efficiently. Students study contemporary data engineering technologies, including application examples, data structures, indexing, processing algorithms, and parallel processing methods for highly functional, high-speed processing of large data volumes.
By the end of this course, students will be able to:
1) Understand the basic concepts of data engineering and its foundations: relational databases and transaction processing
2) Understand data warehouse technologies as a typical application of data engineering
3) Understand the data structures and algorithms of OLAP and data mining executed in the data warehouse
4) Understand implementation algorithms and costs of relational database operations for the data warehouse
5) Understand parallelization approaches for high-speed relational database operations
6) Understand skew handling methods for parallel database operations
7) Understand distributed database processing, including databases in the cloud
8) Understand recent trends in XML/RDF databases
Data Warehousing, OLAP, Data Mining, Indexing Methods, Parallel Database Operations, Data Placement, Skew Handling, Cloud Database, XML/RDF databases
✔ Specialist skills | Intercultural skills | Communication skills | Critical thinking skills | ✔ Practical and/or problem-solving skills |
Standard Lecture
Class | Course schedule | Required learning |
---|---|---|
Class 1 | Basic Concept and Background of Data Engineering | Understand the basic concept of data engineering |
Class 2 | Relational Database and Transaction Processing | Understand Relational Databases and Transaction Processing |
Class 3 | Data Warehouse, OLAP, and Data Mining | Understand Data Warehouse, OLAP, and Data Mining |
Class 4 | Storing Data | Understand Data Storage |
Class 5 | Indexing | Understand Indexing |
Class 6 | Estimating the Cost of Relational Algebra Operations 1: Selection, Projection | Understand Algorithms and Costs of Selection and Projection Operations |
Class 7 | Estimating the Cost of Relational Algebra Operations 2: Join, Aggregate Functions | Understand Algorithms and Costs of Join Operations and Aggregate Functions |
Class 8 | Classification of Parallel Database Operations and Data Partitioning | Understand Classification of Parallel Database Processing and Data Distribution |
Class 9 | Parallel Join Operations: Sort Merge Join, Hash Join | Understand Algorithms and Costs of Parallel Sort Merge Join and Hash Join |
Class 10 | Parallel Aggregate Functions, Skew Handling | Understand Algorithms and Costs of Parallel Aggregate Functions and Skew Handling |
Class 11 | Distributed Database Processing and Blockchain | Understand Distributed Database Processing and Blockchain |
Class 12 | Cloud and Databases | Understand Database Processing in Cloud Environments |
Class 13 | XML Databases | Understand XML Databases and RDF Databases |
Class 14 | Privacy and Security of Database | Understand Privacy and Security |
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Lecture materials are distributed through OCW/OCW-i.
Jim Gray and Andreas Reuter, "Transaction Processing: Concepts and Techniques," Morgan Kaufmann Publishers
Assignments in Lectures (60%) and Final Report (40%)
Basic knowledge of databases and computer architecture
yokota[at]cs.titech.ac.jp