In today's society, it is essential in every field to make appropriate use of "big data" for finding rules and for making predictions and decisions. This course provides the fundamental knowledge and basic skills for handling large-scale data sets with the aid of computers.
Students will be able to apply basic knowledge of statistical machine learning to analyze data and to evaluate the obtained results mathematically.
classification, regression, clustering, dimensionality reduction, training/generalization errors, model selection
|✔ Specialist skills||Intercultural skills||Communication skills||Critical thinking skills||✔ Practical and/or problem-solving skills|
These classes will be held online (using Zoom or similar tools). Classes and exercises are held alternately.
|Course schedule||Required learning|
|Class 1||Class guidance||Get an overview of the mathematical and statistical knowledge used in data science.|
|Class 2||Exercise: basic Python exercises||Understand the general idea of computation in Python, and acquire the basic knowledge needed for this course.|
|Class 3||Fundamentals of data analysis||Understand the outline of statistics and data science.|
|Class 4||Exercise: computing environment and programming warm-up||Learn how to use Python libraries and Google Colaboratory for data mining.|
|Class 5||Classification and model selection||Understand a simple mechanism for generating classification rules. Learn the ideas of training/generalization error and model selection methods.|
|Class 6||Exercise: Classification and model selection||Understand the basic idea of the decision tree and how to use it.|
|Class 7||Clustering||Understand the idea of unsupervised learning and clustering algorithms.|
|Class 8||Exercise: Clustering||Understand the mechanism of clustering and how to apply it to sample data.|
|Class 9||Principal component analysis||Understand the mechanism of principal component analysis and its mathematical background.|
|Class 10||Exercise: Principal component analysis||Learn how to apply principal component analysis to sample data.|
|Class 11||Dimensionality reduction||Learn dimensionality reduction methods that map high-dimensional data into a low-dimensional space.|
|Class 12||Exercise: dimensionality reduction||Learn how to apply dimensionality reduction methods to sample data.|
|Class 13||Ensemble learning||Understand the mechanism of ensemble learning and its major methods.|
|Class 14||Exercise: Ensemble learning||Learn how to apply ensemble learning methods to sample data.|
|Class 15||General discussion||Choose an appropriate method for analyzing more practical (but still basic) data mining problems, and evaluate the obtained analysis.|
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Distributed via OCW-i
Based on reports for classes/exercises and on final assignments.
Students take a prerequisite exam on "linear algebra", "analysis", and "basic grammar and functions of Python 3" in the first class.
Questions can be sent by email (at any time).
Only TAC-MI students can register for this course in 2020. This class duplicates XCO.T487: fundamentals of data science and XCO.T488: exercises of fundamentals of data science. Students are recommended to also take XCO.T489: fundamentals of artificial intelligence and XCO.T490: exercises of fundamentals of artificial intelligence.
Students are required to obtain a Google account and to be ready to use the file upload/download functions in Google Drive.