This course introduces basic knowledge of machine learning and data mining.
[Goal] The goal of this course is to learn the basic concepts and methods (such as classification, association, clustering, and numeric prediction) for mining large volumes of real-world data. The course explains these concepts and methods and introduces software tools that support understanding.
[Theme] The course mainly covers the following themes: input and output data formats for machine learning, machine learning algorithms, evaluation methods for learning algorithms, and methods for handling missing or noisy real-world data.
attribute, instance, bias, overfitting, missing value, exception, supervised learning, unsupervised learning, decision tree, information gain, pruning, classification, naïve Bayes, association rule, Apriori algorithm, numeric prediction, regression, instance-based learning, clustering, k-means algorithm, hierarchical clustering, support, confidence, cross-validation, bootstrap, significance test, confusion matrix, ROC curve, MDL principle, support vector machine, EM algorithm
|Intercultural skills||Communication skills||Specialist skills||Critical thinking skills||Practical and/or problem-solving skills|
Tools for machine learning are explained in the lectures to promote understanding.
|Course schedule||Required learning|
|Class 1||introduction||machine learning and data mining, and a tool (Weka)|
|Class 2||concept description, applications of machine learning||Representations and applications of machine learning|
|Class 3||concept space, biases||biases of each representation|
|Class 4||input data format, classification, association||input format of classification and association|
|Class 5||clustering, numeric prediction||input format of clustering and numeric prediction|
|Class 6||attribute types and their transformations||characteristics of attributes for representation|
|Class 7||knowledge representation, decision tree, classification rule||representation of decision tree and classification rule|
|Class 8||association rule, instance-based representation||representation of association rule and instance-based learning|
|Class 9||basic learning algorithm, naïve Bayes||naïve Bayes learning|
|Class 10||decision tree, information gain, gain ratio||decision tree learning|
|Class 11||covering algorithm, rule and decision tree||decision rule learning and comparison with decision tree|
|Class 12||evaluation of learning methods, cross-validation||evaluation of learning and usage of data|
|Class 13||t-statistic, minimum description length||comparison of machine learning methods|
|Class 14||ROC curve, recall, precision||evaluation metrics for learning methods|
|Class 15||support vector machine, EM algorithm||SVM and EM algorithm|
Data Mining: Practical Machine Learning Tools and Techniques (Third Edition)
I. H. Witten, E. Frank, M. A. Hall, Morgan Kaufmann, 2011.
Specified in the class.
Course scores are based on assignments (70%) and quizzes (30%).