This course covers the fundamentals of probability theory and statistics for data science. Topics in probability theory include discrete and continuous random variables, probability rules, expected value, correlation, important probability distributions, and the definition and use of moment generating functions. The course covers both frequentist statistics (sampling distributions, confidence intervals, significance testing) and Bayesian statistics (Bayes' theorem, Bayesian analysis, credibility intervals). It also addresses the fundamentals of statistical learning theory. The goal is for students to acquire a solid statistical background that enables them to choose appropriate statistical methods and tools and to analyze data scientifically. To this end, the course includes many real-world examples from engineering and the sciences.
After successfully completing this course, students will:
(1) understand the statistical fundamentals of data science;
(2) be able to analyze data scientifically;
(3) be able to communicate analytical results in an interdisciplinary environment.
Bayes' theorem; Bayesian hypothesis test; Beta function; binomial probability distribution; conditional probability; confidence interval; credibility interval; data science; expected value; Gamma function; hypothesis testing; joint probability; marginal probability; naive Bayes classifier; probability distribution; power; p-value; random variable; sampling distribution; significance testing.
✔ Specialist skills | Intercultural skills | Communication skills | Critical thinking skills | ✔ Practical and/or problem-solving skills |
Classes usually begin with a real-world example to motivate a statistical concept. This concept is then formally described, and mathematical proofs are given where appropriate. Then, we will solve the real-world problem together.
Class | Course schedule | Required learning |
---|---|---|
Class 1 | Course overview; fundamentals of data science; covariance, correlation, regression | None. |
Class 2 | Conditional probability; Bayes' theorem | Revise contents of previous class; complete assignment |
Class 3 | Discrete random variables; expected value; moment generating functions | Revise contents of previous class; complete assignment |
Class 4 | Gamma function; binomial probability distributions | Revise contents of previous class; complete assignment |
Class 5 | Continuous random variables; continuous probability distribution; normal distributions | Revise contents of previous class; complete assignment |
Class 6 | Distribution of functions of random variables; sampling distributions | Revise contents of previous class; complete assignment |
Class 7 | Point estimates and confidence intervals | Revise contents of previous class; complete assignment |
Class 8 | Student's t-distribution | Revise contents of previous class; complete assignment |
Class 9 | Significance testing; p-value | Revise contents of previous class; complete assignment |
Class 10 | Significance testing | Revise contents of previous class; complete assignment |
Class 11 | Beta function; Bayesian analysis | Revise contents of previous class; complete assignment |
Class 12 | Bayesian data analysis [1/2] | Revise contents of previous class; complete assignment |
Class 13 | Bayesian data analysis [2/2] | Revise contents of previous class; complete assignment |
Class 14 | Fundamentals of statistical learning theory | Revise contents of previous class; complete assignment |
None required. Course materials are provided during class.
Bertsekas D.P. and Tsitsiklis J.N. (2008) Introduction to Probability, 2nd edition. Athena Scientific.
Kruschke J. (2014) Doing Bayesian Data Analysis, 2nd edition. Academic Press.
Students' course grades will be based on the final exam.
Knowledge of elementary algebra and calculus is required.