### 2020 Statistics for Data Science

• Academic unit or major: School of Engineering
• Instructor(s): Berrar Daniel
• Course component(s): Lecture
• Mode of instruction: ZOOM
• Day/Period (Room No.): ()
• Group: A
• Course number: XEG.G301
• Credits: 2
• Academic year: 2020
• Offered quarter: 3Q
• Syllabus updated: 2020/6/7
• Lecture notes updated: -
• Language used: English

### Course description and aims

This course covers the fundamentals of probability theory and statistics for data science. Topics in probability theory include discrete and continuous random variables, probability rules, expected value, correlation, important probability distributions, and the definition and use of moment generating functions. The statistics part covers both frequentist methods (sampling distributions, confidence intervals, significance testing) and Bayesian methods (Bayes' theorem, Bayesian analysis, credibility intervals). The course also introduces the fundamentals of statistical learning theory. The goal is for students to acquire a solid statistical background that enables them to choose appropriate statistical methodologies and tools and to analyze data scientifically. To this end, the course includes many real-world examples from engineering and the sciences.

### Student learning outcomes

After successful completion of this course, the students will
(1) understand the statistical fundamentals of data science;
(2) be able to analyze data scientifically;
(3) be able to communicate analytical results in an interdisciplinary environment.

### Keywords

Bayes' theorem; Bayesian hypothesis test; Beta function; binomial probability distribution; conditional probability; confidence interval; credibility interval; data science; expected value; Gamma function; hypothesis testing; joint probability; marginal probability; naive Bayes classifier; probability distribution; power; p-value; random variable; sampling distribution; significance testing.

### Competencies that will be developed

• ✔ Specialist skills
• Intercultural skills
• Communication skills
• Critical thinking skills
• ✔ Practical and/or problem-solving skills

### Class flow

Classes usually begin with a real-world example to motivate a statistical concept. This concept is then formally described, and mathematical proofs are given where appropriate. Then, we will solve the real-world problem together.

### Course schedule/Required learning

| Class | Course schedule | Required learning |
|---|---|---|
| Class 1 | Course overview; fundamentals of data science; covariance, correlation, regression | None. |
| Class 2 | Conditional probability; Bayes' theorem | Revise contents of previous class; complete assignment |
| Class 3 | Discrete random variables; expected value; moment generating functions | Revise contents of previous class; complete assignment |
| Class 4 | Gamma function; binomial probability distributions | Revise contents of previous class; complete assignment |
| Class 5 | Continuous random variables; continuous probability distribution; normal distributions | Revise contents of previous class; complete assignment |
| Class 6 | Distribution of functions of random variables; sampling distributions | Revise contents of previous class; complete assignment |
| Class 7 | Point estimates and confidence intervals | Revise contents of previous class; complete assignment |
| Class 8 | Student's t-distribution | Revise contents of previous class; complete assignment |
| Class 9 | Significance testing; p-value | Revise contents of previous class; complete assignment |
| Class 10 | Superiority testing | Revise contents of previous class; complete assignment |
| Class 11 | Beta function; Bayesian analysis | Revise contents of previous class; complete assignment |
| Class 12 | Bayesian data analysis [1/2] | Revise contents of previous class; complete assignment |
| Class 13 | Bayesian data analysis [2/2] | Revise contents of previous class; complete assignment |
| Class 14 | Fundamentals of statistical learning theory | Revise contents of previous class; complete assignment |

### Textbook(s)

None required. Course materials are provided during class.

### Reference books, course materials, etc.

Bertsekas D.P. and Tsitsiklis J.N. (2008) Introduction to Probability. Athena Scientific, 2nd edition.

Kruschke J. (2014) Doing Bayesian Data Analysis. Academic Press, 2nd edition.

### Assessment criteria and methods

Students' course grades will be based on the final exam.

### Related courses

• XCO.T483: Advanced Artificial Intelligence and Data Science A
• IEE.A205: Statistics for Industrial Engineering and Economics
• ICT.M202: Probability and Statistics (ICT)
• XCO.T487: Fundamentals of data science

### Prerequisites (i.e., required knowledge, skills, courses, etc.)

Knowledge of elementary algebra and calculus is required.