2020 Speech Information Technology

Academic unit or major: Graduate major in Information and Communications Engineering
Instructor(s): Shinozaki Takahiro
Class format: Lecture (Zoom)
Day/Period (Room No.): Tue 1-2 (G224), Fri 1-2 (G224)

Course description and aims

This course focuses on automatic speech recognition and synthesis. Topics include signal processing, human interfaces, and statistical models. Statistical models are key components of today's speech processing technology; in particular, hidden Markov models, graphical models, N-grams, weighted finite state transducers, and artificial neural networks are explained in detail. By combining lectures and exercises, the course enables students to understand and acquire the fundamentals of up-to-date speech processing techniques.
Speech communication is natural to us humans, and we speak and listen to a great many utterances in our daily lives. It is so easy that we rarely consider how we do it. From an engineering point of view, however, it requires highly complicated processing. Even today's state-of-the-art systems have limited performance compared to humans, although, building on accumulated research efforts, some recent systems achieve comparable performance under specific conditions. Through this course, students will learn how sophisticated our speech communication is. At the same time, students will gain concrete ideas about how to tackle this challenge, using the techniques learned in the course as clues.

Student learning outcomes

By the end of this course, students will be able to:
1) Build speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanisms of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate some basic signal processing problems for speech signals

Keywords

speech recognition, speech synthesis, speech enhancement, human interface, speech production mechanism, auditory mechanism, hidden Markov model, N-gram, weighted finite state transducer, graphical model, Bayesian inference, artificial neural network, adaptation

Competencies that will be developed

Specialist skills, Intercultural skills, Communication skills, Critical thinking skills, Practical and/or problem-solving skills

Class flow

Students will work on exercises that review the lectures and submit their answers as reports through OCW. Students are expected to prepare for each class by checking the course schedule.

Course schedule/Required learning

Class 1: Speech and human interface
  Required learning: Explain human interfaces that use speech.
Class 2: Speech production mechanism, auditory mechanism, and phonology
  Required learning: Explain how human speech communication is realized.
Class 3: Waveform coding and speech signal analysis
  Required learning: Explain techniques for representing and analyzing sound signals.
Class 4: Parametric representation of speech signals
  Required learning: Explain parametric representation methods for speech signals.
Class 5: Principle of speech recognition
  Required learning: Explain the principle of statistical automatic speech recognition.
Class 6: Hidden Markov model
  Required learning: Explain the definition of the hidden Markov model and its training and inference algorithms.
Class 7: Word network and N-gram
  Required learning: Explain probabilistic models for word sequences.
Class 8: Weighted finite state transducer
  Required learning: Explain weighted finite state transducers, which can express various probabilistic models in a systematic manner.
Class 9: Graphical model
  Required learning: Explain graphical models, which give diagrammatic representations of probability distributions.
Class 10: Bayesian inference
  Required learning: Explain Bayesian inference and its application to some problems.
Class 11: Artificial neural network
  Required learning: Explain the structure of artificial neural networks and their learning and inference algorithms.
Class 12: Statistical speech synthesis
  Required learning: Explain speech synthesis techniques based on statistical modeling.
Class 13: Adaptation techniques
  Required learning: Explain adaptation techniques that address variations due to speakers and environments.
Class 14: Speech enhancement and its application
  Required learning: Explain noise reduction techniques and their application to noise-robust speech recognition.

Out-of-Class Study Time (Preparation and Review)

To learn effectively, students are encouraged to spend approximately 100 minutes preparing for each class and another 100 minutes reviewing the class content afterwards (including assignments).
They should do so by referring to textbooks and other course materials.

Textbook(s)

Handouts are distributed.

Reference books, course materials, etc.

C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167

Assessment criteria and methods

Students' understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in them is evaluated.
Reports account for 40% of the grade and the final exam for 60%.
If circumstances do not permit students to come to the university, the final exam may be replaced with a final report, in which case the evaluation is based solely on reports.

Related courses

  • ICT.H410 : Computational Linguistics
  • ICT.H416 : Statistical Theories for Brain and Parallel Computing
  • ICT.H508 : Language Engineering

Prerequisites (i.e., required knowledge, skills, courses, etc.)

Students are required to have knowledge equivalent to the following courses:
  • LAS.M102 : Linear Algebra I / Recitation
  • LAS.M101 : Calculus I / Recitation
  • ICT.S206 : Signal and System Analysis
