This course focuses on automatic speech recognition and synthesis. Topics include signal processing, human interfaces, and statistical models. Statistical models are key components of today's speech processing technology. In particular, hidden Markov models, graphical models, N-grams, weighted finite state transducers, and artificial neural networks are explained in detail. By combining lectures and exercises, the course enables students to understand and acquire the fundamentals of up-to-date speech processing techniques.
Speech communication comes naturally to humans: we speak and listen to many utterances every day, so easily that we rarely consider how we do it. From an engineering point of view, however, it requires highly complicated processing. Even today’s state-of-the-art systems have limited performance compared to humans, although accumulated research effort has recently produced systems that achieve comparable performance under some specific conditions. Through this course, students will learn how sophisticated human speech communication is. At the same time, they will gain concrete ideas about how to tackle it, using the techniques they learn as clues.
By the end of this course, students will be able to:
1) Make speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanism of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate some basic problems of signal processing for speech signals
speech recognition, speech synthesis, speech enhancement, human interface, speech production mechanism, auditory mechanism, hidden Markov model, N-gram, weighted finite state transducer, graphical model, Bayesian inference, artificial neural network, adaptation
✔ Specialist skills | Intercultural skills | Communication skills | Critical thinking skills | ✔ Practical and/or problem-solving skills |
At the beginning of each class, answers to the exercises from the previous class are given. At the end of each class, students work on exercises related to that day's lecture. Students are expected to prepare for each class by checking the course schedule. Reviewing the contents of each class is also very important.
Class | Course schedule | Required learning |
---|---|---|
Class 1 | Speech and human interface | Explain human interfaces that use speech |
Class 2 | Speech production mechanism, auditory mechanism, and phonology | Explain how human speech communication is realized |
Class 3 | Speech signal processing and parametric representation | Explain how to extract speech features from speech waveforms |
Class 4 | Principle of speech recognition | Explain the principle of statistical automatic speech recognition |
Class 5 | Hidden Markov model | Explain the definition of the hidden Markov model and its training and inference algorithms |
Class 6 | Word network and N-gram | Explain probabilistic models for word sequence modeling |
Class 7 | Weighted finite state transducer | Explain weighted finite state transducers that can express various probabilistic models in a systematic manner |
Class 8 | Graphical model | Explain graphical models, which give diagrammatic representations of probability distributions |
Class 9 | Bayesian inference | Explain Bayesian inference and its application to some specific problems |
Class 10 | Artificial neural network | Explain the structure of artificial neural networks and their learning and inference algorithms |
Class 11 | Adaptation techniques | Explain adaptation techniques that compensate for variations due to speakers and environments |
Class 12 | Speech enhancement and its application | Explain noise reduction techniques for degraded speech and their application to noise-robust speech recognition |
Class 13 | Basics of speech synthesis | Understand how to generate speech waveforms from acoustic features |
Class 14 | Statistical speech synthesis | Explain text-to-speech synthesis techniques based on a statistical modeling approach |
Class 15 | Speech coding | Explain techniques for efficiently representing and restoring speech signals from the viewpoint of data compression |
Handouts are distributed in class.
C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167
Students are evaluated on their understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in them.
The report counts for 40% of the grade and the final exam for 60%.
Students are expected to have knowledge equivalent to the following courses.
LAS.M102 : Linear Algebra I / Recitation
LAS.M101 : Calculus I / Recitation
ICT.S206 : Signal and System Analysis