This lecture covers recent technological advances and the fundamental theory of speech recognition, synthesis, and understanding. In particular, it develops an understanding of modern probabilistic and statistical models, including neural networks, and explains in detail how they are applied in speech information processing. The course ranges from the basic principles of speech recognition and synthesis to recent research topics such as graphical models, Markov decision processes, and advanced concepts in reinforcement learning. It also explores how these technologies are integrated into human interfaces and other application areas.
Speech communication is a fundamental activity in our daily lives, and the engineering effort to mimic it requires complex information processing technologies. Modern speech information processing technologies have the potential to match or even surpass human capabilities. Through this lecture, students will experience the complexity of human speech communication and the excitement of attempting to engineer its replication.
By the end of this course, students will be able to:
1) Build speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanism of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate basic signal processing problems for speech signals
speech recognition, speech synthesis, human interface, speech production mechanism, auditory mechanism, acoustic model, language model, graphical model, Bayesian inference, artificial neural network, spoken language acquisition
✔ Specialist skills | Intercultural skills | Communication skills | Critical thinking skills | ✔ Practical and/or problem-solving skills |
Students will work on exercises to review the lectures and submit answer reports through T2SCHOLA. Students are expected to prepare for each class by checking the course schedule.
 | Course schedule | Required learning |
---|---|---|
Class 1 | Speech Communication and Speech Interface Systems | Explain the basic concepts and historical development of speech communication and speech interface systems |
Class 2 | Signal Analysis | Explain the fundamental principles of sampling, Linear Time-Invariant (LTI) systems, Fourier Transform, and Z-Transform, and their applications to speech signals |
Class 3 | Parametric and Non-parametric Speech Representations | Explain parametric and non-parametric representations of speech, including LPC, PARCOR, the concatenated acoustic tube model, and the cepstrum |
Class 4 | Basics of Probability Distributions | Explain the basics of probability distributions |
Class 5 | Principles of Speech Recognition and Synthesis | Explain the fundamental principles of speech recognition and synthesis and the main technical challenges in their implementation |
Class 6 | Graphical Models | Explain the basic concepts of graphical models, including Bayesian networks, factor graphs, and d-separation |
Class 7 | Markov and Hidden Markov models | Explain the definitions of Markov models and Hidden Markov models and how they are applied in speech and language information processing |
Class 8 | Viterbi and Message-Passing Algorithms | Explain the principles of the Viterbi algorithm and message-passing algorithms for efficient computation on HMMs and graphical models |
Class 9 | Bayesian Estimation | Explain the basics of Bayesian inference, including conjugate priors, variational Bayes, and sampling |
Class 10 | Basics of Neural Networks | Explain the basic structures and functions of multilayer perceptrons, feedforward networks, and recurrent networks |
Class 11 | Neural Network-based Speech Recognition | Explain the fundamental principles and recent advances in neural network-based speech recognition |
Class 12 | Neural Network-based Speech Synthesis | Explain the fundamental principles and recent advances in neural network-based speech synthesis |
Class 13 | Markov Decision Processes and Reinforcement Learning | Explain the basic concepts of Markov decision processes and reinforcement learning |
Class 14 | Dialogue Systems and Spoken Language Acquisition | Explain the fundamental principles of dialogue systems and spoken language acquisition |
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Handouts will be distributed.
C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167
Students are evaluated on their understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in these areas.
Reports account for 40% of the grade and the final exam for 60%.
Students are required to have knowledge equivalent to the content of the following courses.
LAS.M102 : Linear Algebra I / Recitation
LAS.M101 : Calculus I / Recitation
ICT.S206 : Signal and System Analysis