2024 Speech Information Technology

Academic unit or major
Graduate major in Information and Communications Engineering
Instructor(s)
Shinozaki Takahiro 
Class Format
Lecture (Face-to-face)
Media-enhanced courses
Day/Period(Room No.)
Tue1-2(G1-103 (G114))  Fri1-2(G1-103 (G114))  
Group
-
Course number
ICT.H503
Credits
2
Academic year
2024
Offered quarter
1Q
Syllabus updated
2024/3/14
Lecture notes updated
-
Language used
English

Course description and aims

This course covers the fundamental theories and recent technological advances in speech recognition, synthesis, and understanding. It focuses on modern probabilistic and statistical models, including neural networks, and explains in detail how they are applied to speech information processing. The topics range from the basic principles of speech recognition and synthesis to recent research topics such as graphical models, Markov decision processes, and reinforcement learning. We will also explore how these technologies are integrated into human interfaces and other application areas.

Speech communication is a fundamental activity in our daily lives, and the engineering effort to mimic it requires complex information processing technologies. Modern speech information processing technologies have the potential to match or even surpass human capabilities. Through this lecture, students will experience the complexity of human speech communication and the excitement of attempting to engineer its replication.

Student learning outcomes

By the end of this course, students will be able to:
1) Build speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanisms of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate basic signal processing problems for speech signals

Keywords

speech recognition, speech synthesis, human interface, speech production mechanism, auditory mechanism, acoustic model, language model, graphical model, Bayesian inference, artificial neural network, spoken language acquisition

Competencies that will be developed

Specialist skills Intercultural skills Communication skills Critical thinking skills Practical and/or problem-solving skills

Class flow

Students will work on exercises to review the lectures and submit answer reports through T2SCHOLA. Students are expected to prepare for each class by checking the course schedule.

Course schedule/Required learning

| Class | Course schedule | Required learning |
|---|---|---|
| Class 1 | Speech Communication and Speech Interface Systems | Explain the basic concepts and historical development of speech communication and speech interface systems |
| Class 2 | Signal Analysis | Explain the fundamental principles of sampling, Linear Time-Invariant (LTI) systems, Fourier Transform, and Z-Transform, and their applications to speech signals |
| Class 3 | Parametric and Non-parametric Speech Representations | Explain the parametric and non-parametric representations of speech, including LPC, PARCOR, concatenated acoustic tube model, and cepstrum |
| Class 4 | Basics of Probability Distributions | Explain the basics of probability distributions |
| Class 5 | Principles of Speech Recognition and Synthesis | Explain the fundamental principles of speech recognition and synthesis and the main technical challenges in their implementation |
| Class 6 | Graphical Models | Explain the basic concepts of graphical models, including Bayesian networks, factor graphs, and d-separation |
| Class 7 | Markov and Hidden Markov Models | Explain the definitions of Markov models and Hidden Markov models and how they are applied in speech and language information processing |
| Class 8 | Viterbi and Message-Passing Algorithms | Explain the principles of the Viterbi algorithm and message-passing algorithms for efficient computation on HMMs and graphical models |
| Class 9 | Bayesian Estimation | Explain the basics of Bayesian inference, including conjugate priors, variational Bayes, and sampling |
| Class 10 | Basics of Neural Networks | Explain the basic structures and functions of multilayer perceptrons, feedforward networks, and recurrent networks |
| Class 11 | Neural Network-based Speech Recognition | Explain the fundamental principles and recent advances in neural network-based speech recognition |
| Class 12 | Neural Network-based Speech Synthesis | Explain the fundamental principles and recent advances in neural network-based speech synthesis |
| Class 13 | Markov Decision Processes and Reinforcement Learning | Explain the basic concepts of Markov decision processes and reinforcement learning |
| Class 14 | Dialogue Systems and Spoken Language Acquisition | Explain the fundamental principles of dialogue systems and spoken language acquisition |

Out-of-Class Study Time (Preparation and Review)

To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.

Textbook(s)

Handouts will be distributed.

Reference books, course materials, etc.

C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167

Assessment criteria and methods

Students are evaluated on their understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in these areas.
Reports account for 40% of the grade and the final exam for 60%.

Related courses

  • ICT.H410 : Computational Linguistics
  • ICT.H416 : Statistical Theories for Brain and Parallel Computing
  • ICT.H508 : Language Engineering

Prerequisites (i.e., required knowledge, skills, courses, etc.)

Students are required to have knowledge equivalent to the following courses.
LAS.M102 : Linear Algebra I / Recitation
LAS.M101 : Calculus I / Recitation
ICT.S206 : Signal and System Analysis
