2024 | Speech Information Technology

Academic unit or major: Graduate major in Information and Communications Engineering

Instructor(s): Shinozaki Takahiro

Class Format: Lecture (Face-to-face)

Media-enhanced courses

Day/Period(Room No.): Tue1-2(G1-103 (G114)) Fri1-2(G1-103 (G114))

Group: -

Course number: ICT.H503

Credits: 2

Academic year: 2024

Offered quarter: 1Q

Syllabus updated: 2024/3/14

Lecture notes updated: -

Language used: English

Syllabus

Course description and aims

In this lecture, we will focus on the fundamental theories and latest technological advancements in speech recognition, synthesis, and understanding. In particular, we will deepen our understanding of modern probabilistic and statistical models, including neural networks, and explain in detail how they are applied in speech information processing. Our coverage will range from the basic principles of speech recognition and synthesis to recent research topics such as graphical models, Markov decision processes, and advanced concepts in reinforcement learning. We will also explore how these technologies are integrated into human interfaces and other application areas.

Speech communication is a fundamental activity in our daily lives, and the engineering effort to mimic it requires complex information processing technologies. Modern speech information processing technologies have the potential to match or even surpass human capabilities. Through this lecture, students will experience the complexity of human speech communication and the excitement of attempting to engineer its replication.

Student learning outcomes

By the end of this course, students will be able to:
1) Build speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanism of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate some basic problems of signal processing for speech signals

Keywords

speech recognition, speech synthesis, human interface, speech production mechanism, auditory mechanism, acoustic model, language model, graphical model, Bayesian inference, artificial neural network, spoken language acquisition

Competencies that will be developed

✔ Specialist skills

Intercultural skills

Communication skills

Critical thinking skills

✔ Practical and/or problem-solving skills

Class flow

Students will work on exercises to review the lectures and submit answer reports through T2SCHOLA. Students are expected to prepare for each class by checking the course schedule.

Course schedule/Required learning

	Course schedule	Required learning
Class 1	Speech Communication and Speech Interface Systems	Explain the basic concepts and historical development of speech communication and speech interface systems
Class 2	Signal Analysis	Explain the fundamental principles of sampling, Linear Time-Invariant (LTI) systems, Fourier Transform, and Z-Transform, and their applications to speech signals
Class 3	Parametric and Non-Parametric Speech Representations	Explain the parametric and non-parametric representations of speech, including LPC, PARCOR, the concatenated acoustic tube model, and the cepstrum
Class 4	Basics of Probability Distributions	Explain the basics of probability distributions
Class 5	Principles of Speech Recognition and Synthesis	Explain the fundamental principles of speech recognition and synthesis and the main technical challenges in their implementation
Class 6	Graphical Models	Explain the basic concepts of graphical models, including Bayesian networks, factor graphs, and d-separation
Class 7	Markov and Hidden Markov models	Explain the definitions of Markov models and Hidden Markov models and how they are applied in speech and language information processing
Class 8	Viterbi and Message-Passing Algorithms	Explain the principles of the Viterbi algorithm and message-passing algorithms for efficient computation on HMMs and graphical models
Class 9	Bayesian Estimation	Explain the basics of Bayesian inference, including conjugate priors, variational Bayes, and sampling
Class 10	Basics of Neural Networks	Explain the basic structures and functions of multilayer perceptrons, feedforward networks, and recurrent networks
Class 11	Neural Network-based Speech Recognition	Explain the fundamental principles and recent advances in neural network-based speech recognition
Class 12	Neural Network-based Speech Synthesis	Explain the fundamental principles and recent advances in neural network-based speech synthesis
Class 13	Markov Decision Processes and Reinforcement Learning	Explain the basic concepts of Markov decision processes and reinforcement learning
Class 14	Dialogue Systems and Spoken Language Acquisition	Explain the fundamental principles of dialogue systems and spoken language acquisition

Out-of-Class Study Time (Preparation and Review)

To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.

Textbook(s)

Handouts will be distributed.

Reference books, course materials, etc.

C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167

Assessment criteria and methods

Students are evaluated on their understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in them.
Reports count for 40% of the grade and the final exam for 60%.

Related courses

ICT.H410 ： Computational Linguistics
ICT.H416 ： Statistical Theories for Brain and Parallel Computing
ICT.H508 ： Language Engineering

Prerequisites (i.e., required knowledge, skills, courses, etc.)

Students are required to have knowledge equivalent to the following courses.
LAS.M102 ： Linear Algebra I / Recitation
LAS.M101 ： Calculus I / Recitation
ICT.S206 ： Signal and System Analysis

TOKYO INSTITUTE OF TECHNOLOGY