To exploit large volumes of text data such as web text, automatic processing by computers is essential. In natural language processing, computers recognize words in text data represented as sequences of characters, identify phrases, and estimate syntactic structures. This course is designed to give students the opportunity to learn the basic ideas and knowledge of the field, with an emphasis on methods based on machine learning. Applications such as machine translation, text summarization, and sentiment analysis, together with their mathematical models, are also covered. Mathematical approaches to the study of language are briefly explained as well.
By the end of this course, students will have acquired the following skills:
(i) read and understand research papers in the natural language processing field
(ii) use basic techniques of natural language processing such as part-of-speech tagging and syntactic parsing
(iii) derive the mathematical formulas of basic machine learning methods used in natural language processing
computational linguistics, natural language processing, machine learning, text mining
✔ Specialist skills | Intercultural skills | Communication skills | Critical thinking skills | Practical and/or problem-solving skills |
At the beginning of each class, assignments given in the previous class are reviewed, followed by a lecture.
Homework assignments include reading assignments, exercise problems, and programming assignments.
Class | Course schedule | Required learning |
---|---|---|
Class 1 | Part-of-speech tagging with HMM | Understand the probabilistic model of HMM-based POS tagging and its decoding with dynamic programming. |
Class 2 | Text classification with naive Bayes classifier | Learn the multinomial model and the multivariate Bernoulli model of naive Bayes classifiers, and learn the idea of generative models. |
Class 3 | Basic knowledge of optimization and parameter estimation | Learn the constrained optimization based on the method of Lagrange multipliers and its application to parameter estimation. |
Class 4 | Mathematical representation of documents and classification with support vector machines | Learn the bag-of-words representation of a document and its variants, as well as classification with support vector machines. |
Class 5 | Named-entity recognition and dependency parsing with sequential tagging | Understand how named-entity recognition and dependency parsing are implemented as sequential classification. |
Class 6 | Probabilistic model for sequential tagging | Understand the log-linear model and its variant for sequence data: conditional random fields. |
Class 7 | Text summarization | Learn the basics of text summarization and understand the importance of optimization problems in this task. |
Class 8 | Methods for text clustering | Learn k-means clustering, Gaussian mixture clustering, the EM algorithm, and probabilistic latent semantic analysis. |
Class 9 | Generative models of documents | Understand latent Dirichlet allocation and Gibbs sampling for its inference. |
Class 10 | Language resources and algorithm implementation | Obtain knowledge of various language resources and tools, and learn how to use them. |
Class 11 | Sophisticated methods for representing words, sentences, and documents | Learn the distributed representations of words, sentences and documents. |
Class 12 | Sentiment analysis of text | Learn various tasks and their methods for sentiment analysis of text. |
Class 13 | Machine translation | Learn about the IBM model, which is a statistical machine translation model, and understand the basic part of its algorithm. |
Class 14 | Basic knowledge for language study | Learn the computational methods that are used for language study and the research areas for which computational methods are useful. |
Class 15 | Mathematical methods for language study | Study concrete examples of language research carried out with computational approaches. |
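The dynamic-programming decoding of an HMM-based POS tagger (Class 1) can be sketched as follows. This is a minimal illustration with a toy tag set; all probabilities and words are made-up assumptions, not course material.

```python
# Toy HMM for POS tagging; every probability below is an illustrative assumption.
tags = ["DET", "NOUN", "VERB"]
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {
    "DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1},
}
emit = {
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.8, "barks": 0.2},
    "VERB": {"dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    """Find the most probable tag sequence by dynamic programming (Viterbi)."""
    # best[s]: probability of the best tag path ending in tag s at the current word.
    best = {s: start[s] * emit[s].get(words[0], 0.0) for s in tags}
    back = []  # backpointers, one dict per word after the first
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for s in tags:
            # Maximize over the previous tag r; keep the argmax as a backpointer.
            p, arg = max((prev[r] * trans[r][s], r) for r in tags)
            best[s] = p * emit[s].get(w, 0.0)
            ptr[s] = arg
        back.append(ptr)
    # Recover the best path by following backpointers from the best final tag.
    path = [max(best, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

For example, `viterbi(["the", "dog", "barks"])` returns `["DET", "NOUN", "VERB"]` under these toy parameters; real taggers use log probabilities to avoid underflow on long sentences.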
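The multinomial naive Bayes model for text classification (Class 2) can likewise be sketched in a few lines. The training set, labels, and smoothing choice (add-one) below are illustrative assumptions for the sketch.

```python
from collections import Counter
import math

# Tiny made-up training set: (label, text) pairs.
train = [
    ("pos", "good great film"),
    ("pos", "great acting good plot"),
    ("neg", "bad boring film"),
    ("neg", "bad plot boring acting"),
]

def fit(data):
    """Estimate class priors and per-class word counts (multinomial model)."""
    priors = Counter(label for label, _ in data)
    counts = {label: Counter() for label in priors}
    for label, text in data:
        counts[label].update(text.split())
    vocab = {w for c in counts.values() for w in c}
    return priors, counts, vocab

def predict(text, priors, counts, vocab):
    """Return argmax_c [log P(c) + sum_w log P(w|c)] with add-one smoothing."""
    n = sum(priors.values())
    best, best_score = None, -math.inf
    for c in priors:
        total = sum(counts[c].values())
        score = math.log(priors[c] / n)
        for w in text.split():
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Because the model is generative, the same estimated distributions P(c) and P(w|c) could also be used to sample synthetic documents, which is the conceptual point the class contrasts with discriminative classifiers such as SVMs.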
None.
None.
Students will be assessed on their knowledge and practical skills in natural language processing and in mathematical models for language.
Exercise problems 40%, term paper 60%.
None.