2017 | Biological Data Analysis

Academic unit or major: Undergraduate major in Computer Science

Instructor(s): Akiyama Yutaka, Yamamura Masayuki

Class Format: Lecture

Media-enhanced courses

Day/Period(Room No.): Tue7-8(H121) Fri7-8(H121)

Group: -

Course number: CSC.T353

Credits: 2

Academic year: 2017

Offered quarter: 2Q

Syllabus updated: 2017/3/17

Lecture notes updated: -

Language used: Japanese

Syllabus

Course description and aims

This course covers data representation methods and algorithms for comparison and knowledge extraction on massive biological data. Topics include pairwise sequence alignment, dynamic programming, multiple sequence alignment, phylogenetic tree estimation, approximation methods, sequence motif representation, rapid homology search against large-scale databases, and protein structure modeling and prediction. No prior knowledge of biology or biochemistry is required. Basic biological concepts are introduced during the course, and students are expected to consider each topic from the viewpoint of computational algorithms and their complexity.
Biological information analysis is important to 21st-century society for improving quality of life, the environment, and safety. This course therefore aims to provide a fundamental understanding of the nature of biological data and of typical algorithms, such as dynamic programming, that are used repeatedly in this area. Most of the methods covered are also applicable to a wide range of engineering subjects, so the course also serves as an illustrative example of how computer science techniques are applied to a specific real-world problem.
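
As an illustration of the dynamic programming mentioned above, the following is a minimal sketch of a global alignment score computation in the style of Needleman-Wunsch. The function name and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions chosen for the example and are not taken from the course materials.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Uses a linear gap penalty; the scoring values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]
```

For sequences of length m and n, the table has (m + 1)(n + 1) cells and each cell is filled in constant time, so both time and memory are O(mn). This is the kind of complexity argument the course description refers to.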

Student learning outcomes

By the end of this course, students will be able to:
1) Explain several data representations for sequence analysis (e.g., regular expressions, profile matrices, HMMs).
2) Explain the concept and implementation of dynamic programming and several of its applications in bioinformatics.
3) Explain the important role of approximation methods in multiple sequence alignment and phylogenetic tree estimation in terms of computational complexity.
4) Explain the concepts of e-value and p-value in homology search against a large database, and compute these values.
5) Explain several algorithmic techniques for speeding up homology search against a large database.
6) Explain protein tertiary structure models and structure prediction methods.
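
For outcome 4, one common way to compute these values is the Karlin-Altschul formulation, E = K m n e^(-λS) and p = 1 - e^(-E), where S is the raw alignment score, m the query length, and n the database length. The sketch below assumes that formulation; whether the course uses exactly this form is an assumption, and the function names and the λ and K values in the usage note are illustrative placeholders.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of at least one chance hit, given the expected count e."""
    # 1 - exp(-e), written with expm1 to stay accurate for very small e.
    return -math.expm1(-e)


def bit_score(raw_score, lam, k):
    """Normalized score in bits, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)
```

For example, `e_value(100, 250, 10**9, lam=0.267, k=0.041)` gives the expected number of chance hits for a raw score of 100 with a 250-residue query against a database of one billion residues, under those placeholder parameters. For small E, the p-value is approximately equal to E.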

Keywords

biological information, algorithm, complexity, dynamic programming, hidden Markov model, gene, protein

Competencies that will be developed

✔ Specialist skills

Intercultural skills

Communication skills

✔ Critical thinking skills

Practical and/or problem-solving skills

Class flow

Each class begins with an explanation of a new topic (concepts, examples, systems, practical importance, etc.). At the end of each class, students are given exercise problems related to that day's lecture.

Course schedule/Required learning

	Course schedule	Required learning
Class 1	Biological information and its importance - Computational view of gene, genome, protein, cell, and body	Understand the hierarchical system, scale comparison, and information flow
Class 2	Global sequence alignment - Optimal path search, dynamic programming, global sequence alignment	Calculate global sequence alignment based on dynamic programming
Class 3	Local sequence alignment - Protein, amino acids, local sequence alignment	Calculate local sequence alignment based on dynamic programming
Class 4	Multiple sequence alignment - Complexity of multiple alignment, heuristic methods	Calculate multiple sequence alignment using the star method or a tree-based method
Class 5	Phylogenetic tree estimation - Distance matrix method, character state method, bootstrap evaluation	Calculate a phylogenetic tree using the UPGMA method or the NJ method
Class 6	Homology search against database - Amino acid mutation matrix, hit significance, e-value, bit score, p-value	Calculate the e-value and p-value for a hit in homology search
Class 7	Faster methods for homology search - FASTA, BLAST, PSI-BLAST	Build a k-mer index table for faster similarity search
Class 8	Motif representation and extraction - Regular expression, profile matrix, hidden Markov model	Understand several representation methods for sequence motifs
Class 9	Probabilistic modeling for sequence analysis - Coding region prediction, Markov model	Understand the role of probabilistic modeling in sequence analysis
Class 10	RNA secondary structure prediction - RNA secondary structure, Nussinov algorithm, Zuker method	Calculate RNA secondary structure prediction based on dynamic programming
Class 11	Genome-wide sequence analysis - Requirement for further speed-up, BLAT, RMAP	Understand approximate methods for rapid sequence analysis
Class 12	Protein secondary structure prediction - Protein secondary structure, DSSP code, neural net, PSI-PRED	Understand protein secondary structure prediction methods
Class 13	Protein tertiary structure prediction - Homology modeling, fold recognition, fragment assembly method	Understand protein tertiary structure prediction methods
Class 14	Gene expression and gene regulatory network estimation - Gene expression, DNA microarray, gene regulatory network	Understand gene regulatory network estimation methods
Class 15	Chemical compound metabolism and metabolic network estimation - Compound metabolism, metabolic network	Understand metabolic pathway network estimation methods
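
The required learning for Class 7 asks for a k-mer index table. A minimal sketch of that idea follows: every length-k substring of the database sequences is stored with its positions, so exact seed matches for a query can be looked up without scanning the database. The function names and the tuple layout are hypothetical, and this is not a description of how FASTA or BLAST are actually implemented.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to a list of (sequence_id, position) occurrences."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        # Sequences shorter than k contribute no k-mers.
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seed_hits(query, index, k):
    """Return (query_pos, sequence_id, db_pos) for every exact k-mer match."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, seq_id, dpos))
    return hits
```

Building the index takes time proportional to the total database length, after which each query k-mer is looked up in expected constant time. Seed hits found this way would then typically be extended into longer alignments.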

Textbook(s)

Original slides by Akiyama and Yamamura are provided.

Reference books, course materials, etc.

Japanese Society of Bioinformatics (ed.). Introduction to Bioinformatics. Tokyo: Keio University Press. ISBN: 978-4-7664-2251-1. (In Japanese)

Assessment criteria and methods

Students' knowledge of data representations, algorithms, and applications in biological information analysis, and their ability to apply them to problems will be assessed.
Final exams 80%, exercise problems 20%.

Related courses

none

Prerequisites (i.e., required knowledge, skills, courses, etc.)

none

TOKYO INSTITUTE OF TECHNOLOGY