Foundations of Computational and Systems Biology, posted by duggu on 12/9/2007

Burge, Christopher, Michael Yaffe, Peter Woolf, and Amy Keating, 7.91J Foundations of Computational and Systems Biology, Spring 2004. (Massachusetts Institute of Technology: MIT OpenCourseWare), (Accessed 07 Jul, 2010). License: Creative Commons BY-NC-SA

Gibbs Sampler - Strong Motif example. (Figure by Prof. Chris Burge.)

Course Highlights

The MIT Initiative in Computational and Systems Biology (CSBi) is a campus-wide research and education program that links biology, engineering, and computer science in a multidisciplinary approach to the systematic analysis and modeling of complex biological phenomena. This course is one of a series of core subjects offered through the CSB Ph.D. program for students with an interest in interdisciplinary training and research in computational and systems biology.

This course site includes an extensive listing of bioinformatics tools with links to online resources in the tools section as well as a full set of lecture notes.

Course Description

Serving as an introduction to computational biology, this course emphasizes the fundamentals of nucleic acid and protein sequence analysis, structural analysis, and the analysis of complex biological systems. The principles and methods used for sequence alignment, motif finding, structural modeling, structure prediction, and network modeling are covered. Students are also exposed to currently emerging research areas in the fields of computational and systems biology.

Technical Requirements

MATLAB® software is required to run the .m files found on this course site. RasMol software or another molecular graphics program (e.g., DeepView, PyMOL) is required to view the .pdb files. Any biological sequence comparison tool that imports FASTA-formatted sequences can be used to open the .fa files. A Python interpreter is required to run the .py files. Media player software, such as QuickTime® Player, RealOne™ Player, or Windows Media® Player, is required to play the .avi files.
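
For readers without a dedicated sequence tool, a FASTA file can also be read with a few lines of Python. The sketch below is only an illustration and is not part of the course materials: the function name read_fasta and the two example records are hypothetical, and it assumes the usual FASTA layout in which each record starts with a ">" header line followed by one or more sequence lines.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input, made up for illustration only.
example = StringIO(">seq1 demo\nACGT\nTTGA\n>seq2\nMKV\n")
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

To read one of the .fa files instead, pass an open file object (for example, the result of open() on the file path) in place of the in-memory example.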

This course is an introduction to computational biology that emphasizes the fundamentals of nucleic acid and protein sequence analysis and structural analysis. It also serves as an introduction to the analysis of complex biological systems. It covers principles and methods used for sequence alignment, motif finding, structural modeling, structure prediction, and network modeling, and it includes exposure to currently emerging research areas. The subject is designed for advanced undergraduates and graduate students with strong backgrounds in either molecular biology or computer science, but not necessarily both. Foundational material covering basic programming skills or biological principles will be provided for students, according to their needs.


Some background in basic genetics, biochemistry, and molecular biology is required for this course. We assume that you are familiar with chapters 3, 4, and 6 of the textbook Molecular Biology of the Cell, by Alberts et al. If you are having problems with this introductory material, please see your teaching assistant.


Mount, David W. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2001. ISBN: 0879695978.

Python programming text:

Lutz, Mark, and David Ascher. Learning Python. 2nd ed. Beijing; Cambridge, MA: O'Reilly, 2003. ISBN: 0596002815.

Recommended texts on basic molecular biology, cell biology, and biochemistry:

Watson, James, Tania Baker, Stephen Bell, Alexander Gann, Michael Levine, and Richard Losick. Molecular Biology of the Gene. 5th ed. San Francisco: Addison-Wesley, 2000. ISBN: 0805316434.

Alberts, Bruce, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Molecular Biology of the Cell. 4th ed. New York: Garland Science, 2002. ISBN: 0815332181.

Branden, Carl-Ivar, and John Tooze. Introduction to Protein Structure. 2nd ed. New York: Garland Pub., 1991. ISBN: 0815303440.

The following text provides in-depth treatment of molecular mechanics and other modeling methods:

Leach, Andrew R. Molecular Modelling: Principles and Applications. Harlow, England: Longman, 1996. ISBN: 0582239338.

The following text contains more in-depth discussion of many of the algorithms and methods discussed in class:

Durbin, R., S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis. Cambridge, U.K.; New York: Cambridge University Press, 1998. ISBN: 0521620414.

Better than it sounds, and fun too! A very useful statistics introduction or refresher:

Gonick, Larry, and Woollcott Smith. The Cartoon Guide to Statistics. 1st HarperPerennial ed. New York, NY: Harper Perennial, 1993. ISBN: 0062731025.

The following text covers basic statistics, with applications to biology and medicine:

Glantz, S. A. Primer of Biostatistics. 4th ed. New York: McGraw-Hill, Health Professions Division, 1997. ISBN: 0070242682.

Structure of Class

The course is divided into four units. A majority of sessions will consist of lectures by the instructors. On four of the class days we have scheduled literature discussions in order to link material covered in lecture to current research. On each of the literature discussion days, two papers will be assigned. A short take-home quiz covering the main points of the two papers will be available two days before the discussion and will be collected at the beginning of class. This will be followed by an instructor- and/or student-led discussion. The quizzes, as well as participation in discussions, will contribute to your grade (see below).


One biology and two computer science/math recitations are offered on a weekly basis. Students may elect to attend, twice a week, any combination of these recitations.


There will be four written or computer-based homework assignments. These are designed to promote deeper understanding of the algorithms discussed in class and to provide hands-on experience with bioinformatics and computational biology tools. Students may discuss the homework problems amongst themselves, but all students must complete their own assignment. Duplicate or nearly-identical homeworks from different students will not be accepted.

Homework 1: Protein Sequence Analysis
Homework 2: DNA Sequence Analysis
Homework 3: Protein Structure Analysis
Homework 4: Systems Analysis

Homeworks must be turned in at the beginning of class on the due date to be eligible for full credit. Homeworks turned in by Monday at noon following the due date will be eligible for 50% credit. No homeworks will be accepted after this time. No exceptions.


Grades will be assigned based on the following scheme:

Four Homework Assignments (10% + 10% + 10% + 5%): 35%
Three Literature-based Quizzes (5% each): 15%
Two In-class Exams (20% Midterm; 25% Final): 45%
Class Participation: 5%
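
As a rough check of how these weights combine, the scheme above can be written as a small Python calculation. This is only a sketch: the item names, the course_grade function, and the example student are hypothetical, and it assumes the 5% homework weight belongs to Homework 4 (the scheme lists the weights in order but does not say so explicitly).

```python
# Weights in percent of the final grade, copied from the grading scheme.
# Assumption: the homework weights 10+10+10+5 apply to Homeworks 1-4 in order.
WEIGHTS = {
    "homework_1": 10, "homework_2": 10, "homework_3": 10, "homework_4": 5,
    "quiz_1": 5, "quiz_2": 5, "quiz_3": 5,
    "midterm": 20, "final": 25,
    "participation": 5,
}


def course_grade(scores):
    """Return the weighted course percentage.

    scores maps item names to fractions between 0 and 1;
    items that are missing count as 0.
    """
    return sum(weight * scores.get(item, 0.0) for item, weight in WEIGHTS.items())


# The four components (35 + 15 + 45 + 5) should account for the whole grade.
assert sum(WEIGHTS.values()) == 100

# Hypothetical student: full marks everywhere, except that Homework 2 was
# turned in late and therefore earned 50% credit.
scores = dict.fromkeys(WEIGHTS, 1.0)
scores["homework_2"] = 0.5
print(course_grade(scores))  # 95.0
```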

Python Programming and Tutorial

Python and Perl are widely used in bioinformatics and computational biology. The majority of the homework assignments will include problems that involve writing simple programs in the scripting language Python. Because many students may have little or no programming experience, hands-on tutorials in Python will be offered by Dr. Peter Woolf during the second week of classes.

The aim of the Python tutorial is to give students a basic working knowledge of this scripting language. This tutorial is intended for students with little or no programming experience, and will focus on the tools and utilities needed to do research in bioinformatics and computational biology.

The TAs will also provide instruction on writing Python scripts in recitation sections during the first few weeks of class. See your TA if you are still having difficulty after attending all of these sessions and studying the books recommended above.


1 Introduction/Sequence Comparison and Dynamic Programming
2 Multiple Sequence Alignments I
3 Multiple Sequence Alignments II (Homework 1 handed out)
4 Phylogenetic Analysis
5 Literature Discussion
6 Genome Sequencing and DNA Sequence Analysis (Homework 1 due)
7 DNA Sequence Comparison and Alignment
8 DNA Motif Modeling and Discovery (Homework 2 handed out)
9 Markov and Hidden Markov Models for DNA Sequences
10 DNA Sequence Evolution
11 RNA Secondary Structure Prediction (Homework 2 due)
12 Literature Discussion on Predicting the Functions of DNA/RNA Sequences
13 Midterm Exam – in class – Protein and DNA Sequence Analysis
14 Protein Secondary Structure Prediction
15 Introduction to Protein Structure and Classification
16 Comparing Protein Structures; Molecular Modeling: Methods and Applications
17 Using Computational Methods to Analyze, Predict, and Design Protein Sequences and Structures; Solving Structures using X-ray Crystallography and NMR
18 Solving Structures using X-ray Crystallography and NMR (cont.); Homology Modeling (Homework 3 handed out)
19 Methods for Protein Structure Prediction: Homology Modeling and Fold Recognition
20 Threading and ab initio Structure Prediction; Computational Protein Design (Homework 3 due)
21 Introduction to Systems Biology (Homework 4 handed out)
22 Feedback Systems and Coupled Differential Equations
23 DNA Microarrays and Clustering
24 Literature Discussion on DNA Microarrays and Clustering (Homework 4 due a day after Lec #24)
25 Computational Annotation of the Proteome
26 Literature Discussion on Computational Annotation of the Proteome