Overview
This course is an introduction to computational biology. It emphasizes the fundamentals of nucleic acid and protein sequence analysis and of structural analysis, and it also introduces the analysis of complex biological systems. It covers principles and methods used for sequence alignment, motif finding, structural modeling, structure prediction, and network modeling, and it includes exposure to currently emerging research areas. The subject is designed for advanced undergraduates and graduate students with strong backgrounds in either molecular biology or computer science, but not necessarily both. Foundational material covering basic programming skills or biological principles will be provided to students according to their needs.
Prerequisites
Some background in basic genetics, biochemistry, and molecular biology is required for this course. We assume that you are familiar with chapters 3, 4, and 6 of the textbook Molecular Biology of the Cell, by Alberts et al. If you are having problems with this introductory material, please see your teaching assistant.
Textbooks
Mount, David W. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2001. ISBN: 0879695978.
Python programming text:
Lutz, Mark, and David Ascher. Learning Python. 2nd ed. Beijing; Cambridge, MA: O'Reilly, 2003. ISBN: 0596002815.
Recommended texts on basic molecular biology, cell biology, and biochemistry:
Watson, James, Tania Baker, Stephen Bell, Alexander Gann, Michael Levine, and Richard Losick. Molecular Biology of the Gene. 5th ed. San Francisco: Addison-Wesley, 2000. ISBN: 0805316434.
Alberts, Bruce, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Molecular Biology of the Cell. 4th ed. New York: Garland Science, 2002. ISBN: 0815332181.
Branden, Carl-Ivar, and John Tooze. Introduction to Protein Structure. 2nd ed. New York: Garland Pub., 1991. ISBN: 0815303440.
The following text provides in-depth treatment of molecular mechanics and other modeling methods:
Leach, Andrew R. Molecular Modelling: Principles and Applications. Harlow, England: Longman, 1996. ISBN: 0582239338.
The following text contains more in-depth discussion of many of the algorithms and methods discussed in class:
Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, U.K.; New York: Cambridge University Press, 1998. ISBN: 0521620414.
Better than it sounds, and fun too! A very useful statistics introduction or refresher:
Gonick, Larry, and Woollcott Smith. The Cartoon Guide to Statistics. 1st HarperPerennial ed. New York, NY: Harper Perennial, 1993. ISBN: 0062731025.
The following text covers basic statistics, with applications to biology and medicine:
Glantz, S. A. Primer of Biostatistics. 4th ed. New York: McGraw-Hill, Health Professions Division, 1997. ISBN: 0070242682.
Structure of Class
The course is divided into four units. The majority of sessions will consist of lectures by the instructors. On four of the class days we have scheduled literature discussions in order to link material covered in lecture to current research. On each literature discussion day, two papers will be assigned. A short take-home quiz covering the main points of the two papers will be available two days before the discussion and will be collected at the beginning of class. The quiz will be followed by an instructor- and/or student-led discussion. The quizzes, as well as participation in discussions, will contribute to your grade (see below).
Recitations
One biology and two computer science/math recitations are offered on a weekly basis. Students may elect to attend, twice a week, any combination of these recitations.
Homework
There will be four written or computer-based homework assignments. These are designed to promote deeper understanding of the algorithms discussed in class and to provide hands-on experience with bioinformatics and computational biology tools. Students may discuss the homework problems among themselves, but each student must complete his or her own assignment. Duplicate or nearly identical homeworks from different students will not be accepted.
Homework 1 | Protein Sequence Analysis
Homework 2 | DNA Sequence Analysis
Homework 3 | Protein Structure Analysis
Homework 4 | Systems Analysis
Homeworks must be turned in at the beginning of class on the due date to be eligible for full credit. Homeworks turned in by noon on the Monday following the due date will be eligible for 50% credit. No homeworks will be accepted after this time. No exceptions.
Grading
Grades will be assigned based on the following scheme:
Four Homework Assignments (10+10+10+5%) | 35%
Three Literature-based Quizzes (5% each) | 15%
Two In-class Exams (20% Midterm; 25% Final) | 45%
Class Participation | 5%
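As an illustration only, the weighting above can be sketched as a short Python function of the kind written in the tutorials. The names `WEIGHTS`, `course_grade`, and `example` are hypothetical, the scores are invented, and the sketch assumes that the homework weights 10+10+10+5 are listed in assignment order; it is not an official grade calculator.

```python
# Weights (percent of the course grade) copied from the grading scheme.
# Assumption: the homework weights 10+10+10+5 are listed in assignment order.
WEIGHTS = {
    "homework": [10, 10, 10, 5],
    "quizzes": [5, 5, 5],
    "exams": [20, 25],  # midterm, final
    "participation": [5],
}


def course_grade(scores):
    """Return the weighted course grade in percent.

    `scores` maps each category in WEIGHTS to a list of fractional
    scores between 0.0 and 1.0, one per graded item.
    """
    total = 0.0
    for category, weights in WEIGHTS.items():
        items = scores[category]
        if len(items) != len(weights):
            raise ValueError(f"expected {len(weights)} scores for {category}")
        # Each item contributes its weight times the fraction earned.
        total += sum(w * s for w, s in zip(weights, items))
    return total


# Hypothetical student, for illustration only.
example = {
    "homework": [0.9, 0.5, 1.0, 0.8],
    "quizzes": [1.0, 0.8, 0.6],
    "exams": [0.85, 0.9],
    "participation": [1.0],
}
print(course_grade(example))
```

Because the weights sum to 100, a student with full marks on every item receives a grade of 100.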
Python Programming and Tutorial
Python and Perl are widely used in bioinformatics and computational biology. The majority of the homework assignments will include problems that involve writing simple programs in the scripting language Python. Because many students may have little or no programming experience, hands-on tutorials in Python will be offered by Dr. Peter Woolf during the second week of classes.
The aim of the Python tutorial is to give students a basic working knowledge of this scripting language. This tutorial is intended for students with little or no programming experience, and will focus on the tools and utilities needed to do research in bioinformatics and computational biology.
The TAs will also provide instruction on writing Python scripts in recitation sections during the first few weeks of class. See your TA if you are still having difficulty after attending all of these sessions and studying the books recommended above.
Calendar
Session | Topic | Key Dates
1 | Introduction/Sequence Comparison and Dynamic Programming |
2 | Multiple Sequence Alignments I |
3 | Multiple Sequence Alignments II | Homework 1 handed out
4 | Phylogenetic Analysis |
5 | Literature Discussion |
6 | Genome Sequencing and DNA Sequence Analysis | Homework 1 due
7 | DNA Sequence Comparison and Alignment |
8 | DNA Motif Modeling and Discovery | Homework 2 handed out
9 | Markov and Hidden Markov Models for DNA Sequences |
10 | DNA Sequence Evolution |
11 | RNA Secondary Structure Prediction | Homework 2 due
12 | Literature Discussion on Predicting the Functions of DNA/RNA Sequences |
13 | Midterm Exam – in class – Protein and DNA Sequence Analysis |
14 | Protein Secondary Structure Prediction |
15 | Introduction to Protein Structure and Classification |
16 | Comparing Protein Structures; Molecular Modeling: Methods and Applications |
17 | Using Computational Methods to Analyze, Predict, and Design Protein Sequences and Structures; Solving Structures using X-ray Crystallography and NMR |
18 | Solving Structures using X-ray Crystallography and NMR (cont.); Homology Modeling | Homework 3 handed out
19 | Methods for Protein Structure Prediction: Homology Modeling and Fold Recognition |
20 | Threading and ab initio Structure Prediction; Computational Protein Design | Homework 3 due
21 | Introduction to Systems Biology | Homework 4 handed out
22 | Feedback Systems and Coupled Differential Equations |
23 | DNA Microarrays and Clustering |
24 | Literature Discussion on DNA Microarrays and Clustering | Homework 4 due (a day after Lec #24)
25 | Computational Annotation of the Proteome |
26 | Literature Discussion on Computational Annotation of the Proteome |