Theoretical Biology

Course in Theoretical Biology

Some years back, I designed a graduate course titled “Theoretical Biology”. Below are some notes and materials for that course. Feel free to reproduce them and use them as you like. Also, please send me any questions or comments you have related to the course notes.


There is a long history of courses with names like “mathematical biology,” “quantitative biology,” “computational biology,” “biostatistics,” and “modeling.” These courses can be great; they often provide students with mathematical or statistical tools that can save them a lot of hair-pulling later in their careers. What these courses tend NOT to provide is a forum for discussing how research in biology fits into some self-consistent body of theory that explains how things work. I developed a course called “Theoretical Biology” to address this gap. It focuses less on methods and more on developing a working definition of “theoretical biology” and deciding whether such an approach is useful for understanding living things. 
See my notes on the discussions we had below the topic listings.


Lecture 1. Why not use fundamental physics to understand living things?

“There is nothing that living things do that cannot be understood from the point of view that they are made of atoms acting according to the laws of physics.” –R. P. Feynman (Six Easy Pieces)

This proved to be a provocative suggestion. We discussed some phenomena, common in biology, that make this approach difficult: emergence, strong conditional dependence (accidents), and dominance of noise.
Notes from discussion: I very much enjoyed our discussion about trying to develop theories in ways that make them easy for us to understand. As I mentioned, a number of scientists have been pushing the idea of beauty as a guiding principle for developing theory (e.g. this TED talk by Nobel Prize winner Murray Gell-Mann). Are these ideas related to trying to develop theory that is particularly amenable to the way the human mind works?
Remember that there is a distinction between a system that has randomness (e.g. mutation in evolution) and a system that is DOMINATED by randomness (e.g. chemical reactions with extremely low concentrations of reactants/catalysts). A good guiding principle is that, if you can predict the behavior of your system pretty well by just using its average behavior, there may be randomness in that system, but its behavior is probably not dominated by random fluctuations.
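The guiding principle above can be illustrated with a small simulation. This is only a sketch under assumptions that were not part of the course: a birth-death process (molecules produced at a constant rate, each decaying independently), simulated with the Gillespie algorithm. The function name, rates, and run length are hypothetical choices for illustration.

```python
import math
import random


def relative_fluctuation(production_rate, decay_rate=1.0, t_end=500.0, seed=0):
    """Gillespie simulation of a birth-death process (illustrative assumption).

    Molecules are produced at a constant rate and each one decays
    independently. Returns (time-averaged copy number, std / mean).
    """
    rng = random.Random(seed)
    n = round(production_rate / decay_rate)  # start at the average-behavior prediction
    t = 0.0
    sum_n = 0.0
    sum_n2 = 0.0
    while t < t_end:
        birth = production_rate
        death = decay_rate * n
        total = birth + death
        # Waiting time to the next reaction, truncated at the end of the run.
        dt = min(rng.expovariate(total), t_end - t)
        sum_n += n * dt
        sum_n2 += n * n * dt
        t += dt
        # Choose which reaction fired, in proportion to its rate.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
    mean = sum_n / t_end
    variance = max(sum_n2 / t_end - mean * mean, 0.0)
    return mean, math.sqrt(variance) / mean


# Few molecules: fluctuations are comparable to the mean itself.
low_mean, low_cv = relative_fluctuation(production_rate=2.0)
# Many molecules: the average describes the system well.
high_mean, high_cv = relative_fluctuation(production_rate=200.0)
print(f"mean {low_mean:.1f}: relative fluctuation {low_cv:.2f}")
print(f"mean {high_mean:.1f}: relative fluctuation {high_cv:.2f}")
```

Both runs contain randomness, but only in the low-copy-number run is the relative fluctuation large enough that the average alone is a poor predictor.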

There seemed to be some confusion about the difference between strong conditional dependence and emergence. As I mentioned, these are relatively new concepts and not everyone agrees on precisely what they mean. However, I think a useful way to think about emergence is the following: if you leave the behavior of individual components exactly the same, but change some parameter in the system (e.g. number of cars in the traffic simulation) does a qualitatively new behavior emerge (e.g. traffic jams), or does the system just change quantitatively (e.g. mean rate of travel of the cars goes down a little)? In the first case you have emergence and in the second, you don’t. In strong conditional dependence, the properties of the components of your system or the way they interact change depending on past events; they are not constant as in the simple emergence example above.
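The traffic example can be sketched in a few lines. This is not the NetLogo model we used in class; it is a simpler stand-in (the "rule 184" cellular automaton on a circular road), and the road length, warm-up time, and function names are assumptions made for illustration. Every car follows exactly the same rule at every density: move forward one cell if the cell ahead is empty.

```python
import random


def step(road):
    """One synchronous update. A car advances only if the cell ahead is empty.

    Returns the new road and the number of cars that moved.
    """
    n = len(road)
    new_road = [False] * n
    moved = 0
    for i, occupied in enumerate(road):
        if not occupied:
            continue
        ahead = (i + 1) % n  # circular road
        if road[ahead]:
            new_road[i] = True  # blocked: stay put
        else:
            new_road[ahead] = True
            moved += 1
    return new_road, moved


def mean_speed(num_cars, road_length=200, warmup=400, measure=100, seed=0):
    """Average fraction of cars moving per step, after an initial transient."""
    rng = random.Random(seed)
    road = [False] * road_length
    for position in rng.sample(range(road_length), num_cars):
        road[position] = True
    for _step_index in range(warmup):
        road, _moved = step(road)
    total_moved = 0
    for _step_index in range(measure):
        road, moved = step(road)
        total_moved += moved
    return total_moved / (measure * num_cars)


for cars in (50, 80, 120, 150):
    print(cars, "cars -> mean speed", round(mean_speed(cars), 3))
```

In this sketch, the only thing that changes between runs is the number of cars. Up to half occupancy, every car ends up moving freely; beyond that, persistent jams appear even though the rule each car follows is unchanged. That is the sense of emergence described above.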

Lecture 2. Describing regularities in nature: theory as data compression
We began by discussing the subjective nature of prediction and understanding. When describing any system, one has to deal with choices that are uncomfortably subjective. How do I coarse-grain the system (i.e., choose to look at the system at a particular level of resolution)? How much precision will I be satisfied with? What components of the system will I exclude when developing a simplified description? These questions are seldom discussed in any general sense and they aren’t really part of the scientific method. Yet, we have to answer them every time we consider a biological system we are studying. A word that gets tossed around a lot in biology is “mechanistic.” Yet labeling a description at one level of resolution as mechanistic and a description at another level as non-mechanistic seems arbitrary once we realize that a researcher must select her or his desired level of resolution in advance.
-Describing regularities in an image

Image compression exercise: In this exercise, I presented the class with a photographic portrait of a man. I separated students into groups and charged each with the task of developing an image compression algorithm that could be used on the image. Algorithms were judged on three criteria: (1) simplicity of the algorithm (i.e., fewer steps are better, simpler operations are better), (2) ability to extract properties of interest, specified in advance by each group of students, and (3) ability to reduce the file size of the final image.

The purpose of this exercise was to emphasize several things about a good theory. First, the measure of a good theory is subjective. A theory should preserve the desired properties of the system it describes. Hence, I asked students to select specific properties they wished to preserve. An example is, “does the person in the image have a mustache?” A second concept I wanted to emphasize is that a good theory should be brief and relatively simple. Thus, algorithms with fewer operations and simpler operations were rewarded. Finally, a good theory should produce a caricature of the system that contains the features of interest but leaves out the features that are not of interest. Thus, algorithms that reduced the size of the file were rewarded.
To illustrate a final point, we ran algorithms on the original portrait and recorded their performance, but I also ran them on a second image that differed greatly from the first. The second image was from a famous piece of pop art showing a painted elephant in a painted room. The purpose of this step was to illustrate that a good theory must also be general. It can’t be too closely tailored to a particular system if such a close fit makes it bad at describing other systems.
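For readers who want to try the exercise themselves, here is one minimal sketch of the kind of algorithm a group might have proposed. It is not any group's actual submission. The synthetic 8x8 "portrait", the dark band standing in for a mustache, the threshold, and all function names are assumptions made for illustration. The algorithm has two steps: coarse-grain by averaging 2x2 blocks, then reduce precision to four gray levels.

```python
import math


def block_average(image, block):
    """Coarse-grain: replace each block x block tile by its mean intensity."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            tile = [image[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            out_row.append(sum(tile) / len(tile))
        out.append(out_row)
    return out


def quantize(image, levels, max_value=255):
    """Reduce precision: map each intensity to one of `levels` integer codes."""
    return [[min(int(v * levels / (max_value + 1)), levels - 1) for v in row]
            for row in image]


def decode(codes, levels, max_value=255):
    """Map each code back to the midpoint intensity of its bin."""
    width = (max_value + 1) / levels
    return [[(code + 0.5) * width for code in row] for row in codes]


def contains_dark_feature(image, threshold=64):
    """Property of interest (hypothetical): is there any dark patch at all?"""
    return any(value < threshold for row in image for value in row)


def size_in_bits(image, levels):
    """Storage needed if every pixel is stored with just enough bits."""
    bits_per_pixel = max(1, math.ceil(math.log2(levels)))
    return len(image) * len(image[0]) * bits_per_pixel


def make_portrait(size=8):
    """Synthetic stand-in: bright textured background with one dark band."""
    image = [[190 + 20 * ((r + c) % 2) for c in range(size)] for r in range(size)]
    for r in (4, 5):
        for c in range(2, 6):
            image[r][c] = 20 + 10 * ((r + c) % 2)
    return image


portrait = make_portrait()
codes = quantize(block_average(portrait, block=2), levels=4)
compressed = decode(codes, levels=4)

print("feature in original:  ", contains_dark_feature(portrait))
print("feature in compressed:", contains_dark_feature(compressed))
print("bits before:", size_in_bits(portrait, 256), "bits after:", size_in_bits(codes, 4))
```

The fine background texture is discarded, while the property chosen in advance survives. Running the same functions on a very different image is the generality check described above.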
Notes from discussion: Today we talked about the idea of theory as data compression. I really designed this exercise to help us explore one phase of the development of a useful theory: the phase in which many complex observations are compressed into simpler descriptions of phenomena. When describing how different objects fall from different places (for example, an apple from a tree, a brick from a building, and a skydiver from the sky), we wish to focus on the phenomenon of falling. Therefore, we must compress our observations into more compact descriptions of the data. For instance, we may choose to disregard the scent of the apple or the color of the skydiver’s parachute. We are choosing to disregard information that is present in the original data for the sake of simplicity. A good theory says what to disregard and what to keep, given a particular phenomenon of interest.
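The falling example can be made concrete with a toy calculation. Everything here is an assumption made for illustration: the observations are synthetic (generated from the idealized relation h = 0.5 * g * t**2 with air resistance ignored, which is clearly unrealistic for a skydiver), and the record fields and function names are hypothetical. The point is only that the "theory" keeps two columns of the data and discards the rest.

```python
import math

G_ASSUMED = 9.81  # m/s^2, used only to generate the synthetic observations


def make_observations():
    """Synthetic records: rich descriptions of three idealized falls."""
    raw = [
        ("apple", "red", "sweet", 3.0),
        ("brick", "grey", "none", 20.0),
        ("skydiver", "blue", "none", 100.0),
    ]
    return [
        {
            "object": name,
            "color": color,
            "scent": scent,
            "height_m": height,
            "fall_time_s": math.sqrt(2 * height / G_ASSUMED),
        }
        for name, color, scent, height in raw
    ]


def fit_g(observations):
    """Least-squares fit of g in h = 0.5 * g * t**2.

    Only height and fall time are used; color, scent, and object type
    are deliberately disregarded.
    """
    numerator = sum(o["height_m"] * o["fall_time_s"] ** 2 for o in observations)
    denominator = sum(o["fall_time_s"] ** 4 for o in observations)
    return 2 * numerator / denominator


def predicted_fall_time(height_m, g):
    """Prediction of the compressed description for a new drop height."""
    return math.sqrt(2 * height_m / g)


observations = make_observations()
g_fit = fit_g(observations)
print("fitted g:", round(g_fit, 2))
print("predicted fall time from 45 m:", round(predicted_fall_time(45.0, g_fit), 2), "s")
```

All three records are summarized by one fitted number plus one formula, which is the compression step described in the notes above.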

Lecture 3. Aging as a case study
-Reverse engineer theories of aging from aging data
In this lecture, I asked students to begin to think about the problem of aging as a case study. I’ll go out on a limb and speculate that aging is something that most living people have thought about at one point or another. My goal in this section was to have students describe common theories of aging by studying papers that were almost totally restricted to data. I specifically chose papers that did not discuss the theory underlying their experiments. I asked students to read these papers and be prepared to discuss a few questions:
(1) What is the definition of aging/senescence?
(2) What are the proximate steps that cause organisms to age (e.g., loss of a particular cellular protein, function, etc.)?
(3) What, if any, evolutionary processes are responsible for aging?
(4) How are the different symptoms of aging associated with one another?

My intention in designing this exercise was to explore the degree to which a coherent theory (or set of theories) of aging is guiding empirical research on the phenomenon of aging. This was both an exercise and an experiment. Neither I nor anyone else in the class really knew the answer to this question in advance.

Lecture 4. Aging as a case study ctd.
-Present and analyze reverse engineered theories

In this lecture/exercise we reconvened and discussed the reverse-engineered theories of aging that groups of students compiled by reading papers. The questions we tried to answer were: (1) What are the most important patterns in the field? (2) What are the proximate mechanisms that studies have hypothesized and investigated? (3) How are the major patterns in the field related to one another? (4) What is the definition of aging?

The answers to these questions were as follows:

(1/2/3) What are the most important patterns in the field? Proximate mechanisms? How are patterns related to one another?

After a certain developmental stage is reached, the body gets worse at everything (e.g., cell function, DNA replication, repair). Telomere damage increases, the frequency of somatic mutations increases, brain tissue changes, and the DNA content of cells increases. There is a functional decline in organ systems. These patterns are probably related but their relationships are unclear from empirical studies.

(4) What is the definition of aging? 

The definitions drawn from different sets of empirical papers by different groups of students were, in some cases, contradictory. For example, one group defined aging in the following way:

Aging is the total changes associated with development and senescence. Aging is not the same as senescence. Aging is a decline of function of the brain and organ systems through time. 

Another group’s definition: Aging is simply the passing of time.

A third definition: Aging is a decline in function near the end of an organism’s life and therefore aging is senescence.

Many of the definitions rely on other technical terms (e.g. senescence) which were, themselves, difficult to define from empirical papers. We discussed, at some length, an empirical paper that studied oxidative damage and lifespan of C. elegans. This paper showed an interesting pattern: that mutant strains with very long lifespans actually had much higher rates of oxidative damage to cells and tissues. This oxidative damage appeared, however, to be counterbalanced by increased rates of cell and tissue repair. The end result appeared to be that wild type worms were relatively short lived compared to a mutant with high rates of damage and high rates of repair. This was interesting in light of the definition of aging as an unstoppable accumulation of damage, and raised a provocative question: if damage can be repaired, can the rate of damage accumulation be reduced to zero? If so, why don’t individuals live indefinitely?

Lecture 5. Aging as a case study ctd.
-Explore existing theories of aging
-Compare existing theories to reverse-engineered theory

In this lecture/discussion we compared reverse-engineered theories of aging to theories discussed in reviews of aging research. The definitions of aging from these reviews were:

Aging is the phenomenon of growth, decline, and death.

Aging is the progressive loss of function accompanied by decreasing fertility and increasing mortality with advancing age.

Aging is a series of time-related processes occurring in the adult individual that ultimately bring life to a close.

Lecture 6. Theory in other areas of biology
-5 minute flash presentations on one theory in your area of biology

These were excellent and eye-opening presentations given by groups of students about theories in their own research areas. The assignment was to prepare an 8-minute presentation, with 2 minutes for questions, on a theory in the field of one member of each group. Begin by showing the major patterns in data that the theory is intended to explain. Then describe the level of coarse-graining the theory targets (i.e., at what scale does the theory make predictions). Avoid jargon and language that obscures how the theory works.

The topics students covered were: the dilution effect, evolutionary stasis, niche vs. non-niche theories in ecology, theories of polyploidization, and Janzen-Connell effects.

Supplementary media

Murray Gell-Mann TED talk on developing theory
Steven Strogatz’s TED talk on the emergence of synchronous behavior
Simon DeDeo lecture on emergence
Robert May lecture on applying theory from ecosystems to financial markets
Peter Turchin’s paper on the existence of general laws in population biology
Bill Bialek’s talk about coarse-graining huge arrays of neurons to study system properties

Link to NetLogo simulation software we used in the traffic simulation

Link to OpenCV computer vision package