In this section, we present those aspects of the biological cell which we find related to computation. A first implicit principle is indeed that cells perform computations through the physical and chemical activity that takes place within them. At a grand scale, it is difficult to explain the nature or goal of these computations: life, survival, reproduction, etc. But at a small scale, it is easy to list a number of principles which seem to have computational relevance. Unfortunately, it is very difficult to distinguish those mechanisms that may have computational relevance from those that don't. This is in great part due to the fact that biological processes are notoriously difficult to modularize into well-defined entities across well-defined layers with well-defined interactions, whether in the biochemistry of the cell (with which we're concerned), in neurological processes, in ecology, etc.
In short, there are no epiphenomenal processes in biology! Everything counts.
Here is a list of principles that seem to have some importance within cells and that we have singled out as being potentially computationally relevant. The way we choose them is simple: we ask “what do we have to do to simulate the microbiology of a cell?” Most of these points come from [Alberts et al. 2002]. Note that we have an explicit agenda in discussing these topics: we're looking for anything that may have to do with computation. Many points may seem far-fetched. We point out, along with each principle, whether we adopt it as a feature within the Monod project and, if so, in which layer.
Most proteins can be subdivided into a small number of domains. A domain is “a substructure produced by any part of a polypeptide chain that can fold independently into a compact, stable structure” [Ibid, p. 140]. Many domains — called modules — have been identified which appear across different proteins, and “many large proteins show clear signs of having evolved by the joining of preexisting domains in new combinations, an evolutionary process called domain shuffling” [Ibid, p. 146]. The function of certain proteins can be analyzed by identifying the domains they are made of, and domains are being classified on their own terms (“immunoglobulin module”, “growth factor module”, etc.). It is not the case, however, that a protein's function can be decomposed into the sum of the functions of its domains. For instance, “novel binding surfaces have often been created at the juxtaposition of domains” [Ibid, p. 145].
Nevertheless, the main new feature of the Cytoplasm layer is to restrict the functionality of the Incubator processing units: they can no longer be arbitrary programs, but must be built out of a predefined set of domains. Processing units are then called proteins.
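By way of illustration, the restriction might be sketched as follows. The domain library, its operations and the class name below are our invention for exposition, not an actual Monod interface:

```python
# Sketch: processing units built only from a predefined set of domains.
# The library and its operations are illustrative assumptions.

DOMAIN_LIBRARY = {
    "bind":    lambda state, arg: state | {arg},   # add a ligand to the bound set
    "release": lambda state, arg: state - {arg},   # drop a ligand from the bound set
}

class Protein:
    """A processing unit assembled from predefined domains only."""
    def __init__(self, domains):
        for name, _ in domains:
            if name not in DOMAIN_LIBRARY:
                raise ValueError(f"not a predefined domain: {name}")
        self.domains = domains

    def act(self, state):
        for name, arg in self.domains:
            state = DOMAIN_LIBRARY[name](state, arg)
        return state

p = Protein([("bind", "X"), ("release", "Y")])
print(p.act(frozenset({"Y"})))  # frozenset({'X'})
```

The point of the sketch is only that arbitrary code is ruled out: any attempt to assemble a protein from a domain outside the library fails at construction time.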
The question of what kinds of domains exist in the Cytoplasm layer is also driven by biology, through the following observations:
Proteins and ligands constantly bump into each other. Through collisions and the heat bath, bindings are broken. Bindings have highly variable strengths, depending on the match [Ibid, p. 160]. The strength of a match between two entities can be measured; the measure is called the equilibrium constant: it is the quotient, at equilibrium, of the concentration of the bound complex to the product of the concentrations of the individual entities.
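For concreteness, a small numerical sketch of the definition (the concentration figures are made up, in arbitrary units):

```python
# Equilibrium constant K = [AB] / ([A][B]): the quotient of the
# concentration of the bound complex to the product of the free
# concentrations. A larger K means a stronger match.

def equilibrium_constant(bound, free_a, free_b):
    return bound / (free_a * free_b)

tight = equilibrium_constant(bound=0.9, free_a=0.1, free_b=0.1)  # strong match
loose = equilibrium_constant(bound=0.1, free_a=0.9, free_b=0.9)  # weak match
print(tight > loose)  # True
```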
It is possible for programs in the Incubator layer to simply match and never release — blocking anything else that might be useful. In the Cytoplasm layer, we introduce an external release mechanism analogous to the thermodynamic action within the cell.
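One way such a release mechanism might be sketched: each bound pair is broken with a probability that decreases with binding strength, standing in for the heat bath. The exponential form and the numbers are assumptions of ours, not Monod's actual rule:

```python
import math
import random

# Sketch of an external release mechanism: a binding of strength s survives
# one step with probability 1 - exp(-s / temperature).

def release_step(bindings, temperature, rng):
    """bindings: {pair: strength}. Return the bindings that survive one step."""
    return {pair: s for pair, s in bindings.items()
            if rng.random() >= math.exp(-s / temperature)}

rng = random.Random(0)
bound = {("protein1", "ligandA"): 5.0,   # strong match
         ("protein2", "ligandB"): 0.1}   # weak match
after = release_step(bound, temperature=1.0, rng=rng)
print(sorted(after))  # with this seed, only the strong binding survives
```

Note that no binding is permanent: even the strongest match is eventually broken, which is exactly what rules out the match-and-never-release deadlock.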
FIXME: As we discussed in the Turing soup section above, this changes the analysis of computability and of the Halting problem (???).
FIXME: I don't understand this well enough, but it seems to be absolutely critical.
Molecular recognition is free, in the sense that it is “powered” simply by the heat bath. Many other reactions also do not require explicit energy input (often in the form of ATP). However, many important reactions do. There is a correlation between reactions that require energy input and irreversible reactions. The consumption of energy gives the reaction a preferred direction. For instance, contractions of elastic proteins, movement of proteins along DNA, etc. [Alberts et al., p. 183]. Energy consumption is a “fact of life”. The laws of physics and the chemical reactions in a biological cell, as efficient as they may be, require the input of energy. Nevertheless, one may well ask what impact, if any, energy requirements have — or have had, in evolutionary terms — on the computationally-relevant aspects of cell biology. An immediate candidate principle is energy minimization: is it the case that evolution will choose less endothermic reactions among multiple otherwise equivalent candidates? This principle certainly seems plausible, and probably has an impact on most of the other facts presented in this section.
In an environment simulated on computer hardware, it might be tempting to establish an analogy between energy consumption and the consumption of CPU cycles. At the very least, this analogy makes the desirable property of energy minimization carry over. Note that we're talking about resource utilization minimization as an element of evolutionary fitness. For instance, if an otherwise useful reaction which takes the system from state A to state B is reversible, it may lead the CPU to thrash wildly between A and B. Requiring that the reaction consume energy — and making it irreversible — may be a solution to the problem. If the reverse reaction is also available and also consumes energy, thrashing can still occur but will lead to early cell death, hopefully weeding out or regulating the problem reaction. Unfortunately, the measurements required to use CPU utilization as a measure of energy are computationally difficult to handle. In the Cytoplasm layer, we introduce a related notion of energy.
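The energy-budget idea might be sketched as follows; the costs, the budget and the death rule are illustrative assumptions:

```python
# Sketch: a cell energy budget. Each reaction firing debits a fixed cost,
# so a reaction and its reverse cannot thrash for free: the pair drains
# the budget and the cell dies early.

class Cell:
    def __init__(self, energy):
        self.energy = energy
        self.alive = True

    def fire(self, reaction_cost):
        """Attempt one reaction; return whether it fired."""
        if not self.alive:
            return False
        if self.energy < reaction_cost:
            self.alive = False      # exhaustion means early cell death
            return False
        self.energy -= reaction_cost
        return True

cell = Cell(energy=10.0)
# A thrashing A <-> B pair, each direction costing 3 units:
fired = sum(cell.fire(3.0) for _ in range(5))
print(fired, cell.alive)  # only 3 firings succeed, then the cell dies
```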
FIXME: This is also fairly fuzzy for now...
Cells both need and process external molecules — taking them in from the outside. They also discard and forward molecules — eliminating them from their interior. Transport across the cell membrane has to do both with maintaining the well-being of the cell and with fulfilling its function — and sometimes the line is fuzzy. A Monod cell has the same dual relationship with the outside world. On the metabolic side, it can increase the amount of available energy by taking in certain ligands and reduce its energy consumption by eliminating other ligands. On the processing side, in order for a Monod cell to perform a computation, it is fed a ligand, and the output may be identified with an excreted ligand.
We even go a step further by identifying a successful computation with an energy-providing intake. More precisely, at the level of the Monod Culture, we can program a harness to identify the validity of a computation and tie it to the energy level of the cell. This makes the fitness function purely dependent on energy, simplifying computation and description. Input ligands may be seen as antigens entering the cell, and the result of a successful computation is the neutralization and elimination of the antigen, while an unsuccessful computation leads to the gradual poisoning of the cell, which ends up dying.
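A sketch of such a harness; the checker and the reward and poison values here are illustrative, not an actual Monod Culture configuration:

```python
# Sketch: a Culture-level harness ties the validity of a computation to
# the cell's energy level. A correct output ligand feeds the cell; an
# incorrect one poisons it.

def harness_step(cell_energy, output_ligand, expected, reward=5.0, poison=2.0):
    """Return the cell's new energy after it excretes output_ligand."""
    if output_ligand == expected:
        return cell_energy + reward   # successful computation = energy intake
    return cell_energy - poison       # failure gradually poisons the cell

e = 10.0
e = harness_step(e, "AB", expected="AB")   # success
e = harness_step(e, "AX", expected="AB")   # failure
print(e)  # 13.0
```

With this arrangement the fitness function reduces to a single number, the energy level, exactly as described above.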
The levels of “topological” and “geometrical” structure present in a cell probably constitute one of the most puzzling aspects as far as computation is concerned. There are three levels we are concerned with: the coarse-grained subdivision of the cell into different, more or less impermeable compartments such as the various organelles; the spatial extent of these compartments, which implies varying concentration gradients of all molecules; and the physical extent and geometrical structure of the molecules themselves, coupled with an exclusion principle which prevents different molecules from occupying the same space at the same time. The subdivision of the cell into compartments is the most readily interpretable property from a computational point of view: it substantially increases performance and efficiency, and reduces side effects. Consider two reactions which are desirable:
A + X -> B + X'
and
B + Y -> C + Y'
and one that is not:
A + Y -> D
Segregating X to a compartment and Y to another, while having a mechanism to shuttle the result B from the first compartment to the second has the effect of increasing the rate of both desired reactions (by increasing the relative concentrations of the reactants) while eliminating the occurrence of the undesirable reaction. This phenomenon is apparent in the multiple compartments of the Golgi apparatus, for instance [Alberts et al., p. 736]. Of course, compartmentalization can be taken further by specializing different cell types, different organs, etc.
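The segregation argument can be illustrated with a toy simulation, treating compartments as separate molecule pools with a shuttle for B (the species names are those of the reactions above; the mechanics are our own simplification):

```python
from collections import Counter

# Sketch: two compartments segregate X from Y, so A + Y -> D can never fire.

def react(pool, reactants, products):
    """Fire a reaction once if all reactants are present in the pool."""
    if all(pool[r] > 0 for r in reactants):
        for r in reactants:
            pool[r] -= 1
        for p in products:
            pool[p] += 1
        return True
    return False

comp1 = Counter({"A": 1, "X": 1})
comp2 = Counter({"Y": 1})

react(comp1, ["A", "X"], ["B", "Xp"])   # A + X -> B + X'
comp2["B"] += comp1.pop("B")            # shuttle B into the second compartment
react(comp2, ["B", "Y"], ["C", "Yp"])   # B + Y -> C + Y'
print(comp2["C"], comp2["D"])  # 1 0 -- the undesired product D never appears
```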
We introduce a notion of compartments in the Cytoplasm layer, along with the now-necessary notion of compartment transport. Fortunately, this feature implies no significant computational load. The impact of the spatial extent of the various compartments is more difficult to analyze. Certainly, it allows the existence of concentration gradients across the cell. These are manifest during many different cellular mechanisms, including cell division, development and differentiation, organelle development and movement, etc. However, the impact on the computational work performed by the cell is unknown (to us) at this point. Much research is focused on the computational properties of reaction-diffusion systems [Sienko et al., Chapters 3 and 4], but its relevance to cell biology is not emphasized. For instance, what is the importance of the number of geometrical dimensions? Our guess: two dimensions would not be enough, while four would be (self-evidently) more than enough. Nevertheless, despite the uncertainties, we introduce in the Cytoplasm layer a framework with which to impart a topological notion of closeness to compartments. It is a plugin-type framework, so that we may choose different geometries for the cell. After all, the goal of Monod is to understand the impact of such biological characteristics on computation. Of course, the computational load imposed by simulating geometry can be very high.
Finally, molecules have a physical extent and structure, and they are subject to an exclusion principle which prevents two molecules from occupying the same space. It is in fact the geometrical structure of proteins (the tertiary structure) which gives them recognition capabilities. This aspect is completely abstracted away in the Incubator, where binding is a simple matter of traditional regular expression matching. A further consequence of the exclusion principle is that binding is always one-to-one (per site, that is). The Incubator already takes care of this attribute. Slightly more complex is what may be called geometrical repression, where a repressor may prevent further binding by simply blocking the way, even while the repressor binds far away from the sites it represses [Alberts et al., p. 406]. There are also much more complex consequences. For instance, DNA packing is used as an active, very intricate and poorly understood means of regulating protein synthesis [Ibid, much of Chapter 7]. There are many proteins concerned with keeping tabs on geometrical consequences of extended molecules — for instance, DNA topoisomerases ingeniously prevent DNA tangling during replication [Ibid, p. 251].
At the very least, the regulatory aspects above point to computational consequences of molecular geometry beyond molecular recognition. Unfortunately, the load on any putative simulation is probably prohibitive — and the same effects may ultimately be achievable through other means. For now, we ignore this aspect of cell biology in Monod. Biological evolution has to deal with the laws of nature; simulated evolution has to deal with the simulated laws of the Monod Cell, and there is arguably sufficient evidence to show that the success of genetic algorithms is (at least qualitatively) independent of the particular laws at hand.
The impact of this omission is difficult to gauge a priori. Consider for instance the case of geometrical repression. The functionality may be recreated by making sure that the secondary activator must also bind to the repressor site in order to be active. However, there is a significant difference: in a biological cell, the geometrical repressor operates completely independently of the activator, may be reused across multiple sites, and, most importantly, may have evolved completely independently. This last point indicates that the Monod Cell is possibly missing an “evolutionary independent variable”, so to speak.
Protein synthesis, the process of transforming genes into proteins, is comparable to a compilation process. At a certain level, that process may appear fairly simple: a sequence of codons is transformed in an unambiguous way into a corresponding sequence of amino acids. Compilation in most computer languages is more complex, from a computational point of view! Of course, this simplistic description ignores the crucial aspect of protein folding — see below. Also of note in this vein is the fact that the machinery responsible for protein creation is itself a product of the compilation process.
In the Monod Cell layer, we introduce Monod genes and the (very simple) process of conversion of the genes into proteins.
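The conversion process might be as simple as a direct codon lookup, sketched here with a small fragment of the real genetic code (the Monod encoding itself is of course not this one):

```python
# Gene-to-protein as a direct, unambiguous codon lookup. The table below is
# a tiny fragment of the standard genetic code, for illustration only.

CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": None}  # None = stop

def translate(mrna):
    """Map a codon sequence to the corresponding amino acid sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:      # stop codon: translation ends
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGUUUGGCUAA"))  # ['Met', 'Phe', 'Gly']
```

The triviality of this loop is precisely the point made above: the hard part of the biological “compilation” lies in folding, which the lookup ignores entirely.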
The amino acid sequence that makes up the primary structure of a protein is unambiguously determined by the gene(s) involved. Further, the secondary and tertiary structures — the physical folding of the sequence into a functional protein — are also determined, by the laws of physics and the context (temperature, presence of other catalysts, etc.). However, it is extremely difficult to predict, without a precise modeling effort, the resulting structure from the base sequence. In particular, there is no compact algorithm — other than a lookup — available to deduce the function of a protein from its genetic description. In computational parlance, the folding algorithm is incompressible: the best way to understand the result of folding is to simply do it! “...all structurally programmable architectures must have a highly compressible description in order to conform to formal rules specified in a simple user manual” [Sienko et al. 2003, p. 5]. This situation is in marked contrast with the process of compilation, with which an analogy was made above. The algorithm for compilation from source code to executable code is contained in a compiler, which is a very compact program. The quantum mechanics involved in folding are emphatically not a program. How significant is it, computationally speaking, that the function of a “biological program” cannot (easily) be deduced by examining the program alone? There are claims that the fact is central to molecular computation [Idem]. However, the reasons given are vague at best. For sure, philosophically, it emphasizes the claim that evolution is blind to the genotype and only acts through the office of the expressed phenotype.
Monod, currently, does not incorporate the equivalent of a folding process as a step from genotype to phenotype. In fact, quite to the contrary, the process used to synthesize proteins from their Monod-genetic description is very simple. The initial maintainer of Monod has a sense that this is probably a most profound deficiency of the model. Are we missing the boat completely? Is this the single most important computationally-relevant fact of cellular biology? Nevertheless, before attempting to incorporate such a feature, two questions must be at least vaguely answered: how do we construct such a folding analogue? and why?
There are 20^n possible amino acid chains of length n. A typical protein has about 300 amino acids — giving an incredibly large number. “Only a very small fraction of this vast set of conceivable polypeptide chains would adopt a single, stable three-dimensional conformation — by some estimates, less than one in a billion” [Ibid, p. 141]. In other words, using the compilation analogy above, most theoretical source files are not valid programs — which is not a surprise. “And yet virtually all proteins present in cells adopt unique and stable conformations. How is this possible? The answer lies in natural selection” [Idem]. Even further, we can draw a dichotomy between the syntactic validity of an amino acid chain, which has to do with its having a single, stable conformation, and the semantic validity of the chain, which refers to the resulting protein being able to fulfill its purpose in the context for which it is meant. Natural selection ensures the validity, both syntactic and semantic, of entire genomes.
In Monod Cultures, we subject Monod Cell genomes to such evolutionary algorithms.
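A minimal sketch of such an evolutionary loop, run here over toy bit-string genomes; the fitness function, mutation rate and population parameters are illustrative, not those of an actual Monod Culture:

```python
import random

# Minimal evolutionary loop: selection keeps the fitter half of the
# population each generation; mutation flips bits with small probability.

def evolve(fitness, genome_len=8, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]          # the fitter half survives
        children = [[bit ^ (rng.random() < 0.05) for bit in parent]
                    for parent in parents]            # 5% per-bit mutation
        population = parents + children
    return max(population, key=fitness)

best = evolve(fitness=sum)    # toy fitness: number of 1 bits in the genome
print(sum(best))
```

Because the fittest individuals are carried over unchanged, validity once found is never lost — a crude stand-in for natural selection preserving syntactically and semantically valid genomes.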
Note that the cell possesses fantastically ingenious mechanisms to detect, correct and/or remove incorrectly created proteins (chaperones, the ubiquitin-proteasome pathway, etc.) [Ibid, p. 357] [Nature 426, p. 883]. Such unstable chains can indeed be extremely dangerous (witness Alzheimer's, Huntington's and prion diseases). However, these mechanisms — or any other, thanks to the Central Dogma — do little good to genotypic evolution. In the Monod Culture, we create analogues of these mechanisms, but we are able to put them to work very early on in the chain. We call these violations of the Central Dogma cheating. Cheating offers a glimmer of hope in the face of the fundamental inadequacy of current hardware for running Monod.
FIXME: Concentration thresholds and triggers. Relative concentrations. Etc.
FIXME: The immune system. Inspiration for certain not-quite-evolutionary mechanisms.
FIXME: The immune system always recognizes non-self by positively identifying non-self antigens. It never recognizes non-self by the *absence* of self-antigens, which would presumably be more powerful. Does this idea fit anywhere in Monod? Is this a limitation of the model, as it is a limitation of our immune system?