Proteins - Monod


3.2 Proteins

All computation in the Monod model is carried out by proteins. They are analogous to proteins in biological cells, hence the name. Each protein is in turn composed of one or more domains, which are also loosely analogous to protein domains in biology.

The domains encode what a Monod cell can do at the most fundamental level. For example, there are domains for the following activities, which are described in detail below: binding, logical integration operations, and changing ligands. From the point of view of the evolutionary algorithms that are part of Monod, domains are the terminal entities, as we will see when we discuss Monod cultures.

The set of domains in Monod is Turing-complete, in that any Turing machine program can be implemented by a protein (FIXME: should be easy to show: have each protein insert a state marker on the tape at the mark position). However, the set is not currently optimized along any other dimension, such as efficiency, parallelizability, or orthogonality. Nor have the domains been evolved: they were designed into Monod directly. Reaching certain levels of optimality would probably require evolving them, which would amount to metaevolution. (FIXME: References?)

We now describe the Monod protein model in precise detail, in two stages. The first stage is a quick enumeration of all the salient definitions; the second elaborates on each definition, with examples. The Monod protein model is not simple to explain. It does not break down into neatly stacked layers or independent modules. We hope this reflects more than a failure on the part of the developers to explain it properly: the model itself is complex, and, we presume, necessarily so. (FIXME: Why?)

Here is the quick breakdown of the Monod protein model. (FIXME: quickly review in light of the expanded version now written. Small changes are needed.)

An abstract domain, or simply domain for short, is a fundamental building block of proteins.
There are different flavors of domains. These flavors are predefined in the Monod model. They have different uses and instructions.
Domains have interfaces. There are expressor and acceptor interfaces, and also different interface types. Each interface type has an allowable range of values.
Domains can be connected to one another by joining interfaces one-to-one: an expressor to an acceptor of the same type.
Certain domains also possess projection sites. There are two kinds of projection sites: ligand binding projections and structural binding projections, as for a processing unit in an Incubator. Those domains that have projection sites are called boundary domains.
Projection sites accept input operations and emit output operations. The set of operations corresponds to the possible actions of processing units in an Incubator. For instance, binding to a ligand, releasing another protein, and changing a ligand are operations.
Domains, each according to its flavor, transform input values on their input interfaces and input operations into output values and output operations through a process known as conformation change. Conformation changes are entirely specified by two entities, the domain functional logic and the conformation engine.
A domain assembly is a set of domains along with valid connections between their interfaces. Some interfaces may be left naked, that is, unconnected.
A state S of a domain assembly assigns a value to each connection in the assembly. A state history associates a state with each point in time.
An abstract protein, or simply protein, is a domain assembly along with a distinguished state, called the initial state. By extension from a domain assembly, we can talk about the state of a protein, its boundary domains, etc.
An input history for a protein is a series of input operations, together with the times at which they are presented to particular projection sites on the protein. An output history is defined similarly for output operations.
A behavior function for a protein is a mapping from input histories to output histories. It describes how the protein reacts to input operations.
An unqualified behavior function can thus associate any behavior with any protein. In subsequent definitions, we constrain allowed functions along different axes.
Two properties can be stated easily in terms of the pure behavior functions.
A behavior function is causal if an input operation can only have an impact on future output operations.
A behavior function is spacing independent if it is causal and if different chunks of the input history can be shifted around, within limits, without substantially affecting the output history other than through shifting. In particular, a spacing independent behavior function is predictable, in that the response to an input history does not depend on the time at which the input history is presented.
Given a spacing independent behavior function, we can extract its behavior essence function, which gives the output operations the function returns in response to a suitably spaced-out sequence of input operations. The behavior essence characterizes what a protein does most of the time.
We define an equivalence relation among spacing independent behavior functions, declaring that two functions are essentially equivalent if they have the same behavior essence function.
In order to relate behavior functions to the domain structure of the protein, however, we need to look for a refinement of the behavior function.
A realization function for a protein enriches a behavior function for the protein with another function which, for any input history, gives a complete, time-dependent state history of the protein.
As stated, the two parts of the realization function — the behavior function and the state history mapping — do not have to be formally tied to one another, or even to the decomposition of the protein into domains.
Hence, we define a valid realization function as one that obeys certain rules about how state values are transformed by the conformation functions of the domains in the protein and how input and output operations are reflected in the state history function. These rules are fairly complex and are described in detail below.
A behavior function for a protein which is equal to the behavior function of a valid realization function for that protein is called a valid behavior function for the protein.
Note that there can be many different realization functions for the same behavior function. It suffices that a single one of them be valid for the behavior function itself to be valid.
An abstract protein along with a valid behavior function for it is called a concrete protein. A concrete protein is called a realization of an abstract protein.
We can talk of a realization function for a concrete protein as a valid realization function which extends the given behavior function.
Concrete proteins are the focus of this chapter, and of the Cytoplasm layer of the design stack. The definitions, as complex as they may seem, are tailored to be flexible enough to allow for a complex compilation step from an abstract protein to a concrete protein while being rigid enough to allow the study of a protein from its purely abstract description.
The remaining definitions introduce desirable properties of concrete proteins.
A concrete protein is well-grounded if the state history corresponding to the empty input history is eventually constant. A concrete protein is called stable if, given any finite input history, the protein's state eventually becomes constant. A stable protein is well-grounded.
An abstract protein is stable if all of its realizations are stable.
An abstract protein is well-defined if all its realizations are behaviorally essentially equivalent.
A protein is well-behaved if it is stable and well-defined. There are well-behaved proteins.
FIXME: Introduce functional equivalence across proteins.
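
To make the structural definitions above concrete, here is a minimal Python sketch of domains, interfaces, domain assemblies, states and abstract proteins. It is an illustration under assumptions, not Monod's implementation: the class names, the modeling of interface types as integer ranges, the keying of states by connection index, and the flavor names "binding" and "logic" are all hypothetical. It checks only the structural rules stated in the list: connections are one-to-one, join an expressor to an acceptor, and match types, and a state gives each connection a value within its type's range.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InterfaceType:
    # Hypothetical: an interface type is a named range of integer values.
    name: str
    lo: int
    hi: int


@dataclass(frozen=True)
class Interface:
    domain: str            # name of the owning domain
    name: str
    kind: str              # "expressor" or "acceptor"
    itype: InterfaceType


@dataclass
class Domain:
    name: str
    flavor: str            # one of the predefined flavors (names here are made up)
    interfaces: List[Interface]
    projection_sites: List[str] = field(default_factory=list)

    @property
    def is_boundary(self) -> bool:
        # Boundary domains are exactly the domains that have projection sites.
        return bool(self.projection_sites)


@dataclass
class Assembly:
    domains: List[Domain]
    connections: List[Tuple[Interface, Interface]]   # (expressor, acceptor) pairs

    def validate(self) -> None:
        known = {i for d in self.domains for i in d.interfaces}
        used = set()
        for exp, acc in self.connections:
            if exp not in known or acc not in known:
                raise ValueError("connection uses an interface outside the assembly")
            if exp.kind != "expressor" or acc.kind != "acceptor":
                raise ValueError("a connection joins an expressor to an acceptor")
            if exp.itype != acc.itype:
                raise ValueError("connected interfaces must have the same type")
            if exp in used or acc in used:
                raise ValueError("interfaces are joined one-to-one")
            used.update((exp, acc))
        # Interfaces never added to `used` are naked, which is allowed.

    def is_valid_state(self, state: Dict[int, int]) -> bool:
        # A state gives each connection (keyed by its index) one in-range value.
        if set(state) != set(range(len(self.connections))):
            return False
        return all(exp.itype.lo <= state[k] <= exp.itype.hi
                   for k, (exp, _) in enumerate(self.connections))


@dataclass
class Protein:
    assembly: Assembly
    initial_state: Dict[int, int]   # the distinguished initial state

    def __post_init__(self) -> None:
        self.assembly.validate()
        if not self.assembly.is_valid_state(self.initial_state):
            raise ValueError("initial state is not a valid state of the assembly")


# Usage: a ligand-binding boundary domain wired into a logic domain.
bit = InterfaceType("bit", 0, 1)
b_out = Interface("binder", "out", "expressor", bit)
g_in = Interface("gate", "in", "acceptor", bit)
binder = Domain("binder", "binding", [b_out], projection_sites=["ligand"])
gate = Domain("gate", "logic", [g_in])
p = Protein(Assembly([binder, gate], [(b_out, g_in)]), {0: 0})
print(binder.is_boundary, gate.is_boundary)  # True False
```

Note that this captures only abstract proteins: nothing here says how values change, which is the job of the conformation engine and of realization functions.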
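
The causality and predictability conditions in the list can be tested, though not proved, on sample histories. The sketch below assumes a hypothetical encoding in which time is an integer and an event is a (time, site, operation) triple; behavior functions are ordinary Python functions from input histories to output histories. A behavior passes the causality test on a sample if, for every cut time t, its outputs before t are the same whether or not the inputs at or after t are present. It passes the predictability test if presenting the whole input history d ticks later delays the output by exactly d. This probes only the "in particular" part of spacing independence; shifting chunks of a history relative to one another is not tested.

```python
from typing import Callable, List, Tuple

# Hypothetical encoding: an event is (time, projection site, operation).
Event = Tuple[int, str, str]
History = List[Event]
Behavior = Callable[[History], History]


def before(h: History, t: int) -> History:
    """The events of h strictly before time t, sorted."""
    return sorted(e for e in h if e[0] < t)


def shift(h: History, d: int) -> History:
    """Delay every event of h by d ticks."""
    return sorted((t + d, site, op) for t, site, op in h)


def looks_causal(f: Behavior, h: History) -> bool:
    """On sample h: outputs before t never depend on inputs at or after t."""
    out = f(h)
    times = [e[0] for e in h + out]
    if not times:
        return True
    return all(before(out, t) == before(f(before(h, t)), t)
               for t in range(min(times), max(times) + 2))


def looks_predictable(f: Behavior, h: History, shifts=range(1, 4)) -> bool:
    """On sample h: presenting the input d ticks later delays the output by d."""
    return all(sorted(f(shift(h, d))) == shift(f(h), d) for d in shifts)


def delayed_echo(h: History) -> History:
    # Hypothetical protein: re-emits every input operation one tick later.
    return sorted((t + 1, "out", op) for t, _, op in h)


def clairvoyant(h: History) -> History:
    # Not causal: emits each operation one tick before it arrives.
    return sorted((t - 1, "out", op) for t, _, op in h)


def clock_bound(h: History) -> History:
    # Causal but not predictable: only reacts to inputs at even times.
    return sorted((t + 1, "out", op) for t, _, op in h if t % 2 == 0)


h = [(0, "ligand", "bind"), (3, "ligand", "release")]
print(looks_causal(delayed_echo, h), looks_causal(clairvoyant, h))  # True False
```

Passing both tests on a few samples is evidence, not proof: both properties quantify over all input histories.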

The definitions towards the end of the list above are important because they capture properties that we tend to expect from traditional programs. For instance, an unstable protein can be in a state where it is always working but never produces a result. A protein that is not well-defined will sometimes behave one way and sometimes another, unpredictably. We cannot restrict our attention entirely to well-behaved proteins, because Monod cultures will certainly create badly misbehaved proteins through random mutations and recombinations. However, well-behaved proteins provide the archetype of desired behavior.
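
Stability and well-groundedness can likewise be probed by simulation. The sketch assumes, hypothetically, that a realization is given as a deterministic discrete-time step function mapping a state and the input operations presented at that tick to the next state; the function names and the toy proteins are made up for illustration. After the last input the protein evolves on its own, so if it reaches a fixed point its state is constant from then on, and if it revisits an earlier state without settling it cycles forever. Otherwise the check gives up at a horizon: if the domains are Turing-complete as claimed above, we should not expect eventual constancy to be decidable in general. Note also that stability quantifies over all finite input histories, while this checks one at a time.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]                      # hypothetical: one value per connection
Step = Callable[[State, List[str]], State]   # (state, ops at this tick) -> next state


def settles(step: Step, initial: State, inputs: Dict[int, List[str]],
            horizon: int = 10_000) -> Optional[bool]:
    """Drive a hypothetical deterministic realization with a finite input history.

    Returns True if the state becomes constant, False if it provably cycles
    forever, and None if no verdict is reached within `horizon` extra ticks.
    """
    state = initial
    # Replay the finite input history, one tick at a time.
    for t in range(max(inputs, default=-1) + 1):
        state = step(state, inputs.get(t, []))
    # After the last input the protein evolves on its own.
    seen = {state}
    for _ in range(horizon):
        nxt = step(state, [])
        if nxt == state:
            return True    # fixed point: the state is constant from here on
        if nxt in seen:
            return False   # revisited a state: it cycles and never settles
        seen.add(nxt)
        state = nxt
    return None


def latch(state: State, ops: List[str]) -> State:
    # Hypothetical one-connection protein: "bind" sets 1, "release" sets 0.
    (v,) = state
    for op in ops:
        v = 1 if op == "bind" else 0 if op == "release" else v
    return (v,)


def oscillator(state: State, ops: List[str]) -> State:
    # Hypothetical unstable protein: once bound, its value flips forever.
    v, on = state
    if "bind" in ops:
        on = 1
    return (1 - v if on else v, on)


print(settles(oscillator, (0, 0), {}))             # True: settles with no input
print(settles(oscillator, (0, 0), {2: ["bind"]}))  # False: cycles after binding
```

The toy oscillator settles under the empty input history but not after a "bind", so it is well-grounded without being stable, consistent with stability being the stronger of the two properties.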

Proteins that are not well-behaved may be seen as analogous to those polypeptide chains which do not have a unique stable conformation, or which are misfolded. Such chains are often extremely deleterious biologically. For example, many of them expose hydrophobic surfaces and tend to form clumps. Fortunately, the cellular machinery includes extensive protection to detect and neutralize these proteins (the proteasomes and ubiquitin binders).

Note that in biology, such invalid polypeptide chains are never called proteins: by definition, a biological protein is well-defined. In the Monod world, a protein does not have to be well-defined, though we sometimes abuse language and assume that it is.

Note also that the functional representation of proteins we are currently describing is completely orthogonal to their genetic representation. Neither representation is easily derivable from the other. This situation is analogous to that in biology, where the tertiary structure of a protein, which is close to its functional representation, is difficult to relate algorithmically to its primary structure. We will explore this later in this chapter, when we discuss Monod Cells.

In the rest of this section, we present the details of the construction and properties of proteins, proceeding in the order that we laid out in the bullet overview above. We also lay out some examples of all the concepts introduced.