Biological systems self-assemble under physiological conditions and can display functional properties that rival or exceed the performance of many man-made systems. For example, proteins fold spontaneously into well-ordered three-dimensional structures that exhibit the capacity for specific molecular recognition, catalysis of complex chemical reactions, signal transmission, and allosteric regulation. At a larger spatial scale, networks of proteins assemble in cells to form well-ordered signaling systems that provide for complex, non-linear signal processing capabilities. Because we assume that such properties require great precision in the design of systems, one view is to regard proteins and cells as finely tuned machines that are somehow exactly arranged for mediating their selected function. However, other aspects seem less consistent with this view. For example, biological systems are thought to be robust to random perturbation; that is, they display tolerance to removal or alteration of many system components. In addition, they are plastic; that is, they maintain the ability to adapt to changing selection pressures by allowing specific variation of a few system components to alter function profoundly. This curious mixture of robustness to random perturbation and sensitivity to specific perturbation suggests that, despite the appearance of precise construction throughout, strong functional heterogeneity exists in the design of evolved systems. That is, some parts and connections are much more important than others.

Inspired by these ideas, our main goals are (1) to systematically map the pattern of interactions between the components that make up biological systems, (2) to mechanistically understand the operation of these systems, and (3) to define the evolutionary principles that generate these (and not other) architectures. In other words, we wish to understand what nature has built, how it works, and why it is built the way it is. In principle, such understanding would provide powerful rules for the rational engineering and control of biological systems, and would begin to explain how they are even possible through the random algorithmic process that we call evolution.

We are currently working on this problem at two levels, briefly summarized below.


I. The evolutionary "design" of proteins

At the atomic level, we are trying to understand the structure, function, and evolution of proteins.


Proteins are the basic building blocks of cellular activities. They are synthesized as linear chains of amino acid residues that then fold into compact, functional three-dimensional structures. For example, in the PDZ domain family of protein interaction modules, the sequence specifies a six-stranded β-sandwich with two asymmetrically positioned helices, and peptide ligands (yellow stick bonds) bind in a surface groove formed between the β2 strand and the α2 helix (A). So, what information in the amino acid sequence is necessary and sufficient to encode the atomic structure and specific biochemical function? What aspects of the design provide specificity in molecular recognition, which is critical for the biological role of the PDZ domain, while still maintaining the capacity for adaptive variation that yields the large ensemble of PDZ-ligand combinations found in mammalian genomes? More generally, how are high-performance biochemical function, robustness to random mutation, and adaptation to constantly varying fitness criteria encoded in the design of proteins?

To address these questions we began by developing a method, known as the statistical coupling analysis (or SCA), that estimates the pattern of constraints between amino acids in a protein from analysis of multiple sequence alignments. The SCA is essentially a generalization of the traditional concept of positional conservation in protein families to include the co-evolution of pairs of amino acid positions - the statistical signature of conserved functional interactions between positions. SCA shows that proteins can be broken down by the pattern of co-evolution (but not by position-specific conservation) into distinct subsets of amino acids that we call "protein sectors". In several different proteins, the sectors are found to correspond to sparse but physically contiguous networks of amino acids that underlie various aspects of function - allosteric regulation, binding and catalytic specificity, and/or fold stability.
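To make the idea of a conservation-weighted co-evolution matrix concrete, here is a minimal Python sketch in the spirit of SCA. It is a simplified, binarized illustration under stated assumptions, not the lab's implementation: each column is reduced to "carries the most frequent residue or not", the weight φ_i is the log-odds of that frequency against an assumed uniform background frequency q = 1/20, and the coupling between positions i and j is |φ_i φ_j (f_ij - f_i f_j)|. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def binarize(alignment):
    """x[s][i] = 1 if sequence s carries the most frequent residue at column i."""
    consensus = [Counter(col).most_common(1)[0][0] for col in zip(*alignment)]
    return [[1 if seq[i] == c else 0 for i, c in enumerate(consensus)]
            for seq in alignment]


def sca_like_matrix(alignment, q=1 / 20, eps=1e-6):
    """Conservation-weighted covariance |phi_i * phi_j * (f_ij - f_i * f_j)|.

    q is an assumed uniform background frequency (hypothetical choice);
    eps keeps the log-odds finite for invariant columns.
    """
    x = binarize(alignment)
    n_seq, n_pos = len(x), len(x[0])
    # frequency of the consensus residue at each position
    f = [sum(row[i] for row in x) / n_seq for i in range(n_pos)]
    # positional conservation weight: log-odds of f_i against the background q
    phi = []
    for fi in f:
        fc = min(max(fi, eps), 1 - eps)
        phi.append(math.log(fc * (1 - q) / ((1 - fc) * q)))
    matrix = [[0.0] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(n_pos):
            # joint frequency of consensus residues at i and j
            f_ij = sum(row[i] * row[j] for row in x) / n_seq
            matrix[i][j] = abs(phi[i] * phi[j] * (f_ij - f[i] * f[j]))
    return matrix


# Toy alignment: columns 0 and 1 co-vary (A with K, V with E), column 2 is invariant.
alignment = ["AKGW", "AKGY", "VEGW", "VEGF", "AKGF", "VEGY"]
C = sca_like_matrix(alignment)
```

In this toy alignment, positions 0 and 1 co-vary perfectly and receive a large coupling, while the invariant position 2 contributes no coupling at all even though it is the most conserved, which illustrates why co-evolution and positional conservation can partition a protein differently. In published descriptions of SCA, sectors are then identified by analyzing the top eigenvectors of such a matrix, a step omitted here.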

Interestingly, the sectors do not seem to correspond to known classifications of proteins by primary, secondary, or tertiary structural motifs. More generally, the evolutionary correlations between positions do not reflect the dense pattern of local contacts between amino acids observed in the tertiary structure. To test rigorously whether this correlation-based description of amino acid interactions is meaningful, we carried out a protein design experiment in which randomized sequences are computationally "evolved" to reproduce only the pattern of evolutionary constraints defined by SCA. The idea is that if the correlations in SCA are in fact representative of the physical constraints between amino acids in individual proteins, then these artificial sequences should fold and function like their natural counterparts. For the WW domain, a small three-stranded β-sheet protein named for two conserved tryptophan residues, we find that this hypothesis holds: designed sequences that recapitulate the evolutionary constraints fold into the canonical WW structure and show binding properties that are quantitatively the same as those of natural WW domains. In contrast, sequences with the same top-hit sequence identities and the same amino acid propensities at each position, but lacking the correlations, were not found to fold. Panel B shows a gallery of several natural WW domain structures (in white) and, for comparison, the atomic structure of one synthetic WW domain (in yellow). The conserved tryptophans are shown in space-filling representation.
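The logic of the design experiment can be sketched as a Monte Carlo search. In this minimal Python sketch (a simplification under stated assumptions, not the published protocol), residues are swapped between sequences within a single alignment column, which keeps the amino acid frequencies at every position exactly as in the natural family, and simulated annealing accepts swaps that bring a pairwise correlation statistic closer to that of the natural alignment. The statistic here is an unweighted covariance on the consensus-binarized alignment rather than the full SCA matrix, and all names and parameters (`design`, `coupling`, `n_steps`, `t0`, `cooling`) are hypothetical. The column-shuffled starting point plays the role of the control sequences that keep positional propensities but lack correlations.

```python
import math
import random
from collections import Counter


def consensus(alignment):
    """Most frequent residue at each column (ties broken by first occurrence)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*alignment)]


def coupling(alignment, cons):
    """Covariance f_ij - f_i * f_j of the consensus-binarized alignment."""
    n, L = len(alignment), len(cons)
    x = [[1 if seq[i] == cons[i] else 0 for i in range(L)] for seq in alignment]
    f = [sum(row[i] for row in x) / n for i in range(L)]
    return [[sum(row[i] * row[j] for row in x) / n - f[i] * f[j]
             for j in range(L)] for i in range(L)]


def error(current, target):
    """Sum of squared differences over distinct position pairs."""
    L = len(target)
    return sum((current[i][j] - target[i][j]) ** 2
               for i in range(L) for j in range(i + 1, L))


def design(natural, n_steps=2000, t0=1e-2, cooling=0.999, seed=0):
    rng = random.Random(seed)
    cons = consensus(natural)          # fixed reference, computed once
    target = coupling(natural, cons)
    # Control: shuffle each column independently. This keeps positional
    # amino acid frequencies but destroys correlations between positions.
    cols = [list(col) for col in zip(*natural)]
    for col in cols:
        rng.shuffle(col)
    seqs = [list(row) for row in zip(*cols)]
    e = control_error = error(coupling(seqs, cons), target)
    best, best_e = [s[:] for s in seqs], e
    temp = t0
    n, L = len(seqs), len(cons)
    for _ in range(n_steps):
        pos = rng.randrange(L)
        a, b = rng.sample(range(n), 2)
        if seqs[a][pos] != seqs[b][pos]:
            # swapping within a column never changes that column's composition
            seqs[a][pos], seqs[b][pos] = seqs[b][pos], seqs[a][pos]
            e_new = error(coupling(seqs, cons), target)
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
                e = e_new
                if e < best_e:
                    best, best_e = [s[:] for s in seqs], e
            else:  # reject: undo the swap
                seqs[a][pos], seqs[b][pos] = seqs[b][pos], seqs[a][pos]
        temp *= cooling                # annealing schedule
    return ["".join(s) for s in best], best_e, control_error


natural = ["AKGW", "AKGY", "VEGW", "VEGF", "AKGF", "VEGY"]
designed, final_error, control_error = design(natural)
```

Comparing `final_error` with `control_error` shows how much of the correlation pattern the search restores on top of the positional frequencies that both sets share. Restricting moves to within-column swaps is a deliberate choice: it guarantees that any difference between designed and control sequences comes from correlations alone.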

Taken together, these results inspire current work in the lab in the following directions:

(1) development of improved and more general approaches for identifying and analyzing sectors in proteins.

(2) evolution-based design of more complex proteins to further test whether SCA captures the sequence information necessary for specifying the native state of proteins. A particularly interesting possibility is the targeted design of protein sectors to selectively tune biochemical activities.

(3) extension of the concept of sectors within proteins to understand the mechanisms and evolution of cooperative functional interactions between two or more protein domains.

(4) development of theoretical and experimental approaches for understanding the physical basis of sectors in proteins. What physical properties differentiate "sector" positions from "non-sector" positions?

(5) understanding how the statistical properties of proteins are related to the dynamics of selection pressures on an evolutionary time scale. It seems inevitable that natural protein sequences must store information about the history of fluctuations in the conditions of fitness.

Aspects of these studies are being done in collaboration with Dr. Stanislas Leibler (Rockefeller University), Dr. Steve Benkovic (PSU), Dr. Gavin MacBeath (Harvard University) and Dr. Lila Gierasch (U. Mass Amherst).


II. Principles of cellular signaling networks

At the cellular level, we are working to understand how structure and dynamics at scales from the macromolecule to the organelle influence the functional properties of signal transduction systems.


A classic model for understanding cellular information processing is the signaling system that operates in photoreceptor cells of the Drosophila compound eye to transduce light energy into a graded electrical response (C-D). A key aspect of function in these specialized sensory neurons is their cellular architecture: they have long, narrow cell bodies (~5 µm by 80 µm) and extend from one surface a stack of roughly 30,000 tightly packed microvilli called the rhabdomere (D). This specialized organelle houses the visual signaling machinery. Each microvillus is a long, thin process (about 1.5 µm long and 50 nm wide); in essence, a highly constrained and perhaps isolated compartment for light signaling. With regard to compartmentalization of signaling, the microvilli represent the first level of structural order in the hierarchical organization of the signal transduction machinery. We have previously developed methods to isolate clusters of these cells from adult Drosophila that are suitable for tight-seal voltage-clamp recordings (e.g. panel C). With these recordings, we can measure the electrical response to light in single cells with millisecond resolution while also controlling both the extracellular and intracellular environments.
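A quick calculation shows how small this compartment is. Treating each microvillus as a simple cylinder with the dimensions quoted above (an assumption that ignores membrane thickness and the internal cytoskeleton), the volume is roughly 3 × 10^-18 L, so 1 µM of free Ca2+ corresponds to fewer than two ions. The function names below are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def microvillus_volume_litres(length_um=1.5, diameter_nm=50.0):
    """Volume of a cylinder with the quoted microvillus dimensions, in litres."""
    radius_m = diameter_nm * 1e-9 / 2
    length_m = length_um * 1e-6
    return math.pi * radius_m ** 2 * length_m * 1e3  # 1 m^3 = 1000 L


def free_ions(concentration_uM, volume_l):
    """Number of free ions at a given micromolar concentration in volume_l litres."""
    return concentration_uM * 1e-6 * volume_l * AVOGADRO


v = microvillus_volume_litres()
for c in (1, 10, 100):
    print(f"{c:>4} uM -> {free_ions(c, v):6.1f} free ions")
```

Under this cylinder approximation, each free Ca2+ ion adds roughly 0.6 µM to the concentration inside a single microvillus, so calcium signals in this compartment are carried by small numbers of ions and are likely to be noisy.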

What is the response of these cells to light stimulation? Much like central neurons that integrate quantal synaptic potentials over the plasma membrane, these photoreceptor cells generate stochastic electrical responses to the absorption of single photons, and these responses sum to produce the macroscopic response to brighter light stimuli containing many simultaneous photons (E-F). Single-photon responses are called “quantum bumps” (QBs): remarkable events characterized by a rapid, coordinated activation and deactivation of tens of cation-selective ion channels after a brief, random delay (E). The macroscopic visual response (F) is set by four fundamental properties of the QB: (1) amplitude, (2) waveform shape, (3) the mean wait time for occurrence after light absorption (the “latency”, in green, E), and (4) a refractory period after QB generation during which no further QBs can be generated (in red, E). Each of these properties makes a critical contribution to vision. Short latencies and high QB amplitude ensure rapid activation and a high signal-to-noise ratio of the macroscopic visual response (i.e. the QBs sum efficiently in time), while a narrow waveform and the refractory period ensure high temporal resolution by making the macroscopic response decay rapidly after the light stimulus ends. A number of studies argue that the QB represents the coordinated activity of one microvillus; that is, each microvillus is one QB-generating unit, and the rhabdomere is a collection of roughly 30,000 independent QB generators.
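The summation of quantum bumps into a macroscopic response, including the rule that a microvillus cannot produce a second bump during its refractory period, can be sketched as a small stochastic simulation. This is a toy model under stated assumptions: a brief flash at t = 0, each absorbed photon landing in a random microvillus, gamma-distributed latencies, a log-normal bump template, and a refractory period measured from bump onset. All parameter values and function names are hypothetical placeholders, not measured values.

```python
import math
import random


def bump_waveform(t_ms, peak_ms=5.0, width=0.3, amplitude_pA=10.0):
    """Hypothetical log-normal bump template, zero before bump onset."""
    if t_ms <= 0:
        return 0.0
    return amplitude_pA * math.exp(-math.log(t_ms / peak_ms) ** 2 / (2 * width ** 2))


def flash_response(n_photons, n_microvilli=30000, mean_latency_ms=20.0,
                   latency_shape=6.0, refractory_ms=100.0, t_max_ms=150.0,
                   dt_ms=0.5, seed=0):
    rng = random.Random(seed)
    # Each absorbed photon lands in a random microvillus with a random latency.
    hits = {}
    for _ in range(n_photons):
        mv = rng.randrange(n_microvilli)
        latency = rng.gammavariate(latency_shape, mean_latency_ms / latency_shape)
        hits.setdefault(mv, []).append(latency)
    # Within one microvillus, a bump is generated only if it is not refractory.
    bump_times = []
    for latencies in hits.values():
        last = -math.inf
        for lat in sorted(latencies):
            if lat >= last + refractory_ms:
                bump_times.append(lat)
                last = lat
    # Macroscopic current: linear sum of all bumps.
    n_samples = int(round(t_max_ms / dt_ms))
    trace = [sum(bump_waveform(k * dt_ms - tb) for tb in bump_times)
             for k in range(n_samples)]
    return bump_times, trace


bumps, trace = flash_response(n_photons=50)
```

Raising `n_photons` while holding `n_microvilli` fixed shows the effect of the refractory period: once most microvilli have produced a bump, further photons in the same flash add little, so the number of bumps saturates near one per microvillus within the refractory window.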

What is known about the molecular basis for the QB? Signal transduction in Drosophila photoreceptors begins with the absorption of a photon by the G-protein-coupled receptor rhodopsin, the activation by rhodopsin of a Gq-class heterotrimeric G protein, and the subsequent activation of phospholipase C-β (PLC-β). Activation of PLC ultimately triggers the opening of cation channels, depolarizing the photoreceptor cell. Because one class of light-activated channels (Trp) is permeable to divalent cations, photoexcitation causes a rapid influx of Ca2+ that serves as the signal for feedback regulation of the signaling process. Calcium triggers sequential positive and negative feedback that is critical for generating the QB. Positive feedback causes the cooperative opening of channels that makes up the activation phase of the QB, and negative feedback causes rapid bump shutoff and probably also sets the refractory period. The molecular basis for positive feedback remains unclear, but an eye-specific isoform of protein kinase C (eye-PKC) and a Ca2+/calmodulin-dependent protein kinase (CamKinase) are the primary effectors of Ca2+-dependent negative feedback. Imaging of light-dependent calcium fluxes in Drosophila photoreceptors, combined with whole-cell patch-clamp measurements, shows that Ca2+ transients in the tens to hundreds of micromolar range are necessary for PKC-dependent feedback regulation, while much lower levels of calcium are sufficient for positive feedback. A central player in organizing both activation and feedback regulation of signaling is InaD, a 674-amino-acid scaffolding protein composed of five PDZ domains, members of a large and conserved family of protein interaction modules. Through specific PDZ-mediated interactions, InaD assembles PLC-β, the Trp Ca2+ channel, and eye-PKC into a single signaling complex.
In other words, InaD brings together the main effector molecule for vision (PLC-β), the main route for Ca2+ influx (Trp), and a central mediator of Ca2+-dependent negative feedback (eye-PKC).
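The idea that calcium engages positive feedback at low concentration and negative feedback only at much higher concentration can be illustrated with a toy rate model. In this hypothetical Python sketch (not the lab's model), channels move from a closed pool to an open state and then to an inactivated state; local Ca2+ is assumed to be proportional to the fraction of open channels; activation is boosted by a Hill function of Ca2+ with a low half-activation constant, and inactivation follows a Hill function with a much higher one. All constants and names are illustrative assumptions, chosen only so that negative feedback sets in at tens of micromolar while positive feedback saturates near 1 µM.

```python
def hill(x, k, n=2.0):
    """Fractional activation at concentration x with half-activation constant k."""
    return x ** n / (x ** n + k ** n)


def toy_bump(t_max_ms=100.0, dt_ms=0.01, stimulus=0.01, k_act=1.0, k_inact=0.5,
             k_pos_uM=1.0, k_neg_uM=50.0, ca_per_open_uM=200.0):
    """Forward-Euler integration of a hypothetical closed -> open -> inactivated scheme."""
    o, i = 0.0, 0.0  # open and inactivated fractions; closed = 1 - o - i
    times, opens = [], []
    for step in range(int(round(t_max_ms / dt_ms)) + 1):
        ca = ca_per_open_uM * o               # assumed instantaneous local Ca2+
        closed = 1.0 - o - i
        act = k_act * (stimulus + hill(ca, k_pos_uM))   # positive feedback
        inact = k_inact * hill(ca, k_neg_uM)            # negative feedback
        times.append(step * dt_ms)
        opens.append(o)
        o, i = o + (act * closed - inact * o) * dt_ms, i + inact * o * dt_ms
    return times, opens


times, opens = toy_bump()
peak = max(opens)
```

With these settings the open fraction rises slowly at first, accelerates steeply once Ca2+ engages positive feedback, and then falls as rising Ca2+ recruits inactivation, giving a bump-like time course. Because inactivated channels never recover in this sketch, the compartment cannot respond again, a crude stand-in for the refractory period; diffusion, buffering, and stochastic gating are all omitted.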

Despite knowledge of essentially all the parts that make up the signaling machinery, and despite excellent experimental methods for measuring and perturbing signaling in this system, we have yet to understand how the dynamics of signaling reactions work together to produce the quantum bump, the elementary signaling event. Toward this goal, we are working in three specific areas, itemized below.

(1) development and testing of a theoretical model that describes how the quantum bump arises from the non-linear dynamical properties of the underlying chemical reactions.

(2) understanding scaffolding by the InaD complex using a combination of approaches from atomic structure determination to genetics to cell physiology to behavior. The InaD machinery gives us a chance to connect design features at the atomic level (using methods such as SCA) to cellular and organismal properties that are more directly connected to evolutionary selection.

(3) understanding calcium dynamics in the rhabdomere using a combination of theory and experiment. How does the microvillar architecture influence calcium dynamics (and therefore the many feedback reactions that rely on calcium transients)?

Aspects of this work are in collaboration with Dr. Boris Shraiman at the Kavli Institute for Theoretical Physics (UC Santa Barbara) and with Dr. Alain Pumir at the Centre National de la Recherche Scientifique (Université de Nice).