
WO1993014465A1 - Prediction of the conformation and stability of macromolecular structures - Google Patents

Prediction of the conformation and stability of macromolecular structures

Info

Publication number
WO1993014465A1
Authority
WO
WIPO (PCT)
Prior art keywords
conformation
energy
freedom
peptide
probability
Prior art date
Application number
PCT/US1993/000418
Other languages
English (en)
Inventor
Christopher Lee
Original Assignee
The Board Of Trustees Of The Leland Stanford Jr. University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Board Of Trustees Of The Leland Stanford Jr. University
Publication of WO1993014465A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K1/00: General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20: Protein or domain folding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • This invention relates to methods for determining the stability and conformation of molecular systems. The method also predicts the effects of mutations on the structure and the stability of the molecular system.
  • a peptide is an oligomer of amino acids attached in a linear sequence to form, for example, a protein or an enzyme.
  • Peptides consist of a main chain backbone having the following general pattern:
  • The primary sequence of a peptide represents the sequence of the constituent amino acids such as, for example, NH2-Glu-Ala-Thr-Gly-OH (SEQ ID NO:1) (the three-letter symbols represent amino acid residues).
  • The NH2- and -OH moieties represent the amino and carboxyl termini of the peptide, respectively, and also indicate the directionality of the peptide chain.
  • The peptide's secondary structure represents the complex shape of the main chain and generally indicates structural motifs of different portions of the peptide. Common secondary structures include, for example, alpha-helices and beta-sheets.
  • The tertiary structure of a peptide represents the three-dimensional structure of the main chain, as well as the side-chain conformations.
  • Tertiary structure is usually represented by a set of coordinates that specify the positions of each atom in the peptide main chain and side-chains and is often visualized using computer graphics or stereopictures.
  • quaternary structure represents the three-dimensional shape and the interactions that occur between different peptide chains, such as between subunits of a protein complex.
  • Non-amino acid fragments are often associated with a peptide. Such fragments can be covalently attached to a portion of the peptide or attached by non-covalent forces.
  • Non-amino acid moieties include, but are not limited to, heavy metal atoms such as, for example, single molybdenum, iron, or manganese atoms, or clusters of metal atoms, nucleic acid fragments (such as DNA, RNA, etc.), lipids, and other organic and inorganic molecules (such as hemes, cofactors, etc.).
  • the three-dimensional complexity of a peptide arises because covalent bonds in each amino acid can rotate.
  • The conformation of a peptide is a particular three-dimensional arrangement of atoms and, as used herein, is equivalent to its tertiary structure.
  • The conformation of an amino acid side-chain is the three-dimensional structure of the side-chain.
  • An amino acid side-chain can assume many different conformations, with the exception of glycine, which assumes only one.
  • Peptide folding and structure prediction has traditionally been viewed as a very complex problem because of the large number of atoms in a typical peptide.
  • The large size of a peptide chain, in combination with its large number of degrees of freedom, allows it to adopt an immense number of conformations.
  • A relatively small polypeptide of 100 residues has 3^100 possible conformations, considering only three possible conformational states for each residue.
  • Despite the multitude of possible conformations, many peptides, even large proteins and enzymes, fold in vivo into precise three-dimensional structures.
  • the peptide generally folds back on itself creating numerous simultaneous interactions between different parts of the peptide.
  • the principal difficulty of predicting side-chain conformations is the enormous size of the conformation space (i.e., number of possible combinations of side-chain conformations) .
  • A peptide having n different χ torsions produces up to 36^n conformationally distinct peptides. For example, a five-residue peptide with a total of ten χ torsions has 3.7 × 10^15 possible conformations that need to be evaluated to determine the low-energy conformations.
  • A prior strategy to optimize the structure of a peptide decreases the number of conformational permutations by limiting the number of conformations allowed for each side-chain (see Ponder and Richards, J. Mol. Biol. (1987) vol. 193, pg. 775, which is incorporated by reference for all purposes).
  • This method allows each side-chain to exist in only a small number of predetermined rotamers, typically three to seven, and forbids free rotation of each amino acid torsion. Thus, for a five amino acid peptide where each side-chain is constrained to five rotamers, there are only 3125 possible permutations.
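
The conformation counts quoted above follow from simple exponentiation. The short sketch below reproduces them; the function name is illustrative and not taken from the source.

```python
def count_conformations(states_per_torsion: int, n_torsions: int) -> int:
    """Number of combinations when each torsion independently takes a fixed number of states."""
    return states_per_torsion ** n_torsions

# Ten chi torsions sampled at 36 states each (the 36^n case in the text).
grid_count = count_conformations(36, 10)
# Five residues restricted to five rotamers each.
rotamer_count = count_conformations(5, 5)
# One hundred residues with three states each.
backbone_count = count_conformations(3, 100)

print(f"36^10 = {grid_count:.2e}")
print(f"5^5   = {rotamer_count}")
print(f"3^100 = {backbone_count:.2e}")
```

The rotamer restriction shrinks the search space by roughly twelve orders of magnitude for this five-residue example, which is the motivation the text gives for the rotamer library approach.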
  • Reid and Thornton (Proteins (1989) vol. 5, pg. 170, which is incorporated by reference for all purposes) used this method to predict side-chain conformations of flavodoxin with an overall root-mean-square (r.m.s.) deviation of 2.41 Å compared with the X-ray crystal structure. They started from the alpha carbon coordinates alone, using computational methods to predict main-chain atoms, and manual examination and adjustment using computer graphics to predict the side-chain conformations.
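
For readers unfamiliar with the r.m.s. deviation used to score such predictions, a minimal sketch follows. It assumes both coordinate sets list the same atoms in the same order and already share a coordinate frame (no superposition step is performed); the function name and the toy coordinates are illustrative only.

```python
import math

def rms_deviation(predicted, reference):
    """Root-mean-square distance between matched atoms, in the units of the input (e.g. angstroms)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xp, yp, zp), (xr, yr, zr) in zip(predicted, reference):
        total += (xp - xr) ** 2 + (yp - yr) ** 2 + (zp - zr) ** 2
    return math.sqrt(total / len(predicted))

# Toy example: every atom is displaced by 2 units along z.
toy_rmsd = rms_deviation([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                         [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)])
print(toy_rmsd)
```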
  • A third approach to predicting peptide structure is exemplified by Karplus et al. (Proc. Nat. Acad. Sci. USA (1989) vol. 86, pg. 8237, which is incorporated by reference for all purposes), who use a multiterm potential energy function to calculate the interaction energy between atoms in the protein.
  • The minimization method used molecular dynamics and, as a method for predicting peptide structure, this method relies on a detailed structure as a starting point. Like the other methods, this method has an exponential dependence on the number of atoms considered in the calculation.
  • the present invention provides a new method that combines an explicit focus on structure prediction with ensemble methods more suited to calculation of energies.
  • the ensemble methods which are rooted in thermodynamic formalisms provide accurate predictions of mutant thermostability.
  • One aspect of the present invention involves a method for using a computer having a memory to compare the physical stability of a first molecular system and a second molecular system. Both molecular systems have one or more degrees of freedom and a plurality of conformations for each of the one or more degrees of freedom.
  • The method includes the following steps: a) preparing a geometric representation of the first molecular system, the geometric representation having a defined initial structure; b) assigning probabilities to each of the plurality of conformations of each of the degrees of freedom; c) repeatedly adjusting the conformation of each degree of freedom according to the probabilities assigned to the conformations of each degree of freedom, and, after each adjustment, determining an energy of each conformation of each degree of freedom in a field, the field associated with each conformation of each degree of freedom being caused by the conformations of the remaining degrees of freedom; d) replacing the probability assigned to each of the plurality of conformations of each degree of freedom, the probability determined from the energy of each conformation; e) repeating steps c and d until the energy of each conformation of each degree of freedom converges to a substantially unchanging value, that value corresponding to the physical stability of said first molecular system; f) repeating steps a through e for said second molecular system to determine the physical stabilities of both the first
  • Conformation energy and probability maps are employed to determine the packing energy of a macromolecular structure. This can be accomplished by first preparing a geometric representation of the macromolecular structure and dividing it into one or more residues, each of the residues having a side-chain and torsion angle degrees of freedom. Next, an initial conformation probability map for each of the residues is prepared. At least one of the residues is then moved to a new conformation, although in most instances many residues will be moved to a new conformation. The residues' conformation probability maps are used in determining an average conformation energy map for each of the residues. Next, each residue's average energy map is used to prepare a new conformation probability map to replace the previous conformation probability map of that residue. The whole process of moving the residues to new conformations and determining average conformation energy maps is then repeated until the average conformation energy map converges. Finally, the packing energy of the macromolecular structure is determined from the average conformation energy maps of each residue.
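
The iterate-until-converged loop described above can be illustrated with a small self-consistent calculation. The sketch below is not the patented procedure: instead of repeatedly moving residues and averaging sampled energies, it computes the average energy directly as an expectation over the other residues' probabilities, and it assumes Boltzmann weighting for the step that turns energies into probabilities. The function names, the `kT` value, and the two-residue toy energies are all assumptions made for illustration.

```python
import math

def boltzmann(energies, kT):
    """Convert a list of energies into normalized probabilities (assumed Boltzmann weighting)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_field(self_energy, pair_energy, kT=0.6, tol=1e-6, max_cycles=1000):
    """Iterate probability maps and average energy maps until the energies stop changing."""
    n = len(self_energy)
    probs = [[1.0 / len(s)] * len(s) for s in self_energy]  # uniform initial probability maps
    energies = [list(s) for s in self_energy]
    for _ in range(max_cycles):
        largest_change = 0.0
        for i in range(n):
            new_e = []
            for a in range(len(self_energy[i])):
                # Field on conformation a of residue i caused by all other residues.
                field = 0.0
                for j in range(n):
                    if j == i:
                        continue
                    for b, p_jb in enumerate(probs[j]):
                        field += p_jb * pair_energy(i, a, j, b)
                new_e.append(self_energy[i][a] + field)
            change = max(abs(x - y) for x, y in zip(new_e, energies[i]))
            largest_change = max(largest_change, change)
            energies[i] = new_e
            probs[i] = boltzmann(new_e, kT)  # replace the old probability map
        if largest_change < tol:
            return probs, energies, True
    return probs, energies, False

def clash(i, a, j, b):
    """Toy pair term: the two residues collide when both occupy conformation 0."""
    return 2.0 if a == 0 and b == 0 else 0.0

# Residue 0 mildly prefers conformation 0; residue 1 strongly prefers conformation 0.
probs, energies, converged = mean_field([[0.0, 0.5], [0.0, 3.0]], clash)
print(converged, probs)
```

In this toy system the residue with the stronger intrinsic preference keeps conformation 0 and the other residue is pushed into conformation 1, which is the kind of mutual accommodation the packing calculation is meant to capture.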
  • a preferred method predicts the three dimensional conformation of a peptide.
  • The method utilizes the understanding that amino acid side-chains of a peptide adopt conformations that maximize favorable atom-atom contacts and minimize unfavorable contacts. With this principle, the method determines the energy of atom-atom interactions and adjusts the amino acid side-chain conformations to minimize this energy.
  • the invention is directed to a method for determining the three-dimensional structure of a peptide having amino acid side-chains extending from a defined main chain.
  • Each amino acid side-chain has predefined rotational degrees of freedom.
  • the present invention also provides a method for determining the time-average packing conformation of a macromolecular structure.
  • The method includes the following steps: a) preparing a geometric representation of the macromolecular structure; b) dividing the geometric representation into one or more structural zones; c) determining an initial conformation probability map for each of the structural zones; d) moving said geometric representation to a new conformation; e) determining an average conformation energy map for each of the structural zones from that zone's conformation probability map; f) replacing the conformation probability maps of each zone with new conformation probability maps determined from each zone's average energy map; and g) repeating steps d through f until said conformation probability map converges, the converged conformation probability map representing a time-average packing conformation of the macromolecular structure.
  • The present invention also provides a method for producing a peptide having a specified stability.
  • this method consists of the following steps: a) selecting a known peptide having a desired activity; b) generating a series of mutant peptide sequences from the known peptide by replacing one or more of its residues with different amino acid residues; c) determining the stability of each of said mutant protein structures by the following steps:
  • step d repeating steps iii and iv until the energy of each conformation of each degree of freedom converges to a substantially unchanging value, the sum of these values corresponding to the stability of said peptide; d) identifying a mutant peptide sequence from among said series of mutant peptide sequences having the specified stability; and e) synthesizing a peptide having the mutant peptide sequence identified in step d.
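
Steps b and d of this procedure (generating mutant sequences and selecting those that meet a stability target) can be sketched as follows. The stability calculation itself is not reproduced: `toy_stability` is a hypothetical stand-in scoring function, and the other names, the 0-based positions, and the threshold are likewise assumptions for illustration. The example sequence is the Glu-Ala-Thr-Gly peptide mentioned earlier, in one-letter code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 genetically coded amino acids

def single_point_mutants(sequence, positions, allowed=AMINO_ACIDS):
    """Yield every sequence that differs from `sequence` at exactly one of the given 0-based positions."""
    for pos in positions:
        for residue in allowed:
            if residue != sequence[pos]:
                yield sequence[:pos] + residue + sequence[pos + 1:]

def select_mutants(mutants, stability, threshold):
    """Keep mutants whose score is at or below the threshold (lower is taken to mean more stable)."""
    return [m for m in mutants if stability(m) <= threshold]

def toy_stability(sequence):
    """Hypothetical placeholder score; a real workflow would call the packing-energy calculation."""
    return -float(sequence.count("I"))

mutants = list(single_point_mutants("EATG", positions=[1], allowed="AVI"))
selected = select_mutants(mutants, toy_stability, threshold=-1.0)
print(mutants, selected)
```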
  • the present invention is also directed to synthetic peptide compositions which exhibit thermal stabilization by improved core packing.
  • Some subtilisin polypeptide mutants will exhibit improved stability when certain hydrophobic amino acids are substituted for other hydrophobic amino acids. It has been found that substituting isoleucine for valine at the 30, 180 and/or 192 sequence positions strongly increases the peptide stability.
  • a peptide is synthesized having the structure used to model a low energy or otherwise stable peptide.
  • the stable peptide structure is identified from among a group of structures that are modeled according to the above procedure. Each of these structures will have at least one amino acid that is different from a corresponding amino acid in the other structures. At least one structure from among this group will be identified as having a suitable stability and thereafter synthesized by techniques that are well known in the art.
  • Fig. 1 illustrates the arrangement of atoms in a peptide backbone.
  • Fig. 2 illustrates (a) the general chemical structure of a naturally occurring amino acid, (b) the chemical structure of glycine, and (c) the chemical structure of proline.
  • Fig. 3a illustrates preferred sets of rotational degrees of freedom for each naturally occurring amino acid.
  • Fig. 3b illustrates the chemical structure of the five naturally occurring nucleotides.
  • Fig. 4 schematically shows the torsion about a carbon-carbon single bond.
  • Fig. 5 illustrates a digital computer system that may be used to implement some aspects of the present invention.
  • Fig. 6a shows the procedure used to load the main chain coordinates and amino acid sequence of the peptide to be modeled.
  • Fig. 6b schematically illustrates the set-up and precalculation steps employed in some methods of the present invention.
  • Fig. 6c schematically illustrates the preparation of lists of interactions between side-chains and main chains, and side-chains and other side-chains.
  • Fig. 6d schematically illustrates the main program for calculating the peptide packing conformation.
  • Figs. 7a and 7b schematically show the bond between Cβ and the plane formed by C, Cα and N.
  • Figs. 7c and 7d schematically illustrate the torsion angle about the bond between Cβ and Cα.
  • Figs. 8a-d present various comparisons between the predicted results of the present invention and the experimental results for activity and stability of lambda repressor mutants.
  • Fig. 9a shows a histogram comparing the activity of various lambda repressor mutants based upon their packing energies as calculated by the present invention.
  • Fig. 9b shows a histogram comparing the activity of various lambda repressor mutants over the range of their volumes (relative to the wild-type in units of methylene groups) .
  • Fig. 10 presents a comparison of predicted side-chain coordinates for an eight residue molten zone surrounding mutations in lambda repressor.
  • Fig. 11 presents a comparison of the internal r.m.s. deviations of side-chain predictions from seven runs seeded with different starting structures.
  • Fig. 12 shows a contour plot of conformation space for the condensation of a single residue.
  • Figs. 13a-c illustrate the condensation of a six residue molten zone for wild-type lambda repressor.
  • Fig. 14 illustrates convergence of the total system energy as a function of iteration cycle for a six residue molten zone.
  • Figs. 15a-c present comparisons of simulated annealing and the method of the present invention: (a) on the basis of lowest energy conformation of flavodoxin versus number of cycles; (b) on the basis of increasing molten zone size and the final peptide energy; and (c) on the basis of the number of moves required for convergence.
  • Fig. 16 shows theoretical calculations of the free energy of folding for a series of hydrophobic core mutations in the protein barnase.
  • Fig. 17 shows a comparison of experimental and calculated binding free energies for 15-mer peptides binding to the S-protein fragment of pancreatic ribonuclease A.
  • A side-chain "rotamer" is a frequently observed rotational isomer for a residue, constituting a single, static conformation of that residue. This term has been used in some references to describe the limited set of isomers used to model peptide side-chain conformations.
  • Conformation map refers to a map of the conformations of a zone (region of structure) within a particular molecular system. For example, an amino acid residue can be considered to be a zone within a peptide.
  • the axes of a conformation map are the degrees of freedom associated with the particular zone or residue. For example, translational, rotational, torsional, and vibrational degrees of freedom may serve as axes.
  • a “conformation probability map” describes the probability that a residue is in a given conformation at any particular instant in time.
  • the conformation probability map will have an axis representing the probability associated with each conformation (dependent variable) .
  • Conformation energy map refers to a map describing the energy experienced by a given residue in each of its possible conformations, due to its interactions with other residues and molecules.
  • the conformation energy map will have axes corresponding to degrees of freedom in the zone. In addition, it will have an axis corresponding to the energy associated with each conformation.
  • Stability refers, in one sense, to the ability of a molecular system such as a polymer to remain in an active conformation when subjected to thermal or disruptive effects.
  • a conformation is active when it possesses at least one measurable property.
  • an enzyme may be considered active so long as it can act as a catalyst.
  • Stability also refers, in a thermodynamic sense, to molecular complexes having generally low free energies. Of course, the free energy is determined with respect to some base state such as the unfolded conformation or an unbound receptor.
  • Stability can represent the free energy of folding for a given peptide sequence, while in the case of two interacting molecules (e.g. an enzyme and its substrate), stability can represent the free energy of binding between them.
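
The two thermodynamic senses of stability above can be written compactly. The notation below is a conventional formulation added for illustration, not taken from the source: A and B denote the two separate binding partners, and the sign convention (more negative means more stable) is an assumption.

```latex
\begin{align}
\Delta G_{\text{fold}} &= G_{\text{folded}} - G_{\text{unfolded}} \\
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{A}} + G_{\text{B}} \right) \\
\Delta\Delta G &= \Delta G^{\text{mutant}} - \Delta G^{\text{wild type}}
\end{align}
```

Under this convention, comparing a mutant with its wild type reduces to the sign of the last quantity: a negative value would indicate that the mutation stabilizes the folded (or bound) state relative to the chosen base state.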
  • An increase in the physical stability of a peptide generally results in an increase in thermal stability, although it could result in an increase in binding affinity, stability in salt solutions, pH stability, and other environmental conditions.
  • Molecular System refers to a collection of atoms, at least some of which are covalently bonded to one another, interacting via a defined set of noncovalent forces. These forces may include van der Waals forces, hydrogen bonding, and electrostatic forces.
  • a molecular system refers to species in any chemical class such as organic compounds and inorganic compounds. It also refers to species in any phase, such as gas, liquid or solid phases.
  • Degrees of Freedom refers to the independent parameters that define the conformation of the molecular system. Common examples of degrees of freedom associated with motion include translation, rotation and vibration. A degree of freedom may also be used to describe independent ways that a molecular system may take up energy.
  • Macromolecular structure refers to molecules, sub-molecular groups, and complexes between one or more molecules or groups.
  • a macromolecular group will generally have a molecular weight of more than about 200, preferably more than about 500, and most preferably more than about 1000.
  • The macromolecular complex will typically have a main-chain or "backbone" which is a string of repeating molecular units.
  • the macromolecular complex will often possess a series of side-chains extending from the main-chain.
  • Examples of macromolecular complexes include proteins and large peptides alone or associated with other molecules such as cofactors, substrates, membranes, and cell structural organelles.
  • Macromolecular complexes will also include nucleic acids such as DNA and RNA, as well as these materials combined with materials such as histones, ribosomes, and polymerases.
  • “Mutant” refers to molecular systems that are expressed by a mutation (i.e. an alteration in the amount or arrangement of genetic material of a cell or virus) .
  • a mutant is any variant of a wildtype structure (i.e. the typical form of a biological molecule as it occurs in nature) in which one or more amino acids have been deleted or changed. In most instances, the mutant will retain most of the structural information of the parent wildtype structure.
  • mutant refers to any molecular system that has been modified to deviate from its native state. For example, a peptide containing amino acids that are not genetically coded is a "mutant".
  • a “geometric representation” refers to an abstract model of a real molecular system.
  • The geometric representation will have an arrangement of structural features and degrees of freedom that correspond to the real molecular system. Through manipulations on a computer or other means for rapidly evaluating equations, the geometric representation may be carried through a range of movements to explore the properties of a real molecular system in various conformations.
  • Converge refers to the state of an iterative process in which a result of the process remains substantially unchanged after each iteration. For example, if the process repeatedly calculates a conformation energy map, the process converges when the absolute magnitude of the energy values and the relative topography of the map remain substantially unchanged from one iteration to the next.
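
One way to make this convergence criterion concrete is to compare successive energy maps on both counts named above: the size of the energy changes and the ordering of conformations. The sketch below assumes an energy map stored as a dictionary from conformation (here a tuple of torsion angles in degrees) to energy; the tolerance and the use of rank order as a proxy for "relative topography" are illustrative choices, not values from the source.

```python
def has_converged(previous_map, current_map, tol=1e-3):
    """True when energies changed by less than tol and conformations keep the same energy ranking."""
    if previous_map.keys() != current_map.keys():
        raise ValueError("maps must cover the same conformations")
    max_shift = max(abs(current_map[k] - previous_map[k]) for k in current_map)
    same_ranking = (sorted(previous_map, key=previous_map.get)
                    == sorted(current_map, key=current_map.get))
    return max_shift < tol and same_ranking

before = {(60, 180): 1.0000, (180, 180): 0.2000, (300, 180): 2.5000}
after_small_step = {(60, 180): 1.0002, (180, 180): 0.2001, (300, 180): 2.4999}
after_large_step = {(60, 180): 0.1000, (180, 180): 0.2000, (300, 180): 2.5000}

print(has_converged(before, after_small_step))  # small shifts, same ordering
print(has_converged(before, after_large_step))  # large shift and a changed ordering
```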
  • One aspect of determining peptide conformation and energetics is the prediction of two basic classes of degrees of freedom.
  • The φ-ψ torsions, which determine the folding of the main chain atoms of the peptide, and the χ torsions, which specify the set of angles that defines the conformation of each amino acid side-chain. These two sets of variables are closely coupled, because of the tremendous importance of side-chain conformation and packing for the stability of the overall peptide conformation.
  • A preferred method of the invention determines the set of favored χ torsional conformations, holding the φ-ψ torsions substantially constant.
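
Each torsion mentioned above is defined by four consecutive atoms. As an illustration, the sketch below computes a signed torsion angle from Cartesian coordinates using the standard atan2 formulation; the helper names and test coordinates are assumptions, and the sign convention (positive when the far bond is rotated clockwise viewed along the central bond) is the common one rather than something stated in the source.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion about the p1-p2 bond, in degrees, in the range (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

cis = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
gauche = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(cis, trans, gauche)
```

For a χ1 angle, for instance, the four points would be the N, Cα, Cβ, and first γ-position atoms of the residue.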
  • Peptides fall into the general class of polymers and are simply molecules generated from a sequence of amino acid residues connected in series.
  • The peptide backbone, or main chain, consists of a repeated sequence of three atoms: an amide nitrogen N_i, the alpha carbon Cα_i, and the carbonyl carbon C_i, where i represents the amino acid in the peptide sequence.
  • The carbonyl oxygen, O_i, is attached to the carbonyl carbon, and hydrogens are attached to both the amide nitrogen and alpha carbon. In principle, rotation can occur around any of the three bonds of the peptide main-chain.
  • The bond between C_i and N_(i+1) has partial double bond character that inhibits its rotation and, in the absence of a strong force, Cα_i, C_i, O_i, and N_(i+1) lie in approximately the same plane.
  • The first carbon of the side-chain, which is attached to Cα_i, is the beta carbon, Cβ_i.
  • The beta carbon of each residue has a fixed position relative to the peptide main chain defined by Cα_i, C_i, and N_i.
  • The position of the main chain specifies the position of the first atom of each side-chain.
  • By beta carbon, or Cβ, we refer to the atom of a side-chain attached to Cα.
  • Cβ is a carbon
  • For glycine, Cβ is a hydrogen.
  • Each side-chain has unique physical and chemical properties, as is well known (see Creighton, "Proteins: Structure and Molecular Principles," W.H. Freeman and Company, New York, 1984, which is incorporated by reference for all purposes).
  • the side-chains of each amino acid can adopt a myriad of possible conformations, the number of which depends on the number of predefined rotational degrees of freedom.
  • Fig. 3a illustrates a preferred set of rotational degrees of freedom for each amino acid. As defined in this set, rotation about a methyl group which has, in theory, a three-fold rotation axis (taking hydrogen atoms into account) , and hydroxyl/sulfhydryl groups which have no rotational symmetry, are in most instances not included.
  • Structurally simple amino acids such as alanine and glycine, as well as the imino acid proline, consist of side-chains that have no rotational degrees of freedom, while the side-chains of more complex amino acids such as lysine and arginine have four.
  • the side-chains of alanine and proline all have one conformation.
  • amino acids in these categories include enantiomers and diastereomers of the natural D-amino acids, oxyproline, cyclohexylalanine, norleucine, cysteic acid, methionine sulfoxide, ornithine, citrulline, omega-amino acids such as 3-amino propionic acid, 4-amino butyric acid, etc. All such amino acids can be incorporated into peptides by suitable methods known in the art, and the structure of a peptide having these uncommon amino acids can be determined when the structure and properties of the uncommon amino acids are known.
  • amino acid includes all natural amino acids encoded by the genetic code, as well as uncommon natural amino acids and unnatural amino acids.
  • the invention method is suitable for determining the structure of poly-deoxyribonucleic acids (DNA) and poly-ribonucleic acids (RNA) , as well as protein-DNA and protein-RNA complexes.
  • the monomeric units of these biological polymers are the nucleotides, which are shown in Fig. 3b.
  • the five common, naturally occurring nucleotides adenine, guanine, cytosine, thymine, and uracil have the general structure consisting of a phosphate, a sugar, and a purine or pyrimidine base. Each of these nucleotides is planar, and has one rotational degree of freedom, as shown in Fig. 4.
  • nucleotide refers to the set of common and uncommon naturally-occurring nucleotides, as well as the set of unnatural nucleotides.
  • the sugar ring may be deoxyribose, ribose, or any suitable variation (such as, for example, in a 2-methyl nucleotide) .
  • the preferred method utilizes the three-dimensional structure of the peptide main chain as a starting point for predicting the conformation of the peptide side-chains.
  • X-ray or neutron diffraction (hereinafter referred to as "diffraction") provides a detailed picture of the three-dimensional positioning of the peptide main chain. Diffraction methods are well known (see, for example, Cantor et al. "Biophysical Chemistry Vol. III" (1980) W.H. Freeman & Co., San Francisco, chapter 13, which is incorporated by reference for all purposes).
  • Diffraction methods are based on the observation that many peptides crystallize into well-defined three-dimensional crystal lattices which scatter impinging X-ray or neutron irradiation. Collection and analysis of the scattered beams, in conjunction with other experiments, produces the three-dimensional structure of the crystal lattice. In the current state of peptide crystallography, obtaining the three-dimensional structure generally requires the use of auxiliary techniques such as isomorphous heavy metal replacement, multiple wavelength scattering, and anomalous scattering to supplement the collected scattered X-ray or neutron beam data (see Cantor et al. "Biophysical Chemistry Vol. III").
  • Coordinates for each atom of the peptide main chain are obtained once the electron density map of the peptide main chain has been solved.
  • The electron density map of the peptide generally has an associated correlation coefficient and resolution that represent the accuracy of the data and the amount of detail present, respectively.
  • In accurate high-resolution electron density maps, structural elements such as the coordinates of main chain and side-chain atoms are readily observed.
  • Low-resolution data generally include the position of the main chain atoms but do not, however, include side-chain positions.
  • The present method utilizes both high- and low-resolution diffraction data.
  • Other methods for determining the three-dimensional conformation of the peptide main chain suitable for use with the invention include, for example, nuclear magnetic resonance (NMR) spectroscopy and theoretical prediction.
  • Structural determination by NMR spectroscopy involves three steps: identification and assignment of resonance signals of the spectra to individual nuclei, inter-nuclei distance measurements, and computation of the structure.
  • Suitable NMR methods include, for example, one-dimensional proton (1H) NMR spectroscopy, which is used to identify individual protons in a peptide, two-dimensional 1H NMR methods (including correlated experiments which rely on J-coupling) which provide interproton relationships using through-bond coupling, and the Nuclear Overhauser Effect (NOE) experiments which provide spatial relationships using through-space information (see Griesing et al. J. Mag. Res. (1989), vol. 73, pg. 574).
  • NMR methods suitable for use with the present invention include the use of insensitive nucleus enhancement by polarization transfer (INEPT) , two-dimensional Nuclear Overhauser spectroscopy (NOESY) , reverse INEPT, totally correlated spectroscopy
  • Such methods will in some instances involve ab initio prediction of the main chain coordinates (such as the method of Finkelstein and Reva, 1991) , and in other instances involve interpretation of experimental data (e.g. X-ray diffraction results) to resolve the main chain coordinates.
  • the positions of all main chain atoms need not be initially determined.
  • The carbonyl carbon and oxygen, Cα, and the amide nitrogen are generally constrained to lie in a plane. With this constraint and the knowledge of the positions of some of these atoms, and the amino to carboxyl direction, the remaining atoms of the peptide main chain can be constructed as known in the art (see Kabsch, Acta Cryst. (1978) vol. A34, pg. 827, which is incorporated by reference for all purposes).
  • amino acids may be either L-optical isomers or D-optical isomers, but unless otherwise specified will be the naturally occurring L-amino acids.
  • Standard abbreviations for amino acids will be used, whether a single letter or three letters are used. The single letter abbreviations are included in Stryer, Biochemistry, 3rd
  • a primary sequence of the peptide is mapped onto this peptide conformation.
  • a primary sequence is mapped onto a main chain by assigning a side-chain to a particular main chain atom.
  • a glutamic acid side-chain, conventionally designated by the symbol E, is assigned to the first alpha carbon of the peptide, C₁α; an aspartic acid side-chain (symbol D) is assigned to the second alpha carbon, C₂α; a glycine (symbol G) side-chain to C₃α and C₄α; etc.
  • the three-dimensional position of Cβ for each side-chain is determined according to predefined relationships with the main chain backbone. Mapping of the primary sequence of the peptide onto the main chain backbone identifies the alpha carbons associated with each amino acid and positions Cβ for each residue in a predetermined position relative to the main chain backbone.
  • the primary sequence of a peptide represents the identity and sequence of the peptide's amino acids and may be obtained by techniques well-known in the art of peptide chemistry and molecular biology. Suitable methods for determining the primary sequence include, but are not limited to, direct determination from X-ray crystal data, peptide sequencing, and gene sequencing. Determination of a peptide's primary sequence from X-ray data consists of tracing the electron density map of the peptide and assigning the side-chains to each residue based on the electron density and knowledge of side-chain structure. A second and more conventional method of primary structure determination is peptide sequencing, which is well known in the art.
  • Edman degradation, which exemplifies peptide sequencing, removes a single amino acid from the amino terminus of the peptide without disrupting the peptide bonds between other amino acid residues.
  • Edman degradation generally uses phenyl isothiocyanate which reacts with the uncharged terminal amino group of the peptide to form a phenylthiocarbamoyl derivative.
  • a cyclic derivative of the terminal amino acid is released into the solution leaving the intact peptide shortened by one amino acid.
  • the liberated cyclic compound is a phenylthiohydantoin amino acid that is identified by chromatography (See Stryer "Biochemistry" (1975) W.H. Freeman & Co., pg.
  • Another peptide sequencing method uses isothiocyanate under different conditions to sequence the peptide from the carboxyl terminus (see Schlack et al. Z. Physiol. Chem. (1926) vol. 26, pg. 865; Bailey et al. Tech. Prot. Chem. II (1991) pg 115; and Boyd et al. Tet. Lett. (1990) vol. 27, pg. 3849; which are all incorporated by reference for all purposes).
  • Other methods of peptide sequencing include cyanogen bromide degradation, trypsin digestion, staphylococcal protease, etc., alone or in combination with the above described techniques, as is well-known in the art.
  • Gene sequencing is another common method for obtaining a peptide primary sequence. This method involves isolating the gene encoding the peptide, sequencing the gene, and converting the resulting four-nucleotide code of nucleic acids to the 20-amino acid code of peptides.
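By way of illustration only, the conversion from the four-nucleotide code to the 20-amino acid code can be sketched as follows. The sketch assumes the standard genetic code and a coding-strand DNA sequence read in frame from its first base; the function name `translate` and the example sequence are hypothetical and do not come from the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding-strand DNA sequence into one-letter amino acid codes."""
    dna = dna.upper()
    peptide = []
    # Read complete codons only; stop at the first stop codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)
```

For example, `translate("GAAGATGGCGGC")` yields the primary sequence `EDGG`, which could then be mapped onto the first four alpha carbons of a main chain.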
  • Insertion and expression of the library in a suitable host identifies host cells containing the vector containing the gene that encodes the peptide.
  • host cells can be isolated and their DNA isolated and sequenced.
  • Methods for sequencing genes are well known in the art (see, for example, Sambrook et al. "Molecular Cloning: A Laboratory Manual" 2d ed., (1989) Cold Spring Harbor Press, chapter 13, which is herein incorporated by reference). In general, two sequencing techniques are commonly used: the enzymatic method of Sanger et al. and the chemical degradation method of Maxam and Gilbert.
  • each nucleotide base in the oligonucleotide has an approximately equal chance of being the terminus, and each population consists of an equal mixture of oligonucleotide fragments of varying lengths.
  • This population of oligonucleotides is then resolved by electrophoresis under conditions that can discriminate between individual oligonucleotides differing in length by as little as one nucleotide.
  • the order of nucleotides along the DNA can be read directly from an autoradiographic image of the gel.
  • amino acid side-chains are mapped onto the main chain backbone.
  • mapping refers to the process of identifying the amino acid side-chains for each alpha carbon of a peptide main chain. This step is necessary to associate the correct side-chains with each residue's alpha carbon when only the main chain backbone structure is available. For example, in cases where the position of the main chain backbone structure is determined by low resolution crystallography, the identity of each residue is not obtained.
  • Use of gene sequencing can provide the primary sequence of the peptide, which is used to specify the amino acid side-chains attached to each alpha carbon on the main chain backbone.
  • a second aspect of sequence mapping involves specifying the three-dimensional position of the beta carbon for each side-chain.
  • the beta carbon for each amino acid has a predefined spatial relationship relative to the main chain atoms. This relationship is used when the position of the beta carbon is unknown.
  • conformation energy of a peptide or other molecular system can be modelled in many ways, ranging from potential energy functions having a single van der Waals interaction term, to potential energy functions having many terms that account for torsional biasing, electrostatic interactions, hydrogen bonding, hydrophobic interactions, entropic destabilization, cystine bond formation, and other effects.
  • r is the interatomic distance and r₀ and ε₀ are empirical parameters describing, respectively, the equilibrium interatomic distance and the depth of the energy well for the van der Waals interaction of the pair of atoms.
  • Other forms of this expression such as those involving different combinations of exponents may also be used.
  • Table 1 presents preferred values used in the preferred embodiment of the invention. These parameters may be optimized by a variety of means known to those of skill in the art. No attempt has been made to optimize the particular values shown because they gave excellent results.
  • hydrogen atoms attached to both main chain and side-chain atoms are preferably not included in this molecular representation. In order to compensate for this, the van der Waals radius of each atom that has attached hydrogens is slightly augmented.
  • the van der Waals force is an electrostatic interaction arising from an instantaneous asymmetric electron distribution, which causes a temporary dipole. This transient dipole induces a complementary dipole in a neighboring atom to stabilize the transient dipole. An instant later the dipoles are likely to be reversed, resulting in an oscillation and a net attractive force. At one extreme (as r tends to infinity), atoms do not interact and have no stabilizing or destabilizing effect on one another. At the other extreme (as r tends to zero) the electrostatic repulsion between atoms becomes strong and dominates other stabilizing effects. The Lennard-Jones potential becomes infinite, which physically corresponds to superimposing two atoms.
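The limiting behavior described above can be illustrated with a short sketch. The 12-6 form used here, E(r) = ε₀[(r₀/r)¹² − 2(r₀/r)⁶], is an assumption chosen because it has its minimum of −ε₀ at the equilibrium distance r₀; the function name `lennard_jones` and the numerical values in the usage note are hypothetical.

```python
def lennard_jones(r: float, r0: float, eps0: float) -> float:
    """Assumed 12-6 van der Waals energy with a well of depth eps0 at r = r0.

    r    : interatomic distance (angstroms)
    r0   : equilibrium interatomic distance (angstroms)
    eps0 : depth of the energy well (kcal/mol)
    """
    if r <= 0.0:
        # Superimposed atoms: the potential diverges.
        return float("inf")
    s6 = (r0 / r) ** 6
    return eps0 * (s6 * s6 - 2.0 * s6)
```

With `r0 = 4.0` and `eps0 = 0.1`, the energy is −0.1 at r = 4.0, strongly positive at r = 3.0, and close to zero at large separations.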
  • a torsional potential energy function models the interaction of linear four-atom sequences, such as Y-C-C-X.
  • See Streitwiser et al. "Organic Chemistry," 2d ed., Wiley & Sons, pg. 70 (1987) for a description of torsions about a carbon-carbon single bond in a Y-C-C-X sequence.
  • Fig. 4 schematically shows a torsion about a carbon-carbon single bond.
  • Fig. 4a is a stick representation of Y-C-C-X
  • Fig. 4b shows torsion χ in a view along the C-C bond.
  • Suitable torsional potentials have the form:
  • K is a constant that is typically about 1 to about 5 kcal/mol (preferably about 1.5 kcal/mol), n is 1-3, d is 0-360°, and χ is the torsion angle between the groups attached to the two central carbon atoms.
  • the magnitude of the interaction, K, depends on the individual identities of all groups attached to the central carbons. In general, when the atoms X and Y are large, K is also large.
  • the torsional potential for alkane bonds represents the tendency for groups attached to a central carbon-carbon single bond to adopt a trans or gauche conformation. The potential is applied to all rotational degrees of freedom for each amino acid residue, except for χ₂ of phenylalanine, tyrosine, histidine, and tryptophan. Since these involve an sp² hybridized carbon, they require a torsion potential that accounts for the two-fold rotational symmetry of the planar ring.
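A torsional term of this kind can be sketched as follows. The functional form E = (K/2)[1 + cos(nχ − d)] is an assumption; it was chosen so that, with K = 1.5 kcal/mol, n = 3 and d = 0, the energy reaches its maximum of 1.5 kcal/mol at the eclipsed arrangement (χ = 0°) and vanishes at the staggered arrangements (χ = 60°, 180°, 300°). The function name `torsion_energy` is hypothetical.

```python
import math


def torsion_energy(chi_deg: float, k: float = 1.5, n: int = 3, d_deg: float = 0.0) -> float:
    """Assumed torsional energy (kcal/mol) for torsion angle chi_deg in degrees.

    k     : barrier height in kcal/mol
    n     : periodicity (3 gives a three-fold potential)
    d_deg : phase offset in degrees
    """
    return 0.5 * k * (1.0 + math.cos(math.radians(n * chi_deg - d_deg)))
```

A two-fold potential for a planar ring could be obtained from the same sketch by passing `n=2`, again under the assumed functional form.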
  • $E_{\text{electrostatic}} = \dfrac{Z_A Z_B}{D\,r}$, where r is the interatomic distance between two charged atoms, A and B; $Z_A$ and $Z_B$ equal the respective charges on the two atoms; and D is the dielectric constant of the environment around atoms A and B.
  • the effective charge of an atom depends on its surrounding environment including such factors as, for example, pH, accessibility to water, the polarity of the solvent, and the presence of other charges.
  • Other types of electrostatic forces influence peptide structure as well.
  • dipole moments, which describe partial charges on atoms, occur in uncharged but polar groups of atoms.
  • the electrostatic potentials described by such dipole moments are well known and may be implemented as is known in the art.
  • Another type of primarily electrostatic interaction is the hydrogen bond, which occurs when a hydrogen atom is shared between a proton donor and a proton acceptor.
  • Hydrogen bonds stabilize pairs of polar moieties having hydrogen atoms to share and donate, such as between a serine hydroxyl group and the carbonyl oxygen of an amide group, or between an acid group such as the carboxyl of a glutamic acid and water.
  • the potential energy terms for both dipole and hydrogen bond interactions are well known in the art (see Cantor et al.).
  • Hydrophobic interactions are destabilizing noncovalent interactions between an atom having hydrophilic character and one having hydrophobic character. For example, large hydrophobic interactions occur between the polar, aqueous environment of the solvent and nonpolar residues of the peptide, such as valine, leucine, isoleucine, phenylalanine, etc.
  • hydrophobic interactions result in a tendency for nonpolar side-chains to avoid interaction with solvent.
  • Potential energy functions representing hydrophobic interactions are well known in the art and are used in some preferred embodiments to increase the prediction accuracy of hydrophobic side-chains that happen to be exposed to solvent on the surface of the peptide.
  • the physical stability of a peptide is modelled by a potential energy function having only van der Waals and torsional energy terms for simplicity. In other preferred embodiments, one or more of the previously-described energy terms are added.
  • the method of the present invention involves moving structural elements of molecular systems (e.g. peptide or nucleotide side-chains) to maximize favorable interactions and minimize unfavorable ones.
  • a conformation probability map is produced which represents the probability that the molecular system will reside in a particular conformation at any given time.
  • an "ensemble" of probable molecular conformations is produced that provides a substantially more accurate description of a molecular system than a static structure representation. This is because real molecular systems constantly move between a variety of conformations many of which are not accounted for by a static structure. Even if the static conformation chosen is energetically favorable, it will only represent the state of the molecular system over a fraction of time.
  • the method of the present invention also produces a conformation energy map representing the energy ensemble of the molecular system.
  • energetically favorable conformations of the molecular system can be quickly identified.
  • a major factor influencing the conformation of a structural element is the necessity of avoiding steric overlap.
  • One aspect of the present invention predicts energetically favorable conformations by minimizing the steric packing interactions. Of course, other influences such as electrostatic interactions are very important in some molecular systems and must therefore be included in some predictive method.
  • a preferred embodiment of the invention predicts time-averaged peptide side-chain positions by determining the relative steric energy of each conformation for each side-chain. Low energy side-chain conformations correspond physically to a peptide conformation having well-packed side-chains. Finding these side-chain conformations requires an efficient search and minimization strategy to locate energy minima in a very large conformation space.
  • a preferred minimization method resembles the molecular field theory reported by Finkelstein and Reva (1991), and, more generally, the Hartree-Fock self-consistent-field (SCF) methods (see Levine "Quantum Chemistry" (1983), pg. 256 et seq., Allyn and Bacon, Newton, Massachusetts, and Blinder Am. J. Phys. (1965) vol. 33, pg. 431, which are both incorporated by reference for all purposes). Finkelstein and Reva employed a similar molecular field approximation to select among prospective β-sheet foldings.
  • the SCF method for multi-electron atoms uses the approximation that the exact wave function of a higher atomic number atom or polyatomic molecule can be represented by a product of single-electron wave functions, and minimizes the variational integral with this approximate wave function.
  • the Hartree-Fock method first guesses a wave function. The method concentrates on a first electron, ignoring the positions of the remaining electrons and assuming that they form a static electronic distribution through which the first electron moves. In effect, the method time averages the instantaneous interactions between the first electron and the remaining electrons. This static electronic configuration produces a potential energy field. Solution of the one-electron Schroedinger equation with this potential energy function results in an improved orbital for the first electron.
  • the SCF method then calculates an improved orbital for each electron in the atom to give a full set of improved orbitals. To improve the orbital wave functions, the method repeats this entire procedure using this improved set of orbitals to further improve the orbitals. This is repeated until it converges to a "self-consistent" set of electron wave functions.
  • the inventive method minimizes the conformation energy of a macromolecular structure by minimizing the interaction energy of each side-chain in the potential energy field created by the macromolecular complex.
  • the inventive method uses approximate potential fields to bootstrap the solution.
  • a preferred embodiment of the inventive method begins by supplying a potential energy function for the macromolecular structure. Next, the interaction energies of the various elements (or residues) of the complex are calculated from the potential energy field created by the other elements. The elements are then moved about to form a variety of conformations for each element. After each move, the interaction energy is recalculated for the modified complex.
  • the interaction energies for each conformation of each element are averaged to form conformation energy maps for each element. These maps are then used to produce corresponding conformation probability maps for each element, and the cycle is completed.
  • the elements are moved through a variety of conformations in accordance with the probability maps constructed in the previous cycle. Conformation energy and probability maps are then produced as in the first cycle.
  • the interaction energy of the various elements will converge to a "self-consistent" or constant solution.
  • the conformation probability and energy maps associated with this solution represent ensembles of the macromolecular complex.
  • the probability map can be viewed as a representation of the time-average conformation of a given element.
  • the prediction problem may be recast in a very different way.
  • the thermal ensemble of conformations may be optimized to find the ensemble most likely for a given protein at a given temperature.
  • Such an approach might not only give a more realistic prediction of a protein's structure and energetics, but can also draw upon a rather different set of optimization techniques, founded in basic thermodynamics.
  • a preferred procedure of the present invention involves iterative thermodynamic refinement, which gradually condenses a protein's set of possible side-chain conformations into the most likely, self-consistent ensemble at a chosen temperature.
  • each residue i is assigned a conformational probability map $p_i(\chi_i)$, which records the time-fraction it spends in each of its possible conformations $\chi_i$:
  • the set of all residues' probability maps $P = \{p_1, p_2, \ldots, p_n\}$ specifies the state of the overall protein's ensemble, and permits the calculation of a potential of mean force, for example, the potential energy of a probe atom A
  • $E_i(\chi_i) = \sum_{\text{all res } j \neq i} \; \sum_{\text{all } \chi_j} p_j(\chi_j)\, E_{ij}(\chi_i, \chi_j)$
  • $E_{ij}(\chi_i, \chi_j)$ is the interaction between residue i (in conformation $\chi_i$) and residue j (in conformation $\chi_j$).
  • a residue's set of potential energies $E_i(\chi_i)$ over all its possible $\chi_i$ comprises its conformation energy map, and the set of all such $E_i(\chi_i)$ for all residues i comprises E, the mean-field.
  • the probability map set P specifies a unique mean-field E.
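The way a probability map set P determines a mean-field E can be sketched as follows, under the reading that the mean-field energy of residue i in a given conformation is the probability-weighted sum of its pairwise interaction energies with every conformation of every other residue. This sketch evaluates the sum exhaustively rather than by sampling, and the names `mean_field`, `prob` and `pair_energy` are hypothetical.

```python
def mean_field(prob, pair_energy):
    """Compute energy[i][a], the mean-field energy of residue i in conformation a.

    prob[j][b]               : probability that residue j occupies conformation b
    pair_energy(i, a, j, b)  : interaction energy between residue i in conformation a
                               and residue j in conformation b (caller-supplied)
    """
    n = len(prob)
    energy = []
    for i in range(n):
        row = []
        for a in range(len(prob[i])):
            e = 0.0
            for j in range(n):
                if j == i:
                    continue  # a residue does not interact with itself
                for b, p in enumerate(prob[j]):
                    e += p * pair_energy(i, a, j, b)
            row.append(e)
        energy.append(row)
    return energy
```

Because the result depends only on `prob` and the fixed pairwise energies, a given probability map set always yields the same mean-field.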
  • the probability of each particular conformation may be determined by many forms known to those of skill in the art. However, it should depend directly upon the unique mean field.
  • a preferred method of determining the probability associated with each conformation derives from the statistical mechanical canonical ensemble.
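A probability assignment derived from the canonical ensemble can be sketched as follows, assuming the usual Boltzmann form p(χ) = exp(−E(χ)/kT) / Σ exp(−E(χ′)/kT) applied to one residue's conformation energy map. The function name `boltzmann_probabilities` is hypothetical, and energies are assumed to be in kcal/mol.

```python
import math

K_BOLTZMANN = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature):
    """Convert one residue's conformation energies into normalized probabilities."""
    kt = K_BOLTZMANN * temperature
    # Subtracting the minimum energy avoids overflow and leaves the ratios unchanged.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

At very high temperatures the returned map approaches a uniform distribution, while at low temperatures it concentrates on the lowest-energy conformations.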
  • the thermal ensemble representing the correct time-averaged structure for the protein at equilibrium
  • Self-consistent solutions obtained by this procedure provide a predictive model of the protein's thermal ensemble. In general any starting ensemble will converge to a self-consistent ensemble. However, this does not guarantee convergence to the ensemble representing the native state, as there might be multiple solutions to the ensemble prediction problem, which confound its search for the desired native structure.
  • each residue should be sampled in each possible conformation between about 5 and about 20 times on average to calculate $E_i(\chi_i)$. In most preferred embodiments, only about 8 to about 10 samples are necessary.
  • the conformations are generated randomly by selecting a conformation $\chi_i$ for each residue i according to its $p_i(\chi_i)$, interspersing simple step moves (in which the residue moves by a slight perturbation from its current conformation) with occasional jump moves (in which it can move to any conformation).
  • this sampling procedure may be seeded with a randomly selected conformation which will be referred to herein as the "starting structure".
  • Jump moves, though computationally more expensive in this respect, ensure that the sampling procedure can cross energetic barriers to give uniform sampling across the conformation space. Combined, these moves provide an efficient and comprehensive method for sampling the mean-field.
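The interleaving of step moves and jump moves can be sketched as follows for a single residue whose conformations are indexed 0 to n−1. The window size, the jump fraction, and the periodic wrap-around of the index are assumptions made for illustration; the function name `propose_move` is hypothetical.

```python
import random


def propose_move(current, probs, step_window=1, jump_fraction=0.1, rng=random):
    """Select a new conformation index for one residue.

    current       : current conformation index
    probs         : the residue's conformation probability map
    step_window   : a step move may reach indices within this distance (assumed)
    jump_fraction : fraction of moves that are jump moves (assumed)
    """
    n = len(probs)
    if rng.random() < jump_fraction:
        # Jump move: any conformation may be chosen.
        candidates = list(range(n))
    else:
        # Step move: only conformations near the current one, wrapping periodically.
        candidates = sorted({(current + k) % n for k in range(-step_window, step_window + 1)})
    weights = [probs[c] for c in candidates]
    if sum(weights) <= 0.0:
        return current  # nothing reachable has nonzero probability
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes a sampling run reproducible.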
  • the potential energy of each residue is calculated, and added to the running average of the potential energy of the current conformation.
  • the average $E_i(\chi_i)$ is used to calculate a new $p_i(\chi_i)$ for each residue.
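The running average of sampled energies for each conformation can be maintained incrementally, as in the following sketch. It uses a plain (unweighted) running mean for simplicity; the class name `EnergyMap` and its method names are hypothetical.

```python
class EnergyMap:
    """Running average of sampled energies for each conformation of one residue."""

    def __init__(self, n_conformations):
        self.mean = [0.0] * n_conformations
        self.count = [0] * n_conformations

    def add_sample(self, conformation, energy):
        """Fold one sampled energy into the running average for this conformation."""
        self.count[conformation] += 1
        self.mean[conformation] += (energy - self.mean[conformation]) / self.count[conformation]
```

At the end of a cycle, `mean` holds the sampled conformation energy map from which a new probability map can be computed.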
  • a variety of starting conformations may be employed in the present invention. These will preferably take the form of a geometric representation of the peptide stored within a computer memory. In many instances, the initial conformation of the geometric representation will have each side-chain randomly oriented without regard to neighboring side-chains. Not only does this provide the most strenuous test of the method's predictive power, but it is also generally good practice for prediction of unknown structures: when little is known about a structure, it is generally better to start unbiased than to employ a generic bias that might exclude correct answers. However, a completely random initial structure contains no information about the actual ensemble and, therefore, results in a computationally expensive procedure. In instances where some information is known about the starting structure (from sources such as X-ray diffraction or other models, for example), it will sometimes be advantageous to use that information to construct a starting structure. This approach will often result in considerable savings in computation time.
  • the conformation probability map is uniform. Physically, this corresponds to a very high or nearly infinite temperature. At such temperatures the thermal motion of the peptide will overwhelm the steric pressure that promotes ordered packing.
  • At very high temperatures, all conformations have about equal probability, while at room temperature the ensemble is sharply focused into a small peak of conformations that represent the native state. The prediction problem, then, corresponds to condensing down the diffuse probability map of the T → ∞ starting ensemble into a sharp peak. It is important that this occur smoothly and gradually.
  • the constant-condense method automatically gives a constant, controllable amount of condensation of the $p_i(\chi_i)$ in each cycle.
  • the thermal factor kT in the Boltzmann probability equation is replaced with an effective temperature τσ, where σ is the standard deviation of the current mean field $E_i(\chi_i)$ (over all $\chi_i$), and where τ is a constant "thermal" factor controlling the rate of condensation (the larger τ is, the slower the condensation).
  • This has the effect of scaling the effective temperature to the "natural dimension" of the mean-field energy distribution (e.g., conformations with energy one standard deviation above the mean will be assigned probability $e^{-1/\tau}$ lower, two standard deviations above gives probability $e^{-2/\tau}$ lower, etc.).
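The constant-condense scaling can be sketched as follows for one residue, assuming that the Boltzmann factor kT is replaced by the product of a constant thermal factor and the standard deviation of that residue's current conformation energies. The use of the population standard deviation and the name `constant_condense` are assumptions.

```python
import math
import statistics


def constant_condense(energies, tau):
    """Probabilities for one residue using an effective temperature tau * sigma."""
    sigma = statistics.pstdev(energies)  # spread of the current mean-field energies
    if sigma == 0.0:
        # All conformations are equally favorable: the map stays uniform.
        return [1.0 / len(energies)] * len(energies)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (tau * sigma)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

Because the energies are divided by their own standard deviation, multiplying every energy by a constant leaves the resulting probabilities unchanged, so the amount of condensation per cycle is governed by the thermal factor alone.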
  • Two other cooling procedures are reciprocal and linear thermal cooling. Since the constant-condense effective temperature τσ differs for each residue (via its dependence on σ), this method departs from correct thermodynamics in that it does not model an ensemble equilibrated to a uniform temperature.
  • Two cooling methods that do employ a uniform temperature to cool gradually from 6000°K to 298°K are reciprocal cooling, which sets the temperature proportional to 1/icyc (where icyc is the number of cycles done so far), and linear cooling, which simply reduces T linearly and then allows the ensemble to equilibrate over several cycles (10) at the final temperature. Both methods gave essentially the same structure and energetic predictions as the constant-condense procedure.
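The two uniform-temperature schedules can be sketched as follows. The constant of proportionality in the reciprocal schedule (taken here as the starting temperature) and the floor at the final temperature are assumptions, as are the function names; the linear schedule requires at least two cooling cycles.

```python
def reciprocal_schedule(t_start, t_final, n_cycles):
    """Temperature proportional to 1/icyc, never dropping below t_final (assumed floor)."""
    return [max(t_final, t_start / icyc) for icyc in range(1, n_cycles + 1)]


def linear_schedule(t_start, t_final, n_cool, n_equilibrate):
    """Linear cooling over n_cool cycles, then n_equilibrate cycles held at t_final."""
    step = (t_start - t_final) / (n_cool - 1)
    cooling = [t_start - step * i for i in range(n_cool)]
    return cooling + [t_final] * n_equilibrate
```

Under these assumptions, `reciprocal_schedule(6000.0, 298.0, 15)` starts at 6000 and ends at 400, while `linear_schedule(6000.0, 298.0, 15, 10)` produces 25 temperatures, the last ten of which equal 298.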
  • Energetics predictions were derived from these calculations simply by tracking the average total energy of the system until it converged to an unchanging value, and using the average total energy in the final cycle as the predicted "packing energy" for the peptide.
  • the energy for different mutant peptides was found to condense at identical rates, so a constant number of iteration cycles will preferably be used for all the calculations.
  • fifteen cycles were used for constant-condense or reciprocal cooling runs, and fifteen cooling plus ten final equilibration cycles for linear cooling runs. The latter runs were given extra equilibration cycles at the final temperature (298°K) , because the energy was still not converged at the end of the linear cooling cycles.
  • an ensemble may be generated according to the present invention by starting with a static structure that corresponds to a known conformation, using information obtained from X-ray diffraction or other techniques.
  • the present invention can be used to take a known static structure and convert it to a more complete ensemble of structures and associated energies.
  • the geometric representation of the peptide will be heated rather than cooled to produce the thermal ensemble.
  • the starting temperature used in the probability expression will be well below infinity and the conformation probability map will be well defined (as opposed to uniform) at the beginning of the procedure.
  • a heating procedure may also provide greater overall accuracy when the peptide to be modelled contains many residues that are not well-packed (i.e.
  • the invention may be embodied on a digital computer system such as the system 100 of Fig. 5, which includes a keyboard 102, a fixed disk 104, a display monitor 104, an input/output controller 106, a central processor 108, and a main memory 110.
  • the various components communicate through a system bus 112 or similar architecture.
  • the user enters commands through keyboard 102; the computer displays images through the display monitor 104, such as a cathode ray tube or a printer.
  • an appropriately programmed computer such as a Silicon Graphics Iris 4D/240GTX is used.
  • Other computers may be used in conjunction with the invention. Suitable computers include mainframe computers such as a VAX (Digital Equipment Corporation,
  • the internal processes of the prediction method generally consist of a setup routine, which loads data and performs preliminary data analysis, and a minimization routine that minimizes the conformation energy of the peptide. These processes will be described in detail with reference to the flow charts in Fig. 6.
  • Data for each residue type may be stored in a residue description having the following form:
  • This residue description contains four major sections.
  • the next section describes the atoms by type, the movement order, and the van der Waals (Lennard-Jones) constants. For example, the entry:
  • the third section of the data describes the bond lengths and bond angles of the residues in a local frame of reference. For example, the entries,
  • CA 1.48 specifies that the bond length between the amide nitrogen and Cα is 1.48 angstroms. "ang ILE C ILE CA ILE CB
  • the fourth section defines the three dimensional angular relationships between different atoms.
  • The "twist" relationship is shown in Fig. 7 and describes the relationship of an atom with respect to a set of three other atoms.
  • In Fig. 7a, the three atoms N, Cα and C of the ILE residue uniquely define a plane that passes through the atoms.
  • "Twist" defines the angle that the fourth atom makes with this plane, as shown in Fig. 7b.
  • "tor" is shown in Fig. 7c-d and defines the torsional angle between the atoms N and CG1, about the bond formed by CA and CB.
  • <dof> indicator which specifies that there is a rotational degree of freedom between atoms CA and CB.
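A torsional angle of the kind defined by "tor" can be computed from four sets of Cartesian coordinates, as in the following sketch. It uses the standard dihedral-angle construction (projecting the two outer bonds onto the plane perpendicular to the central bond); the sign convention and the function name `torsion_angle` are assumptions, and the "twist" relationship is not covered.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees between atoms p0 and p3 about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For an isoleucine residue, passing the coordinates of N, CA, CB and CG1 in that order would give the torsion about the CA-CB bond.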
  • K is preferably 1.5 kcal/mol
  • n is preferably 3.0
  • d is preferably 0.0, indicating a three-fold torsional potential having a maximum potential energy of 1.5 kcal/mol for a fully eclipsed structure.
  • the initial data, which represent the main chain conformation, include data for each atom in the peptide main chain such as the three-dimensional position and its chemical identity (for example, whether the atom is a carbon, nitrogen, oxygen, etc.). Such data come from a variety of sources. As described above, diffraction, NMR, theoretical prediction or another suitable method may provide the main chain coordinates and identities.
  • the main chain coordinates for peptides described herein, however, are derived from the Standard Brookhaven Protein Data Bank (PDB) , which is well known in the art.
  • PDB Standard Brookhaven Protein Data Bank
  • a computer program was written in C.
  • the main calculations consist of iterating the mean-field calculation over a set number of thermal cycles sufficient to allow the energy and maps to converge to a final, unchanging answer.
  • the van der Waals interactions for each residue with all fixed atoms are precalculated for all its possible side-chain conformations (the energy calculations are described below) .
  • lists of side-chain atoms which can come within a nonbonded cutoff distance (6 Å) of each other are compiled prior to the main calculations.
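Compiling such a list ahead of the main calculations can be sketched as follows. The sketch assumes that, for each atom, every position it can occupy over its side-chain conformations has already been enumerated; it does not exclude pairs belonging to the same residue. The names `can_interact` and `build_pair_list` are hypothetical.

```python
import math
from itertools import combinations


def can_interact(positions_a, positions_b, cutoff=6.0):
    """True if any allowed position of one atom lies within cutoff of any of the other's."""
    return any(math.dist(p, q) <= cutoff for p in positions_a for q in positions_b)


def build_pair_list(atom_positions, cutoff=6.0):
    """Return index pairs (i, j), i < j, of atoms that can come within the cutoff.

    atom_positions[k] lists every (x, y, z) position that atom k can occupy.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(atom_positions)), 2)
        if can_interact(atom_positions[i], atom_positions[j], cutoff)
    ]
```

Pairs absent from the list can then be skipped entirely during the energy calculations.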
  • a preferred method of data input is described with reference to Fig. 6a.
  • an optional process step 212 supplies the coordinates and the atom types for the missing main chain atoms.
  • Such a process calculates, based on the positions of Cα and the amino to carboxyl direction, the positions of the carbonyl carbon and oxygen, and the amino group.
  • the primary sequence of the peptide is mapped onto the main chain.
  • the primary sequence is merely a sequence of data representing the amino acid sequence.
  • the mapping associates an amino acid side chain with each Cα, as shown in step 214. Referring to Fig.
  • step 300 the coordinates of the main chain atoms are used as input to determine the position of Cβ for each residue (step 302). Pairwise interaction tables are calculated (step 304), initial conformations are assigned to each side-chain (step 306), and the steric energy is calculated for this initial conformation (step 308).
  • a group of computer programs written in C is used to perform set up and execution of the method of this invention. These routines were compiled and run on a Silicon Graphics Iris 4D/240GTX computer. The main program was employed to perform the self-consistent mean-field calculations.
  • the main program ("cara") provides conformation energy maps and conformation probability maps for the test peptide at the final temperature of the run. As described above, this information can be used to determine the packing or binding energy of the system being investigated.
  • the main program reads a binary input file produced by another routine ("readpro").
  • the information used by readpro to create the binary input file is taken from three files.
  • a coordinate file is used to supply a list of the peptide atoms together with their Cartesian coordinates.
  • One source of such lists is the Brookhaven Protein Data Bank (PDB) .
  • a data file (plib), which is described in detail above, provides, among other information, the types of atoms in each of the amino acids, their movement order and van der Waals constants.
  • a routine known as resmap.lib describes an envelope or range of movement available to the various atoms of each side chain by virtue of the side chain torsional degrees of freedom.
  • Plib and the PDB data are used by another routine ("applib") to create text describing the information contained in plib and the PDB. Collection of this information is coordinated by another routine ("upset"), a routine contained in a listing of auxiliary files ("auxfiles").
  • the output of applib is used by "makegen” to convert the text information from applib into local frames of reference for each degree of freedom.
  • This information together with the output of resmap.lib is used by readpro (described above) to produce the binary data file used by the main routine.
  • Another routine (“psizer") determines the size of the files being sent to readpro and allocates memory sufficient to store this information.
  • A preferred embodiment of the invention uses look-up tables to tabulate pairwise interactions between atoms in the peptide.
  • The first look-up table lists side-chain/main-chain atom interactions, while the second table lists side-chain/side-chain interactions. These lists reflect the notion that atoms in separate three-dimensional areas of the peptide do not interact and, thus, should not be considered.
  • Construction of the first list begins by classifying atoms as moving or stationary. Stationary atoms are not moved during the minimization and include all main chain atoms, as well as Cβ atoms, and any other atoms in the peptide (including any desired side-chains) that are held fixed in space during the minimization.
  • A pairwise list of the moving atoms that could interact with main chain atoms is generated by moving the side-chains through their possible positions and tabulating atom pairs that can come within ⁇ k of each other.
  • This pairwise interaction list is a boolean list that indicates which atoms could possibly interact with other moving atoms.
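The construction of such an interaction list can be sketched as follows. This is a minimal illustration rather than the routine used in the programs described above: the function name, the data layout, and the `cutoff` parameter are hypothetical, and the cutoff distance must be supplied by the caller because its value is not legible in the text above.

```python
import math

def build_interaction_list(moving_positions, fixed_atoms, cutoff):
    """Tabulate (moving atom, fixed atom) pairs that can ever come close.

    moving_positions: dict mapping a moving atom's name to the list of
        (x, y, z) positions it visits as its side chain is rotated
        through its allowed torsions (hypothetical layout).
    fixed_atoms: dict mapping a stationary atom's name to its (x, y, z).
    cutoff: distance threshold, in the same units as the coordinates.
    """
    pairs = set()
    for moving_name, positions in moving_positions.items():
        for fixed_name, fixed_xyz in fixed_atoms.items():
            # A pair is kept if ANY sampled position falls within the cutoff.
            if any(math.dist(p, fixed_xyz) <= cutoff for p in positions):
                pairs.add((moving_name, fixed_name))
    return pairs
```

Pairs absent from the returned set never need to be evaluated during the energy calculation, which is the purpose of the look-up tables.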
  • Each thermal cycle consists of setting all moving residues to a starting structure, and calculating the energy for a large number of moves, sufficient to obtain a good approximation for each residue's conformational energy map.
  • Enough moves are made to obtain a weighted average of at least about 5 samples (and preferably about 8) of each residue conformation.
  • The weighting accounts for the different probabilities associated with each conformation.
  • To make one move, a new conformation is selected for each residue by looking up the relative probabilities of all conformations within a certain distance of its current conformation and choosing one.
  • Torsions were represented as the integers 0-32, covering the range of rotations 0-360° in discrete steps of about 12°, providing sufficient resolution for the work described herein.
  • Other step sizes may be used depending upon the resolution desired in the run, such as, for example, 5-20°.
  • In "step" moves, only conformations within ±1 step of the current torsion angles are allowed; in "jump" moves, steps of magnitude greater than 1 are considered, and preferably all of the residue's torsional conformations are considered.
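The move selection described above can be illustrated with a short sketch. It assumes 30 torsion bins of exactly 12° (the text quotes an integer range of 0-32 and steps of "about 12°", so the bin count here is an assumption), periodic wrap-around of the bins, and a probability map stored as a dictionary; all names are hypothetical.

```python
import itertools
import random

N_BINS = 30  # assumption: 360 degrees / 12 degrees per bin

def candidate_conformations(current, kind):
    """List the conformations reachable from `current` (a tuple of bin indices)."""
    if kind == "jump":
        # Jump move: every torsional conformation of the residue is a candidate.
        return list(itertools.product(range(N_BINS), repeat=len(current)))
    # Step move: each torsion may change by -1, 0 or +1 bin (wrapping at 360 degrees).
    offsets = itertools.product((-1, 0, 1), repeat=len(current))
    return [tuple((c + d) % N_BINS for c, d in zip(current, off)) for off in offsets]

def choose_move(current, kind, prob_map, rng):
    """Pick a new conformation with probability proportional to prob_map."""
    candidates = candidate_conformations(current, kind)
    weights = [prob_map.get(c, 0.0) for c in candidates]
    if sum(weights) == 0.0:
        return current  # no allowed candidate: stay in place
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For a residue with two torsions, a step move considers 9 candidates while a jump move considers all 900, which is why jump moves are made only occasionally.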
  • The temperature factor ⁇ is calculated from the ⁇ of the E_mean_field for the residue.
  • A new conformational probability map is generated from each residue's E_mean_field, according to the Boltzmann probability equation, concluding the thermal cycle.
  • The starting structure used for beginning a new thermal cycle is set either to random torsions (for the very first cycle), or to the peak probability conformation.
  • The latter structure, used as the start of every cycle after the first, is generated by setting each residue to the highest probability conformation in its latest conformational probability map.
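The conversion from an energy map to a probability map, and the selection of the peak probability conformation, can be sketched as below. This is an illustration under stated assumptions: the energy map is a dictionary keyed by conformation, and `kT` is passed in directly rather than derived from the temperature factor described above.

```python
import math

def boltzmann_map(energy_map, kT):
    """Turn a conformation -> mean-field energy map into normalized probabilities."""
    e_min = min(energy_map.values())
    # Subtracting the minimum leaves the probabilities unchanged
    # but avoids overflow in exp() for large energies.
    weights = {c: math.exp(-(e - e_min) / kT) for c, e in energy_map.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

def peak_conformation(prob_map):
    """Return the highest-probability conformation, used to start the next cycle."""
    return max(prob_map, key=prob_map.get)
```

Lowering `kT` sharpens the map around its lowest-energy conformations, which is how successive thermal cycles condense the ensemble.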
  • The method's physical basis is both simple and well-founded. It uses only van der Waals interactions and a simple alkane torsional potential, whose force constants are relatively well-known from experimental data.
  • The constants used in the present calculations were derived from refinement of experimental measurements of organic crystals (Hagler et al., 1974), and have been in use since the 1970's (e.g. Levitt, 1983).
  • The van der Waals calculations recognize three distinct atom types (oxygen, nitrogen, and carbon/sulfur), each characterized by two constants: an equilibrium atomic diameter, and a scale factor giving the strength of the equilibrium interaction.
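A pair energy parameterized by an equilibrium diameter and a strength factor can be sketched with a 6-12 Lennard-Jones form. The functional form and the combining rules below (arithmetic mean of diameters, geometric mean of scale factors) are assumptions made for illustration; the text above does not state them, and no actual force constants are reproduced here.

```python
import math

def pair_energy(r, diameter_i, scale_i, diameter_j, scale_j):
    """6-12 energy with its minimum of -eps at the equilibrium separation r0.

    r: interatomic distance; diameter_*: equilibrium atomic diameters;
    scale_*: strength of the equilibrium interaction for each atom type.
    """
    r0 = 0.5 * (diameter_i + diameter_j)   # assumed combining rule
    eps = math.sqrt(scale_i * scale_j)     # assumed combining rule
    ratio6 = (r0 / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)
```

With this form the energy is attractive near `r0`, tends to zero at long range, and rises steeply for closer approach, which is the "bad contact" behavior discussed later in the text.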
  • Figure 6d is a flow chart of the main program for calculating a peptide's conformation energy maps and conformation probability maps.
  • The routine is initiated by setting an initial conformation probability map (typically having a flat contour).
  • Various operations for a thermal cycle are performed at 1001, and the cycle counter, "icyc," is set to 0.
  • An initial temperature will be set.
  • The temperature is typically changed with each new iteration.
  • A number of cycles are run to generate a conformation energy map for each residue. This is accomplished by first checking the number of moves (icyc) at 1004 to determine whether a multiple of 200 moves has been made.
  • If it has not, a normal step move 1006 is made by adjusting the conformations of the peptide's side chains by a small increment as described above. If, however, icyc is a multiple of 200, a jump move 1008 is performed. As described above, a jump move results in the conformations of the peptide's side chains being moved by relatively large increments.
  • A flag 1010 is set, forcing an update of the list of interactions considered in calculating the interaction energies between various residue atoms. After either a jump move or a step move, the interaction energies for the atoms of each residue are calculated at block 1012. Next, those interaction energies are summed to obtain an overall residue energy for each side chain at 1014.
  • The residue energy is then saved at step 1016 to be used later to calculate the conformation energy map for the peptide.
  • Steps 1018 and 1020 check to determine if a sufficient number of moves (cycles) have been made to adequately sample conformation space. As noted above, when each conformation has been sampled by a weighted average of 8, the iteration at the current temperature is completed. In addition, if icyc exceeds a pre-set data limit, the current temperature iteration is completed. After each current temperature iteration, a new temperature is set.
  • At step 1022, the system is checked to see whether a sufficient number of temperature iterations has been conducted to end the run.
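The control flow of one temperature iteration in the flow chart can be sketched as follows. The callbacks are hypothetical placeholders for the operations at blocks 1006 to 1020; the sketch also assumes that the counter is incremented before the multiple-of-200 test, that the update flag is raised on jump moves, and that the iteration stops once the counter reaches the data limit.

```python
def run_temperature_iteration(step_move, jump_move, residue_energy,
                              enough_samples, data_limit, jump_interval=200):
    """Run moves at one temperature; return saved energies and the move kinds."""
    icyc = 0
    saved_energies = []
    move_kinds = []
    while True:
        icyc += 1
        if icyc % jump_interval == 0:
            jump_move()                      # block 1008: large increments
            refresh_pair_list = True         # flag 1010 (assumed to follow a jump)
            move_kinds.append("jump")
        else:
            step_move()                      # block 1006: small increments
            refresh_pair_list = False
            move_kinds.append("step")
        # Blocks 1012-1016: compute, sum and save the residue energy.
        saved_energies.append(residue_energy(refresh_pair_list))
        # Blocks 1018-1020: stop on adequate sampling or on the data limit.
        if enough_samples(saved_energies) or icyc >= data_limit:
            return saved_energies, move_kinds
```

An outer loop, not shown, would set a new temperature after each call and test at step 1022 whether enough temperature iterations have been run.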
  • All side-chain torsions are set to random angles selected in the range of 0 to 360° and having a uniform probability distribution.
  • Alternatively, the side-chains are placed in predetermined positions according to, for example, the crystal structure data on a homologous or mutant/wildtype enzyme.
  • The method can predict side-chain conformations in local zones of 5 to 15 residues within a protein, or alternatively, simultaneously predict all the side-chain conformations within a whole protein. In some cases, it may be advantageous to predict the conformations of only a fraction of the total peptide side-chains. For example, some peptides will have conformations that are well known from X-ray crystallography or other techniques, and mutants of these peptides will have only slightly perturbed structures. Because it can be expected that the mutant structure will deviate from the wildtype peptide structure only at certain localities, e.g. near the mutation site, the side-chains that are sufficiently separated from these localities may, in certain circumstances, be held in fixed conformations during the self-consistent mean field iterations.
  • These fixed conformations preferably correspond to the known conformations of the wildtype peptide.
  • The initial conformations of peptides in the vicinity of the mutation may be selected randomly or on the basis of some preferred pattern, such as the wildtype conformation. This approach may require considerably fewer computations when the overall peptide size is large in comparison to the mutated region(s).
  • Mutated peptides within the scope of the present invention can be synthesized chemically by means well known in the art, such as Merrifield solid phase peptide synthesis and its modern variants.
  • For an exhaustive overview of chemical peptide synthesis, see Principles of Peptide Synthesis, M. Bodansky, Springer-Verlag (1984); Solid Phase Peptide Synthesis, J.M. Stewart and J.B. Young, 2d ed., Pierce Chemical Co. (1984); The Peptides: Analysis, Synthesis, and Biology, (pp. 3-285) G. Barany and R.B. Merrifield, Academic Press (1980). Each of these references is herein incorporated by reference for all purposes.
  • The synthesis starts at the carboxyl-terminal end of the peptide by attaching an alpha-amino protected amino acid, bearing a protective group such as t-butyloxycarbonyl (Boc) or fluorenylmethyloxycarbonyl (Fmoc), to a solid support.
  • Suitable polystyrene resins consist of insoluble copolymers of styrene with about 0.5 to 2% of a cross-linking agent, such as divinyl benzene.
  • The synthesis may use manual techniques, as in traditional Merrifield synthesis, or automated peptide synthesizers. Both manual and automatic techniques are well known in the art of peptide chemistry.
  • The resulting peptides can be cleaved from the support resins using standard techniques, such as HF (hydrofluoric acid) deprotection protocols as described in Lu, G.S., Int. J Peptide & Protein
  • Other cleavage methods include the use of hydrazine or TFA (trifluoroacetic acid).
  • Mutated peptides designed by the methods described in the present disclosure can also be produced by expression of recombinant DNA constructs prepared according to well-known methods. Such production can be desirable when large quantities are needed or when many different mutated peptides are required. Since the DNA of the wildtype (or other related) peptide has often been isolated, mutation into modified peptide is possible.
  • the DNA encoding the mutated peptides is preferably prepared using commercially available nucleic acid synthesis methods. See Gait et al. "Oligonucleotide Synthesis; A
  • Expression can be effected in either procaryotic or eucaryotic hosts.
  • Procaryotes most frequently are represented by various strains of E. coli. However, other microbial strains may also be used, such as bacilli, for example Bacillus subtilis, species of Pseudomonas, or other bacterial strains.
  • Plasmid vectors that contain replication sites and control sequences derived from a species compatible with the host are used. For example, a common vector for E. coli is pBR322 and its derivatives.
  • Procaryotic control sequences, which contain promoters for transcription initiation, optionally with an operator, along with ribosome binding-site sequences, include such commonly used promoters as the beta-lactamase and lactose (lac) promoter systems, the tryptophan (trp) promoter system, and the lambda-derived P_L promoter.
  • However, any available promoter system compatible with procaryotes can be used.
  • Expression systems useful in eucaryotic hosts consist of promoters derived from appropriate eucaryotic genes.
  • A class of promoters useful in yeast includes promoters for synthesis of glycolytic enzymes, such as 3-phosphoglycerate kinase.
  • Other yeast promoters include those from the enolase gene or the Leu2 gene obtained from YEp13.
  • Suitable mammalian promoters include the early and late promoters from SV40 or other viral promoters such as those derived from polyoma, adenovirus II, bovine papilloma virus or avian sarcoma viruses. Suitable viral and mammalian enhancers are cited above. When plant cells are used as an expression system, the nopaline synthesis promoter, for example, is appropriate.
  • The expression systems are constructed using well-known restriction and ligation techniques and transformed into appropriate hosts. Transformation is done using standard techniques appropriate to such cells.
  • The cells containing the expression systems are cultured under conditions appropriate for production. It will be readily appreciated by those having ordinary skill in the art of peptide design that the mutated peptides that are designed in accordance with the present disclosure and subsequently synthesized are themselves novel and useful compounds and are thus within the scope of the invention.
  • The physical stabilities can be measured using a variety of physical techniques.
  • Thermal stability can be determined by assaying a specific property of the mutated protein at different temperatures, as is well known in the art.
  • Physical stability is a structural property, and generally indicates the stability of a folded conformation of the peptide relative to an unfolded or denatured state.
  • Many methods such as spectroscopy, sedimentation analysis, chemical assays, etc. can determine whether a peptide has undergone a structure change. For example, NMR, circular dichroism, fluorescent transfer, etc. can measure the folded state of a peptide at different conditions.
  • Of the 125 possible permutations, seventy-eight of the mutants were analyzed in vivo for DNA binding activity, and nine were purified for thermostability measurements. These mutants will be designated by the amino acids at the three mutated residues 36, 40 and 47; thus the wildtype, 36 Val 40 Met 47 Val, is "VMV".
  • Fig. 8a presents a comparison of predicted packing energy versus measured thermostability for a six residue molten-zone. The predicted energies were generated by seven runs from different random starts for each mutant; the error bars indicate their standard deviation.
  • Fig. 8b shows detection of anomalous strain by comparing the energies calculated using the six residue molten-zone versus an eight residue zone. To facilitate comparison, the energies are shown relative to wildtype.
  • Fig. 8c presents a comparison of predicted packing energy versus measured thermostability for the eight residue molten-zone.
  • VAV is the only example of destabilization by loss of attractive van der Waals interactions, rather than by gain of repulsive interactions, and thus probably creates little anomalous strain.
  • The calculated packing energies for two of the mutants, IMV and LLI, are less than that of wildtype, because they are able to fit in additional methylene groups (one in IMV, two in LLI) without bad contacts to the surrounding structure, obtaining a net decrease in the van der Waals energy.
  • These mutants do exhibit improved thermostability (3°C and 4°C, respectively). While both the overall trend and detailed ordering of the thermostability data are captured well by the predictions, the two mutants containing Phe do not fit the observed correlation line at all. Examination of their predicted structures reveals bad contacts with the fixed context surrounding the mutated residues. These contacts produce an "anomalous strain" component that does not reflect the inherent packing qualities of these mutations, but rather artifactual clashes resulting from holding the surrounding context fixed.
  • This anomalous strain component can be directly detected by expanding the molten zone to include residues around these Phe insertions, specifically Leu 65 (adjacent to Phe 36) and Asn 61 (adjacent to Phe 40).
  • The pattern of predicted energies for the nine mutants is unchanged, except that FLV and FFI drop 11 and 18 kcal/mol respectively, relative to wildtype (see Table and Fig. 8b).
  • These large energy shifts were produced by relatively slight structural adjustments in Leu 65 (30° in χ1, 15° in χ2), and in Asn 61 (20° in χ1, 1° in χ2).
  • thermostability is the same as that of FFI (destabilized by 9°C relative to wildtype)
  • its calculated packing energy is about 7.5 kcal/mol lower than that for FFI, which contains many repulsive contacts.
  • anomalous strain component is about 60% of its calculated packing energy difference versus wildtype.
  • The calculated energies also discriminated active from inactive mutants quite well (Fig. 9a).
  • Fig. 9 is a histogram showing the distribution of active (dark bars) and inactive (open bars) mutants over the calculated energies (relative to wildtype). The 10 sequences found experimentally to be fully active at 26°C (activity grade 5) all had strongly negative
  • Fig. 9b reproduces Lim and Sauer's analysis of the distributions of their active versus inactive mutants over the range of core packing volume, a simple measure often used to forecast internal mutations' viability.
  • Although the distribution of inactive mutants is slightly shifted relative to that of active mutants, their extensive overlap makes volume a poor predictor of activity.
  • Choosing the optimal cut-off rule of ' ⁇ volume ⁇ 3 then active' is only a marginally better predictor (19 errors out of 78) than simply asserting 'all sequences are active' (22 errors out of 78).
  • Whereas E_calc is 22/6 (nearly fourfold) better than this null assertion, volume is only 22/19 (16%) better.
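The comparison above can be checked with a short calculation. The figure of 6 errors for E_calc is inferred here from the quoted ratio 22/6 and is not stated separately in the text.

```latex
\begin{align*}
\text{null rule:}\quad & \tfrac{22}{78} \approx 28.2\% \text{ errors} \\
\text{volume rule:}\quad & \tfrac{19}{78} \approx 24.4\% \text{ errors},
  & \tfrac{22}{19} &\approx 1.16 \;(\text{about } 16\% \text{ better}) \\
E_{\mathrm{calc}}\text{:}\quad & \tfrac{6}{78} \approx 7.7\% \text{ errors},
  & \tfrac{22}{6} &\approx 3.7 \;(\text{nearly fourfold better})
\end{align*}
```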
  • Fig. 8d presents a comparison of predicted energetics with experimentally measured activity.
  • The 78 mutants' activities (measured experimentally) were plotted against their calculated energies.
  • The experimental activity grades 0-5 were defined by a simple plate assay that challenges each repressor mutant-bearing clone with five phage covering a range of different virulence levels. Thus there was no reason to expect a linear relation between the activity grades and true activity.
  • Lim and Sauer report that, among 10 mutants tested, DNA-binding affinity was about 0.1 (relative to wildtype) for mutants in grades 2-4, and ⁇ 0.01 for mutants in grades 0-1. The plotted symbols mark E_calc and the grade average.
  • Comparison of prediction sets generated from many different starting structures provides a straightforward measure of the level of "noise" within the calculated energy (a very different matter from systematic error, due to incorrect aspects of the theory) , and its dependence on initial conformation.
  • The standard deviation for VMV was 1.2 kcal/mol (by the constant condense method) and 0.3 kcal/mol (by the linear cooling method).
  • The average standard deviation was 0.7 kcal/mol (by the constant condense method), and 0.5 kcal/mol (by the linear cooling method); the overall trend and detailed ordering of the mutants' energies were unchanged.
  • The predicted peak-probability structure closely matched the native structure as determined by X-ray diffraction, with an overall rms deviation of 0.49 Å.
  • Fig. 8 presents a comparison of predicted side-chain coordinates (bold lines) for an eight residue molten-zone surrounding the mutations, versus the X-ray structure (thin lines). The main chain is shown as a dotted line.
  • The prediction errors were confined primarily to the two methionines (residues 40 and 42, side-chain rms error 0.56 Å and 0.94 Å, respectively) and Asn 61 (0.65 Å). While these coordinate errors were slight, their basis was interesting.
  • The X-ray coordinates contained a bad contact (Met 40 C⁇ - Val 47 Cγ1, 2.93 Å ≈ 6 kcal/mol), which the prediction avoided by moving the side-chain away and out, to position these atoms ≈ 4.0 Å apart. However, this forced Met 40 to within 2.7 Å of Met 42 C⁇, in turn forcing this surface side-chain outwards into the gauche⁻ conformation.
  • Fig. 12 shows the condensation of one residue (Leu 64) in a typical constant-condense prediction run on VMV.
  • A contour line is drawn at a probability level equal to 10% of the peak probability density in the current map. The first cycle winnows the residue's conformation probability map greatly, excluding regions where it clashes strongly with the main-chain and surrounding fixed side-chains.
  • a_0 was set to give a peak six-fold higher than the uniform background probability, and the decay constant r_0 was 60°.
  • Although the conformational probability maps and total energy were slightly perturbed in the initial condensation cycles, the final result was not substantially affected. The system converged similarly to a low energy, and Leu 64's conformational probability map gradually became more and more like that in the unbiased case, and converged to the same final peak.
  • Fig. 13 illustrates the condensation of a six residue molten-zone for the wildtype protein, showing random samples of conformations from cycles at the beginning (a), middle (b), and end (c) of the run. The predicted side-chains are labeled in c. The last panel is representative of the dynamics in the final ensemble predicted by the method. The ensemble's progressive condensation slows and eventually halts because the residues become so focused they no longer strike bad contacts with each other.
  • The ensemble becomes self-consistent when the residues' pressure to condense (due to collisions with each other) falls below the inherent pressure to diffuse supplied by thermal motion.
  • The thermal motions in the final, equilibrated ensemble were quite slight, corresponding to B-factors in the range of 3-11 Å².
  • B-factors correlated with atoms' distance along the side-chain from the fixed backbone, and were highest for residues at the protein surface (Met 42, Leu 64).
  • The low overall B-factors reflect the method's use of a fixed main chain, which disallows coupled motions of the protein as a whole.
  • Met 42 converged to a single, well-defined peak in the final conformational probability maps, indicating a relatively confident structural prediction.
  • Met 42's final ensemble, in contrast, consisted of two separate peaks representing the conformations gauche⁻ trans and gauche⁻ gauche⁻ (see Fig. 13c).
  • The method could not "decide" on a best conformation for this residue, and instead left both peaks in the final map, marking the prediction as internally inconsistent.
  • The final map provides a further indicator identifying possible errors in the predictions.
  • Fig. 14 illustrates convergence of the total system energy as a function of iteration cycle, for a six residue molten-zone by the constant condense algorithm.
  • the plots for the wildtype protein and six mutants are nearly identical, except for relatively slight differences in their final, converged energies. It is these differences which are used to predict the mutations' effect on stability.
  • Positive controls included significant procedural changes in the method that in principle are irrelevant to packing energetics (Table 2), while negative controls disturbed the method's ability to calculate van der Waals interactions accurately, and to condense reliably to the global minimum.
  • The largest internal deviation among the set of positive controls was the variation of LLI, the most stable mutant, relative to IMV, the next most stable mutant.
  • Deliberately introducing inaccuracies into the van der Waals calculations severely disrupted the predictions. Deleting just one non-bonded list entry reduced the correlation coefficient between the calculations and the experimental T_m values from about 0.9 to about 0.4.
  • Forcing the ensemble to condense too rapidly to locate the global minimum also destroyed the correlation with experiment.
  • The simple point of these tests is that the method's predictions reflect the packing quality of the mutants' global minimum ensembles, not artifacts of the method's procedures.
  • SCMF: self-consistent mean field method.
  • Fig. 15a is a graphical representation of this, plotting the lowest energy (Emin) versus the number of conformational moves generated (N).
  • Fig. 15c analyzes the size-dependence of the two methods' rates of convergence.
  • Fig. 15c shows how zone size affects the number of conformational moves required before attaining an energy one- third of the way between the minimum possible energy and the mean energy of a random sample of conformations.
  • The number of moves required by SCMF is uniformly low: about 1000-2000 moves for the very small zones containing only residues with 1-2 free torsions, rising to about 7000 moves for the zones containing larger side-chains.
  • Simulated annealing, by contrast, requires increasingly large numbers of moves to converge to E_one_third, in proportion to the total number of residues in the molten zone.
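The convergence criterion used in this comparison can be sketched as below. The sketch reads "one-third of the way between the minimum possible energy and the mean energy of a random sample" as a threshold lying one third of the gap above the minimum; that reading, and the function names, are assumptions.

```python
def one_third_threshold(e_min, e_random_mean):
    """Energy one third of the way from the minimum toward the random-sample mean."""
    return e_min + (e_random_mean - e_min) / 3.0

def moves_to_converge(lowest_energy_trace, e_min, e_random_mean):
    """Return the number of moves after which the lowest energy found so far
    first reaches the threshold, or None if it never does.

    lowest_energy_trace[i] is the lowest energy found after i + 1 moves.
    """
    threshold = one_third_threshold(e_min, e_random_mean)
    for n_moves, energy in enumerate(lowest_energy_trace, start=1):
        if energy <= threshold:
            return n_moves
    return None
```

Applying `moves_to_converge` to traces from zones of increasing size would give the kind of size-dependence curve described for Fig. 15c.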
  • Thermodynamic free energy difference: ΔE.
  • Fig. 16 shows theoretical calculations of the ΔE for a series of hydrophobic core mutations in the protein barnase, compared with experimental measurements on these mutants (Kellis et al., Biochemistry, 28:4914-4922 (1989)).
  • Physical stability calculations for the native state were performed starting from X-ray coordinates of the wildtype protein; physical stability for the unfolded state was calculated starting from an extended β-chain conformation. The energy differences between these calculated stabilities were added to the known ΔG_transfer values for the mutated amino acids (representing the hydrophobic effect; the values reported by Bull and Breese (1974) Arch. Biochem.
  • One aspect of the present invention involves the automatic search and identification of mutations in a given macromolecular complex which produce a desired effect on its physical stability.
  • This aspect of the invention may be applied to design of drugs that bind their target more tightly, proteins which are more thermostable, proteins which bind a given DNA sequence specifically, and other uses which will be apparent to those of skill in the art.
  • Subtilisin is a commercially important protein used in some cleaning applications. It would be highly desirable to produce mutant subtilisin peptides having increased thermal stability, while retaining the activity of the wildtype compound.
  • The present method has been employed to generate stability predictions for various peptides including subtilisin.
  • One approach employed was to identify buried hydrophobic residues and substitute for them similar hydrophobic residues, such that the wildtype amino acid and the substituted mutant amino acid differ only in the presence of one or a few methylene groups. By quickly scanning the stability of the various mutants, promising candidates can be identified by noting sequences having energies below a preselected value. Such sequences can then be synthesized by techniques well known in the art, as described above.
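The scanning step described above amounts to filtering calculated energies against a preselected value. A minimal sketch follows; the mutant labels, the energies used in any call, and the threshold are hypothetical inputs, not results from the text.

```python
def promising_mutants(relative_energies, threshold):
    """Return (mutant, energy) pairs with energy below the threshold,
    most stabilizing (lowest energy) first.

    relative_energies: dict mapping a mutant label to its calculated
        energy relative to wildtype (negative means better packed).
    """
    hits = [(name, e) for name, e in relative_energies.items() if e < threshold]
    return sorted(hits, key=lambda item: item[1])
```

Mutants returned by such a filter would be the candidates carried forward to synthesis and stability measurement.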
  • Table 3 presents the physical stability changes calculated for mutations of buried Valine residues in Subtilisin BPN-prime, to Isoleucine. Of the twenty-two mutations tested, at least seven (indicated by three or more "+" signs) are calculated to produce significant improvements in the native protein's physical stability. Furthermore, these mutations may be combined to produce additive improvements in physical stability which are quite large.
  • wildtype subtilisin is as follows: ALA GLN SER VAL PRO TYR GLY VAL SER GLN ILE LYS ALA PRO ALA LEU HIS SER GLN GLY TYR THR GLY SER ASN VAL LYS VAL ALA VAL ILE ASP SER GLY ILE ASP SER SER HIS PRO ASP LEU LYS VAL ALA GLY GLY ALA SER MET VAL PRO SER GLU THR PRO ASN PHE GLN ASP ASP ASN SER HIS GLY THR HIS VAL ALA GLY THR VAL ALA ALA LEU ASN ASN SER ILE GLY VAL LEU GLY VAL ALA PRO SER SER ALA LEU TYR ALA VAL LYS VAL LEU GLY ASP ALA GLY GLN TYR SER TRP ILE ILE ASN GLY ILE GLU TRP ALA ILE ALA ASN ASN ASN ASN SER
  • Trp Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu Gly Gly 115 120 125

Abstract

The invention relates to a method for determining the time-averaged conformation and packing energy of a macromolecular structure, such as a peptide or a nucleic acid. To this end, conformation probability maps are used to rotate a plurality of peptide side chains so as to obtain a large number of different conformations. At each conformation, the interaction energy of each peptide side chain with its neighbors is determined and used to refine the conformation energy map. After repeated rotational moves, the method produces, for each side chain, a complete conformation energy map, which is then used to determine a conformation probability map. The new conformation probability map replaces the previous one, and a new cycle can begin. This process transforms a macromolecular structure into a final self-consistent ensemble of probable conformations representing a time-averaged structure of the actual macromolecule. The free energy of the structure can also be determined. The method can be used to identify the stability of mutant peptides.
PCT/US1993/000418 1992-01-21 1993-01-20 Prediction de la conformation et de la stabilite de structures macromoleculaires WO1993014465A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US82372592A 1992-01-21 1992-01-21
US07/823,725 1992-01-21

Publications (1)

Publication Number Publication Date
WO1993014465A1 true WO1993014465A1 (fr) 1993-07-22

Family

ID=25239554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/000418 WO1993014465A1 (fr) 1992-01-21 1993-01-20 Prediction de la conformation et de la stabilite de structures macromoleculaires

Country Status (1)

Country Link
WO (1) WO1993014465A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012037659A1 (fr) 2010-09-24 2012-03-29 Zymeworks Inc. Système pour calculs de structures moléculaires
WO2019232222A1 (fr) * 2018-05-31 2019-12-05 Trustees Of Dartmouth College Conception de protéine par modélisation numérique utilisant des motifs structuraux tertiaires ou quaternaires
CN113421610A (zh) * 2021-07-01 2021-09-21 北京望石智慧科技有限公司 一种分子叠合构象确定方法、装置以及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4704692A (en) * 1986-09-02 1987-11-03 Ladner Robert C Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides
US4852017A (en) * 1987-06-19 1989-07-25 Applied Biosystems, Inc. Determination of peptide sequences
US4853871A (en) * 1987-04-06 1989-08-01 Genex Corporation Computer-based method for designing stablized proteins
US4908773A (en) * 1987-04-06 1990-03-13 Genex Corporation Computer designed stabilized proteins and method for producing same
US5008831A (en) * 1989-01-12 1991-04-16 The United States Of America As Represented By The Department Of Health And Human Services Method for producing high quality chemical structure diagrams
US5081584A (en) * 1989-03-13 1992-01-14 United States Of America Computer-assisted design of anti-peptides based on the amino acid sequence of a target peptide

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012037659A1 (en) 2010-09-24 2012-03-29 Zymeworks Inc. System for molecular packing calculations
EP2619700A4 (en) * 2010-09-24 2017-06-07 Zymeworks, Inc. System for molecular packing calculations
US10832794B2 (en) 2010-09-24 2020-11-10 Zymeworks Inc. System for molecular packing calculations
WO2019232222A1 (en) * 2018-05-31 2019-12-05 Trustees Of Dartmouth College Computational protein design using tertiary or quaternary structural motifs
CN113421610A (en) * 2021-07-01 2021-09-21 北京望石智慧科技有限公司 Method, apparatus, and storage medium for determining a molecular superposition conformation
CN113421610B (en) * 2021-07-01 2023-10-20 北京望石智慧科技有限公司 Method, apparatus, and storage medium for determining a molecular superposition conformation

Similar Documents

Publication Publication Date Title
US5241470A (en) Prediction of protein side-chain conformation by packing optimization
US6631332B2 (en) Methods for using functional site descriptors and predicting protein function
Brünger et al. Computational challenges for macromolecular structure determination by X-ray crystallography and solution NMR spectroscopy
US7139665B2 (en) Computational method for designing enzymes for incorporation of non natural amino acids into proteins
Voigt et al. Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design
US6950754B2 (en) Apparatus and method for automated protein design
US20030130797A1 (en) Protein modeling tools
Rufino et al. Predicting the conformational class of short and medium size loops connecting regular secondary structures: application to comparative modelling
US20130013279A1 (en) Apparatus and method for structure-based prediction of amino acid sequences
US5553004A (en) Constrained langevin dynamics method for simulating molecular conformations
Cavasotto et al. The challenge of considering receptor flexibility in ligand docking and virtual screening
Vedani et al. Pseudo-receptor modeling: a new concept for the three-dimensional construction of receptor binding sites
King et al. Structure‐based prediction of protein–peptide specificity in rosetta
WO2001016810A2 (en) Computer-based method for macromolecular engineering and design
Stoddard et al. Molecular recognition analyzed by docking simulations: the aspartate receptor and isocitrate dehydrogenase from Escherichia coli.
WO1993014465A1 (en) Prediction of the conformation and stability of macromolecular structures
EP1471443B1 (en) Method for constructing the stereostructure of a multi-chain protein
Datta et al. Selectivity and specificity of substrate binding in methionyl‐tRNA synthetase
US7751987B1 (en) Method and system for predicting amino acid sequences compatible with a specified three dimensional structure
WO1999061654A1 (en) Methods and system for predicting the biological functions of proteins
Zacharias Computational Protein–Protein Docking
Alber et al. Structure determination of macromolecular complexes by experiment and computation
Wrabl et al. Experimental Characterization of “Metamorphic” Proteins Predicted from an Ensemble-Based Thermodynamic Description
bioRχiv PREPRINT et al. NEURAL NETWORK-DERIVED POTTS MODELS FOR STRUCTURE-BASED PROTEIN DESIGN USING BACKBONE ATOMIC COORDINATES AND TERTIARY MOTIFS
Opuu Computational design of proteins and enzymes

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA