WO1993014465A1 - Prediction of the conformation and stability of macromolecular structures - Google Patents
Prediction of the conformation and stability of macromolecular structures
- Publication number
- WO1993014465A1 (PCT/US1993/000418)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- conformation
- energy
- freedom
- peptide
- probability
- 108090000765 processed proteins & peptides Proteins 0.000 claims abstract description 230
- 238000000034 method Methods 0.000 claims abstract description 221
- 230000003993 interaction Effects 0.000 claims abstract description 84
- 238000012856 packing Methods 0.000 claims abstract description 43
- 102000004196 processed proteins & peptides Human genes 0.000 claims abstract description 41
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 11
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 10
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 10
- 229940024606 amino acid Drugs 0.000 claims description 67
- 150000001413 amino acids Chemical class 0.000 claims description 64
- 230000000694 effects Effects 0.000 claims description 30
- 238000005381 potential energy Methods 0.000 claims description 26
- 230000014509 gene expression Effects 0.000 claims description 22
- 230000006870 function Effects 0.000 claims description 19
- 239000002773 nucleotide Substances 0.000 claims description 19
- 125000003729 nucleotide group Chemical group 0.000 claims description 17
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 15
- 125000000539 amino acid group Chemical group 0.000 claims description 13
- 108090000787 Subtilisin Proteins 0.000 claims description 12
- 230000001976 improved effect Effects 0.000 claims description 10
- 229920001184 polypeptide Polymers 0.000 claims description 8
- 238000004519 manufacturing process Methods 0.000 claims description 7
- 239000000203 mixture Substances 0.000 claims description 7
- 125000000741 isoleucyl group Chemical group [H]N([H])C(C(C([H])([H])[H])C([H])([H])C([H])([H])[H])C(=O)O* 0.000 claims description 6
- 125000002987 valine group Chemical group [H]N([H])C([H])(C(*)=O)C([H])(C([H])([H])[H])C([H])([H])[H] 0.000 claims description 6
- 125000001165 hydrophobic group Chemical group 0.000 claims description 4
- 230000006641 stabilisation Effects 0.000 claims description 4
- 238000011105 stabilization Methods 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 3
- 238000005076 Van der Waals potential Methods 0.000 claims description 3
- 102000008300 Mutant Proteins Human genes 0.000 claims description 2
- 108010021466 Mutant Proteins Proteins 0.000 claims description 2
- 230000002194 synthesizing effect Effects 0.000 claims description 2
- 238000013519 translation Methods 0.000 claims description 2
- FGUUSXIOTUKUDN-IBGZPJMESA-N C1(=CC=CC=C1)N1C2=C(NC([C@H](C1)NC=1OC(=NN=1)C1=CC=CC=C1)=O)C=CC=C2 Chemical compound C1(=CC=CC=C1)N1C2=C(NC([C@H](C1)NC=1OC(=NN=1)C1=CC=CC=C1)=O)C=CC=C2 FGUUSXIOTUKUDN-IBGZPJMESA-N 0.000 claims 2
- 230000008569 process Effects 0.000 abstract description 15
- 229920002521 macromolecule Polymers 0.000 abstract description 3
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 107
- 125000004429 atom Chemical group 0.000 description 105
- 235000001014 amino acid Nutrition 0.000 description 64
- 108090000623 proteins and genes Proteins 0.000 description 60
- 235000018102 proteins Nutrition 0.000 description 52
- 102000004169 proteins and genes Human genes 0.000 description 51
- 238000004364 calculation method Methods 0.000 description 36
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 33
- 230000035772 mutation Effects 0.000 description 30
- 229910052799 carbon Inorganic materials 0.000 description 25
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 23
- 230000002209 hydrophobic effect Effects 0.000 description 21
- 238000001816 cooling Methods 0.000 description 19
- 238000009739 binding Methods 0.000 description 18
- 238000009833 condensation Methods 0.000 description 17
- 230000005494 condensation Effects 0.000 description 17
- 230000027455 binding Effects 0.000 description 16
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 14
- 238000013459 approach Methods 0.000 description 13
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 12
- 238000012163 sequencing technique Methods 0.000 description 12
- 230000003068 static effect Effects 0.000 description 12
- 125000001360 methionine group Chemical group N[C@@H](CCSC)C(=O)* 0.000 description 11
- 238000002922 simulated annealing Methods 0.000 description 11
- 239000000243 solution Substances 0.000 description 11
- 239000000126 substance Substances 0.000 description 11
- 230000002547 anomalous effect Effects 0.000 description 10
- 230000000875 corresponding effect Effects 0.000 description 9
- 239000013078 crystal Substances 0.000 description 9
- 238000009826 distribution Methods 0.000 description 9
- 238000002474 experimental method Methods 0.000 description 9
- 230000033001 locomotion Effects 0.000 description 9
- 238000005259 measurement Methods 0.000 description 9
- 108010083127 phage repressor proteins Proteins 0.000 description 9
- 238000003786 synthesis reaction Methods 0.000 description 9
- 238000012360 testing method Methods 0.000 description 9
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 8
- 238000005481 NMR spectroscopy Methods 0.000 description 8
- 239000000523 sample Substances 0.000 description 8
- 108020004414 DNA Proteins 0.000 description 7
- 230000015572 biosynthetic process Effects 0.000 description 7
- 210000004027 cell Anatomy 0.000 description 7
- 125000004435 hydrogen atom Chemical group [H]* 0.000 description 7
- 229910052757 nitrogen Inorganic materials 0.000 description 7
- 238000005070 sampling Methods 0.000 description 7
- 102000004190 Enzymes Human genes 0.000 description 6
- 108090000790 Enzymes Proteins 0.000 description 6
- 239000004471 Glycine Substances 0.000 description 6
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 6
- 239000002253 acid Substances 0.000 description 6
- VILAVOFMIJHSJA-UHFFFAOYSA-N dicarbon monoxide Chemical compound [C]=C=O VILAVOFMIJHSJA-UHFFFAOYSA-N 0.000 description 6
- 230000009881 electrostatic interaction Effects 0.000 description 6
- 229940088598 enzyme Drugs 0.000 description 6
- 239000001257 hydrogen Substances 0.000 description 6
- 229910052739 hydrogen Inorganic materials 0.000 description 6
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 6
- 229960000310 isoleucine Drugs 0.000 description 6
- 230000005428 wave function Effects 0.000 description 6
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 5
- 238000002441 X-ray diffraction Methods 0.000 description 5
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 5
- 238000012790 confirmation Methods 0.000 description 5
- 230000002349 favourable effect Effects 0.000 description 5
- 238000000329 molecular dynamics simulation Methods 0.000 description 5
- 238000010647 peptide synthesis reaction Methods 0.000 description 5
- 229920000642 polymer Polymers 0.000 description 5
- 239000002904 solvent Substances 0.000 description 5
- 241000894007 species Species 0.000 description 5
- 238000012546 transfer Methods 0.000 description 5
- 102100031673 Corneodesmosin Human genes 0.000 description 4
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 4
- 108091034117 Oligonucleotide Proteins 0.000 description 4
- 229940096437 Protein S Drugs 0.000 description 4
- 108010031318 Vitronectin Proteins 0.000 description 4
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 4
- 150000001408 amides Chemical class 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 4
- 230000004888 barrier function Effects 0.000 description 4
- CREMABGTGYGIQB-UHFFFAOYSA-N carbon carbon Chemical compound C.C CREMABGTGYGIQB-UHFFFAOYSA-N 0.000 description 4
- 239000011203 carbon fibre reinforced carbon Substances 0.000 description 4
- 150000001875 compounds Chemical class 0.000 description 4
- 230000007423 decrease Effects 0.000 description 4
- 238000002050 diffraction method Methods 0.000 description 4
- 238000013507 mapping Methods 0.000 description 4
- 229910052760 oxygen Inorganic materials 0.000 description 4
- 239000001301 oxygen Substances 0.000 description 4
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 4
- 230000012846 protein folding Effects 0.000 description 4
- 239000004474 valine Substances 0.000 description 4
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 3
- MGSVBZIBCCKGCY-ZLUOBGJFSA-N Asp-Ser-Ser Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O MGSVBZIBCCKGCY-ZLUOBGJFSA-N 0.000 description 3
- 108010016529 Bacillus amyloliquefaciens ribonuclease Proteins 0.000 description 3
- 241001247437 Cerbera odollam Species 0.000 description 3
- 108010057366 Flavodoxin Proteins 0.000 description 3
- WHUUTDBJXJRKMK-UHFFFAOYSA-N Glutamic acid Natural products OC(=O)C(N)CCC(O)=O WHUUTDBJXJRKMK-UHFFFAOYSA-N 0.000 description 3
- SBVMXEZQJVUARN-XPUUQOCRSA-N Gly-Val-Ser Chemical compound NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O SBVMXEZQJVUARN-XPUUQOCRSA-N 0.000 description 3
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 3
- 238000005411 Van der Waals force Methods 0.000 description 3
- 150000007513 acids Chemical class 0.000 description 3
- 108010017893 alanyl-alanyl-alanine Proteins 0.000 description 3
- 238000003556 assay Methods 0.000 description 3
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 3
- 238000004422 calculation algorithm Methods 0.000 description 3
- 230000015556 catabolic process Effects 0.000 description 3
- 230000002596 correlated effect Effects 0.000 description 3
- 238000013480 data collection Methods 0.000 description 3
- 238000006731 degradation reaction Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000011067 equilibration Methods 0.000 description 3
- 239000012634 fragment Substances 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 238000003780 insertion Methods 0.000 description 3
- 230000037431 insertion Effects 0.000 description 3
- 235000006109 methionine Nutrition 0.000 description 3
- 125000001570 methylene group Chemical group [H]C([H])([*:1])[*:2] 0.000 description 3
- 230000036961 partial effect Effects 0.000 description 3
- 239000013641 positive control Substances 0.000 description 3
- 238000005036 potential barrier Methods 0.000 description 3
- 239000000047 product Substances 0.000 description 3
- 239000000376 reactant Substances 0.000 description 3
- 108010048818 seryl-histidine Proteins 0.000 description 3
- 239000007790 solid phase Substances 0.000 description 3
- 230000000087 stabilizing effect Effects 0.000 description 3
- 108010061238 threonyl-glycine Proteins 0.000 description 3
- 239000013598 vector Substances 0.000 description 3
- MYRTYDVEIRVNKP-UHFFFAOYSA-N 1,2-Divinylbenzene Chemical compound C=CC1=CC=CC=C1C=C MYRTYDVEIRVNKP-UHFFFAOYSA-N 0.000 description 2
- LJTZPXOCBZRFBH-CIUDSAMLSA-N Ala-Ala-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@H](C)N LJTZPXOCBZRFBH-CIUDSAMLSA-N 0.000 description 2
- 239000004475 Arginine Substances 0.000 description 2
- 230000004568 DNA-binding Effects 0.000 description 2
- 241000588724 Escherichia coli Species 0.000 description 2
- KRHYYFGTRYWZRS-UHFFFAOYSA-N Fluorane Chemical compound F KRHYYFGTRYWZRS-UHFFFAOYSA-N 0.000 description 2
- KQDMENMTYNBWMR-WHFBIAKZSA-N Gly-Asp-Ala Chemical compound [H]NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(O)=O KQDMENMTYNBWMR-WHFBIAKZSA-N 0.000 description 2
- JJGBXTYGTKWGAT-YUMQZZPRSA-N Gly-Pro-Glu Chemical compound NCC(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O JJGBXTYGTKWGAT-YUMQZZPRSA-N 0.000 description 2
- SOEGEPHNZOISMT-BYPYZUCNSA-N Gly-Ser-Gly Chemical compound NCC(=O)N[C@@H](CO)C(=O)NCC(O)=O SOEGEPHNZOISMT-BYPYZUCNSA-N 0.000 description 2
- 238000004965 Hartree-Fock calculation Methods 0.000 description 2
- KFQDSSNYWKZFOO-LSJOCFKGSA-N His-Val-Ala Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O KFQDSSNYWKZFOO-LSJOCFKGSA-N 0.000 description 2
- OAKJQQAXSVQMHS-UHFFFAOYSA-N Hydrazine Chemical compound NN OAKJQQAXSVQMHS-UHFFFAOYSA-N 0.000 description 2
- XEEYBQQBJWHFJM-UHFFFAOYSA-N Iron Chemical compound [Fe] XEEYBQQBJWHFJM-UHFFFAOYSA-N 0.000 description 2
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 description 2
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 2
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 2
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 2
- 238000004510 Lennard-Jones potential Methods 0.000 description 2
- 241000880493 Leptailurus serval Species 0.000 description 2
- QCSFMCFHVGTLFF-NHCYSSNCSA-N Leu-Asp-Val Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O QCSFMCFHVGTLFF-NHCYSSNCSA-N 0.000 description 2
- AMSSKPUHBUQBOQ-SRVKXCTJSA-N Leu-Ser-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N AMSSKPUHBUQBOQ-SRVKXCTJSA-N 0.000 description 2
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 2
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 2
- 239000004472 Lysine Substances 0.000 description 2
- ULNXMMYXQKGNPG-LPEHRKFASA-N Met-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCSC)N ULNXMMYXQKGNPG-LPEHRKFASA-N 0.000 description 2
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 2
- 108020002230 Pancreatic Ribonuclease Proteins 0.000 description 2
- 102000005891 Pancreatic ribonuclease Human genes 0.000 description 2
- FNGOXVQBBCMFKV-CIUDSAMLSA-N Pro-Ser-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O FNGOXVQBBCMFKV-CIUDSAMLSA-N 0.000 description 2
- 108020004511 Recombinant DNA Proteins 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- XXXAXOWMBOKTRN-XPUUQOCRSA-N Ser-Gly-Val Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O XXXAXOWMBOKTRN-XPUUQOCRSA-N 0.000 description 2
- ASGYVPAVFNDZMA-GUBZILKMSA-N Ser-Met-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CO)N ASGYVPAVFNDZMA-GUBZILKMSA-N 0.000 description 2
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 2
- PPBRXRYQALVLMV-UHFFFAOYSA-N Styrene Chemical compound C=CC1=CC=CC=C1 PPBRXRYQALVLMV-UHFFFAOYSA-N 0.000 description 2
- NYQIZWROIMIQSL-VEVYYDQMSA-N Thr-Pro-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O NYQIZWROIMIQSL-VEVYYDQMSA-N 0.000 description 2
- DTQVDTLACAAQTR-UHFFFAOYSA-N Trifluoroacetic acid Chemical compound OC(=O)C(F)(F)F DTQVDTLACAAQTR-UHFFFAOYSA-N 0.000 description 2
- HHPSUFUXXBOFQY-AQZXSJQPSA-N Trp-Thr-Asn Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N)O HHPSUFUXXBOFQY-AQZXSJQPSA-N 0.000 description 2
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 2
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 2
- 108010064997 VPY tripeptide Proteins 0.000 description 2
- IJBTVYLICXHDRI-UHFFFAOYSA-N Val-Ala-Ala Natural products CC(C)C(N)C(=O)NC(C)C(=O)NC(C)C(O)=O IJBTVYLICXHDRI-UHFFFAOYSA-N 0.000 description 2
- 235000004279 alanine Nutrition 0.000 description 2
- 108010076324 alanyl-glycyl-glycine Proteins 0.000 description 2
- 108010024078 alanyl-glycyl-serine Proteins 0.000 description 2
- 108010069020 alanyl-prolyl-glycine Proteins 0.000 description 2
- 108010047495 alanylglycine Proteins 0.000 description 2
- 150000001335 aliphatic alkanes Chemical class 0.000 description 2
- 230000004075 alteration Effects 0.000 description 2
- 125000003368 amide group Chemical group 0.000 description 2
- 125000003277 amino group Chemical group 0.000 description 2
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 2
- 235000009697 arginine Nutrition 0.000 description 2
- UCMIRNVEIXFBKS-UHFFFAOYSA-N beta-alanine Chemical compound NCCC(O)=O UCMIRNVEIXFBKS-UHFFFAOYSA-N 0.000 description 2
- 244000309464 bull Species 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 239000000470 constituent Substances 0.000 description 2
- 238000010168 coupling process Methods 0.000 description 2
- 238000005859 coupling reaction Methods 0.000 description 2
- XVOYSCVBGLVSOL-UHFFFAOYSA-N cysteic acid Chemical compound OC(=O)C(N)CS(O)(=O)=O XVOYSCVBGLVSOL-UHFFFAOYSA-N 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 230000001687 destabilization Effects 0.000 description 2
- 230000000368 destabilizing effect Effects 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000005421 electrostatic potential Methods 0.000 description 2
- 238000007710 freezing Methods 0.000 description 2
- 230000008014 freezing Effects 0.000 description 2
- BTCSSZJGUNDROE-UHFFFAOYSA-N gamma-aminobutyric acid Chemical compound NCCCC(O)=O BTCSSZJGUNDROE-UHFFFAOYSA-N 0.000 description 2
- 235000013922 glutamic acid Nutrition 0.000 description 2
- 239000004220 glutamic acid Substances 0.000 description 2
- 108010027668 glycyl-alanyl-valine Proteins 0.000 description 2
- 108010079413 glycyl-prolyl-glutamic acid Proteins 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 238000010438 heat treatment Methods 0.000 description 2
- 229910001385 heavy metal Inorganic materials 0.000 description 2
- 238000003929 heteronuclear multiple quantum coherence Methods 0.000 description 2
- 125000001841 imino group Chemical group [H]N=* 0.000 description 2
- 108010083708 leucyl-aspartyl-valine Proteins 0.000 description 2
- 108010044311 leucyl-glycyl-glycine Proteins 0.000 description 2
- 108010034529 leucyl-lysine Proteins 0.000 description 2
- 235000018977 lysine Nutrition 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 150000002742 methionines Chemical class 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 239000013642 negative control Substances 0.000 description 2
- 238000001668 nucleic acid synthesis Methods 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- QKFJKGMPGYROCL-UHFFFAOYSA-N phenyl isothiocyanate Chemical compound S=C=NC1=CC=CC=C1 QKFJKGMPGYROCL-UHFFFAOYSA-N 0.000 description 2
- 230000000750 progressive effect Effects 0.000 description 2
- 230000008707 rearrangement Effects 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 230000010076 replication Effects 0.000 description 2
- 229910052710 silicon Inorganic materials 0.000 description 2
- 239000010703 silicon Substances 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 238000010561 standard procedure Methods 0.000 description 2
- 239000000758 substrate Substances 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 230000003612 virological effect Effects 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- LSLXWOCIIFUZCQ-SRVKXCTJSA-N (2S)-2-[[(2S)-2-[[(2S)-2-amino-3-methyl-1-oxobutyl]amino]-3-methyl-1-oxobutyl]amino]-3-methylbutanoic acid Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(O)=O LSLXWOCIIFUZCQ-SRVKXCTJSA-N 0.000 description 1
- BVAUMRCGVHUWOZ-ZETCQYMHSA-N (2s)-2-(cyclohexylazaniumyl)propanoate Chemical compound OC(=O)[C@H](C)NC1CCCCC1 BVAUMRCGVHUWOZ-ZETCQYMHSA-N 0.000 description 1
- 125000003088 (fluoren-9-ylmethoxy)carbonyl group Chemical group 0.000 description 1
- 238000005160 1H NMR spectroscopy Methods 0.000 description 1
- 125000000143 2-carboxyethyl group Chemical group [H]OC(=O)C([H])([H])C([H])([H])* 0.000 description 1
- ASJSAQIRZKANQN-CRCLSJGQSA-N 2-deoxy-D-ribose Chemical compound OC[C@@H](O)[C@@H](O)CC=O ASJSAQIRZKANQN-CRCLSJGQSA-N 0.000 description 1
- OSJPPGNTCRNQQC-UWTATZPHSA-N 3-phospho-D-glyceric acid Chemical compound OC(=O)[C@H](O)COP(O)(O)=O OSJPPGNTCRNQQC-UWTATZPHSA-N 0.000 description 1
- QCVGEOXPDFCNHA-UHFFFAOYSA-N 5,5-dimethyl-2,4-dioxo-1,3-oxazolidine-3-carboxamide Chemical compound CC1(C)OC(=O)N(C(N)=O)C1=O QCVGEOXPDFCNHA-UHFFFAOYSA-N 0.000 description 1
- 102100026802 72 kDa type IV collagenase Human genes 0.000 description 1
- HBAQYPYDRFILMT-UHFFFAOYSA-N 8-[3-(1-cyclopropylpyrazol-4-yl)-1H-pyrazolo[4,3-d]pyrimidin-5-yl]-3-methyl-3,8-diazabicyclo[3.2.1]octan-2-one Chemical class C1(CC1)N1N=CC(=C1)C1=NNC2=C1N=C(N=C2)N1C2C(N(CC1CC2)C)=O HBAQYPYDRFILMT-UHFFFAOYSA-N 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- BYXHQQCXAJARLQ-ZLUOBGJFSA-N Ala-Ala-Ala Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O BYXHQQCXAJARLQ-ZLUOBGJFSA-N 0.000 description 1
- RLMISHABBKUNFO-WHFBIAKZSA-N Ala-Ala-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O RLMISHABBKUNFO-WHFBIAKZSA-N 0.000 description 1
- LBJYAILUMSUTAM-ZLUOBGJFSA-N Ala-Asn-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O LBJYAILUMSUTAM-ZLUOBGJFSA-N 0.000 description 1
- MVBWLRJESQOQTM-ACZMJKKPSA-N Ala-Gln-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O MVBWLRJESQOQTM-ACZMJKKPSA-N 0.000 description 1
- ZVFVBBGVOILKPO-WHFBIAKZSA-N Ala-Gly-Ala Chemical compound C[C@H](N)C(=O)NCC(=O)N[C@@H](C)C(O)=O ZVFVBBGVOILKPO-WHFBIAKZSA-N 0.000 description 1
- VGPWRRFOPXVGOH-BYPYZUCNSA-N Ala-Gly-Gly Chemical compound C[C@H](N)C(=O)NCC(=O)NCC(O)=O VGPWRRFOPXVGOH-BYPYZUCNSA-N 0.000 description 1
- HHRAXZAYZFFRAM-CIUDSAMLSA-N Ala-Leu-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O HHRAXZAYZFFRAM-CIUDSAMLSA-N 0.000 description 1
- WUHJHHGYVVJMQE-BJDJZHNGSA-N Ala-Leu-Ile Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O WUHJHHGYVVJMQE-BJDJZHNGSA-N 0.000 description 1
- IPZQNYYAYVRKKK-FXQIFTODSA-N Ala-Pro-Ala Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C)C(O)=O IPZQNYYAYVRKKK-FXQIFTODSA-N 0.000 description 1
- XWFWAXPOLRTDFZ-FXQIFTODSA-N Ala-Pro-Ser Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CO)C(O)=O XWFWAXPOLRTDFZ-FXQIFTODSA-N 0.000 description 1
- PEEYDECOOVQKRZ-DLOVCJGASA-N Ala-Ser-Phe Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PEEYDECOOVQKRZ-DLOVCJGASA-N 0.000 description 1
- NZGRHTKZFSVPAN-BIIVOSGPSA-N Ala-Ser-Pro Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)N1CCC[C@@H]1C(=O)O)N NZGRHTKZFSVPAN-BIIVOSGPSA-N 0.000 description 1
- SOTXLXCVCZAKFI-FXQIFTODSA-N Ala-Val-Ala Chemical compound C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O SOTXLXCVCZAKFI-FXQIFTODSA-N 0.000 description 1
- 240000000662 Anethum graveolens Species 0.000 description 1
- GIVATXIGCXFQQA-FXQIFTODSA-N Arg-Ala-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCN=C(N)N GIVATXIGCXFQQA-FXQIFTODSA-N 0.000 description 1
- FRBAHXABMQXSJQ-FXQIFTODSA-N Arg-Ser-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O FRBAHXABMQXSJQ-FXQIFTODSA-N 0.000 description 1
- KXFCBAHYSLJCCY-ZLUOBGJFSA-N Asn-Asn-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O KXFCBAHYSLJCCY-ZLUOBGJFSA-N 0.000 description 1
- XWFPGQVLOVGSLU-CIUDSAMLSA-N Asn-Gln-Arg Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N XWFPGQVLOVGSLU-CIUDSAMLSA-N 0.000 description 1
- GNKVBRYFXYWXAB-WDSKDSINSA-N Asn-Glu-Gly Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O GNKVBRYFXYWXAB-WDSKDSINSA-N 0.000 description 1
- KLKHFFMNGWULBN-VKHMYHEASA-N Asn-Gly Chemical compound NC(=O)C[C@H](N)C(=O)NCC(O)=O KLKHFFMNGWULBN-VKHMYHEASA-N 0.000 description 1
- PBSQFBAJKPLRJY-BYULHYEWSA-N Asn-Gly-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CC(=O)N)N PBSQFBAJKPLRJY-BYULHYEWSA-N 0.000 description 1
- RAQMSGVCGSJKCL-FOHZUACHSA-N Asn-Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC(N)=O RAQMSGVCGSJKCL-FOHZUACHSA-N 0.000 description 1
- NTWOPSIUJBMNRI-KKUMJFAQSA-N Asn-Lys-Tyr Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 NTWOPSIUJBMNRI-KKUMJFAQSA-N 0.000 description 1
- MDDXKBHIMYYJLW-FXQIFTODSA-N Asn-Met-Asp Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)N)N MDDXKBHIMYYJLW-FXQIFTODSA-N 0.000 description 1
- KEUNWIXNKVWCFL-FXQIFTODSA-N Asn-Met-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CO)C(O)=O KEUNWIXNKVWCFL-FXQIFTODSA-N 0.000 description 1
- UXHYOWXTJLBEPG-GSSVUCPTSA-N Asn-Thr-Thr Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O UXHYOWXTJLBEPG-GSSVUCPTSA-N 0.000 description 1
- KWBQPGIYEZKDEG-FSPLSTOPSA-N Asn-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CC(N)=O KWBQPGIYEZKDEG-FSPLSTOPSA-N 0.000 description 1
- LMIWYCWRJVMAIQ-NHCYSSNCSA-N Asn-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N LMIWYCWRJVMAIQ-NHCYSSNCSA-N 0.000 description 1
- KNMRXHIAVXHCLW-ZLUOBGJFSA-N Asp-Asn-Ser Chemical compound C([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N)C(=O)O KNMRXHIAVXHCLW-ZLUOBGJFSA-N 0.000 description 1
- QOVWVLLHMMCFFY-ZLUOBGJFSA-N Asp-Asp-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O QOVWVLLHMMCFFY-ZLUOBGJFSA-N 0.000 description 1
- ZQFRDAZBTSFGGW-SRVKXCTJSA-N Asp-Ser-Phe Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ZQFRDAZBTSFGGW-SRVKXCTJSA-N 0.000 description 1
- 241000713842 Avian sarcoma virus Species 0.000 description 1
- 244000063299 Bacillus subtilis Species 0.000 description 1
- 235000014469 Bacillus subtilis Nutrition 0.000 description 1
- 241000701822 Bovine papillomavirus Species 0.000 description 1
- CURLTUGMZLYLDI-UHFFFAOYSA-N Carbon dioxide Chemical compound O=C=O CURLTUGMZLYLDI-UHFFFAOYSA-N 0.000 description 1
- 150000008574 D-amino acids Chemical class 0.000 description 1
- LMKYZBGVKHTLTN-NKWVEPMBSA-N D-nopaline Chemical compound NC(=N)NCCC[C@@H](C(O)=O)N[C@@H](C(O)=O)CCC(O)=O LMKYZBGVKHTLTN-NKWVEPMBSA-N 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 108090000204 Dipeptidase 1 Proteins 0.000 description 1
- 102000002322 Egg Proteins Human genes 0.000 description 1
- 108010000912 Egg Proteins Proteins 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 101000925646 Enterobacteria phage T4 Endolysin Proteins 0.000 description 1
- YJIUYQKQBBQYHZ-ACZMJKKPSA-N Gln-Ala-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O YJIUYQKQBBQYHZ-ACZMJKKPSA-N 0.000 description 1
- YXQCLIVLWCKCRS-RYUDHWBXSA-N Gln-Gly-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)N)N)O YXQCLIVLWCKCRS-RYUDHWBXSA-N 0.000 description 1
- MTCXQQINVAFZKW-MNXVOIDGSA-N Gln-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)N)N MTCXQQINVAFZKW-MNXVOIDGSA-N 0.000 description 1
- UBRQJXFDVZNYJP-AVGNSLFASA-N Gln-Tyr-Ser Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCC(=O)N)N)O UBRQJXFDVZNYJP-AVGNSLFASA-N 0.000 description 1
- FYBSCGZLICNOBA-XQXXSGGOSA-N Glu-Ala-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O FYBSCGZLICNOBA-XQXXSGGOSA-N 0.000 description 1
- DSPQRJXOIXHOHK-WDSKDSINSA-N Glu-Asp-Gly Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O DSPQRJXOIXHOHK-WDSKDSINSA-N 0.000 description 1
- ZQNCUVODKOBSSO-XEGUGMAKSA-N Glu-Trp-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](C)C(O)=O ZQNCUVODKOBSSO-XEGUGMAKSA-N 0.000 description 1
- PYTZFYUXZZHOAD-WHFBIAKZSA-N Gly-Ala-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)CN PYTZFYUXZZHOAD-WHFBIAKZSA-N 0.000 description 1
- QIZJOTQTCAGKPU-KWQFWETISA-N Gly-Ala-Tyr Chemical compound [NH3+]CC(=O)N[C@@H](C)C(=O)N[C@H](C([O-])=O)CC1=CC=C(O)C=C1 QIZJOTQTCAGKPU-KWQFWETISA-N 0.000 description 1
- JRDYDYXZKFNNRQ-XPUUQOCRSA-N Gly-Ala-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)CN JRDYDYXZKFNNRQ-XPUUQOCRSA-N 0.000 description 1
- NZAFOTBEULLEQB-WDSKDSINSA-N Gly-Asn-Glu Chemical compound C(CC(=O)O)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)CN NZAFOTBEULLEQB-WDSKDSINSA-N 0.000 description 1
- OCDLPQDYTJPWNG-YUMQZZPRSA-N Gly-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)CN OCDLPQDYTJPWNG-YUMQZZPRSA-N 0.000 description 1
- CCQOOWAONKGYKQ-BYPYZUCNSA-N Gly-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)CN CCQOOWAONKGYKQ-BYPYZUCNSA-N 0.000 description 1
- BUEFQXUHTUZXHR-LURJTMIESA-N Gly-Gly-Pro zwitterion Chemical compound NCC(=O)NCC(=O)N1CCC[C@H]1C(O)=O BUEFQXUHTUZXHR-LURJTMIESA-N 0.000 description 1
- UTYGDAHJBBDPBA-BYULHYEWSA-N Gly-Ile-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)CN UTYGDAHJBBDPBA-BYULHYEWSA-N 0.000 description 1
- CVFOYJJOZYYEPE-KBPBESRZSA-N Gly-Lys-Tyr Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O CVFOYJJOZYYEPE-KBPBESRZSA-N 0.000 description 1
- IALQAMYQJBZNSK-WHFBIAKZSA-N Gly-Ser-Asn Chemical compound [H]NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O IALQAMYQJBZNSK-WHFBIAKZSA-N 0.000 description 1
- WCORRBXVISTKQL-WHFBIAKZSA-N Gly-Ser-Ser Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O WCORRBXVISTKQL-WHFBIAKZSA-N 0.000 description 1
- LCRDMSSAKLTKBU-ZDLURKLDSA-N Gly-Ser-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)CN LCRDMSSAKLTKBU-ZDLURKLDSA-N 0.000 description 1
- RHRLHXQWHCNJKR-PMVVWTBXSA-N Gly-Thr-His Chemical compound NCC(=O)N[C@@H]([C@H](O)C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 RHRLHXQWHCNJKR-PMVVWTBXSA-N 0.000 description 1
- CUVBTVWFVIIDOC-YEPSODPASA-N Gly-Thr-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)CN CUVBTVWFVIIDOC-YEPSODPASA-N 0.000 description 1
- LYZYGGWCBLBDMC-QWHCGFSZSA-N Gly-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)CN)C(=O)O LYZYGGWCBLBDMC-QWHCGFSZSA-N 0.000 description 1
- GBYYQVBXFVDJPJ-WLTAIBSBSA-N Gly-Tyr-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)CN)O GBYYQVBXFVDJPJ-WLTAIBSBSA-N 0.000 description 1
- 238000003078 Hartree-Fock method Methods 0.000 description 1
- BDFCIKANUNMFGB-PMVVWTBXSA-N His-Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC1=CN=CN1 BDFCIKANUNMFGB-PMVVWTBXSA-N 0.000 description 1
- GNBHSMFBUNEWCJ-DCAQKATOSA-N His-Pro-Asn Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O GNBHSMFBUNEWCJ-DCAQKATOSA-N 0.000 description 1
- BZAQOPHNBFOOJS-DCAQKATOSA-N His-Pro-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(O)=O BZAQOPHNBFOOJS-DCAQKATOSA-N 0.000 description 1
- KRBMQYPTDYSENE-BQBZGAKWSA-N His-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CNC=N1 KRBMQYPTDYSENE-BQBZGAKWSA-N 0.000 description 1
- 102000006947 Histones Human genes 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 101000627872 Homo sapiens 72 kDa type IV collagenase Proteins 0.000 description 1
- UFHFLCQGNIYNRP-UHFFFAOYSA-N Hydrogen Chemical compound [H][H] UFHFLCQGNIYNRP-UHFFFAOYSA-N 0.000 description 1
- LQSBBHNVAVNZSX-GHCJXIJMSA-N Ile-Ala-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC(=O)N)C(=O)O)N LQSBBHNVAVNZSX-GHCJXIJMSA-N 0.000 description 1
- MKWSZEHGHSLNPF-NAKRPEOUSA-N Ile-Ala-Val Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)O)N MKWSZEHGHSLNPF-NAKRPEOUSA-N 0.000 description 1
- QIHJTGSVGIPHIW-QSFUFRPTSA-N Ile-Asn-Val Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](C(C)C)C(=O)O)N QIHJTGSVGIPHIW-QSFUFRPTSA-N 0.000 description 1
- DCQMJRSOGCYKTR-GHCJXIJMSA-N Ile-Asp-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O DCQMJRSOGCYKTR-GHCJXIJMSA-N 0.000 description 1
- YBJWJQQBWRARLT-KBIXCLLPSA-N Ile-Gln-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O YBJWJQQBWRARLT-KBIXCLLPSA-N 0.000 description 1
- VOBYAKCXGQQFLR-LSJOCFKGSA-N Ile-Gly-Val Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O VOBYAKCXGQQFLR-LSJOCFKGSA-N 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- AHLPHDHHMVZTML-BYPYZUCNSA-N L-Ornithine Chemical compound NCCC[C@H](N)C(O)=O AHLPHDHHMVZTML-BYPYZUCNSA-N 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 description 1
- RHGKLRLOHDJJDR-BYPYZUCNSA-N L-citrulline Chemical compound NC(=O)NCCC[C@H]([NH3+])C([O-])=O RHGKLRLOHDJJDR-BYPYZUCNSA-N 0.000 description 1
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 description 1
- LHSGPCFBGJHPCY-UHFFFAOYSA-N L-leucine-L-tyrosine Natural products CC(C)CC(N)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 LHSGPCFBGJHPCY-UHFFFAOYSA-N 0.000 description 1
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 description 1
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 1
- QEFRNWWLZKMPFJ-ZXPFJRLXSA-N L-methionine (R)-S-oxide Chemical compound C[S@@](=O)CC[C@H]([NH3+])C([O-])=O QEFRNWWLZKMPFJ-ZXPFJRLXSA-N 0.000 description 1
- QEFRNWWLZKMPFJ-UHFFFAOYSA-N L-methionine sulphoxide Natural products CS(=O)CCC(N)C(O)=O QEFRNWWLZKMPFJ-UHFFFAOYSA-N 0.000 description 1
- LRQKBLKVPFOOQJ-YFKPBYRVSA-N L-norleucine Chemical compound CCCC[C@H]([NH3+])C([O-])=O LRQKBLKVPFOOQJ-YFKPBYRVSA-N 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- GUBGYTABKSRVRQ-QKKXKWKRSA-N Lactose Natural products OC[C@H]1O[C@@H](O[C@H]2[C@H](O)[C@@H](O)C(O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@H]1O GUBGYTABKSRVRQ-QKKXKWKRSA-N 0.000 description 1
- DLCXCECTCPKKCD-GUBZILKMSA-N Leu-Gln-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O DLCXCECTCPKKCD-GUBZILKMSA-N 0.000 description 1
- LAPSXOAUPNOINL-YUMQZZPRSA-N Leu-Gly-Asp Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC(O)=O LAPSXOAUPNOINL-YUMQZZPRSA-N 0.000 description 1
- VWHGTYCRDRBSFI-ZETCQYMHSA-N Leu-Gly-Gly Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)NCC(O)=O VWHGTYCRDRBSFI-ZETCQYMHSA-N 0.000 description 1
- POZULHZYLPGXMR-ONGXEEELSA-N Leu-Gly-Val Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O POZULHZYLPGXMR-ONGXEEELSA-N 0.000 description 1
- OYQUOLRTJHWVSQ-SRVKXCTJSA-N Leu-His-Ser Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O OYQUOLRTJHWVSQ-SRVKXCTJSA-N 0.000 description 1
- LZHJZLHSRGWBBE-IHRRRGAJSA-N Leu-Lys-Val Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O LZHJZLHSRGWBBE-IHRRRGAJSA-N 0.000 description 1
- UCBPDSYUVAAHCD-UWVGGRQHSA-N Leu-Pro-Gly Chemical compound CC(C)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O UCBPDSYUVAAHCD-UWVGGRQHSA-N 0.000 description 1
- FZIJIFCXUCZHOL-CIUDSAMLSA-N Lys-Ala-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCCN FZIJIFCXUCZHOL-CIUDSAMLSA-N 0.000 description 1
- IXHKPDJKKCUKHS-GARJFASQSA-N Lys-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCCN)N IXHKPDJKKCUKHS-GARJFASQSA-N 0.000 description 1
- GQFDWEDHOQRNLC-QWRGUYRKSA-N Lys-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CCCCN GQFDWEDHOQRNLC-QWRGUYRKSA-N 0.000 description 1
- VMTYLUGCXIEDMV-QWRGUYRKSA-N Lys-Leu-Gly Chemical compound OC(=O)CNC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CCCCN VMTYLUGCXIEDMV-QWRGUYRKSA-N 0.000 description 1
- IEIHKHYMBIYQTH-YESZJQIVSA-N Lys-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CCCCN)N)C(=O)O IEIHKHYMBIYQTH-YESZJQIVSA-N 0.000 description 1
- VWPJQIHBBOJWDN-DCAQKATOSA-N Lys-Val-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O VWPJQIHBBOJWDN-DCAQKATOSA-N 0.000 description 1
- DRRXXZBXDMLGFC-IHRRRGAJSA-N Lys-Val-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCCN DRRXXZBXDMLGFC-IHRRRGAJSA-N 0.000 description 1
- PWHULOQIROXLJO-UHFFFAOYSA-N Manganese Chemical group [Mn] PWHULOQIROXLJO-UHFFFAOYSA-N 0.000 description 1
- FVKRQMQQFGBXHV-QXEWZRGKSA-N Met-Asp-Val Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O FVKRQMQQFGBXHV-QXEWZRGKSA-N 0.000 description 1
- HLZORBMOISUNIV-DCAQKATOSA-N Met-Ser-Leu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC(C)C HLZORBMOISUNIV-DCAQKATOSA-N 0.000 description 1
- 108010059724 Micrococcal Nuclease Proteins 0.000 description 1
- ZOKXTWBITQBERF-UHFFFAOYSA-N Molybdenum Chemical compound [Mo] ZOKXTWBITQBERF-UHFFFAOYSA-N 0.000 description 1
- 238000000342 Monte Carlo simulation Methods 0.000 description 1
- 102000016943 Muramidase Human genes 0.000 description 1
- 108010014251 Muramidase Proteins 0.000 description 1
- RSPURTUNRHNVGF-IOSLPCCCSA-N N(2),N(2)-dimethylguanosine Chemical compound C1=NC=2C(=O)NC(N(C)C)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O RSPURTUNRHNVGF-IOSLPCCCSA-N 0.000 description 1
- 108010062010 N-Acetylmuramoyl-L-alanine Amidase Proteins 0.000 description 1
- RHGKLRLOHDJJDR-UHFFFAOYSA-N Ndelta-carbamoyl-DL-ornithine Natural products OC(=O)C(N)CCCNC(N)=O RHGKLRLOHDJJDR-UHFFFAOYSA-N 0.000 description 1
- 241001282315 Nemesis Species 0.000 description 1
- 108091028043 Nucleic acid sequence Proteins 0.000 description 1
- AHLPHDHHMVZTML-UHFFFAOYSA-N Orn-delta-NH2 Natural products NCCCC(N)C(O)=O AHLPHDHHMVZTML-UHFFFAOYSA-N 0.000 description 1
- UTJLXEIPEHZYQJ-UHFFFAOYSA-N Ornithine Natural products OC(=O)C(C)CCCN UTJLXEIPEHZYQJ-UHFFFAOYSA-N 0.000 description 1
- 229910019142 PO4 Inorganic materials 0.000 description 1
- SXJGROGVINAYSH-AVGNSLFASA-N Phe-Gln-Asp Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)N SXJGROGVINAYSH-AVGNSLFASA-N 0.000 description 1
- MCIXMYKSPQUMJG-SRVKXCTJSA-N Phe-Ser-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O MCIXMYKSPQUMJG-SRVKXCTJSA-N 0.000 description 1
- 102000011755 Phosphoglycerate Kinase Human genes 0.000 description 1
- 108010022181 Phosphopyruvate Hydratase Proteins 0.000 description 1
- KPDRZQUWJKTMBP-DCAQKATOSA-N Pro-Asp-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H]1CCCN1 KPDRZQUWJKTMBP-DCAQKATOSA-N 0.000 description 1
- LPGSNRSLPHRNBW-AVGNSLFASA-N Pro-His-Val Chemical compound C([C@@H](C(=O)N[C@@H](C(C)C)C([O-])=O)NC(=O)[C@H]1[NH2+]CCC1)C1=CN=CN1 LPGSNRSLPHRNBW-AVGNSLFASA-N 0.000 description 1
- PRKWBYCXBBSLSK-GUBZILKMSA-N Pro-Ser-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O PRKWBYCXBBSLSK-GUBZILKMSA-N 0.000 description 1
- 241000589516 Pseudomonas Species 0.000 description 1
- KDCGOANMDULRCW-UHFFFAOYSA-N Purine Natural products N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 1
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- HRNQLKCLPVKZNE-CIUDSAMLSA-N Ser-Ala-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O HRNQLKCLPVKZNE-CIUDSAMLSA-N 0.000 description 1
- WOUIMBGNEUWXQG-VKHMYHEASA-N Ser-Gly Chemical compound OC[C@H](N)C(=O)NCC(O)=O WOUIMBGNEUWXQG-VKHMYHEASA-N 0.000 description 1
- UIGMAMGZOJVTDN-WHFBIAKZSA-N Ser-Gly-Ser Chemical compound OC[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O UIGMAMGZOJVTDN-WHFBIAKZSA-N 0.000 description 1
- UGGWCAFQPKANMW-FXQIFTODSA-N Ser-Met-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O UGGWCAFQPKANMW-FXQIFTODSA-N 0.000 description 1
- ZKBKUWQVDWWSRI-BZSNNMDCSA-N Ser-Phe-Tyr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O ZKBKUWQVDWWSRI-BZSNNMDCSA-N 0.000 description 1
- NVNPWELENFJOHH-CIUDSAMLSA-N Ser-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CO)N NVNPWELENFJOHH-CIUDSAMLSA-N 0.000 description 1
- XQJCEKXQUJQNNK-ZLUOBGJFSA-N Ser-Ser-Ser Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O XQJCEKXQUJQNNK-ZLUOBGJFSA-N 0.000 description 1
- VGQVAVQWKJLIRM-FXQIFTODSA-N Ser-Ser-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O VGQVAVQWKJLIRM-FXQIFTODSA-N 0.000 description 1
- PURRNJBBXDDWLX-ZDLURKLDSA-N Ser-Thr-Gly Chemical compound C[C@H]([C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CO)N)O PURRNJBBXDDWLX-ZDLURKLDSA-N 0.000 description 1
- BDMWLJLPPUCLNV-XGEHTFHBSA-N Ser-Thr-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O BDMWLJLPPUCLNV-XGEHTFHBSA-N 0.000 description 1
- ANOQEBQWIAYIMV-AEJSXWLSSA-N Ser-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CO)N ANOQEBQWIAYIMV-AEJSXWLSSA-N 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 108700034501 Staphylococcus aureus auR Proteins 0.000 description 1
- 101001099217 Thermotoga maritima (strain ATCC 43589 / DSM 3109 / JCM 10099 / NBRC 100826 / MSB8) Triosephosphate isomerase Proteins 0.000 description 1
- DKDHTRVDOUZZTP-IFFSRLJSSA-N Thr-Gln-Val Chemical compound CC(C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)[C@@H](C)O)C(O)=O DKDHTRVDOUZZTP-IFFSRLJSSA-N 0.000 description 1
- DJDSEDOKJTZBAR-ZDLURKLDSA-N Thr-Gly-Ser Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O DJDSEDOKJTZBAR-ZDLURKLDSA-N 0.000 description 1
- NCXVJIQMWSGRHY-KXNHARMFSA-N Thr-Leu-Pro Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N)O NCXVJIQMWSGRHY-KXNHARMFSA-N 0.000 description 1
- XZUBGOYOGDRYFC-XGEHTFHBSA-N Thr-Ser-Met Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCSC)C(O)=O XZUBGOYOGDRYFC-XGEHTFHBSA-N 0.000 description 1
- COYHRQWNJDJCNA-NUJDXYNKSA-N Thr-Thr-Thr Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O COYHRQWNJDJCNA-NUJDXYNKSA-N 0.000 description 1
- OGOYMQWIWHGTGH-KZVJFYERSA-N Thr-Val-Ala Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O OGOYMQWIWHGTGH-KZVJFYERSA-N 0.000 description 1
- AKHDFZHUPGVFEJ-YEPSODPASA-N Thr-Val-Gly Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)NCC(O)=O AKHDFZHUPGVFEJ-YEPSODPASA-N 0.000 description 1
- OHGNSVACHBZKSS-KWQFWETISA-N Trp-Ala Chemical compound C1=CC=C2C(C[C@H]([NH3+])C(=O)N[C@@H](C)C([O-])=O)=CNC2=C1 OHGNSVACHBZKSS-KWQFWETISA-N 0.000 description 1
- LDMUNXDDIDAPJH-VMBFOHBNSA-N Trp-Ile-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N LDMUNXDDIDAPJH-VMBFOHBNSA-N 0.000 description 1
- 108090000631 Trypsin Proteins 0.000 description 1
- 102000004142 Trypsin Human genes 0.000 description 1
- DXYWRYQRKPIGGU-BPNCWPANSA-N Tyr-Ala-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 DXYWRYQRKPIGGU-BPNCWPANSA-N 0.000 description 1
- GFHYISDTIWZUSU-QWRGUYRKSA-N Tyr-Asn-Gly Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O GFHYISDTIWZUSU-QWRGUYRKSA-N 0.000 description 1
- CNLKDWSAORJEMW-KWQFWETISA-N Tyr-Gly-Ala Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)NCC(=O)N[C@@H](C)C(O)=O CNLKDWSAORJEMW-KWQFWETISA-N 0.000 description 1
- JKUZFODWJGEQAP-KBPBESRZSA-N Tyr-Gly-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)O)N)O JKUZFODWJGEQAP-KBPBESRZSA-N 0.000 description 1
- CTDPLKMBVALCGN-JSGCOSHPSA-N Tyr-Gly-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O CTDPLKMBVALCGN-JSGCOSHPSA-N 0.000 description 1
- SZEIFUXUTBBQFQ-STQMWFEESA-N Tyr-Pro-Gly Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O SZEIFUXUTBBQFQ-STQMWFEESA-N 0.000 description 1
- PLVVHGFEMSDRET-IHPCNDPISA-N Tyr-Ser-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC3=CC=C(C=C3)O)N PLVVHGFEMSDRET-IHPCNDPISA-N 0.000 description 1
- WYOBRXPIZVKNMF-IRXDYDNUSA-N Tyr-Tyr-Gly Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)NCC(O)=O)C1=CC=C(O)C=C1 WYOBRXPIZVKNMF-IRXDYDNUSA-N 0.000 description 1
- ASQFIHTXXMFENG-XPUUQOCRSA-N Val-Ala-Gly Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O ASQFIHTXXMFENG-XPUUQOCRSA-N 0.000 description 1
- ZLFHAAGHGQBQQN-GUBZILKMSA-N Val-Ala-Pro Natural products CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O ZLFHAAGHGQBQQN-GUBZILKMSA-N 0.000 description 1
- QRZVUAAKNRHEOP-GUBZILKMSA-N Val-Ala-Val Chemical compound [H]N[C@@H](C(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(O)=O QRZVUAAKNRHEOP-GUBZILKMSA-N 0.000 description 1
- VMRFIKXKOFNMHW-GUBZILKMSA-N Val-Arg-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CO)C(=O)O)N VMRFIKXKOFNMHW-GUBZILKMSA-N 0.000 description 1
- BMGOFDMKDVVGJG-NHCYSSNCSA-N Val-Asp-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N BMGOFDMKDVVGJG-NHCYSSNCSA-N 0.000 description 1
- CELJCNRXKZPTCX-XPUUQOCRSA-N Val-Gly-Ala Chemical compound CC(C)[C@H](N)C(=O)NCC(=O)N[C@@H](C)C(O)=O CELJCNRXKZPTCX-XPUUQOCRSA-N 0.000 description 1
- BZMIYHIJVVJPCK-QSFUFRPTSA-N Val-Ile-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](C(C)C)N BZMIYHIJVVJPCK-QSFUFRPTSA-N 0.000 description 1
- VPGCVZRRBYOGCD-AVGNSLFASA-N Val-Lys-Val Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O VPGCVZRRBYOGCD-AVGNSLFASA-N 0.000 description 1
- QWCZXKIFPWPQHR-JYJNAYRXSA-N Val-Pro-Tyr Chemical compound CC(C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 QWCZXKIFPWPQHR-JYJNAYRXSA-N 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 230000002378 acidificating effect Effects 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000000996 additive effect Effects 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 108010086434 alanyl-seryl-glycine Proteins 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- 108010040443 aspartyl-aspartic acid Proteins 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000001580 bacterial effect Effects 0.000 description 1
- 244000052616 bacterial pathogen Species 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 1
- 102000006635 beta-lactamase Human genes 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 238000009933 burial Methods 0.000 description 1
- 239000006227 byproduct Substances 0.000 description 1
- 150000001720 carbohydrates Chemical group 0.000 description 1
- 125000004432 carbon atom Chemical group C* 0.000 description 1
- 125000002057 carboxymethyl group Chemical group [H]OC(=O)C([H])([H])[*] 0.000 description 1
- 239000003054 catalyst Substances 0.000 description 1
- 210000000170 cell membrane Anatomy 0.000 description 1
- 230000005859 cell recognition Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000002144 chemical decomposition reaction Methods 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000004587 chromatography analysis Methods 0.000 description 1
- 238000002983 circular dichroism Methods 0.000 description 1
- 229960002173 citrulline Drugs 0.000 description 1
- 235000013477 citrulline Nutrition 0.000 description 1
- 238000004140 cleaning Methods 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 239000002299 complementary DNA Substances 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000002079 cooperative effect Effects 0.000 description 1
- 229920001577 copolymer Polymers 0.000 description 1
- 238000005100 correlation spectroscopy Methods 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 239000003431 cross linking reagent Substances 0.000 description 1
- 238000002425 crystallisation Methods 0.000 description 1
- 230000008025 crystallization Effects 0.000 description 1
- 238000002447 crystallographic data Methods 0.000 description 1
- ATDGTVJJHBUTRL-UHFFFAOYSA-N cyanogen bromide Chemical compound BrC#N ATDGTVJJHBUTRL-UHFFFAOYSA-N 0.000 description 1
- 150000001923 cyclic compounds Chemical class 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 229960003067 cystine Drugs 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010511 deprotection reaction Methods 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- ZPTBLXKRQACLCR-XVFCMESISA-N dihydrouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)CC1 ZPTBLXKRQACLCR-XVFCMESISA-N 0.000 description 1
- FSXRLASFHBWESK-UHFFFAOYSA-N dipeptide phenylalanyl-tyrosine Natural products C=1C=C(O)C=CC=1CC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FSXRLASFHBWESK-UHFFFAOYSA-N 0.000 description 1
- 238000010494 dissociation reaction Methods 0.000 description 1
- 230000005593 dissociations Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 238000009509 drug development Methods 0.000 description 1
- 235000014103 egg white Nutrition 0.000 description 1
- 210000000969 egg white Anatomy 0.000 description 1
- 238000001962 electrophoresis Methods 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000006911 enzymatic reaction Methods 0.000 description 1
- 238000005290 field theory Methods 0.000 description 1
- 125000005519 fluorenylmethyloxycarbonyl group Chemical group 0.000 description 1
- 125000000524 functional group Chemical group 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- 229960003692 gamma aminobutyric acid Drugs 0.000 description 1
- 239000007789 gas Substances 0.000 description 1
- 238000012215 gene cloning Methods 0.000 description 1
- 238000010359 gene isolation Methods 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 108010042598 glutamyl-aspartyl-glycine Proteins 0.000 description 1
- 150000004676 glycans Chemical class 0.000 description 1
- 230000002414 glycolytic effect Effects 0.000 description 1
- VPZXBVLAVMBEQI-UHFFFAOYSA-N glycyl-DL-alpha-alanine Natural products OC(=O)C(C)NC(=O)CN VPZXBVLAVMBEQI-UHFFFAOYSA-N 0.000 description 1
- XBGGUPMXALFZOT-UHFFFAOYSA-N glycyl-L-tyrosine hemihydrate Natural products NCC(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-UHFFFAOYSA-N 0.000 description 1
- 108010082286 glycyl-seryl-alanine Proteins 0.000 description 1
- 108010089804 glycyl-threonine Proteins 0.000 description 1
- 108010010147 glycylglutamine Proteins 0.000 description 1
- 108010050848 glycylleucine Proteins 0.000 description 1
- 108010015792 glycyllysine Proteins 0.000 description 1
- 108010087823 glycyltyrosine Proteins 0.000 description 1
- 108010037850 glycylvaline Proteins 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 108010085325 histidylproline Proteins 0.000 description 1
- 108010018006 histidylserine Proteins 0.000 description 1
- 238000009396 hybridization Methods 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 150000002484 inorganic compounds Chemical class 0.000 description 1
- 229910010272 inorganic material Inorganic materials 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 229910052742 iron Inorganic materials 0.000 description 1
- 150000002540 isothiocyanates Chemical class 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 239000008101 lactose Substances 0.000 description 1
- 108010012058 leucyltyrosine Proteins 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 150000002632 lipids Chemical class 0.000 description 1
- 239000007791 liquid phase Substances 0.000 description 1
- 229960000274 lysozyme Drugs 0.000 description 1
- 239000004325 lysozyme Substances 0.000 description 1
- 235000010335 lysozyme Nutrition 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 229930182817 methionine Natural products 0.000 description 1
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 108091005601 modified peptides Proteins 0.000 description 1
- 229910052750 molybdenum Inorganic materials 0.000 description 1
- 239000011733 molybdenum Substances 0.000 description 1
- SYSQUGFVNFXIIT-UHFFFAOYSA-N n-[4-(1,3-benzoxazol-2-yl)phenyl]-4-nitrobenzenesulfonamide Chemical class C1=CC([N+](=O)[O-])=CC=C1S(=O)(=O)NC1=CC=C(C=2OC3=CC=CC=C3N=2)C=C1 SYSQUGFVNFXIIT-UHFFFAOYSA-N 0.000 description 1
- 238000001683 neutron diffraction Methods 0.000 description 1
- 239000002547 new drug Substances 0.000 description 1
- 125000004433 nitrogen atom Chemical group N* 0.000 description 1
- 238000002515 oligonucleotide synthesis Methods 0.000 description 1
- 210000003463 organelle Anatomy 0.000 description 1
- 150000002894 organic compounds Chemical class 0.000 description 1
- 229960003104 ornithine Drugs 0.000 description 1
- 230000010355 oscillation Effects 0.000 description 1
- 238000005009 overhauser spectroscopy Methods 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 108010051242 phenylalanylserine Proteins 0.000 description 1
- 229940117953 phenylisothiocyanate Drugs 0.000 description 1
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 1
- 239000010452 phosphate Substances 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 239000013600 plasmid vector Substances 0.000 description 1
- 230000010287 polarization Effects 0.000 description 1
- 229920001282 polysaccharide Polymers 0.000 description 1
- 239000005017 polysaccharide Substances 0.000 description 1
- 229920005990 polystyrene resin Polymers 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 108010031719 prolyl-serine Proteins 0.000 description 1
- 108010029020 prolylglycine Proteins 0.000 description 1
- 125000006239 protecting group Chemical group 0.000 description 1
- IGFXRKMLLMBKSA-UHFFFAOYSA-N purine Chemical compound N1=C[N]C2=NC=NC2=C1 IGFXRKMLLMBKSA-UHFFFAOYSA-N 0.000 description 1
- NHDHVHZZCFYRSB-UHFFFAOYSA-N pyriproxyfen Chemical compound C=1C=CC=NC=1OC(C)COC(C=C1)=CC=C1OC1=CC=CC=C1 NHDHVHZZCFYRSB-UHFFFAOYSA-N 0.000 description 1
- 238000010791 quenching Methods 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 238000007670 refining Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 229920005989 resin Polymers 0.000 description 1
- 239000011347 resin Substances 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 230000000630 rising effect Effects 0.000 description 1
- 229920002477 rna polymer Polymers 0.000 description 1
- 239000012266 salt solution Substances 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 238000004062 sedimentation Methods 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 238000012601 sigmoid curve-fitting method Methods 0.000 description 1
- 238000010583 slow cooling Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000010532 solid phase synthesis reaction Methods 0.000 description 1
- 238000004611 spectroscopical analysis Methods 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 238000000547 structure data Methods 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 238000010998 test method Methods 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 238000012876 topography Methods 0.000 description 1
- 238000001551 total correlation spectroscopy Methods 0.000 description 1
- 230000005026 transcription initiation Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 239000012588 trypsin Substances 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 108010020532 tyrosyl-proline Proteins 0.000 description 1
- 108010003137 tyrosyltyrosine Proteins 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- IBIDRSSEHFLGSD-UHFFFAOYSA-N valinyl-arginine Natural products CC(C)C(N)C(=O)NC(C(O)=O)CCCN=C(N)N IBIDRSSEHFLGSD-UHFFFAOYSA-N 0.000 description 1
- 108010021199 valyl-valyl-valine Proteins 0.000 description 1
- 108010073969 valyllysine Proteins 0.000 description 1
- 230000035899 viability Effects 0.000 description 1
- 230000001018 virulence Effects 0.000 description 1
- 238000002424 x-ray crystallography Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K1/00—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- This invention relates to methods for determining the stability and conformation of molecular systems. The method also predicts the effects of mutations on the structure and the stability of the molecular system.
- a peptide is an oligomer of amino acids attached in a linear sequence to form, for example, a protein or an enzyme.
- Peptides consist of a main chain backbone having the following general pattern:
- the primary sequence of a peptide represents the sequence of the constituent amino acids such as, for example, NH2-Glu-Ala-Thr-Gly-OH (SEQ ID NO:1) (the three letter symbols represent amino acid residues).
- the NH2- and -OH moieties represent the amino and carboxyl termini of the peptide, respectively, and also indicate the directionality of the peptide chain.
- the peptide's secondary structure represents the complex shape of the main chain and generally indicates structural motifs of different portions of the peptide. Common secondary structure includes, for example, alpha-helices, beta-sheets, etc.
- The tertiary structure of a peptide represents the three dimensional structure of the main chain, as well as the side-chain conformations.
- Tertiary structure is usually represented by a set of coordinates that specify the positions of each atom in the peptide main chain and side-chains and is often visualized using computer graphics or stereopictures.
- quaternary structure represents the three-dimensional shape and the interactions that occur between different peptide chains, such as between subunits of a protein complex.
- Non-amino acid fragments are often associated with a peptide. Such fragments can be covalently attached to a portion of the peptide or attached by non-covalent forces.
- Non-amino acid moieties include, but are not limited to, heavy metal atoms such as, for example, single molybdenum, iron, or manganese atoms, or clusters of metal atoms, nucleic acid fragments (such as DNA, RNA, etc.), lipids, and other organic and inorganic molecules (such as hemes, cofactors, etc.).
- the three-dimensional complexity of a peptide arises because covalent bonds in each amino acid can rotate.
- the conformation of a peptide is a particular three-dimensional arrangement of atoms and, as used herein, is equivalent to its tertiary structure.
- the conformation of an amino acid side-chain is the three-dimensional structure of the side-chain.
- an amino acid side-chain can assume many different conformations, with the exception of glycine, which assumes only one.
- Peptide folding and structure prediction has traditionally been viewed as a very complex problem because of the large number of atoms in a typical peptide.
- the large size of a peptide chain, in combination with its large number of degrees of freedom, allows it to adopt an immense number of conformations.
- a relatively small polypeptide of 100 residues has 3^100 possible conformations considering only three possible conformational states for each residue.
- Despite the multitude of possible conformations, many peptides, even large proteins and enzymes, fold in vivo into precise three-dimensional structures.
- the peptide generally folds back on itself creating numerous simultaneous interactions between different parts of the peptide.
- the principal difficulty of predicting side-chain conformations is the enormous size of the conformation space (i.e., number of possible combinations of side-chain conformations) .
- a peptide having n different χ torsions produces up to 36^n conformationally distinct peptides. For example, a five residue peptide with a total of ten χ torsions has 3.7 × 10^15 possible conformations that need to be evaluated to determine the low energy conformations.
- a prior strategy to optimize the structure of a peptide decreases the number of conformational permutations by limiting the number of conformations allowed for each side-chain (see, Ponder and Richards J. Mol. Biol. (1987) vol. 193, pg. 775, which is incorporated by reference for all purposes).
- This method allows each side-chain to exist in only a small number of predetermined rotamers, typically three to seven, and forbids free rotation of each amino acid torsion. Thus, for a five amino acid peptide where each side-chain is constrained to five rotamers, there are only 3125 possible permutations.
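The conformation counts quoted above can be checked with a few lines of integer arithmetic. The sketch below is illustrative only: the function name is hypothetical, and reading the factor 36 as one setting per 10° of rotation is an assumption, since the text gives only the number.

```python
# Sizes of the conformation spaces quoted in the text (exact integer arithmetic).
# Assumption: 36 settings per torsion corresponds to a 10-degree grid.

def count_conformations(n_units: int, states_per_unit: int) -> int:
    """Number of combinations when each of n_units independently takes states_per_unit values."""
    return states_per_unit ** n_units

full_grid = count_conformations(10, 36)        # ten chi torsions, 36 settings each
rotamer_library = count_conformations(5, 5)    # five residues, five rotamers each
backbone_states = count_conformations(100, 3)  # 100 residues, three states each

print(f"{full_grid:.1e}")          # 3.7e+15
print(rotamer_library)             # 3125
print(len(str(backbone_states)))   # 48 (decimal digits in 3**100)
```

Restricting each side-chain to a handful of rotamers therefore shrinks the search by roughly twelve orders of magnitude for this five residue case, although the count still grows exponentially with the number of residues.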
- Reid and Thornton (Proteins (1989) vol. 5, pg. 170, which is incorporated by reference for all purposes) used this method to predict side-chain conformations of flavodoxin with an overall root-mean-square (r.m.s.) deviation of 2.41 Å, compared with the X-ray crystal structure. They started from the alpha carbon coordinates alone, using computational methods to predict main-chain atoms, and manual examination and adjustment using computer graphics to predict the side-chain conformations.
- a third approach to predict peptide structure is exemplified by Karplus et al. (Proc. Nat. Acad. Sci. USA (1989) vol. 86, pg. 8237, which is incorporated by reference for all purposes), which uses a multiterm potential energy function to calculate the interaction energy between atoms in the protein.
- the minimization method used molecular dynamics and, as a method for predicting peptide structure, this method relies on a detailed structure as a starting point. Like the other methods, this method has an exponential dependence on the number of atoms considered in the calculation.
- the present invention provides a new method that combines an explicit focus on structure prediction with ensemble methods more suited to calculation of energies.
- the ensemble methods which are rooted in thermodynamic formalisms provide accurate predictions of mutant thermostability.
- One aspect of the present invention involves a method for using a computer having a memory to compare the physical stability of a first molecular system and a second molecular system. Both molecular systems have one or more degrees of freedom and a plurality of conformations for each of the one or more degrees of freedom.
- the method includes the following steps: a) preparing a geometric representation of the first molecular system, the geometric representation having a defined initial structure; b) assigning probabilities to each of the plurality of conformations of each of the degrees of freedom; c) repeatedly adjusting the conformation of each degree of freedom according to the probabilities assigned to the conformations of each degree of freedom, and, after each adjustment, determining an energy of each conformation of each degree of freedom in a field, the field associated with each conformation of each degree of freedom being caused by the conformations of the remaining degrees of freedom; d) replacing the probability assigned to each of the plurality of conformations of each degree of freedom, the probability determined from the energy of each conformation; e) repeating steps c and d until the energy of each conformation of each degree of freedom converges to a substantially unchanging value, that value corresponding to the physical stability of said first molecular system; f) repeating steps a through e for said second molecular system to determine the physical stabilities of both the first
- conformation energy and probability maps are employed to determine the packing energy of a macromolecular structure. This can be accomplished by first preparing a geometric representation of the macromolecular structure and dividing it into one or more residues, each of the residues having a side-chain and torsion angle degrees of freedom. Next, an initial conformation probability map for each of the residues is prepared. At least one of the residues is then moved to a new conformation, although in most instances many residues will be moved to a new conformation. The residues' conformation probability maps are used in determining an average conformation energy map for each of the residues. Next, each residue's average energy map is used to prepare a new conformation probability map to replace the previous conformation probability maps of each residue. The whole process of moving the residues to new conformations and determining average conformation energy maps is then repeated over and over until the average conformation energy map converges. Finally, the packing energy of the macromolecular structure is determined from the average conformation energy maps of each residue.
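The loop described above (probability maps, average energy maps, new probability maps, repeat until convergence) can be sketched as a self-consistent iteration. This is a minimal sketch under stated assumptions, not the procedure of the invention: the Boltzmann weighting used to turn energies into probabilities, the value kT = 0.6 kcal/mol, the uniform starting maps, the residue-by-residue update order, and every name in the code are assumptions made for illustration. The averages are also computed exactly over each probability map rather than by repeatedly moving residues and sampling.

```python
import numpy as np

def mean_field_packing(self_energy, pair_energy, kT=0.6, tol=1e-6, max_cycles=200):
    """Iterate probability and average-energy maps for discrete residue conformations.

    self_energy[i]      : array (n_i,), energy of residue i's conformations against the fixed main chain
    pair_energy[(i, j)] : array (n_i, n_j), interaction energies for i < j
    """
    n = len(self_energy)
    # Initial conformation probability maps: every conformation equally likely (assumed).
    probs = [np.full(len(e), 1.0 / len(e)) for e in self_energy]
    fields = [np.array(e, dtype=float) for e in self_energy]
    for _ in range(max_cycles):
        largest_change = 0.0
        for i in range(n):
            # Average energy map of residue i in the field of all other residues.
            field = np.array(self_energy[i], dtype=float)
            for j in range(n):
                if j == i:
                    continue
                e_ij = pair_energy[(i, j)] if i < j else pair_energy[(j, i)].T
                field += e_ij @ probs[j]
            largest_change = max(largest_change, np.abs(field - fields[i]).max())
            fields[i] = field
            # New probability map from the energy map (Boltzmann weighting is an assumption).
            weights = np.exp(-(field - field.min()) / kT)
            probs[i] = weights / weights.sum()
        if largest_change < tol:  # energy maps substantially unchanged
            break
    return probs, fields

# Toy system: two residues, two conformations each, with a clash when both adopt conformation 0.
self_energy = [np.array([0.0, 3.0]), np.array([0.0, 0.5])]
pair_energy = {(0, 1): np.array([[5.0, 0.0], [0.0, 0.0]])}
probs, fields = mean_field_packing(self_energy, pair_energy)
print([int(np.argmax(p)) for p in probs])
```

Each cycle costs a number of energy look-ups that grows with the number of residue pairs, rather than with the number of combined conformations, which is the practical appeal of treating each residue in the averaged field of the others.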
- a preferred method predicts the three dimensional conformation of a peptide.
- the method utilizes the understanding that amino acid side-chains of a peptide adopt conformations that maximize favorable atom-atom contacts and minimize unfavorable contacts. With this principle, the method determines the energy of atom-atom interactions and adjusts the amino acid side-chain conformations to minimize this energy.
- the invention is directed to a method for determining the three-dimensional structure of a peptide having amino acid side-chains extending from a defined main chain.
- Each amino acid side-chain has predefined rotational degrees of freedom.
- the present invention also provides a method for determining the time-average packing conformation of a macromolecular structure.
- the method includes the following steps: a) preparing a geometric representation of the macromolecular structure; b) dividing the geometric representation into one or more structural zones; c) determining an initial conformation probability map for each of the structural zones; d) moving said geometric representation to a new conformation; e) determining an average conformation energy map for each of the structural zones from that zone's conformation probability map; f) replacing the conformation probability maps of each zone with new conformation probability maps determined from each zone's average energy map; and g) repeating steps d through f until said conformation probability map converges, the converged conformation probability map representing a time-average packing conformation of the macromolecular structure.
- the present invention also provides a method for producing a peptide having a specified stability.
- this method consists of the following steps: a) selecting a known peptide having a desired activity; b) generating a series of mutant peptide sequences from the known peptide by replacing one or more of its residues with different amino acid residues; c) determining the stability of each of said mutant protein structures by the following steps:
- step d repeating steps iii and iv until the energy of each conformation of each degree of freedom converges to a substantially unchanging value, the sum of these values corresponding to the stability of said peptide; d) identifying a mutant peptide sequence from among said series of mutant peptide sequences having the specified stability; and e) synthesizing a peptide having the mutant peptide sequence identified in step d.
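The selection steps above (generate mutant sequences, determine the stability of each, identify those with the specified stability) can be sketched as a simple screening loop. Everything named here is hypothetical: `stability` stands for whatever stability calculation is used, the toy scoring function exists only to make the example runnable, and the convention that lower scores mean greater stability is an assumption.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

def single_site_mutants(sequence: str, positions: Iterable[int],
                        alphabet: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant_sequence) for every substitution; positions are 1-based."""
    for pos in positions:
        wild = sequence[pos - 1]
        for residue in alphabet:
            if residue != wild:
                yield f"{wild}{pos}{residue}", sequence[:pos - 1] + residue + sequence[pos:]

def select_mutants(sequence: str, positions: Iterable[int], alphabet: str,
                   stability: Callable[[str], float],
                   threshold: float) -> List[Tuple[float, str, str]]:
    """Return (score, label, sequence) for mutants scoring at or below threshold, best first."""
    scored = [(stability(seq), label, seq)
              for label, seq in single_site_mutants(sequence, positions, alphabet)]
    return sorted(item for item in scored if item[0] <= threshold)

def toy_stability(seq: str) -> float:
    """Placeholder score (NOT a physical model): each isoleucine lowers the score by one."""
    return -float(seq.count("I"))

hits = select_mutants("AVGV", positions=[2, 4], alphabet="IL",
                      stability=toy_stability, threshold=-1.0)
print(hits)  # [(-1.0, 'V2I', 'AIGV'), (-1.0, 'V4I', 'AVGI')]
```

In practice the placeholder would be replaced by a converged packing-energy calculation for each mutant, and the selected sequences would then be passed on for synthesis.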
- the present invention is also directed to synthetic peptide compositions which exhibit thermal stabilization by improved core packing.
- some subtilisin polypeptide mutants will exhibit improved stability when certain hydrophobic amino acids are substituted for other hydrophobic amino acids. It has been found that substituting isoleucine for valine at the 30, 180 and/or 192 sequence positions strongly increases the peptide stability.
- a peptide is synthesized having the structure used to model a low energy or otherwise stable peptide.
- the stable peptide structure is identified from among a group of structures that are modeled according to the above procedure. Each of these structures will have at least one amino acid that is different from a corresponding amino acid in the other structures. At least one structure from among this group will be identified as having a suitable stability and thereafter synthesized by techniques that are well known in the art.
- Fig. 1 illustrates the arrangement of atoms in a peptide backbone.
- Fig. 2 illustrates (a) the general chemical structure of a naturally occurring amino acid, (b) the chemical structure of glycine, and (c) the chemical structure of proline.
- Fig. 3a illustrates preferred sets of rotational degrees of freedom for each naturally occurring amino acid.
- Fig. 3b illustrates the chemical structure of the five naturally occurring nucleotides.
- Fig. 4 schematically shows the torsion about a carbon-carbon single bond.
- Fig. 5 illustrates a digital computer system that may be used to implement some aspects of the present invention.
- Fig. 6a shows the procedure used to load the main chain coordinates and amino acid sequence of the peptide to be modeled.
- Fig. 6b schematically illustrates the set-up and precalculation steps employed in some methods of the present invention.
- Fig. 6c schematically illustrates the preparation of lists of interactions between side-chains and main chains, and side-chains and other side-chains.
- Fig. 6d schematically illustrates the main program for calculating the peptide packing conformation.
- Figs. 7a and 7b schematically show the bond between Cβ and the plane formed by C, Cα and N.
- Figs. 7c and 7d schematically illustrate the torsion angle about the bond between Co and C ⁇ .
- Figs. 8a-d present various comparisons between the predicted results of the present invention and the experimental results for activity and stability of lambda repressor mutants.
- Fig. 9a shows a histogram comparing the activity of various lambda repressor mutants based upon their packing energies as calculated by the present invention.
- Fig. 9b shows a histogram comparing the activity of various lambda repressor mutants over the range of their volumes (relative to the wild-type in units of methylene groups) .
- Fig. 10 presents a comparison of predicted side-chain coordinates for an eight residue molten zone surrounding mutations in lambda repressor.
- Fig. 11 presents a comparison of the internal r.m.s. deviations of side-chain predictions from seven runs seeded with different starting structures.
- Fig. 12 shows a contour plot of conformation space for the condensation of a single residue.
- Figs. 13a-c illustrate the condensation of a six residue molten zone for wild-type lambda repressor.
- Fig. 14 illustrates convergence of the total system energy as a function of iteration cycle for a six residue molten zone.
- Figs. 15a-c present comparisons of simulated annealing and the method of the present invention: (a) on the basis of lowest energy conformation of flavodoxin versus number of cycles; (b) on the basis of increasing molten zone size and the final peptide energy; and (c) on the basis of the number of moves required for convergence.
- Fig. 16 shows theoretical calculations of the free energy of folding for a series of hydrophobic core mutations in the protein barnase.
- Fig. 17 shows a comparison of experimental and calculated binding free energies for 15-mer peptides binding to the S-protein fragment of pancreatic ribonuclease A.
- a side-chain "rotamer" is a frequently observed rotational isomer for a residue, constituting a single, static conformation of that residue. This term has been used in some references to describe the limited set of isomers used to model peptide side-chain conformations.
- Conformation map refers to a map of the conformations of a zone (region of structure) within a particular molecular system. For example, an amino acid residue can be considered to be a zone within a peptide.
- the axes of a conformation map are the degrees of freedom associated with the particular zone or residue. For example, translational, rotational, torsional, and vibrational degrees of freedom may serve as axes.
- a “conformation probability map” describes the probability that a residue is in a given conformation at any particular instant in time.
- the conformation probability map will have an axis representing the probability associated with each conformation (dependent variable) .
- Conformation energy map refers to a map describing the energy experienced by a given residue in each of its possible conformations, due to its interactions with other residues and molecules.
- the conformation energy map will have axes corresponding to degrees of freedom in the zone. In addition, it will have an axis corresponding to the energy associated with each conformation.
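One way to picture these maps is as arrays whose axes are the degrees of freedom of the zone, with the energy or probability stored as the array value. The sketch below is illustrative only: it assumes a residue with two torsional axes sampled every 10°, a made-up threefold torsional energy, and a Boltzmann conversion from energy to probability with kT = 0.6 kcal/mol. None of these choices, and none of the names, come from the text.

```python
import numpy as np

STEP = 10                                  # grid spacing in degrees (assumed)
chi_axis = np.arange(-180, 180, STEP)      # 36 settings along each torsional axis
chi1, chi2 = np.meshgrid(chi_axis, chi_axis, indexing="ij")

# Toy conformation energy map: threefold wells at -60, 60 and 180 degrees on both axes.
energy_map = (1 + np.cos(np.radians(3 * chi1))) + (1 + np.cos(np.radians(3 * chi2)))

# Conformation probability map derived from the energy map (Boltzmann weighting assumed).
kT = 0.6
weights = np.exp(-(energy_map - energy_map.min()) / kT)
probability_map = weights / weights.sum()

def lookup(grid: np.ndarray, chi1_deg: float, chi2_deg: float) -> float:
    """Value stored at the grid cell nearest to the given pair of torsion angles."""
    i = int(round((chi1_deg + 180) / STEP)) % len(chi_axis)
    j = int(round((chi2_deg + 180) / STEP)) % len(chi_axis)
    return float(grid[i, j])

print(energy_map.shape)                    # (36, 36)
print(lookup(probability_map, 60, -60))    # one of the most probable cells
```

The two independent axes carry the degrees of freedom and the stored value plays the role of the dependent axis (energy or probability) described above.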
- Stability refers, in one sense, to the ability of a molecular system such as a polymer to remain in an active conformation when subjected to thermal or disruptive effects.
- a conformation is active when it possesses at least one measurable property.
- an enzyme may be considered active so long as it can act as a catalyst.
- Stability also refers, in a thermodynamic sense, to molecular complexes having generally low free energies. Of course, the free energy is determined with respect to some base state such as the unfolded conformation or an unbound receptor.
- stability can represent the free energy of folding for a given peptide sequence, while in the case of two interacting molecules (e.g. an enzyme and its substrate), stability can represent the free energy of binding between them.
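To make the thermodynamic sense concrete, the sketch below relates a free energy measured against a base state to a population, and compares a mutant with the wild type. It assumes a simple two-state equilibrium, energies in kcal/mol, and hypothetical function names; the numerical inputs are invented for illustration and are not results from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g_fold: float, temperature: float = 298.15) -> float:
    """Two-state folded population; delta_g_fold = G(folded) - G(unfolded) in kcal/mol."""
    k = math.exp(-delta_g_fold / (R_KCAL * temperature))
    return k / (1.0 + k)

def relative_stability(dg_mutant: float, dg_wildtype: float) -> float:
    """Difference in free energy of folding; negative means the mutant is more stable."""
    return dg_mutant - dg_wildtype

print(fraction_folded(0.0))                   # 0.5: folded and unfolded equally populated
print(round(fraction_folded(-5.0), 4))
print(relative_stability(-6.0, -5.0))         # -1.0
```

The same bookkeeping applies to binding, with the unbound molecules as the base state in place of the unfolded conformation.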
- An increase in the physical stability of a peptide generally results in an increase in thermal stability, although it could result in an increase in binding affinity, stability in salt solutions, pH stability, and other environmental conditions.
- Molecular System refers to a collection of atoms, at least some of which are covalently bonded to one another, interacting via a defined set of noncovalent forces. These may include van der Waals forces, hydrogen bonding, and electrostatic forces.
- a molecular system refers to species in any chemical class such as organic compounds and inorganic compounds. It also refers to species in any phase, such as gas, liquid or solid phases.
- Degrees of Freedom refers to the independent parameters that define the conformation of the molecular system. Common examples of degrees of freedom associated with motion include translation, rotation and vibration. A degree of freedom may also be used to describe independent ways that a molecular system may take up energy.
- Macromolecular structure refers to molecules, sub-molecular groups, and complexes between one or more molecules or groups.
- a macromolecular group will generally have a molecular weight of more than about 200, preferably more than about 500, and most preferably more than about 1000.
- the macromolecular complex will typically have a main-chain or "backbone” which is a string of repeating molecular units.
- the macromolecular complex will often possess a series of side-chains extending from the main-chain.
- Examples of macromolecular complexes include proteins and large peptides alone or associated with other molecules such as cofactors, substrates, membranes, and cell structural organelles.
- Macromolecular complexes will also include nucleic acids such as DNA and RNA, as well as these materials combined with materials such as histones, ribosomes, and polymerases.
- “Mutant” refers to molecular systems that are expressed by a mutation (i.e. an alteration in the amount or arrangement of genetic material of a cell or virus) .
- a mutant is any variant of a wildtype structure (i.e. the typical form of a biological molecule as it occurs in nature) in which one or more amino acids have been deleted or changed. In most instances, the mutant will retain most of the structural information of the parent wildtype structure.
- mutant refers to any molecular system that has been modified to deviate from its native state. For example, a peptide containing amino acids that are not genetically coded is a "mutant".
- a “geometric representation” refers to an abstract model of a real molecular system.
- the geometric representation will have an arrangement of structural features and degrees of freedom that correspond to the real molecular system. Through manipulations on a computer or other means for rapidly evaluating equations, the geometric representation may be carried through a range of movements to explore the properties of a real molecular system in various conformations.
- Converge refers to the state of an iterative process in which a result of the process remains substantially unchanged after each iteration. For example, if the process repeatedly calculates a conformation energy map, the process converges when the absolute magnitude of the energy values and the relative topography of the map remain substantially unchanged from one iteration to the next.
- One aspect of determining peptide conformation and energetics is the prediction of two basic classes of degrees of freedom.
- the φ-ψ torsions, which determine the folding of main chain atoms of the peptide, and the χ torsions, which specify the set of angles that defines the conformation of each amino acid side-chain. These two sets of variables are closely coupled, because of the tremendous importance of side-chain conformation and packing for the stability of the overall peptide conformation.
- a preferred method of the invention determines the set of favored χ torsional conformations, holding the φ-ψ torsions substantially constant.
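Both classes of degrees of freedom are torsion angles, each defined by four consecutive atoms. As a minimal sketch (the function name and the test coordinates are illustrative, and the text does not prescribe a formula), a signed torsion about the bond between the second and third atoms can be computed from Cartesian coordinates as follows.

```python
import numpy as np

def torsion_angle(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)           # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Four atoms arranged so that the outer bonds are perpendicular when viewed down the central bond.
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 6))  # 90.0
```

Applied to N, Cα, Cβ and the next side-chain atom, the function gives the first χ torsion of a residue; applied to consecutive main chain atoms, it gives the main chain torsions.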
- Peptides fall into the general class of polymers and are simply molecules generated from a sequence of amino acid residues connected in series.
- the peptide backbone, or main chain, consists of a repeated sequence of three atoms: an amide nitrogen N_i, the alpha carbon Cα_i, and the carbonyl carbon C_i, where i represents the amino acid in the peptide sequence.
- the carbonyl oxygen, O_i, is attached to the carbonyl carbon and hydrogens are attached to both the amide nitrogen and alpha carbon. In principle, rotation can occur around any of the three bonds of the peptide main-chain.
- the bond between C_i and N_i+1 has partial double bond character that inhibits its rotation and, in the absence of a strong force, Cα_i, C_i, O_i, and N_i+1 lie in approximately the same plane.
- the first carbon of the side-chain, which is attached to Cα_i, is the beta carbon, Cβ_i.
- the beta carbon of each residue has a fixed position relative to the peptide main chain defined by Cα_i, C_i, and N_i.
- the position of the main chain specifies the position of the first atom of each side-chain.
- by beta carbon, or Cβ, we refer to the atom of a side-chain attached to Cα.
- Cβ is a carbon
- for glycine Cβ is a hydrogen.
- Each side-chains has unique physical and chemical properties, as is well known (see, Creighton "Proteins: Structure and Molecular Principles," W.H. Freeman and Company, New York, 1984, which is incorporated by reference for all purposes) .
- the side-chains of each amino acid can adopt a myriad of possible conformations, the number of which depends on the number of predefined rotational degrees of freedom.
- Fig. 3a illustrates a preferred set of rotational degrees of freedom for each amino acid. As defined in this set, rotations about methyl groups, which have, in theory, a three-fold rotation axis (taking hydrogen atoms into account), and about hydroxyl/sulfhydryl groups, which have no rotational symmetry, are in most instances not included.
- Structurally simple amino acids such as alanine and glycine, as well as the imino acid proline, consist of side-chains that have no rotational degrees of freedom, while the side-chains of more complex amino acids such as lysine and arginine have four.
- the side-chains of alanine and proline all have one conformation.
- amino acids in these categories include enantiomers and diastereomers of the natural D-amino acids, oxyproline, cyclohexylalanine, norleucine, cysteic acid, methionine sulfoxide, ornithine, citrulline, omega-amino acids such as 3-amino propionic acid, 4-amino butyric acid, etc. All such amino acids can be incorporated into peptides by suitable methods known in the art, and the structure of a peptide having these uncommon amino acids can be determined when the structure and properties of the uncommon amino acids are known.
- amino acid includes all natural amino acids encoded by the genetic code, as well as uncommon natural amino acids and unnatural amino acids.
- the invention method is suitable for determining the structure of poly-deoxyribonucleic acids (DNA) and poly-ribonucleic acids (RNA) , as well as protein-DNA and protein-RNA complexes.
- the monomeric units of these biological polymers are the nucleotides, which are shown in Fig. 3b.
- the five common, naturally occurring nucleotides adenine, guanine, cytosine, thymine, and uracil have the general structure consisting of a phosphate, a sugar, and a purine or pyrimidine base. Each of these nucleotides is planar, and has one rotational degree of freedom, as shown in Fig. 4.
- nucleotide refers to the set of common and uncommon naturally-occurring nucleotides, as well as the set of unnatural nucleotides.
- the sugar ring may be deoxyribose, ribose, or any suitable variation (such as, for example, in a 2-methyl nucleotide) .
- the preferred method utilizes the three-dimensional structure of the peptide main chain as a starting point for predicting the conformation of the peptide side-chains.
- X-ray or neutron diffraction (hereinafter referred to as "diffraction") provides a detailed picture of the three-dimensional positioning of the peptide main chain. Diffraction methods are well known (see, for example, Cantor et al. "Biophysical Chemistry Vol. III" (1980) W.H. Freeman & Co., San Francisco, chapter 13, which is incorporated by reference for all purposes).
- Diffraction methods are based on the observation that many peptides crystallize into well-defined three-dimensional crystal lattices which scatter impinging X-ray or neutron irradiation. Collection and analysis of the scattered beams, in conjunction with other experiments, produces the three-dimensional structure of the crystal lattice. In the current state of peptide crystallography, obtaining the three-dimensional structure generally requires use of auxiliary techniques such as isomorphous heavy metal replacement, multiple wavelength scattering, and anomalous scattering to supplement the collected scattered X-ray or neutron beam data (see Cantor et al. "Biophysical Chemistry Vol. III").
- Coordinates for each atom of the peptide main chain are obtained once the electron density map of the peptide main chain has been solved.
- the electron density map of the peptide generally has an associated correlation coefficient and resolution that represent the accuracy of the data and the amount of detail present, respectively.
- In accurate high resolution electron maps, structural elements such as the coordinates of main chain and side-chain atoms are readily observed.
- Low resolution data generally includes the position of the main chain atoms but does not, however, include side-chain positions.
- the present method utilizes both high and low resolution diffraction data.
- Other methods for determining the three-dimensional conformation of the peptide main chain suitable for use with the invention include, for example, nuclear magnetic resonance (NMR) spectroscopy and theoretical prediction.
- Structural determination by NMR spectroscopy involves three steps: identification and assignment of resonance signals of the spectra to individual nuclei, inter-nuclei distance measurements, and computation of the structure.
- Suitable NMR methods include, for example, one-dimensional proton (1H) NMR spectroscopy, which is used to identify individual protons in a peptide, two-dimensional 1H NMR methods (including correlated experiments which rely on J-coupling), which provide interproton relationships using through-bond coupling, and the Nuclear Overhauser Effect (NOE) experiments, which provide spatial relationships using through-space information (see Griesing et al. J. Mag. Res. (1989), vol. 73, pg. 574).
- NMR methods suitable for use with the present invention include the use of insensitive nucleus enhancement by polarization transfer (INEPT) , two-dimensional Nuclear Overhauser spectroscopy (NOESY) , reverse INEPT, totally correlated spectroscopy
- Such methods will in some instances involve ab initio prediction of the main chain coordinates (such as the method of Finkelstein and Reva, 1991) , and in other instances involve interpretation of experimental data (e.g. X-ray diffraction results) to resolve the main chain coordinates.
- the positions of all main chain atoms need not be initially determined.
- the carbonyl carbon and oxygen, Cα, and the amide nitrogen are generally constrained to lie in a plane. With this constraint and the knowledge of the positions of some of these atoms, and amino to carboxyl direction, the remaining atoms of the peptide main chain can be constructed as known in the art (see, Kabsch, Acta Cryst. (1978) vol. A34, pg. 827, which is incorporated by reference for all purposes).
- amino acids may be either L-optical isomers or D-optical isomers, but unless otherwise specified will be the naturally occurring L-amino acids.
- Standard abbreviations for amino acids will be used, whether a single letter or three letters are used. The single letter abbreviations are included in Stryer, Biochemistry, 3rd
- a primary sequence of the peptide is mapped onto this peptide conformation.
- a primary sequence is mapped onto a main chain by assigning a side-chain to a particular main chain atom.
- a glutamic acid side-chain, conventionally designated by the symbol E, is assigned to the first alpha carbon of the peptide, Cα_1, an aspartic acid side-chain (symbol D) is assigned to the second alpha carbon of the peptide, Cα_2, a glycine (symbol G) side-chain to Cα_3 and Cα_4, etc.
- the three-dimensional position of Cβ for each side-chain is determined according to predefined relationships with the main chain backbone. Mapping of the primary sequence of the peptide onto the main chain backbone identifies the alpha carbons associated with each amino acid and positions Cβ for each residue in a predetermined position relative to the main chain backbone.
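Because the beta carbon sits at a fixed position relative to N, Cα and C, it can be built directly from the main chain coordinates. The sketch below uses an idealized-geometry construction with coefficients that are common in structure-modeling code; the coefficients, the function name and the example coordinates are assumptions and are not taken from the text or from Kabsch.

```python
import numpy as np

def place_cbeta(n, ca, c) -> np.ndarray:
    """Idealized beta-carbon position from the N, C-alpha and C coordinates of one residue."""
    n, ca, c = (np.asarray(p, dtype=float) for p in (n, ca, c))
    b = ca - n                   # N -> C-alpha bond vector
    c_vec = c - ca               # C-alpha -> C bond vector
    a = np.cross(b, c_vec)       # normal to the N, C-alpha, C plane
    # Fixed linear combination encoding ideal tetrahedral geometry (assumed coefficients).
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca

# Example main chain atoms (angstroms) with roughly standard bond lengths and angle.
n_atom = (1.458, 0.0, 0.0)
ca_atom = (0.0, 0.0, 0.0)
c_atom = (-0.5465, 1.4237, 0.0)

cb_atom = place_cbeta(n_atom, ca_atom, c_atom)
print(round(float(np.linalg.norm(cb_atom - np.asarray(ca_atom))), 2))
```

The same construction can be applied at every alpha carbon once the sequence has been mapped onto the main chain; for glycine the position is occupied by a hydrogen rather than a carbon.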
- the primary sequence of a peptide represents the identity and sequence of the peptide's amino acids and may be obtained by techniques well-known in the art of peptide chemistry and molecular biology. Suitable methods for determining the primary sequence include, but are not limited to, direct determination from X-ray crystal data, peptide sequencing, and gene sequencing. Determination of a peptide's primary sequence from X-ray data consists of tracing the electron density map of the peptide and assigning the side-chains to each residue based on the electron density and knowledge of side-chain structure. A second and more conventional method of primary structure determination is peptide sequencing and is well known in the art.
- Edman degradation, which exemplifies peptide sequencing, removes a single amino acid from the amino terminus of the peptide without disrupting the peptide bonds between other amino acid residues.
- Edman degradation generally uses phenyl isothiocyanate which reacts with the uncharged terminal amino group of the peptide to form a phenylthiocarbamoyl derivative.
- a cyclic derivative of the terminal amino acid is released into the solution leaving the intact peptide shortened by one amino acid.
- the liberated cyclic compound is a phenylthiohydantoin amino acid that is identified by chromatography (See Stryer "Biochemistry" (1975) W.H. Freeman & Co., pg.
- Another peptide sequencing method uses isothiocyanate under different conditions to sequence the peptide from the carboxyl terminus (see Schlack et al. Z. Physiol. Chem. (1926) vol. 26, pg. 865; Bailey et al. Tech. Prot. Chem. II (1991) pg 115; and Boyd et al. Tet. Lett. (1990) vol. 27, pg. 3849; which are all incorporated by reference for all purposes).
- Other methods of peptide sequencing include cyanogen bromide degradation, trypsin digestion, staphylococcal protease, etc., alone or in combination with the above described techniques, as is well-known in the art.
- Gene sequencing is another common method for obtaining a peptide primary sequence. This method involves isolating the gene encoding the peptide, sequencing the gene, and converting the resulting four-nucleotide code of nucleic acids to the 20-amino acid code of peptides.
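The final conversion step, from the nucleotide code to the amino acid code, can be illustrated with a short sketch. The table below is only a fragment of the standard genetic code, and the function name `translate` and the example DNA string are hypothetical, chosen so that the result matches the E, D, G, G example sequence used earlier in this description.

```python
# Fragment of the standard genetic code (DNA codons); a full table has 64 entries.
CODON_TABLE = {
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "GAT": "D", "GAC": "D",                          # aspartic acid
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "TAA": "*", "TAG": "*", "TGA": "*",              # stop codons
}

def translate(dna):
    """Translate a coding-strand DNA string into one-letter amino acid codes."""
    peptide = []
    # Read whole codons only; trailing bases that do not fill a codon are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE[codon]  # raises KeyError for codons not in this fragment
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)

print(translate("GAAGATGGTGGCTAA"))  # EDGG
```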
- Insertion and expression of the library in a suitable host identifies host cells containing the vector containing the gene that encodes the peptide.
- host cells can be isolated and their DNA isolated and sequenced.
- Methods for sequencing genes are well known in the art (see, for example, Sambrook et al. "Molecular Cloning: A Laboratory Manual" 2d ed., (1989) Cold Spring Harbor Press, chapter 13, which is herein incorporated by reference). In general, two sequencing techniques are commonly used: the enzymatic method of Sanger et al. and the chemical degradation method of Maxam and Gilbert.
- each nucleotide base in the oligonucleotide has an approximately equal chance of being the terminus, and each population consists of an equal mixture of oligonucleotide fragments of varying lengths.
- This population of oligonucleotides is then resolved by electrophoresis under conditions that can discriminate between individual oligonucleotides differing in length by as little as one nucleotide.
- the order of nucleotides along the DNA can be read directly from an autoradiographic image of the gel.
- amino acid side-chains are mapped onto the main chain backbone.
- mapping refers to the process of identifying the amino acid side-chains for each alpha carbon of a peptide main chain. This step is necessary to associate the correct side-chains with each residue's alpha carbon when only the main chain backbone structure is available. For example, in cases where the position of the main chain backbone structure is determined by low resolution crystallography, the identity of each residue is not obtained.
- Use of gene sequencing can provide the primary sequence of the peptide, which is used to specify the amino acid side-chains attached to each alpha carbon on the main chain backbone.
- a second aspect of sequence mapping involves specifying the three dimensional position of the beta carbon for each side-chain.
- the beta carbon for each amino acid has a predefined spatial relationship relative to the main chain atoms. This relationship is used when the position of the beta carbon is unknown.
- conformation energy of a peptide or other molecular system can be modelled in many ways, ranging from potential energy functions having a single van der Waals interaction term, to potential energy functions having many terms that account for torsional biasing, electrostatic interactions, hydrogen bonding, hydrophobic interactions, entropic destabilization, cystine bond formation, and other effects.
- $r$ is the interatomic distance and $r_0$ and $\epsilon_0$ are empirical parameters describing, respectively, the equilibrium interatomic distance and the depth of the energy well for the van der Waals interaction of the pair of atoms.
- Other forms of this expression such as those involving different combinations of exponents may also be used.
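The pair energy described by these parameters can be sketched as a short function. The 12-6 form used here, E(r) = ε0[(r0/r)^12 − 2(r0/r)^6], is an assumption: it is chosen because it places the energy minimum at r = r0 with depth ε0, which is what the parameter definitions require. The function name and the numerical values in the usage lines are illustrative and are not the values of Table 1.

```python
def lennard_jones(r, r0, eps0):
    """12-6 van der Waals pair energy (assumed form).

    r    : interatomic distance
    r0   : equilibrium interatomic distance (position of the minimum)
    eps0 : depth of the energy well (energy at r0 is -eps0)
    """
    if r <= 0.0:
        raise ValueError("interatomic distance must be positive")
    s6 = (r0 / r) ** 6
    # The repulsive r^-12 term dominates at short range, the attractive r^-6 term at long range.
    return eps0 * (s6 * s6 - 2.0 * s6)

# Illustrative values only: the minimum sits at r = r0.
print(lennard_jones(4.0, 4.0, 0.1))  # -0.1
```

As the text notes, other exponent combinations could be substituted by changing the two powers and the factor that keeps the minimum at r0.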
- Table 1 presents preferred values used in the preferred embodiment of the invention. These parameters may be optimized by a variety of means known to those of skill in the art. No attempt has been made to optimize the particular values shown because they gave excellent results.
- hydrogen atoms attached to both main chain and side-chain atoms are preferably not included in this molecular representation. In order to compensate for this, the van der Waals radius of each atom that has attached hydrogens is slightly augmented.
- the van der Waals force is an electrostatic interaction arising from an instantaneous asymmetric electron distribution, which causes a temporary dipole. This transient dipole induces a complementary dipole in a neighboring atom to stabilize the transient dipole. An instant later the dipoles are likely to be reversed, resulting in an oscillation and a net attractive force. At one extreme (as r tends to infinity), atoms do not interact and have no stabilizing or destabilizing effect on one another. At the other extreme (as r tends to zero), the electrostatic repulsion between atoms becomes strong and dominates other stabilizing effects. The Lennard-Jones potential becomes infinite, which physically corresponds to superimposing two atoms.
- a torsional potential energy function models the interaction of linear four-atom sequences, such as Y-C-C-X.
- See Streitwieser et al. "Organic Chemistry," 2d ed., Wiley & Sons, pg. 70 (1987) for a description of torsions about a carbon-carbon single bond.
- Fig. 4 schematically shows a torsion about a carbon-carbon single bond.
- Fig. 4a is a stick representation of Y-C-C-X
- Fig. 4b shows torsion $\chi$ in a view along the C-C bond.
- Suitable torsional potentials have the form:
- K is a constant that is typically about 1 to about 5 kcal/mol (preferably about 1.5 kcal/mol), n is 1-3, d is 0-360° and $\chi$ is the torsion angle between the groups attached to the two central carbon atoms.
- the magnitude of the interaction, K depends on the individual identities of all groups attached to the central carbons. In general, when the atoms X and Y are large, K is also large.
- the torsional potential for alkane bonds represents the tendency for groups attached to a central carbon-carbon single bond to adopt a trans or gauche conformation. The potential is applied to all rotational degrees of freedom for each amino acid residue, except for the $\chi$ torsions of phenylalanine, tyrosine, histidine, and tryptophan that involve an sp2 hybridized carbon; these require a torsion potential that accounts for the two-fold rotational symmetry of the planar ring.
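A minimal sketch of a torsional term with parameters K, n and d follows. The cosine form E(χ) = (K/2)[1 + cos(nχ − d)] is an assumption, since the expression itself is not reproduced here; the K/2 prefactor is chosen so that the maximum energy equals K, which agrees with the preferred parameters given later in this description (K = 1.5 kcal/mol, n = 3, d = 0, maximum of 1.5 kcal/mol for the eclipsed structure). The function name is hypothetical.

```python
import math

def torsion_energy(chi_deg, k=1.5, n=3, d_deg=0.0):
    """Periodic torsional energy in kcal/mol (assumed cosine form).

    chi_deg : torsion angle in degrees (0 = eclipsed)
    k       : barrier height, kcal/mol
    n       : periodicity (3 gives three-fold symmetry)
    d_deg   : phase offset in degrees
    """
    chi = math.radians(chi_deg)
    d = math.radians(d_deg)
    return 0.5 * k * (1.0 + math.cos(n * chi - d))

# With the preferred parameters the eclipsed structure is the maximum and
# the gauche (60, 300 degrees) and trans (180 degrees) structures are minima.
for angle in (0, 60, 120, 180):
    print(angle, round(torsion_energy(angle), 6))
```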
- $E_{\text{electrostatic}} = \dfrac{Z_A Z_B}{D\,r}$, where $r$ is the interatomic distance between two charged atoms, A and B; $Z_A$ and $Z_B$ equal the respective charges on the two atoms; and $D$ is the dielectric constant of the environment around atoms A and B.
- the effective charge of an atom depends on its surrounding environment including such factors as, for example, pH, accessibility to water, the polarity of the solvent, and the presence of other charges.
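The Coulomb term can be evaluated directly, as in the sketch below. The conversion constant (about 332.06 kcal·Å/(mol·e²)) is an assumption about units: it applies when charges are given in elementary charges and distances in angstroms. The function name and the example values are hypothetical.

```python
# Assumed unit convention: charges in elementary charges, distance in angstroms,
# energy in kcal/mol.
COULOMB_KCAL = 332.06

def coulomb_energy(z_a, z_b, r, dielectric=1.0):
    """Electrostatic energy Z_A * Z_B / (D * r) between two charged atoms."""
    if r <= 0.0:
        raise ValueError("interatomic distance must be positive")
    return COULOMB_KCAL * z_a * z_b / (dielectric * r)

# Opposite unit charges 4 angstroms apart: strongly attractive with D = 1,
# weakly attractive in a water-like environment (D of about 80).
print(coulomb_energy(1.0, -1.0, 4.0))
print(coulomb_energy(1.0, -1.0, 4.0, dielectric=80.0))
```

The large difference between the two printed values illustrates why the effective interaction depends so strongly on the environment around the charged atoms.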
- Other types of electrostatic forces influence peptide structure as well.
- dipole moments, which describe partial charges on atoms, occur in uncharged but polar groups of atoms.
- the electrostatic potentials described by such dipole moments are well known and may be implemented as is known in the art.
- Another type of primarily electrostatic interaction is the hydrogen bond, which occurs when a hydrogen atom is shared between a proton donor and a proton acceptor.
- Hydrogen bonds stabilize pairs of polar moieties having hydrogen atoms to share and donate, such as between a serine hydroxyl group and the carbonyl oxygen of an amide group, or between an acid group such as the carboxyl of a glutamic acid and water.
- the potential energy terms for both dipole and hydrogen bond interactions are well known in the art (see Cantor et al.).
- Hydrophobic interactions are destabilizing noncovalent interactions between an atom having hydrophilic character and one having hydrophobic character. For example, large hydrophobic interactions occur between the polar, aqueous environment of the solvent and nonpolar residues of the peptide, such as valine, leucine, isoleucine, phenylalanine, etc.
- hydrophobic interactions result in a tendency for nonpolar side-chains to avoid interaction with solvent.
- Potential energy functions representing hydrophobic interactions are well known in the art and are used in some preferred embodiments to increase the prediction accuracy of hydrophobic side-chains that happen to be exposed to solvent on the surface of the peptide.
- the physical stability of a peptide is modelled by a potential energy function having only van der Waals and torsional energy terms for simplicity. In other preferred embodiments, one or more of the previously-described energy terms are added.
- the method of the present invention involves moving structural elements of molecular systems (e.g. peptide or nucleotide side-chains) to maximize favorable interactions and minimize unfavorable ones.
- a conformation probability map is produced which represents the probability that the molecular system will reside in a particular conformation at any given time.
- an "ensemble" of probable molecular conformations is produced that provides a substantially more accurate description of a molecular system than a static structure representation. This is because real molecular systems constantly move between a variety of conformations many of which are not accounted for by a static structure. Even if the static conformation chosen is energetically favorable, it will only represent the state of the molecular system over a fraction of time.
- the method of the present invention also produces a conformation energy map representing the energy ensemble of the molecular system.
- energetically favorable conformations of the molecular system can be quickly identified.
- a major factor influencing the conformation of a structural element is the necessity of avoiding steric overlap.
- One aspect of the present invention predicts energetically favorable conformations by minimizing the steric packing interactions. Of course, other influences such as electrostatic interactions are very important in some molecular systems and must therefore be included in some predictive method.
- a preferred embodiment of the invention predicts time-averaged peptide side-chain positions by determining the relative steric energy of each conformation for each side-chain. Low energy side-chain conformations correspond physically to a peptide conformation having well-packed side-chains. Finding these side-chain conformations requires an efficient search and minimization strategy to locate energy minima in a very large conformation space.
- a preferred minimization method resembles the molecular field theory reported by Finkelstein and Reva (1991), and, more generally, the Hartree-Fock self-consistent-field (SCF) methods (see Levine "Quantum Chemistry" (1983), pg. 256 et seq., Allyn and Bacon, Newton, Massachusetts, and Blinder Am. J. Phys. (1965) vol. 33, pg. 431, which are both incorporated by reference for all purposes). Finkelstein and Reva employed a similar molecular field approximation to select among prospective β sheet foldings.
- the SCF method for multi-electron atoms uses the approximation that the exact wave function of a higher atomic number atom or polyatomic molecule is approximated by a product of single electron wave functions and minimizes the variational integral with this approximate wave function.
- the Hartree-Fock method first guesses a wave function. The method concentrates on a first electron, ignoring the positions of the remaining electrons and assuming that they form a static electronic distribution through which the first electron moves. In effect, the method time averages the instantaneous interactions between the first electron and the remaining electrons. This static electronic configuration produces a potential energy field. Solution of the one-electron Schroedinger equation with this potential energy function results in an improved orbital for the first electron.
- the SCF method then calculates an improved orbital for each electron in the atom to give a full set of improved orbitals. To improve the orbital wave functions further, the method repeats the entire procedure using this improved set of orbitals. This is repeated until it converges to a "self-consistent" set of electron wave functions.
- the inventive method minimizes the conformation energy of a macromolecular structure by minimizing the interaction energy of each side-chain in the potential energy field created by the macromolecular complex.
- the inventive method uses approximate potential fields to bootstrap the solution.
- a preferred embodiment of the inventive method begins by supplying a potential energy function for the macromolecular structure. Next, the interaction energies of the various elements (or residues) of the complex are calculated from the potential energy field created by the other elements. The elements are then moved about to form a variety of conformations for each element. After each move, the interaction energy is recalculated for the modified complex.
- the interaction energies for each conformation of each element are averaged to form conformation energy maps for each element. These maps are then used to produce corresponding conformation probability maps for each element, and the cycle is completed.
- the elements are moved through a variety of conformations in accordance with the probability maps constructed in the previous cycle. Conformation energy and probability maps are then produced as in the first cycle.
- the interaction energy of the various elements will converge to a "self-consistent" or constant solution.
- the conformation probability and energy maps associated with this solution represent ensembles of the macromolecular complex.
- the probability map can be viewed as a representation of the time-average conformation of a given element.
- the prediction problem may be recast in a very different way.
- the thermal ensemble of conformations may be optimized to find the ensemble most likely for a given protein at a given temperature.
- Such an approach might not only give a more realistic prediction of a protein's structure and energetics, but can also draw upon a rather different set of optimization techniques, founded in basic thermodynamics.
- a preferred procedure of the present invention involves iterative thermodynamic refinement, which gradually condenses a protein's set of possible side-chain conformations into the most likely, self-consistent ensemble at a chosen temperature.
- each residue $i$ is assigned a conformational probability map $p_i(\chi_i)$, which records the time-fraction it spends in each of its possible conformations $\chi_i$.
- the set of all residues' probability maps $P = \{p_1, p_2, \ldots, p_n\}$ specifies the state of the overall protein's ensemble, and permits the calculation of a potential of mean force, for example, the potential energy of a probe atom A:
- $E_i(\chi_i) = \sum_{\text{all res } j \neq i} \; \sum_{\text{all } \chi_j} p_j(\chi_j)\, E_{ij}(\chi_i, \chi_j)$
- $E_{ij}(\chi_i, \chi_j)$ is the interaction between residue $i$ (in conformation $\chi_i$) and residue $j$ (in conformation $\chi_j$).
- a residue's set of potential energies $E_i(\chi_i)$ over all its possible $\chi_i$ comprises its conformation energy map, and the set of all such $E_i(\chi_i)$ for all residues $i$ comprises $E$, the mean field.
- the probability map set P specifies a unique mean-field E.
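The way a probability map set determines a mean field can be sketched by evaluating the double sum exhaustively. This is only an illustration under stated assumptions: conformations are taken to be discrete and indexed by integers, and `pair_energy` is a hypothetical callback standing in for the residue-residue interaction energy. The method described here estimates the same quantity by sampling rather than by full enumeration.

```python
def mean_field(prob_maps, pair_energy):
    """Return field[i][a], the mean-field energy of residue i in conformation a.

    prob_maps[j][b]         : probability of residue j in conformation b
    pair_energy(i, a, j, b) : hypothetical interaction energy between residue i
                              in conformation a and residue j in conformation b
    """
    n_res = len(prob_maps)
    field = []
    for i in range(n_res):
        row = []
        for a in range(len(prob_maps[i])):
            energy = 0.0
            for j in range(n_res):
                if j == i:
                    continue  # a residue does not interact with itself
                for b, p in enumerate(prob_maps[j]):
                    energy += p * pair_energy(i, a, j, b)
            row.append(energy)
        field.append(row)
    return field

# Toy system: two residues, two conformations each, a unit penalty when both
# residues occupy the same conformation index.
clash = lambda i, a, j, b: 1.0 if a == b else 0.0
print(mean_field([[0.5, 0.5], [0.25, 0.75]], clash))  # [[0.25, 0.75], [0.5, 0.5]]
```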
- the probability of each particular conformation may be determined by many forms known to those of skill in the art. However, it should depend directly upon the unique mean field.
- a preferred method of determining the probability associated with each conformation derives from the statistical mechanical canonical ensemble.
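A probability map derived from the canonical ensemble can be sketched as Boltzmann weighting of a residue's conformation energy map, p(χ) = exp(−E(χ)/kT) / Σ exp(−E(χ′)/kT). The function name is hypothetical, and the value of the gas constant assumes energies in kcal/mol and temperatures in kelvin.

```python
import math

K_B = 0.0019872  # kcal/(mol*K); assumes energies are in kcal/mol

def boltzmann_map(energies, temperature):
    """Convert one residue's conformation energy map into a probability map."""
    kt = K_B * temperature
    e_min = min(energies)
    # Shifting by the minimum leaves the probabilities unchanged and avoids overflow.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# At 298 K a 1 kcal/mol difference already shifts most of the population
# to the lower-energy conformation; at very high temperature the map flattens.
print(boltzmann_map([0.0, 1.0], 298.0))
print(boltzmann_map([0.0, 1.0], 6000.0))
```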
- the thermal ensemble representing the correct time-averaged structure for the protein at equilibrium
- Self-consistent solutions obtained by this procedure provide a predictive model of the protein's thermal ensemble. In general any starting ensemble will converge to a self-consistent ensemble. However, this does not guarantee convergence to the ensemble representing the native state, as there might be multiple solutions to the ensemble prediction problem, which confound its search for the desired native structure.
- each residue should be sampled in each possible conformation between about 5 and about 20 times on average to calculate $E_i(\chi_i)$. In most preferred embodiments, only about 8 to about 10 samples are necessary.
- the conformations are generated randomly by selecting a conformation $\chi_i$ for each residue $i$ according to its $p_i(\chi_i)$, interspersing simple step moves (in which the residue moves by a slight perturbation from its current conformation) with occasional jump moves (in which it can move to any conformation).
- this sampling procedure may be seeded with a randomly selected conformation which will be referred to herein as the "starting structure".
- Jump moves, though computationally more expensive, ensure that the sampling procedure can cross energetic barriers to give uniform sampling across the conformation space. Combined, these moves provide an efficient and comprehensive method for sampling the mean field.
- the potential energy of each residue is calculated, and added to the running average of the potential energy of the current conformation.
- the average $E_i(\chi_i)$ is used to calculate a new $p_i(\chi_i)$ for each residue.
- a variety of starting conformations may be employed in the present invention. These will preferably take the form of a geometric representation of the peptide stored within a computer memory. In many instances, the initial conformation of the geometric representation will have each side-chain randomly oriented without regard to neighboring side-chains. Not only does this provide the most strenuous test of the method's predictive power, but it is also generally good practice for prediction of unknown structures: when little is known about a structure, it is generally better to start unbiased than to employ a generic bias that might exclude correct answers. However, a completely random initial structure contains no information about the actual ensemble and, therefore, results in a computationally expensive procedure. In instances where some information is known about the starting structure (from sources such as X-ray diffraction or other models, for example), it will sometimes be advantageous to use that information to construct a starting structure. This approach will often result in considerable savings in computation time.
- the conformation probability map is uniform. Physically, this corresponds to a very high or nearly infinite temperature. At such temperatures the thermal motion of the peptide will overwhelm the steric pressure that promotes ordered packing.
- the constant-condense method: at very high temperatures, all conformations have about equal probability, while at room temperature the ensemble is sharply focused into a small peak of conformations that represent the native state. The prediction problem, then, corresponds to condensing down the diffuse probability map of the $T \to \infty$ starting ensemble into a sharp peak. It is important that this occur smoothly and gradually.
- the constant-condense method automatically gives a constant, controllable amount of condensation of the $p_i(\chi_i)$ in each cycle.
- the thermal factor $kT$ in the Boltzmann probability equation is replaced with an effective temperature $\tau\sigma$, where $\sigma$ is the standard deviation of the current mean field $E_i(\chi_i)$ (over all $\chi_i$), and where $\tau$ is a constant "thermal" factor controlling the rate of condensation (the larger $\tau$ is, the slower the condensation).
- This has the effect of scaling the effective temperature to the "natural dimension" of the mean-field energy distribution (e.g., conformations with energy one standard deviation above the mean will be assigned a probability lower by a factor of $e^{-1/\tau}$, two standard deviations above gives a probability lower by a factor of $e^{-2/\tau}$, etc.).
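The constant-condense weighting can be sketched by replacing kT with the product of the thermal factor and the standard deviation of the residue's energy map. In this sketch the use of the population standard deviation, the uniform fallback when all energies are equal, and the function name are assumptions.

```python
import math
import statistics

def constant_condense_map(energies, tau):
    """Probability map with effective temperature tau * sigma (assumed details).

    energies : one residue's conformation energy map
    tau      : thermal factor; larger values condense more slowly
    """
    sigma = statistics.pstdev(energies)  # population standard deviation (assumption)
    if sigma == 0.0:
        # A flat energy map carries no preference, so return a uniform map.
        return [1.0 / len(energies)] * len(energies)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (tau * sigma)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Energies 0 and 2 have sigma = 1, so the two conformations are two standard
# deviations apart and their probability ratio is exp(-2 / tau).
p = constant_condense_map([0.0, 2.0], tau=2.0)
print(p[1] / p[0], math.exp(-1.0))
```

Because the energies are divided by their own spread, multiplying every energy by a constant leaves the resulting map unchanged, which is what makes the amount of condensation per cycle controllable through the thermal factor alone.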
- Two other cooling procedures are reciprocal and linear thermal cooling. Since the constant-condense effective temperature $\tau\sigma$ differs for each residue (via its dependence on $\sigma$), this method departs from correct thermodynamics in that it does not model an ensemble equilibrated to a uniform temperature.
- Two cooling methods that do employ a uniform temperature to cool gradually from 6000°K to 298°K are reciprocal cooling, which sets the temperature proportional to 1/icyc (where icyc is the number of cycles done so far), and linear cooling, which simply reduces T linearly and then allows the ensemble to equilibrate over several cycles (10) at the final temperature. Both methods gave essentially the same structure and energetic predictions as the constant-condense procedure.
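The two uniform-temperature schedules can be sketched as follows. Several details are assumptions rather than statements from the text: the reciprocal schedule is taken as the starting temperature divided by icyc (counting from 1) and clamped so it never falls below the final temperature, and the linear schedule is taken to reach the final temperature on its last cooling cycle. The cycle counts default to the fifteen cooling and ten equilibration cycles mentioned in this description.

```python
def reciprocal_temperature(icyc, t_start=6000.0, t_final=298.0):
    """Temperature proportional to 1/icyc, with icyc counted from 1 (clamp assumed)."""
    if icyc < 1:
        raise ValueError("icyc is counted from 1 in this sketch")
    return max(t_start / icyc, t_final)

def linear_schedule(n_cool=15, n_equil=10, t_start=6000.0, t_final=298.0):
    """Linear cooling followed by equilibration cycles at the final temperature."""
    step = (t_start - t_final) / (n_cool - 1)
    temps = [t_start - step * i for i in range(n_cool)]
    temps[-1] = t_final  # remove rounding error on the last cooling cycle
    return temps + [t_final] * n_equil

print([round(reciprocal_temperature(i)) for i in range(1, 6)])  # [6000, 3000, 2000, 1500, 1200]
print(len(linear_schedule()))  # 25
```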
- Energetics predictions were derived from these calculations simply by tracking the average total energy of the system until it converged to an unchanging value, and using the average total energy in the final cycle as the predicted "packing energy" for the peptide.
- the energy for different mutant peptides was found to condense at identical rates, so a constant number of iteration cycles will preferably be used for all the calculations.
- fifteen cycles were used for constant-condense or reciprocal cooling runs, and fifteen cooling plus ten final equilibration cycles for linear cooling runs. The latter runs were given extra equilibration cycles at the final temperature (298°K) , because the energy was still not converged at the end of the linear cooling cycles.
- an ensemble may be generated according to the present invention by starting with a static structure that corresponds to a known conformation, using information obtained from X-ray diffraction or other techniques.
- the present invention can be used to take a known static structure and convert it to a more complete ensemble of structures and associated energies.
- the geometric representation of the peptide will be heated rather than cooled to produce the thermal ensemble.
- the starting temperature used in the probability expression will be well below infinity and the conformation probability map will be well defined (as opposed to uniform) at the beginning of the procedure.
- a heating procedure may also provide greater overall accuracy when the peptide to be modelled contains many residues that are not well-packed (i.e.
- the invention may be embodied on a digital computer system such as the system 100 of Fig. 5, which includes a keyboard 102, a fixed disk 104, a display monitor 104, an input/output controller 106, a central processor 108, and a main memory 110.
- the various components communicate through a system bus 112 or similar architecture.
- the user enters commands through keyboard 102; the computer displays images through the display monitor 104, such as a cathode ray tube or a printer.
- an appropriately programmed computer such as a Silicon Graphics Iris 4D/240GTX is used.
- Other computers may be used in conjunction with the invention. Suitable computers include mainframe computers such as a VAX (Digital Equipment Corporation,
- the internal processes of the prediction method generally consist of a setup routine which loads data and performs preliminary data analysis, and a minimization routine that minimizes the conformation energy of the peptide. These processes will be described in detail with reference to the flow charts in Fig. 6.
- Data for each residue type may be stored in a residue description having the following form:
- This residue description contains four major sections.
- the next section describes the atoms by type, the movement order, and the van der Waals (Lennard-Jones) constants. For example, the entry:
- the third section of the data describes the bond lengths and bond angles of the residues in a local frame of reference. For example, the entries,
- CA 1.48 specifies that the bond length between the amide nitrogen and $C^{\alpha}$ is 1.48 angstroms. "ang ILE C ILE CA ILE CB
- the fourth section defines the three dimensional angular relationships between different atoms.
- The "twist" relationship is shown in Fig. 7 and describes the relationship of an atom with respect to a set of three other atoms.
- in Fig. 7a, the three atoms N, $C^{\alpha}$ and C of the ILE residue uniquely define a plane that passes through the atoms.
- "Twist" defines the angle that the fourth atom makes with this plane, as shown in Fig. 7b.
- "tor" is shown in Fig. 7c-d and defines the torsional angle between the atoms N and CG1 about the bond formed by CA and CB.
- the <dof> indicator specifies that there is a rotational degree of freedom between atoms CA and CB.
- K is preferably 1.5 kcal/mol
- n is preferably 3.0
- d is preferably 0.0, indicating a three-fold torsional potential having a maximum potential energy of 1.5 kcal/mol for a fully eclipsed structure.
- the initial data, which represent the main chain conformation, include data for each atom in the peptide main chain such as the three dimensional position and its chemical identity (for example, whether the atom is a carbon, nitrogen, oxygen, etc.). Such data come from a variety of sources. As described above, diffraction, NMR, theoretical prediction or another suitable method may provide the main chain coordinates and identities.
- the main chain coordinates for peptides described herein, however, are derived from the Standard Brookhaven Protein Data Bank (PDB), which is well known in the art.
- a computer program was written in C.
- the main calculations consist of iterating the mean-field calculation over a set number of thermal cycles sufficient to allow the energy and maps to converge to a final, unchanging answer.
- the van der Waals interactions for each residue with all fixed atoms are precalculated for all its possible side-chain conformations (the energy calculations are described below).
- lists of side-chain atoms which can come within a nonbonded cutoff distance (6 Å) of each other are compiled prior to the main calculations.
- a preferred method of data input is described with reference to Fig. 6a.
- an optional process step 212 supplies the coordinates and the atom types for the missing main chain atoms.
- Such a process calculates, based on the positions of $C^{\alpha}$ and the amino to carboxyl direction, the positions of the carbonyl carbon and oxygen, and the amino group.
- the primary sequence of the peptide is mapped onto the main chain.
- the primary sequence is merely a sequence of data representing the amino acid sequence.
- the mapping associates an amino acid side chain with each $C^{\alpha}$, as shown in step 214. Referring to Fig.
- in step 300, the coordinates of the main chain atoms are used as input to determine the position of $C^{\beta}_i$ (step 302). Pairwise interaction tables are calculated (step 304), initial conformations are assigned to each side-chain (step 306) and the steric energy is calculated for this initial conformation (step 308).
- a group of computer programs written in C is used to perform set up and execution of the method of this invention. These routines were compiled and run on a Silicon Graphics Iris 4D/240GTX computer. The main program was employed to perform the self-consistent mean field calculations.
- the main program ("cara") provides conformation energy maps and conformation probability maps for the test peptide at the final temperature of the run. As described above, this information can be used to determine the packing or binding energy of the system being investigated.
- the main program reads a binary input file produced by another routine ("readpro").
- the information used by readpro to create the binary input file is taken from three files.
- a coordinate file is used to supply a list of the peptide atoms together with their Cartesian coordinates.
- One source of such lists is the Brookhaven Protein Data Bank (PDB).
- a data file (plib), which is described in detail above, provides, among other information, the types of atoms in each of the amino acids, their movement order and van der Waals constants.
- a routine known as resmap.lib describes an envelope or range of movement available to the various atoms of each side chain by virtue of the side chain torsional degrees of freedom.
- Plib and the PDB data are used by another routine ("applib") to create text describing the information contained in plib and the PDB. Collection of this information is coordinated by another routine ("upset"), a routine contained in a listing of auxiliary files ("auxfiles").
- the output of applib is used by "makegen" to convert the text information from applib into local frames of reference for each degree of freedom.
- This information together with the output of resmap.lib is used by readpro (described above) to produce the binary data file used by the main routine.
- Another routine (“psizer") determines the size of the files being sent to readpro and allocates memory sufficient to store this information.
- a preferred embodiment of the invention uses look-up tables to tabulate pairwise interactions between atoms in the peptide.
- the first look-up table lists side-chain-main chain atom interactions while the second table lists side-chain-side-chain interactions. These lists reflect the notion that atoms in separate three-dimensional areas of the peptide do not interact and, thus, should not be considered.
- construction of the first list begins by first classifying atoms as moving or stationary. Stationary atoms are not moved during the minimization and include all main chain atoms, as well as $C^{\beta}$ atoms, and any other atoms in the peptide (including any desired side-chains) that are held fixed in space during the minimization.
- a pairwise list of the moving atoms that could interact with main chain atoms is generated by moving the side-chains through their possible positions and tabulating atom pairs that can come within the nonbonded cutoff distance of each other.
- This pairwise interaction list is a boolean list that indicates which atoms could possibly interact with other moving atoms.
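The construction of such a boolean pair list can be sketched as below. The data layout is hypothetical: each atom name maps to the list of positions it can reach as its side-chain moves through its conformations (a stationary atom has a single position). The 6 Å default is the nonbonded cutoff distance stated in this description; treating the reachable positions of two atoms as independent is a simplifying assumption.

```python
from itertools import combinations

def within_cutoff(positions_a, positions_b, cutoff):
    """True if any reachable position of one atom is within cutoff of the other."""
    cutoff_sq = cutoff * cutoff
    for ax, ay, az in positions_a:
        for bx, by, bz in positions_b:
            # Compare squared distances to avoid a square root per pair.
            if (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2 <= cutoff_sq:
                return True
    return False

def build_pair_table(reachable, cutoff=6.0):
    """Boolean table keyed by (atom_a, atom_b) marking pairs that could interact."""
    return {
        (a, b): within_cutoff(reachable[a], reachable[b], cutoff)
        for a, b in combinations(sorted(reachable), 2)
    }

# Atom B can swing to within 5 angstroms of A, so only the A-B pair is kept.
reachable = {
    "A": [(0.0, 0.0, 0.0)],
    "B": [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    "C": [(20.0, 0.0, 0.0)],
}
print(build_pair_table(reachable))
```

Pairs marked False never need an energy evaluation during the main calculations, which is the purpose of compiling the lists beforehand.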
- Each thermal cycle consists of setting all moving residues to a starting structure, and calculating the energy for a large number of moves, sufficient to obtain a good approximation for each residue's conformational energy map.
- enough moves are made to obtain a weighted average of at least about 5 (and preferably about 8) samples of each residue conformation.
- the weighting accounts for the different probabilities associated with each conformation.
- To make one move, a new conformation is selected for each residue by looking up the relative probabilities of all conformations within a certain distance of its current conformation, and choosing one.
- torsions were represented as the integers 0-32, covering the range of rotations 0-360° in discrete steps of about 12°, providing sufficient resolution for the work described herein.
- Other step sizes may be used depending upon the resolution desired in the run, such as, for example, 5-20°.
- in "step" moves, conformations within ±1 step of the current torsion angles are allowed; in "jump" moves, steps of magnitude greater than 1 are considered and preferably all of the residue's torsional conformations are considered.
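The selection of a step or jump move for a single discretized torsion can be sketched as follows. This is a simplified illustration: it handles one torsion per residue, wraps around at the ends of the angular range, and keeps the current conformation if every candidate has zero probability. Those choices, the number of bins in the usage lines, and the function name are assumptions.

```python
import random

def propose(current, prob_map, rng, jump=False):
    """Pick a new torsion bin according to the residue's probability map.

    current  : index of the current torsion bin
    prob_map : probability of each torsion bin for this residue
    rng      : a random.Random instance
    jump     : if True consider every bin, otherwise only bins within one step
    """
    n = len(prob_map)
    if jump:
        candidates = list(range(n))
    else:
        candidates = [(current + d) % n for d in (-1, 0, 1)]  # wrap past 360 degrees
    weights = [prob_map[c] for c in candidates]
    if sum(weights) == 0.0:
        return current  # nothing reachable has any probability; stay put
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
uniform = [1.0 / 32] * 32          # 32 bins assumed for illustration
print(propose(0, uniform, rng))    # one of 31, 0, 1
peaked = [0.0] * 32
peaked[17] = 1.0
print(propose(0, peaked, rng, jump=True))  # 17
```

The second call shows why jump moves matter: a step move from bin 0 could never reach an isolated peak at bin 17.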
- the temperature factor ⁇ is calculated from the ⁇ of the E_mean_field for the residue.
- a new conformational probability map is generated from each residue's E_mean_field, according to the Boltzmann probability equation, concluding the thermal cycle.
- the starting structure used for beginning a new thermal cycle is set either to random torsions (for the very first cycle) , or to the peak probability conformation.
- the latter structure, used as the start of every cycle after the first, is generated by setting each residue to the highest probability conformation in its latest conformational probability map.
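The map update and the choice of the next starting structure can be sketched as below. The sketch assumes the standard Boltzmann form p_i = exp(-beta * E_i) / sum_j exp(-beta * E_j); the names `boltzmann_map` and `peak_conformation` and the dictionary layout are illustrative, and `beta` is simply passed in, since how the temperature factor is derived is not fully legible in this text.

```python
import math

def boltzmann_map(energy_map, beta):
    """Convert mean-field energies per conformation into normalized probabilities."""
    # Shifting by the minimum avoids overflow; the shift cancels on normalization.
    e_min = min(energy_map.values())
    weights = {c: math.exp(-beta * (e - e_min)) for c, e in energy_map.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

def peak_conformation(prob_map):
    """Return the highest-probability conformation, used to start the next cycle."""
    return max(prob_map, key=prob_map.get)
```

Applying `peak_conformation` to every residue's latest map yields the peak-probability structure used to start each cycle after the first.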
- the method's physical basis is both simple and well-founded. It uses only van der Waals interactions and a simple alkane torsional potential, whose force constants are relatively well known from experimental data.
- the constants used in the present calculations were derived from refinement of experimental measurements of organic crystals (Hagler et al., 1974), and have been in use since the 1970s (e.g. Levitt, 1983).
- the van der Waals calculations recognize three distinct atom types (oxygen, nitrogen, and carbon/sulfur), each characterized by two constants: an equilibrium atomic diameter, and a scale factor giving the strength of the equilibrium interaction.
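A pair energy built from these two constants per atom type might look like the sketch below. The functional form is an assumption: the text does not state it, so a conventional 12-6 potential written in terms of the equilibrium separation and well depth is used, with arithmetic-mean and geometric-mean combining rules for unlike atom types. The name `vdw_energy` is hypothetical, and the published parameter values are not reproduced here.

```python
import math

def vdw_energy(r, diam_i, eps_i, diam_j, eps_j):
    """12-6 van der Waals energy for two atoms separated by distance r.

    diam_*: equilibrium atomic diameter of each atom type.
    eps_*:  scale factor (well depth) of each atom type.
    """
    r0 = 0.5 * (diam_i + diam_j)    # assumed: arithmetic-mean equilibrium separation
    eps = math.sqrt(eps_i * eps_j)  # assumed: geometric-mean well depth
    x = (r0 / r) ** 6
    # Minimum of -eps at r == r0; strongly repulsive for r < r0.
    return eps * (x * x - 2.0 * x)
```

With this form the energy equals the negative scale factor at the equilibrium separation, becomes positive on close contact, and decays to zero at long range.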
- Figure 6d is a flow chart of the main program for calculating a peptide's conformation energy maps and conformation probability maps.
- the routine is initiated by setting an initial conformation probability map (typically having a flat contour) .
- various operations for a thermal cycle at 1001 are performed, and the cycle counter, "icyc," is set to 0.
- an initial temperature will be set.
- the temperature is typically changed with each new iteration.
- a number of cycles are run to generate a conformation energy map for each residue. This is accomplished by first checking the number of moves (icyc) at 1004 to determine whether a multiple of 200 moves has been made.
- If icyc is not a multiple of 200, a normal step move 1006 is made by adjusting the conformations of the peptide's side chains by a small increment, as described above. If, however, icyc is a multiple of 200, a jump move 1008 is performed. As described above, a jump move moves the conformations of the peptide's side chains by relatively large increments.
- a flag 1010 is set, forcing an update of the list of interactions considered in calculating the interaction energies between various residue atoms. After either a jump move or a step move, the interaction energies for the atoms of each residue are calculated at block 1012. Next, those interaction energies are summed to obtain an overall residue energy for each side chain at 1014.
- The residue energy is then saved at step 1016 to be used later to calculate the conformation energy map for the peptide.
- Steps 1018 and 1020 check whether a sufficient number of moves (cycles) has been made to adequately sample conformation space. As noted above, when each conformation has been sampled by a weighted average of 8, the iteration at the current temperature is completed. In addition, if icyc exceeds a pre-set data limit, the current temperature iteration is completed. After each current temperature iteration, a new temperature is set.
- At step 1022, the system is checked to see whether a sufficient number of temperature iterations has been conducted to end the run.
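The flow chart can be condensed into a toy loop like the one below. It is a sketch under several assumptions: each residue has a single discretized torsion, samples are counted without the probability weighting the text describes, conformations never sampled in a cycle are assigned the worst observed energy, and the temperature schedule is passed in as a list of `betas`. The names `thermal_cycle`, `run`, and `energy_fn` are illustrative.

```python
import math
import random

def thermal_cycle(prob_maps, energy_fn, start, beta, rng,
                  jump_every=200, min_samples=8, move_limit=5000):
    """One temperature iteration: sample moves, then rebuild the probability maps."""
    n_res, n_conf = len(prob_maps), len(prob_maps[0])
    conf = list(start)
    e_sum = [[0.0] * n_conf for _ in range(n_res)]
    count = [[0] * n_conf for _ in range(n_res)]
    icyc = 0
    while icyc < move_limit:                      # pre-set limit on moves
        icyc += 1
        jump = icyc % jump_every == 0             # decision 1004
        for r in range(n_res):                    # step move 1006 or jump move 1008
            cands = (list(range(n_conf)) if jump
                     else [(conf[r] + d) % n_conf for d in (-1, 0, 1)])
            conf[r] = rng.choices(cands, weights=[prob_maps[r][c] for c in cands])[0]
        for r in range(n_res):                    # blocks 1012-1016: compute and save
            e_sum[r][conf[r]] += energy_fn(r, conf)
            count[r][conf[r]] += 1
        if min(min(c) for c in count) >= min_samples:   # checks 1018 and 1020
            break
    new_maps = []
    for r in range(n_res):
        means = [e_sum[r][c] / count[r][c] if count[r][c] else None
                 for c in range(n_conf)]
        worst = max(m for m in means if m is not None)
        means = [worst if m is None else m for m in means]  # assumption for unsampled states
        e0 = min(means)
        w = [math.exp(-beta * (m - e0)) for m in means]
        z = sum(w)
        new_maps.append([x / z for x in w])
    return new_maps

def run(n_res, n_conf, energy_fn, betas, seed=0):
    """Iterate thermal cycles over a temperature schedule (check 1022)."""
    rng = random.Random(seed)
    maps = [[1.0 / n_conf] * n_conf for _ in range(n_res)]   # flat initial map
    start = [rng.randrange(n_conf) for _ in range(n_res)]    # random first start
    for beta in betas:
        maps = thermal_cycle(maps, energy_fn, start, beta, rng)
        start = [max(range(n_conf), key=m.__getitem__) for m in maps]  # peak start
    return maps
```

Here `energy_fn(r, conf)` receives the residue index and the full current configuration, so it can include interactions with the other residues; averaging those energies over many moves is what makes the resulting map a mean-field estimate.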
- all side-chain torsions are set to random angles selected in the range of 0 to 360° and having a uniform probability distribution.
- the side-chains are placed in predetermined positions according to, for example, the crystal structure data for a homologous or mutant/wildtype enzyme.
- the method can predict side-chain conformations in local zones of 5 to 15 residues within a protein or, alternatively, simultaneously predict all the side-chain conformations within a whole protein. In some cases, it may be advantageous to predict the conformations of only a fraction of the total peptide side-chains. For example, some peptides will have conformations that are well known from X-ray crystallography or other techniques, and mutants of these peptides will have only slightly perturbed structures. Because it can be expected that the mutant structure will deviate from the wildtype peptide structure only at certain localities, e.g. near the mutation site, the side-chains that are sufficiently separated from these localities may, in certain circumstances, be held in fixed conformations during the self-consistent mean field iterations.
- These fixed conformations preferably correspond to the known conformations of the wildtype peptide.
- the initial conformations of peptides in the vicinity of the mutation may be selected randomly or on the basis of some preferred pattern, such as the wildtype conformation. This approach may require considerably fewer computations when the overall peptide size is large in comparison to the mutated region(s).
- Mutated peptides within the scope of the present invention can be synthesized chemically by means well known in the art, such as Merrifield solid phase peptide synthesis and its modern variants.
- For an exhaustive overview of chemical peptide synthesis, see Principles of Peptide Synthesis, M. Bodanszky, Springer-Verlag (1984); Solid Phase Peptide Synthesis, J.M. Stewart and J.B. Young, 2d ed., Pierce Chemical Co. (1984); The Peptides: Analysis, Synthesis, and Biology, (pp. 3-285) G. Barany and R.B. Merrifield, Academic Press (1980). Each of these references is herein incorporated by reference for all purposes.
- the synthesis starts at the carboxyl-terminal end of the peptide by attaching an amino acid whose alpha-amino group is protected, for example by a t-butyloxycarbonyl (Boc) or fluorenylmethyloxycarbonyl (Fmoc) protective group, to a solid support.
- Suitable polystyrene resins consist of insoluble copolymers of styrene with about 0.5 to 2% of a cross-linking agent, such as divinyl benzene.
- the synthesis uses either manual techniques, as in traditional Merrifield synthesis, or automated peptide synthesizers. Both manual and automatic techniques are well known in the art of peptide chemistry.
- the resulting peptides can be cleaved from the support resins using standard techniques, such as HF (hydrofluoric acid) deprotection protocols as described in Lu, G.S., Int. J Peptide & Protein
- Other cleavage methods include the use of hydrazine or TFA (trifluoroacetic acid).
- Mutated peptides designed by the methods described in the present disclosure can be produced by expression of recombinant DNA constructs prepared according to well-known methods. Such production can be desirable when large quantities are needed or when many different mutated peptides are required. Since the DNA of the wildtype (or other related) peptide has often been isolated, mutation into the modified peptide is possible.
- the DNA encoding the mutated peptides is preferably prepared using commercially available nucleic acid synthesis methods. See Gait et al. "Oligonucleotide Synthesis; A
- Expression can be effected in either procaryotic or eucaryotic hosts.
- Procaryotes most frequently are represented by various strains of E. coli. However, other microbial strains may also be used, such as bacilli (for example Bacillus subtilis), species of Pseudomonas, or other bacterial strains.
- plasmid vectors that contain replication sites and control sequences derived from a species compatible with the host are used. For example, a common vector for E. coli is pBR322 and its derivatives.
- procaryotic control sequences, which contain promoters for transcription initiation, optionally with an operator, along with ribosome binding-site sequences, include such commonly used promoters as the beta-lactamase and lactose (lac) promoter systems, the tryptophan (trp) promoter system, and the lambda-derived P_L promoter.
- any available promoter system compatible with procaryotes can be used.
- Expression systems useful in eucaryotic hosts consist of promoters derived from appropriate eucaryotic genes.
- a class of promoters useful in yeast includes promoters for synthesis of glycolytic enzymes, such as 3-phosphoglycerate kinase.
- Other yeast promoters include those from the enolase gene or the Leu2 gene obtained from YEp13.
- Suitable mammalian promoters include the early and late promoters from SV40 or other viral promoters such as those derived from polyoma, adenovirus II, bovine papilloma virus or avian sarcoma viruses. Suitable viral and mammalian enhancers are cited above. When plant cells are used as an expression system, the nopaline synthase promoter, for example, is appropriate.
- the expression systems are constructed using well-known restriction and ligation techniques and transformed into appropriate hosts. Transformation is done using standard techniques appropriate to such cells.
- the cells containing the expression systems are cultured under conditions appropriate for production. It will be readily appreciated by those having ordinary skill in the art of peptide design that the mutated peptides that are designed in accordance with the present disclosure and subsequently synthesized are themselves novel and useful compounds and are thus within the scope of the invention.
- the physical stabilities can be measured using a variety of physical techniques.
- thermal stability can be determined by assaying a specific property of the mutated protein at different temperatures as is well known in the art.
- Physical stability is a structural property, and generally indicates the stability of a folded conformation of the peptide relative to an unfolded or denatured state.
- Many methods, such as spectroscopy, sedimentation analysis, chemical assays, etc., can determine whether a peptide has undergone a structure change. For example, NMR, circular dichroism, fluorescent transfer, etc. can measure the folded state of a peptide under different conditions.
- Of the 125 possible permutations, seventy-eight of the mutants were analyzed in vivo for DNA binding activity, and nine were purified for thermostability measurements. These mutants are designated by the amino acids at the three mutated residues 36, 40 and 47; thus the wildtype, 36 Val 40 Met 47 Val, is "VMV".
- Fig. 8a presents a comparison of predicted packing energy versus measured thermostability for a six residue molten-zone. The predicted energies were generated by seven runs from different random starts for each mutant; the error bars indicate their standard deviation.
- Fig. 8b shows detection of anomalous strain by comparing the energies calculated using the six residue molten-zone versus an eight residue zone. To facilitate comparison, the energies are shown relative to wildtype.
- Fig. 8c presents a comparison of predicted packing energy versus measured thermostability for the eight residue molten-zone.
- VAV is the only example of destabilization by loss of attractive van der Waals interactions, rather than by gain of repulsive interactions, and thus probably creates little anomalous strain.
- the calculated packing energies for two of the mutants, IMV and LLI, are lower than that of wildtype, because they are able to fit in additional methylene groups (one in IMV, two in LLI) without bad contacts to the surrounding structure, obtaining a net decrease in the van der Waals energy.
- these mutants do exhibit improved thermostability (3°C and 4°C, respectively). While both the overall trend and detailed ordering of the thermostability data are captured well by the predictions, the two mutants containing Phe do not fit the observed correlation line at all. Examination of their predicted structures reveals bad contacts with the fixed context surrounding the mutated residues. These contacts produce an "anomalous strain" component that does not reflect the inherent packing qualities of these mutations, but rather artifact clashes resulting from holding the surrounding context fixed.
- This anomalous strain component can be directly detected by expanding the molten zone to include residues around these Phe insertions, specifically Leu 65 (adjacent to Phe 36) and Asn 61 (adjacent to Phe 40).
- the pattern of predicted energies for the nine mutants is unchanged, except that FLV and FFI drop 11 and 18 kcal/mol respectively, relative to wildtype (see Table and Fig. 8b).
- These large energy shifts were produced by relatively slight structural adjustments in Leu 65 (30° in χ1, 15° in χ2), and in Asn 61 (20° in χ1, 1° in χ2).
- thermostability is the same as that of FFI (destabilized by 9°C relative to wildtype)
- its calculated packing energy is about 7.5 kcal/mol lower than that for FFI, which contains many repulsive contacts.
- anomalous strain component is about 60% of its calculated packing energy difference versus wildtype.
- the calculated energies also discriminated active from inactive mutants quite well (Fig. 9a).
- Fig. 9 is a histogram showing the distribution of active (dark bars) and inactive (open bars) over the calculated energies (relative to wildtype) , the 10 sequences found experimentally to be fully active at 26°C (activity grade 5) all had strongly negative
- Fig. 9b reproduces Lim and Sauer's analysis of the distributions of their active versus inactive mutants over the range of core packing volume, a simple measure often used to forecast internal mutations' viability.
- Although the distribution of inactive mutants is slightly shifted relative to that of active mutants, their extensive overlap makes volume a poor predictor of activity.
- choosing the optimal cut-off rule of ' ⁇ volume ⁇ 3 then active' is only a marginally better predictor (19 errors out of 78) than simply asserting 'all sequences are active' (22 errors out of 78) .
- E_calc is 22/6 (nearly fourfold) better than this null assertion, whereas volume is only 22/19 (16%) better.
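The arithmetic behind this comparison can be checked directly. The sketch reads 22/6 and 22/19 as ratios of error counts (null assertion errors divided by predictor errors, out of 78 mutants); the function name `improvement_over_null` is illustrative.

```python
def improvement_over_null(null_errors, predictor_errors):
    """Ratio of misclassifications: null assertion versus a given predictor."""
    return null_errors / predictor_errors

# Error counts quoted in the text, out of 78 mutants.
e_calc_ratio = improvement_over_null(22, 6)    # about 3.67, i.e. nearly fourfold
volume_ratio = improvement_over_null(22, 19)   # about 1.16, i.e. roughly 16% better
```

On this reading, the energy-based rule misclassifies 6 of 78 mutants, against 19 of 78 for the volume rule and 22 of 78 for the null assertion.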
- Fig. 8d presents a comparison of predicted energetics with experimentally measured activity.
- the 78 mutants' experimentally measured activities were plotted against their calculated energies.
- the experimental activity grades 0-5 were defined by a simple plate assay that challenges each repressor mutant-bearing clone with five phage covering a range of different virulence levels. Thus there was no reason to expect a linear relation between the activity grades and true activity.
- Lim and Sauer report that, among 10 mutants tested, DNA-binding affinity was about 0.1 (relative to wildtype) for mutants in grades 2-4, and ⁇ 0.01 for mutants in grades 0-1.
- Comparison of prediction sets generated from many different starting structures provides a straightforward measure of the level of "noise" within the calculated energy (a very different matter from systematic error, due to incorrect aspects of the theory) , and its dependence on initial conformation.
- the standard deviation for VMV was 1.2 kcal/mol (by constant condense method) and 0.3 kcal/mol (by linear cooling method) .
- the average standard deviation was 0.7 kcal/mol (by constant condense method), and 0.5 kcal/mol (by linear cooling method); the overall trend and detailed ordering of the mutants' energies were unchanged.
- For VMV, the wildtype, the predicted peak-probability structure closely matched the native structure as determined by X-ray diffraction, with an overall rms deviation of 0.49 Å.
- Fig. 8 presents a comparison of predicted side-chain coordinates (bold lines) for an eight residue molten-zone surrounding the mutations, versus the X-ray structure (thin lines). The main chain is shown as a dotted line.
- the prediction errors were confined primarily to the two methionines (residues 40 and 42, side-chain rms error 0.56 Å and 0.94 Å, respectively) and Asn 61 (0.65 Å). While these coordinate errors were slight, their basis was interesting.
- the X-ray coordinates contained a bad contact (Met 40 C⁇ to Val 47 Cγ1, 2.93 Å ≈ 6 kcal/mol), which the prediction avoided by moving the side-chain away and out, to position these atoms ≈ 4.0 Å apart. However, this forced Met 40 to within 2.7 Å of Met 42 C⁇, in turn forcing this surface side-chain outwards into the gauche- conformation.
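An rms deviation of this kind can be computed as sketched below, assuming the predicted and X-ray coordinates are already in the same frame (reasonable here, since the main chain is held fixed) and listed in the same atom order. The function name `rms_deviation` is illustrative.

```python
import math

def rms_deviation(predicted, reference):
    """Root-mean-square distance between matched atom coordinates."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(sq / len(predicted))
```

Restricting the two lists to the atoms of one side-chain gives a per-residue rms error of the kind reported for individual residues.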
- Fig. 12 shows the condensation of one residue (Leu 64) in a typical constant-condense prediction run on VMV.
- a contour line is drawn at a probability level equal to 10% of the peak probability density in the current map. The first cycle winnows the residue's conformation probability map greatly, excluding regions where it clashes strongly with the main-chain and surrounding fixed side-chains.
- a_0 was set to give a peak six-fold higher than the uniform background probability, and the decay constant r_0 was 60°.
- Although the conformational probability maps and total energy were slightly perturbed in the initial condensation cycles, the final result was not substantially affected. The system converged similarly to a low energy, and Leu 64's conformational probability map gradually became more and more like that in the unbiased case, and converged to the same final peak.
- Fig. 13 illustrates the condensation of a six residue molten-zone for the wildtype protein, showing random samples of conformations from cycles at the beginning (a), middle (b), and end (c) of the run. The predicted side-chains are labeled in c. The last panel is representative of the dynamics in the final ensemble predicted by the method. The ensemble's progressive condensation slows and eventually halts because the residues become so focused they no longer strike bad contacts with each other.
- the ensemble becomes self-consistent when the residues' pressure to condense (due to collisions with each other) falls below the inherent pressure to diffuse supplied by thermal motion.
- the thermal motions in the final, equilibrated ensemble were quite slight, corresponding to B-factors in the range of 3-11 Å².
- B-factors correlated with atoms' distance along the side-chain from the fixed backbone, and were highest for residues at the protein surface (Met 42, Leu 64).
- the low overall B-factors reflect the method's use of a fixed mainchain, which disallows coupled motions of the protein as a whole.
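For reference, an isotropic B-factor can be estimated from an ensemble of sampled atom positions using the standard crystallographic relation B = (8π²/3)⟨|Δr|²⟩. How the B-factors quoted above were actually computed is not stated in this text, so the sketch below, including the name `b_factor` and the unweighted averaging, is an assumption.

```python
import math

def b_factor(positions):
    """Isotropic B-factor (in squared length units) from sampled positions of one atom."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean-square displacement of the atom about its average position.
    msd = sum(math.dist(p, mean) ** 2 for p in positions) / n
    return 8.0 * math.pi ** 2 * msd / 3.0
```

With positions in Å the result is in Å²; in an ensemble drawn from the final probability maps, each sample would ordinarily be weighted by its probability, which this unweighted sketch omits.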
- Met 42 converged to a single, well-defined peak in the final conformational probability maps, indicating a relatively confident structural prediction.
- Met 42's final ensemble, in contrast, consisted of two separate peaks representing the conformations gauche- trans and gauche- gauche- (see Fig. 13c).
- the method could not "decide" on a best conformation for this residue, and instead left both peaks in the final map, marking the prediction as internally inconsistent.
- the final map provides a further indicator identifying possible errors in the predictions.
- Fig. 14 illustrates convergence of the total system energy as a function of iteration cycle, for a six residue molten-zone by the constant condense algorithm.
- the plots for the wildtype protein and six mutants are nearly identical, except for relatively slight differences in their final, converged energies. It is these differences which are used to predict the mutations' effect on stability.
- Positive controls included significant procedural changes in the method that in principle are irrelevant to packing energetics (Table 2), while negative controls disturbed the method's ability to calculate van der Waals interactions accurately, and to condense reliably to the global minimum.
- the largest internal deviation among the set of positive controls was the variation of LLI, the most stable mutant, relative to IMV, the next most stable mutant.
- deliberately introducing inaccuracies into the van der Waals calculations severely disrupted the predictions. Deleting just one non-bonded list entry reduced the correlation coefficient between the calculations and experimental T_m values from about 0.9 to about 0.4.
- forcing the ensemble to condense too rapidly to locate the global minimum also destroyed the correlation with experiment.
- the simple point of these tests is that the method's predictions reflect the packing quality of the mutants' global minimum ensembles, not artifacts of the method's procedures.
- SCMF: self-consistent mean field method
- Fig. 15a is a graphical representation of this, plotting lowest energy (Emin) versus the number of conformational moves generated (N).
- Fig. 15c analyzes the size-dependence of the two methods' rates of convergence.
- Fig. 15c shows how zone size affects the number of conformational moves required before attaining an energy one- third of the way between the minimum possible energy and the mean energy of a random sample of conformations.
- SCMF is uniformly low, about 1000-2000 moves for the very small zones containing only residues with 1-2 free torsions, and rising to about 7000 moves for the zones containing larger side-chains.
- Simulated annealing, by contrast, requires increasingly large numbers of moves to converge to E_one_third, in proportion to the total number of residues in the molten zone.
- thermodynamic free energy difference ΔE
- Fig. 16 shows theoretical calculations of the ΔE for a series of hydrophobic core mutations in the protein barnase, compared with experimental measurements on these mutants (Kellis et al., Biochemistry, 28:4914-4922 (1989)).
- Physical stability calculations for the native state were performed starting from X-ray coordinates of the wildtype protein; physical stability for the unfolded state was calculated starting from an extended β-chain conformation. The energy differences between these calculated stabilities were added to the known ΔG_transfer values for the mutated amino acids (representing the hydrophobic effect; the values reported by Bull and Breese (1974) Arch. Biochem.
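The convergence measure used in this comparison can be sketched as follows. The sketch assumes the "one-third" point is measured upward from the minimum energy toward the random-sample mean, which the text does not make explicit; the names `e_one_third` and `moves_to_threshold` are illustrative.

```python
def e_one_third(e_min, e_random_mean):
    """Energy one-third of the way from the minimum toward the random-sample mean."""
    return e_min + (e_random_mean - e_min) / 3.0

def moves_to_threshold(best_energy_trace, threshold):
    """Number of moves needed before the lowest energy so far reaches the threshold.

    best_energy_trace: lowest energy found after each move (Emin versus N).
    Returns None if the threshold is never reached.
    """
    for n, energy in enumerate(best_energy_trace, start=1):
        if energy <= threshold:
            return n
    return None
```

Evaluating `moves_to_threshold` for each molten-zone size and each method gives the kind of size-dependence curve described for Fig. 15c.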
- One aspect of the present invention involves the automatic search and identification of mutations in a given macromolecular complex which produce a desired effect on its physical stability.
- This aspect of the invention may be applied to design of drugs that bind their target more tightly, proteins which are more thermostable, proteins which bind a given DNA sequence specifically, and other uses which will be apparent to those of skill in the art.
- Subtilisin is a commercially important protein used in some cleaning applications. It would be highly desirable to produce mutant subtilisin peptides having increased thermal stability, while retaining the activity of the wildtype compound.
- the present method has been employed to generate stability prediction for various peptides including subtilisin.
- One approach employed was to identify buried hydrophobic residues and substitute for them similar hydrophobic residues, such that the wildtype amino acid and the substituted mutant amino acid differ only in the presence of one or a few methylene groups. By quickly scanning the stability of the various mutants, promising candidates can be identified by noting sequences having energies below a preselected value. Such sequences can then be synthesized by techniques well known in the art, as described above.
- Table 3 presents the physical stability changes calculated for mutations of buried valine residues in subtilisin BPN' to isoleucine. Of the twenty-two mutations tested, at least seven (indicated by three or more "+" signs) are calculated to produce significant improvements in the native protein's physical stability. Furthermore, these mutations may be combined to produce additive improvements in physical stability which are quite large.
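The scanning step amounts to a simple filter over calculated energies. In the sketch below, the function name `promising_mutants`, the dictionary layout, and the use of energies relative to wildtype are assumptions; the mutant labels and numbers in the usage note are placeholders, not results from the text.

```python
def promising_mutants(energies, wildtype, cutoff):
    """Return mutants whose energy relative to wildtype is below `cutoff`.

    energies: dict mapping sequence label -> calculated energy.
    wildtype: label of the reference sequence in `energies`.
    cutoff:   preselected threshold on (E_mutant - E_wildtype).
    """
    reference = energies[wildtype]
    relative = {name: e - reference
                for name, e in energies.items() if name != wildtype}
    # Most stabilizing (most negative) candidates first.
    return sorted((n for n, d in relative.items() if d < cutoff), key=relative.get)
```

For example, with placeholder energies `{"wt": -50.0, "mutA": -53.0, "mutB": -48.0, "mutC": -51.5}` and a cutoff of -1.0, the function returns `["mutA", "mutC"]`.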
- wildtype subtilisin is as follows: ALA GLN SER VAL PRO TYR GLY VAL SER GLN ILE LYS ALA PRO ALA LEU HIS SER GLN GLY TYR THR GLY SER ASN VAL LYS VAL ALA VAL ILE ASP SER GLY ILE ASP SER SER HIS PRO ASP LEU LYS VAL ALA GLY GLY ALA SER MET VAL PRO SER GLU THR PRO ASN PHE GLN ASP ASP ASN SER HIS GLY THR HIS VAL ALA GLY THR VAL ALA ALA LEU ASN ASN SER ILE GLY VAL LEU GLY VAL ALA PRO SER SER ALA LEU TYR ALA VAL LYS VAL LEU GLY ASP ALA GLY GLN TYR SER TRP ILE ILE ASN GLY ILE GLU TRP ALA ILE ALA ASN ASN ASN ASN SER
- Trp Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu Gly Gly 115 120 125
Abstract
The invention relates to a method for determining the time-averaged conformation and packing energy of a macromolecular structure, such as a peptide or a nucleic acid. To this end, conformation probability maps are used to rotate a plurality of peptide side chains so as to obtain a large number of different conformations. At each conformation, the interaction energy of each peptide side chain with its neighbors is determined and used to refine the conformation energy map. After repeated rotational moves, the method produces, for each side chain, a complete conformation energy map, which is then used to determine a conformation probability map. The new conformation probability map replaces the previous one, and a new cycle can begin. This process transforms a macromolecular structure into a final self-consistent ensemble of probable conformations representing a time-averaged structure of the actual macromolecule. The free energy of the structure can also be determined. The method can be used to identify the stability of mutant peptides.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US82372592A | 1992-01-21 | 1992-01-21 | |
US07/823,725 | 1992-01-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1993014465A1 (fr) | 1993-07-22 |
Family
ID=25239554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1993/000418 WO1993014465A1 (fr) | Prediction of the conformation and stability of macromolecular structures | 1992-01-21 | 1993-01-20 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1993014465A1 (fr) |
- 1993-01-20 WO PCT/US1993/000418 patent/WO1993014465A1/fr active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4704692A (en) * | 1986-09-02 | 1987-11-03 | Ladner Robert C | Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides |
US4853871A (en) * | 1987-04-06 | 1989-08-01 | Genex Corporation | Computer-based method for designing stablized proteins |
US4908773A (en) * | 1987-04-06 | 1990-03-13 | Genex Corporation | Computer designed stabilized proteins and method for producing same |
US4852017A (en) * | 1987-06-19 | 1989-07-25 | Applied Biosystems, Inc. | Determination of peptide sequences |
US5008831A (en) * | 1989-01-12 | 1991-04-16 | The United States Of America As Represented By The Department Of Health And Human Services | Method for producing high quality chemical structure diagrams |
US5081584A (en) * | 1989-03-13 | 1992-01-14 | United States Of America | Computer-assisted design of anti-peptides based on the amino acid sequence of a target peptide |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012037659A1 (fr) | 2010-09-24 | 2012-03-29 | Zymeworks Inc. | System for molecular structure calculations |
EP2619700A4 (fr) * | 2010-09-24 | 2017-06-07 | Zymeworks, Inc. | System for molecular structure calculations |
US10832794B2 (en) | 2010-09-24 | 2020-11-10 | Zymeworks Inc. | System for molecular packing calculations |
WO2019232222A1 (fr) * | 2018-05-31 | 2019-12-05 | Trustees Of Dartmouth College | Protein design by computational modeling using tertiary or quaternary structural motifs |
CN113421610A (zh) * | 2021-07-01 | 2021-09-21 | 北京望石智慧科技有限公司 | Method, device, and storage medium for determining molecular superposition conformations |
CN113421610B (zh) * | 2021-07-01 | 2023-10-20 | 北京望石智慧科技有限公司 | Method, device, and storage medium for determining molecular superposition conformations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): CA JP |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
 | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
 | 122 | EP: PCT application non-entry in European phase | |
 | NENP | Non-entry into the national phase | Ref country code: CA |