WO1997001634A2 - Polypeptide for repairing genetic information, nucleotidic sequence which codes for it and process for the preparation thereof (guanine thymine binding protein - gtbp) - Google Patents
- Publication number
- WO1997001634A2 PCT/IT1996/000131
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gtbp
- sequence
- gene
- seq
- protein
- Prior art date
- 238000000034 method Methods 0.000 title claims abstract description 120
- 230000008569 process Effects 0.000 title claims abstract description 8
- 108090000765 processed proteins & peptides Proteins 0.000 title claims description 78
- 102000004196 processed proteins & peptides Human genes 0.000 title claims description 67
- 229920001184 polypeptide Polymers 0.000 title claims description 60
- 238000002360 preparation method Methods 0.000 title claims description 7
- 230000002068 genetic effect Effects 0.000 title abstract description 15
- MKKDSIYTXTZFAU-UHFFFAOYSA-N 2-amino-3,7-dihydropurin-6-one;5-methyl-1h-pyrimidine-2,4-dione Chemical compound CC1=CNC(=O)NC1=O.O=C1NC(N)=NC2=C1NC=N2 MKKDSIYTXTZFAU-UHFFFAOYSA-N 0.000 title abstract description 3
- 108091012397 thymine binding proteins Proteins 0.000 title 1
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 207
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 136
- 230000027455 binding Effects 0.000 claims abstract description 49
- 241000282414 Homo sapiens Species 0.000 claims abstract description 44
- 238000009739 binding Methods 0.000 claims abstract description 42
- 230000000694 effects Effects 0.000 claims abstract description 25
- 206010028980 Neoplasm Diseases 0.000 claims abstract description 22
- 210000004027 cell Anatomy 0.000 claims description 78
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 claims description 66
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 claims description 63
- 150000001413 amino acids Chemical class 0.000 claims description 60
- 108020004414 DNA Proteins 0.000 claims description 52
- 239000002773 nucleotide Substances 0.000 claims description 45
- 125000003729 nucleotide group Chemical group 0.000 claims description 42
- 239000002299 complementary DNA Substances 0.000 claims description 36
- 230000035772 mutation Effects 0.000 claims description 30
- 108091034117 Oligonucleotide Proteins 0.000 claims description 29
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 28
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 26
- 238000003752 polymerase chain reaction Methods 0.000 claims description 25
- 238000003556 assay Methods 0.000 claims description 23
- 230000003321 amplification Effects 0.000 claims description 22
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 22
- 150000007523 nucleic acids Chemical class 0.000 claims description 20
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 18
- 102000039446 nucleic acids Human genes 0.000 claims description 15
- 108020004707 nucleic acids Proteins 0.000 claims description 15
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 claims description 14
- 108020004705 Codon Proteins 0.000 claims description 14
- 238000004458 analytical method Methods 0.000 claims description 14
- 108700028369 Alleles Proteins 0.000 claims description 13
- 108091026890 Coding region Proteins 0.000 claims description 13
- 238000012163 sequencing technique Methods 0.000 claims description 13
- 230000004075 alteration Effects 0.000 claims description 12
- 230000006870 function Effects 0.000 claims description 12
- 230000010076 replication Effects 0.000 claims description 10
- 238000012216 screening Methods 0.000 claims description 10
- 239000003298 DNA probe Substances 0.000 claims description 8
- 239000003795 chemical substances by application Substances 0.000 claims description 8
- 230000000295 complement effect Effects 0.000 claims description 8
- 108020004999 messenger RNA Proteins 0.000 claims description 8
- 201000011510 cancer Diseases 0.000 claims description 7
- 230000009395 genetic defect Effects 0.000 claims description 7
- 241001465754 Metazoa Species 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 6
- 238000012217 deletion Methods 0.000 claims description 6
- 230000037430 deletion Effects 0.000 claims description 6
- 239000013604 expression vector Substances 0.000 claims description 6
- 238000009396 hybridization Methods 0.000 claims description 6
- 238000003780 insertion Methods 0.000 claims description 6
- 230000037431 insertion Effects 0.000 claims description 6
- 230000033607 mismatch repair Effects 0.000 claims description 6
- 231100000504 carcinogenesis Toxicity 0.000 claims description 5
- 230000001613 neoplastic effect Effects 0.000 claims description 5
- 238000000746 purification Methods 0.000 claims description 5
- 230000003993 interaction Effects 0.000 claims description 4
- 208000005623 Carcinogenesis Diseases 0.000 claims description 3
- 108020003215 DNA Probes Proteins 0.000 claims description 3
- 230000036952 cancer formation Effects 0.000 claims description 3
- 230000009826 neoplastic cell growth Effects 0.000 claims description 3
- 230000009261 transgenic effect Effects 0.000 claims description 3
- 238000001262 western blot Methods 0.000 claims description 3
- 238000002105 Southern blotting Methods 0.000 claims description 2
- 230000001580 bacterial effect Effects 0.000 claims description 2
- 101710099946 DNA mismatch repair protein Msh6 Proteins 0.000 claims 53
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 claims 29
- 108010010789 G-T mismatch-binding protein Proteins 0.000 claims 2
- 230000001413 cellular effect Effects 0.000 claims 2
- 230000000536 complexating effect Effects 0.000 claims 2
- 201000010099 disease Diseases 0.000 claims 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims 2
- 238000005734 heterodimerization reaction Methods 0.000 claims 2
- 238000012287 DNA Binding Assay Methods 0.000 claims 1
- 238000012258 culturing Methods 0.000 claims 1
- 230000001419 dependent effect Effects 0.000 claims 1
- 210000003527 eukaryotic cell Anatomy 0.000 claims 1
- 230000002538 fungal effect Effects 0.000 claims 1
- 238000012215 gene cloning Methods 0.000 claims 1
- 239000001963 growth medium Substances 0.000 claims 1
- 238000003119 immunoblot Methods 0.000 claims 1
- 238000003365 immunocytochemistry Methods 0.000 claims 1
- 239000008194 pharmaceutical composition Substances 0.000 claims 1
- 210000001236 prokaryotic cell Anatomy 0.000 claims 1
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 claims 1
- 208000001333 Colorectal Neoplasms Diseases 0.000 abstract description 29
- 238000001514 detection method Methods 0.000 abstract description 13
- 230000008439 repair process Effects 0.000 abstract description 8
- 238000010353 genetic engineering Methods 0.000 abstract description 6
- 238000003745 diagnosis Methods 0.000 abstract description 2
- 235000018102 proteins Nutrition 0.000 description 117
- 229940024606 amino acid Drugs 0.000 description 61
- 235000001014 amino acid Nutrition 0.000 description 60
- 239000000047 product Substances 0.000 description 27
- 239000013615 primer Substances 0.000 description 26
- 239000000523 sample Substances 0.000 description 25
- 239000000284 extract Substances 0.000 description 22
- 238000013519 translation Methods 0.000 description 21
- 239000012634 fragment Substances 0.000 description 20
- 239000013598 vector Substances 0.000 description 18
- 230000000692 anti-sense effect Effects 0.000 description 17
- 229920002401 polyacrylamide Polymers 0.000 description 17
- 210000001519 tissue Anatomy 0.000 description 16
- 238000006243 chemical reaction Methods 0.000 description 14
- 102000053602 DNA Human genes 0.000 description 11
- 241000588724 Escherichia coli Species 0.000 description 9
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 9
- 238000013518 transcription Methods 0.000 description 9
- 230000035897 transcription Effects 0.000 description 9
- 108091027305 Heteroduplex Proteins 0.000 description 8
- 108010038272 MutS Proteins Proteins 0.000 description 8
- 238000000211 autoradiogram Methods 0.000 description 8
- 230000015572 biosynthetic process Effects 0.000 description 8
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 7
- 102000010645 MutS Proteins Human genes 0.000 description 7
- 108700026244 Open Reading Frames Proteins 0.000 description 7
- 125000000539 amino acid group Chemical group 0.000 description 7
- 238000010367 cloning Methods 0.000 description 7
- 238000001415 gene therapy Methods 0.000 description 7
- 210000005260 human cell Anatomy 0.000 description 7
- 239000011159 matrix material Substances 0.000 description 7
- 239000000126 substance Substances 0.000 description 7
- 230000004568 DNA-binding Effects 0.000 description 6
- 108010038807 Oligopeptides Proteins 0.000 description 6
- 102000015636 Oligopeptides Human genes 0.000 description 6
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 6
- 210000004899 c-terminal region Anatomy 0.000 description 6
- 230000002950 deficient Effects 0.000 description 6
- 238000001727 in vivo Methods 0.000 description 6
- 230000000977 initiatory effect Effects 0.000 description 6
- 239000000203 mixture Substances 0.000 description 6
- 238000010369 molecular cloning Methods 0.000 description 6
- 239000013612 plasmid Substances 0.000 description 6
- 239000007787 solid Substances 0.000 description 6
- 230000002103 transcriptional effect Effects 0.000 description 6
- 101710099953 DNA mismatch repair protein msh3 Proteins 0.000 description 5
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 5
- 108020004511 Recombinant DNA Proteins 0.000 description 5
- 239000008049 TAE buffer Substances 0.000 description 5
- HGEVZDLYZYVYHD-UHFFFAOYSA-N acetic acid;2-amino-2-(hydroxymethyl)propane-1,3-diol;2-[2-[bis(carboxymethyl)amino]ethyl-(carboxymethyl)amino]acetic acid Chemical compound CC(O)=O.OCC(N)(CO)CO.OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O HGEVZDLYZYVYHD-UHFFFAOYSA-N 0.000 description 5
- 239000012491 analyte Substances 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 5
- 238000002955 isolation Methods 0.000 description 5
- 239000006166 lysate Substances 0.000 description 5
- 239000003471 mutagenic agent Substances 0.000 description 5
- 230000006337 proteolytic cleavage Effects 0.000 description 5
- 241000894007 species Species 0.000 description 5
- 230000009870 specific binding Effects 0.000 description 5
- 238000006467 substitution reaction Methods 0.000 description 5
- 238000012360 testing method Methods 0.000 description 5
- 239000003155 DNA primer Substances 0.000 description 4
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 4
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 4
- 102000004190 Enzymes Human genes 0.000 description 4
- 108090000790 Enzymes Proteins 0.000 description 4
- 208000034826 Genetic Predisposition to Disease Diseases 0.000 description 4
- JBCLFWXMTIKCCB-UHFFFAOYSA-N H-Gly-Phe-OH Natural products NCC(=O)NC(C(O)=O)CC1=CC=CC=C1 JBCLFWXMTIKCCB-UHFFFAOYSA-N 0.000 description 4
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 4
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 4
- SBMNPABNWKXNBJ-BQBZGAKWSA-N Ser-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CO SBMNPABNWKXNBJ-BQBZGAKWSA-N 0.000 description 4
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 4
- KOSRFJWDECSPRO-UHFFFAOYSA-N alpha-L-glutamyl-L-glutamic acid Natural products OC(=O)CCC(N)C(=O)NC(CCC(O)=O)C(O)=O KOSRFJWDECSPRO-UHFFFAOYSA-N 0.000 description 4
- 239000000427 antigen Substances 0.000 description 4
- 102000036639 antigens Human genes 0.000 description 4
- 108091007433 antigens Proteins 0.000 description 4
- 102000023732 binding proteins Human genes 0.000 description 4
- 108091008324 binding proteins Proteins 0.000 description 4
- 239000003153 chemical reaction reagent Substances 0.000 description 4
- 230000007547 defect Effects 0.000 description 4
- 239000003814 drug Substances 0.000 description 4
- 230000003053 immunization Effects 0.000 description 4
- 210000004962 mammalian cell Anatomy 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 4
- 239000003550 marker Substances 0.000 description 4
- 238000002156 mixing Methods 0.000 description 4
- 239000013642 negative control Substances 0.000 description 4
- 210000002826 placenta Anatomy 0.000 description 4
- 210000001995 reticulocyte Anatomy 0.000 description 4
- 238000003757 reverse transcription PCR Methods 0.000 description 4
- 210000004881 tumor cell Anatomy 0.000 description 4
- 239000004474 valine Substances 0.000 description 4
- 241000894006 Bacteria Species 0.000 description 3
- 241000701959 Escherichia virus Lambda Species 0.000 description 3
- BUZMZDDKFCSKOT-CIUDSAMLSA-N Glu-Glu-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O BUZMZDDKFCSKOT-CIUDSAMLSA-N 0.000 description 3
- 241000282412 Homo Species 0.000 description 3
- 102100034343 Integrase Human genes 0.000 description 3
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 description 3
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 3
- RXGLHDWAZQECBI-SRVKXCTJSA-N Leu-Leu-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O RXGLHDWAZQECBI-SRVKXCTJSA-N 0.000 description 3
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 3
- 208000032818 Microsatellite Instability Diseases 0.000 description 3
- 241000283973 Oryctolagus cuniculus Species 0.000 description 3
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 3
- 108020004682 Single-Stranded DNA Proteins 0.000 description 3
- 108020005038 Terminator Codon Proteins 0.000 description 3
- MEJHFIOYJHTWMK-VOAKCMCISA-N Thr-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)[C@@H](C)O MEJHFIOYJHTWMK-VOAKCMCISA-N 0.000 description 3
- 238000001042 affinity chromatography Methods 0.000 description 3
- 238000013459 approach Methods 0.000 description 3
- 108010062796 arginyllysine Proteins 0.000 description 3
- 108010040443 aspartyl-aspartic acid Proteins 0.000 description 3
- 238000003776 cleavage reaction Methods 0.000 description 3
- 210000001072 colon Anatomy 0.000 description 3
- 150000001875 compounds Chemical class 0.000 description 3
- 239000000470 constituent Substances 0.000 description 3
- 238000012937 correction Methods 0.000 description 3
- 230000007812 deficiency Effects 0.000 description 3
- 238000002405 diagnostic procedure Methods 0.000 description 3
- 108010081551 glycylphenylalanine Proteins 0.000 description 3
- 238000002649 immunization Methods 0.000 description 3
- 238000003018 immunoassay Methods 0.000 description 3
- 229960000310 isoleucine Drugs 0.000 description 3
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 3
- 108010034529 leucyl-lysine Proteins 0.000 description 3
- 108010057821 leucylproline Proteins 0.000 description 3
- 239000002502 liposome Substances 0.000 description 3
- 108010064235 lysylglycine Proteins 0.000 description 3
- 239000000463 material Substances 0.000 description 3
- 229930182817 methionine Natural products 0.000 description 3
- 230000000869 mutational effect Effects 0.000 description 3
- 230000000246 remedial effect Effects 0.000 description 3
- 108091008146 restriction endonucleases Proteins 0.000 description 3
- 230000007017 scission Effects 0.000 description 3
- 239000000758 substrate Substances 0.000 description 3
- 229940124597 therapeutic agent Drugs 0.000 description 3
- 238000011144 upstream manufacturing Methods 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- 108010036211 5-HT-moduline Proteins 0.000 description 2
- ZEBDYGZVMMKZNB-SRVKXCTJSA-N Arg-Met-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CCCN=C(N)N)N ZEBDYGZVMMKZNB-SRVKXCTJSA-N 0.000 description 2
- NJIKKGUVGUBICV-ZLUOBGJFSA-N Asp-Ala-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC(O)=O NJIKKGUVGUBICV-ZLUOBGJFSA-N 0.000 description 2
- DWOGMPWRQQWPPF-GUBZILKMSA-N Asp-Leu-Glu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O DWOGMPWRQQWPPF-GUBZILKMSA-N 0.000 description 2
- 125000001433 C-terminal amino-acid group Chemical group 0.000 description 2
- -1 GTBP (H. sapiens) Proteins 0.000 description 2
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 2
- RGJKYNUINKGPJN-RWRJDSDZSA-N Glu-Thr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCC(=O)O)N RGJKYNUINKGPJN-RWRJDSDZSA-N 0.000 description 2
- 102000005720 Glutathione transferase Human genes 0.000 description 2
- 108010070675 Glutathione transferase Proteins 0.000 description 2
- FFJQHWKSGAWSTJ-BFHQHQDPSA-N Gly-Thr-Ala Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O FFJQHWKSGAWSTJ-BFHQHQDPSA-N 0.000 description 2
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 2
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 2
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 2
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 description 2
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 2
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 2
- 241000880493 Leptailurus serval Species 0.000 description 2
- OIARJGNVARWKFP-YUMQZZPRSA-N Leu-Asn-Gly Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O OIARJGNVARWKFP-YUMQZZPRSA-N 0.000 description 2
- LXKNSJLSGPNHSK-KKUMJFAQSA-N Leu-Leu-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N LXKNSJLSGPNHSK-KKUMJFAQSA-N 0.000 description 2
- HVHRPWQEQHIQJF-AVGNSLFASA-N Leu-Lys-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O HVHRPWQEQHIQJF-AVGNSLFASA-N 0.000 description 2
- PBIPLDMFHAICIP-DCAQKATOSA-N Lys-Glu-Glu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O PBIPLDMFHAICIP-DCAQKATOSA-N 0.000 description 2
- YWJQHDDBFAXNIR-MXAVVETBSA-N Lys-Ile-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CCCCN)N YWJQHDDBFAXNIR-MXAVVETBSA-N 0.000 description 2
- YPLVCBKEPJPBDQ-MELADBBJSA-N Lys-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCCN)N YPLVCBKEPJPBDQ-MELADBBJSA-N 0.000 description 2
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 2
- SITLTJHOQZFJGG-UHFFFAOYSA-N N-L-alpha-glutamyl-L-valine Natural products CC(C)C(C(O)=O)NC(=O)C(N)CCC(O)=O SITLTJHOQZFJGG-UHFFFAOYSA-N 0.000 description 2
- 101100342977 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) leu-1 gene Proteins 0.000 description 2
- 238000012408 PCR amplification Methods 0.000 description 2
- KAJLHCWRWDSROH-BZSNNMDCSA-N Phe-Phe-Asp Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC(O)=O)C(O)=O)C1=CC=CC=C1 KAJLHCWRWDSROH-BZSNNMDCSA-N 0.000 description 2
- 108091000080 Phosphotransferase Proteins 0.000 description 2
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 2
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 2
- IXZHZUGGKLRHJD-DCAQKATOSA-N Ser-Leu-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O IXZHZUGGKLRHJD-DCAQKATOSA-N 0.000 description 2
- FHXGMDRKJHKLKW-QWRGUYRKSA-N Ser-Tyr-Gly Chemical compound OC[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CC1=CC=C(O)C=C1 FHXGMDRKJHKLKW-QWRGUYRKSA-N 0.000 description 2
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 2
- 101710106714 Shutoff protein Proteins 0.000 description 2
- CYCGARJWIQWPQM-YJRXYDGGSA-N Thr-Tyr-Ser Chemical compound C[C@@H](O)[C@H]([NH3+])C(=O)N[C@H](C(=O)N[C@@H](CO)C([O-])=O)CC1=CC=C(O)C=C1 CYCGARJWIQWPQM-YJRXYDGGSA-N 0.000 description 2
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 2
- IGXLNVIYDYONFB-UFYCRDLUSA-N Tyr-Phe-Arg Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O)C1=CC=C(O)C=C1 IGXLNVIYDYONFB-UFYCRDLUSA-N 0.000 description 2
- OVBMCNDKCWAXMZ-NAKRPEOUSA-N Val-Ile-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](C(C)C)N OVBMCNDKCWAXMZ-NAKRPEOUSA-N 0.000 description 2
- 241000700605 Viruses Species 0.000 description 2
- 229960000723 ampicillin Drugs 0.000 description 2
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 description 2
- 108010043240 arginyl-leucyl-glycine Proteins 0.000 description 2
- 108010068380 arginylarginine Proteins 0.000 description 2
- 229940009098 aspartate Drugs 0.000 description 2
- 235000003704 aspartic acid Nutrition 0.000 description 2
- 108010069205 aspartyl-phenylalanine Proteins 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 2
- 230000004071 biological effect Effects 0.000 description 2
- 210000004369 blood Anatomy 0.000 description 2
- 239000008280 blood Substances 0.000 description 2
- 210000001124 body fluid Anatomy 0.000 description 2
- 239000010839 body fluid Substances 0.000 description 2
- 239000000872 buffer Substances 0.000 description 2
- 238000012512 characterization method Methods 0.000 description 2
- 239000007795 chemical reaction product Substances 0.000 description 2
- 210000000349 chromosome Anatomy 0.000 description 2
- 210000004748 cultured cell Anatomy 0.000 description 2
- 238000001962 electrophoresis Methods 0.000 description 2
- 230000037433 frameshift Effects 0.000 description 2
- 102000037865 fusion proteins Human genes 0.000 description 2
- 108020001507 fusion proteins Proteins 0.000 description 2
- 229930195712 glutamate Natural products 0.000 description 2
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 2
- RWSXRVCMGQZWBV-WDSKDSINSA-N glutathione Chemical compound OC(=O)[C@@H](N)CCC(=O)N[C@@H](CS)C(=O)NCC(O)=O RWSXRVCMGQZWBV-WDSKDSINSA-N 0.000 description 2
- VPZXBVLAVMBEQI-UHFFFAOYSA-N glycyl-DL-alpha-alanine Natural products OC(=O)C(C)NC(=O)CN VPZXBVLAVMBEQI-UHFFFAOYSA-N 0.000 description 2
- 108010089804 glycyl-threonine Proteins 0.000 description 2
- 108010020688 glycylhistidine Proteins 0.000 description 2
- 108010037850 glycylvaline Proteins 0.000 description 2
- 230000012010 growth Effects 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 102000057079 human MSH2 Human genes 0.000 description 2
- 210000004408 hybridoma Anatomy 0.000 description 2
- 210000004201 immune sera Anatomy 0.000 description 2
- 229940042743 immune sera Drugs 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 238000011534 incubation Methods 0.000 description 2
- 229920002521 macromolecule Polymers 0.000 description 2
- 125000001360 methionine group Chemical group N[C@@H](CCSC)C(=O)* 0.000 description 2
- 238000000520 microinjection Methods 0.000 description 2
- 230000010309 neoplastic transformation Effects 0.000 description 2
- 102000020233 phosphotransferase Human genes 0.000 description 2
- 230000008488 polyadenylation Effects 0.000 description 2
- 230000002265 prevention Effects 0.000 description 2
- 108010031719 prolyl-serine Proteins 0.000 description 2
- 238000000159 protein binding assay Methods 0.000 description 2
- 230000002797 proteolythic effect Effects 0.000 description 2
- 239000011541 reaction mixture Substances 0.000 description 2
- 238000010188 recombinant method Methods 0.000 description 2
- 230000001105 regulatory effect Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000010839 reverse transcription Methods 0.000 description 2
- 230000000392 somatic effect Effects 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 238000010998 test method Methods 0.000 description 2
- 238000001890 transfection Methods 0.000 description 2
- 230000014621 translational initiation Effects 0.000 description 2
- 230000005945 translocation Effects 0.000 description 2
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 2
- 108010051110 tyrosyl-lysine Proteins 0.000 description 2
- 241001430294 unidentified retrovirus Species 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- BRPMXFSTKXXNHF-IUCAKERBSA-N (2s)-1-[2-[[(2s)-pyrrolidine-2-carbonyl]amino]acetyl]pyrrolidine-2-carboxylic acid Chemical compound OC(=O)[C@@H]1CCCN1C(=O)CNC(=O)[C@H]1NCCC1 BRPMXFSTKXXNHF-IUCAKERBSA-N 0.000 description 1
- QFVHZQCOUORWEI-UHFFFAOYSA-N 4-[(4-anilino-5-sulfonaphthalen-1-yl)diazenyl]-5-hydroxynaphthalene-2,7-disulfonic acid Chemical compound C=12C(O)=CC(S(O)(=O)=O)=CC2=CC(S(O)(=O)=O)=CC=1N=NC(C1=CC=CC(=C11)S(O)(=O)=O)=CC=C1NC1=CC=CC=C1 QFVHZQCOUORWEI-UHFFFAOYSA-N 0.000 description 1
- HRPVXLWXLXDGHG-UHFFFAOYSA-N Acrylamide Chemical compound NC(=O)C=C HRPVXLWXLXDGHG-UHFFFAOYSA-N 0.000 description 1
- CXRCVCURMBFFOL-FXQIFTODSA-N Ala-Ala-Pro Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O CXRCVCURMBFFOL-FXQIFTODSA-N 0.000 description 1
- SVBXIUDNTRTKHE-CIUDSAMLSA-N Ala-Arg-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(O)=O SVBXIUDNTRTKHE-CIUDSAMLSA-N 0.000 description 1
- IMMKUCQIKKXKNP-DCAQKATOSA-N Ala-Arg-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](C)N)CCCN=C(N)N IMMKUCQIKKXKNP-DCAQKATOSA-N 0.000 description 1
- TTXMOJWKNRJWQJ-FXQIFTODSA-N Ala-Arg-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CCCN=C(N)N TTXMOJWKNRJWQJ-FXQIFTODSA-N 0.000 description 1
- NXSFUECZFORGOG-CIUDSAMLSA-N Ala-Asn-Leu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O NXSFUECZFORGOG-CIUDSAMLSA-N 0.000 description 1
- KIUYPHAMDKDICO-WHFBIAKZSA-N Ala-Asp-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O KIUYPHAMDKDICO-WHFBIAKZSA-N 0.000 description 1
- HJGZVLLLBJLXFC-LSJOCFKGSA-N Ala-His-Val Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C(C)C)C(O)=O HJGZVLLLBJLXFC-LSJOCFKGSA-N 0.000 description 1
- CCDFBRZVTDDJNM-GUBZILKMSA-N Ala-Leu-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O CCDFBRZVTDDJNM-GUBZILKMSA-N 0.000 description 1
- MNZHHDPWDWQJCQ-YUMQZZPRSA-N Ala-Leu-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O MNZHHDPWDWQJCQ-YUMQZZPRSA-N 0.000 description 1
- SOBIAADAMRHGKH-CIUDSAMLSA-N Ala-Leu-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O SOBIAADAMRHGKH-CIUDSAMLSA-N 0.000 description 1
- AJBVYEYZVYPFCF-CIUDSAMLSA-N Ala-Lys-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O AJBVYEYZVYPFCF-CIUDSAMLSA-N 0.000 description 1
- CHFFHQUVXHEGBY-GARJFASQSA-N Ala-Lys-Pro Chemical compound C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N1CCC[C@@H]1C(=O)O)N CHFFHQUVXHEGBY-GARJFASQSA-N 0.000 description 1
- GKAZXNDATBWNBI-DCAQKATOSA-N Ala-Met-Lys Chemical compound C[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(=O)O)N GKAZXNDATBWNBI-DCAQKATOSA-N 0.000 description 1
- DHBKYZYFEXXUAK-ONGXEEELSA-N Ala-Phe-Gly Chemical compound OC(=O)CNC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 DHBKYZYFEXXUAK-ONGXEEELSA-N 0.000 description 1
- IPZQNYYAYVRKKK-FXQIFTODSA-N Ala-Pro-Ala Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C)C(O)=O IPZQNYYAYVRKKK-FXQIFTODSA-N 0.000 description 1
- ADSGHMXEAZJJNF-DCAQKATOSA-N Ala-Pro-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](C)N ADSGHMXEAZJJNF-DCAQKATOSA-N 0.000 description 1
- DCVYRWFAMZFSDA-ZLUOBGJFSA-N Ala-Ser-Ala Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O DCVYRWFAMZFSDA-ZLUOBGJFSA-N 0.000 description 1
- RMAWDDRDTRSZIR-ZLUOBGJFSA-N Ala-Ser-Asp Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O RMAWDDRDTRSZIR-ZLUOBGJFSA-N 0.000 description 1
- MSWSRLGNLKHDEI-ACZMJKKPSA-N Ala-Ser-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O MSWSRLGNLKHDEI-ACZMJKKPSA-N 0.000 description 1
- DYXOFPBJBAHWFY-JBDRJPRFSA-N Ala-Ser-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@H](C)N DYXOFPBJBAHWFY-JBDRJPRFSA-N 0.000 description 1
- KTXKIYXZQFWJKB-VZFHVOOUSA-N Ala-Thr-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O KTXKIYXZQFWJKB-VZFHVOOUSA-N 0.000 description 1
- PGNNQOJOEGFAOR-KWQFWETISA-N Ala-Tyr-Gly Chemical compound OC(=O)CNC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=C(O)C=C1 PGNNQOJOEGFAOR-KWQFWETISA-N 0.000 description 1
- QRIYOHQJRDHFKF-UWJYBYFXSA-N Ala-Tyr-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=C(O)C=C1 QRIYOHQJRDHFKF-UWJYBYFXSA-N 0.000 description 1
- BOKLLPVAQDSLHC-FXQIFTODSA-N Ala-Val-Cys Chemical compound C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CS)C(=O)O)N BOKLLPVAQDSLHC-FXQIFTODSA-N 0.000 description 1
- CLOMBHBBUKAUBP-LSJOCFKGSA-N Ala-Val-His Chemical compound C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N CLOMBHBBUKAUBP-LSJOCFKGSA-N 0.000 description 1
- VYSRNGOMGHOJCK-GUBZILKMSA-N Arg-Ala-Met Chemical compound C[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N VYSRNGOMGHOJCK-GUBZILKMSA-N 0.000 description 1
- GIVATXIGCXFQQA-FXQIFTODSA-N Arg-Ala-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCN=C(N)N GIVATXIGCXFQQA-FXQIFTODSA-N 0.000 description 1
- OZNSCVPYWZRQPY-CIUDSAMLSA-N Arg-Asp-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O OZNSCVPYWZRQPY-CIUDSAMLSA-N 0.000 description 1
- HJAICMSAKODKRF-GUBZILKMSA-N Arg-Cys-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O HJAICMSAKODKRF-GUBZILKMSA-N 0.000 description 1
- PBSOQGZLPFVXPU-YUMQZZPRSA-N Arg-Glu-Gly Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O PBSOQGZLPFVXPU-YUMQZZPRSA-N 0.000 description 1
- OHYQKYUTLIPFOX-ZPFDUUQYSA-N Arg-Glu-Ile Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O OHYQKYUTLIPFOX-ZPFDUUQYSA-N 0.000 description 1
- OGUPCHKBOKJFMA-SRVKXCTJSA-N Arg-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCCN=C(N)N OGUPCHKBOKJFMA-SRVKXCTJSA-N 0.000 description 1
- OKKMBOSPBDASEP-CYDGBPFRSA-N Arg-Ile-Met Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCSC)C(O)=O OKKMBOSPBDASEP-CYDGBPFRSA-N 0.000 description 1
- OTZMRMHZCMZOJZ-SRVKXCTJSA-N Arg-Leu-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O OTZMRMHZCMZOJZ-SRVKXCTJSA-N 0.000 description 1
- YBZMTKUDWXZLIX-UWVGGRQHSA-N Arg-Leu-Gly Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O YBZMTKUDWXZLIX-UWVGGRQHSA-N 0.000 description 1
- NPAVRDPEFVKELR-DCAQKATOSA-N Arg-Lys-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O NPAVRDPEFVKELR-DCAQKATOSA-N 0.000 description 1
- XKDYWGLNSCNRGW-WDSOQIARSA-N Arg-Lys-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCCN=C(N)N)CCCCN)C(O)=O)=CNC2=C1 XKDYWGLNSCNRGW-WDSOQIARSA-N 0.000 description 1
- OVQJAKFLFTZDNC-GUBZILKMSA-N Arg-Pro-Asp Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(O)=O OVQJAKFLFTZDNC-GUBZILKMSA-N 0.000 description 1
- NGYHSXDNNOFHNE-AVGNSLFASA-N Arg-Pro-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(O)=O NGYHSXDNNOFHNE-AVGNSLFASA-N 0.000 description 1
- DNLQVHBBMPZUGJ-BQBZGAKWSA-N Arg-Ser-Gly Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O DNLQVHBBMPZUGJ-BQBZGAKWSA-N 0.000 description 1
- FRBAHXABMQXSJQ-FXQIFTODSA-N Arg-Ser-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O FRBAHXABMQXSJQ-FXQIFTODSA-N 0.000 description 1
- LRPZJPMQGKGHSG-XGEHTFHBSA-N Arg-Ser-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCCN=C(N)N)N)O LRPZJPMQGKGHSG-XGEHTFHBSA-N 0.000 description 1
- OQPAZKMGCWPERI-GUBZILKMSA-N Arg-Ser-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O OQPAZKMGCWPERI-GUBZILKMSA-N 0.000 description 1
- ZUVDFJXRAICIAJ-BPUTZDHNSA-N Arg-Trp-Asp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCCN=C(N)N)N)C(=O)N[C@@H](CC(O)=O)C(O)=O)=CNC2=C1 ZUVDFJXRAICIAJ-BPUTZDHNSA-N 0.000 description 1
- ULBHWNVWSCJLCO-NHCYSSNCSA-N Arg-Val-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCN=C(N)N ULBHWNVWSCJLCO-NHCYSSNCSA-N 0.000 description 1
- XEOXPCNONWHHSW-AVGNSLFASA-N Arg-Val-His Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N XEOXPCNONWHHSW-AVGNSLFASA-N 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- SJUXYGVRSGTPMC-IMJSIDKUSA-N Asn-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H](N)CC(N)=O SJUXYGVRSGTPMC-IMJSIDKUSA-N 0.000 description 1
- DAPLJWATMAXPPZ-CIUDSAMLSA-N Asn-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CC(N)=O DAPLJWATMAXPPZ-CIUDSAMLSA-N 0.000 description 1
- DJIMLSXHXKWADV-CIUDSAMLSA-N Asn-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(N)=O DJIMLSXHXKWADV-CIUDSAMLSA-N 0.000 description 1
- FTSAJSADJCMDHH-CIUDSAMLSA-N Asn-Lys-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)N)N FTSAJSADJCMDHH-CIUDSAMLSA-N 0.000 description 1
- RAUPFUCUDBQYHE-AVGNSLFASA-N Asn-Phe-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O RAUPFUCUDBQYHE-AVGNSLFASA-N 0.000 description 1
- YUOXLJYVSZYPBJ-CIUDSAMLSA-N Asn-Pro-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O YUOXLJYVSZYPBJ-CIUDSAMLSA-N 0.000 description 1
- GZXOUBTUAUAVHD-ACZMJKKPSA-N Asn-Ser-Glu Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O GZXOUBTUAUAVHD-ACZMJKKPSA-N 0.000 description 1
- JBDLMLZNDRLDIX-HJGDQZAQSA-N Asn-Thr-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O JBDLMLZNDRLDIX-HJGDQZAQSA-N 0.000 description 1
- CBWCQCANJSGUOH-ZKWXMUAHSA-N Asn-Val-Ala Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O CBWCQCANJSGUOH-ZKWXMUAHSA-N 0.000 description 1
- ZAESWDKAMDVHLL-RCOVLWMOSA-N Asn-Val-Gly Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)NCC(O)=O ZAESWDKAMDVHLL-RCOVLWMOSA-N 0.000 description 1
- MFMJRYHVLLEMQM-DCAQKATOSA-N Asp-Arg-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(=O)O)N MFMJRYHVLLEMQM-DCAQKATOSA-N 0.000 description 1
- ZELQAFZSJOBEQS-ACZMJKKPSA-N Asp-Asn-Glu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZELQAFZSJOBEQS-ACZMJKKPSA-N 0.000 description 1
- QXHVOUSPVAWEMX-ZLUOBGJFSA-N Asp-Asp-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXHVOUSPVAWEMX-ZLUOBGJFSA-N 0.000 description 1
- ZEDBMCPXPIYJLW-XHNCKOQMSA-N Asp-Glu-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(=O)O)N)C(=O)O ZEDBMCPXPIYJLW-XHNCKOQMSA-N 0.000 description 1
- CRNKLABLTICXDV-GUBZILKMSA-N Asp-His-Glu Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)O)N CRNKLABLTICXDV-GUBZILKMSA-N 0.000 description 1
- QNMKWNONJGKJJC-NHCYSSNCSA-N Asp-Leu-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O QNMKWNONJGKJJC-NHCYSSNCSA-N 0.000 description 1
- YZQCXOFQZKCETR-UWVGGRQHSA-N Asp-Phe Chemical compound OC(=O)C[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 YZQCXOFQZKCETR-UWVGGRQHSA-N 0.000 description 1
- LTCKTLYKRMCFOC-KKUMJFAQSA-N Asp-Phe-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O LTCKTLYKRMCFOC-KKUMJFAQSA-N 0.000 description 1
- BRRPVTUFESPTCP-ACZMJKKPSA-N Asp-Ser-Glu Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O BRRPVTUFESPTCP-ACZMJKKPSA-N 0.000 description 1
- DRCOAZZDQRCGGP-GHCJXIJMSA-N Asp-Ser-Ile Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O DRCOAZZDQRCGGP-GHCJXIJMSA-N 0.000 description 1
- HRVQDZOWMLFAOD-BIIVOSGPSA-N Asp-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)[C@H](CC(=O)O)N)C(=O)O HRVQDZOWMLFAOD-BIIVOSGPSA-N 0.000 description 1
- IQCJOIHDVFJQFV-LKXGYXEUSA-N Asp-Thr-Cys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)O)N)O IQCJOIHDVFJQFV-LKXGYXEUSA-N 0.000 description 1
- ITGFVUYOLWBPQW-KKHAAJSZSA-N Asp-Thr-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O ITGFVUYOLWBPQW-KKHAAJSZSA-N 0.000 description 1
- USENATHVGFXRNO-SRVKXCTJSA-N Asp-Tyr-Asp Chemical compound OC(=O)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(O)=O)C(O)=O)CC1=CC=C(O)C=C1 USENATHVGFXRNO-SRVKXCTJSA-N 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 102100021277 Beta-secretase 2 Human genes 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- 108020004394 Complementary RNA Proteins 0.000 description 1
- LHLSSZYQFUNWRZ-NAKRPEOUSA-N Cys-Arg-Ile Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O LHLSSZYQFUNWRZ-NAKRPEOUSA-N 0.000 description 1
- YRJICXCOIBUCRP-CIUDSAMLSA-N Cys-Asn-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CS)N YRJICXCOIBUCRP-CIUDSAMLSA-N 0.000 description 1
- IIGHQOPGMGKDMT-SRVKXCTJSA-N Cys-Asp-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CS)N IIGHQOPGMGKDMT-SRVKXCTJSA-N 0.000 description 1
- VBPGTULCFGKGTF-ACZMJKKPSA-N Cys-Glu-Asp Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O VBPGTULCFGKGTF-ACZMJKKPSA-N 0.000 description 1
- YXPNKXFOBHRUBL-BJDJZHNGSA-N Cys-Lys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CS)N YXPNKXFOBHRUBL-BJDJZHNGSA-N 0.000 description 1
- RWVBNRYBHAGYSG-GUBZILKMSA-N Cys-Met-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CS)N RWVBNRYBHAGYSG-GUBZILKMSA-N 0.000 description 1
- HMWBPUDETPKSSS-DCAQKATOSA-N Cys-Pro-Lys Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CS)N)C(=O)N[C@@H](CCCCN)C(=O)O HMWBPUDETPKSSS-DCAQKATOSA-N 0.000 description 1
- LEVWYRKDKASIDU-QWWZWVQMSA-N D-cystine Chemical compound OC(=O)[C@H](N)CSSC[C@@H](N)C(O)=O LEVWYRKDKASIDU-QWWZWVQMSA-N 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 229920002307 Dextran Polymers 0.000 description 1
- 108090000204 Dipeptidase 1 Proteins 0.000 description 1
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 102000002464 Galactosidases Human genes 0.000 description 1
- 108010093031 Galactosidases Proteins 0.000 description 1
- 208000031448 Genomic Instability Diseases 0.000 description 1
- UTKUTMJSWKKHEM-WDSKDSINSA-N Glu-Ala-Gly Chemical compound OC(=O)CNC(=O)[C@H](C)NC(=O)[C@@H](N)CCC(O)=O UTKUTMJSWKKHEM-WDSKDSINSA-N 0.000 description 1
- ITYRYNUZHPNCIK-GUBZILKMSA-N Glu-Ala-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O ITYRYNUZHPNCIK-GUBZILKMSA-N 0.000 description 1
- MXOODARRORARSU-ACZMJKKPSA-N Glu-Ala-Ser Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCC(=O)O)N MXOODARRORARSU-ACZMJKKPSA-N 0.000 description 1
- DYFJZDDQPNIPAB-NHCYSSNCSA-N Glu-Arg-Val Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C(C)C)C(O)=O DYFJZDDQPNIPAB-NHCYSSNCSA-N 0.000 description 1
- TUTIHHSZKFBMHM-WHFBIAKZSA-N Glu-Asn Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(O)=O TUTIHHSZKFBMHM-WHFBIAKZSA-N 0.000 description 1
- YYOBUPFZLKQUAX-FXQIFTODSA-N Glu-Asn-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O YYOBUPFZLKQUAX-FXQIFTODSA-N 0.000 description 1
- ZJICFHQSPWFBKP-AVGNSLFASA-N Glu-Asn-Tyr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O ZJICFHQSPWFBKP-AVGNSLFASA-N 0.000 description 1
- KASDBWKLWJKTLJ-GUBZILKMSA-N Glu-Glu-Met Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O KASDBWKLWJKTLJ-GUBZILKMSA-N 0.000 description 1
- PHONAZGUEGIOEM-GLLZPBPUSA-N Glu-Glu-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O PHONAZGUEGIOEM-GLLZPBPUSA-N 0.000 description 1
- QJCKNLPMTPXXEM-AUTRQRHGSA-N Glu-Glu-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O QJCKNLPMTPXXEM-AUTRQRHGSA-N 0.000 description 1
- LRPXYSGPOBVBEH-IUCAKERBSA-N Glu-Gly-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)NCC(=O)N[C@@H](CC(C)C)C(O)=O LRPXYSGPOBVBEH-IUCAKERBSA-N 0.000 description 1
- ZWQVYZXPYSYPJD-RYUDHWBXSA-N Glu-Gly-Phe Chemical compound OC(=O)CC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 ZWQVYZXPYSYPJD-RYUDHWBXSA-N 0.000 description 1
- XMPAXPSENRSOSV-RYUDHWBXSA-N Glu-Gly-Tyr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)NCC(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O XMPAXPSENRSOSV-RYUDHWBXSA-N 0.000 description 1
- QXDXIXFSFHUYAX-MNXVOIDGSA-N Glu-Ile-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCC(O)=O QXDXIXFSFHUYAX-MNXVOIDGSA-N 0.000 description 1
- WTMZXOPHTIVFCP-QEWYBTABSA-N Glu-Ile-Phe Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 WTMZXOPHTIVFCP-QEWYBTABSA-N 0.000 description 1
- ZHNHJYYFCGUZNQ-KBIXCLLPSA-N Glu-Ile-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCC(O)=O ZHNHJYYFCGUZNQ-KBIXCLLPSA-N 0.000 description 1
- HVYWQYLBVXMXSV-GUBZILKMSA-N Glu-Leu-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O HVYWQYLBVXMXSV-GUBZILKMSA-N 0.000 description 1
- VMKCPNBBPGGQBJ-GUBZILKMSA-N Glu-Leu-Asn Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N VMKCPNBBPGGQBJ-GUBZILKMSA-N 0.000 description 1
- MWMJCGBSIORNCD-AVGNSLFASA-N Glu-Leu-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O MWMJCGBSIORNCD-AVGNSLFASA-N 0.000 description 1
- SJJHXJDSNQJMMW-SRVKXCTJSA-N Glu-Lys-Arg Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O SJJHXJDSNQJMMW-SRVKXCTJSA-N 0.000 description 1
- OCJRHJZKGGSPRW-IUCAKERBSA-N Glu-Lys-Gly Chemical compound NCCCC[C@@H](C(=O)NCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O OCJRHJZKGGSPRW-IUCAKERBSA-N 0.000 description 1
- ILWHFUZZCFYSKT-AVGNSLFASA-N Glu-Lys-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O ILWHFUZZCFYSKT-AVGNSLFASA-N 0.000 description 1
- ZIYGTCDTJJCDDP-JYJNAYRXSA-N Glu-Phe-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)O)N ZIYGTCDTJJCDDP-JYJNAYRXSA-N 0.000 description 1
- GMVCSRBOSIUTFC-FXQIFTODSA-N Glu-Ser-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O GMVCSRBOSIUTFC-FXQIFTODSA-N 0.000 description 1
- DTLLNDVORUEOTM-WDCWCFNPSA-N Glu-Thr-Lys Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(O)=O DTLLNDVORUEOTM-WDCWCFNPSA-N 0.000 description 1
- UMZHHILWZBFPGL-LOKLDPHHSA-N Glu-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCC(=O)O)N)O UMZHHILWZBFPGL-LOKLDPHHSA-N 0.000 description 1
- HQTDNEZTGZUWSY-XVKPBYJWSA-N Glu-Val-Gly Chemical compound CC(C)[C@H](NC(=O)[C@@H](N)CCC(O)=O)C(=O)NCC(O)=O HQTDNEZTGZUWSY-XVKPBYJWSA-N 0.000 description 1
- FGGKGJHCVMYGCD-UKJIMTQDSA-N Glu-Val-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O FGGKGJHCVMYGCD-UKJIMTQDSA-N 0.000 description 1
- 108010024636 Glutathione Proteins 0.000 description 1
- KRRMJKMGWWXWDW-STQMWFEESA-N Gly-Arg-Phe Chemical compound NC(=N)NCCC[C@H](NC(=O)CN)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 KRRMJKMGWWXWDW-STQMWFEESA-N 0.000 description 1
- LXXLEUBUOMCAMR-NKWVEPMBSA-N Gly-Asp-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)O)NC(=O)CN)C(=O)O LXXLEUBUOMCAMR-NKWVEPMBSA-N 0.000 description 1
- LCNXZQROPKFGQK-WHFBIAKZSA-N Gly-Asp-Ser Chemical compound NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O LCNXZQROPKFGQK-WHFBIAKZSA-N 0.000 description 1
- PEZZSFLFXXFUQD-XPUUQOCRSA-N Gly-Cys-Val Chemical compound [H]NCC(=O)N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(O)=O PEZZSFLFXXFUQD-XPUUQOCRSA-N 0.000 description 1
- UFPXDFOYHVEIPI-BYPYZUCNSA-N Gly-Gly-Asp Chemical compound NCC(=O)NCC(=O)N[C@H](C(O)=O)CC(O)=O UFPXDFOYHVEIPI-BYPYZUCNSA-N 0.000 description 1
- UUWOBINZFGTFMS-UWVGGRQHSA-N Gly-His-Met Chemical compound [H]NCC(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCSC)C(O)=O UUWOBINZFGTFMS-UWVGGRQHSA-N 0.000 description 1
- NSTUFLGQJCOCDL-UWVGGRQHSA-N Gly-Leu-Arg Chemical compound NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N NSTUFLGQJCOCDL-UWVGGRQHSA-N 0.000 description 1
- IUZGUFAJDBHQQV-YUMQZZPRSA-N Gly-Leu-Asn Chemical compound NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O IUZGUFAJDBHQQV-YUMQZZPRSA-N 0.000 description 1
- LHYJCVCQPWRMKZ-WEDXCCLWSA-N Gly-Leu-Thr Chemical compound [H]NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LHYJCVCQPWRMKZ-WEDXCCLWSA-N 0.000 description 1
- VBOBNHSVQKKTOT-YUMQZZPRSA-N Gly-Lys-Ala Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(O)=O VBOBNHSVQKKTOT-YUMQZZPRSA-N 0.000 description 1
- MHZXESQPPXOING-KBPBESRZSA-N Gly-Lys-Phe Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O MHZXESQPPXOING-KBPBESRZSA-N 0.000 description 1
- NTBOEZICHOSJEE-YUMQZZPRSA-N Gly-Lys-Ser Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O NTBOEZICHOSJEE-YUMQZZPRSA-N 0.000 description 1
- UWQDKRIZSROAKS-FJXKBIBVSA-N Gly-Met-Thr Chemical compound [H]NCC(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O UWQDKRIZSROAKS-FJXKBIBVSA-N 0.000 description 1
- SCJJPCQUJYPHRZ-BQBZGAKWSA-N Gly-Pro-Asn Chemical compound NCC(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O SCJJPCQUJYPHRZ-BQBZGAKWSA-N 0.000 description 1
- YOBGUCWZPXJHTN-BQBZGAKWSA-N Gly-Ser-Arg Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCN=C(N)N YOBGUCWZPXJHTN-BQBZGAKWSA-N 0.000 description 1
- WCORRBXVISTKQL-WHFBIAKZSA-N Gly-Ser-Ser Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O WCORRBXVISTKQL-WHFBIAKZSA-N 0.000 description 1
- LLWQVJNHMYBLLK-CDMKHQONSA-N Gly-Thr-Phe Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O LLWQVJNHMYBLLK-CDMKHQONSA-N 0.000 description 1
- MUGLKCQHTUFLGF-WPRPVWTQSA-N Gly-Val-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)CN MUGLKCQHTUFLGF-WPRPVWTQSA-N 0.000 description 1
- 239000004471 Glycine Substances 0.000 description 1
- PDSUIXMZYNURGI-AVGNSLFASA-N His-Arg-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](N)CC1=CN=CN1 PDSUIXMZYNURGI-AVGNSLFASA-N 0.000 description 1
- MWAJSVTZZOUOBU-IHRRRGAJSA-N His-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](N)CC1=CN=CN1 MWAJSVTZZOUOBU-IHRRRGAJSA-N 0.000 description 1
- ORERHHPZDDEMSC-VGDYDELISA-N His-Ile-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CC1=CN=CN1)N ORERHHPZDDEMSC-VGDYDELISA-N 0.000 description 1
- XDIVYNSPYBLSME-DCAQKATOSA-N His-Met-Asp Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC1=CN=CN1)N XDIVYNSPYBLSME-DCAQKATOSA-N 0.000 description 1
- BZAQOPHNBFOOJS-DCAQKATOSA-N His-Pro-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(O)=O BZAQOPHNBFOOJS-DCAQKATOSA-N 0.000 description 1
- PBVQWNDMFFCPIZ-ULQDDVLXSA-N His-Pro-Phe Chemical compound C([C@H](N)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CN=CN1 PBVQWNDMFFCPIZ-ULQDDVLXSA-N 0.000 description 1
- FCPSGEVYIVXPPO-QTKMDUPCSA-N His-Thr-Arg Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O FCPSGEVYIVXPPO-QTKMDUPCSA-N 0.000 description 1
- UPJODPVSKKWGDQ-KLHWPWHYSA-N His-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CN=CN2)N)O UPJODPVSKKWGDQ-KLHWPWHYSA-N 0.000 description 1
- LQSBBHNVAVNZSX-GHCJXIJMSA-N Ile-Ala-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC(=O)N)C(=O)O)N LQSBBHNVAVNZSX-GHCJXIJMSA-N 0.000 description 1
- FVEWRQXNISSYFO-ZPFDUUQYSA-N Ile-Arg-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N FVEWRQXNISSYFO-ZPFDUUQYSA-N 0.000 description 1
- SPQWWEZBHXHUJN-KBIXCLLPSA-N Ile-Glu-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O SPQWWEZBHXHUJN-KBIXCLLPSA-N 0.000 description 1
- MQFGXJNSUJTXDT-QSFUFRPTSA-N Ile-Gly-Ile Chemical compound N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)O MQFGXJNSUJTXDT-QSFUFRPTSA-N 0.000 description 1
- PWDSHAAAFXISLE-SXTJYALSSA-N Ile-Ile-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O PWDSHAAAFXISLE-SXTJYALSSA-N 0.000 description 1
- GVNNAHIRSDRIII-AJNGGQMLSA-N Ile-Lys-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O)N GVNNAHIRSDRIII-AJNGGQMLSA-N 0.000 description 1
- AKOYRLRUFBZOSP-BJDJZHNGSA-N Ile-Lys-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)O)N AKOYRLRUFBZOSP-BJDJZHNGSA-N 0.000 description 1
- UOPBQSJRBONRON-STECZYCISA-N Ile-Met-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 UOPBQSJRBONRON-STECZYCISA-N 0.000 description 1
- XLXPYSDGMXTTNQ-UHFFFAOYSA-N Ile-Phe-Leu Natural products CCC(C)C(N)C(=O)NC(C(=O)NC(CC(C)C)C(O)=O)CC1=CC=CC=C1 XLXPYSDGMXTTNQ-UHFFFAOYSA-N 0.000 description 1
- VEPIBPGLTLPBDW-URLPEUOOSA-N Ile-Phe-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)O)N VEPIBPGLTLPBDW-URLPEUOOSA-N 0.000 description 1
- KCTIFOCXAIUQQK-QXEWZRGKSA-N Ile-Pro-Gly Chemical compound CC[C@H](C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O KCTIFOCXAIUQQK-QXEWZRGKSA-N 0.000 description 1
- ZNOBVZFCHNHKHA-KBIXCLLPSA-N Ile-Ser-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N ZNOBVZFCHNHKHA-KBIXCLLPSA-N 0.000 description 1
- PXKACEXYLPBMAD-JBDRJPRFSA-N Ile-Ser-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)O)N PXKACEXYLPBMAD-JBDRJPRFSA-N 0.000 description 1
- QGXQHJQPAPMACW-PPCPHDFISA-N Ile-Thr-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)O)N QGXQHJQPAPMACW-PPCPHDFISA-N 0.000 description 1
- 108060003951 Immunoglobulin Proteins 0.000 description 1
- 108010065920 Insulin Lispro Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 description 1
- SITWEMZOJNKJCH-UHFFFAOYSA-N L-alanine-L-arginine Natural products CC(N)C(=O)NC(C(O)=O)CCCNC(N)=N SITWEMZOJNKJCH-UHFFFAOYSA-N 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 description 1
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 description 1
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 description 1
- RCFDOSNHHZGBOY-UHFFFAOYSA-N L-isoleucyl-L-alanine Natural products CCC(C)C(N)C(=O)NC(C)C(O)=O RCFDOSNHHZGBOY-UHFFFAOYSA-N 0.000 description 1
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 description 1
- LZDNBBYBDGBADK-UHFFFAOYSA-N L-valyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)C(N)C(C)C)C(O)=O)=CNC2=C1 LZDNBBYBDGBADK-UHFFFAOYSA-N 0.000 description 1
- CZCSUZMIRKFFFA-CIUDSAMLSA-N Leu-Ala-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(O)=O CZCSUZMIRKFFFA-CIUDSAMLSA-N 0.000 description 1
- ZRLUISBDKUWAIZ-CIUDSAMLSA-N Leu-Ala-Asp Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC(O)=O ZRLUISBDKUWAIZ-CIUDSAMLSA-N 0.000 description 1
- BPANDPNDMJHFEV-CIUDSAMLSA-N Leu-Asp-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(O)=O BPANDPNDMJHFEV-CIUDSAMLSA-N 0.000 description 1
- PVMPDMIKUVNOBD-CIUDSAMLSA-N Leu-Asp-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O PVMPDMIKUVNOBD-CIUDSAMLSA-N 0.000 description 1
- BABSVXFGKFLIGW-UWVGGRQHSA-N Leu-Gly-Arg Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCNC(N)=N BABSVXFGKFLIGW-UWVGGRQHSA-N 0.000 description 1
- HYIFFZAQXPUEAU-QWRGUYRKSA-N Leu-Gly-Leu Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC(C)C HYIFFZAQXPUEAU-QWRGUYRKSA-N 0.000 description 1
- APFJUBGRZGMQFF-QWRGUYRKSA-N Leu-Gly-Lys Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCCN APFJUBGRZGMQFF-QWRGUYRKSA-N 0.000 description 1
- QJXHMYMRGDOHRU-NHCYSSNCSA-N Leu-Ile-Gly Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(O)=O QJXHMYMRGDOHRU-NHCYSSNCSA-N 0.000 description 1
- LIINDKYIGYTDLG-PPCPHDFISA-N Leu-Ile-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LIINDKYIGYTDLG-PPCPHDFISA-N 0.000 description 1
- OTXBNHIUIHNGAO-UWVGGRQHSA-N Leu-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CCCCN OTXBNHIUIHNGAO-UWVGGRQHSA-N 0.000 description 1
- VCHVSKNMTXWIIP-SRVKXCTJSA-N Leu-Lys-Ser Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O VCHVSKNMTXWIIP-SRVKXCTJSA-N 0.000 description 1
- POMXSEDNUXYPGK-IHRRRGAJSA-N Leu-Met-His Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N POMXSEDNUXYPGK-IHRRRGAJSA-N 0.000 description 1
- VULJUQZPSOASBZ-SRVKXCTJSA-N Leu-Pro-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O VULJUQZPSOASBZ-SRVKXCTJSA-N 0.000 description 1
- IDGZVZJLYFTXSL-DCAQKATOSA-N Leu-Ser-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCN=C(N)N IDGZVZJLYFTXSL-DCAQKATOSA-N 0.000 description 1
- KIZIOFNVSOSKJI-CIUDSAMLSA-N Leu-Ser-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CS)C(=O)O)N KIZIOFNVSOSKJI-CIUDSAMLSA-N 0.000 description 1
- AIQWYVFNBNNOLU-RHYQMDGZSA-N Leu-Thr-Val Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O AIQWYVFNBNNOLU-RHYQMDGZSA-N 0.000 description 1
- QESXLSQLQHHTIX-RHYQMDGZSA-N Leu-Val-Thr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O QESXLSQLQHHTIX-RHYQMDGZSA-N 0.000 description 1
- RVOMPSJXSRPFJT-DCAQKATOSA-N Lys-Ala-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O RVOMPSJXSRPFJT-DCAQKATOSA-N 0.000 description 1
- VHXMZJGOKIMETG-CQDKDKBSSA-N Lys-Ala-Tyr Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)NC(=O)[C@H](CCCCN)N VHXMZJGOKIMETG-CQDKDKBSSA-N 0.000 description 1
- YNNPKXBBRZVIRX-IHRRRGAJSA-N Lys-Arg-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(O)=O YNNPKXBBRZVIRX-IHRRRGAJSA-N 0.000 description 1
- SJNZALDHDUYDBU-IHRRRGAJSA-N Lys-Arg-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(O)=O SJNZALDHDUYDBU-IHRRRGAJSA-N 0.000 description 1
- SVJRVFPSHPGWFF-DCAQKATOSA-N Lys-Cys-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O SVJRVFPSHPGWFF-DCAQKATOSA-N 0.000 description 1
- SSYOBDBNBQBSQE-SRVKXCTJSA-N Lys-Cys-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(C)C)C(O)=O SSYOBDBNBQBSQE-SRVKXCTJSA-N 0.000 description 1
- DCRWPTBMWMGADO-AVGNSLFASA-N Lys-Glu-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O DCRWPTBMWMGADO-AVGNSLFASA-N 0.000 description 1
- WGLAORUKDGRINI-WDCWCFNPSA-N Lys-Glu-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O WGLAORUKDGRINI-WDCWCFNPSA-N 0.000 description 1
- HGNRJCINZYHNOU-LURJTMIESA-N Lys-Gly Chemical compound NCCCC[C@H](N)C(=O)NCC(O)=O HGNRJCINZYHNOU-LURJTMIESA-N 0.000 description 1
- ITWQLSZTLBKWJM-YUMQZZPRSA-N Lys-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)[C@@H](N)CCCCN ITWQLSZTLBKWJM-YUMQZZPRSA-N 0.000 description 1
- QZONCCHVHCOBSK-YUMQZZPRSA-N Lys-Gly-Asn Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O QZONCCHVHCOBSK-YUMQZZPRSA-N 0.000 description 1
- ISHNZELVUVPCHY-ZETCQYMHSA-N Lys-Gly-Gly Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)NCC(O)=O ISHNZELVUVPCHY-ZETCQYMHSA-N 0.000 description 1
- NNKLKUUGESXCBS-KBPBESRZSA-N Lys-Gly-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O NNKLKUUGESXCBS-KBPBESRZSA-N 0.000 description 1
- QOJDBRUCOXQSSK-AJNGGQMLSA-N Lys-Ile-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCCN)C(O)=O QOJDBRUCOXQSSK-AJNGGQMLSA-N 0.000 description 1
- AIRZWUMAHCDDHR-KKUMJFAQSA-N Lys-Leu-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O AIRZWUMAHCDDHR-KKUMJFAQSA-N 0.000 description 1
- NVGBPTNZLWRQSY-UWVGGRQHSA-N Lys-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CCCCN NVGBPTNZLWRQSY-UWVGGRQHSA-N 0.000 description 1
- URGPVYGVWLIRGT-DCAQKATOSA-N Lys-Met-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O URGPVYGVWLIRGT-DCAQKATOSA-N 0.000 description 1
- MIROMRNASYKZNL-ULQDDVLXSA-N Lys-Pro-Tyr Chemical compound NCCCC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 MIROMRNASYKZNL-ULQDDVLXSA-N 0.000 description 1
- MGKFCQFVPKOWOL-CIUDSAMLSA-N Lys-Ser-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(=O)O)C(=O)O)N MGKFCQFVPKOWOL-CIUDSAMLSA-N 0.000 description 1
- ZUGVARDEGWMMLK-SRVKXCTJSA-N Lys-Ser-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCCN ZUGVARDEGWMMLK-SRVKXCTJSA-N 0.000 description 1
- JHNOXVASMSXSNB-WEDXCCLWSA-N Lys-Thr-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O JHNOXVASMSXSNB-WEDXCCLWSA-N 0.000 description 1
- DLCAXBGXGOVUCD-PPCPHDFISA-N Lys-Thr-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O DLCAXBGXGOVUCD-PPCPHDFISA-N 0.000 description 1
- RPWTZTBIFGENIA-VOAKCMCISA-N Lys-Thr-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O RPWTZTBIFGENIA-VOAKCMCISA-N 0.000 description 1
- RMKJOQSYLQQRFN-KKUMJFAQSA-N Lys-Tyr-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O RMKJOQSYLQQRFN-KKUMJFAQSA-N 0.000 description 1
- VWPJQIHBBOJWDN-DCAQKATOSA-N Lys-Val-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O VWPJQIHBBOJWDN-DCAQKATOSA-N 0.000 description 1
- VWJFOUBDZIUXGA-AVGNSLFASA-N Lys-Val-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CCCCN)N VWJFOUBDZIUXGA-AVGNSLFASA-N 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 108091027974 Mature messenger RNA Proteins 0.000 description 1
- UYAKZHGIPRCGPF-CIUDSAMLSA-N Met-Glu-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCSC)N UYAKZHGIPRCGPF-CIUDSAMLSA-N 0.000 description 1
- GPAHWYRSHCKICP-GUBZILKMSA-N Met-Glu-Glu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O GPAHWYRSHCKICP-GUBZILKMSA-N 0.000 description 1
- UZWMJZSOXGOVIN-LURJTMIESA-N Met-Gly-Gly Chemical compound CSCC[C@H](N)C(=O)NCC(=O)NCC(O)=O UZWMJZSOXGOVIN-LURJTMIESA-N 0.000 description 1
- XPVCDCMPKCERFT-GUBZILKMSA-N Met-Ser-Arg Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O XPVCDCMPKCERFT-GUBZILKMSA-N 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 108091092878 Microsatellite Proteins 0.000 description 1
- 241000713333 Mouse mammary tumor virus Species 0.000 description 1
- AUEJLPRZGVVDNU-UHFFFAOYSA-N N-L-tyrosyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CC1=CC=C(O)C=C1 AUEJLPRZGVVDNU-UHFFFAOYSA-N 0.000 description 1
- XMBSYZWANAQXEV-UHFFFAOYSA-N N-alpha-L-glutamyl-L-phenylalanine Natural products OC(=O)CCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 XMBSYZWANAQXEV-UHFFFAOYSA-N 0.000 description 1
- KZNQNBZMBZJQJO-UHFFFAOYSA-N N-glycyl-L-proline Natural products NCC(=O)N1CCCC1C(O)=O KZNQNBZMBZJQJO-UHFFFAOYSA-N 0.000 description 1
- AJHCSUXXECOXOY-UHFFFAOYSA-N N-glycyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)CN)C(O)=O)=CNC2=C1 AJHCSUXXECOXOY-UHFFFAOYSA-N 0.000 description 1
- 108010079364 N-glycylalanine Proteins 0.000 description 1
- 108010066427 N-valyltryptophan Proteins 0.000 description 1
- 229930193140 Neomycin Natural products 0.000 description 1
- 102000007999 Nuclear Proteins Human genes 0.000 description 1
- 108010089610 Nuclear Proteins Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 1
- 108010030544 Peptidyl-Lys metalloendopeptidase Proteins 0.000 description 1
- MPGJIHFJCXTVEX-KKUMJFAQSA-N Phe-Arg-Glu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(O)=O MPGJIHFJCXTVEX-KKUMJFAQSA-N 0.000 description 1
- QCHNRQQVLJYDSI-DLOVCJGASA-N Phe-Asn-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 QCHNRQQVLJYDSI-DLOVCJGASA-N 0.000 description 1
- RIYZXJVARWJLKS-KKUMJFAQSA-N Phe-Asp-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 RIYZXJVARWJLKS-KKUMJFAQSA-N 0.000 description 1
- OJUMUUXGSXUZJZ-SRVKXCTJSA-N Phe-Asp-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O OJUMUUXGSXUZJZ-SRVKXCTJSA-N 0.000 description 1
- PSKRILMFHNIUAO-JYJNAYRXSA-N Phe-Glu-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N PSKRILMFHNIUAO-JYJNAYRXSA-N 0.000 description 1
- HBGFEEQFVBWYJQ-KBPBESRZSA-N Phe-Gly-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC1=CC=CC=C1 HBGFEEQFVBWYJQ-KBPBESRZSA-N 0.000 description 1
- YCCUXNNKXDGMAM-KKUMJFAQSA-N Phe-Leu-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O YCCUXNNKXDGMAM-KKUMJFAQSA-N 0.000 description 1
- KNYPNEYICHHLQL-ACRUOGEOSA-N Phe-Leu-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=CC=C1 KNYPNEYICHHLQL-ACRUOGEOSA-N 0.000 description 1
- SCKXGHWQPPURGT-KKUMJFAQSA-N Phe-Lys-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O SCKXGHWQPPURGT-KKUMJFAQSA-N 0.000 description 1
- FENSZYFJQOFSQR-FIRPJDEBSA-N Phe-Phe-Ile Chemical compound C([C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(O)=O)NC(=O)[C@@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FENSZYFJQOFSQR-FIRPJDEBSA-N 0.000 description 1
- AAERWTUHZKLDLC-IHRRRGAJSA-N Phe-Pro-Asp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(O)=O AAERWTUHZKLDLC-IHRRRGAJSA-N 0.000 description 1
- ZVRJWDUPIDMHDN-ULQDDVLXSA-N Phe-Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CC1=CC=CC=C1 ZVRJWDUPIDMHDN-ULQDDVLXSA-N 0.000 description 1
- XDMMOISUAHXXFD-SRVKXCTJSA-N Phe-Ser-Asp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O XDMMOISUAHXXFD-SRVKXCTJSA-N 0.000 description 1
- JMCOUWKXLXDERB-WMZOPIPTSA-N Phe-Trp Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(O)=O)C1=CC=CC=C1 JMCOUWKXLXDERB-WMZOPIPTSA-N 0.000 description 1
- ZYNBEWGJFXTBDU-ACRUOGEOSA-N Phe-Tyr-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CC2=CC=CC=C2)N ZYNBEWGJFXTBDU-ACRUOGEOSA-N 0.000 description 1
- APMXLWHMIVWLLR-BZSNNMDCSA-N Phe-Tyr-Ser Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CO)C(O)=O)C1=CC=CC=C1 APMXLWHMIVWLLR-BZSNNMDCSA-N 0.000 description 1
- KPDRZQUWJKTMBP-DCAQKATOSA-N Pro-Asp-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H]1CCCN1 KPDRZQUWJKTMBP-DCAQKATOSA-N 0.000 description 1
- HXOLCSYHGRNXJJ-IHRRRGAJSA-N Pro-Asp-Phe Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O HXOLCSYHGRNXJJ-IHRRRGAJSA-N 0.000 description 1
- SFECXGVELZFBFJ-VEVYYDQMSA-N Pro-Asp-Thr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SFECXGVELZFBFJ-VEVYYDQMSA-N 0.000 description 1
- NOXSEHJOXCWRHK-DCAQKATOSA-N Pro-Cys-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@@H]1CCCN1 NOXSEHJOXCWRHK-DCAQKATOSA-N 0.000 description 1
- FRKBNXCFJBPJOL-GUBZILKMSA-N Pro-Glu-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O FRKBNXCFJBPJOL-GUBZILKMSA-N 0.000 description 1
- NMELOOXSGDRBRU-YUMQZZPRSA-N Pro-Glu-Gly Chemical compound OC(=O)CNC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H]1CCCN1 NMELOOXSGDRBRU-YUMQZZPRSA-N 0.000 description 1
- WVOXLKUUVCCCSU-ZPFDUUQYSA-N Pro-Glu-Ile Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O WVOXLKUUVCCCSU-ZPFDUUQYSA-N 0.000 description 1
- PTLOFJZJADCNCD-DCAQKATOSA-N Pro-Glu-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H]1CCCN1 PTLOFJZJADCNCD-DCAQKATOSA-N 0.000 description 1
- VYWNORHENYEQDW-YUMQZZPRSA-N Pro-Gly-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H]1CCCN1 VYWNORHENYEQDW-YUMQZZPRSA-N 0.000 description 1
- FEVDNIBDCRKMER-IUCAKERBSA-N Pro-Gly-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)CNC(=O)[C@@H]1CCCN1 FEVDNIBDCRKMER-IUCAKERBSA-N 0.000 description 1
- OFGUOWQVEGTVNU-DCAQKATOSA-N Pro-Lys-Ala Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(O)=O OFGUOWQVEGTVNU-DCAQKATOSA-N 0.000 description 1
- WFIVLLFYUZZWOD-RHYQMDGZSA-N Pro-Lys-Thr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O WFIVLLFYUZZWOD-RHYQMDGZSA-N 0.000 description 1
- SNGZLPOXVRTNMB-LPEHRKFASA-N Pro-Ser-Pro Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CO)C(=O)N2CCC[C@@H]2C(=O)O SNGZLPOXVRTNMB-LPEHRKFASA-N 0.000 description 1
- RMJZWERKFFNNNS-XGEHTFHBSA-N Pro-Thr-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O RMJZWERKFFNNNS-XGEHTFHBSA-N 0.000 description 1
- MCPXQHVVCPTRIM-HJOGWXRNSA-N Pro-Trp-Trp Chemical compound N([C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)O)C(=O)[C@@H]1CCCN1 MCPXQHVVCPTRIM-HJOGWXRNSA-N 0.000 description 1
- IIRBTQHFVNGPMQ-AVGNSLFASA-N Pro-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@@H]1CCCN1 IIRBTQHFVNGPMQ-AVGNSLFASA-N 0.000 description 1
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 1
- 108010076504 Protein Sorting Signals Proteins 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- HRNQLKCLPVKZNE-CIUDSAMLSA-N Ser-Ala-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O HRNQLKCLPVKZNE-CIUDSAMLSA-N 0.000 description 1
- BRKHVZNDAOMAHX-BIIVOSGPSA-N Ser-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CO)N BRKHVZNDAOMAHX-BIIVOSGPSA-N 0.000 description 1
- JPIDMRXXNMIVKY-VZFHVOOUSA-N Ser-Ala-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O JPIDMRXXNMIVKY-VZFHVOOUSA-N 0.000 description 1
- GXXTUIUYTWGPMV-FXQIFTODSA-N Ser-Arg-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(O)=O GXXTUIUYTWGPMV-FXQIFTODSA-N 0.000 description 1
- QFBNNYNWKYKVJO-DCAQKATOSA-N Ser-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CO)CCCN=C(N)N QFBNNYNWKYKVJO-DCAQKATOSA-N 0.000 description 1
- NRCJWSGXMAPYQX-LPEHRKFASA-N Ser-Arg-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CO)N)C(=O)O NRCJWSGXMAPYQX-LPEHRKFASA-N 0.000 description 1
- BNFVPSRLHHPQKS-WHFBIAKZSA-N Ser-Asp-Gly Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O BNFVPSRLHHPQKS-WHFBIAKZSA-N 0.000 description 1
- QPFJSHSJFIYDJZ-GHCJXIJMSA-N Ser-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CO QPFJSHSJFIYDJZ-GHCJXIJMSA-N 0.000 description 1
- SWSRFJZZMNLMLY-ZKWXMUAHSA-N Ser-Asp-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O SWSRFJZZMNLMLY-ZKWXMUAHSA-N 0.000 description 1
- MPPHJZYXDVDGOF-BWBBJGPYSA-N Ser-Cys-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H](CS)NC(=O)[C@@H](N)CO MPPHJZYXDVDGOF-BWBBJGPYSA-N 0.000 description 1
- UOLGINIHBRIECN-FXQIFTODSA-N Ser-Glu-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O UOLGINIHBRIECN-FXQIFTODSA-N 0.000 description 1
- LALNXSXEYFUUDD-GUBZILKMSA-N Ser-Glu-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O LALNXSXEYFUUDD-GUBZILKMSA-N 0.000 description 1
- UFKPDBLKLOBMRH-XHNCKOQMSA-N Ser-Glu-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CO)N)C(=O)O UFKPDBLKLOBMRH-XHNCKOQMSA-N 0.000 description 1
- VQBCMLMPEWPUTB-ACZMJKKPSA-N Ser-Glu-Ser Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O VQBCMLMPEWPUTB-ACZMJKKPSA-N 0.000 description 1
- GZBKRJVCRMZAST-XKBZYTNZSA-N Ser-Glu-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GZBKRJVCRMZAST-XKBZYTNZSA-N 0.000 description 1
- MIJWOJAXARLEHA-WDSKDSINSA-N Ser-Gly-Glu Chemical compound OC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCC(O)=O MIJWOJAXARLEHA-WDSKDSINSA-N 0.000 description 1
- YMTLKLXDFCSCNX-BYPYZUCNSA-N Ser-Gly-Gly Chemical compound OC[C@H](N)C(=O)NCC(=O)NCC(O)=O YMTLKLXDFCSCNX-BYPYZUCNSA-N 0.000 description 1
- JFWDJFULOLKQFY-QWRGUYRKSA-N Ser-Gly-Phe Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O JFWDJFULOLKQFY-QWRGUYRKSA-N 0.000 description 1
- XXXAXOWMBOKTRN-XPUUQOCRSA-N Ser-Gly-Val Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O XXXAXOWMBOKTRN-XPUUQOCRSA-N 0.000 description 1
- QYSFWUIXDFJUDW-DCAQKATOSA-N Ser-Leu-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QYSFWUIXDFJUDW-DCAQKATOSA-N 0.000 description 1
- UBRMZSHOOIVJPW-SRVKXCTJSA-N Ser-Leu-Lys Chemical compound OC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O UBRMZSHOOIVJPW-SRVKXCTJSA-N 0.000 description 1
- QJKPECIAWNNKIT-KKUMJFAQSA-N Ser-Lys-Tyr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O QJKPECIAWNNKIT-KKUMJFAQSA-N 0.000 description 1
- UGGWCAFQPKANMW-FXQIFTODSA-N Ser-Met-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O UGGWCAFQPKANMW-FXQIFTODSA-N 0.000 description 1
- ADJDNJCSPNFFPI-FXQIFTODSA-N Ser-Pro-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CO ADJDNJCSPNFFPI-FXQIFTODSA-N 0.000 description 1
- RHAPJNVNWDBFQI-BQBZGAKWSA-N Ser-Pro-Gly Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O RHAPJNVNWDBFQI-BQBZGAKWSA-N 0.000 description 1
- FKYWFUYPVKLJLP-DCAQKATOSA-N Ser-Pro-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CO FKYWFUYPVKLJLP-DCAQKATOSA-N 0.000 description 1
- DINQYZRMXGWWTG-GUBZILKMSA-N Ser-Pro-Pro Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N1[C@H](C(O)=O)CCC1 DINQYZRMXGWWTG-GUBZILKMSA-N 0.000 description 1
- SRSPTFBENMJHMR-WHFBIAKZSA-N Ser-Ser-Gly Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O SRSPTFBENMJHMR-WHFBIAKZSA-N 0.000 description 1
- KKKVOZNCLALMPV-XKBZYTNZSA-N Ser-Thr-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(O)=O KKKVOZNCLALMPV-XKBZYTNZSA-N 0.000 description 1
- QNBVFKZSSRYNFX-CUJWVEQBSA-N Ser-Thr-His Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CO)N)O QNBVFKZSSRYNFX-CUJWVEQBSA-N 0.000 description 1
- NADLKBTYNKUJEP-KATARQTJSA-N Ser-Thr-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O NADLKBTYNKUJEP-KATARQTJSA-N 0.000 description 1
- VLMIUSLQONKLDV-HEIBUPTGSA-N Ser-Thr-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O VLMIUSLQONKLDV-HEIBUPTGSA-N 0.000 description 1
- 241000700584 Simplexvirus Species 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- STGXWWBXWXZOER-MBLNEYKQSA-N Thr-Ala-His Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 STGXWWBXWXZOER-MBLNEYKQSA-N 0.000 description 1
- KEGBFULVYKYJRD-LFSVMHDDSA-N Thr-Ala-Phe Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 KEGBFULVYKYJRD-LFSVMHDDSA-N 0.000 description 1
- TWLMXDWFVNEFFK-FJXKBIBVSA-N Thr-Arg-Gly Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NCC(O)=O TWLMXDWFVNEFFK-FJXKBIBVSA-N 0.000 description 1
- TZKPNGDGUVREEB-FOHZUACHSA-N Thr-Asn-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O TZKPNGDGUVREEB-FOHZUACHSA-N 0.000 description 1
- GNHRVXYZKWSJTF-HJGDQZAQSA-N Thr-Asp-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N)O GNHRVXYZKWSJTF-HJGDQZAQSA-N 0.000 description 1
- XOTBWOCSLMBGMF-SUSMZKCASA-N Thr-Glu-Thr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O XOTBWOCSLMBGMF-SUSMZKCASA-N 0.000 description 1
- XFTYVCHLARBHBQ-FOHZUACHSA-N Thr-Gly-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O XFTYVCHLARBHBQ-FOHZUACHSA-N 0.000 description 1
- DJDSEDOKJTZBAR-ZDLURKLDSA-N Thr-Gly-Ser Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O DJDSEDOKJTZBAR-ZDLURKLDSA-N 0.000 description 1
- AMXMBCAXAZUCFA-RHYQMDGZSA-N Thr-Leu-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O AMXMBCAXAZUCFA-RHYQMDGZSA-N 0.000 description 1
- VTVVYQOXJCZVEB-WDCWCFNPSA-N Thr-Leu-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O VTVVYQOXJCZVEB-WDCWCFNPSA-N 0.000 description 1
- FLPZMPOZGYPBEN-PPCPHDFISA-N Thr-Leu-Ile Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O FLPZMPOZGYPBEN-PPCPHDFISA-N 0.000 description 1
- PRNGXSILMXSWQQ-OEAJRASXSA-N Thr-Leu-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PRNGXSILMXSWQQ-OEAJRASXSA-N 0.000 description 1
- IJVNLNRVDUTWDD-MEYUZBJRSA-N Thr-Leu-Tyr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O IJVNLNRVDUTWDD-MEYUZBJRSA-N 0.000 description 1
- BDGBHYCAZJPLHX-HJGDQZAQSA-N Thr-Lys-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O BDGBHYCAZJPLHX-HJGDQZAQSA-N 0.000 description 1
- WYLAVUAWOUVUCA-XVSYOHENSA-N Thr-Phe-Asp Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O WYLAVUAWOUVUCA-XVSYOHENSA-N 0.000 description 1
- VEIKMWOMUYMMMK-FCLVOEFKSA-N Thr-Phe-Phe Chemical compound C([C@H](NC(=O)[C@@H](N)[C@H](O)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 VEIKMWOMUYMMMK-FCLVOEFKSA-N 0.000 description 1
- KVEWWQRTAVMOFT-KJEVXHAQSA-N Thr-Tyr-Val Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(O)=O KVEWWQRTAVMOFT-KJEVXHAQSA-N 0.000 description 1
- KPMIQCXJDVKWKO-IFFSRLJSSA-N Thr-Val-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O KPMIQCXJDVKWKO-IFFSRLJSSA-N 0.000 description 1
- BTAJAOWZCWOHBU-HSHDSVGOSA-N Thr-Val-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)[C@@H](C)O)C(C)C)C(O)=O)=CNC2=C1 BTAJAOWZCWOHBU-HSHDSVGOSA-N 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- OFNPHOGOJLNVLL-KCTSRDHCSA-N Trp-Ala-His Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CC2=CNC3=CC=CC=C32)N OFNPHOGOJLNVLL-KCTSRDHCSA-N 0.000 description 1
- HYVLNORXQGKONN-NUTKFTJISA-N Trp-Ala-Lys Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(O)=O)=CNC2=C1 HYVLNORXQGKONN-NUTKFTJISA-N 0.000 description 1
- CMXACOZDEJYZSK-XIRDDKMYSA-N Trp-Leu-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N CMXACOZDEJYZSK-XIRDDKMYSA-N 0.000 description 1
- BABINGWMZBWXIX-BPUTZDHNSA-N Trp-Val-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N BABINGWMZBWXIX-BPUTZDHNSA-N 0.000 description 1
- 102000004142 Trypsin Human genes 0.000 description 1
- 108090000631 Trypsin Proteins 0.000 description 1
- NSOMQRHZMJMZIE-GVARAGBVSA-N Tyr-Ala-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 NSOMQRHZMJMZIE-GVARAGBVSA-N 0.000 description 1
- QOEZFICGUZTRFX-IHRRRGAJSA-N Tyr-Cys-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(O)=O QOEZFICGUZTRFX-IHRRRGAJSA-N 0.000 description 1
- NZFCWALTLNFHHC-JYJNAYRXSA-N Tyr-Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 NZFCWALTLNFHHC-JYJNAYRXSA-N 0.000 description 1
- GFJXBLSZOFWHAW-JYJNAYRXSA-N Tyr-His-Glu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCC(O)=O)C(O)=O GFJXBLSZOFWHAW-JYJNAYRXSA-N 0.000 description 1
- JHORGUYURUBVOM-KKUMJFAQSA-N Tyr-His-Ser Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O JHORGUYURUBVOM-KKUMJFAQSA-N 0.000 description 1
- OHOVFPKXPZODHS-SJWGOKEGSA-N Tyr-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)N OHOVFPKXPZODHS-SJWGOKEGSA-N 0.000 description 1
- PMHLLBKTDHQMCY-ULQDDVLXSA-N Tyr-Lys-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O PMHLLBKTDHQMCY-ULQDDVLXSA-N 0.000 description 1
- GQVZBMROTPEPIF-SRVKXCTJSA-N Tyr-Ser-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O GQVZBMROTPEPIF-SRVKXCTJSA-N 0.000 description 1
- HRHYJNLMIJWGLF-BZSNNMDCSA-N Tyr-Ser-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=C(O)C=C1 HRHYJNLMIJWGLF-BZSNNMDCSA-N 0.000 description 1
- DDRBQONWVBDQOY-GUBZILKMSA-N Val-Ala-Arg Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O DDRBQONWVBDQOY-GUBZILKMSA-N 0.000 description 1
- ZLFHAAGHGQBQQN-GUBZILKMSA-N Val-Ala-Pro Natural products CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O ZLFHAAGHGQBQQN-GUBZILKMSA-N 0.000 description 1
- JYVKKBDANPZIAW-AVGNSLFASA-N Val-Arg-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](C(C)C)N JYVKKBDANPZIAW-AVGNSLFASA-N 0.000 description 1
- ISERLACIZUGCDX-ZKWXMUAHSA-N Val-Asp-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](C(C)C)N ISERLACIZUGCDX-ZKWXMUAHSA-N 0.000 description 1
- VLOYGOZDPGYWFO-LAEOZQHASA-N Val-Asp-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O VLOYGOZDPGYWFO-LAEOZQHASA-N 0.000 description 1
- OVLIFGQSBSNGHY-KKHAAJSZSA-N Val-Asp-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](C(C)C)N)O OVLIFGQSBSNGHY-KKHAAJSZSA-N 0.000 description 1
- LHADRQBREKTRLR-DCAQKATOSA-N Val-Cys-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](C(C)C)N LHADRQBREKTRLR-DCAQKATOSA-N 0.000 description 1
- SRWWRLKBEJZFPW-IHRRRGAJSA-N Val-Cys-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N SRWWRLKBEJZFPW-IHRRRGAJSA-N 0.000 description 1
- GBESYURLQOYWLU-LAEOZQHASA-N Val-Glu-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N GBESYURLQOYWLU-LAEOZQHASA-N 0.000 description 1
- ROLGIBMFNMZANA-GVXVVHGQSA-N Val-Glu-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C(C)C)N ROLGIBMFNMZANA-GVXVVHGQSA-N 0.000 description 1
- XBRMBDFYOFARST-AVGNSLFASA-N Val-His-Val Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](C(C)C)C(=O)O)N XBRMBDFYOFARST-AVGNSLFASA-N 0.000 description 1
- PYPZMFDMCCWNST-NAKRPEOUSA-N Val-Ile-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](C(C)C)N PYPZMFDMCCWNST-NAKRPEOUSA-N 0.000 description 1
- LYERIXUFCYVFFX-GVXVVHGQSA-N Val-Leu-Glu Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](C(C)C)N LYERIXUFCYVFFX-GVXVVHGQSA-N 0.000 description 1
- AEMPCGRFEZTWIF-IHRRRGAJSA-N Val-Leu-Lys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O AEMPCGRFEZTWIF-IHRRRGAJSA-N 0.000 description 1
- RQOMPQGUGBILAG-AVGNSLFASA-N Val-Met-Leu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O RQOMPQGUGBILAG-AVGNSLFASA-N 0.000 description 1
- ZEBRMWPTJNHXAJ-JYJNAYRXSA-N Val-Phe-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(=O)O)N ZEBRMWPTJNHXAJ-JYJNAYRXSA-N 0.000 description 1
- YKNOJPJWNVHORX-UNQGMJICSA-N Val-Phe-Thr Chemical compound CC(C)[C@H](N)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)O)C(O)=O)CC1=CC=CC=C1 YKNOJPJWNVHORX-UNQGMJICSA-N 0.000 description 1
- RYQUMYBMOJYYDK-NHCYSSNCSA-N Val-Pro-Glu Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(=O)O)C(=O)O)N RYQUMYBMOJYYDK-NHCYSSNCSA-N 0.000 description 1
- VIKZGAUAKQZDOF-NRPADANISA-N Val-Ser-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O VIKZGAUAKQZDOF-NRPADANISA-N 0.000 description 1
- LCHZBEUVGAVMKS-RHYQMDGZSA-N Val-Thr-Leu Chemical compound CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)C(C)C)[C@@H](C)O)C(O)=O LCHZBEUVGAVMKS-RHYQMDGZSA-N 0.000 description 1
- VBTFUDNTMCHPII-UHFFFAOYSA-N Val-Trp-Tyr Natural products C=1NC2=CC=CC=C2C=1CC(NC(=O)C(N)C(C)C)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 VBTFUDNTMCHPII-UHFFFAOYSA-N 0.000 description 1
- MIAZWUMFUURQNP-YDHLFZDLSA-N Val-Tyr-Asn Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N MIAZWUMFUURQNP-YDHLFZDLSA-N 0.000 description 1
- RTJPAGFXOWEBAI-SRVKXCTJSA-N Val-Val-Arg Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N RTJPAGFXOWEBAI-SRVKXCTJSA-N 0.000 description 1
- NLNCNKIVJPEFBC-DLOVCJGASA-N Val-Val-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCC(O)=O NLNCNKIVJPEFBC-DLOVCJGASA-N 0.000 description 1
- AOILQMZPNLUXCM-AVGNSLFASA-N Val-Val-Lys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCCCN AOILQMZPNLUXCM-AVGNSLFASA-N 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 102000005421 acetyltransferase Human genes 0.000 description 1
- 108020002494 acetyltransferase Proteins 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 230000002378 acidificating effect Effects 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 239000002671 adjuvant Substances 0.000 description 1
- 238000001261 affinity purification Methods 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 125000003295 alanine group Chemical group N[C@@H](C)C(=O)* 0.000 description 1
- 108010005233 alanylglutamic acid Proteins 0.000 description 1
- 108010047495 alanylglycine Proteins 0.000 description 1
- 125000003172 aldehyde group Chemical group 0.000 description 1
- 230000029936 alkylation Effects 0.000 description 1
- 238000005804 alkylation reaction Methods 0.000 description 1
- 108010050025 alpha-glutamyltryptophan Proteins 0.000 description 1
- 125000003277 amino group Chemical group 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 238000005349 anion exchange Methods 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 108010013835 arginine glutamate Proteins 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 108010077245 asparaginyl-proline Proteins 0.000 description 1
- 108010047857 aspartylglycine Proteins 0.000 description 1
- 108010058966 bacteriophage T7 induced DNA polymerase Proteins 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 102000005936 beta-Galactosidase Human genes 0.000 description 1
- 108010005774 beta-Galactosidase Proteins 0.000 description 1
- 230000008827 biological function Effects 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 238000010804 cDNA synthesis Methods 0.000 description 1
- 239000001506 calcium phosphate Substances 0.000 description 1
- 229910000389 calcium phosphate Inorganic materials 0.000 description 1
- 235000011010 calcium phosphates Nutrition 0.000 description 1
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000004663 cell proliferation Effects 0.000 description 1
- 230000010307 cell transformation Effects 0.000 description 1
- 230000033077 cellular process Effects 0.000 description 1
- 239000003184 complementary RNA Substances 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 229960003067 cystine Drugs 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- 238000004925 denaturation Methods 0.000 description 1
- 230000036425 denaturation Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 239000005546 dideoxynucleotide Substances 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- FSXRLASFHBWESK-UHFFFAOYSA-N dipeptide phenylalanyl-tyrosine Natural products C=1C=C(O)C=CC=1CC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FSXRLASFHBWESK-UHFFFAOYSA-N 0.000 description 1
- 238000013399 early diagnosis Methods 0.000 description 1
- 230000002888 effect on disease Effects 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 238000004520 electroporation Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006862 enzymatic digestion Effects 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 238000009585 enzyme analysis Methods 0.000 description 1
- 210000002919 epithelial cell Anatomy 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 208000020603 familial colorectal cancer Diseases 0.000 description 1
- 231100000221 frame shift mutation induction Toxicity 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 108010063718 gamma-glutamylaspartic acid Proteins 0.000 description 1
- 230000004077 genetic alteration Effects 0.000 description 1
- 231100000118 genetic alteration Toxicity 0.000 description 1
- 238000003167 genetic complementation Methods 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 108010013768 glutamyl-aspartyl-proline Proteins 0.000 description 1
- 229960003180 glutathione Drugs 0.000 description 1
- 108010084264 glycyl-glycyl-cysteine Proteins 0.000 description 1
- 108010067216 glycyl-glycyl-glycine Proteins 0.000 description 1
- XKUKSGPZAADMRA-UHFFFAOYSA-N glycyl-glycyl-glycine Natural products NCC(=O)NCC(=O)NCC(O)=O XKUKSGPZAADMRA-UHFFFAOYSA-N 0.000 description 1
- 108010026364 glycyl-glycyl-leucine Proteins 0.000 description 1
- 108010017446 glycyl-prolyl-arginyl-proline Proteins 0.000 description 1
- 108010015792 glycyllysine Proteins 0.000 description 1
- 108010077515 glycylproline Proteins 0.000 description 1
- 108010084389 glycyltryptophan Proteins 0.000 description 1
- 239000000833 heterodimer Substances 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 230000001900 immune effect Effects 0.000 description 1
- 102000018358 immunoglobulin Human genes 0.000 description 1
- 229940072221 immunoglobulins Drugs 0.000 description 1
- 238000000099 in vitro assay Methods 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 150000002519 isoleucine derivatives Chemical class 0.000 description 1
- 108010027338 isoleucylcysteine Proteins 0.000 description 1
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 1
- 229960000318 kanamycin Drugs 0.000 description 1
- 229930027917 kanamycin Natural products 0.000 description 1
- 229930182823 kanamycin A Natural products 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 210000004698 lymphocyte Anatomy 0.000 description 1
- 108010009298 lysylglutamic acid Proteins 0.000 description 1
- 108010054155 lysyllysine Proteins 0.000 description 1
- 108010038320 lysylphenylalanine Proteins 0.000 description 1
- 108010017391 lysylvaline Proteins 0.000 description 1
- 229910001629 magnesium chloride Inorganic materials 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 230000007257 malfunction Effects 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 108010056582 methionylglutamic acid Proteins 0.000 description 1
- 108010085203 methionylmethionine Proteins 0.000 description 1
- 230000009456 molecular mechanism Effects 0.000 description 1
- 210000000066 myeloid cell Anatomy 0.000 description 1
- 229960004927 neomycin Drugs 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 210000004940 nucleus Anatomy 0.000 description 1
- 239000002751 oligonucleotide probe Substances 0.000 description 1
- 210000003463 organelle Anatomy 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 108010084572 phenylalanyl-valine Proteins 0.000 description 1
- 108010073025 phenylalanylphenylalanine Proteins 0.000 description 1
- 108010051242 phenylalanylserine Proteins 0.000 description 1
- 108010083476 phenylalanyltryptophan Proteins 0.000 description 1
- 239000013600 plasmid vector Substances 0.000 description 1
- 238000011533 pre-incubation Methods 0.000 description 1
- 108010070643 prolylglutamic acid Proteins 0.000 description 1
- 108010090894 prolylleucine Proteins 0.000 description 1
- 239000012429 reaction media Substances 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 230000003248 secreting effect Effects 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 108010026333 seryl-proline Proteins 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 238000002415 sodium dodecyl sulfate polyacrylamide gel electrophoresis Methods 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 210000001082 somatic cell Anatomy 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 210000000952 spleen Anatomy 0.000 description 1
- 210000004988 splenocyte Anatomy 0.000 description 1
- 108010005652 splenotritin Proteins 0.000 description 1
- 238000010186 staining Methods 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 239000007858 starting material Substances 0.000 description 1
- 125000004079 stearyl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H] 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 108010061238 threonyl-glycine Proteins 0.000 description 1
- 108010031491 threonyl-lysyl-glutamic acid Proteins 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- QORWJWZARLRLPR-UHFFFAOYSA-H tricalcium bis(phosphate) Chemical compound [Ca+2].[Ca+2].[Ca+2].[O-]P([O-])([O-])=O.[O-]P([O-])([O-])=O QORWJWZARLRLPR-UHFFFAOYSA-H 0.000 description 1
- YNJBWRMUSHSURL-UHFFFAOYSA-N trichloroacetic acid Chemical compound OC(=O)C(Cl)(Cl)Cl YNJBWRMUSHSURL-UHFFFAOYSA-N 0.000 description 1
- PIEPQKCYPFFYMG-UHFFFAOYSA-N tris acetate Chemical compound CC(O)=O.OCC(N)(CO)CO PIEPQKCYPFFYMG-UHFFFAOYSA-N 0.000 description 1
- 239000012588 trypsin Substances 0.000 description 1
- 108010045269 tryptophyltryptophan Proteins 0.000 description 1
- 108010078580 tyrosylleucine Proteins 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 108700026220 vif Genes Proteins 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/46—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
- C07K14/47—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/82—Translation products from oncogenes
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K16/00—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
- C07K16/18—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from animals or humans
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K2217/00—Genetically modified animals
- A01K2217/05—Animals comprising random inserted nucleic acids (transgenic)
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K38/00—Medicinal preparations containing peptides
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/136—Screening for pharmacological compounds
Definitions
- This invention relates to the area of cancer prevention, diagnosis and therapeutics.
- The invention is concerned with methods for detection of a novel mismatch binding protein, termed GTBP (Guanine Thymine Binding Protein), which mediates the repair of genetic information, with the nucleic acid sequence encoding the protein, and with processes for obtaining the protein and producing it by recombinant genetic engineering techniques.
- GTBP Guanine Thymine Binding Protein
- the present invention also relates to detection of mutated GTBP gene in tumour tissues and to prevention and early diagnosis of human colorectal cancers.
- the link between the biological function of hMSH2 and the phenotype of the CRC tumors was forged when (i) the hMSH2 gene was shown to segregate with a known CRC locus on chromosome 2p (10,11), (ii) the hMSH2-deficient cell line LoVo was shown to be deficient in mismatch repair (12) as well as in mismatch-binding activity (12) and (iii) the genome of this cell line exhibited a marked instability of microsatellite sequences (14).
- GTBP for G/T binding protein
- A mismatch-binding factor, GTBP (for G/T binding protein), originally identified in HeLa cells by the present inventors (15), was shown to bind preferentially to heteroduplexes containing G/T mispairs. Purification of this DNA binding activity by G/T mismatch affinity chromatography yielded a mixture of two polypeptides of apparent molecular weights of 100 and 160 kDa (16), indicating that the mismatch-specific complex was composed of two proteins. The 100 kDa constituent of the complex was demonstrated to be hMSH2 (17). The present discovery implies that hMSH2 acts as a complex with GTBP in the correction of base/base mispairs and one- or two-nucleotide loops.
- GTBP is necessary but not indispensable in the correction of larger insertion/deletion loops.
- a number of tumors have been shown to display mutator phenotypes which are consistent with the functional role of the hMSH2-GTBP complex (20-24) .
- Prior to the current discovery and characterization of GTBP no specific role in the repair of genetic information and no hereditary defect had been associated with this protein or with the gene encoding it.
- GTBP 1360-amino acid sequence corresponding to the polypeptide referred to as GTBP.
- GTBP is used to indicate a compound polypeptide combining in order the amino acid sequences indicated in SEQ ID NO:15 (from amino acid 1 to 68) and SEQ ID NO:1 (from amino acid 1 to 1292).
- the whole coding gene GTBP indicates a compound DNA sequence combining in order the nucleotide sequences indicated in SEQ ID NO:16 (from nucleotide 1 to 204) and SEQ ID NO:12 (from nucleotide 1 to 3980) .
- a further object of the present invention is to provide a genetic construct capable of expressing a 1360-amino acid peptide of molecular mass 153 kDa referred to as GTBP.
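- As a consistency check on the figures quoted above, the following Python sketch adds up the stated segment lengths. It is an illustration only: the constant and function names are assumptions introduced here, and only the lengths and the 153 kDa mass come from the text.

```python
# Lengths and mass as stated in the text; all names are hypothetical.
N_TERMINAL_AA = 68      # SEQ ID NO:15, amino acids 1 to 68
MAIN_AA = 1292          # SEQ ID NO:1, amino acids 1 to 1292
N_TERMINAL_NT = 204     # SEQ ID NO:16, nucleotides 1 to 204
STATED_MASS_KDA = 153   # molecular mass quoted for GTBP


def compound_length(*segment_lengths: int) -> int:
    """Length of a sequence built by joining segments in order."""
    return sum(segment_lengths)


def codons(nucleotides: int) -> int:
    """Number of complete codons in a stretch of coding nucleotides."""
    return nucleotides // 3


def mean_residue_mass_da(mass_kda: float, residues: int) -> float:
    """Average mass per residue, in daltons."""
    return mass_kda * 1000 / residues


total_aa = compound_length(N_TERMINAL_AA, MAIN_AA)
print(total_aa)                                          # 1360
print(codons(N_TERMINAL_NT))                             # 68
print(mean_residue_mass_da(STATED_MASS_KDA, total_aa))   # 112.5
```

The 204 nucleotides of SEQ ID NO:16 correspond to exactly 68 codons, which matches the 68 amino-terminal residues, and the two polypeptide segments sum to the stated 1360 residues.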
- CRC colorectal cancers
- CRC human colorectal cancers
- sequence of a 1360-amino acid polypeptide is provided corresponding to the protein referred to as GTBP.
- a cDNA molecule which comprises the coding sequence of the GTBP gene.
- the sequence of said primers is internal to chromosome 2p16, said pairs of primers allowing the synthesis of GTBP gene or of parts of it.
- a nucleic acid probe is provided which is complementary to human wild-type GTBP gene coding sequence and which can form mismatches when annealed with mutant GTBP alleles, thereby making possible the detection of heteroduplex DNA as revealed by shifts in electrophoretic mobility either with or without prior enzymatic or chemical cleavage.
- a procedure for the detection of wild-type or mutated GTBP protein in humans, comprising: isolating a human sample selected from the tissue or body fluid and detecting the wild-type or the altered GTBP protein itself or in any complex formed by the association of GTBP with other polypeptides.
- a method for the assessment of the activity of (i) the wild-type GTBP protein or (ii) of derived peptides obtained by deletion or insertion of known amino acid sequences in GTBP protein or (iii) of the altered GTBP protein as the result of in vivo mutational events or (iv) of any complex formed by the association of peptides just mentioned in (i), (ii), (iii), and (iv) of the present embodiment with other polypeptides.
- a method for the detection of cancer in humans comprising: isolating a human sample selected from the tissue or body fluid; detecting the alteration in the GTBP gene or in the expressed polypeptide (GTBP protein) itself or in any complex formed by the association of GTBP with other polypeptides, said alteration indicating the predisposition to neoplastic transformation or the presence of cancer.
- GTBP protein expressed polypeptide
- a method of diagnosing or prognosing neoplastic tissue of a human comprising: detecting somatic alterations in wild-type GTBP alleles or their expression products in human colorectal cancers (CRC) , said alteration indicating neoplasia of the tissue.
- a method for the detection of genetic predisposition to CRC comprising: isolating a human sample selected from the group consisting of blood, bioptic samples of tissues, exfoliated cells and any other generic human sample; detecting the alteration in the GTBP gene or in the expressed polypeptide (GTBP protein) itself or in any complex formed by the association of GTBP with other polypeptides, said alteration indicating genetic predisposition to cancer.
- a method for supplying wild-type GTBP gene function to a cell which has lost said gene function by virtue of any mutation in the GTBP gene comprising: introducing wild type GTBP gene into a cell which has lost said gene function such that GTBP gene is then expressed at wild-type level in the cell.
- GTBP protein can also be applied to cells or administered to animals to remediate defects in GTBP gene function.
- a method is provided to supply a portion of wild-type GTBP gene to a cell which has lost the said gene such that the said portion is expressed in the cells and encodes part of the GTBP protein which is required for non-neoplastic growth of the said cell. Another embodiment of the present invention is the generation of transgenic animals carrying a mutated GTBP gene derived from a second species or a mutated GTBP gene generated in vitro by genetic engineering techniques. In another embodiment of the present invention a method of testing therapeutic agents for the ability to suppress a neoplastically transformed phenotype is provided.
- the method comprises: applying a test substance to a cultured epithelial cell which carries a mutation of the GTBP gene and determining whether the substance suppresses the neoplastic phenotype of the cell or suppresses the growth of already developed tumors.
- a method of testing therapeutic agents for the ability to suppress a neoplastically transformed phenotype comprises: applying a test substance to an animal which carries a mutation of the GTBP gene and determining whether the substance prevents neoplastic transformation of defined tissues or suppresses the growth of already developed tumors.
- the present invention provides the art with the information that the GTBP gene, a heretofore unknown gene, encodes the GTBP protein which acts as specific mismatch-binding factor.
- GTBP binds preferentially to heteroduplexes containing G/T mispairs and one- or two- nucleotide loops. Purification of this DNA binding activity made it possible to establish that the mismatch- specific factor is in fact a complex composed of two distinct proteins.
- the smaller constituent of the complex (about 100 kDa) is the hMSH2 protein (17) whereas the larger component (about 160 kDa) is GTBP.
- the present invention provides the technical tools for the detection and for the activity assessment of GTBP alone or as a complex with hMSH2.
- the GTBP gene is a target of mutational events, these alterations being associated with tumorigenesis.
- This discovery allows highly specific assays to be performed to determine the neoplastic status of a particular tissue or the predisposition to cancer of individuals.
- a number of tumors have been shown to display mutator phenotypes with a similarly low degree of microsatellite instability (20-24) consistent with the functional role of the hMSH2-GTBP complex.
- Prior to the current discovery and characterization of GTBP no specific role in the repair of genetic information and no hereditary defect had been associated with this protein.
- Figure 1a shows the commercial phagemid vector pBluescript SK(-) (Stratagene) used for cloning and sequencing the GTBP cDNA.
- the DNA fragment shown in SEQ ID NO: 12 was cloned between the EcoRI and XhoI sites of the vector. Figure 1b shows the commercial pCITE 2b vector.
- the insert described in SEQ ID NO: 12 was inserted between the EcoRI and XhoI sites of the vector.
- Ampicillin: beta-lactamase gene for ampicillin resistance
- ColE1 ori: origin of replication derived from plasmid ColE1
- f1 ori: origin of replication of phage f1
- lacZ: alpha peptide of beta-galactosidase used for genetic complementation
- MCS: multiple cloning site containing the recognition sequences of the listed restriction enzymes
- T3 and T7: promoter sequences from phages T3 and T7.
- Figure 2 shows the commercial plasmid vector pGEX-3x (Pharmacia Biotech) that was used for cloning of the PCR fragments corresponding to amino acid residues 27 to 158 of hMSH2 and 750 to 928 of GTBP (SEQ ID NO:1).
- Primers used for amplification were:
- Figure 3 shows an alignment of the amino acid sequences of the conserved C-terminal regions of the four mismatch binding proteins, i.e. GTBP (H. sapiens), hMSH2
- Figure 4 shows the sequence homology, at the protein level, between pairs of MSH family members.
- Section a shows the matrix obtained from the alignment of GTBP (on the abscissa) with the yeast GTBP homolog (GenBank accession number Z47746, on the ordinate); the two proteins show comparable length and a significant homology is evident throughout their whole sequence.
- Section b shows the matrix obtained from the alignment of yeast MSH2 (on the ordinate) with GTBP (on the abscissa); the proteins show different lengths and most of the homology is confined to the C-terminal regions of the two sequences.
- Section c shows the matrix obtained from the alignment of human MSH2 protein (on the ordinate) with GTBP (on the abscissa); the proteins show different lengths and, also in this case, most of the homology is confined to the C-terminal regions of the two sequences.
- Section d shows the matrix obtained from the alignment of human hMSH2 protein (on the ordinate) with the yeast MSH2 (on the abscissa) ; the two proteins show comparable length and the homology is evident throughout the entire sequence.
- Figure 5 shows the effect of selective anti-hMSH2 and anti-GTBP antisera on the formation of the specific mismatch-binding complex.
- Pre-incubation of HeLa nuclear extracts with either antiserum prior to addition of the G/T heteroduplex DNA probe results in a diminution of the specific band in the gel-shift assay, an effect not observed when the respective pre-immune sera were used.
- This figure proves that both hMSH2 and GTBP are present in the mismatch-binding factor.
- This gel-shift analysis was carried out as described in ref. 15, except that nuclear extracts were used (25).
- the antisera were added to the reaction mixtures 20 min prior to the addition of the radioactively-labelled probe.
- the figure is an autoradiogram of a native 6% polyacrylamide gel run in Tris-acetate/EDTA (TAE) buffer prepared according to Maniatis et al., Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982.
- TAE Tris-acetate/EDTA
- Figure 6 shows that the mismatch-binding activity can be reconstituted using GTBP and hMSH2 obtained using an in vitro translation system.
- the procedure followed to generate in vitro transcripts of the hMSH2, C1 and FLY5 coding sequences was as follows: the DNA region encoding hMSH2 was inserted into pCite-1; C1 and FLY5 ORFs were introduced into pCite-2b (Novagen).
- In vitro transcription and translation reactions were carried out as described in ref. 26, including a mock translation reaction in the absence of added DNA. 35S-labeled translation products were analysed on an SDS-polyacrylamide gel treated with Amplify (Amersham), dried and autoradiographed.
- Section a is an autoradiogram of a denaturing 7.5% SDS-polyacrylamide gel showing that translation of hMSH2, GTBP (C1) and FLY5 mRNAs in a reticulocyte lysate system (Promega) gave rise to expected polypeptides of 113, 142 and 122 kDa, respectively.
- Section b shows the gel-shift analysis which demonstrates the binding of the in vitro-translated proteins to the G/T heteroduplex. The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer.
- Figure 7 shows that mismatch binding activity is absent from cell extracts lacking GTBP or hMSH2.
- the experiment is based on the analysis of two cell lines derived from CRC: LoVo cells contain a homozygous deletion of hMSH2 alleles and do not exhibit G/T binding activity (13), while neither hMSH2 allele is mutated in DLD1 cells, although this cell line also lacks G/T binding activity.
- Section a shows a gel-shift assay showing that extracts of LoVo and DLD1 fail to make mismatch-specific complexes.
- the G/C and G/T probes were obtained as described previously (15) . Experimental conditions were as in Figure 6.
- the figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer.
- Section b shows the Western blot analysis of extracts from HeLa, LoVo and DLD1 cells.
- the protein bands were visualized using an alkaline phosphatase- conjugated anti-rabbit IgG system (Promega) as directed by the manufacturer.
- the anti-GTBP and anti-hMSH2 antisera were used alone with the HeLa extract to demonstrate their selectivity for the 160 and 100 kDa proteins, respectively.
- both antisera were used together. Control HeLa cells revealed the presence of both hMSH2 and GTBP.
- the two CRC-derived tumor cell lines LoVo and DLD1 were completely devoid of full-length hMSH2 and GTBP, respectively.
- the amounts of hMSH2 in DLD1 cells and GTBP in LoVo cells were considerably lower than in HeLa cells. Since hMSH2 and GTBP bind heteroduplex DNA as a complex, the lack of one of the two proteins may cause instability of the second component of the complex.
- Figure 8 part a, shows the experimental approach followed to discover the amino-terminal region of GTBP (from amino acid 1 to 68 of SEQ ID NO:15) .
- 5' RACE method (Rapid Amplification of cDNA Ends), given in detail in the publications Nicolaides, N.C. et al. Genomics, 29: 229-234, 1995 and Nicolaides, N.C. et al. Genomics, 30: 195-206, 1995
- oligonucleotides were used that pair with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133 (primary oligonucleotide A) and from nucleotide 56 to 74 (secondary oligonucleotide B).
- the PCR reaction products were sequenced and it was possible to determine that the amplification product was capable of encoding the polypeptide DAAWSEAGPGPR, corresponding to amino acids 46-58 of the amino-terminal domain of GTBP as indicated in SEQ ID NO:15.
- using oligonucleotides whose sequence was deduced from the initial RACE, complementary to the sequence given in SEQ ID NO:16 from nucleotide 188 to 204 (primary oligonucleotide C) and from nucleotide 169 to 185 (secondary oligonucleotide D), it was possible to amplify the GTBP-coding region 5' by-passing the methionine in position 1 of the amino acid sequence given in SEQ ID NO:15.
- the amplified clone, termed KMN, contained the entire nucleotide sequence given in SEQ ID NO:16.
- RACE analysis of leucocyte cDNA is shown in lanes 2 and 5, that of placenta cDNA in lanes 3 and 6.
- the products of lanes 1 to 3 derive from sequential amplifications with oligonucleotides A and B, those in lanes 4 to 6 derive from sequential amplifications with oligonucleotides C and D.
- Lanes 1 and 4 are the negative controls (absence of template) .
- the molecular weight markers are indicated at the side.
- Part b of Figure 8 shows expression of the transcript encoding the protein GTBP using RT-PCR (PCR preceded by reverse transcription on RNA templates).
- the RT-PCR was carried out using a synthetic oligonucleotide which paired with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133 in the reverse transcription reaction, followed by amplification with an oligonucleotide with a sequence equal to the 5' end of the GTBP transcript, that is 5'GGTGCTTTTAGGAGCCCCG3'.
- RNA used as template was taken from HeLa cells (lane 2), placenta (lane 3), leucocytes (lane 4) and cells from the colon (lane 5); these were incubated with
- Lane 1 is the negative control without RNA.
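- An oligonucleotide that pairs with a stretch of a transcript is the reverse complement of that stretch, whereas one "equal to" the transcript has the same sequence. The Python sketch below illustrates this with the 19-mer quoted above; the function names are assumptions made for illustration, and the GC fraction is shown only as a commonly inspected primer property, not as a parameter taken from the specification.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Primer quoted in the text, written 5' to 3'.
FORWARD_PRIMER = "GGTGCTTTTAGGAGCCCCG"


def reverse_complement(sequence: str) -> str:
    """Return the strand that pairs with `sequence`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def gc_fraction(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


print(len(FORWARD_PRIMER))                 # 19
print(reverse_complement(FORWARD_PRIMER))  # CGGGGCTCCTAAAAGCACC
print(round(gc_fraction(FORWARD_PRIMER), 3))
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when designing primers for opposite strands.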
- the hMSH2/GTBP heterodimer is necessary for the correction of base/base mispairs and one- or two-nucleotide loops.
- Genomic instability in tumor-derived cell-lines lacking GTBP demonstrates itself mainly in the form of small differences (e.g. in runs of A) rather than large changes in CA repeats, characteristic of phenotypes associated with the four known CRC loci hMSH2, hMLH1, hPMS1 and hPMS2. Cancers displaying mutator phenotypes with a low degree of microsatellite instability (20-24) may be associated with a malfunction of GTBP. It is a discovery of the present invention that mutational events associated with tumorigenesis in CRC are due to defects in the GTBP gene.
- Novel compositions comprising genetic sequences encoding the GTBP protein, as well as fragments derived therefrom are provided, together with recombinant proteins produced using the genomic sequences and methods of using these compositions.
- Exemplary amino acid and DNA sequences of the invention are set forth in SEQ ID NO: 1 - SEQ ID NO: 15 and in SEQ ID NO: 12 - SEQ ID NO: 16. Standard abbreviations for nucleotides and amino acids are used in the Figures and elsewhere in this specification.
- GTBP- derived polypeptides are particularly preferred embodiments of the invention, although variations based on the specific sequences of these polypeptides are also part of the present invention.
- the invention (as it pertains to polypeptides per se) includes any polypeptide selected from the group consisting of:
- the genetic engineering aspects of the present invention include any recombinant DNA or RNA molecule comprising a DNA sequence encoding GTBP itself or GTBP-derived protein according to SEQ ID NO: 1 or a corresponding DNA or RNA sequence, or a subsequence thereof comprising at least 10 nucleotides.
- the present invention also focuses on diagnostic methodologies aimed to detect loss of GTBP function in humans and consequent predisposition to neoplasia.
- Definition of terms
- Two nucleic acid fragments are "homologous" if they are capable of hybridizing to one another under hybridization conditions described in Maniatis et al. (1982), Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., pp. 320-325.
- wash conditions --2 x SSC, 0.1% SDS, room temperature twice, 30 minutes each; then 2 x SSC, 0.1% SDS, 50° C once, 30 minutes; then 2 x SSC, room temperature twice, 10 minutes each-- homologous sequences can be identified that contain at most about 25-30% base pair mismatches.
- homologous nucleic acid strand contains 15-25% base pair mismatches, even more preferably 5-15% base pair mismatches. These degrees of homology can be selected by using more stringent wash conditions for identification of clones from gene libraries (or other sources of genetic material), as is well known in the art.
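- The mismatch percentages quoted above can be computed directly once two strands are written in the same orientation and aligned position by position. The following Python sketch is illustrative only: the function names are assumptions, and the tier boundaries are read from the ranges in the text with the added assumptions that they are inclusive and that strands with fewer than 5% mismatches fall in the most preferred tier.

```python
def percent_mismatch(strand_a: str, strand_b: str) -> float:
    """Percentage of aligned positions at which two equal-length strands differ."""
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must be non-empty and of equal length")
    differing = sum(a != b for a, b in zip(strand_a.upper(), strand_b.upper()))
    return 100.0 * differing / len(strand_a)


def homology_tier(mismatch_pct: float) -> str:
    """Map a mismatch percentage onto the ranges given in the text (assumed inclusive)."""
    if mismatch_pct <= 15:
        return "even more preferred"
    if mismatch_pct <= 25:
        return "preferred"
    if mismatch_pct <= 30:
        return "homologous"
    return "not homologous under these wash conditions"


reference = "ATGCATGCATGCATGCATGC"
candidate = "ATGCATGAATGCATGCATTC"   # differs from the reference at two positions
pct = percent_mismatch(reference, candidate)
print(pct)                  # 10.0
print(homology_tier(pct))   # even more preferred
```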
- Two amino acid sequences are homologous if there is a partial or complete identity between their sequences. For example, 85% homology means that 85% of the amino acids are identical when the two sequences are aligned for maximum matching. Gaps (in either of the two sequences being matched) are allowed in maximizing matching; gap lengths of 5 or less are preferred, with 2 or less being more preferred.
- two protein sequences are homologous, as this term is used herein, if they have an alignment score of more than 5 (in standard deviation units) using the program ALIGN with the mutation data matrix and a gap penalty of 6 or greater (Dayhoff, M.O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this volume, pp. 1-10) .
- the two sequences or parts thereof are more preferably homologous if their amino acids are greater than or equal to 50% identical when optimally aligned using the ALIGN program.
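- The percent-identity and gap-length criteria above can be illustrated on two sequences that have already been aligned. The Python sketch below does not reproduce the ALIGN program; it assumes pre-aligned input in which `-` marks a gap, and it assumes that identity is counted over columns where both sequences carry a residue. The function names and the example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no residue-to-residue columns to compare")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def longest_gap(aligned: str) -> int:
    """Length of the longest run of gap characters in one aligned sequence."""
    longest = run = 0
    for symbol in aligned:
        run = run + 1 if symbol == "-" else 0
        longest = max(longest, run)
    return longest


seq_a = "MKT-LLVAG"   # hypothetical aligned peptide with one single-residue gap
seq_b = "MKTALIVAG"
identity = percent_identity(seq_a, seq_b)
print(identity)               # 87.5
print(identity >= 50.0)       # True: meets the 50% identity criterion
print(longest_gap(seq_a))     # 1, within the preferred gap length of 2 or less
```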
- a DNA fragment is "derived from" a GTBP-encoding DNA sequence if it has the same or substantially the same base pair sequence as a region of the coding sequence for GTBP protein molecule.
- substantially the same means, when referring to biological activities, that the activities are of the same type although they may differ in degree.
- for amino acid sequences, "substantially the same" means that the molecules in question have similar biological properties and preferably have at least 85% homology in amino acid sequences. More preferably, the amino acid sequences are at least 90% identical. In other uses, "substantially the same" has its ordinary English language meaning.
- a protein is "derived from" GTBP if it has the same or substantially the same amino acid sequence as a region of the GTBP protein molecule.
- by polypeptide derivatives of GTBP protein is meant polypeptides differing in length from the natural protein and containing five or more amino acids in the same primary order as found in the protein as obtained from a natural source.
- Polypeptide molecules having substantially the same amino acid sequence as the natural protein but possessing minor amino acid substitutions which do not significantly affect the ability of the protein or polypeptide to interact with protein-specific molecules, such as antibodies and nucleic acids, are within the definition as derived from GTBP.
- Derivatives include glycosylated forms, aggregative conjugates with other protein molecules and covalent conjugates with unrelated chemical moieties. Covalent derivatives are prepared by linkage of functionalities to groups which are found in the amino acid chain or at the N-or C-terminal residue by means known in the art.
- GTBP-specific molecules include polypeptides such as antibodies that are specific for the protein or polypeptide containing the naturally occurring GTBP amino acid sequence.
- by specific binding polypeptides are intended polypeptides that bind with GTBP protein and its derivatives and which have a measurably higher binding affinity for the target polypeptide than for other polypeptides tested for binding. Higher affinity by a factor of 10 is preferred, more preferably by a factor of 100. Binding affinity for antibodies refers to a single binding event (i.e., monovalent binding of an antibody molecule). Specific binding by antibodies also means that binding takes place at the normal binding site of the antibody molecule (at the end of the arms in the variable region).
- Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids.
- an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid will not have a major effect on the binding properties of the resulting molecule, especially if the replacement does not involve an amino acid at a binding site involved in the interaction of GTBP or its derivatives with an antibody or with a specific DNA recognition sequence.
- Whether an amino acid change results in a functional peptide can readily be determined by assaying the specific binding properties of the polypeptide derivative.
- Isolation of cDNA encoding GTBP protein
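- The notion of a replacement with a structurally related amino acid can be expressed as a simple lookup. The Python sketch below uses only the groupings named in the text (leucine/isoleucine/valine, aspartate/glutamate, threonine/serine, and the aromatic residues); the table and function names are assumptions, the list is not exhaustive, and a positive result is no substitute for assaying the binding properties of the derivative.

```python
# One-letter codes; groups limited to those named in the text (not exhaustive).
RELATED_GROUPS = (
    frozenset("LIV"),   # leucine, isoleucine, valine
    frozenset("DE"),    # aspartate, glutamate
    frozenset("TS"),    # threonine, serine
    frozenset("FWY"),   # aromatic: phenylalanine, tryptophan, tyrosine
)


def is_related_replacement(original: str, replacement: str) -> bool:
    """True if two different residues belong to the same listed group."""
    first, second = original.upper(), replacement.upper()
    if first == second:
        return False  # identical residues are not a replacement
    return any(first in group and second in group for group in RELATED_GROUPS)


print(is_related_replacement("L", "I"))   # True
print(is_related_replacement("D", "E"))   # True
print(is_related_replacement("L", "D"))   # False
```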
- Isolation of nucleotide sequences encoding GTBP protein involves creation of a cDNA library prepared from full-length mature messenger RNA extracted from cultured cells or tissues.
- Evidence is provided that GTBP is conserved over a broad evolutionary range, thus allowing the isolation of GTBP homologs from the genomes of phylogenetically distant species, i.e. from mammals to yeasts to bacteria.
- Genetic libraries can be made in either eukaryotic or prokaryotic host cells. Widely available cloning vectors such as plasmids, cosmids, phage, YACs and the like can be used to generate genomic libraries suitable for the isolation of nucleotide sequences encoding GTBP protein or portions thereof.
- Useful methods for screening genetic libraries for the presence of GTBP protein nucleotide sequences include the preparation of oligonucleotide probes based on the sequence information provided in SEQ ID NO: 1 and SEQ ID NO: 15 (after decoding of the amino acid sequence) as well as in SEQ ID NO:12 and SEQ ID NO: 16 (directly derived from the encoding DNA) of this patent.
- Oligonucleotide sequences of about 17 base pairs or longer can be prepared by conventional in vitro synthesis techniques.
- The resultant nucleic acid sequences can be subsequently labeled with radionuclides, enzymes, biotin, fluorescers or the like, and used as probes for screening the libraries.
- Additional methods of interest for isolating GTBP protein-encoding nucleic acid sequences include screening of genetic libraries for the expression of GTBP protein or fragments thereof by means of GTBP protein-specific antibodies, either polyclonal or monoclonal. Moreover, a selection method advisable for the screening of GTBP libraries cloned in conventional expression vectors is based on the specific binding of the protein (or of polypeptides contained therein) to heteroduplex DNA molecules containing G/T mismatches.
- a particularly preferred technique for isolating homolog proteins from related species or strains involves the use of degenerate primers based on partial amino acid sequences of GTBP protein and the polymerase chain reaction (PCR) to amplify gene segments between the primers.
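When a degenerate primer or probe is designed from a partial amino acid sequence, the number of distinct oligonucleotides in the mixture is the product of the codon counts of the residues covered. The sketch below uses the standard genetic code and, purely as an illustration, the peptide DAAWSEAGPGPR quoted elsewhere in this document; the function names are hypothetical, and a six-residue window (18 nucleotides) is an assumed choice that satisfies the "about 17 base pairs or longer" guideline.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(peptide: str, residues: int) -> tuple[int, int]:
    """Return (degeneracy, start index) of the least degenerate window."""
    return min(
        (primer_degeneracy(peptide[i:i + residues]), i)
        for i in range(len(peptide) - residues + 1)
    )


if __name__ == "__main__":
    peptide = "DAAWSEAGPGPR"
    deg, start = least_degenerate_window(peptide, 6)
    print(peptide[start:start + 6], deg)
```

Windows rich in Met and Trp (one codon each) give the least degenerate mixtures, while Leu, Ser and Arg (six codons each) are best avoided.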
- A similar approach can also be applied to generate double-stranded cDNA molecules after amplification of mRNA with appropriate primers and polymerases.
- The gene can then be isolated using a specific hybridization probe based on the amplified gene segment, which is then analyzed for appropriate expression of the protein.
- The nucleotide sequence of the isolated genetic material which encodes GTBP protein can be obtained by sequencing the non-vector nucleotide sequences of these recombinant molecules. Nucleotide sequence information can be obtained by employing widely used DNA sequencing protocols, such as Maxam and Gilbert sequencing, dideoxy nucleotide sequencing according to Sanger, and the like. Examples of suitable nucleotide sequencing protocols can be found in Berger and Kimmel, Methods in Enzymology Vol. 152, Guide to Molecular Cloning Techniques (1987), Academic Press.
- Nucleotide sequence information from several recombinant DNA isolates may be combined so as to provide the entire amino acid coding sequence of GTBP, as well as the nucleotide sequences of upstream and downstream nucleotide sequences.
- Nucleotide sequences obtained from sequencing GTBP protein-specific genomic library isolates can be subjected to further analysis in order to identify regions of interest in the GTBP gene. These regions of interest include additional open reading frames, promoter sequences, termination sequences, and the like. Analysis of nucleotide sequence information is preferably performed by computer. Software suitable for analyzing nucleotide sequences for regions of interest is commercially available and includes, for example, DNASIS
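To illustrate the kind of computer analysis meant here, the following minimal sketch scans the three forward reading frames of a DNA string for the longest open reading frame (ATG to the first in-frame stop codon). It is not the algorithm of any commercial package; the function name and the toy sequence are hypothetical, and a real analysis would also scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest forward-strand ORF.

    Coordinates are 0-based and half-open, so seq[start:end] begins with
    ATG and ends with a stop codon. Returns (0, 0) if no ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)      # keep the longest so far
                start = None                   # look for the next ATG
    return best


if __name__ == "__main__":
    toy = "CCATGGCTTGGTAAGG"
    s, e = longest_orf(toy)
    print(s, e, toy[s:e])
```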
- Isolated nucleotide sequences encoding GTBP protein can be used to produce purified GTBP protein or fragments thereof by either recombinant DNA methodology or by in vitro polypeptide synthesis techniques.
- By "purified" and "isolated" is meant, when referring to a polypeptide or nucleotide sequence, that the indicated molecule is present in the substantial absence of other biological macromolecules of the same type.
- The term "purified" as used herein preferably means at least 95% by weight, more preferably at least 99% by weight, and most preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000, can be present).
- A significant advantage of producing GTBP protein by recombinant DNA techniques rather than by isolating it from natural sources is that equivalent quantities of GTBP protein can be produced using less starting material than would be required for isolating GTBP protein from a natural source.
- Producing GTBP protein by recombinant techniques also permits GTBP protein to be isolated in the absence of some molecules normally present in cells that naturally produce GTBP protein. It is also apparent that recombinant DNA techniques can be used to produce GTBP protein polypeptide derivatives that are not found in nature, such as the variations described above.
- GTBP protein and polypeptide derivatives of GTBP protein can be expressed by recombinant techniques when a DNA sequence encoding the relevant molecule is functionally inserted into a vector.
- By "functionally inserted" is meant in proper reading frame and orientation, as is well understood by those skilled in the art.
- the GTBP protein gene will be inserted downstream from a promoter and will be followed by a stop codon, although production as a hybrid protein followed by cleavage may be used, if desired.
- host-cell-specific sequences improving the production yield of GTBP protein and GTBP polypeptide derivatives will be used, and appropriate control sequences will be added to the expression vector, such as enhancer sequences, polyadenylation sequences, and ribosome binding sites.
- Two basic types of expression are contemplated: (i) expression in mammalian cells so as to overcome a deficiency in an individual having insufficient GTBP, and
- BK-SV40 hybrid vectors have been constructed. These vectors can be maintained in cultured human cells as multicopy double-stranded DNA extrachromosomal replicons.
- One exemplary vector consists of the SV40 promoter controlling the expression of neomycin resistance gene (the selectable marker) and the MMTV promoter regulated by the DRE enhancer sequence which controls the expression of the cloned gene.
- The foreign construct will usually include transcriptional and translational initiation and termination signals, with the initiation signals 5' to the gene and termination signals 3' to the gene of interest, although linear DNA can be delivered to a host where recombination occurs for insertion into the host genome.
- Expression under the control of the native promoter can thus be achieved by replacing the defective gene with the linear DNA encoding GTBP by making use of cellular processes, e.g. homologous recombination.
- the transcriptional initiation region which includes the RNA polymerase binding site (promoter) may be native to the host or may be derived from an alternative source, where the region is functional in the host.
- the transcriptional initiation regions may not only include the RNA polymerase binding site, but also regions providing for the regulation of the transcription.
- The 3' termination region may be derived from the same gene as the transcriptional initiation region or from a different gene. For example, where the gene of interest has a transcriptional termination region functional in the host species, that region may be retained within the gene.
- An expression cassette can be constructed which will include the transcriptional initiation region, the GTBP protein gene under the transcriptional control of the transcription initiation region, the initiation codon, the coding sequence of the gene, with or without introns, and the translational stop codons, followed by the transcriptional termination region, which will include the terminator, and may include a polyadenylation signal sequence and other sequences associated with transcriptional termination.
- The direction is 5' to 3', the same as the direction of transcription.
- the cassette will usually be less than about 10 kb, frequently less than about 6 kb, usually being at least about 5 kb.
- When the expression product of the gene is to be located other than in the cytoplasm, the gene will usually be constructed to include particular amino acid sequences which result in translocation of the product to a particular site, which may be an organelle, such as the nucleus, or the product may be secreted into the external environment of the cell.
- Various secretory leaders, membrane integrator sequences, and translocation sequences for directing the peptide expression product to a particular site are described in the literature.
- cassettes may be involved, where the cassettes may be employed in tandem for the expression of independent genes which may express products independently of each other or may be regulated concurrently, where the products may act independently or in conjunction, e.g. GTBP and hMSH2.
- the expression cassette will normally be carried on a vector having at least one replication system.
- a replication system functional in E. coli such as ColE1, pSC101, pACYC184, or the like. In this manner, at each stage after each manipulation, the resulting construct may be cloned, sequenced, and the correctness of the manipulation determined. In addition, or in place of the E.
- a broad host range replication system may be employed, such as the replication systems of the P-1 incompatibility plasmids, e.g. RK2, RP1, RP4 and R68.
- In addition to the replication system, there will frequently be at least one marker present, which may be useful in one or more hosts, or different markers for individual hosts. That is, one marker may be employed for selection in a prokaryotic host, while another marker may be employed for selection in a eukaryotic host.
- neo (neomycin/kanamycin resistance)
- cat (chloramphenicol acetyltransferase)
- bla (β-lactamase)
- β-galactosidase, etc.
- The various fragments comprising the various constructs, expression cassettes, markers, and the like may be introduced consecutively by restriction enzyme cleavage of an appropriate replication system, and insertion of the particular construct or fragment into the available site. After ligation and cloning the vector may be isolated for further manipulation. All of these techniques are amply exemplified in the literature and find particular exemplification in Maniatis et al., Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982.
- Transformation of mammalian cells and gene therapy
- Once the vector is completed, the vector may be introduced into mammalian cells. Techniques for transforming mammalian cells include transfection, microinjection, liposome-based delivery, etc.
- Transfection of cultured human cells is the most commonly used method and can be achieved by standard protocols which involve either incubation of cells with DNA that has been co-precipitated with calcium phosphate or DEAE-dextran, or electroporation with purified transfecting DNA.
- A genetically modified virus, a liposome or microinjection can also be used to deliver foreign DNA to human recipient cells.
- Vectors derived from retroviruses are often used to stably maintain and persistently express the remedial gene in the corrected cell.
- In vivo gene therapy entails the direct delivery of a remedial gene into the cells of a particular tissue of a prospective patient.
- the wild-type protein can be cloned into various benign viruses and delivered to target defective cells in an in vivo infection.
- Vectors derived from adenovirus, herpes simplex virus and certain retroviruses are excellent candidates for in vivo gene therapy. Methods and prospects of gene therapy have been reviewed by Mulligan (1993), Science 260:926-932.
- Diagnostic methods using antigens
- Methods for detecting analytes such as binding proteins of the invention are based on immunoassays.
- Immunoassays can be conducted to determine the presence or absence of GTBP in host cells. Such techniques are well known and need not be described here in detail. Examples include both heterogeneous and homogeneous immunoassay techniques. Both techniques are based on the formation of an immunological complex between the binding protein and a corresponding specific antibody.
- Heterogeneous assays for GTBP typically use a specific monoclonal or polyclonal antibody bound to a solid surface, e.g. in sandwich assays.
- Homogeneous assays, which are carried out in solution without the presence of a solid phase, can be used, for example, by determining the difference in enzyme activity brought about by binding of free antibody to an enzyme-antigen conjugate.
- A number of suitable assays are disclosed in U.S. Patent Nos. 3,817,837, 4,006,360, and 3,996,345.
- The solid surface reagent in the above assay is prepared by known techniques for attaching protein material to solid support material, such as polymeric beads, dip sticks, or filter material. These attachment methods generally include non-specific adsorption of the protein to the support or covalent attachment of the protein, typically through a free amine group, to a chemically reactive group on the solid support, such as an activated carboxyl, hydroxyl, or aldehyde group.
- In a second diagnostic configuration, known as a homogeneous assay, antibody binding to an analyte produces some change in the reaction medium which can be directly detected in the medium.
- Known general types of homogeneous assays proposed heretofore include (a) spin-labeled reporters, where antibody binding to the antigen is detected by a change in reporter mobility (broadening of the spin splitting peaks), (b) fluorescent reporters, where binding is detected by a change in fluorescence emission, (c) enzyme reporters, where antibody binding affects enzyme/substrate interactions, and (d) liposome-bound reporters, where binding leads to liposome lysis and release of encapsulated reporter.
- the assay method involves reacting the tissue extract from a test individual with an antibody and examining the sample for the presence of bound antigen.
- the examination may involve attaching a labelled anti-GTBP antibody to the primary complex formed between GTBP and the immobilized antibody and measuring the amount of reporter bound to the solid support, as in the first method, or may involve observing the effect of antibody binding on a homogeneous assay reagent, as in the second method.
- GTBP in its native or chemically modified form, or polypeptide derivatives thereof, or specific complexes with other polypeptides may be used for producing antibodies, either monoclonal or polyclonal, specific to GTBP or polypeptide derivatives thereof, or to GTBP complexes with other polypeptides.
- Antibodies specific for GTBP protein are produced by immunizing an appropriate vertebrate host, e.g., rabbit or mouse, with purified GTBP protein or polypeptide derivatives of GTBP protein, by themselves or in conjunction with a conventional adjuvant. Usually, two or more immunizations will be involved, and blood or spleen will be harvested a few days after the last injection.
- the immunoglobulins can be precipitated, isolated and purified by a variety of standard techniques, including affinity purification using GTBP protein attached to a solid surface, such as a gel or beads in an affinity column.
- The splenocytes will normally be fused with an immortalized lymphocyte, e.g., a myeloma cell line, under selective conditions for hybridoma formation.
- the hybridomas can then be cloned under limiting dilution conditions and their supernatants screened for antibodies having the desired specificity.
- Techniques for producing antibodies are well known in the literature and are exemplified by the publication Antibodies: A Laboratory Manual (1988) eds. Harlow and Lane, Cold Spring Harbor Laboratories Press.
- the genetic material of the invention can itself be used in numerous assays as probes for genetic material present in an individual.
- the analyte can be a nucleotide sequence which hybridizes with a probe comprising a sequence of at least about 16 consecutive nucleotides, usually 30 to 200 nucleotides, up to substantially the full sequence of the gene as shown in SEQ ID NO: 12.
- the analyte can be RNA or DNA.
- The sample is typically a DNA or an RNA molecule extracted from the patient's tissue.
- the probe may contain a detectable label.
- Particularly preferred for use as a probe are sequences up to about 3200 consecutive nucleotides (for example from nucleotide 1 to nucleotide 3000 of SEQ ID NO: 12 and from nucleotide 1 to nucleotide 204 of SEQ ID NO:16) since these sequences appear to be particularly specific for GTBP.
- One method for amplification of target nucleic acids, for later analysis by hybridization assays, is known as the polymerase chain reaction or PCR technique.
- the PCR technique can be applied to detect sequences of the invention in suspected samples using oligonucleotide primers spaced apart from each other and based on the genetic sequence set forth in SEQ ID NO: 12 and SEQ ID NO:16.
- the primers are complementary to opposite strands of a double-stranded DNA molecule and are typically separated by from about 50 to 450 nt or more (usually not more than 2000 nt) .
- This method entails preparing the specific oligonucleotide primers and then repeated cycles of target DNA denaturation, primer binding, and extension with a DNA polymerase to obtain DNA fragments of the expected length based on the primer spacing. Extension products generated from one primer serve as additional target sequences for the other primer.
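The relationship between primer spacing and expected fragment length can be sketched as a simple in-silico check. The forward primer matches the top strand and the reverse primer is complementary to it, so its reverse complement is searched for downstream. The template and primers below are toy sequences invented for illustration, not taken from SEQ ID NO: 12 or 16, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Expected product length, or None if either primer does not anneal.

    Only exact matches are considered; real primers tolerate mismatches.
    """
    start = template.find(forward)
    if start < 0:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site < 0:
        return None
    return site + len(reverse) - start


if __name__ == "__main__":
    template = "TT" + "GACCTG" + "AAACCCGGGTTT" + "ACGTCA" + "GG"
    print(amplicon_length(template, "GACCTG", "TGACGT"))
```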
- The degree of amplification of a target sequence is controlled by the number of cycles that are performed and is theoretically calculated by the simple formula 2^n, where n is the number of cycles. Given that the average efficiency per cycle ranges from about 65% to 85%, 25 cycles produce from 0.3 to 4.8 million copies of the target sequence.
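The copy numbers quoted above follow if each cycle multiplies the template by (1 + E), where E is the per-cycle efficiency; with E = 1 this reduces to 2^n. The sketch below makes that assumption explicit. The function name and the single-starting-copy default are illustrative choices, not part of the patent.

```python
def pcr_copies(cycles: int, efficiency: float = 1.0, start_copies: float = 1.0) -> float:
    """Copies after `cycles` rounds, assuming constant per-cycle efficiency.

    efficiency = 1.0 is the theoretical doubling case (2**cycles).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for eff in (0.65, 0.85, 1.0):
        print(f"E = {eff:.2f}: {pcr_copies(25, eff) / 1e6:.1f} million copies")
```

At 25 cycles the 65% and 85% cases come out at roughly 0.3 and 4.8 million copies per starting molecule, against about 33.6 million for perfect doubling.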
- The PCR method is described in a number of publications, including Saiki et al., Science (1985) 230:1350-1354; Saiki et al., Nature (1986) 324:163-166; and Scharf et al., Science (1986) 233:1076-1078. Also see U.S. Patent Nos. 4,683,194; 4,683,195; and 4,68
- the invention includes a specific diagnostic method for determination of GTBP, based on selective amplification of GTBP-protein-encoding DNA fragments.
- This method employs a pair of single-stranded primers derived from non-homologous regions of opposite strands of GTBP DNA duplex fragment having a sequence as described by combining the sequences SEQ ID NO: 16 and SEQ ID NO:12. These "primer fragments" represent one aspect of the invention.
- the method follows the process for amplifying selected nucleic acid sequences as disclosed in U.S. Patent No. 4,683,202, as discussed above.
- Mutations in the GTBP gene can be detected by restriction enzyme analysis of the amplification product or by direct sequencing. Also, alterations in GTBP sequence can be revealed by Southern hybridization with probes encompassing part or the entire sequences of SEQ ID NO: 12 and SEQ ID NO:16.
- Single-stranded DNA probes complementary to the wild-type GTBP-coding sequence can also be hybridized to RNA extracted from tissues or cells of human patients and used to detect mutations in the mature GTBP gene transcript by enzymatic digestion of heteroduplexes at the level of mismatches. These and other techniques aimed at identifying variations in gene sequences from wild-type GTBP are extensively reported in the literature and well established in the scientific community.
- Binding assays involving GTBP
- The presence of an altered GTBP protein can be detected by the use of binding assays based on the specific recognition of G/T mismatches by GTBP. A synthetic double-stranded 34-mer oligonucleotide containing a G/T mispair is prepared and labelled substantially as reported (15).
- Cell extracts can be prepared as reported in current literature (e.g. ref 25 and refs. therein) .
- the cell extract (1-10 micrograms of nuclear proteins) can be incubated with the heteroduplex oligonucleotide at room temperature for 30 minutes to allow GTBP binding to the G/T mismatch.
- the mixture can then be loaded on a gel prepared as reported in Figure 6. Alterations in GTBP mass or affinity for the substrate can be evidenced by an altered electrophoretic mobility.
- NCIMB 40742 accession numbers NCIMB 40742
- NCIMB 40471 accession numbers NCIMB 40740 respectively.
- A strain of E. coli TOP10, transformed using the plasmid pBluescript SK/GTBP coding for the whole amino acid sequence of GTBP from amino acid 1 to amino acid 1360 (SEQ ID NO: 15 and SEQ ID NO: 1), was deposited on 28/5/96 with the above depositary institution under accession number NCIMB 40805.
- the present example shows that the GTBP protein sequence, as reported by combining the sequences SEQ ID NO:15 and SEQ ID NO: 1, contains seven subsequences which correspond to polypeptides obtained after proteolytic cleavage of the 160 kDa DNA-binding protein termed GTBP. These subsequences are indicated as SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8.
- the 160 kDa protein was purified as reported in ref. 16.
- The fractions containing the G/T-specific mismatch binding activity were loaded onto a preparative SDS-PAGE gel and the 100 and 160 kDa bands were excised following staining with Coomassie Blue.
- The proteins were digested in the gel matrix either with trypsin (100 kDa protein, Promega Corporation, UK), or with Achromobacter lyticus endopeptidase Lys-C (160 kDa protein, Wako Chemicals GmbH, Germany).
- The proteolytic peptides were recovered by sequential extractions and separated by tandem HPLC on a Hewlett-Packard 1090M with diode array detection.
- Example 1B
- The present example shows that the protein GTBP contains an amino-terminal domain corresponding to SEQ ID NO: 15. This region can be determined by analysis of the coding nucleotide sequence. The amino-terminal domain is an integral part of GTBP itself, and therefore the GTBP sequence must be understood to be the sequential combination of SEQ ID NO: 15 and SEQ ID NO: 1, with a total length of 1360 amino acids. Part a of Figure 8 shows the experimental approach followed to discover the amino-terminal region of GTBP (from amino acid 1 to 68 of SEQ ID NO: 15). Using the 5' RACE method (Rapid Amplification of cDNA Ends, given in detail in Nicolaides, N.C. et al., Genomics 29: 229-234, 1995 and Nicolaides, N.C. et al., Genomics 30: 195-206, 1995), a pair of oligonucleotides was used that pairs with the sequence given in SEQ ID NO: 12 from nucleotide 114 to 133 (primary oligonucleotide A) and from nucleotide 56 to 74 (secondary oligonucleotide B).
- The PCR reaction products were sequenced and it was possible to determine that the amplification product was capable of encoding the polypeptide DAAWSEAGPGPR, corresponding to amino acids 46-58 of the amino-terminal domain of GTBP as indicated in SEQ ID NO: 15.
- Using oligonucleotides whose sequence was deduced from the initial RACE, complementary to the sequence given in SEQ ID NO: 16 from nucleotide 188 to 204 (primary oligonucleotide C) and from nucleotide 169 to 185 (secondary oligonucleotide D), it was possible to amplify the GTBP-coding region 5' by-passing the methionine in position 1 of the amino acid sequence given in SEQ ID NO: 15.
- The amplified clone, termed KMN, contained the entire nucleotide sequence given in SEQ ID NO: 16.
- RACE analysis of leucocyte cDNA is shown in lanes 2 and 5, that of placenta cDNA in lanes 3 and 6.
- the products of lanes 1 to 3 derive from sequenced amplifications with oligonucleotides A and B, those in lanes 4 to 6 derive from sequenced amplifications with oligonucleotides C and D.
- Lanes 1 and 4 are the negative controls (absence of template) .
- the molecular weight markers are indicated at the side.
- Part b of Figure 8 shows expression of the transcript encoding the protein GTBP using RT-PCR (PCR preceded by reverse transcription on RNA templates).
- The RT-PCR was carried out using a synthetic oligonucleotide which paired with the sequence given in SEQ ID NO: 12 from nucleotide 114 to 133 in the reverse transcription reaction, followed by amplification with an oligonucleotide with a sequence equal to the 5' end of the GTBP transcript, that is 5'GGTGCTTTTAGGAGCCCCG3'.
- RNA used as a template was taken from HeLa cells (lane 2), placenta (lane 3), leucocytes (lane 4) and cells from the colon (lane 5); these were incubated with (+ symbol on the lane) or without (- symbol on the lane) reverse transcriptase and then subjected to PCR. Where no cDNA was produced because the reverse transcription reaction did not occur, no amplification was seen. Lane 1 is the negative control without RNA.
- Example 2
- the present example shows that DNA regions internal to GTBP gene can be obtained by amplification with primers designed on the basis of the sequence of peptides deriving from proteolytic cleavage of the 160 kDa G/T- binding factor (SEQ ID NO: 2 to 8) .
- the inventors identified a unique DNA sequence encoding the central 8 amino acids of the peptide of SEQ ID NO: 6.
- Example 3 shows that DNA regions internal to GTBP gene can be identified by hybridization with a DNA probe designed on the basis of the sequence of peptides obtained upon proteolytic cleavage of the 160 kDa G/T-binding factor.
- The DNA sequence reported as SEQ ID NO: 11 was labeled with 32P by a standard kinase
- the labelled probe of SEQ ID NO: 11 was then used in the screening of a commercial oligo dT-primed cDNA library in phage lambda (HeLa S3 Uni-ZAP XR, Stratagene) . Two positive clones were selected for further analysis.
- Clone Cl contained an insert of 3980 bp corresponding to SEQ ID NO: 12, with a continuous open reading frame from amino acid residue 1 to 1292 encoding a polypeptide of 1292 amino acids (SEQ ID NO: 1) and a calculated molecular mass of 142 kDa; clone FLY5 contained sequences coding from amino acid residue 116 to 1292 (see comments to SEQ ID NO: 1 and 12).
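As a quick plausibility check on the calculated mass, a common rule of thumb takes about 110 Da as the average mass of an amino acid residue. This is an approximation introduced here for illustration; the patent's figure was presumably computed from the actual amino acid composition. The function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def approx_mass_kda(n_residues: int) -> float:
    """Rough polypeptide mass in kDa from its length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # 1292 residues (SEQ ID NO: 1) and 1360 residues (with the N-terminal domain).
    for length in (1292, 1360):
        print(length, round(approx_mass_kda(length), 1))
```

The estimate for 1292 residues is close to the 142 kDa stated above, which suggests the stated mass is consistent with the length of the open reading frame.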
- GTBP protein can be used as an antigen to produce highly specific antibodies which recognize GTBP but not hMSH2.
- PCR fragments corresponding to amino acid residues 27 to 158 of hMSH2 (SEQ ID NO: 13) and 750 to 928 of GTBP (SEQ ID NO: 14) were subcloned into the E. coli expression vector pGEX-3X (Pharmacia/LKB) and the recombinant proteins, in the form of fusion polypeptides with glutathione S-transferase, were induced and isolated as recommended by the manufacturer, except that the final concentration of IPTG was 0.25 mM and induced cultures were harvested after 6 hours at 20°C.
- The fusion proteins were used for immunization of New Zealand White S.P.F. female rabbits (Charles River Co.) using standard protocols. Two polyclonal antisera specifically immunoreactive to GTBP and hMSH2, respectively, were obtained and assayed as reported in Antibodies: A Laboratory Manual (1988) eds. Harlow and Lane, Cold Spring Harbor Laboratories Press (see Figures 2 and 5 for more details).
- Example 5
- GTBP belongs to a class of DNA-repair proteins conserved over a wide evolutionary range.
- Figure 3 shows the alignment of the amino acid sequences of the conserved C-terminal regions of the mismatch binding proteins GTBP (H. sapiens), hMSH2
- GenBank entry HSU04045 (hMSH2) .
- the alignment was carried out using the GCG Pileup option.
- the figure was generated using Prettyplot.
- the alignment reveals a high degree of conservation at the C-terminal domain among all the proteins. GTBP can thus be considered a new member of the
- Figure 5 shows the effect of anti-hMSH2 and anti-GTBP antisera on the formation of the specific mismatch-binding complex.
- This gel-shift analysis was carried out as described (15) , except that nuclear extracts were used (25) .
- The antisera were added to the reaction mixtures 20 min prior to the radioactively-labelled probe.
- the figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer.
- the following example shows that GTBP and hMSH2 can be expressed separately in a cell-free translation system.
- the inventors employed a hMSH2 cDNA clone (17) and the GTBP clones Cl and FLY5 as set forth in SEQ ID NO: 12.
- The Cl and FLY5 ORFs were introduced into pCite-2b.
- the hMSH2 ORF was inserted into pCite-1 (Novagen) .
- In vitro transcription and translation reactions were carried out as described previously (26), including a mock translation reaction in the absence of added DNA. 35S-labeled translation products were analyzed on an SDS-polyacrylamide gel treated with Amplify (Amersham), dried and autoradiographed.
- the experiment was carried out using conditions recommended by the manufacturer.
- The figure is an autoradiogram of a denaturing 7.5% SDS-polyacrylamide gel.
- Fig. 6, section a: translation of hMSH2, GTBP (Cl) and FLY5 mRNAs in a reticulocyte lysate system (Promega) gave rise to polypeptides of 113, 142 and 122 kDa, respectively.
- Mutations in mismatch repair genes such as hMSH2, hMLH1, hPMS1 and hPMS2 (1) are known to cause the hypermutability found in many forms of hereditary colorectal cancers (CRC).
- The CRC-derived cell line HCT15 contains a full-length hMSH2 protein but shows a hypermutable phenotype (19).
- The RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase according to standard protocols (e.g., see Powell et al., New Engl. J. Med. 329, 1982, 1993).
- the cDNA was then amplified with PCR using primers specific for the GTBP-coding sequence.
- The oligonucleotides used were: primers 5'-PGAGGGTTACCCCTGG-3' and 5'-ACACTGTAAGTCTGTGTACC-3' for codons 32 to 458, primers 5'-PAGTGAAGGCCTGAACAGCC-3' and 5'-AAGTCCAGTCTTTCGAGCC-3' for codons 219 to 858, and primers 5'-PGAGAGGGTTGATACTTGCC-3' and 5'-AGAAGTCAACTCAAAGCTTCC-3' for codons 692 to 1292 (where P denotes a T7 promoter sequence and a ribosome-binding site for translation initiation (26), and codon numbers are those reported in SEQ ID NO: 1 and SEQ ID NO: 12).
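The three codon ranges can be converted to nucleotide coordinates and coding lengths with a short calculation. The sketch assumes that codon 1 begins at nucleotide 1 of SEQ ID NO: 12, an assumption that agrees with the statement in this example that nucleotide 664 lies in codon 222. The function name is hypothetical, and the lengths exclude the T7 promoter and ribosome-binding-site extension denoted P.

```python
def codon_range_to_nt(first_codon: int, last_codon: int) -> tuple[int, int, int]:
    """Return (first nt, last nt, length in nt) for an inclusive codon range.

    Assumes 1-based numbering with codon 1 starting at nucleotide 1.
    """
    first_nt = 3 * (first_codon - 1) + 1
    last_nt = 3 * last_codon
    return first_nt, last_nt, last_nt - first_nt + 1


if __name__ == "__main__":
    for codons in [(32, 458), (219, 858), (692, 1292)]:
        print(codons, codon_range_to_nt(*codons))
```

The three fragments overlap (codons 219 to 458 and 692 to 858), so together they cover codons 32 to 1292 of the coding sequence.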
- The amplification products were first transcribed and translated in vitro using a commercial kit (Promega).
- Analysis of translation products in a PAGE-SDS gel revealed truncated GTBP polypeptides from two PCR products, corresponding to regions located at codons 32-458 (5'-end of the gene) and 692-1292 (3'-end of the gene).
- Sequencing of these PCR products using a commercial system (SequiTherm Polymerase, Epicentre Technologies) revealed that truncations were due to frameshift mutations.
- nucleotide 664 (a C) at codon 222 changed a leucine to a termination codon and a substitution of nucleotides 3307-3312 (GATAGA) with T (see SEQ ID NO: 12) created a new termination codon several bp downstream.
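Whether a sequence change shifts the reading frame depends only on the net change in length: any change that is not a multiple of three is a frameshift. The sketch below applies this to the GATAGA to T substitution and also maps a nucleotide position to its codon, assuming that codon 1 begins at nucleotide 1 of SEQ ID NO: 12. Function names are hypothetical.

```python
def is_frameshift(removed: str, inserted: str) -> bool:
    """True if replacing `removed` by `inserted` changes the reading frame."""
    return (len(inserted) - len(removed)) % 3 != 0


def codon_of(nucleotide: int) -> int:
    """1-based codon number containing a 1-based nucleotide position.

    Assumes the coding sequence starts at nucleotide 1.
    """
    return (nucleotide - 1) // 3 + 1


if __name__ == "__main__":
    print(is_frameshift("GATAGA", "T"))   # six nucleotides replaced by one
    print(codon_of(664), codon_of(3307))  # codons affected by the two changes
```

Under the stated numbering assumption, nucleotide 664 falls in codon 222, as the text says, and nucleotides 3307-3312 begin at codon 1103.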
- MT1 is an alkylation-resistant lymphoblastoid cell line with a biochemical deficiency similar to that of HCT15 (see Goldmacher et al., J. Biol. Chem., 261, 12462, 1986; Kat et al., Proc. Natl. Acad. Sci. USA, 90, 6424, 1993).
- the RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase and the cDNA was then amplified with PCR using primers specific for the GTBP- coding sequence as reported above.
- The amplification products were cloned in the vector Bluescript SK and individual clones were sequenced using conventional protocols (Sequenase, USB).
- the two mutations were not found to be associated in a single clone, deriving thus from separate alleles.
- A tumor cell line termed 543X (from the patient's designation) was derived from CRC and displays a hypermutable phenotype and microsatellite instability but no mutation in the mismatch repair genes so far described, including hMSH2, hMLH1, hPMS1 and hPMS2 (Liu et al., Nature Genetics 9, 48, 1995).
- RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase and the cDNA was then amplified with PCR using primers specific for the GTBP-coding sequence as reported above.
- MOLECULE TYPE protein
- HYPOTHETICAL No
- ANTISENSE No
- ORGANISM Homo sapiens
- IMMEDIATE SOURCE cDNA clone pCITE2b- Cl
- FEATURE SEQ ID NO: 1 shows the 1292 amino acid sequence (in three letter code) of GTBP encoded by clone Cl (see SEQ ID NO: 12).
- the seven oligopeptides which were identified upon proteolytic cleavage of GTBP (see SEQ ID NO: 2 to 8) are underlined.
- the first amino acid residue of the peptide encoded by the FLY5 cDNA is Asn at position 116.
- NAME Cl
- Lys Ile Leu Lys Gln Val Ile Ser Leu Gln Thr Lys Asn Pro Glu Gly
- Phe Leu Tyr Lys Ile Lys Gly Ala Cys Pro Lys Ser Tyr Gly Phe 1220 1225 1230 Asn Ala Ala Arg Leu Ala Asn Leu Pro Glu Glu Val Ile Gln Lys Gly
- ORGANISM Homo sapiens
- FEATURE SEQ ID NO: 2 to 8 show seven oligopeptides derived from proteolytic cleavage of GTBP extracted from HeLa cells and purified as described in ref. 16 .
- the peptide corresponding to SEQ ID NO: 6 (18 amino acids) was selected to design two degenerate primers corresponding to the N- and C-terminal sequences of the peptide, as given in detail in SEQ ID NO: 9 and 10.
- NAME FR44
- C IDENTIFICATION METHOD: Experimentally
- MOLECULE TYPE protein
- HYPOTHETICAL No
- ANTISENSE No
- ORIGINAL SOURCE (A) ORGANISM: Homo sapiens
- MOLECULE TYPE protein
- HYPOTHETICAL No
- ANTISENSE No
- ORIGINAL SOURCE
- ORGANISM Homo sapiens
- ix FEATURE: see SEQ ID NO: 2
- A NAME: FR69
- SEQ ID NO: 9 shows the sequence of the degenerate single-stranded DNA primer deduced from the N-terminal of oligopeptide shown in SEQ ID NO: 6. Together with SEQ ID NO: 10, the two primers were used to amplify poly-A RNA extracted from HeLa cells.
- the expected 67 base pairs (bp) fragment was cloned in pBluescript SK- (Stratagene) and sequenced with a commercial T7-polymerase based kit (Pharmacia).
- the 54 bp sequence of the resulting fragment, obtained after subtraction of the engineered cloning sites, is shown as SEQ ID NO: 11.
- A)NAME oligo 5' sense
- C IDENTIFICATION METHOD: Polyacrylamide gel
- IMMEDIATE SOURCE oligonucleotide synthesizer
- FEATURE SEQ ID NO:10 shows the sequence of the degenerate single-stranded DNA primer deduced from the C-terminal of oligopeptide shown in SEQ
- SEQ ID NO: 11 shows the double-stranded DNA sequence encoding the oligopeptide reported in SEQ ID NO: 6, as deduced by sequencing of cloned amplification product . This fragment was derived from PCR amplification of HeLa cDNA, using the degenerate primers described in SEQ ID NO:
- MOLECULE TYPE synthetic DNA
- HYPOTHETICAL No
- ANTISENSE No
- IMMEDIATE SOURCE cDNA clone Cl
- FEATURE SEQ ID NO: 12 shows the 3980 bp cDNA sequence of clone Cl .
- the cDNA insert of clone FLY5 spanned from nucleotide 346 to 3980 of the Cl sequence as reported in SEQ ID NO: 12.
- TATCCCCCAG TACAAGTTTT ATTTGAAAAA GGAAATCTCT CAAAGGAAAC TAAAACAATT 1620
- CAGGTCATCT CTCTGCAGAC AAAAAATCCT GAAGGTCGTT TTCCTGATTT GACTGTAGAA 2520
- AAAACTATTG AAAAGAAGTT GGCTAATCTC ATAAATGCTG AAGAACGGAG GGATGTATCA 2880
- SEQ ID NO: 13 shows the double-stranded DNA sequence used to express an internal domain of hMSH2 (corresponding to amino acid residues 27 to 158) in the expression vector pGEX-3x (see also legend to Figure 2) .
- FEATURE SEQ ID NO: 14 shows the double-stranded DNA sequence used to express an internal domain of GTBP (corresponding to amino acid residues 750 to 928) in the expression vector pGEX-3x (see also legend to Figure 2).
- NAME GST/GTBP
- C IDENTIFICATION METHOD: Polyacrylamide gel
- SEQ ID NO: 15 shows the amino-terminal sequence of 68 amino acids of GTBP encoded by the clone TASNR2A1 (see SEQ ID NO:16 for the corresponding nucleotide encoding sequence) .
- the amino acid sequence SEQ ID NO:15 (corresponding to residues 1-68) must be placed in front of the amino acid in position 1 of the sequence given in SEQ ID NO:1 (corresponding to 1292 residues) to obtain the complete GTBP sequence of 1360 amino acids.
- the nucleotidic sequence SEQ ID NO:16 (corresponding to 204 residues) must be positioned in front of the nucleotide in position 1 of the sequence given in SEQ ID NO:12 (corresponding to 3980 residues) in order to obtain the complete GTBP-encoding sequence of 4080 nucleotides.
- NAME KMN
Abstract
The present invention relates to a new protein, GTBP (Guanine Thymine Binding Protein), that binds to G/T DNA mismatches to mediate repair of genetic information, to methods for detection of this protein, to the nucleotidic sequence encoding this protein and to processes for obtaining the above-mentioned protein using genetic engineering techniques. Furthermore, the present invention has as its object the detection in tumor tissues of the mutant GTBP gene in order to prevent and provide rapid diagnosis of human colorectal tumor forms. The figure shows the absence of GTBP-specific activity in cells obtained from human colorectal tumors.
Description
POLYPEPTIDE FOR REPAIRING GENETIC INFORMATION, NUCLEOTIDIC SEQUENCE WHICH CODES FOR IT AND PROCESS FOR THE PREPARATION THEREOF (GUANINE THYMINE BINDING PROTEIN - GTBP).
DESCRIPTION
Technical field
This invention relates to the area of cancer prevention, diagnosis and therapeutics. In particular, the invention is concerned with methods for detection of a novel mismatch binding protein, termed GTBP (Guanine Thymine Binding Protein), which mediates the repair of genetic information, with the nucleic acid sequence encoding the protein and with processes for obtaining the protein and producing it by recombinant genetic engineering techniques. In addition, the present invention also relates to detection of the mutated GTBP gene in tumour tissues and to prevention and early diagnosis of human colorectal cancers.
Background of the discovery
In human cells, mismatch recognition and binding has until now been believed to be mediated by the hMSH2 protein. The observation that cells from human colorectal cancers (CRC) exhibit a mutator phenotype with a marked instability of microsatellite sequences suggested that these tumor cells may be deficient in DNA mismatch repair. This hypothesis was substantiated when extracts from CRC tumor-derived cell lines were shown to be unable to repair mismatches in an in vitro assay (see refs. 1 and 2 for reviews).
The serendipitous discovery of an open reading frame (ORF) encoding a polypeptide homolog of the E. coli mismatch-binding protein MutS (3, 4) paved the way for the identification of an ever-growing family of MSH genes, ranging from bacteria to man (see e.g. 5). Three members of this family, the S. cerevisiae MutS homologs MSH1 and MSH2, as well as the human homolog hMSH2, could be shown to bind to mismatched DNA in vitro (6-9). The link between the biological function of hMSH2 and the phenotype of the CRC tumors was forged when (i) the hMSH2 gene was shown to segregate with a known CRC locus on chromosome 2p (10, 11), (ii) the hMSH2-deficient cell line LoVo was shown to be deficient in mismatch repair (12) as well as in mismatch-binding activity (13) and (iii) the genome of this cell line exhibited a marked instability of microsatellite sequences (14). A mismatch-binding factor, GTBP (for G/T binding protein), originally identified in HeLa cells by the present inventors (15), was shown to bind preferentially to heteroduplexes containing G/T mispairs. Purification of this DNA binding activity by G/T mismatch affinity chromatography yielded a mixture of two polypeptides of apparent molecular weights of 100 and 160 kDa (16), indicating that the mismatch-specific complex was composed of two proteins. The 100 kDa constituent of the complex was demonstrated to be hMSH2 (17). The present discovery implies that hMSH2 acts as a complex with GTBP in the correction of base/base mispairs and one- or two-nucleotide loops. Moreover, GTBP is necessary but not indispensable in the correction of larger insertion/deletion loops. A number of tumors have been shown to display mutator phenotypes which are consistent with the functional role of the hMSH2-GTBP complex (20-24). Prior to the current discovery and characterization of GTBP, no specific role in the repair of genetic information and no hereditary defect had been associated with this protein or with the gene encoding it.
Relevant Literature
1. P. Modrich, Science 266, 1959 (1994).
2. J. Jiricny, Trends Genet. 10, 164 (1994).
3. J. P. Linton et al., Mol. Cell. Biol. 9, 303 (1989).
4. H. Fujii and T. Shimada, J. Biol. Chem. 264, 10057 (1989).
5. L. New, K. Liu, G.E. Crouse, Mol. Gen. Genet. 239, 97 (1993).
6. N.-W. Chi and R. D. Kolodner, J. Biol. Chem. 269, 29984 (1994).
7. T. Prolla et al., Science 265, 1091 (1994).
8. B. Alani, N.-W. Chi, R. D. Kolodner, Genes & Development 9, 234 (1995).
9. R. Fishel, A. Ewei, M.K. Lescoe, Cancer Research 54, 5539 (1994).
10. R. Fishel et al., Cell 75, 1027 (1993).
11. F.S. Leach et al., Cell 75, 1215 (1993).
12. A. Umar et al., J. Biol. Chem. 269, 14367 (1994).
13. G. Aquilina et al., Proc. Natl. Acad. Sci. U.S.A. 91, 8905 (1994).
14. D. Shibata et al., Nature Genet. 6, 273 (1994).
15. J. Jiricny et al., Proc. Natl. Acad. Sci. U.S.A. 85, 8860 (1988).
16. M. Hughes and J. Jiricny, J. Biol. Chem. 267, 23876 (1992).
17. F. Palombo et al., Nature 367, 417 (1994).
18. J. Lingner, J. Kellermann and W. Keller, Nature 354, 496 (1991).
19. Da Costa et al., Nature Genetics 9, 10 (1995).
20. L.A. Aaltonen et al., Science 260, 812 (1993).
21. S.N. Thibodeau et al., Science 260, 816 (1993).
22. Ionov et al., Nature 363, 558 (1993).
23. R. Wooster et al., Nature Genetics 6, 152 (1994).
24. A. Merlo et al., Cancer Res. 54, 2098 (1994).
25. J.D. Dignam et al., Methods Enzymol. 101, 382 (1983).
26. P. Gallinari et al., J. Virol. 68, 3809 (1994).
Summary of the invention
It is an object of the present invention to provide a 1360-amino acid sequence corresponding to the polypeptide referred to as GTBP. It should be stated that GTBP is used to indicate a compound polypeptide combining in order the amino acid sequences indicated in SEQ ID NO:15 (from amino acid 1 to 68) and SEQ ID NO:1 (from amino acid 1 to 1292).
It is another object of the present invention to provide a genetic construct containing a double-stranded cDNA sequence of 4080 base pairs encoding a 1360-amino acid peptide referred to as GTBP. It should be stated that the whole coding gene GTBP indicates a compound DNA sequence combining in order the nucleotide sequences indicated in SEQ ID NO:16 (from nucleotide 1 to 204) and SEQ ID NO:12 (from nucleotide 1 to 3980) .
A further object of the present invention is to provide a genetic construct capable of expressing a 1360-amino acid peptide of molecular mass 153 kDa referred to as GTBP.
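The lengths quoted in the three preceding paragraphs can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the specification: the constant names are hypothetical, and the figure of about 110 Da per residue is a common rule of thumb assumed here only to give a rough mass estimate for comparison with the stated 153 kDa.

```python
# Lengths as stated in the specification.
N_TERMINAL_RESIDUES = 68       # SEQ ID NO:15
CLONE_RESIDUES = 1292          # SEQ ID NO:1
N_TERMINAL_NUCLEOTIDES = 204   # SEQ ID NO:16

# Assumption (not from the specification): average residue mass of ~110 Da.
MEAN_RESIDUE_MASS_DA = 110


def coding_length(residues: int) -> int:
    """Number of nucleotides needed to encode `residues` amino acids (no stop codon)."""
    return residues * 3


total_residues = N_TERMINAL_RESIDUES + CLONE_RESIDUES
total_coding_nt = coding_length(total_residues)
approx_mass_kda = total_residues * MEAN_RESIDUE_MASS_DA / 1000

print(total_residues)    # combined polypeptide length
print(total_coding_nt)   # combined coding length
print(approx_mass_kda)   # rough mass estimate in kDa
```

Under these assumptions the combined polypeptide has 1360 residues and a 4080-nucleotide coding sequence, and 68 residues correspond exactly to the 204 nucleotides of SEQ ID NO:16. The 1292 residues of SEQ ID NO:1 account for 3876 nucleotides, so the remainder of the 3980 bp clone is presumably non-coding.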
It is another object of the present invention to provide a method for preparation and isolation of native GTBP protein in pure form from cultured cells and tissues.
It is another object of the present invention to provide a method for the assessment of the in vitro activity of GTBP.
It is yet another object of the present invention to provide a method for the detection of mutated GTBP by the use of specific antibodies directed against GTBP.
It is yet another object of the present invention to provide a method for the detection of mutated GTBP alleles by the use of the polymerase chain reaction and sequencing of the amplification products.
It is another object of the present invention to provide DNA probes for the detection of mutated GTBP genes in human cells. It is an object of the present invention to provide a method for diagnosing and prognosing of human colorectal cancers (CRC) .
It is yet another object of the present invention to provide a method for detecting the genetic predisposition to human colorectal cancers (CRC) .
It is yet another object of the present invention to provide a method for large-scale population screening to genetic predisposition to human colorectal cancers (CRC) .
It is still another object of the present invention to provide a method for supplying wild-type GTBP alleles to a cell which has lost the GTBP gene function.
It is another object of the present invention to provide a method for generating transgenic animals carrying mutant GTBP alleles. It is another object of the present invention to provide a method for testing the activity of therapeutic agents aimed to suppress human colorectal cancers (CRC) .
These and other objects of the invention are provided by one or more of the embodiments which are described below.
In one embodiment the sequence of a 1360-amino acid polypeptide is provided corresponding to the protein referred to as GTBP.
In another embodiment a cDNA molecule is provided which comprises the coding sequence of the GTBP gene.
In another embodiment a procedure for the preparation of the pure GTBP protein is provided.
It is another embodiment of the present invention to provide pairs of single-stranded primers to determine the nucleotide sequence of the GTBP gene or of DNA regions internal to the GTBP gene by polymerase chain reaction. The sequence of said primers is internal to chromosome 2p16, said pairs of primers allowing the synthesis of the GTBP gene or of parts of it.
In yet another embodiment of the present invention a nucleic acid probe is provided which is complementary to the human wild-type GTBP gene coding sequence and which can form mismatches when annealed with mutant GTBP alleles, thereby making possible the detection of heteroduplex DNA as revealed by shifts in electrophoretic mobility either with or without prior enzymatic or chemical cleavage.
In another embodiment a procedure is indicated for the detection of wild-type or mutated GTBP protein in humans, comprising: isolating a human sample selected from the tissue or body fluid and detecting the wild-type or the altered GTBP protein itself or in any complex formed by the association of GTBP with other polypeptides.
In another embodiment of the present invention a method is provided for the assessment of the activity of (i) the wild-type GTBP protein or (ii) of derived peptides obtained by deletion or insertion of known amino acid sequences in GTBP protein or (iii) of the altered GTBP protein as the result of in vivo mutational events or (iv) of any complex formed by the association of peptides just mentioned in (i), (ii), (iii), and (iv) of the present embodiment with other polypeptides.
In yet another embodiment a method is provided for the detection of cancer in humans, comprising: isolating a human sample selected from the tissue or body fluid; detecting the alteration in the GTBP gene or in the expressed polypeptide (GTBP protein) itself or in any complex formed by the association of GTBP with other polypeptides, said alteration indicating the predisposition to neoplastic transformation or the presence of cancer.
In still another embodiment of the present invention a method of diagnosing or prognosing neoplastic tissue of a human is provided comprising: detecting somatic alterations in wild-type GTBP alleles or their expression products in human colorectal cancers (CRC) , said alteration indicating neoplasia of the tissue.
In yet another embodiment a method is provided for the detection of genetic predisposition to CRC, comprising: isolating a human sample selected from the group consisting of blood, biopsy samples of tissues, exfoliative cells and any other generic human sample; detecting the alteration in the GTBP gene or in the expressed polypeptide (GTBP protein) itself or in any complex formed by the association of GTBP with other polypeptides, said alteration indicating genetic predisposition to cancer.
In another embodiment of the present invention a method is provided for supplying wild-type GTBP gene function to a cell which has lost said gene function by virtue of any mutation in the GTBP gene, comprising: introducing the wild-type GTBP gene into a cell which has lost said gene function such that the GTBP gene is then expressed at wild-type level in the cell. GTBP protein can also be applied to cells or administered to animals to remediate defects in GTBP gene function.
In an additional embodiment a method is provided to supply a portion of the wild-type GTBP gene to a cell which has lost the said gene such that the said portion is expressed in the cell and encodes the part of the GTBP protein which is required for non-neoplastic growth of the said cell.
Another embodiment of the present invention is the generation of transgenic animals carrying a mutated GTBP gene derived from a second species or a mutated GTBP gene generated in vitro by genetic engineering techniques.
In another embodiment of the present invention a method of testing therapeutic agents for the ability to suppress a neoplastically transformed phenotype is provided. The method comprises: applying a test substance to a cultured epithelial cell which carries a mutation of the GTBP gene and determining whether the substance suppresses the neoplastic phenotype of the cell or suppresses the growth of already developed tumors.
In another embodiment of the present invention a method of testing therapeutic agents for the ability to suppress a neoplastically transformed phenotype is provided. The method comprises: applying a test substance to an animal which carries a mutation of the GTBP gene and determining whether the substance prevents neoplastic transformation of defined tissues or suppresses the growth of already developed tumors.
The present invention provides the art with the information that the GTBP gene, a heretofore unknown gene, encodes the GTBP protein which acts as a specific mismatch-binding factor. GTBP binds preferentially to heteroduplexes containing G/T mispairs and one- or two-nucleotide loops. Purification of this DNA binding activity made it possible to establish that the mismatch-specific factor is in fact a complex composed of two distinct proteins. The smaller constituent of the complex (about 100 kDa) is the hMSH2 protein (17) whereas the larger component (about 160 kDa) is GTBP. The present invention provides the technical tools for the detection and for the activity assessment of GTBP alone or as a complex with hMSH2. The GTBP gene is a target of mutational events, these alterations being associated with tumorigenesis. This discovery allows highly specific assays to be performed to determine the neoplastic status of a particular tissue or the predisposition to cancer of individuals. A number of tumors have been shown to display mutator phenotypes with a similarly low degree of microsatellite instability (20-24) consistent with the functional role of the hMSH2-GTBP complex. Prior to the current discovery and characterization of GTBP, no specific role in the repair of genetic information and no hereditary defect had been associated with this protein.
Brief description of the drawings
Figure 1a shows the commercial phagemid vector pBluescript SK- (Stratagene) used for cloning and sequencing the GTBP cDNA. The DNA fragment shown in SEQ ID NO: 12 was cloned between the EcoRI and XhoI sites of the vector. Figure 1b shows the commercial pCITE 2b vector. The insert described in SEQ ID NO: 12 was inserted between the EcoRI and XhoI sites of the vector.
Ampicillin = beta-lactamase gene for ampicillin resistance
ColE1 ori = origin of replication derived from plasmid ColE1
f1 = origin of replication of phage f1
lacZ = alpha peptide of beta-galactosidase used for genetic complementation
MCS = multiple cloning site containing the recognition sequences of the listed restriction enzymes
T3 and T7 = promoter sequences from phages T3 and T7.
Figure 2 shows the commercial plasmid vector pGEX-3x (Pharmacia Biotech) that was used for cloning of the PCR fragments corresponding to amino acid residues 27 to 158 of hMSH2 and 750 to 928 of GTBP (SEQ ID NO:1). Primers used for amplification were:
5'CGGGATCCCCCCGGAGAAGCCGACCACCAC3' and
5'CGGAATTCCTGGCCATCAACTGCGGACAT3' for codons 27 to 158 of hMSH2, and 5'CGGAATTCTCAACTCGTATTCTTCTG3' and 5'CGGGATCCCCCTTGAGAGGCTACTCAGT3' for codons 750 to 928 of GTBP. The PCR products, identified respectively as SEQ ID NO: 13 and 14 were cloned between the BamHI and EcoRI sites. The expression products, in the form of polypeptides fused with glutathione-S-transferase, were purified by affinity chromatography on a commercial glutathione matrix (Pharmacia Biotech) as directed by the manufacturer. The pure fusion proteins were used for the immunization of New Zealand White SPF female rabbits by standard protocols as reported in the publication Antibodies : A Laboratory Manual (1988) eds. Harlow and Lane, Cold Spring Harbor Laboratories Press.
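The four primers listed for Figure 2 each carry a restriction site near the 5' end, which is what allows the PCR products to be cloned between the BamHI and EcoRI sites of the vector. The sketch below simply locates those sites in the primer sequences as printed. The recognition sequences GGATCC (BamHI) and GAATTC (EcoRI) are standard; the dictionary keys are hypothetical labels chosen for illustration.

```python
# Standard recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences as printed in the Figure 2 legend; the key names are illustrative only.
PRIMERS = {
    "hMSH2_primer_1": "CGGGATCCCCCCGGAGAAGCCGACCACCAC",
    "hMSH2_primer_2": "CGGAATTCCTGGCCATCAACTGCGGACAT",
    "GTBP_primer_1": "CGGAATTCTCAACTCGTATTCTTCTG",
    "GTBP_primer_2": "CGGGATCCCCCTTGAGAGGCTACTCAGT",
}


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in `primer` to the 0-based start of its first occurrence."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


for label, sequence in PRIMERS.items():
    print(label, find_sites(sequence))
```

Each primer of a pair carries a different site, so every amplified fragment ends up flanked by one BamHI and one EcoRI site, consistent with directional cloning into the vector.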
Figure 3 shows an alignment of the amino acid sequences of the conserved C-terminal regions of the four mismatch binding proteins, i.e. GTBP (H. sapiens), hMSH2 (H. sapiens), MSH2 (S. cerevisiae) and MutS (E. coli). Identical residues are in black boxes, conserved ones in shaded boxes. Sequences reported in the alignment correspond to entries MSH2_YEAST (MSH2) and MUTS_ECOLI (MutS) in the SwissProt databank, or the coding region of GenBank entry HSU04045 (hMSH2). The alignments show that a high degree of conservation exists among the three homologs, with the C-terminal part of the protein being particularly highly conserved. GTBP can therefore be considered a new member of the MSH family.
Figure 4 shows the sequence homology, at the protein level, between pairs of MSH family members. Section a shows the matrix obtained from the alignment of GTBP (on the abscissa) with the yeast GTBP homolog (GenBank accession number Z47746, on the ordinate); the two proteins show comparable length and a significant homology is evident throughout their whole sequence. Section b shows the matrix obtained from the alignment of yeast MSH2 (on the ordinate) with GTBP (on the abscissa); the proteins show different lengths and most of the homology is confined to the C-terminal regions of the two sequences. Section c shows the matrix obtained from the alignment of human MSH2 protein (on the ordinate) with GTBP (on the abscissa); the proteins show different lengths and, also in this case, most of the homology is confined to the C-terminal regions of the two sequences. Section d shows the matrix obtained from the alignment of human hMSH2 protein (on the ordinate) with the yeast MSH2 (on the abscissa); the two proteins show comparable length and the homology is evident throughout the entire sequence.
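The matrices described for Figure 4 are of the dot-plot type, in which a mark is placed wherever a short window of one sequence resembles a window of the other. The following is a minimal sketch of that general technique, not the program actually used for the figure; the function name, the window size and the threshold are assumptions, and the sequences in the usage lines are toy strings rather than real protein sequences.

```python
def dot_matrix(seq_x: str, seq_y: str, window: int = 3, min_matches: int = 3) -> list:
    """Return (i, j) pairs where the windows starting at seq_x[i] and seq_y[j]
    share at least `min_matches` identical positions."""
    hits = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(
                a == b for a, b in zip(seq_x[i:i + window], seq_y[j:j + window])
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Toy example: identical sequences give hits only along the main diagonal.
print(dot_matrix("ABCDE", "ABCDE"))
# Toy example: similarity confined to the end of the longer sequence.
print(dot_matrix("XXABC", "ABC"))
```

In such a plot, two proteins of comparable length that are similar throughout give a continuous diagonal, whereas similarity confined to the C-terminal regions appears as a short diagonal segment in one corner of the matrix.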
Figure 5 shows the effect of selective anti-hMSH2 and anti-GTBP antisera on the formation of the specific mismatch-binding complex. Pre-incubation of HeLa nuclear extracts with either antiserum prior to addition of the G/T heteroduplex DNA probe results in a diminution of the specific band in the gel-shift assay, an effect not observed when the respective pre-immune sera were used. This figure proves that both hMSH2 and GTBP are present in the mismatch-binding factor. This gel-shift analysis was carried out as described in ref. 15, except that nuclear extracts were used (25). The antisera were added to the reaction mixtures 20 min prior to the addition of the radioactively-labelled probe. The figure is an autoradiogram of a native 6% polyacrylamide gel run in Tris-acetate/EDTA (TAE) buffer prepared according to Maniatis et al., Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982.
Figure 6 shows that the mismatch-binding activity can be reconstituted using GTBP and hMSH2 obtained using an in vitro translation system. The procedure followed to generate in vitro transcripts of the hMSH2, Cl and FLY5 coding sequences was as follows: the DNA region encoding hMSH2 was inserted into pCite-1; the Cl and FLY5 ORFs were introduced into pCite-2b (Novagen). In vitro transcription and translation reactions were carried out as described in ref. 26, including a mock translation reaction in the absence of added DNA. 35S-labeled translation products were analysed on an SDS-polyacrylamide gel treated with Amplify (Amersham), dried and autoradiographed. Gel-shift assays were performed as described in ref. 15. Aliquots of 5 μl of the single in vitro translation reactions were tested; in the pre-mixing experiments, 2.5 μl of each of the two translation reactions were mixed and incubated for 15 min at room temperature before the addition of the probe. AMP at a concentration of 5 mM was included in all the DNA binding reactions so as to overcome the effect of ATP in the reticulocyte lysates, which prevents the formation of mismatch-specific protein-DNA complexes, according to ref. 16. Section a is an autoradiogram of a denaturing 7.5% SDS-polyacrylamide gel showing that translation of hMSH2, GTBP (Cl) and FLY5 mRNAs in a reticulocyte lysate system (Promega) gave rise to expected polypeptides of 113, 142 and 122 kDa, respectively. Section b shows the gel-shift analysis which demonstrates the binding of the in vitro-translated proteins to the G/T heteroduplex. The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer.
Figure 7 shows that mismatch binding activity is absent from cell extracts lacking GTBP or hMSH2. The experiment is based on the analysis of two cell lines derived from CRC: LoVo cells contain a homozygous deletion of hMSH2 alleles and do not exhibit G/T binding activity (13), while neither hMSH2 allele is mutated in DLD1 cells, even though this cell line also lacks G/T binding activity. Section a shows a gel-shift assay showing that extracts of LoVo and DLD1 fail to make mismatch-specific complexes. The G/C and G/T probes were obtained as described previously (15). Experimental conditions were as in Figure 6. The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer. Section b shows the Western blot analysis of extracts from HeLa, LoVo and DLD1 cells. The protein bands were visualized using an alkaline phosphatase-conjugated anti-rabbit IgG system (Promega) as directed by the manufacturer. In the two left lanes, the anti-GTBP and anti-hMSH2 antisera were used alone with the HeLa extract to demonstrate their selectivity for the 160 and 100 kDa proteins, respectively. In the remaining lanes, both antisera were used together. Control HeLa cells revealed the presence of both hMSH2 and GTBP. In contrast, the two CRC-derived tumor cell lines LoVo and DLD1 were completely devoid of full-length hMSH2 and GTBP, respectively. The amounts of hMSH2 in DLD1 cells and GTBP in LoVo cells were considerably lower than in HeLa cells. Since hMSH2 and GTBP bind heteroduplex DNA as a complex, the lack of one of the two proteins may cause instability of the second component of the complex.
Figure 8, part a, shows the experimental approach followed to discover the amino-terminal region of GTBP (from amino acid 1 to 68 of SEQ ID NO:15). Using the 5' RACE method (Rapid Amplification of cDNA Ends, given in detail in Nicolaides, N.C. et al., Genomics, 29: 229-234, 1995 and Nicolaides, N.C. et al., Genomics, 30: 195-206, 1995) it is possible to determine the sequence upstream of the amino acid Ala in position 1 of SEQ ID NO:1. Initially, a pair of oligonucleotides was used that pairs with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133 (primary oligonucleotide A) and from nucleotide 56 to 74 (secondary oligonucleotide B). The PCR reaction products were sequenced and it was possible to determine that the amplification product was capable of encoding the polypeptide DAAWSEAGPGPR, corresponding to amino acids 46-58 of the amino-terminal domain of GTBP as indicated in SEQ ID NO:15. Using a further two oligonucleotides, whose sequence was deduced from the initial RACE, complementary to the sequence given in SEQ ID NO:16 from nucleotide 188 to 204 (primary oligonucleotide C) and from nucleotide 169 to 185 (secondary oligonucleotide D), it was possible to amplify the GTBP-coding region 5', by-passing the methionine in position 1 of the amino acid sequence given in SEQ ID NO:15. The amplified clone, termed KMN, contained the entire nucleotidic sequence given in SEQ ID NO:16. RACE analysis of leucocyte cDNA is shown in lanes 2 and 5, that of placenta cDNA in lanes 3 and 6. The products of lanes 1 to 3 derive from sequenced amplifications with oligonucleotides A and B, those in lanes 4 to 6 derive from sequenced amplifications with oligonucleotides C and D. Lanes 1 and 4 are the negative controls (absence of template). The molecular weight markers are indicated at the side.
Part b of figure 8 shows expression of the transcript encoding the GTBP protein using RT-PCR (PCR preceded by reverse transcription of RNA templates). The RT-PCR was carried out using a synthetic oligonucleotide which paired with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133 in the reverse transcription reaction, followed by amplification with an oligonucleotide with a sequence equal to the 5' end of the GTBP transcript, that is 5'GGTGCTTTTAGGAGCCCCG3'.
The RNA used as a template was taken from HeLa cells (lane 2), placenta (lane 3), leucocytes (lane 4) and cells from the colon (lane 5); these samples were incubated with (+ symbol on the lane) or without (- symbol on the lane) reverse transcriptase and then subjected to PCR.
Where no cDNA was produced because the reverse transcription reaction did not occur, no amplification was seen. Lane 1 is the negative control without RNA.
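As a quick illustration of how the amplification oligonucleotide quoted above could be characterised before use, the sketch below computes its length, its G+C content and a rough melting temperature. The Wallace rule (2 °C per A or T, 4 °C per G or C) is a textbook approximation for short oligonucleotides and is an assumption here; the specification does not state how annealing conditions were chosen.

```python
# Oligonucleotide quoted in the text for the amplification step.
PRIMER = "GGTGCTTTTAGGAGCCCCG"


def gc_count(sequence: str) -> int:
    """Number of G and C bases in `sequence`."""
    return sum(base in "GC" for base in sequence)


def wallace_tm(sequence: str) -> int:
    """Approximate melting temperature in degrees Celsius by the Wallace rule.

    Assumption: 2 degrees per A/T and 4 degrees per G/C, a rough estimate
    intended only for short oligonucleotides.
    """
    gc = gc_count(sequence)
    at = len(sequence) - gc
    return 2 * at + 4 * gc


print(len(PRIMER))                               # length in nucleotides
print(round(gc_count(PRIMER) / len(PRIMER), 3))  # G+C fraction
print(wallace_tm(PRIMER))                        # rough Tm estimate
```

More accurate estimates would use nearest-neighbour thermodynamics and the actual salt and primer concentrations, none of which are given in the text.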
Detailed description
In view of the potential and varied roles for mismatch binding proteins in the repair of genetic information and their effects on disease states, such as tumor cell transformation and proliferation and metastases, and the paucity of understanding of the molecules and agents that selectively affect or modulate the activities of these proteins, there exists a need in the art for compounds and agents with effector and modulator activity and for methods to identify these and related compositions and agents. Further, such agents can serve as commercial research reagents for control of nucleic acid repair and other GTBP-related conditions. Despite progress in developing a more defined model of the molecular mechanisms underlying nucleic acid repair, few significant methods applicable to assessing predisposition to cancer and/or to its treatment have evolved. The hMSH2/GTBP heterodimer is necessary for the correction of base/base mispairs and one- or two-nucleotide loops. Genomic instability in tumor-derived cell lines lacking GTBP manifests itself mainly in the form of small differences (e.g. in runs of A) rather than large changes in CA repeats, characteristic of phenotypes associated with the four known CRC loci hMSH2, hMLH1, hPMS1 and hPMS2. Cancers displaying mutator phenotypes with a low degree of microsatellite instability (20-24) may be associated with a malfunction of GTBP. It is a discovery of the present invention that mutational events associated with tumorigenesis in CRC are due to defects in the GTBP gene.
Novel compositions comprising generic sequences encoding the GTBP protein, as well as fragments derived therefrom are provided, together with recombinant proteins produced using the genomic sequences and methods of using these compositions.
Exemplary amino acid and DNA sequences of the invention are set forth in SEQ ID NO: 1 - SEQ ID NO: 15 and in SEQ ID NO: 12 - SEQ ID NO: 16. Standard abbreviations for nucleotides and amino acids are used in the Figures and elsewhere in this specification. GTBP-derived polypeptides are particularly preferred embodiments of the invention, although variations based on the specific sequences of these polypeptides are also part of the present invention. In its broader aspects, the invention (as it pertains to polypeptides per se) includes any polypeptide selected from the group consisting of:
(i) any protein having an amino acid sequence which is at least 85% homologous to the amino acid sequences of SEQ ID NO: 1, SEQ ID NO:15 and the combination thereof, and, (ii) fragments thereof comprising at least 10 consecutive amino acids located within the amino acid sequences of SEQ ID NO: 1, SEQ ID NO:15 and the combination thereof, wherein the polypeptide is capable of binding to an antibody specific for GTBP.
In the genetic engineering aspects of the present invention, the specific coding sequences set forth in SEQ ID NO: 12, SEQ ID NO: 16 and the combination thereof, which correspond to the preferred polypeptides, are themselves preferred.
Equivalent and complementary DNA and RNA sequences (see below for definitions of these terms) are likewise preferred. In its broader aspects, the genetic engineering aspects of the present invention include any recombinant DNA or RNA molecule comprising a DNA sequence encoding GTBP itself or GTBP-derived protein according to SEQ ID NO: 1 or a corresponding DNA or RNA sequence, or a subsequence thereof comprising at least 10 nucleotides. The present invention also focuses on diagnostic methodologies aimed at detecting loss of GTBP function in humans and consequent predisposition to neoplasia.
Definition of terms
A number of terms used in the art of genetic engineering and protein chemistry are used herein with the following defined meanings.
Two nucleic acid fragments are "homologous" if they are capable of hybridizing to one another under hybridization conditions described in Maniatis et al. (1982), Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., pp. 320-325. By using the following wash conditions (2 x SSC, 0.1% SDS, room temperature twice, 30 minutes each; then 2 x SSC, 0.1% SDS, 50°C once, 30 minutes; then 2 x SSC, room temperature twice, 10 minutes each), homologous sequences can be identified that contain at most about 25-30% base pair mismatches. More preferably, homologous nucleic acid strands contain 15-25% base pair mismatches, even more preferably 5-15% base pair mismatches. These degrees of homology can be selected by using more stringent wash conditions for identification of clones from gene libraries (or other sources of genetic material), as is well known in the art.
Two amino acid sequences are homologous if there is a partial or complete identity between their sequences. For example, 85% homology means that 85% of the amino acids are identical when the two sequences are aligned for maximum matching. Gaps (in either of the two sequences being matched) are allowed in maximizing matching; gap lengths of 5 or less are preferred, with 2 or less being more preferred.
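The percent-identity notion of homology defined above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned for maximum matching by some other tool, with "-" marking gap positions. The function names, the gap character, and the choice to compute the percentage over columns in which both sequences carry a residue are illustrative assumptions and are not taken from this specification.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Assumption: columns containing a gap in either sequence are skipped,
    so the denominator is the number of residue-vs-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap column: not counted as match or mismatch
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def longest_gap(aligned: str, gap: str = "-") -> int:
    """Length of the longest run of gap characters in one aligned sequence."""
    longest = current = 0
    for ch in aligned:
        current = current + 1 if ch == gap else 0
        longest = max(longest, current)
    return longest


def is_homologous(aligned_a: str, aligned_b: str,
                  threshold: float = 85.0, max_gap: int = 5) -> bool:
    """True if identity meets the threshold and no gap exceeds max_gap."""
    gaps_ok = max(longest_gap(aligned_a), longest_gap(aligned_b)) <= max_gap
    return gaps_ok and percent_identity(aligned_a, aligned_b) >= threshold
```

With two hypothetical ten-residue sequences that differ at a single position, `percent_identity` reports 90% and `is_homologous` accepts the pair at the 85% threshold. The default `max_gap` of 5 mirrors the preferred gap length stated above and can be tightened to 2.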
Alternatively and preferably, two protein sequences (or polypeptide sequences derived from them of at least 30 amino acids in length) are homologous, as this term is used herein, if they have an alignment score of more than 5 (in standard deviation units) using the program ALIGN with the mutation data matrix and a gap penalty of 6 or greater (Dayhoff, M.O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this volume, pp. 1-10) . The two sequences or parts thereof are more preferably homologous if their amino acids are greater than or equal to 50% identical when optimally aligned using the ALIGN program.
A DNA fragment is "derived from" a GTBP-encoding DNA sequence if it has the same or substantially the same base pair sequence as a region of the coding sequence for GTBP protein molecule.
"Substantially the same" means, when referring to biological activities, that the activities are of the same type although they may differ in degree. When referring to amino acid sequences, "substantially the same" means that the molecules in question have similar biological properties and preferably have at least 85 % homology in amino acid sequences. More preferably, the amino acid sequences are at least 90% identical. In other uses, "substantially the same" has its ordinary English language meaning.
A protein is "derived from" GTBP if it has the same or substantially the same amino acid sequence as a region of the GTBP protein molecule. By polypeptide derivatives of GTBP protein is meant polypeptides differing in length from the natural protein and containing five or more amino acids in the same primary order as found in the protein as obtained from a natural source. Polypeptide molecules having substantially the same amino acid sequence as the natural protein but possessing minor amino acid substitutions which do not significantly affect the ability of the protein or polypeptide to interact with protein-specific molecules, such as antibodies and nucleic acids, are within the definition of derived from GTBP. Derivatives include glycosylated forms, aggregative conjugates with other protein molecules and covalent conjugates with unrelated chemical moieties. Covalent derivatives are prepared by linkage of functionalities to groups which are found in the amino acid chain or at the N- or C-terminal residue by means known in the art.
GTBP-specific molecules include polypeptides such as antibodies that are specific for the protein or polypeptide containing the naturally occurring GTBP amino acid sequence. By "specific binding polypeptide" are intended polypeptides that bind with GTBP protein and its derivatives and which have a measurably higher binding affinity for the target polypeptide than for other polypeptides tested for binding. Higher affinity by a factor of 10 is preferred, more preferably by a factor of 100. Binding affinity for antibodies refers to a single binding event (i.e., monovalent binding of an antibody molecule). Specific binding by antibodies also means that binding takes place at the normal binding site of the antibody molecule (at the end of the arms in the variable region).
As discussed above, minor amino acid variations from the natural amino acid sequence of GTBP protein are contemplated; in particular, conservative amino acid replacements are contemplated. Conservative replacements are those that take place within a family of amino acids that are related in their side chains. Genetically encoded amino acids are generally divided into four families: (1) acidic = aspartate, glutamate; (2) basic = lysine, arginine, histidine; (3) non-polar = alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar = glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. For example, it is reasonable to expect that an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid will not have a major effect on the binding properties of the resulting molecule, especially if the replacement does not involve an amino acid at a binding site involved in the interaction of GTBP or its derivatives with an antibody or with a specific DNA recognition sequence. Whether an amino acid change results in a functional peptide can readily be determined by assaying the specific binding properties of the polypeptide derivative.
Isolation of cDNA encoding GTBP protein
Isolation of nucleotide sequences encoding GTBP protein involves creation of a cDNA library prepared from full-length mature messenger RNA extracted from cultured cells or tissues. Evidence is provided that GTBP is conserved over a broad evolutionary range, thus allowing the isolation of GTBP homologs from the genomes of phylogenetically distant species, i.e. from mammals to yeasts to bacteria. Genetic libraries can be made in either eukaryotic or prokaryotic host cells. Widely available cloning vectors such as plasmids, cosmids, phage, YACs and the like can be used to generate genomic libraries suitable for the isolation of nucleotide sequences encoding GTBP protein or portions thereof. Useful methods for screening genetic libraries for the presence of GTBP protein nucleotide sequences include the preparation of oligonucleotide probes based on the sequence information provided in SEQ ID NO: 1 and SEQ ID NO: 15 (after decoding of the amino acid sequence) as well as in SEQ ID NO:12 and SEQ ID NO: 16 (directly derived from the encoding DNA) of this patent. By employing the standard
triplet genetic code, oligonucleotide sequences of about 17 base pairs or longer can be prepared by conventional in vitro synthesis techniques. The resultant nucleic acid sequences can be subsequently labeled with radionuclides, enzymes, biotin, fluorescers or the like, and used as probes for screening the libraries.
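When an oligonucleotide probe is designed by decoding an amino acid sequence, the number of distinct nucleotide sequences that could encode a given stretch depends on how many codons the standard genetic code assigns to each residue. The following Python sketch, offered as an illustration only, scores every window of six residues (18 nucleotides, just above the length of about 17 mentioned above) and returns the window with the fewest possible coding sequences. The function names and the six-residue default are assumptions made for this example; the codon counts are those of the standard genetic code.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct nucleotide sequences that can encode the peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide)


def least_degenerate_window(peptide: str, n_residues: int = 6):
    """Return (start index, sub-peptide, degeneracy) of the best probe window.

    Ties are resolved in favour of the window closest to the N-terminus.
    """
    if len(peptide) < n_residues:
        raise ValueError("peptide shorter than the requested window")
    candidates = [
        (degeneracy(peptide[i:i + n_residues]), i)
        for i in range(len(peptide) - n_residues + 1)
    ]
    score, start = min(candidates)
    return start, peptide[start:start + n_residues], score
```

Applied to a short peptide, `least_degenerate_window` favours stretches rich in methionine and tryptophan, each encoded by a single codon, and avoids stretches rich in leucine, serine or arginine, each encoded by six.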
Additional methods of interest for isolating GTBP protein-encoding nucleic acid sequences include screening of genetic libraries for the expression of GTBP protein or fragments thereof by means of GTBP protein-specific antibodies, either polyclonal or monoclonal. Moreover, a selection method advisable for the screening of GTBP libraries cloned in conventional expression vectors is based on the specific binding of the protein (or of polypeptides contained therein) to heteroduplex DNA molecules containing G/T mismatches. A particularly preferred technique for isolating homologous proteins from related species or strains involves the use of degenerate primers based on partial amino acid sequences of GTBP protein and the polymerase chain reaction (PCR) to amplify gene segments between the primers. A similar approach can also be applied to generate double-stranded cDNA molecules after amplification of mRNA with appropriate primers and polymerases. The gene can then be isolated using a specific hybridization probe based on the amplified gene segment, which is then analyzed for appropriate expression of the protein.
The nucleotide sequence of the isolated genetic material which encodes GTBP protein can be obtained by sequencing the non-vector nucleotide sequences of these recombinant molecules. Nucleotide sequence information can be obtained by employing widely used DNA sequencing protocols, such as Maxam and Gilbert sequencing, dideoxy nucleotide sequencing according to Sanger, and the like. Examples of suitable nucleotide sequencing protocols can be found in Berger and Kimmel, Methods in Enzymology Vol. 152, Guide to Molecular Cloning Techniques (1987), Academic Press. Nucleotide sequence information from several recombinant DNA isolates, including isolates from both cDNA and genomic libraries, may be combined so as to provide the entire amino acid coding sequence of GTBP, as well as the upstream and downstream nucleotide sequences.
Nucleotide sequences obtained from sequencing GTBP protein-specific genomic library isolates can be subjected to further analysis in order to identify regions of interest in the GTBP gene. These regions of interest include additional open reading frames, promoter sequences, termination sequences, and the like. Analysis of nucleotide sequence information is preferably performed by computer. Software suitable for analyzing nucleotide sequences for regions of interest is commercially available and includes, for example, DNASIS
(Pharmacia Biotech). It is also of interest to use amino acid sequence information obtained from the sequencing of purified GTBP protein when analyzing new GTBP nucleotide sequence information so as to improve the accuracy of the nucleotide sequence analysis.
Expression of GTBP
Isolated nucleotide sequences encoding GTBP protein can be used to produce purified GTBP protein or fragments thereof by either recombinant DNA methodology or by in vitro polypeptide synthesis techniques. By "purified" and "isolated" is meant, when referring to a polypeptide or nucleotide sequence, that the indicated molecule is present in the substantial absence of other biological macromolecules of the same type. The term "purified" as used herein preferably means at least 95% by weight, more preferably at least 99% by weight, and most preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000, can be present) .
A significant advantage of producing GTBP protein by recombinant DNA techniques rather than by isolating it from natural sources is that equivalent quantities of GTBP protein can be produced by using less starting material than would be required for isolating GTBP protein from a natural source. Producing GTBP protein by recombinant techniques also permits GTBP protein to be isolated in the absence of some molecules normally present in cells that naturally produce GTBP protein. It is also apparent that recombinant DNA techniques can be used to produce GTBP protein polypeptide derivatives that are not found in nature, such as the variations described above.
GTBP protein and polypeptide derivatives of GTBP protein can be expressed by recombinant techniques when a DNA sequence encoding the relevant molecule is functionally inserted into a vector. By "functionally inserted" is meant in proper reading frame and orientation, as is well understood by those skilled in the art. Typically, the GTBP protein gene will be inserted downstream from a promoter and will be followed by a stop codon, although production as a hybrid protein followed by cleavage may be used, if desired. In general, host-cell-specific sequences improving the production yield of GTBP protein and GTBP polypeptide derivatives will be used, and appropriate control sequences will be added to the expression vector, such as enhancer sequences, polyadenylation sequences, and ribosome binding sites. Two basic types of expression are contemplated: (i) expression in mammalian cells so as to overcome a deficiency in an individual having insufficient GTBP, and
(ii) expression for the purpose of providing GTBP for purposes unrelated to the host in which expression occurs, such as production of diagnostic tests for GTBP deficiency.
Production of genetic constructs for transformation of human cells
With the goal of expression in human cells, a gene construct will be prepared and used to transform human cells. Several strategies and vectors have been developed for the expression of proteins in animal cells. For example, BK-SV40 hybrid vectors have been constructed. These vectors can be maintained in cultured human cells as multicopy double-stranded DNA extrachromosomal replicons. One exemplary vector consists of the SV40 promoter controlling the expression of the neomycin resistance gene (the selectable marker) and the MMTV promoter, regulated by the DRE enhancer sequence, which controls the expression of the cloned gene. In any case, the foreign construct will usually include transcriptional and translational initiation and termination signals, with the initiation signals 5' to the gene and termination signals 3' to the gene of interest, although linear DNA can be delivered to a host where recombination occurs for insertion into the host genome. Expression under the control of the native promoter can thus be achieved by replacing the defective gene with the linear DNA encoding GTBP by making use of cellular processes, e.g. homologous recombination. The transcriptional initiation region, which includes the RNA polymerase binding site (promoter), may be native to the host or may be derived from an alternative source, where the region is functional in the host. The transcriptional initiation regions may include not only the RNA polymerase binding site, but also regions providing for the regulation of the transcription. The 3' termination region may be derived from the same gene as the transcriptional initiation region or from a different gene. For example, where the gene of interest has a transcriptional termination region functional in the host species, that region may be retained within the gene.
An expression cassette can be constructed which will include the transcriptional initiation region, the GTBP protein gene under the transcriptional control of the transcription initiation region, the initiation codon, the coding sequence of the gene, with or without introns, and the translational stop codons, followed by the transcriptional termination region, which will include the terminator, and may include a polyadenylation signal sequence and other sequences associated with transcriptional termination. The direction is 5' to 3', the same as the direction of transcription. The cassette will usually be less than about 10 kb, frequently less than about 6 kb, usually being at least about 5 kb.
When the expression product of the gene is to be located other than in the cytoplasm, the gene will usually be constructed to include particular amino acid sequences which result in translocation of the product to a particular site, which may be an organelle, such as the nucleus, or the product may be secreted into the external environment of the cell. Various secretory leaders, membrane integrator sequences, and translocation sequences for directing the peptide expression product to a particular site are described in the literature.
One or more cassettes may be involved, where the cassettes may be employed in tandem for the expression of independent genes which may express products independently of each other or may be regulated concurrently, where the products may act independently or in conjunction, e.g. GTBP and hMSH2. The expression cassette will normally be carried on a vector having at least one replication system. For convenience, it is common to have a replication system functional in E. coli such as ColE1, pSC101, pACYC184, or the like. In this manner, at each stage after each manipulation, the resulting construct may be cloned, sequenced, and the correctness of the manipulation determined. In addition to, or in place of, the E. coli replication system, a broad host range replication system may be employed, such as the replication systems of the P1 incompatibility plasmids, e.g. RK2, RP1, RP4 and R68. In addition to the replication system, there will frequently be at least one marker present, which may be useful in one or more hosts, or different markers for individual hosts. That is, one marker may be employed for selection in a prokaryotic host, while another marker may be employed for selection in a eukaryotic host. Various genes which may be employed include neo (neomycin-kanamycin resistance), chloramphenicol acetyltransferase (cat), β-lactamase (bla), β-galactosidase, etc.
The various fragments comprising the various constructs, expression cassettes, markers, and the like may be introduced consecutively by restriction enzyme cleavage of an appropriate replication system, and insertion of the particular construct or fragment into the available site. After ligation and cloning the vector may be isolated for further manipulation. All of these techniques are amply exemplified in the literature and find particular exemplification in Maniatis et al., Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982.
Transformation of mammalian cells and gene therapy
Once the vector is completed, the vector may be introduced into mammalian cells. Techniques for transforming mammalian cells include transfection, microinjection, liposome-based delivery, etc. Transfection of cultured human cells is the most commonly used method and can be achieved by standard protocols which involve either incubation of cells with DNA that has been co-precipitated with calcium phosphate or DEAE-dextran, or electroporation with purified transfecting DNA. In other systems, a genetically modified virus, a liposome or microinjection can also be used to deliver foreign DNA to human recipient cells. Once the GTBP gene has been introduced into the defective cell, it can complement the genetic defect, restoring the normal phenotype. This methodology, when used to remedy genetic defects in individuals, goes under the name of gene therapy. At least two strategies for implementing somatic cell gene therapy have emerged and could be applied to correct GTBP genetic defects: ex vivo and in vivo gene therapy. Usually, ex vivo gene therapy involves the following procedures:
- collect the cells from an affected individual
- correct the genetic defect by gene transfer
- select and grow the genetically corrected (remedial) cells
- infuse or transplant corrected cells back into the patient.
Vectors derived from retroviruses are often used to stably maintain and persistently express the remedial gene in the corrected cell.
In vivo gene therapy entails the direct delivery of the remedial gene into the cells of a particular tissue of a prospective patient. The wild-type protein can be cloned into various benign viruses and delivered to target defective cells in an in vivo infection. Vectors derived from adenovirus, herpes simplex virus and certain retroviruses are excellent candidates for in vivo gene therapy. Methods and prospects of gene therapy have been reviewed by Mulligan (1993), Science 260:926-932.
Diagnostic methods using antigens
Typically, methods for detecting analytes such as binding proteins of the invention are based on immunoassays. Immunoassays can be conducted to determine the presence or absence of GTBP in host cells. Such techniques are well known and need not be described here in detail. Examples include both heterogeneous and homogeneous immunoassay techniques. Both techniques are based on the formation of an immunological complex between the binding protein and a corresponding specific antibody. Heterogeneous assays for GTBP typically use a
specific monoclonal or polyclonal antibody bound to a solid surface, e.g. in sandwich assays. Homogeneous assays, which are carried out in solution without the presence of a solid phase, can be used, for example, by determining the difference in enzyme activity brought on by binding of free antibody to an enzyme-antigen conjugate. A number of suitable assays are disclosed in U.S. Patent Nos. 3,817,837, 4,006,360, and 3,996,345.
The solid surface reagent in the above assay is prepared by known techniques for attaching protein material to solid support material, such as polymeric beads, dip sticks, or filter material. These attachment methods generally include non-specific adsorption of the protein to the support or covalent attachment of the protein, typically through a free amine group, to a chemically reactive group on the solid support, such as an activated carboxyl, hydroxyl, or aldehyde group.
In a second diagnostic configuration, known as a homogeneous assay, antibody binding to an analyte produces some change in the reaction medium which can be directly detected in the medium. Known general types of homogeneous assays proposed heretofore include (a) spin-labeled reporters, where antibody binding to the antigen is detected by a change in reporter mobility (broadening of the spin splitting peaks), (b) fluorescent reporters, where binding is detected by a change in fluorescence emission, (c) enzyme reporters, where antibody binding affects enzyme/substrate interactions, and (d) liposome-bound reporters, where binding leads to liposome lysis and release of encapsulated reporter. The adaptation of these methods to the protein antigen of the present invention follows conventional methods for preparing homogeneous assay reagents.
In each of the assays described above, the assay method involves reacting the tissue extract from a test individual with an antibody and examining the sample for the presence of bound antigen. The examination may
involve attaching a labelled anti-GTBP antibody to the primary complex formed between GTBP and the immobilized antibody and measuring the amount of reporter bound to the solid support, as in the first method, or may involve observing the effect of antibody binding on a homogeneous assay reagent, as in the second method.
Production of specific binding proteins
GTBP, in its native or chemically modified form, or polypeptide derivatives thereof, or specific complexes with other polypeptides may be used for producing antibodies, either monoclonal or polyclonal, specific to GTBP or polypeptide derivatives thereof, or to GTBP complexes with other polypeptides. Antibodies specific for GTBP protein are produced by immunizing an appropriate vertebrate host, e.g., rabbit or mouse, with purified GTBP protein or polypeptide derivatives of GTBP protein, by themselves or in conjunction with a conventional adjuvant. Usually, two or more immunizations will be involved, and blood or spleen will be harvested a few days after the last injection. For polyclonal antisera, the immunoglobulins can be precipitated, isolated and purified by a variety of standard techniques, including affinity purification using GTBP protein attached to a solid surface, such as a gel or beads in an affinity column. For monoclonal antibodies, the splenocytes will normally be fused with an immortalized lymphocyte, e.g., a myeloma cell line, under selective conditions for hybridoma formation. The hybridomas can then be cloned under limiting dilution conditions and their supernatants screened for antibodies having the desired specificity. Techniques for producing antibodies are well known in the literature and are exemplified by the publication Antibodies: A Laboratory Manual (1988), eds. Harlow and Lane, Cold Spring Harbor Laboratory Press, and U.S. Patent Nos. 4,381,292, 4,451,570, and 4,618,577.
GTBP diagnostic application using genetic probes
The genetic material of the invention can itself be used in numerous assays as probes for genetic material present in an individual. The analyte can be a nucleotide sequence which hybridizes with a probe comprising a sequence of at least about 16 consecutive nucleotides, usually 30 to 200 nucleotides, up to substantially the full sequence of the gene as shown in SEQ ID NO: 12. The analyte can be RNA or DNA. The sample is typically a DNA or an RNA molecule extracted from the patient's tissue. In order to detect an analyte, where the analyte hybridizes to a probe, the probe may contain a detectable label. Particularly preferred for use as a probe are sequences up to about 3200 consecutive nucleotides (for example from nucleotide 1 to nucleotide 3000 of SEQ ID NO: 12 and from nucleotide 1 to nucleotide 204 of SEQ ID NO: 16), since these sequences appear to be particularly specific for GTBP.
One method for amplification of target nucleic acids, for later analysis by hybridization assays, is known as the polymerase chain reaction or PCR technique. The PCR technique can be applied to detect sequences of the invention in suspected samples using oligonucleotide primers spaced apart from each other and based on the genetic sequence set forth in SEQ ID NO: 12 and SEQ ID NO: 16. The primers are complementary to opposite strands of a double-stranded DNA molecule and are typically separated by about 50 to 450 nt or more (usually not more than 2000 nt). This method entails preparing the specific oligonucleotide primers and then repeated cycles of target DNA denaturation, primer binding, and extension with a DNA polymerase to obtain DNA fragments of the expected length based on the primer spacing. Extension products generated from one primer serve as additional target sequences for the other primer. The degree of amplification of a target sequence is controlled by the number of cycles that are performed and is theoretically calculated by the simple formula 2^n, where n is the number of cycles. Given that the average efficiency per cycle ranges from about 65% to 85%, 25 cycles produce from 0.3 to 4.8 million copies of the target sequence. The PCR method is described in a number of publications, including Saiki et al., Science (1985) 230:1350-1354; Saiki et al., Nature (1986) 324:163-166; and Scharf et al., Science (1986) 233:1076-1078. Also see U.S. Patent Nos. 4,683,194; 4,683,195; and 4,683,202.
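The amplification figures quoted above can be reproduced with a short Python sketch. It assumes the commonly used model in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency expressed as a fraction, so that E = 1 corresponds to the theoretical doubling per cycle. The function name and parameter names are illustrative and do not come from this specification.

```python
def pcr_copies(cycles: int, efficiency: float = 1.0,
               initial_copies: float = 1.0) -> float:
    """Expected copy number after a given number of PCR cycles.

    Assumed model: copies = initial_copies * (1 + efficiency) ** cycles,
    with efficiency between 0 (no amplification) and 1 (perfect doubling).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for eff in (0.65, 0.85, 1.0):
        print(f"efficiency {eff:.2f}: {pcr_copies(25, eff):,.0f} copies")
```

Under this model, 25 cycles at 65% and 85% efficiency give roughly 0.27 million and 4.8 million copies per starting molecule, in line with the range stated above, whereas perfect doubling would give 2^25, about 33.6 million.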
The invention includes a specific diagnostic method for determination of GTBP, based on selective amplification of GTBP-protein-encoding DNA fragments. This method employs a pair of single-stranded primers derived from non-homologous regions of opposite strands of GTBP DNA duplex fragment having a sequence as described by combining the sequences SEQ ID NO: 16 and SEQ ID NO:12. These "primer fragments" represent one aspect of the invention. The method follows the process for amplifying selected nucleic acid sequences as disclosed in U.S. Patent No. 4,683,202, as discussed above.
Mutations in the GTBP gene can be detected by restriction enzyme analysis of the amplification product or by direct sequencing. Also, alterations in GTBP sequence can be revealed by Southern hybridization with probes encompassing part or the entire sequences of SEQ ID NO: 12 and SEQ ID NO:16.
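Restriction enzyme analysis of an amplification product relies on the fact that a mutation which creates or destroys a recognition site changes the pattern of fragment lengths after digestion. The Python sketch below illustrates the idea on a linear sequence. The two short sequences are hypothetical and are not taken from SEQ ID NO: 12 or SEQ ID NO: 16; the EcoRI site GAATTC, cut after the first G, is used only as a familiar example, and the function names are assumptions of this illustration.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list:
    """Positions at which a linear sequence is cut.

    cut_offset is the number of bases of the recognition site that remain
    on the upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Lengths of the fragments produced by complete digestion."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical amplification products: the mutant differs by one base
# (A to G) inside the recognition site, which abolishes cleavage.
WILD_TYPE = "ATGCGAATTCGTTAGC"
MUTANT = "ATGCGAGTTCGTTAGC"

wild_type_fragments = fragment_lengths(WILD_TYPE, "GAATTC", 1)
mutant_fragments = fragment_lengths(MUTANT, "GAATTC", 1)
```

In this toy case the wild-type product is cut into two fragments while the mutant product remains a single uncut fragment, which is the kind of difference that would be visible on a gel.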
Single-stranded DNA probes complementary to the wild-type GTBP-coding sequence can also be hybridized to RNA extracted from tissues or cells of human patients and used to detect mutations in the mature GTBP gene transcript by enzymatic digestion of heteroduplexes at the level of mismatches. These and other techniques aimed at identifying variations in gene sequences from wild-type GTBP are extensively reported in the literature and well established in the scientific community.
Binding assays involving GTBP
The presence of an altered GTBP protein can be detected by the use of binding assays based on the specific recognition of G/T mismatches by GTBP. A synthetic double-stranded 34-mer oligonucleotide containing a G/T mispair is prepared and labelled substantially as reported (15). Cell extracts can be prepared as reported in current literature (e.g. ref. 25 and refs. therein). The cell extract (1-10 micrograms of nuclear proteins) can be incubated with the heteroduplex oligonucleotide at room temperature for 30 minutes to allow GTBP binding to the G/T mismatch. The mixture can then be loaded on a gel prepared as reported in Figure 6. Alterations in GTBP mass or affinity for the substrate can be evidenced by an altered electrophoretic mobility.
Deposits
Strains of E. coli TOP10 - transformed using the plasmids pBluescript SK-/C1 and pCite-2b/C1 coding respectively for the protein GTBP from amino acid 1 to amino acid 1292 of SEQ ID NO:1 and using the plasmid pBluescript SK-/FLYS coding for a GTBP protein from amino acid 116 to amino acid 1292 of SEQ ID NO:1 - were deposited on 19/5/1995 with the National Collections of Industrial and Marine Bacteria Ltd. (NCIMB), Aberdeen, Scotland, UK, with accession numbers NCIMB 40742, NCIMB 40471 and NCIMB 40740 respectively. Moreover, a strain of E. coli TOP10 - transformed using the plasmid pBluescript SK-/GTBP coding for the whole amino acid sequence of GTBP from amino acid 1 to amino acid 1360 (SEQ ID NO:15 and SEQ ID NO:1) - was deposited on 28/5/96 with the above depositary institution with accession number NCIMB 40805.
Examples
As mentioned above, the inventors identified a mismatch-binding factor in HeLa cells (15), GTBP, which was shown to bind preferentially to heteroduplexes containing G/T mispairs. Purification of this DNA binding activity by G/T mismatch affinity chromatography yielded a mixture of two proteins of apparent molecular weights of 100 and 160 kDa (16), which indicates that the mismatch-specific complex is composed of two proteins. The 100 kDa constituent of the complex is hMSH2 (17) while the second component is GTBP. Examples regarding the identity and function of GTBP are reported below.
Example 1
The present example shows that the GTBP protein sequence, obtained by combining the sequences SEQ ID NO:15 and SEQ ID NO:1, contains seven subsequences which correspond to polypeptides obtained after proteolytic cleavage of the 160 kDa DNA-binding protein termed GTBP. These subsequences are indicated as SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8. The 160 kDa protein was purified as reported in ref. 16. The fractions containing the G/T-specific mismatch binding activity were loaded onto a preparative SDS-PAGE gel and the 100 and 160 kDa bands were excised following staining with Coomassie Blue. The proteins were digested in the gel matrix either with trypsin (100 kDa protein, Promega Corporation, UK), or with Achromobacter lyticus endopeptidase Lys-C (160 kDa protein, Wako Chemicals GmbH, Germany). The proteolytic peptides were recovered by sequential extractions and separated by tandem HPLC on a Hewlett-Packard 1090M with diode array detection. Anion-exchange and octadecyl reverse phase columns were connected in series, essentially as described by H. Kawasaki and K. Suzuki, Anal. Biochem. 186, 264 (1990). Fractions were collected and applied directly to an Applied Biosystems 477A pulsed-liquid automated sequencer modified as described by N.F. Totty, M.D. Waterfield and J.J. Hsuan, Protein Sci. 1, 1215 (1992). Microsequencing yielded seven proteolytic peptides whose sequences have been designated as SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8.
Example 1B
The present example shows that the protein GTBP contains an amino-terminal domain corresponding to SEQ ID NO:15. This region can be determined by analysis of the coding nucleotide sequence. The amino-terminal domain is an integral part of the GTBP polypeptide itself, and therefore the GTBP sequence must be understood to be the sequential combination of SEQ ID NO:15 and SEQ ID NO:1, with a total length of 1360 amino acids. Part a of Figure 8 shows the experimental approach followed to discover the amino-terminal region of GTBP (from amino acid 1 to 68 of SEQ ID NO:15). Using the 5' RACE method (Rapid Amplification of cDNA Ends, described in detail in Nicolaides, N.C. et al., Genomics, 29: 229-234, 1995 and Nicolaides, N.C. et al., Genomics, 30: 195-206, 1995) it is possible to determine the sequence upstream of the amino acid Ala in position 1 of SEQ ID NO:1. Initially, a pair of oligonucleotides was used that pairs with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133 (primary oligonucleotide A) and from nucleotide 56 to 74 (secondary oligonucleotide B). The PCR reaction products were sequenced, and the amplification product was found to be capable of encoding the polypeptide DAAWSEAGPGPR, corresponding to amino acids 46-58 of the amino-terminal domain of GTBP as indicated in SEQ ID NO:15. Using a further two oligonucleotides, whose sequence was deduced from the initial RACE, complementary to the sequence given in SEQ ID NO:16 from nucleotide 188 to 204 (primary oligonucleotide C) and from nucleotide 169 to 185 (secondary oligonucleotide D), it was possible to amplify the 5' region of the GTBP-coding sequence, extending past the methionine in position 1 of the amino acid sequence given in SEQ ID NO:15. The amplified clone, termed KMN, contained the entire nucleotide sequence given in SEQ ID NO:16. RACE analysis of leucocyte cDNA is shown in lanes 2 and 5, that of placenta cDNA in lanes 3 and 6. The products in lanes 1 to 3 derive from sequential amplifications with oligonucleotides A and B, those in lanes 4 to 6 from sequential amplifications with oligonucleotides C and D. Lanes 1 and 4 are the negative controls (absence of template). The molecular weight markers are indicated at the side.
Part b of Figure 8 shows expression of the transcript encoding the protein GTBP, detected by RT-PCR (PCR preceded by reverse transcription of RNA templates). The RT-PCR was carried out using, in the reverse transcription reaction, a synthetic oligonucleotide that pairs with the sequence given in SEQ ID NO:12 from nucleotide 114 to 133, followed by amplification with an oligonucleotide whose sequence is identical to the 5' end of the GTBP transcript, that is 5'GGTGCTTTTAGGAGCCCCG3'. The RNA used as template was taken from HeLa cells (lane 2), placenta (lane 3), leucocytes (lane 4) and cells from the colon (lane 5); the samples were incubated with (+ symbol on the lane) or without (- symbol on the lane) reverse transcriptase and then subjected to PCR. Where no cDNA was produced, because the reverse transcription reaction did not occur, no amplification product was seen. Lane 1 is the negative control without RNA.
Example 2
The present example shows that DNA regions internal to the GTBP gene can be obtained by amplification with primers designed on the basis of the sequence of peptides deriving from proteolytic cleavage of the 160 kDa G/T-binding factor (SEQ ID NO: 2 to 8). Following the strategy of Lingner et al. (18), the inventors identified a unique DNA sequence encoding the central 8 amino acids of the peptide of SEQ ID NO: 6. Two degenerate primers corresponding to the N- and C-terminal amino acid sequences of the oligopeptide of SEQ ID NO: 6, i.e. the DNA sequences 5'GCGAATTCTAYGGNTTYAAYGC3' (SEQ ID NO: 9) and 5'GCGGATCCTAYTGDATNACYTC3' (SEQ ID NO: 10), where N=any nucleotide, Y=C or T and D=A, G or T, were used for PCR amplification on poly-A+ HeLa mRNA as described (18), except that the MgCl2 concentration was 5 mM. The expected 67 bp fragment was eluted from an acrylamide gel, cloned into pBluescript SK- and sequenced (see comments to SEQ ID NO: 9 and 10 for details). Two clones contained the correct sequence, corresponding to SEQ ID NO: 11, encoding the starting target peptide SEQ ID NO: 6.
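The degeneracy of such a primer can be illustrated with a short sketch. The code below is only an illustration, not part of the procedure of ref. 18: it uses the three ambiguity codes defined above (N, Y, D) and the sense primer of SEQ ID NO: 9 as written in the text; the function names are hypothetical.

```python
from itertools import product

# Ambiguity codes as defined in the text: N = any nucleotide, Y = C or T, D = A, G or T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "D": "AGT", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> list[str]:
    """List every concrete sequence contained in the degenerate pool."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# Sense primer of SEQ ID NO: 9 as given in the text (5' to 3').
sense_primer = "GCGAATTCTAYGGNTTYAAYGC"
pool = expand(sense_primer)
print(degeneracy(sense_primer), len(pool), pool[0])
```

The same functions can be applied to the antisense primer of SEQ ID NO: 10, whose only additional ambiguity code is D.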
Example 3
The present example shows that DNA regions internal to the GTBP gene can be identified by hybridization with a DNA probe designed on the basis of the sequence of peptides obtained upon proteolytic cleavage of the 160 kDa G/T-binding factor. The DNA sequence reported as SEQ ID NO: 11 was labeled with 32P by a standard kinase reaction (with T4 PNK and [γ-32P]ATP as described by Maniatis et al., Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982) in order to generate a double-stranded DNA probe. The labelled probe of SEQ ID NO: 11 was then used in the screening of a commercial oligo dT-primed cDNA library in phage lambda (HeLa S3 Uni-ZAP XR, Stratagene). Two positive clones were selected for further analysis. Clone C1 contained an insert of 3980 bp corresponding to SEQ ID NO: 12, with a continuous open reading frame from amino acid residue 1 to 1292 encoding a polypeptide of 1292 amino acids (SEQ ID NO: 1) with a calculated molecular mass of 142 kDa; clone FLY5 contained sequences coding from amino acid residue 116 to 1292 (see comments to SEQ ID NO: 1 and 12).
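As a rough plausibility check on the figures quoted for the C1 clone, the sketch below estimates the polypeptide mass from its length. It assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this document; the method actually used to obtain the calculated 142 kDa is not stated, and the function names are hypothetical.

```python
# Assumed rule-of-thumb average mass of one amino acid residue, in daltons.
AVERAGE_RESIDUE_MASS_DA = 110.0

def approx_mass_kda(n_residues: int, avg_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kDa from its length in residues."""
    return n_residues * avg_da / 1000.0

def coding_nucleotides(n_residues: int) -> int:
    """Nucleotides needed to encode n_residues (three per codon, stop codon not counted)."""
    return 3 * n_residues

c1_residues = 1292      # length of SEQ ID NO: 1
c1_insert_bp = 3980     # insert size reported for clone C1

print(round(approx_mass_kda(c1_residues)))              # approximate mass in kDa
print(coding_nucleotides(c1_residues))                  # coding nucleotides
print(c1_insert_bp - coding_nucleotides(c1_residues))   # remaining non-coding bp in the insert
```

Under this assumption the estimate agrees with the calculated value given for the 1292 amino acid polypeptide.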
As all seven peptides obtained from the microsequencing of the 160 kDa protein (SEQ ID NO: 2 to 8) could be found in SEQ ID NO: 1, it can be concluded that clone C1 encodes GTBP.
Example 4
The present example shows that the GTBP protein can be used as an antigen to produce highly specific antibodies which recognize GTBP but not hMSH2. PCR fragments corresponding to amino acid residues 27 to 158 of hMSH2 (SEQ ID NO: 13) and 750 to 928 of GTBP (SEQ ID NO: 14) were subcloned into the E. coli expression vector pGEX-3X (Pharmacia/LKB) and the recombinant proteins, in the form of fusion polypeptides with glutathione S-transferase, were induced and isolated as recommended by the manufacturer, except that the final concentration of IPTG was 0.25 mM and induced cultures were harvested after 6 hours at 20°C. The fusion proteins were used for immunization of New Zealand White S.P.F. female rabbits (Charles River Co.) using standard protocols. Two polyclonal antisera specifically immunoreactive to GTBP and hMSH2, respectively, were obtained and assayed as reported in Antibodies: A Laboratory Manual (1988), eds. Harlow and Lane, Cold Spring Harbor Laboratories Press (see Figures 2 and 5 for more details).
Example 5
The following example shows that GTBP belongs to a class of DNA-repair proteins conserved over a wide evolutionary range. Figure 3 shows the alignment of the amino acid sequences of the conserved C-terminal regions of the mismatch binding proteins GTBP (H. sapiens), hMSH2 (H. sapiens), MSH2 (S. cerevisiae) and MutS (E. coli). Identical residues are in black boxes, conserved ones in shaded boxes. Sequences reported in the alignment correspond to entries MSH2_YEAST (MSH2) and MUTS_ECOLI (MutS) in the SwissProt databank, or the coding region of GenBank entry HSU04045 (hMSH2). The alignment was carried out using the GCG Pileup option. The figure was generated using Prettyplot. The alignment reveals a high degree of conservation at the C-terminal domain among all the proteins. GTBP can thus be considered a new member of the MutS Homolog (MSH) family. However, GTBP must be considered structurally distinct from the MSH proteins, since the N-terminal domain (up to approximately 1000 amino acids) of GTBP diverges markedly from MSH (human, yeast or bacterial). This is particularly evident when the homology matrices of hMSH2 versus MSH2 (Figure 4 section d) and GTBP versus hMSH2 (Figure 4 section c) or GTBP versus MSH2 (Figure 4 section b) are compared to one another. In contrast, clear evidence is provided that GTBP is conserved over a wide evolutionary range and that structural homologs of GTBP through the whole sequence can also be found, e.g. in yeast (GenBank accession number Z47746, Figure 4 section a).
Example 6
The following example demonstrates that selective antisera recognize hMSH2 and GTBP bound to mismatched DNA in a complex. Figure 5 shows the effect of anti-hMSH2 and anti-GTBP antisera on the formation of the specific mismatch-binding complex. This gel-shift analysis was carried out as described (15), except that nuclear extracts were used (25). The antisera were added to the reaction mixtures 20 min prior to the radioactively-labelled probe. The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer. Pre-incubation of the HeLa nuclear extracts with either antiserum prior to the addition of the G/T heteroduplex probe resulted in the diminution of the specific band in a gel-shift assay, an effect not observed when the respective pre-immune sera were used. This result indicates that both proteins are present in the mismatch-specific factor. This finding also implies that extracts from cells lacking either protein are devoid of mismatch-binding activity.
Example 7
The following example shows that GTBP and hMSH2 can be expressed separately in a cell-free translation system. The inventors employed an hMSH2 cDNA clone (17) and the GTBP clones C1 and FLY5 as set forth in SEQ ID NO: 12. The C1 and FLY5 ORFs were introduced into pCite-2b. The hMSH2 ORF was inserted into pCite-1 (Novagen). In vitro transcription and translation reactions were carried out as described previously (26), including a mock translation reaction in the absence of added DNA. 35S-labeled translation products were analyzed on an SDS-polyacrylamide gel treated with Amplify (Amersham), dried and autoradiographed. The experiment was carried out using conditions recommended by the manufacturer. The figure is an autoradiogram of a denaturing 7.5% SDS-polyacrylamide gel. As shown in Fig. 6 section a, translation of hMSH2, GTBP (C1) and FLY5 mRNAs in a reticulocyte lysate system (Promega) gave rise to polypeptides of 113, 142 and 122 kDa respectively. Thus, translation of all three mRNAs gave rise to protein products of the expected size.
Example 8
The following example shows that GTBP binds G/T mismatches when complexed to hMSH2. This was achieved by testing the two polypeptides expressed in a cell-free translation system for their ability to bind mismatch-containing substrates. Reconstitution of the mismatch-binding activity using in vitro translated GTBP and hMSH2 is shown in Figure 6 section b. The figure shows a gel-shift analysis of the binding of the in vitro-translated proteins to the G/T heteroduplex. When GTBP and hMSH2 proteins were tested for mismatch binding activity, it was noted that expression of either protein alone had no effect on the intensity of the endogenous G/T-specific band present in the lysates at low levels. In contrast, mixing of the hMSH2 and GTBP translation products resulted in a reproducible increase in the intensity of the mismatch-specific band. This result was confirmed by using the GTBP cDNA clone FLY5, which encodes a truncated GTBP protein (see SEQ ID NO: 1 and 12). Mixing of hMSH2 and FLY5 translation products with the G/T probe gave rise to a new band with a faster electrophoretic mobility than the endogenous complex, such as would be expected of a smaller species. This experiment provides convincing evidence that the human mismatch binding complex is composed of hMSH2 and GTBP.
Gel-shift assays were performed as described in (15). 5 µl aliquots of the single in vitro translation reactions were tested; in the pre-mixing experiments, 2.5 µl of each of the two translation reactions were mixed and incubated for 15 min at room temperature before the addition of the probe. 5 mM AMP was included in all the DNA binding reactions so as to overcome the effect of ATP in the reticulocyte lysates, which prevents the formation of mismatch-specific protein/DNA complexes (16). The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer.
Genetic alterations in mismatch repair genes such as hMSH2, hMLH1, hPMS1 and hPMS2 (1) are known to cause the hypermutability found in many forms of hereditary colorectal cancer (CRC). Here we report examples showing that different CRC cell lines which display a hypermutable phenotype contain mutated GTBP alleles that are expressed as non-functional proteins. We also show that the spectrum of mutations found in these cell lines is different from that caused by the inactivation of hMSH2 or of other mismatch repair genes. The following examples confirm the role of GTBP in the maintenance of human genome integrity in vivo and provide an explanation for the mutator phenotype observed in different CRC.
Example 9
The following example shows that mismatch binding activity is absent from extracts of LoVo and DLD1 cells, both derived from human CRC. LoVo cells contain a homozygous deletion in both hMSH2 alleles (13), while neither hMSH2 allele appears to be mutated in the cell line DLD1 (19). Extracts of LoVo and DLD1 cells fail to form mismatch-specific complexes, as revealed by the gel-shift assay shown in Figure 7 section a (probes were prepared as described previously (15) and experimental conditions were as in Figure 5). The figure is an autoradiogram of a native 6% polyacrylamide gel run in TAE buffer showing the absence of specific DNA-protein complexes of the expected molecular mass in both LoVo and DLD1 extracts. Based on this it appears evident that the DLD1 cell line must be devoid of GTBP. Confirmatory results were also obtained by direct screening of LoVo and DLD1 cell extracts with specific antibodies directed against GTBP and hMSH2. As expected, western blot analysis of HeLa extracts revealed the presence of equivalent amounts of hMSH2 and GTBP. In contrast, LoVo cells could be shown to lack hMSH2, and DLD1 extracts were completely devoid of full-length GTBP (Figure 7 section b). Interestingly, the amounts of hMSH2 in DLD1 and of GTBP in LoVo extracts were considerably lower than in the HeLa extracts. Our explanation for this finding is that hMSH2 and GTBP are unstable when not in a complex (16).
Example 10
The CRC-derived cell line HCT15 contains a full-length hMSH2 protein but shows a hypermutable phenotype (19). To determine whether HCT15 had a mutation in the GTBP coding sequence, the RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase according to standard protocols (e.g., see Powell et al., New Engl. J. Med. 329, 1982, 1993). The cDNA was then amplified by PCR using primers specific for the GTBP-coding sequence. The oligonucleotides used were: primers 5'-PGAGGGTTACCCCTGG-3' and 5'-ACACTGTAAGTCTGTGTACC-3' for codons 32 to 458, primers 5'-PAGTGAAGGCCTGAACAGCC-3' and 5'-AAGTCCAGTCTTTCGAGCC-3' for codons 219 to 858, and primers 5'-PGAGAGGGTTGATACTTGCC-3' and 5'-AGAAGTCAACTCAAAGCTTCC-3' for codons 692 to 1292 (where P denotes a T7 promoter sequence and a ribosome-binding site for translation initiation (26), and codon numbers are those reported in SEQ ID NO: 1 and SEQ ID NO: 12). To detect mutations in the GTBP-coding sequence, the amplification products were first transcribed and translated in vitro using a commercial kit (Promega).
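How the three amplicons tile the coding sequence can be checked with a small interval-merging sketch. This is an illustration only; the codon ranges are those quoted above, and the helper name `merge_intervals` is hypothetical.

```python
def merge_intervals(segments):
    """Merge overlapping or adjacent inclusive (start, end) codon ranges."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range when the new one overlaps or abuts it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Codon ranges covered by the three primer pairs quoted in the text.
amplicons = [(32, 458), (219, 858), (692, 1292)]

covered = merge_intervals(amplicons)
# Codons shared by each pair of neighbouring amplicons.
overlaps = [(nxt[0], cur[1]) for cur, nxt in zip(amplicons, amplicons[1:])]

print(covered)
print(overlaps)
```

The merged result shows a single uninterrupted stretch from codon 32 to codon 1292, with neighbouring amplicons overlapping so that no codon in that range is covered only by a primer-free gap.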
Analysis of translation products on an SDS-PAGE gel revealed truncated GTBP polypeptides from two PCR products, corresponding to regions located at codons 32-458 (5' end of the gene) and 692-1292 (3' end of the gene). Sequencing of these PCR products using a commercial system (SequiTherm Polymerase, Epicentre Technologies) revealed that the truncations were due to frameshift mutations. The deletion of nucleotide 664 (a C) at codon 222 changed a leucine codon to a termination codon, and a substitution of nucleotides 3307-3312 (GATAGA) with T (see SEQ ID NO: 12) created a new termination codon several bp downstream.
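The relation between nucleotide and codon numbers in this example can be sketched as follows. The code assumes that the open reading frame starts at nucleotide 1 of SEQ ID NO: 12, an assumption that is consistent with nucleotide 664 falling in codon 222 as stated above but is not spelled out in the text; the function names are hypothetical.

```python
def codon_of(nucleotide: int) -> tuple[int, int]:
    """Return (codon number, position within codon) for a 1-based nucleotide,
    assuming the reading frame starts at nucleotide 1."""
    return (nucleotide - 1) // 3 + 1, (nucleotide - 1) % 3 + 1

def shifts_frame(removed: int, inserted: int) -> bool:
    """True when the net length change is not a multiple of three."""
    return (inserted - removed) % 3 != 0

print(codon_of(664))        # single-nucleotide deletion described in the text
print(codon_of(3307))       # first nucleotide of the GATAGA -> T substitution
print(shifts_frame(1, 0))   # deletion of one nucleotide
print(shifts_frame(6, 1))   # six nucleotides replaced by one
```

Under this assumption both changes alter the reading frame, and they fall within the 32-458 and 692-1292 amplicons respectively, the two products for which truncated polypeptides are reported.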
Example 11
MT1 is an alkylation-resistant lymphoblastoid cell line with a biochemical deficiency similar to that of HCT15 (see Goldmacher et al., J. Biol. Chem., 261, 12462, 1986; Kat et al., Proc. Natl. Acad. Sci. USA, 90, 6424, 1993). To ascertain whether MT1 had a GTBP mutation, the RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase and the cDNA was then amplified by PCR using primers specific for the GTBP-coding sequence as reported above. In vitro transcription and translation of the GTBP-coding sequence from MT1 did not reveal a truncated GTBP polypeptide after electrophoretic analysis. The coding region of GTBP was therefore sequenced and two missense mutations were found in the GTBP cDNA. The first was a GAT to GTT transversion at codon 1145 of SEQ ID NO: 1, resulting in a substitution of aspartic acid with valine. The aspartic acid at codon 1145 is located in the putative DNA-binding domain of GTBP, and the identical amino acid is found at homologous positions in GTBP (H. sapiens), hMSH2 (H. sapiens), MSH2 (S. cerevisiae) and MutS (E. coli). This highly conserved amino acid residue is therefore necessary for GTBP activity, and non-conservative substitutions at this residue cause a dramatic reduction of GTBP functionality. The second was a GTT to ATT transition, resulting in a substitution of valine with isoleucine at codon 1193 of SEQ ID NO: 1.
The amplification products were cloned in the vector pBluescript SK- and individual clones were sequenced using conventional protocols (Sequenase, USB). The two mutations were never found together in a single clone and therefore derive from separate alleles.
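The two base changes described in this example can be classified with a minimal sketch. It encodes only the three codons mentioned in the text, using the standard genetic code, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Only the codons discussed in this example (standard genetic code).
CODON_TO_AA = {"GAT": "Asp", "GTT": "Val", "ATT": "Ile"}

def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change between two codons.

    A transition swaps purine for purine or pyrimidine for pyrimidine;
    a transversion swaps a purine for a pyrimidine or vice versa.
    """
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    diffs = [(r, a) for r, a in zip(ref, alt) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one changed base")
    r, a = diffs[0]
    same_class = {r, a} <= PURINES or {r, a} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def describe(ref: str, alt: str) -> str:
    """Summarise the amino acid change and the type of base substitution."""
    return f"{CODON_TO_AA[ref]} -> {CODON_TO_AA[alt]} ({classify_substitution(ref, alt)})"

print(describe("GAT", "GTT"))  # change reported at codon 1145
print(describe("GTT", "ATT"))  # change reported at codon 1193
```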
Example 12
A tumor cell line, termed 543X (from the patient's designation), was derived from CRC and displays a hypermutable phenotype and microsatellite instability but no mutation in the mismatch repair genes so far described, including hMSH2, hMLH1, hPMS1 and hPMS2 (Liu et al., Nature Genetics 9, 48, 1995). To ascertain whether 543X had a GTBP mutation, the RNA of this cell line was reverse transcribed with random hexamers and reverse transcriptase and the cDNA was then amplified by PCR using primers specific for the GTBP-coding sequence as reported above. In vitro transcription and translation of the GTBP-coding sequence from 543X revealed a truncated GTBP polypeptide after electrophoretic analysis. The sequence of the DNA region encoding GTBP was found to contain a 1 bp insertion (a T) at nucleotide 1876 of SEQ ID NO: 12, resulting in a frameshift and a downstream termination codon. The same mutation was also identified in the tumor tissue from this patient, but not in normal colon tissue. This proves that the mutation was somatic in nature and that it did not occur after the establishment of the cell culture line.
SEQUENCE LISTING
GENERAL INFORMATION
(i) APPLICANT: ISTITUTO DI RICERCHE DI BIOLOGIA MOLECOLARE P. ANGELETTI S.p.A.
(ii) TITLE OF INVENTION: POLYPEPTIDE FOR REPAIRING GENETIC INFORMATION, NUCLEOTIDIC SEQUENCE WHICH CODES FOR IT AND PROCESS FOR THE PREPARATION THEREOF
(iii) NUMBER OF SEQUENCES: 16
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Societa Italiana Brevetti
(B) STREET: Piazza di Pietra, 39
(C) CITY: Rome
(D) COUNTRY: Italy
(E) POSTAL CODE: I-00186
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk 3.5" 1.44 MBYTES
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS Rev.5.0
(D) SOFTWARE: Microsoft Word 6.0
(viii) ATTORNEY INFORMATION
(A) NAME: DI CERBO, Mario (Dr.)
(C) REFERENCE: RM/X88551/PC-DC
(ix) TELECOMMUNICATION INFORMATION
(A) TELEPHONE: 06/6785941
(B) TELEFAX: 06/6794692
(C) TELEX: 612287 ROPAT
(1) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 1292 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: No
(iv) ANTISENSE: No
(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens
(vii) IMMEDIATE SOURCE: cDNA clone pCITE2b-C1
(ix) FEATURE: SEQ ID NO: 1 shows the 1292 amino acid sequence (in three letter code) of GTBP encoded by clone C1 (see SEQ ID NO: 12). The seven oligopeptides which were identified upon proteolytic cleavage of GTBP (see SEQ ID NO: 2 to 8) are underlined. The first amino acid residue of the peptide encoded by the FLY5 cDNA is Asn at position 116.
(A) NAME: C1
(C) IDENTIFICATION METHOD: Experimentally
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
Ala Lys Asn Leu Asn Gly Gly Leu Arg Arg Ser Val Ala Pro Ala Ala 1 5 10 15
Pro Thr Ser Cys Asp Phe Ser Pro Gly Asp Leu Val Trp Ala Lys Met 20 25 30 Glu Gly Tyr Pro Trp Trp Pro Cys Leu Val Tyr Asn His Pro Phe Asp
35 40 45
Gly Thr Phe Ile Arg Glu Lys Gly Lys Ser Val Arg Val His Val Gln
50 55 60
Phe Phe Asp Asp Ser Pro Thr Arg Gly Trp Val Ser Lys Arg Leu Leu 65 70 75 80
Lys Pro Tyr Thr Gly Ser Lys Ser Lys Glu Ala Gln Lys Gly Gly His
85 90 95
Phe Tyr Ser Ala Lys Pro Glu Ile Leu Arg Ala Met Gln Arg Ala Asp 100 105 110 Glu Ala Leu Asn Lys Asp Lys Ile Lys Arg Leu Glu Leu Ala Val Cys
115 120 125
Asp Glu Pro Ser Glu Pro Glu Glu Glu Glu Glu Met Glu Val Gly Thr
130 135 140
Thr Tyr Val Thr Asp Lys Ser Glu Glu Asp Asn Glu Ile Glu Ser Glu 145 150 155 160
Glu Glu Val Gln Pro Lys Thr Gln Gly Ser Arg Arg Ser Ser Arg Gln 165 170 175
Ile Lys Lys Arg Arg Val Ile Ser Asp Ser Glu Ser Asp Ile Gly Gly
180 185 190
Ser Asp Val Glu Phe Lys Pro Asp Thr Lys Glu Glu Gly Ser Ser Asp 195 200 205 Glu Ile Ser Ser Gly Val Gly Asp Ser Glu Ser Glu Gly Leu Asn Ser 210 215 220
Pro Val Lys Val Ala Arg Lys Arg Lys Arg Met Val Thr Gly Asn Gly 225 230 235 240
Ser Leu Lys Arg Lys Ser Ser Arg Lys Glu Thr Pro Ser Ala Thr Lys 245 250 255
Gln Ala Thr Ser Ile Ser Ser Glu Thr Lys Asn Thr Leu Arg Ala Phe
260 265 270
Ser Ala Pro Gln Asn Ser Glu Ser Gln Ala His Val Ser Gly Gly Gly 275 280 285 Asp Asp Ser Ser Arg Pro Thr Val Trp Tyr His Glu Thr Leu Glu Trp 290 295 300
Leu Lys Glu Glu Lys Arg Arg Asp Glu His Arg Arg Arg Pro Asp His 305 310 315 320
Pro Asp Phe Asp Ala Ser Thr Leu Tyr Val Pro Glu Asp Phe Leu Asn 325 330 335
Ser Cys Thr Pro Gly Met Arg Lys Trp Trp Gln Ile Lys Ser Gln Asn
340 345 350
Phe Asp Leu Val Ile Cys Tyr Lys Val Gly Lys Phe Tyr Glu Leu Tyr 355 360 365 His Met Asp Ala Leu Ile Gly Val Ser Glu Leu Gly Leu Val Phe Met 370 375 380
Lys Gly Asn Trp Ala His Ser Gly Phe Pro Glu Ile Ala Phe Gly Arg 385 390 395 400
Tyr Ser Asp Ser Leu Val Gln Lys Gly Tyr Lys Val Ala Arg Val Glu 405 410 415
Gln Thr Glu Thr Pro Glu Met Met Glu Ala Arg Cys Arg Lys Met Ala
420 425 430
His Ile Ser Lys Tyr Asp Arg Val Val Arg Arg Glu Ile Cys Arg Ile 435 440 445 Ile Thr Lys Gly Thr Gln Thr Tyr Ser Val Leu Glu Gly Asp Pro Ser 450 455 460
Glu Asn Tyr Ser Lys Tyr Leu Leu Ser Leu Lys Glu Lys Glu Glu Asp
465 470 475 480
Ser Ser Gly His Thr Arg Ala Tyr Gly Val Cys Phe Val Asp Thr Ser
485 490 495
Leu Gly Lys Phe Phe Ile Gly Gln Phe Ser Asp Asp Arg His Cys Ser 500 505 510
Arg Phe Arg Thr Leu Val Ala His Tyr Pro Pro Val Gln Val Leu Phe
515 520 525
Glu Lys Gly Asn Leu Ser Lys Glu Thr Lys Thr Ile Leu Lys Ser Ser
530 535 540 Leu Ser Cys Ser Leu Gln Glu Gly Leu Ile Pro Gly Ser Gln Phe Trp
545 550 555 560
Asp Ala Ser Lys Thr Leu Arg Thr Leu Leu Glu Glu Glu Tyr Phe Arg
565 570 575
Glu Lys Leu Ser Asp Gly Ile Gly Val Met Leu Pro Gln Val Leu Lys 580 585 590
Gly Met Thr Ser Glu Ser Asp Ser Ile Gly Leu Thr Pro Gly Glu Lys
595 600 605
Ser Glu Leu Ala Leu Ser Ala Leu Gly Gly Cys Val Phe Tyr Leu Lys
610 615 620 Lys Cys Leu Ile Asp Gln Glu Leu Leu Ser Met Ala Asn Phe Glu Glu
625 630 635 640
Tyr Ile Pro Leu Asp Ser Asp Thr Val Ser Thr Thr Arg Ser Gly Ala
645 650 655
Ile Phe Thr Lys Ala Tyr Gln Arg Met Val Leu Asp Ala Val Thr Leu 660 665 670
Asn Asn Leu Glu Ile Phe Leu Asn Gly Thr Asn Gly Ser Thr Glu Gly
675 680 685
Thr Leu Leu Glu Arg Val Asp Thr Cys His Thr Pro Phe Gly Lys Arg
690 695 700 Leu Leu Lys Gln Trp Leu Cys Ala Pro Leu Cys Asn His Tyr Ala Ile
705 710 715 720
Asn Asp Arg Leu Asp Ala Ile Glu Asp Leu Met Val Val Pro Asp Lys
725 730 735
Ile Ser Glu Val Val Glu Leu Leu Lys Lys Leu Pro Asp Leu Glu Arg 740 745 750
Leu Leu Ser Lys Ile His Asn Val Gly Ser Pro Leu Lys Ser Gln Asn 755 760 765
His Pro Asp Ser Arg Ala Ile Met Tyr Glu Glu Thr Thr Tyr Ser Lys 770 775 780
Lys Lys Ile Ile Asp Phe Leu Ser Ala Leu Glu Gly Phe Lys Val Met
785 790 795 800 Cys Lys Ile Ile Gly Ile Met Glu Glu Val Ala Asp Gly Phe Lys Ser
805 810 815
Lys Ile Leu Lys Gln Val Ile Ser Leu Gln Thr Lys Asn Pro Glu Gly
820 825 830
Arg Phe Pro Asp Leu Thr Val Glu Leu Asn Arg Trp Asp Thr Ala Phe 835 840 845
Asp His Glu Lys Ala Arg Lys Thr Gly Leu Ile Thr Pro Lys Ala Gly
850 855 860
Phe Asp Ser Asp Tyr Asp Gln Ala Leu Ala Asp Ile Arg Glu Asn Glu 865 870 875 Glu Asn 1045 1050 1055
Gly Lys Ala Tyr Cys Val Leu Val Thr Gly Pro Asn Met Gly Gly Lys
1060 1065 1070
Ser Thr Leu Met Arg Gln Ala Gly Leu Leu Ala Val Met Ala Gln Met 1075 1080 1085 Gly Cys Tyr Val Pro Ala Glu Val Cys Arg Leu Thr Pro Ile Asp Arg 1090 1095 1100
Val Phe Thr Arg Leu Gly Ala Ser Asp Arg Ile Met Ser Gly Glu Ser 1105 1110 1115 1120
Thr Phe Phe Val Glu Leu Ser Glu Thr Ala Ser Ile Leu Met His Ala 1125 1130 1135
Thr Ala His Ser Leu Val Leu Val Asp Glu Leu Gly Arg Gly Thr Ala
1140 1145 1150
Thr Phe Asp Gly Thr Ala Ile Ala Asn Ala Val Val Lys Glu Leu Ala 1155 1160 1165 Glu Thr Ile Lys Cys Arg Thr Leu Phe Ser Thr His Tyr His Ser Leu 1170 1175 1180
Val Glu Asp Tyr Ser Gln Asn Val Ala Val Arg Leu Gly His Met Ala 1185 1190 1195 1200
Cys Met Val Glu Asn Glu Cys Glu Asp Pro Ser Gln Glu Thr Ile Thr 1205 1210 1215
Phe Leu Tyr Lys Phe Ile Lys Gly Ala Cys Pro Lys Ser Tyr Gly Phe 1220 1225 1230
Asn Ala Ala Arg Leu Ala Asn Leu Pro Glu Glu Val Ile Gln Lys Gly
1235 1240 1245
His Arg Lys Ala Arg Glu Phe Glu Lys Met Asn Gln Ser Leu Arg Leu 1250 1255 1260 Phe Arg Glu Val Cys Leu Ala Ser Glu Arg Ser Thr Val Asp Ala Glu
1265 1270 1275 1280
Ala Val His Lys Leu Leu Thr Leu Ile Lys Glu Leu 1285 1290
(2) INFORMATION FOR SEQ ID NO: 2: (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 10 amino acids (B)TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: No (iv) ANTISENSE: No
(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens (ix) FEATURE: SEQ ID NO: 2 to 8 show seven oligopeptides derived from proteolytic cleavage of GTBP extracted from HeLa cells and purified as described in ref. 16 . The peptide corresponding to SEQ ID NO: 6 (18 amino acids) was selected to design two degenerate primers corresponding to the N- and C-terminal sequences of the peptide, as given in detail in SEQ ID NO: 9 and 10. (A) NAME: FR44 (C) IDENTIFICATION METHOD: Experimentally
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Val Arg Val His Val Gln Phe Phe Asp Asp 1 5 10
(3) INFORMATION FOR SEQ ID NO: 3: (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 18 amino acids (B)TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (iii) HYPOTHETICAL: No (iv) ANTISENSE: No
(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens (ix) FEATURE: see SEQ ID NO: 2 (A)NAME: FR48 (C) IDENTIFICATION METHOD: Experimentally
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: Lys Leu Pro Asp Leu Glu Arg Leu Leu Ser Lys Ile His Asn Val XXX 1 5 10 15
Ser Lys (4) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS (A) LENGTH: 13 amino acids (B)TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein (iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vi) ORIGINAL SOURCE: (A) ORGANISM: Homo sapiens
(ix) FEATURE: see SEQ ID NO: 2 (A)NAME: FR49b
(C) IDENTIFICATION METHOD: Experimentally (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: Leu Ser Arg Gly Ile Gly Val Met Leu Pro Gln Val Leu 1 5 10
(5) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS (A) LENGTH: 14 amino acids (B)TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: No
(iv) ANTISENSE: No
(vi) ORIGINAL SOURCE: (A) ORGANISM: Homo sapiens
(ix) FEATURE: see SEQ ID NO: 2 (A) NAME: FR49c
(C) IDENTIFICATION METHOD: Experimentally (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: Thr Leu Arg Thr Leu Leu Glu Glu Glu Tyr Phe Arg Glu Lys
1 5 10
(6) INFORMATION FOR SEQ ID NO : 6:
(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 18 amino acids (B)TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (iii) HYPOTHETICAL: No (iv) ANTISENSE: No
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HeLa cell extract (ix) FEATURE: see SEQ ID NO: 2 (A) NAME: FR52 (C) IDENTIFICATION METHOD: Experimentally
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: Ser Tyr Gly Phe Asn Ala Ala Arg Leu Ala Asn Leu Pro Glu Glu Val 1 5 10 15
Ile Gln
(7) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS (A) LENGTH: 13 amino acids (B)TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens (ix) FEATURE: see SEQ ID NO: 2
(A) NAME: FR59
(C) IDENTIFICATION METHOD: Experimentally (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: Asn Pro Glu Gly Arg Phe Pro Asp Leu Thr Val Glu Leu 1 5 10
(8) INFORMATION FOR SEQ ID NO: 8: (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 11 amino acids (B)TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens (ix) FEATURE: see SEQ ID NO: 2 (A)NAME: FR69
(C) IDENTIFICATION METHOD: Experimentally (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Ile Ile Asp Phe Leu Ser Ala Leu Glu Gly Phe 1 5 10
(9) INFORMATION FOR SEQ ID NO: 9 (i) SEQUENCE CHARACTERISTICS (A) LENGTH: 22 base pairs
(B)TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA (iii) HYPOTHETICAL: No
(iv) ANTISENSE: No (vii) IMMEDIATE SOURCE: oligonucleotide synthesizer
(ix) FEATURE: SEQ ID NO: 9 shows the sequence of the degenerate single-stranded DNA primer deduced from the N-terminal of the oligopeptide shown in SEQ ID NO: 6. This primer and that of SEQ ID NO: 10 were used to amplify poly-A RNA extracted from HeLa cells. The expected 67 base pair (bp) fragment was cloned in pBluescript SK- (Stratagene) and sequenced with a commercial T7-polymerase based kit (Pharmacia). The 54 bp sequence of the resulting fragment, obtained after subtraction of the engineered cloning sites, is shown as SEQ ID NO: 11. (A) NAME: oligo 5' sense (C) IDENTIFICATION METHOD: Polyacrylamide gel
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9 GCGAATTCTA YGGNTTYAAY GC 22
(10) INFORMATION FOR SEQ ID NO: 10
(i) SEQUENCE CHARACTERISTICS (A) LENGTH: 22 base pairs
(B)TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA (iii) HYPOTHETICAL: No
(iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: oligonucleotide synthesizer (ix) FEATURE: SEQ ID NO: 10 shows the sequence of the degenerate single-stranded DNA primer deduced from the C-terminus of the oligopeptide shown in SEQ ID NO: 6. Together with the primer of SEQ ID NO: 9, it was used to amplify poly-A RNA extracted from HeLa cells. The expected 67 base pair (bp) fragment was cloned in pBluescript SK- (Stratagene) and sequenced with a commercial T7 polymerase-based kit (Pharmacia). The 54 bp sequence of the resulting fragment, obtained after subtraction of the engineered cloning sites, is shown as SEQ ID NO: 11. (A) NAME: oligo 3' antisense
(C) IDENTIFICATION METHOD: Polyacrylamide gel (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10
GCGGATCCTC YTGDATNACY TC 22
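The degenerate positions in SEQ ID NO: 9 and SEQ ID NO: 10 are written with standard IUPAC nucleotide codes (Y = C or T, D = A, G or T, N = any base). As an illustrative sketch that is not part of the original listing, the following Python snippet counts how many distinct oligonucleotides each degenerate primer represents. The function name `degeneracy` and the variable names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count

sense = "GCGAATTCTA YGGNTTYAAY GC"      # SEQ ID NO: 9
antisense = "GCGGATCCTC YTGDATNACY TC"  # SEQ ID NO: 10

print(degeneracy(sense))      # Y, N, Y, Y -> 2 * 4 * 2 * 2 = 32
print(degeneracy(antisense))  # Y, D, N, Y -> 2 * 3 * 4 * 2 = 48
```

The 5' ends of the two primers contain GAATTC and GGATCC, the recognition sequences of EcoRI and BamHI, which is consistent with the engineered cloning sites mentioned in the feature text.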
(11) INFORMATION FOR SEQ ID NO: 11
(i) SEQUENCE CHARACTERISTICS (A) LENGTH: 54 base pairs (B)TYPE: nucleic acid
(C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA (iii) HYPOTHETICAL: No (iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: PCR product
(ix) FEATURE: SEQ ID NO: 11 shows the double-stranded DNA sequence encoding the oligopeptide reported in SEQ ID NO: 6, as deduced by sequencing of the cloned amplification product. This fragment was derived from PCR amplification of HeLa cDNA, using the degenerate primers described in SEQ ID NO: 9 and 10. The DNA sequence was end-labelled with ³²P by a standard kinase reaction (with T4 polynucleotide kinase, PNK, and [γ-³²P]ATP as described by Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1982) in order to generate a double-stranded DNA probe. The labelled probe was used to screen a commercial oligo dT-primed cDNA library in phage lambda (HeLa S3 UNI-ZAP XR, Stratagene). Screening of this library identified two clones hybridizing with the DNA probe. These clones were designated Cl and FLY5. (A) NAME:
(C) IDENTIFICATION METHOD: Polyacrylamide gel (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 AGCTATGGCT TTAATGCAGC AAGGCTTGCT AATCTCCCAG AGGAAGTTAT TCAA 54 (12) INFORMATION FOR SEQ ID NO: 12
(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 3980 base pairs (B)TYPE: nucleic acid (C) STRANDEDNESS: double (D)TOPOLOGY: linear
(ii) MOLECULE TYPE: synthetic DNA (iii) HYPOTHETICAL: No (iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: cDNA clone Cl (ix) FEATURE: SEQ ID NO: 12 shows the 3980 bp cDNA sequence of clone Cl. The cDNA insert of clone FLY5 spanned nucleotides 346 to 3980 of the Cl sequence as reported in SEQ ID NO: 12. (A) NAME: Cl (C) IDENTIFICATION METHOD: Polyacrylamide gel
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 GCGAAGAACC TCAACGGAGG GCTGCGGAGA TCGGTAGCGC CTGCTGCCCC CACCAGTTGT 60 GACTTCTCAC CAGGAGATTT GGTTTGGGCC AAGATGGAGG GTTACCCCTG GTGGCCTTGT 120 CTGGTTTACA ACCACCCCTT TGATGGAACA TTCATCCGCG AGAAAGGGAA ATCAGTCCGT 180 GTTCATGTAC AGTTTTTTGA TGACAGCCCA ACAAGGGGCT GGGTTAGCAA AAGGCTTTTA 240 AAGCCATATA CAGGTTCAAA ATCAAAGGAA GCCCAGAAGG GAGGTCATTT TTACAGTGCA 300 AAGCCTGAAA TACTGAGAGC AATGCAACGT GCAGATGAAG CCTTAAATAA AGACAAGATT 360 AAGAGGCTTG AATTGGCAGT TTGTGATGAG CCCTCAGAGC CAGAAGAGGA AGAAGAGATG 420 GAGGTAGGCA CAACTTACGT AACAGATAAG AGTGAAGAAG ATAATGAAAT TGAGAGTGAA 480 GAGGAAGTAC AGCCTAAGAC ACAAGGATCT AGGCGAAGTA GCCGCCAAAT AAAAAAACGA 540 AGGGTCATAT CAGATTCTGA GAGTGACATT GGTGGCTCTG ATGTGGAATT TAAGCCAGAC 600 ACTAAGGAGG AAGGAAGCAG TGATGAAATA AGCAGTGGAG TGGGGGATAG TGAGAGTGAA 660 GGCCTGAACA GCCCTGTCAA AGTTGCTCGA AAGCGGAAGA GAATGGTGAC TGGAAATGGC 720 TCTCTTAAAA GGAAAAGCTC TAGGAAGGAA ACGCCCTCAG CCACCAAACA AGCAACTAGC 780 ATTTCATCAG AAACCAAGAA TACTTTGAGA GCTTTCTCTG CCCCTCAAAA TTCTGAATCC 840 CAAGCCCACG TTAGTGGAGG TGGTGATGAC AGTAGTCGCC CTACTGTTTG GTATCATGAA 900 ACTTTAGAAT GGCTTAAGGA GGAAAAGAGA AGAGATGAGC ACAGGAGGAG GCCTGATCAC 960
CCCGATTTTG ATGCATCTAC ACTCTATGTG CCTGAGGATT TCCTCAATTC TTGTACTCCT 1020
GGGATGAGGA AGTGGTGGCA GATTAAGTCT CAGAACTTTG ATCTTGTCAT CTGTTACAAG 1080
GTGGGGAAAT TTTATGAGCT GTACCACATG GATGCTCTTA TTGGAGTCAG TGAACTGGGG 1140
CTGGTATTCA TGAAAGGCAA CTGGGCCCAT TCTGGCTTTC CTGAAATTGC ATTTGGCCGT 1200 TATTCAGATT CCCTGGTGCA GAAGGGCTAT AAAGTAGCAC GAGTGGAACA GACTGAGACT 1260
CCAGAAATGA TGGAGGCACG ATGTAGAAAG ATGGCACATA TATCCAAGTA TGATAGAGTG 1320
GTGAGGAGGG AGATCTGTAG GATCATTACC AAGGGTACAC AGACTTACAG TGTGCTGGAA 1380
GGTGATCCCT CTGAGAACTA CAGTAAGTAT CTTCTTAGCC TCAAAGAAAA AGAGGAAGAT 1440
TCTTCTGGCC ATACTCGTGC ATATGGTGTG TGCTTTGTTG ATACTTCACT GGGAAAGTTT 1500 TTCATAGGTC AGTTTTCAGA TGATCGCCAT TGTTCGAGAT TTAGGACTCT AGTGGCACAC 1560
TATCCCCCAG TACAAGTTTT ATTTGAAAAA GGAAATCTCT CAAAGGAAAC TAAAACAATT 1620
CTAAAGAGTT CATTGTCCTG TTCTCTTCAG GAAGGTCTGA TACCCGGCTC CCAGTTTTGG 1680
GATGCATCCA AAACTTTGAG AACTCTCCTT GAGGAAGAAT ATTTTAGGGA AAAGCTAAGT 1740
GATGGCATTG GGGTGATGTT ACCCCAGGTG CTTAAAGGTA TGACTTCAGA GTCTGATTCC 1800 ATTGGGTTGA CACCAGGAGA GAAAAGTGAA TTGGCCCTCT CTGCTCTAGG TGGTTGTGTC 1860
TTCTACCTCA AAAAATGCCT TATTGATCAG GAGCTTTTAT CAATGGCTAA TTTTGAAGAA 1920
TATATTCCCT TGGATTCTGA CACAGTCAGC ACTACAAGAT CTGGTGCTAT CTTCACCAAA 1980
GCCTATCAAC GAATGGTGCT AGATGCAGTG ACATTAAACA ACTTGGAGAT TTTTCTGAAT 2040
GGAACAAATG GTTCTACTGA AGGAACCCTA CTAGAGAGGG TTGATACTTG CCATACTCCT 2100 TTTGGTAAGC GGCTCCTAAA GCAATGGCTT TGTGCCCCAC TCTGTAACCA TTATGCTATT 2160
AATGATCGTC TAGATGCCAT AGAAGACCTC ATGGTTGTGC CTGACAAAAT CTCCGAAGTT 2220
GTAGAGCTTC TAAAGAAGCT TCCAGATCTT GAGAGGCTAC TCAGTAAAAT TCATAATGTT 2280
GGGTCTCCCC TGAAGAGTCA GAACCACCCA GACAGCAGGG CTATAATGTA TGAAGAAACT 2340
ACATACAGCA AGAAGAAGAT TATTGATTTT CTTTCTGCTC TGGAAGGATT CAAAGTAATG 2400 TGTAAAATTA TAGGGATCAT GGAAGAAGTT GCTGATGGTT TTAAGTCTAA AATCCTTAAG 2460
CAGGTCATCT CTCTGCAGAC AAAAAATCCT GAAGGTCGTT TTCCTGATTT GACTGTAGAA 2520
TTGAACCGAT GGGATACAGC CTTTGACCAT GAAAAGGCTC GAAAGACTGG ACTTATTACT 2580
CCCAAAGCAG GCTTTGACTC TGATTATGAC CAAGCTCTTG CTGACATAAG AGAAAATGAA 2640
CAGAGCCTCC TGGAATACCT AGAGAAACAG CGCAACAGAA TTGGCTGTAG GACCATAGTC 2700 TATTGGGGGA TTGGTAGGAA CCGTTACCAG CTGGAAATTC CTGAGAATTT CACCACTCGC 2760
AATTTGCCAG AAGAATACGA GTTGAAATCT ACCAAGAAGG GCTGTAAACG ATACTGGACC 2820
AAAACTATTG AAAAGAAGTT GGCTAATCTC ATAAATGCTG AAGAACGGAG GGATGTATCA 2880
TTGAAGGACT GCATGCGGCG ACTGTTCTAT AACTTTGATA AAAATTACAA GGACTGGCAG 2940
TCTGCTGTAG AGTGTATCGC AGTGTTGGAT GTTTTACTGT GCCTGGCTAA CTATAGTCGA 3000 GGGGGTGATG GTCCTATGTG TCGCCCAGTA ATTCTGTTGC CGGAAGATAC CCCCCCCTTC 3060
TTAGAGCTTA AAGGATCACG CCATCCTTGC ATTACGAAGA CTTTTTTTGG AGATGATTTT 3120
ATTCCTAATG ACATTCTAAT AGGCTGTGAG GAAGAGGAGC AGGAAAATGG CAAAGCCTAT 3180
TGTGTGCTTG TTACTGGACC AAATATGGGG GGCAAGTCTA CGCTTATGAG ACAGGCTGGC 3240 TTATTAGCTG TAATGGCCCA GATGGGTTGT TACGTCCCTG CTGAAGTGTG CAGGCTCACA 3300 CCAATTGATA GAGTGTTTAC TAGACTTGGT GCCTCAGACA GAATAATGTC AGGTGAAAGT 3360 ACATTTTTTG TTGAATTAAG TGAAACTGCC AGCATACTCA TGCATGCAAC AGCACATTCT 3420 CTGGTGCTTG TGGATGAATT AGGAAGAGGT ACTGCAACAT TTGATGGGAC GGCAATAGCA 3480 AATGCAGTTG TTAAAGAACT TGCTGAGACT ATAAAATGTC GTACATTATT TTCAACTCAC 3540 TACCATTCAT TAGTAGAAGA TTATTCTCAA AATGTTGCTG TGCGCCTAGG ACATATGGCA 3600 TGCATGGTAG AAAATGAATG TGAAGACCCC AGCCAGGAGA CTATTACGTT CCTCTATAAA 3660 TTCATTAAGG GAGCTTGTCC TAAAAGCTAT GGCTTTAATG CAGCAAGGCT TGCTAATCTC 3720 CCAGAGGAAG TTATTCAAAA GGGACATAGA AAAGCAAGAG AATTTGAGAA GATGAATCAG 3780 TCACTACGAT TATTTCGGGA AGTTTGCCTG GCTAGTGAAA GGTCAACTGT AGATGCTGAA 3840 GCTGTCCATA AATTGCTGAC TTTGATTAAG GAATTATAGA CTGACTACAT TGGAAGCTTT 3900 GAGTTGACTT CTGACCAAAG GTGGTAAATT CAGACAACAT TATGATCTAA TAAACTTTAT 3960 TTTTTAAAAA TGAAAAAAAA 3980
(13) INFORMATION FOR SEQ ID NO: 13
(i) SEQUENCE CHARACTERISTICS (A)LENGTH: 394 base pairs (B)TYPE: nucleic acid (C)STRANDEDNESS: double
(D)TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA (iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vii) IMMEDIATE SOURCE: Homo sapiens
(ix) FEATURE: SEQ ID NO: 13 shows the double-stranded DNA sequence used to express an internal domain of hMSH2 (corresponding to amino acid residues 27 to 158) in the expression vector pGEX-3x (see also legend to Figure 2) .
(A)NAME: GST/hMSH2
(C) IDENTIFICATION METHOD: Polyacrylamide gel (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 GGAGAAGCCG ACCACCACAG TGCGCCTTTT CGACCGGGGC GACTTCTATA CGGCGCACGG 60 CGAGGACGCG CTGCTGGCCG CCCGGGAGGT GTTCAAGACC CAGGGGGTGA TCAAGTACAT 120 GGGGCCGGCA GGAGCAAAGA ATCTGCAGAG TGTTGTGCTT AGTAAAATGA ATTTTGAATC 180 TTTTGTAAAA GATCTTCTTC TGGTTCGTCA GTATAGAGTT GAAGTTTATA AGAATAGAGC 240
TGGAAATAAG GCATCCAAGG AGAATGATTG GTATTTGGCA TATAAGGCTT CTCCTGGCAA 300
TCTCTCTCAG TTTGAAGACA TTCTCTTTGG TAACAATGAT ATGTCAGCTT CCATTGGTGT 360
TGTGGGTGTT AAAATGTCCG CAGTTGATGG CCAG 394
(14) INFORMATION FOR SEQ ID NO: 14 (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 534 base pairs (B)TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA
(iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vii) IMMEDIATE SOURCE:
(ix) FEATURE: SEQ ID NO: 14 shows the double-stranded DNA sequence used to express an internal domain of GTBP (corresponding to amino acid residues 750 to 928) in the expression vector pGEX-3x (see also legend to Figure 2). (A) NAME: GST/GTBP (C) IDENTIFICATION METHOD: Polyacrylamide gel
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 CTTGAGAGGC TACTCAGTAA AATTCATAAT GTTGGGTCTC CCCTGAAAGT CAGAACCACC 60 CAGACAGCAG GGCTATAATG TATGAAGAAA CTACATACAG CAAGAAGAAG ATTATTGATT 120 TTCTTTCTGC TCTGGAAGGA TTCAAAGTAA TGTGTAAAAT TATAGGGATC ATGGAAGAAG 180 TTGCTGATGG TTTTAAGTCT AAAATCCTTA AGCAGGTCAT CTCTCTGCAG ACAAAAAATC 240
CTGAAGGTCG TTTTCCTGAT TTGACTGTAG AATTGAACCG ATGGGATACA GCCTTTGACC 300 ATGAAAAGGC TCGAAAGACT GGACTTATTA CTCCCAAAGC AGGCTTTGAC TCTGATTATG 360 ACCAAGCTCT TGCTGACATA AGAGAAAATG AACAGAGCCT CCTGGAATAC CTAGAGAAAC 420 AGCGCAACAG AATTGGCTGT AGGACCATAG TCTATGGATT GGTAGGAACC GTTACGCAGC 480 TGGAAATTCC TGAGAATTTC ACCACTCGCA ATTTGCCAGA AGAATACGAG TTGA 534
(15) INFORMATION FOR SEQ ID NO: 15 (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 68 amino acids (B)TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: No (iv) ANTISENSE: No (vi) ORIGINAL SOURCE:
(A)ORGANISM: Homo sapiens (vii) IMMEDIATE SOURCE: cDNA of clone KMN
(ix) FEATURE: SEQ ID NO: 15 shows the amino-terminal sequence of 68 amino acids of GTBP encoded by the clone TASNR2A1 (see SEQ ID NO: 16 for the corresponding nucleotide encoding sequence). The amino acid sequence SEQ ID NO: 15 (corresponding to residues 1-68) must be placed in front of the amino acid in position 1 of the sequence given in SEQ ID NO: 1 (corresponding to 1292 residues) to obtain the complete GTBP sequence of 1360 amino acids.
(A)NAME: KMN
(C) IDENTIFICATION METHOD: experimental (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15
Met Ser Arg Gln Ser Thr Leu Tyr Ser Phe Phe Pro Lys Ser Pro Ala 1 5 10 15
Lys Ser Asp Ala Met Lys Ala Ser Ala Arg Ala Ser Arg Glu Gly Gly
20 25 30
Arg Ala Ala Ala Ala Pro Glu Ala Ser Pro Ser Pro Gly Gly Asp Ala 35 40 45 Ala Tyr Ser Glu Ala Gly Pro Gly Pro Arg Pro Leu Ala Arg Ser Ala
50 55 60
Ser Pro Pro Lys 65
(16) INFORMATION FOR SEQ ID NO: 16 (i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 204 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: synthetic DNA
(iii) HYPOTHETICAL: No (iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: cDNA of clone KMN (ix) FEATURE: SEQ ID NO: 16 shows the double-stranded DNA sequence obtained using the RACE method (Rapid Amplification of cDNA Ends), used to establish the 5'-terminal sequence of GTBP cDNA encoding the amino-terminal region of the protein GTBP as indicated in SEQ ID NO: 15. The nucleotide sequence SEQ ID NO: 16 (corresponding to 204 residues) must be positioned in front of the nucleotide in position 1 of the sequence given in SEQ ID NO: 12 (corresponding to 3980 residues) in order to obtain the complete GTBP-encoding sequence of 4080 nucleotides. (A) NAME: KMN (C) IDENTIFICATION METHOD: Polyacrylamide gel
(xi) SEQUENCE DESCRIPTION : SEQ ID NO : 16 ATGTCGCGAC AGAGCACCCT GTACAGCTTC TTCCCCAACT CTCCGGCGCT GAGTGATGCC 60 AACAAGGCCT CGGCCAGGGC CTCACGCGAA GGCGGCCGTG CCGCCGCTGC CCCCGAGGCC 120 TCTCCTTCCC CAGGCGGGAA TGCGGCCTGG AGCGAGGCTG GGCCTGGGCC CAGGCCCTTG 180 GCGCGATCCG CGTCACCGCC CAAG 204
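The lengths quoted in the feature text for the amino-terminal peptide and the RACE-derived nucleotide sequence can be cross-checked with simple arithmetic. The sketch below is illustrative only. It assumes that the 4080-nucleotide figure counts coding nucleotides (three per residue, stop codon excluded) rather than the full length of the joined cDNA sequences; the variable names are hypothetical.

```python
# Lengths stated in the sequence listing.
N_TERMINAL_RESIDUES = 68   # amino-terminal peptide, SEQ ID NO: 15
SEQ1_RESIDUES = 1292       # SEQ ID NO: 1
RACE_NUCLEOTIDES = 204     # RACE-derived 5' sequence, SEQ ID NO: 16
C1_NUCLEOTIDES = 3980      # cDNA clone, SEQ ID NO: 12

full_protein = N_TERMINAL_RESIDUES + SEQ1_RESIDUES
# Assumption: three nucleotides per residue, stop codon not counted.
coding_nucleotides = 3 * full_protein
joined_cdna = RACE_NUCLEOTIDES + C1_NUCLEOTIDES

print(full_protein)        # 1360
print(coding_nucleotides)  # 4080
print(joined_cdna)         # 4184

# The 204 nucleotides correspond exactly to 68 codons.
assert RACE_NUCLEOTIDES == 3 * N_TERMINAL_RESIDUES
```

Under this reading, the 104-nucleotide difference between the joined cDNA (4184) and the coding figure (4080) would presumably be accounted for by the stop codon and the 3' untranslated sequence at the end of SEQ ID NO: 12.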
Claims
1. An isolated polypeptide, wherein said polypeptide comprises: (1) a first sequence corresponding to GTBP as set forth by combining the amino acid sequences set forth in SEQ ID NO: 15 and SEQ ID NO: 1; (2) a second sequence wherein said second sequence is a subsequence of said first sequence and is at least 4 amino acids; (3) a third sequence in which at least one amino acid is replaced by a different amino acid.
2. The polypeptide of Claim 1 complexed to a second polypeptide.
3. The polypeptide complex of Claim 2, wherein said second polypeptide is hMSH2.
4. An isolated polypeptide according to claim 1, comprising the amino acid sequences from amino acid 1 to 68 of SEQ ID NO: 15 and from amino acid 1 to 1292 of SEQ ID NO: 1, or in any case sequences within the combination of SEQ ID NO: 15 and SEQ ID NO: 1 (for example SEQ ID NO: 2 to SEQ ID NO: 8).
5. An isolated DNA or RNA molecule, wherein said molecule comprises:
(1) a first sequence encoding GTBP as set forth by combining SEQ ID NO:16 and SEQ ID NO: 12;
(2) a second sequence, wherein said second sequence is a subsequence of said first sequence and is at least 10 nucleotides in length;
(3) a third sequence in which at least one nucleotide of said first or second sequence is replaced by a different nucleotide; or (4) a fourth sequence complementary to any of said first, second, or third sequences; with the provisos that (1) if said molecule is an RNA molecule, U replaces T in said sequence of said molecule, (2) said third sequence is at least 95% identical to said first or second sequence, and (3) said second sequence is not present in hMSH2 cDNA.
6. The molecule of Claim 5, wherein said molecule comprises said first sequence.
7. The molecule of Claim 5, wherein said molecule comprises said second sequence.
8. The molecule of Claim 5, wherein said molecule comprises said third sequence.
9. The molecule of Claim 5, wherein said molecule comprises a cDNA sequence.
10. The molecule of Claim 5, wherein said molecule consists essentially of DNA encoding GTBP.
11. The molecule of Claim 5, wherein the RNA or DNA encoding GTBP is naturally occurring.
12. An expression vector containing the molecule of Claim 5.
13. A cell transformed with the molecule of Claim 5.
14. The cell of Claim 13, wherein said molecule is DNA and said DNA is arranged in operative association with an expression control sequence capable of directing replication and expression of said DNA.
15. The cell according to Claim 13, wherein said cell is a eukaryotic or prokaryotic cell including animal, fungal or bacterial cell.
16. A process for producing GTBP protein comprising culturing a cell of Claim 13 in a suitable culture medium and isolating said GTBP protein from said cell.
17. A polypeptide made according to the process of Claim 16.
18. A method for identifying agents which inhibit or enhance GTBP activity as detectable by in vitro multi- or dimerization assays, DNA-binding assays and mismatch repair assays.
19. A method of identifying GTBP-modulating agents, comprising: (1) performing a heterodimerization that includes a GTBP polypeptide, hMSH2 and an agent, and (2) detecting whether the agent modulates heterodimerization.
20. The method of Claim 19, wherein the heterodimerization assay comprises an in vitro binding reaction.
21. A preparation of specific antibodies immunoreactive with GTBP and not substantially immunoreactive with other proteins unrelated to GTBP.
22. A method of purification of GTBP or GTBP-complexing molecules involving the use of specific antibodies of Claim 21.
23. A method of purification of GTBP or GTBP-complexing molecules based on specific interaction between GTBP and nucleic acid recognition sequences.
24. A method of detecting the presence of a genetic defect that has the potential of causing tumorigenesis in a human, which comprises: identifying a mutation of a GTBP gene of said human, wherein said mutation results in a GTBP gene sequence different from the wild-type human GTBP-coding DNA sequence as set forth by combining SEQ ID NO: 16 and SEQ ID NO: 12.
25. A method of detecting the presence of a genetic defect that causes cancer in a human, which comprises: identifying a mutation of a GTBP gene of said human, wherein said mutation provides a GTBP gene sequence different from human GTBP DNA sequence as set forth by combining SEQ ID NO:16 and SEQ ID NO: 12, that changes the sequence of a protein product of said GTBP gene, or that causes the GTBP product to be truncated or that results in said GTBP gene not being transcribed or translated.
26. A method of diagnosing or prognosing a neoplastic tissue of a human comprising: identifying the presence of a mutation of a GTBP gene or its expression product in said tissue of said human patient, wherein said mutation provides a GTBP gene sequence different from human GTBP DNA sequence as set forth by combining SEQ ID NO: 12 and SEQ ID NO: 16, said alteration indicating neoplasia of the tissue.
27. The methods of Claims 24-26, wherein said mutations result in a change in the sequence of a protein product of said GTBP gene.
28. The methods of Claims 24-26, wherein said mutations result in said GTBP gene not being transcribed or translated.
29. The methods of Claims 24-26, wherein said mutations create stop codons in said GTBP gene.
30. The methods of Claims 24-26, wherein said methods comprise Polymerase Chain Reaction (PCR) amplification of at least a segment of said GTBP gene.
31. The methods of Claims 24-26, wherein said methods comprise identifying a change in a restriction site as a result of said mutation.
32. The methods of Claims 24-26, wherein said methods comprise restriction fragment length polymorphism analysis, allele-specific oligonucleotide hybridization or nucleotide sequencing.
33. The methods of Claims 24-26, wherein said methods classify said human as homozygous for said GTBP gene or for said mutated GTBP gene or heterozygous for said GTBP gene and said mutated GTBP gene.
34. The methods of Claims 24-26 wherein the expression products are mRNA molecules.
35. The methods of Claims 24-26 wherein the loss of wild-type GTBP coding sequence is detected by Northern hybridization of mRNA molecules extracted from cells or tissues.
36. The methods of Claims 24-26 wherein the loss of wild-type GTBP is detected by Southern hybridization of a GTBP DNA probe to genomic DNA of said human patient.
37. The methods of Claims 24-26 wherein the loss of wild-type GTBP gene is detected by identifying a mismatch between nucleic acids including (1) mRNA molecules of said human patient and (2) a nucleic acid complementary to human wild-type GTBP coding sequence, when molecules 1 and 2 are hybridized with each other and form a duplex.
38. The methods of Claims 24-26 wherein the loss of wild-type gene is detected by gene cloning and sequencing of cloned DNA.
39. The methods of Claims 24-26 wherein the loss of wild-type GTBP gene is detected by screening for point mutations and deletion or insertion mutations.
40. The methods of Claims 24-26 wherein the expression products are protein molecules.
41. The methods of Claims 24-26 wherein the loss of wild-type GTBP is detected by immunoblotting, e.g. Western blotting.
42. The methods of Claims 24-26 wherein the alteration of wild type GTBP is detected by immunoenzymology and immunocytochemistry.
43. The method of Claims 24-26 wherein the alteration of wild-type GTBP is detected by binding interactions between said GTBP protein and a second cellular protein.
44. The method of Claim 43 wherein the second cellular protein is hMSH2.
45. A method for generating transgenic animals carrying mutant GTBP alleles.
46. A pharmaceutical composition useful in the treatment of GTBP-dependent diseases comprising a therapeutically effective amount of GTBP in a pharmaceutically acceptable vehicle.
47. A method for supplying wild-type GTBP gene function to a cell which has altered GTBP, said gene function being lost by virtue of a mutation in a GTBP gene comprising: introducing full-length or part of GTBP gene in a cell which has lost such gene function such that said full-length or part of GTBP gene are expressed in the cell and encode full-length or part of the GTBP protein which is capable of complementing the genetic defect at the basis of neoplastic disease.
48. A method for supplying wild-type GTBP gene function to a cell which has altered GTBP, said gene function being lost by virtue of a mutation in a GTBP gene comprising introducing into a cell a molecule which mimics the effect of GTBP alone or complexed with other molecules.
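As an illustration of two sequence comparisons referred to in the claims (the 95% identity proviso of Claim 5 and the mutations creating stop codons of Claim 29), a minimal Python sketch follows. It is not part of the claims and does not describe the applicants' method. The reference is the 54 bp fragment of SEQ ID NO: 11; the mutant sequence and the function names are hypothetical, and identity is computed by position-wise comparison of equal-length sequences, which is only one possible definition.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

def percent_identity(a: str, b: str) -> float:
    """Position-wise identity of two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def first_stop_codon(seq: str) -> Optional[int]:
    """Return the 1-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None

# Reference: SEQ ID NO: 11 (54 bp, reading frame starting at position 1).
wild_type = "AGCTATGGCTTTAATGCAGCAAGGCTTGCTAATCTCCCAGAGGAAGTTATTCAA"

# Hypothetical mutant: T -> A at position 6 turns codon 2 (TAT, Tyr) into TAA.
mutant = wild_type[:5] + "A" + wild_type[6:]

print(first_stop_codon(wild_type))                # None
print(first_stop_codon(mutant))                   # 2
print(percent_identity(wild_type, mutant) >= 95)  # True (53 of 54 positions match)
```

A single-base substitution can therefore leave a sequence well above the 95% identity threshold while truncating the encoded product, which is why the two checks are shown separately.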
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU62412/96A AU6241296A (en) | 1995-06-27 | 1996-06-27 | Polypeptide for repairing genetic information, nucleotidic sequence which codes for it and process for the preparation thereof (guanine thymine binding protein - gtbp) |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ITRM95A000434 | 1995-06-27 | ||
IT95RM000434A IT1278118B1 (en) | 1995-06-27 | 1995-06-27 | POLYPEPTIDE FOR THE REPAIR OF GENETIC INFORMATION, NUCLEOTIDE SEQUENCE THAT CODES FOR IT AND PROCEDURE FOR ITS |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1997001634A2 true WO1997001634A2 (en) | 1997-01-16 |
WO1997001634A3 WO1997001634A3 (en) | 1997-04-03 |
Family
ID=11403448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IT1996/000131 WO1997001634A2 (en) | 1995-06-27 | 1996-06-27 | Polypeptide for repairing genetic information, nucleotidic sequence which codes for it and process for the preparation thereof (guanine thymine binding protein - gtbp) |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU6241296A (en) |
IT (1) | IT1278118B1 (en) |
WO (1) | WO1997001634A2 (en) |
-
1995
- 1995-06-27 IT IT95RM000434A patent/IT1278118B1/en active IP Right Grant
-
1996
- 1996-06-27 AU AU62412/96A patent/AU6241296A/en not_active Abandoned
- 1996-06-27 WO PCT/IT1996/000131 patent/WO1997001634A2/en active Application Filing
Non-Patent Citations (10)
Title |
---|
GENOMICS, vol. 31, no. 3, 1 February 1996, pages 395-397, XP000613760 N.C. NICOLAIDES ET AL.: "Molecular cloning of the N-terminus of GTBP" * |
JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 267, no. 33, 25 November 1992, pages 23876-23882, XP000615520 M.J. HUGHES ET AL.: "The purification of a human mismatch-binding protein and identification of its associated ATPase and helicase activities." cited in the application * |
NATURE MEDICINE, vol. 2, no. 2, February 1996, pages 169-174, XP000615507 B. LIU ET AL.: "Analysis of mismatch repair genes in hereditary non-polyposis colorectal cancer patients." * |
NATURE, vol. 367, 3 February 1994, page 417 XP000615506 F. PALOMBO ET AL.: "Mismatch repair and cancer." cited in the application * |
PROC. NATL. ACAD. SCI. USA, vol. 85, no. 23, December 1988, pages 8860-8864, XP000615500 J. JIRICNY: "A human 200-kDa protein binds selectively to DNA fragments containing G.T mismatches." cited in the application * |
PROC. NATL. ACAD. SCI. USA., vol. 91, September 1994, pages 8905-8909, XP000615501 G. AQUILINA ET AL.: "A mismatch recognition defect in colon carcinoma confers DNA microsatellite instability and a mutator phenotype." cited in the application * |
SCIENCE, vol. 268, 30 June 1995, pages 1912-1914, XP000615504 F. PALOMBO ET AL.: "GTBP, a 160-Kilodalton protein essential for mismatch-binding activity in human cells." * |
SCIENCE, vol. 268, 30 June 1995, pages 1915-1917, XP000615505 N. PAPADOPOULOS ET AL.: "Mutations of GTBP in genetically unstable cells." * |
THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 264, no. 35, 15 December 1989, pages 21177-21182, XP000371702 C. STEPHENSON ET AL.: "Selective binding to DNA base pair mismatches by proteins from human cells." * |
THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 269, no. 20, 20 May 1994, pages 14367-14370, XP000615519 A. UMAR ET AL.: "Defective mismatch repair in extracts of colorectal and endometrial cancer cell lines exhibiting microsatellite instability." cited in the application * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999019492A3 (en) * | 1997-10-10 | 1999-06-24 | Rhone Poulenc Agrochimie | Methods for obtaining plant varieties |
US6734019B1 (en) | 1997-10-10 | 2004-05-11 | Aventis Cropscience S.A | Isolated DNA that encodes an Arabidopsis thaliana MSH3 protein involved in DNA mismatch repair and a method of modifying the mismatch repair system in a plant transformed with the isolated DNA |
WO2002020783A1 (en) * | 2000-06-07 | 2002-03-14 | Shanghai Biowindow Gene Development Inc. | Novel polypeptide--- the human base mismatch repair protein 13.2 and polynucleotide encoding it |
CN119285699A (en) * | 2024-11-13 | 2025-01-10 | 盐城工学院 | Walnut meal-derived antibacterial polypeptide and its application |
Also Published As
Publication number | Publication date |
---|---|
ITRM950434A1 (en) | 1996-12-27 |
WO1997001634A3 (en) | 1997-04-03 |
ITRM950434A0 (en) | 1995-06-27 |
IT1278118B1 (en) | 1997-11-17 |
AU6241296A (en) | 1997-01-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AU CA CN JP MX US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: CA |