WO1996001320A2 - Complete genomic sequence of Autographa californica nuclear polyhedrosis virus C6
- Publication number
- WO1996001320A2 (application PCT/IB1995/000578)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- acnpv
- orf
- virus
- gene
- Prior art date
Links
- 241000201370 Autographa californica nucleopolyhedrovirus Species 0.000 title abstract description 76
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 355
- 241000700605 Viruses Species 0.000 claims abstract description 151
- 241000701447 unidentified baculovirus Species 0.000 claims abstract description 49
- 238000013518 transcription Methods 0.000 claims abstract description 35
- 230000035897 transcription Effects 0.000 claims abstract description 35
- 102000004169 proteins and genes Human genes 0.000 claims description 80
- 241000238631 Hexapoda Species 0.000 claims description 76
- 210000004027 cell Anatomy 0.000 claims description 69
- 108091026890 Coding region Proteins 0.000 claims description 54
- 238000000034 method Methods 0.000 claims description 43
- 239000013604 expression vector Substances 0.000 claims description 39
- 108020004705 Codon Proteins 0.000 claims description 35
- 150000001413 amino acids Chemical class 0.000 claims description 31
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 27
- 102000004196 processed proteins & peptides Human genes 0.000 claims description 26
- 229920001184 polypeptide Polymers 0.000 claims description 24
- 230000010076 replication Effects 0.000 claims description 19
- 238000004519 manufacturing process Methods 0.000 claims description 17
- 102000040430 polynucleotide Human genes 0.000 claims description 9
- 108091033319 polynucleotide Proteins 0.000 claims description 9
- 239000002157 polynucleotide Substances 0.000 claims description 9
- 102100038132 Endogenous retrovirus group K member 6 Pro protein Human genes 0.000 claims description 8
- 101000953580 Pseudomonas phage Pf1 8.6 kDa protein Proteins 0.000 claims description 8
- 101000953577 Pseudomonas phage Pf3 7.9 kDa protein Proteins 0.000 claims description 8
- 108091005804 Peptidases Proteins 0.000 claims description 7
- 101000850960 Pseudomonas phage Pf1 3.2 kDa protein Proteins 0.000 claims description 7
- -1 ORF 32 Proteins 0.000 claims description 6
- 239000004365 Protease Substances 0.000 claims description 6
- 108020003175 receptors Proteins 0.000 claims description 5
- 102000005962 receptors Human genes 0.000 claims description 5
- 241000193388 Bacillus thuringiensis Species 0.000 claims description 4
- 101710091045 Envelope protein Proteins 0.000 claims description 4
- 108700028146 Genetic Enhancer Elements Proteins 0.000 claims description 4
- 241000700721 Hepatitis B virus Species 0.000 claims description 4
- 101710138657 Neurotoxin Proteins 0.000 claims description 4
- 101710188315 Protein X Proteins 0.000 claims description 4
- 241000239226 Scorpiones Species 0.000 claims description 4
- 229940097012 bacillus thuringiensis Drugs 0.000 claims description 4
- 239000002581 neurotoxin Substances 0.000 claims description 4
- 231100000618 neurotoxin Toxicity 0.000 claims description 4
- 239000002243 precursor Substances 0.000 claims description 4
- 230000002194 synthesizing effect Effects 0.000 claims description 4
- 101800000385 Transmembrane protein Proteins 0.000 claims description 3
- 239000005556 hormone Substances 0.000 claims description 3
- 229940088597 hormone Drugs 0.000 claims description 3
- 101710132601 Capsid protein Proteins 0.000 claims description 2
- 101710151559 Crystal protein Proteins 0.000 claims description 2
- 102000003951 Erythropoietin Human genes 0.000 claims description 2
- 108090000394 Erythropoietin Proteins 0.000 claims description 2
- 101710177291 Gag polyprotein Proteins 0.000 claims description 2
- 101001010573 Heliothis virescens Juvenile hormone esterase Proteins 0.000 claims description 2
- 101001111439 Homo sapiens Beta-nerve growth factor Proteins 0.000 claims description 2
- 101100005713 Homo sapiens CD4 gene Proteins 0.000 claims description 2
- 101001033280 Homo sapiens Cytokine receptor common subunit beta Proteins 0.000 claims description 2
- 101000987586 Homo sapiens Eosinophil peroxidase Proteins 0.000 claims description 2
- 101000920686 Homo sapiens Erythropoietin Proteins 0.000 claims description 2
- 101001002657 Homo sapiens Interleukin-2 Proteins 0.000 claims description 2
- 101001076408 Homo sapiens Interleukin-6 Proteins 0.000 claims description 2
- 101000611183 Homo sapiens Tumor necrosis factor Proteins 0.000 claims description 2
- 108090000144 Human Proteins Proteins 0.000 claims description 2
- 102000003839 Human Proteins Human genes 0.000 claims description 2
- 241000713772 Human immunodeficiency virus 1 Species 0.000 claims description 2
- 241000713340 Human immunodeficiency virus 2 Species 0.000 claims description 2
- 102000006992 Interferon-alpha Human genes 0.000 claims description 2
- 108010047761 Interferon-alpha Proteins 0.000 claims description 2
- 102000014150 Interferons Human genes 0.000 claims description 2
- 108010050904 Interferons Proteins 0.000 claims description 2
- 101710125418 Major capsid protein Proteins 0.000 claims description 2
- 241000255908 Manduca sexta Species 0.000 claims description 2
- 101000879976 Manduca sexta Eclosion hormone Proteins 0.000 claims description 2
- 241001481690 Mesobuthus eupeus Species 0.000 claims description 2
- 108091000080 Phosphotransferase Proteins 0.000 claims description 2
- 101710192141 Protein Nef Proteins 0.000 claims description 2
- 101710150344 Protein Rev Proteins 0.000 claims description 2
- 101800001271 Surface protein Proteins 0.000 claims description 2
- 108010015780 Viral Core Proteins Proteins 0.000 claims description 2
- 239000000427 antigen Substances 0.000 claims description 2
- 108091007433 antigens Proteins 0.000 claims description 2
- 102000036639 antigens Human genes 0.000 claims description 2
- 238000013320 baculovirus expression vector system Methods 0.000 claims description 2
- 239000002158 endotoxin Substances 0.000 claims description 2
- 229940105423 erythropoietin Drugs 0.000 claims description 2
- 108010027225 gag-pol Fusion Proteins Proteins 0.000 claims description 2
- 102000055647 human CSF2RB Human genes 0.000 claims description 2
- 102000044890 human EPO Human genes 0.000 claims description 2
- 102000055277 human IL2 Human genes 0.000 claims description 2
- 102000057041 human TNF Human genes 0.000 claims description 2
- 229940116886 human interleukin-6 Drugs 0.000 claims description 2
- 230000010354 integration Effects 0.000 claims description 2
- 229940079322 interferon Drugs 0.000 claims description 2
- 102000020233 phosphotransferase Human genes 0.000 claims description 2
- 108010089520 pol Gene Products Proteins 0.000 claims description 2
- OXCMYAYHXIHQOA-UHFFFAOYSA-N potassium;[2-butyl-5-chloro-3-[[4-[2-(1,2,4-triaza-3-azanidacyclopenta-1,4-dien-5-yl)phenyl]phenyl]methyl]imidazol-4-yl]methanol Chemical compound [K+].CCCCC1=NC(Cl)=C(CO)N1CC1=CC=C(C=2C(=CC=CC=2)C2=N[N-]N=N2)C=C1 OXCMYAYHXIHQOA-UHFFFAOYSA-N 0.000 claims description 2
- 241000714260 Human T-lymphotropic virus 1 Species 0.000 claims 4
- 241001219494 Androctonus australis hector Species 0.000 claims 1
- 101000960969 Homo sapiens Interleukin-5 Proteins 0.000 claims 1
- 102000003978 Tissue Plasminogen Activator Human genes 0.000 claims 1
- 108090000373 Tissue Plasminogen Activator Proteins 0.000 claims 1
- 229960000187 tissue plasminogen activator Drugs 0.000 claims 1
- 108700026244 Open Reading Frames Proteins 0.000 abstract description 213
- 239000002773 nucleotide Substances 0.000 abstract description 45
- 125000003729 nucleotide group Chemical group 0.000 abstract description 45
- 238000004458 analytical method Methods 0.000 abstract description 34
- 230000006870 function Effects 0.000 abstract description 32
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 abstract description 9
- 229930182817 methionine Natural products 0.000 abstract description 9
- 230000004543 DNA replication Effects 0.000 abstract description 7
- 230000000977 initiatory effect Effects 0.000 abstract description 7
- 230000033228 biological regulation Effects 0.000 abstract description 5
- 108020005202 Viral DNA Proteins 0.000 abstract description 4
- 108020004414 DNA Proteins 0.000 description 104
- 235000018102 proteins Nutrition 0.000 description 75
- 239000012634 fragment Substances 0.000 description 53
- 230000014509 gene expression Effects 0.000 description 42
- 208000015181 infectious disease Diseases 0.000 description 40
- 101150066555 lacZ gene Proteins 0.000 description 40
- 108091028043 Nucleic acid sequence Proteins 0.000 description 32
- 230000029812 viral genome replication Effects 0.000 description 30
- 238000004113 cell culture Methods 0.000 description 28
- 230000002458 infectious effect Effects 0.000 description 28
- 101710182846 Polyhedrin Proteins 0.000 description 24
- 241000256251 Spodoptera frugiperda Species 0.000 description 24
- 108091081024 Start codon Proteins 0.000 description 23
- 235000001014 amino acid Nutrition 0.000 description 23
- 239000013612 plasmid Substances 0.000 description 23
- 230000014616 translation Effects 0.000 description 23
- 241001367049 Autographa Species 0.000 description 22
- 229940024606 amino acid Drugs 0.000 description 22
- 108091008146 restriction endonucleases Proteins 0.000 description 21
- 238000013519 translation Methods 0.000 description 21
- 239000013598 vector Substances 0.000 description 21
- KDCGOANMDULRCW-UHFFFAOYSA-N 7H-purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 20
- 108091007065 BIRCs Proteins 0.000 description 20
- 230000027455 binding Effects 0.000 description 19
- 230000014621 translational initiation Effects 0.000 description 19
- 230000003612 virological effect Effects 0.000 description 19
- 230000000692 anti-sense effect Effects 0.000 description 17
- 239000002299 complementary DNA Substances 0.000 description 16
- 238000011144 upstream manufacturing Methods 0.000 description 16
- 108020004999 messenger RNA Proteins 0.000 description 15
- 230000004048 modification Effects 0.000 description 15
- 238000012986 modification Methods 0.000 description 15
- 101100144928 Autographa californica nuclear polyhedrosis virus PNK/PNL gene Proteins 0.000 description 14
- 102000055031 Inhibitor of Apoptosis Proteins Human genes 0.000 description 14
- 108010022172 Chitinases Proteins 0.000 description 13
- 108010005774 beta-Galactosidase Proteins 0.000 description 13
- 108700026226 TATA Box Proteins 0.000 description 12
- 230000002068 genetic effect Effects 0.000 description 12
- 239000000203 mixture Substances 0.000 description 12
- 241000894007 species Species 0.000 description 12
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 11
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 11
- 108700039887 Essential Genes Proteins 0.000 description 11
- 239000013600 plasmid vector Substances 0.000 description 11
- 241000701366 unidentified nuclear polyhedrosis viruses Species 0.000 description 11
- 102000012286 Chitinases Human genes 0.000 description 10
- 108700010070 Codon Usage Proteins 0.000 description 10
- 102000004190 Enzymes Human genes 0.000 description 10
- 108090000790 Enzymes Proteins 0.000 description 10
- 241000588724 Escherichia coli Species 0.000 description 10
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 10
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 10
- 230000002939 deleterious effect Effects 0.000 description 10
- 230000004927 fusion Effects 0.000 description 10
- 229920000669 heparin Polymers 0.000 description 10
- 229960002897 heparin Drugs 0.000 description 10
- 230000008569 process Effects 0.000 description 10
- 239000011701 zinc Substances 0.000 description 10
- 229910052725 zinc Inorganic materials 0.000 description 10
- HTTJABKRGRZYRN-UHFFFAOYSA-N Heparin Chemical compound OC1C(NC(=O)C)C(O)OC(COS(O)(=O)=O)C1OC1C(OS(O)(=O)=O)C(O)C(OC2C(C(OS(O)(=O)=O)C(OC3C(C(O)C(O)C(O3)C(O)=O)OS(O)(=O)=O)C(CO)O2)NS(O)(=O)=O)C(C(O)=O)O1 HTTJABKRGRZYRN-UHFFFAOYSA-N 0.000 description 9
- 108060004795 Methyltransferase Proteins 0.000 description 9
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 9
- 239000011159 matrix material Substances 0.000 description 9
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 8
- 102000009572 RNA Polymerase II Human genes 0.000 description 8
- 108010009460 RNA Polymerase II Proteins 0.000 description 8
- 108700009124 Transcription Initiation Site Proteins 0.000 description 8
- 108091023040 Transcription factor Proteins 0.000 description 8
- 238000003780 insertion Methods 0.000 description 8
- 230000037431 insertion Effects 0.000 description 8
- 108700003860 Bacterial Genes Proteins 0.000 description 7
- 102000053602 DNA Human genes 0.000 description 7
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 7
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 7
- 102000005936 beta-Galactosidase Human genes 0.000 description 7
- 238000013461 design Methods 0.000 description 7
- 238000011161 development Methods 0.000 description 7
- 230000018109 developmental process Effects 0.000 description 7
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 6
- 108050007372 Fibroblast Growth Factor Proteins 0.000 description 6
- 102000018233 Fibroblast Growth Factor Human genes 0.000 description 6
- 101150099406 GTA gene Proteins 0.000 description 6
- 101710141347 Major envelope glycoprotein Proteins 0.000 description 6
- 108010076504 Protein Sorting Signals Proteins 0.000 description 6
- 238000012300 Sequence Analysis Methods 0.000 description 6
- 102000040945 Transcription factor Human genes 0.000 description 6
- 230000006907 apoptotic process Effects 0.000 description 6
- 102000039446 nucleic acids Human genes 0.000 description 6
- 108020004707 nucleic acids Proteins 0.000 description 6
- 150000007523 nucleic acids Chemical class 0.000 description 6
- 230000001105 regulatory effect Effects 0.000 description 6
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 5
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 5
- 108060002716 Exonuclease Proteins 0.000 description 5
- 102100025243 Myeloid cell surface antigen CD33 Human genes 0.000 description 5
- 230000000295 complement effect Effects 0.000 description 5
- 102000013165 exonuclease Human genes 0.000 description 5
- 239000002917 insecticide Substances 0.000 description 5
- 239000002245 particle Substances 0.000 description 5
- 101000818108 Acholeplasma phage L2 Uncharacterized 81.3 kDa protein Proteins 0.000 description 4
- 101000743047 Autographa californica nuclear polyhedrosis virus Protein AC23 Proteins 0.000 description 4
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 4
- 101710197780 E3 ubiquitin-protein ligase LAP Proteins 0.000 description 4
- 101000912350 Haemophilus phage HP1 (strain HP1c1) DNA N-6-adenine-methyltransferase Proteins 0.000 description 4
- 108700005087 Homeobox Genes Proteins 0.000 description 4
- 101000879661 Homo sapiens Chitotriosidase-1 Proteins 0.000 description 4
- 101000790844 Klebsiella pneumoniae Uncharacterized 24.8 kDa protein in cps region Proteins 0.000 description 4
- XUJNEKJLAYXESH-REOHCLBHSA-N L-Cysteine Chemical compound SC[C@H](N)C(O)=O XUJNEKJLAYXESH-REOHCLBHSA-N 0.000 description 4
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 4
- 108091081548 Palindromic sequence Proteins 0.000 description 4
- 108700005077 Viral Genes Proteins 0.000 description 4
- 241000607479 Yersinia pestis Species 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 210000004748 cultured cell Anatomy 0.000 description 4
- 235000018417 cysteine Nutrition 0.000 description 4
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical class C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 4
- 238000012217 deletion Methods 0.000 description 4
- 230000037430 deletion Effects 0.000 description 4
- 108010013770 ecdysteroid UDP-glucosyltransferase Proteins 0.000 description 4
- 230000002708 enhancing effect Effects 0.000 description 4
- 239000003112 inhibitor Substances 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 239000000575 pesticide Substances 0.000 description 4
- 238000002360 preparation method Methods 0.000 description 4
- 230000002123 temporal effect Effects 0.000 description 4
- 230000002103 transcriptional effect Effects 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 101000748781 Anthoceros angustus Uncharacterized 3.0 kDa protein in psbT-psbN intergenic region Proteins 0.000 description 3
- 239000004475 Arginine Substances 0.000 description 3
- 101100335652 Autographa californica nuclear polyhedrosis virus GP64 gene Proteins 0.000 description 3
- 108010084457 Cathepsins Proteins 0.000 description 3
- 102100037328 Chitotriosidase-1 Human genes 0.000 description 3
- 108091035707 Consensus sequence Proteins 0.000 description 3
- 101000792449 Cyanophora paradoxa Uncharacterized 3.4 kDa protein in atpE-petA intergenic region Proteins 0.000 description 3
- 108050006400 Cyclin Proteins 0.000 description 3
- 238000001712 DNA sequencing Methods 0.000 description 3
- 108700020911 DNA-Binding Proteins Proteins 0.000 description 3
- 101001117015 Escherichia coli (strain K12) Poly(A) polymerase I Proteins 0.000 description 3
- 108700024394 Exon Proteins 0.000 description 3
- 102000003886 Glycoproteins Human genes 0.000 description 3
- 108090000288 Glycoproteins Proteins 0.000 description 3
- 101000702559 Homo sapiens Probable global transcription activator SNF2L2 Proteins 0.000 description 3
- 101000702545 Homo sapiens Transcription activator BRG1 Proteins 0.000 description 3
- 101150032161 IAP1 gene Proteins 0.000 description 3
- 102100024319 Intestinal-type alkaline phosphatase Human genes 0.000 description 3
- 101000626970 Marchantia polymorpha Uncharacterized 3.3 kDa protein in psbT-psbN intergenic region Proteins 0.000 description 3
- 101150092861 ORF71 gene Proteins 0.000 description 3
- 101150071814 ORF86 gene Proteins 0.000 description 3
- 108091034117 Oligonucleotide Proteins 0.000 description 3
- 101100263767 Orgyia pseudotsugata multicapsid polyhedrosis virus GP16 gene Proteins 0.000 description 3
- 101100317133 Orgyia pseudotsugata multicapsid polyhedrosis virus p91 gene Proteins 0.000 description 3
- 101150027323 PCNP gene Proteins 0.000 description 3
- 108010029182 Pectin lyase Proteins 0.000 description 3
- 108010021757 Polynucleotide 5'-Hydroxyl-Kinase Proteins 0.000 description 3
- 102000008422 Polynucleotide 5'-hydroxyl-kinase Human genes 0.000 description 3
- 102100036691 Proliferating cell nuclear antigen Human genes 0.000 description 3
- 102000001253 Protein Kinase Human genes 0.000 description 3
- 101100346651 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) MSS18 gene Proteins 0.000 description 3
- 241000700584 Simplexvirus Species 0.000 description 3
- 108020004682 Single-Stranded DNA Proteins 0.000 description 3
- 108020005038 Terminator Codon Proteins 0.000 description 3
- 101710183015 Trans-activating transcriptional regulatory protein Proteins 0.000 description 3
- 102000006290 Transcription Factor TFIID Human genes 0.000 description 3
- 108010083268 Transcription Factor TFIID Proteins 0.000 description 3
- 102100031027 Transcription activator BRG1 Human genes 0.000 description 3
- 101100166027 Trichoplusia ni ascovirus 2c MCP-2 gene Proteins 0.000 description 3
- 101000764204 Trieres chinensis Uncharacterized 3.3 kDa protein in rpl11-trnW intergenic region Proteins 0.000 description 3
- 108020000999 Viral RNA Proteins 0.000 description 3
- 230000002378 acidificating effect Effects 0.000 description 3
- 238000013459 approach Methods 0.000 description 3
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000009089 cytolysis Effects 0.000 description 3
- 238000009795 derivation Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000006386 neutralization reaction Methods 0.000 description 3
- 230000008520 organization Effects 0.000 description 3
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 3
- 230000000644 propagated effect Effects 0.000 description 3
- 108060006633 protein kinase Proteins 0.000 description 3
- 238000011084 recovery Methods 0.000 description 3
- 238000002741 site-directed mutagenesis Methods 0.000 description 3
- 238000006467 substitution reaction Methods 0.000 description 3
- 230000005026 transcription initiation Effects 0.000 description 3
- 101800003158 5 kDa peptide Proteins 0.000 description 2
- 101150113556 ALK-EXO gene Proteins 0.000 description 2
- 101000748061 Acholeplasma phage L2 Uncharacterized 16.1 kDa protein Proteins 0.000 description 2
- 101000827329 Acholeplasma phage L2 Uncharacterized 26.1 kDa protein Proteins 0.000 description 2
- 102100021569 Apoptosis regulator Bcl-2 Human genes 0.000 description 2
- 241001203868 Autographa californica Species 0.000 description 2
- 101100495846 Autographa californica nuclear polyhedrosis virus CHIA gene Proteins 0.000 description 2
- 101000781183 Autographa californica nuclear polyhedrosis virus Uncharacterized 20.4 kDa protein in IAP1-SOD intergenic region Proteins 0.000 description 2
- 108091012583 BCL2 Proteins 0.000 description 2
- 241000409811 Bombyx mori nucleopolyhedrovirus Species 0.000 description 2
- 102000005600 Cathepsins Human genes 0.000 description 2
- 101000947615 Clostridium perfringens Uncharacterized 38.4 kDa protein Proteins 0.000 description 2
- 102100033195 DNA ligase 4 Human genes 0.000 description 2
- 230000004568 DNA-binding Effects 0.000 description 2
- 241000255581 Drosophila <fruit fly, genus> Species 0.000 description 2
- 101150093002 EGT gene Proteins 0.000 description 2
- 101000964391 Enterococcus faecalis UPF0145 protein Proteins 0.000 description 2
- 101100066648 Escherichia phage T5 D17 gene Proteins 0.000 description 2
- 241000206602 Eukaryota Species 0.000 description 2
- 101150021185 FGF gene Proteins 0.000 description 2
- 239000004471 Glycine Substances 0.000 description 2
- 101000748063 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 11.1 kDa protein in rep-hol intergenic region Proteins 0.000 description 2
- 101000818057 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 14.9 kDa protein in rep-hol intergenic region Proteins 0.000 description 2
- 101000927810 Homo sapiens DNA ligase 4 Proteins 0.000 description 2
- 101150118344 IAP2 gene Proteins 0.000 description 2
- 101001015100 Klebsiella pneumoniae UDP-glucose:undecaprenyl-phosphate glucose-1-phosphate transferase Proteins 0.000 description 2
- 101000790840 Klebsiella pneumoniae Uncharacterized 49.5 kDa protein in cps region Proteins 0.000 description 2
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 2
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 2
- 101150038414 LAP gene Proteins 0.000 description 2
- 102400000401 Latency-associated peptide Human genes 0.000 description 2
- 102000003960 Ligases Human genes 0.000 description 2
- 108090000364 Ligases Proteins 0.000 description 2
- 101150034674 ME53 gene Proteins 0.000 description 2
- 101000788492 Marchantia polymorpha Uncharacterized mitochondrial protein ymf28 Proteins 0.000 description 2
- 101150083029 ORF147 gene Proteins 0.000 description 2
- 101150077302 ORF88 gene Proteins 0.000 description 2
- 101100281854 Orgyia pseudotsugata multicapsid polyhedrosis virus GP64 gene Proteins 0.000 description 2
- 101100028042 Orgyia pseudotsugata multicapsid polyhedrosis virus OPEP-3 gene Proteins 0.000 description 2
- 101100484850 Orgyia pseudotsugata multicapsid polyhedrosis virus P15 gene Proteins 0.000 description 2
- 101150030083 PE38 gene Proteins 0.000 description 2
- 101150051210 PK2 gene Proteins 0.000 description 2
- 102000035195 Peptidases Human genes 0.000 description 2
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 2
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 2
- 101710092489 Protein kinase 2 Proteins 0.000 description 2
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 2
- 101710086015 RNA ligase Proteins 0.000 description 2
- 108700008625 Reporter Genes Proteins 0.000 description 2
- 108091058545 Secretory proteins Proteins 0.000 description 2
- 102000040739 Secretory proteins Human genes 0.000 description 2
- 101000992423 Severe acute respiratory syndrome coronavirus 2 Putative ORF9c protein Proteins 0.000 description 2
- 101000953979 Streptomyces lividans Uncharacterized 6.6 kDa protein Proteins 0.000 description 2
- 241000255993 Trichoplusia ni Species 0.000 description 2
- 101710172411 Uncharacterized protein ycf68 Proteins 0.000 description 2
- 239000012190 activator Substances 0.000 description 2
- 230000009418 agronomic effect Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- ZWPRYVATYZPCDP-UHFFFAOYSA-M bis(dibutylamino)methylidene-dibutylazanium;fluoride Chemical compound [F-].CCCCN(CCCC)C(N(CCCC)CCCC)=[N+](CCCC)CCCC ZWPRYVATYZPCDP-UHFFFAOYSA-M 0.000 description 2
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 2
- 230000015556 catabolic process Effects 0.000 description 2
- 230000030833 cell death Effects 0.000 description 2
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 description 2
- 150000001945 cysteines Chemical class 0.000 description 2
- 230000002950 deficient Effects 0.000 description 2
- 238000006731 degradation reaction Methods 0.000 description 2
- 230000001066 destructive effect Effects 0.000 description 2
- 230000029087 digestion Effects 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- 230000008030 elimination Effects 0.000 description 2
- 238000003379 elimination reaction Methods 0.000 description 2
- 239000003623 enhancer Substances 0.000 description 2
- 230000007613 environmental effect Effects 0.000 description 2
- 238000001976 enzyme digestion Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 229940126864 fibroblast growth factor Drugs 0.000 description 2
- 230000012010 growth Effects 0.000 description 2
- 239000003102 growth factor Substances 0.000 description 2
- 239000003228 hemolysin Substances 0.000 description 2
- 238000009396 hybridization Methods 0.000 description 2
- 101150100002 iap gene Proteins 0.000 description 2
- 238000001727 in vivo Methods 0.000 description 2
- 230000002401 inhibitory effect Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000011835 investigation Methods 0.000 description 2
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 2
- 229960000310 isoleucine Drugs 0.000 description 2
- 238000005304 joining Methods 0.000 description 2
- 230000001418 larval effect Effects 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 230000007935 neutral effect Effects 0.000 description 2
- 230000002018 overexpression Effects 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 230000008488 polyadenylation Effects 0.000 description 2
- 150000003212 purines Chemical class 0.000 description 2
- 230000006798 recombination Effects 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 230000003252 repetitive effect Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000028327 secretion Effects 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- 230000004083 survival effect Effects 0.000 description 2
- 230000009261 transgenic effect Effects 0.000 description 2
- HLXHCNWEVQNNKA-UHFFFAOYSA-N 5-methoxy-2,3-dihydro-1h-inden-2-amine Chemical group COC1=CC=C2CC(N)CC2=C1 HLXHCNWEVQNNKA-UHFFFAOYSA-N 0.000 description 1
- 101150044182 8 gene Proteins 0.000 description 1
- 101150023956 ALK gene Proteins 0.000 description 1
- 101710115267 ATP synthase protein MI25 Proteins 0.000 description 1
- 101000621943 Acholeplasma phage L2 Probable integrase/recombinase Proteins 0.000 description 1
- 101000977065 Acidithiobacillus ferridurans Uncharacterized 11.6 kDa protein in mobS 3'region Proteins 0.000 description 1
- 101000618348 Allochromatium vinosum (strain ATCC 17899 / DSM 180 / NBRC 103801 / NCIMB 10441 / D) Uncharacterized protein Alvin_0065 Proteins 0.000 description 1
- 101800002638 Alpha-amanitin Proteins 0.000 description 1
- 102000013455 Amyloid beta-Peptides Human genes 0.000 description 1
- 108010090849 Amyloid beta-Peptides Proteins 0.000 description 1
- 241000239239 Androctonus Species 0.000 description 1
- 108700031308 Antennapedia Homeodomain Proteins 0.000 description 1
- 101100064323 Arabidopsis thaliana DTX47 gene Proteins 0.000 description 1
- 101100214862 Autographa californica nuclear polyhedrosis virus AC152 gene Proteins 0.000 description 1
- 101100171547 Autographa californica nuclear polyhedrosis virus E27 gene Proteins 0.000 description 1
- 101100394491 Autographa californica nuclear polyhedrosis virus HE65 gene Proteins 0.000 description 1
- 101100070304 Autographa californica nuclear polyhedrosis virus HELI gene Proteins 0.000 description 1
- 101100127793 Autographa californica nuclear polyhedrosis virus LEF-4 gene Proteins 0.000 description 1
- 101100135329 Autographa californica nuclear polyhedrosis virus P6.9 gene Proteins 0.000 description 1
- 101100351191 Autographa californica nuclear polyhedrosis virus PCNA gene Proteins 0.000 description 1
- 101000781117 Autographa californica nuclear polyhedrosis virus Uncharacterized 12.4 kDa protein in CTL-LEF2 intergenic region Proteins 0.000 description 1
- 101000666833 Autographa californica nuclear polyhedrosis virus Uncharacterized 20.8 kDa protein in FGF-VUBI intergenic region Proteins 0.000 description 1
- 101000847476 Autographa californica nuclear polyhedrosis virus Uncharacterized 54.7 kDa protein in IAP1-SOD intergenic region Proteins 0.000 description 1
- 101000708323 Azospirillum brasilense Uncharacterized 28.8 kDa protein in nifR3-like 5'region Proteins 0.000 description 1
- 101000977027 Azospirillum brasilense Uncharacterized protein in nodG 5'region Proteins 0.000 description 1
- 101000770311 Azotobacter chroococcum mcd 1 Uncharacterized 19.8 kDa protein in nifW 5'region Proteins 0.000 description 1
- 101000748761 Bacillus subtilis (strain 168) Uncharacterized MFS-type transporter YcxA Proteins 0.000 description 1
- 101000736075 Bacillus subtilis (strain 168) Uncharacterized protein YcbP Proteins 0.000 description 1
- 101000765620 Bacillus subtilis (strain 168) Uncharacterized protein YlxP Proteins 0.000 description 1
- 101000916134 Bacillus subtilis (strain 168) Uncharacterized protein YqxJ Proteins 0.000 description 1
- 101000962005 Bacillus thuringiensis Uncharacterized 23.6 kDa protein Proteins 0.000 description 1
- 102000051819 Baculoviral IAP Repeat-Containing 3 Human genes 0.000 description 1
- 108700003785 Baculoviral IAP Repeat-Containing 3 Proteins 0.000 description 1
- 241000701412 Baculoviridae Species 0.000 description 1
- 101000754349 Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251) UPF0065 protein BP0148 Proteins 0.000 description 1
- 241000701083 Bovine alphaherpesvirus 1 Species 0.000 description 1
- 101100054773 Caenorhabditis elegans act-2 gene Proteins 0.000 description 1
- 101000827633 Caldicellulosiruptor sp. (strain Rt8B.4) Uncharacterized 23.9 kDa protein in xynA 3'region Proteins 0.000 description 1
- 241001164374 Calyx Species 0.000 description 1
- 108090000565 Capsid Proteins Proteins 0.000 description 1
- 102100023321 Ceruloplasmin Human genes 0.000 description 1
- 101000748765 Chlorella vulgaris Uncharacterized 16.5 kDa protein in psaC-atpA intergenic region Proteins 0.000 description 1
- 241000255942 Choristoneura fumiferana Species 0.000 description 1
- 108091062157 Cis-regulatory element Proteins 0.000 description 1
- 101000947628 Claviceps purpurea Uncharacterized 11.8 kDa protein Proteins 0.000 description 1
- 241000193403 Clostridium Species 0.000 description 1
- 241000193468 Clostridium perfringens Species 0.000 description 1
- 101000686796 Clostridium perfringens Replication protein Proteins 0.000 description 1
- 206010010144 Completed suicide Diseases 0.000 description 1
- 101000861180 Cupriavidus necator (strain ATCC 17699 / DSM 428 / KCTC 22496 / NCIMB 10442 / H16 / Stanier 337) Uncharacterized protein H16_B0147 Proteins 0.000 description 1
- 101000764209 Cyanophora paradoxa Uncharacterized 11.2 kDa protein in ycf23-apcF intergenic region Proteins 0.000 description 1
- 201000003883 Cystic fibrosis Diseases 0.000 description 1
- 108010061982 DNA Ligases Proteins 0.000 description 1
- 102000012410 DNA Ligases Human genes 0.000 description 1
- 230000003682 DNA packaging effect Effects 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 101100499270 Drosophila melanogaster Diap1 gene Proteins 0.000 description 1
- 101000785191 Drosophila melanogaster Uncharacterized 50 kDa protein in type I retrotransposable element R1DM Proteins 0.000 description 1
- 101150112474 EXO gene Proteins 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 101000791598 Enterobacteria phage 82 Uncharacterized protein in rusA 5'region Proteins 0.000 description 1
- 101000747704 Enterobacteria phage N4 Uncharacterized protein Gp1 Proteins 0.000 description 1
- 241000701832 Enterobacteria phage T3 Species 0.000 description 1
- 101000861206 Enterococcus faecalis (strain ATCC 700802 / V583) Uncharacterized protein EF_A0048 Proteins 0.000 description 1
- 101900264058 Escherichia coli Beta-galactosidase Proteins 0.000 description 1
- 101000769180 Escherichia coli Uncharacterized 11.1 kDa protein Proteins 0.000 description 1
- 101000788129 Escherichia coli Uncharacterized protein in sul1 3'region Proteins 0.000 description 1
- 101000788370 Escherichia phage P2 Uncharacterized 12.9 kDa protein in GpA 3'region Proteins 0.000 description 1
- 241001524679 Escherichia virus M13 Species 0.000 description 1
- 101710086766 FP protein Proteins 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 108010011145 Fushi Tarazu Transcription Factors Proteins 0.000 description 1
- 241000255896 Galleria mellonella Species 0.000 description 1
- 241000951956 Galleria mellonella MNPV Species 0.000 description 1
- 101100272587 Gallus gallus ITA gene Proteins 0.000 description 1
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 1
- 101000787096 Geobacillus stearothermophilus Uncharacterized protein in gldA 3'region Proteins 0.000 description 1
- 108010017213 Granulocyte-Macrophage Colony-Stimulating Factor Proteins 0.000 description 1
- 102100039620 Granulocyte-macrophage colony-stimulating factor Human genes 0.000 description 1
- 101000626971 Guillardia theta Uncharacterized 8.1 kDa protein Proteins 0.000 description 1
- 101001066788 Haemophilus phage HP1 (strain HP1c1) Probable portal protein Proteins 0.000 description 1
- 101000743335 Haemophilus phage HP1 (strain HP1c1) Probable terminase, endonuclease subunit Proteins 0.000 description 1
- 101000976893 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 14.1 kDa protein in cox-rep intergenic region Proteins 0.000 description 1
- 101000976889 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 19.2 kDa protein in cox-rep intergenic region Proteins 0.000 description 1
- 101000708358 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 23.3 kDa protein in lys 3'region Proteins 0.000 description 1
- 101000786921 Haemophilus phage HP1 (strain HP1c1) Uncharacterized 26.0 kDa protein in rep-hol intergenic region Proteins 0.000 description 1
- 229920002971 Heparan sulfate Polymers 0.000 description 1
- 101000748192 Herpetosiphon aurantiacus Uncharacterized 15.4 kDa protein in HgiDIIM 5'region Proteins 0.000 description 1
- 101000929495 Homo sapiens Adenosine deaminase Proteins 0.000 description 1
- 101000836540 Homo sapiens Aldo-keto reductase family 1 member B1 Proteins 0.000 description 1
- 101000771674 Homo sapiens Apolipoprotein E Proteins 0.000 description 1
- 101000959437 Homo sapiens Beta-2 adrenergic receptor Proteins 0.000 description 1
- 101000746373 Homo sapiens Granulocyte-macrophage colony-stimulating factor Proteins 0.000 description 1
- 101000820589 Homo sapiens Succinate-hydroxymethylglutarate CoA-transferase Proteins 0.000 description 1
- 241000700588 Human alphaherpesvirus 1 Species 0.000 description 1
- 241000725303 Human immunodeficiency virus Species 0.000 description 1
- 201000001096 IGSF1 deficiency syndrome Diseases 0.000 description 1
- 108091029795 Intergenic region Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 101000827627 Klebsiella pneumoniae Putative low molecular weight protein-tyrosine-phosphatase Proteins 0.000 description 1
- 101000790838 Klebsiella pneumoniae UPF0053 protein in cps region Proteins 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- 101000976301 Leptospira interrogans Uncharacterized 35 kDa protein in sph 3'region Proteins 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- 101000788487 Marchantia polymorpha Uncharacterized mitochondrial protein ymf25 Proteins 0.000 description 1
- 101000747938 Marchantia polymorpha Uncharacterized mitochondrial protein ymf31 Proteins 0.000 description 1
- 101001130841 Middle East respiratory syndrome-related coronavirus (isolate United Kingdom/H123990006/2012) Non-structural protein ORF5 Proteins 0.000 description 1
- 101100446506 Mus musculus Fgf3 gene Proteins 0.000 description 1
- 101000658690 Neisseria meningitidis serogroup B Transposase for insertion sequence element IS1106 Proteins 0.000 description 1
- 101100289047 Novosphingobium sp. (strain KA1) ligU gene Proteins 0.000 description 1
- 108020003217 Nuclear RNA Proteins 0.000 description 1
- 102000043141 Nuclear RNA Human genes 0.000 description 1
- 101150089976 ORF144 gene Proteins 0.000 description 1
- 101150075249 ORF40 gene Proteins 0.000 description 1
- 101150050790 ORF49 gene Proteins 0.000 description 1
- 101710087110 ORF6 protein Proteins 0.000 description 1
- 101150080573 ORF90 gene Proteins 0.000 description 1
- 101150034596 ORF95 gene Proteins 0.000 description 1
- 241001465800 Orgyia Species 0.000 description 1
- 101100181496 Orgyia pseudotsugata multicapsid polyhedrosis virus LEF-5 gene Proteins 0.000 description 1
- 101100181498 Orgyia pseudotsugata multicapsid polyhedrosis virus LEF-7 gene Proteins 0.000 description 1
- 101100372859 Orgyia pseudotsugata multicapsid polyhedrosis virus P25 gene Proteins 0.000 description 1
- 101100428663 Orgyia pseudotsugata multicapsid polyhedrosis virus P39 gene Proteins 0.000 description 1
- 101100463342 Orgyia pseudotsugata multicapsid polyhedrosis virus PE38 gene Proteins 0.000 description 1
- 101000770899 Orgyia pseudotsugata multicapsid polyhedrosis virus Uncharacterized 24.3 kDa protein Proteins 0.000 description 1
- 101100064055 Ostreid herpesvirus 1 (isolate France) ORF100 gene Proteins 0.000 description 1
- 101100103570 Ostreid herpesvirus 1 (isolate France) ORF123 gene Proteins 0.000 description 1
- 101150110481 PNK/PNL gene Proteins 0.000 description 1
- 101100156835 Paenarthrobacter nicotinovorans xdh gene Proteins 0.000 description 1
- 241000500437 Plutella xylostella Species 0.000 description 1
- 101710093543 Probable non-specific lipid-transfer protein Proteins 0.000 description 1
- 102000002727 Protein Tyrosine Phosphatase Human genes 0.000 description 1
- 108090000412 Protein-Tyrosine Kinases Proteins 0.000 description 1
- 102000004022 Protein-Tyrosine Kinases Human genes 0.000 description 1
- 102000052575 Proto-Oncogene Human genes 0.000 description 1
- 108700020978 Proto-Oncogene Proteins 0.000 description 1
- 101000748660 Pseudomonas savastanoi Uncharacterized 21 kDa protein in iaaL 5'region Proteins 0.000 description 1
- 241000238706 Pyemotes Species 0.000 description 1
- 241001456341 Rachiplusia ou Species 0.000 description 1
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 1
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 1
- 241001068263 Replication competent viruses Species 0.000 description 1
- 101000974028 Rhizobium leguminosarum bv. viciae (strain 3841) Putative cystathionine beta-lyase Proteins 0.000 description 1
- 101000756519 Rhodobacter capsulatus (strain ATCC BAA-309 / NBRC 16581 / SB1003) Uncharacterized protein RCAP_rcc00048 Proteins 0.000 description 1
- 101000757825 Rhodobacter capsulatus (strain ATCC BAA-309 / NBRC 16581 / SB1003) Uncharacterized protein RCAP_rcc01784 Proteins 0.000 description 1
- 101000748499 Rhodobacter capsulatus Uncharacterized 104.1 kDa protein in hypE 3'region Proteins 0.000 description 1
- 101000948219 Rhodococcus erythropolis Uncharacterized 11.5 kDa protein in thcD 3'region Proteins 0.000 description 1
- 101000584469 Rice tungro bacilliform virus (isolate Philippines) Protein P1 Proteins 0.000 description 1
- 241000702670 Rotavirus Species 0.000 description 1
- RXGJTYFDKOHJHK-UHFFFAOYSA-N S-deoxo-amaninamide Natural products CCC(C)C1NC(=O)CNC(=O)C2Cc3c(SCC(NC(=O)CNC1=O)C(=O)NC(CC(=O)N)C(=O)N4CC(O)CC4C(=O)NC(C(C)C(O)CO)C(=O)N2)[nH]c5ccccc35 RXGJTYFDKOHJHK-UHFFFAOYSA-N 0.000 description 1
- 101150112782 SNF2 gene Proteins 0.000 description 1
- 101000953093 Salmonella phage P22 Uncharacterized 9.0 kDa protein in gp15-gp3 intergenic region Proteins 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- BQCADISMDOOEFD-UHFFFAOYSA-N Silver Chemical compound [Ag] BQCADISMDOOEFD-UHFFFAOYSA-N 0.000 description 1
- 101000818096 Spirochaeta aurantia Uncharacterized 15.5 kDa protein in trpE 3'region Proteins 0.000 description 1
- 241000931755 Spodoptera exempta Species 0.000 description 1
- 241000985245 Spodoptera litura Species 0.000 description 1
- 101000936711 Streptococcus gordonii Accessory secretory protein Asp4 Proteins 0.000 description 1
- 101000766081 Streptomyces ambofaciens Uncharacterized HTH-type transcriptional regulator in unstable DNA locus Proteins 0.000 description 1
- 101000929863 Streptomyces cinnamonensis Monensin polyketide synthase putative ketoacyl reductase Proteins 0.000 description 1
- 101000788468 Streptomyces coelicolor Uncharacterized protein in mprR 3'region Proteins 0.000 description 1
- 101000845085 Streptomyces violaceoruber Granaticin polyketide synthase putative ketoacyl reductase 1 Proteins 0.000 description 1
- 102100021652 Succinate-hydroxymethylglutarate CoA-transferase Human genes 0.000 description 1
- 241000701093 Suid alphaherpesvirus 1 Species 0.000 description 1
- 206010042566 Superinfection Diseases 0.000 description 1
- 101000804403 Synechococcus elongatus (strain PCC 7942 / FACHB-805) Uncharacterized HIT-like protein Synpcc7942_1390 Proteins 0.000 description 1
- 101000750910 Synechococcus elongatus (strain PCC 7942 / FACHB-805) Uncharacterized HTH-type transcriptional regulator Synpcc7942_2319 Proteins 0.000 description 1
- 101000644897 Synechococcus sp. (strain ATCC 27264 / PCC 7002 / PR-6) Uncharacterized protein SYNPCC7002_B0001 Proteins 0.000 description 1
- 210000001744 T-lymphocyte Anatomy 0.000 description 1
- 101000711771 Thiocystis violacea Uncharacterized 76.5 kDa protein in phbC 3'region Proteins 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 101800001690 Transmembrane protein gp41 Proteins 0.000 description 1
- 101000768114 Triticum aestivum Uncharacterized protein ycf70 Proteins 0.000 description 1
- 101710134973 Uncharacterized 9.7 kDa protein in cox-rep intergenic region Proteins 0.000 description 1
- 101710095001 Uncharacterized protein in nifU 5'region Proteins 0.000 description 1
- 101000711318 Vibrio alginolyticus Uncharacterized 11.6 kDa protein in scrR 3'region Proteins 0.000 description 1
- 241001672648 Vieira Species 0.000 description 1
- 108010067390 Viral Proteins Proteins 0.000 description 1
- 108010087302 Viral Structural Proteins Proteins 0.000 description 1
- 241000282485 Vulpes vulpes Species 0.000 description 1
- 208000028265 X-linked central congenital hypothyroidism with late-onset testicular enlargement Diseases 0.000 description 1
- 101000916336 Xenopus laevis Transposon TX1 uncharacterized 82 kDa protein Proteins 0.000 description 1
- 101001000760 Zea mays Putative Pol polyprotein from transposon element Bs1 Proteins 0.000 description 1
- 101000678262 Zymomonas mobilis subsp. mobilis (strain ATCC 10988 / DSM 424 / LMG 404 / NCIMB 8938 / NRRL B-806 / ZM1) 65 kDa protein Proteins 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 238000000246 agarose gel electrophoresis Methods 0.000 description 1
- 239000004007 alpha amanitin Substances 0.000 description 1
- CIORWBWIBBPXCG-SXZCQOKQSA-N alpha-amanitin Chemical compound O=C1N[C@@H](CC(N)=O)C(=O)N2C[C@H](O)C[C@H]2C(=O)N[C@@H]([C@@H](C)[C@@H](O)CO)C(=O)N[C@@H](C2)C(=O)NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)N[C@H]1C[S@@](=O)C1=C2C2=CC=C(O)C=C2N1 CIORWBWIBBPXCG-SXZCQOKQSA-N 0.000 description 1
- CIORWBWIBBPXCG-UHFFFAOYSA-N alpha-amanitin Natural products O=C1NC(CC(N)=O)C(=O)N2CC(O)CC2C(=O)NC(C(C)C(O)CO)C(=O)NC(C2)C(=O)NCC(=O)NC(C(C)CC)C(=O)NCC(=O)NC1CS(=O)C1=C2C2=CC=C(O)C=C2N1 CIORWBWIBBPXCG-UHFFFAOYSA-N 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 230000019552 anatomical structure morphogenesis Effects 0.000 description 1
- 230000000890 antigenic effect Effects 0.000 description 1
- 230000005735 apoptotic response Effects 0.000 description 1
- 230000001873 bacteriocinogenic effect Effects 0.000 description 1
- 108010058966 bacteriophage T7 induced DNA polymerase Proteins 0.000 description 1
- 230000004071 biological effect Effects 0.000 description 1
- 230000008827 biological function Effects 0.000 description 1
- 230000000853 biopesticidal effect Effects 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 244000309466 calf Species 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000001876 chaperonelike Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 238000011281 clinical therapy Methods 0.000 description 1
- 238000010954 commercial manufacturing process Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 108050003126 conotoxin Proteins 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 125000000151 cysteine group Chemical group N[C@@H](CS)C(=O)* 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- 230000001086 cytosolic effect Effects 0.000 description 1
- UFJPAQSLHAGEBL-RRKCRQDMSA-N dITP Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C[C@@H]1N1C(N=CNC2=O)=C2N=C1 UFJPAQSLHAGEBL-RRKCRQDMSA-N 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 1
- 238000001962 electrophoresis Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006862 enzymatic digestion Effects 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 210000003527 eukaryotic cell Anatomy 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 238000010230 functional analysis Methods 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 125000000487 histidyl group Chemical group [H]N([H])C(C(=O)O*)C([H])([H])C1=C([H])N([H])C([H])=N1 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 102000043395 human ADA Human genes 0.000 description 1
- 102000053020 human ApoE Human genes 0.000 description 1
- 230000005934 immune activation Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 230000000968 intestinal effect Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 231100000518 lethal Toxicity 0.000 description 1
- 231100000636 lethal dose Toxicity 0.000 description 1
- 230000001665 lethal effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 108091064355 mitochondrial RNA Proteins 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 230000003472 neutralizing effect Effects 0.000 description 1
- 238000001807 normal pulse voltammetry Methods 0.000 description 1
- 230000005937 nuclear translocation Effects 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 230000037361 pathway Effects 0.000 description 1
- 230000002688 persistence Effects 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 230000026731 phosphorylation Effects 0.000 description 1
- 238000006366 phosphorylation reaction Methods 0.000 description 1
- 230000007505 plaque formation Effects 0.000 description 1
- 101150048568 pnl gene Proteins 0.000 description 1
- 230000023603 positive regulation of transcription initiation, DNA-dependent Effects 0.000 description 1
- 230000004481 post-translational protein modification Effects 0.000 description 1
- 230000003334 potential effect Effects 0.000 description 1
- 239000013615 primer Substances 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 108020000494 protein-tyrosine phosphatase Proteins 0.000 description 1
- 239000011541 reaction mixture Substances 0.000 description 1
- 238000003259 recombinant expression Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 102000037983 regulatory factors Human genes 0.000 description 1
- 108091008025 regulatory factors Proteins 0.000 description 1
- 230000009711 regulatory function Effects 0.000 description 1
- 108091035233 repetitive DNA sequence Proteins 0.000 description 1
- 102000053632 repetitive DNA sequence Human genes 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 238000011451 sequencing strategy Methods 0.000 description 1
- 229910052709 silver Inorganic materials 0.000 description 1
- 239000004332 silver Substances 0.000 description 1
- 238000002415 sodium dodecyl sulfate polyacrylamide gel electrophoresis Methods 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 210000001519 tissue Anatomy 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 231100000765 toxin Toxicity 0.000 description 1
- 108700012359 toxins Proteins 0.000 description 1
- 108091006106 transcriptional activators Proteins 0.000 description 1
- 238000001890 transfection Methods 0.000 description 1
- 238000009966 trimming Methods 0.000 description 1
- 235000011178 triphosphate Nutrition 0.000 description 1
- 239000001226 triphosphate Substances 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 241000701451 unidentified granulovirus Species 0.000 description 1
- 241000712461 unidentified influenza virus Species 0.000 description 1
- 230000009385 viral infection Effects 0.000 description 1
- 230000007484 viral process Effects 0.000 description 1
- 239000013603 viral vector Substances 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 229960005502 α-amanitin Drugs 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
- C12N15/86—Viral vectors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2710/00—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA dsDNA viruses
- C12N2710/00011—Details
- C12N2710/14011—Baculoviridae
- C12N2710/14111—Nucleopolyhedrovirus, e.g. autographa californica nucleopolyhedrovirus
- C12N2710/14141—Use of virus, viral particle or viral elements as a vector
- C12N2710/14143—Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
Definitions
- This invention relates to Autographa californica nuclear polyhedrosis virus DNA sequences and particularly to the DNA sequence of the complete virus genome.
- Autographa californica nuclear polyhedrosis virus (AcNPV) is a widely studied baculovirus which has been used to form the basis of polypeptide expression systems (see e.g. US-P-4,745,051 and EP 0 327 626). Modified baculoviruses have also been proposed for use as viral insecticides.
- Baculoviruses are invertebrate-specific viruses with large, circular, covalently closed, double-stranded DNA genomes (Francki et al., 1991).
- early genes and early gene promoters are identified in the AcNPV C6 sequence, genes that have not been reported hitherto and which via substitution of the downstream gene, or by promoter duplication, alone, or in concert with other promoters of AcNPV, or other baculoviruses will allow the expression of foreign genes, or operons, or duplicated AcNPV genes at defined times in the baculovirus infection process.
- the genome data provides information on which specific restriction enzymes do not cut the AcNPV C6 genome.
- the data also identifies restriction enzymes that cut the sequence only once, or twice, or thrice, etc, and the location of all such sites.
- the latter sites can now be removed by deletion (for non-essential, including coding sequences), or by site directed mutagenesis (for essential, including coding sequences).
- AcNPV derivatives can be constructed whose genomes are cut by these specific enzymes only at defined locations (new sites). This will allow the linearisation of the virus DNA at defined locations in order to facilitate the introduction of foreign genes.
- the new sites may be located within a reporter gene sequence for the efficient identification of recombinant expression vectors by the loss of the reporter gene function.
- Additional sequences representing these restriction sites may also be placed in flanking sequences of essential genes to improve the efficient recovery of recombinants using transfer vectors that provide both the foreign gene and the unmodified essential flanking sequences. Further, the use of a number of such enzyme sites strategically located in the virus genome, will allow the preparation of genetically stable, multiple gene expression vectors.
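The classification of restriction enzymes by how many times they cut a circular genome can be sketched as follows. This is a minimal illustration only: the function names, the `TOY` sequence and the choice of enzymes are assumptions made for the example, the toy sequence is not AcNPV sequence, and nothing here indicates which enzymes actually cut the AcNPV C6 genome. The recognition sites used (EcoRI, BamHI, HindIII, NotI) are palindromic, so scanning one strand is sufficient for them.

```python
from collections import defaultdict

# Well-known palindromic recognition sites (illustrative selection).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}

# Hypothetical toy sequence, treated as circular; NOT real AcNPV data.
TOY = "TTCAAGGATCCAAAGCTTGGGAAGCTTCCGAA"


def find_sites_circular(genome: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` on a circular sequence."""
    genome = genome.upper()
    site = site.upper()
    # Append the first len(site)-1 bases so that sites spanning the
    # origin of the circular molecule are also detected.
    extended = genome + genome[: len(site) - 1]
    positions = []
    start = extended.find(site)
    while start != -1:
        positions.append(start)
        start = extended.find(site, start + 1)
    return positions


def classify_enzymes(genome: str, enzymes: dict[str, str]) -> dict[int, list[str]]:
    """Group enzyme names by the number of sites found in the genome."""
    groups = defaultdict(list)
    for name, site in enzymes.items():
        groups[len(find_sites_circular(genome, site))].append(name)
    return dict(groups)


if __name__ == "__main__":
    for count, names in sorted(classify_enzymes(TOY, ENZYMES).items()):
        print(count, names)
```

Enzymes reported under count 0 correspond to the "do not cut" class described above, and those under count 1 to candidate sites for linearisation at a single defined location.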
- the genome sequence allows for the identification of essential and non-essential genes in relation to the infection course of the virus in different types of cultured cells and host insects.
- genes that will be proven to be essential to the infection course of the virus in cultured cells and insect hosts and other genes that are non-essential to one or other or both substrates can now be specifically removed from the AcNPV genome without affecting the expression of essential, including flanking, genes, or the replication of the virus in certain cultured cells. Removal of such genes and corresponding reduction in the AcNPV genome size and hence cost to the overall transcription, translation and other processes induced by the virus, or certain other processes and structures naturally operative in the host cell, will provide a preferred expression vector system and improved virus replication.
- the modifications will allow the timing of foreign gene product synthesis to be regulated, and the amounts and quality of such products to be improved.
- removal of such genes will be to the benefit of commercial manufacturing processes and environmental safety.
- natural AcNPV genes that facilitate the persistence of AcNPV in the environment, and/or that provide for the productive infection of insect larvae, and/or that facilitate the transmission of infectious virus in the environment by affecting characters such as determinants of host range, cell death and larval degradation, will be suitable candidate genes to remove.
- the loss of any or all such functions and the derivation of disabled virus expression vectors will prohibit the occurrence of any adverse consequences of virus escape from laboratories or manufacturing establishments, by eliminating any potential effect on natural insect populations in the environment, or the likelihood of re-acquisition of such genes and functions from natural sources.
- both nuclease and protease genes deleterious to the transcription, expression and product accumulation of foreign genes expressed by baculovirus vectors have been identified. Removal of such genes will also provide for improved expression vectors.
- sequence information allows new sites to be identified for the insertion of single or multiple gene expression cassettes composed of viral promoters, foreign gene(s) of choice, including new polyadenylation sites and transcription terminators.
- cassettes can now be positioned so that they do not affect resident genes, their promoters, terminators, polyadenylation sites, or give mRNA species that act as antisense sequences to required viral genes.
- the sites may be contiguous. Additionally, or alternatively, the sites may be non-contiguous thereby facilitating expression of foreign genes without incurring deleterious positional effects on mRNA transcription.
- the genome sequence allows genetically engineered virus insecticides to be produced by exploiting the advantages described above with regard to tailored genome size, genetic stability, multiple foreign gene expression, and by the exploitation of gene dose.
- the ability to introduce genes into prescribed sites in the AcNPV genome and derivatives without affecting resident genes thereof includes the ability to transfer from other baculoviruses and other origins individual genes, cassettes of genes, and other DNA sequences that will affect the virus host range, its transmission and stability in the environment.
- the benefits will include effects on the LD50 (lethal doses required to kill 50% of target species) and LT50 (lethal time in 50% of members of an infected host species) and other biological properties of the natural virus.
- Such sequences will include, for example, genes representing baculoviruses with alternative host ranges, including genes from viruses that have proved impossible to grow, or to clone in cultured cells.
- the AcNPV genome contains genes and sequences that alone or in concert with host factors regulate the expression of viral genes generally in a temporally controlled fashion.
- the genome sequence allows the identification of all such regulatory viral genes and sequences.
- sequence information contained in SEQ ID NO. 1 may be used in the manufacture of a range of novel polynucleotides which may be used industrially.
- the invention according to one aspect thereof provides the use of sequence information derivable from the complete genomic sequence of AcNPV in the manufacture of a polynucleotide for use in an industrially applicable process.
- the invention further provides the use of sequence information derivable from the complete genomic sequence of AcNPV in the manufacture of a polynucleotide capable of acting as a control sequence in the expression of a foreign gene in an insect or insect cell.
- sequence information is derivable solely and/or primarily from said complete genomic sequence.
- the information may be derived from sequence data present in said complete genomic sequence, but essentially absent from or present in incomplete form in previously available sequence data.
- sequence analysis of the complete genomic sequence contained in SEQ ID NO. 1 has revealed the presence of 154 open reading frames of which 91 have not hitherto been described.
- These novel open reading frames are identified in Table 1 as ORF 13, 22-26, 28-30, 32, 38, 41-46, 50-60, 62-63, 66, 68-79, 81-87, 91-92, 96-98, 101-103, 106-126, 129-130, 140-146, 148-150, 152 and 154.
- the present invention thus includes isolated polynucleotides containing a nucleotide sequence which corresponds to one of the aforementioned ORFs.
- by "corresponding to" as used herein is meant a nucleotide sequence which is identical to the disclosed sequence or which has sufficient homology to hybridize to the aforementioned sequence under hybridization conditions corresponding to Tm -19 to Tm -25.
- the corresponding sequences may be at least 80%, preferably at least 90% and most preferably at least 95% homologous to the stated sequence. Desirably the degree of homology is not less than 98%.
- the invention also includes polypeptides obtainable by expressing polynucleotides corresponding to the aforementioned ORFs. Such expression may be achieved by incorporating an insert having a sequence corresponding to one of the aforementioned polynucleotides into a suitable expression vector in association with and under the control of appropriate expression control sequences.
- Information derived from the SEQ ID NO. 1 may be used to optimize polypeptide expression in expression systems based upon baculoviruses by selecting appropriate control sequences.
- the present invention further provides a method of synthesizing a polypeptide by expressing the polypeptide in an insect or cultured insect cell which has been transformed by an expression vector derived from AcNPV, the expression vector containing a coding sequence coding for the polypeptide and control sequences responsible for control of replication of the expression vector and/or transcription of the coding sequence, characterized in that the control sequences are selected on the basis of sequence information derived from SEQ ID NO. 1.
- the information derived from SEQ ID NO. 1 additionally enables the efficiency of polypeptide expression to be increased by modifying the nucleotide sequence being expressed so as to take advantage of the preferred codon usage which is characteristic of the ORFs which have been identified in SEQ ID NO. 1.
- the invention provides a method of synthesizing a polypeptide by expressing the polypeptide in an insect or cultured insect cell which has been transformed by an expression vector derived from AcNPV, the expression vector containing a coding sequence coding for the polypeptide and control sequences responsible for control of replication of the expression vector and transcription of the coding sequence, characterized in that the coding sequence is adapted by selecting codons in accordance with the preferred codon usage of AcNPV.
- Preferred codon usage differs between species and expression of foreign polypeptides can often be hampered if codons contained in the coding sequence to be expressed correspond to less preferred codons in the expression host.
- Knowledge of the preferred codon usage for AcNPV allows the DNA sequence of the insert being expressed to be modified so as to increase the proportion of codons which are preferred for AcNPV.
- the coding sequence should be modified (if necessary) so as to ensure that one or more (and preferably at least ten, most preferably at least 15) of the amino acids indicated below are encoded by the indicated codons:
- Val GTG
- A person of ordinary skill in this art could therefore employ the preferred codons for the different amino acids as described herein in order to optimize expression of a variety of different heterologous proteins using the claimed expression vector and the claimed methods.
- the genes encoding a desired heterologous protein could be modified to include the more preferred codons (see list above) and to exclude the less preferred codons (see list below for codons to avoid).
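- The recoding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the preferred-codon table is not reproduced in full here, so the `PREFERRED` mapping below uses Val GTG together with the codons that the codon-usage discussion in this document reports as favoured (CAA, GAA, GGC, ATT, TTG, AAA). Treating those favoured codons as the preferred ones is an assumption, and the function name `recode` is hypothetical.

```python
# Partial, illustrative preference table (assumption: see lead-in).
PREFERRED = {  # amino acid -> favoured codon
    "Val": "GTG", "Gln": "CAA", "Glu": "GAA",
    "Gly": "GGC", "Ile": "ATT", "Leu": "TTG", "Lys": "AAA",
}

# Synonymous codon families for those amino acids (standard genetic code).
SYNONYMS = {
    "Val": {"GTT", "GTC", "GTA", "GTG"},
    "Gln": {"CAA", "CAG"},
    "Glu": {"GAA", "GAG"},
    "Gly": {"GGT", "GGC", "GGA", "GGG"},
    "Ile": {"ATT", "ATC", "ATA"},
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Lys": {"AAA", "AAG"},
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}


def recode(cds: str) -> str:
    """Swap each codon for the favoured synonym where one is listed.

    The encoded amino acid sequence is unchanged; codons for amino acids
    that are not in PREFERRED (including start and stop) are left alone.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        out.append(PREFERRED[aa] if aa else codon)
    return "".join(out)


# Leu CTA -> TTG, Gly GGG -> GGC, Val GTT -> GTG; ATG and TAA untouched.
print(recode("ATGCTAGGGGTTTAA"))  # ATGTTGGGCGTGTAA
```

- In practice one would extend the two tables to all amino acids and would probably recode only a chosen subset of positions; the sketch replaces every listed codon for simplicity.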
- DNA sequences encoding different enzymes, hormones, toxins, antibodies and receptors may be modified as described herein to enhance production.
- proteins useful in agriculture (e.g., proteins modified to alter insect behavior in a desirable way), clinical therapy, and/or diagnosing disease could be modified.
- these different proteins include, but are not limited to, the following: hepatitis B virus core antigen, hepatitis B virus surface antigen, bovine Herpesvirus-1 glycoprotein gIV, Human immunodeficiency virus type 1 (HIV-1) envelope protein gp120, HIV-1 envelope protein gp160, HIV-1 Gag protein, HIV-1 Gag-pol fusion protein, HIV-1 Integration protein, HIV-1 Major core p24, HIV-1 Nef protein, HIV-1 Pol protein, HIV-1 protease, HIV-1 Rev protein, Human immunodeficiency virus type 2 Gag precursor protein, Human T-cell lymphotropic virus type 1 (HTLV-1) p20E protein, HTLV-1 gp46 protein, HTLV-1 p40x protein, Bacillus thuringiensis subspecies kurstaki HD-73 delta endotoxin, Bacillus thuringiensis subspecies aizawai 7.21 crystal
- the coding sequence should be modified so that the following codons are avoided (these being less preferred codons for the indicated amino acids): Amino Acid Codon(s) to be avoided
- Chaperon sequences shall be defined as a sequence encoding a protein which contains a nucleotriphosphate and which is capable of leading, escorting or "chaperoning" a different protein into the nucleus from the cytoplasm.
- open reading frame shall refer to a specific length of DNA with a methionine start codon and terminated by a translation stop codon.
- Predicted sequences describes a sequence of putative protein as derived from the DNA sequence in the open reading frame. Using the genetic code one of ordinary skill in this art could readily define a protein sequence corresponding to each of the 154 open reading frames presented in Table 1. Putative is defined as "assumed to exist", e.g. "encodes a putative alkaline exonuclease" (infra under the heading "Gene functions", last para.).
- data is used to define nucleotide sequences based on computer predictions; particularly when assuming the function of putative gene product.
- a consensus sequence is defined as a sequence specific for a biological function or characteristic as determined by computer sequence analysis. Consensus sequences may also be used to define a sequence (and corresponding characteristic or function for this sequence) which is shared or found to be homologous among different species.
- DNA wobble is a term used to explain how the third nucleotide of a codon can vary or "wobble" and still encode the same amino acid.
- For example, TTT and TTC both encode the amino acid phenylalanine, and ATT, ATC, and ATA all encode the amino acid isoleucine.
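- The wobble examples above can be checked against the standard genetic code. The following Python sketch builds the 64-codon table and lists what each third base encodes for a given two-base prefix; the helper name `third_base_variants` is introduced here purely for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def third_base_variants(prefix: str) -> dict:
    """Map each possible third base to the amino acid encoded by prefix + base."""
    return {base: CODE[prefix.upper() + base] for base in BASES}


print(third_base_variants("TT"))  # {'T': 'F', 'C': 'F', 'A': 'L', 'G': 'L'}
print(third_base_variants("AT"))  # {'T': 'I', 'C': 'I', 'A': 'I', 'G': 'M'}
```

- The second line shows the limit of wobble for isoleucine: ATT, ATC and ATA are synonymous, but ATG encodes methionine.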
- Protease sequence defines those amino acid sequences found on certain proteins which are known or presumed (because of a consensus sequence) to play a role in the enzymatic digestion of other proteins.
- Ligase sequences refers to an amino acid sequence that is capable of joining or ligating the ends of RNA molecules or joining the ends of DNA molecules.
- T4 DNA ligase is used to join or ligate compatible "sticky" or "blunt" ends of DNA derived after restriction enzyme digestion.
- "Sticky" and "blunt" are terms in the art to define how the ends of DNA molecules appear after restriction enzyme digestion.
- Helicase sequences are protein sequences in enzymes associated with the unfolding of DNA molecules.
- polymerase sequences refer to either RNA or DNA polymerases. These enzymes are responsible for synthesizing RNA or DNA from the appropriate template.
- Deleterious sequences refer to a sequence that can have a deleterious effect on the production or efficiency of certain proteins being produced in the host cells.
- a protease sequence might be deleterious if this protease specifically breaks down the foreign recombinant protein synthesized in the insect cell via a baculovirus expression vector.
- Enhancer sequences are DNA sequences which increase the transcription of a virus gene. For example, dot matrix analysis of the AcNPV sequence against itself and its complement revealed eight regions of direct and inverted repetitive DNA sequences (hr1-hr5). The hrs are involved in enhancing early mRNA transcription and act as origins of DNA replication (infra, first two sentences under the heading "AcNPV genomic organization and repetitive DNA").
- disrupted, interrupted, mutated, and deleted are sometimes used interchangeably in reference to specific ORFs. It is intended that these terms refer to a condition where the encoded protein is no longer functional due to a disruption, interruption, mutation, or some other interference that prevents, shuts down, nullifies or inhibits the otherwise named function.
- SEQ ID NO. 1 was derived from the C6 clone of AcNPV.
- sequence information provided according to the invention may be used to optimize expression in other baculovirus expression systems.
- published partial sequence data, restriction enzyme and hybridization analysis can be used to identify other clones and baculovirus isolates from insects which may be strains, variants or varieties of AcNPV.
- isolates include viruses obtained from Autographa californica, Autographa gamma, Galleria mellonella, Plutella xylostella, Rachiplusia ou, Spodoptera exempta, Spodoptera litura and Trichoplusia ni.
- Such viruses are likely to possess DNA sequences, genes, origins of replication, transcriptional promoters, terminators and regulatory factors in common with those of AcNPV C6 and such entities are likely to be involved in directing the course of infection, multiplication and morphogenesis of these viruses as well as their interactions with hosts, host cells and components thereof. Accordingly, the information provided according to the invention of SEQ ID NO. 1 may be used in the development of expression systems utilizing these alternative viruses and virus strains.
- the complete nucleotide sequence of the genome of clone 6 of the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV) has been determined.
- the molecule comprises 133,894 base-pairs and has an overall A + T content of 59%.
- Our analysis suggests that the virus encodes some 154 methionine-initiated, and potentially expressed, open reading frames (ORFs) of 150 nucleotides or greater. These ORFs are distributed evenly throughout the virus genome on either strand.
- the ORFs are arranged as adjacent, non-overlapping reading frames, separated by short intergenic regions.
- Figure 1. A physical map and summary of coding strategy of the
- Figure 2. A dot matrix analysis of AcNPV genomic DNA.
- Figure 3. A circular map of the AcNPV genome.
- Figures 4 - 14. A construct for modifying the following respective genes to identify which genes are dispensable (non-essential) and which genes are indispensable (essential) for viral replication in cell culture or insect larvae.
- Figure 15. Single restriction enzyme site within the AcNPV EGT gene.
- AcNPV genomic DNA was prepared as described by Possee (1986).
- the DNA was digested with an appropriate restriction endonuclease (BamHI, BglII, EcoRI, HindIII, PstI, SstI, SstII).
- the derived DNA fragments were inserted into pUC18/19, pUC118/119 or pT7T318/19 vectors using standard protocols (Sambrook et al., 1989).
- plasmids containing larger regions of virus DNA were digested with a restriction enzyme to release the insert, the virus DNA purified using agarose gel electrophoresis and then digested with another restriction enzyme. These smaller DNA fragments were inserted into plasmid vectors to provide materials more convenient for DNA sequencing.
- Reaction mixtures contained the dGTP analogue, 7-deaza dGTP, in lieu of dGTP in order to reduce sequence compressions.
- dITP was substituted for dGTP in the sequencing reactions.
- the M13 primer (5' GTAAAACGACGGCCAGT) was used to sequence the ends of each virus DNA fragment.
- Oligonucleotide primers prepared using an Applied Biosystems Instruments synthesizer (ABI, Model 380B, Warrington, UK) were employed to obtain the internal sequences of the viral fragments. Where appropriate, double-stranded DNA templates were used to complete regions of the AcNPV sequence not analysed as single-stranded DNA.
- An ABI automated sequencer (model 370A) was also used on occasion. Using the established nomenclature for describing (by rank of size) the AcNPV restriction endonuclease fragments (e.g., A, B, C, etc.), the following cloned virus DNA fragments were completely sequenced: BamHI-D, -E and -G; BglII-G; HindIII-C to -K, -O to -S, -U, -W and -X; PstI-J to -M; SstI-F to -H.
- Partially sequenced fragments included: BamHI-E; BglII-E and -H; HindIII-L; PstI-B and -C; SstI-O and SstII-I. All the DNA sequences between adjoining virus DNA fragments were determined using appropriate subclones spanning the respective junctions.
- the DNA sequences of the AcNPV C6 homologous region (hr) 1, EcoRI-I and -R fragments have been reported (Possee et al., 1991).
- the remaining sequence of this AcNPV clone was determined from a data set comprising approximately 10^6 nucleotides.
- the complete AcNPV genomic sequence has been determined to consist of 133,894 base-pairs (bp) and has an A+T content of 59%.
- the distributions of purines and A+T nucleotides for the plus strand (+ strand; see convention established by Vlak and Smith, 1982) throughout a linearized representation of the circular AcNPV genome are shown in Fig. 1, using a moving window of 250 nucleotides.
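- A moving-window composition profile of the kind plotted in Fig. 1 can be computed as in the following Python sketch. It is a minimal illustration, not the software used for the figure: the function name is hypothetical, the window is advanced one base at a time, and the sequence is treated as circular so that windows near the end wrap around to the start.

```python
def at_content_windows(seq: str, window: int = 250) -> list:
    """A+T fraction of every window of the given size on a circular sequence.

    Element i of the result is the A+T fraction of the window that starts
    at position i (0-based) and runs 5'->3' for `window` bases.
    """
    seq = seq.upper()
    n = len(seq)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    extended = seq + seq[:window - 1]            # wrap around the circle
    is_at = [base in "AT" for base in extended]  # True counts as 1
    count = sum(is_at[:window])
    fractions = []
    for start in range(n):
        fractions.append(count / window)
        if start + window < len(extended):       # slide by one base
            count += is_at[start + window] - is_at[start]
    return fractions


print(at_content_windows("AATTGGCC", window=4))
# [1.0, 0.75, 0.5, 0.25, 0.0, 0.25, 0.5, 0.75]
```

- Replacing the test `base in "AT"` with `base in "AG"` would give the corresponding purine profile.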
- A physical map of the genome was derived from the sequence data and is also illustrated in Fig. 1. This shows the arrangement of some of the common restriction enzyme sites frequently used to map the virus DNA (EcoRI, HindIII, PstI, SstI, BglII, XhoI). Although circular, the map is presented with the first EcoRI site of hr1 as the left end of the genome.
- the virus DNA fragments shown in Fig. 1 are labelled alphabetically, in decreasing order of size (Vlak and Smith, 1982).
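- The labelling convention can be illustrated with a small in-silico digest. The Python sketch below finds every occurrence of a recognition site on a circular sequence, computes the fragment lengths and labels them A, B, C, ... in decreasing order of size. The function names and the toy sequence are hypothetical; the EcoRI cut position (G^AATTC, one base into the site) is the standard one.

```python
import re
from string import ascii_uppercase


def circular_fragments(seq: str, site: str, cut_offset: int = 0) -> list:
    """Fragment lengths from cutting a circular sequence, largest first."""
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    wrapped = seq + seq[:len(site) - 1]  # catch sites that span the origin
    pattern = f"(?={re.escape(site)})"   # lookahead: allow overlapping sites
    cuts = sorted({(m.start() + cut_offset) % n
                   for m in re.finditer(pattern, wrapped)})
    if not cuts:
        return [n]                       # uncut circle
    # Distance from each cut to the next one, going round the circle;
    # a single cut simply linearizes the molecule (length n).
    lengths = [(b - a) % n or n for a, b in zip(cuts, cuts[1:] + cuts[:1])]
    return sorted(lengths, reverse=True)


def label_fragments(lengths: list) -> dict:
    """Label fragments A, B, C, ... by decreasing size (up to 26 fragments)."""
    return {ascii_uppercase[i]: length for i, length in enumerate(lengths)}


toy = "GAATTC" + "A" * 10 + "GAATTC" + "A" * 4  # 26 bp circle, two EcoRI sites
print(label_fragments(circular_fragments(toy, "GAATTC", cut_offset=1)))
# {'A': 16, 'B': 10}
```

- Fragments of equal size, or the historical exceptions mentioned in the text, would need additional handling that the sketch does not attempt.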
- a small fragment of 38 nucleotides is present between the HindIII-L and -M fragments and a 12 nucleotide fragment between the HindIII-C and -W fragments (see Lu and Carstens, 1991 for the data on the clone HR3).
- the only exceptions to labelling fragments uniquely according to their size are the HindIII-A1 (15,293 bp) and -A2 (7,576 bp) fragments. These are designated A1 and A2 in Fig. 1 solely for convenience of comparison with previously published data.
- the SstI map is modified to interchange the SstI-A and -B fragments and the BglII map is modified to interchange the BglII-G and -H fragments.
- hr4c represents an imperfect copy of the typical AcNPV 30 bp palindrome, since a base change mutates the characteristic EcoRI site (GAATTC) found in the centre of all other AcNPV hr palindromes to AAATTC (Table 2).
- Fig. 1 shows the positions (black boxes) of 337 open reading frames (ORFs) that are initiated with a methionine codon (vertical bars) and which could encode polypeptides of at least 50 amino acids.
- ORFs open reading frames
- This strategy of analysis does not identify gene products that may be smaller than 50 amino acids, or products that are generated by removal of introns from primary mRNA transcripts representing larger regions of the genome.
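- The ORF-selection strategy described above (methionine-initiated frames of at least 50 codons, on either strand) can be sketched as follows. This is an illustrative Python outline rather than the program used for the analysis: the function names are hypothetical, the sequence is scanned as a linear molecule (so an ORF spanning the origin of the circular map would be missed), and, like the strategy in the text, it cannot detect spliced products.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 50) -> list:
    """Return (strand, left, right, n_codons) for each ATG-initiated ORF.

    left/right are 0-based, end-exclusive coordinates on the + strand and
    include the stop codon; n_codons counts ATG but not the stop. For each
    stop codon only the longest ORF (first in-frame ATG) is reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOPS:
                    n_codons = (i - start) // 3 if start is not None else 0
                    if n_codons >= min_codons:
                        if strand == "+":
                            left, right = start, i + 3
                        else:  # map back to + strand coordinates
                            left, right = n - i - 3, n - start
                        orfs.append((strand, left, right, n_codons))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])


toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=4))  # [('+', 2, 17, 4)]
```

- Using the first in-frame ATG gives the largest potential ORF, which over-estimates the product size whenever translation actually starts at a downstream ATG.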
- ORF 1 encodes a virus protein tyrosine/serine phosphatase (PTP) previously identified by Kim and Weaver (1993).
- PTP virus protein tyrosine/serine phosphatase
- Table 1 provides a more detailed summary of the information concerning the selected ORFs.
- the left end of each ORF identified in Table 1 (column Left) represents the site of either the translation initiation or termination codon, as determined by the orientation of the ORF.
- the right end of each ORF (Table 1, column Right) indicates the respective translation termination or initiation codon.
- the direction of transcription (Table 1, column D), relative to that of the polyhedrin gene, is indicated by an arrow.
- the predicted number of amino acids (Table 1, column aa) per methionine-initiated polypeptide derived from the ORF, and the Mr of that polypeptide, are also given.
- the large ORF encoded entirely within the region of gp67 (ORF128, Fig. 1), but on the opposite strand, was excluded from our final dataset.
- ORF100, which encodes the basic DNA binding protein, p6.9, of AcNPV (Wilson et al., 1987), was included in our final dataset. As a consequence the two similar sized ORFs that overlap ORF100 were not. Further analyses of the selected and non-selected ORFs will determine whether these assumptions are correct.
- ORF6 (lef-2) starts within the 3' region of ORF5.
- ORF14 (lef-1) overlaps the start of ORF13.
- ORF25 in Table 1 was recorded as 2 smaller ORFs by the same authors. In the vicinity of residue 7,497 there are 4 extra nucleotides compared to the previous published AcNPV C6 sequence data (Possee et al., 1991). This causes a frameshift in the coding region and results in an extension of a predicted protein, PK1 (ORF10), from 196 to 272 amino acids.
- PK1 predicted protein
- Dot matrix analysis of the AcNPV sequence against itself and its complement revealed 8 regions of direct and inverted repetitive DNA (Fig. 2, identified as hr1, hr1a, hr2, hr3, hr4a, hr4b, hr4c, hr5).
- the hr regions are involved in enhancing early mRNA transcription and as origins of DNA replication (Pearson et al., 1992; Leisy and Rohrmann, 1993; Kool et al., 1993a,b).
- Other regions of DNA sequence were identified that have direct or inverted repetitive DNA that meet the minimal 21/24 bp matching criteria. The significance of these sequences is unknown.
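- A dot matrix comparison with a 21-out-of-24 matching criterion can be sketched in Python as below. This is a brute-force illustration only: the function names are hypothetical, and a comparison of a genome of this size against itself would need a far more efficient implementation. Comparing a sequence with itself reveals direct repeats (hits away from the main diagonal), while comparing it with its reverse complement reveals inverted repeats.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dot_matrix(a: str, b: str, window: int = 24, min_match: int = 21) -> list:
    """Return (i, j) pairs where a[i:i+window] and b[j:j+window] agree
    at min_match or more positions (no gaps allowed)."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in
                          zip(a[i:i + window], b[j:j + window]))
            if matches >= min_match:
                hits.append((i, j))
    return hits


# Toy example with a small window: ACGT occurs at positions 0 and 6.
toy = "ACGTGGACGT"
direct = [hit for hit in dot_matrix(toy, toy, window=4, min_match=4)
          if hit[0] != hit[1]]
print(direct)  # [(0, 6), (6, 0)]

# GAATTC is its own reverse complement, so it scores as an inverted repeat.
print(dot_matrix("GAATTC", reverse_complement("GAATTC"), 6, 6))  # [(0, 0)]
```

- In the inverted comparison, position j refers to the reverse-complemented sequence and has to be converted back to genome coordinates before plotting.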
- In Table 2 are listed a number of the larger, non-hr inverted repeats that could in single-stranded forms produce hairpin structures. These may be relevant to the secondary structure of mRNA species and affect the transcriptional or translational efficiencies of a particular ORF. In this regard, it is noted that most of these sequences occur within ORFs, rather than in intergenic sequences (Table 2). Their presence may be solely a consequence of the encoded amino acid sequence and the codons used. However, of particular note is the palindromic sequence found within the 25K gene (FP-protein; ORF61) and its similarity to the hr palindromic sequences (see Table 2).
- ATG first potential translation start codon
- the TATA boxes shown in Table 3 represent a sampling of several of the core DNA elements that are recognised to bind transcription factors (TFIID and TFIID-like proteins) (Ghosh, 1992).
- TFIID transcription factors
- One general, loosely-defined consensus for the TFIID binding site is TATA(A/T)A(A/T) (Nikolov et al., 1992).
- the patterns that were employed were selected to limit the number of matches obtained when only TATA was used as the search motif. In the TATA motif search it was observed that the two patterns that favoured the A residue at position 6 were preferred over the third pattern (TATAAT, see Table 3).
- the TATAAA motif occurs in 46% of the cases, the TATATA motif in 34%, and the TATAAT motif in 19%.
- the CAGT motif is not always found at the start site of AcNPV early mRNA species. It should also be noted that in identifying possible RNA pol II promoter sites, we only considered the relative positions of the TATA box and CAGT motif (i.e., a TATA box 5' to a CAGT motif within the 5' leader sequence that was analysed, see above). Generally, however, in eukaryotes the TATA box motif is within 20 to 40 nucleotides of the mRNA cap site (Roeder, 1991; Zawel and Reinberg, 1992).
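- The positional criterion used for putative RNA pol II promoters (a TATA box 5' to a CAGT motif within the leader) can be expressed as a short Python sketch. The three TATA patterns and the CAGT motif are those named in the text; the function name, the leader sequences and the absence of any spacing constraint are illustrative assumptions.

```python
import re

TATA_PATTERNS = ("TATAAA", "TATATA", "TATAAT")  # motifs named in the text
CAGT = "CAGT"


def candidate_early_promoters(leader: str) -> list:
    """Return (pattern, tata_pos, cagt_pos) for each TATA box that lies
    entirely 5' of a CAGT motif in the leader (0-based positions)."""
    leader = leader.upper()
    cagt_sites = [m.start() for m in re.finditer(f"(?={CAGT})", leader)]
    hits = []
    for pattern in TATA_PATTERNS:
        for m in re.finditer(f"(?={pattern})", leader):
            tata_end = m.start() + len(pattern)
            for cagt_pos in cagt_sites:
                if cagt_pos >= tata_end:
                    hits.append((pattern, m.start(), cagt_pos))
    return sorted(hits, key=lambda hit: (hit[1], hit[2]))


# Hypothetical leader: TATAAA at position 2, CAGT at position 16.
print(candidate_early_promoters("GGTATAAAGGCCGGCCCAGTTTATG"))
# [('TATAAA', 2, 16)]
```

- A stricter search could additionally require the two motifs to lie within the 20 to 40 nucleotide spacing that is typical of eukaryotic promoters, which the analysis described here did not impose.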
- AcNPV late genes are transcribed from a consensus late promoter transcription start signal (TAAG; Blissard and Rohrmann, 1990).
- the TAAG motif shows a dramatic difference in occurrence within the leader sequences of the selected ORFs (71 ORFs, 46%; Tables 1, 3) compared to the non-selected ORFs (11 ORFs, 6%; Tables 1, 3).
- A+T rich regions flank AcNPV ORFs (Kuzio et al., 1984). While the nucleotide composition of the genome is 59% A+T, A+T rich regions are not uniformly (randomly) distributed.
- Fig. 1 shows several regions of A+T composition that approaches 85% when measured with a 250 nucleotide moving window.
- Although A+T rich regions often flank AcNPV genes, this characteristic is not absolute.
- the region 5' to the viral DNA polymerase (ORF65) is not especially A+T rich.
- the TAAG motif occurs less frequently than would be expected for a random sequence.
- a sequence of similar composition, GAAT, occurs 574 times on the + strand and 595 times on the - strand.
- the expected frequency of a sequence conforming to the composition (A2TG) in a 133,894 bp genome of the base composition of AcNPV and involving randomly distributed bases is 705 occurrences per strand.
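- The figure of 705 expected occurrences can be reproduced with a short calculation. The Python sketch below assumes that bases are independent, that A and T are equally frequent (29.5% each) and that G and C are equally frequent (20.5% each); the per-base split is an assumption made here, since only the overall A+T content of 59% is stated.

```python
GENOME_LENGTH = 133_894  # bp, from the text
AT_FRACTION = 0.59       # overall A+T content, from the text


def expected_occurrences(motif: str, length: int = GENOME_LENGTH,
                         at_fraction: float = AT_FRACTION) -> float:
    """Expected count of a motif per strand for independent random bases.

    Assumes p(A) = p(T) = at_fraction / 2 and p(G) = p(C) = (1 - at_fraction) / 2.
    """
    p = {"A": at_fraction / 2, "T": at_fraction / 2,
         "G": (1 - at_fraction) / 2, "C": (1 - at_fraction) / 2}
    probability = 1.0
    for base in motif.upper():
        probability *= p[base]
    # A circular genome offers `length` possible start positions.
    return length * probability


print(round(expected_occurrences("GAAT")))  # 705
print(round(expected_occurrences("TAAG")))  # 705
```

- Any motif with two A, one T and one G has the same expectation (0.295^3 x 0.205 x 133,894, about 704.7), so the observed GAAT counts of 574 and 595 provide the baseline against which the TAAG frequency is judged.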
- a frequency distribution profile of the nucleotides surrounding the start codon of the 154 selected ORFs is shown in Table 4.
- the dominance of an A residue at the -3 and perhaps -2 positions relative to the A of the ATG translation start sites in the corresponding DNA is the only significant characteristic of the selected ORFs. G at -3 is not favoured in the selected ORFs.
- ORFs in AcNPV initiate translation at an ATG downstream of an in-frame ATG in the transcribed mRNA (Table 1, column K, identified as "2"). These are gp67 (ORF128) and PCNA (ORF49) (O'Reilly et al., 1989; Whitford et al., 1989).
- gp67 ORF128
- PCNA ORF49
- the amino acids and predicted Mr of the selected ORFs are based on the calculations for the largest potential ORF initiated with a methionine. This assumption over-estimates the size of the primary translation products for gp67 and PCNA, and for any other product for which translation is initiated at a downstream in-frame ATG.
- There are 15 short ORFs (mini-cistrons) that are located immediately upstream (within 80 nucleotides) of the translation start site of the selected ORFs. All these mini-cistrons have ATG flanking sequences that conform to Kozak's rules. These are identified as "1" in Table 1, column K. For mini-cistrons that are out-of-frame with respect to the larger ORF, a termination codon occurs either upstream of the selected ORF, or within a short distance into its coding region. Mini-cistrons have been reported in the 5' leaders of other baculovirus genes (Tomalski et al., 1988; Blissard and Rohrmann, 1989) and may have regulatory roles in the translation of mRNA species.
- codons that are used (Table 5), for example AGG and CGG (arginine), GGG (glycine), CTA, CTC, CTT (leucine) which are each used at less than half of the frequencies that may be expected if all the possible codons were utilized equally. While some codons appear to be discriminated against in the selected ORFs, others appear to be favoured (Table 5), for example CAA (glutamine), GAA (glutamic acid), GGC (glycine), ATT (isoleucine), TTG (leucine), and AAA (lysine). To what extent codon bias affects the expression level of AcNPV genes, or foreign genes expressed from AcNPV-derived expression vectors, remains to be determined.
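- The comparison with "frequencies that may be expected if all the possible codons were utilized equally" corresponds to what is commonly called relative synonymous codon usage. The Python sketch below shows one way to compute it from a set of coding sequences; the function name is hypothetical and the input sequence is a toy example, not data from Table 5.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def relative_codon_usage(cds_list: list) -> dict:
    """Observed / expected usage for each codon.

    'Expected' assumes that all synonymous codons of an amino acid are used
    equally, so a value below 0.5 means 'used at less than half of the
    expected frequency' and a value above 1 means the codon is favoured.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODE:  # skip codons with ambiguous bases
                counts[codon] += 1
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    usage = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                usage[c] = counts[c] * len(codons) / total
    return usage


usage = relative_codon_usage(["AAAAAAAAAAAG"])  # Lys: AAA x3, AAG x1
print(usage["AAA"], usage["AAG"])  # 1.5 0.5
```

- Applied to the 154 selected ORFs, a table of this kind would flag codons such as AGG or CTA with values below 0.5 and favoured codons such as AAA with values above 1, assuming the calculation in Table 5 was done along these lines.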
- the predominant translation termination codon utilized by the selected ORFs is TAA. It terminates 117 of the 154 ORFs (76%, Table 5).
- CpGV IAP Cydia pomonella granulosis virus
- AcNPV encodes a gene with identity to the acidic and basic fibroblast growth factors (FGFs), also known as heparin binding growth factors (HBGF, reviewed by Burgess and Maciag, 1989; Klagsbrun and D'Amore, 1991).
- FGFs acidic and basic fibroblast growth factors
- HBGF heparin binding growth factors
- the AcNPV FGF-like gene product shows ca. 35% identity (75% similarity) with known members of the FGF superfamily.
- GTAs Global transactivators
- the D. melanogaster brahma gene is encoded by a 1638 codon ORF (Tamkun et al., 1992) while the yeast SNF2 gene contains an ORF of 1703 codons (Laurent et al., 1991).
- PNK/PNL encodes a protein that may have multiple functions.
- the amino terminal portion is strongly related to T4 RNA ligase (31% identity, 72% similarity) while the carboxy terminal half of this protein is related to T4 polynucleotide kinase (26% identity, 66% similarity).
- AcNPV encodes a chitinase (ORF126) that resembles those of other organisms, most notably Serratia marcescens (57% identity; 88% similarity; Jones et al., 1986). Analyses of the function of the viral chitinase indicate that it has a role in the liquefaction of infected larvae (R. Hawtin and R.D. Possee, manuscript in preparation).
- AcNPV also encodes a putative alkaline exonuclease (ORF133).
- ORF133 has 53% identity with its Orgyia pseudotsugata NPV (OpNPV) homologue (Gombart et al., 1989).
- As part of our search for potential virus-encoded RNA polymerase subunits, we searched for DNA binding motifs. A sample of the motifs used for the searches is shown in Table 3. They include zinc fingers (Table 1, Dom column, "Z"), leucine zippers (Table 1, Dom column, "L"), nucleoside triphosphate binding domains (Table 1, Dom column, "NTP") and nuclear translocation signals (Table 1, Dom column, "NTS").
- Zinc fingers were found in two potential apoptosis inhibitory proteins IAP1 (ORF27) and IAP2 (ORF71) (Table 1). Zinc fingers were also found in the early genes IE-1 (ORF147), ME53 (ORF139) and PE38 (ORF153). The zinc finger suggested to be in cg30 was not identified by our analysis. However, the leucine zipper in the cg30 protein (ORF88) was identified. Leucine zippers were found in 7 other potential polypeptides, including the calyx protein, pp34 (Table 1).
- NTP binding motif was identified in 4 ORFs, 3 of which are known as late enhancing factors (lefs, Table 1).
- the fourth protein was PNK/PNL (ORF86).
- searches with a simplified motif for the ATP-binding site in protein kinases would not have found matches in either PK1 (ORF10), or PK2 (ORF123), both of which have extensive overall identity with known protein kinases.
- PK1 lacks a consensus ATP-binding motif, having IxGxxG at the ATP-binding site, while PK2 completely lacks this N-terminal domain.
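- Why a simplified motif search misses such a protein can be shown with a regular expression. The Python sketch below assumes that the simplified motif is the widely used glycine-rich loop consensus GxGxxG (x being any residue); the motif actually used in the searches is listed in Table 3 and is not reproduced here, and the two peptide sequences are invented for illustration.

```python
import re

# Assumed simplified ATP-binding consensus: G-x-G-x-x-G.
ATP_SITE = r"G.G..G"


def atp_site_positions(protein: str) -> list:
    """0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in re.finditer(f"(?={ATP_SITE})", protein.upper())]


consensus_like = "MKLGSGAFGEV"  # hypothetical: G-S-G-A-F-G matches
pk1_like = "MKLISGAFGEV"        # hypothetical: I-S-G-A-F-G, first G replaced by I

print(atp_site_positions(consensus_like))  # [3]
print(atp_site_positions(pk1_like))        # []
```

- A single substitution at the first glycine is enough to make the pattern fail, which is why overall similarity to known kinases, rather than the motif alone, identifies such ORFs.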
- NTS motifs were found in 12 of the selected ORFs.
- Known nuclear localising proteins that have an NTS include 39K, DNA polymerase, and p6.9.
- No NTS was found for the p10 protein, which is the component of fibrous bodies present in the nuclei of AcNPV infected cells. It is possible that this and other viral proteins enter the nucleus using an alternative pathway, or are chaperoned by a protein containing an NTS. None of the AcNPV proteins that are known to be solely cytoplasmic had a predicted NTS.
- the cDNA sequence information for A. californica can be used to design a vector which is capable of optimally expressing a desired protein product (called a "designer vector").
- An investigator of ordinary skill in this art would analyze a variety of different factors prior to deciding on which genetic elements should be included in a specific designer vector. For example, an investigator might study the following factors before designing a vector: the protein to be synthetically produced, the host cells to be used, desired temporal timing for protein production, available insertion sites for the non-natural promoters, any known deleterious sequences or proteases that could reduce the amount of protein being produced, etc.
- the designer vector can include a single promoter, multiple promoters, tandem promoters, combinations of synthetically constructed promoters, natural promoters and derivatives thereof.
- the choice of promoters depends on several factors and is usually performed on a vector to vector basis (case by case basis). Additionally, many different genetic elements can be included in the designer vector and deciding which to include or exclude depends on the desired protein to be recombinantly produced in the baculovirus expression vector system.
- a vector can be designed and constructed to optimize the isolation and recovery of the desired protein.
- the vector can be designed to include specifically identified secretion sequences determined from the cDNA sequence data.
- from the A. californica cDNA sequence information, locations of transcription and translation signal sequences can be determined. Additionally, specific flanking sequences near the ATG sequence of the open reading frame can be identified and then used in order to optimally transcribe the ORF.
- the A. californica cDNA sequence information can be used to identify new genes. Once these new genes are identified, their promoters (early, late, immediate early or immediate late) may then be obtained and used in vectors. The new promoters from these new late genes may then be used to drive the expression of desired genes more efficiently and effectively when compared to the polyhedrin.
- essential and non-essential gene regions can be identified.
- essential and non-essential genes refer to the virus replication in cell culture (e.g. Spodoptera frugiperda cells).
- ORF 126 chitinase
- ORF 127 cathepsin
- ORFs 126 and 127 have been shown to be non-essential genes.
- these two genes could be eliminated from the A. californica sequence and not affect the nature of the sequence. Elimination of these two non-essential genes could be performed by standard protocols known to those skilled in the art.
- the rationale for identifying non-essential genes is to reduce the genome size to smaller and more functional pieces in order to create a more effective, and environmentally acceptable pesticide or in order to create a more effective vector.
- regions that are essential for enabling a virus to live in an insect cell can be identified. Once these essential regions are identified, the essential sequence can be used to produce a virus that will not propagate in live insects. One use of such an environmentally safe virus would be as a selective pesticide.
- the A. californica cDNA sequences claimed in this invention can be used to design a plasmid vector capable of optimizing expression of the desired protein.
- One way in which this plasmid vector can be tailored to more effectively and efficiently produce the desired protein of choice is to optimize it for the particular host.
- SF9 cells are the optimal cells of choice for production of desired proteins in the baculoviral expression vector system.
- a designer vector as in Example 11 above can be constructed for optimal expression of the desired protein in the SF9 cells by deleting selected deleterious sequences and/or providing enhancer sequences.
- the A. californica cDNA sequences claimed in this invention can be used to design a complete virus which is specifically constructed to contain specific and unique elements which will enhance the infectivity of this virus in a particular insect cell.
- a viral particle can be designed to infect and kill the insect at an early stage.
- the claimed sequence can also be used to produce a virus capable of infecting larvae and not adult insects.
- An additional embodiment of this invention is to use the claimed A. californica cDNA sequence to tailor or design a virus which is capable of infecting only specific insects, thereby constructing a very host specific virus.
- a self destructive mechanism may be included in the viral particle. This mechanism can be designed such that once the viral particle has killed the host specific insect, the virus destroys itself via a time, chemical, or enzymatic attack. This self destructive mechanism will effectively eliminate any residual virus and therefore produce a more environmentally acceptable pesticide.
- a sequence known to trigger lysis may be inserted adjacent to a late or early promoter.
- the availability of the complete AcNPV sequence and subsequent experimental data will allow the identification of those virus genes with roles in determining those insect species which can be infected with the virus.
- the virus could be modified to limit infection to the target pest species, while leaving other species unaffected.
- baculoviruses may be engineered to expand their host range to include several pest species.
- AcNPV has a wide host range in comparison to other baculoviruses and therefore may be a source of "host range genes" which can be added to these other baculoviruses.
- Certain proteins are naturally expressed in A. californica (for example, heparin binding factor).
- the cDNA sequence information of the claimed invention can be used to enhance or increase production of the proteins that are naturally expressed in A. californica, for example, by inserting additional promoter sequences and/or by deleting certain sequences deleterious to the production of the desired protein.
- the deletion of the annihilator gene (ORF 135) from the virus results in a phenotype in which virus-infected cells die through a process of apoptosis or early cell death. In effect, the cell commits suicide to prevent replication of the virus.
- other genes could be identified with a similar function to the annihilator gene, i.e. preventing the cell from undergoing an apoptotic response. These genes would
- Fig. 1 a linear representation of the map is shown. Since the virus genome is circular, a more conventional map for the AcNPV genome is given in Fig. 3. In this map the identified genes (hatched arrows), and unassigned selected ORFs (open arrows) are shown as well as their orientations. Also indicated in Fig. 3 are the sites of hr sequences and insertion (IS) and retroposon sequences (RP). This circular map includes the revised EcoRI (outer ring) and HindIII (inner ring) fragment lengths of AcNPV C6.
- ORFs were identified within the virus genome that could potentially encode proteins of greater than 50 amino acids. This selection allowed inclusion of the 55 amino acid, arginine-rich p6.9 protein (basic protein, Wilson et al., 1987). It disregards smaller ORFs, some of which may encode proteins or peptides that are made during the virus infection process.
- the 154 ORFs were selected on the basis of their possession of a methionine codon and the absence of a larger, overlapping ORF. Again these assumptions may prove to be incorrect in some cases (e.g., where a spliced mRNA is involved).
- the number of gene products encoded by the AcNPV genome may be larger or smaller than 154, depending on the extent that the assumptions made in these analyses prove to be correct.
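The selection rules described above (an initiating methionine codon and a product of more than 50 amino acids) can be sketched in Python. This is only an illustration and not the procedure used for the C6 sequence: the name `find_orfs`, the linear treatment of a circular genome, and the omission of the "no larger, overlapping ORF" filter are all simplifying assumptions.

```python
# Illustrative six-frame ORF scan (hypothetical helper, not the authors' method).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return (strand, frame, start, stop, n_aa) for ATG-initiated ORFs.

    Only ORFs with more than `min_aa` codons (Met included, stop excluded)
    are reported. start/stop are 0-based offsets on the scanned strand and
    `stop` is the index of the first base of the stop codon. The first ATG
    after each stop codon is used, so one ORF is reported per stop codon.
    The sequence is treated as linear (no wrap-around at the origin).
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif codon in STOPS:
                    if start is not None:
                        n_aa = (i - start) // 3
                        if n_aa > min_aa:
                            orfs.append((strand, frame, start, i, n_aa))
                    start = None
    return orfs
```

Lowering `min_aa` would admit the smaller ORFs that the text notes were disregarded, at the cost of many more spurious candidates.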
- other strains of the virus may include additional sequences (insertions, or ORFs), or lack sequences by comparison to those in the C6 virus. Since it is valuable to have a reference point for comparison purposes, it is suggested that the AcNPV C6 ORF numbering nomenclature is adopted pro temporis and until virus gene functions are described for the particular ORFs.
- the complete AcNPV sequence was analysed using a neural-net ORF identification programme, GRAIL (Uberbacher and Mural, 1991), in order to predict potential protein coding regions.
- GRAIL was originally designed as a programme for identifying coding exons in human and other DNA sequences.
- the GRAIL coding recognition module incorporates seven sensor algorithms. Each component of the module provides an indication of the coding potential of the DNA sequence. The various sensor outputs are integrated using a neural network which also predicts the locations of the coding regions.
- the system has been demonstrated to be effective in the identification of 90% of exons over 100 bases long in human DNA (Uberbacher and Mural, 1991). In part this success rate depends on the G+C content of the DNA.
- Coding regions are recognised less easily in DNA sequences with a lower G+C than A+T content.
- the G+C content of the AcNPV genome is only 41%, so the coding regions predicted from the GRAIL analysis must be treated with some caution.
- the candidate ORFs that were identified by GRAIL were rated as excellent, good, marginal or null (Table 1).
- Most of the AcNPV genes which have been assigned functions gave excellent or good ratings using this method. The most notable exceptions were the protein tyrosine phosphatase (Kim and Weaver, 1993), p6.9 (Wilson et al., 1987) and conotoxin (Eldridge et al., 1992).
- GRAIL provided a complementary analysis of the likely coding potential of the AcNPV genome. The value is confirmed by the fact that GRAIL predicted 84% of the 154 selected ORFs (Table 1), whereas only 4% of the 183 non-selected ORFs were identified by GRAIL as having potential protein coding capacity.
- TATA boxes: TFIID binding sites
- CAGT: possible mRNA transcription start sites
- the CAGT motif is associated with many baculovirus early gene promoters and is probably a good indicator of whether or not a virus gene is transcribed in the early phase of the replication cycle.
- the TATA boxes are more problematic, in part due to the high A+T content of the AcNPV genome and its intergenic regions. More than one TATA box was present upstream of many of the ORFs.
- Table 3 we located TATA boxes upstream of 40% of the selected ORFs. Of the 3 TATA box patterns utilised to identify possible TFIID-type binding motifs (Table 3), the TATAAA motif, which is the preferred TFIID binding site, was the most frequent in the selected ORFs identified to be early genes.
- RNA pol II promoter within 160 nucleotides of the ATG codon of the respective ORF.
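A search for early-promoter motifs within 160 nucleotides of an ATG, as described above, might look like the following sketch. The function name `upstream_hits` and its parameters are hypothetical; only TATAAA and CAGT are used because the other TATA patterns of Table 3 are not given in this text, and only the + strand is handled.

```python
import re


def upstream_hits(genome, atg_pos, motifs=("TATAAA", "CAGT"), window=160):
    """Locate motifs in the `window` nt immediately 5' of an ATG (+ strand).

    atg_pos is the 0-based index of the A of the ATG. The result maps each
    motif to a list of distances (in nt) from the motif start to the ATG.
    """
    lo = max(0, atg_pos - window)
    region = genome[lo:atg_pos].upper()
    hits = {}
    for motif in motifs:
        # A zero-width lookahead also reports overlapping occurrences.
        hits[motif] = [atg_pos - (lo + m.start())
                       for m in re.finditer(f"(?={motif})", region)]
    return hits
```

As the text cautions, a hit only marks a candidate; whether the motif is actually used in early transcription has to be determined experimentally.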
- the known AcNPV early genes identified by this procedure include: ME53 (ORF139), IE-1 (ORF147), IE-N (ORF151), and PE38 (ORF153).
- the presence of a TATA motif does not prove that it is used in early transcription by RNA pol II. This can only be determined by experimentation.
- lef-3 has consensus TATA and CAGT motifs in its 5' leader, but no evidence has been reported that these are utilised in early mRNA synthesis (Li et al., 1993).
- the polyhedrin gene has an RNA pol II motif within its promoter region.
- Alternative transcription start sites, initiating from a CGTGC motif, have been identified in some AcNPV early gene promoters.
- the CGTGC motif is utilised as early start sites for p143 (Lu and Carstens, 1991), DNA polymerase (Tomalski et al., 1988) and p47 (Carstens et al., 1993).
- this motif is involved in the expression of the AcNPV delayed-early genes and may be a site of recognition by virus-encoded, trans-activating proteins.
- the CGTGC motif is broadly similar to sequences found in AcNPV hr regions, i.e., TYC(A/T)(A/T)A(AT)CGXGTRA (where Y is a pyrimidine, R a purine and X any nucleotide).
- the CGTGC motif is evenly distributed between the selected ORFs and the non-selected ORFs, suggesting that the definition of this motif is not refined enough to be of predictive value. If it is important, its placement may not be confined to the immediate 5' leader sequence of a neighbouring gene.
- the late and very late transcription start sites involve a TAAG motif (Blissard and Rohrmann, 1990).
- TAAG rather than the canonical ATAAG or RTAAG sequences to search for ORFs that might be transcribed late in infection in an endeavour to maximise the chance of finding matches.
- the 46% of the selected ORFs that are identified as probable late/very late genes may underestimate such genes.
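The effect of searching with the relaxed TAAG motif rather than ATAAG or RTAAG can be illustrated by counting matches for each pattern in the same region. The helper name `late_motif_counts` is an assumption for this sketch, and the region passed in is whatever upstream stretch the reader chooses to scan.

```python
import re

# The relaxed motif and the two more stringent forms named in the text.
LATE_PATTERNS = {
    "TAAG": "TAAG",
    "ATAAG": "ATAAG",
    "RTAAG": "[AG]TAAG",  # R = purine (A or G)
}


def late_motif_counts(region):
    """Count (possibly overlapping) matches of each late-motif pattern."""
    s = region.upper()
    return {name: sum(1 for _ in re.finditer(f"(?={pat})", s))
            for name, pat in LATE_PATTERNS.items()}
```

Every ATAAG or RTAAG match is also a TAAG match, so the relaxed pattern can only find the same number of sites or more, which is the trade-off the text accepts to maximise matches.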
- cg30 ORF88
- initiates from the sequence ATTAG (Wang and Miller, 1989).
- the late gene p74 (ORF138) initiates transcription at the sequence TATTG (Kuzio et al, 1989) and p47 (ORF40) has a late transcription start site GTAAAAC (Carstens et al., 1993).
- a search for similar matches to the start site used in p47 revealed a good match at nucleotide 66,740 in the coding region for gp41 (ORF80).
- an ATAAG motif is present 145 nucleotides upstream of this site.
- codon usage table for the 154 selected ORFs presented in this study (Table 5). There appears to be some codon bias.
- the codon usage bias shown by the AcNPV ORFs may reflect some state of the tRNAs available to the virus during the infection process. However, although the sample base is low, we have so far not been able to detect a differential codon bias between early and late expressed genes.
- Cis-acting elements (hrs) involved in the origins of AcNPV DNA replication have been shown to be A+T rich.
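A codon usage table of the kind referred to above (Table 5) can be tallied as in the sketch below. The function names and the per-amino-acid normalisation are illustrative assumptions; the input is any list of coding sequences, for example early and late ORFs tabulated separately to look for a differential bias.

```python
from collections import Counter


def codon_usage(cds_list):
    """Count codons over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        s = cds.upper()
        # Ignore any trailing partial codon.
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    return counts


def relative_usage(counts, synonymous):
    """Fraction of each codon within one set of synonymous codons."""
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous}
```

For example, `relative_usage(counts, ("AAA", "AAG"))` gives the split between the two lysine codons in the supplied sequences.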
- OpNPV appears to have at least one origin that is slightly G+C rich, but with a neutral purine composition, i.e., different from the hrs of AcNPV, or the transcription enhancer regions found within OpNPV (Pearson et al, 1993).
- the region in AcNPV homologous to the OpNPV origin of replication lies within ORF13 and ORF14 (lef-1).
- Choristoneura fumiferana NPV CfNPV
- Bombyx mori NPV BmNPV
- the hr sites of baculoviruses may be active in inter- or intra-molecular recombination. If recombination was involved in the one or other inversion, how this occurred is not certain since there is no obvious relationship between the left and right arms of the second inverted region in
- OpNPV the corresponding regions of AcNPV are A+T rich. This suggests that an intramolecular inversion may have taken place in OpNPV. However, a detailed analysis of this region in that virus has yet to be undertaken.
- the hr regions of AcNPV have been implicated in replication of the AcNPV genomic DNA and may act as origins of replication (Pearson et al., 1992; Leisy and Rohrmann, 1993; Kool et al., 1993a,b). Furthermore, recent studies have identified regions of the AcNPV genome that encode products that act on the origins of replication (Kool et al., 1994).
- the AcNPV + strand G-rich sequence at position 78,300 of AcNPV shows no overall bias in A+T content but has a pronounced spike with respect to total purine composition (ca. 78%, Fig. 1).
- Purine-rich tracts can potentially form an intrastrand triple-helix and tetrads.
- Triple helical DNA has been implicated as an origin of replication for some plasmids, as well as having other potential regulatory functions (Caddie et al., 1990).
- the only other region of elevated purine composition in AcNPV occurs within the coding region of ORF66 (ca. 68% purines).
- the purine rich region within ORF66 is also A+T rich, thus A residues contribute highly to the purine composition of the + strand.
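The distinction drawn above between A+T content and purine (A+G) content of the + strand can be made concrete with a sliding-window scan. The window and step sizes below are arbitrary assumptions, as is the function name; the source does not state the parameters used for Fig. 1.

```python
def window_composition(seq, window=100, step=50):
    """Yield (start, at_fraction, purine_fraction) for each full window.

    Fractions are computed on the given (+) strand only; purines are A and G.
    """
    s = seq.upper()
    for start in range(0, len(s) - window + 1, step):
        w = s[start:start + window]
        at = (w.count("A") + w.count("T")) / window
        purine = (w.count("A") + w.count("G")) / window
        yield start, at, purine
```

A window can be purine rich without any A+T bias (many G residues, as reported near position 78,300) or both purine rich and A+T rich (many A residues, as reported within ORF66).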
- the CpGV IAP gene provides the 35k gene function in AcNPV 35k-negative mutants, thereby preventing the annihilator phenotype of the mutant (Crook et al., 1993).
- there is no sequence identity between AcNPV 35k and CpGV IAP. In view of the structural homologies between the CpGV IAP and the AcNPV IAP1 and IAP2 genes, the roles and functions of these AcNPV genes warrant further investigation. It has been shown that the 35k-negative AcNPV mutant, while unable to replicate efficiently in S. frugiperda cells in culture, or in whole larvae, can be propagated in T. ni cells, or insects.
- the AcNPV IAP1 and IAP2 genes may prevent apoptosis in AcNPV infections of other cell types or larval species. Further, it has been shown that over-expression of a human inhibitor of apoptosis (BCL2) in S. frugiperda cells (Alnemri et al., 1992), using an AcNPV expression vector, results in the protection of the cells against apoptosis. These recombinant virus infected cells have an extended survival time and do not show the degradation of host cell DNA that is evident in cells infected with wild-type AcNPV. It is not known if over-expression of the AcNPV IAP genes results in extended survival of virus-infected cells. The AcNPV IAP genes do not share any structural similarity with BCL2, or any other known IAP gene. However, the viral IAP genes are similar to certain DNA binding proteins by the possession of 3 copies of a zinc finger motif.
- AcNPV encodes a gene with identity to the FGFs and HBGF family of growth factors.
- Two conserved cysteines have been identified in all the human FGFs sequenced to-date. These are Cys31 and Cys98 (relative to human acidic FGF). These cysteines have been implicated in intramolecular disulfide bond formation (Burgess and Maciag, 1989). The N-terminal cysteine is lacking from the putative AcNPV FGF.
- site-directed mutagenesis of cDNA clones has implicated Lys133 in heparin binding (Burgess and Maciag, 1989).
- the AcNPV FGF has an arginine at this position.
- This substitution of one basic residue for another also occurs in the int-2 proto-oncogene precursor Hbg3.
- Free heparin is known to inhibit the growth of herpes simplex viruses (Nahmias and Kilbrick, 1964). More recently, it has been shown that heparin binds to HSV-1 virions via the glycoprotein gC (WuDunn and Spear, 1989; Herold et al., 1991) and prevents their adsorption to heparin sulphate moieties resident on cell surface proteoglycans. Heparin similarly inhibits plaque formation of pseudorabies virus by binding to glycoprotein gIII (Mettenleiter et al., 1990).
- heparin binding factor by the baculovirus could be a method to complex free heparin (or heparin-related compounds) thereby facilitating virus spread within the host.
- the virus FGF has a signal peptide sequence at its amino terminal sequence which may facilitate secretion from virus-infected cells.
- the GTAs are non-DNA binding proteins thought to have a role in the regulation of homeotic genes (Tamkun et al., 1992).
- Homeotic genes are involved in the expression of a large group of other genes that have been implicated in directed development and growth of an organism (McGinnis et al., 1984a,b; Scott and Weiner, 1984; Levine and Hoey, 1988; Hayashi and Scott, 1990).
- the AcNPV ORF42 has homology with the GTAs of D. melanogaster (Tamkun et al., 1992) and yeast (Laurent et al., 1991).
- the AcNPV GTA-like protein does not have either the early CAGT, or late TAAG transcription initiation sites, so it is difficult to predict when it may be expressed in virus-infected cells. Transcriptional analysis is required to determine if and when it is synthesized.
- a viral GTA might be involved in regulating a number of genes involved in viral processes, such as late gene transactivation. It is also conceivable that the AcNPV GTA-like gene acts as a repressor to inhibit host gene expression.
- RNA polymerase has at least 8 subunits with apparent sizes of 95, 76, 50, 47.5, 40, 33.5, 27.5, and 26 kDa (Yang et al., 1991). These subunits are believed to be distinct from host encoded RNA polymerase subunits.
- the level of processing of viral RNA polymerase subunits (i.e., cleavage of primary products, phosphorylation)
- ORF144 encodes a 33.5 kDa peptide that has similarity to the yeast MSS18 protein (Seraphin et al., 1988). MSS18 is known to be involved with yeast mitochondrial RNA splicing. Also, ORF124 encodes a 28.5 kDa peptide with similarity to a plasmid copy number protein from Clostridium perfringens (Garnier and Cole, 1988).
- lefs late enhancing factors
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 27: 22600 > 23458 and named "Inhibitor of Apoptosis-Like Gene 1" (IAP1).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates.
- the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
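The coordinate convention just described can be captured in a small parser. This is a sketch under stated assumptions: the function name `parse_orf` and the returned field names are hypothetical, and the antisense marker is taken to be "<" (the counterpart of the ">" used for polyhedrin-sense genes).

```python
import re


def parse_orf(label):
    """Parse a label such as 'ORF 27: 22600 > 23458'.

    '>' marks a gene on the polyhedrin strand (ATG at the left coordinate);
    '<' is assumed to mark an antisense gene (ATG at the right coordinate).
    """
    m = re.fullmatch(r"\s*ORF\s*(\d+):?\s*(\d+)\s*([<>])\s*(\d+)\s*", label)
    if m is None:
        raise ValueError(f"unrecognised ORF label: {label!r}")
    number, left, sign, right = int(m[1]), int(m[2]), m[3], int(m[4])
    sense = sign == ">"
    return {
        "orf": number,
        "left": left,
        "right": right,
        "strand": "polyhedrin-sense" if sense else "antisense",
        "atg_at": left if sense else right,
    }
```

For instance, `parse_orf("ORF 27: 22600 > 23458")["atg_at"]` evaluates to 22600, the left coordinate, because that gene lies on the polyhedrin strand.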
- the cassette was chosen to derive an in-frame fusion between the IAP1 and beta-galactosidase coding regions.
- the plasmid was designated pUC118.IAP1.lacZ. This was used to cotransfect Spodoptera frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the IAP1, thus disrupting IAP1 function.
- the results from this Example demonstrated that the recombinant virus (AcIAP1.lacZ) replicated normally in S. frugiperda cells and Trichoplusia ni insect larvae.
- ORF 30 24315 < 25704: Haemolysin Secretory Protein (HSP)
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 30: 24315 < 25704 and named "Haemolysin Secretory Protein" (HSP).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- ORF 32 27041 < 27584: Fibroblast Growth Factor (FGF)
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 32: 27041 < 27584 and named "Fibroblast Growth Factor" (FGF).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- ORF 71 61016 > 61763: Inhibitor of Apoptosis-Like Gene 2 (IAP2)
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 71: 61016 > 61763 and named "Inhibitor of Apoptosis-Like Gene 2" (IAP2).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- an AcMNPV DNA fragment coordinates 60448 (SacI site) to 63194 (SacI site) was subcloned into pUC118 digested with SacI and treated with CIP to derive pUC118.IAP2 (See Figure 7, Panels a-e).
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 86: 72131 < 74213 and named "Polynucleotide Kinase/Polynucleotide Ligase" (PNK/PNL).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- an AcMNPV DNA fragment coordinates 71417 (HindIII site) to 83121 (HindIII site) was subcloned into pAT153 digested with HindIII and treated with CIP to derive pAT153.PNK/PNL (See Figure 8, Panels a-e).
- the plasmid was designated pUC118.PNK/PNL.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the PNK/PNL, thus disrupting PNK/PNL function. The results showed that the recombinant virus (AcPNK/PNL.lacZ) replicated normally in S. frugiperda cells.
- EXAMPLE 26
- ORF 123 102964 < 103609: Protein Kinase 2 (PK2)
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 123: 102964 < 103609 and named "Protein Kinase 2" (PK2).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- an AcMNPV DNA fragment coordinates 102148 (PstI site) to 105164 (PstI site), was subcloned into pUC118 digested with PstI and treated with CIP to derive pUC118.PK2 (See Figure 9, Panels a-e).
- ApaI-BglII adaptor (AGATCTGGCC)
- E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes.
- the plasmid was designated pUC118.PK2.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the PK2, thus disrupting PK2 function. The results demonstrated that the recombinant virus (AcPK2.lacZ) replicated normally in S. frugiperda cells.
- EXAMPLE 27
- ORF 126 105282 < 106935: Chitinase (CHIT)
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 126: 105282 < 106935 and named "Chitinase" (CHIT).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- an AcMNPV DNA fragment coordinates 105164 (PstI site) to 107943 (PstI site), was subcloned into pUC118 (lacking a HindIII site) digested with PstI and treated with CIP to derive pUC118.CHIT (See Figure 10, Panels a-e). This was digested with HindIII (106337), treated with CIP and ligated with a HindIII-BamHI adaptor (AGCTGGATCC) to insert a BamHI site within the CHIT gene to derive pUC118.CHIT-BamHI. This was digested with BamHI, treated with CIP and ligated with a DNA cassette containing the E.
- the plasmid was designated pUC118.CHIT.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the chitinase, thus disrupting chitinase function.
- the recombinant virus (AcCHIT.lacZ) replicated normally in S. frugiperda cells. In T. ni insect larvae, the virus replicated but failed to induce liquefaction of the host.
- This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 127: 106983 > 107952 and is named "Cathepsin” (CATH).
- This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome.
- This mutated plasmid was designated pUC119.M.CHIT-/CATH-. It was used to cotransfect S. frugiperda cells with infectious virus DNA, purified from the AcCHIT.lacZ, which had been digested with Bsu36I to enhance the recovery of recombinant viruses.
- the recombinant virus, AcCHIT-/CATH- replicated normally in S. frugiperda cells. In T. ni insect larvae, the virus replicated but failed to induce liquefaction of the host.
- ORF 42 34010 > 33924: Global Transactivator (GTA)
- This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 42: 34010 > 33924 and is named "Global Transactivator" (GTA).
- This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
- an AcMNPV DNA fragment coordinates 33403 (EcoRI site) to 37088 (Asp718 site) was inserted into pUC118 digested with EcoRI and Asp718 and treated with CIP, to derive pUC118.GTA (See Figure 12, Panels a-e).
- This plasmid was designated pUC118.GTA.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the GTA gene is essential for virus replication in cell culture.
- This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 124: 103793 > 104534 and named "Plasmid Copy Number Protein” (PCNP).
- This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
- an AcMNPV DNA fragment coordinates 102148 (PstI site) to 105164 (PstI site) was inserted into pUC118, digested with PstI and treated with CIP, to derive pUC118.PCNP (See Figure 13, Panels a-e).
- This plasmid was designated pUC118.PCNP.lacZ. This was used to cotransfect Spodoptera frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the PCNP gene is essential for virus replication in cell culture.
- ORF 132 112560 > 113817: Alkaline Exonuclease (ALK-EXO)
- This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 132: 112560 > 113817 and named "Alkaline Exonuclease" (ALK-EXO).
- This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
- an AcMNPV DNA fragment coordinates 112044 (SmaI site) to 113913 (HindIII site) was subcloned into pUC118 digested with HindIII and SmaI and treated with CIP (See Figure 14, Panels a-e).
- This combination of enzymes served to remove an intervening BamHI site within the polylinker of the plasmid.
- the plasmid was designated pUC118.ALK-EXO. This was digested with BamHI (113033), treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes.
- the plasmid was designated pUC118.ALK-EXO.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the ALK-EXO gene is essential for virus replication in cell culture.
- This Example identifies three restriction enzymes which do not have recognition sites within the AcNPV genome. These three enzymes are Bsu36I, SrfI and Sse8387I.
- Bsu36I sites were inserted within the ORF 9 (immediately downstream of the polyhedrin gene in AcNPV) and ORF 7 (immediately upstream of the polyhedrin gene).
- the polyhedrin gene was replaced with the beta-galactosidase coding region which also contains a Bsu36I site.
- SrfI and Sse8387I could be utilized in a similar manner in other regions of the virus genome. For example, they could be used to alter the AcNPV genome to incorporate these sites and facilitate genomic DNA linearization.
- This Example identifies two restriction enzymes which digest AcNPV DNA only once: AvrII (see Figure 15) and FseI (see Figure 16).
- AvrII digests within the non-essential EGT (ecdysteroid UDP-glucosyltransferase) gene and FseI digests within the essential GTA (global transactivator) gene.
- Information derived from the entire AcMNPV genomic sequence could afford development of novel baculovirus transfer vectors that encode baculoviruses with favorable agronomic properties. Identification of genes encoding proteins that modify viral host range would lead to generation of recombinant NPVs wherein said recombinant viruses would be capable of infecting and therefore neutralizing a wider spectrum of important agronomic pests. Alternatively, genetic manipulation could lead to changes in viral properties that render the virus capable of infecting only a very narrow spectrum of insect pests, thus affording precise control of targeted insect species while sparing beneficial insect populations.
- genes involved in viral replication could be identified. Manipulation of these genes could afford recombinant baculoviruses that multiply more rapidly within infected insect cells, thus leading to more rapid neutralization of the infected insect.
- the Global Transactivator Gene ORF 42; see Example 29
- Other genes influencing viral infectivity could also be identified and modified in order to raise the efficiency of the infectious process. This would also afford more rapid neutralization of targeted populations, and thus approach the rapidity of insect neutralization commonly associated with application of traditional chemical insect control agents.
- Genes could be identified that quantitatively control viral replication outside of a permissive propagation system.
- Viral mutants deficient in a protein or proteins required for in vivo infectivity could be propagated in an insect cell culture system that is permissive to viral replication. While efficient viral replication in cell culture takes place, as well as initial infection of target insects, further viral replication in vivo is curtailed, and environmental impact of application of recombinant baculoviruses is minimized.
- ALTSCHUL, S.F., GISH, W., MILLER, W., MYERS, E.W. and LIPMAN, D.J.
- COCHRAN, M.A.
- CARSTENS, E.B.
- EATON, B.T.
- FAULKNER
- Viral transcription during Autographa californica nuclear polyhedrosis virus infection a novel
- baculovirus polyhedral envelope-associated protein genetic location nucleotide sequence, and immunocytochemical characterization.
- Glycoprotein C of herpes simplex virus type 1 plays a principal role in the adsorption of virus to cells and in infectivity. J. Virol. 65, 1090-1098.
- HODGMAN, T.C. (1988a). A new superfamily of replicative proteins. Nature 333,
- Baculovirus gene ME53, which contains a putative zinc finger motif, is one of the major early-transcribed genes. J. Virol. 67, 753-758.
- KOGAN, P.H. and BLISSARD, G.W. (1994). A baculovirus gp64 early promoter is activated by host transcription factor binding to CACGTG and GATA elements. J. Virol. 68, 813-822.
- KOOL, M. and VLAK, J.M. (1993). The structural and functional organization of the Autographa californica nuclear polyhedrosis virus genome. Arch. Virol.
- Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes.
- RAWLINGS, N.D., PEARL, L.H. and BUTTLE, D.J. (1992). The baculovirus Autographa californica nuclear polyhedrosis virus genome includes a papain-like sequence. Biol. Chem. Hoppe-Seyler 373, 1211-1215.
- ROEDER, R.G. (1991). The complexities of eukaryotic transcription initiation: regulation of preinitiation complex assembly. TIBS 16, 402-408.
- Brahma: a regulator of Drosophila homeotic genes structurally related to the yeast transcriptional activator SNF2/SWI2. Cell 68, 561-572.
- THIEM, S.M. and MILLER, L.K. (1989a). Identification, sequence, and transcriptional mapping of the major capsid protein gene of the baculovirus
- FIG. 1 Physical map and summary of coding strategy of the AcNPV genome.
- the upper part of each panel represents a map of the sites in the virus genome for the commonly used restriction endonucleases (see text). Also shown are the hrs within the EcoRI map.
- The middle part of each panel summarizes the coding potential of all six reading frames of the virus DNA (1,2,3,1',2',3').
- ORFs are identified as black boxes starting at methionine codons (vertical lines).
- the selected ORFs (see text) are numbered 1-154, with appropriate designations for the genes which have been characterized previously (see Table 1).
- Non-selected ORFs represent potential genes which overlap with other coding regions (see text).
- the lower section in each panel summarizes the percent purine or A+T composition for the + strand of the virus genome, using a sliding 250 nucleotide window. Units at the bottom are in base-pairs.
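By way of illustration only, the composition profile described for the lower section of Fig. 1 may be sketched in Python as follows. The function name `composition_windows`, the choice of non-overlapping steps and the toy sequence are assumptions made for this sketch; the specification states only that a 250 nucleotide sliding window was applied to the + strand.

```python
def composition_windows(seq, letters="AT", window=250, step=250):
    """Return (1-based window start, percent of bases in `letters`) per window."""
    seq = seq.upper()
    wanted = set(letters.upper())
    profile = []
    # Only full-length windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(1 for base in chunk if base in wanted)
        profile.append((start + 1, 100.0 * hits / window))
    return profile


# Toy sequence: percent A+T, then percent purine (A+G), in 4-nucleotide windows.
toy = "AATTGGCC"
print(composition_windows(toy, "AT", window=4, step=4))
print(composition_windows(toy, "AG", window=4, step=4))
```

Passing `letters="AG"` gives the percent purine profile and `letters="AT"` the A+T profile; a step smaller than the window would give a smoother, overlapping profile.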
- FIG. 2 Dot matrix analysis of AcNPV genomic DNA. The genomic sequence of the AcNPV + strand was compared to itself (left panel), or to its complementary - strand (right panel), using a 24 nucleotide moving window.
- The direction of sequence (strandedness) relative to the standard map in each comparison is indicated by the arrows on the x and y axes. Dots represent sites where there is 21 out of 24 or greater nucleotide sequence match (88% identity).
- Matches in the left panel indicate sites of positional identity (diagonal line running lower left to upper right), or direct DNA repeats (dots off the diagonal line).
- Matches in the right panel indicate regions of inverted repetitive DNA.
- Dots close to the position where a diagonal line should be in the right panel represent potential stem-and-loop (hairpin) structures.
- the columns and rows of dots marking the positions of the repetitive DNA associated with hrs are labelled across the top and on the right-side y-axis. Scales on the x and y axes are in kilobase-pairs.
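The dot matrix comparison described for Fig. 2 may be sketched, under stated assumptions, as follows. The window of 24 nucleotides and the threshold of 21 matches are taken from the legend; the function names and the brute-force approach are assumptions of this sketch and are not the program used to prepare the figure. A direct comparison of this kind scales with the product of the two sequence lengths, so it is suitable only for short test sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dot_matrix(x, y, window=24, min_match=21):
    """Return (i, j) pairs (0-based window starts) where x and y match well.

    A dot is reported when at least `min_match` of the `window` aligned
    positions starting at x[i] and y[j] are identical.
    """
    x, y = x.upper(), y.upper()
    dots = []
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            same = sum(1 for k in range(window) if x[i + k] == y[j + k])
            if same >= min_match:
                dots.append((i, j))
    return dots
```

Calling `dot_matrix(seq, seq)` corresponds to the self comparison (direct repeats appear off the main diagonal), while `dot_matrix(seq, reverse_complement(seq))` corresponds to the comparison against the complementary strand (inverted repeats).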
- FIG. 3 Circular map of the AcNPV genome. The sites for the EcoRI (outer ring) and HindIII (inner ring) restriction enzymes are presented. The positions of the 154 ORFs described in Fig. 1 are shown as arrows representing the direction of transcription for these putative genes. Shaded arrows indicate that the gene is known to be expressed, or has a well characterized homologue in the protein sequence databases.
- Insertion sites and names of well characterized insertion sequences (IS) and retroposons (RP) are indicated, as are the positions of the hr sequences.
- the scale on the inner circle is in 100 map units.
- Fig. 10 Modification of the AcNPV CHITINASE gene.
- Panel (a) PstI restriction maps for AcNPV (linearized form).
- Panel (b) Exploded view of genome coordinates 105164-107943 within pUC118.CHIT.
- Panel (c) pUC118.CHIT-BglII.
- Panel (d) pUC118.CHIT.LacZ.
- Fig. 12. Modification of the AcNPV GTA gene.
- Fig. 13 Modification of the AcNPV PCNP gene.
- Fig. 14 Modification of the AcNPV ALK-EXO gene.
- Panel (a) HindIII/SmaI restriction map for AcNPV (linearized form).
- Fig. 15 Single restriction enzyme site (AvrII) within the AcNPV EGT gene. Panel (a): AvrII restriction enzyme map for AcNPV (linearized form).
- Fig. 16 Single restriction enzyme site (FseI) within the AcNPV GTA gene.
- FseI restriction enzyme map for AcNPV (linearized form).
- The selected ORFs are numbered sequentially, in their order of appearance in the + strand of the genome (see text and Fig. 1).
- the left (column Left) and right (column Right) columns define the ends of the ORF irrespective of its encoding strand.
- the direction of the transcripts (column D) that could express the ORF is indicated by arrows.
- The number of amino acids encoded by the ORF (column aa) and the predicted molecular mass of the primary translation product (column Mr) from the first ATG are listed (see text).
- The transcription column (Trans) indicates if at least one early (e/E), or TATA-like (t/T), or cap (c/C) motif is present in the 160 nucleotides upstream of an ORF (see text and Table 3). Where a TATA-box is positioned 5' to a CAGT in a polII-like promoter orientation, this is indicated by "TC".
- A late promoter motif (TAAG) is indicated by "L".
- ORFs that have an initiation methionine that conforms to Kozak-rules (column K) for higher eukaryotes are indicated (k).
- ORFs representing potential mini-cistrons initiating upstream of one of the selected ORFs and with an ATG codon that conforms to Kozak-rules are indicated (*, see text).
- ORFs that initiate at an ATG codon downstream of the first ATG of an ORF and produce a translation product that is smaller than the computer predicted product are marked (*, see text).
- Representative motifs in putative translation products (Table 3) are indicated in the domains column (Dom).
- the motifs included signal peptide (S), zinc finger (Z), leucine zipper (L), nuclear translation signal (N), and NTP binding domain (P).
- The comments column includes differences in genomic organization published for other strains of AcNPV, functional properties of predicted peptide products, or other relevant features. References are listed as a guide to the literature regarding previously published sequences or studies defining AcNPV gene functions.
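- TATA box (TATAAA; TATATA; TATAAT): 61/154 selected ORFs, 40/183 non-selected ORFs
- Late promoter (TAAG): 71/154 selected ORFs, 11/183 non-selected ORFs
- Kozak consensus (AxxATG(A/G); GxxATGG): 91/154 selected ORFs, 52/183 non-selected ORFs
- Zinc finger (C/H X2-5 C/H X1 L13 C/H X2/5 C/H): 31/154 selected ORFs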
- the motifs, their patterns and the number of the selected and non-selected ORFs with at least one copy of the indicated motifs are presented.
- the searches for motifs representing putative early transcription sites involved analyses of DNA sequences 160 nucleotides upstream of the first ATG codon (i.e., CGTGC, TATA box, Cap site and Pol II promoter motifs).
- The search involved 80 nucleotides upstream of the ATG codon.
- Only the selected ORFs were analysed for motifs in the putative gene products (see text).
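The upstream motif search described in the legend above may be sketched as follows. The motif patterns (CGTGC, the three TATA box variants, CAGT and TAAG) are taken from the text; treating CAGT as the cap site pattern, the dictionary labels, the function name and the restriction to one strand of a linear sequence are assumptions of this sketch.

```python
import re

# Patterns taken from the motif descriptions above; the dictionary keys are
# labels invented for this sketch.
MOTIFS = {
    "early": ["CGTGC"],
    "TATA box": ["TATAAA", "TATATA", "TATAAT"],
    "cap site": ["CAGT"],
    "late promoter": ["TAAG"],
}


def upstream_motifs(genome, atg_index, window=160, motifs=MOTIFS):
    """Map each motif label to offsets (negative, relative to the ATG) of hits.

    `atg_index` is the 0-based position of the A of the initiating ATG on the
    strand supplied. The sequence is treated as linear.
    """
    genome = genome.upper()
    begin = max(0, atg_index - window)
    upstream = genome[begin:atg_index]
    found = {}
    for label, patterns in motifs.items():
        offsets = []
        for pattern in patterns:
            # A lookahead lets overlapping occurrences be reported as well.
            for m in re.finditer("(?=" + re.escape(pattern) + ")", upstream):
                offsets.append(begin + m.start() - atg_index)
        found[label] = sorted(offsets)
    return found
```

For an ORF on the complementary strand, the reverse complement of the genome and the corresponding ATG position would be supplied instead; a shorter `window` (for example 80) restricts the search to the region immediately upstream of the ATG.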
- Val GTA 508 18
- Val GTC 482 18
- Val GTG 1083 39
- Val GTT 678 25
- Ile ATA 822 30
- Ile ATC 590 22
- Ile ATT 1286 48
- SEQUENCE LISTING
- MOLECULE TYPE DNA (genomic)
Abstract
The complete nucleotide sequence of the genome of clone 6 of the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV) has been determined. The molecule comprises 133,894 base-pairs and has an overall A+T content of 59 %. Our analysis suggests that the virus encodes some 154 methionine-initiated, and potentially expressed, open reading frames (ORFs) of 150 nucleotides or greater. These ORFs are distributed evenly throughout the virus genome on either strand. The ORFs are arranged as adjacent, non-overlapping reading frames, separated by short intergenic regions. Based on the primary nucleotide sequence, predictions have been made concerning the functions of certain genes, the sites for initiation of viral DNA replication, the regulation of early and late gene transcription, and factors that may affect the AcNPV gene translational efficiency. The genome sequence data confirm, with minor differences, the information obtained for other AcNPV clones. It is proposed that clone C6 is considered the archetype AcNPV for comparison purposes.
Description
AUTOGRAPHA CALIFORNICA COMPLETE GENOME SEQUENCE
Field of the invention
This invention relates to Autographa californica nuclear polyhedrosis virus DNA sequences and particularly to the DNA sequence of the complete virus genome.
Background to the invention
Autographa californica nuclear polyhedrosis virus (AcNPV) is a widely studied baculovirus which has been used to form the basis of polypeptide expression systems (see e.g. US-P-4,745,051 and EP 0 327 626). Modified baculoviruses have also been proposed for use as viral insecticides.
Baculoviruses are invertebrate-specific viruses with large, circular, covalently closed, double-stranded DNA genomes (Francki et al., 1991). The most widely studied member of the Baculoviridae is the Autographa californica nuclear polyhedrosis virus (AcNPV), otherwise known as AcMNPV to signify the encapsidation of multiple nucleocapsids in the occluded particle (polyhedron). Several clones of AcNPV have been utilised in studies of the virus genetic structure, gene expression, the development of baculoviruses as expression vectors of foreign genes, and as genetically modified virus insecticides (see King and Possee, 1992; O'Reilly et al., 1992). These include clones E2 (Smith and Summers, 1978, 1979), L1 (Miller and Dawes, 1979), HR3 (Cochran et al., 1982), E (Tjia et al., 1979), and C6 (Possee et al., 1991). The sequences of selected regions of the DNA from these cloned viruses have been reported in the literature. This information has been summarised recently by Kool and Vlak (1993).
Hitherto, the development of applications utilising baculovirus expression systems has been hampered by lack of knowledge of the structure and organisation of the AcNPV genome.
Summary of the invention
This problem has now been overcome, and the complete sequence of the C6 clone of AcNPV has been determined and analyzed. This nucleotide sequence is set forth as the attached SEQ ID NO. 1.
Certain features within the genomic sequence that relate to the encoded genes, viral DNA replication, early and late gene transcription, and protein translation can be identified. From the data it is possible to predict the potential coding capacity of the virus and identify structural motifs and suggested functions for some of the putative gene products.
As will be appreciated, knowledge of the genomic sequence and its components for AcNPV will allow the preparation of (A) new, temporally controlled, expression vectors for foreign gene expression, (B) a tailored genome for manufacturing purposes, (C) improved expression of foreign gene products following the removal of deleterious genes, (D) the derivation of multiple gene expression vectors, (E) the derivation of engineered, genetically stable, virus insecticides including viruses with altered host ranges, and (F) the identification of the controls of temporally expressed AcNPV genes and their use in regulating foreign gene expression in transgenic eukaryotic cells.
More specifically, a number of new late genes and late gene promoters, and early genes and early gene promoters, are identified in the AcNPV C6 sequence. These are genes that have not been reported hitherto and which, via substitution of the downstream gene or by promoter duplication, alone or in concert with other promoters of AcNPV or other baculoviruses, will allow the expression of foreign genes, or operons, or duplicated AcNPV genes at defined times in the baculovirus infection process.
The genome data provides information on which specific restriction enzymes do not cut the AcNPV C6 genome. The data also identifies restriction enzymes that cut the sequence only once, or twice, or thrice, etc, and the location of all such sites. The latter sites can now be removed by deletion (for non-essential, including coding sequences), or by site directed mutagenesis (for essential, including coding sequences). Thereafter, AcNPV derivatives can be constructed whose genomes are cut by these specific enzymes only at defined locations (new sites). This will allow the linearisation of the virus DNA at defined locations in order to facilitate the introduction of foreign genes. The new sites may be located within a reporter gene sequence for the efficient identification of recombinant expression vectors by the loss of the reporter gene function. Additional sequences representing these restriction sites may also be placed in flanking sequences of essential genes to improve the efficient recovery of recombinants using transfer vectors that provide both the foreign gene and the unmodified essential flanking sequences. Further, the use of a number of such enzyme sites strategically located in the virus genome will allow the preparation of genetically stable, multiple gene expression vectors.
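As a hedged illustration of how non-cutting and single-cutting enzymes might be identified from a circular genome sequence, a minimal Python sketch follows. The recognition sequences in `ENZYMES` are standard published values supplied for this sketch and are not quoted in this specification; the function names are likewise assumptions. All of the listed sites are palindromic, so a scan of one strand is sufficient for them.

```python
import re

# Recognition sequences are standard published values (N = any base); they are
# not quoted in the specification. All of them are palindromic, so scanning a
# single strand finds every site.
ENZYMES = {
    "Bsu36I": "CCTNAGG",
    "SrfI": "GCCCGGGC",
    "Sse8387I": "CCTGCAGG",
    "AvrII": "CCTAGG",
    "FseI": "GGCCGGCC",
    "PstI": "CTGCAG",
}


def site_positions(genome, site, circular=True):
    """Return 1-based start positions of `site` in `genome`."""
    genome = genome.upper()
    pattern = "(?=" + site.upper().replace("N", "[ACGT]") + ")"
    # For a circular molecule, append the first len(site) - 1 bases so that
    # sites spanning the origin are found.
    text = genome + genome[:len(site) - 1] if circular else genome
    return [m.start() + 1 for m in re.finditer(pattern, text)
            if m.start() < len(genome)]


def classify_enzymes(genome, enzymes=ENZYMES):
    """Group enzyme names by the number of sites found in the genome."""
    groups = {}
    for name, site in enzymes.items():
        count = len(site_positions(genome, site))
        groups.setdefault(count, []).append(name)
    return groups
```

In the returned dictionary, the entry for key `0` lists enzymes with no recognition site and the entry for key `1` lists enzymes that would linearise the circular genome at a single position.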
The genome sequence allows for the identification of essential and non-essential genes in relation to the infection course of the virus in different types of cultured cells and host insects. In the new sequence data there are many genes that will be proven to be essential to the infection course of the virus in cultured cells and insect hosts and other genes that are non-essential to one or other or both substrates. Such non-essential genes can now be specifically removed from the AcNPV genome without affecting the expression of essential, including flanking, genes, or the replication of the virus in certain cultured cells. Removal of such genes, and the corresponding reduction in the AcNPV genome size and hence in the cost to the overall transcription, translation and other processes induced by the virus, or certain other processes and structures naturally operative in the host cell, will provide a preferred expression vector system and improved virus replication. The modifications will allow the time when foreign gene products are made to be regulated and improvements to the amounts and quality of such products. Importantly, removal of such genes will be to the benefit of commercial manufacturing processes and environmental safety. For example, natural AcNPV genes that facilitate the persistence of AcNPV in the environment, and/or that provide for the productive infection of insect larvae, and/or that facilitate the transmission of infectious virus in the environment by affecting characters such as determinants of host range, cell death and larval degradation, will be suitable candidates for removal. The loss of any or all such functions and the derivation of disabled virus expression vectors will prohibit the occurrence of any adverse consequences of virus escape from laboratories or manufacturing establishments, by eliminating any potential effect on natural insect populations in the environment, or the likelihood of re-acquisition of such genes and functions from natural sources.
In addition to the beneficial effects on foreign gene expression obtained by removing non-essential genes, both nuclease and protease genes deleterious to the transcription, expression and product accumulation of foreign genes expressed by baculovirus vectors have been identified. Removal of such genes will also provide for improved expression vectors.
Further to the items listed above, the sequence information allows new sites to be identified for the insertion of single or multiple gene expression cassettes composed of viral promoters, foreign gene(s) of choice, including new polyadenylation sites and transcription terminators. Such cassettes can now be positioned so that they do not affect resident genes, their promoters, terminators, polyadenylation sites, or give mRNA species that act as antisense sequences to required viral genes. Such developments will provide for genetic stability in the constructs. The sites may be contiguous. Additionally, or alternatively, the sites may be non-contiguous thereby facilitating expression of foreign genes without incurring deleterious positional effects on mRNA transcription.
The genome sequence allows genetically engineered virus insecticides to be produced by exploiting the advantages described above with regard to tailored genome size, genetic stability, multiple foreign gene expression, and by the exploitation of gene dose.
The ability to introduce genes into prescribed sites in the AcNPV genome and derivatives without affecting resident genes thereof includes the ability to transfer from other baculoviruses and other origins individual genes, cassettes of genes, and other DNA sequences that will affect the virus host range, its transmission and stability in the environment. The benefits will include effects on the LD50 (lethal dose required to kill 50% of a target species) and LT50 (lethal time in 50% of members of an infected host species) and other biological properties of the natural virus. Such sequences will include, for example, genes representing baculoviruses with alternative host ranges, including genes from viruses that have proved impossible to grow, or to clone in cultured cells. Thus with the available knowledge it will be possible to alter the host range of the virus and hence to prepare virus insecticides with broader host ranges.
The AcNPV genome contains genes and sequences that alone or in concert with host factors regulate the expression of viral genes generally in a temporally controlled fashion. The genome sequence allows the identification of all such regulatory viral genes and sequences. Armed with this information it will now be possible to prepare transgenic cells with all the required information to express foreign genes in the absence of a viral vector.
The sequence information contained in SEQ ID NO. 1 may be used in the manufacture of a range of novel polynucleotides which may be used industrially. Thus the invention according to one aspect thereof provides the use of sequence information derivable from the complete genomic sequence of AcNPV in the manufacture of a polynucleotide for use in an industrially applicable process. The invention further provides the use of sequence information derivable from the complete genomic sequence of AcNPV in the manufacture of a polynucleotide capable of acting as a control sequence in the expression of a foreign gene in an insect or insect cell.
Preferably said sequence information is derivable solely and/or primarily from said complete genomic sequence. Thus, e.g. the information may be derived from sequence data present in said complete genomic sequence, but essentially absent from or present in incomplete form in previously available sequence data.
Thus, for example, sequence analysis of the complete genomic sequence contained in SEQ ID NO. 1 has revealed the presence of 154 open reading frames of which 91 have not hitherto been described. These novel open reading frames are identified in Table 1 as ORF 13, 22-26, 28-30, 32, 38, 41-46, 50-60, 62-63, 66, 68-79, 81-87, 91-92, 96-98, 101-103, 106-126, 129-130, 140-146, 148-150, 152 and 154. The present invention thus includes isolated polynucleotides containing a nucleotide sequence which corresponds to one of the aforementioned ORFs. (By "corresponding to" as used herein is meant a nucleotide sequence which is identical to the disclosed sequence or which has sufficient homology to hybridize to the aforementioned sequence under hybridization conditions corresponding to TM -19 to TM -25. Alternatively, expressed as percentage homology, the corresponding sequences may be at least 80%, preferably at least 90% and most preferably at least 95% homologous to the stated sequence. Desirably the degree of homology is not less than 98%.)
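The percentage thresholds stated above may be illustrated with a short Python sketch. It assumes that "percentage homology" is read as the percentage of identical positions between two sequences that have already been aligned to equal length; no alignment is performed, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * same / len(a)


def homology_tier(pct):
    """Return the highest threshold from the text (98, 95, 90, 80) met by pct."""
    for threshold in (98, 95, 90, 80):
        if pct >= threshold:
            return threshold
    return None
```

A result of `None` from `homology_tier` indicates that the sequence falls below the lowest (80%) threshold.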
In addition to the polynucleotides having the aforementioned sequences, the invention also includes polypeptides obtainable by expressing polynucleotides corresponding to the aforementioned ORFs. Such expression may be achieved by incorporating an insert having a sequence corresponding to one of the aforementioned polynucleotides into a suitable expression vector in association with and under the control of appropriate expression control sequences.
Information derived from the SEQ ID NO. 1 may be used to optimize polypeptide expression in expression systems based upon baculoviruses by selecting appropriate control sequences.
Thus the present invention further provides a method of synthesizing a polypeptide by expressing the polypeptide in an insect or cultured insect cell which has been transformed by an expression vector derived from AcNPV, the expression vector containing a coding sequence coding for the polypeptide and control sequences responsible for control of replication of the expression vector and/or transcription of the coding sequence, characterized in that the control sequences are selected on the basis of sequence information derived from SEQ ID NO. 1.
The information derived from SEQ ID NO. 1 additionally enables the efficiency of polypeptide expression to be increased by modifying the nucleotide sequence being expressed so as to take advantage of the preferred codon usage which is characteristic of the ORFs which have been identified in SEQ ID NO. 1.
Thus the invention provides a method of synthesizing a polypeptide by expressing the polypeptide in an insect or cultured insect cell which has been transformed by an expression vector derived from AcNPV, the expression vector containing a coding sequence coding for the polypeptide and control sequences responsible for control of replication of the expression vector and transcription of the coding sequence, characterized in that the coding sequence is adapted by selecting codons in accordance with the preferred codon usage of AcNPV.
Preferred codon usage differs between species and expression of foreign polypeptides can often be hampered if codons contained in the coding sequence to be expressed correspond to less preferred codons in the expression host. Knowledge of the preferred codon usage for AcNPV allows the DNA sequence of the insert being expressed to be modified so as to increase the proportion of codons which are preferred for AcNPV.
Thus, for example, it is particularly preferred that the coding sequence should be modified (if necessary) so as to ensure that one or more (and preferably at least ten, most preferably at least 15) of the amino acids indicated below are encoded by the indicated codons:
Amino Acid Preferred Codon(s)
Ala GCC or GCG
Arg AGA, CGA, CGC,
Asn AAC
Asp GAC
Cys TGC
Gln CAA
Glu GAA
Gly GGC
His CAC
Ile ATT or ATA
Leu TTA or TTG
Lys AAA
Phe TTT
Pro CCC or CCG
Ser AGT, TCG or TCT
Thr ACG
Tyr TAC
Val GTG
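As an illustration only, a coding sequence using preferred codons might be generated from an amino acid sequence as sketched below. One codon per amino acid is picked from the preferred list (an arbitrary pick where several are listed); methionine and tryptophan, which are absent from the list, are given their single standard codons. The names `PREFERRED` and `back_translate`, and the choice of TAA as stop codon, are assumptions of this sketch.

```python
# One codon per amino acid, picked from the list above (an arbitrary pick where
# several are listed). Met and Trp are not in the list; each has a single codon
# in the standard genetic code.
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAA", "E": "GAA", "G": "GGC", "H": "CAC", "I": "ATT",
    "L": "TTA", "K": "AAA", "F": "TTT", "P": "CCC", "S": "TCG",
    "T": "ACG", "Y": "TAC", "V": "GTG", "M": "ATG", "W": "TGG",
}


def back_translate(protein, stop_codon="TAA"):
    """Return a coding sequence for `protein` (one-letter codes) plus a stop."""
    codons = [PREFERRED[aa] for aa in protein.upper()]
    return "".join(codons) + stop_codon
```

A fuller treatment would distribute codons among the listed alternatives rather than always using a single codon per amino acid; that refinement is outside this sketch.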
A person of ordinary skill in this art could therefore employ the preferred codons for the different amino acids as described herein in order to optimize expression of a variety of different heterologous proteins using the claimed expression vector and the claimed methods. For example, in order to optimize gene expression, the genes encoding a desired heterologous protein could be modified to include the more preferred codons (see list above) and to exclude the less preferred codons (see list below for codons to avoid). For example, DNA sequences encoding different enzymes, hormones, toxins, antibodies and receptors may be modified as described herein to enhance production. Also, many different proteins useful in agriculture (proteins are modified to alter insect behavior in a desirable way), clinical therapy, and/or diagnosing disease could be modified. Examples of these different proteins include, but are not limited to, the following: hepatitis B virus core antigen, hepatitis B virus surface antigen, bovine Herpesvirus-1 glycoprotein gIV, Human immunodeficiency virus type 1 (HIV-1) envelope protein gp 120, HIV-1 envelope protein gp 160, HIV-1 Gag protein, HIV-1 Gag-pol fusion protein, HIV-1 Integration protein, HIV-1 Major core p24, HIV-1 Nef protein, HIV-1 Pol protein, HIV-1 protease, HIV-1 Rev protein, Human immunodeficiency virus type 2 Gag precursor protein, Human T-cell lymphotropic virus type 1 (HTLV-1) p20E protein, HTLV-1 gp46 protein, HTLV-1 p40x protein, Bacillus thuringiensis subspecies kurstaki HD-73 delta endotoxin, Bacillus thuringiensis subspecies aizawai 7.21 crystal protein, Androctonus australis Hector (Scorpion) insect neurotoxin (AaIT), Buthus eupeus (Scorpion) BeIT insectotoxin-1, Heliothis virescens juvenile hormone esterase, Manduca sexta eclosion hormone, Manduca sexta diuretic hormone, Pyemotes tritici (Mite) neurotoxin (TxP-1), Human Adenosine deaminase, Human beta2-adrenergic receptor, human aldose reductase, human beta-amyloid precursor, human apolipoprotein-E, human CD4 HIV receptor, human cystic fibrosis gene product, human erythropoietin (EPO), human granulocyte-macrophage colony stimulating factor (GM-CSF), human immune activation gene Act-2, human alpha interferon, human beta interferon, human interleukin-2, human interleukin-5, human interleukin-6, human tyrosine kinase (p56lck), human beta-nerve growth factor, human protein kinase C, human tissue plasminogen activator, and human tumor necrosis factor receptor.
Alternatively, it is preferred that the coding sequence should be modified so that the following codons are avoided (these being less preferred codons for the indicated amino acids):
Amino Acid Codon(s) to be avoided
Ala GCA or GCT
Arg AGG or CGG
Gln CAG
Glu GAG
Gly GGG
Ile ATC
Leu CTA, CTC or CTT
Lys AAG
Phe TTC
Pro CCA
Ser TCA or TCC
Val GTA or GTC
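A coding sequence could be screened for the less preferred codons listed above along the following lines. This is a minimal sketch: the set `AVOID` transcribes the list above, while the function name and the convention of numbering codons from 1 are assumptions.

```python
AVOID = {
    "GCA", "GCT",         # Ala
    "AGG", "CGG",         # Arg
    "CAG",                # Gln
    "GAG",                # Glu
    "GGG",                # Gly
    "ATC",                # Ile
    "CTA", "CTC", "CTT",  # Leu
    "AAG",                # Lys
    "TTC",                # Phe
    "CCA",                # Pro
    "TCA", "TCC",         # Ser
    "GTA", "GTC",         # Val
}


def avoided_codons(cds):
    """Return (codon number, codon) for each listed codon in the reading frame."""
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in AVOID:
            hits.append((i // 3 + 1, codon))
    return hits
```

The sequence is read in frame from its first nucleotide, so the input is expected to begin at the initiating ATG.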
Definitions of terms/phrases
Chaperon sequences shall be defined as a sequence encoding a protein which contains a nucleotriphosphate and which is capable of leading, escorting or "chaperoning" a different protein into the nucleus from the cytoplasm.
Open Reading Frame (ORF) and Potential Genes shall be used interchangeably throughout this disclosure. For the purpose of this invention, open reading frame shall refer to a specific length of DNA with a methionine start codon and terminated by a translation stop codon.
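The definition above, taken together with the 150 nucleotide minimum length mentioned in the Abstract, may be sketched as a simple six-frame scan. The sketch treats the sequence as linear, counts the stop codon in the ORF length and reports coordinates on the + strand; these conventions, the function names and the use of the standard stop codons TAA, TAG and TGA are assumptions and may differ from the analysis used to prepare Table 1.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return sorted (left, right, strand) for ATG-to-stop ORFs of >= min_len nt.

    Coordinates are 1-based positions on the + strand; each ORF runs from the
    first ATG after the previous stop codon to the next in-frame stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start + 1, end, "+"))
                        else:
                            orfs.append((n - end + 1, n - start, "-"))
                    start = None
    return sorted(orfs)
```

For a circular genome, ORFs spanning the origin would be missed by this linear scan unless the sequence is extended past its end before searching.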
Predicted sequences describes a sequence of putative protein as derived from the DNA sequence in the open reading frame. Using the genetic code one of ordinary skill in this art could readily define a protein sequence corresponding to each of the 154 open reading frames presented in Table 1.
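The derivation of a predicted protein sequence from an open reading frame may be sketched as follows, using the standard genetic code. The compact table construction and the function name `translate` are choices made for this sketch; codons containing ambiguous bases are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate an ORF (DNA, starting at the ATG) up to the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")
        if aa == "*":  # stop codon: end of the primary translation product
            break
        protein.append(aa)
    return "".join(protein)
```

For an ORF encoded on the complementary strand, the reverse complement of the genomic coordinates would be supplied to `translate`.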
Putative is defined as "assumed to exist" e.g. "encodes a putative alkaline exonuclease" (infra under the Heading "Gene functions", last para.).
For the purpose of this invention, data is used to define nucleotide sequences based on computer predictions; particularly when assuming the function of putative gene product.
For the purpose of this invention, a consensus sequence is defined as a sequence specific for a biological function or characteristic as determined by computer sequence analysis. Consensus sequences may also be used to define a sequence (and corresponding characteristic or function for this sequence) which is shared or found to be homologous among different species.
DNA wobble is a term used to explain how the third nucleotide of a codon can vary or "wobble" and still encode the same amino acid. For example, for years it has been known that TTT, and TTC both encode the amino acid phenylalanine and that ATT, ATC, and ATA all encode for the amino acid isoleucine.
Protease sequence defines those amino acid sequences found on certain proteins which are known or presumed (because of a consensus sequence) to play a role in the enzymatic digestion of other proteins.
Ligase sequences refers to an amino acid sequence that is capable of joining or ligating the ends of RNA molecules or joining the ends of DNA molecules. For example, T4 DNA ligase is used to join or ligate compatible "sticky" or "blunt" ends of DNA derived after restriction enzyme digestion. "Sticky" and "blunt" are terms in the art to define how the ends of DNA molecules appear after restriction enzyme digestion.
Helicase sequences are protein sequences in enzymes associated with the unfolding of DNA molecules.
For the purpose of this invention, polymerase sequences refer to either RNA or DNA polymerases. These enzymes are responsible for synthesizing RNA or DNA from the appropriate template.
Deleterious sequences refer to a sequence that can have a deleterious effect on the production or efficiency of certain proteins being produced in the host cells. For example, a protease sequence might be deleterious if this protease specifically breaks down the foreign recombinant protein synthesized in the insect cell via a baculovirus expression vector.
Enhancer sequences are DNA sequences which increase the transcription of a virus gene. For example, dot matrix analysis of the AcNPV sequence against itself and its complement revealed eight regions of direct and inverted repetitive DNA sequences (hr1-hr5). The hrs are involved in enhancing early mRNA transcription and act as origins of DNA replication (infra, first two sentences under the heading "AcNPV genomic organization and repetitive DNA").
In this application, the terms disrupted, interrupted, mutated, and deleted are sometimes used interchangeably in reference to specific ORFs. It is intended that these terms refer to a condition where the encoded protein is no longer functional due to a disruption, interruption, mutation, or some other interference that prevents, shuts down, nullifies or inhibits the otherwise named function.
Although SEQ ID NO. 1 was derived from the C6 clone of AcNPV, the sequence information provided according to the invention may be used to optimize expression in other baculovirus expression systems. Thus, for example, published partial sequence data, restriction
enzyme and hybridization analysis can be used to identify other clones and baculovirus isolates from insects which may be strains, variants or varieties of AcNPV. Such isolates include viruses obtained from Autographa californica, Autographa gamma, Galleria mellonella, Plutella xylostella, Rachiplusia ou, Spodoptera exempta, Spodoptera litura and Trichoplusia ni. Such viruses are likely to possess DNA sequences, genes, origins of replication, transcriptional promoters, terminators and regulatory factors in common with those of AcNPV C6 and such entities are likely to be involved in directing the course of infection, multiplication and morphogenesis of these viruses as well as their interactions with hosts, host cells and components thereof. Accordingly, the information provided according to the invention of SEQ ID No. 1 may be used in the development of expression systems utilizing these alternative viruses and virus strains.
The complete nucleotide sequence of the genome of clone 6 of the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV) has been determined. The molecule comprises 133,894 base-pairs and has an overall A + T content of 59%. Our analysis suggests that the virus encodes some 154 methionine-initiated, and potentially expressed, open reading frames (ORFs) of 150 nucleotides or greater. These ORFs are distributed evenly throughout the virus genome on either strand. The ORFs are arranged as adjacent, non-overlapping reading frames, separated by short intergenic regions. Based on the primary nucleotide sequence, predictions have been made concerning the functions of certain genes, the sites of initiation of viral DNA replication, the regulation of early and late gene transcription, and factors that may affect the AcNPV gene translational efficiency. The genome sequence data confirm, with minor differences, the information obtained for other AcNPV clones. It is proposed that clone C6 be considered the archetype AcNPV for comparison purposes.
Specific description and Examples
The procedure used for determining SEQ ID No. 1 according to the invention will now be described in more detail.
Description of the drawings
Figure 1. A physical map and summary of coding strategy of the AcNPV genome.
Figure 2. A dot matrix analysis of AcNPV genomic DNA.
Figure 3. A circular map of the AcNPV genome.
Figures 4 - 14. A construct for modifying the following respective genes to identify which genes are dispensable (non-essential) and which genes are indispensable (essential) for viral replication in cell culture or insect larvae. Figure 4: AcNPV IAP1 gene; Figure 5: AcNPV HSP gene; Figure 6: AcNPV FGF gene; Figure 7: AcNPV IAP2 gene; Figure 8: AcNPV PNK/PNL gene; Figure 9: AcNPV PK2 gene; Figure 10: AcNPV chitinase gene; Figure 11: AcNPV CATH gene; Figure 12: AcNPV GTA gene; Figure 13: AcNPV PCNP gene; and Figure 14: AcNPV ALK/EXO gene.
Figure 15. Single restriction enzyme site within the AcNPV EGT gene.
Figure 16. Single restriction site (FseI) within the AcNPV GTA gene.
Examples
EXAMPLE 1
Procedure for Determining Autographa californica cDNA Sequence (SEQ. ID. NO. 1)
I. Materials and Methods
A. Virus and cells
The AcNPV C6 clone (Possee, 1986; Possee et al., 1991) was propagated in Spodoptera frugiperda cells (IPLB-SF21, Vaughn et al., 1977) as described previously (Possee, 1986).
B. Preparation of virus DNA and recombinant plasmids
AcNPV genomic DNA was prepared as described by Possee (1986). The DNA was digested with an appropriate restriction endonuclease (BamHI, BglII, EcoRI, HindIII, PstI, SstI, SstII). The derived DNA fragments were inserted into pUC18/19, pUC118/119 or pT7T3 18/19 vectors using standard protocols (Sambrook et al., 1989). Where appropriate, plasmids containing larger regions of virus DNA were digested with a restriction enzyme to release the insert, the virus DNA purified using agarose gel electrophoresis and then digested with another restriction enzyme. These smaller DNA fragments were inserted into plasmid vectors to provide materials more convenient for DNA sequencing.
C. DNA sequencing
Complementary strategies were used to sequence the library of cloned AcNPV C6 DNA. Recombinant plasmids containing the M13 intergenic region were used to produce single-stranded DNA by superinfection of Escherichia coli (JM105 or NM522 strains) with the helper phage M13 K07 (Vieira and Messing, 1987). Double-stranded DNA was derived from other plasmids using the alkaline lysis method of Birnboim and Doly (1979). Single-stranded DNA templates were sequenced using modified T7 DNA polymerase (Sequenase, US Biochemicals; Tabor and Richardson, 1987) according to the protocols recommended by the supplier. Reaction mixtures contained the dGTP analogue, 7-deaza dGTP, in lieu of dGTP in order to reduce sequence compressions. For certain regions of DNA sequence that were difficult to resolve, dITP was substituted for dGTP in the sequencing reactions. The M13 primer (5' GTAAAACGACGGCCAGT) was used to sequence the ends of each virus DNA fragment. Oligonucleotide primers, prepared using an Applied Biosystems Instruments synthesizer (ABI, Model 380B, Warrington, UK), were employed to obtain the internal sequences of the viral fragments. Where appropriate, double-stranded DNA templates were used to complete regions of the AcNPV sequence not analysed as single-stranded DNA. An ABI automated sequencer (model 370A) was also used on occasion. Using the established nomenclature for describing (by rank of size) the AcNPV restriction endonuclease fragments (e.g., A, B, C, etc.), the following cloned virus DNA fragments were completely sequenced: BamHI-D, -E and -G; BglII-G; HindIII-C to -K, -O to -S, -U, -W and -X; PstI-J to -M; SstI-F to -H. Partially sequenced fragments included: BamHI-E; BglII-E and -H; HindIII-L; PstI-B and -C; SstI-O and SstII-I. All the DNA sequences between adjoining virus DNA fragments were determined using appropriate subclones spanning the respective junctions.
D. Sequence data analysis
Sequence assembly and analysis were performed using the University of Wisconsin GCG package (v7.3) (Devereux et al., 1984). The searches for sequence identities were made using the programs FASTA (Pearson and Lipman, 1988), BLAST (Altschul et al., 1990), and BLITZ (Smith and Waterman, 1981; Sturrock and Collins, 1993). For amino acid sequence analyses the PIR, SWISSPROT and NCBI non-redundant databanks were searched, while for nucleic acid searches the FASTA program and the combined GenBank/EMBL databases were employed. Scoring for amino acid sequence similarities in database searches was undertaken using the PAM250 scoring matrix (Dayhoff et al., 1978), except for the BLAST searches, which used the BLOSUM62 scoring matrix (Henikoff and Henikoff, 1992). For some of the BLITZ searches the PAM120 scoring matrix was used. The % protein similarities reported in this paper were all calculated using the PAM250 scoring matrix. Protein motifs were investigated by profile analysis (Gribskov et al., 1990) and pattern matching using the PROSITE database (Bairoch, 1993).
II. Results
A. DNA sequencing
The DNA sequences of the AcNPV C6 homologous region (hr) 1, EcoRI-I and -R fragments have been reported (Possee et al., 1991). The remaining sequence of this AcNPV clone was determined from a data set comprising approximately 10^6 nucleotides. The complete AcNPV genomic sequence has been determined to consist of 133,894 base-pairs (bp) and has an A+T content of 59%. The distributions of purines and A+T nucleotides for the plus strand (+ strand; see convention established by Vlak and Smith, 1982) throughout a linearized representation of the circular AcNPV genome are shown in Fig. 1, using a moving window of 250 nucleotides. A physical map of the genome was derived from the sequence data and is also illustrated in Fig. 1. This shows the arrangement of some of the common restriction enzyme sites frequently used to map the virus DNA (EcoRI, HindIII, PstI, SstI, BglII, XhoI). Although circular, the map is presented with the first EcoRI site of hr1 as the left end of the genome. The virus DNA fragments shown in Fig. 1 are labelled alphabetically, in decreasing order of size (Vlak and Smith, 1982).
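The moving-window A+T profile described above can be sketched as follows. This is an illustrative Python example only; the function name `at_content_windows` and the one-nucleotide step are assumptions, and the sequence is treated as linear, as in the linearized representation of Fig. 1.

```python
def at_content_windows(sequence, window=250, step=1):
    """Return the A+T fraction of each window along a linear sequence.

    Illustrative sketch: each window is recounted from scratch, which is
    simple but slower than a running count for a genome-sized input.
    """
    seq = sequence.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("A") + chunk.count("T")) / window)
    return fractions
```

The same function gives a purine profile if the counted letters are changed to "A" and "G".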
Our analysis of the AcNPV C6 sequence necessitates some correction to the restriction enzyme maps of AcNPV. These maps were based on estimating the sizes of DNA fragments resolved in agarose gels. The maps of different AcNPV clones may differ from that of C6; however, the published data of other AcNPV clones suggest that such differences, if any, are marginal. On this basis the AcNPV C6 sequence and genetic map may be considered as a reference. The changes to the C6 map (see Fig. 1) include the interchange of EcoRI-W and -X fragments and an additional EcoRI-Y fragment between fragments EcoRI-V and -U. In the HindIII map, the HindIII-N and -O fragments are interchanged. A small fragment of 38 nucleotides is present between the HindIII-L and -M fragments and a 12 nucleotide fragment between the HindIII-C and -W fragments (see Lu and Carstens, 1991 for the data on the clone HR3). The only exceptions to labelling fragments uniquely according to their size are the HindIII-A1 (15,293 bp) and -A2 (7,576 bp) fragments. These are designated A1 and A2 in Fig. 1 solely for convenience of comparison with previously published data. The SstI map is modified to interchange the SstI-A and -B fragments and the BglII map is modified to interchange the BglII-G and -H fragments. For the XhoI map, we have determined that the XhoI-C and -D fragments should be interchanged, as well as the XhoI-H and -I fragments, since their sizes are inconsistent with the original restriction maps. Further, the extra XhoI-N fragment, reported by Cochran and associates (1982) for the HR3 clone, is located between the XhoI-C and -F fragments as a 379 bp region.
An additional hr, designated hr1a, 88 bp in length, is located between the EcoRI-I and -R fragments. This was not detected by the sequencing strategy used in the study reported previously (Possee et al., 1991). Further, the hr4L and hr4R (Guarino et al., 1986) are renamed hr4a and hr4b. This change of nomenclature is warranted since there is an additional copy of the 30 bp palindrome hr located to the right of hr4b. It is therefore designated hr4c (Fig. 1, Table 2). The hr4c represents an imperfect copy of the typical AcNPV 30 bp palindrome since there is a base change that mutates to AAATTC the characteristic EcoRI site (GAATTC) found in the centre of all other AcNPV hr palindromes (Table 2).
EXAMPLE 2
AcNPV coding potential
The primary AcNPV sequence data was analysed to predict potential protein coding regions and to determine the gene organisation. Fig. 1 shows the positions (black boxes) of 337 open reading frames (ORFs) that are initiated with a methionine codon (vertical bars) and which could encode polypeptides of at least 50 amino acids. We recognise that this strategy of analysis does not identify gene products that may be smaller than 50 amino acids, or products that are generated by removal of introns from primary mRNA transcripts representing larger regions of the genome. We chose 154 ORFs for further analysis. These are named and/or numbered in Fig. 1 (see also Table 1). For convenience, in this paper these ORFs are termed the "selected ORFs". The 154 selected ORFs are numbered according to their position in the virus genome beginning at the left end of the linear map (Fig. 1), and irrespective of their orientation (i.e., on which DNA strand or potential reading frame they are encoded, i.e., 1, 2, 3, or 1', 2' or 3', Fig. 1). Where appropriate in Fig. 1 the names of previously identified genes are included (see also Table 1). For example, ORF 1 encodes a virus protein tyrosine/serine phosphatase (PTP) previously identified by Kim and Weaver (1993).
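A scan of this kind, for methionine-initiated ORFs of at least 50 codons in all six reading frames, might be sketched in Python as shown below. This is not the program used for the analysis; the function names are hypothetical, the sequence is treated as linear (ORFs spanning the origin of the circular map are not found), and each ORF is reported from the first ATG following the preceding in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(sequence, min_aa=50):
    """Return (strand, start, end, n_aa) for ATG-initiated ORFs on both strands.

    start and end are 0-based, end-exclusive positions on the strand being
    read (5' to 3'); end includes the stop codon. n_aa counts codons from
    the ATG up to, but not including, the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            first_atg = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG" and first_atg is None:
                    first_atg = i
                elif codon in STOP_CODONS:
                    if first_atg is not None:
                        n_aa = (i - first_atg) // 3
                        if n_aa >= min_aa:
                            orfs.append((strand, first_atg, i + 3, n_aa))
                    first_atg = None
    return orfs
```

Lowering `min_aa` would report shorter ORFs, at the cost of many more candidates arising by chance.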
Table 1 provides a more detailed summary of the information concerning the selected ORFs. The left end of each ORF identified in Table 1 (column Left) represents the site of either the translation initiation or termination codon, as determined by the orientation of the ORF. Correspondingly, the right end of each ORF (Table 1, column Right) indicates the respective translation termination or initiation codon. The direction of transcription (Table 1, column D), relative to that of the polyhedrin gene, is indicated by an arrow. The predicted number of amino acids (Table 1, column aa) per methionine initiated polypeptide derived from the ORF, and the Mr of that polypeptide are also given.
We based our selection of the 154 ORFs on the assumption that expressed AcNPV genes are distributed as non-overlapping, contiguous sequences (single exons). This conclusion derives from prior empirical observations of the baculovirus ORFs that have been characterized. The only known exception to this is the splicing of the transcript which spans IE-0 and IE-1 (Chisholm and Henner, 1988; Kovacs et al., 1991). Further studies may identify other exceptions and invalidate the assumption we have made. In trimming the dataset of 337 ORFs to 154 ORFs, we omitted potential coding regions which were overlapped by larger ORFs on the same or opposite strand of DNA. Gene product analyses will be required to determine if this assumption is correct. Where ambiguity arose with two nearly identically-sized ORFs, we chose the larger ORF unless there was previous experimental evidence contrary to this assumption. For example, the large ORF encoded entirely within the region of gp67 (ORF128, Fig. 1), but on the opposite strand, was excluded from our final dataset. Similarly, ORF100, which encodes the basic DNA binding protein, p6.9, of AcNPV (Wilson et al., 1987), was included in our final dataset. As a consequence the two similar sized ORFs that overlap ORF100 were not. Further analyses of the selected and non-selected ORFs will determine whether these assumptions are correct.
Where two ORFs of similar size partially overlapped their amino and/or carboxy terminal coding regions, both were included in the final dataset. Thus, the 5' end of ORF6 (lef-2) starts within the 3' region of ORF5. Similarly, the 3' end of ORF14 (lef-1) overlaps the start of ORF13.
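One simplified reading of this trimming step can be sketched in Python as follows. The sketch drops any ORF that lies wholly within a longer ORF on either strand and keeps ORFs that merely overlap at their ends. It is an assumption-laden illustration, not the selection procedure itself, which also drew on prior experimental evidence (as for ORF100); the function name and the coordinate representation are hypothetical.

```python
def drop_contained_orfs(orfs):
    """Keep ORFs that are not wholly contained within a longer ORF.

    Each ORF is a (left, right) pair of genome coordinates with left < right.
    Strand is deliberately ignored, since containment on either strand counts.
    """
    kept = []
    for left, right in orfs:
        contained = any(
            o_left <= left
            and right <= o_right
            and (o_right - o_left) > (right - left)
            for o_left, o_right in orfs
        )
        if not contained:
            kept.append((left, right))
    return kept
```

For example, given ORFs at (100, 1000), (200, 500) and (900, 1300), the second is removed because it lies inside the first, while the third is retained because it only overlaps the 3' end of the first.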
Many computer programs are now available to aid in the identification of coding regions within a genome. Therefore, to confirm the effectiveness of the chosen strategy, we used the neural-net ORF identification programme, GRAIL (Uberbacher and Mural, 1991), as an alternative procedure to search for coding potential in the primary sequence data. When GRAIL was used to analyse the AcNPV DNA sequence, it selected 137 (41%) of the 337 ORFs. Of the 154 selected ORFs listed in Table 1, GRAIL identified 130 (84%) as potential coding regions, and ignored 24 of the selected ORFs. For the remaining 183 non-selected ORFs, GRAIL only highlighted 7 (4%) as likely to encode polypeptides. A GRAIL rating for protein-coding potential of excellent (e), good (g), marginal (m) or null (n) is assigned to each of the ORFs listed in Table 1 (column G).
EXAMPLE 3
Differences with other published data
We have not recorded every instance where other published sequence data differs from the complete DNA sequence of AcNPV C6 reported here. There are, however, quite a number of differences. Where these have been identified, care has been taken to be certain that the C6 sequence reported here is correct. Many of these differences may represent minor variations between different AcNPV clones; some may derive from sequencing errors. Generally, though, the sequence data support the view that clone C6 is representative of the other AcNPV clones that have been described. It will be up to other investigators to determine whether the differences between their clones of AcNPV and the reported sequence for C6 are real or not. In Table 1, however, we note that ORF22 was originally reported as two smaller ORFs by Braunagel and associates (1992). Likewise, ORF25 in Table 1 was recorded as 2 smaller ORFs by the same authors. In the vicinity of residue 7,497 there are 4 extra nucleotides compared to the previous published AcNPV C6 sequence data (Possee et al., 1991). This causes a frameshift in the coding region and results in an extension of a predicted protein, PK1 (ORF10), from 196 to 272 amino acids.
EXAMPLE 4
AcNPV genomic organisation and repetitive DNA
Dot matrix analysis of the AcNPV sequence against itself and its complement revealed 8 regions of direct and inverted repetitive DNA (Fig. 2, identified as hr1, hr1a, hr2, hr3, hr4a, hr4b, hr4c, hr5). The hr regions are involved in enhancing early mRNA transcription and act as origins of DNA replication (Pearson et al., 1992; Leisy and Rohrmann, 1993; Kool et al., 1993a,b). Other regions of DNA sequence were identified that have direct or inverted repetitive DNA that meet the minimal 21/24 bp matching criteria. The significance of these sequences is unknown. Table 2 lists a number of the larger, non-hr inverted repeats that could in single-stranded forms produce hairpin structures. These may be relevant to the secondary structure of mRNA species and affect the transcriptional or translational efficiencies of a particular ORF. In this regard, it is noted that most of these sequences occur within ORFs, rather than in intergenic sequences (Table 2). Their presence may be solely a consequence of the encoded amino acid sequence and the codons used. However, of particular note is the palindromic sequence found within the 25K gene (FP-protein; ORF61) and its similarity to the hr palindromic sequences (see Table 2). Another major region of repetitive DNA involving a stretch of 250 bp that is composed of some 50% G residues (in the + strand) is downstream of ORF90 (lef-4) and within ORF91 (at ca. 78,300 bp).
The dot matrix comparison for repetitive sequences was carried out with a 24 bp moving window that would accept up to 3 mismatches. This may be adequate to scan for larger repetitive DNA, but would overlook shorter repetitive sequences, including those that may be regulatory sequences located within intergenic or other sites of the viral genome. On closer examination of selected regions of the AcNPV genome, we have observed that there are many areas where there could be secondary structure for single-stranded nucleic acids. To record all of these sequences as potential structures in the viral DNA or RNA is beyond the scope of this paper. However, such data should be considered in the context of the individual genes, their transcripts and translation.
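The windowed comparison described above (a 24 bp window accepting up to 3 mismatches, i.e. at least 21/24 matching positions) can be illustrated with the following Python sketch. It is a brute-force illustration with hypothetical names, not the program used to produce Fig. 2, and it would be far too slow to apply directly to the whole genome.

```python
def dot_matrix_hits(seq_a, seq_b, window=24, max_mismatches=3):
    """Return (i, j) pairs where seq_a[i:i+window] and seq_b[j:j+window]
    differ at no more than max_mismatches positions."""
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            mismatches = 0
            for k in range(window):
                if a[i + k] != b[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many differences; abandon this pair
            else:
                # Loop finished without exceeding the mismatch allowance.
                hits.append((i, j))
    return hits
```

Comparing a sequence with itself gives the main diagonal plus any direct repeats; inverted repeats appear when the second argument is the reverse complement of the first.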
EXAMPLE 5
Transcription signals
It has been shown that the early genes of AcNPV utilise the host cell RNA polymerase II (RNA pol II) transcription machinery and that, in general, the promoters of these early genes conform to those seen in other RNA pol II transcribed genes (Hoopes and Rohrmann, 1991). For each of the 337 identified ORFs of >150 bp, we examined the 160 nucleotides 5' to the first potential translation start codon (ATG). In particular, we searched for motifs characteristic of early transcription, i.e., an enhancer-like element (search patterns: A(A/T)CGT(G/T); CGTGC), a cap site (CAGT, for mRNA transcription initiation) and TATA box motifs (see patterns listed in Table 3).
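The motif searches described above can be illustrated with regular expressions, as in the following Python sketch. The enhancer-like, CGTGC and CAGT patterns are those given in the text; the TATA box patterns (TATAAA, TATATA, TATAAT) are those discussed in connection with Table 3. The function and dictionary names are hypothetical, exact matching is assumed, and only ORFs on the + strand are handled.

```python
import re

# Lookahead patterns, so that overlapping occurrences are all reported.
EARLY_MOTIFS = {
    "enhancer_like": re.compile(r"(?=A[AT]CGT[GT])"),
    "CGTGC": re.compile(r"(?=CGTGC)"),
    "cap_site": re.compile(r"(?=CAGT)"),
    "TATA_box": re.compile(r"(?=TATA(?:AA|TA|AT))"),
}


def upstream_region(genome, atg_index, length=160):
    """Return up to `length` nucleotides immediately 5' of an ATG (+ strand)."""
    return genome[max(0, atg_index - length):atg_index].upper()


def scan_upstream(genome, atg_index, length=160):
    """Map each motif name to the offsets of its matches in the upstream region."""
    region = upstream_region(genome, atg_index, length)
    return {
        name: [match.start() for match in pattern.finditer(region)]
        for name, pattern in EARLY_MOTIFS.items()
    }
```

For an ORF on the - strand, the same scan could be applied to the reverse complement of the genome with the ATG position converted accordingly.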
Searches for the TATA box motifs showed some bias in favour of the selected ORFs (Table 1, column Trans, "t/T"; 61 ORFs, i.e., 40%, Table 3) compared to the non-selected 183 ORFs (40 ORFs, i.e., 22%, Table 3). In total there were 90 TATA-like elements in the scanned 5' regions of the 61 ORFs, and 46 copies of this motif in the 40 non-selected ORFs. In part, the numbers may reflect the A+T rich nature of the 5' leader sequences of the 154 selected ORFs (66% A+T), as compared to the non-selected ORFs (58% A+T). This latter A+T bias is similar to the overall A+T composition of the AcNPV genome (59%).
The pattern selection of TATA boxes shown in Table 3 represents a sampling of several of the core DNA elements that are recognised to bind transcription factors (TFIID and TFIID-like proteins) (Ghosh, 1992). One general, loosely-defined consensus for the TFIID binding site is TATA(A/T)A(A/T) (Nikolov et al., 1992). The patterns that were employed were selected to limit the number of matches obtained when only TATA was used as the search motif. In the TATA motif search it was observed that the two patterns that favoured the A residue at position 6 were preferred over the third pattern (TATAAT, see Table 3). Considering the total number of TATA boxes found in the leaders of the 154 selected ORFs, the TATAAA motif occurs in 46% of the cases, the TATATA motif in 34%, and the TATAAT motif in 19%.
We also analysed the upstream sequences for potential transcription initiation (cap) sites by searching for the conserved CAGT motif (Table 1, Trans column, "c/C"; Table 3). Of the selected ORFs, 72 (47%) contained potential cap sites in the scanned regions. For the non-selected ORFs, only 59 (32%) had potential cap sites (Table 3). In addition, we determined the number of cases where the TATA box motif was followed by a CAGT motif (Hoopes and Rohrmann, 1991) to search for potential RNA pol II transcription start sites (Table 1, column Trans, scored as "TC", Table 3). Where the CAGT motif preceded the TATA box motif these are scored as "tc" in Table 1. The analysis for potential cap RNA pol II sites showed a similar bias in favour of the selected ORFs (Tables 1, 3; 21 ORFs, 14%) versus 4% for those not selected. The 14% value may only approximate the number of early genes since several factors may affect this deduction, for example, the assumptions made concerning the ideal RNA pol II promoter.
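The "TC" and "tc" scoring described above depends only on the relative order of the two motifs within the scanned leader. A minimal Python sketch of that rule follows. The function name is hypothetical, and the handling of leaders that contain several copies of a motif (scored "TC" if any TATA box lies 5' of any CAGT) is an assumption, since the text does not state how such cases were scored.

```python
import re

TATA = re.compile(r"(?=TATA(?:AA|TA|AT))")
CAGT = re.compile(r"(?=CAGT)")


def score_tata_cap(leader):
    """Return "TC" if a TATA box lies 5' of a CAGT motif, "tc" if both occur
    but every CAGT precedes every TATA box, and None if either is absent."""
    leader = leader.upper()
    tata_starts = [m.start() for m in TATA.finditer(leader)]
    cagt_starts = [m.start() for m in CAGT.finditer(leader)]
    if not tata_starts or not cagt_starts:
        return None
    if min(tata_starts) < max(cagt_starts):
        return "TC"
    return "tc"
```

No distance constraint between the two motifs is applied in this sketch, which mirrors the analysis as described.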
Another factor determining the outcome of these searches for early promoter matching patterns was the lack of allowance for mismatch between the motif pattern and the actual sequence. In particular, it should be noted that the CAGT
motif is not always found at the start site of AcNPV early mRNA species. It should also be noted that in identifying possible RNA pol II promoter sites, we only considered the relative positions of the TATA box and CAGT motif (i.e., a TATA box 5' to a CAGT motif within the 5' leader sequence that was analysed, see above). Generally, however, in eukaryotes the TATA box motif is within 20 to 40 nucleotides of the mRNA cap site (Roeder, 1991; Zawel and Reinberg, 1992).
A second AcNPV early gene motif has been proposed with some similarity to a sequence found within the so-called enhancer sequences (Lu and Carstens, 1991; Friesen and Miller, 1987; Kogan and Blissard, 1994; Table 3; scored as "E" in Table 1, column Trans). This CGTGC motif is not yet refined enough to be of predictive value. This might explain why it is found almost as frequently 5' to the 154 selected ORFs as in the non-selected ORFs (Table 3, 42% versus 38%). If the CGTGC-like motifs are elements of AcNPV early gene transcription sequences, they may not be located immediately upstream of the respective ORF, so that the analyses undertaken may be inadequate.
It has been observed that the 5' leader sequences of most late mRNA species of AcNPV are less than 80 nucleotides in length. In compiling a list of putative late gene promoters (Table 1 column Trans, "L"; Table 3), we limited our search of sequences upstream of all the potential ORFs to 80 nucleotides 5' to the indicated ORF start codon. We recognise that this assumption may not be correct and that we may have missed late transcription start sites that involve longer 5' leader sequences. Also we may not have scanned the appropriate sequences for genes that are, in fact, translated from ATG codons downstream of the first ATG in an ORF (see later). AcNPV late genes are transcribed from a consensus late promoter transcription start signal (TAAG; Blissard and Rohrmann, 1990). The TAAG motif shows a dramatic difference in occurrence within the leader sequences of the selected ORFs (71 ORFs, 46%, Tables 1, 3) compared to the non-selected ORFs (11 ORFs, 6%; Tables 1, 3). It has been previously reported that A-T rich regions flank AcNPV ORFs (Kuzio et al., 1984). While the nucleotide composition of the genome is 59% A+T, A+T rich regions are not uniformly (or randomly) distributed. Fig. 1 shows several regions of A+T composition that approaches 85% when measured with a 250 nucleotide moving window. Although A+T rich regions often flank AcNPV genes, this characteristic is not absolute. For example, the region 5' to the viral DNA polymerase (ORF65) is not especially A+T rich. Further, the TAAG motif occurs less frequently than would be expected for a random sequence. For the entire AcNPV genome we have observed that TAAG occurs 201 times on the + strand and 196 times on the - strand. By comparison, a sequence of similar composition, GAAT, occurs 574 times on the + strand and 595 times on the - strand. In this context we calculate that the expected frequency of a sequence conforming to the composition (A2TG) in a 133,894 bp genome of the base composition of AcNPV and involving randomly distributed bases, is 705 occurrences per strand.
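The expected frequency quoted above can be reproduced with a short calculation. The Python sketch below assumes independently distributed bases with p(A) = p(T) = 0.295 and p(G) = p(C) = 0.205 (i.e. 59% A+T split equally between the two bases of each pair) and one candidate position per nucleotide of the circular genome. These assumptions and the function name are illustrative and are not stated explicitly in the text.

```python
def expected_occurrences(motif, genome_length, at_fraction):
    """Expected count of one specific motif on one strand of a random sequence.

    Assumes independent bases with p(A) = p(T) = at_fraction / 2 and
    p(G) = p(C) = (1 - at_fraction) / 2.
    """
    p = {
        "A": at_fraction / 2,
        "T": at_fraction / 2,
        "G": (1 - at_fraction) / 2,
        "C": (1 - at_fraction) / 2,
    }
    probability = 1.0
    for base in motif.upper():
        probability *= p[base]
    return genome_length * probability


# Any specific sequence of composition A2TG, such as TAAG or GAAT:
print(round(expected_occurrences("TAAG", 133_894, 0.59)))  # prints 705
```

Under these assumptions the expectation is 133,894 x 0.295^3 x 0.205, or about 705 per strand, against which the observed TAAG counts of 201 and 196 and the GAAT counts of 574 and 595 can be compared.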
EXAMPLE 6
Sequences flanking potential translation initiation sites
Kozak's rules (Kozak, 1986; 1987) for the sequences surrounding the translation initiation sites were not applied in the selection of either the original 337 ORFs, or the selected 154 ORFs listed in Table 1. This analysis, however, was performed subsequently giving the results shown in Table 1 (column K; see Table 3). Of the selected ORFs 91 (59%) had translation start sites that conformed to Kozak's rules (Table 1, column K, "k") against 52 (28%) of the non-selected ORFs (Table 3). The 41% of the selected ORFs that did not conform to Kozak's rules suggests that these rules do not apply to all AcNPV ORFs and that other, or additional criteria may be involved for some AcNPV genes. A frequency distribution profile of the nucleotides surrounding the start codon of the 154 selected ORFs is shown in Table 4. The dominance of an A residue at the -3 and perhaps -2 positions relative to the A of the ATG translation start sites in the
corresponding DNA is the only significant characteristic of the selected ORFs. G at -3 is not favoured in the selected ORFs.
At least two ORFs in AcNPV initiate translation at an ATG downstream of an in-frame ATG in the transcribed mRNA (Table 1, column K, identified as "2"). These are gp67 (ORF128) and PCNA (ORF49) (O'Reilly et al., 1989; Whitford et al., 1989). In Table 1, the amino acids and predicted Mr of the selected ORFs are based on the calculations for the largest potential ORF initiated with a methionine. This assumption over-estimates the size of the primary translation products for gp67 and PCNA, and for any other product for which translation is initiated at a downstream in-frame ATG.
There are 15 short ORFs (mini-cistrons) that are located immediately upstream (within 80 nucleotides) of the translation start site of the selected ORFs. All these mini-cistrons have ATG flanking sequences that conform to Kozak's rules. These are identified as "!" in Table 1, column K. For mini-cistrons that are out-of-frame with respect to the larger ORF, a termination codon occurs either upstream of the selected ORF, or within a short distance into its coding region. Mini-cistrons have been reported in the 5' leaders of other baculovirus genes (Tomalski et al., 1988; Blissard and Rohrmann, 1989) and may have regulatory roles in the translation of mRNA species.
EXAMPLE 7
Codon usage
In the selected ORFs there is some bias in the codons that are used (Table 5), for example AGG and CGG (arginine), GGG (glycine), CTA, CTC, CTT (leucine), which are each used at less than half of the frequencies that may be expected if all the possible codons were utilized equally. While some codons appear to be discriminated against in the selected ORFs, others appear to be favoured (Table 5), for example CAA (glutamine), GAA (glutamic acid), GGC (glycine), ATT (isoleucine), TTG (leucine), and AAA (lysine). To what extent codon bias affects the expression level of AcNPV genes, or foreign genes expressed from AcNPV-derived expression vectors, remains to be determined.
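The comparison with equal usage of synonymous codons can be made concrete with the following Python sketch, which is offered as an illustration only and is not the method used to compile Table 5. For each codon it returns the observed count divided by the count expected if all codons for the same amino acid were used equally, so that a value below 0.5 corresponds to a codon used at less than half the expected frequency. The standard genetic code is assumed and the names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_codon_usage(orfs):
    """Return {codon: observed / expected-under-equal-usage} for in-frame ORFs.

    Codons for amino acids that never occur in the input are omitted.
    """
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        for i in range(0, len(orf) - 2, 3):
            codon = orf[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    usage = {}
    for codon, amino_acid in CODON_TABLE.items():
        synonymous = [c for c, a in CODON_TABLE.items() if a == amino_acid]
        total = sum(counts[c] for c in synonymous)
        if total:
            usage[codon] = counts[codon] * len(synonymous) / total
    return usage
```

Applied to the stop codons, the same ratio summarises termination codon preference.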
The predominant translation termination codon utilized by the selected ORFs is TAA. It terminates 117 of the 154 ORFs (76%, Table 5).
EXAMPLE 8
Gene functions
The known AcNPV genes have been summarised recently in a review by Kool and Vlak (1993). The known genes and cited references are included in Fig. 1 and in Table 1. The possible functions of certain other genes identified by the AcNPV genome analysis are described below.
A functional homologue of the AcNPV p35 gene (Friesen and Miller, 1987; Clem et al., 1991) has been found in Cydia pomonella granulosis virus (CpGV) (Crook et al., 1993, termed CpGV IAP). One AcNPV gene similar to the CpGV IAP is located within the EcoRI-k region of the genome (Table 1, ORF 27, IAP1; 31% identity; 53% similarity). In addition, AcNPV encodes a second IAP gene (Table 1; ORF 71, IAP2). This gene is 22% identical (45% similar) to the AcNPV IAP1 gene and 21% identical (48% similar) to the CpGV IAP gene.
AcNPV encodes a gene with identity to the acidic and basic fibroblast growth factors (FGFs), also known as heparin binding growth factors (HBGF, reviewed by Burgess and Maciag, 1989; Klagsbrun and D'Armore, 1991). The AcNPV FGF-like gene product shows ca. 35% identity (75% similarity) with known members of the FGF superfamily.
Global transactivators (GTAs) are a class of non-DNA binding proteins that have been implicated in the regulation of homeotic genes (Tamkun et al., 1992). AcNPV encodes a GTA (Table 1, ORF 42) with similarity to more than a dozen proteins in this class. The most significant alignments were with the brahma GTA of Drosophila melanogaster (Tamkun et al., 1992; 23% identity and 68% similarity in a 458 amino acid overlap) and the SNF2/SWI2 activators of yeast (Laurent et al., 1991; 22% identity and 68% similarity in a 490 amino acid overlap). The AcNPV GTA is encoded by a 506 codon ORF. In contrast, the D. melanogaster brahma gene is encoded by a 1638 codon ORF (Tamkun et al., 1992) while the yeast SNF2 gene contains an ORF of 1703 codons (Laurent et al., 1991).
PNK/PNL (ORF86) encodes a protein that may have multiple functions. The amino terminal portion is strongly related to T4 RNA ligase (31% identity, 72% similarity) while the carboxy terminal half of this protein is related to T4 polynucleotide kinase (26% identity, 66% similarity).
AcNPV encodes a chitinase (ORF126) that resembles those of other organisms, most notably Serratia marcescens (57% identity; 88% similarity; Jones et al., 1986). Analyses of the function of the viral chitinase indicate that it has a role in the liquefaction of infected larvae (R. Hawtin and R.D. Possee, manuscript in preparation).
AcNPV also encodes a putative alkaline exonuclease (ORF133). We identified this gene on the basis of 4 short conserved domains that are present in the alkaline exonucleases of herpes simplex viruses (typically 55% identity, 65% similarity). ORF133 has 53% identity with its Orgyia pseudotsugata NPV (OpNPV) homologue (Gombart et al., 1989).
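As an illustration of how identity percentages such as those quoted in this example can be derived from a pairwise alignment, the following Python sketch counts identical residues over the aligned, ungapped columns. The function name and the convention of ignoring gap columns are assumptions; the alignment programme and scoring scheme actually used are not restated here, and similarity percentages additionally depend on a substitution matrix that this sketch does not model.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither
    sequence has a gap (hypothetical helper; gap handling is an assumption)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the overlap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

The length of the compared region corresponds to the "amino acid overlap" reported alongside the identity values.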
EXAMPLE 9 Protein motifs
As part of our search for potential virus-encoded RNA polymerase subunits, we searched for DNA binding motifs. A sample of the motifs used for the searches is shown in Table 3. They include zinc fingers (Table 1, Dom column, "Z"), leucine zippers (Table 1, Dom column, "L"), nucleoside triphosphate binding domains (Table 1, Dom column, "NTP") and nuclear translocation signals (Table 1, Dom column, "NTS").
A broad definition of the zinc finger was employed to search for such motifs, allowing either cysteine or histidine residues to occupy any of the zinc-locating sites (Table 3). Using this pattern we determined that 31 out of the 154 selected ORFs contained at least 1 potential zinc finger. Zinc fingers were found in two potential apoptosis inhibitory proteins, IAP1 (ORF27) and IAP2 (ORF71) (Table 1). Zinc fingers were also found in the early genes IE-1 (ORF147), ME53 (ORF139) and PE38 (ORF153). The zinc finger suggested to be in cg30 was not identified by our analysis. However, the leucine zipper in the cg30 protein (ORF88) was identified. Leucine zippers were found in 7 other potential polypeptides, including the calyx protein, pp34 (Table 1).
An NTP binding motif was identified in 4 ORFs, 3 of which are known as late enhancing factors (lefs, Table 1). The fourth protein was PNK/PNL (ORF86). Interestingly, searches with a simplified motif for the ATP-binding site in protein kinases (GxGxxG, where x represents any amino acid) would not have found matches in either PK1 (ORF10) or PK2 (ORF123), both of which have extensive overall identity with known protein kinases. PK1 lacks a consensus ATP-binding motif, having IxGxxG at the ATP-binding site, while PK2 completely lacks this N-terminal domain.
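The behaviour of an exact pattern search with the simplified GxGxxG motif can be sketched in Python as follows. The function name and the example peptide strings in the usage note are hypothetical and are not taken from the AcNPV sequence; the sketch only shows why a site reading IxGxxG is not reported.

```python
import re

# Simplified ATP-binding motif from the text: G-x-G-x-x-G, x = any amino acid.
GXGXXG = r"G.G..G"

def find_motif(protein, pattern=GXGXXG):
    """Return 0-based start positions of every match, including
    overlapping ones (hypothetical helper)."""
    lookahead = re.compile("(?=(" + pattern + "))")
    return [m.start() for m in lookahead.finditer(protein.upper())]
```

For a made-up peptide such as `"MKLGAGSFGEV"` the search reports a hit at position 3, whereas `"MKLIAGSFGEV"`, in which the first glycine is replaced by isoleucine, gives no hit.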
NTS motifs were found in 12 of the selected ORFs. Known nuclear localising proteins that have an NTS include 39K, DNA polymerase, and p6.9. No NTS was found for the p10 protein, which is the component of fibrous bodies present in the nuclei of AcNPV infected cells. It is possible that this and other viral proteins enter the nucleus using an alternative pathway, or are chaperoned by a protein containing an NTS. None of the AcNPV proteins that are known to be solely cytoplasmic had a predicted NTS.
Searches for known DNA binding motifs, or transcription factor motifs and profiles, and other domains relating to an RNA polymerase, did not conclusively identify any ORF that might encode a late gene RNA polymerase, or subunits thereof.
EXAMPLE 10
Potential secreted proteins
The method of von Heijne (1986) was used to predict whether any of the selected ORFs encoded a protein with a signal sequence that may be involved in translocating a protein across a membrane. Ten ORFs were identified with putative signal sequences located within their first 50 amino acids (Table 1, column Dom, "S"). Proteins that are either known to be secreted, or that possess signal peptides, include EGT (ORF15), gp37 (ORF64), gp67 (ORF128), FGF (ORF32) and chitinase (ORF126).
EXAMPLE 11
Designing a Vector for Optimal Expression of a Protein
The cDNA sequence information for A. californica can be used to design a vector which is capable of optimally expressing a desired protein product (called a "designer vector"). An investigator of ordinary skill in this art would analyze a variety of different factors prior to deciding which genetic elements should be included in a specific designer vector. For example, an investigator might study the following factors before designing a vector: the protein to be synthetically produced, the host cells to be used, the desired temporal timing for protein production, available insertion sites for the non-natural promoters, any known deleterious sequences or proteases that could reduce the amount of protein being produced, etc.
The designer vector can include a single promoter, multiple promoters, tandem promoters, combinations of synthetically constructed promoters, natural promoters and derivatives thereof. The choice of promoters depends on several factors and is usually made on a vector-by-vector (case-by-case) basis. Additionally, many different genetic elements can be included in the designer vector, and deciding which to include or exclude depends on the desired protein to be recombinantly produced in the baculovirus expression vector system.
A vector can be designed and constructed to optimize the isolation and recovery of the desired protein. For example, the vector can be designed to include specifically identified secretion sequences determined from the cDNA sequence data.
EXAMPLE 12
Using the Autographa californica DNA Sequence to Identify Preferred Sites for Translation and Transcription Signals
From the claimed A. californica cDNA sequence information, locations of transcription and translation signal sequences can be determined. Additionally, specific flanking sequences near the ATG sequence of the open reading frame can be identified and then used in order to optimally transcribe the ORF.
EXAMPLE 13
Using the Autographa californica DNA Sequence to Identify New Genes
The A. californica cDNA sequence information can be used to identify new genes. Once these new genes are identified, their promoters (early, late, immediate early or immediate late) may then be obtained and used in vectors. The new promoters from these new late genes may then be used to drive the expression of desired genes more efficiently and effectively when compared to the polyhedrin promoter.
The reader's attention is directed to Table 1, which lists 154 potentially expressed open reading frames in AcNPV strain C6. Many new genes have already been identified in the A. californica cDNA sequence, as is seen in Table 1 of the specification. Where genes have already been identified, the claimed sequence of the instant invention serves to confirm the presence of a specific gene. Genes have not yet been identified for many different ORFs. It is these ORFs that the inventors are initially focusing on in order to identify new genes in the A. californica sequence.
EXAMPLE 14
Using the Autographa californica DNA Sequence to Identify Essential and Non-Essential Genes
From the claimed A. californica cDNA sequence information, essential and non-essential gene regions can be identified. For the purpose of this invention, essential and non-essential genes are defined with respect to virus replication in cell culture (e.g. Spodoptera frugiperda cells).
The inventors have identified several ORFs which are known to express essential genes and several ORFs which express non-essential genes. A partial list of these is presented below.
OPEN READING FRAME ESSENTIAL NON-ESSENTIAL
1 X
3 X
6 X
7 X
8 X
9 X
14 X
15 X
27 X
31 X
35 X
36 X
47-49 X
61 X
65 X
67 X
71 X
88 X
89-90 X
95 X
99-100 X
123 X
126-127 X
131 X
135 X
137-138 X
147 X
For example, ORFs 126 (chitinase) and 127 (cathepsin) have been shown to be non-essential genes. Thus, these two genes could be eliminated from the A. californica sequence without affecting the nature of the sequence. Elimination of these two non-essential genes could be performed by standard protocols known to those skilled in the art.
The rationale for identifying non-essential genes is to reduce the genome size to smaller and more functional pieces in order to create a more effective, and environmentally acceptable pesticide or in order to create a more effective vector.
With the complete cDNA sequence for A. californica, regions that are essential for enabling a virus to live in an insect cell can be identified. Once these essential regions are identified, the essential sequence can be used to produce a virus that will not propagate in live insects. One use of such an environmentally safe virus would be as a selective pesticide.
EXAMPLE 15
Using the Autographa californica DNA Sequence to Design a Vector to be Used in a Particular Host
The A. californica cDNA sequences claimed in this invention can be used to design a plasmid vector capable of optimizing expression of the desired protein. One way in which this plasmid vector can be tailored to produce the desired protein more effectively and efficiently is to optimize it for the particular host.
For example, SF9 cells are the optimal cells of choice for production of desired proteins in the baculoviral expression vector system. A designer vector as in Example 11 above can be constructed for optimal expression of the desired protein in the SF9 cells by deleting selected deleterious sequences and/or providing enhancer sequences.
EXAMPLE 16
Using the Autographa californica DNA Sequence to Design a Tailored Viral Particle Specific for Enhanced Infectivity
The A. californica cDNA sequences claimed in this invention can be used to design a complete virus which is specifically constructed to contain specific and unique elements which will enhance the infectivity of this virus in a particular insect cell.
For example, by modifying the sequences required for the temporal expression of genes in the virus, a viral particle can be designed to infect and kill the insect at an early stage. The claimed sequence can also be used to produce a virus capable of infecting larvae and not adult insects.
An additional embodiment of this invention is to use the claimed A. californica cDNA sequence to tailor or design a virus which is capable of infecting only specific insects, thereby constructing a very host specific virus.
In order to create this designer virus with optimal infectivity, deleterious genes may have to be removed. For example, elimination of viral antigenicity would enable the production of an environmentally sound pesticide because the virus would be unable to arouse an antigenic response but can still infect and kill the insect. Additionally, removal of proteases will, in certain instances, also increase the productivity of a desired protein.
With the claimed cDNA sequence, a self destructive mechanism may be included in the viral particle. This mechanism can be designed such that once the viral particle has killed the host specific insect, the virus destroys itself via a time, chemical, or enzymatic attack. This self destructive mechanism will effectively eliminate any residual virus and therefore produce a more environmentally acceptable pesticide. For example, a sequence known to trigger lysis may be inserted adjacent to a late or early promoter.
EXAMPLE 17
Using the Autographa californica DNA Sequence to Improve Host Ranges and Design Alternative Host Systems
The availability of the complete AcNPV sequence and subsequent experimental data will allow the identification of those virus genes with roles in determining those insect species which can be infected with the virus. The virus could be modified to limit infection to the target pest species, while leaving other species unaffected.
Conversely, other baculoviruses may be engineered to expand their host range to include several pest species. AcNPV has a wide host range in comparison to other baculoviruses and therefore may be a source of "host range genes" which can be added to these other baculoviruses.
EXAMPLE 18
Using the Autographa californica DNA Sequence to Produce Proteins Naturally Expressed in AcNPV
Certain proteins are naturally expressed in A. californica (for example, heparin binding factor). The cDNA sequence information of the claimed invention can be used to enhance or increase production of the proteins that are naturally expressed in A. californica, for example, by inserting additional promoter sequences and/or by deleting certain sequences deleterious to the production of the desired protein.
EXAMPLE 19
Using the "Annihilator Sequence" from the Autographa californica cDNA Sequence
The deletion of the annihilator gene (ORF 135) from the virus results in a phenotype in which virus-infected cells die through a process of apoptosis or early cell death. In effect, the cell commits suicide to prevent replication of the virus. With the knowledge of this new cDNA sequence, other genes could be identified with a similar function to the annihilator gene, i.e. preventing the cell from undergoing an apoptotic response. These genes would be very useful since they could be incorporated into other baculoviruses to expand host range.
EXAMPLE 20
Conclusions from the above data
The complete nucleotide sequence of AcNPV clone C6 has been determined. Other studies have reported the sequences of parts of the AcNPV genome of C6, or other AcNPV clones (reviewed by Kool and Vlak, 1993). The information reported here may be compared with the data of these other clones of AcNPV and other baculoviruses.
Analysis of the AcNPV C6 sequence data has served to resolve a number of minor issues concerning the genetic map of the virus and the DNA fragments produced by digestion with various restriction enzymes. In Fig. 1 a linear representation of the map is shown. Since the virus genome is circular, a more conventional map for the AcNPV genome is given in Fig. 3. In this map the identified genes (hatched arrows) and unassigned selected ORFs (open arrows) are shown, as well as their orientations. Also indicated in Fig. 3 are the sites of hr sequences and insertion (IS) and retroposon sequences (RP). This circular map includes the revised EcoRI (outer ring) and HindIII (inner ring) fragment lengths of AcNPV C6. In the case of the extra HindIII site within the HindIII A fragment, we have chosen to designate the two smaller fragments as A1 and A2, rather than rename the entire map. In some AcNPV clones, notably E2 (Smith and Summers, 1978, 1979) and HR3 (Cochran et al., 1982), these two fragments are linked to form HindIII A (22,869 nucleotides). In AcNPV L1 (Miller and Dawes, 1979), HindIII A1 and A2 are present. The DNA sequence at the missing HindIII site in AcNPV HR3 or E2 has not been reported, but in the C6 strain this region encodes a functional chitinase gene (Table 1; R.D. Possee, unpublished data). If the HR3 and E2 strains also produce this enzyme, then the question is whether the sequence change involves an amino acid substitution, or some other alteration.
A total of 337 ORFs were identified within the virus genome that could potentially encode proteins of greater than 50 amino acids. This selection allowed inclusion of the 55 amino acid, arginine-rich p6.9 protein (basic protein, Wilson et al., 1987). It disregards smaller ORFs, some of which may encode proteins or peptides that are made during the virus infection process. The 154 ORFs were selected on the basis of their possession of a methionine codon and the absence of a larger, overlapping ORF. Again, these assumptions may prove to be incorrect in some cases (e.g., where a spliced mRNA is involved). The number of gene products encoded by the AcNPV genome may be larger or smaller than 154, depending on the extent to which the assumptions made in these analyses prove to be correct. Also, other strains of the virus may include additional sequences (insertions, or ORFs), or lack sequences by comparison to those in the C6 virus. Since it is valuable to have a reference point for comparison purposes, it is suggested that the AcNPV C6 ORF numbering nomenclature be adopted pro tempore, until virus gene functions are described for the particular ORFs.
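The two-stage selection described above (candidate ORFs of more than 50 amino acids, then removal of ORFs overlapped by a larger ORF) can be sketched in Python as follows. This is an illustration under stated assumptions, not the programme used for the analysis: the function names are hypothetical, the genome is treated as linear (ORFs spanning the origin of the circular map are missed), each ORF is taken from the first in-frame ATG to the next stop codon, and "larger, overlapping" is interpreted as any longer ORF, on either strand, sharing at least one nucleotide.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_aa):
    """Yield (start, end) for ATG..stop ORFs in the three frames of one
    strand; end is exclusive and includes the stop codon."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i  # first in-frame ATG after the previous stop
            elif codon in STOPS:
                if start is not None and (i - start) // 3 > min_aa:
                    yield start, i + 3
                start = None

def find_orfs(seq, min_aa=50):
    """Six-frame scan; coordinates are reported on the + strand."""
    seq = seq.upper()
    n = len(seq)
    found = [(s, e, "+") for s, e in orfs_in_strand(seq, min_aa)]
    found += [(n - e, n - s, "-")
              for s, e in orfs_in_strand(reverse_complement(seq), min_aa)]
    return found

def select_orfs(orfs):
    """Keep ORFs that are not overlapped by a longer ORF (assumed rule)."""
    selected = []
    for s, e, strand in orfs:
        overlapped = any((e2 - s2) > (e - s) and s2 < e and s < e2
                         for s2, e2, _ in orfs)
        if not overlapped:
            selected.append((s, e, strand))
    return selected
```

With these assumptions, `select_orfs(find_orfs(genome))` returns a reduced list analogous to the selected ORFs; spliced transcripts and internal initiation are not handled.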
It is known that some AcNPV proteins are initiated downstream of an in-frame start codon, for example, ETL/PCNA (Crawford and Miller, 1988; O'Reilly et al., 1989) and gp67 (Whitford et al., 1989). In consequence, some of the sizes of the primary gene products of the ORFs shown in Table 1 may prove to be smaller than predicted from this analysis. Also, the calculated sizes may not correspond to the final gene products, e.g., where post-translational modifications occur.
The complete AcNPV sequence was analysed using a neural-net ORF identification programme, GRAIL (Uberbacher and Mural, 1991), in order to predict potential protein coding regions. GRAIL was originally designed as a programme for identifying coding exons in human and other DNA sequences. The GRAIL coding recognition module incorporates seven sensor algorithms. Each component of the module provides an indication of the coding potential of the DNA sequence. The various sensor outputs are integrated using a neural network which also predicts the locations of the coding regions. The system has been demonstrated to be effective in the identification of 90% of exons over 100 bases long in human DNA (Uberbacher and Mural, 1991). In part this success rate depends on the G+C content of the DNA. Coding regions are recognised less easily in DNA sequences with a lower G+C than A+T content. The G+C content of the AcNPV genome is only 41%, so the coding regions predicted from the GRAIL analysis must be treated with some caution. The candidate ORFs that were identified by GRAIL were rated as excellent, good, marginal or null (Table 1). Most of the AcNPV genes which have been assigned functions gave excellent or good ratings using this method. The most notable exceptions were the protein tyrosine phosphatase (Kim and Weaver, 1993), p6.9 (Wilson et al., 1987) and conotoxin (Eldridge et al., 1992). Clearly, some coding regions of AcNPV may have been missed using this approach. However, the use of GRAIL provided a complementary analysis of the likely coding potential of the AcNPV genome. Its value is confirmed by the fact that GRAIL predicted 84% of the 154 selected ORFs (Table 1), whereas only 4% of the 183 non-selected ORFs were identified by GRAIL as having potential protein coding capacity.
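The G+C content referred to above is a simple base-composition measure. A minimal Python sketch follows; the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption rather than a stated convention of the analysis.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0
```

Applied to the complete genome sequence, a value near 0.41 would correspond to the 41% figure given in the text.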
Analysing the sequences immediately upstream of the ATG in each of the 154 selected ORFs served to identify potential TFIID binding sites (TATA boxes) and possible mRNA transcription start sites (CAGT). The CAGT motif is associated with many baculovirus early gene promoters and is probably a good indicator of whether or not a virus gene is transcribed in the early phase of the replication cycle. The TATA boxes are more problematic, in part due to the high A+T content of the AcNPV genome and its intergenic regions. More than one TATA box was present upstream of many of the ORFs. Using the patterns shown in Table 3 we located TATA boxes upstream of 40% of the selected ORFs. Of the 3 TATA box patterns utilised to identify possible TFIID-type binding motifs (Table 3), the TATAAA motif, which is the preferred TFIID binding site, was the most frequent in the selected ORFs identified to be early genes.
Taken together in context with the CAGT motif, 14% of the selected ORFs have a potential eukaryotic RNA pol II promoter within 160 nucleotides of the ATG codon of the respective ORF. The known AcNPV early genes identified by this procedure include: ME53 (ORF139), IE-1 (ORF147), IE-N (ORF151), and PE38 (ORF153). The presence of a TATA motif does not prove that it is used in early transcription by RNA pol II. This can only be determined by experimentation. For example, lef-3 has consensus TATA and CAGT motifs in its 5' leader, but no evidence has been reported that these are utilised in early mRNA synthesis (Li et al., 1993). Also, the polyhedrin gene has an RNA pol II motif within its promoter region.
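A search of this kind can be sketched in Python as follows. The function names are hypothetical. The 160-nucleotide window and the TATAAA and CAGT motifs come from the text; the requirement that CAGT lie anywhere downstream of the TATA box is an assumption, since the spacing rule used in the analysis (Table 3) is not restated here. Only + strand ORFs on a linear coordinate system are handled.

```python
import re

def upstream_region(genome, atg_pos, length=160):
    """Up to `length` nucleotides immediately 5' of a + strand ATG
    (0-based position; the circular genome is treated as linear)."""
    return genome[max(0, atg_pos - length):atg_pos].upper()

def has_early_promoter(upstream, tata="TATAAA", start="CAGT"):
    """True if a TATA box is followed somewhere downstream by a CAGT
    motif (assumed arrangement; no spacing constraint is applied)."""
    m = re.search(tata, upstream)
    return bool(m and re.search(start, upstream[m.end():]))
```

As the text notes, a positive result only flags a candidate promoter; whether it is used in early transcription has to be tested experimentally.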
Alternative transcription start sites, initiating from a CGTGC motif, have been identified in some AcNPV early gene promoters. For example, the CGTGC motif is utilised as an early start site for p143 (Lu and Carstens, 1991), DNA polymerase (Tomalski et al., 1988) and p47 (Carstens et al., 1993). Perhaps this motif is involved in the expression of the AcNPV delayed-early genes and may be a site of recognition by virus-encoded, trans-activating proteins. The CGTGC motif is broadly similar to sequences found in AcNPV hr regions, i.e., TYC(A/T)(A/T)A(A/T)CGXGTRA (where Y is a pyrimidine, R a purine and X any nucleotide). The CGTGC motif is evenly distributed between the selected ORFs and the non-selected ORFs, suggesting that the definition of this motif is not refined enough to be of predictive value. If it is important, its placement may not be confined to the immediate 5' leader sequence of a neighbouring gene.
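A degenerate consensus of the kind quoted above can be turned into a searchable pattern by expanding each ambiguity symbol. The Python sketch below is illustrative only: it rewrites the consensus in IUPAC letters (W for the positions given as A or T, N for the position written X in the text, and reading the third parenthesised position as A or T), and the function and constant names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for this consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]",    # purine
         "Y": "[CT]",    # pyrimidine
         "W": "[AT]",    # A or T
         "N": "[ACGT]"}  # any nucleotide (written X in the text)

def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in consensus.upper()))

# Consensus from the text rewritten in IUPAC letters (an assumed transcription).
HR_LIKE = consensus_to_regex("TYCWWAWCGNGTRA")
```

`HR_LIKE.finditer(sequence)` then yields every exact match to the consensus on the strand supplied; the complementary strand would need to be searched separately.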
The late and very late transcription start sites involve a TAAG motif (Blissard and Rohrmann, 1990). We have used TAAG, rather than the canonical ATAAG or RTAAG sequences, to search for ORFs that might be transcribed late in infection in an endeavour to maximise the chance of finding matches. The 46% of the selected ORFs that are identified as probable late/very late genes may underestimate such genes. It is known that some AcNPV late genes do not use TAAG transcription start sites; for example, cg30 (ORF88) initiates from the sequence ATTAG (Wu and Miller, 1989). Also, the late gene p74 (ORF138) initiates transcription at the sequence TATTG (Kuzio et al., 1989) and p47 (ORF40) has a late transcription start site GTAAAAC (Carstens et al., 1993). A search for similar matches to the start site used in p47 revealed a good match at nucleotide 66,740 in the coding region for gp41 (ORF80). Interestingly, an ATAAG motif is present 145 nucleotides upstream of this site.
Finally, concerning transcription motifs, the analyses identified a number of ORFs that are transcribed both early and late. These are indicated in Table 1 and agree with the data recorded by others for some of these genes (see Table 1).
We also constructed a codon usage table for the 154 selected ORFs presented in this study (Table 5). There appears to be some codon bias. The codon usage bias shown by the AcNPV ORFs may reflect the state of the tRNAs available to the virus during the infection process. However, although the sample base is low, so far we have not been able to detect a differential codon bias between early and late expressed genes.
Cis-acting elements (hrs) involved in the origins of AcNPV DNA replication have been shown to be A+T rich. By contrast, OpNPV appears to have at least one origin that is slightly G+C rich, but with a neutral purine composition, i.e., different from the hrs of AcNPV, or the transcription enhancer regions found within OpNPV (Pearson et al., 1993). The region in AcNPV homologous to the OpNPV origin of replication lies within ORF13 and ORF14 (lef-1). An OpNPV homologue for ORF13 has not been reported; however, a contiguous stretch of sequence, lacking an initiation methionine but capable of encoding a 299 amino acid polypeptide, is present immediately downstream of the OpNPV homologue of lef-1. This potential translation product is 50% identical (83% similar) to AcNPV ORF13. Interestingly, the relative positions of AcNPV ORF11 and the OpNPV homologue are switched.
From the published data, to a large extent the sequences of OpNPV, Choristoneura fumiferana NPV (CfNPV), and Bombyx mori NPV (BmNPV) are comparable to those of AcNPV (Leisy et al., 1984; Arif, 1986; Majima et al., 1993). However, at least two sequence inversions have occurred in the OpNPV genome by comparison to the sequence of AcNPV. Adjacent and probably independent inversions have occurred in the region between hr1 and hr1a, and between hr1a and a nearby putative origin of replication in the OpNPV genome (Pearson et al., 1993; G.F. Rohrmann, personal communication). The hr sites of baculoviruses may be active in inter- or intra-molecular recombination. If recombination was involved in one or other inversion, how this occurred is not certain, since there is no obvious relationship between the left and right arms of the second inverted region in OpNPV; the corresponding regions of AcNPV are A+T rich. This suggests that an intramolecular inversion may have taken place in OpNPV. However, a detailed analysis of this region in that virus has yet to be undertaken.
The hr regions of AcNPV have been implicated in replication of the AcNPV genomic DNA and may act as origins of replication (Pearson et al., 1992; Leisy and Rohrmann, 1993; Kool et al., 1993a,b). Furthermore, recent studies have identified regions of the AcNPV genome that encode products that act on the origins of replication (Kool et al., 1994).
The + strand G-rich sequence at position 78,300 of AcNPV shows no overall bias in A+T content but has a pronounced spike with respect to total purine composition (ca. 78%, Fig. 1). Purine-rich tracts can potentially form intrastrand triple helices and tetrads. Triple helical DNA has been implicated as an origin of replication for some plasmids, as well as having other potential regulatory functions (Caddie et al., 1990). The only other region of elevated purine composition in AcNPV occurs within the coding region of ORF66 (ca. 68% purines). The purine rich region within ORF66 is also A+T rich, thus A residues contribute highly to the purine composition of the + strand. No pyrimidine rich regions are found in the + strand of AcNPV. In fact the overall purine composition of the + strand is 50%. The uniqueness of the purine rich region in the region of 78,300 bp warrants further investigation, either as a cis-acting element in DNA replication, or as a possible DNA packaging signal.
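A purine-composition profile such as the one referred to above (Fig. 1) can be approximated with a sliding window. The following Python sketch is an illustration only: the function names are hypothetical, and the window size, step and threshold are arbitrary assumptions, not the parameters used to produce the figure.

```python
def purine_fraction(seq):
    """Fraction of A and G residues in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("G")) / len(seq) if seq else 0.0

def purine_windows(seq, window=200, step=50):
    """Yield (start, fraction) for each full window along the + strand."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, purine_fraction(seq[start:start + window])

def purine_rich(seq, threshold=0.65, window=200, step=50):
    """Windows whose purine fraction reaches the (assumed) threshold."""
    return [(s, f) for s, f in purine_windows(seq, window, step)
            if f >= threshold]
```

Against a background purine composition of 50%, windows covering a tract like the one near position 78,300 would stand out in the output of `purine_rich`.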
Using a combination of database searches for protein motifs and protein profile analyses, we have identified several previously unreported ORFs that encode proteins related to previously sequenced proteins.
There are two AcNPV genes (ORF27, IAP1; ORF71, IAP2) that by sequence analysis are similar to the CpGV IAP (Crook et al., 1993). Clem and associates (1991) have shown that the AcNPV 35k gene (Friesen and Miller, 1987) encodes an inhibitor of apoptosis (IAP). Deletion of this gene results in an "annihilator" virus phenotype which causes blebbing in virus-infected S. frugiperda cells so that only 5% of the cells remain viable by 36 h post-infection. It is not known whether deletion of either ORF27 or ORF71 affects apoptosis in AcNPV infected S. frugiperda cells. By transfection studies it has been shown that the CpGV IAP gene provides the 35k gene function in AcNPV 35k-negative mutants, thereby preventing the annihilator phenotype of the mutant (Crook et al., 1993). However, there is no sequence identity between AcNPV 35k and CpGV IAP. In view of the structural homologies between the CpGV IAP and the AcNPV IAP1 and IAP2 genes, the roles and functions of these AcNPV genes warrant further investigation. It has been shown that the 35k-negative AcNPV mutant, while unable to replicate efficiently in S. frugiperda cells in culture, or in whole larvae, can be propagated in T. ni cells, or insects. It is possible that the AcNPV IAP1 and IAP2 genes may prevent apoptosis in AcNPV infections of other cell types or larval species. Further, it has been shown that over-expression of a human inhibitor of apoptosis (BCL2) in S. frugiperda cells (Alnemri et al., 1992), using an AcNPV expression vector, results in the protection of the cells against apoptosis. These recombinant virus infected cells have an extended survival time and do not show the degradation of host cell DNA that is evident in cells infected with wild-type AcNPV. It is not known if over-expression of the AcNPV IAP genes results in extended survival of virus-infected cells. The AcNPV IAP genes do not share any structural similarity with BCL2, or any other known IAP gene. However, the viral IAP genes are similar to certain DNA binding proteins by the possession of 3 copies of a zinc finger motif.
AcNPV encodes a gene with identity to the FGFs and HBGF family of growth factors. Two conserved cysteines have been identified in all the human FGFs sequenced to date. These are Cys31 and Cys98 (relative to human acidic FGF). These cysteines have been implicated in intramolecular disulfide bond formation (Burgess and Maciag, 1989). The N-terminal cysteine is lacking from the putative AcNPV FGF. In human acidic FGF, site-directed mutagenesis of cDNA clones has implicated Lys133 in heparin binding (Burgess and Maciag, 1989). The AcNPV FGF has an arginine at this position. This substitution of one basic residue for another also occurs in the int-2 proto-oncogene precursor Hbg3. Free heparin is known to inhibit the growth of herpes simplex viruses (Nahmias and Kilbrick, 1964). More recently, it has been shown that heparin binds to HSV-1 virions via the glycoprotein gC (WuDunn and Spear, 1989; Herold et al., 1991) and prevents their adsorption to heparan sulphate moieties resident on cell surface proteoglycans. Heparin similarly inhibits plaque formation of pseudorabies virus by binding to glycoprotein gIII (Mettenleiter et al., 1990). If a similar mechanism operates in insects, then expression of a heparin binding factor by the baculovirus could be a method to complex free heparin (or heparin-related compounds), thereby facilitating virus spread within the host. The virus FGF has a signal peptide sequence at its amino terminus which may facilitate secretion from virus-infected cells. However, the role of AcNPV FGF in the infection process of AcNPV remains to be determined.
The GTAs are non-DNA binding proteins thought to have a role in the regulation of homeotic genes (Tamkun et al., 1992). Homeotic genes are involved in the expression of a large group of other genes that have been implicated in directed development and growth of an organism (McGinnis et al., 1984a,b; Scott and Weiner, 1984; Levine and Hoey, 1988; Hayashi and Scott, 1990). The AcNPV ORF42 has homology with the GTAs of D. melanogaster (Tamkun et al., 1992) and yeast (Laurent et al., 1991). The AcNPV GTA-like protein does not have either the early CAGT or the late TAAG transcription initiation sites, so it is difficult to predict when it may be expressed in virus-infected cells. Transcriptional analysis is required to determine if and when it is synthesized. A viral GTA might be involved in regulating a number of genes involved in viral processes, such as late gene transactivation. It is also conceivable that the AcNPV GTA-like gene acts as a repressor to inhibit host gene expression.
It had been expected that direct sequence analysis of the entire AcNPV genome would reveal whether or not baculoviruses encode their own RNA polymerase. The induction of an α-amanitin-resistant RNA polymerase in AcNPV-infected cells (Fuchs et al., 1983) is thought to be responsible for the high levels of late and very late virus gene transcription. A detailed comparison of the potential coding regions in the AcNPV genome has so far failed to identify with confidence any putative subunits of a virus-encoded polymerase. This suggests that either the AcNPV-induced RNA polymerase is unlike any other polymerase so far identified, or that the virus does not encode such an enzyme and utilises a modified host polymerase instead. We have found only 1 ORF encoding a gene that may be involved with RNA metabolism, the polynucleotide kinase/RNA ligase (ORF86, PNK/PNL).
It has been shown from analyses of infected cells that the AcNPV late gene RNA polymerase has at least 8 subunits with apparent sizes of 95, 76, 50, 47.5, 40, 33.5, 27.5, and 26 kDa (Yang et al., 1991). These subunits are believed to be distinct from host encoded RNA polymerase subunits. The level of processing of viral RNA polymerase subunits (i.e., cleavage of primary products, phosphorylation) is not known, so that simple comparison with the predicted sizes of the ORFs listed in Table 1 could be misleading. Nonetheless, some uncharacterized ORFs encode peptides of similar molecular weights to the indicated components of the viral RNA polymerase. In particular, ORF144 encodes a 33.5 kDa peptide that has similarity to the yeast MSS18 protein (Seraphin et al., 1988). MSS18 is known to be involved with yeast mitochondrial RNA splicing. Also, ORF124 encodes a 28.5 kDa peptide with similarity to a plasmid copy number protein from Clostridium perfringens (Garnier and Cole, 1988).
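The size comparison described above can be sketched as a simple tolerance match. This is only an illustration: the subunit sizes and the two ORF sizes are those quoted in the text, while the 1.0 kDa tolerance and the function name `candidate_matches` are assumptions made for the example. As the text cautions, unknown processing of the subunits means a match of this kind is suggestive at best.

```python
# Apparent subunit sizes (kDa) quoted in the text for the late gene RNA polymerase.
SUBUNITS_KDA = [95, 76, 50, 47.5, 40, 33.5, 27.5, 26]

# Predicted sizes (kDa) of the two ORFs named in the text.
ORF_SIZES_KDA = {"ORF144": 33.5, "ORF124": 28.5}


def candidate_matches(orf_sizes, subunits, tolerance_kda=1.0):
    """Map each ORF to the subunit sizes lying within tolerance_kda of it.

    The tolerance is an arbitrary illustrative choice, not a value from the text.
    """
    matches = {}
    for orf, size in orf_sizes.items():
        close = [s for s in subunits if abs(s - size) <= tolerance_kda]
        if close:
            matches[orf] = close
    return matches


if __name__ == "__main__":
    for orf, close in candidate_matches(ORF_SIZES_KDA, SUBUNITS_KDA).items():
        print(orf, "is close to subunit(s) of", close, "kDa")
```

With these inputs ORF144 pairs with the 33.5 kDa subunit and ORF124 with the 27.5 kDa subunit.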
Several genes encoding late enhancing factors (lefs) have recently been identified (see Table 1). These include lefs 1-5 as well as IE-1, IE-N, and p143 (helicase). Some or all of these genes could be involved in forming the late gene transcription complex. As shown in Table 1, several of these genes have predicted molecular weights close to the molecular weights identified for the late transcription complex as determined by SDS-PAGE.
Space constraints prevent the presentation of a complete listing of all the motifs characteristic of proteins with known functions that are present in the selected and non-selected ORFs. This is best explored on a gene by gene basis. Pattern searches look for exact matches. This means that some motifs will be missed if there are minor changes in the scanned sequence. For example, without prior knowledge that ORF95 was a helicase, it is unlikely that we would have identified it as a helicase. While the NTP binding domain was clearly evident, other motifs, i.e., DEAD and DEAD-like motifs characteristic of other helicases (Gorbalenya et al., 1989), are not. There are 2 classes of helicases currently recognised, those with a characteristic DEAD motif in domain II (Gorbalenya et al., 1989) and those in which the DEAD motif in domain II is not well conserved (non-DEAD helicases, see Hodgman, 1988a,b). Except for limited similarity in domain I (the NTP binding domain) and domain II, the other 4 domains identified
as conserved within the DEAD or non-DEAD helicases are not well conserved between the 2 families. A typical BLAST search revealed no other helicase matches except for the published sequence of AcNPV p143. Similar results were obtained with FASTA, BLAZE and BLITZ searches. Finding matches to ORFs with several small scattered conserved domains may be a matter for additional, fine-sequence analysis.
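The limitation of exact pattern searches noted above can be illustrated with a short sketch that contrasts an exact motif scan with one that tolerates substitutions. The toy peptide sequence and the function name `find_motif` are invented for this illustration and are not taken from the AcNPV sequence; real searches would use position-specific scoring rather than a simple mismatch count.

```python
def find_motif(sequence, motif, max_mismatches=0):
    """Return 0-based start positions where motif matches sequence
    with at most max_mismatches substitutions (no gaps)."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical peptide containing one DEAD motif and one DEAH variant.
    toy_peptide = "MKLVDEADRLLSGTDEAHKK"
    print("exact:", find_motif(toy_peptide, "DEAD"))
    print("one mismatch allowed:", find_motif(toy_peptide, "DEAD", max_mismatches=1))
```

The exact scan reports only the DEAD occurrence, while allowing one substitution also recovers the DEAH variant, which is the kind of minor change an exact pattern search would miss.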
EXAMPLE 21
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 27: 22600 > 23458: Inhibitor of Apoptosis-Like Gene 1 (IAP1)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 27: 22600 > 23458 and named "Inhibitor of Apoptosis-Like Gene 1" (IAP1).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
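The coordinate convention described above can be sketched in code. The designation strings are copied from the Example headings in this description; the names `Orf`, `parse_orf` and `atg_coordinate` are hypothetical helpers introduced only for this illustration.

```python
import re
from typing import NamedTuple


class Orf(NamedTuple):
    number: int
    left: int
    right: int
    polyhedrin_sense: bool  # True for ">", False for "<"


# Matches designations such as "ORF 27: 22600 > 23458".
_DESIGNATION = re.compile(r"ORF\s*(\d+):\s*(\d+)\s*([<>])\s*(\d+)")


def parse_orf(text):
    """Parse an ORF designation into number, coordinates and orientation."""
    match = _DESIGNATION.search(text)
    if match is None:
        raise ValueError("not an ORF designation: %r" % text)
    number, left, arrow, right = match.groups()
    return Orf(int(number), int(left), int(right), arrow == ">")


def atg_coordinate(orf):
    """ATG lies at the left coordinate for polyhedrin-sense genes,
    and at the right coordinate for antisense genes."""
    return orf.left if orf.polyhedrin_sense else orf.right


if __name__ == "__main__":
    for label in ("ORF 27: 22600 > 23458", "ORF 30: 24315 < 25704"):
        orf = parse_orf(label)
        print(label, "-> ATG at", atg_coordinate(orf))
```

For the two designations shown, the initiation codon is placed at 22600 (polyhedrin-sense) and 25704 (antisense) respectively.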
For this Example, an AcMNPV DNA fragment, coordinates 21828 (SalGI site) to 23559 (SalGI site), was subcloned into pUC118 which had been digested with SalI and treated with calf intestinal phosphatase (CIP) to derive pUC118.IAP1 (Figure 4, Panels a-e). This was digested with HpaI (22756 and 23017), treated with CIP and ligated with a BglII linker (CAGATCTC). This plasmid was digested with BglII, treated with CIP and ligated with a DNA cassette containing the Escherichia coli beta-galactosidase coding region. The cassette was chosen to derive an in-frame fusion between the IAP1 and beta-galactosidase coding regions. The plasmid was designated pUC118.IAP1.lacZ. This was used to cotransfect Spodoptera frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the IAP1, thus disrupting IAP1 function. The results from this Example demonstrated that the recombinant virus (AcIAP1.lacZ) replicated normally in S. frugiperda cells and Trichoplusia ni insect larvae.
EXAMPLE 22
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 30: 24315 < 25704: Haemolysin Secretory Protein (HSP)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 30: 24315 < 25704 and named "Haemolysin Secretory Protein" (HSP).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 24117 (PstI site) to 27783 (PstI site), was subcloned into pUC118 which had been previously digested with PstI and treated with CIP to derive pUC118.HSP (See Figure 5, Panels a-e). This was digested with MluI (25550), treated with CIP and ligated with an MluI-BglII oligonucleotide adaptor (CGCGAGATCT) to insert a BglII site within the HSP coding region to derive pUC118.HSP-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli beta-galactosidase coding region. The cassette was chosen to derive an in-frame fusion between the HSP and beta-galactosidase coding regions. The plasmid was designated pUC118.HSP.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. The results demonstrated that the recombinant virus replicated normally in S. frugiperda cells and T. ni insect larvae.
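As a quick sanity check on an adaptor of this kind, one can confirm that the oligonucleotide carries the BglII recognition sequence (AGATCT) and begins with the CGCG overhang that MluI (A^CGCGT) leaves. The sketch below does this for the adaptor sequence quoted in this Example; the helper names are hypothetical, and the check says nothing about reading frame or ligation efficiency.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

BGLII_SITE = "AGATCT"      # BglII recognition sequence
MLUI_OVERHANG = "CGCG"     # 5' overhang left by MluI (A^CGCGT)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(site):
    """A site is palindromic when it equals its own reverse complement."""
    return site == reverse_complement(site)


def contains_site(oligo, site):
    """True if the site occurs on either strand of the oligonucleotide."""
    return site in oligo or site in reverse_complement(oligo)


if __name__ == "__main__":
    adaptor = "CGCGAGATCT"  # MluI-BglII adaptor quoted in the text
    print("carries BglII site:", contains_site(adaptor, BGLII_SITE))
    print("starts with MluI overhang:", adaptor.startswith(MLUI_OVERHANG))
    print("BglII site palindromic:", is_palindromic(BGLII_SITE))
```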
EXAMPLE 23
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 32: 27041 < 27584: Fibroblast Growth Factor (FGF)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 32: 27041 < 27584 and named "Fibroblast Growth Factor" (FGF).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 26945 (EcoRI site) to 27783 (PstI site), was subcloned into pUC118 which had been previously digested with EcoRI and PstI and treated with CIP to derive pUC118.FGF (See Figure 6, Panels a-e). This was digested with AccI (27417), treated with CIP, and ligated with an AccI-BglII oligonucleotide adaptor (CGAGATC) to insert a BglII site within the FGF coding region to derive pUC118-FGF-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli beta-galactosidase coding region. The cassette was chosen to derive an in-frame fusion between the FGF and beta-galactosidase coding regions. The plasmid was designated pUC118.FGF.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the FGF, thus disrupting FGF function. The results demonstrated that the recombinant virus (AcFGF.lacZ) replicated normally in S. frugiperda cells and T. ni insect larvae.
EXAMPLE 24
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 71: 61016 > 61763: Inhibitor of Apoptosis-Like Gene 2 (IAP2)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 71: 61016 > 61763 and named "Inhibitor of Apoptosis-Like Gene 2" (IAP2).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 60448 (SacI site) to 63194 (SacI site), was subcloned into pUC118 digested with SacI and treated with CIP to derive pUC118.IAP2 (See Figure 7, Panels a-e). This was digested with ClaI (61263) and SacII (61560), treated with CIP and ligated with a ClaI-BglII-SacII oligonucleotide adaptor (GCGAGATCTGGC [top strand], CAGATCTG [bottom strand]) to derive pUC118.IAP2-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. The plasmid was designated pUC118.IAP2.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the IAP2, thus disrupting IAP2 function. The results demonstrated that the recombinant virus (AcIAP2.lacZ) replicated normally in S. frugiperda cells and T. ni insect larvae.
EXAMPLE 25
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 86: 72131 < 74213: Polynucleotide Kinase/Polynucleotide Ligase (PNK/PNL)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 86: 72131 < 74213 and named "Polynucleotide Kinase/Polynucleotide Ligase" (PNK/PNL).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 71417 (HindIII site) to 83121 (HindIII site), was subcloned into pAT153 digested with HindIII and treated with CIP to derive pAT153.PNK/PNL (See Figure 8, Panels a-e). This was digested with StuI (72308), treated with CIP and ligated with a BglII adaptor (CAGATCTG) to insert a BglII site within the PNK/PNL coding region to derive pAT153.PNK/PNL-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. The plasmid was designated pUC118.PNK/PNL.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the PNK/PNL, thus disrupting PNK/PNL function. The results showed that the recombinant virus (AcPNK/PNL.lacZ) replicated normally in S. frugiperda cells.
EXAMPLE 26
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 123: 102964 < 103609: Protein Kinase 2 (PK2)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 123: 102964 < 103609 and named "Protein Kinase 2" (PK2).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 102148 (PstI site) to 105164 (PstI site), was subcloned into pUC118 digested with PstI and treated with CIP to derive pUC118.PK2 (See Figure 9, Panels a-e). This was digested with ApaI (103356), treated with CIP and ligated with an ApaI-BglII adaptor (AGATCTGGCC) to insert a BglII site within the PK2 coding region to derive pUC118.PK2.BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. The plasmid was designated pUC118.PK2.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the PK2, thus disrupting PK2 function. The results demonstrated that the recombinant virus (AcPK2.lacZ) replicated normally in S. frugiperda cells.
EXAMPLE 27
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 126: 105282 < 106935: Chitinase (CHIT)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 126: 105282 < 106935 and named "Chitinase" (CHIT).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 105164 (PstI site) to 107943 (PstI site), was subcloned into pUC118 (lacking a HindIII site) digested with PstI and treated with CIP to derive pUC118.CHIT (See Figure 10, Panels a-e). This was digested with HindIII (106337), treated with CIP and ligated with a HindIII-BamHI adaptor (AGCTGGATCC) to insert a BamHI site within the CHIT gene to derive pUC118.CHIT-BamHI. This was digested with BamHI, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. The plasmid was designated pUC118.CHIT.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus with a copy of the beta-galactosidase gene in frame with the chitinase, thus disrupting chitinase function. The recombinant virus (AcCHIT.lacZ) replicated normally in S. frugiperda cells. In T. ni insect larvae, the virus replicated but failed to induce liquefaction of the host.
EXAMPLE 28
Newly Identified AcMNPV Gene
Which is Dispensable For Virus Replication
In Cell Culture or Insect Larvae
ORF 127: 106983 > 107952: Cathepsin (CATH)
This Example identifies a new AcMNPV gene which is dispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 127: 106983 > 107952 and is named "Cathepsin" (CATH).
This AcMNPV gene can be deleted from the baculovirus genome to: (a) provide additional sites for inserting single or multiple copies of foreign genes and (b) to reduce the size of the virus genome. Each gene is ascribed a number corresponding with the order of open reading frames (ORFs) within the AcMNPV genome. This is followed by the precise coordinates of the left and right ends of the coding region. Genes which are on the same strand as the polyhedrin gene are indicated by " > " between the left and right coordinates. Genes which are antisense to the polyhedrin are indicated by " < " between the left and right coordinates. Thus, the translation initiation codon (ATG) for polyhedrin-sense genes is located at the left coordinate while the translation initiation codon for antisense genes is located at the right coordinate. All coordinates in this description are relative to the AcMNPV genomic sequence, even after subcloning of virus DNA fragments into plasmid vectors.
For this Example, an AcMNPV DNA fragment, coordinates 105164 (PstI site) to 107943 (PstI site), was subcloned into pUC119 to derive pUC119.M (See Figure 11, Panels a-e). This plasmid was used in a site directed mutagenesis experiment to remove part of the chitinase coding region. In the course of this experiment, a clone was produced which lacked a region of approximately 500 base pairs spanning the chitinase and cathepsin gene promoters (these are located "back-to-back" between coordinates 106935 and 106983). This served to abrogate the function of both genes. This mutated plasmid was designated pUC119.M.CHIT-/CATH-. It was used to cotransfect S. frugiperda cells with infectious virus DNA, purified from the AcCHIT.lacZ, which had been digested with Bsu36I to enhance the recovery of recombinant viruses. The recombinant virus, AcCHIT-/CATH-, replicated normally in S. frugiperda cells. In T. ni insect larvae, the virus replicated but failed to induce liquefaction of the host.
EXAMPLE 29
Newly Identified AcMNPV Gene Which Is
Not Amenable to Disruption With Foreign DNA Sequences
ORF 42: 34010 > 33924: Global Transactivator (GTA)
This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 42: 34010 > 33924 and is named "Global Transactivator" (GTA).
This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
For this Example, an AcMNPV DNA fragment, coordinates 33403 (EcoRI site) to 37088 (Asp718 site), was inserted into pUC118 digested with EcoRI and Asp718 and treated with CIP, to derive pUC118.GTA (See Figure 12, Panels a-e). This was digested with BstEII (34289 and 34382), treated with CIP and ligated with a BstEII adaptor (GTAACAGATCT [top strand], GTTACAGATCT [bottom strand]) to insert a BglII site into the GTA gene to derive pUC118.GTA-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. This plasmid was designated pUC118.GTA.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the GTA gene is essential for virus replication in cell culture.
EXAMPLE 30
Newly Identified AcMNPV Gene Which Is
Not Amenable to Disruption With Foreign DNA Sequences
ORF 124: 103793 > 104534: Plasmid Copy Number Protein (PCNP)
This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 124: 103793 > 104534 and named "Plasmid Copy Number Protein" (PCNP).
This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
For this Example, an AcMNPV DNA fragment, coordinates 102148 (PstI site) to 105164 (PstI site), was inserted into pUC118, digested with PstI and treated with CIP, to derive pUC118.PCNP (See Figure 13, Panels a-e). This was digested with BstXI (103913) and SpeI (104424), treated with CIP and ligated with a BstXI-SpeI oligonucleotide adaptor (GTTGCAAGATCT [top strand], CTAGAGATCTTG [bottom strand]), containing a BglII site, to derive pUC118.PCNP-BglII. This was digested with BglII, treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. This plasmid was designated pUC118.PCNP.lacZ. This was used to cotransfect Spodoptera frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the PCNP gene is essential for virus replication in cell culture.
EXAMPLE 31
Newly Identified AcMNPV Gene Which Is
Not Amenable to Disruption With Foreign DNA Sequences
ORF 132: 112560 > 113817: Alkaline Exonuclease (ALK-EXO)
This Example identifies a new AcMNPV gene which is indispensable for virus replication in cell culture or insect larvae. This gene is located at ORF 132: 112560 > 113817 and named "Alkaline Exonuclease" (ALK-EXO).
This AcMNPV gene was modified in a similar manner as Examples 21-28 (described above), but the modification did not result in the production of an infectious virus stock. This information is a strong indication that this virus gene is indispensable for replication and cannot be removed from the virus genome.
For this Example, an AcMNPV DNA fragment, coordinates 112044 (SmaI site) to 113913 (HindIII site), was subcloned into pUC118 digested with HindIII and SmaI and treated with CIP (See Figure 14, Panels a-e). This combination of enzymes served to remove an intervening BamHI site within the polylinker of the plasmid. The plasmid was designated pUC118.ALK-EXO. This was digested with BamHI (113033), treated with CIP and ligated with a DNA cassette containing the E. coli lacZ coding region to provide an in-frame fusion between the virus and bacterial genes. The plasmid was designated pUC118.ALK-EXO.lacZ. This was used to cotransfect S. frugiperda cells with infectious AcMNPV C6 DNA to produce recombinant virus. Although some blue plaques were derived, these could not be titrated to genetic homogeneity and it was concluded that the ALK-EXO gene is essential for virus replication in cell culture.
EXAMPLE 32
Restriction Sites Which Are Not
Recognized Sites Within the AcNPV Genome
This Example identifies three restriction enzymes which do not have recognition sites within the AcNPV genome. These three enzymes are:
Bsu36I 5' CCTNAGG 3'
SrfI 5' GCCCGGGC 3'
Sse8387I 5' CCTGCAGG 3'
Bsu36I has already been exploited in the widely used "BaculoGold" or "BacPAK6" linearized virus DNA technology for the efficient production of recombinant viruses. Briefly, Bsu36I sites were inserted within ORF 9 (immediately downstream of the polyhedrin gene in AcNPV) and ORF 7 (immediately upstream of the polyhedrin gene). The polyhedrin gene was replaced with the beta-galactosidase coding region, which also contains a Bsu36I site. Digestion of the virus DNA with Bsu36I removes the beta-galactosidase coding region and flanking regions of virus sequences to derive a linearized genome which is unable to replicate in insect cells (virus DNA must be circular). Furthermore, the removal of part of ORF 9, which encodes a virus structural protein, prevents the formation of an infectious virus in the rare event that virus DNA is recircularized via the action of cellular ligases. However, if a baculovirus transfer vector, which contains a foreign gene and the complete ORF 9, is cotransfected into insect cells with the defective virus DNA, homologous recombination restores ORF 9 function and enables recombinant virus production.
SrfI and Sse8387I could be utilized in a similar manner in other regions of the virus genome. For example, they could be used to alter the AcNPV genome to incorporate these sites and facilitate genomic DNA linearization.
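A search for enzymes lacking recognition sites in a sequence can be sketched as follows. The recognition sequences are the commonly catalogued ones for the three enzymes (with SrfI taken as GCCCGGGC); the toy sequence and the function names are assumptions made for this illustration, and the actual AcNPV sequence is not reproduced here. All three sites are palindromic, so scanning one strand is sufficient.

```python
import re

# Recognition sequences; N stands for any base.
ENZYMES = {
    "Bsu36I": "CCTNAGG",
    "SrfI": "GCCCGGGC",
    "Sse8387I": "CCTGCAGG",
}

# Minimal IUPAC table covering only the symbols used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_regex(site):
    """Compile a lookahead regex so that overlapping sites are all counted."""
    pattern = "".join(IUPAC[base] for base in site)
    return re.compile("(?=(" + pattern + "))")


def count_sites(sequence, site):
    """Count occurrences of a recognition site in a linear sequence."""
    return sum(1 for _ in site_regex(site).finditer(sequence.upper()))


def non_cutters(sequence, enzymes):
    """Return the names of enzymes with no recognition site in the sequence."""
    return sorted(name for name, site in enzymes.items()
                  if count_sites(sequence, site) == 0)


if __name__ == "__main__":
    # Hypothetical sequence carrying one Bsu36I site and one SrfI site.
    toy_sequence = "AAACCTGAGGTTTGCCCGGGCAAA"
    print(non_cutters(toy_sequence, ENZYMES))
```

On this toy input only Sse8387I is reported as a non-cutter; applied to a complete genome sequence, the same scan would list every enzyme in the table with zero sites.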
EXAMPLE 33
Unique Restriction Sites Within the AcNPV Genome
This Example identifies two restriction enzymes which digest AcNPV DNA only once: AvrII (See Figure 15) and FseI (See Figure 16). AvrII digests within the non-essential EGT (ecdysteroid UDP-glucosyltransferase) gene and FseI digests within the essential GTA (global transactivator) gene. Thus, AvrII can be used to linearize the AcNPV genome and facilitate the insertion of foreign genes within EGT. By contrast, FseI cannot be used in a similar manner to facilitate insertion of genes within GTA because this would disrupt GTA function. However, site directed mutagenesis could be used to remove the FseI site from GTA, thus permitting the exploitation of this enzyme as described above for Bsu36I, Sse8387I and SrfI.
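Because the virus genome is circular, a scan for single-cut enzymes should also count a site that spans the arbitrary origin of the linear sequence file. The sketch below shows one way to do this; the recognition sequences for AvrII (CCTAGG) and FseI (GGCCGGCC) are the commonly catalogued ones, while the toy genome and function names are invented for this illustration.

```python
UNIQUE_CANDIDATES = {"AvrII": "CCTAGG", "FseI": "GGCCGGCC"}


def count_sites_circular(genome, site):
    """Count exact occurrences of site in a circular genome.

    The first len(site) - 1 bases are appended to the end so that a site
    spanning the origin of the linear representation is also found.
    """
    genome = genome.upper()
    extended = genome + genome[:len(site) - 1]
    return sum(1 for i in range(len(genome))
               if extended[i:i + len(site)] == site)


def unique_cutters(genome, enzymes):
    """Return the names of enzymes that cut the circular genome exactly once."""
    return sorted(name for name, site in enzymes.items()
                  if count_sites_circular(genome, site) == 1)


if __name__ == "__main__":
    # Hypothetical circular sequence: the AvrII site spans the origin
    # (ends in CC, starts with TAGG); the FseI site is internal.
    toy_genome = "TAGGAAAGGCCGGCCAAACC"
    print(unique_cutters(toy_genome, UNIQUE_CANDIDATES))
```

A purely linear scan of the toy sequence would miss the AvrII site, which is why the wrap-around step is included.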
EXAMPLE 34
Agricultural Biopesticide
Information derived from the entire AcMNPV genomic sequence could afford development of novel baculovirus transfer vectors that encode baculoviruses with favorable agronomic properties. Identification of genes encoding proteins that modify viral host range would lead to generation of recombinant NPVs wherein said recombinant viruses would be capable of infecting and therefore neutralizing a wider spectrum of important agronomic pests. Alternatively, genetic manipulation could lead to changes in viral properties that render the virus capable of infecting only a very narrow spectrum of insect pests, thus affording precise control of targeted insect species while sparing beneficial insect populations.
Moreover, genes involved in viral replication could be identified. Manipulation of these genes could afford recombinant baculoviruses that multiply more rapidly within infected insect cells, thus leading to more rapid neutralization of the infected insect. For example, it is known that the Global Transactivator Gene (ORF 42; see Example 29) is indispensable for viral replication. Accordingly, it may be possible to manipulate this gene such that the efficiency of viral replication is maximized, accelerating the infectious process. Other genes influencing viral infectivity could also be identified and modified in order to raise the efficiency of the infectious process. This would also afford more rapid neutralization of targeted populations, and thus approach the rapidity of insect neutralization commonly associated with application of traditional chemical insect control agents.
In addition, genes could be identified that qualitatively control viral replication outside of a permissive propagation system. For instance, viral mutants deficient in a protein or proteins required for in vivo infectivity could be propagated in an insect cell culture system that is permissive to viral replication. While efficient viral replication in cell culture takes place, as well as initial infection of target insects, further viral replication in vivo is curtailed, and environmental impact of application of recombinant baculoviruses is minimized.
REFERENCES
ALNEMRI, E.S., ROBERTSON, N.M., FERNANDES, T.F., CROCE, C.M. and LITWACK, G. (1992). Overexpressed full-length human BCL2 extends the survival of baculovirus-infected Sf9 insect cells. Proc. Natl. Acad. Sci. USA 89, 7295-7299.
ALTSCHUL, S.F., GISH, W., MILLER, W., MYERS, E.W. and LIPMAN, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
ARIF, B.M. (1986). The structure of the viral genome. Curr. Top. Microbiol. Immunol. 131, 21-29.
BAIROCH, A. (1993). The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Res. 21, 3097-3103.
BEAMES, B. and SUMMERS, M.D. (1989). Location and nucleotide sequence of the 25 k protein missing from baculovirus few polyhedra (FP) mutants. Virology 168, 344-353.
BECKER, D. and KNEBEL-MORSDORF, D. (1993). Sequence and temporal appearance of the early transcribed baculovirus gene HE65. J. Virol. 67, 5867-5872.
BIRNBOIM, H.C. and DOLY, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7, 1513-1523.
BLISSARD, G.W. and ROHRMANN, G.F. (1989). Location, sequence, transcriptional mapping, and temporal expression of the gp64 envelope glycoprotein gene of the Orgyia pseudotsugata multicapsid nuclear polyhedrosis virus. Virology 170, 537-555.
BLISSARD, G.W. and ROHRMANN, G.F. (1990). Baculovirus diversity and molecular biology. Annu. Rev. Entomol. 35, 127-155.
BRAUNAGEL, S.C., DANIEL, K.D., REILLY, L.M., GUARINO, L.A., HONG, T. and SUMMERS, M.D. (1992). Sequence, genomic organisation of the EcoRI-A fragment of Autographa californica nuclear polyhedrosis virus, and identification of a viral-encoded protein resembling the outer capsid protein VP8 of rotavirus. Virology 191, 1003-1008.
BURGESS, W.H. and MACIAG, T. (1989). The heparin-binding (fibroblast) growth factor family of proteins. Ann. Rev. Biochem. 58, 575-606.
CADDLE, M.S., LUSSIER, R.H. and HEINTZ, N.H. (1990). Intramolecular DNA triplexes, bent DNA and DNA unwinding elements in the initiation region of an amplified dihydrofolate reductase replicon. J. Mol. Biol. 211, 19-33.
CARSON, D.D., SUMMERS, M.D. and GUARINO, L.A. (1991). Molecular analysis of a baculovirus regulatory gene. Virology 182, 279-286.
CARSTENS, E.B., LU, A.L. and CHAN, H.L.B. (1993). Sequence, transcriptional mapping, and overexpression of p47, a baculovirus gene regulating late gene expression. J. Virol. 67, 2513-2520.
CHISHOLM, G.E. and HENNER, D.J. (1988). Multiple early transcripts and splicing of the Autographa californica nuclear polyhedrosis virus IE-1 gene. J. Virol. 62, 3193-3200.
CLEM, R.J., FECHHEIMER, M. and MILLER, L.K. (1991). Prevention of apoptosis by a baculovirus gene during infection of insect cells. Science 254, 1388-1390.
COCHRAN, M.A., CARSTENS, E.B., EATON, B.T. and FAULKNER, P. (1982). Molecular cloning and physical mapping of restriction endonuclease fragments of Autographa californica nuclear polyhedrosis virus. J. Virol. 41, 940-946.
CRAWFORD, A.M. and MILLER, L.K. (1988). Characterization of an early gene accelerating expression of late genes of the baculovirus Autographa californica nuclear polyhedrosis virus. J. Virol. 62, 2773-2781.
CROOK, N.E., CLEM, R.J. and MILLER, L.K. (1993). An apoptosis-inhibiting baculovirus gene with a zinc finger-like motif. J. Virol. 67, 2168-2174.
DAYHOFF, M., SCHWARTZ, R.M. and ORCUTT, B.C. (1978). In Atlas of protein sequence and structure. Nat. Biomed. Res. Found., Silver Spring, MD. 5, 345-352.
DEVEREUX, J., HAEBERLI, P. and SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
ELDRIDGE, R.L., LI, Y. and MILLER, L.K. (1992). Characterization of a baculovirus gene encoding a small conotoxinlike polypeptide. J. Virol. 66, 6563-6571.
FRANCKI, R.I.B., FAUQUET, C.M., KNUDSON, D.L. and BROWN, F. (Eds.) (1991). Classification and nomenclature of viruses. Fifth report of the International Committee on Taxonomy of Viruses. Arch. Virol. 2. Springer-Verlag Wien New York.
FRIESEN, P.D. and MILLER, L.K. (1987). Divergent transcription of early 35- and 94-kilodalton protein genes encoded by the HindIII-K genome fragment of the baculovirus Autographa californica nuclear polyhedrosis virus. J. Virol. 61, 2264-2272.
FUCHS, L.Y., WOODS, M.S. and WEAVER, R.F. (1983). Viral transcription during Autographa californica nuclear polyhedrosis virus infection: a novel RNA polymerase induced in infected Spodoptera frugiperda cells. J. Virol. 48, 641-646.
GARNIER, T. and COLE, S.T. (1988). Complete nucleotide sequence and genetic organization of the bacteriocinogenic plasmid, pIP404, from Clostridium perfringens. Plasmid 19, 134-150.
GEARING, K.L. and POSSEE, R.D. (1990). Functional analysis of a 603 nucleotide open reading frame upstream of the polyhedrin gene of Autographa californica nuclear polyhedrosis virus. J. Gen. Virol. 71, 251-262.
GHOSH, D. (1992). TFD: the transcription factors database. Nucleic Acids Res. 20 Suppl., 2091-2093.
GOMBART, A.F., PEARSON, M.N. and ROHRMANN, G.F. (1989). A baculovirus polyhedral envelope-associated protein: genetic location, nucleotide sequence, and immunocytochemical characterization. Virology 169, 182-193.
GORBALENYA, A.E., KOONIN, E.V., DONCHENKO, A.P. and BLINOV, V.M. (1989). Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res. 17, 4713-4730.
GRIBSKOV, M., LUTHY, R. and EISENBERG, D. (1990). Profile analysis. Meth. Enzy. 183, 146-59.
GUARINO, L.A. (1990). Identification of a viral gene encoding a ubiquitin-like protein. Proc. Natl. Acad. Sci. USA 87, 409-413.
GUARINO, L.A., GONZALEZ, M.A. and SUMMERS, M.D. (1986). Complete sequence and enhancer function of the homologous DNA regions of Autographa californica nuclear polyhedrosis virus. J. Virol. 60, 224-229.
GUARINO, L.A. and SMITH, M.W. (1990). Nucleotide sequence and characterization of the 39k gene region of Autographa californica nuclear polyhedrosis virus. Virology 179, 1-8.
GUARINO, L.A. and SUMMERS, M.D. (1987). Nucleotide sequence and temporal expression of a baculovirus regulatory gene. J. Virol. 61, 2091-2099.
GUARINO, L.A. and SUMMERS, M.D. (1988). Functional mapping of Autographa californica nuclear polyhedrosis virus genes required for late gene expression. J. Virol. 62, 463-471.
HAYASHI, S. and SCOTT, M.P. (1990). What determines the specificity of action of Drosophila homeodomain proteins? Cell 63, 883-894.
HENIKOFF, S. and HENIKOFF, J.G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915-10919.
HEROLD, B.C., WuDUNN, D., SOLTYS, N. and SPEAR, P.G. (1991). Glycoprotein C of herpes simplex virus type 1 plays a principal role in the adsorption of virus to cells and in infectivity. J. Virol. 65, 1090-1098.
HODGMAN, T.C. (1988a). A new superfamily of replicative proteins. Nature 333, 22-23.
HODGMAN, T.C. (1988b). A new superfamily of replicative proteins (Erratum). Nature 333, 578.
HOOFT VAN IDDEKINGE, B.J.L., SMITH, G.E. and SUMMERS, M.D. (1983). Nucleotide sequence of the polyhedrin gene of Autographa californica nuclear polyhedrosis virus. Virology 131, 561-565.
HOOPES, R.R. and ROHRMANN, G.F. (1991). In vitro transcription of baculovirus immediate early genes: accurate mRNA initiation by nuclear extracts from both insect and human cells. Proc. Natl. Acad. Sci. USA 88, 4513-4517.
JONES, J.D.G., GRADY, K.L., SUSLOW, T.V. and BEDBROOK, J.R. (1986). Isolation and characterization of genes encoding two chitinase enzymes from Serratia marcescens. EMBO J. 5, 467-473.
KIM, D. and WEAVER, R.F. (1993). Transcription mapping and functional analysis of the protein tyrosine/serine phosphatase (PTPase) gene of the Autographa californica nuclear polyhedrosis virus. Virology 195, 587-595.
KING, L.A. and POSSEE, R.D. (1992). The baculovirus expression system, a laboratory guide, pp 1-229, Chapman and Hall (London).
KLAGSBRUN, M. and D'AMORE, P.A. (1991). Regulators of angiogenesis. Ann. Rev. Physiol. 53, 217-239.
KNEBEL-MORSDORF, D., KREMER, A. and JAHNEL, F. (1993). Baculovirus gene ME53, which contains a putative zinc finger motif, is one of the major early-transcribed genes. J. Virol. 67, 753-758.
KOGAN, P.H. and BLISSARD, G.W. (1994). A baculovirus gp64 early promoter is activated by host transcription factor binding to CACGTG and GATA elements. J. Virol. 68, 813-822.
KOOL, M. and VLAK, J.M. (1993). The structural and functional organization of the Autographa californica nuclear polyhedrosis virus genome. Arch. Virol. 130, 1-16.
KOOL, M., VOETEN, J.T.M., GOLDBACH, R.W. and VLAK, J.M. (1994). Functional mapping of regions of the Autographa californica nuclear polyhedrosis viral genome required for DNA replication. Virology 198, 680-689.
KOOL, M., van den BERG, P.M.M.M., TRAMPER, J., GOLDBACH, R.W. and VLAK, J.M. (1993a). Location of two putative origins of DNA replication of Autographa californica nuclear polyhedrosis virus. Virology 192, 94-101.
KOOL, M., VOETEN, J.T.M., GOLDBACH, R.W., TRAMPER, J. and VLAK, J.M. (1993b). Identification of seven putative origins of Autographa californica multiple nucleocapsid nuclear polyhedrosis virus DNA replication. J. Gen. Virol. 74, 2661-2668.
KOVACS, G.R., GUARINO, L.A., GRAHAM, B.L. and SUMMERS, M.D. (1991). Identification of spliced baculovirus RNAs expressed late in infection. Virology 185, 633-643.
KOZAK, M. (1986). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
KOZAK, M. (1987). An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 15, 8125-8148.
KRAPPA, R. and KNEBEL-MORSDORF, D. (1991). Identification of the very early transcribed baculovirus gene PE-38. J. Virol. 65, 805-812.
KUZIO, J., ROHEL, D.Z., CURRY, C.J., KREBS, A., CARSTENS, E.B. and FAULKNER, P. (1984). Nucleotide sequence of the p10 polypeptide gene of Autographa californica nuclear polyhedrosis virus. Virology 139, 414-418.
KUZIO, J., JAQUES, R. and FAULKNER, P. (1989). Identification of p74, a gene essential for virulence of baculovirus occlusion bodies. Virology 173, 759-763.
LAURENT, B.C., TREITEL, M.A. and CARLSON, M. (1991). Functional interdependence of the yeast SNF2, SNF5, and SNF6 proteins in transcriptional activation. Proc. Natl. Acad. Sci. USA 88, 2687-2691.
LEISY, D.J., ROHRMANN, G.F. and BEAUDREAU, G.S. (1984). Conservation of genome organization in two multicapsid nuclear polyhedrosis viruses. J. Virol. 52, 699-702.
LEISY, D.J. and ROHRMANN, G.F. (1993). Characterization of the replication of plasmids containing hr sequences in baculovirus-infected Spodoptera frugiperda cells. Virology 196, 722-730.
LEVINE, M. and HOEY, T. (1988). Homeobox proteins as sequence-specific transcription factors. Cell 55, 537-540.
LI, Y., PASSARELLI, A.L. and MILLER, L.K. (1993). Identification, sequence, and transcriptional mapping of lef-3, a baculovirus gene involved in late and very late gene expression. J. Virol. 67, 5260-5268.
LIU, A., QIN, J., RANKIN, C., HARDIN, S.E. and WEAVER, R.F. (1986). Nucleotide sequence of a portion of the Autographa californica nuclear polyhedrosis virus genome containing the EcoRI site-rich region (hr5) and an open reading frame just 5' of the p10 gene. J. Gen. Virol. 67, 2565-2570.
LU, A. and CARSTENS, E.B. (1991). Nucleotide sequence of a gene essential for viral DNA replication in the baculovirus Autographa californica nuclear polyhedrosis virus. Virology 181, 336-347.
LU, A. and CARSTENS, E.B. (1992). Nucleotide sequence and transcriptional analysis of the p80 gene of Autographa californica nuclear polyhedrosis virus: a homologue of the Orgyia pseudotsugata nuclear polyhedrosis virus capsid-associated gene. Virology 190, 201-209.
McGINNIS, W., GARBER, R.L., WIRZ, J., KUROIWA, A. and GEHRING, W.J. (1984a). A homologous protein-coding sequence in Drosophila homoeotic genes and its conservation in other metazoans. Cell 37, 403-408.
McGINNIS, W., LEVINE, M.S., HAFEN, E., KUROIWA, A. and GEHRING, W.J. (1984b). A conserved DNA sequence in homoeotic genes of the Drosophila antennapedia and bithorax complexes. Nature 308, 428-433.
MAJIMA, K., KOBARA, R. and MAEDA, S. (1993). Divergence and evolution of homologous regions of Bombyx mori nuclear polyhedrosis virus. J. Virol. 67, 7513-7521.
METTENLEITER, T.C., ZSAK, L., ZUCKERMANN, F., SUGG, N., KERN, H. and BEN-PORAT, T. (1990). Interaction of glycoprotein gIII with a cellular heparinlike substance mediates adsorption of pseudorabies virus. J. Virol. 64, 278-286.
MILLER, L.K. and DAWES, K.P. (1979). Physical map of the DNA genome of Autographa californica nuclear polyhedrosis virus. J. Virol. 29, 1044-1055.
NAHMIAS, A.J. and KILBRICK, S. (1964). Inhibitory effect of heparin on herpes simplex virus. J. Bacteriol. 87, 1060-1066.
NIKOLOV, D.B., HU, S.-H., LIN, J., GASCH, A., HOFFMANN, A., HORIKOSHI, M., CHUA, N.-H., ROEDER, R.G. and BURLEY, S.K. (1992). Crystal structure of TFIID TATA-box binding protein. Nature 360, 40-46.
O'REILLY, D.R., CRAWFORD, A.M. and MILLER, L.K. (1989). Viral proliferating cell nuclear antigen. Nature 337, 606.
O'REILLY, D.R., MILLER, L.K. and LUCKOW, V.A. (1992). Baculovirus expression vectors. A laboratory manual. W.H. Freeman and Co., New York.
O'REILLY, D.R. and MILLER, L.K. (1989). A baculovirus blocks insect molting by producing ecdysteroid UDP-glucosyl transferase. Science 245, 1110-1112.
O'REILLY, D.R. and MILLER, L.K. (1990). Regulation of expression of a baculovirus ecdysteroid UDP-glucosyl transferase gene. J. Virol. 64, 1321-1328.
O'REILLY, D.R., PASSARELLI, A.L., GOLDMAN, I.F. and MILLER, L.K. (1990). Characterization of the DA26 gene in a hypervariable region of the Autographa californica nuclear polyhedrosis virus genome. J. Gen. Virol. 71, 1029-1037.
OELLIG, C., HAPP, B., MULLER, T. and DOERFLER, W. (1987). Overlapping sets of viral RNAs reflect the array of polypeptides in the EcoRI J and N fragments (map positions 81.2 to 85.0) of the Autographa californica nuclear polyhedrosis virus genome. J. Virol. 61, 3048-3057.
PASSARELLI, A.L. and MILLER, L.K. (1993a). Three baculovirus genes involved in late and very late gene expression: ie-1, ie-n, and lef-2. J. Virol. 67, 2149-2158.
PASSARELLI, A.L. and MILLER, L.K. (1993b). Identification and characterization of lef-1, a baculovirus gene involved in late and very late gene expression. J. Virol. 67, 3481-3488.
PEARSON, W.R. and LIPMAN, D.J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.
PEARSON, M.N., BJORNSON, R.M., AHRENS, C. and ROHRMANN, G.F. (1993). Identification and characterization of a putative origin of DNA replication in the genome of a baculovirus pathogenic for Orgyia pseudotsugata. Virology 197, 715-725.
PEARSON, M., BJORNSON, R., PEARSON, G. and ROHRMANN, G. (1992). The Autographa californica baculovirus genome: evidence for multiple replication origins. Science 257, 1382-1384.
POSSEE, R.D. (1986). Cell-surface expression of influenza virus haemagglutinin in insect cells using a baculovirus vector. Virus Res. 5, 43-59.
POSSEE, R.D., SUN, T.-P., HOWARD, S.C., AYRES, M.D., HILL-PERKINS, M. and GEARING, K.L. (1991). Nucleotide sequence of the Autographa californica nuclear polyhedrosis 9.4 kbp EcoRI-I and -R (polyhedrin gene) region. Virology 185, 229-241.
RAWLINGS, N.D., PEARL, L.H. and BUTTLE, D.J. (1992). The baculovirus Autographa californica nuclear polyhedrosis virus genome includes a papain-like sequence. Biol. Chem. Hoppe-Seyler 373, 1211-1215.
ROEDER, R.G. (1991). The complexities of eukaryotic transcription initiation: regulation of preinitiation complex assembly. TIBS 16, 402-408.
SAMBROOK, J., FRITSCH, E.F. and MANIATIS, T. (1989). Molecular cloning; a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
SCOTT, M.P. and WEINER, A.J. (1984). Structural relationships among genes that control development: sequence homology between the antennapedia, ultrabithorax, and fushi tarazu loci of Drosophila. Proc. Natl. Acad. Sci. USA 81, 4115-4119.
SERAPHIN, B., SIMON, M. and FAYE, G. (1988). MSS18, a yeast nuclear gene involved in the splicing of intron aI5β of the mitochondrial cox1 transcript. EMBO J. 7, 1455-1464.
SMITH, G.E. and SUMMERS, M.D. (1978). Analysis of baculovirus genomes with restriction endonucleases. Virology 89, 517-527.
SMITH, G.E. and SUMMERS, M.D. (1979). Restriction maps of five Autographa californica MNPV variants, Trichoplusia ni MNPV, and Galleria mellonella MNPV DNAs with endonucleases SmaI, KpnI, BamHI, SacI, XhoI and EcoRI. J. Virol. 30, 828-838.
SMITH, T.F. and WATERMAN, M.S. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
STURROCK, S.S. and COLLINS, J.F. (1993). MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
TABOR, S. and RICHARDSON, C.C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84, 4767-4771.
TAMKUN, J.W., DEURING, R., SCOTT, M.P., KISSINGER, M., PATTATUCCI, A.M., KAUFMAN, T.C. and KENNISON, J.A. (1992). Brahma: a regulator of Drosophila homeotic genes structurally related to the yeast transcriptional activator SNF2/SWI2. Cell 68, 561-572.
THIEM, S.M. and MILLER, L.K. (1989a). Identification, sequence, and transcriptional mapping of the major capsid protein gene of the baculovirus Autographa californica nuclear polyhedrosis virus. J. Virol. 63, 2008-2018.
THIEM, S.M. and MILLER, L.K. (1989b). A baculovirus gene with a novel transcription pattern encodes a polypeptide with a zinc finger and a leucine zipper. J. Virol. 63, 4489-4497.
TJIA, S.T., CARSTENS, E.B. and DOERFLER, W. (1979). Infection of Spodoptera frugiperda cells with Autographa californica nuclear polyhedrosis virus. II. The viral DNA and the kinetics of its replication. Virology 99, 399-409.
TILAKARATNE, N., HARDIN, S.E. and WEAVER, R.F. (1991). Nucleotide sequence and transcript mapping of the HindIII F region of the Autographa californica nuclear polyhedrosis virus genome. J. Gen. Virol. 72, 285-291.
TOMALSKI, M.D., ELDRIDGE, R. and MILLER, L.K. (1991). A baculovirus homolog of a Cu/Zn superoxide dismutase gene. Virology 184, 149-161.
TOMALSKI, M.D., WU, J. and MILLER, L.K. (1988). The location, sequence, transcription, and regulation of a baculovirus DNA polymerase gene. Virology 167, 591-600.
UBERBACHER, E.C. and MURAL, R.J. (1991). Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl. Acad. Sci. USA 88, 11261-11265.
VAUGHN, J.L., GOODWIN, R.H., TOMPKINS, G.J. and McCAWLEY, P. (1977). The establishment of two cell lines from the insect Spodoptera frugiperda (Lepidoptera: Noctuidae). In Vitro 13, 213-217.
VIEIRA, J. and MESSING, J. (1987). Production of single-stranded plasmid DNA. Meth. Enzy. 153, 3-11.
VLAK, J.M. and SMITH, G.E. (1982). Orientation of the genome of Autographa californica nuclear polyhedrosis virus: a proposal. J. Virol. 41, 1118-1121.
von HEIJNE, G. (1986). A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14, 4683-4690.
WHITFORD, M. and FAULKNER, P. (1992). Nucleotide sequence and transcriptional analysis of a gene encoding gp41, a structural glycoprotein of the baculovirus Autographa californica nuclear polyhedrosis virus. J. Virol. 66, 4763-4768.
WHITT, M.A. and MANNING, J.S. (1988). A phosphorylated 34-kDa protein and a subpopulation of polyhedrin are thiol linked to the carbohydrate layer surrounding a baculovirus occlusion body. Virology 163, 33-42.
WHITFORD, M., STEWART, S., KUZIO, J. and FAULKNER, P. (1989). Identification and sequence analysis of a gene encoding gp67, an abundant envelope glycoprotein of the baculovirus Autographa californica nuclear polyhedrosis virus. J. Virol. 63, 1393-1399.
WILSON, M.E., MAINPRIZE, T.H., FRIESEN, P.D. and MILLER, L.K. (1987). Location, transcription, and sequence of a baculovirus gene encoding a small arginine-rich polypeptide. J. Virol. 61, 661-666.
WU, J. and MILLER, L.K. (1989). Sequence, transcription and translation of a late gene of the Autographa californica nuclear polyhedrosis virus encoding a 34.8 k polypeptide. J. Gen. Virol. 70, 2449-2459.
WuDUNN, D. and SPEAR, P.G. (1989). Initial interaction of herpes simplex virus with cells is binding to heparan sulfate. J. Virol. 63, 52-58.
YANG, C.L., STETLER, D.A. and WEAVER, R.F. (1991). Structural comparison of the Autographa californica nuclear polyhedrosis virus-induced RNA polymerase and the three nuclear RNA polymerases from the host, Spodoptera frugiperda. Virus Res. 20, 251-264.
ZAWEL, L. and REINBERG, D. (1992). Advances in RNA polymerase II transcription. Curr. Opin. Cell Biol. 4, 488-495.
FIGURE LEGENDS
FIG. 1. Physical map and summary of coding strategy of the AcNPV genome. The upper part of each panel represents a map of the sites in the virus genome for the commonly used restriction endonucleases (see text). Also shown are the hrs within the EcoRI map. The middle part of each panel summarizes the coding potential of all six reading frames of the virus DNA (1,2,3,1',2',3'). ORFs are identified as black boxes starting at methionine codons (vertical lines). The selected ORFs (see text) are numbered 1-154, with appropriate designations for the genes which have been characterized previously (see Table 1). Non-selected ORFs (not numbered or named) represent potential genes which overlap with other coding regions (see text). The lower section in each panel summarizes the percent purine or A+T composition for the + strand of the virus genome, using a sliding 250 nucleotide window. Units at the bottom are in base-pairs.
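The sliding-window composition summarized in the lower section of each panel can be illustrated with a minimal sketch. The function name `sliding_composition` and the `step` parameter are assumptions made for illustration; only the 250 nucleotide window and the two quantities (percent purine, percent A+T) come from the legend, and this is not presented as the program actually used to prepare the figure.

```python
def sliding_composition(seq, window=250, step=1):
    """Return (start, pct_purine, pct_at) for each window along seq.

    start is the 1-based position of the first nucleotide in the window.
    Purines are A and G; the A+T value counts A and T.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        purines = chunk.count("A") + chunk.count("G")
        at = chunk.count("A") + chunk.count("T")
        results.append((start + 1,
                        100.0 * purines / window,
                        100.0 * at / window))
    return results
```

For the + strand of a genome held in a string, `sliding_composition(genome)` would give one value per position with the 250 nucleotide window described above; a larger `step` thins the output for plotting.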
FIG. 2. Dot matrix analysis of AcNPV genomic DNA. The genomic sequence of AcNPV + strand was compared to itself (left panel), or to its complementary - strand (right panel), using a 24 nucleotide moving window. The direction of sequence (strandedness) relative to the standard map in each comparison is indicated by the arrows on the x and y axes. Dots represent sites where there is 21 out of 24 or greater nucleotide sequence match (88% identity). As a consequence, matches in the left panel indicate sites of positional identity (diagonal line running lower left to upper right), or direct DNA repeats (dots off the diagonal line). Matches in the right panel indicate regions of inverted repetitive DNA. Dots close to the position where a diagonal line should be in the right panel (i.e., upper left to lower right) represent potential stem-and-loop (hairpin) structures. The columns and rows of dots marking the positions of the repetitive DNA associated with hrs are labelled across the top and on the right-side y-axis. Scales on the x and y axes are in kilobase-pairs.
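The comparison described in this legend can be sketched as follows. This is a naive illustration under stated assumptions: the names `revcomp` and `dot_matrix` are hypothetical, the cost grows with the product of the two sequence lengths (so it is only practical for short test sequences, not a whole genome), and it is not claimed to be the software used for the figure. The defaults (24 nucleotide window, at least 21 matches) are taken from the legend.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (the complementary - strand)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_matrix(seq_x, seq_y, window=24, min_match=21):
    """Return (i, j) pairs (0-based window starts) where the windows
    seq_x[i:i+window] and seq_y[j:j+window] agree at >= min_match positions."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    dots = []
    for i in range(len(seq_x) - window + 1):
        wx = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            wy = seq_y[j:j + window]
            matches = sum(a == b for a, b in zip(wx, wy))
            if matches >= min_match:
                dots.append((i, j))
    return dots
```

Calling `dot_matrix(s, s)` corresponds to the left panel (self comparison, which always yields the main diagonal), and `dot_matrix(s, revcomp(s))` corresponds to the right panel, where dots mark inverted repeats.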
FIG. 3. Circular map of the AcNPV genome. The sites for the EcoRI (outer ring) and HindIII (inner ring) restriction enzymes are presented. The positions of the 154 ORFs described in Fig. 1 are indicated, with arrows representing the direction of transcription for these putative genes. Shaded arrows indicate that the gene is known to be expressed, or has a well characterized homologue in the protein sequence databases. Insertion sites and names of well characterized insertion sequences (IS) and retroposons (RP) are indicated, as are the positions of the hr sequences. The scale on the inner circle is in 100 map units.
Fig. 4. Modification of the AcNPV IAP1 gene. Panel (a): SalGI restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 21828-23359 within pUC118.IAP1. Panel (c): pUC118.IAP1-BglII. Panel (d): pUC118.IAP1.LacZ. Panel (e): Cotransfect pUC118.IAP1.lacZ with infectious AcNPV DNA to produce AcIAP1.lacZ which is replication competent in insect cell culture and larvae.
Fig. 5. Modification of the AcNPV HSP gene. Panel (a): PstI restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 24117-27783 within pUC118.HSP. Panel (c): pUC118.HSP-BglII. Panel (d): pUC118.HSP.LacZ. Panel (e): Cotransfect pUC118.HSP.lacZ with infectious AcNPV DNA to produce AcHSP.lacZ which is replication competent in insect cell culture and larvae.
Fig. 6. Modification of the AcNPV FGF gene. Panel (a): EcoRI and PstI restriction maps for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 26945-27783 within pUC118.FGF. Panel (c): pUC118.FGF-BglII. Panel (d): pUC118.FGF.LacZ. Panel (e): Cotransfect pUC118.FGF.lacZ with infectious AcNPV DNA to produce AcFGF.lacZ which is replication competent in insect cell culture and larvae.
Fig. 7. Modification of the AcNPV IAP2 gene. Panel (a): SacI restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 60448-63194 within pUC118.IAP2. Panel (c): pUC118.IAP2-BglII. Panel (d): pUC118.IAP2.LacZ. Panel (e): Cotransfect pUC118.IAP2.lacZ with infectious AcNPV DNA to produce AcIAP2.lacZ which is replication competent in insect cell culture and larvae.
Fig. 8. Modification of the AcNPV PNK/PNL gene. Panel (a): HindIII restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 71417-83121 within pAT153.PNK/PNL. Panel (c): pAT153.PNK/PNL-BglII. Panel (d): pAT153.PNK/PNL.LacZ. Panel (e): Cotransfect pAT153.PNK/PNL.lacZ with infectious AcNPV DNA to produce AcPNK/PNL.lacZ which is replication competent in insect cell culture and larvae.
Fig. 9. Modification of the AcNPV PK2 gene. Panel (a): PstI restriction maps for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 102148-105164 within pUC118.PK2. Panel (c): pUC118.PK2-BglII. Panel (d): pUC118.PK2.LacZ. Panel (e): Cotransfect pUC118.PK2.lacZ with infectious AcNPV DNA to produce AcPK2.lacZ which is replication competent in insect cell culture and larvae.
Fig. 10. Modification of the AcNPV CHITINASE gene. Panel (a): PstI restriction maps for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 105164-107943 within pUC118.CHIT. Panel (c): pUC118.CHIT-BglII. Panel (d): pUC118.CHIT.LacZ. Panel (e): Cotransfect pUC118.CHIT.lacZ with infectious AcNPV DNA to produce AcCHIT.lacZ which is replication competent in insect cell culture and larvae.
Fig. 11. Modification of the AcNPV CATH gene. Panel (a): PstI restriction maps for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 105164-107943 within pUC119.CHIT/CATH. Panel (c): pUC119.CHIT-/CATH-. Panel (d): Cotransfect pUC119.CHIT-/CATH- with infectious AcCHIT.lacZ digested with Bsu36I to produce AcCHIT-/CATH- which is replication competent in insect cell culture and larvae.
Fig. 12. Modification of the AcNPV GTA gene. Panel (a): EcoRI/Asp718 restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 33403-37088 within pUC118.GTA. Panel (c): pUC118.GTA-BglII. Panel (d): pUC118.GTA.LacZ. Panel (e): Cotransfect pUC118.GTA.lacZ with infectious AcNPV DNA to produce AcGTA.lacZ. This failed to produce a replication competent virus.
Fig. 13. Modification of the AcNPV PCNP gene. Panel (a): PstI restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 102148-105164 within pUC118.PCNP. Panel (c): pUC118.PCNP-BglII. Panel (d): pUC118.PCNP.LacZ. Panel (e): Cotransfect pUC118.PCNP.lacZ with infectious AcNPV DNA to produce AcPCNP.lacZ. This failed to produce a replication competent virus.
Fig. 14. Modification of the AcNPV ALK-EXO gene. Panel (a): HindIII/SmaI restriction map for AcNPV (linearized form). Panel (b): Exploded view of genome coordinates 112044-113913 within pUC118.ALK-EXO. Panel (c): pUC118.ALK-EXO.LacZ. Panel (d): Cotransfect pUC118.ALK-EXO.lacZ with infectious AcNPV DNA. This failed to produce a replication competent virus.
Fig. 15. Single restriction enzyme site (AvrII) within the AcNPV EGT gene. Panel (a): AvrII restriction enzyme map for AcNPV (linearized form). Panel (b): exploded view of genomic coordinates 10000-17248.
Fig. 16. Single restriction enzyme site (FseI) within the AcNPV GTA gene. Panel (a): FseI restriction enzyme map for AcNPV (linearized form). Panel (b): exploded view of genomic coordinates 33403-37088.
TABLE 1. Listing of 154 potentially expressed ORFs in AcNPV strain C6.
The selected ORFs (see text) are numbered sequentially, in their order of appearance in the + strand of the genome (see text and Fig. 1). The left (column Left) and right (column Right) columns define the ends of the ORF irrespective of its encoding strand. The direction of the transcripts (column D) that could express the ORF is indicated by arrows. The number of amino acids encoded by the ORF (column aa) and the predicted molecular mass of the primary translation product (column Mr) from the first ATG are listed (see text). The likelihood that an ORF is translated as predicted from Grail analysis (column G) is scored: e=excellent; g=good; m=medium; and n=null. The transcription column (Trans) indicates if at least one early (e/E), or TATA-like (t/T), or cap (c/C) motif is present in the 160 nucleotides upstream of an ORF (see text and Table 3). Where a TATA-box is positioned 5' to a CAGT in a polII-like promoter orientation, this is indicated by "TC". The presence of a late promoter motif, TAAG (L), within 80 nucleotides upstream of an ATG is also shown in column Trans. ORFs that have an initiation methionine that conforms to Kozak-rules (column K) for higher eukaryotes are indicated (k). ORFs representing potential mini-cistrons initiating upstream of one of the selected ORFs and with an ATG codon that conforms to Kozak-rules are indicated (*, see text). ORFs that initiate at an ATG codon downstream of the first ATG of an ORF and produce a translation product that is smaller than the computer predicted product are marked (*, see text). Representative motifs in putative translation products (Table 3) are indicated in the domains column (Dom). The motifs included signal peptide (S), zinc finger (Z), leucine zipper (L), nuclear translation signal (N), and NTP binding domain (P). The presence of a motif indicates that this feature occurs in at least one copy of the given ORF. Where an ORF has a common name defined by its function or by sequence identity to other products in the protein databases these are indicated (column Name). Alternative names for ORFs that are cited in the literature are given (column Alt Name). The comments column includes differences in genomic organization published for other strains of AcNPV, functional properties of predicted peptide products, or other relevant features. References are listed as a guide to the literature regarding previously published sequences or studies defining AcNPV gene functions.
PALINDROMIC SEQUENCES IN THE HR4 COMPLEX AND ELSEWHERE

Position   Sequence                          hr name
93,456     GCGTTACAAGTAGAATTCTACTGGTAAAGC    hr4a1
93,576     GTTTTACAAGTAGAATTCTACTCGTAAAGC    hr4a2
97,396     GCTTTACGAGTAGAATTCTACTTGTAACGC    hr4b1
97,611     GCTTTACGAGTAGAATTCTACTTGTAAAAC    hr4b2
97,684     GCTTTACGAGTAGAATTCTACTTGTAACGC    hr4b3
97,773     GCTTTACGAGTAGAATTCTACGTGTAAAAC    hr4b4
102,606    GTTTTACGCGTAAAATTCTACTGGTAAAAC    hr4c1

Position   Sequence                                   Nearest ORF
11,471     GTAAATGCGGCCAATA TATTGGCCGTGTTTCC          ORF15-internal
15,820     ATTGTTTTTATTTTT T ATAAAATAATACAAT          3' to ORF19
48,676     CCGTTTTTACAAATGGAA ATGTATTTGTAAAA CGG      ORF61-internal
55,090     AGTTTTACTTT AAAGTAAAACT                    ORF65-internal
68,155     TTGAAATTTTTGATTTC T TAAATCAAAAATTTCAA      ORF83-internal
99,730     TAAACGTGTACA TGTACACGATTA                  ORF115-internal
120,798    AA CGGCGTCGTGTT G GACACGACGCCGGTT          ORF138-internal
132,221    AAAATGAACTTTTTTGTA A TGCAAAAAAGTTGATAGT    ORF152-internal
132,284    ACAGTGTAGACTATTCTA A TAAAATAGTCTACGATTT    5' to ORF152
132,354    AAAGTGAACTTTTTTGCA T TGCAAAAAAATTCATTTT    5' to ORF153
In the upper panel are shown the 30 bp sequences of the hr4 complex of repetitive DNA in the AcNPV clone 6 genomic DNA (see text). In the lower panel are shown certain other palindromic sequences with residues underlined that may contribute to hairpin structures in the single-stranded species. The indicated nucleotide residue is the position of the left-most residue in the genome sequence. The locations of the non-hr palindromes with respect to the specified ORFs are indicated (see text). Of interest are the similarities between the lower three palindromes, in particular those initiating at 132,221 and 132,354 (Krappa and Knebel-Mörsdorf, 1991).
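How closely a listed sequence approaches a perfect palindrome can be checked by comparing it with its own reverse complement. The sketch below is illustrative only: the names `reverse_complement` and `palindrome_mismatches` are hypothetical, and the scoring (a simple count of differing positions, with any spaces in the listed sequence ignored) is an assumption rather than the criterion used to compile the table.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def palindrome_mismatches(seq):
    """Number of positions at which seq differs from its reverse complement.

    0 means a perfect palindrome. Spaces (as used in the table to separate
    the two arms of a palindrome) are removed before comparison.
    """
    seq = seq.upper().replace(" ", "")
    rc = reverse_complement(seq)
    return sum(a != b for a, b in zip(seq, rc))
```

The EcoRI recognition site `GAATTC` at the centre of each 30 bp hr4 sequence is itself a perfect palindrome and scores 0; the full 30 bp entries can be passed to `palindrome_mismatches` in the same way to count their imperfect positions.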
TABLE 3
SEARCH PATTERNS EMPLOYED TO IDENTIFY MOTIFS IN THE ACNPV GENOME AND ITS PUTATIVE GENE PRODUCTS.

                                                       Number of ORFs with matches
Motif             Patterns                             Selected    Non-selected
CGTGC             A(A/T)CGT(G/T); CGTGC                65/154      69/183
TATA box          TATAAA; TATATA; TATAAT               61/154      40/183
Cap site          CAGT                                 72/154      59/183
Pol II promoter   TATA-box X1-160 cap site             21/154      7/183
Late promoter     TAAG                                 71/154      11/183
Kozak consensus   AxxATG(A/G); GxxATGG                 91/154      52/183
Zinc finger       C/H X2-5 C/H X1 L13 C/H X2/5 C/H     31/154
Leucine zipper    (Lxxxxxx)3L                          8/154
NTS               (K/R)2 X10 K/R (3 out of 5)          12/154
NTP binding       GxxGxG X15-20 K; GxGK(S/T)(S/T)      4/154
The motifs, their patterns and the number of the selected and non-selected ORFs (see text) with at least one copy of the indicated motifs are presented. The searches for motifs representing putative early transcription sites involved analyses of DNA sequences 160 nucleotides upstream of the first ATG codon (i.e., CGTGC, TATA box, Cap site and Pol II promoter motifs). For the late promoter transcription motif the search involved 80 nucleotides upstream of the ATG codon. Only the selected ORFs were analysed for motifs in the putative gene products (see text).
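The upstream searches described for Table 3 can be sketched as regular-expression scans over a fixed window preceding the first ATG. The code below is only an illustration under stated assumptions: the function and constant names are hypothetical, only forward-strand ORFs on a linear string are handled, and the regular expressions are a direct transcription of four of the nucleotide patterns listed in the table.

```python
import re

# Window sizes stated in the table legend.
EARLY_WINDOW = 160  # nucleotides upstream of the ATG for early motifs
LATE_WINDOW = 80    # nucleotides upstream of the ATG for the late motif

# Motif name -> (compiled pattern, upstream window). Hypothetical layout.
PATTERNS = {
    "CGTGC": (re.compile(r"A[AT]CGT[GT]|CGTGC"), EARLY_WINDOW),
    "TATA box": (re.compile(r"TATAAA|TATATA|TATAAT"), EARLY_WINDOW),
    "Cap site": (re.compile(r"CAGT"), EARLY_WINDOW),
    "Late promoter": (re.compile(r"TAAG"), LATE_WINDOW),
}


def upstream_motifs(genome: str, atg_index: int) -> dict:
    """Report which motifs occur at least once upstream of an ATG.

    atg_index is the 0-based index of the A of the start codon.
    """
    genome = genome.upper()
    found = {}
    for name, (pattern, window) in PATTERNS.items():
        start = max(0, atg_index - window)
        region = genome[start:atg_index]
        found[name] = pattern.search(region) is not None
    return found
```

Counting the ORFs for which a given entry is true would give one numerator of the kind reported in the "Selected" or "Non-selected" columns. ORFs on the complementary strand would first need their upstream region reverse-complemented, which this sketch omits.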
TABLE 4
NUCLEOTIDE FREQUENCIES ENCOMPASSING THE START CODONS OF 154 ORFS
Position   -6   -5   -4   -3   -2   -1    A    T    G   +4   +5   +6
A          54   35   49  120   88   52  154    0    0   60   64   35
C          17   27   61    2   27   55    0    0    0   15   37   31
G          36   29    9   19    9   15    0    0  154   37   15   38
T          47   63   35   13   30   32    0  154    0   42   38   50
Shown are the occurrences of nucleotides flanking the ATG codons (underlined) of the 154 selected ORFs (see text). The data do not account for genes which are translated from downstream and in-frame ATG codons (see text).
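A tabulation of the kind shown in Table 4 can be sketched as a per-position count of the bases flanking each start codon. The names below are hypothetical, and the numbering convention (A of the ATG taken as +1, so that +4 is the base immediately after the G) is an assumption inferred from the column headings.

```python
from collections import Counter

# Flanking positions reported in the table, relative to the A of ATG (= +1).
OFFSETS = [-6, -5, -4, -3, -2, -1, 4, 5, 6]


def start_context_counts(genome: str, atg_indices) -> dict:
    """Count bases at each flanking position over a set of start codons.

    atg_indices holds 0-based indices of the A of each ATG (forward strand).
    """
    genome = genome.upper()
    counts = {off: Counter() for off in OFFSETS}
    for atg in atg_indices:
        for off in OFFSETS:
            # There is no position 0: -1 directly precedes the A (+1).
            pos = atg + off if off < 0 else atg + off - 1
            if 0 <= pos < len(genome):
                counts[off][genome[pos]] += 1
    return counts
```

Because each of the 154 selected ORFs contributes exactly one base per position, every column of the table should sum to 154, which gives a quick consistency check on the transcribed values.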
TABLE 5
CODON USAGE FOR THE 154 SELECTED ORFS OF ACNPV
aa CDN No. % aa CDN No. %
Ala GCA 402 19 Leu CTA 338 9
Ala GCC 642 30 Leu CTC 293 8
Ala GCG 671 31 Leu CTG 542 14
Ala GCT 420 20 Leu CTT 318 8
Arg AGA 393 21 Leu TTA 872 23
Arg AGG 156 8 Leu TTG 1457 38
Arg CGA 303 16 Lys AAA 2231 75
Arg CGC 562 29 Lys AAG 737 25
Arg CGG 167 9 Met ATG 1134 100
Arg CGT 334 17 Phe TTC 442 22
Asn AAC 1823 56 Phe TTT 1605 78
Asn AAT 1421 44 Pro CCA 300 18
Asp GAC 1407 57 Pro CCC 463 29
Asp GAT 1070 43 Pro CCG 515 32
Cys TGC 539 52 Pro CCT 344 21
Cys TGT 500 48 Ser AGC 654 25
End TAA 117 76 Ser AGT 410 16
End TAG 17 11 Ser TCA 282 11
End TGA 20 13 Ser TCC 276 11
Gln CAA 1093 70 Ser TCG 573 22
Gln CAG 475 30 Ser TCT 407 16
Glu GAA 1545 70 Thr ACA 517 22
Glu GAG 647 30 Thr ACC 535 23
Gly GGA 272 20 Thr ACG 770 33
Gly GGC 667 49 Thr ACT 491 21
Gly GGG 115 9 Trp TGG 317 100
Gly GGT 295 22 Tyr TAC 1105 55
His CAC 520 56 Tyr TAT 921 45
His CAT 412 44 Val GTA 508 18
Ile ATA 822 30 Val GTC 4S2 18
Ile ATC 590 22 Val GTG 1083 39
Ile ATT 1286 48 Val GTT 678 25
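The counts and percentages in Table 5 correspond to a codon tally in which each percentage is taken within the set of codons for one amino acid. A minimal sketch follows; the function name is hypothetical, the standard genetic code is assumed, and the input is assumed to be in-frame coding sequences including their stop codons.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(orfs) -> dict:
    """Return {codon: (count, percent within its amino acid)} for the ORFs."""
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = orf[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {
        codon: (n, round(100 * n / totals[CODON_TABLE[codon]]))
        for codon, n in counts.items()
    }
```

The same arithmetic applied to the Lys row of the table (AAA 2231, AAG 737) gives 2231/2968, or about 75%, and 737/2968, or about 25%, in agreement with the tabulated percentages.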
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: Natural Environment Research Corporation
(B) STREET: Polaris House, North Star Avenue
(C) CITY: Swindon
(D) STATE:
(E) COUNTRY: UK
(F) POSTAL CODE (ZIP): SN2 1EU
(ii) TITLE OF INVENTION: AUTOGRAPHA CALIFORNICA NUCLEAR POLYHEDROSIS VIRUS DNA SEQUENCE
(iii) NUMBER OF SEQUENCES: 1
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: Word Perfect 5.1
(v) CURRENT APPLICATION DATA:
APPLICATION NUMBER: GB unknown
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 133894 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
Length: 133894 August 17, 1993 15:22 Type: N Check: 643
1 GAATTCTACC CGTAAAGCGA GTTTAGTTTT GAAAAACAAA TGACATCATT
51 TGTATAATGA CATCATCCCC TGATTGTGTT TTACAAGTAG AATTCTATCC
101 GTAAAGCGAG TTCAGTTTTG AAAACAAATG AGTCATACCT AAACACGTTA
151 ATAATCTTCT GATATCAGCT TATGACTCAA GTTATGAGCC GTGTGCAAAA
201 CATGAGATAA GTTTATGACA TCATCCACTG ATCGTGCGTT ACAAGTAGAA
251 TTCTACTCGT AAAGCCAGTT CGGTTATGAG CCGTGTGCAA AACATGACAT
301 CAGCTTATGA CTCATACTTG ATTGTGTTTT ACGCGTAGAA TTCTACTCGT
351 AAAGCGAGTT CGGTTATGAG CCGTGTGCAA AACATGACAT CAGCTTATGA
401 GTCATAATTA ATCGTGCGTT ACAAGTAGAA TTCTACTCGT AAAGCGAGTT
451 GAAGGATCAT ATTTAGTTGC GTTTATGAGA TAAGATTGAA AGCACGTGTA
501 AAATGTTTCC CGCGCGTTGG CACAACTATT TACAATGCGG CCAAGTTATA
551 AAAGATTCTA ATCTGATATG TTTTAAAACA CCTTTGCGGC CCGAGTTGTT
601 TGCGTACGTG ACTAGCGAAG AAGATGTGTG GACCGCAGAA CAGATAGTAA
651 AACAAAACCC TAGTATTGGA GCAATAATCG ATTTAACCAA CACGTCTAAA
701 TATTATGATG GTGTGCATTT TTTGCGGGCG GGCCTGTTAT ACAAAAAAAT
751 TCAAGTACCT GGCCAGACTT TGCCGCCTGA AAGCATAGTT CAAGAATTTA
801 TTGACACGGT AAAAGAATTT ACAGAAAAGT GTCCCGGCAT GTTGGTGGGC
851 GTGCACTGCA CACACGGTAT TAATCGCACC GGTTACATGG TGTGCAGATA
901 TTTAATGCAC ACCCTGGGTA TTGCGCCGCA GGAAGCCATA GATAGATTCG
951 AAAAAGCCAG AGGTCACAAA ATTGAAAGAC AAAATTACGT TCAAGATTTA
1001 TTAATTTAAT TAATATTATT TGCATTCTTT AACAAATACT TTATCCTATT
1051 TTCAAATTGT TGCGCTTCTT CCAGCGAACC AAAACTATGC TTCGCTTGCT
1101 CCGTTTAGCT TGTAGCCGAT CAGTGGCGTT GTTCCAATCG ACGGTAGGAT
1151 TAGGCCGGAT ATTCTCCACC ACAATGTTGG CAACGTTGAT GTTACGTTTA
1201 TGCTTTTGGT TTTCCACGTA CGTCTTTTGG CCGGTAATAG CCGTAAACGT
1251 AGTGCCGTCG CGCGTCACGC ACAACACCGG ATGTTTGCGC TTGTCCGCGG
1301 GGTATTGAAC CGCGCGATCC GACAAATCCA CCACTTTGGC AACTAAATCG
1351 GTGACCTGCG CGTCTTTTTT CTGCATTATT TCGTCTTTCT TTTGCATGGT
1401 TTCCTGGAAG CCGGTGTACA TGCGGTTTAG ATCAGTCATG ACGCGCGTGA
1451 CCTGCAAATC TTTGGCCTCG ATCTGCTTGT CCTTGATGGC AACGATGCGT
1501 TCAATAAACT CTTGTTTTTT AACAAGTTCC TCGGTTTTTT GCGCCACCAC
1551 CGCTTGCAGC GCGTTTGTGT GCTCGGTGAA TGTCGCAATC AGCTTAGTCA
1601 CCAACTGTTT GCTCTCCTCC TCCCGTTGTT TGATCGCGGG ATCGTACTTG
1651 CCGGTGCAGA GCACTTGAGG AATTACTTCT TCTAAAAGCC ATTCTTGTAA
1701 TTCTATGGCG TAAGGCAATT TGGACTTCAT AATCAGCTGA ATCACGCCGG
1751 ATTTAGTAAT GAGCACTGTA TGCGGCTGCA AATACAGCGG GTCGCCCCTT
1801 TTCACGACGC TGTTAGAGGT AGGGCCCCCA TTTTGGATGG TCTGCTCAAA
1851 TAACGATTTG TATTTATTGT CTACATGAAC ACGTATAGCT TTATCACAAA
1901 CTGTATATTT TAAACTGTTA GCGACGTCCT TGGCCACGAA CCGGACCTGT
1951 TGGTCGCGCT CTAGCACGTA CCGCAGGTTG AACGTATCTT CTCCAAATTT
2001 AAATTCTCCA ATTTTAACGC GAGCCATTTT GATACACGTG TGTCGATTTT
2051 GCAACAACTA TTGTTTTTTA ACGCAAACTA AACTTATTGT GGTAAGCAAT
2101 AATTAAATAT GGGGGAACAT GCGCCGCTAC AACACTCGTC GTTATGAACG
2151 CAGACGGCGC CGGTCTCGGC GCAAGCGGCT AAAACGTGTT GCGCGTTCAA
2201 CGCGGCAAAC ATCGCAAAAG CCAATAGTAC AGTTTTGATT TGCATATTAA
2251 CGGCGATTTT TTAAATTATC TTATTTAATA AATAGTTATG ACGCCTACAA
2301 CTCCCCGCCC GCGTTGACTC GCTGCACCTC GAGCAGTTCG TTGACGCCTT
2351 CCTCCGTGTG GCCGAACACG TCGAGCGGGT GGTCGATGAC CAGCGGCGTG
2401 CCGCACGCGA CGCACAAGTA TCTGTACACC GAATGATCGT CGGGCGAAGG
2451 CACGTCGGCC TCCAAGTGGC AATATTGGCA AATTCGAAAA TATATACAGT
2501 TGGGTTGTTT GCGCATATCT ATCGTGGCGT TGGGCATGTA CGTCCGAACG
2551 TTGATTTGCA TGCAAGCCGA AATTAAATCA TTGCGATTAG TGCGATTAAA
2601 ACGTTGTACA TCCTCGCTTT TAATCATGCC GTCGATTAAA TCGCGCAATC
2651 GAGTCAAGTG ATCAAAGTGT GGAATAATGT TTTCTTTGTA TTCCCGAGTC
2701 AAGCGCAGCG CGTATTTTAA CAAACTAGCC ATCTTGTAAG TTAGTTTCAT
2751 TTAATGCAAC TTTATCCAAT AATATATTAT GTATCGCACG TCAAGAATTA
2801 ACAATGCGCC CGTTGTCGCA TCTCAACACG ACTATGATAG AGATCAAATA
2851 AAGCGCGAAT TAAATAGCTT GCGACGCAAC GTGCACGATC TGTGCACGCG
2901 TTCCGGCACG AGCTTTGATT GTAATAAGTT TTTACGAAGC GATGACATGA
2951 CCCCCGTAGT GACAACGATC ACGCCCAAAA GAACTGCCGA CTACAAAATT
3001 ACCGAGTATG TCGGTGACGT TAAAACTATT AAGCCATCCA ATCGACCGTT
3051 AGTCGAATCA GGACCGCTGG TGCGAGAAGC CGCGAAGTAT GGCGAATGCA
3101 TCGTATAACG TGTGGAGTCC GCTCATTAGA GCGTCATGTT TAGACAAGAA
3151 AGCTACATAT TTAATTGATC CCGATGATTT TATTGATAAA TTGACCCTAA
3201 CTCCATACAC GGTATTCTAC AATGGCGGGG TTTTGGTCAA AATTTCCGGA
3251 CTGCGATTGT ACATGCTGTT AACGGCTCCG CCCACTATTA ATGAAATTAA
3301 AAATTCCAAT TTTAAAAAAC GCAGCAAGAG AAACATTTGT ATGAAAGAAT
3351 GCGTAGAAGG AAAGAAAAAT GTCGTCGACA TGCTGAACAA CAAGATTAAT
3401 ATGCCTCCGT GTATAAAAAA AATATTGAAC GATTTGAAAG AAAACAATGT
3451 ACCGCGCGGC GGTATGTACA GGAAGAGGTT TATACTAAAC TGTTACATTG
3501 CAAACGTGGT TTCGTGTGCC AAGTGTGAAA ACCGATGTTT AATCAAGGCT
3551 CTGACGCATT TCTACAACCA CGACTCCAAG TGTGTGGGTG AAGTCATGCA
3601 TCTTTTAATC AAATCCCAAG ATGTGTATAA ACCACCAAAC TGCCAAAAAA
3651 TGAAAACTGT CGACAAGCTC TGTCCGTTTG CTGGCAACTG CAAGGGTCTC
3701 AATCCTATTT GTAATTATTG AATAATAAAA CAATTATAAA TGCTAAATTT
3751 GTTTTTTATT AACGATACAA ACCAAACGCA ACAAGAACAT TTGTAGTATT
3801 ATCTATAATT GAAAACGCGT AGTTATAATC GCTGAGGTAA TATTTAAAAT
3851 CATTTTCAAA TGATTCACAG TTAATTTGCG ACAATATAAT TTTATTTTCA
3901 CATAAACTAG ACGCCTTGTC GTCTTCTTCT TCGTATTCCT TCTCTTTTTC
3951 ATTTTTCTCC TCATAAAAAT TAACATAGTT ATTATCGTAT CCATATATGT
4001 ATCTATCGTA TAGAGTAAAT TTTTTGTTGT CATAAATATA TATGTCTTTT
4051 TTAATGGGGT GTATAGTACC GCTGCGCATA GTTTTTCTGT AATTTACAAC
4101 AGTGCTATTT TCTGGTAGTT CTTCGGAGTG TGTTGCTTTA ATTATTAAAT
4151 TTATATAATC AATGAATTTG GGATCGTCGG TTTTGTACAA TATGTTGCCG
4201 GCATAGTACG CAGCTTCTTC TAGTTCAATT ACACCATTTT TTAGCAGCAC
4251 CGGATTAACA TAACTTTCCA AAATGTTGTA CGAACCGTTA AACAAAAACA
4301 GTTCACCTCC CTTTTCTATA CTATTGTCTG CGAGCAGTTG TTTGTTGTTA
4351 AAAATAACAG CCATTGTAAT GAGACGCACA AACTAATATC ACAAACTGGA
4401 AATGTCTATC AATATATAGT TGCTGATATC ATGGAGATAA TTAAAATGAT
4451 AACCATCTCG CAAATAAATA AGTATTTTAC TGTTTTCGTA ACAGTTTTGT
4501 AATAAAAAAA CCTATAAATA TGCCGGATTA TTCATACCGT CCCACCATCG
4551 GGCGTACCTA CGTGTACGAC AACAAGTACT ACAAAAATTT AGGTGCCGTT
4601 ATCAAGAACG CTAAGCGCAA GAAGCACTTC GCCGAACATG AGATCGAAGA
4651 GGCTACCCTC GACCCCCTAG ACAACTACCT AGTGGCTGAG GATCCTTTCC
4701 TGGGACCCGG CAAGAACCAA AAACTCACTC TCTTCAAGGA AATCCGTAAT
4751 GTTAAACCCG ACACGATGAA GCTTGTCGTT GGATGGAAAG GAAAAGAGTT
4801 CTACAGGGAA ACTTGGACCC GCTTCATGGA AGACAGCTTC CCCATTGTTA
4851 ACGACCAAGA AGTGATGGAT GTTTTCCTTG TTGTCAACAT GCGTCCCACT
4901 AGACCCAACC GTTGTTACAA ATTCCTGGCC CAACACGCTC TGCGTTGCGA
4951 CCCCGACTAT GTACCTCATG ACGTGATTAG GATCGTCGAG CCTTCATGGG
5001 TGGGCAGCAA CAACGAGTAC CGCATCAGCC TGGCTAAGAA GGGCGGCGGC
5051 TGCCCAATAA TGAACCTTCA CTCTGAGTAC ACCAACTCGT TCGAACAGTT
5101 CATCGATCGT GTCATCTGGG AGAACTTCTA CAAGCCCATC GTTTACATCG
5151 GTACCGACTC TGCTGAAGAG GAGGAAATTC TCCTTGAAGT TTCCCTGGTG
5201 TTCAAAGTAA AGGAGTTTGC ACCAGACGCA CCTCTGTTCA CTGGTCCGGC
5251 GTATTAAAAC ACGATACATT GTTATTAGTA CATTTATTAA GCGCTAGATT
5301 CTGTGCGTTG TTGATTTACA GACAATTGTT GTACGTATTT TAATAATTCA
5351 TTAAATTTAT AATCTTTAGG GTGGTATGTT AGAGCGAAAA TCAAATGATT
5401 TTCAGCGTCT TTATATCTGA ATTTAAATAT TAAATCCTCA ATAGATTTGT
5451 AAAATAGGTT TCGATTAGTT TCAAACAAGG GTTGTTTTTC CGAACCGATG
5501 GCTGGACTAT CTAATGGATT TTCGCTCAAC GCCACAAAAC TTGCCAAATC
5551 TTGTAGCAGC AATCTAGCTT TGTCGATATT CGTTTGTGTT TTGTTTTGTA
5601 ATAAAGGTTC GACGTCGTTC AAAATATTAT GCGCTTTTGT ATTTCTTTCA
5651 TCACTGTCGT TAGTGTACAA TTGACTCGAC GTAAACACGT TAAATAAAGC
5701 TTGGACATAT TTAACATCGG GCGTGTTAGC TTTATTAGGC CGATTATCGT
5751 CGTCGTCCCA ACCCTCGTCG TTAGAAGTTG CTTCCGAAGA CGATTTTGCC
5801 ATAGCCACAC GACGCCTATT AATTGTGTCG GCTAACACGT CCGCGATCAA
5851 ATTTGTAGTT GAGCTTTTTG GAATTATTTC TGATTGCGGG CGTTTTTGGG
5901 CGGGTTTCAA TCTAACTGTG CCCGATTTTA ATTCAGACAA CACGTTAGAA
5951 AGCGATGGTG CAGGCGGTGG TAACATTTCA GACGGCAAAT CTACTAATGG
6001 CGGCGGTGGT GGAGCTGATG ATAAATCTAC CATCGGTGGA GGCGCAGGCG
6051 GGGCTGGCGG CGGAGGCGGA GGCGGAGGTG GTGGCGGTGA TGCAGACGGC
6101 GGTTTAGGCT CAAATGTCTC TTTAGGCAAC ACAGTCGGCA CCTCAACTAT
6151 TGTACTGGTT TCGGGCGCCG TTTTTGGTTT GACCGGTCTG AGACGAGTGC
6201 GATTTTTTTC GTTTCTAATA GCTTCCAACA ATTGTTGTCT GTCGTCTAAA
6251 GGTGCAGCGG GTTGAGGTTC CGTCGGCATT GGTGGAGCGG GCGGCAATTC
6301 AGACATCGAT GGTGGTGGTG GTGGTGGAGG CGCTGGAATG TTAGGCACGG
6351 GAGAAGGTGG TGGCGGCGGT GCCGCCGGTA TAATTTGTTC TGGTTTAGTT
6401 TGTTCGCGCA CGATTGTGGG CACCGGCGCA GGCGCCGCTG GCTGCACAAC
6451 GGAAGGTCGT CTGCTTCGAG GCAGCGCTTG GGGTGGTGGC AATTCAATAT
6501 TATAATTGGA ATACAAATCG TAAAAATCTG CTATAAGCAT TGTAATTTCG
6551 CTATCGTTTA CCGTGCCGAT ATTTAACAAC CGCTCAATGT AAGCAATTGT
6601 ATTGTAAAGA GATTGTCTCA AGCTCGGATC CCGCACGCCG ATAACAAGCC
6651 TTTTCATTTT TACTACAGCA TTGTAGTGGC GAGACACTTC GCTGTCGTCG
6701 ACGTACATGT ATGCTTTGTT GTCAAAAACG TCGTTGGCAA GCTTTAAAAT
6751 ATTTAAAAGA ACATCTCTGT TCAGCACCAC TGTGTTGTCG TAAATGTTGT
6801 TTTTGATAAT TTGCGCTTCC GCAGTATCGA CACGTTCAAA AAATTGATGC
6851 GCATCAATTT TGTTGTTCCT ATTATTGAAT AAATAAGATT GTACAGATTC
6901 ATATCTACGA TTCGTCATGG CCACCACAAA TGCTACGCTG CAAACGCTGG
6951 TACAATTTTA CGAAAACTGC AAAAACGTCA AAACTCGGTA TAAAATAATC
7001 AACGGGCGCT TTGGCAAAAT ATCTATTTTA TCGCACAAGC CCACTAGCAA
7051 ATTGTATTTG CAGAAAACAA TTTCGGCGCA CAATTTTAAC GCTGACGAAA
7101 TAAAAGTTCA CCAGTTAATG AGCGACCACC CAAATTTTAT AAAAATCTAT
7151 TTTAATCACG GTTCCATCAA CAACCAAGTG ATCGTGATGG ACTACATTGA
7201 CTGTCCCGAT TTATTTGAAA CACTACAAAT TAAAGGCGAG CTTTCGTACC
7251 AACTTGTTAG CAATATTATT AGACAGCTGT GTGAAGCGCT CAACGATTTG
7301 CACAAGCACA ATTTCATACA CAACGACATA AAACTCGAAA ATGTCTTATA
7351 TTTCGAAGCA CTTGATCGCG TGTATGTTTG CGATTACGGA TTGTGCAAAC
7401 ACGAAAACTC ACTTAGCGTG CACGACGGCA CGTTGGAGTA TTTTAGTCCG
7451 GAAAAAATTC GACACACAAC TATGCACGTT TCGTTTGACT GGTACGCCGT
7501 CGGCGTGTTA ACATACAAGT TGCTAACCGG CGGCCGACAC CCATTTGAAA
7551 AAAGCGAAGA CGAAATGTTG GACTTGAATA GCATGAAGCG TCGTCAGCAA
7601 TACAATGACA TTGGCGTTTT AAAACACGTT CGTAACGTTA ACGCTCGTGA
7651 CTTTGTGTAC TGCCTAACAA GATACAACAT AGATTGTAGA CTCACAAATT
7701 ACAAACAAAT TATAAAACAT GAGTTTTTGT CGTAAAAATG CCACTTGTTT
7751 TACGAGTAGA ATTCTACGTG TAACACACGA TCTAAAAGAT GATGTCATTT
7801 TTTATCAATG ACTCATTTGT TTTAAAACAG ACTTGTTTTA CGAGTAGAAT
7851 TCTACGTGTA AAGCATGATC GTGAGTGGTG TTAATAAAAT CATAAAAATT
7901 ATTGTAAATG TTTATTATTT AAAAACGATT CAAATATATA ATAAAAACAA
7951 TCTACATCTA TTTCTTCACA ATCCATAACA CACAACAGGT CCATCAATGA
8001 GTTTTTGTCT TTATCCGACA TACTATGTGC ATGTAACAAA TCAAATACAT
8051 CTTTTAAATT TTTATACACA TCTTTACATT GTCTACCAAA ATCTTTAATA
8101 ACCCTATAAC AAGGAAAAGA CTTTTCTTCT TGCGTGGTTT TGCCGCGCAG
8151 ATATTGAAAT AAAATGTGCA TGCACGACAA CTTGTGTTTA CTAAAATGCT
8201 CCTTGCCTAT ACCGCAAAAC CGGCCATACA TTTCGGCGAT TACACGCGGA
8251 CAATTGTACG ATTCGTCTAC GTGTAAACGA TCATCATAAT CACTCTTGCG
8301 CAAACGAATA AATTTTTTCA CCGCTTCCGA CAAACGAGGC ACCAATTCGG
8351 CGGGCACGCT TCGATACATT ATTCTGTGCA CATAAGTTAC CACACAAAAT
8401 TTATTGTACC ACCATCCGAC AACGTCGTTA TTAGGGTTGA ACACGTTGGC
8451 GATGCGCAGC AGTTTCCCGT TTCTCATGAA ATATTCAAAG CGGCCCAAAA
8501 TAATTTGCAA GCAATCCAAC ATGTCTTGAG AAATTTCTCG TTCAAAATTG
8551 TTCAAAGAGA ATATCTGCCA TCCGTTTTGA ACGCGCACGC TGACGGGAAC
8601 CACCGCATCG ATTTGCTCCA ACACTTCACG GACGTTATCG TCGATGCCCA
8651 TCGTTTCGCT GGTGCTGAAC CAATGGGAAA GGCTCTTGAT GGAATCGCCC
8701 GCGTCTATCA TCTTGACCGC TTCGTCAAAG GTGCAACTGC CGCTCTTCAA
8751 ACGCCGCATA GCGGTCACGT CCCGCTCTAT GCACGACATA CCGTTTACGT
8801 ACGATTCTGA TAGGTATTCC TGAACTATAC GGTAATGGTG ATACGACTCG
8851 CCATACACGT CGTGCACCTC ATTGTATTTA GCATAATAAT TGTAAATTAT
8901 TAACTTTGCA GCGAGAGACA TGTTGTCAGT AAAGCGGTGC TAGGCTCAAT
8951 AATACTGATG TACAGGCACG CGTGCTATTT ATATATAATT TCGCAAGGAG
9001 GGGAGCTGTT ATCGGTTGCT ATTATTAAAG AATGGCCGTC TGTTTTTATC
9051 ACAAGCTTGG CAGCCTCAAC CATGAAGCGT CGTCATTGTA AATTAAATTC
9101 TCTGCCTCAA GAATTATTTG ACAAGATTGT CGAGTATTTA TCTTTATCTG
9151 ATTACTGCAA TTTGGTGCTT GTCTGTAAAA GACCTTCTAG TAAATATAAC
9201 GTGATATTTG ATAGTACTAA TCACCAACAT TTGAAAGGCG TGTACAAAAA
9251 GACAGACGTG CAAATAACAA GCTACAACGA ATACATCAAC TGTATTTGCA
9301 ACGAACTGAG ACAAGACGAA TTCTATGCCA AATCATCATG GATTGCGAGT
9351 ATTTGCGGTC ACCAGAGAGC GACAATTTTT AGTGTAACAA ATAAACAAGT
9401 AGAAATGAAA TATCATTTGT ATAATATAGC AATTGTGGAA AGTGAAGATT
9451 GCAACGGATT TTACCCATTT GAGCCAACGC GCGATTGTTT AATATGCAAA
9501 CAAAAAAACC AATGTCCTCG TAATTCATTT ATTGTTTCGT TGTGTAAATA
9551 TTTAGAAAAA CAAAATGTAC AATCAAACTT TATATATTAT TTATACGAAA
9601 TAAATACATA ATAATAACTA TTATACATGT TTTTATTTTA CAATACTTCC
9651 TGTATAACCT CTCTAACTAC ATTAGGAGTA CAATCCACGT CAATTACACG
9701 TTTAGCTATT TTTCTAATTT TGTAATGTTT ATCGTAGAGT TTTTCGTTAA
9751 TACATTGAAT AGCCAACAAG GGATTTGGGT GCACACCGTC ATAGAGTACT
9801 TCCATGTCGT CTTCAAAGCG CATTTTTCGC TTGCGAAAAT GCCGCTCTTG
9851 GCCCAAAACA AAAGCGAGTT TGATGCGGTC GTCGATGCGT TCCGAAAATA
9901 CGGCCAAATG CTGGTGTTTG GTGATGTCGC GCGGAAACGT CACCGTGCCA
9951 TTTTTGCTTT CCGCCACGAC GGCGGTTTTC AATTTTTCGG CCGACTGCAG
10001 CATGTTAAGT TTGGCGTCGA GTTCGTGCAA ACGCAATTCA AACTGCTCAA
10051 ACCTGTTGCC CACCTCGTTC TTGAACGTCT CGTGGGTGAC CATAAATTTT
10101 TCGCTGTTTG CATTCAGTTT CTTTACATGT TTTAAAACAG ATTCAATCTT
10151 GTCGCGCAAA TCATCACGCT CGCCTTCAGT TTGAATGTGC AGCAACGCGT
10201 TGCTTTTGTT GGCAAAATTT AACCGCATCA AAATTTCCAA CAACCCGTGC
10251 TTGGTCGCGA ACAATGCGCC CAACGAGTTG AGATCGCGTT TGGATCTCTG
10301 TTTGTGAAAA ACAATTTCGT TTAAATGGTA AACTTGATCG CCGTCCCAAT
10351 TGCAATCAAG TATGTCGTCG TGCGCAATTT CAAGACCTTT GCAAAAATCT
10401 ATCACATTGT AGCATTTTGC GTTCGTGTCG CTGTGCACGT ATCTGTACTT
10451 GAAACTGTGC GTGTTGCATT TGAATGAGTC CCATTTAACG ATGTGCGACC
10501 ATTGTTGGGC GTTTATGTGG TACTTTTTGT AGTCGTCTGC ATTGAACCGA
10551 TCTTCGGCGG CGATGGCGTC GTTGTCGTTG TCACCGGACC ACATCCACCA
10601 GTTCCATAAC CAGGATAGCA TTGCTTTAGC TTGTCTAGCA ATTCCTTTGT
10651 TATACAACGA GAAAATTTCG TTCCCTTATA ATTATAGCTG TACGGTGCGC
10701 GTATTTGTTT GTTAACGTTA CAAAAAATAT CCCTGTCCAC GTCCGGCCAA
10751 TACTGCAACG TGAGCGCGTC CAAGTTTGAA TCTTGCATAT GCGGAACGTA
10801 CAAACGTACG GCCTCTCTCA CACAATGCGC AAAACTGCCC GGCTGAATGT
10851 AATCACTGTC CAACTTTGCA GGTTTCTCGA AAGCCTTGTA CCGATGCACG
10901 CGAACATTTT GAGCGGACGT GATTTTAAAC TTGTCGGTGA ATTTTAACCA
10951 CAAATGAAAT CCACGGTTGC CGGTATACAT GACTCTTGAC ACGTTCTCTT
11001 CCGTGTAAAA CAACAGAAAC GCCGTGGCGC CAATGTAAAT TTTCAGCATT
11051 AAATCGTGTT CGTCAACATA ATTTTTGTAA TCGGCGTCTA CGACCCATTC
11101 CCTGCCGCCG CCGTCGTCCA ACGGTTTGAC GTGCACGTCG GACACTTTGT
11151 TTTGCACAAT ATAACTATAC AATTGTGCGG AGGTATCAAA ATATCTGTCG
11201 GCGTGAATCC AGCGCGCGTT GACCGTCATG AACGCGTACT TGCGGCTGTC
11251 GTTGTACGCA ATGGCGTCCC ACATCATGTC GACGCGCTTC TGCGTATAAT
11301 TGCACACTAA CATGTTGCCC TTTGAACTTG ACCTCGATTG TGTTAATTTT
11351 TGGCTATAAA AAGGTCACCC TTTAAAATTT GTTACATAAT CAAATTACCA
11401 GTACAGTTAT TCGGTTTGAA GCAAAATGAC TATTCTCTGC TGGCTTGCAC
11451 TGCTGTCTAC GCTTACTGCT GTAAATGCGG CCAATATATT GGCCGTGTTT
11501 CCTACGCCAG CTTACAGCCA CCATATAGTG TACAAAGTGT ATATTGAAGC
11551 CCTTGCCGAA AAATGTCACA ACGTTACGGT CGTCAAGCCC AAACTGTTTG
11601 CGTATTCAAC TAAAACTTAT TGCGGTAATA TCACGGAAAT TAATGCCGAC
11651 ATGTCTGTTG AGCAATACAA AAAACTAGTG GCGAATTCGG CAATGTTTAG
11701 AAAGCGCGGA GTGGTGTCCG ATACAGACAC GGTAACCGCC GCTAACTACC
11751 TAGGCTTGAT TGAAATGTTC AAAGACCAGT TTGACAATAT CAACGTGCGC
11801 AATCTCATTG CCAACAACCA GACGTTTGAT TTAGTCGTCG TGGAAGCGTT
11851 TGCCGATTAT GCGTTGGTGT TTGGTCACTT GTACGATCCG GCGCCCGTAA
11901 TTCAAATCGC GCCTGGCTAC GGTTTGGCGG AAAACTTTGA CACGGTCGGC
11951 GCCGTGGCGC GGCACCCCGT CCACCATCCT AACATTTGGC GCAGCAATTT
12001 CGACGACACG GAGGCAAACG TGATGACGGA AATGCGTTTG TATAAAGAAT
12051 TTAAAATTTT GGCCAACATG TCCAACGCGT TGCTCAAACA ACAGTTTGGA
12101 CCCAACACAC CGACAATTGA AAAACTACGC AACAAGGTGC AATTGCTTTT
12151 GCTAAACCTG CATCCCATAT TTGACAACAA CCGACCCGTG CCGCCCAGCG
12201 TGCAGTATCT TGGCGGAGGA ATCCATCTTG TAAAGAGCGC GCCGTTGACC
12251 AAATTAAGTC CGGTCATCAA CGCGCAAATG AACAAGTCAA AAAGCGGAAC
12301 GATTTACGTA AGTTTTGGGT CGAGCATTGA CACCAAATCG TTTGCAAACG
12351 AGTTTCTTTA CATGTTAATC AATACGTTCA AAACGTTGGA TAATTACACC
12401 ATATTATGGA AAATTGACGA CGAAGTAGTA AAAAACATAA CGTTGCCCGC
12451 CAACGTAATC ACGCAAAATT GGTTTAATCA ACGCGCCGTG CTGCGTCATA
12501 AAAAAATGGC GGCGTTTATT ACGCAAGGCG GACTACAATC GAGCGACGAG
12551 GCCTTGGAAG CCGGGATACC CATGGTGTGT CTGCCCATGA TGGGCGACCA
12601 GTTTTACCAT GCGCACAAAT TACAGCAACT CGGCGTAGCC CGCGCCTTGG
12651 ACACTGTTAC CGTTTCCAGC GATCAACTAC TAGTGGCGAT AAACGACGTG
12701 TTGTTTAACG CGCCTACCTA CAAAAAACAC ATGGCCGAGT TATATGCGCT
12751 CATCAATCAT GATAAAGCAA CGTTTCCGCC TCTAGATAAA GCCATCAAAT
12801 TCACAGAACG CGTAATTCGA TATAGACATG ACATCAGTCG TCAATTGTAT
12851 TCATTAAAAA CAACAGCTGC CAATGTACCG TATTCAAATT ACTACATGTA
12901 TAAATCTGTG TTTTCTATTG TAATGAATCA CTTAACACAC TTTTAATTAC
12951 GTCAATAAAT GTTATTCACC ATTATTTACC TGGTTTTTTT GAGAGGGGCT
13001 TTGTGCGACT GCGCACTTCC AGCCTTTATA AACGCTCACC AACCAAAGCA
13051 GGTCATTATT GTGCCAGGAC GTTCAAAGGC GAAACATCGA AATGGAGTCT
13101 GTTCAAACGC GCTTATGTGC CAGTAGCAAT CAATTTGCTC CGTTCAAAAA
13151 GCGCCAGCTT GCCGTGCCGG TCGGTTCTGT GAACAGTTTG ACACACACCA
13201 TCACCTCCAC CACCGTCACC AGCGTGATTC CAAAAAATTA TCAAGAAAAA
13251 CGTCAGAAAA TATGCCACAT AATATCTTCG TTGCGTAACA CGCACTTGAA
13301 TTTCAATAAG ATACAGTCTG TACATAAAAA GAAACTGCGG CATTTGCAAA
13351 ATTTGCTAAG AAAAAAGAAC GAAATTATTG CCGAGTTGGT TAGAAAACTT
13401 GAAAGTGCAC AGAAGAAGAC AACGCACAGA AATATTAGTA AACCAGCTCA
13451 TTGGAAATAC TTTGGAGTAG TCAGATGTGA CAACACAATT CGCACAATTA
13501 TTGGCAACGA AAAGTTTGTA AGGAGACGTT TGGCCGAGCT GTGCACATTG
13551 TACAACGCCG AGTACGTGTT TTGCCAAGCA CGCGCCGATG GAGACAAAGA
13601 TCGACAGGCA CTAGCGAGTC TGCTGACGGC GGCGTTTGGT TCGCGAGTCA
13651 TAGTTTATGA AAATAGTCGC CGGTTCGAGT TTATAAATCC GGACGAGATT
13701 GCTAGTGGTA AACGTTTAAT AATTAAACAT TTGCAAGATG AATCTCAAAG
13751 TGATATTAAC GCCTATTAAT TTGAAAGGTG AGGAAGAGCC CAATTGCGTT
13801 GAGCGCATTA CCATAATGCC ATGTATTTTA ATAGATACTG AGATCTGTTT
13851 AAATGTCAGA TGCCGTTCTC CTTTTGCCAA ATTCAAAGTA TTGATTATTG
13901 TAGATGGCTT TGATAGCGCT TATATTCAGG CTACCTTTTG TAGCATTAGC
13951 GATAGTGTAA CAATTGTTAA CAAATCTAAC GAAAAGCATG TAACGTTTGA
14001 CGGGTTTGTA AGGCCGGACG ATGAAGGTAC AACAATGCCT TATGTCATTG
14051 GACCATTATA TTCTGTCGAC GCTGCTGTCG CCGACCGTAA AGTGAAGGAC
14101 GTGGTGGATT CAATTCAAAA CCAACAGACA ATGTTAAAAG TATTTATTAA
14151 CGAGGCTAAT GTGTATAACA AATGGAATAT GCTTAAAGGT TTAATTTATA
14201 ATAATAACAA TGAATCTGTT TTAGTAAAAT AATGTAGTAA AATTTATAAA
14251 GGTAGATAAA AATTATAATA TTAATAAAAA AAATAATGTT ACTAAATGGG
14301 TTCCTGCGTT AAATTATTTT ACGGGTAGAC AGCTATTAAC TATTTTATTT
14351 ATTTTTAAAT TTAAATAAAT GTATTGTTAG AAAATTGTGT TGTTTTATTA
14401 GTATAACGAA AAAATACATG ACATAAACCG CTTCCAATTT TGGTCACACA
14451 AACTCTTGTG TGGATAGTTT ACGTAATGAG TTAAATAGGC GGGCAGTTGT
14501 CCGCTAAACG TGTCGGTGGT CAAGTAGATG TGCATTAATT TACGACAACC
14551 CAAAGCGGGG CCGCTTATGT CAAGTATTTT TTTCACAAAA TTGGTAATGG
14601 TTTCGTTTTG TTCCTTGTAC AAACACATGT CGGTGTGATC GTTGACGCAC
14651 GAGTTGTACG ATTCCGCCGG CAGGTTGGCA AACAAGCGCT TGAGATGCTT
14701 GAGTCTGCGT TCAATTTTAT AATCAAACTT GTTGGTGAAA ATGTCTTTCA
14751 GCAAGCACAT TAACTGGTCG TTCAAAACGC GCTGCAACGA CGACACCAAC
14801 ACATGATATT CGTTTCCAAA AAGCGAAAAA TTTTTGATGC AGCGGTCCGC
14851 GTTGAAGGGT CGTTTCATAA TGCGCACGTT GACAAAAAAC ACGTTGAAAG
14901 ACAGCGGGGC TGTGGTTATT TTAACGCCGT TGTCGGTATA CTCGTCGACG
14951 CCGTCTGCGC TTGTTATGTC AATTTGTAGC GCAAATCTAA CCAAATCAAA
15001 CTCATCGTTG TACTGTGTCT TTATGCATTT TATATGGCGG TTTAAGTGCA
15051 AGTTGATTTG GCCGTTTAAT CTATAGGCTC CGTTTTGATA ACATTTCAGC
15101 ACTACCAACG GATCCGACAT GTAAACTTGA CGCGTTAGCA CGTCCAATTC
15151 AGCGTAATGT TGGTCGACGC ATTTTTGTAA ATTAGTTTGC AGGTTGCAAA
15201 ACATTTTTGC GCAAAAGCCG TAATAGTCAA AATCTATGCA TTTTAATGCG
15251 CTTCTGTCGT CGTCAATATG GCATGTCACG GCTGCGCCTC CAGTTAACAC
15301 GAATAAACCG CCGTTTTCGC AAACTACGGC TTCGAAACAA TCTTTGATAA
15351 ATGCCAACTT TGCTTTAGCC ACAATTTTAT CGCGCAGGCG ATCTTCAATA
15401 TCCTTTGTCG TAATATAAGG TAGGACGCCA AGATTTAGTT GATTCAACAA
15451 ACGTTCCATA ATGAATAGCG GCGACGCAAC ACGACTACAC TGTTCAAATG
15501 CGCACGCAAA ACAAACCCTT GCAACTTTAT TTGCCAATCG TAATCACAGT
15551 AGTTTTTACG AGTACGCCAT CGCGTTTGTA AGCACATTGC TTTTTAAAAA
15601 TAATTTAAAT TTAATGACCG CGTGCAATTT GATCAACTCG TTGATCAACT
15651 TTGAACTCAA CATGTTTGGT AAAAGTTTAT TGCTAAATGG ATTTGTTAAT
15701 TTCTGCATTG CTAACAGCGA CGGGGTTACG ATTCAACATA AAATGTTAAC
15751 CAACGTGTTA AGTTTTTTGT TGGAAAAATA TTATTAAAAA TAAATAAATA
15801 AACTTGTTCA GTTCTAATTA TTGTTTTATT TTTTATAAAA TAATACAATT
15851 TTATTTATAC ATTAATACTT TGGTATTTAT TAATACAATT ATTTACAATA
15901 CTTTATTTAC ACTATAATAC TTTATTTACA TTAGTACTAA ATTAATACTA
15951 AATTACGCTA ATACTAAATT AATACTTTAT ATAATCAAAA ATAATACTTT
16001 ATATAATACT TTCTAATCAT CATAAACGGG TAATAGTTTT TTCTCTTGAA
16051 ATTTACGCTG CAACTCTTCG CTAAAACACA TGGGCGGTGG AGTGGGAGCG
16101 GGTGGAGTAG GAGTCCTTAC GGGTTTGATG GGCGACAGTT CTCTGGACTT
16151 GCGGAACAGC TTGGGCGAAA ACGTCGGCGT GCGCCGACTA ATGATTTCTT
16201 CATCGCACGA GGCGTCGCAC ATTGTGCACG CGTCCGGTGA GGTACACAAA
16251 ACTTTCTTGG GCACGCTGTA CACCGGCTTG GGCACGCTAT ATGTGTTGCC
16301 AAAACTAGAA CTCGTTGTGG TTGCCGAACG GAGACGATGG GTGTGAAGAC
16351 GGCGATGGCT GTGAAGACAA GTCCGAAGGC GCGATAAAAG ATGAAAGTGT
16401 TTCTGAAACC GAAGTGGTGG TAGAAGTGGT AGAAGGCGGG TGCGTTACGG
16451 CAACCACGCT GCTGCTATTT CTGCCTTCGG AGACCACTTC CAGCAATCTA
16501 GAGTTACTCT CTCGTTCTTC GCGGCGATAG TCAATGTCGC AATAATGTTC
16551 ATAAGATGCC TTTTCGGCTT CGGCGCGCCT TTTCATGTAT ATGTTGTGAC
16601 GCATCTCCTT TAACTGCACG TACAAATTCC AGCATTGCAC AGCCAGTATC
16651 GTAAGCACGC CCATTATGAT TACGGGATAA TTTTGATTAA ACACGGTCGG
16701 CTCGTGATCG CTTACAATCG CTCGGCACAT GATGCATTTT TTGTAAATGT
16751 TCACATACAC ACAGTTTTGG CTCAAGGTTT CGGTATTTGC GTAGTCAATT
16801 TCCAGATACA CGATAGAGTT CCAGCACATT GATTCCAAAT CGTAGTGACG
16851 ATATAAAACA TCTAGCGCCG GTAGATGACC ATTTTTGAAC ACGTAGATTT
16901 GAAACGCGGC AAACAGCATC CAACACAGCC CAGTGATCAC GTTTACCATA
16951 ATACACGTGA TAGCGACGTA AAAGTTTTCT TTCGCATTGA AATTTACATT
17001 TGTGTTTGAA GAGCTGCTGC GATTTTTCGT CCACACGATA ATCTTCCATA
17051 TAAAATAAAA CATGTAAAAT AATATCCACA TGCCGAACGC CAGCATTATC
17101 GGTATAGATA GATTGATAAC CGATTGCTTT CCTTCAATTT CCAGCAAAAA
17151 CGCGTATCTG CTGTCTATCA CTCCCATTAT AGATAACACA AACACTATCA
17201 GATATGCTAA TAATAATGAG GCATTAAGCC CGAATTGTAA AACTGCAGTG
17251 ATTTTATTTA ACATTTTGAA TATTTAATTC AACAACTAAG TAATGGCAAT
17301 ATGTATCGAG TACTGATCGT GTTTTTCCTG TTCGTGTTTC TTTATATAGT
17351 GTACCAGCCC TTTTATCAGG CATACTTGCA TATCGGACAT GCCCAACAAG
17401 ATTACAATGA CACGTTGGAC GATAGGATGG ATTACATTGA ATCCGTAATG
17451 CGTAGAAGGC ACTACGTGCC GATTGAAGCG TTGCCCGCAA TCAGGTTTGA
17501 TACTAATCTC GGCACGTTGG CCGGTGACAC GATTAAATGC ATGTCGGTGC
17551 CTTTGTTTGT TAGTGACATT GACCTGCCGA TGTTTGATTG TAGTCAGATA
17601 TGCGATAACC CGTCTGCGGC GTATTTCTTT GTCAACGAAA CGGATGTGTT
17651 TGTGGTCAAC GGCCACAGAC TGACGGTGGG CGGATACTGC TCCACTAATA
17701 GTTTGCCCCG CAACTGTAAT CGCGAGACGA GCGTCATTTT AATGAGTCTC
17751 AATCAGTGGA CGTGCATAGC CGAGGACCCG CGTTACTATG CGGGCACAGA
17801 TAACATGACG CAACTCGCAG GCAGACAACA CTTTGACCGC ATTATGCCCG
17851 GACAGAGTGA TAGGAACGTC CTGTTTGACC GATTACTAGG CCGAGAGGTG
17901 AACGTGACCA CTAACACGTT TCGCCGCAGC TGGGACGAGT TGCTGGAGGA
17951 CGGCACTAGG CGGTTCGAAA TGCGCTGCAA CGCCCGAGAT AACAACAATA
18001 ATCTCATGTT TGTTAATCCG CTTAATCCCC TCGAGTGTCT CCCGAACGTG
18051 TGCACTAACG TTAGCAACGT GCACACCAGT GTTAGACCCG TATTTGAAAC
18101 GGGAGAGTGT GACTGCGGCG ACGAAGCGGT CACGCGTGTT ACGCACATTG
18151 TGCCGGGGGA CAGGACCTCT ATGTGTGCCA GCATTATAGA TGGCCTGGAT
18201 AAAAGTACGG CATCATATAG ATATCGCGTA GAGTGCGTTA ATCTGTACAC
18251 CTCTATTCTA AATTATTCTA ATAACAAATT GTTATGTCCC AGTGACACTT
18301 TTGATAGTAA CACGGACGCA GCTTTTGCCT TTGAAGTGCC CGGCTCCTAC
18351 CCTTTATCGC GCAACGGCAT CAACGAGCCA ACTTATCGCT TTTATCTTGA
18401 TACCAGATCT CGAGTTAATT ACAATGACGT CAGAGGGCAG TTATCTTAAT
18451 TGTGATAACA CAAACAATAA GTCATTTAAA TGTTACGTCA GTAGTTAGTA
18501 TATAAGCCGT ACATGTTGGC TTGCAAATTC AGTCAATATC AGGCTTTTAT
18551 CATGGACGGT GTAAAGCTGC TAGGGACGTG CGCGCTAATA ATTTTGTTAT
18601 CGACGACGAG TACAGTTGTC GGGCGTGACC GTATCACGTT TACGCCGATA
18651 GAAGATAGCG CAGGCCTCAT GTTTGAACGC ATGTACGGCT TGCGACATCA
18701 TACAGACGAC AGATTTGTGT TTGTGAAAAA ATTCAATTTT GTTTCGGTGC
18751 TGCAAGAGCT CAATAATATC AAATCTAAAA TTGAATTATA TGAAGCGCAA
18801 GTTTCAACTT GCACAAACGT CAGACAAATA AAACAGAACA GATCGAGTAT
18851 CATCAAAGCT CGCATTGAAA ATCAGCTGCA GTTTTTGACG CAACTAAACA
18901 AAAATCTCAT CACATACTCT GTGGAAAGCA GCATTTTAAG CAACGACGTG
18951 CTGGACAACA TCGATCTGGA ATATGACGAC AGCGGTGAGT TTGACGTTTA
19001 CGACGAATAC GAACAGCCTT CGCATTGGAG CAACATGACT GTATCCGACG
19051 CGCAAGCTTT GCTCCGAAAC CCGCCCAAAG ACAGAGTAAT GTTTTTGGAC
19101 ACGGTTACCA CCAGCGACGT GAGCAGCAAA TACGAAGAAT ACATAAACTG
19151 CATTGTGAGC AACCGTACCG TTGAAAACGA GTGCATGTTT TTAGCCAACA
19201 TGATGAACGT GCTCAACGAC AAATTGGACG ACGCAGCAGC TTTGGCCAAG
19251 ATGCTGGAGC GAATAGTAAA ACAAACGCGA AAGAACAAAC TCAACATCTC
19301 CAACACGGTT ATAGACGACG ACACGCTGCT AACGGAAATG AAAAAATTAA
19351 CACAAACTTT ATACAACCAA AACCGCGTGT GGGTAGTGGA TTTTAACAAG
19401 GACATGAATA GTTATTTCGA TTTGTCGCAA GCGTATAAAT TGCATTTATA
19451 TGTTGATTTA AACACGGTCA TTATGTTTAT TACCATGCCA TTGTTAAAAT
19501 CCACCGCCGT TTCGTTTAAT TTGTATCGCG TCATGACGGT GCCTTTTTGC
19551 AGGGGCAAAA TGTGTCTGCT TATCATTTCG GGCAATGAAT ACTTTGGGAT
19601 TACAGACAGC AAAAACTATT ATGTGCCCGT ATCTGATAAC TTTAGACAAG
19651 ATTGCCAAGA GTTTACGGGC TACAATGAGT TTTTGTGTCC CGAAACTGAG
19701 CCGATTGCCA CTATGAACTC GAAAGTGTGC GAGATTGAAA TGTTTATGGG
19751 TCGATATAGC GACGACGTGG ACAACATGTG CGACATTAGG GTGGCCAATT
19801 ATAATCCCAA AAAAGCTTAC GTGAACACTT TAATAGACTA CCGAAAATGG
19851 TTGTACATTT TTCCAAACAC GACCGTGTCC GTCCACTATT ATTGTCACGA
19901 CGCGCTTGTA GAAGTTGATA CAAAAGTTTC GCCCGGCGTT GGTGTTATGT
19951 TTTCGACTAT GGCGCAAACG TGTTCGATTA GAATAACGTA TGATGTGACC
20001 ATAACTGTAG ATTCGCGATT TTATGTCAGC CATTCAACTA CATACTGGCC
20051 TAAAAAGAAA TTTAATTTTA ACAACTACAT CGACCAAATG TTGCTTGAAA
20101 AAGCGACCAC CAGTTTTATA CCGACTGTTG ACAATTTTAC CCGGCCCGTT
20151 TTATTGCAAC TTCCTCATAA ATTTCACATT AAAGATTACA CATCGACGCC
20201 CCATCATTTT TTCCATCAGT CTAAAATTTA CACCAACAGC GCGGCGCCCG
20251 ACGAAGACTC GCAAGACGAC AGTAATACCA CCGTGGTTAT TATCGCTATT
20301 GTCGCTGCAA TGATCCTATT CTGTGGATTA TTGTTATTTT TGTTTTGCTG
20351 TATAAAAAAA CGGTGTCATC AATCAAATAA CGTGGTTGTG CAATACAAAA
20401 ATAACAATGA ATTTGTCACA ATTTGCAATA ATTTAGAAGA CAATCGAGCA
20451 TACATTAATT TACCTAATGA ATACGATAGC GATGATATGC CAAAACCATT
20501 GTACCCTTTA CTTGGCTTTA ATGATGATTT GTTAAAAGAT GATAAACCTG
20551 TGTTGTACCC TATGATTATA GAAAGAATAA AATAAAACAT GTATAATTGA
20601 AATAAATATA TTATTTAATA AAATGTTTTT TATTTATATA CTATTTTCTA
20651 TTACATATTC CAATGCACAC AAATGTTTAA TGGCTATCAG TTTTAATTTT
20701 ACTAATTCGT CTAAACAAAA ATTATTCACT TGCTGTTTTT CATCCATTTG
20751 ACATATGGCG TTTATAAATA ATTCGCTGTG TTTTATGAAC GAATCGTAAA
20801 CCGCTGCCTG GGCCTTCAGC ACGGTCGGCG CATTGTATTT TTGGGTAAAG
20851 TACGCAATAT TTTTAGTCAA ACACAGAGAT TTTAAATCTT TTTCATTTAT
20901 ATCCAAGTCG GAACAATCGT ATACAAAATC TAGCTTTTCA CTTTCGGGCG
20951 CGCCCAGATA CTGGTTTACG AGTTCGAGCT GCTCCACTTG GCCTTTGATA
21001 TCGGCCGCTA TGCACAACAT TTTGTCGATT GCAGTTTCAT TGTTTTTAAC
21051 ATAATAATTT TTAACTTTTT TATTTTGCAA TTTAATCAAA CTATTTAAAT
21101 TCGCTTGACC TTTCTTACAA AGCGCAGTTA ATATGCAAGA CATTTTGACT
21151 TATAATAAAA AACAAAACTT TTATATATTC ATTTATTGTT CAATAATAAC
21201 AAATATTCCA GGCTTAAAAG CTAACGAATA GGGCTTTTCG GTAATTTTCT
21251 TATTATTCAT GTCCGTCATC TGCATCTCTT TGCCGTACTT GACGCCGTCA
21301 ATGGTGCCCA TCATGTACAT TTTAATCTCC TCCGAAGGTC CGTCTATTTT
21351 GTCCATTTCG AACAATCTAT CAAAATCTTC AACGCTCATT CTCTGCATAT
21401 CAAGAGGAAC GTTTCTGATC TTTCCGGTGG CGTAAATTGA TCCGTTGTTG
21451 TCACGGTTGA TTATGTAAAA CCGACGAATC AACATGTCGC GCTCGCTAGT
21501 TTTGTTCTTA TCCGGCAAAT GAATGCACAC GTTTGGTTCC ATCTTCAAAG
21551 GAAAATCGCT TTGCAAGTGT TTTTGCAAAA TGTTGCCAAA TATATTGTTG
21601 TGTTTGTGAA TGTCTCCGTA TTGAATGCTA AAAAACTGGC CAAAGTTGCT
21651 TTTGGCACGT TTTATGGTTC CAAAGTCGGA AAACCAAAAT CCGCAGGGCT
21701 TGCCCTGCAC TCTTGGACCG ATGGTGTACG TAGTCTTGCC GTTGGCCGGC
21751 TCCAACACCA CGATATTTTT ATCGGGCTCG GGATACAACT TGTCTTCCCA
21801 TTCGTGCAAA CTGTTCAAAT TAGACAGTCG ACAAAATTCG TTTTTCAAAA
21851 ATCTGCCTTC GAAACAACTA CAATTCAGTA TTGAAAAGTT GCCTCGTTTC
21901 ACATTAATCG CCATCTGCTC CTGCCACAAC ATCTTCGTCA ACTCGTGTGG
21951 CTCCAATTGA ATGGACGACG GCGTAAAATA GCACATTACG CCCGTTTCGT
22001 CGTGTTTCAC GTTAAAAGCG CCGCTGTTGT ACGGCACCAG CTGCTGGTCC
22051 TCACCACCTT CCGATCTTTC CCGCTTCGGC TGGTTGTCGT CGCTGCTCGA
22101 ATATCCATCG CCAATCTTGC GTTTAGTTGC CATGCTACCG ACGTGCGCTG
22151 TCTGCTGTGG TTCAAGTCTA ATTGAAGTGT TTCACAGAAT ATAAGATATA
22201 TAATAAATAT GGACGACTCT GTTGCCAGCA TGTGCGTAGA CAACGCGTTT
22251 GCGTACACTA CTGACGATTT ATTGAAAAAT ATTCCTTTTA GTCATTCCAA
22301 ATGCGCCCCT TTCAAGCTAC AAAATTACAC CGTTTTGAAG CGGTTGAGCA
22351 ACGGGTTTAT CGACAAGTAT GTGGACGTGT GCTCTATCAG CGAGTTGCAA
22401 AAGTTTAATT TTAAGATAGA TCGGCTAACC AACTACATAT CAAACATTTT
22451 CGAGTACGAG TTTGTAGTTT TAGAACACGA TTTGTCCACA GTGCACGTCA
22501 TTAACGCCGA AACAAAAACC AAACTGGGCC ATATAAACGT GTCGCTAAAC
22551 CAAAACGACG CAAACGTGCT CATTTTGACC GTAACTTTAA CGAGCTAAAA
22601 TGAACGAGGA CACGCCCCCG TTTTATTTTA TCAGCGTGTG TGACAACTTT
22651 CGCGACAACA CCGCCGAACA CGTATTCGAC ATGTTAATAG AAAGACATAG
22701 TTCGTTTGAA AATTATCCCA TTGAAAACAC GGCGTTTATT AACAGCTTGA
22751 TCGTTAACGG GTTTAAATAC AATCAAGTTG ACGATCACGT TGTGTGCGAG
22801 TATTGCGAAG CAGAAATAAA AAATTGGTCC GAAGACGAGT GTATTGAATA
22851 TGCACACGTA ACCTTGTCGC CGTATTGCGC GTATGCTAAC AAGATCGCCG
22901 AGCGTGAATC GTTTGGCGAC AACATTACCA TCAACGCTGT ACTAGTGAAA
22951 GAAGGCAAAC CCAAGTGTGT GTACAGATGC ATGTCCAATT TACAGTCGCG
23001 TATGGATACG TTTGTTAACT TTTGGCCTGC CGCATTGCGT GACATGATTA
23051 CAAACATTGC GGAAGCGGGA CTTTTTTACA CGGGTCGCGG AGACGAAACT
23101 GTGTGTTTCT TTTGCGACTG TTGCGTACGT GATTGGCATA CTAATGAAGA
23151 CACCTGGCAG CGACACGCCG CCGAAAACCC GCAATGTTAT TTTGTATTGT
23201 CGGTGAAAGG TAAAGAATTT TGTCAAAACT CAATTACTGT CACTCACGTT
23251 GATAAACGTG ACGACGACAA TTTAAACGAA AACGCCGACG ACATTGAGGA
23301 AAAATATGAA TGCAAAGTCT GTCTCGAACG CCAACGCGAC GCCGTGCTTA
23351 TGCCGTGTCG GCATTTTTGC GTTTGCGTTC AGTGTTATTT TGGATTAGAT
23401 CAAAAGTGTC CGACGTGTCG TCAGGACGTC ACCGATTTTA TAAAAATATT
23451 TGTGGTGTAA TAAAATGGTG TTCAACGTGT ACTACAACGG CTATTATGTG
23501 GAAAAAAAAT TCTCCAAGGA GTTTTTAATT CATATTGCGC CTGATTTGAA
23551 AAACAGCGTC GACTGGAACG GCAGCACGCG CAAACAGCTG CGCGTTCTAG
23601 ACAAGCGCGC CTACAGGCAG GTGTTGCACT GCAACGGCAG ATACTACTGG
23651 CCCGATGGCA CAAAGTTTGT CTCTCATCCG TACAACAAAT CTATTCGCAC
23701 GCACAGCGCA ACAGTCAAAC GGACCGACAG CTCGCATCGA TTAAAAAGCC
23751 ACGTGGTCGA CAAACGACCG CGCCGCTCTT TAGATTCTCC TCGCTTGGAC
23801 GGATATGTTT TGGCATCGTC GCCCATACCA CACAGCGACT GGAATGAAGA
23851 ACTAAAGCTG TACGCCCAGA GCCACGGCTA CGACGACTAC GACGACAATT
23901 TAGAAGATGG CGAAATCGAC GAACGTGACT CTTTAAAAAG TTTAAATAAT
23951 CATCTAGACG ACTTGAATGT ATTAGAAAAA CAATAAAACA TGTATTAAAA
24001 ATAATAATAA TAAAACTATA TTTTGTAATA TATAATGTAT TTTATTTAAA
24051 AATTGTCTAT TCCGTAGTTG AGAAAGTTTT GTCTTGACTT CATAACTCTC
24101 TTCTCCATAT TCTGCAGCTC GTTTACGTTT TTTGTGACGC TTTTAATTTT
24151 CTCAAAATGC TGGCTGTCAA TAGTTATTTT TTGCTTTTGT CTATTAATTT
24201 CTTCCAATTG AGATTTTAAA TCTCGCTGAG ATTGAGATGC GTTGTAATTC
24251 CTTGAGAACA TCTTGAGAAA ACATACAGAT GAGGTAAAAC AGCATCTTTT
24301 ATCCAAATTA GGAGTTAATT ATTATTCATT TGTATCGCGA CCATTTGCTC
24351 GTACACATCT TCCATAAAAT GGTTATTTTT ATTGCGATAA GTGTTGGCAT
24401 TGACATTTTG CAAATGTCGT AGGTTAAAGG GGCAAATGGG CTGCGTGGCC
24451 GATAAAAGAT TCCAGTTCAA CAATCCCTCT TCGCCCCCGT TTAACTTGAA
24501 AATGGCGCTA CACGTTTCTA CGCTATCGTG TTCCTGTTGA GTGGCGCACG
24551 GTTCGACCAG TATCATCTTG TGATATGCGG TTTTGACATT CATGTGCAAC
24601 GGAATAACTT GCGGGTCATC GCATTCGTCG GAATTAAGCT TTAAATGGCG
24651 TCCGTATGCT TTCCAAAGTT TTTCGTCGTC GAACCGCGGC ACTGCTTGCA
24701 AGTCGACGCG GGGAAACGGC GCTCTGTACA AAACGCCTAA ATTCAAAAAC
24751 TGATTGCATT GTTGCAGCTC TGTCCAATCG ACGCGATTTT TGTAATTTTG
24801 AAACAGCATC AGGTTGAACG CCGCGCTGGC GCGCACGTTT GTAATCACTG
24851 TGTAATTGAT CAGCTTGTGC CAATACTGGG CATTGAAATT TTCTTCAAAC
24901 TCATTTCTAA ACTCTGGATG CGCAAACATG TGTCTAATGT AGTACGCGGG
24951 CGGGGCGTTG AACGCAGTCC ATTTGTCAAT ACACTTCCAG TCTGAATGTA
25001 ACGTGTTCAC CAAACCGGGA TATTCGTCAA ACACGAGCAT GTGATCCGAC
25051 CACGGTATGC TGTGGGCGAT CAATTTTAGT TCTTGCACGC GGCCTTCGCG
25101 TAAGCAATAC AAAATGAGCG CGTCGCTGAT CTTGACACAG TCTTGCATGT
25151 ACGCGGACAA ATTAACGTTT TCCATACAGC TCACATTGTT TATTAGCGCC
25201 GTGTTCAAGT GTTTGTATTT GGACACATAA TCGTAGTTGA TGTACTGTTT
25251 AATGGGTTCT TGAAACCATT CTTTTAGTAG TATGTGACTG GCCACTATGC
25301 GTTTCCAATT TAATTTGTGT GCGTATTTTT GCTGCACCGA CAACGAGAGG
25351 TTATTGTAAT TTTTGGATAT TTCTTCCATG TCCAACAAGT CCCCAAACGC
25401 GAGTATAAAA TCTTGCGTCA AAAATTTTTG CTCAGACACC AACGACCAGA
25451 TCAAATGTGA TTTAAACCTG TTGGCGATTG TTATCGACAA CGGCGAAATT
25501 GAAATAATTT TCCAATCCAA CTTGTTGCGA AACACGTGAA TAAAATCGAC
25551 GCGTCCGTAA CATTCGCGCG ATATGCGCTT CCAAAACGTG TCATCTTGCA
25601 AATTAAGCAA ATAGACACGA TTGTTGGGAG ATTTGACGGC CAATTCAATT
25651 ATTTTTATAT ATTCTTTTTG CTTTAAAGCG CGTTGTAGCA CTTGGGTTGG
25701 AGCCATGTCG ACTGAAGCTC CACGCTGTTT GAAGCAAGGT GACCGTTTTG
25751 GTCGGCATGT TCAAACGTCG ATTACATGTT TGCTTTGCAT CAAAATGGCG
25801 TAATTAATTA AGAAACAACA TGAAAGCCAT CTGCATCATT AGCGGCGATG
25851 TTCATGGAAA AATTTATTTT CAACAAGAAT CAGCGAATCA ACCGCTTAAA
25901 ATTAGCGGCT ATTTGTTAAA TTTGCCTCGA GGTTTGCACG GCTTTCACGT
25951 GCACGAATAT GGCGACACGA GCAACGGTTG CACGTCGGCC GGTGAGCACT
26001 TTAATCCCAC CAATGAGGAC CACGGCGCTC CCGATGCTGA AATTAGGCAT
26051 GTTGGCGACT TGGGCAACAT AAAATCGGCT GGCTACAATT CACTGACCGA
26101 AGTAAACATG ATGGACAACG TTATGTCTCT ATATGGCCCG CATAATATTA
26151 TCGGAAGAAG TTTGGTCGTG CACACGGACA AAGACGATTT GGGCCTTACC
26201 GATCATCCGT TGAGCAAAAC AACCGGCAAT TCTGGCGGCC GTTTGGGATG
26251 CGGAATAATT GCCATATGTA AATGATGTCA TCGTTCTAAC TCGCTTTACG
26301 AGTAGAATTC TACGTGTAAA ACATAATCAA GAGATGATGT CATTTGTTTT
26351 TCAAAACTGA ACTCAAGAAA TGATGTCATT TGTTTTTCAA AACTGAACTG
26401 GCTTTACGAG TAGAATTCTA CTTGTAACGC ATGATCAAGG GATGATGTCA
26451 TTTGTTTTTC AAAACCGAAC TCGCTTTACG AGTAGAATTC TACTTGTAAA
26501 ACATAATCGA AAGATGATGT CATTTGTTTT TTAAAATTGA ACTGGCTTTA
26551 CGAGTAGAAT TCTACTTGTA AAACACAATC GAGAGATGAT GTCATATTTT
26601 GCACACGGCT CTAATTAAAC TCGCTTTACG AGTAAAATTC TACTTGTAAC
26651 GCATGATCAA GGGATGATGT ATTGGATGAG TCATTTGTTT TTCAAAACTA
26701 AACTCGCTTT ACGAGTAGAA TTCTACTTGT AACGCACGCC CAAGGGATGA
26751 TGTCATTTAT TTGTGCAAAG CTGATGTCAT CTTTTGCACA CGATTATAAA
26801 CACAATCAAA TAATGACTCA TTTGTTTTTC AAAACTGAAC TCGCTTTACG
26851 AGTAGAATTC TACTTGTAAA ACACAATCAA GCGATGATGT CATTTTAAAA
26901 ATGATGTCAT TTGTTTTTCA AAACTAAACT CGCTTTACGA GTAGAATTCT
26951 ACGTGTAAAA CACAATCAAG GGATGATGTC ATTTACTAAA ATAAAATAAT
27001 TATTTAAATA AAAATGTTTT TATTGTAAAA TACACATTGA TTACACGTGA
27051 CATTTACGAT GGCGAACAAT AATTTCACTT TTTATATTAG GACACGACGT
27101 GTATATAGGA AAGCTTAAGC GTTTCAATAA AGCCATGGCG TACACGCTAA
27151 GCTTGCCCAG CTTGCGGCTC TTTGAAATCT GTAGTTTTCG GGGAGTACCG
27201 TCGTTCTTCA GTGCCACATA CGTCAACTTG CGATCGTACA CTTTATAATA
27251 CGTGTTGTAG TTATTTTTTT CCAGAAATTC CCTCATAAAG CAATCCTTGG
27301 ATAAAGTTTT TGATCCGTAC AGTTGGCCAC ACCGGTCCAT GCACAGGTAC
27351 ACACACGTGA TGGCGTTTTG AATGACGATG CGATTTCTGT CAACGGCAAC
27401 GCGCTTGAAT ATGGTGTCGA CGTTGTCCGA TTCAATGGTT CCGTAAACAG
27451 CTCCGTCTGG ATTTACTGCC AAAAACTGCC GGTTAATAAA CAGCTGGCCG
27501 GGAATAGACG TGCCCGTGAT GTGTGTCAGC AGAGCTGAGC AGTCAGCCAT
27551 AGAGGCTAGA GCTACAAGTG CCAGCAAGCG ATACATGATG AACTTTAAGT
27601 CCCCACAGCA AACTGGCGCT TTTATATAAA AATTTGGGCC ATTTTTGGCG
27651 ATTAGATAAT TTTTGAAGAT TAGATAATAT TGAGATTAGT TAATAATTTG
27701 TGTGATTAGA TAACTTTTTA GGGTATTGCG CATTATAAAT CAAGGTCGAG
27751 TTGTATAAAC TGCTCTGGCG TGTAAAACTG CAGACTTAAG TTTTTTGCAA
27801 ACACTCGGTC TGAATCGCTA AAATCTTTCT GACCGGTGGT TAGATTAATT
27851 CGGCCAGCCG CGTCGCCCAC ATAAAAAGAT TGTTCCTTGT CAATATGCGT
27901 AAACTGTTTG GCCATCTCGC GCCACATTCC CGTGTCGGGC TTTCGATGCT
27951 CATCCTTGTT GGGCGACACA TAAAACGATA TGGGCACGCC AGTAGCTTTT
28001 TTAATATTCT CTAATTTATA TAATAAATCG CTCGCTTTGA TTTTGCCGGA
28051 ACCTAAATGG GCTTGGTTCG TAAAAACAAC TAAATCGTAG CCTAATTCGT
28101 ACAAACGCTT TAGCTTGTGT GCGCACGGAA GGAGCTGCCA GTCGTCTGGG
28151 TTTTTTGGAA ATTTGGACCG TGTCTTTGAG CTAATTAGCG TGCCGTCCAA
28201 ATCAAAAGCC GCAATTTTGG TTCTTTTAGC GCCGTCATGA ACCGCGTACG
28251 CATACAAATC GGGCTGCTGT AACGTCCACA TGGTGAATGC ATCTTACTCA
28301 AAGTCCATCA ATTCGTACGC GTTTGTGTCC AGGTCGGGCG TTGAAAAATT
28351 GTAGCTTGCC ATTAGATCGG ATAGCGATTC AAATTTTGTA AGCGTTTGTA
28401 GCGCACGTTT GGCATCTTGT TTAAAATTAC ACGACGACAG ACAGTAAAAA
28451 TATTCCTCGA TAAGCATGAC TACACCCATA TCACTGTTTA AGTGCTCGAC
28501 GTAGTTGTTG CATGTTATGT CGCGTGTGCC GCGATACGCG TGATTTCGGT
28551 GAAAATCACA CCACAACCAG TCGGCGTGCG TGTAACAAAG TCGACAGCGA
28601 AACAATTTAT CGTTTTCCAA AAAATTTAAA TACTCGACAG TTTTGCAGCT
28651 TAGATTCCGC GTTTGATTCA CCTTAAAATC GTCGTCAGCC TCTATAATCT
28701 CGGGCAACAG CTTGCCTTGT TGCCCCATCG TATCGATCAC CTCCCCCAAG
28751 TGGCCCGGTG TTATATTAAG TCGTTTAAAA TCATTTATTG CTTCCTGCAC
28801 GTCGGCCTGG TAATTTTTGA CCACGGGCGT GGAAATCAAT TGCCGTTGAA
28851 GGGAAATAAT TCGTGGTGTG GGTATCGGCC GCCTGTTGCA CAATTCCACC
28901 AGCGGTGGAG GCAAGGGCGC ATTCACAGCA ACCGTTGTCA TTTATAAGTA
28951 ATAGTGTAAA AATGCAAATA TTCATCAAAA CATTGACGGG CAAAACCATT
29001 ACCGCCGAAA CGGAACCCGC AGAGACGGTG GCCGATCTTA AGCAAAAAAT
29051 TGCCGATAAA GAAGGTGTGC CCGTAGATCA ACAAAGACTT ATCTTTGCGG
29101 GCAAACAACT GGAAGATTCC AAAACTATGG CCGATTACAA TATTCAGAAG
29151 GAATCTACTC TTCACATGGT GTTACGATTA CGAGGAGGGT ATTAATAATA
29201 ACAATAATAA AAACCATTAA ATATACATAA AAGTTTTTTA TTTAATCTGA
29251 CATATTTGTA TCTTGTGTAT TATCGCTAAC CATTAAAAGT GCTGGAGCCA
29301 CAGTGTTGCG GCGAGTCTTT ATAGAAGATC GTTGTTTGGC TGGAACTGAG
29351 CTTTTCCTTT TCCTGCTGCC GCTAATGGGA GTGGGCACGT ACTCTGTAGT
29401 AGACGGTGCA ACGGGCAACT TGAGCGCTAC CGTCTTAAAT TTGGCCATAC
29451 TTTTAGTGAT GAAATCGCGC GTTAACACTT CGTCGTAAAT GTTACTTAGC
29501 AGAGGCGCAA CATTGTGATT AAATGTCTCG TTTAACAAGC TGTAAAACTC
29551 CGAATAAAGC TTATCGCGCA TTTCGCAGCT CTCCTTCAAT TCTGCCAAAT
29601 TTGCGTTGGT AAGCACCACA GTCTGTCTTT TTTTGCTCGC TGGAATTGCT
29651 GCGTTCTCGC TTGAAGACGA CGATGTCGAT CGGTCGGCCA TTTTTTTGCC
29701 CAGCTTTTCA GTGTGATCAA AAATGAAC C AAAATCTGCC AATTCGGGCT
29751 TGTTTTTCAC CAAATCCCAC ATGGCCGGGC TACTAGGCCA CTCGGGCTGC
29801 TTGATCTTAG TGTACCAACT GTTAAACAAA ATGTATTTAT TGTTGTTAAT
29851 CACTTTCTTC TTGCGTTTGG ACATTTTGCG TTCGTCTTGC ATGACAGGCA
29901 CCACGTTAAG GATATAGTTA ATGTTCTTTC TTTCCAAGAA ATTTACAATA
29951 ACGGCCAGCT GGTCCATGTT GGATTTGTTG TAAGAGCTCG ATTCCAGTTT
30001 ATTCAACAGC TTTTCATTTT TGCACACGGC CGCAGTCTCC GGAGATTGTT
30051 GCTCCGGCAC GTTTACCATG TTTGCTTCTT GTAAACCTTT GAAACAACCC
30101 GTTTGTATTC TTGATGATAT ATTTTTTTAA TGCCCAACAA CCTGGCAATT
30151 CGTTTGTGAT GAAGACACAC CTTACGCTTC GAACATTTGT CGGTGATTAC
30201 TGTGAAATGG CCTAAATTAG CTCTTATATA TTCTTTTATA CGCTCAAACG
30251 ACACGATGTC CAACATGTGC GCGCAGACGT TTTCTGTGTT CATCGTGTGC
30301 TTGAGCGTGT TGATGGCTTC CCTGAACAGC GCTTGTATTT CGCTGCGAGT
30351 CAAGCAGTCC GAATCACACC CGCCTAAGTG CGTGCAATTT TTGGGGGGCA
30401 TCGTTGTCTA TCTTTTTCAG AGTGGCGTAG AAAAAGTCCT GCAATTGCCT
30451 ATTATCAAAA CGCGCCTTGA CGCTGCGCAC AAAATCAAAA AATTCAATGT
30501 AATTGCTGTA ATCGTACGTG ATCAGTTGTT TGTCGTTCAT ATAATTAAAG
30551 TATTTGTTGA GCGGCACGAT GGCCAGGCTG CGCGCTATTT CGCAATTGAA
30601 GCGTCGCGGT TTTAACATTA TACGGTAGTC ATTGCCAAAC GTGCCCGGCA
30651 ACAACTTCAC GGTGTACGTG TTGGGTTTGG CGTTCACGTT AATCAAGTTG
30701 CCGCGCACGA CGCCTACGTA TATCAAATAC TTGTAGGTGA CGCCGTCATC
30751 TTTCCATTGT AACGTAAATG GCAACTTGTA GATGAACGCG CTGTCAAAAA
30801 ACCGGCCAGT TTCTTCCACA AACTCGCGCA CGGCTGTCTC GTAAACTTTT
30851 GCGTCGCAAC AATCGCGATG ACCTCGTGGT ATGGAAATTT TTTCTAAAAA
30901 AGTGTCGTTC ATGTCGGCGG CGGGCGCGTT CGCGCTCCGG TACGCGCGAC
30951 GGGCACACAG CAGGACAGCC TTGTCCGGCT CGATTATCAT AAACAATCCT
31001 GCAGCGTTTC GCATTTTACA TATTTGACAC TTAAAAAATT GCGCACACGA
31051 GCACCATCGT TTGATACCTA ATTGCAACTA TTTACAATTT ATCAGTTTAC
31101 GTTGAACCCG TTTTAATTTT TTAGATCCGT CCTTGTTCAG TTGCAAGTTG
31151 ACTAAATGAC AAAATTTTTC GGTTCTGCAA AACCGCCCTT GTCTGTTCCA
31201 CCCGTTGTAT TTGAAAAAAC TTTTTTTCAC GCGGCGACAA CTGCTTGTAT
31251 AATATTGCCC AATGTAAACA TGCAAAATTT TGTTACTCTC GTCAAAACAG
31301 CGGTTGGCGT TCCATTCCAT AATTTTTTTA TTATTTATCA ACGATGGCCA
31351 TTGTAAATTG TCGTCATTTA TACGCATCAT ATGATTTAAC AAAAGCTTTT
31401 CGTATAGCGG AACTTCAATT CCCTTGGAAC ATTTTTCAAA CGATAATTTA
31451 ATTTGTTTCT CGGTTGGCAG CATTTCATGC TTGATTAACA ATCGCCTGAC
31501 TTTTATAGCC ACGTTTATGT CTTTGCACAG CAAATGTGGG TTGTCGACAA
31551 TGTAATAGTG CAAAGCATTT GTTACGGCAA ATGCGTAGTT TGATTTGACG
31601 ACGCCCTTTT TCTTGACGGG CATTGCGGCT TTTAAAATTA CTTGCAAGCA
31651 TTGTACGAAT ACCTCTTTGT GTTTAAACAA TAATATGGAC AAACATCGGC
31701 GAAACAATTT GTAATAATTA TGAAATCCCA AATTGCAGGT TTTAAACTTC
31751 TTTGTTACTT GTTTTATAAT AAATAAAATT TGCTGACCCA TGTCTGCGCC
31801 CACAACTTTA ATTAACCATT TGTGCGCATA TTGATTGTCT CGTTGTTCCC
31851 AACCGGAAAA TTGATTGATC TCGAGCCACC GGCATTGGTC GTTTGATACC
31901 GTCGTTAACG CCGACGCTCC TGCCTGTTTG ATTACGGGTT CTAAAAGACG
31951 AAACAGCAGC GTAAATTTGT TTTTGCGTCG GTAGTATTTT GGCAGGCAAT
32001 AATCAAAAAA ATCCGTAAGC AATTCTCTGC ATCTATTAAT ATTCGTTGCG
32051 TACGAATCGA GTTTTTCAAA AATTACTTTG TTTGTATGAA AATAACGTTT
32101 GGGCTTCTCA CAATAATAAT CTTCGTTGTA GAACAGAAAC GGTTTGCGAG
32151 AATTGGCACG TTTGTCCATG ATTGGCTCAG TGTAACGATT GATTCAAATC
32201 AAAATTGACA ACACGTTTGC CGTAATGTGC ACCGGTTCGC ACACGTTTGC
32251 CGCGTATGTA ATCCATGTTT ATTTCGCTGT CGCAATTGAT TACACGATTG
32301 TGTTGGGCGG CGCGTTTTAT TGAATTTAGG CGACGCGTCG ACAACTCCAA
32351 AGGATTGTAA AGCGCAGATT TTTCCAGAGT AAACGAGTTT AAGTGGCCAC
32401 CGTTGAACCA TTCCAGAGCC ACGATTGTGT ACAGCAAAAA GAATATTTCT
32451 TTGTCGACGT TTTCAAACGC AAACTTGTTT TTTAGGCAAT AGTAGTAAAA
32501 TTTTAACGAA TTGTATAAAT AAAACATAAA ATTGCCATTT TTAAAGTAAA
32551 ATTCTACATC CGTGACGAAC AAAAGGTTTA CTATTTTGTT CTCCAACAAG
32601 TGTGCCAATT TTCTTAAGTA CACCATTGAA TTTTTGTCGT CGTCCATCTC
32651 GATCAACAAC ACGTACGGCG TTTTGGAATT TAAAATTATT CTAAAATTTT
32701 CCTGTTGCAA CGATTCCACA GCGTCCGACC AATATGACGC TGCCACCTCT
32751 AGACAGATGT ATTTCTTGGA AAACACGTGT CGTTTGATAA CCTCGCTGAT
32801 GGACGTGATC GATTGTAAAT ACTTTTCAAA CGTCGCGTCT TCCCAACCAC
32851 GCACCGAAAC GGGCGCTGTC GTGTCGGGCT GATGTTTGAA ATCCAAACCA
32901 CTCTGAATTA ACTTGGTTGT GATTCGTATG CTCAACTGTT GACCCAACGT
32951 GTAGTGATCT TCGTAGGCGC GCTCCCACAT CACGTTACAC ACAAATTTGA
33001 CGAGATCATC AACGTCTTTC TGTTGCAAAA TTCGCCGCAA ACGCGCCACA
33051 TCGCCCTTGT ACCACCGATC TCGGCACACA AGCTGTAGCA TTTTTAAATC
33101 GTGATCGCTC AAGCTATTAA TTCTGGTTAG ATTTATATAG TCGTCAATAT
33151 CCTCGGGCGT GGTTTGCGTC ATGTCTGTAA AACGTGCAAA ATCAAACATT
33201 TTTATGTTGT AGTCGAATCT AACAAATCCA TCGGCGTTCA CTTGCACTTC
33251 GCGCTTTACA AAACGAGGTA GCGTGTAATC GAACCCGTTT AAATAGATTG
33301 CGTACAAAAC CAGCACTTCA TCTTCCAGTT TGCACGCTTG CGGCAAAAAT
33351 TGTGTGGTGT GCTCCAACCG GGTGACAAAC ATGACTATGG AAAATAACGC
33401 GGAATTCAAC AGACGACTAG AGTACGTGGG CACGATCGCC ACAATGATGA
33451 AACGAACATT GAACGTTTTA CGACAGCAGG GCTATTGCAC GCAACAGGAT
33501 GCGGATTCTT TGTGCGTGTC AGACGACACG GCGGCCTGGT TATGCGGCCG
33551 TTTGCCGACC TGCAATTTTG TATCGTTCCG CGTGCACATC GACCAGTTTG
33601 AGCATCCAAA TCCGGCGTTG GAATATTTTA AATTTGAAGA AAGTCTGGCG
33651 CAACGCCAAC ACGTGGGCCC GCGTTACACG TACATGAATT ACACGCTTTT
33701 TAAAAACGTC GTGGCCCTCA AATTGGTCGT GTACACGCGC ACGCTACAAG
33751 CTAACATGTA CGCGGACGGG TTGCCGTATT TTGTGCAAAA TTTTTCAGAA
33801 ACAAGCTACA AACATGTTCG TGTGTATGTT AGAAAACTTG GTGCGATACA
33851 AGTAGCGACA TTATCAGTTT ACGAACAAAT TATTGAAGAT ACAATAAATG
33901 AACTCGTCGT CAATCACGTT GATTAGATAA TGTCCGTGTT AAATGTGATA
33951 TCTTAGATTA CGAGCGCGCA ATAACCATAG TTTAATCGAA GAGAATAGCC
34001 GTCGCCACAA TGGATAATTA CAAATTGCAA TTGCAAGAAT TTTTTGACCA
34051 AGCGCCCGAC AACGACGATC CCAACTTTGA ACATCAAACG CCCAATCTAT
34101 TGGCGCATCA GAAAAAAGGC ATACAGTGGA TGATTAACAG AGAAAAAAAC
34151 GGCCGGCCCA ACGGCGGCGT GCTTGCCGAC GACATGGGAC TCGGCAAAAC
34201 GCTCTCTGTG CTAATGTTAA TCGCAAAAAA CAACTCTCTA CAATTGAAAA
34251 CTCTAATAGT GTGTCCTTTG TCTTTAATCA ATCATTGGGT AACCGAAAAC
34301 AAGAAGCATG ATTTAAATTT TAACATTTTA AAGTATTACA AATCTTTGGA
34351 TGCCGACACG GTTGAGCATT ACCACATTGT GGTGACCACG TACGACGTTT
34401 TATTGGCACA TTTCAAATTG ATCAAACAAA ATAAACAGTC AAGTCTGTTT
34451 TCAACCCGCT GGCATCGAGT TGTTCTAGAT GAAGCGCATA TTATCAAAAA
34501 CTGCAAGACG GGCGTGCACA ACGCCGCGTG CGCTTTGACC GCAACAAACC
34551 GATGGTGCAT TACCGGCACA CCGATCCACA ACAAGCATTG GGACATGTAC
34601 TCGATGATTA ATTTTTTGCA ATGTCGTCCT TTTAACAATC CAAGAGTGTG
34651 GAAAATGTTA AATAAAAACA ACGACTCTAC AAATCGCATA AAAAGTATTA
34701 TTAAAAAAAT TGTTTTAAAA CGCGACAAAT CTGAAATTTC TTCTAACATT
34751 CCTAAACACA CGGTTGAGTA TGTACATGTT AATTTTAATG AAGAAGAAAA
34801 AACGTTGTAC GATAAATTAA AGTGTGAATC GGAAGAGGCG TATGTGAAGG
34851 CTGTGGCAGC GCGTGAAAAC GAAAACGCAC TAAGCCGATT GCAGCAAATG
34901 CAGCACGTGT TATGGCTAAT ACTGAAATTG AGGCAAATCT GCTGCCACCC
34951 GTATTTGGCC ATGCACGGTA AAAATATTTT GGAAACAAAC GACTGTTTTA
35001 AAATGGATTA TATGAGCAGC AAGTGCAAAC GAGTGCTCGA CTTGGTAGAC
35051 GACATTTTGA ACACAAGCAA CGACAAGATA ATATTGGTTT CGCAATGGGT
35101 GGAATATTTA AAAATATTTG AAAACTTTTT TAAACAAAAA AACATTGCTA
35151 CGTTAATGTA CACGGGCCAA TTAAAAGTGG AAGACAGGAT TTTGGCCGAG
35201 ACGACATTCA ATGATGCTGC CAATACTCAA CATCGAATTT TGCTGCTTTC
35251 CATTAAGTGC GGCGGCGTCG GGTTAAACTT AATAGGCGGA AACCACATTG
35301 TAATGTTGGA GCCTCATTGG AACCCGCAAA TTGAATTGCA GGCGCAAGAC
35351 CGAATCAGTC GTATGGGACA AACAAAAAAC ACGTACGTGT ACAAGATGCT
35401 AAATGTGGAA GACAACAGCA TCGAAAAATA CATTAAACAA CGCCAAGACA
35451 AAAAGATTGC GTTTGTCAAC ACGGTCTTTG AAGAGACTCT GCTCAATTAC
35501 GAAGACATTA AAAAATTTTT CAACTTGTAG CTGGTAAGTC GTCATGAACA
35551 CCCGATATGC TACTTGCTAT GTTTGCGACG AGTTGGTGTA CTTGTTTAAG
35601 AAAACGTTTA GTAACATGTC CCCTTCGGCC GCTGCGTTTT ACCAACGGCG
35651 CATGGCCATT GTTAAAAACG GTATCGTGCT GTGCCCACGT TGTTCGTCGG
35701 AACTAAAAAT TGGCAACGGC GTTTCGATTC CAATTTACCC CCACCGCGCT
35751 CAACAACATG CACGACGGTC GCGTTAAGAC GCAAGCGCTT CGAGTTTTGG
35801 CCCGCTCGCT ACCTCCGCTG TACGACTCGA CCGTCGATCG ACACGGCTGC
35851 AAGGTGTTCA CGGTGCGGCG CTACAACAGA CGCGTAATCG ACTTTGCGGG
35901 CATTCGCAAC AAAACGCTGG AAATCATTAA AACGGATAGA AACTTGCCGC
35951 TCAACACAGA ATGCAATGTG AAAGTTGTCG ACAGTGCATG CATGCGTTGC
36001 AGAAAAAGTT TCGCAGTTTA CCCCGCCGTT ACCTATCTGC ATTGCGGACA
36051 TTCGTGTCTG TGCACCGACT GCGACGAAAC GGTAAACGTG GACAACACGT
36101 GTCCTAAATG TAAAAGCGGC ATTAGATATA AATTAAAATA CAAAACTTTG
36151 TAACATGTTG CCCTACGAAA TGGTGATTGC CGTGTTGGTT TACTTGTCGC
36201 CGGCGCAGAT TCTAAATTTA AACCTTCCTT TTGCATACCA AAAAAGTGTG
36251 CTGTTTGCCA GCAACTCTGC AAAAGTTAAC GAACGCATCA GGCGGCGAGC
36301 GCGTGACGAC AACGACGACG ACCCCTATTT TTACTACAAA CAGTTCATAA
36351 AGATTAATTT TTTAACTAAA AAAATAATAA ATGTTTATAA TAAAACTGAA
36401 AAGTGTATTA GAGCGACGTT TGATGGTCGG TATGTGGTTA CACGCGACGT
36451 TTTAATGTGC TTTGTAAACA AGAGTTATAT GAAGCAATTG CTGCGCGAGG
36501 TTGACACTCG CATTACACTA CAGCAACTTG TTAAAATGTA TAGTCCAGAA
36551 TTTGGTTTTT ATGTAAATAG CAAAATTATG TTTGTGTTAA CTGAATCGGT
36601 GTTGGCGTCT ATTTGTTTAA AACACTCGTT CGGCAAATGC GAGTGGTTGG
36651 ACAAAAATAT AAAAACTGTG TGTTTACAAT TAAGAAAAAT TTGTATTAAT
36701 AATAAGCAAC ATTCGACATG TCTATCGTAT TGATTATTGT CATAGTTGTA
36751 ATATTTTTAA TATGTTTTTT GTACCTATCA AATAGCAATA ATAAAAATGA
36801 TGCCAATAAA AACAATGCTT TTATTGATCT CAATCCCTTG CCGCTCAATG
36851 CTACAACCGC TACTACTACC ACTGCCGTTG CTACCACCAC TACCAACAAC
36901 AACAACAGCA TAGTGGCCTT TCGGCAAAAC AACATTCAAG AACTACAAAA
36951 CTTTGAACGA TGGTTCAAAA ATAATCTCTC ATATTCGTTT AGCCAAAAAG
37001 CTGAAAAGGT GGTAAATCCC AATAGAAATT GGAACGACAA CACGGTATTT
37051 GACAATTTGA GTCCGTGGAC AAGCGTTCCG GACTTTGGTA CCGTGTGCCA
37101 CACGCTCATA GGGTATTGCG TACGCTACAA CAACACCAGC GACACGTTAT
37151 ACCAGAACCC TGAATTGGCT TACAATCTCA TTAACGGGCT GCGCATCATT
37201 TGCAGCAAAC TGCCCGATCC GCCGCCGCAC CAACAAGCGC CCTGGGGCCC
37251 GGTCGCCGAT TGGTACCATT TCACAATCAC AATGCCCGAG GTGTTTATGA
37301 ACATTACCAT TGTGCTAAAC GAAACGCAGC ATTACGACGA AGCTGCGTCC
37351 CTCACGCGTT ACTGGCTCGG CTTGTATCTG CCCACGGCCG TCAACTCGAT
37401 GGGCTGGCAC CGGACGGCAG GCAACTCAAT GCGCATGGGT GTGCCCTACA
37451 CGTACAGTCA AATCTTGCGC GGATATTCAT TGGCGCAAAT TAGGCAAGAG
37501 CAGGGAATAC AAGAAATCCT AAACACGATC GCGTTTCCGT ACGTGACTCA
37551 AGGCAACGGC TTGCACGTCG ATTCGATATA CATCGATCAC ATTGACGTGC
37601 GCGCTTACGG CTATTTGATA AATTCATACT TTACGTTTGC CTATTACACG
37651 TACTATTTTG GAGACGAGGT AATCAACACG GTGGGTTTGA CGAGAGCCAT
37701 CGAAAACGTG GGCAGTCCCG AGGGAGTTGT GGTGCCAGGC GTCATGTCTC
37751 GAAACGGCAC GTTGTACTCT AACGTGATAG GCAACTTTAT TACGTATCCG
37801 TTGGCCGTCC ATTCGGCCGA TTACTCCAAA GTGTTGACCA AACTTTCAAA
37851 AACATATTAC GGTTCGGTTG TGGGCGTAAC GAATAGGTTG GCTTACTACG
37901 AATCCGATCC CACAAACAAC ATTCAAGCGC CCCTGTGGAC CATGGCGCGG
37951 CGCATTTGGA ATCGGCGCGG CAGAATTATC AACTATAATG CCAACACGGT
38001 GTCGTTTGAG TCGGGTATTA TTTTGCAAAG TTTGAACGGA ATCATGCGCA
38051 TCCCGTCGGG CACCACGTCC ACGCAGTCGT TCAGACCGAC CATTGGCCAA
38101 ACGGCTATAG CCAAAACCGA CACGGCCGGC GCCATTTTGG TGTACGCCAA
38151 GTTTGCGGAA ATGAACAATT TGCAATTTAA ATCGTGCACG TTGTTCTACG
38201 ATCACGGCAT GTTCCAGCTA TATTACAACA TTGGCGTGGA ACCAAACTCG
38251 CTCAACAACA CAAACGGGCG GGTGATTGTG CTAAGCAGAG ACACGTCGGT
38301 CAACACCAAC GATTTGTCAT TTGAAGCGCA AAGAATTAAC AACAACAACT
38351 CGTCGGAAGG CACCACGTTC AACGGTGTGG TCTGTCATCG CGTTCCTATC
38401 ACAAACATCA ACGTGCCTTC TCTGACCGTT CGAAGTCCCA ATTCTAGCGT
38451 CGAACTAGTC GAGCAGATAA TTAGTTTTCA AACAATGTAC ACGGCCACGG
38501 CTTCGGCCTG TTACAAATTA AACGTCGAAG GTCATTCGGA TTCCCTGAGA
38551 GCTTTTAGAG TTAATTCCGA CGAAAACATT TATGTAAACG TGGGCAACGG
38601 CGTTAAAGCC CTGTTTAATT ATCCCTGGGT AATGGTCAAA GAAAATAACA
38651 AAGTGTCTTT CATGTCGGCT AACGAAGACA CTACTATACC ATTTAGCGTT
38701 ATAATGAATT CCTTCACCTC TATCGGCGAA CCAGCTTTGC AATACTCTCC
38751 ATCAAATTGC TTTGTGTATG GAAACGGTTT CAAATTGAAC AACAGCACGT
38801 TTGATTTACA ATTTATTTTT GAAATTGTGT AATTATATTT AGGGAGAATG
38851 TGATATTCAA AAGACTGACT GTTAACACAA AAGACTGATA TTGTTGTTGT
38901 TACAAAATAG ATAATAAAAC AAAAAATAAA TTAAATATTA TTTATTTATT
38951 AAACTGTTTA ATTTTAATGC TAACGCGTAC AAATCACGCT GTTCCGACGT
39001 GGACATGGAA TTGCGCAGAA AAGTCTTGAT AGTGTCGATT TCTTCGCCGT
39051 CATCCACTTC CATATATTTG ATTTCTTCCT CGATTTGCAT TTCCAAGTTT
39101 GCGTATTCTT GCAAATAATA ATCTAGTCGT TGGGCGACCT CGCCAATTTT
39151 AAATAATACA TTATCCGACA CCAAATGCCA GCGAGTGACT GTGCGCTCCA
39201 TCATCCTGGC ACTTTTTAAT GTGAATATTA AAAGGTTGTT GCATATATAT
39251 CGTTAAACGT TTATGTTTAC TTTCACGTTA GCTCGTTTCA TTGATGTAAA
39301 CATTTAGTTT TATAACAGCG TCGGTAATTT TATTTTTTAA AGTAAACAGA
39351 CCAAAATCAA AGGTGTCTTC GACAGGTACG ATTATTTTCC CATTGACACT
39401 GTTTTCGTGC ACAGATATAA TTTTATCACC GTTTATTATT TTGCCCAAAC
39451 ACACGTACTC GTTTCTTCTC AAGCCAACTA TTTCTAAACA ATTCACTTTT
39501 CTATTATCGT GTACGCAATT AAAAGTAAAC GAAGCGCTAC AATTGTCGTA
39551 TTCTATTACA ATTCTGCGGC ATTTATAAAA TTTATTAATG TTGACGCAAA
39601 TTCCATGCAG CGCATCCATT TCGTACTGCA AATGCGGCGC AATTAAAAAA
39651 TTTCCTCGTC GTTGTTAACA ATCTTGGGCG CTAAAAAGCA CGCCAACACG
39701 CCCACGTCTT TAATGCAATA TTCCAATTTG AACGGCAGTT CCTCGGACAT
39751 GTATATTGTC ACGGTGGGCG CCAAAGGAGC GGCTTTAGCA AAATGACACA
39801 AGTAATCGCC CGCAAAAGTG TGCGTTACGG TTTGCTTTGC TTTGAGAACG
39851 GAAAAGTTTT CGTTGTCCGC GCTCATCTGC ACGTCCGCCG AGCCAATGTC
39901 GCCATTTGCT CTAAACTGCA GACCCTTCTT GGAACACGAC ACAATAATAT
39951 CGTGGTCGAA TTGCGTCATG TCTTTGCACA CCTGCGCAAA CTCGACGCTC
40001 GACATGTGGA CGACGCAATC GTAATCGCTA TCCGGAATTC CCAAATGTTC
40051 CACGTCGATG CACATCAACT TGAGCGTGTA CGTGCAGATT CTATTGTCGT
40101 TGTTGAACAC GAACGCCATC ACATCGCCCT GATCTTCCGC TTTCATCAGT
40151 ACAGAGCTGC GCTCGTTAAC GCATTTGACA ATTTTACTTA AACTGTTTAT
40201 GGACACGTTG AGCGGCACGT TGCGGTCACA TCTATATTTT TTGAAACCCT
40251 CGGCGTGTAG TTGCAACGAC ACGAGCGCGA CATGCGAGGT GTCCATAACC
40301 TGCATGCTTA CGCCTCGATT ATCACAATCA AAAGTAGCGT GCGGCAGCAG
40351 ATCCTTAAAA GTTTCCACCA GCCTCTTCAA AACTGCGCCG GTTTTAAATT
40401 CCGCTTCGAA CATTTTTAGC AGTGATTCTA ATTGCAGCTG CTCTTTGATA
40451 CAACTAATTT TACGACGACG ATGCGAGCTT TTATTCAACC GAGCGTGCAT
40501 GTTTGCAATC GTGCAAGCGT TATCAATTTT TCATTATCGT ATTGTTGCAC
40551 ATCAACAGGC TGGACACCAC GTTGAACTCG CCGCAGTTTT GCGGCAAGTT
40601 GGACCCGCCG CGCATCCAAT GCAAACTTTC CGACATTCTG TTGCCTACGA
40651 ACGATTGATT CTTTGTCCAT TGATCGAAGC GAGTGCCTTC GACTTTTTCG
40701 TGTCCAGTGT GGCTTGTTTT AATAAATTCT TTGAAAATAT TGTCGGGTGT
40751 ATTATTAAAT AGCATGTATG GTATGTTGAA GATGGGATAA CGCTTGGCGT
40801 GCGGGTCGTC ATGATTTCCA CCGCGCACCA CATATTTGCG CTCAATTTTA
40851 TCAAAATTGG ACTGGCGAGA CAAAAACGAG ACGGGCGACA GGCATATTTG
40901 GGCGTGCGTA CCATCTTCGG CCATCCACTC GGTCAGGTCT TCGCTGCGGT
40951 TAAACACACC TTTCTGACCG TGAATGCCAC ATATTTTTAT TCCTTCCAAA
41001 TCGTTGGTGG ACGTGACTAT GACTATTTTA AGCATAACGT TGTCGCCGTT
41051 AACCACCATG CTGGCGTCGA GTTTTTCAAT TTTTTGATTT TTAATTTGTC
41101 TAAAGTAAAC GTACACTTTG TAAACGTTAA AATTGCCGTT GGTGCACGTT
41151 TCAATTTTGT ACCGTCGGCC GTCGTACACC CAATTAATCT TTGCGTTGCT
41201 CACCAACACA CCGGCCATGT ACAGCACAAG TCCGTCGTCT AGCGCAACGT
41251 AATTTTTGTC GCTACTATTC GTAAACTTTA CTAAACACGA CTGCTTGGGG
41301 CCGACCACAA GCTTGCCCTT CAATTTGTTC ACTTTGTTGT TGTATAAACA
41351 AATGGGCAGC GCAATGTGCG GAATGTACGG ATCTTCGGCG GTCATGAGTT
41401 TATTGTCTCG CACCAACGTC CACAATTTAA ACATTTTATT GTTGAGCAAA
41451 ATGGACTTGT TTACCGCCAC AGAGTAGCCA TTTGGTAAAC CCGATACGCA
41501 ATTTTCCTCT TTGTACTCAA ACACGGGCAT GGCATTCTTT AGATTGGTTA
41551 GGGACACAAT CAATTTGGGT ACGGGCGTGG TATGAAATAA ATGTATAAAA
41601 TTACGATAAT AATACTGCTC CAACTTGGAC ATGAGCGATT TGACGTCATC
41651 GTTTTCTACG ATCGTACACT GAATAATGGG ATTATAGTAT ATAGAATGTT
41701 TATAGTGGTA TTCGTAGGGT GTCAACAATA CGTTAATGTC GGCTTCGTTG
41751 TTCACCCGCA ACTTTTTTTT GATGCATATC ATTCCTTCGT GATGATTAAC
41801 GTAAAGTATT CTGTCTGTAA TCTTCAATTC GATGGGCGCC ATGTTTCTTT
41851 TCATAGTGTA CACGATAAAC GACGTGTTTG ATTTTAAACA TTTTAAATTT
41901 GTGGGTCTAT CATTAAACGC GATCAGCAAC GAGTCGTCTT GAACGTCGTT
41951 GAGGTCGTCC ACGAACGCGA CCAGATTGTG TTTTAGCAAA TATTGAAATT
42001 TTTGCGCAAC CATTTCGTAG TCCACGTTGG GCAAACATGC GTTGCGGCAA
42051 AGGAAAAACT TTTTGCCCGC CACGGTCATT TCGCCGTGAA AAAAACTGCC
42101 AATAAATTTC ACAAAATCCT TTTTTTGCTT CAACATTTTC TGGCGCATGC
42151 TGTCGTTGGT GATTCGCGCC ACCTCGTTGC CGACGCGATA TTTTAACACG
42201 GGCAACGAAA TTTCAATATT GTTATTGCTG CTGTTGTCCT GTTGATTGGG
42251 AAAGACTTTG CGTTGCTTGC TAAAAGTTTT CGATACGCAA TATATGAGAC
42301 GCCCGTTGAC TATACAATCG ACAATCTTTT TCGACTCTTT GTTGTACAAG
42351 ACGCTTTGAA TTTTACGACG CTTGTTCGCC ACCGTGTACG CGTCGTCGTC
42401 GGCCGTCTTG TCGAGAACTC GTTGATAGTT TTGCAAAATT GTCGAAGTTA
42451 ATAACAGTTC TATCAAATAG GCGTGCTTGT ATACAATTTT GTTGGCCAAA
42501 CTGTCTATAG AATAGTTTAT GTCGTGATTC ATAATAATTT TTATGTGTTC
42551 CACGAGTTGT TGCTTGTGAA GCGTGTTGTA TTCGAAGAGA AAATCGAGCG
42601 GTTTCCATTT GCCGCTGTTG GCCAGATATG TTTCCAGCAC AGAATTTAAA
42651 TCTTCCGTCA CTACGTAATC GCTAGCGTAC ACGTCTCGAG CAAACAGGAC
42701 GTCGTCTTGT TTGTCGTAAA CTAGTTGGAT TGCGCGATTG ATGTGCTTCT
42751 CTTGATCCAC GTTGCCGTAC AAAAACATGC GTTTGCAATG TTTGGCGTAT
42801 AGCTTGTCGT AGAAATTGTG CACCAAAACG TTGTTGTTCA TCATTATGTT
42851 GGGAAAACTC AAAAATCTGC CGTCCAGCAT AAAAGTTCCG TTAATATTGT
42901 TGTTTGCGTC GACATCGTCC GTTTCTCTAA ATTGCTTGTC TAAGCGCGTG
42951 CCGAATATAA CGGGCACACA TTTATGCATT ACGCAACTGA GCTGTTCATT
43001 AAGAGCGCAA CACAAATAAG ACTTGCGTTC TTGAATAGCG CAAAAAAGCA
43051 TACGTTCATT GCTGTTTGTA GCGCAATCAA AAGTATATTT TAATTTGTAT
43101 TTATTTTCAA TTCTATCGTA CAACTCGTTG AAATCTTGAA CCACGTCCGT
43151 CATCGTGAAG CGATTACTGC GCACTAATTA TGTCTAAACG TGTTCGTGAA
43201 CGGTCGGTTG TTTCGGATGA AACGGCCAAA CGCATTCGAC AAAACGAACA
43251 CTGTCATGCC AAAAATGAAT CTTTTTTGGG GTTTTGCAAC TTGGAAGAAA
43301 TTGATTATTA TCAATGTTTA AAAATGCAAT ACGTTCCGGA CCAAAAGTTT
43351 GACAACGATT TTATTTTAAC AGTGTACAGA ATGGCCAACG TGGTGACGAA
43401 ACAAGTTAGA CCGTATAACA GTATCGACGA AAAGCACCAT TACAACACGG
43451 TGCGTAACGT GTTGATTTTA ATAAAAAATG CGCGTTTAGT GCTTAGTAAT
43501 AGTGTCAAAA AGCAATACTA TGACGATGTG TTAAAATTGA AAAAAAATAC
43551 AGACTTGGAA TCGTACGATC CATTGATTAC GGTCTTTTTA CAAATTGGCG
43601 AATCTGTAAA TGAAGAAATA CAAAAACTCA GAAAAGCTTT GGTCAATATT
43651 TTTACTAATA AACCCGACAA GTCGGATATA AACAACCCAG ATGTAGTTTC
43701 GTATCAATTT ATTTTTGGCA GAGTACAAAA ATTGTATAAC AGGGCAATTA
43751 AACAAAAAAC TAAAACTATA ATTGTAAAAC GTCCTACAAC TATGAACAGA
43801 ATTCAAATAG ATTGGAAAAC TCTTTCCGAA GACGAACAAA AAATGACTAG
43851 ACAAGAAATT GCCGAAAAAA TTGTAAAGCC TTGTTTTGAG CAATTTGGCA
43901 CTATATTACA CATATACGTA TGTCCTTTAA AACACAACCG AATTATTGTC
43951 GAGTATGCAA ACTCAGAGTC GGTACAAAAA GCCATGACTG TAAATGACGA
44001 CACTCGATTT ACAGTTACAG AGTTTTCCGT GGTTCAGTAC TACAACGTGG
44051 CCAAAACAGA AATGGTGAAC CAGCGAATTG ACATAATAAG CAAGGACATT
44101 GAGGATTTAA GAAACGCTTT AAAATCTTAC ACATAAATTA AAATATCGAA
44151 CAAAGGAAAA AAACAATTGT AACAAAAATA ATTTACATTA AAATTTACAA
44201 GTTTTTTTCT AGTGTCGTAC TTTTTTACAA TGCGTCTGTT GTCCGTCGAG
44251 CATTGCAAAC ATATTGTGGA CGGCGCAAAA TAGCAAACAA AAGGCACGTC
44301 CGCGCTCTCC CACGCTATTC TAAAACGATG AATCCATATT AATTTTTCAT
44351 TGTCGCCAAA CGTCGCTCCG CTGCCTCCTT CCAATAACAA ATACTCAGAA
44401 ACACAAACAT GTACAATTGC TGTCGCGGCG TTAATTGTCG CTGTTTTTCC
44451 AAATAGTCTA TTATGGGAAA CAAACACTTG TCACAACACA AATACTCGTT
44501 AATTGTCACA ACCGACAAGC ACATTTGGCA AAATGCGTCG CAATTTTTGT
44551 ACGGACGAGA TTCTATGCGA AGTTCGTTGT CCATGACGTC TTGGGTCCAC
44601 TTTTTCAACA AGACACTTTT ATATTTGTGA TTTGTACAAC TTTGGTACGT
44651 GTTAGAGTGT TTTTGATAAG CTTTGATAAG TTTAAAACTG TTGGAGTAAG
44701 GCCACGTCAT TATGTTCTGC ACCTTTTGTT TAAAAGACAG AAATTACTAT
44751 ATGTTCAAAC TATTTAAAGA TTATTGGCCA ACGTGCACGA CAGAATGCCA
44801 GATATGTCTT GAGAAAATTG ACGATAACGG GGGCATAGTG GCAATGCCCG
44851 ACACTGGCAT GTTAAACTTG GAAAAGATGT TTCACGAACA ATGTATTCAG
44901 CGTTGGCGTC GCGAACATAC TCGAGATCCC TTTAATCGTG TTATAAAATA
44951 TTATTTTAAC TTTCCCCCAA AAACACTAGA GGAGTGCAAC GTGATGCTTC
45001 GAGAAACTAA AGGGTCTATA GGCGATCACG AAATTGATCG CGTTTACAAA
45051 CGCGTTTATC AACGCGTTAC ACAGGAAGAC GCCCTGGACA TTGAACTCGA
45101 TTTTAGGCAT TTTTTTAAAA TGCAATCATG ACGAACGTAT GGTTCGCGAC
45151 GGACGTCAAC CTGATCAATT GTGTACTGAA AGATAATTTA TTTTTGATAG
45201 ATAATAATTA CATTATTTTA AATGTGTTCG ACCAAGAAAC CGATCAAGTT
45251 AGACCTCTGT GCCTCGGTGA AATTAACGCC CTTCAAACCG ATGCGGCCGC
45301 CCAAGCCGAT GCAATGCTGG ATACATCCTC GACGAGCGAA TTGCAAAGTA
45351 ACGCGTCCAC GTAACAATTA TTCAGATCCC GATAACGAAA ACGACATGTT
45401 GCACATGACC GTGTTAAACA GCGTGTTTTT GAACGAGCAC GCGAAATTGT
45451 ATTATCGGCA CTTGTTGCGC AACGATCAAG CCGAGGCGAG AAAAACAATT
45501 CTCAACGCCG ACAGCGTGTA CGAGTGCATG TTAATTAGAC CAATTCGTAC
45551 GGAACATTTT AGAAGCGTCG ACGAGGCTGG CGAACACAAC ATGAGCGTTT
45601 TAAAGATCAT CATCGATGCG GTCATCAAGT ACATTGGCAA ACTGGCCGAC
45651 GACGAGTACA TTTTGATAGC GGACCGCATG TATGTCGATT TAATCTATTC
45701 CGAATTTAGG GCCATTATTT TGCCTCAAAG CGCGTACATT ATCAAAGGAG
45751 ATTACGCAGA AAGCGATAGT GAAAGCGGGC AAAGTGTCGA CGTTTGTAAT
45801 GAACTCGAAT ATCCTTGGAA ATTAATTACG GCGAACAATT GTATTGTTTC
45851 TACGGACGAG TCACGTCAGT CGCAATACAT TTATCGCACT TTTCTTTTGT
45901 ACAATACAGT CTTGACCGCA ATTCTTAAAC AAAACAATCC ATTCGACGTA
45951 ATTGCCGAAA ATACTTCTAT TTCAATTATA GTCAGGAATT TGGGCAGCTG
46001 TCCAAACAAT AAAGATCGGG TAAAGTGCTG CGATCTTAAT TACGGCGGCG
46051 TCCCGCCGGG ACATGTCATG TGCCCGCCGC GTGAGATCAC CAAAAAATTT
46101 TTTCATTACG CAAAGTGGGT TCGAAATCCC AACAAGTACA AACGATACAG
46151 CGAGTTAATC GCGCGCCAAT CAGAAACCGG CGGCGGATCT GCGAGTTTAC
46201 GCGAAAACGT AAACAACCAG CTACACGCTC GAGATGTGTC TCAATTACAT
46251 TTATTGGATT GGGAAAACTT TATGGGTGAA TTCAGCAGTT ATTTTGGTCT
46301 GCACGCACAC AACGTGTAGC ATCGCCAGTA TTTAACAGCT GACCTATTTG
46351 TTAAACAAGC ATTCTTATCT CAATAATTGG TCCGACGTGG TGACAATTGT
46401 ATCCACAATC ATGAAAAAAG TAGCGCTTGG AAAAATTATC GAAAACACAG
46451 TAGAAAGCAA ATATAAAAGC AACAGTGTGT CGTCGTCATT GTCAACGGGC
46501 GCCAGTGCAA AATTGAGTTT AAGCGAATAT TACAAAACTT TTGAAGCAAA
46551 TAAAGTGGGC CAGCACACTA CGTACGACGT GGTCGGCAAG CGAGATTACA
46601 CGAAATTTGA CAAATTGGTG AAAAAATATT GACATGCTGC GATCAATCAT
46651 GCGACGTTTC AAGAGTACAA ACAATCTCAG CAAAAAACCC TCCGATTATT
46701 ATGTAGTGTT ATGTCCAAAG TGTTATTTTG TGACGTCGGC CGAAGTGAGC
46751 GTGGCTGAAT ACATAGAAAT GCATAAAAAT TTTAACACGA AATTCGCCGA
46801 TCGGTGCCCT AACGATTTTA TTGTGACCAA CTCTAAAAGT TGGAATAATC
46851 ATGAAAATTG TTCTGCCCTA TTTTACCCTC TGTGTTAATA AAGTTTGTTG
46901 TTTGTATTTT GTGGTTTTAT TTATTTACGC TAGATATTGG GTTTAAGGTT
46951 CTTAGAAATA GAGTTGTATT TTCCCTACCA AAAGGGATTT GAGCTTCATA
47001 TAAATACAAT ATTCGCTCGA CAAGCGGTTT ATTTCACTCG GAGGTATTAT
47051 ATCAGGCAGT CGAACGTGCG CGATGAAACA TCCCGTTTAC GCTAGATATT
47101 TGGAGTTTGA TGATGTAGTG TTAGATTTGA CTAGTTTAAT ATTTTTAGAG
47151 TTTGATAACG CTCAAAATGA AGAGTACATT ATTTTTATGA ATGTAAAAAA
47201 GGCGTTTTAC AAAAACTTTC ACATTACTTG TGATCTGTCG CTTGAAACGC
47251 TGACCGTGTT GGTGTACGAA AAAGCTCGCC TAATTGTGAA ACAAATGGAG
47301 TTTGAGCAGC CGCCAAACTT TGTTAATTTT ATCAGTTTCA ACGCGACCGA
47351 CAACGACAAC TCCATGATAA TAGACTTGTG TTCCGACGCG CGCATAATCG
47401 TGGCCAAGAA GCTGACGCCC GACGAAACGT ATCATCAGCG CGTGTCCGGA
47451 TTTTTGGATT TTCAAAAACG TAACTGCATA CCTCGGCCCC CAATCGAGTC
47501 GGACCCAAAA GTGCGAGACG CCTTGGATCG TGAACTAGAA ATAAAACTAT
47551 ACAAGTAGAA AAAAATTAAT TTATTAATAG TTGTAATAAT TATCTTCGTC
47601 CTCATCTTCG CTGGTGTCAT AATGCGGTGG TGTGTTTGTG TTTTGTTTTA
47651 ATCGTTTGCG CGTCGACACC ACTTCGCCGA TAGGAAATTT TTTGGATTTC
47701 GCATTAAATG CCCTCTTAGC GACGCGCCGT TTACGACTAC TAAACATGTT
47751 GACGCGCTCG TCGTCTTCAG TGTCATAATC CGTGCTAGTG TTTTCGTTGT
47801 TATTTTCTAT GAGACGATCG TTTGATTTAG TTTTCGTAGA ATTGTCCGCG
47851 TTATCGTCGC TTTCGTCGAT GTCGTCCCTA ACTATCTCGT AGGCGGCTTT
47901 GCGCGGAATC CAAGATTTTG CAATGTATCT ATTTTAACGT ACTTTTCTTC
47951 GAGCGCTTTT CTAGCTTTAT GCATAGCAAT GTCTTCGTCG CCGCCGTTCA
48001 TTTTATGATA CTTTGTAAAC GTCTCGACGA ATAACTTTTT GGCGCGAGGA
48051 GGCATTTTTT CATTGTATAA CATATCGGGA ATTTGATACA TTGTAATTAG
48101 AATTAAGCAA GTTCGTCTTC GGTTGTACTG TATTCGGTTT CTGTATCTGT
48151 AGTGGAATCC TCTGTACTAG TAGTAGTGTC GCTATTGTTG GCGTCAGGCC
48201 TTGGCTGCCA TTTACCGTCT ATCAACATGT ATTTTTTCCT AACAGCACAA
48251 CATGCTAGCT TGGTAGCTAT CTGTGTCGAC TTATATTTTT GTAAACTACG
48301 ATCGTAGAAT TTTTCAAATA TCCTCTTACC GTTATAGGGA AGGTTTTGAT
48351 AATATTTAGG CAACATATCA ATAAAAGACA ATATAAAAAC TTTGTGTTTG
48401 TGTTTTATTT ATCACATAAA ATGGACGTCT GGCAAGAATC ACAACCAATA
48451 TTAGTGTTTT TTTTCTTACA TTACGAGATT CAACTTGATA CTAAAATTAA
48501 TTATTAATTA AATTAAATTA AATTTTGAAG CATTTTTTCG CTATCGTTTT
48551 CAGACTCAAA ATTATCGACG CTATCGCTAT GAAAAGCGTA ATATTTGTTG
48601 GCTTTGAGAT ATTCTATATT TTGCTCATTT TTAACAATAA ACACGCGACT
48651 CTTTTCGTCG CGTCTCACCA TAACACCGTT TTTACAAATG GAAATGTATT
48701 TGTAAAACGG CAACAGAGCG TCGCGAGTTT TTTTAAGTAA CAGCTTTTGC
48751 TCCGCTGTGG CGGCCACAAA TATTTTTACG GGCCCGTCGT AATTAATGTT
48801 TAAATTAAAA TTTTTAAGTC GACGCTCGCG CGACTTGGTT TGCCATTCTT
48851 TAGCGCGCGT CGCGTCACAC AGCTTGGCCA CAATGTGGTT TTTGTCAAAC
48901 GAAGATTCTA TGACGTGTTT AAAGTTTAGG TCGAGTAAAG CGCAAATCTT
48951 TTTTAAATAA TAGTTTCTAA TTTTTTTATT ATTCAGCCTG CTGTCGTGAA
49001 TACCGTATAT CTCAACGCTG TCTGTGAGAT TGTCGTATTC TAGCCTTTTT
49051 AGTTTTTCGC TCATCGACTT GATATTGTCC GACACATTTT CGTCGATTTG
49101 CGTTTTGATC AACGACTTGA GCAGAGACAC GTTAATCAAC TGTTCAAATT
49151 GATCCATATT AACTATATCA ACCCGATGCG TATATGGTGC GTAAAATATA
49201 TTTTTTAACC CTCTTATACT TTGCACTCTG CGTTAATACG CGTTCGTGTA
49251 CAGACGTAAT CATGTTTTCT TTTTTGGATA AAACTCCTAC TGAGTTTGAC
49301 CTCATATTAG ACCCTCACAA GTTGCAAAAC GTGGCATTTT TTACCAATGA
49351 AGAATTTAAA GTTATTTTAA AAAATTTCAT CACAGATTTA AAGAAGAACC
49401 AAAAATTAAA TTATTTCAAC AGTTTAATCG ACCAATTAAT CAACGTGTAC
49451 ACAGACGCGT CGGTGAAAAA CACGCAGCCC GACGTGTTGG CTAAAATTAT
49501 CAAATCAACT TGTGTTATAG TCACAGATTT GCCGTCCAAC GTGTTTCTCA
49551 AAAAGTTGAA GACCAACAAG TTTACAGACA CTATTAATTA TTTAATTTTG
49601 CCCCACTTTA TTTTGTGGGA TCACAATTTT GTTATATTTT TAAACAAAGC
49651 TTTCAATTCT AAACATGAAA ACGATCTGGT TGACATTTCG GGCGCTCTGC
49701 AGAAAATCAA ACTTACACAC GGTGTCATCA AAGATCAGTT GCAGAGCAAA
49751 AACGGGTACG CGGTCCAATA CTTGTACGCG ACGTTTCTCA ACACGGCCTC
49801 GTTCTACGCC AACGTGCAAT GTTTAAATGG TGTCAACGAA ATTATGCCGC
49851 CGCGGAGCAG CGTAAAGCGC TATTATGGAC GTGATGTGGA CAACGTGCGT
49901 GCATGGACCA CGCGTCATCC CAACATTAGC CAGCTGAGTA CGCAAGTCTC
49951 GGACGTCCAC ATTAACGAGT CATCTACCGA CTGGAATGTA AAAGTGGGTC
50001 TGGGAATATT TCCCGGCGCT AACACAGACT GCGACGGTGA CAAAAAAATT
50051 ATTACATTTT TACCCAAACC TAATTCCCTA ATCGACTCGG AATGCCTTTT
50101 GTACGGCGAC CCTCGGTTTA ATTTCATTTG CTTTGACAAA AACCGTTTGT
50151 CGTTTGTGTC ACAACAAATT TATTATTTGT ACAAAAATAT TGACGCAATG
50201 GAGGCGTTGT TTAAATCTAC ACCATTGGTT TACGCGCTGT GGCAAAAACA
50251 TAAACATGAG CAGTTTGCAC AGAGGCTAGA GATGTTGTTG CGTGATTTTT
50301 GCTTAATTGC CAGTTCAAAC GCTAGTTATT TACTTTTTAA ACAGCTTACA
50351 CAGCTCATAG CTAACGAAGA AATGGTGTGC GGAGATGAAG AAATATTCAA
50401 TTTAGGCGGC CAATTTGTAG ACATGATTAA AAGCGGTGCT AAAGGCAGTC
50451 AAAATCTGAT TAAAAGCACG CAACAATACC GACAGACTTT AAATACAGAT
50501 ATTGAAACTG TGTCTTCACG AGCCACCACC AGTTTAAATA GTTACATATC
50551 TTCTCACAAT AAGGTAAAAG TGTGTGGCGC CGACATATAT CATAACACGG
50601 TTGTGTTACA GAGCGTGTTT ATTAAAAATA ACTATGTTTG TTACAAAAAC
50651 GACGAACGTA CAATCATGAA TATTTGCGCT TTGCCCTCTG AGTTTCTGTT
50701 TCCAGAACAT TTGCTCGACA TGTTCATTGA ATGATAATAT AAATAGAGCG
50751 CATTTGATTG CATGCAATCA GTGTTTTATT AATTTTAGAG CAACATGTAC
50801 GATAAATTTA TGATCTATCT TCACTTGAAT GGGCTGCACG GAGAAGCAAA
50851 ATACTACAAA TATTTAATGT CTCAAATGGA TTTTGAAAAT CAAGTAGCCG
50901 ATGAAATCAA GCGGTTTTGT GAAACTCGTC TGAAACCGGC AATCAGTTGC
50951 AACACTTTAA CTGCGGAAAG TCTCAATACG CTCGTAGACA GCGTAGTCTG
51001 CAAAAATGGA CTGTTAAATC CTTACGCCAA AGAAGTACAG TTTGCTTTGC
51051 AATATCTTTT TGACGATGAC GAAATATCCA AACGAGATCA AGATGGCTTT
51101 AAACTATTTT TATTACATAA TTATGACAGG TGTGAAAATA TGGAAGAATA
51151 TTTTTTAATT AACAATTTTA GCATAGCAGA CTACGAATTT GAAGACATGT
51201 TTGAAATTGT TCGTATTGAT TGTAGAGATC TGTTATTACT TCTTGCTAAA
51251 TATAATATGT AATTAAAATT TTGTTTGTTT TATTAAAATC CTGGATTAAA
51301 AAATGACGAA TAATTTGATT TGCGTGCACG CCAACAAGAT TCTTCGTCAT
51351 TATGATCAAT GCGTGCATCA AGTTTATGCT TTTGTAATTG GCTTCTGACC
51401 ACTTTAGCCA TTTGAGCGTA TCTGCATTCG TCGTCTAGAG TTTCAAACAC
51451 CAGATCGGCG CAATTATAAA ATCCTTCACC CACGGGATCT ATGCGCTGCC
51501 AACGCACATA CATTACAAAT TGATTTGACC TGTACGGTAT TACTACGGGT
51551 ATAGAATAGA CTAGACTGTT GTCACATAAT GAATCGCCCG GATTTGGAAT
51601 TAAATTTGAA TCGTTACCAC CTATGTATTC TAATTCGTTC CAAGTTATTG
51651 GATTGCGACG ATCCCAGTTT GATTTAGTAA TAAACACTTC AAAATAACTG
51701 GGCTCGTGTA TGGCTGTTGG ACAAAAATGA ACATTCATCT GATAAACCGG
51751 TTGATAGCGA TTTAAATATA GCGTATTTGG CCTCCAGTTG TTAAAAGGTT
51801 CGTCCATTCC GCTTTTATCA CCAAACACAG AATTGCGATC GTTTGAACCG
51851 GCACCGCAAA GTGTGTGCGG CACAACCCTT TGTTTGATTA GGTCAAAATC
51901 GTCATAATTA GGACCGGCCA CAGCCGCGTA TTCCATATAC TGTTGAAACA
51951 TGTATTGCGC TGTGGAAGCG GCCGCCCCGG ATTCTAAATC GAGAGCTCGA
52001 TATTTATAAT AGACTGATTT GTAAGCATTG CGGCACGCGG CGTCGGGAAT
52051 GTTATCGCCA TTGTCGGGCC AATAAAAGTT TCCATCTTTA AAACATTTAT
52101 ATTGACGGGC CGTCGGCACG GACAAATAGC CGTGAGAGCG CACTGCCGGC
52151 GCGTGAATCG CAGCAAACAA TGCAATTAAT AATGCAATCA TTATGATTAT
52201 ACTTATAGAA CACTAATCGG AATAATAACC GCTGTCGTAA TCTTGGTCAA
52251 AAACGTTATG TTGAAACATA ATAACACCTT ACAGTAACAT ACAATAAAAC
52301 AACATAGTAT CGTATATAAT TATAAACTTT ATTTTTTCAT TTTATACAAA
52351 CAAAATTTAT ACGTATTGTT AGCACATTGA GTGTCATTTT CGCTGTCTGA
52401 ACTATCACAA TCATCGTCAT CATCATCATC ATTGTCATCG TCGTCGTCAC
52451 GTTTGCGTTT GACACTGCAT TTTTTTTGGT TAATTTTCAC TAACACTGGT
52501 TCTTTTCGAT CGTACAATTG ATTCTGCATG TACTTTTGCA TGATCGCGGT
52551 AAAACACTTT GCAATTTTAT CCTTTTGTTC GTCGCCAAAT ATTTCCAGCA
52601 ACTCGTTCAT AAATGTGCAC AAAATGCCCA TGTGTTTTAT CCAGCTGATT
52651 CGCATTTTCA CTGGATCGAA CAAACGCAAG GGGTACGCTT TTTCTGTTAC
52701 CTTGCCTTCG ATGTCTATCA AAAGGTACGG GATACGATCT CCGTTGCCGG
52751 GCACAAAATC CGTGCCTTTG TTAACCAAAA TTTCTCTACA ATGCCTAGCC
52801 ACCGTAATCA CGCGTCTTTT GGGTGACGGA CCCTCATTAT CGTCAGTTGA
52851 TTTGCGTTTT TTGCCCGGGT TATCGTTATA GGTCATACTA AAGCTGTAGT
52901 CGGTCAACGA TTTTGATTTG GCAAACTCAT CATAGTATTC ATAAAAACTA
52951 GTCTGTAAAC TTTGCAAACA TTTGTCCATG TCCAAATGAC GCAATATTTG
53001 TTCCACTGCC GTCCTAAACG CGATTCTCAT AAAAACGGGC ATATCCTTTT
53051 TAACTAACCA ACCCTTGTAT ACGATTTTAT TCTCACTGTT GAGATAGCAA
53101 TATTTTTTCT TTTTTAATAG TATTAAAACT TTCATTAAAT TTTCAAATGC
53151 CATTTTGTAA CCGTCCGTGA ATGAGTTATT AACGCGTGTC TCAACATGTG
53201 TGCATATTTG TTTTAATGTG TCGGTTTCGT TGGATATTTC GTTATAGTTA
53251 AATGTGGGCA AAACAAATGT AGAATCTGTG TCGCCGTACA CAACTTTAAA
53301 AGTGATGCTG CCCAGATTGA ATTTTTCTAA AATCTCAGGG TCGTTGCTCA
53351 AACCTTCAAT CAGAGAAATG GCCAGCCGCA ACTGATTGCG ACCAACTCTA
53401 GTGATGTAGT TTGCAAGCAC TTTGTAAAAA ATGCCATAAT AACCGTATAT
53451 GCTATTGGCG GTGCGCTTCA CGGAATTTTG TTTTTGATCG TACAGATCGT
53501 ACAAGAATGC CGATTCGCTT TGATTGTCGC GATTCTTTTT AAATTTGCAC
53551 CTTTCGCTTA ACAATTTTAA TAGCAATTTA ACAACTATTG CACGCGAATT
53601 GTGGTTCAAA TACACGTTGC CGTCTTCGCA TAAAATTAAA TTGGACAAAC
53651 AAGCACAAAT GGCTATCATT ATAGTCAAGT ACAAAGAATT AAAATCGAGA
53701 GAAAACGCGT TCTTGTAAAT GCCTGCACGA GGTTTTAACA CTTTGCCGCC
53751 TTTGTACTTG ACCGTTTGAT TGGCGGGTCC CAAATTGATG GCATCTTTAG
53801 GTATGTTTTT TAGAGGTATC AATTTTCTTT TGAGATTAGA AATACCCGCT
53851 GCGGCTTTGT CGGCTTTGAA TTGGCCCGAT ATTATTGACA GATCGTTTTT
53901 GTTAAAAAAA TACGGGTCAG GCTCCTCTTT GCCGGTGCTC TCGTTAATGC
53951 GCGTGTTTGT GATGGCTGCG TAAAAGCACG CCACGCTAAT CAAATGCGAA
54001 ATATTACATA TCACGTCGTC TGTACACAAA CGATGCAATA TACATTGCGA
54051 ATATACAGAA TCGGCCATTT TCAATTTGAC AAACAATTTT ATCGGCAACA
54101 TGCAATCCTG CACGTTGTAC TTGGCAATCA CGTCCAGCCG TCGAGTGTTG
54151 TACATCTTGA CCATTTCGGT CCAAGGCAAA TCGATTTTGT TTTCACCCAA
54201 ATAGTAACTA CTGATTGTGT TCAATTGAAA GTTTTCAACT TTATGCTGAT
54251 TAGAATCGCT GCTGAAAAAT TTATACAAAT CAATGTGAAT GTAATAGTTA
54301 AAATAATACG TGTCCACTTT GTTGCCCAAC TTGTTTATAA ACAGCTTTGT
54351 CGTCGGCGCC GCAGCCGGCA AATCGTAACG CTTTAATAGC ATTTTGGTTT
54401 TATTCAATCG TCCAAGTATA TAGGGCAGAT CAAATACGTC TCCGTTAAAA
54451 TCCAAAATCA CATCGGGATT TGTAATTTTT ATCATGTCAA AAAACGCTGT
54501 AATCATGTCG ATTTCATTTT GAAACATGAC CACATACGTG TCATCGTCAT
54551 AGGTCTCTGG AATCTGGGTC GGCAGCTTGT GATACATAAA ACAAAATTTT
54601 GCATACTCGT CGTTTTTGTA CACCACAAAT CCTATAGACA TTATGCAATC
54651 AACCGATGCT TTCGACATGT TGTGGCCGTC CGAATGAGTC TCAATGTCAT
54701 AGCACGACAA AACGGGCATG ATGCCGCTGG TTAAAGTCAT TTCATCGACC
54751 AACTCAAAGT CTTCATTAAA ATGTTGCAAA TTAAACATGC GCGTCGTCGA
54801 TCCACCGACA TAGTTATTTT GGCAGCGTTG TGTTTTCTTG AATCGCATAT
54851 AGGCGCCTTC CACAAACGGC GTTTGCATGT GTACGCGATT AACGTTGTGA
54901 AGAAACTTGT CCAAACACGC CGCGTTGTCC GATGGCGCTG CTTTGTTTCT
54951 TTCGTATTTA ATCACGTTTA TCTTGTTCAA ATAATTTCCT TCCACGCCCG
55001 GCGCCACAAA CGTGGTGTAG CTGATGCACT TGTTGCGGCA AGACGGAAAT
55051 ATGTGCTTGT CGTAGCATTG TTTGTAAGAA TACAAATTTA GTTTTACTTT
55101 AAAGTAAAAC TGCAGCACTC GTTCTTTGAT ATTTGTATTA CAAAATGCAA
55151 ACAAGCAACC TTGTTTTTCA TCGTAATGCA AACGAATGAT ACGAAACGTA
55201 TCGGCTGAAG TAATATTGAA TTCTCCTGGT TTTGCATATT CTGCAAAGCG
55251 CGTTTTGAGT TCATTGTAAG GATATATTTT CATTTTTAAA TATGCAGCGA
55301 TGGCCCAAAT ATGGAGGCAC AGACGTCAAC ACGCGCACTG TACACGATTT
55351 GTTAAACACC ATAAACACCA TGAGTGCTCG AATCAAAACT CTGGAGCGGT
55401 ATGAGCACGC TTTGCGAGAG ATTCACAAAG TCGTTGTAAT TTTGAAACCG
55451 TCCGCGAACA CACATAGCTT TGAACCCGAC GCTCTGCCGG CGTTGATTAT
55501 GCAATTTTTA TCGGATTTCG CCGGCCGAGA TATCAACACG TTGACGCACA
55551 ACATCAACTA CAAGTACGAT TACAATTATC CGCCGGCGCC CGTGCCCGCG
55601 ATGCAACCAC CGCCACCGCC TCCTCAACCC CCCGCGCCAC CTCAACCACC
55651 GTATTACAAC AATTATCCGT ATTATCCGCC GTATCCGTTT TCGACACCGC
55701 CGCCAACACA GCCGCCAGAA TCGAACGTCG CGGGCGTCGG CGGCTCGCAA
55751 AGTTTGAATC AAATCACGTT GACTAACGAG GAGGAGTCTG AACTGGCGGC
55801 TTTATTTAAA AACATGCAAA CGAACATGAC TTGGGAACTT GTTCAAAATT
55851 TCGTTGAAGT GTTAATCAGG ATCGTACGCG TGCACGTAGT AAACAACGTG
55901 ACCATGATTA ACGTTATATC GTCTATAACT TCCGTTCGAA CATTAATTGA
55951 TTACAATTTT ACAGAATTTA TTAGATGCGT ATACCAAAAA ACAAACATAC
56001 GTTTTGCAAT AGATCAGTAT CTGTGCACTA ACATAGTTAC GTTTATAGAT
56051 TTTTTTACTA GAGTCTTTTA TTTGGTGATG CGAACAAATT TTCAGTTCAC
56101 CACTTTTGAC CAATTGACCC AATACTCTAA CGAACTTTAC ACAAGAATTC
56151 AAACGAGCAT ACTTCAAAGC GCGGCTCCTC TTTCTCCTCC GACCGTGGAA
56201 ACGGTCAACA GCGATATCGT CATTTCAAAT TTGCAAGAAC AATTAAAAAG
56251 AGAACGCGCT TTGATGCAAC AAATCAGCGA GCAACATAGA ATTGCAAACG
56301 AAAGAGTGGA AACTCTGCAA TCGCAATACG ACGAGTTGGA TTTAAAGTAT
56351 AAAGAGATAT TTGAAGACAA AAGTGAATTC GCACAACAAA AAAGTGAAAA
56401 CGTGCGAAAA ATTAAACAAT TAGAGAGATC CAACAAAGAA CTCAACGACA
56451 CCGTACAGAA ATTGAGAGAT GAAAATGCCG AAAGATTGTC TGAAATACAA
56501 TTGCAAAAAG GCGATTTGGA CGAATATAAA AACATGAATC GCCAGTTGAA
56551 CGAGGACATT TATAAACTCA AAAGAAGAAT AGAATCGACA TTTGATAAAG
56601 ATTACGTCGA AACCTTGAAC GATAAAATTG AATCGTTGGA AAAGCAATTG
56651 GATGATAAAC AAAATTTAAA CCGGGAACTA AGAAGCAGCA TTTCAAAAAT
56701 AGACGAAACT ACACAGAGGT ACAAACTTGA CGCCAAAGAT ATTATGGAAC
56751 TCAAACAGTC GGTATCGATT AAAGATCAAG AAATTGCCAT GAAAAACGCT
56801 CAATATTTAG AATTGAGTGC TATATATCAA CAAACTGTAA ATGAATTAAC
56851 TGCAACTAAA AATGAATTGT CTCAAGTCGC GACAACCAAT CAAAGTTTAT
56901 TTGCAGAAAA TGAAGAATCT AAAGTGCTTT TAGAAGGCAC GTTGGCGTTT
56951 ATAGATAGCT TTTATCAAAT AATTATGCAG ATTGAAAAAC CTGATTACGT
57001 GCCGATTTCT AAACCACAGC TTACAGCACA AGAAAGTATA TATCAAACGG
57051 ATTATATCAA AGATTGGTTG CAAAAATTGA GGTCTAAACT GTCAAACGCC
57101 GACGTTGCCA ATTTGCAATC AGTTTCCGAA TTGAGTGATT TAAAAAGTCA
57151 AATAATTTCT ATTGTACCAC GAAATATTGT AAATCGAATT TTAAAAGAAA
57201 ATTATAAAGT AAAAGTAGAA AATGTCAATG CAGAATTACT GGAAAGTGTT
57251 GCTGTCACAA GTGCTGTAAG CGCTTTAGTA CAGCAATATG AACGATCAGA
57301 AAAGCAAAAC GTTAAACTTA GACAAGAATT CGAAATAAAA TTAAACGATT
57351 TACAAAGATT ATTGGAGCAA AATCAGACTG ATTTTGAGTC AATATCAGAG
57401 TTTATCTCAC GAGATCCGGC TTTCAACAGA AATTTAAATG ACGAGCGATT
57451 CCAAAACTTG AGGCAACAAT ACGACGAAAT GTCTAGTAAA TATTCAGCCT
57501 TGGAAACGAC TAAAATTAAA GAGATGGAGT CTATTGCAGA TCAGGCTGTC
57551 AAATCTGAAA TGAGTAAATT AAACACACAA CTAGATGAAT TAAACTCTTT
57601 ATTTGTTAAA TATAATCGTA AAGCTCAAGA CATATTTGAG TGGAAAACTA
57651 GCATGCTTAA AAGGTACGAA ACGTTGGCGC GAACAACAGC GGCCAGCGTT
57701 CAACCAAACG TCGAATAGAA TTACAAAAAT TTATATTCAT TTTCATCTTC
57751 GTCATACTTC AACAGTCCCA ACACGTTCAT GTTGTGATTC TCGCCGTTCT
57801 CGACAGTTAC GTAAATAGTT ACTTTGATTA AATTATCTTC CAGCAGCATT
57851 GAGATTTGAT TGAAATCCGC ACATAGCTTT TGTAGCGAAT CCGCTTCGGT
57901 TTTTTTATTT GTGTTGACGT AGAAAACAGA TTTGTTCCAT TTGCCCAAGT
57951 CGGAAGAGGT AGAACAGTCA TCCGAATCGG CAATGTTCAA CTCGTCGCTT
58001 TTAAACTGCA CAATAAACTT GTTATCGCCC ATGTCATTTT CTTCCAATTC
58051 GCTTTTTAAC ACATTTACAT TGTACGAAGC AACGTGTTTG TTCGATCGAC
58101 TAATGTTGAT CTTTGCGTTT GTGCAATTTT GCAAATTTGA ATATGCTTCG
58151 CTTTCTTTAG CCTCGCACAA TTCGATGCGC GTAGAGTTGA CCACGTTCCA
58201 ATTCATGTAC ACGTTTGATC CATTAAAAAT TTGTTGACAC TTTATACTGT
58251 AAATGGTAAA GATTTGGTTT TCATTGTCTT TTAAATATTT AAACACCTCA
58301 TTGATGTCGT CAGACCCCTT TATATTGTTC TTGAATAGAT TTATTAGTGT
58351 TTTCGCATTG ACAGAACATT CCACTTGAAC CACGTCGGGA TCGTCGTTGA
58401 GATTTTTGTA CACAACCTCA AAAACAACTT TGTACAAACC GCTGTTGATT
58451 TTCTTGTAGA TAAATTTGTA CTTTACAATA ATATTGACGC CATCTTCATT
58501 TTCAAAATGT TTGTTAGTCA AATAGTCGCT CATGGGGGTT GCAGTTTCAA
58551 TTTCCATTTC ACATTCTTTG TATTCGTTGA TCTGAATCAT TTGACTAAAC
58601 TTTGTTTTCA CATAATTTAA ACTAATGTCA TAGCACTTGC CTTCTTCCAT
58651 GTCTTTGAAA GATTGCGAAT CGCCGTAGTA TTCTTGAATT TTGTTGTCGG
58701 ACATTATTCG AAAAGTGTAA TGGTATTCAT TATCGATACT CAACGTCATT
58751 TTGCTCATCA ATTTACCACT AATCCTTTTG TAATTTTCTC TAATCTTCTT
58801 GGGGCTACTG GCCATAGCCA TGCGTTTTAT AAGCGGCTCA CCGCTACTTT
58851 CTCCAGACAA AGATCTTTTG GTCGCCATAT TGCTGTTGTC GATATGTGGG
58901 AATCTATCCG ATGGCAAATA CTGAATGGCG ACGAAATCGA AGTGTCGCCA
58951 GAGCACCGTT CGTTAGCGTG GAGGGAGTTG ATTATAAACG TGGCCAGCAA
59001 CACGCCGCTC GACAACACGT TCAGAACAAT GTTTCAAAAA GCCGATTTTG
59051 AAAATTTCGA CTACAACACG CCGATTGTGT ACAATTTAAA AACAAAAACT
59101 TTAACAATGT ACAACGAGAG AATAAGAGCG GCTCTGAACA GACCCGTCCG
59151 ATTTAACGAT CAAACGGTCA ATGTTAATAT TGCGTACGTA TTTTTGTTCT
59201 TTATTTGTAT AGTTTTGCTG AGCGTGTTGG CCGTCTTTTT CGACACAAAC
59251 ATTGCGACCG ACACGAAGAG TAAAAATGTT GCAGCAAAAA TTAAATAAAC
59301 TCAAAGATGG TTTGAACACG TTCAGCAGCA AGTCGGTGGT TTGCGCTCGC
59351 TCAAAATTAT TTGACAAACG CCCAACGCGC AGACCTAGAT GTTGGCGAAA
59401 ACTATCAGAG ATCGACAAAA AGTTTCACGT TTGCCGACAC GTTGACACGT
59451 TTTTGGATTT GTGCGGCGGA CCGGGCGAGT TTGCCAACTA TACCATGTCG
59501 TTGAACCCGC TTTGCAAAGC GTATGGCGTC ACGTTGACAA ACAACTCGGT
59551 GTGCGTGTAC AAACCGACAG TGCGCAAACG CAAAAATTTC ACAACCATTA
59601 CGGGGCCCGA CAAGTCAGGC GACGTGTTTG ATAAAAATGT TGTATTTGAG
59651 ATTAGCATCA AGTGTGGCAA CGCGTGCGAT CTGGTGTTGG CAGATGGCTC
59701 GGTTGACGTT AATGGACGCG AAAACGAACA AGAACGTCTC AACTTTGATT
59751 TGATCATGTG CGAGACGCAG CTAATTTTAA TTTGCCTGCG TCCCGGCGGC
59801 AATTGCGTTT TAAAAGTTTT CGACGCGTTT GAACACGAAA CGATCCAAAT
59851 GCTAAACAAG TTTGTTAACC ATTTCGAAAA ATGGGTTTTA TACAAACCGC
59901 CTTCTTCTCG GCCTGCCAAT TCCGAACGCT ATTTAATTTG TTTCAATAAA
59951 TTAGTTAGAC CGTATTGTAA CAATTATGTC AACGAGTTGG AAAAACAGTT
60001 TGAAAAATAT TATCGCATAC AATTAAAAAA CTTAAACAAG TTGATAAACT
60051 TGTTGAAAAT ATAACGTGTG TATAAAAAGC CAGCGGCTTC AAATCAGGCA
60101 TCATTCAACA TGGATTCGCT AGCCAATTTG TGCTTGAAAA CCCTGCCTTA
60151 CAAGTTTGAG CCGCCTAAGT TTTTACGAAC AAAATATTGC GACGCATGTC
60201 GCTACAGATT TTTACCAAAA TTTTCTGATG AAAAATTTTG TGGACAATGC
60251 ATATGCAACA TATGCAACAA TCCAAAAAAT ATAGATTGTC CATCATCATA
60301 TATATCGAAA ATTAAACCGA AGAAAGAAAA CAAAGAAATA TATATTACCA
60351 GCAACAAGTT TAATAAAACG TGCAAAAACG AATGTAATCA ACAATCAAAC
60401 CGGAGATGTT TAATTTCCTA TTTTACAAAT GAAAGTTGTA AAGAGCTCAA
60451 TTGTTGTTGG TTTAATAAAA ACTGTTACAT GTGTTTGGAA TATAAAAAGA
60501 ATTTATACAA TGTAAATTTG TATACGATTG ATGGTCATTG TCCTTCGTTT
60551 AAAGCCGTTT GTTTTTCATG TATAAAAAGA ATCAAAACGT GCCAAGTTTG
60601 CAATCAACCT TTATTGAAAA TGTACAAAGA GAAGCAAGAA GAGCGTTTGA
60651 AGATGCAGTC GCTGTACGCA ACGTTGGCCG ATGTAGATTT AAAAATATTA
60701 GACATTTACG ATGTCGACAA TTATTCTAGA AAAATGATAT TGTGTGCTCA
60751 ATGTCATATA TTTGCACGCT GTTTTTGTAC CAATACCATG CAATGTTTTT
60801 GTCCTCGACA GGGTTATAAG TGTGAATGTA TATGCCGACG ATCTAAATAT
60851 TTTAAAAATA ATGTATTGTG TGTTAAAAGT AAAGCGGCTT GTTTTAATAA
60901 AATGAAAATA AAACGTGTTC CAAAATGGAA GCATAGTGTA GATTATACTT
60951 TCAAAAGTAT ATACAAGTTA ATAAATGTTT AATTTTAAGG ATATTGTTAT
61001 GGAATAAACT ATAAAATGAA TTTGATGCAA TTTAATTTTT TGATACTTTC
61051 CACAGACGGT AGATTCAGAA CGATGGCAAA CATGTCGCTA GACAATGAGT
61101 ACAAACTTGA ATTGGCCAAA ACGGGGCTGT TTTCTCACAA TAACCTGATT
61151 AAATGTATAG GCTGTCGCAC GATTTTGGAC AAGATTAACG CCAAGCAAAT
61201 TAAACGACAC ACGTATTCGA ATTATTGCAT ATCGTCAACC AACGCGTTGA
61251 TGTTCAATGA ATCGATGAGA AAAAAATCAT TTACGAGTTT TAAAAGCTCT
61301 CGGCGTCAGT TTGCATCACA ATCCGTGGTC GTTGACATGT TGGCTCGTCG
61351 CGGCTTCTAT TATTTTGGCA AAGCCGGCCA TTTGCGTTGT TCCGGATGCC
61401 ATATAGTTTT TAAATATAAA AGCGTAGACG ACGCCCAACG CCGGCACAAA
61451 CAAAATTGCA AGTTTCTCAA CGCAATAGAA GACTATTCCG TCAATGAACA
61501 ATTTGGCAAA CTCGATGTTG CGGAAAAAGA AATACTGGCT GCCGATTTGA
61551 TTCCTCCGCG GCTAAGCGTT AAACCTTCGG CGCCGCCCGC CGAACCGCTA
61601 ACTCAACAGG TCTCCGAATG CAAAGTTTGT TTTGATAGAG AAAAATCGGT
61651 GTGTTTCATG CCGTGCCGTC ACCTGGCTGT GTGCACGGAA TGTTCGCGTC
61701 GGTGCAAGCG TTGTTGTGTG TGCAACGCAA AAATTATGCA GCGCATCGAA
61751 ACATTACCTC AGTAAACATT GCAAACGACT ACGACATTCT TTAAAAATAA
61801 GCTATATATA AATATTGCAT TGTATGACAA AAAAATTATT AACCTACTGC
61851 AAAGTAAAAC TTGTAAAAGG CTTTTCAAAA AAATTTGCGA GTTTATTTTG
61901 TCGCTGCGTC GTGTCGCATC TAAGCGACGA AGACGACAGC GACGGTGATC
61951 GCTATTATCA GTATAATAAC AATTGTAATT TCATATACAT AAATATTGTA
62001 AAATAAAAGA CATATTATTG TACATAATGT TTTATTGTAA TTAAATTAAT
62051 ACACCAATTT AAACACATGT TGATGTTGTT GTGAATAATT TTTAAATTTT
62101 TACTTTTTTC GTCAAACACT ATGGCGTTGC TTTCGATTAG TTTTTTCGTT
62151 AGCATTTCAT CTAAAAAATC AAACTGTTTG CCCGGCGCGT TTAGGGATTC
62201 TATGGTGTAG TCGGGCGTGT CGCTGTTTAG ATATTGGTCC ACTTCGCGCA
62251 TTATGTCCAA GACGTTGTTC TGCAAATGAA TGAGCTTTGT CACCACGTCC
62301 ACGGACGTGT TCATGTTTCT TTTTTGAAAA CTAAATTGCA ACAATTGTAC
62351 GTGTCCACTA TACAATTCGG CTTAATATAC TCGTCGGCGC AATCGTATTT
62401 GCAATCCAAT TTCGTGTTCA ACAAATTGGT GATGATATCT TTGAACGTGC
62451 ACGTTTTCAA TTTGTCCTTA TCGGCCAACG CAAGTTTCAA TTCGCTCTGT
62501 AAAGTTTCTA AAATTTTGTC TTTATTGTTG TCAAATTCGT GCGTGTTGCG
62551 TTCCAACCAC AATTTGAACG GCTCGTCGAC AAAAATGCTG CGCAACACCT
62601 CGTACAACTG TCTGCCTAAC GTGTACACTT GCTCGTATTC TTTCATGCTG
62651 ACCTCTTTGC TAACGTACAT TACTAAAAAA TCTACAAGTA TTTTCAAACA
62701 TTTGTAATAG GCGACGTATT TTGATTTAAG TTTTAAACCG TCCACCGTGT
62751 ATTCGTCCAC GTTCGCATCG ACCACTTTTC GATTATTATC GCCGCTTGTT
62801 GCCGGCGCGT CGGCCTGTTC GGTTTTAACT ATATCCGGTT CAATATTTAA
62851 AGTTTCAAAA GATTTAATGG CATTCATAAA ATCATCTTTT TGCTTTGGCG
62901 TGGTCAATGG TAAATCTATC GAGGAGTTGT CGTCCGTGTG CTCTTCGGGC
62951 ACGCTGTTCA GACGTAACGT AATCTTTTTG GGATCGTCTT CATCGGGTAT
63001 CAAATCGGCT TTAATTTTAT TAGAATTGAG CAACGACATG GTGGTCGCTT
63051 GTAAATTTAA TAAATTAATT AAAGACTGAA ATTGTATATT GCACAAATTT
63101 ATTTTCATTT TTATTGATCT TACTATTAAT ACGCTGGCAG TTGGTATGCT
63151 TCATCCATTT TTGTGACTAG AAAATTTGCT AAAAAACTGA GCTCGTCCTG
63201 TGTTAAAACG TTGTCGTCCA CGAATCTATG CAATGTAAAT GTTACACTGA
63251 CATTGTTTAA CAATGCATGT ATTAAAAAAT CAACCTGTCG CCTACTGAGT
63301 TTATTAGAAG AGTCGACCGT TTCTACTAGT TTGTAGATTT TGTTATTTTC
63351 AATTTCATTG TTTAAAAACA TGTTAACTAC TCGTTTGAGT TTAAGCGAAA
63401 AATCCTTGTC CGGATAGACT TGTTCGCACA GCCAATTGCT AAGAGTGGTT
63451 TTGACCACGG ACACCTTGGT GGTGAACGTC GTCGATTTGA CCAGTTCGGT
63501 GAAAAAGTTT TTCATTAAAT TGGACATTTT AACAAACACT TATCAATCTA
63551 TTGAGCTGGT ATTTTTGTTT AGAATCGCAT CAAGCGCTTG CTCGATCTCC
63601 AATTTTTTTC GGACGCTCTT AGCTTTATGA CTCGGTATGT CTTCTACGGT
63651 AGACTCGGTG TTCTTACTTA TAATGGCCGG GCTGACGATA ATAAACACGA
63701 GAAACAATAT GAGCAGATAC AAAAAGATGC TGTTTTCCTT TTTGTCATAC
63751 ACTAGGCTAA ATATGGCCAG TGCGCCCAAC AACAAATATA AATTCATTTT
63801 TATTCCCTTA CTCTATTCGT TGCGATAGTA CAACAACGAT TCTCCCGACG
63851 AACCGGACGA ATTGCGATTA TGCTGCGCGT CGTCGTCGTC GTTGTTGTTC
63901 TCCTCTTCGC TGCTCGTTTC GTCTAAACCT ATATTGTATT TGTTCAAGTA
63951 ATGTTTGGTG CTTGCGGAGG ATTCGTGGTT CATTAATTTG GCCACTTTTT
64001 GTAAAGGCAC GCCGCTATTG TATAGGTTAC TGCTCAAATA ATGTCTTATC
64051 ATGTTGCTGC GCGGCCGTTC CATCTCGACG CCCGACTCTT CAAGGAGTCG
64101 CCTGAAATCT TTGAAGGGCG TCGAGGTGTT TTTAGATATT TGCAAAATGG
64151 TCGGGTTTCG TGAATAAATC TCGCGTGCCA ATTCCAACGG TTTCATTTTG
64201 ATGTTGTTGA GTGTGTTATT ACGACTGCGT TTTCGCTTTA AATTAATCGT
64251 GTCGCTGTGC AGTTTTCCTC TTTTAATTAG CACGTTGAGA TCGTCCACGC
64301 TGAGTTGGCG CGCTTCGTTG ATTCGCATAC CCGTCCCTAA CATGATGCAA
64351 AACACTATCG CGCCCCTAAT TAGACCGCGG TCGTGAACAT AATCGCTGTT
64401 GAGCATTTTA ATTTTATCAT TAATAAAATT TAATATGGTA TCTATTACGT
64451 TTTTAAGCAT TAAATTCTTT TCCTTTTCCC TGATATTTTT GAGCTCCTTG
64501 TCGCGCGGCA GCATAACCAT GCGGGGAATT TTGTATTCGG GCAAGTTCAT
64551 CATGTTGGTG TAAAAGTTTA TAGTCAACTG TAGTGTTTCT TTGGTGACCG
64601 AGCGAAGTTC GAGCATGCGC CTGCACAGTT CTTGGGGATC AATGAGAAGT
64651 GTTTGGTTTT CTATCGAGTC AAACTCCTTG TCCAACGAGT ACGACATGTC
64701 TTCCAGGTGA ACATCGTCTA CCGAGCAGTA CACAATTTTA ATGAATCGAG
64751 ACTTGTAACT TTTTAAAGTG GTGGGCGCAA ACGGTTTGGG GAACATGTAC
64801 TTGCTCCACA GACTGTTGTT TTTCACCTCG TCGGGCGTGC ATCGTTGCCG
64851 ATCGGTGGCC AAATCGAACA CGGACTCGAA CCGGGGAGCG GATTGAATTT
64901 TTATTTTCCA AGAATTAAAA TTGTTTTCGT TGCGAACATT AAAACCGTTC
64951 ATTGTGGTTA ATCAAATTTA TTAAAAACAA AAGGAGAATC GGTGTCAATA
65001 CTATCCGAAT ATTGTTGTTG TTCTCTTAAT ATTACGAAAT AATATATTAC
65051 ATACAGCAGT AAGAATAAAG CTATAAAAGC GACTACACTA ATTAAAATTA
65101 TAATTCCCGC CGACACGTTG CTCGTCGTGT TGTCATAGCC CACCATGTCG
65151 TTTATTGGCA TTTTGTGAAC GGGCTCGCTA AATTGTTGCG GTTCGCTGGC
65201 AGTATCGTCG TTGAGCGCCA ATTTCAACGG GATGTATTCC ACCTTTTCGT
65251 GGTTGCCCAA CCGATAGTAG GGCACGTCCA AATTCATGTT TACAACTTAT
65301 TTGCTAACAG GAATTTATGC AACAAAAGTG GTTTGGCTTT GATGAGACGC
65351 AATTTGAAAT ACTTGCTGCA TTTACGCTTA AGATTGTATT CCATGCGGGC
65401 GGCGGTGTTG TAGTCGTACG CGCTCGCGCT GTGATACACG AGCCGTAAAT
65451 TGGTTGCGTT GCGCAAACAC TTGGCGCCTT GTTTGTTCGA ATGCTGTTTT
65501 ATGCGTCTGT TAAGATTGCT CGTGATGCCC GTGTACAATT TTCCATTGTC
65551 TTGCCGCAGA ATGTACACGC ACCACACCTT GTTGGTGTAC AGAGTCGTCG
65601 CCATGATTAT GCAGTGCGCC CTTTCGTGTT CGGCCGAGTG GCGTTAGGCG
65651 CAGCCGCGGC AATAATCGCG TTGGCGTCCT TGTTGTAATT TATTTGTTGA
65701 AAAATAAAAC GTCTTAGAGT TTCGTTTTGG AACGCCAATT CGGTCAAGCT
65751 CTCCTGGCAA GCGCTTTTGG TCAAATGAGC GGCCGGCGAA TTGACCGCGT
65801 TGGCGGCCGA CGTTAAGAAG GTGGCGTTCT GGAACATGCT GGGCTGCTTG
65851 CCGGCTCGCG TCGCCAGCTC GGCCATGTAA TTGAATATGT TGGCAGACGC
65901 AGATAGCGGC GCCAAAAACG CAACGTTCTC TTTTAAACTC ATGACTCGCG
65951 CCCTGTTTTT TTCGTTCAGC ACGTAGTGGT AGTAATCGCC GCCGCCGGCA
66001 AACAGATCGT CAATCACGGC GTTGATCAGA TCGTTGATCA TGTTGATGTG
66051 CGGAAAGCGA CGCGACTCGA CTGCGCTCTG TATGTTTGGC GGCAGAGTGG
66101 CGTGCTTGAG CAACAGAGTC ATGTAATTGT TGGCCAGCTG CTGATTGAAA
66151 GGTAACGGAA TGGGAATGTT GCACGTCACC GCTTCCGCCA CCATGTACTG
66201 GACGGCCAGA CTGAGTTGTT TGGCGGCCTC GGCCAAAGCG TCTTTGCCCA
66251 ACATATCAGC GCCACCGTTG TAAAACTTTT GCGCGTACGC CGGCAGCGAA
66301 TTTAGCACAA ACGATGGCTG AAATATATTT GAATCGCTCG ACAGGGACTC
66351 GGCCGCGTTG CTCTGTCCCA ACTCTTTTTG CAACCGAATC AGGTGGCGTA
66401 TCATGGTTTC CTCCGATTCA AACCGCTTTA CCACGTTTAC GCTGATTGGG
66451 TTCGTGTCGA TGCACATGTC ACGAATAGTG TTTATAAAAA GAATCATGAG
66501 AGGACTAAGT TCTGACATGT CATTGCACCT GTAATATCTA ATAATCTTTT
66551 GAACAAAATC CACACATTTG TTGTACCAAA TAGATTCACC GGCGTCGAGC
66601 GTCGGTTCTT TGCTCTTGTT GTACGGTGCA ATCGCTACCG AGTTTGTGCT
66651 GTTGCTGCGG CTCGTGTAAT CCATCCTGTT GTCGCGCGTG GCGACGGTCG
66701 TAGGCACCGT CGCCGGCGGC ACGTACCCGG GCGCGTTGTA AGTTTGCGCG
66751 CTGGTGAATA TGGCCGTTGC CGGATTAGAG GGATACCTCA GCGGCGGAGG
66801 GGTGTTGTAA TAAAAATTGC CACGTTCATC TGTCATACTT TTTATTTGTA
66851 CTCTTATGAT TACAAAACTC AATATACGGA TTACTTATAA TATAGTTGTT
66901 GTGACAAAAA AGCGATAATA AAATTAACAA AATTATCAAC AAGTTAATCA
66951 TGGAAAATTT TTCAACGTTG AATAACAACA ACAAAATGGC GCAGGTCAAC
67001 AGCACCGTTT GAAAACTGAC GCGCCGACAC AAAATGCTTT CGCAATTTCT
67051 AAAAGCCACA TTAAACGAAT TTTCACCTTT GATATAATCA CGCAGTTCTT
67101 TTTTACAACA TTCGTCGCAC AAAATTAACA CCTTTATAAT GAGGCCGTCG
67151 GTGTGTATCG TTTGAAATGT CCGCGGTTGA CTGCCTGGAT GAAATTCAAA
67201 CGAGTACCCA GTGGACACGT GTATCTGTGC AAAATAATGG GCTAATATCG
67251 AGGCGCCCGT TTTTTTAACC TTTACTTTTG ATATTTTAAT AACATTAATG
67301 TTGTTATTTG CGTAATCAGA GTTTTTATTG TGGTGATCAT CGTACAAATA
67351 ATGAAGCAAC AGTTCACTAT CGTATTTAAT CTTGTTTAGC GTTGTCAAGT
67401 TTTTGTTTCT TAGGCGTTGG AGCGTCTCCG TCGTCGATAT TTTCTTCGAA
67451 ATCGAGTCCA ACAACGTCGG CGTTTCCTTC TTGCTCATCG ATAGCGGCGG
67501 CGGAGGCGGC CTCTCCGTCG TCGTCATTCG CGGTTTCTAC AGTGCGTTTG
67551 GGCGACGACG TGTGTACAGC AGCGTCCGTC TTACTATTAT CGGACCGCCA
67601 AATTTTTGTT TGAAATAACA TTTGGCCCTT GTTCAACTTT ATTTCGGCGC
67651 AGTTAAACAT TATTGCATTA AGATCATATT CGCCGTTTTG CACCAAATTG
67701 CACAAAACAC CATAGTTGCC GCACGACACT GTAGAATAGG CGTTTTTGTA
67751 CAACAATCTG AGTTGCGGCG AGCTAGCCAC CTTGATAATA TGGGCGCCAA
67801 CGCCCCGTTT TTTTAAGTAA TATTCGTCTT CAATTATAAA ATCTAGTACG
67851 TTTTCATCTT CACTGTTGAT TTGGGCGTTC ACGATGATGT CTGGCGTAAT
67901 GTTGCTCATG CTTGCCATTT TTCTTATAAT AGCGTTTACT TTAATGTATT
67951 TGGCAATTTA TTTTGAATTT GACGAAACGA CTTTCACCAA GCGGCTCCAA
68001 GTGATGACTG AATATGTGAA GCGCACCAAC GCAGACGAAC CCACACCCGA
68051 CGTAATAGGC TACGTGTCGG ATATTATGCA AAACACTTAT ATTGTAACGT
68101 GGTTCAACAC CGTCGACCTT TCCACCTATC ACGAAAGCGT GCATGATGAC
68151 CGGATTGAAA TTTTTGATTT CTTAAATCAA AAATTTCAAC CTGTTGATCG
68201 AATCGTACAC GATCGCGTTA GAGCAAATGA TGAAAATCCC AACGAGTTTA
68251 TTTTGAGCGG CGACAAGGCC GACGTGACCA TGAAATGCCC CGCATATTTT
68301 AACTTTGATT ACGCACAACT AAAATGTGTT CCCGTGCCGC CGTGCGACAA
68351 CAAGTCTGCC GGTCTTTATC CCATGGACGA GCGTTTGCTG GACACGTTGG
68401 TGTTGAACCA ACACTTGGAC AAAGATTATT CTACCAACGC GCACTTGTAT
68451 CATCCCACGT TCTATCTTAG GTGTTTTGCA AACGGAGCGC ACGCAGTCGA
68501 AGAATGTCCA GATAATTACA CGTTTGACGC GGAAACCGGC CAGTGTAAAG
68551 TTAACGAATT GTGTGAAAAC AGGCCAGACG GCTATATACT ATCATACTTT
68601 CCCTCCAATT TGCTCGTCAA CCAGTTTATG CAGTGCGTAA ATGGGCGCCA
68651 CGTGGTGGGC GAATGCCCCG CGAATAAAAT ATTTGATCGC AACTTAATGT
68701 CGTGCGTGGA AGCGCATCCG TGCGCGTTTA ACGGCGCCGG ACACACGTAC
68751 ATAACGGCCG ATATCGGCGA CACGCAATAT TTCAAATGTT TGAATAATAA
68801 CGAGTCACAA CTGATAACGT GCATCAACCG GATCAGAAAC TCTGACAACC
68851 AGTACGAGTG TTCCGGCGAC TCCAGATGCA TAGATTTACC CAACGGTACG
68901 GGCCAACATG TATTCAAACA CGTTGACGAC GATATTTCGT ACAACAGTGG
68951 CCAATTGGTG TGCGATAATT TTGAAGTTAT TTCCGACATC GAATGTGATC
69001 AATCAAACGT GTTTGAAAAC GCGTTGTTTA TGGACAAATT TAGATTAAAC
69051 ATGCAATTCC CAACTGAGGT GTTTGACGGC ACCGCGTGCG TGCCAGCCAC
69101 CGCGGACAAT GTCAACTTTT TACGTTCCAC GTTTGCCATT GAAAATATTC
69151 CAAACCATTA TGGCATCGAC ATGCAAACCT CCATGTTGGG CACGACCGAA
69201 ATGGTTAAAC AGTTGGTTTC CAAAGATTTG TCGTTAAACA ACGACGCCAT
69251 CTTTGCTCAA TGGCTTTTGT ATGCGAGAGA CAAAGACGCC ATCGGGCTTA
69301 ACCCGTTCAC CGGCGAGCCT ATCGACTGTT TTGGAGACAA CTTGTACGAT
69351 GTGTTTGACG CTAGACGCGC AAACATTTGT AACGATTCGG GAACGAGCGT
69401 TTTAAAAACG CTCAATTTTG GCGATGGCGA GTTTTTAAAC GTATTGAGCA
69451 GCACGCTGAC CGGAAAAGAT GAGGATTATC GCCAATTTTG TGCTATATCC
69501 TACGAAAACG GCCAAAAAAT CGTAGAAAAC GAACATTTTC AGCGACGTAT
69551 ATTGACAAAT ATACTACAGT CGGACGTTTG TGCCGACCTA TATACTACAC
69601 TTTACCAAAA ATATACTACA CTAAACTCTA AATATACTAC AACTCCACTT
69651 CAATATAACC ACACTCTCGT AAAACGGCCC AAAAATATCG AAATATATGG
69701 GGCAAATACA CGTTTAAAAA ACGCTACGAT TCCAAAAAAC GCTGCAACTA
69751 TTCCGCCCGT GTTTAATCCC TTTGAAAACC AGCCAAATAA CAGGCAAAAC
69801 GATTCTATTC TACCCCTGTT TAACCCTTTT CAAACGACCG ACGCCGTATG
69851 GTACAGCGAA CCAGGTGGCG ACGACGACCA TTGGGTAGTG GCGCCGCCAA
69901 CCGCACCACC TCCACCGCCC GAGCCAGAAC CAGAGCCAGA ACCCGAGCCA
69951 GAACCCGAGC CAGAGTTACC GTCACCGCTA ATATTAGACA ACAAAGATTT
70001 ATTTTATTCA TGCCACTACT CGGTTCCGTT TTTCAAGCTA ACCAGTTGTC
70051 ATGCGGAAAA TGACGTCATT ATTGATGCTT TAAACGAGTT ACGCAACAAC
70101 GTTAAAGTGG ACGCTGATTG CGAATTGGCC AAAGACCTAT CGCACGTTTT
70151 GAACGCGTAC GCTTATGTGG GCAATGGGAT TGGTTGTAGA TCCGCGTACG
70201 ACGGAGATGC GATAGTGGTA AAAAAAGAAG CCGTGCCTAG TCACGTGTAC
70251 GCCAACCTGA ACACGCAATC CAACGACGGC GTCAAATACA ACCGTTGGTT
70301 GCACGTCAAA AACGGCCAAT ACATGGCGTG TCCCGAAGAA TTGTACGATA
70351 ACAACGAATT TAAATGTAAC ATAGAATCGG ATAAATTATA CTATTTGGAT
70401 AATTTACAAG AAGATTCCAT TGTATAAACA TTTTATGTCG AAAACAAATG
70451 ACATCATTCC GGATCATGAT TTACGCGTAG AATTCTACTT GTAAAGCAAG
70501 TTAAAATAAG CCGTGTGCAA AAATGACATC AGACAAATGA CATCATCTAC
70551 CTATCATGAT CATGTTAATA ATCATGTTTT AAAATGACAT CAGCTTATGA
70601 CTAATAATTG ATCGTGCGTT ACAAGTAGAA TTCTACTCGT AAAGCGAGTT
70651 TAGTTTTGAA AAACAAATGA GTCATCATTA AACATGTTAA TAATCGTGTA
70701 TAAAGGATGA CATCATCCAC TAATCGTGCG TTACAAGTAG AATTCTACTC
70751 GTAAAGCGAG TTCGGTTTTG AAAAACAAAT GACATCATTT CTTGATTGTG
70801 TTTTACACGT AGAATTCTAC TCGTAAAGTA TGTTCAGTTT AAAAAACAAA
70851 TGACATCATT TTACAGATGA CATCATTTCT TGATTATGTT TTACAAGTAG
70901 AATTCTACTC GTAAAGCAAG TTTAGTTTTA AAAAACAAAT GACATCATCT
70951 CTTGATTATG TTTTACAAGT AGAATTCTAC TCGTAAAGCG AGTTTAGTTT
71001 TGAAAAACAA ATGACATCAT CTCTTGATTA TGTTTTACAA GTAGAATTCT
71051 ACTCGTAAAG CGAGTTTAGT TTTCAAAAAC AAATGACATC ATCCCTTGAT
71101 CATGCGTTAC AAGTAGAATT CTACTCGTAA AGCGAGTTGA ATTTTGATTA
71151 CAAATATTTT GTTTATGATA GCAAGTATAA ATAACCGCAC AAAGTTAAAT
71201 TTTTTTCATT TACTTGTCAC CATGTTTCGA ATATACCCTA ATAACACAAC
71251 TGTGCCCGGT TGTTTAGTGG GTGACATTAT TCAAGTTCGT TATAAAGATG
71301 TATCACATAT TCGCTTTTTG TCAGATTATT TATCTTTGAT GCCTAACGTT
71351 GCGATTGTAA ACGAATATGG ACCTAACAAC CAGTTAGTAA TAAAACGCAA
71401 AAACAAATCG CTGAAAAGCT TGCAAGATTT GTGTCTGGAC AAAATAGCCG
71451 TTTCGCTCAA GAAACCTTTT CGTCAGTTAA AATCGTTAAA TGCTGTTTGT
71501 TTGATGCGAG ACATTATATT TTCGCTGGGT TTACCAATTA TTTTTAATCC
71551 GGCTTTGCTA CAAAGAAAAG TGCCGCAGCG CAGCGTGGGA TATTTCATGA
71601 ATTCAAAATT GGAAAGGTTT GCCAATTGTG ATCGGGGTCA TGTCGTTGAA
71651 GAGAAACAAT TGCAGAGTAA TTTGTATATA GATTATTTTT GTATGATTTG
71701 TGGTTTAAAT GTTTTTAAAA TAAAAGAATA ACAATTTACA CATTGTTTTA
71751 TTACATGGAT AATGTTGTTT GTTTGACATT AAAGGTTATC ATGGTGCAAT
71801 GATTAATAAT AAAACAATAT TATGACATTA TTTTCCTGTT ATTTTACAAT
71851 ATAAAATCAC ACCAATTGTG CAAAGTTTTA TTATTTGTTT GTCGACGGTC
71901 GAGGGGTCAG CGGCGTGTGC AACAATAAAA AACATGAAGC TGTTAACAAT
71951 TTTGATTTTA TTTTATTCAT TTTTTATGAA TTTGCAAGCG CTACCAGATT
72001 ACCATCAAGC AAATAGGTGT GTGTTGCTGG GAACTCGCAT TGGATGGAAC
72051 GATGACAATA GCCAAGATCC CAACGTATAT TGGAAATGGT GTTAAATAAA
72101 AGTGAATATA TTTTTTATAA AATTTTTTAT TTAAAATTCC AAGTAATCCC
72151 TGCAAACATT AAACACTGTA GGTATTTTTA AATCTTGCCA CATGCGAACA
72201 ACGCACGGCC TGTCGTCGAA CACCGCTATT ACATTATATT TTCCTCTGAT
72251 ATAGTTGTTA AACAATTTTA ATTTTAATAA ATAATCTTTA CAAGTATCGT
72301 CTGAAGGCCT CATAAACAAT TTATATGATT TAATATCAAA ATACTTTTCA
72351 ATCCAGTTTC GAGTGGGCTG TTCACAAATT ACGCTTCTCC CGCTCATAAA
72401 CACGATAATT GCGTCGTGGC AATTTGCCAA ATACTTAACG CAAGTAATAA
72451 CGTCTAAGCG GGCTTCATCT TGAGCAACTC TATTATCAAA ATCATAAAAC
72501 GATCTATTTG TGGGCAAAGC TACTGTACCG TCTAAATCAC ATAATACAGC
72551 GCGGGGAAAT TTGTCGCCGA CAGGAACGTA ATATTCGAAA TTATTTACCT
72601 TTAGAAACTT TTTATATTGC TTTTTAATAG TTTCTGGATT TAATGGAAAT
72651 TTATCAGAGC GTTTATAATT GCGTTCAAGA GCCGTTTCCA AAGAAACGTC
72701 CATCAAACGC GTTAAAAAAT GGTAATTATG CGTTGCGGCC ATTTTTTGCC
72751 ACATGTCCAC CGATTGAGTG TTCAAATTAG TGTCGCTGAC AACCACGTTG
72801 GCACCACATT TTGCGGCTTT TAAAAACTGT TCAATGCACA TTTTGGTAAT
72851 TTGTTCTTCT TTAGTTTGTC TACATTTCCG CGATTGGTTA TAGAAAGCGT
72901 TCAGTTTTGT ATAATCGCCG TTTAAAAACA ACTTAACGCG CACGTCGTCT
72951 CTGTTGATTT CTGTATAGCC TTTTAAACTT TTGGCATACG TGCTTTTGCC
73001 CGAACCCGAA ATGCCTATCA ACACCAACAA TTGTTTTGAA GAAGGCAATT
73051 TAATTGTTGG AGCAAGTTTA TTATTTAATG CCTGCTTAGT CGATACAAAT
73101 TTTATAATAT TTTTGATCAT TTTAATTTTT TCAGGCTCGG TTAATTTTAA
73151 AAATTCGCTC TCCACATCGA TCGTTTGTGC TTTACGACAT CTGTACGCTA
73201 AACATTTCCA CGGCAAAGTT TGCACCAGTT CGTTGAAACG CTGTTGATTC
73251 AAAGTCAAAC CCGACACCAT AATATTTATT GTAGACTCGT TGGTGAACGT
73301 GTTTCTAGCA TCAACGTACG GTTTAATGAC ACTTTTTAAA TGCGGGAAAA
73351 GAGCTAGAAA GTCATCGTGT TCGCCATTTA TAACAAGCTG CGCCAATTTA
73401 GTAGGATTTT CAGCACGGCT CTGATTTTTG TGCATGTTCA AATACACGTC
73451 GCTTTTAATC TTGCATAGTG GCGCGTTGTT TTTATCGTAA ACTACAAATC
73501 CTTCTTCCAA ATTTTTCAAC TGGGCCGCGT GTTCGACACA TTCTTGCACA
73551 GACGTAAACT CGTAACATTT GGGGTATTTG CAAAACGGCA AATTGGAACA
73601 GTAAAAATAA TCGCCCGTTT CGTTGTTTCT GCTTGCCAAA TACCACAACG
73651 TTGGCTGTTC ATCGTAAACG GTTACAATTC TGTTGTGTTT GCTTGTTAAC
73701 TCAAACATGT GAGTCGACGC GCAGTCTAAA TATTCGTTAC ACAACGCTTG
73751 AAATTGATTG TGGGCCTCGT CAAGTTGAAG AGCTTGCAAA ACTAAACGTT
73801 TAAACGTCAC GTCTGACACG CAAAGGTTTT CTGCAAAAGC ACTTCCTCGG
73851 GTGCTGGCAT GCCATTCGCC GTTGTACTTG TAGATTTTAA TTAAACTTCC
73901 GTCGATTTTT TCGTAAAACT TAAAATTCTC CTTCGATTGG AACAGTTTGT
73951 GATGAGCATC TTCGCCGCCG ATATTTTGTA GCAATTCTTG AAAATTAAAG
74001 AAACGATCGA AAGAACGCGA CACAACGGCG TACGTGCGGC TGTTAAGAAT
74051 TAAACCGCGA CATTCCACGA CCACAGGATG ATCTCGATCG CGTTCAAACG
74101 ATTCGTAATT AAGAACCATC AAATCGTGTT CGGTATAATT TTTAATTTTG
74151 ACTTTAAACT TGTCACAAAG ATTTTTCACT CCGCCGTTTG CAAGTAGACG
74201 CGAAACGTGC AACATGATTG CTGTTTAATA ATGCATACCA ATGCTAAACT
74251 GTCTATTATA TAAAGTGCAG TGATAACTTT GTTATCAACG CGTTCGATGC
74301 CGACATATAT AAACGCAATG TAACAGTTTT TGCTAGTACC ATCGCATACA
74351 ACATTATGAA TACAAGGGGT TGTGTTAATA ATAATAAAAT GATATTTATG
74401 AATGCTTTGG GCTTGCAACC TCAAAGTAAA TTGAAAATTA TTGCACATAA
74451 AATACTAGAA AAATGTAAAC GTGACGCGTA CACGCGTTTC AAGGGCGTAA
74501 AGGCGATCAA GAATGAACTA AAAACATACA ATCTTACGTT GCAACAATAC
74551 AACGAGGCGC TCAATCAGTG CGCTTTAAAC GATAGCCGAT GGCGCGACAC
74601 AAATAATTGG CATCACGATA TTGAAGAAGG TGTGAAAATA AACAAGAGAC
74651 ATATATATAG AGTTAATTTT AATTCTAAAA CCCAAGAAAT TGAAGAATAT
74701 TATTACATTA AAGTAGAATG TTATGTAAAC AGTTAATTAA TCTACATTTA
74751 TTGTAACATT TGTGGTAATA GTGGCGTTGG TTATACATTT ATATGATTGT
74801 AATGTTGTGT ACTCGTTTTG TAATAAATTT TTGTGTTTAA TCAATTCAAT
74851 ATTTTTATTT GATAAAACCT TATTTTCGCT ACTCAATTTG GCGTTTTTAG
74901 ACGCAAGTTT TGCGTAATCG TCATTGAGCG ATTTTAGCGC CTTTTCAGTT
74951 GTAATTCGTT TCAGTTGCAA TTCTTTAAAA GATTTATGCA TGTTGTTGTA
75001 GTCGCTTTTA ATTTTGTCTA ACTTTTCTTG CATAGAAACG CTTGTTTGTT
75051 GTAATTTGTC TAAATCTAAT TGTTGTTTAA TGTTGAGCTG CGTTTGTTCG
75101 GCAATGTCTA CCTGTAGTTT TTTTAGTATC GCTTGTGCTT CAGACAGCAT
75151 AGTGTCGTCG GCATTTGCGT TGTTGTCTTC TGCGTCGTCC AACAGACTTT
75201 TTTCAAACAA CACACTGGCC AAAGAGGCCG CATCAAAATT AGCGTTTATT
75251 TTATTCCATT GTGCGACACT CGACGCGCTG CATTTAATCA CATCCACAAC
75301 GTTTCGGTTT ACGCTGTAAA CGTTGAAATG CAAACTTTCA ACCCTACACA
75351 AGGGACATGG TACTTTTTTT CGTTTTCTAA TCTTGCGTAT ACACATTGAG
75401 CATAATTGAT GTTTGCACGT GTCTAGTTCT AATACGGGTA TTATAGTCAA
75451 TCTGTCTATT GGTTGCAGAA AATAATTTTT AATTTCTGCA ACCGAAAAAC
75501 AAATGTTGCA TTGCAATTTA ACAAACTCCA TTTTTAGACG GCTATTCCTC
75551 CACCTGCTTC GCCTGCAACA CCAGGCGCAG GACCTGCCAC TGCGCCGCCG
75601 CCCAGAGTAG CGTTAGGATT TGCTCTTGGT ATAAAGTCGT TGCGCAAAAA
75651 GTTGTTTTCT GAATTGATTA TTTGGTATCC CAAAAACAGC GGAACGTACG
75701 TCGGGTATTC TTCGTATCCG CTAAGCGTTC TGTCCAGCTC ACGTGTGTCG
75751 CCTTCAAATT TCAAAACGTT TCTAATTTGC AAACGATTGG GTTGACTTCT
75801 CATAATGTCA CTGCTTCTTA TCGGGTTGTA CAACTCGGGG CCGTCGGGCA
75851 CAGACGCGAC CAGACCCGTT TCGTCAATTA TACACGTGGC GCAATTTCTA
75901 AACCTCAATT CCTCCGTGTC GATTTGCAAG TACTCGGGCG CTACTGCGCG
75951 TCGAATCAAA TTTTGCAAAA ATCCACTGTA ATTGTTAAAT AATTGATCGC
76001 CAGCACCGCC TCGAAGCGCT CGGGCGTTGG TCACGTCAAA GAAACGCAAT
76051 TCGTCTCGCG ACACCCGCGA ACAAAACGTG TTCGGGTTTG TGGTGTCCAG
76101 AATGCTTTTT GTAGTTGCGT AAACGCTGTG TATAACGCGT TGCGTGTTGC
76151 TTGTGAAACC TTCGGTATAT TTTAGATTGT CGCATATAGT GTTAACTGCG
76201 TTTTCGTTGT TATATATCAA ATGAAAGATT AGCTGTTCGG CTTGCATCAT
76251 ACTGTTTAGA TTAAACACGT CTTGGTAATT GGTTGCGCTT GGAATTAAAA
76301 TTCGCTTGAT ACCTCTTTCT TTATTTCCAA CTAAATGCCT AGCGATCGTC
76351 ATTTTGAATT GATTGTCGTC TTCGTCGAAA ATGGGCAAAA CCATTTTTGA
76401 CATTTTAAAA CGTTTTATGA GGTGGTTGTT GCAAATAAAC CATCCATCGT
76451 CATGATACGC GTCGGGCGAA CACGGCGATT TGTATGTTAT GCACGCGTCG
76501 AACGACACGA TGGACGCGAA AATGCAGCGA TTAACTCTCA TTTGTCGCGG
76551 CGCCATACCC ACGGGCACTA GCGCCATATT GTTGCCGTTA TAAATATGGA
76601 CTACGGCGAT TTTGTGATTG AGAAAGAAAT CTCTTATTCA ATAAATTTTA
76651 GCCAAGATTT GTTGTATAAA ATTTTAAATT CTTATATTGT TCCTAATTAT
76701 TCGCTGGCAC AACAATATTT CGATTTGTAC GACGAAAACG GCTTTCGCAC
76751 TCGTATACCT ATTCAGAGCG CTTGCAATAA CATAATATCA AGCGTGAAAA
76801 AGACTAATTC CAAACACAAA AAATTTGTTT ATTGGCCTAA AGATACCAAC
76851 GCGTTGGTGC CGTTGGTGTG GAGAGAAAGC AAAGAAATCA AACTGCCTTA
76901 CAAGACTCTT TCGCACAACT TGAGTAAAAT AATTAAAGTG TACGTTTACC
76951 AACACGATAA AATTGAAATC AAATTTGAAC ATGTATATTT TTCGAAAAGT
77001 GACATTGATC TATTTGATTC CACGATGGCG AACAAGATAT CCAAACTGCT
77051 GACTTTGTTG GAAAATGGGG ACGCTTCAGA GACGCTGCAA AACTCGCAAG
77101 TGGGCAGCGA TGAAATTTTG GCCCGCATAC GTCTCGAATA TGAATTTGAC
77151 GACGACGCGC CCGACGACGC GCAGCTAAAC GTGATGTGCA ACATAATTGC
77201 GGACATGGAA GCGTTAACCG ACGCGCAAAA CATATCACCG TTCGTGCCGT
77251 TGACCACGTT GATTGACAAG ATGGCCCCTC GAAAATTTGA ACGGGAACAA
77301 AAAATAGTGT ACGGCGACGA CGCGTTCGAC AACGCGTCCG TAAAAAAATG
77351 GGCGCTCAAA TTGGACGGTA TGCGGGGCAG AGGTCTGTTT ATGCGCAATT
77401 TTTGCATTAT TCAAACCGAC GATATGCAAT TCTACAAAAC CAAAATGGCC
77451 AATCTGTTTG CGCTAAACAA CATTGTGGCC TTTCAATGCG AGGTTATGGA
77501 CAAACAAAAG ATTTACATTA CAGATTTGCT GCAAGTGTTT AAATACAAAT
77551 ACAACAATCG AACACAGTAC GAATGCGGCG TGAACGCGTC ATACGCTATA
77601 GATCCGGTGA CGGCCATCGA ATGTATAAAC TACATGAACA ACAACGTGCA
77651 AAGCGTCACG TTGACCGACA CTTGCCCCGC AATTGAATTG CGGTTTCAGC
77701 AATTTTTTGA TCCACCGCTA CAGCAGAGCA ATTACATGAC CGTGTCCGTG
77751 GACGGGTATG TCGTGCTCGA CACCGAGTTG AGATACGTCA AATATAAATG
77801 GATGCCAACA ACCGAGTTAG AGTATGACGC CGTGAATAAG TCGTTTAACA
77851 CACTCAATGG GCCATTGAAC GGTCTCATGA TTTTAACCGA CTTGCCGGAG
77901 TTACTGCACG AAAACATTTA CGAATGTGTA ATCACGGACA CGACAATAAA
77951 CGTGTTGAAA CATCGTCGCG ACCGAATCGT GCCAAATTAA AGCACGTTAA
78001 GCGGATACAA CGGGCAGTCC GAGCTGTTAA AGTCAATACA ACCATCGTTA
78051 ACAAACGAAT ACGCATTGTT GTGACAGCTG AGGATATAAA AAGGAATAGA
78101 GAAGTAATTG CAATGAAATA TCCCGTTACA ATTCCACGGC ACAGCGTATG
78151 TTGCTCGAGT TCTATCAGTT GCACACAACG GCCTAAGAAA ATTTATTAAT
78201 GCTTCATTTG TATCTATATT AGAAGGATAA TACATAGGTT CGCCCAAAGG
78251 ACTGGGAGAA GGCGGCGGCG AAGGTGTAGG TGTAGGAGGA ATAGGAGAAG
78301 GCGGCGGCGA AGGTGTAGGT GTTGGAGGAA TAGGAGAAGG CGGCGGCGAA
78351 GGTGTAGGTG TAGGAGGAAT AGGAGAAGGT GGAGGTGTAG GTGTAGGTGT
78401 TGGAGGTATA GGTGTTGGAG GAGGTGTAGG TGTAGGTGTT GGAGGTATAG
78451 GTGTTGGAGG AGGTGTAGGC GAAGGTGGAG AAGGTGTAGG AGTAGGTGGA
78501 GGTGTAGGTA ACGGTACAAT TGGTGGAGAT GTAGGTGGTG GTACAATTGG
78551 TGGATTTGGA TACAATTCCT GAATGTCGTC TAATATTTTT AAAGTTAATA
78601 AAATTATTAT AAATAAATTT AATATTATTA TTATTATTAT TATCACAATA
78651 ATGTACCACA TGTTGCTTAA ATATAAAAAT TAAACAAAGA ATGTTGTATT
78701 ATTGCAAATT TAACAATTTT TTGTATTCTC CCCATGTCAT GCGTTCGTAA
78751 TGAGCGGGCG GTTTTTTATT TCTTTGTATC CACTTGTAAT CGTTAATGTG
78801 GTTGTGAAAA GTCATACTGA CGTAGGCCAT TAAATTTTTC ATGAGCATAT
78851 TATTTGACAC AACTGCAACA TCTGCGCCTG CCGTTTCTTG CTGGTACGAA
78901 TCGACAAACG TAATGTCTGT GCCGTATTTT TCTTTGTCAA GTGCAATTTC
78951 TATAAGCTCA ATGTGGTAAA TGATGAAACC TTTGACGTTC ATATAATGAT
79001 CGCGGCACAT GGCGCACTGT AGTATGAAAA ATACGTTGTA AAATAGCACC
79051 TTCATTGTTT TCAACTGCTG CATGACAAAA TCTAAACTGC TTTTGTCTCG
79101 CGTATACACC ATATCGTCGA TGATGAGACT GAGAAAGTGC ATGGTGTCCC
79151 ATATGGTAGT AAACGTGTAA GTAAAACTCT TGGGCTGGCA CGAACGCAAA
79201 TTGAGTTCTG TGGTTTTGTC CATAAATTCT ATGCGAAACT GTTGCAAGTC
79251 CATGTCGGGG GATGCGTTAA TGGCCCATTC GATCAACTGC TGCACCTCGT
79301 ACTTTTGAAT GTCTTTGTAT TTCATCAAAC ACGCAAAATG GTATAAGTAA
79351 GTTGCTTGCG AAGACAACAG TTTGGTGAGG TGCGTCGATT TAGAGGCTCG
79401 CAAAAGGTCT ATGAGACGAA ACGAATACAA CAGATAGCTG TCTTTGTAAC
79451 GAGAAAAAAG CGGCGTCAGC GGTATCATGG CGACTAGCAA AACGATCGTG
79501 CTGTACTTGT GTCAGGCGCC GGCCACAGCG TCGTTGTACG TTAGCGCAGA
79551 CACGGACGCC GACGAGCCTA TTATTTATTT CGAAAATATT ACAGAATGTC
79601 TTACGGACGA CCAATGCGAC AAGTTTACTT ATTTTGCTGA ACTCAAACAG
79651 GAGCAAGCCT TATTTATGAA AAAAGTATAC AAACACTTGG TGCTTAAAAA
79701 CGAGGGTGCT TTTAACAAAC ACCACGTATT GTTCGATGCA ATGATTATGT
79751 ATAAGACATA TGTGCATTTG GTCGACGAGT CTGCGTTCGG AAGCAACGTT
79801 ATCAACTATT GCGAACAGTT TATCACGGCC ATTTTTGAAA TTTTTACGCT
79851 CAGCAGTAAA ATCGTCGTGG CCGTGCCCGT CAATTGGGAA AACGATAATT
79901 TAAGTGTACT TTTGAAACAT TTGCACAACC TAAATCTCAT TGGAATTGAA
79951 ATTGTAAATT AAAACAAATC ATGTGGGGAA TCGTGTTACT TATCGTTTTG
80001 CTCATACTGT TTTATCTTTA TTGGACGAAT GCATTAAATT TCAATTCCTT
80051 AACCGAGTCG TCGCCCAGTT TAGGGCAGAG CAGCGACTCG GTGGAATTAG
80101 ACGAGAACAA ACAATTAAAC GTAAAGCTGA ATAACGGCCG GGTGGCCAAC
80151 TTGCGCATCG CACACGGCGA TAATAAATTG AGCCAAGTGT ATATTGCCGA
80201 AAAACCGCTA TCTATAGACG ACATAGTCAA AGAGGGCTCC AACAAGGTGG
80251 GCACTAACAG CGTTTTTCTG GGCACCGTAT ACGACTATGG AATCAAATCA
80301 CCAAACGCGG CCAGCACATC TAGTAATGTA ACCATGACGC GCGGCGCCGC
80351 AAACTTTGAT ATCAAGGAAT TCAAGTCCAT GTTTATCGTA TTCAAGGGTG
80401 TGACGCCCAC TAAAACTGTA GAGGACAATG GCATGTTGCG ATTCGAAGTC
80451 GACAACATGA TTGTGTGTTT GATCGACCCC AACACGGCGC CGCTGTCCGA
80501 ACGAGAGGTG CGCGAATTGC GCAAATCTAA TTGCACTTTG GTGTACACAA
80551 GAAACGCGGC AGCTCAGCAA GTTTTATTGG AAAATAACTT TACCGTCATT
80601 AATGCTGAAC AAACCGCCTA TCTCAAAAAC TATAAATCAT ACAGAGAAAT
80651 GAATTAATAA AACAAAAAGT CTATTTATAT AATATATTAT TTATTAACAT
80701 ACAAAATTTG GTACACTAGT GTTCAAATCG TTTCTGTTCA ACGCCATTGT
80751 CATGTTATAA AACACATTTG TAGTTTTATT GTAATTATTT TTAAATTTAT
80801 TTTTAATTTG CTGTAATAAA ACTTGTTCAT TAAATACAAA AGACTTTGAA
80851 CTACTTGCGT TTATATTCTT TTTATAATTG TACTGAACAA ACGAGGGGTG
80901 CAAAAAGTTT TTGAAATGCT GCACGGCAAT ACCTATCATC TCCTCCATTT
80951 TGTCCTCTCC TATTGTAATA GTGGCACTGC GCACCGTTTT AATGTTTAGA
81001 ATGTAAATGA GCGCATACAG CGGACTATTG TTGGTGCTCA AGCACATTAG
81051 GTTGTGCTTA TGCATAGGGT CGTTGCTCAG CAGCGTTTTG TATACTACAA
81101 AGCCCGTTTT GGGGTCGCGT CTGTACATTA GTACGTGCGA CAAAAACAAA
81151 CGCACCGGCG TCACAAGCGA CTCGTAATAC ATGCTTTCTA TCGGAAACTG
81201 TTTGGACTTG ATGTGTTCGT ACACGGAGCC GGCAAACTTG ACGCTGTCTA
81251 CAAACTTATG GTTCGTGTAA ACAATCAAAA ATCTGTCTTG TACACCGTCG
81301 TCATAATCGT CCACGTACAG CGGCTTGTTG TTAACAATTA ACATTTTGTA
81351 GTTGGCTTCA TACTTTAGCA GCCCTTGGTA TTTTCTGCTC TTGGAATCGC
81401 TCTTGCTCGA ATCGGCATGC TTCTTAAAGT ACGACTCGCT GCATTGTTTC
81451 AACTCGTTGA TAGTGTACAA CTGCGAGTTG AGTTTGCTCA CTTCCTTGTC
81501 GCTCGTTTCC TTGTTGGACT CTCCGCTGTG GTTGTCATCG TCAAACTTGT
81551 GCATCAACAC CAAATAGTCC AACAGCTCAA AAAACGACGA CTTGCCCGAA
81601 CCCGGTTCGC CGGGCATGTA AATAGCCTTC TTTCCGTAAT CTACGGGAAT
81651 GGCCAAACTA GCGGCGAAAT GCATCAACAT AATCGCGTTC GCGTGATTAA
81701 AATTGGTGAA GCGTTTAAAG TACAAATAGC CTTCGACAAT CTTTTTCAAA
81751 TAATTGTACG AGTACTCCTT CAAGTCCACT TTGGACATGA TGATGCGCAT
81801 GTAGAATCGA GTCAGCCAAG TGGGCAAATC GTCCGTGCTG CGCGCCAATA
81851 TGATTTTGTC CCACCACACA TTGTACTTCT TCAAGATCAT TAACGCGTCG
81901 GCGTGGTGCG TGTAAAATTT GGAAATGTTA TCCGATTCTT CAAACTGAAC
81951 ATCGGGTTCA CGTGCAACAT CATCGCGCAA TTCGGTTAAA AACAAACGTT
82001 TATCATTAAA CTTGTCCATC AACATGTCGA CATATTCGAT TTTGTGAATT
82051 GTTCGATACA AGTACTGAAT AATTTTGTTG TGTTCTTTGG AAAAAAACTC
82101 TCCGTGTTGG TTAACAAATT CGCTGTTCGT GCGAATCAAC GTGGTCGACA
82151 CGTACGTTTT GTTAGTAAAA ATTAGCATCC AAATCAATTC GCTCAATTCT
82201 GCATCGTTAC CGAACATGTC CGCCATCAAG CAGACTTTTA GCGCTTTTCT
82251 ATTGATCTTT ATTTTCTTGT AGCATTTGCA TTTTGGTCGA GATCCCGATA
82301 CCGTTGACCG ACACGGTTTG CATTTTAGGT TGTGCAACAT GTCGGAAACC
82351 CTGTTCTTGT TTACGTACAG AGCGAGCGTA ATCAGATTTT CATCGTCCAA
82401 ATTCCACAAA TCGCGAAACA GGTTGTTTAA CGCGACTCGC ATATCGGCTT
82451 GGCATGTGTT GCAATTGCCC ATGTAGTTAA CTATGGCCGT GTTAGTTTTT
82501 AGCATTTTTA CATCTCGGCA CATTTTGGCG ATGTGATAAG TTCTATAAAT
82551 GCTGAGCTCG TCGGCGCTAG TAGATAGCAT GTAATTAAAC GCGTCCTCGG
82601 GCAAATACTT TTCGTCGGTG GGCTTCTTGA ATGTCTGCGG CAACGTGGTG
82651 CCCAACAAAA ATGGACAGCT CGAATGAAAG CTGTTGGTGA ACACGTTGTA
82701 CACACCGTGC GTTGTCAAGT ACAAGTATTT CCAATTGTTA AATTTTATGT
82751 TGCTCAACTT GTAACAATTG CTTTTGGTCA ATTTGAATAG GTCATCCTCT
82801 TTCTTTACAA TTTGATAATG TTTGCCGTTG AAAACCAAAT TGACTCCGGT
82851 CACTACGTTT TCCAATTTTC TAAAGAATCC TTTACACACA ATGTCAGGCG
82901 GCAAGTTTAG CGCCATCACA TTCTCGTACG TGTACGCCCA CAATTCATCG
82951 TGATCCAAAA TTTCGTTTTT AGCCGACTGA GTCAAATATA TCATGTAGTG
83001 TATGCCAAAA TAATAGCCCA ACGATACGCA CAATTTGGTA TCGTCAAAGT
83051 CAAACCAATG ATTGCAGGCC CTATTAAACA CTATTTTCTC TTGTTTTTTG
83101 TAAGGCTCAC ATCGCTTCAA AGCTTCATTC AAAGCTTCTT TGTCGCAGGC
83151 AAATAATGAT TCACACAAAA GTTCCAAAAA CAGTTTGATG TCGGTTTCTC
83201 TGTACGAGAA ATTTTCGTTC TTGGTCAATA TCTTCCACAG TACATAGATT
83251 AAAAAATCAA AATTTTTAAA TTTGCTTTTT TCAAAGTATT GTTGTAGAAG
83301 GTTTGGATCG TTGGCTCGTT CGTGGGTCGC CAAAACTTTA ACCATGTTCT
83351 CGTGAATTGC TATAAGCCCC AAATTGATTT GCGTTTGAAT GTAGTCTGCA
83401 TTTTCGCTGC TCGCCGATAT AATGGGTACG ATGCGCGGTT TTCTGGAACG
83451 CGTGTCGCTC AAGTCCACGT CGTTTTTGTC AAAATTGTTG TTCTCGAACA
83501 CTCTGAGGCT TTTGAGGTTG ACGTTGACGA TATGCTTGTA CTTGGGCACC
83551 GTAATGCATT CCTCCAAATT AATGTCGTCC CTAATGTAAT TGAAAAAATT
83601 TTTATCCGAA TTGACCAGCT CGCCATTAAC TTTGCACGTG GCCACAGTGC
83651 CGTCGGCCAT TTTGAGTATA AACAAGTCTT CGTGAGAATC GTCAAACTTG
83701 GTTTTTCCAT TTACAAACAG CGTTTGCGGC GGATCGTGAT TCGTGCGCAG
83751 GCTGAGCTCG ACGTTGAGAA AACATTTAGG GTCAAACACA AACAAATCCA
83801 CAGGGCCTAG TTTTTTGTTG TGTATGATTG GTATCGTGGG TTCGATGACA
83851 ATTCCAAATT TTATATTTAA AAACAGCTGC CATCCGTTAA AAGAGAAAGC
83901 TTGCTTTTTG GGCCAGTTGG GCCAATAATA GTAATCGCCC GCTTGCACGC
83951 ATTTGTTAAT GTATCCAGGG TCGGTGCTCT TGAAAAAATC TTCAAAATTA
84001 ATATACTTTT GTATGATGTC ATAGTGCTTC TTCAAAATGA AAGGTTTTAC
84051 AAAAATGCAA AAATCGTTAC TTTCCAACAC CCAGTCGTGG CCGTCTAATG
84101 TTTGAGCTGC GTGTTTCTCT GCAGGTTCTT CGGTGTCTTC GCAAGATGCG
84151 CCCATGTCGT GTTTCGCGCA CGGACCGTTA AAGTTGTTTC TAATTGTGTT
84201 TAAGAACTGT TGAAAGTTGT TGACGTACTC AAACAATCTA CGTGTTCCTG
84251 TTCGCGTGTT TCTAATGATT AAATGATTTG CATCTTGCAA GTTGTTAATC
84301 TCGTACGTTT TGTCTTGAGG CACGTTTTTC AAAAAAAATT GTAAAATGTT
84351 GTCAATCATG TTGGCTATCG TGTTTGTACT TTTCGTGTTA ATTTATTTAA
84401 TAATTTCGAT CAAAAATCAC CATCCATTCT TACATAGAAT AGAAACGCTA
84451 ATACAAGATT TCAACAACAC ATTGTTGTTT GGCGCGTATG TACAGATTTA
84501 CGATTTAAGC ACGCCCGCCC GCACCGAACG ATTGTTTATT ATTGCGCCCG
84551 AAAATGTGGT GTTGTATAAT TTTAACAAAA CGCTCTATTA TTACTTGGAC
84601 TCGGCGAACG TGTTTTGTCC CAACGAGTTT AGCGTGACCA CGTTCACGCA
84651 ATCCACTATT AAAACGATCA ACGAGACGGG AATATATGCC ACCGCATGCA
84701 CGCCGGTCAG CAGCTTGACG CTAATTGAAC ATTTTGCAAC ATTAAAAAAT
84751 AACGTGCCCG ATCACACGCT CGTTCTCGAT GTGGTCGACC AACAGATTCA
84801 GTTTTCAATA CTCGACATTA TCAATTATTT GATTTACAAT GGCTACGTGG
84851 ATTTGTTGGC CGAATAACGC GTATATAGAC GCTTGTACGT TCATCGTAGT
84901 AATCATTTTA ATACATTTGA TTGAACTAAA CATACATCTG CAATGGGTGA
84951 AAGAGTCACT AAATTTTGCA ATGGAAAACG GCGATAAAGA AGACAGCGAC
85001 AATGAATAGA GTTTATATTT TTATTTAATA AAATATTGTT CGTAATCCAT
85051 AATGTTTTGT ATTATTTCAT TGTGATAATG TTCCCAATCT TGCACGGGGG
85101 TGGGGCATCG TTTGACTTTG ACGTAGAAAT CGTACGCGTA GTTATTAGTT
85151 GGCAGATCGT CGACAAGTGT GATCGACTTG AAAAAGTTTA CATTTTTATC
85201 GCTCAAATAT TTAATTACAA TTTTTGGCGA TTTGGGTATA TTGTTGTCGG
85251 ATCGATGATT GTGAATGTCA AAAACAAATT TATTTTCAAT GAAACGCTTT
85301 TTTAAATTGT AATCTACAAT AGCGTTGTGT GAATTTTGAA CTAAATCAGA
85351 GCGTTCTTCT TGAACGGTGG AACCTTCGCT GATAATGATA TCAAAATAGC
85401 CTTCCAAATC GACGTCTCGC ATCGAGTGTG CTACATGATC TCTACTGCCA
85451 TACGACCACA AGACTAAAAC GCAACCCATC TCGTGCAACT CCTGCAAGCT
85501 GTCATACACA AACGGATCTC GAATCTCAAC TTGCTCCTCT TCGGTTATGA
85551 GAGTGCTGTC CAAATCAAAC ACGACCACGT GCGGAAATCC CCACGTCAAA
85601 GATTCGCTTT TGAGAGAGAC CACTTTGTAG TGTGGCAATA GAAACCATTC
85651 TTTAAGAAAC GAATACATTG GCGGTTTGTT GCTAAGCACG CACATGTGGC
85701 CCAACACTGG CGTTTTGAAT GCGCGTTTAA TATTGTGCCT GATGTCGCGC
85751 ATGTCGTCGG CGGGCGCTTT GAATATTTGC ATACAGTAAT TGTAATTGTT
85801 TTCTATGATC TTGCACAGCT GCGGGTCGTT GCAAAATTGA AATATTACAT
85851 ATTCAAAAAA TTTATACTTT TCAAAGCCAA GGTATTTGAG GTCGGCGTAC
85901 TCGCTTAAAA CGAGAACATG TCGTTTGATG ATGGCGTCGT TAAGGCGCAA
85951 ACAGATCCAT TTGCTTTGAA GCGAGGAGGC CATAATGTAC AAAAATGGAC
86001 CAGTTACGCC TTATTTAAAC TGTTTAAAGA GTTTCGTATA AACAAAAACT
86051 ACTCTAAACT AATAGATTTC TTAACAGAAA ATTTTCCCAA CAACGTCAAA
86101 AACAAAACGT TCAACTTTTC GTCTACCGGC CATCTGTTTC ACTCGTTGCA
86151 CGCGTACGTG CCCAGCGTCA GTGATTTGGT GAAAGAGCGC AAACAAATTC
86201 GATTGCAGAC AGAATATTTG GCAAAGCTGT TCAACAACAC AATAAACGAT
86251 TTCAAACTGT ACACTGAGCT GTACGAGTTT ATCGAACGGA CCGAAGGCGT
86301 CGATTGCTGT TGTCCGTGCC AGCTATTGCA CAAGAGTCTA CTCAACACCA
86351 AAAATTACGT GGAAAACTTA AATTGCAAAC TGTTTGACAT AAAGCCGCCC
86401 AAATTTAAAA AGGAACCTTT TGACAACATT CTTTACAAGT ATTCCCTAAA
86451 TTACAAAAGT TTGTTGTTGA AAAAAAAGGA AAAACATACC AGCACTGGGT
86501 GTACACGCAA AAAGAAAATC AAACACAGGC AAATATTGAA TGATAAAGTT
86551 ATTTATTTAC AAAACAGTAA TAAAAATAAA CTATTTGAGC TTAGCGGGCT
86601 TAGTTTAAAA TCTTGCAGAC ATGATTTTGT AACAGTCGAA AGCCAAACGA
86651 GGGCAGGCGA CGAAATCGCT TCGTTCATTC GCTACTGTCG GCTGTGTGGA
86701 ATGTCTGGTT GTTAATAGTA GCGTGTTCTG TAACTTCGGC GACCTGTCGA
86751 TGAACGGCTC CTGGATCTTC TGTATGTGCG GGGTCTACCC GGGCGGCGTC
86801 TGTAACCCGA GCTTCTGCGC CTGCGTGTCG AACCATATGT GGTACCGGTT
86851 GAAGAACGGC GACGGCGACG ATAAACCATG TTTAAATTGT GTAATTTATG
86901 TAGCTGTAAT TTTTACCTTA TTAATATTTT TTACGCTTTG CATTCGACGA
86951 CTGAACTCCC AAATATATGT TTAACTCGTC TTGGTCGTTT GAATTTTTGT
87001 TGCTGTGTTT CCTAATATTT TCCATCACCT TAAATATGTT ATTGTAATCC
87051 TCAATGTTGA ACTTGCAATT GGACACGGCA TAGTTTTCCA TAGTCGTGTA
87101 AAACATGGTA TTGGCTGCAT TGTAATACAT CCGACTGAGC GGGTACGGAT
87151 CTATGTGTTT GAGCAGCCTG TTCAAAAACT CTGCATCGTC GCAAAACGGA
87201 ATTTCGGTAC CGCTGTTGAT GTATTGTTGC GGCTGCAACA TTTGTATCTT
87251 TTCGCCGCGC TCGATCAACA ATTCTTCAAG AGTGGTGCGT TTGTCGCGCT
87301 GTAAAGCCAC GTTTTGTAAC AGCACTATTT TCGCATATCT CATAATCGGA
87351 CTGTTGAAAC AGCGTGCAAA CGACGACCGC ATAATATCGA CGGTCGTCAA
87401 GTCGATTGTG GTCGAAGGCA TCTCCAACAG AGATCGCACG GCGTCCAACA
87451 GCGTGTCCGT TTGAACCTGC GTCATTTGCG GTCTGCACGT GTAGTCGTCA
87501 AACGTGGTTT CGAGCAGTTT GAACAACGAA TGATACTTTT CCGATCGCAG
87551 CAAAAATATC ATGGTCATGA CCACGTCGCT GATTTTGTAT TCTGTAGAAC
87601 TGGTGCTGTT CAACGAATAG TGATGGATTA GTTTGCGAGC AGCATTTCTG
87651 TATCGGCGCA TGTTGATCAA CTCTTCGGAA GGCTGCGCGG GCGCGGCGGC
87701 GTTGGCTCGC GCAAACAAAT TTATTACGGG ACGCGGCGTA GGCTGCGCGG
87751 ACGCTGGCGC GGCGACGACG TCCGCGTTTC CCGCCGCGTA CTGAGACGCT
87801 ATGGCAGCGT TGTTATTTAA AATTGTGTTT TGCGATTTGC GAGCCACGTG
87851 CATCATAAAA TTTATCAACA CGTCGGTGTT CAACTGCACG CTTTGATGTT
87901 CGTCGCAGAG CAAAGGAAAT AGCTGGGGCC ATATCGCCAA TTGCATAGGC
87951 TCGTCTATTT TTAACCGCAA TTTGTTTATT TCCAAATACA ACGCGATAGC
88001 GCTCATCGTG ACCGACGACG CACACTTACT CTGTAACTAT CACTTGGATC
88051 GTGTTGTCGT AAACGCTTCC CAAAAAGTCT AACACGTTGA CCGTTTCGAT
88101 TCTATTCAAC TTAATTGTGG ACGCGTTGGC TTGCATCGGT TCCAACAGAC
88151 TGCGCGCTCC GACAGATTGA GTAGACAAAA TTTTTAAACT TTCCGTCTTA
88201 TTGGGCGTAA TGTCGTTGAT TAACAACGAC GCAGCCGTTT GAGAGGCCGC
88251 AGTGTTGATG GTTTGCAACA TGTCGACGGC CGCCATTTGC GTTTGCGCCG
88301 AAGGTCTTGC TGGCGGCCTG TTGCGGCGGT TTCTTCGTGC TTGCGACATG
88351 TTGTCGTCAG TGTCCATATC GGTATCATTT ATTGAAGCAA TCATGGTTGA
88401 GTTCGATAAG CAGAGATATT TCGTTGTCCA ATTGGTACTT GGTAATGATG
88451 TGCCTTATAA ATGTTTCGGG CACAATCATT TCTGTCATTA GCACGTTACA
88501 AATATCTATT TTGATCAATT TCAATTTATG AATTAACAGA TTAATGTTTT
88551 CGTCCGAGTA CTTGCTCATG ATGAAACGAC AAACGTTGCG GAGTTCCAAC
88601 TCCGCTACCG GATACGCTTT GTTGGGCAAA CTCTCTAAAT AGTGTCTCAA
88651 ATAAAAGCCG ATCAATACGG TGGACGCTAT TTTGTTAACC TTTTTCATTT
88701 TAGTATTGCG GCCCATTTCT ATCATGAAGT TTTTAAACGG TAGCAACAGC
88751 CTGTCTCCGT TAGCAACAGT GGAGCAGCCG TTGCATTGCG CGCTCAAAAT
88801 ACTCAACACG CGCTCGTGAT CTTCTTGGCG CAATCCGACG GTTGCTTTTT
88851 TGCATTCTTT GACAAATGGC ACGCACATGT CGCGTTTCGT GTACAAAGAA
88901 TACGCTTTGT CGCAAATCAA GTTATAGAAA AATTGCACAA ATATCTGCGT
88951 AATCAAGTTG TTTTCGTTAA TAATGTCACT TTCGTTTTTG TAATCGGTTC
89001 GAAGCAACAC GTACAACATC AGAGGCATGC CGAACATGGG TCTTAAAAAA
89051 ATGTCCCAAC CATTTTGCAA GCCCGCGTCG AGGGTGCTCA GCGAGGACGC
89101 CAAGTATTTG CATTTGCACT CAAAACATTG AATTTTGTTT GCGGGCTTGC
89151 ACGACTGACA CATGATCGCA TCCACGTCGG GTGCCGGCGT CGGATTGTAA
89201 TATTTTTGCA AGTATTGCAT AATGGTCCTA AAATGGGGTA CCTGTTTGAT
89251 AAACTCGTCG CGCAAAAATA TCGAAAAAAT GTTTTTTACA TTGTGTATGT
89301 TGTCTGTGTT GTTGGCTTGA TTCTCAAAAC TACTCTTTAT GGAAACAATA
89351 CATTTGTTAA ATTCTGTGAA AAAAGTAAGA CCTTTACTGT CCACGATCAA
89401 GCTTTGGTTG AAATATTTTG AAAATAAAAA ACACAACGAA TCGATTTCAT
89451 CTTGTAACAA TTGCGCTTCA AAACACACGT TTTCAAAGCG GTCGTAAATG
89501 TTAAACCTTA AACTGTATTG TAATCTGTAA GCGCACATGG TGCATTCGAT
89551 ATAACCTTAT AATATGAACG ATTCCAATTC TCTGTTGATT ACGCGTTTGG
89601 CAGCGCAAAT ACTGTCCAGA AACATGCAAA CGGTGGATGT GATTGTTGAC
89651 GACAAAACGC TCAGTTTGGA AGAAAAAATA GACACGTTGA CCAGCATGGT
89701 GTTGGCTGTA AATAGCCCGC CGCAATCGCC GCCGCGGGTA ACATCCAGCG
89751 ACCTGGCCGC ATCGATCATT AAAAATAACA GCAAAATGGT GGGCAACGAT
89801 TTTGAAATGC GATACAACGT GTTGCGTATG GCCGTCGTTT TTGTTAAGCA
89851 TTATCCCAAG TATTACAACG AGACGACCGC CGGTTTAGTT GCCGAAATAG
89901 AAAGTAATCT GTTGCAATAT CAAAATTATG TAAACCAAGG CAATTATCAG
89951 AACATTGAGG GTTACGATAG TTTATTAAAT AAGGCGGAAG AGTGTTATGT
90001 TAAAATTGAT AGACTATTTA AAGAGAGCAT TAAAAAAATC ATGGACGACA
90051 CGGAAGCGTT CGAAAGAGAA CAGGAAGCGG AGAGATTGAG GGCCGAACAA
90101 ACTGCCGCAA ACGCTCTTCT GGAGAGGCGA GCGCAGACGT CCGCAGACGA
90151 TGTCGTTAAT CGTGCCGACG CCAATATTCC CACGGCATTT AGCGATCCGC
90201 TTCCAGGCCC CAGCGCGCCG CGGTACATGT ACGAAAGTTC AGAGTCGGAC
90251 ACGTACATGG AAACCGCCCG ACGTACCGCC GAACATTACA CCGATCAGGA
90301 CAAAGACTAC AACGCGGCGT ACACTGCCGA CGAGTACAAT TCCCTGGTCA
90351 AGACGGTTCT TTTGCGTTTA ATCGAAAAGG CGCTGGCCAC TCTAAAAAAT
90401 CGGTTGCACA TAACAACTAT TGATCAATTG AAAAAGTTTA GAGATTATCT
90451 GAATAGCGAT GCTGATGCTG GAGAATTTCA AATATTTTTA AACCAGGAAG
90501 ATTGTGTGAT ACTGAAAAAT TTGTCAAATT TAGCGTCAAA GTTTTTCAAC
90551 GTTCGTTGCG TGGCCGACAC GTTAGAGGTA ATGTTGGAAG CGCTTCGCAA
90601 TAATATTGAG TTGGTGCAGC CTGAAAGCGA TGCCGTACGG CGAATAGTCA
90651 TAAAAATGAC GCAAGAAATT AAAGATTCGA GCACGCCGCT GTACAACATT
90701 GCCATGTACA AAAGCGATTA TGACGCCATA AAAAACAAAA ACATTAAAAC
90751 CTTGTTCGAC TTGTACAACG ACAGGCTGCC AATCAATTTC TTGGACACGT
90801 CCGCAACCAG TCCAGTTCGC AAAACTTCCG GCAAGAGATC TGCGGAAGAC
90851 GACTTGTTGC CGACTCGCAG CAGCAAACGT GCCAATAGAC CCGAAATTAA
90901 TGTAATATCG TCAGAAGACG AGCAGGAAGA TGATGACGTT GAAGATGTCG
90951 ACTACGAAAA AGAAAGTAAA CGCAGAAAAT TAGAAGACGA AGATTTTCTC
91001 AAATTAAAAG CATTAGAATT TAGCAAGGAC ATTGTCAACG AAAAGCTTCA
91051 AAAAATTATT GTGGTCACCG ACGGTATGAA ACGGCTGTAC GAATACTGCA
91101 ACTGCAAAAA TTCTTTAGAG ACTTTACCGA GCGCCGCTAA CTATGGCAGC
91151 TTGCTCAAAA GGCTAAACCT GTACAATCTC GATCATATCG AAATGAATGT
91201 AAATTTTTAC GAGTTGCTGT TTCCATTGAC ACTGTACAAT GACAATGATA
91251 ACAGTGACAA AACGCTTTCT CATCAATTGG TAAATTACAT ATTTTTGGCC
91301 AGTAACTATT TTCAAAACTG CGCTAAAAAC TTCAACTATA TGCGCGAAAC
91351 TTTTAACGTG TTTGGCCCGT TTAAACAAAT CGACTTTATG GTCATGTTTG
91401 TTATAAAATT TAACTTTTTA TGCGACATGC GTAATTTTGC CAAATTAATC
91451 GACGAGCTGG TGCCCAACAA ACAGCCCAAC ATGAGAATTC ACAGCGTGTT
91501 GGTCATGCGG GATAAAATTG TTAAACTAGC TTTTAGTAAT TTACAATTTC
91551 AAACCTTTTC AAAGAAAGAC AAGTCGCGCA ACACAAAACA TTTGCAAAGA
91601 CTAATAATGT TGATGAACGC AAACTACAAT GTTATATAAT AAAAAATTAT
91651 AAAATATTTT TAATTTTTAT TTATATTCAG TACATTTACA CATATTAACA
91701 TATTGTTTAT ACAAATTCTT ATAATCATTA TGATTTAAAT TGAATTGTTG
91751 TCTAAACAAA TTAAACACTT TATTAAACAA TAACTTTTCG TTGTAATTTT
91801 TTACTTTGCA CATGTTATAA CAAAAAATTA AAATTTTCAT CATGTCTGAT
91851 TTGTCTATGG CGTCACAGTT GCTTTTAATG TAATCGCAAG TTAACCACTC
91901 AAAAGGACCC TTTTCTATTT TTAATTTGTT TAAATCTTTA TAATCAGACT
91951 TCAGTTTGTA AATTAGATTT CCACATCGAA TAATAAATCC TTCCAGCGGG
92001 CTTTGGGGAA ACATTAAAGA CTTGAAATTT AACCTTTCTA CAAAATCGTT
92051 GTACAAATAT TTGTGACACG GAATAGTATT AAACCCCACG TTAGTCAACA
92101 ACTCTTGCGC CTCCACAAAG GGCACAAACT CCCCGCCGTA TAATTGAATT
92151 TCGTAAGCGT AGTATTTCAA ACTCTCTTTC TGGTCCACGT AGTTAATTAC
92201 GTTAATGGGT GTCGTTTTTG CGTCGTCTTT CCAACCCATT AATTCGCCGT
92251 AGACAATAAA ACCGTCATTG AACCGCGCCT GAAGCGATCG CATGCACGTT
92301 TCTAAATCTT TTCGAATGCG GTAATAATTC ATAAAATTGC CGTCCGGTCT
92351 GTAAGTGTTT CTTGACCCGT ACGTAATTTT ATTTTGGTTG CAAATGATTC
92401 TGAAATTACA ACCGTCCAAC TTTTCTTGAA CAATAATTTC TTTGTCGGCC
92451 AACGTACCTT TTTTACCTTG ATCTAGATGC GACACAGATG GATAAATTTG
92501 ATACACAATT TTATTCTCAT CTTCGGGCAT TACGGGTCCG CGTTCATTTA
92551 ACGCGTACAT GACAATGTTG TGGCGAATGT CGGTGCGCTC CGGCGGTTCT
92601 GGCACGTGGT GCAGTCTGTC CTGCAATTGT TGCTTCCATT GTTGAAAATA
92651 TTCGGTCCAT TCTTGTTGAT ACTCGCCGCG TTGCATGAGT TTTACGTACA
92701 GTTTTAAAAG TTTGACATTC TTTACAAATA ACGTTAGAGT TTCGTCGATT
92751 TTGTATCCTC CATTATTTTT GTTTAAATCC AATACATTTA AATCGTTCAC
92801 TACCAGTTGA TTGTTTTTAT CCATCGTAAT TTTTATCTCA TCGCCCACGT
92851 TGAACAACAT GTTTAAAATT TTGGTGGATT TCGGCGCACG TTTATAATCT
92901 AAATAATATT CAACGTACAC GTAATTGAAC ATGAGCTGCA ACAATCCTTT
92951 GGCATTGTTC AAAATTTTGT ATCTCATCAA AGTATAAATA ATTTTCACCA
93001 TCGACACCGT CATCAACTTG GTTACAAACT CGTACAATTG CAAGTTTTCA
93051 ATACCGTATT TGTCTTTAAA ATCTTCACGT TTACTGAACA TGCTTAATTC
93101 GGGAGATTTT CCAGTCAAAA TGCCAATTAA TCCCGTGTAC AAGTCAACGT
93151 ATTTGACATC GTTGCCCGAT TCATCTTTTG CATGTCGATT TTTCAAAAGC
93201 TCTTTATTGT CGATAAATTT TTCAAAGGTC TCTCGATCAC ATTTAGTGTA
93251 AATATGGTAG TCAGTGTCGC TGCTTTCGAC CGCGTATCCC TTGGCATGGC
93301 TGCCCGTATC AATGCAAATG TACACCATGT TAGAATGTGC TGCTTACTGT
93351 GCCTGTATCA AGCCTTATAT ACCTCAAAAT ATTTCACATT TTTGCATCAT
93401 CGTAAAATAT ACATGCATAT AATTGTGTAC AAAATATGAC TCATTAATCG
93451 ATCGTGCGTT ACAAGTAGAA TTCTACTGGT AAAGCAAGTT CGGTTGTGAG
93501 CCGTGTGCAA AACATGACAT CATAACTAAT CATGTTTATA ATCATGTGCA
93551 AAATATGACA TCATCCGACG ATTGTGTTTT ACAAGTAGAA TTCTACTCGT
93601 AAAGCGAGTT TAAAAATTTT GTGACGTCAA TGAAACAACG TGTAATATTT
93651 TTTACAATAT TTAAGTGAAA CATTATGACT TCCAATAATT TTGTGGATGT
93701 GGATACGTTT GCAAGACAAT TGATTACAGA TAAATGTAGT GCTCTAATCA
93751 AAGTGCGGAT CTGTTGCCGG CAAACATTTT AGAGATTGTA GAGAAGGCCA
93801 GAGACAAGTA TTTTGAGGGC CAACTCAAAA AAACTATGAA TACATTAAAA
93851 AATTATTTTT ACGAAAAAAT ATATGGACGA TTCGATAGAT TATAAAGATT
93901 TTAACAGACG CATCCTATTG ATAGTTTTTA AATTCGCTTT AAACAAGAGC
93951 ACAATACTTT CCATCGTACA AAGAGATCAT CGAGTGGCCA TTAAACGTTT
94001 AAACAAAATT AACCCCGATT TAAAGAGTTC TCCGCGCAAT GCTTCAGCAT
94051 TACAATGAAT GTTTGGAAAA TCTAGACAAT CCAGTCACGG ACGAACATCA
94101 TTTGTTGACA AAAGAGTTGC TACAAAAATA TTTATCGAAG CGTTTGAATA
94151 CAGTTACACC AACACTAATG CCATCAGCAT GGACAAAACA GATGAATTTG
94201 ATTTTATTAA ACCGGCATTG AAACCTTTGC CAGATGCAAG ACCGCCATCG
94251 CTTTTGGCCA ACGTGATGAA CGAACGTAAA AGAAAATTAC AAAACACCAA
94301 CTCAACGGCA AAATGTTTGC TACCAGCACC ACCGCCACAA TTGCGTAAAC
94351 TTGAAAAAAA GAATCATTTA TTGCCTTTGT TTTCTTTGTA ATTATATTGT
94401 TGCATTTCTA TTTCTAATAT CATAGTTTTC TAATAAAGTA GTTTCATATT
94451 TTTGTTTTTG TACAGTAATT GTTTCTTGGT TTAACAAGAT CACAACCAAT
94501 AACATAAAGA ATAACACAAT CATAACAAAA ATTAAAAAGC CGCATACTAC
94551 TAGAACAAAT TCTTTAATTA GCGATCGGTT TCTATTTACA AATTGGCCGA
94601 GCTGATCGCC TTCAGTCGGC GAGTTGTGGG CTTGGATGAT GTCGACGATA
94651 TTGTTGCCGG CGCGACCGCC TGTCGCTCTC GATATAATGT CGGCCGCCGT
94701 CGGTTTCATG ATGTGCTTAA CTACAAATAA TAGTTGTACT TGACGGGCGT
94751 CACCGTGATG CCGCTGCTAA AACCTCCGTC CGTTAAGACG CGTTGCGTTA
94801 CAAAATTAAT GTTTGTCCGA TTAGCGTAGT CGGAATAATC AAACGTGTTG
94851 GGCGGACTAA AATCGGGCAT GTTGATGGGC ACAATGCCGC TGGAGCTGAT
94901 AGCAATGCTG TCGTTCTTGC AAAACAGCCG AATTTTTTTG TAGGGCTCTG
94951 CTTTATTCGG CGCAGACGAC ACCATCTGGT CAAAGTTGTT CAATTTTATG
95001 ATTACGTTGG GTACCAATTG ATAGGGGAAA ATTATTTTCT GGAACATTTT
95051 GACAAAGTCC ACAACCGTTT GGCTATAGTC GGGAATGCCG AGCAAAGACT
95101 GCGCCTGTTT AATGTATTTG AGACTGGAGC GGTTTACTGT AGCGCAATTG
95151 GATGGCACGT CGCCCTTCAT AAGCCGGCGC GTTCTCTCCC AATTCAATTT
95201 GTTGTACAAA TTATCAATCT CCTCGTGCGG CAGATTGATT ACATAGCGCG
95251 CGGGCTGTTT GCGATATTGA AAGATGCAAA AAATGCGTTT CAACGACAAT
95301 ATCTTCACCA TGGTGGACGT TTCCAGATTG AAACATAACA AAAAGTCATT
95351 GCTTTCCACC AATTCTTTAA AATGAGACAG CGGAATTTCA CAAGCGATCG
95401 GTCGCAAATT GCTTTTTATT GGAGGCGGAA CGCTTTGACC GTTGCGGTTT
95451 TTTAGTAACG CGCTGCACGC AGATTGCATG TCCGTTTCGG GATACGTAAA
95501 CTCGATGGGA CATTTGGGGT TTTCATGGTG AACGATCATA GTGTTGCAAT
95551 AAAACAAGTT GTTGGTCAGG AGCACGCTAA AAACACGCGT TTCGCCCGCA
95601 CCGATTTCGG TGATGGGTAC CAACGGGTTC CAGTAGACTA TGGTGGCGGA
95651 CGCTGTTTTT TTTGGCGATC GACTGTCTAT GTTAACATCA TGCTCGTGCC
95701 TGTACACTAG CACAGAATTG AATTTTGGAA ATTGTTTTTT GTCAATGTAC
95751 AACCGGTCGT CGTCTGTGGG CACGTACACG ATCAAGTTTT CGATTAATTT
95801 GTTGCCTACG TCGCTTTGCG GTTCCACCAA ATTGTGAGGG AACGCAAAAA
95851 AGCGATCGCT AATACAAACT TGAATCTGAA ACGGGCACTC CATCGTGATG
95901 TATATGTCTT ACTTCATTAG ACTTTAGATT ATTTTAATTT GTGAACTCGT
95951 ACCGTATTCA ATAGGGTGTC GGGCACGTAA TTGTAATGGT AAAACAGATC
96001 CTGTTGAACA CGTGCGTTGT TCACTACGAT TGAAATGCAA AAATACATCA
96051 AGTACATAAA CACTATGATT AGAAAGGTAG CAGACAGAAA ATATTTCATC
96101 TTTAAATCTT ATGCTAGTTG AATAAAATAC ATAGTACTTT TATACGTTTA
96151 TTTATATTTG TTTTCTTTGT TATAACCGTA ATTGTAAAAC TTGTGATCGT
96201 GCTCGCCAGG CATAATTTCT TTGCACATCA GCTTGCGAAT ATATGTGACA
96251 TCTTCGTACA CCGATTTCTT GATGTTACCA TCGTGAAGCG TTGTCGGCTT
96301 GAGAGGTTTG CGGTCGTTGT TGTAAAAATT TTGCACCGAA TAATTATCCA
96351 TAGTGCAGCA CAGGCAATGT CACTGATGCA TATGCTTTAA TTTTTTATTG
96401 CATTCAGTTA TTATATGATT TAATAAACGT ACACAATAGC ACGTTTATCG
96451 GTTAAAGATA ACTTTCAATA TATAAAAGTG TTTGAATTGC GAGACCGTCA
96501 ACATAACGTT TATCAACGCG ATGACTAAAC GACAATTTGC TTTGCTGTTT
96551 GTGTGGCACC ACGACAACCA ATTTGTTTGC AACACGGACG AATACCCGTT
96601 TTGGCACAAC ATTGAATACC ATGCACGGCG CTATAAATGC ATCGTTTTGT
96651 ACTGTGTGGA AAACGACGGA TCGCTACAAC TGCCCGTTTG CAAAAACATA
96701 AATCTCATAA ATTATAAAAA AGCGTATCCT CATTATTATG GAAACTGTGT
96751 TGACAGTATA GTGAAACGTG CTGGCAAAAA TTGATTATAT GAAAGTAACT
96801 GCAATGTTAA ACCCCCACCT GTTGGACGTC GCGTACAATT ATTTGCTGTT
96851 GATGGACATG GATTGTGTGG TGCAAAGCGT GCAATGGAAA CAATTGTCAA
96901 CCGACACGTA TTGTTTTGAG CCGTTTTACG ACTCTCAAAT TAAATGGTTG
96951 TACGCGCCCA AAAGCGGACA AAGTTTTGAT AGTTATCTTG AAAACTATGC
97001 AACTCTAATT CGAGTCAAAC AAGTGCAGCA ACATCGAAAA GAATTAATAC
97051 TGCATTGTGT GGATTTTCTT ACAATGAAAG CAAATGACAA TTTTATGGTG
97101 TTCAAAAATT ATATTAACAT GATTATAAAA GTGTATTTGC AATTTTACAA
97151 TTACAGATTT CCCATCAATT TTGAGGACAA CACGATGAAA CCTTGTGTAA
97201 ATTTAACTTT TAGACGTGGC GGCAGTTGGA AAACTCAACT GCAACCCGTA
97251 TGCAATTATG TTTACAAAAG TAAAAATATG CCAAAATTTA TTAAATAAAA
97301 CAAATTAATT TAAACAAGCG TTTTTATTGA CAATACTCAC ATTTGATATT
97351 ATTTATAATC AAGAAATGAT GTCATTTGTT TTCAAAATTG AACTGGCTTT
97401 ACGAGTAGAA TTTTACTTGT AAAACACAAT CAAGAAATGA TGTCATTTTT
97451 GTACGTGATT ATAAACATGT TTAAACATGG TACATTGAAC TTAATTTTTG
97501 CAAGTTGATA AACATGATTA ATGTACGACT CATTTGTTTG TGCAAGTTGA
97551 TAAACGTGAT TAATATATGA CTCATATGTT TGTGCAAAAA TGATGTCATC
97601 GTACAAACTC GCTTTACGAG TAGAATTCTA CTTGTAACGC ATGATCAAGG
97651 GATGATGTCA TTTGTTTTTT TAAAATTCAA CTCGCTTTAC GAGTAGAATT
97701 CTACTTGTAA AACACAATCG AGGGATGATG TCATTTGTAG AATGATGTCA
97751 TTTGTTTTTC AAAACCGAAC TCGCTTTACG AGTAGAATTC TACTTGTAAC
97801 GCAAGATCGG TGGATGATGT CATTTTAAAA ATGATGTCAT CGTACAAACT
97851 CGCTTTACGA GTAGAATTCT ACGTGTAAAA CACGATTACA GCACTTCGTA
97901 GTTGTATCGA AAATTGTTCA ATGGCTCTTT GTTAATGTCG TAATTGATTA
97951 ATATGTCGTA CAATTTGGCG GCGTTGTGTT TGCACACGAC CGTTTTTAGT
98001 TCTTGAAACA TTTTTTCGTG TATGTTTAGC ATGTTGTATT TCAGAGTGCG
98051 ATGTGTAATG CTGGTGACGA GCATCAAAAT GATAAAATCT AAAGCGGCTA
98101 ATTTGTAATC CCGTTCATAC GCTCTGTAAT CGCCAACAAC TCTGTGGCCA
98151 GATCTTTTTA GATTTTGACA GGCGTTATGG TACGAATTGA TAATATTTAC
98201 TATAGTTTCT CTTGTTATCG GTTTGTCGAT TAAACTGTTA ACAAACATCA
98251 CGTTGCCCAA GCGCGACGGT TTAGACACCG ACTTGTTTTT TGTCTGTTCA
98301 AATTTGTACA AATTAAAAAC GCTCATAGAC TGGTCGTCAG GCAGTGTGTC
98351 GTTATACAAA CAAAATGGTA AAACGTTTAA TTCGACAAAC GACGAGCACA
98401 TTAAAGTTTG TTGGCTGTTA ACGTCCTGGG GATGTAAACT GTTATTCATA
98451 ACGTAACACA CTTCAATGTC GGAATGCTTG TTTTCAAATT TGTCCTTGTC
98501 TACAGTTTCA ATGGTGATTG AGCGAGGTTT GAGTTTATTT TCTAAATTCA
98551 TTTGGATATT TTCAATATGG TATACCACCG ACACGTTGTG AGCCAGCGAT
98601 CCTTGATTGG TTTTAATCAT ATTCAAAATA TTCATGATAT GGTTGAAAAA
98651 AGAGTCTGTC AAAACGTTTG TGTCGTTGTT AAATATCGCT TTCCAGGGTT
98701 TACTGTTGCG TGACTCAACG ACGGCCGTGT AACATAACAA GCGCGCCAGT
98751 TGCATGTGCG ACAACTTAAT GTTATCAATG TCGGTGATGT TTGGCACCAG
98801 ATTTTCATTG CCGTCTTCCA GTAGCGTGCT CAGTTCGGTC GAGTAGTTAT
98851 TCAACGATCG ATTGTGCGAT TCAAACAAGT TTACTATCGC AGGTTGTACA
98901 TAGTTTTTTA TGTCGTCAAA TTGAATTATA TCGATCTTGT CCTTGTTCTC
98951 CAGCATAAAC GACAAATTTT TTAGGTCGAA TTTAATATTT GGCGCGTTTT
99001 CGTTGGACTT TTTGTAATTT AACAACATCG CCAACAGTTT GTGTAACTCG
99051 CCGTTAGCTT GATCTTTGCT AAACAGTTTA TTGGTAGCGT AATTCACGTT
99101 GTCGTTCAAA AACAGCAACT CGTTGATGAT CATTTTTTGT AAAAGCGCGT
99151 ACTTGCTCAT GTTGACAGAA TCTCTTACAT TTCAGTTGTA AACGCGTCTG
99201 TACAAATTGG CCATGCGATT CGGAATGCAC ACGGGGATCG TGCGAGCCAG
99251 TGCCGTTTGG CGAAATAGCA TTTTTTCATA GCCGCTCGAA CAATCGCACG
99301 CGTCCGGCGA AAATTGCACC GTGTTCAAAT TCATATTCAA CCGGCCGTCG
99351 TTGCATAGAT AAGGCCTCGG TGTTCCCGTA TCGTCCACCA AGTCTCTGTA
99401 CGTGCTCACG CATGTTTGAG ACACGACAAA ATCTCCGCCG GCGGAGAAAA
99451 CGTGAACCAA GCCCAGTGCG GGATCGCATT CTATCAAGTC CGGAGCCTGC
99501 GCGTTTACCA AAGCGTCGGA GGCGTTGCAA AAGCCATCCT GGCAGGTCAA
99551 CTCGTTTGCA GCGCTGGAGA TCACGCAGTT GTCTCTACAC TGCTGATCCG
99601 TCACGCACGG TAACCGGTTC AATGAACAAT CTACGCCTCG ATTGCGCTGA
99651 AACGTAAAAT TTAACGGCGG CGCTTCCAAC TCGTTAATGT GCATGTATGC
99701 ATCTTGCAAA ATAAATTTTT GAACAAATTT AAACGTGTAC ATGTACACGA
99751 TTAGTATAAT TACCAGTAGA ATAAGTATTT GCCAAAAGTT CAACATGATC
99801 GTCTTAACTG AGTGTGAAAA GCGTGGTGTG ACGCACGAAA TGACTGGTTG
99851 CGCAAAAAAT AAACCGGGGT CTATATAACT CGGCGTCGAC CGCGTTCATT
99901 TTTACCGTCA TGCATCTGAC GGCTAATGTA TTGCTCGTTC CTAACGCGCT
99951 CAAAAAGCGG GACGTGAAAT ACATTTATAA TACCTATTTG AAAAATTACA
100001 GTGTAATTGA AGGTGTGATG TGTTGCAATG GCGATTGTTT GGCCGTGGTG
100051 GTGTTGGACC GAAATCAGCT GCAAAACACG GACATGGAAG TGTTGGAGAG
100101 TTTAGAATAC ACTAGTGACA ACATTGAACT GTTATGCGAA AAAATATGTG
100151 TGATAGTTGA TAATTACGAC AAGTATTACC AAAAAAATTG TGTATAAATA
100201 AAATACCAAA TTTTATTATA TCATTTTGTT TTATTTAATA ATTAAAGAAT
100251 ACAACGCCAC ATCTATTCCT AGTACAACAA ATAATTTGAT TATTATTTTT
100301 GAGTGCACAT TAAAAAATAA CAAACAGTGT AAAAATACTA CAGAATAATA
100351 CAATACATAA ATATTATAGT AAATAGCTGC AATTTTGATA GCGTAATTTA
100401 TACTTTGATA TTTTTCAACG TACAACGTTA AATGTTGATA CGCATTATTC
100451 ACAAATAACA AAATTTTTCT AATATGCCAT TTGTCCGCAA TTGTTTTTGC
100501 GATATCAAAG CCTTTTTCAA ACAATTGAAA AATTGCAAAC AAAACCACGT
100551 ACATGACGTT ATACATAGTG TTAAAGTTTT TACATAACAA TTCTATAATG
100601 AAGAAAATTG CTAAACACGG CATGAGCGCG CACATAATCG CGTTGGCCGC
100651 AAATATCTCG TACGTACAAA AATACTCGGA CATTCTCCAA TAAGTAAAAT
100701 GCATTTTGCT ATTATACTGT TGTTTCTTCT AGTGATTATT GCAATAGTGT
100751 ACACGTATGT AGACTTGATA GATGTGCACC ATGAAGAGGT GCGTTATCCT
100801 ATTACGGTTT TTGACAACAC ACGCGCGCCG CTTATTGAAC CGCCGTCCGA
100851 AATAGTAATC GAAGGCAATG CACACGAATG TCACAAAACT TTGACGCCGT
100901 GCTTCACACA CGGCGATTGC GATCTGTGCC GCGAAGGATT AGCCAACTGC
100951 CAGTTGTTTG ACGAAGATAC AATAGTCAAG ATGCGTGGAG ATGACGGCCA
101001 AGAACACGAG ACGCTTATTC GAGCGGGAGA AGCGTACTGC TTGGCTTTGG
101051 ATCGAGAACG CGCCCGATCG TGTAACCCCA ACACGGGTGT GTGGTTGTTG
101101 GCCGAAACTG AAACTGGTTT CGCTCTTTTG TGCAACTGCT TACGGCCCGG
101151 ACTTGTTACG CAGCTCAACA TGTACGAAGA CTGCAACGTG CCCGTGGGCT
101201 GCGCGCCTCA CGGCCGTATC GACAATATCA ACAGCGCTTC GATCCGGTGC
101251 GTGTGCGACG ACGGGTACGT GAGCGACTAT AACGCCGACA CCGAAACTCC
101301 GTATTGCCGT CCGCGCACCG TGCGCGACGT AATGTACGAC GAGAGTTTTT
101351 TTCCGCGGGC GCCATGCGCA GACGGCCAAG TTCGTCTGGA TCATCCGGCG
101401 CTCAATGATT TTTACCGCAG ACACTTTAGA CTCGAAGACA TTTGCGTGAT
101451 CGACCCTTGC TCGGTGGACC CGATTAGCGG GCAACGCACA TCGGGACGCT
101501 TATTTCACCA ACCAACCGTA AATGGTGTGG GAATCAACGG ATGCAATTGT
101551 CCGGCCGATG ACGGGTTACT GCCCGTGTTT AATCGACACA CCGCCGACAC
101601 GGGCATGGTT AGACAAAGCG ACCGCACCGT CGCGAACGCT TGCTTGCAGC
101651 CGTTTAACGT GCACATGTTA TCGTTGCGTC ATGTGGATTA CAAATTTTTC
101701 TGGGGCCGCA GCGACCACAC CGAGTTTGCC GACGCGGACA TGGTGTTTCA
101751 AGCGAATGTC AACCAACTCA GTCACGAACG GTATCGAGCG ATTTTGTACT
101801 CGTTGCTCGA GTCGCACCCG GACGTAACAG AAATCGTAAC AGTCAACATG
101851 GGTGTCATGA AAATTTCCGT GTCATACGAT ACCACATTGA AAAATATACT
101901 ATTACCATCT TCTGTTTTTA GGCTATTTAG ATTTAAAGAA AGTGGCACTG
101951 CTCAGCCGGT ATGCTTCTTT CCAGGCGTAG GACGGTGCAT AACCGTCAAT
102001 TCCGATTCGT GCATCAGGCG ACACGCTGGT GGTCAAGTGT GGACCGCAGA
102051 AACGTTCACC AACTCGTGGT GTGTACTGAG TCGTGAAGGT ACGCATATAA
102101 AAGTTTGGAG TCGCGCGTCA CGATATCCAC GCGGAGACGC GCCTGCAGCG
102151 TTAAGATTGC GCGGCTTCTT TCTGAACAAC GATCGCGAAC GAAACACAAT
102201 AAGAGCGGTC ACTACAGGCG ACATGACCCA AGGGCAACAA ATAGACGCAT
102251 TAACCCAAAT ACTTGAAACT TACCCCAACT ACTCTGTATA ACAACATGAG
102301 CATTTTAAAA GTTGTAGAAG CGTGCAATTT GGCACACACT TTTTTAAAAT
102351 TGGGTTATTT ATTTAGGGCC AAGACTTGTT TGGATATCGC TTTAGATAAT
102401 TTGGAACTAT TGCGTCGAAA GACTAACATA AAAGAAGTGG CAGTCATGTT
102451 AAACAAGAAA ACTACAGAGT GTTTGCAATT GAAACGAAAA ATAGATAAAA
102501 AAATTGCACA ACGTGTTTTA ATAAAAATTT ACACTATCAA ATGATGACAT
102551 CATAACGGGT TCAATATTCT GTGTGCAAAA ATAAATGACA TCATATTTCA
102601 AACTTGTTTT ACGCGTAAAA TTCTACTGGT AAAACAAGTT TGAGATATGA
102651 TGTCATCATC ACAAATAATA GTATGTAATA AAATAAACAT ATTTGTGTGT
102701 AAATATAATT TATTACAAAT AAATTTTACA TTGAATCAAT CTGTCTTCGT
102751 GTTTGTTGTA AGGTCTTCGA ATCTTGTGTT TCAGCCCCTC GGGATGGTCA
102801 AAATGCGCCG TAGTAATTGT TAATGGATCT TTCAACGATT TTTTGCCCAT
102851 GGCGAGTGTG ACAAACGCGG CCACGACAAA CAGCAGGATA ATCAGTTTCA
102901 TGGTGTTCTA TATTCGACAA TATATGGGTC GCTTCTAAAT CACCTTGTCC
102951 CCAAAAGCCT CTTTTATAGT TTTTTAGAAC ACGTTGTGTA TTCCAACAGT
103001 AATTGTTCCA TCTCTTTCAA CAGCCATTCA GCATCCGGTC GTTGACTGTA
103051 ATCATGCTGA ATTAATTTAC AAACAATTTC GGTCAATTTA GGATGGCCTT
103101 GGGATAAACT TGCCGGCATT TGCTGTACAT TGTTTCTAAA GTTAGTTAGC
103151 GTAGTTTCGC GTTCCAAAGC AGTCTTGAAG GGCATTATCA ATTCGAATAA
103201 AACAATGCCC AAACTATACA TGTCATTTTT GGGGGTGTAC ACTTTTTTGA
103251 TTTGTTCTGG TGCAGCGTAC AAAGTTATAT TTTGAGGGTT GTTTTTGATA
103301 AACGTTTTGT ATAGACTGCC AAACATGCCG CCCACATACA AATCAAAGTC
103351 GGGCCCAGTC ATGAAAATAT CTTCGGGATT AATATTGTGG TGCACGATAT
103401 TTACGGAATG AATCGCTTTC ACGGCGCTCA CCAAATCAAC AAACTTGCTA
103451 ATATAAAAGC CAAAATCCGC CGGAACTTTA ATGTTGGTCT TTGCAAAAGT
103501 TTGCAAATTG CGTTGTTTCA AATAGTCGCT CAACATGTAC TCGTTTAGAG
103551 GCGACGCAAT ATATATGCGG TGCTGCCGCG GATTCAAATA AACCAATTGT
103601 TCGGGTTTCA TGGTATACAG TTAAGTGTTA ACGCGTCACT AAATTCAGAC
103651 ACGAGCGCAC GCCCTATATA CATACAATTT ATCGCACAAG ATGCTTAACG
103701 CGATCTGTTT ATAAACTAAA ACGCACTGCA ATAAATTTTA GCAAGCATTT
103751 GTATTTAATC AATCGAACCG TGCACTGATA TAAGAATTAA AAATGGGTTT
103801 GTTTGCGTGT TGCACAAAAT ACACAAGGCT GTCGACCGAC ACAAAAATGA
103851 AGTTTCCCTA TGTTGCGTTG TCGTACATCA ACGTGACGCT GTGCACCTAC
103901 ACCGCCATGT TGGTGGGATA CATGGTAACA TTCAATGACT CCAGCGAATT
103951 GAAATATTTA CAATACTGGT TGCTGTTGTC GTTTTTGATG TCCGTGGTGC
104001 TAAACGCTCC GACTCTGTGG ACGATGCTCA AAACCACAGA AGCCCATGAA
104051 GTAATTTACG AAATGAAGCT GTTCCACGCC ATGTACTTTA GTAACGTGCT
104101 GTTGAATTAT GTGGTGTTTT TGGACAATCA AATGGGTACA AATTTTGTTT
104151 TTGTTAACAA TTTAATTCAC TGTTGTGTAC TTTTTATGAT ATTTGTTGAA
104201 TTGCTTATCC TGTTGGGCCA CACAATGGGC ACGTACACGG ATTATCAATA
104251 TGTCAAATCG TGTTATATGG TTATATTGTT TGTTTCAGTT ATGAGTGTTA
104301 CTATTGTTAT GGGTTTAGAG TGTTTGAAAA CGAAACTAAT TGATAACAGT
104351 TTGATGTTTA ACGCGTTTGT GTGCGCTTTG TACATTGTGA TTGCAATAAT
104401 GTGGTCTTTA AAAAATAATT TGACTAGTTA TTACGTTTCA AATTTACAAA
104451 GTATTCAAGT TGTTCCGTTT TCATACAACG ATCCGCCGCC ACCGTTCTCT
104501 AACATTGTAA TGGATGACAT AAAAAATAAA AAATAATTTA TAAAAATGTT
104551 TTTTATTCTT TCACAATTCT GTAAATTCTA AACAAAAAAT ATAAATACAA
104601 ACTTATTATG TTGTCGTCTA AATAAACATC AATTTGTAAA TCTGGACACC
104651 TATTCATATC ATTGATATTA CAGTCTACTA TACAACAATT AAAACTAACC
104701 AAATTATCTT TACAACAATT AAAGCAATTA AAACAATTTA AATAATCTTC
104751 ATTGTCGTCG TATAAGTTTA TTTGCACTGT AGACGGTGTT ACACAGCGAT
104801 CCATTCGACG TTCGTGTTCG ATCAACTTTC TCGCCAACTT GTACCATAAA
104851 AATTGTTTGG ACAAAAAGTT TTCCAACAAT GGTAACGGCC AATTCAACGT
104901 GACGATGCGC ACGTCCTCGG GTATGCATTT GTTAAAAAAC ACACAGCTCG
104951 CTTTACCAAA CGAAAGCAAA GGTACTAAAT ATGGCGCCAT TGGCTGATTT
105001 GTTATTCCAA GATAATTACA AATAAACTGA TCCGTCGTGG GGTGATAACT
105051 GGCAGGTGTC AGCTTTAAAT AATCTTCAAC GTTGTTGTCG CGCAAAAGTC
105101 TGCATTTTAC ACGCGTTGTT AATCCCACGA CTTTTGCATG TAAAATCGGA
105151 TCCAAATACT GCAGAATCGT GTCTATAATT TCTAATGGTA AACGTATGCG
105201 TTTTGCTCGT GGGCGCTTTG TAACGCTCGA CATCCTAATA ACAACTAACA
105251 CAAAACTAAA ATGATACTCA ATATATTGCT TTTACAGTTC ATCTTTAGGT
105301 TTAAACTGTG CGTTTATCGC GTTGAGCAAG TCGCCGTTAT CGGCATCAAT
105351 CTCCCAAGCA AACAGGCCGC CCAATTTATT TCGGTCGACA TATTTAACTT
105401 TTCCTAACAC AGAGTCGACG CTGTCAAACG AAATCAAATC ACCTTTACTT
105451 TTATCGAAAA CGTACGACGC TTGAGCGGCG CTGTCAAACG TGTACACATA
105501 ATTGTTGAGA TCTTTTTGAA TTTGACGATA ATCTACAACA CCGTCCTCCC
105551 ACGTGCCCGA CCCCGGCCCG TTGCCAGTGC CGGAAAAATA GTTGTCATTC
105601 GTATAATTTG TTACGCCGGT CCAGCCGCGG CCGTACATGG CGACGCCCAC
105651 AATTATTTTG TTGGGATCGA CGCCTTGTTT CAGTAACGCA TCGACAGCGT
105701 AGTGTGTAGT GTATAGCTCT TCCGAGTTCC AACTTGGCGC GTAGACTGTT
105751 GTTTGGTAGC CCAAATCCGT GTTTGACCAA GCCCCTTTAA AATCGTAACT
105801 CATGAGAAAT ATTTTGCCTA ATGACTTTTG CGCTTCGGCG TAGTTTACCA
105851 CGGCAATCTT GTCGTAACCC GCGCTTATAG CGCTTGTTAA TTCGTAAACC
105901 CTGCCGGTTT GCGCTTCGAG GTCGTCTAGC ATTGCGCGCA GCTCCTCCAA
105951 CAACAAAATG TATGTTTTGG CGTCACCGTC CGCATCGCCC AACGACGGGT
106001 TAGCCCCTTT GCCGCCCGGA AACTCCCAAT CGATGTCTAC ACCGTCAAAG
106051 AATTTCCACA CTTGCAGAAA TTCCTTAACC GAATCTACAA AAACGTTTCT
106101 TTTTTCAACA TCGTGCATAA AATAAAATGG GTCTGATAGA GTCCAGCCTC
106151 CTATTGAAGG AAGAATTTTT AAATGGGGGT TTGCTAATTT TGCCGCCATC
106201 AACTGTCCAA AATTGCCTTT ATACGGCTCG TTCCAAGCGG ACACACCTTT
106251 TTGGGGTTTT TGTACGGCGG CCCACGGATC GTGAATGGCA ACTTTGAAAT
106301 CTTCGCGTCC CTTGCACGAT CTTTGCAAAG ATTCAAAGCT TCCGGGTATC
106351 GTTTTGAGGG CGTCGTTTAT TCCATCGCCG CCGCAGATGG GTATGAAACC
106401 ATACAACAAG TGTGATAAAT TTGGCAAGGG AACTTTGTCT ACGGGAAAGT
106451 TGCGCCCGTA CACACCCCAC TCAACAAAGT ACGCAGCGAC AATTTTATCC
106501 TCTCTCCTGC CAGGTTTGTT GTTTTCCAGC CATGTGTATT CGAGCGGTGC
106551 CAGATGGCCG CCGTCGGTGT CTGCGACTTT GACCAACACG GGATCGCTCA
106601 CGGAACAGCC GTCCTCATTG CAAAGTTTGA CACGCATGTT AAATTGCCCG
106651 CTCACAAGAA CTTTAATGGT AGCCCTTTTA CTTTCGGCGT CGCCTTTCCA
106701 TACCTGCTGC TCGTCAAACA ACACGTACGC TATGTCGCCA ATGTCGCCGT
106751 TCCAGACGTT CCAACTGACT TGAACGTCGA CTTGTTCTTT AGGCTTTATT
106801 AAATTTTCGT AAGCGGTGGC CTCGTAATTT ATTTCTACGA GCGCATAATT
106851 GCGATCGGCC CAATCGATCA CCGGCGTGCC GGGAATCGCG TTAGAAACGG
106901 CGACCAACCA CAAAACGTTT AACAATTTGT ACAACATTTT AATTTATCTT
106951 AATTTTAAGT TGTAATTATT TTATGTAAAA AAATGAACAA AATTTTGTTT
107001 TATTTGTTTG TGTACGGCGT TGTAAACAGC GCGGCGTACG ACCTTTTGAA
107051 AGCGCCTAAT TATTTTGAAG AATTTGTTCA TCGATTCAAC AAAGATTATG
107101 GTAGCGAAGT TGAAAAATTG CGAAGATTCA AAATTTTCCA ACACAATTTA
107151 AATGAAATTA TTAATAAAAA CCAAAACGAT TCGGCCAAAT ATGAAATAAA
107201 CAAATTCTCG GATTTGTCCA AAGACGAAAC TATCGCAAAA TACACAGGTT
107251 TGTCTTTGCC TATTCAGACT CAAAATTTTT GCAAAGTAAT AGTCCTAGAC
107301 CAGCCACCGG GCAAAGGGCC CCTTGAATTC GACTGGCGTC GTCTCAACAA
107351 AGTCACTAGC GTAAAAAATC AGGGCATGTG TGGCGCCTGC TGGGCGTTTG
107401 CCACTCTGGC TAGTTTGGAA AGTCAATTTG CAATCAAACA TAACCAGTTG
107451 ATTAATCTGT CGGAGCAGCA AATGATCGAT TGTGATTTTG TCGACGCTGG
107501 CTGTAACGGC GGCTTGTTGC ACACAGCGTT CGAAGCCATC ATTAAAATGG
107551 GCGGCGTACA GCTGGAAAGC GACTATCCAT ACGAAGCAGA CAATAACAAT
107601 TGCCGTATGA ACTCCAATAA GTTTCTAGTT CAAGTAAAAG ATTGTTATAG
107651 ATACATTACC GTGTACGAGG AAAAACTTAA AGATTTGTTA CGCCTTGTCG
107701 GCCCTATTCC TATGGCCATA GACGCTGCCG ACATTGTTAA CTATAAACAG
107751 GGTATTATAA AATATTGTTT CAACAGCGGT CTAAACCATG CGGTTCTTTT
107801 AGTGGGTTAT GGTGTTGAAA ACAACATTCC ATATTGGACC TTTAAAAACA
107851 CTTGGGGCAC GGATTGGGGA GAGGACGGAT TTTTCAGGGT ACAACAAAAC
107901 ATAAACGCCT GTGGTATGAG AAACGAACTT GCGTCTACTG CAGTCATTTA
107951 TTAATCTCAA CACACTCGCT ATTTGGAACA TAATCATATC GTCTCAGTAG
108001 CTCAAGGTAG AGCGTAGCGC TCTGGATCGT ATAGATCTTG CTAAGGTTGT
108051 GAGTTCAAGT CTCGCCTGAG ATATTAAAAA ACTTTGTAAT TTTAAAAATT
108101 TTATTTTATA ATATACAATT AAAAACTATA CAATTTTTTA TTATTACATT
108151 AATAATGATA CAATTTTTAT TATTACATTT AATATTGTCT ATTACGGTTT
108201 CTAATCATAC AGTACAAAAA TAAAATCACA ATTAATATAA TTACAAAGTT
108251 AACTACATGA CCAAACATGA ACGAAGTCAA TTTAGCGGCC AATTCGCCTT
108301 CAGCCATGGA AGTGATGTCG CTCAGACTGG TGCCGACGCC GCCAAACTTG
108351 GTGTTCTCCA TGGTGGTTAT GAGGTTGCTT TTTTGTTGGG CAATAAACGA
108401 CCAGCCGCTG GCATCTTTCC AACTGTCGTG ATAGGTCGTG TTGCCGATGG
108451 TCGGGATCCA AAACTCGACG TCGTCGTCAA TTGCTAGTTC CTTGTAGTTG
108501 CTAAAATCTA TGCATTGCGA CGAGTCCGTG TTGGCCACCC AACGCCCTTC
108551 TTTGTAGATG CTGTTGTTGT AGCAATTACT GGTGTGTGCC GGCGGATTGG
108601 TGCACGGCAT CAGCAAAAAC GTGTCGTCCG ACAAAAATGT TGAAGAAACA
108651 GAGTTGTTCA TGAGATTGCC AATCAAACGC TCGTCCACCT TGGCCACGGA
108701 GACTATCAGG TCGTGCAGCA TATTGTTTAG CTTGTTGATG TGCGCATGCA
108751 TCAGCTCAAT GTTCATTTTC AGCAAATCGT TTTCGTACAT CAGCTCCTCT
108801 TGAATATGCA TCAGGTCGCC TTTGGTGGCA GTGTCTCCCT CTGTGTACTT
108851 GGCTCTAACG TTGTGGCGCC AAGTGGGCGG CCGCTTCTTG ACTCGGTGCT
108901 CGACTTTGCG TTTAATGCAT CTGTTAAACT TGCAGTTCCA CGTGTTTTTA
108951 GAAAGATCAT ATATATCATT GTCAATCAAA CAGTGTTCGC GTGTCACCGA
109001 CTCGGGGTTA TTTTTGTCAT CTTTAATGAG CAGACACGCA GCTTTTATTT
109051 GGCGCGTGGT GAACGTAGAC TTTTGTTTGA GAATCATACT CACGCCGTCT
109101 CGATGAAGCA CAGTGTCCAC GGTCACGTTG ATGGGGTTGC CCTCAGCGTC
109151 CAAAATGTAT ACCTGGCACT CGTCCGTGTC GTCCTGGCAC TCGAGCCTGC
109201 TGTACATTTT CGAAGTGGAA ATGCCGCATC GCCACGATTT GTTGCACGTG
109251 TGGTGCGCAA AGTGATTGTT ATTCTGCCGC TTCACCAACT CTTTGCCTTT
109301 GACCCACTGG CCGCGGCCCT CGTTGTCGCG AAAACAGTCG TCGCTGTCAC
109351 TGCCCCAACG GTCGATCAGC TCTTCGCCCA CCTCGCACTG CTGCCTGATG
109401 CTCCACATAA GCAAATCCTC TTTGCCCACA TTCAGCGTTT TCATGGTTTC
109451 TTCGACGCGT GTGTTGGGAT CCAGCGAGCC GCCGTTGTAC GCATACGCCT
109501 GGTAGTACCC CTTGTAGCCG ATAATCACGT TTTCGTTGTA GTCCGTCTCC
109551 ACGATGGTGA TTTCCACGTC CTTTTGCAGC GTTTCCTTGG GCGGGGTAAT
109601 GTCCAAGTTT TTAATCTTGT ACGGACCCGT CTTCATTTGC GCGTTGCAGT
109651 GCTCCGCCGC AAAGGCAGAA TGCGCCGCCG CCGCCAAAAG CACATATAAA
109701 ACAATAGCGC TTACCATCTT GCTTGTGTGT TCCTTATTGA AGCCTTGGTG
109751 TGACTGATTT ACTAGTAGCA TTGAGGCATC TTATATACCC GACCGTTATC
109801 TGGCCTACGT GACACAAGGC ACGTTGTTAG ATTAATAATC TTATCTTTTT
109851 ATCTTAATTG ATAAGATTAT TTTTATCTGG CTGTTATAAA AACGGGATCA
109901 TGAACACGGA CGCTCAGTCG ACATCGAACA CGCGCAACTT CATGTACTCT
109951 CCCGACAGCA GTCTGGAGGT GGTCATCATT ACCAATTCGG ACGGCGATCA
110001 CGATGGCTAT CTGGAACTAA CCGCCGCCGC CAAAGTCATG TCACCTTTTC
110051 TTAGCAACGG CAGTTCGGCC GTGTGGACCA ACGCGGCGCC CTCGCACAAA
110101 TTGATTAAAA ACAATAAAAA TTATATTCAT GTGTTTGGTT TATTTAAATA
110151 TCTGTCAAAT TACAATTTAA ATAATAAAAA GCGTCCTAAA GAGTATTACA
110201 CCCTTAAATC GATTATTAGC GACTTGCTTA TGGGCGCTCA AGGCAAAGTA
110251 TTTGATCCGC TTTGCGAAGT AAAAACGCAA CTGTGTGCGA TTCAGGAGAG
110301 TCTCAACGAG GCTATTTCGA TTTTGAACGT TCATAGCAAC GATGCGGCCG
110351 CCAACCCGCC TGCGCCAGAC ATTAACAAGT TGCAAGAACT GATACAAGAT
110401 TTGCAGTCTG AATACAATAA AAAAATTACC TTTACCACTG ATACAATTTT
110451 GGAGAATTTA AAAAATATAA AGGATTTAAT GTGCCTGAAT AAATAATAAT
110501 AAGGGTTTTG TACGATTTCA ACAATGAACT TTTGGGCCAC GTTTAGCATT
110551 TGTCTGGTGG GTTATTTGGT GTACGCGGGA CACTTGAATA ACGAGCTACA
110601 AGAAATAAAA TCAATATTAG TGGTCATGTA CGAATCTATG GAAAAGCATT
110651 TTTCCAATGT GGTAGACGAA ATTGATTCTC TTAAAACGGA CACGTTTATG
110701 ATGTTGAGCA ACTTGCAAAA TAACACGATT CGAACGTGGG ACGCAGTTGT
110751 AAAAAATGGC AAAAAAATAT CCAATCTCGA CGAAAAAATT AACGTGTTAT
110801 TAACAAAAAA CGGGGTAGTT AACAACGTGC TAAACGTTCA ATAAACGCTT
110851 ATCACTAAGT TAATATACTA AAAATCACAT AGTCACTACA ATATTTCAAA
110901 ATATGAAGCC GACGAATAAC GTTATGTTCG ACGACGCGTC GGTCCTTTGG
110951 ATCGACACGG ACTACATTTA TCAAAATTTA AAAATGCCTT TGCAGGCGTT
111001 TCAACAACTT TTGTTCACCA TTCCATCTAA ACATAGAAAA ATGATCAACG
111051 ATGCGGGCGG ATCGTGTCAT AACACGGTCA AATACATGGT GGACATTTAC
111101 GGAGCGGCCG TTCTGGTTTT GCGAACGCCT TGCTCGTTCG CCGACCAGTT
111151 GTTGAGCACA TTTATTGCAA ACAATTATTT GTGCTACTTT TACCGTCGTC
111201 GCCGATCACG ATCACGCTCA CGATCACGCT CGCGATCACG TTCTCCTCAT
111251 TGCAGACCTC GTTCGCGCTC TCCTCATTGC AGACCTCGTT CGCGATCTCG
111301 GTCCCGGTCT AGATCGCGGT CACGTTCATC GTCTCCCAGG CGAGGGCGTC
111351 GACAAATATT CGACGCGCTG GAAAAGATTC GTCATCAAAA CGACATGTTG
111401 ATGAGCAACG TCAACCAAAT AAATCTCAAC CAAACTAATC AATTTTTAGA
111451 ATTGTCCAAC ATGATGACGG GCGTGCGCAA TCAAAACGTG CAGCTCCTCG
111501 CGGCGTTGGA AACCGCTAAA GATGTTATTT TGACCAGATT AAACACATTG
111551 CTTGCCGAGA TTACAGACTC GTTACCCGAC TTGACGTCCA TGTTAGATAA
111601 ATTAGCTGAA CAATTGTTGG ACGCCATCAA CACGGTGCAG CAAACCTGCG
111651 CAACGAGTTG AACAACACCA ACTCTATTTT GACCAATTTA GCGTCAAGCG
111701 TCACAAACAT CAACGGTACG CTCAACAATT TGCTAGCCGC TATCGAAAAC
111751 TTAGTAGGCG GCGGCGGCGG TGGCAATTTT AACGAAGCCG ACAGACAAAA
111801 ACTGGACCTC GTGTACACTT TGGTTAACGA AATCAAAAAT ATACTCACGG
111851 GAACGCTGAC AAAAAAATAA GCATGTCCGA CAAAACACCA ACAAAAAAGG
111901 GTGGCAGCCA TGCCATGACG TTGCGAGAGC GCGGCGTAAC AAAACCCCCA
111951 AAAAAGTCTG AAAAGTTGCA GCAATACAAG AAAGCCATCG CTGCCGAGCA
112001 AACGCTGCGC ACCACAGCAG ATGTTTCTTC TTTGCAGAAC CCCGGGGAGA
112051 GTGCCGTTTT TCAAGAGTTG GAAAGATTAG AGAATGCAGT TGTAGTATTA
112101 GAAAATGAAC AAAAACGATT GTATCCCATA TTAGATACGC CTCTTGATAA
112151 TTTTATTGTC GCATTCGTGA ATCCGACGTA TCCCATGGCC TATTTTGTCA
112201 ATACCGATTA CAAATTAAAA CTAGAATGTG CCAGAATCAG AAGCGATTTA
112251 CTTTACAAAA ACAAAAACGA AGTCGCTATC AACAGGCCTA AGATATCGTC
112301 TTTTAAATTG CAATTGAACA ACGTAATTTT AGACACTATA GAAACTATTG
112351 AATACGATTT ACAAAATAAA GTTCTCACAA TTACTGCACC TGTTCAAGAT
112401 CAAGAACTAA GAAAATCCAT TATTTATTTT AATATTTTAA ATAGTGACAG
112451 TTGGGAAGTA CCAAAGTATA TGAAAAAATT GTTTGATGAA ATGCAATTGG
112501 AACCTCCCGT CATTTTACCA TTAGGTCTTT AGATTTGGTA AGGCTAGCAC
112551 GTCGACATCA TGTTTGCGTC GTTGACCTCA GAGCAAAAGC TGTTATTAAA
112601 AAAATATAAA TTTAACAATT ATGTGAAAAC GATCGAGTTG AGTCAAGCGC
112651 AGTTGGCTCA TTGGCGTTCA AACAAAGATA TTCAGCCAAA ACCTTTGGAT
112701 CGTGCAGAAA TTTTACGTGT CGAAAAGGCC ACCAGGGGAC AAAGCAAAAA
112751 TGAGCTGTGG ACGCTATTGC GTTTGGATCG CAACACAGCG TCTGCATCGT
112801 CCAACTCGTC CGGCAACATG TTACAACGAC CAGCGCTTTT GTTTGGAAAC
112851 GCGCAAGAAA GTCACGTCAA AGAAACCAAC GGCATCATGT TAGACCACAT
112901 GCGCGAAATC ATAGAAAGTA AAATTATGAG CGCGGTCGTT GAAACGGTTT
112951 TGGATTGCGG CATGTTCTTT AGCCCCTTGG GTTTGCACGC CGCTTCGCCC
113001 GATGCGTATT TTTCTCTCGC CGACGGAACG TGGATCCCAG TGGAAATAAA
113051 ATGTCCGTAC AATTACCGAG ACACGACCGT GGAGCAGATG CGTGTCGAGT
113101 TGGGGAACGG CAATCGCAAG TATCGCGTGA AACACACCGC GCTGTTGGTT
113151 AACAAGAAAG GCACGCCCCA GTTCGAAATG GTCAAAACGG ATGCGCATTA
113201 CAAGCAAATG CAACGGCAGA TGTATGTGAT GAACGCGCCT ATGGGCTTTT
113251 ACGTGGTCAA ATTCAAACAA AATTTGGTGG TGGTTTCTGT GCCGCGCGAC
113301 GAAACGTTCT GCAACAAAGA ACTGTCTACG GAAAACAACG CGTACGTGGC
113351 GTTTGCCGTG GAAAACTCCA ACTGCGCGCG CTACCAATGC GCCGACAAGC
113401 GACGGCTTTC ATTCAAAACG CACAGCTGCA ATCACAACTA TAGTGGTCAA
113451 GAAATCGATG CTATGGTCGA TCGCGGAATA TATTTAGATT ATGGACATTT
113501 AAAATGTGCG TACTGTGATT TTAGCTCAGA CAGTCGGGAA ACGTGCGATT
113551 CTGTTTTAAA ACGCGAGCAC ACCAACTGCA AAAGTTTTAA CTTGAAACAT
113601 AAAAACTTTG ACAATCCTAC ATACTTTGAT TATGTTAAAA GATTGCAAAG
113651 TTTGCTAAAG AGTCACCACT TTAGAAACGA CGCTAAAACA CTTGCCTATT
113701 TTGGTTACTA TTTAACTCAT ACAGGAACCC TGAAGACCTT TTGCTGCGGA
113751 TCGCAAAACT CGTCGCCCAC CAAACACGAT CATTTAAACG ACTGTGTATA
113801 TTATTTGGAA ATAAAATAAA CCTTTATATT ATATATAATT CTTTTATTTA
113851 TACATTTGTT TATACAATTT TATTTACGAC AAATATTGAC TCGTTGTTCA
113901 GAAAGTTTAA TAAGCTTGTC AATTTCTTCG GCTTGCAAAG GGCTGCCAAC
113951 GCGTTCGTTT TGAATGCGCG TAATCCGGTT TACGGTATTG TTGGCGCGAA
114001 CAATAAACTC CTCAACTGGC AAATTAACAA TTTTGTTTGC GTACTCATTG
114051 TGCACTGCGG CCAGGTTTTG TAGAATGTTT TCGGGAAAAA TGGCAATTCT
114101 ATTAAATTTG ACATGTTTTT GATTGTATAC ATAGTTTTGA TATTCTTCCA
114151 GCGTAGGATA TTTGTTTAAA CTCTTGACGC ATTCAATGTA CAATTTGTGC
114201 AGTGACAAAA TTCTGTTAAA ATCCAAACGA GAACATTTCT CAAAAGTTAT
114251 TTCTTGACCG TTGAAATGTA CACTTTGCAA TTGTTTCAAT AAACTGTCGT
114301 AAAAAGTTTT TCCTTCTTCA AGCACAAACG CGGGGCGCAT CGTGTTATCT
114351 ACAACGCTTA TGTACTTGTC AAAATCTTCA ATTATATGAT AGAAATACAA
114401 ATATCTCTCC GCGTTTATGG ACGTGTCGTT TAAAACATGT TCGTCAACAA
114451 CTCCGTTATG ATTTACTTTC AAAAATTTCA AATCTTGCAA AGCGTCCGCG
114501 TTGGTCAACT TGTTGATAAT AAATTTGTCT TTGCATTCAA ACGCTCTGTT
114551 TGCAATCCAC TCCACAGCGT CCAAAACGGA CATGCGTTTA AACATGTTGA
114601 TACGTTTTAG ACAATACGCT CGTTTTTTTA CCGCCTCAAC GTTCACGTCC
114651 GTGTAGTCGC ACCATTGCAG GATTTGCAAC ATGTCCTCGG CAAAATGCGC
114701 GAACTGCCGC AGCTTTTCCT TTCCAAAATG TTGATTGTCG TGTTTAAAAA
114751 GCAACGTTGA AATTTCCGAG ACATACCACA AAGCCGTGGG CAATTTTACT
114801 TTGATCAGCG GCTCCATAGC CAGGTTGCTG AACCCGATCA TGCATTCCGT
114851 GTTGTTAATG CGGTAAATGA CATAGCGTTT AAAGTAGTCC TTTACATTAT
114901 CGTCAATGTA TTCTGCGTCG TTTATGTGCT TGTACAGCAA ATAGTACATA
114951 AGGCCCGCGT TAAACGCGAC CTTTTTAGCG TCAAAATACG TGCACGCCAA
115001 CACGTAATCG TTGTATTCGT CGAATTGCTC GTTGGGCACT ATGGCGCCCG
115051 TAAAAGGGCG TCTGCTGCGC GGTGACAAAC GCGTTCCATG CTGAATCAAC
115101 TGCTTCAAAC TTTCCAAATT ATAACAATAT TCAATTGAAT TTTTAATCTC
115151 TTTATTTTGG CTCCATAAAA GAGGAAACTC GAGTCGGCTT TTAAACTTGG
115201 TCAAACTGCC CTGAATTGTT TCAAACAAGT TGTAATGTGT TAACAATATG
115251 GCCGGCACAC CGCTATCGTT GGCTAAAATA CAATCGGGGA ATCGAATATT
115301 TTCTACGTTG CTGTAATCGT ACGCTTCGTC GTCGTCGTTG GCAACAACAT
115351 CGTCGGTTTC GGCGTTAACG CTCGCTAACT TGTTCTGATA GTGTAAATTT
115401 TTCATTACAT CAAAAGCGTA TGACTTGTTG CGATTGTGCA AATAATTTAT
115451 GGCCGTGCTA ATGGTGCTGT CGATAATTTT ATCAAAATTG AGAACATCGG
115501 CGTTATACAA CGTTTTATAA AATTCTGTTG ACTTGAACGT GTTTACAAAC
115551 TCATTTTTAT TTTTAATCTG GTCAAAATTC ATACTAGAAT TGTTAGTTTG
115601 TTTGATTTCG CTGAATAGCC GCTGGCGGAG ACGCTTCAGC TTGTCCACCT
115651 CGTTTAACAC GTTGGCGTCC GTCGGCATGG AATTGATAAA TTTGAACCGA
115701 ACAAAAGACA GCAGTTCATC TTTTTTCGAT ATAAAATTTT CGGTTGTAAT
115751 GATATCGTAG TTAAATTCTT TGGTTAAATT GACCCATTCG ACCATTTCAT
115801 CGTTGCGATA AATCTTGCAG TCCGAGTTGT TGACAAACGC CGAGGCAACG
115851 GACAAATCAA TCTGTTCCGT GTTATTATTG ATGGCATAAA ACACAATGCG
115901 TTCGAAACTA AACGGTTTTT CGTTTAGCAA ATTTTTGCAA ACGTTTGCCT
115951 CATTTTTGGA AATTTGGCCG TCGGTCACCA TGTACAAAAG TTTCAACTTG
116001 CCGTCGAGCA AGTTTATATT CTTGTGAATC CACTTTATGA ATTCGCTGGG
116051 CCTGGTGTCA GTACCCTCGC CATTGCGGCG CAAATAACGA CTCTTGACGT
116101 CTCCGATTTC TTTTTGGCGG CAATAAGCAC TCCAATGCAA ATACAAAACT
116151 TTGTCGCAAC TACTGATGTT TTCGATTTCA TTCTGAAATT GTTCTAAAGT
116201 TTGTAACGCG TTCTTGTTAA AGTAATAGTC CGAGTTTGTC GACAAGGAAT
116251 CGTCGGTGGC GTACACGTAG TAGTTAATCA TCTTGTTGAT TGATATTTAA
116301 TTTTGGCGAC GGATTTTTAT ATACACGAGC GGAGCGGTCA CGTTCTGTAA
116351 CATGAGTGAT CGTGTGTGTG TTATCTCTGG CAGCGCGATA GTGGTCGCGA
116401 AAATTACACG CGCGTCGTAA CGTGAACGTT TATATTATAA ATATTCAACG
116451 TTGCTTGTAT TAAGTGAGCA TTTGAGCTTT ACCATTGCAA AATGTGTGTA
116501 ATTTTTCCGG TAGAAATCGA CGTGTCCCAG ACGATTATTC GAGATTGTCA
116551 GGTGGACAAA CAAACCAGAG AGTTGGTGTA CATTAACAAG ATTATGAACA
116601 CGCAATTGAC AAAACCCGTT CTCATGATGT TTAACATTTC GGGTCCTATA
116651 CGAAGCGTTA CGCGCAAGAA CAACAATTTG CGCGACAGAA TAAAATCAAA
116701 AGTCGATGAA CAATTTGATC AACTAGAACG CGATTACAGC GATCAAATGG
116751 ATGGATTCCA CGATAGCATC AAGTATTTTA AAGATGAACA CTATTCGGTA
116801 AGTTGCCAAA ATGGCAGCGT GTTGAAAAGC AAGTTTGCTA AAATTTTAAA
116851 GAGTCATGAT TATACCGATA AAAAGTCTAT TGAAGCTTAC GAGAAATACT
116901 GTTTGCCCAA ATTGGTCGAC GAACGCAACG ACTACTACGT GGCGGTATGC
116951 GTGTTGAAGC CGGGATTTGA GAACGGCAGC AACCAAGTGC TATCTTTCGA
117001 GTACAACCCG ATTGGTAACA AAGTTATTGT GCCGTTTGCT CACGAAATTA
117051 ACGACACGGG ACTTTACGAG TACGACGTCG TAGCTTACGT GGACAGTGTG
117101 CAGTTTGATG GCGAACAATT TGAAGAGTTT GTGCAGAGTT TAATATTGCC
117151 GTCGTCGTTC AAAAATTCGG AAAAGGTTTT ATATTACAAC GAAGCGTCGA
117201 AAAACAAAAG CATGATCTAC AAGGCTTTAG AGTTTACTAC AGAATCGAGC
117251 TGGGGCAAAT CCGAAAAGTA TAATTGGAAA ATTTTTTGTA ACGGTTTTAT
117301 TTATGATAAA AAATCAAAAG TGTTGTATGT TAAATTGCAC AATGTAACTA
117351 GTGCACTCAA CAAAAATGTA ATATTAAACA CAATTAAATA AATGTTAAAA
117401 TTTATTGCCT AATATTATTT TGTCATTGCT TGTCATTTAT TAATTTGGAT
117451 GATGTCATTT GTTTTTAAAA TTGAACTGGC TTTACGAGTA GAATTCTACG
117501 CGTAAAACAC AATCAAGTAT GAGTCATAAT CTGATGTCAT GTTTTGTACA
117551 CGGCTCATAA CCGAACTGGC TTTACGAGTA GAATTCTACT TGTAATGCAC
117601 GATCAGTGGA TGATGTCATT TGTTTTTCAA ATCGAGATGA TGTCATGTTT
117651 TGCACACGGC TCATAAACTC GCTTTACGAG TAGAATTCTA CGTGTAACGC
117701 ACGATCGATT GATGAGTCAT TTGTTTTGCA ATATGATATC ATACAATATG
117751 ACTCATTTGT TTTTCAAAAC CGAACTTGAT TTACGGGTAG AATTCTACTT
117801 GTAAAGCACA ATCAAAAAGA TGATGTCATT TGTTTTTCAA AACTGAACTC
117851 GCTTTACGAG TAGAATTCTA CGTGTAAAAC ACAATCAAGA AATGATGTCA
117901 TTTGTTATAA AAATAAAAGC TGATGTCATG TTTTGCACAT GGCTCATAAC
117951 TAAACTCGCT TTACGGGTAG AATTCTACGC GTAAAACATG ATTGATAATT
118001 AAATAATTCA TTTGCAAGCT ATACGTTAAA TCAAACGGAC GTTATGGAAT
118051 TGTATAATAT TAAATATGCA ATTGATCCAA CAAATAAAAT TGTAATAGAG
118101 CAAGTCGACA ATGTGGACGC GTTTGTGCAT ATTTTAGAAC CGGGTCAAGA
118151 AGTGTTCGAC GAAACGCTAA GCCAGTACCA CCAATTTCCT GGCGTCGTTA
118201 GTTCGATTAT TTTCCCGCAA CTCGTGTTAA ACACAATAAT TAGCGTTTTG
118251 AGCGAAGACG GCAGTTTGCT CACGTTGAAA CTCGAAAACA CTTGTTTTAA
118301 TTTTCACGTG TGCAATAAAC GCTTTGTGTT TGGCAATTTG CCAGCGGCGG
118351 TCGTGAATAA TGAAACGAAG CAAAAACTGC GCATTGGAGC TCCAATTTTT
118401 GCCGGCAAAA AGCTGGTTTC GGTCGTGACG GCGTTTCATC GTGTTGGCGA
118451 AAACGAATGG CTGTTACCGG TGACGGGAAT TCGAGAGGCG TCCCAGCTGT
118501 CGGGACATAT GAAGGTGCTG AACGGCGTCC GTGTTGAAAA ATGGCGACCC
118551 AACATGTCCG TCTACGGGAC TGTGCAATTG CCGTACGATA AAATTAAACA
118601 GCATGCGCTC GAGCAAGAAA ATAAAACGCC AAACGCGTTG GAGTCTTGTG
118651 TGCTATTTTA CAAAGATTCA GAAATACGCA TCACTTACAA CAAGGGGGAC
118701 TATGAAATTA TGCATTTGAG GATGCCGGGA CCTTTAATTC AACCCAACAC
118751 AATATATTAT AGTTAAATAA GAATTATTAT CAAATCATTT GTATATTAAT
118801 TAAAATACTA TACTGTAAAT TACATTTTAT TTACAATCAT GTCAAAGCCT
118851 AACGTTTTGA CGCAAATTTT AGACGCCGTT ACGGAAACTA ACACAAAGGT
118901 TGACAGTGTT CAAACTCAGT TAAACGGGCT GGAAGAATCA TTCCAGCTTT
118951 TGGACGGTTT GCCCGCTCAA TTGACCGATC TTAACACTAA GATCTCAGAA
119001 ATTCAATCCA TATTGACCGG CGACATTGTT CCGGATCTTC CAGACTCACT
119051 AAAGCCTAAG CTGAAAACCC AAGCTTTTGA ACTCGATTCA GACGCTCGTC
119101 GTGGTAAACG CAGTTCCAAG TAAATGAATC GTTTTTAAAA TAACAAATCA
119151 ATTGTTTTAT AATATTCGTA CGATTCTTTG ATTATGTAAT AAAATGTGAT
119201 CATTAGGAAG ATTACGAAAA ATATAAAAAA TATGAGTTCT GTGTGTATAA
119251 CAAATGCTGT AAACGCCACA ATTGTGTTTG TTGCAAATAA ACCCAGTATT
119301 ATTTGATTAA AATTGTTGTT TTCTTTGTTC ATAGACAATA GTGTGTTTTG
119351 CCTAAACGTG TACTGCATAA ACTCCATGCG AGTGTATAGC GAGCTAGTGG
119401 CTAACGCTTG CCCCACCAAA GTAGATTCGT CAAAATCCTC AATTTCATCA
119451 CCCTCCTCCA AGTTTAACAT TTGGCCGTCG GAATTAACTT CTAAAGATGC
119501 CACATAATCT AATAAATGAA ATAGAGATTC AAACGTGGCG TCATCGTCCG
119551 TTTCGACCAT TTCCGAAAAG AACTCGGGCA TAAACTCTAT GATTTCTCTG
119601 GACGTGGTGT TGTCGAAACT CTCAAAGTAC GCAGTCAGGA ACGTGCGCGA
119651 CATGTCGTCG GGAAACTCGC GCGGAAACAT GTTGTTGTAA CCGAACGGGT
119701 CCCATAGCGC CAAAACCAAA TCTGCCAGCG TCAATAGAAT GAGCACGATG
119751 CCGACAATGG AGCTGGCTTG GATAGCGATT CGAGTTAACG CTTTGGCAGT
119801 CACGGTCAGC GTTTTGATGG CGATCACGTT GAGCGAGTGC ACTAACGCGG
119851 CTTTGTAAGT CTCTCCCAAC ATGCGCACGG TCACGCGCCG AGTCGTGCTA
119901 AGCAACATGT GTTTCATGGC CGGAATGAGA GAAGTGTTAA TTTTTTTCAA
119951 CATGCTTTTA AACCCGGACA TTAGCATATC AAAGCCAATG TCCGTAGCAA
120001 TACCGAAAAC GAGCGCGTAA TCTTCCAAAA ACGATGTTAT AATTGACTCC
120051 AAGTCTTGGT CGCTGATTGA ACGGTCGAGC GCCTCGAAAT GTTCGACACG
120101 TGCACGTTCG TTACCGCGGT AATTGTATGC GATCGGAGTT TTAGTAAAGC
120151 CGGTTTCGGC CGTGTACGTG ATCTGGACGG GCGACCCGTT GACGATCATG
120201 CCCAAATCGT TTAGTGTTGG ATTTTTGTTA AAAAGTTTTT CAAATTCCAA
120251 GTCTGTGGCG TTATCGCGCA CGCTGCGCCA TTGCGCTAGT ATTGCGTTGG
120301 AGTCCACGTT GGGTCGTGGC GGTAGTATGC TGGAAGGCGC TTTGTAATCA
120351 AAATCGCGCA GTTCGCTAAA AATGTTGTTG GCCAGCATTT TGAAAGTGAC
120401 AAAGATCGTG TCGCCCAGCA CGAATCCGAT GAGCGATTCC CACCATCTAA
120451 ACGAACAACC GCCGTTGAAT AGCTCTCTGC CGAAACGTCG ACAGTAGGCT
120501 TCGTTGAATT CGCCTTTAAA GCGTTCGGGA AACAAGGGGT CGGGATCGGG
120551 CCGAACGTTA AAAGCCGGCA CATCGTCCAC GCCCATGATC GTGTGTTCTT
120601 CGGTGCGCAA GTATGGGCTG TTAAAGTACA TTTTGGACAG CGAGTCCACT
120651 AAGATGCATT TGTTGTCGAG CGTGTATCTA AACTCGGCAG ACTGAACTTG
120701 GGTTTCGGCG CCTTCACGCA TGGCCGCCGC CCTGTCCAGG TGGTAGCACG
120751 CGGGCTGCGC GTAACCCACG CTAGTCTCGG AGGTCTGCAT GTACATGAAC
120801 GGCGTCGTGT TGGACACGAC GCCGGTTTCG TGAAACGGAT AGCAGCTCAT
120851 GCTTACACAC CCGCGCTTGC TGAAAGCCAG TTTGACGGCC AGCGCTTTGT
120901 CGGCCAATTT CGGCGGCACA TAATAATCGT CGTCACTTGA CGCGGGACGC
120951 AGCGTGTAGT CGATTAGTAT ATGCGGAAAC CTGGTGCGCC ATCTCGAAAT
121001 AAACTCGAGA CGATGCATAT GTATGGCATA CCTACTGGCA TTAGTTAAAT
121051 CGACGGCTGT TAAAACCGCC ATGTTATATA GGACTTAAAA TAAACAACAA
121101 TATATAATGA AATATTTATT AGATTATATT ATAGCAATAC ATTTACATTT
121151 ATTATAACAA TACTTTTTAT TTAATCTGAT TATATTATAA CGATACATTT
121201 TTATTTAGAC ATTGTTATTT ACAATATTAA TTAACTTTTT ATACATTTTT
121251 AAATCATAAT ATATAATCAT TTCGTTGTGC ATTTCAAAGC TTTTGATAGC
121301 TTCAAAGTAA TACATGAATT TAGAGTATTC AGGAAAATGA TAAACGTTGG
121351 TAAACCCGCA TTTGGTACAA TATAACACGG GATTTTTATA ATACAGTTTA
121401 GTTTTTTTAC ACAATTTGCA ATAGTTGTTA GTTGTAGGTT TCAAAGGAAA
121451 CGTGATTGCG CCGTCCAATA CCTGGGTAAA CTTTTTGACT TTAACAGTGG
121501 CAAACACGGT TCCTTTGATA CCCGAAAATC GGTTGTCTTG CAGAGCGGCC
121551 ATCATTTCGC TTGGCTCTTG AAGTATAAAA CAGTTGACGT CATCCACCAC
121601 GTCGGGTCTG GTGCACATGC TTCGGTAGCG CTGCAACACT ATATTGGTGT
121651 ATGTTTCCCT GAGAACGAGA CCGCCGGTGG TGCTAAGATC GATTGTTTGA
121701 ATGCGCTCGT TGGGCTCTTT GTGATTTCGA ATTATGCGCC GAATTATTTC
121751 AAACACTTTG CAGTTGTGAT CGTCAATTCT CAATTCTTTA ACTTCCGTCG
121801 TGTGCTCTAA ACTTACAGGG AAAATGTATT GGTAAAAAAA CCTCTCTCTG
121851 GCTAAATAGC TGAGGTCGAC CAAATTGATA GAAGGATATA TTTCGTACGA
121901 GGTTTTTGGA ACGTTGTGAT ATAGATAGCA TTTTTGACAG CAGATGTCTA
121951 TGCGGTCAGG ATCGTCCAAC GGCTTTTCGA TGTGAACCAC AACATACAAA
122001 AACCATTCGC GCGTGTTGTC TTTGAATCTA TAATTGCAAG TGGTGCATCG
122051 CGAATCGCTC ATGTGCTCCA TAGTCTTCTT GTATTTCACA GGCCTGCTTG
122101 CAAATTTGCC CGTCATGCGC ATATCTTTGC TGTTTATGTA GCCCATAATG
122151 TAATTGGTGG AAAATTTTAG CGTGGCTTTC ATGATGTCGC GTTCTAAATC
122201 GCTCATGAAA TGCATACGTA GATCGCGCTC TTGTTTGAAA TCCAGTTTGT
122251 CGCTGTACGC GGGCAAACCT TCAAACTTGT TCCCAAACTC GGGCGGCACA
122301 AAATATCCAT CTTTTCTGTT GACGACTGGT TTTTTACTTA CAATGCTGCT
122351 GTGCTCCAAC GGCTTGGCCG GAGAGGTGCA CATAGGCTGT TTAGGCGGAG
122401 AGATGCGCGT AGGTGGTTTG ATGTTAGATT TTGGCGGCGG ACGAACAGGC
122451 GACGGCGGCG AGTTGGCGGC AGGCGCTGGC AAAGATTTGG CACGACCCTT
122501 GCCCCCGGTC CTTGGCGCGT CAAAAATGTT ATTCTCTCGA AAAAAACGGT
122551 TCATTGTAAC TGTTAGTTAG CACTCAGAAA TCAACACGAT ACTGTGCACG
122601 TTCAGCCATC GAGAGGCTTT ATATATGGAA ACCTTATCTA TAGAGATAAG
122651 ATTGTATATG CGTAGGAGAG CCTGGTCACG TAGGCACTTT GCGCACGGCA
122701 CTAGGGCTGT GGAGGGGACA GGCTATATAA AGCCCGTTTG CCCAACTCGT
122751 AAATCAGTAT CAATTGTGCT CCGGCGCACA CGCTCGCTTG CGCGCCGGAT
122801 AGTATAAGTA ATTGATAACG GGCAACGCAA CATGATAAGA ACCAGCAGTC
122851 ACGTGCTGAA CGTCCAGGAA AATATAATGA CGTCAAACTG TGCGTCATCG
122901 CCATATTCGT GCGAGGCAAC GTCCGCTTGC GCAGAAGCTC AGCAGGTAAT
122951 GATCGATAAC TTTGTTTTCT TTCACATGTA CAACGCCGAC ATACAAATTG
123001 ACGCAAAGCT GCAATGCGGC GTGCGCTCGG CCGCGTTTGC AATGATCGAC
123051 GATAAACATT TGGAAATGTA CAAGCATAGA ATAGAGAATA AATTTTTTTA
123101 TTACTATGAT CAATGTGCCG ACATTGCCAA ACCCGACCGT CTGCCCGATG
123151 ACGACGGCGC GTGCTGTCAC CATTTTATTT TTGATGCCCA ACGTATTATT
123201 CAATGTATTA AAGAGATTGA AAGCGCGTAC GGCGTGCGTG ATCGCGGCAA
123251 TGTAATAGTG TTTTATCCGT ACTTGAAACA GTTGCGAGAC GCGTTGAAGC
123301 TAATTAAAAA CTCTTTTGCG TGTTGTTTTA AAATTATAAA TTCTATGCAA
123351 ATGTACGTGA ACGAGTTAAT ATCAAATTGC CTGTTGTTTA TTGAAAAGCT
123401 GGAAACTATT AATAAAACTG TTAAAGTTAT GAATTTGTTT GTAGACAATT
123451 TGGTTTTGTA CGAATGCAAT GTTTGTAAAG AAATATCTAC GGATGAAAGA
123501 TTTTTAAAGC CAAAAGAATG TTGCGAATAC GCTATATGCA ACGCGTGCTG
123551 CGTTAACATG TGGAAGACGG CCACCACGCA CGCAAAATGT CCAGCGTGCA
123601 GGACATCGTA TAAATAAGCA CGCAACGCAA AATGAGTGGT GGCGGCAACT
123651 TGTTGACTCT GGAAAGAGAT CATTTTAAAT ATTTATTTTT GACCAGCTAT
123701 TTTGATTTAA AAGATAATGA ACATGTTCCT TCAGAGCCTA TGGCATTTAT
123751 TCGCAATTAC TTGAATTGCA CGTTTGATTT GCTAGACGAT GCCGTGCTCA
123801 TGAACTATTT CAATTACTTG CAAAGCATGC AATTGAAACA TTTGGTGGGC
123851 AGCACGTCGA CAAACATTTT CAAGTTTGTA AAGCCACAAT TTAGATTTGT
123901 GTGCGATCGC ACAACTGTGG ACATTTTAGA ATTTGACACG CGCATGTACA
123951 TAAAACCCGG CACGCCCGTG TACGCCACGA ACCTGTTCAC GTCCAATCCC
124001 CGCAAGATGA TGGCTTTCCT GTACGCTGAA TTTGGCAAGG TGTTTAAAAA
124051 TAAAATATTC GTAAACATCA ACAACTACGG CTGCGTGTTG GCGGGCAGTG
124101 CCGGTTTCTT GTTCGACGAT GCGTACGTGG ATTGGAATGG TGTGCGAATG
124151 TGTGCGGCGC CGCGATTAGA TAACAACATG CATCCGTTCC GACTGTATCT
124201 ACTGGGCGAG GACATGGCTA AGCACTTTGT CGATAATAAT ATACTACCGC
124251 CGCACCCTTC TAACGCAAAG ACTCGCAAAA TCAACAATTC AATGTTTATG
124301 CTGAAAAACT TTTACAAAGG TCTGCCGCTG TTCAAATCAA AGTACACGGT
124351 GGTGAACAGC ACTAAAATCG TGACCCGAAA ACCCAACGAT ATATTTAATG
124401 AGATAGATAA AGAATTAAAT GGCAACTGTC CGTTTATCAA GTTTATTCAG
124451 CGCGACTACA TATTCGACGC CCAGTTTCCG CCAGATTTGC TTGATTTGCT
124501 AAACGAATAC ATGACCAAAA GCTCGATCAT GAAAATAATT ACCAAGTTTG
124551 TGATTGAAGA AAACCCCGCT ATGAGCGGTG AAATGTCTCG CGAGATTATT
124601 CTTGATCGCT ACTCAGTAGA CAATTATCGC AAGCTGTACA TAAAAATGGA
124651 AATAACCAAC CAGTTTCCTG TCATGTACGA TCATGAATCG TCGTACATTT
124701 TTGTGAGCAA AGACTTTTTG CAATTGAAAG GCACTATGAA CGCGTTCTAC
124751 GCGCCCAAGC AGCGTATATT AAGTATTTTG GCGGTGAATC GTTTGTTTGG
124801 CGCCACGGAA ACGATCGACT TTCATCCCAA CCTGCTCGTG TACCGGCAGA
124851 GTTCGCCGCC GGTCCGTTTG ACGGGCGACG TGTATGTTGT TGATAAGAAC
124901 GAAAAAGTTT TTTTGGTCAA ACACGTGTTC TCAAACACGG TGCCTGCATA
124951 TCTTTTAATA AGAGGTGATT ACGAAAGTTC GTCTGACTTG AAATCCCTTC
125001 GCGATTTGAA TCCGTGGGTT CAGAACACGC TTCTCAAATT ATTAATCCCC
125051 GACTCGGTAC AATAATATGA TTTACACTGA TCCCACTACT GGCGCTACGA
125101 CTAGCACAGA CGTCGTCCGT CCACAAACTA TTTAAACAGG CTAACTCCAA
125151 ACATGTTCTT GACCATCTTG GCTGTAGTAG TAATTATTGC TTTAATAATT
125201 ATATTTGTTC AATCTAGCAG TAATGGAAAC AGCTCGGGGG GTAATGTACC
125251 TCCAAACGCC CTGGGGGGTT TTGTAAATCC TTTAAACGCT ACCATGCGAG
125301 CTAATCCCTT TATGAACACG CCTCAAAGGC AAATGTTGTA GATAAGTGTA
125351 TAAAAAATGA AACGTATCAA ATGCAACAAA GTTCGAACGG TCACCGAGAT
125401 TGTAAACAGC GATGAAAAAA TCCAAAAGAC CTACGAATTG GCTGAATTTG
125451 ATTTAAAAAA TCTAAGCAGT TTAGAAAGCT ATGAAACTCT AAAAATTAAA
125501 TTGGCGCTCA GCAAATACAT GGCTATGCTC AGCACCCTGG AAATGACTCA
125551 ACCGCTGTTG GAAATATTTA GAAACAAAGC AGACACTCGG CAGATTGCCG
125601 CCGTGGTGTT TAGCACATTA GCTTTTATAC ACAATAGATT CCATCCCCTT
125651 GTTACTAATT TTACTAACAA AATGGAGTTT GTGGTCACTG AAACCAACGA
125701 CACAAGCATT CCCGGAGAAC CCATTTTGTT TACGGAAAAC GAAGGTGTGC
125751 TGCTGTGTTC CGTGGACAGA CCGTCTATCG TTAAAATGCT AAGCCGCGAG
125801 TTTGACACCG AGGCTTTAGT AAACTTTGAA AACGACAACT GCAACGTGCG
125851 GATAGCCAAG ACGTTTGGCG CCTCTAAGCG CAAAAACACG ACGCGCAGCG
125901 ATGATTACGA GTCAAATAAA CAACCCAATT ACGATATGGA TTTGAGCGAT
125951 TTTAGCATAA CTGAGGTTGA AGCCACTCAA TATTTAACTC TGTTGCTGAC
126001 CGTCGAACAT GCCTATTTAC ATTATTATAT TTTTAAAAAT TACGGGGTGT
126051 TTGAATATTG CAAATCGCTA ACGGACCATT CGCTTTTTAC CAACAAATTG
126101 CGATCGACAA TGAGCACAAA AACGTCTAAT TTACTGTTAA GCAAATTCAA
126151 ATTTACCATT GAAGATTTTG ACAAAATAAA CTCAAATTCT GTAACATCAG
126201 GGTTTAATAT ATATAATTTT AATAAATAAT TAAATAATAT ACAATGTTTT
126251 TATTAATTAT ATTTTTAATA TTAATTAAAA GTATTAATAT TTAAAAAAAT
126301 GAATCAAATT CATCTAAAGT GTCACAGCGA TAAAATTTGT CCTAAAGGGT
126351 ATTTTGGCCT CAACGCCGAT CCCTATGATT GCACGGCGTA TTATCTGTGT
126401 CCGCATAAAG TGCAAATGTT TTGCGAATTA AATCACGAAT TTGACTTGGA
126451 CTCCGCCAGC TGCAAGCCTA TCGTGTACGA TCACACGGGC AGCGGGTGTA
126501 CGGCTCGCAT GTATAGAAAC TTGTTACTAT GAAGAGCGGG TTTCCAGTTG
126551 CACAACACTA TTATCGATTT GCAGTTCGGG ACATAAATGT TTAAATATAT
126601 CGATGTCTTT GTGATGCGCG CGACATTTTT GTAGGTTATT GATAAAATGA
126651 ACGGATACGT TGCCCGACAT TATCATTAAA TCCTTGGCGT AGAATTTGTC
126701 GGGTCCATTG TCCGTGTGCG CTAGCATGCC CGTAACGGAC CTCGTACTTT
126751 TGGCTTCAAA GGTTTTGCGC ACAGACAAAA TGTGCCACAC TTGCAGCTCT
126801 GCATGTGTGC GCGTTACCAC AAATCCCAAC GGCGCAGTGT ACTTGTTGTA
126851 TGCAAATAAA TCTCGATAAA GGCGCGGCGC GCGAATGCAG CTGATCACGT
126901 ACGCTCCTCG TGTTCCGTTC AAGGACGGTG TTATCGACCT CAGATTAATG
126951 TTTATCGGCC GACTGTTTTC GTATCCGCTC ACCAAACGCG TTTTTGCATT
127001 AACATTGTAT GTCGGCGGAT GTTCTATATC TAATTTGAAT AAATAAACGA
127051 TAACCGCGTT GGTTTTAGAG GGCATAATAA AAGAAATATT GTTATCGTGT
127101 TCGCCATTAG GGCAGTATAA ATTGACGTTC ATGTTGGATA TTGTTTCAGT
127151 TGCAAGTTGA CACTGGCGGC GACAAGATCG TGAACAACCA AGTGACTATG
127201 ACGCAAATTA ATTTTAACGC GTCGTACACC AGCGCTTCGA CGCCGTCCCG
127251 AGCGTCGTTC GACAACAGCT ATTCAGAGTT TTGTGATAAA CAACCCAACG
127301 ACTATTTAAG TTATTATAAC CATCCCACCC CGGATGGAGC CGACACGGTG
127351 ATATCTGACA GCGAGACTGC GGCAGCTTCA AACTTTTTGG CAAGCGTCAA
127401 CTCGTTAACT GATAATGATT TAGTGGAATG TTTGCTCAAG ACCACTGATA
127451 ATCTCGAAGA AGCAGTTAGT TCTGCTTATT ATTCGGAATC CCTTGAGCAG
127501 CCTGTTGTGG AGCAACCATC GCCCAGTTCT GCTTATCATG CGGAATCTTT
127551 TGAGCATTCT GCTGGTGTGA ACCAACCATC GGCAACTGGA ACTAAACGGA
127601 AGCTGGACGA ATACTTGGAC AATTCACAAG GTGTGGTGGG CCAGTTTAAC
127651 AAAATTAAAT TGAGGCCTAA ATACAAGAAA AGCACAATTC AAAGCTGTGC
127701 AACCCTTGAA CAGACAATTA ATCACAACAC GAACATTTGC ACGGTCGCTT
127751 CAACTCAAGA AATTACGCAT TATTTTACTA ATGATTTTGC GCCGTATTTA
127801 ATGCGTTTCG ACGACAACGA CTACAATTCC AACAGGTTCT CCGACCATAT
127851 GTCCGAAACT GGTTATTACA TGTTTGTGGT TAAAAAAAGT GAAGTGAAGC
127901 CGTTTGAAAT TATATTTGCC AAGTACGTGA GCAATGTGGT TTACGAATAT
127951 ACAAACAATT ATTACATGGT AGATAATCGC GTGTTTGTGG TAACTTTTGA
128001 TAAAATTAGG TTTATGATTT CGTACAATTT GGTTAAAGAA ACCGGCATAG
128051 AAATTCCTCA TTCTCAAGAT GTGTGCAACG ACGAGACGGC TGCACAAAAT
128101 TGTAAAAAAT GCCATTTCGT CGATGTGCAC CACACGTTTA AAGCTGCTCT
128151 GACTTCATAT TTTAATTTAG ATATGTATTA CGCGCAAACC ACATTTGTGA
128201 CTTTGTTACA ATCGTTGGGC GAAAGAAAAT GTGGGTTTCT TTTGAGCAAG
128251 TTGTACGAAA TGTATCAAGA TAAAAATTTA TTTACTTTGC CTATTATGCT
128301 TAGTCGTAAA GAGAGTAATG AAATTGAGAC TGCATCTAAT AATTTCTTTG
128351 TATCGCCGTA TGTGAGTCAA ATATTAAAGT ATTCGGAAAG TGTGCAGTTT
128401 CCCGACAATC CCCCAAACAA ATATGTGGTG GACAATTTAA ATTTAATTGT
128451 TAACAAAAAA AGTACGCTCA CGTACAAATA CAGCAGCGTC GCTAATCTTT
128501 TGTTTAATAA TTATAAATAT CATGACAATA TTGCGAGTAA TAATAACGCA
128551 GAAAATTTAA AAAAGGTTAA GAAGGAGGAC GGCAGCATGC ACATTGTCGA
128601 ACAGTATTTG ACTCAGAATG TAGATAATGT AAAGGGTCAC AATTTTATAG
128651 TATTGTCTTT CAAAAACGAG GAGCGATTGA CTATAGCTAA GAAAAACAAA
128701 GAGTTTTATT GGATTTCTGG CGAAATTAAA GATGTAGACG TTAGTCAAGT
128751 AATTCAAAAA TATAATAGAT TTAAGCATCA CATGTTTGTA ATCGGTAAAG
128801 TGAACCGAAG AGAGAGCACT ACATTGCACA ATAATTTGTT AAAATTGTTA
128851 GCTTTAATAT TACAGGGTCT GGTTCCGTTG TCCGACGCTA TAACGTTTGC
128901 GGAACAAAAA CTAAATTGTA AATATAAAAA ATTCGAATTT AATTAATTAT
128951 ACATATATTT TGAATTTAAT TAATTATACA TATATTTTAT ATTATTTTTG
129001 TCTTTTATTA TCGAGGGGCC GTTGTTGGTG TGGGGTTTTG CATAGAAATA
129051 ACAATGGGAG TTGGCGACGT TGCTGCGCCA ACACCACCTC CTCCTCCTCC
129101 TTTCATCATG TATCTGTAGA TAAAATAAAA TATTAAACCT AAAAACAAGA
129151 CCGCGCCTAT CAACAAAATG ATAGGCATTA ACTTGCCGCT GACGCTGTCA
129201 CTAACGTTGG ACGATTTGCC GACTAAACCT TCATCGCCCA GTAACCAATC
129251 TAGACCCAAG TCGCCAACTA AATCACCAAA CGAGTAAGGT TCGATGCACA
129301 TGAGTGTTTG GCCCGCAGGA AGATCGCTAA TATCTACGTA TTGAGGCGAA
129351 TCTGGGTCGG CGGACGGATC GCTGCCGCGA CAAACTGTTT TTTCTACTTC
129401 ATAGTTGAAT CCTTGGCACA TGTTGGTTAG TTCGGGCGGA TTGTTAGGCA
129451 ACAAGGGGTC GAATGGGCAA ATGGTAACAT CCGACTGATT TAGATTGGGG
129501 TCTTGACGAC AAGTGCGCTG CAATAACAAG CAGGCCTCGG CGATTTCTCC
129551 GGCGTCTTTA CCTTGCACAT AATAACTTCC GCCGGTGTTA TTGATGGCGT
129601 TGATTATATC TTGTACTAGT GTGGCGGCGC TAAACAAGAA ATAGCCGCCG
129651 GTGGCCAAGA GTATGCCCGT TCCTCCTACT TTTAAGCTTT GCATGTAACT
129701 ATGTAGACGG GGGTTTTGCT GCAGTGCGTT TTGAACACCT TCGGGCGTGC
129751 GCACGTTGGT TTCCGGGAAG TTTTGTTTGA CTGCATTGGA TCGCGTCTGC
129801 TTGGTGTGGT AATTAAAGTC TGGCACGTTG TCCACGCGCC GCAATTGGCT
129851 CAATGAGTTT ATTTGAGGGT CTGAAATGCC CTGAAATACT CCGCGTATGT
129901 TGGGGACATC ATTGTTACGA GTAATTCTGT TTATGTCTGA AGTGCTCACA
129951 AACTGGTTGT TAGATAGTTG ATAGCCCGGC TGAAATCTGT TGTTTCCAAT
130001 GTTGCGTACA CTGGGCGCGT TGAGCACATT TGTGAAACCG GCGGGAGTGC
130051 TTGTTAAAAG ACGCGTATTA TCAGTAATAA AACTGGCCTG ATTAGGATAC
130101 AATTTATTGA CTGCGCGAAG ATTTGAAAAA AAACTCATTT TAAAGCAAAC
130151 TTATTTAATA AATATATCAC AGTAAAGGTT TTGCAAAACT GCCGTCGTCA
130201 ATACAACACG GCAGCGGCGT CATGTTGGTA AAATCTAATC TTCTCCTTGC
130251 TTTAGATTCT GGGCGAGAAG GCGCATTTGT TGTGTAAGTT ATTTCGACGT
130301 CTGCATTATT TGTTGTGTAA GGTATCTCGA CGTATGAAGC AACTTTAACA
130351 TTGTTATAAT TTTTTTTAAA TATTGATGCG CTCCACGGCG CGCGTTGATA
130401 CGGATGATAT CTCTCCATTG TATGATCGCT AAATTTATAT ACCGTTTCAA
130451 TAAATATGTT AAAACCCAAC ATGTTAATTA TAATATTCAT AATAGTTTGT
130501 TTGTTTTCAA TAATTATTTT TACTGTTTTG AAATCTAAAA GAGGTGACGA
130551 TGACGAATCA GACGACGGGT TCAGTTGCTA TAACAAACCA ATTGGAGTAA
130601 ATTTTCCGCA TCCTACTAGA TGTGACGCTT TCTACATGTG TGTCGGTTTA
130651 AATCAAAAAT TAGAGTTAAT CTGCCCTGAA GGATTTGAAT TTGATCCAGA
130701 TGTTAAAAAT TGTGTTCCTA TATCAGATTA TGGATGTACC GCTAACCAAA
130751 ACTAAAAATA AAATAAAATT TATATAGATT AATGAAATAA AATTTATATA
130801 GATTAATAAA ATAAAATTTA TTTAATATAT TATACTATTT ATATTATTTA
130851 CAACACTTAA CGTCTAGACA TAACAGTTTG TAACTTAGAA ACTAAATCAG
130901 AGTTACTGCG CTCAAACTCT GAAAATTTGG CTTGAGACTC GGCCACCTGC
130951 TTACGCAATT GTTCTTGCAG ATTATTCACA GTCGATTGCA ACTCTTCTGA
131001 TTTCTTGGTA GATTCTTGCA AGTCATAGTT TGCCTTTTGT AAATCTAATT
131051 CGGCGACAGC ATGCTTGTGT TTAAGCATAA TGTAGTCGCT GTTTAACATG
131101 GTCATTTTAT GTTCAACTTG GCTGGTCTTG GCTCGCAGCT CGGACAGTTC
131151 TTTTTGCAAT TGCTCCACAT AGTTCAAGTC CGTGGTGTGA TTGTTGACCG
131201 TGTTATTTTC TAAAAGCTCG CGCCAATGCT GTTTGATGGA ATCCTGGTTA
131251 CGAGTGACGT TAATGGGCAT AAATTCTACA TACCCGTGCT TATTGTACAC
131301 GCGACAATCT GATGAAGTAG CGCTGCAAAA ACATTTGTAC ACAGAATTGT
131351 CCATAATTAT CTTGACATAA CACTTGAAAC ACACAGCATG GTTACAATGA
131401 ATCGAAGTCA CAAACGAGGA ATTTACGTTT TTAGTGTCTT TAAAAGTAGT
131451 AAAACAAATA TTACACGAAA CCTCTACTTC TTCTTCGGGT TCTGATTGCT
131501 GCTGCTGCTG CTGCTGCGGC TGCGGAGACT GCGGCGAGGC AAACAAATCT
131551 GGCGACTGTG GTATTACGTA ATTCGGCGAA TAAGATGGAC TATAAGTGGG
131601 AGACCTTGGG GCAATCTCAT TCATCAGCTG AGCCTCAAGA TCTAAACCTC
131651 GTTGCAGAGC CCTCTGCGCA GCTGTCTCCG ACGCAATGTT ATCCTGGTAC
131701 TGCTGGGCAG TGATGTCGGG AAACCGTTCA CGATCCACAT TTTCACTATT
131751 AATTAGTATG ACGTCATCCT CTTGACTTAA TAGCGGATCG TCATTGCTAA
131801 TGTTAACCTG ACCGTGCACG TAATACGTGA CACCCTGACG ATGGTAGGTG
131851 CGCGTCAACG GCTCGTTGAC GTTCCCGATA ATCTGCACGT TTTCTTCGCT
131901 GACACGCTGC TCCTGACGCC GCTCCTGACG GCGATGGCTG CGACTGCTTG
131951 AAGACGGCTG GCTGCGACTG CTTGAAGACG GCTGGGCTTC GGGAGATGTT
132001 GTAAAGTTGA TGCGGCGACG GCTGAGAGAC AGCCTGTGGC GGCGGCTGCT
132051 GCTGGGAGTG GCGGCGTTGA TTTGGCGACT CATGGCTGGG CTGGTAGGAT
132101 ACTGTTCACT AGGCTGTGAG GCTTGAACTG TGCTTACGAG TAGAACGGCA
132151 GCTGTATTTA TACTGTTTAT CAGTACTGCA CGACTGATAA GACAATAGTG
132201 GTGGGGGAAC TTGCCAGGCA AAAATGAACT TTTTTGTAAT GCAAAAAAGT
132251 TGATAGTGTA GTAGTATATT GGGAGCGTAT CGTACAGTGT AGACTATTCT
132301 AATAAAATAG TCTACGATTT GTAGAGATTG TACTGTATAT GGAGTGTCAG
132351 GCAAAAGTGA ACTTTTTTGC ATTGCAAAAA AATTCATTTT AAATTTATCA
132401 TATCACAGGC TGCAGTTTCT GTTATCTGTC CCCCACTCAG GCGTGCAGCT
132451 ATAAAAGCAG GCACTCACCA ACTCGTAAGC ACAGTTCGTT GTGAAGTGAA
132501 CACGGAGAGC CTGCCAATAA GCAAAATGCC AAGGGACACC AACAATCGCC
132551 ACCGGTCTAC GCCATATGAA CGTCCTACGC TTGAAGATCT CCGCAGACAG
132601 TTGCAAGACA ATTTGGACAG CATAAACCGC CGAGACAGAA TGCAAGAAGA
132651 ACAAGAAGAA AACCTGCGCT ATCAAGTGCG TAGAAGGCAG CGTCAAAACC
132701 AGCTCCGCTC CATACAAATG GAACAGCAGC GAATGATGGC GGAATTAAAC
132751 AACGAGCCGG TGATTAATTT TAAATTTGAG TGTAGTGTGT GTTTAGAAAC
132801 ATATTCCCAA CAATCTAACG ATACTTGTCC TTTTTTGATT CCGACTACGT
132851 GCGACCACGG TTTTTGTTTC AAATGCGTCA TCAATCTGCA AAGCAACGCG
132901 ATGAATATTC CGCATTCCAC TGTGTGCTGT CCATTGTGCA ATACCCAGGT
132951 AAAAATGTGG CGTTCCTTAA AGCCTAACGC TGTTGTGACG TGTAAGTTTT
133001 ACAAGAAAAC TCAAGAAAGA GTTCCGCCCG TGCAGCAGTA TAAAAACATT
133051 ATTAAAGTGC TACAAGAACG GAGCGTGATT AGTGTCGAAG ACAACGACAA
133101 TAATTGTGAC ATAAATATGG AGAATCAGGC AAAGATAGCT GCTTTGGAAG
133151 CTGAATTGGA AGAAGAAAAA AATCACAGTG ATCAAGTAGC TTCTGAAAAC
133201 CGACAGCTGA TAGAAGAAAA TACTCGTCTC AATGAACAGA TTCAAGAGTT
133251 GCAGCATCAG GTGAGGACAT TGGTGCCGCA ACGTGGCATT ACGGTTAATC
133301 AGCAAATTGG CCGTGACGAC AGTGCGCCAG CCGAGCTGAA CGAGCGTTTT
133351 CGCTCACTTG TCTATTCGAC TATTTCAGAG CTGTTTATTG AAAATGGCGT
133401 TCATAGTATT CAAAATTATG TTTATGCCGG AACTTCTGCT GCTAGTTCAT
133451 GTGATGTAAA TGTTACTGTT AATTTTGGGT TTGAAAATTA ATGTGATATG
133501 AAATGTATAT ATAAAAATGA TGGAATAAAT AATAAACATT TTTATACTTT
133551 TTATGTTTTT TTTATTTCAT GTGATTAAGA AACTTTTAAG ATGGATAGTA
133601 GTAATTGTAT TAAAATAGAT GTAAAATACG ATATGCCGTT ACATTATCAA
133651 TGTGACAATA ACGCAGATAA AGACGTTGTA AATGCGTATG ACACTATCGA
133701 TGTTGACCCC AACAAAAGAT TTATAATTAA TCATAATCAC GAACAACAAC
133751 AAGTCAATGA AACAAATAAA CAAGTTGTCG ATAAAACATT CATAAATGAC
133801 ACAGCAACAT ACAATTCTTG CATAATAAAA ATTTAAATGA CATCATATTT
133851 GAGAATAACA AATGACATTA TCCCTCGATT GTGTTTTACA AGTA
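The listing above follows a fixed layout: each line begins with the 1-based position of its first base, followed by up to five blocks of ten bases. As a hedged illustration that is not part of the application, the Python sketch below joins such lines into one continuous string and checks that the printed positions are consistent with the number of bases on each line. The function name `parse_listing` is hypothetical, and the sketch assumes that any line whose first token is not a number, or whose remaining tokens contain characters other than A, C, G and T, is not a sequence line and can be skipped.

```python
def parse_listing(lines):
    """Join numbered sequence-listing lines into one string.

    Returns (start_position, sequence). Raises ValueError when a printed
    position disagrees with the running base count.
    """
    start = None
    next_pos = None
    chunks = []
    for raw in lines:
        tokens = raw.split()
        if len(tokens) < 2:
            continue  # blank line or lone token
        label = tokens[0].rstrip("-")  # tolerate a stray trailing hyphen
        bases = "".join(tokens[1:]).upper()
        is_number = label.isascii() and label.isdigit()
        if not is_number or set(bases) - set("ACGT"):
            continue  # not a sequence line, e.g. a page footer or a claim
        pos = int(label)
        if start is None:
            start = pos
        elif pos != next_pos:
            raise ValueError(f"expected position {next_pos}, found {pos}")
        chunks.append(bases)
        next_pos = pos + len(bases)
    if start is None:
        raise ValueError("no sequence lines found")
    return start, "".join(chunks)


if __name__ == "__main__":
    tail = [
        "133801 ACAGCAACAT ACAATTCTTG CATAATAAAA ATTTAAATGA CATCATATTT",
        "133851 GAGAATAACA AATGACATTA TCCCTCGATT GTGTTTTACA AGTA",
    ]
    first, seq = parse_listing(tail)
    print(first, first + len(seq) - 1)
```

Applied to the last two lines of the listing, the sketch reports a start of 133801 and an end of 133894, since the final line starts at 133851 and carries 44 bases.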
Claims
1. A polynucleotide sequence selected from the group consisting of the sequences designated ORF 13, 20, 22-26, 28-30, 32, 38, 41-46, 50-60, 62-63, 66, 68-79, 81-87, 91-92, 96-98, 101-103, 106-126, 129-130, 140-146, 148-150, 152 and 154 of Table 1.
2. A polypeptide produced by expressing a polynucleotide sequence selected from the group consisting of the sequences designated ORF 13, 20, 22-26, 28-30, 32, 38, 41-46, 50-60, 62-63, 66, 68-79, 81-87, 91-92, 96-98, 101-103, 106-126, 129-130, 140-146, 148-150, 152 and 154 of Table 1.
3. The polypeptide of claim 2 wherein the ORF has been modified to contain specific preferred codons as indicated below for the specific amino acid listed:
Amino Acid Preferred Codon(s)
Ala GCC or GCG
Arg AGA, CGA, CGC, or CGT (especially CGC)
Asn AAC
Asp GAC
Cys TGC
Gln CAA
Glu GAA
Gly GGC
His CAC
Ile ATT or ATA
Leu TTA or TTG
Lys AAA
Phe TTT
Pro CCC or CCG
Ser ACT, TCG or TCT
Thr ACG
Tyr TAC
Val GTG
4. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 27 is disrupted.
5. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 30 is disrupted.
6. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 32 is disrupted.
7. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 71 is disrupted.
8. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 86 is disrupted.
9. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 123 is disrupted.
10. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 126 is disrupted.
11. An expression vector containing the complete genomic sequence of AcNPV with the exception that ORF 127 is disrupted.
12. An expression vector containing the complete genomic sequence of AcNPV with the exception that at least one non-essential ORF selected from the group consisting of ORF 27, ORF 30, ORF 32, ORF 71, ORF 86, ORF 123, ORF 126 and ORF 127 is disrupted.
13. A method of synthesizing a polypeptide by expressing the polypeptide in an insect or cultured insect cell which has been transformed by an expression vector derived from AcNPV, wherein the expression vector contains: (1) a coding sequence which codes for the polypeptide; and
(2) control sequences which control replication of the expression vector and which control transcription of the coding sequence; and wherein the expression vector is produced by disrupting at least one non-essential sequence from the complete genomic sequence of AcNPV.
14. The method of claim 13, wherein the sequence of ORF 27 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
15. The method of claim 13, wherein the sequence of ORF 30 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
16. The method of claim 13, wherein the sequence of ORF 32 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
17. The method of claim 13, wherein the sequence of ORF 71 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
18. The method of claim 13, wherein the sequence of ORF 86 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
19. The method of claim 13, wherein the sequences of ORF 123 are the non-essential sequences disrupted from the genomic sequence of AcNPV.
20. The method of claim 13, wherein the sequence of ORF 126 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
21. The method of claim 13, wherein the sequence of ORF 127 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
22. The method of claim 13, wherein at least one non-essential sequence is replaced with an enhancer sequence.
23. An expression vector containing the complete genomic sequence of AcNPV with the exception that at least one ORF selected from the group consisting of ORF 27, ORF 30, ORF 32, ORF 71, ORF 86, ORF 123, ORF 126 and ORF 127 is disrupted.
24. A recombinant virus derived from the expression vector of claim 23.
25. A method of modifying a desired heterologous gene to be more highly expressed in a baculovirus expression vector system comprising the steps of: a) analyzing the coding sequence of the heterologous gene; and b) modifying a portion of the coding sequence to contain a greater number of preferred codons as indicated below for the specific amino acids:
Amino Acid Preferred Codon(s)
Ala GCC or GCG
Arg AGA, CGA, CGC, or CGT (especially CGC)
Asn AAC
Asp GAC
Cys TGC
Gln CAA
Glu GAA
Gly GGC
His CAC
Ile ATT or ATA
Leu TTA or TTG
Lys AAA
Phe TTT
Pro CCC or CCG
Ser ACT, TCG or TCT
Thr ACG
Tyr TAC
Val GTG
26. The method of claim 25 wherein the desired heterologous gene to be more highly expressed is selected from the group consisting of hepatitis B virus core antigen, hepatitis B virus surface antigen, Human immunodeficiency virus type 1 (HIV-1) envelope protein gp120, HIV-1 envelope protein gp160, HIV-1 Gag protein, HIV-1 Gag-pol fusion protein, HIV-1 Integration protein, HIV-1 Major core p24, HIV-1 Nef protein, HIV-1 Pol protein, HIV-1 protease, HIV-1 Rev protein, Human immunodeficiency virus type 2 Gag precursor protein, Human T-cell lymphotrophic virus type 1 (HTLV-1) p20E protein, HTLV-1 gp46 protein, HTLV-1 p40* protein, Bacillus thuringiensis subspecies kurstaki HD-73 delta endotoxin, Bacillus thuringiensis subspecies aizawai 7.21 crystal protein, Androctonus australis Hector (Scorpion) Insect neurotoxin (AaIT), Buthus eupeus (Scorpion) BeIT insectotoxin-1, Heliothis virescens juvenile hormone esterase, Manduca sexta eclosion hormone, Manduca sexta diuretic hormone, Pyemotes tritici (Mite) neurotoxin (TxP-1), human CD4 HIV receptor, human erythropoietin (EPO), human alpha interferon, human beta interferon, human interleukin-2, human interleukin-5, human interleukin-6, human beta-nerve growth factor, human protein kinase C, human tissue plasminogen activator, and human tumor necrosis factor receptor.
27. A method for producing a recombinant baculovirus whereby an insect cell or cultured insect cell has been transformed by an expression vector derived from AcNPV, wherein the expression vector contains: a) a coding sequence which codes for the polypeptide; and b) control sequences which control replication of the expression vector and which control transcription of the coding sequence; and wherein the expression vector is produced by disrupting at least one non-essential sequence from the complete genomic sequence of AcNPV.
28. The method of claim 27, wherein the sequence of ORF 27 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
29. The method of claim 27, wherein the sequence of ORF 30 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
30. The method of claim 27, wherein the sequence of ORF 32 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
31. The method of claim 27, wherein the sequence of ORF 71 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
32. The method of claim 27, wherein the sequence of ORF 86 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
33. The method of claim 27, wherein the sequence of ORF 123 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
34. The method of claim 27, wherein the sequence of ORF 126 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
35. The method of claim 27, wherein the sequence of ORF 127 is the non-essential sequence disrupted from the genomic sequence of AcNPV.
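The codon modification recited in claims 3 and 25 can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the applicants' procedure: the names `translate` and `recode` are hypothetical; the standard genetic code is assumed; where several preferred codons are listed the first is used, except for Arg, where CGC is used because the claims mark it "especially"; and the Ser entry omits ACT, which the claim text lists but which encodes threonine under the standard code. Codons that are already preferred, and codons for amino acids the claims do not list (Met, Trp, stop), are left unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; the claims do not name a translation table).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Preferred codons from the claims, keyed by one-letter amino acid code.
# The first entry is the replacement used by recode() (illustrative choice).
PREFERRED = {
    "A": ["GCC", "GCG"],
    "R": ["CGC", "AGA", "CGA", "CGT"],  # CGC first: "especially CGC"
    "N": ["AAC"], "D": ["GAC"], "C": ["TGC"], "Q": ["CAA"],
    "E": ["GAA"], "G": ["GGC"], "H": ["CAC"],
    "I": ["ATT", "ATA"],
    "L": ["TTA", "TTG"],
    "K": ["AAA"], "F": ["TTT"],
    "P": ["CCC", "CCG"],
    "S": ["TCG", "TCT"],  # ACT from the claim text omitted: it encodes Thr
    "T": ["ACG"], "Y": ["TAC"], "V": ["GTG"],
}


def translate(cds):
    """Translate an in-frame coding sequence with the standard code."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds):
    """Replace non-preferred codons with a preferred synonym."""
    cds = cds.upper()
    if len(cds) % 3 or set(cds) - set(_BASES):
        raise ValueError("expected an in-frame sequence of A, C, G and T")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        options = PREFERRED.get(CODON_TO_AA[codon])
        keep = options is None or codon in options
        out.append(codon if keep else options[0])
    recoded = "".join(out)
    # The encoded polypeptide must be unchanged by the substitution.
    assert translate(recoded) == translate(cds)
    return recoded


if __name__ == "__main__":
    print(recode("ATGGCTAGTTGGTAA"))
```

For the toy input `ATGGCTAGTTGGTAA` (Met-Ala-Ser-Trp-stop) the sketch returns `ATGGCCTCGTGGTAA`: GCT becomes GCC and AGT becomes TCG, while ATG, TGG and the stop codon are untouched.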
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU28972/95A AU2897295A (en) | 1994-07-04 | 1995-06-30 | Autographa californica complete genome sequence |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9413420A GB9413420D0 (en) | 1994-07-04 | 1994-07-04 | Autographa californica nuclear polyhedrosis virus dna sequences |
GB9413420.2 | 1994-07-04 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1996001320A2 (en) | 1996-01-18 |
WO1996001320A3 (en) | 1996-07-25 |
Family
ID=10757768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB1995/000578 WO1996001320A2 (en) | 1994-07-04 | 1995-06-30 | Complete genomic sequence of autographa californica nuclear polyhedrosis virus c6 |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU2897295A (en) |
GB (1) | GB9413420D0 (en) |
WO (1) | WO1996001320A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000005391A1 (en) * | 1998-07-21 | 2000-02-03 | Dow Agrosciences Llc | Antibody-mediated down-regulation of plant proteins |
US6635748B2 (en) * | 1997-12-31 | 2003-10-21 | Chiron Corporation | Metastatic breast and colon cancer regulated genes |
CN114058598A (en) * | 2021-11-04 | 2022-02-18 | 中国科学院精密测量科学与技术创新研究院 | Novel recombinant baculovirus genome insertion site and use thereof |
CN114317608A (en) * | 2020-12-28 | 2022-04-12 | 陕西杆粒生物科技有限公司 | Gene knockout type baculovirus expression vector |
CN118086400A (en) * | 2024-04-17 | 2024-05-28 | 和元生物技术(上海)股份有限公司 | Nucleic acid molecule, recombinant baculovirus containing the same and application thereof |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPN570295A0 (en) * | 1995-09-29 | 1995-10-26 | Commonwealth Scientific And Industrial Research Organisation | Biologically active proteins of viral origin |
- 1994-07-04: GB application GB9413420A (publication GB9413420D0), status Pending
- 1995-06-30: WO application PCT/IB1995/000578 (publication WO1996001320A2), status Application Filing
- 1995-06-30: AU application AU28972/95A (publication AU2897295A), status Abandoned
Non-Patent Citations (4)
Title |
---|
ARCH. VIROL., vol.130, pages 1 - 16 M. KOOL AND J.M. VLAK; 'The structural and functional organization of the autographa californica nuclear polyhedrosis virus genome' cited in the application * |
VIROLOGY, vol.185, 19 October 0 pages 229 - 241 R.D. POSSEE ET AL.; 'Nucleotide sequence of the Autographa californica nuclear polyhedrosis 9.4 kbp Eco RI-I and -R (polyhedrin gene) region' cited in the application * |
VIROLOGY, vol.191, pages 1003 - 1008 S.C. BRAUNAGEL ET AL.; 'Sequence, genomic organization of the EcoRI-A fragment of Autographa californica nuclear polyhedrosis virus, and identification of a viral-encoded protein resembling the outer capsid protein VP8 of Rotavirus' cited in the application * |
VIROLOGY, vol.202, pages 586 - 605 M.D. AYRES ET AL.; 'The complete DNA sequence of Autographa californica nuclear polyhedrosis virus' * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6635748B2 (en) * | 1997-12-31 | 2003-10-21 | Chiron Corporation | Metastatic breast and colon cancer regulated genes |
US7279307B2 (en) | 1997-12-31 | 2007-10-09 | Chiron Corporation | Metastatic breast and colon cancer regulated genes |
US7795407B2 (en) | 1997-12-31 | 2010-09-14 | Novartis Vaccines And Diagnostics, Inc. | Metastatic breast and colon cancer regulated genes |
WO2000005391A1 (en) * | 1998-07-21 | 2000-02-03 | Dow Agrosciences Llc | Antibody-mediated down-regulation of plant proteins |
CN114317608A (en) * | 2020-12-28 | 2022-04-12 | 陕西杆粒生物科技有限公司 | Gene knockout type baculovirus expression vector |
CN114317608B (en) * | 2020-12-28 | 2023-08-22 | 陕西杆粒生物科技有限公司 | Gene knockout type baculovirus expression vector |
CN114058598A (en) * | 2021-11-04 | 2022-02-18 | 中国科学院精密测量科学与技术创新研究院 | Novel recombinant baculovirus genome insertion site and use thereof |
CN114058598B (en) * | 2021-11-04 | 2023-04-28 | 中国科学院精密测量科学与技术创新研究院 | Novel recombinant baculovirus genome insertion site and application thereof |
CN118086400A (en) * | 2024-04-17 | 2024-05-28 | 和元生物技术(上海)股份有限公司 | Nucleic acid molecule, recombinant baculovirus containing the same and application thereof |
Also Published As
Publication number | Publication date |
---|---|
WO1996001320A3 (en) | 1996-07-25 |
GB9413420D0 (en) | 1994-08-24 |
AU2897295A (en) | 1996-01-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LT LU LV MD MG MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TT UA UG US UZ VN |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): KE MW SD SZ UG AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase in: |
Ref country code: CA |