WO1999039004A1 - Automatic resequencing
- Publication number: WO1999039004A1 (PCT/US1998/005438)
- Authority: WO - WIPO (PCT)
- Prior art keywords: sequence, nucleic acid, probes, target nucleic, probe
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
- C12Q1/6874—Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
Definitions
- The invention resides in the technical fields of molecular genetics, genomics, and comparative sequence analysis.
- Arrays of probes provide a more efficient means of analyzing variant sequences once a prototypical or reference sequence has been determined. Analysis of the hybridization pattern of probes to a target nucleic acid reveals the position, and optionally the nature, of differences between the target and reference sequences.
- WO 95/11995 describes arrays comprising four probe sets. Comparing the hybridization intensities of four corresponding probes, one from each set, to a target sequence reveals the identity of the nucleotide in the target sequence that is aligned with the interrogation position of the probes. That nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity.
- Hybridization intensities for multiple targets from different sources can be classified into groups or clusters suggested by the data rather than defined a priori, such that isolates in a given cluster tend to be similar and isolates in different clusters tend to be dissimilar (see WO 97/29212, incorporated by reference in its entirety for all purposes).
- Array-based resequencing has been used, for example, to identify large numbers of human polymorphisms in mitochondrial DNA and ESTs, to identify drug-induced mutations in HIV, and to analyze mutations in p53 correlated with human cancer.
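The clustering of hybridization intensities mentioned above is attributed to WO 97/29212, whose algorithm is not described here. As a hypothetical illustration only, the sketch below groups isolates by single-linkage clustering: two isolates fall into the same cluster whenever the Euclidean distance between their intensity vectors is below a chosen threshold. The function name `cluster_isolates`, the distance metric, the threshold, and the example intensities are all assumptions, not the cited method or real data.

```python
import math


def cluster_isolates(intensities, threshold):
    """Single-linkage clustering of hybridization intensity vectors.

    intensities: dict mapping isolate name -> list of probe intensities
                 (all vectors the same length, same probe order).
    threshold:   isolates closer than this Euclidean distance are linked.
    Returns a sorted list of clusters, each a sorted list of isolate names.
    """
    names = list(intensities)
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(intensities[a], intensities[b]) < threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical intensities: two isolates with similar patterns, one dissimilar.
data = {
    "isolate1": [900.0, 120.0, 80.0],
    "isolate2": [880.0, 140.0, 95.0],
    "isolate3": [100.0, 850.0, 90.0],
}
print(cluster_isolates(data, threshold=100.0))  # [['isolate1', 'isolate2'], ['isolate3']]
```

Single linkage needs no preset number of clusters, which matches the idea of groups "suggested by the data"; other clustering schemes would serve equally well for illustration.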
- FIG. 1: Outline of the sequence analysis algorithm using first- and second-level base-calling strategies.
- Figs. 2A-F: Chimpanzee and human chip image comparisons. Magnified, digitized, false-colored (red) images showing human and chimpanzee BRCA1 target hybridization patterns to high-density oligonucleotide arrays evaluating antisense strands (array size 1.2 cm x 1.2 cm, with 50 micron probe features). Contrast and brightness were adjusted in each panel to increase image clarity. Probes designed to detect single-nucleotide insertions are not shown, for clarity. Nucleotide identities determined by dideoxy sequencing are given under the respective columns, underlined if they differ from human, and colored red or blue if correctly or incorrectly identified, respectively, by level-one hybridization analysis.
- Figs. 3A-G: Primate chip image comparisons. Digitized, false-colored (red) images showing the hybridization patterns of BRCA1 fluorescent targets to high-density oligonucleotide arrays evaluating antisense target strands. A magnified region (50 micron feature size) corresponding to nucleotide positions 3374-3388 of human BRCA1 cDNA is given for each species; specific insertion probes are not shown, for clarity. The arrangement of sequencing probes is given in Fig. 1B. Nucleotide identities determined by dideoxy sequencing are given under each column and colored or underlined as described for Fig. 2. Hybridization patterns of (A) human (Hsa, Homo sapiens), (B) chimpanzee
- Figs. 4A-D: Representative chip images of alternative second-order tiling schemes for orangutan target sites. Magnified, digitized, false-colored (red) images showing the hybridization patterns of BRCA1 fluorescent orangutan targets to high-density oligonucleotide arrays evaluating sense and antisense target strands. Nucleotide identities determined by dideoxy sequencing, depicting the coding-strand sequence, are given under the respective columns and underlined if they differ from the human consensus sequence. For the 2731 C→T and 3667 A→G base substitutions relative to the human sequence, hybridization to nucleotides 2724-2728 and 3660-3674 (human cDNA numbering), respectively, is shown.
- Hybridization patterns of orangutan (A) sense target with standard 2731 C tiling, (B) sense target with alternative 2731 T tiling, (C) antisense target with standard 3667 A tiling, and (D) antisense target with alternative 3667 G tiling.
- Antisense strand hybridization data are given relative to the coding-strand sequence.
- A nucleic acid is a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, including known analogs of natural nucleotides unless otherwise indicated.
- An oligonucleotide is a single-stranded nucleic acid ranging in length from 2 to about 500 bases, typically about 8-40 bases and more typically 10-25 bases.
- A probe is an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, which usually involves hydrogen bond formation.
- An oligonucleotide probe may include natural (i.e., A, G, C, or T) or modified bases (e.g., 7-deazaguanosine, inosine).
- The bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization.
- For example, oligonucleotide probes may be peptide nucleic acids, in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
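Complementarity between probe and target can be sketched as follows, assuming unmodified DNA bases (A, C, G, T) and Watson-Crick pairing only; modified bases such as inosine and PNA backbones are outside this sketch. The helper names are hypothetical. The code derives the perfectly complementary probe for a target subsequence and checks whether a probe and target segment pair at every base when aligned antiparallel.

```python
# Watson-Crick pairing for unmodified DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_perfect_match(probe, target_segment):
    """True if the probe (5'->3') pairs with every base of target_segment
    (5'->3') when the two strands are aligned antiparallel."""
    return (len(probe) == len(target_segment)
            and probe.upper() == reverse_complement(target_segment))


target = "ATGGATTTATCTGCTC"  # arbitrary example target segment
probe = reverse_complement(target)
print(probe)                            # GAGCAGATAAATCCAT
print(is_perfect_match(probe, target))  # True
```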
- Specific hybridization refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).
- Stringent conditions are conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and differ in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) of the specific sequence at a defined ionic strength and pH.
- The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (Because the target sequences are generally present in excess, at Tm 50% of the probes are occupied at equilibrium.)
- Stringent conditions include a salt concentration of at least about 0.01 to 1.0 M.
- Stringent conditions can also be achieved by adding destabilizing agents such as formamide.
- Conditions of 5X SSPE (750 mM NaCl, 50 mM Na phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C are suitable for allele-specific probe hybridizations.
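The definitions above set stringent conditions relative to Tm but give no formula for Tm. As an assumption not taken from this document, the sketch below uses the Wallace rule of thumb (2°C per A or T plus 4°C per G or C), which is only a rough estimate for short oligonucleotides and ignores salt, probe concentration, and nearest-neighbor stacking effects. It then picks a temperature about 5°C below the estimate, following the "about 5°C lower than Tm" guideline. Function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm estimate (deg C) for a short DNA oligonucleotide using the
    Wallace rule: 2 deg C per A/T plus 4 deg C per G/C. A rule of thumb only;
    it ignores salt, probe concentration and nearest-neighbor effects."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo, offset=5.0):
    """Pick a hybridization temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(oligo) - offset


probe = "GAGCAGATAAATCCAT"  # 10 A/T and 6 G/C
print(wallace_tm(probe), stringent_temperature(probe))  # 44.0 39.0
```

Nearest-neighbor thermodynamic models give more reliable Tm values and would be the natural replacement for `wallace_tm` in practice.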
- A perfectly matched probe has a segment perfectly complementary to a particular target sequence.
- Complementary base pairing means sequence-specific base pairing, including, e.g., Watson-Crick base pairing or other forms such as Hoogsteen base pairing.
- Probes typically have a segment of complementarity of 6-20 nucleotides, and preferably 10-25 nucleotides. Leading or trailing sequences flanking the segment of complementarity can also be present.
- The term "mismatch probe" refers to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. Although the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable, because a terminal mismatch is less likely to prevent hybridization of the target sequence. Probes are therefore often designed with the mismatch located at or near the center of the probe, where it is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.
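Since a central mismatch destabilizes the duplex most, a mismatch probe can be derived from a perfect-match probe by substituting its central base. The sketch assumes substitution with the complementary base, a convention used in some array designs but not one this document specifies; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def central_mismatch_probe(pm_probe):
    """Derive a mismatch probe by replacing the central base of a
    perfect-match probe with its complement. For even lengths, the base
    just right of center is used."""
    center = len(pm_probe) // 2
    original = pm_probe[center]
    return pm_probe[:center] + COMPLEMENT[original] + pm_probe[center + 1:]


pm = "ACCTGACTTAG"  # 11-mer; center is index 5 ('A')
mm = central_mismatch_probe(pm)
print(pm)  # ACCTGACTTAG
print(mm)  # ACCTGTCTTAG
```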
- Polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population.
- A polymorphic marker or site is the locus at which divergence occurs.
- Preferred markers have at least two alleles, each occurring at a frequency greater than 1%, and more preferably greater than 10% or 20%, of a selected population.
- A polymorphic locus may be as small as one base pair.
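The frequency criterion for preferred markers can be checked directly from genotyped alleles. This is a minimal sketch; the function name and the sample of typed chromosomes are hypothetical.

```python
from collections import Counter


def preferred_marker(observed_alleles, min_freq=0.01):
    """Return True if the site has at least two alleles whose frequency in the
    sampled population exceeds min_freq (0.01 = 1%)."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        return False  # no data, no marker
    common = [allele for allele, n in counts.items() if n / total > min_freq]
    return len(common) >= 2


# Hypothetical sample of 200 chromosomes typed at one site.
sample = ["C"] * 170 + ["T"] * 28 + ["A"] * 2
print(preferred_marker(sample))                 # True (C 85%, T 14%)
print(preferred_marker(sample, min_freq=0.20))  # False (only C exceeds 20%)
```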
- An array including a pooled probe is one in which a cell of the array is occupied by a pooled mixture of probes.
- For example, a cell might be occupied by probes ACCCTCCA and ACCCCCCA, in which case the position at which they differ (the fifth base, T or C) is described as a pooled position.
- Although the identity of each probe in the mixture is known, the individual probes in the pool are not separately addressable.
- The hybridization signal from such a cell is the aggregate of the signals from the different probes occupying it.
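A pooled cell can be modelled as follows. The bracket notation `[CT]` for a pooled position is a hypothetical convention introduced only for this sketch, and "aggregate" is assumed to mean a simple sum. The per-probe signals are simulation inputs; on a real array only the cell total would be observable.

```python
from itertools import product


def expand_pooled_probe(pattern):
    """Expand a pooled-probe pattern in which a bracketed group such as [CT]
    marks a pooled position, e.g. 'ACCC[CT]CCA' -> ['ACCCCCCA', 'ACCCTCCA']."""
    choices = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)
            choices.append(pattern[i + 1:end])  # all bases pooled here
            i = end + 1
        else:
            choices.append(pattern[i])
            i += 1
    return ["".join(combo) for combo in product(*choices)]


def cell_signal(pattern, signal_per_probe):
    """The cell's signal is the aggregate (here, the sum) of the signals of
    all probes occupying it; individual probes are not separately readable."""
    return sum(signal_per_probe.get(p, 0.0) for p in expand_pooled_probe(pattern))


print(expand_pooled_probe("ACCC[CT]CCA"))  # ['ACCCCCCA', 'ACCCTCCA']
print(cell_signal("ACCC[CT]CCA", {"ACCCTCCA": 1200.0, "ACCCCCCA": 150.0}))  # 1350.0
```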
- Species variant refers to a gene sequence that is evolutionarily and functionally related between species.
- For example, the human CD4 gene is the cognate gene of the mouse CD4 gene, since the sequences and structures of these two genes indicate that they are highly homologous and both genes encode a protein that functions in signaling T-cell activation through MHC class II-restricted antigen recognition.
- Percentage sequence identity is determined between optimally aligned sequences using computerized implementations of algorithms such as GAP, BESTFIT, FASTA, and TFASTA in the …
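The alignment step itself (GAP, BESTFIT, FASTA, TFASTA) is not reproduced here. Once two sequences have been aligned, percentage identity can be scored as in this sketch. Gap handling differs between programs; counting one-sided gaps as mismatches and skipping gap-gap columns is an assumption of this example, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two already-aligned sequences of equal length.

    Identical columns count as matches. Columns where both sequences have a
    gap are skipped; columns with a gap in only one sequence count as
    mismatches. (Conventions differ between programs; this is one choice.)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTTGCA-T", "ACGATGCAGT"))  # 80.0
```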
- The invention provides iterative methods of analyzing a target sequence that represents a variant of a reference sequence.
- The methods employ an array of probes that includes a probe set comprising probes complementary to the reference sequence.
- A target nucleic acid is hybridized to the array of probes.
- The relative hybridization intensities of the probes to the target nucleic acid are then determined.
- The relative hybridization intensities are used to estimate a sequence of the target nucleic acid.
- A further array of probes is then provided, comprising a probe set of probes complementary to the estimated sequence of the target nucleic acid.
- The target nucleic acid is then hybridized to the further array of probes, and the relative hybridization intensities of the probes to the target nucleic acid are determined.
- The sequence of the target nucleic acid is then re-estimated from these relative hybridization intensities.
- The cycles of hybridization and sequence estimation can be repeated, if desired, until the re-estimated sequence of the target nucleic acid is the true sequence of the target nucleic acid.
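The iterative cycle above can be illustrated with a toy simulation. Everything in it is an assumption for illustration: probes are written in the target's own sense (the physical probe would be the complement), only base substitutions are modelled (no insertions or deletions), and hybridization is simulated by an invented intensity model (signal falls by a factor of 0.3 per mismatch over a background of 50) rather than measured. A base is called only when the brightest probe exceeds a signal threshold; otherwise the previous estimate is kept. The example shows why redesigning the array helps: a difference whose probe window also covers two other differences cannot be called in the first cycle, but becomes callable once its neighbors have been corrected in the estimate.

```python
BASES = "ACGT"


def design_probes(estimate, i, half):
    """Four probes interrogating position i of the current estimate, written in
    target sense. The window spans up to `half` bases either side of i, and the
    interrogation position takes each of A, C, G, T in turn."""
    start, end = max(0, i - half), min(len(estimate), i + half + 1)
    window = estimate[start:end]
    offset = i - start
    return start, {b: window[:offset] + b + window[offset + 1:] for b in BASES}


def simulated_intensity(probe, target, start):
    """Toy hybridization model (an assumption, not measured data): signal
    falls by a factor of 0.3 per mismatch, on top of a background of 50."""
    segment = target[start:start + len(probe)]
    mismatches = sum(p != t for p, t in zip(probe, segment))
    return 1000.0 * 0.3 ** mismatches + 50.0


def estimate_sequence(estimate, target, half=4, min_signal=300.0):
    """One cycle: 'hybridize' the target to an array designed from `estimate`
    and call each base from the brightest of its four probes. Positions whose
    brightest probe is below min_signal keep their previous base."""
    calls = []
    for i in range(len(estimate)):
        start, probes = design_probes(estimate, i, half)
        signals = {b: simulated_intensity(p, target, start) for b, p in probes.items()}
        best = max(signals, key=signals.get)
        # Target-sense probes: the call is the probe's own interrogation base.
        calls.append(best if signals[best] >= min_signal else estimate[i])
    return "".join(calls)


def iterative_resequence(reference, target, max_rounds=10, **kwargs):
    """Start from the reference and re-estimate until the estimate stops changing."""
    estimate, history = reference, [reference]
    for _ in range(max_rounds):
        new_estimate = estimate_sequence(estimate, target, **kwargs)
        history.append(new_estimate)
        if new_estimate == estimate:
            break  # constant between successive cycles
        estimate = new_estimate
    return estimate, history


reference = "ACGTTGCAAGTCCGATTGCA"
target = "ACGTTGTAGGTCAGATTGCA"  # hypothetical variant: substitutions at 6, 8, 12
final, history = iterative_resequence(reference, target)
for n, seq in enumerate(history):
    print(n, seq)
print(final == target)  # True
```

In this toy run the substitutions at positions 6 and 12 are called in the first cycle, while position 8, whose window also spans both of them, is only resolved in the second cycle against an array redesigned from the improved estimate.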
- The methods are particularly useful for analyzing a target nucleic acid that represents a species variant of a known reference sequence.
- For example, the reference sequence can be from a human and the target sequence from a non-human primate.
- The target nucleic acid can show 50-99% sequence identity with the reference sequence.
- The methods are also particularly useful where a target sequence differs from a reference sequence by more than one mutation within a probe length.
- The methods can readily accommodate a reference sequence at least 1 kb or 10 kb long, or even a complete or substantially complete human chromosome or genome.
- A probe set for use in the methods typically includes overlapping probes that are perfectly complementary to and span the reference sequence, and the further array comprises probes that are perfectly complementary to and span the estimated sequence.
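Overlapping probes that are perfectly complementary to and span a sequence can be generated as in this sketch. A step of one base between successive probes is an assumption (the text only says "overlapping"), and the function name is hypothetical. The same function applied to the estimated sequence would give the probes of the further array.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def tile_probes(sequence, probe_length):
    """Overlapping probes, perfectly complementary to and spanning `sequence`.

    One probe per start position (step of 1 base); each probe is the reverse
    complement of the subsequence it covers, written 5'->3'.
    """
    probes = []
    for start in range(len(sequence) - probe_length + 1):
        segment = sequence[start:start + probe_length]
        probe = "".join(COMPLEMENT[b] for b in reversed(segment))
        probes.append((start, probe))
    return probes


for start, probe in tile_probes("ACGTTGCAAG", 8):
    print(start, probe)
```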
- The array of probes can comprise four probe sets.
- A first probe set comprises a plurality of probes, each probe comprising a segment of at least six nucleotides exactly complementary to a subsequence of the reference sequence, the segment including at least one interrogation position complementary to a corresponding nucleotide in the reference sequence.
- The second, third, and fourth probe sets each comprise a corresponding probe for each probe in the first probe set. The probes in the second, third, and fourth probe sets are identical to a sequence comprising the corresponding probe from the first probe set, or to a subsequence of at least six nucleotides thereof that includes the at least one interrogation position, except that the at least one interrogation position is occupied by a different nucleotide in each of the four corresponding probes from the four probe sets.
- The target sequence can be estimated by comparing the relative specific binding of four corresponding probes from the first, second, third, and fourth probe sets. A nucleotide in the target nucleic acid is assigned as the complement of the interrogation position of the probe having the greatest specific binding. Other nucleotides in the target sequence are assigned by similar comparisons.
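The four-probe-set design and base call described above (and in WO 95/11995) can be sketched as follows. Assumptions for this example: a single interrogation position at the center of each probe, a fixed odd probe length of 9, probes written 3'->5' so that they line up base-for-base with the target written 5'->3', and hypothetical intensity values. Function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = "ACGT"


def four_probe_sets(reference, probe_length=9):
    """For each reference position with a full window, build four probes that
    are complementary to the window except at the central interrogation
    position, which carries A, C, G or T in turn. Probes are written 3'->5'
    so each base lines up with the reference base it pairs with."""
    half = probe_length // 2
    sets = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        base_probe = "".join(COMPLEMENT[b] for b in window)
        sets[i] = {b: base_probe[:half] + b + base_probe[half + 1:] for b in BASES}
    return sets


def call_base(intensities):
    """Assign the target base as the complement of the interrogation-position
    base of the probe with the greatest specific binding."""
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


sets = four_probe_sets("ACGTTGCAAGT", probe_length=9)
print(sets[5])  # the 'C' probe is the perfect match to the reference at position 5
# Hypothetical intensities measured for the four probes at one position:
print(call_base({"A": 310.0, "C": 2450.0, "G": 290.0, "T": 330.0}))  # G
```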
- The invention also provides methods of analyzing a target nucleic acid comprising the following steps.
- An array of probes is designed to be complementary to an estimated sequence of the target nucleic acid.
- The array of probes is hybridized to the target nucleic acid.
- The target sequence is re-estimated from the hybridization pattern of the array to the target nucleic acid.
- These steps are then repeated at least once.
- the invention provides improved methods for analyzing variants of a reference sequence using arrays of probes.
- the methods are particularly useful for target sequences showing substantial variation from a reference sequence, as may be the case where target sequence and reference sequence are from different species.
- the methods involve designing a primary array of probes based on a known reference sequence. Effectively, the reference sequence serves as a first estimate of sequence of the target nucleic acid.
- the primary array of probes is hybridized to a target nucleic acid, and the sequence of the target is estimated as well as possible from its hybridization pattern to the primary array.
- a secondary array of probes is then designed based on the estimated sequence of the target nucleic acid.
- the target nucleic acid is then hybridized with the secondary array of probes, and the sequence is reestimated from the resulting hybridization pattern. Further cycles of array design and estimation of target sequence can be performed in an iterative fashion, if desired, until the estimated sequence is constant between successive cycles.
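The iterative cycle of design, hybridization and reestimation can be outlined as a loop that stops when the estimate is unchanged between successive cycles. This is a minimal sketch. It assumes a caller-supplied function `hybridize_and_call` (hypothetical) that stands in for one round of array design, hybridization and base calling. `toy_hybridize_and_call` is a deliberately crude simulation used only to exercise the loop.

```python
from typing import Callable


def iterative_resequence(
    reference: str,
    hybridize_and_call: Callable[[str], str],
    max_cycles: int = 10,
) -> tuple[str, int]:
    """Redesign, hybridize and reestimate until the estimate stops changing.

    `hybridize_and_call` is a hypothetical stand-in for one laboratory cycle:
    it receives the sequence the array is tiled against and returns the
    target sequence reestimated from the hybridization pattern.
    Returns the final estimate and the number of cycles performed.
    """
    estimate = reference  # the reference serves as the first estimate
    for cycle in range(1, max_cycles + 1):
        reestimate = hybridize_and_call(estimate)
        if reestimate == estimate:  # constant between successive cycles
            return estimate, cycle
        estimate = reestimate
    return estimate, max_cycles  # cycle limit reached without convergence


def toy_hybridize_and_call(truth: str) -> Callable[[str], str]:
    """Toy simulation only: each cycle corrects the first mismatch it finds.

    This crudely mimics closely spaced differences that cannot all be read
    from one tiling; it is not a model of hybridization.
    """
    def call(tiled: str) -> str:
        for i, (tiled_base, true_base) in enumerate(zip(tiled, truth)):
            if tiled_base != true_base:
                return tiled[:i] + true_base + tiled[i + 1 :]
        return tiled
    return call


final, cycles = iterative_resequence("ACGAAGCA", toy_hybridize_and_call("ACGTTGCA"))
# final == "ACGTTGCA", cycles == 3 (two corrections, then one confirming cycle)
```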
- Reference sequences for polymorphic site identification are often obtained from computer databases such as GenBank, the Stanford Genome Center, The Institute for Genomic Research and the Whitehead Institute. The latter databases are available at http://www-genome.wi.mit.edu; http://shgc.stanford.edu and http://ww.tigr.org. Reference sequences are typically from well-characterized organisms, such as human, mouse, C. elegans, Arabidopsis, Drosophila, yeast, E. coli or Bacillus subtilis. A reference sequence can vary in length from 5 bases to at least 1,000,000 bases. Reference sequences are often of the order of 100-10,000 bases. The reference sequence can be from expressed or nonexpressed regions of the genome.
- Where RNA samples are used, highly expressed reference sequences are sometimes preferred to avoid the need for RNA amplification.
- the function of a reference sequence may or may not be known.
- Reference sequences can also be from episomes such as mitochondrial DNA. Of course, multiple reference sequences can be analyzed independently.
- Targets can represent allelic, species, induced or other variants of reference sequences. Considerable diversity is possible between reference and target sequence. Target sequences usually show 50-99%, 80-98%, or 90-95% sequence identity with the reference sequence.
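Percent identity between a reference and a target can be computed as shown below. This sketch assumes the two sequences are already aligned without gaps and have equal length. Comparisons across species, which can involve insertions and deletions, would need a gapped alignment first. The function name is hypothetical.

```python
def percent_identity(reference: str, target: str) -> float:
    """Percent identity of two gap-free, pre-aligned sequences of equal length."""
    if len(reference) != len(target):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    matches = sum(r == t for r, t in zip(reference, target))
    return 100.0 * matches / len(reference)
```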
- a human reference sequence can be used as the starting point for analysis of other primates, such as gorillas and orangutans, of other mammals, or of reptiles, birds, plants, fungi or bacteria.
- the nucleic acid samples hybridized to arrays can be genomic DNA, RNA or cDNA. Nucleic acid samples are usually subject to amplification before application to an array. An individual genomic DNA segment from the same genomic location as a designated reference sequence can be amplified by using primers flanking the reference sequence. Multiple genomic segments corresponding to multiple reference sequences can be prepared by multiplex amplification, including primer pairs flanking each reference sequence in the amplification mix. Alternatively, the entire genome can be amplified using random primers (typically hexamers) (see Barrett et al., Nucleic Acids Research 23, 3488-3492 (1995)) or by fragmentation and reassembly (see, e.g., Stemmer et al., Gene 164, 49-53
- Nucleic acids can also be amplified by cloning into vectors and propagating the vectors in a suitable organism.
- YACs, BACs and HACs are useful for cloning large segments of genomic DNA.
- Genomic DNA can be obtained from virtually any tissue source (other than pure red blood cells).
- tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair.
- RNA samples are also often subject to amplification. In this case amplification is typically preceded by reverse transcription. Amplification of all expressed mRNA can be performed as described in commonly owned WO 96/14839 and WO 97/01603. In some methods, in which arrays are designed to tile highly expressed sequences, amplification of RNA is unnecessary. The choice of tissue from which the sample is obtained affects the relative and absolute levels of different RNA transcripts in the sample. For example, cytochromes P450 are expressed at high levels in the liver.
- Nucleic acids in a target sample are usually labelled in the course of amplification by inclusion of one or more labelled nucleotides in the amplification mix. Labels can also be attached to amplification products after amplification, e.g., by end-labelling.
- the amplification product can be RNA or DNA depending on the enzyme and substrates used in the amplification reaction.
- LCR: ligase chain reaction
- NASBA: nucleic acid sequence-based amplification
- ssRNA: single-stranded RNA
- dsDNA: double-stranded DNA
- An array of probes contains at least a first set of probes that are complementary to a reference sequence (or regions of interest therein).
- the probes tile the reference sequence. Tiling means that the probe set contains overlapping probes that are complementary to and span a region of interest in the reference sequence.
- a probe set might contain a ladder of probes, each of which differs from its predecessor in the omission of a 5' base and the acquisition of an additional 3' base.
- the probes in a probe set may or may not be the same length.
- the number of probes can vary widely from about 5, 10, 20, 50, 100, 1000, to 10,000 or 100,000.
- the arrays do not contain every possible probe sequence of a given length.
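A tiling ladder of this kind can be generated with a sliding window. This is a minimal sketch. `tiling_ladder` (hypothetical) assumes its input is already the probe-sense strand, so it only illustrates the overlap pattern. It does not compute complements or interrogation positions.

```python
def tiling_ladder(strand: str, probe_len: int) -> list[str]:
    """Overlapping probes that span `strand`, read 5'->3'.

    `strand` is assumed to already be the probe-sense sequence, i.e. the
    complement of the reference region written 5'->3'. Each probe drops the
    5' base of its predecessor and gains one additional 3' base.
    """
    if not 0 < probe_len <= len(strand):
        raise ValueError("probe_len must be between 1 and len(strand)")
    return [strand[i : i + probe_len] for i in range(len(strand) - probe_len + 1)]


ladder = tiling_ladder("ACGTTGCA", 4)
# Every successor overlaps its predecessor by probe_len - 1 bases.
assert all(nxt[:-1] == prev[1:] for prev, nxt in zip(ladder, ladder[1:]))
```

A 1,000-base region tiled with 25-mers yields 976 probes. That is far fewer than the 4^25 possible 25-mers, which is why such arrays need not contain every possible probe of a given length.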
- the first probe set comprises a plurality of probes exhibiting perfect complementarity with a reference sequence, as described above.
- Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence when the probe and reference sequence are aligned to maximize complementarity between the two.
- For each probe in the first set there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence.
- the probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets.
- a substrate bearing the four probe sets is hybridized to a labelled target sequence, which shows substantial sequence similarity with the reference sequence, but which may differ due to e.g., species variations.
- the amount of label bound to probes is measured. Analysis of the pattern of label reveals the nature and position of differences between the target and reference sequence. For example, comparison of the intensities of four corresponding probes reveals the identity of the nucleotide in the target sequence aligned with the interrogation position of the probes. That nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity. The comparison can be performed between successive columns of four corresponding probes to determine the identity of successive nucleotides in the target sequence.
- one of the four probes clearly has a significantly higher signal than the other three, and the identity of the base in the target sequence aligned with the interrogation position of the probes can be called with substantial certainty.
- two or more probes may show similar but not identical signals. In these instances, one can simply score the position as ambiguous. Alternatively, one can still call a base from the probe with the higher signal, but must recognize a significant possibility of error. In general, if the ratio of the signals of the two brightest probes is less than 1.2, a base call has a significant possibility of error.
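The base-calling rule for one column of four corresponding probes might look like the sketch below. It includes the 1.2 ratio below which a call is treated as unreliable. It assumes intensities keyed by the nucleotide at each probe's interrogation position. `call_base` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(
    intensities: dict[str, float], ratio_threshold: float = 1.2
) -> tuple[str, float, bool]:
    """Call one target base from a column of four corresponding probes.

    `intensities` maps the nucleotide at each probe's interrogation position
    to its measured signal. Returns (target base, top/second ratio, ambiguous?).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_signal), (_, second_signal) = ranked[0], ranked[1]
    # A zero runner-up signal is treated as an infinitely clear call.
    ratio = top_signal / second_signal if second_signal > 0 else float("inf")
    # The target base is the complement of the brightest probe's interrogation base.
    return COMPLEMENT[top_base], ratio, ratio < ratio_threshold
```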
- Ambiguous positions are most frequently due to closely spaced multiple points of variation between target and reference sequence (i.e., within a probe length). Ambiguities can also arise from low hybridization intensity caused by base-composition effects.
- a secondary array of probes is constructed based on the same principles as the first array, except that the first probe set is tiled based on the newly estimated sequence rather than the original reference sequence.
- the estimated sequence includes the best estimate of the base present at each position of ambiguity, as noted above. If two or more bases are equally likely to occupy a particular position in the estimated sequence, one can arbitrarily include one of the bases, provide alternate tilings corresponding to the different possible bases, or include multiple pooled bases at the position.
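One of these options is to provide alternate tilings for the possible bases at an ambiguous position. It can be illustrated by expanding an estimate written with the standard IUPAC nucleotide ambiguity codes into every concrete sequence. This is a minimal sketch; `alternate_tilings` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def alternate_tilings(estimate: str) -> list[str]:
    """Expand an estimate containing ambiguity codes into every concrete sequence."""
    choices = [IUPAC_BASES[code] for code in estimate.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

The number of alternate tilings is the product of the number of bases allowed at each ambiguous position, so it grows quickly. When there are many ambiguous positions, choosing one base or pooling bases keeps the secondary array small.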
- the secondary array typically has second, third and fourth probe sets designed according to the same principles as in the primary array.
- the secondary array is hybridized to the same target nucleic acid as was the primary array.
- Bases in the target sequence are called using the same principles as described above by comparison of probe intensities to give rise to a reestimated target sequence.
- the process can be repeated through further iterations, if desired. Further iteration is desirable if the estimated sequence contains a substantial number of positions that have been estimated with a low degree of confidence (e.g., from a comparison of probe intensities differing by a factor of less than 1.2). After sufficient iterations, the estimated sequence from one cycle should converge with that from the subsequent cycle. In some instances, positions of ambiguity may remain through many cycles. These positions may be due to effects such as heterozygosity, and should be checked by other means (e.g., conventional dideoxy sequencing or de novo sequencing by hybridization to a complete array of probes of a given length).
- arrays tile both strands of a reference sequence. Both strands are tiled separately using the same principles described above, and the hybridization patterns of the two tilings are analyzed separately. Typically, the hybridization patterns of the two strands indicate the same results (i.e., the location and/or nature of variation between target sequence and reference sequence). Occasionally, there may be an apparent inconsistency between the hybridization patterns of the two strands due to, for example, base-composition effects on hybridization intensities. Combining the results from the two strands increases the probability of correct base calling and can decrease the number of iterations required to determine the correct base sequence of a target.
- duplicate arrays are synthesized to allow analysis of hybridization between target sequence and probes under conditions of high and low stringency.
- high stringency is generally most useful
- Statistical combination of base calls from conditions of high and low stringency can increase the overall probability of correct base calling.
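Combining the two strand tilings can be sketched as follows. The text does not specify how results are combined. Averaging per-column normalized signals, as done here, is an assumption chosen for simplicity, and the function names are hypothetical. The only strand-specific step is that the antisense tiling reads the complementary strand, so its calls are complemented before combination. The same pattern could be applied to high- and low-stringency replicates.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize(column: dict[str, float]) -> dict[str, float]:
    """Scale a column of four probe signals so that they sum to 1."""
    total = sum(column.values())
    if total <= 0:
        raise ValueError("column has no positive signal")
    return {base: signal / total for base, signal in column.items()}


def combine_strands(sense: dict[str, float], antisense: dict[str, float]) -> str:
    """Call a target base from the sense- and antisense-strand tilings.

    Keys are the base each tiling reports on the strand it reads. The
    antisense tiling reads the complementary strand, so target base b is
    supported by antisense key COMPLEMENT[b]. Averaging normalized signals
    is an assumed combination rule, not one specified in the text.
    """
    s, a = normalize(sense), normalize(antisense)
    scores = {b: (s.get(b, 0.0) + a.get(COMPLEMENT[b], 0.0)) / 2 for b in "ACGT"}
    return max(scores, key=scores.get)
```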
- Arrays of probes immobilized on supports can be synthesized by various methods.
- a preferred method is VLSIPS™ (see Fodor et al., US 5,143,854; EP 476,014; Fodor et al., 1993, Nature 364, 555-556; McGall et al., USSN 08/445,332), which entails the use of light to direct the synthesis of oligonucleotide probes in high-density, miniaturized arrays (sometimes known as chips).
- Algorithms for design of masks to reduce the number of synthesis cycles are described by Hubbel et al., US 5,571,639 and US 5,593,839.
- Arrays can also be synthesized in a combinatorial fashion by delivering monomers to cells of a support by mechanically constrained flowpaths. See Winkler et al., EP 624,059. Arrays can also be synthesized by spotting monomer reagents onto a support using an ink jet printer. See id.; Pease et al., EP 728,520.
- hybridization intensity for the respective samples is determined for each probe in the array.
- hybridization intensity can be determined by, for example, a scanning confocal microscope in photon-counting mode. Appropriate scanning devices are described by, e.g., Trulson et al., US 5,578,832; Stern et al., US 5,631,734.
- a minimal overlapping set of physical clones is first obtained. For example, random bacterial artificial chromosome clones are generated and ordered by hybridization or conventional methods. If necessary, regions mapping to related positions in the genome are determined; e.g., pools of clones are hybridized to an array of mapped markers. Pools of clones are then generated for hybridization (e.g., 300 pools if the resequencing capacity is 1 Mb/chip and 300 chip designs are used to analyze 1/10th of a mammalian genome).
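The pool count in this example is consistent with simple arithmetic if the mammalian genome is taken to be roughly 3,000 Mb. That genome size is inferred from the example's figures, not stated in the text:

```latex
\[
  300~\text{chip designs} \times 1~\text{Mb per chip} = 300~\text{Mb}
  = \tfrac{1}{10} \times 3000~\text{Mb}
\]
```

Each of the 300 pools of clones therefore corresponds to roughly 1 Mb, the resequencing capacity of one chip.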
- Sequence differences between different species allow correlation between form and function. For example, the sequences of chimpanzee and human differ by ~1% overall. Further, the present methods allow comparison of a range of primate sequences, to see which sequences have evolved most rapidly and which are highly conserved. It will be apparent from the above that the invention includes a general concept which can be expressed concisely as follows. The invention entails the use of iterative cycles of designing an array of probes to be complementary to an estimated sequence of a target nucleic acid, and using the hybridization pattern of the array to the target nucleic acid sequence to determine a more accurate reestimated target sequence.
- base calling was made with at least 99.91% accuracy covering a minimum of 97% of the sequence.
- PCR from genomic DNA and transcription. PCR reactions were performed on genomic samples using the EXPAND™ Long Range PCR Kit (Boehringer Mannheim) with intronic primers 11PIF 5'-CCTTGTTATTTTTTGTATATTTTCAG-3' and 11PIR 5'-CAAAAACCTGGTTCCAATAC-3', directly overlapping the 5'-AG acceptor and 3'-GT donor splice sites.
- PCR reactions using the templates generated with the 11PIF and 11PIR primer set were performed with primers 11PIFT3 5'-ATTAACCCTCACTAAAGGGACCTTGTTATTTTTTGTATATTTTCAG-3' and 11PIRT7 5'-TAATACGACTCACTATAGGGACAAAAACCTGGTTCCAATAC-3', containing T3 and T7 RNA polymerase promoter sequences, respectively.
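The relationship between the gene-specific and promoter-tailed primers can be checked directly from the sequences given above. Each tailed primer should equal its gene-specific primer with a 5' extension that carries the promoter. A short Python check follows; the helper `promoter_tail` is hypothetical.

```python
# Primer sequences as given in the text, written 5'->3'.
PRIMERS = {
    "11PIF": "CCTTGTTATTTTTTGTATATTTTCAG",
    "11PIR": "CAAAAACCTGGTTCCAATAC",
    "11PIFT3": "ATTAACCCTCACTAAAGGGACCTTGTTATTTTTTGTATATTTTCAG",
    "11PIRT7": "TAATACGACTCACTATAGGGACAAAAACCTGGTTCCAATAC",
}


def promoter_tail(tailed: str, specific: str) -> str:
    """Return the 5' extension of `tailed`, checking that it ends with `specific`."""
    if not tailed.endswith(specific):
        raise ValueError("tailed primer does not end with the gene-specific primer")
    return tailed[: len(tailed) - len(specific)]


t3_tail = promoter_tail(PRIMERS["11PIFT3"], PRIMERS["11PIF"])
t7_tail = promoter_tail(PRIMERS["11PIRT7"], PRIMERS["11PIR"])
```

The extracted 5' extensions are ATTAACCCTCACTAAAGGGA for 11PIFT3 and TAATACGACTCACTATAGGGA for 11PIRT7.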
- Test sample transcription product was diluted to a final concentration of 100 nM in a 25 µl solution of 30 mM MgCl2.
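The amount of target in this reaction follows directly from the stated concentration and volume. A minimal sketch; the helper name `picomoles` is hypothetical.

```python
def picomoles(conc_nM: float, volume_uL: float) -> float:
    """Amount of solute in pmol.

    nmol/L x uL = (1e-9 mol/L)(1e-6 L) = 1e-15 mol = 1e-3 pmol, hence the /1000.
    """
    return conc_nM * volume_uL / 1000.0


target_pmol = picomoles(100, 25)  # 100 nM target in 25 ul -> 2.5 pmol
```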
- the reaction was incubated at 94 degrees C for 60 minutes to hydrolyze target into fragments ranging from
- hybridization buffer: 3 M TMAC (tetramethylammonium chloride), 1X TE pH 7.4, 0.005% Triton X-100, 1 nM 5'-fluorescein-labelled control oligonucleotide (5'-CGGTACCATCTTGAC-3').
- the control oligonucleotide hybridizes to specific surface probes aiding in image alignment.
- Target was hybridized with the appropriate sense- or antisense-strand reading array in a 250 µl volume for 4 hours at 35 degrees C.
- the array surface was washed with 5 ml of wash buffer (6X SSPE, 0.005% Triton X-100) and stained for 20 minutes at room temperature with phycoerythrin-streptavidin conjugate (Molecular Probes) at 2 µg/ml in wash buffer containing 2 mg/ml acetylated BSA (GIBCO BRL).
- Each array was washed with 5 ml of wash buffer and imaged with a scanning confocal microscope equipped with a 488 nm argon laser (GeneChip Scanner, Affymetrix).
- Fluorescent hybridization signals were detected by a photomultiplier tube using a 560 nm longpass emission filter.
- Oligonucleotide array. The synthesis and design of the oligonucleotide array have been described previously (ref. 9). Briefly, DNA phosphoramidites bearing 5'-photolabile protecting groups are coupled to a derivatized glass surface using modified DNA synthesis protocols. Spatially addressable oligonucleotide synthesis is obtained through photolithographic techniques with selective oligonucleotide photodeprotection for each coupling cycle. Thirty identical high-density array chips containing over 48,000 oligonucleotides were simultaneously produced in a single 8-hour synthesis.
- GeneChip Software created digitized fluorescence images by converting the photomultiplier tube output signal into proportional, spatially addressed pixel values. The probe intensity is calculated from the mean of the non-outlier photon counts for each feature (i.e., per probe). Background-corrected fluorescent hybridization signal for each probe was extracted from test images using AVI Software (Affymetrix) and imported into ViewSeq Software (Affymetrix), which quantitates ratios of fluorescent target hybridization signal across each set of 8 oligonucleotide probes (4 per strand) interrogating each nucleotide. Data from 4 sets of experiments reading both target strands were averaged to produce a composite file.
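Per-feature intensity and the composite file can be sketched as below. The text says only that the intensity is the mean of the non-outlier photon counts. The outlier rule used here, more than k median absolute deviations from the median, is an assumption and not the GeneChip Software definition. The function names are also assumptions.

```python
import statistics


def feature_intensity(pixel_counts: list[float], k: float = 3.0) -> float:
    """Mean photon count of one feature after discarding outlier pixels.

    Outliers are taken here as pixels more than k median absolute deviations
    from the median; this rule is an assumption for the sketch.
    """
    med = statistics.median(pixel_counts)
    mad = statistics.median(abs(x - med) for x in pixel_counts)
    kept = [x for x in pixel_counts if abs(x - med) <= k * mad]
    return statistics.fmean(kept)


def composite(experiments: list[list[float]]) -> list[float]:
    """Average each probe's intensity across replicate experiments."""
    return [statistics.fmean(values) for values in zip(*experiments)]
```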
- Template PCR products were purified using the Wizard DNA Purification Kit (Promega).
- Conventional three-pass fluorescent dye-terminator dideoxysequencing analysis was performed using the ABI 377 system.
- Human BRCA1 exon 11 primers were used for first-pass sequencing of all templates, except the canine ortholog of known sequence (ref. 18). Sequence gaps were filled by a primer-walking strategy.
- This sequencing and template generation strategy is not sensitive to all possible heterozygous single nucleotide polymorphisms; however, it is quite sensitive to heterozygous sequences causing chain-length differences in dideoxysequencing products. In cases of heterozygous base substitutions, the identity of one allele is reported.
- a nested amplicon within the flying lemur template was generated using AmpliTaq Gold (Perkin Elmer) and the manufacturer's protocols to clarify a suspected heterozygous sequence.
- PCR product was subcloned using the Zero Blunt Cloning Kit (Invitrogen), and inserts from individual colonies were sequenced.
- a heterozygous in-frame 3-base-pair deletion was found in the flying lemur target; it aligns with bases 2192-2194 of the human BRCA1 cDNA sequence and removes a single serine from a tract of three serine residues.
- High-density arrays have been used to screen the 3.43-kb exon 11 of the human hereditary breast and ovarian cancer gene BRCA1 (ref. 13) for all possible heterozygous polymorphisms and mutations (ref. 9).
- level one analysis quantitates the ratios of fluorescent target hybridization signal for eight probes (four per sense and antisense strands, respectively) querying each nucleotide position.
- the identity of the brightest signal is assigned to the target nucleotide, using human sequence numbering. If the brightest probe signal in a set is no more than 1.2 times the next brightest, an IUPAC ambiguity code is assigned.
- 1,363 nucleotide positions had single nucleotide mismatch specificity ratios (the ratio between the two highest probe signals within each averaged composite set of four) greater than 9.0, 1,346 positions had ratios between 5.0 and 9.0, 708 positions had ratios between 2.0 and 5.0, 5 positions had ratios between 1.2 and 2.0, and 4 positions had ratios less than 1.2 (giving an ambiguous level one base call).
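The level one calling rule can be sketched as follows, with IUPAC ambiguity codes and the ratio ranges reported above. Signals are assumed to be composite values keyed by the target base each probe reports. Values falling exactly on a range boundary are binned downward, which the text does not specify. The function names are hypothetical. As a consistency check, the five reported counts sum to 3,426 positions, in line with the 3.43-kb length of exon 11.

```python
# Two-base IUPAC codes used when the top two signals cannot be separated.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def level_one_call(signals: dict[str, float], factor: float = 1.2) -> tuple[str, float]:
    """Return (call, specificity ratio) for one composite set of four probes."""
    best, runner_up = sorted(signals, key=signals.get, reverse=True)[:2]
    second = signals[runner_up]
    ratio = signals[best] / second if second > 0 else float("inf")
    if ratio <= factor:  # brightest is no more than 1.2x the next brightest
        return IUPAC_PAIRS[frozenset((best, runner_up))], ratio
    return best, ratio


def ratio_bin(ratio: float) -> str:
    """Bin a specificity ratio into the reported ranges (boundaries assumed)."""
    if ratio > 9.0:
        return ">9.0"
    if ratio > 5.0:
        return "5.0-9.0"
    if ratio > 2.0:
        return "2.0-5.0"
    if ratio > 1.2:
        return "1.2-2.0"
    return "<=1.2 (ambiguous)"


# Position counts per range, as reported in the text.
REPORTED = {">9.0": 1363, "5.0-9.0": 1346, "2.0-5.0": 708, "1.2-2.0": 5, "<=1.2 (ambiguous)": 4}
```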
- Human, chimpanzee, gorilla, and orangutan targets with identical sequence tracts showed similar hybridization patterns (Figs. 3A-D).
- a single nucleotide substitution between rhesus and human targets is correctly identified by level one analysis; however, the 3'-adjacent nucleotide is incorrectly assigned (Fig. 3E).
- Level one hybridization data identifies two red howler monkey nucleotide substitutions, but cannot accurately read adjacent sequences (Fig. 3F) .
- Galago target contained 3 closely spaced nucleotide substitutions causing diminished hybridization signals and lower fidelity nucleotide assignments (Fig. 3G) .
- level one sequence information was determined from the least (dog) and most (chimpanzee) highly conserved targets by referring to dideoxysequencing data.
- Upon inspecting the level one dog sequence, it was evident that base calling was of poor quality in regions of predicted multiple substitutions. Furthermore, the most accurate level one base calls occurred in sequence tracts predicted to be identical to the human reference sequence. Such tracts, ranging from 4 to 8 nucleotides in length, were therefore systematically evaluated for base-calling fidelity.
- predicted single nucleotide substitutions flanked by these tracts were included in this evaluation, since the array has the capacity to correctly identify them (Fig. 2B).
- a second-order tiling scheme, with probes designed to match the base substitutions anticipated from the level one data on the basis of single nucleotide mismatch probe hybridization signals, clarifies most or all ambiguities.
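Tracts where the level one calls agree with the human reference are the regions where base calls were most reliable, and they can be located with a single scan. This is a minimal sketch that assumes gap-free calls aligned to the reference. `identical_tracts` is hypothetical, and its 4 and 8 nucleotide defaults mirror the tract lengths evaluated in the text.

```python
def identical_tracts(
    reference: str, called: str, min_len: int = 4, max_len: int = 8
) -> list[tuple[int, int]]:
    """Half-open (start, end) runs where the call matches the reference.

    Only runs whose length lies in [min_len, max_len] are returned; calls
    are assumed to be aligned to the reference without gaps.
    """
    tracts, start = [], None
    for i, (ref_base, call) in enumerate(zip(reference, called)):
        if ref_base == call:
            if start is None:
                start = i  # a matching run begins
        elif start is not None:
            tracts.append((start, i))  # a mismatch closes the run
            start = None
    if start is not None:  # close a run that reaches the end of the comparison
        tracts.append((start, min(len(reference), len(called))))
    return [(s, e) for s, e in tracts if min_len <= e - s <= max_len]
```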
- All chimpanzee, gorilla, and orangutan miscalls made adjacent to base substitutions can be clarified using second order tiling schemes since the sequence accuracy was at least 99.88% when the tiling pattern matches the target sequence.
- red howler monkey, galago, and dog orthologs provided lower-quality level two hybridization data (Table 1). This was primarily caused by an increased number of closely spaced nucleotide substitutions, along with insertions and deletions. Of the 26 level two red howler target miscalls, 23 were found near an almost exact 21-bp target duplication, while 3 were due to a 3-base-pair deletion.
- Completion of the Human Genome Project allows the use of DNA chips for rapid genome-wide determination of non-human primate sequences (ref. 14). This approach is particularly powerful when scanning for conserved sequence tracts, which are important for phylogenetic footprinting of promoter regions (ref. 2).
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to iterative techniques for analyzing a target nucleic acid that is a variant of a reference nucleic acid. An array of probes complementary to an estimated sequence of the target nucleic acid is constructed. The array is hybridized with the target nucleic acid. The target sequence is reestimated from the hybridization pattern of the array with the target nucleic acid. Another array of probes, complementary to the newly estimated sequence, is then constructed and used to obtain a further estimate of the sequence of the target nucleic acid. By performing repeated cycles of probe construction and target sequence estimation, an estimated target sequence is obtained that approaches the true sequence.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US7385398P | 1998-02-02 | 1998-02-02 | |
US7334598P | 1998-02-02 | 1998-02-02 | |
US60/073,345 | 1998-02-02 | ||
US60/073,853 | 1998-02-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999039004A1 (fr) | 1999-08-05 |
Family
ID=26754376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/005438 WO1999039004A1 (fr) | 1998-02-02 | 1998-03-19 | Resequençage automatique |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1999039004A1 (fr) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19957320A1 (de) * | 1999-11-29 | 2001-05-31 | Febit Ferrarius Biotech Gmbh | Dynamische Sequenzierung durch Hybridisierung |
WO2001040509A3 (fr) * | 1999-11-29 | 2001-12-06 | Febit Ferrarius Biotech Gmbh | Determination dynamique d'analytes |
WO2001057278A3 (fr) * | 2000-02-04 | 2003-01-09 | Aeomica Inc | Sondes d'acide nucleique a un seul exon derivees du genome humain utiles pour analyser l'expression genique dans des cellules hela humaines ou d'autres cellules epitheliales humaines du col de l'uterus |
EP0972078A4 (fr) * | 1997-03-20 | 2003-05-28 | Affymetrix Inc | Resequençage iteratif |
US6632641B1 (en) | 1999-10-08 | 2003-10-14 | Metrigen, Inc. | Method and apparatus for performing large numbers of reactions using array assembly with releasable primers |
US6846635B1 (en) | 1999-07-30 | 2005-01-25 | Large Scale Proteomics Corp. | Microarrays and their manufacture |
US7179638B2 (en) | 1999-07-30 | 2007-02-20 | Large Scale Biology Corporation | Microarrays and their manufacture by slicing |
US7211390B2 (en) | 1999-09-16 | 2007-05-01 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US7244559B2 (en) | 1999-09-16 | 2007-07-17 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5683881A (en) * | 1995-10-20 | 1997-11-04 | Biota Corp. | Method of identifying sequence in a nucleic acid target using interactive sequencing by hybridization |
US5698391A (en) * | 1991-08-23 | 1997-12-16 | Isis Pharmaceuticals, Inc. | Methods for synthetic unrandomization of oligomer fragments |
- 1998-03-19: WO PCT/US1998/005438 (WO1999039004A1), active application filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5698391A (en) * | 1991-08-23 | 1997-12-16 | Isis Pharmaceuticals, Inc. | Methods for synthetic unrandomization of oligomer fragments |
US5683881A (en) * | 1995-10-20 | 1997-11-04 | Biota Corp. | Method of identifying sequence in a nucleic acid target using interactive sequencing by hybridization |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0972078A4 (fr) * | 1997-03-20 | 2003-05-28 | Affymetrix Inc | Resequençage iteratif |
US7144699B2 (en) | 1997-03-20 | 2006-12-05 | Affymetrix, Inc. | Iterative resequencing |
US7179638B2 (en) | 1999-07-30 | 2007-02-20 | Large Scale Biology Corporation | Microarrays and their manufacture by slicing |
US6887701B2 (en) | 1999-07-30 | 2005-05-03 | Large Scale Proteomics Corporation | Microarrays and their manufacture |
US6846635B1 (en) | 1999-07-30 | 2005-01-25 | Large Scale Proteomics Corp. | Microarrays and their manufacture |
US7335762B2 (en) | 1999-09-16 | 2008-02-26 | 454 Life Sciences Corporation | Apparatus and method for sequencing a nucleic acid |
US7264929B2 (en) | 1999-09-16 | 2007-09-04 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US7244559B2 (en) | 1999-09-16 | 2007-07-17 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US7211390B2 (en) | 1999-09-16 | 2007-05-01 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US6632641B1 (en) | 1999-10-08 | 2003-10-14 | Metrigen, Inc. | Method and apparatus for performing large numbers of reactions using array assembly with releasable primers |
EP1650314A1 (fr) * | 1999-11-29 | 2006-04-26 | febit AG | Détermination dynamique d'analytes en utilisant une puce localisée sur une surface interne |
WO2001040510A3 (fr) * | 1999-11-29 | 2001-12-06 | Febit Ferrarius Biotech Gmbh | Sequençage dynamique par hybridation |
DE19957320A1 (de) * | 1999-11-29 | 2001-05-31 | Febit Ferrarius Biotech Gmbh | Dynamische Sequenzierung durch Hybridisierung |
WO2001040509A3 (fr) * | 1999-11-29 | 2001-12-06 | Febit Ferrarius Biotech Gmbh | Determination dynamique d'analytes |
GB2382814B (en) * | 2000-02-04 | 2004-12-15 | Aeomica Inc | Human genome-derived single exon nucleic acid probes useful for analysis of gene expression in human hela cells or other human cervical epithelial cells |
WO2001057278A3 (fr) * | 2000-02-04 | 2003-01-09 | Aeomica Inc | Sondes d'acide nucleique a un seul exon derivees du genome humain utiles pour analyser l'expression genique dans des cellules hela humaines ou d'autres cellules epitheliales humaines du col de l'uterus |
WO2001057270A3 (fr) * | 2000-02-04 | 2003-02-13 | Aeomica Inc | Sondes d'acide nucleique a un seul exon derivees du genome humain utiles pour analyser l'expression genique dans des cellules hbl 100 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0972078B1 (fr) | Resequençage iteratif | |
Hacia et al. | Evolutionary sequence comparisons using high-density oligonucleotide arrays | |
EP1108062B1 (fr) | Emploi de sondes groupees en analyse genetique | |
US6709816B1 (en) | Identification of alleles | |
JP3693352B2 (ja) | プローブアレイを使用して、遺伝子多型性を検出し、対立遺伝子発現をモニターする方法 | |
US20050074787A1 (en) | Universal arrays | |
US20010053519A1 (en) | Oligonucleotides | |
EP1256632A2 (fr) | Criblage à haut rendement de polymorphismes | |
US20050164184A1 (en) | Hybridization portion control oligonucleotide and its uses | |
JP2006520206A (ja) | プローブ、バイオチップおよびそれらの使用方法 | |
EP1056889A2 (fr) | Procedes et produits associes a la determination d'un genotype et a l'analyse de l'adn | |
EP0950720A1 (fr) | Méthode pour l'identification et pour établir le profil des polymorphismes | |
EP1975249A2 (fr) | Ensemble amorce de nucléotide et sonde nucléotide pour détecter un génotype de N-acétyltransférase 2 (NAT2) | |
WO1999039004A1 (fr) | Resequençage automatique | |
US6638719B1 (en) | Genotyping biallelic markers | |
EP1612282B1 (fr) | Ensemble de sondes destiné à la détection des acides nucléiques | |
WO1999058721A1 (fr) | Amplification muliplex d'adn a l'aide d'amorces chimeres | |
US20040248176A1 (en) | Iterative resequencing | |
KR102237248B1 (ko) | 소나무 개체식별 및 집단의 유전 분석용 snp 마커 세트 및 이의 용도 | |
US20030129598A1 (en) | Methods for detection of differences in nucleic acids | |
HK1025603B (en) | Iterative resequencing | |
WO2024048602A1 (fr) | Composition tampon à utiliser en hybridation et procédé d'hybridation | |
WO2004059013A1 (fr) | Detection de polymorphismes mononucleotidiques utilisant le genotypage avec depletion du nucleotide | |
Remm et al. | 13 Primer Design for Large-Scale | |
JP2009125018A (ja) | ハプロタイプの検出法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; designated state(s): JP US |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
| | 122 | EP: PCT application non-entry in European phase | |