WO1998039480A1

WO1998039480A1 - Methods and compositions for identifying expressed genes

Info

Publication number: WO1998039480A1
Application number: PCT/US1998/004094
Authority: WO
Inventors: Tariq M. Haqqi
Original assignee: Haqqi Tariq M
Priority date: 1997-03-03
Filing date: 1998-03-03
Publication date: 1998-09-11
Also published as: AU6444698A

Abstract

Methods and compositions are described for distinguishing between the expression of genes in two or more biological samples. The present invention employs oligonucleotide primers targeting conserved motifs within each expressed gene. The primers and method of the present invention allow for the accurate identification of differentially expressed gene(s) in various cell types.

Description

METHODS AND COMPOSITIONS FOR IDENTIFYING EXPRESSED GENES

FIELD OF THE INVENTION

The present invention relates to the identification of expressed genes, and in particular, methods and compositions for distinguishing between the expression of genes in two or more biological samples.

BACKGROUND

The initial observations of the "hybridization" process, i.e., the ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction, by Marmur and Lane, Proc. Nat. Acad. Sci. U.S.A. 46, 453 (1960) and Doty, et al., Proc. Nat. Acad. Sci. U.S.A. 46, 461 (1960), have been followed by the refinement of this process into an essential tool of modern biology. Initial hybridization studies, such as those performed by Hayashi, et al., Proc. Nat. Acad. Sci. U.S.A. 50, 664 (1963), were performed in solution. Further development led to the immobilization of the target DNA or RNA on solid supports. With the discovery of specific restriction endonucleases by Smith and Wilcox, J. Mol. Biol. 51, 379 (1970), it became possible to isolate discrete fragments of DNA. Utilization of immobilization techniques, such as those described by Southern, J. Mol. Biol. 98, 503 (1975), in combination with restriction enzymes, has allowed for the identification by hybridization of single-copy genes among a mass of fractionated genomic DNA.

With the development of these complex and powerful biological techniques, an ambitious project has been undertaken. This project, called the Human Genome Project (HGP), involves the complete characterization of the archetypal human genome sequence, which comprises 3 × 10⁹ DNA nucleotide base pairs. An implicit goal of the project is to find genes that may be involved in human health.

However, humans are greater than 99% identical at the DNA sequence level. Thus, merely finding the native sequence of genes will not reveal whether the gene is important in a disease-related process. Indeed, it is the identification of the differences between people that arguably will provide the information most relevant to individual health care.

Identifying differences between biological samples is not trivial. The first approach involved the production of a so-called "subtracted cDNA library." A subtracted cDNA library contains cDNA clones corresponding to mRNAs present in one sample and absent from another (e.g., present in a particular species, tissue or cell and not present in another species, tissue or cell). See generally, Current Protocols in Molecular Biology, Section 5.8.9 (1990). In the protocol, cDNA containing the gene(s) of interest ["+cDNA"] is prepared with EcoRI ends, and cDNA not containing the gene(s) of interest ["-cDNA"] is prepared with blunt ends. The +cDNA is mixed with a 50-fold excess of -cDNA inserts, and the mixture is heated to make the DNA single-stranded. The mixture is then cooled to allow for hybridization. Annealed cDNA inserts are ligated to a vector and transfected. In theory, the only +cDNA molecules likely to be double-stranded with an EcoRI site at each end are those that did not hybridize to something in the -cDNA preparation; in other words, where a complementary sequence is present in the -cDNA preparation, the sequence will not be transfected. Thus, only sequences unique to the +cDNA preparation will be cloned and amplified.
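The selection logic of the subtraction protocol can be expressed as a toy model. The sketch below is an illustration only, not the Current Protocols procedure: it assumes each cDNA insert is represented by a single sequence string, that a +cDNA insert hybridizes to the -cDNA pool only when the identical sequence is present there, and that the 50-fold excess captures every shared sequence. The function name `subtract_library` and the example inserts are hypothetical.

```python
from typing import List


def subtract_library(plus_cdna: List[str], minus_cdna: List[str]) -> List[str]:
    """Return the +cDNA inserts expected to survive subtraction and be cloned.

    Toy model: a +cDNA insert whose sequence also occurs in the -cDNA pool is
    assumed to anneal to a blunt-ended -cDNA partner (the 50-fold excess makes
    this overwhelmingly likely), leaving a hybrid with only one EcoRI end that
    is not cloned. Inserts absent from the -cDNA pool re-anneal with their own
    complements, keep an EcoRI end on each side, and are cloned.
    """
    minus_pool = set(minus_cdna)
    cloned: List[str] = []
    for insert in plus_cdna:
        # Skip shared sequences and duplicates, preserving first-seen order.
        if insert not in minus_pool and insert not in cloned:
            cloned.append(insert)
    return cloned


if __name__ == "__main__":
    plus = ["ACTGGA", "TTCAGC", "GGATCC"]   # hypothetical +cDNA inserts
    minus = ["ACTGGA", "GGATCC", "CCCAAA"]  # hypothetical -cDNA inserts
    print(subtract_library(plus, minus))    # only "TTCAGC" is unique to +cDNA
```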

The subtraction approach is tedious. Moreover, the hybridizations and library production with small amounts of cDNA are technically demanding.

A second approach to identifying differences involves the differential display of mRNAs using arbitrarily primed reverse transcription polymerase chain reaction (DDRT-PCR). The polymerase chain reaction is described by Mullis, et al., in U.S. Patent Nos. 4,683,195, 4,683,202 and 4,965,188, hereby incorporated by reference. Briefly, the PCR process consists of introducing a molar excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence. The two primers are complementary to their respective strands of the double-stranded sequence. The mixture is denatured and then allowed to hybridize.

Following hybridization, the primers are extended with a thermostable DNA polymerase so as to form complementary strands. The steps of denaturation, hybridization, and polymerase extension can be repeated as often as needed to obtain a relatively high concentration of a segment of the desired target sequence. In the case of DDRT-PCR, the target is mRNA; the mRNA is, however, treated with reverse transcriptase in the presence of oligo(dT) primers to make cDNA prior to the PCR process. The PCR is carried out with random primers in combination with the oligo(dT) primer used for cDNA synthesis. In theory, since only mRNA is (indirectly) amplified, only the expressed genes are amplified. Where two samples are to be compared, the amplified products are placed in side-by-side lanes of a gel; following electrophoresis, the products can be compared or "differentially displayed."
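The repeated cycling described above is often approximated as exponential growth. The function below is a minimal sketch of that idealized model, assuming a constant per-cycle efficiency and no plateau phase; `pcr_copies` and the starting copy numbers are hypothetical. It also suggests why differential display tends to favor abundant messages: with equal efficiency, a 100-fold difference in starting template is carried through every cycle unchanged.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized target copy number after `cycles` rounds of PCR.

    Each cycle of denaturation, hybridization and extension multiplies the
    target by (1 + efficiency); efficiency = 1.0 means perfect doubling. The
    model ignores the plateau reached when reagents become limiting.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical starting amounts for an abundant and a rare transcript.
    abundant, rare = pcr_copies(1000, 25), pcr_copies(10, 25)
    print(f"abundant: {abundant:.3g} copies, rare: {rare:.3g} copies")
    print(f"ratio after 25 cycles: {abundant / rare:.0f}")  # ratio stays 100
```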

DDRT-PCR, while an improvement over subtractive hybridization, has a number of drawbacks. First, the use of arbitrary random primers can cause faint banding at essentially every position of the gel. Second, the process is generally biased toward high-copy-number genes.

There have been some attempts to remedy these problems. For example, E. Haag, et al., "Effects of Primer Choice and Source of Taq DNA Polymerase on the Banding Patterns of Differential Display RT-PCR," Biotechniques 17:226-228 (1994), describes an improved DDRT-PCR method in which the standard oligo-dT primer is omitted from the PCR step to decrease the faint banding at essentially every position of the electrophoresis gel; a second arbitrary primer is used in its place. Another example is O.C. Ikonomov, et al., "Differential Display Protocol With Selected Primers That Preferentially Isolate mRNAs of Moderate to Low Abundance in a Microscopic System," Biotechniques 20:1030-1042 (1996), which describes a modified DDRT-PCR protocol that increases bias toward moderate- to low-abundance transcripts. The authors used experimentally selected primer pairs directed at known coding sequences that avoid amplification of highly abundant ribosomal and mitochondrial transcripts. While such efforts have improved DDRT-PCR, the process remains unsatisfactory because material that is not of interest continues to be amplified.

What is needed is a convenient method for distinguishing between the expression of genes in two or more biological samples. Such a method should also facilitate follow-up analysis once a gene of interest is identified.

SUMMARY OF THE INVENTION

The present invention relates to the identification of expressed genes, and in particular, methods and compositions for distinguishing between the expression of genes in two or more biological samples. The present invention employs oligonucleotide primers targeting conserved motifs within each expressed gene. In one embodiment, the present invention contemplates first and second oligonucleotide primers, said first oligonucleotide primer being specific for the highly conserved Kozak sequence present before the translation-initiating first methionine codon, and said second oligonucleotide primer containing a sequence complementary to a specific restriction endonuclease recognition site. It is contemplated that the specificity of the oligonucleotide primers can be enhanced by the presence of degenerate bases 5' and 3' of the target sequence, thus allowing PCR to be performed at a higher annealing temperature, which in turn provides sufficient specificity to generate reproducible patterns of bands on a sequencing gel. This reproducibility enables the method of the present invention to accurately identify differentially expressed gene(s) in various human cell types, including intermediate- and low-abundance transcripts. The present invention contemplates applying the method to the study of functional genomics and to the analysis of differentially expressed genes in various cell types. It is not intended that the present invention be limited by the nature of the sample.
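One way to see why flanking degenerate bases can permit a higher annealing temperature is to estimate melting temperatures before and after the bases are added. The sketch below swaps in the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough rule of thumb for short oligonucleotides that is not stated in this document, and uses a hypothetical primer built around a simplified Kozak-like core (`CCACCATG`) with three flanking N positions. It also shows the cost of degeneracy: each N multiplies the number of distinct primer species by four.

```python
from itertools import product
from typing import List

# IUPAC "N" denotes any of the four bases; other letters are fixed bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand_degenerate(primer: str) -> List[str]:
    """Enumerate every concrete oligonucleotide encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting estimate for short oligos: 2 degC per A/T, 4 degC per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    core = "CCACCATG"            # hypothetical Kozak-like core ending in the ATG
    primer = "NN" + core + "N"   # hypothetical flanking degenerate bases
    variants = expand_degenerate(primer)
    tms = [wallace_tm(v) for v in variants]
    print(f"{len(variants)} variants; core Tm {wallace_tm(core)} degC; "
          f"flanked Tm {min(tms)}-{max(tms)} degC")
```

For this hypothetical primer the sketch reports 64 variants and an estimated T_m rising from 26 °C for the bare core to between 32 °C and 38 °C with the flanking bases.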

The terms "sample" and "specimen" in the present specification and claims are used in their broadest sense. On the one hand, they are meant to include a specimen or culture. On the other hand, they are meant to include both biological and environmental samples. These terms encompass all types of samples obtained from humans and other animals, including but not limited to body fluids such as urine, blood, fecal matter, cerebrospinal fluid (CSF), semen, and saliva, as well as cells and solid tissue (including both normal and diseased tissue). These terms also refer to swabs and other sampling devices which are commonly used to obtain samples for culture of microorganisms.

It is also not intended that the invention be limited by the particular purpose for carrying out the biological reactions. In one medical diagnostic application, it may be desirable to differentiate between normal and cancerous tissue. In one embodiment, the present invention may be used to differentiate between cancer tissue that is metastatic and cancer tissue that is non-metastatic. In yet another embodiment, the present invention may be used to detect drug resistance. In another medical diagnostic application, it may be desirable simply to detect the presence or absence of specific pathogens (or pathogenic variants) in a clinical sample. In yet another application, it may be desirable to distinguish one species or strain from another.

With regard to distinguishing different species, in one embodiment, the present invention contemplates comparing the expressed genes of two samples suspected to be different species. In another embodiment, a species that is suspected to have changed or diverged from the parent species is compared with the parent species. For example, a species or strain of bacteria may develop different susceptibilities to a drug (e.g., antibiotics) as compared to the parent species; rapid identification of the specific species or subspecies aids diagnosis and allows initiation of appropriate treatment.

In one embodiment, the present invention contemplates a method of analyzing nucleic acid in a sample, comprising: a) providing: i) a sample containing nucleic acid, ii) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on a portion of said nucleic acid of said sample, iii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said nucleic acid of said sample, and iv) a polymerase and PCR reagents; b) preparing said nucleic acid from said sample under conditions so as to produce amplifiable nucleic acid; c) amplifying said nucleic acid with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated; and d) detecting said amplified product.

It is not intended that the present invention be limited by the nature of the non-coding sequence; the choice may depend on the type of sample. In one embodiment, said sample comprises eukaryotic cells and said natural common sequence is the Kozak sequence. In another embodiment, said sample comprises prokaryotic cells and said natural common sequence is the Shine-Dalgarno sequence.
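The product expected from a given transcript can be predicted in silico by scanning for the common non-coding motif and then for the nearest downstream restriction site. The sketch below assumes exact sequence matches, that the second primer anneals at the first recognition site 3' of the motif, and that a short core stands in for each motif; the constants, the function `predicted_fragment`, and the example transcript are hypothetical. Because the EcoRI site `GAATTC` is palindromic, the same string marks the site on either strand.

```python
from typing import Optional

# Hypothetical motif and site choices for illustration only.
KOZAK_CORE = "CCACCATG"     # simplified eukaryotic Kozak core ending at the ATG
SHINE_DALGARNO = "AGGAGG"   # commonly cited core of the prokaryotic Shine-Dalgarno sequence
ECORI_SITE = "GAATTC"       # EcoRI recognition sequence (palindromic)


def predicted_fragment(cdna: str, motif: str, site: str) -> Optional[int]:
    """Predict the product primed at `motif` and ending at the first downstream `site`.

    Assumes exact matches and that the second primer anneals at the nearest
    restriction site 3' of the motif. Returns the length in bases, or None
    if no product is expected.
    """
    cdna = cdna.upper()
    start = cdna.find(motif)
    if start == -1:
        return None
    site_pos = cdna.find(site, start + len(motif))
    if site_pos == -1:
        return None
    return site_pos + len(site) - start


if __name__ == "__main__":
    transcript = "TTGCCACCATGGCTAAAGAATTCGG"  # hypothetical cDNA
    print(predicted_fragment(transcript, KOZAK_CORE, ECORI_SITE))  # 20
```

Swapping `KOZAK_CORE` for `SHINE_DALGARNO` models the prokaryotic embodiment without other changes.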

It is not intended that the present invention be limited by the means of detection. In one embodiment, said detecting comprises gel electrophoresis. The present invention can be used with particular success when comparing samples.

In one embodiment, the present invention contemplates a method of analyzing expressed genes in biological samples, comprising: a) providing: i) two samples containing mRNA, ii) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on at least a portion of said mRNA of said two samples, iii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said mRNA of said two samples, and iv) a polymerase and PCR reagents; b) treating said mRNA of each of said two samples under conditions so as to produce amplifiable DNA from each sample; c) amplifying said DNA from each sample with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated from each of said two samples; and d) detecting said amplified product.

The comparison can be made between cells of similar type. For example, in one embodiment, each of said two samples comprises eukaryotic cells and said natural common sequence is the Kozak sequence. On the other hand, dissimilar samples can be usefully compared. For example, in one embodiment, said two samples comprise prokaryotic cells, said natural common sequence is the Shine-Dalgarno sequence, and said two samples comprise bacterial cells of different species. It is not intended that the present invention be limited by the number of samples compared. The present invention contemplates a method of analyzing expressed genes in multiple samples, comprising: a) providing: i) at least two samples containing mRNA, ii) random primers, iii) reverse transcriptase, iv) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on a portion of said mRNA of said samples, v) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said mRNA of said samples, and vi) a polymerase and PCR reagents; b) extracting mRNA from each of said samples and reverse transcribing said mRNA with said reverse transcriptase and said random primers under conditions such that cDNA is produced; c) amplifying said cDNA from each sample with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated from each of said samples; and d) detecting said amplified product.
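Once bands have been sized, comparing samples reduces to asking which bands are missing from at least one lane. The sketch below assumes band sizes (in base pairs) have already been read off the gel and that equal sizes represent the same product, which real gels only approximate; the function `differential_bands`, the sample names and the sizes are hypothetical. It handles two samples or any larger number in the same way.

```python
from typing import Dict, List, Set


def differential_bands(lanes: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Map each band size (bp) that is NOT shared by every lane to the lanes showing it.

    Assumes band sizes have already been called from the gel and that equal
    sizes correspond to the same product; real gels need a sizing tolerance.
    """
    all_lanes = set(lanes)
    presence: Dict[int, Set[str]] = {}
    for lane, sizes in lanes.items():
        for size in sizes:
            presence.setdefault(size, set()).add(lane)
    # Bands present in every lane are not differential, so drop them.
    return {size: found for size, found in sorted(presence.items()) if found != all_lanes}


if __name__ == "__main__":
    lanes = {  # hypothetical band sizes for three samples
        "normal": [120, 250, 400],
        "tumor": [120, 310, 400],
        "metastasis": [120, 310, 400, 520],
    }
    for size, found in differential_bands(lanes).items():
        print(size, sorted(found))
```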

Clinical samples are specifically contemplated within the scope of the present invention. For example, where said samples comprise eukaryotic cells and said natural common sequence is the Kozak sequence, said samples can comprise human cancer cells. The present invention contemplates the primers of the present invention as unique compositions. The present invention also contemplates kits containing these novel compositions. In one embodiment, the kit comprises: i) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence, and ii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence. In one embodiment, said natural common sequence is the Kozak sequence. In another embodiment, said natural common sequence is the Shine-Dalgarno sequence. In one embodiment, said restriction enzyme recognition sequence is selected from the group consisting of the sequences set forth in Table 1.

A variety of primers are contemplated. In one embodiment, the present invention contemplates that said first primer is of the general formula:

5'-X-N₀₋₁₀-X-N₀₋₁₀-ATG-N₀₋₁₀-3', wherein N is A, T, G, C or nothing, and wherein X is the recognition sequence for a restriction enzyme or nothing.

The present invention also contemplates that said second primer is of the general formula: 5'-X-N₀₋₁₀-X-N₀₋₁₀-TAAGGAGG-N₀₋₁₀-3', where X is a recognition sequence or nothing, and where N is A, T, G, C or nothing. Again, the recognition sequences can be selected from a variety of sources, including but not limited to those in Table 1.

DEFINITIONS

To facilitate understanding of the invention, a number of terms are defined below.

"Nucleic acid sequence" and "nucleotide sequence" as used herein refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand. The term "recombinant DNA molecule" as used herein refers to a DNA molecule which is comprised of segments of DNA joined together by means of molecular biological techniques.

The term "recombinant protein" or "recombinant polypeptide" as used herein refers to a protein molecule which is expressed using a recombinant DNA molecule. As used herein, the terms "vector" and "vehicle" are used interchangeably in reference to nucleic acid molecules that transfer DNA segment(s) from one cell to another.

The term "expression vector" or "expression cassette" as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes usually include a promoter, an operator (optional), and a ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.

The terms "in operable combination", "in operable order" and "operably linked" as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

The term "transfection" as used herein refers to the introduction of foreign DNA into cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, biolistics (i.e., particle bombardment) and the like.

As used herein, the terms "complementary" or "complementarity" are used in reference to "polynucleotides" and "oligonucleotides" (which are interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "C-A-G-T" is complementary to the sequence "G-T-C-A." Complementarity can be "partial" or "total." "Partial" complementarity is where one or more nucleic acid bases are not matched according to the base pairing rules. "Total" or "complete" complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids.
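The base-pairing rules above can be checked mechanically. The sketch below compares two strands position by position, the same way the "C-A-G-T" / "G-T-C-A" example is written, and reports the fraction of positions that pair. Real duplexes are antiparallel, so in practice one strand would be reversed before this comparison; the function names and the extra "none" label for strands with no pairing positions at all are assumptions for illustration.

```python
# Watson-Crick pairing rules for DNA.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fraction_complementary(strand: str, other: str) -> float:
    """Fraction of aligned positions that obey the base-pairing rules.

    Hyphens (as in "C-A-G-T") are ignored; strands are compared position
    by position without reversing either one.
    """
    a = strand.upper().replace("-", "")
    b = other.upper().replace("-", "")
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    matched = sum(PAIRS[x] == y for x, y in zip(a, b))
    return matched / len(a)


def complementarity(strand: str, other: str) -> str:
    """'total' if every base pairs, 'none' if no base pairs, otherwise 'partial'."""
    fraction = fraction_complementary(strand, other)
    if fraction == 1.0:
        return "total"
    if fraction == 0.0:
        return "none"
    return "partial"


if __name__ == "__main__":
    print(complementarity("C-A-G-T", "G-T-C-A"))      # the example from the text
    print(fraction_complementary("CAGT", "GTGA"))     # one mismatched position
```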

The terms "homology" and "homologous" as used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., "substantially homologous," to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target sequence which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

Low stringency conditions comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising 5X SSPE, 0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed.
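The mass concentrations given for 5X SSPE convert to molar concentrations with a single division per component. The sketch below assumes standard molar masses and, because the text does not name the EDTA salt, assumes disodium EDTA dihydrate; the dictionary names are hypothetical. On those assumptions the buffer works out to roughly 0.75 M NaCl, 50 mM sodium phosphate and 5 mM EDTA.

```python
# Molar masses in g/mol. The text does not say which EDTA salt is meant;
# disodium EDTA dihydrate (372.24 g/mol) is assumed here.
MOLAR_MASS = {"NaCl": 58.44, "NaH2PO4.H2O": 137.99, "Na2EDTA.2H2O": 372.24}

# Component concentrations of 5X SSPE as given in the text (g/l).
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "Na2EDTA.2H2O": 1.85}


def millimolar(grams_per_litre: float, molar_mass: float) -> float:
    """Convert a mass concentration (g/l) to a molar concentration (mM)."""
    return grams_per_litre / molar_mass * 1000.0


if __name__ == "__main__":
    for salt, grams in SSPE_5X_G_PER_L.items():
        print(f"{salt}: {millimolar(grams, MOLAR_MASS[salt]):.1f} mM")
```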

The art knows well that numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol), as well as components of the hybridization solution, may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions which promote hybridization under conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term "substantially homologous" refers to any probe which can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described above.

When used in reference to a single-stranded nucleic acid sequence, the term "substantially homologous" refers to any probe which can hybridize to (i.e., is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

As used herein, the term "hybridization" is used in reference to the pairing of complementary nucleic acids using any process by which a strand of nucleic acid joins with a complementary strand through base pairing to form a hybridization complex. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are affected by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_m of the formed hybrid, and the G:C ratio within the nucleic acids.

As used herein the term "hybridization complex" refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary G and C bases and between complementary A and T bases; these hydrogen bonds may be further stabilized by base stacking interactions. The two complementary nucleic acid sequences hydrogen bond in an antiparallel configuration. A hybridization complex may be formed in solution (e.g., C₀t or R₀t analysis) or between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized to a solid support [e.g., a nylon membrane or a nitrocellulose filter as employed in Southern and Northern blotting, dot blotting or a glass slide as employed in in situ hybridization, including FISH (fluorescent in situ hybridization)].

As used herein, the term "T_m" is used in reference to the "melting temperature." The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the T_m of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_m value may be calculated by the equation: T_m = 81.5 + 0.41(% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl [see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations which take structural as well as sequence characteristics into account for the calculation of T_m.

As used herein the term "stringency" is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. "Stringency" typically occurs in a range from about T_m-5°C (5°C below the T_m of the probe) to about 20°C to 25°C below T_m. As will be understood by those of skill in the art, a stringent hybridization can be used to identify or detect identical polynucleotide sequences or to identify or detect similar or related polynucleotide sequences.
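The simple T_m estimate and the stringency range defined above can be computed directly. The sketch below implements T_m = 81.5 + 0.41(% G + C) exactly as stated, which assumes 1 M NaCl and ignores length and structure, and takes the lower bound of the range as 25 °C below T_m; the function names and the 500-nucleotide probe (a repeated hypothetical 10-mer with 40% G + C) are assumptions. For that composition the estimate is about 97.9 °C, giving a window of roughly 72.9 °C to 92.9 °C.

```python
from typing import Tuple


def tm_simple(seq: str) -> float:
    """Simple T_m estimate (degC) in 1 M NaCl: 81.5 + 0.41 * (%G+C)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 0.41 * gc_percent


def stringency_window(tm: float) -> Tuple[float, float]:
    """Range from about 25 degC below T_m up to about 5 degC below T_m."""
    return tm - 25.0, tm - 5.0


if __name__ == "__main__":
    probe = "ATGCATATGC" * 50   # hypothetical 500-nt probe with 40% G+C
    tm = tm_simple(probe)
    low, high = stringency_window(tm)
    print(f"T_m ~ {tm:.1f} degC; stringency window ~ {low:.1f}-{high:.1f} degC")
```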

As used herein, the term "amplifiable nucleic acid" is used in reference to nucleic acids which may be amplified by any amplification method. It is contemplated that "amplifiable nucleic acid" will usually comprise "sample template."

As used herein, the term "sample template" refers to nucleic acid originating from a sample which is analyzed for the presence of a target sequence of interest. In contrast, "background template" is used in reference to nucleic acid other than sample template which may or may not be present in a sample. Background template is most often inadvertent. It may be the result of carryover, or it may be due to the presence of nucleic acid contaminants sought to be purified away from the sample. For example, nucleic acids from organisms other than those to be detected may be present as background in a test sample.

"Amplification" is defined as the production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction technologies well known in the art [Dieffenbach CW and GS Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor Press, Plainview NY].

As used herein, the term "polymerase chain reaction" ("PCR") refers to the method of K.B. Mullis, U.S. Patent Nos. 4,683,195 and 4,683,202, hereby incorporated by reference, which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. The length of the amplified segment of the desired target sequence is determined by the relative positions of two oligonucleotide primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified".
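Because product length is fixed by where the two primers anneal, it can be predicted from a known template. The sketch below is an exact-match in silico PCR under stated assumptions: the template is given as the top strand 5' to 3', the forward primer matches that strand, and the reverse primer's binding site therefore appears on it as the reverse complement. The functions and the template are hypothetical; mismatch tolerance and secondary structure are ignored.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the partner strand read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length of the PCR product defined by two primers on a template.

    `template` is the top strand written 5'->3'. The forward primer matches
    the top strand; the reverse primer anneals to the top strand, so its
    binding site appears there as its reverse complement. Exact matches only.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse.upper())
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    template = "GG" + "ATGGCA" + "A" * 10 + "CTGACT" + "GCC"  # hypothetical
    print(product_length(template, "ATGGCA", "AGTCAG"))  # 22
```

Here the product spans the 6-base forward primer, the 10 intervening bases and the 6-base reverse-primer site, 22 bases in total.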

With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of ³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.

Amplification in PCR requires "PCR reagents" or "PCR materials", which herein are defined as all reagents necessary to carry out amplification except the polymerase, primers and template. PCR reagents normally include nucleic acid precursors (dCTP, dTTP, etc.) and buffer.

As used herein, the term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products.

Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

As used herein, the term "probe" refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification, which is capable of hybridizing to another oligonucleotide of interest. A probe may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular gene sequences. It is contemplated that any probe used in the present invention will be labelled with any "reporter molecule," so that it is detectable using any detection system, including, but not limited to enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended that the present invention be limited to any particular detection system or label.

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.
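A restriction digest can be modelled as cutting a sequence wherever the recognition site occurs. The sketch below handles the top strand only and uses EcoRI, which cuts between the G and the first A of G^AATTC, as the example; the function `digest` and the input sequence are hypothetical, and the 5' overhangs produced on the double-stranded molecule are not represented.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut the top strand of `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment, e.g. 1 for EcoRI, which cuts G^AATTC. Only the top strand is
    modelled, so sticky-end overhangs are not represented.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Slice the sequence between consecutive cut positions.
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    print(digest("GAATTCAAAGAATTC", "GAATTC", 1))  # hypothetical input with two EcoRI sites
```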

DNA molecules are said to have "5' ends" and "3' ends" because mononucleotides are reacted to make oligonucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring. An end of an oligonucleotide is referred to as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of another mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5' and 3' ends. In either a linear or circular DNA molecule, discrete elements are referred to as being "upstream" or 5' of the "downstream" or 3' elements. This terminology reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA strand. The promoter and enhancer elements which direct transcription of a linked gene are generally located 5' or upstream of the coding region. However, enhancer elements can exert their effect even when located 3' of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3' or downstream of the coding region.

As used herein, the term "an oligonucleotide having a nucleotide sequence encoding a gene" means a nucleic acid sequence comprising the coding region of a gene, i.e. the nucleic acid sequence which encodes a gene product. The coding region may be present in either a cDNA, genomic DNA or RNA form. When present in a DNA form, the oligonucleotide may be single-stranded (i.e., the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

As used herein, the term "regulatory element" refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, etc.

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription [Maniatis, T. et al., Science 236:1237 (1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in plant, yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest.

The presence of "splicing signals" on an expression vector often results in higher levels of expression of the recombinant transcript. Splicing signals mediate the removal of introns from the primary RNA transcript and consist of a splice donor and acceptor site [Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York (1989) pp. 16.7-16.8]. A commonly used splice donor and acceptor site is the splice junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells requires expression of signals directing the efficient termination and polyadenylation of the resulting transcript. Transcription termination signals are generally found downstream of the polyadenylation signal and are a few hundred nucleotides in length. The term "poly A site" or "poly A sequence" as used herein denotes a DNA sequence which directs both the termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript is desirable as transcripts lacking a poly A tail are unstable and are rapidly degraded. The poly A signal utilized in an expression vector may be "heterologous" or "endogenous." An endogenous poly A signal is one that is found naturally at the 3' end of the coding region of a given gene in the genome. A heterologous poly A signal is one which is isolated from one gene and placed 3' of another gene.

The term "transfection" or "transfected" refers to the introduction of foreign DNA into a cell.

As used herein, the terms "nucleic acid molecule encoding," "DNA sequence encoding," and "DNA encoding" refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.

As used herein, the term "antisense" is used in reference to RNA sequences which are complementary to a specific RNA sequence (e.g., mRNA). Antisense RNA may be produced by any method, including synthesis by splicing the gene(s) of interest in a reverse orientation to a viral promoter which permits the synthesis of a coding strand. Once introduced into a cell, this transcribed strand combines with natural mRNA produced by the cell to form duplexes. These duplexes then block either the further transcription of the mRNA or its translation. In this manner, mutant phenotypes may be generated. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. The designation (-) (i.e., "negative") is sometimes used in reference to the antisense strand, with the designation (+) sometimes used in reference to the sense (i.e., "positive") strand.
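Because the antisense strand is complementary and antiparallel to the sense strand, its sequence can be derived computationally as the reverse complement of the sense sequence. The following is a minimal Python sketch for illustration only; the helper names `reverse_complement` and `to_rna` are hypothetical and are not part of the described method.

```python
# Minimal sketch (hypothetical helpers): derive the antisense (-) strand
# from a sense (+) DNA strand. Both strands are written 5'->3', so the
# antisense strand is the reverse complement of the sense strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def to_rna(seq: str) -> str:
    """Rewrite a DNA sequence in RNA notation (T -> U)."""
    return seq.upper().replace("T", "U")


if __name__ == "__main__":
    sense = "GCCACCATGG"                 # Kozak consensus, sense strand
    antisense = reverse_complement(sense)
    print(antisense)                     # CCATGGTGGC
    print(to_rna(antisense))             # CCAUGGUGGC (antisense RNA)
```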

The term "Southern blot" refers to the analysis of DNA on agarose or acrylamide gels to fractionate the DNA according to size, followed by transfer and immobilization of the DNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized DNA is then probed with a labeled oligo-deoxyribonucleotide probe or DNA probe to detect DNA species complementary to the probe used. The DNA may be cleaved with restriction enzymes prior to electrophoresis. Following electrophoresis, the DNA may be partially depurinated and denatured prior to or during transfer to the solid support. Southern blots are a standard tool of molecular biologists [J. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, NY, pp. 9.31-9.58].

The term "Northern blot" as used herein refers to the analysis of RNA by electrophoresis of RNA on agarose gels to fractionate the RNA according to size followed by transfer of the RNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized RNA is then probed with a labeled oligo-deoxyribonucleotide probe or DNA probe to detect RNA species complementary to the probe used. Northern blots are a standard tool of molecular biologists [J. Sambrook et al. (1989) supra, pp. 7.39-7.52].

The term "reverse Northern blot" as used herein refers to the analysis of DNA by electrophoresis of DNA on agarose gels to fractionate the DNA on the basis of size followed by transfer of the fractionated DNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized DNA is then probed with a labeled oligo-ribonucleotide probe or RNA probe to detect DNA species complementary to the riboprobe used.

The term "isolated" when used in relation to a nucleic acid, as in "an isolated oligonucleotide" refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated nucleic acid is nucleic acid present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA which are found in the state they exist in nature.

As used herein, the term "purified" or "to purify" refers to the removal of undesired components from a sample. As used herein, the term "substantially purified" refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated. An "isolated polynucleotide" is therefore a substantially purified polynucleotide.

As used herein, the term "coding region" when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of an mRNA molecule. The coding region is bounded, in eukaryotes, on the 5' side by the nucleotide triplet "ATG" which encodes the initiator methionine and on the 3' side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA).
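The boundaries of a coding region, as defined above, can be located in a sense-strand sequence by scanning from the ATG in steps of three until an in-frame stop codon is met. The minimal sketch below assumes, as a simplification rather than as part of the described method, that the first ATG in the sequence is the initiator codon; `coding_region` is a hypothetical helper.

```python
# Minimal sketch (hypothetical helper): locate the coding region of a
# sense-strand transcript, bounded by the initiating ATG and the first
# in-frame stop codon. Simplification: the first ATG is taken as the
# initiator codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_region(seq: str):
    """Return (start, end), end exclusive and including the stop codon,
    or None if there is no ATG or no in-frame stop codon."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step through codons in the reading frame set by the ATG.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


if __name__ == "__main__":
    five_utr, cds, three_utr = "GGCGCCACC", "ATGGCTTCAAAGTAA", "GCGC"
    transcript = five_utr + cds + three_utr
    start, end = coding_region(transcript)
    print(start, end)                    # 9 24
    print(transcript[:start])            # 5' non-translated: GGCGCCACC
    print(transcript[start:end])         # coding region: ATGGCTTCAAAGTAA
    print(transcript[end:])              # 3' non-translated: GCGC
```

The slices before and after the returned coordinates correspond to the 5' and 3' non-translated sequences defined in the next paragraph.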

As used herein, the term "structural gene" refers to a DNA sequence coding for RNA or a protein. In contrast, "regulatory genes" are structural genes which encode products which control the expression of other genes (e.g., transcription factors). As used herein, the term "gene" means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5' of the coding region and which are present on the mRNA are referred to as 5' non-translated sequences. The sequences which are located 3' or downstream of the coding region and which are present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5' and 3' end of the sequences which are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3' flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.

The term "sample" as used herein is used in its broadest sense and includes environmental and biological samples. Environmental samples include material from the environment such as soil and water. Biological samples may be animal, including human, fluid (e.g., blood, plasma and serum), solid (e.g., stool), tissue, liquid foods (e.g., milk), and solid foods (e.g., vegetables).

The terms "bacteria" and "bacterium" refer to all prokaryotic organisms, including those within all of the phyla in the Kingdom Procaryotae. It is intended that the term encompass all microorganisms considered to be bacteria including Mycoplasma, Chlamydia, Actinomyces, Streptomyces, and Rickettsia. All forms of bacteria are included within this definition including cocci, bacilli, spirochetes, spheroplasts, protoplasts, etc. Also included within this term are prokaryotic organisms which are gram negative or gram positive. "Gram negative" and "gram positive" refer to staining patterns with the Gram-staining process which is well known in the art [Finegold and Martin, Diagnostic Microbiology, 6th Ed. (1982), CV Mosby St. Louis, pp 13-15]. "Gram positive bacteria" are bacteria which retain the primary dye used in the Gram stain, causing the stained cells to appear dark blue to purple under the microscope. "Gram negative bacteria" do not retain the primary dye used in the Gram stain, but are stained by the counterstain. Thus, gram negative bacteria appear red.

DESCRIPTION OF THE DRAWINGS

Figure 1 schematically shows one embodiment of the primers of the present invention (a "K primer") partially hybridized to one strand of a denatured double-stranded template.

Figure 2 schematically shows one embodiment of the primers of the present invention (an "RE primer") partially hybridized to the other strand of denatured double-stranded target DNA.

Figure 3 is an autoradiograph of PAGE showing differential expression in a variety of human cell types.

Figure 4 is an autoradiograph of PAGE showing differential expression in a variety of species of bacteria.

Figure 5 is an autoradiograph of PAGE showing differential expression in a variety of human cell types where differentially expressed bands have been obtained and cloned.

Figure 6 shows the nucleic acid sequence of one of the cloned transcripts encoding a human mitochondrial hinge protein.

Figure 7 shows the sequence of one of the cloned transcripts corresponding to a coactivator gene.

Figure 8 is an autoradiograph of PAGE showing differential expression in normal and malignant tissue.

DESCRIPTION OF THE INVENTION

The present invention relates to the identification of expressed genes, and in particular, methods and compositions for distinguishing between the expression of genes in two or more biological samples. The description of the invention is divided into I) Design of Primers; II) Preparation of RNA from Samples; and III) Comparison of Biological Samples.

I. Design of Primers

To identify differentially expressed genes, one must ideally be able to identify nearly all (or at least a significant majority) of the genes expressed in a cell type; only then can a meaningful comparison be made with a related cell or tissue sample. For this purpose, the present invention contemplates the use of specific primers able to anneal with sequences which are conserved in expressed genes.

In one embodiment, the present invention contemplates primers directed at the Kozak sequence, a string of non-random nucleotides which is present before the first, translation-initiating ATG in the majority of mRNAs transcribed and translated in eukaryotic cells. See M. Kozak, Cell 44:283-292 (1986). Thus, an oligonucleotide primer specific for the Kozak sequence (consensus sequence 5'-GCC(A/G)CCATGG-3') with degenerate bases at its 5' and 3' ends will provide sufficient specificity to be used in a PCR amplification reaction as an upstream primer. Additionally, the presence of degenerate bases at the 3' end of these primers (Kozak or K primers) would reduce the complexity of the copied transcript pool to which a given primer may hybridize, thus allowing the primers to access and anneal with specific subsets of transcripts and overcoming the problem of "competition" in a PCR reaction.

Based on the knowledge of the distribution of specific DNA sequences within the genome which are recognized by restriction endonucleases ("RE"), a second primer (an "RE primer") can be designed. Again, the presence of degenerate bases at the 5' and 3' ends of these primers would provide sufficient length to give specificity in a PCR amplification reaction. Since the ability of a primer pair to amplify a transcript is a function of transcript abundance and the specificity of primer-template interactions, the use of K and RE primers is likely to significantly improve the detection rate of rare mRNAs, an outcome not possible with standard or modified differential display methods because of their use of random primers.

A. Specific Design Considerations

1. Kozak Primers (Upstream Primers)

M. Kozak performed an analysis of nearly 700 vertebrate mRNAs. See M. Kozak, "An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs," Nucleic Acids Research 15:8125 (1987). The results provide a general approximation of the frequency (%) of A, C, G and T around the translational start site in vertebrate mRNAs (positions are numbered relative to the A of the initiating ATG, which is +1):

Position   -6   -5   -4   -3   -2   -1   +4
%A         17   18   25   61   27   15   23
%C         19   39   53    2   49   55   16
%G         44   23   15   36   13   21   46
%T         20   20    7    1   11    9   15
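One way to use the frequency table above is as a position weight matrix: taking the most frequent base at each position recovers the consensus GCCACCATGG, and a sum of log-odds terms scores how closely a given start-site context matches. The sketch below is illustrative only; the uniform 25% background and the log-odds scoring scheme are assumptions and are not part of the described method, and `consensus` and `kozak_score` are hypothetical helpers.

```python
import math

# Percent frequency of each base around the initiating ATG, transcribed
# from the table above (A of ATG = +1; ATG itself occupies +1..+3).
POSITIONS = [-6, -5, -4, -3, -2, -1, +4]
FREQ = {
    "A": [17, 18, 25, 61, 27, 15, 23],
    "C": [19, 39, 53, 2, 49, 55, 16],
    "G": [44, 23, 15, 36, 13, 21, 46],
    "T": [20, 20, 7, 1, 11, 9, 15],
}


def consensus() -> str:
    """Most frequent base at each tabulated position, with ATG inserted."""
    top = [max("ACGT", key=lambda b: FREQ[b][i]) for i in range(len(POSITIONS))]
    return "".join(top[:6]) + "ATG" + top[6]


def kozak_score(context: str) -> float:
    """Log2-odds score of a 10-nt context (-6..+4) against an assumed
    uniform 25% background; higher means closer to the consensus."""
    context = context.upper()
    if len(context) != 10 or context[6:9] != "ATG":
        raise ValueError("expected 6 nt, then ATG, then the +4 base")
    bases = context[:6] + context[9]
    return sum(math.log2(FREQ[b][i] / 25.0) for i, b in enumerate(bases))


if __name__ == "__main__":
    print(consensus())                           # GCCACCATGG
    print(round(kozak_score("GCCACCATGG"), 2))
    print(round(kozak_score("TTTTTTATGT"), 2))
```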

A search of GenBank and a random selection of 100 mRNA sequences (largely human) revealed that the base at the +5 position (with reference to the translation-initiating ATG triplet of the Kozak sequence, A being +1) is also highly conserved: >38% of the mRNAs surveyed had a C at this position and approximately 25% had an A.

The present invention therefore contemplates primers which can specifically hybridize with the nucleotide sequences present around the initiating codon. Collectively, these primers would hybridize with all of the expressed mRNAs, although the hybridization of individual primers within an expressed gene pool may vary. This would help in reducing the complexity of the target transcripts by effectively dividing the transcript pool into subsets based on the nucleotides present at defined positions relative to the ATG in the mRNA sequence.

Specifically, with regard to the primers of the present invention, it is contemplated that degenerate bases can be used i) before the consensus Kozak sequence at the 5' end; ii) inside the Kozak sequence (e.g. at position -5); and/or iii) after the ATG at the 3' end. In one embodiment, the primers are selected from the group consisting of the primers: NNN-X-GCC(A or G)CCATGGNN; NNN-X-GCC(A or G)CCATGANN; NNN-X-GCC(A or G)CCATGCNN; and NNN-X-GCC(A or G)CCATGTNN (wherein X is either a recognition sequence or nothing, and wherein N is either A, T, G, C or nothing). This embodiment contains primers that vary at the +4 position.
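The way K primers that differ at +4 divide the transcript pool into subsets can be illustrated by grouping sense-strand sequences according to the base that follows the ATG. This is a hypothetical sketch, not the described method: it assumes the first ATG is the initiator and ignores the degenerate tails and the recognition sequence X; `plus4_base` and `partition_by_plus4` are illustrative names.

```python
# Hypothetical sketch: group sense-strand transcripts by the base at +4
# (A of the first ATG = +1), i.e. by which of the four K primers ending
# in ATGA, ATGC, ATGG or ATGT would match them. Simplifications: the first
# ATG is taken as the initiator; degenerate tails and X are ignored.


def plus4_base(transcript: str):
    """Return the base at +4 relative to the first ATG, or None."""
    seq = transcript.upper()
    i = seq.find("ATG")
    if i == -1 or i + 3 >= len(seq):
        return None
    return seq[i + 3]


def partition_by_plus4(transcripts: dict) -> dict:
    """Map each K-primer 3' end ('ATGA', 'ATGC', ...) to matching IDs."""
    groups = {"ATG" + base: [] for base in "ACGT"}
    for name, seq in transcripts.items():
        base = plus4_base(seq)
        if base is not None and "ATG" + base in groups:
            groups["ATG" + base].append(name)
    return groups


if __name__ == "__main__":
    pool = {
        "t1": "GCCACCATGGCC",
        "t2": "GCCGCCATGACC",
        "t3": "AAAATGTTT",
        "t4": "CCCCC",  # no ATG: not primed by any K primer
    }
    print(partition_by_plus4(pool))
    # {'ATGA': ['t2'], 'ATGC': [], 'ATGG': ['t1'], 'ATGT': ['t3']}
```

Each transcript falls into at most one subset, which is the sense in which a set of +4 variants collectively covers the pool while each individual primer sees a smaller, less competitive template population.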

It is not intended that the present invention be limited by the nature of the recognition sequence. By "recognition sequence" it is meant that the sequence is a known sequence that can be targeted by a) nucleic acid hybridization (e.g. poly(dT) or poly(dA)), b) an enzyme (e.g. a restriction enzyme), or c) a ligand (e.g. biotin or avidin). Preferred primers are those where X is the recognition sequence for a restriction enzyme; introducing this sequence into expressed genes facilitates subsequent manipulation (e.g. cloning). For example, preferred primers are those where X is the recognition sequence for the restriction enzyme BamHI; these primers are selected from the group consisting of NNNGGATCCGCC(A or G)CCATGGNN; NNNGGATCCGCC(A or G)CCATGANN; NNNGGATCCGCC(A or G)CCATGCNN; and NNNGGATCCGCC(A or G)CCATGTNN (wherein N is either A, T, G, C or nothing). Table 1 sets forth, for illustrative purposes, a number of restriction enzyme recognition sequences. One skilled in the art will understand that "X" (see above) can be selected from this list depending on design considerations. Other restriction enzymes from commercially available sources have recognition sequences that can also be employed with success.

Primers containing facilitating moieties such as recognition sequences allow for the introduction of such sequences into the product of the amplification reaction. That is to say, amplification in PCR involves primer extension to make the so-called "long products." These long products are the template for subsequent cycles of amplification. While it is not intended that the present invention be limited by any understanding of the mechanism whereby the primers of the present invention successfully operate, it is believed that a primer such as NNN-X-GCC(A or G)CCATGGNN will only partially hybridize to one strand of the denatured double-stranded target nucleic acid in the first round as set forth in Figure 1.

To improve hybridization of the primer for making long products, the present invention, in one embodiment, contemplates using a lower annealing temperature (discussed more below). To improve the specificity of hybridization in subsequent cycles, the present invention, in one embodiment, also contemplates isolating the long products via the recognition sequence prior to subsequent cycles. In one embodiment, the long products are isolated using an oligo (dT) resin; the long products containing the corresponding recognition sequence bind to the resin, while the background template nucleic acid does not. In this manner, the background template can be removed and subsequent rounds of hybridization are carried out on the long products [with the same primers or with the primers that lack the recognition sequence (but that are otherwise the same)].

In another embodiment, the primers are selected from the group consisting of the primers: NNN-X-GCC(A or G)CCATGG(C or A)GNN; NNN-X-GCC(A or G)CCATGG(C or A)TNN; NNN-X-GCC(A or G)CCATGG(C or A)ANN; and NNN-X-GCC(A or G)CCATGG(C or A)CNN (wherein X is either a recognition sequence or nothing, and wherein N is either A, T, G, C or nothing). This embodiment contains primers with the consensus sequence extending to the +5 position, but that vary at the +6 position.

TABLE 1 - RECOGNITION SEQUENCES

Key:

D = A or G or T; N = A or C or G or T; H = A or C or T; R = A or G; K = G or T; S = C or G; M = A or C; Y = C or T

The present invention contemplates primers where there are many degenerate bases after the ATG at the 3' end (e.g. between three and ten, more preferably between three and five) as well as where there is only one degenerate base after the ATG at the 3' end. In one embodiment, the primers are selected from the group consisting of the primers: GCC(A or G)CCATGN (wherein N is either A, T, G or C). These primers can be linked to a recognition sequence ("X") in the manner described above, if desired.
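The degenerate-base key above maps directly onto an enumeration: each code is replaced by its allowed bases, and the Cartesian product gives every concrete oligonucleotide in a degenerate primer mixture. A minimal Python sketch follows; `expand` is a hypothetical helper, and the "or nothing" option used for N elsewhere in this description is not modelled.

```python
from itertools import product

# Degenerate-base codes from the key above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG",
    "D": "AGT", "H": "ACT", "N": "ACGT",
}


def expand(degenerate: str) -> list:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # GCC(A or G)CCATGN written with key codes: R = A or G, N = any base.
    primers = expand("GCCRCCATGN")
    print(len(primers))                  # 8
    print(primers[:2])                   # ['GCCACCATGA', 'GCCACCATGC']
```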

The present invention also contemplates primers where there are a number of degenerate bases at the 5' end (i.e. prior to the Kozak sequence). In one embodiment, the primers are selected from the group consisting of the primers: N₁₋₁₀GCC(A or G)CCATGGNN; N₁₋₁₀GCC(A or G)CCATGANN; N₁₋₁₀GCC(A or G)CCATGCNN; and N₁₋₁₀GCC(A or G)CCATGTNN (wherein N is either A, T, G or C, and the subscript denotes a run of one to ten such bases).

In another embodiment, the primers are selected from the group consisting of the primers: CGGGATCCGCC(A or G)CNATGG (hereinafter "K1" when N is C); CGGGATCCGCC(A or G)CNATGA (hereinafter "K2" when N is C); CGGGATCCGCC(A or G)CNATGC (hereinafter "K3" when N is C); and CGGGATCCGCC(A or G)CNATGT (hereinafter "K4" when N is C).

In another embodiment, the primers are selected from the group consisting of the primers: CGGGATCCGCC(A or G)(C or G)NATGG (hereinafter "K-2-1" when N is C); CGGGATCCGCC(A or G)(C or G)NATGC (hereinafter "K-2-2" when N is C); CGGGATCCGCC(A or G)(C or G)NATGT (hereinafter "K-2-3" when N is C); and CGGGATCCGCC(A or G)CNATGA (hereinafter "K-2-4" when N is C).

In another embodiment, the primers are selected from the group consisting of the primers: CGGGATCCGCC(A or G)(C or G)NATGGN (hereinafter "K-3-1" when N is C); CGGGATCCGCC(A or G)(C or G)NATGCN (hereinafter "K-3-2"); CGGGATCCGCC(A or G)(C or G)NATGTN (hereinafter "K-3-3"); and CGGGATCCGCC(A or G)CNATGAN (hereinafter "K-3-4"). In these embodiments, N can be A, C, G or T.
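The number of distinct oligonucleotides in each degenerate K primer follows from multiplying the number of allowed bases at each position; for K-3-1, CGGGATCCGCC(A or G)(C or G)NATGGN, this is 2 × 2 × 4 × 4 = 64. The sketch below automates that count for primers written in this document's "(A or G)" notation. It is illustrative only: `parse_alternatives` and `degeneracy` are hypothetical helpers, N is treated as A, C, G or T as stated for the K-3 embodiments, and characters other than bases and parenthesized alternatives are ignored.

```python
import re
from math import prod

# Matches either a parenthesised alternative such as "(A or G)" or a
# single base letter; any other characters (spaces, hyphens) are skipped.
TOKEN = re.compile(r"\(([^)]*)\)|([ACGTN])")


def parse_alternatives(primer: str) -> list:
    """Return one set of allowed bases per position of a primer written in
    this document's notation, e.g. 'CGGGATCCGCC(A or G)(C or G)NATGGN'.
    N is treated as A, C, G or T ('or nothing' is not modelled)."""
    positions = []
    for alternative, base in TOKEN.findall(primer.upper()):
        if alternative:
            positions.append(set(re.findall(r"[ACGT]", alternative)))
        elif base == "N":
            positions.append(set("ACGT"))
        else:
            positions.append({base})
    return positions


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate mixture."""
    return prod(len(allowed) for allowed in parse_alternatives(primer))


if __name__ == "__main__":
    for name, primer in [
        ("K-3-1", "CGGGATCCGCC(A or G)(C or G)NATGGN"),
        ("K-3-4", "CGGGATCCGCC(A or G)CNATGAN"),
    ]:
        print(name, len(parse_alternatives(primer)), degeneracy(primer))
    # K-3-1 19 64
    # K-3-4 19 32
```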

It is not intended that the present invention be limited to the entire Kozak sequence. It is specifically contemplated that the primer of the present invention can be only partially complementary to this natural common non-coding sequence. For example, in one embodiment, the present invention contemplates linking the ATG triplet to degenerate bases on either side (or both sides). A recognition sequence ("X") can be linked to such a primer on the 5' end. In such an embodiment, the primers are of the general formula: 5'-N₁₋₁₀-X-N₁₋₁₀-ATG-N₁₋₁₀-3' (wherein N is A, T, G, C or nothing). In a preferred embodiment, X is the recognition sequence for a restriction enzyme; again, introducing this sequence into expressed genes facilitates subsequent manipulation (e.g. cloning). For example, preferred primers are those where X is the recognition sequence for the restriction enzyme BamHI; these primers are selected from the group consisting of NGGATCCNNNATGA; NGGATCCNNNATGC; NGGATCCNNNATGT; and NGGATCCNNNATGG (wherein N is either A, T, G, C or nothing).

While the above discussion has focused on primer extension or PCR of DNA using K primers, the present invention also contemplates hybridization of the K primers to the corresponding mRNA Kozak sequence: 5'-ACCAUGG. In addition, primers can be made having the ACCAUGG sequence that can be used to hybridize to DNA.

2. Primers Complementary To Restriction Enzyme Recognition Sequences (Downstream Primers)

Since sequencing gels resolve DNA fragments longer than 600 bases poorly, the cDNAs were searched for recognition sequences of 4- and 6-base-cutting restriction enzymes within 600 bp of the putative Kozak sequence. It was found that the sequence GATC, which is recognized by the MboI enzyme and its isoschizomer Sau3AI, was present in the target region at least once in 72% of the cDNAs. The remaining cDNAs had sites for other common restriction enzymes: HpaII (10%), HinP1I (6%) and MaeII (5%). Sites for the MseI (TTAA) and MaeI (CTAG) restriction endonucleases were present in only 1% and 2%, respectively, of the cDNAs surveyed. Thus, by using oligonucleotide primers having 3' sequences complementary to the recognition sequences for 4-6 common restriction enzymes in combination with Kozak primers, one could amplify the entire repertoire of the expressed genes.
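The survey described above can be sketched as a scan of each cDNA for enzyme recognition sequences within 600 nt downstream of the initiating ATG. In this hypothetical sketch, the first ATG stands in for the Kozak-context start site and the example sequences are toy data, not the cDNAs actually surveyed; because all six recognition sequences are palindromic, scanning the sense strand also covers the complementary strand. `first_sites` and `fraction_with_site` are illustrative names.

```python
# Recognition sequences of the enzymes named in the survey above. All six
# are palindromic, so scanning the sense strand also covers the other strand.
SITES = {
    "MboI/Sau3AI": "GATC",
    "HpaII": "CCGG",
    "HinP1I": "GCGC",
    "MaeII": "ACGT",
    "MseI": "TTAA",
    "MaeI": "CTAG",
}


def first_sites(cdna: str, window: int = 600) -> dict:
    """For each enzyme, the offset (nt) from the A of the first ATG to the
    nearest downstream site lying wholly within `window`, or None."""
    seq = cdna.upper()
    atg = seq.find("ATG")
    if atg == -1:
        return {enzyme: None for enzyme in SITES}
    region = seq[atg:atg + window]
    offsets = {}
    for enzyme, site in SITES.items():
        pos = region.find(site)
        offsets[enzyme] = pos if pos != -1 else None
    return offsets


def fraction_with_site(cdnas, enzyme: str, window: int = 600) -> float:
    """Fraction of cDNAs with at least one site for `enzyme` in the window."""
    hits = sum(first_sites(c, window)[enzyme] is not None for c in cdnas)
    return hits / len(cdnas)


if __name__ == "__main__":
    toy_cdnas = [
        "ATG" + "A" * 50 + "GATC" + "A" * 10,   # GATC 53 nt downstream
        "ATG" + "C" * 700 + "GATC",             # GATC beyond the window
        "ATGCCGGAAA",                           # HpaII site only
    ]
    print(first_sites(toy_cdnas[0])["MboI/Sau3AI"])      # 53
    print(fraction_with_site(toy_cdnas, "MboI/Sau3AI"))  # 1 of 3 cDNAs
```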

Therefore, the present invention contemplates downstream primers designed with recognition sequences for common restriction enzymes (hereafter "RE" primers). In one embodiment, the RE primers are designed with degenerate bases on either side (or both sides) of the recognition sequence. In a preferred embodiment, the RE primer is designed with 3 degenerate bases at the 5' end and 2 degenerate bases at the 3' end (5'-N₃-specific recognition sequence-N₂-3').

In one embodiment, the downstream primers of the present invention are primers selected from the group consisting of the primers: 5'-X-NNNGATC-3' (i.e. having the recognition sequence for MboI); 5'-X-NNNCTAG-3' (i.e. having the recognition sequence for BfaI); 5'-X-NNNCCGC-3' (i.e. having the recognition sequence for AciI); 5'-X-NNNCCGG-3' (i.e. having the recognition sequence for HpaII); and 5'-X-NNNAATT-3' (i.e. having the recognition sequence for Tsp509I), wherein X is a recognition sequence on the 5' end that is different from the recognition sequence on the 3' end, or X is nothing.

It is not intended that the present invention be limited by the recognition sequence on the 5' end; again, resort can be made to a variety of recognition sequences, including but not limited to those sequences found in Table 1. In one embodiment, the recognition sequence on the 5' end of the downstream primers of the present invention is for EcoRI. Such primers are selected from the group consisting of the primers: GAATTCNNNGATC; GAATTCNNNCTAG; GAATTCNNNCCGC; GAATTCNNNCCGG; GAATTCNNNAATT; GAATTCNNNTTAA; and GAATTCNNNGCGC.

In one embodiment, the recognition sequence on the 5' end of the downstream primers of the present invention is for BamHI. Such primers are selected from the group consisting of the primers: GGATTCCNNNGATC (hereinafter "MboI primer"); GGATTCCNNNCTAG (hereinafter "BfaI primer"); GGATTCCNNNCCGC (hereinafter "AciI primer"); GGATTCCNNNCCGG (hereinafter "HpaII primer"); and GGATTCCNNNAATT (hereinafter "Tsp509I primer"). Facilitating moieties such as the 5' recognition sequences of the RE primers of the present invention allow for the introduction of such sequences into the product of the amplification reaction. As noted above, amplification in PCR involves primer extension to make the so-called "long products." These long products are the template for subsequent cycles of amplification. While it is not intended that the present invention be limited by any understanding of the mechanism whereby the primers of the present invention successfully operate, it is believed that a primer such as X-NNNGATC will only partially hybridize to one strand of the denatured double-stranded target nucleic acid in the first round as set forth in Figure 2.

It is not intended that the primers of the present invention be limited by the precise sequence of a restriction recognition sequence. Indeed, it is specifically contemplated that the primers of the present invention can be only partially complementary to the recognition sequence.

B. Shine-Dalgarno

The prokaryotic mRNA ribosome binding site (RBS) usually contains part or all of a polypurine domain, UAAGGAGGU, known as the Shine-Dalgarno (SD) sequence, found just 5' to the translation initiation codon:

mRNA 5'-UAAGGAGGU-N₅₋₁₀-AUG
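A simple way to locate the SD-spacer-AUG arrangement in a transcript is a regular expression for the core AGGAGG followed by a 5-10 nt spacer and AUG. The sketch below is illustrative only: it searches for the AGGAGG core (a portion of the SD sequence, as contemplated later in this section) rather than the full UAAGGAGGU domain, and `find_sd_starts` is a hypothetical helper.

```python
import re

# Core Shine-Dalgarno hexamer, a 5-10 nt spacer, then the AUG start codon.
# The lazy quantifier picks the shortest spacer that reaches an AUG.
SD_PATTERN = re.compile(r"AGGAGG[ACGU]{5,10}?AUG")


def find_sd_starts(mrna: str) -> list:
    """Return (sd_start, aug_start) pairs; DNA input (T) is converted to U."""
    seq = mrna.upper().replace("T", "U")
    return [(m.start(), m.end() - 3) for m in SD_PATTERN.finditer(seq)]


if __name__ == "__main__":
    print(find_sd_starts("GCUAAGGAGGUAACAUAUGGCU"))  # [(4, 16)]
    print(find_sd_starts("GCTAAGGAGGTAACATATGGCT"))  # [(4, 16)] (DNA input)
    print(find_sd_starts("AGGAGGAUG"))               # [] (spacer too short)
```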

The present invention therefore contemplates primers containing this motif (in a manner similar to the Kozak motif discussed above). An oligonucleotide primer specific for the SD sequence (with or without degenerate bases at its 5' and 3' ends) will provide sufficient specificity to be used in a PCR amplification reaction as an upstream primer. Additionally, Taq DNA polymerase adds a single A to the 3' ends of such PCR products, and this overhang can be used for cloning with commercially available ligation kits (e.g. from Promega).

Based on the knowledge of the distribution of specific DNA sequences within the genome which are recognized by restriction endonucleases ("RE"), a second primer (an "RE primer") can be designed for use with the SD primer. Again, the presence of degenerate bases at the 5' and 3' ends of these primers would provide sufficient length to give specificity in a PCR amplification reaction.

In one embodiment, the SD primers of the present invention are of the general formula: 5'-N₁₋₁₀-X-N₁₋₁₀-TAAGGAGG-N₁₋₁₀-3' (where X is a recognition sequence or nothing, and where N is A, T, G, C or nothing). In a preferred embodiment, the recognition sequence (X) is a restriction enzyme recognition sequence; such sequences can be selected from Table 1 or other known lists of such sequences. On the other hand, the recognition sequence can be a region of nucleic acid that can be targeted by hybridization or by a ligand. Such recognition sequences can be used to separate the products of the first cycles of PCR (as discussed above). Where the recognition sequence is a restriction enzyme recognition sequence, a preferred sequence is that for the enzyme EcoRI. In such an embodiment, the SD primers are selected from the group of the general formula:

5'-NGAATTCNNNTAAGGAGG-3' (where N is A, T, G, C or nothing).

It is not intended that the present invention be limited to the entire SD sequence. For example, in one embodiment, the present invention contemplates linking a portion of the SD sequence (e.g. AGGAGG) to degenerate bases on either side (or both sides) to create a useful primer. It is also contemplated that the SD primers of the present invention need not hybridize completely to the target nucleic acid. In the manner set forth in Figure 1 for K primers, it is contemplated that the primer can be extended even though portions of the primer are not hybridized.

II. Preparation of RNA

The nucleic acid content of cells consists of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The DNA contains the genetic blueprint of the cell. RNA is involved as an intermediary in the production of proteins based on the DNA sequence. RNA exists in three forms within cells: structural RNA (i.e., ribosomal RNA, "rRNA"), transfer RNA ("tRNA"), which is involved in translation, and messenger RNA ("mRNA"). Since mRNA is the intermediate molecule between the genetic information encoded in the DNA and the corresponding proteins, the cell's mRNA component at any given time is representative of the physiological state of the cell. In order to study and utilize the molecular biology of the cell, it is therefore important to be able to purify mRNA, including purifying mRNA from the total nucleic acid of a sample.

The preparation of RNA is complicated by the presence of ribonucleases that degrade RNA (e.g., T. Maniatis et al., Molecular Cloning, pp. 188-190, Cold Spring Harbor Laboratory [1982]). Furthermore, the preparation of amplifiable RNA is made difficult by the presence of ribonucleoproteins in association with RNA (see R. J. Slater, In: Techniques in Molecular Biology, J.M. Walker and W. Gaastra, eds., Macmillan, NY, pp. 113-120 [1983]).

Typically, the steps involved in purification of nucleic acid from cells include 1) cell lysis; 2) inactivation of cellular nucleases; and 3) separation of the desired nucleic acid from the cellular debris and other nucleic acid. Cell lysis may be achieved through various methods, including enzymatic, detergent or chaotropic agent treatment. Inactivation of cellular nucleases may be achieved by the use of proteases and/or the use of strong denaturing agents. Finally, separation of the desired nucleic acid is typically achieved by extraction of the nucleic acid with phenol or phenol-chloroform; this method partitions the sample into an aqueous phase (which contains the nucleic acids) and an organic phase (which contains other cellular components, including proteins). Commonly used protocols require the use of salts in conjunction with phenol (P. Chomczynski and N. Sacchi, Anal. Biochem. 162:156 [1987]), or employ a centrifugation step to remove the protein (R.J. Slater, supra). While useful, phenol extraction is time consuming and creates a serious waste disposal problem.

Once the nucleic acid fraction has been isolated from the cell, the structure of the mRNA molecule may be used to assist in the purification of mRNA from DNA and other RNA molecules. Because the mRNA of higher organisms is usually polyadenylated on its 3' end ("poly-A tail" or "poly-A tract"), one means of isolating mRNA from cells has been based on binding the poly-A tail with its complementary sequence (i.e., oligo-dT) linked to a support such as cellulose. Commonly, the hybridized mRNA/oligo-dT is separated from the other components present in the sample through centrifugation or, in the case of magnetic formats, exposure to a magnetic field. Once the hybridized mRNA/oligo-dT is separated from the other sample components, the mRNA is usually eluted from the oligo-dT. However, for some applications, the mRNA may remain bound to the oligo-dT that is linked to a solid support.

A wide variety of solid supports with linked oligo-dT have been developed and are commercially available. Cellulose remains the most common support for oligo-dT systems, although formats with oligo-dT covalently linked to latex beads and paramagnetic particles have also been developed and are commercially available. The paramagnetic particles may be used in a biotin-avidin system, in which biotinylated oligo-dT is annealed in solution to mRNA. The hybrids are then captured with streptavidin-coated paramagnetic particles and separated using a magnetic field. In addition to these methods, variations exist, such as affinity purification of polyadenylated RNA from eukaryotic total RNA in a spun-column format. These approaches allow for hybridization of poly-A mRNA, but vary in efficiency and sensitivity.

It is not intended that the present invention be limited by the source of RNA; a variety of sources is contemplated, including but not limited to mammalian (e.g., liver tissue), plant (e.g., tobacco leaves) and microbial (e.g., yeast) sources. In one embodiment, the present invention contemplates the isolation of poly(A)+ RNA from extracts, including direct isolation from crude extracts.

III. Comparing Biological Samples

Successful amplification can be confirmed by characterization of the product(s) from the reaction. The present invention contemplates, in one embodiment, using electrophoresis to confirm product formation and compare the results between samples.

A. Cancer Tissue

As noted above, the present invention may be used to compare normal tissue with cancer tissue, as well as to differentiate between cancer tissue that is metastatic and cancer tissue that is non-metastatic. In yet another embodiment, the present invention may be used to detect drug resistance.

The treatment of cancer has been hampered by the fact that there is considerable heterogeneity even within one type of cancer. Some cancers, for example, have the ability to invade tissues and display an aggressive course of growth characterized by metastases. These tumors generally are associated with a poor outcome for the patient. Yet without a means of identifying such tumors and distinguishing them from non-invasive cancer, the physician is at a loss to change and/or optimize therapy.

With regard to metastatic disease, it is believed that cancer cells proteolytically alter basement membranes underlying epithelia or the endothelial linings of blood and lymphatic vessels, invade through the defects created by proteolysis, and enter the circulatory or lymphatic systems to colonize distant sites. During this process, the secretion of proteolytic enzymes is coupled with increased cellular motility and altered adhesion. After colonizing distant sites, metastasizing tumor cells proliferate to establish metastatic nodules. The present invention can be used to compare metastatic cancer tissue with non-metastatic cancer tissue to identify differentially expressed genes as markers of metastatic potential. Thereafter, the present invention can be used to determine the presence or absence of these markers in various clinical cancer isolates. The present invention also contemplates "phenotyping" cancer cells adapted to tissue culture.

With regard to drug resistance, it should be noted that the success of chemotherapeutics as anticancer agents has been severely hampered by the phenomenon of multiple drug resistance, i.e., resistance to a wide range of structurally unrelated cytotoxic anticancer compounds. J.H. Gerlach et al., Cancer Surveys, 5:25-46 (1986). Progressive drug resistance may be caused by a small population of drug-resistant cells (e.g., mutant cells) already present within the tumor at the time of diagnosis. J.H. Goldie and Andrew J. Coldman, Cancer Research, 44:3643-3653 (1984). Treating such a tumor with a single drug first results in a remission, in which the tumor shrinks as the predominant drug-sensitive cells are killed. With the drug-sensitive cells gone, the remaining drug-resistant cells continue to multiply and eventually dominate the cell population of the tumor. The present invention can be used to compare drug-resistant cells with non-resistant cells to identify differentially expressed genes as markers of drug resistance. Thereafter, the present invention can be used to determine the presence or absence of these markers in various clinical cancer isolates.

B. Classification and Identification of Microorganisms

The detection and identification of microorganisms recovered from clinical specimens or environmental sources is an important aspect of clinical microbiology, as this information guides physicians in making decisions about treatment. For a particular microorganism to be identified correctly and consistently, regardless of the source or the laboratory identifying it, reproducible identification systems are critical. As stated by Finegold, "The primary purpose of nomenclature of microorganisms is to permit us to know as exactly as possible what another clinician, microbiologist, epidemiologist, or author is referring to when describing an organism responsible for infection of an individual or outbreak" (S. Finegold, "Introduction to summary of current nomenclature, taxonomy, and classification of various microbial agents," Clin. Infect. Dis., 16:597 [1993]).

Classification, nomenclature, and identification are three separate, but interrelated, aspects of taxonomy. Classification is the arranging of organisms into taxonomic groups (i.e., taxa) on the basis of similarities or relationships. A multitude of prokaryotic organisms has been identified, with great diversity in their types, and many more organisms are characterized and classified on a regular basis. It is a matter of convenience to classify the organisms into groups based upon their similarities. Classification has been used to organize the seemingly chaotic array of individual bacteria into an orderly framework. Through use of a classification framework, a new isolate can more easily be characterized by comparison with known organisms. The choice of criteria for placement into groups is somewhat arbitrary, although most classifications are based on phylogenetic relationships. An example of the arbitrariness of bacterial classification is the genetic definition of a "species" as strains of bacteria that exhibit at least 70% DNA relatedness, with 5% or less divergence within related sequences (Baron et al., "Classification and identification of bacteria," in Manual of Clinical Microbiology, Murray et al. (eds.), ASM Press, Washington, D.C., pp. 249-264 [1995]).
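The genetic species criterion cited above amounts to a two-threshold test. The following Python sketch is illustrative only: the function name is hypothetical, the inputs are assumed to be percentages already obtained from hybridization experiments, and reading 70% as a lower bound and 5% as an upper bound is an assumption about how the definition is applied.

```python
def same_genetic_species(relatedness_pct: float, divergence_pct: float) -> bool:
    """Apply the two-threshold genetic species criterion to a pair of strains.

    relatedness_pct: DNA relatedness between the two strains, in percent.
    divergence_pct:  divergence within related sequences, in percent.
    """
    for value in (relatedness_pct, divergence_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    # Assumed reading of the definition: at least 70% relatedness AND
    # no more than 5% divergence within the related sequences.
    return relatedness_pct >= 70.0 and divergence_pct <= 5.0


if __name__ == "__main__":
    print(same_genetic_species(78.0, 3.2))  # True
    print(same_genetic_species(78.0, 7.5))  # False: too divergent
    print(same_genetic_species(55.0, 2.0))  # False: too little relatedness
```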

There are two basic genetic test methods used in the classification and identification of bacteria. Nucleic acid hybridization studies may be conducted to determine the degree of relatedness of organisms on the DNA level. Ribosomal RNA (rRNA) sequence analysis is another method used to study the relationships between organisms. In addition to these methods, molecular probes and amplification methods (e.g., PCR) may be used to detect and identify microorganisms.

In nucleic acid hybridization methods, the test DNA is denatured and exposed to denatured DNA of known sequence from a particular organism. The amount of hybridization between the test DNA and known DNA provides an indication of the degree of relatedness between the test and known organisms. An important drawback to this approach is that hybridization between two single DNA strands can occur even when 15% of the sequences are not complementary. Ribosomal RNA analysis is another method by which the relatedness of organisms has been determined. Because ribosomes are critical to cellular function and interact with many other molecules (e.g., mRNA and tRNAs), the core rRNA sequences are highly constrained and well conserved throughout evolution. However, because rRNA also contains highly variable regions, it is usually possible to identify regions of 20-30 bases that are unique to a particular species. Because this approach analyzes only the sequence differences between the rRNAs of different organisms, it is extremely narrow: it looks at no other differences between organisms.
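Locating short stretches that occur in one species' rRNA and in no other can be sketched as a set difference over fixed-length substrings (k-mers). This is a minimal Python illustration under stated assumptions: exact matching on a single strand, toy input sequences rather than real rRNA, and hypothetical function names. Because hybridization tolerates mismatches, a practical probe search would also need to consider near-matches, the complementary strand and probe melting temperature.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (one strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_unique_kmers(seqs: Dict[str, str], k: int = 20) -> Dict[str, Set[str]]:
    """For each species, return the k-mers found in no other species' sequence."""
    sets = {name: kmers(s, k) for name, s in seqs.items()}
    unique = {}
    for name, own in sets.items():
        # Union of the k-mers of every *other* species.
        others = set().union(*(v for n, v in sets.items() if n != name))
        unique[name] = own - others
    return unique


if __name__ == "__main__":
    # Toy sequences, not real rRNA; k=4 only to keep the output readable.
    toy = {"sp_A": "ACGTACGTTT", "sp_B": "ACGTACGAAA"}
    for name, found in species_unique_kmers(toy, k=4).items():
        print(name, sorted(found))
    # sp_A ['CGTT', 'GTTT']
    # sp_B ['ACGA', 'CGAA', 'GAAA']
```

In the spirit of the 20-30 base regions mentioned above, `k` would be set in that range for real sequences.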

Generally, identification of an organism (e.g., a bacterium) is based on its overall morphological and biochemical patterns observed in culture. However, numerous organisms associated with disease cannot be cultured in vitro. Indeed, some do not grow well even in traditional host-based culture systems, such as cell cultures or embryonated eggs. Nonetheless, their detection and identification is crucial for the appropriate treatment of affected individuals. Genetic testing methods have proven useful for the classification and identification of such organisms. For example, universal ribosomal primers designed to hybridize to and amplify all bacterial rRNA may be used to detect bacteria in any normally sterile body site (e.g., synovial fluid). Once detected, the organism may then be identified by sequencing and/or amplification methods and comparison of the results with those obtained from known organisms. While this method has led to the identification and classification of various organisms that were historically not cultivable, it is again limited by its focus on rRNA.

The present invention can be used to identify genes unique to a particular species, subspecies or strain. Unlike the currently used genetic approaches described above, the present invention is not limited to any particular genes or gene sequences (e.g., rRNA sequences).

With regard to distinguishing different species, in one embodiment, the present invention contemplates comparing the expressed genes of two samples suspected to be different species. In another embodiment, a species that is suspected to have changed or diverged from the parent species is compared with the parent species. For example, a species or strain of bacteria may develop a different susceptibility to a drug (e.g., an antibiotic) as compared to the parent species; rapid identification of the specific species or subspecies aids diagnosis and allows initiation of appropriate treatment.

EXPERIMENTAL

The following examples serve to illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

In the experimental disclosure which follows, the following abbreviations apply: eq (equivalents); M (Molar); μM (micromolar); N (Normal); mol (moles); mmol (millimoles); μmol (micromoles); nmol (nanomoles); gm (grams); mg (milligrams); μg (micrograms); L (liters); ml (milliliters); μl (microliters); cm (centimeters); mm (millimeters); μm (micrometers); nm (nanometers); °C (degrees Centigrade); Ci (Curies); MW (molecular weight); OD (optical density); EDTA (ethylenediaminetetraacetic acid); PAGE (polyacrylamide gel electrophoresis); UV (ultraviolet); V (volts); W (watts); mA (milliamps); bp (base pair); CPM (counts per minute).

The present invention contemplates preparation of RNA. While a variety of preparation schemes can be used successfully with the present invention, in the experiments below the total RNA and mRNA were either purchased commercially (Clontech, Palo Alto, CA) or prepared according to standard protocols. In either case, to remove any contaminating genomic DNA, all RNA preparations were digested with RNase-free DNase I (RQ-1 DNase, Promega, Madison, WI), extracted with phenol-chloroform (Sigma Chemical Company, St. Louis, MO) and precipitated with ethanol.

The cDNA used for the PCR reaction can be made in a variety of ways. In the examples below, however, single-stranded cDNA (sscDNA) was synthesized from 1 μg of total RNA or 100 μg of mRNA with random primers according to the instructions supplied with a commercially available kit (Superscript, BRL-GIBCO, Gaithersburg, MD). At the end of the synthesis reaction, the reverse transcriptase was inactivated by heating at 94°C for 15 min.

To calculate the amount of synthesized cDNA, an aliquot of the cDNA synthesis mixture was mixed with labelled dCTP (Amersham, Arlington Heights, IL), and the yield was determined by standard methods. The cDNAs were diluted to a final concentration of 1 ng/μl.
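The text does not specify the "standard methods" used to estimate cDNA yield. One common approach, assumed here, measures the fraction of labelled dCTP incorporated into acid-precipitable product and converts it to mass. The Python sketch below further assumes that C makes up about one quarter of the synthesized bases and that a nucleotide residue averages about 330 g/mol (0.33 ng/pmol); the function names and input values are hypothetical.

```python
# Assumed average mass of a nucleotide residue in DNA: ~330 g/mol,
# i.e. 330 ng per nmol or 0.330 ng per pmol.
NG_PER_PMOL_NT = 0.330


def cdna_yield_ng(incorporated_cpm: float, total_cpm: float,
                  dctp_pmol_in_reaction: float) -> float:
    """Estimate first-strand cDNA mass from labelled-dCTP incorporation.

    Assumes the tracer is incorporated in proportion to total dCTP and that
    C makes up ~25% of synthesized bases (total nucleotides = 4 x dCTP).
    """
    if total_cpm <= 0:
        raise ValueError("total_cpm must be positive")
    fraction_incorporated = incorporated_cpm / total_cpm
    dctp_incorporated_pmol = fraction_incorporated * dctp_pmol_in_reaction
    return 4 * dctp_incorporated_pmol * NG_PER_PMOL_NT


def water_to_add_ul(mass_ng: float, current_volume_ul: float,
                    target_ng_per_ul: float = 1.0) -> float:
    """Water needed to bring mass_ng of cDNA to target_ng_per_ul."""
    final_volume_ul = mass_ng / target_ng_per_ul
    if final_volume_ul < current_volume_ul:
        raise ValueError("sample is already below the target concentration")
    return final_volume_ul - current_volume_ul


if __name__ == "__main__":
    # Hypothetical counts and dCTP amount, for illustration only.
    mass = cdna_yield_ng(incorporated_cpm=1_000, total_cpm=100_000,
                         dctp_pmol_in_reaction=5_000)
    print(f"{mass:.1f} ng cDNA; add {water_to_add_ul(mass, 20.0):.1f} ul water")
```

With these hypothetical inputs, 1% incorporation of 5,000 pmol dCTP corresponds to about 66 ng of cDNA, which would be brought to 1 ng/μl by adding 46 μl of water to a 20 μl reaction.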

PCR conditions can vary depending on the desired outcome. Nonetheless, unless otherwise indicated, the following conditions were used. First, the amount of cDNA used in each PCR amplification reaction was empirically determined; 2-5 ng of sscDNA gave satisfactory results. Second, the PCR reactions were set up in precooled 0.2 ml thin-walled tubes on ice and contained 50 mM Tris-HCl (pH 8.5), 50 mM KCl, 1.5 mM MgCl₂, 1 mM of each dNTP, 2-5 ng of sscDNA, 10 pmoles of a K-primer, 10 pmoles of an RE-primer, 0.5 μl of [α-³³P]dCTP (10 μCi/μl, Amersham) and water to 20 μl.
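The reaction recipe above can be turned into pipetting volumes with C1V1 = C2V2. In this Python sketch the stock concentrations (10x buffer, 25 mM MgCl₂, 10 mM of each dNTP, 10 μM primers) and the 3 μl of cDNA at 1 ng/μl are assumptions chosen for illustration; only the final concentrations come from the text. Note that 10 pmol of primer in 20 μl is a final concentration of 0.5 μM.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Component:
    name: str
    stock: float  # stock concentration (same units as `final`)
    final: float  # desired concentration in the finished reaction


def reaction_volumes_ul(components: List[Component], total_ul: float = 20.0,
                        fixed_ul: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Volumes from C1*V1 = C2*V2; water brings the reaction to total_ul."""
    vols = {c.name: c.final * total_ul / c.stock for c in components}
    vols.update(fixed_ul or {})  # components added as a fixed volume
    water = total_ul - sum(vols.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    vols["water"] = water
    return vols


if __name__ == "__main__":
    # Hypothetical stocks; final concentrations follow the recipe in the text.
    recipe = [
        Component("10x buffer (Tris-HCl/KCl)", stock=10.0, final=1.0),  # x-fold
        Component("MgCl2", stock=25.0, final=1.5),                     # mM
        Component("dNTPs (each)", stock=10.0, final=1.0),              # mM
        Component("K-primer", stock=10.0, final=0.5),                  # uM (10 pmol / 20 ul)
        Component("RE-primer", stock=10.0, final=0.5),                 # uM
    ]
    fixed = {"[a-33P]dCTP": 0.5, "sscDNA (3 ng at 1 ng/ul)": 3.0}
    for name, v in reaction_volumes_ul(recipe, 20.0, fixed).items():
        print(f"{name:28s} {v:5.2f} ul")
```

With these assumed stocks, the components other than water total 10.7 μl, leaving 9.3 μl of water per 20 μl reaction.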

The mixture can be subjected to PCR cycles in different ways. In one embodiment, the first cycle (or the first few cycles) uses a lower annealing temperature than subsequent cycles. For example, an annealing temperature of between approximately 34°C and approximately 44°C, more preferably between approximately 36°C and approximately 40°C, and most preferably approximately 38°C (for approximately 30 seconds), can be used for the first cycle (or the first few cycles). The subsequent cycles of denaturation, annealing and extension can use a higher annealing temperature. For example, in one embodiment, there are approximately 25-35 subsequent cycles of denaturation (approximately 94°C for approximately 30 seconds), annealing, and extension (72°C for 1 min), wherein the annealing temperature is between approximately 40°C and approximately 60°C, more preferably between approximately 44°C and approximately 54°C, and most preferably approximately 48°C (for approximately 30 seconds).

In another embodiment, the annealing temperature is approximately the same for all cycles. For example, the above-described mixture is subjected to 35 cycles of denaturation, annealing and extension, wherein the annealing temperature is between approximately 38°C and approximately 40°C (for approximately 30 seconds).
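Both cycling embodiments can be written down as a list of cycles. The Python sketch below is one hypothetical representation, with defaults taken from the "most preferred" values above (one opening cycle annealed at 38°C, then 30 cycles annealed at 48°C). Applying the 94°C/30 s denaturation and 72°C/1 min extension to the opening cycle as well is an assumption, and ramp times are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cycle:
    denature_c: float
    denature_s: int
    anneal_c: float
    anneal_s: int
    extend_c: float
    extend_s: int

    def hold_seconds(self) -> int:
        """Time spent at the three set temperatures (ramping ignored)."""
        return self.denature_s + self.anneal_s + self.extend_s


def _cycle(anneal_c: float) -> Cycle:
    # 94 C / 30 s denaturation, 30 s annealing, 72 C / 60 s extension.
    return Cycle(94.0, 30, anneal_c, 30, 72.0, 60)


def build_program(initial_cycles: int = 1, initial_anneal_c: float = 38.0,
                  main_cycles: int = 30, main_anneal_c: float = 48.0) -> List[Cycle]:
    """Low-stringency opening cycle(s) followed by higher-stringency cycles."""
    if initial_cycles < 0 or main_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    return ([_cycle(initial_anneal_c)] * initial_cycles
            + [_cycle(main_anneal_c)] * main_cycles)


if __name__ == "__main__":
    program = build_program()
    print(len(program), "cycles,", sum(c.hold_seconds() for c in program), "s of holds")
```

Setting `initial_cycles=0`, `main_cycles=35` and `main_anneal_c=39.0` gives the single-temperature embodiment. The two-phase default amounts to 31 cycles of 120 s each, i.e., 3,720 s of holds.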

Cycling was performed using a Perkin-Elmer System 2400 Thermal Cycler (Perkin-Elmer, Norwalk, CT). PCR amplifications were performed with subsets of K-primers in combination with different RE-primers. Finally, the PCR products were analyzed by high-resolution polyacrylamide gel electrophoresis using 6% sequencing-grade gels (BRL), and the amplified DNA fragments were visualized by autoradiography using BioMax MR film (Kodak, Rochester, NY).

EXAMPLE 1

This example describes the generation of PCR product from several human cell types using one embodiment of the method of the present invention. Total RNA (available commercially from Clontech) was used from four human cell types: 1) the K562 tumor cell line, 2) placental tissue, 3) spleen cells, and 4) thymus cells (in the first, second, third and fourth lane of each four-lane group in Figures 3A and 3B). The total RNA was reverse transcribed using 6-mer random primers (available from Pharmacia). The resultant cDNA was subjected to thirty-five cycles of PCR (in the presence of a radioactive precursor) using a mixture of two anchor primers ("K2" for Figure 3A and "K3" for Figure 3B) and restriction enzyme-based primers [for this experiment, the recognition sequence on the 5' end of the RE downstream primers was for EcoRI; the primer sequences were: GAATTCNNNGT(A or C)(G or T)AC (lanes 1-4); GAATTCNNNCGGC (lanes 5-8); GAATTCNNN(A or G)GCGC(C or T) (lanes 9-12); GAATTCNNNTTAA (lanes 13-16)]. The PCR products were analyzed by PAGE using 6% sequencing gels (BRL) and visualized by autoradiography. The results show a large number of bands (see Figures 3A and 3B). Importantly, there is differential expression of transcripts among the various cell types.
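Each RE-primer above is a degenerate mixture: every N can be any of the four bases, and each "(X or Y)" position can be either of two bases. The Python sketch below, which uses hypothetical function names and parses only the notation used in this example, counts and enumerates the individual sequences in each mixture; for instance, GAATTCNNNGT(A or C)(G or T)AC contains 4³ × 2 × 2 = 256 distinct 15-mers.

```python
import itertools
import re
from typing import List

# Tokens in the primer notation used above: a single base (A, C, G, T),
# N (any of the four bases), or "(X or Y)" (either of two bases).
_TOKEN = re.compile(r"\(([ACGT]) or ([ACGT])\)|([ACGTN])")
_FULL = re.compile(r"(?:\([ACGT] or [ACGT]\)|[ACGTN])+")


def positions(primer: str) -> List[str]:
    """Return, for each primer position, the string of allowed bases."""
    if not _FULL.fullmatch(primer):
        raise ValueError(f"unrecognized primer notation: {primer!r}")
    allowed = []
    for alt1, alt2, base in _TOKEN.findall(primer):
        if base:
            allowed.append("ACGT" if base == "N" else base)
        else:
            allowed.append(alt1 + alt2)
    return allowed


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the degenerate primer mixture."""
    n = 1
    for choices in positions(primer):
        n *= len(choices)
    return n


def expand(primer: str) -> List[str]:
    """Every concrete sequence represented by the degenerate primer."""
    return ["".join(p) for p in itertools.product(*positions(primer))]


if __name__ == "__main__":
    for p in ["GAATTCNNNGT(A or C)(G or T)AC", "GAATTCNNNCGGC",
              "GAATTCNNN(A or G)GCGC(C or T)", "GAATTCNNNTTAA"]:
        print(p, degeneracy(p))
```

One consequence worth noting is that, in a degenerate mixture synthesized with equal base ratios, each individual sequence makes up only about 1/64 to 1/256 of the nominal primer amount for these four primers.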

EXAMPLE 2

This example describes the generation of PCR product from several bacterial species using one embodiment of the method of the present invention. Bacterial DNA was prepared by standard methods. Genomic DNA (10-50 ng) from E. coli and P. stuartii (in the first and second lane, respectively, of each two-lane group in Figure 4) was subjected to thirty-five cycles of PCR (in the presence of a radioactive precursor) using a mixture of anchor primers (SD-primers: 5'-GGAATTCNNN-TAAGGAGG-3') and restriction enzyme-based primers (RE-primers: 5'-GGATTCCNNNGATC (this "MboI primer" was used in lanes 1 and 2 of Figure 4); GGATTCCNNNCTAG (this "BfaI primer" was used in lanes 3 and 4 of Figure 4); GGATTCCNNNCCGC (this "AciI primer" was used in lanes 5 and 6 of Figure 4); GGATTCCNNNCCGG (this "HpaII primer" was used in lanes 7 and 8 of Figure 4); and GGATTCCNNNAATT (this "Tsp509I primer" was used in lanes 9 and 10 of Figure 4)). The PCR products were analyzed by PAGE using 6% sequencing gels (BRL) and visualized by autoradiography.
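The 3' ends of the RE-primers in this example correspond to the 4-base recognition sequences of the enzymes after which they are named. In sequence of random, equal base composition, a given 4-base sequence occurs on average once every 4⁴ = 256 bases along one strand, which is one plausible reason why such primers yield many bands. The Python sketch below counts these sites on both strands of an input sequence; the function names and demo sequence are hypothetical.

```python
from typing import Dict

# 3'-terminal recognition sequences of the RE-primers named in this example.
RE_SITES: Dict[str, str] = {
    "MboI": "GATC",
    "BfaI": "CTAG",
    "AciI": "CCGC",
    "HpaII": "CCGG",
    "Tsp509I": "AATT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of site on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    targets = {site, reverse_complement(site)}  # a single entry for palindromic sites
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def expected_spacing(site: str) -> int:
    """Mean spacing of one orientation of site in random, equal-composition DNA."""
    return 4 ** len(site)


if __name__ == "__main__":
    demo = "AAGATCCGGTTCCGCAATTGGATC"  # arbitrary demo sequence
    for enzyme, site in RE_SITES.items():
        print(enzyme, site, count_sites(demo, site))
```

AciI's CCGC is the only non-palindromic site in the list, which is why the counter also checks the reverse complement (GCGG).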

The results show a large number of bands (Figure 4). Importantly, the banding patterns differ between the two species. For example, there are clearly DNA fragments associated with E. coli that are not found in P. stuartii. Such bands are markers for identification.
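Scoring a pair of lanes for species-specific bands, as described above, can be sketched as a comparison of band sizes with a sizing tolerance. In this Python sketch the band sizes, the 2 bp tolerance and the function name are hypothetical; in practice the sizes would be read off the gel against a size standard.

```python
from typing import List


def unique_bands(lane: List[float], other: List[float], tol_bp: float = 2.0) -> List[float]:
    """Band sizes in `lane` with no band in `other` within +/- tol_bp."""
    return [b for b in lane if all(abs(b - o) > tol_bp for o in other)]


if __name__ == "__main__":
    # Hypothetical band sizes (bp) for two lanes.
    lane_1 = [120, 185, 240, 310]
    lane_2 = [121, 200, 309, 400]
    print(unique_bands(lane_1, lane_2))  # [185, 240]
    print(unique_bands(lane_2, lane_1))  # [200, 400]
```

The tolerance is a design choice: too small and sizing error splits one shared band into two "unique" ones; too large and genuinely distinct bands of similar size are merged.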

EXAMPLE 3

This example describes the cloning and sequencing of expressed transcripts. Briefly, DNA bands representing differentially expressed transcripts (see single and double arrows) were identified by visual scanning of the autoradiograph and marked (Figure 5, which represents a different exposure of the experiment run in Figure 3A). The film was then used as a template, and the marked bands were cut out of the gel, eluted in water, precipitated with 0.3 M sodium acetate, pH 6.0, and 2.5 volumes of ethanol, pelleted by centrifugation (12,000 x g, 20 min), washed twice with 70% ethanol, air dried and dissolved in 10 μl of nuclease-free water. Half of the sample was then reamplified using the same primer combination and PCR conditions. Amplified material was resolved on a 2% agarose gel, the size of the amplified fragments was determined with reference to DNA size standards (100 bp ladder, BRL), and the amplified DNA fragments were gel purified using a commercially available kit (QIAquick, Qiagen, Los Angeles, CA). Amplified fragments were then cloned into a T-tailed vector using a commercially available kit (pGEM-T, Promega, Madison, WI), and the recombinants were identified by blue-white color selection. Positive clones were grown in LB medium (BRL), and plasmid minipreps were prepared (Qiagen) and sequenced (CWRU Molecular Biology Core Facility). Sequence homology searches were performed at the National Center for Biotechnology Information (NCBI) using the BLAST network service.
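Converting the 12,000 x g used above into a rotor speed requires the rotor radius. The Python sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimeters; the function names are hypothetical, and the 10 cm radius in the usage note is an assumption, so the actual rotor's radius should be substituted.

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) = 1.118e-5 * r(cm) * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at radius_cm."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius_cm > 0")
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    print(round(rpm_from_rcf(12_000, radius_cm=10.0)))
```

For an assumed 10 cm radius, 12,000 x g corresponds to roughly 10,360 rpm.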

One band (single arrow in Figure 5) was found to have a sequence corresponding to a human mitochondrial hinge protein (see Figure 6 for the partial nucleic acid sequence). The other band (double arrow in Figure 5) was found to have a sequence corresponding to a human coactivator gene (Figure 7 shows the partial nucleic acid sequence).

EXAMPLE 4

This example describes the comparison of normal and malignant tissues. A variety of samples were studied: 1) normal human keratinocytes, 2) normal human skin, and 3-5) three squamous cell carcinoma samples from patients (in the first, second, third, fourth and fifth lane of each five-lane group in Figure 8). The total RNA was reverse transcribed using 6-mer random primers (available from Pharmacia). The resultant cDNA was subjected to thirty-five cycles of PCR (all cycles were performed using annealing temperatures between 38°C and 42°C) in the presence of a radioactive precursor, using a mixture of two anchor primers ("K1") and restriction enzyme-based primers (RE-primers: 5'-GGATTCCNNNGATC (this "MboI primer" was used in the reactions represented by lanes 1 through 5 of Figure 8); GGATTCCNNNCTAG (this "BfaI primer" was used in the reactions represented by lanes 6 through 10 of Figure 8); GGATTCCNNNCCGC (this "AciI primer" was used in the reactions represented by lanes 11 through 15 of Figure 8); GGATTCCNNNCCGG (this "HpaII primer" was used in the reactions represented by lanes 16 through 20 of Figure 8); and GGATTCCNNNAATT (this "Tsp509I primer" was used in the reactions represented by lanes 21 through 25 of Figure 8)). The PCR products were analyzed by PAGE using 6% sequencing gels (BRL) and visualized by autoradiography.

The results show a large number of bands (Figure 8). Importantly, there is differential expression of transcripts among the different samples. For example, there are clearly DNA fragments associated with normal cells that are not found in the lanes representing cancer cells. There are also DNA fragments that are expressed at much higher levels in cancer cells than in normal cells. These are useful markers for cancer identification.
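The bands in this example were scored by eye. If band intensities were instead quantified (for example by densitometry of the autoradiograph, an assumption not described in the text), differences between normal and tumor lanes could be ranked by log₂ ratio. In this Python sketch the band labels, intensities and function names are hypothetical, a missing band is treated as intensity 0, a pseudocount keeps the ratio finite, and a threshold of 1.0 corresponds to a two-fold difference. Lanes would need to be normalized for loading before such a comparison.

```python
import math
from typing import Dict, List, Tuple


def log2_ratios(normal: Dict[str, float], tumor: Dict[str, float],
                pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((tumor + c) / (normal + c)) for every band seen in either lane.

    A band missing from a lane is treated as intensity 0; the pseudocount c
    keeps the ratio finite for bands present in only one lane.
    """
    bands = set(normal) | set(tumor)
    return {b: math.log2((tumor.get(b, 0.0) + pseudocount) /
                         (normal.get(b, 0.0) + pseudocount))
            for b in bands}


def flag_bands(ratios: Dict[str, float],
               threshold: float = 1.0) -> Tuple[List[str], List[str]]:
    """Split bands into (higher in tumor, higher in normal) at |log2 ratio| >= threshold."""
    up = sorted(b for b, r in ratios.items() if r >= threshold)
    down = sorted(b for b, r in ratios.items() if r <= -threshold)
    return up, down


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary densitometry units.
    normal = {"b1": 100, "b2": 300, "b3": 50}
    tumor = {"b1": 103, "b2": 20, "b4": 400}
    print(flag_bands(log2_ratios(normal, tumor)))  # (['b4'], ['b2', 'b3'])
```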

From the above, it should be evident that the present invention provides a convenient method for distinguishing between the expression of genes in two or more biological samples. Importantly, the method also facilitates follow-up analysis once a gene of interest is identified.

Claims

We claim:

1. A method of analyzing nucleic acid in a sample, comprising: a) providing: i) a sample containing nucleic acid, ii) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on a portion of said nucleic acid of said sample, iii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said nucleic acid of said sample, and iv) a polymerase and PCR reagents; b) preparing said nucleic acid from said sample under conditions so as to produce amplifiable nucleic acid; c) amplifying said nucleic acid with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated; d) detecting said amplified product.

2. The method of Claim 1, wherein said sample comprises eukaryotic cells and said natural common sequence is the Kozak sequence.

3. The method of Claim 1, wherein said sample comprises prokaryotic cells and said natural common sequence is the Shine-Dalgarno sequence.

4. The method of Claim 1, wherein said detecting comprises gel electrophoresis.

5. A method of analyzing expressed genes in biological samples, comprising: a) providing: i) two samples containing mRNA, ii) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on at least a portion of said mRNA of said two samples, iii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said mRNA of said two samples, and iv) a polymerase and PCR reagents; b) treating said mRNA of each of said two samples under conditions so as to produce amplifiable DNA from each sample; c) amplifying said DNA from each sample with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated from each of said two samples; d) detecting said amplified product.

6. The method of Claim 5, wherein each of said two samples comprises eukaryotic cells and said natural common sequence is the Kozak sequence.

7. The method of Claim 5, wherein each of said two samples comprises prokaryotic cells and said natural common sequence is the Shine-Dalgarno sequence.

8. The method of Claim 7, wherein said two samples comprise bacterial cells of different species.

9. The method of Claim 5, wherein said detecting comprises gel electrophoresis.

10. A method of analyzing expressed genes in multiple samples, comprising: a) providing: i) at least two samples containing mRNA, ii) random primers, iii) reverse transcriptase, iv) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence on a portion of said mRNA of said samples, v) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence present on a portion of said mRNA of said samples, and vi) a polymerase and PCR reagents; b) extracting mRNA from each of said samples and reverse transcribing said mRNA with said reverse transcriptase and said random primers under conditions such that cDNA is produced; c) amplifying said cDNA from each sample with said first and second primers, said polymerase and said PCR reagents under conditions such that amplified product is generated from each of said samples; d) detecting said amplified product.

11. The method of Claim 10, wherein said samples comprise eukaryotic cells and said natural common sequence is the Kozak sequence.

12. The method of Claim 11, wherein a portion of said samples comprise human cancer cells.

13. The method of Claim 10, wherein said samples comprise prokaryotic cells and said natural common sequence is the Shine-Dalgarno sequence.

14. The method of Claim 10, wherein said analyzing means comprises gel electrophoresis.

15. A kit, comprising: i) a first primer having a sequence of which at least a portion is at least partially complementary to a natural common non-coding sequence, and ii) a second primer having a sequence of which at least a portion is at least partially complementary to a restriction enzyme recognition sequence.

16. The kit of Claim 15, wherein said natural common sequence is the Kozak sequence.

17. The kit of Claim 15, wherein said natural common sequence is the Shine-Dalgarno sequence.

18. The kit of Claim 15, wherein said restriction enzyme recognition sequence is selected from the group consisting of the sequences set forth in Table 1.

19. The kit of Claim 15, wherein said first primer is of the general formula:

S XN _LKJX-N _{L KJ}ATGN _HO-S ', wherein N is A, T, G, C or nothing, and wherein X is the recognition sequence for a restriction enzyme or nothing.

20. The kit of Claim 15, wherein said second primer is of the general formula: 5'-N ₀-X-N_|.10TAAGGAGGN_MO-3', where X is a recognition sequence or nothing, and where N is A, T, G, C or nothing.