WO1997015690A1 - Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
- Publication number
- WO1997015690A1 (application PCT/US1996/017159)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fragments
- database
- sequence
- dna
- sample
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6855—Ligating adaptors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S977/00—Nanotechnology
- Y10S977/70—Nanostructure
- Y10S977/724—Devices having flexible or movable element
- Y10S977/727—Devices having flexible or movable element formed from biological material
- Y10S977/728—Nucleic acids, e.g. DNA or RNA
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S977/00—Nanotechnology
- Y10S977/902—Specified use of nanostructure
- Y10S977/904—Specified use of nanostructure for medical, immunological, body treatment, or diagnosis
- Y10S977/924—Specified use of nanostructure for medical, immunological, body treatment, or diagnosis using nanostructure as support of dna analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10T—TECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
- Y10T436/00—Chemistry: analytical and immunological testing
- Y10T436/14—Heterocyclic carbon compound [i.e., O, S, N, Se, Te, as only ring hetero atom]
- Y10T436/142222—Hetero-O [e.g., ascorbic acid, etc.]
- Y10T436/143333—Saccharide [e.g., DNA, etc.]
Definitions
- quantification more particularly it is the quantitative classification, comparison of expression, or identification of preferably all DNA sequences or genes in a sample without performing any sequencing.
- hereditary disorders such as the thalassemias
- Genomic DNA sequences are those naturally occurring DNA sequences constituting the genome of a cell.
- the state of gene, or gDNA, expression at any time is represented by the composition of total cellular messenger RNA ("mRNA"), which is synthesized by the regulated transcription of gDNA.
- mRNA total cellular messenger RNA
- cDNA Complementary DNA
- gene specific DNA analysis techniques have not been directed to the determination or classification of substantially all genes in a DNA sample representing total cellular mRNA and have required some degree of sequencing.
- existing cDNA, and also gDNA, analysis techniques have been directed to the determination and analysis of one or two known or unknown genetic sequences at one time. These techniques have used probes synthesized to specifically recognize by hybridization only one particular DNA sequence or gene. (See, e.g., Watson et al., 1992, Recombinant DNA, chap. 7, W. H. Freeman, New York.) Further, adaptation of these methods to the problem of recognizing all sequences in a sample would be cumbersome and uneconomical.
- One existing method for finding and sequencing unknown genes starts from an arrayed cDNA library. From a particular tissue or specimen, mRNA is isolated and cloned into an appropriate vector, which is then plated in a manner so that the progeny of individual vectors bearing the clone of one cDNA sequence can be separately identified. A replica of such a plate is then probed, often with a labeled DNA oligomer selected to hybridize with the cDNA representing the gene of interest. Thereby, those colonies bearing the cDNA of interest are found and isolated, the cDNA harvested and subject to sequencing. Sequencing can then be done by the Sanger dideoxy chain termination method (Sanger et al., 1977, "DNA sequencing with chain terminating inhibitors", Proc. Natl. Acad. Sci. USA 74(12): 5463-5467) applied to inserts so isolated.
- the DNA oligomer probes for the unknown gene used for colony selection are synthesized to hybridize
- cDNA for the gene of interest.
- One manner of achieving this specificity is to start with the protein product of the gene of interest. If a partial sequence of 5 to 10-mer peptide fragment from an active region of this protein can be determined, corresponding 15 to 30-mer degenerate oligonucleotides can be synthesized which code for this peptide. This collection of degenerate
- oligonucleotides will typically be sufficient to uniquely identify the corresponding gene. Similarly, any information leading to 15 to 30 long nucleotide subsequences can be used to create a single gene probe.
- Another existing method which searches for a known gene in a cDNA or gDNA prepared from a tissue sample, also uses single gene or single sequence probes which are
- the expression of a particular oncogene in a sample can be determined by probing tissue derived cDNA with a probe derived from a subsequence of the oncogene's expressed sequence tag.
- the presence of a rare or difficult to culture pathogen such as the TB bacillus or the HIV, can be determined by probing gDNA with a hybridization probe specific to a gene of the pathogen.
- the heterozygous presence of a mutant allele in a phenotypically normal individual, or its homozygous presence in a fetus, can be determined by probing with an allele specific probe complementary only to the mutant allele (See, e.g., Guo et al., 1994, Nucleic Acids Research, 22:5456-65).
- oligomer sequence signatures (Lennon et al., 1991, Trends In Genetics 7(10): 314-317). This technique classifies a single clone based on the pattern of probe hits against an entire combinatorial library, or a significant sub-library. It requires that the tissue sample library be arrayed into clones, each clone comprising only one pure sequence from the library. It cannot be applied to mixtures.
- In contrast to the prior exemplary existing gene determination and classification techniques, another existing technique, known as differential display, attempts to
- PCR to amplify DNA subsequences of various lengths, which are defined by being between the hybridization sites of arbitrarily selected primers.
- the pattern of lengths observed is characteristic of the tissue from which the library was prepared.
- one primer used in differential display is oligo(dT) and the other is one or more arbitrary oligonucleotides designed to hybridize within a few hundred base pairs of the poly-dA tail of a cDNA in the library.
- amplified fragments of lengths up to a few hundred base pairs should generate bands characteristic and distinctive of the sample. Changes in tissue gene expression may be observed as changes in one or more bands.
- This object is realized by generating a plurality of distinctive and detectable signals from the DNA sequences in the sample being analyzed. Preferably, all the signals taken together have sufficient discrimination and resolution so that each particular DNA sequence in a sample may be individually classified by the particular signals it generates, and with reference to a database of DNA sequences possible in the sample, individually determined.
- the intensity of the signals indicative of a particular DNA sequence depends quantitatively on the amount of that DNA present.
- the signals together can classify a
- each recognition reaction generates a large number of or a distinctive pattern of distinguishable signals, which are quantitatively
- the signals are preferably detected and measured with a minimum number of observations, which are preferably capable of simultaneous performance.
- the signals are preferably optical, generated by fluorochrome labels and detected by automated optical detection technologies. Using these methods, multiple individually labeled moieties can be discriminated even though they are in the same filter spot or gel band. This permits multiplexing reactions and parallelizing signal detection.
- the invention is easily adaptable to other labeling systems, for example, silver staining of gels.
- any single molecule detection system whether optical or by some other technology such as scanning or tunneling microscopy, would be highly advantageous for use according to this invention as it would greatly improve quantitative characteristics.
- signals are generated by detecting the presence (hereinafter called “hits”) or absence of short DNA subsequences
- target subsequences within a nucleic acid sequence of the sample to be analyzed. The presence or absence of a
- subsequence is detected by use of recognition means, or probes, for the subsequence.
- the subsequences are recognized by recognition means of several sorts, including but not limited to restriction endonucleases (“REs”), DNA oligomers, and PNA oligomers.
- REs recognize their specific subsequences by cleavage thereof; DNA and PNA oligomers recognize their specific subsequences by hybridization methods.
- This length representation can be corrected to true physical length in base pairs upon removing experimental biases and errors of the length separation and detection means.
- An alternative embodiment detects only the pattern of hits in an array of clones, each containing a single sequence ("single sequence clones"). The generated signals are then analyzed together with DNA sequence information stored in sequence databases in computer implemented experimental analysis methods of this invention to identify individual genes and their quantitative presence in the sample.
- target subsequences are chosen by further computer implemented experimental design methods of this invention such that their presence or absence and their relative distances when present yield a maximum amount of information for classifying or determining the DNA sequences to be analyzed. Thereby it is possible to have orders of magnitude fewer probes than there are DNA sequences to be analyzed, and it is further possible to have considerably fewer probes than would be present in combinatorial libraries of the same length as the probes used in this invention. For each embodiment, target subsequences have a preferred
- the presence of one probe in a DNA sequence to be analyzed is independent of the presence of any other probe.
- target subsequences are chosen based on information in relevant DNA sequence databases that
- subsequences may be chosen to determine the expression of all genes in a tissue sample ("tissue mode"). Alternatively, a smaller number of target subsequences may be chosen to quantitatively classify or determine only one or a few sequences of genes of interest, for example oncogenes, tumor suppressor genes, growth factors, cell cycle genes,
- a preferred embodiment of the invention, named quantitative expression analysis (“QEA™”), produces signals comprising target subsequence presence and a representation of the length in base pairs along a gene between adjacent target subsequences by measuring the results of recognition reactions on cDNA (or gDNA) mixtures.
- QEA™ quantitative expression analysis
- this method does not require the cDNA be inserted into a vector to create individual clones in a library. Creation of these libraries is time consuming, costly, and introduces bias into the process, as it requires the cDNA in the vector to be transformed into bacteria, the bacteria arrayed as clonal colonies, and finally the growth of the individual transformed colonies.
- exemplary experimental methods are described herein for performing QEA™: a preferred method utilizing a novel RE/ligase/amplification procedure; a PCR based method; and a method utilizing a removal means, preferably biotin, for removal of unwanted DNA fragments.
- the preferred method generates precise, reproducible, noise free signatures for determining individual gene expression from DNA in mixtures or libraries and is uniquely adaptable to automation, since it does not require intermediate extractions or buffer exchanges.
- a computer implemented gene calling step uses the hit and length information measured in conjunction with a database of DNA sequences to determine which genes are present in the sample and the relative levels of expression. Signal intensities are used to determine relative amounts of sequences in the sample. Computer implemented design methods optimize the choice of the target subsequences.
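The use of signal intensities to estimate relative amounts can be sketched as follows. This is a minimal illustration, not the method described in the source: it assumes each detected signal has already been assigned to exactly one database sequence, and that summing intensities per sequence and normalizing by the total is an acceptable estimate. The function name `relative_abundance` and the toy intensities are hypothetical.

```python
from collections import defaultdict


def relative_abundance(called_signals):
    """Estimate relative amounts from (gene_name, intensity) pairs.

    Assumption (not from the source): intensities of all signals called
    for the same gene are summed, then divided by the grand total.
    """
    totals = defaultdict(float)
    for gene, intensity in called_signals:
        totals[gene] += intensity
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No measurable signal: report zero for every gene seen.
        return {gene: 0.0 for gene in totals}
    return {gene: value / grand_total for gene, value in totals.items()}


# Hypothetical called signals: two bands assigned to geneA, one to geneB.
example = [("geneA", 30.0), ("geneB", 10.0), ("geneA", 10.0)]
abundances = relative_abundance(example)
```

A real analysis would also need to correct for amplification and detection biases, which this sketch ignores.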
- a second specific embodiment of the invention gathers only target subsequence presence information for all target subsequences for arrayed, individual single sequence clones in a library, with cDNA libraries being preferred.
- the target subsequences are carefully chosen according to computer implemented design methods of this invention to have a maximum information content and to be minimum in number. Preferably from 10-20 subsequences are sufficient to characterize the expressed cDNA in a tissue.
- preferable recognition means are PNAs.
- Degenerate sets of longer DNA oligomers having a common, short, shared, target sequence can also be used as a
- a computer implemented gene calling step uses the pattern of hits in conjunction with a database of DNA sequences to determine which genes are present in the sample and the relative levels of expression.
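Gene calling from a pattern of hits can be sketched as a lookup from presence/absence patterns to database entries. The sketch below is illustrative only: it models a hit as exact substring occurrence on one strand, which is a simplifying assumption standing in for hybridization, and the names `hit_pattern`, `build_pattern_index`, and `call_clone`, as well as the toy sequences and target subsequences, are hypothetical.

```python
from collections import defaultdict


def hit_pattern(sequence, targets):
    """Return a tuple of booleans, one per target subsequence (hit or no hit)."""
    return tuple(target in sequence for target in targets)


def build_pattern_index(database, targets):
    """Map each hit pattern to the names of database sequences producing it."""
    index = defaultdict(list)
    for name, sequence in database.items():
        index[hit_pattern(sequence, targets)].append(name)
    return index


def call_clone(observed_pattern, index):
    """Return database sequences consistent with an observed hit pattern.

    An empty list means no known sequence produces this pattern.
    """
    return index.get(tuple(observed_pattern), [])


# Hypothetical toy database and target subsequences.
database = {
    "geneA": "ATGGAATTCCGTACGGATCCTTAA",
    "geneB": "ATGCCCGGGTTTAAGCTTGGATAA",
    "geneC": "ATGGAATTCAAGCTTCCCGGGTAA",
}
targets = ["GAATTC", "GGATCC", "AAGCTT", "CCCGGG"]
index = build_pattern_index(database, targets)
```

With k target subsequences there are at most 2^k distinct patterns, so a pattern shared by several database entries classifies the clone without determining it uniquely.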
- the embodiments of this invention preferably generate measurements that are precise, reproducible, and free of noise.
- Measurement noise in QEA™ is typically created by generation or amplification of unwanted DNA fragments, and special steps are preferably taken to avoid any such unwanted fragments.
- Measurement noise in colony calling is typically created by mis-hybridization of probes, or recognition means, to colonies. High stringency reaction conditions and DNA mimics with increased hybridization specificity may be used to minimize this noise.
- DNA mimics are polymers composed of subunits capable of specific,
- hybridization and melting on oligonucleotide arrays by using optical wave guides, Proc. Natl. Acad. Sci. USA 92: 6379-6383.
- the hybridization surface forms one surface of a light pipe or optical wave guide, and the scattering induced by these aggregated particles causes light to leak from the light pipe.
- hybridization is revealed as an illuminated spot of leaking light on a dark background.
- This latter method makes hybridization detection more rapid by eliminating the need for a washing step between the hybridization and detection steps.
- multiple probe hybridizations can be detected from one colony.
- the embodiments of the invention can be adapted to automation by eliminating non-automatable steps, such as extractions or buffer exchanges.
- the embodiments of the invention facilitate efficient analysis by permitting multiple recognition means to be tested in one reaction and by utilizing multiple, distinguishable labeling of the recognition means, so that signals may be simultaneously detected and measured.
- this labeling is by multiple fluorochromes.
- detection is preferably done by the light scattering methods with variously sized and shaped particles.
- this invention achieves rapid and economical determination of quantitative gene expression in tissue or other samples, it has considerable medical and research utility.
- diseases are recognized to have important genetic components to their etiology and development, it is becoming increasingly useful to be able to assay the genetic makeup and expression of a tissue sample.
- the presence and expression of certain genes or their particular alleles are prognostic or risk factors for disease (including disorders).
- diseases are found among the
- neurodegenerative diseases such as Huntington's disease and ataxia-telangiectasia.
- Several cancers such as
- gene expression can now be linked to specific genetic defects.
- gene expression can also determine the presence and classification of those foreign pathogens that are difficult or impossible to culture in vitro but which nevertheless express their own unique genes.
- Disease progression is reflected in changes in genetic expression of an affected tissue.
- expression of particular tumor promoter genes and lack of expression of particular tumor suppressor genes is now known to correlate with the progression of certain tumors from normal tissue, to hyperplasia, to cancer in situ, and to metastatic cancer. Return of a cell population to a normal pattern of gene expression, such as by using anti-sense technology, can correlate with tumor regression. Therefore, knowledge of gene expression in a cancerous tissue can assist in staging and classifying this disease.
- Expression information can also be used to choose and guide therapy. Accurate disease classification and staging or grading using gene expression information can assist in choosing initial therapies that are increasingly more precisely tailored to the precise disease process occurring in the particular patient. Gene expression
- a therapy is favored that results in a regression towards normal of an abnormal pattern of gene expression in an individual, while therapy which has little effect on gene expression or its progression can need modification.
- Such monitoring is now useful for cancers and will become useful for an increasing number of other diseases, such as diabetes and obesity.
- In biological research, rapid and economical assay for gene expression in tissue or other samples has numerous applications. Such applications include, but are not limited to, for example, in pathology examining tissue specific genetic response to disease, in embryology determining developmental changes in gene expression, in pharmacology assessing direct and indirect effects of drugs on gene expression.
- this invention can be applied, e.g., to in vitro cell populations or cell lines, to in vivo animal models of disease or other processes, to human samples, to purified cell populations perhaps drawn from actual wild-type occurrences, and to tissue samples
- the cell or tissue sources can advantageously be a plant, a single celled animal, a multicellular animal, a bacterium, a virus, a fungus, or a yeast, etc.
- the animal can advantageously be laboratory animals used in research, such as mice engineered or bred to have certain genomes or disease conditions or tendencies.
- the in vitro cell populations or cell lines can be exposed to various exogenous factors to determine the effect of such factors on gene expression. Further, since an unknown signal pattern is indicative of an as yet unknown gene, this invention has important use for the discovery of new genes. In medical research, by way of further example, use of the methods of this invention allow correlating gene expression with the presence and progress of a disease and thereby provide new methods of diagnosis and new avenues of therapy which seek to directly alter gene expression.
- This invention includes various embodiments and aspects, several of which are described below.
- the invention provides a method for identifying, classifying, or quantifying one or more nucleic acids in a sample comprising a plurality of nucleic acids having different nucleotide sequences, said method comprising probing said sample with one or more recognition means, each recognition means recognizing a different target nucleotide subsequence or a different set of target nucleotide subsequences; generating one or more signals from said sample probed by said recognition means, each generated signal arising from a nucleic acid in said sample and comprising a representation of (i) the length between occurrences of target subsequences in said nucleic acid and (ii) the identities of said target subsequences in said nucleic acid or the identities of said sets of target subsequences among which is included the target subsequences in said nucleic acid; and searching a nucleotide sequence database to determine sequences that match or the absence of any sequences that match said one or more generated signals
- This invention further provides in the first embodiment additional methods wherein each recognition means recognizes one target subsequence, and wherein a sequence from said database matches a generated signal when the sequence from said database has both the same length between occurrences of target subsequences as is represented by the generated signal and the same target subsequences as
- each recognition means recognizes a set of target
- a sequence from said database matches a generated signal when the sequence from said database has both the same length between occurrences of target subsequences as is represented by the generated signal, and target subsequences that are members of the sets of target subsequences represented by the generated signal.
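The matching rule stated above can be sketched as an in silico digestion followed by a comparison of lengths and end identities. This is a hedged illustration under several assumptions that are not in the source: length is measured between the start positions of adjacent recognition sites on a single strand, a signal is modeled as two sets of candidate end subsequences plus a length, fragment orientation is ignored, and an optional tolerance `tol` stands in for length measurement error. The names `predicted_signals`, `matches`, and `search_database` and the toy data are hypothetical.

```python
import re


def predicted_signals(sequence, sites):
    """List (left_site, right_site, length) for fragments between adjacent sites."""
    cuts = []
    for site in sites:
        # Zero-width lookahead so overlapping occurrences are also found.
        for match in re.finditer(f"(?={re.escape(site)})", sequence):
            cuts.append((match.start(), site))
    cuts.sort()
    return [
        (left_site, right_site, right_pos - left_pos)
        for (left_pos, left_site), (right_pos, right_site) in zip(cuts, cuts[1:])
    ]


def matches(signal, fragment, tol=0):
    """True when fragment ends fall in the signal's sets and lengths agree."""
    set_one, set_two, length = signal
    left, right, fragment_length = fragment
    ends_agree = (left in set_one and right in set_two) or (
        left in set_two and right in set_one
    )
    return ends_agree and abs(fragment_length - length) <= tol


def search_database(signal, database, sites, tol=0):
    """Return names of database sequences that can generate the signal."""
    return sorted(
        name
        for name, sequence in database.items()
        if any(matches(signal, frag, tol) for frag in predicted_signals(sequence, sites))
    )


# Hypothetical toy database and recognition sites.
database = {
    "geneA": "ATGGAATTCCGTACGGATCCTTAA",
    "geneC": "ATGGAATTCAAGCTTCCCGGGTAA",
}
sites = ["GAATTC", "GGATCC", "AAGCTT"]
signal = (frozenset({"GAATTC"}), frozenset({"GGATCC", "AAGCTT"}), 11)
hits = search_database(signal, database, sites)
```

An empty result corresponds to the case where no database sequence matches the generated signal, which the source treats as a possible indication of an as yet unknown sequence.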
- This invention further provides in the first embodiment additional methods further comprising dividing said sample of nucleic acids into a plurality of portions and performing the methods of this object individually on a plurality of said portions, wherein a different one or more recognition means are used with each portion.
- This invention further provides in the first embodiment additional methods wherein the quantitative abundance of a nucleic acid comprising a particular nucleotide sequence in the sample is determined from the quantitative level of the one or more signals generated by said nucleic acid that are determined to match said
- This invention further provides in the first embodiment additional methods wherein said plurality of nucleic acids are DNA, and optionally wherein the DNA is cDNA, and optionally wherein the cDNA is prepared from a plant, a single celled animal, a multicellular animal, a bacterium, a virus, a fungus, or a yeast, and optionally wherein the cDNA is of total cellular RNA or total cellular poly(A) RNA.
- This invention further provides in the first embodiment additional methods wherein said database comprises substantially all the known expressed sequences of said plant, single celled animal, multicellular animal, bacterium, or yeast.
- This invention further provides in the first embodiment additional methods wherein the recognition means are one or more restriction endonucleases whose recognition sites are said target subsequences, and wherein the step of probing comprises digesting said sample with said one or more restriction endonucleases into fragments and ligating double stranded adapter DNA molecules to said fragments to produce ligated fragments, each said adapter DNA molecule comprising (i) a shorter strand having no 5' terminal phosphates and consisting of a first and second portion, said first portion at the 5' end of the shorter strand being complementary to the overhang produced by one of said restriction
- step of generating further comprises melting the shorter strand from the ligated
- fragments and amplifying the blunt-ended fragments by a method comprising contacting said blunt-ended fragments with a DNA polymerase and primer oligodeoxynucleotides, said primer oligodeoxynucleotides comprising the longer adapter strand, and said contacting being at a temperature not greater than the melting temperature of the primer
- oligodeoxynucleotide from a strand of the blunt-ended fragments complementary to the primer oligodeoxynucleotide and not less than the melting temperature of the shorter strand of the adapter nucleic acid from the blunt-ended fragments.
- This invention further provides in the first embodiment additional methods wherein the recognition means are one or more restriction endonucleases whose recognition sites are said target subsequences, and wherein the step of probing further comprises digesting the sample with said one or more restriction endonucleases.
- This invention further provides in the first embodiment additional methods further comprising identifying a fragment of a nucleic acid in the sample which generates said one or more signals; and recovering said fragment, and optionally wherein the signals generated by said recovered fragment do not match a sequence in said nucleotide sequence database, and optionally further comprising using at least a hybridizable portion of said fragment as a hybridization probe to bind to a nucleic acid that can generate said fragment upon digestion by said one or more restriction endonucleases.
- This invention further provides in the first embodiment additional methods wherein the step of generating further comprises after said digesting removing from the sample both nucleic acids which have not been digested and nucleic acid fragments resulting from digestion at only a single terminus of the fragments, and optionally wherein prior to digesting, the nucleic acids in the sample are each bound at one terminus to a biotin molecule or to a hapten molecule, and said removing is carried out by a method which comprises contacting the nucleic acids in the sample with streptavidin or avidin or with an anti-hapten antibody, respectively, affixed to a solid support.
- This invention further provides in the first embodiment additional methods wherein said digesting with said one or more restriction endonucleases leaves single-stranded nucleotide overhangs on the digested ends.
- step of probing further comprises hybridizing double-stranded adapter nucleic acids with the digested sample fragments, each said adapter nucleic acid having an end complementary to said overhang generated by a particular one of the one or more restriction endonucleases, and ligating with a ligase a strand of said adapter nucleic acids to the 5' end of a strand of the digested sample fragments to form ligated nucleic acid fragments.
- This invention further provides in the first embodiment additional methods wherein said digesting with said one or more restriction endonucleases and said ligating are carried out in the same reaction medium, and optionally wherein said digesting and said ligating comprises incubating said reaction medium at a first temperature and then at a second temperature, in which said one or more restriction endonucleases are more active at the first temperature than the second temperature and said ligase is more active at the second temperature than the first temperature, or wherein said incubating at said first temperature and said incubating at said second temperature are performed repetitively.
- This invention further provides in the first embodiment additional methods wherein the step of probing further comprises prior to said digesting removing terminal phosphates from DNA in said sample by incubation with an alkaline phosphatase, and optionally wherein said alkaline phosphatase is heat labile and is heat inactivated prior to said digesting.
- This invention further provides in the first embodiment additional methods wherein said generating step comprises amplifying the ligated nucleic acid fragments, and optionally wherein said amplifying is carried out by use of a nucleic acid polymerase and primer nucleic acid strands, said primer nucleic acid strands being capable of priming nucleic acid synthesis by said polymerase, and optionally wherein the primer nucleic acid strands have a G+C content of between 40% and 60%.
- each said adapter nucleic acid has a shorter strand and a longer strand, the longer strand being ligated to the digested sample fragments
- said generating step comprises prior to said amplifying step the melting of the shorter strand from the ligated fragments, contacting the ligated fragments with a DNA polymerase, extending the ligated fragments by synthesis with the DNA polymerase to produce blunt-ended double stranded DNA fragments
- the primer nucleic acid strands comprise a hybridizable portion of the sequence of said longer strands, or optionally comprise the sequence of said longer strands, each different primer nucleic acid strand priming amplification only of blunt ended double stranded DNA
- fragments that are produced after digestion by a particular restriction endonuclease.
- endonuclease further comprises at the 3' end of and contiguous with the longer strand sequence the portion of the restriction endonuclease recognition site remaining on a nucleic acid fragment terminus after digestion by the
- each said primer specific for a particular restriction endonuclease further comprises at its 3' end one or more nucleotides 3' to and contiguous with the remaining portion of the restriction endonuclease recognition site, whereby the ligated nucleic acid fragment amplified is that comprising said remaining portion of said restriction endonuclease recognition site contiguous to said one or more additional nucleotides, and optionally such that said primers comprising a particular said one or more additional nucleotides can be
- This invention further provides in the first embodiment additional methods wherein during said amplifying step the primer nucleic acid strands are annealed to the ligated nucleic acid fragments at a temperature that is less than the melting temperature of the primer nucleic acid strands from strands complementary to the primer nucleic acid strands but greater than the melting temperature of the shorter adapter strands from the blunt-ended fragments.
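The annealing constraint above (below the primer melting temperature, above the melting temperature of the shorter adapter strand) can be checked numerically. The following is a minimal sketch only: it assumes the rough Wallace rule, Tm ≈ 2(A+T) + 4(G+C), as a stand-in melting-temperature estimate, which is not taken from this document, and the two oligonucleotide sequences and the function names `wallace_tm` and `annealing_window` are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule of thumb is only a coarse estimate for short oligos.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def annealing_window(primer, shorter_strand):
    """Return (low, high) with low < annealing temperature < high, or None if no window exists."""
    low = wallace_tm(shorter_strand)   # shorter adapter strand must melt off
    high = wallace_tm(primer)          # primer must stay annealed
    return (low, high) if low < high else None


# Hypothetical sequences, chosen only for illustration.
primer = "AGCACTCTCCAGCCTCTCACCGAA"   # 24 nucleotides
shorter = "AATTCGGTGAGA"              # 12 nucleotides
window = annealing_window(primer, shorter)
```

Any annealing temperature strictly inside the returned window satisfies the stated ordering under this estimate; a real design would use a nearest-neighbor melting model and the actual buffer conditions.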
- This invention further provides in the first embodiment additional methods wherein the recognition means are oligomers of nucleotides, nucleotide-mimics, or a combination of nucleotides and nucleotide-mimics, which are specifically hybridizable with the target subsequences, and optionally further provides additional methods wherein the step of generating comprises amplifying with a nucleic acid polymerase and with primers comprising said oligomers, whereby fragments of nucleic acids in the sample between hybridized oligomers are amplified.
- This invention further provides in the first embodiment additional methods wherein said signals further comprise a representation of whether an additional target subsequence is present on said nucleic acid in the sample between said occurrences of target subsequences, and
- said additional target subsequence is recognized by a method comprising contacting nucleic acids in the sample with oligomers of nucleotides, nucleotide-mimics, or mixed nucleotides and nucleotide-mimics, which are hybridizable with said additional target subsequence.
- This invention further provides in the first embodiment additional methods wherein the step of generating comprises suppressing said signals when an additional target subsequence is present on said nucleic acid in the sample between said occurrences of target subsequences, and
- step of generating comprises amplifying nucleic acids in the sample
- said additional target subsequence is recognized by a method comprising contacting nucleic acids in the sample with (a) oligomers of nucleotides, nucleotide-mimics, or mixed nucleotides and nucleotide-mimics, which hybridize with said additional target subsequence and disrupt the amplifying step; or (b) restriction endonucleases which have said additional target subsequence as a recognition site and digest the nucleic acids in the sample at the recognition site.
- This invention further provides in the first embodiment additional methods wherein the step of generating further comprises separating nucleic acid fragments by length, and optionally wherein the step of generating further comprises detecting said separated nucleic acid fragments, and optionally wherein said detecting is carried out by a method comprising staining said fragments with silver, labeling said fragments with a DNA intercalating dye, or detecting light emission from a fluorochrome label on said fragments.
- This invention further provides in the first embodiment additional methods wherein said separating is carried out by use of liquid chromatography, mass
- electrophoresis and optionally wherein said electrophoresis is carried out in a slab gel or capillary configuration using a denaturing or non-denaturing medium.
- This invention further provides in the first embodiment additional methods wherein a predetermined one or more nucleotide sequences in said database are of interest, and wherein the target subsequences are such that said sequences of interest generate at least one signal that is not generated by any other sequence likely to be present in the sample, and optionally wherein the nucleotide sequences of interest are a majority of sequences in said database.
- This invention further provides in the first embodiment additional methods wherein the target subsequences have a probability of occurrence in the nucleotide sequences in said database of from approximately 0.01 to approximately 0.30.
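As an illustration of the probability-of-occurrence criterion, the sketch below estimates that probability as the fraction of database sequences containing a candidate target subsequence at least once. This interpretation is an assumption, and the toy database and the names `occurrence_probability` and `in_preferred_range` are hypothetical.

```python
def occurrence_probability(target, database):
    """Fraction of database sequences that contain `target` at least once."""
    if not database:
        return 0.0
    hits = sum(1 for seq in database.values() if target in seq)
    return hits / len(database)


def in_preferred_range(p, low=0.01, high=0.30):
    """True when p lies in the approximate 0.01 to 0.30 range stated above."""
    return low <= p <= high


# Toy database for illustration only.
toy_db = {
    "seqA": "ATGGAATTCCGTA",
    "seqB": "ATGCCGTAAGCTT",
    "seqC": "GGGAAGCTTCCC",
    "seqD": "TTTTCCCCAAAA",
}
p_first = occurrence_probability("GAATTC", toy_db)   # found in seqA only
p_second = occurrence_probability("AAGCTT", toy_db)  # found in seqB and seqC
```

In this toy case the first candidate falls inside the range and the second does not, so only the first would be retained under the stated criterion.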
- This invention further provides in the first embodiment additional methods wherein the target subsequences are such that the majority of sequences in said database contain on average a sufficient number of occurrences of target subsequences in order to on average generate a signal that is not generated by any other nucleotide sequence in said database, and optionally wherein the number of pairs of target subsequences present on average in the majority of sequences in said database is no less than 3, and wherein the average number of signals generated from the sequences in said database is such that the average difference between lengths represented by the generated signals is greater than or equal to 1 base pair.
- This invention further provides in the first embodiment additional methods wherein the target subsequences are selected according to the further steps comprising determining a pattern of signals that can be generated and the sequences capable of generating each such signal by simulating the steps of probing and generating applied to each sequence in said database of nucleotide sequences;
- choosing step selects target subsequences which comprise the recognition sites of the one or more restriction
- choosing step selects target subsequences which comprise the recognition sites of the one or more restriction endonucleases contiguous with one or more additional nucleotides.
- This invention further provides in the first embodiment additional methods wherein a predetermined one or more of the nucleotide sequences present in said database of nucleotide sequences are of interest, and the information measure optimized is the number of such said sequences of interest which generate at least one signal that is not generated by any other nucleotide sequence present in said database, and optionally wherein said nucleotide sequences of interest are a majority of the nucleotide sequences present in said database.
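The information measure described above, the number of sequences of interest that generate at least one signal generated by no other database sequence, can be sketched as follows. The representation of a signal as a `(left target, right target, length)` tuple, the function name `information_measure`, and the example data are assumptions made for illustration.

```python
def information_measure(signals_by_sequence, of_interest):
    """Count sequences of interest having at least one signal no other sequence generates."""
    # Record which sequences can generate each signal.
    owners = {}
    for name, signals in signals_by_sequence.items():
        for sig in set(signals):
            owners.setdefault(sig, set()).add(name)
    # A sequence counts when one of its signals is owned by that sequence alone.
    return sum(
        1
        for name in of_interest
        if any(owners[sig] == {name} for sig in signals_by_sequence.get(name, ()))
    )


# Hypothetical simulated signals: (left target, right target, length).
simulated = {
    "g1": [("E", "H", 120), ("H", "E", 75)],
    "g2": [("E", "H", 120)],
    "g3": [("E", "E", 300)],
}
score = information_measure(simulated, ["g1", "g2", "g3"])
```

Here `g1` and `g3` each have a signal of their own, while the only signal of `g2` is shared with `g1`, so `g2` does not contribute to the measure.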
- step of choosing target subsequences is by a method comprising simulated annealing.
- step of searching further comprises determining a pattern of signals that can be generated and the sequences capable of generating each such signal by simulating the steps of probing and generating applied to each sequence in said database of nucleotide sequences; and finding the one or more nucleotide sequences in said database that are able to generate said one or more generated signals by finding in said pattern those signals that comprise a representation of (i) the same lengths between occurrences of target subsequences as is represented by the generated signal and (ii) the same target subsequences as is represented by the generated signal, or target
- This invention further provides in the first embodiment additional methods wherein the step of determining further comprises searching for occurrences of said target subsequences or sets of target subsequences in nucleotide sequences in said database of nucleotide sequences; finding the lengths between occurrences of said target subsequences or sets of target subsequences in the nucleotide sequences of said database; and forming the pattern of signals that can be generated from the sequences of said database in which the target subsequences were found to occur.
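The three determining steps above (search for occurrences, find the lengths between them, form the pattern) can be sketched as below. This is an illustrative simplification: length is taken as the start-to-start distance between adjacent occurrences, which is an assumption, and the names `find_occurrences` and `build_pattern` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def find_occurrences(seq, targets):
    """Sorted (position, target) pairs for every occurrence of every target in seq."""
    hits = []
    for target in targets:
        start = seq.find(target)
        while start != -1:
            hits.append((start, target))
            start = seq.find(target, start + 1)
    return sorted(hits)


def build_pattern(database, targets):
    """Map each signal (left target, right target, length) to the sequences able to generate it."""
    pattern = defaultdict(set)
    for name, seq in database.items():
        hits = find_occurrences(seq, targets)
        # Adjacent occurrences bound one fragment; record its end identities and length.
        for (p1, t1), (p2, t2) in zip(hits, hits[1:]):
            pattern[(t1, t2, p2 - p1)].add(name)
    return dict(pattern)


toy_db = {
    "s1": "AAGAATTCTTTTAAGCTTAA",
    "s2": "CCGAATTCGGGGGGAAGCTT",
}
pattern = build_pattern(toy_db, ["GAATTC", "AAGCTT"])
```

Because the pattern is keyed by signal, looking up an observed signal returns the candidate sequences directly.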
- This invention further provides in the first embodiment additional methods wherein said restriction endonucleases generate 5' overhangs at the terminus of digested fragments and wherein each double stranded adapter nucleic acid comprises a shorter nucleic acid strand
- This invention further provides in the first embodiment additional methods wherein said shorter strand has a melting temperature from a complementary strand of less than approximately 68°C, and has no terminal phosphate, and optionally wherein said shorter strand is approximately 12 nucleotides long.
- This invention further provides in the first embodiment additional methods wherein said longer strand has a melting temperature from a complementary strand of greater than approximately 68°C, is not complementary to any
- nucleotide sequence in said database has no terminal phosphate, and optionally wherein said ligated nucleic acid fragments do not contain a recognition site for any of said restriction endonucleases, and optionally wherein said longer strand is approximately 24 nucleotides long and has a G+C content between 40% and 60%.
- This invention further provides in the first embodiment additional methods wherein said one or more restriction endonucleases are heat inactivated before said ligating.
- This invention further provides in the first embodiment additional methods wherein said restriction endonucleases generate 3' overhangs at the terminus of the digested fragments and wherein each double stranded adapter nucleic acid comprises a longer nucleic acid strand
- consisting of a first and second contiguous portion, said first portion being a 3' end subsequence complementary to the overhang produced by one of said restriction endonucleases; and a shorter nucleic acid strand complementary to the 3' end of said second portion of the longer nucleic acid strand.
- This invention further provides in the first embodiment additional methods wherein said shorter strand has a melting temperature from said longer strand of less than approximately 68°C, and has no terminal phosphates, and optionally wherein said shorter strand is 12 base pairs long.
- This invention further provides in the first embodiment additional methods wherein said longer strand has a melting temperature from a complementary strand of greater than approximately 68 °C, is not complementary to any
- nucleotide sequence in said database has no terminal phosphate, and wherein said ligated nucleic acid fragments do not contain a recognition site for any of said restriction endonucleases, and optionally wherein said longer strand is 24 base pairs long and has a G+C content between 40% and 60%.
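Some of the longer-strand constraints stated above (a length of 24, a G+C content between 40% and 60%, and no restriction endonuclease recognition site) lend themselves to a simple automated check. The sketch below is a hedged illustration: it covers only those three properties, it tests the strand itself rather than the full ligated fragment, and the function names and example sequences are hypothetical.

```python
def gc_fraction(oligo):
    """Fraction of G and C bases in an oligonucleotide (0.0 for an empty string)."""
    s = oligo.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def check_longer_strand(oligo, sites=(), length=24, gc_low=0.40, gc_high=0.60):
    """Return a list of problems found; an empty list means all checks passed."""
    s = oligo.upper()
    problems = []
    if len(s) != length:
        problems.append("length is %d, expected %d" % (len(s), length))
    if not gc_low <= gc_fraction(s) <= gc_high:
        problems.append("G+C fraction %.2f is outside the allowed range" % gc_fraction(s))
    for site in sites:
        if site in s:
            problems.append("contains recognition site " + site)
    return problems


# Hypothetical candidate strands.
ok_report = check_longer_strand("AGCACTCTCCAGCCTCTCACCGAA", sites=("GAATTC", "AAGCTT"))
bad_report = check_longer_strand("GAATTCGCGCATATGCGCATATGC", sites=("GAATTC",))
```

The melting-temperature and database-complementarity conditions are deliberately left out, since they need a thermodynamic model and a database search respectively.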
- the invention provides a method for identifying or classifying a nucleic acid
- each recognition means recognizing a target nucleotide subsequence or a set of target nucleotide subsequences, in order to generate a set of signals, each signal representing whether said target subsequence or one of said set of target subsequences is present or absent in said nucleic acid; and searching a nucleotide sequence database, said database comprising a plurality of known nucleotide sequences of nucleic acids that may be present in the sample, for sequences matching said generated set of signals, a sequence from said database matching a set of signals when the sequence from said database (i) comprises the same target subsequences as are represented as present, or comprises target subsequences that are members of the sets of target subsequences represented as present by the generated sets of signals and (ii) does not comprise the target subsequences represented as absent or that are members of the sets of target subsequences represented
- This invention further provides in the second embodiment additional methods wherein the step of probing generates quantitative signals of the numbers of occurrences of said target subsequences or of members of said set of target subsequences in said nucleic acid, and optionally wherein a sequence matches said generated set of signals when the sequence from said database comprises the same target subsequences with the same number of occurrences in said sequence as in the quantitative signals and does not comprise the target subsequences represented as absent or target subsequences within the sets of target subsequences
- This invention further provides in the second embodiment additional methods wherein said plurality of nucleic acids are DNA.
- This invention further provides in the second embodiment additional methods wherein the recognition means are detectably labeled oligomers of nucleotides, nucleotide-mimics, or combinations of nucleotides and nucleotide-mimics, and the step of probing comprises hybridizing said nucleic acid with said oligomers, and optionally wherein said
- the recognition means are oligomers of peptido-nucleic acids
- the recognition means are DNA oligomers, DNA oligomers comprising universal nucleotides, or sets of partially degenerate DNA oligomers.
- step of searching further comprises determining a pattern of sets of signals of the presence or absence of said target subsequences or said sets of target subsequences that can be generated and the sequences capable of generating each set of signals in said pattern by simulating the step of probing as applied to each sequence in said database of nucleotide sequences; and finding one or more nucleotide sequences that are capable of generating said generated set of signals by finding in said pattern those sets that match said generated set, where a set of signals from said pattern matches a generated set of signals when the set from said pattern (i) represents as present the same target subsequences as are represented as present or target subsequences that are members of the sets of target subsequences represented as present by the
- This invention further provides in the second embodiment additional methods wherein the target subsequences are selected according to the further steps comprising determining (i) a pattern of sets of signals representing the presence or absence of said target subsequences or of said sets of target subsequences that can be generated, and (ii) the sequences capable of generating each set of signals in said pattern by simulating the step of probing as applied to each sequence in said database of nucleotide sequences;
- This invention further provides in the second embodiment additional methods wherein the information measure is the number of sets of signals in the pattern which are capable of being generated by one or more sequences in said database, or optionally wherein the information measure is the number of sets of signals in the pattern which are capable of being generated by only one sequence in said database.
- This invention further provides in the second embodiment additional methods wherein said choosing step is by a method comprising exhaustive search of all combinations of target subsequences of length less than approximately 10, or optionally wherein said choosing step is by a method comprising simulated annealing.
- This invention further provides in the second embodiment additional methods wherein the step of determining by simulating further comprises searching for the presence or absence of said target subsequences or sets of target subsequences in each nucleotide sequence in said database of nucleotide sequences; and forming the pattern of sets of signals that can be generated from said sequences in said database, and optionally where the step of searching is carried out by a string search, and optionally wherein the step of searching comprises counting the number of
- This invention further provides in the second embodiment additional methods wherein the target subsequences have a probability of occurrence in a nucleotide sequence in said database of nucleotide sequences of from 0.01 to 0.6, or optionally wherein the target subsequences are such that the presence of one target subsequence in a nucleotide sequence in said database of nucleotide sequences is substantially independent of the presence of any other target subsequence in the nucleotide sequence, or optionally wherein fewer than approximately 50 target subsequences are selected.
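For the second embodiment, where each signal records only the presence or absence of a target subsequence, the database search reduces to comparing presence/absence patterns. The following is a minimal sketch using a plain string search; the names `presence_signature` and `match_database`, the target subsequences, and the toy database are hypothetical.

```python
def presence_signature(seq, targets):
    """Tuple of booleans, True where the corresponding target subsequence occurs in seq."""
    return tuple(target in seq for target in targets)


def match_database(observed, database, targets):
    """Names of database sequences whose simulated signature equals the observed one."""
    return sorted(
        name
        for name, seq in database.items()
        if presence_signature(seq, targets) == tuple(observed)
    )


targets = ["ACGT", "GGCC", "TTAA"]
toy_db = {
    "x": "AAACGTGGCCAA",   # ACGT and GGCC present, TTAA absent
    "y": "TTAACGTTT",      # ACGT and TTAA present, GGCC absent
    "z": "GGCCGGCC",       # only GGCC present
}
candidates = match_database((True, False, True), toy_db, targets)
```

A sequence matches only if every target reported present is found and every target reported absent is not, which is why equality of the whole tuple is used.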
- the invention provides a programmable apparatus for analyzing signals comprising an inputting device for inputting one or more actual signals generated by probing a sample comprising a plurality of nucleic acids with recognition means, each recognition means recognizing a target nucleotide subsequence or a set of target nucleotide subsequences, said signals comprising a representation of (i) the length between occurrences of said target subsequences in a nucleic acid of said sample, and (ii) the identities of said target subsequences in said nucleic acid, or the identities of said sets of target subsequences among which is included the target subsequences in said nucleic acid; a searching device operatively coupled to said accepting device for searching a sequence in a nucleotide sequence database for occurrences of said target subsequences or target subsequences that are members of said sets of target subsequences, and for the length between such occurrence
- control device further comprises causing said searching device to search substantially all sequences in said database in order to determine a pattern of signals that can be generated by probing said sample with said recognition means, and wherein said control device further causes said comparing device to find any matches between said one or more actual signals and said pattern of signals, said one or more actual signals matching a signal in said pattern of signals when the signal from said pattern represents (i) the same length between occurrences of target subsequences as is represented by said one or more actual signals and (ii) the same target subsequences as is represented by said one or more actual signals or target subsequences that are members of the same sets of target subsequences represented by said one or more actual signals.
- sample of nucleic acids comprises cDNA from RNA of a cell or tissue type
- database comprises DNA sequences that are likely to be expressed by said cell or tissue type.
- This invention further provides in the third embodiment a computer readable memory that can be used to direct a programmable apparatus to function for analyzing signals according to steps comprising inputting one or more actual signals generated by probing a sample comprising a plurality of nucleic acids with recognition means, each recognition means recognizing a target nucleotide subsequence or a set of target nucleotide subsequences, said signals comprising a representation of (i) the length between
- nucleotide sequence database for occurrences of said target subsequences or target subsequences that are members of said sets of target subsequences, and for the length between such occurrences, said database comprising a plurality of known nucleotide sequences that may be present in said sample; matching said one or more actual signals and a sequence in said database when the sequence in said
- database has both (i) the same length between occurrences of target subsequences as is represented by said one or more actual signals and (ii) the same target subsequences as is represented by said one or more actual signals, or target subsequences that are members of the same sets of target subsequences as is represented by said one or more actual signals; and repetitively performing said searching and matching steps for the majority of sequences in the database and outputting those database sequences that match said one or more actual signals, or alternatively a computer readable memory for directing a programmable apparatus to function in the manner of the third object.
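The searching and matching steps above, including the case where a signal identifies only a set of target subsequences rather than a single one, can be sketched as follows. This is an illustration under stated assumptions: each set is given a hypothetical label (`dye1`, `dye2`), length is the start-to-start distance between adjacent occurrences, and the function names and toy sequences are invented for the example.

```python
import re


def candidate_signals(seq, target_sets):
    """Return the set of (left label, right label, length) signals that seq can generate."""
    hits = []
    for label, members in target_sets.items():
        for member in members:
            # A zero-width lookahead also reports overlapping occurrences.
            for m in re.finditer("(?=%s)" % re.escape(member), seq):
                hits.append((m.start(), label))
    hits.sort()
    return {(a, b, q - p) for (p, a), (q, b) in zip(hits, hits[1:])}


def match_signals(actual, database, target_sets):
    """Names of database sequences able to generate at least one of the actual signals."""
    wanted = set(actual)
    return sorted(
        name for name, seq in database.items()
        if candidate_signals(seq, target_sets) & wanted
    )


target_sets = {"dye1": ["GAATTC"], "dye2": ["AAGCTT", "GGATCC"]}
toy_db = {
    "s1": "AAGAATTCTTTTAAGCTTAA",
    "s2": "CCGAATTCGGGGGGAAGCTT",
    "s3": "GAATTCAAAAGGATCCTT",
}
matches = match_signals([("dye1", "dye2", 10)], toy_db, target_sets)
```

Two sequences match here because the `dye2` set does not distinguish its two members, which shows how set-level signals can leave a residual ambiguity.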
- the invention provides a programmable apparatus for selecting target subsequences comprising an initial selection device for selecting initial target subsequences or initial sets of target subsequences; a first control device; a search device operatively coupled to said initial selection device and to said first control device (i) for searching sequences in a nucleotide sequence database for occurrences of said initial target subsequences or occurrences of target subsequences that are members of said initial sets of target subsequences and for the length between such occurrences and (ii) for determining an initial pattern of signals that can be generated from said selected initial target subsequences or said initial sets of target subsequences, said database comprising a plurality of known nucleotide sequences, said signals comprising a
- ascertaining device operatively coupled to said searching device and to said first control device for ascertaining the value of said determined initial pattern according to an information measure; and wherein said first control device causes further target subsequences to be selected and causes the search device to determine a further pattern of signals and the ascertaining device to ascertain a further value of said information measure and accepts the further target subsequences when said further pattern optimizes said further value of said information measure.
- This invention further provides in the fourth object that a predetermined one or more of the sequences in said database are of interest, and wherein said ascertaining device ascertains the value of an information measure by counting the number of such sequences of interest which generate in said determined pattern at least one signal that is not generated by any other sequence in said database, and optionally that said one or more of the sequences of interest comprise substantially all the sequences in said database.
- said first control device selects further target subsequences of length less than approximately 10 and accepts the further target subsequences if said further value of said information measure is greater than the previous value.
- This invention further provides in the fourth embodiment that said first control device optimizes the value of said information measure according to a method comprising simulated annealing, wherein said first control device repeatedly selects further target subsequences and accepts the further target subsequences if said further value of said information measure is not decreased by greater than a probabilistic factor dependent on a simulated-temperature, and wherein said programmable apparatus further comprises a second control device operatively coupled to said first control device for decreasing said simulated-temperature as said first control device selects further target
- said probabilistic factor is an exponential function of the negative of the decrease in the information measure divided by said
- This invention further provides in the fourth embodiment that the database comprises a majority of known DNA sequences that are likely to be expressed by one or more cell types.
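The simulated-annealing selection described for this embodiment can be sketched with the usual acceptance rule: a proposal that does not decrease the information measure is always accepted, and a decrease is accepted with probability exp(-decrease / temperature). The cooling schedule, the parameter values, and the names `accept`, `anneal`, `propose`, and `measure` are assumptions for illustration; in practice `measure` would be the information measure and `propose` would alter the set of target subsequences.

```python
import math
import random


def accept(previous, further, temperature, rng=random):
    """Accept improvements always; accept a decrease with probability exp(-decrease / T)."""
    decrease = previous - further
    if decrease <= 0:
        return True
    return rng.random() < math.exp(-decrease / temperature)


def anneal(initial, propose, measure, t_start=1.0, t_end=0.01,
           cooling=0.95, steps_per_t=20, seed=0):
    """Maximize measure(state); the geometric cooling schedule is an assumption."""
    rng = random.Random(seed)
    current, current_val = initial, measure(initial)
    best, best_val = current, current_val
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = propose(current, rng)
            value = measure(candidate)
            if accept(current_val, value, t, rng):
                current, current_val = candidate, value
                if value > best_val:
                    best, best_val = candidate, value
        t *= cooling   # the second control device lowers the simulated temperature
    return best, best_val


# Toy stand-in problem: maximize -(x - 3)^2 over the integers.
best, best_val = anneal(
    0,
    propose=lambda x, rng: x + rng.choice((-1, 1)),
    measure=lambda x: -(x - 3) ** 2,
)
```

Tracking the best state separately guarantees that the returned value is never worse than the starting value, even when a late downhill move is accepted.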
- a nucleotide sequence database searching a sequence in a nucleotide sequence database for occurrences of said initial target subsequences or occurrences of target subsequences that are members of said initial sets of target subsequences and for the length between such occurrences, said database comprising a
- the invention provides a programmable apparatus for displaying data comprising a selecting device for selecting target subsequences or sets of target subsequences, such that recognition means for
- recognizing said target subsequences or said sets of target subsequences can be used to generate signals by probing a sample comprising a plurality of nucleic acids, said signals comprising a representation of (i) the length between
- an inputting device for inputting one or more actual signals generated by probing said sample with said recognition means; an analyzing device for analyzing signals operatively coupled to said selecting and inputting devices that determines which sequences in a nucleotide sequence database can generate said actual signals when subject to said recognition means, said database comprising a plurality of known nucleotide sequences that may be present in said sample; an input/output device operatively coupled to said selecting, inputting, and analyzing devices that inputs user requests and controls the selecting device to select target subsequences or sets of target subsequences, controls the inputting device to accept actual signals, controls the analyzing device to find the sequences in said database that can generate said actual signals, and displays output comprising said actual signals and said sequences in said database that can generate said actual signals.
- This invention further provides in the fifth embodiment that said sample is a cDNA sample prepared from a tissue specimen, and the apparatus further comprises a storage device operatively coupled to the input/output device for storing indications of the origin of said tissue specimen and information concerning said tissue specimen, and wherein said indications can be displayed upon user input, and optionally that the indications and information concerning said tissue specimen comprises histological information comprising tissue images.
- This invention further provides in the fifth embodiment additional apparatus further comprising one or more instrument devices for probing said sample with said recognition means and for generating said actual signals; and a control device operatively coupled to said one or more instrument devices and to said input/output device for controlling the operation of said instrument devices, wherein said user can input control commands for control of said instrument devices and receive output concerning the status of said instrument devices, and optionally wherein one or more of said selecting, inputting, analyzing, and input/output devices are physically collocated with each other, or are physically spaced apart from each other and are connected by a communication medium for exchanges of commands and information.
- recognition means for recognizing said target subsequences or said sets of target subsequences can be used to generate signals by probing a sample comprising a plurality of nucleic acids, said signals comprising a representation of (i) the length between occurrences of said target subsequences in a nucleic acid of said sample and (ii) the identities of said target subsequences in said nucleic acid or the identities of said sets of target subsequences among which are included the target subsequences in said nucleic acid inputting one or more actual signals generated by probing said sample with said recognition means analyzing said one or more actual signals to determine which sequences in a nucleotide sequence database can generate said actual signals when subject to said recognition means, said database comprising a plurality of known nucleotide sequences that may be present in said sample; and inputting user requests to control said selecting step to select target subsequences or sets
- said inputting step to input actual signals
- said analyzing step to find the sequences in said
- the invention provides a method for identifying, classifying, or quantifying DNA molecules in a sample of DNA molecules having a plurality of different nucleotide sequences, the method comprising the steps of digesting said sample with one or more restriction endonucleases, each said restriction endonuclease recognizing a subsequence recognition site and digesting DNA at said recognition site to produce fragments with 5' overhangs;
- each said shorter oligodeoxynucleotide hybridizable with a said 5' overhang and having no terminal phosphates, each said longer oligodeoxynucleotide
- each said primer oligodeoxynucleotide having a sequence comprising that of one of the longer oligodeoxynucleotides; determining the length of the
- said database comprising a plurality of known DNA sequences that may be present in the sample, for sequences matching one or more of said fragments of determined length, a sequence from said database matching a fragment of
- This invention further provides in the sixth embodiment additional methods wherein the sequence of each primer oligodeoxynucleotide further comprises 3' to and contiguous with the sequence of the longer
- oligodeoxynucleotide the portion of the recognition site of said one or more restriction endonucleases remaining on a DNA fragment terminus after digestion, said remaining portion being 5' to and contiguous with one or more additional nucleotides, and wherein a sequence from said database matches a fragment of determined length when the sequence from said database comprises subsequences that are the recognition sites of said one or more restriction
- endonucleases contiguous with said one or more additional nucleotides and when the subsequences are spaced apart by the determined length.
- This invention further provides in the sixth embodiment additional methods wherein said determining step further comprises detecting the amplified DNA fragments by a method comprising staining said fragments with silver.
- oligodeoxynucleotide primers are detectably labeled, wherein the determining step further comprises detection of said detectable labels, and wherein a sequence from said database matches a fragment of determined length when the sequence from said database comprises recognition sites of the one or more restriction endonucleases, said recognition sites being identified by the detectable labels of said
- determining step further comprises detecting the amplified DNA fragments by a method comprising labeling said fragments with a DNA intercalating dye or detecting light emission from a fluorochrome label on said fragments.
- This invention further provides in the sixth embodiment additional steps further comprising, prior to said determining step, the step of hybridizing the amplified DNA fragments with a detectably labeled oligodeoxynucleotide complementary to a subsequence, said subsequence differing from said recognition sites of said one or more restriction endonucleases, wherein the determining step further comprises detecting said detectable label of said oligodeoxynucleotide, and wherein a sequence from said database matches a fragment of determined length when the sequence from said database further comprises said subsequence between the recognition sites of said one or more restriction endonucleases.
- Acc65I and HindIII Acc65I and NgoMI
- BamHI and EcoRI BglII and HindIII
- BglII and NgoMI BsiWI and BspHI
- BspHI and BstYI BspHI and NgoMI
- BsrGI and EcoRI EagI and EcoRI
- EagI and HindIII EagI and NcoI
- HindIII and NgoMI NgoMI and NheI
- NgoMI and SpeI BglII and BspHI, Bsp120I and NcoI
- BssHII and NgoMI EcoRI and HindIII
- NgoMI and XbaI or wherein the step of ligating is performed with T4 DNA ligase.
- This invention further provides in the sixth embodiment additional methods wherein the steps of digesting, contacting, and ligating are performed simultaneously in the same reaction vessel, or optionally wherein the steps of digesting, contacting, ligating, extending, and amplifying are performed in the same reaction vessel.
- This invention further provides in the sixth embodiment additional methods wherein the step of determining the length is performed by electrophoresis.
- digesting with said one or more restriction endonucleases contacting, ligating, extending, amplifying, and determining applied to each sequence in said DNA database; and finding the sequences that are capable of generating said one or more fragments of determined length by finding in said pattern one or more fragments that have the same length and recognition sites as said one or more fragments of determined length.
- This invention further provides in the sixth embodiment additional methods wherein the steps of digesting and ligating go substantially to completion.
- the DNA sample is cDNA prepared from mRNA, and optionally wherein the DNA is of RNA from a tissue or a cell type derived from a plant, a single celled animal, a multicellular animal, a bacterium, a virus, a fungus, a yeast, or a mammal, and optionally wherein the mammal is a human, and optionally wherein the mammal is a human having or suspected of having a diseased condition, and optionally wherein the diseased condition is a malignancy.
- this invention provides additional methods for identifying, classifying, or
- each said restriction endonuclease recognizing a subsequence recognition site and digesting DNA to produce fragments with 3' overhangs; contacting said fragments with shorter and longer oligodeoxynucleotides, each said longer oligodeoxynucleotide consisting of a first and second contiguous portion, said first portion being a 3' end subsequence complementary to the overhang produced by one of said restriction endonucleases, each said shorter
- DNA sequence database comprising a plurality of known DNA sequences that may be present in the sample, for sequences matching one or more of said fragments of determined length, a sequence from said database matching a fragment of determined length when the sequence from said database comprises recognition sites of said one or more restriction endonucleases spaced apart by the determined length, whereby DNA sequences in said sample are identified, classified, or quantified.
- this invention provides additional methods of detecting one or more differentially expressed genes in an in vitro cell exposed to an exogenous factor relative to an in vitro cell not exposed to said exogenous factor comprising performing the methods of the first embodiment of this invention wherein said plurality of nucleic acids comprises cDNA of RNA of said in vitro cell exposed to said exogenous factor; performing the methods of the first embodiment of this invention wherein said plurality of nucleic acids comprises cDNA of RNA of said in vitro cell not exposed to said exogenous factor; and comparing the identified, classified, or quantified cDNA of said in vitro cell exposed to said exogenous factor with the identified, classified, or quantified cDNA of said in vitro cell not exposed to said exogenous factor, whereby differentially expressed genes are identified, classified, or quantified.
- this invention provides additional methods of detecting one or more differentially expressed genes in a diseased tissue relative to a tissue not having said disease comprising performing the methods of the first embodiment of this invention wherein said plurality of nucleic acids comprises cDNA of RNA of said diseased tissue such that one or more cDNA molecules are identified,
- said plurality of nucleic acids comprises cDNA of RNA of said tissue not having said disease such that one or more cDNA molecules are
- finding cDNA molecules which are reproducibly expressed and said significant differences in expression of said cDNA molecules in said diseased tissue and in said tissue not having the disease are determined by a method comprising applying statistical measures
- This invention further provides in the ninth embodiment additional methods wherein the diseased tissue and the tissue not having the disease are from one or more mammals, and optionally, wherein the disease is a malignancy, and optionally wherein the disease is a malignancy selected from the group consisting of prostate cancer, breast cancer, colon cancer, lung cancer, skin cancer, lymphoma, and
- This invention further provides in the ninth embodiment additional methods wherein the disease is a malignancy and the tissue not having the disease has a premalignant character.
- this invention provides methods of staging or grading a disease in a human individual comprising performing the methods of the first embodiment of this invention in which said plurality of nucleic acids comprises cDNA of RNA prepared from a tissue from said human individual, said tissue having or suspected of having said disease, whereby one or more said cDNA molecules are
- this invention provides additional methods for predicting a human patient's response to therapy for a disease, comprising performing the methods of the first embodiment of this invention in which said plurality of nucleic acids comprises cDNA of RNA prepared from a tissue from said human patient, said tissue having or suspected of having said disease, whereby one or more cDNA molecules in said sample are identified, classified, and/or quantified; and ascertaining if the one or more cDNA
- molecules thereby identified, classified, and/or quantified correlates with a poor or a favorable response to one or more therapies, and optionally which further comprises selecting one or more therapies for said patient for which said
- identified, classified, and/or quantified cDNA molecules correlates with a favorable response.
- this invention provides additional methods for evaluating the efficacy of a therapy in a mammal having a disease, the method comprising
- identified, classified, and/or quantified cDNA molecules of said mammal subsequent to therapy and determining whether the response to therapy is favorable or unfavorable according to whether any differences in the one or more identified, classified, and/or quantified cDNA molecules after therapy are correlated with regression or progression, respectively, of the disease, and optionally wherein the mammal is a human.
- this invention provides a kit comprising one or more containers having one or more restriction endonucleases; one or more containers having one or more shorter oligodeoxynucleotide strands; one or more containers having one or more longer oligodeoxynucleotide strands hybridizable with said shorter strands, wherein either the longer or the shorter oligodeoxynucleotide strands each comprise a sequence complementary to an overhang
- said instructions comprising (i) digest said sample with said restriction endonucleases into fragments, each fragment being terminated on each end by a recognition site of said one or more restriction
- said database comprising a plurality of known nucleotide sequences of nucleic acids that may be present in the sample, a sequence from said database matching a generated signal when the sequence from said database has both (i) the same length between occurrences of said recognition sites of said one or more restriction endonucleases as is represented by the generated signal and (ii) the same recognition sites of said one or more restriction endonucleases as is represented by the generated signal.
- endonucleases generate 5' overhangs at the terminus of digested fragments, wherein each said shorter
- oligodeoxynucleotide strand consists of a first and second contiguous portion, said first portion being a 5' end
- each said longer oligodeoxynucleotide strand comprises a 3' end subsequence complementary to said second portion of said shorter
- each said longer oligodeoxynucleotide strand consists of a first and second contiguous portion, said first portion being a 3' end subsequence complementary to the overhang produced by one of said restriction endonucleases, and wherein each said shorter oligodeoxynucleotide strand is complementary to the 3' end of said second portion of said longer oligodeoxynucleotide strand.
- This invention further provides in the thirteenth embodiment a kit wherein said instructions further comprise those signals expected from one or more DNA molecules of interest when said sample is digested with a particular one or more restriction endonucleases selected from among said one or more restriction endonucleases in said kit, and optionally wherein said one or more DNA molecules of interest are cDNA molecules differentially expressed in a disease condition.
- This invention further provides in the thirteenth embodiment a kit wherein the restriction endonucleases are selected from the group consisting of Acc65I, AflII, AgeI, ApaLI, ApoI, AscI, AvrI, BamHI, BclI, BglII, BsiWI, Bsp120I, BspEI, BspHI, BsrGI, BssHII, BstYI, EagI, EcoRI, HindIII, MluI, NcoI, NgoMI, NheI, NotI, SpeI, and XbaI.
- the restriction endonucleases are selected from the group consisting of Acc65I, AflII, AgeI, ApaLI, ApoI, AscI, AvrI, BamHI, BclI, BglII, BsiWI, Bsp120I, BspEI, BspHI, BsrGI, BssHII, BstY
- This invention further provides in the thirteenth embodiment a kit further comprising the computer readable memory of claim 106, or optionally further comprising the computer readable memory of claim 114, or optionally further comprising the computer readable memory of claim 122.
- This invention further provides in the thirteenth embodiment a kit further comprising in a container a DNA ligase, or optionally further comprising in a container a phosphatase capable of removing terminal phosphates from a DNA sequence.
- This invention further provides in the thirteenth embodiment a kit further comprising one or more primers, each said primer consisting of a single stranded
- each of said one or more primers further comprises (a) a first subsequence that is the portion of the recognition site of one of said one or more restriction endonucleases
- This invention further provides in the thirteenth embodiment a kit wherein said instructions further comprise: detect such of said fragments digested on each end by a method comprising staining said fragments with silver, labeling said fragments with a DNA intercalating dye, or detecting light emission from a fluorochrome label on said fragments.
- This invention further provides in the thirteenth embodiment a kit further comprising reagents for performing a cDNA sample preparation step; reagents for performing a step of digestion by one or more restriction endonucleases;
- reagents for performing a ligation step and reagents for performing a PCR amplification step.
- Fig. 1 illustrates exemplary results of the signals generated by QEA™ methods of this invention
- Figs. 2A, 2B, and 2C illustrate DNA adapters for an RE/ligation implementation of QEA™ methods of this invention, where the restriction endonucleases generate 5' overhangs, open blocks indicating strands of DNA;
- Figs. 3A and 3B illustrate the DNA adapters for an
- Figs. 4A, 4B, and 4C illustrate an exemplary biotin alternative embodiment of QEA™ methods
- Fig. 5 illustrates the DNA primers for a PCR embodiment of QEA™ methods
- Figs. 6A and 6B illustrate a method for DNA sequence database selection according to this invention
- Fig. 7 illustrates an exemplary experimental description for QEA™ embodiments of this invention
- Figs. 8A and 8B illustrate an overview of a method for determining a simulated database of experimental results for QEA™ embodiments of this invention
- Fig. 9 illustrates the detail of a method for simulating a QEA™ reaction
- Figs. 10A-F illustrate exemplary results of the action of the method of Fig. 9;
- Fig. 11 illustrates the detail of a method for
- Figs. 12A, 12B, and 12C illustrate an exemplary computer system apparatus, and an alternative embodiment, implementing methods of this invention
- Fig. 13A illustrates exemplary detail of an experimental design method for QEA™ and CC embodiments of this invention
- Fig. 13B illustrates exemplary detail of an experimental design method for a QEA™ embodiment of this invention
- Fig. 14 illustrates an exemplary method for ordering the DNA sequences found to be likely causes of a QEA™ signal in the order of their likely presence in the sample
- Fig. 15 illustrates the detail of a method for
- Figs. 16A, 16B, 16C, and 16D illustrate exemplary reaction temperature profiles for preferred manual and automated implementations of a preferred RE embodiment of a QEA™ method
- Figs. 17A-F illustrate the SEQ-QEA™ alternative
- nucleotide or gene sequence full or partial, as well as many components of genomic DNA, it is not necessary to determine the actual, complete nucleotide sequences.
- Full sequences provide far more information than is needed to merely classify or determine a sequence
- the human genome it is known that there are approximately $10^{5}$ expressed genes. Since the average length of a coding sequence is approximately 2000 nucleotides, the total number of possible sequences is approximately $4^{2000}$, or about $10^{1200}$. The actual number of expressed human genes is an unimaginably small fraction ($10^{-1195}$) of the total number of possible DNA sequences. Even sequencing a 50 bp fragment of a cDNA sequence generates about $10^{25}$ times more information than is needed for classification of that sequence.
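- The orders of magnitude quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function names are illustrative and not part of this disclosure, and the constants are the approximate figures stated in the text.

```python
import math

# Approximate figures quoted in the text (not measured here).
EXPRESSED_GENES = 10**5   # approximate number of expressed human genes
CODING_LENGTH = 2000      # approximate average coding length, in nucleotides


def log10_sequence_space(length):
    """Return log10 of the number of possible DNA sequences of this length (4**length)."""
    return length * math.log10(4)


def log10_excess(length, n_known=EXPRESSED_GENES):
    """Return log10 of (possible sequences of this length) / (sequences to tell apart)."""
    return log10_sequence_space(length) - math.log10(n_known)


# Exponent of 4**2000, and the excess exponent for a 50 bp read.
full_length_exponent = log10_sequence_space(CODING_LENGTH)
read_excess_exponent = log10_excess(50)
```

- Under these assumptions the first value lands slightly above 1200 and the second rounds to 25, in line with the figures given above.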
- Use of the present invention allows direct determination of sequences in a sample with far less information than either a complete or a partial sequence determination of a sample by making use of a database of sequences likely to be present in the sample. If such a database is not available, sequences in the sample can nevertheless be separately classified.
- the invention is adaptable to analyzing the sequences of any biopolymer, built of a small number of repeating units, whose naturally occurring
- construct hash codes for expressed DNA sequences
- codes are constructed from one or more signals which represent the presence of short nucleic acid (preferably DNA) subsequences (hereinafter called "target subsequences") in the sample sequence and,
- a QEA™ embodiment include a representation of the length along the sample sequence between adjacent target subsequences.
- the presence of target subsequences is directly recognized by direct subsequence recognition means, including, but not limited to, REs and other DNA binding proteins, which bind and/or react with target subsequences, and oligomers of, for example, PNAs or DNAs, which hybridize to target subsequences.
- the presence of effective target subsequences is recognized indirectly as a result of applying protocols, perhaps involving multiple DNA binding proteins together with hybridizing oligomers.
- each of the multiple proteins or oligomers can recognize a separate subsequence and the effective target subsequence can be the combination of the separate subsequences.
- a preferable combination is subsequence concatenation in the situation where all the separately recognized subsequences are
- Such acceptable subsequence recognition means preferably precisely and reproducibly recognize target subsequences and generate a recognition signal with adequate signal to noise ratio and further preferably provide
- target subsequences which contain representations of target subsequence occurrences and, preferably, representations of the length between target subsequence occurrences, can differ in various embodiments of this invention.
- target subsequences are exactly recognized, for example, where REs are the
- subsequence representation can be the unique identity of the subsequences.
- target subsequence recognition is less exact, for example, where short oligomers are used, and this representation can be "fuzzy". In the case of short oligomers, a fuzzy
- representation can consist of all subsequences which differ by one nucleotide from a target subsequence, each such subsequence perhaps weighted by the probability that it is the target subsequence. Further, length representation may depend on the separation and detection means used to generate the signals. In the case of
- electrophoretically may need to be corrected, perhaps up to 5 to 10%, for mobility differences due to average base
- electrophoretic length in bp and not the physical length in bp are presumed to represent physically correct lengths, as if generated by precise recognition means with a length determined by error or bias free separation and detection means.
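- The "fuzzy" representation described above for short oligomers can be sketched as follows. This is an illustration under stated assumptions, not the method of this invention: the function names are hypothetical, the exact target is included in the representation, and the weight `p_exact` is an arbitrary placeholder spread uniformly over the one-mismatch variants.

```python
def one_mismatch_set(target, alphabet="ACGT"):
    """Return all subsequences that differ from target at exactly one position."""
    variants = set()
    for i, base in enumerate(target):
        for alt in alphabet:
            if alt != base:
                variants.add(target[:i] + alt + target[i + 1:])
    return variants


def fuzzy_representation(target, p_exact=0.7):
    """Map each candidate subsequence to a weight.

    Assumption: p_exact is a hypothetical probability that recognition was
    exact; the remainder is shared equally among one-mismatch variants.
    """
    neighbours = one_mismatch_set(target)
    weight = (1.0 - p_exact) / len(neighbours)
    representation = {target: p_exact}
    representation.update({seq: weight for seq in neighbours})
    return representation
```

- For a target of k nucleotides this yields 3k one-mismatch variants, so a 4-mer has 12.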
- target subsequences can be
- Target subsequences recognized are typically contiguous. This is typical for REs adaptable to this invention. However, this invention is adaptable to means recognizing discontiguous target subsequences or discontiguous effective target subsequences. For example, oligomers recognizing discontinuous subsequences can be constructed by inserting degenerate nucleotides in a
- a set of 16 oligomers recognizing AGC-TAT, with a two nucleotide discontiguous region can be constructed according to the schema TCGNNATA, where N is any nucleotide. Alternately, such discontiguous subsequences can be recognized by one oligomer of the form TCGiiATA, where "i" is inosine, or any other "universal" nucleotide, capable of hybridizing with any naturally occurring base.
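- The set of 16 oligomers implied by the schema TCGNNATA can be enumerated mechanically. The sketch below assumes only that N stands for any of A, C, G, or T; the function name is illustrative.

```python
from itertools import product


def expand_degenerate(schema, alphabet="ACGT"):
    """Expand every N in schema into each base of alphabet, returning all oligomers."""
    n_positions = schema.count("N")
    oligomers = []
    for fill in product(alphabet, repeat=n_positions):
        bases = iter(fill)
        oligomers.append("".join(next(bases) if c == "N" else c for c in schema))
    return oligomers


oligomer_set = expand_degenerate("TCGNNATA")
```

- Two degenerate positions give 4 x 4 = 16 members, each beginning with TCG and ending with ATA.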
- the invention is applied to the analysis of cDNA samples
- RNA synthesized from any in vivo or in vitro sources of RNA.
- cDNA can be synthesized either from total cellular RNA, from poly (A)+ RNA, or from specific sub-pools of RNA.
- RNA sub-pools can be produced by RNA pre-purification. For example, separation of mRNA of the endoplasmic reticulum from cytoplasmic mRNA enriches mRNA primarily encoding cell surface or extracellular proteins (Celis et al., 1994, Cell Biology, Academic Press, New York, NY).
- Such enriched mRNAs have increased diagnostic or therapeutic utility due, for example, to their encoded protein's cell-surface or
- First strand cDNA synthesis can be performed by any method known in the art and can use any priming method known in the art.
- first strand synthesis primers can be oligo(dT) primers, random hexamer primers, phasing primers, mixtures thereof, etc.
- phasing primers containing either an A, C, or G at the 3' end can be used in separate cDNA synthesis reactions to split the cDNA first strands into 3 pools, each generated from poly (A)+ mRNA having a T, G, or C, respectively, 5' to the poly (A)+ tail.
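- The assignment of an mRNA to one of the three phasing pools follows from base complementarity. The sketch below is an illustration only: it writes the mRNA in the DNA alphabet (T in place of U), and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def phasing_pool(mrna):
    """Return the 3'-end base (A, C, or G) of the phasing primer that would
    prime this mRNA, or None if the sequence has no poly(A) tail.

    The base immediately 5' to the tail is T, G, or C, so its complement,
    the primer's 3' base, is A, C, or G respectively.
    """
    body = mrna.rstrip("A")  # remove the poly(A) tail
    if not body or len(body) == len(mrna):
        return None
    return COMPLEMENT[body[-1]]
```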
- cDNA can be synthesized by methods biased to producing full-length cDNAs, e.g., by requiring presence of the 5'-cap in the source cap mRNA.
- QEA™ probes a sample with recognition means generating signals that preferably comprise an
- first target subsequence an indication of the presence of a second target subsequence
- a representation of the length between the target subsequences in the sample nucleic acid sequence. If the first strand of target subsequences occur more than once in a single nucleic acid in the sample, more than one signal is generated, each signal comprising the length between adjacent occurrences of the target subsequences.
- QEA™ embodiments are preferred for classifying and determining sequences in mixtures of cDNAs, but are also adaptable to samples with only one cDNA. They afford the relative advantage over prior art methods that cloning of sample nucleic acids is not required.
- enough pairs of target subsequences can be chosen so that sufficient distinguishable signals can be generated to determine one to all the sequences in the sample mixture. For example, first, any pair of target subsequences may occur more than once in a single DNA molecule to be analyzed, thereby generating several signals with differing lengths from one DNA molecule. Second, even if a pair of target subsequences occurs only once in two different DNA molecules to be analyzed, the lengths between the hits may differ and thus distinguishable signals may be generated.
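- How one DNA molecule can give rise to several signals of differing lengths can be sketched as follows. This is a simplified illustration, not the simulation method of this invention: EcoRI and HindIII are chosen arbitrarily as the pair, the function names are hypothetical, and length is taken as the distance between the starts of adjacent recognition sites, ignoring cut offsets and adapter lengths.

```python
import re

# Recognition sites of two well-known REs; the choice of this pair is illustrative.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def site_hits(seq, sites=SITES):
    """Return sorted (position, enzyme) pairs for every site occurrence in seq."""
    hits = []
    for name, site in sites.items():
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer("(?=" + re.escape(site) + ")", seq):
            hits.append((match.start(), name))
    return sorted(hits)


def signals(seq, sites=SITES):
    """Return (enzyme_a, enzyme_b, length) for each pair of adjacent sites.

    Assumption: length is the distance between the starts of adjacent sites.
    """
    hits = site_hits(seq, sites)
    return [(a, b, j - i) for (i, a), (j, b) in zip(hits, hits[1:])]
```

- A sequence with three sites thus yields two signals, and two sequences carrying the same pair of sites at different spacings yield distinguishable signals.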
- the target subsequences used in QEA™ are preferably optimally chosen by the computer implemented methods of this invention in view of DNA sequence databases containing sequences likely to occur in the sample to be analyzed.
- efforts of the Human Genome Project in the United States, efforts abroad, and efforts of private companies in the sequencing of the human genome sequences, both expressed and genetic, are being collected in several available databases (listed in Sec. 5.1).
- QEA™ can be performed in a "query mode" or in a "tissue mode."
- a query mode experiment focuses on determining the expression of a limited number of genes, perhaps 1 - 100, of interest and of known sequence. A minimal number of target subsequences are chosen to generate signals, with the goal that each of the limited number of genes is discriminated from all the other genes likely to occur in the sample by at least one unique signal. In other words, such a QEA™ experiment is designed so that each gene of interest generates at least one signal unique to it (a "good" gene, see infra).
- a QEA™ tissue mode experiment focuses on determining the expression of as many as possible, preferably a majority, of the genes expressed in a tissue or other sample, without the need for any prior knowledge or interest in their expression.
- Target subsequences are optimally chosen to discriminate the maximum number of sample DNA sequences into classes comprising one or preferably at most a few sequences. Preferably, enough signals are
- signals are generated and detected as determined by the threshold and sensitivity of a
- Some important determinants of threshold and sensitivity are the initial amount of mRNA and thus of cDNA, the amount of molecular amplification performed during the experiment, and the sensitivity of the detection means.
- QEA™ signals are generated by methods comprising a recognition means for target subsequences that include, but are not limited to one or more REs in a preferred RE/ligase embodiment or nucleotide oligomer primers in an alternative PCR embodiment. In both embodiments, this invention
- the RE/ligase method proceeds according to the following steps.
- the method employs recognition reactions with one, a pair, or more REs which recognize target subsequences with high specificity and cut the sequence at the recognition sites leaving fragments with sticky overhangs characteristic of the particular RE.
- To each sticky overhang specially constructed, labeled
- amplification primers are ligated with the aid of shorter linkers in a manner so that the particular RE making the cut, and thus the particular target subsequence, can be later identified.
- a DNA polymerase then forms blunt-ended DNA fragments. These fragments are then PCR amplified using the same special labeled primers for a number of cycles
- amplified labeled fragments are then separated by length using gel electrophoresis in either denaturing or non-denaturing conditions and the length and labeling of the fragments is optically detected.
- single stranded fragments can be removed by a binding hydroxyapatite, or other single strand specific, column or by digestion by a single strand specific nuclease.
- this invention is adaptable to other functionally equivalent amplification and length separation means. In this manner, the identity of the REs cutting a fragment, and thereby the subsequences present, as well as the length between the cuts is determined.
- the RE/ligase embodiment is adaptable to several embodiments which enhance quantitative characteristics of QEA™ signals or which increase sample sequence
- One or more of the special, labeled amplification primers described above and used in the PCR amplification step can have attached removal means comprising a capture moiety attached to the primer and a binding partner attached to a solid support, e.g., biotin and streptavidin beads.
- cDNA is synthesized from an mRNA sample with synthesis primers at least one of which is biotinylated.
- the cDNA is then cyclized.
- the cDNA is then cut with one or a pair of REs, and the
- synthesis primers are removed with streptavidin or avidin beads leaving highly pure double cut cDNA fragments with ligated amplification primers, but with minimal singly cut and labeled background fragments.
- these pure doubly cut and labeled fragments can be directly detected, after separation by length (e.g., by electrophoresis or column chromatography), without amplification. If amplification is needed, absence of the DNA singly cut background fragments improves signal to noise ratio resulting in fewer necessary amplification cycles. Thereby, PCR amplification bias is decreased or eliminated and linear responsiveness of QEA™ signals to input mRNA amounts is improved.
- RE/ligase embodiments increase sample sequence discrimination in QEA™ experiments, for example, by recognizing target subsequences longer or less limited than those recognized by REs, or by recognizing third subsequences interior to cut fragments. This added information can often discriminate two sample sequences producing fragments having identical original end subsequences and lengths. It is used in the computer implemented database lookup methods of this invention in a manner similar to the use of target
- the target subsequences recognized can be effectively lengthened by using an
- amplification primer with an internal Type IIS RE recognition site so positioned that the Type IIS RE cuts the amplified fragments in a manner producing a second overhang contiguous with the recognition site of the initial RE.
- the sequence of the second overhang concatenated with the initial target end subsequence produces an effectively longer target
- telomere sequence can be recognized by using phasing primers during PCR amplification.
- the PCR amplification step can be divided into several pools with each pool using one phasing
- amplification primer constructed so as to recognize one or more additional nucleotides beyond the original RE
- a third subsequence internal to a fragment can be recognized by a distinctively labeled probe binding or hybridizing with the third subsequence.
- a probe added before detection generates unique signals from the fragment containing that subsequence.
- a probe can suppress signals from fragments with the third subsequence.
- a probe added before the PCR amplification step and which prevents amplification of a fragment with the third subsequence thereby removes and suppresses any signal from such fragments.
- Such a probe can be without limitation either an RE for recognizing and cutting the fragment with the third subsequence or a PNA or modified DNA oligomer, which cannot serve as a PCR primer, for hybridizing with the third subsequence.
- a third subsequence can be the sequence of the overhang produced by a Type IIS RE cutting the amplification primers sufficiently close to their 3' ends so that the resulting overhang is not contiguous with the recognition sequence of the initial RE.
- removal means to increase the s/n ratio is combined with a Type IIS RE cutting the amplification primers to increase sample sequence discrimination in an embodiment called SEQ-QEA™.
- PCR primers distinctively labeled with fluorochromes are synthesized to hybridize with these target subsequences.
- the primers are designed as described in Sec. 5.3 to reliably recognize short subsequences while achieving a high specificity in PCR amplification.
- a minimum number of PCR amplification steps amplifies those fragments between the primed subsequences existing in DNA sequences in the sample, thereby recognizing the target subsequences.
- the labeled, amplified fragments are then separated by gel electrophoresis and detected.
- the PCR embodiment is adaptable to the same embodiment previously discussed with respect to the RE/ligase embodiment.
- the signals generated from the recognition reactions of a QEA™ experiment are analyzed by computer methods of this invention.
- the analysis methods simulate a QEA™ experiment using a database either of substantially all known DNA sequences or of substantially all, or at least a majority of, the DNA sequences likely to be present in a sample to be analyzed and a description of the reactions to be performed.
- the simulation results in a digest database which contains for each possible signal that can be generated the database sequences responsible for that signal. Thereby, finding the sequences that can generate a signal involves a look-up in the simulated digest database.
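- The simulated digest database and its look-up can be sketched as an inverted index from signal to database sequences. The following is a toy illustration under stated assumptions, not the implementation of this invention: all function names and the `tolerance` parameter are hypothetical, and a signal is modeled as (enzyme at one end, enzyme at the other end, distance between site starts).

```python
from collections import defaultdict


def simulate_signals(seq, sites):
    """Toy reaction simulation: one signal per pair of adjacent recognition sites."""
    hits = sorted(
        (i, name)
        for name, site in sites.items()
        for i in range(len(seq) - len(site) + 1)
        if seq.startswith(site, i)
    )
    return {(a, b, j - i) for (i, a), (j, b) in zip(hits, hits[1:])}


def build_digest_database(sequences, sites):
    """Map each possible signal to the accessions of the sequences producing it."""
    table = defaultdict(set)
    for accession, seq in sequences.items():
        for signal in simulate_signals(seq, sites):
            table[signal].add(accession)
    return table


def look_up(table, signal, tolerance=0):
    """Return accessions whose simulated signal matches within +/- tolerance bp."""
    end_a, end_b, length = signal
    found = set()
    for delta in range(-tolerance, tolerance + 1):
        found |= table.get((end_a, end_b, length + delta), set())
    return found
```

- A non-zero `tolerance` is one simple way to allow for imprecision in measured fragment lengths; how much to allow is an experimental question not settled by this sketch.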
- Computer implemented design methods optimize the choice of target subsequences in QEA™ reactions in order to maximize the information produced in an experiment. For the tissue mode, the methods maximize the number of sequences having unique signals by which their quantitative presence can be
- the methods maximize only the number of sequences of interest having unique signals, ignoring recognition of other sequences that might be present in a sample.
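- The design objective for both modes can be sketched as a scoring function over candidate experimental choices. This is a minimal sketch that assumes the signals for each candidate have already been simulated; it performs a plain exhaustive comparison, which is an assumption and not necessarily the optimization used by the methods of this invention, and all names are illustrative.

```python
from collections import Counter


def count_unique(signals_by_seq, of_interest=None):
    """Count sequences having at least one signal produced by no other sequence.

    With of_interest=None every sequence is scored (tissue mode); otherwise
    only the named sequences are scored (query mode).
    """
    counts = Counter(s for sigs in signals_by_seq.values() for s in set(sigs))
    names = signals_by_seq if of_interest is None else of_interest
    return sum(
        1
        for name in names
        if any(counts[s] == 1 for s in signals_by_seq.get(name, ()))
    )


def best_choice(candidates, of_interest=None):
    """Return the candidate (e.g. an RE pair) with the highest unique-signal count."""
    return max(candidates, key=lambda c: count_unique(candidates[c], of_interest))
```

- In query mode a candidate that resolves few sequences overall can still be the best choice if it resolves the sequences of interest.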
- colony calling generates subsequence occurrence data without length information. Since this method requires only
- the hash code generated by the probe hybridization reactions is interpreted by computer implemented methods of this invention.
- the analysis methods simulate a CC
- the simulation results in a hash code table which contains for each hash code all possible sequences that can generate that code.
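- A hash code table of this kind can be sketched as follows, modeling each hash code as a tuple of presence/absence bits, one per target subsequence. The function names and the exact-substring test are assumptions made for illustration; actual probe hybridization is less exact.

```python
from collections import defaultdict


def hash_code(seq, targets):
    """Return one bit per target subsequence: 1 if present in seq, else 0."""
    return tuple(int(target in seq) for target in targets)


def build_hash_table(database, targets):
    """Map each hash code to the names of all database sequences generating it."""
    table = defaultdict(list)
    for name, seq in database.items():
        table[hash_code(seq, targets)].append(name)
    return dict(table)
```

- Sequences that share a code cannot be told apart by that set of targets, so the size of each table entry indicates how well the chosen targets discriminate.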
- target subsequences chosen to generate a signal should preferably occur in the DNA sequence sample to be analyzed less than about 50% and at least more often than 5-10%, preferably more often than 10-15%.
- the most preferable occurrence probability is from 25-50%.
- the presence of one target subsequence is preferably probabilistically independent of the presence of any other subsequence.
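- Whether a candidate target subsequence meets these occurrence and independence preferences can be estimated from a sequence database. The sketch below is illustrative only: the function names are hypothetical, the 25-50% bounds are the most preferable range stated above, and comparing the joint frequency with the product of the individual frequencies is just one simple indication of independence.

```python
def occurrence_fraction(database, target):
    """Return the fraction of database sequences containing target at least once."""
    return sum(target in seq for seq in database) / len(database)


def in_preferred_range(fraction, low=0.25, high=0.50):
    """Check a fraction against the most preferable occurrence range in the text."""
    return low <= fraction <= high


def joint_vs_independent(database, target_a, target_b):
    """Return (observed joint fraction, product of the individual fractions).

    Similar values suggest that the two targets occur roughly independently.
    """
    joint = sum((target_a in s) and (target_b in s) for s in database) / len(database)
    expected = occurrence_fraction(database, target_a) * occurrence_fraction(database, target_b)
    return joint, expected
```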
- sub-sequences are preferably less than about 5 to 8 bp long for cDNA classification.
- the resulting preferable target subsequences are 4 to 6 bp long. Longer sequences occur too infrequently to be preferred for use. However, for classifying gDNA, longer subsequences, up to 20 to 40 bp, are preferably used, because gDNA fragments are normally of much greater length, from at least 5 kilobases ("kb") for plasmid inserts to more than 100 kb for P1 inserts, and thus would typically have more sequence variability, requiring longer target subsequences.
- the preferred hybridization probes for short target subsequences are labeled peptido-nucleic acids (PNAs).
- hybridization specificity as compared to 4 to 6-mers.
- Sets of probes, each probe distinctively and distinguishably labeled with a fluorochrome, are hybridized in conditions of high stringency to arrayed DNA sequence clones and optically detected to detect the presence of target subsequences. For example, in an embodiment wherein five fluorochromes are simultaneously distinguished and 20 subsequence observations are required for gene identification (a 20 bit code), any gene in a colony can be identified in only four hybridization steps.
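- The count of four hybridization steps in the example above follows from dividing the number of required observations by the number of labels distinguishable at once. A small worked sketch, with illustrative function names:

```python
import math


def hybridization_steps(observations_needed, labels_per_step):
    """Return the number of hybridization rounds needed when each round reads
    labels_per_step distinguishable probes simultaneously."""
    if observations_needed < 0 or labels_per_step < 1:
        raise ValueError("observations must be >= 0 and labels per step >= 1")
    return math.ceil(observations_needed / labels_per_step)


def max_distinct_codes(observations):
    """Return the number of distinct presence/absence codes of this bit length."""
    return 2 ** observations


steps = hybridization_steps(20, 5)
```

- With 20 observations and five fluorochromes this gives four steps, and a 20 bit code can in principle take 2^20 distinct values.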
- efficient hybridization detection means based on optical wave guide detection of DNA hybridization can be used. By using differently sized and shaped particles associated with different probes, the resultant differences in light scattering can be used to detect hybridization of multiple probes simultaneously with these wave guide methods.
- Target subsequences can be chosen to discriminate not only single genes but also, more coarsely, sets of genes. Fewer target subsequences can be chosen so that a particular pattern of hits will indicate the presence of a gene of a particular type. Types of genes of interest might be oncogenes, tumor suppressor genes, growth factors, cell cycle genes, or cytoskeletal genes, etc.
- high stringency conditions of this invention generally comprise a low salt concentration, equivalent to a concentration of SSC (173.5 g NaCl, 88.2 g Na citrate, H2O to 1 l) of less than approximately 1 mM, and a temperature near or above the Tm of the hybridizing DNA.
- conditions of low stringency generally comprise a high salt concentration, equivalent to a concentration of SSC of greater than approximately 150 mM, and a temperature below the Tm of the hybridizing DNA.
- DNA oligomers are specified for performing functions, including hybridization and chain elongation priming
- alternatively oligomers can be used that comprise those of the following nucleotide mimics which perform similar functions.
- the oligomers can be DNA or RNA or chimeric mixtures or derivatives or modified versions thereof.
- the oligomers can be modified at the base moiety, sugar moiety, or phosphate backbone.
- the oligomers may include other appending groups such as peptides, hybridization-triggered cleavage agents (see, e.g., Krol et al., 1988, BioTechniques 6:958-976), or intercalating agents (see, e.g., Zon, 1988, Pharm. Res.
- the oligomers may be conjugated to another molecule, e.g., a peptide, hybridization triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.
- the oligomers may also comprise at least one nucleotide mimic that is a modified base moiety which is selected from the group including, but not limited to,
- 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
- 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil,
- the oligomers may comprise at least one modified sugar moiety selected from the group including but not limited to arabinose, 2-fluoroarabinose, xylulose, and hexose.
- the oligomers may comprise at least one modified phosphate backbone selected from the group consisting of a phosphorothioate, a phosphorodithioate, a
- phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, and a
- the oligomer may be an α-anomeric oligomer.
- An α-anomeric oligomer forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gautier et al., 1987, Nucl. Acids Res. 15:6625-6641).
- Oligomers of the invention may be synthesized by standard methods known in the art, e.g., by use of an
- phosphorothioate oligos may be synthesized by the method of Stein et al. (1988, Nucl. Acids Res. 16:3209),
- methylphosphonate oligos can be prepared by use of controlled pore glass polymer supports (Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:7448-7451), etc.
- oligomers that can specifically hybridize to subsequences of a DNA sequence too short to achieve reliably specific recognition, such that a set of target subsequences is recognized.
- where PCR is used, PCR specificity is generally less than hybridization specificity, because Taq polymerase tolerates hybridization mismatches.
- oligomers recognizing short subsequences are preferable, they may be constructed in manners including but not limited to the following.
- degenerate sets of DNA oligomers may be used which are constructed of a total length sufficient to achieve specific hybridization with each member of the set containing a shorter sequence complementary to the common subsequence to be recognized.
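A degenerate set of this kind can be enumerated directly. The sketch below assumes the simplest construction, in which a fixed core complementary to the target subsequence is flanked on each side by every possible combination of natural bases; the flank lengths and the example target are hypothetical choices.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def degenerate_set(target, flank5, flank3):
    """All oligomers made of the core complementary to `target`, preceded by
    flank5 and followed by flank3 fully degenerate positions."""
    core = reverse_complement(target)
    oligos = []
    for left in product("ACGT", repeat=flank5):
        for right in product("ACGT", repeat=flank3):
            oligos.append("".join(left) + core + "".join(right))
    return oligos

# Hypothetical 4 bp target with one degenerate base on each side.
oligos = degenerate_set("GGCA", flank5=1, flank3=1)
```

The set size grows as 4 to the power of the total number of degenerate positions, which is one reason universal nucleotides or mimics, as described in the following passages, can be attractive alternatives.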
- a longer DNA oligomer may be constructed with a shorter sequence complementary to the subsequence to be recognized and with additional universal nucleotides or nucleotide mimics, which are capable of hybridizing to any naturally occurring nucleotide.
- Nucleotide mimics are sub-units which can be polymerized to form molecules capable of specific, Watson-Crick-like base pairing with DNA.
- the oligomers may be constructed from DNA mimics which have improved hybridization energetics compared to naturally occurring nucleotides.
- a preferred mimic is a peptido-nucleic acid ("PNA") based on a linked N-(2-aminoethyl)glycine backbone to which normal DNA bases have been attached (Egholm et al., 1993, Nature 365:566-67). This PNA obeys specific Watson-Crick base pairing, but with greater free energy of binding and correspondingly higher melting temperatures.
- Suitable oligomers may be constructed entirely from PNAs or from mixed PNA and DNA oligomers.
- any length separation means known in the art can be used.
- the separation means employs a sieving medium for separation by fragment length coupled with a force for propelling the DNA fragments through the sieving medium.
- the sieving medium can be a polymer or gel, such as polyacrylamide or agarose in suitable concentrations to separate 10-1000 bp DNA fragments.
- the propelling force is a voltage applied across the medium.
- the gel can be disposed in electrophoretic configurations comprising thick or thin plates or
- the gel can be non-denaturing or denaturing.
- the sieving medium can be such as used for chromatographic separation, in which case a pressure is the propelling force.
- Standard or high performance liquid chromatographic ("HPLC") length separation means may be used.
- An alternative separation means employs molecular
- Mass spectrographic means capable of separating 10-1000 bp fragments may be used. DNA fragment lengths determined by such a separation means represent the physical length in base pairs between target subsequences, after adjustment for biases or errors introduced by the separation means and length changes due to experimental variables (e.g., presence of a detectable label, ligation to an adapter molecule).
- a represented length is the same as the physical length between occurrences of target subsequences in a sequence from said database when both said lengths are equal after applying corrections for biases and errors in said separation means and corrections based on experimental variables.
- represented lengths determined by electrophoresis can be adjusted for mobility biases due to average base composition or mobility changes due to an attached labeling moiety and/or adapter strand by conventional software programs, such as Gene Scan Software from Applied Biosystems, Inc. (Foster City, CA).
- any compatible labeling and detection means known in the art can be used. Advances in fluorochromes, in optics, and in optical sensing now permit multiply labeled DNA fragments to be distinguished even if they completely overlap in space, as in a spot on a filter or a band in a gel. Results of several recognition reactions or hybridizations can be multiplexed in the same gel lane or filter spot. Fluorochromes are available for DNA labeling which permit distinguishing 6-8 separate products simultaneously (Ju et al., 1995, Proc. Natl. Acad. Sci. USA 92:4347-4351).
- intercalating DNA dyes are utilized to detect DNA, any such dye known in the art is adaptable.
- dyes include but are not limited to ethidium bromide, propidium iodide, Hoechst 33258, Hoechst 33342, acridine orange, and ethidium bromide homodimers.
- dyes also include POPO, BOBO, YOYO, and TOTO from Molecular Probes (Eugene, OR).
- a filter, e.g., nitrocellulose
- visualization means known in the art to visualize adherent DNA. See, e.g., Kricka et al., 1995, Molecular Probing, Blotting, and Sequencing, Academic Press, New York.
- visualization means requiring secondary reactions with one or more reagents or enzymes can be used, as can any means employed in the CC embodiment.
- This embodiment of this invention in the tissue mode preferably generates one or more signals unique to each cDNA sequence in a mixture of cDNAs, such as may be derived from total cellular RNA or total cellular mRNA from a tissue sample, and quantitatively relates the strength of such a signal or signals to the relative amount of that cDNA sequence in the sample or library.
- this embodiment preferably generates signals uniquely
- QEA™ signals comprise an indication of the presence of pairs of target subsequences and the length between pairs of adjacent subsequences in a DNA sample.
- Alternatives include recognizing the presence of third subsequences between the pairs of target subsequences.
- one of the subsequences is the true end of the protein coding sequence, in a defined relation to the 5' cap of the source mRNA.
- Signals are preferably generated in a manner permitting straightforward automation with existing laboratory robots.
- the detailed description of this method is directed to the analysis of samples comprising a plurality of cDNA sequences. It is equally applicable to samples comprising a single sequence or samples comprising sequences of other types of DNA or nucleic acids generally.
- the DNA sample can be cDNA and/or genomic DNA, and preferably comprises a mixture of DNA sequences.
- the DNA sample is an aliquot of cDNA of total cellular RNA or total cellular mRNA, most preferably derived from human tissue.
- the human tissue can be diseased or normal.
- the human tissue is malignant tissue, e.g., from prostate cancer, breast cancer, colon cancer, lung cancer, lymphatic or hematopoietic cancers, etc.
- the tissue may be derived from in vivo animal models of disease or other biologic processes. In these cases the diseases modeled can usefully include, as well as cancers, diabetes, obesity, the rheumatoid or autoimmune diseases, etc.
- the samples can be derived from in vitro cultures and models. This invention can also be
- the cDNA, or the mRNA from which it is synthesized, must be present at some threshold level in order to generate signals, this level being determined to some degree by the conditions of a particular QEA™ experiment.
- a threshold is that preferably at least 1000, and more preferably at least 10,000, mRNA molecules of the sequence to be detected be present in a sample.
- at least a corresponding number of such cells should be present in the initial tissue sample.
- the mRNA detected is present in a ratio to total sample RNA of 1:10^5 to 1:10^6. With a lower ratio, more molecular amplification can be performed during a QEA™ experiment.
- the cDNA sequences occurring in a tissue derived pool include short untranslated sequences and translated protein coding sequences, which, in turn, may be a complete protein coding sequence or some initial portion of a coding sequence, such as an expressed sequence tag.
- a coding sequence may represent an as yet unknown sequence or gene or an already known sequence or gene entered into a DNA sequence database.
- Exemplary sequence databases include those made available by the National Center for Biotechnology
- NCBI Genetic Information
- EBL European Bioinformatics Institute
- a QEA™ method is also applicable to samples of genomic DNA in a manner similar to its application to cDNA.
- information of interest includes occurrence and identity of translocations, gene amplifications, loss of heterozygosity for an allele, etc. This information is of interest in cancer diagnosis and staging.
- amplified sequences might reflect an oncogene, while loss of heterozygosity might reflect a tumor suppressor gene.
- Such sequences of interest can be used to select target subsequences and to predict signals generated by a QEA™ experiment. Even without prior knowledge of the sequences of interest, detection and classification of QEA™ signal patterns is useful for the comparison of normal and diseased states or for observing the progression of a disease state.
- Classification of QEA™ signal patterns, in an exemplary embodiment, can involve statistical analysis to determine significant differences between patterns of
- Target subsequence choice is important in the practice of this invention.
- the two primary considerations for selecting subsequences are, first, redundancy, that is, that there be enough target subsequence pair occurrences (also known as "hits") per gene that a unique signal is likely to be generated for each sample sequence, and second, resolution, that is, that there not be so many target subsequence pair occurrences with very similar lengths in a sample that the signals cannot be resolved.
- a recognition reaction preferably should not generate more fragments than can be separated and distinguishably detected.
- gel electrophoresis is the separation means used to separate DNA fragments by length.
- electrophoretic techniques allow an effective resolution of three base pair ("bp") length differences in sequences of up to 1000 bp length. Given knowledge of fragment base
- Any alternative means for separation and detection of DNA fragments by length, preferably with resolution of three bp or better, can be employed.
- separation means can be thick or thin plate or column electrophoresis, column chromatography or HPLC, or physical means such as mass spectroscopy.
- the redundancy and resolution criteria are probabilistically expressed in Eqns. 1 and 2 in an
- the number of genes in the cDNA sequence mixture is N
- the average gene length is L
- the number of target subsequence pairs is M (the number of pairs of recognition means)
- the probability of each target subsequence occurring in, or hitting, a typical sample sequence is p. Since each target subsequence is preferably selected to occur independently in each sample sequence, the probability of occurrence of an arbitrary subsequence pair is then p^2.
- Eqn. 1 expresses the redundancy condition of three pair occurrences per sample sequence, assuming the probability of occurrence of each target subsequence is independent.
- Eqn 2 expresses the resolution condition of having fragments with lengths no closer on average than 3 base pairs. This equation approximates the actual fragment length distribution with a uniform distribution.
- Given expected values of N, the number of sequences in the library or sample to analyze (library complexity), and L, the average expressed sequence (or gene) length, Eqns 1 and 2 are solved for the subsequence occurrence probability and number of subsequences required. This solution depends on the particular redundancy and resolution criteria dictated by the particular experimental method chosen to implement QEA™.
- probabilities of target subsequence occurrence are from approximately 0.01 to 0.30. Probabilities of occurrence of specified subsequences and RE recognition sites can be determined from databases of DNA sample sequences. Example 6.2 lists these probabilities for exemplary RE recognition sites. Appropriate target subsequences can be selected from these tables. Computer implemented QEA™ experimental design methods can then optimize this initial selection.
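The explicit forms of Eqns. 1 and 2 are not reproduced in this text, so the following sketch works from their verbal descriptions only. It assumes the redundancy condition is M·p² = 3 (three expected pair occurrences per sample sequence) and the resolution condition is N·p²/L = 1/3 (fragments from one recognition reaction, spread uniformly over lengths up to L, are on average 3 bp apart). Both forms, the function name, and the example values of N and L are assumptions made for illustration.

```python
import math

def design_parameters(n_sequences, mean_length,
                      hits_per_sequence=3, min_spacing_bp=3):
    """Solve the two assumed conditions for p and M.

    Resolution (assumed form):  N * p^2 / L = 1 / min_spacing_bp
    Redundancy (assumed form):  M * p^2     = hits_per_sequence
    """
    p_squared = mean_length / (min_spacing_bp * n_sequences)
    p = math.sqrt(p_squared)
    m_pairs = hits_per_sequence / p_squared
    return p, m_pairs

# Hypothetical library: 50,000 expressed sequences averaging 2,000 bp.
p, m_pairs = design_parameters(50_000, 2_000)
```

For these hypothetical inputs the sketch gives p of roughly 0.115, which falls inside the 0.01 to 0.30 range quoted above, and M of about 225 subsequence pairs.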
- Another use of QEA™ is to compare directly the expression of only a few genes or sample sequences, typically 1 to 10, between two different tissues, the query mode, instead of seeking to determine the expression of all genes in a tissue, the tissue mode.
- In this query mode, a few target subsequences are selected to discriminate the genes of interest both among themselves and from all other sequences possibly present.
- the computer design methods described hereinbelow can make this selection. If 4 subsequence pairs are sufficient for identification, then the fragments from the 4 recognition reactions performed on each tissue are preferably separated and detected on two separate lanes in the same gel. If 2 subsequence pairs are sufficient for identification, the two tissues are preferably analyzed in the same gel lane. Such comparison of signals from the same gel improves quantitative results by eliminating measurement variability due to differences between separate electrophoretic runs. For example, expression of a few target genes in diseased and normal tissue samples can be rapidly and reliably analyzed.
- the query mode of QEA™ is also useful even if the sequences of the particular genes of interest are not yet known.
- Differentially expressed features can be identified by comparing the results of QEA™ reactions applied to two different samples. In the case where the separation and detection of reaction products is by gel electrophoresis, such a comparison can be done by comparing gel bands or fluorescent traces of exiting fragments.
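One way to picture such a comparison is to treat each signal as a (pair of end labels, fragment length) key with a measured intensity and to flag keys whose intensity changes by more than a chosen factor. The sketch below is illustrative only: the fold-change threshold, the intensity floor for absent bands, and the sample values are all hypothetical, and the source does not prescribe this particular test.

```python
def differential_features(sample_a, sample_b, min_fold=2.0, floor=1.0):
    """Return {signal key: fold change b/a} for signals that differ by at
    least min_fold in either direction. Missing signals are treated as
    having the background intensity `floor`."""
    features = {}
    for key in sorted(set(sample_a) | set(sample_b)):
        a = max(sample_a.get(key, 0.0), floor)
        b = max(sample_b.get(key, 0.0), floor)
        fold = b / a
        if fold >= min_fold or fold <= 1.0 / min_fold:
            features[key] = fold
    return features

# Hypothetical signals: key = ((left RE, right RE), fragment length in bp).
normal = {
    (("EcoRI", "HindIII"), 152): 100.0,
    (("EcoRI", "EcoRI"), 310): 80.0,
    (("BamHI", "HindIII"), 97): 40.0,
}
diseased = {
    (("EcoRI", "HindIII"), 152): 105.0,
    (("EcoRI", "EcoRI"), 310): 20.0,
    (("BamHI", "BglII"), 211): 60.0,
}
changed = differential_features(normal, diseased)
```

Bands flagged this way correspond to the features that would be recovered from the gel for further analysis.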
- Such differentially expressed features can then be retrieved from the gel by methods known in the art (e.g., electro-elution from the gel) and the DNA fragments analyzed by conventional techniques, such as by sequencing.
- sequences, which are typically partial, can then be used as probes (e.g., in PCR or Southern blot hybridization) to recover full-length sequences.
- QEA™ techniques can guide the discovery of new differentially expressed cDNA or of changes of the state of gDNA.
- the sequences of the newly identified genes, once determined, can then be used to guide QEA™ target subsequence choice for further analysis of the differential expression of the new genes.
- target subsequences are recognized by oligomers which hybridize to the DNA target subsequences and act as PCR primers for the amplification of the segments between
- subsequence selection begins by compiling oligomer frequency tables containing the frequencies of, preferably, all 4 to 8-mers by using a sequence database.
- target subsequences with the necessary probabilities of occurrence are selected and checked for independence, by, for example, checking that the conditional probability for occurrence by any selected pair of subsequences is the product of the probabilities of occurrences of the individual subsequences of the pair.
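The frequency table and the independence check can be sketched as below. The example interprets "frequency" as the fraction of database sequences containing a given oligomer at least once, and tests independence by comparing the observed joint occurrence of a pair with the product of the individual occurrence probabilities. The tolerance, function names, and toy database are hypothetical.

```python
from collections import Counter
from itertools import combinations

def kmer_occurrence_table(database, k):
    """Fraction of database sequences containing each k-mer at least once."""
    counts = Counter()
    for seq in database:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: n / len(database) for kmer, n in counts.items()}

def joint_probability(database, a, b):
    """Fraction of database sequences containing both a and b."""
    return sum((a in seq) and (b in seq) for seq in database) / len(database)

def independence_report(database, subseqs, table, tolerance=0.05):
    """For each pair: (observed joint, expected product, within tolerance?)."""
    report = {}
    for a, b in combinations(subseqs, 2):
        expected = table.get(a, 0.0) * table.get(b, 0.0)
        observed = joint_probability(database, a, b)
        report[(a, b)] = (observed, expected,
                          abs(observed - expected) <= tolerance)
    return report

# Hypothetical toy database, for illustration only.
database = ["AACCGGTT", "AACCTTTT", "TTCCGGAA", "TTTTAAAA"]
table = kmer_occurrence_table(database, 4)
report = independence_report(database, ["AACC", "CCGG"], table)
```

With a realistic database the tolerance would have to reflect sampling error; the fixed value used here is only a placeholder.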
- An initial selection can be optimized to determine target subsequence sets producing unique fragments from the greatest number of sample sequences.
- PCR primers are synthesized with a 3' end complementary to the chosen subsequences and used in the PCR embodiment.
- Example 6.1 illustrates the signals output by this method in a specific example.
- the preferred embodiment uses DNA binding proteins, specifically REs, including Type IIS REs, to recognize and cleave sample sequences at the target subsequences. Desired fragments, with lengths dependent only on the source cDNA sequence, are amplified by an amplification means in order to dilute remaining, unwanted fragments with indefinite lengths.
- desired fragments are doubly cut by REs whereas unwanted fragments are singly cut.
- singly cut fragments have a definite length and are of interest.
- Unwanted singly cut fragments can be removed by affinity means (e.g., biotin labeling), physical means (e.g., hydroxyapatite column separation), or enzymatic means (e.g., single strand specific nucleases). Sufficient removal of the unwanted singly cut ends from the desired doubly cut fragments can permit fragment detection without an amplification step.
- the possible target subsequences, although limited to recognition sites of available REs, can be selected in a manner similar to the above in order to meet the previous probability of occurrence and independence criteria as closely as possible.
- the probabilities of occurrence of various RE recognition sites can be determined from a database of potential sample sequences, and those REs chosen with recognition subsequences whose probabilities of occurrence meet the criteria of Eqns 1 and 2 as closely as possible. If multiple REs satisfy the selection criteria, a subset is selected by including only those REs with independently occurring recognition subsequences, determined, for example, in the previous manner using conditional probabilities of occurrence.
- An initial choice can be optionally optimized by the computer implemented experimental design methods.
- a number, R_e, of REs are preferably selected so that the number of RE pairs is approximately M, as determined from Eqn. 1, where the relation between M and R_e is given by Eqn. 3.
- a set of 20 acceptable REs results in 210 subsequence pairs.
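Eqn. 3 itself is not reproduced in this text. The sketch below assumes the form M = R_e(R_e + 1)/2, that is, all unordered pairs of REs including a pair made of the same RE twice, because that form reproduces the stated example of 20 REs giving 210 pairs. The function names are hypothetical.

```python
def re_pair_count(n_enzymes):
    """Number of unordered RE pairs, counting same-RE pairs (assumed Eqn. 3)."""
    return n_enzymes * (n_enzymes + 1) // 2

def enzymes_needed(m_pairs):
    """Smallest number of REs whose pair count reaches m_pairs."""
    n = 1
    while re_pair_count(n) < m_pairs:
        n += 1
    return n

pairs_for_20 = re_pair_count(20)
```

Counting same-RE pairs is consistent with the later description of doubly cut fragments that carry the recognition sequence of one RE on both ends.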
- the PCR and the RE embodiments have different accuracy and flexibility characteristics.
- RE embodiments are generally more accurate, with fewer false positive and false negative identifications, since the enzymatic recognition and subsequent ligation reactions are generally more specific than the hybridization of short PCR primers to their
- RE Restriction endonucleases
- other REs, such as those known as class IIS restriction enzymes, which produce overhangs of unknown sequence, can be used to extend initial target subsequences into longer effective target subsequences.
- Phasing primers can also be used to recognize longer effective target
- Overhangs of the initial REs can be
- the ligase enzymes which are used in this alternative embodiment of this invention to ligate the amplification primer, are highly specific in their
- PCR and the preferred Taq polymerase used therein tolerate hybridization mis-matches of elongation primers.
- PCR embodiments can generate false positive signals which arise from mis-matches in the hybridization of the oligomer probes to the target subsequences.
- the PCR embodiments are more flexible since any desired subsequence can be a target subsequence.
- the RE embodiment is limited to the recognition sequences of acceptable REs. However, more than 150 to 200 REs are now commercially available recognizing a wide variety of
- QEA™ experiments are also adaptable to distinguish sample sequences into small sets, typically comprising 2 to 10 sequences. Such coarser grain analysis requires fewer subsequence pairs, fewer recognition reactions, and less analysis time. Alternatively, smaller numbers of target subsequence pairs can be optimally chosen to distinguish individually a specific set of sequences of interest from all the other sequences in a sample. These target subsequences can be chosen either from REs that produce fragments from the specific sample sequences or, in the case of the PCR
- the preferred restriction endonuclease ("RE") embodiments of QEA™ use novel simultaneous RE and ligase enzymatic reactions, known as recognition reactions, for generating labeled fragments of the sample sequences to be analyzed. These labeled fragments are then optionally amplified by an amplification means, separated according to length by a separation means, and detected by a detection means to yield QEA™ signals comprising the identity of the REs cutting each fragment together with each fragment's length.
- the RE/ligase subsequence recognition reactions can specifically and reproducibly generate QEA™ signals with good signal to noise ratios.
- Preferred protocols for this reaction perform all steps, including amplification, in a single tube without any intermediate extractions or buffer exchanges. This protocol is preferably automatically performed by standard laboratory robots.
- REs bind with specificity to short DNA target subsequences, usually 4 to 8 bp long, that are termed recognition sites and are characteristic of each RE. REs that are used cut the sequence at (or near) these recognition sites, preferably producing characteristic ("sticky") ends with single-stranded overhangs, which usually incorporate part of the recognition site. Type IIS REs, which cut outside of their recognition site, can be used to extend the initial target subsequence to a longer effective target subsequence for use in the computer implemented database lookup.
- Preferred REs have a 6 bp recognition site and generate a 4 bp 5' overhang. Less preferred REs generate a 2 bp 5' overhang. These are less preferred since 2 bp overhangs have a lower ligase substrate activity than 4 bp overhangs.
- All RE embodiments can be adapted to 3' overhangs of two and four bp.
- REs generating 5' and 3' overhangs are preferably not used in the same recognition reaction.
- preferred REs have the following additional properties.
- Their recognition sites and overhang sequences are preferably such that an amplification primer can be designed whose ligation to a cut end does not recreate the recognition site. They preferably have sufficient activity below 37°C, and particularly at 16°C, the optimal ligase temperature, to cut unwanted ligation
- PCR amplification can be performed by simply adding PCR reagents to the RE/ligase reaction mix. They preferably have low non-specific cutting and nuclease activities and cut to completion.
- the REs selected for a particular experiment preferably have recognition sites meeting the previously described occurrence and independence criteria. Preferred pairs of REs for analyzing human and mouse cDNA are listed in Sec. 6.10.
- cDNA fragments cut on each end by REs (doubly cut) have a length dependent only on the sequence of the originating cDNA and are, therefore, of interest.
- cDNA fragments singly cut on their 5' end by an RE and terminated on their 3' end by the poly(A) tail have variable and non-reproducible lengths that depend strongly on cDNA synthesis conditions. Such fragments, singly cut on one end by an RE and with a variable length tail on the other, are not of interest.
- RE embodiments of QEA™ exponentially amplify doubly cut fragments, while only linearly amplifying singly cut fragments. This amplification is preferably done by the PCR method.
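The effect of exponential versus linear amplification can be illustrated with an idealized model. The sketch assumes that a doubly cut fragment, which carries primer sites on both ends, grows by a factor of (1 + efficiency) per cycle, while a singly cut fragment, with a primer site on one end only, gains a fixed increment per cycle. The model and the per-cycle efficiency are simplifying assumptions, not measurements from the source.

```python
def relative_enrichment(cycles, efficiency=1.0):
    """Ratio of doubly cut to singly cut product after `cycles` PCR cycles,
    starting from equal amounts (idealized model)."""
    doubly_cut = (1.0 + efficiency) ** cycles   # exponential growth
    singly_cut = 1.0 + efficiency * cycles      # linear growth
    return doubly_cut / singly_cut

enrichment_10 = relative_enrichment(10)
```

Under this model, ten ideal cycles enrich doubly cut fragments by about 93-fold relative to singly cut fragments, and the ratio keeps growing with further cycles.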
- Other RE embodiments separate singly and doubly cut fragments with a removal means targeted at either type of fragment.
- the preferred removal means comprises a biotin capture moiety and a streptavidin binding partner.
- the removal means can either supplement or replace
- fragments singly cut on their 3' end by an RE and terminated on their 5' end by a sequence in a fixed relation to the 5' cap of the source mRNA also have definite lengths and are of interest.
- Such fragments can be generated according to a method herein called 5'-QEA™, which comprises synthesizing cDNA according to the protocol of Sec. 6.3.3, performing recognition reactions, and separating the fragments of interest by a removal means.
- fragments are also of interest if they have a definite, sequence dependent length by being singly cut on their 5' end and by being terminated in a fixed relation with respect to the beginning of the 3' poly(A)+ tail.
- This invention is adaptable to alternative amplification means known in the art. If a removal means for unwanted singly cut fragments is not utilized, alternative amplification means must preferentially amplify doubly cut fragments with respect to singly cut fragments, in order that signals from singly cut fragments be relatively suppressed. On the other hand, if a removal means for singly cut
- alternative amplification means can less preferably have no amplification preference.
- this means can be used either to remove the singly or the doubly cut fragments.
- Known alternative amplification means are listed in Kricka et al., 1995, Molecular Probing, Blotting, and Sequencing, chap. 1 and table IX, Academic Press, New
- Certain other embodiments use a physical removal means to directly remove unwanted singly cut fragments, preferably before amplification.
- Singly cut fragment removal can be accomplished, e.g., by labeling DNA termini with a capture moiety prior to digestion, as by synthesizing the cDNA with biotinylated primers. After digestion, the singly cut fragments are then removed by contacting the sample with a binding partner of the capture moiety, affixed to a solid phase.
- the doubly cut fragments can be labeled with a capture moiety, as by amplifying the fragments with primers one of which is labeled with a capture moiety. The amplification products are contacted with a binding partner affixed to a solid support, washed, and then
- single stranded fragments can be removed by single strand specific column separation or single strand specific nucleases.
- the removal means includes a capture moiety and a binding partner.
- the capture moiety is capable of conjugation to DNA oligomers without disruption of hybridization or chain elongation reactions.
- the binding partner is capable of attachment to a solid phase support and can bind the capture moiety to such a support in DNA denaturing conditions.
- the preferred removal means is biotin-streptavidin.
- Other removal means adaptable to this invention include various haptens, which are removed by their corresponding antibodies. Exemplary haptens include digoxigenin, DNP, and fluorescein (Holtke et al., 1992,
- RE/ligase embodiments of QEA™ use recognition moieties.
- each recognition moiety is capable of hybridizing with and being ligated to overhangs cut by only one RE. Thereby, the recognition sequence of that RE is identified.
- Recognition moieties typically comprise partially double stranded DNA oligomers, each oligomer capable of specifically hybridizing with only one RE generated sticky end in one recognition reaction. In the RE/ligase embodiment using PCR amplification, the recognition moieties also provide primer means for the PCR and thereby also provide for labeling and recognition of RE cut ends. For example, using a pair of REs in one recognition reaction generates doubly cut fragments, some with the recognition sequence of the first RE on both ends, some with the recognition sequence of the second RE on both ends, and the remainder with one recognition sequence of each RE on either end.
- Using more REs generates doubly cut fragments with all pair-wise combinations of RE cut ends from adjacent RE recognition sites along the sample sequences. All these cutting combinations need preferably to be distinguished, since each provides unique information on the presence of different subsequence pairs, the RE recognition sites, present in the original cDNA sequence.
- the recognition moieties preferably have unique labels which specifically label each RE cut made in a reaction. As many REs can be used in a single reaction as labeled recognition moieties are available to uniquely label each RE cut. If the detectable labeling in a particular system is, for example, by fluorochromes, then fragments cut with one RE have a single fluorescent signal from the one fluorochrome
- fragments cut with two REs have mixed signals, one from the fluorochrome associated with each RE.
- fluorochrome labels are preferably distinguishable.
- the recognition moieties need not be distinctively labeled. In embodiments using PCR amplification, corresponding primers would not be labeled. If silver staining is used to recognize fragments separated on an electrophoresis gel, no recognition moiety need be labeled, as fragments cut by the various RE
- the recognition reaction conditions are preferably selected, as described in Sec. 6.4, so that RE cutting and recognition moiety ligation go to full completion: all recognition sites of all REs in the reaction are cut and ligated to a recognition moiety.
- the fragments generated from a sequence analyzed lie only between adjacent recognition sites of any RE in that reaction. No fragments remain which include an internal RE recognition site.
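The complete double digestion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy sequence and the helper names `cut_sites` and `digest` are hypothetical, fragment length is measured from site start to site start (ignoring the exact cut offset and any adapter length), and BamHI and HindIII are used only because they are the example REs of Fig. 2B. Each fragment lies between adjacent recognition sites and is classified by the pair of REs that cut its two ends.

```python
import re

# Recognition sequences of two example REs; the toy sequence below is hypothetical.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

def cut_sites(seq, sites=SITES):
    """Return sorted (position, enzyme) pairs for every recognition site in seq."""
    hits = []
    for name, site in sites.items():
        # A lookahead is used so that overlapping occurrences are also found.
        for m in re.finditer(f"(?={site})", seq):
            hits.append((m.start(), name))
    return sorted(hits)

def digest(seq, sites=SITES):
    """Model complete digestion: one fragment per pair of adjacent sites.

    Returns (length, left_enzyme, right_enzyme) tuples. No fragment contains
    an internal recognition site, because only adjacent sites are paired.
    """
    hits = cut_sites(seq, sites)
    return [(p2 - p1, e1, e2) for (p1, e1), (p2, e2) in zip(hits, hits[1:])]

toy = ("AAAA" + "GGATCC" + "T" * 20 + "AAGCTT" + "C" * 30
       + "AAGCTT" + "G" * 12 + "GGATCC" + "AA")
fragments = digest(toy)
for length, left, right in fragments:
    print(length, left, right)
```

With two REs, the three possible end combinations (first RE on both ends, second RE on both ends, one of each) all appear as distinct `(left, right)` labels.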
- Multiple REs can be used in one recognition reaction. Too many REs in one reaction can cut the sequences too frequently, generating a compressed length distribution with many short fragments, between 10 and a few hundred base pairs long, that are not clearly resolvable by the separation means.
- For example, for gel electrophoresis, fragments should not be closer in length than 3 bp on the average. Too many REs also can generate fragments of the same length and end subsequences from different sample sequences. Finally, where fragment labels are to be distinguished, no more REs can be used than can have distinguishably labeled sticky ends.
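The 3 bp average spacing criterion can be checked with a short Python sketch. The function names and the example length lists are hypothetical, and the interpretation of "on the average" as the mean gap between consecutive sorted fragment lengths is an assumption made here for illustration.

```python
def mean_length_gap(lengths):
    """Average difference (bp) between consecutive sorted fragment lengths."""
    ordered = sorted(lengths)
    if len(ordered) < 2:
        raise ValueError("need at least two fragment lengths")
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def resolvable(lengths, min_mean_gap=3.0):
    """True if the mean spacing meets the assumed 3 bp threshold."""
    return mean_length_gap(lengths) >= min_mean_gap

crowded = [50, 52, 53, 60, 61]   # gaps 2, 1, 7, 1 -> mean 2.75 bp
spread = [50, 55, 61, 70]        # gaps 5, 6, 9
print(resolvable(crowded), resolvable(spread))
```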
- REs optimally useable in one recognition reaction.
- Preferably two REs are used, with one, three and four REs less preferable.
- Preferable pairs of REs for the analysis of human cDNA samples are listed in Sec. 6.10.
- Fragments with specific third internal subsequences can be detected by either labeling or suppressing such fragments or with Type IIS REs.
- probes with distinguishable labels which bind to this target subsequence are added to the fragments prior to detection, and alternatively prior to separation and detection.
- fragments with this third subsequence present will generate a signal, preferably fluorescent, from the probe.
- a probe could be a labeled PNA or DNA oligomer. Short DNA oligomers may need to be extended with a universal nucleotide or degenerate sets of natural nucleotides in order to provide for specific
- Fragments with a third subsequence can be suppressed in various manners. The absence of such fragments is determined by comparing a recognition reaction without the suppressing factors with a reaction with the suppressing factors.
- a probe hybridizing with this third subsequence which prevents polymerase elongation in PCR can be added prior to
- sequences with this subsequence will be at most linearly amplified and their signal thereby
- Such a probe could be a PNA or modified DNA oligomer (with the 3' nucleotide being a ddNTP).
- this RE can be added to the RE-ligase reaction without any corresponding specific primer. Fragments with the third subsequence thereby have primers on one end only and are at most linearly amplified. Both these embodiments can be extended to
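The difference between exponential and at most linear amplification can be illustrated with an idealized model. The sketch below assumes perfect efficiency in every cycle and a hypothetical function name; it is not a description of measured behavior. A fragment primed on both ends doubles each cycle, whereas a fragment primed on one end only gains at most one new copy per cycle from the original template.

```python
def copies_after(cycles, primed_both_ends):
    """Idealized copy number per starting template after a number of PCR cycles."""
    if primed_both_ends:
        return 2 ** cycles      # exponential: every copy is itself a template
    return 1 + cycles           # linear: only the original template is copied

for n in (10, 20):
    print(n, copies_after(n, True), copies_after(n, False))
```

Under these assumptions the suppressed fragments fall behind by several orders of magnitude within 10 to 20 cycles, which is why their signal effectively disappears on comparison with the unsuppressed reaction.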
- Type IIS REs which cut a primer close to its junction with the original cDNA fragment sequence generate overhangs which are not contiguous with the initial RE recognition sequence.
- the sequence of such an overhang can be used as a third internal subsequence.
- recognition moieties also herein called adapters or linker-primer oligomers
- the adapters are partially double stranded DNA ("dsDNA").
- the adapters can be constructed as oligomers of any nucleic acid having
- the adapters preferably serve as a primer for that amplification means, if needed.
- Fig. 2A illustrates the DNA molecules involved in the ligation reaction as conventionally indicated with the 5' ends of the top strands and the 3' ends of the bottom strands at left.
- dsDNA 201 is a fragment of a sample cDNA sequence with an RE cut at the left end generating, preferably, four bp 5' overhang 202.
- Adapter dsDNA 209 is a synthetic substrate provided by this invention. The structure of adapter 209 is selected to ensure that RE digestion and adapter ligation preferably go to completion, that generation of unwanted products and amplification biases are minimized, and that unique labels are attached to cut ends (if needed).
- Adapter 209 comprises strand 203, called a primer, and a partially complementary strand 205, called a linker.
- the primer is also known as the longer strand of the adapter
- the linker is also known as the shorter strand of the adapter.
- the linker, or shorter strand, links the cDNA cut by an RE to the primer, or longer strand, by hybridizing to the overhang generated by the RE and to the primer such that the 3' end of the primer is adjacent to the 5' end of the overhang.
- the primer can be
- linker 205 comprises subsequence 206 complementary to RE overhang 202 and subsequence 207 complementary to 3' end 204 of primer 203.
- Subsequence 206 is most preferably of the same length as the RE overhang.
- Subsequence 207 is preferably eight nucleotides long, less preferably from 4 to 12 nucleotides long, but can be of any length as long as the linker reliably hybridizes with only one primer in any one recognition reaction at an appropriate T m .
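The relationship between linker 205, overhang 202 and primer 203 can be sketched as follows. The primer sequence and the function names are hypothetical and chosen only for illustration; the sketch assumes a 5' overhang written 5' to 3' on the top strand, so that after ligation the top strand reads primer followed by overhang, and the linker is the reverse complement of the primer's 3' end joined to the overhang.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def design_linker(primer, overhang, pairing_len=8):
    """Return a linker (5'->3') pairing with the overhang and the primer's 3' end.

    The part complementary to the overhang plays the role of subsequence 206;
    the part complementary to the last `pairing_len` primer bases plays the
    role of subsequence 207 (8 nt by default, as preferred in the text).
    """
    if pairing_len > len(primer):
        raise ValueError("pairing_len exceeds primer length")
    top = primer[-pairing_len:] + overhang
    return revcomp(top)

primer = "CTGACGATTCGGACTTAGGCTCAA"   # hypothetical 24-nt primer
overhang = "GATC"                     # 5' overhang left by BamHI (G^GATCC)
linker = design_linker(primer, overhang)
print(linker)

# The ligated junction should not recreate the BamHI site GGATCC.
junction = primer + overhang + "C"
print("GGATCC" in junction)
```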
- the appropriate T m should preferably be less than the self-annealing T m of primer 203.
- linker 205 preferably lacks a 5' terminal phosphate to prevent its ligation to the 3' bottom strand of dsDNA 201. More importantly, lack of a terminal phosphate also prevents self-annealed adapters from ligating and forming dimers. Adapter self-ligation is disadvantageous in that it would compete with adapter ligation to cut cDNA fragments. Further, adapter dimers would be amplified in a subsequent amplification step
- Terminal phosphates can be removed from linkers using
- Primer, or longer strand, 203 has a 3' end subsequence 204 complementary to 3' end subsequence 207 of linker 205. It is preferable that each RE generated overhang is ligated to a unique primer in each recognition reaction, in order that the overhangs generated by each RE can be detected. Consequently, in each recognition reaction primers and linkers are preferably chosen so that each primer is complementary to and hybridizes with only one linker 205 and that each linker which hybridizes with an RE generated overhang has a unique sequence 207 for hybridizing with a unique primer. In order that the primer/cDNA overhang ligation reaction go to
- primer 203 preferably does not recreate the recognition sequence of any RE in one recognition reaction when it is ligated with cDNA end 202. Further, primer 203 preferably has no 5' terminal phosphate in order to prevent primer self-ligations. To minimize amplification noise, it is preferred that primer 203 not hybridize with any sequence present in the original sample mixture. If such
- a subsequent PCR step can amplify unwanted fragments not cut by the initial REs.
- the T m of primer 203 is preferably high, in the range from 50° to 80°C, and more preferably above 68°C. This permits the subsequent PCR amplification to be controlled so that only primers and not linkers initiate new chains, the linkers remaining melted through the PCR cycle.
- the primer is optionally unlabeled.
- this T m can be achieved by use of a primer having a combination of a G+C content preferably from 40-60%, most preferably from 55-60%, and a length most preferably 24 nucleotides, and preferably from 18 to 30 nucleotides.
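The G+C content and length preferences stated above can be checked with a small Python sketch. The function names and the example primer are hypothetical; the sketch only tests the stated ranges and does not compute T m, which depends on a melting model and buffer conditions not specified here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def check_primer(seq):
    """Compare a candidate primer with the preferred ranges quoted in the text."""
    n = len(seq)
    gc = gc_fraction(seq)
    return {
        "length_ok": 18 <= n <= 30,        # preferred length range
        "length_optimal": n == 24,         # most preferred length
        "gc_ok": 0.40 <= gc <= 0.60,       # preferred G+C content
        "gc_optimal": 0.55 <= gc <= 0.60,  # most preferred G+C content
    }

candidate = "CTGACGATTCGGACTTAGGCTCAA"   # hypothetical 24-nt primer, 12 of 24 bases G or C
report = check_primer(candidate)
print(report)
```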
- Primer 203 is optionally labeled with
- the primer, or longer strand, is constructed so that, preferably, it is highly specific, free of dimers and hairpins, and capable of forming stable duplexes under the conditions specified, in particular at the desired T m .
- Fig. 2B illustrates two exemplary adapters and their component primers and linkers constructed according to the above description.
- Adapter 250 is specific for the RE BamHI, as it has a 3' end complementary to the 5' overhang generated by BamHI.
- Adapter 251 is similarly specific for the RE HindIII.
- Sec. 6.10 contains a more comprehensive, non-limiting list of adapters that can be used according to the invention. All synthetic oligonucleotides of this invention are preferably as short as possible for their functional roles in order to minimize synthesis costs.
- a further alternative illustrated in Fig. 2C is to construct an adapter by self hybridization of single stranded DNA in hairpin loop configuration 212. Subsequences of loop 212 are constructed with similar structure to the corresponding subsequences of linker 205 and primer 203. Exemplary hairpin loop 211 sequences are C 4 to C 10 .
- dsDNA 301 is a fragment of a sample cDNA cut with a RE generating 3' overhang 302.
- Adapter 309 comprises primer, or longer strand, 304 and linker, or shorter strand, 305.
- Primer, or longer strand, 304 includes subsequence 306 complementary to and of the same length as 3' overhang 302 and subsequence 307 complementary to linker 305. It also optionally has label 308 which distinctively labels primer 304. As in the case of adapters for 5' overhangs, in order that the RE digestion and ligation reactions go to
- primer 304 preferably has no 5' terminal
- primer 304 in order to prevent self-ligations, and preferably has a sequence such that no recognition site for any RE in one recognition reaction is created upon ligation of the primer with dsDNA 301.
- primer 304 should preferably not hybridize with any sequence in the initial sample mixture.
- the T m of primer 304 is preferably high, in the range from 50° to 80°C, and more preferably above 68°C. This ensures the subsequent PCR amplification can be controlled so that only primers and not linkers initiate new chains.
- this T m can be achieved by using a primer having a G+C content preferably from 40-60%, most preferably from 55-60%, and a primer length most preferably of 24 nucleotides and less preferably of 18-30 nucleotides.
- Each primer 304 in a reaction can optionally have a distinguishable label 308, which is preferably a fluorochrome.
- Linker, or shorter strand, 305 is complementary to and hybridizes with subsequence 307 of primer 304 in a position adjacent to 3' overhang 302.
- Linker 305 is most preferably 8 nucleotides long, less preferably from 4-16 nucleotides, and has no terminal phosphates to prevent self-ligation. This linker only promotes ligation specificity and activity and does not link primer 304 to the cut dsDNA, as in the 5' case.
- linker 305 T m should preferably be less than primer 304 self-annealing T m .
- Fig. 3B illustrates an exemplary adapter with its primer and linker for the case of the RE
- a 3' adapter can also be constructed from a hairpin loop configuration.
- the adapter primer strand can have a conjugated capture moiety in addition to or in place of a conjugated label moiety.
- a conjugated capture moiety is advantageous in separating various classes of RE/ligase reaction products by binding the capture moiety to its binding partners. Acceptable and preferred capture moieties and binding partners have been previously described.
- where a primer has a conjugated capture moiety, particularly biotin, which forms a streptavidin complex that is difficult to dissociate, it can be advantageous to include a release means in the primer in order to achieve controlled release from the bound capture moiety. Release means can involve including subsequences in the primer which can be cleaved in a controlled manner.
- subsequence is one or more uracil nucleotides.
- digestion with uracil DNA glycosylase (UDG) and subsequent hydrolysis of the sugar backbone at an alkaline pH effects release.
- Another exemplary such subsequence is the
- a preferred RE of this sort for human cDNA sequences is AscI, which has an 8 bp recognition sequence that rarely, if ever, occurs in
- AscI is further advantageously active at the ends of DNA molecules. In this case, digestion with this RE, i.e., AscI, will release strand 2351.
- adapters can be constructed from hybrid primers which are designed to facilitate the direct sequencing of a fragment or the direct generation of RNA probes for in situ hybridization with the tissue of origin of the DNA sample analyzed.
- Hybrid primers for direct sequencing are constructed by ligating onto the 5' end of existing primers the M13-21 primer, the M13 reverse primer, or equivalent sequences. Fragments generated with such hybrid adapters can be removed from the separation means and amplified and sequenced with conventional systems. Such sequence information can be used both for a previously known sequence to confirm the sequence determination and for a previously unknown sequence to isolate the putative new gene.
- Hybrid primers for direct generation of RNA hybridization probes are constructed by ligating onto the 5' end of
- hybrid adapters are illustrated in Sec. 6.8.
- the previously described adapters are used, but the PCR primer strands have an extra subsequence 3' to the adapter primer strands in order to act as phasing primers, as illustrated in Fig. 2D. That is, the PCR amplification reaction is used to recognize additional nucleotides beyond the initial RE target recognition subsequence.
- sample dsDNA 201 is illustrated after blunt-ending RE/ligase reaction products but just prior to a PCR
- dsDNA 201 has been cleaved at position 221 producing overhang 202 by an RE recognizing target recognition subsequence 227, has been ligated to adapter primer strand 203, and has been completed to a blunt ended double strand by strand 220 by incubation at 72°C for 10 minutes.
- the RE recognition subsequence 227 typically extends 1 bp beyond overhang 202. Other relative positions depend on the lengths of the overhang and the recognition sequence.
- Alternative PCR phasing primer 222, illustrated with its 5' end at the left, comprises subsequence 223, with the same sequence as strand 203; subsequence 224, with the same sequence as the RE overhang 202; subsequence 225, with a sequence consisting of a remaining portion of RE recognition subsequence 227, if any; and subsequence 226 of P nucleotides.
- Length P is preferably from 1 to 6 and more preferably either 1 or 2.
- Subsequences 223 and 224 hybridize for PCR priming with corresponding subsequences of dsDNA 201.
- Subsequence 225 hybridizes with any remaining portion of recognition subsequence 227, typically 1 bp.
- Subsequence 226 hybridizes only with fragments 201 having complementary nucleotides in corresponding positions 228.
- when P is 1, PCR primer 222 selects for PCR amplification 1 of 4 possible fragments 201; when P is 2, 1 of 16 is selected.
- primers 222, each with one of the possible (pairs of) nucleotides, in 4 (16) aliquots of RE/ligase reaction products selects for amplification one of the possible fragments 201.
- the effect of using PCR primers 222, having subsequences 226 of length P bp, is to extend the initially recognized RE target subsequence into an effective target subsequence, which is the initial RE target subsequence concatenated to a subsequence complementary to subsequence 226.
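The construction of phasing primers 222 can be sketched in Python. The adapter primer sequence and the function names are hypothetical; the sketch simply concatenates, 5' to 3', the adapter primer sequence (subsequence 223), the overhang sequence (224), any remaining portion of the recognition site (225), and each of the 4^P possible extensions (226). BamHI (G^GATCC) is used as the example, giving a 4-base overhang and a 1-base remainder.

```python
from itertools import product

def phasing_extensions(p):
    """All 4**p possible phasing subsequences of length p."""
    return ["".join(bases) for bases in product("ACGT", repeat=p)]

def build_phasing_primers(adapter_primer, overhang, site_remainder, p):
    """Map each phasing extension to its full primer sequence (5'->3')."""
    stem = adapter_primer + overhang + site_remainder
    return {ext: stem + ext for ext in phasing_extensions(p)}

adapter_primer = "CTGACGATTCGGACTTAGGCTCAA"   # hypothetical adapter primer strand
one_base = build_phasing_primers(adapter_primer, "GATC", "C", 1)
two_base = build_phasing_primers(adapter_primer, "GATC", "C", 2)
print(len(one_base), len(two_base))
print(one_base["A"][-6:])   # recognition-site tail plus the phasing base
```

Each aliquot of RE/ligase products would receive one of these primers, so that the effective target becomes the RE site extended by P selected bases.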
- REs recognizing 4 bp subsequences can be used in such a combined reaction with an effective 5 or 6 bp target subsequence, which need not be palindromic.
- sequences can be used in a combined reaction to recognize 7 or 8 bp sequences. Such effective recognition sequences are input to the computer implemented design and analysis methods subsequently described.
- additional subsequence information can be generated from adapters comprising primers with specially placed Type IIS RE recognition subsequence followed by digestion with the Type IIS RE and sequencing of the generated overhang.
- the Type IIS recognition subsequence is placed so that the generated overhang is contiguous with the original recognition
- the subsequence is formed by concatenating the sequence of the Type IIS overhang and the original recognition sequence.
- the Type IIS recognition sequence is placed so that the sequence of the generated overhang is not contiguous with the original recognition sequence.
- the sequence of the overhang is used as a third internal subsequence in the fragment.
- the additionally recognized subsequence is used in the computer implemented experimental analysis methods to increase the capability of determining the source sequence of a fragment.
- the SEQ-QEATM Embodiment advantageously included combined enhancements, including label moieties, capture moieties, and release means.
- the steps of the preferred RE/ligase embodiment of QEATM comprise: first, in one reaction cutting a cDNA sample with one or more REs, hybridizing adapters corresponding to the REs, and ligating the primers of the adapters on the cut ends; second, amplifying the cut fragments, if necessary; and third, separating the fragments according to length and detecting fragment lengths and fragment target end
- the cDNA sample can be synthesized by methods commonly known in the art, such as those described in Sec. 6.3.
- additional steps to remove unwanted DNA fragments or RE/ligase reaction products prior to separation and detection can increase QEATM signal to noise ratio or simplify interpretation of the resulting signals. Additional RE/ligase embodiments are described, including those known as 5'-QEATM and SEQ-QEATM.
- the RE/ligase embodiment can begin with pre-synthesized cDNA, or with a tissue sample or mRNA from which cDNA is to be synthesized.
- if cDNA is to be synthesized, the exemplary methods and procedures of Sec. 6.3 can be used.
- QEATM does not require cloning into a vector.
- a first step is the largely conventional separation of RNA from the tissue sample.
- RNA is preferably poly(A)+ purified RNA, mRNA separated from particular cellular fractions, or, less preferably, total cellular RNA.
- the steps of separation involve RNase extraction, DNase treatment and mRNA
- First and second strand cDNA synthesis from mRNA can be performed according to the protocols of Sec. 6.3.2, or the less preferred protocols of Sec. 6.3.4.
- the preferred synthesis protocols of Sec. 6.3.3, or functionally equivalent protocols can be used.
- the final preparation step of a DNA sample is removal of terminal phosphates from the cDNA sample, if needed.
- phosphate removal is preferably done with a heat-inactivated phosphatase.
- Phosphatase activity is preferably removed prior to the RE digestion and adapter ligation step in order to prevent interference with the intended ligation of adapters to doubly cut fragments. Heat inactivation allows
- a preferred phosphatase comes from cold-living Barents Sea (arctic) shrimp (U.S. Biochemical Corp.) ("shrimp alkaline phosphatase" or "SAP"). Terminal phosphate removal need be done only once for each population of cDNA being analyzed. In other embodiments alternative phosphatases can be used for terminal phosphate removal, such as calf intestinal phosphatase-alkaline from Boehringer Mannheim (Indianapolis, IN). Those that are not heat inactivated require a step to separate the phosphatase from the cDNA sample before the RE/ligase reactions, such as by phenol-chloroform extraction.
- the prepared cDNA is then separated into batches of from 1 picogram ("pg") to 200 nanograms ("ng") of cDNA each, and each batch is separately processed by the further steps of the method. A number of batches sufficient for whichever QEATM mode is to be practiced are made.
- one sample is divided into approximately 50 batches, each batch is then subject to an RE/ligase recognition reaction to generate approximately 200-500 fragments, and more preferably 250 to 350 fragments of 10 to 1000 bp in length, the majority of fragments preferably having a distinct length and being uniquely derived from one cDNA sequence.
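The scale of a tissue mode analysis follows from simple arithmetic, sketched below. The function name is hypothetical, and 300 fragments per batch is used only as a value near the middle of the more preferred 250 to 350 range.

```python
def total_bands(batches, fragments_per_batch):
    """Total number of fragments (bands) generated across all batches."""
    return batches * fragments_per_batch

# Approximately 50 batches, with 200 to 500 fragments each.
low, typical, high = (total_bands(50, n) for n in (200, 300, 500))
print(low, typical, high)
```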
- a preferable tissue mode analysis entails approximately 50 batches generating approximately 300 bands each. For query mode experiments, fewer recognition
- cDNA preparation is the important step of simultaneous RE cutting of and adapter ligation to the sample cDNA sequences.
- the prepared sample is cut with one or more REs.
- the number of REs and associated adapters preferably are limited so that both a compressed length distribution consisting of shorter fragments is avoided and enough
- REs can be used without associated adapters in order that the amplified fragments not have the associated recognition sequences. Absence of these sequences can be used to additionally differentiate genes that happen to produce fragments of identical length with particular REs.
- Qlig mix reaction mix
- REs, adapters and ligase enzyme are simultaneously present for concurrent adapter ligation and RE cutting.
- the amount of RE enzyme in the reaction is preferably
- REs and corresponding adapters are chosen according to the previous description.
- Table 10 in Sec. 6.10 lists exemplary REs and corresponding primers and linkers.
- Table 11 in Sec. 6.10.1 lists exemplary combinations for biotin labeled primers.
- the method is adaptable to any ligase enzyme that is active in the temperature range 10 to 37°C.
- T4 DNA ligase is the preferred ligase.
- cloned T4 DNA ligase or T4 RNA ligase can also be used.
- thermostable ligases can be used, such as AmpligaseTM Thermostable DNA Ligase from Epicentre (Madison, WI), which has a low blunt end ligation activity. These ligases in conjunction with the repetitive cycling of the basic thermal profile for the RE-ligase reaction, described in the following, permit more complete RE cutting and adapter ligation.
- also present in the Qlig mix are necessary buffers, as known in the art, and ATP.
- An excess of primers is preferably present in the Qlig mix in order that subsequent amplification can be performed in an automated manner.
- primers and linkers are present approximately in the ratio of 20:1 and to an adequate total primer amount of approximately 20 pm where 1 ng of cDNA is used. Less preferably the ratio is 10:1. Also, Betaine (Sigma Chemicals) is preferably present in the Qlig reaction mix. Betaine has been found to improve the uniformity of signals from fragments that are at approximately the same original concentration by aiding ligation activity. Betaine also improves the PCR amplification of hard to amplify products.
- RE/ligase reaction conditions are optimized to minimize unwanted products.
- terminal phosphate removal from cDNA samples prevents unwanted ligation of cDNA blunt ends together and subsequent exponential amplification of the resulting dimers.
- Another class of unwanted products are fragment concatamers, formed when the sticky ends of cut cDNA fragments hybridize and ligate together. Fragment concatamers are removed by maintaining restriction enzyme activity during ligation in order to cut any unwanted concatamers.
- ligated primers terminate further RE cutting, since primers do not recreate RE recognition subsequences.
- a high molar excess of adapters is, therefore, preferable to limit concatamer formation by driving the RE and ligase reactions toward complete digestion and adapter ligation.
- unwanted adapter self-ligation is prevented since primers and linkers lack terminal phosphates (preferably due to synthesis without phosphates or less preferably due to pretreatment thereof with phosphatases).
- the temperature profile of the RE/ligase reaction is important for complete cutting and ligation.
- the preferred protocol has several steps.
- the first step is at the optimum RE temperature for a time sufficient to achieve substantially complete cutting, for example 37 °C for 30 minutes.
- the ligase used is preferably active during the first step.
- the second step is a ramp at -1 °C/min down to an optimum temperature for adapter annealing and primer ligation, for example, 16 °C.
- the third step achieves substantially complete primer ligation of cut products, and is, for example, at 16 °C for 60 minutes.
- the REs used are preferably active during this third step.
- the fourth step is again at the temperature for optimum RE activity to achieve complete cutting of recognition sites and unwanted ligation products, for example at 37 °C for 15 minutes.
- the fifth step is to heat inactivate the Qlig enzymes and is, for example, above 65 °C. If the PCR amplification is to be performed immediately, as in the preferred single tube protocol of Sec. 6.4.1, this fifth step is at 72 °C for 20 minutes and performs additional reactions to be subsequently described. If the PCR amplification is not to be immediately performed, the Qlig reaction results are held at 4 °C, as in the much less preferred multi-tube protocol of Sec. 6.4.5. This temperature profile, together with the subsequent PCR profile, is illustrated in Fig. 16D.
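The five-step RE/ligase temperature profile can be written down as data and its duration estimated. This is a sketch using the example values given in the steps above for the single tube case; the `Step` class and variable names are hypothetical, and the heating transitions from 16 °C to 37 °C and from 37 °C to 72 °C are not timed in the text, so they are ignored here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    start_c: float
    end_c: float
    minutes: float

def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(end_c - start_c) / abs(rate_c_per_min)

QLIG_PROFILE = [
    Step("RE cutting", 37, 37, 30),
    Step("ramp to ligation temperature", 37, 16, ramp_minutes(37, 16, -1.0)),
    Step("adapter annealing and primer ligation", 16, 16, 60),
    Step("second RE cutting", 37, 37, 15),
    Step("heat inactivation (single tube protocol)", 72, 72, 20),
]

total_minutes = sum(step.minutes for step in QLIG_PROFILE)
for step in QLIG_PROFILE:
    print(f"{step.name}: {step.start_c} -> {step.end_c} C, {step.minutes} min")
print("total (untimed transitions excluded):", total_minutes, "min")
```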
- a less preferred profile involves repetitive cycling of the first four steps of the temperature protocol described above, that is from an optimum RE temperature to an optimum annealing and ligation temperature, and back to an optimum RE temperature.
- the additional temperature cycles act to further drive the RE/ligase reactions to completion.
- The majority of restriction enzymes are active at the conventional 16 °C ligation temperature and hence prevent unwanted ligations without thermal cycling.
- temperature profiles comprising alternating optimum ligation conditions and optimum RE conditions can cause both enzymatic reactions to proceed more rapidly than if at one constant temperature.
- An exemplary profile comprises periodically cycling between a 37 °C optimum RE temperature to a 16 °C optimum annealing and ligation temperature at a ramp of
- the RE and ligase enzymes are heat inactivated by a final stage above 65 °C for 10 minutes.
- thermocyclers for example from MJ Research
- the Qlig mix and reaction temperature profile are designed to achieve the substantially complete cutting of all RE recognition sites present in the analyzed sequence mixture and the substantially complete ligation of primers to cut ends, each primer being unique in one reaction for one particular RE cut end.
- the fragments generated are limited by adjacent RE recognition sites, with substantially no fragments having internal undigested sites. Further, a minimum of unwanted self-ligation products and concatamers is formed.
- This invention is adaptable to other temperature profiles which achieve the same effect of substantially complete cutting and ligation. Exemplary alternative profiles are described in the accompanying examples in Sec. 6.4.
- Following the RE/ligase step is a step for amplifying the doubly cut cDNA fragments.
- Although PCR protocols are described in the exemplary embodiment of this invention, any amplification method that selects fragments to be amplified based on end sequences is adaptable to this invention (see above).
- the amplification step can be dispensed with entirely. This is preferable as molecular amplification often distorts the quantitative response of this method.
- PCR amplification protocols used in this invention are designed to have maximum specificity and reproducibility. First, PCR amplification produces fewer unwanted products if the linkers remain substantially melted and unable to
- amplification primers, typically strand 203 of Fig. 2A (and 304 of Fig. 3A), are preferably designed for high amplification specificity by having a high T m , preferably above 50 °C and most preferably above 68 °C, to ensure specific hybridization with a minimum of mismatches. They are further chosen not to hybridize with any native cDNA species to be analyzed.
- the previously described phasing primers, which are alternatively used for PCR amplification, have similar properties.
- the PCR temperature profile is preferably designed for specificity and reproducibility.
- High annealing temperatures minimize primer mis-hybridizations. Longer extension times reduce PCR bias related to smaller fragments. Longer melting times reduce PCR amplification bias related to high G+C content.
- a preferred PCR temperature cycle is 95 °C for 30 sec., then 57 °C for 1 min., then 72 °C for 2 min. This preferred PCR temperature profile is illustrated in Fig. 16D. Fourth, it is preferable to include Betaine in the PCR reaction mix, as this has been found to improve amplification of hard to amplify products. To further reduce bias, large amplification volumes and a minimum number of amplification cycles, typically between 10 and 30 cycles, are preferred.
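The hold time implied by the preferred PCR cycle can be estimated with a few lines of Python. The names are hypothetical and thermocycler ramp times between temperatures are not included, so the figures are lower bounds on instrument run time.

```python
# (stage, temperature in C, hold time in seconds) for one preferred cycle
PCR_CYCLE = [("melt", 95, 30), ("anneal", 57, 60), ("extend", 72, 120)]

def hold_minutes(cycles, cycle=PCR_CYCLE):
    """Total hold time in minutes for a given number of cycles."""
    seconds_per_cycle = sum(seconds for _, _, seconds in cycle)
    return cycles * seconds_per_cycle / 60

for n in (10, 20, 30):
    print(n, "cycles:", hold_minutes(n), "min")
```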
- any other techniques designed to raise specificity, yield, or reproducibility of amplification are applicable to this method.
- one such technique is the use of 7-deaza-2'-dGTP in the PCR reaction in place of dGTP. This has been shown to increase PCR efficiency for G+C rich targets (Mutter et al., 1995, Nuc. Acid Res. 23: 1411-1418).
- another such technique is the addition of tetramethylammonium chloride to the reaction mixture, which has the effect of raising the T m (Chevet et al., 1995, Nucleic Acids Research 23(16): 3343-3344).
- Amplifications of multiple identical samples with the same number of cycles serve to check reliability and quantitative response by comparing signals from each of the separately amplified aliquots.
- Amplifications of multiple identical samples with an increasing number of amplification cycles, for example 10, 15, and 20 cycles, are preferable in that amplifications with a lower number of cycles can detect more prevalent fragments in a more quantitative manner, while amplification with a higher number of cycles can detect less prevalent fragments but less quantitatively.
- PCR reaction mix, herein called the QPCR mix, is made from appropriate DNA
- Exemplary QPCR mix compositions can be found in the examples of Sec. 6.4.
- the QPCR mix is placed in a reaction tube, and a layer of wax melting near but below 72 °C is layered above the QPCR mix.
- the Qlig mix is placed above the wax layer and processed according to the previously described temperature profile, which does not melt the wax.
- the tube is incubated at 72°C for 20 min. This incubation melts the linkers from the fragments, melts the wax layer and allows the processed Qlig mix and the QPCR mix to combine, and finally, permits the DNA polymerase to complete the fragments to blunt-ended dsDNA. After this incubation, the PCR temperature profile is
- the preferred wax to prevent such intermingling is a mixture of Paraffin wax and ChilloutTM 14 wax in a 90:10 ratio
- the paraffin is a highly purified paraffin wax melting between 58 °C and 60 °C such as can be obtained from Fluka Chemical, Inc. (Ronkonkoma, N.Y.) as Paraffin Wax cat. no. 76243.
- Chillout 14 Liquid Wax is a low melting, purified paraffin oil available from MJ Research.
- This wax layer is created in the following manner. The reaction tubes are pre-waxed by melting the preferred wax onto the upper half of the sides of the tubes. The QPCR mix is added carefully avoiding this wax layer. Then the wax layer is melted onto the surface of the QPCR mix by incubating the tubes at 75°C for 2 min. The wax layer is then carefully solidified by
- PCR amplification can be performed in a separate tube.
- the QPCR mix is prepared in a second tube.
- the first tube with the processed Qlig mix is incubated at 72°C for approximately 10 min. in order to melt the linker from the fragments.
- An aliquot of the Qlig mix is then combined with the QPCR mix in the second tube, and a further incubation at 72°C for 10 minutes completes the fragments to blunt-ended dsDNA.
- the PCR temperature profile is performed according to the preferred protocol for a certain number of cycles.
- cleanup and separation steps prior to length separation and fragment detection can be advantageous to substantially eliminate certain unwanted DNA strands and thereby to improve the signal to noise ratio of QEATM signals, or to substantially separate the reaction products into various classes and thereby to simplify interpretation of detected fragment patterns by removing signal ambiguities.
- primer enhancements including conjugated capture moieties and release means.
- QEATM reaction products fall into certain categories. These categories, described without limitation in the case where the capture moiety is biotin, are:
- the additional method steps comprise contacting the amplified fragments with streptavidin affixed to a solid support, preferably streptavidin magnetic beads, washing the beads in a non-denaturing wash buffer to remove unbound DNA, and then resuspending the beads in a denaturing loading buffer and separating the beads from this buffer.
- the denatured single strands are then passed to the separation and
- the biotinylated primer can include a release means in order to recover fragments of class "c".
- the releasing means, e.g. UDG or AscI
- the releasing means can be applied to release the biotinylated strands for separation and detection. Fragments detected at this second separation in addition to those previously detected then represent class "c" products.
- capture moieties can be used in a single reaction to separate different classes of products.
- Capture moieties can be combined with release means to achieve similar separation.
- Label moieties can be combined with capture moieties to verify separations or to run reactions in parallel.
- This invention is adapted to other less preferred means for single strand separation and product concentration that are known in the art.
- single strands can be removed by the use of single strand specific exonucleases. Mung Bean exonuclease, Exo I or S1 nuclease can be used, with Exo I preferred because of its higher specificity for single strands while S1 is least preferred.
- Other methods to remove unwanted strands include the affinity based methods of gel filtration and affinity column separation. Amplified products can be concentrated by ethanol precipitation or column separation.
- the last QEA™ step is separation according to length of the amplified fragments followed by detection of the fragment lengths and end labels (if any).
- Lengths of the fragments cut from a cDNA sample typically span a range from a few tens of bp to perhaps 1000 bp. Any separation method with adequate length resolution, preferably at least to three base pairs in a 1000 base pair sequence, can be used. It is preferred to use gel electrophoresis in any adequate
- Gel electrophoresis is capable of resolving separate fragments which differ by three or more base pairs and, with knowledge of average fragment composition and with correction of composition induced mobility differences, of achieving a length precision down to 1 bp.
- a preferable electrophoresis apparatus is an ABI 377 (Applied Biosystems, Inc.) automated sequencer using the Gene Scan software (ABI) for analysis.
- the electrophoresis can be done by suspending the reaction products in a loading buffer, which can be non-denaturing, in which the dsDNA remains hybridized and carries the labels (if any) of both primers.
- the buffer can also be denaturing, in which the dsDNA separates into single strands that typically are expected to migrate together (in the absence of large average differences in strand composition or significant strand secondary structure).
- fluorochrome labels can typically be resolved from a single band in a gel, the products of one recognition reaction with several REs or other recognition means or of several separate recognition reactions can be analyzed in a single lane. However, where one band reveals signals from multiple fluorochrome labels, interpretation can be ambiguous: is such a band due to one fragment cut with multiple REs or to multiple fragments each cut by one RE? In this case, it can also be advantageous to separate reaction products into classes.
- SEQ-QEA™ is an alternative embodiment of the preferred method of practicing a RE/ligase embodiment of the QEA™ method as previously described in Sec. 5.2.2.
- a SEQ-QEA™ method is able to identify an additional 4-6 terminal nucleotides adjacent to the recognition subsequence of the RE initially cutting a fragment.
- the effective target subsequence is the concatenation of the initial RE recognition subsequence and the additional 4-6 terminal nucleotides, and has, therefore, a length of at least from 8 to 12 nucleotides and preferably has a length of at least 10 nucleotides. This longer
- QEA™ Analysis and Design Methods, which involve searching a database of sequences to identify the sequence or gene from which the fragment derived.
- the longer effective target subsequence increases the capability of these methods to determine a unique source sequence for a fragment.
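- The gain from a longer effective target subsequence can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, a database of uniformly random, independent bases; the function name and the database size are hypothetical and are not taken from the source.

```python
def expected_random_hits(target_length: int, database_bases: int) -> float:
    """Expected number of chance occurrences of one specific target
    subsequence in a database of random bases (uniform, independent)."""
    return database_bases / (4 ** target_length)


# Hypothetical database size, chosen only to make the trend visible.
DATABASE_BASES = 10_000_000

for length in (6, 8, 10, 12):
    hits = expected_random_hits(length, DATABASE_BASES)
    print(f"target length {length:2d}: about {hits:.3g} chance matches")
```

Under these assumptions each added nucleotide reduces chance matches fourfold, which is why a 10 nucleotide effective target narrows the candidate sources far more than a 6 nucleotide recognition subsequence alone.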
- Type IIS REs Next the specially constructed primers, and then the additional method steps of a SEQ-QEATM method used to recognize the additional nucleotides.
- a Type IIS RE is a restriction endonuclease enzyme which cuts a dsDNA molecule at locations outside of the recognition sequence of the Type IIS RE (Szybalski et al., 1991, Gene 100:13-26).
- Fig. 17C illustrates Type IIS RE 1731 cutting dsDNA 1730 outside of its recognition subsequence 1720 at locations 1708 and 1709.
- the Type IIS RE preferably generates an overhang by cutting the two dsDNA strands at locations differently displaced away on the two strands from the recognition sequence.
- the sequence of the generated overhang is determined by the dsDNA cut, in
- Type IIS REs are sequenced.
- Table 17 in Sec. 6.10.1 lists several Type IIS REs adaptable for use in the SEQ-QEATM method and their relevant characteristics, including their recognition subsequences on both DNA strands and the displacements from these recognition subsequences to the respective cutting sites. It is preferable to use REs of high specificity and generating an overhang of at least 4 bp displaced at least 4 or 5 bp beyond the recognition subsequence in order to span the remaining recognition subsequence of the RE that
- FokI and BbvI are most preferred Type IIS REs for the SEQ-QEA™ method.
- the special primers, and the special linkers if needed, which hybridize to form the adapters for SEQ-QEA™ have, in addition to the structure previously described in Sec. 5.2.1, a Type IIS recognition subsequence whose
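- How a Type IIS RE produces an overhang outside its recognition subsequence can be sketched as follows. The cut displacements used here for FokI (9 on one strand, 13 on the other, measured from the end of GGATG) are commonly cited values and are an assumption of this sketch; the source defers to its Table 17 for the actual figures. The function name is hypothetical, and only a forward-orientation site on the given strand is considered.

```python
def type_iis_overhang(top_strand, site, d_top, d_bottom):
    """Return the bases lying between the two staggered cuts of a
    Type IIS enzyme, read on the top strand, or None if the site is
    absent or the cuts fall beyond the end of the sequence."""
    start = top_strand.find(site)
    if start < 0:
        return None
    site_end = start + len(site)
    cut_top = site_end + d_top        # cut position on the top strand
    cut_bottom = site_end + d_bottom  # cut position on the bottom strand
    if max(cut_top, cut_bottom) > len(top_strand):
        return None
    return top_strand[min(cut_top, cut_bottom):max(cut_top, cut_bottom)]


# Illustrative sequence: 2 flanking bases, GGATG, 9 spacer bases, then TTCC.
sequence = "AA" + "GGATG" + "ACGTACGTA" + "TTCC" + "GGG"
print(type_iis_overhang(sequence, "GGATG", 9, 13))  # TTCC
```

The overhang sequence depends only on the bases at the cut location, not on the recognition subsequence, which is what lets it carry new sequence information.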
- Figs. 17A-E schematically illustrates dsDNA 1702, which is a fragment cut from an original sample sequence on one end by a first RE and on the other end by a different second initial RE, with adapters fully hybridized but prior to primer ligation.
- linker strand 1711 has hybridized to primer strand 1712 and to the 5' overhang generated by the first RE, and now fixes primer 1712 adjacent to fragment 1702 for subsequent ligation.
- Primer 1712 has recognition subsequence 1720 for Type IIS RE 1721.
- Linker 1711 to the extent it overlaps and hybridizes with recognition subsequence 1720, has
- primer 1712 preferably has a conjugated label moiety 1734, e.g. a fluorescent FAM moiety.
- linker strand 1713 has hybridized to primer strand 1714 and to the 5' overhang generated by the second RE.
- Primer 1714 preferably has a conjugated capture moiety 1732, e.g. a biotin moiety, and a release means represented by subsequence 1723.
- Subsequence 1704 terminating at nucleotide 1707 in Fig. 17B is the portion of the recognition subsequence of the first RE remaining after its cutting of the original sample sequence.
- the placement of the Type IIS RE recognition subsequence is determined by the length of this subsequence.
- Fig. 17A schematically illustrates how the length of the subsequence 1704 is determined by properties of the first RE.
- the first initial RE is chosen to be of a type that
- subsequence 1707 of sample dsDNA 1701, and that cuts the two strands of dsDNA 1701 at locations 1705 that are located within recognition subsequence 1703.
- subsequence 1703 be entirely determined by the first RE and be without indeterminate nucleotides.
- overhang subsequence 1706 is generated and has a known sequence, since it is entirely within the determined recognition subsequence 1703.
- subsequence 1704 the portion of the recognition subsequence 1703 remaining on a fragment cut by the first RE, has a length not less than the length of overhang 1706 and is typically longer.
- subsequence 1703 is of length 6 and is palindromic; locations 1705 are symmetrically placed in subsequence 1703; and overhang 1706 is of length 4.
- the typical length of the remaining portion 1704 of the recognition subsequence 1703 is of length 5.
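- The relation between recognition subsequence length, overhang length, and the length of the remaining portion 1704 can be checked with a short calculation. The sketch assumes a palindromic site cut symmetrically on both strands, as in the example above; the function name is hypothetical.

```python
def remaining_site_length(site_length, overhang_length):
    """Length of the recognition subsequence left on a fragment after a
    symmetric cut that produces an overhang of the given length."""
    if overhang_length > site_length:
        raise ValueError("overhang cannot be longer than the site")
    if (site_length - overhang_length) % 2 != 0:
        raise ValueError("symmetric cuts need an even length difference")
    # Symmetric placement: the same number of bases is removed from each side.
    cut_offset = (site_length - overhang_length) // 2
    return site_length - cut_offset


print(remaining_site_length(6, 4))  # 5
```

For a site of length 6 with an overhang of length 4, one base is cut away and five remain, matching the typical length stated for subsequence 1704.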
- Type IIS recognition sequence 1720 is now described with reference to Fig. 17C, which schematically illustrates dsDNA 1730, which derives from dsDNA 1702 of Fig. 17B after the further steps of primer ligation, PCR amplification with primers 1712 and 1714, binding of capture moiety 1732 to binding partner 1733 affixed to a solid-phase substrate, and binding of Type IIS RE 1731 to its recognition subsequence 1720.
- Subsequence 1722 is the subsequence between recognition subsequence 1720 and the end of primer 1712 at location 1705.
- Type IIS RE is illustrated cutting dsDNA 1730 at nucleotide locations 1708 and 1709 and, thereby, generating an exemplary 5' overhang 1724 between these locations. For this overhang to be contiguous with the remaining portion 1704 of initial target end subsequence 1703, nucleotide 1709 is adjacent to
- Type IIS recognition sequence 1720 is preferably placed on primer 1712 such that the length of subsequence 1704 plus the length of subsequence 1722 equals the distance of closest cutting of Type IIS RE 1731. For example, in the case of
- FokI, since the closest cutting distance is 9 and the typical length of subsequence 1704 is 5, its recognition sequence is preferably placed 4 bp from the end of primer 1712. In the case of BbvI, since the closest cutting distance is 8, its recognition sequence is preferably placed 3 bp from the end of primer 1712.
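- The placement rule stated above (length of subsequence 1704 plus length of subsequence 1722 equals the closest cutting distance of the Type IIS RE) reduces to a subtraction. The sketch below applies it to the two cutting distances given in the text; the function name is hypothetical.

```python
def spacer_length(closest_cut_distance, remaining_site_length):
    """Length of subsequence 1722: bases between the Type IIS recognition
    sequence and the end of the primer, so that the closest cut lands at
    the end of the remaining first-RE recognition subsequence."""
    spacer = closest_cut_distance - remaining_site_length
    if spacer < 0:
        raise ValueError("enzyme cuts too close for contiguous placement")
    return spacer


REMAINING_1704 = 5  # typical length of subsequence 1704, from the text
print("FokI:", spacer_length(9, REMAINING_1704))
print("BbvI:", spacer_length(8, REMAINING_1704))
```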
- FIG. 17D schematically illustrates dsDNA
- dsDNA has 5' overhang 1724 between and including nucleotides 1708 and 1709, where the Type IIS RE cut dsDNA 1730 of Fig. 17C. This overhang is contiguous with former subsequence 1704, the remaining portion of the recognition sequence of the first RE, which has been cut off.
- the shorter strand has primer 1714 including release means represented by subsequence 1723.
- dsDNA 1730 remains bound to the solid-phase support through capture moiety 1732 and binding partner 1733. The absence of label moiety 1734 can be used to monitor the completeness of cutting by Type IIS RE 1731.
- This invention is also adaptable to other less preferable placements of recognition sequence 1720. If recognition sequence 1720 is placed closer to the 3' end of primer 1712 than the optimal and preferable distance, the overhang produced by Type IIS RE 1731 is not contiguous with recognition subsequence 1703 of the first RE, and a
- the determined sequence of the Type IIS RE generated overhang can be used as third internal subsequence information in QEATM experimental analysis methods in order to further resolve the source sequence of fragment 1702, if necessary.
- recognition sequence 1720 is placed further from the 3' end of the cut primer than the optimal and preferable distance, the overhang produced by Type IIS RE overlaps with recognition subsequence 1703 of the first RE.
- the length of the now contiguous effective target subsequence is less than the sum of the lengths of the Type IIS overhang and the first RE recognition subsequence. Effective target end subsequence information is, thereby, lost.
- recognition sequence 1720 is placed further from the 3' end than the distance of furthest cutting, no additional information is obtained.
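- The three placement cases described above (gap, contiguous, overlap) can be made concrete with a simplified model. The sketch assumes FokI-like cutting distances of 9 and 13 and a remaining first-RE subsequence of length 5; the furthest cutting distance of 13 and the function name are assumptions of this sketch. In this model the overhang yields no new bases as soon as it lies entirely within already known sequence, a condition that includes the case of placement beyond the furthest cutting distance.

```python
def overhang_information(spacer, remaining_site=5, closest_cut=9, furthest_cut=13):
    """Classify a placement of the Type IIS recognition sequence.

    spacer is the number of primer bases between the Type IIS site and
    the end of the primer. Returns (status, new_bases), where new_bases
    counts overhang positions not already known from the first RE site."""
    known_end = spacer + remaining_site  # where known sequence stops
    if known_end == closest_cut:
        status = "contiguous"
    elif known_end < closest_cut:
        status = "gap"
    elif known_end < furthest_cut:
        status = "overlap"
    else:
        status = "no new bases"
    new_bases = furthest_cut - max(closest_cut, min(known_end, furthest_cut))
    return status, new_bases


for spacer in (2, 4, 6, 8):
    print(spacer, overhang_information(spacer))
```

With a gap, the overhang still supplies four bases but they are not adjacent to the first RE site; with an overlap, some overhang positions only repeat known bases.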
- Primer 1714 also has certain additional structure.
- primer 1714 has capture moiety 1732 conjugated near or to its 5' end. Biotin/streptavidin are the preferred capture moiety/binding partner pair, which are used in the following description without limitation to this invention.
- primer 1714 has release means represented as subsequence 1723. As previously described, the release means allows controlled release of strand 1735 of Fig. 17D from the capture moiety/binding partner complex. This alternative is adaptable to any such controlled release means, including the cases where subsequence 1723 is one or more uracil nucleotides and where it is the recognition subsequence of an RE which cuts extremely rarely, if at all, in the sequences of the sample, e.g. AscI.
- Release means are particularly useful in the case of biotin-streptavidin, which form a complex that is difficult to dissociate.
- Table 18 of Sec. 6.30.1 lists exemplary primers, linkers, and associated REs, for the preferred implementation of SEQ-QEATM in which contiguous effective target end
- SEQ-QEATM comprises, first, practicing the
- Figs. 17B-E illustrate various steps in a SEQ-QEATM method.
- Fig. 17B illustrates a fragment from a sample sequence digested by two different REs and just prior to primer ligation.
- Fig. 17C illustrates a sample sequence after primer ligation, chain blunt-ending, and PCR amplification.
- the additional steps unique to SEQ-QEATM include, first, binding the amplified fragments to a solid-phase support, also illustrated in Fig. 17C, second, washing the bound fragments, and third, digesting the bound fragments by the Type IIS RE corresponding to primer 1712 used.
- the Type IIS digestion is preferably performed with reaction
- Fig. 17D illustrates dsDNA fragments 1730 remaining after complete digestion by the Type IIS RE.
- an aliquot of the bound, amplified RE/ligase reaction products is denatured and the supernatant, containing the labeled 5' strands, is separated according to length by, e.g., gel electrophoresis, in order to determine the length of each fragment doubly cut by different REs.
- the subsequent additional SEQ-QEATM step is sequencing of overhang 1724.
- This can be done in any manner known in the art.
- an alternative, herein called a phasing QEATM method can be used to sequence this overhang.
- Phasing QEATM depends on the precise sequence specificity with which RE/ligase reactions recognize short overhangs, in this case the Type IIS generated overhang.
- Fig. 17E illustrates a first step of this embodiment in which a QEATM method adapter, which is comprised of primer 1751 with label moiety 1753 and linker 1750, has hybridized to overhang 1724 in Type IIS digested fragment 1730 bound to a solid-phase support.
- overhang 1724 is here illustrated as being 4 bp long.
- special phasing linkers are used.
- 4 pools of linkers 1750 are prepared.
- All linkers in each pool have one fixed nucleotide, i.e. one of either A, T, C, or G, at that position, e.g. position 1755, while random nucleotides in all combinations are present at the other three positions.
- For each nucleotide position of the overhang four RE/ligase reactions are performed according to QEATM protocols, one reaction using linkers from one of the four corresponding pools.
- the results of the four RE/ligase reactions are denatured and separated according to length, only one reaction of the four can produce labeled products at a length corresponding to the length of fragment 1730, namely the reaction with linkers complementary to position 1754 of overhang 1724.
- this overhang can be sequenced.
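- The phasing logic described above can be simulated in a few lines. For each overhang position there are four linker pools, each fixing one nucleotide at that position and carrying all combinations elsewhere; only the pool containing the exactly complementary linker ligates. The sketch assumes perfect ligation specificity, and all names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def linker_pool(position, fixed_base, overhang_length=4):
    """All linker overhangs with fixed_base at position, any base elsewhere."""
    return {"".join(combo)
            for combo in product(BASES, repeat=overhang_length)
            if combo[position] == fixed_base}


def call_linker_sequence(ligates, overhang_length=4):
    """Run four pooled reactions per position and record which pool
    gives a product; ligates(pool) reports whether ligation occurred."""
    called = []
    for position in range(overhang_length):
        hits = [b for b in BASES
                if ligates(linker_pool(position, b, overhang_length))]
        called.append(hits[0] if len(hits) == 1 else "N")
    return "".join(called)


# Simulated fragment overhang; only its exact complement ligates.
true_overhang = "ACCT"
matching_linker = reverse_complement(true_overhang)
linker_call = call_linker_sequence(lambda pool: matching_linker in pool)
print(reverse_complement(linker_call))  # ACCT
```

Each pool holds 4^3 = 64 linkers for a 4 base overhang, so sixteen pooled reactions (four per position) suffice to read the whole overhang.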
- the products of these four RE/ligase reactions can be further PCR amplified.
- linkers 1750 comprise subsequence 1756 that is uniquely related to the fixed nucleotide in subsequence 1752 and if four separately and distinguishably labeled primers 1751 complementary to these unique subsequences are used, all four RE/ligase reactions for one overhang position can be simultaneously performed in one reaction tube.
- release means 1723 can be omitted from primer 1714.
- sequencing of a 5' overhang can be done by standard Sanger reactions.
- strand 1735 is elongated by a DNA polymerase in the presence of labeled ddNTPs at a relatively high concentration to dNTPs in order to achieve frequent incorporation in the short 4-6 bp elongation.
- Partially elongated strands 1735 are released by denaturing fragment 1730, washing, and then by causing release means 1723 to release strands 1735 from the capture moiety bound to the solid phase support.
- the released, partially elongated strands are then separated by length, e.g., by gel electrophoresis, and the chain terminating ddNTP is observed at the length previously observed for that fragment. In this manner, the 4-6 bp overhang 1724 of each fragment can be quickly sequenced.
- the effective target subsequence information formed by concatenating the sequence of the Type IIS overhang to the sequence of the recognition subsequence of the first RE, is then input into QEA™ Experimental Analysis methods, and is used as a longer target subsequence in order to determine the source of the fragment in question.
- This longer effective target subsequence information preferably permits exact and unique sample sequence identification.
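- A toy database lookup shows how the longer effective target subsequence can resolve an ambiguity. The sketch is not the analysis method of Sec. 5.4: it measures fragment length simply from the start of the target to the end of the nearest downstream second site, ignores adapters and exact cut positions, and uses invented sequences. The sites GAATTC and GGATCC are illustrative choices, and all names are hypothetical.

```python
def candidate_sources(database, target, second_site, fragment_length):
    """Names of database sequences containing target followed by the
    nearest second_site at a spacing that gives fragment_length."""
    matches = []
    for name, seq in database.items():
        start = seq.find(target)
        while start != -1:
            end = seq.find(second_site, start + len(target))
            if end != -1 and end + len(second_site) - start == fragment_length:
                matches.append(name)
                break
            start = seq.find(target, start + 1)
    return matches


# Invented example sequences, identical except just after the first site.
database = {
    "geneA": "TTGAATTCACGTAAAAAGGATCCTT",
    "geneB": "TTGAATTCTTTTAAAAAGGATCCTT",
}

print(candidate_sources(database, "GAATTC", "GGATCC", 21))
print(candidate_sources(database, "GAATTCACGT", "GGATCC", 21))
```

With the 6 base target both entries match at the observed length; adding the four overhang bases leaves a single candidate.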
- nucleic acid e.g. cDNA
- synthesis conditions are then only of indirect importance, in that they preferably adequately represent input mRNA.
- RE/ligase embodiments utilize signals from fragments of a nucleic acid that, although only singly cut by an RE on one end, nevertheless have a definite length, dependent only on nucleotide sequence, because of particular cDNA synthesis conditions that fix the other end.
- the cDNA synthesis conditions are of direct importance, in that these embodiments can only be used with cDNA synthesized according to the particular conditions. In general, these conditions ensure that the cDNA begins or ends in a known relation, herein called "anchored," to general landmarks on the input mRNA.
- anchored a known relation
- preferable anchoring landmarks include the 5' end of the poly (A) + tail present on the 3' end of the input mRNA, or the cap on the 5' end of the input mRNA.
- cDNA fragments terminated on their 5' end in a fixed relation to the 5' cap of the source mRNA and cut on their 3' end at the nearest recognition subsequence of a single RE have a
- cDNA fragments terminated on their 3' end in a fixed relation to the 5' end of the poly (A) + tail present on the source mRNA and cut on their 5' end at the nearest recognition sequence of a single RE also have a definite length and generate QEATM signals that can also be used to determine the source nucleic acid in the sample.
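- Why a singly cut, 3'-anchored fragment has a definite length can be seen in a small sketch: the length is fixed by the RE recognition sequence nearest the poly(A) tail. The example measures from the start of that recognition sequence to the 5' end of the tail, ignores the exact cut position and any primer or adapter bases, and uses an invented sequence; the function name and the min_tail parameter are hypothetical.

```python
def three_prime_anchored_length(cdna, site, min_tail=10):
    """Distance from the recognition sequence nearest the poly(A) tail
    to the 5' end of that tail, or None if either is missing."""
    tail_start = len(cdna)
    while tail_start > 0 and cdna[tail_start - 1] == "A":
        tail_start -= 1
    if len(cdna) - tail_start < min_tail:
        return None  # no recognizable poly(A) tail
    site_pos = cdna.rfind(site, 0, tail_start)
    if site_pos == -1:
        return None  # the RE does not cut this cDNA
    return tail_start - site_pos


cdna = "CCGAATTCTTGAATTCGGCTG" + "A" * 12
print(three_prime_anchored_length(cdna, "GAATTC"))  # 11
```

Sequence upstream of the nearest site, including further recognition sequences, does not change the result, so the length depends only on the nucleotide sequence near the 3' end.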
- such cDNA can be synthesized by a protocol which requires the presence of an intact 5' cap on the input mRNA.
- a protocol which requires the presence of an intact 5' cap on the input mRNA is described in Sec. 6.3.3.
- RNA ligase to ligate to a source mRNA at the nucleotide adjacent to the 5' cap a DNA-RNA chimera comprising a first DNA subsequence 5' to the ribonucleotide triplet GGA at the 3' end of the chimera.
- the RNA component of the DNA-RNA chimera is preferably GGA, but any RNA subsequence can be used that promotes effective ligation by the ligase chosen of the chimera to the source mRNA.
- the DNA oligonucleotide component is later used as a primer and is herein called a "5'-cap-primer"
- RNA ligase is T4 RNA ligase.
- First strand synthesis is then performed with a first DNA primer comprising the first DNA subsequence. Thereby, all cDNAs originate from input mRNAs having their 5' cap. Second strand synthesis is then performed with such second strand primers as are known in the art.
- Preferably second strand primers are three second strand primers mixed or in separate pools, each of which comprises a second DNA subsequence 5' to one of three oligo(dT) one-nucleotide phasing primers, as known in the art (Liang et al., 1994, Nuc. Acid Res. 22:5763-5764).
- the first DNA primer and a second DNA primer comprising the second DNA subsequence can be used in a PCR reaction to amplify the synthesized cDNA.
- This QEA™ embodiment is adaptable to other methods known in the art to produce cDNAs with a 5' end anchored in a fixed relation to the 5' mRNA cap, for example the CapFinder™ PCR cDNA Library Construction Kit from Clontech (Palo Alto, CA). See also Schmidt et al., 1996, Nuc. Acids. Res. 24:1789-1791.
- the first and second DNA primer sequences are preferably chosen according to certain guidelines. First, they are chosen not to generate by themselves any PCR
- first and second primers are described in Sec. 6.3.3.
- Software packages are available for primer construction according to such guidelines, an example being OLIGO™ Version 4.0 For Macintosh from National Biosciences, Inc. (Plymouth, MN).
- the 5'-QEA™ embodiment is performed according to the general methods of Sec. 5.2.2, including the optional cleanup and separation steps.
- the QPCR mix is prepared as previously described.
- the Qlig mix includes the one RE chosen to cut the fragment and an associated adapter with primer excess. These primers are preferably labeled and most preferably do not have a conjugated capture moiety.
- an extra primer which is the first DNA primer, that is the DNA portion of the chimera now appearing on the 5' end of the synthesized cDNA, together with a conjugated biotin moiety or other capture moiety.
- the RE/ligase reactions and the subsequent PCR amplification are performed as previously described and result in the following classes of fragments.
- Such cDNA can be synthesized by protocols known in the art which utilize phasing primers.
- phasing primers can comprise a first DNA subsequence, which is constructed according to the previously described primer guidelines, 5' to one of three oligo(dT) one nucleotide phasing primer subsequences (Liang et al. 1994). Sequences MBTA, MBTC, and MBTG of Sec. 6.3.3 are exemplary of such primers.
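- The structure of one-nucleotide phasing primers can be sketched as a first DNA subsequence, an oligo(dT) run, and a single 3' anchoring base that is A, C, or G. The placeholder first subsequence, the oligo(dT) length of 12, and all names below are assumptions for illustration; the actual sequences MBTA, MBTC, and MBTG are given in Sec. 6.3.3 and are not reproduced here.

```python
# DNA base that pairs with each RNA base.
DNA_PARTNER_OF_RNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def phasing_primers(first_subsequence, dt_length=12):
    """The three primers, keyed by their 3' anchoring base."""
    return {base: first_subsequence + "T" * dt_length + base
            for base in "ACG"}


def anchoring_base_for(mrna):
    """3' primer base needed to pair with the mRNA base immediately
    5' of the poly(A) tail, or None if no such base exists."""
    body = mrna.rstrip("A")
    if not body or body == mrna:
        return None
    return DNA_PARTNER_OF_RNA[body[-1]]


primers = phasing_primers("GATTACAGATTACA")  # placeholder subsequence
mrna = "GGCAUCG" + "A" * 20
base = anchoring_base_for(mrna)
print(base, primers[base])
```

Because the base before the tail is by definition not A, the anchoring base is never T, which is why three primers rather than four cover all messages.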
- RE/ligase and PCR amplification reactions are carried out according to the protocol of the 5'-QEATM embodiment with the exception that the extra primer used in the Qlig mix is the first DNA subsequence used in the prior cDNA synthesis with a conjugated biotin or other capture moiety.
- signals are only generated from fragments cut by the chosen RE adjacent to the 3' end. These signals have a definite length, because the RE recognition site nearest the 3' end is determined only by the sequence of the nucleic acid.
- the signals generated from the singly cut fragments according to the protocols of this section can be used in the computer implemented experimental analysis methods of Sec. 5.4 in order to determine the sample nucleic source of a particular signal.
- the analysis methods need minimal
- This adaptation can be done in several ways, including simply specially marking in the signals that one target end subsequence is the 3' or 5' end as needed or by including in the generated signal an artificial and not naturally occurring target subsequence that represents the 3' or the 5' end as
- the embodiments of this section remove unwanted RE/ligase reaction products at least partially by utilizing cDNA with conjugated capture moieties, obtained perhaps from either first and second strand synthesis with primers having conjugated capture moieties or from PCR amplification of cDNA with such primers.
- the preferred capture moiety is biotin for which the corresponding binding partner is streptavidin attached to a solid support, preferably magnetic beads.
- a first QEATM embodiment in conjunction with sufficiently sensitive detection means can advantageously minimize or eliminate altogether the PCR amplification step.
- PCR amplification disadvantageously has a non-linear response well known in the arts, depending on such factors as fragment length, average base composition, and secondary structure. To improve quantitative response, it is preferred to
- output signal intensity is more nearly linearly responsive to the abundance of the input nucleic acids generating that signal.
- the amplification step serves both to amplify the signals from fragments of interest and simultaneously to dilute the signals from unwanted fragments without a definite sequence-dependent length and.
- fragments doubly cut with REs and ligated to adapters are exponentially amplified, while unwanted fragments singly cut by an RE are at best linearly amplified.
- doubly cut fragments are amplified 1000X while singly cut fragments are amplified 10X, fragments from sample nucleic acids with a relative abundance of 1% or more can be detected above the background noise while fragments from sample nucleic acids with a relative abundance of 1% or less can be lost in the unwanted
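- The arithmetic behind the 1% figure can be written out. The sketch uses a deliberately crude model: the background is taken to be the singly cut fragments of the whole sample (relative abundance 1), a doubly cut fragment is detectable when its amplified signal at least equals that amplified background, and ideal cycle efficiency is assumed. The function names are hypothetical.

```python
def ideal_gains(cycles):
    """(exponential gain, linear gain) after the given number of
    ideal amplification cycles."""
    return 2 ** cycles, cycles


def minimum_detectable_abundance(double_cut_gain, single_cut_gain):
    """Relative abundance at which amplified signal equals the
    amplified background of singly cut fragments."""
    return single_cut_gain / double_cut_gain


print(minimum_detectable_abundance(1000, 10))  # 0.01, i.e. 1%
print(ideal_gains(10))                         # (1024, 10)
```

Ten ideal cycles give gains close to the 1000X and 10X of the example, and each further cycle roughly halves the detectable abundance under this model.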
- More sensitive detection means decrease the need for amplification in order to generate observable signals.
- a minimum of 6 × 10⁻¹⁸ moles of fluorochrome (approximately 10⁵ molecules) is required for detection. Since one gram of cDNA contains about 10⁻⁶ moles of transcripts, it is possible to detect transcripts to at least a 1% relative level from microgram quantities of mRNA. With greater mRNA quantities, proportionately rarer transcripts are detectable. Labeling and detection schemes of increased sensitivity permit use of less mRNA. Such a scheme of increased sensitivity is
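- The sensitivity estimate above can be reproduced from the two figures given in the text. The sketch assumes, optimistically, that every transcript is labeled with one fluorochrome and that the whole sample reaches the detector without loss; the constant and function names are hypothetical.

```python
DETECTION_LIMIT_MOL = 6e-18      # moles of fluorochrome, from the text
TRANSCRIPT_MOL_PER_GRAM = 1e-6   # moles of transcripts per gram of cDNA


def detectable_relative_abundance(sample_grams):
    """Smallest relative abundance whose transcripts alone reach the
    detection limit in a sample of the given mass."""
    total_moles = TRANSCRIPT_MOL_PER_GRAM * sample_grams
    return DETECTION_LIMIT_MOL / total_moles


for grams in (1e-6, 1e-7, 1e-8):
    print(f"{grams:.0e} g: {detectable_relative_abundance(grams):.1e}")
```

Under these idealized assumptions one microgram supports detection far below the 1% level, so the 1% figure leaves a wide margin for labeling and handling losses.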
- the first embodiment described in this section minimizes the need for amplification in order to dilute unwanted signals by using a capture moiety to remove unwanted singly cut
- Figs 4A, 4B, and 4C illustrate this alternative protocol, which preferably uses biotin as a capture moiety for direct removal of the singly cut 3' and 5' cDNA ends from the RE/ligase reaction products.
- cDNA first strands are synthesized according to the method of Sec. 6.3.3 using, for example, an oligo(dT) primer with a biotin molecule linked to a thymidine nucleotide.
- an oligo(dT) primer with a biotin molecule linked to a thymidine nucleotide.
- such a primer is
- Fig. 4A illustrates such a cDNA 401 with ends 407 and 408, poly(dA) subsequence 402, oligo(dT) primer 403 with biotin 404 attached.
- Fragment 409 is the cDNA sequence defined by these adjacent RE recognition sequences. Fragments 423 and 424 are singly cut fragments resulting from RE cleavages at subsequences 405 and 406.
- the cDNA is ligated into a circle.
- a ligation reaction using, for example, T4 DNA ligase is performed under sufficiently dilute conditions so that predominantly intramolecular ligations occur circularizing the cDNA, with only a minimum of intermolecular, concatamer forming ligations. Reaction conditions favoring
- Concatamers can be separated from circularized single molecules by size
- FIG. 4B illustrates the circularized cDNA. Blunt end ligation occurred between ends 407 and 408.
- the circularized, biotin labeled, cDNA is cut with REs and ligated to adapters uniquely recognizing and perhaps uniquely labeled for each particular RE cut.
- the RE/ligase step is performed by procedures described in the sections hereinabove, for example in Sec. 5.2.2, so that RE digestion and primer ligation proceed to completion with minimal formation of concatamers and other unwanted ligation products.
- unwanted singly cut ends are removed by contacting the reaction products with streptavidin or avidin magnetic beads, leaving only doubly cut fragments that have RE-specific recognition sequences ligated to each end.
- Fig. 4C illustrates these steps. Sequences 405 and 406 are cut by RE1 and RE2, respectively, and adapters 421 and 422 specific for cuts by RE1 and RE2, respectively, are ligated onto the overhangs. Thereby, fragment 409 is freed from the circularized cDNA and adapters 421 and 422 are ligated to it.
- the remaining segment of the circularized cDNA comprises singly cut ends 423 and 424 with ligated adapters 421 and 422. Both singly cut ends are joined to the primer sequence 403 with attached biotin 404. Removal is accomplished by contact with streptavidin or avidin 420 which is fixed to substrate 425, perhaps comprising magnetic beads. Doubly cut labeled fragment 409, now separated from the singly cut ends, can be separated according to length and detected with minimized background noise signals.
- the detected signals more quantitatively reflect the relative abundance of the source cDNA, and thus gene expression levels.
- the reaction products can be subjected to just the minimum number of cycles, for example according to the methods of Sec. 5.2.2, to detect the gene or sequence of interest.
- the number of cycles can be as small as four to eight without any concern of
- amplification is not needed to suppress signals from singly cut ends, and preferred more quantitative response signal intensities result.
- Another QEATM embodiment amplifies the cDNA sample prior to the RE/ligase reactions, removes unwanted fragments with a removal means, and then separates and detects the reaction products. Alternately, further amplification of the fragments of interest can be performed after the RE/ligase step.
- double stranded cDNA perhaps prepared from a tissue sample according to Sec.
- primers a conjugated capture moiety preferably biotin.
- a set of arbitrary primers with no net sequence preference can be used.
- the method of step 6 of that protocol can be used, except that both the MA24 and MB24 have a conjugated biotin.
- the resulting cDNA with biotin linked to both ends is then cut with one or more REs and ligated to adapters corresponding to the REs used.
- the adapter primers can be optionally labeled but cannot have a conjugated biotin.
- the RE/ligase reaction is preferably performed according to the protocols of Sec. 5.2.2 in order that the RE digestion and adapter ligation proceed to
- the reaction products comprise fragments of interest that are doubly cut by REs and without any conjugated biotin, and unwanted fragments with a biotin conjugated to one end that are singly cut and derive from the ends of cDNAs.
- the unwanted singly cut fragments are removed by contacting the reaction products with streptavidin beads.
- the purified fragments of interest can be blunt-ended and subject to further PCR amplification for a minimum number of cycles to observe the signals of interest.
- the products are then analyzed, also as in the prior sections, by separation according to length and by detection of the DNA and of the optionally labeled adapter primers, which indicate the RE cutting each fragment.
- removal means include but are not limited to digestion by single strand specific nucleases or passage through a single strand specific chromatographic column, for example, containing hydroxyapatite.
- conjugated capture moiety can be combined with the other QEA™ embodiments in various manners. This invention encompasses all such insubstantially different variations.
- QEA™ methods not using REs are based on PCR, or alternative amplification means, to select and amplify cDNA fragments between chosen target subsequences recognized by amplification primers.
- target subsequences between four and eight base pairs long chosen by the methods previously described are preferred because of their greater probability of occurrence, and hence information content, as compared to longer subsequences.
- DNA oligomers this short may not hybridize reliably and reproducibly to their
- the RE embodiments of QEA™ have been verified to produce reproducible signal patterns over a 10³ range of input DNA concentrations.
- the PCR embodiment is less preferred because the input DNA concentration, as well as the initial hybridization temperature, must be closely controlled to yield reproducible results.
- Primer 501 is constructed of three components, which, listed 5' to 3', are 504, 503, and 502. Component 503, described infra, is optional.
- Component 502 is a sequence which is complementary to the subsequence which primer 501 is designed to recognize. Component 502 is typically 4-8 bp long.
- Component 504 is a 10-20 bp sequence chosen so the final primer does not hybridize with any native sequence in the cDNA sample to be analyzed; that is, primer 501 does not anneal with any sequence known to be present in the sample to be analyzed.
- the sequence of component 504 is also chosen so that the final primer has a melting point above 50°C, and preferably above 68°C. The method for controlling melting temperature by selecting average primer composition and primer length is described above.
- primer 501 in the PCR embodiment involves a first annealing step, which allows the 3' end component 502 to anneal to its target subsequence in the presence of end component 504, which may not hybridize.
- this annealing step is at a temperature between 36 and 44°C that is empirically determined to maximize reproducibility of the resulting signal pattern.
- the DNA concentration is approximately 10 ng/50 ml and is similarly determined to maximize reproducibility.
- Other PCR conditions are standard and are described in Sec. 6.6.
- hybridization matches of the target subsequence, the degree of inexactness depending on the stringency of the
- the signals generated contain only a fuzzy representation of the actual subsequence in the sample, the degree of fuzziness being a function of subsequence length and the stringency condition, that is binding free energy, and the temperature of the
- annealing steps ensure exact hybridization of the entire primer. No further false positive bands are generated.
- these PCR cycles alternate between a 65°C
- Optional component 503 can be used to improve the specificity of the first low stringency annealing step and thereby minimize false positive bands generated then.
- component 503 can be -(U)j-, where U is a "universal" nucleotide and j is typically between 2 and 4, preferably 3 or 4.
- a universal nucleotide, such as inosine, is capable of forming base pairs with any other naturally occurring nucleotide.
- single primer 501 has a 3' end subsequence effectively j bases longer than the target, and thus also has improved hybridization specificity.
- a less preferred primer design comprises sets of degenerate oligonucleotides of sufficient length to achieve specific and reproducible hybridization, where each member of a set includes a shared subsequence complementary to one selected, target sequence.
- the set of primers used may be all sequences of the form NNAATCNN, where N is any nucleotide.
- sets of degenerate primers permit the recognition of discontinuous subsequences. For example, GA--TT may be recognized by all sequences of the form NAANNTCNN.
- a universal nucleotide can be used in place of the degenerate nucleotides represented by 'N'.
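The degenerate primer sets described above can be illustrated with a short sketch. It assumes that 'N' stands for any of the four nucleotides A, C, G, and T; the function name `expand_degenerate` is hypothetical and is not part of the described method.

```python
from itertools import product

BASES = "ACGT"  # assumption: 'N' ranges over these four nucleotides

def expand_degenerate(pattern):
    """Expand a pattern such as NNAATCNN into every concrete primer sequence."""
    # Each 'N' contributes four choices; any other character is kept as written.
    choices = [BASES if ch == "N" else ch for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]

primers = expand_degenerate("NNAATCNN")
print(len(primers))   # 256, that is 4**4 set members
print(primers[0])     # AAAATCAA
```

Every member of the set shares the fixed core (here AATC), so the set as a whole recognizes one target subsequence while each member is long enough to hybridize more reliably than the core alone.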
- Each primer or primer set used in a single reaction is preferably distinctively labeled for detection.
- In the preferred embodiment using electrophoretic fragment
- simultaneously distinguished with optical detection means.
- cDNA samples can be prepared from any source or be directly obtained.
- the primers of the selected primer sets are used in a conventional PCR amplification protocol.
- a high molar excess of primers is preferably used to ensure only fragments between primer sites that are adjacent on a target cDNA sequence or gene are amplified. With a high molar excess of primers binding to all available primer binding sites, no amplified fragment should include internally any primer recognition site.
- As many primers can be used in one reaction as can be labeled for concurrent separation and detection and as generate an adequately resolved length distribution, as in the RE
- each pair of fluorochromes preferably is distinguishable in one band and separate pairs preferably are distinguishable in separate bands.
- the fragments are separated, re-suspended for gel electrophoresis,
- the analysis methods comprise, first, selecting a database of DNA sequences representative of the DNA sample to be analyzed, second, using this database and a description of the experiment to derive the pattern of simulated signals, contained in a database of simulated signals, which will be produced by DNA fragments generated in the experiment, and third, for any particular detected signal, using the pattern or database of simulated signals to predict the sequences in the original sample likely to cause this signal.
- Further analysis methods present an easy to use user interface, permit determination of the sequences actually causing a signal in cases where the signal may arise from multiple sequences, and perform statistical correlations to quickly determine signals of interest in multiple samples.
- the first analysis method is selecting a database of DNA sequences representative of the sample to be analyzed.
- the DNA sequences to be analyzed will be derived from a tissue sample, typically a human sample examined for diagnostic or research purposes.
- database selection begins with one or more publicly available databases which comprehensively record all observed DNA sequences.
- Examples of such databases are GenBank from the National Center for Biotechnology Information (Bethesda, MD), the EMBL Data Library at the European Bioinformatics Institute (Hinxton Hall, UK) and databases from the National Center for Genome Research (Santa Fe, NM).
- any database containing entries for the sequences likely to be present in such a sample to be analyzed is usable in the further steps of the computer methods.
- Fig. 6A illustrates the preferred database selection method starting from a comprehensive tissue derived database.
- Database 1001 is the comprehensive input database, having the exemplary flat-file or relational structure 1010 shown in Fig. 6B, with one row, or record, 1014 for each entered DNA sequence.
- Column, or field, 1011 is the accession number field which uniquely identifies each sequence in database 1001. Most such databases contain redundant entries, that is, multiple sequence records are present that are derived from one biological sequence.
- Column 1013 is the actual nucleotide sequence of the entry.
- the plurality of columns, or fields, represented by 1012 contain other data identifying this entry including, for example whether this is a cDNA or gDNA sequence, if cDNA, whether this is a full length coding sequence or a fragment, the species origin of the sequence or its product, the name of the gene containing the sequence, if known, etc.
- GenBank has 15 different divisions, of which the EST division and the separate database, dbEST, that contain expressed sequence tags (“EST”) are of particular interest, since they contain expressed sequences.
- For a genomic DNA sample, database 1001 is scanned against criteria 1002 for human gDNA to create selected database 1003.
- cDNA sequences include a genomic sequence, coding domain sequences ("CDS"), and ESTs.
- ESTs can be selected 1006 to create selected database 1007 of expressed sequences.
- selected databases can be composed of sequences that can be selected according to any available relevant field, indication, or combination present in sequence databases.
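The selection step of Fig. 6A can be sketched as a filter over records. This is a minimal illustration only: the record layout, the field names `kind` and `species`, and the accession numbers are invented for the example and are not taken from any actual database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SequenceRecord:
    accession: str   # field 1011: unique accession number
    kind: str        # one of fields 1012, e.g. "gDNA", "CDS", "EST" (assumed coding)
    species: str     # another of fields 1012 (assumed coding)
    sequence: str    # field 1013: the nucleotide sequence

def select_database(records, **criteria):
    """Keep the records whose named fields equal the given values."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]

# Invented toy content standing in for comprehensive database 1001.
database_1001 = [
    SequenceRecord("X0001", "gDNA", "human", "ACGTACGT"),
    SequenceRecord("X0002", "EST", "human", "GGATCCAA"),
    SequenceRecord("X0003", "EST", "mouse", "TTGAATTC"),
]
human_ests = select_database(database_1001, kind="EST", species="human")
print([r.accession for r in human_ests])  # ['X0002']
```

Any combination of fields can be passed as criteria, which mirrors the statement that selection can use any available relevant field or combination of fields.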
- the second analysis method uses the previously selected database of sequences likely to be present in a sample and a description of an intended experiment to derive a pattern of the signals which will be produced by DNA fragments generated in the experiment.
- This pattern can be stored in a computer implementation in any convenient manner. In the following, without limitation, it is described as being stored as a table of information. This table may be stored as individual records or by using a database system, such as any conventionally available relational database. Alternatively, the pattern may simply be stored as the image of the in-memory structures which represent the pattern.
- a QEATM experiment comprises several independent recognition reactions applied to the DNA sample sequences, where in each of the reactions labeled DNA fragments are produced from sample sequences, the fragments lying between certain target subsequences in a sample sequence. The target subsequences can be recognized and the fragments generated by the preferred RE embodiments of QEATM methods or by the PCR embodiment of QEATM. The following description is focused on the RE embodiments.
- FIG. 7 illustrates an exemplary description 1100 of a preferred QEATM embodiment.
- Field 1101 contains a
- tissue sample which is the source of the DNA sample.
- one experiment could analyze a normal prostate sample; a second otherwise identical experiment could analyze a prostate sample with premalignant changes; and a third experiment could analyze a cancerous prostate sample. Differences in gene expression between these samples then relate to the progress of the cancer disease state. Such samples could be drawn from any other human cancer or malignancy.
- Major rows 1102, 1105, and 1109 describe the separate individual recognition reactions to which the DNA from tissue sample 1101 is subjected. Any number of
- reaction 1 specified by major row 1102 generates fragments between target subsequences which are the recognition sites of restriction endonucleases 1 and 2 described in minor rows 1103 and 1104. Further, the RE1 cut end is recognized by a labeling moiety labeled with
- reaction 15, 1109 utilizes restriction endonucleases 36 and 37 labeled with labels 3 and 4, minor rows 1110 and 1111, respectively.
- Major row 1105 describes a variant QEATM reaction using three REs and a separate probe.
- many REs can be used in a single recognition reaction as long as a useful fragment distribution results. Too many REs results in a compressed length distribution.
- probes for target subsequences that are not intended to be labeled fragment ends, but rather occur within a fragment can be used.
- a labeled probe added after QEATM PCR amplification step if present in a given embodiment
- a post PCR probe can recognize subsequences internal to a fragment and thereby provide an additional signal which can be used to discriminate between two sample sequences which produce fragments of the same length and end sequence which otherwise have differing internal sequences.
- a probe added before the QEATM PCR step and which cannot be extended by DNA polymerase will prevent PCR amplification of those fragments containing the probe's target subsequences. If PCR amplification is necessary to generate detectable signals (in a given embodiment), such a probe will prevent the detection of such a fragment. The absence of a fragment may make a previously ambiguous detected band now unambiguous.
- PCR disruption probes can be PNA oligomers or degenerate sets of DNA oligomers, modified to prevent polymerase extension
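An experimental definition such as that of FIG. 7 could be held in memory roughly as follows. This is a sketch under stated assumptions: the class names, the function `expected_labels`, and the site strings are placeholders chosen for illustration, and only the two REs and one probe named in the text for major row 1105 are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Recognizer:
    name: str                    # e.g. "RE1", or "P" for a probe
    site: str                    # target subsequence (placeholder values below)
    label: Optional[str] = None  # None models an unlabeled recognizer

@dataclass
class Reaction:
    name: str
    ends: List[Recognizer]
    probes: List[Recognizer] = field(default_factory=list)

@dataclass
class Experiment:
    tissue: str                  # field 1101
    reactions: List[Reaction]    # the major rows

def expected_labels(reaction, end_names, probe_names=()):
    """Labels a fragment should show, given its two ends and any probe hits."""
    by_name = {r.name: r for r in reaction.ends + reaction.probes}
    names = list(end_names) + list(probe_names)
    return {by_name[n].label for n in names if by_name[n].label is not None}

variant = Reaction(
    "major row 1105",
    ends=[Recognizer("RE2", "GATC", "LABEL2"), Recognizer("RE3", "CCGG", "LABEL3")],
    probes=[Recognizer("P", "ACGTAC", "LABEL4")],
)
experiment = Experiment("tissue sample 1101", [variant])
print(sorted(expected_labels(variant, ["RE2", "RE3"], ["P"])))
```

A PCR disruption probe could be represented in the same structure as a recognizer with no label, since its effect is the absence of a fragment rather than an additional signal.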
- FIG. 8A illustrates, in general, that from the database selected to best represent the likely DNA sequences in the sample analyzed, 1201, and the description of QEATM experiment, 1202, the simulation methods, 1203, determine a pattern of simulated signals stored in a simulated database, 1204, that represents the results of QEATM experiments.
- the experimental simulation generates the same fragment lengths and end subsequences from the input database that will be generated in an actual experiment performed on the same sample of DNA sequences.
- the simulated pattern or database may not be needed, in which case the DNA database is searched sequence by sequence, mock digestions are performed and compared against the input signals.
- a simulated database is preferable if several signals need to be searched or if the same QEATM experiment is run several times.
- the simulated database can be dispensed with when few signals from a few experiments need to be searched.
- a quantitative statement of when the simulated database is more efficient depends upon an analysis of the costs of the various
- Fig. 8B illustrates an exemplary structure for the simulated database.
- the simulated results of all the individual recognition reactions defined for the experiment are gathered into rectangular table 1210.
- the invention is equally adaptable to other database structures containing equivalent information; such an equivalent structure would be one, for example, where, each reaction was placed in a
- The rows of table 1210 are indexed by the lengths of possible fragments. For example, row 1211 contains fragments of length 52.
- the columns of table 1210 are indexed by the possible end subsequences and probe hits, if any, in a particular experimental reaction.
- columns 1212, 1213, and 1214 contain all fragments generated in reaction 1, R1, which have both end subsequences
- entries in table 1210 contain lists of the accession numbers of sequences in the database that give rise to a fragment with particular length and end subsequences. For example, entry 1215 indicates that only accession number A01 generates a fragment of length 52 with both end subsequences recognized by RE1 in R1.
- entry 1216 indicates that accession numbers A01 and S003 generate a fragment of length 151 with both end subsequences recognized by RE3 in reaction 2.
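One way to hold table 1210 in memory is a dictionary keyed by fragment length, reaction, end subsequences, and probe hit. The sketch below is an assumption about layout, not the structure of Fig. 8B itself; the helper name `add_entry` is hypothetical. It reproduces entries 1215 and 1216 as described above.

```python
from collections import defaultdict

def add_entry(table, length, reaction, ends, accession, probe_hit=False):
    """Record that `accession` yields a fragment with this length and these ends."""
    # Sorting the end names makes (RE1, RE2) and (RE2, RE1) the same signal.
    key = (length, reaction, tuple(sorted(ends)), probe_hit)
    if accession not in table[key]:
        table[key].append(accession)

table_1210 = defaultdict(list)
add_entry(table_1210, 52, "R1", ("RE1", "RE1"), "A01")    # entry 1215
add_entry(table_1210, 151, "R2", ("RE3", "RE3"), "A01")   # entry 1216
add_entry(table_1210, 151, "R2", ("RE3", "RE3"), "S003")  # entry 1216

print(table_1210[(52, "R1", ("RE1", "RE1"), False)])   # ['A01']
print(table_1210[(151, "R2", ("RE3", "RE3"), False)])  # ['A01', 'S003']
```

A relational table with the same four key columns plus an accession column would carry equivalent information.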
- the contents of the table can be supplemented with various information.
- this information can aid in the interpretation of results produced by the separation and detection means used. For example, if separation is by electrophoresis, then the detected electrophoretic DNA length can be corrected to obtain the true physical DNA length. Such corrections are well known in the electrophoretic arts and depend on such factors as average base composition and fluorochrome labels. One commercially available package for making these
- each table entry for a fragment can contain additionally average base composition, perhaps expressed as percent G+C content, and the
- experimental definition can include primer average base composition and fluorochrome label used.
- additional information can be the molecular weight of each fragment and perhaps a typical fragmentation pattern. Use of other separation and detection means can suggest the use of other appropriate supplemental data.
- labels are used to detect binding reaction events by subsequence recognition means to the target DNA, to allow detection after separation of the fragments by length.
- these labels are fluorochromes covalently attached to the primer strands of the adapters, as previously described, or to hybridization probes, if any.
- all the fluorochrome labels used in one reaction are simultaneously distinguishable so that fragments with all possible combinations of target subsequences can be fluorescently distinguished. For example, fragments at entry 1217 in table 1210 (Fig. 8B) occur at length 175 and present simultaneous fluorescent signals LABEL1 and LABEL2 upon stimulation, since these are the labels used with adapters which recognize ends cut by RE1 and RE2, respectively.
- In the reaction of major row 1105 of experimental definition 1100, a fragment with ends cut by RE2 and RE3 and hybridizing with probe P will present simultaneous signals LABEL2, LABEL3, and LABEL4.
- If effective target subsequences are constructed, e.g., by SEQ-QEATM or alternative phasing primers, this lookup is appropriately modified.
- subsequences can be identically labeled or not labeled at all, in which case the corresponding group of fragments are not distinguishable. In this case, if RE1 and RE3 end.
- a fragment of length 151 may be generated by sequence T162, A01, or S003, or any combination of these sequences.
- If silver (Ag) staining of an electrophoresis gel is used in an embodiment to detect separated fragments, then all bands will be identically labeled and only band lengths can be distinguished within one electrophoresis lane.
- the simulated database together with the experimental definition can be used to predict experimental results. If a signal is detected in a recognition reaction, say Rn, whose end labelings are LABEL1 and LABEL2 and whose representation of length is corrected to physical length in base pairs of L, the length L row of the simulated database is retrieved and it is scanned for Rn entries with the detected subsequence labeling, by using the column headings indicating observed subsequences and the experimental
- If no match is found, this fragment represents a new gene or sequence not present in the selected database. If a match is found, then this fragment, in addition to possibly being a new gene or sequence, can also have been generated by those candidate sequences present in the table entry(ies) found.
- the simulated database lookup is described herein as using the physical length of a detected fragment.
- lookup is augmented to account for such an approximation.
- electrophoresis, when used as the separation means, returns the electrophoretic length, which depending on average base composition and labeling moiety is typically within 10% of the physical length.
- database lookup can search all relevant entries whose physical length is within 10% of the reported electrophoretic length, perform corrections to obtain electrophoretic length, and then check for a match with the detected signal.
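The tolerant lookup just described can be sketched as follows. The correction from physical to electrophoretic length is not specified here, so it is passed in as a caller-supplied function; `to_electrophoretic`, `resolution`, and the toy 4% correction in the usage lines are assumptions made only for illustration.

```python
def lookup(table, reaction, ends, observed_len, to_electrophoretic,
           tol=0.10, resolution=1.0):
    """Return accession numbers whose predicted fragment matches a detected band.

    table maps (physical_length, reaction, ends) to a list of accession numbers.
    """
    ends = tuple(sorted(ends))
    low, high = observed_len * (1 - tol), observed_len * (1 + tol)
    hits = set()
    for (length, rxn, entry_ends), accessions in table.items():
        # First pass: same reaction and labeling, physical length within 10%.
        if rxn != reaction or entry_ends != ends or not (low <= length <= high):
            continue
        # Second pass: correct to electrophoretic length and compare.
        if abs(to_electrophoretic(length) - observed_len) <= resolution:
            hits.update(accessions)
    return sorted(hits)

table = {
    (52, "R1", ("RE1", "RE1")): ["A01"],
    (151, "R2", ("RE3", "RE3")): ["A01", "S003"],
    (160, "R2", ("RE3", "RE3")): ["X1"],
}
toy_correction = lambda n: n * 1.04  # invented correction, for the example only
print(lookup(table, "R2", ("RE3", "RE3"), 157, toy_correction))
```

An empty result corresponds to the case in which the detected fragment may represent a sequence absent from the selected database.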
- Alternative lookup implementations are apparent, one being to precompute the electrophoretic length for all predicted fragments, construct an alternate table index over the electrophoretic length, and then
- If matched candidate database sequences are found, then the selected database can be consulted to determine other information concerning these sequences, for example, gene name, tissue origin, chromosomal location, etc. If an unpredicted fragment is found, this fragment can be optionally retrieved from the length separation means, cloned or sequenced, and used to search for homologues in a DNA sequence database or to isolate or characterize the
- this invention can be used to rapidly discover and identify new genes.
- the computer methods of this invention are also adaptable to other formats of an experimental definition.
- For example, the labeling of the target subsequence recognition moieties can be stored in a table separate from the table defining the experimental reactions.
- FIG. 9 illustrates a basic method, termed herein mock fragmentation, which takes one sequence and the definition of one reaction of an experiment and produces the predicted results of the reaction on that sequence. Generation of the entire simulated database requires repetitive execution of this basic method.
- the method commences at 1301 and at 1302 it inputs the sequence to be fragmented and the definition of the fragmentation reaction, in the following terms: the target end subsequences RE1 ... REn, where n is typically 2 or 3, and the subsequences to be recognized by post PCR probes, P1 ... Pn, where n is typically 0 or 1.
- PCR disruption probes act as unlabeled end subsequences and are so treated for input to this method.
- the operation of the method is illustrated by example in Fig. 10A-F for the case RE1, RE2 and P1.
- the method makes a "vector of ends", which has elements which are pairs of nucleotide positions along the sequence, each pair being labeled by the corresponding end subsequence.
- the first member of each pair is the beginning of a target end subsequence and the second member is the end of a target end subsequence.
- the first member of each pair is the beginning of the overhang region that corresponds to the RE recognition subsequence and the second member is the end of that overhang region. It is preferred to use REs that generate 4 bp overhangs.
- the actual target end subsequences are the RE recognition sequences, which are preferably 4-8 bp long.
- This vector is generated by a string operation which compares the target end subsequence in a 5' to 3' direction against the input sequence and seeks string matches, that is the nucleotides match exactly.
- effective target subsequences are formed by using, e.g. SEQ-QEATM or alternative phasing primers
- a more efficient string matching algorithm such as the Knuth-Morris-Pratt or the Boyer-Moore algorithms. These are described with sample code in Sedgewick, 1990, Algorithms in C, chap. 19, Addison-Wesley, Reading, MA.
- target subsequence are recognized with accuracy
- the comparison of target subsequence against input sequence should be exact, that is the bases should match in a one-to-one manner.
- the string match should be done in a less exact, or fuzzy, manner.
- a target subsequence of length T can be
- Fig. 10A illustrates end vectors 1401 and 1402, comprising three and two ends, respectively, generated by RE1 and RE2, which are for this example assumed to be REs with a 4 bp overhang.
- the first overhang in vector 1401 occurs between nucleotide 10 and 14 in the input sequence.
- Step 1304 of Fig. 9 merges all the end vectors for all the end subsequences and sorts the elements on the position of the end.
- Vector 1404 of Fig. 10B illustrates the result of this step for example end vectors 1401 and 1402.
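The generation of end vectors and their merging (the string matching step followed by step 1304) can be sketched as below. This sketch assumes exact matching, 0-based positions, and that each pair marks the start and end of the recognition subsequence rather than of the overhang; the function names and the example sequence are invented. Python's built-in `str.find` is used in place of a hand-written Knuth-Morris-Pratt or Boyer-Moore search.

```python
def find_ends(sequence, site, name):
    """Return (start, end, name) for each exact occurrence of site, 5' to 3'."""
    ends = []
    pos = sequence.find(site)
    while pos != -1:
        ends.append((pos, pos + len(site), name))
        pos = sequence.find(site, pos + 1)  # +1 so overlapping sites are found
    return ends

def merge_ends(*vectors):
    """Merge several end vectors and sort the elements on end position."""
    return sorted((e for v in vectors for e in v), key=lambda e: e[0])

seq = "TTGATCAAAGAATTCTTTGATCA"            # invented input sequence
v1 = find_ends(seq, "GATC", "RE1")         # [(2, 6, 'RE1'), (18, 22, 'RE1')]
v2 = find_ends(seq, "GAATTC", "RE2")       # [(9, 15, 'RE2')]
print(merge_ends(v1, v2))
```

The merged vector lists the ends in the order in which they occur along the sequence, so adjacent elements delimit the fragments created in the next step.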
- Step 1305 of Fig. 9 then creates the fragments generated by the reaction by selecting the parts of the full input sequence that are delimited by adjacent ends in the merged and sorted end vector. Since the experimental
- conditions in conducting QEATM should be selected such that target end subsequence recognition is allowed to go to completion, all possible ends are recognized.
- the cutting and ligase reactions should be conducted such that all possible RE cuts are made and to each cut end a labeled primer is ligated.
- fragment sequence can be
- the fragment length is the difference between the end position of the second end
- the fragment length is the difference between the start position of the second end subsequence and the start position of the first end subsequence plus twice the primer length (48 in the preferred primer embodiment).
- Fig. 10C illustrates the exemplary fragments generated, each fragment being represented by a 4 member tuple comprising: the two end subsequences, the length, and an indicator whether the probe binds to this fragment.
- the position of this indicator is indicated by a '*'.
- Fragment 1408 is defined by ends 1405 and 1406, and fragment 1409 by ends 1406 and 1407. There is no fragment defined by ends 1405 and 1407 because the intermediate end subsequence is recognized and either fully cut in an RE embodiment or used as a fragment end priming position in a PCR embodiment.
- the fragment lengths are illustrated for the RE embodiment without the primer length addition.
- Step 1306 of Fig. 9 checks if a hybridization probe is involved in the experiment. If not, the method skips to step 1309. If so, step 1307 determines the sequence of the fragment defined in step 1305.
- Fig. 10D illustrates that the fragment sequences for this example are the nucleotide sequences within the input sequence that are between the indicated nucleotide positions. For example, the first fragment sequence is the part of the input sequence between positions 10 and 62.
- Step 1308 then checks each probe subsequence against each fragment sequence to determine whether there is any match (i.e., whether the probe has a sequence complementary enough to the fragment sequence sufficient for it to hybridize thereon). If a match is found, an indication is made in the fragment 4 member tuple. This match is done by string searching in a similar manner to that described for generation of the end vectors.
- At step 1309 of Fig. 9, all the fragments are sorted on length and assembled into a vector of sorted fragments, which is output from the mock fragmentation method at step 1310.
- This vector contains the complete list of all fragments, with probe information, defined by their end subsequences and lengths that the input reaction will generate from the input sequence.
- Fig. 10E illustrates the fragment vector of the example sorted according to length.
- probe P1 was found to hybridize only to the third fragment 1412, where a 'Y' is marked. 'N' is marked in all the other fragments, indicating no probe binding.
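The whole mock fragmentation method of Fig. 9 can be condensed into one function, sketched here under simplifying assumptions: end and probe matching are exact string matches, positions are 0-based, the fragment length is the difference between the start positions of adjacent ends plus an optional primer addition, and the probe string is given as it reads on the input strand. All names and the example sequence are illustrative.

```python
def occurrences(sequence, site):
    """Start positions of every exact occurrence of site in sequence."""
    pos, found = sequence.find(site), []
    while pos != -1:
        found.append(pos)
        pos = sequence.find(site, pos + 1)
    return found

def mock_fragmentation(sequence, sites, probes=(), primer_len=0):
    """Return sorted 4-tuples (end1, end2, length, 'Y' or 'N' for probe binding).

    sites maps an end name such as "RE1" to its recognition subsequence;
    probes is a list of probe subsequences.
    """
    # Build, merge, and sort the end vectors on position (through step 1304).
    ends = sorted((pos, name) for name, site in sites.items()
                  for pos in occurrences(sequence, site))
    fragments = []
    # Step 1305: adjacent ends delimit fragments; non-adjacent ends never pair.
    for (s1, n1), (s2, n2) in zip(ends, ends[1:]):
        frag_seq = sequence[s1:s2]                   # step 1307
        hit = any(p in frag_seq for p in probes)     # step 1308
        fragments.append((n1, n2, s2 - s1 + 2 * primer_len, "Y" if hit else "N"))
    return sorted(fragments, key=lambda f: f[2])     # step 1309

seq = "TTGATCAAAGAATTCTTTGATCA"                      # invented input sequence
print(mock_fragmentation(seq, {"RE1": "GATC", "RE2": "GAATTC"}, probes=["TTCTT"]))
```

Because only adjacent ends are paired, no fragment spans an internal recognition site, which matches the assumption that recognition goes to completion.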
- the simulated database is generated by iteratively applying the basic mock fragmentation method for each sequence in the selected database and each reaction in the experimental definition.
- Fig. 11 illustrates a simulated database generation method. The method starts at 1501 and at 1502 inputs the selected representative database and the experimental definition with, in particular, the list of reactions and their related subsequences. Step 1503
- Step 1504, a DO loop, causes the iterative execution of steps 1505, 1506, and 1507 for all sequences in the input selected database.
- Step 1505 takes the next sequence in the database, as selected by the enclosing DO loop, and the next reaction of the experiment and performs the mock fragmentation method of Fig. 9, on these inputs.
- Step 1506 adds the sorted fragment vector to the simulated database by taking each fragment from the vector and adding the sequence accession number to the list in the database entry indexed by the fragment length and end subsequences and probe (if any).
- Fig. 10F represents the simulated database entry list
- accession number A01 is added to the accession number list in the entry 1412 at length 151 and with both end subsequences RE2.
- step 1507 tests whether there is another reaction in the input experiment that should be simulated against this sequence. If so, step 1505 is repeated with this reaction. If not, the DO loop is repeated to select another database sequence. If all the database sequences have been selected, the step 1508 outputs the simulated database and the method ends at 1509.
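The generation loop of Fig. 11 can be sketched as two nested loops around a mock fragmentation routine. The compact fragmentation helper below uses exact matching and 0-based positions as simplifying assumptions, and the accession numbers are paired with invented sequences purely for illustration.

```python
from collections import defaultdict

def occurrences(sequence, site):
    pos, found = sequence.find(site), []
    while pos != -1:
        found.append(pos)
        pos = sequence.find(site, pos + 1)
    return found

def mock_fragmentation(sequence, sites, probes=()):
    """Fragments as (end1, end2, length, probe_hit) between adjacent ends."""
    ends = sorted((p, n) for n, s in sites.items() for p in occurrences(sequence, s))
    return [(n1, n2, s2 - s1, any(p in sequence[s1:s2] for p in probes))
            for (s1, n1), (s2, n2) in zip(ends, ends[1:])]

def build_simulated_database(database, reactions):
    """database: accession -> sequence; reactions: name -> (sites, probes)."""
    simulated = defaultdict(list)
    for accession, sequence in database.items():            # DO loop, step 1504
        for rxn, (sites, probes) in reactions.items():      # steps 1505 and 1507
            for n1, n2, length, hit in mock_fragmentation(sequence, sites, probes):
                # Step 1506: index by reaction, length, end subsequences, probe.
                key = (rxn, length, tuple(sorted((n1, n2))), hit)
                if accession not in simulated[key]:
                    simulated[key].append(accession)
    return dict(simulated)                                   # step 1508

database = {"A01": "TTGATCAAAGAATTCTTTGATCA", "S003": "CCGATCAAAGAATTCGG"}
reactions = {"R1": ({"RE1": "GATC", "RE2": "GAATTC"}, ())}
simulated = build_simulated_database(database, reactions)
print(simulated[("R1", 7, ("RE1", "RE2"), False)])  # ['A01', 'S003']
```

Entries listing a single accession number, such as the length 9 fragment in this toy example, correspond to signals that identify one database sequence unambiguously.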
- the goal of the experimental design methods is to optimize each experiment in order to obtain the maximum amount of quantitative information.
- An experiment is defined by its component recognition reactions, which are in turn defined by the target end subsequences recognized, probes used, if any, and labels assigned. If alternative phasing primers, SEQ-QEATM, or other similar means are used, effective target subsequences are used. Any of several criteria can be used to ascertain the amount of information obtained, and any of several algorithms can be used to perform the reaction optimization.
- a preferred criterion for ascertaining the amount of information uses the concept of "good sequence."
- a good sequence for an experiment is a sequence for which there is at least one reaction in the experiment that produces a unique signal from that sequence, that is, a fragment is produced from that good sequence, by at least one recognition reaction, that has a unique combination of length and labeling.
- the sequence with accession number A01 is a good sequence because reaction 1 produces signal 1215, with length 52 and with both target end subsequences recognized by RE1, uniquely from sequence A01.
- sequence S003 is not a good sequence because there are no unique signals produced only from S003: reaction R2 produces signal 1216 from both A01 and S003 and signal 1219 from both Q012 and S003.
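The "good sequence" criterion can be computed directly from a simulated database. In this sketch the table is reduced to a mapping from a signal identifier to the list of accession numbers producing that signal; keying by the reference numerals 1215, 1216, and 1219 is a shortcut for illustration, and the function name is hypothetical.

```python
def good_sequences(simulated):
    """Accession numbers that produce at least one signal shared with no other."""
    return {accessions[0] for accessions in simulated.values()
            if len(set(accessions)) == 1}

simulated = {
    "1215": ["A01"],            # length 52, both ends RE1, reaction R1
    "1216": ["A01", "S003"],    # length 151, both ends RE3, reaction R2
    "1219": ["Q012", "S003"],   # a further signal of reaction R2
}
print(good_sequences(simulated))  # {'A01'}
```

Within this three-entry toy table only A01 is a good sequence; S003 appears only in shared signals, as stated above. The number of good sequences, or of good sequences within a set of interest, can then serve as an information measure for an experiment.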
- expression of different good sequences can be obtained by comparing the relative intensities of the signal uniquely produced from the good sequences.
- An absolute quantitative measure of the expression of a good sequence can be obtained by including a concentration standard in the original sample. Such a standard for a particular experiment can consist of several different good sequences known not to occur in the original sample and which are introduced at known concentrations.
- For example, exogenous good sequence 1 is added at a 1:10^3 concentration in molar terms; exogenous good sequence 2 at a 1:10^4 concentration in molar terms; etc. Then comparison of the relative intensity of the unique signal of a good sequence in the sample with the intensities of the unique signals of the standards allows determination of the molar concentration of the sample sequence. For example, if the good sequence has a unique signal intensity half way between the unique signal intensities of good sequences 1 and 2, then it is present at a concentration half way between the
- Another preferred measure for ascertaining the amount of information produced by an experiment is derived by limiting attention to a particular set of sequences of interest, for example a set of known oncogenes or a set of receptors known or expected to be present in a particular tissue sample.
- An experiment is designed according to this measure to maximize the number of sequences of interest that are good sequences. Whether other sequences possibly present in the sample are good sequences is not considered. These other sequences are of interest only to the extent that the sequences of interest produce uniquely labeled fragments without any contribution from these other sequences.
- This invention is adaptable to other measures for ascertaining information from an experiment. For example, another measure is to minimize on average the number of sequences contributing to each detected signal. A further measure is, for example, to minimize for each possible sequence the number of other sequences that occur in common in the same signals. In that case each sequence is linked by common occurrences in fragment labelings to a minimum number of other sequences. This can simplify making unambiguous signal peaks of interest (see infra).
- optimization methods choose target subsequences, and possibly probes, which optimize the chosen measure.
- One possible optimization method is exhaustive search, in which all subsequences in lengths less than approximately 10 are tested in all combinations for that combination which is optimum. This method requires considerable computing power, and the upper bound is determined by the computational facilities available and the average probability of occurrence of subsequences of a given length. With adequate resources, it is preferable to search all sequences down to a probability of occurrence of about 0.005 to 0.01. Upper bounds may range from 8 to 11 or 12.
- Simulated annealing attempts to find the minimum of an "energy" function of the "state" of a system by generating small changes in the state and accepting such changes according to a probabilistic factor to create a "better" new state. While the method progresses, a simulated "temperature", on which the probabilistic factor depends and which limits acceptance of new states of higher energy, is slowly lowered.
- a "state", denoted by S, is the experimental definition, that is the target end subsequences and
- the “energy”, denoted E, is taken to be 1.0 divided by the information measure, so that when the energy is minimized, the information is maximized. Alternatively, the energy can be any monotonically decreasing function of the information measure.
- the computation of the energy is denoted by applying the function E( ) to a state.
- the preferred method of generating a new experiment, or state, from an existing experiment, or state is to make the following changes, also called moves to the experimental definition: (1) randomly change a target end subsequence in a randomly chosen recognition reaction; (2) add a randomly chosen target end subsequence to a randomly chosen reaction; (3) remove a randomly chosen target end subsequence from a randomly chosen reaction with three or more target subsequences; (4) add a new reaction with two randomly chosen target end subsequences; and (5) remove a randomly chosen reaction. If an RE embodiment of QEATM is being designed, all target end subsequences are limited to available RE recognition sequences.
- If the number of reactions is to be fixed, moves (4) and (5) are skipped.
- the invention is further adaptable to other moves for generating new experiments. Preferable generation methods will generate all possible experiments.
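The five moves listed above can be sketched as a single proposal function. This is a minimal illustration, not the patent's implementation: the name `propose_move`, the list-of-lists representation of an experimental definition, and the rule that move (5) always leaves at least one reaction are assumptions made for the example.

```python
import random

def propose_move(state, candidates, rng, fixed_reaction_count=False):
    """Return a new experimental definition one random move away from `state`.

    `state` is a non-empty list of reactions; each reaction is a list of
    target end subsequences drawn from `candidates` (at least two entries).
    """
    new_state = [list(reaction) for reaction in state]  # leave the input untouched
    # Moves (4) and (5) change the number of reactions, so skip them if it is fixed.
    move = rng.choice([1, 2, 3] if fixed_reaction_count else [1, 2, 3, 4, 5])
    if move == 1:    # change one subsequence in a randomly chosen reaction
        reaction = rng.choice(new_state)
        reaction[rng.randrange(len(reaction))] = rng.choice(candidates)
    elif move == 2:  # add a subsequence to a randomly chosen reaction
        rng.choice(new_state).append(rng.choice(candidates))
    elif move == 3:  # remove a subsequence from a reaction that has three or more
        eligible = [r for r in new_state if len(r) >= 3]
        if eligible:
            reaction = rng.choice(eligible)
            reaction.pop(rng.randrange(len(reaction)))
    elif move == 4:  # add a new reaction with two subsequences
        new_state.append(rng.sample(candidates, 2))
    elif len(new_state) > 1:  # move 5: remove a reaction, keeping at least one (assumption)
        new_state.pop(rng.randrange(len(new_state)))
    return new_state
```

For an RE embodiment, `candidates` would hold only the available RE recognition sequences, which keeps every generated definition realizable.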
- E_0 and T_0 are defined by the maximum of the information measure. For example, if the number of good sequences of interest is G and is used as the information measure, then E_0, which equals T_0, equals 1/G.
- An initial temperature, denoted T_1, is preferably chosen to be 1.
- An initial experimental definition, or state is chosen, either randomly or guided by prior knowledge of previous
- N, the number of iterations in an epoch, which is preferably taken to be 100
- f, the temperature decay factor
- With choices for the information measure or energy function, the moves for generating new experiments, an initial state or experiment, and the execution parameters made as above, the general application of simulated annealing to optimize an experimental definition is illustrated in Fig. 13A.
- the information measure used in this description is the number of good sequences of interest. Any information measure, such as those previously described, may be used alternatively.
- the method begins at step 1701.
- the temperature is set to the initial temperature; the state to the initial state or experimental definition; and the energy is set to the energy of the initial state.
- the temperature and energy are checked to determine whether either is less than or equal to the minima for the
- Step 1706 is a DO loop which executes an epoch, or N iterations, of the simulated annealing algorithm. Each iteration consists of steps 1707 through 1711. Step 1707 generates a new experimental definition, or state, S_new, according to the described generation moves. Step 1708 ascertains or determines the information content, or energy, of S_new. Step 1709 tests the energy of the new state, and, if it is lower than the energy of the current state, at step 1711, the new state and new energy are accepted and replace the current state and current energy. If the energy of the new state is higher than the energy of the current state, step 1710 computes the following function.
- This function defines the probabilistic factor controlling acceptance. If this function is less than a randomly chosen number uniformly distributed between 0 and 1, then the new state is accepted at step 1711. If not, then the newly generated state is discarded. These steps are equivalent to accepting a new state if the energy is not increased by an amount greater than that determined by function (4) in conjunction with the selection of a random number. In other words, a new state is accepted if the new information measure is not decreased by an amount greater than that indirectly determined by function (4).
- At step 1712, the temperature is reduced by the multiplicative factor f and the method loops back to the test at step 1703.
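The loop of Fig. 13A can be sketched as follows. Function (4) is not reproduced in this text, so the sketch assumes the conventional Metropolis factor exp(-(E_new - E)/T) with the Boltzmann constant taken as 1, and accepts an uphill move when a uniform random number falls below that factor; both the form of the factor and the sense of the comparison are assumptions. The names `anneal`, `energy`, and `propose` are hypothetical.

```python
import math
import random

def anneal(initial_state, energy, propose, t_min, e_min,
           t_initial=1.0, n_per_epoch=100, decay=0.95, rng=None):
    """Simulated annealing over experimental definitions.

    energy(state) returns a float to be minimized; propose(state, rng)
    returns a new candidate state. Stops once the temperature or the
    energy has reached its minimum (the test at step 1703).
    """
    rng = rng or random.Random()
    state, e = initial_state, energy(initial_state)
    t = t_initial
    while t > t_min and e > e_min:
        for _ in range(n_per_epoch):           # one epoch (step 1706)
            candidate = propose(state, rng)    # step 1707
            e_new = energy(candidate)          # step 1708
            # Steps 1709-1711: always accept a lower energy; otherwise accept
            # with the assumed Metropolis probability exp(-(E_new - E) / T).
            if e_new < e or rng.random() < math.exp(-(e_new - e) / t):
                state, e = candidate, e_new
        t *= decay                             # step 1712
    return state, e
```

With the information measure G described above, `e_min` and `t_min` would both be set to 1/G for the largest attainable G, and `t_initial` to 1.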
- The computation of the energy of an experimental definition, or state, in step 1708 is illustrated in more detail in Fig. 13B.
- This method starts at step 1720.
- Step 1721 inputs the current experimental definition.
- Step 1722 determines a complete digest database from this definition and a particular selected database by the method of Fig. 11.
- Step 1723 scans the entire digest database and counts the number of good sequences of interest. If the total number of good sequences is the measure used, the total number of good sequences can be counted. Alternatively, other information measures may be applied to the digest database.
- Step 1724 computes the energy as the inverse of the information
- Step 1725 outputs the energy, and the method ends at step 1726.
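Steps 1723 through 1725 can be sketched as below, assuming the digest database is held as a mapping from each simulated signal to the accession numbers that can produce it, and assuming a "good sequence" is one that produces at least one signal shared with no other sequence. The function names and the dictionary layout are illustrative only.

```python
def count_good_sequences(digest_db):
    """Count sequences that generate at least one signal uniquely."""
    good = set()
    for accessions in digest_db.values():
        distinct = set(accessions)
        if len(distinct) == 1:        # this signal is unambiguous
            good.update(distinct)
    return len(good)

def energy_from_digest(digest_db):
    """Energy is the inverse of the information measure (step 1724)."""
    g = count_good_sequences(digest_db)
    # With no good sequences the inverse is undefined; treat it as the worst energy.
    return float("inf") if g == 0 else 1.0 / g
```

Any other information measure could be substituted for `count_good_sequences` without changing the rest of the annealing procedure.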
- two related tissue samples can be subject to the same experiment, perhaps consisting of only one recognition reaction, and the outcomes compared.
- the two tissue samples may be otherwise identical except for one being normal and the other diseased, perhaps by infection or a proliferative process, such as hyperplasia or cancer.
- One or more signals may be detected in one sample and not in the other sample. Such signals might represent genetic aspects of the pathological process in one tissue. These signals are of particular interest.
- the candidate sequences that can produce a signal of interest are determined, as previously described, by lookup in the digest database.
- the signal may be produced by only one sequence, in which case it is unambiguously
- the signal may be ambiguous in that it may be produced by several candidate sequences from the selected database.
- a signal of interest may be made unambiguous in several manners which are described herein.
- the first manner of making unambiguous can be extended to the case where one of the sequences possibly contributing to a signal is not a good sequence.
- Fig. 14 illustrates a preferred ranking method.
- the method begins at step 1801 and at step 1802 inputs the list of possible accession numbers in a signal of interest, the experimental definition, and the actual experimental results.
- DO-loop 1803 iterates once for each possible accession number.
- Step 1804 performs a simulated experiment by the method illustrated in Fig. 11 in which, however, only the current accession number is acted on.
- the output is a single sequence digest table, such as illustrated in Fig. 10F.
- Step 1805 determines a numerical score ranking the similarity of this digest table to the experimental results.
- One possible scoring metric comprises scanning the digest table for all fragment signals and adding 1 to the score if such a signal appears also in the experimental results and subtracting 1 from the score if such signal does not appear in the experimental results. Alternate scoring metrics are possible. For example, the subtraction of 1 may be omitted.
- Step 1806 sorts the numerical scores of the likelihood that each possible accession number is actually present in the sample.
- Step 1807 outputs the sorted list and the method ends at step 1808.
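The scoring and sorting of steps 1805 and 1806 can be sketched as follows. The sketch assumes each candidate's single-sequence digest table has already been reduced to a list of signals and that the experimental results are a set of observed signals; the names `score` and `rank_candidates` are hypothetical.

```python
def score(digest_signals, observed_signals, penalize_missing=True):
    """+1 for each simulated signal that was observed, -1 for each that was not."""
    total = 0
    for signal in digest_signals:
        if signal in observed_signals:
            total += 1
        elif penalize_missing:   # the subtraction may be omitted
            total -= 1
    return total

def rank_candidates(simulated, observed_signals, penalize_missing=True):
    """Return (score, accession) pairs, best score first, ties by accession."""
    scored = [(score(signals, observed_signals, penalize_missing), accession)
              for accession, signals in simulated.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Breaking ties alphabetically by accession number is a choice made here only to keep the output deterministic.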
- the colony calling embodiment recognizes and classifies single, individual genes or DNA sequences by determining the presence or absence of target subsequences. No length information is determined.
- This embodiment is directed to gene determination and classification of arrayed samples or colonies, where each sample or colony contains or expresses only one sequence or gene of interest and is perhaps prepared from a tissue cDNA library.
- the presence or absence of target subsequences in a colony is determined by use of labeled hybridization recognition means, each of which uniquely binds to one target subsequence. It is preferable that this binding be highly specific and reproducible.
- Each sample or colony, or an array of samples or colonies is assayed for the contained sequence by determining which of the set of probes recognizes and thus hybridizes to target subsequences in the sample(s) or colony(ies).
- Each sample is then characterized by a hash code, each bit of which
- the size of the set of recognition means should be as small as possible, preferably less than 50 elements and more preferably from 15 to 25 elements. Further, it is most preferable that all possible sequences or genes are recognized and uniquely determined. It is preferable that 90 to 95% of all possible sequences be recognized, with each sequence being indistinguishable from, or ambiguous with, at most one or two other sequences.
- each target subsequence preferably occurs
- recognition means needed. For example, it is not practical for this invention, directed to rapid gene classification, if each probe recognized only a few genes and therefore
- each target subsequence preferably does not occur so frequently that its presence conveys little information.
- a probe recognizing every gene conveys no information.
- each target subsequence to have a probability of occurrence in all the genes or sequences that can appear in a sample or colony of
- target subsequences of length 4 to 6 meet this condition, as longer sequences occur too infrequently to make useful hash codes.
- the presence of one target subsequence is preferably independent of the presence of any other target subsequence in the same sequence or gene.
- the maximal number of genes or sequences that can be represented by a hash code is 2^n, where n is the number of target subsequences.
- a simple test to determine whether the target subsequences occur frequently enough in the expected gene library is made by comparing the actual probabilities of the two hash codes that have all target subsequences either present or absent to the ideal probabilities of these codes. If p is the probability that any target subsequence occurs in a given sequence in the library, then the probability that none of the target subsequences occur in a random gene is (1-p)^n. The closer the ratio (1-p)^n / 2^(-n) is to 1, the more efficient is the code.
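The efficiency test is a one-line calculation. The sketch below computes the ratio for the all-absent code and, by the same reasoning applied to p^n, for the all-present code; the function names are hypothetical, and independence of the target subsequences is assumed.

```python
def all_absent_ratio(p, n):
    """Ratio of the actual to the ideal probability of the all-absent hash code."""
    return (1.0 - p) ** n / 2.0 ** (-n)

def all_present_ratio(p, n):
    """Ratio of the actual to the ideal probability of the all-present hash code."""
    return p ** n / 2.0 ** (-n)
```

Both ratios equal 1 only at p = 0.5. For smaller p the all-absent code is over-represented and the all-present code under-represented, and the imbalance grows with n.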
- the preferred method of selecting target subsequences meeting the probability of occurrence and independence criteria is to use a database containing
- sequences generally expected to be present in the samples to be analyzed, for example human GenBank sequences for human tissue derived samples.
- oligomer frequency tables are compiled containing the frequencies of, preferably, all 4 to 8-mers. From these tables, candidate subsequences with the desired probability of occurrence are selected. Each candidate target subsequence is then checked for independent occurrence, by, for example, checking that the conditional probability for a hit by any selected pair of candidates is approximately the product of the probabilities of the individual candidate hit probabilities.
- Candidate target subsequences meeting both occurrence and independence criteria are possible target subsequences. A sufficient number, typically 20, of any of these subsequences can be selected as target subsequences for a hash code.
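The occurrence and independence checks can be sketched as below. The sketch measures presence or absence per sequence, compares the joint hit frequency of each candidate pair with the product of the individual frequencies, and uses an arbitrary `tolerance` of 0.05; that threshold and the function names are assumptions.

```python
from itertools import combinations

def occurrence_probability(subsequence, sequences):
    """Fraction of database sequences containing the subsequence at least once."""
    return sum(subsequence in s for s in sequences) / len(sequences)

def independent_pairs(candidates, sequences, tolerance=0.05):
    """Return candidate pairs whose joint frequency is close to the product rule."""
    p = {c: occurrence_probability(c, sequences) for c in candidates}
    accepted = []
    for a, b in combinations(candidates, 2):
        joint = sum((a in s) and (b in s) for s in sequences) / len(sequences)
        if abs(joint - p[a] * p[b]) <= tolerance:
            accepted.append((a, b))
    return accepted
```

In practice the frequencies would come from precompiled oligomer tables rather than a rescan of the database for every candidate.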
- the initial set of target subsequences can be optimized, using information on the actual occurrences of the initially selected target subsequences in the sequence database, resulting in a set of target subsequences selected which recognizes a maximum number of genes with a minimum number of sequences and with a minimum amount of recognition ambiguity.
- this optimization can also be performed on a sub-set of the database comprised of sequences or genes of particular biological or medical interest, for example, the set of all oncogenes or growth factors. In this manner, fewer target subsequences can be chosen which distinguish more efficiently among a set of sequences or genes of particular interest and distinguish that set of genes from the sequences of the remainder of the sample.
- Example 6.6 illustrates the results of the simulated annealing optimization method. Simulated annealing generally produces a choice of subsequences that achieve the same resolution while using approximately 20% fewer total sequences than a selection guided only by the probability principles previously described. This level of optimization is likely to improve with larger and less redundant databases that represent longer genes.
- An alternative to using single target subsequences is to use sets of target subsequences, recognized by sets of identically labeled hybridization probes, to generate one presence or absence indication for the hash code.
- sets of longer target subsequences would be chosen such that the presence of any target subsequence in the set is a presence indication. Absence means no element of the set is present. If the sets are chosen so that their probability of presence in a single sequence is near 50%, preferably from 10 to 50%, and the presence or absence of one set is independent of the presence or absence of any other set, such sets can be used to construct codes equally well as single subsequences.
- a resulting code will be efficient and can be further optimized by simulated annealing, as for single target subsequence codes.
- Target sets of longer subsequences are preferable where experimental recognition of shorter subsequences is less specific and reproducible, as for example is true where short DNA oligomers are used as hybridization probes for recognition.
- a code can consist of presence or absence indications of mixed target sets of subsequences and single target subsequences.
- Probes for a target subsequence are preferably PNA oligomers, or less preferably DNA oligomers, which hybridize to the subsequence of interest.
- Use of sets of degenerate DNA oligomers to more specifically and reliably hybridize to short DNA subsequences has been described in relation to the PCR implementation of QEA™ methods.
- the use of PNAs is preferred in the colony calling embodiment since PNA
- PNAs are even more preferable when, in the alternative, the hash code comprises presence or absence indication of target sets of longer subsequences. In this case, many more DNA probes are generally required than PNA probes.
- target sets can consist of subsequences of length 6 to 8. Since DNA oligomers of this length may not reliably hybridize, each subsequence in the set must in turn be represented by a further degenerate set of DNA oligomers, requiring thereby a set of sets.
- the experimental method of colony calling comprises three principal steps: first, arraying cDNA libraries on filters or other suitable substrates; second, PNA
- DNA hybridization can be used; and third, interpreting the resulting hash code to determine the sequence in the sample.
- the first step which can be omitted if arrayed cDNA libraries are already available, is constructing and arraying cDNA libraries. Any methods known in the art may be used. For example, cDNA libraries from normal or diseased tissues can be constructed according to Example 6.3.
- Example 6.7 can be used to generate these arrays from cDNA libraries.
- the second step is probe (e.g., PNA) hybridization and detection.
- Fluorescently labeled PNA oligomers are available from PerSeptive Biosystems (Bedford, MA) or can be synthesized. PNAs are designed to be complementary to the chosen target subsequences and to have a maximum number of distinguishable labels for simultaneous hybridization with multiple oligomers.
- PNA hybridization is performed according to standard protocols developed by the manufacturer and detailed in Example 6.7. Detection of the PNA signals uses optical spectrographic means to distinguish fluorochrome emissions similar to those used in DNA analysis instruments, but appropriately modified to recognize spots on filters as opposed to linearly arrayed bands.
- the third step, interpretation of the hash code, is done by the computer implemented method described in the following section.
- the intensity of the detected hybridization signal indicates the number of times the probe binds to the sample sequence. In this manner the number of recognized target subsequences present in the sample can be determined. This information can be used to more precisely classify or identify a sample.
- the colony calling ("CC") computer implemented methods are similar to QEA™ computer methods.
- As with QEA™, the experimental analysis methods are described before the experimental design methods.
CC EXPERIMENTAL ANALYSIS METHODS
- The analysis methods make use of a mock experiment concept. First, a database is selected to represent possible sequences in the sample by the same methods as described for QEA™ analysis. These are illustrated and described with reference to Fig. 6A. For CC, an experimental definition is simply a list of N_P target subsequences, where N_P is
- each hash code being a string of N_P binary digits wherein the n'th digit is a 1 (0) if the n'th target
- The method starts at step 1901 and at step 1902 inputs a selected database and an experimental definition consisting of N_P target subsequences.
- Step 1903 initializes a table which for each of the 2^(N_P) hash codes can contain a list of possible accession numbers which have this hash code.
- Step 1904 is a DO loop which iterates through all sequences in the database. For a particular sequence, step 1905 checks for each target subsequence whether that subsequence
- step 1907 outputs the code table and the method ends at step 1908.
- the identification is ambiguous. If the list is empty, the sample is not in the selected database and may possibly be a previously unknown sequence.
- a code table can be dispensed with if only a few hash codes need to be looked up from only a few experiments. Then the DNA database is scanned sequence by sequence for those sequences generating the hash code of interest. If many hash codes from many experiments need to be analyzed, a code table is more efficient.
- the quantitative decision of when to build a code table depends on the costs of the various operations and the size of DNA database, and can be performed as is well known in the computer arts. Without limitation, this description is built on the use of a code table.
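The mock experiment that builds the code table can be sketched as follows. Instead of preallocating all 2^(N_P) entries, this sketch stores only occupied hash codes in a dictionary, which is an implementation choice made here and not something the description prescribes. The database is assumed to map accession numbers to sequence strings, and all function names are hypothetical.

```python
from collections import defaultdict

def hash_code(sequence, targets):
    """One binary digit per target: '1' if the target occurs in the sequence."""
    return "".join("1" if t in sequence else "0" for t in targets)

def build_code_table(database, targets):
    """Map each occupied hash code to the accession numbers that produce it."""
    table = defaultdict(list)
    for accession, sequence in database.items():
        table[hash_code(sequence, targets)].append(accession)
    return dict(table)

def lookup(table, code):
    """One accession: unambiguous. Several: ambiguous. None: not in the database."""
    return table.get(code, [])
```

When only a few hash codes need to be resolved, `hash_code` can instead be applied sequence by sequence during a single scan of the database, without building the table.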
- step 1905 checks whether each member of such a set of target subsequences is found in the sample sequence. If any member is found in the sequence, then this information is used to construct the hash code.
5.6.2. CC EXPERIMENTAL DESIGN METHODS
- the goal of CC experimental design is to maximize the amount of information from a CC hybridization experiment. This is also performed by defining an
- the preferred information measure is the number of occupied hash codes. This is equivalent to minimizing the number of accession numbers which can result in a given hash code. In fact for N_P greater than about 17 to 18, that is for 2^(N_P) greater than the number of expressed human genes (about 100,000), maximizing the number of occupied hash codes can result in each hash code representing a single sequence.
- the invention is adaptable to other CC
- the energy is taken to be 1.0 divided by the information content; alternatively, any monotonically
- the energy is determined by performing the mock experiment of Fig. 15 using a particular experimental definition and then applying the measure to the resulting code table. For example, if the number of occupied hash codes is the
- this number can be computed by simply scanning the code table and counting the number of table entries with non-empty accession number lists.
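Counting occupied hash codes and converting the count into an energy takes only a few lines. The sketch assumes the code table is a dictionary from hash code to a list of accession numbers; `cc_energy` is a hypothetical name.

```python
def cc_energy(code_table):
    """Energy of a CC experimental definition: 1 / (number of occupied hash codes)."""
    occupied = sum(1 for accessions in code_table.values() if accessions)
    # An empty table carries no information; treat it as the worst energy.
    return float("inf") if occupied == 0 else 1.0 / occupied
```

The lowest possible energy is reached when every sequence in the selected database has its own hash code.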
- the Boltzmann constant is again taken to be 1 so that the temperature equals the energy.
- the initial temperature is preferably 1.0.
- E_0, which equals T_0, is 1.0 divided by the number of sequences in the selected database.
- definition from an existing definition is to pick randomly one target subsequence and to perform one of the following moves: (1) randomly modifying one or more nucleotides; (2) adding a random nucleotide; and (3) removing a random
- a modification is discarded if it results in two identical target subsequences. Further, it is desirable to discard a modification if the resulting subsequence has an extreme probability of binding to sequences in the database.
- the invention is further adaptable to other methods of generating new experiments. Preferably, generation methods used will randomly generate all possible experiments.
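The CC generation moves can be sketched as below. The sketch modifies only a single nucleotide per call, inserts at a random position, never shrinks a subsequence below `min_len`, and returns the unchanged definition when a move would create duplicate targets; these details and the name `mutate_targets` are assumptions. The check for an extreme binding probability is omitted.

```python
import random

NUCLEOTIDES = "ACGT"

def mutate_targets(targets, rng, min_len=1):
    """Apply one random move to one randomly picked target subsequence."""
    new_targets = list(targets)
    i = rng.randrange(len(new_targets))
    t = new_targets[i]
    move = rng.choice(["modify", "add", "remove"])
    if move == "modify":              # (1) change one nucleotide
        j = rng.randrange(len(t))
        t = t[:j] + rng.choice(NUCLEOTIDES) + t[j + 1:]
    elif move == "add":               # (2) insert a random nucleotide
        j = rng.randrange(len(t) + 1)
        t = t[:j] + rng.choice(NUCLEOTIDES) + t[j:]
    elif len(t) > min_len:            # (3) delete a random nucleotide
        j = rng.randrange(len(t))
        t = t[:j] + t[j + 1:]
    new_targets[i] = t
    if len(set(new_targets)) < len(new_targets):
        return list(targets)          # discard: two identical target subsequences
    return new_targets
```

A function of this shape can serve as the proposal step of the simulated annealing method of Fig. 13A.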
- An initial experimental definition can be picked by taking N P randomly chosen subsequences or by using subsequences from prior optimization.
- N is preferably taken to be 100 and the temperature decay factor, denoted by f, is preferably taken to be 0.95. Both N and f may be systematically varied case-by-case to achieve a better experimental definition with lower energy and a higher information measure.
- the simulated annealing optimization method of Fig. 13A can be performed to obtain an optimized set of target subsequences.
- different initial N_P can be selected, the prior design optimization performed, and the results compared.
- the N_P with the maximum information measure is optimum for the selected database.
- the pattern of simulated hash codes stored in the code table is augmented with additional information.
- this additional information comprises recording the number of times each target subsequence is found in such a sequence. These numbers are simply determined by scanning the entire sequence and counting the number of occurrences of each target subsequence.
- An exemplary method to perform hash code look up in this augmented table is to first find the sequences giving rise to a particular hash code as a binary number, and second to pick from these the most likely sequence as that sequence having the most similar pattern of subsequence counts to the detected quantitative hybridization signal.
- An exemplary method to determine such similarity is to linearly normalize the detected signal so that the smallest hybridization signal is 1.0 and then to find the closest sequence by using a
- each pattern of subsequence counts may alternatively be considered as a distinct code entry for evaluation of an information measure. This is instead of considering each hash code alone a
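The second stage of the augmented lookup can be sketched as follows. The similarity measure is not recoverable from this text, so the sketch assumes Euclidean distance between the normalized signal and each candidate's subsequence counts, and it normalizes by the smallest nonzero signal; both choices and the function names are assumptions.

```python
import math

def normalize(signal):
    """Scale linearly so that the smallest nonzero hybridization signal is 1.0."""
    positive = [x for x in signal if x > 0]
    if not positive:
        return list(signal)
    smallest = min(positive)
    return [x / smallest for x in signal]

def closest_sequence(candidates, signal):
    """Pick the candidate whose subsequence counts best match the signal.

    `candidates` maps accession numbers (all sharing one hash code) to a
    list of per-target occurrence counts.
    """
    normalized = normalize(signal)
    return min(candidates, key=lambda acc: math.dist(candidates[acc], normalized))
```

For example, two sequences with the same hash code but count patterns of 1, 2, 0 and 1, 1, 0 would be separated by a signal whose second channel is roughly twice as intense as its first.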
- the apparatus of this invention includes means for performing the recognition reactions of this invention in a preferably automated fashion, for example by the protocols of § 6.4.3, and means for performing the computer implemented experimental analysis and design methods of this invention.
- Although the subsequent discussion is directed to embodiments of apparatus for QEA™ embodiments of this invention, similar apparatus is adaptable to the CC embodiments. Such adaption includes using, in place of the corresponding components for QEA™ embodiments, automatic laboratory instruments
- Fig. 12A illustrates an exemplary apparatus for QEA™ embodiments of this invention, and with the described adaption, also for the CC embodiments of this invention.
- Computer 1601 can be, alternatively, a UNIX based work station type computer, an MS-DOS or Windows based personal computer, a Macintosh personal computer, or another equivalent computer.
- computer 1601 is a PowerPC™ based Macintosh computer with software systems capable of running both Macintosh and MS-DOS/Windows programs.
- Fig. 12B illustrates the general software structure in RAM memory 1650 of computer 1601 in a preferred
- At the lowest software level is Macintosh operating system 1655. This system contains features 1656 and 1657 for permitting execution of UNIX programs and MS-DOS or Windows programs alongside Macintosh programs in computer 1601. At the next higher software level are the preferred languages in which the computer methods of this invention are implemented. LabView 1658, from National Instruments
- control routines 1661 for the laboratory instruments, exemplified by 1651 and 1652, which perform the recognition reactions and fragment separation and detection.
- C or C++ languages 1659 are preferred for implementing experimental routines 1662, which are described in Sec. 5.4 and 5.6. Less preferred, but useful for rapid prototyping, are various scripting languages known in the art.
- PowerBuilder 1660, from Sybase (Denver, CO), is preferred for implementing the user interfaces to the computer implemented routines and methods.
- programs implementing the described computer methods are divided into instrument control routines 1661 and experimental analysis and design routines 1662.
- Control routines 1661 interact with laboratory instruments, exemplified by 1651 and 1652, which physically perform QEA™ and CC protocols.
- Experimental routines 1662 interact with storage devices, exemplified by devices 1654 and 1653, which store DNA sequence databases and experimental results.
- instrument control methods 1661, computational experimental methods 1662, and graphical interface methods can be on different processors in any combination or sub-combination.
- Input/output devices include color display device 1602 controlled by a keyboard and standard mouse 1603 for output display of instrument control information and
- Input and output data are preferably stored on disk devices such as 1604, 1605, 1624, and 1625 connected to computer 1601 through links 1606.
- the data can be stored on any
- links 1606 can be either local attachments, whereby all the disks can be in the computer cabinet(s), LAN attachments, whereby the data can be on other local server computers, or remote links, whereby the data can be on distant servers.
- Instruments 1630 and 1631 exemplify laboratory devices for performing, in a partly or wholly automatic manner, QEA™ recognition reactions. These instruments can be, for example, automatic thermal cyclers, laboratory robots, and controllable separation and detection apparatus, such as is found in the applicants' copending U.S. Patent Application 08/438,231 filed May 9, 1995. Links 1632
- Sample flow can be either
- a QEA™ experiment is designed, performed, and analyzed, preferably in a manner as automatic as possible.
- a QEA™ experiment is designed, according to the methods specified in Sec. 5.4.2 as
- Database 1604 can be local to or remote from computer 1601. Database selection performed by processor 1601 executing the described methods generates one or more representative selected databases 1605. Output from the experimental design methods are tables, exemplified by 1609 and 1615, which, for a QEA™ RE embodiment, specify the recognition reaction and the REs used for each recognition reaction.
- Exemplary experiment 1607 is defined by tissue sample 1608, which may be normal or diseased, experimental definition 1609, and physical recognition reactions 1610 as defined by 1609.
- instrument 1630 is a laboratory robot for automating reaction
- computer 1601 commands and controls robot 1630 to perform reactions 1610 on cDNA samples prepared from tissue 1608.
- where instrument 1631 is a separation and detection instrument, the results of these reactions are then transferred, automatically or manually, to 1631 for
- Computer 1601 commands and
- the detection information is input to computer 1601 over links 1632 and is stored on storage device 1624, along with the experimental design tables and information on the tissue sample source for processing. Since this
- Experiment 1613 is processed similarly along sample pathway 1633, with robot 1630 performing recognition
- Fragment detection data is input by computer 1601 and stored on storage device 1625. In this case, for example, silver staining is used, and detection data is image 1617 of the stained bands.
- instrument control routines 1661 provide the detailed control signals needed by instruments 1630 and 1631. These routines also allow
- Simulated database 1612 for experiment 1607 is generated by the analysis methods executing on processor 1601 using as input the appropriate selected database 1605 and experimental definition 1609, and is output in table 1612.
- table 1618 is the corresponding simulated database of signals for experiment 1613, and is generated from appropriate selected database 1605 and experimental definition 1615.
- a signal is made unambiguous by
- Display device 1602 presents an exemplary user interface for the data generated by the methods of this invention.
- This user interface is preferably programmed by using the PowerBuilder display front end.
- selection buttons which can be used to select the particular experiment and the particular reaction of the experiment whose results are to be displayed.
- histological images of the tissue source of the sample are presented for selection and display in window 1621. These images are typically observed, digitized, and stored on computer 1601 as part of sample preparation.
- the results of the selected reaction of the selected experiment are displayed in window 1622.
- a fluorescent trace output of a particular labeling is made available.
- Window 1622 is indexed by marks 1626 representing the possible locations of DNA fragments of successive integer lengths.
- Window 1623 displays contents from simulated database 1612. Using, for example, mouse 1603, a particular fragment length index 1626 is selected. The processor then retrieves from the simulated database the list of accession numbers that could generate a peak of that length with the displayed end labeling. This window can also contain further information about these sequences, such as gene name, bibliographic data, etc. This further information may be available in selected databases 1605 or may require queries to the complete sequence database 1604 based on the accession numbers. In this manner, a user can interactively inquire into the possible sequences causing particular results and can then scan to other reactions of the experiment by using buttons 1620 to seek other evidence of the presence of these sequences.
- a particular accession number is selected from window 1623 with mouse 1603, and processor 1601 scans the simulated database for all other fragment lengths and their recognition reactions that could be produced by this accession number.
- In a further window, these lengths and reactions are displayed, and the user is allowed to select further reactions for display in order to confirm or refute the presence of this accession number in the tissue sample. If one of these other fragments is generated uniquely by this sequence (a "good sequence", see supra), that fragment can be highlighted as of particular interest. By displaying the results of the generating reaction of that unique
- system 1601 can aid the determination of signals of interest by automating the visual comparison by performing statistical analysis of signals from samples of the same tissue in different states.
- Display 1602 shows which reproducible signals vary across the states, thereby guiding the user in the selection of signals of interest.
- An alternative, explicitly distributed embodiment of this apparatus is illustrated in Fig. 12C. Shown here are laboratory instruments 1670, DNA sequence database systems 1684, and computer systems 1671 and 1673, all of which cooperate to perform the methods of this invention as described above.
- This medium may be any dedicated or shared or local or remote communication medium known in the art.
- This medium can be a "campus" LAN network extending perhaps a few kilometers, a dedicated wide area communication system, or a shared network, such as the Internet.
- the system local attachments are adapted to the nature of medium 1674.
- Laboratory instruments 1670 are commanded by computer system 1671 to perform the automatable steps of the recognition reactions, separation of the reaction results, and detection and transmission of resulting signals through link 1672.
- Link 1672 can be any local or remote link known in the art that is adapted to instrument control, and may even be routed through communication medium 1674.
- DNA sequence database systems 1684 with various sequence databases 1685 may be remote from the other systems, for example, by being directly accessed at their sites of origin, such as Genbank at Bethesda, MD. Alternatively, parts or all of these databases may be periodically
- Computer system 1671 can perform various methods of this invention. For example, it can perform solely the control routine for control and monitoring of instrument system 1670, whereby experimental design and analysis are performed elsewhere, as at computer system 1673. In this case, system 1671 would typically be operated by
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002235860A CA2235860A1 (en) | 1995-10-24 | 1996-10-24 | Method and apparatus for identifying, classifying, or quantifying dna sequences in a sample without sequencing |
EP96936985A EP0866877A4 (en) | 1995-10-24 | 1996-10-24 | Method and apparatus for identifying, classifying, or quantifying dna sequences in a sample without sequencing |
AU74763/96A AU730830C (en) | 1995-10-24 | 1996-10-24 | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
JP9516817A JP2000500647A (en) | 1995-10-24 | 1996-10-24 | Method and apparatus for identifying, classifying or quantifying a DNA sequence in a sample without performing sequencing |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/547,214 US5871697A (en) | 1995-10-24 | 1995-10-24 | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US663,823 | 1996-06-14 | ||
US08/663,823 US5972693A (en) | 1995-10-24 | 1996-06-14 | Apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US547,214 | 1996-06-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997015690A1 (en) | 1997-05-01 |
Family
ID=27068486
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1996/017159 WO1997015690A1 (en) | 1995-10-24 | 1996-10-24 | Method and apparatus for identifying, classifying, or quantifying dna sequences in a sample without sequencing |
Country Status (6)
Country | Link |
---|---|
US (1) | US5972693A (en) |
EP (1) | EP0866877A4 (en) |
JP (1) | JP2000500647A (en) |
AU (1) | AU730830C (en) |
IL (1) | IL124185A (en) |
WO (1) | WO1997015690A1 (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999007896A2 (en) * | 1997-08-07 | 1999-02-18 | Curagen Corporation | Detection and confirmation of nucleic acid sequences by use of oligonucleotides comprising a subsequence hybridizing exactly to a known terminal sequence and a subsequence hybridizing to an unidentified sequence |
WO1999028505A1 (en) * | 1997-12-03 | 1999-06-10 | Curagen Corporation | Methods and devices for measuring differential gene expression |
WO1999028836A2 (en) * | 1997-11-28 | 1999-06-10 | Cybergene Ab | Arrangement and method for the analysis of nucleotide sequences |
WO2000015851A1 (en) * | 1998-09-17 | 2000-03-23 | Curagen Corporation | Geometrical and hierarchical classification based on gene expression |
WO2000034525A1 (en) * | 1998-12-09 | 2000-06-15 | Vistagen, Inc. | Toxicity typing using embryoid bodies |
WO2000040755A2 (en) * | 1999-01-06 | 2000-07-13 | Cornell Research Foundation, Inc. | Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
WO2000040757A2 (en) * | 1999-01-08 | 2000-07-13 | Curagen Corporation | Method of identifying nucleic acids |
WO2000070099A2 (en) * | 1999-05-19 | 2000-11-23 | Mitokor | Differential gene expression in specific regions of the brain in neurodegenerative diseases |
WO2001075156A1 (en) * | 2000-03-31 | 2001-10-11 | Sanyo Electric Co., Ltd. | Microbe identifying method, microbe identifying apparatus, method for creating database for microbe identification, microbe identifying program, and recording medium on which the same is recorded |
US6470277B1 (en) | 1999-07-30 | 2002-10-22 | Agy Therapeutics, Inc. | Techniques for facilitating identification of candidate genes |
US6486299B1 (en) | 1998-09-28 | 2002-11-26 | Curagen Corporation | Genes and proteins predictive and therapeutic for stroke, hypertension, diabetes and obesity |
US6610480B1 (en) | 1997-11-10 | 2003-08-26 | Genentech, Inc. | Treatment and diagnosis of cardiac hypertrophy |
WO2007119779A1 (en) | 2006-04-14 | 2007-10-25 | Nec Corporation | Individual discrimination method and apparatus |
EP1872786A1 (en) * | 1997-09-05 | 2008-01-02 | The Regents of the University of California | Use of immunostimulatory oligonucleotides for preventing or reducing antigen-stimulated, granulocyte-mediated inflammation |
US7830575B2 (en) | 2006-04-10 | 2010-11-09 | Illumina, Inc. | Optical scanner with improved scan time |
US8143009B2 (en) | 2000-06-14 | 2012-03-27 | Vistagen, Inc. | Toxicity typing using liver stem cells |
US8288322B2 (en) | 2000-04-17 | 2012-10-16 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US8367322B2 (en) | 1999-01-06 | 2013-02-05 | Cornell Research Foundation, Inc. | Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
US9268983B2 (en) | 2003-01-22 | 2016-02-23 | Illumina, Inc. | Optical system and method for reading encoded microbeads |
US9382535B2 (en) | 2000-04-17 | 2016-07-05 | Dyax Corp. | Methods of constructing libraries of genetic packages that collectively display the members of a diverse family of peptides, polypeptides or proteins |
US10683342B2 (en) | 2008-04-24 | 2020-06-16 | Dyax Corp. | Libraries of genetic packages comprising novel HC CDR1, CDR2, and CDR3 and novel LC CDR1, CDR2, and CDR3 designs |
US10718066B2 (en) | 2008-03-13 | 2020-07-21 | Dyax Corp. | Libraries of genetic packages comprising novel HC CDR3 designs |
Families Citing this family (101)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5871697A (en) | 1995-10-24 | 1999-02-16 | Curagen Corporation | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US6418382B2 (en) | 1995-10-24 | 2002-07-09 | Curagen Corporation | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
CA2299325C (en) * | 1997-08-07 | 2008-01-08 | Imaging Research, Inc. | A digital imaging system for assays in well plates, gels and blots |
US20110166040A1 (en) * | 1997-09-05 | 2011-07-07 | Ibis Biosciences, Inc. | Compositions for use in identification of strains of e. coli o157:h7 |
US8337753B2 (en) | 1998-05-01 | 2012-12-25 | Gen-Probe Incorporated | Temperature-controlled incubator having a receptacle mixing mechanism |
EP1930078A1 (en) * | 1998-05-01 | 2008-06-11 | Gen-Probe Incorporated | Method for agitating the contents of a container |
JP2002528096A (en) * | 1998-10-27 | 2002-09-03 | アフィメトリックス インコーポレイテッド | Genomic DNA complexity control and analysis |
US20020012922A1 (en) * | 1998-11-04 | 2002-01-31 | Hilbush Brian S. | Simplified method for indexing and determining the relative concentration of expressed messenger RNAs |
US6613509B1 (en) * | 1999-03-22 | 2003-09-02 | Regents Of The University Of California | Determination of base (nucleotide) composition in DNA oligomers by mass spectrometry |
US6242189B1 (en) * | 1999-10-01 | 2001-06-05 | The Regents Of The University Of California | Selective isolation of bacterial mRNA |
US6618679B2 (en) * | 2000-01-28 | 2003-09-09 | Althea Technologies, Inc. | Methods for analysis of gene expression |
US20040024493A1 (en) * | 2000-05-08 | 2004-02-05 | Magnus Fagrell | Method, system, and sub-system, for processing a chemical reaction |
US6625599B1 (en) * | 2000-05-18 | 2003-09-23 | Rajendra Kumar Bera | Method and apparatus for data searching and computer-readable medium for supplying program instructions |
WO2001098535A2 (en) * | 2000-05-19 | 2001-12-27 | Curagen Corporation | Method for analyzing a nucleic acid |
US6887664B2 (en) * | 2000-06-06 | 2005-05-03 | Applera Corporation | Asynchronous primed PCR |
US7016788B2 (en) * | 2000-07-07 | 2006-03-21 | Curagen Corporation | Methods for classifying nucleic acids and polypeptides |
AU2002241501A1 (en) * | 2000-11-21 | 2002-06-03 | Affymetrix, Inc. | Methods and computer software products for selecting nucleic acid probes |
WO2002074953A2 (en) * | 2001-02-28 | 2002-09-26 | Lion Bioscience Ag | Gene library and a method for producing the same |
US20040121311A1 (en) | 2002-12-06 | 2004-06-24 | Ecker David J. | Methods for rapid detection and identification of bioagents in livestock |
US7666588B2 (en) | 2001-03-02 | 2010-02-23 | Ibis Biosciences, Inc. | Methods for rapid forensic analysis of mitochondrial DNA and characterization of mitochondrial DNA heteroplasmy |
US20030027135A1 (en) | 2001-03-02 | 2003-02-06 | Ecker David J. | Method for rapid detection and identification of bioagents |
US7226739B2 (en) | 2001-03-02 | 2007-06-05 | Isis Pharmaceuticals, Inc | Methods for rapid detection and identification of bioagents in epidemiological and forensic investigations |
US7217510B2 (en) | 2001-06-26 | 2007-05-15 | Isis Pharmaceuticals, Inc. | Methods for providing bacterial bioagent characterizing information |
US8073627B2 (en) | 2001-06-26 | 2011-12-06 | Ibis Biosciences, Inc. | System for indentification of pathogens |
US7194369B2 (en) * | 2001-07-23 | 2007-03-20 | Cognis Corporation | On-site analysis system with central processor and method of analyzing |
US6872529B2 (en) * | 2001-07-25 | 2005-03-29 | Affymetrix, Inc. | Complexity management of genomic DNA |
JP2003245098A (en) * | 2002-02-25 | 2003-09-02 | Hitachi Ltd | Method of searching gene and method of providing list |
US6586220B1 (en) * | 2002-02-26 | 2003-07-01 | New England Biolabs, Inc. | Method for cloning and expression of BsaWI restriction endonuclease and BsaWI methylase in E. coli |
US20070065816A1 (en) * | 2002-05-17 | 2007-03-22 | Affymetrix, Inc. | Methods for genotyping |
US9388459B2 (en) * | 2002-06-17 | 2016-07-12 | Affymetrix, Inc. | Methods for genotyping |
US7459273B2 (en) * | 2002-10-04 | 2008-12-02 | Affymetrix, Inc. | Methods for genotyping selected polymorphism |
JP2006516193A (en) | 2002-12-06 | 2006-06-29 | アイシス・ファーマシューティカルス・インコーポレーテッド | Rapid identification of pathogens in humans and animals |
WO2004065628A1 (en) * | 2003-01-21 | 2004-08-05 | Guoliang Fu | Quantitative multiplex detection of nucleic acids |
AU2003202671A1 (en) * | 2003-01-21 | 2004-08-13 | Guoliang Fu | Quantitative multiplex detection of nucleic acids |
US8046171B2 (en) | 2003-04-18 | 2011-10-25 | Ibis Biosciences, Inc. | Methods and apparatus for genetic evaluation |
US8057993B2 (en) | 2003-04-26 | 2011-11-15 | Ibis Biosciences, Inc. | Methods for identification of coronaviruses |
US8158354B2 (en) | 2003-05-13 | 2012-04-17 | Ibis Biosciences, Inc. | Methods for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture |
US7964343B2 (en) | 2003-05-13 | 2011-06-21 | Ibis Biosciences, Inc. | Method for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture |
US8114978B2 (en) | 2003-08-05 | 2012-02-14 | Affymetrix, Inc. | Methods for genotyping selected polymorphism |
JP4638431B2 (en) | 2003-08-06 | 2011-02-23 | ブリッジャー テクノロジーズ,インク. | Cross-linked component parts for target substance detection |
US8288523B2 (en) | 2003-09-11 | 2012-10-16 | Ibis Biosciences, Inc. | Compositions for use in identification of bacteria |
US8097416B2 (en) | 2003-09-11 | 2012-01-17 | Ibis Biosciences, Inc. | Methods for identification of sepsis-causing bacteria |
US20100129811A1 (en) * | 2003-09-11 | 2010-05-27 | Ibis Biosciences, Inc. | Compositions for use in identification of pseudomonas aeruginosa |
US8546082B2 (en) | 2003-09-11 | 2013-10-01 | Ibis Biosciences, Inc. | Methods for identification of sepsis-causing bacteria |
US8163895B2 (en) | 2003-12-05 | 2012-04-24 | Ibis Biosciences, Inc. | Compositions for use in identification of orthopoxviruses |
US7666592B2 (en) | 2004-02-18 | 2010-02-23 | Ibis Biosciences, Inc. | Methods for concurrent identification and quantification of an unknown bioagent |
US8119336B2 (en) | 2004-03-03 | 2012-02-21 | Ibis Biosciences, Inc. | Compositions for use in identification of alphaviruses |
CA2567839C (en) | 2004-05-24 | 2011-06-28 | Isis Pharmaceuticals, Inc. | Mass spectrometry with selective ion filtration by digital thresholding |
US20050266411A1 (en) | 2004-05-25 | 2005-12-01 | Hofstadler Steven A | Methods for rapid forensic analysis of mitochondrial DNA |
JP4590957B2 (en) * | 2004-07-13 | 2010-12-01 | 東洋紡績株式会社 | Method for promoting ligation reaction by DNA ligase and DNA ligase composition |
US7811753B2 (en) | 2004-07-14 | 2010-10-12 | Ibis Biosciences, Inc. | Methods for repairing degraded DNA |
EP1869180B1 (en) | 2005-03-03 | 2013-02-20 | Ibis Biosciences, Inc. | Compositions for use in identification of polyoma viruses |
US8084207B2 (en) | 2005-03-03 | 2011-12-27 | Ibis Bioscience, Inc. | Compositions for use in identification of papillomavirus |
US7452671B2 (en) * | 2005-04-29 | 2008-11-18 | Affymetrix, Inc. | Methods for genotyping with selective adaptor ligation |
US8026084B2 (en) | 2005-07-21 | 2011-09-27 | Ibis Biosciences, Inc. | Methods for rapid identification and quantitation of nucleic acid variants |
US11306351B2 (en) | 2005-12-21 | 2022-04-19 | Affymetrix, Inc. | Methods for genotyping |
EP2010679A2 (en) | 2006-04-06 | 2009-01-07 | Ibis Biosciences, Inc. | Compositions for the use in identification of fungi |
WO2008143627A2 (en) | 2006-09-14 | 2008-11-27 | Ibis Biosciences, Inc. | Targeted whole genome amplification method for identification of pathogens |
EP2126132B1 (en) | 2007-02-23 | 2013-03-20 | Ibis Biosciences, Inc. | Methods for rapid foresnsic dna analysis |
US20080293589A1 (en) * | 2007-05-24 | 2008-11-27 | Affymetrix, Inc. | Multiplex locus specific amplification |
US20100291544A1 (en) * | 2007-05-25 | 2010-11-18 | Ibis Biosciences, Inc. | Compositions for use in identification of strains of hepatitis c virus |
US9598724B2 (en) | 2007-06-01 | 2017-03-21 | Ibis Biosciences, Inc. | Methods and compositions for multiple displacement amplification of nucleic acids |
US20110045456A1 (en) * | 2007-06-14 | 2011-02-24 | Ibis Biosciences, Inc. | Compositions for use in identification of adventitious contaminant viruses |
US9388457B2 (en) | 2007-09-14 | 2016-07-12 | Affymetrix, Inc. | Locus specific amplification using array probes |
US20110015084A1 (en) * | 2007-10-25 | 2011-01-20 | Monsanto Technology Llc | Methods for Identifying Genetic Linkage |
WO2009131728A2 (en) * | 2008-01-29 | 2009-10-29 | Ibis Biosciences, Inc. | Compositions for use in identification of picornaviruses |
JP5344670B2 (en) * | 2008-02-13 | 2013-11-20 | 独立行政法人放射線医学総合研究所 | Gene expression analysis method, gene expression analysis apparatus, and gene expression analysis program |
US9074244B2 (en) * | 2008-03-11 | 2015-07-07 | Affymetrix, Inc. | Array-based translocation and rearrangement assays |
US20110177515A1 (en) * | 2008-05-30 | 2011-07-21 | Ibis Biosciences, Inc. | Compositions for use in identification of francisella |
WO2009155103A2 (en) * | 2008-05-30 | 2009-12-23 | Ibis Biosciences, Inc. | Compositions for use in identification of tick-borne pathogens |
US20110151437A1 (en) * | 2008-06-02 | 2011-06-23 | Ibis Biosciences, Inc. | Compositions for use in identification of adventitious viruses |
EP2344893B1 (en) | 2008-09-16 | 2014-10-15 | Ibis Biosciences, Inc. | Microplate handling systems and methods |
EP2349549B1 (en) | 2008-09-16 | 2012-07-18 | Ibis Biosciences, Inc. | Mixing cartridges, mixing stations, and related kits, and system |
WO2010033627A2 (en) | 2008-09-16 | 2010-03-25 | Ibis Biosciences, Inc. | Sample processing units, systems, and related methods |
WO2010039696A1 (en) * | 2008-10-02 | 2010-04-08 | Ibis Biosciences, Inc. | Compositions for use in identification of herpesviruses |
US20110189687A1 (en) * | 2008-10-02 | 2011-08-04 | Ibis Bioscience, Inc. | Compositions for use in identification of members of the bacterial genus mycoplasma |
US20110183343A1 (en) * | 2008-10-03 | 2011-07-28 | Rangarajan Sampath | Compositions for use in identification of members of the bacterial class alphaproteobacter |
US20110183345A1 (en) * | 2008-10-03 | 2011-07-28 | Ibis Biosciences, Inc. | Compositions for use in identification of streptococcus pneumoniae |
WO2010039870A1 (en) * | 2008-10-03 | 2010-04-08 | Ibis Biosciences, Inc. | Compositions for use in identification of neisseria, chlamydia, and/or chlamydophila bacteria |
WO2010039763A2 (en) * | 2008-10-03 | 2010-04-08 | Ibis Biosciences, Inc. | Compositions for use in identification of antibiotic-resistant bacteria |
WO2010039787A1 (en) * | 2008-10-03 | 2010-04-08 | Ibis Biosciences, Inc. | Compositions for use in identification of clostridium difficile |
EP2396803A4 (en) | 2009-02-12 | 2016-10-26 | Ibis Biosciences Inc | Ionization probe assemblies |
EP2396430B1 (en) * | 2009-02-16 | 2013-05-01 | Epicentre Technologies Corporation | Template-independent ligation of single-stranded dna |
WO2010104798A1 (en) | 2009-03-08 | 2010-09-16 | Ibis Biosciences, Inc. | Bioagent detection methods |
WO2010114842A1 (en) | 2009-03-30 | 2010-10-07 | Ibis Biosciences, Inc. | Bioagent detection systems, devices, and methods |
WO2011008972A1 (en) | 2009-07-17 | 2011-01-20 | Ibis Biosciences, Inc. | Systems for bioagent identification |
US8950604B2 (en) | 2009-07-17 | 2015-02-10 | Ibis Biosciences, Inc. | Lift and mount apparatus |
US9416409B2 (en) | 2009-07-31 | 2016-08-16 | Ibis Biosciences, Inc. | Capture primers and capture sequence linked solid supports for molecular diagnostic tests |
US9080209B2 (en) | 2009-08-06 | 2015-07-14 | Ibis Biosciences, Inc. | Non-mass determined base compositions for nucleic acid detection |
US20110059453A1 (en) * | 2009-08-23 | 2011-03-10 | Affymetrix, Inc. | Poly(A) Tail Length Measurement by PCR |
US20110065111A1 (en) * | 2009-08-31 | 2011-03-17 | Ibis Biosciences, Inc. | Compositions For Use In Genotyping Of Klebsiella Pneumoniae |
WO2011047307A1 (en) | 2009-10-15 | 2011-04-21 | Ibis Biosciences, Inc. | Multiple displacement amplification |
US9758840B2 (en) * | 2010-03-14 | 2017-09-12 | Ibis Biosciences, Inc. | Parasite detection via endosymbiont detection |
US8566603B2 (en) * | 2010-06-14 | 2013-10-22 | Seagate Technology Llc | Managing security operating modes |
CN103675303B (en) | 2010-07-23 | 2016-02-03 | 贝克曼考尔特公司 | Sensing system |
RU2603265C2 (en) | 2011-04-05 | 2016-11-27 | ДАУ АГРОСАЙЕНСИЗ ЭлЭлСи | High through-put analysis of transgene borders |
WO2013078019A1 (en) | 2011-11-22 | 2013-05-30 | Dow Agrosciences Llc | Three dimensional matrix analyses for high throughput sequencing |
EP2608088B1 (en) * | 2011-12-20 | 2018-12-12 | F. Hoffmann-La Roche AG | Improved method for nucleic acid analysis |
KR101930835B1 (en) * | 2016-11-29 | 2018-12-19 | 가천대학교 산학협력단 | A method and a system for producing combinational logic network based on gene expression |
US10427162B2 (en) | 2016-12-21 | 2019-10-01 | Quandx Inc. | Systems and methods for molecular diagnostics |
US11462299B2 (en) * | 2017-10-17 | 2022-10-04 | Invitae Corporation | Molecular evidence platform for auditable, continuous optimization of variant interpretation in genetic and genomic testing and analysis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5366877A (en) * | 1988-01-26 | 1994-11-22 | Applied Biosystems, Inc. | Restriction/ligation labeling for primer initiated multiple copying of DNA ssequences |
WO1995021944A1 (en) * | 1994-02-14 | 1995-08-17 | Smithkline Beecham Corporation | Differentially expressed genes in healthy and diseased subjects |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4555019A (en) * | 1981-11-10 | 1985-11-26 | The Procter & Gamble Company | Packaged detergent composition with instructions for use in a laundering process |
US5171534A (en) * | 1984-01-16 | 1992-12-15 | California Institute Of Technology | Automated DNA sequencing technique |
GB8606719D0 (en) * | 1986-03-19 | 1986-04-23 | Lister Preventive Med | Genetic probes |
US4987066A (en) * | 1986-11-07 | 1991-01-22 | Max Planck-Gesellschaft Zur Forderung Der Wissenschaften E.V. | Process for the detection of restriction fragment length polymorphisms in eukaryotic genomes |
US5202231A (en) * | 1987-04-01 | 1993-04-13 | Drmanac Radoje T | Method of sequencing of genomes by hybridization of oligonucleotide probes |
EP0392546A3 (en) * | 1989-04-14 | 1991-09-11 | Ro Institut Za Molekularnu Genetiku I Geneticko Inzenjerstvo | Process for determination of a complete or a partial contents of very short sequences in the samples of nucleic acids connected to the discrete particles of microscopic size by hybridization with oligonucleotide probes |
CZ291877B6 (en) * | 1991-09-24 | 2003-06-18 | Keygene N.V. | Amplification method of at least one restriction fragment from a starting DNA and process for preparing an assembly of the amplified restriction fragments |
US5262311A (en) * | 1992-03-11 | 1993-11-16 | Dana-Farber Cancer Institute, Inc. | Methods to clone polyA mRNA |
US5665544A (en) * | 1992-05-27 | 1997-09-09 | Amersham International Plc | RNA fingerprinting to determine RNA population differences |
GB9214873D0 (en) * | 1992-07-13 | 1992-08-26 | Medical Res Council | Process for categorising nucleotide sequence populations |
US6114114A (en) * | 1992-07-17 | 2000-09-05 | Incyte Pharmaceuticals, Inc. | Comparative gene transcript analysis |
US5795714A (en) * | 1992-11-06 | 1998-08-18 | Trustees Of Boston University | Method for replicating an array of nucleic acid probes |
FR2710279B1 (en) * | 1993-09-23 | 1995-11-24 | Armand Ajdari | Improvements to methods and devices for separating particles contained in a fluid. |
JPH0793370A (en) * | 1993-09-27 | 1995-04-07 | Hitachi Device Eng Co Ltd | Gene data base retrieval system |
US5459037A (en) * | 1993-11-12 | 1995-10-17 | The Scripps Research Institute | Method for simultaneous identification of differentially expressed mRNAs and measurement of relative concentrations |
US5707807A (en) * | 1995-03-28 | 1998-01-13 | Research Development Corporation Of Japan | Molecular indexing for expressed gene analysis |
US5604100A (en) * | 1995-07-19 | 1997-02-18 | Perlin; Mark W. | Method and system for sequencing genomes |
US5712126A (en) * | 1995-08-01 | 1998-01-27 | Yale University | Analysis of gene expression by display of 3-end restriction fragments of CDNA |
US5866330A (en) * | 1995-09-12 | 1999-02-02 | The Johns Hopkins University School Of Medicine | Method for serial analysis of gene expression |
1996
Date | Country | Application | Publication | Status |
---|---|---|---|---|
1996-06-14 | US | US08/663,823 | US5972693A (en) | Not active: Expired - Fee Related |
1996-10-24 | AU | AU74763/96A | AU730830C (en) | Not active: Ceased |
1996-10-24 | IL | IL12418596A | IL124185A (en) | Not active: IP Right Cessation |
1996-10-24 | EP | EP96936985A | EP0866877A4 (en) | Not active: Withdrawn |
1996-10-24 | WO | PCT/US1996/017159 | WO1997015690A1 (en) | Not active: Application Discontinuation |
1996-10-24 | JP | JP9516817A | JP2000500647A (en) | Not active: Ceased |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999007896A2 (en) * | 1997-08-07 | 1999-02-18 | Curagen Corporation | Detection and confirmation of nucleic acid sequences by use of oligonucleotides comprising a subsequence hybridizing exactly to a known terminal sequence and a subsequence hybridizing to an unidentified sequence |
WO1999007896A3 (en) * | 1997-08-07 | 1999-04-29 | Curagen Corp | Detection and confirmation of nucleic acid sequences by use of oligonucleotides comprising a subsequence hybridizing exactly to a known terminal sequence and a subsequence hybridizing to an unidentified sequence |
US6190868B1 (en) | 1997-08-07 | 2001-02-20 | Curagen Corporation | Method for identifying a nucleic acid sequence |
EP1872786A1 (en) * | 1997-09-05 | 2008-01-02 | The Regents of the University of California | Use of immunostimulatory oligonucleotides for preventing or reducing antigen-stimulated, granulocyte-mediated inflammation |
US6610480B1 (en) | 1997-11-10 | 2003-08-26 | Genentech, Inc. | Treatment and diagnosis of cardiac hypertrophy |
WO1999028836A3 (en) * | 1997-11-28 | 1999-07-15 | Cybergene Ab | Arrangement and method for the analysis of nucleotide sequences |
WO1999028836A2 (en) * | 1997-11-28 | 1999-06-10 | Cybergene Ab | Arrangement and method for the analysis of nucleotide sequences |
US6355423B1 (en) * | 1997-12-03 | 2002-03-12 | Curagen Corporation | Methods and devices for measuring differential gene expression |
WO1999028505A1 (en) * | 1997-12-03 | 1999-06-10 | Curagen Corporation | Methods and devices for measuring differential gene expression |
WO2000015851A1 (en) * | 1998-09-17 | 2000-03-23 | Curagen Corporation | Geometrical and hierarchical classification based on gene expression |
US6486299B1 (en) | 1998-09-28 | 2002-11-26 | Curagen Corporation | Genes and proteins predictive and therapeutic for stroke, hypertension, diabetes and obesity |
WO2000034525A1 (en) * | 1998-12-09 | 2000-06-15 | Vistagen, Inc. | Toxicity typing using embryoid bodies |
WO2000040755A2 (en) * | 1999-01-06 | 2000-07-13 | Cornell Research Foundation, Inc. | Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
WO2000040755A3 (en) * | 1999-01-06 | 2001-01-04 | Cornell Res Foundation Inc | Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
US6534293B1 (en) | 1999-01-06 | 2003-03-18 | Cornell Research Foundation, Inc. | Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
US8367322B2 (en) | 1999-01-06 | 2013-02-05 | Cornell Research Foundation, Inc. | Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
WO2000040757A2 (en) * | 1999-01-08 | 2000-07-13 | Curagen Corporation | Method of identifying nucleic acids |
WO2000040757A3 (en) * | 1999-01-08 | 2000-11-30 | Curagen Corp | Method of identifying nucleic acids |
WO2000070099A2 (en) * | 1999-05-19 | 2000-11-23 | Mitokor | Differential gene expression in specific regions of the brain in neurodegenerative diseases |
WO2000070099A3 (en) * | 1999-05-19 | 2002-04-04 | Mitokor | Differential gene expression in specific regions of the brain in neurodegenerative diseases |
US6470277B1 (en) | 1999-07-30 | 2002-10-22 | Agy Therapeutics, Inc. | Techniques for facilitating identification of candidate genes |
WO2001075156A1 (en) * | 2000-03-31 | 2001-10-11 | Sanyo Electric Co., Ltd. | Microbe identifying method, microbe identifying apparatus, method for creating database for microbe identification, microbe identifying program, and recording medium on which the same is recorded |
US8901045B2 (en) | 2000-04-17 | 2014-12-02 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US10829541B2 (en) | 2000-04-17 | 2020-11-10 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US9683028B2 (en) | 2000-04-17 | 2017-06-20 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US8288322B2 (en) | 2000-04-17 | 2012-10-16 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US9382535B2 (en) | 2000-04-17 | 2016-07-05 | Dyax Corp. | Methods of constructing libraries of genetic packages that collectively display the members of a diverse family of peptides, polypeptides or proteins |
US8143009B2 (en) | 2000-06-14 | 2012-03-27 | Vistagen, Inc. | Toxicity typing using liver stem cells |
US8512957B2 (en) | 2000-06-14 | 2013-08-20 | Vistagen Therapeutics, Inc. | Toxicity typing using liver stem cells |
US9268983B2 (en) | 2003-01-22 | 2016-02-23 | Illumina, Inc. | Optical system and method for reading encoded microbeads |
US7830575B2 (en) | 2006-04-10 | 2010-11-09 | Illumina, Inc. | Optical scanner with improved scan time |
WO2007119779A1 (en) | 2006-04-14 | 2007-10-25 | Nec Corporation | Individual discrimination method and apparatus |
US10718066B2 (en) | 2008-03-13 | 2020-07-21 | Dyax Corp. | Libraries of genetic packages comprising novel HC CDR3 designs |
US11926926B2 (en) | 2008-03-13 | 2024-03-12 | Takeda Pharmaceutical Company Limited | Libraries of genetic packages comprising novel HC CDR3 designs |
US10683342B2 (en) | 2008-04-24 | 2020-06-16 | Dyax Corp. | Libraries of genetic packages comprising novel HC CDR1, CDR2, and CDR3 and novel LC CDR1, CDR2, and CDR3 designs |
US11598024B2 (en) | 2008-04-24 | 2023-03-07 | Takeda Pharmaceutical Company Limited | Libraries of genetic packages comprising novel HC CDR1, CDR2, and CDR3 and novel LC CDR1, CDR2, and CDR3 designs |
Also Published As
Publication number | Publication date |
---|---|
EP0866877A1 (en) | 1998-09-30 |
AU7476396A (en) | 1997-05-15 |
EP0866877A4 (en) | 2004-10-13 |
AU730830C (en) | 2001-10-25 |
IL124185A (en) | 2000-12-06 |
US5972693A (en) | 1999-10-26 |
JP2000500647A (en) | 2000-01-25 |
AU730830B2 (en) | 2001-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU730830C (en) | | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US5871697A (en) | | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US6418382B2 (en) | | Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
US10538759B2 (en) | | Compounds and method for representational selection of nucleic acids from complex mixtures using hybridization |
US6057101A (en) | | Identification and comparison of protein-protein interactions that occur in populations and identification of inhibitors of these interactors |
JP3251291B2 (en) | | Classification method of nucleotide sequence population |
JP2004504059A (en) | | Method for analyzing and identifying transcribed gene, and finger print method |
WO1999007896A2 (en) | | Detection and confirmation of nucleic acid sequences by use of oligonucleotides comprising a subsequence hybridizing exactly to a known terminal sequence and a subsequence hybridizing to an unidentified sequence |
US20020015951A1 (en) | | Method of analyzing a nucleic acid |
CA2235860A1 (en) | | Method and apparatus for identifying, classifying, or quantifying dna sequences in a sample without sequencing |
KR20060130599A (en) | | Method of obtaining gene tag |
CN117025786A (en) | | Fine wool sheep 50K SNP liquid phase chip based on targeted capturing sequencing and application thereof |
US20030170661A1 (en) | | Method for identifying a nucleic acid sequence |
AU3085701A (en) | | Method of analyzing a nucleic acid |
JP2005515790A (en) | | Methods and means for identification of genetic features |
GB2348284A (en) | | Method for comparing nucleic acid sequences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AL AM AU AZ BA BB BG BR BY CA CN CU CZ EE FI GE HU IL IS JP KG KP KR KZ LC LK LR LS LT LV MD MG MK MN MX NO NZ PL RO RU SG SI SK TJ TM TR TT UA UZ VN AM AZ BY KG KZ MD RU TJ TM |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
ENP | Entry into the national phase | Ref document number: 2235860 Country of ref document: CA Ref country code: CA Ref document number: 2235860 Kind code of ref document: A Format of ref document f/p: F Ref country code: JP Ref document number: 1997 516817 Kind code of ref document: A Format of ref document f/p: F |
WWE | Wipo information: entry into national phase | Ref document number: 1996936985 Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 1996936985 Country of ref document: EP |
WWW | Wipo information: withdrawn in national office | Ref document number: 1996936985 Country of ref document: EP |