WO2008119084A1 - Method for identifying and selecting low copy nucleic acid segments - Google Patents


Info

Publication number
WO2008119084A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
sequences
genomic
probe
primer
Legal status
Ceased
Application number
PCT/US2008/058795
Other languages
English (en)
French (fr)
Inventor
Heather Newkirk
Chengpeng Bi
Current Assignee
Childrens Mercy Hospital
Original Assignee
Childrens Mercy Hospital
Application filed by Childrens Mercy Hospital
Priority to JP2010501272A (JP2010522571A)
Priority to EP08744701A (EP2129800A4)
Publication of WO2008119084A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10T TECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00 Chemistry: analytical and immunological testing
    • Y10T436/14 Heterocyclic carbon compound [i.e., O, S, N, Se, Te, as only ring hetero atom]
    • Y10T436/142222 Hetero-O [e.g., ascorbic acid, etc.]
    • Y10T436/143333 Saccharide [e.g., DNA, etc.]

Definitions

  • the present invention relates to a method of identifying low copy nucleic acid segments, suitable for use in hybridization experiments, from within a known nucleic acid sequence.
  • the present invention further relates to a method of preferentially selecting among the identified low copy nucleic acid segments for segments that are thermodynamically suitable for use in hybridization experiments.
  • thermodynamic qualities of a potential probe sequence are not capable of initially identifying the sequence.
  • Mfold publicly available on the world wide web at a website that reads in pertinent part "bioinfo.rpi.edu"
  • Mfold does not evaluate genomic sequences for their unique sequence nature.
  • a user cannot be certain that the thermodynamically stable sequence that has been identified will be unique until tested. Since testing a probe consumes both time and money, it is desired to find a more reliable method of identifying thermodynamically stable, unique sequences within a genetic segment.
  • the present invention overcomes the problems inherent in the prior art and provides a distinct advance in the state of the art by providing methods and computerized processes for the rapid and reliable identification of low copy nucleic acid segments from within a known nucleic acid sequence and for the selection from the identified low copy segments of segments that are thermodynamically suitable for use in hybridization experiments.
  • the invention advantageously provides for greater sensitivity and higher throughput in hybridization.
  • the methods allow the user to analyze longer sequence lengths at a time versus other genomics programs, while still being capable of analyzing sequences of any length. These longer sequences may be greater than 100 kilobases (kb), 150 kb, 200 kb, 250 kb, 300 kb, 500 kb, or even 1000 kb or more in length.
  • the parameters used by this method are stricter than those commonly used on web-based programs.
  • ΔG Gibbs Free Energy
  • ΔH Enthalpy
  • ΔS Entropy
  • Tm Melting Temperature
  • the Gibbs free energy equation relates these quantities, and the variables ΔH, ΔS, and Tm can be manipulated to arrive at the desired ΔG, which is ⁇ 50 in preferred forms. If manipulation of one or more of these variables falls outside the preferred range but still results in ΔG ⁇ 50, these criteria or parameters are also covered by the present invention.
  • the criteria or parameters will require that ΔG ⁇ 50, ΔH ⁇ -1000, ΔS ⁇ -3500, and Tm ⁇ 60 °C.
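The Gibbs free energy equation referred to above is ΔG = ΔH − TΔS. The sketch below evaluates it for a folded structure and derives the melting temperature as the temperature at which ΔG = 0 (Tm = ΔH/ΔS, valid for a unimolecular transition such as a hairpin). It assumes Mfold-style units (ΔH in kcal/mol, ΔS in cal/(mol·K)); the function names and the example values are illustrative, not taken from the patent.

```python
"""Gibbs free energy relation used when screening probe secondary structure.

Assumes Mfold-style units: dH in kcal/mol, dS in cal/(mol*K).
"""

KELVIN_OFFSET = 273.15


def delta_g(dh_kcal: float, ds_cal: float, temp_c: float) -> float:
    """Return dG (kcal/mol) = dH - T*dS at the given temperature in Celsius."""
    temp_k = temp_c + KELVIN_OFFSET
    return dh_kcal - temp_k * ds_cal / 1000.0  # dS converted to kcal/(mol*K)


def melting_temp_c(dh_kcal: float, ds_cal: float) -> float:
    """Return Tm in Celsius: the temperature where dG = 0 (unimolecular fold)."""
    if ds_cal == 0:
        raise ValueError("dS must be non-zero")
    return 1000.0 * dh_kcal / ds_cal - KELVIN_OFFSET


if __name__ == "__main__":
    # Hypothetical hairpin: dH = -100 kcal/mol, dS = -300 cal/(mol*K)
    print(round(delta_g(-100.0, -300.0, 37.0), 3))   # -6.955
    print(round(melting_temp_c(-100.0, -300.0), 2))  # 60.18
```

Evaluating `delta_g` at the temperature returned by `melting_temp_c` gives zero, which is one way to see how ΔH, ΔS, and Tm constrain one another.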
  • Methods of the invention are more comprehensive, compared to present technologies, because they combine sequence analysis with thermodynamic analysis to identify nucleic acid segments that are both low copy sequences (i.e. not repetitive sequences, and preferably single copy meaning that the sequence appears only a single time in the genome) and thermodynamically suitable for hybridization. Additionally, methods of the invention identify unique sequences and search the genome to ensure that no other non-repetitive genomic regions are homologous to the region of interest. Further, unlike technology in the art, methods of the invention provide a double-check analysis of low copy nucleic acid segments to determine their suitability to be used as primers for polymerase chain reaction (PCR), or in other techniques that rely on variable temperatures. This represents the first invention to use such analytical methods sequentially.
  • PCR polymerase chain reaction
  • This invention is quite versatile in that it can be employed to design a variety of low copy nucleic acid probes of different lengths with characteristics that can be user-defined. For example, the present invention allows the user to choose the length of a unique sequence probe for the output.
  • FIG. 1 is a screen capture showing an input screen for the web-based UGSH interface;
  • FIG. 2A is a screen capture showing exemplary output from UGSH displaying unique sequence genomic probes and locations.
  • FIG. 2B is a screen capture showing an exemplary Primer Selection Output screen from UGSH.
  • FIG. 2C is a screen capture showing an exemplary primer sequence file from UGSH displayed in FASTA format;
  • FIG. 3 is a photograph taken from a fluorescence in situ hybridization (FISH) experiment using a unique sequence probe from BAC RP11-677F14 on chromosome 7;
  • FISH fluorescence in situ hybridization
  • FIG. 4 is a photograph taken from a FISH experiment using a unique sequence probe cocktail containing five different unique sequence probes;
  • FIG. 5 illustrates the results of a FISH experiment, using a probe not designed using the UGSH method. Probes (light gray, arrows) hybridized to numerous chromosomal locations, indicating that this sequence is homologous to more than one chromosomal region and thus not comprising a purely unique sequence;
  • FIG. 6 is a flow chart illustrating an embodiment of a computerized method for identifying low copy nucleic acid segments from within a known nucleic acid sequence, and selecting among the identified low copy segments for segments that are thermodynamically suitable for use in hybridization experiments
  • FIG. 7 is a flow chart illustrating a further embodiment of a computerized method for identifying low copy nucleic acid segments from within a known nucleic acid sequence and selecting among the identified low copy segments for segments that are thermodynamically suitable for use in hybridization experiments;
  • FIG. 8 is a flow chart illustrating an embodiment of a computerized method for identifying known repetitive sequences within an exemplary sequence from a subject or patient.
  • FIG. 9 is a flow chart illustrating an embodiment of a computerized method for extracting known repetitive sequences from a sequence from a subject or patient and selecting remaining portions of the sequence according to user-specified size parameters.
  • the present invention comprises a new, computerized process for the identification of unique sequence regions in genomic DNA, and provides methods to design unique-sequence genomic segments.
  • the identified segments can in turn be synthesized or amplified from a genome, or part of a genome, genomic library, or other source of genomic DNA and utilized in hybridization experiments such as, but not limited to, microarray, arrayCGH (collectively with microarray termed "array-based"), quantitative microsphere hybridization (QMH), and fluorescent in situ hybridization (FISH).
  • arrayCGH collectively with microarray termed "array-based"
  • QMH quantitative microsphere hybridization
  • FISH fluorescent in situ hybridization
  • genomic sequences, or segments are evaluated for unique, or non-repetitive, sequence composition by combining two different strategies and analyzing the thermodynamic characteristics of any identified unique sequence regions to ensure optimal performance of an identified low copy nucleic acid segment in hybridization assays.
  • a preferred form of this method includes five main steps: 1) removing highly and moderately repetitive sequences from a sequence of interest and displaying the genomic segments that remain after the repetitive sequences are removed (these segments can be of any size, but for FISH they are preferably greater than 500 bp, more preferably greater than 750 bp, and most preferably greater than 1 kb); 2) searching each segment for homology to genomic regions other than the region of interest and discarding all segments that match elsewhere in the genome; 3) evaluating the unique sequence segments for possible secondary structure motifs (hairpin loops, stems, bulges, etc.) by thermodynamic analysis; 4) designing PCR primers for genomic segments that pass the above three steps; and 5) evaluating each PCR primer to ensure it contains only unique sequence and does not match elsewhere in the genome.
  • the process stops after step 3, and in other preferred forms, the process stops after step 4. However, in use, it is preferred to perform all 5 steps.
  • the UGSH method was developed through the iterative design and experimental testing of genomic probes. Initially, methods from the prior art (U.S. Patent Nos. 6,828,097 ('097 patent) and 7,014,997 ('997 patent)) were used for the generation of "single copy" probes for quantitative microsphere hybridization (QMH) experiments (Newkirk et al. 2006, Determination of genomic copy number with quantitative microsphere hybridization. Human Mutation 27:376-386).
  • QMH assay allows for the high-throughput determination of genomic copy number by the direct hybridization of unique sequence probes, attached to spectrally distinct microspheres, to biotinylated genomic patient DNA, followed by flow cytometric analysis (Newkirk et al. 2006, U.S. Provisional Patent Application Serial No.
  • MFI mean fluorescence intensity
  • Step 1 of the UGSH method, as described above, is similar to but distinct from the methods described in the aforesaid patent applications.
  • Methods of the aforesaid patent applications involve repeat-masking a sequence of interest (i.e., comparing the sequence of interest with all known repetitive sequences in a genome and eliminating, or "masking," those sequences that have 90% or higher sequence similarity, where the comparison can introduce gaps and windows to provide a better match between two sequences) to generate unique or "single copy" probes.
  • a probe was designed (designated, ABLAluMer1) for QMH (Newkirk et al. 2005).
  • A known single-copy HOXB1 sequence (Newkirk et al., 2006) was used as the reference sequence. Both probes (~100 bases) were coupled to spectrally distinct microspheres and hybridized to biotinylated normal control genomic DNA. The MFI ratio of the HOXB1 and ABLAluMer1 probes should be 1, since normal control DNA was used for validation; however, the MFI ratio was 4.55, indicating that the
  • repeat masking (Step 1) was followed by a genomic homology search (Step 2), and probe 16-1d was designed specific to ABL (Newkirk et al., 2006).
  • This probe was hybridized to two different normal human genomic DNAs in QMH reactions with HOXB1 and yielded respective MFI ratios of 1.36 and 1.18. While closer to 1, these ratios are still not optimal.
  • Subsequent analysis of the 16-1d probe revealed a stable hairpin loop structure close to the 3' end of the probe (Newkirk et al., 2006), which could account for its less-than-optimal MFI ratios.
  • a secondary structure analysis step (Step 3) was integrated for refinement of the UGSH method.
  • the Unique Genome Sequence Hunter (UGSH) method for genomic hybridization probe selection requires a DNA sequence (step 1), which can be entered into the UGSH program in FASTA or Genbank format.
  • this sequence can be defined by chromosomal coordinates, gene name, or region of interest (step 1a).
  • in step 1a, UGSH will query a database, a particularly preferred database being the UCSC database (genome.ucsc.edu), to retrieve the sequence corresponding to the query (e.g., chr15:21263421-21263821, SNRPN, PWS, etc.).
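A coordinate query such as chr15:21263421-21263821 can be parsed and turned into a sequence request. This sketch assumes UCSC-style 1-based, inclusive display coordinates and builds a URL for the UCSC Genome Browser REST API `getData/sequence` endpoint, which takes a 0-based start. The assembly name (`hg18`, the March 2006 build cited later) and the helper names are assumptions, not part of the UGSH program; gene-name or syndrome queries (SNRPN, PWS) would need a separate lookup that is not shown.

```python
import re
from urllib.parse import urlencode

# Accepts forms such as "chr15:21263421-21263821" or "Chr15:21,263,421-21,263,821".
_REGION_RE = re.compile(r"^(chr[0-9XYM]+):([\d,]+)-([\d,]+)$", re.IGNORECASE)


def parse_region(region: str) -> tuple[str, int, int]:
    """Parse a 1-based, inclusive region string into (chrom, start, end)."""
    match = _REGION_RE.match(region.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a chromosomal region: {region!r}")
    chrom = "chr" + match.group(1)[3:]  # normalise "Chr15" -> "chr15"
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def ucsc_sequence_url(region: str, genome: str = "hg18") -> str:
    """Build a UCSC REST API sequence URL; the start is converted to 0-based."""
    chrom, start, end = parse_region(region)
    query = urlencode({"genome": genome, "chrom": chrom,
                       "start": start - 1, "end": end})
    return "https://api.genome.ucsc.edu/getData/sequence?" + query
```

The conversion matters: the browser displays the first base as position 1, whereas the API's `start` counts from 0, so a query without the shift would return one extra upstream base.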
  • step 2 is to remove repetitive sequences from the input sequence.
  • UGSH does this by aligning the sequences of highly repetitive classes of DNA (SINE, LINE, satellites, short tandem repeats, minisatellites, microsatellites, telomere, etc.) to the sequence of interest.
  • UGSH runs the RepeatMasker program to remove repetitive sequences, but it uses strictly defined output parameters for RepeatMasker to eliminate all sequences with a 90% or greater homology match to known repeat sequences. Any similar repeat-masking program could be used for this procedure.
  • this repeat-masking step can be circumvented by inputting a query sequence that is already masked for repeats (step 2a).
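RepeatMasker reports its annotations in a whitespace-aligned `.out` table whose second column is the percent divergence of each match from the repeat consensus. As a rough sketch, the 90% similarity cut-off described above can be approximated by keeping annotations with a divergence of 10% or less. Treating divergence as 100 minus similarity is an assumption (RepeatMasker reports deletions and insertions in separate columns), and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def repeat_intervals(out_lines: Iterable[str], max_divergence: float = 10.0
                     ) -> List[Tuple[str, int, int, str]]:
    """Return (query, begin, end, repeat_name) for qualifying repeat hits.

    Parses RepeatMasker .out lines; header and blank lines are skipped.
    Coordinates are kept as reported (1-based, inclusive).
    """
    intervals = []
    for line in out_lines:
        fields = line.split()
        # Data lines start with an integer Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        divergence = float(fields[1])  # "perc div." column
        if divergence <= max_divergence:
            query, begin, end = fields[4], int(fields[5]), int(fields[6])
            intervals.append((query, begin, end, fields[9]))
    return intervals
```

The resulting intervals can then be removed from the sequence of interest to leave candidate repeat-free segments.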
  • the UCSC genome browser and Genbank offer the option to display masked sequences, thus eliminating the need for this repeat-masking step.
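When the input is already masked (step 2a), the repeat-free segments can be read directly from the sequence. The sketch below assumes the two common masking conventions: soft-masking (repeats in lowercase, as in UCSC genome downloads) and hard-masking (repeats replaced by N). It returns 1-based, inclusive coordinates of unmasked runs at least `min_len` bases long; the function name and the 1,000-base default (loosely following the FISH preference above) are illustrative.

```python
import re
from typing import List, Tuple

# A run of uppercase, unambiguous bases. Lowercase (soft-masked) characters,
# N (hard-masked) and any other IUPAC code all terminate a run.
_UNMASKED_RUN = re.compile(r"[ACGT]+")


def unmasked_segments(masked_seq: str, min_len: int = 1000
                      ) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for unmasked runs >= min_len (1-based)."""
    segments = []
    for match in _UNMASKED_RUN.finditer(masked_seq):
        if match.end() - match.start() >= min_len:
            segments.append((match.start() + 1, match.end(), match.group()))
    return segments
```

For example, `unmasked_segments("ACGTacgtNNNNACGTAC", min_len=5)` keeps only the final six-base run, reported as positions 13 to 18.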
  • step 3 is to scan this sequence for homologous sequences in the genome using the BLAT program from the UCSC genome browser. Any segment of the sequence which has a BLAT score greater than or equal to 30 is discarded from probe selection.
  • Any genome-wide homology search program, such as BLAST from NCBI, can be substituted for BLAT and the same parameters used (acceptable score < 30 or between 1-30, preferably less than 25 (or between 1-25), even more preferably less than 20 (or between 1-20), still more preferably less than 15 (or between 1-15), even more preferably less than 10 (or between 1-10), still more preferably less than 8 (or between 1-8), even more preferably less than 6 (or between 1-6), still more preferably less than 5 (or between 1-5), even more preferably less than 4 (or between 1-4), still more preferably less than 3 (or between 1-3), even more preferably less than 2 (or between 1-2), and most preferably 1).
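The homology filter can be sketched from BLAT's PSL output. The score below follows the formula the UCSC browser uses for nucleotide PSL hits (matches plus half the repeat matches, minus mismatches and the number of gap openings in query and target). Skipping each segment's own alignment, i.e., any hit overlapping the segment's known locus, is an assumption about how the self-match is handled; the threshold of 30 is the one given above, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def psl_score(fields: List[str]) -> int:
    """UCSC-style score for a nucleotide PSL row (columns are 0-indexed)."""
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    return matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert


def segments_failing_homology(psl_lines: Iterable[str],
                              origins: Dict[str, Tuple[str, int, int]],
                              max_score: int = 30) -> Set[str]:
    """Return query names with any off-target hit scoring >= max_score.

    origins maps each query name to its own (chrom, start, end) locus in the
    same 0-based, half-open coordinates that PSL uses for tStart/tEnd.
    """
    failing = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header and separator lines
        q_name, t_name = fields[9], fields[13]
        t_start, t_end = int(fields[15]), int(fields[16])
        chrom, start, end = origins.get(q_name, ("", -1, -1))
        is_self = t_name == chrom and t_start < end and t_end > start
        if not is_self and psl_score(fields) >= max_score:
            failing.add(q_name)
    return failing
```

Segments whose names appear in the returned set would be discarded from probe selection; the remaining segments go on to secondary structure analysis.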
  • the remaining sequence that is repeat-free and has little to no homology elsewhere in the genome is then examined for potential secondary structure (i.e. bulges, loops, or stems) which could render the probe suboptimal for genomic hybridization experiments (step 4).
  • the preferred UGSH method utilizes the Mfold program and uses strictly defined parameters (ΔG ⁇ 50, ΔH ⁇ -1000, ΔS ⁇ -3500, Tm ⁇ 60 °C).
  • the remaining sequences are used for PCR primer design if PCR probes are desired (step 5).
  • the UGSH method employs the Primer3 program (Rozen et al., 2000) to design primers at least 15 bases in length.
  • these primers can range in length from 15-100 bases; for array-based and QMH applications, these primers can range from 15-70, and more preferably from 25-70 bases in length.
  • One particularly preferred length for FISH applications is 22 bases in length.
  • the product size will be equal to or slightly less than the input sequence size.
  • specifically, the product size will be equal to the input sequence size or 0 to 200 bases less than it; however, any conventional primer selection program could be substituted, and longer input sequences could have product sizes more than 200 bases less than the input sequence size.
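The primer-length and product-size preferences above can be collected into a small settings helper. The application labels and the helper name are hypothetical; the ranges are the ones stated above (15-100 bases for FISH, the more preferred 25-70 bases for array-based and QMH applications, and a product 0-200 bases shorter than the input). The lower bound of twice the minimum primer length is an added assumption, since a product must at least contain both primers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSettings:
    min_primer_len: int
    max_primer_len: int
    min_product_len: int
    max_product_len: int


# Primer length ranges (bases) from the text; the dictionary keys are hypothetical.
PRIMER_LENGTHS = {
    "fish": (15, 100),
    "array": (25, 70),  # array-based applications
    "qmh": (25, 70),
}


def primer_settings(application: str, template_len: int,
                    max_shortfall: int = 200) -> PrimerSettings:
    """Product size: equal to the template or up to max_shortfall bases shorter."""
    low, high = PRIMER_LENGTHS[application.lower()]
    # A product must at least contain both primers (assumed lower bound).
    min_product = max(template_len - max_shortfall, 2 * low)
    return PrimerSettings(low, high, min_product, template_len)
```

For a 1,500-base FISH segment this yields primers of 15-100 bases and products of 1,300-1,500 bases.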
  • Primers are then BLAT searched using the UCSC BLAT program (step 6) to ensure that there is no homologous sequence elsewhere in the genome. Any primer which has more than one genomic match is discarded.
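The primer check in step 6 reduces to counting genomic matches per primer and keeping those with exactly one, the primer's own locus. In this sketch the hits are assumed to arrive as one (primer name, locus) pair per reported alignment from BLAT or BLAST; the function name is hypothetical. Primers with no match at all are also dropped, since a valid primer should at least match its own site.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def unique_primers(hits: Iterable[Tuple[str, str]],
                   primer_names: Iterable[str]) -> List[str]:
    """Keep primers with exactly one genomic match.

    hits holds one (primer_name, genomic_locus) pair per reported match.
    """
    counts = Counter(name for name, _locus in hits)
    return [name for name in primer_names if counts[name] == 1]
```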
  • the PCR primer design step and PCR primer homology search step can be omitted if hybridization oligonucleotides are desired instead of PCR probes, and the repeat-free sequences with no homologous genome matches from step 4 can be used as hybridization probes.
  • After completing all processes, UGSH displays the unique sequences sorted by size, as well as the primer sequences, if desired (step 7). This is a summary of the processes run in the UGSH method; steps 2 through 7 are typically performed automatically by the UGSH program and are not apparent to the user.
  • FIG. 1 is a screen capture of the UGSH input page provided through a web-based interface.
  • a user enters a job title, the minimum size for probe selection, and the number of bases to be displayed per line.
  • the user then either enters the sequence of interest in FASTA format into the sequence box or uses the browse button to upload it as a Genbank-format file from NCBI.
  • the number of primers to be returned is typically set at 25 as a default parameter, but can be changed by the user.
  • the minimum PCR product size for probes can be changed by the user as well.
  • FIG. 2A is a screen shot of a UGSH output page displaying unique sequence regions by position in input sequence.
  • when a Genbank sequence file is uploaded to the UGSH program, the Source field lists the definition of the file, the accession number of the sequence, the version of the sequence (if applicable), and the GI number for the sequence, all as determined by Genbank.
  • the title of the job, as specified by the user, is displayed as well as the total length of the sequence input by the user.
  • the minimum size allowed for unique sequence probe selection, as specified in the input screen, is shown.
  • the locations of the unique sequence regions are displayed (e.g.
  • FIG. 2B is a screen capture of an example Primer Selection Output screen from the UGSH program displaying the number of sequences for each unique sequence region.
  • the sequences are named seq1.primer, seq2.primer, etc., and the size of each unique sequence region used for primer design is shown in parentheses.
  • the file containing the actual 25 primer sequences, or the number specified by the user in the input screen, is displayed when the text file is opened (FIG. 2C).
  • FIG. 2C is a screen capture of an example primer sequence file from UGSH displayed in FASTA format.
  • the primer sequence file is displayed.
  • "PL" indicates the left primer of the unique sequence region and "PR" refers to the right primer.
  • "PF", for full probe, displays in parentheses the starting position of the left primer, the length of the left primer, the starting position of the right primer, and the length of the right primer in relation to the input sequence. The region encompassed by and including the primers is shown beneath that. Each subsequent primer is shown and numbered 0 to n, where n is the number of primers to be shown as specified by the user on the UGSH input screen.
  • the graphical interface (FIG. 1) is used for sequence entry (step 1 or step 1a).
  • FIG. 7 outlines the following procedure: given a patient sequence or sequences (input), if the sequence or sequences are already annotated (i.e., locations of repeat sequences are known), then candidate unique sequences are directly generated (see FIG. 9); otherwise, the repeat locations are determined first and the program then proceeds to the next step.
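The PL, PR, and PF records of the primer sequence file described above can be reconstructed from Primer3-style coordinates. This sketch assumes Primer3's convention that a left primer is reported as (5' start, length) and a right primer as (position of its 5' end on the template, length), both 0-based here, so the right primer's sequence is the reverse complement of the template bases ending at that position. The header layout is a guess at the format described above, not a copy of UGSH output, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_records(template: str, index: int, left_start: int, left_len: int,
                   right_pos: int, right_len: int) -> str:
    """Return FASTA-style PL, PR and PF records for one primer pair.

    left_start: 0-based 5' position of the left primer.
    right_pos: 0-based position of the right primer's 5' end (its rightmost
    base on the template), as in Primer3's right-primer output.
    """
    left = template[left_start:left_start + left_len]
    right = reverse_complement(template[right_pos - right_len + 1:right_pos + 1])
    probe = template[left_start:right_pos + 1]  # region including both primers
    return "\n".join([
        f">PL{index}", left,
        f">PR{index}", right,
        f">PF{index} ({left_start},{left_len},{right_pos},{right_len})", probe,
    ])
```

Reporting the right primer by its 5' end keeps both primers written 5' to 3', which is the form in which they would be ordered for synthesis.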
  • the generated candidate sequences are stored in FASTA file format and are run with BLAST or BLAT (default settings) which singles out all those segments that do not satisfy user, third party, or default criteria.
  • the remaining sequences are passed through the Mfold program from which the output sequences are sent to be processed by the Primer3 program.
  • the Primer3 program generates probes. The probes are verified by re-running the BLAT or BLAST program.
  • Each step has filtering thresholds that are detailed elsewhere in this application.
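The FIG. 7 control flow can be sketched as a chain of filters. Here each external program (repeat finder, BLAT/BLAST, Mfold, Primer3) is represented by a plain Python callable supplied by the caller; the `Stages` container, the stage names, and their signatures are hypothetical stand-ins for wrappers around those programs, not a description of how UGSH invokes them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open repeat interval
Primers = Tuple[str, str]   # (left primer, right primer)


@dataclass
class Stages:
    find_repeats: Callable[[str], List[Interval]]          # e.g. RepeatMasker
    candidates: Callable[[str, List[Interval]], List[str]]  # repeat-free segments
    is_unique: Callable[[str], bool]                        # e.g. BLAT/BLAST
    is_thermo_ok: Callable[[str], bool]                     # e.g. Mfold
    design: Callable[[str], Optional[Primers]]              # e.g. Primer3


def run_pipeline(seq: str, stages: Stages,
                 repeats: Optional[List[Interval]] = None
                 ) -> Dict[str, Optional[Primers]]:
    """Map each surviving segment to its verified primer pair, or None."""
    if repeats is None:  # sequence not annotated: locate repeats first
        repeats = stages.find_repeats(seq)
    results: Dict[str, Optional[Primers]] = {}
    for segment in stages.candidates(seq, repeats):
        if not stages.is_unique(segment) or not stages.is_thermo_ok(segment):
            continue
        primers = stages.design(segment)
        # Verify primers by re-running the homology search on each primer.
        if primers and all(stages.is_unique(p) for p in primers):
            results[segment] = primers
        else:
            results[segment] = None  # no verified primer pair for this segment
    return results
```

Keeping segments that lack verified primers (mapped to `None`) mirrors the option of using repeat-free, unique segments directly as hybridization oligonucleotides.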
  • a patient sequence is often retrieved from the NCBI database and thus it is marked with the annotated features (i.e. repeat locations etc.), see FIG. 8.
  • a publicly available repeat finder program such as RepeatMasker or Dust, etc., is used to determine known repetitive sequences within the patient sequence.
  • the output provided by such programs comprises a listing of all the repeat sequences and locations, typically in FASTA format.
  • the candidate sequences are generated by removing all the repeats and extracting all the remaining sequences with a size of interest.
  • the output sequences are stored in a formatted file that is consistent with the next program (i.e. FASTA format).
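Generating candidates from annotated repeat locations amounts to taking the complement of the repeat intervals and keeping the gaps of a size of interest. The sketch below assumes repeats are given as 0-based, half-open (start, end) intervals that may overlap, and writes the survivors as FASTA records for the next program. The `seq1`, `seq2`, ... naming echoes the output naming described earlier but is otherwise an assumption, as are the function names.

```python
from typing import Iterable, List, Tuple


def candidate_segments(seq_len: int, repeats: Iterable[Tuple[int, int]],
                       min_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) gaps between repeats that are >= min_size long."""
    gaps, cursor = [], 0
    for start, end in sorted(repeats):
        if start > cursor and start - cursor >= min_size:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping repeats merge into one block
    if seq_len > cursor and seq_len - cursor >= min_size:
        gaps.append((cursor, seq_len))
    return gaps


def to_fasta(seq: str, gaps: List[Tuple[int, int]], width: int = 60) -> str:
    """Format each gap as a FASTA record wrapped at `width` bases per line."""
    records = []
    for i, (start, end) in enumerate(gaps, 1):
        body = seq[start:end]
        lines = [body[j:j + width] for j in range(0, len(body), width)]
        records.append(f">seq{i} {start + 1}-{end}\n" + "\n".join(lines))
    return "\n".join(records) + "\n"
```

Sorting the intervals first lets a single pass handle overlapping or nested repeat annotations without a separate merge step.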
  • An exemplary embodiment of the UGSH program is presented in pseudocode herein. As presented, the program is organized into modules that interact with one another, and with other programs and data available on the Internet, as the program is used. It is understood that the methods herein are preferably performed by a processor or program within a computer.
  • Main control function: Create Web User Interface ⁇ Parameters
  • Run a repeat-finding program e.g. RepeatMasker Extract repeat features
  • Input unique candidate sequences from BLAT/BLAST
  • Output thermodynamically stable sequences in a file
  • Optional pass one or more variables calculated by Mfold pertaining to sequence thermodynamics/folding structure to UGSH for presentation to user in UGSH GUI window and/or local storage in data file
  • Ionic conditions, i.e., molarity of Na+ and Mg++
  • PRIMER_MAX_TM 63.0
  • PRIMER_MAX_DIFF_TM 100.0
  • PRIMER_SALT_CONC 50.0
  • PRIMER_DNA_CONC 50.0
  • PRIMER_MIN_QUALITY 0
  • PRIMER_MIN_END_QUALITY 0
  • PRIMER_QUALITY_RANGE_MIN 0
  • PRIMER_INTERNAL_OLIGO_MIN_GC 20.0
  • PRIMER_INTERNAL_OLIGO_MAX_GC 80.0
  • PRIMER_IO_WT_SIZE_LT 1.0
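Primer3 settings like those listed above are supplied to the `primer3_core` executable as a Boulder-IO record: one `TAG=VALUE` line per setting, terminated by a line containing only `=`. The sketch below writes such a record. Tag names differ between Primer3 releases (for example `SEQUENCE_TEMPLATE` in 2.x versus `SEQUENCE` in 1.x), and several listed tags, such as `PRIMER_SALT_CONC`, are older names that newer releases may not accept, so the tag set here and the product-size range are assumptions to check against the installed version.

```python
from typing import Mapping, Union

Value = Union[str, int, float]

# Settings copied from the listing above (values as given there).
UGSH_PRIMER3_SETTINGS = {
    "PRIMER_MAX_TM": 63.0,
    "PRIMER_MAX_DIFF_TM": 100.0,
    "PRIMER_SALT_CONC": 50.0,
    "PRIMER_DNA_CONC": 50.0,
    "PRIMER_MIN_QUALITY": 0,
    "PRIMER_MIN_END_QUALITY": 0,
    "PRIMER_QUALITY_RANGE_MIN": 0,
    "PRIMER_INTERNAL_OLIGO_MIN_GC": 20.0,
    "PRIMER_INTERNAL_OLIGO_MAX_GC": 80.0,
    "PRIMER_IO_WT_SIZE_LT": 1.0,
}


def boulder_record(seq_id: str, template: str, settings: Mapping[str, Value],
                   num_return: int = 25, max_shortfall: int = 200) -> str:
    """Build one Boulder-IO record (sequence tags assume Primer3 2.x names)."""
    low = max(len(template) - max_shortfall, 1)
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "PRIMER_NUM_RETURN": num_return,  # 25 is the default stated earlier
        "PRIMER_PRODUCT_SIZE_RANGE": f"{low}-{len(template)}",
        **settings,
    }
    return "".join(f"{tag}={value}\n" for tag, value in tags.items()) + "=\n"
```

Several such records can be concatenated into one input stream, one per unique sequence region, so that a single `primer3_core` run designs primers for all regions.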
  • "Nucleic acid" and "nucleic acids" herein generally refer to large, chain-like molecules that contain phosphate groups, sugar groups, and purine and pyrimidine bases.
  • Two general types are ribonucleic acid (RNA) and deoxyribonucleic acid (DNA).
  • the terms are inclusive of hybrids of DNA and RNA (DNA/RNA) and ribosomal DNA (rDNA).
  • the bases naturally involved are adenine, guanine, cytosine, and thymine (uracil in RNA).
  • Artificial bases, e.g., inosine, also exist and may be substituted to create a nucleic acid probe. The skilled artisan will be familiar with these artificial bases and their utility.
  • "Low copy nucleic acid segments" and "low copy segments" are synonymous terms referring to nucleic acid sequences of varying length that are "unique," i.e., non-repetitive, nearly unique, or so infrequent in a normal chromosome or genome as not to be classified as repetitive by the skilled artisan.
  • Repetitive DNA refers to DNA sequences that are repeated in the genome.
  • highly repetitive DNA consists of short sequences, 5-100 nucleotides, repeated thousands of times in a single stretch and includes satellite DNA.
  • moderately repetitive DNA consists of longer sequences, about 150-300 nucleotides, dispersed evenly throughout the genome, and includes what are called Alu sequences and transposons.
  • Hybridization generally refers to the pairing (tight physical bonding) of two complementary single strands of RNA and/or DNA to give a double-stranded molecule.
  • Hybridization techniques are inclusive of both solid support technologies, such as microarrays, Southern blot analysis, and quantitative microsphere hybridization, that separate the target nucleic acids from their biological structure, and of cell- or chromosome-based technologies that do not separate the target nucleic acid from its biological structure, e.g., a cell, tissue, cell nucleus, chromosome, or other morphologically recognizable structure.
  • PCR means polymerase chain reaction
  • Target DNA was prepared for hybridization by incorporation of biotin-16-dUTP using whole genome amplification for two different DiGeorge patient genomic DNA samples as well as one normal control sample. Biotinylated genomic DNA was sheared to an average size of 1 kb, and the DiGeorge probe and HOXB1 probe were hybridized in a multiplex reaction. Samples were analyzed by dual-laser flow cytometry (Luminex), and the mean fluorescence intensity (MFI) ratios for each probe were obtained. Data for the DiGeorge patients (DG-1, DG-2) and the normal control sample are displayed below.
  • the MFI value for the HOXB1 probe was 123 and the MFI value for the DiGeorge probe was 65. This constitutes an MFI ratio of ~0.5, which indicates that the DiGeorge probe is present in only one copy, compared with the HOXB1 probe present in two copies; this reflects the actual genotype of the DiGeorge patient DNA.
  • This example illustrates that UGSH successfully identified unique sequence regions, since an MFI ratio greater than ~0.5 would indicate that the DiGeorge probe hybridized to other genomic regions and was thus not composed solely of unique sequence.
  • Examples of QMH probes not effectively designed specific to unique sequence regions yielded MFI ratios other than ~0.5 in patients with deleted genomic regions and were presented in Newkirk et al., 2006.
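The copy-number reading in this example is simple arithmetic: the test probe's MFI divided by the MFI of the two-copy HOXB1 reference, scaled by the reference copy number. The sketch below reproduces it with the values reported above (65 and 123); rounding to the nearest whole copy is an illustrative simplification, not a calling rule stated in the source, and the function names are hypothetical.

```python
def mfi_ratio(test_mfi: float, reference_mfi: float) -> float:
    """Ratio of test-probe to reference-probe mean fluorescence intensity."""
    if reference_mfi <= 0:
        raise ValueError("reference MFI must be positive")
    return test_mfi / reference_mfi


def estimated_copies(test_mfi: float, reference_mfi: float,
                     reference_copies: int = 2) -> int:
    """Scale the MFI ratio by the reference copy number and round."""
    return round(reference_copies * mfi_ratio(test_mfi, reference_mfi))


if __name__ == "__main__":
    ratio = mfi_ratio(65, 123)          # DiGeorge probe vs. HOXB1
    print(f"{ratio:.2f}")               # 0.53
    print(estimated_copies(65, 123))    # 1
```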
  • Genomic sequence specific to BAC RP11-677F14 was uploaded into UGSH (FIG. 1 ), the program was executed, and unique sequence probes were displayed (FIG. 2).
  • One probe (chr7:115367602-115371201) and the corresponding primer sequences were selected from the UGSH output, and the primers were synthesized (Invitrogen).
  • the specific genomic region was amplified by PCR (Promega). Standard methods for direct probe labeling (Mirus, Inc.) were used and the probe was hybridized to normal human control chromosomes (metaphase and interphase) using FISH.
  • the single unique sequence probe produced very bright and distinct hybridization signals (FIG. 3) indicating no cross-hybridization to other genomic regions, thus verifying its unique sequence design.
  • FIG. 3 is a photograph taken from a FISH experiment using a unique sequence probe from BAC RP11-677F14 on chromosome 7 designed using the UGSH method.
  • FIG. 4 illustrates results obtained from using five unique sequence probes specific to chromosome 3, which were designed using the UGSH method. Each probe was PCR amplified and direct labeled (red; Mirus, Inc.), then combined and co-hybridized with a control probe (Cen7, green; Vysis) onto normal human metaphase chromosomes. The signal intensity in this FISH experiment was much greater for the unique sequence probe cocktail than for the single unique sequence probe (FIG. 3), and the cocktail exhibited very little background fluorescence, allowing for faster and easier localization.
  • further analysis of the ABL1 probe sequence itself revealed that 61.98% of the probe sequence was composed of repetitive elements, including Alu, LINE1, and LINE2. Because these elements are slightly divergent from the ancestral repetitive sequence for each element, repeat masking was not sufficient to identify these sequences.
  • FIGs. 3 and 4 are compared with FIG. 5.
  • FIG. 5 is a photograph taken from a FISH experiment using a probe not designed using the UGSH method, but a method presented in the '097 and '997 patents.
  • Repeats in a DNA sequence specific to chromosome 9 were masked by homology searches with well known repeat families and classes (the '097 and '997 patents) and primers were designed to one resulting "single copy" region.
  • Results from the FISH experiment show hybridization of the probe (red) to numerous chromosomal locations indicating this sequence is homologous to more than one chromosomal region and thus not composed of purely unique sequence.
  • a control probe specific to the centromere of chromosome 9 (CEP9, Vysis) was co-hybridized during the FISH experiment.
  • UGSH genomic hybridization experiment
  • UGSH can identify unique sequence probes (60-70 bases) for microarray and arrayCGH experiments. Primer sequences would not be necessary for these applications due to the short length of the probes; however, UGSH would display the necessary unique sequence regions.
  • Other applications for the UGSH method include but are not limited to Southern and Northern blot analysis, in situ hybridization, multiplex ligation-dependent probe amplification (MLPA), and multiplex amplifiable probe hybridization (MAPH).
  • This Example provides a number of probes that were developed using the methods of the present invention.
  • Each of the probes can be used individually, or in combination with at least one other probe in order to assess the risk of uterine cervical cancer.
  • risk of developing uterine cervical cancer is reduced as the sequence of interest is known to be present.
  • the sequence of interest is deleted, or has mutated to a point that prevents hybridization.
  • A single probe selected from the group consisting of SEQ ID NOs. 1-31 is used in the hybridization assay.
  • The method will include two or more probes selected from the group consisting of SEQ ID NOs. 1-25 or SEQ ID NOs. 26-31.
  • The probes of SEQ ID NOs. 1-25 are from chromosome 3 (3q26), and the probes of SEQ ID NOs. 26-31 are from chromosome 7.
  • Probe cocktails containing a plurality of probes are used.
  • The hybridization (or lack thereof) of any one probe provides information about the intactness of the target sequence, or its variation relative to a sequence without variation, all of which may aid in detecting uterine cervical cancer and in assessing an individual's risk of it.
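The region assignments and the present/absent reading of each probe described above can be captured in a small lookup. The sketch below is hypothetical: the names `REGIONS` and `summarize_panel` are illustrative, and the input format (a mapping from SEQ ID NO. to a hybridized/not-hybridized call) is an assumption. It only sorts probe calls by region and does not attempt a risk score, which the text does not define.

```python
from typing import Dict, List

# Region assignments as stated in the text.
REGIONS: Dict[str, range] = {
    "3q26": range(1, 26),           # SEQ ID NOs. 1-25
    "chromosome 7": range(26, 32),  # SEQ ID NOs. 26-31
}


def summarize_panel(calls: Dict[int, bool]) -> Dict[str, Dict[str, List[int]]]:
    """Group hybridization calls by region.

    True  -> probe hybridized: sequence of interest present.
    False -> no hybridization: sequence deleted or mutated enough to block binding.
    Probes missing from `calls` (not tested) are left out of both lists.
    """
    known = set().union(*REGIONS.values())
    unknown = set(calls) - known
    if unknown:
        raise ValueError(f"not a SEQ ID NO. 1-31 probe: {sorted(unknown)}")
    summary: Dict[str, Dict[str, List[int]]] = {}
    for region, ids in REGIONS.items():
        summary[region] = {
            "present": [i for i in ids if calls.get(i) is True],
            "deleted_or_mutated": [i for i in ids if calls.get(i) is False],
        }
    return summary


print(summarize_panel({1: True, 2: False, 27: True}))
```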
  • SEQ ID NOs. 32-43 also relate to genetic markers for uterine cervical cancer.
  • SEQ ID NOs. 33 and 34 are the forward and reverse primers, respectively, for SEQ ID NO. 32; SEQ ID NOs. 36 and 37 are the forward and reverse primers, respectively, for SEQ ID NO. 35; SEQ ID NOs. 39 and 40 are the forward and reverse primers, respectively, for SEQ ID NO. 38; and SEQ ID NOs. 42 and 43 are the forward and reverse primers, respectively, for SEQ ID NO. 41.
  • The probes of SEQ ID NOs. 32, 35, 38, and 41 may be used individually, in combination with one another, or in combination with any of SEQ ID NOs. 1-31.
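The probe-to-primer assignments listed above form a simple lookup table. A minimal sketch follows; `PRIMER_PAIRS` and `primers_for` are hypothetical names, and the table holds SEQ ID NOs. only, since the sequences themselves are not reproduced in this section. The listing follows a regular pattern: for each probe SEQ ID NO. n, the forward and reverse primers are n + 1 and n + 2.

```python
from typing import Dict, Tuple

# Probe SEQ ID NO. -> (forward primer SEQ ID NO., reverse primer SEQ ID NO.)
PRIMER_PAIRS: Dict[int, Tuple[int, int]] = {
    32: (33, 34),
    35: (36, 37),
    38: (39, 40),
    41: (42, 43),
}


def primers_for(probe_id: int) -> Tuple[int, int]:
    """Return the (forward, reverse) primer SEQ ID NOs. for a probe."""
    if probe_id not in PRIMER_PAIRS:
        raise KeyError(f"no primer pair listed for SEQ ID NO. {probe_id}")
    return PRIMER_PAIRS[probe_id]


forward, reverse = primers_for(35)
print(f"SEQ ID NO. 35: forward {forward}, reverse {reverse}")
# prints "SEQ ID NO. 35: forward 36, reverse 37"
```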
  • Table 2 provides a listing of coordinates for each of these probes (according to the March 2006 UCSC Genome Build).
  • Probes developed in accordance with the present invention are particularly well suited for use in quantum microsphere hybridization assays.
  • Preferred probes include those provided herein as SEQ ID NOs. 44-57. Each one of these probes is used individually to detect the presence of the pathogen from which it is derived.
  • SEQ ID NO. 44 is from the Mycoplasma FRX A gene (genus-specific). Specifically, hybridization of SEQ ID NO. 45 indicates the presence of M. fermentans, hybridization of SEQ ID NO. 46 indicates the presence of M. mollicutes, hybridization of SEQ ID NO. 47 indicates the presence of M. hominis, hybridization of SEQ ID NO. 48 indicates the presence of M. hyorhinis, hybridization of SEQ ID NO.
  • hybridization of SEQ ID NO. 50 indicates the presence of M. orale
  • hybridization of SEQ ID NO. 51 indicates the presence of Acholeplasma laidlawii
  • hybridization of SEQ ID NO. 52 indicates the presence of M. salivarium
  • hybridization of SEQ ID NO. 53 indicates the presence of M. pulmonis
  • hybridization of SEQ ID NO. 54 indicates the presence of M. pneumoniae
  • hybridization of SEQ ID NO. 55 indicates the presence of M. pirum
  • hybridization of SEQ ID NO. 56 indicates the presence of M. capricolum
  • hybridization of SEQ ID NO. 57 indicates the presence of Helicobacter pylori.
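Because each of SEQ ID NOs. 44-57 is read individually, interpreting such a panel reduces to a lookup from hybridized probe to organism. The sketch below is hypothetical: `PATHOGEN_PROBES` and `detected_organisms` are illustrative names, and the input is assumed to be the set of SEQ ID NOs. that gave a positive signal. SEQ ID NO. 49 is left out because its organism is cut off in the listing above, and SEQ ID NO. 44 is treated as a genus-level Mycoplasma signal, as described.

```python
from typing import Dict, Iterable, List

# SEQ ID NO. -> organism, as listed in the text (SEQ ID NO. 49 omitted: not given).
PATHOGEN_PROBES: Dict[int, str] = {
    44: "Mycoplasma (genus-level, FRX A gene)",
    45: "M. fermentans",
    46: "M. mollicutes",
    47: "M. hominis",
    48: "M. hyorhinis",
    50: "M. orale",
    51: "Acholeplasma laidlawii",
    52: "M. salivarium",
    53: "M. pulmonis",
    54: "M. pneumoniae",
    55: "M. pirum",
    56: "M. capricolum",
    57: "Helicobacter pylori",
}


def detected_organisms(hybridized_ids: Iterable[int]) -> List[str]:
    """Map positive probe SEQ ID NOs. to organisms, in ascending SEQ ID order."""
    found: List[str] = []
    for seq_id in sorted(set(hybridized_ids)):
        organism = PATHOGEN_PROBES.get(seq_id)
        # Flag probes that are not in the table instead of silently dropping them.
        found.append(organism if organism is not None else f"unmapped SEQ ID NO. {seq_id}")
    return found


print(detected_organisms([57, 44, 47]))
# ['Mycoplasma (genus-level, FRX A gene)', 'M. hominis', 'Helicobacter pylori']
```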
  • The compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the following claims.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
PCT/US2008/058795 2007-03-28 2008-03-28 Method for identifying and selecting low copy nucleic acid segments Ceased WO2008119084A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2010501272A JP2010522571A (ja) 2007-03-28 2008-03-28 Method for identifying and selecting low-copy-number nucleic acid segments
EP08744701A EP2129800A4 (en) 2007-03-28 2008-03-28 PROCESS FOR IDENTIFICATION AND SELECTION OF LOW COPY NUCLEIC ACID SEGMENTS

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US90860607P 2007-03-28 2007-03-28
US60/908,606 2007-03-28
US94032107P 2007-05-25 2007-05-25
US60/940,321 2007-05-25

Publications (1)

Publication Number Publication Date
WO2008119084A1 (en) 2008-10-02

Family

ID=39789071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/058795 Ceased WO2008119084A1 (en) 2007-03-28 2008-03-28 Method for identifying and selecting low copy nucleic acid segments

Country Status (4)

Country Link
US (1) US20080274558A1
EP (1) EP2129800A4
JP (1) JP2010522571A
WO (1) WO2008119084A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010064893A1 (en) * 2008-12-04 2010-06-10 Keygene N.V. Method for the reduction of repetitive sequences in adapter-ligated restriction fragments
JP2013500027A (ja) * 2009-07-30 2013-01-07 F. Hoffmann-La Roche AG Set of oligonucleotide probes, and methods and uses related thereto

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
JP5838169B2 (ja) * 2009-12-31 2016-01-06 Ventana Medical Systems, Inc. Method for generating uniquely specific nucleic acid probes
WO2016069539A1 (en) * 2014-10-27 2016-05-06 Helix Nanotechnologies, Inc. Systems and methods of screening with a molecule recorder
CN118038980B (zh) * 2024-01-19 2024-10-25 成都基因汇科技有限公司 Method and device for designing probe sequences for identifying a target gene

Citations (5)

Publication number Priority date Publication date Assignee Title
US20030104470A1 (en) * 2001-08-14 2003-06-05 Third Wave Technologies, Inc. Electronic medical record, library of electronic medical records having polymorphism data, and computer systems and methods for use thereof
US20030108919A1 (en) * 2001-09-05 2003-06-12 Perlegen Sciences, Inc. Methods for amplification of nucleic acids
US20060110744A1 (en) * 2004-11-23 2006-05-25 Sampas Nicolas M Probe design methods and microarrays for comparative genomic hybridization and location analysis
US20060141501A1 (en) * 1999-07-16 2006-06-29 Rosetta Inpharmatics Llc Iterative probe design and detailed expression profiling with flexible in-situ synthesis arrays
US20070059743A1 (en) * 2005-08-17 2007-03-15 Biosigma S.A. Method for the design of oligonucleotides for molecular biology techniques

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20050239737A1 (en) * 1998-05-12 2005-10-27 Isis Pharmaceuticals, Inc. Identification of molecular interaction sites in RNA for novel drug discovery
EP1072679A3 (en) * 1999-07-20 2002-07-31 Agilent Technologies, Inc. (a Delaware corporation) Method of producing nucleic acid molecules with reduced secondary structure
JP2004523201A (ja) * 2000-05-16 2004-08-05 The Childrens Mercy Hospital Single copy genomic hybridization probes and method of producing the same
US6828097B1 (en) * 2000-05-16 2004-12-07 The Childrens Mercy Hospital Single copy genomic hybridization probes and method of generating same
JP2003052385A (ja) * 2001-06-04 2003-02-25 Hitachi Ltd Probe sequence determination system for DNA arrays

Non-Patent Citations (1)

Title
See also references of EP2129800A4 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
WO2010064893A1 (en) * 2008-12-04 2010-06-10 Keygene N.V. Method for the reduction of repetitive sequences in adapter-ligated restriction fragments
JP2013500027A (ja) * 2009-07-30 2013-01-07 F. Hoffmann-La Roche AG Set of oligonucleotide probes, and methods and uses related thereto
US9347091B2 (en) 2009-07-30 2016-05-24 Roche Molecular Systems, Inc. Set of oligonucleotide probes as well as methods and uses thereto
US10640815B2 (en) 2009-07-30 2020-05-05 Roche Molecular Systems, Inc. Set of oligonucleotide probes as well as methods and uses thereto
US11421266B2 (en) 2009-07-30 2022-08-23 Roche Molecular Systems, Inc. Set of oligonucleotide probes as well as methods and uses thereto

Also Published As

Publication number Publication date
EP2129800A4 (en) 2010-08-04
US20080274558A1 (en) 2008-11-06
JP2010522571A (ja) 2010-07-08
EP2129800A1 (en) 2009-12-09

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08744701

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2010501272

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 2008744701

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE