WO2014145503A2 - Sequence alignment using divide and conquer maximum oligonucleotide mapping (DCMOM), apparatus, system and method related thereto
- Publication number
- WO2014145503A2 (application PCT/US2014/030288)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- reference sequence
- elements
- query sequence
- query
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
Definitions
- the determining further comprises recursively repeating steps a-d with additional query sequences until the remaining unmatched portion of the first query sequence is smaller than 3 elements.
- FIG. 3 depicts the example of FIG. 2 farther along in the recursive mapping process.
- FIG. 10 depicts DCMOM's detection of insertions corresponding to the location of known variations.
- the reference sequence may be divided into three subsequences: the Matched Reference Sequence 14.450 corresponding to the area of the reference sequence matched by the Maximally Extended Match 14.420, and those portions to the sides of the Matched Reference Sequence, the Left-of-Match Reference Sequence 14.440 and the Right-of-Match Reference Sequence 14.460.
- the Left-of-Match and Right-of-Match Reference Sequences may be used in a recursive instance of the method as the reference sequence.
- An IMT is the output data from an instance of DCMOM, without complete node merging, represented visually as a tree-like structure.
- the Assembled Result 14.900 may represent a Final Junction Tree ("FJT"), if node merging has already been performed on the Right Side Result 14.630 and/or Left Side Result 14.730.
- the FJT is, in certain embodiments, a visual representation of the output data from an instance of DCMOM, similar to an IMT, but where some or all of the nodes have been merged together by classifying the joints between them.
- the Assembled Result 14.900 may in some embodiments be further processed 14.940 by the current instance of DCMOM, including by additional node merging. In some embodiments, further processing or node merging will not be performed on the Assembled Result 14.900 and it will simply be returned 14.950.
- the intron len defines the minimum number of elements that must be found between two adjacent mapped nodes for the joint between the two nodes to be classified as an intron.
- the intron len may, in some embodiments, be between about 2 and about 10,000 elements or nodes. In some embodiments, intron len may be the same as min frag len. Alternatively, intron len could be between about 3 and about 32 and, in certain embodiments, may be 4, 5, 6, 7, 8, 9, 12, 15, 20 or 30.
- the entire transcriptome of an organism can be aligned and assembled by the invention.
- the invention can detect and identify the presence of and location of introns and exons within the transcriptome of an organism, including the junctions between introns and exons, between introns and introns, and between exons and exons, as well as splice junctions, and can identify splicing and alternative splicing events.
- the systems and methods of the invention can also detect and identify known or predicted combinations of exons, and unexpected exon pairs that occur through exon skipping, cryptic splicing, gene fusions, or by any other means.
- Sequencing instruments for nucleic acids can include, for example, any of the high throughput sequencing machines from 454 Life Sciences/Roche, Illumina/Solexa, Applied Biosystems/Life Technologies (SOLiD), Helicos Biosciences, Complete Genomics, and Ion Torrent Systems (which generate "next generation sequencing" or "NGS" data), as well as the more traditional machines such as the Sanger sequencing machines.
- BEDTools was used to generate exon coverage statistics for reads aligned within the genomic regions annotated as exons in RefSeq.
- the breadth of coverage of exons is compared with other methods in the range of depth of coverage up to 100, as displayed in FIG. 12 and FIG. 13.
- DCMOM is superior to TopHat (without annotation).
- DCMOM's overall performance compares favorably to RUM, GSNAP or TopHat with annotation; however, DCMOM does not require the annotation that these other methods require.
- Not requiring annotation is advantageous at least because systems that require annotation are limited to identifying already discovered exons, junctions and variants.
- Because DCMOM, a system that does not use annotation, has performance comparable to that of systems that do use annotation, it is apparent that DCMOM provides a way to discover novel exons, junctions and variants, especially mini exons, not previously reported or marked in annotations.
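The definitions above say that intron len sets the minimum number of elements between two adjacent mapped nodes for their joint to be classified as an intron. The following is a minimal sketch of how such a threshold could drive joint classification. The function name `classify_joint`, its parameters, and every rule other than the intron threshold are illustrative assumptions, not the classification rules of the patent.

```python
def classify_joint(query_gap: int, ref_gap: int, intron_len: int = 4) -> str:
    """Classify the joint between two adjacent mapped nodes (hypothetical rules).

    query_gap: unmatched elements between the two nodes in the query sequence.
    ref_gap:   unmatched elements between the two nodes in the reference sequence.
    intron_len: minimum reference gap for the joint to count as an intron
                (4 is one of the example values listed in the text).
    """
    if query_gap < 0 or ref_gap < 0:
        raise ValueError("gaps must be non-negative")
    if query_gap == 0 and ref_gap == 0:
        return "contiguous"  # assumption: nodes touch in both sequences
    if query_gap == 0:
        # Reference elements are skipped by the query: a long skip is treated
        # as an intron, a short one as a deletion (assumed rule).
        return "intron" if ref_gap >= intron_len else "deletion"
    if ref_gap == 0:
        return "insertion"  # extra query elements with no reference counterpart
    if query_gap == ref_gap:
        # Equal-length mismatch: one element is SNP-like, more is a substitution.
        return "SNP" if query_gap == 1 else "substitution"
    return "indel"  # unequal, non-zero gaps on both sides


# Example: 10 skipped reference elements and no skipped query elements.
print(classify_joint(query_gap=0, ref_gap=10))
```

Keeping the threshold as a parameter mirrors the text's statement that intron len may vary between embodiments.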
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
A novel apparatus, system and method for aligning sequences of arbitrary string data, including nucleotide sequences, are provided. A root node is selected from a query sequence and located in the reference sequence. The root node is extended to encompass adjacent elements that are present in both the query and reference sequences, to form an extended root node match. The area to the left and/or right of the extended root node match is then searched recursively using the same method to identify additional matches. When searching is complete, the joints between the identified nodes are classified according to their characteristics as, e.g., SNPs, deletions, substitutions, insertions, indels and/or introns. The apparatus may be included in a DNA sequencing machine, or it may be a stand-alone machine. Non-biological applications are also provided.
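The recursive procedure summarized in the abstract can be illustrated with a short sketch. It is a simplified reading of the description, not the patented implementation: the names `Node` and `dcmom_sketch`, the fixed `seed_len`, the middle-outward seed order, and the use of the first exact occurrence in the reference (`str.find`) are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    q_start: int  # start of the match in the query
    r_start: int  # start of the match in the reference
    length: int   # number of matched elements


def dcmom_sketch(query: str, ref: str, seed_len: int = 3,
                 q_off: int = 0, r_off: int = 0) -> List[Node]:
    """Return matched nodes ordered left to right (illustrative only)."""
    if len(query) < seed_len or len(ref) < seed_len:
        return []  # remaining portion is too small to seed a match
    # Assumption: try seeds from the middle of the query outward.
    mid = (len(query) - seed_len) // 2
    starts = sorted(range(len(query) - seed_len + 1),
                    key=lambda s: abs(s - mid))
    for qs in starts:
        rs = ref.find(query[qs:qs + seed_len])  # first occurrence only
        if rs < 0:
            continue
        qe, re_ = qs + seed_len, rs + seed_len
        # Extend the root node to the left while elements agree.
        while qs > 0 and rs > 0 and query[qs - 1] == ref[rs - 1]:
            qs -= 1
            rs -= 1
        # Extend the root node to the right while elements agree.
        while qe < len(query) and re_ < len(ref) and query[qe] == ref[re_]:
            qe += 1
            re_ += 1
        # Divide and conquer: recurse on what lies left and right of the match.
        left = dcmom_sketch(query[:qs], ref[:rs], seed_len, q_off, r_off)
        right = dcmom_sketch(query[qe:], ref[re_:], seed_len,
                             q_off + qe, r_off + re_)
        return left + [Node(q_off + qs, r_off + rs, qe - qs)] + right
    return []


# Two query segments separated by ten extra elements in the reference.
reference = "TTACGGATCCAAAAAAAAAAGGCTTAGCTT"
read = "ACGGATCCGGCTTAGC"
for node in dcmom_sketch(read, reference):
    print(node)
```

In this example the two returned nodes are adjacent in the query but 10 elements apart in the reference, which is the kind of joint a later classification step would examine. Taking the first occurrence of a seed is a shortcut that can misplace matches in repetitive references.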
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361791948P | 2013-03-15 | 2013-03-15 | |
US61/791,948 | 2013-03-15 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2014145503A2 (fr) | 2014-09-18 |
WO2014145503A3 (fr) | 2014-11-06 |
Family
ID=51538483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2014/030288 WO2014145503A2 (fr) | Sequence alignment using divide and conquer maximum oligonucleotide mapping (DCMOM), apparatus, system and method related thereto | 2013-03-15 | 2014-03-17 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014145503A2 (fr) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170116370A1 (en) * | 2015-10-21 | 2017-04-27 | Coherent Logix, Incorporated | DNA Alignment using a Hierarchical Inverted Index Table |
WO2018071054A1 (fr) * | 2016-10-11 | 2018-04-19 | Genomsys Sa | Method and system for selective access of stored or transmitted bioinformatics data |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
US11763918B2 (en) | 2016-10-11 | 2023-09-19 | Genomsys Sa | Method and apparatus for the access to bioinformatics data structured in access units |
US12071669B2 (en) | 2016-02-12 | 2024-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for detection of abnormal karyotypes |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090024618A1 (en) * | 2003-11-26 | 2009-01-22 | Wei Fan | System and method for indexing weighted-sequences in large databases |
US20090150313A1 (en) * | 2007-12-06 | 2009-06-11 | Andre Heilper | Vectorization of dynamic-time-warping computation using data reshaping |
US20120041977A1 (en) * | 2009-04-13 | 2012-02-16 | Hitachi, Ltd. | Pair character string retrieval system |
WO2012158621A1 (fr) * | 2011-05-13 | 2012-11-22 | Indiana University Research And Technology Corporation | Secure and scalable mapping of human sequencing reads on hybrid clouds |
- 2014-03-17: WO application PCT/US2014/030288 (WO2014145503A2), status: active, Application Filing
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
US11568957B2 (en) | 2015-05-18 | 2023-01-31 | Regeneron Pharmaceuticals Inc. | Methods and systems for copy number variant detection |
CN108140071A (zh) * | 2015-10-21 | 2018-06-08 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
US20170116370A1 (en) * | 2015-10-21 | 2017-04-27 | Coherent Logix, Incorporated | DNA Alignment using a Hierarchical Inverted Index Table |
JP2018535484A (ja) * | 2015-10-21 | 2018-11-29 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
CN108140071B (zh) * | 2015-10-21 | 2022-04-29 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
WO2017070514A1 (fr) * | 2015-10-21 | 2017-04-27 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
US11594301B2 (en) | 2015-10-21 | 2023-02-28 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
US12087403B2 (en) | 2015-10-21 | 2024-09-10 | Coherent Logix, Incorporated | DNA alignment using a hierarchical inverted index table |
US12071669B2 (en) | 2016-02-12 | 2024-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for detection of abnormal karyotypes |
WO2018071054A1 (fr) * | 2016-10-11 | 2018-04-19 | Genomsys Sa | Method and system for selective access of stored or transmitted bioinformatics data |
CN110603595A (zh) * | 2016-10-11 | 2019-12-20 | Genomsys Sa | Methods and systems for reconstructing genomic reference sequences from compressed genomic sequence reads |
US11404143B2 (en) | 2016-10-11 | 2022-08-02 | Genomsys Sa | Method and systems for the indexing of bioinformatics data |
CN110603595B (zh) * | 2016-10-11 | 2023-08-08 | Genomsys Sa | Methods and systems for reconstructing genomic reference sequences from compressed genomic sequence reads |
US11763918B2 (en) | 2016-10-11 | 2023-09-19 | Genomsys Sa | Method and apparatus for the access to bioinformatics data structured in access units |
Also Published As
Publication number | Publication date |
---|---|
WO2014145503A3 (fr) | 2014-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14765762; Country of ref document: EP; Kind code of ref document: A2 |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 14765762; Country of ref document: EP; Kind code of ref document: A2 |