BR112018007092B1 - DNA alignment using a hierarchical inverted index table - Google Patents

DNA alignment using a hierarchical inverted index table

Info

Publication number
BR112018007092B1
Authority
BR
Brazil
Prior art keywords
index table
entry
level
reference data
length
Prior art date
Application number
BR112018007092-0A
Other languages
English (en)
Portuguese (pt)
Other versions
BR112018007092A2 (pt)
Inventor
Michael B. Doerr
Jan D. Garmany
Stephen V. Wood
Daemon G. Arastas
Martin A. Hunt
Original Assignee
Coherent Logix, Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Coherent Logix, Incorporated
Publication of BR112018007092A2
Publication of BR112018007092B1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
BR112018007092-0A 2015-10-21 2016-10-21 DNA alignment using a hierarchical inverted index table BR112018007092B1 (pt)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562244541P 2015-10-21 2015-10-21
US62/244,541 2015-10-21
PCT/US2016/058183 WO2017070514A1 (en) 2015-10-21 2016-10-21 DNA alignment using a hierarchical inverted index table

Publications (2)

Publication Number Publication Date
BR112018007092A2 (pt) 2018-10-23
BR112018007092B1 (pt) 2024-02-20

Family

ID=58557902

Family Applications (1)

Application Number Title Priority Date Filing Date
BR112018007092-0A BR112018007092B1 (pt) 2015-10-21 2016-10-21 DNA alignment using a hierarchical inverted index table

Country Status (7)

Country Link
US (3) US11594301B2
EP (1) EP3365821B1
JP (1) JP6884143B2
KR (1) KR20180072684A
CN (2) CN108140071B
BR (1) BR112018007092B1
WO (1) WO2017070514A1

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8705623B2 (en) 2009-10-02 2014-04-22 Texas Instruments Incorporated Line-based compression for digital image data
US12125559B2 (en) * 2019-05-14 2024-10-22 Samsung Electronics Co., Ltd. Parallelizable sequence alignment systems and methods
CN112948446B (zh) * 2019-11-26 2024-08-16 北京京东振世信息技术有限公司 Method and device for matching product documents
CN111402959A (zh) * 2020-03-13 2020-07-10 苏州浪潮智能科技有限公司 Sequence alignment method, system, device, and readable storage medium
IL281960B2 (en) 2021-04-01 2025-12-01 Zimmerman Israel System and method for rapid statistical pattern discovery
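CN114329135B (zh) * 2021-12-08 2025-06-03 腾讯科技(深圳)有限公司 Offline index point sorting method, apparatus, device, and storage medium
CN116010427B (zh) * 2023-02-13 2025-11-14 长鑫存储技术有限公司 Number allocation method, apparatus, electronic device, and storage medium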

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08272824A (ja) * 1995-03-31 1996-10-18 Hitachi Software Eng Co Ltd Automatic gene sequence data search method
US20040153255A1 (en) * 2003-02-03 2004-08-05 Ahn Tae-Jin Apparatus and method for encoding DNA sequence, and computer readable medium
WO2005096208A1 (ja) * 2004-03-31 2005-10-13 Bio-Think Tank Co., Ltd. Base sequence search apparatus and base sequence search method
US7702640B1 (en) * 2005-12-29 2010-04-20 Amazon Technologies, Inc. Stratified unbalanced trees for indexing of data items within a computer system
WO2007137225A2 (en) * 2006-05-19 2007-11-29 The University Of Chicago Method for indexing nucleic acid sequences for computer based searching
US8271206B2 (en) * 2008-04-21 2012-09-18 Softgenetics Llc DNA sequence assembly methods of short reads
WO2010104608A2 (en) * 2009-03-13 2010-09-16 Life Technologies Corporation Computer implemented method for indexing reference genome
CN101984445B (zh) 2010-03-04 2012-03-14 深圳华大基因科技有限公司 Method and system for typing based on sequencing reads of polymerase chain reaction products
US20140163900A1 (en) * 2012-06-02 2014-06-12 Whitehead Institute For Biomedical Research Analyzing short tandem repeats from high throughput sequencing data for genetic applications
US9679104B2 (en) * 2013-01-17 2017-06-13 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
US9792405B2 (en) * 2013-01-17 2017-10-17 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
US10381106B2 (en) * 2013-01-28 2019-08-13 Hasso-Plattner-Institut Fuer Softwaresystemtechnik Gmbh Efficient genomic read alignment in an in-memory database
WO2014145503A2 (en) * 2013-03-15 2014-09-18 Lieber Institute For Brain Development Sequence alignment using divide and conquer maximum oligonucleotide mapping (dcmom), apparatus, system and method related thereto
US10191929B2 (en) * 2013-05-29 2019-01-29 Noblis, Inc. Systems and methods for SNP analysis and genome sequencing
CN103336916B (zh) * 2013-07-05 2016-04-06 中国科学院数学与系统科学研究院 Sequencing read mapping method and system
NL2011817C2 (en) * 2013-11-19 2015-05-26 Genalice B V A method of generating a reference index data structure and method for finding a position of a data pattern in a reference data structure.
WO2015127058A1 (en) * 2014-02-19 2015-08-27 Hospodor Andrew Efficient encoding and storage and retrieval of genomic data
NL2013120B1 (en) * 2014-07-03 2016-09-20 Genalice B V A method for finding associated positions of bases of a read on a reference genome.

Also Published As

Publication number Publication date
US20250174304A1 (en) 2025-05-29
CN108140071B (zh) 2022-04-29
US11594301B2 (en) 2023-02-28
EP3365821A4 (en) 2019-06-26
EP3365821A1 (en) 2018-08-29
CN108140071A (zh) 2018-06-08
US20170116370A1 (en) 2017-04-27
JP2018535484A (ja) 2018-11-29
CN114783523A (zh) 2022-07-22
BR112018007092A2 (pt) 2018-10-23
JP6884143B2 (ja) 2021-06-09
EP3365821B1 (en) 2022-06-29
US12087403B2 (en) 2024-09-10
WO2017070514A1 (en) 2017-04-27
KR20180072684A (ko) 2018-06-29
US20240203527A1 (en) 2024-06-20

Similar Documents

Publication Publication Date Title
US12087403B2 (en) DNA alignment using a hierarchical inverted index table
Hoffmann et al. Fast mapping of short sequences with mismatches, insertions and deletions using index structures
Tran et al. Objective and comprehensive evaluation of bisulfite short read mapping tools
CN113826168A (zh) Flexible seed extension for hash table genomic mapping
Pham et al. Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly
Ndiaye et al. When less is more: sketching with minimizers in genomics
WO2018165762A1 (en) Systems and methods for determining effects of genetic variation on splice site selection
Srivatsa et al. A clonal evolution simulator for planning somatic evolution studies
He et al. Iped2: Inheritance path based pedigree reconstruction algorithm for complicated pedigrees
Xulvi-Brunet et al. Computational analysis of fitness landscapes and evolutionary networks from in vitro evolution experiments
Vezzi Next generation sequencing revolution challenges: Search, assemble, and validate genomes
Mäkinen et al. Unified view of backward backtracking in short read mapping
Das et al. Optimal haplotype assembly via a branch-and-bound algorithm
Zeng et al. Improved parallel processing of massive de bruijn graph for genome assembly
Guo Effloc: An Efficient Locating Algorithm for Mass-Occurrence Biological Patterns with FM-Index
Denti Algorithms for analyzing genetic variability from Next-Generation Sequencing data
Filion et al. Calibrating seed-based heuristics to map short DNA reads
Vyverman ALFALFA: fast and accurate mapping of long next generation sequencing reads
Matzoros Performance characterization and acceleration of genome-mapping tools on HPC environments
Ekim Scalable sketching and indexing algorithms for large biological datasets
Tennakoon Fast and Accurate Mapping of Next Generation Sequencing Data
Zhao Genomic variation detection using dynamic programming methods
Di Donato Leveraging succinct data structures for the burrows-wheeler mapping of short sequence reads on FPGA
Stephens Empirical accuracy bounds for next-generation sequencing variant calling workflows
Bryant Jr et al. De novo short-read assembly

Legal Events

Date Code Title Description
B06U Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
B06A Patent application procedure suspended [chapter 6.1 patent gazette]
B09A Decision: intention to grant [chapter 9.1 patent gazette]
B16A Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]

Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 21/10/2016, SUBJECT TO THE LEGAL CONDITIONS