WO2019017806A1 - Apparatus and method for identifying haplotypes - Google Patents
- Publication number
- WO2019017806A1 (PCT/RU2017/000538)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- allele
- sequences
- allele sequences
- sequence
- aggregated
Classifications
- G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
  (G: Physics > G16: ICT specially adapted for specific application fields > G16B: Bioinformatics, i.e. ICT specially adapted for genetic or protein-related data processing in computational molecular biology)
- G16B30/10: Sequence alignment; Homology search
  (subgroup of G16B30/00)
Definitions
- The state-of-the-art MixSIH method has several critical problems: (1) it performs only single-individual haplotyping and cannot be applied to multiple genomes; (2) because it specializes in single-individual haplotyping, it cannot produce more than two haplotypes; (3) it uses complex formulas for haplotype inference and therefore cannot provide optimal performance; (4) it does not support de-novo assembly of haplotypes, so haplotyping quality can degrade in regions with a high frequency of repetitions; (5) it does not take into account the Phred quality of nucleotide identification and therefore cannot achieve the best precision.
- The invention relates to an apparatus and method for identifying haplotypes in a plurality of sample nucleotide sequences. More specifically, it provides a novel apparatus and method for overlapping haplotyping in regions with a low frequency of repeated nucleotide subsequences, in order to overcome the drawbacks of conventional haplotyping methods.
- The present invention offers several significant advantages over the prior art. First, it provides a method of identifying haplotypes in a sample of multiple genomes; in contrast to existing solutions, this method can take into account all available alleles and their possible combinations. Second, it provides a method for selecting an expected number of haplotypes.
- Fig. 6 shows a schematic diagram illustrating a method for haplotyping implemented in an apparatus according to an embodiment.
- Fig. 7 shows a schematic diagram illustrating different stages of a method for haplotyping implemented in an apparatus according to an embodiment.
- Fig. 10 shows a diagram illustrating the generation of unique hash codes for nucleotide sequences as implemented in an apparatus according to an embodiment.
- The method of adaptive haplotyping can efficiently haplotype regions of a genome with different frequencies of repeated nucleotide sequences. It generates unique hash codes to quickly identify repeated subsequences of a predefined length, and on that basis decides whether to apply the novel overlapping-haplotyping method, which suits regions with a low frequency of repetitions, or the de-novo assembly method, which suits regions with a high frequency of repetitions.
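As a rough illustration of the repeat-detection idea, the Python sketch below assumes one simple way to obtain codes that are unique per subsequence: pack each base into 2 bits, so every k-mer of a fixed length k over {A, C, G, T} maps to a distinct integer. The repeat measure (`repeat_fraction`), the cut-off `threshold`, the default `k`, and the method labels are hypothetical choices made for illustration; the source does not say how its hash codes or its decision rule are defined.

```python
from collections import Counter

# Assumption: 2 bits per base, so for a fixed k every k-mer over {A, C, G, T}
# gets a distinct integer code (codes are only unique among k-mers of equal length).
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_code(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base (raises KeyError on non-ACGT)."""
    code = 0
    for base in kmer:
        code = (code << 2) | BASE_BITS[base]
    return code


def repeat_fraction(region: str, k: int) -> float:
    """Share of k-mer positions in `region` whose code occurs more than once."""
    codes = [kmer_code(region[i:i + k]) for i in range(len(region) - k + 1)]
    if not codes:
        return 0.0  # region shorter than k: nothing to compare
    counts = Counter(codes)
    return sum(1 for c in codes if counts[c] > 1) / len(codes)


def choose_method(region: str, k: int = 4, threshold: float = 0.5) -> str:
    """Hypothetical decision rule: low repetition -> overlapping haplotyping,
    high repetition -> de-novo assembly. k and threshold are illustrative."""
    if repeat_fraction(region, k) < threshold:
        return "overlapping"
    return "de-novo"
```

For example, `choose_method("ACGTTGCAAGCT")` returns `"overlapping"` because all nine 4-mers are distinct, while `choose_method("ATATATATATAT")` returns `"de-novo"` because every 4-mer repeats. Regions containing ambiguous bases such as `N` would raise a `KeyError` in this sketch and would need separate handling.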
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to an apparatus (400) for identifying haplotypes in a plurality of sample nucleotide sequences on the basis of a reference nucleotide sequence. The apparatus (400) comprises a processing unit (401) configured to:
- generate an initial set of allele sequences by extracting a plurality of allele sequences from the plurality of sample nucleotide sequences on the basis of the reference nucleotide sequence, each allele of each of the allele sequences being associated with a nucleotide site in the reference nucleotide sequence;
- generate a first aggregated set of allele sequences from the initial set by combining, into an aggregated allele sequence, those allele sequences of the initial set that have the same alleles in overlapping sequence portions and belong to the same haplotype; the first aggregated set comprises the aggregated allele sequences together with the allele sequences of the initial set that were not combined into an aggregated allele sequence;
- generate a second aggregated set of allele sequences from the first aggregated set by concatenating pairs of neighboring allele sequences of the first aggregated set, neighboring allele sequences being those that comprise alleles at neighboring nucleotide sites but no overlapping alleles; and
- identify haplotypes in the plurality of sample nucleotide sequences on the basis of the second aggregated set of allele sequences.
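The two aggregation steps of the abstract can be sketched in Python under several assumptions the abstract does not settle: allele sequences are modelled as runs of alleles over consecutive variant-site indices (the hypothetical `AlleleSeq`, with `'0'`/`'1'` for reference/alternative alleles); agreement on every shared site stands in for "belong to the same haplotype"; overlapping sequences are merged greedily; and in the second step every neighboring pair is concatenated, with sequences that have no neighbor carried over unchanged. The final haplotype-identification step is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleSeq:
    """Hypothetical model: non-empty alleles at consecutive variant sites from `start`."""
    start: int
    alleles: str  # e.g. "010": '0' = reference allele, '1' = alternative allele

    @property
    def end(self) -> int:
        return self.start + len(self.alleles) - 1  # last covered site (inclusive)

    def allele_at(self, site: int) -> str:
        return self.alleles[site - self.start]


def try_merge(a: AlleleSeq, b: AlleleSeq) -> AlleleSeq | None:
    """Merge two sequences that overlap and agree on every shared site."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    if lo > hi:
        return None  # no overlapping sites
    if any(a.allele_at(s) != b.allele_at(s) for s in range(lo, hi + 1)):
        return None  # conflicting alleles: treated as different haplotypes
    start, end = min(a.start, b.start), max(a.end, b.end)
    merged = "".join(
        a.allele_at(s) if a.start <= s <= a.end else b.allele_at(s)
        for s in range(start, end + 1)
    )
    return AlleleSeq(start, merged)


def first_aggregation(seqs: list[AlleleSeq]) -> list[AlleleSeq]:
    """Greedily merge agreeing overlapping sequences until no merge is possible."""
    pool = list(dict.fromkeys(seqs))  # drop exact duplicates, keep order
    changed = True
    while changed:
        changed = False
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                merged = try_merge(pool[i], pool[j])
                if merged is not None:
                    pool = [s for k, s in enumerate(pool) if k not in (i, j)] + [merged]
                    changed = True
                    break
            if changed:
                break  # restart the scan on the updated pool
    return sorted(pool, key=lambda s: (s.start, s.alleles))


def second_aggregation(seqs: list[AlleleSeq]) -> list[AlleleSeq]:
    """Concatenate every pair (a, b) where b starts at the site right after a ends."""
    result: list[AlleleSeq] = []
    paired: set[AlleleSeq] = set()
    for a in seqs:
        for b in seqs:
            if b.start == a.end + 1:  # neighboring sites, no overlap
                result.append(AlleleSeq(a.start, a.alleles + b.alleles))
                paired.update((a, b))
    # Assumption: sequences with no neighbor are carried over unchanged.
    result.extend(s for s in seqs if s not in paired)
    return sorted(result, key=lambda s: (s.start, s.alleles))
```

For instance, starting from `AlleleSeq(0, "01")`, `AlleleSeq(1, "10")`, `AlleleSeq(0, "00")`, `AlleleSeq(1, "01")` and `AlleleSeq(3, "11")`, `first_aggregation` returns `AlleleSeq(0, "001")`, `AlleleSeq(0, "010")` and `AlleleSeq(3, "11")`, and `second_aggregation` on that result returns `AlleleSeq(0, "00111")` and `AlleleSeq(0, "01011")`. A greedy merge of this kind can attach a short read that is compatible with two haplotypes to only one of them, which is why a real implementation would need the haplotype-membership information the abstract refers to.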
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780093397.5A CN111344794B (zh) | 2017-07-20 | 2017-07-20 | Apparatus and method for identifying haplotypes |
PCT/RU2017/000538 WO2019017806A1 (fr) | 2017-07-20 | 2017-07-20 | Apparatus and method for identifying haplotypes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2017/000538 WO2019017806A1 (fr) | 2017-07-20 | 2017-07-20 | Apparatus and method for identifying haplotypes |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019017806A1 (fr) | 2019-01-24 |
Family
ID=59895353
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2017/000538 WO2019017806A1 (fr) | 2017-07-20 | 2017-07-20 | Apparatus and method for identifying haplotypes |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111344794B (fr) |
WO (1) | WO2019017806A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112289376B (zh) * | 2020-10-26 | 2021-07-06 | 北京吉因加医学检验实验室有限公司 | Method and device for detecting somatic mutations |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9213947B1 (en) * | 2012-11-08 | 2015-12-15 | 23Andme, Inc. | Scalable pipeline for local ancestry inference |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040197775A1 (en) * | 1989-08-25 | 2004-10-07 | Genetype A.G. | Intron sequence analysis method for detection of adjacent and remote locus alleles as haplotypes |
IL144515A0 (en) * | 2000-07-31 | 2002-05-23 | Pfizer Prod Inc | A pcr-based multiplex assay for determining haplotype |
CN1712545B (zh) * | 2004-06-23 | 2010-12-08 | 中国医学科学院阜外心血管病医院 | Tag single-nucleotide polymorphism sites of the angiotensin II type 1 receptor gene and haplotypes composed thereof |
CN101539967B (zh) * | 2008-12-12 | 2010-12-01 | 深圳华大基因研究院 | Method for detecting single-nucleotide polymorphisms |
CN102460155B (zh) * | 2009-04-29 | 2015-03-25 | 考利达基因组股份有限公司 | Methods and systems for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence |
US11725237B2 (en) * | 2013-12-05 | 2023-08-15 | The Broad Institute Inc. | Polymorphic gene typing and somatic change detection using sequencing data |
US9670530B2 (en) * | 2014-01-30 | 2017-06-06 | Illumina, Inc. | Haplotype resolved genome sequencing |
CN107002121B (zh) * | 2014-09-18 | 2020-11-13 | 亿明达股份有限公司 | Methods and systems for analyzing nucleic acid sequencing data |
2017
- 2017-07-20: WO application PCT/RU2017/000538 (WO2019017806A1) filed; status: active
- 2017-07-20: CN application CN201780093397.5A (CN111344794B); status: active
Non-Patent Citations (7)
Title |
---|
D. HE ET AL: "Optimal algorithms for haplotype assembly from whole-genome sequence data", BIOINFORMATICS., vol. 26, no. 12, 6 June 2010 (2010-06-06), GB, pages i183 - i190, XP055386396, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btq215 * |
HIROTAKA MATSUMOTO ET AL: "MixSIH: a mixture model for single individual haplotyping", BMC GENOMICS, BIOMED CENTRAL, vol. 14, no. Suppl 2, 15 February 2013 (2013-02-15), pages S5, XP021138646, ISSN: 1471-2164, DOI: 10.1186/1471-2164-14-S2-S5 * |
MATSUMOTO H.; KIRYU H.: "MixSIH: a mixture model for single individual haplotyping", BMC GENOMICS, vol. 14, 2013, pages S5, XP021138646, DOI: 10.1186/1471-2164-14-S2-S5 |
RHEE JE-KEUN ET AL: "Survey of computational haplotype determination methods for single individual", GENES & GENOMICS, THE GENETICS SOCIETY OF KOREA, HEIDELBERG, vol. 38, no. 1, 15 October 2015 (2015-10-15), pages 1 - 12, XP035966832, ISSN: 1976-9571, [retrieved on 20151015], DOI: 10.1007/S13258-015-0342-X * |
SHUYING S.: "Haplotype inference using a Hidden Markov Model with efficient Markov Chain sampling", A THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY, 2007 |
SOYEON AHN ET AL: "Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 16, no. 1, 16 July 2015 (2015-07-16), pages 223, XP021225807, ISSN: 1471-2105, DOI: 10.1186/S12859-015-0651-8 * |
V. KULESHOV: "Probabilistic single-individual haplotyping", BIOINFORMATICS., vol. 30, no. 17, 26 August 2014 (2014-08-26), GB, pages i379 - i385, XP055465042, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btu484 * |
Also Published As
Publication number | Publication date |
---|---|
CN111344794B (zh) | 2024-04-23 |
CN111344794A (zh) | 2020-06-26 |
Similar Documents
Publication | Title |
---|---|
US20200399719A1 (en) | Systems and methods for analyzing viral nucleic acids |
Song et al. | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
Song et al. | Capturing the phylogeny of Holometabola with mitochondrial genome data and Bayesian site-heterogeneous mixture models |
JP5985040B2 (ja) | Data analysis device and method therefor |
WO2017123864A1 (fr) | Systems and methods for analyzing circulating tumor DNA |
WO2016141294A1 (fr) | Systems and methods for analyzing genomic patterns |
US20080281819A1 (en) | Non-random control data set generation for facilitating genomic data processing |
RU2015136780A (ru) | Methods, systems and software for identifying biomolecules using models of multiplicative form |
CN110797088B (zh) | Whole-genome resequencing analysis and method for whole-genome resequencing analysis |
He et al. | De novo assembly methods for next generation sequencing data |
CA3131752A1 (fr) | Design of probes for depleting abundant transcripts |
EP2394165A1 (fr) | Mapping of oligomer sequences |
WO2019017806A1 (fr) | Apparatus and method for identifying haplotypes |
WO2023209614A1 (fr) | Guide design and off-target searches |
CN103793626A (zh) | Base sequence alignment system and method |
Vasimuddin et al. | Identification of significant computational building blocks through comprehensive investigation of NGS secondary analysis methods |
Sacomoto et al. | A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data |
Martin | Algorithms and tools for the analysis of high throughput DNA sequencing data |
Bacci | Raw sequence data and quality control |
Wang | Using PhyloCon to identify conserved regulatory motifs |
Milicchio et al. | Hercool: high-throughput error correction by oligomers |
Gunewardena | Optimum-time, optimum-space, algorithms for k-mer analysis of whole genome sequences |
Federico et al. | An efficient algorithm for planted structured motif extraction |
Bryant Jr et al. | De novo short-read assembly |
Vaser | De novo transcriptome assembly |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 17768532; Country of ref document: EP; Kind code of ref document: A1 |