WO2019017806A1 - Apparatus and method for identifying haplotypes - Google Patents

Apparatus and method for identifying haplotypes

Info

Publication number
WO2019017806A1
Authority
WO
WIPO (PCT)
Prior art keywords
allele
sequences
allele sequences
sequence
aggregated
Prior art date
Application number
PCT/RU2017/000538
Other languages
English (en)
Inventor
Dmitry Yurievich IGNATOV
Alexander Nikolaevich Filippov
Xuecang ZHANG
Original Assignee
Huawei Technologies Co., Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd
Priority to CN201780093397.5A (CN111344794B)
Priority to PCT/RU2017/000538 (WO2019017806A1)
Publication of WO2019017806A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Definitions

  • The state-of-the-art MixSIH method has several critical problems: it performs only single individual haplotyping and cannot be applied to multiple genomes; because it specializes in single individual haplotyping, it cannot produce more than two haplotypes; it uses complex formulas when inferring haplotypes and thus cannot deliver optimal performance; it does not support de-novo assembly of haplotypes and can lose haplotyping quality in regions with a high frequency of repetitions; and it does not take into account the Phred quality of nucleotide identification and thus cannot produce results with the best precision.
  • The invention relates to an apparatus and method for identifying haplotypes in a plurality of sample nucleotide sequences. More specifically, a novel apparatus and method are provided for overlapping haplotyping in regions with a low frequency of repetitions of nucleotide subsequences, in order to overcome the drawbacks of conventional haplotyping methods.
  • The present invention offers several significant advantages over the prior art. Firstly, the invention provides a method of identifying haplotypes in a sample of multiple genomes. In contrast to existing solutions, this method can take into account all available alleles and their possible combinations. Secondly, the invention provides a method for selecting an expected number of haplotypes.
  • Fig. 6 shows a schematic diagram illustrating a method for haplotyping implemented in an apparatus according to an embodiment.
  • Fig. 7 shows a schematic diagram illustrating different stages of a method for haplotyping implemented in an apparatus according to an embodiment.
  • Fig. 10 shows a diagram illustrating the generation of unique hash codes for nucleotide sequences as implemented in an apparatus according to an embodiment.
  • The method of adaptive haplotyping can efficiently haplotype regions of a genome that differ in how frequently nucleotide sequences repeat. It generates a unique hash code for quick identification of repetitive subsequences of a predefined length, and on that basis decides whether to apply the novel method, i.e., overlapping haplotyping, which is applicable to regions with a low frequency of repetitions, or the de-novo assembly method, which is applicable to regions with a high frequency of repetitions.
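
The adaptive selection described in the last item can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the 2-bit base encoding, the function names `kmer_hash_codes`, `repeat_frequency` and `choose_method`, the subsequence length `k`, and the `threshold` value. The sketch only shows the general idea of giving every subsequence of a predefined length a unique integer code, counting repeated codes, and picking a haplotyping strategy from the repeat frequency.

```python
from collections import Counter

# 2-bit code per nucleotide (an assumption of this sketch, not from the patent).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash_codes(sequence, k):
    """Yield one integer code per length-k window, updated in O(1) per base.

    With 2 bits per base, two windows get the same code only if they are
    the same subsequence, so the code is unique per k-mer.
    """
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for base in sequence.upper():
        if base not in BASE_CODE:  # e.g. 'N': restart the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | BASE_CODE[base]) & mask
        valid += 1
        if valid >= k:
            yield code


def repeat_frequency(sequence, k):
    """Fraction of length-k windows whose code occurs more than once."""
    counts = Counter(kmer_hash_codes(sequence, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


def choose_method(sequence, k=4, threshold=0.5):
    """Pick a strategy for a region; the threshold is a hypothetical value."""
    if repeat_frequency(sequence, k) > threshold:
        return "de-novo assembly"
    return "overlapping haplotyping"


if __name__ == "__main__":
    print(choose_method("ACGTACGTACGT"))  # highly repetitive region
    print(choose_method("ACGTTGCA"))      # no repeated 4-mers
```

In practice the subsequence length and the threshold would have to be tuned to the read length and the genome at hand; the values above are placeholders.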

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to an apparatus (400) for identifying haplotypes in a plurality of sample nucleotide sequences on the basis of a reference nucleotide sequence. The apparatus (400) comprises a processing unit (401) configured to: generate an initial set of allele sequences by extracting a plurality of allele sequences from the plurality of sample nucleotide sequences on the basis of the reference nucleotide sequence, wherein each allele of each of the plurality of allele sequences is associated with a nucleotide site in the reference nucleotide sequence; generate a first aggregated set of allele sequences on the basis of the initial set of allele sequences by combining those allele sequences of the initial set which have the same alleles in overlapping sequence portions and belong to the same haplotype into an aggregated allele sequence, wherein the first aggregated set of allele sequences comprises the aggregated allele sequences and the allele sequences of the initial set which are not combined into an aggregated allele sequence; generate a second aggregated set of allele sequences on the basis of the first aggregated set of allele sequences by concatenating pairs of neighbouring allele sequences of the first aggregated set, wherein neighbouring allele sequences comprise alleles at neighbouring nucleotide sites but no overlapping alleles; and identify haplotypes in the plurality of sample nucleotide sequences on the basis of the second aggregated set of allele sequences.
PCT/RU2017/000538 2017-07-20 2017-07-20 Apparatus and method for identifying haplotypes WO2019017806A1 (fr)
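
The three processing stages named in the abstract (extraction, aggregation of overlapping allele sequences, concatenation of neighbouring allele sequences) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: allele sequences are modelled as dictionaries mapping a reference site to a base, agreement on all shared sites is used as a stand-in for "belonging to the same haplotype", merging is greedy, and the function names (`extract_alleles`, `aggregate_overlapping`, `concatenate_neighbours`) and toy reads are hypothetical.

```python
def extract_alleles(read_start, read_bases, variant_sites):
    """Initial set: map each covered variant site (0-based reference
    position) to the base the read carries at that site."""
    read_end = read_start + len(read_bases)
    return {s: read_bases[s - read_start]
            for s in variant_sites if read_start <= s < read_end}


def compatible(a, b):
    """True if two allele sequences overlap and agree on every shared site."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[s] == b[s] for s in shared)


def aggregate_overlapping(fragments):
    """First aggregated set: greedily merge allele sequences that agree on
    their overlapping sites; sequences that never merge are kept as they are."""
    merged = [dict(f) for f in fragments if f]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if compatible(merged[i], merged[j]):
                    merged[i].update(merged.pop(j))
                    changed = True
                    break
            if changed:
                break
    return merged


def concatenate_neighbours(fragments, variant_sites):
    """Second-stage candidates: join pairs whose sites are adjacent in the
    ordered list of variant sites and that share no site."""
    order = {s: i for i, s in enumerate(sorted(variant_sites))}
    joined = []
    for left in fragments:
        for right in fragments:
            if order[min(right)] == order[max(left)] + 1:
                joined.append({**left, **right})
    return joined


if __name__ == "__main__":
    sites = [2, 5, 8]  # hypothetical variant sites on the reference
    reads = [(0, "ACGTAC"), (4, "ACTTA"), (0, "ACTTAT"), (7, "AGG")]
    initial = [extract_alleles(start, bases, sites) for start, bases in reads]
    first = aggregate_overlapping(initial)
    second = concatenate_neighbours(first, sites)
    print(first)
    print(second)
```

With these toy reads, the first two fragments merge because they agree at site 5, while the remaining two fragments have no shared site and are only joined in the concatenation stage. A real implementation would also need a rule for choosing among competing concatenations, which the abstract does not spell out.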

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780093397.5A CN111344794B (zh) 2017-07-20 2017-07-20 Apparatus and method for identifying haplotypes
PCT/RU2017/000538 WO2019017806A1 (fr) 2017-07-20 2017-07-20 Apparatus and method for identifying haplotypes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2017/000538 WO2019017806A1 (fr) 2017-07-20 2017-07-20 Apparatus and method for identifying haplotypes

Publications (1)

Publication Number Publication Date
WO2019017806A1 (fr) 2019-01-24

Family

ID=59895353

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2017/000538 WO2019017806A1 (fr) 2017-07-20 2017-07-20 Apparatus and method for identifying haplotypes

Country Status (2)

Country Link
CN (1) CN111344794B (fr)
WO (1) WO2019017806A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112289376B (zh) * 2020-10-26 2021-07-06 北京吉因加医学检验实验室有限公司 Method and device for detecting somatic mutations

Citations (1)

Publication number Priority date Publication date Assignee Title
US9213947B1 (en) * 2012-11-08 2015-12-15 23Andme, Inc. Scalable pipeline for local ancestry inference

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US20040197775A1 (en) * 1989-08-25 2004-10-07 Genetype A.G. Intron sequence analysis method for detection of adjacent and remote locus alleles as haplotypes
IL144515A0 (en) * 2000-07-31 2002-05-23 Pfizer Prod Inc A pcr-based multiplex assay for determining haplotype
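CN1712545B (zh) * 2004-06-23 2010-12-08 中国医学科学院阜外心血管病医院 Tag single nucleotide polymorphism sites of the angiotensin II type 1 receptor gene and haplotypes composed thereof
CN101539967B (zh) * 2008-12-12 2010-12-01 深圳华大基因研究院 Method for detecting single nucleotide polymorphisms
CN102460155B (zh) * 2009-04-29 2015-03-25 考利达基因组股份有限公司 Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence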
US11725237B2 (en) * 2013-12-05 2023-08-15 The Broad Institute Inc. Polymorphic gene typing and somatic change detection using sequencing data
US9670530B2 (en) * 2014-01-30 2017-06-06 Illumina, Inc. Haplotype resolved genome sequencing
CN107002121B (zh) * 2014-09-18 2020-11-13 亿明达股份有限公司 Methods and systems for analyzing nucleic acid sequencing data

Non-Patent Citations (7)

Title
D. HE ET AL: "Optimal algorithms for haplotype assembly from whole-genome sequence data", BIOINFORMATICS, vol. 26, no. 12, 6 June 2010 (2010-06-06), GB, pages i183 - i190, XP055386396, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btq215 *
HIROTAKA MATSUMOTO ET AL: "MixSIH: a mixture model for single individual haplotyping", BMC GENOMICS, BIOMED CENTRAL, vol. 14, no. Suppl 2, 15 February 2013 (2013-02-15), pages S5, XP021138646, ISSN: 1471-2164, DOI: 10.1186/1471-2164-14-S2-S5 *
MATSUMOTO H.; KIRYU H.: "MixSIH: a mixture model for single individual haplotyping", BMC GENOMICS, vol. 14, 2013, pages S5, XP021138646, DOI: 10.1186/1471-2164-14-S2-S5
RHEE JE-KEUN ET AL: "Survey of computational haplotype determination methods for single individual", GENES & GENOMICS, THE GENETICS SOCIETY OF KOREA, HEIDELBERG, vol. 38, no. 1, 15 October 2015 (2015-10-15), pages 1 - 12, XP035966832, ISSN: 1976-9571, [retrieved on 20151015], DOI: 10.1007/S13258-015-0342-X *
SHUYING S.: "Haplotype inference using a Hidden Markov Model with efficient Markov Chain sampling", A THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY, 2007
SOYEON AHN ET AL: "Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 16, no. 1, 16 July 2015 (2015-07-16), pages 223, XP021225807, ISSN: 1471-2105, DOI: 10.1186/S12859-015-0651-8 *
V. KULESHOV: "Probabilistic single-individual haplotyping", BIOINFORMATICS, vol. 30, no. 17, 26 August 2014 (2014-08-26), GB, pages i379 - i385, XP055465042, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btu484 *

Also Published As

Publication number Publication date
CN111344794B (zh) 2024-04-23
CN111344794A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
US20200399719A1 (en) Systems and methods for analyzing viral nucleic acids
Song et al. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads
Song et al. Capturing the phylogeny of Holometabola with mitochondrial genome data and Bayesian site-heterogeneous mixture models
JP5985040B2 (ja) Data analysis device and method thereof
WO2017123864A1 (fr) Systems and methods for analyzing circulating tumor DNA
WO2016141294A1 (fr) Systems and methods for analyzing genomic patterns
US20080281819A1 (en) Non-random control data set generation for facilitating genomic data processing
RU2015136780A (ru) Methods, systems, and software for identifying biomolecules using multiplicative-form models
CN110797088B (zh) Whole-genome resequencing analysis and method for whole-genome resequencing analysis
He et al. De novo assembly methods for next generation sequencing data
CA3131752A1 (fr) Probe design for depleting abundant transcripts
EP2394165A1 (fr) Oligomer sequence mapping
WO2019017806A1 (fr) Apparatus and method for identifying haplotypes
WO2023209614A1 (fr) Guide design and off-target searches
CN103793626A (zh) Base sequence alignment system and method
Vasimuddin et al. Identification of significant computational building blocks through comprehensive investigation of NGS secondary analysis methods
Sacomoto et al. A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data
Martin Algorithms and tools for the analysis of high throughput DNA sequencing data
Bacci Raw sequence data and quality control
Wang Using PhyloCon to identify conserved regulatory motifs
Milicchio et al. Hercool: high-throughput error correction by oligomers
Gunewardena Optimum-time, optimum-space, algorithms for k-mer analysis of whole genome sequences
Federico et al. An efficient algorithm for planted structured motif extraction
Bryant Jr et al. De novo short-read assembly
Vaser De novo transcriptome assembly

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17768532

Country of ref document: EP

Kind code of ref document: A1