CN106795568A - Methods, systems and processes of de novo assembly of sequencing reads - Google Patents

Methods, systems and processes of de novo assembly of sequencing reads

Info

Publication number
CN106795568A
Authority
CN
China
Prior art keywords
read
contig
reads
overlap
storage medium
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201580054801.9A
Other languages
English (en)
Chinese (zh)
Inventor
K·康维卡
K·雅各布斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Invitae Corp
Original Assignee
Invitae Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Invitae Corp
Publication of CN106795568A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G16B30/20 Sequence assembly

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
CN201580054801.9A 2014-10-10 2015-10-09 Methods, systems and processes of de novo assembly of sequencing reads Pending CN106795568A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201462062636P 2014-10-10 2014-10-10
US62/062,636 2014-10-10
PCT/IB2015/057716 WO2016055971A2 (en) 2014-10-10 2015-10-09 Methods, systems and processes of de novo assembly of sequencing reads

Publications (1)

Publication Number Publication Date
CN106795568A (zh) 2017-05-31

Family

ID=55653914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580054801.9A Pending CN106795568A (zh) 2014-10-10 2015-10-09 Methods, systems and processes of de novo assembly of sequencing reads

Country Status (8)

Country Link
US (1) US20190244678A1
EP (1) EP3204522A4
JP (1) JP6762932B2
CN (1) CN106795568A
BR (1) BR112017007282A2
CA (1) CA2963868A1
IL (1) IL251277B
WO (1) WO2016055971A2

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
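CN110060734A (zh) * 2019-03-29 2019-07-26 天津大学 Highly robust barcode generation and reading method for DNA sequencing
CN115938480A (zh) * 2021-09-23 2023-04-07 武汉华大基因技术服务有限公司 Optimization device and system for a method of error-correcting genome assembly results using long-read sequencing
CN118380052A (zh) * 2024-06-24 2024-07-23 安诺优达基因科技(北京)有限公司 Method and electronic device for genome structure prediction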

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10395759B2 (en) 2015-05-18 2019-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for copy number variant detection
US12071669B2 (en) 2016-02-12 2024-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for detection of abnormal karyotypes
WO2018057775A1 (en) * 2016-09-22 2018-03-29 Invitae Corporation Methods, systems and processes of identifying genetic variations
WO2019028189A2 (en) * 2017-08-01 2019-02-07 Human Longevity, Inc. DETERMINING THE STR LENGTH BY SHORT READ SEQUENCING
US11728007B2 (en) 2017-11-30 2023-08-15 Grail, Llc Methods and systems for analyzing nucleic acid sequences using mappability analysis and de novo sequence assembly
JP7361774B2 (ja) * 2018-07-27 2023-10-16 Myriad Women's Health, Inc. Methods for detecting genetic variation in highly homologous sequences by independent alignment and pairing of sequence reads
EP3853763A1 (en) * 2018-09-20 2021-07-28 Aivf Ltd Image feature detection
BR112020026259A2 (pt) * 2018-11-01 2021-07-27 Illumina, Inc. Methods and compositions for germline variant detection
CN113557572B (zh) * 2019-01-25 2025-02-07 Pacific Biosciences of California, Inc. Systems and methods for graph-based mapping of nucleic acid fragments
SG11202109079YA (en) * 2019-12-05 2021-09-29 Illumina Inc Rapid detection of gene fusions
US12093803B2 (en) * 2020-07-01 2024-09-17 International Business Machines Corporation Downsampling genomic sequence data
CA3184609A1 (en) * 2020-12-11 2022-06-16 Illumina Inc. Methods and systems for visualizing short reads in repetitive regions of the genome
US20240117445A1 (en) * 2021-03-16 2024-04-11 University Of North Texas Health Science Center At Fort Worth Macrohaplotypes for Forensic DNA Mixture Deconvolution

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110257889A1 (en) * 2010-02-24 2011-10-20 Pacific Biosciences Of California, Inc. Sequence assembly and consensus sequence determination
CN102460155A (zh) * 2009-04-29 2012-05-16 考利达基因组股份有限公司 Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence
WO2012177774A2 (en) * 2011-06-21 2012-12-27 Life Technologies Corporation Systems and methods for hybrid assembly of nucleic acid sequences
US20130137605A1 (en) * 2008-09-12 2013-05-30 University Of Washington Sequence tag directed subassembly of short sequencing reads into long sequencing reads
WO2013103759A2 (en) * 2012-01-04 2013-07-11 Dow Agrosciences Llc Haplotype based pipeline for snp discovery and/or classification
CN103258145A (zh) * 2012-12-22 2013-08-21 中国科学院深圳先进技术研究院 Parallel gene assembly method based on De Bruijn graphs
US20140114582A1 (en) * 2012-10-18 2014-04-24 David A. Mittelman System and method for genotyping using informed error profiles
CN103761453A (zh) * 2013-12-09 2014-04-30 天津工业大学 Parallel gene assembly algorithm based on a cluster graph structure

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130137605A1 (en) * 2008-09-12 2013-05-30 University Of Washington Sequence tag directed subassembly of short sequencing reads into long sequencing reads
CN102460155A (zh) * 2009-04-29 2012-05-16 考利达基因组股份有限公司 Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence
US20110257889A1 (en) * 2010-02-24 2011-10-20 Pacific Biosciences Of California, Inc. Sequence assembly and consensus sequence determination
WO2012177774A2 (en) * 2011-06-21 2012-12-27 Life Technologies Corporation Systems and methods for hybrid assembly of nucleic acid sequences
WO2013103759A2 (en) * 2012-01-04 2013-07-11 Dow Agrosciences Llc Haplotype based pipeline for snp discovery and/or classification
US20140114582A1 (en) * 2012-10-18 2014-04-24 David A. Mittelman System and method for genotyping using informed error profiles
CN103258145A (zh) * 2012-12-22 2013-08-21 中国科学院深圳先进技术研究院 Parallel gene assembly method based on De Bruijn graphs
CN103761453A (zh) * 2013-12-09 2014-04-30 天津工业大学 Parallel gene assembly algorithm based on a cluster graph structure

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANDY RIMMER ET AL.: "Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications", 《NATURE GENETICS》 *
DANIEL R. ZERBINO ET AL.: "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs", 《GENOME RESEARCH》 *
IMMA HERNAN ET AL.: "Detection of Genomic Variations in BRCA1 and BRCA2 Genes by Long-Range PCR and Next-Generation Sequencing", 《THE JOURNAL OF MOLECULAR DIAGNOSTICS》 *
MANFRED G GRABHERR ET AL.: "Full-length transcriptome assembly from RNA-Seq data without a reference genome", 《NATURE BIOTECHNOLOGY》 *
MIHAI POP ET AL.: "Comparative genome assembly", 《BRIEFINGS IN BIOINFORMATICS》 *
XIAO-LONG WU ET AL.: "TIGER: tiled iterative genome assembler", 《BMC BIOINFORMATICS》 *
XIAOQIU HUANG ET AL.: "PCAP: A Whole-Genome Assembly Program", 《GENOME RESEARCH》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
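CN110060734A (zh) * 2019-03-29 2019-07-26 天津大学 Highly robust barcode generation and reading method for DNA sequencing
CN110060734B (zh) * 2019-03-29 2021-08-13 天津大学 Highly robust barcode generation and reading method for DNA sequencing
CN115938480A (zh) * 2021-09-23 2023-04-07 武汉华大基因技术服务有限公司 Optimization device and system for a method of error-correcting genome assembly results using long-read sequencing
CN118380052A (zh) * 2024-06-24 2024-07-23 安诺优达基因科技(北京)有限公司 Method and electronic device for genome structure prediction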

Also Published As

Publication number Publication date
JP2018500625A (ja) 2018-01-11
IL251277A0 (en) 2017-05-29
JP6762932B2 (ja) 2020-09-30
WO2016055971A2 (en) 2016-04-14
CA2963868A1 (en) 2016-04-14
IL251277B (en) 2020-08-31
EP3204522A2 (en) 2017-08-16
EP3204522A4 (en) 2018-06-20
US20190244678A1 (en) 2019-08-08
BR112017007282A2 (pt) 2018-06-19
WO2016055971A3 (en) 2016-06-02

Similar Documents

Publication Publication Date Title
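JP6762932B2 (ja) Methods, systems and processes of de novo assembly of sequencing reads
JP7284849B2 (ja) Methods and systems for generation and error correction of unique molecular index sets with heterogeneous molecular lengths
JP6725481B2 (ja) Noninvasive prenatal molecular karyotyping from maternal plasma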
Robasky et al. The role of replicates for error mitigation in next-generation sequencing
KR102514024B1 (ko) Methods and processes for non-invasive assessment of genetic variations
Krawitz et al. Microindel detection in short-read sequence data
US20190121941A1 (en) Algorithms for sequence determinations
CN115989544A (zh) Methods and systems for visualizing short reads in repetitive regions of the genome
US20160154930A1 (en) Methods for identification of individuals
US20210125685A1 (en) Methods and systems for analysis of ctcf binding regions in cell-free dna
EP4573219A1 (en) Method of detecting cancer dna in a sample
JP2021101629A (ja) Systems and methods for genomic analysis and genetic analysis
US20240404624A1 (en) Structural variant alignment and variant calling by utilizing a structural-variant reference genome
Olbrich et al. An Emirati pangenome incorporating a diploid telomere-to-telomere reference
Warr Lost pigs and broken genes: the search for causes of embryonic loss in the pig and the assembly of a more contiguous reference genome
Ping Error Rectification and Deduplication Algorithms on Short-Read Sequencing Data
Alsafar et al. The Emirati T2T-Level Pangenome: A Graph of 58 Complete Genomes
Sherman Discovering novel human structural variation from diverse populations and disease patients: an exploration of what human genomics misses by relying on reference-based analyses
WO2023250504A1 (en) Improving split-read alignment by intelligently identifying and scoring candidate split groups
KR20250092241A (ko) Nucleic acid error suppression
Heinrich Aspects of Quality Control for Next Generation Sequencing Data in Medical Genetics
Hosseinkhan Ali Masoudi-Nejad Zahra Narimani
Al Aamri et al. Michael Olbrich 1✉ A Mira Mousa 1, 2✉ Inken Wohlers 3, 11, 12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531