CN112437961A - Machine learning enabled biological polymer assembly - Google Patents

Info

Publication number
CN112437961A
Authority
CN
China
Prior art keywords
assembly
nucleotide
learning model
locations
biopolymer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980047341.5A
Other languages
English (en)
Chinese (zh)
Inventor
Minh Duc Cao (明·迪克·曹)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quantum Si Inc
Original Assignee
Quantum Si Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-14
Filing date
2019-05-13
Publication date
2021-03-02
Application filed by Quantum Si Inc (2019-05-13)
Publication of CN112437961A (2021-03-02)
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Addition Polymer Or Copolymer, Post-Treatments, Or Chemical Modifications (AREA)
CN201980047341.5A 2018-05-14 2019-05-13 Machine learning enabled biological polymer assembly Pending CN112437961A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201862671260P 2018-05-14 2018-05-14
US62/671,260 2018-05-14
US201862671884P 2018-05-15 2018-05-15
US62/671,884 2018-05-15
PCT/US2019/032065 WO2019222120A1 (en) 2018-05-14 2019-05-13 Machine learning enabled biological polymer assembly

Publications (1)

Publication Number Publication Date
CN112437961A (zh) 2021-03-02

Family

ID=66669118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980047341.5A Pending CN112437961A (zh) 2018-05-14 2019-05-13 Machine learning enabled biological polymer assembly

Country Status (10)

Country Link
US (1) US20190348152A1
EP (1) EP3794596A1
JP (1) JP2021523479A
KR (1) KR20210010488A
CN (1) CN112437961A
AU (1) AU2019270961A1
BR (1) BR112020022257A2
CA (1) CA3098876A1
MX (1) MX2020012278A
WO (1) WO2019222120A1

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3624068A1 (en) * 2018-09-14 2020-03-18 Covestro Deutschland AG Method for improving prediction relating to the production of a polymer-ic produc
US11664090B2 (en) * 2020-06-11 2023-05-30 Life Technologies Corporation Basecaller with dilated convolutional neural network
EP4211691A1 (en) * 2020-09-11 2023-07-19 F. Hoffmann-La Roche AG Deep-learning-based techniques for generating a consensus sequence from multiple noisy sequences
CA3214755A1 (en) * 2021-04-09 2022-10-13 Natalie CASTELLANA Method for antibody identification from protein mixtures

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
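CN102460155A (zh) * 2009-04-29 2012-05-16 Complete Genomics, Inc. Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence
CN103797486A (zh) * 2011-06-06 2014-05-14 Koninklijke Philips N.V. Method for assembling nucleic acid sequence data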
US20150169824A1 (en) * 2013-12-16 2015-06-18 Complete Genomics, Inc. Basecaller for dna sequencing using machine learning
CA2894317A1 (en) * 2015-06-15 2016-12-15 Deep Genomics Incorporated Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
ALLEX C F, ET AL.: "Neural network input representations that produce accurate consensus sequences from DNA fragment assemblies", BIOINFORMATICS, vol. 15, no. 9, 1 September 1999 (1999-09-01), pages 723 - 728, XP002267211, DOI: 10.1093/bioinformatics/15.9.723 *
HIRANUMA N, ET AL.: "DeepATAC: A deep-learning method to predict regulatory factor binding activity from ATAC-seq signals", COLD SPRING HARBOR LABORATORY, vol. 33, no. 14, 12 July 2017 (2017-07-12), pages 1 - 5 *
HOLMAN A G, ET AL.: "A Machine Learning Approach for Identifying Amino Acid Signatures in the HIV Env Gene Predictive of Dementia", PLOS ONE, vol. 7, no. 11, 14 November 2012 (2012-11-14), pages 2 *
KELLEY D R, ET AL.: "Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks", GENOME RESEARCH, vol. 26, no. 7, 1 July 2016 (2016-07-01), pages 990 - 999, XP055507160, DOI: 10.1101/gr.200535.115 *
KOH P W, ET AL.: "Denoising genome-wide histone ChIP-seq with convolutional neural networks", BIOINFORMATICS, vol. 33, no. 14, 24 July 2017 (2017-07-24), pages 1 - 9 *
LOMAN N J, ET AL.: "A complete bacterial genome assembled de novo using only nanopore sequencing data", BIORXIV, 11 March 2015 (2015-03-11), pages 1 - 30 *
LOMAN N J, ET AL.: "A complete bacterial genome assembled de novo using only nanopore sequencing data", NATURE METHODS, vol. 12, no. 8, 1 August 2015 (2015-08-01), pages 733 *
LUO R B, ET AL.: "Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing", BIORXIV, vol. 10, 28 April 2018 (2018-04-28), pages 1 - 20 *
POPLIN R, ET AL.: "A universal SNP and small-indel variant caller using deep neural networks", NATURE BIOTECHNOLOGY, vol. 36, no. 10, 20 March 2018 (2018-03-20), pages 983 *
POPLIN R, ET AL.: "Creating a universal SNP and small indel variant caller with deep neural networks", BIORXIV, 21 December 2016 (2016-12-21), pages 1 - 13 *
UMAROV R K, ET AL.: "Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks", PLOS ONE, vol. 12, no. 2, 22 March 2017 (2017-03-22), pages 1 - 12 *
VASER R, ET AL.: "Fast and accurate de novo genome assembly from long uncorrected reads", GENOME RESEARCH, vol. 27, no. 5, 1 May 2017 (2017-05-01), pages 737 - 746, XP055608901, DOI: 10.1101/gr.214270.116 *

Also Published As

Publication number Publication date
MX2020012278A (es) 2021-01-29
JP2021523479A (ja) 2021-09-02
WO2019222120A1 (en) 2019-11-21
KR20210010488A (ko) 2021-01-27
AU2019270961A1 (en) 2020-11-19
BR112020022257A2 (pt) 2021-02-23
EP3794596A1 (en) 2021-03-24
US20190348152A1 (en) 2019-11-14
CA3098876A1 (en) 2019-11-21

Similar Documents

Publication Publication Date Title
US20250226056A1 (en) Variant classifier based on deep neural networks
KR102433458B1 (ko) Semi-supervised learning for training an ensemble of deep convolutional neural networks
CN112437961A (zh) Machine learning enabled biological polymer assembly
US20200176082A1 (en) Analysis of nanopore signal using a machine-learning technique
US11769073B2 (en) Methods and systems for producing an expanded training set for machine learning using biological sequences
US20230298698A1 (en) Methods and systems for sequence generation and prediction
WO2023197718A9 (zh) Method for predicting circular RNA IRESs
CN109411020A (zh) Method for filling gaps in whole-genome sequences using long sequencing reads
Zych et al. reGenotyper: Detecting mislabeled samples in genetic data
JP2021523479A5
CN108427865B (zh) Method for predicting associations between lncRNAs and environmental factors
US20260004878A1 (en) Method for assuming organism or host, method for obtaining model for assuming organism or host, and computer device for performing the same
US20250253012A1 (en) Error Correction of Nucleic Acid Sequencing Reads
US10937523B2 (en) Methods, systems and computer readable storage media for generating accurate nucleotide sequences
WO2024130230A2 (en) Systems and methods for evaluation of expression patterns
CN114298214B (zh) Protein anomaly detection method based on ultra-large-scale evolutionary algorithms and hardware acceleration
US10216899B2 (en) Sentence construction for DNA classification
JPWO2019222120A5
Agarwala Cross-Species Prediction of Transcription Factor Binding
Zagganas et al. Simplifying p-value calculation for the unbiased microRNA enrichment analysis, using ML-techniques.
Shaw Prediction of Isoform Functions and Interactions with ncRNAs via Deep Learning
Eraslan Enriching the characterization of complex clinical and molecular phenotypes with deep learning
Balaji Journey Into the Unknown: Graph and Machine Learning Based Approaches for Improved Characterization of Novel Pathogens
Fujimoto et al. Learning the language of genes: representing global codon bias with deep language models
US8762119B2 (en) Method, system and apparatus to predict and/or recognize and/or classify biological sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210302