AU2022319125A1 - Quality score calibration of basecalling systems - Google Patents

Quality score calibration of basecalling systems Download PDF

Info

Publication number
AU2022319125A1
AU2022319125A1 AU2022319125A AU2022319125A AU2022319125A1 AU 2022319125 A1 AU2022319125 A1 AU 2022319125A1 AU 2022319125 A AU2022319125 A AU 2022319125A AU 2022319125 A AU2022319125 A AU 2022319125A AU 2022319125 A1 AU2022319125 A1 AU 2022319125A1
Authority
AU
Australia
Prior art keywords
base
range
sensor data
value
intensity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
AU2022319125A
Other languages
English (en)
Inventor
Andrew Dodge Heiberg
Dorna KASHEFHAGHIGHI
Rohan PAUL
John S. Vieceli
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Illumina Inc
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/839,387 external-priority patent/US20230029970A1/en
Application filed by Illumina Inc filed Critical Illumina Inc
Publication of AU2022319125A1 publication Critical patent/AU2022319125A1/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20Supervised data analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Signal Processing (AREA)
AU2022319125A 2021-07-28 2022-07-28 Quality score calibration of basecalling systems Pending AU2022319125A1 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163226707P 2021-07-28 2021-07-28
US63/226,707 2021-07-28
US17/839,387 2022-06-13
US17/839,387 US20230029970A1 (en) 2021-07-28 2022-06-13 Quality score calibration of basecalling systems
PCT/US2022/038729 WO2023009758A1 (en) 2021-07-28 2022-07-28 Quality score calibration of basecalling systems

Publications (1)

Publication Number Publication Date
AU2022319125A1 true AU2022319125A1 (en) 2024-01-18

Family

ID=83149575

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2022319125A Pending AU2022319125A1 (en) 2021-07-28 2022-07-28 Quality score calibration of basecalling systems

Country Status (7)

Country Link
EP (1) EP4377960A1 (ko)
JP (1) JP2024532049A (ko)
KR (1) KR20240037882A (ko)
AU (1) AU2022319125A1 (ko)
CA (1) CA3223746A1 (ko)
IL (1) IL309786A (ko)
WO (1) WO2023009758A1 (ko)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118053503A (zh) * 2024-01-11 2024-05-17 中国农业科学院农业基因组研究所 一种入侵生物多组学数据库构建方法及系统

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2044616A1 (en) 1989-10-26 1991-04-27 Roger Y. Tsien Dna sequencing
US5641658A (en) 1994-08-03 1997-06-24 Mosaic Technologies, Inc. Method for performing amplification of nucleic acid with two primers bound to a single solid support
US6090592A (en) 1994-08-03 2000-07-18 Mosaic Technologies, Inc. Method for performing amplification of nucleic acid on supports
WO1998044152A1 (en) 1997-04-01 1998-10-08 Glaxo Group Limited Method of nucleic acid sequencing
EP3034626A1 (en) 1997-04-01 2016-06-22 Illumina Cambridge Limited Method of nucleic acid sequencing
AR021833A1 (es) 1998-09-30 2002-08-07 Applied Research Systems Metodos de amplificacion y secuenciacion de acido nucleico
US20020150909A1 (en) 1999-02-09 2002-10-17 Stuelpnagel John R. Automated information processing in randomly ordered arrays
AU2001282881B2 (en) 2000-07-07 2007-06-14 Visigen Biotechnologies, Inc. Real-time sequence determination
WO2002044425A2 (en) 2000-12-01 2002-06-06 Visigen Biotechnologies, Inc. Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
AR031640A1 (es) 2000-12-08 2003-09-24 Applied Research Systems Amplificacion isotermica de acidos nucleicos en un soporte solido
US7057026B2 (en) 2001-12-04 2006-06-06 Solexa Limited Labelled nucleotides
US20040002090A1 (en) 2002-03-05 2004-01-01 Pascal Mayer Methods for detecting genome-wide sequence variations associated with a phenotype
SI3363809T1 (sl) 2002-08-23 2020-08-31 Illumina Cambridge Limited Modificirani nukleotidi za polinukleotidno sekvenciranje
PT3147292T (pt) 2002-08-23 2018-11-22 Illumina Cambridge Ltd Nucleótidos identificados
GB0321306D0 (en) 2003-09-11 2003-10-15 Solexa Ltd Modified polymerases for improved incorporation of nucleotide analogues
EP2789383B1 (en) 2004-01-07 2023-05-03 Illumina Cambridge Limited Molecular arrays
EP3415641B1 (en) 2004-09-17 2023-11-01 Pacific Biosciences Of California, Inc. Method for analysis of molecules
EP1828412B2 (en) 2004-12-13 2019-01-09 Illumina Cambridge Limited Improved method of nucleotide detection
JP4990886B2 (ja) 2005-05-10 2012-08-01 ソレックサ リミテッド 改良ポリメラーゼ
US8045998B2 (en) 2005-06-08 2011-10-25 Cisco Technology, Inc. Method and system for communicating using position information
GB0514936D0 (en) 2005-07-20 2005-08-24 Solexa Ltd Preparation of templates for nucleic acid sequencing
GB0517097D0 (en) 2005-08-19 2005-09-28 Solexa Ltd Modified nucleosides and nucleotides and uses thereof
US7405281B2 (en) 2005-09-29 2008-07-29 Pacific Biosciences Of California, Inc. Fluorescent nucleotide analogs and uses therefor
GB0522310D0 (en) 2005-11-01 2005-12-07 Solexa Ltd Methods of preparing libraries of template polynucleotides
EP2021503A1 (en) 2006-03-17 2009-02-11 Solexa Ltd. Isothermal methods for creating clonal single molecule arrays
EP4105644A3 (en) 2006-03-31 2022-12-28 Illumina, Inc. Systems and devices for sequence by synthesis analysis
US20080242560A1 (en) 2006-11-21 2008-10-02 Gunderson Kevin L Methods for generating amplified nucleic acid arrays
US7595882B1 (en) 2008-04-14 2009-09-29 Geneal Electric Company Hollow-core waveguide-based raman systems and methods
PT3623481T (pt) 2011-09-23 2021-10-15 Illumina Inc Composições para sequenciação de ácidos nucleicos
US20160110499A1 (en) * 2014-10-21 2016-04-21 Life Technologies Corporation Methods, systems, and computer-readable media for blind deconvolution dephasing of nucleic acid sequencing data
CN111971748A (zh) * 2018-01-26 2020-11-20 宽腾矽公司 用于测序装置的机器学习使能脉冲及碱基判定
US11783917B2 (en) * 2019-03-21 2023-10-10 Illumina, Inc. Artificial intelligence-based base calling

Also Published As

Publication number Publication date
WO2023009758A1 (en) 2023-02-02
IL309786A (en) 2024-02-01
EP4377960A1 (en) 2024-06-05
KR20240037882A (ko) 2024-03-22
CA3223746A1 (en) 2023-02-02
JP2024532049A (ja) 2024-09-05

Similar Documents

Publication Publication Date Title
EP4107737B1 (en) Knowledge distillation and gradient pruning-based compression of artificial intelligence-based base caller
US20220300811A1 (en) Neural network parameter quantization for base calling
CN115136244A (zh) 基于人工智能的多对多碱基判读
AU2022319125A1 (en) Quality score calibration of basecalling systems
US20230041989A1 (en) Base calling using multiple base caller models
US20230029970A1 (en) Quality score calibration of basecalling systems
US20230026084A1 (en) Self-learned base caller, trained using organism sequences
US20220415445A1 (en) Self-learned base caller, trained using oligo sequences
CN117529780A (zh) 碱基检出系统的质量分数校准
KR20230157230A (ko) 염기 호출을 위한 타일 위치 및/또는 사이클 기반 가중치 세트 선택
KR20240027608A (ko) 유기체 서열을 사용하여 훈련된 자체-학습 염기 호출자
JP2024529843A (ja) 複数のベースコーラモデルを使用するベースコール
WO2022197752A1 (en) Tile location and/or cycle based weight set selection for base calling
CN117546248A (zh) 使用多个碱基检出器模型的碱基检出
CN117546249A (zh) 使用寡核苷酸序列训练的自学碱基检出器
EP4405955A1 (en) Compressed state-based base calling