CA3174208A1 - Quality score compression - Google Patents

Quality score compression

Info

Publication number
CA3174208A1
Authority
CA
Canada
Prior art keywords
quality score
data
quality
quality scores
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3174208A
Other languages
English (en)
French (fr)
Inventor
Guillaume Alexandre Pascal RIZK
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Illumina Inc
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-05
Filing date
2021-11-05
Publication date
2022-05-12
Application filed by Illumina Inc
Publication of CA3174208A1
Current legal status: Pending

Classifications

    • G PHYSICS
      • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
          • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
          • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
            • G16B50/50 Compression of genetic data
    • H ELECTRICITY
      • H03 ELECTRONIC CIRCUITRY
        • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
          • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
            • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
              • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
                • H03M7/3071 Prediction
              • H03M7/60 General implementation details not specific to a particular type of compression
                • H03M7/6011 Encoder aspects
                • H03M7/6017 Methods or arrangements to increase the throughput
                  • H03M7/6029 Pipelining
              • H03M7/70 Type of the data to be coded, other than image and sound
                • H03M7/702 Software
                • H03M7/705 Unicode
                • H03M7/707 Structured documents, e.g. XML
              • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
              • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
                • H03M7/4006 Conversion to or from arithmetic code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Applications Or Details Of Rotary Compressors (AREA)
CA3174208A 2020-11-05 2021-11-05 Quality score compression Pending CA3174208A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063110308P 2020-11-05 2020-11-05
US63/110,308 2020-11-05
PCT/US2021/058364 WO2022099097A1 (en) 2020-11-05 2021-11-05 Quality score compression

Publications (1)

Publication Number Publication Date
CA3174208A1 (en) 2022-05-12

Family

ID=78725748

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3174208A Pending CA3174208A1 (en) 2020-11-05 2021-11-05 Quality score compression

Country Status (12)

Country Link
US (4) US11527307B2
EP (1) EP4241276A1
JP (1) JP7810664B2
KR (1) KR20230101760A
CN (1) CN115668384A
AU (1) AU2021376411A1
BR (1) BR112022025042A2
CA (1) CA3174208A1
IL (2) IL298981B2
MX (1) MX2022016020A
WO (1) WO2022099097A1
ZA (2) ZA202304367B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099097A1 (en) * 2020-11-05 2022-05-12 Illumina, Inc. Quality score compression
JP2022086403A (ja) * 2020-11-30 2022-06-09 Kioxia Corporation Memory system and information processing system
EP4490735A1 (en) 2022-03-08 2025-01-15 Illumina Inc Multi-pass software-accelerated genomic read mapping engine
US11775172B1 (en) * 2022-05-05 2023-10-03 CELLGENTEK Corp. Genome data compression and transmission method for FASTQ-formatted genome data
CN115662525B (zh) * 2022-10-25 2026-04-21 Hunan University Sparsification processing method for quality score sequences of sequencing FASTQ files

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US20100205204A1 (en) 2007-03-02 2010-08-12 Research Organization Of Information And Systems Homology retrieval system, homology retrieval apparatus, and homology retrieval method
US10090857B2 (en) * 2010-04-26 2018-10-02 Samsung Electronics Co., Ltd. Method and apparatus for compressing genetic data
US20110288785A1 (en) * 2010-05-18 2011-11-24 Translational Genomics Research Institute (Tgen) Compression of genomic base and annotation data
AU2012272161B2 (en) * 2011-06-21 2015-12-24 Illumina Cambridge Limited Methods and systems for data analysis
US10777301B2 (en) * 2012-07-13 2020-09-15 Pacific Biosciences For California, Inc. Hierarchical genome assembly method using single long insert library
US10847251B2 (en) * 2013-01-17 2020-11-24 Illumina, Inc. Genomic infrastructure for on-site or cloud-based DNA and RNA processing and analysis
WO2014197377A2 (en) * 2013-06-03 2014-12-11 Good Start Genetics, Inc. Methods and systems for storing sequence read data
WO2016081712A1 (en) * 2014-11-19 2016-05-26 Bigdatabio, Llc Systems and methods for genomic manipulations and analysis
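CN107851118A (zh) * 2015-05-21 2018-03-27 Geneformics Data Systems Ltd. Storage, transmission and compression of next-generation sequencing data
JP6949970B2 (ja) 2016-10-11 2021-10-13 Genomsys SA Method and system for transmitting bioinformatics data
CN110021349B (zh) * 2017-07-31 2021-02-02 Beijing Zheyuan Technology Co., Ltd. Encoding method for genetic data
CN110111852A (zh) * 2018-01-11 2019-08-09 Guangzhou Mingling Gene Technology Co., Ltd. Lossless fast compression platform for massive DNA sequencing data
CN110797082A (zh) * 2019-10-24 2020-02-14 Fujian Herui Gene Technology Co., Ltd. Method and system for storing and reading gene sequencing data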
WO2022099097A1 (en) 2020-11-05 2022-05-12 Illumina, Inc. Quality score compression

Also Published As

Publication number Publication date
JP7810664B2 (ja) 2026-02-03
CN115668384A (zh) 2023-01-31
IL298981B2 (en) 2025-03-01
ZA202304367B (en) 2023-12-20
KR20230101760A (ko) 2023-07-06
IL316156B1 (en) 2026-04-01
IL298981A (en) 2023-02-01
WO2022099097A1 (en) 2022-05-12
ZA202402955B (en) 2025-04-30
BR112022025042A2 (pt) 2023-05-09
IL316156A (en) 2024-12-01
US20220139502A1 (en) 2022-05-05
US20240420804A1 (en) 2024-12-19
JP2023547973A (ja) 2023-11-15
MX2022016020A (es) 2023-02-02
US20230040143A1 (en) 2023-02-09
US12080385B2 (en) 2024-09-03
AU2021376411A1 (en) 2022-10-27
US20240062853A1 (en) 2024-02-22
US11527307B2 (en) 2022-12-13
IL298981B1 (en) 2024-11-01
EP4241276A1 (en) 2023-09-13
US11776663B2 (en) 2023-10-03

Similar Documents

Publication Publication Date Title
US12080385B2 (en) Quality score compression
Cánovas et al. Lossy compression of quality scores in genomic data
Wandelt et al. Trends in genome compression
US20130304391A1 (en) Transmission and compression of genetic data
US20240061843A1 (en) Flexible Seed Extension for Hash Table Genomic Mapping
EP4075438B1 (en) Efficient data structures for bioinformatics information representation
CN110070914B (zh) Gene sequence identification method, system, and computer-readable storage medium
Li et al. DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique
JP2025124637A (ja) Method for compressing genome sequence data
Azad et al. Interpreting genomic data via entropic dissection
CN113574603A (zh) Rapid detection of gene fusions
Mansouri et al. One-bit DNA compression algorithm
EP3583249A1 (en) Method and systems for the reconstruction of genomic reference sequences from compressed genomic sequence reads
US10460829B2 (en) Systems and methods for encoding genetic variation for a population
EP4086912B1 (en) A method and a system for profiling of a metagenome sample
Liu et al. Quality scores compression of genomic sequencing data: a comprehensive review and performance evaluation
Wang et al. smallWig: parallel compression of RNA-seq WIG files
Nazari et al. Lossless and reference-free compression of FASTQ/A files using GeneSqueeze
Sun et al. An intelligent ubiquitous compression technique for DNA sequencing using Hadoop
Rahman et al. CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments
Prasad A New Revisited Compression Technique through Innovative Partition Group Binary Compression: A Novel Approach
Numanagic Efficient high throughput sequencing data compression and genotyping methods for clinical environments
Goktas K-mer-based data structures and pipelines for sequence mapping and analysis
KR20230069046A (ko) Software-accelerated genomic read mapping
CN112464011A (zh) Data retrieval method and apparatus

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220929

D00 Search and/or examination requested or commenced

Free format text: ST27 STATUS EVENT CODE: A-2-2-D10-D00-D120 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: VOLUNTARY SUBMISSION OF PRIOR ART RECEIVED

Effective date: 20240806

P11 Amendment of application requested

Free format text: ST27 STATUS EVENT CODE: A-2-2-P10-P11-P100 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: AMENDMENT RECEIVED - RESPONSE TO EXAMINER'S REQUISITION

Effective date: 20240806

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 3RD ANNIV.) - STANDARD

Year of fee payment: 3

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-2-2-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20241104

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-2-2-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT DETERMINED COMPLIANT

Effective date: 20241104

Free format text: ST27 STATUS EVENT CODE: A-2-2-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20241104

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-2-2-W10-W00-W111 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CORRESPONDENT DETERMINED COMPLIANT

Effective date: 20250122

D00 Search and/or examination requested or commenced

Free format text: ST27 STATUS EVENT CODE: A-2-2-D10-D00-D123 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: PRIOR ART DISCLOSURE DETERMINED COMPLIANT

Effective date: 20250320

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-2-2-W10-W00-W100 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: LETTER SENT

Effective date: 20250320

P11 Amendment of application requested

Free format text: ST27 STATUS EVENT CODE: A-2-2-P10-P11-P102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: AMENDMENT DETERMINED COMPLIANT

Effective date: 20250328

P13 Application amended

Free format text: ST27 STATUS EVENT CODE: A-2-2-P10-P13-X000 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: APPLICATION AMENDED

Effective date: 20250328

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 4TH ANNIV.) - STANDARD

Year of fee payment: 4

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-2-2-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20251014

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-2-2-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20251014

D22 Grant of IP right intended

Free format text: ST27 STATUS EVENT CODE: A-2-2-D10-D22-D128 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: ALLOWANCE REQUIREMENTS DETERMINED COMPLIANT

Effective date: 20251113

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-2-2-W10-W00-W100 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: LETTER SENT

Effective date: 20251114

D11 Substantive examination requested

Free format text: ST27 STATUS EVENT CODE: A-2-2-D10-D11-D131 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: REQUEST FOR CONTINUED EXAMINATION (RCE) RECEIVED

Effective date: 20260316

P11 Amendment of application requested

Free format text: ST27 STATUS EVENT CODE: A-2-2-P10-P11-P101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: AMENDMENT RECEIVED - VOLUNTARY AMENDMENT

Effective date: 20260316