IL298981B2 - Compression quality score - Google Patents

Compression quality score

Info

Publication number
IL298981B2
Authority
IL
Israel
Prior art keywords
quality scores
sequence
quality
data
quality score
Application number
IL298981A
Other languages
English (en)
Hebrew (he)
Other versions
IL298981A (en)
IL298981B1 (en)
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-05
Filing date
2021-11-05
Publication date
2025-03-01
Application filed by Illumina Inc
Publication of IL298981A
Publication of IL298981B1
Publication of IL298981B2

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071 Prediction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6011 Encoder aspects
    • H03M7/6017 Methods or arrangements to increase the throughput
    • H03M7/6029 Pipelining
    • H03M7/70 Type of the data to be coded, other than image and sound
    • H03M7/702 Software
    • H03M7/705 Unicode
    • H03M7/707 Structured documents, e.g. XML

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Applications Or Details Of Rotary Compressors (AREA)
IL298981A 2020-11-05 2021-11-05 Compression quality score IL298981B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063110308P 2020-11-05 2020-11-05
PCT/US2021/058364 WO2022099097A1 (en) 2020-11-05 2021-11-05 Quality score compression

Publications (3)

Publication Number Publication Date
IL298981A (en) 2023-02-01
IL298981B1 (en) 2024-11-01
IL298981B2 (en) 2025-03-01

Family

ID=78725748

Family Applications (2)

Application Number Priority Date Filing Date Title
IL298981A IL298981B2 (en) 2020-11-05 2021-11-05 Compression quality score
IL316156A IL316156B1 (en) 2020-11-05 2021-11-05 Compression quality score

Family Applications After (1)

Application Number Priority Date Filing Date Title
IL316156A IL316156B1 (en) 2020-11-05 2021-11-05 Compression quality score

Country Status (12)

Country Link
US (4) US11527307B2
EP (1) EP4241276A1
JP (1) JP7810664B2
KR (1) KR20230101760A
CN (1) CN115668384A
AU (1) AU2021376411A1
BR (1) BR112022025042A2
CA (1) CA3174208A1
IL (2) IL298981B2
MX (1) MX2022016020A
WO (1) WO2022099097A1
ZA (2) ZA202304367B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099097A1 (en) * 2020-11-05 2022-05-12 Illumina, Inc. Quality score compression
JP2022086403A (ja) * 2020-11-30 2022-06-09 キオクシア株式会社 Memory system and information processing system
EP4490735A1 (en) 2022-03-08 2025-01-15 Illumina Inc Multi-pass software-accelerated genomic read mapping engine
US11775172B1 (en) * 2022-05-05 2023-10-03 CELLGENTEK Corp. Genome data compression and transmission method for FASTQ-formatted genome data
CN115662525B (zh) * 2022-10-25 2026-04-21 湖南大学 Sparsification method for quality score sequences of sequencing FASTQ files

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100205204A1 (en) 2007-03-02 2010-08-12 Research Organization Of Information And Systems Homology retrieval system, homology retrieval apparatus, and homology retrieval method
US10090857B2 (en) * 2010-04-26 2018-10-02 Samsung Electronics Co., Ltd. Method and apparatus for compressing genetic data
US20110288785A1 (en) * 2010-05-18 2011-11-24 Translational Genomics Research Institute (Tgen) Compression of genomic base and annotation data
AU2012272161B2 (en) * 2011-06-21 2015-12-24 Illumina Cambridge Limited Methods and systems for data analysis
US10777301B2 (en) * 2012-07-13 2020-09-15 Pacific Biosciences of California, Inc. Hierarchical genome assembly method using single long insert library
US10847251B2 (en) * 2013-01-17 2020-11-24 Illumina, Inc. Genomic infrastructure for on-site or cloud-based DNA and RNA processing and analysis
WO2014197377A2 (en) * 2013-06-03 2014-12-11 Good Start Genetics, Inc. Methods and systems for storing sequence read data
WO2016081712A1 (en) * 2014-11-19 2016-05-26 Bigdatabio, Llc Systems and methods for genomic manipulations and analysis
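CN107851118A (zh) * 2015-05-21 2018-03-27 基因福米卡数据系统有限公司 Storage, transmission and compression of next-generation sequencing data
JP6949970B2 (ja) 2016-10-11 2021-10-13 ゲノムシス エスアー Method and system for transmitting bioinformatics data
CN110021349B (zh) * 2017-07-31 2021-02-02 北京哲源科技有限责任公司 Encoding method for gene data
CN110111852A (zh) * 2018-01-11 2019-08-09 广州明领基因科技有限公司 Lossless fast compression platform for massive DNA sequencing data
CN110797082A (zh) * 2019-10-24 2020-02-14 福建和瑞基因科技有限公司 Method and system for storing and reading gene sequencing data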
WO2022099097A1 (en) 2020-11-05 2022-05-12 Illumina, Inc. Quality score compression

Also Published As

Publication number Publication date
JP7810664B2 (ja) 2026-02-03
CN115668384A (zh) 2023-01-31
ZA202304367B (en) 2023-12-20
KR20230101760A (ko) 2023-07-06
IL316156B1 (en) 2026-04-01
IL298981A (en) 2023-02-01
WO2022099097A1 (en) 2022-05-12
ZA202402955B (en) 2025-04-30
BR112022025042A2 (pt) 2023-05-09
IL316156A (en) 2024-12-01
US20220139502A1 (en) 2022-05-05
US20240420804A1 (en) 2024-12-19
JP2023547973A (ja) 2023-11-15
MX2022016020A (es) 2023-02-02
US20230040143A1 (en) 2023-02-09
US12080385B2 (en) 2024-09-03
AU2021376411A1 (en) 2022-10-27
US20240062853A1 (en) 2024-02-22
CA3174208A1 (en) 2022-05-12
US11527307B2 (en) 2022-12-13
IL298981B1 (en) 2024-11-01
EP4241276A1 (en) 2023-09-13
US11776663B2 (en) 2023-10-03

Similar Documents

Publication Publication Date Title
IL298981B2 (en) Compression quality score
US20220383206A1 (en) Task Augmentation and Self-Training for Improved Few-Shot Learning
IL279558B2 (en) Flexible kernel extension for genomic mapping using a hash table
IL288276B2 (en) Deep learning-based framework for identifying sequence patterns that cause sequence-specific errors (sses)
US20230098398A1 (en) Molecular structure reconstruction method and apparatus, device, storage medium, and program product
IL298947A (en) Analysis method for genetic sequencing, device, storage media and computer
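CN111813930A (zh) Similar document retrieval method and apparatus
CN109086273A (zh) Method, apparatus and terminal device for answering grammar fill-in-the-blank questions based on a neural network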
IL301077A (en) Bot supervision
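CN110070914B (zh) Gene sequence identification method, system and computer-readable storage medium
CN113574603B (zh) Rapid detection of gene fusions
CN113886560A (zh) Method and apparatus for recommending court trial questions
JP2024540870A (ja) Encoding/decoding system and method
CN115345181A (zh) Training method for a neural machine translation model, translation method and apparatus
CN115269830A (zh) Abnormal text detection model training method, abnormal text detection method and apparatus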
Rao et al. Multi-modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery
IL298979B2 (en) Software-accelerated genomic read mapping
CN117609495A (zh) Text feature classification method, classification apparatus, electronic device and storage medium
IL294741A (en) Cumulative secondary analysis of nucleic acid sequences
CN114357183A (zh) Entity relation extraction method, apparatus, device, medium and program product
WO2023055614A1 (en) Embedding compression for efficient representation learning in graph
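CN118251843A (zh) Encoding/decoding system and method
CN116484252A (zh) Data classification method and system based on encoding-corrected BERT CNN
CN115374418A (zh) Emotion authentication method, emotion authentication apparatus, storage medium and electronic device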
IL309509A (en) Repetitive activation of a neuron network cell to perform multiple actions in a single invocation