CN115668384A - Quality score compression - Google Patents
- Publication number
- CN115668384A (application CN202180039438.9A)
- Authority
- CN
- China
- Prior art keywords
- quality scores
- data
- quality
- sequence
- scores
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3068—Precoding preceding compression, e.g. Burrows-Wheeler transformation
- H03M7/3071—Prediction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4006—Conversion to or from arithmetic code
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6011—Encoder aspects
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6017—Methods or arrangements to increase the throughput
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6017—Methods or arrangements to increase the throughput
- H03M7/6029—Pipelining
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
- H03M7/702—Software
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
- H03M7/705—Unicode
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
- H03M7/707—Structured documents, e.g. XML
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Applications Or Details Of Rotary Compressors (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063110308P | 2020-11-05 | 2020-11-05 | |
| US63/110308 | 2020-11-05 | | |
| PCT/US2021/058364 WO2022099097A1 (en) | 2020-11-05 | 2021-11-05 | Quality score compression |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115668384A (zh) | 2023-01-31 |
Family ID: 78725748
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202180039438.9A (pending), published as CN115668384A (zh) | 2020-11-05 | 2021-11-05 | Quality score compression |
Country Status (12)
| Country | Publication |
|---|---|
| US (4) | US11527307B2 |
| EP (1) | EP4241276A1 |
| JP (1) | JP7810664B2 |
| KR (1) | KR20230101760A |
| CN (1) | CN115668384A |
| AU (1) | AU2021376411A1 |
| BR (1) | BR112022025042A2 |
| CA (1) | CA3174208A1 |
| IL (2) | IL298981B2 |
| MX (1) | MX2022016020A |
| WO (1) | WO2022099097A1 |
| ZA (2) | ZA202304367B |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022099097A1 (en) * | 2020-11-05 | 2022-05-12 | Illumina, Inc. | Quality score compression |
| JP2022086403A (ja) * | 2020-11-30 | 2022-06-09 | Kioxia Corporation (キオクシア株式会社) | Memory system and information processing system |
| EP4490735A1 (en) | 2022-03-08 | 2025-01-15 | Illumina Inc | Multi-pass software-accelerated genomic read mapping engine |
| US11775172B1 (en) * | 2022-05-05 | 2023-10-03 | CELLGENTEK Corp. | Genome data compression and transmission method for FASTQ-formatted genome data |
| CN115662525B (zh) * | 2022-10-25 | 2026-04-21 | Hunan University (湖南大学) | Sparsification method for quality score sequences of sequencing FASTQ files |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110288785A1 (en) * | 2010-05-18 | 2011-11-24 | Translational Genomics Research Institute (Tgen) | Compression of genomic base and annotation data |
| US20130031092A1 (en) * | 2010-04-26 | 2013-01-31 | Samsung Electronics Co., Ltd. | Method and apparatus for compressing genetic data |
| CN110021349A (zh) * | 2017-07-31 | 2019-07-16 | 北京哲源科技有限责任公司 | Encoding method for genetic data |
| CN110111852A (zh) * | 2018-01-11 | 2019-08-09 | 广州明领基因科技有限公司 | Lossless, fast compression platform for massive DNA sequencing data |
| CN110797082A (zh) * | 2019-10-24 | 2020-02-14 | 福建和瑞基因科技有限公司 | Method and system for storing and reading gene sequencing data |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100205204A1 (en) | 2007-03-02 | 2010-08-12 | Research Organization Of Information And Systems | Homology retrieval system, homology retrieval apparatus, and homology retrieval method |
| AU2012272161B2 (en) * | 2011-06-21 | 2015-12-24 | Illumina Cambridge Limited | Methods and systems for data analysis |
| US10777301B2 (en) * | 2012-07-13 | 2020-09-15 | Pacific Biosciences of California, Inc. | Hierarchical genome assembly method using single long insert library |
| US10847251B2 (en) * | 2013-01-17 | 2020-11-24 | Illumina, Inc. | Genomic infrastructure for on-site or cloud-based DNA and RNA processing and analysis |
| WO2014197377A2 (en) * | 2013-06-03 | 2014-12-11 | Good Start Genetics, Inc. | Methods and systems for storing sequence read data |
| WO2016081712A1 (en) * | 2014-11-19 | 2016-05-26 | Bigdatabio, Llc | Systems and methods for genomic manipulations and analysis |
| CN107851118A (zh) * | 2015-05-21 | 2018-03-27 | 基因福米卡数据系统有限公司 | Storage, transmission and compression of next-generation sequencing data |
| JP6949970B2 (ja) | 2016-10-11 | 2021-10-13 | Genomsys SA (ゲノムシス エスアー) | Method and system for transmitting bioinformatics data |
| WO2022099097A1 (en) | 2020-11-05 | 2022-05-12 | Illumina, Inc. | Quality score compression |
2021

- 2021-11-05 WO PCT/US2021/058364 → WO2022099097A1 (en), not active (ceased)
- 2021-11-05 CN CN202180039438.9A → CN115668384A (zh), active (pending)
- 2021-11-05 KR KR1020227044606A → KR20230101760A (ko), active (pending)
- 2021-11-05 EP EP21811704.2A → EP4241276A1 (en), active (pending)
- 2021-11-05 JP JP2022575435A → JP7810664B2 (ja), active
- 2021-11-05 IL IL298981A → IL298981B2 (en), status unknown
- 2021-11-05 BR BR112022025042A → BR112022025042A2 (pt), status unknown
- 2021-11-05 CA CA3174208A → CA3174208A1 (en), active (pending)
- 2021-11-05 IL IL316156A → IL316156B1 (en), status unknown
- 2021-11-05 US US17/520,615 → US11527307B2 (en), active
- 2021-11-05 AU AU2021376411A → AU2021376411A1 (en), active (pending)
- 2021-11-05 MX MX2022016020A → MX2022016020A (es), status unknown

2022

- 2022-10-27 US US17/974,978 → US11776663B2 (en), active

2023

- 2023-04-13 ZA ZA2023/04367A → ZA202304367B (en), status unknown
- 2023-08-23 US US18/237,187 → US12080385B2 (en), active

2024

- 2024-04-17 ZA ZA2024/02955A → ZA202402955B (en), status unknown
- 2024-08-28 US US18/817,560 → US20240420804A1 (en), active (pending)
Non-Patent Citations (2)
| Title |
|---|
| Raymond Wan et al.: "Transformations for the compression of FASTQ quality scores of next-generation sequencing data", Bioinformatics, vol. 28, no. 5, 1 March 2012 (2012-03-01), pages 628-635 * |
| 陈惟昌, 陈志华, 陈志义, 王自强, 邱红霞: "High-dimensional spatial digital encoding of the genetic code and DNA sequences", Acta Biophysica Sinica (生物物理学报), no. 4, 30 December 2000 (2000-12-30) * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7810664B2 (ja) | 2026-02-03 |
| IL298981B2 (en) | 2025-03-01 |
| ZA202304367B (en) | 2023-12-20 |
| KR20230101760A (ko) | 2023-07-06 |
| IL316156B1 (en) | 2026-04-01 |
| IL298981A (en) | 2023-02-01 |
| WO2022099097A1 (en) | 2022-05-12 |
| ZA202402955B (en) | 2025-04-30 |
| BR112022025042A2 (pt) | 2023-05-09 |
| IL316156A (en) | 2024-12-01 |
| US20220139502A1 (en) | 2022-05-05 |
| US20240420804A1 (en) | 2024-12-19 |
| JP2023547973A (ja) | 2023-11-15 |
| MX2022016020A (es) | 2023-02-02 |
| US20230040143A1 (en) | 2023-02-09 |
| US12080385B2 (en) | 2024-09-03 |
| AU2021376411A1 (en) | 2022-10-27 |
| US20240062853A1 (en) | 2024-02-22 |
| CA3174208A1 (en) | 2022-05-12 |
| US11527307B2 (en) | 2022-12-13 |
| IL298981B1 (en) | 2024-11-01 |
| EP4241276A1 (en) | 2023-09-13 |
| US11776663B2 (en) | 2023-10-03 |
Similar Documents
| Publication | Title |
|---|---|
| US12080385B2 (en) | Quality score compression |
| Cánovas et al. | Lossy compression of quality scores in genomic data |
| US10790044B2 (en) | Systems and methods for sequence encoding, storage, and compression |
| US9929746B2 (en) | Methods and systems for data analysis and compression |
| Grabowski et al. | Disk-based compression of data from genome sequencing |
| US20150227686A1 (en) | Lossless compression of DNA sequences |
| Li et al. | DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique |
| CN108717461B (zh) | Method, apparatus, computer device and storage medium for structuring massive data |
| JP2025124637A (ja) | Method for compressing genome sequence data |
| Mansouri et al. | One-bit DNA compression algorithm |
| US10460829B2 (en) | Systems and methods for encoding genetic variation for a population |
| CN110377822A (zh) | Method, apparatus and electronic device for network representation learning |
| Cunial et al. | A framework for space-efficient variable-order Markov models |
| Liu et al. | Quality scores compression of genomic sequencing data: a comprehensive review and performance evaluation |
| Wang et al. | smallWig: parallel compression of RNA-seq WIG files |
| Nazari et al. | Lossless and reference-free compression of FASTQ/A files using GeneSqueeze |
| Shomorony et al. | Sketching and sequence alignment: A rate-distortion perspective |
| Rahman et al. | CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments |
| Gupta et al. | An efficient compressor for biological sequences |
| Sun et al. | An intelligent ubiquitous compression technique for DNA sequencing using Hadoop |
| Arun et al. | A novel DNA sequence compression method based on chaos game representation |
| Venugopal et al. | Probabilistic Approach for DNA Compression |
| Punitha et al. | A Novel Algorithm for DNA Sequence Compression |
| WO2024186229A1 (en) | Data compression method and apparatus |
| Suaste Morales | Lossy Compression of Quality Values in Next-Generation Sequencing Data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |