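CN116018647A - Genomic information compression by configurable machine learning-based arithmetic coding - Google Patents
Genomic information compression by configurable machine learning-based arithmetic coding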
- Publication number
- CN116018647A (application CN202180056542.9A)
- Authority
- CN
- China
- Prior art keywords
- context
- type
- data
- coding
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4006—Conversion to or from arithmetic code
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6064—Selection of Compressor
- H03M7/6076—Selection between compressors of the same type
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3068—Precoding preceding compression, e.g. Burrows-Wheeler transformation
- H03M7/3079—Context modeling
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063050193P | 2020-07-10 | 2020-07-10 | |
| US63/050,193 | 2020-07-10 | | |
| PCT/EP2021/067960 WO2022008311A1 (en) | 2020-07-10 | 2021-06-30 | Genomic information compression by configurable machine learning-based arithmetic coding |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116018647A (zh) | 2023-04-25 |
Family
ID=76920753
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180056542.9A (pending; published as CN116018647A, zh) | 2020-07-10 | 2021-06-30 | Genomic information compression by configurable machine learning-based arithmetic coding |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US20230253074A1 |
| EP (1) | EP4179539B1 |
| JP (1) | JP7826277B2 |
| CN (1) | CN116018647A |
| ES (1) | ES3050587T3 |
| PL (1) | PL4179539T3 |
| WO (1) | WO2022008311A1 |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113810155B (zh) * | 2020-06-17 | 2022-11-18 | 华为技术有限公司 | Channel encoding and decoding method and communication apparatus |
| US11818399B2 (en) * | 2021-01-04 | 2023-11-14 | Tencent America LLC | Techniques for signaling neural network topology and parameters in the coded video stream |
| CN115391298B (zh) * | 2021-05-25 | 2026-03-27 | 戴尔产品有限公司 | Content-based dynamic hybrid data compression |
| EP4465207A4 (en) * | 2022-01-13 | 2025-10-15 | Lg Electronics Inc | METHOD BY WHICH A RECEIVING DEVICE PERFORMS END-TO-END LEARNING IN A WIRELESS COMMUNICATION SYSTEM, RECEIVING DEVICE, PROCESSING DEVICE, STORAGE MEDIUM, METHOD BY WHICH A TRANSMITTING DEVICE PERFORMS END-TO-END LEARNING, AND TRANSMITTING DEVICE |
| JP2025522817A (ja) * | 2022-06-30 | 2025-07-17 | 華為技術有限公司 | Adaptive selection of entropy coding parameters |
| CN115083530B (zh) * | 2022-08-22 | 2022-11-04 | 广州明领基因科技有限公司 | Gene sequencing data compression method, apparatus, terminal device, and storage medium |
| CN117692094B (zh) * | 2022-09-02 | 2026-03-20 | 北京邮电大学 | Encoding method, decoding method, encoding apparatus, decoding apparatus, and electronic device |
| CN116886104B (zh) * | 2023-09-08 | 2023-11-21 | 西安小草植物科技有限责任公司 | Artificial intelligence-based smart medical data analysis method |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106096327A (zh) * | 2016-06-07 | 2016-11-09 | 广州麦仑信息科技有限公司 | Gene trait identification method based on Torch supervised deep learning |
| WO2018151788A1 (en) * | 2017-02-14 | 2018-08-23 | Genomsys Sa | Method and systems for the efficient compression of genomic sequence reads |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2002655A1 (de) * | 2006-03-29 | 2008-12-17 | Nokia Siemens Networks Gmbh & Co. Kg | Method and device for creating a data block for a scalable data stream |
| CN107529709B (zh) * | 2011-06-16 | 2019-05-07 | Ge视频压缩有限责任公司 | Decoder, encoder, methods of decoding and encoding video, and storage medium |
| CN110663022B (zh) * | 2016-10-11 | 2024-03-15 | 耶诺姆希斯股份公司 | Method and apparatus for compact representation of bioinformatics data using genomic descriptors |
| CN108306650A (zh) * | 2018-01-16 | 2018-07-20 | 厦门极元科技有限公司 | Compression method for gene sequencing data |
| PL4100954T3 (pl) * | 2020-02-07 | 2026-01-26 | Koninklijke Philips N.V. | Improved framework for compressing quality values in aligned sequencing data based on novel contexts |
-
2021
- 2021-06-30 US US18/015,089 patent/US20230253074A1/en active Pending
- 2021-06-30 PL PL21742062.9T patent/PL4179539T3/pl unknown
- 2021-06-30 ES ES21742062T patent/ES3050587T3/es active Active
- 2021-06-30 EP EP21742062.9A patent/EP4179539B1/en active Active
- 2021-06-30 WO PCT/EP2021/067960 patent/WO2022008311A1/en not_active Ceased
- 2021-06-30 JP JP2023500391A patent/JP7826277B2/ja active Active
- 2021-06-30 CN CN202180056542.9A patent/CN116018647A/zh active Pending
Non-Patent Citations (1)
| Title |
|---|
| Wang Rongjie (王荣杰): "Research on lossless compression methods for high-throughput genomic data", Wanfang Dissertations (《万方学位论文》), 15 January 2020 (2020-01-15), pages 1-123 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230253074A1 (en) | 2023-08-10 |
| JP7826277B2 (ja) | 2026-03-09 |
| EP4179539C0 (en) | 2025-10-01 |
| EP4179539A1 (en) | 2023-05-17 |
| EP4179539B1 (en) | 2025-10-01 |
| JP2023535131A (ja) | 2023-08-16 |
| ES3050587T3 (en) | 2025-12-22 |
| WO2022008311A1 (en) | 2022-01-13 |
| PL4179539T3 (pl) | 2026-01-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN116018647A (zh) | | Genomic information compression by configurable machine learning-based arithmetic coding |
| Benoit et al. | | Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph |
| US12425044B2 (en) | | Federated large codeword model deep learning architecture |
| Yu et al. | | Two-level data compression using machine learning in time series database |
| Li et al. | | The similarity metric |
| JP7372347B2 (ja) | | Data compression method and computing device |
| JP7810664B2 (ja) | | Quality score compression |
| CN111105029A (zh) | | Neural network generation method, generation apparatus, and electronic device |
| US12596685B2 (en) | | System and methods for bandwidth-efficient data encoding |
| US20250379593A1 (en) | | Federated Byte Latent Transformer for Privacy-Preserving Deep Learning |
| US20260050776A1 (en) | | Cloud-Edge Collaborative Data Processing Method and System, Device, and Storage Medium |
| US11734231B2 (en) | | System and methods for bandwidth-efficient encoding of genomic data |
| US12423271B2 (en) | | System and methods for adaptive bandwidth-efficient encoding of genomic data |
| CN110915140B (zh) | | Method for encoding and decoding quality values of a data structure |
| Guo et al. | | SGB‐ELM: an advanced stochastic gradient boosting‐based ensemble scheme for extreme learning machine |
| US20250284393A1 (en) | | System and Method for Compaction of Floating-Point Numbers Within a Dataset with Metadata Tagging |
| CN112735392B (zh) | | Speech processing method, apparatus, device, and storage medium |
| US20250378308A1 (en) | | Latent transformer core for a large codeword model |
| US12499092B2 (en) | | System and method for sourceblock length optimization for data compaction |
| US11769570B2 (en) | | Method and systems for genome sequence compression |
| CN109698702B (zh) | | Gene sequencing data compression preprocessing method, system, and computer-readable medium |
| Izacard et al. | | Lossless Data Compression with Transformer |
| CN119338007B (zh) | | Model inference verification method, apparatus, electronic device, and storage medium |
| CN114023374A (zh) | | DNA channel simulation and coding optimization method and apparatus |
| US20250379592A1 (en) | | System and Method for Privacy-Preserving Federated Deep Learning with Distributed Model Optimization |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |