CN116018647A - Genomic information compression by configurable machine learning-based arithmetic coding - Google Patents

Genomic information compression by configurable machine learning-based arithmetic coding

Info

Publication number
CN116018647A
CN116018647A CN202180056542.9A CN202180056542A CN116018647A CN 116018647 A CN116018647 A CN 116018647A CN 202180056542 A CN202180056542 A CN 202180056542A CN 116018647 A CN116018647 A CN 116018647A
Authority
CN
China
Prior art keywords
context
type
data
coding
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180056542.9A
Other languages
English (en)
Chinese (zh)
Inventor
S·尚达科
张贻谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips NV
Publication of CN116018647A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6064 Selection of Compressor
    • H03M7/6076 Selection between compressors of the same type
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3079 Context modeling

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
CN202180056542.9A 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding Pending CN116018647A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063050193P 2020-07-10 2020-07-10
US63/050,193 2020-07-10
PCT/EP2021/067960 WO2022008311A1 (en) 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding

Publications (1)

Publication Number Publication Date
CN116018647A (zh) 2023-04-25

Family

ID=76920753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180056542.9A Pending CN116018647A (zh) Genomic information compression by configurable machine learning-based arithmetic coding

Country Status (7)

Country Link
US (1) US20230253074A1
EP (1) EP4179539B1
JP (1) JP7826277B2
CN (1) CN116018647A
ES (1) ES3050587T3
PL (1) PL4179539T3
WO (1) WO2022008311A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810155B (zh) * 2020-06-17 2022-11-18 华为技术有限公司 Channel encoding and decoding method and communication apparatus
US11818399B2 (en) * 2021-01-04 2023-11-14 Tencent America LLC Techniques for signaling neural network topology and parameters in the coded video stream
CN115391298B (zh) * 2021-05-25 2026-03-27 戴尔产品有限公司 Content-based dynamic hybrid data compression
EP4465207A4 (en) * 2022-01-13 2025-10-15 Lg Electronics Inc METHOD BY WHICH A RECEIVING DEVICE PERFORMS END-TO-END LEARNING IN A WIRELESS COMMUNICATION SYSTEM, RECEIVING DEVICE, PROCESSING DEVICE, STORAGE MEDIUM, METHOD BY WHICH A TRANSMITTING DEVICE PERFORMS END-TO-END LEARNING, AND TRANSMITTING DEVICE
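JP2025522817A (ja) * 2022-06-30 2025-07-17 華為技術有限公司 Adaptive selection of entropy coding parameters
CN115083530B (zh) * 2022-08-22 2022-11-04 广州明领基因科技有限公司 Gene sequencing data compression method, apparatus, terminal device, and storage medium
CN117692094B (zh) * 2022-09-02 2026-03-20 北京邮电大学 Encoding method, decoding method, encoding apparatus, decoding apparatus, and electronic device
CN116886104B (zh) * 2023-09-08 2023-11-21 西安小草植物科技有限责任公司 Artificial intelligence-based smart medical data analysis method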

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096327A (zh) * 2016-06-07 2016-11-09 广州麦仑信息科技有限公司 Gene trait recognition method based on Torch supervised deep learning
WO2018151788A1 (en) * 2017-02-14 2018-08-23 Genomsys Sa Method and systems for the efficient compression of genomic sequence reads

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2002655A1 (de) * 2006-03-29 2008-12-17 Nokia Siemens Networks Gmbh & Co. Kg Method and device for creating a data block for a scalable data stream
CN107529709B (zh) * 2011-06-16 2019-05-07 Ge视频压缩有限责任公司 Decoder, encoder, methods for decoding and encoding video, and storage medium
CN110663022B (zh) * 2016-10-11 2024-03-15 耶诺姆希斯股份公司 Method and apparatus for compact representation of bioinformatics data using genomic descriptors
CN108306650A (zh) * 2018-01-16 2018-07-20 厦门极元科技有限公司 Compression method for gene sequencing data
PL4100954T3 (pl) * 2020-02-07 2026-01-26 Koninklijke Philips N.V. Improved quality value compression framework in aligned sequencing data based on novel contexts

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王荣杰: "Research on lossless compression methods for high-throughput genomic data", Wanfang Dissertations, 15 January 2020 (2020-01-15), pages 1 - 123 *

Also Published As

Publication number Publication date
US20230253074A1 (en) 2023-08-10
JP7826277B2 (ja) 2026-03-09
EP4179539C0 (en) 2025-10-01
EP4179539A1 (en) 2023-05-17
EP4179539B1 (en) 2025-10-01
JP2023535131A (ja) 2023-08-16
ES3050587T3 (en) 2025-12-22
WO2022008311A1 (en) 2022-01-13
PL4179539T3 (pl) 2026-01-05

Similar Documents

Publication Publication Date Title
CN116018647A (zh) Genomic information compression by configurable machine learning-based arithmetic coding
Benoit et al. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph
US12425044B2 (en) Federated large codeword model deep learning architecture
Yu et al. Two-level data compression using machine learning in time series database
Li et al. The similarity metric
JP7372347B2 (ja) Data compression method and computing device
JP7810664B2 (ja) Quality score compression
CN111105029A (zh) Neural network generation method, generation apparatus, and electronic device
US12596685B2 (en) System and methods for bandwidth-efficient data encoding
US20250379593A1 (en) Federated Byte Latent Transformer for Privacy-Preserving Deep Learning
US20260050776A1 (en) Cloud-Edge Collaborative Data Processing Method and System, Device, and Storage Medium
US11734231B2 (en) System and methods for bandwidth-efficient encoding of genomic data
US12423271B2 (en) System and methods for adaptive bandwidth-efficient encoding of genomic data
CN110915140B (zh) Method for encoding and decoding quality values of a data structure
Guo et al. SGB‐ELM: an advanced stochastic gradient boosting‐based ensemble scheme for extreme learning machine
US20250284393A1 (en) System and Method for Compaction of Floating-Point Numbers Within a Dataset with Metadata Tagging
CN112735392B (zh) Speech processing method, apparatus, device, and storage medium
US20250378308A1 (en) Latent transformer core for a large codeword model
US12499092B2 (en) System and method for sourceblock length optimization for data compaction
US11769570B2 (en) Method and systems for genome sequence compression
CN109698702B (zh) Gene sequencing data compression preprocessing method, system, and computer-readable medium
Izacard et al. Lossless Data Compression with Transformer
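CN119338007B (zh) Model inference verification method, apparatus, electronic device, and storage medium
CN114023374A (zh) DNA channel simulation and coding optimization method and apparatus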
US20250379592A1 (en) System and Method for Privacy-Preserving Federated Deep Learning with Distributed Model Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination