ES3050587T3 - Genomic information compression by configurable machine learning-based arithmetic coding

Genomic information compression by configurable machine learning-based arithmetic coding

Info

Publication number
ES3050587T3
ES3050587T3 ES21742062T ES21742062T ES3050587T3 ES 3050587 T3 ES3050587 T3 ES 3050587T3 ES 21742062 T ES21742062 T ES 21742062T ES 21742062 T ES21742062 T ES 21742062T ES 3050587 T3 ES3050587 T3 ES 3050587T3
Authority
ES
Spain
Prior art keywords
context
encoding
data
type
training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
ES21742062T
Other languages
English (en)
Spanish (es)
Inventor
Shubham Chandak
Yee Cheung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-10
Filing date
2021-06-30
Publication date
2025-12-22

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6064 Selection of Compressor
    • H03M7/6076 Selection between compressors of the same type
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3079 Context modeling

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
ES21742062T 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding Active ES3050587T3 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063050193P 2020-07-10 2020-07-10
PCT/EP2021/067960 WO2022008311A1 (en) 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding

Publications (1)

Publication Number Publication Date
ES3050587T3 (en) 2025-12-22

Family

ID=76920753

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
ES21742062T Active ES3050587T3 (en) 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding

Country Status (7)

Country Link
US (1) US20230253074A1
EP (1) EP4179539B1
JP (1) JP7826277B2
CN (1) CN116018647A
ES (1) ES3050587T3
PL (1) PL4179539T3
WO (1) WO2022008311A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810155B (zh) * 2020-06-17 2022-11-18 华为技术有限公司 Channel encoding and decoding method and communication apparatus
US11818399B2 (en) * 2021-01-04 2023-11-14 Tencent America LLC Techniques for signaling neural network topology and parameters in the coded video stream
CN115391298B (zh) * 2021-05-25 2026-03-27 戴尔产品有限公司 Content-based dynamic hybrid data compression
EP4465207A4 (en) * 2022-01-13 2025-10-15 Lg Electronics Inc METHOD BY WHICH A RECEIVING DEVICE PERFORMS END-TO-END LEARNING IN A WIRELESS COMMUNICATION SYSTEM, RECEIVING DEVICE, PROCESSING DEVICE, STORAGE MEDIUM, METHOD BY WHICH A TRANSMITTING DEVICE PERFORMS END-TO-END LEARNING, AND TRANSMITTING DEVICE
JP2025522817A (ja) * 2022-06-30 2025-07-17 華為技術有限公司 Adaptive selection of entropy coding parameters
CN115083530B (zh) * 2022-08-22 2022-11-04 广州明领基因科技有限公司 Gene sequencing data compression method, apparatus, terminal device, and storage medium
CN117692094B (zh) * 2022-09-02 2026-03-20 北京邮电大学 Encoding method, decoding method, encoding apparatus, decoding apparatus, and electronic device
CN116886104B (zh) * 2023-09-08 2023-11-21 西安小草植物科技有限责任公司 Artificial intelligence-based smart medical data analysis method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2002655A1 (de) * 2006-03-29 2008-12-17 Nokia Siemens Networks Gmbh & Co. Kg Method and device for creating a data block for a scalable data stream
CN107529709B (zh) * 2011-06-16 2019-05-07 Ge视频压缩有限责任公司 Decoder, encoder, methods for decoding and encoding video, and storage medium
CN106096327B (zh) * 2016-06-07 2018-08-17 广州麦仑信息科技有限公司 Gene trait recognition method based on Torch supervised deep learning
CN110663022B (zh) * 2016-10-11 2024-03-15 耶诺姆希斯股份公司 Method and apparatus for compact representation of bioinformatics data using genomic descriptors
EP3583250B1 (en) * 2017-02-14 2023-07-12 Genomsys SA Method and systems for the efficient compression of genomic sequence reads
CN108306650A (zh) * 2018-01-16 2018-07-20 厦门极元科技有限公司 Compression method for gene sequencing data
PL4100954T3 (pl) * 2020-02-07 2026-01-26 Koninklijke Philips N.V. Improved quality value compression framework in aligned sequencing data based on novel contexts

Also Published As

Publication number Publication date
US20230253074A1 (en) 2023-08-10
JP7826277B2 (ja) 2026-03-09
EP4179539C0 (en) 2025-10-01
EP4179539A1 (en) 2023-05-17
CN116018647A (zh) 2023-04-25
EP4179539B1 (en) 2025-10-01
JP2023535131A (ja) 2023-08-16
WO2022008311A1 (en) 2022-01-13
PL4179539T3 (pl) 2026-01-05

Similar Documents

Publication Publication Date Title
ES3050587T3 (en) Genomic information compression by configurable machine learning-based arithmetic coding
Benoit et al. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph
Sabary et al. Survey for a decade of coding for DNA storage
US12596685B2 (en) System and methods for bandwidth-efficient data encoding
US20250047300A1 (en) System and method for data processing and transformation using reference data structures
US12368453B2 (en) Multi-stage fully homomorphic encryption and compression system for secure data processing and analysis
US20250055476A1 (en) Deep learning-based data compression with protocol adaptation
US11734231B2 (en) System and methods for bandwidth-efficient encoding of genomic data
US12436920B2 (en) System and method for file type identification using machine learning
US12423271B2 (en) System and methods for adaptive bandwidth-efficient encoding of genomic data
CN110915140B (zh) Method for encoding and decoding quality values of a data structure
US20250284393A1 (en) System and Method for Compaction of Floating-Point Numbers Within a Dataset with Metadata Tagging
US12218697B2 (en) Event-driven data transmission using codebooks with protocol prediction and translation
US12499092B2 (en) System and method for sourceblock length optimization for data compaction
US11769570B2 (en) Method and systems for genome sequence compression
CN119068992B (zh) DNA encoding method satisfying biological constraints, terminal device, and storage medium
US20250298510A1 (en) System and Method for Hardware-Accelerated Determination of Compression Performance Using Field-Programmable Gate Array Implementation
US12483269B2 (en) System and method for encrypted data compression with a hardware management layer
US20260079891A1 (en) System and Method for Sourceblock Length Optimization for Data Compaction
US12289121B2 (en) Adaptive neural upsampling system for decoding lossy compressed data streams
US20250202498A1 (en) System and method for enhancing decompressed data streams
US20250306760A1 (en) System and Method for Hardware-Accelerated Real-Time Tracking of Codebook Compression Performance Using a Field-Programmable Gate Array
US20250284395A1 (en) System and Method for Hybrid Codebook Performance Estimation Without Generation
US12192467B1 (en) Arithmetic encoding and decoding method based on semantic source and related device
US20260030214A1 (en) System and Method for Stream Data Type Identification Using Machine Learning