JP7826277B2 - Genomic information compression by configurable machine learning-based arithmetic coding - Google Patents

Genomic information compression by configurable machine learning-based arithmetic coding

Info

Publication number
JP7826277B2
Authority
JP
Japan
Prior art keywords
context
data
training
type
encoding
Prior art date
2020-07-10
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023500391A
Other languages
English (en)
Japanese (ja)
Other versions
JP2023535131A (ja)
JP2023535131A5 (ja)
Inventor
シュブハム チャンダク
イー ヒム チャン
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-10
Filing date
2021-06-30
Publication date
2026-03-09
2021-06-30 Application filed by Koninklijke Philips NV
2023-08-16 Publication of JP2023535131A
2023-08-23 Publication of JP2023535131A5
Application granted
2026-03-09 Publication of JP7826277B2
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
          • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
          • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
          • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
            • G16B50/50 Compression of genetic data
    • H ELECTRICITY
      • H03 ELECTRONIC CIRCUITRY
        • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
          • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
            • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
              • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
                • H03M7/4006 Conversion to or from arithmetic code
              • H03M7/60 General implementation details not specific to a particular type of compression
                • H03M7/6064 Selection of Compressor
                  • H03M7/6076 Selection between compressors of the same type
              • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
                • H03M7/3079 Context modeling

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
JP2023500391A 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding Active JP7826277B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063050193P 2020-07-10 2020-07-10
US63/050,193 2020-07-10
PCT/EP2021/067960 WO2022008311A1 (en) 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding

Publications (3)

Publication Number Publication Date
JP2023535131A (ja) 2023-08-16
JP2023535131A5 (ja) 2023-08-23
JP7826277B2 (ja) 2026-03-09

Family

ID=76920753

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023500391A Active JP7826277B2 (ja) 2020-07-10 2021-06-30 Genomic information compression by configurable machine learning-based arithmetic coding

Country Status (7)

Country Link
US (1) US20230253074A1
EP (1) EP4179539B1
JP (1) JP7826277B2
CN (1) CN116018647A
ES (1) ES3050587T3
PL (1) PL4179539T3
WO (1) WO2022008311A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810155B (zh) * 2020-06-17 2022-11-18 华为技术有限公司 Channel encoding and decoding method and communication apparatus
US11818399B2 (en) * 2021-01-04 2023-11-14 Tencent America LLC Techniques for signaling neural network topology and parameters in the coded video stream
CN115391298B (zh) * 2021-05-25 2026-03-27 戴尔产品有限公司 Content-based dynamic hybrid data compression
EP4465207A4 (en) * 2022-01-13 2025-10-15 Lg Electronics Inc METHOD BY WHICH A RECEIVING DEVICE PERFORMS END-TO-END LEARNING IN A WIRELESS COMMUNICATION SYSTEM, RECEIVING DEVICE, PROCESSING DEVICE, STORAGE MEDIUM, METHOD BY WHICH A TRANSMITTING DEVICE PERFORMS END-TO-END LEARNING, AND TRANSMITTING DEVICE
JP2025522817A (ja) * 2022-06-30 2025-07-17 華為技術有限公司 Adaptive selection of entropy coding parameters
CN115083530B (zh) * 2022-08-22 2022-11-04 广州明领基因科技有限公司 Gene sequencing data compression method and apparatus, terminal device, and storage medium
CN117692094B (zh) * 2022-09-02 2026-03-20 北京邮电大学 Encoding method, decoding method, encoding apparatus, decoding apparatus, and electronic device
CN116886104B (zh) * 2023-09-08 2023-11-21 西安小草植物科技有限责任公司 Artificial intelligence-based smart healthcare data analysis method

Citations (2)

Publication number Priority date Publication date Assignee Title
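CN101461243A (zh) 2006-03-29 2009-06-17 诺基亚西门子通信有限责任两合公司 Method and device for generating data blocks for a scalable data stream
CN110663022A (zh) 2016-10-11 2020-01-07 耶诺姆希斯股份公司 Method and apparatus for compact representation of bioinformatics data using multiple genomic descriptors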

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN107529709B (zh) * 2011-06-16 2019-05-07 GE视频压缩有限责任公司 Decoder, encoder, methods for decoding and encoding video, and storage medium
CN106096327B (zh) * 2016-06-07 2018-08-17 广州麦仑信息科技有限公司 Gene trait recognition method based on Torch supervised deep learning
EP3583250B1 (en) * 2017-02-14 2023-07-12 Genomsys SA Method and systems for the efficient compression of genomic sequence reads
CN108306650A (zh) * 2018-01-16 2018-07-20 厦门极元科技有限公司 Compression method for gene sequencing data
PL4100954T3 (pl) * 2020-02-07 2026-01-26 Koninklijke Philips N.V. Improved quality value compression framework in aligned sequencing data based on novel contexts

Non-Patent Citations (1)

Title
W. Yang, Y. Lin, S. Wu, R. Yu, "Improving Coding Efficiency of MPEG-G Standard Using Context-Based Arithmetic Coding," 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1177-1183, [online] [retrieved 2025-06-30], available at <https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8621550>

Also Published As

Publication number Publication date
US20230253074A1 (en) 2023-08-10
EP4179539C0 (en) 2025-10-01
EP4179539A1 (en) 2023-05-17
CN116018647A (zh) 2023-04-25
EP4179539B1 (en) 2025-10-01
JP2023535131A (ja) 2023-08-16
ES3050587T3 (en) 2025-12-22
WO2022008311A1 (en) 2022-01-13
PL4179539T3 (pl) 2026-01-05

Similar Documents

Publication Publication Date Title
JP7826277B2 (ja) Genomic information compression by configurable machine learning-based arithmetic coding
Zheng et al. In-network machine learning using programmable network devices: A survey
Benoit et al. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph
EP3534283B1 (en) Classification of source data by neural network processing
EP3534284B1 (en) Classification of source data by neural network processing
JP7372347B2 (ja) Data compression method and computing device
Yu et al. Two-level data compression using machine learning in time series database
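JP7810664B2 (ja) Quality score compression
WO2015180203A1 (zh) Lossless compression system and compression method for high-throughput DNA sequencing quality scores
CN114048328B (zh) Knowledge graph link prediction method and system based on transformation assumptions and message passing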
US20110208820A1 (en) Method and system for message handling
CN107783998A (zh) Data processing method and apparatus
US20230222354A1 (en) A method for a distributed learning
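CN114222998A (zh) Feature dictionary for bandwidth enhancement
JP2021072540A (ja) Image encoding device, decoding device, transmission system, and control method therefor
CN110362683A (zh) Information steganography method and apparatus based on a recursive neural network, and storage medium
JP7674340B2 (ja) Method for compression of genomic sequence data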
US12423271B2 (en) System and methods for adaptive bandwidth-efficient encoding of genomic data
US12218697B2 (en) Event-driven data transmission using codebooks with protocol prediction and translation
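WO2020070943A1 (ja) Pattern recognition device and trained model
CN111008276B (zh) Complete entity relation extraction method and apparatus
CN117995276A (zh) Missing data imputation method based on a generative model, electronic device, and medium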
US20190057185A1 (en) Compression/Decompression Method and Apparatus for Genomic Variant Call Data
CN116186202A (zh) New word discovery method and system incorporating time-domain features
Pasquini et al. Robust and Lightweight Modeling of IoT Network Behaviors From Raw Traffic Packets

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230214

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230810

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20240627

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20250708

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20250829

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20251021

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20251219

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20260127

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20260225

R150 Certificate of patent or registration of utility model

Ref document number: 7826277

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150