CN114556482A - System and method for effective compression, representation and decompression of diverse tabulated data - Google Patents

Info

Publication number
CN114556482A
Authority
CN
China
Prior art keywords
information
file
attributes
data
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080073109.1A
Other languages
English (en)
Chinese (zh)
Inventor
S·尚达科
张贻谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-18
Filing date
2020-10-17
Publication date
2022-05-27
Application filed by Koninklijke Philips NV
Publication of CN114556482A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6064 Selection of Compressor
    • H03M7/6082 Selection strategies
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3079 Context modeling
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70 Type of the data to be coded, other than image and sound

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • Health & Medical Sciences
  • Life Sciences & Earth Sciences
  • Databases & Information Systems
  • General Health & Medical Sciences
  • Bioethics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Bioinformatics & Computational Biology
  • Evolutionary Biology
  • Spectroscopy & Molecular Physics
  • Medical Informatics
  • Biotechnology
  • Bioinformatics & Cheminformatics
  • Biophysics
  • Data Mining & Analysis
  • Genetics & Genomics
  • Automation & Control Theory
  • Computer Hardware Design
  • Software Systems
  • Computer Security & Cryptography
  • Information Retrieval, Db Structures And Fs Structures Therefor
  • Compression, Expansion, Code Conversion, And Decoders
CN202080073109.1A 2019-10-18 2020-10-17 System and method for effective compression, representation and decompression of diverse tabulated data Pending CN114556482A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962923141P (US62/923,141) 2019-10-18 2019-10-18
US202062956952P (US62/956,952) 2020-01-03 2020-01-03
PCT/EP2020/079298 WO2021074440A1 (en) 2019-10-18 2020-10-17 System and method for effective compression, representation and decompression of diverse tabulated data

Publications (1)

Publication Number Publication Date
CN114556482A (zh) 2022-05-27

Family

ID=72915837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080073109.1A Pending CN114556482A (zh) 2020-10-17 System and method for effective compression, representation and decompression of diverse tabulated data

Country Status (6)

Country Link
US (1) US11916576B2
EP (1) EP4046279A1
JP (2) JP7631330B2
CN (1) CN114556482A
BR (1) BR112022007331A2
WO (1) WO2021074440A1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
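CN116521063A (zh) * 2023-03-31 2023-08-01 北京瑞风协同科技股份有限公司 Method and device for efficient reading and writing of HDF5 test data
CN117312261A (zh) * 2023-11-29 2023-12-29 苏州元脑智能科技有限公司 File compression encoding method and apparatus, storage medium, and electronic device
WO2024148566A1 (zh) * 2023-01-12 2024-07-18 华为技术有限公司 Data compression and transmission method, apparatus, device, and storage medium
WO2025201026A1 (zh) * 2024-03-29 2025-10-02 华为技术有限公司 Communication method and apparatus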

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
US11463556B2 (en) * 2020-11-18 2022-10-04 Verizon Patent And Licensing Inc. Systems and methods for packet-based file compression and storage
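CN114900571B (zh) * 2022-07-13 2022-09-27 工业信息安全(四川)创新中心有限公司 Method, device, and medium for parsing trusted cryptographic instructions based on templates
CN117312309B (zh) * 2023-09-20 2026-04-21 北京火山引擎科技有限公司 Data processing method and apparatus for software products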

Citations (5)

Publication number Priority date Publication date Assignee Title
US20120268298A1 (en) * 2009-09-04 2012-10-25 Yun-Sik Oh Method and apparatus for compressing and decompressing block unit data
US20140232574A1 (en) * 2013-01-10 2014-08-21 Dan ALONI System, method and non-transitory computer readable medium for compressing genetic information
CN109712674A (zh) * 2019-01-14 2019-05-03 深圳市泰尔迪恩生物信息科技有限公司 Annotation database index structure, and method and system for rapid annotation of genetic variants
US20190214111A1 (en) * 2016-10-11 2019-07-11 Genomsys Sa Method and systems for the representation and processing of bioinformatics data using reference sequences
CN110168652A (zh) * 2016-10-11 2019-08-23 耶诺姆希斯股份公司 Method and system for storing and accessing bioinformatics data

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US10090857B2 (en) * 2010-04-26 2018-10-02 Samsung Electronics Co., Ltd. Method and apparatus for compressing genetic data
US8412462B1 (en) * 2010-06-25 2013-04-02 Annai Systems, Inc. Methods and systems for processing genomic data
EP2595076B1 (en) * 2011-11-18 2019-05-15 Tata Consultancy Services Limited Compression of genomic data
US11998540B2 (en) 2015-12-11 2024-06-04 The General Hospital Corporation Compositions and methods for treating drug-tolerant glioblastoma
US20170177597A1 (en) * 2015-12-22 2017-06-22 DNANEXUS, Inc. Biological data systems
WO2017153456A1 (en) * 2016-03-09 2017-09-14 Sophia Genetics S.A. Methods to compress, encrypt and retrieve genomic alignment data
US10790044B2 (en) * 2016-05-19 2020-09-29 Seven Bridges Genomics Inc. Systems and methods for sequence encoding, storage, and compression
SG11201903858XA (en) 2016-10-28 2019-05-30 Illumina Inc Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing
US10554220B1 (en) * 2019-01-30 2020-02-04 International Business Machines Corporation Managing compression and storage of genomic data

Cited By (6)

Publication number Priority date Publication date Assignee Title
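WO2024148566A1 (zh) * 2023-01-12 2024-07-18 华为技术有限公司 Data compression and transmission method, apparatus, device, and storage medium
CN116521063A (zh) * 2023-03-31 2023-08-01 北京瑞风协同科技股份有限公司 Method and device for efficient reading and writing of HDF5 test data
CN116521063B (zh) * 2023-03-31 2024-03-26 北京瑞风协同科技股份有限公司 Method and device for efficient reading and writing of HDF5 test data
CN117312261A (zh) * 2023-11-29 2023-12-29 苏州元脑智能科技有限公司 File compression encoding method and apparatus, storage medium, and electronic device
CN117312261B (zh) * 2023-11-29 2024-02-09 苏州元脑智能科技有限公司 File compression encoding method and apparatus, storage medium, and electronic device
WO2025201026A1 (zh) * 2024-03-29 2025-10-02 华为技术有限公司 Communication method and apparatus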

Also Published As

Publication number Publication date
BR112022007331A2 (pt) 2022-07-05
US11916576B2 (en) 2024-02-27
US20220368347A1 (en) 2022-11-17
JP2022553199A (ja) 2022-12-22
JP2025069371A (ja) 2025-04-30
EP4046279A1 (en) 2022-08-24
WO2021074440A1 (en) 2021-04-22
JP7631330B2 (ja) 2025-02-18

Similar Documents

Publication Publication Date Title
US11916576B2 (en) System and method for effective compression, representation and decompression of diverse tabulated data
Harris et al. Improved representation of sequence bloom trees
Holley et al. Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage
Cox et al. Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform
CN110678929A (zh) Method and system for efficient compression of genomic sequence reads
Janin et al. BEETL-fastq: a searchable compressed archive for DNA reads
EP3526709B1 (en) Efficient data structures for bioinformatics information representation
CN114556318A (zh) Customizable delimited text compression framework
CN110178183B (zh) Method and system for transmitting bioinformatics data
EP3309697A1 (en) System and method for storing and accessing data
Holley et al. Bloom filter trie–a data structure for pan-genome storage
CN110168652B (zh) Method and system for storing and accessing bioinformatics data
KR20230003493A (ko) Method and system for efficient data compression in MPEG-G
Meng et al. Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
JP7362481B2 (ja) Method for encoding genome sequence data, method for decoding encoded genomic data, genomic encoder for encoding genome sequence data, genomic decoder for decoding genomic data, and computer-readable recording medium
Brown et al. Improved pangenomic classification accuracy with chain statistics
Pandey et al. VariantStore: an index for large-scale genomic variant search
US12445148B2 (en) System and method for effective compression representation and decompression of diverse tabulated data
CN110663022B (zh) Method and apparatus for compact representation of bioinformatics data using genomic descriptors
Lichtenwalter et al. Genotypic data in relational databases: efficient storage and rapid retrieval
Luo et al. GSC: efficient lossless compression of VCF files with fast query
Eskandar et al. Lossless pangenome indexing using tag arrays
Dorok et al. Efficient storage and analysis of genome data in databases
HK40082649B (en) Efficient data structures for bioinformatics information representation
HK40082649A (en) Efficient data structures for bioinformatics information representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination