CN114556318A - Customizable delimited text compression framework - Google Patents

Customizable delimited text compression framework

Info

Publication number
CN114556318A
Authority
CN
China
Prior art keywords
compression
data
file
compressed
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080073005.0A
Other languages
English (en)
Chinese (zh)
Inventor
张贻谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips NV
Publication of CN114556318A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/173 Customisation support for file systems, e.g. localisation, multi-language support, personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/123 Storage facilities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/183 Tabulation, i.e. one-dimensional [1D] positioning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6064 Selection of Compressor
    • H03M7/607 Selection between different types of compressors
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70 Type of the data to be coded, other than image and sound
    • H03M7/707 Structured documents, e.g. XML

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Document Processing Apparatus (AREA)
CN202080073005.0A 2019-10-18 2020-10-15 Customizable delimited text compression framework Pending CN114556318A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962923113P 2019-10-18 2019-10-18
US62/923,113 2019-10-18
US202062956941P 2020-01-03 2020-01-03
US62/956,941 2020-01-03
PCT/EP2020/078996 WO2021074272A1 (en) 2019-10-18 2020-10-15 Customizable delimited text compression framework

Publications (1)

Publication Number Publication Date
CN114556318A (zh) 2022-05-27

Family

ID=72964653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080073005.0A Pending CN114556318A (zh) 2019-10-18 2020-10-15 Customizable delimited text compression framework

Country Status (7)

Country Link
US (1) US20240095218A1
EP (1) EP4046052A1
JP (1) JP7848681B2
CN (1) CN114556318A
BR (1) BR112022007396A2
CA (1) CA3157786A1
WO (1) WO2021074272A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119166428A (zh) * 2024-11-21 2024-12-20 北京高阳捷迅信息技术有限公司 Big-data-based relational database backup and recovery method and system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102948214B1 (ko) * 2021-07-16 2026-04-03 주식회사 쏠리드 Fronthaul multiplexing device
US12387053B2 (en) 2022-01-27 2025-08-12 International Business Machines Corporation Large-scale text data encoding and compression
CN117827775A (zh) * 2022-09-29 2024-04-05 华为技术有限公司 Data compression method and apparatus, computing device, and storage system
CN116521063B (zh) * 2023-03-31 2024-03-26 北京瑞风协同科技股份有限公司 Method and apparatus for efficient reading and writing of HDF5 test data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000252832A (ja) * 1999-02-25 2000-09-14 Nikon Corp Data compression device and recording medium storing a data compression program
JP2005018672A (ja) * 2003-06-30 2005-01-20 Hitachi Ltd Method for compressing structured documents
US20050228811A1 (en) * 2004-04-07 2005-10-13 Russell Perry Method of and system for compressing and decompressing hierarchical data structures
CN103026631A (zh) * 2010-06-01 2013-04-03 甲骨文国际公司 Method and system for compressing XML documents

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2283591C (en) * 1997-03-07 2006-01-31 Intelligent Compression Technologies Data coding network
JP5280425B2 (ja) * 2010-11-12 2013-09-04 シャープ株式会社 Image processing apparatus, image reading apparatus, image forming apparatus, image processing method, program, and recording medium therefor
KR101922129B1 (ko) * 2011-12-05 2018-11-26 삼성전자주식회사 Method and apparatus for compressing and decompressing genetic information obtained using next-generation sequencing
CA2958478C (en) 2014-09-03 2019-04-16 Patrick Soon-Shiong Synthetic genomic variant-based secure transaction devices, systems and methods
JP6949970B2 (ja) 2016-10-11 2021-10-13 ゲノムシス エスアー Method and system for transmitting bioinformatics data
EA201990933A1 (ru) 2016-10-11 2019-11-29 Efficient data structures for representing bioinformatics information

Also Published As

Publication number Publication date
JP2023501093A (ja) 2023-01-18
BR112022007396A2 (pt) 2022-07-05
EP4046052A1 (en) 2022-08-24
JP7848681B2 (ja) 2026-04-21
US20240095218A1 (en) 2024-03-21
CA3157786A1 (en) 2021-04-22
WO2021074272A1 (en) 2021-04-22

Similar Documents

Publication Publication Date Title
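JP7631330B2 (ja) System and method for effective compression, representation, and decompression of diverse tabulated data
CN114556318A (zh) Customizable delimited text compression framework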
Holley et al. Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage
Harris et al. Improved representation of sequence bloom trees
Delcher et al. Using MUMmer to identify similar regions in large sequence sets
Maciuca et al. A natural encoding of genetic variation in a burrows-wheeler transform to enable mapping and genome inference
US9805080B2 (en) Data driven relational algorithm formation for execution against big data
US9098490B2 (en) Genetic information management system and method
US7689630B1 (en) Two-level bitmap structure for bit compression and data management
WO2018200294A1 (en) Parser for schema-free data exchange format
Holley et al. Bloom filter trie–a data structure for pan-genome storage
US11468031B1 (en) Methods and apparatus for efficiently scaling real-time indexing
JP7775215B2 (ja) Method for efficient data compression in MPEG-G, genomic encoder, genomic decoder, and computer-readable medium
Pibiri On weighted k-mer dictionaries
RU2633178C2 (ru) Method and database system for indexing references to database documents
JP2023501093A5
Pibiri et al. Meta-colored compacted de Bruijn graphs
CN109492127A (zh) Data processing method, apparatus, medium, and computing device
Meng et al. Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
US11126622B1 (en) Methods and apparatus for efficiently scaling result caching
EP3193260A2 (en) Encoding program, encoding method, encoding device, decoding program, decoding method, and decoding device
Brown et al. Improved pangenomic classification accuracy with chain statistics
KR102921797B1 (ko) System and method for generating a filter for k-mismatch search
US12445148B2 (en) System and method for effective compression representation and decompression of diverse tabulated data
CN118692573A (zh) Genotype data compression and retrieval method, apparatus, device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination