CN114556318A - Customizable delimited text compression framework - Google Patents
Customizable delimited text compression framework
- Publication number
- CN114556318A (application CN202080073005.0A)
- Authority
- CN
- China
- Prior art keywords
- compression
- data
- file
- compressed
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/173—Customisation support for file systems, e.g. localisation, multi-language support, personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/123—Storage facilities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/183—Tabulation, i.e. one-dimensional [1D] positioning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6064—Selection of Compressor
- H03M7/607—Selection between different types of compressors
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
- H03M7/707—Structured documents, e.g. XML
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioethics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Genetics & Genomics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962923113P | 2019-10-18 | 2019-10-18 | |
| US62/923,113 | 2019-10-18 | | |
| US202062956941P | 2020-01-03 | 2020-01-03 | |
| US62/956,941 | 2020-01-03 | | |
| PCT/EP2020/078996 WO2021074272A1 (en) | 2019-10-18 | 2020-10-15 | Customizable delimited text compression framework |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114556318A (zh) | 2022-05-27 |
Family
ID=72964653
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202080073005.0A (pending; published as CN114556318A) | Customizable delimited text compression framework | 2019-10-18 | 2020-10-15 |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US20240095218A1 |
| EP (1) | EP4046052A1 |
| JP (1) | JP7848681B2 |
| CN (1) | CN114556318A |
| BR (1) | BR112022007396A2 |
| CA (1) | CA3157786A1 |
| WO (1) | WO2021074272A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119166428A (zh) * | 2024-11-21 | 2024-12-20 | 北京高阳捷迅信息技术有限公司 | Big-data-based relational database backup and recovery method and system |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102948214B1 (ko) * | 2021-07-16 | 2026-04-03 | 주식회사 쏠리드 | Fronthaul multiplexing device |
| US12387053B2 (en) | 2022-01-27 | 2025-08-12 | International Business Machines Corporation | Large-scale text data encoding and compression |
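| CN117827775A (zh) * | 2022-09-29 | 2024-04-05 | 华为技术有限公司 | Data compression method and apparatus, computing device, and storage system |
| CN116521063B (zh) * | 2023-03-31 | 2024-03-26 | 北京瑞风协同科技股份有限公司 | Method and device for efficient reading and writing of HDF5 test data |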
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000252832A (ja) * | 1999-02-25 | 2000-09-14 | Nikon Corp | Data compression device, and recording medium storing a data compression program |
| JP2005018672A (ja) * | 2003-06-30 | 2005-01-20 | Hitachi Ltd | Method for compressing structured documents |
| US20050228811A1 (en) * | 2004-04-07 | 2005-10-13 | Russell Perry | Method of and system for compressing and decompressing hierarchical data structures |
| CN103026631A (zh) * | 2010-06-01 | 2013-04-03 | 甲骨文国际公司 | Method and system for compressing XML documents |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2283591C (en) * | 1997-03-07 | 2006-01-31 | Intelligent Compression Technologies | Data coding network |
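| JP5280425B2 (ja) * | 2010-11-12 | 2013-09-04 | シャープ株式会社 | Image processing apparatus, image reading apparatus, image forming apparatus, image processing method, program, and recording medium therefor |
| KR101922129B1 (ko) * | 2011-12-05 | 2018-11-26 | 삼성전자주식회사 | Method and apparatus for compressing and decompressing genetic information obtained using next-generation sequencing |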
| CA2958478C (en) | 2014-09-03 | 2019-04-16 | Patrick Soon-Shiong | Synthetic genomic variant-based secure transaction devices, systems and methods |
| JP6949970B2 (ja) | 2016-10-11 | 2021-10-13 | ゲノムシス エスアー | Method and system for transmitting bioinformatics data |
| EA201990933A1 (ru) | 2016-10-11 | 2019-11-29 | | Efficient data structures for representing bioinformatics information |
-
2020
- 2020-10-15 JP: application JP2022522976A, publication JP7848681B2 (ja), status: Active
- 2020-10-15 CA: application CA3157786A, publication CA3157786A1 (en), status: Pending
- 2020-10-15 BR: application BR112022007396A, publication BR112022007396A2 (pt), status: Unknown
- 2020-10-15 CN: application CN202080073005.0A, publication CN114556318A (zh), status: Pending
- 2020-10-15 US: application US17/768,878, publication US20240095218A1 (en), status: Pending
- 2020-10-15 EP: application EP20793605.5A, publication EP4046052A1 (en), status: Pending
- 2020-10-15 WO: application PCT/EP2020/078996, publication WO2021074272A1 (en), status: Not active (ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| JP2023501093A (ja) | 2023-01-18 |
| BR112022007396A2 (pt) | 2022-07-05 |
| EP4046052A1 (en) | 2022-08-24 |
| JP7848681B2 (ja) | 2026-04-21 |
| US20240095218A1 (en) | 2024-03-21 |
| CA3157786A1 (en) | 2021-04-22 |
| WO2021074272A1 (en) | 2021-04-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7631330B2 (ja) | | System and method for effective compression, representation, and decompression of diverse tabulated data |
| CN114556318A (zh) | | Customizable delimited text compression framework |
| Holley et al. | | Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage |
| Harris et al. | | Improved representation of sequence bloom trees |
| Delcher et al. | | Using MUMmer to identify similar regions in large sequence sets |
| Maciuca et al. | | A natural encoding of genetic variation in a Burrows-Wheeler transform to enable mapping and genome inference |
| US9805080B2 (en) | | Data driven relational algorithm formation for execution against big data |
| US9098490B2 (en) | | Genetic information management system and method |
| US7689630B1 (en) | | Two-level bitmap structure for bit compression and data management |
| WO2018200294A1 (en) | | Parser for schema-free data exchange format |
| Holley et al. | | Bloom filter trie–a data structure for pan-genome storage |
| US11468031B1 (en) | | Methods and apparatus for efficiently scaling real-time indexing |
| JP7775215B2 (ja) | | Method for efficient data compression in MPEG-G, genomic encoder, genomic decoder, and computer-readable medium |
| Pibiri | | On weighted k-mer dictionaries |
| RU2633178C2 (ru) | | Method and database system for indexing links to database documents |
| JP2023501093A5 | | |
| Pibiri et al. | | Meta-colored compacted de Bruijn graphs |
| CN109492127A (zh) | | Data processing method, apparatus, medium, and computing device |
| Meng et al. | | Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach |
| US11126622B1 (en) | | Methods and apparatus for efficiently scaling result caching |
| EP3193260A2 (en) | | Encoding program, encoding method, encoding device, decoding program, decoding method, and decoding device |
| Brown et al. | | Improved pangenomic classification accuracy with chain statistics |
| KR102921797B1 (ko) | | System and method for generating a filter for k-mismatch search |
| US12445148B2 (en) | | System and method for effective compression representation and decompression of diverse tabulated data |
| CN118692573A (zh) | | Genotype data compression and retrieval method, apparatus, device, and computer-readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |