CN114556482A - System and method for effective compression, representation and decompression of diverse tabulated data - Google Patents
- Publication number
- CN114556482A (application CN202080073109.1A)
- Authority
- CN
- China
- Prior art keywords
- information
- file
- attributes
- data
- blocks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6064—Selection of Compressor
- H03M7/6082—Selection strategies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/10—Ontologies; Annotations
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3068—Precoding preceding compression, e.g. Burrows-Wheeler transformation
- H03M7/3079—Context modeling
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Genetics & Genomics (AREA)
- Automation & Control Theory (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962923141P | 2019-10-18 | 2019-10-18 | |
| US62/923,141 | 2019-10-18 | | |
| US202062956952P | 2020-01-03 | 2020-01-03 | |
| US62/956,952 | 2020-01-03 | | |
| PCT/EP2020/079298 WO2021074440A1 (en) | 2019-10-18 | 2020-10-17 | System and method for effective compression, representation and decompression of diverse tabulated data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114556482A (zh) | 2022-05-27 |
Family
ID=72915837
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202080073109.1A (Pending; published as CN114556482A (zh)) | 2019-10-18 | 2020-10-17 | System and method for effective compression, representation and decompression of diverse tabulated data |
Country Status (6)
| Country | Publication |
|---|---|
| US (1) | US11916576B2 |
| EP (1) | EP4046279A1 |
| JP (2) | JP7631330B2 |
| CN (1) | CN114556482A |
| BR (1) | BR112022007331A2 |
| WO (1) | WO2021074440A1 |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11463556B2 (en) * | 2020-11-18 | 2022-10-04 | Verizon Patent And Licensing Inc. | Systems and methods for packet-based file compression and storage |
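| CN114900571B (zh) * | 2022-07-13 | 2022-09-27 | 工业信息安全(四川)创新中心有限公司 | Method, device and medium for parsing trusted cryptographic instructions based on templates |
| CN117312309B (zh) * | 2023-09-20 | 2026-04-21 | 北京火山引擎科技有限公司 | Data processing method and apparatus for software products |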
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120268298A1 (en) * | 2009-09-04 | 2012-10-25 | Yun-Sik Oh | Method and apparatus for compressing and decompressing block unit data |
| US20140232574A1 (en) * | 2013-01-10 | 2014-08-21 | Dan ALONI | System, method and non-transitory computer readable medium for compressing genetic information |
| CN109712674A (zh) * | 2019-01-14 | 2019-05-03 | 深圳市泰尔迪恩生物信息科技有限公司 | Annotation database index structure, and method and system for rapidly annotating genetic variants |
| US20190214111A1 (en) * | 2016-10-11 | 2019-07-11 | Genomsys Sa | Method and systems for the representation and processing of bioinformatics data using reference sequences |
| CN110168652A (zh) * | 2016-10-11 | 2019-08-23 | 耶诺姆希斯股份公司 (Genomsys SA) | Method and system for storing and accessing bioinformatics data |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10090857B2 (en) * | 2010-04-26 | 2018-10-02 | Samsung Electronics Co., Ltd. | Method and apparatus for compressing genetic data |
| US8412462B1 (en) * | 2010-06-25 | 2013-04-02 | Annai Systems, Inc. | Methods and systems for processing genomic data |
| EP2595076B1 (en) * | 2011-11-18 | 2019-05-15 | Tata Consultancy Services Limited | Compression of genomic data |
| US11998540B2 (en) | 2015-12-11 | 2024-06-04 | The General Hospital Corporation | Compositions and methods for treating drug-tolerant glioblastoma |
| US20170177597A1 (en) * | 2015-12-22 | 2017-06-22 | DNANEXUS, Inc. | Biological data systems |
| WO2017153456A1 (en) * | 2016-03-09 | 2017-09-14 | Sophia Genetics S.A. | Methods to compress, encrypt and retrieve genomic alignment data |
| US10790044B2 (en) * | 2016-05-19 | 2020-09-29 | Seven Bridges Genomics Inc. | Systems and methods for sequence encoding, storage, and compression |
| SG11201903858XA (en) | 2016-10-28 | 2019-05-30 | Illumina Inc | Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing |
| US10554220B1 (en) * | 2019-01-30 | 2020-02-04 | International Business Machines Corporation | Managing compression and storage of genomic data |
-
2020
- 2020-10-17 CN CN202080073109.1A: CN114556482A (zh), active, Pending
- 2020-10-17 EP EP20792983.7A: EP4046279A1 (en), active, Pending
- 2020-10-17 BR BR112022007331A: BR112022007331A2 (pt), status unknown
- 2020-10-17 US US17/767,070: US11916576B2 (en), active, Active
- 2020-10-17 WO PCT/EP2020/079298: WO2021074440A1 (en), not active, Ceased
- 2020-10-17 JP JP2022522858A: JP7631330B2 (ja), active, Active
-
2025
- 2025-02-05 JP JP2025017259A: JP2025069371A (ja), active, Pending
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
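| WO2024148566A1 (zh) * | 2023-01-12 | 2024-07-18 | 华为技术有限公司 | Data compression and transmission method, apparatus, device and storage medium |
| CN116521063A (zh) * | 2023-03-31 | 2023-08-01 | 北京瑞风协同科技股份有限公司 | Method and device for efficiently reading and writing HDF5 test data |
| CN116521063B (zh) * | 2023-03-31 | 2024-03-26 | 北京瑞风协同科技股份有限公司 | Method and device for efficiently reading and writing HDF5 test data |
| CN117312261A (zh) * | 2023-11-29 | 2023-12-29 | 苏州元脑智能科技有限公司 | File compression encoding method and device, storage medium and electronic device |
| CN117312261B (zh) * | 2023-11-29 | 2024-02-09 | 苏州元脑智能科技有限公司 | File compression encoding method and device, storage medium and electronic device |
| WO2025201026A1 (zh) * | 2024-03-29 | 2025-10-02 | 华为技术有限公司 | Communication method and apparatus |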
Also Published As
| Publication number | Publication date |
|---|---|
| BR112022007331A2 (pt) | 2022-07-05 |
| US11916576B2 (en) | 2024-02-27 |
| US20220368347A1 (en) | 2022-11-17 |
| JP2022553199A (ja) | 2022-12-22 |
| JP2025069371A (ja) | 2025-04-30 |
| EP4046279A1 (en) | 2022-08-24 |
| WO2021074440A1 (en) | 2021-04-22 |
| JP7631330B2 (ja) | 2025-02-18 |
Similar Documents
| Publication | Title |
|---|---|
| US11916576B2 (en) | System and method for effective compression, representation and decompression of diverse tabulated data |
| Harris et al. | Improved representation of sequence bloom trees |
| Holley et al. | Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage |
| Cox et al. | Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform |
| CN110678929A (zh) | Method and system for efficient compression of genome sequence reads |
| Janin et al. | BEETL-fastq: a searchable compressed archive for DNA reads |
| EP3526709B1 (en) | Efficient data structures for bioinformatics information representation |
| CN114556318A (zh) | Customizable delimited text compression framework |
| CN110178183B (zh) | Method and system for transmitting bioinformatics data |
| EP3309697A1 (en) | System and method for storing and accessing data |
| Holley et al. | Bloom filter trie–a data structure for pan-genome storage |
| CN110168652B (zh) | Method and system for storing and accessing bioinformatics data |
| KR20230003493A (ko) | Efficient data compression method and system for MPEG-G |
| Meng et al. | Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach |
| JP7362481B2 (ja) | Method for encoding genome sequence data, method for decoding encoded genome data, genome encoder for encoding genome sequence data, genome decoder for decoding genome data, and computer-readable recording medium |
| Brown et al. | Improved pangenomic classification accuracy with chain statistics |
| Pandey et al. | VariantStore: an index for large-scale genomic variant search |
| US12445148B2 (en) | System and method for effective compression representation and decompression of diverse tabulated data |
| CN110663022B (zh) | Method and apparatus for compact representation of bioinformatics data using genome descriptors |
| Lichtenwalter et al. | Genotypic data in relational databases: efficient storage and rapid retrieval |
| Luo et al. | GSC: efficient lossless compression of VCF files with fast query |
| Eskandar et al. | Lossless pangenome indexing using tag arrays |
| Dorok et al. | Efficient storage and analysis of genome data in databases |
| HK40082649B (en) | Efficient data structures for bioinformatics information representation |
| HK40082649A (en) | Efficient data structures for bioinformatics information representation |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |