WO2019128409A1 - Data compression and storage method and data compression and storage device - Google Patents

Info

Publication number
WO2019128409A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/111180
Other languages
French (fr)
Chinese (zh)
Inventor
何东杰 (He Dongjie)
Original Assignee
中国银联股份有限公司 (China UnionPay Co., Ltd.)
Application filed by 中国银联股份有限公司 (China UnionPay Co., Ltd.)
Publication of WO2019128409A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • G06F16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • The first embodiment applies an optimized compression strategy to each individual field for compressed storage.
  • The data compression storage method of the first embodiment includes a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically includes the following sub-steps:
  • For example, enumerated-type fields are stored in binary, numeric string values are converted to integer or floating-point storage, and so on.
  • A mapping relationship storage step may further be provided, in which a mapping between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly parsed according to the mapping.
  • The data compression storage device of the first embodiment includes a segmentation module and a compression module.
  • The function of the segmentation module in the first embodiment is the same as that of segmentation module 100 described above.
  • The compression module in the first embodiment specifically includes the following sub-modules:
  • a content analysis sub-module, which performs content analysis on the data split into multiple fields and establishes the associations between fields; and
  • a compression sub-module, which, for each single field, applies different compression strategies to different data contents for compressed storage.
  • Optionally, a mapping relationship storage module may be provided, in which the mapping between the original data and the compressed data is established and stored.
  • The second embodiment applies an optimized compression strategy to combinations of multiple fields.
  • The data compression storage method of the second embodiment includes a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically includes the following sub-steps:
  • Multiple correlated fields are combined, and for the combined fields, different compression strategies are applied to different data contents for compressed storage.
  • As the compression strategy, multiple fields are combined and a correspondingly optimized compression storage method is used; for example, short fields are combined before compression, redundant information in approximate fields is removed before compressed storage, for fields whose information is in reverse order of each other only one is stored, and for fields with an inclusion relationship the redundant information is compressed out before storage, and so on.
  • A mapping relationship storage step may further be provided, in which a mapping between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly parsed according to the mapping.
  • The data compression storage device of the second embodiment includes a segmentation module and a compression module.
  • The function of the segmentation module in the second embodiment is the same as that of segmentation module 100 described above.
  • The compression module in the second embodiment specifically includes the following sub-modules:
  • a content analysis sub-module, which performs content analysis on the data split into multiple fields, builds a data distribution map and an association graph between fields, and identifies correlations between data fields based on the data distribution map and the association graph; and
  • a compression sub-module, which combines multiple correlated fields and, for the combined fields, applies different compression strategies to different data contents for compressed storage.
  • Optionally, a mapping relationship storage module may be provided, in which the mapping between the original data and the compressed data is established and stored.
  • The third embodiment applies optimized compression strategies both to single fields and to combinations of multiple fields.
  • The data compression storage method of the third embodiment includes a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically includes the following sub-steps:
  • Optimized compression storage methods are applied to both single fields and multi-field combinations; for example, enumerated-type fields are stored in binary, numeric string values are converted to integer or floating-point storage, short fields are combined before compressed storage, redundant information in approximate fields is removed before compressed storage, for fields whose information is in reverse order of each other only one is stored, and for fields with an inclusion relationship the redundant information is compressed out before storage, and so on.
  • A mapping relationship storage step may further be provided, in which a mapping between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly parsed according to the mapping.
  • The data compression storage device of the third embodiment includes a segmentation module and a compression module.
  • The function of the segmentation module in the third embodiment is the same as that of segmentation module 100 described above.
  • The compression module in the third embodiment specifically includes the following sub-modules:
  • a content analysis sub-module, which performs content analysis on the data split into multiple fields, builds a data distribution map and an association graph between fields, and identifies correlations between data fields based on the data distribution map and the association graph; and
  • a compression sub-module, which, for each single field, applies different compression strategies to different data contents for compressed storage, and also combines multiple correlated fields and, for the combined fields, applies different compression strategies to different data contents for compressed storage.
  • Optionally, a mapping relationship storage module may be provided, in which the mapping between the original data and the compressed data is established and stored.
  • The present invention also provides a computer-readable medium storing a computer program which, when executed by a processor, implements the data compression storage method of each of the above embodiments.
  • The present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the data compression storage method described above.

Abstract

The present invention relates to a data compression and storage method and a data compression and storage device. The method comprises a segmentation step, in which original data is segmented into a plurality of fields, and a compression step, in which different fields are compressed with different compression policies according to their data content and the compressed data is stored. Because the compression method is chosen according to the data content, the method and device can effectively improve data compression efficiency, and the data compression ratio is significantly improved compared with common data compression tools such as GZIP and SNAPPY.

Description

Data Compression Storage Method and Data Compression Storage Device
Technical Field
The present invention relates to data processing technology, and in particular to a data compression storage method and a data compression storage device.
Background Art
When storing data, enterprises generally compress it in order to save storage space and improve read efficiency. However, general-purpose compression tools are designed for all kinds of data.
Moreover, existing common data compression tools, such as GZIP and SNAPPY, are designed to compress general-purpose data.
However, as noted above, the compression tools that enterprises currently use for data storage are general-purpose tools applicable to all data and do not fully account for the characteristics of enterprise data. As a result, the compression efficiency is not very high.
Summary of the Invention
In view of the above problems, the present invention aims to provide an improved data compression storage method and data compression storage device.
The data compression storage method of the present invention is characterized by comprising the following steps:
a segmentation step of splitting the original data into multiple fields; and
a compression step of compressing different fields with different compression strategies according to their data content, and storing the resulting compressed data.
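The two claimed steps can be illustrated with a minimal Python sketch. The patent does not specify a record format or the rules for choosing a strategy, so the pipe-delimited records, the `enum_max` threshold, and the helper names (`segment`, `choose_strategy`, `compress_fields`) are hypothetical; zlib stands in for a general-purpose fallback codec.

```python
import zlib
from typing import Dict, List, Tuple

def segment(records: List[str], sep: str = "|") -> List[List[str]]:
    """Segmentation step (sketch): split each record on `sep`, then transpose
    so that each field becomes its own list of values.
    Assumes every record has the same number of fields."""
    rows = [r.split(sep) for r in records]
    return [list(col) for col in zip(*rows)]

def is_plain_int(s: str) -> bool:
    # ASCII digits only and no leading zero, so int -> str round-trips exactly
    return s.isascii() and s.isdigit() and (s == "0" or not s.startswith("0"))

def choose_strategy(column: List[str], enum_max: int = 16) -> str:
    """Pick a strategy from the field's content (hypothetical rules)."""
    if column and all(is_plain_int(v) for v in column):
        return "integer"
    if len(set(column)) <= enum_max:
        return "enum"
    return "generic"

def compress_fields(columns: List[List[str]]) -> Dict[int, Tuple[str, object]]:
    """Compression step (sketch): encode each field with its own strategy."""
    out: Dict[int, Tuple[str, object]] = {}
    for idx, col in enumerate(columns):
        strategy = choose_strategy(col)
        if strategy == "integer":
            # Python ints stand in for a fixed-width binary integer encoding
            payload: object = [int(v) for v in col]
        elif strategy == "enum":
            codebook = sorted(set(col))
            index = {v: i for i, v in enumerate(codebook)}
            payload = (codebook, [index[v] for v in col])
        else:
            # general-purpose fallback; assumes values contain no newlines
            payload = zlib.compress("\n".join(col).encode("utf-8"))
        out[idx] = (strategy, payload)
    return out
```

Laying the data out column-wise before encoding is a design assumption of this sketch: it keeps values of the same kind together, which is what makes a per-field choice of strategy possible.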
Preferably, in the compression step, the strength of the association between fields is determined, and the compression strategy is set according to that strength.
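The patent does not define how association strength is measured. One possible measure, assumed here for illustration, is the normalized conditional entropy 1 - H(B|A) / H(B), which is 1 when field A fully determines field B and 0 when the two are independent. The function names and the 0.9 threshold in `pick_strategy` are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence

def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical value distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def association_strength(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """1 - H(B|A) / H(B): 1.0 if A fully determines B, 0.0 if independent."""
    h_b = entropy(b)
    if h_b == 0.0:
        return 1.0  # a constant field B is trivially determined by A
    h_b_given_a = entropy(list(zip(a, b))) - entropy(a)  # H(A,B) - H(A)
    return 1.0 - h_b_given_a / h_b

def pick_strategy(strength: float, threshold: float = 0.9) -> str:
    """Strongly associated fields are combined; weak ones are compressed separately."""
    return "combine" if strength >= threshold else "separate"
```

The measure is asymmetric (A determining B is not the same as B determining A), which suits the case where one field can be reconstructed from another.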
Preferably, the compression step comprises the following sub-steps:
performing content analysis on the data split into multiple fields to establish the associations between fields; and
for each single field, applying different compression strategies to different data contents for compressed storage.
Preferably, the compression step comprises the following sub-steps:
performing content analysis on the data split into multiple fields, building a data distribution map and an association graph between fields, and identifying correlations between data fields based on the data distribution map and the association graph; and
combining multiple correlated fields and, for the combined fields, applying different compression strategies to different data contents for compressed storage.
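A data distribution map and an association graph could be built as below. This is a minimal sketch that assumes the distribution map is a per-field value histogram and that the graph links fields whose mutual information, normalized by the smaller field entropy, reaches a hypothetical threshold. "Combining" is shown as dictionary-encoding the value pairs of two correlated fields, which is one plausible reading rather than the patent's stated method.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Dict, List, Tuple

def entropy(values) -> float:
    """Shannon entropy (bits) of the empirical value distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def distribution_map(columns: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Per-field value histogram, used here as the 'data distribution map'."""
    return {name: Counter(col) for name, col in columns.items()}

def association_graph(columns: Dict[str, List[str]],
                      threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Edge (a, b) when I(A;B) / min(H(A), H(B)) >= threshold (hypothetical rule)."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        h_a, h_b = entropy(a), entropy(b)
        if min(h_a, h_b) == 0:
            continue  # a constant field carries no information to correlate
        mutual_info = h_a + h_b - entropy(list(zip(a, b)))
        if mutual_info / min(h_a, h_b) >= threshold:
            edges.append((name_a, name_b))
    return edges

def combine(a: List[str], b: List[str]) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Store two correlated fields as one dictionary-encoded field of value pairs."""
    codebook = sorted(set(zip(a, b)))
    index = {pair: i for i, pair in enumerate(codebook)}
    return codebook, [index[pair] for pair in zip(a, b)]
```

When two fields are strongly correlated, the number of distinct pairs stays close to the number of distinct values in either field, so one code column replaces two.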
Preferably, the compression step comprises the following sub-steps:
performing content analysis on the data split into multiple fields, building a data distribution map and an association graph between fields, and identifying correlations between data fields based on the data distribution map and the association graph; and
for each single field, applying different compression strategies to different data contents for compressed storage; and, in addition, combining multiple correlated fields and, for the combined fields, applying different compression strategies to different data contents for compressed storage.
Preferably, as compression strategies, enumerated string fields are stored in compressed binary form, and string fields holding numeric values are converted to integers or floating-point numbers for compressed storage.
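The two single-field strategies can be sketched as follows. The bit-packed codebook layout, the 8-byte `struct` formats, and the check that a numeric string round-trips losslessly before it is converted are assumptions of this sketch; the patent only states that enumerated strings are stored in binary and numeric strings are converted to integers or floating point.

```python
import struct
from typing import List, Optional, Tuple

def pack_enum(values: List[str]) -> Tuple[List[str], int, bytes]:
    """Binary-encode an enumerated string field: codebook + fixed-width bit codes."""
    codebook = sorted(set(values))
    index = {v: i for i, v in enumerate(codebook)}
    bits = max(1, (len(codebook) - 1).bit_length())  # bits needed per code
    acc = 0
    for pos, v in enumerate(values):
        acc |= index[v] << (pos * bits)
    nbytes = (len(values) * bits + 7) // 8
    return codebook, bits, acc.to_bytes(nbytes, "little")

def unpack_enum(codebook: List[str], bits: int, data: bytes, count: int) -> List[str]:
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [codebook[(acc >> (pos * bits)) & mask] for pos in range(count)]

def numeric_to_binary(s: str) -> Optional[Tuple[str, bytes]]:
    """Convert a numeric string to 8-byte binary only if the conversion is lossless."""
    try:
        i = int(s)
        if str(i) == s and -(1 << 63) <= i < (1 << 63):
            return "q", struct.pack("<q", i)
    except ValueError:
        pass
    try:
        f = float(s)
        if repr(f) == s:
            return "d", struct.pack("<d", f)
    except ValueError:
        pass
    return None  # keep as a string, e.g. "00123" would lose its leading zeros

def binary_to_numeric(tag: str, blob: bytes) -> str:
    value = struct.unpack("<" + tag, blob)[0]
    return str(value) if tag == "q" else repr(value)
```

Fixed 8-byte numbers only pay off for longer numeric strings; a one-character value such as "7" would grow. A production encoder would more likely use variable-width integers, which this sketch omits.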
Preferably, as compression strategies, short fields are combined before compressed storage; approximate fields are stored compressed after their redundant information is removed; for fields whose information is in reverse order of each other, only one of them is stored; and for fields with an inclusion relationship, the redundant information is compressed out before storage.
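The patent names these multi-field strategies without defining their encodings. The sketch below shows one hypothetical encoding for three of them: short fields joined with an assumed separator character, an "approximate" field stored as a shared-prefix length plus the differing tail relative to a reference field (our reading of "approximate"), and a field contained in another stored only as an offset and length. The reverse-order case is omitted here.

```python
import os
from typing import List, Optional, Tuple

SEP = "\x1f"  # ASCII unit separator; assumed never to occur inside field values

def combine_short(fields: List[str]) -> str:
    """Short fields: join into one value so they are compressed as a unit."""
    return SEP.join(fields)

def split_short(combined: str) -> List[str]:
    return combined.split(SEP)

def encode_approx(ref: str, value: str) -> Tuple[int, str]:
    """Approximate field: keep only the shared-prefix length and the differing tail."""
    n = len(os.path.commonprefix([ref, value]))
    return n, value[n:]

def decode_approx(ref: str, enc: Tuple[int, str]) -> str:
    n, tail = enc
    return ref[:n] + tail

def encode_inclusion(container: str, part: str) -> Optional[Tuple[int, int]]:
    """Inclusion: if `part` occurs inside `container`, store (offset, length) only."""
    i = container.find(part)
    return (i, len(part)) if i >= 0 else None

def decode_inclusion(container: str, enc: Tuple[int, int]) -> str:
    offset, length = enc
    return container[offset:offset + length]
```

In each case the redundant part is reconstructed from a field that is stored anyway, so only the small residual needs to be kept.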
Preferably, as compression strategies, enumerated string fields are stored in compressed binary form; numeric string values are converted to integers or floating-point numbers for compressed storage; short fields are combined before compressed storage; approximate fields are stored compressed after their redundant information is removed; for fields whose information is in reverse order of each other, only one of them is stored; and for fields with an inclusion relationship, the redundant information is compressed out before storage.
Preferably, the method further comprises:
a mapping relationship storage step of establishing and storing a mapping between the original data and the compressed data.
The data compression storage device of the present invention is characterized by comprising:
a segmentation module for splitting the original data into multiple fields; and
a compression module for compressing different fields with different compression strategies according to their data content, and storing the resulting compressed data.
Preferably, the compression module determines the strength of the association between fields and sets the compression strategy according to that strength.
Preferably, the device further comprises:
a mapping relationship storage module for establishing and storing a mapping between the original data and the compressed data.
Preferably, the compression module comprises the following sub-modules:
a content analysis sub-module, which performs content analysis on the data split into multiple fields and establishes the associations between fields; and
a compression sub-module, which, for each single field, applies different compression strategies to different data contents for compressed storage.
Preferably, the compression module comprises:
a content analysis sub-module, which performs content analysis on the data split into multiple fields, builds a data distribution map and an association graph between fields, and identifies correlations between data fields based on the data distribution map and the association graph; and
a compression sub-module, which combines multiple correlated fields and, for the combined fields, applies different compression strategies to different data contents for compressed storage.
Preferably, the compression module comprises:
a content analysis sub-module, which performs content analysis on the data split into multiple fields, builds a data distribution map and an association graph between fields, and identifies correlations between data fields based on the data distribution map and the association graph; and
a compression sub-module, which, for each single field, applies different compression strategies to different data contents for compressed storage, and also combines multiple correlated fields and, for the combined fields, applies different compression strategies to different data contents for compressed storage.
The computer-readable medium of the present invention stores a computer program, characterized in that the computer program, when executed by a processor, implements the data compression storage method described above.
The computer device of the present invention comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the data compression storage method described above.
As described above, the data compression storage method and data compression storage device of the present invention provide an efficient data compression scheme tailored to the characteristics of enterprise data: by analyzing those characteristics and applying a correspondingly efficient compression algorithm to each data field, they improve data compression efficiency. Compared with general-purpose data compression tools such as GZIP and SNAPPY, the data compression ratio is significantly improved.
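To check a compression-ratio claim like this against one's own data, a ratio can be computed relative to a GZIP baseline, as sketched below with Python's standard `gzip` module. The function names are hypothetical, and the field-aware encoded size has to be measured from whatever encoder is being evaluated. SNAPPY is not in the Python standard library, so it is left out; no measured results are implied.

```python
import gzip
from typing import List

def compression_ratio(raw_size: int, compressed_size: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return raw_size / compressed_size

def gzip_baseline(records: List[str]) -> float:
    """Ratio achieved by plain GZIP on the newline-joined records."""
    raw = "\n".join(records).encode("utf-8")
    return compression_ratio(len(raw), len(gzip.compress(raw)))

def field_aware_ratio(records: List[str], encoded_size: int) -> float:
    """Ratio for a field-aware encoding whose total size was measured separately."""
    raw = "\n".join(records).encode("utf-8")
    return compression_ratio(len(raw), encoded_size)
```

Comparing `field_aware_ratio` with `gzip_baseline` on the same records gives a like-for-like measure of any improvement.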
Brief Description of the Drawings
Fig. 1 is a flowchart of the data compression storage method of the present invention.
Fig. 2 is a schematic structural diagram of the data compression storage device of the present invention.
Detailed Description of the Embodiments
Some of the many embodiments of the present invention are described below to provide a basic understanding of the invention. They are not intended to identify key or critical elements of the invention or to limit the scope of protection.
The main idea of the data compression storage method and data compression storage device of the present invention is to analyze the content of enterprise data, build a data distribution map, and, combining the data content with the data distribution, apply a correspondingly optimized compression algorithm to each piece of information. For example, enumerated string fields are compressed with binary encoding, and strings that store numbers are converted to integer or floating-point types. Furthermore, associated fields in the data can be merged and then compressed. In addition, when some fields are derived from a combination of other fields, only one copy of the data needs to be stored; likewise, when some fields are the reverse order of other fields, only one copy is stored.
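Detecting the derived fields described above could look like the following minimal sketch, which assumes each field is a list of strings of equal length. The names `is_concatenation` and `is_reverse` are hypothetical, and a real implementation would also need to avoid circular derivations when choosing which copy to keep.

```python
from typing import List, Sequence

def is_concatenation(target: Sequence[str], parts: List[Sequence[str]]) -> bool:
    """True if each target value equals the row-wise concatenation of the parts,
    so the target field need not be stored."""
    if not parts or any(len(p) != len(target) for p in parts):
        return False
    return all(t == "".join(row) for t, row in zip(target, zip(*parts)))

def is_reverse(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if each value of b is the character-reversed value of a,
    so only one of the two fields needs to be stored."""
    return len(a) == len(b) and all(y == x[::-1] for x, y in zip(a, b))
```

For example, a timestamp field equal to a date field followed by a time field passes `is_concatenation(ts, [date, time])` and can be dropped from storage.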
Next, the data compression storage method of the present invention is described.
Fig. 1 is a flowchart of the data compression storage method of the present invention.
As shown in Fig. 1, the data compression storage method of the present invention comprises the following steps:
Segmentation step S100: split the original data into multiple fields; and
Compression step S200: according to the data content, compress different fields with different optimized compression strategies and store the resulting compressed data, wherein the strength of the association between fields is determined and different compression strategies are set according to that strength.
Through segmentation step S100 and compression step S200, the data table fields and the associations between fields are established by analyzing the data content, and the corresponding optimized compression algorithms are then applied, which increases the data compression ratio.
Preferably, a mapping relationship storage step S300 may further be provided, in which the mapping between the original data and the compressed data is established in metadata and stored, so that when the data is accessed externally, the original data can be correctly parsed according to this mapping.
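One way the metadata mapping of step S300 might be laid out is shown below. The JSON layout, the codec names (`plain`, `enum`, `reverse_of`), and the `|` output separator are assumptions made for illustration; the point is that a field which is not physically stored (here, a reversed field) can still be rebuilt from the metadata alone.

```python
import json
from typing import Dict, List

def decode_field(name: str, meta: Dict[str, dict], store: Dict[str, list]) -> List[str]:
    """Rebuild one original field using its metadata entry."""
    entry = meta[name]
    codec = entry["codec"]
    if codec == "plain":
        return list(store[name])
    if codec == "enum":
        return [entry["codebook"][code] for code in store[name]]
    if codec == "reverse_of":
        # not physically stored: derived from the source field
        return [v[::-1] for v in decode_field(entry["source"], meta, store)]
    raise ValueError(f"unknown codec: {codec!r}")

def decode_records(meta_json: str, store: Dict[str, list], sep: str = "|") -> List[str]:
    """Parse the stored metadata and reassemble the original records in field order."""
    meta = json.loads(meta_json)
    columns = [decode_field(name, meta["fields"], store) for name in meta["order"]]
    return [sep.join(row) for row in zip(*columns)]
```

Usage: with metadata declaring `card_rev` as `{"codec": "reverse_of", "source": "card"}`, only `card` and the `currency` codes need to be kept in the store.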
Next, the data compression storage device of the present invention is briefly described. FIG. 2 is a schematic structural diagram of the data compression storage device of the present invention.
As shown in FIG. 2, the data compression storage device of the present invention comprises:
a segmentation module 100, configured to split the original data into a plurality of fields; and
a compression module 200, configured to compress different fields with different optimized compression strategies according to differences in data content and to store the resulting compressed data, wherein the compression module 200 determines the strength of the association between fields and sets the compression strategy according to that strength.
Preferably, a mapping relationship storage module 300 may further be provided, which establishes and stores the mapping relationship between the original data and the compressed data, so that when the data is accessed externally, the original data can be correctly restored according to the mapping relationship. Note that the mapping relationship storage module 300 is not an essential component of the data compression storage device of the present invention, but merely an optional, preferred module.
Specific embodiments of the data compression storage method and the data compression storage device of the present invention are described below.
First embodiment
The first embodiment performs compressed storage by applying an optimized compression strategy to each individual field.
First, the data compression storage method of the first embodiment is described.
The data compression storage method of the first embodiment comprises a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically comprises the following sub-steps:
performing content analysis on the data split into a plurality of fields, and establishing the associations between the fields; and
for each individual field, compressing and storing different data content with different compression strategies.
For example, the compression strategies apply a storage format optimized for each kind of data content: an enumerated field is stored in binary, a numeric string is converted to and stored as an integer or floating-point number, and so on.
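The two single-field strategies named above can be sketched as follows, assuming that an enumerated field is replaced by fixed-width bit codes and that numeric strings have already passed a round-trip check (such as `str(int(v)) == v`). The 64-bit big-endian layout (`>q` and `>d` in Python's `struct` module) is an assumption; a real system might choose narrower widths per field.

```python
import struct
from typing import List, Tuple


def pack_enum(values: List[str]) -> Tuple[List[str], int, bytes]:
    """Replace each value by its dictionary index, packed at the minimum bit width."""
    dictionary = sorted(set(values))
    bits = max(1, (len(dictionary) - 1).bit_length())   # e.g. 3 distinct values -> 2 bits
    index = {value: i for i, value in enumerate(dictionary)}
    acc = 0
    for value in values:
        acc = (acc << bits) | index[value]
    nbytes = (bits * len(values) + 7) // 8
    return dictionary, bits, acc.to_bytes(nbytes, "big")


def unpack_enum(dictionary: List[str], bits: int, data: bytes, count: int) -> List[str]:
    """Read `count` codes of `bits` bits each, most significant first."""
    acc = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    return [dictionary[(acc >> (bits * (count - 1 - i))) & mask] for i in range(count)]


def pack_numeric(values: List[str], kind: str) -> bytes:
    """Store numeric strings as 8-byte integers ('int') or IEEE-754 doubles ('float')."""
    code = "q" if kind == "int" else "d"
    numbers = [int(v) for v in values] if kind == "int" else [float(v) for v in values]
    return struct.pack(">%d%s" % (len(values), code), *numbers)


def unpack_numeric(data: bytes, kind: str) -> List[str]:
    """Inverse of pack_numeric; repr() restores canonical float strings exactly."""
    code = "q" if kind == "int" else "d"
    numbers = struct.unpack(">%d%s" % (len(data) // 8, code), data)
    return [str(n) if kind == "int" else repr(n) for n in numbers]
```

With four values drawn from three currencies, the enumerated field occupies one byte instead of twelve characters, plus a dictionary that is stored once per field.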
Preferably, a mapping relationship storing step may further be provided, in which the mapping relationship between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly restored according to the mapping relationship.
Next, the data compression storage device of the first embodiment is briefly described.
The data compression storage device of the first embodiment comprises a segmentation module and a compression module. The segmentation module of the first embodiment has the same function as segmentation module 100 described above, and the compression module of the first embodiment specifically comprises the following sub-modules:
a content analysis sub-module, which performs content analysis on the data split into a plurality of fields and establishes the associations between the fields; and
a compression sub-module, which, for each individual field, compresses and stores different data content with different compression strategies.
Optionally, a mapping relationship storage module may also be provided, which establishes and stores the mapping relationship between the original data and the compressed data.
Second embodiment
The second embodiment applies optimized compression strategies to combinations of multiple fields.
First, the data compression storage method of the second embodiment is described.
The data compression storage method of the second embodiment comprises a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically comprises the following sub-steps:
performing content analysis on the data split into a plurality of fields, building a data distribution map and an association graph between the fields, and identifying correlations between the data fields based on the data distribution map and the association graph; and
combining multiple correlated fields and, for the combined fields, compressing and storing different data content with different compression strategies.
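The patent does not specify how the data distribution map and the association graph are computed. As one hedged possibility, the sketch below uses per-field value frequencies as the distribution and draws an edge A -> B when the conditional entropy H(B | A) is (near) zero, meaning the value of A essentially determines B. The entropy criterion, the `threshold` parameter, and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple


def value_distribution(columns: Dict[str, List[str]]) -> Dict[str, Counter]:
    """'Data distribution map' (sketch): how often each value occurs in each field."""
    return {name: Counter(values) for name, values in columns.items()}


def conditional_entropy(a: List[str], b: List[str]) -> float:
    """H(B | A) in bits; 0 means every value of A maps to exactly one value of B."""
    n = len(a)
    joint = Counter(zip(a, b))
    marginal = Counter(a)
    return -sum(c / n * math.log2(c / marginal[x]) for (x, _), c in joint.items())


def association_graph(columns: Dict[str, List[str]],
                      threshold: float = 0.0) -> List[Tuple[str, str, float]]:
    """Directed edges (A, B, H(B|A)) for every ordered field pair with H(B|A) <= threshold."""
    edges = []
    for a, b in permutations(columns, 2):
        h = conditional_entropy(columns[a], columns[b])
        if h <= threshold + 1e-12:        # small tolerance for floating-point rounding
            edges.append((a, b, h))
    return edges
```

A pair of edges in both directions, as between two fields that encode the same information (say an area code and a city name), marks the fields as candidates for combined storage. On real data one would probably sample rows and allow a small positive threshold.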
For example, the compression strategies combine multiple fields and apply an optimized storage format: short fields are combined before compressed storage; for approximate (similar) fields, the redundant information is removed before compressed storage; for fields whose information is the reverse of each other, only one of them is stored; for fields with a containment relationship, the redundant information is removed before storage; and so on.
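The four multi-field strategies can be illustrated on a pair of fields `a` and `b`, where `a` is stored normally and only what is needed to rebuild `b` is kept. The rule names, the order in which rules are tried, and the reading of "approximate fields" as integers with a small difference are assumptions made for this sketch.

```python
from typing import List, Tuple


def _is_int(value: str) -> bool:
    try:
        return str(int(value)) == value
    except ValueError:
        return False


def combine_short_fields(a: List[str], b: List[str]) -> Tuple[int, List[str]]:
    """Concatenate a fixed-width field with a second short field into one field."""
    width = len(a[0]) if a else 0
    if any(len(x) != width for x in a):
        raise ValueError("the first field must be fixed-width")
    return width, [x + y for x, y in zip(a, b)]


def split_combined(width: int, combined: List[str]) -> Tuple[List[str], List[str]]:
    return [c[:width] for c in combined], [c[width:] for c in combined]


def encode_pair(a: List[str], b: List[str]) -> Tuple[str, object]:
    """Return (rule, payload): the minimum information needed to rebuild b from a."""
    if all(y == x[::-1] for x, y in zip(a, b)):
        return "reverse", None                      # reverse order: store only a
    if all(x.startswith(y) for x, y in zip(a, b)):
        return "prefix", [len(y) for y in b]        # containment: keep only b's length
    if all(_is_int(x) and _is_int(y) for x, y in zip(a, b)):
        # approximate fields: keep differences (a real system would check they are small)
        return "delta", [int(y) - int(x) for x, y in zip(a, b)]
    return "plain", list(b)


def decode_pair(rule: str, a: List[str], payload) -> List[str]:
    if rule == "reverse":
        return [x[::-1] for x in a]
    if rule == "prefix":
        return [x[:n] for x, n in zip(a, payload)]
    if rule == "delta":
        return [str(int(x) + d) for x, d in zip(a, payload)]
    return list(payload)
```

For instance, a bank-card-prefix field that is always the start of a card-number field reduces to a list of lengths, and a second timestamp that trails the first by a few seconds reduces to small integers that compress well.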
Preferably, a mapping relationship storing step may further be provided, in which the mapping relationship between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly restored according to the mapping relationship.
Next, the data compression storage device of the second embodiment is briefly described.
The data compression storage device of the second embodiment comprises a segmentation module and a compression module. The segmentation module of the second embodiment has the same function as segmentation module 100 described above, and the compression module of the second embodiment specifically comprises the following sub-modules:
a content analysis sub-module, which performs content analysis on the data split into a plurality of fields, builds a data distribution map and an association graph between the fields, and identifies correlations between the data fields based on the data distribution map and the association graph; and
a compression sub-module, which combines multiple correlated fields and, for the combined fields, compresses and stores different data content with different compression strategies.
Optionally, a mapping relationship storage module may also be provided, which establishes and stores the mapping relationship between the original data and the compressed data.
Third embodiment
The third embodiment applies optimized compression strategies both to individual fields and to combinations of multiple fields.
First, the data compression storage method of the third embodiment is described.
The data compression storage method of the third embodiment comprises a segmentation step and a compression step, wherein the segmentation step is the same as segmentation step S100 described above, and the compression step specifically comprises the following sub-steps:
performing content analysis on the data split into a plurality of fields, building a data distribution map and an association graph between the fields, and identifying correlations between the data fields based on the data distribution map and the association graph; and
for each individual field, compressing and storing different data content with different compression strategies, and also combining multiple correlated fields and, for the combined fields, compressing and storing different data content with different compression strategies.
As compression strategies, multiple fields are combined, and optimized storage formats are applied to both single fields and multiple fields; for example: an enumerated field is stored in binary; a numeric string is converted to and stored as an integer or floating-point number; short fields are combined before compressed storage; for approximate fields, the redundant information is removed before compressed storage; for fields whose information is the reverse of each other, only one is stored; for fields with a containment relationship, the redundant information is removed before storage; and so on.
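In the third embodiment both kinds of strategy are applied to the same table. A hedged sketch of one possible ordering: first find fields that can be rebuilt from another stored field (only exact reversal and prefix containment are checked here), mark them as derived, then give every remaining field a single-field strategy. The greedy order, the strategy labels, and the function names are assumptions.

```python
from typing import Dict, List


def _derivable(a: List[str], b: List[str]) -> str:
    """Name of a rule that rebuilds field b from field a, or '' if none applies."""
    if all(y == x[::-1] for x, y in zip(a, b)):
        return "reverse"
    if all(x.startswith(y) for x, y in zip(a, b)):
        return "prefix"
    return ""


def _single_field(values: List[str], max_enum: int = 16) -> str:
    try:
        if all(str(int(v)) == v for v in values):
            return "int"
    except ValueError:
        pass
    return "enum" if len(set(values)) <= max_enum else "text"


def plan(columns: Dict[str, List[str]]) -> Dict[str, str]:
    """Greedy plan: derived fields get '<rule> of <source>', the rest a single-field strategy."""
    result: Dict[str, str] = {}
    sources = set()
    for a in columns:                 # candidate stored (source) field
        if a in result:               # a derived field never serves as a source
            continue
        for b in columns:             # candidate derived field
            if b == a or b in result or b in sources:
                continue              # a source is never derived, so no chains or cycles
            rule = _derivable(columns[a], columns[b])
            if rule:
                result[b] = "%s of %s" % (rule, a)
                sources.add(a)
    for name, values in columns.items():
        result.setdefault(name, _single_field(values))
    return result
```

Because a field that serves as a source is never itself derived, every derived field can be rebuilt in one step from a field that is physically stored.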
Preferably, a mapping relationship storing step may further be provided, in which the mapping relationship between the original data and the compressed data is established and stored, so that when the data is accessed externally, the original data can be correctly restored according to the mapping relationship.
Next, the data compression storage device of the third embodiment is briefly described.
The data compression storage device of the third embodiment comprises a segmentation module and a compression module. The segmentation module of the third embodiment has the same function as segmentation module 100 described above, and the compression module of the third embodiment specifically comprises the following sub-modules:
a content analysis sub-module, which performs content analysis on the data split into a plurality of fields, builds a data distribution map and an association graph between the fields, and identifies correlations between the data fields based on the data distribution map and the association graph; and
a compression sub-module, which, for each individual field, compresses and stores different data content with different compression strategies, and also combines multiple correlated fields and, for the combined fields, compresses and stores different data content with different compression strategies.
Optionally, a mapping relationship storage module may also be provided, which establishes and stores the mapping relationship between the original data and the compressed data.
The present invention further provides a computer-readable medium storing a computer program which, when executed by a processor, implements the data compression storage method of any of the above embodiments.
The present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the data compression storage method described above when executing the computer program.
As described above, the data compression storage method and the data compression storage device of the present invention apply different compression methods according to differences in data content, which effectively improves data compression efficiency; compared with general-purpose data compression tools such as GZIP and SNAPPY, the data compression ratio is significantly improved.
The above examples mainly illustrate the data compression storage method and the data compression storage device of the present invention. Although only some specific embodiments of the present invention have been described, those of ordinary skill in the art will understand that the present invention may be embodied in many other forms without departing from its spirit and scope. Accordingly, the examples and embodiments shown are to be regarded as illustrative rather than restrictive, and the present invention may cover various modifications and substitutions without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (17)

  1. A data compression storage method, comprising the following steps:
    a segmentation step of splitting the original data into a plurality of fields; and
    a compression step of compressing different fields with different compression strategies according to differences in data content, and storing the resulting compressed data.
  2. The data compression storage method according to claim 1, wherein
    in the compression step, the strength of the association between fields is determined, and the compression strategy is set according to that strength.
  3. The data compression storage method according to claim 1, wherein
    the compression step comprises the following sub-steps:
    performing content analysis on the data split into a plurality of fields, and establishing the associations between the fields; and
    for each individual field, compressing and storing different data content with different compression strategies.
  4. The data compression storage method according to claim 1, wherein the compression step comprises the following sub-steps:
    performing content analysis on the data split into a plurality of fields, building a data distribution map and an association graph between the fields, and identifying correlations between the data fields based on the data distribution map and the association graph; and
    combining multiple correlated fields and, for the combined fields, compressing and storing different data content with different compression strategies.
  5. The data compression storage method according to claim 1, wherein the compression step comprises the following sub-steps:
    performing content analysis on the data split into a plurality of fields, building a data distribution map and an association graph between the fields, and identifying correlations between the data fields based on the data distribution map and the association graph; and
    for each individual field, compressing and storing different data content with different compression strategies, and, in addition, combining multiple correlated fields and, for the combined fields, compressing and storing different data content with different compression strategies.
  6. The data compression storage method according to claim 3, wherein
    as compression strategies, an enumerated string field is stored in compressed binary form, and a numeric string is converted to an integer or floating-point number for compressed storage.
  7. The data compression storage method according to claim 4, wherein
    as compression strategies, short fields are combined before compressed storage; approximate fields are stored in compressed form after their redundant information is removed; for fields whose information is the reverse of each other, only one of them is stored; and for fields with a containment relationship, the redundant information is removed before storage.
  8. The data compression storage method according to claim 5, wherein
    as compression strategies, an enumerated string field is stored in compressed binary form; a numeric string is converted to an integer or floating-point number for compressed storage; short fields are combined before compressed storage; approximate fields are stored in compressed form after their redundant information is removed; for fields whose information is the reverse of each other, only one of them is stored; and for fields with a containment relationship, the redundant information is removed before storage.
  9. The data compression storage method according to any one of claims 1 to 8, further comprising: a mapping relationship storing step of establishing and storing a mapping relationship between the original data and the compressed data.
  10. A data compression storage device, comprising:
    a segmentation module, configured to split the original data into a plurality of fields; and
    a compression module, configured to compress different fields with different compression strategies according to differences in data content, and to store the resulting compressed data.
  11. The data compression storage device according to claim 10, wherein
    the compression module determines the strength of the association between fields and sets the compression strategy according to that strength.
  12. The data compression storage device according to claim 10, further comprising:
    a mapping relationship storage module, configured to establish and store a mapping relationship between the original data and the compressed data.
  13. The data compression storage device according to any one of claims 10 to 12, wherein the compression module comprises the following sub-modules:
    a content analysis sub-module, which performs content analysis on the data split into a plurality of fields and establishes the associations between the fields; and
    a compression sub-module, which, for each individual field, compresses and stores different data content with different compression strategies.
  14. The data compression storage device according to claim 10, wherein the compression module comprises:
    a content analysis sub-module, which performs content analysis on the data split into a plurality of fields, builds a data distribution map and an association graph between the fields, and identifies correlations between the data fields based on the data distribution map and the association graph; and
    a compression sub-module, which combines multiple correlated fields and, for the combined fields, compresses and stores different data content with different compression strategies.
  15. The data compression storage device according to claim 10, wherein the compression module comprises:
    a content analysis sub-module, which performs content analysis on the data split into a plurality of fields, builds a data distribution map and an association graph between the fields, and identifies correlations between the data fields based on the data distribution map and the association graph; and
    a compression sub-module, which, for each individual field, compresses and stores different data content with different compression strategies, and also combines multiple correlated fields and, for the combined fields, compresses and stores different data content with different compression strategies.
  16. A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the data compression storage method according to any one of claims 1 to 9.
  17. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the data compression storage method according to any one of claims 1 to 9.
PCT/CN2018/111180 2017-12-28 2018-10-22 Data compression and storage method and data compression and storage device WO2019128409A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711455790.2A CN108304472A (en) 2017-12-28 2017-12-28 A kind of data compression storage method and compression storing data device
CN201711455790.2 2017-12-28

Publications (1)

Publication Number Publication Date
WO2019128409A1 true WO2019128409A1 (en) 2019-07-04

Family

ID=62867648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111180 WO2019128409A1 (en) 2017-12-28 2018-10-22 Data compression and storage method and data compression and storage device

Country Status (3)

Country Link
CN (1) CN108304472A (en)
TW (1) TWI683548B (en)
WO (1) WO2019128409A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304472A (en) * 2017-12-28 2018-07-20 中国银联股份有限公司 A kind of data compression storage method and compression storing data device
CN110134342A (en) * 2019-05-28 2019-08-16 首都师范大学 Data approximation method and system, storage method and system, read method and system
CN111010189B (en) * 2019-10-21 2021-10-26 清华大学 Multi-path compression method and device for data set and storage medium
CN110784227B (en) * 2019-10-21 2021-07-30 清华大学 Multi-path compression method and device for data set and storage medium
CN111008230B (en) * 2019-11-22 2023-08-04 远景智能国际私人投资有限公司 Data storage method, device, computer equipment and storage medium
CN111259107B (en) * 2020-01-10 2023-08-18 北京百度网讯科技有限公司 Determinant text storage method and device and electronic equipment
CN113220651B (en) * 2021-04-25 2024-02-09 暨南大学 Method, device, terminal equipment and storage medium for compressing operation data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708183A (en) * 2012-05-09 2012-10-03 华为技术有限公司 Method and device for data compression
CN105308589A (en) * 2013-04-17 2016-02-03 朗桑有限公司 Compacting data based on data content
CN106980639A (en) * 2016-12-29 2017-07-25 中国银联股份有限公司 Short text data paradigmatic system and method
CN108304472A (en) * 2017-12-28 2018-07-20 中国银联股份有限公司 A kind of data compression storage method and compression storing data device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638579B (en) * 2012-03-29 2016-05-04 深圳市高正软件有限公司 A kind of data processing method and system based on mobile device data transmission
CN103379136B (en) * 2012-04-17 2017-02-22 中国移动通信集团公司 Compression method and decompression method of log acquisition data, compression apparatus and decompression apparatus of log acquisition data
CN104424229B (en) * 2013-08-26 2019-02-22 腾讯科技(深圳)有限公司 A kind of calculation method and system that various dimensions are split
CN104462524A (en) * 2014-12-24 2015-03-25 福建江夏学院 Data compression storage method for Internet of Things
CN106156037B (en) * 2015-03-26 2019-11-12 深圳市腾讯计算机系统有限公司 Data processing method, apparatus and system
SG11201703157YA (en) * 2015-12-29 2017-08-30 Huawei Tech Co Ltd Server and method for compressing data by server
CN106019369B (en) * 2016-06-28 2017-12-22 西南科技大学 Geological data lossless compression algorithm in a kind of improved SEG Y files

Also Published As

Publication number Publication date
TWI683548B (en) 2020-01-21
TW201931780A (en) 2019-08-01
CN108304472A (en) 2018-07-20

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18894483

Country of ref document: EP

Kind code of ref document: A1