CN109981110A - The method of lossy compression with point-by-point relative error boundary - Google Patents
- Publication number
- CN109981110A (application number CN201910164475.7A)
- Authority
- CN
- China
- Prior art keywords
- point
- relative error
- lossy compression
- factor
- point relative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention provides a method of lossy compression with a point-by-point relative error bound, comprising the following steps: A. table construction: build lookup tables from the error requirement and the interval of the quantization factors; B. obtain the quantization factors; C. Huffman coding: compress the sequence of quantization factors generated in step B with Huffman coding; D. lossless compression: compress the Huffman codes and the Huffman tree generated in step C with a lossless compression method. The beneficial effect of the invention is that the time-consuming logarithmic transformation in lossy compression with a point-by-point relative error bound is avoided and the quantization factors are obtained by table lookup, which greatly accelerates lossy compression with a point-by-point relative error bound.
Description
Technical Field
The present invention relates to lossy compression methods, and in particular to a method of lossy compression with a point-by-point relative error bound.
Background
Scientific simulations in high-performance computing (HPC) environments produce very large volumes of data, which can cause severe I/O bottlenecks at run time and impose a heavy storage burden on post-analysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, lossy compression can significantly reduce data size while still meeting the user's error-control requirements. To adapt automatically to the precision requirements across a data set, lossy compression with a point-by-point relative error bound (that is, the allowed compression error depends on the data value) is widely used in scientific applications.
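As a minimal illustration of what a point-by-point relative error bound means, the check below tests whether every reconstructed value stays within a fraction `delta` of its original value. The function name and the sample numbers are hypothetical and are not taken from the patent.

```python
def within_pointwise_relative_bound(original, reconstructed, delta):
    """Return True if every point satisfies |x - x_hat| <= delta * |x|."""
    return all(
        abs(x - x_hat) <= delta * abs(x)
        for x, x_hat in zip(original, reconstructed)
    )


# Values spanning six orders of magnitude, each reconstructed 0.5 % too high.
data = [1000.0, 1.0, 0.001]
approx = [1005.0, 1.005, 0.001005]

# The allowed error scales with each value, so one delta serves all three points.
print(within_pointwise_relative_bound(data, approx, delta=0.01))
```

A single absolute bound could not express the same requirement here: an absolute tolerance loose enough for the value 1000.0 would be far larger than the value 0.001 itself.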
The original lossy compression with a point-by-point relative error bound applies a logarithmic transformation to every data value during compression. Computers generally evaluate logarithms with series expansions, which is computationally expensive and slow. Because every value must be converted to its logarithm, the cost grows with the size of the data, and this step accounts for a relatively large share of the algorithm's total run time. As a result, lossy compression with a point-by-point relative error bound is complex and time-consuming.
How to accelerate lossy compression with a point-by-point relative error bound is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
To solve the problems in the prior art, the present invention provides a method of lossy compression with a point-by-point relative error bound.
The method comprises the following steps:
A. Table construction: build lookup tables from the error requirement and the interval of the quantization factors;
B. Obtain the quantization factors;
C. Huffman coding: compress the sequence of quantization factors generated in step B with Huffman coding;
D. Lossless compression: compress the Huffman codes and the Huffman tree generated in step C with a lossless compression method.
As a further improvement of the present invention, in step B the ratio $R = X_i / X'_i$ of the actual value $X_i$ to the predicted value $X'_i$ is calculated, and the table generated in step A is then queried with $R$ to obtain the quantization factor.
As a further improvement of the present invention, step A includes the following sub-steps:
A1. Traverse the domain of the quantization factors, calculate the coverage range of each quantization factor, and generate table T1, which maps a quantization factor to the range it covers;
A2. Calculate the size of table T2 from the error requirement, then compute the value of each entry of table T2 in turn from table T1 and fill in table T2, which maps the ratio R to the quantization factor M.
As a further improvement of the present invention, in step A1 the value range $P_k$ corresponding to each quantization factor $M_k$ is calculated to generate table T1.
As a further improvement of the present invention, in step A2 the value ranges corresponding to adjacent quantization factors overlap, and the size of the overlap is smaller than the size of a table T2 entry.
The beneficial effect of the present invention is that, with the above scheme, the time-consuming logarithmic transformation in lossy compression with a point-by-point relative error bound is avoided and the quantization factors are obtained by table lookup, which greatly accelerates lossy compression with a point-by-point relative error bound.
Description of Drawings
FIG. 1 is a flowchart of the method of lossy compression with a point-by-point relative error bound according to the present invention.
FIG. 2 is a flowchart of step A of the method of lossy compression with a point-by-point relative error bound according to the present invention.
Detailed Description
The present invention is further described below with reference to the drawings and specific embodiments.
As shown in FIG. 1, a method of lossy compression with a point-by-point relative error bound comprises the following steps:
A. Table construction: build lookup tables from the user-supplied error requirement and the interval of the quantization factors, for use in the subsequent steps;
B. Obtain the quantization factors: calculate the ratio $R = X_i / X'_i$ of the actual value $X_i$ to the predicted value $X'_i$, then query the table generated in step A with $R$ to obtain the quantization factor;
C. Huffman coding: compress the sequence of quantization factors generated in step B with Huffman coding;
D. Lossless compression: compress the Huffman codes and the Huffman tree generated in step C with a conventional lossless compression method such as gzip or zstd.
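Steps C and D can be sketched as follows, under stated assumptions: the quantization factors are small integers, the Huffman "tree" is stood in for by a printable code table, and Python's standard `gzip` module plays the role of the conventional lossless compressor. The function names and the byte layout are hypothetical and not taken from the patent.

```python
import gzip
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} for a non-empty symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap items: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(factors):
    """Step C (Huffman) followed by step D (gzip over code table + bitstream)."""
    code = huffman_code(factors)
    bits = "".join(code[f] for f in factors)
    padded = bits + "0" * (-len(bits) % 8)   # pad to a whole number of bytes
    payload = int(padded, 2).to_bytes(len(padded) // 8, "big")
    table = repr(sorted(code.items())).encode()   # stand-in for the Huffman tree
    return gzip.compress(table + b"\n" + payload), code, len(bits)


def decode(blob, code, n_bits):
    """Invert encode(): gunzip, drop the table line, walk the prefix code."""
    payload = gzip.decompress(blob).split(b"\n", 1)[1]
    bits = "".join(f"{byte:08b}" for byte in payload)[:n_bits]
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


factors = [0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0]
blob, code, n_bits = encode(factors)
print(decode(blob, code, n_bits) == factors)
```

Because good predictions make most quantization factors equal or close to zero, the most frequent factor receives the shortest code, which is why an entropy coder is applied before the general-purpose compressor.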
As shown in FIG. 2, step A includes the following sub-steps:
A1. Traverse the domain of the quantization factors, calculate the coverage range of each quantization factor, and generate table T1, which maps a quantization factor to the range it covers;
A2. Calculate the size of table T2 from the error requirement, then compute the value of each entry of table T2 in turn from table T1 and fill in table T2, which maps the ratio R to the quantization factor M.
Traditional logarithmic processing transforms all the data into the log domain, $\{X_i\} \rightarrow \{\log(X_i)\}$, and denotes $\{\log(X_i)\}$ by $\{Y_i\}$. The quantization factor is then computed from the actual value $Y_i$ and the predicted value $Y'_i$ and recorded.
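For illustration only, the traditional logarithm-based quantization might be sketched as follows. The rounding rule, the log-domain bound `eps = log(1 + delta)`, and the function names are assumptions borrowed from common error-bounded quantization practice; they are not taken from this specification, and positive data values are assumed.

```python
import math


def log_domain_quantize(x, x_pred, delta):
    """Quantization factor computed through the log transform (assumed rounding rule)."""
    eps = math.log1p(delta)                  # assumed absolute bound in the log domain
    y, y_pred = math.log(x), math.log(x_pred)
    return round((y - y_pred) / (2 * eps))   # nearest multiple of 2 * eps


def log_domain_reconstruct(m, x_pred, delta):
    """Undo the quantization: step back in the log domain, then exponentiate."""
    eps = math.log1p(delta)
    return math.exp(math.log(x_pred) + 2 * eps * m)


delta = 0.01
x, x_pred = 105.0, 100.0
m = log_domain_quantize(x, x_pred, delta)
x_hat = log_domain_reconstruct(m, x_pred, delta)
print(m, x_hat, abs(x - x_hat) / abs(x) <= delta)
```

Under these assumptions the log-domain error is at most `eps`, so the reconstructed value differs from the original by a factor between `1 / (1 + delta)` and `1 + delta`. Every data point costs two `math.log` calls here, which is the per-point expense the table-based method aims to remove.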
From the definition of the quantization factor, each quantization factor corresponds to a value range of $Y_i - Y'_i$: whenever $Y_i - Y'_i$ falls within that range, the same quantization factor is generated. Step A1 calculates the value range $P_k$ corresponding to each quantization factor $M_k$ and generates table T1.
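A possible shape for table T1 is sketched below. It assumes the same hypothetical rounding rule as a conventional linear-scaling quantizer (factor M covers log-domain differences from (2M - 1)·eps to (2M + 1)·eps, with eps = log(1 + delta)) and stores each range directly as a range of ratios. The function name and parameters are illustrative, not the patent's.

```python
import math


def build_t1(delta, max_factor):
    """Map each quantization factor M to the (low, high) ratio range it covers.

    Assumption: factor M covers log-domain differences in
    [(2M - 1) * eps, (2M + 1) * eps] with eps = log(1 + delta),
    which corresponds to ratios in [(1 + delta)**(2M - 1), (1 + delta)**(2M + 1)].
    """
    base = 1.0 + delta
    return {
        m: (base ** (2 * m - 1), base ** (2 * m + 1))
        for m in range(-max_factor, max_factor + 1)
    }


t1 = build_t1(delta=0.01, max_factor=3)
for m, (low, high) in t1.items():
    print(m, round(low, 6), round(high, 6))
```

The table is computed once per error bound and covers only the finite set of quantization factors, so its cost does not depend on the number of data points.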
Since $Y_i - Y'_i = \log X_i - \log X'_i = \log(X_i / X'_i)$, each value range can be restated in terms of the ratio $R = X_i / X'_i$, where $0 < \delta < 1$. Table T2 is built according to the precision requirement (that is, the error requirement) so that M can be obtained from $R$. To prevent a table entry from straddling two value ranges, the spacing of adjacent quantization factors is adjusted slightly so that the value ranges of adjacent quantization factors overlap to some extent, with the overlap kept smaller than the size represented by one T2 entry; every table entry then lies entirely within a single value range, which avoids the problem. Finally, table T2 is traversed and its entries are filled in from table T1.
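One way to picture the construction of table T2 is the sketch below. It is not the patent's layout: it assumes a uniform grid of cells over the ratio R, a hypothetical `spacing_exp` parameter that pulls adjacent quantization factors closer together so their ranges overlap, and illustrative parameter values. Each cell is assigned a factor only if the whole cell lies inside that factor's range.

```python
def build_tables(delta, max_factor, cell_width, spacing_exp=1.8):
    """Return (t1, t2, r_min) for a uniform-grid lookup over the ratio R.

    spacing_exp = 2.0 gives ranges that only touch; smaller values make
    the ranges of adjacent factors overlap (an assumption of this sketch).
    """
    base = 1.0 + delta
    step = base ** spacing_exp               # ratio between neighbouring reconstruction points
    # T1: factor -> ratio range within which reconstruction stays inside the bound.
    t1 = {
        m: (step ** m / base, step ** m * base)
        for m in range(-max_factor, max_factor + 1)
    }
    r_min, r_max = t1[-max_factor][0], t1[max_factor][1]
    n_cells = int((r_max - r_min) / cell_width)
    t2 = []
    for i in range(n_cells):
        lo = r_min + i * cell_width
        hi = lo + cell_width
        # A cell is usable only if it sits entirely inside one factor's range.
        owner = next((m for m, (a, b) in t1.items() if a <= lo and hi <= b), None)
        t2.append(owner)
    return t1, t2, r_min


t1, t2, r_min = build_tables(delta=0.01, max_factor=4, cell_width=0.0005)
print(len(t2), all(m is not None for m in t2))

# Without any overlap, cells that straddle a range boundary have no owner.
_, t2_touching, _ = build_tables(0.01, 4, 0.0005, spacing_exp=2.0)
print(any(m is None for m in t2_touching))
```

With these illustrative numbers the overlap spans several grid cells, which is what lets the containment check succeed for every cell in this sketch; with touching ranges, the cells that cross a boundary are left unassigned.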
Step B then queries table T2 with the calculated ratio $R = X_i / X'_i$ to obtain the required quantization factor.
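The lookup in step B can be sketched end to end as below. This is a self-contained illustration under assumptions, not the patented implementation: the uniform grid over R, the constants, the overlap-producing exponent 1.8, and all names are hypothetical, and positive data values are assumed. The per-point work is one division, one subtraction, and one table index, with no logarithm.

```python
DELTA = 0.01                 # point-wise relative error bound (illustrative)
MAX_FACTOR = 4               # quantization factors range over -4 .. 4
CELL = 0.0005                # width of one T2 cell on the ratio axis
BASE = 1.0 + DELTA
STEP = BASE ** 1.8           # spacing slightly below BASE**2 so ranges overlap

# T1: factor -> ratio range; T2: grid cell -> factor whose range contains the cell.
RANGES = {m: (STEP ** m / BASE, STEP ** m * BASE)
          for m in range(-MAX_FACTOR, MAX_FACTOR + 1)}
R_MIN = RANGES[-MAX_FACTOR][0]
R_MAX = RANGES[MAX_FACTOR][1]
N_CELLS = int((R_MAX - R_MIN) / CELL)


def _owner(lo, hi):
    """First factor whose range fully contains the cell [lo, hi], else None."""
    return next((m for m, (a, b) in RANGES.items() if a <= lo and hi <= b), None)


T2 = [_owner(R_MIN + i * CELL, R_MIN + (i + 1) * CELL) for i in range(N_CELLS)]


def lookup_factor(x, x_pred):
    """Step B: obtain the quantization factor from the ratio, without a logarithm."""
    ratio = x / x_pred
    idx = int((ratio - R_MIN) / CELL)
    if ratio < R_MIN or idx >= N_CELLS:
        return None          # outside the table: would need separate handling
    return T2[idx]


def reconstruct(m, x_pred):
    """Decoder side: scale the prediction by the factor's reconstruction point."""
    return x_pred * STEP ** m


x, x_pred = 105.0, 100.0
m = lookup_factor(x, x_pred)
x_hat = reconstruct(m, x_pred)
print(m, abs(x - x_hat) / abs(x) <= DELTA)
```

Values whose ratio falls outside the table return `None` in this sketch; how such points are stored is not covered here.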
In the original technical solution, computing logarithms is time-consuming, its cost grows with the size of the data, and it always accounts for a relatively large share of the total run time. The method of lossy compression with a point-by-point relative error bound provided by the present invention bypasses the logarithm computation: it builds tables and looks up the ratio of the actual value to the predicted value, thereby obtaining the quantization factor directly. The underlying principle remains exactly the same as before, while the time-consuming logarithm step is eliminated and the whole algorithm is accelerated.
The above is a further detailed description of the present invention in combination with specific preferred embodiments, and the specific implementation of the present invention shall not be considered limited to these descriptions. For those of ordinary skill in the technical field to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910164475.7A CN109981110B (en) | 2019-03-05 | 2019-03-05 | Method of lossy compression with point-by-point relative error bounds |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910164475.7A CN109981110B (en) | 2019-03-05 | 2019-03-05 | Method of lossy compression with point-by-point relative error bounds |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109981110A | 2019-07-05 |
CN109981110B | 2023-03-24 |
Family
ID=67077958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910164475.7A Active CN109981110B (en) | 2019-03-05 | 2019-03-05 | Method of lossy compression with point-by-point relative error bounds |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109981110B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5724453A (en) * | 1995-07-10 | 1998-03-03 | Wisconsin Alumni Research Foundation | Image compression system and method having optimized quantization tables |
US6049630A (en) * | 1996-03-19 | 2000-04-11 | America Online, Inc. | Data compression using adaptive bit allocation and hybrid lossless entropy encoding |
US20040044521A1 (en) * | 2002-09-04 | 2004-03-04 | Microsoft Corporation | Unified lossy and lossless audio compression |
US20080193028A1 (en) * | 2007-02-13 | 2008-08-14 | Yin-Chun Blue Lan | Method of high quality digital image compression |
US20080285866A1 (en) * | 2007-05-16 | 2008-11-20 | Takashi Ishikawa | Apparatus and method for image data compression |
WO2010030256A1 (en) * | 2008-09-12 | 2010-03-18 | Tovaristvo Z Obmezenou Vidpovidalnistu 'smail' | Alias-free method of image coding and decoding (2 variants) |
US20160127746A1 (en) * | 2012-05-04 | 2016-05-05 | Environmental Systems Research Institute, Inc. | Limited error raster compression |
- 2019-03-05 CN CN201910164475.7A patent/CN109981110B/en active Active
Non-Patent Citations (1)
Title |
---|
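Leng Xingxing et al.: "Research on a high-compression, low-loss image coding algorithm", Journal of Chengdu University of Information Technology (《成都信息工程学院学报》) * |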
Also Published As
Publication number | Publication date |
---|---|
CN109981110B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8892586B2 (en) | Accelerated query operators for high-speed, in-memory online analytical processing queries and operations | |
WO2023045204A1 (en) | Method and system for generating finite state entropy coding table, medium, and device | |
CN103177111B (en) | Data deduplication system and delet method thereof | |
CN103326730B (en) | Data parallel compression method | |
US9928267B2 (en) | Hierarchical database compression and query processing | |
WO2022188583A1 (en) | Decoding method/encoding method based on point cloud attribute prediction, decoder, and encoder | |
US20190251189A1 (en) | Delta Compression | |
US11403017B2 (en) | Data compression method, electronic device and computer program product | |
US9916319B2 (en) | Effective method to compress tabular data export files for data movement | |
CN107291935B (en) | CPIR-V Nearest Neighbor Privacy Preserving Query Method Based on Spark and Huffman Coding | |
WO2019041918A1 (en) | Data coding method and device, and storage medium | |
CN114972551A (en) | A point cloud compression and decompression method | |
CN103152430A (en) | Cloud storage method for reducing data-occupied space | |
WO2019080670A1 (en) | Gene sequencing data compression method and decompression method, system, and computer readable medium | |
Barbarioli et al. | Hierarchical residual encoding for multiresolution time series compression | |
Zou et al. | Performance optimization for relative-error-bounded lossy compression on scientific data | |
CN117216023A (en) | Large-scale network data storage method and system | |
WO2023015831A1 (en) | Huffman correction encoding method and system, and relevant components | |
Zou et al. | Accelerating relative-error bounded lossy compression for HPC datasets with precomputation-based mechanisms | |
CN109981110A (en) | The method of lossy compression with point-by-point relative error boundary | |
CN114665884A (en) | Time sequence database self-adaptive lossy compression method, system and medium | |
US20240080040A1 (en) | System and method for data storage, transfer, synchronization, and security using automated model monitoring and training | |
CN106599112A (en) | Massive incomplete data storage and operation method | |
US9787323B1 (en) | Huffman tree decompression | |
CN110348469A (en) | A kind of user's method for measuring similarity based on DeepWalk internet startup disk model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||