CN109981110A - The method of lossy compression with point-by-point relative error boundary - Google Patents

Info

Publication number
CN109981110A
Authority
CN
China
Prior art keywords
point
relative error
lossy compression
factor
point relative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910164475.7A
Other languages
Chinese (zh)
Other versions
CN109981110B (en)
Inventor
夏文
邹翔宇
王轩
张伟哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen
Priority to CN201910164475.7A
Publication of CN109981110A
Application granted
Publication of CN109981110B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a lossy compression method with point-wise relative error bounds, comprising the following steps: A. table construction, in which tables are built according to the error requirement and the interval of the quantization factor; B. obtaining the quantization factors; C. Huffman coding, in which the quantization factor sequence generated in step B is compressed by Huffman coding; D. lossless compression, in which the Huffman codes and the Huffman tree generated in step C are compressed with a lossless compression method. The beneficial effects of the invention are that the time-consuming logarithmic transformation in lossy compression with point-wise relative error bounds is avoided and the quantization factor is obtained by table lookup instead, which greatly accelerates lossy compression with point-wise relative error bounds.

Description

A method for lossy compression with point-wise relative error bounds

Technical Field

The present invention relates to lossy compression methods, and in particular to a lossy compression method with point-wise relative error bounds.

Background

Scientific simulations in high-performance computing (HPC) environments produce very large volumes of data, which can cause severe I/O bottlenecks at runtime and impose a heavy storage burden on post-analysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, lossy compression can significantly reduce data size while still meeting the user's error-control requirements. To adapt automatically to the precision requirements within a dataset, lossy compression with point-wise relative error bounds (that is, the permitted compression error depends on the data value) is widely used in scientific applications.
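To make the notion of a point-wise relative error bound concrete, here is a minimal Python sketch. The function name `within_pointwise_relative_bound` and the sample values are hypothetical and chosen only for illustration; the check itself is the usual condition that each reconstructed value stays within a fraction `eps` of its own original value.

```python
def within_pointwise_relative_bound(original, reconstructed, eps):
    """Return True if every reconstructed value x_hat satisfies
    |x_hat - x| <= eps * |x| for its own original value x."""
    return all(
        abs(x_hat - x) <= eps * abs(x)
        for x, x_hat in zip(original, reconstructed)
    )


original = [1000.0, 1.0, 0.001]
# Each value is off by at most 1% of its own magnitude.
close = [1005.0, 0.995, 0.001005]
# The last value has a tiny absolute error (0.001) but a 100% relative error.
off = [1000.0, 1.0, 0.002]

print(within_pointwise_relative_bound(original, close, 0.01))
print(within_pointwise_relative_bound(original, off, 0.01))
```

The second case shows why a single absolute bound is a poor fit for data spanning several orders of magnitude: an error that is negligible for 1000.0 can swamp a value such as 0.001.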

The original lossy compression with point-wise relative error bounds requires every data point to undergo a logarithmic transformation during compression. Logarithms are generally computed by series expansion, which is computationally expensive and time-consuming. Because this step converts all data to logarithmic form, its cost grows with the data size, and it accounts for a relatively large share of the algorithm's total running time. As a result, lossy compression with point-wise relative error bounds is complex and slow.
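One common way to see why a logarithmic transformation is used at all is sketched below. It assumes that all values are positive and that the log-domain data are compressed with an absolute error bound of $\log(1+\varepsilon)$. The symbols $\varepsilon$, $\hat{X}_i$ and $\hat{Y}_i$ are introduced for this illustration and do not appear in the original text.

$$
|\hat{Y}_i - Y_i| \le \log(1+\varepsilon)
\;\Longrightarrow\;
\frac{1}{1+\varepsilon} \le \frac{\hat{X}_i}{X_i} \le 1+\varepsilon
\;\Longrightarrow\;
\frac{|\hat{X}_i - X_i|}{|X_i|} \le \varepsilon ,
$$

where $Y_i = \log X_i$ and $\hat{Y}_i = \log \hat{X}_i$. The last implication uses $1 - \frac{1}{1+\varepsilon} = \frac{\varepsilon}{1+\varepsilon} \le \varepsilon$. An absolute bound in the log domain therefore yields a point-wise relative bound in the original domain, at the cost of one logarithm per data point.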

Therefore, how to accelerate lossy compression with point-wise relative error bounds is a technical problem that those skilled in the art urgently need to solve.

Summary of the Invention

To solve the problems in the prior art, the present invention provides a lossy compression method with point-wise relative error bounds.

The method comprises the following steps:

A. Table construction: build tables according to the error requirement and the interval of the quantization factor;

B. Obtain the quantization factors;

C. Huffman coding: compress the quantization factor sequence generated in step B by Huffman coding;

D. Lossless compression: compress the Huffman codes and the Huffman tree generated in step C with a lossless compression method.

As a further improvement of the present invention, in step B the ratio of the actual value $X_i$ to the predicted value $X'_i$ is calculated as $R = X_i / X'_i$, and the table generated in step A is then queried with $R$ to obtain the quantization factor.

As a further improvement of the present invention, step A comprises the following sub-steps:

A1. Traverse the domain of the quantization factor and calculate the range covered by each quantization factor to generate table T1, which maps a quantization factor to the range it covers;

A2. Calculate the size of table T2 according to the error requirement, then compute the value of each entry of T2 in turn from table T1 and fill in T2, which maps the ratio $R$ to the quantization factor $M$.

As a further improvement of the present invention, in step A1 the value range $P_k$ corresponding to each quantization factor $M_k$ is calculated to generate table T1.

As a further improvement of the present invention, in step A2 the value ranges corresponding to adjacent quantization factors overlap, and the size of the overlap is smaller than an entry of table T2.

The beneficial effects of the present invention are as follows: with the above scheme, the time-consuming logarithmic transformation in lossy compression with point-wise relative error bounds is avoided and the quantization factor is obtained by table lookup instead, which greatly accelerates lossy compression with point-wise relative error bounds.

Brief Description of the Drawings

FIG. 1 is a flowchart of the lossy compression method with point-wise relative error bounds according to the present invention.

FIG. 2 is a flowchart of step A of the lossy compression method with point-wise relative error bounds according to the present invention.

Detailed Description

The present invention is further described below with reference to the drawings and specific embodiments.

As shown in FIG. 1, a lossy compression method with point-wise relative error bounds comprises the following steps:

A. Table construction: build tables according to the user-supplied error requirement and the interval of the quantization factor, for use in the subsequent steps;

B. Obtain the quantization factors: calculate the ratio $R = X_i / X'_i$ of the actual value $X_i$ to the predicted value $X'_i$, then query the table generated in step A with $R$ to obtain the quantization factor;

C. Huffman coding: compress the quantization factor sequence generated in step B by Huffman coding;

D. Lossless compression: compress the Huffman codes and the Huffman tree generated in step C with a conventional lossless compression method such as gzip or zstd.
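Steps C and D can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the Huffman tree is represented by a symbol-to-code table serialized as JSON, and `zlib` (the DEFLATE codec that gzip is built on) stands in for the lossless compressor.

```python
import heapq
import json
import zlib
from collections import Counter


def build_huffman_codes(symbols):
    """Map each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def compress_factors(factors):
    """Step C (Huffman-encode the factors) followed by step D (lossless pass)."""
    codes = build_huffman_codes(factors)
    bits = "".join(codes[m] for m in factors)
    padded = bits + "0" * (-len(bits) % 8)   # pad to a whole number of bytes
    payload = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    header = json.dumps(
        {"codes": {str(k): v for k, v in codes.items()}, "nbits": len(bits)}
    ).encode()
    # The code table and the encoded bits are compressed together.
    return zlib.compress(len(header).to_bytes(4, "big") + header + payload)


def decompress_factors(blob):
    raw = zlib.decompress(blob)
    hlen = int.from_bytes(raw[:4], "big")
    header = json.loads(raw[4:4 + hlen])
    payload = raw[4 + hlen:]
    decode = {code: int(sym) for sym, code in header["codes"].items()}
    bits = bin(int.from_bytes(payload, "big"))[2:].zfill(len(payload) * 8)
    out, cur = [], ""
    for b in bits[:header["nbits"]]:         # drop the padding bits
        cur += b
        if cur in decode:                    # prefix-free: first match is the symbol
            out.append(decode[cur])
            cur = ""
    return out


factors = [0, 0, 1, 0, -1, 0, 0, 2, 0, 0]
blob = compress_factors(factors)
print(decompress_factors(blob) == factors)
```

Quantization factors produced by a good predictor tend to cluster around a few values, so the most frequent factor receives the shortest code; in the sample above the factor 0 is encoded with a single bit.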

As shown in FIG. 2, step A comprises the following sub-steps:

A1. Traverse the domain of the quantization factor and calculate the range covered by each quantization factor to generate table T1, which maps a quantization factor to the range it covers;

A2. Calculate the size of table T2 according to the error requirement, then compute the value of each entry of T2 in turn from table T1 and fill in T2, which maps the ratio $R$ to the quantization factor $M$.

Traditional logarithmic processing converts all the data to logarithmic form, $\{X_i\} \rightarrow \{\log(X_i)\}$, and denotes $\{\log(X_i)\}$ as $\{Y_i\}$. The quantization factor is then calculated from the actual value $Y_i$ and the predicted value $Y'_i$ and recorded.
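The traditional log-based path can be sketched as follows. The quantization formula is not reproduced in this text, so the sketch assumes a plain linear quantizer with bin width `2 * log(1 + eps)` in the log domain, positive input values, and a hypothetical predictor that simply reuses the previous reconstructed value. All function names are illustrative.

```python
import math


def log_quantize(values, eps):
    """Log-domain quantization sketch: one math.log call per data point."""
    eb = math.log1p(eps)            # absolute error bound in the log domain
    factors, pred_y = [], 0.0       # assumed predictor: previous reconstructed log value
    for x in values:
        y = math.log(x)                          # the costly per-point transform
        m = round((y - pred_y) / (2.0 * eb))     # quantization factor
        factors.append(m)
        pred_y += 2.0 * eb * m                   # what the decompressor will rebuild
    return factors


def log_dequantize(factors, eps):
    eb = math.log1p(eps)
    out, pred_y = [], 0.0
    for m in factors:
        pred_y += 2.0 * eb * m
        out.append(math.exp(pred_y))
    return out


values = [1.0, 1.5, 2.0, 1000.0, 0.003]
eps = 0.01
restored = log_dequantize(log_quantize(values, eps), eps)
print(all(abs(r - v) <= eps * v * (1 + 1e-9) for v, r in zip(values, restored)))
```

Rounding to the nearest bin keeps each reconstructed log value within `eb` of the true one, which is what bounds the relative error of the restored value by `eps`. The `math.log` call inside the loop is the per-point cost that the table-based method aims to remove.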

It follows that each quantization factor corresponds to a range of values of $Y_i - Y'_i$: whenever $Y_i - Y'_i$ falls within that range, the same quantization factor is produced. Step A1 calculates the value range $P_k$ corresponding to each quantization factor $M_k$ and generates table T1.
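A minimal sketch of table T1, expressed directly in terms of the ratio of actual to predicted value rather than the log-domain difference. It assumes that factor `k` reconstructs the ratio as `(1 + eps) ** (2 * shrink * k)` and may cover any ratio within a factor of `(1 + eps)` of that value. The function name `build_t1` and the `shrink` parameter (values below 1 pull neighbouring factors closer so that their ranges overlap) are hypothetical.

```python
def build_t1(eps, max_factor, shrink=1.0):
    """T1 sketch: quantization factor k -> (low, high) range of the ratio it covers."""
    base = 1.0 + eps
    t1 = {}
    for k in range(-max_factor, max_factor + 1):
        centre = 2.0 * shrink * k        # exponent of the reconstructed ratio for factor k
        t1[k] = (base ** (centre - 1.0), base ** (centre + 1.0))
    return t1


t1 = build_t1(0.01, 3)
for k in sorted(t1):
    print(k, t1[k])
```

With `shrink=1.0` the ranges tile the ratio axis without gaps or overlap; with `shrink=0.9` each range extends slightly into its neighbours. The table is built once per error setting, so its cost does not depend on the number of data points.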

A relation involving a parameter $\delta$, where $0 < \delta < 1$, can then be derived [equation not reproduced in the source text]. Table T2 is built according to the precision requirement (that is, the error requirement) so that $M$ can be obtained from $R$. To prevent a table entry from straddling two value ranges, the spacing between adjacent quantization factors is adjusted slightly so that the value ranges of adjacent quantization factors overlap to some extent, with the overlap kept smaller than the size represented by one T2 entry; in this way every entry lies entirely within a single value range and the problem is avoided. Finally, table T2 is traversed and its entries are filled in according to table T1.
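The construction of table T2 can be sketched as below. Everything specific here is an assumption made for illustration: the ratio is tabulated on a uniform grid over `[r_min, r_max)`, the factor spacing is shrunk by a hypothetical `shrink` parameter to create overlapping ranges, and the cell width is derived from `eps` by a simple heuristic. Rather than relying on a particular relation between overlap and cell size, the sketch checks directly that each cell lies inside the range of the factor assigned to it and raises an error otherwise.

```python
import math


def build_t2(eps, shrink=0.9, r_min=0.5, r_max=2.0):
    """T2 sketch: uniform grid over the ratio in [r_min, r_max) -> quantization factor."""
    eb = math.log1p(eps)                  # half-width of a factor's range (log domain)
    step = 2.0 * eb * shrink              # factor spacing; shrink < 1 makes ranges overlap
    width = r_min * eps * (1.0 - shrink) / 2.0    # cell size derived from the error requirement
    n_cells = math.ceil((r_max - r_min) / width)
    table = []
    for j in range(n_cells):
        lo = r_min + j * width
        hi = lo + width
        # Logarithms are used only here, once per cell at table-build time.
        k = round(math.log(lo + width / 2.0) / step)
        # The whole cell must sit inside factor k's range [k*step - eb, k*step + eb].
        if math.log(lo) < k * step - eb or math.log(hi) > k * step + eb:
            raise ValueError("cell straddles a range boundary; use smaller cells")
        table.append(k)
    return table, width, step


table, width, step = build_t2(0.01)
r = 1.2345                                # an example ratio of actual to predicted value
k = table[int((r - 0.5) / width)]
print(k, math.exp(k * step))
```

Because every cell is verified to fit inside one factor's range, any ratio that falls into a cell is reconstructed within the relative bound, whichever point of the cell it occupies.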

Step B queries table T2 with the calculated ratio $R$ to obtain the required quantization factor.
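A hedged end-to-end sketch of step B follows. The constants, the uniform-grid layout of T2, the previous-value predictor and the fallback for ratios outside the tabulated interval are all assumptions introduced for this example, and the data are assumed to be positive.

```python
import math

EPS, SHRINK, R_MIN, R_MAX = 0.01, 0.9, 0.5, 2.0    # hypothetical parameters
EB = math.log1p(EPS)
STEP = 2.0 * EB * SHRINK                            # factor spacing in the log domain
WIDTH = R_MIN * EPS * (1.0 - SHRINK) / 2.0          # T2 cell size
N_CELLS = math.ceil((R_MAX - R_MIN) / WIDTH)
# Compact stand-in for step A: one logarithm per table cell, computed once up front.
T2 = [round(math.log(R_MIN + (j + 0.5) * WIDTH) / STEP) for j in range(N_CELLS)]


def quantize(values):
    """Step B sketch: per point, one division and one table read instead of a logarithm."""
    factors, pred = [], 1.0          # assumed predictor: previous reconstructed value
    for x in values:
        r = x / pred                 # ratio of actual value to predicted value
        if R_MIN <= r < R_MAX:
            m = T2[min(int((r - R_MIN) / WIDTH), N_CELLS - 1)]
        else:                        # outside the table: fall back to the direct formula
            m = round(math.log(r) / STEP)
        factors.append(m)
        # Track what the decompressor will rebuild; these powers could be tabulated too.
        pred *= math.exp(STEP * m)
    return factors


def dequantize(factors):
    out, pred = [], 1.0
    for m in factors:
        pred *= math.exp(STEP * m)
        out.append(pred)
    return out


values = [1.0, 1.3, 0.9, 250.0, 251.0, 0.004]
restored = dequantize(quantize(values))
print(all(abs(r - v) <= EPS * v for v, r in zip(values, restored)))
```

The predictor is fed reconstructed values rather than the originals so that compressor and decompressor stay in step. Ratios far from 1 take the fallback path, which still uses a logarithm; the table only pays off when predictions are reasonably accurate.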

In the original technical scheme, calculating logarithms is time-consuming, its cost grows with the data size, and it always accounts for a relatively large share of the total running time. The lossy compression method with point-wise relative error bounds provided by the present invention bypasses the logarithm calculation: it builds tables and looks up the ratio of the actual value to the predicted value to obtain the quantization factor directly. The underlying principle remains exactly the same as before, while the time-consuming logarithm step is eliminated and the whole algorithm is accelerated.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the invention belongs, several simple deductions or substitutions may be made without departing from the concept of the invention, and all of them shall be regarded as falling within the scope of protection of the invention.

Claims (5)

1. A lossy compression method with point-wise relative error bounds, characterized by comprising the following steps:
A. table construction: building tables according to the error requirement and the interval of the quantization factor;
B. obtaining the quantization factors;
C. Huffman coding: compressing the quantization factor sequence generated in step B by Huffman coding;
D. lossless compression: compressing the Huffman codes and the Huffman tree generated in step C with a lossless compression method.
2. The lossy compression method with point-wise relative error bounds according to claim 1, characterized in that: in step B, the ratio $R = X_i / X'_i$ of the actual value $X_i$ to the predicted value $X'_i$ is calculated, and the table generated in step A is then queried with $R$ to obtain the quantization factor.
3. The lossy compression method with point-wise relative error bounds according to claim 2, characterized in that step A comprises the following sub-steps:
A1. traversing the domain of the quantization factor and calculating the range covered by each quantization factor to generate table T1, table T1 being a table that maps a quantization factor to the range it covers;
A2. calculating the size of table T2 according to the error requirement, then computing the value of each entry of table T2 in turn from table T1 and filling in table T2, table T2 being a table that maps the ratio $R$ to the quantization factor $M$.
4. The lossy compression method with point-wise relative error bounds according to claim 3, characterized in that: in step A1, the value range $P_k$ corresponding to each quantization factor $M_k$ is calculated to generate table T1.
5. The lossy compression method with point-wise relative error bounds according to claim 4, characterized in that: in step A2, the value ranges corresponding to adjacent quantization factors overlap, and the size of the overlap is smaller than an entry of table T2.
CN201910164475.7A 2019-03-05 2019-03-05 Method of lossy compression with point-by-point relative error bounds Active CN109981110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910164475.7A CN109981110B (en) 2019-03-05 2019-03-05 Method of lossy compression with point-by-point relative error bounds

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910164475.7A CN109981110B (en) 2019-03-05 2019-03-05 Method of lossy compression with point-by-point relative error bounds

Publications (2)

Publication Number Publication Date
CN109981110A (en) 2019-07-05
CN109981110B (en) 2023-03-24

Family

ID=67077958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910164475.7A Active CN109981110B (en) 2019-03-05 2019-03-05 Method of lossy compression with point-by-point relative error bounds

Country Status (1)

Country Link
CN (1) CN109981110B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5724453A (en) * 1995-07-10 1998-03-03 Wisconsin Alumni Research Foundation Image compression system and method having optimized quantization tables
US6049630A (en) * 1996-03-19 2000-04-11 America Online, Inc. Data compression using adaptive bit allocation and hybrid lossless entropy encoding
US20040044521A1 (en) * 2002-09-04 2004-03-04 Microsoft Corporation Unified lossy and lossless audio compression
US20080193028A1 (en) * 2007-02-13 2008-08-14 Yin-Chun Blue Lan Method of high quality digital image compression
US20080285866A1 (en) * 2007-05-16 2008-11-20 Takashi Ishikawa Apparatus and method for image data compression
WO2010030256A1 (en) * 2008-09-12 2010-03-18 Tovaristvo Z Obmezenou Vidpovidalnistu 'smail' Alias-free method of image coding and decoding (2 variants)
US20160127746A1 (en) * 2012-05-04 2016-05-05 Environmental Systems Research Institute, Inc. Limited error raster compression

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冷星星 等: "高压缩低损耗图像编码算法研究", 《成都信息工程学院学报》 *

Also Published As

Publication number Publication date
CN109981110B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
US8892586B2 (en) Accelerated query operators for high-speed, in-memory online analytical processing queries and operations
WO2023045204A1 (en) Method and system for generating finite state entropy coding table, medium, and device
CN103177111B (en) Data deduplication system and delet method thereof
CN103326730B (en) Data parallel compression method
US9928267B2 (en) Hierarchical database compression and query processing
WO2022188583A1 (en) Decoding method/encoding method based on point cloud attribute prediction, decoder, and encoder
US20190251189A1 (en) Delta Compression
US11403017B2 (en) Data compression method, electronic device and computer program product
US9916319B2 (en) Effective method to compress tabular data export files for data movement
CN107291935B (en) CPIR-V Nearest Neighbor Privacy Preserving Query Method Based on Spark and Huffman Coding
WO2019041918A1 (en) Data coding method and device, and storage medium
CN114972551A (en) A point cloud compression and decompression method
CN103152430A (en) Cloud storage method for reducing data-occupied space
WO2019080670A1 (en) Gene sequencing data compression method and decompression method, system, and computer readable medium
Barbarioli et al. Hierarchical residual encoding for multiresolution time series compression
Zou et al. Performance optimization for relative-error-bounded lossy compression on scientific data
CN117216023A (en) Large-scale network data storage method and system
WO2023015831A1 (en) Huffman correction encoding method and system, and relevant components
Zou et al. Accelerating relative-error bounded lossy compression for HPC datasets with precomputation-based mechanisms
CN109981110A (en) The method of lossy compression with point-by-point relative error boundary
CN114665884A (en) Time sequence database self-adaptive lossy compression method, system and medium
US20240080040A1 (en) System and method for data storage, transfer, synchronization, and security using automated model monitoring and training
CN106599112A (en) Massive incomplete data storage and operation method
US9787323B1 (en) Huffman tree decompression
CN110348469A (en) A kind of user's method for measuring similarity based on DeepWalk internet startup disk model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant