CN114598329B - Lightweight lossless compression method for rapid decompression application - Google Patents

Publication number
CN114598329B
Authority
CN
China
Prior art keywords
matching
format
compression
equal
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210269150.7A
Other languages
Chinese (zh)
Other versions
CN114598329A (en)
Inventor
肖卓凌
王天越
彭卓霖
陈智麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210269150.7A
Publication of CN114598329A
Application granted
Publication of CN114598329B

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a lightweight lossless compression method for rapid decompression applications. The dictionary is updated with the characters already seen, and the match length and distance of the current character string are looked up in the dictionary. The search strategy uses a double hash search; the compression format favors longer matching strings while simplifying the classification of coding cases; and addresses are checked during decompression to prevent decompression failures caused by data overflow. Built on LZO, the new algorithm improves decompression speed and addresses the low decompression speed, high algorithm cost, and decompression overflow found in conventional lossless compression algorithms.

Description

Lightweight lossless compression method for rapid decompression application
Technical Field
The invention relates to the field of data compression, and in particular to a lightweight lossless compression method for rapid decompression applications.
Background
With the development of network technology, the demands of data storage and data transmission have driven the field of data compression forward. Unlike lossy compression, lossless data compression removes as much redundant information from the original data as possible without losing information, and guarantees that the decompressed data is identical to the data before compression. The dictionary-based LZ family of algorithms plays a significant role in the lossless compression field.
Lossless compression algorithms of the LZ family can be divided into two classes according to where they are applicable. One class consists of lightweight compression algorithms such as LZ77 and LZO; their principles are simpler and their implementation cost is low, which makes them suitable for running on an embedded processor, but their compression ratio is lower. The other class consists of algorithms with higher compression ratios such as DEFLATE and LZMA; however, their principles are complex and their code size and resource cost are large, so they are not suitable for running on an embedded processor.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a lightweight lossless compression method for rapid decompression applications.
To achieve this aim, the invention adopts the following technical scheme:
A lightweight lossless compression method for rapid decompression applications comprises the following steps:
S1, performing two match searches using a dictionary twice the size of the LZO algorithm's dictionary, and updating the dictionary pages for successful matches;
S2, compressing and storing the distance and match length of each successful match in a format selected according to the distance;
S3, determining the coding type of the compressed data from the first byte of each code, and recovering the match length and distance according to the coding type;
S4, applying a boundary check as a security check on every write to the storage space, to determine whether decompression would cause data overflow.
Further, the two match searches in S1 proceed as follows:
S11, matching against the two pages of the matching dictionary in turn, and evaluating each match result;
S12, if the first match fails, performing no second hash operation, and reusing the hash value from the first match directly as the index to query whether a matching item exists in the second dictionary page;
S13, if both matches fail, outputting the current data directly as a new character without compression;
S14, if both matches succeed, compressing with the match information that has the longer match length, and updating the dictionary pages.
Further, in step S2, match information with a distance less than 2K bytes is compressed and stored in a first type of storage format, and match information with a distance greater than or equal to 2K bytes is compressed and stored in a second type of storage format.
Further, the first type of storage format includes a first compressed storage format and a second compressed storage format. The first compressed storage format is used when the match length is less than 18 bytes; the match length is stored as the length minus 2, the match distance is represented with 11 bits, and the match length with 4 bits. The second compressed storage format is used when the match length is greater than or equal to 18 bytes and less than 256 bytes; the match distance is represented with 11 bits and the match length with 8 bits.
Further, the second type of storage format includes a third compressed storage format and a fourth compressed storage format. The third compressed storage format is used when the match length is less than 19 bytes; the match length is stored as the length minus 3 and represented with 4 bits. The fourth compressed storage format is used when the match length is greater than or equal to 19 bytes and less than 256 bytes.
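The length and distance thresholds above appear to follow directly from the stated field widths. The short check below is only a sketch of that arithmetic; the helper name `max_biased_length` is hypothetical, and reading the "subtract 2" and "subtract 3" representations as a stored value of length minus 2 or minus 3 is an assumption taken from context.

```python
def max_biased_length(bits: int, bias: int) -> int:
    """Largest match length that fits in `bits` bits when the stored value is length - bias."""
    return (1 << bits) - 1 + bias

# An 11-bit distance field has 2**11 = 2048 distinct values, which lines up
# with the 2K-byte boundary between the first and second type of format.
DISTANCE_FIELD_VALUES = 1 << 11

# 4-bit length field with bias 2: the largest length is 17, i.e. "less than 18".
FIRST_FORMAT_MAX = max_biased_length(4, 2)

# 4-bit length field with bias 3: the largest length is 18, i.e. "less than 19".
THIRD_FORMAT_MAX = max_biased_length(4, 3)
```

The smallest encodable length in each format is not stated in the text, so the sketch only checks the upper bounds.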
Further, step S3 specifically includes:
S31: Take the first byte W and AND it with 0xC0. If the result is equal to 0x80, output the original character and jump to S35. If the result is not 0x80, jump to S32.
S32: Determine whether the result is equal to 0xC0. If equal, go to S33; if not equal, go to S34.
S33: AND W with 0x3C. If the result is not equal to zero, decompress according to the first format; if the result is equal to zero, decompress according to the second format. Then jump to S35.
S34: AND W with 0x78. If the result is not equal to zero, decompress according to the third format; if the result is equal to zero, decompress according to the fourth format. Then jump to S35.
S35: Determine whether all the data has been processed. If not, jump back to the first step (S31) and continue the loop; if so, jump to S36.
S36: Determine whether decompression of the whole file is complete, and if so, output the final result.
The invention has the following beneficial effects:
1) The decompression speed is effectively improved: compared with the LZO algorithm, decompression is 16% faster while the compression ratio is almost unaffected.
2) Data overflow is handled properly by the security check in the decompression module.
3) Software for the lossless compression algorithm for rapid decompression applications has been developed; thanks to the algorithm's lightweight design, it has been implemented on the Windows platform and ported to a DSP platform.
Drawings
FIG. 1 is a schematic flowchart of the lightweight lossless compression method for rapid decompression applications.
FIG. 2 is a dictionary coding diagram of the present invention.
FIG. 3 is a schematic diagram of the compressed formats of the present invention, where a is the first compressed storage format, b is the second compressed storage format, c is the third compressed storage format, and d is the fourth compressed storage format.
FIG. 4 is a schematic decompression flowchart of the present invention.
Detailed Description
The following description of embodiments of the present invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, all variations within the spirit and scope of the invention as defined in the appended claims, and all inventions that make use of the inventive concept, are protected.
A lightweight lossless compression method for rapid decompression applications, as shown in FIG. 1, comprises the following steps:
S1, performing two match searches using a dictionary twice the size of the LZO algorithm's dictionary, and updating the dictionary pages for successful matches;
As shown in FIG. 2, the newly designed compression algorithm performs the match search in two passes over a matching dictionary that has two pages in total. If the first match fails, no second hash operation is performed; the hash value from the first match is reused as the index to query directly whether a matching item exists in the second dictionary page. If both matches fail, the current data is output directly as a new character without compression. If both matches succeed, compression uses the match information with the longer match length. At the same time, the older page's dictionary entry is updated.
The dictionary of the newly designed compression algorithm is twice the size of the LZO algorithm's dictionary. This design can slow down compression, but it raises the success rate of string matching. It also makes the dictionary flexible: when other compression requirements arise, the dictionary size can be adjusted simply by adding or removing dictionary pages to reach a different compression performance.
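The two-page lookup described above can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the hash function `hash3`, the table size `HASH_BITS`, the minimum match length `MIN_MATCH`, and the helper names are all assumptions, since the text does not specify them. What the sketch takes from the text is that the hash is computed once and reused for both pages, that the longer match wins, and that the older entry is the one overwritten.

```python
MIN_MATCH = 3    # assumed minimum useful match length (not given in the text)
HASH_BITS = 12   # assumed number of slots per page: 2**12

def new_pages():
    """Two dictionary pages, each mapping a hash slot to an input position."""
    return [[None] * (1 << HASH_BITS) for _ in range(2)]

def hash3(data: bytes, pos: int) -> int:
    """Hypothetical hash of the three bytes starting at pos."""
    v = data[pos] | (data[pos + 1] << 8) | (data[pos + 2] << 16)
    return ((v * 2654435761) >> 8) & ((1 << HASH_BITS) - 1)

def match_length(data: bytes, cand: int, pos: int) -> int:
    """Count how many bytes at cand equal the bytes at pos (cand < pos)."""
    n = 0
    while pos + n < len(data) and data[cand + n] == data[pos + n]:
        n += 1
    return n

def find_match(data: bytes, pos: int, pages):
    """Return (distance, length) of the longest match over both pages, or None."""
    h = hash3(data, pos)             # hashed once, reused as the index for both pages
    best = None
    for page in pages:
        cand = page[h]
        if cand is None:
            continue
        n = match_length(data, cand, pos)
        if n >= MIN_MATCH and (best is None or n > best[1]):
            best = (pos - cand, n)   # keep the longer of the two matches
    return best

def insert(data: bytes, pos: int, pages) -> None:
    """Record pos in an empty slot if there is one, else overwrite the older entry."""
    h = hash3(data, pos)
    slots = [page[h] for page in pages]
    target = slots.index(None) if None in slots else slots.index(min(slots))
    pages[target][h] = pos
```

Because the pages are kept in a plain list, adding or removing a page changes the dictionary size without touching the search loop, which mirrors the flexibility the description mentions.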
S2, compressing and storing the distance and match length of each successful match in a format selected according to the distance;
As shown in FIG. 3, to decompress data more quickly, the newly designed compression algorithm compresses and stores the distance and match length of each successful match according to the distance: the first type of storage format stores match information with a distance less than 2K bytes, and the second type of storage format stores match information with a distance greater than or equal to 2K bytes.
When the distance is less than 2K bytes, the algorithm defines two compression formats to store the distance and match length after a successful match. The first format is used when the match length is less than 18 bytes; the match length is stored as the length minus 2, the distance is represented with 11 bits, and the match length with 4 bits. The second format is used when the match length is greater than or equal to 18 bytes and less than 256 bytes; the distance is represented with 11 bits and the match length with 8 bits.
When the distance is greater than or equal to 2K bytes, the algorithm likewise defines two compression formats to store the distance and match length, so that matches at larger distances are compressed effectively. The first of these (the third format) is used when the match length is less than 19 bytes; the match length is stored as the length minus 3 and represented with 4 bits. The second (the fourth format) is used when the match length is greater than or equal to 19 bytes and less than 256 bytes.
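The choice among the four formats can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `select_format` is hypothetical, the upper length bound follows the claim wording (less than 256 bytes), and rejecting longer matches with an error is a placeholder, because the text does not say how such matches are handled.

```python
SHORT_DISTANCE_LIMIT = 2048   # "2K bytes" boundary between the two format types
MAX_MATCH_LENGTH = 256        # exclusive upper bound, following the claim wording

def select_format(distance: int, length: int) -> int:
    """Return 1, 2, 3 or 4 for the storage format that fits this match."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    if length >= MAX_MATCH_LENGTH:
        # Handling of longer matches is not described in the text.
        raise ValueError("match length out of range for the four formats")
    if distance < SHORT_DISTANCE_LIMIT:
        return 1 if length < 18 else 2
    return 3 if length < 19 else 4
```

For example, `select_format(100, 5)` picks the first format, while `select_format(5000, 19)` picks the fourth.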
S3, determining the coding type of the compressed data from the first byte of each code, and recovering the match length and distance according to the coding type;
As shown in FIG. 4, the compression format analysis module of the newly designed compression algorithm quickly determines the coding type from the first byte of each code, classifies it into one of the four formats of step S2, processes each class accordingly, and recovers the match length and distance from the code. The design is simple, has few classes, and involves no excessive offsets, which reduces the decompression workload and improves decompression speed. The specific method is as follows:
S31: Take the first byte W and AND it with 0xC0. If the result is equal to 0x80, output the original character and jump to S35. If the result is not 0x80, jump to S32.
S32: Determine whether the result is equal to 0xC0. If equal, go to S33; if not equal, go to S34.
S33: AND W with 0x3C. If the result is not equal to zero, decompress according to the first format; if the result is equal to zero, decompress according to the second format. Then jump to S35.
S34: AND W with 0x78. If the result is not equal to zero, decompress according to the third format; if the result is equal to zero, decompress according to the fourth format. Then jump to S35.
S35: Determine whether all the data has been processed. If not, jump back to the first step (S31) and continue the loop; if so, jump to S36.
S36: Determine whether decompression of the whole file is complete, and if so, output the final result.
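Steps S31 to S34 amount to a three-level mask test on the first byte. The function below is a minimal sketch of that classification only; the name `classify_first_byte` and the string labels are hypothetical, and the layout of the bytes that follow W (literal payload, distance and length fields) is not given in the text, so it is not modeled here.

```python
def classify_first_byte(w: int) -> str:
    """Classify a code by its first byte W, following the masks in S31 to S34."""
    top = w & 0xC0
    if top == 0x80:                                   # S31: original character
        return "literal"
    if top == 0xC0:                                   # S32 -> S33: first type of format
        return "format1" if (w & 0x3C) != 0 else "format2"
    # S32 -> S34: second type of format
    return "format3" if (w & 0x78) != 0 else "format4"
```

A decoder loop (S35) would call this once per code, dispatch to the matching format handler, and repeat until the input is exhausted.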
S4, applying a boundary check as a security check on every write to the storage space, to determine whether decompression would cause data overflow.
After format analysis of the compressed data is complete, the original characters must be output according to the match length and distance. To address the overflow problem in the LZO decompression code, a boundary check is performed as a security check on every write to the storage space; that is, the boundary is tested on every memory operation. The decompression program must guarantee that data does not overflow and the program does not crash, however the file being parsed is corrupted.
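A bounds-checked match copy might look like the sketch below. It is an assumption-laden illustration rather than the patent's code: `CorruptDataError`, `copy_match`, and the `capacity` parameter are hypothetical names, and the checks shown (the distance must point inside the data already written, and the copy must not exceed the output capacity) are the usual ones for an LZ-style decoder.

```python
class CorruptDataError(Exception):
    """Raised when a decoded match would read or write outside the buffer."""

def copy_match(out: bytearray, distance: int, length: int, capacity: int) -> None:
    """Append `length` bytes copied from `distance` bytes back, with boundary checks."""
    if distance <= 0 or distance > len(out):
        raise CorruptDataError("match distance points outside the decoded data")
    if length < 0 or len(out) + length > capacity:
        raise CorruptDataError("match would overflow the output buffer")
    start = len(out) - distance
    # Byte-by-byte copy so overlapping matches (distance < length) repeat correctly.
    for i in range(length):
        out.append(out[start + i])
```

Raising an error instead of writing lets the caller stop cleanly on a corrupted file rather than crash or overwrite memory.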
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to aid understanding of the method and core ideas of the invention. Because those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the invention, this description should not be construed as limiting the invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and that the scope of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of this disclosure without departing from its spirit, and such modifications and combinations remain within the scope of this disclosure.

Claims (1)

1. A lightweight lossless compression method for rapid decompression applications, characterized by comprising the following steps:
S1, performing two match searches using a dictionary twice the size of the LZO algorithm's dictionary, and updating the dictionary pages for successful matches, wherein the two match searches proceed as follows:
S11, matching against the two pages of the matching dictionary in turn, and evaluating each match result;
S12, if the first match fails, performing no second hash operation, and reusing the hash value from the first match directly as the index to query whether a matching item exists in the second dictionary page;
S13, if both matches fail, outputting the current data directly as a new character without compression;
S14, if both matches succeed, compressing with the match information that has the longer match length, and updating the dictionary pages;
S2, compressing and storing the distance and match length of each successful match in a format selected according to the distance: match information with a distance less than 2K bytes is compressed and stored in a first type of storage format, and match information with a distance greater than or equal to 2K bytes is compressed and stored in a second type of storage format; wherein the first type of storage format comprises a first compressed storage format and a second compressed storage format; the first compressed storage format is used when the match length is less than 18 bytes, the match length being stored as the length minus 2, the match distance being represented with 11 bits, and the match length with 4 bits; the second compressed storage format is used when the match length is greater than or equal to 18 bytes and less than 256 bytes, the match distance being represented with 11 bits and the match length with 8 bits; the second type of storage format comprises a third compressed storage format and a fourth compressed storage format; the third compressed storage format is used when the match length is less than 19 bytes, the match length being stored as the length minus 3 and represented with 4 bits; the fourth compressed storage format is used when the match length is greater than or equal to 19 bytes and less than 256 bytes;
S3, determining the coding type of the compressed data from the first byte of each code, and recovering the match length and distance according to the coding type, which specifically comprises:
S31: taking the first byte W and ANDing it with 0xC0; if the result is equal to 0x80, outputting the original character and jumping to S35; if the result is not 0x80, jumping to S32;
S32: determining whether the result is equal to 0xC0; if equal, going to S33; if not equal, going to S34;
S33: ANDing W with 0x3C; if the result is not equal to zero, decompressing according to the first format; if the result is equal to zero, decompressing according to the second format; then jumping to S35;
S34: ANDing W with 0x78; if the result is not equal to zero, decompressing according to the third format; if the result is equal to zero, decompressing according to the fourth format; then jumping to S35;
S35: determining whether all the data has been processed; if not, jumping back to the first step (S31) and continuing the loop; if so, jumping to S36;
S36: determining whether decompression of the whole file is complete, and if so, outputting the final result;
S4, applying a boundary check as a security check on every write to the storage space, to determine whether decompression would cause data overflow.
CN202210269150.7A 2022-03-18 2022-03-18 Lightweight lossless compression method for rapid decompression application Active CN114598329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210269150.7A CN114598329B (en) 2022-03-18 2022-03-18 Lightweight lossless compression method for rapid decompression application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210269150.7A CN114598329B (en) 2022-03-18 2022-03-18 Lightweight lossless compression method for rapid decompression application

Publications (2)

Publication Number Publication Date
CN114598329A (en) 2022-06-07
CN114598329B (en) 2023-04-25

Family

ID=81819781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210269150.7A Active CN114598329B (en) 2022-03-18 2022-03-18 Lightweight lossless compression method for rapid decompression application

Country Status (1)

Country Link
CN (1) CN114598329B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011127073A1 (en) * 2010-04-05 2011-10-13 Georgia Tech Research Corporation Structural health monitoring systems and methods
WO2013048530A1 (en) * 2011-10-01 2013-04-04 Intel Corporation Method and apparatus for high bandwidth dictionary compression technique using set update dictionary update policy
CN103236847A (en) * 2013-05-06 2013-08-07 西安电子科技大学 Multilayer Hash structure and run coding-based lossless compression method for data
CN104410424A (en) * 2014-11-26 2015-03-11 西安电子科技大学 Quick lossless compression method of memory data of embedded device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10419737B2 (en) * 2015-04-15 2019-09-17 Google Llc Data structures and delivery methods for expediting virtual reality playback
US10382769B2 (en) * 2016-02-15 2019-08-13 King Abdullah University Of Science And Technology Real-time lossless compression of depth streams
US9923577B1 (en) * 2016-09-04 2018-03-20 ScaleFlux, Inc. Hybrid software-hardware implementation of lossless data compression and decompression

Also Published As

Publication number Publication date
CN114598329A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN105893337B (en) Method and apparatus for text compression and decompression
CN114244373B (en) LZ series compression algorithm coding and decoding speed optimization method
US5883588A (en) Data compression system and data compression device for improving data compression rate and coding speed
US6597812B1 (en) System and method for lossless data compression and decompression
US10187081B1 (en) Dictionary preload for data compression
CN107682016B (en) Data compression method, data decompression method and related system
US20130103655A1 (en) Multi-level database compression
CN1228887A (en) Data compression and decompression system with immediate dictionary updating interleaved with string search
US20200294629A1 (en) Gene sequencing data compression method and decompression method, system and computer-readable medium
JPS62212849A (en) Data file system
CN114567331A (en) LZ 77-based compression method, device and medium thereof
CN114157305B (en) Method for rapidly realizing GZIP compression based on hardware and application thereof
CN115189696A (en) Hardware compression and decompression method based on Huffman decoding table
CN114598329B (en) Lightweight lossless compression method for rapid decompression application
CN116192154B (en) Data compression and data decompression method and device, electronic equipment and chip
JP5549177B2 (en) Compression program, method and apparatus, and decompression program, method and apparatus
CN103701470A (en) Stream intelligence prediction differencing and compression algorithm and corresponding control device
US11652495B2 (en) Pattern-based string compression
JP3038223B2 (en) Data compression method
US20180300087A1 (en) System and method for an improved real-time adaptive data compression
CN112398481B (en) Feedback type matching prediction multistage real-time compression system and method
JP3105598B2 (en) Data compression method using universal code
US9176973B1 (en) Recursive-capable lossless compression mechanism
JP3241787B2 (en) Data compression method
WO2023082156A1 (en) Lz77 decoding circuit and operation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant