WO2014029081A1

WO2014029081A1 - Compression method and apparatus

Info

Publication number: WO2014029081A1
Application number: PCT/CN2012/080429
Authority: WO
Inventors: 覃祥菊
Original assignee: 华为技术有限公司
Priority date: 2012-08-21
Filing date: 2012-08-21
Publication date: 2014-02-27
Also published as: CN103210590A; CN103210590B

Abstract

Embodiments of the present application provide a compression method and apparatus. The final part (that is, the second part) of a text file to be compressed is processed in an uncompressed processing manner, for example, the uncompressed (Uncompressed or Stored) encoding mode in the GZIP compression technology, to generate a second bit stream. As a result, the length of the complete output bit stream, that is, the sum of the length of the first bit stream and the length of the second bit stream, is an integral multiple of the smallest data unit that a processor can process, for example, a byte (Byte). The processor therefore does not need to perform shift and splice operations when combining the bit streams output by the compression engines, which saves processing resources of the processor.

Description

Compression method and device

Technical field

The present application relates to compression technology, and in particular, to a compression method and device.

Background

GZIP compression technology mainly includes two processing modes: LZ77 compression and Huffman coding. LZ77 compression is a phrase-based compression algorithm. When a string in a text file exactly repeats a string that appeared earlier, the later string can be replaced with a pair {repeat length, distance between the two strings}, thereby compressing the data; a string that has no exact earlier repetition is left unprocessed. Huffman coding is a method of encoding data so that it can be compressed. Based on the differing frequencies of characters in a text file, it replaces more frequent characters with shorter code words and less frequent characters with longer code words, thereby compressing the data. Huffman coding may be static or dynamic. Static Huffman coding encodes characters according to a fixed code table specified by the protocol, and its compression effect is limited. Dynamic Huffman coding counts the frequency of each character in the text file to be compressed and generates a code table from the statistics. Its compression effect is pronounced, but counting the character frequencies and generating the code table consumes resources and time, and the output bit stream must also carry the code table information, which increases the amount of output data.
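The two modes described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the GZIP implementation: `lz77_tokens` and `huffman_code_lengths` are hypothetical helper names, the match search is a naive greedy scan with no window or length cap, and only Huffman code lengths (not the code words themselves) are derived.

```python
import heapq
from collections import Counter


def lz77_tokens(data: bytes, min_match: int = 3) -> list:
    """Toy greedy LZ77: emit literal byte values or (length, distance) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(i):  # try every earlier start position
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_len, best_dist))  # repeated string becomes a pair
            i += best_len
        else:
            tokens.append(data[i])  # no earlier repetition: keep the literal
            i += 1
    return tokens


def huffman_code_lengths(data: bytes) -> dict:
    """Code length per symbol from a frequency count (dynamic-Huffman style)."""
    freq = Counter(data)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, tie-breaker, symbols under this subtree)
    heap = [(count, idx, [sym]) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:  # every merge adds one bit to these symbols
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths
```

For example, `lz77_tokens(b"abcabcabc")` yields three literals followed by the pair `(6, 3)`, and `huffman_code_lengths(b"aaaabbc")` assigns the most frequent symbol the shortest code length.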

GZIP compression technology can be implemented using hardware devices. For example, a processor, that is, a Central Processing Unit (CPU), can combine the bit streams output by each compression engine to obtain a compressed text file. In the prior art, when the processor combines the bit streams output by each compression engine, it needs to perform a shift splicing operation to obtain a bit stream whose length is an integer multiple of the minimum data unit that the processor can process.

Summary of the invention

Aspects of the present application provide a compression method and apparatus for saving processing resources of a processor.

In an aspect of the application, a compression method is provided, including:

obtaining a text file to be compressed, where the text file includes a first part and a second part;

processing the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream;

processing, according to the length of the first bit stream, the second part by using a non-compression processing manner to generate a second bit stream, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process; and

outputting the first bit stream and the second bit stream for processing by the processor.

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the compression processing manner includes a static Huffman coding method and a dynamic Huffman coding method in the GZIP compression technology, and the non-compression processing manner includes an uncompressed coding mode in the GZIP compression technology.

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the compression processing manner further includes an LZ77 compression mode in the GZIP compression technology.

In an aspect as described above and any possible implementation, an implementation is further provided, the second part comprising at least one byte of an end of the text file.

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the smallest data unit includes a byte.

In another aspect of the present application, a compression device is provided, including:

a receiving unit, configured to acquire a text file to be compressed, and transmit the text file to a processing unit, where the text file includes a first part and a second part;

the processing unit, configured to process the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream, and transmit the first bit stream to an output unit; and, according to the length of the first bit stream, process the second part by using a non-compression processing manner to generate a second bit stream, and transmit the second bit stream to the output unit, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process;

the output unit, configured to output the first bit stream and the second bit stream for processing by the processor.

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the compression processing manner includes a static Huffman coding method and a dynamic Huffman coding method in the GZIP compression technology.

In another aspect of the present application, a compression device is provided, including:

a receiver, configured to acquire a text file to be compressed, and transmit the text file to a processor, where the text file includes a first part and a second part;

the processor, configured to process the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream, and transmit the first bit stream to a transmitter; and, according to the length of the first bit stream, process the second part by using a non-compression processing manner to generate a second bit stream, and transmit the second bit stream to the transmitter, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process;

The transmitter is configured to output the first bit stream and the second bit stream for processing by the processor.

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the compression processing manner includes a static Huffman coding method and a dynamic Huffman coding method in a GZIP compression technology;

The foregoing aspect and any possible implementation manner further provide an implementation manner, where the smallest data unit includes a byte.

According to the foregoing technical solution, the final part (that is, the second part) of the text file to be compressed is processed in an uncompressed processing manner, for example, the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, to generate the second bit stream. The length of the complete output bit stream, that is, the sum of the lengths of the first bit stream and the second bit stream, can thus be made an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). The processor therefore no longer needs to perform a shift splicing operation when merging the bit streams output by each compression engine, which saves processing resources of the processor.

Brief description of the drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1A is a schematic flowchart of a compression method according to an embodiment of the present application;

FIG. 1B is a schematic diagram of the processing manners of the first part and the second part included in the text file to be compressed in the embodiment corresponding to FIG. 1A;

FIG. 1C is a schematic diagram of the data flow after processing the second part included in the text file to be compressed in the embodiment corresponding to FIG. 1A;

FIG. 1D is a schematic diagram of the data flow after processing the first part and the second part included in the text file to be compressed in the embodiment corresponding to FIG. 1A;

FIG. 2 is a schematic structural diagram of a compression device according to another embodiment of the present application;

FIG. 3 is a schematic structural diagram of a compression device according to another embodiment of the present application.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

In some cases, after a text file is processed by LZ77 compression and Huffman coding, the compression effect is not obvious, and the processed file may even be larger than the original text file. Therefore, GZIP compression technology may further include another processing mode, namely the uncompressed (Uncompressed or Stored) mode, in which LZ77 compression and Huffman coding are not performed and the original text file is output directly after a flag is added to it.

FIG. 1A is a schematic flowchart of a compression method according to an embodiment of the present application. As shown in FIG. 1A, the method includes the following steps.

101. Acquire a text file to be compressed, where the text file includes a first part and a second part.

102. Process the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream.

Which processing manner is used for the first part may be determined according to a preset processing strategy. For example, the processing manner with the highest compression ratio may be selected.

103. According to the length of the first bit stream, process the second part by using a non-compression processing manner to generate a second bit stream, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process.

104. Output the first bit stream and the second bit stream for processing by the processor.

It should be noted that steps 101 to 104 may be executed by each of at least two compression engines in a hardware device.
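The preset processing strategy mentioned for step 102 (choose the manner with the highest compression ratio) might be sketched as follows. This is only an illustration using Python's standard `zlib` module, not the hardware engine described here; `raw_deflate` and `pick_best` are hypothetical names. `zlib.Z_FIXED` restricts the encoder to static Huffman codes, level 0 produces stored blocks, and with the default strategy zlib itself chooses per block among stored, static, and dynamic coding.

```python
import zlib


def raw_deflate(data: bytes, level: int, strategy: int) -> bytes:
    """Raw DEFLATE stream (wbits=-15: no zlib/gzip header or checksum)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    return comp.compress(data) + comp.flush()


def pick_best(data: bytes) -> tuple:
    """Try several processing manners and keep the shortest output."""
    candidates = {
        # LZ77 matching followed by the fixed (static) Huffman tables
        "static_huffman": raw_deflate(data, 9, zlib.Z_FIXED),
        # zlib's default: it may emit dynamic, static, or stored blocks
        "default": raw_deflate(data, 9, zlib.Z_DEFAULT_STRATEGY),
        # level 0: stored (uncompressed) blocks only
        "stored": raw_deflate(data, 0, zlib.Z_DEFAULT_STRATEGY),
    }
    name = min(candidates, key=lambda key: len(candidates[key]))
    return name, candidates[name]
```

Calling `pick_best(b"abc" * 1000)` returns the name of the shortest candidate and its bytes, which `zlib.decompress(out, -15)` restores to the original input.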

It should be noted that the text file to be compressed acquired by each compression engine may be obtained by the processor by dividing the original text file according to the processing capability of each compression engine. Both the original text file and the text file to be compressed are measured in units of the smallest data unit that the processor can process, for example, a byte (Byte).

Optionally, in a possible implementation manner of this embodiment, the compression processing manner may include a static Huffman coding method and a dynamic Huffman coding method in the GZIP compression technology, and the non-compression processing manner may include the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, as shown in FIG. 1B. For a detailed description of the static Huffman coding method, the dynamic Huffman coding method, and the uncompressed coding mode in the GZIP compression technology, refer to the relevant content in the prior art; details are not described here.
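As a small illustration of the uncompressed (Stored) mode, the following sketch uses Python's standard `zlib` module, which produces the same DEFLATE blocks as GZIP but wraps them in a zlib container rather than a gzip one. Level 0 asks zlib for stored blocks, so the input bytes appear verbatim in the output with only header and checksum overhead added. The sample input is an arbitrary choice for the example.

```python
import zlib

# 256 distinct byte values: no repeated strings for LZ77 to exploit
data = bytes(range(256))

stored = zlib.compress(data, 0)  # level 0: stored (uncompressed) blocks
packed = zlib.compress(data, 9)  # level 9: strongest compression effort

# The stored output is slightly longer than the input, because a block
# header and the container's header and checksum are added around it.
print(len(data), len(stored), len(packed))
```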

Optionally, in a possible implementation manner of this embodiment, the compression processing manner may further include an LZ77 compression mode in the GZIP compression technology. Specifically, the first part may first be processed by using the LZ77 compression mode, and the result of the LZ77 processing may then be processed by using the static Huffman coding method or the dynamic Huffman coding method.

Optionally, in a possible implementation manner of this embodiment, the second part may include at least one byte of an end of the text file.

Specifically, the second bit stream may include a 3-bit flag "x00", m bits of 0, a 1-byte length code (LEN), a 1-byte one's complement of the length code (NLEN), and the data contained in the second part, as shown in FIG. 1C. The sum of the length of the first bit stream, the 3 bits, the m bits, the 1 byte, the 1 byte, and the length of the data contained in the second part is an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). If the second part is the last data block of the text file to be compressed, x takes the value "1"; if the second part is not the last data block of the text file to be compressed, x takes the value "0". The text file to be compressed may be divided into a plurality of data blocks (blocks), each serving as a basic unit of the static Huffman coding method or the dynamic Huffman coding method; the size of each data block is usually 4k to 32k bytes.
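The layout just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `build_second_stream` is a hypothetical name, bits are modelled as a string of "0" and "1" characters written in the order the text lists the fields, and LEN and NLEN are 1 byte each as in this description. Note that the DEFLATE specification (RFC 1951) uses 16-bit LEN and NLEN fields and packs bits starting from the least significant bit, so this sketch does not produce a stream that a standard decoder would accept.

```python
def build_second_stream(first_len_bits: int, tail: bytes, is_last: bool,
                        unit_bits: int = 8) -> str:
    """Return the second bit stream as a string of '0'/'1' characters."""
    if not 1 <= len(tail) <= 255:
        raise ValueError("a 1-byte LEN can describe 1 to 255 bytes of data")
    flag = ("1" if is_last else "0") + "00"       # "x00": x marks the last block
    len_code = format(len(tail), "08b")            # LEN
    nlen_code = format(len(tail) ^ 0xFF, "08b")    # NLEN: one's complement of LEN
    payload = "".join(format(byte, "08b") for byte in tail)
    # Everything except the m padding bits, counted together with stream one
    fixed = (first_len_bits + len(flag) + len(len_code)
             + len(nlen_code) + len(payload))
    m = -fixed % unit_bits                         # zero bits needed to align
    return flag + "0" * m + len_code + nlen_code + payload
```

With a 207-bit first bit stream and a 4-byte second part, the function inserts 6 zero bits, and the two streams together occupy 264 bits, that is, 33 bytes.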

To make the method provided by this embodiment of the present application clearer, a 100-byte text file to be compressed is taken as an example. The 100 bytes of the text file to be compressed may include a first part, namely the first 96 bytes, and a second part, namely the last 4 bytes. Assume that after the data contained in the first part is processed by the compression processing manner and/or the non-compression processing manner, the generated first bit stream has a length of 207 bits. The sum of the lengths of the first bit stream, the 3-bit flag "100", the 1-byte LEN, the 1-byte NLEN, and the data contained in the second part is (207+3+8+8+32) bits, that is, 258 bits. Dividing 258 bits into groups of 8 bits leaves a remainder of 2 bits. Therefore, for the sum of the length of the first bit stream, the 3 bits, the m bits, the 1 byte, the 1 byte, and the length of the data contained in the second part to be an integer multiple of a byte, m must be 6 bits, as shown in the figure.
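The arithmetic of this example can be written out as follows, where the symbols L_1 (length of the first bit stream in bits) and n (number of bytes in the second part) are introduced here only for illustration:

```latex
\begin{aligned}
m &\equiv -\bigl(L_1 + 3 + 8 + 8 + 8n\bigr) \pmod{8}, \qquad 0 \le m < 8 \\
L_1 = 207,\ n = 4:\quad 207 + 3 + 8 + 8 + 32 &= 258 = 32 \cdot 8 + 2 \\
m &= 8 - 2 = 6 \\
258 + 6 &= 264 = 33 \cdot 8
\end{aligned}
```

Because LEN, NLEN, and the data are whole bytes, m depends only on the length of the first bit stream plus the 3 flag bits.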

In this embodiment, the last part (that is, the second part) of the text file to be compressed is processed in an uncompressed processing manner, for example, the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, to generate the second bit stream. The length of the complete output bit stream, that is, the sum of the lengths of the first bit stream and the second bit stream, is thus an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). The processor therefore does not need to perform a shift splicing operation when merging the bit streams output by each compression engine, which saves processing resources of the processor.
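The saving can be made concrete with a small Python sketch that contrasts the two ways of merging engine outputs. It is an illustration only: `splice_unaligned` and `join_aligned` are hypothetical names, and the sketch assumes bits are packed most significant bit first with the unused low bits of each stream's last byte set to zero.

```python
def splice_unaligned(streams: list) -> tuple:
    """Merge (payload, nbits) streams whose lengths are arbitrary bit counts.

    Every byte after a misaligned boundary must be shifted and OR-ed in.
    """
    out = bytearray()
    used = 0  # number of valid bits currently in out
    for payload, nbits in streams:
        chunk = payload[: (nbits + 7) // 8]
        offset = used % 8
        if offset == 0:
            out.extend(chunk)                  # already aligned: plain copy
        else:
            for byte in chunk:
                out[-1] |= byte >> offset      # high bits fill the open byte
                out.append((byte << (8 - offset)) & 0xFF)  # rest spills over
        used += nbits
        del out[(used + 7) // 8:]              # drop a trailing empty byte
    return bytes(out), used


def join_aligned(streams: list) -> bytes:
    """Merge streams whose lengths are whole bytes: no shifting at all."""
    return b"".join(streams)
```

When every engine output is a whole number of bytes, `join_aligned` suffices and the per-byte shifting in `splice_unaligned` is never needed.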

It should be noted that, for the foregoing method embodiments, for the sake of brevity, they are all described as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described action sequence. Because in accordance with the present application, certain steps may be performed in other orders or concurrently. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, refer to the related descriptions in other embodiments.

FIG. 2 is a schematic structural diagram of a compression device according to another embodiment of the present application. As shown in FIG. 2, the compression device of this embodiment may include a receiving unit 21, a processing unit 22, and an output unit 23. The receiving unit 21 is configured to acquire a text file to be compressed and transmit the text file to the processing unit 22, where the text file includes a first part and a second part. The processing unit 22 is configured to process the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream, and transmit the first bit stream to the output unit 23; and, according to the length of the first bit stream, process the second part by using a non-compression processing manner to generate a second bit stream, and transmit the second bit stream to the output unit 23, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process. The output unit 23 is configured to output the first bit stream and the second bit stream for processing by the processor.

It should be noted that the compression device provided in this embodiment may be any one of at least two compression engines in a hardware device.

It should be noted that the text file to be compressed acquired by the receiving unit 21 may be obtained by the processor by dividing the original text file according to the processing capability of each compression engine. Both the original text file and the text file to be compressed are in the smallest unit of data that the processor can handle, for example, Bytes.

Optionally, in a possible implementation manner of this embodiment, the compression processing manner may include a static Huffman coding method and a dynamic Huffman coding method in the GZIP compression technology, and the non-compression processing manner may include the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, as shown in FIG. 1B.

For a detailed description of the static Huffman coding method, the dynamic Huffman coding method, and the uncompressed coding mode in the GZIP compression technology, refer to the relevant content in the prior art; details are not described here.

Optionally, in a possible implementation manner of this embodiment, the compression processing manner may further include an LZ77 compression mode in the GZIP compression technology. Specifically, the processing unit 22 may first process the first part by using the LZ77 compression mode, and then process the result of the LZ77 processing by using the static Huffman coding method or the dynamic Huffman coding method.

Specifically, the second bit stream may include a 3-bit flag "x00", m bits of 0, a 1-byte length code (LEN), a 1-byte one's complement of the length code (NLEN), and the data contained in the second part, as shown in FIG. 1C. The sum of the length of the first bit stream, the 3 bits, the m bits, the 1 byte, the 1 byte, and the length of the data contained in the second part is an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). If the second part is the last data block of the text file to be compressed, x takes the value "1"; if the second part is not the last data block of the text file to be compressed, x takes the value "0".

To make the method provided by this embodiment of the present application clearer, a 100-byte text file to be compressed is taken as an example. The 100 bytes of the text file to be compressed may include a first part, namely the first 96 bytes, and a second part, namely the last 4 bytes. Assume that after the data contained in the first part is processed by the compression processing manner and/or the non-compression processing manner, the generated first bit stream has a length of 207 bits. The sum of the lengths of the first bit stream, the 3-bit flag "100", the 1-byte LEN, the 1-byte NLEN, and the data contained in the second part is (207+3+8+8+32) bits, that is, 258 bits. Dividing 258 bits into groups of 8 bits leaves a remainder of 2 bits. Therefore, for the sum of the length of the first bit stream, the 3 bits, the m bits, the 1 byte, the 1 byte, and the length of the data contained in the second part to be an integer multiple of a byte, m must be 6 bits, as shown in the figure.

The compression device provided in this embodiment is used to perform the method in the embodiment shown in FIG. 1A, and details have been described in the embodiment shown in FIG. 1A, and details are not described herein again.

In this embodiment, the processing unit processes the last part (that is, the second part) of the text file to be compressed in an uncompressed processing manner, for example, the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, to generate the second bit stream. The length of the complete bit stream output by the output unit, that is, the sum of the lengths of the first bit stream and the second bit stream, is thus an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). The processor therefore does not need to perform a shift splicing operation when merging the bit streams output by each compression engine, which saves processing resources of the processor.

FIG. 3 is a schematic structural diagram of a compression device according to another embodiment of the present application. As shown in FIG. 3, the compression device of this embodiment may include a receiver 31, a processor 32, and a transmitter 33. The receiver 31 is configured to acquire a text file to be compressed and transmit the text file to the processor 32, where the text file includes a first part and a second part. The processor 32 is configured to process the first part by using a compression processing manner and/or a non-compression processing manner to generate a first bit stream, and transmit the first bit stream to the transmitter 33; and, according to the length of the first bit stream, process the second part by using a non-compression processing manner to generate a second bit stream, and transmit the second bit stream to the transmitter 33, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process. The transmitter 33 is configured to output the first bit stream and the second bit stream for processing by the processor.

It should be noted that the compression device provided in this embodiment may be any one of at least two compression engines in a hardware device. It should be noted that the text file to be compressed acquired by the receiver 31 may be obtained by dividing the original text file by the processor according to the processing capability of each compression engine. Both the original text file and the text file to be compressed are in the smallest unit of data that the processor can handle, for example, Bytes.

For a detailed description of the static Huffman coding method, the dynamic Huffman coding method, and the uncompressed coding mode in the GZIP compression technology, refer to the relevant content in the prior art; details are not described here.

Optionally, in a possible implementation manner of this embodiment, the compression processing manner may further include an LZ77 compression mode in the GZIP compression technology. Specifically, the processor 32 may first process the first part by using the LZ77 compression mode, and then process the result of the LZ77 processing by using the static Huffman coding method or the dynamic Huffman coding method.

To make the method and apparatus provided by the embodiments of the present application clearer, a 100-byte text file to be compressed is taken as an example. The 100 bytes of the text file to be compressed may include a first part, namely the first 96 bytes, and a second part, namely the last 4 bytes. Assume that after the data contained in the first part is processed by the compression processing manner and/or the non-compression processing manner, the generated first bit stream has a length of 207 bits. The sum of the lengths of the first bit stream, the 3-bit flag "100", the 1-byte LEN, the 1-byte NLEN, and the data contained in the second part is (207+3+8+8+32) bits, that is, 258 bits. Dividing 258 bits into groups of 8 bits leaves a remainder of 2 bits. Therefore, for the sum of the length of the first bit stream, the 3 bits, the m bits, the 1 byte, the 1 byte, and the length of the data contained in the second part to be an integer multiple of a byte, m must be 6 bits, as shown in the figure.

In this embodiment, the processor processes the last part (that is, the second part) of the text file to be compressed in an uncompressed processing manner, for example, the uncompressed (Uncompressed or Stored) coding mode in the GZIP compression technology, to generate the second bit stream. The length of the complete bit stream output by the transmitter, that is, the sum of the lengths of the first bit stream and the second bit stream, is thus an integer multiple of the smallest data unit that the processor can process, for example, a byte (Byte). The processor therefore does not need to perform a shift splicing operation when merging the bit streams output by each compression engine, which saves processing resources of the processor.

A person skilled in the art can clearly understand that, for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

The above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the various embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are only used to explain the technical solutions of the present application and are not intended to limit them. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently substituted, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A compression method, characterized by including:

Obtain a text file to be compressed, where the text file includes a first part and a second part; process the first part using a compression processing method and/or a non-compression processing method to generate a first bit stream;

According to the length of the first bit stream, the second part is processed using a non-compression processing method to generate a second bit stream, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process;

The first bit stream and the second bit stream are output for processing by the processor.

2. The method according to claim 1, characterized in that the compression processing method includes the static Huffman coding method and the dynamic Huffman coding method in the GZIP compression technology; the non-compression processing method includes the non-compression coding method in the GZIP compression technology.

3. The method according to claim 2, characterized in that the compression processing method also includes the LZ77 compression method in the GZIP compression technology.

4. The method according to any one of claims 1 to 3, characterized in that the second part includes at least one byte at the end of the text file.

5. The method according to any one of claims 1 to 4, characterized in that the minimum data unit includes bytes.

6. A compression device, characterized in that it includes:

A receiving unit, configured to obtain a text file to be compressed and transmit the text file to the processing unit, where the text file includes a first part and a second part;

The processing unit is configured to process the first part using a compression processing method and/or a non-compression processing method to generate a first bit stream, and transmit the first bit stream to an output unit; and, according to the length of the first bit stream, process the second part using a non-compression processing method to generate a second bit stream, and transmit the second bit stream to the output unit, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process;

The output unit is used to output the first bit stream and the second bit stream for processing by the processor.

7. The device according to claim 6, characterized in that the compression processing method includes the static Huffman coding method and the dynamic Huffman coding method in the GZIP compression technology; the non-compression processing method includes the non-compression coding method in the GZIP compression technology.

8. The device according to claim 7, wherein the compression processing method further includes the LZ77 compression method in the GZIP compression technology.

9. The device according to any one of claims 6 to 8, characterized in that the second part includes at least one byte at the end of the text file.

10. The device according to any one of claims 6 to 9, characterized in that the minimum data unit includes bytes.

11. A compression device, characterized in that it includes:

A receiver, used to obtain a text file to be compressed and transmit the text file to the processor, where the text file includes a first part and a second part;

The processor is configured to process the first part using a compression processing method and/or a non-compression processing method to generate a first bit stream, and transmit the first bit stream to the transmitter; and, according to the length of the first bit stream, process the second part using a non-compression processing method to generate a second bit stream, and transmit the second bit stream to the transmitter, where the sum of the length of the first bit stream and the length of the second bit stream is an integer multiple of the smallest data unit that the processor can process;

12. The device according to claim 11, wherein the compression processing method includes the static Huffman coding method and the dynamic Huffman coding method in the GZIP compression technology; the non-compression processing method includes the non-compression coding method in the GZIP compression technology.

13. The device according to claim 12, characterized in that the compression processing method also includes the LZ77 compression method in the GZIP compression technology.

14. The device according to any one of claims 11 to 13, wherein the second part includes at least one byte at the end of the text file.

15. The device according to any one of claims 11 to 14, characterized in that the minimum data unit includes bytes.