CN106849956B - Compression method, decompression method, device and data processing system - Google Patents


Info

Publication number
CN106849956B
Authority
CN
China
Prior art keywords
code
data
character
common divisor
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611270254.0A
Other languages
Chinese (zh)
Other versions
CN106849956A (en)
Inventor
任麒斌
陆超
柯继伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201611270254.0A
Publication of CN106849956A
Application granted
Publication of CN106849956B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a compression method, a decompression method, a device, and a data processing system, and belongs to the field of data processing. The decompression method comprises the following steps: reading, in the x-th clock cycle, a code segment with a code length of a × N + b from compressed data to obtain (a × N + b)/a sub-code segments; decompressing the (a × N + b)/a sub-code segments concurrently to obtain (a × N + b)/a characters; and determining target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle. Because the plurality of sub-code segments are decompressed concurrently to obtain a plurality of characters, and the target decompressed data of the x-th clock cycle is then determined from those characters, there is no need to wait until one code word has been decompressed before decompressing the next code word according to the length of that code word, so the decompression speed is high.

Description

Compression method, decompression method, device and data processing system
Technical Field
The present application relates to the field of data processing, and in particular, to a compression method, a decompression method, an apparatus, and a data processing system.
Background
Variable length coding is a coding method used when compressing data, in which characters in the data to be compressed are represented by code words of different lengths (a code word is a binary code of several bits). Characters with a higher occurrence probability are usually represented by shorter code words (e.g. 01), and characters with a lower occurrence probability are represented by longer code words (e.g. 111001). The correspondence between characters and code words can be recorded in a variable length coding table.
At present, the result of compressing data by using variable length coding is called a compressed data package, and the compressed data package may include the compressed data, the variable length coding table, and so on. When the compressed data package is decompressed, it is first parsed to obtain the variable length coding table, and the compressed data is then decoded in sequence according to the variable length coding table, starting from the first bit of the compressed data, to obtain the decompressed data.
Since the length of each code word in the compressed data package is unknown before decompression, the starting position of the next code word can be determined only after the current code word has been decoded. The compressed data therefore has to be decompressed one code word after another in the order of the compressed data, and the decompression speed is low.
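The dependency described above can be illustrated with a minimal sketch of conventional sequential decoding. The table below is hypothetical: only the code words 01 and 111001 appear in the text, and the characters and remaining code words are assumptions chosen so that no code word is a prefix of another. The point is that `pos` cannot advance to the next code word until the current one has been matched.

```python
# Hypothetical prefix-free table; only "01" and "111001" come from the text above.
TABLE = {"01": "E", "10": "T", "00": "A", "110": "N", "111001": "Q"}


def decode_sequential(bits, table):
    """Decode a bit string one code word at a time, starting from the first bit."""
    max_len = max(len(cw) for cw in table)
    out, pos = [], 0
    while pos < len(bits):
        # The start of the next code word is unknown until this one is matched.
        for length in range(1, max_len + 1):
            cw = bits[pos:pos + length]
            if cw in table:
                out.append(table[cw])
                pos += length
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(out)


print(decode_sequential("0111100110", TABLE))
```

Each loop iteration depends on the `pos` produced by the previous one, which is the serial bottleneck the rest of the disclosure sets out to remove.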
Disclosure of Invention
In order to solve the problem of low decompression speed during decompression, embodiments of the present invention provide a compression method, a decompression method, an apparatus, and a data processing system. The technical scheme is as follows:
The decompression method provided by the embodiment of the invention can be executed by a decompression engine, and the decompression engine can be arranged in a processor of the x86 architecture or a processor of the Advanced RISC Machine (ARM) architecture.
In a first aspect, an embodiment of the present invention provides a decompression method, where the method includes:
the decompression engine reads, in the x-th clock cycle, a code segment with a code length of a × N + b from the compressed data to obtain (a × N + b)/a sub-code segments, where x is an integer greater than or equal to 1; the code length of each of the (a × N + b)/a sub-code segments is c; the (a × N + b)/a sub-code segments include a first sub-code segment whose start bit coincides with the start bit of the code segment read in the x-th clock cycle; the start bits of every two adjacent sub-code segments are separated by an interval with a code length of a - 1; and N is an integer greater than 0.
The decompression engine concurrently decompresses the (a × N + b)/a sub-code segments based on the variable length coding table to obtain (a × N + b)/a characters, where decompressing each sub-code segment yields one character.
The variable length coding table may include a plurality of code words, and each of the (a × N + b)/a characters corresponds to one code word in the variable length coding table. At least two of the code words have different code lengths; the longest code word has a code length of c, the shortest code word has a code length of a, and a is the greatest common divisor of the code lengths of all the code words, where a is an integer greater than or equal to 2, c is an integer greater than a, and the difference between c and a is b.
The decompression engine determines the target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle. The target decompressed data of the x-th clock cycle includes a plurality of valid characters arranged in a specific order; for every two adjacent valid characters, the last symbol of the code word corresponding to one valid character is adjacent, within the code segment read in the x-th clock cycle, to the start symbol of the code word corresponding to the other valid character.
In the decompression method provided by the embodiment of the present invention, when compressed data is decompressed, the (a × N + b)/a sub-code segments are decompressed concurrently to obtain (a × N + b)/a characters, and the target decompressed data of the x-th clock cycle is then determined from those characters according to the variable length coding table and the code segment read in the x-th clock cycle. There is no need to wait until one code word has been decoded before decoding the next code word according to the length of that code word. Decompression can start from a plurality of positions at the same time, so the decompression speed is high.
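The following is a minimal software sketch of the first aspect for the first clock cycle, not the engine's actual hardware design. It assumes a hypothetical table with code lengths 2 and 4 (so a = 2, c = 4, b = 2), N = 3, sub-code segments that start at every a-th bit and are truncated at the end of the code segment, and a selection rule that keeps only characters whose code word starts within the first a × N bits. All names are illustrative.

```python
# Hypothetical table: code lengths are 2 and 4, so a = 2, c = 4, b = c - a = 2.
TABLE = {"00": "A", "01": "B", "10": "C",
         "1100": "D", "1101": "E", "1110": "F", "1111": "G"}
A, C = 2, 4
B = C - A
N = 3  # the code segment read per cycle is a*N + b = 8 bits long


def decode_sub_segment(sub):
    """Return (character, code length) for the code word at the start of sub, or None."""
    for length in range(A, C + 1, A):
        cw = sub[:length]
        if cw in TABLE:
            return TABLE[cw], length
    return None


def decode_first_cycle(segment):
    """Speculatively decode every sub-code segment, then keep the valid characters."""
    # Each call below is independent of the others, so hardware could run them in parallel.
    speculative = [decode_sub_segment(segment[s:s + C])
                   for s in range(0, len(segment), A)]
    # For x = 1 the first valid code word starts at bit 0 of the code segment.
    valid, pos = [], 0
    while pos < A * N:
        hit = speculative[pos // A]
        if hit is None:
            raise ValueError(f"no code word matches at bit {pos}")
        valid.append(hit[0])
        pos += hit[1]
    return "".join(valid), pos


print(decode_first_cycle("01110000"))
```

In this run the character decoded speculatively at bit 4 is discarded, because bit 4 lies inside the 4-bit code word that starts at bit 2; the returned position tells the next cycle where its first valid code word begins.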
Optionally, the last b bits of the code segment with the code length a × N + b read in the x-th clock cycle coincide with the first b bits of the code segment with the code length a × N + b read in the (x + 1)-th clock cycle.
According to the decompression method provided by the embodiment of the invention, for the code segments read by the decompression engine in two adjacent clock cycles, the first b bits of the later code segment are the last b bits of the earlier code segment, which prevents any part of the compressed data from being missed during decompression.
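The overlap implies that consecutive code segments start a × N bits apart. A small sketch of the read pattern follows, under the assumption that the final code segment may simply be shorter when the compressed data runs out; the function name and sample bit string are illustrative.

```python
def read_code_segments(bits, a, n, b):
    """Yield the code segment read in each clock cycle (length a*n + b, overlap b)."""
    stride = a * n  # consecutive code segments share their boundary b bits
    for start in range(0, len(bits), stride):
        yield bits[start:start + stride + b]


segments = list(read_code_segments("110011000100111110", a=2, n=3, b=2))
print(segments)
```

With a = 2, N = 3 and b = 2, the last two bits of each 8-bit code segment reappear as the first two bits of the next one.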
Optionally, when x is equal to 1, the decompression engine determines the target decompressed data of the xth clock cycle from (a × N + b)/a characters according to the variable length coding table and the code segment read in the xth clock cycle, and specifically includes:
the decompression engine determines the target decompressed data of the x-th clock cycle from the (a × N + b)/a characters, where the start symbol of the code word corresponding to the first valid character in the target decompressed data of the x-th clock cycle coincides with the start symbol of the code segment read in the x-th clock cycle.
In the decompression method provided by the embodiment of the present invention, when x is equal to 1, the target decompressed data of the 1st clock cycle is determined by the position of the start symbol of the target decompressed data.
Optionally, when x is an integer greater than or equal to 2, the decompression engine determines the target decompressed data of the xth clock cycle from (a × N + b)/a characters according to the variable length coding table and the code segment read in the xth clock cycle, and specifically includes:
the decompression engine determines c/a groups of candidate decompressed data from the (a × N + b)/a characters. Each group of candidate decompressed data includes a plurality of characters, and for every two adjacent characters in a group, the last symbol of the code word corresponding to one character is adjacent, within the code segment read in the x-th clock cycle, to the start symbol of the code word corresponding to the other character. The start symbols of the code words corresponding to the first characters of any two groups are at different positions in the code segment read in the x-th clock cycle; the start symbol of the code word corresponding to the first character of each group is the (W × a)-th symbol of that code segment, where W is an integer greater than or equal to 0 and less than or equal to b/a.
The decompression engine determines the target decompressed data of the x-th clock cycle from the c/a groups of candidate decompressed data. In the compressed data, the last symbol of the code word corresponding to the last character in the target decompressed data of the (x-1)-th clock cycle is adjacent to the start symbol of the code word corresponding to the first character in the target decompressed data of the x-th clock cycle, where the target decompressed data of the (x-1)-th clock cycle refers to the target decompressed data corresponding to the code segment read in the (x-1)-th clock cycle.
According to the decompression method provided by the embodiment of the invention, when x is an integer greater than or equal to 2, the c/a possible groups of candidate decompressed data for the code segment read in the x-th clock cycle are obtained first, and the target decompressed data of the x-th clock cycle is then selected according to the target decompressed data of the (x-1)-th clock cycle. The decompression engine can therefore start obtaining candidate decompressed data before the data of the previous clock cycle has been fully decompressed, which increases the decompression speed. When x - 1 is equal to 1, the target decompressed data of the (x-1)-th clock cycle is the plurality of valid characters, arranged in a specific order, obtained by decompressing the code segment read in the 1st clock cycle.
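The candidate-group selection for x ≥ 2 can be sketched as follows. This is an illustrative software model under the same assumptions as before (hypothetical table with a = 2, c = 4, b = 2, N = 3; characters are kept only if their code word starts within the first a × N bits of the code segment). The function names are not from the source.

```python
# Hypothetical table: a = 2 (shortest code word), c = 4 (longest), b = c - a = 2.
TABLE = {"00": "A", "01": "B", "10": "C",
         "1100": "D", "1101": "E", "1110": "F", "1111": "G"}
A, C = 2, 4
B = C - A
N = 3


def decode_at(segment, start):
    """Return (character, code length) for the code word starting at bit `start`, or None."""
    for length in range(A, C + 1, A):
        cw = segment[start:start + length]
        if cw in TABLE:
            return TABLE[cw], length
    return None


def candidate_groups(segment):
    """Build c/a candidate groups, one per possible start offset W*a with 0 <= W <= b/a."""
    groups = {}
    for w in range(B // A + 1):
        pos, chars = w * A, []
        while pos < A * N:
            hit = decode_at(segment, pos)
            if hit is None:  # ran past the end of a short final code segment
                break
            chars.append(hit[0])
            pos += hit[1]
        groups[w * A] = ("".join(chars), pos)
    return groups


def select_target(groups, prev_end):
    """Pick the group that starts where the previous cycle's last code word ended.

    prev_end is the end offset inside the previous code segment; because code
    segments start a*N bits apart, it maps to offset prev_end - a*N in this one.
    """
    return groups[prev_end - A * N]


groups = candidate_groups("00010011")      # code segment read in cycle 2
print(groups)
print(select_target(groups, prev_end=8))   # cycle 1 ended at bit 8 of its code segment
```

The groups depend only on the bits read in the current cycle, so they can be prepared before the previous cycle's result is known; only the final selection needs `prev_end`.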
Optionally, after the decompression engine determines the target decompressed data of the x-th clock cycle, the method further includes:
the decompression engine splices the target decompressed data obtained in a plurality of clock cycles in the order of the compressed data to obtain the decompressed data corresponding to the compressed data. After splicing, for every two adjacent clock cycles, the last symbol of the code word corresponding to the last character in the target decompressed data of one clock cycle is adjacent, in the compressed data, to the start symbol of the code word corresponding to the first character in the target decompressed data of the other clock cycle.
According to the decompression method provided by the embodiment of the invention, after the decompressed data of a plurality of clock cycles are obtained, the results are spliced to obtain the decompressed data of the compressed data, and the decompression speed is higher.
In a second aspect, an embodiment of the present invention provides a compression method, which may be performed by a compression engine, the method including:
the compression engine determines a target common divisor of a code length corresponding to each code word in the variable length coding table; the variable length coding table comprises a plurality of code words, the target common divisor is an integer which is greater than or equal to 2, and each character in the data to be compressed corresponds to one code word in the variable length coding table;
the compression engine generates a variable length coding table according to each character and the target common divisor in the data to be compressed; the code lengths of at least two code words in the variable length coding table are different, the code length of the code word corresponding to the character with higher occurrence probability in the data to be compressed is smaller than the code length of the code word corresponding to the character with lower occurrence probability, and the maximum common divisor of the code lengths of the code words in the variable length coding table is the target common divisor;
the compression engine compresses the data to be compressed according to the variable length coding table to obtain compressed data;
the compression engine generates a compressed data package based on the compressed data and the variable length coding table.
According to the compression method provided by the embodiment of the invention, the variable length coding table is generated according to each character in the data to be compressed and the target common divisor, and the data to be compressed is compressed according to the variable length coding table. When decompressing the compressed data, a decompression engine can therefore start decompression from multiple positions at the same time, without waiting until one code word has been decompressed before decompressing the next code word according to the length of that code word, so the decompression speed is high.
Optionally, the determining, by the compression engine, of a target common divisor of the code length corresponding to each code word in the variable length coding table includes:
the compression engine may first determine at least one common divisor, where a corresponding variable length coding table can be generated according to each of the at least one common divisor, and the compression ratio obtained by compressing the data to be compressed according to the variable length coding table corresponding to each of the at least one common divisor is smaller than a preset value;
the compression engine may then determine a target common divisor from the at least one common divisor, the target common divisor having a value greater than each common divisor of the at least one common divisor other than the target common divisor.
The larger the greatest common divisor a is, the higher the decompression speed of data compressed by the compression method provided by the embodiment of the present invention, but the compression ratio (the size of the compressed data relative to the data to be compressed) may also increase as the greatest common divisor becomes larger. In the compression method provided by the embodiment of the invention, at least one common divisor whose corresponding compression ratio is smaller than a preset value is determined in advance, and the largest of these is selected as the target common divisor, so that both the compression ratio and the decompression speed are kept within an acceptable range.
In a third aspect, an embodiment of the present invention provides a decompression apparatus, where the decompression apparatus includes multiple units, and the multiple units are configured to implement the decompression method provided in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a compression apparatus, where the compression apparatus includes multiple units, and the multiple units are configured to implement the compression method provided in the second aspect or any one of the possible implementation manners of the second aspect.
In a fifth aspect, an embodiment of the present invention provides a data processing system, where the system includes the decompression device provided in the third aspect and the compression device provided in the fourth aspect, where the compression device is configured to compress data to be compressed to obtain compressed data; the decompression device is used for decompressing the compressed data.
In a sixth aspect, a decompression apparatus is provided, the decompression apparatus comprising: a base board, and a CPU and a memory disposed on the base board, wherein the base board is connected to a bus interface, the bus interface can be connected to a decompression engine, and the decompression engine is configured to execute the decompression method provided by the first aspect.
In a seventh aspect, a decompression apparatus is provided, including: a processor chip, a memory connected to the processor chip, and a CPU and a bus interface provided on the processor chip, wherein a decompression engine may be integrated on the processor chip and connected to the CPU through the bus interface, and the decompression engine is configured to execute the decompression method provided in the first aspect.
In an eighth aspect, there is provided a compression device comprising: the system comprises at least one processor, at least one network interface, a memory and at least one bus, wherein the memory and the network interface are respectively connected with the processor through the bus; the processor is configured to execute instructions stored in the memory; the processor implements the compression method provided by the second aspect by executing the instructions.
In summary, in the decompression method provided in the embodiment of the present invention, when decompressing compressed data, the (a × N + b)/a sub-code segments are concurrently decompressed to obtain the (a × N + b)/a characters, and then the target decompressed data in the x-th clock cycle is determined from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle. And a next code word is decoded according to the length of the code word without waiting for the completion of the decoding of the code word. The problem of lower decompression speed in the related art is solved. When decompression is carried out, decompression can be started from a plurality of positions at the same time, and the decompression speed is high.
Drawings
FIG. 1A is a block diagram illustrating a structure of a decompression apparatus according to an embodiment of the present invention;
FIG. 1B is a block diagram of another decompression apparatus according to an embodiment of the present invention;
FIG. 1C is a block diagram illustrating a compression apparatus according to an embodiment of the present invention;
FIG. 2A is a flow chart illustrating a compression method according to an embodiment of the present invention;
FIG. 2B is a flow chart of one embodiment of determining a target common divisor of the embodiment of FIG. 2A;
FIG. 3A is a flow chart illustrating a decompression method according to an embodiment of the present invention;
FIG. 3B is a diagram illustrating the correspondence between code words and characters in the data read in the x-th clock cycle in the embodiment shown in FIG. 3A;
FIG. 3C is a bar graph of the chip area occupied by the decompression engine in the embodiment shown in FIG. 3A;
FIG. 4 is a block diagram illustrating a structure of a compression apparatus according to an embodiment of the present invention;
FIG. 5A is a block diagram illustrating a structure of a decompression apparatus according to an embodiment of the present invention;
FIG. 5B is a block diagram illustrating a structure of a decompression apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data processing system according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more clear, the present disclosure will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments.
Referring to FIG. 1A, a block diagram of a decompression apparatus according to an embodiment of the present invention is shown, where the decompression apparatus may include: a base board 11, and a CPU 12 and a memory 13 disposed on the base board 11, where the base board 11 is connected to a bus interface 14, the bus interface 14 may be connected to a decompression engine 15, and the decompression engine 15 may be an expansion card implemented as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). The bus interface 14 may be a Peripheral Component Interconnect Express (PCIe) interface.
Referring to FIG. 1B, a block diagram of another decompression apparatus according to an embodiment of the present invention is shown, where the decompression apparatus may include: a processor chip 16, a memory 17 connected to the processor chip 16, and a CPU 161 and a bus interface 162 provided on the processor chip 16, where a decompression engine 163 may be integrated on the processor chip 16 and connected to the CPU 161 through the bus interface 162. The memory 17 may be a Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM), and the bus interface 162 may be an Advanced eXtensible Interface (AXI).
Referring to fig. 1C, a block diagram of a compressing apparatus according to an embodiment of the present invention is shown, where the compressing apparatus 20 may include: at least one processor 21, at least one network interface 22, a memory 23 and at least one bus 24, wherein the memory 23 and the network interface 22 are respectively connected with the processor 21 through the bus 24; the processor 21 is configured to execute instructions 231 stored in the memory 23. The memory 23 may comprise a high-speed Random Access Memory (RAM) and may further comprise a non-volatile memory (non-volatile memory), such as at least one disk memory.
Fig. 2A is a flowchart of a compression method according to an embodiment of the present invention, which is illustrated in the present embodiment by applying the compression method to compress data to be compressed. The compression method may comprise the steps of:
Step 201, the compression engine obtains the probability distribution of the various characters in the data to be compressed.
When the compression method provided by the embodiment of the invention is used, the compression engine can analyze the data to be compressed to obtain the probability distribution of the various characters in it. The data to be compressed may be composed of a plurality of different characters, and different characters may occur with different probabilities. The probability distribution may include the occurrence probability of each character; for example, the occurrence probability of the character "A" may be 4% and the occurrence probability of the character "B" may be 6%. The characters in the data to be compressed are uncompressed characters stored in a storage device. A storage device usually stores data in units of binary bits, and the number of bits occupied by each character depends on the encoding method; for example, in data to be compressed that is encoded with the American Standard Code for Information Interchange (ASCII), each character may occupy 8 bits.
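Step 201 amounts to a frequency count. A minimal sketch, with an illustrative function name and a made-up input string:

```python
from collections import Counter


def char_probabilities(data):
    """Return the occurrence probability of each distinct character in data."""
    counts = Counter(data)
    total = len(data)
    return {ch: count / total for ch, count in counts.items()}


print(char_probabilities("AAAB"))
```

The resulting distribution is what later steps use to give frequent characters shorter code words.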
The compression engine in the embodiment of the present invention may be incorporated in the compression apparatus shown in fig. 1C by software or hardware.
Step 202, the compression engine determines a target common divisor of a code length corresponding to each code word in the variable length coding table.
The variable length coding table may include a plurality of code words, the target common divisor is an integer greater than or equal to 2, and each character in the data to be compressed corresponds to one code word in the variable length coding table.
The target common divisor is the greatest common divisor of the code lengths of the code words in the coding table, and the coding table is generated by the compression engine in a subsequent step. It should be noted that a code word is composed of a number of symbols and, in computer communication, is usually represented as several bits of binary data. One symbol may be one binary digit.
As shown in fig. 2B, the process of determining the target common divisor by the compression engine may include the following 2 sub-steps:
Sub-step 2021, the compression engine determines at least one common divisor, where the compression ratio obtained by compressing the data to be compressed according to the variable length coding table corresponding to each common divisor in the at least one common divisor is smaller than a preset value.
And generating a corresponding variable length coding table according to each common divisor in the at least one common divisor.
Sub-step 2021 may comprise:
1) The compression engine may first preset a plurality of common divisors, where each common divisor is an integer greater than or equal to 2 and smaller than S, and S is the number of binary bits occupied by each character in the data to be compressed. (If a common divisor were greater than or equal to the number of bits occupied by a character, and that common divisor were the greatest common divisor of the code lengths of the code words in the compressed data, no code word would be shorter than an uncompressed character, so the compressed data would be larger than the data to be compressed and the purpose of compression could not be achieved.) For example, if each character in the data to be compressed occupies 8 bits, the common divisors may be preset to the six integers from 2 to 7.
2) After setting the common divisors, the compression engine may estimate the compression ratio (also referred to below as the compression rate) corresponding to each common divisor. The compression ratio is the ratio of the size of the data after compression to its size before compression. For example, if the data to be compressed is 100 megabytes (MB) before compression and 10 MB after compression, the compression ratio is 10%.
When estimating the compression rate of each common divisor, the compression engine may sample data to be compressed to obtain sample data, then generate a variable length coding table according to each common divisor (step 203 may be referred to in the process of generating a coding table according to the common divisor), and compress the sample data according to each variable length coding table to obtain the compression rate corresponding to each common divisor. The code table generated according to any common divisor of the multiple common divisors records the corresponding relation between each character in the sampling data and the code word corresponding to the character, and the code length of the code word corresponding to each character in the variable length code table is positive integer multiple of any common divisor.
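The estimate for one common divisor can be sketched as below. The table is a hypothetical one for the common divisor 2 (every code length is a multiple of 2), S is taken to be 8 bits per character, and the sample string is made up; how the table itself is generated is left to step 203.

```python
def estimate_compression_ratio(sample, table, bits_per_char=8):
    """Ratio of compressed size to original size for the sampled data."""
    compressed_bits = sum(len(table[ch]) for ch in sample)
    original_bits = bits_per_char * len(sample)
    return compressed_bits / original_bits


# Hypothetical table for common divisor 2: code lengths 2, 2 and 4.
table_for_2 = {"A": "00", "B": "01", "C": "1100"}
print(estimate_compression_ratio("AABC", table_for_2))
```

Here the four sampled characters shrink from 32 bits to 10 bits, so the estimated compression ratio is 10/32.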
3) And screening at least one common divisor of which the corresponding compression ratio is smaller than a preset value from the plurality of common divisors.
The preset value can be set in advance by an operator, taking the required decompression speed into account: when a high decompression speed is required, a higher preset value can be set, and when the required decompression speed is lower, a smaller preset value can be set. This is because, in general, the smaller the compression ratio, the slower the decompression speed.
In sub-step 2022, the compression engine determines a target common divisor from the at least one common divisor, wherein the value of the target common divisor is greater than the value of each common divisor of the at least one common divisor other than the target common divisor.
The compression engine determines a target common divisor, wherein the target common divisor is a common divisor with a maximum value in at least one common divisor of which the corresponding compression rate is smaller than a preset value.
It should be understood that when the aforementioned "at least one common divisor" consists of a single common divisor, that common divisor is the target common divisor. Only when the "at least one common divisor" contains two or more common divisors is the target common divisor the one whose value is greater than that of each of the others.
In the decompression method (refer to the embodiment shown in fig. 3A) corresponding to the compression method provided by the embodiment of the present invention, the larger the greatest common divisor of the code lengths of the code words in the variable length coding table, the more data the decompression engine reads per clock cycle and the higher the decompression speed. Therefore, when the decompression speed is the primary consideration, the largest of the common divisors whose compression rate is smaller than the preset value can be determined as the target common divisor.
Further, the larger the common divisor, the longer the code words that represent the characters, and hence the larger the data amount of the compressed data (which is composed of code words) that represents the data to be compressed; that is, the larger the common divisor, the larger the compression rate. Therefore, when the compression rate is the primary consideration, the smallest of the common divisors whose compression rate is smaller than the preset value may be determined as the target common divisor.
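The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_target_divisor`, the `prefer_speed` flag, and the compression rates in the example are hypothetical and do not come from the embodiment.

```python
def select_target_divisor(rate_by_divisor, preset, prefer_speed=True):
    """Pick the target common divisor among the eligible common divisors.

    rate_by_divisor maps a candidate common divisor to the compression rate
    measured for it (hypothetical input format). A divisor is eligible when
    its compression rate is smaller than the preset value.
    """
    eligible = [d for d, rate in rate_by_divisor.items() if rate < preset]
    if not eligible:
        raise ValueError("no common divisor meets the preset compression rate")
    # Largest divisor favours decompression speed, smallest favours compression rate.
    return max(eligible) if prefer_speed else min(eligible)


# Made-up compression rates, for illustration only.
rates = {2: 0.52, 3: 0.58, 4: 0.66}
fast_target = select_target_divisor(rates, preset=0.60)
small_target = select_target_divisor(rates, preset=0.60, prefer_speed=False)
```

With these made-up numbers, the divisors 2 and 3 are eligible, so the speed-oriented choice is 3 and the compression-oriented choice is 2.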
Step 203, the compression engine generates a variable length coding table according to each character and the target common divisor in the data to be compressed.
The code lengths of at least two code words in the variable length coding table are different, the code length of the code word corresponding to the character with higher occurrence probability in the data to be compressed is smaller than the code length of the code word corresponding to the character with lower occurrence probability, and the greatest common divisor of the code lengths of the code words in the variable length coding table is the target common divisor.
The variable length coding table is a mapping table that records the correspondence between the characters in the data to be compressed and the code words, and generating the variable length coding table is the process of determining this correspondence. Each code word in the variable length coding table corresponds to one character in the data to be compressed; the same character always corresponds to the same code word, and different characters correspond to different code words. Code words of different characters may differ in two ways: in one case, the numbers of bits (the "code lengths") of the code words are different; in the other case, the numbers of bits are the same but the arrangement of the binary data in the code words is different. Referring to table 1, the code word corresponding to the character A is 000 and the code word corresponding to the character B is 001: the two code words have the same number of bits, but their binary data are arranged differently. Further referring to table 1, the code word corresponding to the character A is 000 and the code word corresponding to the character F is 101000: the two code words have different numbers of bits.
When the variable length coding table is generated, code words may be allocated to the characters in descending order of their occurrence probability in the data to be compressed, with code lengths running from short to long. That is, the greater the occurrence probability of a character, the shorter the code word assigned to it, so that the data to be compressed can be represented by a smaller amount of data.
For example, when the target common divisor is 3, the variable length coding table can be as shown in table 1:
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced here.)
In table 1, a character whose code word has a shorter code length occurs with a greater probability than a character whose code word has a longer code length. The code length listed for a character is the number of symbols contained in the code word corresponding to that character; for example, the code length 6 listed for the character "P" is the code length of the code word "110010" corresponding to the character "P". The code word in the same row as a character is the code word corresponding to that character; for example, the code word "110011" in the same row as the character "Q" is the code word corresponding to the character "Q". The arrangement order of the characters in the data to be compressed is consistent with the arrangement order of the code words in the compressed data.
The variable length coding in the embodiment of the present invention may be entropy coding (which is a coding method used in lossless compression).
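The embodiment does not prescribe a particular algorithm for building the table. One possible construction, shown here purely as an assumption, is a Huffman code over an alphabet of 2^a branch labels, so that every code length is a multiple of a. The function name `build_table` and the example frequencies are hypothetical, and characters are assumed to be strings.

```python
import heapq
from itertools import count


def build_table(freqs, a):
    """Build a prefix code whose code lengths are all multiples of a.

    Sketch only: a 2**a-ary Huffman tree in which every branch contributes
    an a-bit label. This is an assumed construction, not the patented one.
    """
    degree = 1 << a
    tie = count()  # unique tie-breaker so that tree nodes are never compared
    heap = [(weight, next(tie), sym) for sym, weight in freqs.items()]
    # Pad with zero-weight dummy leaves so every merge takes exactly `degree` nodes.
    while len(heap) > 1 and (len(heap) - 1) % (degree - 1) != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(degree)]
        children = [node for _, _, node in group]
        heapq.heappush(heap, (sum(w for w, _, _ in group), next(tie), children))

    table = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node: one a-bit label per child
            for digit, child in enumerate(node):
                walk(child, prefix + format(digit, f"0{a}b"))
        elif node is not None:  # real character; dummy leaves are skipped
            table[node] = prefix or "0" * a

    walk(heap[0][2], "")
    return table


# Made-up occurrence counts, for illustration only.
freqs = {"A": 30, "B": 20, "C": 15, "D": 12, "E": 10,
         "F": 5, "G": 4, "H": 2, "I": 1, "J": 1}
table = build_table(freqs, a=3)
```

With a = 3 and these ten characters, the seven most frequent characters receive 3-bit code words and the three rarest receive 6-bit code words, which matches the shape of the code words cited from table 1 (for example 000 and 101000). The sketch does not check that the greatest common divisor of the resulting code lengths is exactly a; a real implementation would need that check.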
Step 204, the compression engine compresses the data to be compressed according to the variable length coding table to obtain compressed data.
After the variable length coding table is obtained, the data to be compressed may be compressed according to the variable length coding table, so as to obtain compressed data. Illustratively, the variable length coding table is table 1, and the compressed data obtained after compressing the data to be compressed "ABC" according to table 1 is "000001010".
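The example above can be reproduced with a minimal sketch. The dictionary below contains only the three table 1 entries cited in the text, and the names `TABLE_1_EXCERPT` and `compress` are hypothetical.

```python
# Only the table 1 entries quoted in the text; the full table is not reproduced.
TABLE_1_EXCERPT = {"A": "000", "B": "001", "C": "010"}


def compress(data, table):
    """Replace each character by its code word, preserving the character order."""
    return "".join(table[ch] for ch in data)


compressed = compress("ABC", TABLE_1_EXCERPT)
```

Here `compressed` is the bit string "000001010", written as text for readability; an actual compression engine would emit packed bits.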
Step 205, the compression engine generates a compressed data package based on the compressed data and the variable length coding table.
After obtaining the variable length coding table and the compressed data, the compression engine generates a compressed data package based on the compressed data and the variable length coding table. The compressed data package may also include other data, such as a checksum.
The compression ratio of the compression method provided by the embodiment of the invention is reduced by about 30% compared with the compression ratio of the compression method based on LZ4 (an encoding method).
In summary, in the compression method provided in the embodiments of the present invention, by generating the variable length coding table with the greatest common divisor in the length of the codeword, and compressing the data to be compressed according to the variable length coding table, when decompressing the compressed data, the decompression engine can start decompressing from multiple positions concurrently without waiting for the completion of decompressing one codeword and then decompressing the next codeword according to the length of the codeword, so that the decompression speed is high.
Fig. 3A is a flowchart of a decompression method according to an embodiment of the present invention, which is illustrated by applying the decompression method to decompress a compressed data packet generated by the compression method provided in the embodiment shown in fig. 2A, and the decompression method can be implemented by the decompression engine in fig. 1A or fig. 1B. The decompression method can comprise the following steps:
Step 301, the decompression engine parses the compressed data package to obtain the variable length coding table and the compressed data.
When the decompression method provided by the embodiment of the invention is used, the decompression engine can first parse the compressed data package to obtain the variable length coding table and the compressed data. The compressed data package may be a compressed data package generated by the compression method provided in the embodiment shown in fig. 2A, and the compressed data package may include a variable length coding table, compressed data, and other data.
This step is optional; that is, the decompression engine can also obtain the variable length coding table and the compressed data directly.
Step 302, the decompression engine reads a code segment with a code length of a × N + b from the compressed data in each clock cycle according to the generation sequence of the compressed data, and the rear b bits of the code segment with the code length of a × N + b read in the x-th clock cycle coincide with the front b bits of the code segment with the code length of a × N + b read in the x + 1-th clock cycle.
After obtaining the variable length coding table, the decompression engine may read, in each clock cycle and in the order in which the compressed data was generated (that is, data generated first is read first and data generated later is read later), a code segment with a code length of a × N + b from the compressed data, where a × N + b is greater than or equal to the code length c of the longest code word in the compressed data. The last b bits of the code segment read in the x-th clock cycle coincide with the first b bits of the code segment read in the (x+1)-th clock cycle. Here, a is the target common divisor mentioned in the above embodiments; N is an integer greater than 0 whose value may be preset according to the hardware conditions; and b is the difference between the code length c of the longest code word in the compressed data and the code length a of the shortest code word (the code length of the shortest code word is equal to the target common divisor a). The product a × N may represent the decompression speed of the decompression method provided by the embodiment of the present invention; that is, the larger a and N are, the higher the decompression speed. For example, when a is 3 and c is 12, b = c - a = 9, and the decompression engine reads 3N + 9 bits of data in every clock cycle.
In the compressed data, the values such as the greatest common divisor a of the code length of the code word, the difference b between the code length of the longest code word and the code length of the shortest code word in the compressed data, etc. may be known in advance by the decompression engine before decompression, and for example, may be written into the decompression engine when the decompression engine is set. Alternatively, the greatest common divisor a of the code length of the codeword, the difference b between the code length of the longest codeword and the code length of the shortest codeword in the compressed data, and other values may be included in the variable length coding table, and the decompression engine may obtain these values when parsing the variable length coding table.
A clock cycle, also called an oscillation cycle, is a unit of time in a computer. In one clock cycle, the CPU performs one basic action.
In step 302, the decompression engine reads a code segment with a fixed code length from the compressed data every clock cycle, so that the data can be transmitted to the decompression engine with a stable bandwidth, and the utilization rate of the bandwidth by the decompression engine is high.
In the decompression method provided by the embodiment of the invention, the decompression engine can adjust the data volume read in each clock cycle by adjusting the value of N, so as to adjust the decompression speed.
In two adjacent clock cycles, several symbols at the end of the code segment read in the previous clock cycle may remain undecompressed. To avoid omitting these symbols, the decompression engine, when reading the code segment in each clock cycle, reads again the last b bits of the code segment read in the previous clock cycle. Because the code length of the whole compressed data is an integer multiple of a, the number of symbols left undecompressed in the previous clock cycle is at most b (this maximum is reached when the last code word in the code segment read in the previous clock cycle has the maximum length c) and at least 0. It should be noted that, in practice, the number of symbols left undecompressed in the previous clock cycle is the product of a and an integer in the interval [0, K], where K is equal to b/a.
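The overlapping reads of step 302 can be illustrated with a short sketch. It models the compressed data as a Python string of "0" and "1" characters, which is an assumption made for readability, and the function name `read_code_segments` is hypothetical. The bit strings reuse values quoted elsewhere in this description (a = 3, c = 9, 15-bit code segments).

```python
def read_code_segments(bits, a, n, c):
    """Return the code segments read in successive clock cycles.

    Each segment is a*n + b bits long with b = c - a, and consecutive
    segments overlap by b bits, so each cycle advances by a*n new bits.
    """
    b = c - a
    stride, length = a * n, a * n + b
    # Stop once a further read would contain no bits beyond the overlap.
    return [bits[start:start + length]
            for start in range(0, max(len(bits) - b, 1), stride)]


stream = "000001010011101" + "000101001"  # 24 bits of compressed data
segments = read_code_segments(stream, a=3, n=3, c=9)
```

With a = 3, N = 3 and c = 9, each read is 15 bits long and the last 6 bits of one segment reappear as the first 6 bits of the next.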
Step 303, the decompression engine obtains (a × N + b)/a sub-code segments from the code segment with the code length a × N + b read from the compressed data in the x-th clock cycle.
Here x is an integer greater than or equal to 1. The code length of each of the (a × N + b)/a sub-code segments is c. The (a × N + b)/a sub-code segments include a first sub-code segment whose start bit coincides with the start bit of the code segment read in the x-th clock cycle, and a - 1 symbols lie between the start bits of every two adjacent sub-code segments (that is, the start bits of adjacent sub-code segments are a bits apart).
Step 304, the decompression engine concurrently decompresses the (a × N + b)/a sub-code segments based on the variable length coding table to obtain (a × N + b)/a characters, wherein each sub-code segment is decompressed to obtain one character. When x is equal to 1, step 305 is performed; when x is an integer greater than or equal to 2, step 306 is performed.
In the a × N + b bit data read in the x-th clock cycle, the decompression engine may concurrently decompress a plurality of sub-code segments, and a start symbol of a codeword corresponding to one character obtained by decompressing each sub-code segment is a start symbol of each sub-code segment.
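When the decompression engine decompresses the sub-code segments of the code segment with the code length a × N + b, the sub-code segments that start near the end of the code segment may contain fewer than c symbols and are filled to the length c. If the code length of the code word corresponding to the character obtained from such a filled sub-code segment is greater than the length of the sub-code segment before filling, the character is deleted; after the character is deleted, several symbols in the a × N + b bits of data remain undecompressed.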
Illustratively, the code segment read in the x-th clock cycle is "000001010011100", a is 3, and c is 6. According to the variable length coding table shown in table 1, the characters obtained by decompressing the sub-code segments "000001", "001010", "010011", "011100", and "100000" are, respectively, the character "A" corresponding to the code word "000", the character "B" corresponding to the code word "001", the character "C" corresponding to the code word "010", the character "D" corresponding to the code word "011", and the character "E" corresponding to the code word "100". Here "100000" is a sub-code segment filled to 6 bits; the code length of the code word corresponding to the character "E" decompressed from it is 3, which is not greater than the length of the sub-code segment before filling, so the character "E" need not be deleted.
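The example above can be traced with the following sketch. Two points are assumptions rather than statements of the embodiment: short sub-code segments are filled with "0" symbols (suggested by the sub-code segment "100000" in the example), and the list comprehension stands in for what the hardware does concurrently. The names `split_sub_code_segments` and `decode_sub_code_segment` are hypothetical, and the table holds only entries quoted in the text.

```python
# Table 1 entries quoted in the text; the full table is not reproduced.
TABLE_1_EXCERPT = {"A": "000", "B": "001", "C": "010",
                   "D": "011", "E": "100", "F": "101000"}


def split_sub_code_segments(segment, a, c):
    """Return (filled sub-code segment, length before filling) at every a-bit offset."""
    subs = []
    for start in range(0, len(segment), a):
        raw = segment[start:start + c]
        subs.append((raw.ljust(c, "0"), len(raw)))  # zero fill is an assumption
    return subs


def decode_sub_code_segment(sub, raw_len, table, a):
    """Decode the code word at the start of sub; return (character, code length) or None."""
    inverse = {cw: ch for ch, cw in table.items()}
    for length in range(a, len(sub) + 1, a):
        ch = inverse.get(sub[:length])
        if ch is not None:
            # Delete the character if its code word reaches into the fill symbols.
            return (ch, length) if length <= raw_len else None
    return None


subs = split_sub_code_segments("000001010011100", a=3, c=6)
chars = [decode_sub_code_segment(sub, raw_len, TABLE_1_EXCERPT, 3)
         for sub, raw_len in subs]
```

The five sub-code segments decode to A, B, C, D and E, each with a 3-bit code word. Which of these characters are valid is decided only in the later steps.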
It should be noted that the code words in the compressed data correspond one to one to the characters in the target decompressed data obtained by decompressing the compressed data: the k-th code word in the compressed data corresponds to the k-th character in the target decompressed data. Once the start bit of a code word is determined, the code word itself can be determined (at least a and at most c symbols are read from the start bit, and the code word represented by these symbols can be looked up in the variable length coding table). In the prior art, the code lengths of the code words in the compressed data are not all the same, and until a code word has been decompressed in sequence, the decompression engine cannot know its code length and therefore cannot know the start bit of the next code word. In the decompression method provided by the embodiment of the present invention, however, the code lengths of the code words in the compressed data have the greatest common divisor a, so the start bit of the code word corresponding to each character in the decompressed data is included among the start bits of the (a × N + b)/a sub-code segments. The start bit of the code word corresponding to a character in the target decompressed data (the code word corresponding to each character is known from the variable length coding table) is called a correct start bit. The correct start bits are determined by the order of the bits in the whole compressed data; for example, the first bit of the whole compressed data is the correct start bit of the code word corresponding to the first character in the decompressed data. The start bits of the (a × N + b)/a sub-code segments other than the correct start bits are called wrong start bits. Accordingly, the characters obtained in step 304 by concurrently decompressing the (a × N + b)/a sub-code segments include valid characters, obtained by decompressing from correct start bits, and invalid characters, obtained by decompressing from wrong start bits.
Step 305, when x equals to 1, the decompression engine determines target decompressed data of the x-th clock cycle from (a × N + b)/a characters, and a start symbol of a code word corresponding to a first valid character in the target decompressed data of the x-th clock cycle coincides with a start symbol of a code segment read by the x-th clock cycle.
In the target decompressed data of the x-th clock cycle, the position of the last bit code element of the code word corresponding to one effective character in every two adjacent effective characters in the code segment read by the x-th clock cycle is adjacent to the position of the start code element of the code word corresponding to the other effective character in the code segment read by the x-th clock cycle. The target decompressed data of the x-th clock cycle includes a plurality of valid characters arranged in a specific order.
After obtaining the (a × N + b)/a characters, the decompression engine obtains the code length of the code word corresponding to each of these characters. Because the characters include invalid characters obtained by decompressing from wrong start bits, a plurality of valid characters need to be selected from them and spliced to obtain the target decompressed data of the x-th clock cycle. The selection may proceed as follows: first, the correct start bit of the first code word in the code segment with the code length a × N + b read in the x-th clock cycle is determined; then the correct start bit of the second code word is determined from the correct start bit and the code length of the first code word (the correct start bit of the second code word is the bit following the last bit of the first code word); and so on. In this way the decompression engine obtains the correct start bit of the code word corresponding to each valid character and can select the valid characters. The valid characters are then spliced in the order in which their code words appear in the a × N + b bits of data read in the x-th clock cycle, which yields the target decompressed data of the x-th clock cycle.
When x is 1, the start bit of the code segment with the code length a × N + b read in the x-th clock cycle is the start bit of the compressed data, and the start bit is the correct start bit of the first code word in the code segment with the code length a × N + b read in the x-th clock cycle.
For example, fig. 3B shows the first 24 symbols of the code segment with the code length a × N + b read in the x-th clock cycle. In step 304, the code length of each code word and the character corresponding to each code word are obtained: the code word with a code length of 6 formed by bits 0 to 5 corresponds to the character A, the code word with a code length of 6 formed by bits 3 to 8 corresponds to the character B, the code word with a code length of 9 formed by bits 6 to 14 corresponds to the character C, the code word with a code length of 6 formed by bits 9 to 14 corresponds to the character D, the code word with a code length of 12 formed by bits 12 to 23 corresponds to the character E, and the code word with a code length of 3 formed by bits 15 to 17 corresponds to the character F. The code lengths of the code words and the characters corresponding to the code words may be as shown in table 2:
TABLE 2
(Table 2 is provided as an image in the original publication and is not reproduced here.)
In table 2, the order of the characters from top to bottom is the same as the order of the start bits of their code words in the code segment with the code length a × N + b read in the x-th clock cycle. Valid characters may be determined from the 24 symbols shown in fig. 3B and the characters corresponding to them as follows. The start bit of the 24 symbols is taken as the start bit of the first code word (this bit is the correct start bit of the code word corresponding to the first valid character); the code length of the first code word is 6 and its character is A. Because the code length of the first code word is 6, the correct start bit of the second code word is the 6th bit of the 24 symbols (bits 0 to 5 are occupied by the first code word). The character corresponding to the code word that starts at the 6th bit is C, so the character C is the second valid character. Because the code length of the code word corresponding to the second valid character is 9, the correct start bit of the third code word is the 15th bit of the 24 symbols (bits 0 to 14 are occupied by the first and second code words). The character corresponding to the code word that starts at the 15th bit is F, so the resulting decompressed data is "ACF".
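The selection of valid characters walked through above amounts to following a chain of start bits. A minimal sketch follows; the name `select_valid_characters` is hypothetical, and the dictionary simply restates the start bits, characters and code lengths given for fig. 3B.

```python
def select_valid_characters(candidates, start_bit):
    """Follow correct start bits through the per-offset decoding results.

    candidates maps a start bit to (character, code length) as produced by
    the concurrent decompression of the sub-code segments.
    Returns the spliced valid characters and the first bit not yet consumed.
    """
    chars, pos = [], start_bit
    while pos in candidates:
        ch, length = candidates[pos]
        chars.append(ch)
        pos += length  # the next correct start bit follows this code word
    return "".join(chars), pos


# Start bit -> (character, code length), as listed for fig. 3B.
FIG_3B = {0: ("A", 6), 3: ("B", 6), 6: ("C", 9),
          9: ("D", 6), 12: ("E", 12), 15: ("F", 3)}
text, next_bit = select_valid_characters(FIG_3B, start_bit=0)
```

Starting from bit 0, the chain visits bits 0, 6 and 15 and yields "ACF"; the characters B, D and E are decoded from wrong start bits and are skipped.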
Step 306, when x is an integer greater than or equal to 2, the decompression engine determines c/a groups of candidate decompressed data from the (a × N + b)/a characters.
Each group of candidate decompressed data comprises a plurality of characters, and in every two adjacent characters, the position of the last symbol of the code word corresponding to one character in the code segment read in the x-th clock cycle is adjacent to the position of the start symbol of the code word corresponding to the other character in the code segment read in the x-th clock cycle. The start symbols of the code words corresponding to the first characters of any two groups of candidate decompressed data are at different positions in the code segment read in the x-th clock cycle. The start symbol of the code word corresponding to the first character in each group of candidate decompressed data is the (W × a)-th symbol in the code segment read in the x-th clock cycle, where W is an integer greater than or equal to 0 and less than or equal to b/a.
When x is an integer greater than or equal to 2, the data read in the (x-1)-th clock cycle has not yet been completely decompressed, so the decompression engine cannot yet know the number of undecompressed symbols in the data read in the (x-1)-th clock cycle, and therefore cannot yet determine the correct start bit of the first code word in the code segment with the code length a × N + b read in the x-th clock cycle. In this case, the decompression engine may determine all possible correct start bits (these are the (H × a)-th bits of the code segment with the code length a × N + b read in the x-th clock cycle, where H is an integer from 0 to K and K is b/a), and then, taking each possible correct start bit in turn as the correct start bit of the first code word, obtain c/a groups of candidate decompressed data in the manner of step 305.
The first b bits of the code segment read in the x-th clock cycle repeat the last b bits of the code segment read in the (x-1)-th clock cycle. These repeated b bits may begin with part of the last code word decompressed from the code segment read in the (x-1)-th clock cycle, and the correct start bit of the first code word in the code segment read in the x-th clock cycle is the bit following the last bit of that code word.
Since the greatest common divisor of the code lengths of the code words in the compressed data is a, the number of symbols, among the first b bits of the code segment read in the x-th clock cycle, that belong to code words already decompressed in the (x-1)-th clock cycle may be the product of a and any integer from 0 to K (a length of 0 × a indicates that the first b bits contain no part of the last code word decompressed from the code segment read in the (x-1)-th clock cycle). Therefore, the (H × a)-th bit of the code segment read in the x-th clock cycle, where H is an integer from 0 to K, may be the correct start bit of the first code word in that code segment. The number of groups of candidate decompressed data equals the number of integers from 0 to K, that is, K + 1 = b/a + 1 = (a + b)/a = c/a; 1 is added to K because 0 also counts as an integer from 0 to K.
Illustratively, c is 9, a is 3, the first 9 bits of the code segment read in the x-th clock cycle are "000001010", and then the 0 th bit, the 3 rd bit and the 6 th bit thereof are possible correct start bits, and the decompression engine may use the 0 th bit, the 3 rd bit and the 6 th bit as start bits of a code word corresponding to a first character in the 3 candidate decompressed data, respectively, and obtain the 3 candidate decompressed data in the manner in step 305, respectively.
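The enumeration of candidate groups can be sketched as follows. The name `candidate_groups` is hypothetical, and the per-offset decoding results assume that the bits "000001010" decode to A, B and C with 3-bit code words, as they would under the table 1 entries quoted earlier.

```python
def candidate_groups(candidates, a, c):
    """Build one group of candidate decompressed data per possible start bit.

    The possible start bits are H * a for H = 0 .. K with K = b / a and
    b = c - a, so there are c / a groups in total.
    """
    b = c - a
    groups = {}
    for h in range(b // a + 1):
        start = h * a
        chars, pos = [], start
        while pos in candidates:  # same chaining as for the first clock cycle
            ch, length = candidates[pos]
            chars.append(ch)
            pos += length
        groups[start] = "".join(chars)
    return groups


# Assumed decoding results for the first 9 bits "000001010".
decoded = {0: ("A", 3), 3: ("B", 3), 6: ("C", 3)}
groups = candidate_groups(decoded, a=3, c=9)
```

With c = 9 and a = 3 there are c/a = 3 groups, starting at the 0th, 3rd and 6th bits. Only one of them is later kept as the target decompressed data.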
In step 307, the decompression engine determines the target decompressed data of the x-th clock cycle from the c/a groups of candidate decompressed data.
In the compressed data, the last symbol of the code word corresponding to the last character in the target decompressed data of the (x-1)-th clock cycle is adjacent to the start symbol of the code word corresponding to the first character in the target decompressed data of the x-th clock cycle, where the target decompressed data of the (x-1)-th clock cycle refers to the target decompressed data corresponding to the code segment read in the (x-1)-th clock cycle. When x - 1 is equal to 1, the target decompressed data of the (x-1)-th clock cycle is the plurality of valid characters, arranged in a specific order, obtained by decompressing the code segment read in the 1st clock cycle.
After the decompression engine obtains the c/a candidate decompressed data of the x clock period, the decompression engine completes the decompression of the code segment read in the x-1 clock period at the same time, and obtains the target decompressed data of the x-1 clock period, and the decompression engine can determine the correct start bit of the first code word in the data read in the x clock period according to the target decompressed data of the x-1 clock period and select the target decompressed data of the x clock period from the c/a candidate decompressed data.
Illustratively, b is 6, a is 3, c is 9, N is 3, and a × N + b is 15. The target decompressed data of the (x-1)-th clock cycle is "ABCD", the corresponding code words are "000001010011", and the code segment read in the (x-1)-th clock cycle is "000001010011101". The code segment read in the x-th clock cycle is "011101000101001", whose first 6 bits are the last 6 bits of the code segment read in the (x-1)-th clock cycle. The last three bits "101" of the code segment read in the (x-1)-th clock cycle are the first three bits of a 6-bit code word and are not decompressed in step 304 for that clock cycle; these three bits are an undecompressed code segment. The start symbol of these three bits (the 3rd bit of the code segment read in the x-th clock cycle, counting from 0) can therefore be determined as the start symbol of the code word corresponding to the first character in the target decompressed data of the x-th clock cycle.
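The position arithmetic behind this example can be written out as a short sketch. The helper name `first_start_bit` is hypothetical; the advance per clock cycle, a × N, is derived here as the 15-bit segment length minus the b = 6 overlapping bits.

```python
def first_start_bit(prev_end_bit, stride):
    """Start bit, within the x-th code segment, of its first valid code word.

    prev_end_bit is the number of bits consumed by the target decompressed
    data of the (x-1)-th clock cycle, counted from the start of that cycle's
    code segment; stride is a * N, the advance between consecutive reads.
    """
    return prev_end_bit - stride


prev_segment = "000001010011101"    # read in the (x-1)-th clock cycle
curr_segment = "011101000101001"    # read in the x-th clock cycle
b = 6
stride = len(prev_segment) - b      # a * N = 9
prev_end_bit = len("000001010011")  # code words of "ABCD" occupy 12 bits
start = first_start_bit(prev_end_bit, stride)
```

The result is 3, so among the candidate groups starting at the 0th, 3rd and 6th bits, the group starting at the 3rd bit (where the undecompressed "101" begins) is selected as the target decompressed data.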
It should be noted that, in the embodiment of the present invention, the code segments read in the individual clock cycles are processed in a pipelined, concurrent manner: after reading the code segment in the (x-1)-th clock cycle, and while decompressing that code segment, the decompression engine reads the code segment in the x-th clock cycle and starts decompressing it at the same time. Because the code segment of the (x-1)-th clock cycle is read one clock cycle before that of the x-th clock cycle, each of the processes of step 303 to step 307 is performed on the code segment read in the x-th clock cycle one step later than the same process is performed on the code segment read in the (x-1)-th clock cycle. Therefore, during the processing of the code segment read in the x-th clock cycle, when step 306 is executed, the decompression engine has completed decompressing the code segment read in the (x-1)-th clock cycle and has obtained the target decompressed data of the (x-1)-th clock cycle.
All the data in the compressed data can be decompressed through steps 302 to 307, and target decompressed data of each clock cycle is obtained.
Step 308, the decompression engine splices the target decompressed data obtained in the plurality of clock cycles, in the order of the compressed data, to obtain the decompressed data corresponding to the compressed data.
In two adjacent clock cycles after splicing, the position of the last bit code element of the code word corresponding to the last character in the target decompressed data in one clock cycle in the compressed data is adjacent to the position of the start code element of the code word corresponding to the first character in the target decompressed data in another clock cycle in the compressed data.
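Steps 302 to 308 can be put together in a compact software simulation. This is a sequential sketch under several assumptions, not the hardware design: bits are modelled as a text string, the per-offset decoding loop stands in for the concurrent decompression, c is taken from the longest code word in the table, and the correct start bit is carried over from the previous cycle directly instead of building all c/a candidate groups. The function name `decompress` is hypothetical, and the table contains only entries quoted from table 1.

```python
def decompress(bits, table, a, n):
    """Sequentially simulate the overlapping-window decompression."""
    inverse = {cw: ch for ch, cw in table.items()}
    c = max(len(cw) for cw in table.values())  # longest code word
    stride = a * n                              # new bits per clock cycle
    window = stride + (c - a)                   # a * N + b, with b = c - a
    out = []
    base = 0  # offset of the current code segment in the compressed data
    pos = 0   # offset of the next correct start bit
    while pos < len(bits):
        segment = bits[base:base + window]
        # Decode the code word at every a-aligned offset (concurrent in hardware).
        candidates = {}
        for start in range(0, len(segment), a):
            sub = segment[start:start + c]
            for length in range(a, len(sub) + 1, a):
                if sub[:length] in inverse:
                    candidates[start] = (inverse[sub[:length]], length)
                    break
        # Chain the valid characters from the known correct start bit.
        local = pos - base
        while local in candidates:
            ch, length = candidates[local]
            out.append(ch)
            local += length
        pos = base + local
        if pos < len(bits) and local < stride:
            raise ValueError("compressed data does not match the coding table")
        base += stride
    return "".join(out)  # splice the per-cycle results in order


TABLE_1_EXCERPT = {"A": "000", "B": "001", "C": "010",
                   "D": "011", "E": "100", "F": "101000"}
message = "ABCDFEAFB"
compressed = "".join(TABLE_1_EXCERPT[ch] for ch in message)
restored = decompress(compressed, TABLE_1_EXCERPT, a=3, n=3)
```

With a = 3 and N = 3 the simulation reads 12-bit windows that overlap by 3 bits, and the round trip returns the original message. A code word that does not fit in the current window is simply left for the next cycle, which corresponds to the undecompressed symbols described above.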
Fig. 3C is a bar graph of the chip area occupied by the decompression engine when the decompression method provided by the embodiment of the present invention, a decompression method based on Huffman coding, and a decompression method based on LZ4 are applied, respectively. The vertical axis represents the chip area in square millimeters, and the chip is manufactured in a 16 nanometer (nm) process. Bar 31 represents the chip area occupied by the decompression engine when the Huffman-coding-based decompression method is applied, which is 0.05 square millimeters. Bar 32 represents the chip area occupied by the decompression engine when the decompression method provided by the embodiment of the present invention is applied with a greatest common divisor of 3, which is 0.035 square millimeters. Bar 33 represents the chip area occupied by the decompression engine when the decompression method provided by the embodiment of the present invention is applied with a greatest common divisor of 2, which is 0.025 square millimeters. Bar 34 represents the chip area occupied by the decompression engine when the LZ4-based decompression method is applied, which is 0.16 square millimeters. It can be seen that the decompression method provided by the embodiment of the present invention occupies a smaller chip area than the decompression methods in the related art.
In addition, the decompression speed of the decompression method provided by the embodiment of the invention can reach about 10 times the decompression speed for a compressed data package generated according to Huffman coding, and is basically equal to the decompression speed for a compressed data package generated based on LZ4.
In summary, in the decompression method provided in the embodiment of the present invention, when the compressed data is decompressed, the (a × N + b)/a sub-code segments are decompressed concurrently to obtain (a × N + b)/a characters, and the target decompressed data of the x-th clock cycle is then determined from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle. The decompression engine does not need to wait until one code word has been decompressed and then decompress the next code word according to the length of that code word. This solves the problem of low decompression speed in the related art: decompression can start from a plurality of positions at the same time, so the decompression speed is high.
Fig. 4 is a block diagram of a compression apparatus according to an embodiment of the present invention, which may be a part or all of a terminal or a server. The compression apparatus 400 may include:
a common divisor determining unit 410, configured to perform step 202 in the foregoing embodiment;
an encoding table generating unit 420, configured to perform step 201 and step 203 in the foregoing embodiment;
a compressing unit 430, configured to perform step 204 in the foregoing embodiment; and
a package generating unit 440, configured to perform step 205 in the foregoing embodiment.
In summary, the compression apparatus provided in the embodiment of the present invention generates a variable length coding table in which the code lengths of the codewords have the target common divisor as their greatest common divisor, and compresses the data to be compressed according to the variable length coding table. As a result, when decompressing the compressed data, the decompression engine can start decompression from multiple positions concurrently instead of waiting until one codeword has been decompressed and then decompressing the next codeword according to the length of that codeword, so the decompression speed is high.
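As an illustration of how a table with such code lengths might be built, the following Python sketch uses a 2^a-ary Huffman construction, in which every tree level contributes exactly a bits to a codeword. This is a stand-in technique chosen for the example and is not necessarily the table generation procedure of the foregoing embodiment; the function name and the sample frequencies are assumptions. The construction guarantees that every code length is a multiple of a and that more frequent characters do not receive longer codewords; in degenerate cases the greatest common divisor of the lengths could be a larger multiple of a.

```python
import heapq
from itertools import count


def kary_huffman_table(freqs, a):
    """Build a prefix code whose code lengths are multiples of `a`.

    `freqs` maps character -> frequency and must contain at least two
    characters. Each merge joins k = 2**a nodes, and each branch is
    labelled with an a-bit digit.
    """
    k = 2 ** a
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {ch: ""}) for ch, f in freqs.items()]
    # Pad with zero-frequency dummy nodes so every merge takes exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0, next(tie), {}))
    heapq.heapify(heap)

    while len(heap) > 1:
        merged, total = {}, 0
        for digit in range(k):
            f, _, codes = heapq.heappop(heap)
            total += f
            prefix = format(digit, "0{}b".format(a))
            for ch, code in codes.items():
                merged[ch] = prefix + code
        heapq.heappush(heap, (total, next(tie), merged))
    return heap[0][2]


# Hypothetical character frequencies.
FREQS = {"a": 50, "b": 20, "c": 10, "d": 8, "e": 6, "f": 4, "g": 2}
for ch, code in sorted(kary_huffman_table(FREQS, a=2).items()):
    print(ch, code)
```

For these sample frequencies and a = 2, the three most frequent characters receive 2-bit codewords and the remaining four receive 4-bit codewords.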
Fig. 5A is a block diagram of a decompression apparatus according to an embodiment of the present invention. The decompression apparatus 500 may include:
a sub-code segment determining unit 510, configured to perform steps 301, 302 and 303 in the foregoing embodiment;
a decompression unit 520, configured to perform step 304 in the foregoing embodiment; and
a decompressed data determination unit 530, configured to perform steps 305, 306 and 307 in the foregoing embodiment.
Optionally, as shown in fig. 5B, the decompression apparatus 500 may further include:
a splicing unit 540, configured to perform step 308 in the foregoing embodiment.
In summary, when decompressing the compressed data, the decompression apparatus provided in the embodiment of the present invention decompresses (a × N + b)/a sub-code segments concurrently to obtain (a × N + b)/a characters, and then determines the target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle. It is therefore unnecessary to wait until one codeword has been decoded and only then decode the next codeword according to the length of that codeword, which solves the problem of low decompression speed in the related art. Decompression can start from multiple positions at the same time, so the decompression speed is high.
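The way code segments and sub-code segments are laid out can be sketched in Python as follows. The sketch assumes that consecutive reads advance by a × N bits, so that the last b bits of one code segment are read again as the first b bits of the next, and that a sub-code segment starts at every multiple of a within a code segment. The function names are illustrative, and sub-code segments near the tail of a code segment are simply truncated here, which is a simplification.

```python
def read_code_segments(bits, a, n, b):
    """Yield (x, segment): the code segment of length a*n + b read in cycle x."""
    width = a * n + b
    step = a * n  # consecutive reads overlap by b bits
    x = 1
    for start in range(0, len(bits), step):
        yield x, bits[start:start + width]
        x += 1


def split_sub_segments(segment, a, c):
    """Cut a code segment into sub-code segments of length c, one every a bits."""
    # Sub-code segments near the tail come out shorter than c in this sketch.
    return [segment[i:i + c] for i in range(0, len(segment), a)]


# Hypothetical parameters: a = 2, N = 3, c = 4, hence b = c - a = 2.
compressed = "0100100010110100"
for x, segment in read_code_segments(compressed, a=2, n=3, b=2):
    print(x, segment, split_sub_segments(segment, a=2, c=4))
```

A full-width code segment of length a × N + b yields (a × N + b)/a sub-code segments, which matches the number of characters decoded per clock cycle.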
Fig. 6 is a schematic structural diagram of a data processing system according to an embodiment of the present invention. The data processing system includes a compression apparatus 61 and a decompression apparatus 63, where the compression apparatus 61 may be the compression apparatus described in the foregoing embodiment, and the decompression apparatus 63 may be the decompression apparatus described in the foregoing embodiment. The compression apparatus 61 is configured to compress data to be compressed to obtain compressed data, and the decompression apparatus 63 is configured to decompress the compressed data. The data processing system can be applied to a terminal or a server. The terminal may be a smart phone, a tablet computer, a desktop computer or a laptop computer, and the server may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an optional embodiment of the present invention and should not be construed as limiting the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present application.

Claims (15)

1. A method of decompression, the method comprising:
reading, in an x-th clock cycle, a code segment with a code length of a × N + b from compressed data to obtain (a × N + b)/a sub-code segments; wherein x is an integer greater than or equal to 1; the code length of each of the (a × N + b)/a sub-code segments is c; the (a × N + b)/a sub-code segments include a first sub-code segment whose start bit coincides with the start bit of the code segment read in the x-th clock cycle; the code length of the interval between the start bits of two adjacent sub-code segments among the (a × N + b)/a sub-code segments is a-1; and N is an integer greater than 0;
decompressing the (a × N + b)/a sub-code segments concurrently based on a variable length coding table to obtain (a × N + b)/a characters, wherein each sub-code segment is decompressed to obtain one character;
wherein the variable length coding table comprises a plurality of codewords, and each of the (a × N + b)/a characters corresponds to one codeword in the variable length coding table; at least two of the plurality of codewords have different code lengths; the longest codeword among the plurality of codewords has a code length of c; the shortest codeword among the plurality of codewords has a code length of a; and a is the greatest common divisor of the code lengths of the plurality of codewords, wherein a is an integer greater than or equal to 2, c is an integer greater than a, and the difference between c and a is b;
and determining target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle, wherein, for every two adjacent valid characters in the target decompressed data of the x-th clock cycle, the position, in the code segment read in the x-th clock cycle, of the last symbol of the codeword corresponding to one valid character is adjacent to the position, in that code segment, of the start symbol of the codeword corresponding to the other valid character.
2. The method of claim 1, wherein
the last b bits of the code segment with the code length of a × N + b read in the x-th clock cycle coincide with the first b bits of the code segment with the code length of a × N + b read in the (x+1)-th clock cycle.
3. The method according to claim 1 or 2, wherein, in a case where x is equal to 1, the determining target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle comprises:
determining the target decompressed data of the x-th clock cycle from the (a × N + b)/a characters, wherein the start symbol of the codeword corresponding to the first valid character in the target decompressed data of the x-th clock cycle coincides with the start symbol of the code segment read in the x-th clock cycle.
4. The method according to claim 1 or 2, wherein, in a case where x is an integer greater than or equal to 2, the determining target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle comprises:
determining c/a groups of candidate decompressed data from the (a × N + b)/a characters; wherein each group of candidate decompressed data comprises a plurality of characters, and for every two adjacent characters, the position, in the code segment read in the x-th clock cycle, of the last symbol of the codeword corresponding to one character is adjacent to the position, in that code segment, of the start symbol of the codeword corresponding to the other character; the start symbols of the codewords corresponding to the first characters of any two groups of candidate decompressed data are at different positions in the code segment read in the x-th clock cycle; the start symbol of the codeword corresponding to the first character of each group of candidate decompressed data is the (W × a)-th symbol in the code segment read in the x-th clock cycle; and W is an integer greater than or equal to 0 and less than or equal to b/a;
and determining the target decompressed data of the x-th clock cycle from the c/a groups of candidate decompressed data; wherein, in the compressed data, the last symbol of the codeword corresponding to the last character in the target decompressed data of the (x-1)-th clock cycle is adjacent to the start symbol of the codeword corresponding to the first character in the target decompressed data of the x-th clock cycle, and the target decompressed data of the (x-1)-th clock cycle refers to the target decompressed data corresponding to the code segment read in the (x-1)-th clock cycle.
5. The method of claim 1, wherein after the determining target decompressed data of the x-th clock cycle, the method further comprises:
splicing, according to the compressed data, the target decompressed data obtained in a plurality of clock cycles to obtain decompressed data corresponding to the compressed data, wherein, for two adjacent clock cycles after splicing, the position in the compressed data of the last symbol of the codeword corresponding to the last character in the target decompressed data of one clock cycle is adjacent to the position in the compressed data of the start symbol of the codeword corresponding to the first character in the target decompressed data of the other clock cycle.
6. A compression method, the method comprising:
determining a target common divisor of the code lengths of the codewords in a variable length coding table; wherein the variable length coding table comprises a plurality of codewords, the target common divisor is an integer greater than or equal to 2, and each character in data to be compressed corresponds to one codeword in the variable length coding table;
generating the variable length coding table according to each character in the data to be compressed and the target common divisor; wherein at least two codewords in the variable length coding table have different code lengths, the code length of the codeword corresponding to a character with a higher occurrence probability in the data to be compressed is smaller than the code length of the codeword corresponding to a character with a lower occurrence probability, and the greatest common divisor of the code lengths of the codewords in the variable length coding table is the target common divisor;
compressing the data to be compressed according to the variable length coding table to obtain compressed data;
and generating a compressed data package based on the compressed data and the variable length coding table.
7. The method of claim 6, wherein the determining a target common divisor of the code lengths of the codewords in a variable length coding table comprises:
determining at least one common divisor, wherein a corresponding variable length coding table can be generated according to each of the at least one common divisor, and the compression ratio obtained by compressing the data to be compressed according to the variable length coding table corresponding to each of the at least one common divisor is smaller than a preset value;
and determining the target common divisor from the at least one common divisor, wherein the target common divisor is greater than every other common divisor in the at least one common divisor.
8. A decompression apparatus, characterized in that the decompression apparatus comprises:
a sub-code segment determining unit, configured to read, in an x-th clock cycle, a code segment with a code length of a × N + b from compressed data to obtain (a × N + b)/a sub-code segments; wherein x is an integer greater than or equal to 1; the code length of each of the (a × N + b)/a sub-code segments is c; the (a × N + b)/a sub-code segments include a first sub-code segment whose start bit coincides with the start bit of the code segment read in the x-th clock cycle; the code length of the interval between the start bits of two adjacent sub-code segments among the (a × N + b)/a sub-code segments is a-1; and N is an integer greater than 0;
a decompression unit, configured to decompress the (a × N + b)/a sub-code segments concurrently based on a variable length coding table to obtain (a × N + b)/a characters, wherein each sub-code segment is decompressed to obtain one character; the variable length coding table comprises a plurality of codewords, and each of the (a × N + b)/a characters corresponds to one codeword in the variable length coding table; at least two of the plurality of codewords have different code lengths; the longest codeword among the plurality of codewords has a code length of c; the shortest codeword among the plurality of codewords has a code length of a; and a is the greatest common divisor of the code lengths of the plurality of codewords, wherein a is an integer greater than or equal to 2, c is an integer greater than a, and the difference between c and a is b;
and a decompressed data determining unit, configured to determine target decompressed data of the x-th clock cycle from the (a × N + b)/a characters according to the variable length coding table and the code segment read in the x-th clock cycle, wherein, for every two adjacent valid characters in the target decompressed data of the x-th clock cycle, the position, in the code segment read in the x-th clock cycle, of the last symbol of the codeword corresponding to one valid character is adjacent to the position, in that code segment, of the start symbol of the codeword corresponding to the other valid character.
9. The decompression apparatus according to claim 8, wherein
the last b bits of the code segment with the code length of a × N + b read in the x-th clock cycle coincide with the first b bits of the code segment with the code length of a × N + b read in the (x+1)-th clock cycle.
10. The decompression apparatus according to claim 8 or 9, wherein, in a case where x is equal to 1, the decompressed data determining unit is specifically configured to:
determine the target decompressed data of the x-th clock cycle from the (a × N + b)/a characters, wherein the start symbol of the codeword corresponding to the first valid character in the target decompressed data of the x-th clock cycle coincides with the start symbol of the code segment read in the x-th clock cycle.
11. The decompression apparatus according to claim 8 or 9, wherein, in a case where x is an integer greater than or equal to 2, the decompressed data determining unit is specifically configured to:
determine c/a groups of candidate decompressed data from the (a × N + b)/a characters; wherein each group of candidate decompressed data comprises a plurality of characters, and for every two adjacent characters, the position, in the code segment read in the x-th clock cycle, of the last symbol of the codeword corresponding to one character is adjacent to the position, in that code segment, of the start symbol of the codeword corresponding to the other character; the start symbols of the codewords corresponding to the first characters of any two groups of candidate decompressed data are at different positions in the code segment read in the x-th clock cycle; the start symbol of the codeword corresponding to the first character of each group of candidate decompressed data is the (W × a)-th symbol in the code segment read in the x-th clock cycle; and W is an integer greater than or equal to 0 and less than or equal to b/a;
and determine the target decompressed data of the x-th clock cycle from the c/a groups of candidate decompressed data; wherein, in the compressed data, the last symbol of the codeword corresponding to the last character in the target decompressed data of the (x-1)-th clock cycle is adjacent to the start symbol of the codeword corresponding to the first character in the target decompressed data of the x-th clock cycle, and the target decompressed data of the (x-1)-th clock cycle refers to the target decompressed data corresponding to the code segment read in the (x-1)-th clock cycle.
12. The decompression apparatus according to claim 8, further comprising:
a splicing unit, configured to splice, according to the compressed data, the target decompressed data obtained in a plurality of clock cycles to obtain decompressed data corresponding to the compressed data, wherein, for two adjacent clock cycles after splicing, the position in the compressed data of the last symbol of the codeword corresponding to the last character in the target decompressed data of one clock cycle is adjacent to the position in the compressed data of the start symbol of the codeword corresponding to the first character in the target decompressed data of the other clock cycle.
13. A compression apparatus, characterized in that the compression apparatus comprises:
a common divisor determining unit, configured to determine a target common divisor of the code lengths of the codewords in a variable length coding table; wherein the variable length coding table comprises a plurality of codewords, the target common divisor is an integer greater than or equal to 2, and each character in data to be compressed corresponds to one codeword in the variable length coding table;
an encoding table generating unit, configured to generate the variable length coding table according to each character in the data to be compressed and the target common divisor; wherein at least two codewords in the variable length coding table have different code lengths, the code length of the codeword corresponding to a character with a higher occurrence probability in the data to be compressed is smaller than the code length of the codeword corresponding to a character with a lower occurrence probability, and the greatest common divisor of the code lengths of the codewords in the variable length coding table is the target common divisor;
a compression unit, configured to compress the data to be compressed according to the variable length coding table to obtain compressed data;
and a package generating unit, configured to generate a compressed data package based on the compressed data and the variable length coding table.
14. The compression apparatus according to claim 13, wherein the common divisor determining unit is configured to:
determine at least one common divisor, wherein a corresponding variable length coding table can be generated according to each of the at least one common divisor, and the compression ratio obtained by compressing the data to be compressed according to the variable length coding table corresponding to each of the at least one common divisor is smaller than a preset value;
and determine the target common divisor from the at least one common divisor, wherein the target common divisor is greater than every other common divisor in the at least one common divisor.
15. A data processing system, characterized in that the system comprises a compression apparatus and a decompression apparatus, wherein
the compression apparatus is the compression apparatus of claim 13 or 14 and is configured to compress data to be compressed to obtain compressed data; and
the decompression apparatus is the decompression apparatus of any one of claims 8 to 12 and is configured to decompress the compressed data.
CN201611270254.0A 2016-12-30 2016-12-30 Compression method, decompression method, device and data processing system Active CN106849956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611270254.0A CN106849956B (en) 2016-12-30 2016-12-30 Compression method, decompression method, device and data processing system

Publications (2)

Publication Number Publication Date
CN106849956A CN106849956A (en) 2017-06-13
CN106849956B true CN106849956B (en) 2020-07-07

Family

ID=59118309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611270254.0A Active CN106849956B (en) 2016-12-30 2016-12-30 Compression method, decompression method, device and data processing system

Country Status (1)

Country Link
CN (1) CN106849956B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109981108B (en) * 2017-12-27 2023-05-02 杭州海康威视数字技术股份有限公司 Data compression method, decompression method, device and equipment
CN110784225A (en) * 2018-07-31 2020-02-11 华为技术有限公司 Data compression method, data decompression method, related device, electronic equipment and system
CN111384967B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data encoding method
CN111600610B (en) * 2020-05-26 2023-04-28 北京思特奇信息技术股份有限公司 Universal coding method, system and electronic equipment for variable-length integers
CN114124106B (en) * 2022-01-28 2022-04-26 苏州浪潮智能科技有限公司 LZ4 decompression method, system, storage medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1291826A (en) * 1999-08-02 2001-04-18 三星电子株式会社 Variable-length coding method and device
CN1758761A (en) * 2004-08-27 2006-04-12 松下电器产业株式会社 Coding apparatus and imaging apparatus
CN101150719A (en) * 2006-09-20 2008-03-26 华为技术有限公司 Parallel video coding method and device
US9036711B1 (en) * 2008-11-06 2015-05-19 Marvell International Ltd. Visual data compression algorithm with parallel processing capability
CN105933708A (en) * 2016-04-15 2016-09-07 张彦刚 Data compression-decompression method and device

Also Published As

Publication number Publication date
CN106849956A (en) 2017-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant