CN114401013B - Huffman compression coding character frequency statistical device suitable for deflate format - Google Patents

Publication number
CN114401013B
Authority
CN
China
Prior art keywords
data
byte
frequency
unit
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210298076.1A
Other languages
Chinese (zh)
Other versions
CN114401013A (en
Inventor
王振
刘宇豪
马孔明
王明明
张茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210298076.1A
Publication of CN114401013A
Application granted
Publication of CN114401013B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/3084Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a Huffman compression coding character frequency statistics device suitable for the deflate format, comprising: a byte counting unit configured to receive LZ77-encoded data and count the total amount of data received; a frequency statistics unit comprising a frequency allocation module and a frequency combination module, wherein the frequency allocation module is configured to receive the data sent by the byte counting unit and count the occurrence frequency of each type of byte in the data, and the frequency combination module is configured to accumulate the frequencies counted by the frequency allocation module; and a data flow control unit configured to receive the data sent by the byte counting unit and to control the byte counting unit's reception of data according to the total amount of received data counted by the byte counting unit and the amount and type of the data sent by the byte counting unit. The scheme of the invention can improve the throughput of the whole compression system, exploit the parallel computing capability of hardware to the greatest extent, and improve the efficiency of the hardware circuit.

Description

Huffman compression coding character frequency statistical device suitable for deflate format
Technical Field
The invention relates to the field of computers, in particular to a Huffman compression coding character frequency statistical device suitable for deflate format.
Background
With the development of the times, people obtain and use information from books, magazines, newspapers and the like. Meanwhile, as technologies such as the mobile internet, 5G, social networks, the Internet of Things, big data, artificial intelligence (AI), cloud computing and security monitoring mature, data of all kinds is growing explosively. The accelerating rollout of 5G, together with more devices and more convenient networking, brings unprecedented opportunities to the storage market, while the dramatic increase in data volume continues to challenge data storage systems. Among these technologies, cloud computing plays a role comparable to the human brain: it provides large-capacity data storage and efficient computation, and it concentrates computing and storage resources. At the same time, the exponential growth of the generated data puts enormous pressure on existing storage equipment. The key question is how to transmit and store massive data efficiently without placing excessive load on the CPU. Efficient and secure data compression has therefore become an effective way to reduce storage cost and save storage resources. Moreover, if compression and decompression are handled by dedicated hardware circuits, the CPU load is greatly reduced and storage performance is significantly improved.
Deflate is a lossless compression standard that combines LZ77 coding with Huffman coding. The Huffman coding used in Deflate is canonical Huffman coding, a subset of Huffman coding first proposed by Schwartz. Its central idea is to use a set of mandatory conventions so that the structure of the Huffman coding tree can be reconstructed from only a small amount of data.
Huffman coding performs a second compression pass on data that has already been compressed by LZ77, further improving the compression ratio. It is a variable-length code built from character occurrence counts: a Huffman binary tree is constructed from the counts, and codes are assigned along the paths between the leaf nodes and the root of the tree. Characters that occur more often are replaced by shorter codes, while characters that occur less often are replaced by longer codes, so the overall size of the data is reduced and compression is achieved.
Huffman coding in the Deflate format is applied in combination with LZ77 coding. After LZ77 encoding, the data exists in three forms: Literal, Length and Distance. The occurrence frequencies of Literal, Length and Distance are then counted (Literal and Length are counted together as one class of information, and Distance is counted separately). Based on the counted frequencies, characters with a high occurrence frequency receive short Huffman codes and characters with a low occurrence frequency receive long ones. At present, character frequency counting for deflate-format Huffman compression coding is performed byte by byte in software; the counting efficiency is low, which affects the coding performance of the system, and no corresponding hardware counting device is currently available.
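The relationship between occurrence counts and code lengths described above can be sketched in software. The following is a minimal illustration only, not the patent's circuit: the function name `huffman_code_lengths` is hypothetical, and the sketch ignores the 15-bit code length limit that Deflate imposes.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for symbols with non-zero frequency."""
    tie = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items() if f > 0]
    heapq.heapify(heap)
    lengths = {s: 0 for _, _, (s,) in heap}
    if len(heap) == 1:
        # A lone symbol still needs one bit.
        lengths[heap[0][2][0]] = 1
        return lengths
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside them
        # moves one level further from the root, so its code grows by one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


print(huffman_code_lengths({"a": 5, "b": 2, "c": 1, "d": 1}))
```

The most frequent symbol ends up with the shortest code, which is why an accurate frequency count has to be available before the code can be built.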
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a Huffman compression coding character frequency statistics apparatus suitable for deflate format, and by using the technical solution of the present invention, the throughput rate of the whole compression system can be improved, the characteristics of parallel computation of hardware can be exerted to the maximum extent, and the efficiency of a hardware circuit is improved.
In view of the above object, an aspect of the embodiments of the present invention provides a Huffman compression coding character frequency statistic apparatus suitable for deflate format, including:
a byte counting unit configured to receive the LZ77 encoded data and count the total amount of the received data;
a frequency statistics unit comprising a frequency allocation module and a frequency combination module, wherein the frequency allocation module is configured to receive the data sent by the byte counting unit and count the occurrence frequency of each type of byte in the data, and the frequency combination module is configured to accumulate the frequencies counted by the frequency allocation module;
and a data flow control unit configured to receive the data sent by the byte counting unit and to control the byte counting unit's reception of data according to the total amount of received data counted by the byte counting unit and the amount and type of the data sent by the byte counting unit.
According to one embodiment of the invention, LZ77 encoded data includes character type data, length type data, and distance type data, each of which is 1 byte in size.
According to one embodiment of the present invention, the frequency statistic unit includes:
eight identical frequency allocation modules, each comprising several groups of frequency division units and configured to count in the corresponding frequency division unit according to the type of the received byte and the value to which the byte is mapped in the deflate standard protocol;
a frequency combination module comprising a plurality of adders, the number of adders being the same as the number of frequency division units in one frequency allocation module, wherein the inputs of each adder are connected to the outputs of the same frequency division unit in each of the eight frequency allocation modules, the output of each adder is connected to the input of a statistics device, and the output of the statistics device gives the frequency of each character.
According to an embodiment of the present invention, the frequency division unit includes:
an accumulator, one input of which receives a byte;
and a counter, whose input is connected to the output of the accumulator and whose output is connected to the other input of the accumulator.
According to one embodiment of the present invention, the lite 0 (character 0) to lite 255 (character 255) frequency division units in each frequency allocation module count frequencies of data occurrences of a character type, the Len0 (length 0) to Len28 (length 28) frequency division units count frequencies of data occurrences of a length type, and the dist0 (distance 0) to dist29 (distance 29) frequency division units count frequencies of data occurrences of a distance type.
According to an embodiment of the present invention, the frequency allocation module is configured to, after receiving a byte in the data sent by the byte counting unit, determine the type of the byte and the size of the value mapped by the byte in the deflate standard protocol, and send the byte to the input end of the accumulator of the frequency dividing unit corresponding to the type of the byte and the size of the mapped value according to the type of the byte and the size of the value mapped by the byte.
According to an embodiment of the present invention, the byte counting unit is further configured to set a total amount of the received data to 32768 bytes, and an amount of data transmitted to the frequency counting unit at a time does not exceed 8 bytes.
According to an embodiment of the present invention, the data flow control unit is configured to determine whether a sum of a total amount of data received by the byte counting unit and an amount of data to be received is greater than 32768 bytes, and if it is determined that the sum of the total amount of data received by the byte counting unit and the amount of data to be received is less than 32768 bytes, the data flow control unit controls the byte counting unit to receive the entire amount of data to be received.
According to an embodiment of the present invention, the data flow control unit is configured to determine whether the sum of the total amount of data already received by the byte counting unit and the amount of data to be received is greater than 32768 bytes; if the sum is greater than or equal to 32768 bytes, the data flow control unit determines the type of the 32768th byte, that is, the byte at which the total amount of data received by the byte counting unit reaches 32768 bytes, and if the 32768th byte is not length-type data, the data flow control unit controls the byte counting unit to receive data only up to 32768 bytes.
According to an embodiment of the present invention, the data flow control unit is configured so that, if the 32768th byte is length-type data when the total amount of data received by the byte counting unit reaches 32768 bytes, it controls the byte counting unit to receive the 32768th byte and the two bytes that follow it.
The invention has the following beneficial technical effects. The Huffman compression coding character frequency statistics device suitable for the deflate format provided by the embodiments of the invention comprises: a byte counting unit configured to receive LZ77-encoded data and count the total amount of data received; a frequency statistics unit comprising a frequency allocation module and a frequency combination module, wherein the frequency allocation module is configured to receive the data sent by the byte counting unit and count the occurrence frequency of each type of byte in the data, and the frequency combination module is configured to accumulate the frequencies counted by the frequency allocation module; and a data flow control unit configured to receive the data sent by the byte counting unit and to control the byte counting unit's reception of data according to the total amount of received data counted by the byte counting unit and the amount and type of the data sent by the byte counting unit. With this technical scheme, the throughput of the whole compression system can be improved, the parallel computing capability of hardware can be exploited to the greatest extent, and the efficiency of the hardware circuit is improved; the device is also extensible and can support different counting amounts through parameter passing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a diagram of a Huffman compressed code character frequency statistics apparatus suitable for use in a deflate format according to an embodiment of the present invention;
fig. 2 is a diagram of a frequency statistics unit according to an embodiment of the invention.
Detailed Description
Embodiments of the present disclosure are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; certain features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As one of ordinary skill in the art will appreciate, various features illustrated and described with reference to any one of the figures may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features shown provides a representative embodiment for a typical application. However, various combinations and modifications of the features consistent with the teachings of the present disclosure may be desirable for certain specific applications or implementations.
In view of the above objects, a first aspect of the embodiments of the present invention proposes an embodiment of a Huffman compression coding character frequency statistic apparatus suitable for deflate format. Fig. 1 shows a schematic view of the device.
As shown in fig. 1, the apparatus may include:
a byte counting unit configured to receive the LZ77-encoded data and count the total amount of data received. The LZ77-encoded data includes character-type data (literal), length-type data (length) and distance-type data (distance), each of which is 1 byte in size. The byte counting unit receives the data and sets the total amount of received data to 32768 bytes (that is, 32 KB of data is compressed each time), and sends no more than 8 bytes to the frequency statistics unit at a time. Several judgment circuits can be designed to examine every byte of the data stream simultaneously and obtain the number of valid bytes in the current input. Because each datum can be as small as 1 byte, the data flow control unit must decide, when the 32 KB boundary is reached, whether more data should still be received and, if so, how many bytes. The byte counting unit also sends data to the downstream circuit (unit), 8 bytes at a time; the final transfer may contain fewer than 8 bytes, depending on the situation.
a frequency statistics unit comprising a frequency allocation module and a frequency combination module, wherein the frequency allocation module is configured to receive the data sent by the byte counting unit and count the occurrence frequency of each type of byte in the data, and the frequency combination module is configured to accumulate the frequencies counted by the frequency allocation module. As shown in fig. 2, the frequency statistics unit includes eight identical frequency allocation modules, each of which receives one of the 8 bytes sent by the byte counting unit. Each frequency allocation module includes several groups of frequency division units, and each frequency division unit includes an accumulator and a counter: one input of the accumulator receives a byte, the input of the counter is connected to the output of the accumulator, and the output of the counter is connected to the other input of the accumulator.
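The accumulator and counter feedback loop described above can be modelled behaviourally. The sketch below is an illustration under assumptions, not the patented circuit: the class names, the `clock` method and the tuple bin names such as `("lit", 90)` are hypothetical, and the byte arriving at the accumulator is reduced to a one-bit hit flag.

```python
class FrequencyDivisionUnit:
    """One bin: an adder (the accumulator) feeding a counter register."""

    def __init__(self):
        self.counter = 0  # register; its output loops back to the adder

    def clock(self, hit):
        # Adder inputs: the hit flag derived from the incoming byte and the
        # counter's current output. The sum is latched back into the counter.
        self.counter = self.counter + (1 if hit else 0)
        return self.counter


class FrequencyAllocationModule:
    """One lane: a bank of bins, at most one of which is hit per cycle."""

    def __init__(self, bin_names):
        self.units = {name: FrequencyDivisionUnit() for name in bin_names}

    def clock(self, selected):
        # `selected` is the bin chosen for this cycle's byte, or None when
        # the lane carries no valid byte.
        for name, unit in self.units.items():
            unit.clock(name == selected)


# Bin names follow the text: 256 literal, 29 length and 30 distance bins.
BIN_NAMES = (
    [("lit", i) for i in range(256)]
    + [("len", i) for i in range(29)]
    + [("dist", i) for i in range(30)]
)

lane = FrequencyAllocationModule(BIN_NAMES)
lane.clock(("lit", 90))
lane.clock(("lit", 90))
lane.clock(None)
```

After these three cycles the `("lit", 90)` bin holds 2 and every other bin holds 0; a full model would instantiate eight such lanes, one per byte position.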
The frequency allocation module is configured to count in the corresponding frequency division unit according to the type of the received byte and the value to which the byte is mapped in the deflate standard protocol. In each frequency allocation module, the lite 0 to lite 255 frequency division units count the occurrences of character-type data, the Len0 to Len28 frequency division units count the occurrences of length-type data, and the dist0 to dist29 frequency division units count the occurrences of distance-type data. After receiving one byte of the data sent by the byte counting unit, the frequency allocation module judges the type of the byte and the value to which the byte is mapped in the deflate standard protocol, and sends the byte to the accumulator input of the frequency division unit that corresponds to that type and mapped value. For example, when a character a (one byte) enters the first frequency allocation module, the module first determines which type of data the character a is and finds that it is character-type data. It then determines the value to which the character a is mapped in the DEFLATE standard protocol (DEFLATE Compressed Data Format Specification); the mapped value of the character a is 90, so the first frequency allocation module assigns the character to the lite 90 frequency division unit, where it is counted. The counting rules for the other types of characters are the same.
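The type judgment and interval mapping can be illustrated as follows. This is a sketch under assumptions: the text does not say how the length and distance fields are encoded, so the example assumes each token carries its actual match length (3 to 258) or distance (1 to 32768) and uses the base-value tables of the DEFLATE specification (RFC 1951, section 3.2.5). The function name `interval_code` and the `(kind, value)` token form are hypothetical.

```python
from bisect import bisect_right

# Smallest match length covered by each of the 29 length codes (RFC 1951).
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35,
               43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]

# Smallest distance covered by each of the 30 distance codes (RFC 1951).
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257,
             385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289,
             16385, 24577]


def interval_code(kind, value):
    """Map a token to the bin that counts it, e.g. ("lit", 90)."""
    if kind == "literal":
        if not 0 <= value <= 255:
            raise ValueError("literal out of range")
        return ("lit", value)  # literals map to themselves
    if kind == "length":
        if not 3 <= value <= 258:
            raise ValueError("length out of range")
        # Index of the last base value that does not exceed the length.
        return ("len", bisect_right(LENGTH_BASE, value) - 1)
    if kind == "distance":
        if not 1 <= value <= 32768:
            raise ValueError("distance out of range")
        return ("dist", bisect_right(DIST_BASE, value) - 1)
    raise ValueError("unknown token kind")


print(interval_code("literal", 90))   # the lite 90 bin of the example above
print(interval_code("length", 12))    # lengths 11 and 12 share one bin
print(interval_code("distance", 6))   # distances 5 and 6 share one bin
```

In hardware the same lookup would be a set of parallel range comparators rather than a binary search; the search here is only a compact way to express the table.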
The frequency statistics unit also comprises a frequency combination module composed of a plurality of adders. The number of adders equals the number of frequency division units in one frequency allocation module. The inputs of each adder are connected to the outputs of the same frequency division unit in each of the eight frequency allocation modules, the outputs of the adders are connected to the input of the statistics device, and the output of the statistics device gives the frequency of each character. That is, each adder takes the outputs of 8 identical frequency division units: the lite 0 frequency division units of the eight frequency allocation modules are connected to one adder, the lite 1 frequency division units of the eight frequency allocation modules are connected to another adder, and so on, so that the frequencies of all characters appearing in the data are obtained from the adders.
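The per-lane counting followed by adder-based combination can be sketched as below. It is a software analogy under assumptions: the function names `count_in_lanes` and `combine` and the tuple bin names are hypothetical, and each 8-byte beat is modelled as a slice of an already mapped list of bins.

```python
from collections import Counter

NUM_LANES = 8  # one frequency allocation module per byte position


def count_in_lanes(codes):
    """Distribute mapped bins over eight lanes, eight per beat, and count per lane."""
    lanes = [Counter() for _ in range(NUM_LANES)]
    for start in range(0, len(codes), NUM_LANES):
        beat = codes[start:start + NUM_LANES]  # the last beat may be shorter
        for lane, code in zip(lanes, beat):
            lane[code] += 1
    return lanes


def combine(lanes):
    """One adder per bin: sum that bin's count across all lanes."""
    bins = set().union(*lanes)
    return Counter({b: sum(lane[b] for lane in lanes) for b in bins})


codes = [("lit", 90)] * 11 + [("len", 0)] * 3 + [("dist", 4)] * 6
totals = combine(count_in_lanes(codes))
print(totals[("lit", 90)], totals[("len", 0)], totals[("dist", 4)])
```

Because each lane only ever updates its own counters, no two lanes contend for the same counter within a cycle; the cross-lane sum is deferred to the adders.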
a data flow control unit configured to receive the data sent by the byte counting unit and to control the byte counting unit's reception of data according to the total amount of received data counted by the byte counting unit and the amount and type of the data sent by the byte counting unit. The data flow control unit judges whether the sum of the total amount of data already received by the byte counting unit and the amount of data to be received is greater than 32768 bytes, that is, whether the received data would cross the 32 KB boundary. If the sum is less than 32768 bytes, the data flow control unit controls the byte counting unit to receive all of the data to be received. If the sum is greater than or equal to 32768 bytes, the data flow control unit determines the type of the 32768th byte, that is, the byte at which the total received reaches 32768 bytes. If the 32768th byte is not length-type data, the data flow control unit controls the byte counting unit to receive data only up to 32768 bytes. If the 32768th byte is length-type data, the data flow control unit controls the byte counting unit to receive the 32768th byte and the two bytes after it, because the two bytes following length-type data are distance-type data and therefore must also be received.
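The boundary rule can be written out as a small decision function. This is a minimal sketch with several assumptions: the function name `bytes_to_accept`, the string type tags and the returned `(accept, owed)` pair are hypothetical, and the case where the two trailing distance bytes fall into the next beat is not described in the text, so the sketch merely reports them as still owed.

```python
def bytes_to_accept(received, beat_kinds, limit=32768):
    """Decide how many bytes of the incoming beat to accept.

    received   -- bytes already counted, assumed to be below `limit`
    beat_kinds -- type tag of each incoming byte: "literal", "length" or "distance"
    Returns (accept, owed): bytes taken from this beat, and trailing distance
    bytes that would still have to come from a later beat.
    """
    if not 0 <= received < limit:
        raise ValueError("received must be below the boundary")
    n = len(beat_kinds)
    if received + n < limit:
        return n, 0  # boundary not reached: take the whole beat
    idx = limit - received - 1  # position of the limit-th byte in the beat
    # A length byte at the boundary drags its two distance bytes along.
    extra = 2 if beat_kinds[idx] == "length" else 0
    wanted = idx + 1 + extra
    accept = min(n, wanted)
    return accept, wanted - accept


lit = "literal"
print(bytes_to_accept(32752, [lit] * 8))
print(bytes_to_accept(32764, [lit] * 8))
print(bytes_to_accept(32764, [lit, lit, lit, "length", "distance", "distance", lit, lit]))
```

The `limit` parameter reflects the configurable boundary mentioned in the description, for example 16384 for a 16 KB block.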
The device processes 8 bytes of data in parallel in a three-stage pipeline. The data type (literal/length/distance) of each byte of the 8-byte input is judged, and interval mapping and frequency counting are then performed. Literal-type data is mapped to an interval code in the closed interval [0, 255], Length-type data to an interval code in the closed interval [0, 28], and Distance-type data to an interval code in the closed interval [0, 29]; the end mark uses interval code 256. In each clock cycle, the counter of the interval code hit by each of Byte 0 to Byte 7 is incremented by one, so that the frequency of the interval codes is counted per byte lane. Finally, the interval code counters of Byte 0 to Byte 7 are added together for each interval code, giving the final occurrence frequency of every Literal, Length and Distance character. The invention thus provides a parallel, multi-stage pipelined character statistics device for deflate-format Huffman compression coding; the multi-stage pipeline structure processes the input data stream in parallel and makes good use of the computing performance of hardware. The invention also provides a hardware solution for the special case in which counting reaches the 32 KB boundary, and this solution can be extended to other boundaries such as 8 KB, 16 KB and 24 KB. The 32768 bytes in the description is simply the number of bytes in 32 KB (32 × 1024) and is used as an example; the value can be changed according to a preset, for example 16384 bytes if the preset is 16 KB.
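As a rough throughput estimate, the cycle count for one block can be worked out under assumptions that the text does not state: one 8-byte beat is accepted in every clock cycle without stalls, and the S = 3 pipeline stages add S - 1 cycles of fill latency. With N the block size in bytes and W = 8 the number of bytes per beat:

```latex
T(N) \approx \left\lceil \frac{N}{W} \right\rceil + (S - 1),
\qquad
T(32768) \approx \frac{32768}{8} + 2 = 4098 \text{ cycles}
```

A byte-by-byte counter would need on the order of N = 32768 update steps for the same block, which is the gap the eight parallel lanes are meant to close.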
Through the technical scheme of the invention, the throughput rate of the whole compression system can be improved, the characteristics of parallel computation of hardware can be furthest exerted, the efficiency of a hardware circuit is improved, the device has expansibility, and different statistical quantities can be supported through a parameter transmission mode.
In a preferred embodiment of the present invention, LZ77 encoded data includes character type data, length type data, and distance type data, each of which is 1 byte in size.
In a preferred embodiment of the present invention, the frequency statistic unit includes:
eight identical frequency allocation modules, each comprising several groups of frequency division units and configured to count in the corresponding frequency division unit according to the type of the received byte and the value to which the byte is mapped in the deflate standard protocol. In each frequency allocation module, the lite 0 to lite 255 frequency division units count the occurrences of character-type data, the Len0 to Len28 frequency division units count the occurrences of length-type data, and the dist0 to dist29 frequency division units count the occurrences of distance-type data. After receiving one byte of the data sent by the byte counting unit, the frequency allocation module judges the type of the byte and the value to which the byte is mapped in the deflate standard protocol, and sends the byte to the accumulator input of the frequency division unit that corresponds to that type and mapped value. For example, when a character a (one byte) enters the first frequency allocation module, the module first determines which type of data the character a is and finds that it is character-type data. It then determines the value to which the character a is mapped in the DEFLATE standard protocol (DEFLATE Compressed Data Format Specification); the mapped value of the character a is 90, so the first frequency allocation module assigns the character to the lite 90 frequency division unit, where it is counted. The counting rules for the other types of characters are the same.
The frequency combination module comprises a plurality of adders. The number of adders equals the number of frequency division units in one frequency allocation module. The inputs of each adder are connected to the outputs of the same frequency division unit in each of the eight frequency allocation modules, the output of each adder is connected to the input of the statistics device, and the output of the statistics device gives the frequency of each character. That is, each adder takes the outputs of 8 identical frequency division units: the lite 0 frequency division units of the eight frequency allocation modules are connected to one adder, the lite 1 frequency division units of the eight frequency allocation modules are connected to another adder, and so on, so that the frequencies of all characters appearing in the data are obtained from the adders.
In a preferred embodiment of the present invention, the frequency division unit includes:
an accumulator, one input of which receives a byte;
and a counter, whose input is connected to the output of the accumulator and whose output is connected to the other input of the accumulator.
In a preferred embodiment of the present invention, the lite 0 to lite 255 frequency division units in each frequency allocation module count frequencies of occurrence of data of character type, the Len0 to Len28 frequency division units count frequencies of occurrence of data of length type, and the dist0 to dist29 frequency division units count frequencies of occurrence of data of distance type.
In a preferred embodiment of the present invention, after receiving a byte in the data sent by the byte counting unit, the frequency allocation module is configured to determine the type of the byte and the size of the value mapped by the byte in the deflate standard protocol, and send the byte to the input end of the accumulator of the frequency dividing unit corresponding to the type of the byte and the size of the mapped value according to the type of the byte and the size of the value mapped by the byte.
In a preferred embodiment of the present invention, the byte counting unit is further configured to set the limit on the total amount of received data to 32768 bytes, and the amount of data transmitted to the frequency counting unit at a time does not exceed 8 bytes.
In a preferred embodiment of the present invention, the data flow control unit is configured to compare the sum of the total amount of data already received by the byte counting unit and the amount of data to be received with 32768 bytes (the preset number of bytes). If the sum is less than 32768 bytes, the data flow control unit controls the byte counting unit to receive all of the data to be received.
In a preferred embodiment of the present invention, if the sum of the total amount of data already received by the byte counting unit and the amount of data to be received is greater than or equal to 32768 bytes, the data flow control unit determines the type of the 32768th byte, that is, the byte at which the total amount of received data reaches 32768 bytes. If the 32768th byte is not data of the length type, the data flow control unit controls the byte counting unit to receive data only up to and including the 32768th byte.
In a preferred embodiment of the present invention, if the 32768th byte is judged to be data of the length type, the data flow control unit is configured to control the byte counting unit to receive the 32768th byte together with the two bytes that follow it.
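The three boundary rules above can be summarised in a short Python sketch. The function name `bytes_to_accept`, the type tags and the list-of-types input are assumptions made for illustration. The sketch covers only the cases stated in the text; it assumes that fewer than `limit` bytes have been received so far and that, in the boundary case, the two bytes following a length-type byte may arrive in a later transfer.

```python
LIMIT = 32768  # preset number of bytes per statistics block


def bytes_to_accept(received_total: int, incoming_types: list[str],
                    limit: int = LIMIT) -> int:
    """Number of bytes, counted from the start of this transfer, that
    belong to the current statistics block.

    incoming_types holds one tag per incoming byte: "lit", "len" or "dist".
    The result can exceed len(incoming_types) by up to 2 when the boundary
    byte is a length and its two following bytes have not arrived yet.
    """
    incoming = len(incoming_types)
    # Rule 1: the block does not fill up, so take everything.
    if received_total + incoming < limit:
        return incoming
    # Position, inside this transfer, of the byte that reaches the limit.
    boundary = limit - received_total - 1
    # Rule 2: boundary byte is not a length, so stop right after it.
    if incoming_types[boundary] != "len":
        return boundary + 1
    # Rule 3: boundary byte is a length, so also take the next two bytes
    # to avoid splitting the length/distance pair across blocks.
    return boundary + 1 + 2


print(bytes_to_accept(32764, ["lit", "lit", "lit", "len",
                              "dist", "dist", "lit", "lit"]))
```

Passing `limit` as a parameter reflects the statement elsewhere in the description that different statistical quantities can be supported by parameter passing.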
The invention provides a character frequency statistical device based on the deflate format. Compared with the traditional statistical approach for the deflate format, the invention has the following advantages:
1. A parallel, multistage pipelined circuit for character frequency statistics is designed that processes 8 bytes of data in parallel, which can improve the throughput of the whole compression system.
2. The data stream can be processed in real time: the type and the number of valid bytes of the input data can be output within one clock cycle, so the parallel computing capability of the hardware is exploited as far as possible and the efficiency of the hardware circuit is improved.
3. For deflate-based hardware implementations, the invention provides a hardware solution for handling the statistical boundary. The solution is extensible: different statistical quantities, such as 8Kb, 16Kb and 24Kb, can be supported by passing a parameter, which makes it broadly applicable.
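The parallel counting mentioned in the first advantage can be illustrated with a small Python model: eight independent lanes each count one symbol per transfer, and a merge step sums the counts of the same bin across lanes. The names `LANES`, `count_beat` and `merge`, and the use of `collections.Counter` in place of hardware counters and adders, are assumptions made for this sketch.

```python
from collections import Counter

LANES = 8  # number of identical frequency allocation modules


def count_beat(lane_counts: list[Counter], beat: list[tuple[str, int]]) -> None:
    """Route up to eight symbols of one transfer, one symbol per lane."""
    if len(beat) > LANES:
        raise ValueError("a transfer carries at most 8 symbols")
    for lane, symbol in zip(lane_counts, beat):
        lane[symbol] += 1  # each lane updates its own private bin


def merge(lane_counts: list[Counter]) -> Counter:
    """Sum the counts of the same bin over all lanes (the adder stage)."""
    total = Counter()
    for lane in lane_counts:
        total.update(lane)
    return total


lanes = [Counter() for _ in range(LANES)]
count_beat(lanes, [("lit", 97)] * 8)               # eight identical literals
count_beat(lanes, [("lit", 97), ("len", 0)])       # a shorter final transfer
print(merge(lanes))
```

Because every lane owns its bins, eight equal symbols arriving in the same transfer never contend for one counter; the cost is the extra adder per bin in the merge stage.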
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The embodiments described above, particularly any "preferred" embodiments, are possible examples of implementations and are presented merely to clearly understand the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing from the spirit and principles of the technology described herein. All such modifications are intended to be included within the scope of this disclosure and protected by the following claims.

Claims (9)

1. A device for counting the frequency of Huffman compression encoded characters suitable for deflate format, comprising:
a byte counting unit configured to receive the LZ77 encoded data and count a total amount of the received data;
a frequency counting unit comprising a frequency allocation module and a frequency combination module, wherein the frequency allocation module is configured to receive the data sent by the byte counting unit and count the frequency of each type of byte in the data, and the frequency combination module is configured to accumulate the frequencies counted by the frequency allocation module; the frequency counting unit comprises eight identical frequency allocation modules, each of which receives one byte of the data sent by the byte counting unit; each frequency allocation module comprises a plurality of groups of frequency division units and is configured to count in the corresponding frequency division unit according to the type of the received byte and the value mapped by the byte in the deflate standard protocol; the frequency combination module comprises a plurality of adders, the number of adders being equal to the number of frequency division units in one frequency allocation module; the input ends of each adder are respectively connected to the output ends of the same frequency division unit in each of the eight frequency allocation modules, the output end of each adder is connected to the input end of a counter, and the output end of each counter outputs the frequency of the corresponding character;
and a data flow control unit configured to receive the data sent by the byte counting unit and to control the byte counting unit to receive data according to the total amount of received data counted by the byte counting unit and the amount and type of the data sent by the byte counting unit.
2. The apparatus of claim 1, wherein the LZ77 encoded data comprises character type data, length type data, and distance type data, each of which has a size of 1 byte.
3. The apparatus of claim 1, wherein the frequency division unit comprises:
an accumulator, one input of said accumulator receiving a byte;
and a counter, the input end of which is connected to the output end of the accumulator, and the output end of which is connected to the other input end of the accumulator.
4. The apparatus of claim 3, wherein, in each frequency allocation module, the character 0 frequency division unit to the character 255 frequency division unit count frequencies of occurrence of data of a character type, the length 0 frequency division unit to the length 28 frequency division unit count frequencies of occurrence of data of a length type, and the distance 0 frequency division unit to the distance 29 frequency division unit count frequencies of occurrence of data of a distance type.
5. The apparatus of claim 4, wherein the frequency allocation module is configured to, after receiving a byte of the data sent by the byte counting unit, determine the type of the byte and the value to which the byte maps in the deflate standard protocol, and send the byte to the accumulator input of the frequency division unit corresponding to that type and mapped value.
6. The apparatus of claim 1, wherein the byte counting unit is further configured to set a total amount of received data to a preset number of bytes, and an amount of data transmitted to the frequency counting unit at a time does not exceed 8 bytes.
7. The apparatus according to claim 6, wherein the data flow control unit is configured to determine whether a sum of a total amount of data received by the byte counting unit and an amount of data to be received is greater than a preset number of bytes, and if the sum of the total amount of data received by the byte counting unit and the amount of data to be received is determined to be less than the preset number of bytes, the data flow control unit controls the byte counting unit to receive all the amount of data to be received.
8. The apparatus according to claim 6, wherein the data flow control unit is configured to determine whether a sum of a total amount of data received by the byte counting unit and an amount of data to be received is greater than the preset number of bytes; if the sum is greater than or equal to the preset number of bytes, the data flow control unit determines the type of the byte at which the total amount of data received by the byte counting unit reaches the preset number of bytes; and if that byte is determined not to be data of a length type, the data flow control unit controls the byte counting unit to receive data only up to and including that byte.
9. The apparatus of claim 8, wherein the data flow control unit is configured to control the byte counting unit to receive the byte at which the total amount of received data reaches the preset number of bytes, together with the two bytes that follow it, if the data flow control unit determines that this byte is data of the length type.
CN202210298076.1A 2022-03-25 2022-03-25 Huffman compression coding character frequency statistical device suitable for deflate format Active CN114401013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210298076.1A CN114401013B (en) 2022-03-25 2022-03-25 Huffman compression coding character frequency statistical device suitable for deflate format

Publications (2)

Publication Number Publication Date
CN114401013A CN114401013A (en) 2022-04-26
CN114401013B true CN114401013B (en) 2022-06-21

Family

ID=81234412

Country Status (1)

Country Link
CN (1) CN114401013B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102970043B (en) * 2012-11-14 2016-03-30 无锡芯响电子科技有限公司 A kind of compression hardware system based on GZIP and accelerated method thereof
US10693493B1 (en) * 2019-02-14 2020-06-23 International Business Machines Corporation Reducing latch count to save hardware area for dynamic Huffman table generation
CN113965207B (en) * 2021-12-17 2022-03-15 苏州浪潮智能科技有限公司 Deflate Huffman coding-based dynamic code table generation device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant