CN114337682A - Huffman coding and compressing device


Info

Publication number: CN114337682A
Application number: CN202111642072.2A
Authority: CN (China)
Prior art keywords: information, module, coding, compressed, length
Other languages: Chinese (zh)
Inventors: 刘宇豪, 张永兴, 王振, 马孔明, 赵璠
Current Assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Application filed by: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a Huffman coding compression device which comprises a statistics module, a buffer, a tree building module and a coding module. In the idle state, the statistics module begins executing the corresponding statistics task once it receives a compression task request. The tree building module sends a first data request to the statistics module while in its idle state; if the statistics module has completed its statistics at that moment, the handshake between the statistics module and the downstream tree building module succeeds, and the statistics module transfers the statistics result to the tree building module. The coding module sends a second data request to the tree building module while in its idle state; if the tree building module has completed tree building at that moment, the handshake between the tree building module and the downstream coding module succeeds, and the tree building module transfers the tree building result to the coding module. By dividing the processing among multiple modules and adopting a handshake mechanism between each upstream module and its downstream module, the modules are decoupled and can work in parallel, which saves data compression time and improves data compression efficiency.

Description

Huffman coding and compressing device
Technical Field
The invention relates to the field of data compression, in particular to a Huffman coding compression device.
Background
With the rapid development of technologies such as 5G, the Internet of Things, cloud computing, big data and artificial intelligence, the amount of data to be stored is growing exponentially, placing huge pressure on existing storage equipment. Efficient and safe data compression technology has therefore become an effective way to reduce storage cost and save storage resources.
Among conventional data compression methods, Huffman coding compression is one of the most commonly used. However, the entire compression process of the existing Huffman coding compression method is serial, so compressing a large amount of data consumes a large amount of time and the data compression efficiency is low.
Therefore, how to solve the above technical problem is an issue that those skilled in the art need to address.
Disclosure of Invention
The invention aims to provide a Huffman coding compression device in which the processing is divided among a plurality of modules and a handshake mechanism is adopted between each upstream module and its downstream module, so that the modules are decoupled and can process different data blocks to be compressed at the same time, realizing parallel processing, saving data compression time and improving data compression efficiency.
In order to solve the above technical problem, the present invention provides a Huffman coding and compressing apparatus, comprising:
the statistical module is used for performing information statistics on the data block to be compressed if a compression task request containing the data block to be compressed is received when the statistical module is in an idle state; after the information statistics of the data block to be compressed is completed, if a first data request initiated by a tree building module is received, transmitting the information statistics result of the data block to be compressed to the tree building module;
the buffer is used for buffering the data block to be compressed;
the tree building module is used for initiating the first data request when the tree building module is in an idle state, and building coding reference information required by Huffman coding of the data block to be compressed according to the information statistical result after receiving the information statistical result; after the coding reference information is constructed, if a second data request initiated by a coding module is received, transmitting the coding reference information to the coding module;
and the coding module is used for initiating the second data request when the coding module is in an idle state, reading the data block to be compressed from the buffer after receiving the coding reference information, and carrying out Huffman coding on the data block to be compressed based on the coding reference information to obtain a coded compressed data block.
Optionally, the Huffman coding and compressing apparatus further comprises:
the preprocessing module is respectively connected with the counting module and the buffer and used for performing LZ77 coding on total data blocks to be compressed to obtain a plurality of data blocks to be compressed with preset data sizes, sending a compression task request consisting of one data block to be compressed to the counting module every other preset counting period, and caching the currently sent data block to be compressed to the buffer;
each data block to be compressed comprises information to be compressed and indication information; the information to be compressed comprises three types of information, namely text information, character repetition length and character distance information; the indication information is used for representing the information category to which each data divided by a statistical unit belongs in the information to be compressed.
Optionally, the statistics module is specifically configured to, after receiving the compression task request, obtain information to be compressed and indication information included in a data block to be compressed in the compression task request, perform statistics on one piece of target information in the information to be compressed each time according to a statistics unit, determine a type of target information to which the target information currently counted belongs according to the indication information, and add 1 to an occurrence frequency of the target information in the type of the target information to obtain the information statistics result composed of the occurrence frequencies of the pieces of information in different information types.
Optionally, the buffer is a FIFO memory, and the FIFO memory is capable of buffering at least two of the data blocks to be compressed.
Optionally, the tree building module includes:
the sorting module is connected with the statistical module and used for sorting the occurrence frequency of each piece of information under the same information category after receiving the information statistical result;
the code length generating module is connected with the sorting module and is used for sequentially determining the code length corresponding to each information in the same information category according to the frequency arrangement sequence of each information in the same information category and the preset frequency code length corresponding relation so as to obtain a code length sequence consisting of the code lengths corresponding to each information in the same information category; wherein, the more frequent the information is, the shorter the code length corresponding to the information is;
the code table generating module is respectively connected with the code length generating module and the coding module and is used for generating Huffman coding information of each information one by one according to the code length corresponding to each information under the same information category so as to obtain a Huffman code table consisting of the Huffman coding information of each information under the same information category; the coding reference information comprises Huffman code tables under different information types;
the coding module is specifically configured to obtain Huffman coding information of each piece of information of the data block to be compressed by searching Huffman code tables in different information categories, and combine the obtained Huffman coding information to obtain a coded compressed data block.
Optionally, the tree building module further includes:
the run-length coding module is respectively connected with the code length generating module and the coding module and is used for respectively carrying out run-length coding on the code length sequences under different information categories to obtain run-length coding sequences under different information categories;
the CCL generation module is respectively connected with the run coding module and the coding module and is used for performing Huffman coding after the run coding sequences under different information categories are combined to obtain a CCL sequence; wherein the coded reference information further comprises the run-length coding sequence and the CCL sequence;
the encoding module is specifically configured to obtain a compressed data block including the run-length encoding sequence and the CCL sequence according to the encoding reference information.
Optionally, the code length generating module includes:
the character length code length generation module is connected with the sorting module and is used for correspondingly determining the code length corresponding to each piece of information under the text information category and the character repetition length category according to the frequency arrangement sequence of each piece of information under the text information category and the character repetition length category and the corresponding relation of the preset frequency code length so as to obtain a CL1 sequence consisting of the code lengths corresponding to each piece of information under the text information category and the character repetition length category;
and the distance code length generating module is connected with the sorting module and is used for sequentially determining the code length corresponding to each information under the character distance information category according to the frequency arrangement sequence of each information under the character distance information category and the preset frequency code length corresponding relation so as to obtain a CL2 sequence consisting of the code lengths corresponding to each information under the character distance information category.
Optionally, the code table generating module includes:
the character length code table generating module is respectively connected with the character length code length generating module and the coding module and is used for generating Huffman coding information of each information one by one according to the code length corresponding to each information under the text information type and the character repetition length type so as to obtain a first Huffman code table consisting of the Huffman coding information of each information under the text information type and the character repetition length type;
and the distance code table generating module is respectively connected with the distance code length generating module and the coding module and is used for generating Huffman coding information of each information one by one according to the code length corresponding to each information under the character distance information category so as to obtain a second Huffman code table consisting of the Huffman coding information of each information under the character distance information category.
Optionally, the run-length encoding module includes:
the first run-length coding module is respectively connected with the character length code length generating module and the coding module and is used for carrying out run-length coding on the CL1 sequence to obtain an SQ1 sequence;
the second run-length coding module is respectively connected with the distance code length generating module and the coding module and is used for carrying out run-length coding on the CL2 sequence to obtain an SQ2 sequence;
the CCL generation module is specifically configured to combine the SQ1 sequence and the SQ2 sequence and then perform Huffman coding to obtain a CCL sequence.
Optionally, the statistics module, the tree building module, and the encoding module are all implemented by a state machine design.
The invention provides a Huffman coding compression device which comprises a statistics module, a buffer, a tree building module and a coding module. In the idle state, the statistics module begins executing the corresponding statistics task once it receives a compression task request. The tree building module sends a first data request to the statistics module while in its idle state; if the statistics module has completed its statistics at that moment, the handshake between the statistics module and the downstream tree building module succeeds, and the statistics module transfers the statistics result to the tree building module. The coding module sends a second data request to the tree building module while in its idle state; if the tree building module has completed tree building at that moment, the handshake between the tree building module and the downstream coding module succeeds, and the tree building module transfers the tree building result to the coding module. By dividing the processing among multiple modules and adopting a handshake mechanism between each upstream module and its downstream module, the modules are decoupled and can process different data blocks to be compressed at the same time, realizing parallel processing, saving data compression time and improving data compression efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for describing the embodiments and the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a Huffman coding and compressing apparatus according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a multi-stage pipeline parallel compression hardware implementation according to an embodiment of the present invention;
FIG. 3 is a graph comparing the performance of a hardware circuit with a pipeline design and a hardware circuit without a pipeline design according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a Huffman coding and compressing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a state change of a state machine of a statistical module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a state change of a tree building module state machine according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a state change of a coding module state machine according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a Huffman coding compression device in which the processing is divided among a plurality of modules and a handshake mechanism is adopted between each upstream module and its downstream module, so that the modules are decoupled and can process different data blocks to be compressed at the same time, realizing parallel processing, saving data compression time and improving data compression efficiency.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a Huffman coding compression apparatus according to an embodiment of the present invention.
The Huffman coding and compressing device comprises:
the statistical module 1 is used for performing information statistics on the data block to be compressed if a compression task request containing the data block to be compressed is received when the statistical module is in an idle state; after completing the information statistics of the data block to be compressed, if receiving a first data request initiated by the tree building module 3, transmitting the information statistics result of the data block to be compressed to the tree building module 3;
the buffer 2 is used for buffering the data block to be compressed;
the tree building module 3 is used for initiating a first data request when the tree building module is in an idle state, and building coding reference information required by Huffman coding of a data block to be compressed according to an information statistical result after the information statistical result is received; after the coding reference information is constructed, if a second data request initiated by the coding module 4 is received, the coding reference information is transmitted to the coding module 4;
and the encoding module 4 is configured to initiate a second data request when the encoding module is in an idle state, and read the data block to be compressed from the buffer 2 after receiving the encoding reference information, so as to perform Huffman encoding on the data block to be compressed based on the encoding reference information, thereby obtaining an encoded compressed data block.
Specifically, the Huffman coding compression device of the present application includes a statistics module 1, a buffer 2, a tree building module 3 and a coding module 4, and the working principle thereof is as follows:
the statistical module 1 is in an IDLE state (IDLE) when there is no statistical task being processed inside, and in the IDLE state, if the statistical module 1 receives a compression task request including a data block to be compressed, the statistical module performs information statistics on the data block to be compressed. And after the statistics module 1 completes the information statistics of the data block to be compressed, if the tree building module 3 initiates a first data request to the statistics module 1, the statistics module 1 jumps to a state of outputting a statistics result, completes the handshake with the post-stage tree building module 3, and transmits the information statistics result of the data block to be compressed to the tree building module 3.
In the Huffman coding and compressing process, the coding module 4 needs to use the data block to be compressed, so in order to process a plurality of data blocks to be compressed at the same time, the buffer 2 is adopted to buffer the data block to be compressed for the subsequent coding module 4 to use.
The tree building module 3 is in an idle state when no tree building task is being processed inside it. In the idle state, the tree building module 3 initiates a first data request to the statistics module 1; if the statistics module 1 has completed its statistics at that moment, the handshake between the statistics module 1 and the downstream tree building module 3 succeeds, and the statistics module 1 transmits the information statistics result of the data block to be compressed to the tree building module 3. After receiving the information statistics result of the data block to be compressed, the tree building module 3 constructs, according to that result, the coding reference information required for Huffman coding of the data block. After the coding reference information has been constructed, if the coding module 4 sends a second data request to the tree building module 3, the tree building module 3 jumps to the output-tree-building-result state, completes the handshake with the downstream coding module 4, and transmits the coding reference information to the coding module 4.
The coding module 4 is in an idle state when no coding task is being processed inside it. In the idle state, the coding module 4 initiates a second data request to the tree building module 3; if the tree building module 3 has completed tree building at that moment, the handshake between the tree building module 3 and the downstream coding module 4 succeeds, and the tree building module 3 transmits the coding reference information to the coding module 4. After receiving the coding reference information, the coding module 4 reads the data block to be compressed from the buffer 2 and performs Huffman coding on it based on the coding reference information to obtain a coded compressed data block.
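The request/acknowledge exchange between adjacent stages can be pictured as a simple ready/done handshake. The following Python sketch is illustrative only; the class, flag and method names (Stage, idle, done, request_from) are assumptions made for readability, not signals defined in this application.

```python
class Stage:
    """Minimal model of one pipeline stage and its request/acknowledge handshake.
    All names here are illustrative assumptions, not signals from the patent."""

    def __init__(self, name: str):
        self.name = name
        self.idle = True       # no task being processed inside this stage
        self.done = False      # a finished result is waiting for the downstream stage
        self.result = None

    def finish(self, result):
        """Called by the stage itself when its task (e.g. statistics) completes."""
        self.done, self.result = True, result

    def request_from(self, upstream: "Stage"):
        """Downstream stage issues its data request (the first or second data request).
        The handshake succeeds only when the requester is idle and the upstream is done."""
        if self.idle and upstream.done:
            data = upstream.result
            upstream.done, upstream.result = False, None
            upstream.idle = True    # the upstream stage may now accept a new block
            self.idle = False       # this stage starts working on the received data
            return data
        return None                 # handshake failed; retry in a later cycle
```

Under this model, the tree building module would call request_from(statistics_module) whenever it is idle, which corresponds to the first data request described above, and the coding module would do the same toward the tree building module for the second data request.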
Based on this, as shown in fig. 2, the ordinate in fig. 2 represents the input data blocks to be compressed and the abscissa represents the pipeline cycle. In the 1st pipeline cycle, the statistics module 1 counts Data0 and Data0 is stored in the buffer 2. In the 2nd pipeline cycle, the statistics module 1 and the tree building module 3 complete their handshake, the statistics result of Data0 is transmitted to the tree building module 3, and the tree building module 3 starts to compute the data needed for Huffman coding compression of Data0; the statistics module 1 is idle at this point, so when Data1 is input, the statistics task for Data1 starts and Data1 is stored in the buffer 2. Data0 and Data1 are now both held in the buffer 2, and the hardware circuit is simultaneously processing the tree building task of Data0 and the statistics task of Data1. In the 3rd pipeline cycle, the tree building module 3 and the coding module 4 complete their handshake, the tree building result of Data0 is transmitted to the coding module 4, and the coding module 4 takes Data0 out of the buffer 2 for coding; the statistics module 1 and the tree building module 3 complete their handshake, the statistics result of Data1 is transmitted to the tree building module 3, and the tree building task of Data1 starts; and once the statistics module 1 has completed the handshake with the tree building module 3 and is idle again, Data2 can be received, the statistics task for Data2 is performed, and Data2 is stored in the buffer 2.
Therefore, the whole hardware circuit operates as a multi-stage pipeline and can process three groups of data simultaneously: the first group is being coded, the second group is in the tree building process, and the third group is in the statistics process. The three modules process different data blocks to be compressed in every time period and compressed data is output in every time period, so the throughput of the whole system is improved. Assuming the working time of each module is N, the parallel multi-stage pipeline hardware circuit of the present application needs a time of 5N to complete Huffman compression of 3 groups of data, whereas an existing hardware circuit without the pipeline structure needs 9N. Therefore, when a large amount of data is processed, the pipelined hardware circuit can greatly shorten the data compression time. As shown in fig. 3, as the amount of input data increases, the hardware circuit with the pipeline design outperforms the hardware circuit without it.
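A quick sanity check of the 5N versus 9N figures, assuming each of the three stages (statistics, tree building, coding) takes the same time N and that the stages overlap perfectly; the two formulas below are the standard pipeline estimates, not text taken from this application.

```python
def pipeline_time(blocks: int, stages: int = 3, stage_time: float = 1.0) -> float:
    # Classic pipeline estimate: fill the pipe once, then finish one block per stage time.
    return (stages + blocks - 1) * stage_time

def serial_time(blocks: int, stages: int = 3, stage_time: float = 1.0) -> float:
    # Without pipelining, every block passes through all three stages back to back.
    return blocks * stages * stage_time

if __name__ == "__main__":
    n = 1.0  # one module's working time "N"
    print(pipeline_time(3, stage_time=n))  # 5.0, i.e. 5N, matching the text
    print(serial_time(3, stage_time=n))    # 9.0, i.e. 9N, matching the text
```

With 3 blocks the pipeline needs (3 + 3 - 1)N = 5N while the serial circuit needs 3 x 3 x N = 9N, and the gap widens as the number of blocks grows.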
In summary, by dividing the processing among a plurality of modules and adopting a handshake mechanism between each upstream module and its downstream module, the modules are decoupled and can process different data blocks to be compressed at the same time, realizing parallel processing, saving data compression time and improving data compression efficiency.
On the basis of the above-described embodiment:
referring to fig. 4, fig. 4 is a schematic structural diagram of a Huffman coding compression apparatus according to an embodiment of the present invention.
As an alternative embodiment, the Huffman code compression device further comprises:
the preprocessing module is respectively connected with the counting module 1 and the buffer 2 and is used for performing LZ77 coding on total data blocks to be compressed to obtain a plurality of data blocks to be compressed with preset data sizes, sending a compression task request consisting of one data block to be compressed to the counting module 1 every other preset counting period, and caching the currently sent data block to be compressed to the buffer 2;
each data block to be compressed comprises information to be compressed and indication information; the information to be compressed comprises three types of information, namely text information, character repetition length and character distance information; the indication information is used for representing the information category to which each data divided by the statistical unit belongs in the information to be compressed.
Specifically, the Huffman coding and compressing device of the present application further comprises a preprocessing module, and the working principle thereof is as follows:
before statistics in the statistics module 1, the pre-processing module performs LZ77 encoding on the total data blocks to be compressed to obtain a plurality of data blocks to be compressed (smaller than or equal to 128KB) with a preset data size. After LZ77 encoding, the information to be compressed in the data block to be compressed exists in three forms of text information (Literal), character repetition Length (Length), and character Distance information (Distance), and in order to indicate the information category to which each data divided in statistical units in the information to be compressed belongs, the data block to be compressed also includes indication information (Indicator) for indicating the information category to which each data divided in statistical units in the information to be compressed belongs.
Then, the preprocessing module sends a compression task request composed of one data block to be compressed to the statistics module 1 every preset statistics period (determined by the time the statistics module 1 needs to count one data block). In other words, one data block is sent to the statistics module 1 at a time for counting, and only after the statistics module 1 has finished counting the previous data block does the preprocessing module send the next one. Whenever the preprocessing module sends a compression task request composed of one data block to the statistics module 1, it also caches the data block currently being sent to the buffer 2 for later use by the coding module 4.
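For illustration, a to-be-compressed block produced by the preprocessing module can be pictured as a stream of LZ77 tokens plus per-token indicator flags. The record layout below is an assumption made for readability, not a field layout defined by this application.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Literal:
    value: int            # one raw byte of text information ("Literal")

@dataclass
class Match:
    length: int           # character repetition length ("Length")
    distance: int         # character distance information ("Distance")

@dataclass
class CompressBlock:
    """One to-be-compressed block (<= 128KB) produced by LZ77 preprocessing."""
    tokens: List[Union[Literal, Match]]
    # indicator[i] records the information category of tokens[i],
    # e.g. 0 for a literal and 1 for a length/distance pair (assumed encoding).
    indicator: List[int]
```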
As an optional embodiment, the statistics module 1 is specifically configured to, after receiving the compression task request, obtain information to be compressed and indication information included in a data block to be compressed in the compression task request, perform statistics on one piece of target information in the information to be compressed each time according to a statistics unit, determine a target information category to which the current piece of target information to be counted belongs according to the indication information, and add 1 to the occurrence frequency of the target information in the target information category to obtain an information statistics result composed of the occurrence frequencies of the pieces of information in different information categories.
Specifically, the basic idea of data compression is that frequently occurring characters are represented with shorter codes, while rarely occurring characters (in the typical case, characters that hardly appear at all) are represented with longer codes, so that the overall length of the data is reduced. The function of the statistics module 1 is as follows: after receiving a compression task request, it obtains the information to be compressed and the indication information contained in the data block of the request, counts one piece of target information in the information to be compressed at a time according to the statistical unit, determines from the indication information the target information category to which the currently counted target information belongs, and adds 1 to the occurrence frequency of that target information within that category, thereby obtaining an information statistics result composed of the occurrence frequencies of the pieces of information in the different information categories.
For example, suppose one piece of target information in the information to be compressed, counted according to the statistical unit (8 bits), is "a", and the indication information shows that the currently counted target information "a" belongs to the text information category; then the occurrence frequency of the target information "a" in the text information category is increased by 1, eventually yielding an information statistics result composed of the occurrence frequencies of the pieces of information in the different information categories.
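A minimal sketch of the per-category frequency statistics, assuming the illustrative block layout from the earlier sketch (CompressBlock, Literal, Match) and assuming that indicator flag 0 marks a literal and 1 marks a length/distance pair; both assumptions are mine, not the application's.

```python
from collections import Counter

def count_frequencies(block: "CompressBlock"):
    """Per-category occurrence statistics (assumes the CompressBlock sketch above)."""
    literal_freq, length_freq, distance_freq = Counter(), Counter(), Counter()
    for token, kind in zip(block.tokens, block.indicator):
        if kind == 0:                       # indicator: text information
            literal_freq[token.value] += 1
        else:                               # indicator: repetition length + distance
            length_freq[token.length] += 1
            distance_freq[token.distance] += 1
    return literal_freq, length_freq, distance_freq
```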
It should be noted that the statistics module 1 need not perform statistics on all of the information to be compressed; it may count only a part of it (for example, the first 32KB of data). The statistics module 1 completes its handshakes with the upstream and downstream modules under the control of a Finite State Machine (FSM), which meets the design requirements of the parallel multi-stage pipeline. The statistics process of the statistics module 1 is mainly divided into 4 stages (as shown in fig. 5):
stage 1.IDLE
The statistical module 1 has no statistical task in process, and the state of the statistical state machine is IDLE.
Stage 2. statistics of 32KB
In IDLE state, the statistic module 1 obtains the compression task request, and at this time, the skip state of the statistic state machine is 32KB, and the character frequency statistics is performed on the data block to be compressed.
Stage 3. completion of statistics
When the data statistics of 32KB is completed, the statistical state machine jumps to a statistical completion state.
Stage 4, outputting statistical results
After the statistics module 1 completes statistics, the tree building module 3 initiates a data request, and at this time, the statistics state machine skips to a state of outputting a statistics result, completes handshaking with the later-stage tree building module 3, and transmits the statistics result to the tree building module 3.
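A compact software model of the four-stage statistics state machine described above. The enum member names paraphrase the stages of fig. 5 and the transition conditions are assumptions; the real design would implement this logic in hardware.

```python
from enum import Enum, auto

class StatState(Enum):
    IDLE = auto()            # stage 1: no statistics task in process
    COUNTING_32KB = auto()   # stage 2: counting character frequencies
    STAT_DONE = auto()       # stage 3: statistics completed
    OUTPUT_RESULT = auto()   # stage 4: handshake with the tree building module

def next_state(state, got_task=False, counting_done=False, tree_request=False):
    """One step of the (assumed) transition logic of the statistics state machine."""
    if state is StatState.IDLE and got_task:
        return StatState.COUNTING_32KB
    if state is StatState.COUNTING_32KB and counting_done:
        return StatState.STAT_DONE
    if state is StatState.STAT_DONE and tree_request:
        return StatState.OUTPUT_RESULT
    if state is StatState.OUTPUT_RESULT:
        return StatState.IDLE              # result handed over; ready for the next block
    return state
```

A cycle-accurate hardware implementation would evaluate these transitions on every clock edge; the sketch only captures the transition logic.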
As an alternative embodiment, the buffer 2 is a FIFO memory 21, and the FIFO memory 21 is capable of buffering at least two data blocks to be compressed.
Specifically, the buffer 2 of the present application may be a First In First Out (FIFO) memory 21, and in order to process three data blocks to be compressed simultaneously, the FIFO memory 21 must be able to buffer at least two data blocks to be compressed. For the case where each data block to be compressed is less than or equal to 128KB, the application can adopt a 256KB FIFO, which ideally allows 3 data blocks to be processed at the same time. For scenarios where a data block to be compressed is larger than 128KB, a larger FIFO is needed.
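A small sanity check of the sizing argument: with three blocks in flight, the block being coded is read out of the FIFO while the two later blocks (in the statistics and tree building phases) still need to sit in it, so two block-sized slots suffice. The helper below is an illustrative calculation only.

```python
def fifo_depth_bytes(max_block_bytes: int = 128 * 1024, blocks_buffered: int = 2) -> int:
    # Illustrative sizing only: the block currently being coded is read out of the
    # FIFO, so only the two later blocks (statistics and tree building phases)
    # still need to be held, hence two block-sized slots.
    return max_block_bytes * blocks_buffered

# 2 x 128KB = 256KB, matching the 256KB FIFO mentioned in the text.
assert fifo_depth_bytes() == 256 * 1024
```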
As an alternative embodiment, the tree building module 3 includes:
the sorting module 31 is connected with the statistical module 1 and used for sorting the occurrence frequency of each information under the same information category after receiving the information statistical result;
a code length generating module connected to the sorting module 31, configured to sequentially determine, according to a frequency arrangement order of each information in the same information category, a code length corresponding to each information in the same information category according to a preset frequency code length correspondence relationship, so as to obtain a code length sequence composed of code lengths corresponding to each information in the same information category; wherein, the more frequent the information is, the shorter the code length corresponding to the information is;
the code table generating module is respectively connected with the code length generating module and the coding module 4 and is used for generating Huffman coding information of each information one by one according to the code length corresponding to each information under the same information category so as to obtain a Huffman code table consisting of the Huffman coding information of each information under the same information category; the coding reference information comprises Huffman code tables under different information types;
the encoding module 4 is specifically configured to obtain Huffman encoding information of each piece of information of the data block to be compressed by searching Huffman code tables in different information categories, and combine the obtained Huffman encoding information to obtain an encoded compressed data block.
Specifically, the tree building module 3 of the present application includes a sorting module 31, a code length generating module and a code table generating module, and its working principle is:
after receiving the information statistics result transmitted by the statistics module 1, the sorting module 31 sorts the occurrence frequency of each information in the same information category, and sends the frequency arrangement sequence of each information in the same information category to the code length generation module.
The code length generating module determines, for each piece of information in the same information category, the corresponding code length (i.e., the coding length) in sequence, according to the frequency ranking of the pieces of information within that category and the preset frequency-to-code-length correspondence, and thus obtains a code length sequence composed of the code lengths of the pieces of information in that category. In this way the code lengths corresponding to the pieces of information in the different information categories are obtained and sent to the code table generating module. Note that the more frequently a piece of information appears, the shorter its code length, and the less frequently it appears, the longer its code length, which reduces the overall data length.
The code table generating module generates Huffman coding information of each information one by one according to the code length corresponding to each information under the same information category to obtain a Huffman code table consisting of the Huffman coding information of each information under the same information category, so as to obtain Huffman code tables under different information categories, and sends the Huffman code tables under different information categories to the coding module 4.
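One conventional way for a code table generating module to turn a code-length sequence into Huffman code words is the canonical-code construction used by the deflate format. The sketch below follows that well-known procedure; it is offered as a reference point and is not asserted to be the exact circuit of the code table generating module in this application.

```python
def canonical_code_table(code_lengths: dict) -> dict:
    """Build a canonical Huffman code table from {symbol: code length} (deflate-style).
    Returns {symbol: (code value, code length)}."""
    max_len = max(code_lengths.values(), default=0)
    bl_count = [0] * (max_len + 1)          # how many symbols use each code length
    for length in code_lengths.values():
        if length > 0:
            bl_count[length] += 1
    next_code = [0] * (max_len + 1)         # smallest code value for each length
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    table = {}
    for symbol in sorted(code_lengths):     # ascending symbol order gives canonical codes
        length = code_lengths[symbol]
        if length > 0:
            table[symbol] = (next_code[length], length)
            next_code[length] += 1
    return table
```

For example, canonical_code_table({'F': 2, 'A': 3, 'B': 3, 'C': 3, 'D': 3, 'E': 3, 'G': 4, 'H': 4}) assigns F the 2-bit code 00 and H the 4-bit code 1111, so symbols with shorter code lengths (the more frequent ones) get shorter code words.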
The coding module 4 reads the data block to be compressed from the buffer 2, and obtains Huffman coding information of each information of the data block to be compressed by searching Huffman code tables in different information types, so as to combine the obtained Huffman coding information to obtain a coded compressed data block.
For example, if the code length corresponding to the target information "a" (8bit) is 5 in the text information category, 5-bit Huffman coding information is generated for the target information "a", and 8-bit data can be changed into 5-bit data after being coded, thereby shortening the data length.
It should be noted that if no Huffman coding information can be found for a piece of information, the original data of that information is used as its Huffman coding information.
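The coding step then reduces to table lookups plus bit concatenation. A minimal sketch, assuming the {symbol: (code, length)} table format of the previous sketch and using the raw 8-bit value as a fallback when a symbol has no table entry, as noted above; a bit string is returned purely for readability.

```python
def encode_literals(data: bytes, table: dict) -> str:
    """Encode a byte stream with a literal code table; returns a bit string for clarity."""
    bits = []
    for byte in data:
        if byte in table:
            code, length = table[byte]
            bits.append(format(code, "0{}b".format(length)))  # canonical Huffman code word
        else:
            bits.append(format(byte, "08b"))                  # fallback: emit the raw 8 bits
    return "".join(bits)
```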
As an alternative embodiment, the tree building module 3 further includes:
the run-length coding module is respectively connected with the code length generating module and the coding module 4 and is used for respectively carrying out run-length coding on the code length sequences under different information categories to obtain run-length coding sequences under different information categories;
a CCL generation module 35 connected to the run-length coding module and the coding module 4, respectively, and configured to perform Huffman coding after combining the run-length coding sequences in different information categories to obtain a CCL sequence; wherein, the coding reference information also comprises a run-length coding sequence and a CCL sequence;
the encoding module 4 is specifically configured to obtain a compressed data block including a run-length encoding sequence and a CCL sequence according to the encoding reference information.
Further, the tree building module 3 of the present application further includes a run length encoding module and a CCL generating module 35, and the working principle thereof is as follows:
the code length generation module also sends the code length sequences under different information categories to the run-length coding module. The run-length coding module performs run-length coding on the code length sequences under different information categories respectively (the main technology of the run-length coding is to detect repeated bit or character sequences and replace the repeated bit or character sequences with the occurrence times of the repeated bit or character sequences so as to further compress data), obtains run-length coding sequences under different information categories, and sends the run-length coding sequences under different information categories to the CCL generation module 35 and the coding module 4.
The CCL generation module 35 combines the run-length coding sequences of different information categories and performs Huffman coding to obtain a CCL sequence (a codeword length bitstream of the recorded run-length coding sequence), and sends the CCL sequence to the coding module 4. The compressed data block obtained by the encoding module 4 contains the run-length encoding sequence and the CCL sequence to obtain compressed data in a deflate format (a data compression format).
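The run-length step can be pictured as collapsing runs of identical code lengths into (value, count) pairs. The sketch below is a generic run-length coder; the deflate format uses a more specific scheme with dedicated repeat symbols, which this application does not spell out, so treat this purely as an illustration.

```python
def run_length_encode(code_lengths: list) -> list:
    """Collapse runs of identical code lengths into (value, run count) pairs."""
    runs, i = [], 0
    while i < len(code_lengths):
        j = i
        while j < len(code_lengths) and code_lengths[j] == code_lengths[i]:
            j += 1
        runs.append((code_lengths[i], j - i))
        i = j
    return runs

# Example: [5, 5, 5, 0, 0, 7] -> [(5, 3), (0, 2), (7, 1)]
```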
As an alternative embodiment, the code length generating module includes:
a character length code length generating module 321 connected to the sorting module 31, configured to determine, according to the frequency arrangement order of each piece of information in the text information category and the character repetition length category, the code length corresponding to each piece of information in the text information category and the character repetition length category according to a preset frequency code length correspondence, so as to obtain a CL1 sequence formed by the code lengths corresponding to each piece of information in the text information category and the character repetition length category;
the distance code length generating module 322 connected to the sorting module 31 is configured to sequentially determine, according to the frequency arrangement order of the information in the character distance information category, the code length corresponding to the information in the character distance information category according to a preset frequency code length correspondence relationship, so as to obtain a CL2 sequence consisting of the code lengths corresponding to the information in the character distance information category.
Specifically, the code length generating module of the present application includes a character length code length generating module 321 and a distance code length generating module 322, and the working principle thereof is as follows:
the character length code length generating module 321 is configured to determine code lengths corresponding to the respective information in the text information category and the character repetition length category, so as to obtain a CL1 sequence consisting of the code lengths corresponding to the respective information in the text information category and the character repetition length category. The distance code length generating module 322 is configured to determine a code length corresponding to each piece of information in the character distance information category to obtain a CL2 sequence consisting of the code lengths corresponding to each piece of information in the character distance information category.
As an alternative embodiment, the code table generating module includes:
the character length code table generating module 331 is respectively connected with the character length code table generating module 321 and the encoding module 4, and is used for generating Huffman encoding information of each information one by one according to the code length corresponding to each information under the text information category and the character repetition length category so as to obtain a first Huffman code table consisting of the Huffman encoding information of each information under the text information category and the character repetition length category;
the distance code table generating module 332, which is respectively connected to the distance code length generating module 322 and the encoding module 4, is configured to generate Huffman encoding information of each information one by one according to the code length corresponding to each information in the character distance information category, so as to obtain a second Huffman code table consisting of the Huffman encoding information of each information in the character distance information category.
Specifically, the code table generating module of the present application includes a character length code table generating module 331 and a distance code table generating module 332, and the working principle thereof is as follows:
the character length code table generating module 331 is configured to generate Huffman coding information of each information in the text information category and the character repetition length category to obtain a first Huffman code table composed of the Huffman coding information of each information in the text information category and the character repetition length category; the distance code table generating module 332 is configured to generate Huffman coding information of each information in the character distance information category to obtain a second Huffman code table consisting of the Huffman coding information of each information in the character distance information category.
As an alternative embodiment, the run-length encoding module comprises:
the first run-length encoding module 341, connected to the character length code length generating module 321 and the encoding module 4, is configured to perform run-length encoding on the CL1 sequence to obtain an SQ1 sequence;
the second run-length coding module 342 is respectively connected with the distance code length generating module 322 and the coding module 4, and is configured to perform run-length coding on the CL2 sequence to obtain an SQ2 sequence;
the CCL generation module 35 is specifically configured to combine the SQ1 sequence and the SQ2 sequence and then perform Huffman coding to obtain a CCL sequence.
Specifically, the run-length encoding module of the present application includes a first run-length encoding module 341 and a second run-length encoding module 342, and the working principle thereof is as follows:
the first run-Length encoding module 341 is configured to run-Length encode the CL1 sequence to obtain an SQ1 sequence (Literal/Length codeword Length bitstream). The second run-length encoding module 342 is configured to run-length encode the CL2 sequence to obtain an SQ2 sequence (Distance codeword length bitstream). The compressed data block obtained by the encoding module 4 contains the SQ1 sequence, the SQ2 sequence and the CCL sequence.
More specifically, the tree building module 3 completes its handshakes with the upstream and downstream modules under the control of a finite state machine (FSM), fitting the multi-stage pipeline hardware structure. In the whole tree building process, as shown in fig. 6, 8 stages are executed in series, which guarantees that after the handshake with the downstream coding module 4, the coding module 4 can obtain all of the coding reference information; this is what allows the coding module 4 and the tree building module 3 to operate as a multi-stage pipelined hardware circuit. As shown in fig. 6, the tree building process is mainly divided into 8 stages:
stage 1.IDLE
When there are no tasks being processed by sorting module 31, distance code length generating module 322, character length code length generating module 321, distance code table generating module 332, character length code table generating module 331, first run encoding module 341, second run encoding module 342, and CCL generating module 35 in tree building module 3, the tree building state machine is IDLE at this time.
Stage 2. receiving statistical results
In the IDLE state, the tree building module 3 sends a first data request to the statistical module 1, the state machine of the statistical module 1 is in a statistical completion state, at this time, the handshake is successful, the statistical module 1 transmits the statistical result to the tree building module 3, and at this time, the tree building state machine skips to a state of receiving the statistical result.
Stage 3. Sorting completed
After the sorting module 31 finishes sorting the statistics result of the data block to be compressed, the tree building state machine jumps to the sorting-completed state.
Stage 4. Code Length completed
When the character length code length generation module 321 has completed the Literal/Length code length generation for the data block to be compressed and the distance code length generation module 322 has completed the Distance code length generation, the tree building state machine jumps to the code-length-completed state.
Stage 5. Code Table completed
When the character length code table generating module 331 has completed the generation of the Literal/Length code table for the data block to be compressed and the distance code table generating module 332 has completed the generation of the Distance code table, the tree building state machine jumps to the code-table-completed state.
Stage 6. Run-length coding completed
When the CL1 sequence generated by the character length code length generation module 321 completes run-length coding to generate an SQ1 sequence, and when the CL2 sequence generated by the distance code length generation module 322 completes run-length coding to generate an SQ2 sequence, the tree building state machine jumps to a run-length coding completion state.
Stage 7. CCL completed
And after the SQ1 sequence and the SQ2 sequence are subjected to Huffman coding to generate a CCL sequence, the tree building state machine jumps to a CCL completion state.
Stage 8. Output tree building module result
After the tree building module 3 has finished building the Literal/Length code table, the Distance code table, the SQ1 sequence, the SQ2 sequence and the CCL sequence, if the coding module 4 initiates the second data request, the tree building state machine jumps to the output-tree-building-result state, completes the handshake with the downstream coding module 4, and transmits the tree building result to the coding module 4.
The coding module 4 needs to obtain from the tree building module 3 all the information required for Huffman coding of the data block to be compressed, and then reads the data block from the FIFO memory 21 for coding and output. The coding module 4 completes its handshake with the upstream module under the control of a finite state machine (FSM), fitting the multi-stage pipeline hardware structure. The coding state machine has only two states: IDLE and coding. As shown in fig. 7, when the coding state machine completes a coding task, it jumps to the IDLE state and initiates a second data request to the tree building module 3; when the handshake with the tree building module 3 is completed, the coding state machine jumps to the coding state.
As an alternative embodiment, the statistical module 1, the tree building module 3 and the coding module 4 are all implemented by a state machine design.
Specifically, the statistical module 1, the tree building module 3, and the encoding module 4 of the present application can be realized by, but not limited to, a state machine design, and the present application is not particularly limited thereto.
It should also be noted that the sorting module 31, the distance code length generation module 322, the character length code length generation module 321, the distance code table generation module 332, the character length code table generation module 331, the first run-length coding module 341, the second run-length coding module 342 and the CCL generation module 35 inside the tree building module 3 can be decoupled from one another by adding registers, which has the advantage that more pipeline stages can be added. The present application only designs a three-stage pipelined hardware circuit; experimental analysis shows that the three-stage pipeline design offers a good balance between resources and performance, and the circuit can later be extended to more pipeline stages for different scenarios.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A Huffman code compression device, comprising:
the statistical module is used for performing information statistics on the data block to be compressed if a compression task request containing the data block to be compressed is received when the statistical module is in an idle state; after the information statistics of the data block to be compressed is completed, if a first data request initiated by a tree building module is received, transmitting the information statistics result of the data block to be compressed to the tree building module;
the buffer is used for buffering the data block to be compressed;
the tree building module is used for initiating the first data request when the tree building module is in an idle state, and building coding reference information required by Huffman coding of the data block to be compressed according to the information statistical result after receiving the information statistical result; after the coding reference information is constructed, if a second data request initiated by a coding module is received, transmitting the coding reference information to the coding module;
and the coding module is used for initiating the second data request when the coding module is in an idle state, reading the data block to be compressed from the buffer after receiving the coding reference information, and carrying out Huffman coding on the data block to be compressed based on the coding reference information to obtain a coded compressed data block.
2. A Huffman code compression device as recited in claim 1, wherein said Huffman code compression device further comprises:
the preprocessing module is respectively connected with the counting module and the buffer and used for performing LZ77 coding on total data blocks to be compressed to obtain a plurality of data blocks to be compressed with preset data sizes, sending a compression task request consisting of one data block to be compressed to the counting module every other preset counting period, and caching the currently sent data block to be compressed to the buffer;
each data block to be compressed comprises information to be compressed and indication information; the information to be compressed comprises three types of information, namely text information, character repetition length and character distance information; the indication information is used for representing the information category to which each data divided by a statistical unit belongs in the information to be compressed.
3. The Huffman coding and compressing device according to claim 2, wherein the statistics module is specifically configured to, after receiving the compression task request, obtain information to be compressed and indication information included in a data block to be compressed in the compression task request, perform statistics on one piece of target information in the information to be compressed each time according to a statistics unit, determine a type of target information to which the currently-counted target information belongs according to the indication information, and add 1 to an occurrence frequency of the target information in the type of the target information to obtain the information statistics result consisting of the occurrence frequencies of the pieces of information in different information types.
4. The Huffman code compression device as recited in claim 2, wherein said buffer is a FIFO memory, and wherein said FIFO memory is capable of buffering at least two of said data blocks to be compressed.
5. A Huffman code compression device as recited in claim 3, wherein said tree building module comprises:
the sorting module is connected with the statistics module and is used for sorting, after receiving the information statistics result, the pieces of information under the same information category by their occurrence frequencies;
the code length generating module is connected with the sorting module and is used for sequentially determining the code length corresponding to each piece of information under the same information category according to the frequency ranking of the pieces of information under that category and a preset frequency-to-code-length correspondence, so as to obtain a code length sequence consisting of the code lengths corresponding to the pieces of information under the same information category; wherein the more frequently a piece of information occurs, the shorter its corresponding code length;
the code table generating module is connected with the code length generating module and the coding module respectively, and is used for generating the Huffman coding information of each piece of information one by one according to the code length corresponding to each piece of information under the same information category, so as to obtain a Huffman code table consisting of the Huffman coding information of the pieces of information under the same information category; wherein the coding reference information comprises the Huffman code tables under the different information categories;
the coding module is specifically configured to obtain the Huffman coding information of each piece of information of the data block to be compressed by looking up the Huffman code tables under the different information categories, and to combine the obtained Huffman coding information to obtain the coded compressed data block.
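The code-table step can be illustrated with the standard canonical-Huffman construction used by Deflate, which assigns code words from code lengths alone; the code lengths themselves are taken as given here (the claim derives them from the frequency ranking and a preset frequency-to-code-length correspondence). This is a generic sketch, not the patented circuit.

```python
def canonical_codes(code_lengths):
    """Canonical Huffman codes from a symbol -> code-length map (0 means unused)."""
    max_len = max(code_lengths.values())
    bl_count = [0] * (max_len + 1)
    for length in code_lengths.values():
        if length:
            bl_count[length] += 1
    next_code, code = [0] * (max_len + 1), 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    table = {}
    for sym in sorted(code_lengths):
        length = code_lengths[sym]
        if length:
            table[sym] = format(next_code[length], "0{}b".format(length))
            next_code[length] += 1
    return table

lengths = {"a": 1, "b": 2, "c": 3, "d": 3}
table = canonical_codes(lengths)
print(table)                                   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print("".join(table[s] for s in "abad"))       # encoding = code-table lookup + concatenation
```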
6. The Huffman code compression device as recited in claim 5, wherein said tree building module further comprises:
the run-length coding module is connected with the code length generating module and the coding module respectively, and is used for carrying out run-length coding on the code length sequences under the different information categories respectively, to obtain run-length coding sequences under the different information categories;
the CCL generation module is connected with the run-length coding module and the coding module respectively, and is used for combining the run-length coding sequences under the different information categories and then performing Huffman coding to obtain a CCL sequence; wherein the coding reference information further comprises the run-length coding sequences and the CCL sequence;
the coding module is specifically configured to obtain, according to the coding reference information, a compressed data block that includes the run-length coding sequences and the CCL sequence.
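The run-length step can be sketched in the spirit of Deflate's code-length alphabet, where symbol 16 repeats the previous length and symbols 17/18 encode runs of zeros. The exact extra-bit ranges of Deflate are deliberately omitted, so this is an illustrative simplification rather than the patent's encoding.

```python
def rle_code_lengths(cl_sequence):
    """Run-length code a sequence of code lengths into (symbol, repeat) pairs."""
    out, i = [], 0
    while i < len(cl_sequence):
        run = 1
        while i + run < len(cl_sequence) and cl_sequence[i + run] == cl_sequence[i]:
            run += 1
        value = cl_sequence[i]
        if value == 0 and run >= 3:
            out.append((17 if run <= 10 else 18, min(run, 138)))  # repeat-zero symbols
            i += min(run, 138)
        elif value != 0 and run >= 4:
            out.append((value, None))            # emit the length itself once
            rep = min(run - 1, 6)
            out.append((16, rep))                # "repeat previous length" symbol
            i += 1 + rep
        else:
            out.extend((value, None) for _ in range(run))
            i += run
    return out

sq1 = rle_code_lengths([3, 3, 3, 3, 3, 0, 0, 0, 0, 5])
print(sq1)   # e.g. [(3, None), (16, 4), (17, 4), (5, None)]
# The CCL step would then Huffman-code the combined symbol stream of SQ1 and SQ2,
# for instance by counting symbol frequencies and building a third code table.
```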
7. The Huffman code compression device as recited in claim 6, wherein said code length generating module comprises:
the character length code length generating module is connected with the sorting module and is used for sequentially determining the code lengths corresponding to the pieces of information under the text information category and the character repetition length category according to the frequency ranking of the pieces of information under those two categories and the preset frequency-to-code-length correspondence, so as to obtain a CL1 sequence consisting of the code lengths corresponding to the pieces of information under the text information category and the character repetition length category;
and the distance code length generating module is connected with the sorting module and is used for sequentially determining the code length corresponding to each piece of information under the character distance information category according to the frequency ranking of the pieces of information under that category and the preset frequency-to-code-length correspondence, so as to obtain a CL2 sequence consisting of the code lengths corresponding to the pieces of information under the character distance information category.
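To illustrate the split into CL1 and CL2, the sketch below assigns code lengths purely by frequency rank using a hypothetical rank-to-length table (a stand-in for the preset frequency-to-code-length correspondence); literal and repetition-length statistics feed one sequence, distance statistics the other. A real design must also keep the resulting lengths prefix-decodable (Kraft inequality).

```python
from collections import Counter

def lengths_by_rank(counter, rank_to_len=(2, 3, 3, 4, 4, 4, 4, 5)):
    """Assign a code length to each symbol by its frequency rank (hypothetical table)."""
    ranked = [sym for sym, _ in counter.most_common()]
    return {sym: rank_to_len[min(r, len(rank_to_len) - 1)] for r, sym in enumerate(ranked)}

stats = {
    "literal": Counter({97: 5, 98: 2}),
    "length": Counter({6: 1}),
    "distance": Counter({3: 1}),
}
# In Deflate, repetition lengths are first remapped to symbols 257-285 so they cannot
# collide with literal byte values; that remapping is skipped in this sketch.
cl1 = lengths_by_rank(stats["literal"] + stats["length"])   # text + repetition-length -> CL1
cl2 = lengths_by_rank(stats["distance"])                    # character distance -> CL2
print(cl1, cl2)
```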
8. A Huffman code compression device as recited in claim 7, wherein said code table generating module comprises:
the character length code table generating module is connected with the character length code length generating module and the coding module respectively, and is used for generating the Huffman coding information of each piece of information one by one according to the code length corresponding to each piece of information under the text information category and the character repetition length category, so as to obtain a first Huffman code table consisting of the Huffman coding information of the pieces of information under the text information category and the character repetition length category;
and the distance code table generating module is connected with the distance code length generating module and the coding module respectively, and is used for generating the Huffman coding information of each piece of information one by one according to the code length corresponding to each piece of information under the character distance information category, so as to obtain a second Huffman code table consisting of the Huffman coding information of the pieces of information under the character distance information category.
9. A Huffman code compression device as recited in claim 7, wherein said run length coding module comprises:
the first run-length coding module is respectively connected with the character length code length generating module and the coding module and is used for carrying out run-length coding on the CL1 sequence to obtain an SQ1 sequence;
the second run-length coding module is respectively connected with the distance code length generating module and the coding module and is used for carrying out run-length coding on the CL2 sequence to obtain an SQ2 sequence;
the CCL generation module is specifically configured to combine the SQ1 sequence and the SQ2 sequence and then perform Huffman coding to obtain a CCL sequence.
10. A Huffman code compression device as recited in any one of claims 1 to 9, wherein said statistics module, said tree building module and said coding module are each implemented using a state machine design.
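One way to picture the state-machine design is a small idle / busy / wait-for-handshake automaton per module. The sketch below models only the statistics module, with hypothetical state names, and collapses the counting work into a single step; it is an illustration of the control flow, not the hardware implementation.

```python
from collections import Counter
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    COUNTING = auto()
    WAIT_HANDSHAKE = auto()

class StatsFSM:
    """Toy model of the statistics module as a state machine."""
    def __init__(self):
        self.state = State.IDLE
        self.block = None
        self.result = None

    def step(self, task=None, downstream_request=False):
        """One 'clock cycle'; returns a statistics result only on a successful handshake."""
        if self.state is State.IDLE and task is not None:
            self.block, self.state = task, State.COUNTING
        elif self.state is State.COUNTING:
            self.result = dict(Counter(self.block))   # whole count done in one step, for brevity
            self.state = State.WAIT_HANDSHAKE
        elif self.state is State.WAIT_HANDSHAKE and downstream_request:
            out, self.result, self.state = self.result, None, State.IDLE
            return out
        return None

fsm = StatsFSM()
fsm.step(task=b"abracadabra")              # IDLE -> COUNTING on a new compression task
fsm.step()                                 # COUNTING -> WAIT_HANDSHAKE
print(fsm.step(downstream_request=True))   # handshake with the tree builder releases the result
```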
CN202111642072.2A 2021-12-29 2021-12-29 Huffman coding and compressing device Withdrawn CN114337682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111642072.2A CN114337682A (en) 2021-12-29 2021-12-29 Huffman coding and compressing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111642072.2A CN114337682A (en) 2021-12-29 2021-12-29 Huffman coding and compressing device

Publications (1)

Publication Number Publication Date
CN114337682A true CN114337682A (en) 2022-04-12

Family

ID=81017687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111642072.2A Withdrawn CN114337682A (en) 2021-12-29 2021-12-29 Huffman coding and compressing device

Country Status (1)

Country Link
CN (1) CN114337682A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115882867A (en) * 2023-03-01 2023-03-31 山东水发紫光大数据有限责任公司 Data compression storage method based on big data
CN115882867B (en) * 2023-03-01 2023-05-12 山东水发紫光大数据有限责任公司 Data compression storage method based on big data

Similar Documents

Publication Publication Date Title
US11044495B1 (en) Systems and methods for variable length codeword based data encoding and decoding using dynamic memory allocation
CN100553152C (en) Coding method and equipment and coding/decoding method and equipment based on CABAC
US10831655B2 (en) Methods, devices and systems for compressing and decompressing data
CN103067022B (en) A kind of integer data lossless compression method, decompression method and device
US9454552B2 (en) Entropy coding and decoding using polar codes
WO2019153700A1 (en) Encoding and decoding method, apparatus and encoding and decoding device
US7786907B2 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
CN108886367B (en) Method, apparatus and system for compressing and decompressing data
CN112968706B (en) Data compression method, FPGA chip and FPGA online upgrading method
CN101534124B (en) Compression algorithm for short natural language
CN108023597B (en) Numerical control system reliability data compression method
CN114337682A (en) Huffman coding and compressing device
WO2001063772A1 (en) Method and apparatus for optimized lossless compression using a plurality of coders
CN113965207B (en) Deflate Huffman coding-based dynamic code table generation device and method
CN114442954B (en) LZ4 coding compression device
US20100085219A1 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
CN110021368B (en) Comparison type gene sequencing data compression method, system and computer readable medium
CN111787325B (en) Entropy encoder and encoding method thereof
CN105099460A (en) Dictionary compression method, dictionary decompression method, and dictionary construction method
CN112054805B (en) Model data compression method, system and related equipment
CN116418348A (en) Data compression method, device, equipment and storage medium
CN113902097A (en) Run-length coding accelerator and method for sparse CNN neural network model
CN102891730B (en) Method and device for encoding satellite short message based on binary coded decimal (BCD) code
CN111628778B (en) Lossless compression method and device based on dynamic programming
Rajput et al. Comparative Study of Data Compression Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220412
