CN109361398B

CN109361398B - LZ process hardware compression method and system based on parallel and pipeline design

Info

Publication number: CN109361398B
Application number: CN201811182742.5A
Authority: CN
Inventors: 潘玉彪; 侯济恭; 林运国; 吴清顺
Original assignee: Linewell Software Co Ltd
Current assignee: Linewell Software Co Ltd
Priority date: 2018-10-11
Filing date: 2018-10-11
Publication date: 2022-12-30
Anticipated expiration: 2038-10-11
Also published as: CN109361398A

Abstract

The invention belongs to the field of data compression for computer storage and discloses an LZ process hardware compression method and system based on parallel and pipeline design. The LZ process is divided into six modules. Each module is parallel internally, that is, it processes several bytes at once; because each module completes its work in one clock cycle, the six modules form a six-stage pipeline. The invention avoids merge operations inside the modules of a single pipeline beat; instead, the module that merges with the result of the previous beat tries to combine the matching result of the previous beat with that of the current beat to produce a longer match.

Description

LZ process hardware compression method and system based on parallel and pipeline design

Technical Field

The invention belongs to the field of computer storage data compression, and particularly relates to an LZ process hardware compression method and system based on parallel and pipeline design.

Background

The state of the art commonly used in the industry is as follows:

With the development of modern technology, and especially with the rapid growth of cloud computing and big data, the transmission and storage of massive data have become increasingly important problems in computing. Compression algorithms make it technically possible to reduce transmission bandwidth and increase storage efficiency. Compression algorithms are generally divided into lossless and lossy compression. For data-sensitive applications, lossless compression is generally used to reduce the data volume, and when the data is needed again, the corresponding decompression algorithm recovers the original data.

Implementing a compression algorithm in software consumes valuable CPU resources. For CPU-intensive applications in particular, software compression and the application compete for the CPU, which degrades system performance. A better solution is therefore to implement the compression algorithm in dedicated hardware (FPGA/ASIC): when the compression function is enabled, all data to be compressed is offloaded from the CPU to the dedicated hardware, and the CPU can continue to run the application.

To keep the system robust, a system that deploys hardware compression generally also deploys software compression and decompression to cover the case where hardware compression fails. The choice of compression algorithm is therefore biased toward algorithms with high software performance. The LZ (Lempel-Ziv) family offers a reasonable compression ratio together with high software compression and decompression performance. These algorithms work as follows: a hash value is calculated over several bytes starting at the current position (3 bytes for the LZ stage in GZIP, 4 bytes for LZ4), and the hash table is searched for an earlier record of that hash value. If one exists, the bytes at the recorded position are compared with the bytes at the current position to obtain the matching information, namely the matching length and the offset (how many bytes before the current byte the match starts); the comparison can fail because of a hash collision. The hash table is then updated with the current hash value and position. When LZ finds a match, it has found a set of information (the number of unmatched bytes, the unmatched original bytes, the matching length, and the offset of the matched position), which is packaged according to the specific algorithm before the next position is processed. For example, for the string ABCDEABCDF…, when LZ4 reaches the sixth position (A), it finds a 4-byte match, so the set of information is (5, ABCDE, 4, 5), which is finally output according to the LZ4 packaging format. The algorithm proceeds serially, byte by byte, and jumps to the byte after the match when a match is found.
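The serial search described above can be sketched in a few lines of Python. This is an illustration only, not the LZ4 reference implementation: the function name `serial_lz` is hypothetical, a `dict` keyed by the raw 4 bytes stands in for the hash table (so collisions cannot occur here, although the byte comparison that a real hash would require is kept), and packaging into the LZ4 output format is omitted.

```python
MIN_MATCH = 4  # LZ4-style minimum match; the text notes GZIP's LZ stage uses 3


def serial_lz(data: bytes):
    """Return ([(literal_len, literals, match_len, offset), ...], trailing_literals)."""
    table = {}      # stand-in hash table: 4-byte key -> most recent position
    sequences = []
    lit_start = 0   # first byte not yet covered by an emitted sequence
    pos = 0
    while pos + MIN_MATCH <= len(data):
        key = data[pos:pos + MIN_MATCH]
        cand = table.get(key)
        table[key] = pos                  # update the table after the lookup
        length = 0
        if cand is not None:
            # Compare bytes at the candidate; with a real hash this step
            # is what rejects collisions.
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
        if length >= MIN_MATCH:
            literals = data[lit_start:pos]
            sequences.append((len(literals), literals, length, pos - cand))
            pos += length                 # jump to the byte after the match
            lit_start = pos
        else:
            pos += 1                      # no match: advance one byte
    return sequences, data[lit_start:]


# The example from the text: the match is found at the sixth position.
print(serial_lz(b"ABCDEABCDFGH"))
```

Because the loop skips over matched bytes, the number of iterations depends on how compressible the input is, which is the source of the performance fluctuation discussed in the problems listed next.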

In summary, the problems of the prior art are:

(1) The prior art is generally implemented in software, and the performance of the LZ compression process fluctuates widely with the compression ratio of the files being compressed;

(2) The fastest current LZ-based compression algorithms (such as LZ4) reach only hundreds of megabytes per second, which does not meet the requirements (GB/s or more) of application scenarios such as in-memory processing and ultra-high-speed transmission;

(3) The prior art provides no hardware compression for the LZ process based on parallel and pipeline design, so the performance of LZ-series compression cannot be greatly improved.

The difficulty and significance for solving the technical problems are as follows:

(1) Through an LZ hardware compression process with parallel and pipeline design, the invention ensures stable compression performance, that is, a fixed number of bytes is processed in each cycle;

(2) The parallel and pipelined LZ hardware compression process can reach GB/s-level performance. For example, on an ASIC with an 800 MHz clock that compresses 4 bytes per cycle, the LZ hardware compression performance reaches 3.2 GB/s;

(3) Parallel and pipeline design can slightly reduce the compression ratio. The invention therefore fixes the maximum number of bytes matched within one pipeline beat (which preserves the parallel and pipeline properties) and merges matching results across beats to find matches that are as long as possible, making up for the loss in compression ratio;

(4) Given the trend toward implementing software algorithms in hardware, the invention provides an LZ hardware compression technology that meets the performance requirements of memory-intensive applications and ultra-high-speed transmission.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an LZ process hardware compression method and system based on parallel and pipeline design. The invention establishes a hardware compression method for the LZ process based on parallel and pipeline design, thereby improving the performance of LZ-series compression.

The invention is realized as follows. An LZ process hardware compression method based on parallel and pipeline design comprises the following. First, a six-stage pipeline is designed by exploiting hardware characteristics. To make a better pipeline possible, at most 8 bytes are matched for each position: for the current position, if a possible matching position is found through the hash table, the design matches at most 8 bytes and at least 3 or 4 bytes (depending on the specific LZ algorithm). For example, for the string AAAAAAAAAAAA…, when processing the second position A, the most matches to

Second, the LZ process is divided into six steps: calculating the hash, looking up the hash table, updating the hash table, finding the matching information, merging with the previous beat, and outputting. Each step is designed to complete in one cycle. Merging with the previous beat means that if the position processed in the previous beat matched the full 8 bytes, a longer match may exist, so the matching result of the previous position is sent to the current position and merged with its matching result. For example, the second position A mentioned above finds a match with an offset of 1 and a length of 8 bytes, and the third position A also finds a match with an offset of 1 and a length of 8 bytes; because the offsets are the same, the two can be merged into a match with an offset of 1 and a length of 9 bytes.

The method uses a hash calculation module, a hash lookup module, a hash table update module, a matching module, a module for merging with the result of the previous beat, and an output module. Each module completes its task in one beat, that is, one cycle. The pipeline works as follows:

After the hash calculation module has calculated the hash value of the bytes starting at the first position, it sends the result to the hash lookup module; at the same time, the hash calculation module prepares to calculate the hash value of the bytes starting at the second position.

After the hash lookup module has looked up a possible matching offset (position) in the hash table for the hash value of the first position, it sends the result to the hash table update module; at the same time, the hash lookup module prepares to look up a possible matching offset (position) in the hash table for the hash value of the second position.

After the hash table update module has updated the hash table with the hash value of the first position, it sends the possible matching position to the matching module; at the same time, the hash table update module prepares to update the hash table with the hash value of the second position.

After the matching module has searched the history window for a match of at most 8 bytes for the first position, it sends the result to the module for merging with the previous beat; at the same time, the matching module prepares to search the history window for a match of at most 8 bytes for the second position.

After the module for merging with the previous beat has merged the matching result of the first position with the matching result of the preceding position, it sends the result to the output module; at the same time, the merging module prepares to merge the matching result of the second position with that of the first position.

After the output module has chosen, for the merged result of the first position, to output nothing, the original byte, or the matching result (the matching length and matching offset), it prepares the output for the merged result of the second position.
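The overlap between stages described above can be pictured with a small occupancy table. The sketch below is an idealized model, assuming one position enters the pipeline per cycle and no stage ever stalls; the stage labels and the function name `occupancy` are illustrative and do not come from the patent.

```python
STAGES = ["hash", "lookup", "update", "match", "merge", "output"]


def occupancy(cycle: int, n_positions: int):
    """Position (0-based) held by each of the six stages in a given 0-based cycle.

    Position p enters the first stage in cycle p and moves one stage per cycle,
    so stage s holds position cycle - s, or None if no such position exists.
    """
    return [cycle - s if 0 <= cycle - s < n_positions else None
            for s in range(len(STAGES))]


for cycle in range(7):
    print(cycle, dict(zip(STAGES, occupancy(cycle, n_positions=10))))
```

In this model the first position reaches the output stage in the sixth cycle (index 5), and from then on one position leaves the pipeline in every cycle.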

In this pipeline, after the sixth clock cycle the processing of one position completes in every cycle, that is, 1 byte is processed per clock cycle. For a 200 MHz FPGA, or for higher-frequency hardware such as an 800 MHz ASIC, the pipelined LZ process reaches 200 MB/s or 800 MB/s respectively.

On top of the pipeline, resources are traded for performance inside each module to process data in parallel. Balancing resources against performance, 4-byte parallel processing is implemented in the modules: each module completes the work for 4 positions at the same time, so every stage of the pipeline handles four bytes at once. With 4-byte parallelism inside the modules and pipelining between them, the LZ process reaches 800 MB/s or 3200 MB/s respectively.
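The throughput figures above follow from multiplying the clock frequency by the number of bytes completed per cycle. The following sketch reproduces that arithmetic; the function names are hypothetical, and the cycle count assumes an idealized six-stage pipeline with no stalls, which the text does not state explicitly.

```python
def throughput_mb_per_s(clock_mhz: int, bytes_per_cycle: int) -> int:
    """Steady-state throughput: one group of bytes completes per clock cycle."""
    return clock_mhz * bytes_per_cycle


def cycles_to_compress(n_bytes: int, bytes_per_cycle: int, stages: int = 6) -> int:
    """Idealized cycle count: pipeline fill plus one cycle per group (assumed, no stalls)."""
    groups = -(-n_bytes // bytes_per_cycle)   # ceiling division
    return stages + groups - 1


print(throughput_mb_per_s(200, 1), throughput_mb_per_s(800, 1))  # pipeline only
print(throughput_mb_per_s(200, 4), throughput_mb_per_s(800, 4))  # pipeline + 4-byte parallelism
print(cycles_to_compress(4096, 4))
```

The fill latency of the pipeline is a fixed handful of cycles, so for inputs of realistic size the steady-state figure dominates.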

Another object of the present invention is to provide a computer program for implementing the LZ process hardware compression method based on parallel and pipeline design.

The invention also aims to provide an information data processing terminal for realizing the LZ process hardware compression method based on parallel and pipeline design.

It is another object of the present invention to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the LZ process hardware compression method based on parallel and pipeline design.

The invention also aims to provide an LZ process hardware compression system based on parallel and pipeline design, comprising a hash calculation module, a hash lookup module, a hash table update module, a matching module, a module for merging with the result of the previous beat, and an output module.

The modules are parallel internally, that is, each module can process several positions at the same time (for example, 4 bytes at a time).

Because each module completes its work in one clock cycle, the six hardware modules form a six-stage pipeline.

The hash calculation module calculates four hash values for four positions at the same time.

The hash lookup module looks up the hash values of the four positions at the same time and finds the possible matching positions. For each hash value it performs three comparisons simultaneously: first, against the hash values of the earlier positions in the current beat; second, using a bypass technique, against the hash values of the previous beat that still have to be written into the hash table; third, against the hash table. The possible matching position is selected in the order first, second, third.

The hash table update module writes the hash values of the four positions of the current beat into the hash table at the same time; if a write conflict occurs, only the hash value of the later position is written.

The matching module performs matching for the four positions at the same time, with a maximum of 8/7/6/5 bytes respectively and a minimum of 4 bytes; if position 1/2/3/4 reaches an 8/7/6/5-byte match, that position is said to have a full match.

The module for merging with the result of the previous beat merges the matching result of the current beat with the matching result of the previous beat and passes the merged result to the next beat as parameters.

If the previous beat passes on no matching result: if the current beat has a full match, the front-most full match in the current beat is selected and passed to the next beat, and the bytes before that position in the current beat are sent to the output module as original bytes; otherwise, if the current beat has a match but no full match, the front-most match is passed to the next beat together with a message that no merging is needed, and the bytes before that position in the current beat are sent to the output module as original bytes; otherwise, if the current beat has no match, the next beat is told that there is no match and nothing to merge, and the four bytes of the current beat are sent to the output module as original bytes.

If the previous beat has a matching result that does not need to be merged, that matching result is sent to the output module, and the position in the current beat up to which that match extends is determined. If a full match exists after that position in the current beat, the front-most full match after that position is selected and passed to the next beat, and the bytes after that position and before the full match are sent to the output module as original bytes; if a match exists after that position but no full match, the front-most match after that position is passed to the next beat together with a message that no merging is needed, and the bytes after that position and before the match are sent to the output module as original bytes; otherwise, if no match exists after that position, the next beat is told that there is no match and nothing to merge, and the four bytes of the current beat are sent to the output module as original bytes.

If the previous beat has a full match that needs to be merged with a matching result of the current beat, the matching length is merged as Len0 = Len0 + Leni - (5 - i), where i means that the matching result of the i-th position of the current beat is merged with the matching result of the previous beat; in this case the output module does not need to output anything. If the result being merged from the current beat is a full match, the merged result is passed to the next beat with a message that matching should continue; if it is not a full match, the merged result is passed to the next beat with a message that matching need not continue. If the offset in the matching result of the previous beat differs from the offsets of the matching results at all four positions of the current beat, the results cannot be merged: the matching result is sent directly to the output module, and a message of no match and no merging is passed to the next beat.
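The length formula can be checked with a small sketch. One way to read the term (5 - i), offered here as an interpretation rather than a statement from the patent, is that a full match carried over from the previous beat already covers all four bytes of the current beat, so the match found at position i overlaps it by the 5 - i bytes from position i to position 4. The function name `merged_length` is hypothetical.

```python
def merged_length(len0: int, len_i: int, i: int) -> int:
    """Len0 = Len0 + Leni - (5 - i), with i in 1..4 the position in the current beat."""
    if not 1 <= i <= 4:
        raise ValueError("i must be a position 1..4 in the current beat")
    return len0 + len_i - (5 - i)


# A carried 8-byte match merged with a full match at each position of the current beat.
for i, full_len in zip((1, 2, 3, 4), (8, 7, 6, 5)):
    print(i, merged_length(8, full_len, i))
```

Under this reading, a full match at any of the four positions extends the carried match by the same 4 bytes, while a shorter match extends it by less (for example, Len1 = 6 adds 2 bytes).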

According to the result of the module for merging with the previous beat, the output module outputs the original bytes, outputs the matching result, or outputs nothing.

It is another object of the present invention to provide a computer control apparatus for the manufacturing industry that carries at least the LZ process hardware compression system based on parallel and pipeline design.

In summary, the advantages and positive effects of the invention are:

(1) The invention uses an LZ hardware compression process with parallel and pipeline design and processes every position once, which ensures stable compression performance: a fixed number of bytes is processed per cycle. Other LZ-series software compression algorithms skip bytes after finding a match, so their performance fluctuates with the compression ratio of the data being compressed.

(2) The parallel and pipelined LZ hardware compression process can reach GB/s-level performance. For example, on an ASIC with an 800 MHz clock that compresses 4 bytes per cycle, the LZ hardware compression performance reaches 3.2 GB/s. Furthermore, the method can be extended to process more bytes per cycle, for example 6 bytes per cycle, which gives 4.8 GB/s at an 800 MHz clock. By comparison, the fastest LZ-series software compression algorithm (LZ4) achieves less than 800 MB/s on a 64-bit Linux system with a single 4 GHz i7-6700K CPU core, as shown in the following table:

(4) Given the trend toward implementing software algorithms in hardware, the invention provides an LZ hardware compression technology that meets the performance requirements of memory-intensive applications and ultra-high-speed transmission.

The LZ process is divided into six modules: a hash calculation module, a hash lookup module, a hash table update module, a matching module, a module for merging with the result of the previous beat, and an output module. Each module is parallel internally, that is, it processes several bytes at once (for example, 4 bytes); because each module completes its work in one clock cycle, the six modules form a six-stage pipeline, so after the sixth clock cycle the method processes 4 bytes per cycle. In addition, the method protects the compression ratio of the LZ process in three ways. First, when the hash lookup module searches the hash table, it also compares the hash value of the current position with the hash values of the earlier positions in the same beat and, using a bypass technique, with the hash values produced by the previous beat, so that every lookup sees the latest hash values. Second, the matching module matches at most 8/7/6/5 bytes for the four positions respectively, which avoids merge operations inside the modules of a single beat. Finally, the module for merging with the result of the previous beat tries to merge the matching result of the previous beat with that of the current beat to produce a longer match.

Drawings

Fig. 1 is a flow chart of an LZ process hardware compression method based on parallel and pipeline design according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an LZ process hardware compression system based on parallel and pipeline design according to an embodiment of the present invention.

In the figure: M1, hash calculation module; M2, hash lookup module; M3, hash table update module; M4, matching module; M5, module for merging with the result of the previous beat; M6, output module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

In the prior art, no hardware compression based on parallel and pipeline design has been established for the LZ process, so the performance of LZ-series compression cannot be improved.

As shown in Fig. 1, the LZ process hardware compression method based on parallel and pipeline design according to the embodiment of the present invention comprises the following. First, a six-stage pipeline is designed by exploiting hardware characteristics. To make a good pipeline possible, at most 8 bytes are matched for each position: for the current position, if a possible matching position is found through the hash table, the design matches at most 8 bytes and at least 3 or 4 bytes (depending on the specific LZ algorithm). For example, for the string AAAAAAAAAAAA…, when processing the second position A, the most matches to

Second, the LZ process is divided into six steps: calculating the hash, looking up the hash table, updating the hash table, finding the matching information, merging with the previous beat, and outputting. Each step is designed to complete in one cycle. Merging with the previous beat means that if the position processed in the previous beat matched the full 8 bytes, a longer match may exist, so the matching result of the previous position is sent to the current position and merged with its matching result. For example, the second position A mentioned above finds a match with an offset of 1 and a length of 8 bytes, and the third position A also finds a match with an offset of 1 and a length of 8 bytes; because the offsets are the same, the two can be merged into a match with an offset of 1 and a length of 9 bytes.

The method uses a hash calculation module, a hash lookup module, a hash table update module, a matching module, a module for merging with the result of the previous beat, and an output module. Each module completes its task in one beat, that is, one cycle. The pipeline works as follows:

After the hash calculation module has calculated the hash value of the bytes starting at the first position, it sends the result to the hash lookup module; at the same time, the hash calculation module prepares to calculate the hash value of the bytes starting at the second position.

After the hash lookup module has looked up a possible matching offset (position) in the hash table for the hash value of the first position, it sends the result to the hash table update module; at the same time, the hash lookup module prepares to look up a possible matching offset (position) in the hash table for the hash value of the second position.

After the matching module has searched the history window for a match of at most 8 bytes for the first position, it sends the result to the module for merging with the previous beat; at the same time, the matching module prepares to search the history window for a match of at most 8 bytes for the second position.

After the module for merging with the previous beat has merged the matching result of the first position with the matching result of the preceding position, it sends the result to the output module; at the same time, the merging module prepares to merge the matching result of the second position with that of the first position.

After the output module has chosen, for the merged result of the first position, to output nothing, the original byte, or the matching result (the matching length and matching offset), it prepares the output for the merged result of the second position.

On top of the pipeline, resources are traded for performance inside each module to process data in parallel. Balancing resources against performance, 4-byte parallel processing is implemented in the modules: each module completes the work for 4 positions at the same time, so every stage of the pipeline handles four bytes at once. With 4-byte parallelism inside the modules and pipelining between them, the LZ process reaches 800 MB/s or 3200 MB/s.

The invention is further described with reference to specific examples.

Example:

The invention applies to the LZ process a hardware design that is pipelined between modules and parallel within modules, in order to improve LZ compression performance.

Referring to Fig. 2, the LZ process hardware compression system based on parallel and pipeline design according to the embodiment of the present invention divides the LZ process into six modules: a hash calculation module M1, a hash lookup module M2, a hash table update module M3, a matching module M4, a module M5 for merging with the result of the previous beat, and an output module M6. The modules are pipelined, that is, a result is produced in every clock cycle after the sixth clock cycle.

Second, each module is parallel internally, that is, each module processes four bytes at the same time, so that together with the pipeline design a result for 4 bytes is produced in every clock cycle.

The hash calculation module M1 calculates four hash values, one for each of the four positions of the first four bytes, and sends them to the hash lookup module.

The hash lookup module M2: to avoid read-write conflicts among the four positions on the hash table, four identical copies of the hash table are kept:

1) The first position of the first four bytes is looked up in the hash table for a possible match.

2) At the same time, the second position of the first four bytes is looked up in the hash table and its hash value is compared with that of the first position. If the two hash values are the same, the possible matching offset of the second position is 1; otherwise, the hash table lookup result is used.

3) The third position of the first four bytes is looked up in the hash table and its hash value is compared with those of the second and first positions. If it equals the hash value of the second position, the possible matching offset of the third position is 1; otherwise, if it equals the hash value of the first position, the possible matching offset of the third position is 2; otherwise, the hash table lookup result is used.

4) The fourth position of the first four bytes is looked up in the hash table and its hash value is compared with those of the third, second, and first positions. If it equals the hash value of the third position, the possible matching offset of the fourth position is 1; otherwise, if it equals the hash value of the second position, the possible matching offset of the fourth position is 2; otherwise, if it equals the hash value of the first position, the possible matching offset of the fourth position is 3; otherwise, the hash table lookup result is used.

The hash table lookup also uses a bypass technique to compare against the hash values of the previous beat that have not yet been written into the hash table, so that the lookup never misses the latest hash values and the compression ratio is not reduced. In addition, the module operates on the four positions simultaneously.
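The lookup priority described for M2 can be modelled in software as follows. This is a sequential sketch of logic that the hardware evaluates in parallel; the function name `lookup_group`, the use of dictionaries for the hash table and the bypass register, and the convention of returning absolute candidate positions are all assumptions made for illustration.

```python
def lookup_group(hashes, base_pos, bypass, table):
    """Return a candidate match position (or None) for each position of a group.

    hashes   : hash values of the group's positions, in order
    base_pos : absolute position of the group's first byte
    bypass   : {hash: position} for the previous group, not yet in the table
    table    : {hash: position} hash table contents
    """
    candidates = []
    for k, h in enumerate(hashes):
        cand = None
        # 1) Earlier positions of the same group, nearest first (offset 1, 2, 3).
        for j in range(k - 1, -1, -1):
            if hashes[j] == h:
                cand = base_pos + j
                break
        # 2) Bypass: hash values of the previous group still waiting to be written.
        if cand is None:
            cand = bypass.get(h)
        # 3) The hash table itself.
        if cand is None:
            cand = table.get(h)
        candidates.append(cand)
    return candidates


# Group starts at absolute position 8; positions 1, 2 and 4 share hash value 5.
print(lookup_group([5, 5, 9, 5], base_pos=8, bypass={9: 6}, table={5: 1, 9: 2}))
```

In the usage line, the second position takes its candidate from the first position (offset 1), the fourth from the second (offset 2), the third from the bypass entry, and only the first falls through to the hash table.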

The hash table update module M3 writes the hash values and positions of the four positions of the first four bytes into the hash table. If a write conflict occurs, that is, two or more positions have the same hash value, the information of the last of those positions is written into the hash table. For example, if the hash values of all four positions are the same, only the hash value and position of the fourth position are written into the hash table.
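The write-conflict rule amounts to "the last position wins". A minimal software sketch, assuming the hash table is modelled as a dictionary from hash value to position (the function name `update_hash_table` is hypothetical):

```python
def update_hash_table(table: dict, hashes, positions) -> dict:
    """Write one group's (hash, position) pairs; on a conflict the later position wins."""
    winners = {}
    for h, p in zip(hashes, positions):
        winners[h] = p          # a later position in the group overwrites an earlier one
    table.update(winners)       # one write per distinct hash value
    return table


table = {}
update_hash_table(table, [7, 7, 7, 7], [100, 101, 102, 103])
print(table)  # only the fourth position is recorded for hash value 7
```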

The matching module M4 simultaneously examines the possible matching positions of the four positions of the first four bytes and compares byte by byte for each of the four positions:

1) The first position is matched for at most 8 bytes, giving (matching length, offset), written (Len1, Off1); if the matching length is less than 4, Len1 = 0 and Off1 = 0;

2) The second position is matched for at most 7 bytes, giving (matching length, offset), written (Len2, Off2); if the matching length is less than 4, Len2 = 0 and Off2 = 0;

3) The third position is matched for at most 6 bytes, giving (matching length, offset), written (Len3, Off3); if the matching length is less than 4, Len3 = 0 and Off3 = 0;

4) The fourth position is matched for at most 5 bytes, giving (matching length, offset), written (Len4, Off4); if the matching length is less than 4, Len4 = 0 and Off4 = 0;
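The four capped comparisons of M4 can be sketched as below. The code is a sequential model of what the hardware does in parallel, and the names `match_group`, `MAX_LEN`, and `MIN_MATCH` are illustrative; candidate positions are assumed to be absolute positions that precede the position being matched.

```python
MIN_MATCH = 4
MAX_LEN = (8, 7, 6, 5)   # caps for positions 1..4 of a group


def match_group(data: bytes, base_pos: int, candidates):
    """Return [(Len1, Off1), ..., (Len4, Off4)] for the group starting at base_pos."""
    results = []
    for k, cand in enumerate(candidates):
        pos = base_pos + k
        length = 0
        if cand is not None:
            # Compare byte by byte, stopping at the cap for this position.
            while (length < MAX_LEN[k] and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
        if length < MIN_MATCH:
            results.append((0, 0))            # too short: reported as no match
        else:
            results.append((length, pos - cand))
    return results


# All four positions reach a full match on a run of identical bytes.
print(match_group(b"A" * 16, base_pos=4, candidates=[3, 4, 5, 6]))
```

With these caps, a full match at any of the four positions ends on the same byte (position + cap is the same for all four), which is one plausible reason the caps decrease by one per position.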

in the previous-beat result merging module M5, (Len 0, off 0) is the result after the previous pipeline, i.e. the previous four-byte merging, flag is whether the result of the previous pipeline needs to be merged (TRUE indicates that merging is needed, FALSE indicates that merging is not needed), pos indicates that the previous pipeline has a matching result but does not need to be merged, and therefore the matching result is to be covered to the current pipeline position, for example, the first position of the previous pipeline has a 7-byte matching, and therefore pos is 3, i.e. the matching is covered to the third position of the current pipeline:

1) If flag = FALSE and both Len0 and Off0 are 0, the result of the previous pipeline does not need to be merged with the matching result of the current pipeline, and none of the four positions of the previous pipeline had a match;

If there is a full match, i.e. Len1 = 8, Len2 = 7, Len3 = 6, or Len4 = 5: if Len1 is 8, Len1 is assigned to Len0, Off1 to Off0, and TRUE to flag; Len0, Off0 and flag are sent to the next pipeline for splicing, and 0 original bytes are output to the output module; otherwise, if Len2 is 7, Len2 is assigned to Len0, Off2 to Off0, and TRUE to flag; Len0, Off0 and flag are sent to the next pipeline for splicing, and 1 original byte is output to the output module; otherwise, if Len3 is 6, Len3 is assigned to Len0, Off3 to Off0, and TRUE to flag; Len0, Off0 and flag are sent to the next pipeline for splicing, and 2 original bytes are output to the output module; otherwise, if Len4 is 5, Len4 is assigned to Len0, Off4 to Off0, and TRUE to flag; Len0, Off0 and flag are sent to the next pipeline for splicing, and 3 original bytes are output to the output module;

otherwise, if there is a match but no full match, i.e. Len1 is greater than or equal to 4 and less than 8, or Len2 is greater than or equal to 4 and less than 7, or Len3 is greater than or equal to 4 and less than 6, or Len4 is greater than or equal to 4 and less than 5: if Len1 is greater than or equal to 4 and less than 8, Len1 is assigned to Len0, Off1 to Off0, FALSE to flag, and Len1-4 to pos; Len0, Off0, flag and pos are sent to the next pipeline, and 0 original bytes are sent to the output module; otherwise, if Len2 is greater than or equal to 4 and less than 7, Len2 is assigned to Len0, Off2 to Off0, FALSE to flag, and Len2-3 to pos; Len0, Off0, flag and pos are sent to the next pipeline, and 1 original byte is sent to the output module; otherwise, if Len3 is greater than or equal to 4 and less than 6, Len3 is assigned to Len0, Off3 to Off0, FALSE to flag, and Len3-2 to pos; Len0, Off0, flag and pos are sent to the next pipeline, and 2 original bytes are sent to the output module; otherwise, if Len4 is greater than or equal to 4 and less than 5, Len4 is assigned to Len0, Off4 to Off0, FALSE to flag, and Len4-1 to pos; Len0, Off0, flag and pos are sent to the next pipeline, and 3 original bytes are sent to the output module.

Otherwise, if none of the current four positions has a match, the 4 original bytes are sent to the output module, and Len0 = 0, Off0 = 0, flag = FALSE are sent to the next pipeline.

2) If flag = FALSE and neither Len0 nor Off0 is 0, the matching result of the previous pipeline needs to be output in the current beat, and that matching result covers the current pipeline up to position pos;

(Len0, Off0) is output as matching information to the output module; the positions from pos+1 through the fourth position are then examined and handled according to the method in case 1) of the previous-beat result merging module;

3) If flag = TRUE, it indicates that the merging operation with the last beat matching result is required.

If Off1 = Off0, Off2 = Off0, Off3 = Off0, or Off4 = Off0, merging is possible.

If Off1 = Off0, Off0 is unchanged and Len0 = Len0 + Len1 - 4; if Len1 = 8, flag = TRUE, (Len0, Off0) and flag are passed to the next pipeline, and 0 original bytes are sent to the output module; otherwise, flag = FALSE, pos = Len1 - 4, (Len0, Off0), flag and pos are passed to the next pipeline, and 0 original bytes are sent to the output module.

Otherwise, if Off2 = Off0, Off0 is unchanged and Len0 = Len0 + Len2 - 3; if Len2 = 7, flag = TRUE, (Len0, Off0) and flag are passed to the next pipeline, and 0 original bytes are sent to the output module; otherwise, flag = FALSE, pos = Len2 - 3, (Len0, Off0), flag and pos are passed to the next pipeline, and 0 original bytes are sent to the output module.

Otherwise, if Off3 = Off0, Off0 is unchanged and Len0 = Len0 + Len3 - 2; if Len3 = 6, flag = TRUE, (Len0, Off0) and flag are passed to the next pipeline, and 0 original bytes are sent to the output module; otherwise, flag = FALSE, pos = Len3 - 2, (Len0, Off0), flag and pos are passed to the next pipeline, and 0 original bytes are sent to the output module.

Otherwise, if Off4 = Off0, Off0 is unchanged and Len0 = Len0 + Len4 - 1; if Len4 = 5, flag = TRUE, (Len0, Off0) and flag are passed to the next pipeline, and 0 original bytes are sent to the output module; otherwise, flag = FALSE, pos = Len4 - 1, (Len0, Off0), flag and pos are passed to the next pipeline, and 0 original bytes are sent to the output module.

Otherwise, merging cannot be performed; (Len0, Off0) is output directly through the output module, and Len0 = 0, Off0 = 0, flag = FALSE are sent to the next pipeline.
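The three cases of module M5 can be summarised as a small state-transition sketch. It is a software model under stated assumptions rather than the patented circuit: the names `select`, `merge_beat`, `FULL` and `IDLE` are hypothetical; pos is set to 0 whenever the text does not assign it; in case 2) the search is assumed to start at the first position not covered by the previous match, and only the bytes after the covered positions are counted as original bytes; and in case 3) a position is only merged if it actually holds a match.

```python
FULL = (8, 7, 6, 5)   # full-match length for positions 1..4 of a beat
MIN_MATCH = 4
IDLE = {"Len0": 0, "Off0": 0, "flag": False, "pos": 0}


def select(matches, start):
    """Case 1) selection over positions start..3 (0-based).

    Full matches are tried first, then partial ones. Returns the state for
    the next pipeline and the number of original bytes to output.
    """
    for i in range(start, 4):
        length, off = matches[i]
        if length == FULL[i]:
            return {"Len0": length, "Off0": off, "flag": True, "pos": 0}, i - start
    for i in range(start, 4):
        length, off = matches[i]
        if length >= MIN_MATCH:
            pos = length - (4 - i)   # Len1-4, Len2-3, Len3-2, Len4-1
            return {"Len0": length, "Off0": off, "flag": False, "pos": pos}, i - start
    return dict(IDLE), 4 - start


def merge_beat(state, matches):
    """Return (next_state, emitted_match_or_None, original_byte_count)."""
    len0, off0 = state["Len0"], state["Off0"]
    if not state["flag"]:
        if len0 == 0 and off0 == 0:                  # case 1): nothing carried
            nxt, literals = select(matches, 0)
            return nxt, None, literals
        nxt, literals = select(matches, state["pos"])  # case 2): emit, then search
        return nxt, (len0, off0), literals
    for i, (length, off) in enumerate(matches):      # case 3): try to merge
        if length >= MIN_MATCH and off == off0:
            merged = len0 + length - (4 - i)
            if length == FULL[i]:
                return {"Len0": merged, "Off0": off0, "flag": True, "pos": 0}, None, 0
            pos = length - (4 - i)
            return {"Len0": merged, "Off0": off0, "flag": False, "pos": pos}, None, 0
    return dict(IDLE), (len0, off0), 0               # no merge: emit carried match


state, emitted, literals = merge_beat(dict(IDLE), [(7, 10), (0, 0), (0, 0), (0, 0)])
print(state, emitted, literals)
```

With a 7-byte match at the first position and nothing carried over, the sketch yields flag = FALSE and pos = 3, which agrees with the pos example given for module M5.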

The output module M6 outputs the original bytes, outputs the matching result, or outputs nothing, according to the result of the previous-beat result merging module.

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, the implementation takes the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims

1. An LZ process hardware compression method based on parallel and pipeline design, characterized in that the method comprises the following steps:

in the first step, a six-stage pipeline is designed using the characteristics of hardware;

in the second step, the LZ process is divided into six processes: calculating the hash, looking up the hash table, updating the hash table, searching for matching information, merging with the previous beat, and outputting; each process is designed to complete in one cycle;

in the six-stage pipeline design, at most 8 bytes are matched at each position of the pipeline; for the current position, a possibly matching position is found through the hash table, and at most 8 bytes and at least 3 or 4 bytes are matched;

the merging process with the previous beat is as follows: if an 8-byte match is obtained at a position processed in the previous beat, a longer match is possible, and the matching result of the previous beat needs to be sent to the current beat and merged with the matching result of the current beat;

the six processes into which the LZ process is divided, namely calculating the hash, looking up the hash table, updating the hash table, searching for matching information, merging with the previous beat, and outputting, are executed respectively by a hash calculation module, a hash table lookup module, a hash table updating module, a matching information search module, a previous-beat merging module, and an output module, specifically including:

after the hash table lookup module has looked up, for the hash value of the first position, a possibly matching offset position in the hash table, the result is sent to the hash table updating module; meanwhile, the hash table lookup module prepares to look up a possibly matching offset position in the hash table for the hash value of the second position;

after the hash table updating module has updated the hash table for the hash value of the first position, the matched position is sent to the matching module; meanwhile, the hash table updating module prepares to update the hash table for the hash value of the second position;

2. A computer program implementing the LZ process hardware compression method based on parallel and pipeline design of claim 1.

3. An information data processing terminal implementing the LZ process hardware compression method based on parallel and pipeline design according to claim 1.

4. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the LZ process hardware compression method based on parallel and pipeline design of any of claims 1-3.

5. An LZ process hardware compression system based on parallel and pipeline design implementing the LZ process hardware compression method based on parallel and pipeline design of claim 1, wherein the LZ process hardware compression system based on parallel and pipeline design comprises:

a hash calculation module, a hash table lookup module, a hash table updating module, a matching module, a previous-beat result merging module, and an output module;

each module is internally designed in parallel, so that a plurality of positions are processed simultaneously within each module; the six hardware modules form a six-stage pipeline, each module being designed to complete in one clock cycle;

the hash calculation module is used for calculating four hash values for four positions at the same time;

the hash table lookup module is used for simultaneously comparing the hash values of the four positions and finding possible matching positions; for a position whose hash value needs to be compared, three comparisons are performed simultaneously: first, the hash value of the current position is compared with the hash values of the preceding positions in the current pipeline; second, using a bypass technique, it is compared with the hash values of the previous pipeline that are still to be written into the hash table; third, it is compared with the hash table; the possibly matching position is then selected in the order first/second/third;

the hash table updating module is used for writing the hash values of the four positions of the pipeline into the hash table at the same time; if a write conflict occurs, only the hash value of the later position is written;

the matching module is used for simultaneously performing, for the four positions, matching of at most 8/7/6/5 bytes respectively and of at least 4 bytes; if position 1/2/3/4 reaches an 8/7/6/5-byte match, that position is a full match;

the previous-beat result merging module is used for attempting to merge the matching result of the current pipeline with the matching result of the previous pipeline, and for passing the merged result to the next pipeline in the form of parameters;

and the output module is used for outputting the original bytes, outputting the matching result, or outputting nothing, according to the result of the previous-beat result merging module.

6. The LZ process hardware compression system based on parallel and pipeline design of claim 5, wherein the previous-beat result merging module is further used for the following:

if the previous pipeline passes no matching result and the current pipeline has a full match, the foremost full match in the current pipeline is selected and passed to the previous-beat result merging module of the next pipeline, and the bytes before that position in the current pipeline are passed to the output module as original bytes; otherwise, if the current pipeline has a match but no full match, the foremost match in the current pipeline is selected and passed to the next pipeline together with a message that no merging is needed, and the bytes before that position in the current pipeline are passed to the output module as original bytes; otherwise, if the current pipeline has no match, the next pipeline is informed that there is no match and no merging is needed, and the four bytes in the current pipeline are passed to the output module as original bytes;

if the previous pipeline has a matching result that does not need to be merged, the matching result is passed to the output module, and the position in the current pipeline up to which the matching result extends is determined: if there is a full match after that position in the current pipeline, the foremost full match after that position is selected and passed to the previous-beat result merging module of the next pipeline, and the bytes after that position and before the match are passed to the output module as original bytes; if there is a match but no full match after that position, the foremost match after that position is selected and passed to the next pipeline together with a message that no merging is needed, and the bytes after that position and before the match are passed to the output module as original bytes; otherwise, if there is no match after that position in the current pipeline, the next pipeline is informed that there is no match and no merging is needed, and the four bytes in the current pipeline are passed to the output module as original bytes;

if the previous pipeline has a full match that needs to be merged with the matching result of the current pipeline, and the offset value in the matching result of the previous pipeline is the same as the offset value of a matching result of the current pipeline, merging is possible, and the matching length is merged as Len0 = Len0 + Leni - (5 - i), where i means that the matching result at the i-th position of the current pipeline is merged with the matching result of the previous pipeline; the output module is informed that no output is needed; if the result to be merged in the current pipeline is a full match, the matching result and a message that matching needs to continue are passed to the next pipeline; if the result to be merged in the current pipeline is not a full match, the matching result and a message that matching does not need to continue are passed to the next pipeline;

if the offset value in the matching result of the previous pipeline differs from the offset values of the matching results at all four positions of the current pipeline, merging is not possible; the matching result is passed directly to the output module, and a message of no match and no merging is passed to the next pipeline.

7. A manufacturing computer control apparatus carrying at least the LZ process hardware compression system based on parallel and pipeline design of any of claims 5 to 6.