CN111817724B - Data compression circuit - Google Patents

Publication number
CN111817724B
Authority
CN
China
Prior art keywords
data
compressed
information
output
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010710080.5A
Other languages
Chinese (zh)
Other versions
CN111817724A (en)
Inventor
李树青
王江
张永兴
孙华锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202010710080.5A
Publication of CN111817724A
Application granted
Publication of CN111817724B

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6017 Methods or arrangements to increase the throughput

Abstract

The invention discloses a data compression circuit comprising: a data input module for reading data to be compressed at a preset time interval and grouping it to obtain grouped data groups to be compressed; index calculation modules for calculating, in parallel, the index value corresponding to the data original text of each data group to be compressed; a multi-port index table storage module for looking up the data information corresponding to each index value; a matching search module for determining encoding output information from the data groups to be compressed and the data information; and an encoding module for encoding the encoding output information into a corresponding bit stream. Because matches are found from the data information stored in the index table of the multi-port index table storage module, the circuit does not need an input cache holding the original text of the entire search range, which reduces cost. The parallel processing architecture of a preset number of index calculation modules handles multiple groups of input data at once, which improves processing throughput.

Description

Data compression circuit
Technical Field
The invention relates to the field of data compression, in particular to a data compression circuit.
Background
With the rapid development of new industries such as big data, AI and 5G, the volume of generated data is growing exponentially, putting huge pressure on existing storage devices. As cloud computing replaces traditional computing architectures, the structure of data storage is changing: computing and storage resources are further centralized in data centers, which puts additional pressure on server storage. Faced with this continuously growing mass of data, data compression has become one of the effective methods for reducing the storage load and storage cost of servers.
Traditional software compression cannot meet the requirements of current servers because it consumes substantial CPU resources and has low compression throughput; hardware compression circuits with a high compression rate and high throughput are the direction of future development.
Mainstream conventional hardware circuits for data compression work as follows. Input data to be compressed is stored in an input cache whose length equals the sliding window defined by the index range. A hash operation is performed on N bytes of input data at a time, and the hash table is searched with the hash value to find the corresponding address linked list. The address linked list is traversed in order; for each address, the corresponding original text is read from the input cache and compared with the current input, and if the two are equal the current data is matched. The circuit then keeps reading subsequent data from the input cache and comparing it with the input data to find as long a match as possible; this length is called the matching length of the address. The matching lengths of all addresses are compared, and the address with the longest match is selected for output. Finally, the input is shifted backward by one byte. This prior-art circuit has two drawbacks. First, it requires an input cache: because the search window of a compression algorithm such as the LZ4 (a lossless compression algorithm) protocol is as long as 64 KB, the input cache requires at least 64 KB of SRAM (static random access memory). SRAM occupies a large area in a chip, and chip cost is directly proportional to area, so the hardware circuit is expensive. Second, only one data item can be read from a block of input cache in each cycle, and the conventional circuit must read the input cache many times to complete the comparison, so data throughput and data compression efficiency are low.
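The conventional search loop described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the prior-art circuit itself: the names `MIN_MATCH`, `WINDOW`, `longest_match` and `scan` are hypothetical, and a dictionary keyed by the raw N bytes stands in for the hash table and its address linked lists.

```python
from collections import defaultdict

MIN_MATCH = 4        # assumed N: bytes hashed per step
WINDOW = 64 * 1024   # assumed sliding-window length held in the input cache


def longest_match(buf, pos, chains):
    """Return (address, length) of the longest earlier match for buf[pos:]."""
    key = buf[pos:pos + MIN_MATCH]
    best_addr, best_len = -1, 0
    # Traverse the address list; every candidate costs reads of the input cache.
    for addr in chains.get(key, []):
        if pos - addr > WINDOW:
            continue  # candidate lies outside the sliding window
        length = 0
        # Keep reading the cache to extend the match as far as possible.
        while pos + length < len(buf) and buf[addr + length] == buf[pos + length]:
            length += 1
        if length >= MIN_MATCH and length > best_len:
            best_addr, best_len = addr, length
    return best_addr, best_len


def scan(buf):
    """Search at every position, then advance the input by one byte."""
    chains = defaultdict(list)
    results = []
    for pos in range(len(buf) - MIN_MATCH + 1):
        results.append(longest_match(buf, pos, chains))
        chains[buf[pos:pos + MIN_MATCH]].append(pos)
    return results
```

The inner `while` loop is the part that needs repeated reads of the input cache, which is the throughput bottleneck the paragraph above points out.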
Therefore, how to provide a data compression circuit to improve data throughput and data compression efficiency and reduce hardware cost is a problem that needs to be solved urgently today.
Disclosure of Invention
The invention aims to provide a data compression circuit to improve data throughput rate and data compression efficiency and reduce hardware cost.
To solve the above technical problem, the present invention provides a data compression circuit, including:
the data input module is used for reading data to be compressed according to a preset time interval and grouping the data to be compressed to obtain a grouped data group to be compressed; the data group to be compressed comprises a data original text and an original text initial address;
a preset number of index calculation modules, used for calculating in parallel the index value corresponding to the data original text of each data group to be compressed; wherein the preset number is greater than or equal to the number of groups;
the multi-port index table storage module is used for searching data information corresponding to the index values output by the index calculation module; the data information comprises a data original text and an original text initial address;
the matching search module is used for determining coding output information according to the data group to be compressed and the data information; the encoding output information comprises a data original text, index position information and connection length information;
and the coding module is used for coding the coded output information into a corresponding bit stream according to the format of a preset data compression protocol.
Optionally, the circuit further includes:
the write control module is used for updating expired data information in the data information stored by the multi-port index table storage module according to the data group to be compressed, the data information and the index value; wherein the difference between the original text head address in the expired data information and the original text head address in the target data group to be compressed is greater than a preset maximum index range, and the target data group to be compressed is the data group to be compressed whose index value equals the index value corresponding to the expired data information.
Optionally, the circuit further includes:
the output control module is used for updating the expired data information in the data information output by the multi-port index table storage module according to the index value output by the multi-port index table storage module and the updated data information output by the write control module; wherein the updated data information comprises the data group to be compressed and the index value corresponding to each piece of expired data information.
Optionally, the update data information is output data information in which a write enable signal output by the write control module is 1; each output data information comprises a data group to be compressed, an index value and a write enable signal which correspond to each other.
Optionally, the write control module includes a first multi-stage pipeline register, and a first stage register of each first multi-stage pipeline register is configured to store one of the data information output by the multi-port index table storage module;
correspondingly, the output control module comprises second multistage pipeline registers, and a first stage register of each second multistage pipeline register is used for storing one data information output by the multi-port index table storage module.
Optionally, the first multistage pipeline register and the second multistage pipeline register have the same stage number.
Optionally, when the index value is a hash value, each index calculation module is specifically configured to calculate hash values corresponding to respective data originals of the data group to be compressed in parallel;
correspondingly, the multi-port index table storage module is specifically configured to search for data information corresponding to the hash value output by the index calculation module.
Optionally, the encoding module is specifically configured to encode the encoded output information into a corresponding bitstream according to a format of an LZ4 protocol.
Optionally, the circuit further includes:
and the refreshing control module is used for updating the address information of the data to be compressed and/or the original initial address of the data information.
Optionally, the matching search module includes:
the third multi-stage pipeline register is used for storing the input data group to be compressed and the latest data information; the first-stage register in the third multi-stage pipeline register is used for storing the latest data information of the packet quantity received at the current moment and the corresponding data group to be compressed;
the multi-stage pipeline logic processing circuit is used for marking invalid data in the data stored in the third multi-stage pipeline register and the residual linked list memory to obtain unmarked valid data; calculating the coding output information according to the effective data, and outputting the coding output information to the coding module; storing the latest data information corresponding to the uncoded data group to be compressed in the effective data into the residual linked list memory; the uncoded data group to be compressed is a data group to be compressed in the valid data, which does not belong to the coded output information, and the invalid data comprises latest data information with a data original text unequal to a data original text of a corresponding data group to be compressed and latest data information with a difference between an original text head address and an original text head address of the corresponding data group to be compressed larger than a preset maximum index range;
and the residual linked list memory is used for storing the latest data information corresponding to the uncoded data group to be compressed.
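The two invalid-data conditions named for the matching search module (a stored data original text that differs from the current one, and an address difference beyond the maximum index range) can be sketched as a simple filter. This is a minimal illustration, not the pipelined logic circuit: `MAX_INDEX_RANGE`, `is_valid` and `filter_candidates` are hypothetical names, the 64 KB range is an assumption, and treating an empty table entry as invalid is also an assumption.

```python
MAX_INDEX_RANGE = 64 * 1024  # assumed preset maximum index range


def is_valid(group, info):
    """group and info are (data_original_text, original_text_head_address)."""
    text, addr = group
    if info is None:
        return False  # assumption: an empty entry is never a match candidate
    cand_text, cand_addr = info
    if cand_text != text:
        return False  # index collision: the stored text differs
    return addr - cand_addr <= MAX_INDEX_RANGE


def filter_candidates(groups, infos):
    """Return (group, info, offset) for every pair left unmarked (valid)."""
    return [(g, i, g[1] - i[1]) for g, i in zip(groups, infos) if is_valid(g, i)]
```

The offset returned with each valid pair is the address difference, which is a natural candidate for the index position information in the encoding output.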
The invention provides a data compression circuit comprising: a data input module for reading data to be compressed at a preset time interval and grouping it to obtain grouped data groups to be compressed, each data group to be compressed comprising a data original text and an original text initial address; a preset number of index calculation modules for calculating in parallel the index value corresponding to the data original text of each data group to be compressed, the preset number being greater than or equal to the number of groups; a multi-port index table storage module for looking up the data information corresponding to each index value output by the index calculation modules, the data information comprising a data original text and an original text initial address; a matching search module for determining encoding output information from the data groups to be compressed and the data information, the encoding output information comprising a data original text, index position information and connection length information; and an encoding module for encoding the encoding output information into a corresponding bit stream according to the format of a preset data compression protocol.
Therefore, because the invention searches and outputs the data information in the index table stored in the multi-port index table storage module, it avoids using an input cache to store the original text of the entire search range, which reduces chip area and cost. The parallel processing architecture of a preset number of index calculation modules allows multiple groups of input data to be processed at once, which improves processing throughput. The circuit adopts a forward processing structure without branch jumps or backtracking, so it can process a batch of new input data in every clock cycle and sustain a continuous, stable, high throughput.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a block diagram of a data compression circuit according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of another data compression circuit according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an output control module according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a matching search module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a block diagram of a data compression circuit according to an embodiment of the present invention. The circuit may include:
the data input module 10 is configured to read data to be compressed at preset time intervals, and group the data to be compressed to obtain a grouped data group to be compressed; the data group to be compressed comprises a data original text and an original text initial address;
a preset number of index calculation modules 20, configured to calculate, in parallel, index values corresponding to respective data originals of the data group to be compressed; wherein the preset number is greater than or equal to the number of the groups;
a multi-port index table storage module 30, configured to search data information corresponding to each index value output by the index calculation module 20; the data information comprises a data original text and an original text initial address;
the matching search module 40 is used for determining the coding output information according to the data group to be compressed and the data information; the encoding output information comprises a data original text, index position information and connection length information;
and the encoding module 50 is configured to encode the encoded output information into a corresponding bit stream according to a format of a preset data compression protocol.
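As an illustration of what the encoding step might produce, the sketch below encodes one sequence in the LZ4 block format, assuming LZ4 is the preset data compression protocol. The mapping of the patent's terms is an interpretation: the data original text is taken as the literals, the index position information as the 16-bit offset, and the connection length information as the match length. The function names are hypothetical, and end-of-block rules (the final literal-only sequence) are not handled.

```python
def _length_ext(value):
    """Extra length bytes used when a 4-bit token field saturates at 15."""
    out = bytearray()
    value -= 15
    while value >= 255:
        out.append(255)
        value -= 255
    out.append(value)
    return bytes(out)


def encode_sequence(literals, offset, match_len):
    """Encode one LZ4 block sequence: token, literals, offset, match length."""
    assert 1 <= offset <= 0xFFFF and match_len >= 4
    lit_len = len(literals)
    ml = match_len - 4  # LZ4 stores the match length minus the minimum of 4
    token = (min(lit_len, 15) << 4) | min(ml, 15)
    out = bytearray([token])
    if lit_len >= 15:
        out += _length_ext(lit_len)
    out += literals
    out += offset.to_bytes(2, "little")  # offset is little-endian
    if ml >= 15:
        out += _length_ext(ml)
    return bytes(out)
```

For example, three literals followed by a match of length 4 at offset 3 yields a six-byte sequence: one token byte, the three literal bytes, and the two offset bytes.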
It is understood that the data input module 10 in this embodiment may read the data to be compressed at preset time intervals (i.e., data reading periods), group the data to obtain as many data groups to be compressed as the number of groups, and send each data group to be compressed to a corresponding index calculation module 20. That is, a preset number of output ends of the data input module 10 are connected one-to-one to the index calculation modules 20 and are used to send each grouped data group to be compressed to its corresponding index calculation module 20.
The data reading time, the amount of data read and the grouping manner of the data input module 10 in this embodiment may be set by the designer according to the usage scenario and user requirements. For example, when the preset time interval is one clock cycle, the data input module 10 may read L bytes of data to be compressed in each clock cycle and divide them into N groups (N being the number of groups) to obtain N data groups to be compressed, each of which is sent to an independent index calculation module 20 to calculate the corresponding index value. The data original texts of the N data groups to be compressed may overlap. Alternatively, they may not overlap: the L bytes of data to be compressed may be divided equally into N data groups to be compressed, so that the data original text of each data group to be compressed is L/N bytes long.
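The grouping of one cycle's input can be sketched as follows. This is a minimal model under assumptions: `group_input` and its parameters are hypothetical names, each group starts L/N bytes after the previous one, and a `width` larger than that stride produces overlapping original texts. In hardware an overlapping last group would borrow bytes from the next read; here the slice is simply truncated.

```python
def group_input(chunk, base_addr, n_groups, width):
    """Split one cycle's L bytes into N (original_text, head_address) groups."""
    stride = len(chunk) // n_groups  # L / N bytes between group starts
    groups = []
    for i in range(n_groups):
        start = i * stride
        # width == stride gives non-overlapping groups; width > stride overlaps.
        groups.append((chunk[start:start + width], base_addr + start))
    return groups
```

With `chunk = b"abcdefgh"`, `base_addr = 100` and `n_groups = 4`, a `width` of 2 gives the equal, non-overlapping split, while a `width` of 4 gives overlapping four-byte texts at addresses 100, 102, 104 and 106.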
It should be noted that, in this embodiment, the preset number of index calculation modules 20 may calculate in parallel the index values corresponding to the data original texts of the data groups to be compressed, and output the resulting index values, one per group, to the multi-port index table storage module 30. That is, as many index calculation modules 20 as there are groups may simultaneously each perform index calculation on the one data group to be compressed that it receives, to obtain the corresponding index value. The specific manner of index calculation, that is, the specific type of index value, may be set by the designer. As shown in fig. 2, the index calculation modules 20 may specifically be hash calculation modules that calculate in parallel the hash values corresponding to the data original texts of the data groups to be compressed and output those hash values to the multi-port index table storage module 30. In that case the index value is a hash value, and the multi-port index table storage module 30 is a multi-port hash table storage module storing a hash table that contains hash values and the corresponding data information. The index value may also be a value produced by another indexing method, as long as the corresponding data information can be indexed in the multi-port index table storage module 30 with it; this embodiment does not limit this.
Correspondingly, the number of index calculation modules 20 in this embodiment, that is, the value of the preset number, may be set by the designer. The preset number may equal the number of groups, so that both are N. As shown in fig. 2, when the number of groups is N, the data input module 10 may output each of the N data groups to be compressed to a corresponding index calculation module 20, and the N index calculation modules 20, after calculating the corresponding hash values, may output the N hash values to the multi-port index table storage module 30 through their respective output ends (port 1 to port N). The preset number may also be greater than the number of groups, as long as enough index calculation modules 20 are available to perform index calculation in parallel on all of the data groups to be compressed; this embodiment does not limit this.
The multi-port index table storage module 30 in this embodiment may store an index table (i.e., an index linked list), such as a hash table, that contains index values and the corresponding data information, so that it can look up the corresponding data information from the index value output by each index calculation module 20. Specifically, when the preset number is N and the preset time interval is one clock cycle, the multi-port index table storage module 30 may have N read ports that can all be accessed in the same clock cycle, returning the N pieces of data information addressed by the index values output by the N index calculation modules 20. That is, each of the preset number of read ports of the multi-port index table storage module 30 is connected to the output end of a corresponding index calculation module 20.
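A software model of the hash calculation and the multi-port table may help make the data flow concrete. It is a sketch under assumptions: the patent does not specify the hash function or table size, so the multiplicative hash, `TABLE_BITS`, and the class and method names below are all hypothetical. Each table entry holds the data information itself (original text and head address), which is why no separate input cache is needed to verify a candidate.

```python
TABLE_BITS = 12  # assumed table size: 2**12 entries


def hash_index(text):
    """Multiplicative hash of up to 4 bytes of original text (assumed scheme)."""
    value = int.from_bytes(text[:4].ljust(4, b"\0"), "little")
    return ((value * 2654435761) & 0xFFFFFFFF) >> (32 - TABLE_BITS)


class MultiPortIndexTable:
    """Each entry stores data information: (original_text, head_address)."""

    def __init__(self):
        self.entries = [None] * (1 << TABLE_BITS)

    def read_ports(self, indices):
        # Models N read ports: all N lookups are served in the same cycle.
        return [self.entries[i] for i in indices]

    def write_ports(self, writes):
        # Models M write ports: each write is (index, data_information).
        for index, info in writes:
            self.entries[index] = info
```

In this model, one cycle corresponds to computing `hash_index` for all N groups and passing the resulting list to a single `read_ports` call.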
It is understood that the data compression circuit provided in this embodiment may further include a write control module for updating expired data information in the data information stored in the multi-port index table storage module 30 according to the data group to be compressed, the data information and the index value. The difference between the original text head address in the expired data information and the original text head address in the target data group to be compressed is greater than a preset maximum index range, and the target data group to be compressed is the data group to be compressed whose index value equals the index value corresponding to the expired data information.
That is, in this embodiment, the write control module, which is connected to the output ends of the data input module 10, the index calculation modules 20 and the multi-port index table storage module 30, may update the expired data information stored in the multi-port index table storage module 30. Expired data information is any of the N pieces of data information output by the multi-port index table storage module 30 whose original text head address differs from the original text head address of the corresponding data group to be compressed by more than the preset maximum index range; the write control module replaces it with the corresponding data group to be compressed (i.e., the target data group to be compressed). For example, for the data information currently read, the write control module may determine whether its original text head address is outside the index range by checking whether the difference between that address and the original text head address in the corresponding data group to be compressed is greater than the preset maximum index range. If so, it packs the data original text and the original text head address of the current input data (i.e., the data group to be compressed corresponding to the data information) into one piece of data information and writes it into the multi-port index table storage module 30 to replace the data information currently read; otherwise, no write is performed. In other words, after the data information output by the multi-port index table storage module 30 is sent to the write control module, the write control module subtracts the address information in the data information (such as the original text head address) from the current address (i.e., the address information in the corresponding data group to be compressed). If the result is greater than the maximum index range, the data information is "expired": the write control module, which receives the currently input data original text (the data original text in the corresponding data group to be compressed) at the same time, packs that data original text and the value of the current address counter into data information and writes it to the "expired" position addressed by the currently input hash value.
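The expiry decision described above reduces to one subtraction and one comparison per port. The following sketch models it for a single port; `MAX_INDEX_RANGE` and `write_control` are hypothetical names, the 64 KB range is an assumption, and treating an empty table entry as expired is an assumption the text does not state.

```python
MAX_INDEX_RANGE = 64 * 1024  # assumed preset maximum index range


def write_control(group, info, index):
    """Decide one port's write: returns (write_enable, index, data_information).

    group is the current (original_text, head_address); info is the entry
    read from the table for the same index, or None if the slot is empty.
    """
    text, cur_addr = group
    # Assumption: an empty slot is handled like an expired entry.
    expired = info is None or (cur_addr - info[1]) > MAX_INDEX_RANGE
    if expired:
        # Pack the current original text and address as the replacement entry.
        return 1, index, (text, cur_addr)
    return 0, index, info  # entry still in range: no write is performed
```

The returned triple mirrors the write enable signal, index value and data group that the write control module outputs in the embodiment.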
Specifically, to increase the circuit frequency, the write control module supports a pipelined implementation, so updating the expired data information stored in the multi-port index table storage module 30 takes multiple cycles. For example, the write control module may include first multi-stage pipeline registers, with the first-stage register of each first multi-stage pipeline register storing one piece of data information output by the multi-port index table storage module 30. For instance, the write control module may include N first multi-stage pipeline registers, and the N pieces of data information output by the multi-port index table storage module 30 in one cycle may each be written into the first-stage register (i.e., the first register of the pipeline in the input-to-output direction) of one first multi-stage pipeline register.
That is, as shown in fig. 2, the multi-port index table storage module 30 may have M write ports that can all be accessed in the same cycle, so that the M data items output by the write control module (i.e., the data groups to be compressed corresponding to the expired data information) are written as updated data information to replace the stored expired data information. In other words, the M write ports of the multi-port index table storage module 30 are connected one-to-one to the M output ends of the write control module that are used to update the expired data information stored in the multi-port index table storage module 30.
Correspondingly, as shown in fig. 2, the data compression circuit provided by this embodiment may further include an output control module whose inputs are connected to the multi-port index table storage module 30 and the write control module. It updates the expired data information in the data information output by the multi-port index table storage module 30 according to the index value output by the multi-port index table storage module 30 and the updated data information output by the write control module, where the updated data information comprises the data group to be compressed and the index value corresponding to each piece of expired data information. The data information output by the multi-port index table storage module 30 is therefore not input directly to the matching search module 40; instead, the output end of the multi-port index table storage module 30 is connected to the input end of the matching search module 40 through the output control module.
That is to say, the output control module may use the updated data information output by the write control module, namely the data groups to be compressed and the index values corresponding to the data information that needs to be updated (the expired data information), to update the expired data information among the data information output by the multi-port index table storage module 30, thereby obtaining the latest data information, which it sends to the matching search module 40.
Specifically, when the write control module is designed for multi-cycle operation, that is, when it includes first multi-stage pipeline registers, several data items that entered the write control module before the current input may still be pending in its pipeline. To compensate for this, the output control module may include a multi-stage pipeline that stores the data information output by the multi-port index table storage module 30. That is, the output control module may include second multi-stage pipeline registers, with the first-stage register of each second multi-stage pipeline register storing one piece of data information output by the multi-port index table storage module 30. For example, the output control module may include N second multi-stage pipeline registers, and the N pieces of data information output by the multi-port index table storage module 30 in one cycle may each be written into the first-stage register (i.e., the first register of the pipeline in the input-to-output direction) of one second multi-stage pipeline register.
Correspondingly, the output end of each stage of a second multi-stage pipeline register (a P-stage pipeline register) may include a data comparison module that compares the data information and corresponding index value stored in that stage with the current output data of the write control module to generate the data passed to the next pipeline stage. As shown in fig. 3, the current output data of the write control module consists of a data group to be compressed (i.e., the data original text data and the original text head address cnt), the corresponding index value (e.g., a hash value) and the corresponding write enable signal (we), and the output control module checks whether the write enable signal is 1. If the write enable signal is 1, the output data is compared in turn with the P stages of registers on the same port; where the index values are the same, the data group to be compressed replaces the data information (i.e., the expired data information) in that register as the data output to the next pipeline stage. If the write enable signal is 0, the corresponding output data may be ignored. Alternatively, if the write enable signal is 1, the output data may be compared in turn with the P stages of registers on the same port and, where both the index value and the data original text are the same, the original text head address in the data group to be compressed replaces the original text head address in the data information (i.e., the expired data information) in that register.
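The forwarding behaviour of the output control module on one port can be modelled as a list of P register stages that is patched whenever the write control module asserts its write enable. This is an illustrative sketch only: `forward_update` and `shift` are hypothetical names, each stage is modelled as an `(index, data_information)` pair, and only the first replacement rule (matching index values) is shown.

```python
def forward_update(stages, we, index, group):
    """Apply one write-control output to the P stages of one port.

    stages is a list of (index, data_information) pairs; group is the
    (original_text, head_address) that replaces any in-flight stale entry.
    """
    if we != 1:
        return list(stages)  # write enable is 0: the output is ignored
    # Replace the data information in every stage whose index value matches.
    return [(idx, group if idx == index else info) for idx, info in stages]


def shift(stages, new_entry):
    """Advance the pipeline one cycle: returns (new_stages, leaving_entry)."""
    leaving = stages[-1]
    return [new_entry] + stages[:-1], leaving
```

Patching entries while they are still in flight is what lets a lookup issued before a table write completes still deliver the latest data information to the matching search module.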
Specifically, this embodiment does not limit the multi-stage pipeline in the output control module, that is, neither the number of second multi-stage pipeline registers nor their number of stages. For example, the number of stages of the second multi-stage pipeline register may be equal to the number of stages of the first multi-stage pipeline register (P in fig. 3), so that the number of stages of the second multi-stage pipeline register matches the write enable latency of the write control module; the number of stages of the second multi-stage pipeline register may also be larger than that of the first multi-stage pipeline register. The number of second multi-stage pipeline registers may be equal to the number of first multi-stage pipeline registers, e.g. both equal to the number of packets N.
It should be noted that, the matching lookup module 40 in this embodiment may determine the information (i.e. the encoding output information) to be output to the encoding module 50 by using the input data information (e.g. the data information output by the multi-port index table storage module 30 or the data information output by the output control module) and the data group to be compressed output by the data input module 10.
Further, the matching lookup module 40 may support a pipeline implementation, that is, the matching lookup module 40 in this embodiment may include:
the third multi-stage pipeline register is used for storing the input data group to be compressed and the latest data information; the first-stage register in the third multi-stage pipeline register is used for storing the latest data information of the number of the packets received at the current time and the corresponding data group to be compressed;
the multi-stage pipeline logic processing circuit is used for marking invalid data in the data stored in the third multi-stage pipeline register and the residual linked list memory to obtain unmarked valid data; calculating coding output information according to the effective data, and outputting the coding output information to a coding module; storing the latest data information corresponding to the uncoded data group to be compressed in the effective data into a residual linked list memory; the uncoded data group to be compressed is the data group to be compressed in the valid data which does not belong to the coded output information, and the invalid data comprises the latest data information of which the data original text is not equal to the data original text of the corresponding data group to be compressed and the latest data information of which the difference between the original text head address and the original text head address of the corresponding data group to be compressed is larger than the preset maximum index range;
and the residual linked list memory is used for storing the latest data information corresponding to the uncoded data group to be compressed.
Specifically, as shown in fig. 4, the third multi-stage pipeline register (K-stage register) may store the N data information (a data information linked list, i.e. the latest data information) output by the output control module (or the multi-port index table storage module 30) on each of K occasions, together with the N data groups to be compressed (original text and address) output by the data input module 10. That is, each stage of register may store N data information and N data groups to be compressed, and the N data information and N data groups to be compressed currently received by the matching lookup module 40 may be stored in the first-stage register (i.e., the first register of the pipeline in the input-to-output direction). The multi-stage pipeline logic processing circuit may acquire the data in the third multi-stage pipeline register and the data in the residual linked list memory (the residual connection relationship left after the encoding output K stages earlier). It marks as invalid the expired data (for example, latest data information for which the difference between its original text head address and the original text head address of the corresponding data group to be compressed is larger than the preset maximum index range), and also marks as invalid the latest data information whose data original text is not equal to the data original text of the corresponding data group to be compressed, thereby obtaining the valid data that is not marked as invalid. By performing logic operations on all the valid data, the longest connection length (i.e., the connection length information) and the corresponding start address (the index position information) can be calculated, giving the encoding output information for encoding output. After the encoding output, the data information (i.e., the latest data information) corresponding to the remaining unencoded characters (i.e., the uncoded data groups to be compressed) may be sent to the residual linked list memory and stored as the residual connection relationship.
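The two invalidation rules described above (unequal data original text, and an address difference larger than the preset maximum index range) can be sketched in software as follows. This is an illustrative sketch only: the function name mark_valid and the tuple layout are hypothetical, and treating an empty table slot (None) as invalid is an assumption that the text does not state.

```python
from typing import List, Optional, Tuple

# (data original text, original text head address)
Info = Tuple[bytes, int]


def mark_valid(groups: List[Info], latest: List[Optional[Info]],
               max_range: int) -> List[bool]:
    """Return one flag per data group: True if its latest data
    information is valid, False if it is marked invalid."""
    flags = []
    for (g_text, g_addr), info in zip(groups, latest):
        if info is None:  # assumption: empty slot counts as invalid
            flags.append(False)
            continue
        i_text, i_addr = info
        same_text = i_text == g_text               # originals must be equal
        in_range = (g_addr - i_addr) <= max_range  # not expired
        flags.append(same_text and in_range)
    return flags
```

For example, with a maximum index range of 64, a candidate at address 2 for a data group at address 102 is marked invalid even if the original texts are equal, because the difference of 100 exceeds the range.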
That is, the residual linked list memory may store linked lists composed of address pairs that have not yet been encoded and output and that may still be extended. In other words, both the latest data information corresponding to uncoded data groups to be compressed among the current N latest data information, and the latest data information corresponding to uncoded data groups to be compressed among the previously stored latest data information, may be stored in the residual linked list memory. Part of the newly added N latest data information may thus be joined to part of the latest data information previously stored in the residual linked list memory to form a new linked list (i.e., a linked list that cannot yet be encoded and output because it may still be extended by subsequent data), which is stored in the residual linked list memory.
That is to say, the multi-stage pipeline logic processing circuit may use the index value to compare the data original text of each input data group to be compressed with the data original text of the corresponding data information (i.e. the latest data information). Data information whose data original text is the same as that of the corresponding data group to be compressed may be called a matching element. The circuit finds all matching elements in each stage of data information, and then calculates the longest connection path from the address relationship between successive data information and the original text addresses of the current data to be compressed, and encodes and outputs it. Finally, the data information that matches the last several consecutive bytes of the current data to be compressed is output as the residual connection relationship, and serves as the basis for judging the continuity between the subsequent data to be compressed and the current data to be compressed.
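One way to picture the longest connection path is as the longest run of consecutive positions whose matching elements all lie at the same distance from the current data. The sketch below makes that assumption explicit; the patent text does not spell out the exact logic, and the name longest_chain and the input layout are hypothetical. Each input item pairs the address of a data group to be compressed with the address of its matching element, or None when there is no valid match, and the data group addresses are assumed to be consecutive.

```python
from typing import List, Optional, Tuple


def longest_chain(
    matches: List[Tuple[int, Optional[int]]]
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (length, start address in the current data, start address
    of the matched data) of the longest run with a constant offset."""
    best: Tuple[int, Optional[int], Optional[int]] = (0, None, None)
    run_len = 0
    run_start = None
    prev_off = None
    for g_addr, m_addr in matches:
        off = None if m_addr is None else g_addr - m_addr
        if off is not None and off == prev_off:
            run_len += 1                 # same distance: chain continues
        elif off is not None:
            run_len = 1                  # new chain starts here
            run_start = (g_addr, m_addr)
        else:
            run_len = 0                  # no valid match: chain broken
            run_start = None
        prev_off = off
        if run_len > best[0]:
            best = (run_len, run_start[0], run_start[1])
    return best
```

A chain that is still growing at the last position is the kind of candidate that, per the description, would be kept as a residual connection relationship rather than encoded immediately; that bookkeeping is omitted from this sketch.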
It is understood that the encoding module 50 in this embodiment may be a device that performs compression encoding on the encoding output information output by the matching lookup module 40 according to the format of a preset data compression protocol. The specific device type of the encoding module 50, that is, the specific choice of the preset data compression protocol, may be set by the designer. The preset data compression protocol may be the LZ4 protocol; as shown in fig. 2, the encoding module 50 may specifically be an LZ4 encoding module, configured to encode the encoding output information output by the matching lookup module 40 into a corresponding bitstream according to the format defined by the LZ4 protocol. The preset data compression protocol may also be another data compression protocol that needs to search for matching characters (such as the LZ77 protocol); for example, the encoding module 50 may specifically be an LZ77 encoding module, configured to encode the encoding output information output by the matching lookup module 40 into a corresponding bitstream according to the format defined by the LZ77 protocol. This embodiment does not limit this.
Specifically, as shown in fig. 2, the data compression circuit provided in this embodiment may further include a refresh control module for updating the address information of the data to be compressed and/or the original text head address of the data information, so as to ensure that the data compression process of the data compression circuit proceeds smoothly. For example, when one data block to be compressed has been processed and the data information in the multi-port index table storage module 30 needs to be cleared so that the next data block does not erroneously reference it, the refresh control module may increase the address counter by one index window length, that is, increase the address of the data to be compressed by one index window length. Data information subsequently looked up in the multi-port index table storage module 30 is then identified as expired data information and is replaced without being referenced, so that the data information in the multi-port index table storage module 30 is cleared quickly.
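The effect of advancing the address counter can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant MAX_INDEX_RANGE, its value, and the function names are assumptions, and the index window length is taken to be equal to the preset maximum index range.

```python
MAX_INDEX_RANGE = 65535  # assumed value, for illustration only


def is_expired(entry_addr: int, current_addr: int,
               max_range: int = MAX_INDEX_RANGE) -> bool:
    """Expired means the address difference exceeds the maximum range."""
    return current_addr - entry_addr > max_range


def advance_counter(cnt: int, window: int = MAX_INDEX_RANGE) -> int:
    """Skip one index window so that every stored entry becomes expired."""
    return cnt + window
```

Every entry written during the finished block has an address smaller than the old counter value, so after the jump its distance to the counter is at least one more than the window length and it is treated as expired without the table having to be erased entry by entry.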
Correspondingly, the refresh control module may need to refresh data, for example when the data address (such as the original text head address) of the current data to be compressed and the data address of the data information in the multi-port index table storage module 30 meet a certain condition. When the refresh starts, the refresh control module may notify the data input module to suspend reading data, while the flip-flops in the write control module, the output control module and the matching lookup module 40 hold their current state, thereby freezing the flow of data in the pipeline. The data in the multi-port index table storage module 30 is then read in sequence, and an operation is performed on the address of the current data (i.e., the value of the address counter) and each data address in the multi-port index table storage module 30 to obtain a new address, which is written back to the multi-port index table storage module 30. Next, the data addresses in the pipelines of the write control module, the output control module and the matching lookup module 40 are operated on with the address of the current data to obtain new addresses, which are written back. Finally, the address of the current data is updated and the refresh process ends.
In this embodiment, looking up and outputting the data information in the index table stored in the multi-port index table storage module 30 avoids the need for an input buffer that stores the original text data of the whole search range, thereby reducing chip area and cost. Moreover, the parallel processing architecture formed by the preset number of index calculation modules 20 allows multiple groups of input data to be processed at one time, which improves the processing throughput. The data compression circuit provided by the embodiment of the invention adopts a forward processing structure without branch jumps or backtracking, so that a new batch of input data can be processed in each clock cycle, ensuring a continuous and stable high throughput.
The data compression circuit provided by the invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A data compression circuit, comprising:
the data input module is used for reading data to be compressed according to a preset time interval and grouping the data to be compressed to obtain a grouped data group to be compressed; the data group to be compressed comprises a data original text and an original text initial address;
the index calculation modules with preset quantity are used for calculating the index values corresponding to the data originals of the data group to be compressed in parallel; wherein the preset number is greater than or equal to the number of packets;
the multi-port index table storage module is used for searching data information corresponding to the index values output by the index calculation module; the data information comprises a data original text and an original text initial address;
the matching search module is used for determining coding output information according to the data group to be compressed and the data information; the encoding output information comprises a data original text, index position information and connection length information;
the encoding module is used for encoding the encoded output information into a corresponding bit stream according to a format of a preset data compression protocol;
the data compression circuit further comprises:
the write-in control module is used for updating expired data information in the data information stored by the multi-port index table storage module according to the data group to be compressed, the data information and the index value; the difference between the original text head address in the expired data information and the original text head address in a target data group to be compressed is larger than a preset maximum index range, and the target data group to be compressed is the data group to be compressed whose index value is equal to that of the expired data information;
the data compression circuit further comprises:
the output control module is used for updating the expired data information in the data information output by the multi-port index table storage module according to the index value output by the multi-port index table storage module and the update data information output by the write-in control module; wherein the update data information comprises the data group to be compressed and the index value corresponding to each piece of expired data information.
2. The data compression circuit according to claim 1, wherein the update data information is output data information in which a write enable signal output by the write control module is 1; each output data information comprises a data group to be compressed, an index value and a write enable signal which correspond to each other.
3. The data compression circuit of claim 1 wherein the write control module includes first multi-stage pipeline registers, a first stage register of each of the first multi-stage pipeline registers for storing one of the data information output by the multi-port index table storage module;
correspondingly, the output control module comprises second multistage pipeline registers, and a first stage register of each second multistage pipeline register is used for storing one data information output by the multi-port index table storage module.
4. The data compression circuit of claim 3 wherein the first and second multi-stage pipeline registers are equal in number of stages.
5. The data compression circuit according to claim 1, wherein when the index values are hash values, each index calculation module is specifically configured to calculate hash values corresponding to respective data originals of the data group to be compressed in parallel;
correspondingly, the multi-port index table storage module is specifically configured to search for data information corresponding to the hash value output by the index calculation module.
6. The data compression circuit of claim 1, wherein the encoding module is specifically configured to encode the encoded output information into a corresponding bitstream in accordance with a format of an LZ4 protocol.
7. The data compression circuit of claim 1, further comprising:
and the refreshing control module is used for updating the address information of the data to be compressed and/or the original initial address of the data information.
8. The data compression circuit of any of claims 1 to 7 wherein the match lookup module comprises:
the third multi-stage pipeline register is used for storing the input data group to be compressed and the latest data information; the first-stage register in the third multi-stage pipeline register is used for storing the latest data information of the packet quantity received at the current moment and the corresponding data group to be compressed;
the multi-stage pipeline logic processing circuit is used for marking invalid data in the data stored in the third multi-stage pipeline register and the residual linked list memory to obtain unmarked valid data; calculating the coding output information according to the effective data, and outputting the coding output information to the coding module; storing the latest data information corresponding to the uncoded data group to be compressed in the effective data into the residual linked list memory; the uncoded data group to be compressed is a data group to be compressed in the valid data, which does not belong to the coded output information, and the invalid data comprises latest data information with a data original text unequal to a data original text of a corresponding data group to be compressed and latest data information with a difference between an original text head address and an original text head address of the corresponding data group to be compressed larger than a preset maximum index range;
and the residual linked list memory is used for storing the latest data information corresponding to the uncoded data group to be compressed.
CN202010710080.5A 2020-07-22 2020-07-22 Data compression circuit Active CN111817724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010710080.5A CN111817724B (en) 2020-07-22 2020-07-22 Data compression circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010710080.5A CN111817724B (en) 2020-07-22 2020-07-22 Data compression circuit

Publications (2)

Publication Number Publication Date
CN111817724A CN111817724A (en) 2020-10-23
CN111817724B true CN111817724B (en) 2022-03-22

Family

ID=72861950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010710080.5A Active CN111817724B (en) 2020-07-22 2020-07-22 Data compression circuit

Country Status (1)

Country Link
CN (1) CN111817724B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7256715B1 (en) * 2005-01-07 2007-08-14 Altera Corporation Data compression using dummy codes
CN102930898A (en) * 2012-11-12 2013-02-13 中国电子科技集团公司第五十四研究所 Method of structuring multiport asynchronous storage module
CN106788447A (en) * 2016-11-29 2017-05-31 郑州云海信息技术有限公司 The matching length output intent and device of a kind of LZ77 compression algorithms
CN107565972A (en) * 2017-09-19 2018-01-09 郑州云海信息技术有限公司 A kind of compression method, device, equipment and the storage medium of LZ codings
CN109716659A (en) * 2016-07-22 2019-05-03 英特尔公司 High-performance list stream LZ77 compress technique

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7256715B1 (en) * 2005-01-07 2007-08-14 Altera Corporation Data compression using dummy codes
CN102930898A (en) * 2012-11-12 2013-02-13 中国电子科技集团公司第五十四研究所 Method of structuring multiport asynchronous storage module
CN109716659A (en) * 2016-07-22 2019-05-03 英特尔公司 High-performance list stream LZ77 compress technique
CN106788447A (en) * 2016-11-29 2017-05-31 郑州云海信息技术有限公司 The matching length output intent and device of a kind of LZ77 compression algorithms
CN107565972A (en) * 2017-09-19 2018-01-09 郑州云海信息技术有限公司 A kind of compression method, device, equipment and the storage medium of LZ codings

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
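S. Henriques; N. Ranganathan. A parallel architecture for data compression. Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing 1990. 2002 *
Optimized design of an FPGA-based LZ4 lossless compression algorithm; Gu Wei (顾巍); China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 20190115; I135-343 *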

Also Published As

Publication number Publication date
CN111817724A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
US7538695B2 (en) System and method for deflate processing within a compression engine
JP4995125B2 (en) How to search fixed length data
US20150121034A1 (en) Systems and Methods for Implementing Low-Latency Lookup Circuits Using Multiple Hash Functions
CN101681249A (en) Fifo buffer
JP2009531976A (en) High-speed data compression based on set associative cache mapping technology
US7082499B2 (en) External memory control device regularly reading ahead data from external memory for storage in cache memory, and data driven type information processing apparatus including the same
CN105573711B (en) A kind of data cache method and device
JP2004507858A (en) Hardware implementation of compression algorithms.
US9292549B2 (en) Method and system for index serialization
Liu et al. Succinct filters for sets of unknown sizes
CN107801044B (en) Backward adaptive device and correlation technique
US8868584B2 (en) Compression pattern matching
CN111541617B (en) Data flow table processing method and device for high-speed large-scale concurrent data flow
CN111817724B (en) Data compression circuit
CN112153054B (en) Method and system for realizing splicing cache with any byte length
US11586587B2 (en) Hardware-implemented file reader
US8363653B2 (en) Packet forwarding method and device
CN106385260B (en) A kind of FPGA realization system of the LZ lossless compression algorithm based on low delay
US9450606B1 (en) Data matching for hardware data compression
CN114153758B (en) Cross-clock domain data processing method with frame counting function
JP6168595B2 (en) Data compressor and data decompressor
CN112416820B (en) Data packet classification storage method and system
US11139829B1 (en) Data compression techniques using partitions and extraneous bit elimination
CN114610231A (en) Control method, system, equipment and medium for large-bit-width data bus segmented storage
WO2019227447A1 (en) Data processing method and processing circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant