CN116683914A - Data compression method, device and system - Google Patents

Info

Publication number
CN116683914A
Authority
CN
China
Prior art keywords
character string
data
dictionary
hash
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310472128.7A
Other languages
Chinese (zh)
Inventor
寇强
董阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aoxian Technology Co ltd
Original Assignee
Shanghai Aoxian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aoxian Technology Co ltd
Priority to CN202310472128.7A
Publication of CN116683914A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a data compression method, device and system. The data compression method is applied to a programmable logic device and comprises the following steps: acquiring input data, extracting characters to be processed of the input data, and generating a target character string; calculating the target character string according to a preset hash function, and determining hash data; reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary; reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data; and compressing the character to be processed according to the matching result. By adding a preset hash function operation and a preset hash dictionary, the data compression method, device and system provided by the application achieve rapid compression of input data in programmable logic device hardware and improve real-time compression capability.

Description

Data compression method, device and system
Technical Field
The present application relates to the field of data compression technologies, and in particular, to a data compression method, device, and system.
Background
In industrial production and scientific research, signals are often sampled at high speed over long periods, generating a large amount of sampled data. In some special environments, constraints on the volume and power consumption of the receiver prevent adding large amounts of memory, so data compression technology is introduced to solve the problem. Software compression algorithms are computationally intensive and require a fast CPU and data buffer space, so they are generally used for non-real-time compression where timing requirements are loose, while real-time data compression is generally implemented in hardware. Data reconstructed after lossy compression differs from the original data. Because the object under test is uncertain, most data acquisition systems require lossless data compression. At present, lossless data compression is typically implemented with a dedicated compression chip, such as a lossless compression chip from ADI in the United States.
In the course of conceiving and implementing the present application, the inventors found at least the following problems: a dedicated compression chip increases the difficulty and cost of hardware design, for example in PCB (printed circuit board) area, power consumption and heat dissipation, and it also makes future system program upgrades and program maintenance difficult.
The foregoing description is provided for general background information and does not necessarily constitute prior art.
Disclosure of Invention
In order to alleviate the above problems, the present application provides a data compression method, apparatus and system.
In one aspect, the present application provides a data compression method, particularly applied to a programmable logic device, comprising:
acquiring input data, extracting characters to be processed of the input data, and generating a target character string;
calculating the target character string according to a preset hash function, and determining hash data;
reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary;
reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data;
and compressing the character to be processed according to the matching result.
Optionally, in the data compression method, the step of acquiring input data, extracting characters to be processed of the input data, and generating a target character string includes:
acquiring multi-byte input data, and dividing the input data into a plurality of single-byte data;
extracting the characters to be processed one by one, reading the character string prefix, and combining the character string prefix and the characters to be processed to obtain the target character string.
Optionally, the step of combining the character string prefix and the character to be processed into the target character string includes:
reading a preset channel number of the single-byte data based on the plurality of single-byte data;
and outputting the single-byte data of the preset channel number to corresponding processing channels respectively, so that a plurality of processing channels process the single-byte data of the preset channel number in parallel.
Optionally, the preset hash function is a remainder function; in the data compression method, the step of calculating the target character string according to the preset hash function and determining hash data includes:
performing a remainder operation on the target character string with a preset prime number as the divisor, and taking the obtained remainder as the hash data.
Optionally, in the data compression method, the step of reading the plurality of character string addresses corresponding to the hash data based on the preset hash dictionary includes:
inputting the hash data as a target hash address of the preset hash dictionary, and reading the data stored at the target hash address;
and extracting a preset number of character string addresses from the data stored at the target hash address.
Optionally, the step of reading a plurality of corresponding character string data according to the plurality of character string addresses and matching the target character string according to the plurality of character string data includes:
based on a plurality of character string dictionaries, respectively reading a plurality of character string data stored corresponding to the plurality of character string addresses;
and comparing the target character string with the plurality of character string data respectively.
Optionally, in the data compression method, the step of comparing the target character string with the plurality of character string data respectively includes:
building the character string dictionary with a multiport memory and a preset dictionary depth, so that the plurality of character string data are compared in parallel.
Optionally, in the data compression method, the step of compressing the character to be processed according to the matching result includes:
when the target character string is matched, not outputting a coding value; and/or
when the target character string is not matched, outputting the character string prefix as a coding value, adding the target character string to a character string dictionary, and adding the storage address corresponding to the target character string to the preset hash dictionary.
Optionally, after the step of compressing the character to be processed according to the matching result, the data compression method further includes:
when the target character string is matched, taking the target character string as the character string prefix for processing the next character to be processed; and/or
when the target character string is not matched, taking the character to be processed as the character string prefix for processing the next character to be processed.
Optionally, after the step of compressing the character to be processed according to the matching result, the data compression method further includes:
encrypting the coded values in the maintained character string dictionary using a pseudo encryption algorithm.
In another aspect, the present application further provides a data compression apparatus, specifically, the data compression apparatus includes a processor and a memory;
the memory stores a computer program which, when executed by the processor, implements the steps of the data compression method as described above; and/or
the data compression device comprises a character string module, a hash algorithm module, a compression dictionary module, a coding module and an encryption module;
the data compression device is realized by using a programmable logic device;
the character string module is used for acquiring input data, extracting characters to be processed of the input data and generating a target character string;
the hash algorithm module is used for calculating the target character string according to a preset hash function and determining hash data;
the compression dictionary module is used for reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary, reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data;
the coding module is used for compressing the character to be processed according to the matching result;
the encryption module is used for encrypting the compressed character to be processed using a pseudo encryption algorithm.
Optionally, the compression dictionary module in the data compression device includes a hash dictionary unit, at least one character string dictionary unit, and a matching decision unit;
the hash dictionary unit is used for updating a preset hash dictionary and reading a plurality of character string addresses corresponding to the hash data based on the preset hash dictionary;
each character string dictionary unit is used for updating a character string dictionary and reading a plurality of corresponding character string data according to the plurality of character string addresses;
the matching decision unit is used for matching the target character string according to the plurality of character string data.
Optionally, the character string dictionary unit in the data compression device uses a multiport memory to build the character string dictionary, so that the matching decision unit performs parallel comparison processing on the plurality of character string data.
In another aspect, the present application further provides a data compression system, specifically, the data compression system includes a data distribution module, at least one data compression device and a data aggregation module;
the data compression system is implemented using a programmable logic device;
The data distribution module acquires multi-byte input data and divides the input data into a plurality of single-byte data;
each data compression device processes a corresponding single-byte data respectively so as to process the plurality of single-byte data in parallel and generate a compression result respectively;
and the data aggregation module combines the compression results generated by each data compression device and outputs total compressed data.
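The distribution step can be pictured with a short software model. This is only a sketch under stated assumptions: the application says that multi-byte input is divided into single-byte data handled by separate compression devices, but it does not give the lane assignment or the aggregation format. The byte-interleaved assignment and the names `distribute` and `merge` below are hypothetical.

```python
def distribute(data: bytes, lanes: int = 4) -> list[bytes]:
    """Split the input so that byte i goes to lane i % lanes (assumed assignment)."""
    return [data[i::lanes] for i in range(lanes)]


def merge(lane_data: list[bytes]) -> bytes:
    """Inverse of distribute(): re-interleave the per-lane byte streams."""
    lanes = len(lane_data)
    out = bytearray(sum(len(lane) for lane in lane_data))
    for i, lane in enumerate(lane_data):
        out[i::lanes] = lane  # slice lengths match by construction
    return bytes(out)


streams = distribute(b"ABCDEFGHIJ")
restored = merge(streams)
```

Each element of `streams` would be fed to its own data compression device. `merge` only demonstrates that the split itself loses no information; how the compressed results are actually combined is left open here.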
As described above, the data compression method, device and system provided by the application are added with the preset hash function operation and the preset hash dictionary based on the principle of the string table compression algorithm, and the characteristics of high integration level, low power consumption, flexibility and parallel operation of the programmable logic device are combined, so that the rapid compression of input data is realized through the hardware of the programmable logic device, and the real-time compression capability is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a data compression method according to an embodiment of the application.
Fig. 2 is a block diagram of a dual port memory according to an embodiment of the present application.
Fig. 3 is a timing diagram illustrating a data compression method implemented by the data compression device according to an embodiment of the application.
Fig. 4 is a block diagram of a data compression apparatus according to an embodiment of the present application.
Fig. 5 is a block diagram of a compression dictionary module according to an embodiment of the present application.
Fig. 6 is a block diagram of a data compression system according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments. Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the element defined by the phrase "comprising one … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element, and furthermore, elements having the same name in different embodiments of the application may have the same meaning or may have different meanings, the particular meaning of which is to be determined by its interpretation in this particular embodiment or by further combining the context of this particular embodiment.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that the data compression method, device and system provided by the application are based on the principle of the string table compression algorithm. The string table compression algorithm (LZW compression algorithm, Lempel-Ziv-Welch encoding) is a table-lookup-based method for compressing files into smaller files, invented by Abraham Lempel, Jacob Ziv and Terry Welch. Two common file formats that use the string table compression algorithm are the GIF image format, widely used on web sites, and the TIFF image format. The string table compression algorithm is also suitable for compressing text files.
The string table compression algorithm achieves lossless data compression by building a string table in which shorter codes represent longer character strings. As a lossless compression algorithm it is adaptive, and it still compresses well when the statistical characteristics of the signal are unknown. The distinct character strings in the original file data are extracted, a compiling table is built from them, and the corresponding strings in the original data are then replaced by their indexes in the compiling table, which reduces the size of the original data. The compiling table is not created in advance but is built dynamically from the original file data, and during decoding the original compiling table is rebuilt from the encoded data.
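For orientation, the string table principle can be illustrated with a textbook LZW encoder. This is a generic software sketch, not the hardware method of the application: the table is an ordinary Python dictionary, and the function name `lzw_encode` is chosen here for illustration.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit one code per longest string already in the table."""
    # Codes 0..255 are reserved for the single-byte strings.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    prefix = b""
    codes = []
    for byte in data:
        candidate = prefix + bytes([byte])
        if candidate in table:
            # Known string: keep extending it, output nothing yet.
            prefix = candidate
        else:
            # Unknown string: output the code of the known prefix,
            # register the new string, restart from the current byte.
            codes.append(table[prefix])
            table[candidate] = next_code
            next_code += 1
            prefix = bytes([byte])
    if prefix:
        codes.append(table[prefix])  # flush the last pending string
    return codes


print(lzw_encode(b"ABABABA"))  # [65, 66, 256, 258]
```

Seven input bytes become four codes because the strings "AB" and "ABA" are added to the table while encoding and then reused.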
First embodiment
In one aspect, the present application provides a data compression method, and fig. 1 is a flowchart of a data compression method according to an embodiment of the present application.
Referring to fig. 1, in one embodiment, a data compression method is applied to a programmable logic device.
Illustratively, the programmable logic device may be a Field Programmable Gate Array (FPGA). An FPGA offers high clock frequency, small internal delay, purely hardware parallel control, high operation speed, flexible programming and configuration, a short development period, strong anti-interference capability and rich internal resources, and is therefore well suited to real-time high-speed data compression. The application does not limit the type of programmable logic device; an FPGA is one option.
The data compression method comprises the following steps:
S10: acquiring input data, extracting characters to be processed of the input data, and generating a target character string.
Converting the input data into distinct target character strings makes it convenient to build a character string table, so that a shorter code can represent a longer character string and the data is compressed losslessly. Illustratively, on an FPGA, generating a target character string from one single-byte datum of the input data, i.e. one character to be processed, takes 1 clock cycle.
S20: calculating the target character string according to a preset hash function, and determining hash data.
The hash data makes it possible to quickly check whether the target character string exists in the character string dictionary. Illustratively, on the FPGA, applying the preset hash function to the target character string to generate the hash data takes 1 clock cycle.
S30: reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary.
Because of hash collisions, several character string addresses may correspond to the same hash data. A plurality of character string addresses can therefore be stored as one address list under the same hash data.
S40: reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data.
Among the character strings read from the plurality of character string addresses, at most one matches the target character string. Comparing the target character string against each of them determines exactly whether, and where, it is stored. Illustratively, on the FPGA, reading the plurality of character string addresses from the preset hash dictionary, reading the corresponding character string data, and matching the target character string against them together take 3 clock cycles.
S50: compressing the character to be processed according to the matching result.
When a character string matching the target character string is found, the code corresponding to the target character string can be determined, and the data is thereby compressed. Illustratively, on the FPGA, compressing the character to be processed according to the matching result takes 1 clock cycle. Optionally, encrypting the compressed character to be processed takes 1 further clock cycle. It follows that compressing one single-byte datum takes 7 clock cycles. If the operating clock CLK of the data compression method runs at 200 MHz, the clock period is 5 ns, so one byte is compressed every 35 ns, i.e. about 27.25 MB/s (taking 1 MB as 2^20 bytes). Raising the clock frequency raises the compression rate: at a CLK frequency of 300 MHz, the same calculation gives about 40.87 MB/s. The present application does not limit the number of clock cycles required by each of steps S10 to S50.
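The throughput figures can be checked with a few lines of arithmetic. The sketch below assumes that one byte completes every 7 clock cycles with no overlap between bytes, and that 1 MB means 2^20 bytes, which is the convention under which 35 ns per byte corresponds to 27.25 MB/s. The function name is hypothetical.

```python
def throughput_mb_per_s(clock_hz: float, cycles_per_byte: int = 7) -> float:
    """Bytes per second for one compression channel, expressed in units of 2**20 bytes."""
    seconds_per_byte = cycles_per_byte / clock_hz
    return 1.0 / seconds_per_byte / 2**20


rate_200 = throughput_mb_per_s(200e6)  # 35 ns per byte
rate_300 = throughput_mb_per_s(300e6)  # about 23.3 ns per byte
print(f"{rate_200:.2f} MB/s at 200 MHz, {rate_300:.2f} MB/s at 300 MHz")
```

Under these assumptions the formula evaluates to roughly 27.25 MB/s at 200 MHz and roughly 40.87 MB/s at 300 MHz per channel; several channels working in parallel would scale the total accordingly.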
In this embodiment, the data compression method adds a preset hash function operation and a preset hash dictionary based on the principle of a string table compression algorithm, combines the characteristics of high integration level, low power consumption, flexibility and parallel operation of a programmable logic device, and improves the real-time compression capability by implementing rapid compression of input data through hardware of the programmable logic device.
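Steps S10 to S50 can be followed end to end in a small software model. It is a sketch under assumptions the application does not state: addresses 0 to 255 are taken to be reserved for single bytes, the dictionaries simply stop growing when the depth (2048) or a hash bucket (12 slots) is full, the last pending prefix is flushed at the end of input, and the optional encryption step is omitted. The 11-bit prefix, 8-bit character and prime 2039 follow the example values of the embodiments.

```python
HASH_PRIME = 2039    # example prime close to 2048
DICT_DEPTH = 2048    # example dictionary depth (11-bit addresses)
BUCKET_SLOTS = 12    # example number of addresses per hash address


def compress(data: bytes) -> list[int]:
    hash_dict = [[] for _ in range(DICT_DEPTH)]  # hash address -> string addresses
    string_dict = {}                             # string address -> 19-bit target
    next_addr = 256                              # assumption: 0..255 are single bytes
    codes = []
    prefix = None
    for char in data:
        if prefix is None:                       # first byte only seeds the prefix
            prefix = char
            continue
        target = (prefix << 8) | char            # S10: prefix + character
        bucket = hash_dict[target % HASH_PRIME]  # S20 + S30: hash, read address list
        match = next((a for a in bucket if string_dict[a] == target), None)  # S40
        if match is not None:                    # S50, matched: no output,
            prefix = match                       # the stored string becomes the prefix
        else:                                    # S50, not matched: output the prefix
            codes.append(prefix)
            if next_addr < DICT_DEPTH and len(bucket) < BUCKET_SLOTS:
                string_dict[next_addr] = target  # add to string dictionary
                bucket.append(next_addr)         # add its address to hash dictionary
                next_addr += 1
            prefix = char
    if prefix is not None:
        codes.append(prefix)                     # assumption: flush the final prefix
    return codes


print(compress(b"ABABABA"))  # [65, 66, 256, 258]
```

The Python loop performs sequentially what the hardware would do with parallel comparisons; the point of the model is only the data flow between the hash dictionary and the string dictionary.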
In one embodiment, step S10 of the data compression method (acquiring input data, extracting characters to be processed of the input data, and generating a target character string) comprises:
S11: acquiring multi-byte input data and dividing the input data into a plurality of single-byte data.
Illustratively, 4 bytes of input data are received and split into 4 single-byte data, each of which is then processed separately.
S12: extracting the characters to be processed one by one, reading the character string prefix, and combining the character string prefix and the character to be processed into a target character string.
Illustratively, the plurality of single-byte data are extracted one by one as characters to be processed, each character to be processed consisting of one single-byte datum, and each single-byte datum is combined with the character string prefix into a target character string of a first bit number. For example, the single-byte datum, i.e. the character to be processed, occupies 8 bits; the character string prefix is the address of a character to be processed or of a character string stored in the character string dictionary, and occupies 11 bits; the target character string, composed of the character to be processed and the character string prefix, therefore occupies 19 bits in total. The method allocates a fixed-length 11-bit storage space in the FPGA for the character string address, so that a variable-length character string does not have to be allocated storage at its maximum length and each stored entry stays within 19 bits, which avoids wasting FPGA storage resources.
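The 19-bit layout can be made concrete with a small packing helper. The field order is an assumption: the text gives the field widths (11-bit prefix, 8-bit character) but not which field occupies the high bits, so the sketch places the prefix in the upper 11 bits. The function names are hypothetical.

```python
PREFIX_BITS = 11  # width of the character string prefix (a dictionary address)
CHAR_BITS = 8     # width of the character to be processed


def make_target(prefix_addr: int, char: int) -> int:
    """Pack an 11-bit prefix and an 8-bit character into one 19-bit target."""
    if not 0 <= prefix_addr < (1 << PREFIX_BITS):
        raise ValueError("prefix does not fit in 11 bits")
    if not 0 <= char < (1 << CHAR_BITS):
        raise ValueError("character does not fit in 8 bits")
    return (prefix_addr << CHAR_BITS) | char  # assumed order: prefix in high bits


def split_target(target: int) -> tuple[int, int]:
    """Recover (prefix, character) from a 19-bit target."""
    return target >> CHAR_BITS, target & ((1 << CHAR_BITS) - 1)


target = make_target(0x100, ord("A"))
```

Whatever the length of the string that the prefix address stands for, the stored entry keeps this fixed 19-bit size.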
In one embodiment, step S12 of the data compression method (extracting the characters to be processed one by one, reading the character string prefix, and combining the character string prefix and the character to be processed into a target character string) comprises:
S13: reading single-byte data of a preset channel number from the plurality of single-byte data;
S14: outputting the single-byte data of the preset channel number to the corresponding processing channels respectively, so that the plurality of processing channels process them in parallel.
Illustratively, the hash dictionary module receives the hash data and outputs the list of addresses in the character string dictionary associated with the current string String = P + C (prefix P plus character C), 12 addresses in total. The 12 addresses store 12 different strings of the form String = P + C, all of which share the same hash value. The first, second and third character string dictionary modules each receive 4 different addresses, and each module then reads the character strings stored at its 4 addresses from the character string dictionary.
In one embodiment, the predetermined hash function is a remainder function.
The application does not limit the type of the preset hash function; a suitable type can be selected by weighing the operation speed and efficiency of the function and the uniqueness of its hash values. Illustratively, the preset hash function may be a remainder function. A remainder function is fast: the hash operation completes with a single remainder operation, so the compression rate is preserved, and on an FPGA the remainder operation, and hence the hash operation, can complete in one clock cycle.
In the data compression method, step S20 (calculating the target character string according to a preset hash function and determining hash data) comprises:
S21: performing a remainder operation on the target character string with a preset prime number as the divisor, and taking the obtained remainder as the hash data.
Illustratively, the received target character string is hashed with the preset hash function to generate hash data of a second bit number, which is used to quickly check whether the target character string exists in the character string dictionary. The hash data stands in for the target character string as a lookup key. Note that a remainder function that maps a 19-bit target character string to 11-bit hash data is many-to-one, so the target character string cannot in general be restored from the hash data alone. The remainder function is h(x) = x mod m, where h(x) is the hash data, x is the target character string, and m is the preset prime number. The application does not limit the size of m, and m can be any constant, but a prime number should be chosen where possible; this avoids mapping too many keys to the same position, which would reduce the efficiency of the remainder function. Illustratively, when the 19-bit target character string is converted into 11-bit binary hash data, the hash data can take 2^11 = 2048 values, and m may be chosen as 2039, a prime number close to 2048.
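As a quick illustration of the remainder function, using the example values of this embodiment (19-bit targets, m = 2039), every hash value fits in 11 bits and distinct targets can share a hash value. The function name `hash_target` is chosen here for illustration.

```python
HASH_PRIME = 2039  # example prime m from the embodiment


def hash_target(target: int) -> int:
    """h(x) = x mod m for a 19-bit target character string."""
    return target % HASH_PRIME


a = 5
b = 5 + HASH_PRIME  # a different target with the same remainder
print(hash_target(a), hash_target(b))  # 5 5
```

The pair `a` and `b` shows a hash collision: two different targets map to the same hash address, which is why a hash address has to hold several candidate character string addresses.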
In one embodiment, step S30 of the data compression method (reading a plurality of character string addresses corresponding to the hash data based on the preset hash dictionary) comprises:
S31: inputting the hash data as a target hash address of the preset hash dictionary, and reading the data stored at the target hash address;
S32: extracting a preset number of character string addresses from the data stored at the target hash address.
Illustratively, the preset hash dictionary is used for fast matching of target character strings. Because the hash data generated by the preset hash function is not unique, hash collisions can occur, that is, different character strings can produce the same hash data under the preset hash function. To resolve hash collisions, the preset hash dictionary therefore stores, for the same hash data, a plurality of different character string addresses. It will be appreciated that each time the preset hash dictionary receives a target hash address, the addresses of a preset number of different character strings in the character string dictionary can be read out. The preset hash dictionary is implemented with RAM (random access memory), an internal storage resource of the FPGA.
Illustratively, according to the principle of the string table compression algorithm, a character string dictionary must be built during compression to store character strings, and the dictionary depth determines both the compression rate and the string search speed. A dictionary that is too shallow (too few entries) or too deep (too many entries) lowers the compression rate. Similarly, when building the preset hash dictionary, which stores a plurality of character string addresses at each hash address, its depth and width must also be considered. The application does not limit the depth of the preset hash dictionary; if a character string of the first bit number, 19 bits, is hashed into binary hash data of the second bit number, 11 bits, the depth of the preset hash dictionary can be chosen as 2048, with a hash address width of 11 bits. Nor does the application limit the width of the data stored at each hash address; this width accommodates the preset number of character string addresses, and a suitable width can be chosen according to the depth of the preset hash dictionary and the number of hash collisions. For example, if the application is video grayscale image compression, hash collisions can be counted in software; for character strings with a dictionary depth of 2K, about 11 hash collisions can occur, so the preset hash dictionary can be configured to store 12 different character string addresses at each hash address.
For example, please refer to a first table and a second table in combination, wherein the first table is a hash dictionary table of an embodiment, and the second table is a data table of hash addresses based on the first table. In the first table, RAM Address represents a hash Address of hash data, the Address width is 11 bits, and the dictionary depth is 2048. The RAM Data represents Data stored in the hash address, and has a width of 132 bits. Since the data bit width is 132 binary bits and the address bit width of the character string in the character string dictionary is 11 bits, the data stored in the hash address can store addresses corresponding to 12 different character strings. The same hash data address hash in table two may store 12 different string addresses, i.e. corresponding 12 addresses Ad 1-Ad 12.
Table one: hash dictionary table
Table two: hash dictionary data format table
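Illustratively, the 132-bit data format of one hash address can be sketched as follows. The widths (12 addresses of 11 bits each) come from the description; the slot order, with Ad1 in the lowest 11 bits, is an assumption, since the description does not state where each address sits inside the word.

```python
ADDR_BITS = 11                   # width of one character string address
SLOTS = 12                       # addresses stored per hash address
WORD_BITS = ADDR_BITS * SLOTS    # 132-bit RAM data word


def pack_addresses(addrs):
    """Pack 12 character string addresses into one 132-bit word (Ad1 lowest, assumed)."""
    if len(addrs) != SLOTS:
        raise ValueError("exactly 12 addresses are required")
    word = 0
    for slot, addr in enumerate(addrs):
        if not 0 <= addr < (1 << ADDR_BITS):
            raise ValueError("address must fit in 11 bits")
        word |= addr << (slot * ADDR_BITS)
    return word


def unpack_addresses(word):
    """Recover the 12 character string addresses from a 132-bit word."""
    mask = (1 << ADDR_BITS) - 1
    return [(word >> (slot * ADDR_BITS)) & mask for slot in range(SLOTS)]
```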
In one embodiment, step S40 of the data compression method, in which the corresponding plurality of character string data are read according to the plurality of character string addresses and the target character string is matched against the plurality of character string data, comprises:
S41: based on a plurality of character string dictionaries, respectively reading the plurality of character string data stored at the plurality of character string addresses;
S42: comparing the target character string with each of the plurality of character string data.
Illustratively, the character string dictionary is implemented using RAM (random access memory), an internal storage resource of the FPGA, and provides the character string matching function. The application does not limit the depth and width of the character string dictionary; if a 19-bit character string (the first bit number) is stored at an 11-bit character string address, the depth of the character string dictionary is 2048, the character string address is 11 bits wide, and the character string is 19 bits wide. The character string dictionary reads out a plurality of character string data according to the plurality of character string addresses and compares the target character string with them one by one, so that it can be determined whether exactly one stored character string matches the target character string.
Table three shows a character string dictionary table of an embodiment. In Table three, RAM Address is the character string address, with an address width of 11 bits and a dictionary depth of 2048; RAM Data is the binary data stored at the character string address, with a dictionary width of 19 bits.
RAM Address(11bit) RAM Data(19bit)
0 Data0
1 Data1
2 Data2
…… ……
2045 Data2045
2046 Data2046
2047 Data2047
Table three: character string dictionary table
In one embodiment, step S42 of the data compression method, in which the target character string is compared with each of the plurality of character string data, comprises:
S43: building the character string dictionary using a multiport memory and a preset dictionary depth, so that the plurality of character string data are compared in parallel.
For example, by building the character string dictionary on a multiport memory and comparing a plurality of character string data in parallel, the speed of character string search and matching can be increased.
Fig. 2 is a block diagram of a dual port memory according to an embodiment of the present application.
Referring to fig. 2, a character string dictionary built using a dual-port RAM is taken as an example. One port checks whether the character string stored at a first address equals the target character string, and the other port checks whether the character string stored at a second address equals the target character string. The character strings at 2 addresses can therefore be compared with the target character string in one clock cycle, and the character strings at 4 addresses in two clock cycles.
For example, because the preset hash dictionary outputs a plurality of different addresses at a time (and the character string stored in the character string dictionary at any of these addresses may equal the target character string), a plurality of character string dictionaries can be established for target character string matching. Optionally, the application does not limit the number of character string dictionaries, and a suitable number can be chosen according to the establishment cost and the search speed. Taking 3 character string dictionaries as an example, the dictionaries are built dynamically with identical contents, that is, the three character string dictionaries always store the same data. One character string dictionary needs 2 clock cycles to look up 4 different character strings; using the parallel processing capability of the FPGA, 3 character string dictionaries look up 12 different character strings simultaneously, each handling 4 of them, in only 2 clock cycles, thereby achieving fast character string matching. If the 12 different addresses must be looked up in 1 clock cycle, 6 character string dictionaries are needed to search simultaneously.
For example, taking 3 character string dictionaries, refer to Table four, Table five and Table six, which give the matching results of character string dictionaries A, B and C of an embodiment, respectively. In the tables, Compare denotes the comparison performed, String is the target character string, String1 to Stringx denote the character strings 1 to x to be matched, and Match1 to Match3 denote the matching results of character string dictionaries A to C, respectively. 4'b0001 may be set to represent a successful match, with all other matching results representing an unsuccessful match. In practice, taking 12 different character string addresses as an example, character string dictionary A produces the matching results of String against String1 to String4, character string dictionary B those of String against String5 to String8, and character string dictionary C those of String against String9 to String12. If the match succeeds, one and only one matching result is 4'b0001, which indicates that String and String1 match; if the match fails, none of the matching results is 4'b0001.
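Illustratively, the cycle counts quoted above follow from a simple relation between the number of addresses, the number of character string dictionaries, and the number of ports per dictionary. The sketch below assumes that each port completes one read and comparison per clock cycle, which is how the dual-port example is described; the function name is hypothetical.

```python
def lookup_cycles(num_addresses: int, num_dictionaries: int, ports_per_dictionary: int = 2) -> int:
    """Clock cycles needed to compare all addresses, one read per port per cycle (assumed)."""
    reads_per_cycle = num_dictionaries * ports_per_dictionary
    if reads_per_cycle <= 0:
        raise ValueError("at least one dictionary port is required")
    # Ceiling division without floating point.
    return -(-num_addresses // reads_per_cycle)
```

With 12 addresses, 3 dual-port dictionaries give 2 cycles and 6 dual-port dictionaries give 1 cycle, which matches the figures in the description.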
Compare Match1
String=String1 4'b0001
String=String2 4'b0010
String=String3 4'b0100
String=String4 4'b1000
Other 4'b0000
Table four: matching result of character string dictionary A
Compare Match2
String=String1 4'b0001
String=String2 4'b0010
String=String3 4'b0100
String=String4 4'b1000
Other 4'b0000
Table five: matching result of character string dictionary B
Compare Match3
String=String1 4'b0001
String=String2 4'b0010
String=String3 4'b0100
String=String4 4'b1000
Other 4'b0000
Table six: matching result of character string dictionary C
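Illustratively, the 4-bit matching results listed in Tables four to six can be modeled as a one-hot vector in which bit i is set when the target character string equals the i-th candidate read from that dictionary. This sketch follows the encoding in the tables only; the function name is hypothetical, and how the three vectors are combined into the final matching decision is not modeled here.

```python
def match_vector(target: int, candidates) -> int:
    """Return a one-hot result: bit 0 for the first candidate, bit 1 for the second, and so on."""
    vector = 0
    for index, candidate in enumerate(candidates):
        if candidate == target:
            vector |= 1 << index
    return vector


# Formatting in the 4'bxxxx style used by the tables.
def as_verilog_literal(vector: int, width: int = 4) -> str:
    return f"{width}'b{vector:0{width}b}"
```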
Illustratively, the search speed and compression ratio for a target character string differ with dictionary depth. A character string dictionary with sufficient capacity does not fill up quickly and, because it stores many character strings, matches data well, which effectively improves the compression ratio. A character string dictionary that is too large, however, increases dictionary lookup time, slows compression, and consumes a large amount of storage. Optionally, the application does not limit the preset dictionary depth; a suitable preset dictionary depth can be found by testing the algorithm in software on a computer, provided that the dictionary size used in the software experiment stays within the block RAM resources available inside the FPGA chip, after which the hardware implementation speed of the algorithm is considered.
Illustratively, a segment of infrared video data is captured with an infrared camera and stored on a solid-state storage hard disk; the infrared video data on the solid-state hard disk is then exported to a PC and used as the source data for compression. Using software, the raw infrared data is compressed with the data compression algorithm using character string dictionaries of different depths, and the best dictionary size or depth is selected as the preset dictionary depth.
Table seven is a data compression result table of an embodiment. Dictionary depths of 512, 1024, 2048 and 4096 were tested; as Table seven shows, the compression ratio is best at dictionary depths 2048 and 4096. The choice between them is then made according to the RAM resources inside the FPGA and the character string search speed. Because RAM resources inside the FPGA are limited and a dictionary depth of 2048 occupies less RAM, and because a smaller dictionary depth gives faster character string search, a dictionary depth of 2048 is selected. The original data is 388800 × 8 bits, that is, data compression is performed on 388800 single-byte data. The compression rate describes the effect of compression and is the ratio of the file size after compression to the file size before compression. For example, if a 100 MB file is 90 MB after compression, the compression rate is 90/100 × 100% = 90%; the smaller the compression rate, the better. On the other hand, the larger the compressed output, the shorter the decompression time.
Table seven: data compression result table
In the above embodiment, establishing the preset hash dictionary based on the preset hash function makes it possible to quickly look up the character string addresses corresponding to the target character string, which improves the search speed and compression rate for the target character string. The character string dictionary receives the plurality of character string addresses, reads out the corresponding character strings, and matches them one by one against the target character string.
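Illustratively, the compression rate defined above is the size after compression divided by the size before compression, expressed as a percentage. The short sketch below reproduces only the worked example from the description (90 out of 100 gives 90%); it does not reproduce any value from Table seven, and the function name is hypothetical.

```python
def compression_rate(compressed_size: float, original_size: float) -> float:
    """Compression rate in percent: size after compression over size before compression."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * compressed_size / original_size
```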
In one embodiment, step S50 of the data compression method, in which the character to be processed is compressed according to the matching result, comprises:
S51: when the target character string is matched, not outputting a code value;
S52: when the target character string is matched, using the target character string as the character string prefix to process the next character to be processed.
For example, if the target character string is in the character string dictionary, the code value Code is temporarily not output, and the character string prefix is extended to the character string prefix plus the character to be processed, that is, the target character string, in order to process the next character to be processed. If there is no next character to be processed, the character string prefix, now extended to the target character string, is output as the code value.
In one embodiment, step S50 of the data compression method, in which the character to be processed is compressed according to the matching result, comprises:
S53: when the target character string is not matched, outputting the character string prefix as the code value, adding the target character string to the character string dictionary, and adding the storage address corresponding to the target character string to the preset hash dictionary.
For example, the character string dictionary supports character string updating and is maintained dynamically, and the preset hash dictionary likewise maintains its hash addresses dynamically. If the target character string is not in the character string dictionary, the target character string is added to the character string dictionary, and the address of the target character string is added to the corresponding hash address in the preset hash dictionary.
S54: when the target character string is not matched, using the character to be processed as the character string prefix to process the next character to be processed.
For example, if the target character string is not matched, the character string prefix is replaced with the character to be processed in order to process the next character to be processed. If there is no next character to be processed, the character string prefix, now replaced by the character to be processed, is output as the code value.
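Illustratively, steps S51 to S54 can be sketched in software as the following loop. This is a minimal model, not the hardware implementation: a Python dict stands in for the preset hash dictionary and the character string dictionaries together, the initial assignment of codes 0 to 255 to the single bytes is an assumption taken from the conventional string table algorithm, and simply not adding new entries once the depth of 2048 is reached is likewise an assumed policy that the description does not specify.

```python
MAX_DEPTH = 2048  # preset dictionary depth chosen in the description


def lzw_compress(data: bytes, max_depth: int = MAX_DEPTH) -> list:
    """Return the list of code values produced for the input bytes."""
    # Assumed initialization: every single byte already has a code.
    dictionary = {bytes([value]): value for value in range(256)}
    codes = []
    prefix = b""
    for value in data:
        char = bytes([value])
        target = prefix + char          # target string = prefix + character to be processed
        if target in dictionary:
            # S51/S52: matched, output nothing, the target becomes the new prefix.
            prefix = target
        else:
            # S53: output the prefix as the code value and store the new string.
            codes.append(dictionary[prefix])
            if len(dictionary) < max_depth:
                dictionary[target] = len(dictionary)
            # S54: the character to be processed becomes the new prefix.
            prefix = char
    if prefix:
        # No next character: output the remaining prefix as the code value.
        codes.append(dictionary[prefix])
    return codes
```

For the input `b"ABABABA"` this sketch produces the codes 65, 66, 256 and 258, so seven input bytes are represented by four code values.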
In one embodiment, step S50 of the data compression method, in which the character to be processed is compressed according to the matching result, comprises:
S55: encrypting the code values in the maintained character string dictionary using a pseudo encryption algorithm.
Illustratively, the compressed code value Code is encrypted using a pseudo encryption algorithm. Table eight is a pseudo encryption algorithm table of an embodiment. Referring to Table eight, the binary data of Code before encryption is 00111001000 and the binary data of En_Code after encryption is 00010011100. The data is encrypted to protect it in case it is intercepted by unauthorized parties. Optionally, the application does not limit the encryption mode of the code value; different encryption algorithms implemented on the FPGA require different numbers of clock cycles, and a suitable encryption mode can be chosen according to the required compression speed. If the pseudo encryption algorithm produces its output by bit exchange, data encryption can be completed in one clock cycle.
Code (bit position) 11 10 9 8 7 6 5 4 3 2 1
En_Code (source Code bit) 1 2 3 4 5 10 9 8 7 6 11
Code (example) 0 0 1 1 1 0 0 1 0 0 0
En_Code (example) 0 0 0 1 0 0 1 1 1 0 0
Table eight: pseudo-encryption algorithm table
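Illustratively, the bit exchange in Table eight can be checked with a short sketch. It reads the table as follows: from the most significant bit (bit 11) to the least significant bit (bit 1), En_Code takes Code bits 1, 2, 3, 4, 5, 10, 9, 8, 7, 6 and 11. This reading is consistent with the example row in the table; the inverse function is an addition for illustration only, since the description does not discuss decryption.

```python
# Code bit (1 = least significant) placed at each En_Code position, from bit 11 down to bit 1.
SOURCE_BITS = [1, 2, 3, 4, 5, 10, 9, 8, 7, 6, 11]
CODE_WIDTH = len(SOURCE_BITS)  # 11-bit code values


def pseudo_encrypt(code: int) -> int:
    """Rearrange the 11 bits of a code value according to Table eight."""
    result = 0
    for source in SOURCE_BITS:
        result = (result << 1) | ((code >> (source - 1)) & 1)
    return result


def pseudo_decrypt(en_code: int) -> int:
    """Undo the bit exchange (hypothetical helper, not part of the description)."""
    code = 0
    for index, source in enumerate(SOURCE_BITS):
        position = CODE_WIDTH - index            # En_Code bit position, 11 down to 1
        bit = (en_code >> (position - 1)) & 1
        code |= bit << (source - 1)
    return code
```

Applied to the example in Table eight, `pseudo_encrypt(0b00111001000)` gives `0b00010011100`.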
Second embodiment
In another aspect, the application also provides a data compression device.
In one embodiment, a data compression device includes a processor and a memory. The memory stores a computer program which, when executed by the processor, implements the steps of the data compression method as described above.
Fig. 3 is a timing diagram illustrating a data compression method implemented by the data compression device according to an embodiment of the application.
Referring to fig. 3, in an embodiment, the data compression device inputs a character to be processed, namely single-byte data C, and combines it with the character string prefix P to generate the target character string P+C, which takes 1 clock cycle. The preset hash function takes 1 clock cycle to compute the hash data Hash from the target character string. The preset hash dictionary and the character string dictionary take 3 clock cycles to search for and match the target character string according to the hash data. Outputting the code value Code according to the matching result Match takes 1 clock cycle. Receiving the code value, encrypting it and outputting the encrypted data Encode takes 1 clock cycle. The next character to be processed C is then input.
Fig. 4 is a block diagram of a data compression apparatus according to an embodiment of the present application.
Referring to fig. 4, in one embodiment, the data compression apparatus includes a string module 10, a hash algorithm module 20, a compression dictionary module 30, an encoding module 40, and an encryption module 50.
The data compression device is implemented using programmable logic devices.
Illustratively, the programmable logic device may use an FPGA.
The character string module 10 is configured to obtain input data, which is single-byte data, extract the character to be processed from the input data, and generate a target character string. The hash algorithm module 20 is configured to operate on the target character string according to a preset hash function to determine hash data. The compression dictionary module 30 is configured to read a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary, read the corresponding plurality of character string data according to the plurality of character string addresses, and match the target character string against the plurality of character string data. The encoding module 40 is configured to compress the character to be processed according to the matching result. The encryption module 50 is configured to encrypt the compressed character to be processed using a pseudo encryption algorithm.
In this embodiment, the data compression device adds the hash algorithm module 20 and the compression dictionary module 30 based on the principle of the string table compression algorithm, and combines the characteristics of high integration level, low power consumption, flexibility and parallel operation of the programmable logic device, so as to realize the rapid compression of single-byte input data through the hardware of the programmable logic device, and improve the real-time compression capability.
Fig. 5 is a block diagram of a compression dictionary module according to an embodiment of the present application.
Referring to fig. 5, in an embodiment, a compression dictionary module 30 in the data compression apparatus includes a hash dictionary unit 31, at least one character string dictionary unit 32, and a matching decision unit 33.
The hash dictionary unit 31 is configured to update a preset hash dictionary, and read a plurality of character string addresses corresponding to the hash data based on the preset hash dictionary. Each character string dictionary unit 32 is used for updating a character string dictionary and reading a corresponding plurality of character string data according to a plurality of character string addresses. The matching decision unit 33 is configured to match the target character string according to the plurality of character string data.
Illustratively, each character string dictionary unit 32 contains a character string dictionary with identical contents. Optionally, the application does not limit the number of character string dictionary units 32, and a suitable number can be chosen according to the establishment cost and the search speed. Three character string dictionary units 32 are shown in fig. 5.
In this embodiment, the data compression device can quickly find whether the target character string exists in the character string dictionary by combining the hash dictionary unit 31 with the plurality of character string dictionary units 32, so that the matching decision unit 33 quickly outputs the matching result.
In one embodiment, the string dictionary unit 32 in the data compression device uses a multiport memory to build a string dictionary, so that the matching decision unit 33 performs parallel comparison processing on a plurality of string data.
Illustratively, each string dictionary unit 32 may implement a parallel alignment of string data for a corresponding number of ports based on a multi-port memory.
Third embodiment
In another aspect, the present application also provides a data compression system. Fig. 6 is a block diagram of the data compression system according to an embodiment of the present application.
Referring to fig. 6, in one embodiment, the data compression system includes a data distribution module 1, at least one data compression device 2, and a data aggregation module 3.
The data compression system is implemented using programmable logic devices.
Illustratively, the programmable logic device may use an FPGA.
The data distribution module 1 acquires input data of a plurality of bytes and divides the input data into a plurality of single-byte data. Each data compression device 2 processes a corresponding single byte data respectively to process a plurality of single byte data in parallel and generate a compression result respectively. The data aggregation module 3 combines the compression results generated by each data compression device 2 and outputs the total compressed data.
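Illustratively, the data distribution step can be sketched as follows. The sketch assumes that the bytes are handed out in turn, so that with four data compression devices the first device receives bytes 0, 4, 8 and so on; this matches the idea of reading a preset channel number of bytes and sending each to its own channel, but the exact assignment is not stated in the description. The function names are hypothetical, and `gather` is included only to show that the split loses no information; the format in which the data aggregation module combines the compressed results is not modeled.

```python
def distribute(data: bytes, lanes: int) -> list:
    """Split multi-byte input into one byte stream per data compression device (assumed round-robin)."""
    if lanes <= 0:
        raise ValueError("at least one lane is required")
    return [data[lane::lanes] for lane in range(lanes)]


def gather(streams: list) -> bytes:
    """Interleave the per-lane byte streams back into the original order."""
    lanes = len(streams)
    total = sum(len(stream) for stream in streams)
    out = bytearray(total)
    for lane, stream in enumerate(streams):
        out[lane::lanes] = stream
    return bytes(out)
```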
Illustratively, compressing a plurality of single-byte data in parallel by a plurality of data compression devices 2 can increase the rate of data compression. If the operating clock of the data compression device 2 is 200MHz, the clock period is 5ns, and the compression rate of one data compression device 2 is 27MB/s, then the compression rate of four data compression devices 2 is 108MB/s. Alternatively, the present application does not limit the number of data compression devices 2, and the more data compression devices 2, the more single bytes compressed in parallel at a time. The number of suitable data compression devices 2 is selected taking the compression rate and cost into account. Four data compression devices 2 are shown in fig. 6.
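Illustratively, the quoted speeds can be related to the clock figures with a short calculation. The sketch rests on two assumptions that the description does not state explicitly: that one byte is accepted every 7 clock cycles (the sum 1 + 1 + 3 + 1 + 1 of the stage durations given for fig. 3), and that MB is counted as 2^20 bytes. Under those assumptions the result is close to the 27MB/s and 108MB/s figures.

```python
CYCLES_PER_BYTE = 1 + 1 + 3 + 1 + 1   # assumed: stages of fig. 3 run back to back per byte


def throughput_mib_per_s(clock_hz: float, cycles_per_byte: int = CYCLES_PER_BYTE, devices: int = 1) -> float:
    """Bytes compressed per second, expressed in units of 2**20 bytes (assumed meaning of MB)."""
    bytes_per_second = clock_hz / cycles_per_byte * devices
    return bytes_per_second / 2**20
```

At 200 MHz this gives roughly 27.2 for one device and roughly 109.0 for four devices.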
In this embodiment, the data compression system combines the data compression devices 2 with the high integration, low power consumption, flexibility and parallel operation of the programmable logic device, so that multi-byte input data is compressed quickly in programmable logic hardware and the real-time compression capability is improved.
As described above, the data compression method, device and system provide a lossless compression method that compresses in real time, compresses stably, and is easy to maintain and upgrade in software, overcoming the technical bottlenecks and hardware limitations imposed by dedicated compression chips. During compression, all data processing and transmission is performed by the FPGA. With real-time compression speed in mind, RAM resources inside the FPGA are used to cache the dictionary, which solves the technical problem that conventional FPGA implementations of the LZW scheme compress too slowly. The dictionary is built from on-chip FPGA resources. Although a large-capacity dictionary can improve the compression ratio, the amount of on-chip FPGA resources must be taken into account; processing the data with fixed-length codes (11 bits) solves the problem of building and updating the dictionary, and addresses the low LZW compression ratio of conventional FPGA implementations of the LZW scheme, whose dictionary capacity is limited (less than 1024). The problem of transmitting and storing the compressed output data stream is solved by converting the 11-bit data into an 8-bit data stream for transmission and storage.
In the present application, step numbers such as S10 and S20 are used to describe the corresponding content more clearly and briefly and do not constitute a substantive limitation on the order; in a specific implementation, those skilled in the art may execute S20 before S10, and this remains within the scope of protection of the present application.
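Illustratively, converting 11-bit data into an 8-bit data stream can be sketched as a simple bit packer. The bit order (most significant bit first) and the zero padding of the final byte are assumptions made for this sketch; the description only states that the 11-bit data is converted into an 8-bit stream. The function name is hypothetical.

```python
def pack_codes(codes, width: int = 11) -> bytes:
    """Concatenate fixed-width codes MSB first and emit them as bytes (zero-padded at the end)."""
    accumulator = 0      # bits waiting to be emitted
    pending = 0          # number of valid bits in the accumulator
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError("code does not fit in the given width")
        accumulator = (accumulator << width) | code
        pending += width
        while pending >= 8:
            pending -= 8
            out.append((accumulator >> pending) & 0xFF)
        accumulator &= (1 << pending) - 1   # keep only the bits not yet emitted
    if pending:
        out.append((accumulator << (8 - pending)) & 0xFF)
    return bytes(out)
```

Eight 11-bit codes occupy exactly 88 bits, so they pack into 11 bytes with no padding.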
The embodiments of the data compression device and the system provided by the application can include all technical features of any one of the embodiments of the method, and the expansion and explanation contents of the description are basically the same as those of each embodiment of the method, and are not repeated here.
Embodiments of the present application also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method as in the various possible embodiments described above.
The embodiment of the application also provides a chip, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for calling and running the computer program from the memory, so that the device provided with the chip executes the method in the various possible implementation manners.
It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided by the embodiment of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
In the present application, identical or similar terms, concepts, technical solutions and/or application scenarios are generally described in detail only where they first appear. For brevity, they are generally not described again where they recur; for any such item not described in detail later, reference may be made to the earlier detailed description when reading the technical solutions of the present application.
In the present application, the descriptions of the embodiments are emphasized, and the details or descriptions of the other embodiments may be referred to.
The technical features of the technical solutions of the application may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it shall be considered to fall within the scope of this description.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (14)

1. A data compression method, applied to a programmable logic device, comprising:
acquiring input data, extracting characters to be processed of the input data, and generating a target character string;
calculating the target character string according to a preset hash function, and determining hash data;
reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary;
reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data;
and compressing the character to be processed according to the matching result.
2. The data compression method of claim 1, wherein the step of obtaining input data, extracting characters to be processed of the input data to generate a target character string comprises:
acquiring multi-byte input data, and dividing the input data into a plurality of single-byte data;
extracting the characters to be processed one by one, reading the character string prefix, and combining the character string prefix and the characters to be processed to obtain the target character string.
3. The data compression method as claimed in claim 2, wherein the step of extracting the character to be processed one by one and reading a string prefix, and combining the string prefix and the character to be processed as the target string comprises:
reading a preset channel number of the single-byte data based on the plurality of single-byte data;
and outputting the single-byte data of the preset channel number to corresponding processing channels respectively, so that a plurality of processing channels process the single-byte data of the preset channel number in parallel.
4. The data compression method of claim 1, wherein the predetermined hash function is a remainder function; the step of calculating the target character string according to a preset hash function and determining hash data comprises the following steps:
performing a remainder operation on the target character string with respect to a preset prime number, and taking the obtained remainder as the hash data.
5. The data compression method as claimed in claim 1, wherein the step of reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary comprises:
inputting the hash data as a target hash address of the preset hash dictionary, and reading data stored in the target hash address;
and extracting a plurality of character string addresses with preset numbers from the data stored in the target hash address.
6. The data compression method as claimed in any one of claims 1 to 5, wherein the step of reading a corresponding plurality of character string data according to the plurality of character string addresses and matching the target character string according to the plurality of character string data comprises:
based on a plurality of character string dictionaries, respectively reading a plurality of character string data stored corresponding to the plurality of character string addresses;
and comparing the target character string with the plurality of character string data respectively.
7. The data compression method of claim 6, wherein the step of comparing the target string with the plurality of string data, respectively, comprises:
and setting up the character string dictionary by using a multiport memory and a preset dictionary depth so as to perform parallel comparison processing on the plurality of character string data.
8. The data compression method of claim 6, wherein the compressing the character to be processed according to the matching result comprises:
when the target character string is matched, not outputting a coding value; and/or,
and when the target character string is not matched, outputting the character string prefix as a coding value, adding the target character string to a character string dictionary, and adding a storage address corresponding to the target character string to the preset hash dictionary.
9. The data compression method of claim 8, wherein the step of compressing the character to be processed according to the matching result further comprises:
when the target character string is matched, using the target character string as the character string prefix so as to process the next character to be processed; and/or,
when the target character string is not matched, using the character to be processed as the character string prefix so as to process the next character to be processed.
10. The data compression method of claim 8, wherein the step of compressing the character to be processed according to the matching result comprises:
encrypting the coded values in the maintained character string dictionary using a pseudo-encryption algorithm.
11. A data compression device, wherein the data compression device comprises a processor and a memory; the memory stores a computer program which, when executed by the processor, implements the steps of the data compression method according to any one of claims 1-10;
and/or the data compression device comprises a character string module, a hash algorithm module, a compression dictionary module, an encoding module and an encryption module;
the data compression device is implemented using a programmable logic device;
the character string module is used for acquiring input data, extracting characters to be processed of the input data and generating a target character string;
the hash algorithm module is used for applying a preset hash function to the target character string to determine hash data;
the compression dictionary module is used for reading a plurality of character string addresses corresponding to the hash data based on a preset hash dictionary, reading a plurality of corresponding character string data according to the plurality of character string addresses, and matching the target character string according to the plurality of character string data;
the coding module is used for compressing the character to be processed according to the matching result;
the encryption module is used for encrypting the character to be processed after compression processing by adopting a pseudo encryption algorithm.
12. The data compression apparatus of claim 11, wherein the compression dictionary module comprises a hash dictionary unit, at least one string dictionary unit, and a matching decision unit;
the hash dictionary unit is used for updating a preset hash dictionary and reading a plurality of character string addresses corresponding to the hash data based on the preset hash dictionary;
each character string dictionary unit is used for updating a character string dictionary and reading a plurality of corresponding character string data according to the plurality of character string addresses;
the matching decision unit is used for matching the target character string according to the plurality of character string data.
13. The data compression apparatus according to claim 12, wherein the character string dictionary unit builds the character string dictionary using a multiport memory so that the matching decision unit performs parallel comparison processing on the plurality of character string data.
14. A data compression system, wherein the data compression system comprises a data distribution module, at least one data compression device and a data aggregation module;
the data compression system is implemented using a programmable logic device;
the data distribution module acquires multi-byte input data and divides the input data into a plurality of single-byte data;
each data compression device processes a corresponding one of the single-byte data, so that the plurality of single-byte data are processed in parallel and each data compression device generates a compression result;
and the data aggregation module combines the compression results generated by the data compression devices and outputs the total compressed data.
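The distribute, compress and aggregate flow of claim 14 can be modelled in software as below. Several points are assumptions made only for illustration: the input is taken to be `LANES` bytes wide and split by byte position, `zlib.compress` from the Python standard library stands in for each data compression device (it is not the claimed method), a thread pool stands in for parallel hardware, and the length-prefixed framing in `aggregate` is a hypothetical output format.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

LANES = 4  # assumed width of the multi-byte input, in bytes


def distribute(data: bytes, lanes: int = LANES) -> List[bytes]:
    """Split the input into one single-byte stream per lane."""
    return [data[i::lanes] for i in range(lanes)]


def compress_lanes(data: bytes, lanes: int = LANES) -> List[bytes]:
    """Compress every lane concurrently; zlib is only a placeholder."""
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(zlib.compress, distribute(data, lanes)))


def aggregate(results: List[bytes]) -> bytes:
    """Hypothetical framing: a 4-byte big-endian length before each result."""
    return b"".join(len(r).to_bytes(4, "big") + r for r in results)


payload = b"abcdefgh" * 10
results = compress_lanes(payload)
total = aggregate(results)
print(distribute(b"abcdefgh"))  # [b'ae', b'bf', b'cg', b'dh']
```

Because each lane keeps its own dictionary state, the lanes have no data dependency on one another, which is what allows them to run in parallel.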
CN202310472128.7A 2023-04-26 2023-04-26 Data compression method, device and system Pending CN116683914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310472128.7A CN116683914A (en) 2023-04-26 2023-04-26 Data compression method, device and system

Publications (1)

Publication Number Publication Date
CN116683914A true CN116683914A (en) 2023-09-01

Family

ID=87779889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310472128.7A Pending CN116683914A (en) 2023-04-26 2023-04-26 Data compression method, device and system

Country Status (1)

Country Link
CN (1) CN116683914A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113106A (en) * 2023-10-19 2023-11-24 深圳大普微电子股份有限公司 Data compression method and device, electronic equipment and storage medium
CN117113106B (en) * 2023-10-19 2024-03-19 深圳大普微电子股份有限公司 Data compression method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105207678B (en) A kind of system for implementing hardware of modified LZ4 compression algorithms
US7283591B2 (en) Parallelized dynamic Huffman decoder
CN107836083B (en) Method, apparatus and system for semantic value data compression and decompression
US6674908B1 (en) Method of compression of binary data with a random number generator
US7538696B2 (en) System and method for Huffman decoding within a compression engine
US7233266B2 (en) Data compression/decompression device and data compression/decompression method
US5001478A (en) Method of encoding compressed data
US8456331B2 (en) System and method of compression and decompression
US5003307A (en) Data compression apparatus with shift register search means
US5010345A (en) Data compression method
Manzini et al. A simple and fast DNA compressor
US20090016452A1 (en) Blocking for combinatorial coding/decoding for electrical computers and digital data processing systems
CN116683914A (en) Data compression method, device and system
JPH06224778A (en) Equipment and method for data compression using matching string search and huffman coding
CN108886367A (en) Method, apparatus and system for compression and decompression data
CN115189696A (en) Hardware compression and decompression method based on Huffman decoding table
Najmabadi et al. An architecture for asymmetric numeral systems entropy decoder-a comparison with a canonical Huffman decoder
US5010344A (en) Method of decoding compressed data
US5184126A (en) Method of decompressing compressed data
US6919827B2 (en) Method and apparatus for effectively decoding Huffman code
US10230391B2 (en) Compression and/or encryption of a file
US10491241B1 (en) Data compression scheme utilizing a repetitive value within the data stream
Andrii et al. UDC 004.627 LZW COMPRESSION ALGORITHM
CN113810057B (en) Method, apparatus and system for semantic value data compression and decompression
Sriman et al. Efficient Text Compression Algorithms: Principles, Performance, and Applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination