CN115577149A - Data processing method, device and equipment and readable storage medium - Google Patents

Data processing method, device and equipment and readable storage medium

Info

Publication number
CN115577149A
Authority
CN
China
Prior art keywords
address
hash
point information
character
chain table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211592819.2A
Other languages
Chinese (zh)
Other versions
CN115577149B (en)
Inventor
刘伟
卢圣才
王洪良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN202211592819.2A (CN115577149B)
Publication of CN115577149A
Application granted
Publication of CN115577149B
Priority to PCT/CN2023/101223 (WO2024124843A1)
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, apparatus, device and readable storage medium in the field of computer technology. After a hardware device acquires a target character string, it determines a plurality of character fragments in the target character string, queries the hash chain table corresponding to each character fragment according to the position of that fragment in the target character string, and determines the write address of each hash chain table in a dual-port RAM of the hardware device. The chain point information in each hash chain table is then written into the dual-port RAM according to the write address. Because each hash chain table is stored at its own write address, the hash chain tables corresponding to the data to be compressed are stored in order. The data processing apparatus, device and readable storage medium provided by the application have the same technical effects.

Description

Data processing method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.
Background
At present, when data is compressed, a preset dictionary can be queried to construct hash chain tables for the data to be compressed. For any piece of data to be compressed, different character fragments in it can be queried at the same time to obtain their corresponding hash chain tables. Some character fragments have long hash chain tables and others have short ones, so the time at which each query returns is not fixed. That is, the order in which the queries return does not match the original order of the character fragments in the data to be compressed, so the hash chain tables are output out of order, which affects the subsequent compression processing.
Therefore, how to store the hash chain tables corresponding to the data to be compressed in order is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, an object of the present application is to provide a data processing method, apparatus, device and readable storage medium, so that each hash chain table corresponding to data to be compressed is stored sequentially. The specific scheme is as follows:
in a first aspect, the present application provides a data processing method, applied to a hardware device, including:
acquiring a target character string;
determining a plurality of character fragments in the target character string;
inquiring a hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string, and determining the write-in address of each hash chain table in a dual-port RAM of the hardware equipment;
and writing the chain point information in each hash chain table into the dual-port RAM according to the write address.
Optionally, the querying a hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string includes:
determining the query order of the character fragments according to the position of each character fragment in the target character string and the rule that a fragment positioned earlier is queried earlier;
and querying the hash chain table corresponding to each character fragment according to the query order.
Optionally, the query process of the hash chain table corresponding to any character fragment includes:
calculating the hash value of the current character fragment;
determining a current address to be read corresponding to the hash value;
judging whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found;
if not, after reading the chain point information from the current address to be read, adding a chain table end flag bit to the chain point information, and constructing the read chain point information into the hash chain table corresponding to the current character fragment.
Optionally, if the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found, after reading the chain point information from the current address to be read, adding a chain table unfinished flag bit to the chain point information, determining the next address to be read, taking it as the current address to be read, and performing again the step of judging whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found.
Optionally, the determining the current address to be read corresponding to the hash value includes:
and taking the hash value as the current address to be read.
Optionally, the determining a write address of each hash chain table in the dual-port RAM of the hardware device includes:
determining the write address of each hash chain table in the dual-port RAM according to the query order and the rule that a hash chain table queried earlier is given an earlier write address.
Optionally, there are a plurality of dual-port RAMs;
correspondingly, the writing process of the chain point information in any hash chain table includes:
writing the pieces of chain point information in the current hash chain table into the respective dual-port RAMs according to the write address.
Optionally, the method further comprises:
reading a hash chain table corresponding to any character segment from each double-port RAM according to the write address;
and determining the coding length corresponding to each link point information in the currently read hash link table, and selecting the link point information with the longest coding length as the compression information of the current character segment.
Optionally, the selecting the link point information with the longest coding length as the compression information of the current character fragment includes:
and comparing the coding lengths corresponding to the chain point information in the currently read hash chain table in a parallel cascade mode to select the chain point information with the longest coding length as the compressed information of the current character segment.
Optionally, the method further comprises:
if there are a plurality of pieces of chain point information with the longest coding length, determining the distance values corresponding to those pieces of chain point information, and selecting the chain point information with the smallest distance value as the compression information of the current character fragment.
Optionally, the method further comprises:
after chain point information carrying the chain table end flag bit is written to any RAM address in any dual-port RAM, setting the status register corresponding to that RAM address to 1; correspondingly, after the data at the RAM address is read out, setting the status register corresponding to the RAM address to 0.
Optionally, the number of pieces of chain point information in the hash chain table corresponding to any character fragment does not exceed the number of dual-port RAMs.
In a second aspect, the present application provides a data processing apparatus, applied to a hardware device, including:
the acquisition module is used for acquiring a target character string;
a determining module for determining a plurality of character fragments in the target character string;
the processing module is used for inquiring the hash chain table corresponding to each character segment according to the position of each character segment in the target character string and determining the write-in address of each hash chain table in the double-port RAM of the hardware equipment;
and the writing module is used for writing the information of each link point in each hash link table into the dual-port RAM according to the writing address.
Optionally, the processing module is specifically configured to:
determine the query order of the character fragments according to the position of each character fragment in the target character string and the rule that a fragment positioned earlier is queried earlier;
and query the hash chain table corresponding to each character fragment according to the query order.
Optionally, the processing module is specifically configured to: for any character fragment, calculate the hash value of the current character fragment; determine a current address to be read corresponding to the hash value; judge whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found; and if not, after reading the chain point information from the current address to be read, add a chain table end flag bit to the chain point information and construct the read chain point information into the hash chain table corresponding to the current character fragment.
Optionally, the processing module is specifically configured to: if the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found, after reading the chain point information from the current address to be read, add a chain table unfinished flag bit to the chain point information, determine the next address to be read, take it as the current address to be read, and perform again the step of judging whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found.
Optionally, the processing module is specifically configured to:
and taking the hash value as the current address to be read.
Optionally, the processing module further comprises:
an address determining unit, configured to determine the write address of each hash chain table in the dual-port RAM according to the query order and the rule that a hash chain table queried earlier is given an earlier write address.
Optionally, there are a plurality of dual-port RAMs; correspondingly, the writing module is specifically configured to: for any hash chain table, write the pieces of chain point information in the current hash chain table into the respective dual-port RAMs according to the write address.
Optionally, the apparatus further comprises:
a reading module, configured to read the hash chain table corresponding to any character fragment from each dual-port RAM according to the write address;
and a selection module, configured to determine the coding length corresponding to each piece of chain point information in the currently read hash chain table, and to select the chain point information with the longest coding length as the compression information of the current character fragment.
Optionally, the selection module is specifically configured to:
compare, in a parallel cascade manner, the coding lengths corresponding to the pieces of chain point information in the currently read hash chain table, so as to select the chain point information with the longest coding length as the compression information of the current character fragment.
Optionally, the apparatus further comprises:
another selection module, configured to, if there are a plurality of pieces of chain point information with the longest coding length, determine the distance values corresponding to those pieces of chain point information, and select the chain point information with the smallest distance value as the compression information of the current character fragment.
Optionally, the apparatus further comprises:
an address state changing module, configured to set the status register corresponding to a RAM address to 1 after chain point information carrying the chain table end flag bit is written to that RAM address in any dual-port RAM; and, correspondingly, to set the status register corresponding to the RAM address to 0 after the data at the RAM address is read out.
Optionally, the number of pieces of chain point information in the hash chain table corresponding to any character fragment does not exceed the number of dual-port RAMs.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the data processing method disclosed in the foregoing.
In a fourth aspect, the present application provides a readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the data processing method disclosed in the foregoing.
According to the above scheme, the present application provides a data processing method, applied to a hardware device, including: acquiring a target character string; determining a plurality of character fragments in the target character string; querying the hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string, and determining the write address of each hash chain table in a dual-port RAM of the hardware device; and writing the chain point information in each hash chain table into the dual-port RAM according to the write address. The beneficial effect is as follows: after the hardware device acquires the target character string, it determines a plurality of character fragments in it, queries the hash chain table corresponding to each character fragment according to the position of that fragment in the target character string, determines the write address of each hash chain table in the dual-port RAM of the hardware device, and then writes the chain point information in each hash chain table into the dual-port RAM according to the write address, so that each hash chain table is stored in the dual-port RAM at its own write address.
In this scheme, before the hash chain tables are stored, a write address is determined in advance for each hash chain table according to the position of the corresponding character fragment in the target character string. When a hash chain table is later stored in the dual-port RAM, each piece of its chain point information is written at the write address determined for that hash chain table. Each hash chain table is therefore stored at a predetermined write address, the order of the hash chain tables' addresses in the RAM matches the original query order, and the tables are not stored out of order. In other words, the hash chain tables corresponding to the data to be compressed are stored in order. The hash chain tables can subsequently be read out in RAM address order, so they are not read out of order and the subsequent compression operation is not affected. In addition, using a hardware device can speed up compression.
Accordingly, the data processing device, the equipment and the readable storage medium provided by the application also have the technical effects.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a data processing method disclosed herein;
FIG. 2 is a schematic illustration of a chain point write disclosed herein;
FIG. 3 is a schematic diagram of a hash chain table constructed as disclosed herein;
FIG. 4 is a schematic diagram illustrating a comparison between a parallel cascade comparison and a serial bubble comparison disclosed in the present application;
FIG. 5 is a schematic diagram of a recursive transmission of comparison results disclosed herein;
FIG. 6 is a schematic diagram of a data processing apparatus according to the present disclosure;
FIG. 7 is a schematic diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
At present, when data is compressed, a preset dictionary can be queried to construct hash chain tables for the data to be compressed. For any piece of data to be compressed, different character fragments in it can be queried at the same time to obtain their corresponding hash chain tables. Some character fragments have long hash chain tables and others have short ones, so the time at which each query returns is not fixed. That is, the order in which the queries return does not match the original order of the character fragments in the data to be compressed, so the hash chain tables are output out of order, which affects the subsequent compression processing. The present application therefore provides a data processing scheme that can store the hash chain tables corresponding to the data to be compressed in order.
Referring to FIG. 1, an embodiment of the present application discloses a data processing method, applied to a hardware device, including:
S101, acquiring a target character string.
S102, determining a plurality of character fragments in the target character string.
In this embodiment, the target character string is the data to be compressed. For example, if the target character string is ABCDEFGH, the character fragments determined in it may be: ABC, BCD, CDE, DEF, EFG, FGH. As can be seen, a character fragment typically includes 3 characters, and consecutive fragments start one character apart.
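The fragment extraction in the example above can be sketched as a sliding window. This is a minimal illustration, not the patented hardware logic: the function name `character_fragments` and the 1-based position numbering are assumptions chosen to match the example.

```python
from typing import List, Tuple


def character_fragments(target: str, size: int = 3) -> List[Tuple[int, str]]:
    """Return (position, fragment) pairs for every window of `size` characters.

    Positions are 1-based, so the first fragment has position 1.
    A target shorter than `size` yields no fragments.
    """
    return [(i + 1, target[i:i + size]) for i in range(len(target) - size + 1)]


if __name__ == "__main__":
    for position, fragment in character_fragments("ABCDEFGH"):
        print(position, fragment)
```

For the target character string ABCDEFGH this yields the six fragments ABC through FGH at positions 1 through 6.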
The compression algorithms to which this embodiment applies include the LZ77 algorithm and the like. LZ77 is a commonly used dictionary matching algorithm that uses a fixed-size "sliding window" to handle repeated strings in the data. A common way to build the dictionary and match strings is hash-based: bits are taken from a fixed number of bytes, a hash value is calculated from them with a specific hash function, and the hash value is used as the write address at which chain point information is written. When a string is matched, the same bits of its fixed bytes are hashed and the hash value is used as the read address to read out chain point information; the corresponding string in the original dictionary is then located from the chain point information and compared character by character to obtain the match length. The match distance can be calculated as the difference between the position of the string currently being looked up and the position of the matching chain point read from the dictionary. Finally, the match length and the match distance can be used as the compressed information of a character.
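The hash-based lookup described above can be sketched in software as follows. This is only an illustration under stated assumptions: the hash function `hash3`, the 12-bit table size, the 3-byte minimum match and the 258-byte maximum match are choices made for the sketch, not values given in the application, and the sliding-window limit is not modelled.

```python
from typing import Dict, Optional, Tuple


def hash3(data: bytes, pos: int, table_bits: int = 12) -> int:
    """Toy hash over the 3 bytes starting at `pos` (assumed, not from the source)."""
    a, b, c = data[pos], data[pos + 1], data[pos + 2]
    return ((a << 8) ^ (b << 4) ^ c) & ((1 << table_bits) - 1)


def match_length(data: bytes, old: int, cur: int, max_len: int = 258) -> int:
    """Compare the strings at `old` and `cur` byte by byte and count equal bytes."""
    n = 0
    while cur + n < len(data) and n < max_len and data[old + n] == data[cur + n]:
        n += 1
    return n


def find_match(data: bytes, cur: int, table: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Return (match length, match distance) for position `cur`, or None.

    The hash value serves as both the write address and the read address;
    the value stored at that address is an earlier position with the same hash.
    """
    if cur + 3 > len(data):
        return None
    h = hash3(data, cur)
    old = table.get(h)          # read using the hash value as the address
    table[h] = cur              # write the current position at the same address
    if old is None:
        return None
    length = match_length(data, old, cur)
    if length < 3:              # hash collision or match too short to be useful
        return None
    return length, cur - old    # distance = current position - matched position


if __name__ == "__main__":
    data = b"ABCDABCD"
    table: Dict[int, int] = {}
    for i in range(len(data)):
        print(i, find_match(data, i, table))
```

In this sketch only the most recent position per hash value is kept; the hash chain tables discussed in the rest of the description hold several candidate positions per fragment.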
S103, inquiring the hash chain table corresponding to each character segment according to the position of each character segment in the target character string, and determining the write address of each hash chain table in the dual-port RAM of the hardware equipment.
Taking the example target character string ABCDEFGH, the positions of ABC, BCD, CDE, DEF, EFG and FGH in the target character string may be recorded in order as 1, 2, 3, 4, 5 and 6, and the 6 character fragments are then queried in the order 1, 2, 3, 4, 5, 6. As can be seen, character fragments positioned earlier are queried earlier.
In a specific embodiment, querying the hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string includes: determining the query order of the character fragments according to the position of each character fragment in the target character string and the rule that a fragment positioned earlier is queried earlier; and querying the hash chain table corresponding to each character fragment according to the query order.
The query process of the hash chain table corresponding to any character fragment includes: calculating the hash value of the current character fragment; determining a current address to be read corresponding to the hash value; and judging whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found. If not, after the chain point information is read from the current address to be read, a chain table end flag bit is added to the chain point information, and the read chain point information is constructed into the hash chain table corresponding to the current character fragment. If the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found, after the chain point information is read from the current address to be read, a chain table unfinished flag bit is added to the chain point information and the next address to be read is determined and taken as the current address to be read; the judging step is then performed again, and this repeats until the current address to be read differs from the hash value of every character fragment whose hash chain table has already been found.
For example, suppose the hash chain table corresponding to the character fragment ABC is: A1, where A1 carries the chain table end flag bit. When the hash chain table of the character fragment BCD is queried, the hash value of BCD is the same as the hash value of ABC. The hash value is taken as the current address to be read, A1 is read from that address, and a chain table unfinished flag bit (for example, 0) is added to A1. The next address to be read is then determined and becomes the current address to be read, and it is again judged whether the current address to be read is the same as the hash value of a character fragment whose hash chain table has already been found. If yes, the process is repeated; if not, A2 is read from that address and a chain table end flag bit (for example, 1) is added to A2. This yields the hash chain table of the character fragment BCD: A1 → A2, in which A1 and A2 are chain point information, A1 carries the chain table unfinished flag bit, and A2 carries the chain table end flag bit. As can be seen, a dictionary containing all the chain point information is stored in advance, and querying the hash chain table of a character fragment amounts to determining which pieces of chain point information the current character fragment corresponds to.
Determining the current address to be read corresponding to the hash value includes: taking the hash value as the current address to be read. The next address to be read can be determined according to a preset algorithm or strategy. A hash chain table contains at least one piece of chain point information and at most a preset number of pieces, where the preset number does not exceed the number of dual-port RAMs. Thus the hash value of a character fragment is used as the address to be read, and an offset address of the character fragment can be read from that address; this offset address is one piece of chain point information in the hash chain table of the character fragment.
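The query loop can be sketched as follows, under one literal reading of the description. Several things here are assumptions made for illustration: the dictionary is modelled as a Python `dict` from address to chain point information, the "preset algorithm or strategy" for the next address is passed in as a hypothetical `next_address` callable, and the flag values 0 (unfinished) and 1 (end) follow the example values in the text.

```python
from typing import Callable, Dict, List, Set, Tuple

UNFINISHED = 0  # chain table unfinished flag bit (example value from the text)
END = 1         # chain table end flag bit (example value from the text)


def query_hash_chain(
    fragment_hash: int,
    dictionary: Dict[int, str],
    found_hashes: Set[int],
    next_address: Callable[[int], int],
    max_points: int,
) -> List[Tuple[str, int]]:
    """Build the hash chain table for one fragment as (chain point, flag) pairs."""
    chain: List[Tuple[str, int]] = []
    address = fragment_hash  # the hash value is used as the first address to read
    while True:
        point = dictionary[address]
        # Continue only if this address equals the hash of a fragment whose chain
        # was already found, and the chain still has room (bounded by max_points).
        unfinished = address in found_hashes and len(chain) + 1 < max_points
        chain.append((point, UNFINISHED if unfinished else END))
        if not unfinished:
            break
        address = next_address(address)
    found_hashes.add(fragment_hash)
    return chain


if __name__ == "__main__":
    dictionary = {5: "A1", 6: "A2"}   # hypothetical addresses and contents
    found: Set[int] = set()
    step = lambda a: a + 1            # hypothetical next-address strategy
    print(query_hash_chain(5, dictionary, found, step, max_points=6))  # ABC
    print(query_hash_chain(5, dictionary, found, step, max_points=6))  # BCD
```

With both fragments hashing to address 5, the first call reproduces the chain A1 (end) for ABC and the second reproduces A1 (unfinished) → A2 (end) for BCD, matching the example above. The `max_points` bound reflects the requirement that a chain hold no more pieces of chain point information than there are dual-port RAMs.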
S104, writing the chain point information in each hash chain table into the dual-port RAM according to the write address of each hash chain table in the dual-port RAM of the hardware device.
As described above, a character fragment positioned earlier is queried earlier, and its hash chain table is given an earlier write address. The query order and the write address therefore both depend on the position of the character fragment in the target character string. Accordingly, in a specific embodiment, determining the write address of each hash chain table in the dual-port RAM of the hardware device includes: determining the write address of each hash chain table in the dual-port RAM according to the query order and the rule that a hash chain table queried earlier is given an earlier write address.
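The effect of fixing write addresses in advance can be sketched as follows. This is a simplified software model under assumptions: one RAM is modelled as a Python list, write addresses are taken to be the 0-based query order, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def assign_write_addresses(num_fragments: int) -> List[int]:
    """Fragment queried i-th (0-based) gets write address i (assumed mapping)."""
    return list(range(num_fragments))


def store_results(
    completed: Sequence[Tuple[int, str]],
    write_addresses: Sequence[int],
) -> List[Optional[str]]:
    """Store each result at its predetermined address, whatever the arrival order.

    `completed` holds (query index, hash chain table) pairs in the order in
    which the queries happened to return.
    """
    ram: List[Optional[str]] = [None] * len(write_addresses)
    for query_index, chain in completed:
        ram[write_addresses[query_index]] = chain
    return ram


if __name__ == "__main__":
    addresses = assign_write_addresses(3)
    # Queries return out of order: the third fragment's short chain comes back first.
    arrivals = [(2, "chain of CDE"), (0, "chain of ABC"), (1, "chain of BCD")]
    print(store_results(arrivals, addresses))
```

Reading the list in address order returns the chains in the original fragment order even though they arrived out of order, which is the property the scheme relies on.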
In one embodiment, there are a plurality of dual-port RAMs; correspondingly, the writing process of the chain point information in any hash chain table includes: writing the pieces of chain point information in the current hash chain table into the respective dual-port RAMs according to the write address.
Referring to FIG. 2, suppose there are 6 dual-port RAMs: RAM0, RAM1, RAM2, RAM3, RAM4 and RAM5. The character fragments ABC, BCD, CDE, DEF, EFG and FGH have positions 1, 2, 3, 4, 5 and 6, so their write addresses may be address 0, address 1, address 2, address 3, address 4 and address 5, respectively. That is, when the hash chain table corresponding to ABC is: A1, A1 is written to address 0 of RAM0, and nothing is written to address 0 of RAM1 to RAM5. When the hash chain table corresponding to BCD is: A1 → A2, A1 is written to address 1 of RAM0, A2 is written to address 1 of RAM1, and nothing is written to address 1 of RAM2 to RAM5. When the hash chain table corresponding to DEF has 5 pieces of chain point information, they are written to address 3 of RAM0, RAM1, RAM2, RAM3 and RAM4, respectively. Therefore the number of pieces of chain point information in the hash chain table of any character fragment cannot exceed the number of dual-port RAMs; otherwise the hash chain table cannot be stored completely.
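The layout in this example, where the k-th piece of chain point information of a chain goes to RAM k at the chain's write address, can be sketched as follows. The RAM depth of 8 and the helper names are assumptions for illustration; each RAM is modelled as a Python list.

```python
from typing import List, Optional

Ram = List[Optional[str]]


def make_rams(num_rams: int = 6, depth: int = 8) -> List[Ram]:
    """Create `num_rams` empty RAM models, each with `depth` addresses."""
    return [[None] * depth for _ in range(num_rams)]


def write_chain(rams: List[Ram], write_address: int, chain: List[str]) -> None:
    """Write chain point k of `chain` to RAM k at `write_address`."""
    if len(chain) > len(rams):
        raise ValueError("hash chain table has more chain points than there are RAMs")
    for bank, point in enumerate(chain):
        rams[bank][write_address] = point


def read_chain(rams: List[Ram], address: int) -> List[str]:
    """Read one chain back by scanning the same address across the RAMs."""
    chain: List[str] = []
    for ram in rams:
        point = ram[address]
        if point is None:
            break
        chain.append(point)
    return chain


if __name__ == "__main__":
    rams = make_rams()
    write_chain(rams, 0, ["A1"])          # ABC
    write_chain(rams, 1, ["A1", "A2"])    # BCD
    print(read_chain(rams, 0), read_chain(rams, 1))
```

Because every RAM is addressed with the same write address, a whole chain can be read in one step by reading that address from all RAMs at once; the sketch scans them sequentially only because it is software.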
It should be noted that the compression information can be determined from the hash chain table corresponding to each character fragment. In a specific embodiment, the method further includes: reading the hash chain table corresponding to any character fragment from each dual-port RAM according to the write address; and determining the coding length corresponding to each piece of chain point information in the currently read hash chain table, and selecting the chain point information with the longest coding length as the compression information of the current character fragment. If there are a plurality of pieces of chain point information with the longest coding length, the distance values corresponding to those pieces of chain point information are determined, and the chain point information with the smallest distance value is selected as the compression information of the current character fragment. Each piece of chain point information corresponds to one coding length and one distance value.
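The selection rule (longest coding length, ties broken by smallest distance value) can be sketched as follows. The `ChainPoint` type and its field names are assumptions introduced for the sketch.

```python
from typing import List, NamedTuple


class ChainPoint(NamedTuple):
    name: str
    length: int    # coding length associated with this chain point
    distance: int  # distance value associated with this chain point


def select_compression_info(chain: List[ChainPoint]) -> ChainPoint:
    """Pick the chain point with the longest length; on ties, the smallest distance."""
    if not chain:
        raise ValueError("a hash chain table holds at least one chain point")
    return max(chain, key=lambda p: (p.length, -p.distance))


if __name__ == "__main__":
    chain = [
        ChainPoint("A1", length=4, distance=20),
        ChainPoint("A2", length=6, distance=35),
        ChainPoint("A3", length=6, distance=12),
    ]
    print(select_compression_info(chain))
```

Here A2 and A3 share the longest coding length, so A3 is chosen because its distance value is smaller.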
The method for selecting the link point information with the longest coding length as the compression information of the current character segment comprises the following steps: and comparing the coding lengths corresponding to the chain point information in the currently read hash chain table in a parallel cascade mode to select the chain point information with the longest coding length as the compressed information of the current character segment. Selecting one piece of chain point information as the compressed information of a certain character segment, and showing that: the coding length and distance value corresponding to the chain point information can be used for decompressing the corresponding character segment.
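The selection rule (longest coding length first, then smallest distance value among ties) can be expressed compactly. The following is a minimal software sketch; `ChainPoint` and `select_best` are hypothetical names, and the sketch does not model the parallel cascade comparison used in hardware.

```python
from typing import NamedTuple, Sequence


class ChainPoint(NamedTuple):
    length: int    # coding length associated with this chain point
    distance: int  # distance value associated with this chain point


def select_best(points: Sequence[ChainPoint]) -> ChainPoint:
    """Pick the chain point with the longest coding length.

    If several chain points share the longest coding length, the one
    with the smallest distance value is chosen.
    """
    longest = max(p.length for p in points)
    candidates = [p for p in points if p.length == longest]
    return min(candidates, key=lambda p: p.distance)


best = select_best([ChainPoint(3, 10), ChainPoint(5, 40), ChainPoint(5, 12)])
```

In this example two chain points share the longest coding length 5, so the one with distance 12 is selected.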
The dual-port RAM in this embodiment has two completely independent sets of data lines, address lines and read-write control lines, which can improve the throughput of the RAM. To implement read-write control, in this embodiment, after link point information carrying a link table end flag bit is written to any RAM address in any dual-port RAM, the status register corresponding to that RAM address is set to 1; accordingly, after the data in the RAM address is read out, the status register corresponding to the RAM address is set to 0.
Therefore, in the present application, after the hardware device obtains the target character string, a plurality of character segments are determined in the target character string; the hash chain table corresponding to each character segment is queried according to the position of each character segment in the target character string, and the write address of each hash chain table in the dual-port RAM of the hardware device is determined; each piece of chain point information in each hash chain table is then written into the dual-port RAM according to the write address, so that each hash chain table is stored in the dual-port RAM according to its write address. In this scheme, before the hash chain tables are stored, a write address is determined in advance for each hash chain table according to the position of the corresponding character segment in the target character string; when a hash chain table is subsequently stored in the dual-port RAM, each piece of its chain point information is written according to the write address previously determined for that table. The order of the addresses of the hash chain tables in the RAM therefore matches the original query order, and the hash chain tables are not stored out of order. In this way, the hash chain tables corresponding to the data to be compressed are stored in sequence. The hash chain tables can then be read out in RAM address order rather than out of order, so the subsequent compression operation is not affected. Meanwhile, using a hardware device speeds up compression.
The following embodiments implement the present application by using a hardware design of FPGA, which can also solve the problem of out-of-order output in hash matching, effectively improve the link point screening efficiency in a parallel cascade comparison manner, and facilitate migration on each platform based on RTL (Register Transfer Level).
Typically, a hash lookup table selects individual bits of the first three bytes of a string for the hash operation. Assuming that the character string to be looked up is ABCDEFG…, ABC is first selected for hashing to obtain a hash value a1, the hash table content A1 corresponding to address a1 is read, and A1 is written into RAM1. If a hash collision exists, the search continues backward to obtain A2, A3, …, A8, and so on. Each time a piece of chain point information is found, it is written into the corresponding dual-port RAM. Similarly, when the table lookup continues on the character string, the next three bytes BCD are selected for hashing to obtain a hash value b1, the hash table content B1 corresponding to address b1 is read, and if no hash collision exists, B1 is written directly into the dual-port RAM. By analogy, the chain point information matched with the character strings beginning with CDE, DEF, EFG, … is written into the dual-port RAM.
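The three-byte sliding window described above can be sketched as follows. The function name `character_segments` is hypothetical, and the 0-based position returned with each segment is assumed to serve as its write address, in line with the fig. 2 example.

```python
def character_segments(target, size=3):
    """Return (write_address, segment) pairs for a sliding window.

    The window advances one character at a time, so the segment that
    starts at offset k of the target string gets write address k.
    """
    return [(pos, target[pos:pos + size])
            for pos in range(len(target) - size + 1)]


segments = character_segments("ABCDEFGH")
```

For the string ABCDEFGH this yields the six segments ABC, BCD, CDE, DEF, EFG and FGH with addresses 0 to 5.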
As can be seen, the hash value a1 of ABC is the current read address, from which the offset address of ABC can be read, and the offset address is the first chain point information of the hash chain table of ABC. As shown in fig. 3, if the offset address of ABC is 0000, it is stored in the RAM at address a1, where address a1 is the hash value of ABC, and the hash chain table of ABC includes only 0000. If the hash value of the next character segment BCD is also a1, searching 0000 in the address a1 as a first link point, then determining a next address X according to a preset algorithm, and searching another link point 0030 in the address X; if X does not collide with the hashes of other character fragments, then the hash chain table of the character fragment BCD is determined to be: 0000 → 0030.
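One possible reading of this lookup, written as a software sketch under explicit assumptions: the hash table is modelled as a dictionary from address to chain point, `seen_hashes` holds the hash values of segments that have already been queried, and `next_address` stands in for the "preset algorithm", which the text does not specify. All of these names, and the numeric addresses 5 and 6 used for a1 and X, are hypothetical.

```python
def query_chain(segment_hash, table, seen_hashes, next_address):
    """Build the hash chain for one character segment.

    Returns a list of (chain_point, end_flag) pairs; end_flag is 1 only
    on the last chain point. The sketch assumes next_address eventually
    reaches an address that is not in seen_hashes.
    """
    chain = []
    address = segment_hash          # the hash value is the first read address
    while address in seen_hashes:   # collides with an earlier segment's hash
        chain.append((table[address], 0))   # chain not finished yet
        address = next_address(address)
    chain.append((table[address], 1))       # last chain point: end flag set
    seen_hashes.add(segment_hash)
    return chain


# Worked example mirroring fig. 3: a1 = 5 holds 0000 and X = 6 holds 0030.
table = {5: "0000", 6: "0030"}
seen = set()
step = lambda address: address + 1          # placeholder preset algorithm
chain_abc = query_chain(5, table, seen, step)
chain_bcd = query_chain(5, table, seen, step)
```

With these inputs the chain for ABC contains only 0000, and the chain for BCD is 0000 → 0030, matching the example in the text.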
The data written into the dual-port RAM consists of a one-bit flag, which indicates whether the linked list has ended, followed by the link point information. If the current link point is the last link point, the flag bit is 1; otherwise, the flag bit is 0. When the last link point information of a linked list is written to a RAM address, the address status register corresponding to that address is set to 1.
Read control of the dual-port RAM reads addresses sequentially, according to the number of subsequent-stage data processing channels. For example, if the subsequent stage is dual-channel, the RAM data of two addresses is read at a time. After the last link point information of a linked list has been read, the address status register of the corresponding address needs to be set to 0. The number of addresses read from the dual-port RAM at a time thus depends on the number of subsequent-stage data processing channels. The depth of the dual-port RAM depends on the length of the linked list (i.e., the total number of link points in the linked list). The data width of the dual-port RAM is 9 bits in total, namely a one-bit link point end flag and 8 bits of link point data.
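The 9-bit word format can be illustrated with a pair of pack and unpack helpers. This is a sketch only: the text gives the field sizes (one flag bit plus 8 data bits) but not the bit positions, so placing the end flag in the most significant bit is an assumption, and `pack_word` and `unpack_word` are hypothetical names.

```python
END_FLAG_SHIFT = 8   # assumed: the end flag occupies bit 8 (the MSB)
DATA_MASK = 0xFF     # 8 bits of chain point data


def pack_word(data, is_last):
    """Combine 8-bit chain point data and the end flag into a 9-bit word."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("chain point data must fit in 8 bits")
    return (int(is_last) << END_FLAG_SHIFT) | data


def unpack_word(word):
    """Split a 9-bit word into (data, is_last)."""
    return word & DATA_MASK, bool((word >> END_FLAG_SHIFT) & 1)


word = pack_word(0x30, True)
```

A reader can stop collecting chain points for an address as soon as it unpacks a word whose flag is set.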
The specific implementation steps comprise:
(1) Take the relative offset position of the character segment as the write address of the dual-port RAM, and write the hash-matched link point information together with the one-bit link point end flag into the corresponding dual-port RAM.
(2) When a RAM address stores the last link point information of a linked list, set the flag of the status register corresponding to that address to 1 after the write.
(3) Read the link point data in the dual-port RAM in the order of the subsequent-stage data processing channels.
(4) After the last link point of a linked list has been read, set the flag of the status register corresponding to that address to 0.
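Steps (1) to (4) can be modelled as a small software simulation. This is a hedged sketch rather than RTL: `ChainBuffer` and `drain_in_order` are hypothetical names, one status bit per address is assumed, and a reader that simply stops at the first address whose status is 0 is one possible way to use the register.

```python
class ChainBuffer:
    """Software model of several dual-port RAMs plus per-address status bits."""

    def __init__(self, num_rams, depth):
        self.rams = [[None] * depth for _ in range(num_rams)]
        self.status = [0] * depth

    def write_chain(self, address, chain):
        # Step (1): chain point i goes to RAM i at the segment's address,
        # stored with a one-bit end flag (1 only on the last chain point).
        if not chain or len(chain) > len(self.rams):
            raise ValueError("chain must hold 1..num_rams chain points")
        last = len(chain) - 1
        for i, data in enumerate(chain):
            self.rams[i][address] = (data, 1 if i == last else 0)
        # Step (2): the last chain point has been written, mark the address.
        self.status[address] = 1

    def read_chain(self, address):
        if self.status[address] != 1:
            return None  # chain at this address is not completely written
        chain = []
        for ram in self.rams:
            data, end_flag = ram[address]
            chain.append(data)
            if end_flag:
                break
        # Step (4): the last chain point has been read, clear the status.
        self.status[address] = 0
        return chain


def drain_in_order(buf, next_address):
    """Step (3): read chains strictly in address order, stopping at a gap."""
    out = []
    while next_address < len(buf.status):
        chain = buf.read_chain(next_address)
        if chain is None:
            break
        out.append(chain)
        next_address += 1
    return out, next_address


buf = ChainBuffer(num_rams=6, depth=4)
buf.write_chain(1, [0x10, 0x20])           # address 1 arrives first
early, cursor = drain_in_order(buf, 0)     # nothing readable yet
buf.write_chain(0, [0x00])                 # address 0 arrives later
ordered, cursor = drain_in_order(buf, cursor)
```

Although the chain for address 1 is written first, it is only read after the chain for address 0, which is how the out-of-order lookups end up in sequence.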
After a hash chain table has been read, its chain points are compared to select an optimal chain point. Each chain point corresponds to one coding length and one distance, and the chain point with the longest coding length is selected as the optimal chain point. If several chain points share the longest length, the one with the shortest distance is selected from among them. The link points in one hash linked list are compared in a parallel cascade manner, which screens link points faster than the traditional serial bubble comparison. Referring to fig. 4, when the hash chain length is 8, the serial bubble comparison shown vertically requires 7 clock cycles, while the parallel cascade comparison shown horizontally requires only 3 clock cycles. In addition, compared with a fully parallel comparison, the sequential logic inserted between the stages of the parallel cascade comparison effectively improves the timing of the module.
If the chain points in one linked list are less than 8, the comparison result is directly transmitted in the subsequent clock period after the comparison result is obtained, and the comparison in the subsequent clock period is not needed. As shown in fig. 5, only two link points A1 and A2 are present in a linked list, so that after the comparison result of a21 is obtained in the first clock, a21 is directly transferred to a31 and a41 in the subsequent 2 nd and 3 rd clocks.
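The parallel cascade comparison can be simulated as a pairwise reduction in which each loop iteration stands for one clock cycle. The sketch below is illustrative only: `better` and `cascade_select` are hypothetical names, chain points are modelled as `(length, distance)` tuples, empty slots as `None`, and folding the distance tie-break into every comparator cell is an assumption about how the selection rule maps onto the stages.

```python
def better(a, b):
    """One comparator cell: longer coding length wins, then shorter distance.

    An empty input (None) passes the other input straight through, which
    models forwarding a result when fewer than 8 chain points exist.
    """
    if a is None:
        return b
    if b is None:
        return a
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] <= b[1] else b


def cascade_select(points, width=8):
    """Reduce up to `width` chain points pairwise, one stage per clock cycle."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if len(points) > width:
        raise ValueError("too many chain points for this comparator width")
    stage = list(points) + [None] * (width - len(points))
    cycles = 0
    while len(stage) > 1:
        stage = [better(stage[i], stage[i + 1])
                 for i in range(0, len(stage), 2)]
        cycles += 1
    return stage[0], cycles


full = [(3, 5), (4, 9), (4, 2), (1, 1), (2, 2), (4, 7), (3, 3), (4, 2)]
best_full, cycles_full = cascade_select(full)
best_two, cycles_two = cascade_select([(3, 5), (5, 8)])
```

With 8 inputs the reduction goes 8 → 4 → 2 → 1, i.e. 3 stages, whereas a serial scan of 8 chain points needs 7 comparisons. With only two chain points, the first-stage result is simply passed through the remaining two stages.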
Therefore, in this embodiment, linked lists that would otherwise be output out of order can be output in sequence by using the dual-port RAM in the FPGA together with read-write control logic. Meanwhile, the chain point comparison module performs parallel cascade comparison on the chain points stored at the same address, which effectively shortens the chain point screening time and improves the final compression efficiency.
In the following, a data processing apparatus provided in an embodiment of the present application is introduced, and a data processing apparatus described below and a data processing method described above may be referred to each other.
Referring to fig. 6, an embodiment of the present application discloses a data processing apparatus, which is applied to a hardware device, and includes:
an obtaining module 601, configured to obtain a target character string;
a determining module 602, configured to determine a plurality of character fragments in the target character string;
the processing module 603 is configured to query, according to the position of each character segment in the target character string, a hash chain table corresponding to each character segment, and determine a write address of each hash chain table in a dual-port RAM of the hardware device;
and a writing module 604, configured to write the link point information in each hash link table into the dual-port RAM according to the write address.
In a specific embodiment, the processing module is specifically configured to:
determining the query sequence of the character segments according to the position of each character segment in the target character string and the rule that a character segment at an earlier position is queried first;
and querying the hash chain table corresponding to each character segment according to the query sequence.
In a specific embodiment, the processing module is specifically configured to: for any character segment, calculate the hash value of the current character segment; determine a current address to be read corresponding to the hash value; judge whether the current address to be read is the same as the hash value of a character segment whose hash chain table has already been queried; if not, after reading the link point information from the current address to be read, add a link table end flag bit to the link point information, and construct the read link point information into the hash chain table corresponding to the current character segment.
In a specific embodiment, the processing module is specifically configured to: if the current address to be read is the same as the hash value of the character segment of the found hash chain table, after reading the chain point information from the current address to be read, adding a chain table unfinished flag bit to the chain point information, determining the next address to be read, and executing the step of judging whether the hash value of the current address to be read is the same as the hash value of the character segment of the found hash chain table.
In a specific embodiment, the processing module is specifically configured to:
and taking the hash value as the current address to be read.
In one embodiment, the processing module further comprises:
and the address determining unit is used for determining the write address of each hash chain table in the dual-port RAM according to the query sequence and the rule that a hash chain table queried earlier is assigned an earlier write address.
In one embodiment, the dual-port RAM is multiple; correspondingly, the write module is specifically configured to: and for any hash chain table, respectively writing the chain point information in the current hash chain table into each dual-port RAM according to the write-in address.
In a specific embodiment, the method further comprises the following steps:
the reading module is used for reading the hash chain table corresponding to any character segment from each double-port RAM according to the writing address;
and the selection module is used for determining the coding length corresponding to each link point information in the currently read hash link table, and selecting the link point information with the longest coding length as the compression information of the current character segment.
In a specific embodiment, the selection module is specifically configured to:
and comparing the coding lengths corresponding to the link point information in the currently read hash link list in a parallel cascade mode to select the link point information with the longest coding length as the compression information of the current character segment.
In a specific embodiment, the method further comprises the following steps:
and the other selection module is used for determining distance values corresponding to a plurality of link point information with the longest coding length if the number of the link point information with the longest coding length is multiple, and selecting the link point information with the smallest distance value as the compression information of the current character segment.
In a specific embodiment, the method further comprises the following steps:
the address state changing module is used for setting the status register corresponding to a RAM address to 1 after link point information carrying a link table end flag bit is written to that RAM address in any dual-port RAM; accordingly, after the data in the RAM address is read out, the status register corresponding to the RAM address is set to 0.
In a specific implementation manner, the number of the chain point information in the hash chain table corresponding to any character fragment does not exceed the number of the dual-port RAM.
For more specific working processes of each module and unit in this embodiment, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not described here again.
As can be seen, the present embodiment provides a data processing apparatus, which can implement: and sequentially storing each hash chain table corresponding to the data to be compressed.
In the following, an electronic device provided by an embodiment of the present application is introduced, and an electronic device described below and a data processing method and apparatus described above may be referred to each other.
Referring to fig. 7, an embodiment of the present application discloses an electronic device, including:
a memory 701 for storing a computer program;
a processor 702 configured to execute the computer program to implement the method disclosed in any of the embodiments.
Further, an embodiment of the present application further provides a server as the electronic device. The server may specifically include: at least one processor, at least one memory, a power supply, a communication interface, an input output interface, and a communication bus. Wherein, the memory is used for storing a computer program, and the computer program is loaded and executed by the processor to realize the relevant steps in the data processing method disclosed by any one of the foregoing embodiments.
In this embodiment, the power supply is configured to provide a working voltage for each hardware device on the server; the communication interface can create a data transmission channel between the server and external equipment, and the communication protocol followed by the communication interface is any communication protocol applicable to the technical scheme of the application, and the communication protocol is not specifically limited herein; the input/output interface is used for acquiring external input data or outputting data to the outside, and the specific interface type can be selected according to specific application requirements without specific limitation.
In addition, the memory is used as a carrier for storing resources, and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like, where the resources stored thereon include an operating system, a computer program, data, and the like, and the storage manner may be a transient storage manner or a permanent storage manner.
The operating system is used for managing and controlling the hardware devices and computer programs on the server, so that the processor can operate on and process the data in the memory; it may be Windows Server, NetWare, Unix, Linux, and the like. In addition to the computer program that can be used to perform the data processing method disclosed in any of the foregoing embodiments, the computer program may further include computer programs that can be used to perform other specific tasks. The data may include data such as developer information of the virtual machine, in addition to data such as the virtual machine.
Further, the embodiment of the application also provides a terminal as the electronic device. The terminal may specifically include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
Generally, the terminal in this embodiment includes: a processor and a memory.
The processor may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in a wake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory is at least used for storing a computer program which, after being loaded and executed by the processor, can implement the relevant steps of the data processing method executed on the terminal side disclosed in any one of the foregoing embodiments. In addition, the resources stored in the memory may also include an operating system, data and the like, and the storage mode may be transient or permanent. The operating system may include Windows, Unix, Linux, etc. The data may include, but is not limited to, update information for the application.
In some embodiments, the terminal may further include a display, an input/output interface, a communication interface, a sensor, a power source, and a communication bus.
In the following, a readable storage medium provided by an embodiment of the present application is introduced, and a readable storage medium described below and a data processing method, apparatus, and device described above may be referred to each other.
A readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the data processing method disclosed in the foregoing embodiments.
References to "first," "second," "third," "fourth," etc. (if any) in this application are intended to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, or apparatus.
It should be noted that the descriptions relating to "first", "second", etc. in this application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present application.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the art.
The principle and the embodiment of the present application are explained by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. A data processing method is applied to hardware equipment and comprises the following steps:
acquiring a target character string;
determining a plurality of character fragments in the target character string;
inquiring a hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string, and determining the write address of each hash chain table in a dual-port RAM of the hardware equipment;
and writing the chain point information in each hash chain table into the dual-port RAM according to the write address.
2. The method according to claim 1, wherein the querying the hash chain table corresponding to each character segment according to the position of each character segment in the target character string comprises:
determining the query sequence of the character segments according to the position of each character segment in the target character string and the rule that a character segment at an earlier position is queried first;
and querying the hash chain table corresponding to each character segment according to the query sequence.
3. The method according to claim 2, wherein the query process of the hash chain table corresponding to any character fragment comprises:
calculating the hash value of the current character segment;
determining a current address to be read corresponding to the hash value;
judging whether the current address to be read is the same as the hash value of the character segment of the searched hash chain table;
if not, after reading the link point information from the current address to be read, adding a link table end flag bit for the link point information, and constructing the read link point information into a hash link table corresponding to the current character segment.
4. The method of claim 3,
if the hash value of the current address to be read is the same as the hash value of the character segment of the found hash chain table, after reading the chain point information from the current address to be read, adding a chain table incomplete flag bit to the chain point information, determining the next address to be read, and executing the step of judging whether the hash value of the current address to be read is the same as the hash value of the character segment of the found hash chain table.
5. The method of claim 3, wherein the determining the current address to be read corresponding to the hash value comprises:
and taking the hash value as the current address to be read.
6. The method of claim 2, wherein determining the write address of each hash chain table in the dual-port RAM of the hardware device comprises:
and determining the write address of each hash chain table in the dual-port RAM according to the query sequence and the rule that a hash chain table queried earlier is assigned an earlier write address.
7. The method of claim 6,
a plurality of double-port RAMs are provided;
correspondingly, the writing process of each link point information in any hash link list comprises the following steps:
and respectively writing the chain point information in the current hash chain table into the double-port RAM according to the writing address.
8. The method of any one of claims 1 to 7, further comprising:
reading a hash chain table corresponding to any character segment from each double-port RAM according to the write address;
and determining the coding length corresponding to each link point information in the currently read hash link table, and selecting the link point information with the longest coding length as the compression information of the current character segment.
9. The method according to claim 8, wherein the selecting the link point information with the longest coding length as the compression information of the current character segment comprises:
and comparing the coding lengths corresponding to the chain point information in the currently read hash chain table in a parallel cascade mode to select the chain point information with the longest coding length as the compressed information of the current character segment.
10. The method of claim 8, further comprising:
and if there are a plurality of pieces of link point information with the longest coding length, determining the distance values corresponding to those pieces of link point information, and selecting the link point information with the minimum distance value as the compression information of the current character segment.
11. The method of any one of claims 1 to 7, further comprising:
after link point information carrying a link table end flag bit is written to any RAM address in any dual-port RAM, setting the status register corresponding to that RAM address to 1; accordingly, after the data in the RAM address is read out, setting the status register corresponding to the RAM address to 0.
12. The method of claim 7, wherein the number of the chain point information in the hash chain table corresponding to any character segment does not exceed the number of the dual port RAM.
13. A data processing device, applied to a hardware device, includes:
the acquisition module is used for acquiring a target character string;
a determining module for determining a plurality of character fragments in the target character string;
the processing module is used for inquiring the hash chain table corresponding to each character fragment according to the position of each character fragment in the target character string and determining the write-in address of each hash chain table in the dual-port RAM of the hardware equipment;
and the writing module is used for writing the information of each link point in each hash link table into the dual-port RAM according to the writing address.
14. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the method of any one of claims 1 to 12.
15. A readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the method of any one of claims 1 to 12.
CN202211592819.2A 2022-12-13 2022-12-13 Data processing method, device and equipment and readable storage medium Active CN115577149B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211592819.2A CN115577149B (en) 2022-12-13 2022-12-13 Data processing method, device and equipment and readable storage medium
PCT/CN2023/101223 WO2024124843A1 (en) 2022-12-13 2023-06-20 Data processing method and apparatus, and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211592819.2A CN115577149B (en) 2022-12-13 2022-12-13 Data processing method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN115577149A true CN115577149A (en) 2023-01-06
CN115577149B CN115577149B (en) 2023-03-10

Family

ID=84590510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211592819.2A Active CN115577149B (en) 2022-12-13 2022-12-13 Data processing method, device and equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN115577149B (en)
WO (1) WO2024124843A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024124843A1 (en) * 2022-12-13 2024-06-20 浪潮电子信息产业股份有限公司 Data processing method and apparatus, and device and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103997346A (en) * 2014-05-12 2014-08-20 东南大学 Data matching method and device based on assembly line
WO2022198483A1 (en) * 2021-03-24 2022-09-29 深圳市大疆创新科技有限公司 Data compression method and apparatus, movable platform, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6624762B1 (en) * 2002-04-11 2003-09-23 Unisys Corporation Hardware-based, LZW data compression co-processor
CN104636377B (en) * 2013-11-12 2018-09-07 华为技术服务有限公司 Data compression method and equipment
CN110928483B (en) * 2018-09-19 2021-04-09 华为技术有限公司 Data storage method, data acquisition method and equipment
CN112306420B (en) * 2020-11-13 2023-01-17 山东云海国创云计算装备产业创新中心有限公司 Data read-write method, device and equipment based on storage pool and storage medium
CN115577149B (en) * 2022-12-13 2023-03-10 浪潮电子信息产业股份有限公司 Data processing method, device and equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pei Dongxing et al.: "Hardware Implementation of the LZW Algorithm in a Storage Test System", Metrology & Measurement Technique *

Also Published As

Publication number Publication date
WO2024124843A1 (en) 2024-06-20
CN115577149B (en) 2023-03-10

Similar Documents

Publication Publication Date Title
CN112765271B (en) Block chain transaction index storage method and device, computer equipment and medium
CN112445729B (en) Operation address determination method, PCIe system, electronic device and storage medium
WO2020007288A1 (en) Method and system for managing memory data and maintaining data in memory
CN115577149B (en) Data processing method, device and equipment and readable storage medium
CN112667636B (en) Index establishing method, device and storage medium
CN113434943B (en) BIM standard code processing method and device, electronic equipment and readable storage medium
CN113705136A (en) Integrated circuit automation logic synthesis system, method, device and medium
CN115964002B (en) Electric energy meter terminal archive management method, device, equipment and medium
CN114817651B (en) Data storage method, data query method, device and equipment
CN112732321A (en) Firmware modification method and device, computer readable storage medium and equipment
CN111427511B (en) Data storage method and device
CN111061429B (en) Data access method, device, equipment and medium
CN111858581A (en) Page query method and device, storage medium and electronic equipment
CN115904240A (en) Data processing method and device, electronic equipment and storage medium
CN115408547A (en) Dictionary tree construction method, device, equipment and storage medium
CN112988038B (en) Data writing method of nonvolatile memory, terminal and readable storage medium
CN109840080B (en) Character attribute comparison method and device, storage medium and electronic equipment
CN116266276A (en) Method and device for realizing activation function in neural network
CN114138184A (en) Data deduplication brushing method, device, equipment and medium
CN113420191A (en) Data storage method and device, data query method and device, data structure, electronic device and computer readable storage medium
CN114547038B (en) Data processing method and device of priority database
CN117687704B (en) Display card initialization method, device, equipment and storage medium
CN113641731A (en) Fuzzy search optimization method and device, electronic equipment and readable storage medium
US7840583B2 (en) Search device and recording medium
CN110941571B (en) Flash memory controller and related access method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant