CN103997346B - Pipeline-based data matching method and device - Google Patents

Publication number
CN103997346B
Authority
CN
China
Prior art keywords
memory
coupling
address
dictionary
chain head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410197834.6A
Other languages
Chinese (zh)
Other versions
CN103997346A (en)
Inventor
董乾
刘勇
李冰
赵霞
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201410197834.6A priority Critical patent/CN103997346B/en
Publication of CN103997346A publication Critical patent/CN103997346A/en
Application granted granted Critical
Publication of CN103997346B publication Critical patent/CN103997346B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a pipeline-based data matching method and device. The method comprises three steps: reading the original source file to be match-compressed into a dictionary memory block by block, in order, and updating the contents of the dictionary memory as match compression progresses; applying an improved hash algorithm to each run of three consecutive characters in the dictionary memory, using the resulting hash value to pre-select character strings that may match, and thereby building an address chain of candidate strings; and, at the same time, using the values in the address chain as addresses to fetch data from the dictionary memory, comparing them in turn, and outputting the best matching result. The three steps run concurrently as a pipeline; because reading in and hash calculation are fast, the data dependency between pipeline stages is small.

Description

A pipeline-based data matching method and device
Technical field
Embodiments of the present invention relate to the technical field of data compression, and in particular to a pipeline-based data matching method and device.
Background
At present, data compression is widely used in fields such as the Internet of Things, databases, and cloud storage to save storage space, reduce the demand for storage media, and improve data transmission efficiency. Gzip is currently one of the most widely used efficient, open-source compression algorithms; for example, compressing web pages with Gzip on a Web server improves access response speed.
The Gzip compression algorithm consists of two parts: the LZ77 algorithm and Huffman coding. Huffman coding is outside the scope of this patent. The LZ77 algorithm removes redundancy from the original source file by replacing repeated strings with references to earlier matches, thereby achieving compression. A software implementation of LZ77 works as follows: first, a hash algorithm is used to link the addresses of character strings that may match into a linked list; then, within the matching range (also called the sliding window, 32768 bytes, i.e. 32 KB, in size), the string currently being processed is compared iteratively with the strings in that list to find the best match; finally, the matched string is replaced to remove the redundancy.
In current software implementations of LZ77, the linked list is built with a hash algorithm. Starting from the beginning of the file, a 15-bit hash value is calculated for every 3 consecutive bytes (24 bits) of the original source file (strings with the same hash value may match), and a linked-list data structure holds the indexes of all strings with the same hash. For each character to be matched, the hash of that character and the 2 bytes that follow it (3 consecutive bytes in total) is calculated first. Then, while the hash chain is being maintained, it is used to fetch, one after another, the addresses of strings within the matching range that may match; the strings at those addresses are read in turn and compared. The part of the string that matched successfully must also be hashed and inserted into the chain so that later match searches can find it.
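The serial software flow described above (hash three bytes, walk the chain, compare, insert) can be sketched as follows. This is a minimal illustration, not the code of any particular Gzip implementation: the shift-and-XOR hash is an assumed stand-in for "a 15-bit hash of 3 bytes", and the names `hash3`, `longest_match`, `insert` and `compress` are hypothetical.

```python
WINDOW = 32 * 1024        # sliding-window size given in the text
MIN_MATCH = 3             # hashes cover 3 consecutive bytes
HASH_BITS = 15            # hash width given in the text
HASH_MASK = (1 << HASH_BITS) - 1


def hash3(data, i):
    """Assumed shift-and-XOR hash of data[i:i+3]; the text gives no formula."""
    h = 0
    for b in data[i:i + 3]:
        h = ((h << 5) ^ b) & HASH_MASK
    return h


def longest_match(data, pos, head, prev, max_len=255):
    """Walk the hash chain for data[pos:] and return (length, distance)."""
    best_len, best_dist = 0, 0
    cand = head.get(hash3(data, pos))
    while cand is not None and pos - cand <= WINDOW:
        n = 0
        while n < max_len and pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_dist = n, pos - cand
        cand = prev.get(cand)          # next older position with the same hash
    return best_len, best_dist


def insert(data, pos, head, prev):
    """Link position pos into the chain of its 3-byte hash."""
    h = hash3(data, pos)
    if h in head:
        prev[pos] = head[h]
    head[h] = pos


def compress(data):
    """Return a list of literal byte values and (length, distance) pairs."""
    head, prev, out, pos = {}, {}, [], 0
    while pos < len(data):
        length, dist = 0, 0
        if pos + MIN_MATCH <= len(data):
            length, dist = longest_match(data, pos, head, prev)
        if length >= MIN_MATCH:
            out.append((length, dist))
            step = length
        else:
            out.append(data[pos])
            step = 1
        # matched bytes are hashed and inserted too, as the text requires
        for p in range(pos, pos + step):
            if p + MIN_MATCH <= len(data):
                insert(data, p, head, prev)
        pos += step
    return out
```

Every step here waits for the previous one, which is the serial dependency the patent sets out to remove by running the stages as a hardware pipeline.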
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
When software executes the LZ77 algorithm strictly serially (hash calculation → linked-list maintenance → value comparison, with the comparison iterated in a loop), processing efficiency is very low and large amounts of processor and memory resources are consumed. In addition, the hash calculation used by current LZ77 software implementations is flawed: the hash values of adjacent character strings collide easily (different strings, same hash value), which causes invalid match attempts. These two issues are the performance bottlenecks of software LZ77 implementations.
Summary of the invention
The present invention provides a pipeline-based data matching method and device to overcome the defects of prior-art software implementations of LZ77, in which strictly serial iterative processing yields a low match success rate, poor efficiency, and heavy consumption of processor and memory resources. In the embodiments, the LZ77 algorithm is implemented in hardware based on a programmable gate array.
The present invention provides a pipeline-based data matching method, including:
A dictionary memory stores, block by block, the file to be match-compressed. As match compression progresses, the dictionary memory cyclically reads in the original source file block by block, in order, and updates its contents until the whole file has been match-compressed.
A hash unit uses an improved hash algorithm to calculate the hash value of the currently processed character and the 2 bytes that follow it (3 bytes in total, hereinafter: the currently processed character field). With this hash value as the address, the position of the currently processed character in the dictionary memory is stored as the content in the chain head memory. Depending on the content found at that address in the chain head random access memory, the chain head memory, the backtracking memory, and the associated chain-head match FIFO memory are maintained.
A match comparison unit obtains, in order, the indexes of strings that may match from the chain-head match FIFO memory and the backtracking memory and compares them in turn using an improved match comparison method, maintaining the backtracking memory at the same time, until match comparison ends.
The above operations are carried out in parallel, as a pipeline. Because reading in and hash calculation are fast, the data dependency between pipeline stages is small. The improved match comparison method differs from the old method of comparing only 1 byte at a time: by fetching values from the dictionary memory and concatenating them, it compares 8 bytes at a time and, after each comparison, outputs either a match failure or the number of bytes matched. If all 8 bytes match, the next 8 bytes are compared, and so on until the longest match length is obtained.
An embodiment of the present invention also provides a pipeline-based device, including:
A dictionary memory, for storing, block by block in order, the original source file to be match-compressed, and for updating as match compression progresses; and a hash unit, for calculating the hash value of the currently processed character field and, using the hash value as the address, storing the position of the currently processed character in the dictionary memory into the chain head memory. If the chain head memory already holds old position information at the address given by the hash value, the old position information is replaced by the new position information. When the hash unit replaces old position information with new, it concatenates the new and old position information and puts the result into the chain-head match FIFO memory. During match comparison, the match comparison unit takes this information from the chain-head match FIFO memory and uses it to traverse the backtracking memory and obtain the addresses that share the same hash value. Data are then fetched from the dictionary memory at these addresses, and matches are substituted to remove redundancy.
The hash unit specifically includes:
A hash calculation module, for calculating the hash value of the currently processed character field using the improved hash algorithm;
A write management module, for storing the address of the currently processed character at the chain head memory location given by the hash value. If the target location holds no valid data (its content is zero), only the new data is written directly into the chain head memory. If the target location holds valid data (its content is non-zero), the module checks whether the distance between that content and the address of the currently processed character exceeds the match comparison range. If it does, only the new data is written directly into the chain head memory. If it does not, then while the new data is written into the chain head memory, the old data is stored in the backtracking memory at the address given by the new data, and the concatenation of the new data and the old data is stored in the chain-head match FIFO memory.
The backtracking memory specifically includes:
A backtracking memory storage module, consisting of 3 blocks of 68-kilobyte (68 KB: 17 bit × 32 K) block random access memory (block RAM), each block corresponding to one block of the dictionary memory; it stores the address mapping between the first characters of character fields that share the same hash value;
A read/write control module: when the hash unit and the match comparison unit access the backtracking memory at the same time (a write request from the hash unit and a read request from the match comparison unit), the access request of the hash unit is served first and the request of the match comparison unit is processed one clock cycle later.
The match comparison unit specifically includes:
A match address traversal module, for fetching values from the chain-head match FIFO memory and the backtracking memory, traversing in a loop the first-character addresses of all strings that may match the currently processed character field, and sending them to the match dictionary fetch module;
A match dictionary fetch module, for reading the corresponding strings from the dictionary memory at the addresses supplied by the match address traversal module and, by concatenating the fetched values, outputting a string of 8 consecutive bytes each time;
A comparison module, for comparing the consecutive strings supplied by the match dictionary fetch module and, once all match comparisons are complete, outputting the best matching result; the match address traversal module then starts a new round of match comparison according to that result.
The pipeline-based data matching method and device provided by the present invention implement the LZ77 algorithm in hardware based on a programmable gate array. This overcomes the defects of existing software implementations, in which strictly serial iterative processing yields a low match success rate, poor efficiency, and heavy consumption of processor and memory resources; it improves the processing efficiency of the LZ77 algorithm and greatly reduces processor and memory resource consumption.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural diagram of the pipeline-based data matching hardware according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the dictionary memory according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the hash unit according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the chain head memory according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the backtracking memory according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the chain-head match FIFO memory according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the match comparison range according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the flag bit register according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the match comparison unit according to an embodiment of the present invention;
Figs. 10, 11 and 12 are flowcharts of the pipeline-based data matching method and of the maintenance of the main memory units according to an embodiment of the present invention.
Detailed description
According to embodiments of the present invention, the pipeline-based LZ77 device can be built on an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). An ASIC implementation has a high one-off investment threshold, a long development cycle, and strong dependence on the manufacturing process, and the final product is inflexible, so it is unsuited to network equipment, where products are updated quickly and requirements change often. An FPGA-based approach has a moderate one-off investment, a short development cycle, and good platform portability. The embodiments of the present invention therefore implement the LZ77 algorithm with an FPGA-based device.
The technical solutions of the embodiments are further described below with reference to the drawings and specific embodiments.
According to an embodiment of the present invention, a data matching device is provided. Fig. 1 is a schematic diagram of the data matching device. As shown in Fig. 1, the device includes: dictionary memory 10, hash unit 12, chain head memory 14, backtracking memory 16, chain-head match FIFO memory 18, flag bit register 20, and match comparison unit 22. Each module is described in detail below.
Dictionary memory 10 holds the characters to be compressed. The original source file to be match-compressed is read into it block by block, and the hash unit and the match comparison unit read from it. The contents of the dictionary memory are updated cyclically, in order. In this embodiment, as shown in Fig. 2, the dictionary memory is divided into 3 blocks of 32 kilobytes (KB) each.
Specifically, the LZ77 algorithm requires a sliding window (window for short), typically 32 kilobytes (KB) in size, within which matches for the currently processed character field are searched. The window moves as compression proceeds: new content moves in and old content moves out. In this embodiment, as shown in Fig. 7, the window size is set to 32 K.
To keep operation management simple, many hardware implementations of LZ77 use independent windows: the original source file is artificially split into independent 32-kilobyte (KB) segments matching the window size, and the segments are stored in different blocks of dictionary memory 10, with no relationship between one block and another. To process the original source file with a continuous window and in a pipelined fashion, the embodiment optimizes the storage management of dictionary memory 10 as follows:
A dedicated pre-read and clear mechanism manages the content of each block in dictionary memory 10. As shown in Fig. 2, dictionary memory 10 is divided into 3 blocks of 32 kilobytes (KB) each, and the sliding window is 32 kilobytes (KB). If and only if the sliding window has moved out of a block of dictionary memory 10, the flag for that block in flag bit register 20 is set; if the original source file has not yet been fully processed, the content of that block is cleared and the next portion of the original source file is loaded into it, after which the flag for the block is cleared. When the window is about to move into a new block, the flag for that block in flag bit register 20 is checked first: if the flag is clear, the window moves in directly; if the flag is set, the window waits until the flag has been cleared before moving in.
Dictionary memory 10 provides an interface to the hash unit that advances by one byte at a time, in order, and outputs a string of 3 consecutive bytes. Specifically, 2 groups of registers, each more than 3 bytes long (8 in this embodiment), read data from dictionary memory 10 in order as needed, are refilled alternately, and output the string of 3 consecutive bytes.
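The alternating register pair can be modelled in software as a ping-pong buffer. The sketch below is only one reading of the description: it assumes each register holds 8 bytes and that the register just drained is refilled with the next 8 bytes of memory; `REG_BYTES` and `three_byte_windows` are hypothetical names.

```python
REG_BYTES = 8   # assumed register length, read from the embodiment value of 8


def three_byte_windows(memory):
    """Yield (position, 3-byte string), advancing one byte per step."""
    regs = [memory[0:REG_BYTES], memory[REG_BYTES:2 * REG_BYTES]]
    next_fetch = 2 * REG_BYTES   # next memory offset to load into a register
    active = 0                   # register that holds the current position
    offset = 0                   # byte offset inside the active register
    pos = 0
    while pos + 3 <= len(memory):
        # a window near the end of the active register spills into the other
        joined = regs[active] + regs[1 - active]
        yield pos, joined[offset:offset + 3]
        pos += 1
        offset += 1
        if offset == REG_BYTES:
            # active register is used up: refill it and switch to the other
            regs[active] = memory[next_fetch:next_fetch + REG_BYTES]
            next_fetch += REG_BYTES
            active = 1 - active
            offset = 0
```

Because one register is always full while the other is being refilled, a 3-byte window that straddles a register boundary is still available without stalling.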
On each clock edge, up to 2 read requests (from hash unit 12 and from match comparison unit 22) may access dictionary memory 10 at the same time, so a read coordination module is needed to avoid read conflicts; this embodiment uses a priority scheme. Based on an evaluation of which arrangement gives the best efficiency, match comparison unit 22 is given higher access priority than hash unit 12. If the two units issue read requests at different times, dictionary memory 10 responds immediately. If the two units issue read requests at the same time, match comparison unit 22 is served immediately and the request of hash unit 12 is served 1 cycle later.
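A cycle-by-cycle software model of this priority scheme might look as follows. It is a sketch under assumptions: the memory serves one request per cycle, a deferred request is served before newer ones, and the requester names and the function `arbitrate` are hypothetical.

```python
def arbitrate(cycles, priority=("match_compare", "hash")):
    """Return which requester is served in each cycle (None if idle).

    cycles: one set of requester names per clock cycle.
    priority: requesters in descending priority for simultaneous requests.
    """
    waiting = []   # requests not yet served, oldest first
    grants = []
    for requests in cycles:
        # simultaneous requests are queued in priority order
        waiting.extend(name for name in priority if name in requests)
        grants.append(waiting.pop(0) if waiting else None)
    return grants


# Cycle 1 has a conflict: match_compare wins, hash is served one cycle later.
schedule = arbitrate([{"hash"}, {"hash", "match_compare"}, set(), {"match_compare"}])
```

Swapping the order of `priority` models the opposite arrangement, in which the hash unit is served first.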
Hash unit 12 calculates the hash value of the 3-consecutive-byte string output by dictionary memory 10. Its input is the binary value of the 3-byte string and its output is a 16-bit hash value. The output hash value serves as the index of the processed string. With this hash value as the address, the address (position information) of the first byte of the string in dictionary memory 10 is stored as the content in chain head memory 14.
Chain head memory 14 uses the hash value of a string as the address and stores the address (position information) of the first byte of that string in dictionary memory 10. Before storing, it first determines whether the data already at that address is valid. If the data is invalid, the new content is written directly to that address. If the data is valid, it is concatenated with the data about to be stored and the result is written into chain-head match FIFO memory 18; at the same time, the existing valid data is written into backtracking memory 16 at the address given by the data about to be stored, and the existing valid data in chain head memory 14 is replaced by the new data.
Specifically, whether the data at the target address in chain head memory 14 is valid is determined by two conditions: 1. whether the data is all zeros; if so, it is invalid; 2. if it is not all zeros, whether the "difference" between the new data about to be stored and the existing data (that is, the distance between the two addresses) exceeds the window size (32 K in this embodiment). If it does, the data is invalid; otherwise it is valid.
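The update rule for the chain head memory, including the two validity conditions, can be sketched in a few lines. The dictionaries and the deque stand in for the hardware memories; the 17-bit address width, the plain (non-wrapping) address difference, and the name `update_chain_head` are assumptions made for illustration.

```python
from collections import deque

WINDOW = 32 * 1024   # window size from the embodiment
ADDR_BITS = 17       # assumed address width (34-bit FIFO word = 2 addresses)


def update_chain_head(head, backtrack, fifo, hash_value, new_addr):
    """Store new_addr under hash_value; return True if old data was valid.

    head:      hash value -> most recent address (0 means "no data")
    backtrack: address -> previous address with the same hash
    fifo:      words holding (new address, old address) concatenated
    """
    old_addr = head.get(hash_value, 0)
    # Condition 1: all zeros means invalid.
    # Condition 2: an address further back than the window is invalid.
    # A circular memory would take the difference modulo its size.
    valid = old_addr != 0 and (new_addr - old_addr) <= WINDOW
    if valid:
        fifo.append((new_addr << ADDR_BITS) | old_addr)
        backtrack[new_addr] = old_addr
    head[hash_value] = new_addr      # the new data always replaces the old
    return valid
```

Since zero marks an empty entry, this model assumes address 0 is never stored as a real position.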
Chain-head match FIFO memory 18: when the content at an address in the chain head memory is updated and that address previously held valid data, this memory stores the new data and the old valid data concatenated end to end. In this embodiment, the data width is 34 bits and the depth is 20 K. The data in chain-head match FIFO memory 18 is read by match comparison unit 22 during match comparison.
Backtracking memory 16: when the content at an address in the chain head memory is updated and that address previously held valid data, backtracking memory 16 stores the old valid data at the address given by the new data. In this embodiment, backtracking memory 16 mirrors dictionary memory 10 and is divided into 3 corresponding blocks, each a 68-kilobyte (68 KB: 17 bit × 32 K) block random access memory (block RAM). The content of each block of backtracking memory 16 is kept in step with the corresponding block of dictionary memory 10: when a block of dictionary memory 10 is cleared, the corresponding block of backtracking memory 16 is cleared as well. The data in backtracking memory 16 is read by match comparison unit 22 during match comparison.
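The stated widths and sizes are consistent with one another if each stored address is 17 bits wide. The short check below works through that arithmetic; reading 17 bits as the width needed to address the whole 3 × 32 KB dictionary memory is an inference, not something the text states.

```python
BLOCKS = 3
BLOCK_BYTES = 32 * 1024

dict_bytes = BLOCKS * BLOCK_BYTES                      # 96 KB of dictionary memory
addr_bits = (dict_bytes - 1).bit_length()              # bits needed per address
backtrack_block_bytes = addr_bits * BLOCK_BYTES // 8   # one address per byte position
fifo_width_bits = 2 * addr_bits                        # new address + old address

print(addr_bits, backtrack_block_bytes // 1024, fifo_width_bits)
```

Under that reading, one backtracking block holds 32 K entries of 17 bits (68 KB), and one FIFO word holds two concatenated addresses (34 bits).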
Flag bit register 20 consists of a group of independent registers, one for each block of dictionary memory 10. In this embodiment, each register is at least 2 bits wide. Flag bit register 20 indicates the state of the data in each block of dictionary memory 10 (empty / valid / hashed / invalid). When a block of dictionary memory 10 has been cleared but not yet loaded with new content, its flag is set to empty; when a block has been loaded with new content but hash unit 12 has not yet finished calculating its hash values, its flag is set to valid; when hash unit 12 has finished calculating the hash values for a block, its flag is set to hashed; when match comparison unit 22 has processed all the data in a block, its flag is set to invalid.
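The four block states form a simple cycle, which is why 2 bits per block are enough. A minimal sketch of that state machine follows; the bit encodings and the event names (`loaded`, `hash_done`, `match_done`, `cleared`) are assumptions chosen for illustration.

```python
from enum import Enum


class BlockState(Enum):
    EMPTY = 0b00     # cleared, new content not yet loaded
    VALID = 0b01     # loaded, hash values not yet calculated
    HASHED = 0b10    # hash unit has finished the block
    INVALID = 0b11   # match comparison has consumed the block


TRANSITIONS = {
    (BlockState.EMPTY, "loaded"): BlockState.VALID,
    (BlockState.VALID, "hash_done"): BlockState.HASHED,
    (BlockState.HASHED, "match_done"): BlockState.INVALID,
    (BlockState.INVALID, "cleared"): BlockState.EMPTY,
}


def advance(state, event):
    """Return the next state, or raise ValueError for an illegal event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}") from None
```

One such state would be kept per dictionary block, so the embodiment with 3 blocks needs three of these registers.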
On each clock edge, up to 2 read/write requests (from hash unit 12 and from match comparison unit 22) may access backtracking memory 16 at the same time, so a read/write coordination module is needed to avoid access conflicts; this embodiment uses a priority scheme. Based on an evaluation of which arrangement gives the best efficiency, match comparison unit 22 is given higher access priority than the hash unit. If the two units issue their read and write requests at different times, backtracking memory 16 responds immediately. If they issue requests at the same time, the read request of match comparison unit 22 is served immediately and the write request of hash unit 12 is served 1 cycle later.
Match comparison unit 22 performs match comparison and outputs the matching result (flag bit, repeat length, back-reference distance). For each match comparison, match comparison unit 22 first takes the first pair of possibly matching addresses from chain-head match FIFO memory 18. It then uses this address pair to query backtracking memory 16 and obtain the addresses of the other strings in the window that may match the currently processed string. At the same time, it reads from dictionary memory 10 the strings that start at these addresses and compares them. Every comparison seeks the maximum match length; once all comparisons within the set limits are complete, the maximum match length and the corresponding back-reference distance are output. The next match comparison then begins.
Specifically, match comparison unit 22 performs match comparison within set limits, of which there are two. 1. Maximum match length: when the matched content reaches or exceeds the maximum match length, or the end of the file is reached, matching stops and the matching result is output. In this embodiment, the maximum match length is 255 bytes. 2. Maximum number of backtracking steps: the number of times the match comparison unit performs a match comparison using an address from backtracking memory 16. More backtracking steps give better compression but lower compression bandwidth; fewer give higher compression bandwidth but poorer compression. In a hardware implementation this limit can be a fixed value or a configurable register. In this embodiment, the number of backtracking steps can be set flexibly for different goals: 4, 8, 16, 32, or unlimited.
Match comparison unit 22 includes a match address traversal module, a match dictionary fetch module, and a comparison module.
The match address traversal module in match comparison unit 22 takes data out of chain-head match FIFO memory 18. This is the data that the hash unit put there, by concatenating the new and old position information, when it replaced old position information with new. It consists of the first address of the currently processed character and the first address of the first string that may match it. After match comparison unit 22 takes out this data and splits it into the two addresses, it reads the strings at those two addresses from dictionary memory 10 and compares them.
The match address traversal module in match comparison unit 22 uses the split-out data as an address to read data from backtracking memory 16, then uses the data just read as the address for the next access to backtracking memory 16, and so on, traversing in a loop the addresses of all strings within the set limits that may match.
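The traversal loop can be sketched as a walk along the chain stored in the backtracking memory, stopped by the backtracking limit or the window. The sketch assumes 17-bit addresses packed as (current address, first candidate) in each FIFO word, linear addresses, and that the first candidate does not count against the backtracking limit; the name `candidates` is hypothetical.

```python
ADDR_BITS = 17                      # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1
WINDOW = 32 * 1024


def candidates(fifo_word, backtrack, max_backtracks=8):
    """Split a FIFO word and collect candidate match addresses.

    backtrack maps an address to the previous address with the same hash
    (0 or a missing key ends the chain).
    """
    current = fifo_word >> ADDR_BITS      # address of the current character
    addr = fifo_word & ADDR_MASK          # first string that may match
    found = [addr]
    for _ in range(max_backtracks):
        addr = backtrack.get(addr, 0)     # value read becomes the next address
        if addr == 0 or current - addr > WINDOW:
            break
        found.append(addr)
    return current, found
```

Raising `max_backtracks` visits more candidates per position, which is the compression-versus-bandwidth trade-off described for the backtracking limit.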
The match dictionary fetch module in match comparison unit 22 accesses dictionary memory 10 at these addresses to obtain the strings that may match the currently processed character. By concatenating strings, it outputs a string 8 consecutive bytes long each time for the comparison module in match comparison unit 22. The match dictionary fetch module and the comparison module work concurrently, so fetching and comparing overlap: while one comparison is in progress, the data needed for the next is read from dictionary memory 10. Reading several bytes at a time for comparison, instead of comparing byte by byte as software implementations do, greatly improves comparison efficiency. In this embodiment, 8 bytes are read from dictionary memory 10 each time for comparison.
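Comparing in 8-byte chunks and extending while a whole chunk matches can be sketched as below. In hardware the 8 byte comparisons of a chunk would happen in parallel; the inner loop here only models the result. The limit of 255 comes from the embodiment, and the name `match_length` is hypothetical.

```python
CHUNK = 8          # bytes fetched and compared per step
MAX_MATCH = 255    # maximum match length in the embodiment


def match_length(data, cand, cur, max_len=MAX_MATCH):
    """Length of the common prefix of data[cand:] and data[cur:], by chunks."""
    total = 0
    while total < max_len:
        a = data[cand + total:cand + total + CHUNK]
        b = data[cur + total:cur + total + CHUNK]
        limit = min(len(a), len(b), max_len - total)
        n = 0
        while n < limit and a[n] == b[n]:
            n += 1
        total += n
        if n < CHUNK:      # mismatch, end of data, or length limit reached
            break
    return total
```

A match of length L therefore costs about L / 8 steps rather than L, which is the efficiency gain claimed over byte-by-byte comparison.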
When the comparison module in match comparison unit 22 compares a pair of possibly matching strings, if the strings read so far match completely, it continues to read the next 8 bytes from dictionary memory 10, "extends" the strings with them, and compares again, until the longest match for that pair is found, at which point the comparison of the pair ends. After each pair has been compared, match comparison unit 22 uses the data taken from backtracking memory 16 to read a new string from dictionary memory 10 and compare it with the currently processed string. This repeats until all match comparisons within the limits are finished. When all match comparisons for the currently processed string have ended, the comparison module in match comparison unit 22 outputs the best match comparison result and immediately starts the next round of match comparison. The result comprises 3 items: a flag bit, a match length, and a back-reference distance. The flag bit indicates whether the output is unmatched data or a length-distance pair (match length, back-reference distance) for a successful match. If no comparison succeeds, the flag bit marks the output as unmatched data; if a match succeeds, then in addition to the corresponding flag bit, the match length and the back-reference distance corresponding to it are output. The back-reference distance is the distance between the first bytes of the two matched strings.
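Selecting the best result and packaging it as flag, length, and distance might be modelled as follows. The minimum useful match length of 3 (the width of the hashed field), the extra `literal` field, and the names `MatchResult` and `select_result` are assumptions; the text specifies only the three output items.

```python
from typing import NamedTuple, Optional


class MatchResult(NamedTuple):
    matched: bool                  # flag bit: True for a length-distance pair
    length: int                    # match length
    distance: int                  # back-reference distance
    literal: Optional[int] = None  # unmatched byte when matched is False


def select_result(current_addr, compared, current_byte, min_match=3):
    """Pick the longest match from (candidate address, match length) pairs."""
    best_addr, best_len = None, 0
    for addr, length in compared:
        # strict '>' keeps the first candidate visited when lengths tie
        if length > best_len:
            best_addr, best_len = addr, length
    if best_len < min_match:
        return MatchResult(False, 0, 0, current_byte)
    return MatchResult(True, best_len, current_addr - best_addr)
```

If candidates are visited from nearest to farthest, keeping the first of several equal-length matches yields the smallest distance.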
In actual application, the technical scheme of the embodiment of the present invention can be applicable to the various rings that there is data compression demand In border, for example, transparent hardware-compressed IP is desirably integrated in the mixed-media network modules mixed-media of data server and completes to compress and decompress, and data takes Business device saves memory space and the network bandwidth when receiving and dispatching file in real time by compressing file storage or transmission.The present invention is real The data matching device applying example is applied in this transparent hardware-compressed IP, can save under data server high load condition The CPU usage of 25%-40%, and make its write performance have obviously advantage.Number by using the embodiment of the present invention It is achieved that " zero consumption " to system CPU for the compression, and then improve the overall performance of system according to during coalignment compressed data. Data, after the compression of this compressed file system, can be saved disk storage capacity more than more than 40% and 50% data passes Defeated bandwidth.
With the above processing, the embodiment of the present invention was evaluated on a professional FPGA, with the LZ77 algorithm implemented using the pipeline-based device. The final compression ratio is close to that of a software implementation, but the single-channel processing bandwidth of the FPGA is more than 8 times that of software compression, and the peak number of transactions processed per second by this FPGA implementation is more than 24 times that of a single CPU, reaching a processing capability that a CPU cannot achieve.
The device embodiment described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in the same module or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
Fig. 3 is a schematic diagram of the hash unit according to the embodiment of the present invention. The hash unit calculates the hash value of the currently processed character and the 2 characters that follow it (3 characters in total, hereinafter: the currently processed character field). It uses an improved hash algorithm that takes the binary code of a 3-byte string as input and outputs a 16-bit hash result. This 16-bit hash value effectively avoids false matches caused by hash values of adjacent consecutive positions. (With an ordinary hash calculation, strings such as abc and bcd, or bcd and cde, produce the same hash value although the strings do not match.) The subsequent module uses this hash result as the address and the position of the currently processed character in the dictionary memory as the content, and stores it in the chain head memory. Depending on the content already held at that address in the chain head random access memory, the chain head memory, the backtracking memory and the associated chain head match FIFO memory are maintained.
As shown in Fig. 4, the chain head memory is a block random access memory unit with a bit width of 17 bits and a depth of 64K. Using the hash value of the currently processed character field as the address, it stores the position of the currently processed character in the dictionary memory. Depth: 2^16 (16-bit hash value) = 64K.
Width: 2 bits (dictionary memory block number) + 15 bits (address within the dictionary memory) = 17 bits.
As shown in Fig. 5, the backtracking memory is a group of multiple 68-kilobyte (68KB: width 17 bits, depth 32K) block random access memory units. Each backtracking memory block corresponds one-to-one to a dictionary memory block and stores the address mapping of the first characters of character fields that have the same hash value. Depth: the same as the dictionary memory depth. Width: 2 bits (dictionary memory block number) + 15 bits (address within the dictionary memory) = 17 bits.
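The 17-bit entry layout and the stated memory sizes can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `pack_entry`, `unpack_entry` and `size_kib`, and the choice to place the block number in the upper bits, are assumptions; the field widths and depths are taken from the description.

```python
BLOCK_BITS = 2    # dictionary memory block number
ADDR_BITS = 15    # address within one 32K dictionary block
ENTRY_BITS = BLOCK_BITS + ADDR_BITS  # 17 bits per entry


def pack_entry(block: int, addr: int) -> int:
    """Pack (block number, in-block address) into one 17-bit entry.

    Putting the block number in the upper bits is an assumption.
    """
    assert 0 <= block < (1 << BLOCK_BITS)
    assert 0 <= addr < (1 << ADDR_BITS)
    return (block << ADDR_BITS) | addr


def unpack_entry(entry: int) -> tuple[int, int]:
    """Split a 17-bit entry back into (block number, in-block address)."""
    return entry >> ADDR_BITS, entry & ((1 << ADDR_BITS) - 1)


def size_kib(depth: int, width_bits: int) -> float:
    """Memory size in kilobytes (1 KB = 1024 bytes) for depth x width."""
    return depth * width_bits / 8 / 1024


chain_head_kib = size_kib(2 ** 16, ENTRY_BITS)   # 64K entries of 17 bits
backtrack_kib = size_kib(2 ** 15, ENTRY_BITS)    # 32K entries of 17 bits
```

With these widths, a 64K-deep chain head memory comes to 136 KB and each 32K-deep backtracking block to 68 KB, which agrees with the sizes given for these memories.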
As shown in Fig. 6, the chain head match FIFO memory is a first-in-first-out memory: the data stored first is read out first. The data width is set to 34 bits and the depth to 2048 (20K). It stores the addresses of the first characters of two character fields that have the same hash value, and these are read out in order during match comparison. When the content at some address in the chain head memory is updated and that address previously held valid data, the new data and the old valid data are spliced end to end and stored in this FIFO.
Depth: in theory the larger the better; set to 20K to save resources.
Width: twice the chain head memory width, 17 bits * 2 = 34 bits.
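As a small illustration of the 34-bit FIFO word, the sketch below splices a new and an old 17-bit chain head entry and queues the result. The function names and the decision to place the new entry in the upper half are assumptions; the description only states that the two entries are spliced end to end and read out in first-in-first-out order.

```python
from collections import deque

ENTRY_BITS = 17
ENTRY_MASK = (1 << ENTRY_BITS) - 1


def splice(new_entry: int, old_entry: int) -> int:
    """Join two 17-bit entries into one 34-bit word (new in the upper half, by assumption)."""
    assert 0 <= new_entry <= ENTRY_MASK and 0 <= old_entry <= ENTRY_MASK
    return (new_entry << ENTRY_BITS) | old_entry


def split(word: int) -> tuple[int, int]:
    """Recover (new entry, old entry) from a 34-bit word."""
    return word >> ENTRY_BITS, word & ENTRY_MASK


fifo: deque[int] = deque()          # software stand-in for the hardware FIFO
fifo.append(splice(0x1ABCD, 0x00123))
fifo.append(splice(0x1FFFF, 0x1ABCD))
first = split(fifo.popleft())       # the word written first is read first
```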
Fig. 7 is a schematic diagram of the match comparison range. From top to bottom, Fig. 7 shows the 32K sliding matching window slowly "sliding into" the file. All match-and-replace work involves only the currently processed character field and the content inside the window.
As shown in Fig. 8, the flag bit registers are a group of independent registers, one for each block of the dictionary memory. Each register is at least 2 bits wide. They indicate the state of the data in each dictionary memory block, which is one of 4 states: "empty", "valid", "hashed" and "invalid". When the content of a dictionary memory block has been cleared and not yet updated with new content, its flag is set to "empty". When the content of a block has been updated with new content but has not yet been through hash calculation, its flag is "valid". After the hash values of the content of a block have been calculated, its flag is set to "hashed". When match comparison has processed all the data in a block, its flag is set to "invalid". When the hash unit is about to calculate a new dictionary memory block, it first checks the corresponding flag bit register and starts calculating if and only if the register is in the "valid" state. When the match comparison module is about to process a dictionary memory block, it first checks the corresponding flag bit register and starts processing if and only if the register is in the "hashed" state. The block in the dictionary memory and the corresponding backtracking memory are cleared if and only if the flag bit register is in the "invalid" state.
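The four block states form a simple cycle, which can be modeled as a small state machine. The sketch below is a software illustration under stated assumptions: the numeric 2-bit encodings, the action names and the `BlockFlags` class are invented for the example, and each `step` call collapses "check the flag, do the work, update the flag" into a single transition.

```python
from enum import Enum


class BlockState(Enum):
    EMPTY = 0     # cleared, not yet refilled
    VALID = 1     # refilled, not yet hashed
    HASHED = 2    # all hash values calculated
    INVALID = 3   # match comparison has processed all data


# action -> (state required before the action, state after the action)
TRANSITIONS = {
    "load": (BlockState.EMPTY, BlockState.VALID),      # new file data written
    "hash": (BlockState.VALID, BlockState.HASHED),     # hash unit finished the block
    "match": (BlockState.HASHED, BlockState.INVALID),  # match comparison finished
    "clear": (BlockState.INVALID, BlockState.EMPTY),   # block and backtracking data cleared
}


class BlockFlags:
    """One flag per dictionary memory block (3 blocks in the embodiment)."""

    def __init__(self, n_blocks: int = 3) -> None:
        self.state = [BlockState.EMPTY] * n_blocks

    def step(self, block: int, action: str) -> bool:
        """Apply an action if the block is in the required state; otherwise wait."""
        required, nxt = TRANSITIONS[action]
        if self.state[block] is not required:
            return False  # the unit must stall until the flag changes
        self.state[block] = nxt
        return True
```

A unit that receives `False` simply retries later, which mirrors how the hash unit and the match comparison module wait on each other through the flag registers.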
As shown in Fig. 9, the match comparison unit uses a management module, which traverses all strings that may match according to the content of the chain head match FIFO memory and the backtracking memory. Each comparison matches the 8 bytes of characters that follow the first address. Each comparison returns the repeat length and the back-reference distance. After the traversal and comparison are complete, the best repeat length and back-reference distance are output.
According to an embodiment of the present invention, a data matching method is provided. Figs. 10, 11 and 12 are flowcharts of the pipeline-based data matching method and of the maintenance of the main memory units according to the embodiment of the present invention.
Fig. 10 is the maintenance flowchart of dictionary memory 10, which is divided into 3 blocks. The 3 memory blocks are connected end to end and are filled with new data cyclically. Before a block is filled with data, its corresponding flag bit register is checked to determine whether the block is writable; after the data has been updated, it is judged whether the data written is the last block of the file.
Fig. 11 is the maintenance flowchart of the chain head memory, the chain head match FIFO memory and the backtracking memory. After the hash value has been calculated, it is judged whether the value stored in chain head memory 14 at the address given by this hash value is all zeros. If it is all zeros, the address of the currently processed character is stored directly. If it is not all zeros, it is judged whether the distance between the address of the currently processed character and this value is greater than 32768 (32K). If it is greater than 32K, the address of the currently processed character is stored directly; if it is less than 32K, the address of the currently processed character is stored in chain head memory 14, and the chain head match FIFO memory and the backtracking memory are updated accordingly; the details are not repeated here. In the embodiment of the present invention, the window size equals the size of a block in dictionary memory 10. When the window moves out of a block of dictionary memory 10 (that is, when the window moves into a new block of dictionary memory 10), the block of dictionary memory 10 that has been moved out of is cleared, and the corresponding block in backtracking memory 16 is cleared at the same time.
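The maintenance flow of Fig. 11 can be sketched in software as follows. This is a simplified model under several assumptions: `toy_hash` is a placeholder XOR fold and is not the improved hash algorithm of the invention, positions are flat file offsets instead of (block number, address) pairs, a Python dict stands in for the 64K-entry chain head memory (a missing key plays the role of an all-zero entry), and a distance of exactly 32768 is treated as still inside the window.

```python
from collections import deque

WINDOW = 32768  # match comparison range (32K)


def toy_hash(field: bytes) -> int:
    """Placeholder 3-byte -> 16-bit hash (NOT the patent's improved hash)."""
    v = int.from_bytes(field, "big")   # 24-bit value
    return ((v >> 8) ^ v) & 0xFFFF     # fold upper 16 bits onto lower 16 bits


def update(pos: int, data: bytes, chain_head: dict, backtrack: dict,
           fifo: deque) -> None:
    """Process the 3-byte character field starting at pos."""
    h = toy_hash(data[pos:pos + 3])
    old = chain_head.get(h)            # None = no valid data at this address
    chain_head[h] = pos                # the new position always replaces the old one
    if old is not None and pos - old <= WINDOW:
        backtrack[pos] = old           # new position is the index, old position the content
        fifo.append((pos, old))        # spliced (new, old) pair for the match unit
    # Otherwise (empty entry or old position out of range) only the
    # chain head memory is written.


def build(data: bytes):
    """Run the maintenance flow over a whole buffer and return the three structures."""
    chain_head, backtrack, fifo = {}, {}, deque()
    for pos in range(len(data) - 2):
        update(pos, data, chain_head, backtrack, fifo)
    return chain_head, backtrack, fifo
```

For the input `b"abcXabcYabc"`, the field `abc` occurs at positions 0, 4 and 8, so the FIFO receives the pairs (4, 0) and (8, 4) and the backtracking structure links 8 to 4 and 4 to 0.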
Fig. 12 is the matching flowchart, which describes the comparison and matching part of the pipeline-based data matching method. In the matching process, the match comparison unit first takes a possibly matching address pair from chain head match FIFO memory 18 and fetches the values from dictionary memory 10 for comparison. Then, as long as the maximum number of backtracking matches has not been exceeded, it takes further possibly matching address pairs from backtracking memory 16 and fetches the values from dictionary memory 10 for comparison. Each comparison seeks the longest match result within the maximum matching range. After all matching has ended, the result with the longest match length is output.
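The traversal in Fig. 12 amounts to comparing the FIFO candidate first and then following the backtracking chain for a bounded number of steps. The sketch below illustrates this under assumptions: positions are flat offsets, `backtrack` is a dict mapping a position to the previous position with the same hash, the byte-wise `common_prefix` replaces the 8-byte hardware comparison, and `max_backtracks` and `limit` are example values not taken from the source.

```python
from typing import Optional


def common_prefix(data: bytes, cand: int, cur: int, limit: int = 258) -> int:
    """Length of the common prefix of data[cand:] and data[cur:] (cand < cur)."""
    n = 0
    while n < limit and cur + n < len(data) and data[cand + n] == data[cur + n]:
        n += 1
    return n


def traverse(data: bytes, pair: tuple, backtrack: dict,
             max_backtracks: int = 4, window: int = 32768) -> tuple:
    """Return (longest match length, back-reference distance) for one FIFO pair.

    pair is (current position, chain head candidate) as taken from the FIFO.
    """
    cur, cand = pair
    best_len = common_prefix(data, cand, cur)
    best_dist = cur - cand if best_len else 0
    for _ in range(max_backtracks):
        nxt: Optional[int] = backtrack.get(cand)
        if nxt is None or cur - nxt > window:
            break                      # chain ended or left the matching window
        cand = nxt
        n = common_prefix(data, cand, cur)
        if n > best_len:               # ties keep the nearer candidate
            best_len, best_dist = n, cur - cand
    return best_len, best_dist
```

With `data = b"abcdeXabcYabcdeZ"`, the pair (10, 6) and the chain `{10: 6, 6: 0}`, the FIFO candidate at 6 gives a 3-byte match, and one backtracking step to position 0 improves this to 5 bytes at distance 10.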
It should be noted that the processing performed by the match search unit can be understood with reference to the related content in the above embodiments, and the related structural diagrams can be understood with reference to Figs. 1 to 9; the details are not repeated here.
In summary, by means of the technical solution of the present invention, the embodiment of the present invention uses an original hash algorithm that effectively avoids false matches caused by hash values of adjacent consecutive positions and improves the success rate of match comparison. It uses a chain head match FIFO memory 18 to temporarily store the address pairs of chain head matches, which breaks the loop between hash calculation and match comparison, so that comparison and matching are carried out as pipelined operations; this realizes compression and improves speed. It uses multiple dictionary memory 10 blocks and the corresponding backtracking memory 16 to keep the compression window continuous, which improves the compression ratio. While match comparison is in progress, the next group of content that will take part in match comparison is pre-read, which makes full use of the parallelism of the hardware system and improves efficiency. All of the above measures greatly improve data compression processing capability and can reach a processing capability that a CPU cannot achieve. The processing efficiency of the LZ77 algorithm can be improved, and CPU and memory resources are greatly saved.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiment can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes ROM, RAM, magnetic disks, optical discs and other media that can store program code.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. A pipeline-based data matching method, characterized by comprising:
a match comparison unit performing match compression on a file, while a dictionary memory cyclically reads in content from the original source file participating in match compression and updates its content accordingly, until match compression of the whole file is finished, wherein the dictionary memory is a group of random access memories that stores the file to be match-compressed;
a hash unit calculating the hash value of the currently processed character field, wherein the currently processed character field is the currently processed 1-byte character and the following 2 bytes of characters, 3 bytes of characters in total; storing into a chain head memory, with this hash value as the address, the position of the currently processed character in the dictionary memory as the content; and maintaining the chain head memory, a backtracking memory and the associated chain head match FIFO memory according to the content held at that address in the chain head memory;
the match comparison unit sequentially obtaining the indexes of possibly matching strings from the chain head match FIFO memory and the backtracking memory, fetching the values in turn and comparing them using an improved match comparison method, while maintaining the backtracking memory, until match comparison ends.
2. The method according to claim 1, characterized in that the dictionary memory is a group of multiple 32-kilobyte (32KB: 8Byte*4K or 4Byte*8K) block random access memories (block RAM); the original source file participating in match compression is stored sequentially in the dictionary memory and updated according to the progress of match compression; each group of dictionary memory consists of multiple random access memory blocks;
the original source file participating in match compression is stored sequentially in the dictionary memory; as match compression proceeds, when the content of a dictionary memory block falls completely outside the match comparison range (32768, 32K), the data in it is cleared and new data that subsequently participates in match compression is then written in; the blocks of a group of dictionary memory are updated in turn until match compression is complete.
3. The method according to claim 2, characterized in that the hash unit providing the address and storing the position of the currently processed character in the dictionary memory into the chain head memory comprises:
the hash unit calculating the hash value of the currently processed character field using an improved hash algorithm, which calculates a 16-bit hash value from a 3-byte (3Byte: 24bit) string and effectively avoids false matches caused by hash values of adjacent consecutive positions;
with the above hash value as the address, storing the position of the currently processed character in the dictionary memory into the chain head memory; if old position information is already stored in the chain head memory at the address given by the hash value, replacing the old position information with the new position information.
4. The method according to claim 3, characterized in that, before the match comparison unit obtains the character index in the chain head memory, the method further comprises:
the hash unit, when replacing the old position information with the new position information, splicing the new position information with the old position information and putting the result into the chain head match FIFO memory; the match comparison unit, when match comparison starts, taking the spliced new and old position information out of the chain head match FIFO memory;
calculating the distance between the old position information and the new position information; if this distance is greater than the match comparison range (32768, 32K), only replacing the old position information with the new position information, without putting anything into the chain head match FIFO memory.
5. The method according to claim 4, characterized in that, after the hash unit, when replacing the old position information with the new position information, splices the new position information with the old position information and puts the result into the chain head match FIFO memory, the method further comprises: storing the old position information into the backtracking memory with the new position information as the index address; when the content of the dictionary block corresponding to a backtracking memory block falls completely outside the match comparison range, the data in that backtracking memory block is cleared.
6. The method according to claim 5, characterized in that the match comparison unit performing matching according to the data in the chain head match FIFO memory and the backtracking memory comprises: the match comparison unit reading from the chain head match FIFO memory the address pair that may match at the chain head, and performing match comparison using an improved comparison method; by fetching and splicing values from the dictionary memory, 8 bytes are compared in each match comparison, after which either a match failure or the number of successfully matched bytes is output; if the number of successfully matched bytes is 8, matching continues with the following 8 bytes, until the longest match length is obtained.
7. The method according to claim 6, characterized in that the improved match comparison method obtains from the chain head match FIFO memory the address pair of a possible chain head match and performs match comparison using the improved comparison method, while at the same time using the second half of the possible chain head match pair as the address to fetch a value from the backtracking memory; this value is in turn used as the address to fetch a value from the backtracking memory, and this loop yields all addresses in the backtracking memory that may match; match comparison using the improved comparison method is performed while the values are being fetched; after all matching has ended, the match result that succeeded, or that has the best match length, is output.
8. The method according to claim 7, characterized in that the contents of the backtracking memory and the dictionary memory are cleared and reset as the match comparison module and the hash unit work; when the hash unit has calculated all the hash values in a dictionary memory block, a flag bit is set for that dictionary memory block; the hash unit calculates hash values over the three dictionary memory blocks in cyclic order; when the calculation reaches a new dictionary memory block, the corresponding flag bit is checked first, and if the flag bit is set, the hash unit pauses and resumes calculation only after the flag bit has been cleared; the match comparison module performs comparison and matching in order; when it finds that the content of a dictionary memory block has fallen completely outside the match comparison range and the original source file participating in match compression has not yet been fully processed, the content of that dictionary memory block and of the corresponding backtracking memory is cleared, new content is written into the dictionary memory, and the flag bit is then cleared.
9. The method according to claim 8, characterized in that:
the chain head memory is one 136-kilobyte (136KB: 17bit*64K) block random access memory (block RAM);
the chain head match FIFO memory is a dual-port first-in-first-out memory whose depth and data width are determined by the number of dictionary memory blocks; wherein the depth of the chain head match FIFO memory is at least 32768 (32K) and the data width is at least 34 bits;
the backtracking memory is a group of multiple 68-kilobyte (68KB: 17bit*32K) block random access memories (block RAM); each backtracking memory block corresponds one-to-one to a dictionary memory block; it stores the address mapping of the first characters of character fields that have the same hash value.
10. A data matching device, characterized by comprising:
a dictionary memory, for reading in, block by block, the original source file participating in match compression;
a hash unit, for calculating the hash value of the currently processed character field using an improved algorithm, determining the index of the currently processed character according to the hash value, storing the address of the currently processed character in a chain head memory, and updating a backtracking memory and a chain head match FIFO memory as circumstances require;
flag bit registers, for assisting the work of the hash unit and a match comparison unit, wherein the flag bit registers indicate the state of the data in each block of the dictionary memory, which is one of 4 states: "empty", "valid", "hashed" and "invalid"; when the content of a dictionary memory block has been cleared and not yet updated with new content, its corresponding flag is set to "empty"; when the content of a dictionary memory block has been updated with new content but has not yet been through hash calculation, its corresponding flag is "valid"; after the hash values of the content of a dictionary memory block have been calculated, its corresponding flag is set to "hashed"; when match comparison has processed all the data in a dictionary memory block, its corresponding flag is set to "invalid"; when the hash unit is about to calculate a new dictionary memory block, it first checks the corresponding flag bit register and starts calculating if and only if this flag bit register is in the "valid" state; when the match comparison module is about to process a dictionary memory block, it first checks the corresponding flag bit register and starts processing if and only if this flag bit register is in the "hashed" state; the block to be processed in the dictionary memory and the corresponding backtracking memory are cleared if and only if this flag bit register is in the "invalid" state;
the chain head memory, for storing information whose address is the hash value of the currently processed character field and whose content is the address of the currently processed character, and for judging whether, within the match comparison range, there is a character field that may match the currently processed character field;
the chain head match FIFO memory, for storing the address pair of the first group of possibly matching character fields to be handled by the match comparison unit, for reading by the match comparison module;
the backtracking memory, for storing the chain formed by the first addresses of character fields that have the same hash value, for querying and reading by the match comparison module;
the match comparison unit, for obtaining from the chain head match FIFO memory and the backtracking memory the first addresses of all character fields that may match the currently processed character field, and performing the corresponding matching.
11. The device according to claim 10, characterized in that the dictionary memory specifically comprises: a group of 3 blocks of 32-kilobyte (32KB: 8Byte*4K) block random access memory, for reading in, block by block in cyclic order, the original source file participating in match compression, for use in match comparison and compression.
12. The device according to claim 11, characterized in that the hash unit specifically comprises:
a hash calculation module, for calculating the hash value of the currently processed character field using an improved hash algorithm;
a write management module, for storing the address of the currently processed character corresponding to the hash value into the chain head memory; if the storage location to be written holds no valid data (its content is zero), only the new data is written directly into the chain head memory; if the storage location to be written already holds valid data (its content is non-zero), it is judged whether the distance between this content and the address of the currently processed character exceeds the match comparison range; if it does, only the new data is written directly into the chain head memory; if it does not, then while the new data is written into the chain head memory, the old data is stored in the backtracking memory as the content with the new data as the address, and the new data and the old data are spliced and stored in the chain head match FIFO memory.
13. The device according to claim 12, characterized in that the chain head memory specifically comprises: 1 block of 136-kilobyte (136KB: 17bit*64K) block random access memory, for storing the address of the currently processed character with the hash value as the address, in order to judge whether there is a character field that may match the currently processed character field.
14. The device according to claim 13, characterized in that the chain head match FIFO memory specifically comprises: 1 dual-port first-in-first-out memory with a width of 34 bits and a depth of at least 32768 (32K), from which the match comparison module reads the 1st pair of possibly matching addresses.
15. The device according to claim 14, characterized in that the backtracking memory specifically comprises:
a backtracking memory storage module, which is 3 blocks of 68-kilobyte (68KB: 17bit*32K) block random access memory (block RAM); each block corresponds to one dictionary memory block; it stores the address mapping of the first characters of character fields that have the same hash value;
a read-write control module: when the hash unit and the match comparison unit access the backtracking memory simultaneously (a write request from the hash unit and a read request from the match comparison unit), the access request of the hash unit is served first, and the request of the match comparison module is processed one clock cycle later.
16. The device according to claim 15, characterized in that the flag bit registers specifically comprise: 3 independent 2-bit registers, each flag bit register corresponding to one dictionary memory block, for assisting the work of the hash unit and the match comparison unit.
17. The device according to claim 16, characterized in that the match comparison unit specifically comprises:
a match address traversal module, for fetching values from the chain head match FIFO memory and the backtracking memory, traversing in a loop all first addresses of strings that may match the currently processed character field, and sending them to a match dictionary fetch module;
the match dictionary fetch module, for taking the corresponding strings out of the dictionary memory according to the addresses provided by the match address traversal module, and, by fetching and splicing values, outputting a string of 8 consecutive bytes each time;
a comparison module, for comparing the consecutive strings supplied by the match dictionary fetch module and, after all match comparisons are complete, outputting the best match result; the match address traversal module carries out a new round of match comparison according to this match result.
18. The device according to any one of claims 10 to 17, characterized in that the data matching device is based on a programmable gate array.
CN201410197834.6A 2014-05-12 2014-05-12 Data matching method and device based on assembly line Active CN103997346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410197834.6A CN103997346B (en) 2014-05-12 2014-05-12 Data matching method and device based on assembly line

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410197834.6A CN103997346B (en) 2014-05-12 2014-05-12 Data matching method and device based on assembly line

Publications (2)

Publication Number Publication Date
CN103997346A CN103997346A (en) 2014-08-20
CN103997346B true CN103997346B (en) 2017-02-15

Family

ID=51311367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410197834.6A Active CN103997346B (en) 2014-05-12 2014-05-12 Data matching method and device based on assembly line

Country Status (1)

Country Link
CN (1) CN103997346B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183557B (en) * 2015-08-26 2018-11-20 东南大学 A kind of hardware based configurable data compression system
CN105207678B (en) * 2015-09-29 2018-10-26 东南大学 A kind of system for implementing hardware of modified LZ4 compression algorithms
CN106096332A (en) * 2016-06-28 2016-11-09 深圳大学 Parallel fast matching method and system thereof towards the DNA sequence stored
CN106603677A (en) * 2016-12-21 2017-04-26 济南浪潮高新科技投资发展有限公司 Physical information system data compression transmission method using multi-core multi-thread parallelism
CN108011952B (en) * 2017-12-01 2021-06-18 北京奇艺世纪科技有限公司 Method and device for acquiring compression dictionary
CN109361398B (en) * 2018-10-11 2022-12-30 南威软件股份有限公司 LZ process hardware compression method and system based on parallel and pipeline design
CN110233627B (en) * 2019-05-22 2023-05-12 深圳大学 Hardware compression system and method based on running water
CN110489355B (en) * 2019-08-19 2020-12-08 上海安路信息科技有限公司 Mapping method and system of logic BRAM
CN111124312B (en) * 2019-12-23 2023-10-31 第四范式(北京)技术有限公司 Method and device for data deduplication
CN112464593B (en) * 2020-11-25 2022-09-02 海光信息技术股份有限公司 ROM bit mapping relation generation method and device, processor chip and server
CN114025024B (en) * 2021-10-18 2023-07-07 中国银联股份有限公司 Data transmission method and device
CN115577149B (en) * 2022-12-13 2023-03-10 浪潮电子信息产业股份有限公司 Data processing method, device and equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996139A (en) * 2009-08-28 2011-03-30 百度在线网络技术(北京)有限公司 Data matching method and data matching device
CN103095305A (en) * 2013-01-06 2013-05-08 中国科学院计算技术研究所 System and method for hardware LZ77 compression implementation
CN103475275A (en) * 2013-09-28 2013-12-25 重庆大学 Passive tire power generating device and tire parameter detecting system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
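Research on the LZ77 compression algorithm and its derived algorithms; Gao Zhijian (高志坚), Jiang Chunlei (蒋春蕾); Journal of Xichang College (Natural Science Edition); 2005-03-31; Vol. 19 (No. 1); full text *
Hardware implementation of a reconfigurable AES algorithm IP core based on a pipeline structure; Li Bing (李冰), Xia Kewei (夏克维), Liang Wenli (梁文丽); Journal of Southeast University; 2010-03-31; Vol. 26 (No. 1); full text *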

Also Published As

Publication number Publication date
CN103997346A (en) 2014-08-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Dong Qian

Inventor after: Liu Yong

Inventor after: Li Bing

Inventor after: Zhao Xia

Inventor after: Wang Gang

Inventor before: Li Bing

Inventor before: Dong Qian

Inventor before: Liu Yong

Inventor before: Zhao Xia

Inventor before: Wang Gang

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant