CN102870116A - Method and apparatus for content matching - Google Patents
- Publication number: CN102870116A (related identifiers listed: CN2012800006149A, CN201280000614A)
- Authority: CN (China)
- Prior art keywords: hash, target, hash table, result, string
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Abstract
Embodiments of the invention provide a method and an apparatus for content matching. The method comprises: performing a hash operation on at least one target string based on at least one set hash algorithm, to obtain the target hash results of each target string; forming a hash table entry for each target string from its target hash results, and combining the hash table entries of the target strings into a hash matching table; performing a hash operation on a string under test according to the same at least one hash algorithm, to obtain its tested hash results; and matching the tested hash results of the string under test against the hash table entries in the hash matching table, to obtain a matching result. The invention reduces the system resources occupied by matching; string extraction and string hash matching are executed in parallel, so matching speed is improved; and when target strings are added or removed, the hash matching table does not need to be recompiled, which makes upgrading and maintenance easy.
Description
Technical field
Embodiments of the invention relate to data processing technology, and in particular to a content matching method and apparatus.
Background technology
As networks are operated at an ever finer granularity, many network users and equipment vendors pay increasing attention to message content at layer 7 and above. This content is used for packet filtering, content-based charging, traffic detection, search engines and the like, and is gradually being widely applied in fields such as national defense, public security, safety, network service management and commercial advertising. Deep Packet Inspection (DPI) technology has emerged accordingly; it can identify the content of each field in a message based on protocol rules. Protocol identification/parsing is one of the key DPI technologies, and string/feature-word matching is an important part of protocol identification/parsing, so the matching speed directly affects product performance.
In the prior art, content matching on strings or feature words typically performs the following operations:
a) the target string is divided into at least one first string;
b) a group of second strings is generated by combination, for example by further taking substrings of the first strings as second strings;
c) third strings are extracted from the second strings, for example by filtering out commonly used strings according to a blacklist or whitelist, and each third string is compiled using an algorithm such as a state machine or a rule tree;
d) using a sliding window, starting from different start positions, the detected string is compared with the third string at the first string node to determine whether they match;
e) if the match succeeds but a next string node exists, the next matching flow is entered;
f) if the match succeeds and there is no next string node, the detected string matches the target string;
g) if the match fails, the detected string does not match the target string.
Existing content matching methods have at least the following defects: 1) when the target string is long, the number of matching node branches and the matching time multiply, and performance drops sharply; 2) performance can only be improved by using multiple matching engines, which consumes excessive resources; 3) when a target string is added, the rule tree must be recompiled, which hinders hot upgrades and can only be worked around by backing up and switching table entries.
Summary of the invention
Embodiments of the invention provide a content matching method and apparatus that improve the matching speed of string content matching, reduce resource usage, and make upgrading and maintenance easier.
An embodiment of the invention provides a content matching method, comprising:
performing a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
forming a hash table entry for each target string from the target hash results of that target string, and combining the hash table entries of the target strings into a hash matching table;
performing a hash operation on a string under test according to the at least one hash algorithm, to obtain, for the string under test, the tested hash result corresponding to each hash algorithm;
matching the tested hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
An embodiment of the invention also provides a content matching apparatus, comprising:
a first hash operation module, configured to perform a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
a hash table forming module, configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of the target strings into a hash matching table;
a second hash operation module, configured to perform a hash operation on a string under test according to the at least one hash algorithm, to obtain, for the string under test, the tested hash result corresponding to each hash algorithm;
a hash table matching module, configured to match the tested hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
The content matching method and apparatus provided by the embodiments of the invention reduce the system resources occupied by matching and require no additional switching or backup resources. String extraction and string hash matching are executed in parallel, which greatly shortens the matching time and improves the matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table does not need to be recompiled; only the corresponding hash table entry needs to be modified. The hash algorithms and their number can also be updated at any time, so the solution is easy to upgrade and maintain.
Description of drawings
Fig. 1 is a flowchart of the content matching method provided by Embodiment 1 of the invention;
Fig. 2 is a flowchart of the content matching method provided by Embodiment 2 of the invention;
Fig. 3 is a flowchart of the content matching method provided by Embodiment 3 of the invention;
Fig. 4 is a flowchart of the content matching method provided by Embodiment 4 of the invention;
Fig. 5 is a schematic structural diagram of the content matching apparatus provided by Embodiment 5 of the invention;
Fig. 6 is a schematic structural diagram of the content matching apparatus provided by Embodiment 6 of the invention;
Fig. 7 is a schematic structural diagram of the content matching apparatus provided by Embodiment 7 of the invention.
Embodiment
To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
Embodiment one
Fig. 1 is a flowchart of the content matching method provided by Embodiment 1 of the invention. The method can be applied in a variety of scenarios, typically URL filtering, packet filtering and the like, and is performed by a content matching apparatus carried in a server in software and/or hardware form, for example carried on a Gateway GPRS Support Node (GGSN). The method specifically includes the following steps:
Step 110: the content matching apparatus performs a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
Step 120: the content matching apparatus forms a hash table entry for each target string from the target hash results of that target string, and combines the hash table entries of the target strings into a hash matching table;
Step 130: the content matching apparatus performs a hash operation on a string under test according to the at least one hash algorithm, to obtain, for the string under test, the tested hash result corresponding to each hash algorithm;
Step 140: the content matching apparatus matches the tested hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
The technical solution of this embodiment includes steps 110 and 120, which compile the target strings, and steps 130 and 140, which use the compiled hash matching table to match the string under test. A target string is the string used as the matching reference in content matching and can be preset by the user. A string under test is the string that needs to be matched and filtered, for example a field in a message to be filtered or a URL. For example, in a web page filtering application, the user can set keywords that represent the filtering target as target strings, such as strings that reflect violent or pornographic content to be filtered, and the target strings are precompiled for the subsequent matching operation. A URL opened by the user is then taken as the string under test and is first matched against the precompiled target strings. If it matches, the web page is filtered out; otherwise, the web page at that URL is opened normally.
This embodiment converts each target string into target hash results by means of hash algorithms, uses the same hash algorithms to obtain the tested hash results of the string under test, and determines whether the string under test matches a target string by matching the hash results.
The technical solution of this embodiment reduces the system resources occupied by matching and requires no additional switching or backup resources. In addition, string extraction and string hash matching are executed in parallel, so even when a message contains many strings under test, for example more than 20, the matching time is greatly shortened; the work can be pipelined, which improves the matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table does not need to be recompiled; only the corresponding hash table entry needs to be modified. The hash algorithms and their number can also be updated at any time, so the solution is easy to upgrade and maintain.
In the technical solution of this embodiment, the accuracy of the matching result depends on which hash algorithms are used and on how many are used. Suitable hash algorithms reflect the characteristics of the target string to the greatest extent, so that, as far as possible, only identical strings have identical hash results. Increasing the number of hash algorithms likewise reduces the probability that different strings have identical hash results, and thus reduces the false match rate. The specific hash algorithms and their number can be set according to the actual application scenario, such as the number of target strings. The technical solution is applicable to many situations; it is not limited to matching character strings and can also be applied to matching any sequence of data.
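The effect of using more than one hash algorithm can be illustrated with a small sketch. The byte-wise XOR, first-character, last-character and length hashes used here are the simple hashes listed later in Table 1; the function names are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_bytes(s: str) -> int:
    """XOR of all bytes of the ASCII-encoded string."""
    return reduce(lambda acc, b: acc ^ b, s.encode("ascii"), 0)


def signature(s: str) -> tuple:
    """Several simple hash results taken together (non-empty input assumed)."""
    data = s.encode("ascii")
    return (xor_bytes(s), data[0], data[-1], len(data))


# "Accept" and "Host" collide when only the byte-wise XOR hash is used ...
single_collision = xor_bytes("Accept") == xor_bytes("Host")
# ... but are told apart once the other hash results are compared as well.
multi_collision = signature("Accept") == signature("Host")
```

With a single hash the two header names would be falsely matched; with four hash results they are distinguished, which is the reduction in false match rate described above.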
Embodiment two
On the basis of the technical solution of the above embodiment, the number of hash algorithms is preferably at least two. Forming the hash table entry of a target string from its target hash results can then specifically include: using the first target hash result of each target string as the entry index, and the other target hash results as the entry content. Although the first hash result is chosen here as the entry index, in practice the choice of index is not tied to the order of the hash results: the order of the hash algorithms can be set arbitrarily, so any hash result can be made the first one obtained and used as the entry index.
In step 140, matching the tested hash results of the string under test against the hash table entries of the hash matching table to obtain a matching result can then specifically be performed as follows:
Step 141: the content matching apparatus uses the first tested hash result of the string under test as the entry index and looks up the corresponding hash table entry in the hash matching table;
Step 142: if the corresponding hash table entry is found, the content matching apparatus matches the other tested hash results of the string under test against the content of the entry found;
Step 143: when the other tested hash results all match the content of the entry found, a successful match result is obtained.
Fig. 2 is a flowchart of the content matching method provided by Embodiment 2 of the invention. This embodiment describes each step in detail by way of example.
First, five hash algorithms are defined, chosen to reflect the characteristics of the strings as far as possible, as shown in Table 1. Practical implementations are not limited to these: any one or any combination of the following hash algorithms can be used, and other hash algorithms can be added. When the target strings are short or few, the original string itself can also be used as a hash algorithm:
Table 1
Hash algorithm 1 | XOR of the string, byte by byte |
Hash algorithm 2 | First character of the string |
Hash algorithm 3 | Last character of the string |
Hash algorithm 4 | Length of the string |
Hash algorithm 5 | XOR of the string, two bytes at a time |
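A minimal sketch of the five hash algorithms of Table 1 follows. The function names are hypothetical. The treatment of odd-length input in hash algorithm 5 (big-endian 16-bit words, with a zero byte appended on the right) is an assumption; it is chosen because it reproduces the value 0x7D42 that Table 5 lists for "Accept-Language". Non-empty ASCII input is also assumed.

```python
from functools import reduce


def h1_xor_bytes(data: bytes) -> int:
    """Hash algorithm 1: XOR of all bytes."""
    return reduce(lambda a, b: a ^ b, data, 0)


def h2_first_char(data: bytes) -> int:
    """Hash algorithm 2: first character."""
    return data[0]


def h3_last_char(data: bytes) -> int:
    """Hash algorithm 3: last character."""
    return data[-1]


def h4_length(data: bytes) -> int:
    """Hash algorithm 4: string length."""
    return len(data)


def h5_xor_words(data: bytes) -> int:
    """Hash algorithm 5: XOR of 16-bit words.

    Assumption: big-endian words, odd-length input padded with one zero byte.
    """
    if len(data) % 2:
        data += b"\x00"
    acc = 0
    for i in range(0, len(data), 2):
        acc ^= (data[i] << 8) | data[i + 1]
    return acc


HASHES = (h1_xor_bytes, h2_first_char, h3_last_char, h4_length, h5_xor_words)


def hash_results(s: str) -> tuple:
    """Return the five hash results of an ASCII string, in Table 1 order."""
    data = s.encode("ascii")
    return tuple(h(data) for h in HASHES)
```

For the target strings "ABCDEFG123456789" and "Accept-Language", this sketch yields the entry indexes 0x71 and 0x3F and the remaining hash values that appear in Table 5.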
Step 201: the content matching apparatus performs a hash operation on the target strings based on the five set hash algorithms, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
Assume there are three target strings:
Target string 1 = "ABCDEFG123456789"
Target string 2 = "abcdefg-xyz"
Target string 3 = "Accept-Language"
The ASCII code sequences of the target strings are then:
ASCII code sequence of target string 1 = "41424344454647313233343536373839"
ASCII code sequence of target string 2 = "616263646566672D78797A"
ASCII code sequence of target string 3 = "4163636570742D4C616E6775616765"
The target hash results are shown in Table 2 below:
Table 2
Step 202: the content matching apparatus forms a hash table entry for each target string from the target hash results of that target string, and combines the hash table entries of the target strings into a hash matching table;
The target hash results of the target strings form the hash matching table shown in Table 3, in which the first target hash result serves as the entry index (tab_index) and the other target hash results serve as the entry content (tab_content). That is, target hash result 1 is used as the entry index and target hash results 2 to 5 are used as the entry content to generate the hash matching table. The format of the entry content can be written as tab_content = {stringID, hash result 2, hash result 3, hash result 4, hash result 5}, as shown in Table 3.
Table 3
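The construction in step 202 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the names `hash_results`, `build_table`, `TARGETS` and `TABLE` are hypothetical, the table is modeled as a Python dictionary from tab_index to tab_content, odd-length strings are zero-padded for the two-byte XOR, and index collisions between target strings are ignored here (they are the subject of Embodiment 3).

```python
from functools import reduce


def hash_results(s: str) -> tuple:
    """Five hash results in Table 1 order (non-empty ASCII input assumed)."""
    data = s.encode("ascii")
    padded = data + b"\x00" * (len(data) % 2)  # assumption: zero-pad odd lengths
    words = [(padded[i] << 8) | padded[i + 1] for i in range(0, len(padded), 2)]
    return (
        reduce(lambda a, b: a ^ b, data, 0),   # 1: byte-wise XOR
        data[0],                               # 2: first character
        data[-1],                              # 3: last character
        len(data),                             # 4: length
        reduce(lambda a, b: a ^ b, words, 0),  # 5: two-byte XOR
    )


def build_table(targets: dict) -> dict:
    """Map tab_index -> tab_content = (stringID, hash 2, hash 3, hash 4, hash 5)."""
    table = {}
    for string_id, text in targets.items():
        index, *rest = hash_results(text)
        table[index] = (string_id, *rest)
    return table


TARGETS = {1: "ABCDEFG123456789", 2: "abcdefg-xyz", 3: "Accept-Language"}
TABLE = build_table(TARGETS)
```

Adding or removing a target string only touches the single entry at its index, which is why no recompilation of the whole table is needed.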
Step 203: the content matching apparatus performs a hash operation on the string under test according to the at least one hash algorithm, to obtain, for the string under test, the tested hash result corresponding to each hash algorithm.
The string under test can be obtained in many ways, depending on the specific application scenario of the content matching method. Typically it is a string to be matched in a network message. For example, a Hypertext Transfer Protocol (HTTP) request message received from the network is as follows:
GET /product/ggsn/index.htm HTTP/1.1\r\n
Accept:*/*\r\n
Referer:http://www.huawei.com\r\n
Accept-Language:zh-cn\r\n
Accept-Encoding:gzip,deflate\r\n
User-Agent:Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1;SV1;.NET CLR 2.0.50727)\r\n
Host:www.huawei.com\r\n
ABCDEFG123456789:xxxxxxxxxxx\r\n
Connection:Keep-Alive\r\n
\r\n
According to the HTTP protocol parsing rules, the strings between the "\r\n" feature characters and the ":" feature character are extracted as follows:
Extracted string under test 1 = "Accept"
Extracted string under test 2 = "Referer"
Extracted string under test 3 = "Accept-Language"
Extracted string under test 4 = "Accept-Encoding"
Extracted string under test 5 = "User-Agent"
Extracted string under test 6 = "Host"
Extracted string under test 7 = "ABCDEFG123456789"
Extracted string under test 8 = "Connection"
Many other string extraction strategies are possible; the embodiments of the invention are not limited to this one.
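The extraction rule used in this example, taking the text between each "\r\n" and the following ":", can be sketched as below. The function name `extract_field_names` is hypothetical, and the sketch assumes the request line is the first line and that an empty line ends the header section.

```python
def extract_field_names(message: str) -> list:
    """Return the text between each "\\r\\n" and the next ":" in the header section."""
    names = []
    # Skip the request line; stop at the empty line that ends the headers.
    for line in message.split("\r\n")[1:]:
        if not line:
            break
        name, sep, _ = line.partition(":")
        if sep:  # keep only lines that actually contain a ":"
            names.append(name)
    return names


REQUEST = (
    "GET /product/ggsn/index.htm HTTP/1.1\r\n"
    "Accept:*/*\r\n"
    "Referer:http://www.huawei.com\r\n"
    "Accept-Language:zh-cn\r\n"
    "Accept-Encoding:gzip,deflate\r\n"
    "User-Agent:Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1;SV1;.NET CLR 2.0.50727)\r\n"
    "Host:www.huawei.com\r\n"
    "ABCDEFG123456789:xxxxxxxxxxx\r\n"
    "Connection:Keep-Alive\r\n"
    "\r\n"
)

FIELD_NAMES = extract_field_names(REQUEST)
```

Because each extracted name can be handed to the hash stage as soon as it is found, extraction and hash matching can proceed in parallel, as the description notes.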
The tested hash results calculated with the set hash algorithms are shown in Table 4 below:
Table 4
Step 204: the content matching apparatus uses the first tested hash result of the string under test as the entry index and looks up the corresponding hash table entry in the hash matching table;
Step 205: if the corresponding hash table entry is found, the content matching apparatus matches the other tested hash results of the string under test against the entry content of the entry found. If no corresponding hash table entry is found, the subsequent steps are not needed and the match fails.
Take "Accept-Language" as an example. Its entry index is 0x3F, and a corresponding entry is found in the hash matching table of Table 3. A corresponding entry is likewise found for "ABCDEFG123456789". No corresponding entry is found for the other strings under test, so they can be directly treated as failed matches. The other tested hash results of "Accept-Language" and "ABCDEFG123456789" are matched against the entry content of the entries found. The matching result of each string under test is shown in Table 5 below:
Table 5
String under test | Entry index | Matching result |
Accept | 0x20 | Entry is empty; no match |
Referer | 0x51 | Entry is empty; no match |
Accept-Language | 0x3F | 0x03_41_65_0F_7D42; full match |
Accept-Encoding | 0x3D | Entry is empty; no match |
User-Agent | 0x45 | Entry is empty; no match |
Host | 0x20 | Entry is empty; no match |
ABCDEFG123456789 | 0x71 | 0x01_41_39_10_0879; full match |
Connection | 0x59 | Entry is empty; no match |
Step 206: when the other tested hash results of the string under test all match the entry content of the entry found, a successful match result is obtained.
That is, the strings under test "Accept-Language" and "ABCDEFG123456789" match target strings. The string ID of the successful match can be output according to the matching result, so that subsequent operations such as URL filtering can be performed. It can further be determined whether any subsequent message is input; if so, the above matching flow is repeated.
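Steps 204 to 206 can be sketched as a dictionary lookup followed by a comparison of the remaining hash results. This is a sketch under assumptions: the names `hash_results`, `build_table` and `match` are hypothetical, odd-length strings are zero-padded for the two-byte XOR, and index collisions between target strings are not handled.

```python
from functools import reduce


def hash_results(s: str) -> tuple:
    """Five hash results in Table 1 order (non-empty ASCII input assumed)."""
    data = s.encode("ascii")
    padded = data + b"\x00" * (len(data) % 2)  # assumption: zero-pad odd lengths
    words = [(padded[i] << 8) | padded[i + 1] for i in range(0, len(padded), 2)]
    return (reduce(lambda a, b: a ^ b, data, 0), data[0], data[-1],
            len(data), reduce(lambda a, b: a ^ b, words, 0))


def build_table(targets: dict) -> dict:
    """Map entry index -> (stringID, hash 2, hash 3, hash 4, hash 5)."""
    table = {}
    for string_id, text in targets.items():
        index, *rest = hash_results(text)
        table[index] = (string_id, *rest)
    return table


def match(table: dict, tested: str):
    """Return the matched string ID, or None when the match fails."""
    index, *rest = hash_results(tested)
    entry = table.get(index)       # step 204: look up the entry by index
    if entry is None:
        return None                # entry is empty: no match
    string_id, *content = entry
    # Steps 205-206: every remaining hash result must agree.
    return string_id if content == rest else None


TABLE = build_table({1: "ABCDEFG123456789", 2: "abcdefg-xyz", 3: "Accept-Language"})
```

The cost of `match` depends on the number of hash algorithms, not on the length of the target strings stored in the table.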
The technical solution of this embodiment describes the operation of each step in detail. Because the matching operation for each entry can be performed independently, the hash calculation and the matching of each string under test can be carried out in parallel and pipelined to achieve high-rate matching, which significantly improves the matching speed.
In the technical solution of the above embodiment, the hash result of a single hash algorithm can be selected as the entry index, or several hash results from several hash algorithms can be combined and used as the entry index. A combination of several hash results is in effect a combined hash algorithm. The hash algorithm in the embodiments of the invention can therefore be either a simple hash calculation or a combination of several simple hash calculations, which characterizes the string more distinctively and improves the matching precision.
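As an illustration of a combined hash used as the entry index, the following sketch packs two simple hash results into one value. The particular combination (byte-wise XOR in the high byte, length modulo 256 in the low byte) and the name `combined_index` are assumptions made for this example only.

```python
from functools import reduce


def xor_bytes(data: bytes) -> int:
    """Byte-wise XOR of the input."""
    return reduce(lambda a, b: a ^ b, data, 0)


def combined_index(s: str) -> int:
    """Hypothetical combined hash: XOR result in the high byte, length in the low byte."""
    data = s.encode("ascii")
    return (xor_bytes(data) << 8) | (len(data) & 0xFF)
```

"Accept" and "Host" share the byte-wise XOR value 0x20, yet their combined indexes differ because their lengths differ, so they no longer land on the same entry.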
Embodiment three
Fig. 3 is a flowchart of the content matching method provided by Embodiment 3 of the invention. Building on the above embodiments, this embodiment further includes an update operation that adds a target string. On the basis of the foregoing flow, it also includes the following steps:
In the above steps, other restrictions on the add operation can also be set. For example, it can first be determined whether the number of entries in the hash matching table has reached an upper limit, in order to decide whether a new target string is allowed to be added.
The process of adding a target string in this embodiment effectively avoids conflicts caused by strings that have the same entry index. If target strings have the same entry index, another hash table entry can be set up by cascading. When a string under test is matched, entries stored in cascaded form are matched in order, to guarantee the required accuracy of the matching result.
Adding the other target hash results of the target string to be added to the hash matching table by cascading, as the next-level entry content of the current hash table entry, can specifically include:
Step 351: compare whether the other target hash results of the target string to be added are identical to the entry content of the current hash table entry; if so, perform step 352; if not, perform step 353;
Step 352: if they are identical, discard the target string to be added and end the add operation;
Step 353: if they are not identical, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to that offset entry index, and take the next-level hash table entry as the updated current hash table entry;
Step 354: determine whether the entry content of the updated current hash table entry is empty; if so, perform step 355; if not, perform step 356;
Step 355: when the entry content of the updated current hash table entry is empty, add the other target hash results of the target string to be added as the entry content of the current hash table entry;
Step 356: when the entry content of the updated current hash table entry is not empty, return to the comparison in step 351.
Steps 354 to 356 above are equivalent to returning to the aforementioned step 330.
In the above technical solution, if the entry content at some level is not empty and the comparison finds it identical, a conflict has occurred in which different target strings have the same hash table entry, and such a target string can be discarded directly. Although this lowers the matching precision to some extent, such conflicts can be minimized by choosing the hash algorithms and their number, or an alarm can be raised so that the configuration of such target strings is reduced as far as possible.
In the above operations, the cascaded entries can have multiple levels, and each level records the offset entry index of the next level, so that when matching fails at one level, matching moves to the next level and continues until the match succeeds or there is no next-level offset entry. That is, in the matching flow for a string under test, if the corresponding hash table entry is found, then after the other tested hash results of the string under test are matched against the content of the entry found, the method also includes: when the other tested hash results do not match the content of the entry found, looking up the next-level hash table entry in order according to the offset entry index, and returning to the operation of matching the other tested hash results of the string under test against the content of the entry found.
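The cascaded add operation of steps 351 to 356 and the corresponding level-by-level matching can be sketched as follows. This is a simplified model under assumptions: the names `Entry` and `CascadedTable` are hypothetical, the next-level offset entry index is modeled as an object reference named `next`, and only four of the simple hashes are used.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass
class Entry:
    content: Optional[tuple] = None   # remaining hash results; None means empty
    string_id: Optional[int] = None
    next: Optional["Entry"] = None    # stands in for the next-level offset entry index


def hashes(s: str) -> tuple:
    """Byte-wise XOR, first character, last character, length (non-empty ASCII)."""
    data = s.encode("ascii")
    return (reduce(lambda a, b: a ^ b, data, 0), data[0], data[-1], len(data))


class CascadedTable:
    def __init__(self) -> None:
        self.entries = {}

    def add(self, string_id: int, text: str) -> bool:
        index, *rest = hashes(text)
        content = tuple(rest)
        entry = self.entries.setdefault(index, Entry())
        while entry.content is not None:
            if entry.content == content:
                return False          # step 352: identical content, discard
            if entry.next is None:
                entry.next = Entry()  # allocate an empty next-level entry
            entry = entry.next        # step 353: move to the next level
        entry.content, entry.string_id = content, string_id  # step 355
        return True

    def match(self, text: str) -> Optional[int]:
        index, *rest = hashes(text)
        content = tuple(rest)
        entry = self.entries.get(index)
        while entry is not None and entry.content is not None:
            if entry.content == content:
                return entry.string_id
            entry = entry.next        # mismatch: continue at the next level
        return None
```

For instance, "Accept" and "Host" share the index 0x20, so the second one added is stored at the next level of the same entry, and both remain individually matchable.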
In the target string adding method provided by this embodiment, adding one entry does not affect the other entries, so the hash matching table can be updated in place, which makes maintenance and upgrading easy.
On the basis of the above technical solution, the hash algorithms preferably include at least an original-string hash algorithm that takes the original string itself as the hash result, and the original string serves as the entry content of a next-level hash table entry.
For example, two hash algorithms are set:
Hash algorithm 1 = XOR of the string, byte by byte
Hash algorithm 2 = the complete target string
The hash results of the target strings are then as shown in Table 6 below:
Table 6
ID | Target string | Target hash result 1 | Target hash result 2 |
1 | a | 0x61 | 0x00000061 |
2 | b | 0x62 | 0x00000062 |
3 | cd | 0x07 | 0x00006364 |
4 | ef | 0x03 | 0x00006566 |
5 | ABCD | 0x04 | 0x41424344 |
6 | EFGH | 0x0C | 0x45464748 |
7 | BCD | 0x45 | 0x00424344 |
8 | EFG | 0x44 | 0x00454647 |
The advantage of using the original target string as one of the hash algorithms is that it guarantees the matching precision of the strings. The advantage of placing the result of this hash algorithm at the next level or the last level of the cascade is that the cascaded entries are matched serially: if an earlier hash result does not match, the match can quickly be judged to have failed, and the exact comparison against the original string is performed only at the next or last level to guarantee accuracy. This both guarantees the matching precision and saves matching time.
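The two-algorithm scheme of Table 6 can be sketched as follows, with the byte-wise XOR as the entry index and the original string as the final, exact comparison. The names `build` and `match` are hypothetical, the strings with IDs 1 to 4 are taken as lowercase (which is what the listed hash values 0x61, 0x62, 0x07 and 0x03 correspond to), and a Python list stands in for the cascaded levels.

```python
from functools import reduce


def xor_bytes(data: bytes) -> int:
    """Hash algorithm 1: byte-wise XOR."""
    return reduce(lambda a, b: a ^ b, data, 0)


TARGETS = ["a", "b", "cd", "ef", "ABCD", "EFGH", "BCD", "EFG"]


def build(targets: list) -> dict:
    """Map entry index -> list of (stringID, original bytes)."""
    table = {}
    for string_id, text in enumerate(targets, start=1):
        data = text.encode("ascii")
        table.setdefault(xor_bytes(data), []).append((string_id, data))
    return table


def match(table: dict, text: str):
    """Cheap index lookup first; exact original-string comparison last."""
    data = text.encode("ascii")
    for string_id, original in table.get(xor_bytes(data), []):
        if original == data:   # hash algorithm 2: the complete string
            return string_id
    return None


TABLE6 = build(TARGETS)
```

A string such as "dc" has the same byte-wise XOR as "cd" but is rejected by the exact comparison, so no false match is reported.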
Embodiment four
Fig. 4 is a flowchart of the content matching method provided by Embodiment 4 of the invention. This embodiment can build on any of the above embodiments and further adds operations for modifying or deleting a target string. The modify and delete operations are largely similar and specifically include the following steps:
Step 410: according to the target string to be modified or deleted in a received target string modification request or deletion request, the content matching apparatus performs a hash operation on the target string to be modified or deleted based on the at least one set hash algorithm, to obtain, for that target string, the target hash result corresponding to each hash algorithm;
Step 420: the content matching apparatus uses the first target hash result of the target string to be modified or deleted as the entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry;
The modify operation specifically modifies the content of the corresponding hash table entry, whereas the delete operation deletes the corresponding cascaded entry or deletes the whole entry.
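The delete operation can be sketched as follows. This is an illustration under assumptions, since the text does not spell out the remaining steps: the names `hashes`, `add` and `delete` are hypothetical, a Python list stands in for the cascaded levels of one entry, and only four simple hashes are used.

```python
from functools import reduce


def hashes(s: str) -> tuple:
    """Byte-wise XOR, first character, last character, length (non-empty ASCII)."""
    data = s.encode("ascii")
    return (reduce(lambda a, b: a ^ b, data, 0), data[0], data[-1], len(data))


def add(table: dict, string_id: int, text: str) -> None:
    """Append the string's content at the entry selected by its index."""
    index, *rest = hashes(text)
    table.setdefault(index, []).append((string_id, rest))


def delete(table: dict, text: str) -> bool:
    """Remove the matching cascaded level; drop the whole entry when it becomes empty."""
    index, *rest = hashes(text)            # steps 410-420: hash, then read the entry
    chain = table.get(index, [])
    for i, (_, content) in enumerate(chain):
        if content == rest:
            del chain[i]                   # delete the corresponding cascaded entry
            if not chain:
                del table[index]           # nothing left: delete the whole entry
            return True
    return False


TABLE = {}
add(TABLE, 1, "Accept")
add(TABLE, 2, "Host")
```

Only the entry selected by the first hash result is touched, so the other entries of the hash matching table remain usable while the deletion is carried out.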
Embodiment five
Fig. 5 is a schematic structural diagram of the content matching device provided by Embodiment 5 of the present invention. The device comprises a first hash operation module 510, a hash table forming module 520, a second hash operation module 530 and a hash table matching module 540. The first hash operation module 510 is configured to perform a hash operation on each of at least one target string based on at least one set hash algorithm, so as to obtain, for each target string, the target hash result corresponding to each hash algorithm. The hash table forming module 520 is configured to form a hash table entry for each target string from that string's target hash results, and to combine the entries of all target strings into a hash matching table. The second hash operation module 530 is configured to perform a hash operation on a string under test according to the same at least one hash algorithm, so as to obtain the tested hash result corresponding to each hash algorithm. The hash table matching module 540 is configured to match the tested hash results of the string under test against the entries of the hash matching table, so as to obtain a matching result.
On the basis of the above technical solution, the number of hash algorithms is preferably at least two. In that case the hash table forming module 520 is specifically configured to use the first target hash result of each target string as the hash table entry index and the remaining target hash results as the hash table entry content, and to combine the entries of all target strings into the hash matching table. The hash table matching module 540 may then comprise an index matching unit 541, a content matching unit 542 and a result acquiring unit 543. The index matching unit 541 is configured to use the first tested hash result of the string under test as the entry index and to look up the corresponding entry in the hash matching table. The content matching unit 542 is configured to, if a corresponding entry is found, match the remaining tested hash results of the string under test against the content of that entry. The result acquiring unit 543 is configured to obtain a match-success result when all of the remaining tested hash results are consistent with the content of the entry found.
The content matching device provided by this embodiment matches strings by matching their hash results, and the entries can be matched in parallel, which improves matching speed. Updates to one entry of the hash matching table do not affect the other entries, so target strings are easy to add and modify.
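The index-plus-content arrangement described for this embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, the choice of byte-wise XOR as the first hash and of first byte, last byte and length as the remaining hashes is an assumption made for illustration, and index collisions between target strings are deliberately ignored here.

```python
from functools import reduce


def first_hash(s: str) -> int:
    """First hash result: XOR of all bytes, used as the entry index (assumed choice)."""
    return reduce(lambda a, b: a ^ b, s.encode("utf-8"), 0)


def other_hashes(s: str) -> tuple:
    """Remaining hash results: first byte, last byte and length, used as entry content."""
    data = s.encode("utf-8")
    return (data[:1], data[-1:], len(data))


def build_table(targets):
    """Form one entry per target string and combine the entries into a matching table."""
    table = {}
    for target in targets:
        # A later target with the same index overwrites the earlier one in this sketch.
        table[first_hash(target)] = other_hashes(target)
    return table


def matches(table, candidate: str) -> bool:
    """Look up the entry by the first hash, then compare the remaining hashes."""
    entry = table.get(first_hash(candidate))
    return entry is not None and entry == other_hashes(candidate)
```

For example, with a table built from "GET" and "POST", the string "TEG" selects the same entry index as "GET" (XOR is order-independent) but is rejected at the content comparison because its first and last bytes differ.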
Embodiment six
Fig. 6 is a schematic structural diagram of the content matching device provided by Embodiment 6 of the present invention. On the basis of the above embodiment, the device may further comprise a third hash operation module 610, an entry reading module 620, a content adding module 630 and a cascade adding module 640. The third hash operation module 610 is configured to, according to the target string to be added that is carried in a received target string adding request, perform a hash operation on the target string to be added based on the at least one set hash algorithm, so as to obtain the target hash result corresponding to each hash algorithm. The entry reading module 620 is configured to use the first target hash result of the target string to be added as the entry index and to read the corresponding entry from the hash matching table as the current entry. The content adding module 630 is configured to, when the content of the current entry is empty, add the remaining target hash results of the target string to be added to the current entry as its content. The cascade adding module 640 is configured to, when the content of the current entry is not empty, add the remaining target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current entry.
The cascade adding module 640 preferably comprises a comparing unit 641, a discarding unit 642, an offset index reading unit 643, a content adding unit 644 and a content judging unit 645. The comparing unit 641 is configured to compare whether the remaining target hash results of the target string to be added are consistent with the content of the current entry. The discarding unit 642 is configured to discard the target string to be added if they are consistent. The offset index reading unit 643 is configured to, if they are inconsistent, read the next-level offset entry index of the current entry, read the next-level entry according to that offset entry index, and take the next-level entry as the updated current entry. The content adding unit 644 is configured to add the remaining target hash results of the target string to be added to the content of the current entry when the content of the updated current entry is judged to be empty. The content judging unit 645 is configured to return to the comparison operation when the content of the updated current entry is judged not to be empty.
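The cascaded add procedure above can be sketched as follows. This is a hedged illustration rather than the patent's implementation: the table layout (a list of entries with `content` and `next` fields), the constant `PRIMARY_SIZE`, the helper names and the choice of hash functions are all assumptions, and next-level entries are allocated on demand at the end of the list instead of being read from a pre-existing overflow area.

```python
from functools import reduce

PRIMARY_SIZE = 256  # assumption: the first hash is a one-byte XOR, so 256 primary slots


def first_hash(s: str) -> int:
    return reduce(lambda a, b: a ^ b, s.encode("utf-8"), 0)


def other_hashes(s: str) -> tuple:
    data = s.encode("utf-8")
    return (data[:1], data[-1:], len(data))


def new_table():
    """Primary entries, all empty; 'next' holds the offset index of the next level."""
    return [{"content": None, "next": None} for _ in range(PRIMARY_SIZE)]


def add_target(table, target: str) -> bool:
    """Return True if the target was added, False if it was discarded as a duplicate."""
    content = other_hashes(target)
    current = table[first_hash(target)]
    if current["content"] is None:
        current["content"] = content          # empty entry: store the content directly
        return True
    while True:
        if current["content"] == content:     # compare step: identical content
            return False                      # discard the target to be added
        if current["next"] is None:           # no next level yet: allocate one
            table.append({"content": content, "next": None})
            current["next"] = len(table) - 1
            return True
        current = table[current["next"]]      # follow the offset index and compare again
```

Adding "GET" and then "TEG" to a fresh table places "TEG" in a next-level entry, because both strings share the same XOR index; adding "GET" a second time is discarded.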
The content matching device may further comprise an offset entry search unit, configured to, when the remaining tested hash results are inconsistent with the content of the entry found, look up the next-level entry in order according to the offset entry index, and return to the operation in which the content matching unit matches the remaining tested hash results of the string under test against the content of the entry found.
The content matching device provided by this embodiment can add target strings conveniently without recompiling the entire hash matching table, and is therefore easy to upgrade and maintain. Providing cascaded entries also effectively resolves entry collisions between target strings and improves matching precision.
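A lookup that follows the offset entry indexes might look like the sketch below. The dictionary layout, the field names `content` and `next`, and the literal index values are hypothetical and chosen only to make the example self-contained.

```python
def lookup(table, index, content) -> bool:
    """Follow offset indexes from table[index] until the content matches or the chain ends."""
    pos = index
    while pos is not None:
        entry = table.get(pos)
        if entry is None or entry["content"] is None:
            return False                  # no entry, or an empty entry: nothing to match
        if entry["content"] == content:
            return True                   # all remaining hash results are consistent
        pos = entry["next"]               # mismatch: move on to the next-level entry
    return False


# Two target strings that collide on the first hash (index 86), chained via "next".
table = {
    86: {"content": (b"G", b"T", 3), "next": 256},
    256: {"content": (b"T", b"G", 3), "next": None},
}
```

With this table, a string whose remaining hash results are `(b"T", b"G", 3)` fails against the primary entry and succeeds against the next-level entry at offset 256.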
Embodiment seven
Fig. 7 is a schematic structural diagram of the content matching device provided by Embodiment 7 of the present invention. The device may further comprise a fourth hash operation module 710, an index matching module 720 and a modification and deletion module 730. The fourth hash operation module 710 is configured to, according to the target string to be modified or deleted that is carried in a received target string modification request or deletion request, perform a hash operation on that target string based on the at least one set hash algorithm, so as to obtain the target hash result corresponding to each hash algorithm. The index matching module 720 is configured to use the first target hash result of the target string to be modified or deleted as the entry index and to read the corresponding entry from the hash matching table as the current entry. The modification and deletion module 730 is configured to modify or delete the content of the current entry.
The content matching device provided by this embodiment can modify and delete target strings conveniently without recompiling the entire hash matching table, and is therefore easy to upgrade and maintain.
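Deletion and modification touch only the entry selected by the first hash result, which the following sketch tries to show. The helper names and hash choices are assumptions, cascaded entries are left out for brevity, and treating a modification as "delete the old target, then store the new one" is one possible reading rather than something the text specifies.

```python
from functools import reduce


def first_hash(s: str) -> int:
    return reduce(lambda a, b: a ^ b, s.encode("utf-8"), 0)


def other_hashes(s: str) -> tuple:
    data = s.encode("utf-8")
    return (data[:1], data[-1:], len(data))


def delete_target(table: dict, target: str) -> bool:
    """Remove the entry for target if its content matches; other entries are untouched."""
    index = first_hash(target)
    if table.get(index) == other_hashes(target):
        del table[index]
        return True
    return False


def modify_target(table: dict, old: str, new: str) -> bool:
    """Replace old with new (assumed semantics: delete old, then store new)."""
    if not delete_target(table, old):
        return False
    table[first_hash(new)] = other_hashes(new)
    return True
```

Because each operation reads and writes a single entry, the rest of the table does not need to be rebuilt.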
The content matching device provided by the embodiments of the present invention can carry out the content matching method provided by any embodiment of the present invention and has the corresponding functional modules. The content matching method and device improve matching speed, reduce resource usage, and are easy to upgrade and maintain.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A content matching method, characterized by comprising:
performing a hash operation on each of at least one target string based on at least one set hash algorithm, so as to obtain, for each target string, the target hash result corresponding to each hash algorithm;
forming a hash table entry for each target string from the target hash results of that target string, and combining the hash table entries of all target strings into a hash matching table;
performing a hash operation on a string under test according to the at least one hash algorithm, so as to obtain, for the string under test, the tested hash result corresponding to each hash algorithm;
matching the tested hash results of the string under test against the hash table entries of the hash matching table, so as to obtain a matching result.
2. The content matching method according to claim 1, characterized in that the number of hash algorithms is at least two,
forming the hash table entry of each target string from the target hash results of that target string then comprises: using the first target hash result of each target string as the hash table entry index and the remaining target hash results as the hash table entry content;
matching the tested hash results of the string under test against the hash table entries of the hash matching table, so as to obtain a matching result, then comprises:
using the first tested hash result of the string under test as the hash table entry index, and looking up the corresponding hash table entry in the hash matching table;
if a corresponding hash table entry is found, matching the remaining tested hash results of the string under test against the content of the hash table entry found;
obtaining a match-success result when all of the remaining tested hash results are consistent with the content of the hash table entry found.
3. The content matching method according to claim 2, characterized by further comprising:
according to a target string to be added that is carried in a received target string adding request, performing a hash operation on the target string to be added based on the at least one set hash algorithm, so as to obtain, for the target string to be added, the target hash result corresponding to each hash algorithm;
using the first target hash result of the target string to be added as the hash table entry index, and reading the corresponding hash table entry from the hash matching table as the current hash table entry;
when the content of the current hash table entry is empty, adding the remaining target hash results of the target string to be added to the current hash table entry as its content;
when the content of the current hash table entry is not empty, adding the remaining target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry.
4. The content matching method according to claim 3, characterized in that adding the remaining target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry, comprises:
comparing whether the remaining target hash results of the target string to be added are consistent with the content of the current hash table entry;
if they are consistent, discarding the target string to be added;
if they are inconsistent, reading the next-level offset entry index of the current hash table entry, reading the next-level hash table entry according to the offset entry index, and taking the next-level hash table entry as the updated current hash table entry;
when the content of the updated current hash table entry is judged to be empty, adding the remaining target hash results of the target string to be added to the content of the current hash table entry;
when the content of the updated current hash table entry is judged not to be empty, returning to the comparison operation.
5. The content matching method according to claim 3 or 4, characterized in that, after matching the remaining tested hash results of the string under test against the content of the hash table entry found if a corresponding hash table entry is found, the method further comprises:
when the remaining tested hash results are inconsistent with the content of the hash table entry found, looking up the next-level hash table entry in order according to the offset entry index, and returning to the operation of matching the remaining tested hash results of the string under test against the content of the hash table entry found.
6. The content matching method according to claim 5, characterized in that the hash algorithms include at least an original-string hash algorithm that uses the original string itself as the hash result, and the original string serves as the content of the next-level hash table entry.
7. The content matching method according to claim 3 or 4, characterized by further comprising:
according to a target string to be modified or deleted that is carried in a received target string modification request or deletion request, performing a hash operation on the target string to be modified or deleted based on the at least one set hash algorithm, so as to obtain, for that target string, the target hash result corresponding to each hash algorithm;
using the first target hash result of the target string to be modified or deleted as the hash table entry index, and reading the corresponding hash table entry from the hash matching table as the current hash table entry;
modifying or deleting the content of the current hash table entry.
8. The content matching method according to any one of claims 1 to 7, characterized in that the hash algorithm comprises any one or any combination of the following: XOR of the string byte by byte, the first character of the string, the last character of the string, the length of the string, and XOR of the string two bytes at a time.
9. A content matching device, characterized by comprising:
a first hash operation module, configured to perform a hash operation on each of at least one target string based on at least one set hash algorithm, so as to obtain, for each target string, the target hash result corresponding to each hash algorithm;
a hash table forming module, configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of all target strings into a hash matching table;
a second hash operation module, configured to perform a hash operation on a string under test according to the at least one hash algorithm, so as to obtain, for the string under test, the tested hash result corresponding to each hash algorithm;
a hash table matching module, configured to match the tested hash results of the string under test against the hash table entries of the hash matching table, so as to obtain a matching result.
10. The content matching device according to claim 9, characterized in that the number of hash algorithms is at least two, and the hash table forming module is then specifically configured to use the first target hash result of each target string as the hash table entry index and the remaining target hash results as the hash table entry content, and to combine the hash table entries of all target strings into the hash matching table;
the hash table matching module then comprises:
an index matching unit, configured to use the first tested hash result of the string under test as the hash table entry index and to look up the corresponding hash table entry in the hash matching table;
a content matching unit, configured to, if a corresponding hash table entry is found, match the remaining tested hash results of the string under test against the content of the hash table entry found;
a result acquiring unit, configured to obtain a match-success result when all of the remaining tested hash results are consistent with the content of the hash table entry found.
11. The content matching device according to claim 10, characterized by further comprising:
a third hash operation module, configured to, according to a target string to be added that is carried in a received target string adding request, perform a hash operation on the target string to be added based on the at least one set hash algorithm, so as to obtain, for the target string to be added, the target hash result corresponding to each hash algorithm;
an entry reading module, configured to use the first target hash result of the target string to be added as the hash table entry index and to read the corresponding hash table entry from the hash matching table as the current hash table entry;
a content adding module, configured to, when the content of the current hash table entry is empty, add the remaining target hash results of the target string to be added to the current hash table entry as its content;
a cascade adding module, configured to, when the content of the current hash table entry is not empty, add the remaining target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry.
12. The content matching device according to claim 11, characterized in that the cascade adding module comprises:
a comparing unit, configured to compare whether the remaining target hash results of the target string to be added are consistent with the content of the current hash table entry;
a discarding unit, configured to discard the target string to be added if they are consistent;
an offset index reading unit, configured to, if they are inconsistent, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to the offset entry index, and take the next-level hash table entry as the updated current hash table entry;
a content adding unit, configured to add the remaining target hash results of the target string to be added to the content of the current hash table entry when the content of the updated current hash table entry is judged to be empty;
a content judging unit, configured to return to the comparison operation when the content of the updated current hash table entry is judged not to be empty.
13. The content matching device according to claim 11 or 12, characterized by further comprising:
an offset entry search unit, configured to, when the remaining tested hash results are inconsistent with the content of the hash table entry found, look up the next-level hash table entry in order according to the offset entry index, and return to the operation in which the content matching unit matches the remaining tested hash results of the string under test against the content of the hash table entry found.
14. The content matching device according to claim 11 or 12, characterized by further comprising:
a fourth hash operation module, configured to, according to a target string to be modified or deleted that is carried in a received target string modification request or deletion request, perform a hash operation on the target string to be modified or deleted based on the at least one set hash algorithm, so as to obtain, for that target string, the target hash result corresponding to each hash algorithm;
an index matching module, configured to use the first target hash result of the target string to be modified or deleted as the hash table entry index and to read the corresponding hash table entry from the hash matching table as the current hash table entry;
a modification and deletion module, configured to modify or delete the content of the current hash table entry.
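For illustration only, the five hash algorithms named in claim 8 could be written as below. The function names are hypothetical, strings are assumed to be handled as raw bytes, and the treatment of an odd trailing byte in the two-byte XOR (zero padding, big-endian words) is an assumption that the text does not specify.

```python
def xor_by_byte(data: bytes) -> int:
    """XOR of the string byte by byte; the result fits in one byte."""
    result = 0
    for b in data:
        result ^= b
    return result


def xor_by_double_byte(data: bytes) -> int:
    """XOR of the string two bytes at a time; the result fits in 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd-length string with a zero byte
    result = 0
    for i in range(0, len(data), 2):
        result ^= int.from_bytes(data[i:i + 2], "big")  # assumption: big-endian words
    return result


def first_char(data: bytes) -> bytes:
    """First character of the string (empty for an empty string)."""
    return data[:1]


def last_char(data: bytes) -> bytes:
    """Last character of the string (empty for an empty string)."""
    return data[-1:]


def length(data: bytes) -> int:
    """Length of the string in bytes."""
    return len(data)
```

Each of these is cheap to compute in a single pass, which is consistent with the stated aim of keeping resource usage low; none of them is collision-free on its own, so several are combined as index and content.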
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/077996 WO2014000305A1 (en) | 2012-06-30 | 2012-06-30 | Method and apparatus for content matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102870116A (en) | 2013-01-09 |
CN102870116B (en) | 2014-09-03 |
Family
ID=47447746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201280000614.9A Expired - Fee Related CN102870116B (en) | 2012-06-30 | 2012-06-30 | Method and apparatus for content matching |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN102870116B (en) |
WO (1) | WO2014000305A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116629A (en) * | 2013-02-01 | 2013-05-22 | 腾讯科技(深圳)有限公司 | Matching method and matching system of audio frequency content |
CN103414701A (en) * | 2013-07-25 | 2013-11-27 | 华为技术有限公司 | Rule matching method and device |
CN103500183A (en) * | 2013-09-12 | 2014-01-08 | 国家计算机网络与信息安全管理中心 | Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method |
CN105426413A (en) * | 2015-10-31 | 2016-03-23 | 华为技术有限公司 | Coding method and device |
CN106067876A (en) * | 2016-05-27 | 2016-11-02 | 成都广达新网科技股份有限公司 | A kind of HTTP request packet identification method based on pattern match |
CN109977295A (en) * | 2019-04-11 | 2019-07-05 | 北京安护环宇科技有限公司 | A kind of black and white lists matching process and device |
CN111627536A (en) * | 2020-05-14 | 2020-09-04 | 广元市中心医院 | Adverse event management system and method for hospital |
CN113347214A (en) * | 2021-08-05 | 2021-09-03 | 湖南戎腾网络科技有限公司 | High-frequency state matching method and system |
CN113821544A (en) * | 2020-06-18 | 2021-12-21 | 律商联讯风险解决方案公司 | Fuzzy search using field-level pruning of neighborhoods |
CN114422389A (en) * | 2022-02-24 | 2022-04-29 | 成都北中网芯科技有限公司 | High-speed real-time network data monitoring method based on Hash and hardware acceleration |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030204703A1 (en) * | 2002-04-25 | 2003-10-30 | Priya Rajagopal | Multi-pass hierarchical pattern matching |
US20060034115A1 (en) * | 2003-06-27 | 2006-02-16 | Dialog Semiconductor Gmbh | Natural analog or multilevel transistor DRAM-cell |
CN1794236A (en) * | 2004-12-21 | 2006-06-28 | 英特尔公司 | Efficient CAM-based techniques to perform string searches in packet payloads |
CN101350788A (en) * | 2008-08-25 | 2009-01-21 | 中兴通讯股份有限公司 | Method for mixed loop-up table of network processor inside and outside |
CN101692651A (en) * | 2009-09-27 | 2010-04-07 | 中兴通讯股份有限公司 | Method and device for Hash lookup table |
-
2012
- 2012-06-30 WO PCT/CN2012/077996 patent/WO2014000305A1/en active Application Filing
- 2012-06-30 CN CN201280000614.9A patent/CN102870116B/en not_active Expired - Fee Related
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116629A (en) * | 2013-02-01 | 2013-05-22 | 腾讯科技(深圳)有限公司 | Matching method and matching system of audio frequency content |
CN103116629B (en) * | 2013-02-01 | 2016-04-20 | 腾讯科技(深圳)有限公司 | A kind of matching process of audio content and system |
CN103414701B (en) * | 2013-07-25 | 2017-03-01 | 华为技术有限公司 | A kind of rule matching method and device |
CN103414701A (en) * | 2013-07-25 | 2013-11-27 | 华为技术有限公司 | Rule matching method and device |
CN103500183A (en) * | 2013-09-12 | 2014-01-08 | 国家计算机网络与信息安全管理中心 | Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method |
CN105426413B (en) * | 2015-10-31 | 2018-05-04 | 华为技术有限公司 | A kind of coding method and device |
WO2017071431A1 (en) * | 2015-10-31 | 2017-05-04 | 华为技术有限公司 | Encoding method and device |
CN105426413A (en) * | 2015-10-31 | 2016-03-23 | 华为技术有限公司 | Coding method and device |
US10305512B2 (en) | 2015-10-31 | 2019-05-28 | Huawei Technologies, Co., Ltd. | Encoding method and apparatus |
CN106067876A (en) * | 2016-05-27 | 2016-11-02 | 成都广达新网科技股份有限公司 | A kind of HTTP request packet identification method based on pattern match |
CN106067876B (en) * | 2016-05-27 | 2019-08-16 | 成都广达新网科技股份有限公司 | A kind of HTTP request packet identification method based on pattern match |
CN109977295A (en) * | 2019-04-11 | 2019-07-05 | 北京安护环宇科技有限公司 | A kind of black and white lists matching process and device |
CN111627536A (en) * | 2020-05-14 | 2020-09-04 | 广元市中心医院 | Adverse event management system and method for hospital |
CN113821544A (en) * | 2020-06-18 | 2021-12-21 | 律商联讯风险解决方案公司 | Fuzzy search using field-level pruning of neighborhoods |
CN113821544B (en) * | 2020-06-18 | 2024-03-19 | 律商联讯风险解决方案公司 | Improved fuzzy search using field level deletion neighborhood |
CN113347214A (en) * | 2021-08-05 | 2021-09-03 | 湖南戎腾网络科技有限公司 | High-frequency state matching method and system |
CN114422389A (en) * | 2022-02-24 | 2022-04-29 | 成都北中网芯科技有限公司 | High-speed real-time network data monitoring method based on Hash and hardware acceleration |
CN114422389B (en) * | 2022-02-24 | 2023-09-12 | 成都北中网芯科技有限公司 | High-speed real-time network data monitoring method based on hash and hardware acceleration |
Also Published As
Publication number | Publication date |
---|---|
WO2014000305A1 (en) | 2014-01-03 |
CN102870116B (en) | 2014-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102870116B (en) | Method and apparatus for content matching | |
CN104067282B (en) | Counter operation in state machine lattice | |
CN102857493B (en) | Content filtering method and device | |
CN101464905B (en) | Web page information extraction system and method | |
JP6051212B2 (en) | Processing iterative data | |
US8977626B2 (en) | Indexing and searching a data collection | |
CN102682098B (en) | Method and device for detecting web page content changes | |
CN102148805B (en) | Feature matching method and device | |
KR101617696B1 (en) | Method and device for mining data regular expression | |
CN101551803A (en) | Method and device for establishing pattern matching state machine and pattern recognition | |
CN103617226B (en) | A kind of matching regular expressions method and device | |
CN102609462A (en) | Method for compressed storage of massive SQL (structured query language) by means of extracting SQL models | |
CN111666468A (en) | Method for searching personalized influence community in social network based on cluster attributes | |
CN102193995B (en) | Method and device for establishing multimedia data index and retrieval | |
CN103064908A (en) | Method for rapidly removing repeated list through a memory | |
CN105630797A (en) | Data processing method and system | |
CN107977504A (en) | A kind of asymmetric in-core fuel management computational methods, device and terminal device | |
CN111078279A (en) | Processing method, device and equipment of byte code file and storage medium | |
CN111181980A (en) | Network security-oriented regular expression matching method and device | |
CN102982043B (en) | The disposal route of PE file and device | |
CN114201756A (en) | Vulnerability detection method and related device for intelligent contract code segment | |
US20080306948A1 (en) | String and binary data sorting | |
CN113779025B (en) | Optimization method, system and application of classified data retrieval efficiency in block chain | |
CN107729518A (en) | The text searching method and device of a kind of relevant database | |
CN109284268A (en) | A kind of method, system and the electronic equipment of fast resolving log |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140903; Termination date: 20190630 |