CN102870116A - Method and apparatus for content matching - Google Patents

Publication number
CN102870116A
Authority
CN
China
Prior art keywords
hash
target
hash table
result
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012800006149A
Other languages
Chinese (zh)
Other versions
CN102870116B (en)
Inventor
徐文广
戴崇经
田聃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN102870116A
Application granted
Publication of CN102870116B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Abstract

Embodiments of the invention provide a content matching method and apparatus. The method comprises: performing a hash operation on at least one target string based on at least one configured hash algorithm, to obtain the target hash results of each target string; forming a hash table entry for each target string from its target hash results, and combining the entries of the target strings into a hash matching table; performing a hash operation on a tested string with the same at least one hash algorithm, to obtain tested hash results; and matching the tested hash results of the tested string against the entries of the hash matching table, to obtain a matching result. The invention reduces the system resources occupied by matching; string extraction and string hash matching execute in parallel, so matching speed is improved; and when target strings are added or removed, the hash matching table need not be recompiled, which makes upgrade and maintenance easy.

Description

Content matching method and apparatus
Technical field
Embodiments of the invention relate to data processing technology, and in particular to a content matching method and apparatus.
Background
As networks are managed at ever finer granularity, many network users and equipment vendors pay increasing attention to message content at Layer 7 and above. This content is used for packet filtering, content-based charging, traffic detection, search engines and the like, and is also gradually coming into wide use in fields such as national defense, public security, security, network service management and commercial advertising. Deep Packet Inspection (DPI) technology has emerged in response; it can identify the content of each field in a message according to protocol rules. Protocol identification/parsing is one of the key DPI technologies, and string/signature matching is an important part of protocol identification/parsing, so matching speed directly affects product performance.
Prior-art content matching for strings or signatures typically performs the following operations:
a) divide the target string into at least one first string;
b) generate a group of second strings by combination, for example by further taking substrings of the first strings as second strings;
c) extract third strings from the second strings, for example by filtering out commonly used strings according to a blacklist or whitelist, and compile each third string with an algorithm such as a state machine or rule tree;
d) using a sliding window, compare from different start positions whether the tested string matches the third string at the first string node;
e) if the match succeeds but a next string node exists, enter the next matching flow;
f) if the match succeeds and there is no next string node, the tested string matches the target string;
g) if the match fails, the tested string does not match the target string.
Existing content matching methods have at least the following defects: 1) when the target string is long, the number of matching node branches and the matching time multiply, and performance drops sharply; 2) performance can only be improved by using multiple matching engines, which consumes excessive resources; 3) when a target string is added, the rule tree must be recompiled, which hinders hot upgrade and can only be worked around by switching to a backup table.
Summary of the invention
Embodiments of the invention provide a content matching method and apparatus that improve the matching speed of string content matching, reduce resource usage, and at the same time facilitate upgrade and maintenance.
An embodiment of the invention provides a content matching method, comprising:
performing a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
forming a hash table entry for each target string from the target hash results of that target string, and combining the hash table entries of the target strings into a hash matching table;
performing a hash operation on a tested string with the at least one hash algorithm, to obtain the tested hash result of the tested string corresponding to each hash algorithm;
matching the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
An embodiment of the invention also provides a content matching device, comprising:
a first hash operation module, configured to perform a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
a hash table forming module, configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of the target strings into a hash matching table;
a second hash operation module, configured to perform a hash operation on a tested string with the at least one hash algorithm, to obtain the tested hash result of the tested string corresponding to each hash algorithm;
a hash table matching module, configured to match the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
The content matching method and apparatus provided by the embodiments of the invention reduce the system resources occupied by matching and need no additional switch-over or backup resources. String extraction and string hash matching execute in parallel, which greatly shortens matching time and improves matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table need not be recompiled; only the corresponding hash table entries are modified. The hash algorithms and their number can also be updated at any time, so upgrade and maintenance are easy.
Description of drawings
Fig. 1 is a flowchart of the content matching method provided by Embodiment one of the invention;
Fig. 2 is a flowchart of the content matching method provided by Embodiment two of the invention;
Fig. 3 is a flowchart of the content matching method provided by Embodiment three of the invention;
Fig. 4 is a flowchart of the content matching method provided by Embodiment four of the invention;
Fig. 5 is a schematic structural diagram of the content matching device provided by Embodiment five of the invention;
Fig. 6 is a schematic structural diagram of the content matching device provided by Embodiment six of the invention;
Fig. 7 is a schematic structural diagram of the content matching device provided by Embodiment seven of the invention.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Embodiment one
Fig. 1 is a flowchart of the content matching method provided by Embodiment one of the invention. The method can be applied in various scenarios, typically URL filtering, packet filtering and the like, and is executed by a content matching device carried in a server in the form of software and/or hardware, for example in a Gateway GPRS Support Node (GGSN). The method comprises the following steps:
Step 110: the content matching device performs a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
Step 120: the content matching device forms a hash table entry for each target string from the target hash results of that target string, and combines the hash table entries of the target strings into a hash matching table;
Step 130: the content matching device performs a hash operation on a tested string with the at least one hash algorithm, to obtain the tested hash result of the tested string corresponding to each hash algorithm;
Step 140: the content matching device matches the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
The technical solution of this embodiment comprises steps 110 and 120, which compile the target strings, and steps 130 and 140, which match a tested string against the compiled hash matching table. A target string is, in content matching, a string that serves as the matching reference and can be preset by the user. A tested string is, in content matching, a string that needs to be matched and filtered, such as a field of a message to be filtered or a URL. For example, in web page filtering the user can set keywords that represent the filtering target as target strings, such as strings representing violent, pornographic or other content to be filtered, and the target strings are precompiled for the subsequent matching operation. A URL opened by the user is then taken as the tested string and first matched against the precompiled target strings. If it matches, the web page is filtered out; otherwise the web page at that URL is opened normally.
This embodiment converts each target string into target hash results with the hash algorithms, obtains the tested hash results of the tested string with the same hash algorithms, and determines whether the tested string matches a target string by matching the hash results.
The technical solution of this embodiment reduces the system resources occupied by matching and needs no additional switch-over or backup resources. In addition, string extraction and string hash matching execute in parallel, so even when a message contains many tested strings, for example more than 20, matching time is greatly shortened; the operations can be pipelined, which improves matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table need not be recompiled; only the corresponding hash table entries are modified. The hash algorithms and their number can also be updated at any time, so upgrade and maintenance are easy.
In the technical solution of this embodiment, the accuracy of the matching result depends on the specific hash algorithms used and on their number. Choosing suitable hash algorithms captures the characteristics of the target strings as fully as possible, so that identical strings yield identical hash results. Increasing the number of hash algorithms likewise reduces the probability that different strings yield identical hash results, and thus reduces the false match rate. The specific hash algorithms and their number can be set according to the actual application scenario, such as the number of target strings. The technical solution of the embodiments is applicable to many situations; it is not limited to matching character strings and can also be applied to matching any sequence of data.
Embodiment two
On the basis of the above embodiment, the number of hash algorithms is preferably at least two. Forming the hash table entry of a target string from its target hash results can then comprise: using the first target hash result of each target string as the hash table entry index, and the other target hash results as the hash table entry content. Here the first hash result is chosen as the entry index, but in practice the choice is not tied to the order of the hash results: the order of the hash algorithms can be set arbitrarily, so any hash result can be designated as the one obtained first and used as the entry index.
Step 140, matching the tested hash results of the tested string against the hash table entries of the hash matching table to obtain a matching result, can then be carried out as follows:
Step 141: the content matching device uses the first tested hash result of the tested string as the hash table entry index and looks up the corresponding hash table entry in the hash matching table;
Step 142: if a corresponding hash table entry is found, the content matching device matches the other tested hash results of the tested string against the content of the entry found;
Step 143: when the other tested hash results all match the content of the entry found, a successful match result is obtained.
Fig. 2 is a flowchart of the content matching method provided by Embodiment two of the invention. This embodiment walks through each step in detail by way of example.
First, five hash algorithms are defined, chosen to reflect the features of a string as far as possible, as shown in Table 1. Practical operation is not limited to these: any one or any combination of the following hash algorithms may be used, other hash algorithms may be added, and when the target strings are short or few, the original string itself may also be set as a hash algorithm:
Table 1
Hash algorithm 1 | XOR of the string, byte by byte
Hash algorithm 2 | First character of the string
Hash algorithm 3 | Last character of the string
Hash algorithm 4 | Length of the string
Hash algorithm 5 | XOR of the string, two bytes at a time
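The five algorithms of Table 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, strings are assumed to be non-empty ASCII, and the zero-byte padding of odd-length strings in algorithm 5 is an assumption chosen because it reproduces the values listed for "Accept-Language" and "ABCDEFG123456789" in Table 5.

```python
from functools import reduce


def hash_xor_bytes(s: str) -> int:
    """Hash algorithm 1: XOR of all bytes of the string."""
    return reduce(lambda acc, b: acc ^ b, s.encode("ascii"), 0)


def hash_first_char(s: str) -> int:
    """Hash algorithm 2: first character (assumes a non-empty string)."""
    return s.encode("ascii")[0]


def hash_last_char(s: str) -> int:
    """Hash algorithm 3: last character (assumes a non-empty string)."""
    return s.encode("ascii")[-1]


def hash_length(s: str) -> int:
    """Hash algorithm 4: length of the string in bytes."""
    return len(s.encode("ascii"))


def hash_xor_words(s: str) -> int:
    """Hash algorithm 5: XOR of the string taken two bytes at a time."""
    data = s.encode("ascii")
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length input with a zero byte
    acc = 0
    for i in range(0, len(data), 2):
        acc ^= (data[i] << 8) | data[i + 1]
    return acc


def all_hashes(s: str) -> tuple:
    """Return the five hash results in the order of Table 1."""
    return (hash_xor_bytes(s), hash_first_char(s), hash_last_char(s),
            hash_length(s), hash_xor_words(s))
```

Each result is small (one or two bytes), which is why the later comparison of hash results is independent of the length of the original string.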
Step 201: the content matching device performs hash operations on the target strings based on the five configured hash algorithms, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
Suppose there are three target strings:
Target string 1 = "ABCDEFG123456789"
Target string 2 = "abcdefg-xyz"
Target string 3 = "Accept-Language"
The ASCII code sequences of the target strings are then:
ASCII code sequence of target string 1 = "41424344454647313233343536373839"
ASCII code sequence of target string 2 = "616263646566672D78797A"
ASCII code sequence of target string 3 = "4163636570742D4C616E6775616765"
The target hash results are shown in Table 2 below:
Table 2
[Table 2 is reproduced as an image in the original publication: Figure BDA00001950733900061]
Step 202: the content matching device forms a hash table entry for each target string from the target hash results of that target string, and combines the hash table entries of the target strings into a hash matching table;
The target hash results of the target strings form the hash matching table shown in Table 3, in which the first target hash result serves as the hash table entry index (tab_index) and the other target hash results serve as the hash table entry content (tab_content). That is, target hash result 1 is used as the entry index and target hash results 2 to 5 are used as the entry content to generate the hash matching table. The entry content can be written as tab_content = {stringID, hash result 2, hash result 3, hash result 4, hash result 5}, as shown in Table 3.
Table 3
[Table 3 is reproduced as an image in the original publication: Figure BDA00001950733900062]
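As a rough sketch of step 202, the hash matching table can be modeled as a Python dictionary that maps tab_index to tab_content. The names `hashes`, `build_table`, `TARGETS` and `TABLE` are hypothetical, the string IDs 1 to 3 follow the numbering of the target strings, and the zero-byte padding in the two-byte XOR is an assumption that agrees with the values in Table 5.

```python
from functools import reduce


def hashes(s: str) -> tuple:
    """Hash results 1..5 of Table 1 for a non-empty ASCII string."""
    data = s.encode("ascii")
    xor8 = reduce(lambda a, b: a ^ b, data, 0)
    padded = data + b"\x00" * (len(data) % 2)  # assumed padding for odd lengths
    xor16 = 0
    for i in range(0, len(padded), 2):
        xor16 ^= (padded[i] << 8) | padded[i + 1]
    return (xor8, data[0], data[-1], len(data), xor16)


def build_table(targets: dict) -> dict:
    """Map tab_index (hash result 1) to tab_content (string ID, results 2..5)."""
    table = {}
    for string_id, text in targets.items():
        index, *rest = hashes(text)
        table[index] = (string_id, *rest)
    return table


TARGETS = {1: "ABCDEFG123456789", 2: "abcdefg-xyz", 3: "Accept-Language"}
TABLE = build_table(TARGETS)
```

This simple form keeps one entry per index; handling two target strings that share an index is the subject of Embodiment three.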
Step 203: the content matching device performs a hash operation on each tested string with the at least one hash algorithm, to obtain the tested hash result of the tested string corresponding to each hash algorithm.
Tested strings can be obtained in many ways, depending on the specific application scenario of the content matching method. Typically they are strings to be matched in network messages. For example, a Hypertext Transfer Protocol (HTTP) request message received from the network is as follows:
GET /product/ggsn/index.htm HTTP/1.1\r\n
Accept:*/*\r\n
Referer:http://www.huawei.com\r\n
Accept-Language:zh-cn\r\n
Accept-Encoding:gzip,deflate\r\n
User-Agent:Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1;SV1;.NET CLR 2.0.50727)\r\n
Host:www.huawei.com\r\n
ABCDEFG123456789:xxxxxxxxxxx\r\n
Connection:Keep-Alive\r\n
\r\n
According to the HTTP parsing rules, the strings between the "\r\n" delimiter and the ":" delimiter are extracted as follows:
Tested string 1 = "Accept"
Tested string 2 = "Referer"
Tested string 3 = "Accept-Language"
Tested string 4 = "Accept-Encoding"
Tested string 5 = "User-Agent"
Tested string 6 = "Host"
Tested string 7 = "ABCDEFG123456789"
Tested string 8 = "Connection"
Many other string extraction strategies are possible; the embodiments of the invention are not limited to this one.
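The extraction rule described above (take whatever lies between a "\r\n" and the next ":") can be sketched with a regular expression. This is one possible reading of the rule, not a full HTTP parser; `REQUEST` and `extract_tested_strings` are hypothetical names.

```python
import re

# The example request from the text, with explicit CRLF line endings.
REQUEST = (
    "GET /product/ggsn/index.htm HTTP/1.1\r\n"
    "Accept:*/*\r\n"
    "Referer:http://www.huawei.com\r\n"
    "Accept-Language:zh-cn\r\n"
    "Accept-Encoding:gzip,deflate\r\n"
    "User-Agent:Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1;SV1;"
    ".NET CLR 2.0.50727)\r\n"
    "Host:www.huawei.com\r\n"
    "ABCDEFG123456789:xxxxxxxxxxx\r\n"
    "Connection:Keep-Alive\r\n"
    "\r\n"
)


def extract_tested_strings(message: str) -> list:
    """Return every string that sits between a CRLF and the first following ':'."""
    return re.findall(r"\r\n([^:\r\n]+):", message)
```

The request line is skipped because it is not preceded by "\r\n", and the second colon in the Referer value is ignored because it does not directly follow a header name at the start of a line.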
The tested hash results computed with the configured hash algorithms are shown in Table 4 below:
Table 4
[Table 4 is reproduced as an image in the original publication: Figure BDA00001950733900081]
Step 204: the content matching device uses the first tested hash result of the tested string as the hash table entry index and looks up the corresponding hash table entry in the hash matching table;
Step 205: if a corresponding hash table entry is found, the content matching device matches the other tested hash results of the tested string against the content of the entry found. If no corresponding entry can be found, the subsequent steps are skipped and the match has failed.
Take "Accept-Language" as an example: its hash table entry index is 0x3F, and the corresponding entry can be found in the hash matching table of Table 3. "ABCDEFG123456789" likewise finds a corresponding entry. The other tested strings find no corresponding entry and can be treated directly as failed matches. The other tested hash results of "Accept-Language" and "ABCDEFG123456789" are then matched against the content of the entries found. The matching result for each tested string is shown in Table 5 below:
Table 5
Tested string | Entry index | Matching result
Accept | 0x20 | Entry empty, no match
Referer | 0x51 | Entry empty, no match
Accept-Language | 0x3F | 0x03_41_65_0F_7D42, full match
Accept-Encoding | 0x3D | Entry empty, no match
User-Agent | 0x45 | Entry empty, no match
Host | 0x20 | Entry empty, no match
ABCDEFG123456789 | 0x71 | 0x01_41_39_10_0879, full match
Connection | 0x59 | Entry empty, no match
Step 206: when the other tested hash results of the tested string all match the content of the hash table entry found, a successful match result is obtained.
That is, the tested strings "Accept-Language" and "ABCDEFG123456789" match target strings. The ID of each successfully matched string can be output with the matching result, so that subsequent operations such as URL filtering can be carried out. It can further be determined whether subsequent messages are still arriving; if so, the above matching flow is repeated.
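Steps 203 to 206 can be put together in a short sketch: hash the tested string, look up the entry by its first hash result, then compare the remaining hash results with the entry content. The helper names are hypothetical, the table holds one entry per index, and the zero-byte padding in the two-byte XOR is an assumption consistent with Table 5.

```python
from functools import reduce


def hashes(s: str) -> tuple:
    """Hash results 1..5 of Table 1 for a non-empty ASCII string."""
    data = s.encode("ascii")
    xor8 = reduce(lambda a, b: a ^ b, data, 0)
    padded = data + b"\x00" * (len(data) % 2)  # assumed padding for odd lengths
    xor16 = 0
    for i in range(0, len(padded), 2):
        xor16 ^= (padded[i] << 8) | padded[i + 1]
    return (xor8, data[0], data[-1], len(data), xor16)


def build_table(targets: dict) -> dict:
    """tab_index -> (string ID, hash results 2..5)."""
    table = {}
    for string_id, text in targets.items():
        index, *rest = hashes(text)
        table[index] = (string_id, *rest)
    return table


def match(table: dict, tested: str):
    """Return the matched string ID, or None if the match fails."""
    index, *rest = hashes(tested)
    entry = table.get(index)          # step 204: look up by entry index
    if entry is None:
        return None                   # no entry: match fails immediately
    string_id, *content = entry
    return string_id if content == rest else None  # steps 205 and 206


TABLE = build_table({1: "ABCDEFG123456789", 2: "abcdefg-xyz", 3: "Accept-Language"})
TESTED = ["Accept", "Referer", "Accept-Language", "Accept-Encoding",
          "User-Agent", "Host", "ABCDEFG123456789", "Connection"]
RESULTS = [match(TABLE, s) for s in TESTED]
```

Because each call to `match` touches only one entry and reads nothing else, the calls for different tested strings are independent of one another, which is what allows the parallel or pipelined execution the text describes.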
This embodiment describes the operation of each step in detail. Because the matching operation on each entry can be carried out independently, the hash results of the tested strings can be computed and matched in parallel, and pipelining achieves high-rate matching, so matching speed is significantly improved.
In the technical solution of the above embodiment, the hash result of one hash algorithm can be selected as the entry index, or several hash results from several hash algorithms can be combined and used as the entry index. A combination of several hash results is in effect a combined hash algorithm. A hash algorithm in the embodiments of the invention can therefore be a simple hash computation or a combination of several simple hash computations, which characterizes the string more distinctively and improves matching precision.
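A combined index of this kind could look like the following sketch, which concatenates the byte-wise XOR (hash algorithm 1) with the string length (hash algorithm 4). The particular combination, the bit layout and the names are illustrative assumptions only; the length is assumed to fit in 8 bits.

```python
from functools import reduce


def xor_bytes(s: str) -> int:
    """Hash algorithm 1: XOR of all bytes."""
    return reduce(lambda a, b: a ^ b, s.encode("ascii"), 0)


def combined_index(s: str) -> int:
    """Hypothetical combined index: XOR result in the high byte, length in the low byte."""
    return (xor_bytes(s) << 8) | (len(s) & 0xFF)
```

For instance, "Accept" and "Host" share the single-hash index 0x20, but under this combination they receive the distinct indexes 0x2006 and 0x2004.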
Embodiment three
Fig. 3 is a flowchart of the content matching method provided by Embodiment three of the invention. Building on the above embodiments, this embodiment further includes an update operation that adds a target string. In addition to the preceding flow, it comprises the following steps:
Step 310: for the target string to be added that is carried in a received target string addition request, the content matching device performs a hash operation on the target string to be added based on the at least one configured hash algorithm, to obtain the target hash result of the target string to be added corresponding to each hash algorithm;
In this step, other restrictions on the add operation can also be set. For example, it can first be determined whether the number of entries in the hash matching table has reached its upper limit, and thus whether adding a new target string is allowed.
Step 320: the content matching device uses the first target hash result of the target string to be added as the hash table entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry;
Step 330: the content matching device determines whether the content of the current hash table entry is empty; if so, step 340 is executed; if not, step 350 is executed;
Step 340: when the content of the current hash table entry is empty, the other target hash results of the target string to be added are added to the current hash table entry as its content;
Step 350: when the content of the current hash table entry is not empty, the other target hash results of the target string to be added are added to the hash matching table in cascade, as the content of a next-level entry of the current hash table entry.
The process of adding a target string in this embodiment effectively avoids the conflict caused by strings having the same hash table entry index. If the entry indexes of target strings are identical, another hash table entry can be set up in cascade. When a tested string is matched, entries stored in cascade form are matched in sequence, to guarantee the required accuracy of the matching result.
Adding the other target hash results of the target string to be added to the hash matching table in cascade, as the content of a next-level entry of the current hash table entry, can comprise:
Step 351: compare whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry; if so, execute step 352; if not, execute step 353;
Step 352: if they are consistent, discard the target string to be added and end the add operation;
Step 353: if they are inconsistent, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to that offset entry index, and take the next-level hash table entry as the updated current hash table entry;
Step 354: determine whether the content of the updated current hash table entry is empty; if so, execute step 355; if not, execute step 356;
Step 355: when the content of the updated current hash table entry is empty, add the other target hash results of the target string to be added as the content of the current hash table entry;
Step 356: when the content of the updated current hash table entry is not empty, return to the comparison of step 351.
Steps 354 to 356 are equivalent to returning to step 330 above.
In the above scheme, if the entry content at some level is not empty and the comparison result is consistent, a conflict has occurred in which different target strings have identical hash table entries, and such a target string can be discarded directly. This costs some matching precision, but such conflicts can be minimized through the choice of hash algorithms and their number, or the user can be warned by an alarm so that such target strings are configured as rarely as possible.
In the above operations, the cascaded entries can have multiple levels, and each level records the offset entry index of the next level, so that when matching fails at one level the match moves on to the next level, until the match succeeds or there is no next-level offset entry. That is, in the matching flow for a tested string, if a corresponding hash table entry is found, then after the other tested hash results of the tested string have been matched against the content of the entry found, the method further comprises: when the other tested hash results do not match the content of the entry found, looking up the next-level hash table entry in sequence according to the offset entry index, and returning to the operation of matching the other tested hash results of the tested string against the content of the entry found.
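The cascaded add and lookup can be sketched with a Python list per entry index, where the position in the list stands in for the next-level offset entry index of the text. This is a simplification under stated assumptions: the names are hypothetical, `max_entries` illustrates the optional upper-limit check of step 310, and the zero-byte padding in the two-byte XOR is assumed.

```python
from functools import reduce


def hashes(s: str) -> tuple:
    """Hash results 1..5 of Table 1 for a non-empty ASCII string."""
    data = s.encode("ascii")
    xor8 = reduce(lambda a, b: a ^ b, data, 0)
    padded = data + b"\x00" * (len(data) % 2)  # assumed padding for odd lengths
    xor16 = 0
    for i in range(0, len(padded), 2):
        xor16 ^= (padded[i] << 8) | padded[i + 1]
    return (xor8, data[0], data[-1], len(data), xor16)


def add_target(table: dict, string_id: int, text: str, max_entries=None) -> bool:
    """Add a target string; return False if it is rejected or discarded."""
    if max_entries is not None and sum(len(c) for c in table.values()) >= max_entries:
        return False                      # table full (optional check in step 310)
    index, *content = hashes(text)        # steps 310 and 320
    chain = table.setdefault(index, [])   # empty chain corresponds to step 340
    for _, existing in chain:             # steps 351, 353, 356: walk the levels
        if existing == content:
            return False                  # step 352: identical entry, discard
    chain.append((string_id, content))    # steps 340 / 355: fill the empty level
    return True


def match(table: dict, text: str):
    """Match a tested string level by level; return the string ID or None."""
    index, *content = hashes(text)
    for string_id, existing in table.get(index, []):
        if existing == content:
            return string_id
    return None
```

As a usage example, "Accept" and "Host" both hash to index 0x20 under algorithm 1, so adding both produces a two-level chain at that index, while adding "Accept" a second time is discarded.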
With the target string adding method provided by this embodiment, adding one entry does not affect the other entries when the hash matching table is updated, so maintenance and upgrade are easy to carry out.
On the basis of the above scheme, the hash algorithms preferably include at least an original-string hash algorithm, which uses the original string itself as the hash result, and the original string serves as the content of the next-level hash table entry.
For example, two hash algorithms are set:
Hash algorithm 1 = XOR of the string, byte by byte
Hash algorithm 2 = the complete target string
The hash results of the target strings are then as shown in Table 6 below:
Table 6
ID | Target string | Target hash result 1 | Target hash result 2
1 | a | 0x61 | 0x00000061
2 | b | 0x62 | 0x00000062
3 | cd | 0x07 | 0x00006364
4 | ef | 0x03 | 0x00006566
5 | ABCD | 0x04 | 0x41424344
6 | EFGH | 0x0C | 0x45464748
7 | BCD | 0x45 | 0x00424344
8 | EFG | 0x44 | 0x00454647
The advantage of using the original target string as one of the hash algorithms is that it guarantees the precision of string matching. The advantage of placing the result of this hash algorithm at the next level or the last level of the cascade is that the cascaded entries can be matched serially: if the preceding hash results do not match, the failure can be determined quickly, and the exact comparison of the original string, which guarantees accuracy, is carried out only at the next or last level. This both guarantees matching precision and saves matching time.
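The two-algorithm scheme of Table 6 can be sketched as below. The names are hypothetical, and representing the original string as a big-endian integer mirrors the way hash result 2 is printed in Table 6. The first four target strings are written in lowercase, which is what their listed hash values (0x61, 0x62, 0x6364, 0x6566) correspond to.

```python
from functools import reduce


def xor_bytes(s: str) -> int:
    """Hash algorithm 1: XOR of all bytes."""
    return reduce(lambda a, b: a ^ b, s.encode("ascii"), 0)


def original_as_int(s: str) -> int:
    """Hash algorithm 2: the complete string, read as a big-endian integer."""
    return int.from_bytes(s.encode("ascii"), "big")


TARGETS = {1: "a", 2: "b", 3: "cd", 4: "ef",
           5: "ABCD", 6: "EFGH", 7: "BCD", 8: "EFG"}

# Entry index -> (string ID, original string as integer).
TABLE = {xor_bytes(t): (sid, original_as_int(t)) for sid, t in TARGETS.items()}


def match(text: str):
    """Cheap XOR lookup first; exact comparison of the original string last."""
    entry = TABLE.get(xor_bytes(text))
    if entry is None:
        return None                      # fast rejection on the first hash
    string_id, original = entry
    return string_id if original == original_as_int(text) else None
```

For example, "DCBA" has the same XOR value as "ABCD" (0x04) and therefore reaches the same entry, but the final comparison of the original string rejects it.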
Embodiment four
Fig. 4 is a flowchart of the content matching method provided by Embodiment 4 of the present invention. This embodiment may build on any of the above embodiments and further adds an operation of modifying or deleting a target string. The modification and deletion operations are substantially similar and comprise the following steps:
Step 410: according to the target string to be modified or deleted carried in a received target string modification request or deletion request, the content matching device performs hash operations on that target string based on the at least one set hash algorithm, to obtain the target hash results of that target string corresponding to each hash algorithm;
Step 420: the content matching device uses the first target hash result of the target string to be modified or deleted as the hash table entry index, and reads the corresponding hash table entry from the hash matching table as the current hash table entry;
Step 430: the content matching device modifies or deletes the content of the current hash table entry.
The modification operation modifies the corresponding hash table entry content, whereas the deletion operation deletes the corresponding cascaded entry or deletes the entire entry.
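Steps 410 to 430 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hash matching table is modeled as a Python dict that maps the first hash result (byte-wise XOR) to a list of cascaded entry contents (the original string as an integer). The names `hashes`, `delete_target` and `modify_target` are hypothetical, and since the text does not specify what a modification writes, the sketch simply overwrites the stored content with a caller-supplied value.

```python
from functools import reduce


def hashes(s: str) -> tuple[int, int]:
    """Return (entry index, entry content) for a string (assumed hash pair)."""
    data = s.encode("ascii")
    return reduce(lambda a, b: a ^ b, data, 0), int.from_bytes(data, "big")


def delete_target(table: dict[int, list[int]], target: str) -> bool:
    index, content = hashes(target)      # step 410: hash the target
    chain = table.get(index)             # step 420: read the current entry
    if not chain or content not in chain:
        return False                     # nothing to delete
    chain.remove(content)                # step 430: delete the cascaded entry
    if not chain:
        del table[index]                 # last level gone: delete whole entry
    return True


def modify_target(table: dict[int, list[int]], target: str,
                  new_content: int) -> bool:
    index, content = hashes(target)      # step 410
    chain = table.get(index, [])         # step 420
    if content not in chain:
        return False
    chain[chain.index(content)] = new_content  # step 430: overwrite content
    return True
```

Only the entry chain selected by the index is touched, so other entries of the table are unaffected by the update.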
Embodiment five
Fig. 5 is a schematic structural diagram of the content matching device provided by Embodiment 5 of the present invention. The content matching device comprises a first hash operation module 510, a hash table entry forming module 520, a second hash operation module 530 and a hash table entry matching module 540. The first hash operation module 510 is configured to perform hash operations on at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash results corresponding to each hash algorithm. The hash table entry forming module 520 is configured to form a hash table entry for each target string from that string's target hash results, and to combine the hash table entries of all target strings into a hash matching table. The second hash operation module 530 is configured to perform hash operations on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to each hash algorithm. The hash table entry matching module 540 is configured to match the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
On the basis of the above technical solution, the number of hash algorithms is preferably at least two. In that case the hash table entry forming module 520 is specifically configured to use the first target hash result of each target string as the hash table entry index and the other target hash results as the hash table entry content, and to combine the hash table entries of all target strings into the hash matching table. The hash table entry matching module 540 may then comprise an index matching unit 541, a content matching unit 542 and a result acquiring unit 543. The index matching unit 541 is configured to use the first tested hash result of the tested string as the hash table entry index to look up the corresponding hash table entry in the hash matching table. The content matching unit 542 is configured to, if the corresponding hash table entry is found, match the other tested hash results of the tested string against the content of the found hash table entry. The result acquiring unit 543 is configured to obtain a result that the match succeeded when the other tested hash results all match the content of the found hash table entry.
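The division of labour between modules 520 and 540 can be illustrated with a small sketch. It assumes two hash algorithms (byte-wise XOR as the index, the original string as the content) and ignores cascading, so a later target simply overwrites an earlier one with the same index. The names `hashes`, `build_table` and `match` are hypothetical.

```python
from functools import reduce


def hashes(s: str) -> tuple[int, int]:
    """Return (first hash result, other hash result) for a string."""
    data = s.encode("ascii")
    return reduce(lambda a, b: a ^ b, data, 0), int.from_bytes(data, "big")


def build_table(targets: list[str]) -> dict[int, int]:
    """Entry forming: first hash result is the index, the other the content."""
    table: dict[int, int] = {}
    for target in targets:
        index, content = hashes(target)
        table[index] = content  # no cascading in this simplified sketch
    return table


def match(table: dict[int, int], tested: str) -> bool:
    index, content = hashes(tested)
    if index not in table:           # index matching: no entry, fail fast
        return False
    return table[index] == content   # content matching decides the result


table = build_table(["ABCD", "EFGH"])
print(match(table, "ABCD"), match(table, "ABDC"), match(table, "XY"))
```

Here "ABDC" reaches the same entry as "ABCD" because XOR ignores byte order, and is rejected only by the content comparison, whereas "XY" is rejected at the index lookup.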
The content matching device provided by this embodiment of the invention matches character strings by matching hash results, so the entries can be matched in parallel, which improves matching speed. Updates to individual entries of the hash matching table do not affect one another, so adding and modifying target strings is easy.
Embodiment six
Fig. 6 is a schematic structural diagram of the content matching device provided by Embodiment 6 of the present invention. On the basis of the above embodiment, the content matching device may further comprise a third hash operation module 610, an entry reading module 620, a content adding module 630 and a cascade adding module 640. The third hash operation module 610 is configured to, according to the target string to be added carried in a received target string addition request, perform hash operations on that target string based on the at least one set hash algorithm, to obtain the target hash results of the target string to be added corresponding to each hash algorithm. The entry reading module 620 is configured to use the first target hash result of the target string to be added as the hash table entry index and read the corresponding hash table entry from the hash matching table as the current hash table entry. The content adding module 630 is configured to, when the content of the current hash table entry is empty, add the other target hash results of the target string to be added to the current hash table entry as its content. The cascade adding module 640 is configured to, when the content of the current hash table entry is not empty, add the other target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry.
The cascade adding module 640 preferably comprises a comparing unit 641, a discarding unit 642, an offset index reading unit 643, a content adding unit 644 and a content judging unit 645. The comparing unit 641 is configured to compare whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry. The discarding unit 642 is configured to discard the target string to be added if they are consistent. The offset index reading unit 643 is configured to, if they are inconsistent, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to that offset entry index, and take the next-level hash table entry as the updated current hash table entry. The content adding unit 644 is configured to, when the content of the updated current hash table entry is determined to be empty, add the other target hash results of the target string to be added as the content of the current hash table entry. The content judging unit 645 is configured to, when the content of the updated current hash table entry is determined not to be empty, return to the comparison operation.
The content matching device may further comprise an offset entry lookup unit, configured to, when the other tested hash results do not match the content of the found hash table entry, look up the next-level hash table entry in sequence according to the offset entry index, and return to the content matching unit's operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
The content matching device provided by this embodiment can add target strings conveniently without recompiling the entire hash matching table, and is therefore easy to update and maintain. Setting up cascaded entries also effectively resolves hash table entry conflicts between target strings and improves matching precision.
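The cascaded add and the cascaded lookup described for this embodiment can be sketched as below. This is a hedged illustration, not the patented design: it assumes an 8-bit byte-wise XOR as the first hash result, the original string as the other hash result, and an overflow list whose positions play the role of the offset entry index. The names `Entry`, `CascadedTable`, `overflow` and `next_offset` are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


def hashes(s: str) -> tuple[int, int]:
    data = s.encode("ascii")
    return reduce(lambda a, b: a ^ b, data, 0), int.from_bytes(data, "big")


@dataclass
class Entry:
    content: Optional[int] = None      # other hash result; None means empty
    next_offset: Optional[int] = None  # offset index of the next-level entry


class CascadedTable:
    def __init__(self, size: int = 256):
        self.primary = [Entry() for _ in range(size)]
        self.overflow: list[Entry] = []  # next-level entries

    def add(self, target: str) -> bool:
        index, content = hashes(target)
        entry = self.primary[index]
        while True:
            if entry.content is None:      # empty entry: store content here
                entry.content = content
                return True
            if entry.content == content:   # already present: discard
                return False
            if entry.next_offset is None:  # allocate a next-level entry
                self.overflow.append(Entry())
                entry.next_offset = len(self.overflow) - 1
            entry = self.overflow[entry.next_offset]  # follow offset index

    def match(self, tested: str) -> bool:
        index, content = hashes(tested)
        entry = self.primary[index]
        while entry.content is not None:
            if entry.content == content:
                return True
            if entry.next_offset is None:
                return False
            entry = self.overflow[entry.next_offset]  # try next level
        return False
```

Adding "EFG" and then "D" (both XOR to 0x44) places the second string in a next-level entry, and adding either one again is discarded; no other entry of the table is rewritten.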
Embodiment seven
Fig. 7 is a schematic structural diagram of the content matching device provided by Embodiment 7 of the present invention. The content matching device may further comprise a fourth hash operation module 710, an index matching module 720 and a modification and deletion module 730. The fourth hash operation module 710 is configured to, according to the target string to be modified or deleted carried in a received target string modification request or deletion request, perform hash operations on that target string based on the at least one set hash algorithm, to obtain the target hash results of that target string corresponding to each hash algorithm. The index matching module 720 is configured to use the first target hash result of the target string to be modified or deleted as the hash table entry index and read the corresponding hash table entry from the hash matching table as the current hash table entry. The modification and deletion module 730 is configured to modify or delete the content of the current hash table entry.
The content matching device provided by this embodiment can modify and delete target strings conveniently without recompiling the entire hash matching table, and is therefore easy to update and maintain.
The content matching device provided by the embodiments of the present invention can perform the content matching method provided by any embodiment of the present invention and has the corresponding functional modules. The content matching method and device improve matching speed, reduce resource usage, and are easy to update and maintain.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A content matching method, characterized by comprising:
performing hash operations on at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash results corresponding to each hash algorithm;
forming a hash table entry for each target string from the target hash results of that target string, and combining the hash table entries of all target strings into a hash matching table;
performing hash operations on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to each hash algorithm;
matching the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
2. The content matching method according to claim 1, characterized in that the number of hash algorithms is at least two,
forming the hash table entry of each target string from the target hash results of that target string then comprises: using the first target hash result of each target string as the hash table entry index and the other target hash results as the hash table entry content;
matching the tested hash results of the tested string against the hash table entries of the hash matching table to obtain a matching result then comprises:
using the first tested hash result of the tested string as the hash table entry index to look up the corresponding hash table entry in the hash matching table;
if the corresponding hash table entry is found, matching the other tested hash results of the tested string against the content of the found hash table entry;
when the other tested hash results all match the content of the found hash table entry, obtaining a result that the match succeeded.
3. The content matching method according to claim 2, characterized by further comprising:
according to the target string to be added carried in a received target string addition request, performing hash operations on the target string to be added based on the at least one set hash algorithm, to obtain the target hash results of the target string to be added corresponding to each hash algorithm;
using the first target hash result of the target string to be added as the hash table entry index, and reading the corresponding hash table entry from the hash matching table as the current hash table entry;
when the content of the current hash table entry is empty, adding the other target hash results of the target string to be added to the current hash table entry as its content;
when the content of the current hash table entry is not empty, adding the other target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry.
4. The content matching method according to claim 3, characterized in that adding the other target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry, comprises:
comparing whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry;
if they are consistent, discarding the target string to be added;
if they are inconsistent, reading the next-level offset entry index of the current hash table entry, reading the next-level hash table entry according to the offset entry index, and taking the next-level hash table entry as the updated current hash table entry;
when the content of the updated current hash table entry is determined to be empty, adding the other target hash results of the target string to be added as the content of the current hash table entry;
when the content of the updated current hash table entry is determined not to be empty, returning to the comparison operation.
5. The content matching method according to claim 3 or 4, characterized in that, after matching the other tested hash results of the tested string against the content of the found hash table entry if the corresponding hash table entry is found, the method further comprises:
when the other tested hash results do not match the content of the found hash table entry, looking up the next-level hash table entry in sequence according to the offset entry index, and returning to the operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
6. The content matching method according to claim 5, characterized in that the hash algorithms include at least an original-string hash algorithm that uses the original character string itself as the hash result, and the original character string serves as the content of a next-level hash table entry.
7. The content matching method according to claim 3 or 4, characterized by further comprising:
according to the target string to be modified or deleted carried in a received target string modification request or deletion request, performing hash operations on the target string to be modified or deleted based on the at least one set hash algorithm, to obtain the target hash results of that target string corresponding to each hash algorithm;
using the first target hash result of the target string to be modified or deleted as the hash table entry index, and reading the corresponding hash table entry from the hash matching table as the current hash table entry;
modifying or deleting the content of the current hash table entry.
8. The content matching method according to any one of claims 1 to 7, characterized in that the hash algorithms comprise one or any combination of the following: byte-wise XOR of the string, the first character of the string, the last character of the string, the length of the string, and double-byte-wise XOR of the string.
9. A content matching device, characterized by comprising:
a first hash operation module, configured to perform hash operations on at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash results corresponding to each hash algorithm;
a hash table entry forming module, configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of all target strings into a hash matching table;
a second hash operation module, configured to perform hash operations on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to each hash algorithm;
a hash table entry matching module, configured to match the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
10. The content matching device according to claim 9, characterized in that the number of hash algorithms is at least two, and the hash table entry forming module is then specifically configured to use the first target hash result of each target string as the hash table entry index and the other target hash results as the hash table entry content, and to combine the hash table entries of all target strings into the hash matching table;
the hash table entry matching module then comprises:
an index matching unit, configured to use the first tested hash result of the tested string as the hash table entry index to look up the corresponding hash table entry in the hash matching table;
a content matching unit, configured to, if the corresponding hash table entry is found, match the other tested hash results of the tested string against the content of the found hash table entry;
a result acquiring unit, configured to obtain a result that the match succeeded when the other tested hash results all match the content of the found hash table entry.
11. The content matching device according to claim 10, characterized by further comprising:
a third hash operation module, configured to, according to the target string to be added carried in a received target string addition request, perform hash operations on the target string to be added based on the at least one set hash algorithm, to obtain the target hash results of the target string to be added corresponding to each hash algorithm;
an entry reading module, configured to use the first target hash result of the target string to be added as the hash table entry index, and to read the corresponding hash table entry from the hash matching table as the current hash table entry;
a content adding module, configured to, when the content of the current hash table entry is empty, add the other target hash results of the target string to be added to the current hash table entry as its content;
a cascade adding module, configured to, when the content of the current hash table entry is not empty, add the other target hash results of the target string to be added to the hash matching table in a cascaded manner, as the content of a next-level entry of the current hash table entry.
12. The content matching device according to claim 11, characterized in that the cascade adding module comprises:
a comparing unit, configured to compare whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry;
a discarding unit, configured to discard the target string to be added if they are consistent;
an offset index reading unit, configured to, if they are inconsistent, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to the offset entry index, and take the next-level hash table entry as the updated current hash table entry;
a content adding unit, configured to, when the content of the updated current hash table entry is determined to be empty, add the other target hash results of the target string to be added as the content of the current hash table entry;
a content judging unit, configured to, when the content of the updated current hash table entry is determined not to be empty, return to the comparison operation.
13. The content matching device according to claim 11 or 12, characterized by further comprising:
an offset entry lookup unit, configured to, when the other tested hash results do not match the content of the found hash table entry, look up the next-level hash table entry in sequence according to the offset entry index, and return to the content matching unit's operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
14. The content matching device according to claim 11 or 12, characterized by further comprising:
a fourth hash operation module, configured to, according to the target string to be modified or deleted carried in a received target string modification request or deletion request, perform hash operations on the target string to be modified or deleted based on the at least one set hash algorithm, to obtain the target hash results of that target string corresponding to each hash algorithm;
an index matching module, configured to use the first target hash result of the target string to be modified or deleted as the hash table entry index, and to read the corresponding hash table entry from the hash matching table as the current hash table entry;
a modification and deletion module, configured to modify or delete the content of the current hash table entry.
CN201280000614.9A 2012-06-30 2012-06-30 Method and apparatus for content matching Expired - Fee Related CN102870116B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/077996 WO2014000305A1 (en) 2012-06-30 2012-06-30 Method and apparatus for content matching

Publications (2)

Publication Number Publication Date
CN102870116A true CN102870116A (en) 2013-01-09
CN102870116B CN102870116B (en) 2014-09-03

Family

ID=47447746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201280000614.9A Expired - Fee Related CN102870116B (en) 2012-06-30 2012-06-30 Method and apparatus for content matching

Country Status (2)

Country Link
CN (1) CN102870116B (en)
WO (1) WO2014000305A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116629A (en) * 2013-02-01 2013-05-22 腾讯科技(深圳)有限公司 Matching method and matching system of audio frequency content
CN103414701A (en) * 2013-07-25 2013-11-27 华为技术有限公司 Rule matching method and device
CN103500183A (en) * 2013-09-12 2014-01-08 国家计算机网络与信息安全管理中心 Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method
CN105426413A (en) * 2015-10-31 2016-03-23 华为技术有限公司 Coding method and device
CN106067876A (en) * 2016-05-27 2016-11-02 成都广达新网科技股份有限公司 A kind of HTTP request packet identification method based on pattern match
CN109977295A (en) * 2019-04-11 2019-07-05 北京安护环宇科技有限公司 A kind of black and white lists matching process and device
CN111627536A (en) * 2020-05-14 2020-09-04 广元市中心医院 Adverse event management system and method for hospital
CN113347214A (en) * 2021-08-05 2021-09-03 湖南戎腾网络科技有限公司 High-frequency state matching method and system
CN113821544A (en) * 2020-06-18 2021-12-21 律商联讯风险解决方案公司 Fuzzy search using field-level pruning of neighborhoods
CN114422389A (en) * 2022-02-24 2022-04-29 成都北中网芯科技有限公司 High-speed real-time network data monitoring method based on Hash and hardware acceleration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204703A1 (en) * 2002-04-25 2003-10-30 Priya Rajagopal Multi-pass hierarchical pattern matching
US20060034115A1 (en) * 2003-06-27 2006-02-16 Dialog Semiconductor Gmbh Natural analog or multilevel transistor DRAM-cell
CN1794236A (en) * 2004-12-21 2006-06-28 英特尔公司 Efficient CAM-based techniques to perform string searches in packet payloads
CN101350788A (en) * 2008-08-25 2009-01-21 中兴通讯股份有限公司 Method for mixed loop-up table of network processor inside and outside
CN101692651A (en) * 2009-09-27 2010-04-07 中兴通讯股份有限公司 Method and device for Hash lookup table

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116629A (en) * 2013-02-01 2013-05-22 腾讯科技(深圳)有限公司 Matching method and matching system of audio frequency content
CN103116629B (en) * 2013-02-01 2016-04-20 腾讯科技(深圳)有限公司 A kind of matching process of audio content and system
CN103414701B (en) * 2013-07-25 2017-03-01 华为技术有限公司 A kind of rule matching method and device
CN103414701A (en) * 2013-07-25 2013-11-27 华为技术有限公司 Rule matching method and device
CN103500183A (en) * 2013-09-12 2014-01-08 国家计算机网络与信息安全管理中心 Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method
CN105426413B (en) * 2015-10-31 2018-05-04 华为技术有限公司 A kind of coding method and device
WO2017071431A1 (en) * 2015-10-31 2017-05-04 华为技术有限公司 Encoding method and device
CN105426413A (en) * 2015-10-31 2016-03-23 华为技术有限公司 Coding method and device
US10305512B2 (en) 2015-10-31 2019-05-28 Huawei Technologies, Co., Ltd. Encoding method and apparatus
CN106067876A (en) * 2016-05-27 2016-11-02 成都广达新网科技股份有限公司 A kind of HTTP request packet identification method based on pattern match
CN106067876B (en) * 2016-05-27 2019-08-16 成都广达新网科技股份有限公司 A kind of HTTP request packet identification method based on pattern match
CN109977295A (en) * 2019-04-11 2019-07-05 北京安护环宇科技有限公司 A kind of black and white lists matching process and device
CN111627536A (en) * 2020-05-14 2020-09-04 广元市中心医院 Adverse event management system and method for hospital
CN113821544A (en) * 2020-06-18 2021-12-21 律商联讯风险解决方案公司 Fuzzy search using field-level pruning of neighborhoods
CN113821544B (en) * 2020-06-18 2024-03-19 律商联讯风险解决方案公司 Improved fuzzy search using field level deletion neighborhood
CN113347214A (en) * 2021-08-05 2021-09-03 湖南戎腾网络科技有限公司 High-frequency state matching method and system
CN114422389A (en) * 2022-02-24 2022-04-29 成都北中网芯科技有限公司 High-speed real-time network data monitoring method based on Hash and hardware acceleration
CN114422389B (en) * 2022-02-24 2023-09-12 成都北中网芯科技有限公司 High-speed real-time network data monitoring method based on hash and hardware acceleration

Also Published As

Publication number Publication date
WO2014000305A1 (en) 2014-01-03
CN102870116B (en) 2014-09-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140903

Termination date: 20190630