CN100531179C - Method for storing character string matching rule and character string matching by storing rule - Google Patents
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The method for storing string matching rules divides the keyword of a rule to be stored into several sub-keyword segments. Each sub-keyword is concatenated with the corresponding leaf number to form a word string, and that word string is used as the keyword for building a leaf in the rule tree. In this way the keyword and the data part of the rule are inserted into the rule tree and stored. The invention also discloses a string matching method, which divides the string to be matched into several sub-keyword segments, concatenates each sub-keyword with the corresponding leaf number to form word strings, and searches the rule tree for the longest prefix match of each of these word strings, thereby matching the string. The advantages are that rules for matching long strings can be stored, fewer hardware resources are occupied, and processing capability is improved.
Description
Technical field
The present invention relates to the field of network management, and in particular to the string matching problem in networks.
Background art
With the development of networks and of technology, people have begun to view and manage networks from the standpoint of "content", that is, from the application layer. A major problem that a "content" network must solve is the identification of content. In many cases this work amounts to performing longest prefix matching on certain important character strings in the data transmitted over the network, and carrying out the corresponding action according to the content rule that is matched. String matching, a fundamental problem of computer science, has therefore received increasingly wide attention as networks continue to develop, and algorithms for it keep emerging. Longest prefix matching in particular, because of its important role in areas such as route lookup, is widely used in the industry.
String matching compares the characters that make up two character strings in order to determine the relationship between the two strings. The operation that determines whether two strings are completely equal is called exact matching. In exact matching, if the two strings have equal length and the characters at every corresponding position are identical, the exact match is considered successful; otherwise it is considered unsuccessful. Exact matching is often used to look up a particular piece of information in an information base that holds many records. Each record in the information base usually has one unique item that distinguishes it from the other records, called the keyword; the other information in the record is called the user data of the record. A lookup exactly matches the string S whose related information is sought against the keyword string of every record in the information base, and thereby finds the information about S. In the present invention, the two-tuple formed by a keyword Key and its corresponding user data User_Data is called a rule, written R=(Key, User_Data). Longest prefix matching of character strings is illustrated by the following example, which uses the rule set shown in Table 1:
Numbering | Keyword | User data
1 | aa | Data1
2 | | Data2
3 | | Data3
4 | aabe | Data4
5 | aabef | Data5
Table 1
The numbering in Table 1 is not part of the rules; it is introduced only for convenience in the description below.
Suppose there is a character string S=aabegft. Among the five keywords of these five rules, the keyword K that is the longest prefix match of S must satisfy the following two conditions:
1, K is a prefix of S, as are the keywords of rules 1, 2 and 4 in this example.
2, among the keywords that satisfy condition 1, K is the longest. In this example the keyword of rule 4 is the longest prefix match of S, not that of rule 1 or 2: although the keywords of rules 1 and 2 are also prefixes of S, they are shorter than that of rule 4 and therefore are not the longest prefix match of S.
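The two conditions above can be sketched in a few lines of Python. This is only an illustration: the function name longest_prefix_match is hypothetical, and the rule set contains only the keywords that are legible in Table 1 (rules 1, 4 and 5).

```python
from typing import Dict, Optional, Tuple


def longest_prefix_match(rules: Dict[str, str], s: str) -> Optional[Tuple[str, str]]:
    """Return the (keyword, user data) pair whose keyword is the longest prefix of s."""
    best = None
    for key, data in rules.items():
        # Condition 1: the keyword must be a prefix of s.
        # Condition 2: among such keywords, keep the longest one.
        if s.startswith(key) and (best is None or len(key) > len(best[0])):
            best = (key, data)
    return best


# Keywords taken from rules 1, 4 and 5 of Table 1.
rules = {"aa": "Data1", "aabe": "Data4", "aabef": "Data5"}
result = longest_prefix_match(rules, "aabegft")
```

For S=aabegft the sketch selects the keyword aabe of rule 4; aabef is rejected because it is not a prefix of S.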
The methods for realizing longest prefix matching in the prior art are now described with two concrete examples:
Prior art one:
Because routers frequently use longest prefix matching when looking up routing tables, longest prefix matching is very useful on the Internet. To speed it up, many hardware vendors have released network processor chips that can perform longest prefix matching. Each of these chips has its own coprocessor for longest prefix matching of character or numeric strings, and a typical way of realizing string prefix matching is:
1) Organization of the matching rules:
The rules are first built in memory into a special data structure of the form linear table + Patricia tree + leaves, called a compound tree, as shown in Figure 1. In the figure, 11 is a linear table; 12, 13 and 14 form the Patricia tree and serve as intermediate nodes; and 10, 15, 16, 17, 18 and 19 are leaf nodes, which hold the keyword of a rule and the content of the rule. The process of storing matching rules in this tree is:
First an empty compound tree is created, that is, memory is requested for the linear table, intermediate nodes and leaf nodes of this structure; the rules are then inserted into this empty tree one by one, which forms the data structure;
A rule is inserted into the tree as follows: the keyword of the rule is regarded as a bit sequence, and several of its bits are taken out (in Figure 1, the first 3 bits); the value they form is used as an index, and the address of the leaf holding the keyword is recorded at that index in the linear table. When several keywords have identical values in these bits, a conflict arises, which is resolved by the Patricia tree. Intermediate nodes 12, 13 and 14 in the figure form the Patricia tree; the numbers shown in these nodes, such as (4) and (6), indicate which subsequent bit of the keywords differs and so causes the branch at that node. Conflicting keywords are thus separated into different branches of the Patricia tree according to their differing subsequent bits, which resolves the conflict.
2) Matching of rules:
During matching, the string S to be matched is likewise regarded as a bit sequence. The bits corresponding to those taken from the keywords are extracted first, and the value they form is used as an index to find the corresponding entry in the linear table; the Patricia tree that this entry leads to is then searched, which yields the longest prefix match of S. Finding the longest prefix match of a string in a tree is also called looking up the tree or searching the tree.
In prior art one, rule insertion is performed by a microcode program supplied with the network processor, while the matching and search described in 2) are performed by hardware, which increases the speed of longest prefix matching. Prior art one nevertheless has considerable limitations. Because of the design technology and cost of the network processor itself, the keyword length is restricted: the longest keyword is 192 bits on the 4GS3 and 112 bits on the C-5. This is sufficient for applications with short keywords, such as route lookup and simple traffic classification, but it is far too short for matching the content rules of hundreds of bytes (thousands of bits) required by content networking devices. Software implementations are also limited by memory resources, so the key length of the tree must likewise be restricted when the data structure is designed.
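The organization used in prior art one can be sketched roughly as follows, under simplifying assumptions: keywords are written as strings of '0' and '1', the first 3 bits index a linear table of 8 entries, and the Patricia tree behind each entry is replaced by a plain list that is scanned linearly. The names INDEX_BITS, new_table, insert_rule and match are hypothetical, and keywords are assumed to be at least 3 bits long.

```python
from typing import List, Optional, Tuple

INDEX_BITS = 3  # number of leading bits used as the linear-table index


def new_table() -> List[List[Tuple[str, str]]]:
    # One bucket per index value; each bucket stands in for a Patricia tree.
    return [[] for _ in range(2 ** INDEX_BITS)]


def insert_rule(table, key_bits: str, data: str) -> None:
    # The value formed by the leading bits selects the linear-table entry.
    index = int(key_bits[:INDEX_BITS], 2)
    table[index].append((key_bits, data))


def match(table, s_bits: str) -> Optional[str]:
    # Index with the same leading bits, then search only that bucket.
    index = int(s_bits[:INDEX_BITS], 2)
    best = None
    for key_bits, data in table[index]:
        if s_bits.startswith(key_bits) and (best is None or len(key_bits) > len(best[0])):
            best = (key_bits, data)
    return None if best is None else best[1]


table = new_table()
insert_rule(table, "1010", "A")
insert_rule(table, "101011", "B")
insert_rule(table, "0110", "C")
```

Keywords 1010 and 101011 collide on the index 101; in the real structure the Patricia tree separates them by their later bits, whereas this sketch simply scans the bucket.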
Prior art two:
To overcome the limitation of the hardware itself and match rules whose key length exceeds the limit, a fairly direct approach is to cascade several compound trees into a cascaded tree, as shown in Figure 2. For convenience, the compound tree described above is referred to simply as a tree below.
In Figure 2, 20, 21, 22 and 23 all represent the tree structure described in prior art one, except that a leaf of a subtree at one level, such as 30 and 37 in Figure 2, points to a subtree at the next level; linked in this way they form the cascaded tree. Suppose the maximum key length that each tree in Figure 2 allows is L bits. S_a, S_b, S_w and S_p denote character or numeric strings of length L bits, and S_e, S_f, S_r and S_t denote strings of length less than or equal to L bits. S_aS_b denotes the keyword string formed by concatenating the two strings S_a and S_b; for example, if S_a is 0010 and S_b is 0101, then S_aS_b denotes 00100101. Taking the keyword string S_aS_bS_e, whose length exceeds L bits, as an example, the method of prior art two for storing string matching rules in this cascaded tree is:
1, the keyword is divided in order into several sub-keyword segments; every segment except the last has length L, and the last segment has length less than or equal to L. In this example the keyword string S_aS_bS_e is divided into three segments: S_a, S_b and S_e;
2, an empty tree 20 is created by the method of prior art one to serve as the first-level tree. The first segment S_a of the keyword is then taken and, with S_a as the keyword, inserted into tree 20 by the insertion method of prior art one. The cascade flag 31 in the user data of its corresponding leaf 30 is set to 01 to indicate cascading; the corresponding position in leaves that are not cascaded is set to 00 to indicate no cascading;
3, an empty tree 21 with the same data structure as in prior art one is created, and the next L bits, S_b, are taken from the string S_aS_bS_e. With S_b as the keyword, it is inserted into the second-level tree 21, generating leaf 37, and a pointer 32 indicating the cascade state is set in leaf 30 of tree 20 to point to tree 21. S_e is then inserted into leaf 33 of the third-level compound tree 22. The process continues in this way until all bits of the keyword have been taken, at which point the last-level compound tree, such as 22 in Figure 2, has been built. The cascade pointer of the leaves of this tree is empty, as marked in the figure, indicating that it has no next-level subtree.
In prior art two, matching with the tree so constructed proceeds as follows:
For the string S to be matched, bits 0 to L-1 are likewise taken as a sub-keyword and matched in the first-level subtree. After a leaf is found, the next sub-keyword segment is matched in the second-level subtree reached through the cascade pointer of that leaf, and so on, until S has been scanned completely or a local match fails in some subtree.
Because the match is a longest prefix match, a local match failure in some subtree does not mean that the whole longest prefix match has failed. A possible longest prefix match must then be sought among the leaves of the subtrees already traversed; this process is called backtracking, and only after backtracking is it known whether the match has really failed. To make backtracking easy, when a new sub-keyword K_New is inserted and some sub-keyword K_Old already in the tree is found to be the longest prefix match of K_New, a backtrack pointer to K_Old is also set in the leaf of K_New. During a search, if K_New is matched successfully but matching then fails in any subtree below K_New, K_Old is exactly the longest prefix match result, so the pointer in K_New is used to find K_Old and K_Old is taken as the result of the longest prefix match. If backtracking through all the leaf nodes traversed finds no longest prefix match, the longest prefix match of S has really failed. The structure of a leaf node of the cascade structure in prior art two is shown in Figure 3.
Prior art two uses the compound tree of prior art one as its building block: an overlong keyword is divided into several segments that are inserted into several compound trees, and these trees are cascaded into a cascaded tree on which rule matching of overlong strings can be completed. However, prior art two needs many trees to build the cascaded tree, and the number of trees required depends on the rules being processed and is therefore not known in advance. On a network processor, managing trees consumes hardware resources, so the number of trees available to the user is very limited; on the 4GS3, for example, it is 128. The processing capability of prior art two is therefore very limited.
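How the number of trees in prior art two grows with the rules can be seen in a small sketch. It is only an illustration under stated assumptions: each compound tree is modelled as a Python dictionary from sub-keyword to leaf, lengths are counted in characters rather than bits, and the names Tree, Leaf, insert and the sample keywords are hypothetical.

```python
from typing import Dict, List, Optional

L = 4  # assumed maximum key length per tree, counted in characters here


class Tree:
    count = 0  # how many trees have been created in total

    def __init__(self) -> None:
        Tree.count += 1
        self.leaves: Dict[str, "Leaf"] = {}


class Leaf:
    def __init__(self) -> None:
        self.next_tree: Optional[Tree] = None  # cascade pointer to the next-level tree
        self.data: Optional[str] = None


def insert(root: Tree, key: str, data: str) -> None:
    # Divide the keyword into segments of length L (the last may be shorter).
    segments: List[str] = [key[i:i + L] for i in range(0, len(key), L)]
    tree = root
    for i, seg in enumerate(segments):
        leaf = tree.leaves.setdefault(seg, Leaf())
        if i == len(segments) - 1:
            leaf.data = data
        else:
            if leaf.next_tree is None:
                leaf.next_tree = Tree()  # every cascaded leaf needs its own subtree
            tree = leaf.next_tree


root = Tree()
insert(root, "aaaabbbbcc", "D1")
insert(root, "aaaadddd", "D2")
insert(root, "eeeeffffgg", "D3")
trees_used = Tree.count
```

Three rules already require five trees in this sketch, and the total depends entirely on which prefixes the rules happen to share.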
Summary of the invention
In view of this, the main object of the present invention is to provide a method for storing string matching rules, with which matching rules for long or overlong strings can be stored in a rule tree, and with which a smaller number of rule trees suffices to store the string matching rules, so as to reduce the hardware resources that the rule trees occupy.
To achieve the above object, a method for storing string matching rules according to the present invention comprises the following steps:
Step A: the keyword of the rule currently to be stored in the rule tree is divided, in units of L-M, into S sub-keyword segments, where the length of every segment except the S-th equals L-M, the length of the S-th segment is less than or equal to L-M, L is the maximum keyword length that the rule tree can store for a rule, M is the length of a leaf number, and the number of segments S is a natural number;
Step B: starting from the 1st sub-keyword segment and continuing through the (S-1)-th, steps B1 and B2 are executed in order, where the upper-level leaf number for the 1st segment is 0;
Step B1: the leaf number of the upper-level leaf is concatenated in front of the current sub-keyword to form a new word string, and it is judged whether the rule tree already has a leaf whose keyword is this new word string. If not, a leaf whose keyword is this new word string is generated in the rule tree, a unique leaf number is assigned to the newly generated leaf, its cascade flag is set to cascaded, and it is made the current leaf. If so, and the cascade flag of the leaf found is cascaded, the leaf found is made the current leaf;
Step B2: the segment following the sub-keyword processed in step B1 becomes the current sub-keyword, the leaf number of the leaf for the sub-keyword processed in step B1 becomes the current upper-level leaf number, and execution returns to step B1;
Step C: the leaf number of the upper-level leaf is concatenated in front of the S-th sub-keyword segment to form a new word string, and it is judged whether the rule tree already has a leaf whose keyword is this new word string. If not, a leaf whose keyword is this new word string is generated in the rule tree and the data content of the rule to be stored is written into it; if so, the data content of the rule to be stored is written directly into that leaf.
Judging, in step B1, whether the rule tree already has a leaf whose keyword is the word string comprises the following steps:
Step B11: it is judged whether the rule tree has a leaf whose keyword is the longest prefix match of the word string; if not, it is concluded that the rule tree has no leaf whose keyword is the word string; if so, step B12 is executed;
Step B12: it is judged whether the longest-prefix-match keyword found in step B11 is identical to the word string; if not, it is concluded that the rule tree has no leaf whose keyword is the word string; if so, it is concluded that the rule tree has a leaf whose keyword is the word string.
In step B12, if the longest-prefix-match keyword differs from the word string, the method further comprises:
pointing the backtrack pointer of the newly generated leaf to the leaf corresponding to this longest-prefix-match keyword.
In step B1, if the leaf found in the rule tree whose keyword is the word string is not cascaded, making the leaf found the current leaf further comprises:
assigning a unique leaf number to the leaf found, setting the leaf to cascaded, and pointing its backtrack pointer to the leaf itself.
Judging, in step C, whether the rule tree already has a leaf whose keyword is the word string comprises the following steps:
Step C1: it is judged whether the rule tree has a leaf whose keyword is the longest prefix match of the word string; if not, it is concluded that the rule tree has no leaf whose keyword is the word string; if so, step C2 is executed;
Step C2: it is judged whether the longest-prefix-match keyword found in step C1 is identical to the word string; if not, it is concluded that the rule tree has no leaf whose keyword is the word string; if so, it is concluded that the rule tree has a leaf whose keyword is the word string.
Writing the data of the rule into the leaf in step C means writing the data of the rule into the "other" field of the leaf.
In steps B1 and C, generating in the rule tree a new leaf whose keyword is the word string further comprises: adding 1 to the value of the "subordinate count" field of the upper-level leaf of the newly generated leaf.
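Steps A to C, together with the checks of steps B11/B12 and C1/C2, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the rule tree is stood in for by a dictionary whose longest-prefix lookup tries successively shorter prefixes, lengths are counted in characters (L=6, M=2) instead of bits, leaf numbers are two-digit strings with 00 for the first level, and all names (Leaf, split, lpm, insert) are hypothetical. Setting the backtrack pointer of an already cascaded leaf to itself in step C is an inference from the deletion procedure, not something the steps above state.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional

L, M = 6, 2      # assumed sizes, counted in characters instead of bits
ROOT = "00"      # upper-level leaf number used for the first segment


@dataclass
class Leaf:
    cascade: bool = False            # cascade flag
    number: Optional[str] = None     # unique leaf number, set when the leaf cascades
    backtrack: Optional[str] = None  # keyword of the leaf the backtrack pointer refers to
    sub_count: int = 0               # "subordinate count" field
    data: Optional[str] = None       # "other" field holding the rule data


Tree = Dict[str, Leaf]
_numbers = count(1)


def split(key: str) -> List[str]:
    step = L - M
    return [key[i:i + step] for i in range(0, len(key), step)]


def lpm(tree: Tree, word: str) -> Optional[str]:
    """Keyword of the leaf that is the longest prefix match of word, if any."""
    for n in range(len(word), M, -1):
        if word[:n] in tree:
            return word[:n]
    return None


def insert(tree: Tree, key: str, data: str) -> None:
    upper_no, parent = ROOT, None
    segments = split(key)                       # step A
    for i, seg in enumerate(segments):
        word = upper_no + seg                   # leaf number in front of sub-keyword
        best = lpm(tree, word)                  # steps B11 / C1
        existed = best == word                  # steps B12 / C2
        if not existed:
            tree[word] = Leaf()
            if parent is not None:
                tree[parent].sub_count += 1     # one more leaf below the upper leaf
        leaf = tree[word]
        if i == len(segments) - 1:              # step C: last segment
            leaf.data = data
            if leaf.cascade:
                leaf.backtrack = word           # assumption: rule ends at a cascaded leaf
            return
        if not leaf.cascade:                    # step B1: new or not yet cascaded leaf
            leaf.cascade = True
            leaf.number = f"{next(_numbers):0{M}d}"
            # Existing leaf points to itself; a new leaf points to its LPM leaf, if any.
            leaf.backtrack = word if existed else best
        upper_no, parent = leaf.number, word    # step B2


tree: Tree = {}
insert(tree, "aa", "Data1")
insert(tree, "aabe", "Data4")
insert(tree, "aabef", "Data5")
```

After the three insertions the single dictionary holds the leaves 00aa, 00aabe and 01f; the leaf 00aabe both ends the rule aabe and cascades to 01f, so its backtrack pointer refers to itself.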
When a rule is deleted from the rule tree, the method further comprises:
Step D1: the keyword of the rule to be deleted is divided, in units of L-M, into T sub-keyword segments, where the length of every segment except the T-th equals L-M, the length of the T-th segment is less than or equal to L-M, and the number of segments T is a natural number;
Step D2: starting from the 1st sub-keyword segment of step D1 and continuing through the (T-1)-th, the following steps D21 and D22 are executed in order, where the upper-level leaf number for the 1st segment is 0:
Step D21: the leaf number of the upper-level leaf is concatenated in front of the current sub-keyword to form a new word string, and it is judged whether a leaf whose keyword is this word string can be found in the rule tree; if so, step D22 is executed; otherwise the deletion fails;
Step D22: the leaf node found in step D21 is recorded in a predefined array, the leaf number of this leaf becomes the upper-level leaf number, the sub-keyword following the current sub-keyword becomes the current sub-keyword, and execution returns to step D21;
Step D3: the upper-level leaf number is concatenated in front of the T-th sub-keyword segment to form a new word string, and it is judged whether a leaf whose keyword is identical to this new word string can be found in the rule tree; if so, this leaf node is recorded in the array of step D22 and step D4 is executed; otherwise the deletion fails;
Step D4: according to the records in the array of step D22, the following step is executed in turn, starting from the last leaf node recorded for the rule to be deleted and continuing up to its first leaf node:
it is judged whether the "subordinate count" field of the leaf currently being processed is 0; if so, that leaf is deleted, the value of the "subordinate count" field of its upper-level leaf is decremented by 1, the upper-level leaf becomes the leaf currently being processed, and this step is executed again; otherwise the deletion ends.
If the leaf currently being processed is the last leaf node and its "subordinate count" field is not 0, the method further comprises:
deleting the backtrack pointer in this leaf node that points to the leaf itself.
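The deletion steps D1 to D4 can be sketched as follows, again with a dictionary standing in for the rule tree, lengths in characters (L=6, M=2) and hypothetical names throughout. Two points go beyond the text and are assumptions of this sketch: the stored data of the last leaf is cleared, and the upward walk stops at a leaf that still holds the data of another rule. Backtrack pointers that refer to a deleted leaf are not cleaned up here.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional

L, M, ROOT = 6, 2, "00"  # assumed sizes in characters; "00" is the first-level number


@dataclass
class Leaf:
    cascade: bool = False
    number: Optional[str] = None
    backtrack: Optional[str] = None
    sub_count: int = 0               # "subordinate count" field
    data: Optional[str] = None


Tree = Dict[str, Leaf]
_numbers = count(1)


def split(key: str) -> List[str]:
    return [key[i:i + L - M] for i in range(0, len(key), L - M)]


def insert(tree: Tree, key: str, data: str) -> None:
    # Simplified insertion (no longest-prefix backtrack pointers), enough to build a tree.
    upper_no, parent, segments = ROOT, None, split(key)
    for i, seg in enumerate(segments):
        word = upper_no + seg
        if word not in tree:
            tree[word] = Leaf()
            if parent is not None:
                tree[parent].sub_count += 1
        leaf = tree[word]
        if i == len(segments) - 1:
            leaf.data = data
            if leaf.cascade:
                leaf.backtrack = word
            return
        if not leaf.cascade:
            leaf.cascade, leaf.number = True, f"{next(_numbers):0{M}d}"
            leaf.backtrack = word if leaf.data is not None else None
        upper_no, parent = leaf.number, word


def delete(tree: Tree, key: str) -> bool:
    upper_no, path, segments = ROOT, [], split(key)      # step D1
    for i, seg in enumerate(segments):                   # steps D2 and D3
        word = upper_no + seg
        if word not in tree:
            return False                                 # deletion fails
        path.append(word)                                # the "predefined array"
        if i < len(segments) - 1:
            if not tree[word].cascade:
                return False
            upper_no = tree[word].number
    if not path or tree[path[-1]].data is None:
        return False
    last = tree[path[-1]]
    last.data = None                                     # assumption: clear stored data
    if last.sub_count != 0:
        last.backtrack = None                            # drop the pointer to itself
        return True
    for idx in range(len(path) - 1, -1, -1):             # step D4
        leaf = tree[path[idx]]
        if leaf.sub_count != 0 or leaf.data is not None:  # second test is an added guard
            break
        del tree[path[idx]]
        if idx > 0:
            tree[path[idx - 1]].sub_count -= 1
    return True


tree: Tree = {}
insert(tree, "abcdefgh", "D1")
insert(tree, "abcdxy", "D2")
first_deleted = delete(tree, "abcdefgh")
```

Deleting abcdefgh removes only the leaf 01efgh, because the shared leaf 00abcd still has one subordinate leaf; deleting abcdxy afterwards empties the tree.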
The present invention also provides a method for string matching using the above string matching rule tree, the rule tree having been built by the above method for storing string matching rules. The matching method divides the string to be matched into several sub-keyword segments, concatenates each segment with the upper-level leaf number to form word strings, and looks these word strings up in turn in the rule tree so built, thereby matching the string.
A method for string matching using the string matching rule tree according to the present invention comprises the following steps:
Step E: the string to be matched is divided, in units of L-M, into P sub-keyword segments, where the length of every segment except the P-th equals L-M, the length of the P-th segment is less than or equal to L-M, and the number of segments P is a natural number;
Step F: starting from the 1st sub-keyword segment and continuing through the P-th, the following steps F1 and F2 are executed in order, where the upper-level leaf number for the 1st segment is 0:
Step F1: the leaf number of the upper-level leaf is concatenated in front of the current sub-keyword to form a new word string, and it is judged whether the rule tree has a leaf whose keyword is the longest prefix match of this word string; if so, step F2 is executed; otherwise the match fails;
Step F2: it is judged whether the leaf found in step F1 is cascaded; if so, the leaf number of this leaf becomes the upper-level leaf number, the sub-keyword following the current sub-keyword becomes the current sub-keyword, and execution returns to step F1, until a leaf that is not cascaded is found; otherwise this leaf is the last-level leaf of the match result, the data of the rule is taken from it, and matching ends.
In step F1, if the rule tree has a leaf whose keyword is the longest prefix match of the word string, the method further comprises:
recording this leaf, in order, in a predefined array.
In step F1, if the rule tree has no leaf whose keyword is the longest prefix match of the word string, the method further comprises:
starting from the last leaf node recorded in the array, it is judged in order whether the backtrack pointer of each recorded leaf is empty; if it is empty, the judgment moves to the preceding recorded leaf, and if the backtrack pointers of all recorded leaves are found to be empty, the match fails; otherwise the leaf to which the backtrack pointer points is the last-level leaf of the match result, the data of the rule is taken from it, and matching ends.
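Steps E and F, including the backtracking through the recorded leaves, can be sketched as follows. The sketch makes the same assumptions as before: a dictionary with a shrinking-prefix lookup stands in for the rule tree, lengths are in characters (L=6, M=2), leaf numbers are two-digit strings, and every name is hypothetical. The insert function is included only so that the example is self-contained; running out of segments at a cascaded leaf is treated here like a failed step F1 and also triggers backtracking, which is an assumption.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional

L, M, ROOT = 6, 2, "00"  # assumed sizes in characters; "00" is the first-level number


@dataclass
class Leaf:
    cascade: bool = False
    number: Optional[str] = None
    backtrack: Optional[str] = None  # keyword of the leaf the backtrack pointer refers to
    sub_count: int = 0
    data: Optional[str] = None


Tree = Dict[str, Leaf]
_numbers = count(1)


def split(key: str) -> List[str]:
    return [key[i:i + L - M] for i in range(0, len(key), L - M)]


def lpm(tree: Tree, word: str) -> Optional[str]:
    """Keyword of the leaf that is the longest prefix match of word, if any."""
    for n in range(len(word), M, -1):
        if word[:n] in tree:
            return word[:n]
    return None


def insert(tree: Tree, key: str, data: str) -> None:
    upper_no, parent, segments = ROOT, None, split(key)
    for i, seg in enumerate(segments):
        word = upper_no + seg
        best = lpm(tree, word)
        existed = best == word
        if not existed:
            tree[word] = Leaf()
            if parent is not None:
                tree[parent].sub_count += 1
        leaf = tree[word]
        if i == len(segments) - 1:
            leaf.data = data
            if leaf.cascade:
                leaf.backtrack = word
            return
        if not leaf.cascade:
            leaf.cascade, leaf.number = True, f"{next(_numbers):0{M}d}"
            leaf.backtrack = word if existed else best
        upper_no, parent = leaf.number, word


def match(tree: Tree, text: str) -> Optional[str]:
    upper_no, visited = ROOT, []
    for seg in split(text):                     # step E, then step F per segment
        found = lpm(tree, upper_no + seg)       # step F1
        if found is None:
            break                               # local failure: go on to backtracking
        leaf = tree[found]
        if not leaf.cascade:                    # step F2: last-level leaf reached
            return leaf.data
        visited.append(found)                   # record the leaf in order
        upper_no = leaf.number
    for word in reversed(visited):              # backtrack from the last recorded leaf
        target = tree[word].backtrack
        if target is not None:
            return tree[target].data
    return None                                 # all backtrack pointers empty


tree: Tree = {}
insert(tree, "aa", "Data1")
insert(tree, "aabe", "Data4")
insert(tree, "aabef", "Data5")
result = match(tree, "aabegft")
```

For the string aabegft the first segment reaches the cascaded leaf 00aabe, the second word string 01gft has no prefix in the tree, and backtracking through the pointer of 00aabe returns Data4, in agreement with the example of Table 1.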
As can be seen, the present invention uses one rule tree to store string matching rules. The keyword of the rule to be stored is divided into several sub-keyword segments, each of which is concatenated with the corresponding leaf number to form a word string, and these word strings serve as the keywords of leaves in the rule tree, so that the rule to be stored is inserted into the rule tree segment by segment. When the rule tree is used for string matching, the string to be matched is divided into several sub-keyword segments, each of which is concatenated with the upper-level leaf number to form a word string, and these word strings are looked up in turn in the rule tree, thereby matching the string.
Because the keyword of a rule to be stored is divided into several segments before insertion into the rule tree, matching rules for long or overlong strings can be stored, and the rule tree can then be used to match long or overlong strings. Because a single rule tree stores the string matching rules, the number of trees needed is reduced, which saves the hardware space that rule trees occupy, improves the processing capability of the method, and widens its range of application.
Description of drawings
Fig. 1 is a schematic structural diagram of the compound tree used to build rules in the prior art.
Fig. 2 is a schematic structural diagram of the cascaded tree used to build rules in the prior art.
Fig. 3 is a schematic diagram of the leaf structure of the prior-art cascaded tree.
Fig. 4 is a schematic diagram of the leaf structure in the present invention.
Fig. 5 is a schematic structural diagram of the tree used to store string matching rules in the present invention.
Fig. 6 is a flowchart of inserting a rule into the rule tree in the present invention.
Fig. 7 is a flowchart of deleting a rule from the rule tree in the present invention.
Fig. 8 is a flowchart of string matching search using the rule tree in the present invention.
Embodiment
The present invention is a method for storing string matching rules and a method for string matching using the rule tree that stores the matching rules. As regards storing the rules, the method can be seen as an improvement on prior art two. Each leaf that, in prior art two, would lead to a new subtree at the next level is given a leaf number N occupying M bits. When the next L-M bits of the keyword are taken as the sub-keyword that would build the next-level subtree, no new subtree is physically created; instead N is concatenated in front of this L-M-bit sub-keyword to form a new word string, a leaf is built for this word string, and the leaf is inserted into the rule tree. These steps are repeated until every sub-keyword segment of the rule's keyword, together with the content of the rule, has been inserted into the rule tree. Storing matching rules this way requires only one rule tree and no further subtrees, which simplifies the logical structure. The number N assigned to a new leaf is unique among leaves. Thus even where, in prior art two, the keywords in leaves of different subtrees could be identical, the leaf numbers N that lead to them differ, so the new keywords obtained after concatenation cannot be identical, and each new keyword still uniquely identifies the sub-keyword at its level. Placing N in front of the L-M-bit sub-keyword also guarantees that the result of the longest prefix match is correct.
The method of the present invention for storing string matching rules is described in detail below with reference to the drawings.
The present invention stores string matching rules using the leaf structure shown in Figure 4. In this structure the keyword of a leaf is a string formed by concatenating two parts (the + in Figure 4 denotes the concatenation): the first part is the leaf number of the upper-level leaf and the second part is the sub-keyword corresponding to the current leaf, and their combined length does not exceed L bits. The leaf number is preset to occupy M bits (M<<L), and keywords are segmented in units of L-M bits, which guarantees that a sub-keyword concatenated with a leaf number does not exceed L bits. The "cascade flag", "backtrack pointer" and "other" fields of the leaf have the same meaning as in prior art two. The differences are that the "next-level subtree" field of prior art two is removed and two fields, "leaf number" and "subordinate count", are added. The "leaf number" field records the unique leaf number of the leaf; the "subordinate count" field records the number of lower-level leaves that this leaf logically leads to, and is used when a rule is deleted to decide whether the leaf node can be deleted.
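The leaf structure just described might be modelled as below. This is a sketch only: the field names are hypothetical English renderings of the fields of Figure 4, the sizes L=16 and M=4 bits are assumed for illustration, and keywords are written as strings of '0' and '1'.

```python
from dataclasses import dataclass
from typing import Optional

L = 16  # assumed maximum keyword length of the rule tree, in bits
M = 4   # assumed length of a leaf number, in bits


@dataclass
class Leaf:
    keyword: str                        # upper-level leaf number + sub-keyword
    cascade_flag: bool = False          # whether lower-level leaves follow
    backtrack: Optional["Leaf"] = None  # backtrack pointer
    leaf_number: Optional[int] = None   # unique number of this leaf
    subordinate_count: int = 0          # number of lower-level leaves led to
    other: Optional[str] = None         # data of the rule


def make_keyword(upper_leaf_number: int, sub_keyword_bits: str) -> str:
    """Concatenate the M-bit upper-level leaf number in front of the sub-keyword."""
    if not 0 <= upper_leaf_number < 2 ** M:
        raise ValueError("leaf number does not fit in M bits")
    if len(sub_keyword_bits) > L - M:
        raise ValueError("sub-keyword longer than L-M bits")
    return format(upper_leaf_number, f"0{M}b") + sub_keyword_bits


leaf = Leaf(keyword=make_keyword(1, "101100111000"))
```

Because the sub-keyword is limited to L-M bits, the composed keyword can never exceed the L bits that the rule tree accepts.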
In the method of the present invention, string matching rules are stored by inserting them into the tree one rule at a time. The method is described in detail below with a concrete example.
In this embodiment, the rules (Key1, D1), (Key2, D2) and (Key3, D3) need to be stored, where:
Keyword Key1 is divided, in units of L-M bits, into three sub-keyword segments: S1, S2, S3;
Keyword Key2 is divided, in units of L-M bits, into two sub-keyword segments: S1, S4;
Keyword Key3 is divided, in units of L-M bits, into two sub-keyword segments: S2, S4;
The length of S1 and S2 is L-M bits, and the length of S3 and S4 is less than L-M bits;
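The word strings that these three keywords give rise to can be traced with a short sketch. The concrete values are hypothetical: lengths are counted in characters with L=6 and M=2, S1="aaaa", S2="bbbb", S3="cc" and S4="dd", and leaf numbers are handed out in order of first use, so they need not coincide with the numbers shown in Figure 5.

```python
from typing import Dict, List

L, M = 6, 2  # assumed sizes, counted in characters instead of bits


def word_strings(key: str, numbers: Dict[str, str]) -> List[str]:
    """Return the word strings (leaf number + sub-keyword) used to store key."""
    step = L - M
    segments = [key[i:i + step] for i in range(0, len(key), step)]
    upper, words = "00", []
    for i, seg in enumerate(segments):
        word = f"{upper}+{seg}"          # "+" only marks the concatenation point
        words.append(word)
        if i < len(segments) - 1:
            # A leaf that leads to a lower level gets a unique number on first use.
            if word not in numbers:
                numbers[word] = f"{len(numbers) + 1:0{M}d}"
            upper = numbers[word]
    return words


numbers: Dict[str, str] = {}
k1 = word_strings("aaaabbbbcc", numbers)  # Key1 = S1 S2 S3
k2 = word_strings("aaaadd", numbers)      # Key2 = S1 S4
k3 = word_strings("bbbbdd", numbers)      # Key3 = S2 S4
```

Key1 and Key2 share the leaf 00+aaaa, while the two occurrences of S4 end up under different word strings (01+dd and 03+dd) because the leaf numbers in front of them differ.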
Using the leaf structure of Figure 4, the above rules are inserted into an empty tree for storing matching rules. Figure 5 shows the structure of the tree after these rules have been inserted. Only the leaves are drawn in detail in Figure 5; the linear tables and intermediate nodes have little bearing on this scheme and are drawn only schematically. The steps for inserting the three rules into the tree are:
Step A: first insert rule (Key1, D1), specifically:
Step A1: take the first sub-keyword segment S1 of Key1;
Step A2: concatenate 00 with S1 to form the new string 00+S1, and search the tree for a leaf whose keyword is 00+S1. Because the tree storing the rules is initially empty in this embodiment, no such leaf can be found in this step, and step A3 is executed directly. Whenever a new rule is inserted into the tree, the upper-level leaf of the rule's first segment is set to empty and the upper-level leaf number is set to 00, which is why 00 is concatenated with S1 in this step;
Step A3: insert 00+S1 into the tree, producing leaf 521, and initialize every value in the user data part of leaf 521 to 0. The user data part of a leaf is shown in Figure 4; whenever a new leaf is produced in the subsequent steps, every value in its user data part is likewise first initialized to 0;
Step A4: because Key1 has a further segment S2, set the cascade flag of leaf 521 to 1, indicating that leaf 521 will cascade with a subordinate leaf, and assign leaf 521 the unique leaf number 01;
Step A5: take the next segment S2 of Key1; the upper-level leaf is now 521 and the upper-level leaf number is 01;
Step A6: concatenate the upper-level leaf number 01 with S2 to form the new string 01+S2, and search the tree for a leaf whose keyword is 01+S2. In this embodiment the tree does not yet contain such a leaf, so step A7 is executed directly;
Step A7: insert 01+S2 into the tree, producing leaf 522; initialize every value in the user data part of leaf 522 to 0, and add 1 to the "subordinate count" field of the upper-level leaf 521. In this embodiment, the "subordinate count" of leaf 521 changes from 0 to 1 in this step;
Step A8: because Key1 has a further segment S3, set the cascade flag of leaf 522 to 1 and assign leaf 522 the unique leaf number 02 for use in the subsequent steps;
Step A9: take the next segment S3 of Key1; the upper-level leaf is now 522 and the upper-level leaf number is 02;
Step A10: concatenate the upper-level leaf number 02 with S3 to form the new string 02+S3, and search the tree for a leaf whose keyword is 02+S3. In this embodiment the tree does not yet contain such a leaf, so step A11 is executed directly;
Step A11: insert 02+S3 into the tree, producing leaf 523; initialize every value in the user data part of leaf 523 to 0, and add 1 to the "subordinate count" field of the upper-level leaf 522. In this embodiment, the "subordinate count" of leaf 522 changes from 0 to 1 in this step;
Step A12: because all segments of Key1 have now been taken, no leaf number needs to be assigned to leaf 523. It is only necessary to write the data D1 of rule (Key1, D1) into the "other" field of leaf 523, thereby storing the data content of the rule. This completes the insertion of rule (Key1, D1);
Step B: insert rule (Key2, D2), specifically:
Step B1: take the first segment S1 of Key2; the upper-level leaf is set to empty and the upper-level leaf number is set to "00";
Step B2: concatenate the upper-level leaf number 00 with the first segment S1 to form the new string 00+S1, and search the tree for a leaf whose keyword is 00+S1. In this embodiment, step A has already constructed a leaf with keyword 00+S1 in the tree, so no such leaf needs to be constructed again, and step B3 is executed directly;
Step B3: take the second segment S4 of Key2; the upper-level leaf is now leaf 521, which corresponds to 00+S1, and the upper-level leaf number is 01, the number of leaf 521;
Step B4: concatenate the upper-level leaf number 01 with the current sub-keyword S4 to form the new string 01+S4, and search the tree for a leaf whose keyword is 01+S4. In this embodiment there is not yet such a leaf, so step B5 is executed;
Step B5: insert 01+S4 into the tree, producing leaf 524, and add 1 to the "subordinate count" field of the current upper-level leaf 521. After this step, the "subordinate count" of leaf 521 changes from 1 to 2;
Step B6: because all segments of Key2 have now been taken, no leaf number needs to be assigned to leaf 524. It is only necessary to write the data D2 of rule (Key2, D2) into the "other" field of leaf 524, thereby storing the content of the rule. This completes the insertion of rule (Key2, D2);
Step C: insert rule (Key3, D3), specifically:
Step C1: take the first segment S2 of Key3; the upper-level leaf is set to empty and the upper-level leaf number is set to "00";
Step C2: concatenate the upper-level leaf number 00 with the current sub-keyword S2 to form the new string 00+S2, and search the tree for a leaf whose keyword is 00+S2. In this embodiment no such leaf has yet been constructed in the tree, so step C3 is executed;
Step C3: insert 00+S2 into the tree, producing leaf 525;
Step C4: because Key3 has a further segment S4, set the cascade flag of leaf 525 to 1, indicating that leaf 525 will cascade with a subordinate leaf, and assign leaf 525 the unique leaf number 03;
Step C5: take the next segment S4 of Key3; the upper-level leaf is now 525 and the upper-level leaf number is 03;
Step C6: concatenate the upper-level leaf number 03 with the current sub-keyword S4 to form the new string 03+S4, and search the tree for a leaf whose keyword is 03+S4. In this embodiment the tree does not yet contain such a leaf, so step C7 is executed;
Step C7: insert 03+S4 into the tree, producing leaf 526; initialize every value in the user data part of leaf 526 to 0, and add 1 to the "subordinate count" field of the upper-level leaf 525. In this embodiment, the "subordinate count" of leaf 525 changes from 0 to 1 in this step;
Step C8: because all segments of Key3 have now been taken, no leaf number needs to be assigned to leaf 526. It is only necessary to write the data D3 of rule (Key3, D3) into the "other" field of leaf 526, thereby storing the content of the rule. This completes the insertion of rule (Key3, D3).
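Steps A to C can be replayed with a small Python sketch. It is a simplification made for illustration: a dictionary keyed by (upper-level leaf number, sub-keyword) stands in for the rule tree, lookups are exact, and backtrack pointers are left out. The function name `insert_rule` and the dictionary field names are hypothetical.

```python
from itertools import count


def insert_rule(tree, numbers, segments, data):
    """Insert one rule; `tree` maps (upper-level leaf number, sub-keyword) to a leaf record."""
    upper_no, upper = 0, None                    # first segment: no upper-level leaf, number 00
    for idx, seg in enumerate(segments):
        is_last = idx == len(segments) - 1
        leaf = tree.get((upper_no, seg))
        if leaf is None:                         # create the leaf with zero-initialised fields
            leaf = {"number": None, "cascade": False, "count": 0, "other": None}
            tree[(upper_no, seg)] = leaf
            if upper is not None:
                upper["count"] += 1              # one more subordinate leaf
        if is_last:
            leaf["other"] = data                 # the last leaf stores the rule data
        else:
            if leaf["number"] is None:
                leaf["number"] = next(numbers)   # unique leaf number
            leaf["cascade"] = True
            upper_no, upper = leaf["number"], leaf


tree, numbers = {}, count(1)
insert_rule(tree, numbers, ["S1", "S2", "S3"], "D1")   # step A
insert_rule(tree, numbers, ["S1", "S4"], "D2")         # step B
insert_rule(tree, numbers, ["S2", "S4"], "D3")         # step C
for key, leaf in tree.items():
    print(key, leaf)
```

Running the sketch yields six leaves: 00+S1 carries number 1 and a subordinate count of 2, 01+S2 carries number 2, 00+S2 carries number 3, and the three last leaves 02+S3, 01+S4 and 03+S4 hold D1, D2 and D3, which corresponds to leaves 521 to 526 in the walkthrough.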
The above is the storing process for rules (Key1, D1), (Key2, D2) and (Key3, D3). Because issues such as backtracking may arise during string matching, another embodiment is used below to describe how string matching rules are stored when such issues are taken into account.
Suppose a rule tree T is used to store string matching rules. The initial state of T may be empty, or T may already hold other string matching rules. In this embodiment, a rule (Key, Data) is to be stored by inserting it into T. The length of keyword Key is B bits, and Key is divided into sub-keyword segments in units of L-M bits, where L bits is the maximum length of "upper-level leaf number + current sub-keyword" in the leaf structure of Figure 4, and M is the length occupied by the "leaf number" in that structure. The purpose of segmenting in units of L-M bits is that the string formed by concatenating any segment of the keyword with a leaf number does not exceed the length L defined in the leaf structure of Figure 4. After segmentation, every segment of Key except the last has length L-M bits, and the last segment has length less than or equal to L-M bits. Let S denote the number of segments: if B is divisible by (L-M), then S = B/(L-M); otherwise S = [B/(L-M) + 1], where [ ] denotes taking the integer part of the enclosed value. The S segments of Key obtained in this way are denoted K_1, K_2, ..., K_(S-1), K_S.
Referring to Fig. 6, storing a string matching rule by inserting it into the rule tree comprises the following steps:
Step 600: set a loop variable i, which denotes the index of the segment of keyword Key currently being processed, and initialize i = 1;
Step 601: set the leaf number of the upper-level leaf of the first segment of the rule being inserted to 0, and set that upper-level leaf to empty. The purpose of this step is to guarantee that, whenever a new rule is inserted into rule tree T, the upper-level leaf of its first segment is empty and the upper-level leaf number is 0;
Step 602: concatenate the upper-level leaf number with sub-keyword K_i to form a new string Nk_i. When step 602 is executed for the first time, K_i is K_1 and the upper-level leaf number is 0; when the loop in the subsequent steps returns to step 602, the upper-level leaf number and K_i change accordingly;
Step 603: search the tree T constructed so far for the longest prefix match of Nk_i; if one is found, execute step 611 and the subsequent steps; otherwise execute step 604 and the subsequent steps;
Step 604~step 607: because Nk_i was not found in tree T, insert Nk_i into T, obtaining leaf Nf_i; assign this leaf a unique leaf number NL_i and mark leaf Nf_i as cascaded; then add 1 to the "subordinate count" field of the upper-level leaf. The addition is performed only when the upper-level leaf is not empty, and is skipped when the upper-level leaf is empty;
Step 608: take NL_i as the current upper-level leaf number and Nf_i as the current upper-level leaf;
Step 609~step 610: add 1 to the loop variable i, then determine whether the current value of i equals the segment count S of keyword Key. If not, the segments of Key have not all been processed; return to step 602 and continue with the remaining segments. Otherwise, the segment just processed is K_(S-1) and only the last segment K_S remains; execute step 621 and the subsequent steps;
Step 611: because the longest prefix match of Nk_i was found in tree T in step 603, the leaf corresponding to that longest prefix match is denoted leaf OF_i in this step;
Step 612: determine whether the keyword of leaf OF_i is identical to Nk_i. If so, the leaf required for the current sub-keyword K_i has already been constructed in tree T, and that leaf is OF_i; execute step 618 and the subsequent steps. If not, the leaf required for the current sub-keyword K_i has not yet been constructed in tree T; execute step 613 and the subsequent steps;
Step 613~step 615: insert Nk_i into tree T, obtaining leaf Nf_i with Nk_i as its keyword; assign this leaf a unique leaf number NL_i and set leaf Nf_i to "cascaded";
Step 616: point the backtrack pointer of leaf Nf_i to leaf OF_i. The purpose of this setting is to make it possible to find a possible longest prefix match by backtracking. The reason for setting the backtrack pointer in this way is as follows:
The keyword Nk_i of leaf Nf_i differs from the keyword of leaf OF_i, and the keyword of OF_i is the longest prefix match of Nk_i. The keyword of OF_i is therefore necessarily shorter than Nk_i, that is, necessarily shorter than L. Because its keyword is shorter than L, OF_i cannot cascade with any further subordinate leaf, so OF_i must be the last leaf of some rule. Given this property of OF_i and the fact that its keyword is the longest prefix match of Nk_i, the backtrack pointer set in step 616 can be used to find a possible longest prefix match, as follows:
Once a leaf at the next level below Nf_i, or at any deeper level, fails to match the string being matched, a backtracking search is performed. When the search backtracks to leaf Nf_i, the backtrack pointer of Nf_i leads to leaf OF_i. Because the keyword of OF_i is the longest prefix match of the keyword of Nf_i and is shorter than it, and because OF_i is necessarily the last leaf of some rule, OF_i is the longest prefix match result. For example, backtrack pointers are needed to achieve longest prefix matching in the following situation:
Suppose there are the following three rules:
Rule 1: A, B1;
Rule 2: A, B1B2, C, D;
Rule 3: A, B1B2, C, E, F;
where the lengths of B1 and B2 add up to L;
When rule 2 and rule 3 are inserted into the rule tree, the leaf that is the longest prefix match of the string 01+B1B2 is found in each case; this leaf is the one corresponding to sub-keyword B1 of rule 1. According to the above steps, the backtrack pointers of the leaves corresponding to B1B2 in rule 2 and rule 3 point to the leaf corresponding to sub-keyword B1 of rule 1. Suppose the keyword to be matched is A, B1B2, C, E, K. Matching passes through the four segments A, B1B2, C and E, and on reaching K no leaf matching K can be found in the rule tree. The backtrack pointer of the leaf of sub-keyword E is therefore checked and found to be empty. The backtrack pointer of the upper-level leaf, which corresponds to sub-keyword C, is then checked and also found to be empty. Going one level further up, the backtrack pointer of the leaf corresponding to sub-keyword B1B2 is checked and found to point to the leaf corresponding to sub-keyword B1. The longest prefix match result for the string A, B1B2, C, E, K is therefore A, B1. The use of backtrack pointers to achieve longest prefix matching is also described later, in the embodiment that performs string longest prefix matching with the constructed rule tree;
Step 617: add 1 to the "subordinate count" field of the upper-level leaf, then execute step 608 and continue with the other segments of keyword Key;
Step 618~step 619: determine from the cascade flag of leaf OF_i whether OF_i is cascaded. If so, OF_i has already been inserted into the tree as part of a rule and has therefore already been assigned a unique leaf number; execute step 620 directly. If not, OF_i is the last leaf of some rule and is not currently cascaded with any subordinate leaf, so it has not been assigned a unique leaf number; in that case assign OF_i a unique leaf number OL_i, set OF_i to cascaded, point the backtrack pointer of OF_i to itself, and then execute step 620. The reason for pointing the backtrack pointer of OF_i to itself in this step is:
Leaf OF_i is the last leaf of some rule, so pointing its backtrack pointer to itself allows this last leaf to be found through the backtrack pointer during string matching;
Step 620: take leaf OF_i as the upper-level leaf and the leaf number of OF_i as the upper-level leaf number, then execute step 609 to process the other segments of keyword Key;
Step 621: because step 610 determined that the current value of i equals the segment count S of keyword Key, this step processes the last segment K_S of Key: concatenate the current upper-level leaf number with K_S to form a new keyword K_e;
Step 622: determine whether a leaf whose keyword is the longest prefix match of K_e can be found in tree T; if so, execute step 626 and the subsequent steps; otherwise execute step 623 and the subsequent steps;
Step 623: because no leaf whose keyword is the longest prefix match of K_e was found in tree T, insert K_e into tree T, obtaining leaf Fk_e with K_e as its keyword;
Step 624~step 625: add 1 to the subordinate count of the current upper-level leaf, and write the data part Data of the rule into the "other" field of leaf Fk_e to store the content of the rule; this ends the insertion of rule (Key, Data);
Step 626: because the longest prefix match of K_e was found in tree T in step 622, the leaf corresponding to that longest prefix match is denoted leaf Nk_e in this step;
Step 627~step 628: determine whether the keyword of leaf Nk_e is the keyword K_e formed in step 621. If so, the leaf Nk_e for K_e has already been constructed in tree T; write the data part Data of the rule into the "other" field of leaf Nk_e to store the content of the rule, and end the insertion of rule (Key, Data). If not, no leaf for K_e has yet been constructed in tree T; execute step 623~step 625 to complete the insertion of rule (Key, Data).
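Steps 600 to 628 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the patented implementation: a dictionary keyed by (upper-level leaf number, sub-keyword) replaces the rule tree, segments are character strings whose non-final segments all have the same length, and the helper `lpm` emulates the longest-prefix lookup by trying successively shorter prefixes. All function and field names are hypothetical.

```python
from itertools import count


def lpm(tree, upper_no, seg):
    """Longest-prefix lookup under one upper-level number; returns (leaf, matched part)."""
    for k in range(len(seg), 0, -1):
        leaf = tree.get((upper_no, seg[:k]))
        if leaf is not None:
            return leaf, seg[:k]
    return None, None


def new_leaf():
    return {"number": None, "cascade": False, "count": 0, "back": None, "other": None}


def insert_rule(tree, numbers, segments, data):
    upper_no, upper = 0, None                        # step 601
    for seg in segments[:-1]:                        # steps 602-620: all but the last segment
        found, matched = lpm(tree, upper_no, seg)    # step 603
        if found is not None and matched == seg:     # step 612: the leaf already exists
            leaf = found
            if not leaf["cascade"]:                  # steps 618-619: it was only a last leaf
                leaf["number"] = next(numbers)
                leaf["cascade"] = True
                leaf["back"] = leaf                  # backtrack pointer to itself
        else:                                        # steps 604-607 and 613-617
            leaf = new_leaf()
            leaf["number"] = next(numbers)
            leaf["cascade"] = True
            leaf["back"] = found                     # shorter last leaf, or None (step 616)
            tree[(upper_no, seg)] = leaf
            if upper is not None:
                upper["count"] += 1
        upper_no, upper = leaf["number"], leaf       # steps 608 and 620
    seg = segments[-1]                               # steps 621-628: the last segment
    found, matched = lpm(tree, upper_no, seg)
    if found is not None and matched == seg:
        leaf = found
    else:
        leaf = new_leaf()
        tree[(upper_no, seg)] = leaf
        if upper is not None:
            upper["count"] += 1
    leaf["other"] = data


tree, numbers = {}, count(1)
insert_rule(tree, numbers, ["AAAA", "B1"], "rule 1")
insert_rule(tree, numbers, ["AAAA", "B1B2", "CCCC", "D"], "rule 2")
insert_rule(tree, numbers, ["AAAA", "B1B2", "CCCC", "EEEE", "F"], "rule 3")
print(tree[(1, "B1B2")]["back"] is tree[(1, "B1")])
```

In this sketch, as in the steps as written, the backtrack pointer of the B1B2 leaf is set only because the shorter B1 leaf already exists when rule 2 is inserted, so the order of insertion matters.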
Each rule in the rule set is inserted into the tree using the method of the above embodiment. The result is a tree that supports longest prefix matching for strings longer than L.
In the embodiment above, the leaf numbers assigned while inserting rule (Key, Data) are labelled with the loop variable i. When many rules need to be inserted into the rule tree, the uniqueness of the leaf numbers can be ensured as follows:
The leaf numbers are mapped one-to-one onto a series of bits. This mapping shows which numbers are in use and which are still free, so that the numbers can be allocated and reclaimed. For a rule set with at most R_m rules in which the longest keyword Key has length L_m, at most R_m * L_m / L numbers are needed.
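One way to realize the number-to-bit mapping described above is a simple bitmap allocator, sketched below in Python. The class name `LeafNumberPool`, its methods and the `capacity` parameter are assumptions for illustration; number 0 is kept reserved because the text uses it for "no upper-level leaf".

```python
class LeafNumberPool:
    """Tracks leaf numbers with one bit per number; a set bit means 'in use'."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.bits = 1                      # bit 0 is set: number 0 is never handed out

    def allocate(self) -> int:
        """Return the lowest free leaf number and mark it as used."""
        for n in range(1, self.capacity):
            if not (self.bits >> n) & 1:
                self.bits |= 1 << n
                return n
        raise RuntimeError("no free leaf number")

    def release(self, n: int) -> None:
        """Give a leaf number back, for example after its leaf has been deleted."""
        if n <= 0 or n >= self.capacity or not (self.bits >> n) & 1:
            raise ValueError(f"leaf number {n} is not allocated")
        self.bits &= ~(1 << n)


pool = LeafNumberPool(capacity=4)
first, second = pool.allocate(), pool.allocate()
pool.release(first)
print(first, second, pool.allocate())
```

With the bound given above, `capacity` would be chosen as roughly R_m * L_m / L plus the reserved number 0.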
Because matching rules may change, rules in an already constructed rule tree may need to be deleted accordingly. The present invention therefore provides a method for deleting a rule. The method uses exact matching to find the rule to be deleted in the rule tree and then deletes it. Referring to Figure 7, the specific steps are:
Step 701: divide the keyword of the rule to be deleted into sub-keyword segments in units of L-M, using the same segmentation method as when the string matching rule was stored in the rule tree;
Step 702: set the upper-level leaf of the first segment to empty and set the leaf number of this upper-level leaf to 0;
Step 703: starting from the first segment obtained in step 701 and continuing up to the second-to-last segment, look up the leaf corresponding to each segment in the rule tree and record each leaf node found in a predefined array, by performing the following steps 7031~7032 in order:
Step 7031: concatenate the current upper-level leaf number with the current sub-keyword to form a new string, and search the rule tree for the leaf whose keyword is the longest prefix match of this string; if it is found, execute step 7032; otherwise the deletion fails and the deletion process ends;
Step 7032: record the leaf found in step 7031 in the predefined array B, which stores the leaf nodes passed through; take the leaf number of this leaf as the upper-level leaf number and the segment following the one currently being processed as the current sub-keyword, and return to step 7031, until the second-to-last segment has been processed;
Step 704: find the leaf corresponding to the last segment in the rule tree and record the leaf node found in array B, specifically: concatenate the upper-level leaf number in front of the last segment to form a new string, and search the rule tree for a leaf whose keyword is identical to the new string; if it is found, record this leaf node in the array B of step 7032; otherwise the deletion fails;
Step 705: take the last leaf node recorded in array B and determine whether its "subordinate count" field is 0. If so, no other leaf is attached below this node; execute step 706. Otherwise, a leaf of another rule is still attached below this node; execute step 708;
Step 706: delete the leaf node currently being processed; then, using the records in array B, find the upper-level leaf node of this leaf node, take that upper-level leaf node as the leaf node currently being processed, subtract 1 from its "subordinate count" field, and execute step 707;
Step 707: determine whether the "subordinate count" field of the leaf node currently being processed is 0; if so, return to step 706, repeating until the "subordinate count" of the leaf node currently being processed is not 0; otherwise end the deletion process;
Step 708: delete only the data stored in the "other" field of this leaf and the backtrack pointer in this leaf that points to itself, keep the leaf node, and then end the deletion process.
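The deletion steps can be sketched in Python on the six-leaf tree of the Figure 5 example. The sketch makes several assumptions: a dictionary keyed by (upper-level leaf number, sub-keyword) replaces the rule tree, every segment is looked up exactly, and the names `leaf` and `delete_rule` are hypothetical. The check on `other` before removing an upper-level leaf is an addition of this sketch and is not part of the steps above; it keeps a leaf that still stores another rule's data.

```python
def leaf(number=None, count=0, other=None):
    """Build a leaf record; the field names are illustrative."""
    return {"number": number, "count": count, "other": other, "back": None}


def delete_rule(tree, segments):
    """Delete the rule with the given segmented keyword; returns True on success."""
    path, upper_no = [], 0                       # `path` plays the role of array B
    for idx, seg in enumerate(segments):         # steps 702-704
        key = (upper_no, seg)
        found = tree.get(key)
        if found is None:
            return False                         # rule is not in the tree
        path.append(key)
        if idx < len(segments) - 1:
            if found["number"] is None:
                return False                     # leaf does not cascade: rule absent
            upper_no = found["number"]
    last = tree[path[-1]]
    if last["count"] != 0:                       # step 708: other rules continue below
        last["other"] = None
        if last["back"] is last:
            last["back"] = None
        return True
    while True:
        del tree[path.pop()]                     # step 706: remove the current leaf
        if not path:
            return True
        upper = tree[path[-1]]
        upper["count"] -= 1
        if upper["count"] != 0 or upper["other"] is not None:
            return True                          # step 707, plus the added guard on `other`


tree = {
    (0, "S1"): leaf(number=1, count=2),
    (1, "S2"): leaf(number=2, count=1),
    (2, "S3"): leaf(other="D1"),
    (1, "S4"): leaf(other="D2"),
    (0, "S2"): leaf(number=3, count=1),
    (3, "S4"): leaf(other="D3"),
}
print(delete_rule(tree, ["S1", "S2", "S3"]), sorted(tree))
```

Deleting Key1 removes leaves 02+S3 and 01+S2 and lowers the subordinate count of 00+S1 from 2 to 1, so the leaf shared with Key2 survives.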
The present invention also provides a method for performing string longest prefix matching using the above string matching rule tree. The method divides the string to be matched into several sub-keyword segments, concatenates the corresponding leaf number in front of each segment to form a new string, and uses this new string to search for a match in the rule tree.
The method of the present invention for performing string longest prefix matching with the rule tree is described in detail below with reference to the accompanying drawings.
Referring to Fig. 8, performing string longest prefix matching with the rule tree comprises the following steps:
Step 801: divide the string to be matched into sub-keyword segments in units of L-M, using the same segmentation method as when the string matching rules were stored;
Step 802: set the upper-level leaf of the first segment to empty and set the leaf number of this upper-level leaf to 0;
Step 803~step 804: concatenate the current upper-level leaf number with the current sub-keyword to form a new string, and determine whether a leaf whose keyword is the longest prefix match of this string can be found in the rule tree. If so, execute step 805, continuing until all segments obtained in step 801 have been processed; otherwise execute step 808;
Step 805~step 807: determine whether the leaf found in step 804 is cascaded. If so, this leaf is not the last leaf node of a matching rule; record the leaf node in a predefined array A, which stores the leaf nodes passed through during matching, then take the number of the leaf found as the current upper-level leaf number and the segment following the current sub-keyword as the current sub-keyword, and return to step 803. Otherwise, the leaf found is the last leaf of the string matching rule that satisfies the longest prefix match requirement; take the data of this rule from the "other" field of the leaf and end the matching process;
Step 808~step 810: using the leaf nodes recorded in array A during the search, find the upper-level leaf node of the current leaf node and determine whether its "backtrack pointer" field is empty. If it is empty, return to step 808, repeating until the "backtrack pointer" field of an upper-level leaf node is not empty. If it is not empty, then by the property of the leaf pointed to by a backtrack pointer described above, the leaf pointed to by this "backtrack pointer" field is the longest prefix match of the string being matched; take the data of the rule from the "other" field of that leaf and end the search. If no leaf with a non-empty "backtrack pointer" field is found among all the leaf nodes passed through, no longest prefix match rule for the string to be matched exists in the rule tree; the longest prefix match fails and the search ends.
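The matching and backtracking steps can be sketched in Python on a hand-built tree for the three-rule example given earlier (rule 1: A, B1; rule 2: A, B1B2, C, D; rule 3: A, B1B2, C, E, F). The sketch assumes a dictionary keyed by (upper-level leaf number, sub-keyword) in place of the rule tree, character-string segments, and a longest-prefix lookup emulated by trying shorter and shorter prefixes. It also assumes that backtracking starts when the segments run out on a cascaded leaf, a case the steps do not spell out. The names `leaf`, `longest_prefix_leaf` and `match` are hypothetical.

```python
def leaf(number=None, other=None, back=None):
    """Build a leaf record; a leaf with a number is treated as cascaded."""
    return {"number": number, "cascade": number is not None, "other": other, "back": back}


def longest_prefix_leaf(tree, upper_no, seg):
    """Emulate the tree's longest-prefix lookup under one upper-level leaf number."""
    for k in range(len(seg), 0, -1):
        hit = tree.get((upper_no, seg[:k]))
        if hit is not None:
            return hit
    return None


def match(tree, segments):
    """Return the data of the longest-prefix-matching rule, or None if there is none."""
    path, upper_no = [], 0                                 # `path` plays the role of array A
    for seg in segments:
        hit = longest_prefix_leaf(tree, upper_no, seg)     # steps 803-804
        if hit is None:
            break                                          # continue with backtracking
        if not hit["cascade"]:
            return hit["other"]                            # last leaf of a rule (step 807)
        path.append(hit)
        upper_no = hit["number"]
    for visited in reversed(path):                         # steps 808-810
        if visited["back"] is not None:
            return visited["back"]["other"]
    return None


b1 = leaf(other="rule 1")
tree = {
    (0, "AAAA"): leaf(number=1),
    (1, "B1"): b1,
    (1, "B1B2"): leaf(number=2, back=b1),
    (2, "CCCC"): leaf(number=3),
    (3, "D"): leaf(other="rule 2"),
    (3, "EEEE"): leaf(number=4),
    (4, "F"): leaf(other="rule 3"),
}
print(match(tree, ["AAAA", "B1B2", "CCCC", "EEEE", "K"]))   # rule 1
```

The query fails at the fifth segment, the backtrack pointers of the EEEE and CCCC leaves are empty, and the pointer of the B1B2 leaf leads to the B1 leaf, so rule 1 is returned, as in the worked example.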
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (12)
1. A method for storing string matching rules, characterized in that the method comprises the following steps:
Step A: divide the keyword of the rule currently to be stored in the rule tree into S sub-keyword segments in units of L-M, wherein, among the S segments obtained, every segment except the S-th has length equal to L-M and the S-th segment has length less than or equal to L-M; L is the maximum length of a keyword in a rule stored in the rule tree, M is the length of the leaf number, and the segment count S is a natural number;
Step B: starting from the 1st segment and continuing up to the (S-1)-th segment, execute step B1 and step B2 in order, wherein the upper-level leaf number of the 1st segment is 0;
Step B1: concatenate the leaf number of the upper-level leaf in front of the current sub-keyword to form a new string, and determine whether the rule tree already contains a leaf with this new string as its keyword. If not, generate in the rule tree a leaf with this new string as its keyword, assign the newly generated leaf a unique leaf number, set the cascade flag of the newly generated leaf to cascaded, and then set the newly generated leaf as the current leaf. If so, and the cascade flag of the leaf found indicates cascaded, set the leaf found as the current leaf;
Step B2: take the segment following the sub-keyword processed in step B1 as the current sub-keyword, take the leaf number of the leaf corresponding to the sub-keyword processed in step B1 as the current upper-level leaf number, and return to step B1;
Step C: concatenate the leaf number of the upper-level leaf in front of the S-th segment to form a new string, and determine whether the rule tree already contains a leaf with this new string as its keyword. If not, generate in the rule tree a leaf with this new string as its keyword and then write the data content of the rule to be stored into this leaf; if so, write the data content of the rule to be stored directly into this leaf.
2. The method according to claim 1, characterized in that determining in step B1 whether the rule tree already contains a leaf with said string as its keyword comprises the following steps:
Step B11: determine whether the rule tree contains a leaf whose keyword is the longest prefix match of said string; if not, conclude that the rule tree contains no leaf with said string as its keyword; if so, execute step B12;
Step B12: determine whether the longest prefix match keyword found in step B11 is identical to said string; if not, conclude that the rule tree contains no leaf with said string as its keyword; if so, conclude that the rule tree contains a leaf with said string as its keyword.
3. The method according to claim 2, characterized in that, in step B12, if the longest prefix match keyword differs from said string, the method further comprises:
pointing the backtrack pointer of said newly generated leaf to the leaf corresponding to this longest prefix match keyword.
4. The method according to claim 1, characterized in that, in step B1, if the leaf found in the rule tree with said string as its keyword is not cascaded, setting the leaf found as the current leaf further comprises:
assigning the leaf found a unique leaf number, setting the leaf found to cascaded, and pointing the backtrack pointer of the leaf found to itself.
5. The method according to claim 1, characterized in that determining in step C whether the rule tree already contains a leaf with this string as its keyword comprises the following steps:
Step C1: determine whether the rule tree contains a leaf whose keyword is the longest prefix match of said string; if not, conclude that the rule tree contains no leaf with this string as its keyword; if so, execute step C2;
Step C2: determine whether the longest prefix match keyword found in step C1 is identical to said string; if not, conclude that the rule tree contains no leaf with this string as its keyword; if so, conclude that the rule tree contains a leaf with this string as its keyword.
6. The method according to claim 1, characterized in that writing the data of the rule into the leaf in step C is: writing the data of the rule into the "other" field of the leaf.
7. The method according to claim 1, characterized in that, in step B1 and in step C, generating in the rule tree a leaf with said string as its keyword further comprises: adding 1 to the value of the "subordinate count" field of the upper-level leaf of the newly generated leaf.
8. The method according to claim 7, characterized in that, when a rule in the rule tree is deleted, the method further comprises:
Step D1: divide the keyword of the rule to be deleted into T sub-keywords in units of L-M; wherein each sub-keyword other than the T-th has a length equal to L-M, the T-th sub-keyword has a length less than or equal to L-M, and the number of segments T is a natural number;
Step D2: starting from the 1st sub-keyword obtained in step D1 and continuing through the (T-1)-th sub-keyword, perform steps D21 to D22 in order, wherein the upper-level leaf number of the 1st sub-keyword is 0:
Step D21: prepend the leaf number of the upper-level leaf to the current sub-keyword to form a new string, and determine whether a leaf whose keyword is this string can be found in the rule tree; if so, execute step D22; otherwise, the deletion fails;
Step D22: record the leaf node found in step D21 in a predefined array, take the leaf number of this leaf as the upper-level leaf number, take the sub-keyword following the current sub-keyword as the current sub-keyword, and return to step D21;
Step D3: prepend the upper-level leaf number to the T-th sub-keyword to form a new string, and determine whether a leaf whose keyword is identical to this new string can be found in the rule tree; if so, record this leaf node in the array of step D22 and then execute step D4; otherwise, the deletion fails;
Step D4: according to the records in the array of step D22, starting from the last leaf node recorded for the rule to be deleted and continuing to the first leaf node of that rule, perform the following step in turn:
Determine whether the "subordinate count" field of the leaf currently being processed is 0; if so, delete that leaf, subtract 1 from the value of the "subordinate count" field of its upper-level leaf, then make that upper-level leaf the leaf currently being processed and repeat this step; otherwise, end the deletion.
9. The method according to claim 8, characterized in that, if the leaf currently being processed is said last leaf node and the "subordinate count" field of this leaf is not 0, the method further comprises:
Deleting the backtrack pointer in this leaf node that points to the leaf itself.
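The deletion procedure of claims 8 and 9 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the rule tree is a dict, the values `L = 6` and `M = 2` are arbitrary, leaf numbers are assumed to be zero-padded decimal strings of width M, and `Leaf`, `split`, and `delete_rule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

L, M = 6, 2  # assumed leaf-keyword width and leaf-number width


@dataclass(eq=False)
class Leaf:
    number: int
    subordinate_count: int = 0          # number of lower-level leaves
    backtrack: Optional["Leaf"] = None  # points to itself if a rule ends here
    other: object = None                # rule data


def split(keyword):
    """Step D1: cut the keyword into segments of length L - M."""
    step = L - M
    return [keyword[i:i + step] for i in range(0, len(keyword), step)]


def delete_rule(tree, keyword):
    """Return True if the rule was found and removed, False otherwise."""
    path, upper = [], 0
    for segment in split(keyword):          # steps D2 and D3
        key = f"{upper:0{M}d}{segment}"     # upper-level leaf number + segment
        leaf = tree.get(key)                # exact lookup
        if leaf is None:
            return False                    # the deletion fails
        path.append((key, leaf))            # the "predefined array"
        upper = leaf.number
    for i in range(len(path) - 1, -1, -1):  # step D4, last leaf first
        key, leaf = path[i]
        if leaf.subordinate_count != 0:
            if i == len(path) - 1:
                leaf.backtrack = None       # claim 9: leaf is still shared
            break
        del tree[key]
        if i > 0:
            path[i - 1][1].subordinate_count -= 1
    return bool(path)
```

The claim text does not say what should happen when an upper-level leaf whose count drops to 0 also ends a shorter rule; the sketch follows the claim literally and deletes that leaf.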
10. A method for performing string matching using a string-matching rule tree, characterized in that said string-matching rule tree is built by the method of claim 1, and the string-matching method comprises the following steps:
Step E: divide the string to be matched into P sub-keywords in units of L-M, wherein each sub-keyword other than the P-th has a length equal to L-M, the P-th sub-keyword has a length less than or equal to L-M, and the number of segments P is a natural number;
Step F: starting from the 1st sub-keyword and continuing through the P-th sub-keyword, perform steps F1 to F2 in order, wherein the upper-level leaf number of the 1st sub-keyword is 0:
Step F1: prepend the leaf number of the upper-level leaf to the current sub-keyword to form a new string, and determine whether the rule tree contains a leaf whose keyword is the longest prefix match of this string; if so, execute step F2; otherwise, the match fails;
Step F2: determine whether the leaf found in step F1 is cascaded; if so, take the leaf number of this leaf as the upper-level leaf number, take the sub-keyword following the current sub-keyword as the current sub-keyword, and return to step F1, until a leaf that is not cascaded is found; otherwise, this leaf is the last-level leaf of the matching result, the data of the rule is taken from this leaf, and the matching ends.
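Steps E to F2 can be sketched as below. This is an illustration under assumptions rather than the patented implementation: the rule tree is a dict, `L = 6` and `M = 2` are arbitrary, leaf numbers are assumed to be zero-padded decimal strings of width M, and the names `Leaf`, `split`, `longest_prefix_match`, and `match` are hypothetical.

```python
from dataclasses import dataclass

L, M = 6, 2  # assumed leaf-keyword width and leaf-number width


@dataclass(eq=False)
class Leaf:
    number: int
    cascaded: bool = False  # True if lower-level leaves hang off this leaf
    other: object = None    # rule data


def split(text):
    """Step E: cut the string into segments of length L - M."""
    step = L - M
    return [text[i:i + step] for i in range(0, len(text), step)]


def longest_prefix_match(tree, s):
    """Return the leaf whose keyword is the longest prefix of `s`, or None."""
    for end in range(len(s), 0, -1):
        leaf = tree.get(s[:end])
        if leaf is not None:
            return leaf
    return None


def match(tree, text):
    """Return the rule data of the matching leaf, or None if matching fails."""
    upper = 0                                   # upper-level leaf number
    for segment in split(text):                 # step F
        leaf = longest_prefix_match(tree, f"{upper:0{M}d}{segment}")  # step F1
        if leaf is None:
            return None                         # the match fails
        if not leaf.cascaded:                   # step F2
            return leaf.other
        upper = leaf.number
    # The claim is silent when segments run out on a cascaded leaf;
    # this sketch treats that case as a failed match.
    return None
```

Because every leaf keyword is at most L characters wide, each step is a fixed-width longest-prefix lookup, and a long string costs one lookup per segment.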
11. The method according to claim 10, characterized in that, in step F1, if the rule tree contains a leaf whose keyword is the longest prefix match of this string, the method further comprises:
Recording this leaf in an array, in order.
12. The method according to claim 11, characterized in that, in step F1, if the rule tree contains no leaf whose keyword is the longest prefix match of this string, the method further comprises:
Starting from the last leaf node recorded in said array, determine in order whether the backtrack pointer of each recorded leaf is empty; if it is empty, repeat this step for the preceding leaf, and if the backtrack pointers of all recorded leaves are found to be empty, the match fails; otherwise, the leaf pointed to by the backtrack pointer is the last-level leaf of the matching result, the data of the rule is taken from this leaf, and the matching ends.
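The backtracking of claims 11 and 12 can be sketched as an extension of the matching loop. This is a minimal sketch under assumptions: the rule tree is a dict, `L = 6` and `M = 2` are arbitrary, leaf numbers are zero-padded decimal strings of width M, and `Leaf`, `split`, `longest_prefix_match`, and `match_with_backtracking` are hypothetical names. Falling back to the array when the segments run out on a cascaded leaf is also an assumption, since the claims only describe the case where step F1 finds no leaf.

```python
from dataclasses import dataclass
from typing import Optional

L, M = 6, 2  # assumed leaf-keyword width and leaf-number width


@dataclass(eq=False)
class Leaf:
    number: int
    cascaded: bool = False
    other: object = None
    backtrack: Optional["Leaf"] = None  # set when a shorter rule ends here


def split(text):
    step = L - M
    return [text[i:i + step] for i in range(0, len(text), step)]


def longest_prefix_match(tree, s):
    for end in range(len(s), 0, -1):
        leaf = tree.get(s[:end])
        if leaf is not None:
            return leaf
    return None


def match_with_backtracking(tree, text):
    visited, upper = [], 0
    for segment in split(text):
        leaf = longest_prefix_match(tree, f"{upper:0{M}d}{segment}")
        if leaf is None:
            break                        # fall through to backtracking
        visited.append(leaf)             # claim 11: record the leaf in order
        if not leaf.cascaded:
            return leaf.other
        upper = leaf.number
    for leaf in reversed(visited):       # claim 12: last recorded leaf first
        if leaf.backtrack is not None:
            return leaf.backtrack.other  # a shorter rule still matches
    return None                          # every backtrack pointer is empty
```

The array lets a failed deep match fall back to the longest shorter rule along the path already walked, without restarting the search from the first segment.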
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100010993A CN100531179C (en) | 2004-02-03 | 2004-02-03 | Method for storing character string matching rule and character string matching by storing rule |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100010993A CN100531179C (en) | 2004-02-03 | 2004-02-03 | Method for storing character string matching rule and character string matching by storing rule |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1652534A (en) | 2005-08-10 |
CN100531179C (en) | 2009-08-19 |
Family
ID=34867019
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CNB2004100010993A | Expired - Fee Related | CN100531179C (en) | 2004-02-03 | 2004-02-03 | Method for storing character string matching rule and character string matching by storing rule |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100531179C (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101588291B (en) * | 2008-05-22 | 2013-01-09 | 原创信通电信技术(北京)有限公司 | Method for determining packet transmission route in IP telecommunication network system |
CN101771675B (en) * | 2008-12-31 | 2013-06-05 | 深圳市广道高新技术有限公司 | Method and device for implementing feature matching of data packet |
CN103902554B (en) * | 2012-12-25 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Data access method and device |
- 2004-02-03: CN application CNB2004100010993A, patent CN100531179C (en), not active: Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
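An efficient routing table lookup scheme using compressed-table technology. Li Yuntao, Guo Yunfei. Journal of Information Engineering University, Vol. 2, No. 1. 2001 |
An efficient routing table lookup scheme using compressed-table technology. Li Yuntao, Guo Yunfei. Journal of Information Engineering University, Vol. 2, No. 1. 2001 * |
Fast route lookup algorithm and its implementation. Xu Yufeng, Li Lemin. Communications Technology, No. 7. 2001 |
Fast route lookup algorithm and its implementation. Xu Yufeng, Li Lemin. Communications Technology, No. 7. 2001 * |
Research and implementation of a fast route lookup algorithm in a high-performance secure router. Wu Jian, Chen Xiuhuan, Xu Mingwei, Xu Ke. Acta Electronica Sinica, Vol. 28, No. 11A. 2000 |
Research and implementation of a fast route lookup algorithm in a high-performance secure router. Wu Jian, Chen Xiuhuan, Xu Mingwei, Xu Ke. Acta Electronica Sinica, Vol. 28, No. 11A. 2000 * |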
Also Published As
Publication number | Publication date |
---|---|
CN1652534A (en) | 2005-08-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6047283A (en) | Fast string searching and indexing using a search tree having a plurality of linked nodes | |
US5758353A (en) | Storage and retrieval of ordered sets of keys in a compact 0-complete tree | |
CN1552032B (en) | Database | |
US6564211B1 (en) | Fast flexible search engine for longest prefix match | |
EP2040184B1 (en) | Database and database processing methods | |
Krishnan et al. | Estimating alphanumeric selectivity in the presence of wildcards | |
CN111801665B (en) | Hierarchical Locality Sensitive Hash (LSH) partition index for big data applications | |
US20070100873A1 (en) | Information retrieving system | |
US6427147B1 (en) | Deletion of ordered sets of keys in a compact O-complete tree | |
KR20010083096A (en) | Value-instance-connectivity computer-implemented database | |
CN101430708A (en) | Blog hierarchy classification tree construction method based on label clustering | |
CN108509505B (en) | Character string retrieval method and device based on partition double-array Trie | |
US7096235B2 (en) | Computer implemented compact 0-complete tree dynamic storage structure and method of processing stored data | |
CN101345707A (en) | Method and apparatus for implementing IPv6 packet classification | |
US6735600B1 (en) | Editing protocol for flexible search engines | |
US20230195769A1 (en) | Computer system and method for indexing and retrieval of partially specified type-less semi-infinite information | |
KR100999408B1 (en) | Method for searching an ??? using hash tree | |
CN100531179C (en) | Method for storing character string matching rule and character string matching by storing rule | |
CN110457531A (en) | A kind of parallel by character string querying method based on OpenMP | |
Fujino et al. | Discovering unordered and ordered phrase association patterns for text mining | |
Yazdani et al. | Prefix trees: new efficient data structures for matching strings of different lengths | |
Hlybovets et al. | Constructing Generalized Suffix Trees on Distributed Parallel Platforms | |
Ostadzadeh et al. | Massive concurrent deletion of keys in B*-tree | |
JP2024068905A (en) | Index Management Device | |
JP2021114037A (en) | Index management device | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2009-08-19; termination date: 2017-02-03 |