CN103186640A - AC algorithm based regular matching flow filtering method and device - Google Patents

AC algorithm based regular matching flow filtering method and device

Info

Publication number
CN103186640A
Authority
CN
China
Prior art keywords
regular pattern
string
substring
matching
pattern string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104603659A
Other languages
Chinese (zh)
Other versions
CN103186640B (en
Inventor
李鹏程
刘涛
刘宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110460365.9A
Publication of CN103186640A
Application granted
Publication of CN103186640B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a traffic filtering method and device using regular matching based on the AC (Aho-Corasick) algorithm. The method comprises the following steps: initializing an AC state machine table; adding a regular pattern string to the AC state machine table and building the AC state machine; matching an input main string against the regular pattern string; and, if the input main string matches the regular pattern string, filtering the input main string. The method of the embodiments supports multi-pattern matching of strings written in regular grammar, which avoids repeated allocation of target strings during traffic detection and filtering, saves memory, and improves traffic filtering efficiency.

Description

Traffic filtering method and device using regular matching based on the AC algorithm
Technical field
The present invention relates to the field of Internet technology, and in particular to a traffic filtering method and device using regular matching based on the AC algorithm.
Background art
The prior-art AC algorithm (Aho-Corasick, an automaton-based matching algorithm) first establishes three functions: the goto function, the failure function and the output function. From these three functions and a set of pattern strings it builds a deterministic, tree-shaped finite state machine. The main string to be matched is fed to this finite state machine as input, the machine changes state according to the main string, and a pattern match is reported whenever the machine reaches certain specific states. The goto function g represents a transition between states: for example, g(pre, x) = next means that state pre moves to state next after the character x is input; if the pattern strings contain no such transition, next = failure, i.e. the transition fails. The failure function f also represents a transition between states: for example, f(pre) = next is the transition used when a comparison mismatches. The output function output represents the relation between states and pattern strings: for example, output(i) = {P} means that when the state machine reaches state i, all pattern strings in the set {P} may have completed a match. The advantages of the AC algorithm are that its time complexity is O(n) and that it is simple, efficient and highly general, so it is widely used.
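The three functions described above can be sketched as three tables. The following is a minimal Python sketch of a textbook Aho-Corasick automaton, not the patented implementation; the names `build_automaton` and `search` and the list-of-dicts table layout are assumptions made for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, fail and output tables for a set of literal patterns."""
    goto = [{}]        # goto[state][char] -> next state (state 0 is the root)
    output = [set()]   # output[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Breadth-first pass: compute the failure link of every state.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # A state also reports every pattern reported by its failure state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def search(text, tables):
    """Return (start, end, pattern) for each occurrence, 0-based and inclusive."""
    goto, fail, output = tables
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]            # mismatch: follow the failure link
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, i, pattern))
    return hits
```

Each input character is consumed once and failure links only move toward the root, which is where the linear scan time comes from.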
However, the prior-art AC algorithm has a defect: it supports only exact string matching and does not support matching of multi-pattern target strings that carry regular-expression features. During traffic detection and filtering, this causes large numbers of similar target strings to be allocated repeatedly, which consumes a large amount of memory, reduces working efficiency, and may even crash the system.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, one object of the invention is to propose a traffic filtering method using regular matching based on the AC algorithm that supports multi-pattern matching of strings written in regular grammar, so as to avoid repeated allocation of target strings and save memory.
Another object of the invention is to propose a traffic filtering device using regular matching based on the AC algorithm.
To achieve the above objects, the traffic filtering method using regular matching based on the AC algorithm according to an embodiment of the first aspect of the invention comprises the following steps: initializing an AC state machine table; adding a regular pattern string to the AC state machine table and building the AC state machine; matching an input main string against the regular pattern string; and, if the input main string matches the regular pattern string, filtering the input main string.
The traffic filtering method according to the embodiment of the invention supports multi-pattern matching of strings written in regular grammar and avoids repeated allocation of target strings during traffic detection and filtering, which saves memory and improves traffic filtering efficiency.
To achieve the above objects, the traffic filtering device using regular matching based on the AC algorithm according to an embodiment of the second aspect of the invention comprises: an initialization module for initializing an AC state machine table; a creation module for adding a regular pattern string to the AC state machine table and building the AC state machine; a matching module for matching an input main string against the regular pattern string; and a filtering module for filtering the input main string if it matches the regular pattern string.
The traffic filtering device according to the embodiment of the invention supports multi-pattern matching of strings written in regular grammar and avoids repeated allocation of target strings during traffic detection and filtering, which saves memory; using the device improves the working efficiency of traffic filtering.
Additional aspects and advantages of the invention are given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments taken together with the drawings, in which:
Fig. 1 is a flowchart of a traffic filtering method using regular matching based on the AC algorithm according to an embodiment of the invention;
Fig. 2 is a flowchart of a traffic filtering method using regular matching based on the AC algorithm according to an embodiment of the invention;
Fig. 3 is a flowchart of a traffic filtering method using regular matching based on the AC algorithm according to an embodiment of the invention;
Fig. 4 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to an embodiment of the invention;
Fig. 5 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to an embodiment of the invention;
Fig. 6 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to an embodiment of the invention; and
Fig. 7 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to an embodiment of the invention.
Embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it. On the contrary, the embodiments of the invention include all changes, modifications and equivalents that fall within the spirit and scope of the appended claims.
In the description of the invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, the terms "linked" and "connected" are to be understood broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the specific circumstances. In addition, unless stated otherwise, "a plurality of" means two or more.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, fragment or portion of code that comprises one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the invention includes other implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention belong.
The traffic filtering method using regular matching based on the AC algorithm according to the embodiments of the invention is described below with reference to the drawings.
A traffic filtering method using regular matching based on the AC algorithm comprises the following steps: initializing an AC state machine table; adding a regular pattern string to the AC state machine table and building the AC state machine; matching an input main string against the regular pattern string; and, if the input main string matches the regular pattern string, filtering the input main string.
Fig. 1 is a flowchart of the traffic filtering method using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 1, the traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention comprises the following steps.
Step S101: initialize the AC state machine table.
Specifically, the AC state machine table comprises a goto table (goto), a failure table (fail) and an output table (output), which correspond respectively to the three functions of the AC algorithm: the goto function, the failure function and the output function. In one embodiment of the invention, the initialization process also includes allocating a memory pool for the AC state machine.
Step S102: add the regular pattern string to the AC state machine table and build the AC state machine.
Specifically, the regular pattern string is first added to the AC state machine table. A non-deterministic state machine is then built from the regular pattern string, which completes the goto table; it is then converted into a deterministic state machine, which completes the failure table; this finishes the construction of the AC state machine.
The regular pattern string may be a multi-pattern target string containing regular grammar. In one embodiment of the invention, the regular pattern string contains the wildcard symbol (.*), which denotes any character occurring zero or more times, for example ab(.*)cd or abc(.*)bcd. In another embodiment of the invention, the regular pattern string may also contain the start symbol ^ and the end symbol $, for example ^ab$.
Step S103: match the input main string against the regular pattern string.
Specifically, after the input main string has been matched against the regular pattern string and the match result obtained, the AC state machine is released and the allocated memory pool is reclaimed to free memory. This saves memory.
Step S104: if the input main string matches the regular pattern string, filter the input main string.
The traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention supports multi-pattern matching of strings written in regular grammar and avoids repeated allocation of target strings during traffic detection and filtering, which saves memory and improves traffic filtering efficiency.
Fig. 2 is a flowchart of the traffic filtering method using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 2, the traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention comprises the following steps.
Step S201: initialize the AC state machine table.
Specifically, the AC state machine table comprises a goto table (goto), a failure table (fail) and an output table (output), which correspond respectively to the three functions of the AC algorithm: the goto function, the failure function and the output function. In one embodiment of the invention, the initialization process also includes allocating a memory pool for the AC state machine.
Step S202: add the regular pattern string to the AC state machine table and build the AC state machine.
Specifically, the regular pattern string is first added to the AC state machine table. A non-deterministic state machine is then built from the regular pattern string, which completes the goto table; it is then converted into a deterministic state machine, which completes the failure table; this finishes the construction of the AC state machine.
The regular pattern string may be a multi-pattern target string containing regular grammar. In one embodiment of the invention, the regular pattern string contains the wildcard symbol (.*), which denotes any character occurring zero or more times, for example ab(.*)cd or abc(.*)bcd. In another embodiment of the invention, the regular pattern string may also contain the start symbol ^ and the end symbol $, for example ^ab$.
Step S203: split the regular pattern string at the wildcard symbol to generate the substrings corresponding to the regular pattern string.
For example, given the regular pattern string ab(.*)cd, the wildcard symbol (.*) splits it into the two substrings ab and cd. Likewise, given the regular pattern string abc(.*)bcd, the wildcard symbol (.*) splits it into the two substrings abc and bcd.
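The split in step S203 can be illustrated with a short Python sketch. The helper name `split_pattern` and the decision to drop empty pieces are assumptions for illustration, not details given in the text.

```python
WILDCARD = "(.*)"


def split_pattern(pattern):
    """Split a regular pattern string into its literal substrings, in order."""
    # Empty pieces (e.g. from a leading or trailing wildcard) are dropped.
    return [part for part in pattern.split(WILDCARD) if part]


def numbered_substrings(pattern):
    """Pair each substring with its 1-based order of appearance."""
    return list(enumerate(split_pattern(pattern), start=1))
```

For the two examples above, `split_pattern("ab(.*)cd")` yields `["ab", "cd"]` and `split_pattern("abc(.*)bcd")` yields `["abc", "bcd"]`.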
Step S204: match each of the substrings against the input main string to generate match information.
In one embodiment of the invention, the match information comprises the regular pattern string ID, the substring ID, the match start position and the match end position. The substring ID encodes the regular pattern string ID, the order in which the substring appears in the regular pattern string, and the number of substrings contained in the regular pattern string.
In one embodiment of the invention, a regular pattern string that contains the start symbol ^ and the end symbol $, for example ^ab$, need not be split into substrings. It is enough to check directly whether the match start position and the match end position are the beginning and the end of the main string: if so, the strings match; otherwise they do not.
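The anchored case can be sketched as a direct position check. This is a minimal Python illustration; the function name `match_anchored` is hypothetical, and `str.find` stands in for the AC state machine lookup that would supply the match positions in the described method.

```python
def match_anchored(pattern, main_string):
    """Check a pattern of the form ^literal$ against the main string."""
    if not (pattern.startswith("^") and pattern.endswith("$")):
        raise ValueError("expected a pattern of the form ^literal$")
    literal = pattern[1:-1]
    start = main_string.find(literal)       # stand-in for the AC lookup
    if start == -1:
        return False
    end = start + len(literal) - 1
    # Match only if the occurrence spans the whole main string.
    return start == 0 and end == len(main_string) - 1
```

With this check, ^ab$ matches the main string ab but not abab or xab.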
Specifically, each substring produced by the split is matched against the input main string. For example, the two substrings ab and cd obtained from the regular pattern string ab(.*)cd are both matched against an input main string abc. The match information is represented by a pattern ID, one per substring. Suppose the pattern ID of substring ab is 0x21FF and the pattern ID of substring cd is 0x22FF. The low 16 bits of every substring's pattern ID are required to be identical, and these identical low 16 bits represent the pattern string ID, for example 0x00FF. Bits 16 to 23 of each substring's pattern ID represent the order in which the substring appears in the pattern string, and also serve as the substring ID: for example, in the regular pattern string ab(.*)cd, substring ab has order 1 and appears first, and substring cd has order 2 and appears second, so the ID of substring ab is 1 and the ID of substring cd is 2. Bits 24 to 31 of each substring's pattern ID represent the number of substrings into which the regular pattern string is split; for example, ab(.*)cd is split into 2 substrings.
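The bit layout can be made concrete with a small packing sketch in Python. The helper names `pack_id` and `unpack_id` are hypothetical. Packing with the stated bit ranges gives 0x020100FF for ab and 0x020200FF for cd when the pattern string ID is 0x00FF; the shorter values 0x21FF and 0x22FF quoted in the text do not follow from these ranges, so the sketch follows the bit ranges.

```python
def pack_id(pattern_id, order, count):
    """Pack the three fields into one 32-bit substring pattern ID.

    bits 0-15  : regular pattern string ID
    bits 16-23 : order of the substring within the pattern string
    bits 24-31 : number of substrings the pattern string is split into
    """
    return ((count & 0xFF) << 24) | ((order & 0xFF) << 16) | (pattern_id & 0xFFFF)


def unpack_id(sub_id):
    """Return (pattern_id, order, count) from a packed ID."""
    return sub_id & 0xFFFF, (sub_id >> 16) & 0xFF, (sub_id >> 24) & 0xFF
```

Keeping all three fields in one integer means a single output-table entry is enough to tell which pattern a matched substring belongs to, where it sits in that pattern, and how many siblings it has.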
Step S205: judge from the match information whether the input main string matches the regular pattern string.
In one embodiment of the invention, before step S205 the matched substrings are also deduplicated according to the order-of-appearance field in the substring ID. For example, when matching reaches the end of the input main string, bits 16 to 23 of all IDs are compared, and substrings whose bits 16 to 23 are equal are deduplicated. After deduplication, bits 16 to 23 are compared with bits 24 to 31: if the total number of substrings remaining after deduplication equals the value of bits 24 to 31, the regular pattern string matches. Deduplication improves matching accuracy.
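The deduplicate-then-count check can be sketched as follows in Python. The function name `matched_patterns` and the grouping by pattern string ID are assumptions; the bit ranges are the ones stated in the text.

```python
from collections import defaultdict


def matched_patterns(matched_ids):
    """Return the pattern string IDs whose substrings were all seen.

    matched_ids is an iterable of packed 32-bit substring pattern IDs
    collected while scanning the main string (duplicates allowed).
    """
    seen_orders = defaultdict(set)   # pattern ID -> distinct order values
    expected = {}                    # pattern ID -> substring count
    for sub_id in matched_ids:
        pattern_id = sub_id & 0xFFFF
        seen_orders[pattern_id].add((sub_id >> 16) & 0xFF)   # set removes repeats
        expected[pattern_id] = (sub_id >> 24) & 0xFF
    return {pid for pid, orders in seen_orders.items()
            if len(orders) == expected[pid]}
```

Without the deduplication, a main string in which ab occurs twice and cd never occurs would reach a count of 2 and be reported as a match for ab(.*)cd.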
In one embodiment of the invention, after the input main string has been matched against the regular pattern string and the match result obtained, the AC state machine is released and the allocated memory pool is reclaimed to free memory. This saves further memory.
Step S206: if the input main string matches the regular pattern string, filter the input main string.
It should be understood that when the substrings are matched against the input main string in a given order or position, matching proceeds in the manner expressed by the regular expression. In the above example, substrings ab and cd are searched for as two separate strings, but the search differs from two completely independent string searches: ab and cd belong to the same regular pattern string ID, differ in the order in which they appear in the regular pattern string ab(.*)cd, and are correlated with each other.
The traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention can perform multi-pattern matching on regular pattern strings written with the wildcard symbol, and improves matching accuracy.
Fig. 3 is a flowchart of the traffic filtering method using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 3, the traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention comprises the following steps.
Step S301: initialize the AC state machine table.
Specifically, the initialized AC state machine table comprises a goto table (goto), a failure table (fail) and an output table (output), which correspond respectively to the three functions of the AC algorithm: the goto function, the failure function and the output function. In one embodiment of the invention, the initialization process also includes allocating memory for the AC state machine.
Step S302: add the regular pattern string to the AC state machine table and build the AC state machine.
Specifically, the regular pattern string is first added to the AC state machine table. A non-deterministic state machine is then built from the regular pattern string, which completes the goto table; it is then converted into a deterministic state machine, which completes the failure table; this finishes the construction of the AC state machine.
The regular pattern string may be a multi-pattern target string containing regular grammar. In one embodiment of the invention, the regular pattern string contains the wildcard symbol (.*), which denotes any character occurring zero or more times, for example ab(.*)cd or abc(.*)bcd. In another embodiment of the invention, the regular pattern string may also contain the start symbol ^ and the end symbol $, for example ^ab$.
Step S303: split the regular pattern string at the wildcard symbol to generate the substrings corresponding to the regular pattern string.
For example, given the regular pattern string ab(.*)cd, the wildcard symbol (.*) splits it into the two substrings ab and cd. Likewise, given the regular pattern string abc(.*)bcd, the wildcard symbol (.*) splits it into the two substrings abc and bcd.
Step S304: match each of the substrings against the input main string to generate match information.
In one embodiment of the invention, the match information comprises the regular pattern string ID, the substring ID, the match start position and the match end position. The substring ID encodes the regular pattern string ID, the order in which the substring appears in the regular pattern string, and the number of substrings contained in the regular pattern string.
In one embodiment of the invention, a regular pattern string that contains the start symbol ^ and the end symbol $, for example ^ab$, need not be split into substrings. It is enough to check directly whether the match start position and the match end position are the beginning and the end of the main string: if so, the strings match; otherwise they do not.
Specifically, each substring produced by the split is matched against the input main string. For example, the two substrings ab and cd obtained from the regular pattern string ab(.*)cd are both matched against an input main string abc. The match information is represented by a pattern ID, one per substring. Suppose the pattern ID of substring ab is 0x21FF and the pattern ID of substring cd is 0x22FF. The low 16 bits of every substring's pattern ID are required to be identical, and these identical low 16 bits represent the pattern string ID, for example 0x00FF. Bits 16 to 23 of each substring's pattern ID represent the order in which the substring appears in the pattern string, and also serve as the substring ID: for example, in the regular pattern string ab(.*)cd, substring ab has order 1 and appears first, and substring cd has order 2 and appears second, so the ID of substring ab is 1 and the ID of substring cd is 2. Bits 24 to 31 of each substring's pattern ID represent the number of substrings into which the regular pattern string is split; for example, ab(.*)cd is split into 2 substrings.
Step S305: record the matched substrings and their match information.
Specifically, the matched substrings and their match information are recorded while each split substring is being matched against the input main string.
In one embodiment of the invention, before the matched substrings and their match information are recorded, the matched substrings are also deduplicated according to the order-of-appearance field in the substring ID. For example, when matching reaches the end of the input main string, bits 16 to 23 of all IDs are compared, and substrings whose bits 16 to 23 are equal are deduplicated. Deduplication improves matching accuracy.
Step S306: sort the matched substrings according to the match information.
For example, the matched substrings can be sorted by pattern string ID, substring ID, match start position and match end position.
Step S307: judge whether the input main string contains all substrings of the regular pattern string.
Step S308: if it contains all substrings of the regular pattern string, further judge, for each pair of adjacent substrings after sorting, whether the match start position of the later substring is greater than the match end position of the earlier substring.
If the input main string does not contain all substrings of the regular pattern string, the input main string is judged not to match the regular pattern string, and the method proceeds to step S311.
This further improves matching accuracy. For example, suppose the regular pattern string abc(.*)bcd is matched against the input main string abcd. Without this step the result would be a match, although the strings do not actually match: the input main string abcd matches substring abc and substring bcd individually, abc first and bcd second, but the two matches overlap. Using the match start and end positions gives the correct result. When substring abc matches, its match start and end positions are recorded as [1, 3]; when substring bcd matches, its match start and end positions are recorded as [2, 4]. The match start position of the later substring bcd is then less than the match end position of the earlier substring abc, so the match fails.
Step S309: if the start position is judged to be greater, the input main string is judged to match the regular pattern string.
Specifically, if for each pair of adjacent substrings after sorting the match start position of the later substring is greater than the match end position of the earlier substring, the input main string is judged to match the regular pattern string.
If, for a pair of adjacent substrings after sorting, the match start position of the later substring is not greater than the match end position of the earlier substring, the input main string is judged not to match the regular pattern string, and the method proceeds to step S311.
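Steps S306 to S309 can be sketched in Python as a sort followed by an adjacency check. The function name `sequence_matches` and the tuple layout `(order, start, end)` are assumptions for illustration; positions are 1-based and inclusive as in the example above, and the records are assumed to be for a single regular pattern string after deduplication.

```python
def sequence_matches(records, substring_count):
    """Decide whether deduplicated match records satisfy the pattern.

    records: list of (order, start, end) tuples, one per matched substring.
    substring_count: number of substrings the pattern was split into.
    """
    ordered = sorted(records)                        # by order, then positions
    # S307: every substring 1..substring_count must be present exactly once.
    if [rec[0] for rec in ordered] != list(range(1, substring_count + 1)):
        return False
    # S308/S309: each later substring must start after the earlier one ends.
    return all(later[1] > earlier[2]
               for earlier, later in zip(ordered, ordered[1:]))
```

For abc(.*)bcd against abcd the records are `(1, 1, 3)` and `(2, 2, 4)`, and the check fails because 2 is not greater than 3; for ab(.*)cd against abxcd the records `(1, 1, 2)` and `(2, 4, 5)` pass.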
Step S310: filter the input main string.
Specifically, according to the judgment of step S309, if the input main string matches the regular pattern string, the input main string is filtered.
Step S311: the input main string does not match the regular pattern string.
Specifically, when the input main string does not match the regular pattern string, the input main string is not filtered. The AC state machine is then released and the allocated memory pool is reclaimed to free memory.
It should be understood that when the substrings are matched against the input main string in a given order or position, matching proceeds in the manner expressed by the regular expression. In the above example, substrings ab and cd are searched for as two separate strings, but the search differs from two completely independent string searches: ab and cd belong to the same regular pattern string ID, differ in the order in which they appear in the regular pattern string ab(.*)cd, and are correlated with each other.
The traffic filtering method using regular matching based on the AC algorithm according to the embodiment of the invention further improves matching accuracy.
The traffic filtering device that mates based on the canonical of AC algorithm according to the employing of the embodiment of the invention is described below with reference to Figure of description.
A kind of traffic filtering device that adopts based on the canonical coupling of AC algorithm, comprising: initialization module is used for initialization AC state machine table.Creation module is used for the regular pattern string is added to AC state machine table, and sets up the AC state machine.Matching module is used for capital characters string and the regular pattern string of input are mated.Filtering module, if the capital characters string of input and regular pattern string coupling, filtering module is used for the capital characters string of input is filtered.
Fig. 4 is that the employing of one embodiment of the invention is based on the structured flowchart of the traffic filtering device of the canonical coupling of AC algorithm.
As shown in Figure 4, the traffic filtering device according to the employing of the embodiment of the invention is mated based on the canonical of AC algorithm comprises initialization module 100, creation module 200, matching module 300 and filtering module 400.
Particularly, initialization module 100 is used for initialization AC state machine table.Wherein, AC state machine table comprises GO TO table (goto), lost efficacy table (fail) and output table (output), and three of corresponding A C algorithm functions namely turn to function, inefficacy function and output function respectively.In one embodiment of the invention, initialization procedure also comprises the distribution of the memory pool of AC state machine.
Creation module 200 is used for the regular pattern string is added to AC state machine table, and sets up the AC state machine.Particularly, at first the regular pattern string is joined in the AC state machine table, then, set up the foundation that the uncertainty state machine is namely finished GO TO table according to the regular pattern string, be converted to the determinacy state machine again and namely finish the foundation of inefficacy table, finish the foundation of AC state machine at last.
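The construction described above can be sketched as follows. This is a minimal illustration of the textbook Aho-Corasick build (goto table first, failure table second, output table merged along failure links), not the patented implementation; the function names `build_ac` and `search` and the list-of-dicts table layout are assumptions made for the example.

```python
from collections import deque


def build_ac(patterns):
    """Build goto, fail and output tables for a list of literal patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> fallback state
    output = [[]]  # output[state] -> indices of patterns ending at this state

    # Phase 1: goto table, i.e. a trie over all patterns.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(idx)

    # Phase 2: failure table, filled breadth-first from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Patterns ending at the fallback state also end here.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, tables, patterns):
    """Return (pattern index, start, end) per occurrence, 0-based inclusive."""
    goto, fail, output = tables
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in output[state]:
            hits.append((idx, pos - len(patterns[idx]) + 1, pos))
    return hits
```

A single pass over the input reports every occurrence of every pattern, so the substrings of many regular pattern strings can share one state machine.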
The regular pattern string may be a multi-pattern target string that contains regular-expression syntax. In one embodiment of the invention, the regular pattern string contains the wildcard symbol (.*), which denotes zero or more occurrences of any character, for example ab(.*)cd or abc(.*)bcd. In another embodiment of the invention, the regular pattern string may also contain the start symbol ^ and the end symbol $, for example ^ab$.
The matching module 300 matches the input main string against the regular pattern string. In one embodiment of the invention, after the matching module 300 has matched the input main string against the regular pattern string and obtained the matching result, the AC state machine is released and the allocated memory pool is reclaimed, which frees memory space.
If the matching result of the matching module 300 shows that the input main string matches the regular pattern string, the filtering module 400 filters the input main string.
The traffic filtering device using regular matching based on the AC algorithm according to this embodiment supports multi-pattern matching of strings written in regular-expression syntax and avoids repeated allocation of target strings during traffic detection and filtering. This saves memory space and improves the efficiency of traffic filtering.
Fig. 5 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 5, the traffic filtering device using regular matching based on the AC algorithm according to this embodiment comprises an initialization module 100, a creation module 200, a substring division unit 310, a match information generation unit 320, a matching unit 330 and a filtering module 400.
In one embodiment of the invention, the matching module 300 comprises the substring division unit 310, the match information generation unit 320 and the matching unit 330.
Specifically, the substring division unit 310 divides the regular pattern string at the wildcard symbol to generate the substrings that correspond to the regular pattern string. For example, the regular pattern string ab(.*)cd is divided at the wildcard symbol (.*) into the two substrings ab and cd. Likewise, the regular pattern string abc(.*)bcd is divided at the wildcard symbol (.*) into the two substrings abc and bcd.
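The division step can be illustrated with a short sketch. The helper name `split_pattern` is hypothetical, and the sketch assumes the wildcard always appears as the literal four-character token `(.*)`; it returns each substring together with its 1-based order of occurrence.

```python
def split_pattern(pattern, wildcard="(.*)"):
    """Split a regular pattern string at the wildcard token.

    Returns a list of (order, substring) pairs, where order starts at 1.
    Empty pieces (for example from a leading wildcard) are dropped.
    """
    parts = [piece for piece in pattern.split(wildcard) if piece]
    return list(enumerate(parts, start=1))


print(split_pattern("ab(.*)cd"))    # [(1, 'ab'), (2, 'cd')]
print(split_pattern("abc(.*)bcd"))  # [(1, 'abc'), (2, 'bcd')]
```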
The match information generation unit 320 matches each of the substrings against the input main string to generate match information.
In one embodiment of the invention, the match information comprises a regular pattern string ID, a substring ID, a match start position and a match end position. The substring ID encodes the regular pattern string ID, the order in which the substring occurs in the regular pattern string, and the number of substrings in the regular pattern string.
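As an illustration of what such match information might look like, the sketch below produces one record per occurrence of each substring. It uses plain `str.find` scanning as a stand-in for the AC state machine, and the function name `match_info` and the tuple layout are assumptions; positions are 1-based and inclusive.

```python
def match_info(pattern_id, substrings, text):
    """Return (pattern_id, sub_id, start, end) records for every occurrence.

    sub_id is the 1-based order of the substring in the regular pattern
    string; start and end are 1-based inclusive positions in text.
    """
    records = []
    for sub_id, sub in enumerate(substrings, start=1):
        pos = text.find(sub)
        while pos != -1:
            records.append((pattern_id, sub_id, pos + 1, pos + len(sub)))
            pos = text.find(sub, pos + 1)  # continue after this occurrence
    return records


# Substrings of abc(.*)bcd matched against the input main string abcd.
print(match_info(0x00FF, ["abc", "bcd"], "abcd"))
# [(255, 1, 1, 3), (255, 2, 2, 4)]
```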
In one embodiment of the invention, a regular pattern string that contains the start symbol ^ and the end symbol $, for example ^ab$, does not need to be divided by the substring division unit 310. Instead, the matching module 300 directly checks whether the match start position and the match end position coincide with the beginning and the end of the input main string. If they do, the string matches; otherwise it does not.
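A minimal sketch of this anchored check is given below. The function name `match_anchored` is hypothetical, and the sketch assumes the pattern has exactly the form `^literal$` with no other regular-expression syntax inside.

```python
def match_anchored(pattern, text):
    """Check a pattern of the form ^literal$ against the input main string."""
    if not (pattern.startswith("^") and pattern.endswith("$")):
        raise ValueError("expected a pattern of the form ^literal$")
    literal = pattern[1:-1]
    # 0-based positions: the match must start at the first character
    # and end at the last character of the input main string.
    start = 0 if text.startswith(literal) else -1
    end = start + len(literal) - 1
    return start == 0 and end == len(text) - 1


print(match_anchored("^ab$", "ab"))   # True
print(match_anchored("^ab$", "abc"))  # False
```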
Specifically, each substring produced by the division is matched against the input main string. For example, the two substrings ab and cd obtained from the regular pattern string ab(.*)cd are both matched against the input main string abc. Match information is represented by a pattern ID, and each substring has its own pattern ID. Suppose the pattern ID of substring ab is 0x21FF and the pattern ID of substring cd is 0x22FF. The low 16 bits of the pattern IDs of all substrings of one regular pattern string are identical and represent the pattern string ID, for example 0x00FF. Bits 16 to 23 of each pattern ID give the order in which the substring occurs in the pattern string and also serve as the substring ID: in the regular pattern string ab(.*)cd, substring ab comes first and has the value 1, and substring cd comes second and has the value 2. Bits 24 to 31 of each pattern ID give the number of substrings into which the regular pattern string is divided, which is 2 for ab(.*)cd.
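The bit layout described above can be sketched with two small helpers. The names `pack_id` and `unpack_id` are hypothetical. Note that with a strict reading of the stated layout (bits 0 to 15, 16 to 23 and 24 to 31), the two substrings of ab(.*)cd pack to 0x020100FF and 0x020200FF; the shorter values 0x21FF and 0x22FF quoted in the text appear to use a more compact notation for the same three fields.

```python
def pack_id(pattern_id, order, count):
    """Pack the three fields into one 32-bit pattern ID.

    bits 0-15:  regular pattern string ID
    bits 16-23: order of the substring in the pattern string (substring ID)
    bits 24-31: number of substrings in the pattern string
    """
    assert 0 <= pattern_id < (1 << 16)
    assert 0 < order <= count < (1 << 8)
    return (count << 24) | (order << 16) | pattern_id


def unpack_id(packed):
    """Return (pattern_id, order, count) from a packed 32-bit pattern ID."""
    return packed & 0xFFFF, (packed >> 16) & 0xFF, (packed >> 24) & 0xFF


print(hex(pack_id(0x00FF, 1, 2)))  # 0x20100ff  (substring ab)
print(hex(pack_id(0x00FF, 2, 2)))  # 0x20200ff  (substring cd)
```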
The matching unit 330 determines from the match information whether the input main string matches the regular pattern string.
For example, when matching reaches the end of the input main string, bits 16 to 23 and bits 24 to 31 of the IDs are compared. If the total number of matched substrings equals the value in bits 24 to 31 of the ID, the regular pattern string matches.
It should be understood that when the substrings are matched against the input main string in a given order or position, the matching follows the semantics expressed by the regular expression. In the example above, the substrings ab and cd are searched for as two separate strings, but the search differs from two fully independent string searches: ab and cd belong to the same regular pattern string ID, they occupy different places in the order of the regular pattern string ab(.*)cd, and the two are therefore correlated.
The traffic filtering device using regular matching based on the AC algorithm according to this embodiment of the invention can perform multi-pattern matching on regular pattern strings that contain the wildcard symbol.
Fig. 6 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 6, the traffic filtering device using regular matching based on the AC algorithm according to this embodiment comprises an initialization module 100, a creation module 200, a substring division unit 310, a match information generation unit 320, a matching unit 330, a deduplication unit 340 and a filtering module 400.
In one embodiment of the invention, the matching module 300 comprises the substring division unit 310, the match information generation unit 320, the matching unit 330 and the deduplication unit 340.
In one embodiment of the invention, the match information comprises a regular pattern string ID, a substring ID, a match start position and a match end position. The substring ID encodes the regular pattern string ID, the order in which the substring occurs in the regular pattern string, and the number of substrings in the regular pattern string.
The deduplication unit 340 deduplicates the matched substrings according to the order of occurrence encoded in the substring ID. For example, when matching reaches the end of the input main string, bits 16 to 23 of all IDs are compared, and substrings whose bits 16 to 23 are equal are deduplicated. After deduplication, bits 16 to 23 and bits 24 to 31 of the IDs are compared. If the total number of substrings after deduplication equals the value in bits 24 to 31 of the ID, the regular pattern string matches. Deduplication improves matching accuracy.
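The deduplicate-then-count decision can be sketched as follows, assuming 32-bit pattern IDs laid out as described (pattern string ID in bits 0 to 15, order in bits 16 to 23, substring count in bits 24 to 31). The function name `fully_matched` is hypothetical. Without deduplication, two occurrences of ab would be counted as two substrings and ab(.*)cd would be reported as matched even when cd never occurs.

```python
from collections import defaultdict


def fully_matched(packed_ids):
    """Return the set of pattern string IDs whose substrings all occurred.

    packed_ids holds one 32-bit pattern ID per reported substring match.
    """
    seen = defaultdict(set)  # pattern string ID -> distinct order values
    expected = {}            # pattern string ID -> number of substrings
    for packed in packed_ids:
        pattern_id = packed & 0xFFFF
        seen[pattern_id].add((packed >> 16) & 0xFF)  # set membership dedups
        expected[pattern_id] = (packed >> 24) & 0xFF
    return {pid for pid, orders in seen.items() if len(orders) == expected[pid]}


# ab matched twice, cd never: no match after deduplication.
print(fully_matched([0x020100FF, 0x020100FF]))              # set()
# ab matched twice, cd once: pattern 0x00FF matches.
print(fully_matched([0x020100FF, 0x020200FF, 0x020100FF]))  # {255}
```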
It should be understood that when the substrings are matched against the input main string in a given order or position, the matching follows the semantics expressed by the regular expression. In the example above, the substrings ab and cd are searched for as two separate strings, but the search differs from two fully independent string searches: ab and cd belong to the same regular pattern string ID, they occupy different places in the order of the regular pattern string ab(.*)cd, and the two are therefore correlated.
The traffic filtering device using regular matching based on the AC algorithm according to this embodiment of the invention can improve matching accuracy by means of the added deduplication unit.
Fig. 7 is a structural block diagram of a traffic filtering device using regular matching based on the AC algorithm according to one embodiment of the invention.
As shown in Fig. 7, the traffic filtering device using regular matching based on the AC algorithm according to this embodiment comprises an initialization module 100, a creation module 200, a substring division unit 310, a match information generation unit 320, a recording subunit 331, a sorting subunit 332, a first judging subunit 333, a second judging subunit 334, a match determination subunit 335, a deduplication unit 340 and a filtering module 400.
In one embodiment of the invention, the matching unit 330 comprises the recording subunit 331, the sorting subunit 332, the first judging subunit 333, the second judging subunit 334 and the match determination subunit 335.
In one embodiment of the invention, the match information comprises a regular pattern string ID, a substring ID, a match start position and a match end position. The substring ID encodes the regular pattern string ID, the order in which the substring occurs in the regular pattern string, and the number of substrings in the regular pattern string.
The recording subunit 331 records the matched substrings and the corresponding match information. Specifically, while each substring produced by the division is matched against the input main string, the matched substrings and their match information are recorded at the same time.
The sorting subunit 332 sorts the matched substrings according to the match information. For example, the matched substrings can be sorted by pattern string ID, substring ID, match start position and match end position.
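The sort order named above can be expressed as a composite key. The record type `Match` and its field names are assumptions made for this sketch.

```python
from typing import NamedTuple


class Match(NamedTuple):
    pattern_id: int  # regular pattern string ID
    sub_id: int      # order of the substring in the pattern string
    start: int       # match start position
    end: int         # match end position


def sort_matches(matches):
    """Sort by pattern string ID, then substring ID, then start, then end."""
    return sorted(matches, key=lambda m: (m.pattern_id, m.sub_id, m.start, m.end))


records = [Match(0xFF, 2, 5, 6), Match(0xFF, 1, 3, 4), Match(0xFF, 2, 1, 2)]
for record in sort_matches(records):
    print(record)
```

After sorting, all records of one regular pattern string are contiguous and its substrings appear in pattern order, which is what the subsequent position comparison relies on.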
The first judging subunit 333 judges whether the input main string contains all substrings of the regular pattern string.
Based on the judgment of the first judging subunit 333, if the input main string contains all substrings of the regular pattern string, the second judging subunit 334 further judges whether, for each pair of adjacent substrings after sorting, the match start position of the later substring is greater than the match end position of the earlier substring.
The second judging subunit 334 further improves matching accuracy. For example, suppose the regular pattern string abc(.*)bcd is matched against the input main string abcd. Without this step the result would be a match, although in fact the strings do not match: the input main string abcd matches both substring abc and substring bcd, and abc is matched before bcd, but the two matches overlap. The correct result can be obtained from the match start and end positions. When substring abc matches, its match start and end positions are recorded as [1, 3]; when substring bcd matches, its match start and end positions are recorded as [2, 4]. The match start position of the later substring bcd (2) is less than the match end position of the earlier substring abc (3), so the match fails.
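The position check in the abc(.*)bcd example can be sketched as below. The function name `ordered_without_overlap` and the dictionary input are hypothetical. The text only describes comparing adjacent substrings after sorting; choosing, for each substring, the earliest occurrence that starts after the previous one ends is an assumption added here so that inputs with several occurrences of a substring are handled.

```python
def ordered_without_overlap(occurrences, count):
    """Check that all substrings occur in pattern order without overlapping.

    occurrences maps the substring order (1..count) to a list of
    (start, end) positions, 1-based and inclusive.
    """
    prev_end = 0
    for order in range(1, count + 1):
        # Keep only occurrences that start after the previous substring ends.
        ends = [end for start, end in occurrences.get(order, []) if start > prev_end]
        if not ends:
            return False  # substring missing, or only overlapping/out-of-order hits
        prev_end = min(ends)  # earliest usable occurrence leaves the most room
    return True


# abc(.*)bcd against abcd: abc at [1, 3], bcd at [2, 4] -> overlap, no match.
print(ordered_without_overlap({1: [(1, 3)], 2: [(2, 4)]}, 2))  # False
# abc(.*)bcd against abcxbcd: abc at [1, 3], bcd at [5, 7] -> match.
print(ordered_without_overlap({1: [(1, 3)], 2: [(5, 7)]}, 2))  # True
```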
Based on the judgment of the second judging subunit 334, if the start position is greater, the match determination subunit 335 determines that the input main string matches the regular pattern string.
It should be understood that when the substrings are matched against the input main string in a given order or position, the matching follows the semantics expressed by the regular expression. In the example above, the substrings ab and cd are searched for as two separate strings, but the search differs from two fully independent string searches: ab and cd belong to the same regular pattern string ID, they occupy different places in the order of the regular pattern string ab(.*)cd, and the two are therefore correlated.
The traffic filtering device using regular matching based on the AC algorithm according to this embodiment of the invention can improve matching accuracy by means of the added deduplication unit.
It should be understood that the traffic filtering method and device using regular matching based on the AC algorithm according to embodiments of the invention are applicable not only to traffic filtering but also to any application scenario that involves multi-pattern string matching.
It should be understood that each part of the invention can be implemented in hardware, software, firmware or a combination thereof. In the embodiments above, a plurality of steps or methods can be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art can be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
In this specification, a description that refers to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that the specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (16)

1. A traffic filtering method using regular matching based on the AC algorithm, characterized by comprising the following steps:
initializing an AC state machine table;
adding a regular pattern string to the AC state machine table and building an AC state machine;
matching an input main string against the regular pattern string; and
if the input main string matches the regular pattern string, filtering the input main string.
2. The traffic filtering method using regular matching based on the AC algorithm according to claim 1, characterized in that the regular pattern string contains a wildcard symbol.
3. The traffic filtering method using regular matching based on the AC algorithm according to claim 2, characterized in that matching the input main string against the regular pattern string further comprises:
dividing the regular pattern string at the wildcard symbol to generate a plurality of substrings corresponding to the regular pattern string;
matching each of the plurality of substrings against the input main string to generate match information; and
determining from the match information whether the input main string matches the regular pattern string.
4. The traffic filtering method using regular matching based on the AC algorithm according to claim 3, characterized in that the match information comprises a regular pattern string ID, a substring ID, a match start position and a match end position.
5. The traffic filtering method using regular matching based on the AC algorithm according to claim 4, characterized in that the substring ID includes the regular pattern string ID, the order in which the substring occurs in the regular pattern string, and the number of substrings in the regular pattern string.
6. The traffic filtering method using regular matching based on the AC algorithm according to claim 5, characterized by further comprising:
deduplicating the matched substrings according to the order, given in the substring ID, in which the substring occurs in the regular pattern string.
7. The traffic filtering method using regular matching based on the AC algorithm according to claim 4, characterized in that determining from the match information whether the input main string matches the regular pattern string further comprises:
recording the matched substrings and the corresponding match information;
sorting the matched substrings according to the match information;
judging whether the input main string contains all substrings of the regular pattern string;
if the input main string contains all substrings of the regular pattern string, further judging whether, for two adjacent substrings after sorting, the match start position of the later substring is greater than the match end position of the earlier substring; and
if it is greater, determining that the input main string matches the regular pattern string.
8. The traffic filtering method using regular matching based on the AC algorithm according to claim 1, characterized in that the regular pattern string contains a start symbol and an end symbol.
9. A traffic filtering device using regular matching based on the AC algorithm, characterized by comprising:
an initialization module for initializing an AC state machine table;
a creation module for adding a regular pattern string to the AC state machine table and building an AC state machine;
a matching module for matching an input main string against the regular pattern string; and
a filtering module for filtering the input main string if the input main string matches the regular pattern string.
10. The traffic filtering device using regular matching based on the AC algorithm according to claim 9, characterized in that the regular pattern string contains a wildcard symbol.
11. The traffic filtering device using regular matching based on the AC algorithm according to claim 10, characterized in that the matching module further comprises:
a substring division unit for dividing the regular pattern string at the wildcard symbol to generate a plurality of substrings corresponding to the regular pattern string;
a match information generation unit for matching each of the plurality of substrings against the input main string to generate match information; and
a matching unit for determining from the match information whether the input main string matches the regular pattern string.
12. The traffic filtering device using regular matching based on the AC algorithm according to claim 11, characterized in that the match information comprises a regular pattern string ID, a substring ID, a match start position and a match end position.
13. The traffic filtering device using regular matching based on the AC algorithm according to claim 12, characterized in that the substring ID includes the regular pattern string ID, the order in which the substring occurs in the regular pattern string, and the number of substrings in the regular pattern string.
14. The traffic filtering device using regular matching based on the AC algorithm according to claim 13, characterized in that the matching module further comprises a deduplication unit for deduplicating the matched substrings according to the order, given in the substring ID, in which the substring occurs in the regular pattern string.
15. The traffic filtering device using regular matching based on the AC algorithm according to claim 12, characterized in that the matching unit further comprises:
a recording subunit for recording the matched substrings and the corresponding match information;
a sorting subunit for sorting the matched substrings according to the match information;
a first judging subunit for judging whether the input main string contains all substrings of the regular pattern string;
a second judging subunit which, based on the judgment of the first judging subunit, if the input main string contains all substrings of the regular pattern string, further judges whether, for two adjacent substrings after sorting, the match start position of the later substring is greater than the match end position of the earlier substring; and
a match determination subunit which, based on the judgment of the second judging subunit, if the start position is greater, determines that the input main string matches the regular pattern string.
16. The traffic filtering device using regular matching based on the AC algorithm according to claim 9, characterized in that the regular pattern string contains a start symbol and an end symbol.
CN201110460365.9A 2011-12-31 2011-12-31 Traffic filtering method and device using regular matching based on the AC algorithm Active CN103186640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110460365.9A CN103186640B (en) 2011-12-31 2011-12-31 Traffic filtering method and device using regular matching based on the AC algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110460365.9A CN103186640B (en) 2011-12-31 2011-12-31 Traffic filtering method and device using regular matching based on the AC algorithm

Publications (2)

Publication Number Publication Date
CN103186640A true CN103186640A (en) 2013-07-03
CN103186640B CN103186640B (en) 2016-05-25

Family

ID=48677809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110460365.9A Active CN103186640B (en) 2011-12-31 2011-12-31 Traffic filtering method and device using regular matching based on the AC algorithm

Country Status (1)

Country Link
CN (1) CN103186640B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant