CN101854341A - Pattern matching method and device for data streams - Google Patents


Info

Publication number
CN101854341A
Authority
CN
China
Prior art keywords
pattern
fragment
pattern matching
divided
patterns
Legal status
Granted
Application number
CN200910132546A
Other languages
Chinese (zh)
Other versions
CN101854341B (en)
Inventor
Zheng Kai (郑凯)
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to CN200910132546.1A
Publication of CN101854341A
Application granted
Publication of CN101854341B
Expired - Fee Related

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a pattern matching method and device for data streams. In the method, a pattern set comprising a plurality of patterns is divided into a plurality of pattern subsets that are mutually exclusive under a given detection window length, and pattern matching checks against these exclusive subsets are performed in a plurality of pattern matching engines respectively. This greatly reduces the number of lookups performed by the pattern matching engines and correspondingly improves the working efficiency of the system.

Description

Pattern matching method and device for data streams
Technical field
The present invention relates to a pattern matching (FPM) method and apparatus for data streams.
Background
As a novel firewall technology, deep packet inspection (DPI) has been widely used in intrusion detection/prevention systems (IDS/IPS), anti-spam/anti-virus, data leakage prevention, information filtering and similar fields. DPI inspects every packet passing through the firewall, including its payload, in depth, and the DPI engine decides how to handle each packet based on a rule set built on techniques such as fingerprint matching, heuristics, anomaly detection and statistical analysis. To detect, for example, whether a packet in the data stream carries an attack signature, DPI engines generally use pattern matching/signature search, comparing every suspicious byte in the stream. However, existing pattern matching algorithms require an enormous amount of computation and memory traffic. A DPI application typically has to move a large volume of data against a pattern set containing many patterns, and the computing power required is proportional to the line rate of the monitored network interface (because DPI inspects not only packet headers but also payloads). This makes it difficult for DPI to cope with multi-gigabit or even 10-gigabit line rates and with very large pattern sets.
Pattern matching algorithms place high demands on CPU processing speed, while microelectronics is approaching its physical limits, and the "memory wall" problem, in which memory speed limits processing speed, may soon appear. Parallel algorithms may therefore be the only way to build scalable pattern matching engines (PM engines) for high-performance network intrusion detection systems (NIDS). Several pattern matching algorithms that execute in parallel have already been proposed. TCAM (ternary content-addressable memory) chips are particularly well suited to serve as parallel pattern matching engines, providing hardware acceleration for stream pattern matching. TCAM offers fast lookup and low power consumption, and is widely supported by equipment vendors such as Cisco and 3Com. It is therefore necessary to provide a highly efficient solution that supports efficient parallel pipelined operation.
Summary of the invention
The present invention aims to provide a more efficient stream pattern matching method and device, in which pipelined parallel pattern matching is achieved by using mutually exclusive pattern subsets.
To this end, one aspect of the present invention provides a pattern matching method for data streams, comprising the following steps: dividing an input data stream into a plurality of fragments and assigning each fragment to one of a plurality of pattern matching engines, where each pattern matching engine stores one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and, in any one of the pattern matching engines, matching the assigned fragment against the pattern subset stored in that engine, and outputting the fragment if it hits the pattern subset stored in that engine.
To enable pipelined processing, optionally, when a pattern matching engine finds that a fragment assigned to it does not match its pattern subset, the fragment is passed to another pattern matching engine for further comparison; and when a fragment has been checked by all pattern matching engines without any match being found, the fragment is reported as having no match hit and its inspection ends.
To check every data fragment without omission, the input data stream is divided into fragments no longer than the detection window length, and preferably the fragment length equals the detection window length. Optionally, the data stream is divided so that no pattern to be compared straddles the split point between two fragments. To this end, split points in the data stream can be determined from "negative patterns" of the target patterns to be compared, and the stream is cut at those split points, where a negative pattern is a string that cannot form a target pattern no matter what prefix and/or suffix is added to it.
According to the present invention, the plurality of mutually exclusive pattern subsets can be obtained by dividing a pattern set comprising a plurality of patterns. Optionally, when dividing a pattern set into mutually exclusive subsets, the set is first divided into as many exclusive subsets as possible, and the smaller subsets are then merged to balance the subset sizes.
Another aspect of the present invention provides a stream pattern matching device for deep packet inspection, comprising: a plurality of pattern matching engines, each storing one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and a stream splitting and distribution unit, configured to divide an input data stream into a plurality of fragments and distribute the fragments to the pattern matching engines, wherein any one of the pattern matching engines matches the fragment assigned to it against the pattern subset stored in it, and outputs the fragment if it hits that pattern subset. The stream pattern matching device correspondingly performs each step of the method of the invention described above.
When a large pattern set is divided into several mutually exclusive subsets, a fragment does not need to be checked against every subset: once it hits one subset, it cannot also hit another. The invention exploits this exclusivity of the pattern subsets to reduce the number of lookups performed by the PM engines and to make pipelined processing possible, greatly improving the operating efficiency of parallel PM engines. With this parallel pattern matching based on "mutually exclusive" pattern subsets, no redundant memory is needed to extend the storage space.
Brief description of the drawings
Fig. 1 is a schematic diagram of a device for parallel stream pattern matching according to the present invention;
Fig. 2 shows an exemplary algorithm for dividing a pattern set according to the mutual exclusion principle; and
Fig. 3 shows an example of pattern matching on a data stream according to the method of the invention.
Detailed description of embodiments
The inventor has found that, for a given detection window length, a large pattern set containing many patterns can always be divided into several subsets that are "mutually exclusive" with one another. Data stream fragments can then be processed by a plurality of parallel PM engines, each based on one of these exclusive subsets, achieving efficient pipelined operation and greatly increasing matching speed.
Here, an "exclusive" relation between two patterns means that the same data fragment cannot match both patterns simultaneously. Two pattern subsets SA and SB are said to be "mutually exclusive" when every pattern PA in SA is exclusive with every pattern PB in SB.
The inventor observed that, once the detection window length w is given, certain patterns are exclusive with other patterns under all circumstances (regardless of the input data stream and of how the patterns are stored). For example, suppose the detection window length is w = 7 bytes, pattern P1 is "ABCD" and pattern P2 is "wxyz". Whatever the input string, and however the two patterns are stored in the TCAM (for example as "?ABCD" or "ABCD", where "?" denotes a wildcard), the two patterns are always exclusive, because a string that matches both P1 and P2 must be at least 4 + 4 = 8 bytes long, which always exceeds the 7-byte detection window.
For two patterns P1 and P2, their minimum combined length (MCL), written MCL(P1, P2), is the minimum length of a string that contains both P1 and P2 as substrings. In the example above, P1 and P2 are four bytes each and share no overlapping prefix or suffix, so their minimum combined length is 4 + 4 = 8 bytes. For patterns P3 = "ABC" and P4 = "CAB", MCL(P3, P4) = 4 (the string "CABC" contains both), so P3 and P4 are not exclusive when the detection window length is 7; when the detection window length is 3, however, P3 and P4 are exclusive. In other words, two patterns PA and PB are always "exclusive" under detection window length w if and only if MCL(PA, PB) is greater than w.
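The MCL test can be sketched in a few lines of Python. This is a minimal illustration, assuming patterns are plain strings without wildcards; the helper names `overlap`, `mcl` and `exclusive` are hypothetical and not taken from the patent. When neither pattern contains the other, the shortest string containing both is one pattern followed by the other with their longest suffix/prefix overlap shared.

```python
def overlap(x: str, y: str) -> int:
    """Length of the longest suffix of x that is also a prefix of y."""
    for n in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:n]):
            return n
    return 0


def mcl(a: str, b: str) -> int:
    """Minimum combined length: length of the shortest string containing a and b."""
    if a in b:
        return len(b)
    if b in a:
        return len(a)
    # Otherwise overlap the two patterns in whichever order shares more bytes.
    return len(a) + len(b) - max(overlap(a, b), overlap(b, a))


def exclusive(a: str, b: str, w: int) -> bool:
    """True if a and b can never both match inside one w-byte detection window."""
    return mcl(a, b) > w


print(mcl("ABCD", "wxyz"), mcl("ABC", "CAB"))                  # 8 4
print(exclusive("ABC", "CAB", 7), exclusive("ABC", "CAB", 3))  # False True
```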
Fig. 1 is a schematic diagram of a device for parallel stream pattern matching according to the present invention. First, before matching is performed on data entering the stream pattern matching device 1 (for example, network packets), the arriving data is reassembled into a continuous data stream in a stream buffer, as in a conventional intrusion detection system. To perform pipelined parallel matching on the reassembled stream, the stream splitting and distribution unit 101 divides it into several small fragments, which are then matched in the pattern matching engines PM1, PM2, ..., PMk respectively.
The prepared data stream is divided into small fragments to enable fine-grained parallel processing. To check every data fragment without omission, the length of each fragment obtained by splitting the stream should not exceed the detection window length of each PM engine. For optimal system efficiency, and to make full use of parallel processing, the fragment length preferably equals the detection window length w, which also balances the workload of the PM engines. One splitting scheme is to take fragments of the required length from the data stream successively. For example, if the data stream is abcdefg and the required fragment length is 5, the resulting fragments are abcde, bcdef and cdefg.
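The first splitting scheme amounts to a sliding window that advances one byte at a time. A minimal Python sketch, assuming the stream is a string and the fragment length is given; the name `sliding_fragments` is illustrative only:

```python
def sliding_fragments(stream: str, length: int) -> list[str]:
    """Cut the stream into overlapping fragments of `length` bytes, one per offset."""
    if len(stream) <= length:
        return [stream]
    return [stream[i:i + length] for i in range(len(stream) - length + 1)]


print(sliding_fragments("abcdefg", 5))  # ['abcde', 'bcdef', 'cdefg']
```

Because the window moves one byte per fragment, any pattern no longer than the fragment length lies entirely inside at least one fragment; the price is roughly one fragment per input byte.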
Alternatively, another splitting scheme can be used, in which the data stream is split so that the target patterns to be compared do not straddle the split point between two fragments. To this end, split points in the data stream can be determined from "negative patterns" of the target patterns, and the stream is cut at those split points, where a negative pattern is a string that cannot form a target pattern no matter what prefix and/or suffix is added to it. This preferred splitting scheme is equally applicable to the present invention; it achieves better load balancing and helps reduce the number of pattern matching operations.
Before matching, the exclusive subset division unit 102 in Fig. 1 divides a large pattern set containing many patterns into a plurality of "mutually exclusive" pattern subsets SS1, SS2, ..., SSk, according to the given detection window length w and the minimum combined length of each pair of patterns; that is, every pattern in any subset is exclusive with every pattern in any other subset.
Such a division of the pattern set by exclusivity can always be achieved in practice, and many different division schemes exist (for example, exhaustive search). Any scheme is valid as long as the MCL between every pattern in one subset and every pattern in any other subset is greater than the detection window length w; the schemes differ only in the amount of computation. The resulting exclusive subsets SS1, SS2, ..., SSk are then assigned to the corresponding pattern matching engines PM1, PM2, ..., PMk. In practice, each exclusive subset can be stored in its own TCAM chip, with as many TCAM chips as subsets. The stream splitting and distribution unit 101 then distributes the resulting data fragments to the PM engines, balancing the load as far as possible, and each fragment is compared with the pattern subset stored in its PM engine.
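The validity condition for a division (every cross-subset pair of patterns has MCL > w) can be checked mechanically. A minimal Python sketch, assuming plain string patterns; `is_exclusive_partition` is a hypothetical helper, and the example partition is the one this document uses for Fig. 3:

```python
from itertools import combinations


def overlap(x: str, y: str) -> int:
    """Length of the longest suffix of x that is also a prefix of y."""
    for n in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:n]):
            return n
    return 0


def mcl(a: str, b: str) -> int:
    """Minimum combined length of patterns a and b."""
    if a in b or b in a:
        return max(len(a), len(b))
    return len(a) + len(b) - max(overlap(a, b), overlap(b, a))


def is_exclusive_partition(subsets: list[list[str]], w: int) -> bool:
    """True if every pattern is exclusive with every pattern of every other subset."""
    return all(
        mcl(p, q) > w
        for s, t in combinations(subsets, 2)
        for p in s
        for q in t
    )


fig3 = [["ABCD", "DEFG"], ["XYYZ", "XYZZ"], ["1234", "4321"]]
print(is_exclusive_partition(fig3, 7))  # True
print(is_exclusive_partition(fig3, 8))  # False: "ABCD" and "XYYZ" fit in 8 bytes
```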
If a pattern in the original pattern set is longer than the detection window length supported by the TCAM, the long pattern can first be divided into several shorter patterns according to the width supported by the TCAM, so that none of them exceeds the detection window length; the resulting short patterns are then divided into pattern subsets according to the mutual exclusion principle. For example, suppose a long pattern "ABCDEF" is to be loaded into a TCAM that supports 5 bytes, i.e., whose detection window length is 5 bytes. The pattern can be divided into two 5-byte patterns, "ABCDE" and "BCDEF".
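A sketch of the long-pattern split, assuming every w-byte window of the long pattern becomes a short pattern; this reproduces the "ABCDEF" example, but the text does not say how longer patterns are cut, so treat it as one plausible choice. Each piece keeps its offset inside the original pattern so that later processing can check for the full pattern. `split_long_pattern` is a hypothetical helper.

```python
def split_long_pattern(pattern: str, w: int) -> list[tuple[str, int]]:
    """Split a pattern longer than w into w-byte pieces, each with its offset."""
    if len(pattern) <= w:
        return [(pattern, 0)]
    return [(pattern[i:i + w], i) for i in range(len(pattern) - w + 1)]


print(split_long_pattern("ABCDEF", 5))  # [('ABCDE', 0), ('BCDEF', 1)]
```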
In a simulation using a typical pattern set of 1993 patterns, four PM engines were used, i.e., four pattern subsets were to be obtained. With w = 10 bytes, the 1993 patterns were first divided into 4369 short patterns to fit the detection window, and these were then divided by the mutual exclusion principle into four exclusive subsets containing 1088 to 1105 patterns each; assigning them to the four PM engines gave a roughly balanced workload. With w = 12 bytes, the 1993 patterns became 3730 short patterns, and the four resulting exclusive subsets contained 930 to 939 patterns each, so the workload was still roughly balanced. With w = 16 bytes, however, the 1993 patterns became 3099 short patterns, and the four resulting exclusive subsets contained 636 to 1207 patterns each, so the PM engine workloads were no longer balanced. It is therefore recommended to choose the detection window length supported by the TCAM appropriately in practice, adjusting it to the actual rule set, to achieve good load balancing.
Note also that the exclusive subset division unit 102 in Fig. 1 is optional for pattern matching, because the division can be performed offline and the exclusive pattern subsets configured in each pattern matching engine in advance. As described above, the detection window length can greatly affect how balanced the PM engine workloads are, and its upper limit is the number of bytes supported by the TCAM. The number of pattern subsets to be obtained also strongly affects the division, and it depends on the number of TCAMs used. In turn, the number of TCAMs and the number of bytes they support depend on the convenience and cost of the hardware design. Therefore, for a given pattern set, repeated trials are preferably run offline with different numbers of subsets and detection window lengths, to find an exclusive subset division that both satisfies the convenience and cost constraints of the hardware design and balances the PM engine workloads as evenly as possible.
Because the pattern subsets stored in the PM engines are mutually exclusive, when a PM engine finds that a data fragment assigned to it matches the subset stored in that engine, the fragment does not need to be checked by any other PM engine. The engine can report directly that a match hit has occurred for the fragment and output the fragment for subsequent processing, and the next fragment to be checked is assigned to this PM engine, making full use of memory resources. The subsequent processing is, for example, a comparison against known rule sets such as a virus signature database according to the match hits that occurred; it is the same as in common applications such as IDS/IPS and is not described further here.
If a long pattern was divided into several short patterns when the pattern set was partitioned, a match on a short pattern does not necessarily mean a match on the original pattern, so subsequent processing can further check whether the data fragment matches the original long pattern. In the example above, the long pattern "ABCDEF" was divided into the two short patterns "ABCDE" and "BCDEF"; a data fragment matching the shorter pattern "ABCDE" is further checked in subsequent processing to determine whether it matches the long pattern "ABCDEF".
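The follow-up check can be sketched as below, assuming the post-processor has access to the reassembled stream and to the absolute stream position of the short-pattern hit (a fragment no longer than w may be too short to contain the whole long pattern). `confirm_long_match` is a hypothetical helper, and `offset` is the position of the short piece inside the long pattern.

```python
def confirm_long_match(stream: str, hit_pos: int, long_pattern: str, offset: int) -> bool:
    """Check whether a short-pattern hit at hit_pos is part of the full long pattern.

    offset is where the short piece starts inside long_pattern, e.g. 0 for
    "ABCDE" and 1 for "BCDEF" when long_pattern is "ABCDEF".
    """
    start = hit_pos - offset
    return start >= 0 and stream.startswith(long_pattern, start)


print(confirm_long_match("xxABCDEFyy", 2, "ABCDEF", 0))  # True
print(confirm_long_match("xxABCDEzz", 2, "ABCDEF", 0))   # False
```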
Conversely, if no match hit occurs for the data fragment in this PM engine, and some other PM engine has not yet checked the fragment, the stream splitting and distribution unit 101 reassigns the fragment to the next PM engine for matching. This process repeats until a match hit is found in some PM engine or no match hit has been found in any PM engine.
Optionally, to implement the pipelined processing described above, the stream splitting and distribution unit 101 can attach an indication vector to each fragment. After a pattern matching engine has checked a fragment, it modifies the fragment's indication vector to record which pattern matching engines have checked the fragment. For example, after the data stream is divided into short fragments, a k-bit indication vector (k being the number of PM engines) can be attached to each fragment, with all bits initially set to "0". After a data fragment has been checked by a PM engine, the corresponding bit in its indication vector is set to "1".
If a fragment has been checked by all PM engines (for example, all bits of its indication vector have been set to "1") without any match hit, the fragment is reported as "clean", i.e., containing no suspicious content, and is not checked any further. Alternatively, fragments can be treated as "clean" by default, in which case no explicit report is needed when no match hit is found.
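To make the pipelined flow and the indication vector concrete, here is a small simulation sketch. Assumptions not taken from the patent: a substring test stands in for a TCAM lookup, the k-bit indication vector is a Python int bitmask, and in each stage every engine first takes a carried-over fragment it has not checked yet and otherwise starts a new fragment. With the Fig. 3 subsets and fragments S1 to S5 (ids 0 to 4), the first two stages match the Fig. 3 description.

```python
from collections import deque


def run_pipeline(fragments: list[str], engines: list[list[str]]):
    """Simulate staged matching; each engine checks at most one fragment per stage."""
    k = len(engines)
    all_checked = (1 << k) - 1
    new = deque(enumerate(fragments))  # (fragment id, text) not yet started
    pending = []                       # (id, text, mask) waiting for another engine
    stages, hits, clean = [], {}, []
    while new or pending:
        stage, carried = [], []
        for e, subset in enumerate(engines):
            # Prefer a carried-over fragment whose bit e is still 0.
            idx = next((i for i, (_, _, m) in enumerate(pending)
                        if not ((m >> e) & 1)), None)
            if idx is not None:
                fid, text, mask = pending.pop(idx)
            elif new:
                (fid, text), mask = new.popleft(), 0
            else:
                continue  # engine e is idle in this stage
            stage.append((e, fid))
            if any(p in text for p in subset):
                hits[fid] = e  # exclusive subsets: no other engine needs to look
            else:
                mask |= 1 << e
                if mask == all_checked:
                    clean.append(fid)  # checked by every engine without a hit
                else:
                    carried.append((fid, text, mask))
        pending.extend(carried)
        stages.append(stage)
    return stages, hits, clean


fig3_engines = [["ABCD", "DEFG"], ["XYYZ", "XYZZ"], ["1234", "4321"]]
frags = ["ABCDEFG", "ABCDE12", "12C1234", "ABCDE13", "ABCDE33"]
stages, hits, clean = run_pipeline(frags, fig3_engines)
print(stages[0])  # [(0, 0), (1, 1), (2, 2)]: S1, S2, S3 go to PM1, PM2, PM3
print(stages[1])  # [(0, 1), (1, 3), (2, 4)]: S2 moves to PM1, S4 and S5 start
```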
Those skilled in the art will understand that the description given with reference to the device of Fig. 1 corresponds to the following flow: dividing the input data stream into a plurality of fragments and assigning each fragment to one of a plurality of pattern matching engines, where each pattern matching engine stores one of a plurality of pattern subsets that are mutually exclusive under the given detection window length; and matching, by the pattern matching engines, the assigned fragments against the stored pattern subsets respectively, and, on a match hit, outputting the fragment for subsequent processing.
Because a pattern set can always be divided into a plurality of mutually exclusive subsets under a given window, many different division schemes exist. Fig. 2 shows an exemplary algorithm for dividing a pattern set according to the mutual exclusion principle, which aims to achieve the division with a reasonable amount of computation. Its computational complexity is O(N²), where N is the size of the pattern set; that is, the computation grows quadratically with the size of the pattern set. The main idea of the algorithm is to first divide the pattern set into as many exclusive subsets as possible, without regard to how balanced the subsets are, and then merge the smaller subsets to balance the subset sizes. For example, for a pattern set containing N patterns P1, P2, ..., PN (assuming that no pattern P1, P2, ..., PN is longer than the detection window length w; otherwise, longer patterns are first divided into shorter ones as described above so that they fit within w), N empty pattern subsets SS1, SS2, ..., SSN, one per pattern, can be created initially, and the patterns are then compared and merged one by one according to the flow shown in Fig. 2.
As shown in the flowchart of Fig. 2, in step S201 a pattern not yet assigned to an existing pattern subset is selected as the current pattern, e.g., Pj. In step S202, the minimum combined length (MCL) between the current pattern Pj and each existing pattern subset SS1, SS2, ..., SSN is calculated for the given detection window length w. Here, the MCL of a pattern and a pattern subset is the minimum of the MCLs between that pattern and each pattern in the subset. In step S203, it is determined whether the MCL of Pj with every subset is greater than the detection window length w. If so, the MCL between this pattern and every pattern already assigned to an existing subset is greater than w, so the current pattern Pj is exclusive with all patterns in the existing subsets, and the flow proceeds to step S204, where Pj itself becomes a new exclusive subset SSj. Otherwise, if step S203 finds that the MCL of Pj with at least one pattern subset (e.g., SSm, SSn) is not greater than w, the flow proceeds to S205. In S205, the current pattern Pj, together with all pattern subsets whose MCL with Pj is not greater than the detection window length, forms a new pattern subset. Specifically, if several pattern subsets SSm, SSn whose MCL with Pj is not greater than w are found, they are merged into a new pattern subset SSj and Pj is placed in SSj; if only one existing subset SSm has an MCL with Pj not greater than w, Pj can be placed directly into SSm. Then, in step S206, it is determined whether Pj is the last pattern. If not, the flow returns to step S201, selects the next pattern not yet assigned to a pattern subset as the current pattern, and repeats the steps above. If step S206 finds that the current pattern is the last one, i.e., all patterns have been assigned, several mutually exclusive subsets have been obtained. Their sizes, however, may be very unbalanced. Therefore, in step S207, smaller subsets are merged using, for example, the well-known greedy algorithm for the "knapsack" problem, so that the subset sizes become more balanced. For example, if subsets SS1, SS2 and SS3 are mutually exclusive, the subset obtained by merging SS2 and SS3 is certainly also exclusive with SS1, which guarantees that the exclusive division remains correct. The flow shown in Fig. 2 is of course only an example; other algorithms can also be used to divide the pattern set into mutually exclusive subsets.
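The Fig. 2 flow can be sketched in Python as below. This is an illustrative sketch, not the patent's implementation: `partition` follows steps S201 to S206 (treating MCL <= w as a clash), and `balance` stands in for step S207 with a simple largest-first greedy merge into k groups, which is an assumption since the text only names a knapsack-style greedy algorithm. Each new pattern is compared with every pattern already assigned, which gives the O(N²) MCL evaluations mentioned above.

```python
def overlap(x: str, y: str) -> int:
    """Length of the longest suffix of x that is also a prefix of y."""
    for n in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:n]):
            return n
    return 0


def mcl(a: str, b: str) -> int:
    """Minimum combined length of patterns a and b."""
    if a in b or b in a:
        return max(len(a), len(b))
    return len(a) + len(b) - max(overlap(a, b), overlap(b, a))


def partition(patterns: list[str], w: int) -> list[list[str]]:
    """Steps S201-S206: grow exclusive subsets one pattern at a time."""
    subsets: list[list[str]] = []
    for p in patterns:  # S201: take the next unassigned pattern
        # S202/S203: a subset clashes with p if some member has MCL <= w.
        clash = [i for i, s in enumerate(subsets) if any(mcl(p, q) <= w for q in s)]
        if not clash:  # S204: p starts a new exclusive subset
            subsets.append([p])
        else:          # S205: merge all clashing subsets together with p
            merged = [q for i in clash for q in subsets[i]] + [p]
            subsets = [s for i, s in enumerate(subsets) if i not in clash] + [merged]
    return subsets


def balance(subsets: list[list[str]], k: int) -> list[list[str]]:
    """Stand-in for S207: largest-first greedy merge into k groups."""
    groups: list[list[str]] = [[] for _ in range(k)]
    for s in sorted(subsets, key=len, reverse=True):
        min(groups, key=len).extend(s)  # merged exclusive subsets stay exclusive
    return groups


raw = partition(["ABCD", "DEFG", "XYYZ", "XYZZ", "1234", "4321"], 7)
print(raw)              # [['ABCD', 'DEFG'], ['XYYZ'], ['XYZZ'], ['1234', '4321']]
print(balance(raw, 3))  # [['ABCD', 'DEFG'], ['1234', '4321'], ['XYYZ', 'XYZZ']]
```

On the Fig. 3 pattern set the greedy step recovers the same three subsets used in that example, only in a different order.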
Fig. 3 shows a concrete example of pattern matching on a data stream according to the method of the invention. Suppose the pattern set to be searched is {ABCD, DEFG, XYYZ, XYZZ, 1234, 4321}. When the data fragments to be inspected are no longer than 7 bytes, the three subsets {ABCD, DEFG}, {XYYZ, XYZZ} and {1234, 4321} are clearly mutually exclusive. As Fig. 3 shows, the pattern matching engine that originally stored all 6 patterns is replaced by three smaller pattern matching engines (TCAMs) PM1, PM2 and PM3, each storing 2 patterns, and at any time the three PM engines process three data fragments simultaneously.
As can be seen from Fig. 3, suppose the input data stream to be checked consists of the long string "ABCDEFGABCDE1212C1234ABCDE13ABCDE33...123456...". Substrings such as "GA", "21", "4A" and "3A" can be regarded as "negative patterns" of this target pattern set, i.e., adding any prefix and/or suffix to them cannot form a target pattern to be compared. Using these negative patterns as split points, the input stream is divided into fragments such as S1 = "ABCDEFG", S2 = "ABCDE12", S3 = "12C1234", S4 = "ABCDE13", S5 = "ABCDE33", ..., Sn = "123456", and so on.
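A minimal sketch of negative-pattern splitting, assuming (as in the Fig. 3 example) that each negative pattern is two bytes long and the cut falls between its two bytes. The marker set is supplied by the caller rather than derived from the pattern set, and `split_at_markers` is a hypothetical name. The sketch does not enforce the detection-window limit on fragment length, so a practical splitter would need a fallback when no marker occurs within w bytes.

```python
def split_at_markers(stream: str, markers: set[str]) -> list[str]:
    """Cut between the two bytes of every occurrence of a 2-byte marker."""
    cuts = [i for i in range(1, len(stream)) if stream[i - 1:i + 1] in markers]
    bounds = [0] + cuts + [len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]


stream = "ABCDEFGABCDE1212C1234ABCDE13ABCDE33"
print(split_at_markers(stream, {"GA", "21", "4A", "3A"}))
# ['ABCDEFG', 'ABCDE12', '12C1234', 'ABCDE13', 'ABCDE33']
```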
As shown in Fig. 3, in the first stage, fragments S1, S2 and S3 are assigned to the three pattern matching engines PM1, PM2 and PM3 respectively. Matching finds that S1 and S3 hit the pattern subsets stored in their engines, so S1 and S3 have completed their matching in the system (no further comparison is needed), while S2 must still be checked by the other pattern matching engines. In the second stage, S2 is assigned to the first pattern matching engine PM1, and the two new fragments S4 and S5 are assigned to the second and third pattern matching engines PM2 and PM3 respectively. The process continues in this way over the entire data stream.
The performance of the parallel pattern matching scheme of the invention is analyzed below.
Suppose the probability of a match hit is x%, and y TCAM chips are used as pattern matching engines, each TCAM chip handling one exclusive subset.
For a fragment that produces a match hit (a suspicious fragment), the expected number of TCAMs it must traverse is y/2; for a fragment with no match hit, the expected number is y, i.e., it must be checked once by every TCAM chip. The fraction of TCAM lookups saved is therefore given by the following formula:
$$1 - \frac{\frac{y}{2}\cdot x\% + y\cdot\left(1 - x\%\right)}{y} = \frac{x\%}{2}$$
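The saving formula can be evaluated directly. A small Python sketch that follows the text's assumptions (a hitting fragment visits y/2 engines on average, a non-hitting fragment visits all y); `tcam_saving` is an illustrative name, and the hit rate is given as a fraction rather than a percentage. If the hitting engine is equally likely to be any of the y engines, the exact expectation is (y + 1)/2, so y/2 slightly overstates the saving for small y.

```python
def tcam_saving(x: float, y: int) -> float:
    """Fraction of TCAM lookups saved for hit probability x (0..1) and y engines."""
    expected = (y / 2) * x + y * (1 - x)  # hits visit ~y/2 engines, misses all y
    return 1 - expected / y               # algebraically equal to x / 2


print(round(tcam_saving(0.26, 4), 2))  # 0.13, the 26% hit rate reported below
print(tcam_saving(1.0, 4))             # 0.5, hit rate approaching 100%
```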
In the ideal case where no match hit occurs, i.e., x% = 0, the efficiency is the same as that of existing methods. IDS applications, however, must often consider the worst case, in which suspicious signatures appear frequently. In such cases an intruder may launch a DDoS attack in which suspicious signatures come from a large number of "zombie machines", and the match hit rate may approach 100%; many practical applications face exactly this situation.
In experiments, the applicant ran simulation tests using the hacker packet traces containing DDoS attacks provided by MIT [1] and the SNORT intrusion detection pattern set. In 219,148,064 bytes of packets carrying attack signatures (329,322,084 bytes including packet headers), 56,072,428 pattern match hits occurred, a match hit probability of about 26%. In this case, the fraction of TCAM lookups saved therefore reached 13%.
In addition, for network workload/application classification and identification, the match hit rate can be much higher than for IDS/IPS, because the content of every packet should ultimately match some pattern in the pattern set, i.e., the match hit rate x% approaches 100%. In this case, the solution of the invention is even more advantageous: about 50% of TCAM lookups can be saved.
It should be understood that the embodiments described in detail above with reference to Figs. 1 to 3 are merely examples and do not limit the present invention. Other implementations of input stream segmentation, pattern subset division, load balancing, and pipeline processing may also be adopted, as long as they do not depart from the idea of the present invention of performing parallel processing based on the mutual exclusion principle. The pattern matching engines are also not limited to TCAMs; other processors with parallel matching capability may be used.
Embodiments of the present invention may be implemented in hardware, software, firmware, or a combination thereof. Those skilled in the art will also recognize that the present invention may be embodied in a computer program product disposed on a signal-bearing medium for use with any suitable data processing system. Such a signal-bearing medium may be a transmission medium or a recordable medium for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks or floppy disks in hard disk drives, optical discs for optical drives, magnetic tape, and other media conceivable by those skilled in the art. Those skilled in the art will further recognize that any communication terminal having suitable programming means can perform the steps of the method of the invention as embodied in a program product.
It should be noted that, to make the present invention easier to understand, the above description omits some more specific technical details that are known to those skilled in the art and may be necessary for implementing the present invention.
The description of the present invention is provided for purposes of illustration and description; it is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art.
The embodiments were therefore chosen and described to better explain the principles of the present invention and its practical application, and to enable those of ordinary skill in the art to understand that all modifications and changes that do not depart from the essence of the present invention fall within the scope of protection of the present invention as defined by the claims.

Claims (20)

1. A pattern matching method for a data stream, comprising the following steps:
dividing an input data stream into a plurality of fragments and distributing each of the fragments to one of a plurality of pattern matching engines, wherein each pattern matching engine stores one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and
performing, by any one of the plurality of pattern matching engines, a pattern matching comparison on the fragment distributed to it according to the pattern subset stored therein, and outputting the fragment in the case that the fragment produces a match hit against the pattern subset stored in any one of the pattern matching engines.
2. The method of claim 1, wherein, in response to a pattern matching engine finding that the fragment distributed to it cannot produce a match hit against the pattern subset stored in that pattern matching engine, the fragment is passed to another pattern matching engine to continue the pattern matching comparison, wherein the fragment has not yet undergone a pattern matching comparison in said other pattern matching engine; and
in response to a fragment producing no match hit after undergoing pattern matching comparison in all the pattern matching engines, reporting that no match hit occurred for the fragment and ending the inspection of the fragment.
3. The method of claim 2, wherein an indication vector is attached to each fragment to indicate by which pattern matching engines the fragment has undergone a pattern matching comparison.
4. The method of claim 1, wherein, when the input data stream is divided into a plurality of fragments, the input data stream is divided into a plurality of fragments each having a length not less than the detection window length.
5. The method of claim 4, wherein, when the input data stream is divided into a plurality of fragments, the division is made such that no pattern to be compared crosses a cut point between two fragments.
6. The method of claim 5, wherein the cut points in the data stream are determined according to anti-patterns of the patterns to be compared, and the fragments are cut at those cut points, wherein an anti-pattern is a string that cannot form a pattern to be compared even when any suffix and/or prefix is added to it.
7. The method of any one of claims 1 to 6, further comprising:
preprocessing a pattern set comprising a plurality of patterns, so as to divide each pattern whose length exceeds the detection window length into patterns whose lengths do not exceed the detection window length; and
dividing the preprocessed pattern set into the plurality of pattern subsets that are mutually exclusive under the given detection window length.
8. The method of claim 7, wherein the step of dividing the preprocessed pattern set into the plurality of mutually exclusive pattern subsets comprises:
dividing the pattern set into as many mutually exclusive pattern subsets as possible; and
merging the smaller mutually exclusive pattern subsets so as to balance the sizes of the pattern subsets.
9. The method of claim 8, wherein the step of dividing the pattern set into as many mutually exclusive pattern subsets as possible comprises:
selecting a pattern that has not yet been assigned to any already-divided pattern subset;
calculating the minimum merge length of the pattern with each already-divided pattern subset;
judging whether all the calculated minimum merge lengths are greater than the given detection window length; if so, taking the pattern as a new pattern subset; otherwise, taking the pattern together with all pattern subsets whose minimum merge length with the pattern is less than the detection window length as a new pattern subset; and
repeating the above steps until all patterns in the pattern set have been divided into pattern subsets, thereby obtaining several mutually exclusive pattern subsets.
10. The method of claim 1, further comprising: changing at least one of the detection window length and the desired number of mutually exclusive pattern subsets, and dividing the pattern set comprising a plurality of patterns repeatedly, so as to balance the sizes of the divided mutually exclusive pattern subsets.
11. A pattern matching device for a data stream, comprising:
a plurality of pattern matching engines, each pattern matching engine storing one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and
a stream segmentation and distribution unit for dividing an input data stream into a plurality of fragments and distributing the fragments to the pattern matching engines, wherein
any one of the plurality of pattern matching engines performs a pattern matching comparison on the fragment distributed to it according to the pattern subset stored therein, and the fragment is output in the case that it produces a match hit against the pattern subset stored in any one of the pattern matching engines.
12. The device of claim 11, wherein, when a pattern matching engine finds that the fragment distributed to it cannot produce a match hit against the pattern subset stored in that pattern matching engine, the stream segmentation and distribution unit passes the fragment to another pattern matching engine to continue the pattern matching comparison, wherein the fragment has not yet undergone a pattern matching comparison in said other pattern matching engine; and
when the stream segmentation and distribution unit finds that no such other pattern matching engine exists, it reports that no match hit occurred for the fragment and ends the inspection of the fragment.
13. The device of claim 12, wherein the stream segmentation and distribution unit attaches an indication vector to each fragment, and a pattern matching engine modifies the indication vector of a fragment after checking the fragment, thereby indicating by which pattern matching engines the fragment has undergone a pattern matching comparison.
14. The device of claim 11, wherein the stream segmentation and distribution unit divides the input data stream into a plurality of fragments each having a length not less than the detection window length.
15. The device of claim 14, wherein, when dividing the input data stream into a plurality of fragments, the stream segmentation and distribution unit ensures that no pattern to be compared crosses a cut point between two fragments.
16. The device of claim 15, wherein the stream segmentation and distribution unit determines the cut points in the data stream according to anti-patterns of the patterns to be compared and cuts the fragments at those cut points, wherein an anti-pattern is a string that cannot form a pattern to be compared even when any suffix and/or prefix is added to it.
17. The device of any one of claims 11 to 16, further comprising a mutually exclusive subset division unit for preprocessing a pattern set comprising a plurality of patterns so as to divide each pattern whose length exceeds the detection window length into patterns whose lengths do not exceed the detection window length, and for dividing the preprocessed pattern set into a plurality of pattern subsets that are mutually exclusive under the detection window length.
18. The device of claim 17, wherein the mutually exclusive subset division unit first divides the preprocessed pattern set into as many mutually exclusive pattern subsets as possible, and then merges the smaller pattern subsets so as to balance the sizes of the pattern subsets.
19. The device of claim 18, wherein, when dividing the pattern set into as many mutually exclusive pattern subsets as possible, the mutually exclusive subset division unit performs the following operations:
selecting a pattern in the pattern set that has not yet been assigned to any already-divided pattern subset, and calculating the minimum merge length of the pattern with each already-divided pattern subset;
judging whether all the calculated minimum merge lengths are greater than the given detection window length; if so, taking the pattern as a new pattern subset; otherwise, taking the pattern together with all pattern subsets whose minimum merge length with the pattern is less than the detection window length as a new pattern subset; and
repeating the above operations until all patterns in the pattern set have been divided into pattern subsets, thereby obtaining several mutually exclusive pattern subsets.
20. The device of claim 17, further comprising a mutually exclusive subset division unit configured to change at least one of the detection window length and the desired number of mutually exclusive pattern subsets, and to divide the pattern set comprising a plurality of patterns repeatedly, so as to balance the sizes of the divided mutually exclusive pattern subsets.
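The pattern subset division of claims 7 to 9 (device claims 17 to 19) can be illustrated with a short Python sketch. It rests on assumptions of our own, since the text above does not define every term: "minimum merge length" is read as the length of the shortest string containing both patterns, a subset's value is the minimum over its members, a merge length equal to the window is treated as a conflict (both patterns then fit in one window; the translated claim leaves this boundary unclear), long patterns are cut into consecutive non-overlapping chunks, and the balancing step of claim 8 uses a largest-first greedy that we chose. The names `merge_length`, `split_long`, `divide_exclusive` and `balance` are hypothetical.

```python
from typing import List


def merge_length(a: str, b: str) -> int:
    """Length of the shortest string containing both a and b
    (our reading of the claims' 'minimum merge length')."""
    if a in b:
        return len(b)
    if b in a:
        return len(a)
    overlap = 0
    for left, right in ((a, b), (b, a)):
        # longest proper suffix of `left` that is also a prefix of `right`
        for k in range(min(len(left), len(right)) - 1, 0, -1):
            if left.endswith(right[:k]):
                overlap = max(overlap, k)
                break
    return len(a) + len(b) - overlap


def split_long(patterns: List[str], window: int) -> List[str]:
    """Claim 7 preprocessing: cut patterns longer than the window into
    consecutive chunks of at most `window` bytes (chunking rule assumed)."""
    chunks: List[str] = []
    for p in patterns:
        chunks.extend(p[i:i + window] for i in range(0, len(p), window))
    return list(dict.fromkeys(chunks))  # drop duplicates, keep order


def divide_exclusive(patterns: List[str], window: int) -> List[List[str]]:
    """Claim 9: group patterns that could meet inside one detection window."""
    subsets: List[List[str]] = []
    for p in patterns:
        # indices of subsets whose minimum merge length with p fits the window
        hits = [i for i, s in enumerate(subsets)
                if min(merge_length(p, q) for q in s) <= window]
        merged = [p] + [q for i in hits for q in subsets[i]]
        subsets = [s for i, s in enumerate(subsets) if i not in hits] + [merged]
    return subsets


def balance(subsets: List[List[str]], engines: int) -> List[List[str]]:
    """Claim 8 balancing, here a largest-first greedy (our choice);
    unions of exclusive subsets stay mutually exclusive."""
    bins: List[List[str]] = [[] for _ in range(engines)]
    for s in sorted(subsets, key=len, reverse=True):
        min(bins, key=len).extend(s)  # add to the currently smallest engine
    return bins


window = 4
patterns = split_long(["abc", "bcd", "wxyz", "pqrs", "mnopmnop"], window)
subsets = divide_exclusive(patterns, window)
print(subsets)              # [['bcd', 'abc'], ['wxyz'], ['pqrs'], ['mnop']]
print(balance(subsets, 2))  # [['bcd', 'abc', 'mnop'], ['wxyz', 'pqrs']]
```

Under this reading, "abc" and "bcd" share a subset because "abcd" fits in a 4-byte window, while every pair drawn from different subsets has a merge length above the window. A single detection window can then hit at most one engine, which is consistent with stopping the routing at the first match hit.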
CN200910132546.1A 2009-03-31 2009-03-31 Pattern matching method and device for data streams Expired - Fee Related CN101854341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910132546.1A CN101854341B (en) 2009-03-31 2009-03-31 Pattern matching method and device for data streams

Publications (2)

Publication Number Publication Date
CN101854341A (en) 2010-10-06
CN101854341B (en) 2014-03-12

Family

ID=42805613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910132546.1A Expired - Fee Related CN101854341B (en) 2009-03-31 2009-03-31 Pattern matching method and device for data streams

Country Status (1)

Country Link
CN (1) CN101854341B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6631466B1 (en) * 1998-12-31 2003-10-07 Pmc-Sierra Parallel string pattern searches in respective ones of array of nanocomputers
CN1748205A (en) * 2003-02-04 2006-03-15 尖端技术公司 Method and apparatus for data packet pattern matching
CN101296114A (en) * 2007-04-29 2008-10-29 国际商业机器公司 Parallel pattern matching method and system based on stream

Non-Patent Citations (2)

Title
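李庚 (Li Geng) et al.: "A new multi-pattern matching algorithm in intrusion detection", Application Research of Computers (《计算机应用研究》) *
陈瀛 (Chen Ying): "Research on an improved string matching algorithm in intrusion detection technology", China Master's Theses Full-text Database, Information Science and Technology series (《中国优秀硕士学位论文全文数据库信息科技辑》) *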

Cited By (10)

Publication number Priority date Publication date Assignee Title
WO2016095103A1 (en) * 2014-12-16 2016-06-23 华为技术有限公司 Storage space management method and device
US10261715B2 (en) 2014-12-16 2019-04-16 Huawei Technologies Co., Ltd. Storage space management method and apparatus
CN106549969A (en) * 2016-11-21 2017-03-29 英赛克科技(北京)有限公司 Data filtering method and device
CN106549969B (en) * 2016-11-21 2019-10-22 英赛克科技(北京)有限公司 Data filtering method and device
WO2018120915A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 DDoS attack detection method and device
CN108259426A (en) * 2016-12-29 2018-07-06 华为技术有限公司 DDoS attack detection method and device
CN108259426B (en) * 2016-12-29 2020-04-28 华为技术有限公司 DDoS attack detection method and device
CN111641585A (en) * 2016-12-29 2020-09-08 华为技术有限公司 DDoS attack detection method and device
US11095674B2 (en) 2016-12-29 2021-08-17 Huawei Technologies Co., Ltd. DDoS attack detection method and device
CN111641585B (en) * 2016-12-29 2023-11-10 华为技术有限公司 DDoS attack detection method and device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312

Termination date: 20210331