CN1691581B - Multi-pattern matching algorithm based on characteristic value - Google Patents

Info

Publication number
CN1691581B
Authority
CN
China
Prior art keywords
characteristic value
string
character
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200410023142
Other languages
Chinese (zh)
Other versions
CN1691581A (en)
Inventor
彭诗力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN 200410023142 priority Critical patent/CN1691581B/en
Publication of CN1691581A publication Critical patent/CN1691581A/en
Application granted granted Critical
Publication of CN1691581B publication Critical patent/CN1691581B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a characteristic-value-based multi-pattern matching algorithm implemented in hardware. The characteristic value of a string is computed by XOR circuits or by modulo-3 or modulo-5 hardware circuits (see Figures 1, 2, 3 and 4) without CPU intervention. The matching process is as follows: the pattern library is organized into a tree structure by pattern length and characteristic value (see Figures 3 and 4), and direct address mapping is then used to look up and match pattern characteristic values (see Figure 6). The first match, implemented entirely in hardware, filters out the bulk of normal data packets; the second match, implemented in hardware or by the CPU, confirms the suspicious packets. The traffic that needs confirmation is less than 0.4 percent of total traffic.

Description

Multi-pattern matching method based on characteristic value
Technical field
This invention relates to computer network security and extends to fields such as electronic data retrieval, so it is widely applicable. The characteristic-value-based multi-pattern matching algorithm is mainly used in high-speed network intrusion detection systems, where it carries out the pattern matching step of misuse intrusion detection. Experiments show that the algorithm meets the requirements of misuse intrusion detection on high-speed networks.
Background
High-speed networking is the inevitable trend of network development. As networks develop, viruses, network intrusions, spam and the spread of harmful information have become thorny problems. Network processing systems that use current pattern matching algorithms (such as network intrusion detection systems, NIDS) struggle to run effectively on high-speed networks and are forced to drop large numbers of packets.
Single-pattern matching is dominated by the BM algorithm and its improved variants; the well-known open-source intrusion detection software Snort uses the BM algorithm. The AC algorithm is the classic multi-pattern matching algorithm; it uses a finite-state automaton to recognize all the strings in the pattern set. Multi-pattern matching is faster than single-pattern matching, but its processing is more complex.
What these matching algorithms currently have in common is one-pass matching: a single match decides whether a network packet contains a pattern string. Their approach is to match the pattern string directly against the packet and, on a mismatch, to skip a number of characters according to some heuristic before matching again. To skip as many characters as possible, the algorithms have become very complex. They are hard to implement in hardware and rely on the processor for the whole matching process. This consumes large amounts of system resources (mainly CPU) and severely limits the detection rate of the system.
The root cause is that current matching algorithms have the following significant inherent defects:
1. Large numbers of irrelevant network packets consume system resources heavily. Because matching is done in one pass, a large volume of completely irrelevant network data takes part in the whole processing pipeline and wastes storage space and CPU resources, becoming the bottleneck of processing speed. This is inherent in the one-pass design of current algorithms.
2. The pattern matching process is hard to implement in hardware. This follows from the complexity of the matching algorithms: the more complex the algorithm, the harder its hardware implementation.
3. The demands on the processor are high. Because the whole matching process must be handled by the processor, processor speed determines the matching rate. Matching on high-speed networks places very high demands on the processor, and expensive dedicated processors are often used.
Summary of the invention
1 The idea behind characteristic-value-based matching
In real networks, intrusion packets account for only a tiny part of total traffic. System resources are consumed mainly not by detecting intrusion packets but by exhaustively matching normal packets. In view of this, a characteristic-value-based matching algorithm is proposed and implemented in hardware.
To simplify the exposition, the following notation is assumed:
Let P be a pattern string of length m, with characters P_1, P_2, ..., P_m. Let T be the character string in a packet captured from the network, called the network string; it is the text to be matched, of length n, with characters T_1, T_2, ..., T_n. The network substring {T_2, T_3, T_4, ..., T_{m+1}} is obtained by shifting the substring {T_1, T_2, T_3, ..., T_m} one character to the right within the network string. A packet that contains no intrusion signature string is called a normal packet; otherwise it is called an intrusion packet.
Definition 1 (characteristic value): the value obtained from a string by some simple operation that is easy to implement in hardware is called the characteristic value of the string, denoted E. Strings map to characteristic values many-to-one: each string has exactly one characteristic value, and different strings share the same characteristic value only with small probability.
The basic idea of characteristic-value-based matching is to compare the characteristic value of a network substring with the characteristic value of a pattern string of the same length. If the values differ, the two strings certainly do not match; if they are equal, the two strings match with high probability and a second, confirming match is required. In short, matching is done in two passes: the first pass filters out the large number of normal network strings that certainly do not match, and the second pass matches the suspicious network strings exactly. The first-pass algorithm must be simple enough to be implemented directly in hardware, which offloads the CPU, and it must filter out the great majority of normal packets. The first pass is the key step, so the discussion here concentrates on the characteristic-value-based first-pass algorithm.
Let the pattern string be P = {P_1, P_2, P_3, ..., P_m}, with length m and characteristic value E, and let the network string be T = {T_1, T_2, ..., T_n}, with length n. First compute the characteristic value of the length-m substring {T_1, T_2, ..., T_m} of the network string and compare it with the pattern characteristic value E. If they are equal, carry out the second, confirming match; if they are not, this network substring certainly does not match the pattern string. The substring is then shifted one character to the right, becoming {T_2, T_3, ..., T_{m+1}}, and matched against the pattern string again. The characteristic value after the shift can be derived from the value before the shift by a simple operation, and this process is carried out entirely by hardware circuits.
To increase the parallelism of the system, grouped matching can be used. When matching a pattern string of length m in the network string, the network string is first split into substrings of length m each, and the characteristic values of all substrings are computed simultaneously in hardware and compared with the characteristic value of the pattern string. If a value is equal, the second match is carried out; otherwise all substrings are shifted by one character simultaneously and matching continues. Only m shifts in total are needed to complete the whole matching computation.
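The two-pass, grouped scheme described above can be sketched in software as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the groups are visited sequentially rather than in parallel hardware, and a byte-wise XOR is used as a stand-in characteristic value (the computation methods the document actually proposes are defined in section 2).

```python
from functools import reduce
from typing import Callable, List


def xor_value(s: bytes) -> int:
    """Stand-in characteristic value: XOR of all bytes (an assumption here)."""
    return reduce(lambda a, b: a ^ b, s, 0)


def grouped_two_pass(text: bytes, pattern: bytes,
                     value_fn: Callable[[bytes], int] = xor_value) -> List[int]:
    """Return the start offsets at which pattern occurs in text."""
    m = len(pattern)
    target = value_fn(pattern)
    hits = []
    for shift in range(m):                       # m shifts cover every offset
        # One window per group; hardware would evaluate these simultaneously.
        for start in range(shift, len(text) - m + 1, m):
            window = text[start:start + m]
            if value_fn(window) != target:       # first pass: certain mismatch
                continue
            if window == pattern:                # second pass: exact confirmation
                hits.append(start)
    return sorted(hits)


print(grouped_two_pass(b"xxCMDyyCMD", b"CMD"))   # [2, 7]
```

Only windows that survive the first comparison reach the exact comparison, which is the part the document proposes to leave to the CPU or to a separate comparison circuit.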
2 Computing the characteristic value
The characteristic value of a network substring is computed in exactly the same way as that of a pattern string. Several computation methods are possible, but each should meet the following requirements:
1. The computation is simple and can be implemented directly in hardware circuits without CPU intervention.
2. It filters out a large proportion of normal packets; that is, few strings should share the same characteristic value.
3. The characteristic value after a shift can be derived from the value before the shift by a simple operation, which reduces the number of characteristic value computations.
Definition 2 (bit vector): in a string, the binary string formed, in character order, by the bits at the same bit position of each character's ASCII code is called a bit vector. Any string of length m has exactly 8 bit vectors of length m. For example, the ASCII codes of the characters in the string "GOOD" are {01000111, 01001111, 01001111, 01000100}, and the bit vector formed from the lowest bit of each character is {1110}. The string has 8 bit vectors of length 4, namely {0000, 1111, 0000, 0000, 0110, 1111, 1110, 1110}. The characteristic value computed from one bit vector is called a position characteristic value, denoted e_i. The 8 position characteristic values together form the characteristic value E of the string, E = [e_7, e_6, e_5, e_4, e_3, e_2, e_1, e_0].
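As a small illustration of Definition 2, the bit vectors of a string can be extracted as below. The function name is hypothetical; the vectors are listed from bit 7 (most significant) down to bit 0, the same order used for E = [e_7, ..., e_0].

```python
def bit_vectors(s: bytes) -> list[str]:
    """Return the 8 bit vectors of s, from bit 7 down to bit 0."""
    return ["".join(str((ch >> bit) & 1) for ch in s)
            for bit in range(7, -1, -1)]


print(bit_vectors(b"GOOD"))
# ['0000', '1111', '0000', '0000', '0110', '1111', '1110', '1110']
```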
Definition 3 (filter rate): the ratio of the normal data filtered out by the first match to the total network traffic is called the filter rate. In general, the higher the filter rate of the first match, the better. When the network carries no intrusion packets, exact matching algorithms such as BM have a filter rate of 100%.
2.1 XOR evaluation method
XORing all the characters of a string gives the XOR characteristic value of the string. Without loss of generality, let the string be {S_1, S_2, ..., S_m}; its characteristic value is
$$E = S_1 \oplus S_2 \oplus \cdots \oplus S_m$$
which has 8 bits in total. For example, the characteristic value of the pattern string "CMD" is
$$E = 01000011 \oplus 01001101 \oplus 01000100 = 01001010$$
Assuming all codes occur with equal probability, the number of 1s in a bit vector is odd or even with equal probability, so the filter rate of one position characteristic value is 50%. The position characteristic values are mutually independent, so the filter rate of the 8 position characteristic values is 1 - (1/2)^8 = 99.61%. In other words, the suspicious packets that need a second match are 0.39% of total traffic. Experimental results show an average filter rate of 99.69%.
During matching, if the characteristic value of the network substring differs from that of the pattern string, the network substring must be shifted. XORing the characteristic value before the shift with the character shifted in and the character shifted out gives the characteristic value after the shift. For example, when matching the pattern string "PING" in the string "POWER", first compute the characteristic value of the substring "POWE", 'P' ⊕ 'O' ⊕ 'W' ⊕ 'E' = [00001101]. It does not match the characteristic value [00010000] of "PING", so the substring is shifted one character to the right and becomes "OWER". The characteristic value of this substring is obtained as [00001101] ⊕ 'P' ⊕ 'R' = [00001111], without performing another 3 XOR operations.
When matching a pattern string of length m, computing the first characteristic value of a length-m network substring requires (m-1) XOR operations; each later shift requires only 2 XOR operations.
The hardware implementation uses a simple XOR circuit, as shown in Fig. 1. Register E holds the characteristic value of the length-m network substring; its initial value is 0. D_m to D_0 form a shift register. The network string is fed in character by character at D_m, and the XOR circuit produces the characteristic value of the length-m network substring in register E.
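The XOR characteristic value and its shift update can be modelled in a few lines of software. This is a sketch rather than the circuit of Fig. 1; the function names are hypothetical, and the second pass is a plain byte comparison.

```python
def xor_value(s: bytes) -> int:
    """XOR characteristic value: all characters XORed together (8 bits)."""
    value = 0
    for ch in s:
        value ^= ch
    return value


def xor_scan(text: bytes, pattern: bytes) -> list[int]:
    """Find pattern in text with an XOR first pass and an exact second pass."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = xor_value(pattern)
    value = xor_value(text[:m])              # first window: m - 1 XOR operations
    hits = []
    for start in range(len(text) - m + 1):
        if start > 0:
            # Shift by one: remove the outgoing and add the incoming character.
            value ^= text[start - 1] ^ text[start + m - 1]
        if value == target and text[start:start + m] == pattern:
            hits.append(start)
    return hits


print(format(xor_value(b"PING"), "08b"))     # 00010000
print(xor_scan(b"POWER", b"PING"))           # []
```

Each shift costs two XOR operations regardless of the pattern length, which is the property the hardware shift register exploits.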
2.2 Modulo-3 evaluation method
Let a bit vector be {b_m, b_{m-1}, ..., b_1}. Choose a suitable positive integer r and form the expression
$$b_m r^{m-1} + b_{m-1} r^{m-2} + \cdots + b_1 \qquad (1)$$
Then choose a suitable positive integer M (generally the smallest prime greater than or equal to max(b_i), 1 ≤ i ≤ m) and, in the residue class ring Z_M, let
$$e = \sum_{i=1}^{m} b_i r^{i-1} \qquad (2)$$
Then e is the position characteristic value of this bit vector.
To improve the filter rate, the base r should have as large an order as possible in the group of units of Z_M. If M is prime, r should have order (M-1) in Z_M^*, that is, r should be a generator of the cyclic group Z_M^*. For binary bit vectors, M = 3 and r = 2 are generally chosen. Choosing M = 2 with all weights equal to 1 gives exactly the XOR evaluation method.
When e is used as a check code for the string, a single-bit error is always detected. The probabilities of detecting 2 errors and of detecting 3 or more errors are given by formulas that are missing from this text [formula image G2004100231426D00043].
According to the check code result above, for binary bit vectors with M = 3 and r = 2 and a pattern string of length m:
When m = 1, the probability that two different bit vectors have the same position characteristic value is 0, and the filter rate is 100%.
When m = 2, the probability that two different bit vectors have the same position characteristic value is 1/2, which degrades to the filtering ability of XOR.
When m ≥ 3, the probability that two different bit vectors have the same position characteristic value is 1/3. Assuming all strings are equally likely, one position characteristic value filters out 2/3 of normal packets. The position characteristic values are mutually independent, so the filter rate of the string characteristic value formed from 8 position characteristic values is 1 - (1/3)^8 = 99.985%. The suspicious data that needs a second match is only 0.015% of total traffic.
The values of 2^i (i ≥ 0) in the modulo-3 residue class ring are 1, 2, 1, 2, ..., so a position characteristic value e_i is obtained by multiplying each bit of the bit vector by its weight and adding the products modulo 3. For example, the position characteristic value of the bit vector [111] is (1*1 + 1*2 + 1*1) mod 3 = 01. When the weight of a bit is 2, the bit is shifted left by one position; when the weight is 1, no shift is needed.
To allow a direct hardware implementation, a 0 is first inserted before each bit of each character's ASCII code, which expands each character to 16 bits. Odd-numbered characters are then left unchanged and even-numbered characters are shifted left by one bit, which completes the weighting. Finally the 16 bits are treated as groups of two bits, and corresponding groups are added modulo 3.
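The expand, weight and add steps just described can be sketched as follows. The helper names are hypothetical, and the code models the arithmetic only, not the circuit: each character becomes eight 2-bit groups packed into a 16-bit word, even-numbered characters are shifted left by one bit, and words are added group by group modulo 3.

```python
def expand(ch: int) -> int:
    """Insert a 0 before every bit: one 8-bit character becomes 16 bits."""
    word = 0
    for bit in range(8):
        word |= ((ch >> bit) & 1) << (2 * bit)
    return word


def add_mod3(x: int, y: int) -> int:
    """Add two 16-bit words, treating each 2-bit group as a residue mod 3."""
    out = 0
    for g in range(8):
        a = (x >> (2 * g)) & 0b11
        b = (y >> (2 * g)) & 0b11
        out |= ((a + b) % 3) << (2 * g)
    return out


def mod3_value(s: bytes) -> int:
    """Modulo-3 characteristic value with weights 1, 2, 1, 2, ..."""
    value = 0
    for index, ch in enumerate(s):
        word = expand(ch)
        if index % 2 == 1:            # 2nd, 4th, ... character: weight 2
            word <<= 1                # 01 -> 10 inside each group
        value = add_mod3(value, word)
    return value


def groups(value: int) -> str:
    """Format a 16-bit value as eight 2-bit groups, bit 7 first."""
    return " ".join(format((value >> (2 * g)) & 0b11, "02b")
                    for g in range(7, -1, -1))


# Three characters whose lowest bits form the bit vector [111]:
print(groups(mod3_value(b"\x01\x01\x01")))   # 00 00 00 00 00 00 00 01
```

The lowest group reproduces the [111] example above, (1*1 + 1*2 + 1*1) mod 3 = 01.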
For example, the modulo-3 characteristic value of "CMD" is computed as follows.
The ASCII codes of the characters in "CMD" are {01000011, 01001101, 01000100}. Expanded to 16 bits they are:
C     00 01 00 00 00 00 01 01
M     00 01 00 00 01 01 00 01
D     00 01 00 00 00 01 00 00
After the weighting shift ("M" is shifted left by one bit; "C" and "D" are unchanged), the groups of 2 bits are added modulo 3:
C     00 01 00 00 00 00 01 01
M     00 10 00 00 10 10 00 10
D     00 01 00 00 00 01 00 00
sum   00 01 00 00 10 00 01 00   (mod 3)
That is, the characteristic value of the string "CMD" is [00 01 00 00 10 00 01 00].
When the characteristic value of the network substring differs from that of the pattern string, the network substring must be shifted so that the next substring can be matched. The weighted expansion value of the character shifted in is added modulo 3 to the characteristic value before the shift, and the weighted expansion value of the character shifted out is then subtracted modulo 3; the result is the characteristic value after the shift. Subtracting the weighted expansion value of a character modulo 3 is the same as adding, modulo 3, the ones' complement of that weighted expansion value. When matching a pattern string of length m, computing the first network substring characteristic value requires (m-1) modulo-3 additions; each later shift requires only 2 modulo-3 additions.
When the characteristic values do not match, the characteristic value of the network substring after an odd number of shifts is computed according to
$$e' = \sum_{j=1}^{m} b_j r^{j} \qquad (3)$$
and must be converted to the characteristic value defined by formula (2). In modulo-3 arithmetic with r = 2,
$$e' = \sum_{j=1}^{m} b_j 2^{j} = 2 \sum_{j=1}^{m} b_j 2^{j-1} = 2e \equiv -e \pmod 3$$
$$e \equiv -e' \pmod 3$$
so taking the modulo-3 complement of the characteristic value after an odd number of shifts completes the conversion. The modulo-3 complement is obtained by swapping the two bits of each position characteristic value: the modulo-3 complement of [01] is [10], that of [10] is [01], and that of [00] is [00].
The hardware circuit for modulo-3 evaluation is shown in Fig. 2. After passing through the expansion and weighting circuit, the network string is fed character by character into the shift registers D_m to D_0; the modulo-3 adder circuit then produces the characteristic value of the length-m network substring, which is stored in register E. After an even number of shifts the modulo-3 characteristic value is output directly through the selection circuit; after an odd number of shifts it first passes through the modulo-3 complement circuit and is then output through the selection circuit.
For example, the characteristic value of the pattern string "CMD" is E = [00 01 00 00 10 00 01 00], and the network string to be matched is "HELL". So that the network string and the pattern string are weighted in the same order, the network string is weighted and grouped from right to left; the weight of each letter is:
          H [E L L]
Weight:   2  1 2 1
First compute the characteristic value of "ELL", E_1 = [00 01 00 00 00 01 00 01], which does not match the pattern characteristic value E. The network substring is shifted left by one character, and the characteristic value of "HEL" is computed as E_2 = (E_1 + "2H" + "~L") mod 3 = [00 10 00 00 01 00 00 01]. The modulo-3 complement of E_2 is [00 01 00 00 10 00 00 10]. It is compared with the pattern characteristic value E again and does not match, so the string "HELL" does not contain the pattern string "CMD" and matching ends.
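The shift update and the complement step for odd shifts can be sketched in software as follows. Several things here are assumptions made for illustration: the function names are hypothetical, each characteristic value is held as a list of 8 residues (bit 7 first) instead of a packed 16-bit word, and the text is scanned left to right with weights fixed by absolute position, whereas the "HELL" example above weights and shifts from right to left.

```python
def weighted(ch: int, index: int) -> list[int]:
    """Residues (bit 7 .. bit 0) of one character; weight 1 at even index, 2 at odd."""
    w = 1 if index % 2 == 0 else 2
    return [(w * ((ch >> bit) & 1)) % 3 for bit in range(7, -1, -1)]


def add(x: list[int], y: list[int]) -> list[int]:
    return [(a + b) % 3 for a, b in zip(x, y)]


def complement(x: list[int]) -> list[int]:
    """Modulo-3 complement: 1 <-> 2, 0 stays 0 (swaps the two bits of a group)."""
    return [(-a) % 3 for a in x]


def mod3_value(s: bytes) -> list[int]:
    value = [0] * 8
    for i, ch in enumerate(s):
        value = add(value, weighted(ch, i))
    return value


def mod3_scan(text: bytes, pattern: bytes) -> list[int]:
    """Find pattern in text: modulo-3 first pass, exact second pass."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = mod3_value(pattern)
    raw = mod3_value(text[:m])
    hits = []
    for start in range(len(text) - m + 1):
        if start > 0:
            # Add the incoming character, subtract the outgoing one (mod 3).
            raw = add(raw, weighted(text[start + m - 1], start + m - 1))
            raw = add(raw, complement(weighted(text[start - 1], start - 1)))
        # After an odd number of shifts the weights are 2, 1, 2, ...: convert.
        value = raw if start % 2 == 0 else complement(raw)
        if value == target and text[start:start + m] == pattern:
            hits.append(start)
    return hits


print(mod3_scan(b"HELLCMD", b"CMD"))   # [4]
print(mod3_scan(b"HELL", b"CMD"))      # []
```

The complement works because 2 is congruent to -1 modulo 3, so doubling every weight only flips the sign of the sum.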
The modulus M can also be another prime. With M = 5 the weights of successive characters in the string are 1, 2, 4, 3, 1, 2, 4, 3, ... The filter rate of the corresponding algorithm is 1 - (1/5)^8 = 99.99974%, so the data that needs a second match is only 0.00026% of total traffic. The filter rate rises as M increases, but the hardware circuit becomes harder to implement and its cost rises as well.
3 Multi-pattern matching based on characteristic value
In 1975 Aho and Corasick proposed a multi-pattern matching algorithm based on a finite-state machine (the AC algorithm), which searches for multiple strings in parallel. The search time is O(n), and the time to build the automaton is linear in the length of the pattern strings. A commonly used algorithm today is AC_BM, a combination of the AC and BM algorithms: the rules are placed in a pattern tree, and the tree is then searched with the BM algorithm.
The idea of characteristic-value-based multi-pattern matching is as follows. The rules in the pattern library are grouped by length; within each group they are sorted by characteristic value and indexed, and rules in a group that share a characteristic value are linked as a list under the same characteristic value index. The first match finds the characteristic value index that corresponds to a suspicious string; the second, exact match then compares the suspicious string with the rules themselves.
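A software sketch of this organization is shown below, assuming the XOR characteristic value and hypothetical function names. Dictionaries stand in for the sorted index, and Python lists stand in for the linked lists of rules that share a characteristic value.

```python
from collections import defaultdict


def xor_value(s: bytes) -> int:
    value = 0
    for ch in s:
        value ^= ch
    return value


def build_library(patterns: list[bytes]) -> dict:
    """Group rules by length, then index them by characteristic value."""
    library = defaultdict(lambda: defaultdict(list))
    for p in patterns:
        library[len(p)][xor_value(p)].append(p)   # equal values share one chain
    return library


def match_all(text: bytes, library: dict) -> list[tuple[int, bytes]]:
    """Return (offset, rule) pairs for every rule found in text."""
    hits = []
    for length, index in sorted(library.items()):
        for start in range(len(text) - length + 1):
            window = text[start:start + length]
            for rule in index.get(xor_value(window), ()):   # first match
                if window == rule:                          # second match
                    hits.append((start, rule))
    return hits


library = build_library([b"CMD", b"PING", b"DMC"])
print(match_all(b"xxPINGxDMC", library))   # [(7, b'DMC'), (2, b'PING')]
```

"CMD" and "DMC" have the same length and the same XOR value, so they end up in the same chain and are told apart only by the second match.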
Description of drawings:
Fig. 1: XOR evaluation circuit.
Fig. 2: Modulo-3 evaluation circuit.
Fig. 3: Organization of the rule base.
Fig. 4: XOR multi-pattern evaluation diagram.
Fig. 5: Modulo-3 multi-pattern evaluation diagram.
Fig. 6: Matching circuit diagram.
Fig. 7: Processing time comparison.
Embodiment:
1 Organization of the pattern library
The whole pattern library forms a tree structure in which the length of a pattern string is denoted L (assume Max(L) = m). Initialization time is linear in the size of the pattern library. The organization of the whole library is shown in Fig. 3. Characteristic values are matched by the first-match circuit; the rule matching that follows is carried out by the second match.
2 Hardware implementation of multi-pattern characteristic value computation
The multi-pattern characteristic value circuit computes the characteristic values of network substrings of different lengths. The whole computation is performed by hardware alone, without CPU intervention.
2.1 XOR hardware computation circuit
To speed up computation, a bank of shift registers and a bank of XOR circuits compute the characteristic values of network substrings of different lengths simultaneously. The circuit is shown in Fig. 4 (for simplicity, the maximum pattern string length in the rule base is assumed to be 7).
Let the input network string be T = {"abcdefghijklmn"}, fed character by character into the shift registers D_7 to D_0; all registers are initially 0.
On the first beat, character 'a' moves into D_7 and, after passing through the XOR circuit, is stored in R_7.
On the second beat, character 'b' moves into D_7, is XORed with the 'a' in R_7, and the result is stored back in R_7; character 'a' moves into D_6 and, after passing through the XOR circuit, is stored in R_6.
On the seventh beat, D_7 to D_1 hold "gfedcba"; R_7 holds the value
$$a \oplus b \oplus c \oplus d \oplus e \oplus f \oplus g$$
R_6 holds the value a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ f, and so on.
On the eighth beat, character 'a' moves into D_0 and is fed back and XORed with every characteristic value, which removes 'a'. R_7 now holds the value
$$b \oplus c \oplus d \oplus e \oplus f \oplus g \oplus h$$
and R_6 holds the value b ⊕ c ⊕ d ⊕ e ⊕ f ⊕ g. In other words, R_7 always holds the characteristic value of a network substring of length 7, R_6 always holds that of a network substring of length 6, and so on.
Each time a character is shifted in, the XOR characteristic values of network substrings of all lengths are obtained simultaneously and are matched against the pattern string characteristic values of the same length in the pattern library: the value in R_7 is matched against the L = 7 branch of the pattern library, the value in R_6 against the L = 6 branch, and so on.
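The beat-by-beat behaviour described above can be simulated in software. This is a hypothetical model of the register bank, not the circuit of Fig. 4: `delay` plays the role of the shift registers D_7 to D_0 and `regs[L]` plays the role of R_L.

```python
from collections import deque


def xor_bank(text: bytes, max_len: int = 7):
    """Yield (start, {length: value}) once per beat after the bank has filled.

    For each start offset, value is the XOR of text[start:start + length].
    """
    delay = deque(maxlen=max_len + 1)    # delay[0] is D_max_len, last stage is D_0
    regs = [0] * (max_len + 1)           # regs[L] models R_L; regs[0] is unused
    for beat, ch in enumerate(text, start=1):
        delay.appendleft(ch)             # every character moves one stage along
        for length in range(1, max_len + 1):
            stage = max_len - length     # position of D_length in the delay line
            if stage < len(delay):
                regs[length] ^= delay[stage]   # character arriving at D_length
        if len(delay) == max_len + 1:
            for length in range(1, max_len + 1):
                regs[length] ^= delay[-1]      # character at D_0 is removed
        if beat >= max_len:
            yield beat - max_len, {L: regs[L] for L in range(1, max_len + 1)}


for start, values in xor_bank(b"abcdefghijklmn"):
    print(start, format(values[7], "08b"), format(values[6], "08b"))
```

In this model the last few start offsets of the text are never emitted, because the longest window would run past the end of the input; a real stream would simply keep shifting.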
2.2 Modulo-3 hardware computation circuit
The XOR circuit is very fast, but its filter rate is not very high, and many suspicious network strings still need a second match. To meet the requirements of even faster networks, algorithms with a higher filter rate should be used, such as the modulo-3, modulo-5 and modulo-7 algorithms.
The hardware circuit of the modulo-3 algorithm is shown in Fig. 5 (for simplicity, the maximum pattern string length is assumed to be 5 and all registers are initially 0).
The input string first passes through weighted expansion, which turns each character into 16 bits. It is then fed into the shift registers; when a character reaches D_0 it passes through the inverter circuit and is fed back and added modulo 3 to every characteristic value, which removes that character.
For example, let the input network string be T = {abcdef}. The ASCII codes of the characters are {61H, 62H, 63H, 64H, 65H, 66H}; after expansion and shifting they become {2802H, 1404H, 280AH, 1410H, 2822H, 1414H} and are fed in turn into D_5 to D_0.
On the first beat, 2802H moves into D_5 and is stored in R_5 through the modulo-3 adder circuit.
On the second beat, 1404H moves into D_5 and is added group by group modulo 3 to the 2802H in R_5, giving 0006H, which is stored back in R_5; 2802H moves into D_4 and is stored in R_4 through the modulo-3 adder circuit.
On the fifth beat, D_1 to D_5 hold {2802H, 1404H, 280AH, 1410H, 2822H}; R_5 holds the modulo-3 characteristic value of the string {abcde}, R_4 holds that of {abcd}, and so on.
On the sixth beat, the weighted expansion value of character 'a' moves into D_0, is inverted, and is fed back and added modulo 3 to every characteristic value, which removes 'a'. R_5 now holds the modulo-3 characteristic value of {bcdef} computed by formula (3); the complement circuit converts it to the modulo-3 characteristic value defined by formula (2). R_4 holds the modulo-3 characteristic value of {bcde} computed by formula (3), and the complement circuit outputs the value defined by formula (2). The same applies to the other registers.
From then on, each beat yields the modulo-3 characteristic values of network substrings of all lengths simultaneously, and these are matched against the pattern string characteristic values of the same length in the pattern library.
3 Handling of the matching process
To speed up matching and cut the time spent searching the pattern library for characteristic values, direct address mapping is used to look up pattern characteristic values. When the pattern library is initialized, a block of 256 storage units (bytes or words) is allocated for the rule tree branch of each length, and the address of the first unit is kept in the corresponding base register; it is the start address in memory of the pattern characteristic values for that length. Each storage unit holds a pointer to the rule whose pattern characteristic value equals the unit's offset address. If no pattern characteristic value equals the offset address, the pointer is null. If several rules have a characteristic value equal to the offset address, the other rules are linked in turn after the first rule, forming a linked list. Once the characteristic value circuit has produced the characteristic value of a network substring of a given length, the storage unit whose offset address equals that value is located directly. If the pointer in that unit is null, the substring has no match in the pattern library; otherwise the substring goes through the second match against the rules the pointer leads to. With a pattern library built from 100 rules taken from the Snort rule base, storage units with a non-null pointer make up 2.07% of the total storage space and rules that share a storage unit make up 0.19%; as the pattern library grows, the number of storage units with a non-null pointer grows too.
The whole matching circuit is shown in Fig. 6. After the input network string passes through the multi-pattern characteristic value circuit, the characteristic values of substrings of all lengths are available. The characteristic value of a substring is used as an offset address and added to the base address of the pattern strings of the same length, which gives an actual physical address. If the pointer in the storage unit at that physical address is null, the network substring has no match in the pattern library; if the pointer is not null, the network substring and the rules the pointer leads to go through the second, confirming match. The time from computing the characteristic value of a network substring to locating the pattern characteristic value of the corresponding length is therefore very short and can essentially be ignored.
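Direct address mapping can be sketched with a flat list standing in for memory. Everything named here is hypothetical: `base` models the base registers, `None` models a null pointer, a Python list models the chain of rules behind one storage unit, and the XOR characteristic value is assumed so that each length needs 256 units.

```python
SLOTS = 256   # one storage unit per possible 8-bit characteristic value


def xor_value(s: bytes) -> int:
    value = 0
    for ch in s:
        value ^= ch
    return value


def build_memory(patterns: list[bytes]):
    """Lay out one 256-unit block per pattern length and fill in the chains."""
    lengths = sorted({len(p) for p in patterns})
    base = {length: i * SLOTS for i, length in enumerate(lengths)}
    memory = [None] * (SLOTS * len(lengths))
    for p in patterns:
        address = base[len(p)] + xor_value(p)    # base address + offset
        if memory[address] is None:
            memory[address] = []
        memory[address].append(p)                # equal values are chained
    return base, memory


def lookup(window: bytes, base: dict, memory: list):
    """Return the matching rule for window, or None if there is none."""
    if len(window) not in base:
        return None
    chain = memory[base[len(window)] + xor_value(window)]
    if chain is None:                            # null pointer: certain mismatch
        return None
    for rule in chain:                           # second, confirming match
        if rule == window:
            return rule
    return None


base, memory = build_memory([b"CMD", b"PING", b"DMC"])
print(lookup(b"DMC", base, memory))   # b'DMC'
print(lookup(b"MCD", base, memory))   # None
```

"MCD" reaches the same storage unit as "CMD" and "DMC" because XOR ignores character order, and it is rejected only by the second match; this is the kind of suspicious string the filter rate accounts for.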
If adopt the method calculating character string characteristic value of mould 5, its filterability can reach 99.99974%, needs the data volume of the coupling second time considerably less, only accounts for 0.00026% of network total flow.Become the bottleneck of processing but hardware realization more complicated, the time-delay of data on hardware also can increase to some extent, need a plurality of hardware circuit parallel processings to solve.
This algorithm mainly is to adopt the method for twice coupling and the burden that hardware realizes mitigation system CPU, and the misuse intruding detection system can be moved in express network effectively.Whole match circuit can be made card format, is inserted in the PCI slot of system board, or directly is integrated on network interface card or other circuit board.The computational process of characteristic value is finished jointly by shift register group, XOR circuit group (or mould 3 counting circuit groups) and registers group in the coupling for the first time.Wherein shift register group directly receives the packet that the high speed network interface card transmits, and process XOR circuit group (or mould 3 counting circuit groups) calculates the characteristic value of the network character substring of different length, is kept in the registers group then.When a cover counting circuit can't meet the demands, can be by the parallel processing simultaneously of many covers counting circuit.The mode of rule storehouse is kept among the EEPROM after by initialization process, directly links to each other with the counting circuit group, to avoid the frequent initialization process and the data transfer delay in pattern storehouse.When need upgrading, the pattern storehouse in EEPROM, adds rule again.Matching process can be born by CPU for the second time, also can be realized by hardware comparison circuit.When after finishing whole matching process, finding the invasion packet, transfer the regular code of packet and its correspondence to the system processing of further analyzing and report to the police together.
To further increase processing speed, this algorithm can be combined with protocol analysis. Network packets of different protocol types are distributed to different pattern matching circuits, and the EEPROM of each matching circuit stores the matching rules for the corresponding protocol type. This raises a load-balancing problem, but it effectively increases processing speed and simplifies the system's subsequent processing.
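The distribution by protocol type can be illustrated with a small software sketch. The protocol labels, the rule strings and the names `RULES_BY_PROTOCOL`, `dispatch` and `match_queue` are all hypothetical; the patent does not list protocols or rules, and a plain substring search stands in here for a matching circuit.

```python
from collections import defaultdict

# Hypothetical per-protocol rule sets, standing in for the EEPROM of each circuit.
RULES_BY_PROTOCOL = {
    "http": [b"/etc/passwd", b"cmd.exe"],
    "ftp": [b"SITE EXEC"],
}


def dispatch(packets):
    """Group (protocol, payload) pairs so each matcher sees only its own protocol."""
    queues = defaultdict(list)
    for protocol, payload in packets:
        if protocol in RULES_BY_PROTOCOL:  # assumption: protocols without rules are skipped
            queues[protocol].append(payload)
    return dict(queues)


def match_queue(protocol, payloads):
    """Stand-in for one matching circuit: search payloads for that protocol's rules."""
    rules = RULES_BY_PROTOCOL[protocol]
    return [(payload, rule) for payload in payloads for rule in rules if rule in payload]
```

Because each queue is checked only against the rules of its own protocol, every matcher works with a smaller pattern library; how evenly traffic spreads across the queues is the load-balancing question the text mentions.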
4 Beneficial effects
To test the actual performance of the algorithm, 150M of data was captured at random from a network, with pattern strings taken from the rule base of the Snort intrusion detection system. For the characteristic-value-based multi-pattern matching algorithm, three times were measured: the execution time of the software simulation, the processing time of the XOR hardware implementation, and the processing time of the mod 3 hardware implementation. The time from computing a characteristic value to finding the pattern characteristic value of the corresponding length turned out to be very short; the algorithm spends most of its time in the second match.
The measured times are shown in Table 1:
Table 1 Processing time (unit: seconds)
Pattern strings   BM algorithm   AC algorithm   XOR software simulation   XOR hardware   Mod 3 hardware
10                64.78          19.74          10.14                     0.15           0.035
20                19.69          20.87          12.78                     0.29           0.010
50                48.79          22.29          15.71                     0.71           0.055
100               96.91          25.41          21.43                     1.39           0.079
150               140.43         28.91          23.27                     2.11           0.104
200               186.65         32.86          27.81                     2.79           0.128
500               485.72         39.97          34.16                     6.97           0.27
Experimental conditions: 500 MHz Intel CPU, 256 MB RAM, Windows 2000.
The data in the table show that the processing time of the BM algorithm grows linearly with the number of pattern strings and is far longer than that of the other algorithms. The AC algorithm is very effective for multi-pattern matching, but still cannot meet the requirements of intrusion detection on high-speed networks. The XOR hardware implementation can basically meet the requirements of intrusion detection on a 100M network; the mod 3 hardware implementation can meet those of a 1000M network; for even faster networks, hardware circuits such as mod 5 or mod 7 can be used.
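The network-speed statements can be checked with simple arithmetic on the 500-pattern row of Table 1. This is a rough sketch under two assumptions that the text does not state explicitly: that "150M" means 150 megabytes of captured data, and that "100M" and "1000M" networks mean 100 Mbit/s and 1000 Mbit/s. The names `TIMES_500` and `throughput_mbit_per_s` are illustrative.

```python
# Times in seconds for 500 pattern strings, copied from Table 1.
TIMES_500 = {
    "BM": 485.72,
    "AC": 39.97,
    "XOR hardware": 6.97,
    "mod 3 hardware": 0.27,
}

DATA_MEGABYTES = 150  # assumption: "150M" means 150 megabytes


def throughput_mbit_per_s(seconds, megabytes=DATA_MEGABYTES):
    """Convert a processing time into sustained throughput in Mbit/s."""
    return megabytes * 8 / seconds


for name, seconds in TIMES_500.items():
    print(f"{name:>15}: {throughput_mbit_per_s(seconds):8.1f} Mbit/s")
```

Under these assumptions the XOR hardware implementation sustains roughly 172 Mbit/s and the mod 3 hardware implementation well over 1000 Mbit/s, while the AC algorithm stays near 30 Mbit/s, which agrees with the conclusions drawn from the table.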
A chart shows the processing times of the different algorithms more clearly. Because the BM algorithm takes far too long, it is omitted from the chart. The comparison is shown in Figure 7.
As the number of rules in the pattern library grows, the processing time of every algorithm increases, but the characteristic value algorithm is not very sensitive to this growth. A larger pattern library has essentially no effect on the time of the first match; only the time of the second match increases slightly.
The essence of the characteristic-value-based matching algorithm is the idea of two-pass matching, which departs completely from existing matching algorithms. The processing speed bottleneck is moved to the first match, which is implemented in hardware and filters out the large volume of normal packets; a small number of suspicious packets then undergo an exact second match. Preferably both the first and the second match are implemented in hardware, to reduce the load on the system and improve efficiency. There are many ways to compute a characteristic value, such as summation or cyclic redundancy calculation, but they should satisfy, as far as possible, the conditions for characteristic value calculation given earlier.
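The alternative ways of computing a characteristic value can be sketched as interchangeable functions. This is only an illustration: folding every result to 8 bits is an assumption made so that the value could index a 256-entry table, and the function names `cv_xor`, `cv_sum` and `cv_crc` are hypothetical. The CRC variant uses Python's standard `zlib.crc32` and keeps only its low byte, which is not necessarily the cyclic redundancy calculation the patent has in mind.

```python
import zlib
from functools import reduce
from operator import xor


def cv_xor(data):
    """XOR of all bytes (the calculation named in the abstract)."""
    return reduce(xor, data, 0)


def cv_sum(data):
    """Byte sum folded to 8 bits (one reading of 'summation')."""
    return sum(data) % 256


def cv_crc(data):
    """Low byte of CRC-32, as a stand-in for a cyclic redundancy calculation."""
    return zlib.crc32(data) & 0xFF


if __name__ == "__main__":
    for cv in (cv_xor, cv_sum, cv_crc):
        print(cv.__name__, cv(b"ab"))
```

Whichever function is chosen, the same function has to be applied both to the pattern strings when the library is built and to the network substrings during the first match.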

Claims (3)

1. A multi-pattern matching method based on characteristic values, which uses two-pass matching: the first match filters out the large volume of normal packets, and the second match further confirms the suspicious packets; the first match is implemented entirely in hardware, and the second match can be implemented in hardware or performed by the CPU; characteristic values are matched by direct address mapping: a character string passes through the multi-pattern characteristic value calculation circuit to obtain its characteristic value, which is used as an offset address and added to the base address of the pattern string characteristic values of the same length as the string to form the actual physical address; if the pointer in the memory cell at that physical address is null, the string has no match in the pattern library; otherwise the string and the pattern string the pointer refers to undergo the second match for confirmation.
2. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the multi-pattern characteristic value calculation circuit consists of a shift register group, a calculation circuit group and a characteristic value register group, where the calculation circuit group is an XOR calculation circuit group, a mod 3 calculation circuit group or a mod 5 calculation circuit group; the characteristic values of strings of all lengths are computed simultaneously by the multi-pattern characteristic value calculation circuit, without CPU intervention.
3. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the pattern strings in the pattern library are grouped by length; a storage space of 256 bytes or words is allocated to the pattern strings of each length, and the address of its first cell is stored in the corresponding base register, representing the start address in memory of the pattern string characteristic values of that length; each memory cell stores a pointer to the pattern string whose characteristic value equals the cell's offset address; if no pattern string has a characteristic value equal to the offset address, the pointer is null; if several pattern strings have characteristic values equal to the offset address, the other pattern strings are linked in order behind the first one, forming a linked list.
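The pattern library layout and the direct-address lookup in the claims can be modelled in software as follows. This is a minimal sketch, not the claimed hardware: it assumes an 8-bit XOR characteristic value, uses one 256-slot Python list per pattern length in place of a memory region addressed by a base register, and uses `None` for the null pointer. The names `characteristic_value`, `build_library` and `match` are illustrative.

```python
from functools import reduce
from operator import xor


def characteristic_value(data):
    """Assumed 8-bit characteristic value: XOR of all bytes."""
    return reduce(xor, data, 0)


def build_library(patterns):
    """Map each pattern length to a 256-slot table of chains (None = null pointer)."""
    library = {}
    for pattern in patterns:
        table = library.setdefault(len(pattern), [None] * 256)
        slot = characteristic_value(pattern)  # offset address within the table
        if table[slot] is None:
            table[slot] = []                  # first pattern string at this offset
        table[slot].append(pattern)           # later ones are chained behind it
    return library


def match(library, candidate):
    """First match: direct-address lookup. Second match: exact comparison."""
    table = library.get(len(candidate))
    if table is None:
        return None                           # no pattern strings of this length
    chain = table[characteristic_value(candidate)]  # base address + offset
    if chain is None:
        return None                           # null pointer: filtered out by the first match
    return next((p for p in chain if p == candidate), None)
```

For example, with the patterns `b"ab"` and `b"ba"` both strings share the characteristic value 3 and end up chained in the same slot. A candidate such as `b"cd"` hits an empty slot and is rejected by the first match alone, while a candidate with the same characteristic value but different content is rejected only by the exact comparison.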
CN 200410023142 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value Expired - Fee Related CN1691581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410023142 CN1691581B (en) 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value

Publications (2)

Publication Number Publication Date
CN1691581A CN1691581A (en) 2005-11-02
CN1691581B true CN1691581B (en) 2010-04-28

Family

ID=35346743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410023142 Expired - Fee Related CN1691581B (en) 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value

Country Status (1)

Country Link
CN (1) CN1691581B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453438A (en) * 2016-12-23 2017-02-22 北京奇虎科技有限公司 Network attack identification method and apparatus

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101009660B (en) * 2007-01-19 2010-06-30 杭州华三通信技术有限公司 Universal method and device for processing the match of the segmented message mode
CN101409623B (en) * 2008-11-26 2010-09-01 湖南大学 Mode matching method facing to high speed network
CN101873199B (en) * 2010-06-29 2014-11-05 中兴通讯股份有限公司 Matching method and device of code words
CN101930458B (en) * 2010-08-18 2012-02-01 杭州东信北邮信息技术有限公司 Short message matching method based on characteristic value
CN105354150B (en) * 2015-10-31 2018-03-16 杭州华为数字技术有限公司 A kind of content matching method and apparatus
CN106101060B (en) * 2016-05-24 2021-02-12 新华三技术有限公司 Information detection method and device
CN106936834B (en) * 2017-03-16 2020-12-11 国网江苏省电力公司淮安供电公司 Method for intrusion detection of IEC61850 digital substation SMV message
CN110502611B (en) * 2019-08-01 2022-04-12 武汉虹信科技发展有限责任公司 Character string retrieval method and device
CN112118248B (en) * 2020-09-11 2022-06-14 苏州浪潮智能科技有限公司 Cloud platform virtual machine abnormal flow detection method and device, virtual machine and system
CN113051569B (en) * 2021-03-31 2024-05-28 深信服科技股份有限公司 Virus detection method and device, electronic equipment and storage medium
CN114285624B (en) * 2021-12-21 2024-05-24 天翼云科技有限公司 Attack message identification method, device, network equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1105464A (en) * 1993-04-21 1995-07-19 国际商业机器公司 Interactive computer system recognizing spoken commands
CN1419763A (en) * 2000-02-24 2003-05-21 纽米雷克斯投资公司 Non-invasive remote monitoring and reporting of digital communications systems
CN1423892A (en) * 1999-11-12 2003-06-11 通用器材公司 Intrusion detection for object security

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453438A (en) * 2016-12-23 2017-02-22 北京奇虎科技有限公司 Network attack identification method and apparatus
CN106453438B (en) * 2016-12-23 2019-12-10 北京奇虎科技有限公司 Network attack identification method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100428

Termination date: 20120426