CN1691581B - Multi-pattern matching algorithm based on characteristic value

Info

Publication number: CN1691581B
Application number: CN 200410023142
Authority: CN
Inventors: 彭诗力
Original assignee: Individual
Current assignee: Individual
Priority date: 2004-04-26
Filing date: 2004-04-26
Publication date: 2010-04-28
Anticipated expiration: 2024-04-26
Also published as: CN1691581A

Abstract

The invention discloses a hardware-implemented multi-pattern matching algorithm based on characteristic values. The characteristic value of a string is computed by exclusive-OR circuits or by modulo-3 or modulo-5 hardware circuits (see Figs. 1, 2, 3 and 4) without CPU intervention. The matching process organizes the pattern library into a tree structure according to pattern length and characteristic value (see Figs. 3 and 4), and then uses direct address mapping to search for and match the pattern characteristic values (see Fig. 6). The first match, implemented entirely in hardware, filters out the bulk of normal data packets; the second match, implemented in hardware or by the CPU, confirms the suspicious packets. The traffic that needs confirmation is less than 0.4 percent of the total traffic.

Description

Multi-pattern matching method based on characteristic value

Technical field

The invention relates to computer network security and extends to fields such as electronic data retrieval, where it is widely applicable. The multi-pattern matching algorithm based on characteristic values is mainly used in high-speed network intrusion detection systems, where it performs the pattern matching step of a misuse intrusion detection system. Experiments show that the algorithm can meet the requirements of misuse intrusion detection on high-speed networks.

Background technology

High-speed networking is the inevitable trend of network development. As networks grow, viruses, network intrusions, spam and the spread of harmful information have become thorny problems. Network processing systems that use current pattern matching algorithms (such as network intrusion detection systems, NIDS) struggle to operate effectively on high-speed networks and are forced to drop large numbers of packets.

Single-pattern matching is dominated by the BM algorithm and its many improved variants; the well-known open-source intrusion detection software Snort uses the BM algorithm. The AC algorithm is the classic multi-pattern matching algorithm; it uses a finite-state automaton that accepts all strings in the pattern set. Multi-pattern matching is faster than single-pattern matching, but its processing is more complex.

The common feature of these algorithms is one-shot matching: a single matching pass decides whether a network packet contains a pattern string. The idea is to match the pattern string directly against the packet and, on a mismatch, to skip a certain number of characters according to some heuristic before matching again. To skip as many characters as possible, the algorithms have become very complex. They are hard to implement in hardware and must rely on the processor for the entire matching process. This consumes a large amount of system resources (mainly CPU) and seriously limits the detection rate of the system.

At root, the cause is that current matching algorithms have the following inherent defects:

1. Large numbers of irrelevant network packets consume system resources heavily. Because matching is one-shot, a large amount of completely irrelevant network data takes part in the whole processing pipeline, wasting memory and CPU and becoming the bottleneck of processing speed. This follows directly from the one-shot nature of current algorithms.

2. The pattern matching process is hard to implement in hardware. This follows from the complexity of the algorithms: the more complex the algorithm, the harder its hardware implementation.

3. The demands on the processor are high. Because the whole matching process must be handled by the processor, processor speed determines the matching rate. Matching on high-speed networks places very high demands on the processor, so expensive special-purpose processors are often used.

Summary of the invention

1 Idea of the matching algorithm based on characteristic values

In a real network, intrusion packets account for only a very small part of the total traffic. System resources are consumed mainly by the exhaustive matching of normal packets, not by the detection of intrusion packets. In view of this, a matching algorithm based on characteristic values is proposed and implemented in hardware.

For convenience of explanation, the following assumptions are made first:

Let P be a pattern string of length m whose characters are denoted P_1, P_2, ..., P_m. Let T, called the network string, be the character string in a packet obtained from the network, that is, the text to be matched; its length is n and its characters are denoted T_1, T_2, ..., T_n. The network substring {T_2, T_3, T_4, ..., T_{m+1}} is obtained by shifting the substring {T_1, T_2, T_3, ..., T_m} one character to the right in the network string. A packet that contains no intrusion pattern string is called a normal packet; otherwise it is called an intrusion packet.

Definition 1 (characteristic value): the value obtained from a character string by some simple operation (one that is easy to implement in hardware) is called the characteristic value of the string and is denoted E. The relation between strings and characteristic values is many-to-one: each string has exactly one characteristic value, and several strings map to the same characteristic value only with small probability.

The basic idea of the matching algorithm based on characteristic values is to compare the characteristic value of a network string with the characteristic value of a pattern string of the same length. If the values differ, the two strings certainly do not match; if they are equal, the two strings match with high probability and a second, confirming match is needed. In short, matching is done twice: the first match filters out the large number of normal network strings that certainly do not match, and the second match compares the remaining suspicious network strings exactly. The first match must be simple enough to be implemented directly in hardware, so as to offload the CPU, and must filter out the great majority of normal packets. The first match is the key point, so the discussion below concentrates on the first match based on characteristic values.

Let the pattern string be P = {P_1, P_2, P_3, ..., P_m}, with length m and characteristic value E, and let the network string be T = {T_1, T_2, ..., T_n}, with length n. First the characteristic value of the length-m substring {T_1, T_2, ..., T_m} of the network string is computed and compared with the pattern characteristic value E. If they are equal, the second, confirming match is performed; if they are not equal, this network substring certainly does not match the pattern string. The network substring is then shifted one character to the right, becoming {T_2, T_3, ..., T_{m+1}}, and matched against the pattern string again. The characteristic value of the shifted substring can be derived from the value before the shift by a simple operation, and this step is carried out entirely by a hardware circuit.
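The two-stage scheme described above can be sketched in software as follows. This is only an illustrative model, not the hardware of the invention: the names `two_stage_match` and `xor_value` are hypothetical, and the byte-wise XOR of section 2.1 is assumed as the characteristic value. For clarity the sketch recomputes the value of every window instead of updating it incrementally.

```python
from typing import Callable


def xor_value(data: bytes) -> int:
    """Assumed characteristic value: XOR of all bytes (8 bits)."""
    value = 0
    for byte in data:
        value ^= byte
    return value


def two_stage_match(text: bytes, pattern: bytes,
                    value: Callable[[bytes], int] = xor_value):
    """Return (match offsets, number of windows passed to the second match)."""
    m = len(pattern)
    target = value(pattern)
    hits, confirmations = [], 0
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        if value(window) != target:
            continue            # first match: this window certainly does not match
        confirmations += 1      # suspicious window: needs exact confirmation
        if window == pattern:   # second match: exact comparison
            hits.append(start)
    return hits, confirmations


print(two_stage_match(b"POWER PING", b"PING"))  # ([6], 1)
```

In this small example only one of the seven windows reaches the exact comparison; all others are rejected by the cheap first stage.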

To increase the parallelism of the system, grouped matching can be used. When a pattern string of length m is matched in the network string, the network string is first split into substrings of length m (about n/m of them), and hardware computes the characteristic values of all substrings simultaneously. Each computed value is compared with the characteristic value of the pattern string. If they are equal, the second match is performed; otherwise all substrings are shifted by one character at the same time and matching continues. Only m shifts in total are needed to complete the whole matching computation.

2 Calculation of characteristic values

The characteristic value of a network substring is computed in exactly the same way as the characteristic value of a pattern string. Several calculation methods are possible, but each should meet the following requirements:

1. The calculation is simple and can be implemented directly by a hardware circuit, without CPU intervention.

2. It filters out a large number of normal packets; that is, few strings should correspond to the same characteristic value.

3. The characteristic value after a shift can be derived from the value before the shift by a simple operation, which reduces the number of characteristic value calculations.

Definition 2 (bit vector): in a character string, the binary string formed by taking the bit at the same position of each character's ASCII code, in character order, is called a bit vector. Any string of length m has exactly 8 bit vectors of length m. For example, the ASCII codes of the characters of the string "GOOD" are {01000111, 01001111, 01001111, 01000100}; the bit vector formed from the lowest bit of each character is {1110}. The string has 8 bit vectors of length 4, namely {0000, 1111, 0000, 0000, 0110, 1111, 1110, 1110}. The characteristic value obtained from a bit vector is called a bit characteristic value and is denoted e_i. The 8 bit characteristic values together form the characteristic value E of the string, E = [e_7, e_6, e_5, e_4, e_3, e_2, e_1, e_0].
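As a quick check of Definition 2, the following sketch extracts the 8 bit vectors of an ASCII string. The function name `bit_vectors` is hypothetical and chosen only for this illustration; the vectors are listed from bit 7 down to bit 0, the same order as in the "GOOD" example.

```python
def bit_vectors(text: str) -> list[str]:
    """Return the 8 bit vectors of an ASCII string, from bit 7 down to bit 0."""
    codes = text.encode("ascii")
    return [
        "".join(str((code >> bit) & 1) for code in codes)
        for bit in range(7, -1, -1)
    ]


print(bit_vectors("GOOD"))
# ['0000', '1111', '0000', '0000', '0110', '1111', '1110', '1110']
```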

Definition 3 (filter rate): the ratio of the normal data filtered out by the first match to the total network traffic is called the filter rate. In general, the higher the filter rate of the first match, the better. When the network carries no intrusion packets, the filter rate of exact matching algorithms such as BM is 100%.

2.1 XOR evaluation method

The XOR characteristic value of a string is obtained by XORing all of its characters. Without loss of generality, let the string be {S_1, S_2, ..., S_m}; its characteristic value is then

E = S_1 ⊕ S_2 ⊕ ... ⊕ S_m

which has 8 bits in total. For example, the characteristic value of the pattern string "CMD" is the XOR of the codes of "C", "M" and "D".

Assuming that all codes occur with equal probability, the number of 1s in a bit vector is odd or even with equal probability, so the filter rate of one bit characteristic value is 50%. Because the bit characteristic values are mutually independent, the filter rate of the 8 bit characteristic values together is 1-(1/2)^8 = 99.61%. In other words, the suspicious packets that need the second match make up 0.39% of the total traffic. Experiments show an average filter rate of 99.69%.

During matching, when the characteristic value of the network substring differs from that of the pattern string, the network substring must be shifted. XORing the characteristic value before the shift with the incoming character and the outgoing character gives the characteristic value after the shift. For example, when the pattern string "PING" is matched in the string "POWER", the characteristic value of the substring "POWE" is computed first: "P" ⊕ "O" ⊕ "W" ⊕ "E" = [00001101]. It does not match the characteristic value [00010000] of "PING", so the substring is shifted one character to the right and becomes "OWER". The characteristic value of this substring is obtained as [00001101] ⊕ "P" ⊕ "R" = [00001111], without performing three XOR operations again.

When a pattern string of length m is matched, computing the characteristic value of the first network substring of length m takes (m-1) XOR operations; each later shift takes only 2 XOR operations.
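The incremental update can be checked numerically. The sketch below is a plain software illustration, with hypothetical helper names `xor_value` and `slide`; it recomputes the "POWER" / "PING" example, using 2 XOR operations for the shifted window.

```python
from functools import reduce
from operator import xor


def xor_value(data: bytes) -> int:
    """XOR characteristic value of a byte string."""
    return reduce(xor, data, 0)


def slide(value: int, outgoing: int, incoming: int) -> int:
    """Update the value after a one-character shift with 2 XOR operations."""
    return value ^ outgoing ^ incoming


text = b"POWER"
first = xor_value(text[:4])               # window "POWE"
second = slide(first, text[0], text[4])   # window "OWER": drop 'P', add 'R'
print(f"{first:08b} {second:08b} {xor_value(b'PING'):08b}")
# 00001101 00001111 00010000
```

Neither window value equals the value of "PING", so both windows are rejected by the first match.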

The hardware implementation uses a simple XOR circuit, as shown in Fig. 1. Register E holds the characteristic value of the network substring of length m; its initial value is 0. D_m to D_0 are shift registers. The network string is fed in character by character at D_m, and the XOR circuit produces the characteristic value of the length-m network substring in register E.

2.2 Modulo-3 evaluation method

Let a bit vector be {b_m, b_{m-1}, ..., b_1}. Choose a suitable positive integer r and write the vector as the expression

b_m r^{m-1} + b_{m-1} r^{m-2} + ... + b_1    (1)

Then choose a suitable positive integer M (generally the smallest prime greater than or equal to max(b_i), 1 ≤ i ≤ m) and, in the residue class ring Z_M modulo M, let

e = Σ_{i=1}^{m} b_i r^{i-1}    (2)

Then e is the bit characteristic value of this bit vector.

To raise the filter rate, the order of the base r in the group of units of Z_M should be as large as possible. If M is prime, r should have order (M-1) in Z_M^*, that is, r should be a generator of the cyclic group Z_M^*. For binary bit vectors, M=3 and r=2 are generally chosen. Choosing M=2 yields the XOR evaluation algorithm (the text gives r=2 for this case; with modulus 2 the sum equals the XOR of the bits only when r is odd).

When e is used as the check code of a string, a 1-bit error is certain to be detected, whereas errors in 2 bits and errors in 3 or more bits are detected only with certain probabilities (the expressions are not reproduced in this text).

By this check-code theorem, for binary bit vectors with M=3 and r=2 and a pattern string of length m:

When m=1, the probability that the bit characteristic values are equal while the bit vectors differ is 0, so the filter rate is 100%.

When m=2, this probability equals 1/2, so the method degrades to the filtering capability of XOR.

When m≥3, this probability equals 1/3. Assuming that all strings occur with equal probability, one bit characteristic value filters out 2/3 of the normal packets. Because the bit characteristic values are mutually independent, the filter rate of the string characteristic value composed of 8 bit characteristic values is 1-(1/3)^8 = 99.985%. The suspicious data that needs the second match accounts for only 0.015% of the total traffic.

The values of 2^i (i≥0) in the residue class ring modulo 3 are 1, 2, 1, 2, ... in turn. To compute a bit characteristic value e_i, it therefore suffices to multiply each bit of the bit vector by its weight and add the products modulo 3. For example, the bit characteristic value of the bit vector [111] is (1*1+1*2+1*1) mod 3 = 01. A bit whose weight is 2 is shifted left by one position; a bit whose weight is 1 needs no shift.

To make a direct hardware implementation easy, a 0 is first inserted before each bit of each character's ASCII code, which expands each character to 16 bits. The odd-numbered characters are then left unchanged and the even-numbered characters are shifted left by one bit, which completes the weighting. Finally the values are added modulo 3 in groups of two bits.

For example, the modulo-3 characteristic value of "CMD" is computed as follows.

The codes given for the characters of "CMD" are {01000011, 01010111, 01001101}. After expansion to 16 bits they are:

C	00 01 00 00 00 00 01 01
M	00 01 00 01 00 01 01 01
D	00 01 00 00 01 01 00 01

After the weighting shift ("M" is shifted left by one bit; "C" and "D" are unchanged), the values are added modulo 3 in groups of 2 bits:

C	00 01 00 00 00 00 01 01
M	00 10 00 10 00 10 10 10
D	00 01 00 00 01 01 00 01
Sum (mod 3)	00 01 00 10 01 00 00 01

That is, the characteristic value of the string "CMD" is [00 01 00 10 01 00 00 01].
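The expansion, weighting and grouped modulo-3 addition can be reproduced with a short sketch. The helper names (`expand`, `mod3_value`, `show`) are hypothetical. The three byte values are taken exactly as printed in the example (note that in standard ASCII the codes 01010111 and 01001101 are those of "W" and "M"), and the weights 1, 2, 1, ... are assumed to be assigned from the rightmost character, following formula (2).

```python
def expand(code: int) -> list[int]:
    """Split an 8-bit code into 8 two-bit groups (each 0 or 1), highest bit first."""
    return [(code >> bit) & 1 for bit in range(7, -1, -1)]


def mod3_value(codes: list[int]) -> list[int]:
    """Group-wise weighted sum modulo 3; weights 1, 2, 1, 2, ... from the right."""
    total = [0] * 8
    for distance, code in enumerate(reversed(codes)):
        weight = 2 if distance % 2 else 1   # weight 2 corresponds to a left shift by one bit
        for pos, bit in enumerate(expand(code)):
            total[pos] = (total[pos] + weight * bit) % 3
    return total


def show(groups: list[int]) -> str:
    """Render each group as two bits, e.g. 2 -> '10'."""
    return " ".join(f"{g:02b}" for g in groups)


codes = [0b01000011, 0b01010111, 0b01001101]   # codes as printed in the example
print(show(mod3_value(codes)))  # 00 01 00 10 01 00 00 01
```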

When the characteristic value of the network substring differs from that of the pattern string, the network substring must be shifted and the next substring matched. The characteristic value before the shift is added modulo 3 to the weighted expanded value of the incoming character, and the weighted expanded value of the outgoing character is then subtracted modulo 3, which gives the characteristic value after the shift. Subtracting the weighted expanded value of a character modulo 3 is equivalent to adding the bitwise inverse of that value modulo 3. When a pattern string of length m is matched, the first network substring characteristic value takes (m-1) modulo-3 additions; each later one takes only 2 modulo-3 additions.

When the characteristic values do not match, the characteristic value of the network substring after an odd number of shifts is computed by the formula

e'_i = Σ_{j=1}^{m} b_j r^{j}    (3)

and must be converted to the value defined by formula (2). In modulo-3 arithmetic with r=2,

e'_i = r e_i = 2 e_i ≡ -e_i (mod 3)

so taking the modulo-3 complement of the characteristic value after an odd number of shifts completes the conversion. The modulo-3 complement is obtained by swapping the two bits of each bit characteristic value: the modulo-3 complement of [01] is [10], that of [10] is [01], and that of [00] is [00].

The hardware circuit for modulo-3 evaluation is shown in Fig. 2. After passing through the expansion and weighting circuit, the network string enters the shift registers D_m to D_0 in sequence; the modulo-3 adder circuit then produces the characteristic value of the network substring of length m, which is stored in register E. After an even number of shifts the modulo-3 characteristic value is output directly through the selection circuit; after an odd number of shifts it first passes through the modulo-3 complement circuit and is then output through the selection circuit.

For example, the characteristic value of the pattern string "CMD" is E = [00 01 00 10 01 00 00 01] and the network string to be matched is "HELL". So that the network string and the pattern string are weighted in the same order, the network string is weighted and grouped from right to left. The weights of the letters are:

H [E L L]

Weight: 2 1 2 1

First the characteristic value of "ELL" is computed, E_1 = [00 01 00 00 00 01 00 01]; it does not match the pattern characteristic value E. The network substring is shifted one position to the left, and the characteristic value of "HEL" is computed as E_2 = (E_1 + "2H" + "~L") mod 3 = [00 10 00 00 01 00 00 01]. The modulo-3 complement of E_2 is [00 01 00 00 10 00 00 10]. It is compared with the pattern characteristic value E again and does not match, so the string "HELL" does not contain the pattern string "CMD", and matching ends.
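The shift and conversion steps of the "HELL" example can be traced with the following sketch. All helper names are hypothetical, and the characteristic value is modelled as a list of eight values in {0, 1, 2}, one per bit position, rather than as 16 hardware bits. The sketch also verifies that the complemented value after one shift equals the value computed directly by formula (2).

```python
def bits(code: int) -> list[int]:
    """Eight bit positions of a byte, most significant first."""
    return [(code >> b) & 1 for b in range(7, -1, -1)]


def scale(v: list[int], k: int) -> list[int]:
    """Multiply every group by the weight k, modulo 3."""
    return [(k * x) % 3 for x in v]


def add3(*vs: list[int]) -> list[int]:
    """Group-wise addition modulo 3 of any number of values."""
    return [sum(col) % 3 for col in zip(*vs)]


def complement3(v: list[int]) -> list[int]:
    """Modulo-3 complement: 1 <-> 2, 0 stays 0 (the 2-bit swap of the text)."""
    return [(-x) % 3 for x in v]


def show(v: list[int]) -> str:
    return " ".join(f"{x:02b}" for x in v)


H, E, L = bits(ord("H")), bits(ord("E")), bits(ord("L"))
e1 = add3(E, scale(L, 2), L)                   # "ELL", weights 1, 2, 1
e2 = add3(e1, scale(H, 2), complement3(L))     # shift in H (weight 2), remove the last L
e2_converted = complement3(e2)                 # back to the weights of formula (2)
direct = add3(H, scale(E, 2), L)               # "HEL" computed directly by formula (2)

print(show(e1))            # 00 01 00 00 00 01 00 01
print(show(e2))            # 00 10 00 00 01 00 00 01
print(show(e2_converted))  # 00 01 00 00 10 00 00 10
print(e2_converted == direct)
```

Neither `e1` nor `e2_converted` equals the pattern value [00 01 00 10 01 00 00 01], so both windows are rejected without an exact comparison.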

The modulus M can also be another prime. When M=5, the weights of the characters in the string become 1, 2, 4, 3, 1, 2, 4, 3, ... The filter rate of the corresponding algorithm is 1-(1/5)^8 = 99.99974%, and the data that needs the second match accounts for only 0.00026% of the total traffic. The filter rate rises as M increases, but the hardware circuit becomes harder to implement and its cost also rises.
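The theoretical filter rates quoted for the XOR, modulo-3 and modulo-5 methods all follow from the same expression 1-(1/M)^8, under the stated assumption of 8 independent, uniformly distributed bit characteristic values. A small sketch (the function name `filter_rate` is hypothetical) evaluates it for the three moduli:

```python
def filter_rate(modulus: int, positions: int = 8) -> float:
    """Theoretical filter rate for `positions` independent values modulo `modulus`."""
    return 1.0 - (1.0 / modulus) ** positions


for modulus in (2, 3, 5):
    rate = filter_rate(modulus)
    print(f"M={modulus}: filtered {rate:.5%}, second match {1 - rate:.5%}")
```

The printed values round to the 99.61%, 99.985% and 99.99974% given in the text.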

3 Multi-pattern matching process based on characteristic values

In 1975 Aho and Corasick proposed a multi-pattern matching algorithm based on a finite-state machine (the AC algorithm), which searches for several strings in parallel. The search time is O(n), and the time to build the automaton is linear in the length of the pattern strings. A commonly used algorithm today is AC_BM, which combines the AC and BM algorithms: the rules are placed in a pattern tree, and the BM algorithm is then used to search this tree.

The idea of the multi-pattern matching algorithm based on characteristic values is as follows. The rules in the pattern library are grouped by length; within each group they are sorted by characteristic value and indexed, and rules in a group that share a characteristic value are linked as a chained list under the same characteristic value index. The first match finds the characteristic value index that corresponds to a suspicious string; the second, exact match then compares the suspicious string with the rules.

Description of drawings:

Fig. 1: XOR evaluation circuit.

Fig. 2: Modulo-3 evaluation circuit.

Fig. 3: Organization of the rule base.

Fig. 4: XOR multi-pattern evaluation circuit.

Fig. 5: Modulo-3 multi-pattern evaluation circuit.

Fig. 6: Matching circuit.

Fig. 7: Comparison of processing times.

Embodiment:

1 Organization of the pattern library

The whole pattern library forms a tree structure; the length of a pattern string is denoted L (assume Max(L)=m). The initialization time is linear in the size of the pattern library. The organization of the whole pattern library is shown in Fig. 3. The matching of characteristic values is done by the first-match circuit; the subsequent rule matching is done by the second match.

2 Hardware implementation of multi-pattern characteristic value calculation

The multi-pattern characteristic value calculation circuit computes the characteristic values of network substrings of different lengths. The whole calculation is carried out by hardware alone, without CPU intervention.

2.1 XOR hardware calculation circuit

To speed up the calculation, a group of shift registers and a group of XOR circuits compute the characteristic values of network substrings of different lengths simultaneously. The circuit is shown in Fig. 4 (for ease of illustration, the maximum length of a pattern string in the rule base is assumed to be 7).

Let the input network string be T = {"abcdefghijklmn"}, fed in sequence into the shift registers D_7 to D_0; the initial value of all registers is 0.

In the first beat, character 'a' moves into D_7 and, after passing through the XOR circuit, is stored in R_7.

In the second beat, character 'b' moves into D_7, is XORed with the 'a' in R_7, and the result is stored back in R_7; character 'a' moves into D_6 and, after passing through the XOR circuit, is stored in R_6.

In the seventh beat, D_7 to D_1 hold "gfedcba"; R_7 holds the value a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ f ⊕ g, R_6 holds the value a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ f, and so on.

In the eighth beat, character 'a' moves into D_0 and is fed back and XORed with every characteristic value, which removes 'a'. R_7 now holds the value b ⊕ c ⊕ d ⊕ e ⊕ f ⊕ g ⊕ h and R_6 holds the value b ⊕ c ⊕ d ⊕ e ⊕ f ⊕ g. In other words, R_7 always holds the characteristic value of a network substring of length 7, R_6 always holds that of a network substring of length 6, and so on.

Each time a character is shifted in, the XOR characteristic values of network substrings of all lengths are obtained simultaneously and are then matched against the characteristic values of pattern strings of the same length in the pattern library: the value in R_7 is matched against the L=7 branch of the pattern library, the value in R_6 against the L=6 branch, and so on.
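The beat-by-beat behaviour described above can be modelled in software. The sketch below is one reading of the description, since Fig. 4 is not reproduced here: the function name `multi_length_xor`, the deque standing in for D_max to D_0 and the list standing in for the registers R_k are all assumptions made for illustration.

```python
from collections import deque


def multi_length_xor(text: bytes, max_len: int):
    """Yield (window start, registers) per beat once the first window is complete.

    registers[k] models R_k: the XOR value of the k characters starting at
    `window start`; registers[0] is unused.
    """
    stages = deque([0] * (max_len + 1), maxlen=max_len + 1)  # index 0 = D_max, last = D_0
    regs = [0] * (max_len + 1)
    for beat, char in enumerate(text, start=1):
        stages.appendleft(char)                 # shift: the new character enters D_max
        for k in range(1, max_len + 1):
            regs[k] ^= stages[max_len - k]      # character now sitting in D_k
        leaving = stages[max_len]               # character that has reached D_0
        for k in range(1, max_len + 1):
            regs[k] ^= leaving                  # feed it back to remove it everywhere
        if beat >= max_len:
            yield beat - max_len, list(regs)


for start, regs in multi_length_xor(b"abcdefghijklmn", 7):
    print(start, [f"{v:02X}" for v in regs[1:]])
```

In this model all registers describe substrings that begin at the same character, so one lookup per length can be issued in every beat.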

2.2 Modulo-3 hardware calculation circuit

The XOR circuit is very fast, but its filter rate is not very high, so a fair number of suspicious network strings still need the second match. To meet the demands of even faster networks, algorithms with a higher filter rate should be used, such as the modulo-3, modulo-5 or modulo-7 algorithms.

The hardware circuit of the modulo-3 algorithm is shown in Fig. 5 (for simplicity, the maximum length of a pattern string is assumed to be 5 and all registers are initially 0).

The input string first passes through weighted expansion, which turns each character into 16 bits. It then enters the shift registers; when a character reaches D_0, it passes through the inverter circuit and is fed back and added modulo 3 to every characteristic value, which removes that character.

For example, let the input network string be T = {"abcdef"}. The ASCII codes of the characters are {61H, 62H, 63H, 64H, 65H, 66H}; after expansion and shifting they become {2802H, 1404H, 280AH, 1410H, 2822H, 1414H}, which enter D_5 to D_0 in sequence.

In the first beat, 2802H moves into D_5 and, after passing through the modulo-3 adder circuit, is stored in R_5.

In the second beat, 1404H moves into D_5 and is added, group by group modulo 3, to the 2802H in R_5, giving 0006H, which is stored back in R_5; 2802H moves into D_4 and, after passing through the modulo-3 adder circuit, is stored in R_4.

In the fifth beat, D_1 to D_5 hold {2802H, 1404H, 280AH, 1410H, 2822H}; R_5 holds the modulo-3 characteristic value of the string {abcde}, R_4 holds the modulo-3 characteristic value of {abcd}, and so on.

In the sixth beat, the weighted expanded value of character 'a' moves into D_0, is inverted, and is fed back and added modulo 3 to every characteristic value, which removes 'a'. R_5 now holds the modulo-3 characteristic value of {bcdef} computed by formula (3), and the complement circuit yields the modulo-3 characteristic value computed by formula (2). R_4 holds the modulo-3 characteristic value of {bcde} computed by formula (3), and the complement circuit outputs the value computed by formula (2). The other registers behave likewise.

From then on, every beat yields the modulo-3 characteristic values of network substrings of all lengths simultaneously, and these are matched against the characteristic values of pattern strings of the same length in the pattern library.
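The 16-bit weighted expansion and the grouped modulo-3 addition used in this example can be reproduced as follows. The helper names are hypothetical, values are written in hexadecimal, and the alternation of the weighting (first character shifted, second not, and so on) is chosen to match the values listed for "abcdef"; under these assumptions the fourth value comes out as 1410.

```python
def expand(code: int) -> int:
    """Insert a 0 before every bit of an 8-bit code, giving a 16-bit value."""
    out = 0
    for bit in range(8):
        out |= ((code >> bit) & 1) << (2 * bit)
    return out


def weighted(text: bytes) -> list[int]:
    """Expanded values; characters at even 0-based index get weight 2 (left shift)."""
    return [expand(c) << (1 if i % 2 == 0 else 0) for i, c in enumerate(text)]


def add_mod3(a: int, b: int) -> int:
    """Add two 16-bit values modulo 3 in independent groups of 2 bits."""
    out = 0
    for group in range(8):
        x = (a >> (2 * group)) & 3
        y = (b >> (2 * group)) & 3
        out |= ((x + y) % 3) << (2 * group)
    return out


values = weighted(b"abcdef")
print([f"{v:04X}" for v in values])
# ['2802', '1404', '280A', '1410', '2822', '1414']
print(f"{add_mod3(values[0], values[1]):04X}")  # 0006
```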

3 Processing of the matching process

To raise the matching speed and reduce the time spent searching for characteristic values in the pattern library, the search for pattern characteristic values uses direct address mapping. When the pattern library is initialized, a memory area of 256 bytes (or words) is allocated for the rule-tree branch of each length, and the address of its first unit is stored in the corresponding base register; this is the start address in memory of the pattern characteristic values of that length. Each memory unit holds a pointer to the rule whose characteristic value equals the unit's offset address. If no pattern characteristic value equals the offset address, the pointer is null. If several rules have a characteristic value equal to the offset address, the other rules are linked one after another behind the first rule, forming a chained list. Once the characteristic value calculation circuit has produced the characteristic value of a network substring of some length, the memory unit whose offset address equals that characteristic value is accessed directly. If the pointer in this unit is null, the substring has no match in the pattern library; otherwise the second match is performed against the rules the pointer refers to. With a pattern library built from 100 rules extracted from the Snort rule base, the memory units with non-null pointers make up 2.07% of the total memory area, and the rules that share a memory unit account for 0.19%. As the pattern library grows, the number of memory units with non-null pointers also grows.
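The direct address mapping can be sketched with one 256-entry table per pattern length. This is a software illustration only: `build_library` and `lookup` are hypothetical names, the 8-bit XOR value is assumed as the characteristic value, and a Python list stands in for the chained list of rules that share a memory unit.

```python
def xor_value(data: bytes) -> int:
    """8-bit XOR characteristic value, used here as the table offset."""
    value = 0
    for byte in data:
        value ^= byte
    return value


def build_library(patterns: list[bytes]) -> dict[int, list]:
    """tables[length][offset] is None (null pointer) or the chain of rules."""
    tables: dict[int, list] = {}
    for pattern in patterns:
        table = tables.setdefault(len(pattern), [None] * 256)
        offset = xor_value(pattern)
        if table[offset] is None:
            table[offset] = []
        table[offset].append(pattern)      # later rules are chained behind the first
    return tables


def lookup(tables: dict[int, list], window: bytes) -> list[bytes]:
    """First match by direct indexing, then exact second match on the chain."""
    table = tables.get(len(window))
    if table is None:
        return []
    chain = table[xor_value(window)]
    if chain is None:                      # null pointer: filtered out immediately
        return []
    return [rule for rule in chain if rule == window]


tables = build_library([b"PING", b"CMD", b"GNIP"])
print(lookup(tables, b"PING"), lookup(tables, b"POWE"))  # [b'PING'] []
```

"PING" and "GNIP" have the same XOR value, so they share one table entry and are told apart only by the second match.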

The whole matching circuit is shown in Fig. 6. The input network string passes through the multi-pattern characteristic value calculation circuit, which produces the characteristic values of substrings of all lengths. The characteristic value of a substring is used as an offset address and is added to the base address of the pattern strings of the same length to form an actual physical address. If the pointer in the memory unit at this physical address is null, the network substring has no match in the pattern library; if the pointer is not null, the second, confirming match is performed between the network substring and the rules the pointer refers to. The time from computing the characteristic value of a network substring to finding the pattern characteristic value of the corresponding length is very short and can largely be ignored.

If the modulo-5 method is used to compute the characteristic value of a string, the filter rate can reach 99.99974%, and the data that needs the second match is very small, only 0.00026% of the total network traffic. The hardware implementation is more complex, however, and the delay of the data in the hardware increases and can become the processing bottleneck; several hardware circuits working in parallel are needed to solve this.

The algorithm relies on two-pass matching and on a hardware implementation to relieve the system CPU, so that a misuse intrusion detection system can operate effectively on a high-speed network. The whole matching circuit can be built as a card that plugs into a PCI slot of the system board, or it can be integrated directly on a network card or another circuit board. In the first match, the characteristic values are computed jointly by the shift register group, the XOR circuit group (or modulo-3 calculation circuit group) and the register group. The shift register group receives the packets directly from the high-speed network card; the XOR circuit group (or modulo-3 calculation circuit group) computes the characteristic values of network substrings of different lengths, which are then stored in the register group. When one set of calculation circuits cannot meet the demand, several sets can process in parallel. After initialization the rule pattern library is stored in an EEPROM that is connected directly to the calculation circuit group, which avoids frequent initialization of the pattern library and data transfer delays. When the pattern library needs updating, rules are added to the EEPROM. The second match can be done by the CPU or by a hardware comparison circuit. When an intrusion packet is found after the whole matching process, the packet and its corresponding rule code are passed together to the system for further analysis and alarm handling.

To raise the processing speed further, the algorithm can be combined with protocol analysis. Network packets of different protocol types are distributed to different pattern matching circuits, and the EEPROM of each matching circuit holds the matching rules for the corresponding protocol type. This raises the problem of load balancing, but it effectively raises the processing speed and makes the next processing step of the system easier.

4 beneficial effects

Actual performance for testing algorithm, in network, catch the 150M data at random, model string is from the rule base in the Snort intruding detection system, to multi-pattern matching algorithm, added up between the execution of process simulation, XOR hardware is realized the processing time and mould 3 hardware are realized the processing time based on characteristic value.Find the time of corresponding length mode characteristic values very short from being calculated to of characteristic value, the consuming time of algorithm mainly is to mate for the second time.

The measured times are shown in Table 1:

Table 1 Processing time (unit: seconds)

Number of pattern strings	BM algorithm	AC algorithm	XOR software simulation	XOR hardware implementation	Mod 3 hardware implementation
10	64.78	19.74	10.14	0.15	0.035
20	19.69	20.87	12.78	0.29	0.010
50	48.79	22.29	15.71	0.71	0.055
100	96.91	25.41	21.43	1.39	0.079
150	140.43	28.91	23.27	2.11	0.104
200	186.65	32.86	27.81	2.79	0.128
500	485.72	39.97	34.16	6.97	0.27

Experimental conditions: 500 MHz Intel CPU, 256 MB RAM, Windows 2000.

The data in the table show that the processing time of the BM algorithm grows linearly with the number of pattern strings and is far greater than that of the other algorithms. The AC algorithm is very effective for multi-pattern matching but still cannot meet the requirements of intrusion detection on high-speed networks. The XOR hardware implementation can largely meet the requirements of intrusion detection on a 100 Mbit/s network, and the mod 3 hardware implementation can meet the requirements of a 1000 Mbit/s network. For even faster networks, hardware circuits such as mod 5 or mod 7 can be used.
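The relation between the times in Table 1 and these network rates can be checked with simple arithmetic. The Python sketch below assumes that the 150M of captured data means 150 megabytes and that the 100M and 1000M network rates are in Mbit/s; neither unit is stated explicitly in the test description. It uses the row for 500 pattern strings, the slowest case in the table. Under these assumptions the XOR hardware implementation sustains about 172 Mbit/s and the mod 3 hardware implementation about 4444 Mbit/s, while the AC algorithm reaches only about 30 Mbit/s.

```python
DATA_MEGABYTES = 150  # assumption: "150M" of captured data means 150 megabytes

# Processing times in seconds for 500 pattern strings, copied from Table 1.
times_500 = {
    "BM": 485.72,
    "AC": 39.97,
    "XOR software simulation": 34.16,
    "XOR hardware": 6.97,
    "Mod 3 hardware": 0.27,
}


def throughput_mbit_per_s(seconds: float, megabytes: float = DATA_MEGABYTES) -> float:
    """Convert a processing time for the whole capture into Mbit/s."""
    return megabytes * 8 / seconds


throughput = {name: throughput_mbit_per_s(t) for name, t in times_500.items()}

for name, rate in throughput.items():
    print(f"{name}: {rate:.1f} Mbit/s")
```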

A chart shows the processing times of the different algorithms more clearly. Because the BM algorithm takes far too long, it is omitted from the chart. The comparison is shown in Figure 7.

As the number of rules in the pattern library grows, the processing time of every algorithm increases, but the characteristic value algorithm is not very sensitive to this growth. A larger pattern library has essentially no effect on the time of the first match; only the time of the second match increases slightly.

The essence of the characteristic-value-based matching algorithm is two-stage matching, which departs completely from the approach of existing matching algorithms. The processing-speed bottleneck is shifted to the first match, which is implemented in hardware and filters out the large volume of normal packets; the small number of suspicious packets then undergo an exact second match. Preferably both the first and second matches are implemented in hardware, to reduce the load on the system and improve operating efficiency. Characteristic values can be computed in many ways, such as summation or cyclic redundancy calculation, but the chosen method should satisfy, as far as possible, the conditions for characteristic value calculation given earlier.
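The two-stage idea can be sketched in software as follows. This Python sketch is an illustration under assumptions, not the hardware design: the characteristic value is again taken to be the byte-wise XOR of the string, a Python list of 256 slots per pattern length stands in for the memory region addressed by base address plus offset, a list in each slot stands in for the chain of pattern strings that share a characteristic value, and the names `build_library` and `match` are hypothetical.

```python
from functools import reduce
from operator import xor

TABLE_SIZE = 256  # one slot per possible 8-bit characteristic value


def characteristic_value(data: bytes) -> int:
    """Assumed characteristic function: XOR of all bytes, giving 0..255."""
    return reduce(xor, data, 0)


def build_library(patterns):
    """Group patterns by length; each length gets its own 256-slot table.

    A slot is None (empty pointer) or a list standing in for the chain
    of patterns that share the same characteristic value.
    """
    library = {}
    for pattern in patterns:
        table = library.setdefault(len(pattern), [None] * TABLE_SIZE)
        slot = characteristic_value(pattern)
        if table[slot] is None:
            table[slot] = []
        table[slot].append(pattern)
    return library


def match(packet: bytes, library):
    """Return (offset, pattern) for every confirmed occurrence."""
    hits = []
    for length, table in library.items():
        for start in range(len(packet) - length + 1):
            substring = packet[start:start + length]
            chain = table[characteristic_value(substring)]  # first match
            if chain is None:
                continue  # empty slot: substring is not in the library
            for pattern in chain:  # second match: exact comparison
                if pattern == substring:
                    hits.append((start, pattern))
    return hits


library = build_library([b"root", b"cmd.exe"])
hits = match(b"run cmd.exe as root", library)
```

The first match costs one table lookup per substring regardless of how many patterns the library holds; only substrings that land on a non-empty slot reach the exact comparison.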

Claims

1. A multi-pattern matching method based on characteristic values, which uses two-stage matching: the first match filters out a large number of normal packets, and the second match further confirms the suspicious packets; the first match is implemented entirely in hardware, and the second match may be implemented in hardware or performed by the CPU; characteristic values are matched by direct address mapping: a character string passes through the multi-pattern characteristic value calculation circuit to obtain its characteristic value, which is used as an offset address and added to the base address of the pattern strings of the same length as the character string to form the actual physical address; if the pointer in the memory cell at this physical address is null, the character string matches nothing in the pattern library; otherwise the character string and the pattern string that the pointer points to undergo the second match for confirmation.

2. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the multi-pattern characteristic value calculation circuit consists of a shift register group, a calculation circuit group, and a characteristic value register group, wherein the calculation circuit group is an XOR calculation circuit group, a mod 3 calculation circuit group, or a mod 5 calculation circuit group; the characteristic values of character strings of all lengths are calculated simultaneously by the multi-pattern characteristic value calculation circuit, without CPU intervention.

3. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the pattern strings in the pattern library are grouped by length; a memory space of 256 bytes or words is allocated to the pattern strings of each length, and the address of its first cell is stored in the corresponding base register, representing the starting address in memory of the characteristic values of pattern strings of that length; each memory cell stores a pointer, which points to the pattern string whose characteristic value equals the offset address of that cell; if no pattern string has a characteristic value equal to the offset address, the pointer is null; if several pattern strings have characteristic values equal to the offset address, the other pattern strings are linked in turn after the first pattern string, forming a linked list.