CN1691581B - Multi-pattern matching algorithm based on characteristic value - Google Patents

Publication number
CN1691581B
Authority
CN
Grant status
Grant
Patent type
Prior art keywords
matching
characteristic
mode
value
hardware
Prior art date
Application number
CN 200410023142
Other languages
Chinese (zh)
Other versions
CN1691581A (en )
Inventor
彭诗力
Original Assignee
彭诗
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Grant date

Abstract

The invention discloses a multi-pattern matching algorithm based on characteristic values and implemented in hardware. The characteristic value of a string is computed by exclusive-OR circuits or by modulo-3 or modulo-5 hardware circuits (see Figures 1, 2, 3, 4) without CPU intervention. The matching process is as follows: the pattern library is organized into a tree structure according to pattern length and characteristic value (see Figures 3, 4), and direct address mapping is then used to look up and match pattern characteristic values (see Figure 6). The first matching pass, implemented entirely in hardware, filters out the large volume of normal data packets; the second matching pass, implemented in hardware or by the CPU, confirms the suspect packets. The traffic that requires confirmation is less than 0.4 percent of the total traffic.

Description

Multi-pattern matching method based on characteristic values

Technical Field

[0001] This invention relates to computer network security. It can be extended to fields such as electronic document retrieval and has a wide range of applications.

The characteristic-value-based multi-pattern matching algorithm is intended mainly for high-speed network intrusion detection systems, where it carries out the pattern matching step of misuse-based intrusion detection. Experiments confirm that the algorithm meets the requirements of misuse intrusion detection on high-speed networks.

Background

[0002] High-speed networks are the inevitable direction of network development. As networks grow, the spread of viruses, network intrusions, spam, and harmful content has become a hard problem. Network processing systems built on current pattern matching algorithms (for example network intrusion detection systems, NIDS) struggle to run effectively on high-speed networks and are forced to drop large numbers of packets.

[0003] Among single-pattern matching algorithms, the main ones are the BM (Boyer-Moore) algorithm and its variants; the well-known open-source intrusion detection software Snort uses the BM algorithm. The AC (Aho-Corasick) algorithm is the classic multi-pattern matching algorithm; it uses a finite-state automaton to accept all the strings in the pattern set. Multi-pattern matching is faster than single-pattern matching, but its processing is more complex.

[0004] What these matching algorithms have in common is one-pass matching: a single pass decides whether a network packet contains a pattern string. Their approach is to match the pattern string directly against the packet and, on a mismatch, to skip some number of characters according to a heuristic before matching again. To skip as many characters as possible, the algorithms become complicated, are hard to implement in hardware, and must rely on a processor for the entire matching process. This consumes a large amount of system resources (mainly CPU) and severely limits the detection rate.

[0005] The root cause is that current matching algorithms have the following fundamental defects:

[0006] 1. Irrelevant network packets consume system resources heavily. Because matching is one-pass, a large amount of completely irrelevant network data goes through the entire processing pipeline, wasting storage and CPU resources and becoming the bottleneck for processing speed. This follows directly from the one-pass design of current algorithms.

[0007] 2. Pattern matching is hard to carry out in hardware. This follows from the complexity of the matching algorithms: the more complex the algorithm, the harder it is to implement in hardware.

[0008] 3. The demands on the processor are high. Because the whole matching process must be handled by the processor, processor speed determines the matching rate. Matching on high-speed networks places very high demands on the processor, so expensive dedicated processors are often used.

Summary of the Invention

[0009] 1 The idea behind characteristic-value matching

[0010] In real networks, intrusion packets make up only a tiny fraction of total traffic. System resources are consumed mainly by exhaustively matching normal packets, not by detecting intrusion packets. To address this, a characteristic-value-based matching algorithm is proposed here and implemented in hardware.

[0011] For convenience, the following notation is used:

[0012] Let P be a pattern string of length m whose characters are denoted $P_1, P_2, \ldots, P_m$. Let T be the string carried in a packet captured from the network, called the network string; it is the text to be matched, has length n, and its characters are denoted $T_1, T_2, \ldots, T_n$. The network substring $\{T_2, T_3, T_4, \ldots, T_{m+1}\}$ is obtained from the substring $\{T_1, T_2, \ldots, T_m\}$ by shifting one character to the right in the network string. A packet that contains no intrusion signature string is called a normal packet; otherwise it is called an intrusion packet.

[0013] Definition 1 (characteristic value): a value obtained from a string by some simple operation that is easy to implement in hardware is called the characteristic value of the string, denoted E. The relation between strings and characteristic values is many-to-one: each string has exactly one characteristic value, while two different strings share the same characteristic value only with very small probability.

[0014] The basic idea of characteristic-value matching is to compare the characteristic value of a network substring with that of a pattern string of equal length. If the values differ, the two strings definitely do not match; if they are equal, the two strings match with high probability and a second, confirming match is required. In short, matching is done in two passes: the first pass filters out the large volume of normal network strings that definitely do not match, and the second pass matches the remaining suspect strings exactly. The first-pass algorithm must be simple enough to implement directly in hardware, so that it relieves the CPU, and it must filter out the vast majority of normal packets. The first pass is the key, so the discussion here concentrates on the characteristic-value-based first pass.

[0015] Let the pattern string be $P = \{P_1, P_2, P_3, \ldots, P_m\}$ with length m and characteristic value E, and let the network string be $T = \{T_1, T_2, \ldots, T_n\}$ with length n. First compute the characteristic value of the length-m substring $\{T_1, T_2, \ldots, T_m\}$ and compare it with the pattern characteristic value E. If they are equal, perform the second, confirming match; if they differ, the network substring definitely does not match the pattern string. The network substring is then shifted one character to the right to become $\{T_2, T_3, \ldots, T_{m+1}\}$ and matched against the pattern string again. The characteristic value after the shift can be obtained from the value before the shift by a simple operation, and this step is carried out entirely by a hardware circuit.
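The two-pass scheme of [0014] and [0015] can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented hardware: the function names are hypothetical, and the characteristic value is assumed to be the XOR of all bytes (the method the document introduces in section 2.1). For clarity the characteristic value is recomputed for every window rather than updated incrementally.

```python
from functools import reduce
from operator import xor

def feature(s: bytes) -> int:
    """Assumed characteristic value: XOR of all bytes of the string."""
    return reduce(xor, s, 0)

def two_stage_find(text: bytes, pattern: bytes):
    """Return (exact match positions, number of windows that needed a second pass)."""
    m = len(pattern)
    target = feature(pattern)
    hits, suspects = [], 0
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if feature(window) != target:
            continue              # first pass: values differ, so no match is possible
        suspects += 1
        if window == pattern:     # second pass: exact comparison
            hits.append(i)
    return hits, suspects

print(two_stage_find(b"GNIP PING", b"PING"))
```

In this input the window "GNIP" has the same XOR value as "PING", so it survives the first pass and is rejected only by the exact comparison.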

[0016] To increase parallelism, grouped matching can be used. When matching a pattern string of length m in the network string, the network string is first split into $\lceil n/m \rceil$ substrings, each of length m, and the characteristic values of all substrings are computed simultaneously in hardware. Each computed value is compared with the characteristic value of the pattern string; if they are equal, the second match is performed, and otherwise all substrings are shifted by one character at the same time and matching continues. In total only m shifts are needed to complete the whole matching computation.
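The grouped schedule in [0016] can be pictured by listing which window start positions are examined together at each shift. The sketch below is only an illustration of that bookkeeping; `grouped_schedule` is a hypothetical name, and windows that would run past the end of the string are assumed to be skipped.

```python
import math

def grouped_schedule(n, m):
    """For each of the m shifts, list the window start offsets examined in parallel."""
    groups = math.ceil(n / m)
    schedule = []
    for shift in range(m):
        starts = [g * m + shift for g in range(groups) if g * m + shift + m <= n]
        schedule.append(starts)
    return schedule

for shift, starts in enumerate(grouped_schedule(10, 3)):
    print(shift, starts)
```

Taken together, the m rounds visit every window start from 0 to n - m exactly once, which is why m shifts are enough.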

[0017] 2 Computing the characteristic value

[0018] The characteristic value of a network substring is computed in exactly the same way as that of a pattern string. Several computation methods are possible, but each should meet the following requirements:

[0019] 1. The computation is simple and can be implemented directly by a hardware circuit, without CPU intervention.

[0020] 2. It filters out a large share of normal packets; that is, few strings should correspond to the same characteristic value.

[0021] 3. The characteristic value after a shift can be derived from the value before the shift by a simple operation, so that fewer characteristic values have to be computed from scratch.

[0022] Definition 2 (bit vector): in a string, the binary string formed by taking the bit at the same position of each character's ASCII code, in character order, is called a bit vector. Any string of length m has exactly 8 bit vectors of length m. For example, the ASCII codes of the characters in "GOOD" are {01000111, 01001111, 01001111, 01000100}, and the bit vector formed from the lowest bit of each character is {1110}. The string has 8 bit vectors of length 4, namely {0000, 1111, 0000, 0000, 0110, 1111, 1110, 1110}. The characteristic value computed from a bit vector is called a bit characteristic value, denoted $e_i$. The 8 bit characteristic values make up the characteristic value E of the string, $E = [e_7, e_6, e_5, e_4, e_3, e_2, e_1, e_0]$.

[0023] Definition 3 (filter rate): the ratio of the normal data filtered out by the first pass to the total network traffic is called the filter rate. In general, the higher the filter rate of the first pass, the better. When the network carries no intrusion packets, the filter rate of exact matching algorithms such as BM is 100%.

[0024] 2.1 XOR evaluation method

[0025] XORing together the characters of a string yields the XOR characteristic value of the string. Without loss of generality, let the string be $\{S_1, S_2, \ldots, S_m\}$; its characteristic value is $E = S_1 \oplus S_2 \oplus S_3 \oplus \cdots \oplus S_m$, 8 bits in total. For example, the characteristic value of the pattern string "CMD" is E = C ⊕ M ⊕ D = {0101 1001} (computed from the character codes listed for this example in [0046]).
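Definition 2 and the XOR characteristic value can be checked with a few lines of Python. This is an illustrative sketch with hypothetical function names; it reproduces the eight bit vectors of "GOOD" and shows that XORing the characters is the same as taking the parity of each bit vector.

```python
from functools import reduce
from operator import xor

def bit_vectors(s: bytes):
    """Eight bit vectors of s, from the highest bit (bit 7) down to the lowest (bit 0)."""
    return ["".join(str((ch >> k) & 1) for ch in s) for k in range(7, -1, -1)]

def xor_feature(s: bytes) -> int:
    """XOR of all character codes."""
    return reduce(xor, s, 0)

def parity_feature(s: bytes) -> int:
    """Characteristic value assembled from the parity of each bit vector."""
    e = 0
    for vec in bit_vectors(s):
        e = (e << 1) | (vec.count("1") % 2)
    return e

print(bit_vectors(b"GOOD"))
print(format(xor_feature(b"GOOD"), "08b"), format(parity_feature(b"GOOD"), "08b"))
```

Applied to the three character codes that [0046] lists for the "CMD" example (01000011, 01010111, 01001101), `xor_feature` gives 01011001, the value quoted in [0025].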

[0026] Assuming all codes occur with equal probability, the number of 1s in a bit vector is odd or even with equal probability, so a single bit characteristic value has a filter rate of 50%. The bit characteristic values are mutually independent, so the 8 bit characteristic values together give a filter rate of $1 - (1/2)^8 = 99.61\%$. In other words, the suspect packets that need a second match are 0.39% of total traffic. Experiments show an average filter rate of 99.69%.

[0027] During matching, if the characteristic value of the network substring is not equal to that of the pattern string, the network substring must be shifted. XORing the characteristic value before the shift with the character shifted in and the character shifted out gives the characteristic value after the shift. For example, when matching the pattern string "PING" in the string "POWER", first compute the characteristic value of the substring "POWE": $E_1$ = [P ⊕ O ⊕ W ⊕ E] = [0000 1101]. It does not match the characteristic value of "PING", [0001 0000], so the substring is shifted one character to the right and becomes "OWER". The characteristic value of this substring is given by [$E_1$ ⊕ P ⊕ R]; the three XOR operations need not be repeated.

[0028] When matching a pattern string of length m, computing the characteristic value of the first length-m network substring takes (m - 1) XOR operations; each later shift takes only 2 XOR operations.
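The incremental update in [0027] and [0028] can be sketched as below. The function name is hypothetical and the text is assumed to be at least m characters long; the example reuses the "POWER" and "PING" strings from the paragraph above.

```python
from functools import reduce
from operator import xor

def rolling_xor_features(text: bytes, m: int):
    """XOR characteristic value of every length-m window, updated with 2 XORs per shift."""
    e = reduce(xor, text[:m], 0)      # first window: m - 1 XOR operations
    features = [e]
    for i in range(m, len(text)):
        e ^= text[i - m] ^ text[i]    # remove the outgoing character, add the incoming one
        features.append(e)
    return features

pattern_value = reduce(xor, b"PING", 0)
print([format(e, "08b") for e in rolling_xor_features(b"POWER", 4)])
print(format(pattern_value, "08b"))
```

Neither window value equals the value of "PING", so both windows of "POWER" are discarded in the first pass.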

[0029] The hardware implementation uses a simple XOR circuit, shown in Figure 1. E holds the characteristic value of the length-m network substring; its initial value is 0. $D_m$ to $D_0$ form a shift register. The network string is fed in character by character at $D_m$, and the XOR circuit leaves the characteristic value of the length-m network substring in register E.

[0030] 2.2 Modulo-3 evaluation method

[0031] Let a bit vector be $\{b_m, b_{m-1}, \ldots, b_1\}$. Choose a suitable positive integer r and form the expression:

[0032] $b_m r^{m-1} + b_{m-1} r^{m-2} + \cdots + b_1 \qquad (1)$

[0033] Then choose a suitable positive integer M (usually the smallest prime greater than or equal to $\max(b_i)$, $1 \le i \le m$). In the residue class ring $Z_M$ modulo M,

[0034] let

[0035] $e = \sum_{i=1}^{m} b_i r^{i-1} \pmod{M} \qquad (2)$

[0036] Then e is the bit characteristic value of the bit vector.

[0037] To raise the filter rate, the order of the base r in the unit group of $Z_M$ should be as large as possible. If M is prime, r should have order (M - 1) in $Z_M^*$, that is, r should be a generator of the cyclic group $Z_M^*$. For binary bit vectors, M = 3, r = 2 is the usual choice. Choosing M = 2 gives the XOR evaluation algorithm (the source states r = 2 for this case; every bit then needs weight 1, which corresponds to r = 1).

[0038] When e is used as a check code for the string, a 1-bit error is always detected. The probabilities of detecting a 2-bit error and of detecting an error of 3 or more bits are given by expressions that are illegible in this copy (see original document).

[0039] By the check-code result above, for binary bit vectors with M = 3, r = 2, and a pattern string of length m:

[0040] When m = 1, the probability that two different bit vectors have the same bit characteristic value is 0, and the filter rate is 100%.

[0041] When m = 2, the probability that two different bit vectors have the same bit characteristic value is 1/2, which degenerates to the filtering power of the XOR operation.

[0042] When m ≥ 3, the probability that two different bit vectors have the same bit characteristic value is 1/3. Assuming every string occurs with equal probability, one bit characteristic value filters out 2/3 of the normal packets. The bit characteristic values are mutually independent, so the string characteristic value made up of 8 bit characteristic values has a filter rate of $1 - (1/3)^8 = 99.985\%$. The suspect data that needs a second match is only 0.015% of total traffic.

[0043] The values of $2^i$ ($i \ge 0$) in the residue class ring modulo 3 are 1, 2, 1, 2, ..., so to compute a bit characteristic value $e_i$ it is enough to multiply each bit of the bit vector by its weight and add the products modulo 3. For example, the bit characteristic value of the bit vector [111] is (1·1 + 1·2 + 1·1) mod 3 = 01. A bit with weight 2 is shifted left by one position; a bit with weight 1 is not shifted.

[0044] To make a direct hardware implementation easy, a 0 is first inserted in front of every bit of each character's ASCII code, expanding each character to 16 bits. Odd-numbered characters are then left unchanged and even-numbered characters are shifted left by one bit, which completes the weighting. Finally the bits are taken in groups of two and added modulo 3.

[0045] For example, the modulo-3 characteristic value of "CMD" is computed as follows:

[0046] The ASCII codes of the characters in "CMD" are given as {01000011, 01010111, 01001101} (in standard ASCII these three bytes are "C", "W", "M"; the worked values below are consistent with the codes as listed). Expanded to 16 bits they are:

[0047] (table: see original document, page 6)

[0048] After the weighting shift ("M" shifted left by one bit, "C" and "D" unchanged), the bits are taken in groups of two and added modulo 3:

[0049]          C  00 01 00 00 00 00 01 01

[0050]          M  00 10 00 10 00 10 10 10

[0051] mod(3) + D  00 01 00 00 01 01 00 01

[0052]             00 01 00 10 01 00 00 01

[0053] That is, the characteristic value of the string "CMD" is [00 01 00 10 01 00 00 01].
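The worked example in [0045] to [0053] can be reproduced with a short sketch. The function names are hypothetical. Each of the eight bit positions is kept as a small integer (0, 1, or 2) instead of a 2-bit hardware group, and the input is the list of three character codes exactly as printed in [0046].

```python
def mod3_feature(codes):
    """Modulo-3 characteristic value: one residue (0, 1 or 2) per bit position, bit 7 first.

    Odd-numbered characters (1st, 3rd, ...) get weight 1, even-numbered ones weight 2,
    following the weighting rule of [0044].
    """
    groups = [0] * 8
    for pos, code in enumerate(codes):
        weight = 1 if pos % 2 == 0 else 2
        for j in range(8):
            bit = (code >> (7 - j)) & 1
            groups[j] = (groups[j] + weight * bit) % 3
    return groups

def show(groups):
    """Format residues as 2-bit groups, matching the notation of the text."""
    return " ".join(format(g, "02b") for g in groups)

codes = [0b01000011, 0b01010111, 0b01001101]   # codes as listed in [0046]
print(show(mod3_feature(codes)))
```

The same helper reproduces the single-bit example of [0043]: three characters whose lowest bit is 1 give the residue (1 + 2 + 1) mod 3 = 1 in the last group.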

[0054] When the characteristic value of the network substring is not equal to that of the pattern string, the network substring must be shifted so that the next substring can be matched. The characteristic value before the shift is added modulo 3 to the weighted expanded value of the incoming character, and the weighted expanded value of the outgoing character is then subtracted modulo 3; the result is the characteristic value after the shift. Subtracting a character's weighted expanded value modulo 3 is the same as adding, modulo 3, the bitwise complement of that weighted expanded value. When matching a pattern string of length m, computing the first network substring characteristic value takes (m - 1) modulo-3 additions; after that only 2 modulo-3 additions are needed per shift.

[0055] When the characteristic values do not match, the characteristic value of the network substring after an odd number of shifts is the one given by the formula

[0056] $e' = \sum_{i=1}^{m} b_i r^{i} \pmod{M} \qquad (3)$

[0057] and must be converted to the characteristic value defined by formula (2). Modulo 3, $\sum_{i=1}^{m} b_i r^{i-1} + \sum_{i=1}^{m} b_i r^{i} = \sum_{i=1}^{m} b_i r^{i-1}(1 + r) = 0$, because 1 + r = 3. Taking the modulo-3 complement of the characteristic value obtained after an odd number of shifts therefore completes the conversion. The modulo-3 complement is obtained by swapping the two bits of each bit characteristic value: the modulo-3 complement of [01] is [10], that of [10] is [01], and that of [00] is [00].

[0058] The hardware circuit for modulo-3 evaluation is shown in Figure 2. After passing through the expansion and weighting circuit, the network string is fed character by character into the shift register $D_m$ to $D_0$; the modulo-3 adder then produces the characteristic value of the length-m network substring, which is stored in register E. After an even number of shifts the modulo-3 characteristic value is output directly through the selection circuit; after an odd number of shifts it first passes through the modulo-3 complement circuit and is then output by the selection circuit.

[0059] For example, the pattern string "CMD" has characteristic value E = [00 01 00 10 01 00 00 01], and the network string to be matched is "HELL". To keep the weighting order of the network string consistent with that of the pattern string, the network string is weighted and grouped from right to left. The weights of the letters are:

[0060] H [ELL]

[0061] Weights: 2 1 2 1

[0062] First compute the characteristic value of "ELL", $E_1$ = [00 01 00 00 00 01 00 01], which does not match the pattern characteristic value E. The network substring is shifted left by one character and the characteristic value $E_2$ of "HEL" is computed: $E_2$ = ($E_1$ + "2H" + "~L") mod 3 = [00 10 00 00 01 00 00 01], where "2H" is the weighted expanded value of H and "~L" is the bitwise complement of the expanded value of L. Taking the modulo-3 complement of $E_2$ gives [00 01 00 00 10 00 00 10]. This is compared with the pattern characteristic value E again; it does not match, so the string "HELL" does not contain the pattern string "CMD" and matching ends.
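The shift-and-complement step of this example can be checked with a small sketch. All names are hypothetical; residues are held as integers 0 to 2 per bit position, subtraction modulo 3 stands in for adding the bitwise complement, and the rightmost character of a window is assumed to carry weight 1, as in formula (2). The sketch compares the updated value with a direct computation for "HEL".

```python
def bits_msb_first(code):
    """The 8 bits of a character code, most significant bit first."""
    return [(code >> (7 - j)) & 1 for j in range(8)]

def mod3_direct(window):
    """Formula (2): rightmost character has weight 1, weights alternate 1, 2 leftwards."""
    groups = [0] * 8
    for k, code in enumerate(reversed(window)):
        weight = 1 if k % 2 == 0 else 2
        for j, bit in enumerate(bits_msb_first(code)):
            groups[j] = (groups[j] + weight * bit) % 3
    return groups

def slide_left(groups, incoming, w_in, outgoing, w_out):
    """Add the incoming character and subtract the outgoing one, group by group, modulo 3."""
    b_in, b_out = bits_msb_first(incoming), bits_msb_first(outgoing)
    return [(g + w_in * i - w_out * o) % 3 for g, i, o in zip(groups, b_in, b_out)]

def mod3_complement(groups):
    """Swap 01 and 10 in every group; 00 stays 00."""
    return [(-g) % 3 for g in groups]

def show(groups):
    return " ".join(format(g, "02b") for g in groups)

text = b"HELL"
e1 = mod3_direct(text[1:])                      # "ELL" with weights 1, 2, 1
raw = slide_left(e1, text[0], 2, text[3], 1)    # bring in H (weight 2), drop the last L (weight 1)
e2 = mod3_complement(raw)                       # convert the formula (3) form to formula (2)
print(show(e1))
print(show(e2))
```

After one shift the stored weights are swapped relative to formula (2), which is why the complement is needed before the comparison with the pattern value.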

[0063] The modulus M can also be another prime. When M = 5, the weights of the characters in the string become 1, 2, 4, 3, 1, 2, 4, 3, .... The filter rate of the corresponding algorithm is $1 - (1/5)^8 = 99.99974\%$, and the data that needs a second match is only 0.00026% of total traffic. The filter rate rises as M increases, but the hardware circuit becomes harder to build and more expensive.
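The weight sequences and filter rates quoted for the different moduli follow from two one-line formulas. The sketch below uses hypothetical helper names and assumes, as the text does, 8 independent bit characteristic values that are each uniformly distributed over M residues.

```python
def weight_cycle(M, r=2, count=8):
    """Weights r**i mod M for i = 0 .. count-1."""
    return [pow(r, i, M) for i in range(count)]

def filter_rate(M, bit_features=8):
    """Fraction of normal data removed by the first pass: 1 - (1/M)**8."""
    return 1 - (1 / M) ** bit_features

for M in (3, 5):
    print(M, weight_cycle(M))
for M in (2, 3, 5):
    print(M, f"{100 * filter_rate(M):.5f}%")
```

With M = 2 the rate corresponds to the XOR method, and the step from M = 3 to M = 5 shrinks the residual traffic by roughly a factor of 60.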

[0064] 3 The characteristic-value-based multi-pattern matching process

[0065] In 1975 Aho and Corasick proposed a multi-pattern matching algorithm based on a finite-state machine (the AC algorithm), which searches for several strings in parallel. The search time is O(n), and the time to build the automaton is linear in the length of the pattern strings. The algorithm in common use today is AC_BM, a combination of the AC and BM algorithms: the different rules are placed in one pattern tree, and the tree is then searched with the BM algorithm.

[0066] The idea of the characteristic-value-based multi-pattern matching algorithm is as follows. The rules in the pattern library are grouped by length; within each group they are sorted by characteristic value and indexed, and rules in the same group that share a characteristic value are chained as a linked list under the same characteristic-value index. The first pass finds the characteristic-value index that corresponds to a suspect string; the second, exact pass then compares the suspect string with the rules.

Brief Description of the Drawings

[0067] Figure 1: XOR evaluation circuit.

[0068] Figure 2: modulo-3 evaluation circuit.

[0069] Figure 3: organization of the rule library.

[0070] Figure 4: XOR multi-pattern evaluation.

[0071] Figure 5: modulo-3 multi-pattern evaluation.

[0072] Figure 6: matching circuit.

[0073] Figure 7: comparison of processing times.

Detailed Description

[0074] 1 Organization of the pattern library

[0075] The whole pattern library forms a tree structure. The length of a pattern string is denoted L (assume max(L) = m). Initialization time is linear in the size of the pattern library. The organization of the whole library is shown in Figure 3. Characteristic-value matching is done by the first-pass matching circuit; the rule matching beneath it is done by the second pass.

[0076] 2 Hardware implementation of multi-pattern characteristic value computation

[0077] The multi-pattern characteristic value circuit computes the characteristic values of network substrings of different lengths. The whole computation is carried out autonomously by hardware, without CPU intervention.

[0078] 2.1 XOR hardware circuit

[0079] To speed up the computation, a bank of shift registers and a bank of XOR circuits compute the characteristic values of network substrings of different lengths at the same time. The circuit is shown in Figure 4. (For simplicity, the maximum pattern length in the rule library is taken to be 7.)

[0080] Let the input network string be T = {"abcdefghijklmn"}, fed character by character into the shift registers $D_7$ to $D_0$. All registers initially hold 0.

[0081] On the first beat, the character 'a' is shifted into $D_7$ and, after passing through the XOR circuit, is stored in $R_7$.

[0082] On the second beat, the character 'b' is shifted into $D_7$ and XORed with the 'a' in $R_7$, and the result is stored in $R_7$; the character 'a' moves into $D_6$ and, after passing through the XOR circuit, is stored in $R_6$.

[0083] On the 7th beat, $D_7$ to $D_1$ hold "gfedcba"; $R_7$ holds the value g ⊕ f ⊕ e ⊕ d ⊕ c ⊕ b ⊕ a, $R_6$ holds the value f ⊕ e ⊕ d ⊕ c ⊕ b ⊕ a, and so on.

[0084] On the 8th beat, the character 'a' moves into $D_0$ and is fed back to be XORed with each characteristic value, which removes 'a'. $R_7$ now holds h ⊕ g ⊕ f ⊕ e ⊕ d ⊕ c ⊕ b and $R_6$ holds g ⊕ f ⊕ e ⊕ d ⊕ c ⊕ b. In other words, $R_7$ always holds the characteristic value of a network substring of length 7, $R_6$ always holds that of a network substring of length 6, and so on.
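The beat-by-beat behaviour described in [0080] to [0084] can be simulated in software. This is a behavioural sketch only, with hypothetical names: the list `d` stands for the shift-register stages $D_7$ to $D_0$, the list `r` for the registers $R_7$ to $R_1$, and each register is assumed to XOR in the character arriving at its own stage and XOR out the character arriving at $D_0$.

```python
from functools import reduce
from operator import xor

def simulate(text: bytes, max_len: int = 7):
    """Return the contents of R_1 .. R_max_len after each beat (index 0 is unused)."""
    d = [0] * (max_len + 1)       # d[k] models stage D_k; all registers start at 0
    r = [0] * (max_len + 1)       # r[k] models register R_k
    history = []
    for ch in text:
        d = d[1:] + [ch]          # one beat: D_0 <- D_1 <- ... <- D_max <- new character
        for k in range(1, max_len + 1):
            r[k] ^= d[k] ^ d[0]   # add the character reaching D_k, remove the one reaching D_0
        history.append(list(r))
    return history

history = simulate(b"abcdefgh")
beat7, beat8 = history[6], history[7]
print(format(beat7[7], "08b"), format(beat8[7], "08b"), format(beat8[6], "08b"))
```

In this model the register for length k holds the XOR of k consecutive characters once the pipeline has filled, so all lengths are available on every beat.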

[0085] Each time a character is shifted in, the XOR characteristic values of network substrings of all lengths are obtained at once and are matched against the characteristic values of the pattern strings of the same length in the pattern library: the value in $R_7$ is matched against the L = 7 branch of the library, the value in $R_6$ against the L = 6 branch, and so on.

[0086] 2.2 Modulo-3 hardware circuit

[0087] The XOR circuit is very fast, but its filter rate is not very high, and many suspect network strings still need a second match. To meet the demands of faster networks, an algorithm with a higher filter rate should be used, such as the modulo-3, modulo-5, or modulo-7 algorithm.

[0088] The hardware circuit for the modulo-3 algorithm is shown in Figure 5. (For simplicity, the maximum pattern length is taken to be 5 and all registers initially hold 0.)

[0089] Each input character first passes through weighting and expansion and becomes 16 bits. It then enters the shift register; when it reaches $D_0$ it passes through an inverting circuit and is fed back to be added modulo 3 to each characteristic value, which removes that character.

[0090] For example, let the input network string be T = {"abcdef"}. The ASCII codes of the characters are {61H, 62H, 63H, 64H, 65H, 66H}; after expansion and shifting they become {2802H, 1404H, 280AH, 1410H, 2822H, 1414H}, which are fed in turn into $D_5$ to $D_0$.

[0091] On the first beat, 2802H is shifted into $D_5$ and, after passing through the modulo-3 adder, is stored in $R_5$.

[0092] On the second beat, 1404H is shifted into $D_5$ and added group by group modulo 3 to the 2802H in $R_5$, giving 0006H, which is stored in $R_5$; 2802H moves into $D_4$ and, after passing through the modulo-3 adder, is stored in $R_4$.
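The expansion, weighting, and group-wise addition used in this example can be sketched as follows. The helper names are hypothetical, and the assumption that the 1st, 3rd, and 5th input characters receive weight 2 is inferred from the listed value 2802H for 'a'. Under the expansion rule of [0044], the fourth value ('d', code 64H, weight 1) comes out as 1410H.

```python
def expand16(code):
    """Insert a 0 in front of every bit: 8 bits become eight 2-bit groups."""
    out = 0
    for k in range(8):
        out |= ((code >> k) & 1) << (2 * k)
    return out

def weighted(text):
    """Weight 2 (shift left by one bit) for the 1st, 3rd, ... character; weight 1 otherwise."""
    return [(expand16(c) << 1) if i % 2 == 0 else expand16(c) for i, c in enumerate(text)]

def add_mod3(x, y):
    """Add two 16-bit values group by group (2 bits per group), each sum reduced modulo 3."""
    out = 0
    for k in range(8):
        s = (((x >> (2 * k)) & 3) + ((y >> (2 * k)) & 3)) % 3
        out |= s << (2 * k)
    return out

codes = weighted(b"abcdef")
print([format(c, "04X") for c in codes])
print(format(add_mod3(codes[1], codes[0]), "04X"))
```

The second printed value is the content of $R_5$ after the second beat, 0006H.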

[0093] On the 5th clock cycle, D1 through D5 hold {2802H, 1404H, 280AH, 1420H, 2822H, 1414H}; R5 holds the modulo-3 characteristic value of the string {abcde}, R4 holds the modulo-3 characteristic value of {abcd}, and so on.

[0094] On the 8th clock cycle, the weighted expansion of the character 'a' is shifted into D0, inverted, and fed back to be added modulo 3 to each characteristic value, which removes the character 'a'. At this point R5 holds the modulo-3 characteristic value of {bcdef} as computed by formula (3), and the complement circuit converts it into the modulo-3 characteristic value defined by formula (2). R4 holds the modulo-3 characteristic value of {bcde} as computed by formula (3), and the complement circuit outputs the value defined by formula (2). The remaining registers follow the same pattern.
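The removal of the oldest character by inversion can be illustrated as follows. This is a sketch under the same assumption as before (eight 2-bit groups per word, each reduced modulo 3): inverting a group with value v gives 3 - v, which is the additive inverse of v modulo 3, so adding the inverted word cancels the character's earlier contribution. The sketch does not model the clock timing or the complement circuit of formulas (2) and (3); the helper names are hypothetical.

```python
from functools import reduce

MASK16 = 0xFFFF


def add_mod3_groups(x: int, y: int) -> int:
    """Add two 16-bit words as eight 2-bit groups, each reduced modulo 3."""
    result = 0
    for shift in range(0, 16, 2):
        result |= ((((x >> shift) & 3) + ((y >> shift) & 3)) % 3) << shift
    return result


def invert(word: int) -> int:
    """Bitwise NOT on 16 bits; each group v becomes 3 - v, i.e. -v modulo 3."""
    return ~word & MASK16


def sliding_values(words, k):
    """Return the accumulated value of every window of k consecutive words."""
    acc, out = 0, []
    for i, w in enumerate(words):
        acc = add_mod3_groups(acc, w)  # new character enters
        if i >= k:
            acc = add_mod3_groups(acc, invert(words[i - k]))  # oldest leaves
        if i >= k - 1:
            out.append(acc)
    return out


def direct_value(words):
    """Reference: accumulate a window from scratch."""
    return reduce(add_mod3_groups, words, 0)


# Expanded words for "abcdef" as listed in the text.
words = [0x2802, 0x1404, 0x280A, 0x1420, 0x2822, 0x1414]
values = sliding_values(words, 5)
print(values == [direct_value(words[0:5]), direct_value(words[1:6])])  # True
```

The incremental update needs one addition for the incoming word and one for the inverted outgoing word, regardless of the window length.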

[0095] From then on, every clock cycle simultaneously yields the modulo-3 characteristic values of network strings of all lengths, which are then matched against the characteristic values of the pattern strings of the same length in the pattern library.

[0096] 3 Handling of the matching process

[0097] To increase matching speed and reduce the time spent searching the pattern library for characteristic values, direct address mapping is used to look up pattern characteristic values. When the pattern library is initialized, each rule-tree branch (one per pattern length) is allocated 256 bytes (or words) of storage, and the address of the first cell is saved in the corresponding base register; it is the starting address in memory of the characteristic values for pattern strings of that length. Each storage cell holds a pointer to the rule whose pattern characteristic value equals the cell's offset address. If no pattern characteristic value equals the offset address, the pointer is null; if several rules have a characteristic value equal to the offset address, the other rules are linked one after another behind the first rule, forming a linked list. Once the characteristic-value calculation circuit has produced the characteristic value of a network substring of a given length, the storage cell whose offset address equals that value is accessed directly. If the pointer in that cell is null, the substring has no match in the pattern library; otherwise a second match is performed against the rules the pointer refers to. In a pattern library built from 100 rules extracted from the Snort rule base, cells with a non-null pointer account for 2.07% of the total storage space, and rules that share a storage cell account for 0.19%. As the pattern library grows, the number of cells with a non-null pointer also increases.
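The table layout described above can be sketched in software. The example below is illustrative only: it assumes the characteristic value is the byte-wise XOR of the string (so it fits the 256 cells per length), it uses a Python list in place of the linked list of rules, and the names `xor_value`, `build_tables`, and `candidates` are hypothetical.

```python
from functools import reduce

TABLE_SIZE = 256  # one cell per possible 8-bit characteristic value


def xor_value(data: bytes) -> int:
    """Assumed characteristic value: XOR of all bytes of the string."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def build_tables(patterns):
    """Create one 256-cell table per pattern length; each cell is None or a chain."""
    tables = {}
    for p in patterns:
        table = tables.setdefault(len(p), [None] * TABLE_SIZE)
        offset = xor_value(p)
        if table[offset] is None:
            table[offset] = []       # first rule for this offset
        table[offset].append(p)      # later rules are chained behind it
    return tables


def candidates(tables, substring: bytes):
    """Direct lookup: empty result means the first match filtered the substring out."""
    table = tables.get(len(substring))
    if table is None:
        return []
    return table[xor_value(substring)] or []


tables = build_tables([b"ab", b"ba", b"root"])
print(candidates(tables, b"ab"))  # [b'ab', b'ba']
print(candidates(tables, b"ac"))  # []
```

Here `b"ab"` and `b"ba"` have the same XOR value and therefore share a cell, which is why the second, exact match is still required for any non-empty result.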

[0098] The complete matching circuit is shown in Figure 6. The input network string passes through the multi-pattern characteristic-value calculation circuit, which produces the characteristic values of its substrings of every length. The characteristic value of a substring is used as an offset address and added to the base address of the pattern strings of the same length to form an actual physical address. If the pointer in the storage cell at that physical address is null, the network substring has no match in the pattern library; if the pointer is non-null, the network substring undergoes a second, confirming match against the rules the pointer refers to. The time from computing a substring's characteristic value to locating the pattern characteristic value of the corresponding length is therefore very short and can essentially be ignored.
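A software model of this two-stage flow might look like the sketch below. It is not the circuit of Figure 6: it assumes a byte-wise XOR characteristic value, models memory as one flat Python list with a base address per pattern length, and recomputes each substring value from scratch rather than incrementally. All names (`load_library`, `scan`, `CELLS_PER_LENGTH`) are hypothetical.

```python
from functools import reduce

CELLS_PER_LENGTH = 256


def xor_value(data: bytes) -> int:
    """Assumed characteristic value: XOR of all bytes of the string."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def load_library(patterns):
    """Lay out one 256-cell region per pattern length in a flat memory."""
    lengths = sorted({len(p) for p in patterns})
    base = {length: i * CELLS_PER_LENGTH for i, length in enumerate(lengths)}
    memory = [None] * (CELLS_PER_LENGTH * len(lengths))
    for p in patterns:
        address = base[len(p)] + xor_value(p)  # base address + offset
        if memory[address] is None:
            memory[address] = []
        memory[address].append(p)
    return base, memory


def scan(text: bytes, base, memory):
    """Return (start, pattern) for every confirmed occurrence in text."""
    hits = []
    for end in range(1, len(text) + 1):
        for length, base_address in base.items():
            if length > end:
                continue
            sub = text[end - length:end]
            cell = memory[base_address + xor_value(sub)]
            if cell is None:
                continue             # first match: filtered out
            for p in cell:           # second match: exact comparison
                if p == sub:
                    hits.append((end - length, p))
    return hits


base, memory = load_library([b"cmd.exe", b"root", b"ab"])
print(scan(b"GET /cmd.exe?root", base, memory))
# [(5, b'cmd.exe'), (13, b'root')]
```

Only substrings whose cell is non-null reach the exact comparison, which mirrors the division of work between the first and second match.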

[0099] If the modulo-5 method is used to compute string characteristic values, the filtering rate can reach 99.99974%, so the amount of data requiring a second match is very small, only 0.00026% of total network traffic. However, the hardware implementation is more complex, and the added delay of data passing through the hardware can become the processing bottleneck; this has to be addressed by running several hardware circuits in parallel.

[0100] The algorithm relies mainly on the two-pass matching method and on hardware implementation to reduce the load on the system CPU, so that a misuse-based intrusion detection system can run effectively on high-speed networks. The complete matching circuit can be built as a plug-in card inserted into a PCI slot on the system motherboard, or integrated directly into a network card or another circuit board. In the first match, the characteristic values are computed jointly by the shift register group, the XOR circuit group (or the modulo-3 calculation circuit group), and the register group. The shift register group receives data packets directly from the high-speed network card; the XOR circuit group (or the modulo-3 calculation circuit group) computes the characteristic values of network substrings of different lengths, which are then stored in the register group. When one set of calculation circuits cannot meet the requirements, several sets can process data in parallel. After initialization, the rule pattern library is stored in EEPROM and connected directly to the calculation circuit group, which avoids frequent re-initialization of the pattern library and data transfer delays. When the pattern library needs updating, rules are added to the EEPROM. The second match can be carried out by the CPU or by a hardware comparison circuit. When an intrusion packet is found at the end of the matching process, the packet and the code of its corresponding rule are handed over to the system for further analysis and alarm handling.

[0101] To further increase processing speed, the algorithm can be combined with protocol analysis. Network packets of different protocol types are assigned to different pattern matching circuits, and the EEPROM of each matching circuit stores the matching rules for the corresponding protocol type. This raises a load-balancing issue, but it effectively increases processing speed and simplifies the system's subsequent processing.

[0102] 4 Beneficial effects

[0103] To test the actual performance of the algorithm, 150M of data was captured at random from a network, with pattern strings taken from the rule base of the Snort intrusion detection system. For the characteristic-value-based multi-pattern matching algorithm, three figures were recorded: the execution time of the software simulation, the processing time of the XOR hardware implementation, and the processing time of the modulo-3 hardware implementation. The time from computing a characteristic value to locating the pattern characteristic value of the corresponding length is very short; the algorithm's running time is spent mainly on the second match.

[0104] The measured times are shown in Table 1:

[0105] Table 1. Processing time (unit: seconds)

[Table 1: see page 10 of the original document.]

[0107] Experimental conditions: 500 MHz Intel CPU, 256 MB RAM, Windows 2000.

[0108] The data in the table show that the processing time of the BM (Boyer-Moore) algorithm grows linearly with the number of pattern strings and is far longer than that of the other algorithms. The AC (Aho-Corasick) algorithm is very effective for multi-pattern matching but still cannot meet the requirements of intrusion detection on high-speed networks. The XOR hardware circuit can essentially meet the requirements of intrusion detection on 100M networks; the modulo-3 hardware circuit can meet those of 1000M networks; for still faster networks, modulo-5, modulo-7, or similar hardware circuits can be used.

[0109] The processing times of the different algorithms can be seen more clearly in a chart. Because the BM algorithm takes so much longer, it is omitted from the figure. The comparison is shown in Figure 7.

[0110] As the number of rules in the pattern library increases, the processing time of every algorithm increases as well, but the characteristic-value algorithm is not very sensitive to it. Growth of the pattern library has no effect at all on the time taken by the first match; only the time of the second match increases slightly.

[0111] The essence of the characteristic-value-based matching algorithm is the idea of matching twice, which departs completely from the approach of existing matching algorithms. The processing-speed bottleneck is moved to the first match, which is implemented in hardware and filters out the large volume of normal packets; the small number of suspicious packets then undergo a second, exact match. Preferably both the first and the second match are implemented in hardware, to reduce the load on the system and improve operating efficiency. There are many ways to compute a characteristic value, such as summation or cyclic redundancy calculation, but they should satisfy the conditions for characteristic-value computation given earlier as far as possible.

Claims (3)

  1. A multi-pattern matching method based on characteristic values, which uses two matching passes: the first match filters out the large volume of normal data packets, and the second match further confirms the suspicious data packets; the first match is implemented entirely in hardware, and the second match may be implemented in hardware or carried out by the CPU; characteristic values are matched by direct address mapping: a string passes through the multi-pattern characteristic-value calculation circuit to obtain its characteristic value, and this value is used as an offset address and added to the base address of the pattern strings of the same length as the string to form an actual physical address; if the pointer in the storage cell at that physical address is null, the string has no match in the pattern library; otherwise the string undergoes a second, confirming match against the pattern string the pointer refers to.
  2. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the multi-pattern characteristic-value calculation circuit consists of a shift register group, a calculation circuit group, and a characteristic-value register group, where the calculation circuit group is an XOR calculation circuit group, a modulo-3 calculation circuit group, or a modulo-5 calculation circuit group; the characteristic values of strings of all lengths are computed simultaneously by the multi-pattern characteristic-value calculation circuit, without CPU intervention.
  3. The multi-pattern matching method based on characteristic values according to claim 1, characterized in that: the pattern strings in the pattern library are grouped by length, and the pattern strings of each length are allocated 256 bytes or words of storage; the address of the first cell is saved in the corresponding base register as the starting address in memory of the characteristic values for pattern strings of that length; each storage cell holds a pointer to the pattern string whose characteristic value equals the cell's offset address; if no pattern string has a characteristic value equal to the offset address, the pointer is null; if several pattern strings have a characteristic value equal to the offset address, the other pattern strings are linked one after another behind the first pattern string, forming a linked list.
CN 200410023142 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value CN1691581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410023142 CN1691581B (en) 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410023142 CN1691581B (en) 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value

Publications (2)

Publication Number Publication Date
CN1691581A true CN1691581A (en) 2005-11-02
CN1691581B true CN1691581B (en) 2010-04-28

Family

ID=35346743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410023142 CN1691581B (en) 2004-04-26 2004-04-26 Multi-pattern matching algorithm based on characteristic value

Country Status (1)

Country Link
CN (1) CN1691581B (en)

Also Published As

Publication number Publication date Type
CN1691581A (en) 2005-11-02 application

Similar Documents

Publication Publication Date Title
Bispo et al. Regular expression matching for reconfigurable packet inspection
US5452451A (en) System for plural-string search with a parallel collation of a first partition of each string followed by finite automata matching of second partitions
US6493698B1 (en) String search scheme in a distributed architecture
Becchi et al. Memory-efficient regular expression search using state merging
Yu et al. Gigabit rate packet pattern-matching using TCAM
US20030065800A1 (en) Method of generating of DFA state machine that groups transitions into classes in order to conserve memory
US20080189784A1 (en) Method and Apparatus for Deep Packet Inspection
Vasiliadis et al. Regular expression matching on graphics hardware for intrusion detection
Li et al. Fast and accurate long-read alignment with Burrows–Wheeler transform
US8392590B2 (en) Deterministic finite automata (DFA) processing
US8301788B2 (en) Deterministic finite automata (DFA) instruction
Lakshminarayanan et al. Algorithms for advanced packet classification with ternary CAMs
US7529746B2 (en) Search circuit having individually selectable search engines
US20080046423A1 (en) Method and system for multi-character multi-pattern pattern matching
US7644080B2 (en) Method and apparatus for managing multiple data flows in a content search system
US20080071781A1 (en) Inexact pattern searching using bitmap contained in a bitcheck command
US7240048B2 (en) System and method of parallel pattern matching
US7933282B1 (en) Packet classification device for storing groups of rules
US7539032B2 (en) Regular expression searching of packet contents using dedicated search circuits
Che et al. DRES: Dynamic range encoding scheme for TCAM coprocessors
Lu et al. A memory-efficient parallel string matching architecture for high-speed intrusion detection
Sourdis et al. Regular expression matching in reconfigurable hardware
US20090138440A1 (en) Method and apparatus for traversing a deterministic finite automata (DFA) graph compression
US20130133064A1 (en) Reverse nfa generation and processing
US7110540B2 (en) Multi-pass hierarchical pattern matching

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted
C17 Cessation of patent right