Single-packet regular expression matching device and method
Technical field
The present invention relates to network security systems, and in particular to a single-packet regular expression matching device and method.
Background technology
A regular expression describes a pattern for string matching. It is used in text matching to find the parts of a given string that match a given regular expression. In the prior art, regular expressions are widely used, chiefly in the communications industry and the network security field, to perform pattern-matching inspection of data streams, for example in protocol analysis, virus detection, and service classification.
In the prior art, regular expression matching requires the regular expression to be converted in advance into a DFA (Deterministic Finite Automaton). A logic chip then executes this DFA, driven by the compiled DFA and the characters of the input data stream. In practice, however, there is usually not one rule to check but thousands. If each rule used its own DFA, the traffic to be matched would have to be checked thousands or even tens of thousands of times, which is clearly infeasible. So that no rule is missed, the rules are therefore generally compiled together into one large DFA, typically hundreds of megabytes in size. During matching, the traffic to be matched is the input, and the output of the DFA is reported as the matched rule.
Because a large DFA occupies hundreds of megabytes, and a typical logic chip cannot integrate an internal memory of that capacity, the DFA can only be stored in external SRAM (Static Random Access Memory) or SDRAM (Synchronous Dynamic Random Access Memory). During matching, a fragment of the DFA is read according to the current state and the input character and cached inside the logic chip. The table entries associated with the current state must be loaded continually, and frequent state transitions cause state-related table entries to be loaded repeatedly. The more complex the DFA, the more table entries are loaded. This prior-art matching method therefore takes a long time and has low matching performance.
Summary of the invention
To overcome the above drawbacks, the invention provides a single-packet regular expression matching device and method that shorten the time required for matching.
To achieve the above object, the invention provides a single-packet regular expression matching device comprising a single-packet regular expression matching unit and a cache unit connected to it. The single-packet regular expression matching unit comprises a regular expression matching module; the improvement is that it further comprises a protocol variable matching module connected to the regular expression matching module.
In a preferred technical solution provided by the invention, the cache unit comprises an off-chip cache, in which the off-chip DFA table is stored, and an on-chip cache, in which the on-chip protocol variable table is stored. The off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
In a second preferred technical solution provided by the invention, the protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it. The protocol variable matching engine module receives the packet data stream of the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache. The result processing module holds a regular expression DFA address information table.
In a third preferred technical solution provided by the invention, the regular expression matching module comprises a regular expression matching engine (RgxBranchRngine) and, each connected to that engine, an off-chip DFA table read module (RgxOffChipDfaCtrl) and a result collection module (RgxResCollecter). The regular expression matching engine receives the matching result of the protocol variable matching module.
In a fourth preferred technical solution provided by the invention, the off-chip cache is SRAM or SDRAM.
In a fifth preferred technical solution provided by the invention, the on-chip protocol variable table holds the protocol variables, which comprise the IP header, the TCP header, and the UDP header.
In a sixth preferred technical solution provided by the invention, the regular expression matching engine is provided with 4 parallel engines or 8 parallel engines.
In a seventh preferred technical solution provided by the invention, the regular expression matching engine groups the regular expressions according to the matching result of the protocol variable matching module: regular expressions that share the same protocol variable are placed in one group, and a relation is established between each regular expression group and its protocol variable.
In an eighth preferred technical solution provided by the invention, the off-chip DFA table is formed from the regular expression groups.
In a ninth preferred technical solution provided by the invention, a single-packet regular expression matching method for the single-packet regular expression matching device is provided; the improvement is that the matching method comprises the following steps:
(1) the protocol variable matching engine module extracts the header information of each received packet and performs protocol variable matching;
(2) the result processing module passes each packet that matched a protocol variable to the regular expression matching engine for the associated DFA matching;
(3) the regular expression matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regular expression matching engine completes regular expression matching of the received packet, it transfers the matching result to the result collection module.
In a tenth preferred technical solution provided by the invention, in step (1):
The protocol variable matching module receives each packet in the data stream, extracts the corresponding fields from the packet header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for the associated DFA matching. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number: 2, 3, 4, 5, 6, 7 or 8.
In a more preferred technical solution provided by the invention, in step (2): the result processing module looks up, in the regular expression DFA address information table, the DFA address information corresponding to the packet. After the packet is passed to the regular expression matching engine, the engine finds the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
Compared with the prior art, the single-packet regular expression matching device and method provided by the invention group the regular expressions according to protocol variables and compile each group separately, yielding multiple DFAs. Packets are first matched against the protocol variables, and the matching result then determines which DFA is loaded for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time, and improves matching performance.
Description of drawings
Fig. 1 is a structural diagram of the single-packet regular expression matching device.
Fig. 2 is a structural diagram of the regular expression matching module.
Fig. 3 is a flow chart of regular expression matching.
Fig. 4 is a diagram of the storage format of the protocol variables in the on-chip cache.
Embodiment
As shown in Fig. 1, the single-packet regular expression matching device comprises a single-packet regular expression matching unit and a cache unit connected to it. The single-packet regular expression matching unit comprises a regular expression matching module and a protocol variable matching module connected to the regular expression matching module.
The cache unit comprises an off-chip cache, in which the off-chip DFA table is stored, and an on-chip cache, in which the on-chip protocol variable table is stored. The off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
The protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it. The protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache. The result processing module holds a regular expression DFA address information table. The on-chip protocol variable table holds the protocol variables, which comprise the IP header, the TCP header, and the UDP header.
As shown in Fig. 2, the regular expression matching module comprises a regular expression matching engine (RgxBranchRngine) and, each connected to that engine, an off-chip DFA table read module (RgxOffChipDfaCtrl) and a result collection module (RgxResCollecter). The regular expression matching engine receives the matching result of the protocol variable matching module and is provided with 4 parallel engines. According to the matching result of the protocol variable matching module, the regular expressions are grouped: regular expressions that share the same protocol variable are placed in one group, yielding multiple regular expression groups, and a relation is established between each group and its protocol variable. The off-chip DFA table is formed from the regular expression groups. The number of groups is determined as follows:
1. According to the number of protocol variable types supported. For example, if source and destination IP, source and destination port, protocol, and packet payload length are supported (6 types), the regular expressions should be compiled into at least 6 groups.
2. With few groups, compiling the DFA for each group may take a long time. The regular expressions can therefore be split into more groups according to the complexity of the regular expressions that follow the protocol variables, so as to reduce compilation time. For example, 100 rules can be divided into 10 to 20 groups.
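The grouping described above can be sketched in software as follows. This is only an illustration under stated assumptions: the rule tuples, the function names `group_rules` and `compile_groups`, and the sample patterns are hypothetical, and Python's `re` module (a backtracking matcher, not a DFA compiler) stands in for the per-group DFA compilation that the device performs.

```python
import re
from collections import defaultdict


def group_rules(rules):
    """Group regex patterns by the protocol variable they are bound to."""
    groups = defaultdict(list)
    for proto_var, pattern in rules:
        groups[proto_var].append(pattern)
    return dict(groups)


def compile_groups(groups):
    """Compile one matcher per group (a stand-in for one DFA per group)."""
    return {
        proto_var: re.compile("|".join(f"(?:{p})" for p in patterns))
        for proto_var, patterns in groups.items()
    }


# Hypothetical rules: (protocol variable, regular expression).
rules = [
    (("ip_dip", "192.168.0.1"), "HTTP"),
    (("ip_dip", "192.168.0.1"), "GET /"),
    (("ip_sip", "10.0.0.1"), "ssh-[0-9.]+"),
]
groups = group_rules(rules)
matchers = compile_groups(groups)
```

Because each group is compiled on its own, only the matcher selected by the protocol variable has to be consulted for a given packet, which mirrors the reduction in loaded DFA data that the text describes.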
The regular expressions are grouped according to protocol variables, and each group is compiled separately, yielding multiple DFAs. Packets are first matched against the protocol variables, and the matching result then determines which finite automaton is loaded for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time, and improves matching performance.
A rule with protocol variables is described below:
The rule is: ip_sip=10.0.0.1&ip_dip=192.168.0.1&tcp_payload=HTTP. Here ip_sip and ip_dip are protocol variables, representing the source IP address and the destination IP address in the IP header respectively. When the software parses the rule, it assigns each header field a code that denotes its type. The type codes of ip_sip and ip_dip are 0x1 and 0x2, and the format in which the protocol variables are stored in the on-chip cache is shown in Fig. 4. tcp_payload=HTTP is the regular expression, which is compiled into a DFA and stored in the off-chip cache.
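A minimal sketch of how software might split such a rule into protocol variables and a regular expression is given below. The type codes 0x1 and 0x2 come from the text; the function name `parse_rule`, the encoding of each IP address as a 32-bit integer, and the treatment of `tcp_payload` as the only payload key are assumptions made for illustration. The on-chip layout of Fig. 4 is not reproduced here, and the naive split on `&` assumes the regular expression itself contains no `&`.

```python
import ipaddress

# Type codes stated in the text for the two protocol variables.
TYPE_CODES = {"ip_sip": 0x1, "ip_dip": 0x2}
# Assumption: only this key carries the regular expression.
PAYLOAD_KEYS = {"tcp_payload"}


def parse_rule(rule):
    """Split a rule into [(type_code, value), ...] and a regex pattern."""
    proto_vars = []
    pattern = None
    for clause in rule.split("&"):
        name, _, value = clause.partition("=")
        if name in TYPE_CODES:
            # Assumption: IPv4 addresses are stored as 32-bit integers.
            proto_vars.append((TYPE_CODES[name], int(ipaddress.IPv4Address(value))))
        elif name in PAYLOAD_KEYS:
            pattern = value
        else:
            raise ValueError(f"unknown rule field: {name!r}")
    return proto_vars, pattern


proto_vars, pattern = parse_rule(
    "ip_sip=10.0.0.1&ip_dip=192.168.0.1&tcp_payload=HTTP"
)
```

The first element of the result would go to the on-chip protocol variable table, and the pattern would be handed to the DFA compiler for the off-chip table.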
The single-packet regular expression matching system mainly comprises the protocol variable matching module and the regular expression matching module. The protocol variables are stored in the cache inside the logic chip, and the regular expressions, compiled into DFAs, are stored in off-chip SRAM.
As shown in Figs. 2 and 3, the single-packet regular expression matching method comprises the following steps:
(1) the protocol variable matching engine module extracts the header information of each received packet and performs protocol variable matching;
(2) the result processing module passes each packet that matched a protocol variable to the regular expression matching engine for the associated DFA matching;
(3) the regular expression matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regular expression matching engine completes regular expression matching of the received packet, it transfers the matching result to the result collection module.
Before the single-packet regular expression matching method is carried out, the on-chip protocol variable table and the off-chip DFA table are configured first: commands are issued to the single-packet regular expression matching device over PCIe to configure both tables.
In step (1): the protocol variable matching module receives a packet in the data stream, extracts the fields from the packet header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for the associated DFA matching. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number: 2, 3, 4, 5, 6, 7 or 8.
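The lookup and dispatch in this step can be sketched as below. It is a simplified software model, not the hardware logic: the table layout (a dictionary from a type code and value to a DFA identifier), the DFA identifiers, and the names `match_protocol_variables` and `dispatch` are hypothetical. The limit of 8 engines follows the upper bound on N given in the text.

```python
MAX_ENGINES = 8  # upper bound on N stated in the text


def match_protocol_variables(header, table):
    """Return the DFA ids of all table entries matched by the header.

    header: {type_code: value} extracted from the packet header.
    table:  {(type_code, value): dfa_id}, a model of the on-chip table.
    """
    return [
        dfa_id
        for (code, value), dfa_id in table.items()
        if header.get(code) == value
    ]


def dispatch(header, table, max_engines=MAX_ENGINES):
    """Assign each matched DFA to its own engine: {engine_index: dfa_id}."""
    hits = match_protocol_variables(header, table)
    if len(hits) > max_engines:
        raise RuntimeError("more matched protocol variables than engines")
    return dict(enumerate(hits))


# Hypothetical table: source IP 10.0.0.1 and destination IP 192.168.0.1.
table = {
    (0x1, 0x0A000001): 0,
    (0x2, 0xC0A80001): 1,
    (0x1, 0x0A000002): 2,
}
header = {0x1: 0x0A000001, 0x2: 0xC0A80001}
assignment = dispatch(header, table)
```

A packet that matches nothing yields an empty assignment and needs no regular expression matching at all; a packet that matches several entries occupies one engine per entry so the DFAs can run in parallel.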
In step (2): the result processing module looks up, in the regular expression DFA address information table, the DFA address information corresponding to the packet. After the packet is passed to the regular expression matching engine, the engine finds the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
In step (3): first, the host configures the DFA into the off-chip cache (DFATbl) through the RgxConfigure module. Next, the regular expression matching engine (RgxBranchRngine) receives the protocol variable matching result and reads the DFA table through the RgxOffChipDfaCtrl module according to the current character to perform matching. Finally, the regular expression matching engine completes matching and outputs the matching result to the result collection module (RgxResCollecter), which arbitrates and uploads it to the subsequent module, and matching ends.
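The character-by-character table read in this step can be modelled as follows. The class and function names are illustrative only (they echo, but do not reproduce, the RgxOffChipDfaCtrl and RgxResCollecter modules), the transition table is a hand-written DFA for the literal `HTTP`, and the rule id 7 is arbitrary. The read counter is included to show that every input byte costs one table access.

```python
class OffChipDfaCtrl:
    """Software model of a reader for a DFA table held in off-chip memory."""

    def __init__(self, transitions):
        self.transitions = transitions  # {(state, byte): next_state}
        self.reads = 0                  # number of table accesses so far

    def read(self, state, byte):
        self.reads += 1
        # Missing entries fall back to the start state 0.
        return self.transitions.get((state, byte), 0)


def run_engine(ctrl, accept_state, payload, rule_id, collector):
    """Walk the DFA over the payload; report rule_id on the first match."""
    state = 0
    for byte in payload:
        state = ctrl.read(state, byte)
        if state == accept_state:
            collector.append(rule_id)
            return True
    return False


H, T, P = ord("H"), ord("T"), ord("P")
# Hand-written DFA that accepts once the bytes "HTTP" have been seen.
http_dfa = {
    (0, H): 1,
    (1, T): 2, (1, H): 1,
    (2, T): 3, (2, H): 1,
    (3, P): 4, (3, H): 1,
}
ctrl = OffChipDfaCtrl(http_dfa)
results = []  # stands in for the result collection module
matched = run_engine(ctrl, 4, b"GET / HTTP/1.1", 7, results)
```

Since the cost is one table read per byte, keeping the loaded DFA small (one per regular expression group rather than one for all rules) directly reduces the amount of off-chip data each engine must touch.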
The single-packet regular expression matching system can adjust the number of parallel engines and the rule storage medium according to the network traffic volume and the required matching speed. If traffic is heavy and fast matching is required, the DFA can be stored in SRAM, which has fast access but a high price, and the number of regular expression engines can be increased to 8 or more. If traffic is light and the matching speed requirement is modest, SDRAM, which is slower to access but cheaper, can be chosen to store the DFA. In general, the system can select the DFA storage medium and the number of matching engines according to the complexity and number of the rules and the required matching speed.
It should be noted that the content and embodiments of the invention are intended to demonstrate the practical application of the technical solution provided by the invention and should not be construed as limiting the scope of protection of the invention. After reading this specification, those skilled in the art may, guided by its spirit and principles, make various modifications, equivalent replacements, or improvements. All such changes or modifications fall within the scope of protection of the pending application.