Single-packet regular expression matching device and method
Technical field
The present invention relates to network security systems, and in particular to a single-packet regular expression matching device and method.
Background art
A regular expression describes a string-matching pattern and is used in text matching to find the parts of a given string that match the expression. Regular expressions are widely used in the prior art, mainly for pattern-matching inspection of data traffic, for example protocol analysis, virus detection and service classification in the communications and network security fields.
In the prior art, regular expression matching requires the regular expression to be converted in advance into a DFA (Deterministic Finite Automaton); a logic chip then executes the compiled DFA against the characters of the input data stream. In practice, however, there is usually not one rule to check but thousands. Giving every rule its own DFA would mean inspecting the traffic thousands or even tens of thousands of times, which is clearly infeasible. So that no rule is missed, many rules are therefore generally compiled into one large DFA, typically hundreds of megabytes in size. During matching, the traffic to be matched is the input, and the rules reported by the DFA are the output.
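To make the table-driven execution described above concrete, the following minimal sketch runs a hand-built DFA over an input byte stream, one transition per input byte. The transition table, the state numbering and the name `run_dfa` are illustrative assumptions and are not taken from the disclosed hardware; the DFA shown simply reports whether the input contains the substring `ab`.

```python
# Illustrative sketch only: a hand-built DFA that reports whether the
# input contains the substring b"ab". State numbers and names are assumed.

# TRANSITIONS[state][byte] -> next state; bytes not listed use DEFAULT[state].
DEFAULT = {0: 0, 1: 0, 2: 2}          # state 2 is an absorbing accept state
TRANSITIONS = {
    0: {ord("a"): 1},
    1: {ord("a"): 1, ord("b"): 2},
    2: {},
}
ACCEPTING = {2}


def run_dfa(data: bytes) -> bool:
    """Consume one input byte per step and follow the transition table."""
    state = 0
    for byte in data:
        state = TRANSITIONS[state].get(byte, DEFAULT[state])
    return state in ACCEPTING
```

A DFA compiled from thousands of rules works the same way, but its transition table is far larger, which is what makes its storage location matter.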
Because a large DFA occupies hundreds of megabytes, and a typical logic chip cannot integrate that much internal memory, the DFA can only be stored in external SRAM (Static Random Access Memory) or SDRAM (Synchronous Dynamic Random Access Memory). During matching, a DFA fragment is read according to the current state and the input character and is cached inside the logic chip. The table entries associated with the current state must be loaded continually, and state jumps often cause entries for the same state to be loaded repeatedly. The more complex the DFA, the more table entries are loaded. This prior-art matching method therefore takes a long time and delivers low matching performance.
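The cost of repeated loading can be illustrated with a small simulation that counts how many transition rows are fetched from off-chip memory while a DFA runs. This is a sketch under stated assumptions: the on-chip cache is modelled as a least-recently-used cache holding a fixed number of rows, and `count_off_chip_loads` and its parameters are hypothetical names; the text does not specify the actual caching policy.

```python
from collections import OrderedDict


def count_off_chip_loads(transitions, data, cache_rows):
    """Run a DFA while counting transition rows fetched from off-chip memory.

    transitions: {state: {byte: next_state}}; unlisted bytes return to state 0.
    cache_rows:  number of rows the assumed on-chip LRU cache can hold (>= 1).
    """
    if cache_rows < 1:
        raise ValueError("cache_rows must be at least 1")
    cache = OrderedDict()
    loads = 0
    state = 0
    for byte in data:
        if state in cache:
            cache.move_to_end(state)        # cache hit: refresh LRU order
        else:
            loads += 1                      # cache miss: fetch row off-chip
            cache[state] = transitions[state]
            if len(cache) > cache_rows:
                cache.popitem(last=False)   # evict least recently used row
        state = cache[state].get(byte, 0)
    return loads
```

With a toy three-state loop as input, a one-row cache fetches a row on every input byte, while a three-row cache fetches each row only once. A DFA too large for the on-chip cache behaves like the first case.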
Summary of the invention
To overcome the above defect, the invention provides a single-packet regular expression matching device and method that shorten the time required for matching.
To achieve this object, the invention provides a single-packet regular expression matching device comprising a single-packet regular expression matching unit and a cache unit connected to it. The single-packet regular expression matching unit comprises a regular expression matching module, and the improvement is that it further comprises a protocol variable matching module connected to the regular expression matching module.
In a preferred technical solution provided by the invention, the cache unit comprises an off-chip cache holding an off-chip DFA table and an on-chip cache holding an on-chip protocol variable table. The off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
In a second preferred technical solution provided by the invention, the protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it. The protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache. The result processing module holds a regular expression DFA address information table.
In a third preferred technical solution provided by the invention, the regular expression matching module comprises a regular expression matching engine (RgxBranchRngine), together with an off-chip DFA table read module (RgxOffChipDfaCtrl) and a result aggregation module (RgxResCollecter), each connected to the regular expression matching engine. The regular expression matching engine receives the matching result of the protocol variable matching module.
In a fourth preferred technical solution provided by the invention, the off-chip cache is SRAM or SDRAM.
In a fifth preferred technical solution provided by the invention, the on-chip protocol variable table stores protocol variables, and the protocol variables comprise IP header, TCP header and UDP header fields.
In a sixth preferred technical solution provided by the invention, the regular expression matching engine comprises 4 parallel engines or 8 parallel engines.
In a seventh preferred technical solution provided by the invention, the regular expression matching engine groups the regular expressions according to the matching result of the protocol variable matching module: regular expressions that share the same protocol variable are placed in one group, and a relation is established between each regular expression group and its protocol variable.
In an eighth preferred technical solution provided by the invention, the off-chip DFA table is formed from the regular expression groups.
In a ninth preferred technical solution provided by the invention, a single-packet regular expression matching method for the single-packet regular expression matching device is provided, the improvement being that the matching method comprises the following steps:
(1) the protocol variable matching engine module extracts header information from each received packet and performs protocol variable matching;
(2) the result processing module passes packets that matched a protocol variable to the regular expression matching engine for associated DFA matching;
(3) the regular expression matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regular expression matching engine completes regular expression matching of the received packet, it transfers the matching result to the result aggregation module.
In a tenth preferred technical solution provided by the invention, in step (1):
The protocol variable matching module receives each packet in the data stream, extracts the corresponding fields from the header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for associated DFA matching. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number from 2 to 8.
In a more preferred technical solution provided by the invention, in step (2): the result processing module looks up, in the regular expression DFA address information table, the DFA address information corresponding to the packet. After the packet is passed to the regular expression matching engine, the engine locates the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
Compared with the prior art, the single-packet regular expression matching device and method provided by the invention group multiple regular expressions by protocol variable and compile each group separately to obtain multiple DFAs. Packets are first matched against the protocol variables, and the matching result is then used to load the DFA for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time and improves matching performance.
Brief description of the drawings
Fig. 1 is a structural diagram of the single-packet regular expression matching device.
Fig. 2 is a structural diagram of the regular expression matching module.
Fig. 3 is a flow chart of regular expression matching.
Fig. 4 is a diagram of the storage format of protocol variables in the on-chip cache.
Detailed description of the embodiments
As shown in Fig. 1, the single-packet regular expression matching device comprises a single-packet regular expression matching unit and a cache unit connected to it. The single-packet regular expression matching unit comprises a regular expression matching module and a protocol variable matching module connected to the regular expression matching module.
The cache unit comprises an off-chip cache holding an off-chip DFA table and an on-chip cache holding an on-chip protocol variable table. The off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
The protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it. The protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache. The result processing module holds a regular expression DFA address information table. The on-chip protocol variable table stores protocol variables, and the protocol variables comprise IP header, TCP header and UDP header fields.
As shown in Fig. 2, the regular expression matching module comprises a regular expression matching engine (RgxBranchRngine), together with an off-chip DFA table read module (RgxOffChipDfaCtrl) and a result aggregation module (RgxResCollecter), each connected to the regular expression matching engine. The regular expression matching engine receives the matching result of the protocol variable matching module and comprises 4 parallel engines. The regular expression matching engine groups the regular expressions according to the matching result of the protocol variable matching module: regular expressions that share the same protocol variable are placed in one group, which yields multiple regular expression groups, and a relation is established between each regular expression group and its protocol variable. The off-chip DFA table is formed from the regular expression groups. The number of groups is determined as follows: (1) it depends on the number of supported protocol variable types; for example, if six types are supported (source IP, destination IP, source port, destination port, protocol and packet payload length), the regular expressions should be compiled into at least six groups; (2) when there are few groups, compiling each group into a DFA may take longer, so the regular expressions can be split into additional groups according to the complexity of the regular expressions that follow the protocol variables, in order to reduce compilation time; for example, 100 rules can be divided into 10 to 20 groups.
Multiple regular expressions are grouped by protocol variable, and each group is compiled separately to obtain multiple DFAs. Packets are first matched against the protocol variables, and the matching result is then used to load the DFA for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time and improves matching performance.
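The grouping idea can be sketched in software as follows. This is a minimal illustration, not the disclosed hardware: Python's `re` module (a backtracking matcher) stands in for per-group DFA compilation, and the rule set, the field name `tcp_dport` and the function names `build_groups` and `match_packet` are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical rule set: (protocol variable key, payload pattern).
RULES = [
    (("ip_sip", "10.0.0.1"), rb"HTTP"),
    (("ip_sip", "10.0.0.1"), rb"GET /admin"),
    (("tcp_dport", "21"), rb"USER \w+"),
]


def build_groups(rules):
    """Group patterns by protocol variable and compile one matcher per group."""
    grouped = defaultdict(list)
    for key, pattern in rules:
        grouped[key].append(pattern)
    # One alternation per group stands in for one compiled DFA per group.
    return {key: re.compile(b"|".join(b"(?:" + p + b")" for p in pats))
            for key, pats in grouped.items()}


def match_packet(groups, header, payload):
    """Consult only the groups whose protocol variable matches the header."""
    hits = []
    for (name, value), matcher in groups.items():
        if header.get(name) == value and matcher.search(payload):
            hits.append((name, value))
    return hits
```

A packet whose header matches only `ip_sip=10.0.0.1` is checked against that group's matcher alone, so the patterns of the other groups never need to be loaded for it.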
A rule with protocol variables is described as follows:
The rule is: ip_sip=10.0.0.1 & ip_dip=192.168.0.1 & tcp_payload=HTTP. Here ip_sip and ip_dip are protocol variables that represent the source IP address and the destination IP address in the IP header, respectively; when the software parses a rule, each header field is encoded as a distinct type. The type codes of ip_sip and ip_dip are 0x1 and 0x2, and the variables are stored in the on-chip cache in the protocol variable format shown in Fig. 4. tcp_payload=HTTP is the regular expression; it is compiled into a DFA and stored in the off-chip cache.
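The split between protocol variables and the regular expression part of such a rule can be sketched as a small parser. Only the type codes 0x1 and 0x2 for ip_sip and ip_dip come from the text; the function name `parse_rule`, the returned layout and the treatment of every other term as a regular expression are assumptions for illustration, and the actual storage format of Fig. 4 is not reproduced here.

```python
# Type codes for ip_sip and ip_dip come from the text; everything else
# (function name, return layout) is an assumption made for illustration.
TYPE_CODES = {"ip_sip": 0x1, "ip_dip": 0x2}


def parse_rule(rule: str):
    """Split a rule into protocol-variable entries and payload regexes.

    Simplification: terms are separated on "&", so a pattern that itself
    contains "&" is not supported by this sketch.
    """
    variables = []   # (type code, value) pairs destined for the on-chip table
    regexes = []     # (field, pattern) pairs to be compiled into a DFA
    for term in rule.rstrip(";").split("&"):
        name, _, value = term.strip().partition("=")
        if name in TYPE_CODES:
            variables.append((TYPE_CODES[name], value))
        else:
            regexes.append((name, value))
    return variables, regexes
```

Applied to the rule above, the two IP terms become on-chip table entries and `tcp_payload=HTTP` is left as the pattern to compile.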
The single-packet regular expression matching system mainly comprises the protocol variable matching module and the regular expression matching module. Protocol variables are stored in the on-chip cache of the logic chip, and the regular expressions, compiled into DFAs, are stored in off-chip SRAM.
As shown in Figs. 2 and 3, the single-packet regular expression matching method comprises the following steps:
(1) the protocol variable matching engine module extracts header information from each received packet and performs protocol variable matching;
(2) the result processing module passes packets that matched a protocol variable to the regular expression matching engine for associated DFA matching;
(3) the regular expression matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regular expression matching engine completes regular expression matching of the received packet, it transfers the matching result to the result aggregation module.
Before the single-packet regular expression matching method is carried out, the on-chip protocol variable table and the off-chip DFA table are first configured: commands are issued to the single-packet regular expression matching device over PCIe to configure both tables.
In step (1): the protocol variable matching module receives a packet in the data stream, extracts the fields from the header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for associated DFA matching. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number from 2 to 8.
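The dispatch decision in step (1) can be sketched as a lookup followed by an engine assignment. The table contents, the group ids, the name `dispatch` and the policy of assigning engines in lookup order are all assumptions; the text states only that a packet matching several protocol variables is sent to different engines for parallel matching.

```python
# Assumed on-chip table: (field name, value) -> id of the associated DFA group.
ON_CHIP_TABLE = {
    ("ip_sip", "10.0.0.1"): 0,
    ("ip_dip", "192.168.0.1"): 1,
}
NUM_ENGINES = 4  # the text describes 4 or 8 parallel engines


def dispatch(header: dict) -> dict:
    """Return {engine index: DFA group id} for one packet header."""
    matched = [group for (field, value), group in ON_CHIP_TABLE.items()
               if header.get(field) == value]
    if len(matched) > NUM_ENGINES:
        raise ValueError("more matched protocol variables than engines")
    # Each matched protocol variable gets its own engine, so the DFA
    # groups can be matched in parallel.
    return {engine: group for engine, group in enumerate(matched)}
```

A header that matches nothing yields an empty assignment, which corresponds to a packet that needs no regular expression matching at all.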
In step (2): the result processing module looks up, in the regular expression DFA address information table, the DFA address information corresponding to the packet. After the packet is passed to the regular expression matching engine, the engine locates the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
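One way to picture the DFA address information table is as a map from a matched protocol variable to the location of its DFA in off-chip memory. The sketch below assumes a flat off-chip memory of transition rows and an address entry of the form (base address, number of states); this encoding and the names `locate_dfa` and `read_row` are hypothetical, since the text does not give the table layout.

```python
# Off-chip memory modelled as one flat list of transition-table rows.
# The (base address, number of states) layout is a hypothetical encoding.
OFF_CHIP_MEMORY = ["dfa0-row0", "dfa0-row1", "dfa1-row0", "dfa1-row1", "dfa1-row2"]

# Result processing module: protocol variable type code -> (base, n_states).
DFA_ADDRESS_TABLE = {0x1: (0, 2), 0x2: (2, 3)}


def locate_dfa(type_code: int):
    """Look up where the DFA associated with a matched variable is stored."""
    return DFA_ADDRESS_TABLE.get(type_code)


def read_row(type_code: int, state: int):
    """Fetch the transition row for `state` of the associated DFA."""
    entry = locate_dfa(type_code)
    if entry is None:
        return None               # no DFA is associated with this variable
    base, n_states = entry
    if not 0 <= state < n_states:
        raise IndexError("state outside this DFA")
    return OFF_CHIP_MEMORY[base + state]
```

Because each lookup is bounded to one group's address range, the matching engine only ever reads rows belonging to the DFA selected by the protocol variable match.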
In step (3): first, the host configures the DFA into the off-chip cache (DFATbl) through the RgxConfigure module; next, the regular expression matching engine (RgxBranchRngine) receives the protocol variable matching result, reads the DFA table through the RgxOffChipDfaCtrl module, and matches according to the current character; finally, the regular expression matching engine completes matching and outputs the matching result to the result aggregation module (RgxResCollecter), which arbitrates and uploads it to the subsequent module, and matching ends.
The single-packet regular expression matching system can adjust the number of parallel engines and the rule storage cache according to network traffic volume and required matching speed. If traffic is heavy and fast matching is required, the DFA can be stored in SRAM, which is fast to access but expensive, and the number of regular expression engines can be increased to 8 or more. If traffic is light and the matching speed requirement is modest, SDRAM, which is slower to access but cheaper, can be chosen to store the DFA. In general, the system can choose the DFA storage medium and the number of matching engines according to rule complexity, rule count and matching speed.
It should be understood that the content and embodiments of the invention are intended to demonstrate the practical application of the technical solution provided and should not be construed as limiting the scope of protection of the invention. After reading this specification, those skilled in the art may, guided by its spirit and principles, make various modifications, equivalent substitutions or improvements, and all such changes or modifications fall within the scope of protection of the pending application.