CN102420750B - Single bag canonical matching unit and method - Google Patents

Single-packet regular expression matching device and method

Publication number
CN102420750B
CN102420750B CN201110383388.4A CN201110383388A CN102420750B CN 102420750 B CN102420750 B CN 102420750B CN 201110383388 A CN201110383388 A CN 201110383388A CN 102420750 B CN102420750 B CN 102420750B
Authority
CN
China
Prior art keywords
matching
module
canonical
dfa
protocol variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110383388.4A
Other languages
Chinese (zh)
Other versions
CN102420750A (en)
Inventor
纪奎
李锋伟
姬乃军
刘兴奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Network Technology Co., Ltd.
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201110383388.4A
Publication of CN102420750A
Application granted
Publication of CN102420750B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a single-packet regular expression matching device and method. The device comprises a single-packet regular expression matching unit and a cache unit connected to it; the matching unit comprises a regular expression matching module and a protocol variable matching module connected to the regular expression matching module. The method groups multiple regular expressions by protocol variable and compiles each group separately to obtain multiple DFAs; it first matches each packet against the protocol variables and then uses the matching result to load the corresponding DFA for regular expression matching. The device and method reduce the data that must be loaded during matching, shorten the loading process, reduce regular expression matching time, and improve matching performance.

Description

Single-packet regular expression matching device and method
Technical field
The present invention relates to network security systems, and specifically to a single-packet regular expression matching device and method.
Background art
A regular expression describes a string-matching pattern and is used in text matching to find the parts of a given string that match the expression. Regular expressions are widely used in the prior art, mainly for pattern-matching inspection of data traffic, for example protocol analysis, virus detection, and traffic classification in the communications industry and the network security field.
In the prior art, regular expression matching requires converting the regular expression in advance into a DFA (Deterministic Finite Automaton); a logic chip then executes the compiled DFA against the characters of the input data stream. In practice, however, there is usually not one rule to check but thousands. Giving each rule its own DFA is clearly infeasible, because the traffic to be matched would have to be scanned thousands or even tens of thousands of times. To avoid omitting any rule, many rules are therefore generally compiled into one large DFA, typically hundreds of megabytes in size. During matching, the traffic to be matched is the input, and the states reported by the DFA identify the matched rules.
Because a large DFA occupies hundreds of megabytes, and a typical logic chip cannot integrate internal memory of that capacity, the DFA can only be stored in external SRAM (Static Random Access Memory) or SDRAM (Synchronous Dynamic Random Access Memory). During matching, a fragment of the DFA is read according to the current state and input character and cached inside the logic chip. The table entries associated with the current state must be loaded continually, and state transitions often cause entries for the same state to be loaded repeatedly. The more complex the DFA, the more entries are loaded. As a result, this prior-art matching method takes a long time and its matching performance is low.
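The loading cost described above can be illustrated with a small software simulation. This is a minimal sketch rather than the prior-art hardware: the literal pattern, the KMP-style DFA construction, the `OffChipDfa` class, and the LRU row cache are all assumptions made for illustration. It counts how often a state's table row has to be fetched from "off-chip" storage while a payload is scanned.

```python
from collections import OrderedDict


def build_literal_dfa(pattern):
    """Build a KMP-style DFA that finds `pattern` anywhere in the input.

    Each state is a dict {char: next_state}; missing chars go to state 0.
    State len(pattern) is the accepting state.
    """
    m = len(pattern)
    table = [dict() for _ in range(m + 1)]
    table[0][pattern[0]] = 1
    restart = 0  # state to fall back to after a mismatch
    for state in range(1, m):
        table[state] = dict(table[restart])
        table[state][pattern[state]] = state + 1
        restart = table[restart].get(pattern[state], 0)
    return table


class OffChipDfa:
    """DFA table kept 'off chip', with a small LRU cache of rows 'on chip'."""

    def __init__(self, table, cache_rows):
        self.table = table
        self.cache_rows = cache_rows
        self.cache = OrderedDict()
        self.loads = 0  # number of off-chip row fetches

    def row(self, state):
        if state in self.cache:
            self.cache.move_to_end(state)
        else:
            self.loads += 1
            self.cache[state] = self.table[state]
            if len(self.cache) > self.cache_rows:
                self.cache.popitem(last=False)  # evict least recently used row
        return self.cache[state]

    def search(self, data):
        state, accept = 0, len(self.table) - 1
        for ch in data:
            state = self.row(state).get(ch, 0)
            if state == accept:
                return True
        return False


payload = "HT HTT HTTP"
tiny_cache = OffChipDfa(build_literal_dfa("HTTP"), cache_rows=1)
full_cache = OffChipDfa(build_literal_dfa("HTTP"), cache_rows=5)
print(tiny_cache.search(payload), tiny_cache.loads)
print(full_cache.search(payload), full_cache.loads)
```

With a one-row cache every state change forces a reload (11 fetches for this payload), whereas a cache large enough to hold every row fetches each visited row once (4 fetches). A DFA of hundreds of megabytes behaves like the first case.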
Summary of the invention
To overcome the above defects, the invention provides a single-packet regular expression matching device and method that shorten the time required for matching.
To this end, the invention provides a single-packet regular expression matching device comprising a single-packet regular expression matching unit and a cache unit connected to it. The matching unit comprises a regular expression matching module; the improvement is that it further comprises a protocol variable matching module connected to the regular expression matching module.
In a preferred technical solution provided by the invention, the cache unit comprises an off-chip cache holding the off-chip DFA table and an on-chip cache holding the on-chip protocol variable table; the off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
In a second preferred technical solution provided by the invention, the protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it; the protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache; the result processing module is provided with a regular expression DFA address information table.
In a third preferred technical solution provided by the invention, the regular expression matching module comprises a regex matching engine (RgxBranchRngine) and, each connected to that engine, an off-chip DFA table read module (RgxOffChipDfaCtrl) and an aggregation module (RgxResCollecter); the regex matching engine receives the matching result of the protocol variable matching module.
In a fourth preferred technical solution provided by the invention, the off-chip cache is SRAM or SDRAM.
In a fifth preferred technical solution provided by the invention, the on-chip protocol variable table stores protocol variables, and the protocol variables comprise the IP header, the TCP header, and the UDP header.
In a sixth preferred technical solution provided by the invention, the regex matching engine is provided with 4-way engines or 8-way engines.
In a seventh preferred technical solution provided by the invention, the regex matching engine groups the regular expressions according to the matching result of the protocol variable matching module: regular expressions with the same protocol variable are placed in one group, and a relation between each regular expression group and its protocol variable is established.
In an eighth preferred technical solution provided by the invention, the off-chip DFA table is formed from the regular expression groups.
In a ninth preferred technical solution provided by the invention, a single-packet regular expression matching method for the single-packet regular expression matching device is provided; the improvement is that the method comprises the following steps:
(1) the protocol variable matching engine module extracts header information from each received packet and performs protocol variable matching;
(2) the result processing module passes packets that matched a protocol variable to the regex matching engine for matching against the associated DFA;
(3) the regex matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regex matching engine completes regular expression matching of the received packet, it transfers the matching result to the aggregation module.
In a tenth preferred technical solution provided by the invention, in step (1):
The protocol variable matching module receives each packet in the data stream, extracts the corresponding fields from the packet header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for matching against the associated DFA. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number: 2, 3, 4, 5, 6, 7, or 8.
In a further preferred technical solution provided by the invention, in step (2): the result processing module looks up the DFA address information corresponding to the packet in the regular expression DFA address information table; after the packet is passed to the regex matching engine, the engine locates the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
Compared with the prior art, the single-packet regular expression matching device and method provided by the invention group multiple regular expressions by protocol variable, compile each group separately to obtain multiple DFAs, first match the packet against the protocol variables, and then use the matching result to load the DFA for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time, and improves matching performance.
Brief description of the drawings
Fig. 1 is a structural diagram of the single-packet regular expression matching device.
Fig. 2 is a structural diagram of the regular expression matching module.
Fig. 3 is a flow chart of regular expression matching.
Fig. 4 is a schematic diagram of the storage format of protocol variables in the on-chip cache.
Detailed description of the embodiments
As shown in Fig. 1, the single-packet regular expression matching device comprises a single-packet regular expression matching unit and a cache unit connected to it; the single-packet regular expression matching unit comprises a regular expression matching module and a protocol variable matching module connected to the regular expression matching module.
The cache unit comprises an off-chip cache holding the off-chip DFA table and an on-chip cache holding the on-chip protocol variable table; the off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module.
The protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it. The protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache. The result processing module is provided with a regular expression DFA address information table. The on-chip protocol variable table stores protocol variables, and the protocol variables comprise the IP header, the TCP header, and the UDP header.
As shown in Fig. 2, the regular expression matching module comprises a regex matching engine (RgxBranchRngine) and, each connected to that engine, an off-chip DFA table read module (RgxOffChipDfaCtrl) and an aggregation module (RgxResCollecter). The regex matching engine receives the matching result of the protocol variable matching module and is provided with 4-way engines. According to the matching result of the protocol variable matching module, the regular expressions are grouped: regular expressions with the same protocol variable are placed in one group, yielding multiple regular expression groups, and a relation between each regular expression group and its protocol variable is established. The off-chip DFA table is formed from the regular expression groups. The number of regular expression groups is determined as follows: 1. By the number of supported protocol variable types. For example, if six types are supported (source and destination IP, source and destination port, protocol, and packet payload length), the regular expressions should be compiled into at least 6 groups. 2. Because too few groups may make DFA compilation take a long time, the regular expressions under each protocol variable can be split into additional groups according to their complexity in order to reduce compilation time. For example, 100 rules can be divided into 10 to 20 groups.
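The two criteria for choosing the number of groups can be sketched as follows. The helper name, the variable names, the per-variable rule counts, and the chunk size of 8 are assumptions for illustration; the text only states that at least one group per supported protocol variable type is needed and that 100 rules may be split into roughly 10 to 20 groups.

```python
def split_into_groups(rules_by_variable, max_rules_per_group):
    """Split each protocol variable's rules into chunks of bounded size.

    rules_by_variable: dict mapping a protocol variable name to its regex list.
    Returns a list of (variable, regex_chunk) pairs, one per group to compile.
    """
    groups = []
    for variable, regexes in rules_by_variable.items():
        for start in range(0, len(regexes), max_rules_per_group):
            groups.append((variable, regexes[start:start + max_rules_per_group]))
    return groups


# Six hypothetical protocol variable types carrying 100 placeholder rules in total
variables = ["ip_sip", "ip_dip", "src_port", "dst_port", "protocol", "payload_len"]
rules = {v: [f"rule_{v}_{i}" for i in range(n)]
         for v, n in zip(variables, [20, 20, 15, 15, 15, 15])}

groups = split_into_groups(rules, max_rules_per_group=8)
print(len(groups))
```

With these assumed counts the 100 rules fall into 14 groups. A very large chunk size degenerates to the minimum of one group per protocol variable type, that is, 6 groups.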
The regular expressions are grouped by protocol variable and each group is compiled separately to obtain multiple DFAs. The packet is first matched against the protocol variables, and the matching result is then used to load the corresponding DFA for regular expression matching. This reduces the data that must be loaded during matching, shortens the loading process, reduces regular expression matching time, and improves matching performance.
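The two-stage idea can be sketched in software. The following is only an illustration under stated assumptions: Python's `re` module stands in for the compiled DFAs, and the rule list, the field name `tcp_dport`, and the function names are hypothetical rather than taken from the patent.

```python
import re
from collections import defaultdict

# Hypothetical rules: (protocol variable, value, regular expression)
RULES = [
    ("ip_sip", "10.0.0.1", r"HTTP/1\.[01]"),
    ("ip_sip", "10.0.0.1", r"User-Agent: curl"),
    ("tcp_dport", "21", r"^USER \w+"),
]


def compile_groups(rules):
    """Group regexes by protocol variable and compile one matcher per group."""
    groups = defaultdict(list)
    for variable, value, regex in rules:
        groups[(variable, value)].append(regex)
    # One alternation per group stands in for one compiled DFA per group.
    return {key: re.compile("|".join(f"(?:{r})" for r in regexes))
            for key, regexes in groups.items()}


def match_packet(header, payload, compiled):
    """Stage 1: match protocol variables. Stage 2: run only the selected groups."""
    hits = []
    for (variable, value), matcher in compiled.items():
        # The regex matcher is only consulted when the header field matched.
        if header.get(variable) == value and matcher.search(payload):
            hits.append((variable, value))
    return hits


compiled = compile_groups(RULES)
header = {"ip_sip": "10.0.0.1", "tcp_dport": "80"}
print(match_packet(header, "GET / HTTP/1.1", compiled))
```

Only the group tied to `ip_sip=10.0.0.1` is evaluated for this packet; the group for port 21 is never touched, which corresponds to the DFA data that no longer needs to be loaded.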
Rules with protocol variables are described as follows:
An example rule is: ip_sip=10.0.0.1 & ip_dip=192.168.0.1 & tcp_payload=HTTP. Here ip_sip and ip_dip are protocol variables, representing the source IP address and destination IP address in the IP header, respectively; when software parses the rule, each field of the packet header is encoded as a distinct type. The type codes of ip_sip and ip_dip are 0x1 and 0x2, and they are stored in the on-chip cache in the protocol variable format shown in Fig. 4. tcp_payload=HTTP is the regular expression, which is compiled into a DFA and stored in the off-chip cache.
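A rule of this form could be parsed as sketched below. The type codes 0x1 and 0x2 for `ip_sip` and `ip_dip` come from the text; the parser itself, its function name, and its output layout are assumptions, and the on-chip storage format of Fig. 4 is not reproduced here.

```python
# Type codes for protocol variables; 0x1 and 0x2 are given in the text.
TYPE_CODES = {"ip_sip": 0x1, "ip_dip": 0x2}


def parse_rule(rule):
    """Split a rule into protocol-variable entries and regex entries.

    Returns (protocol_vars, regexes), where protocol_vars is a list of
    (type_code, value) and regexes is a list of (field, pattern).
    Assumption: '&' separates terms and does not occur inside a pattern.
    """
    protocol_vars, regexes = [], []
    for term in rule.split("&"):
        name, _, value = term.strip().partition("=")
        if name in TYPE_CODES:
            protocol_vars.append((TYPE_CODES[name], value))
        else:
            regexes.append((name, value))
    return protocol_vars, regexes


print(parse_rule("ip_sip=10.0.0.1 & ip_dip=192.168.0.1 & tcp_payload=HTTP"))
```

The first list would populate the on-chip protocol variable table, and the second list holds the patterns to be compiled into DFAs for the off-chip cache.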
The single-packet regular expression matching system mainly comprises the protocol variable matching module and the regular expression matching module; the protocol variables are stored in the on-chip cache of the logic chip, and the regular expressions are stored in off-chip SRAM.
As shown in Figs. 2 and 3, the single-packet regular expression matching method comprises the following steps:
(1) the protocol variable matching engine module extracts header information from each received packet and performs protocol variable matching;
(2) the result processing module passes packets that matched a protocol variable to the regex matching engine for matching against the associated DFA;
(3) the regex matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet;
(4) after the regex matching engine completes regular expression matching of the received packet, it transfers the matching result to the aggregation module.
Before the single-packet regular expression matching method is carried out, the on-chip protocol variable table and the off-chip DFA table are configured first: commands are issued to the single-packet regular expression matching unit over PCIe to configure both tables.
In step (1): the protocol variable matching module receives a packet from the data stream, extracts the fields from the packet header, and looks them up in the on-chip cache. If the packet matches one protocol variable, it is submitted to one regular expression matching engine for matching against the associated DFA. If the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number: 2, 3, 4, 5, 6, 7, or 8.
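The dispatch rule of step (1) can be modelled in software. This sketch assumes a thread pool as a stand-in for the parallel hardware engines and uses Python's `re` in place of DFAs; the table contents, the field name `tcp_dport`, and the function name are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

MAX_ENGINES = 8  # the text allows N up to 8 matched protocol variables

# Hypothetical: one compiled matcher per protocol variable entry
GROUP_MATCHERS = {
    ("ip_sip", "10.0.0.1"): re.compile(r"HTTP"),
    ("ip_dip", "192.168.0.1"): re.compile(r"GET|POST"),
    ("tcp_dport", "21"): re.compile(r"USER"),
}


def dispatch(header, payload):
    """Submit the packet to one engine per matched protocol variable."""
    matched = [key for key in GROUP_MATCHERS if header.get(key[0]) == key[1]]
    matched = matched[:MAX_ENGINES]
    if not matched:
        return {}
    with ThreadPoolExecutor(max_workers=len(matched)) as engines:
        futures = {key: engines.submit(GROUP_MATCHERS[key].search, payload)
                   for key in matched}
        return {key: future.result() is not None
                for key, future in futures.items()}


header = {"ip_sip": "10.0.0.1", "ip_dip": "192.168.0.1", "tcp_dport": "80"}
print(dispatch(header, "GET / HTTP/1.1"))
```

Here the packet matches two protocol variables, so two engines run concurrently and each reports whether its own group matched; a packet matching a single variable would occupy a single engine.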
In step (2): the result processing module looks up the DFA address information corresponding to the packet in the regular expression DFA address information table; after the packet is passed to the regex matching engine, the engine locates the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
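Step (2) amounts to a table lookup from a matched protocol variable to the location of its DFA. A minimal sketch follows, assuming a flat off-chip memory modelled as a `bytes` object; the base addresses, lengths, and placeholder contents are hypothetical.

```python
from typing import NamedTuple


class DfaAddress(NamedTuple):
    base: int    # start offset of the DFA in off-chip memory
    length: int  # size of the DFA in bytes


# Hypothetical off-chip memory image holding two DFAs back to back
OFF_CHIP = bytes(range(16)) + bytes(range(100, 132))

# Hypothetical address information table, keyed by protocol variable type code
DFA_ADDRESS_TABLE = {
    0x1: DfaAddress(base=0, length=16),
    0x2: DfaAddress(base=16, length=32),
}


def load_dfa(type_code):
    """Return only the DFA bytes for the matched protocol variable, or None."""
    entry = DFA_ADDRESS_TABLE.get(type_code)
    if entry is None:
        return None
    return OFF_CHIP[entry.base:entry.base + entry.length]


print(len(load_dfa(0x1)), len(load_dfa(0x2)), load_dfa(0x3))
```

Only the slice belonging to the matched protocol variable is read, rather than fragments of one monolithic DFA.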
In step (3): first, the host configures the DFA into the off-chip cache (DFATbl) through the RgxConfigure module; next, the regex matching engine (RgxBranchRngine) receives the protocol variable matching result, reads the DFA table through the RgxOffChipDfaCtrl module, and matches according to the current character; finally, the regex matching engine completes matching and outputs the matching result to the aggregation module (RgxResCollecter), which arbitrates and uploads it to the subsequent module, and matching ends.
The single-packet regular expression matching system can adjust the number of parallel engines and the rule storage according to network traffic volume and required matching speed. If traffic is heavy and fast matching is required, the DFA can be stored in SRAM, which is fast to access but expensive, and the number of regular expression engines can be increased to 8 or more. If traffic is light and the matching speed requirement is modest, SDRAM, which is slower to access but cheaper, can be chosen to store the DFA. In short, the system can select the DFA storage medium and the number of matching engines according to rule complexity, rule count, and matching speed.
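The trade-off can be expressed as a small selection routine. This is a sketch only: the text gives no numeric thresholds, so the 10 Gbit/s boundary, the fallback to SDRAM with 4 engines for all other cases, and every name below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MatcherConfig:
    storage: str  # "SRAM" or "SDRAM"
    engines: int  # number of parallel regex engines


def choose_config(traffic_gbps, needs_fast_matching, heavy_traffic_gbps=10.0):
    """Pick DFA storage and engine count.

    heavy_traffic_gbps is an illustrative threshold, not a value from the text.
    """
    if traffic_gbps >= heavy_traffic_gbps and needs_fast_matching:
        # Heavy traffic and strict speed: fast, expensive SRAM and 8 engines.
        return MatcherConfig(storage="SRAM", engines=8)
    # Otherwise: slower, cheaper SDRAM and the 4-way engine of the embodiment.
    return MatcherConfig(storage="SDRAM", engines=4)


print(choose_config(40.0, True))
print(choose_config(1.0, False))
```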
It should be understood that the content and embodiments of the invention are intended to demonstrate the practical application of the technical solution provided by the invention and should not be construed as limiting its scope of protection. After reading this specification, those skilled in the art, inspired by its spirit and principle, may make various modifications, equivalent replacements, or improvements. All such changes or modifications fall within the scope of protection of the pending application.

Claims (1)

1. A single-packet regular expression matching method for a single-packet regular expression matching device, characterized in that the device comprises a single-packet regular expression matching unit and a cache unit connected to it, and the single-packet regular expression matching unit comprises a regular expression matching module and a protocol variable matching module connected to the regular expression matching module;
The cache unit comprises an off-chip cache holding the off-chip DFA table and an on-chip cache holding the on-chip protocol variable table; the off-chip cache is connected to the regular expression matching module, and the on-chip cache is connected to the protocol variable matching module;
The protocol variable matching module comprises a protocol variable matching engine module and a result processing module connected to it; the protocol variable matching engine module receives the packet data stream entering the single-packet regular expression matching unit and reads the on-chip protocol variable table in the on-chip cache; the result processing module is provided with a regular expression DFA address information table;
The regular expression matching module comprises a regex matching engine (RgxBranchRngine) and, each connected to that engine, an off-chip DFA table read module (RgxOffChipDfaCtrl) and an aggregation module (RgxResCollecter); the regex matching engine receives the matching result of the protocol variable matching module;
The off-chip cache is SRAM or SDRAM;
The on-chip protocol variable table stores protocol variables, and the protocol variables comprise the IP header, the TCP header, and the UDP header;
The regex matching engine is provided with 4-way engines or 8-way engines;
The regex matching engine groups the regular expressions according to the matching result of the protocol variable matching module: regular expressions with the same protocol variable are placed in one group, and a relation between each regular expression group and its protocol variable is established;
The off-chip DFA table is formed from the regular expression groups;
The matching method comprises the following steps:
(1) the protocol variable matching engine module extracts header information from each received packet and performs protocol variable matching; (2) the result processing module passes packets that matched a protocol variable to the regex matching engine for matching against the associated DFA; (3) the regex matching engine receives the protocol variable matching result, reads the DFA table through the off-chip DFA table read module, and performs regular expression matching on the received packet; (4) after the regex matching engine completes regular expression matching of the received packet, it transfers the matching result to the aggregation module;
In step (1):
The protocol variable matching module receives each packet in the data stream, extracts the corresponding fields from the packet header, and looks them up in the on-chip cache; if the packet matches one protocol variable, it is submitted to one regular expression matching engine for matching against the associated DFA; if the packet matches N protocol variables, it is submitted to different regular expression engines for parallel DFA matching, where N is a natural number: 2, 3, 4, 5, 6, 7, or 8;
In step (2): the result processing module looks up the DFA address information corresponding to the packet in the regular expression DFA address information table; after the packet is passed to the regex matching engine, the engine locates the corresponding regular expression according to the DFA address information and performs regular expression matching on the packet.
CN201110383388.4A 2011-11-28 2011-11-28 Single bag canonical matching unit and method Active CN102420750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110383388.4A CN102420750B (en) 2011-11-28 2011-11-28 Single bag canonical matching unit and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110383388.4A CN102420750B (en) 2011-11-28 2011-11-28 Single bag canonical matching unit and method

Publications (2)

Publication Number Publication Date
CN102420750A CN102420750A (en) 2012-04-18
CN102420750B CN102420750B (en) 2015-09-23

Family

ID=45944990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110383388.4A Active CN102420750B (en) 2011-11-28 2011-11-28 Single bag canonical matching unit and method

Country Status (1)

Country Link
CN (1) CN102420750B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9203805B2 (en) 2011-11-23 2015-12-01 Cavium, Inc. Reverse NFA generation and processing
US9426166B2 (en) * 2013-08-30 2016-08-23 Cavium, Inc. Method and apparatus for processing finite automata
US9507563B2 (en) 2013-08-30 2016-11-29 Cavium, Inc. System and method to traverse a non-deterministic finite automata (NFA) graph generated for regular expression patterns with advanced features
US9426165B2 (en) * 2013-08-30 2016-08-23 Cavium, Inc. Method and apparatus for compilation of finite automata
CN103607313B (en) * 2013-12-09 2017-04-19 深圳市双赢伟业科技股份有限公司 TCP (transmission control protocol) message matching method on Regular expression
US9904630B2 (en) 2014-01-31 2018-02-27 Cavium, Inc. Finite automata processing based on a top of stack (TOS) memory
US10110558B2 (en) 2014-04-14 2018-10-23 Cavium, Inc. Processing of finite automata based on memory hierarchy
US10002326B2 (en) 2014-04-14 2018-06-19 Cavium, Inc. Compilation of finite automata based on memory hierarchy
JP6677983B2 (en) * 2014-08-04 2020-04-08 Tektronix, Inc. Test measurement device and data generation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7689530B1 (en) * 2003-01-10 2010-03-30 Cisco Technology, Inc. DFA sequential matching of regular expression with divergent states
CN101853301A (en) * 2010-05-25 2010-10-06 华为技术有限公司 Regular expression matching method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
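丁晶, 陈晓岚, 吴萍. Deep packet inspection algorithm based on regular expressions. 《计算机应用》 (Journal of Computer Applications). 2007, Vol. 27 (No. 9), p. 2184: Introduction, paragraphs 1-2, and Section 1 "Deep packet inspection principle and regular expressions", paragraph 1. *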

Also Published As

Publication number Publication date
CN102420750A (en) 2012-04-18

Similar Documents

Publication Publication Date Title
CN102420750B (en) Single bag canonical matching unit and method
Pontarelli et al. Traffic-aware design of a high-speed FPGA network intrusion detection system
US20170364337A1 (en) Method and apparatus for compiling regular expressions
CN102143148B (en) Parameter acquiring and general protocol analyzing method and device
US9398033B2 (en) Regular expression processing automaton
CN101853301A (en) Regular expression matching method and system
US20110116507A1 (en) Iterative parsing and classification
CN103812860B (en) A kind of high speed network strategy matching method based on FPGA
KR20090065315A (en) Signature string storing memory structure and the storing method for the same, signature string pattern matching method
CN102495818A (en) Method for improving communication speed rate of software-mode serial peripheral interface (SPI)
CN110324204B (en) High-speed regular expression matching engine and method implemented in FPGA (field programmable Gate array)
TWI389506B (en) Test System and Method of Ethernet Solid Layer Layer
CN111107068B (en) Efficient rule matching method for FPGA and terminal
SE531947C2 (en) Procedure, device and system for multi-field classification in a data communication network
CN102497319B (en) System and method for realizing single packet matching by utilizing automaton
CN105471726B (en) The method and apparatus of retransmitting paramater transmitting
CN105791163B (en) Update processing method and processing device
Wang et al. Towards fast regular expression matching in practice
CN104678815B (en) The interface structure and collocation method of fpga chip
CN109815263A (en) A kind of data stream recognition method and system of fuzzy search
CN112187935B (en) Information identification method and read-only memory
Sun et al. NFA-based pattern matching for deep packet inspection
Cronin et al. Hardware acceleration of regular expression repetitions in deep packet inspection
CN105791124B (en) Message detecting method and device
Korenek et al. Efficient mapping of nondeterministic automata to FPGA for fast regular expression matching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20171213

Address after: 300384 Tianjin city Xiqing District Huayuan Industrial Zone (outer ring) Haitai Huake Street No. 15 1-3

Patentee after: Sugon Information Industry Co., Ltd.

Address before: 100084 Beijing Haidian District City Mill Street No. 64

Patentee before: Dawning Information Industry (Beijing) Co., Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20180408

Address after: 430040 Wuhuan Road No. 666 (10), economic and Technological Development Zone, Wuhan, Hubei Province

Patentee after: Dawning Network Technology Co., Ltd.

Address before: 300384 Tianjin city Xiqing District Huayuan Industrial Zone (outer ring) Haitai Huake Street No. 15 1-3

Patentee before: Sugon Information Industry Co., Ltd.