CN102195827A - Deep packet inspection method based on overlapping sub-character string classifier - Google Patents

Info

Publication number
CN102195827A
Authority
CN
China
Prior art keywords
character string
deep packet
packet inspection
inspection method
dfa
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010101242368A
Other languages
Chinese (zh)
Inventor
张志凯
赵有健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2010101242368A
Publication of CN102195827A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a deep packet inspection method based on an overlapping substring classifier. The method belongs to the technical field of hardware modules for security gateways and other network devices used in high-performance intrusion detection systems for network security. In brief, the main task of deep packet inspection is to detect feature strings that satisfy given conditions in the packet payloads of incoming network traffic, providing the basis for subsequent classification. To improve detection speed, a string detection algorithm based on a DFA (deterministic finite automaton) is generally used to match the feature strings; however, as the number of feature strings grows, the storage cost of the DFA becomes very high, which limits the practicality of the algorithm. To solve this problem, the disclosed method uses an overlapping substring classifier to preprocess the input string, so that the information in the input string is compressed and extracted, and the subsequent matching (based on the traditional DFA scheme) needs far less storage. The method introduces a probability of erroneous judgment (that is, of false positives), but this probability can be reduced to a negligible level by carefully selecting the parameters.

Description

Deep packet inspection method based on an overlapping substring classifier
Technical field
The hardware structure for deep packet inspection (DPI) is mainly used in high-performance network security devices, such as security gateways, high-performance edge routers, and high-performance intrusion detection systems.
Background art
Traditional network security devices, such as firewalls, can only inspect header information at the network or transport layer of a packet. A new generation of deep packet inspection systems can also inspect the payload of a packet, and can therefore detect malicious intrusions or attacks, or identify special application-related traffic (such as P2P traffic), with higher accuracy.
In general, a deep packet inspection system consists of a rule set containing the feature strings and a string matching engine. Detection works by matching the payload of each incoming packet, byte by byte, against the feature strings in the rule set (the matching can be implemented in hardware or in software). If a match is found, the whole packet is handed to a suspicious-packet handling module or to the system administrator for further processing. A high-performance multi-pattern string matching engine is therefore the key to building a practical DPI system.
Among the commonly used multi-pattern string matching algorithms, Aho-Corasick [1], Commentz-Walter [2], and Wu-Manber [3] are the three most common, and the Aho-Corasick (AC) algorithm is the most widely used, mainly because its lookup speed is stable and independent of both the target strings and the input data stream. The main idea of the AC algorithm is to construct a deterministic finite automaton (DFA) for the given pattern set. Given a pattern set, the algorithm first converts its feature strings into a pattern matching tree, then adds failure transitions to the tree so that no backtracking is needed when a match fails. Finally, it introduces extra transitions that eliminate the failure transitions, so that the input data stream can be processed at a constant rate.
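The three construction steps described above (pattern tree, failure transitions, full transition table) can be illustrated with a minimal Python sketch. This is a textbook-style rendering of the AC idea and is not taken from the patent; the function names `build_ac_dfa` and `search` and the example patterns are assumptions chosen for illustration.

```python
from collections import deque

def build_ac_dfa(patterns):
    """Build a full Aho-Corasick DFA over byte values (illustrative sketch)."""
    goto = [{}]      # pattern tree: goto[state][byte] -> child state
    out = [set()]    # patterns recognised on entering each state
    for p in patterns:
        state = 0
        for b in p:
            if b not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(p)

    fail = [0] * len(goto)
    # delta holds a next state for all 256 byte values of every state,
    # so no failure transition has to be followed while scanning input.
    delta = [[0] * 256 for _ in goto]
    queue = deque()
    for b, nxt in goto[0].items():
        delta[0][b] = nxt
        queue.append(nxt)
    while queue:  # breadth-first: a state's failure target is finished first
        state = queue.popleft()
        out[state] |= out[fail[state]]
        for b in range(256):
            if b in goto[state]:
                nxt = goto[state][b]
                fail[nxt] = delta[fail[state]][b]
                delta[state][b] = nxt
                queue.append(nxt)
            else:
                delta[state][b] = delta[fail[state]][b]
    return delta, out

def search(delta, out, data):
    """Scan data with one table lookup per byte; return (start, pattern) hits."""
    state, hits = 0, []
    for i, b in enumerate(data):
        state = delta[state][b]
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits

delta, out = build_ac_dfa([b"he", b"she", b"his", b"hers"])
print(sorted(search(delta, out, b"ushers")))
```

Each input byte costs exactly one lookup in `delta`, which is the constant-rate property the text refers to; the price is a 256-entry row for every state.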
To improve the throughput and storage efficiency of the AC algorithm, researchers have proposed many hardware-based schemes [4]~[8]. [4] proposes a bit-level decomposition of the DFA, which increases lookup parallelism and throughput. [5] proposes an encoding and compression technique that reduces the storage of the DFA. Both schemes must store, for every state, the next state for each of the 256 possible byte values, and therefore need a large amount of memory. In [6] and [8], the DFA is expressed as a set of prioritized transition rule vectors, which reduces the number of transitions that must be stored and saves some storage. These DFA-based hardware string matching schemes each address one of the two most important requirements of a pattern matching algorithm. The first kind targets high performance and throughput: all 256 next-state transitions are stored for every state, which gives a fixed high throughput but greatly increases storage. The second kind targets storage efficiency: transition rule vectors give higher storage efficiency but cannot guarantee a fixed high throughput.
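To make the storage cost of keeping all 256 transitions per state concrete, the following rough Python estimate may help. It assumes a plain uncompressed table with one next-state pointer per (state, byte) pair; the helper name `full_table_bits` and the state counts are hypothetical and are not figures from the patent or from [4]~[8].

```python
def full_table_bits(num_states, alphabet_size=256):
    """Bits needed to store one next-state pointer per (state, input symbol)."""
    pointer_bits = max(1, (num_states - 1).bit_length())
    return num_states * alphabet_size * pointer_bits

# Hypothetical state counts, chosen only to show how the table grows.
for states in (1_000, 10_000, 100_000):
    kib = full_table_bits(states) / 8 / 1024
    print(f"{states:>7} states -> {kib:,.0f} KiB")
```

The table grows slightly faster than linearly in the number of states, because the pointer width also grows with the state count.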
Most traditional string matching techniques perform fully exact matching based on the AC algorithm or a DFA. When the string set is large, the required memory capacity and bandwidth grow enormously, so that many DPI algorithms to date cannot be deployed in real systems. For many application scenarios, however, fully exact matching is not indispensable, and relaxing the "exact" requirement can significantly reduce the storage demand. Starting from this observation, the present invention designs a hardware matching scheme that significantly reduces memory capacity while its accuracy remains close to that of the traditional DFA method.
Summary of the invention
The basic idea of the invention is not to insist on strictly exact string matching, but to reduce the storage that matching requires. The basic approach is to first extract and compress the information in the input string, and then match. The matching hardware is therefore divided into two parts. One part compresses the input string (extracts summary information); for this we use a hardware structure called the overlapping substring classifier (Overlapped Substring Classifier, OSC). The other part is similar to a traditional DFA-based matching structure and matches the compressed bit string, thereby determining whether a target string is present in the original string.
String matching structure based on the overlapping substring classifier (OSC)
Fig. 1 shows the logical structure and matching process of the complete OSC-based matching hardware. First, every substring of length 4 of each string in the target string set is added to one of n sets (4 here is the span, a parameter denoted t). To compress an input string, at every s-th position we check whether the length-4 string starting there belongs to one of the n sets just built; if it belongs to the i-th set, i is appended to the compressed string, otherwise x is appended. In the figure, n=2 and s=1, so the compressed string is a binary bit string. The structure that performs this compression and information extraction on the input string is called the OSC (Overlapped Substring Classifier). Matching on this bit string then requires far less stored information.
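The compression step can be sketched in Python as follows. The text does not say how a substring is assigned to one of the n sets, so the round-robin rule below is purely an assumption, as are the names `build_sets` and `compress` and the example patterns. Exact dictionaries stand in for the hardware set-membership structure.

```python
def substrings(pattern, t):
    """All overlapping length-t substrings of a pattern."""
    return [pattern[i:i + t] for i in range(len(pattern) - t + 1)]

def build_sets(patterns, t=4, n=2):
    """Map every length-t substring of every pattern to one of n set indices."""
    assignment = {}
    for p in patterns:
        for sub in substrings(p, t):
            if sub not in assignment:
                # Assumed rule: round-robin over substrings in first-seen order.
                assignment[sub] = len(assignment) % n
    return assignment

def compress(data, assignment, t=4, s=1):
    """Emit one symbol per window: the set index, or 'x' if it is in no set.

    Set indices are written as single characters, so this sketch assumes n <= 10.
    """
    symbols = []
    for i in range(0, len(data) - t + 1, s):
        index = assignment.get(data[i:i + t])
        symbols.append("x" if index is None else str(index))
    return "".join(symbols)

sets = build_sets([b"attack", b"exploit"])
print(compress(b"attack", sets))
print(compress(b"an attack!", sets))
```

With t=4 and s=1, each input byte position that starts a full window contributes one symbol, so the summary is about as long as the input in symbols but each symbol carries far fewer bits than a byte.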
During compression, additional checks can be carried out at certain positions of the input string to further filter out irrelevant traffic; this technique is called stamping. As shown in Fig. 2, all possible length-4 target substrings at positions encoded as 0 or 1 are collected into a set, and each position is then checked against it, which in effect re-verifies the bit string obtained above.
Both structures above rely on the operation "check whether a substring of length t belongs to one of n string sets", which can be implemented efficiently with a structure called a Bloom filter (BF); details can be found in the relevant literature. As shown in Fig. 3, for n=2 an OSC can be implemented with two BFs, and the stamping module implements a similar function with one further BF. In this way, a 5-byte substring can be encoded as 1 bit.
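A software sketch of the Bloom-filter-based classifier for n=2 might look like the following. The filter size, the number of hash functions, and the use of salted BLAKE2b from Python's `hashlib` are assumptions made for illustration; a hardware design would use its own hash circuits. `expected_fp_rate` is the standard approximation for a Bloom filter's false positive probability, not a figure from the patent.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted hash per hash function (assumed construction).
        for k in range(self.num_hashes):
            salt = k.to_bytes(8, "little")
            digest = hashlib.blake2b(item, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def classify(window, filters):
    """Return the index of the first filter that reports the window, else None."""
    for index, bf in enumerate(filters):
        if window in bf:
            return index
    return None

def expected_fp_rate(num_bits, num_hashes, num_items):
    """Standard approximation (1 - e^(-k*m/bits))^k for m inserted items."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes

# Two filters form a classifier for n=2 (sizes here are arbitrary).
filters = [BloomFilter(1024, 3), BloomFilter(1024, 3)]
filters[0].add(b"atta")
filters[1].add(b"ttac")
print(classify(b"atta", filters), expected_fp_rate(1024, 3, 1))
```

A Bloom filter never misses an item that was added, but it can report an item that was not; this is one source of the false positives that the stamping check is meant to reduce.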
After this processing, matching on the input string becomes matching on the compressed summary string. This matching can be done with the traditional DFA method; the only difference is that much less storage is used. The main structure is shown in Fig. 4. Although matching after compression may produce false positives, by adjusting the three parameters t, s, and n and adding a verification module such as stamping, the average accuracy comes close to that of the conventional method.
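The following Python sketch walks through matching on the compressed summary for s=1, and shows how a false positive can arise. It makes several assumptions that are not in the source: substrings are assigned to sets round-robin, exact dictionaries replace Bloom filters, Python's substring test stands in for the DFA over the summary alphabet, and every pattern is at least t bytes long. The names `summarize` and `candidates` are hypothetical.

```python
def windows(data, t):
    """All overlapping length-t windows of data (step s = 1)."""
    return [data[i:i + t] for i in range(len(data) - t + 1)]

def summarize(data, classes, t):
    """Compress data to one symbol per window: set index, or 'x' if unknown."""
    return "".join(str(classes[w]) if w in classes else "x"
                   for w in windows(data, t))

def candidates(payload, patterns, n=2, t=4):
    """Patterns whose summary occurs in the payload's summary (may over-report)."""
    classes = {}
    for p in patterns:
        for w in windows(p, t):
            # Assumed rule: round-robin assignment in first-seen order.
            classes.setdefault(w, len(classes) % n)
    payload_summary = summarize(payload, classes, t)
    # A substring test replaces the DFA that would scan the summary string.
    return [p for p in patterns
            if summarize(p, classes, t) in payload_summary]

rules = [b"attack", b"exploit"]
print(candidates(b"an attack!", rules))
print(candidates(b"run exploit now", rules))
```

In the second call the payload contains only `exploit`, yet `attack` is also reported, because the windows `xplo`, `ploi`, `loit` happen to produce the same symbol sequence as `atta`, `ttac`, `tack` under this assignment. This is the kind of ambiguity that tuning t, s, and n and adding a verification step is meant to suppress.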
Technical characteristics and advantages
To demonstrate the advantages of this technique, we use two metrics. The first is storage consumption: compared with other schemes, ours greatly reduces the storage overhead, saving more than 85%. The second is the ambiguity set (AS): when a match for some target string is reported, the AS is the set of all strings that may actually have been matched. For the traditional DFA method the size of the AS is always 1; in our experiments, with suitable parameters, the average AS size of the proposed scheme approaches 1.
The three parameters of the invention, the forward span t, the step s, and the number of substring groups n, strongly affect the size of the ambiguity set, as Fig. 5 shows. For the commonly used deep packet inspection string set Snort [9], suitable values of t, s, and n greatly reduce the mean AS size. With stamping added, as shown in Fig. 6, the mean AS size can be brought close to 1.
In this way, at very small cost (an average false positive rate below 10^-8, that is, a negligible chance of misjudgment), we obtain very considerable storage savings, laying a foundation for practical DPI systems.
Description of drawings
Fig. 1. Logical structure and schematic of online input string compression using the OSC.
Fig. 2. Schematic of the string compression logic with the added post-check mechanism "stamping".
Fig. 3. Implementation structure of the OSC. For an OSC with n=2, two Bloom filters form the classifier, and one additional Bloom filter performs the post-check.
Fig. 4. Design of the complete DPI system using the OSC. (a) shows an architecture that processes two bytes per clock cycle; (b) shows an architecture that processes 8 bytes per clock cycle and needs more processing engines working in parallel.
Fig. 5. (a) shows the effectiveness of the post-check; (b), (c), and (d) show the influence of the three parameters s, t, and n, respectively, on matching accuracy (measured by ambiguity set size).
Fig. 6. (a) and (b) show that, with the various optimizations combined, the matching accuracy of the invention approaches that of the traditional DFA (that is, the AS size is close to 1).
References
[1] Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333-340, 1975.
[2] B. Commentz-Walter, A string matching algorithm fast on the average. Proc. of the 6th Colloquium on Automata, Languages and Programming, pp. 118-132, July 1979.
[3] Sun Wu and Udi Manber, A fast algorithm for multi-pattern searching. Technical Report TR-94-17, Department of Computer Science, University of Arizona, 1994.
[4] Lin Tan and Timothy Sherwood, A High Throughput String Matching Architecture for Intrusion Detection and Prevention. Proceedings of the 32nd International Symposium on Computer Architecture (ISCA'05).
[5] Benjamin C. Brodie, Ron K. Cytron and David E. Taylor, A Scalable Architecture for High-Throughput Regular-Expression Pattern Matching. Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06).
[6] Jan van Lunteren, High Performance Pattern Matching for Intrusion Detection. IEEE Infocom 2006.
[7] Jan van Lunteren, Searching Very Large Routing Tables in Wide Embedded Memory. Proc. IEEE Globecom, vol. 3, pp. 1615-1619, November 2001.
[8] Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, et al., Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection. SIGCOMM'06, September 11-15, 2006, Pisa, Italy.
[9] Snort, http://www.snort.org.

Claims (2)

1. A string matching algorithm based on an overlapping substring classifier, comprising an algorithm for generating the overlapping substring sets and a matching algorithm that operates on the compressed summary string produced by the classifier.
2. A hardware structure of the overlapping substring classifier, comprising a structure that uses Bloom filters to build the substring set classifier, and the technique and implementation structure that use the post-check technique "stamping" to filter irrelevant traffic.
CN2010101242368A 2010-03-15 2010-03-15 Deep packet inspection method based on overlapping sub-character string classifier Pending CN102195827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101242368A CN102195827A (en) 2010-03-15 2010-03-15 Deep packet inspection method based on overlapping sub-character string classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101242368A CN102195827A (en) 2010-03-15 2010-03-15 Deep packet inspection method based on overlapping sub-character string classifier

Publications (1)

Publication Number Publication Date
CN102195827A true CN102195827A (en) 2011-09-21

Family

ID=44603249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101242368A Pending CN102195827A (en) 2010-03-15 2010-03-15 Deep packet inspection method based on overlapping sub-character string classifier

Country Status (1)

Country Link
CN (1) CN102195827A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102904951A (en) * 2012-10-12 2013-01-30 哈尔滨工业大学深圳研究生院 Data packet detecting method based on cloud system
WO2013054209A1 (en) * 2011-10-11 2013-04-18 International Business Machines Corporation Using a heuristically-generated policy to dynamically select string analysis algorithms for client queries
CN110825924A (en) * 2019-11-01 2020-02-21 深圳市前海随手数据服务有限公司 Data detection method, device and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013054209A1 (en) * 2011-10-11 2013-04-18 International Business Machines Corporation Using a heuristically-generated policy to dynamically select string analysis algorithms for client queries
CN103890788A (en) * 2011-10-11 2014-06-25 国际商业机器公司 Using a heuristically-generated policy to dynamically select string analysis algorithms for client queries
GB2509451A (en) * 2011-10-11 2014-07-02 Ibm Using a heuristically-generated policy to dynamically select string analysis algorithms for client queries
US9092723B2 (en) 2011-10-11 2015-07-28 International Business Machines Corporation Using a heuristically-generated policy to dynamically select string analysis algorithms for client queries
CN103890788B (en) * 2011-10-11 2016-10-26 国际商业机器公司 For dynamically selecting the mthods, systems and devices of string parsing algorithm
CN102904951A (en) * 2012-10-12 2013-01-30 哈尔滨工业大学深圳研究生院 Data packet detecting method based on cloud system
CN110825924A (en) * 2019-11-01 2020-02-21 深圳市前海随手数据服务有限公司 Data detection method, device and storage medium
CN110825924B (en) * 2019-11-01 2022-12-06 深圳市卡牛科技有限公司 Data detection method, device and storage medium

Similar Documents

Publication Publication Date Title
KR101536880B1 (en) Anchored patterns
KR101868720B1 (en) Compiler for regular expressions
CN102184197B (en) Regular expression matching method based on smart finite automaton (SFA)
CN100542176C (en) The analysis and processing method of packet content and system
CN101442540A (en) High speed mode matching algorithm based on field programmable gate array
Zheng et al. Algorithms to speedup pattern matching for network intrusion detection systems
CN106062740B (en) Method and device for generating multiple index data fields
Najam et al. Speculative parallel pattern matching using stride-k DFA for deep packet inspection
CN105760762A (en) Unknown malicious code detection method for embedded processor
CN113420802A (en) Alarm data fusion method based on improved spectral clustering
Xu et al. DDoS detection using a cloud-edge collaboration method based on entropy-measuring SOM and KD-tree in SDN
Zheng et al. GCN-ETA: high-efficiency encrypted malicious traffic detection
CN102195827A (en) Deep packet inspection method based on overlapping sub-character string classifier
Bando et al. Range hash for regular expression pre-filtering
Chen et al. Ac-suffix-tree: Buffer free string matching on out-of-sequence packets
Hu et al. Abnormal Event Correlation and Detection Based on Network Big Data Analysis.
Wu et al. Detection of improved collusive interest flooding attacks using BO-GBM fusion algorithm in NDN
Zhao et al. Intrusion detection model of Internet of Things based on LightGBM
CN101854341B (en) Pattern matching method and device for data streams
CN103198065A (en) Optimization method for regular expression matching circuit
Jain et al. A novel distributed semi-supervised approach for detection of network based attacks
Tian et al. Hierarchical distributed alert correlation model
Hilgurt A Concise Review of FPGA-Based Hardware Solutions for Network Intrusion Detection
Bandyopadhyay et al. A Decision Tree Based Intrusion Detection System for Identification of Malicious Web Attacks
Nourani et al. Bloom filter accelerator for string matching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110921