CN105930723A - Intrusion detection method based on feature selection - Google Patents

Intrusion detection method based on feature selection

Info

Publication number
CN105930723A
CN105930723A
Authority
CN
China
Prior art keywords
feature
attribute
data
intrusion detection
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610246178.3A
Other languages
Chinese (zh)
Inventor
陈星
戴远飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201610246178.3A priority Critical patent/CN105930723A/en
Publication of CN105930723A publication Critical patent/CN105930723A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention relates to an intrusion detection method based on feature selection, comprising the following steps: first, discretizing the original data; then performing feature selection on the discretized data; normalizing the selected data; and finally importing the normalized data into a classifier for training. The method shortens model training time and improves model accuracy.

Description

Intrusion detection method based on feature selection
Technical field
The present invention relates to the field of data mining, and in particular to an intrusion detection method based on feature selection.
Background technology
With the development of the Internet era, both the number of connections and the volume of data traffic keep growing, and with them the threat that malicious intrusions pose to computers and other devices. Building a network intrusion detection system is therefore increasingly important. Intrusion detection is a technique that discovers intrusions by collecting and analysing information about the protected system. Because intrusion detection must process data in real time and accurately predict whether a record is a threat, making timely and accurate predictions is a hard problem. Conventional intrusion detection systems rely on pattern matching: a rule is written by hand for each intrusion pattern and applied with if-else logic. This approach is labour-intensive and not very accurate, and, more importantly, the system cannot defend against a new type of attack when it first appears.
In recent years, adding machine-learning methods to intrusion detection systems has become a trend. Neural networks, support vector machines, naive Bayes, decision trees and other machine-learning methods have all been applied to intrusion detection. The collected features are first pre-processed, and a classifier is then trained on the processed data. When a real-time record passes through the system, the classifier predicts whether it is an intrusion: if it is judged a threat, the system blocks it automatically; otherwise the record is allowed through. For an intrusion detection system, the detection accuracy and the speed at which intrusions are found are critical factors, yet mainstream machine-learning methods only reach an accuracy of about 95%, train slowly, and cannot cope with redundant data.
Summary of the invention
In view of this, the purpose of the present invention is to propose an intrusion detection method based on feature selection that improves both the training time and the accuracy of the model.
The present invention is realised as follows: an intrusion detection method based on feature selection, in which the original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is finally imported into a classifier for training.
Further, the discretisation uses the entropy minimisation discretisation method (EMD): the continuous values of the attribute to be divided are first arranged in order, and the midpoint of every pair of adjacent values is taken as a candidate breakpoint; each breakpoint in the candidate set is then evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set. A minimum-description-length criterion decides when the discretisation stops.
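As an illustration only, the breakpoint search described above can be sketched as follows; `best_entropy_split` and its helper are hypothetical names, and both the MDL stopping rule and the recursive application to each partition are omitted.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_entropy_split(values, labels):
    """One step of entropy-minimisation discretisation: evaluate the
    midpoint of every pair of adjacent attribute values and return the
    cut that minimises the weighted class entropy of the two parts."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    best_cut, best_e = None, float("inf")
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                          # identical values: no midpoint
        cut = (xs[i] + xs[i - 1]) / 2         # midpoint candidate breakpoint
        left, right = ys[:i], ys[i:]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if e < best_e:
            best_cut, best_e = cut, e
    return best_cut, best_e
```

On a toy attribute whose low values are all one class and high values all the other, the selected breakpoint separates the two classes with zero residual entropy.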
Further, the discretisation may instead use proportional k-interval discretisation (PKID), which adjusts the size and number of the discrete intervals to find a balance between interval granularity and the desired accuracy, and uses this balance as the criterion for trading discretisation bias against variance.
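A minimal sketch of the proportional sizing rule, under the usual reading of PKID (the function name `pkid_cut_points` is illustrative): with n training instances, form roughly √n equal-frequency intervals of roughly √n instances each, so interval size and interval count grow together with the data.

```python
from math import isqrt

def pkid_cut_points(values):
    """Proportional k-interval discretisation sketch: divide the n sorted
    values into about sqrt(n) equal-frequency intervals, returning the
    values at which a new interval starts."""
    xs = sorted(values)
    n = len(xs)
    k = max(1, isqrt(n))          # interval count ~ sqrt(n)
    size = n // k                 # instances per interval ~ sqrt(n)
    return [xs[i * size] for i in range(1, k)]
```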
Further, the feature selection applied to the discretised data may use correlation-based feature selection (CFS), which discards features that have little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation;
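The merit function can be computed directly from the two mean correlations; this is a sketch of the formula alone, not of the subset search that CFS wraps around it (`cfs_merit` is an illustrative name).

```python
from math import sqrt

def cfs_merit(k, r_cf, r_ff):
    """Heuristic merit M_S of a subset S of k features: mean feature-class
    correlation in the numerator, feature-feature redundancy in the
    denominator."""
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

A single feature's merit is just its class correlation, and adding redundant features (large r_ff) lowers the merit even when r_cf stays the same.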
Further, the feature selection applied to the discretised data may use the consistency-based filter method (CONS), which selects features by comparing the consistency level of the training samples projected onto a feature subset. In each round a subset S is generated at random from the features; if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed, and if that rate is below a preset value, S becomes the new best feature subset.
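The inconsistency measure used by the filter can be sketched as follows (`inconsistency_rate` is an illustrative name; the random subset search is omitted): instances that agree on the selected features but disagree on the class are counted beyond the majority class of each group.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that are inconsistent when the data is
    projected onto the feature indices in `subset`."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[i] for i in subset)].append(y)
    # per group, everything beyond the majority class is inconsistent
    inconsistent = sum(len(g) - max(Counter(g).values())
                       for g in groups.values())
    return inconsistent / len(labels)
```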
Further, the feature selection applied to the discretised data may use the INTERACT method, an algorithm based on symmetric uncertainty (SU). The features are first sorted in descending order of SU; then, starting from the end of this ranking, each feature is evaluated in turn: if its consistency contribution is below a threshold the feature is removed, otherwise it is kept. The symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y));

IG(x|y) = H(x) + H(y) − H(x, y).
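The two formulas combine into a single score in [0, 1]; a sketch under the assumption that the features are sequences of discrete values (function names are illustrative):

```python
from collections import Counter
from math import log2

def H(seq):
    """Shannon entropy of a discrete-valued sequence."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

def symmetric_uncertainty(x, y):
    """SU(x, y) = 2*IG(x|y) / (H(x) + H(y)), with the information gain
    IG(x|y) = H(x) + H(y) - H(x, y)."""
    hx, hy = H(x), H(y)
    ig = hx + hy - H(list(zip(x, y)))   # joint entropy via paired values
    return 2 * ig / (hx + hy) if hx + hy else 0.0
```

SU is 1 when one feature determines the other and 0 when they are independent, which is what makes it usable as a ranking key in INTERACT.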
Further, the normalisation uses deviation (min-max) normalisation (NOR):

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
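A sketch of the deviation normalisation applied to one column of values (`minmax_normalise` is an illustrative name; a real pipeline would reuse the training min and max on the test set):

```python
def minmax_normalise(xs):
    """Map sample values linearly into [0, 1] via y = (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    # a constant column carries no information; map it to 0.0
    return [(x - lo) / span if span else 0.0 for x in xs]
```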
Further, the classifier may be a naive Bayes classifier (Naive Bayes). Let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes; an example d ∈ D is described by a vector (a1, a2, …, an), where ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai. The predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction. Under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
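A counting-based sketch of the classifier defined above (no smoothing, hypothetical function names): training estimates P(c) and P(Ai = ai | C = c) from frequencies, and prediction takes the argmax of their product.

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    """Estimate class priors and per-attribute conditionals by counting."""
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: m / n for c, m in class_count.items()}
    cond = defaultdict(Counter)        # (attr index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            cond[(i, c)][a] += 1
    return prior, cond, class_count

def nb_predict(row, prior, cond, class_count):
    """argmax over classes of P(c) * prod_i P(A_i = a_i | C = c)."""
    best_c, best_p = None, -1.0
    for c, p in prior.items():
        for i, a in enumerate(row):
            p *= cond[(i, c)][a] / class_count[c]
        if p > best_p:
            best_c, best_p = c, p
    return best_c
```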
Further, the classifier may be a support vector machine classifier (SVM) with the classification function:

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters. The parameters a_i are obtained by solving the following quadratic programming problem:

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j);

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l.

If the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives:

ω·x_i + b > 0 for all x_i ∈ c1, and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
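Solving the quadratic program is left to an optimiser, but evaluating the resulting classification function is direct; a sketch with a linear kernel and hypothetical trained parameters:

```python
def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))

def svm_decide(x, support, alphas, ys, b, kernel=linear_kernel):
    """f(x) = sign(sum_i y_i * a_i * K(x_i, x) + b), where the a_i are
    assumed to come from the box-constrained dual problem above."""
    s = sum(y * a * kernel(xi, x)
            for xi, a, y in zip(support, alphas, ys)) + b
    return 1 if s >= 0 else -1
```

With one support vector per class at +1 and -1 on a line and b = 0, the decision boundary sits at the origin.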
Further, the classifier may be a decision tree algorithm (Decision Tree), which is divided into two stages: tree construction and tree pruning.

The tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value. Criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning.

The pruning stage uses pre-pruning, post-pruning, or a combination of both. Pruning criteria include the minimum-description-length principle and the minimum expected error rate: the former encodes the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits; the latter computes the expected error rate that would result from pruning the subtree at a given node.
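The information-gain criterion used in the construction stage can be sketched as follows (one attribute evaluation, not the full recursive builder; names are illustrative):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, column):
    """Gain of splitting on a discrete attribute column: class entropy
    before the split minus the weighted entropy of each branch."""
    branches = defaultdict(list)
    for v, y in zip(column, labels):
        branches[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in branches.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes has gain equal to the class entropy; an attribute independent of the class has gain 0.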
Compared with the prior art, the present invention has the following beneficial effects. Conventional intrusion detection systems use pattern matching; to address the long classifier training time and limited accuracy of earlier approaches, the application introduces the idea of feature selection and proposes a network intrusion detection algorithm based on it. According to the experimental results, the feature-selection-based algorithm improves both the training time and the accuracy of the model, and in particular reaches a detection rate of more than 98%.
Accompanying drawing explanation
Fig. 1 is a schematic block diagram of an embodiment of the present invention.
Fig. 2 is a schematic diagram of the naive Bayes structural model of an embodiment of the present invention.
Detailed description of the invention
The present invention is further described below with reference to the accompanying drawings and an embodiment.
As shown in Fig. 1, this embodiment provides an intrusion detection method based on feature selection: the original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is imported into a classifier for training.
In this embodiment, the discretisation uses the entropy minimisation discretisation method: the continuous values of the attribute to be divided are first arranged in order, and the midpoint of every pair of adjacent values is taken as a candidate breakpoint; each breakpoint in the candidate set is evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set. A minimum-description-length criterion decides when the discretisation stops.
In this embodiment, the discretisation uses proportional k-interval discretisation: by adjusting the size and number of the discrete intervals a balance is found between interval granularity and the desired accuracy, and this balance serves as the criterion for trading discretisation bias against variance.
In this embodiment, the feature selection applied to the discretised data uses correlation-based feature selection, which discards features that have little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation;
In this embodiment, the feature selection applied to the discretised data uses the consistency-based filter method, which selects features by comparing the consistency level of the training samples projected onto a feature subset. In each round a subset S is generated at random from the features; if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed, and if that rate is below a preset value, S becomes the new best feature subset.
In this embodiment, the feature selection applied to the discretised data uses the INTERACT method, an algorithm based on symmetric uncertainty (SU). The features are first sorted in descending order of SU; then, starting from the end of this ranking, each feature is evaluated in turn: if its consistency contribution is below a threshold the feature is removed, otherwise it is kept. The symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y));

IG(x|y) = H(x) + H(y) − H(x, y).
In this embodiment, the normalisation uses deviation normalisation:

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
In this embodiment, as shown in Fig. 2, the classifier is a naive Bayes classifier. Let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes; an example d ∈ D is described by a vector (a1, a2, …, an), where ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai. The predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction. Under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
In this embodiment, the classifier is a support vector machine classifier (SVM) with the classification function:

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters. The parameters a_i are obtained by solving the following quadratic programming problem:

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j);

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l.

If the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives:

ω·x_i + b > 0 for all x_i ∈ c1, and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
In this embodiment, the classifier is a decision tree algorithm, which is divided into two stages: tree construction and tree pruning.

The tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value. Criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning.

The pruning stage uses pre-pruning, post-pruning, or a combination of both. Pruning criteria include the minimum-description-length principle and the minimum expected error rate: the former encodes the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits; the latter computes the expected error rate that would result from pruning the subtree at a given node.
In particular, this embodiment uses the KDD Cup 99 database, the only dataset that provides both labelled samples and test data. The dataset contains 41 attributes and one label: attributes 1-9 are basic features of a network connection, attributes 10-22 are content features, and attributes 23-41 are traffic features. The training data covers about 5,000,000 TCP connection records collected over seven weeks, each of roughly 100 bytes. The experiments use the 10% subset of KDD Cup 99, i.e. 494,201 records.
To demonstrate the advantage of the proposed algorithm over traditional algorithms, the three feature selection algorithms described above are combined with the two discretisation algorithms, giving the following table:

Table 1 Feature selection attributes

Each combination in the table produces a different feature subset, and combining these with different classifiers yields different results. The combinations are therefore tested below with each classifier.
This embodiment uses the 10% KDD 99 dataset: the first 300,000 records serve as the training set and the remaining 200,000 records as the test set. Records labelled normal. are marked +1 and all others −1; the processed data is brought into the models for training, giving support vector machine (SVM), decision tree (TREE) and naive Bayes (BAYES) models, whose accuracy is then tested on the remaining 200,000 records.
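The relabelling step described above is trivial but worth pinning down; a sketch (assuming the KDD Cup 99 convention that class strings carry a trailing dot, e.g. "normal."):

```python
def binarise_labels(labels):
    """Two-class setting used in the experiment: 'normal.' -> +1,
    every attack label -> -1."""
    return [1 if lab == "normal." else -1 for lab in labels]
```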
The performance of the three classifiers is first discussed for the two-class case:

Table 2 Classification accuracy after feature selection

As Tables 1 and 2 show, after feature selection every combination except CONS+PKID+NOR markedly improves classifier performance. Among the SVM classifiers, INTERACT+EMD+NOR_SVM achieves the highest accuracy, 98.35%; among the decision trees, INTERACT+EMD+NOR_TREE achieves the highest accuracy, 99.90%; and among the naive Bayes classifiers, INTERACT+PKID+NOR_BAYES achieves the highest accuracy, 98.32%. Of the three classifiers, the decision tree classifies best: its error rate stays below 1%, while that of the support vector machine stays below 2%. Because the naive Bayes model is built on conditional independence and probability statistics, it misclassifies when attributes are correlated or unevenly distributed, so its overall accuracy falls short of the other two.
The KDD Cup 99 dataset contains four major error types: Probe, DoS, U2R and R2L. Together with the normal class there are five classes in total, as shown in the table below:

Table 3 Proportion of each label in 10% KDD Cup 99

The present invention builds multi-class models for these five classes and tests them; the detailed data are shown in Tables 4, 5 and 6:
Table 4 Multi-class support vector machine (SVM) classification after feature selection
Table 5 Multi-class decision tree classification after feature selection
Table 6 Multi-class naive Bayes (Bayes) classification after feature selection
Finally, after multi-class feature selection, the average classification accuracy of the support vector machine and decision tree algorithms exceeds 95%; the comparatively low accuracy on particular classes is due to insufficient training data, which is not enough to generate an accurate classifier. For SVM, CFS+EMD+NOR_SVM achieves the best accuracy, 98.34%; for the decision tree, CONS+EMD+NOR_TREE achieves the best accuracy, 99.29%; and for naive Bayes, CONS+PKID+NOR_BAYES achieves the best accuracy, 91.83%. By comparison, the Bayes accuracy remains the lowest: the Bayes classifier is built on the probability distribution of the data, and the samples in the training set may not reflect the overall distribution well, so the model is biased and the accuracy drops.
The above are only preferred embodiments of the present invention; all equal changes and modifications made according to the scope of the present patent shall fall within the scope of the present invention.

Claims (10)

1. An intrusion detection method based on feature selection, characterised in that: original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is imported into a classifier for training.
2. The intrusion detection method based on feature selection according to claim 1, characterised in that: the discretisation uses the entropy minimisation discretisation method; the continuous values of the attribute to be divided are first arranged in order, the midpoint of every pair of adjacent values is taken as a candidate breakpoint, each breakpoint in the candidate set is evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set; a minimum-description-length criterion decides when the discretisation stops.
3. The intrusion detection method based on feature selection according to claim 1, characterised in that: the discretisation uses proportional k-interval discretisation, which adjusts the size and number of the discrete intervals to find a balance between interval granularity and the desired accuracy and uses this balance as the criterion for trading discretisation bias against variance.
4. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses correlation-based feature selection, which discards features with little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation.
5. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses the consistency-based filter method, which selects features by comparing the consistency level of the training samples projected onto a feature subset; in each round a subset S is generated at random from the features, and if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed; if that rate is below a preset value, S becomes the new best feature subset.
6. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses the INTERACT method, an algorithm based on symmetric uncertainty SU; the features are first sorted in descending order of SU, then, starting from the end of this ranking, each feature is evaluated in turn, and if its consistency contribution is below a threshold the feature is removed, otherwise it is kept; the symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y)); IG(x|y) = H(x) + H(y) − H(x, y).
7. The intrusion detection method based on feature selection according to claim 1, characterised in that: the normalisation uses deviation normalisation:

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
8. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a naive Bayes classifier; let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes, where an example d ∈ D is described by a vector (a1, a2, …, an), ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai; the predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction; under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
9. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a support vector machine classifier SVM with the classification function

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters; the parameters a_i are obtained by solving the quadratic programming problem

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j),

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l;

if the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives

ω·x_i + b > 0 for all x_i ∈ c1 and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
10. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a decision tree algorithm, which is divided into two stages, tree construction and tree pruning;
the tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value; the criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning;
the pruning stage uses pre-pruning, post-pruning, or a combination of both; the pruning criteria include the minimum-description-length principle and the minimum expected error rate, the former encoding the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits, and the latter computing the expected error rate that would result from pruning the subtree at a given node.
CN201610246178.3A 2016-04-20 2016-04-20 Intrusion detection method based on feature selection Pending CN105930723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610246178.3A CN105930723A (en) 2016-04-20 2016-04-20 Intrusion detection method based on feature selection

Publications (1)

Publication Number Publication Date
CN105930723A true CN105930723A (en) 2016-09-07

Family

ID=56838578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610246178.3A Pending CN105930723A (en) 2016-04-20 2016-04-20 Intrusion detection method based on feature selection

Country Status (1)

Country Link
CN (1) CN105930723A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106789912A (en) * 2016-11-22 2017-05-31 清华大学 Router data plane anomaly detection method based on classification regression tree
CN106874779A (en) * 2017-03-10 2017-06-20 广东工业大学 A kind of data mining method for secret protection and system
CN106874766A (en) * 2017-04-09 2017-06-20 上海云剑信息技术有限公司 The whitepack detection method that one point data is attacked in power system
CN106897413A (en) * 2017-02-20 2017-06-27 重庆邮电大学 A kind of hybrid characteristic selecting method based on harmony search
CN106897273A (en) * 2017-04-12 2017-06-27 福州大学 A kind of network security dynamic early-warning method of knowledge based collection of illustrative plates
CN106991536A (en) * 2017-04-09 2017-07-28 上海云剑信息技术有限公司 The black box detection method that one point data is attacked in power system
CN106992965A (en) * 2017-02-27 2017-07-28 南京邮电大学 A kind of Trojan detecting method based on network behavior
CN108304853A (en) * 2017-10-10 2018-07-20 腾讯科技(深圳)有限公司 Acquisition methods, device, storage medium and the electronic device for the degree of correlation of playing
CN108388563A (en) * 2017-02-03 2018-08-10 北京京东尚科信息技术有限公司 Information output method and device
CN109388944A (en) * 2018-11-06 2019-02-26 吉林大学 A kind of intrusion detection method based on KPCA and ELM
CN109450860A (en) * 2018-10-16 2019-03-08 南京航空航天大学 A kind of detection method threatened based on entropy and the advanced duration of support vector machines
CN109492667A (en) * 2018-10-08 2019-03-19 国网天津市电力公司电力科学研究院 A kind of feature selecting discrimination method for non-intrusive electrical load monitoring
WO2019080484A1 (en) * 2017-10-26 2019-05-02 北京深鉴智能科技有限公司 Method of pruning convolutional neural network based on feature map variation
CN110070141A (en) * 2019-04-28 2019-07-30 上海海事大学 A kind of network inbreak detection method
CN110135469A (en) * 2019-04-24 2019-08-16 北京航空航天大学 It is a kind of to improve the characteristic filter method and device selected based on correlative character
CN110138784A (en) * 2019-05-15 2019-08-16 重庆大学 A kind of Network Intrusion Detection System based on feature selecting
CN110138786A (en) * 2019-05-20 2019-08-16 福州大学 Web method for detecting abnormality and system based on SMOTETomek and LightGBM
CN110191081A (en) * 2018-02-22 2019-08-30 上海交通大学 The Feature Selection system and method for network flow attack detecting based on learning automaton
CN110719278A (en) * 2019-10-08 2020-01-21 苏州浪潮智能科技有限公司 Method, device, equipment and medium for detecting network intrusion data
CN111343175A (en) * 2020-02-22 2020-06-26 苏州浪潮智能科技有限公司 Method, system, equipment and medium for improving network intrusion detection precision
CN111901340A (en) * 2020-07-28 2020-11-06 四川大学 Intrusion detection system and method for energy Internet
CN113590872A (en) * 2021-07-28 2021-11-02 广州艾美网络科技有限公司 Method, device and equipment for generating dance spectral plane
CN113726810A (en) * 2021-09-07 2021-11-30 广东电网有限责任公司广州供电局 Intrusion detection system
CN116846688A (en) * 2023-08-30 2023-10-03 南京理工大学 Interpretable flow intrusion detection method based on CNN

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100738550B1 (en) * 2006-01-16 2007-07-11 삼성전자주식회사 Network intrusion detection system using genetic algorithm and method thereof
CN102164140A (en) * 2011-04-22 2011-08-24 西安电子科技大学 Method for intrusion detection based on negative selection and information gain
CN103699698A (en) * 2014-01-16 2014-04-02 北京泰乐德信息技术有限公司 Method and system for track traffic failure recognition based on improved Bayesian algorithm
US20150135318A1 (en) * 2013-11-12 2015-05-14 Macau University Of Science And Technology Method of detecting intrusion based on improved support vector machine

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
MANORANJAN DASH ET AL.: "Consistency-based search in feature selection", Artificial Intelligence *
MARK A. HALL: "Correlation-based feature selection for machine learning", Hamilton: The University of Waikato *
ZHENG ZHAO ET AL.: "Searching for interacting features", Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) *
HE JUN: "Research and Simulation of an Intrusion Detection System Based on Support Vector Machines", China Master's Theses Full-text Database, Information Science and Technology *
ZHANG XIAOHUI ET AL.: "Research on IoT Security Based on a Balanced Binary Decision Tree SVM Algorithm", Technology Research *
ZHANG YONGJUN: "Research on an SVM-based Incremental Intrusion Detection Method", China Master's Theses Full-text Database, Information Science and Technology *
WANG GUOCAI: "Research and Application of the Naive Bayes Classifier", China Master's Theses Full-text Database, Information Science and Technology *
WANG WEIGUANG: "Research on Malicious Web Page Detection Based on Classification Algorithms", China Master's Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN105930723A (en) Intrusion detection method based on feature selection
CN111181939B (en) Network intrusion detection method and device based on ensemble learning
Xu et al. A genetic programming model for real-time crash prediction on freeways
CN109871954B (en) Training sample generation method, abnormality detection method and apparatus
CN112258093A (en) Risk level data processing method and device, storage medium and electronic equipment
CN105373606A (en) Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN110929939B (en) Landslide hazard susceptibility spatial prediction method based on clustering-information coupling model
CN110225055A (en) A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model
CN113159482A (en) Method and system for evaluating information security risk
CN104834940A (en) Medical image inspection disease classification method based on support vector machine (SVM)
CN108491511A (en) Data digging method and device, model training method based on diagram data and device
CN105574544A (en) Data processing method and device
Wang et al. A conscience on-line learning approach for kernel-based clustering
CN102045358A (en) Intrusion detection method based on integral correlation analysis and hierarchical clustering
CN113221960B (en) Construction method and collection method of high-quality vulnerability data collection model
CN114707571B (en) Credit data anomaly detection method based on enhanced isolation forest
CN105760649A (en) Big-data-oriented creditability measuring method
Chen et al. Pattern recognition using clustering algorithm for scenario definition in traffic simulation-based decision support systems
CN113378990A (en) Traffic data anomaly detection method based on deep learning
CN109086808A (en) Traffic high-risk personnel recognition methods based on random forests algorithm
CN114581694A (en) Network security situation assessment method based on improved support vector machine
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN112888008B (en) Base station abnormality detection method, device, equipment and storage medium
CN117077018B (en) Data processing method, device and storage medium based on machine learning
CN112232206B (en) Face recognition method and face recognition platform based on big data and artificial intelligence

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160907