CN102324007A - Anomaly detection method based on data mining - Google Patents

Anomaly detection method based on data mining

Info

Publication number
CN102324007A
CN102324007A, CN201110283015A
Authority
CN
China
Prior art keywords
training
observational variable
weak classifier
matrix
separation matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110283015A
Other languages
Chinese (zh)
Other versions
CN102324007B (en)
Inventor
唐朝伟
时豪
严鸣
张雪臻
李超群
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201110283015XA priority Critical patent/CN102324007B/en
Publication of CN102324007A publication Critical patent/CN102324007A/en
Application granted granted Critical
Publication of CN102324007B publication Critical patent/CN102324007B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses an anomaly detection method based on data mining, belonging to the field of network security technology. The method first performs feature extraction with the Fast-ICA algorithm, based on independent component analysis, together with the AdaBoost method, eliminating redundant attributes and reducing the data dimension. The AdaBoost method then trains a group of weak classifiers in sequence and combines them into a strong classifier. The invention effectively removes redundant attribute information from the network data, reduces the computational cost of classifier training and detection, improves detection accuracy, and lowers the probability of false positives and false negatives.

Description

Anomaly detection method based on data mining
Technical field
The present invention relates to computer anomaly detection methods, and in particular to an anomaly detection method based on data mining.
Background art
Intrusion detection is the detection of attacks against a computer system, providing real-time protection against internal attacks, external attacks, and misoperation. To accurately identify the attack type, an intrusion detection system collects relevant data from several key points in the network system, such as log files on the local computer system, and analyzes these data to determine whether behavior violating the security policy has occurred, or whether there are signs that the local computer system or the computer network system is under attack. Intrusion detection can monitor and analyze the activity of users and of the system, check the system configuration for security vulnerabilities, evaluate the integrity of critical system resources and data files, recognize known attack patterns, detect abuse by users, collect statistics on and analyze abnormal behavior, and maintain system logs; in other words, it monitors and controls the computer network in real time without degrading the performance of the computer system.
In existing intrusion detection techniques, the mass of collected data serves as the data source of the intrusion detection system and is analyzed to decide whether an intrusion has occurred. While this large volume of data provides a great deal of usable information, it also makes the data harder to exploit effectively: useful information can be submerged in a large amount of redundant data, which increases the difficulty of feature extraction.
Summary of the invention
The purpose of the invention is to provide an anomaly detection method based on data mining that extracts the useful features from network data, eliminates the redundant attributes in the network data, improves detection accuracy, and reduces the probability of false positives and false negatives.
To achieve these goals, the invention provides an anomaly detection method based on data mining, characterized in that it consists of the following steps:
S1: take the network data as observational variables, use the Fast-ICA method to extract the observational-variable features from said observational variables, and form the observational-variable feature set Z, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced;
S2: train on the observational-variable features with the AdaBoost method: the observational-variable feature set is the training set, and each observational-variable feature serves as a training sample; each training sample is given a weight that represents the probability of the sample being selected into the training set of a weak classifier; after each weak classifier finishes training, the weights are adjusted according to the classification results on the training set: if a training sample is classified correctly by the weak classifier, its weight is reduced, so the probability of it being selected into the training set of the next weak classifier decreases; if a training sample is classified incorrectly, the probability of it being selected into the training set of the next weak classifier is increased; a strong classifier is finally obtained;
S3: detect abnormal network data with said strong classifier.
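The three steps above can be summarized by the following outline, given here only as a reading aid (a minimal Python sketch, not part of the claims; the three function names are hypothetical placeholders for the Fast-ICA feature extraction, the AdaBoost training, and the detection step that are detailed later in the description).

```python
# Outline of the three-step method; the helper functions are hypothetical
# placeholders whose concrete forms are sketched in the detailed description.
def detect_anomalies(network_data, labels, new_traffic):
    # S1: Fast-ICA feature extraction -> feature set Z with redundant attributes removed.
    Z, ica_model = fast_ica_features(network_data)
    # S2: AdaBoost combines sequentially trained weak classifiers into a strong one.
    strong_classifier = adaboost_train(Z, labels)
    # S3: detect abnormal network data with the strong classifier.
    return strong_classifier.predict(fast_ica_transform(ica_model, new_traffic))
```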
Said step S1 consists of the following steps:
S10: take N observational variables x_1, x_2, …, x_N, which form the observational-variable set; each observational variable is expressed as a linear combination of M independent components s_1, s_2, …, s_M, which form the independent-component set, where i = 1, …, N, j = 1, …, M, and N and M are integers greater than 1; form the vector X = (x_1, x_2, …, x_N)^T of the observational-variable set and the vector S = (s_1, s_2, …, s_M)^T of the independent-component set, and set X = A·S, where A is the unknown N×M mixing matrix;
S11: whiten said observational variables;
S12: take the separation matrix W to be the generalized inverse of the mixing matrix A; adjust said separation matrix W by the stochastic gradient method according to the formula y = W·X, and compute the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced.
Adjusting the separation matrix W by the stochastic gradient method in said step S12 consists of the following steps:
(1) Iterate on said separation matrix W row by row according to the fixed-point update formula w_i(k+1) = E{X·g(w_i(k)^T·X)} − E{g′(w_i(k)^T·X)}·w_i(k), where w_i(k) denotes the row vector of the separation matrix W corresponding to the i-th observational variable of the observational-variable set after k iterations, w_i(k+1) denotes that row vector after k+1 iterations, w_i(k)^T denotes the transpose of that row vector, E is the expectation operator, g is the nonlinearity derived from the Gaussian contrast function and g′ is its derivative, and i and k are integers greater than 1;
(2) Judge whether |w_i(k+1) − w_i(k)| ≤ ξ holds; if it holds, stop the iteration, obtain the final separation matrix W(n), and execute step (3); if it does not hold, repeat step (1); here ξ is any number between 0 and 1;
(3) Normalize said final separation matrix W(n) row by row, i.e. w_i(n) = w_i(n) / norm(w_i(n)), where norm(·) denotes the vector norm;
(4) Substitute the final separation matrix W(n) into the formula y = W(n)·X to obtain the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced.
Said step S2 consists of the following steps:
S20: set the training set G = {(y_1, c_1), (y_2, c_2), …, (y_{m+n}, c_{m+n})}, where y is the optimal estimate of the vector S, i = 1, …, m+n, and m+n is an integer greater than 1; c_i is the class label, with c_i = +1 marking the minority class and c_i = −1 marking the majority class; the number of minority-class samples is m, the number of majority-class samples is n, and m << n;
S21: initialize said training set: the weight of each sample (y_i, c_i) in the training set G is initialized to 1/n;
S22: use the BP (back-propagation) neural network as the weak classifier, and call WeakLearn for T rounds of iterative training, where each round of training yields one weak classifier function;
S23: before each training round, judge whether the number of iterations ≥ T holds; if it holds, combine the T weak classifier functions to obtain the strong classifier; if not, adjust the weights and repeat step S22.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
The invention effectively eliminates redundant attribute information in the network data, reduces the computational cost of classifier training and detection, improves detection accuracy, and lowers the probability of false positives and false negatives.
 
Description of drawings
The present invention will be explained by way of example with reference to the accompanying drawings, in which:
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the flow chart of feature extraction with the Fast-ICA method;
Fig. 3 is the flow chart of the AdaBoost method;
Fig. 4 shows the results of the experimental test.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving an equivalent or similar purpose, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only one example of a series of equivalent or similar features.
As shown in Fig. 1, the anomaly detection method based on data mining consists of three steps.
Step 1: take the network data as observational variables, use the Fast-ICA method to extract the observational-variable features from them, and form the observational-variable feature set; this yields the network data features with redundant attributes eliminated and the data dimension reduced.
The Fast-ICA algorithm, also called the fixed-point algorithm, adjusts the separation matrix W by the stochastic gradient method so that the independence between the recovered source signals is maximized.
As shown in Fig. 2, the process of extracting the observational-variable features with the Fast-ICA method consists of the following steps:
S10: take N observational variables x_1, x_2, …, x_N, which form the observational-variable set; each observational variable is expressed as a linear combination of M independent components s_1, s_2, …, s_M, which form the independent-component set, where i = 1, …, N, j = 1, …, M, and N and M are integers greater than 1; form the vector X = (x_1, x_2, …, x_N)^T of the observational-variable set and the vector S = (s_1, s_2, …, s_M)^T of the independent-component set, and set X = A·S, where A is the unknown N×M mixing matrix;
S11: whiten said observational variables;
S12: take the separation matrix W to be the generalized inverse of the mixing matrix A; adjust said separation matrix W by the stochastic gradient method according to the formula y = W·X, and compute the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced, i.e., the features of the observational-variable set.
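As a concrete illustration of steps S10 and S11, the short Python sketch below (not part of the patent text; the use of NumPy, the array shapes, and the eigenvalue-based whitening are assumptions made for illustration) centers and whitens an observation matrix X before the separation matrix W is estimated.

```python
import numpy as np

def center_and_whiten(X):
    """Center and whiten an observation matrix X of shape (N, T):
    N observational variables, T samples per variable."""
    # Center each observational variable (zero mean per row).
    X = X - X.mean(axis=1, keepdims=True)
    # Eigen-decomposition of the covariance matrix.
    cov = np.cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Whitening matrix V = D^{-1/2} E^T, so that cov(V X) = I.
    V = np.diag(1.0 / np.sqrt(eigvals + 1e-12)) @ eigvecs.T
    Z = V @ X
    return Z, V

# Example: 5 observed mixtures of 5 non-Gaussian sources, 1000 samples each.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5)) @ rng.laplace(size=(5, 1000))
Z, V = center_and_whiten(X)
print(np.allclose(np.cov(Z), np.eye(5), atol=1e-1))  # approximately the identity
```

After whitening, the covariance of Z is approximately the identity matrix, which is what allows the subsequent fixed-point iteration to treat the rows of W as unit vectors.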
The Fast-ICA method is based on the maximum-negentropy criterion. The principle of the maximum-negentropy criterion is as follows. By the central limit theorem, a random variable x that is composed of many mutually independent random variables s_j in the observational-variable set must approach a Gaussian distribution, regardless of how the individual variables are distributed, as long as each independent random variable s_j has finite mean and variance. Therefore, during separation, the non-Gaussianity of the optimal estimate y is measured; when the non-Gaussianity measure reaches its maximum, the separation of the independent components is complete. The negentropy is defined as J(y) = H(y_gauss) − H(y), where y_gauss denotes a Gaussian random variable with the same variance as the optimal estimate y, and H(·) is the information entropy of a random variable. It can be seen from this formula that J(y) = 0 when the optimal estimate y has a Gaussian distribution, and that the stronger the non-Gaussianity of y, the larger the value of J(y).
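In practice the entropy H(y) is not evaluated directly; Fast-ICA implementations usually maximize an approximation of the negentropy built from a nonlinear contrast function. The sketch below illustrates one such non-Gaussianity measure (the contrast function G(u) = −exp(−u²/2) and the sample-based expectations are assumptions for illustration, not taken from the patent text).

```python
import numpy as np

def negentropy_approx(y):
    """Approximate negentropy J(y) ~ (E[G(y)] - E[G(v)])^2 for a
    zero-mean, unit-variance signal y, with v standard Gaussian and
    the Gaussian contrast function G(u) = -exp(-u^2 / 2)."""
    G = lambda u: -np.exp(-u**2 / 2.0)
    # E[G(v)] for v ~ N(0, 1) equals -1/sqrt(2).
    EG_gauss = -1.0 / np.sqrt(2.0)
    return (np.mean(G(y)) - EG_gauss) ** 2

rng = np.random.default_rng(0)
gauss = rng.standard_normal(100_000)
laplace = rng.laplace(size=100_000) / np.sqrt(2.0)  # unit variance
print(negentropy_approx(gauss))    # close to 0
print(negentropy_approx(laplace))  # noticeably larger
```

For the Gaussian sample the measure is close to zero, while the heavier-tailed Laplace sample scores noticeably higher, which is exactly the behaviour the maximum-negentropy criterion exploits.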
Accordingly, the stochastic gradient method carries out iterative processing on the separation matrix W under the maximum-negentropy criterion, using an update direction proportional to the gradient of the negentropy approximation (where E is the expectation operator and g is the nonlinearity derived from the Gaussian contrast function). It consists of the following steps:
(1) Iterate on said separation matrix W row by row according to the fixed-point update formula w_i(k+1) = E{X·g(w_i(k)^T·X)} − E{g′(w_i(k)^T·X)}·w_i(k), where w_i(k) denotes the row vector of the separation matrix W corresponding to the i-th observational variable of the observational-variable set after k iterations, w_i(k+1) denotes that row vector after k+1 iterations, w_i(k)^T denotes the transpose of that row vector, E is the expectation operator, g is the nonlinearity derived from the Gaussian contrast function and g′ is its derivative, and i and k are integers greater than 1;
(2) Judge whether |w_i(k+1) − w_i(k)| ≤ ξ holds; if it holds, stop the iteration, obtain the final separation matrix W(n), and execute step (3); if it does not hold, repeat step (1); here ξ is any number between 0 and 1;
(3) Normalize said final separation matrix W(n) row by row, i.e. w_i(n) = w_i(n) / norm(w_i(n)), where norm(·) denotes the vector norm;
(4) Substitute the final separation matrix W(n) into the formula y = W(n)·X to obtain the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced.
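A minimal Python sketch of the row-wise fixed-point iteration in steps (1)-(4) is given below, operating on the whitened data Z from the earlier sketch (illustrative only; the Gaussian nonlinearity g(u) = u·exp(−u²/2), the deflation step that keeps the rows of W decorrelated, and the dot-product form of the convergence test are implementation choices assumed here, not details fixed by the patent text).

```python
import numpy as np

def fast_ica(Z, n_components, xi=1e-4, max_iter=200, seed=0):
    """Fixed-point Fast-ICA on whitened data Z of shape (N, T).
    Returns the separation matrix W and the estimate y = W @ Z.
    g(u) = u*exp(-u^2/2) is the nonlinearity from the Gaussian
    contrast function; g'(u) = (1 - u^2)*exp(-u^2/2)."""
    rng = np.random.default_rng(seed)
    N, T = Z.shape
    W = np.zeros((n_components, N))
    for i in range(n_components):
        w = rng.standard_normal(N)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = w @ Z                                    # w_i(k)^T X
            g = u * np.exp(-u**2 / 2)
            g_prime = (1 - u**2) * np.exp(-u**2 / 2)
            # Step (1): fixed-point update  w+ = E{X g(w^T X)} - E{g'(w^T X)} w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: keep the new row decorrelated from earlier rows
            # (a practical necessity not spelled out in steps (1)-(4)).
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)               # step (3): row-wise normalization
            # Step (2): stop when the direction of w stops changing, the analogue
            # of |w_i(k+1) - w_i(k)| <= xi allowing for a sign flip.
            if np.abs(np.abs(w_new @ w) - 1) <= xi:
                w = w_new
                break
            w = w_new
        W[i] = w
    return W, W @ Z                                      # step (4): y = W(n) X

# Usage with the whitened data Z from the previous sketch:
# W, y = fast_ica(Z, n_components=5)
```

Without the deflation step every row of W would converge to the same independent component, so it is included even though steps (1)-(4) do not mention it explicitly.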
Step 2: train on the observational-variable features with the AdaBoost method. The observational-variable feature set is the training set, and each observational-variable feature serves as a training sample. Each training sample is given a weight that represents the probability of the sample being selected into the training set of a weak classifier. After each weak classifier finishes training, the weights are adjusted according to the classification results on the training set: if a training sample is classified correctly by the weak classifier, its weight is reduced, so the probability of it being selected into the training set of the next weak classifier decreases; if a training sample is classified incorrectly, the probability of it being selected into the training set of the next weak classifier is increased. A strong classifier is finally obtained.
Step 3: detect abnormal network data with said strong classifier.
As shown in Fig. 3, the AdaBoost training process, with the BP network as the weak classifier, consists of the following steps:
S20: set the training set G = {(y_1, c_1), (y_2, c_2), …, (y_{m+n}, c_{m+n})}, where y is the optimal estimate of the vector S, i = 1, …, m+n, and m+n is an integer greater than 1; c_i is the class label, with c_i = +1 marking the minority class and c_i = −1 marking the majority class; the number of minority-class samples is m, the number of majority-class samples is n, and m << n;
S21: initialize the training set: the weight of each sample (y_i, c_i) in the training set G is initialized to 1/n;
S22: use the BP (back-propagation) neural network as the weak classifier, and call WeakLearn for T rounds of iterative training, where each round of training yields one weak classifier function;
S23: before each round, judge whether the number of iterations ≥ T holds; if it holds, combine the T weak classifier functions to obtain the strong classifier; if not, adjust the weights and repeat step S22. Since the iterative training and the weight-adjustment procedure of the AdaBoost method are mature techniques, they are not described in detail here.
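The following Python sketch illustrates the AdaBoost loop of steps S20-S23 with a small multilayer perceptron standing in for the BP weak classifier (illustrative only; the use of scikit-learn's MLPClassifier, the hidden-layer size, T = 10 rounds, and the resampling of each weak classifier's training set according to the weights are assumptions, not values fixed by the patent).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def adaboost_bp(Y, c, T=10, seed=0):
    """Resampling AdaBoost with an MLP weak learner.
    Y: feature matrix of shape (m+n, d) (rows = samples y_i),
    c: labels in {+1, -1}. Returns (classifiers, alphas)."""
    rng = np.random.default_rng(seed)
    n_samples = len(c)
    D = np.full(n_samples, 1.0 / n_samples)   # sample weights (selection probabilities)
    classifiers, alphas = [], []
    for t in range(T):
        # Draw the weak classifier's training set according to the weights D.
        idx = rng.choice(n_samples, size=n_samples, replace=True, p=D)
        weak = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                             random_state=seed + t)
        weak.fit(Y[idx], c[idx])
        pred = weak.predict(Y)
        # Weighted error of this weak classifier on the full training set.
        err = np.clip(np.sum(D * (pred != c)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Correctly classified samples get smaller weights, misclassified ones larger.
        D *= np.exp(-alpha * c * pred)
        D /= D.sum()
        classifiers.append(weak)
        alphas.append(alpha)
    return classifiers, alphas

def strong_predict(classifiers, alphas, Y):
    """Weighted vote of the T weak classifiers."""
    score = sum(a * clf.predict(Y) for a, clf in zip(alphas, classifiers))
    return np.sign(score)
```

Because MLPClassifier does not accept per-sample weights, the sketch realizes the weights exactly as the description reads them: as selection probabilities used to resample each weak classifier's training set.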
The KDD99 data set, a test data set established by the MIT Lincoln Laboratory in 1998, was selected for the experiment. Each data record contains 41 attribute values, which fall into four groups: basic attributes of the connection, content attributes of the connection, time-based traffic attributes, and host-based traffic attributes. The experimental data consist of a training set and a test set.
A Fast-ICA feature-extraction step is introduced before the network data are classified: the Fast-ICA algorithm is first applied to the data, which eliminates the redundant attributes and markedly reduces the computational cost of classifier training and detection; independent component analysis is used to find a new feature space in which the attributes of the samples are mutually independent. In the experiment the training data set contains 4000 records and the test data set contains 800 records.
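Putting the pieces together, the experimental flow described here could look like the following sketch (illustrative only; load_kdd99_subset is a hypothetical placeholder for whatever preprocessing turns the 4000 training and 800 test records into numeric arrays with labels in {+1, −1}, the number of independent components is an arbitrary choice, and center_and_whiten, fast_ica, adaboost_bp and strong_predict are the functions sketched earlier).

```python
import numpy as np

# X_train: (4000, 41) numeric attribute matrix, c_train in {+1, -1}; likewise the
# 800 test records. load_kdd99_subset() is a hypothetical placeholder, not a real API.
X_train, c_train, X_test, c_test = load_kdd99_subset()

# Step 1: Fast-ICA feature extraction (rows = variables, so transpose).
Z_train, V = center_and_whiten(X_train.T)
W, Y_train = fast_ica(Z_train, n_components=10)      # 10 components is an assumption
Y_test = W @ (V @ (X_test.T - X_train.T.mean(axis=1, keepdims=True)))

# Step 2: AdaBoost with BP-style weak learners on the extracted features.
clfs, alphas = adaboost_bp(Y_train.T, c_train, T=10)

# Step 3: detect anomalies in the test records with the strong classifier.
pred = strong_predict(clfs, alphas, Y_test.T)
print("strong classifier error rate:", np.mean(pred != c_test))
```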
Simulation platform: the simulation was programmed in MATLAB 7.6; the test results are shown in Fig. 4:
Strong classifier classification error rate
ans = 0.0063
Weak classifier classification error rate
ans = 0.0142
Experimental analysis:
The experiment measures the performance of the intrusion detection system by the detection rate (DR) and the false positive rate (FPR), defined as follows:
Detection rate (DR) = number of detected intrusion samples / total number of intrusion samples
False positive rate (FPR) = number of normal samples mistaken for intrusions / total number of normal samples
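As a sanity check of these two definitions, the short sketch below computes DR and FPR from predicted and true labels (illustrative only; labelling intrusion samples +1 and normal samples −1 is an assumed convention for this sketch).

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """DR = detected intrusions / total intrusions,
    FPR = normal samples flagged as intrusion / total normal samples.
    Intrusion samples are labelled +1, normal samples -1."""
    intrusions = (y_true == 1)
    normals = (y_true == -1)
    dr = np.sum((y_pred == 1) & intrusions) / np.sum(intrusions)
    fpr = np.sum((y_pred == 1) & normals) / np.sum(normals)
    return dr, fpr

# Example: 3 intrusion samples (2 detected), 5 normal samples (1 false alarm).
y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1])
y_pred = np.array([1, 1, -1, -1, 1, -1, -1, -1])
print(detection_metrics(y_true, y_pred))  # (0.666..., 0.2)
```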
In the experiment, the training data set is first used to train the system and build an intrusion detection rule base; after training is complete, the test data set is used to test the system.
From the experimental data it can be seen that the intrusion detection method proposed here, using the Fast-ICA dimension-reduction preprocessing, achieves a relatively high detection rate together with a low false positive rate.
Table 1 Classification error statistics
Table 2 Detection statistics
In the present invention, the Fast-ICA algorithm is used to extract features from the data as a preprocessing step, eliminating the redundant attributes in the data and markedly reducing the computational cost of classifier training and detection. On the classifier side, BP networks are used as weak classifiers and combined into an AdaBoost strong classifier, which is trained with the 4000 training samples in the test. As the tables above show, the AdaBoost strong classifier built on the Fast-ICA-preprocessed data achieves a higher detection rate: its classification error rate is lower than the weak classifier's error rate, and its detection rate is higher than the weak classifier's detection rate.
The present invention is not limited to the aforesaid embodiment. The present invention extends to any new feature, or any new combination of features, disclosed in this specification, and to any new method or process step, or any new combination thereof, disclosed herein.

Claims (4)

1. An anomaly detection method based on data mining, characterized in that it consists of the following steps:
S1: take the network data as observational variables, use the Fast-ICA method to extract the observational-variable features from said observational variables, and form the observational-variable feature set Z, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced;
S2: train on the observational-variable features with the AdaBoost method: the observational-variable feature set is the training set, and each observational-variable feature serves as a training sample; each training sample is given a weight that represents the probability of the sample being selected into the training set of a weak classifier; after each weak classifier finishes training, the weights are adjusted according to the classification results on the training set: if a training sample is classified correctly by the weak classifier, its weight is reduced, so the probability of it being selected into the training set of the next weak classifier decreases; if a training sample is classified incorrectly, the probability of it being selected into the training set of the next weak classifier is increased; a strong classifier is finally obtained;
S3: detect abnormal network data with said strong classifier.
2. The anomaly detection method based on data mining according to claim 1, characterized in that said step S1 consists of the following steps:
S10: take N observational variables x_1, x_2, …, x_N, which form the observational-variable set; each observational variable is expressed as a linear combination of M independent components s_1, s_2, …, s_M, which form the independent-component set, where i = 1, …, N, j = 1, …, M, and N and M are integers greater than 1; form the vector X = (x_1, x_2, …, x_N)^T of the observational-variable set and the vector S = (s_1, s_2, …, s_M)^T of the independent-component set, and set X = A·S, where A is the unknown N×M mixing matrix;
S11: whiten said observational variables;
S12: take the separation matrix W to be the generalized inverse of the mixing matrix A; adjust said separation matrix W by the stochastic gradient method according to the formula y = W·X, and compute the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced.
3. The anomaly detection method based on data mining according to claim 2, characterized in that adjusting the separation matrix W by the stochastic gradient method in said step S12 consists of the following steps:
(1) Iterate on said separation matrix W row by row according to the fixed-point update formula w_i(k+1) = E{X·g(w_i(k)^T·X)} − E{g′(w_i(k)^T·X)}·w_i(k), where w_i(k) denotes the row vector of the separation matrix W corresponding to the i-th observational variable of the observational-variable set after k iterations, w_i(k+1) denotes that row vector after k+1 iterations, w_i(k)^T denotes the transpose of that row vector, E is the expectation operator, g is the nonlinearity derived from the Gaussian contrast function and g′ is its derivative, and i and k are integers greater than 1;
(2) Judge whether |w_i(k+1) − w_i(k)| ≤ ξ holds; if it holds, stop the iteration, obtain the final separation matrix W(n), and execute step (3); if it does not hold, repeat step (1); here ξ is any number between 0 and 1;
(3) Normalize said final separation matrix W(n) row by row, i.e. w_i(n) = w_i(n) / norm(w_i(n)), where norm(·) denotes the vector norm;
(4) Substitute the final separation matrix W(n) into the formula y = W(n)·X to obtain the optimal estimate y of said vector S, thereby obtaining the network data features with redundant attributes eliminated and the data dimension reduced.
4. The anomaly detection method based on data mining according to claim 1, characterized in that said step S2 consists of the following steps:
S20: set the training set G = {(y_1, c_1), (y_2, c_2), …, (y_{m+n}, c_{m+n})}, where y is the optimal estimate of the vector S, i = 1, …, m+n, and m+n is an integer greater than 1; c_i is the class label, with c_i = +1 marking the minority class and c_i = −1 marking the majority class; the number of minority-class samples is m, the number of majority-class samples is n, and m << n;
S21: initialize said training set: the weight of each sample (y_i, c_i) in the training set G is initialized to 1/n;
S22: use the BP (back-propagation) neural network as the weak classifier, and call WeakLearn for T rounds of iterative training, where each round of training yields one weak classifier function;
S23: before each training round, judge whether the number of iterations ≥ T holds; if it holds, combine the T weak classifier functions to obtain the strong classifier; if not, adjust the weights and repeat step S22.
CN201110283015XA 2011-09-22 2011-09-22 Abnormal detection method based on data mining Expired - Fee Related CN102324007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110283015XA CN102324007B (en) 2011-09-22 2011-09-22 Abnormal detection method based on data mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110283015XA CN102324007B (en) 2011-09-22 2011-09-22 Abnormal detection method based on data mining

Publications (2)

Publication Number Publication Date
CN102324007A true CN102324007A (en) 2012-01-18
CN102324007B CN102324007B (en) 2013-11-27

Family

ID=45451748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110283015XA Expired - Fee Related CN102324007B (en) 2011-09-22 2011-09-22 Abnormal detection method based on data mining

Country Status (1)

Country Link
CN (1) CN102324007B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102879823A (en) * 2012-09-28 2013-01-16 电子科技大学 Method for fusing seismic attributes on basis of fast independent component analysis
CN103536282A (en) * 2013-11-06 2014-01-29 中国人民解放军第三军医大学 Magnetic induction cardiopulmonary activity signal separation method based on Fast-ICA method
CN106950945A (en) * 2017-04-28 2017-07-14 宁波大学 A kind of fault detection method based on dimension changeable type independent component analysis model
CN107231348A (en) * 2017-05-17 2017-10-03 桂林电子科技大学 A kind of network flow abnormal detecting method based on relative entropy theory
CN107615275A (en) * 2015-05-29 2018-01-19 国际商业机器公司 Estimate to excavate the computing resource serviced for service data
CN108319883A (en) * 2017-01-16 2018-07-24 广东精点数据科技股份有限公司 A kind of fingerprint identification technology based on Fast Independent Component Analysis
CN112055007A (en) * 2020-08-28 2020-12-08 东南大学 Software and hardware combined threat situation perception method based on programmable nodes
WO2022037130A1 (en) * 2020-08-21 2022-02-24 杭州安恒信息技术股份有限公司 Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张磊: "Research on an Intrusion Detection System Based on Independent Component Analysis", Master's thesis, Xidian University *
郭红刚 et al.: "Application of the Adaboost Method in Intrusion Detection Technology", Computer Applications *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102879823A (en) * 2012-09-28 2013-01-16 电子科技大学 Method for fusing seismic attributes on basis of fast independent component analysis
CN103536282A (en) * 2013-11-06 2014-01-29 中国人民解放军第三军医大学 Magnetic induction cardiopulmonary activity signal separation method based on Fast-ICA method
CN103536282B (en) * 2013-11-06 2015-02-04 中国人民解放军第三军医大学 Magnetic induction cardiopulmonary activity signal separation method based on Fast-ICA method
CN107615275B (en) * 2015-05-29 2022-02-11 国际商业机器公司 Method and system for estimating computing resources for running data mining services
US11138193B2 (en) 2015-05-29 2021-10-05 International Business Machines Corporation Estimating the cost of data-mining services
CN107615275A (en) * 2015-05-29 2018-01-19 国际商业机器公司 Estimate to excavate the computing resource serviced for service data
CN108319883B (en) * 2017-01-16 2020-11-06 广东精点数据科技股份有限公司 Fingerprint identification method based on rapid independent component analysis
CN108319883A (en) * 2017-01-16 2018-07-24 广东精点数据科技股份有限公司 A kind of fingerprint identification technology based on Fast Independent Component Analysis
CN106950945B (en) * 2017-04-28 2019-04-09 宁波大学 A kind of fault detection method based on dimension changeable type independent component analysis model
CN106950945A (en) * 2017-04-28 2017-07-14 宁波大学 A kind of fault detection method based on dimension changeable type independent component analysis model
CN107231348B (en) * 2017-05-17 2020-07-28 桂林电子科技大学 Network flow abnormity detection method based on relative entropy theory
CN107231348A (en) * 2017-05-17 2017-10-03 桂林电子科技大学 A kind of network flow abnormal detecting method based on relative entropy theory
WO2022037130A1 (en) * 2020-08-21 2022-02-24 杭州安恒信息技术股份有限公司 Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium
CN112055007A (en) * 2020-08-28 2020-12-08 东南大学 Software and hardware combined threat situation perception method based on programmable nodes
CN112055007B (en) * 2020-08-28 2022-11-15 东南大学 Programmable node-based software and hardware combined threat situation awareness method

Also Published As

Publication number Publication date
CN102324007B (en) 2013-11-27

Similar Documents

Publication Publication Date Title
CN102324007B (en) Abnormal detection method based on data mining
Roffo et al. Infinite latent feature selection: A probabilistic latent graph-based ranking approach
CN104598813B (en) Computer intrusion detection method based on integrated study and semi-supervised SVM
Liang et al. Failure prediction in ibm bluegene/l event logs
CN104169909B (en) Context resolution device and context resolution method
CN110381079B (en) Method for detecting network log abnormity by combining GRU and SVDD
CN109582003A (en) Based on pseudo label semi-supervised kernel part Fei Sheer discriminant analysis bearing failure diagnosis
CN109729091A (en) A kind of LDoS attack detection method based on multiple features fusion and CNN algorithm
CN102291392A (en) Hybrid intrusion detection method based on bagging algorithm
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN108647707B (en) Probabilistic neural network creation method, failure diagnosis method and apparatus, and storage medium
Shah et al. Virus detection using artificial neural networks
CN106792883A (en) Sensor network abnormal deviation data examination method and system
CN113918367A (en) Large-scale system log anomaly detection method based on attention mechanism
CN107679069A (en) Method is found based on a kind of special group of news data and related commentary information
CN111598179A (en) Power monitoring system user abnormal behavior analysis method, storage medium and equipment
CN111126820A (en) Electricity stealing prevention method and system
Guo et al. Fault diagnosis for power system transmission line based on PCA and SVMs
CN110334510A (en) A kind of malicious file detection technique based on random forests algorithm
Yongli et al. An improved feature selection algorithm based on MAHALANOBIS distance for network intrusion detection
Egri et al. Cross-correlation based clustering and dimension reduction of multivariate time series
CN109918901A (en) The method that real-time detection is attacked based on Cache
Yan et al. A new method of transductive svm-based network intrusion detection
Ismaili et al. A supervised methodology to measure the variables contribution to a clustering
CN112465397A (en) Audit data analysis method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131127

Termination date: 20190922

CF01 Termination of patent right due to non-payment of annual fee