CN106951778A

CN106951778A - An intrusion detection method for complex stream data event analysis

Info

Publication number: CN106951778A
Application number: CN201710146332.4A
Authority: CN
Inventors: 杨秋伟; 谭奕; 孙铁峰
Original assignee: BBK Electronics Co Ltd
Current assignee: BBK Electronics Co Ltd
Priority date: 2017-03-13
Filing date: 2017-03-13
Publication date: 2017-07-14

Abstract

The invention discloses an intrusion detection method for complex stream data event analysis. The implementation steps are as follows. Samples are collected in advance and labeled with intrusion detection results to obtain a training sample data set; the training sample data set is preprocessed and then used to train a classifier. The data preprocessing comprises feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction, or as sample reduction followed by feature dimensionality reduction. The current data set to be detected undergoes the same data preprocessing and is then input to the trained classifier, which performs classification and outputs the intrusion detection result. The invention effectively removes the correlation between features and greatly simplifies computation. It jointly addresses redundancy in the number of samples and correlation among sample features, removing data redundancy through both feature dimensionality reduction and sample reduction. Provided that detection accuracy remains essentially unchanged before and after sample compression, training time and detection time are substantially shortened.

Description

An intrusion detection method for complex stream data event analysis

Technical field

The present invention relates to intrusion detection technology in the field of computer network security, and in particular to an intrusion detection method for complex stream data event analysis.

Background art

Intrusion detection is a network security technology that has emerged over the past two decades to actively protect systems against hacker attacks. Faced with diverse network threats, detecting intrusions into a network correctly and promptly, and responding appropriately to reduce the losses caused by attacks, is a current focus of network security research. At present, detection models are mainly built with machine learning methods. The usual approach is to extract features from intrusion data or normal access data, build a feature database, perform pattern matching, and thereby complete intrusion detection. Commonly used machine learning methods include Bayesian classification, k-nearest neighbors (KNN), genetic algorithms (GA), decision trees, artificial neural networks (ANN) and support vector machines (SVM). Because SVM classifies high-dimensional, small-sample data well, it is often used to train intrusion detection models. However, machine-learning-based intrusion detection must contend with strongly correlated data, many repeated samples and long detection times in intrusion detection sample data. Data redundancy manifests mainly in two ways: first, the data samples have high dimensionality; second, there are many redundant samples. Common methods for removing data redundancy include feature selection and fuzzy clustering. They simplify computation well, but do not adequately solve problems such as slow convergence and low detection accuracy.

Principal component analysis (PCA) is a statistical analysis technique proposed by K. Pearson more than a century ago. Its basic idea is to use a linear transformation to convert multiple features into a small number of mutually orthogonal new features ordered from most to least important. Intrusion detection data often involve many feature variables. Although each feature provides some information, the features are correlated to some degree, so PCA can be used to remove redundant features from the samples. Compressed sensing (compressive sensing, CS) is an algorithm from the image processing field, applied to image compression and reconstruction. It compresses a signal at a sampling rate far below the Nyquist rate, can recover and reconstruct the original information with a suitable optimization algorithm, and converges quickly. A compressed sensing algorithm can therefore be used to sample the data set and remove redundant samples, achieving sample reduction. How to realize real-time, efficient intrusion detection based on PCA and compressed sensing has become a key technical problem in urgent need of a solution.

Summary of the invention

Technical problem to be solved by the present invention: in view of the above problems of the prior art, to provide an intrusion detection method for complex stream data event analysis that effectively removes the correlation between features, greatly simplifies computation, and shortens training time and detection time.

To solve the above technical problem, the present invention adopts the following technical solution:

An intrusion detection method for complex stream data event analysis, whose implementation steps comprise:

1) Collect samples in advance and label the collected samples with intrusion detection results to obtain a training sample data set; train a classifier on the training sample data set after data preprocessing. The data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction, or as sample reduction followed by feature dimensionality reduction;

2) Apply the data preprocessing to the current data set to be detected, input the result to the trained classifier, and obtain the intrusion detection result that the classifier outputs after classification.

Preferably, labeling with intrusion detection results in step 1) specifically means labeling the collected samples as normal samples or attack samples, and the intrusion detection result output in step 2) is either normal or attack.

Preferably, the classifier is specifically an SVM classifier.

Preferably, the feature dimensionality reduction in the data preprocessing specifically refers to PCA feature dimensionality reduction.

Preferably, the detailed steps of the PCA feature dimensionality reduction comprise:

A1) Map the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and obtaining a data set X_nm with n rows and m columns; the data set X_nm contains n × m samples in total, and these samples belong to k different classes (y₁ to y_k), where sample X_i belongs to class y_i;

A2) Traverse the data set X_nm and take out one feature as the current feature X_i;

A3) Compute the covariance matrix Cov_x' of the current feature X_i;

A4) Compute the eigenvalues λ₁,λ₂,…,λ_m of the covariance matrix Cov_x' and the normalized eigenvectors a₁,a₂,…,a_m corresponding to the eigenvalues λ₁,λ₂,…,λ_m;

A5) Sort the eigenvalues λ₁,λ₂,…,λ_m in descending order and compute the variance contribution ratio φ(k);

A6) Judge whether the variance contribution ratio φ(k) is less than a preset variance contribution ratio threshold; if it is less than the threshold, jump to step A2); otherwise proceed to the next step;

A7) Form a principal component matrix P_mk with m rows and k columns from the normalized eigenvectors a₁,a₂,…,a_m corresponding to the eigenvalues λ₁,λ₂,…,λ_m, and compute the matrix Z_nk with n rows and k columns from P_mk according to formula (1);

Z_nk=X_nm×P_mk (1)

In formula (1), Z_nk denotes the computed matrix with n rows and k columns, X_nm denotes the data set with n rows and m columns, and P_mk denotes the principal component matrix with m rows and k columns;

A8) Output the matrix Z_nk with n rows and k columns as the result of the PCA feature dimensionality reduction.

Preferably, the variance contribution ratio φ(k) in step A5) is computed as shown in formula (2):

φ(k) = Σ_{i=1}^{k} cov_{ii} / Σ_{i=1}^{m} cov_{ii} = Σ_{i=1}^{k} λ_i / Σ_{i=1}^{m} λ_i (2)

In formula (2), cov_ii is the i-th diagonal entry of the covariance matrix Cov_x', λ_i is the i-th eigenvalue of the covariance matrix Cov_x', m is the number of columns of the data set X, and k is the number of columns of the principal component matrix P_mk.
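The choice of k in steps A5) to A7) and the projection of formula (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `variance_contribution`, `select_k` and `project` are hypothetical, the eigenvalues are assumed to have been computed already (step A4), and the sketch simply returns the smallest k whose φ(k) reaches the threshold rather than reproducing the jump back to step A2).

```python
def variance_contribution(eigenvalues, k):
    """phi(k): share of the total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def select_k(eigenvalues, threshold=0.90):
    """Smallest k whose variance contribution ratio reaches the threshold (assumed rule)."""
    for k in range(1, len(eigenvalues) + 1):
        if variance_contribution(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)


def project(X, P):
    """Formula (1): Z_nk = X_nm x P_mk, with matrices stored as lists of rows."""
    columns_of_P = list(zip(*P))
    return [[sum(x * p for x, p in zip(row, col)) for col in columns_of_P]
            for row in X]
```

For eigenvalues 5, 3, 1, 1 and a threshold of 0.85, φ(1) = 0.5 and φ(2) = 0.8 fall short while φ(3) = 0.9 passes, so `select_k` returns 3.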

Preferably, the sample reduction in the data preprocessing specifically refers to compressed sampling of the input data set based on a compressed sensing algorithm, and, when the classifier is trained in step 1), the sampling number N of the compressed sampling is selected by observing the accuracy of the classifier's detection results.

Preferably, the detailed steps of the compressed sampling of the input data set based on the compressed sensing algorithm comprise:

B1) Build a dictionary D based on a sparse basis;

B2) Continuously sample the input data based on a given sampling number N;

B3) Sparsely represent the sampled data set X', which has n rows and m' columns, on the dictionary D, and construct an observation matrix;

B4) Observe the data set X' with the observation matrix to obtain the compressed matrix X" with n' rows and m' columns;

B5) Compute the classification accuracy corresponding to the data set X' and to the compressed matrix X";

B6) Judge whether the classification accuracy exceeds a preset classification accuracy threshold; if so, jump to step B3); otherwise, jump to step B7);

B7) Output the data matrix X" with n' rows and m' columns as the result of the compressed sampling.

Preferably, the dictionary D based on a sparse basis built in step B1) is specifically a DCT dictionary.

Preferably, the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix.
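The two building blocks named above, a DCT dictionary and an i.i.d. Gaussian observation matrix, can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names, the `seed` parameter and the 1/sqrt(rows) scaling of the Gaussian entries are assumptions that do not come from the patent, and the reconstruction step that would use the dictionary D is omitted. The observation step reduces an n × m' data set to n' × m' by left-multiplying with an n' × n matrix.

```python
import math
import random


def dct_dictionary(n):
    """Orthonormal DCT-II basis as an n x n matrix (each row is one basis vector)."""
    D = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        D.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return D


def gaussian_observation_matrix(rows, cols, seed=0):
    """rows x cols matrix of i.i.d. Gaussian entries (scaling is an assumption)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]


def compress_samples(X, n_compressed, seed=0):
    """Observe the n x m' data set X with an n' x n Gaussian matrix, giving n' x m'."""
    Phi = gaussian_observation_matrix(n_compressed, len(X), seed)
    return matmul(Phi, X)
```

Compressing along the sample axis, as here, keeps the number of feature columns m' unchanged and shrinks only the number of rows, which matches the n' × m' shape stated in step B4).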

The intrusion detection method for complex stream data event analysis of the present invention has the following advantages:

1. Building on the traditional PCA method, the present invention introduces the concept of a dimensionality reduction rate and reduces the dimensionality of the sample features, effectively removing the correlation between features.

2. The present invention introduces the compressed sensing algorithm from the image processing field into intrusion detection. Repeated samples in the original data set are compressively sampled, and the resulting small sample set greatly simplifies computation and shortens detection time. With redundant samples compressed, the detection rate is comparable to that without compression, while training time and detection time are far below those of the traditional uncompressed method.

3. The present invention further introduces a feedback adjustment mechanism. The sample reduction in the data preprocessing specifically refers to compressed sampling of the input data set based on a compressed sensing algorithm; the sampling number N of the compressed sampling is selected by observing the accuracy of the classifier's detection results, yielding the optimal sampling number N and thereby achieving real-time, high-precision detection.

Brief description of the drawings

Fig. 1 is a schematic diagram of the basic flow of the method of Embodiment 1 of the present invention.

Fig. 2 compares the classification accuracy of the SVM, naive Bayes and C4.5 classifiers.

Fig. 3 compares the modeling time of the SVM, naive Bayes and C4.5 classifiers.

Fig. 4 compares the detection time of the SVM, naive Bayes and C4.5 classifiers.

Fig. 5 compares the classification accuracy of Embodiment 1, the traditional uncompressed method and Embodiment 2.

Fig. 6 compares the detection rate of Embodiment 1, the traditional uncompressed method and Embodiment 2.

Fig. 7 compares the false positive rate of Embodiment 1, the traditional uncompressed method and Embodiment 2.

Embodiments

Embodiment 1:

This embodiment uses the well-known open data set KDDCUP99 as the complex stream data to be detected. A total of 98328 samples were selected from it: Normal (56237), DoS (40172), R2L (9), U2R (102) and Probe (1808). To obtain more accurate experimental results, 10 rounds of 10-fold cross-validation are used on the complex stream data. The intrusion detection method of this embodiment is described in further detail below, taking the open data set KDDCUP99 as an example.
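The evaluation protocol described above, 10 rounds of 10-fold cross-validation over 98328 samples, can be sketched as an index generator. This is a hypothetical helper, not part of the patent: the name `repeated_kfold`, the `seed` parameter and the plain (unstratified) shuffling are assumptions, and the text does not say whether the folds were stratified by class.

```python
import random

# Class counts as stated for the selected KDDCUP99 subset.
CLASS_COUNTS = {"Normal": 56237, "DoS": 40172, "R2L": 9, "U2R": 102, "Probe": 1808}


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repetition
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th sample forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx
```

With only 9 R2L samples, an unstratified 10-fold split necessarily leaves at least one fold without any R2L sample, which is one reason a stratified variant might be preferred in practice.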

As shown in Fig. 1, the implementation steps of the intrusion detection method for complex stream data event analysis of this embodiment comprise:

In the data preprocessing of the intrusion detection method of this embodiment, PCA is first used to reduce the dimensionality of the redundant sample features, and compressed sensing is then used to compress the redundant samples; the two steps may also be performed in the opposite order. The method jointly addresses redundancy in the number of samples and correlation among sample features, removing data redundancy along both dimensions. The algorithm greatly shortens detection time provided that detection accuracy remains essentially unchanged before and after sample compression.

In this embodiment, labeling with intrusion detection results in step 1) specifically means labeling the collected samples as normal samples or attack samples, and the intrusion detection result output in step 2) is either normal or attack.

With a suitable classifier, the low-dimensional data obtained by PCA and compressed sampling can be used for classification learning and yields good classification accuracy on the test data set. In this embodiment, the classifier is specifically an SVM classifier. SVM classifies high-dimensional, small-sample data well and can therefore improve the performance of intrusion detection for complex stream data event analysis.

In this embodiment, the feature dimensionality reduction in the data preprocessing specifically refers to PCA feature dimensionality reduction. Its basic principle is as follows. Let X=(X₁,X₂,…,X_m), where X_j (1≤j≤m) is the j-th feature of sample i (1≤i≤n). Through the transformation by the matrix P=(P₁,P₂,…,P_l) (1≤l≤n, P_l is an m-dimensional vector), Z=(Z₁,Z₂,…,Z_k) is formed, where Z is an n × k matrix (k≤m). When the variance contribution ratio of the sample covariance matrix exceeds a certain threshold, the corresponding dimensionality reduction rate is ρ=(m-k)/m, and after the transformation the original samples can be described by a small number of new features Z_j' (1≤j'≤k) while their main components remain unchanged.

In this embodiment, the detailed steps of the PCA feature dimensionality reduction comprise:

A3) Compute the covariance matrix Cov_x' of the current feature X_i;

A6) Judge whether the variance contribution ratio φ(k) is less than the preset variance contribution ratio threshold (90% in this embodiment); if it is less than the threshold, jump to step A2); otherwise proceed to the next step;

Z_nk=X_nm×P_mk (1)

In this embodiment, the variance contribution ratio φ(k) in step A5) is computed as shown in formula (2):

φ(k) = Σ_{i=1}^{k} cov_{ii} / Σ_{i=1}^{m} cov_{ii} = Σ_{i=1}^{k} λ_i / Σ_{i=1}^{m} λ_i (2)

In formula (2), cov_ii is the i-th diagonal entry of the covariance matrix Cov_x', λ_i is the i-th eigenvalue of the covariance matrix Cov_x', m is the number of columns of the data set X, and k is the number of columns of the principal component matrix P_mk.

In this embodiment, step A1) maps the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and yielding the data set X_nm with n rows and m columns; X_nm contains 41-dimensional attributes in total. When the variance contribution ratio φ(k) reaches the preset threshold, the feature dimensionality reduction rate is defined as ρ=(m-k)/m, where m is the number of columns of the data set X and k is the number of columns of the principal component matrix P_mk. In this embodiment, ρ is about 50%. To some extent, the lower the reduced dimensionality, the more the detection time is shortened, but the classification accuracy of the samples is affected. As a compromise, 21 features are ultimately retained here. After PCA feature dimensionality reduction, the matrix Z_nk with n rows and k columns is output as the result; Z_nk contains 21 attribute features in total.
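The normalization of step A1) and the dimensionality reduction rate can be illustrated with a short Python sketch. The helper names are hypothetical, and the way character-type values are turned into numbers (sorted integer codes followed by min-max scaling) is an assumption, since the text only states that such features are mapped into [0,1].

```python
def min_max_scale(values):
    """Rescale one numeric feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def encode_categories(labels):
    """Map character-type labels to integer codes, then rescale to [0, 1] (assumed scheme)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return min_max_scale([float(codes[label]) for label in labels])


def reduction_rate(m, k):
    """Feature dimensionality reduction rate rho = (m - k) / m."""
    return (m - k) / m
```

With m = 41 original attributes and k = 21 retained features, `reduction_rate(41, 21)` gives 20/41, roughly 0.488, which agrees with the stated rate of about 50%.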

In this embodiment, the sample reduction in the data preprocessing specifically refers to compressed sampling of the input data set based on a compressed sensing algorithm, and, when the classifier is trained in step 1), the sampling number N of the compressed sampling is selected by observing the accuracy of the classifier's detection results. The key to compressed-sensing-based sampling is the choice of the sampling number N. In general, the smaller the sampling number N per unit time, the higher the compression ratio and the faster the subsequent training and detection; however, a high compression ratio affects detection precision, so detection speed must be weighed against detection precision. The intrusion detection method of this embodiment therefore introduces a feedback adjustment mechanism: by observing the accuracy of the final detection results, different numbers of observation matrix rows are selected to control the compressed sampling number N, and the optimal sampling number N is obtained experimentally, thereby achieving real-time, high-precision detection.

In this embodiment, the detailed steps of the compressed sampling of the input data set based on the compressed sensing algorithm comprise:

B1) Build a dictionary D based on a sparse basis;

B2) Continuously sample the input data based on a given sampling number N;

B4) Observe the data set X' with the observation matrix to obtain the compressed matrix X" with n' rows and m' columns. In this embodiment, each sampling takes N=600 data samples, corresponding to a 600-row observation matrix, and the degree of compression depends on the number of rows of the sampling matrix. In this embodiment, the observation matrix row counts are 0.6N, 0.65N, 0.7N, ..., N; a row count of N corresponds to the uncompressed sampling mode;
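The feedback adjustment described above can be sketched as a search over the candidate observation matrix row counts 0.6N, 0.65N, ..., N. The sketch below is illustrative only: `candidate_rows`, `select_rows` and the `evaluate_accuracy` callback are hypothetical names, and the rule "take the smallest row count whose accuracy reaches the threshold" is an assumption, since the text only states that the sampling number is chosen by observing detection accuracy.

```python
def candidate_rows(n, start_pct=60, step_pct=5):
    """Row counts 0.6N, 0.65N, ..., N, computed with integer arithmetic."""
    return [n * pct // 100 for pct in range(start_pct, 101, step_pct)]


def select_rows(n, evaluate_accuracy, threshold):
    """Return the smallest candidate row count whose accuracy meets the threshold.

    evaluate_accuracy(rows) is a caller-supplied function that trains and tests
    the classifier on data compressed to `rows` rows and returns its accuracy.
    """
    for rows in candidate_rows(n):
        if evaluate_accuracy(rows) >= threshold:
            return rows
    return n  # fall back to the uncompressed mode
```

For N = 600, `candidate_rows(600)` yields 360, 390, 420, 450, 480, 510, 540, 570 and 600.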

In this embodiment, the dictionary D based on a sparse basis built in step B1) is specifically a DCT (Discrete Cosine Transform) dictionary. For the design of the observation matrix, in addition to requiring incoherence with the sparse basis, Candès and Tao gave the restricted isometry property (RIP) as a necessary and sufficient condition. In this embodiment, the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix, which satisfies both incoherence with the sparse basis and the restricted isometry property (RIP).

To evaluate the performance of the intrusion detection method for complex stream data event analysis of this embodiment, a confusion matrix is used below, and on its basis several detection performance indicators are further introduced: detection rate, classification accuracy, false positive rate, modeling time and detection time.

A confusion matrix divides all instances in a model into different categories according to whether the predicted value matches the actual value. All instances in each category are then counted and the totals are shown in a matrix, as in Table 1:

Table 1: Confusion matrix.

Given the true class of a sample and the detection model's prediction of its class, four outcomes are possible, as shown in Table 1: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). TN and TP correspond to correct predictions by the detection model, that is, samples correctly identified as normal or attack. FP and FN correspond to incorrect predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.

Based on the confusion matrix, each of the indicators mentioned above can be calculated. The calculation formulas are as follows:

In the above formulas, TP denotes true positives, FN denotes false negatives, TN denotes true negatives and FP denotes false positives. FP and FN correspond to incorrect predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.
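As an illustration, the three rate indicators can be computed from the four confusion matrix counts as follows. The expressions used here are the standard textbook definitions of detection rate (recall on the attack class), classification accuracy and false positive rate; they are assumed to match the patent's formulas, which are not reproduced in this text. The function name `detection_metrics` is hypothetical, and the keys DR, TR and FPR follow the abbreviations used later in the description.

```python
def detection_metrics(tp, fn, tn, fp):
    """Standard rates from confusion matrix counts (attack is the positive class)."""
    return {
        "DR": tp / (tp + fn),                    # detection rate: attacks caught
        "TR": (tp + tn) / (tp + fn + tn + fp),   # classification accuracy
        "FPR": fp / (fp + tn),                   # normal samples flagged as attacks
    }
```

For example, with 90 attacks detected, 10 attacks missed, 95 normal samples passed and 5 normal samples flagged, the detection rate is 0.9, the accuracy is 0.925 and the false positive rate is 0.05.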

The experiments on the intrusion detection method for complex stream data event analysis of this embodiment were carried out in Weka. The method preprocesses the data set using feature dimensionality reduction followed by sample reduction, then performs sample detection with three classifiers, SVM, naive Bayes and C4.5, and compares their classification accuracy, modeling time and detection time, as shown in Fig. 2, Fig. 3 and Fig. 4.

As can be seen from Fig. 2, Fig. 3 and Fig. 4, at low sampling numbers N, SVM classifies better than naive Bayes and C4.5, mainly owing to SVM's advantage in handling high-dimensional, small-sample data. As N increases, the classification performance of naive Bayes and C4.5 rises more noticeably, and their final classification accuracy is slightly higher than that of SVM. Meanwhile, the modeling time and detection time of SVM lie between those of C4.5 and naive Bayes (once the sampling number N exceeds 450). It can thus be seen that SVM is the most suitable classifier for the intrusion detection method for complex stream data event analysis of this embodiment.

Embodiment 2:

This embodiment is essentially the same as Embodiment 1. The main difference is as follows. In Embodiment 1, the data preprocessing comprises the two steps of feature dimensionality reduction and sample reduction, executed as feature dimensionality reduction followed by sample reduction. In this embodiment, the data preprocessing likewise comprises the two steps of feature dimensionality reduction and sample reduction, but they are executed as sample reduction followed by feature dimensionality reduction. The details of the two steps are identical to those in Embodiment 1 and are not repeated here.

To further verify the performance of the intrusion detection method for complex stream data event analysis of this embodiment, the experiment compares three approaches, each using SVM as the classifier for classification and detection: the traditional approach without sample compression; feature compression followed by redundant sample compression; and redundant sample compression followed by feature compression. Fig. 5, Fig. 6 and Fig. 7 show the classification accuracy (TR), detection rate (DR) and false positive rate (FPR) of the experimental data set after processing by these three approaches. As can be seen from Fig. 5, Fig. 6 and Fig. 7, the curves of Embodiment 1 (PCA-CS) and this embodiment (CS-PCA) follow roughly the same trend. At low sampling numbers N, classification accuracy and detection rate are both relatively low and the false positive rate is relatively high. As N increases, performance improves markedly; in particular, when N is between 500 and 550, accuracy, detection rate and false positive rate stabilize and reach results equivalent to traditional uncompressed sampling. To further compare the PCA-CS and CS-PCA methods in terms of modeling time and detection time, 10 rounds of 10-fold cross-validation were run and the results averaged; the various times of the two methods (at sampling number N = 550) are reported in Table 2 and Table 3.

Table 2: Time statistics of this embodiment.

Table 3: Time statistics of Embodiment 1.

As can be seen from Table 2 and Table 3, the modeling time and detection time of the method of Embodiment 1 are roughly equal to those of the method of this embodiment, with Embodiment 1 performing slightly better. This demonstrates that the data compression method based on PCA and compressed sensing proposed by the intrusion detection method of this embodiment is feasible and stable.

Table 4 compares the average modeling time and average detection time of the SVM-based classifiers for Embodiment 1, this embodiment and the traditional uncompressed method.

Table 4: Comparison of modeling time and detection time.

Method	Average modeling time (s)	Average detection time (s)
Traditional uncompressed method	25.5	3.31
Method of this embodiment	14.32	1.64
Method of Embodiment 1	14.09	1.40
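The relative time savings implied by Table 4 can be worked out with a few lines of Python. The figures are taken directly from the table; the helper name `percent_saved` and the dictionary labels are illustrative, with this embodiment identified as CS-PCA and Embodiment 1 as PCA-CS as in the description.

```python
# (average modeling time in s, average detection time in s) from Table 4
TIMES = {
    "traditional uncompressed": (25.5, 3.31),
    "this embodiment (CS-PCA)": (14.32, 1.64),
    "Embodiment 1 (PCA-CS)": (14.09, 1.40),
}


def percent_saved(baseline, value):
    """Percentage reduction of `value` relative to `baseline`, to one decimal place."""
    return round(100.0 * (baseline - value) / baseline, 1)
```

Relative to the uncompressed baseline, this embodiment saves about 43.8% of the modeling time and 50.5% of the detection time, and Embodiment 1 saves about 44.7% and 57.7% respectively.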

As can be seen from Fig. 5, Fig. 6, Fig. 7 and Tables 2 to 4, the classification accuracy and detection rate after compression are slightly lower than before compression (by 1% to 3%). Intrusion detection using the combination of PCA and compressed sensing greatly simplifies computation and effectively reduces training and detection time, demonstrating that the intrusion detection method based on PCA and compressed sensing proposed in this embodiment is feasible and effective.

The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. An intrusion detection method for complex stream data event analysis, characterized in that the implementation steps comprise:

1) collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set, and training a classifier on the training sample data set after data preprocessing, wherein the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction, or as sample reduction followed by feature dimensionality reduction;

2) applying the data preprocessing to the current data set to be detected, inputting the result to the trained classifier, and obtaining the intrusion detection result that the classifier outputs after classification.

2. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that labeling with intrusion detection results in step 1) specifically means labeling the collected samples as normal samples or attack samples, and the intrusion detection result output in step 2) is either normal or attack.

3. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the classifier is specifically an SVM classifier.

4. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the feature dimensionality reduction in the data preprocessing specifically refers to PCA feature dimensionality reduction.

5. The intrusion detection method for complex stream data event analysis according to claim 4, characterized in that the detailed steps of the PCA feature dimensionality reduction comprise:

A1) mapping the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and obtaining a data set X_nm with n rows and m columns, wherein the data set X_nm contains n × m samples in total, and these samples belong to k different classes (y₁ to y_k), where sample X_i belongs to class y_i;

A3) computing the covariance matrix Cov_x' of the current feature X_i;

A4) computing the eigenvalues λ₁,λ₂,…,λ_m of the covariance matrix Cov_x' and the normalized eigenvectors a₁,a₂,…,a_m corresponding to the eigenvalues λ₁,λ₂,…,λ_m;

A6) judging whether the variance contribution ratio φ(k) is less than a preset variance contribution ratio threshold; if it is less than the threshold, jumping to step A2); otherwise proceeding to the next step;

A7) forming a principal component matrix P_mk with m rows and k columns from the normalized eigenvectors a₁,a₂,…,a_m corresponding to the eigenvalues λ₁,λ₂,…,λ_m, and computing the matrix Z_nk with n rows and k columns from P_mk according to formula (1);

Z_nk=X_nm×P_mk (1)

In formula (1), Z_nk denotes the computed matrix with n rows and k columns, X_nm denotes the data set with n rows and m columns, and P_mk denotes the principal component matrix with m rows and k columns;

6. The intrusion detection method for complex stream data event analysis according to claim 5, characterized in that the variance contribution ratio φ(k) in step A5) is computed as shown in formula (2):

φ(k) = Σ_{i=1}^{k} cov_{ii} / Σ_{i=1}^{m} cov_{ii} = Σ_{i=1}^{k} λ_i / Σ_{i=1}^{m} λ_i (2)

In formula (2), cov_ii is the i-th diagonal entry of the covariance matrix Cov_x', λ_i is the i-th eigenvalue of the covariance matrix Cov_x', m is the number of columns of the data set X, and k is the number of columns of the principal component matrix P_mk.

7. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the sample reduction in the data preprocessing specifically refers to compressed sampling of the input data set based on a compressed sensing algorithm, and, when the classifier is trained in step 1), the sampling number N of the compressed sampling is selected by observing the accuracy of the classifier's detection results.

8. The intrusion detection method for complex stream data event analysis according to claim 7, characterized in that the detailed steps of the compressed sampling of the input data set based on the compressed sensing algorithm comprise:

B1) building a dictionary D based on a sparse basis;

B2) continuously sampling the input data based on a given sampling number N;

B6) judging whether the classification accuracy exceeds a preset classification accuracy threshold; if so, jumping to step B3); otherwise, jumping to step B7);

9. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the dictionary D based on a sparse basis built in step B1) is specifically a DCT dictionary.

10. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix.