CN106951778A - An intrusion detection method for complex stream data event analysis - Google Patents
An intrusion detection method for complex stream data event analysis
- Publication number
- CN106951778A (CN201710146332.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- matrix
- intrusion detection
- row
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intrusion detection method for complex stream data event analysis. The method comprises: collecting samples in advance and labeling them with intrusion detection results to obtain a training sample data set; preprocessing the training sample data set and using it to train a classifier, where the data preprocessing comprises feature dimensionality reduction and sample reduction, performed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction; and applying the same data preprocessing to the current data set to be detected, feeding it to the trained classifier, and outputting the intrusion detection result. The invention effectively removes the correlation between features and greatly simplifies computation. It addresses both sample redundancy and feature correlation by removing data redundancy through feature dimensionality reduction and sample reduction respectively, and it substantially shortens training time and detection time while keeping detection accuracy essentially unchanged before and after sample compression.
Description
Technical field
The present invention relates to intrusion detection technology in the field of computer network security, and in particular to an intrusion detection method for complex stream data event analysis.
Background
Intrusion detection is a network security technology that has emerged over the past 20 years to actively protect systems against hacker attacks. Faced with a wide range of network threats, detecting intrusions promptly and correctly and responding appropriately to reduce the losses caused by network attacks is a current focus of network security research. Detection models are now mainly built with machine learning: features are extracted from intrusion data or normal access data, a feature database is constructed, and pattern matching is performed to complete intrusion detection. Commonly used machine learning methods include Bayesian classification, k-nearest neighbors (KNN), genetic algorithms (GA), decision trees, artificial neural networks (ANN), and support vector machines (SVM). Because SVM classifies high-dimensional, small-sample data well, it is often used to train intrusion detection models. However, machine-learning-based intrusion detection must cope with strongly correlated data, many repeated samples in the intrusion detection sample data, and long detection times. Data redundancy shows up in two ways: the dimensionality of the data samples is high, and there are many redundant samples. Commonly used methods for removing data redundancy include feature selection and fuzzy clustering. They reduce the amount of computation effectively, but they do not adequately solve problems such as slow convergence and low detection accuracy.
Principal Component Analysis (PCA) is a statistical analysis method proposed by K. Pearson more than a century ago. Its basic idea is to use a linear transformation to convert many features into a small number of mutually orthogonal new features ordered by decreasing importance. Intrusion detection data usually involve many feature variables. Each feature provides some information, but the features are correlated with one another, so PCA can be used to remove redundant sample features. Compressive sensing (CS) is an algorithm from the field of image processing used for image compression and reconstruction. It compresses a signal at a sampling rate far below the Nyquist rate, can restore and reconstruct the original information with a suitable optimization algorithm, and converges quickly. Compressive sensing can therefore be used to sample the data, removing redundant samples and achieving sample reduction. How to achieve real-time, efficient intrusion detection based on PCA and compressive sensing has become a key technical problem that urgently needs to be solved.
Summary of the invention
The technical problem to be solved by the invention: in view of the above problems in the prior art, to provide an intrusion detection method for complex stream data event analysis that effectively removes the correlation between features, greatly simplifies computation, and shortens training time and detection time.
To solve the above technical problem, the invention adopts the following technical solution:
An intrusion detection method for complex stream data event analysis, whose implementation steps include:
1) collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set; preprocessing the training sample data set and then training the classifier with it, where the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction;
2) applying the data preprocessing to the current data set to be detected, feeding the result to the trained classifier, and obtaining the intrusion detection result that the classifier outputs after classification.
Preferably, labeling with intrusion detection results in step 1) specifically means labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
Preferably, the classifier is specifically an SVM classifier.
Preferably, the feature dimensionality reduction in the data preprocessing specifically means PCA feature dimensionality reduction.
Preferably, the detailed steps of the PCA feature dimensionality reduction include:
A1) mapping the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and obtaining a data set Xnm with n rows and m columns; data set Xnm contains n × m samples in total, and the n × m samples belong to k different classes y1~yk, where sample Xi belongs to class yi;
A2) traversing data set Xnm and taking out one feature as the current feature Xi;
A3) computing the covariance matrix Covx' of the current feature Xi;
A4) computing the eigenvalues λ1,λ2,…,λm of covariance matrix Covx' and the normalized eigenvectors a1,a2,…,am corresponding to eigenvalues λ1,λ2,…,λm;
A5) sorting the eigenvalues λ1,λ2,…,λm in descending order and computing the variance contribution ratio;
A6) judging whether the variance contribution ratio is less than a preset variance contribution ratio threshold; if it is, jumping back to step A2); otherwise proceeding to the next step;
A7) forming a principal component matrix Pmk with m rows and k columns from the normalized eigenvectors a1,a2,…,am corresponding to eigenvalues λ1,λ2,…,λm, and computing the matrix Znk with n rows and k columns from Pmk according to formula (1);
Znk=Xnm×Pmk (1)
In formula (1), Znk denotes the computed matrix with n rows and k columns, Xnm denotes the data set with n rows and m columns, and Pmk denotes the principal component matrix with m rows and k columns;
A8) outputting the matrix Znk with n rows and k columns as the result of the PCA feature dimensionality reduction.
Preferably, the function expression for the variance contribution ratio computed in step A5) is shown in formula (2);
In formula (2), covii is a diagonal element of covariance matrix Covx', λi denotes the i-th eigenvalue of covariance matrix Covx', m denotes the number of columns of data set X, and k denotes the number of columns of principal component matrix Pmk.
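The selection of k in steps A5) and A6) can be sketched as follows. This is a minimal illustration, assuming formula (2) is the usual cumulative ratio of the k largest eigenvalues to the sum of the diagonal elements covii (the trace, which equals the sum of all eigenvalues). The function name `choose_num_components` is hypothetical, and the default threshold of 0.9 mirrors the 90% value given in the embodiment.

```python
def choose_num_components(eigenvalues, threshold=0.9):
    """Return (k, ratio): the smallest k whose cumulative variance
    contribution ratio reaches `threshold`.

    Assumption: the ratio is sum(lambda_1..lambda_k) / sum(cov_ii),
    where sum(cov_ii) equals the sum of all eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)  # step A5: descending order
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        ratio = running / total
        if ratio >= threshold:  # step A6: stop once the threshold is met
            return k, ratio
    return len(ordered), 1.0


# Toy eigenvalues, chosen only to make the arithmetic easy to follow.
k, ratio = choose_num_components([1.0, 4.0, 2.0, 3.0], threshold=0.85)
```

With these toy eigenvalues the cumulative ratios are 0.4, 0.7, 0.9, and 1.0, so three components are kept.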
Preferably, the sample reduction in the data preprocessing specifically means compressively sampling the input data set based on a compressive sensing algorithm, and when the classifier is trained in step 1), the sampling number N of the compressive sampling is selected by observing the accuracy of the classifier's detection results.
Preferably, the detailed steps of compressively sampling the input data set based on the compressive sensing algorithm include:
B1) building a dictionary D based on a sparse basis;
B2) continuously sampling the input data based on the given sampling number N;
B3) sparsely representing the sampled data set X' with n rows and m' columns on dictionary D, and constructing an observation matrix;
B4) observing data set X' with the observation matrix to obtain the compressed matrix X" with n' rows and m' columns;
B5) computing the classification accuracy corresponding to data set X' and to the compressed matrix X";
B6) judging whether the classification accuracy exceeds a preset classification accuracy threshold; if it does, jumping back to step B3); otherwise jumping to step B7);
B7) outputting the data matrix X" with n' rows and m' columns as the result of the compressive sampling.
Preferably, the dictionary D based on a sparse basis built in step B1) is specifically a DCT dictionary.
Preferably, the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix.
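The two building blocks named above, a DCT dictionary and an i.i.d. Gaussian observation matrix, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patent's implementation: the orthonormal DCT-II form, the N(0, 1/n') entry variance, the toy data, and the helper names `dct_dictionary`, `gaussian_observation_matrix`, and `matmul` are all assumptions. The observation step multiplies an n' × n matrix by the n × m' data set, which reduces the number of rows (samples) from n to n'.

```python
import math
import random


def dct_dictionary(n):
    """n x n orthonormal DCT-II basis; row k is the k-th cosine atom."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows


def gaussian_observation_matrix(n_out, n_in, seed=0):
    """n_out x n_in matrix with i.i.d. N(0, 1/n_out) entries (assumed scaling)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, sigma) for _ in range(n_in)] for _ in range(n_out)]


def matmul(a, b):
    """Plain-Python product of a (p x q) and b (q x r)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Toy data set X' with n = 8 samples (rows) and m' = 3 features (columns).
x_prime = [[float((i * j) % 5) for j in range(3)] for i in range(8)]
phi = gaussian_observation_matrix(5, 8)   # n' = 5 observation rows
x_compressed = matmul(phi, x_prime)       # X'' has n' = 5 rows, m' = 3 columns
D = dct_dictionary(8)                     # sparse basis for length-8 columns
```

The row count of `phi` is the knob that controls the degree of compression; the column count must equal the number of sampled rows in X'.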
The intrusion detection method for complex stream data event analysis of the present invention has the following advantages:
1. Building on the traditional PCA method, the invention introduces the concept of a dimensionality reduction rate and reduces the dimensionality of the sample features, effectively removing the correlation between features.
2. The invention introduces the compressive sensing algorithm from image processing into intrusion detection and compressively samples the repeated samples in the original data set. The resulting small sample set greatly simplifies computation and shortens detection time. With the redundant samples compressed, the detection rate is comparable to the uncompressed case, while training time and detection time are far lower than with the traditional uncompressed method.
3. The invention further introduces a feedback regulation mechanism. Sample reduction in the data preprocessing compressively samples the input data set based on the compressive sensing algorithm, and the sampling number N of the compressive sampling is selected by observing the accuracy of the classifier's detection results to obtain the optimal sampling number N, thereby achieving real-time, high-precision detection.
Brief description of the drawings
Fig. 1 is a basic flowchart of the method of embodiment one of the invention.
Fig. 2 compares the classification accuracy of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 3 compares the modeling time of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 4 compares the detection time of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 5 compares the classification accuracy of embodiment one, the traditional uncompressed method, and the method of embodiment two.
Fig. 6 compares the detection rate of embodiment one, the traditional uncompressed method, and the method of embodiment two.
Fig. 7 compares the false alarm rate of embodiment one, the traditional uncompressed method, and the method of embodiment two.
Detailed description of the embodiments
Embodiment one:
This embodiment uses the well-known open data set KDDCUP99 as the complex stream data to be detected and selects 98328 samples from it: Normal (56237), DoS (40172), R2L (9), U2R (102), and Probe (1808). To obtain more accurate experimental results, ten rounds of 10-fold cross-validation are used on the complex stream data. The intrusion detection method of this embodiment is described in further detail below, taking the open data set KDDCUP99 as an example.
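The evaluation protocol mentioned above, ten rounds of 10-fold cross-validation, can be sketched as an index generator. This is a simplified, non-stratified illustration in plain Python; it is not the procedure of any particular toolkit, and the name `repeated_kfold` and its parameters are hypothetical.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV.

    Each repeat reshuffles the sample indices, then each of the k folds
    serves once as the test set while the rest form the training set.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for f in range(k):
            test = indices[f::k]          # every k-th shuffled index
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield r, f, train, test


# Small toy run: 25 samples, 5 folds, 2 repeats.
splits = list(repeated_kfold(25, k=5, repeats=2))
```

Averaging a metric over all `repeats * k` splits gives the kind of averaged figure the experiments report.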
As shown in Fig. 1, the implementation steps of the intrusion detection method for complex stream data event analysis of this embodiment include:
1) collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set; preprocessing the training sample data set and then training the classifier with it, where the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction;
2) applying the data preprocessing to the current data set to be detected, feeding the result to the trained classifier, and obtaining the intrusion detection result that the classifier outputs after classification.
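The two permitted preprocessing orders can be expressed as a simple composition of two steps. The sketch below only illustrates the ordering; the stand-in functions `keep_first_two_columns` and `keep_every_other_row` are hypothetical placeholders and are not PCA or compressive sensing, and `make_preprocessor` is an invented name.

```python
def make_preprocessor(reduce_features, reduce_samples, features_first=True):
    """Compose the two preprocessing steps in either permitted order."""
    if features_first:
        steps = (reduce_features, reduce_samples)   # feature reduction, then sample reduction
    else:
        steps = (reduce_samples, reduce_features)   # sample reduction, then feature reduction

    def preprocess(data):
        for step in steps:
            data = step(data)
        return data

    return preprocess


# Hypothetical stand-ins so the sketch runs; they are NOT PCA or compressive sensing.
def keep_first_two_columns(rows):
    return [row[:2] for row in rows]


def keep_every_other_row(rows):
    return rows[::2]


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
features_then_samples = make_preprocessor(keep_first_two_columns, keep_every_other_row, True)
samples_then_features = make_preprocessor(keep_first_two_columns, keep_every_other_row, False)
reduced = features_then_samples(data)
```

Whichever order is chosen for training, the same composed function would be reused on the data set to be detected, so that the classifier sees inputs of the same shape.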
In the data preprocessing, the intrusion detection method of this embodiment first uses PCA to reduce the redundant features of the samples and then uses compressive sensing to compress the redundant samples, or performs the two in the opposite order. It addresses both sample redundancy and feature correlation and removes data redundancy in both respects. The algorithm greatly shortens detection time while keeping detection accuracy essentially unchanged before and after sample compression.
In this embodiment, labeling with intrusion detection results in step 1) specifically means labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
An appropriate classifier can complete classification learning on the low-dimensional data obtained by PCA and compressive sampling and achieve good classification accuracy on the test data set. In this embodiment the classifier is specifically an SVM classifier. SVM classifies high-dimensional, small-sample data well and can improve the performance of intrusion detection for complex stream data event analysis.
In this embodiment, the feature dimensionality reduction in the data preprocessing specifically means PCA feature dimensionality reduction. The basic principle of PCA feature dimensionality reduction is as follows. Let X=(X1,X2,…,Xm), where Xj (1≤j≤m) is the j-th feature of sample i (1≤i≤n). Transforming X by the matrix P=(P1,P2,…,Pl) (1≤l≤n, where Pl is an m-dimensional vector) gives Z=(Z1,Z2,…,Zk), where Z is an n × k matrix (k≤m). The principle is that when the variance contribution ratio of the sample covariance matrix exceeds a certain threshold, the corresponding dimensionality reduction rate is ρ=(m-k)/m, and after the transformation the original samples can be described by a small number of new features Zj' (1≤j'≤k) without changing their principal components.
In this embodiment, the detailed steps of the PCA feature dimensionality reduction include:
A1) mapping the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and obtaining a data set Xnm with n rows and m columns; data set Xnm contains n × m samples in total, and the n × m samples belong to k different classes y1~yk, where sample Xi belongs to class yi;
A2) traversing data set Xnm and taking out one feature as the current feature Xi;
A3) computing the covariance matrix Covx' of the current feature Xi;
A4) computing the eigenvalues λ1,λ2,…,λm of covariance matrix Covx' and the normalized eigenvectors a1,a2,…,am corresponding to eigenvalues λ1,λ2,…,λm;
A5) sorting the eigenvalues λ1,λ2,…,λm in descending order and computing the variance contribution ratio;
A6) judging whether the variance contribution ratio is less than a preset variance contribution ratio threshold (90% in this embodiment); if it is, jumping back to step A2); otherwise proceeding to the next step;
A7) forming a principal component matrix Pmk with m rows and k columns from the normalized eigenvectors a1,a2,…,am corresponding to eigenvalues λ1,λ2,…,λm, and computing the matrix Znk with n rows and k columns from Pmk according to formula (1);
Znk=Xnm×Pmk (1)
In formula (1), Znk denotes the computed matrix with n rows and k columns, Xnm denotes the data set with n rows and m columns, and Pmk denotes the principal component matrix with m rows and k columns;
A8) outputting the matrix Znk with n rows and k columns as the result of the PCA feature dimensionality reduction.
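Steps A1) and A7) can be illustrated with a small plain-Python sketch. The mapping of character-type values to integer codes before min-max scaling is an assumption (the text does not say how the mapping is done), the 3 × 1 matrix `p` is a made-up projection rather than a real principal component matrix, and the names `encode_and_scale` and `project` are hypothetical.

```python
def encode_and_scale(rows):
    """Step A1 sketch: integer-code symbolic columns, then min-max scale
    every column to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):
        if any(isinstance(v, str) for v in col):
            # Assumed encoding: sorted distinct values -> 0, 1, 2, ...
            codes = {v: i for i, v in enumerate(sorted(set(col)))}
            col = [codes[v] for v in col]
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_columns)]


def project(x, p):
    """Formula (1): Z_nk = X_nm x P_mk, written out as nested sums."""
    k = len(p[0])
    return [[sum(value * p[j][c] for j, value in enumerate(row)) for c in range(k)]
            for row in x]


# Toy records: one symbolic feature and two numeric features.
raw = [["tcp", 0, 200], ["udp", 5, 100], ["icmp", 10, 300]]
x = encode_and_scale(raw)          # 3 x 3 data set with values in [0, 1]
p = [[1.0], [0.0], [0.0]]          # hypothetical 3 x 1 projection matrix
z = project(x, p)                  # 3 x 1 reduced data set
```

In a real run the columns of `p` would be the normalized eigenvectors selected in the earlier steps, and `z` would have k columns.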
In this embodiment, the function expression for the variance contribution ratio computed in step A5) is shown in formula (2);
In formula (2), covii is a diagonal element of covariance matrix Covx', λi denotes the i-th eigenvalue of covariance matrix Covx', m denotes the number of columns of data set X, and k denotes the number of columns of principal component matrix Pmk.
In this embodiment, step A1) maps the character-type features in the input data set to numeric features in the range [0,1] and completes the nondimensionalization of the input data set to obtain the data set Xnm with n rows and m columns, which contains 41 attribute dimensions in total. When the variance contribution ratio reaches the preset threshold, the feature dimensionality reduction rate ρ at that point is defined as ρ=(m-k)/m, where m denotes the number of columns of data set X and k denotes the number of columns of principal component matrix Pmk. In this embodiment the feature dimensionality reduction rate ρ is about 50%. To some extent, the lower the dimensionality is reduced, the more detection time is saved, but the accuracy of sample classification is affected. As a compromise, 21 features are finally retained here. After PCA feature dimensionality reduction, the matrix Znk with n rows and k columns is output as the result, and Znk contains 21 attribute dimensions in total.
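As a worked check of the figures above, substituting m = 41 original attributes and k = 21 retained features into the definition of the feature dimensionality reduction rate gives a value just under one half, which matches the stated "about 50%":

```latex
\rho = \frac{m-k}{m} = \frac{41-21}{41} = \frac{20}{41} \approx 0.488
```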
In this embodiment, the sample reduction in the data preprocessing specifically means compressively sampling the input data set based on a compressive sensing algorithm, and when the classifier is trained in step 1), the sampling number N of the compressive sampling is selected by observing the accuracy of the classifier's detection results. The key to sampling based on compressive sensing is the choice of the sampling number N. In general, the smaller the sampling number N per unit time, the higher the compression ratio and the faster the subsequent training and detection. A higher compression ratio, however, degrades detection precision, so detection speed must be balanced against detection precision. The intrusion detection method of this embodiment therefore introduces a feedback regulation mechanism: by observing the accuracy of the final detection results, different numbers of observation matrix rows are selected to control the compressive sampling number N, and the optimal sampling number N is determined experimentally, thereby achieving real-time, high-precision detection.
In this embodiment, the detailed steps of compressively sampling the input data set based on the compressive sensing algorithm include:
B1) building a dictionary D based on a sparse basis;
B2) continuously sampling the input data based on the given sampling number N;
B3) sparsely representing the sampled data set X' with n rows and m' columns on dictionary D, and constructing an observation matrix;
B4) observing data set X' with the observation matrix to obtain the compressed matrix X" with n' rows and m' columns; in this embodiment, the number of data samples taken each time is the sampling number N=600, the corresponding observation matrix has 600 columns, and the degree of compression depends on the number of rows of the observation matrix; the numbers of observation matrix rows selected in this embodiment are 0.6N, 0.65N, 0.7N, …, N, where N rows corresponds to the uncompressed sampling mode;
B5) computing the classification accuracy corresponding to data set X' and to the compressed matrix X";
B6) judging whether the classification accuracy exceeds a preset classification accuracy threshold; if it does, jumping back to step B3); otherwise jumping to step B7);
B7) outputting the data matrix X" with n' rows and m' columns as the result of the compressive sampling.
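The feedback selection of the observation matrix row count can be sketched as a search over the candidates 0.6N, 0.65N, …, N. This is one possible reading of the feedback loop, not the patent's exact control flow: it picks the smallest candidate whose accuracy meets a threshold. The function names are hypothetical, and the accuracy function below is a made-up stand-in used only so the sketch runs; it does not represent experimental results.

```python
def candidate_row_counts(n, start_pct=60, step_pct=5):
    """Row counts 0.6N, 0.65N, ..., N, computed with integer percentages."""
    return [n * pct // 100 for pct in range(start_pct, 101, step_pct)]


def select_row_count(n, accuracy_of, threshold):
    """Return the smallest candidate row count whose accuracy reaches
    `threshold`; fall back to n (no compression) if none does."""
    for rows in candidate_row_counts(n):
        if accuracy_of(rows) >= threshold:
            return rows
    return n


# Hypothetical accuracy stand-in: NOT measured data, only a monotone toy curve.
def toy_accuracy(rows):
    return rows / 600


candidates = candidate_row_counts(600)
chosen = select_row_count(600, toy_accuracy, threshold=0.78)
```

In practice `accuracy_of` would train and evaluate the classifier on the data compressed with that many observation rows, which is the expensive part of the loop.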
In this embodiment, the dictionary D based on a sparse basis built in step B1) is specifically a DCT (Discrete Cosine Transform) dictionary. For the design of the observation matrix, besides requiring that it be incoherent with the sparse basis, Candès and Tao gave the restricted isometry property (RIP) as a necessary and sufficient condition. In this embodiment, the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix, which satisfies both incoherence with the sparse basis and the RIP condition.
To evaluate the performance of the intrusion detection method of this embodiment, a confusion matrix is used below, and on the basis of the confusion matrix several detection performance metrics are introduced: detection rate, classification accuracy, false alarm rate, modeling time, and detection time.
A confusion matrix divides all instances in the model into different categories according to whether the predicted value matches the actual value. The instances in each category are then counted and the totals are shown in a matrix, as in Table 1:
Table 1: Confusion matrix.
Depending on the true class of a given sample and the class predicted for it by the detection model, four outcomes can occur, as shown in Table 1: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TN and TP correspond to correct predictions by the detection model, that is, the sample is correctly identified as normal or attack. FP and FN correspond to wrong predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.
Based on the confusion matrix, each of the metrics mentioned above can be calculated. The formulas are as follows:
In the above formulas, TP denotes true positives, FN denotes false negatives, TN denotes true negatives, and FP denotes false positives. FP and FN correspond to wrong predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.
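The three rate metrics can be computed from the four confusion matrix counts as sketched below. The definitions used are the conventional ones with attack as the positive class, which is an assumption here: detection rate TP/(TP+FN), classification accuracy (TP+TN)/(TP+TN+FP+FN), and false alarm rate FP/(FP+TN). The function name and the toy counts are hypothetical and are not taken from the experiments.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute detection rate, classification accuracy, and false alarm rate
    from confusion matrix counts (attack = positive class, assumed)."""
    def ratio(numerator, denominator):
        # Guard against empty classes so the sketch never divides by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "detection_rate": ratio(tp, tp + fn),            # attacks caught
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),   # all correct predictions
        "false_alarm_rate": ratio(fp, fp + tn),          # normal flagged as attack
    }


# Toy counts chosen for easy arithmetic; not experimental data.
metrics = detection_metrics(tp=90, tn=80, fp=20, fn=10)
```

With these toy counts, 90 of 100 attacks are detected, 170 of 200 samples are classified correctly, and 20 of 100 normal samples raise a false alarm.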
The experiments for the intrusion detection method of this embodiment were carried out in Weka. The method preprocesses the data set by compressing in the order feature dimensionality reduction first and sample reduction second. Samples are then detected with three classifiers, SVM, naive Bayes, and C4.5, and their classification accuracy, modeling time, and detection time are compared, as shown in Fig. 2, Fig. 3, and Fig. 4.
As Fig. 2, Fig. 3, and Fig. 4 show, at low sampling numbers N, SVM classifies better than naive Bayes and C4.5, mainly because of SVM's advantage in handling high-dimensional, small-sample data. As the sampling number N increases, the classification performance of naive Bayes and C4.5 rises more markedly, and their final classification accuracy is slightly higher than that of SVM. Meanwhile, the modeling time and detection time of SVM lie between those of C4.5 and naive Bayes (once the sampling number N exceeds 450). It can thus be seen that SVM is the most suitable classifier for the intrusion detection method of this embodiment.
Embodiment two:
This embodiment is essentially the same as embodiment one. The main difference is as follows. In embodiment one, the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed as feature dimensionality reduction first and sample reduction second. In this embodiment, the data preprocessing likewise comprises the two steps, but they are executed as sample reduction first and feature dimensionality reduction second. The details of the two steps are the same as in embodiment one and are not repeated here.
To further verify the performance of the intrusion detection method of this embodiment, the experiment compares three approaches, with SVM as the classifier: traditional uncompressed samples, feature compression followed by redundant sample compression, and redundant sample compression followed by feature compression. Fig. 5, Fig. 6, and Fig. 7 show the detection rate (DR), classification accuracy (TR), and false alarm rate (FPR) of the experimental data set after processing by the three approaches. As the figures show, the curves for embodiment one (PCA-CS) and this embodiment (CS-PCA) follow roughly the same trend: at low sampling numbers N, classification accuracy and detection rate are both relatively low and the false alarm rate is relatively high. As the sampling number N increases, performance improves markedly. In particular, when N is between 500 and 550, accuracy, detection rate, and false alarm rate stabilize and match the results of traditional uncompressed sampling. To further compare PCA-CS and CS-PCA in terms of modeling time and detection time, ten rounds of 10-fold cross-validation were run and averaged, and the times of the two methods were recorded (at sampling number N = 550), as shown in Table 2 and Table 3.
Table 2:The time statistics of the present embodiment.
Table 3:The time statistics of embodiment one.
As Table 2 and Table 3 show, the modeling time and detection time of the method of embodiment one are roughly equal to those of the method of this embodiment, with embodiment one performing slightly better. This demonstrates that the PCA and compressive sensing data compression proposed by the intrusion detection method of this embodiment is feasible and stable.
Table 4 compares the average modeling time and average detection time of embodiment one, this embodiment, and the traditional uncompressed method, with SVM as the classifier.
Table 4: Comparison of modeling time and detection time.
Method | Average modeling time/s | Average detection time/s |
Traditional uncompressed method | 25.5 | 3.31 |
Method of this embodiment | 14.32 | 1.64 |
Method of embodiment one | 14.09 | 1.40 |
As Fig. 5, Fig. 6, Fig. 7, and Tables 2 to 4 show, classification accuracy and detection rate after compression are only slightly lower than before compression (within 1%-3%). Performing intrusion detection with PCA combined with compressive sensing greatly simplifies computation and effectively reduces training and detection time, which demonstrates that the intrusion detection method based on PCA and compressive sensing proposed in this embodiment is feasible and effective.
The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to the above embodiments; all technical solutions under the idea of the invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the invention should also be regarded as falling within the scope of protection of the invention.
Claims (10)
1. An intrusion detection method for complex stream data event analysis, characterized in that the implementation steps comprise:
1) collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set, then training a classifier on the training sample data set after data preprocessing, wherein the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction, or as sample reduction followed by feature dimensionality reduction;
2) inputting the current data set to be detected, after said data preprocessing, into the trained classifier, and obtaining the intrusion detection result output by the classifier after classification and detection.
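The two admissible preprocessing orders of claim 1 can be sketched as a small composition helper. This is only an illustration under stated assumptions: `preprocess`, `keep_two_columns` and `keep_every_second_row` are hypothetical names, and the two stand-in steps merely truncate the data so that the ordering logic is visible; they are not the PCA or compressed sensing steps of the later claims.

```python
from typing import Callable, List

Matrix = List[List[float]]
Step = Callable[[Matrix], Matrix]


def preprocess(data: Matrix, reduce_features: Step, reduce_samples: Step,
               features_first: bool = True) -> Matrix:
    """Apply feature reduction and sample reduction in either allowed order."""
    if features_first:
        steps = (reduce_features, reduce_samples)
    else:
        steps = (reduce_samples, reduce_features)
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-ins for the two steps (NOT PCA / compressed sensing).
def keep_two_columns(data: Matrix) -> Matrix:
    """Feature reduction stand-in: keep the first two columns."""
    return [row[:2] for row in data]


def keep_every_second_row(data: Matrix) -> Matrix:
    """Sample reduction stand-in: keep every second row."""
    return data[::2]


if __name__ == "__main__":
    samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
    print(preprocess(samples, keep_two_columns, keep_every_second_row))
    print(preprocess(samples, keep_two_columns, keep_every_second_row,
                     features_first=False))
```

The same `preprocess` call would be applied to the training set in step 1) and to the data set to be detected in step 2), so that the classifier sees data of the same shape in both phases.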
2. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that labeling with intrusion detection results in step 1) specifically means labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
3. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that said classifier is specifically an SVM classifier.
4. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the feature dimensionality reduction in said data preprocessing specifically refers to PCA feature dimensionality reduction.
5. The intrusion detection method for complex stream data event analysis according to claim 4, characterized in that the detailed steps of said PCA feature dimensionality reduction comprise:
A1) mapping the character-type features in the input data set to numeric features in the range [0,1], completing the nondimensionalization of the input data set and obtaining a data set $X_{nm}$ of n rows and m columns, wherein the data set $X_{nm}$ contains n × m samples in total, the n × m samples belong to k different classes ($y_1$ to $y_k$), and sample $X_i$ belongs to class $y_i$;
A2) traversing the data set $X_{nm}$ and taking out one feature as the current feature $X_i$;
A3) calculating the covariance matrix $Cov_{x'}$ of the current feature $X_i$;
A4) calculating the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ of the covariance matrix $Cov_{x'}$ and the normalized eigenvectors $a_1, a_2, \dots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$;
A5) sorting the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ in descending order and calculating the variance contribution ratio;
A6) judging whether the variance contribution ratio is less than a preset variance contribution ratio threshold; if it is less than the preset threshold, jumping to step A2); otherwise proceeding to the next step;
A7) forming a principal component matrix $P_{mk}$ of m rows and k columns from the normalized eigenvectors $a_1, a_2, \dots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$, and calculating the matrix $Z_{nk}$ of n rows and k columns from the principal component matrix $P_{mk}$ according to formula (1);
$$Z_{nk} = X_{nm} \times P_{mk} \tag{1}$$
In formula (1), $Z_{nk}$ denotes the resulting matrix of n rows and k columns, $X_{nm}$ denotes the data set of n rows and m columns, and $P_{mk}$ denotes the principal component matrix of m rows and k columns;
A8) outputting the matrix $Z_{nk}$ of n rows and k columns as the result of the PCA feature dimensionality reduction.
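Steps A1) to A8) can be illustrated with a short NumPy sketch. It is a simplified reading, not the claimed procedure itself: it assumes the input is already numeric with at least two columns (the mapping of character-type features is omitted), it replaces the A2) to A6) loop with a direct choice of the smallest k whose cumulative variance contribution reaches the threshold, and the names `min_max_scale`, `pca_reduce` and the default threshold 0.95 are illustrative assumptions.

```python
import numpy as np


def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Rescale every column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span


def pca_reduce(X: np.ndarray, threshold: float = 0.95):
    """Return (Z, P) with Z = X_scaled @ P, as in formula (1).

    P holds the k leading unit-norm eigenvectors of the covariance matrix,
    where k is the smallest count whose cumulative share of the total
    variance reaches `threshold` (an assumed selection rule).
    """
    Xs = min_max_scale(X)                      # A1: nondimensionalization
    cov = np.cov(Xs, rowvar=False)             # A3: m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # A4: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # A5: sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum() # cumulative contribution
    k = int(np.searchsorted(ratio, threshold)) + 1
    k = min(k, len(eigvals))                   # guard against rounding
    P = eigvecs[:, :k]                         # A7: m x k principal matrix
    return Xs @ P, P                           # A8: n x k reduced data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 5))
    Z, P = pca_reduce(data, threshold=0.9)
    print(Z.shape, P.shape)
```

Because `np.cov` centres the columns internally, projecting the uncentred scaled data with `Xs @ P` matches formula (1) literally; centring first would only shift every row of the result by the same constant vector.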
6. The intrusion detection method for complex stream data event analysis according to claim 5, characterized in that the function expression for calculating the variance contribution ratio in step A5) is shown in formula (2);
In formula (2), $cov_{ii}$ denotes a diagonal element of the covariance matrix $Cov_{x'}$, $\lambda_i$ denotes the i-th eigenvalue of the covariance matrix $Cov_{x'}$, m denotes the number of columns of the data set X, and k denotes the number of columns of the principal component matrix $P_{mk}$.
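The body of formula (2) is not reproduced in this text. Based only on the symbols defined for it in claim 6, one plausible reading is the usual cumulative variance contribution ratio shown below; this is an assumption rather than the patent's confirmed expression, and the symbol $\eta_k$ is a placeholder name introduced here.

```latex
\eta_k \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} cov_{ii}},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m
```

Under this reading the denominator is the trace of the covariance matrix, which equals the sum of all m eigenvalues, so $\eta_k$ is the share of the total variance retained by the k leading principal components.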
7. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the sample reduction in said data preprocessing specifically refers to compressive sampling of the input data set based on a compressed sensing algorithm, and that, when the classifier is trained in step 1), the sampling number N used for said compressive sampling is selected by observing the accuracy of the classifier's detection results.
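Selecting the sampling number N by observing classifier accuracy can be sketched as a search over candidate values. The claim does not state a selection rule, so the rule used here (the smallest N whose observed accuracy reaches a required level) is an assumption; `select_sampling_number` and the callback `accuracy_for` are hypothetical names, and the accuracy figures in the usage lines are made-up placeholders rather than results from the patent.

```python
from typing import Callable, Iterable, Optional


def select_sampling_number(candidates: Iterable[int],
                           accuracy_for: Callable[[int], float],
                           min_accuracy: float) -> Optional[int]:
    """Return the smallest N whose observed accuracy reaches min_accuracy.

    `accuracy_for(N)` is expected to compress the training data with N
    samples, train the classifier, and return its detection accuracy.
    Returns None when no candidate is accurate enough.
    """
    for n in sorted(candidates):
        if accuracy_for(n) >= min_accuracy:
            return n
    return None


if __name__ == "__main__":
    # Placeholder accuracies for illustration only (not measured values).
    observed = {50: 0.80, 100: 0.91, 200: 0.95}
    print(select_sampling_number(observed, observed.__getitem__, 0.90))
```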
8. The intrusion detection method for complex stream data event analysis according to claim 7, characterized in that the detailed steps of said compressive sampling of the input data set based on a compressed sensing algorithm comprise:
B1) building a dictionary D based on a sparse basis;
B2) sampling the continuously input data based on a given sampling number N;
B3) sparsely representing the sampled data set $X'$ of n rows and m′ columns on the dictionary D, and constructing an observation matrix;
B4) observing the data set $X'$ with the observation matrix to obtain the compressed matrix $X''$ of n′ rows and m′ columns;
B5) calculating the classification accuracy corresponding to the data set $X'$ and to the compressed matrix $X''$;
B6) judging whether the classification accuracy is greater than a preset classification accuracy threshold; if so, jumping to step B3); otherwise jumping to step B7);
B7) outputting the data matrix $X''$ of n′ rows and m′ columns as the result of the compressive sampling.
9. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the dictionary D based on a sparse basis built in step B1) is specifically a DCT dictionary.
10. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix.
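The sample reduction of claims 8 to 10 (DCT dictionary, Gaussian observation matrix, observation of the data set) can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the entries of the observation matrix are scaled to variance 1/n′ (a common convention that the claims do not specify), the accuracy check of steps B5) and B6) is left out, and no sparse recovery is attempted.

```python
import numpy as np


def dct_dictionary(n: int) -> np.ndarray:
    """Orthonormal DCT-II dictionary; column j is the j-th cosine atom."""
    t = np.arange(n)[:, None]            # sample index (rows)
    j = np.arange(n)[None, :]            # frequency index (columns)
    D = np.cos(np.pi * (2 * t + 1) * j / (2 * n))
    D[:, 0] *= np.sqrt(1.0 / n)          # normalize the constant atom
    D[:, 1:] *= np.sqrt(2.0 / n)         # normalize the remaining atoms
    return D


def gaussian_observation_matrix(n_out: int, n_in: int,
                                seed: int = 0) -> np.ndarray:
    """i.i.d. Gaussian matrix of shape (n_out, n_in), variance 1/n_out."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(n_out), size=(n_out, n_in))


def compress_samples(X: np.ndarray, n_out: int, seed: int = 0):
    """Reduce the n rows of X' to n_out rows; returns (X'', Phi, D)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = dct_dictionary(n)                # B1: dictionary on a sparse basis
    S = D.T @ X                          # B3: coefficients, X' = D @ S
    Phi = gaussian_observation_matrix(n_out, n, seed)
    X_compressed = (Phi @ D) @ S         # B4: observe, equals Phi @ X'
    return X_compressed, Phi, D


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(64, 10))     # n = 64 samples, m' = 10 features
    reduced, _, _ = compress_samples(data, n_out=16)
    print(reduced.shape)
```

Since the DCT dictionary is orthonormal, observing the coefficients through `Phi @ D` gives the same result as applying `Phi` to the data directly; the dictionary matters when the original samples are to be recovered from the compressed matrix.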
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710146332.4A CN106951778A (en) | 2017-03-13 | 2017-03-13 | A kind of intrusion detection method towards complicated flow data event analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106951778A true CN106951778A (en) | 2017-07-14 |
Family
ID=59468268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710146332.4A Pending CN106951778A (en) | 2017-03-13 | 2017-03-13 | A kind of intrusion detection method towards complicated flow data event analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951778A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009096903A1 (en) * | 2008-01-28 | 2009-08-06 | National University Of Singapore | Lipid tumour profile |
CN101968813A (en) * | 2010-10-25 | 2011-02-09 | 华北电力大学 | Method for detecting counterfeit webpage |
CN102158486A (en) * | 2011-04-02 | 2011-08-17 | 华北电力大学 | Method for rapidly detecting network invasion |
CN103440513A (en) * | 2013-09-17 | 2013-12-11 | 西安电子科技大学 | Method for determining specific visual cognition state of brain based on sparse nonnegative tensor factorization (SNTF) |
CN103618744A (en) * | 2013-12-10 | 2014-03-05 | 华东理工大学 | Intrusion detection method based on fast k-nearest neighbor (KNN) algorithm |
CN105160295A (en) * | 2015-07-14 | 2015-12-16 | 东北大学 | Rapid high-efficiency face identification method for large-scale face database |
CN105897517A (en) * | 2016-06-20 | 2016-08-24 | 广东电网有限责任公司信息中心 | Network traffic abnormality detection method based on SVM (Support Vector Machine) |
CN106407905A (en) * | 2016-08-31 | 2017-02-15 | 电子科技大学 | Machine learning-based wireless sensing motion identification method |
Non-Patent Citations (1)
Title |
---|
Yan Jingwen: "Compressed Sensing and Its Applications" (《压缩感知及应用》), 31 October 2015 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590697A (en) * | 2017-09-18 | 2018-01-16 | 北京京东尚科信息技术有限公司 | Data processing method and its system |
CN110191081A (en) * | 2018-02-22 | 2019-08-30 | 上海交通大学 | The Feature Selection system and method for network flow attack detecting based on learning automaton |
CN109583904B (en) * | 2018-11-30 | 2023-04-07 | 深圳市腾讯计算机系统有限公司 | Training method of abnormal operation detection model, abnormal operation detection method and device |
CN109583904A (en) * | 2018-11-30 | 2019-04-05 | 深圳市腾讯计算机系统有限公司 | Training method, impaired operation detection method and the device of abnormal operation detection model |
CN109784668A (en) * | 2018-12-21 | 2019-05-21 | 国网江苏省电力有限公司南京供电分公司 | A kind of sample characteristics dimension-reduction treatment method for electric power monitoring system unusual checking |
CN109962909A (en) * | 2019-01-30 | 2019-07-02 | 大连理工大学 | A kind of network intrusions method for detecting abnormality based on machine learning |
CN109962909B (en) * | 2019-01-30 | 2021-05-14 | 大连理工大学 | Network intrusion anomaly detection method based on machine learning |
CN112149818A (en) * | 2019-06-27 | 2020-12-29 | 北京数安鑫云信息技术有限公司 | Threat identification result evaluation method and device |
CN112149818B (en) * | 2019-06-27 | 2024-04-09 | 北京数安鑫云信息技术有限公司 | Threat identification result evaluation method and device |
CN110401649A (en) * | 2019-07-17 | 2019-11-01 | 湖北央中巨石信息技术有限公司 | Information Security Risk Assessment Methods and system based on Situation Awareness study |
CN110610148A (en) * | 2019-09-02 | 2019-12-24 | 南京邮电大学 | Privacy protection-oriented compressed sensing visual shielding video behavior identification method |
CN110610148B (en) * | 2019-09-02 | 2022-02-08 | 南京邮电大学 | Privacy protection-oriented compressed sensing visual shielding video behavior identification method |
CN112437053A (en) * | 2020-11-10 | 2021-03-02 | 国网北京市电力公司 | Intrusion detection method and device |
CN112437053B (en) * | 2020-11-10 | 2023-06-30 | 国网北京市电力公司 | Intrusion detection method and device |
CN113254925B (en) * | 2021-02-01 | 2022-11-15 | 中国人民解放军海军工程大学 | Network intrusion detection system based on PCA and SVM |
CN113254925A (en) * | 2021-02-01 | 2021-08-13 | 中国人民解放军海军工程大学 | Network intrusion detection system based on PCA and SVM |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951778A (en) | A kind of intrusion detection method towards complicated flow data event analysis | |
CN111967502B (en) | Network intrusion detection method based on conditional variation self-encoder | |
CN109886020B (en) | Software vulnerability automatic classification method based on deep neural network | |
CN108737406A (en) | A kind of detection method and system of abnormal flow data | |
CN109948125B (en) | Method and system for improved Simhash algorithm in text deduplication | |
CN102346829A (en) | Virus detection method based on ensemble classification | |
CN104346459B (en) | A kind of text classification feature selection approach based on term frequency and chi | |
CN102142082B (en) | Virtual sample based kernel discrimination method for face recognition | |
CN109190698B (en) | Classification and identification system and method for network digital virtual assets | |
WO2022121163A1 (en) | User behavior tendency identification method, apparatus, and device, and storage medium | |
CN107203750B (en) | Hyperspectral target detection method based on combination of sparse expression and discriminant analysis | |
CN112437053B (en) | Intrusion detection method and device | |
CN113505826B (en) | Network flow anomaly detection method based on joint feature selection | |
CN112820416A (en) | Major infectious disease queue data typing method, typing model and electronic equipment | |
CN112884570A (en) | Method, device and equipment for determining model security | |
CN116204831A (en) | Road-to-ground analysis method based on neural network | |
CN112085062A (en) | Wavelet neural network-based abnormal energy consumption positioning method | |
CN113283901B (en) | Byte code-based fraud contract detection method for block chain platform | |
CN104616027A (en) | Non-adjacent graph structure sparse face recognizing method | |
CN114999628B (en) | Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning | |
CN105824785A (en) | Rapid abnormal point detection method based on penalized regression | |
CN106709598B (en) | Voltage stability prediction and judgment method based on single-class samples | |
CN112507299B (en) | Self-adaptive keystroke behavior authentication method and device in continuous identity authentication system | |
CN111950717B (en) | Public opinion quantification method based on neural network | |
CN111382273B (en) | Text classification method based on feature selection of attraction factors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170714 |