CN106951778A - An intrusion detection method for complex streaming-data event analysis - Google Patents

An intrusion detection method for complex streaming-data event analysis

Info

Publication number
CN106951778A
Authority
CN
China
Prior art keywords
sample
data
matrix
intrusion detection
row
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710146332.4A
Other languages
Chinese (zh)
Inventor
杨秋伟
谭奕
孙铁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BBK Electronics Co Ltd
Original Assignee
BBK Electronics Co Ltd
Application filed by BBK Electronics Co Ltd
Priority to CN201710146332.4A
Publication of CN106951778A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intrusion detection method for complex streaming-data event analysis. The implementation steps include: collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set; training a classifier on the training sample data set after data preprocessing, where the data preprocessing includes feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction; and applying the same data preprocessing to the current data set to be detected, inputting it to the trained classifier, and outputting the intrusion detection result. The invention effectively removes the correlation between features and greatly simplifies computation. It addresses both sample redundancy and feature correlation by removing data redundancy through feature dimensionality reduction and sample reduction respectively, and it substantially shortens training time and detection time while keeping detection accuracy essentially unchanged before and after sample compression.

Description

An intrusion detection method for complex streaming-data event analysis
Technical field
The present invention relates to intrusion detection technology in the field of computer network security, and in particular to an intrusion detection method for complex streaming-data event analysis.
Background
Intrusion detection is a network security technology that has emerged over the past 20 years to actively protect systems against hacker attacks. In the face of diverse network threats, detecting intrusions promptly and correctly and responding appropriately, so as to reduce the losses caused by network attacks, is a current focus of network security research. At present, detection models are mainly built with machine learning. The usual approach is to extract features from intrusion data or normal access data, build a feature database, and perform pattern matching to complete intrusion detection. Common machine learning methods include Bayesian classification, k-nearest neighbors (KNN), genetic algorithms (GA), decision trees, artificial neural networks (ANN), and support vector machines (SVM). Because SVM performs well on high-dimensional, small-sample data, it is often used to train intrusion detection models. However, intrusion detection based on machine learning must cope with strongly correlated data, many repeated samples in the intrusion detection sample data, and long detection times. Data redundancy shows up in two respects: the dimensionality of the data samples is high, and there are many redundant samples. Common methods for removing data redundancy include feature selection and fuzzy clustering. They simplify computation well, but they do not adequately solve problems such as slow convergence and low detection accuracy.
Principal component analysis (PCA) is a statistical analysis method proposed by K. Pearson more than a century ago. Its basic idea is to convert many features, by linear transformation, into a few mutually orthogonal new features ordered from most to least important. Intrusion detection data often involves many feature variables. Each feature provides some information, but the features are also correlated with one another to some degree, so PCA can be used to remove the redundant features of the samples. Compressed sensing (CS) is an algorithm from the image processing field, applied to image compression and reconstruction. It compresses a signal at a sampling rate far below the Nyquist rate, can reconstruct the original information with a suitable optimization algorithm, and converges quickly. A compressed sensing algorithm can therefore be used to sample the data set, remove redundant samples, and achieve sample reduction. How to realize real-time, efficient intrusion detection based on PCA and compressed sensing has become a key technical problem that urgently needs to be solved.
Summary of the invention
The technical problem to be solved by the invention: in view of the above problems of the prior art, to provide an intrusion detection method for complex streaming-data event analysis that effectively removes the correlation between features, greatly simplifies computation, and shortens training time and detection time.
To solve the above technical problem, the technical solution adopted by the invention is:
An intrusion detection method for complex streaming-data event analysis, whose implementation steps include:
1) collecting samples in advance, labeling the collected samples with intrusion detection results to obtain a training sample data set, and training a classifier on the training sample data set after data preprocessing; the data preprocessing includes two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction;
2) applying the data preprocessing to the current data set to be detected, inputting the result to the trained classifier, and obtaining the intrusion detection result output by the classifier after classification.
Preferably, labeling with intrusion detection results in step 1) specifically means labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
Preferably, the classifier is specifically an SVM classifier.
Preferably, the feature dimensionality reduction in the data preprocessing specifically means PCA feature dimensionality reduction.
Preferably, the detailed steps of the PCA feature dimensionality reduction include:
A1) mapping the character-type features of the input data set to numeric features in the range [0, 1], completing the nondimensionalization of the input data set and obtaining a data set $X_{nm}$ of n rows and m columns; the data set $X_{nm}$ contains n × m values in total, and its samples belong to k different classes $y_1$ to $y_k$, where sample $X_i$ belongs to class $y_i$;
A2) traversing the data set $X_{nm}$ and taking out one feature as the current feature $X_i$;
A3) computing the covariance matrix $\mathrm{Cov}_{x'}$ of the current feature $X_i$;
A4) computing the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ of the covariance matrix $\mathrm{Cov}_{x'}$ and the corresponding normalized eigenvectors $a_1, a_2, \dots, a_m$;
A5) sorting the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ from largest to smallest and computing the variance contribution ratio;
A6) judging whether the variance contribution ratio is less than the preset variance contribution ratio threshold; if it is, jumping to step A2); otherwise proceeding to the next step;
A7) forming an m-row, k-column principal component matrix $P_{mk}$ from the normalized eigenvectors $a_1, a_2, \dots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$, and computing the n-row, k-column matrix $Z_{nk}$ from $P_{mk}$ according to formula (1);
$Z_{nk} = X_{nm} \times P_{mk}$ (1)
In formula (1), $Z_{nk}$ is the computed matrix of n rows and k columns, $X_{nm}$ is the data set of n rows and m columns, and $P_{mk}$ is the principal component matrix of m rows and k columns;
A8) outputting the n-row, k-column matrix $Z_{nk}$ as the result of the PCA feature dimensionality reduction.
Preferably, the variance contribution ratio in step A5) is computed as shown in formula (2);
$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \mathrm{cov}_{ii}} \quad (2)$$
In formula (2), $\mathrm{cov}_{ii}$ is the i-th diagonal element of the covariance matrix $\mathrm{Cov}_{x'}$, $\lambda_i$ is the i-th eigenvalue of the covariance matrix $\mathrm{Cov}_{x'}$, m is the number of columns of the data set X, and k is the number of columns of the principal component matrix $P_{mk}$.
Preferably, the sample reduction in the data preprocessing specifically means compressively sampling the input data set based on a compressed sensing algorithm, and, when the classifier is trained in step 1), selecting the sampling count N of the compressive sampling by observing the accuracy of the classifier's detection results.
Preferably, the detailed steps of compressively sampling the input data set based on a compressed sensing algorithm include:
B1) building a dictionary D based on a sparse basis;
B2) continuously sampling the input data based on the given sampling count N;
B3) sparsely representing the sampled data set X′ of n rows and m′ columns on the dictionary D, and constructing the observation matrix;
B4) observing the data set X′ with the observation matrix to obtain the compressed matrix X″ of n′ rows and m′ columns;
B5) computing the classification accuracy corresponding to the data set X′ and to the compressed matrix X″;
B6) judging whether the classification accuracy exceeds the preset classification accuracy threshold; if it does, jumping to step B3); otherwise jumping to step B7);
B7) outputting the data matrix X″ of n′ rows and m′ columns as the result of the compressive sampling.
Preferably, the dictionary D based on a sparse basis built in step B1) is specifically a DCT dictionary.
Preferably, the observation matrix constructed in step B3) is an independent and identically distributed Gaussian random matrix.
The intrusion detection method for complex streaming-data event analysis of the present invention has the following advantages:
1. Building on traditional PCA, the invention introduces the concept of a dimensionality reduction rate and reduces the feature dimensionality of the samples, effectively removing the correlation between features.
2. The invention introduces the compressed sensing algorithm from the image processing field into intrusion detection and compressively samples the repeated samples in the original data set. The resulting small sample set greatly simplifies computation and shortens detection time. With the redundant samples compressed, the detection rate is comparable to the uncompressed case, while training time and detection time are far lower than with the traditional uncompressed method.
3. The invention further introduces a feedback adjustment mechanism. The sample reduction in the data preprocessing compressively samples the input data set based on a compressed sensing algorithm, and the sampling count N is selected by observing the accuracy of the classifier's detection results so as to obtain the optimal sampling count N, thereby achieving real-time, highly accurate detection.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic flow of the method of Embodiment 1 of the invention.
Fig. 2 compares the classification accuracy of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 3 compares the modeling time of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 4 compares the detection time of the SVM, naive Bayes, and C4.5 classifiers.
Fig. 5 compares the classification accuracy of Embodiment 1, the traditional uncompressed method, and Embodiment 2.
Fig. 6 compares the detection rate of Embodiment 1, the traditional uncompressed method, and Embodiment 2.
Fig. 7 compares the false alarm rate of Embodiment 1, the traditional uncompressed method, and Embodiment 2.
Detailed description of the embodiments
Embodiment 1:
This embodiment uses the well-known open data set KDDCUP99 as the complex streaming data to be detected and selects 98328 samples from it: Normal (56237), DoS (40172), R2L (9), U2R (102), and Probe (1808). To obtain more accurate experimental results, ten runs of 10-fold cross-validation are used. The intrusion detection method of this embodiment is described further below, taking the open data set KDDCUP99 as an example.
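The evaluation protocol mentioned above can be sketched as follows. This is a minimal illustration of ten runs of 10-fold cross-validation that only generates index splits; the function name `repeated_kfold`, the seed, and the interleaved fold assignment are assumptions made for the example, and the source does not say whether the folds were stratified by class.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                      # a fresh shuffle for every repetition
        for f in range(n_folds):
            test = idx[f::n_folds]            # every n_folds-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, f, train, test

# Hypothetical usage on 50 sample indices: 10 repeats x 10 folds = 100 splits.
splits = list(repeated_kfold(n_samples=50))
print(len(splits))
```

Averaging a metric over all splits gives the repeated cross-validation estimate. With classes as small as R2L (9 samples), a stratified split may be worth considering so that every fold sees each class.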
As shown in Fig. 1, the implementation steps of the intrusion detection method for complex streaming-data event analysis of this embodiment include:
1) collecting samples in advance, labeling the collected samples with intrusion detection results to obtain a training sample data set, and training a classifier on the training sample data set after data preprocessing; the data preprocessing includes two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction or as sample reduction followed by feature dimensionality reduction;
2) applying the data preprocessing to the current data set to be detected, inputting the result to the trained classifier, and obtaining the intrusion detection result output by the classifier after classification.
In the data preprocessing of this embodiment, PCA first reduces the dimensionality of the redundant sample features and compressed sensing then compresses the redundant samples, or the two are applied in the opposite order. The method thus addresses both sample redundancy and feature correlation and removes data redundancy in these two respects. While keeping detection accuracy essentially unchanged before and after sample compression, the algorithm greatly shortens detection time.
In this embodiment, labeling with intrusion detection results in step 1) specifically means labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
With the low-dimensional data obtained by PCA and compressive sampling, a suitable classifier can complete classification learning and achieve good classification accuracy on the test data set. In this embodiment, the classifier is specifically an SVM classifier. SVM performs well on high-dimensional, small-sample data and can therefore improve the performance of intrusion detection for complex streaming-data event analysis.
In this embodiment, the feature dimensionality reduction in the data preprocessing specifically means PCA feature dimensionality reduction. The basic principle of PCA feature dimensionality reduction is as follows. Let $X = (X_1, X_2, \dots, X_m)$, where $X_j$ (1 ≤ j ≤ m) is the j-th feature of sample i (1 ≤ i ≤ n). Through the transformation by the matrix $P = (P_1, P_2, \dots, P_l)$ (1 ≤ l ≤ n, each $P_l$ an m-dimensional vector), $Z = (Z_1, Z_2, \dots, Z_k)$ is formed, where Z is an n × k matrix (k ≤ m). The principle of PCA feature dimensionality reduction is: when the variance contribution ratio of the covariance matrix of the samples exceeds a given threshold, the corresponding dimensionality reduction rate is ρ = (m − k)/m, and after the transformation the original samples can be described by a small number of new features $Z_{j'}$ (1 ≤ j′ ≤ k) without changing their principal components.
In this embodiment, the detailed steps of the PCA feature dimensionality reduction include:
A1) mapping the character-type features of the input data set to numeric features in the range [0, 1], completing the nondimensionalization of the input data set and obtaining a data set $X_{nm}$ of n rows and m columns; the data set $X_{nm}$ contains n × m values in total, and its samples belong to k different classes $y_1$ to $y_k$, where sample $X_i$ belongs to class $y_i$;
A2) traversing the data set $X_{nm}$ and taking out one feature as the current feature $X_i$;
A3) computing the covariance matrix $\mathrm{Cov}_{x'}$ of the current feature $X_i$;
A4) computing the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ of the covariance matrix $\mathrm{Cov}_{x'}$ and the corresponding normalized eigenvectors $a_1, a_2, \dots, a_m$;
A5) sorting the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ from largest to smallest and computing the variance contribution ratio;
A6) judging whether the variance contribution ratio is less than the preset variance contribution ratio threshold (90% in this embodiment); if it is, jumping to step A2); otherwise proceeding to the next step;
A7) forming an m-row, k-column principal component matrix $P_{mk}$ from the normalized eigenvectors $a_1, a_2, \dots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$, and computing the n-row, k-column matrix $Z_{nk}$ from $P_{mk}$ according to formula (1);
$Z_{nk} = X_{nm} \times P_{mk}$ (1)
In formula (1), $Z_{nk}$ is the computed matrix of n rows and k columns, $X_{nm}$ is the data set of n rows and m columns, and $P_{mk}$ is the principal component matrix of m rows and k columns;
A8) outputting the n-row, k-column matrix $Z_{nk}$ as the result of the PCA feature dimensionality reduction.
In this embodiment, the variance contribution ratio in step A5) is computed as shown in formula (2);
$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \mathrm{cov}_{ii}} \quad (2)$$
In formula (2), $\mathrm{cov}_{ii}$ is the i-th diagonal element of the covariance matrix $\mathrm{Cov}_{x'}$, $\lambda_i$ is the i-th eigenvalue of the covariance matrix $\mathrm{Cov}_{x'}$, m is the number of columns of the data set X, and k is the number of columns of the principal component matrix $P_{mk}$.
In this embodiment, step A1) maps the character-type features of the input data set to numeric features in the range [0, 1] and completes the nondimensionalization of the input data set, giving the data set $X_{nm}$ of n rows and m columns, which contains 41 attribute dimensions in total. When the variance contribution ratio reaches the preset threshold, the feature dimensionality reduction rate ρ is defined as ρ = (m − k)/m, where m is the number of columns of the data set X and k is the number of columns of the principal component matrix $P_{mk}$. In this embodiment, the feature dimensionality reduction rate ρ is about 50%. To some extent, the lower the dimensionality, the shorter the detection time, but the classification accuracy of the samples is affected. As a compromise, 21 features are retained here. After PCA feature dimensionality reduction, the n-row, k-column matrix $Z_{nk}$ is output as the result, and it contains 21 attribute features in total.
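The PCA steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented procedure: it computes one covariance matrix over all features instead of traversing them one by one, it centers the data before projecting (formula (1) as written multiplies the uncentered $X_{nm}$), and the names `min_max_scale`, `pca_reduce`, and the synthetic data are hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Map every column into [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca_reduce(X, threshold=0.90):
    """Return (Z, P, k): projected data, the m-by-k principal component matrix, and k."""
    Xc = X - X.mean(axis=0)                      # center each feature (assumption, see lead-in)
    cov = np.cov(Xc, rowvar=False)               # m-by-m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric input, eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / np.trace(cov)   # cumulative variance contribution ratio
    k = min(int(np.searchsorted(ratio, threshold)) + 1, X.shape[1])
    P = eigvecs[:, :k]                           # unit-norm eigenvectors of the k largest eigenvalues
    return Xc @ P, P, k

# Synthetic stand-in data: 6 correlated features driven by 2 latent factors.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0, 1.0, 1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0, -1.0, 1.0, 2.0]])
X = rng.normal(size=(200, 2)) @ W + 0.01 * rng.normal(size=(200, 6))
Z, P, k = pca_reduce(min_max_scale(X), threshold=0.90)
rho = (X.shape[1] - k) / X.shape[1]              # dimensionality reduction rate (m - k) / m
print(k, Z.shape, round(rho, 2))
```

With the embodiment's numbers, m = 41 and k = 21 give ρ = 20/41, which is about 0.49 and matches the stated "about 50%".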
In this embodiment, the sample reduction in the data preprocessing specifically means compressively sampling the input data set based on a compressed sensing algorithm, and, when the classifier is trained in step 1), selecting the sampling count N of the compressive sampling by observing the accuracy of the classifier's detection results. The key to sampling based on compressed sensing is the choice of the sampling count N. In general, the smaller the sampling count N per unit time, the higher the compression ratio and the faster the subsequent training and detection, but a high compression ratio affects detection accuracy, so detection speed must be weighed against detection accuracy. The intrusion detection method of this embodiment therefore introduces a feedback adjustment mechanism: by observing the accuracy of the final detection results, different numbers of observation matrix rows are selected to control the compressive sampling count N, and the optimal sampling count N is obtained experimentally, thereby achieving real-time, highly accurate detection.
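The feedback mechanism described above can be sketched as a simple search over candidate observation matrix row counts. This is an illustrative sketch only: `select_rows`, the 3% tolerance, and the toy accuracy curve `toy_accuracy` are assumptions made for the example and are not results or code from the source. In practice `evaluate` would train and test the classifier on data compressed to the given number of rows.

```python
def select_rows(candidates, evaluate, baseline_accuracy, tolerance=0.03):
    """Return the smallest row count whose accuracy stays within `tolerance` of the baseline."""
    for rows in sorted(candidates):
        if evaluate(rows) >= baseline_accuracy - tolerance:
            return rows
    return max(candidates)                      # fall back to the least compressed setting

N = 600
candidates = [N * p // 100 for p in range(60, 101, 5)]   # 0.6N, 0.65N, ..., N

def toy_accuracy(rows):
    """Placeholder accuracy curve (made up for illustration, not measured)."""
    return min(0.99, 0.80 + 0.0004 * rows)

chosen = select_rows(candidates, toy_accuracy, baseline_accuracy=0.99)
print(chosen, round(chosen / N, 2))
```

Scanning from the most compressed candidate upward returns the strongest compression that still meets the accuracy requirement, which is the speed and accuracy trade-off the paragraph describes.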
In this embodiment, the detailed steps of compressively sampling the input data set based on a compressed sensing algorithm include:
B1) building a dictionary D based on a sparse basis;
B2) continuously sampling the input data based on the given sampling count N;
B3) sparsely representing the sampled data set X′ of n rows and m′ columns on the dictionary D, and constructing the observation matrix;
B4) observing the data set X′ with the observation matrix to obtain the compressed matrix X″ of n′ rows and m′ columns; in this embodiment, each sampling takes N = 600 data samples, so the corresponding observation matrix has 600 columns, and the degree of compression depends on the number of rows of the observation matrix; the row counts selected in this embodiment are 0.6N, 0.65N, 0.7N, …, N, where N rows corresponds to the uncompressed sampling mode;
B5) computing the classification accuracy corresponding to the data set X′ and to the compressed matrix X″;
B6) judging whether the classification accuracy exceeds the preset classification accuracy threshold; if it does, jumping to step B3); otherwise jumping to step B7);
B7) outputting the data matrix X″ of n′ rows and m′ columns as the result of the compressive sampling.
In this embodiment, the dictionary D based on a sparse basis built in step B1) is specifically a DCT (Discrete Cosine Transform) dictionary. For the design of the observation matrix, besides incoherence with the sparse basis, Candès and Tao gave the restricted isometry property (RIP) as a necessary and sufficient condition. In this embodiment, the observation matrix constructed in step B3) is an independent and identically distributed Gaussian random matrix, which satisfies both incoherence with the sparse basis and the RIP condition.
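The dictionary and observation steps can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the orthonormal DCT-II basis is one common choice of DCT dictionary, the Gaussian entries are scaled by 1/sqrt(n′) as is conventional, and the names `dct_dictionary` and `compress` and the random stand-in data are hypothetical. Reconstruction and the accuracy feedback loop are not shown.

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal DCT-II basis; column j is the j-th cosine atom of length n."""
    t = np.arange(n)[:, None]                    # sample index
    j = np.arange(n)[None, :]                    # frequency index
    D = np.cos(np.pi * (2 * t + 1) * j / (2 * n)) * np.sqrt(2.0 / n)
    D[:, 0] /= np.sqrt(2.0)                      # rescale the constant atom to unit norm
    return D

def compress(X, n_rows, seed=0):
    """Observe X (n-by-m') with an i.i.d. Gaussian matrix that has n_rows rows and n columns."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(0.0, 1.0 / np.sqrt(n_rows), size=(n_rows, X.shape[0]))
    return Phi @ X, Phi

# Stand-in batch: N = 600 samples with 21 features each (random values, for shape only).
X = np.random.default_rng(1).random((600, 21))
D = dct_dictionary(600)
S = D.T @ X                                      # representation of each feature column on D
Y, Phi = compress(X, n_rows=360)                 # 0.6N observation rows
print(Y.shape)
```

Because the dictionary is orthonormal, `D @ S` recovers `X` exactly; the compression comes only from the observation matrix, whose row count sets n′.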
To assess the performance of the intrusion detection method for complex streaming-data event analysis of this embodiment, a confusion matrix is used below, and on its basis several detection performance indices are introduced: detection rate, classification accuracy, false alarm rate, modeling time, and detection time.
The confusion matrix divides all instances into different categories by determining whether the predicted value matches the actual value. The instances in each category are then counted and the totals are shown in a matrix, as in Table 1:
Table 1: Confusion matrix.
According to the true class of a given sample and the class predicted for it by the detection model, four outcomes are possible, as shown in Table 1: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TN and TP correspond to correct predictions by the detection model, i.e., the sample is correctly identified as normal or as an attack. FP and FN correspond to wrong predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.
Based on the confusion matrix, the indices mentioned above can be calculated. The formulas are as follows:
$$DR = \frac{TP}{TP + FN}, \qquad TR = \frac{TP + TN}{TP + TN + FP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$
In the above formulas, DR is the detection rate, TR is the classification accuracy, and FPR is the false alarm rate; TP denotes true positives, FN false negatives, TN true negatives, and FP false positives. FP and FN correspond to wrong predictions: FP means a normal sample is misidentified as an attack, and FN means an attack sample is misidentified as normal.
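As a small worked example, the three indices can be computed from confusion matrix counts as follows. The sketch assumes the standard definitions with "attack" as the positive class (detection rate TP/(TP+FN), classification accuracy (TP+TN)/(TP+TN+FP+FN), false alarm rate FP/(FP+TN)); the function name and the counts are made up for illustration and are not results from the experiments.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute detection rate, classification accuracy, and false alarm rate."""
    return {
        "detection_rate": tp / (tp + fn),               # share of attacks that were detected
        "accuracy": (tp + tn) / (tp + tn + fp + fn),    # share of all samples classified correctly
        "false_alarm_rate": fp / (fp + tn),             # share of normal samples flagged as attacks
    }

# Illustrative counts: 100 attack samples and 100 normal samples.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```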
The experiments on the intrusion detection method for complex streaming-data event analysis of this embodiment were carried out in Weka. The method of this embodiment preprocesses the data set with feature dimensionality reduction first and sample reduction second. Samples are then detected with three classifiers, SVM, naive Bayes, and C4.5, and their classification accuracy, modeling time, and detection time are compared, as shown in Fig. 2, Fig. 3, and Fig. 4.
As Fig. 2, Fig. 3, and Fig. 4 show, at low sampling counts N, SVM classifies better than naive Bayes and C4.5, mainly because of SVM's advantage on high-dimensional, small-sample data. As N increases, the classification performance of naive Bayes and C4.5 rises markedly and their final classification accuracy is slightly higher than that of SVM. Meanwhile, the modeling time and detection time of SVM lie between those of C4.5 and naive Bayes (once N exceeds 450). SVM is therefore the most suitable classifier for the intrusion detection method for complex streaming-data event analysis of this embodiment.
Embodiment 2:
This embodiment is essentially the same as Embodiment 1. The main difference is that in Embodiment 1 the data preprocessing includes the two steps of feature dimensionality reduction and sample reduction, executed as feature dimensionality reduction followed by sample reduction. In this embodiment, the data preprocessing likewise includes the two steps of feature dimensionality reduction and sample reduction, but sample reduction is performed first and feature dimensionality reduction second. The details of both steps are the same as in Embodiment 1 and are not repeated here.
To further verify the performance of the intrusion detection method for complex streaming-data event analysis of this embodiment, the experiment compares three approaches, using SVM as the classifier: the traditional approach without sample compression, feature compression followed by redundant-sample compression, and redundant-sample compression followed by feature compression. Fig. 5, Fig. 6, and Fig. 7 show the classification accuracy (TR), detection rate (DR), and false alarm rate (FPR) of the experimental data set after processing by the three approaches. As the figures show, the curves of Embodiment 1 (PCA-CS) and of this embodiment (CS-PCA) follow roughly the same trend: at low sampling counts N, classification accuracy and detection rate are both low and the false alarm rate is high. As N increases, performance improves markedly. In particular, when N is between 500 and 550, accuracy, detection rate, and false alarm rate stabilize and reach the level of traditional uncompressed sampling. To further compare the PCA-CS and CS-PCA methods in modeling time and detection time, ten runs of 10-fold cross-validation were performed and averaged, and the times of the two methods were recorded (at a sampling count N of 550), as shown in Table 2 and Table 3.
Table 2:The time statistics of the present embodiment.
Table 3:The time statistics of embodiment one.
As Table 2 and Table 3 show, the modeling time and detection time of the method of Embodiment 1 are roughly equal to those of the method of this embodiment, with Embodiment 1 performing slightly better. This shows that the data compression method based on PCA and compressed sensing proposed in the intrusion detection method for complex streaming-data event analysis of this embodiment is feasible and stable.
Table 4 compares the average modeling time and average detection time of the SVM classifier under Embodiment 1, this embodiment, and the traditional uncompressed method.
Table 4: Comparison of modeling time and detection time.
Method                             Average modeling time (s)    Average detection time (s)
Traditional uncompressed method    25.5                         3.31
Method of this embodiment          14.32                        1.64
Method of Embodiment 1             14.09                        1.40
As can be seen from Fig. 5, Fig. 6, Fig. 7 and Tables 2 to 4, the classification accuracy and detection rate after compression are only slightly lower than before compression (by 1% to 3%), while performing intrusion detection with the combination of PCA and compressed sensing greatly reduces the amount of computation and effectively shortens the training and detection times. This shows that the intrusion detection method based on PCA and compressed sensing proposed by the present embodiment for complex stream data event analysis is feasible and effective.
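The time savings can be made concrete with a small calculation on the Table 4 figures. The sketch below only restates the published averages and computes the relative reduction against the uncompressed baseline; the dictionary layout and the helper name `relative_reduction` are illustrative assumptions.

```python
# Average modeling and detection times in seconds, copied from Table 4.
TABLE_4 = {
    "traditional uncompressed": (25.5, 3.31),
    "present embodiment (CS-PCA)": (14.32, 1.64),
    "embodiment one (PCA-CS)": (14.09, 1.40),
}


def relative_reduction(baseline, value):
    """Fraction of the baseline time that is saved."""
    return (baseline - value) / baseline


base_model, base_detect = TABLE_4["traditional uncompressed"]
reductions = {
    name: (
        relative_reduction(base_model, model_time),
        relative_reduction(base_detect, detect_time),
    )
    for name, (model_time, detect_time) in TABLE_4.items()
    if name != "traditional uncompressed"
}

for name, (model_saving, detect_saving) in reductions.items():
    print(f"{name}: modeling -{model_saving:.1%}, detection -{detect_saving:.1%}")
```

On these figures both compressed variants cut the average modeling time by a little over 40% and the average detection time by half or more.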
The above is only a preferred embodiment of the present invention. The scope of protection of the invention is not limited to the above embodiments; all technical solutions falling under the concept of the invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the invention shall also be regarded as falling within its scope of protection.

Claims (10)

1. An intrusion detection method for complex stream data event analysis, characterized in that the implementation steps comprise:
1) collecting samples in advance and labeling the collected samples with intrusion detection results to obtain a training sample data set, and completing the training of a classifier on the training sample data set after data preprocessing, wherein the data preprocessing comprises two steps, feature dimensionality reduction and sample reduction, executed either as feature dimensionality reduction followed by sample reduction, or as sample reduction followed by feature dimensionality reduction;
2) inputting the current data set to be detected, after the data preprocessing, into the trained classifier, and obtaining the intrusion detection result output by the classifier after classification and detection.
2. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that labeling with intrusion detection results in step 1) specifically refers to labeling each collected sample as a normal sample or an attack sample, and the intrusion detection result output in step 2) is either normal or attack.
3. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the classifier is specifically an SVM classifier.
4. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the feature dimensionality reduction in the data preprocessing specifically refers to PCA feature dimensionality reduction.
5. The intrusion detection method for complex stream data event analysis according to claim 4, characterized in that the detailed steps of the PCA feature dimensionality reduction comprise:
A1) mapping the character-type features in the input data set to numeric features in the range [0, 1], thereby completing the nondimensionalization of the input data set and obtaining a data set $X_{nm}$ of n rows and m columns, the data set $X_{nm}$ containing n × m samples in total, the n × m samples belonging respectively to k different classes ($y_1$ to $y_k$), wherein sample $X_i$ belongs to class $y_i$;
A2) traversing the data set $X_{nm}$ and taking out one feature as the current feature $X_i$;
A3) calculating the covariance matrix $\mathrm{Cov}_{x'}$ of the current feature $X_i$;
A4) calculating the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ of the covariance matrix $\mathrm{Cov}_{x'}$ and the normalized eigenvectors $a_1, a_2, \ldots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$;
A5) sorting the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ in descending order and calculating the variance contribution ratio $\varphi(k)$;
A6) judging whether the variance contribution ratio $\varphi(k)$ is less than a preset variance contribution ratio threshold; if it is less than the preset threshold, jumping to step A2), otherwise proceeding to the next step;
A7) forming a principal component matrix $P_{mk}$ of m rows and k columns from the normalized eigenvectors $a_1, a_2, \ldots, a_m$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, and calculating from the principal component matrix $P_{mk}$ the matrix $Z_{nk}$ of n rows and k columns according to formula (1);
$$Z_{nk} = X_{nm} \times P_{mk} \tag{1}$$
In formula (1), $Z_{nk}$ represents the calculated matrix of n rows and k columns, $X_{nm}$ represents the data set of n rows and m columns, and $P_{mk}$ represents the principal component matrix of m rows and k columns;
A8) outputting the matrix $Z_{nk}$ of n rows and k columns as the result of the PCA feature dimensionality reduction.
6. The intrusion detection method for complex stream data event analysis according to claim 5, characterized in that the function expression for calculating the variance contribution ratio $\varphi(k)$ in step A5) is as shown in formula (2);
$$\varphi(k) = \frac{\sum_{i=1}^{k} \mathrm{cov}_{ii}}{\sum_{i=1}^{m} \mathrm{cov}_{ii}} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \tag{2}$$
In formula (2), $\mathrm{cov}_{ii}$ is a diagonal element of the covariance matrix $\mathrm{Cov}_{x'}$, $\lambda_i$ represents the i-th eigenvalue of the covariance matrix $\mathrm{Cov}_{x'}$, m represents the number of columns of the data set X, and k represents the number of columns of the principal component matrix $P_{mk}$.
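The PCA procedure of claims 5 and 6 can be illustrated with a short NumPy sketch. This is a simplified reading rather than the claimed procedure itself: it scales every column to [0, 1], takes the eigen-decomposition of the covariance matrix once, keeps the smallest k whose cumulative contribution ratio from formula (2) reaches the threshold, and projects with formula (1). The function name `pca_reduce` and the default threshold of 0.85 are assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, threshold=0.85):
    """Simplified PCA reduction sketch (assumes at least two feature columns).

    Returns (Z, P, phi): the reduced n-by-k matrix, the m-by-k principal
    component matrix, and the cumulative variance contribution ratios.
    """
    X = np.asarray(X, dtype=float)

    # Min-max scale each column to [0, 1]; constant columns are left at 0.
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0
    X01 = (X - col_min) / span

    # Covariance matrix of the features and its eigen-decomposition.
    cov = np.cov(X01, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

    # Sort eigenvalues (and matching unit eigenvectors) in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Formula (2): phi(k) = sum of the k largest eigenvalues / sum of all.
    phi = np.cumsum(eigvals) / eigvals.sum()

    # Smallest k whose contribution ratio reaches the threshold.
    k = min(int(np.searchsorted(phi, threshold)) + 1, X.shape[1])

    P = eigvecs[:, :k]   # m-by-k principal component matrix
    Z = X01 @ P          # formula (1): Z = X * P
    return Z, P, phi


# Hypothetical data: 50 samples with 6 numeric features.
demo_X = np.random.default_rng(0).normal(size=(50, 6))
demo_Z, demo_P, demo_phi = pca_reduce(demo_X)
print(demo_Z.shape, demo_P.shape)
```

Projecting the scaled data without mean-centering follows formula (1) literally; many PCA implementations center the data first, which shifts the projected coordinates but not the selected components.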
7. The intrusion detection method for complex stream data event analysis according to claim 1, characterized in that the sample reduction in the data preprocessing specifically refers to compressive sampling of the input data set based on a compressed sensing algorithm, and, when training the classifier in step 1), the sampling number N used for the compressive sampling is selected by observing the accuracy of the classifier's detection results.
8. The intrusion detection method for complex stream data event analysis according to claim 7, characterized in that the detailed steps of the compressive sampling of the input data set based on the compressed sensing algorithm comprise:
B1) constructing a dictionary D based on a sparse basis;
B2) sampling the continuously input data based on a given sampling number N;
B3) sparsely representing the sampled data set X' of n rows and m' columns on the dictionary D, and constructing an observation matrix;
B4) observing the data set X' with the observation matrix to obtain a compressed matrix X'' of n' rows and m' columns;
B5) calculating the classification accuracy corresponding to the data set X' and to the compressed matrix X'';
B6) judging whether the classification accuracy is greater than a preset classification accuracy threshold; if so, jumping to step B3), otherwise jumping to step B7);
B7) outputting the data matrix X'' of n' rows and m' columns as the result of the compressive sampling.
9. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the dictionary D based on a sparse basis constructed in step B1) is specifically a DCT dictionary.
10. The intrusion detection method for complex stream data event analysis according to claim 8, characterized in that the observation matrix constructed in step B2) is an independent and identically distributed Gaussian random matrix.
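The compressive sampling of claims 8 to 10 can be sketched with NumPy as follows. The sketch covers only the linear-algebra core of steps B1), B3) and B4): an orthonormal DCT dictionary, an i.i.d. Gaussian observation matrix, and the observation that reduces n rows to n' rows. The accuracy loop of steps B5) and B6) and the handling of class labels are omitted, and the function names, the 1/sqrt(n') scaling and the fixed seed are assumptions made for illustration.

```python
import numpy as np


def dct_dictionary(n):
    """Orthonormal DCT-II dictionary of size n-by-n (columns are atoms)."""
    k = np.arange(n)[:, None]   # frequency index
    j = np.arange(n)[None, :]   # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)     # scale the DC row so that C is orthonormal
    return C.T


def compress_samples(X, n_out, seed=0):
    """Reduce the n rows of X to n_out rows via a Gaussian observation.

    Returns (X_comp, Phi, D, S) where X = D @ S and X_comp = Phi @ D @ S.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    D = dct_dictionary(n)       # B1) dictionary on a sparse basis
    S = D.T @ X                 # B3) coefficients of X on D

    # B3) i.i.d. Gaussian observation matrix (scaling is an assumption).
    rng = np.random.default_rng(seed)
    Phi = rng.normal(0.0, 1.0 / np.sqrt(n_out), size=(n_out, n))

    X_comp = Phi @ D @ S        # B4) observed matrix with n_out rows
    return X_comp, Phi, D, S


# Hypothetical data: 64 samples with 5 features, compressed to 16 rows.
demo_data = np.random.default_rng(1).normal(size=(64, 5))
demo_comp, demo_Phi, demo_D, demo_S = compress_samples(demo_data, 16)
print(demo_comp.shape)
```

Because the dictionary is orthonormal, observing the coefficients through `Phi @ D` gives the same result as applying `Phi` directly to the data; the dictionary matters when the original samples have to be recovered from the observations.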

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710146332.4A CN106951778A (en) 2017-03-13 2017-03-13 A kind of intrusion detection method towards complicated flow data event analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710146332.4A CN106951778A (en) 2017-03-13 2017-03-13 A kind of intrusion detection method towards complicated flow data event analysis

Publications (1)

Publication Number Publication Date
CN106951778A true CN106951778A (en) 2017-07-14

Family

ID=59468268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710146332.4A Pending CN106951778A (en) 2017-03-13 2017-03-13 A kind of intrusion detection method towards complicated flow data event analysis

Country Status (1)

Country Link
CN (1) CN106951778A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009096903A1 (en) * 2008-01-28 2009-08-06 National University Of Singapore Lipid tumour profile
CN101968813A (en) * 2010-10-25 2011-02-09 华北电力大学 Method for detecting counterfeit webpage
CN102158486A (en) * 2011-04-02 2011-08-17 华北电力大学 Method for rapidly detecting network invasion
CN103440513A (en) * 2013-09-17 2013-12-11 西安电子科技大学 Method for determining specific visual cognition state of brain based on sparse nonnegative tensor factorization (SNTF)
CN103618744A (en) * 2013-12-10 2014-03-05 华东理工大学 Intrusion detection method based on fast k-nearest neighbor (KNN) algorithm
CN105160295A (en) * 2015-07-14 2015-12-16 东北大学 Rapid high-efficiency face identification method for large-scale face database
CN105897517A (en) * 2016-06-20 2016-08-24 广东电网有限责任公司信息中心 Network traffic abnormality detection method based on SVM (Support Vector Machine)
CN106407905A (en) * 2016-08-31 2017-02-15 电子科技大学 Machine learning-based wireless sensing motion identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
闫敬文: "《压缩感知及应用》", 31 October 2015 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590697A (en) * 2017-09-18 2018-01-16 北京京东尚科信息技术有限公司 Data processing method and its system
CN110191081A (en) * 2018-02-22 2019-08-30 上海交通大学 The Feature Selection system and method for network flow attack detecting based on learning automaton
CN109583904B (en) * 2018-11-30 2023-04-07 深圳市腾讯计算机系统有限公司 Training method of abnormal operation detection model, abnormal operation detection method and device
CN109583904A (en) * 2018-11-30 2019-04-05 深圳市腾讯计算机系统有限公司 Training method, impaired operation detection method and the device of abnormal operation detection model
CN109784668A (en) * 2018-12-21 2019-05-21 国网江苏省电力有限公司南京供电分公司 A kind of sample characteristics dimension-reduction treatment method for electric power monitoring system unusual checking
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A kind of network intrusions method for detecting abnormality based on machine learning
CN109962909B (en) * 2019-01-30 2021-05-14 大连理工大学 Network intrusion anomaly detection method based on machine learning
CN112149818A (en) * 2019-06-27 2020-12-29 北京数安鑫云信息技术有限公司 Threat identification result evaluation method and device
CN112149818B (en) * 2019-06-27 2024-04-09 北京数安鑫云信息技术有限公司 Threat identification result evaluation method and device
CN110401649A (en) * 2019-07-17 2019-11-01 湖北央中巨石信息技术有限公司 Information Security Risk Assessment Methods and system based on Situation Awareness study
CN110610148A (en) * 2019-09-02 2019-12-24 南京邮电大学 Privacy protection-oriented compressed sensing visual shielding video behavior identification method
CN110610148B (en) * 2019-09-02 2022-02-08 南京邮电大学 Privacy protection-oriented compressed sensing visual shielding video behavior identification method
CN112437053A (en) * 2020-11-10 2021-03-02 国网北京市电力公司 Intrusion detection method and device
CN112437053B (en) * 2020-11-10 2023-06-30 国网北京市电力公司 Intrusion detection method and device
CN113254925B (en) * 2021-02-01 2022-11-15 中国人民解放军海军工程大学 Network intrusion detection system based on PCA and SVM
CN113254925A (en) * 2021-02-01 2021-08-13 中国人民解放军海军工程大学 Network intrusion detection system based on PCA and SVM

Similar Documents

Publication Publication Date Title
CN106951778A (en) A kind of intrusion detection method towards complicated flow data event analysis
CN111967502B (en) Network intrusion detection method based on conditional variation self-encoder
CN109886020B (en) Software vulnerability automatic classification method based on deep neural network
CN108737406A (en) A kind of detection method and system of abnormal flow data
CN109948125B (en) Method and system for improved Simhash algorithm in text deduplication
CN102346829A (en) Virus detection method based on ensemble classification
CN104346459B (en) A kind of text classification feature selection approach based on term frequency and chi
CN102142082B (en) Virtual sample based kernel discrimination method for face recognition
CN109190698B (en) Classification and identification system and method for network digital virtual assets
WO2022121163A1 (en) User behavior tendency identification method, apparatus, and device, and storage medium
CN107203750B (en) Hyperspectral target detection method based on combination of sparse expression and discriminant analysis
CN112437053B (en) Intrusion detection method and device
CN113505826B (en) Network flow anomaly detection method based on joint feature selection
CN112820416A (en) Major infectious disease queue data typing method, typing model and electronic equipment
CN112884570A (en) Method, device and equipment for determining model security
CN116204831A (en) Road-to-ground analysis method based on neural network
CN112085062A (en) Wavelet neural network-based abnormal energy consumption positioning method
CN113283901B (en) Byte code-based fraud contract detection method for block chain platform
CN104616027A (en) Non-adjacent graph structure sparse face recognizing method
CN114999628B (en) Method for searching for obvious characteristic of degenerative knee osteoarthritis by using machine learning
CN105824785A (en) Rapid abnormal point detection method based on penalized regression
CN106709598B (en) Voltage stability prediction and judgment method based on single-class samples
CN112507299B (en) Self-adaptive keystroke behavior authentication method and device in continuous identity authentication system
CN111950717B (en) Public opinion quantification method based on neural network
CN111382273B (en) Text classification method based on feature selection of attraction factors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170714