CN108632278A

CN108632278A - A network intrusion detection method combining PCA with Bayes

Info

Publication number: CN108632278A
Application number: CN201810433476.2A
Authority: CN
Inventors: 胡昌振; 任家东; 刘智扬; 张炳; 赵小林; 单纯
Original assignee: Yanshan University; Beijing Institute of Technology BIT
Current assignee: Yanshan University; Beijing Institute of Technology BIT
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2018-10-09

Abstract

The invention discloses a network intrusion detection method that combines PCA with a Bayes classifier. The invention detects both common, general attack types and new attack types quickly and effectively, with short detection time and high accuracy. It first applies PCA to the training data set and the test data set to obtain dimension-reduced training and test data. This reduces the model training time and the detection time of the Bayes classifier. It then performs intrusion detection with the Bayes classifier, which has the fastest detection time, so detection is rapid. In addition, the invention improves PCA to raise detection accuracy, so that the method performs well in both detection time and detection accuracy.

Description

A network intrusion detection method combining PCA with Bayes

Technical field

The present invention relates to the field of intrusion detection, and in particular to a network intrusion detection method combining PCA with Bayes.

Background art

The Internet brings people convenience, but it also brings many security problems. Network attacks occur all the time, and research on network intrusion detection has practical significance and is a major challenge in the current network security field.

Dorothy Denning defined intrusion detection in 1987 as follows: by monitoring network data, intrusion behavior is detected, and an alarm is raised and a response is made before the intrusion causes damage (Denning, D.E. 1987. An Intrusion-Detection Model [J]. IEEE Transactions on Software Engineering, SE-13(2): 222-232). An important property of intrusion detection is therefore timeliness: the detection method must judge attack information quickly so that an alarm can be raised before harm occurs. Traditional intrusion detection methods fall into two main classes. The first class is rule-based intrusion detection: the features of a particular attack type are analyzed in advance, the attack signature is recorded in a rule file, and detection is performed by matching against the rule file. Because rule-based intrusion detection is fast, such methods are mainly used in commercial or open-source intrusion detection systems (IDS); for example, the Snort intrusion detection system uses this approach (Park, W., Ahn, S. 2017. Performance Comparison and Detection Analysis in Snort and Suricata Environment [J]. Wireless Personal Communications, 94(2): 241-252; Gaddam, R., Nandhini, M. 2017. An Analysis of Various Snort Based Techniques to Detect and Prevent Intrusions in Networks Proposal with Code Refactoring Snort Tool in Kali Linux Environment [J]. International Conference on Inventive Communication and Computational Technologies (ICICCT): 10-15). A significant problem with this approach, however, is that it cannot detect new attack types; it can only detect attack types already present in the rule base. Attackers' methods change constantly, new attack types appear frequently, and new attack types are often more harmful. This approach also suffers from a high false alarm rate. The second class of methods arose with the growth of machine learning and data mining in recent years, when data mining methods were applied to intrusion detection. Data mining methods train a model on a labeled data set and then perform intrusion detection with the trained model, which works well for detecting unknown attack types; examples include SVM (Dong S K, Park J S. 2003. Network-Based Intrusion Detection with Support Vector Machines [M] // Information Networking. Berlin: Springer: 747-756; Yendrapalli, K., Mukkamala, S., Sung, A.H., Ribeiro, B. 2007. World Congress on Engineering: 321-325) and neural networks (Ryan, J., Lin, M.J., Miikkulainen, R. 1997. Intrusion Detection with Neural Networks [J]). Applying data mining to intrusion detection requires collecting a large data set in advance, which limits online intrusion detection (Huang, C.-T., Chang, R.K.C., Huang, P. 2009. Editorial: signal processing applications in network intrusion detection systems. EURASIP J. Adv. Signal Process. 2009, 9: 1–9:2).

Traditional intrusion detection methods concentrate on data mining (He, W., Hu, G., Yao, X., Kan, G., Wang, H., Xiang, H. 2008. Applying multiple time series data mining to large-scale network traffic analysis. In: Proceedings of 2008 IEEE Conference on Cybernetics and Intelligent Systems, pp. 394–399; Ghourabi, A., Abbes, T., Bouhoula, A. 2010. Data analyzer based on data mining for honeypot router. In: Proceedings of 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), IEEE, pp. 1-6; Osanaiye, O., Choo, K.-K.R., Dlodlo, M. 2016. Distributed denial of service (DDoS) resilience in cloud: review and conceptual cloud DDoS mitigation framework. J. Netw. Comput. Appl. 67, 147-165) and on ordinary file analysis (Raynal, F., Berthier, Y., Biondi, P., Kaminsky, D. 2004. Honeypot forensics. In: Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004, IEEE, pp. 22-29). An W.J. et al. combined the minimum within-class scatter from Fisher discriminant analysis with the traditional support vector machine and applied it to intrusion detection, proposing the minimum within-class scatter SVM (WCS-SVM), which achieves a better intrusion detection rate than the traditional SVM (An W.J., Liang M.G. 2013. A new intrusion detection method based on SVM with minimum within-class scatter [J]. Security and Communication Networks, 6: 1064-1074). Kabir, E. et al. proposed an intrusion detection system based on the least squares support vector machine (LS-SVM) (Kabir, E., Hu, J.K., Wang, H., Zhuo, G.P. 2017. A novel statistical technique for intrusion detection systems [J]. Future Generation Computer Systems: 303-318). Mrudula Gudadhe et al. applied a new boosted decision tree method to intrusion detection, in which multiple decision trees are combined into one classifier (Mrudula Gudadhe, Prakash Prasad, Kapil Wankhade. 2010. A New Data Mining Based Network Intrusion Detection Model [J]. Computer and Communication Technology (ICCCT 2010), IEEE, pp. 731-735). Sufyan T. et al. applied a back-propagation artificial neural network model to intrusion detection, which makes the IDS more efficient, lets the system adapt to new environments, and lets it cope with new attack types (Sufyan T. Faraj Al-Janabi, Hadeel Amjed Saeed. 2011. A Neural Network Based Anomaly Intrusion Detection System [J]. Developments in E-systems Engineering, IEEE, pp. 221-226). Because network data sets are very large and labeling classes manually is time-consuming and laborious, clustering methods have been introduced for classifying data sets (Deepthy K Denatious, Anita John. 2012. Survey on Data Mining Techniques to Enhance Intrusion Detection [J]. Computer Communication and Informatics (ICCCI-2012), Jan. 10–12, 2012, Coimbatore, INDIA, IEEE). The Y-means clustering algorithm for intrusion detection (Yu Guan, Ali A. Ghorbani, Nabil Belacel. 2003. Y-Means: A Clustering Method For Intrusion Detection [J]. Electrical and Computer Engineering, IEEE, pp. 1083-1086) overcomes two shortcomings of k-means clustering: dependence on the number of clusters, and degeneracy. It automatically divides the data set into an appropriate number of clusters, showing that intrusion detection by cluster analysis is feasible and effective. K-means (LI Han. 2010. Research and Implementation of an Anomaly Detection Model Based on Clustering Analysis [J]. Intelligence Information Processing and Trusted Computing (IPTC 2010), IEEE, Huanggang, China, pp. 458-462) is the simplest partitioning algorithm and solves the well-known clustering problem. A clustering algorithm using SOM and k-means (WANG Huai-bin, YANG Hong-liang, XU Zhi-jian, YUAN Zheng. 2010. A clustering algorithm use SOM and K-Means in Intrusion Detection [J]. E-Business and E-Government, IEEE, pp. 1281-1284) overcomes the shortcoming that traditional SOM cannot provide precise clustering results, as well as the shortcomings that traditional k-means depends on its initial values and has difficulty finding cluster centers. A parallel clustering ensemble algorithm proposed for IDS (Hongwei Gao, Dingju Zhu, Xiaomin Wang. 2010. A Parallel Clustering Ensemble Algorithm for Intrusion Detection System. In: Proceedings of 2010 Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science, IEEE, pp. 450-453) achieves high speed, a high detection rate and a low false alarm rate. An ANN classifier (Akashdeep, Manzoor I., Kumar N. 2017. A feature reduced intrusion detection system using ANN classifier [J]. Expert Systems with Applications: 249-257) also performs well in intrusion detection. Hybrid learning methods can reach a higher detection rate and a lower false alarm rate (Z. Muda, W. Yassin, M.N. Sulaiman, N.I. Udzir. 2011. Intrusion Detection based on K-Means Clustering and Naive Bayes Classification [J]. 7th International Conference on IT in Asia (CITA), IEEE; Moriteru Ishida, Hiroki Takakura, Yasuo Okabe. 2011. High-Performance Intrusion Detection Using OptiGrid Clustering and Grid-based Labelling [J]. International Symposium on Applications and the Internet, IEEE, pp. 11-19; Hari Om, Aritra Kundu. 2012. A Hybrid System for Reducing the False Alarm Rate of Anomaly Intrusion Detection System [J]. Information Technology (RAIT-2012), IEEE); for example, combining clustering with classification gives good results. In addition, Syed Ali Raza Shah et al. compared the detection performance of machine learning methods applied directly to the Snort intrusion detection system (Syed Ali Raza Shah, Biju Issac. 2017. Performance comparison of intrusion detection systems and application of machine learning to Snort system [J]. Future Generation Computer Systems: 157-170).

Besides the data-mining-based intrusion detection methods above, flow-based intrusion detection (Umer, M.F., Sher, B.Y., Bi, Y.X. 2017. Flow-based intrusion detection: Techniques and challenges [J]. Computers & Security: 238-254) is an innovative approach to detecting intrusions in high-speed networks. Flow-based intrusion detection inspects only packet headers and does not analyze packet payloads. Filter methods rank features by predefined criteria and remove irrelevant features from the data set (Bermejo, P., Ossa, L., Gamez, J.A., Puerta, J.M. 2012. Fast wrapper feature subset selection in high dimensional datasets by means of filter re-ranking. Journal of Knowledge Based Systems, 25, 35-44).

Traditional file-analysis intrusion detection methods may be effective against common, general attack types, but not against new attack techniques. Common data mining methods can handle new attack types, but their detection time is too long.

Summary of the invention

In view of this, the present invention provides a network intrusion detection method combining PCA with Bayes, which detects common, general attack types and new attack types quickly and effectively, with short detection time and high accuracy.

The network intrusion detection method combining PCA with Bayes according to the present invention comprises the following steps:

Step 1: extract the network data features of each network connection record in the network training data set and the network test data set, and build a training network data feature matrix and a test network data feature matrix;

Step 2: reduce the dimensionality of the training network data feature matrix and of the test network data feature matrix separately by principal component analysis, obtaining the dimension-reduced training and test network data feature matrices; during principal component analysis, the first 3 eigenvectors are weighted with a weight coefficient k between 0 and 1;

Step 3: build a Bayes classifier and train it with the dimension-reduced training network data feature matrix;

Step 4: perform intrusion detection on the dimension-reduced test network data feature matrix with the trained Bayes classifier.

Further, in step 1, the network data features include the basic features of a TCP connection, the content features of a TCP connection, and network traffic statistical features.

Further, in step 2, the values under each feature of the training and test network data feature matrices obtained in step 1 are first normalized, giving normalized training and test network data feature matrices; principal component analysis is then applied to the normalized matrices to obtain the dimension-reduced training and test network data feature matrices.

Further, k = 10^-4 to 10^-6.

Further, the Bayes classifier is a Gaussian naive Bayes classifier.

Beneficial effects:

The present invention performs intrusion detection quickly with a Bayes classifier and can detect both existing attack types and unknown new attack types. Principal component analysis reduces the dimensionality of the network data, which further shortens detection time, and the improved principal component analysis raises detection accuracy. Compared with traditional data-mining-based network intrusion detection methods, the method of the present invention does not greatly improve detection accuracy, but its detection time is far shorter than that of other intrusion detection methods.

Description of the drawings

Fig. 1 is a flow chart of the intrusion detection method of the present invention.

Fig. 2 is a line chart comparing detection rates.

Fig. 3 is a line chart comparing detection times.

Detailed description of the embodiments

The present invention will now be described in detail with reference to the accompanying drawings and examples.

As network intrusion methods keep changing, new attack types constantly appear, and traditional detection methods based on rule files cannot adapt to them; the present invention therefore adopts a data-mining-based intrusion detection method. However, most existing data-mining intrusion detection methods have long detection times and cannot detect intrusions in time. To address this, the present invention provides a network intrusion detection method combining PCA with Bayes. PCA is first applied to the training data set D and the test data set T to obtain dimension-reduced training and test data, which reduces the model training time of the Bayes classifier and greatly shortens detection time. Intrusion detection is then performed with the Bayes classifier, which has the fastest detection time; detection is fast and also applies to new attack types. Shortening detection time, however, also costs detection rate, so the present invention improves PCA to raise detection accuracy, making the method efficient in both detection time and detection accuracy. The flow of the intrusion detection method of the present invention is shown in Fig. 1 and comprises the following steps:

Step 1: extract the network data features of each network connection record in the network training data set and the network test data set

The network data features mainly include the basic features of a TCP connection, the content features of a TCP connection, and network traffic statistical features. They are not limited to these three groups; any feature that reflects network traffic can be used. They are described as follows:

The basic features of a TCP connection contain basic attributes of the connection, 9 attributes in total: (1) duration: the connection duration in seconds, counted from the establishment of the TCP connection by the three-way handshake until the connection ends with FIN/ACK; (2) protocol_type: the protocol type, one of TCP, UDP and ICMP; (3) service: the network service type of the destination host; (4) flag: the normal or error status of the connection; (5) src_bytes: the number of data bytes sent from the source host to the destination host; (6) dst_bytes: the number of data bytes sent from the destination host to the source host; (7) land: 1 if the connection is from/to the same host/port, otherwise 0; (8) wrong_fragment: the number of wrong fragments; (9) urgent: the number of urgent packets.

The content features of a TCP connection comprise 13 kinds in total: (1) hot: the number of accesses to sensitive system files and directories; (2) num_failed_logins: the number of failed login attempts; (3) logged_in: 1 if the login succeeded, otherwise 0; (4) num_compromised: the number of compromised conditions; (5) root_shell: 1 if a root shell was obtained, otherwise 0; (6) su_attempted: 1 if the "su root" command was issued, otherwise 0; (7) num_root: the number of root accesses; (8) num_file_creations: the number of file creation operations; (9) num_shells: the number of shell commands used; (10) num_access_files: the number of operations on access control files; (11) num_outbound_cmds: the number of outbound commands in an ftp session; (12) is_hot_login: whether the login belongs to the "hot" list; (13) is_guest_login: 1 if the login is a guest login, otherwise 0.

The network traffic statistical features comprise 9 kinds in total: (1) count: in the past two seconds, the number of connections to the same destination host as the current connection; (2) srv_count: in the past two seconds, the number of connections to the same service as the current connection; (3) serror_rate: in the past two seconds, among connections to the same destination host as the current connection, the percentage with "SYN" errors; (4) srv_serror_rate: in the past two seconds, among connections to the same service as the current connection, the percentage with "SYN" errors; (5) rerror_rate: in the past two seconds, among connections to the same destination host as the current connection, the percentage with "REJ" errors; (6) srv_rerror_rate: in the past two seconds, among connections to the same service as the current connection, the percentage with "REJ" errors; (7) same_srv_rate: in the past two seconds, among connections to the same destination host as the current connection, the percentage to the same service as the current connection; (8) diff_srv_rate: in the past two seconds, among connections to the same destination host as the current connection, the percentage to services different from the current connection; (9) srv_diff_host_rate: in the past two seconds, among connections to the same service as the current connection, the percentage to destination hosts different from the current connection.

The basic features of a TCP connection are denoted B; there are 9 of them, so B = {b1, b2, ..., b9}. The content features of a TCP connection are denoted C; there are 13 of them, so C = {c1, c2, ..., c13}. The network traffic statistical features are denoted F; there are 9 of them, so F = {f1, f2, ..., f9}. The training network data set is denoted D and the test network data set is denoted T. The network connection records in the two data sets are D_i and T_i, respectively.

Definition 1: a record D_i in the training set and a record T_i in the test set are defined as follows:

D_i = {B, C, F}, i = 1, 2, ..., n (the training set contains n records)

T_i = {B, C, F}, i = 1, 2, ..., m (the test set contains m records)
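To make Definition 1 concrete, the sketch below groups a 31-value connection record into the B, C and F sets. It assumes, as in the worked example later in this description, that protocol_type, service and flag are already integer-encoded and that the values appear in the order listed above; the class name ConnectionRecord is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Feature names in the order listed above: 9 basic (B), 13 content (C), 9 traffic (F).
BASIC = ["duration", "protocol_type", "service", "flag", "src_bytes",
         "dst_bytes", "land", "wrong_fragment", "urgent"]
CONTENT = ["hot", "num_failed_logins", "logged_in", "num_compromised",
           "root_shell", "su_attempted", "num_root", "num_file_creations",
           "num_shells", "num_access_files", "num_outbound_cmds",
           "is_hot_login", "is_guest_login"]
TRAFFIC = ["count", "srv_count", "serror_rate", "srv_serror_rate",
           "rerror_rate", "srv_rerror_rate", "same_srv_rate",
           "diff_srv_rate", "srv_diff_host_rate"]


@dataclass
class ConnectionRecord:
    """One record D_i or T_i = {B, C, F} (hypothetical helper)."""
    b: List[float]  # basic TCP connection features
    c: List[float]  # TCP content features
    f: List[float]  # network traffic statistical features

    @classmethod
    def from_row(cls, row: Sequence[float]) -> "ConnectionRecord":
        n_b, n_c = len(BASIC), len(CONTENT)
        if len(row) != n_b + n_c + len(TRAFFIC):
            raise ValueError(f"expected 31 values, got {len(row)}")
        return cls(list(row[:n_b]), list(row[n_b:n_b + n_c]), list(row[n_b + n_c:]))

    def as_vector(self) -> List[float]:
        """Flatten back to one row of the feature matrix D or T."""
        return self.b + self.c + self.f


# First training record from the worked example later in this description.
row = [0, 2, 19, 8, 181, 5450, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       8, 8, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00]
record = ConnectionRecord.from_row(row)
print(dict(zip(BASIC, record.b))["dst_bytes"])  # 5450
```

Keeping the three groups separate makes it easy to experiment with subsets of features, while as_vector produces the flat row that goes into the feature matrix.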

Step 2: reduce the dimensionality of the training network data feature matrix and the test network data feature matrix with the improved principal component analysis (IPCA) method

Principal component analysis (PCA) uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables, of which the first principal component has the largest variance; it is a common statistical technique for dimensionality reduction. Before intrusion detection with the Bayes classifier, the network data are reduced in dimension by principal component analysis, which greatly shortens detection time.

Definition 2: since principal component analysis is applied to the training data and the test data in the same way, D and T are written in matrix form, giving the training network data feature matrix D and the test network data feature matrix T. The data matrix is denoted X, so X = D or X = T. A connection record (one row) of X is:

$$X_i = D_i \;\text{or}\; T_i \qquad (1)$$

Principal component analysis of X proceeds as follows: first, remove the mean of the data; second, compute the covariance matrix of the data matrix; third, compute the eigenvalues and eigenvectors of the covariance matrix; fourth, sort the eigenvalues from largest to smallest; fifth, multiply the data matrix X by the eigenvectors corresponding to the top d' eigenvalues to obtain the dimension-reduced training data set D' or test data set T'. Results are better with d' = 5 to 8.
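The five steps above can be sketched in Python with NumPy (an assumed dependency; the description names no library). Rows of X are connection records. The sketch projects the mean-removed matrix, which differs from projecting X itself only by a constant offset in each output component.

```python
import numpy as np


def pca(X, d_prime):
    """Plain PCA following the five steps above; X is n x d with one record per row."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # step 1: remove the mean of each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # step 2: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]        # step 4: indices of eigenvalues, largest first
    W = eigvecs[:, order[:d_prime]]          # eigenvectors of the top d' eigenvalues, as columns
    return Xc @ W                            # step 5: n x d' reduced data


rng = np.random.default_rng(0)
D = rng.random((200, 31))          # stand-in for a normalized 31-feature training matrix
D_reduced = pca(D, d_prime=8)      # d' = 5 to 8 per the description
print(D_reduced.shape)             # (200, 8)
```

numpy.linalg.eigh is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit descending sort.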

Reducing the dimensionality of network data with principal component analysis improves detection speed but loses detection accuracy. The present invention therefore improves principal component analysis to raise detection accuracy.

The first three eigenvectors of PCA are considered to capture the overall information of the network data. When the network data are affected by some factor, the first three eigenvectors may be the most severely contaminated; processing them appropriately can reduce the influence of that factor to some extent and thus improve detection accuracy. The present invention therefore weights the first three eigenvectors, that is:

$$\omega' = (k\omega_1, k\omega_2, k\omega_3, \omega_4, \ldots, \omega_{d'}) \qquad (2)$$

where ω' is the improved eigenvector matrix and k is the introduced weight coefficient, a number between 0 and 1 that reduces the weight of the first three components and thus the degree to which the data are contaminated. Finally, ω' is used to reduce the dimensionality of the data matrix X. The pseudocode of the improved principal component analysis IPCA is as follows.

Algorithm: IPCA

Input: training network data feature matrix or test network data feature matrix X = D or T (n × d, one connection record per row)

Output: dimension-reduced training or test network data feature matrix X' = D' or T'

1. X_c = X - mean(X) // remove the mean of each column

2. Compute the covariance matrix X_c^T X_c of the data matrix X_c

3. Compute the eigenvalues λ and eigenvectors of the covariance matrix X_c^T X_c

4. Sort the eigenvalues and take the top d': λ₁ ≥ λ₂ ≥ … ≥ λ_d'

5. ω = (ω₁, ω₂, ω₃, ω₄, …, ω_d'), where ω₁, ω₂, ω₃, ω₄, …, ω_d' are the eigenvectors corresponding to λ₁, λ₂, λ₃, λ₄, …, λ_d'

6. ω' = (kω₁, kω₂, kω₃, ω₄, …, ω_d')

7. X' = X_c ω'

The first line removes the mean of the data matrix, and the second line computes its covariance matrix. The third and fourth lines compute the eigenvalues and eigenvectors of the covariance matrix and sort the eigenvalues from largest to smallest. The eigenvectors corresponding to the top d' eigenvalues, chosen in the fifth line, are exactly the solution of traditional PCA. The sixth line weights the traditional PCA solution to obtain the new solution, and the last line multiplies the data matrix by the new solution to reduce the data to d' dimensions. The value of the weight coefficient is determined experimentally; preferably, k = 10^-4 to 10^-6.
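A runnable counterpart of the IPCA pseudocode is sketched below in Python with NumPy (an assumed dependency). The defaults d_prime=8 and k=1e-5 are taken from the ranges stated in this description (d' = 5 to 8, k = 10^-4 to 10^-6) and are illustrative, not tuned values. The comments map each statement to the numbered pseudocode lines.

```python
import numpy as np


def ipca(X, d_prime=8, k=1e-5):
    """Improved PCA: plain PCA whose first three eigenvectors are scaled by k (formula (2)).

    X is an n x d matrix with one (normalized) connection record per row.
    """
    X = np.asarray(X, dtype=float)
    if not 3 <= d_prime <= X.shape[1]:
        raise ValueError("d_prime must lie between 3 and the number of features")
    if not 0 <= k <= 1:
        raise ValueError("k must lie between 0 and 1")
    Xc = X - X.mean(axis=0)                      # line 1: remove the mean
    cov = Xc.T @ Xc                              # line 2: covariance matrix (unscaled)
    eigvals, eigvecs = np.linalg.eigh(cov)       # line 3: eigenvalues and eigenvectors
    top = np.argsort(eigvals)[::-1][:d_prime]    # line 4: top d' eigenvalues, largest first
    omega = eigvecs[:, top]                      # line 5: traditional PCA solution
    weights = np.ones(d_prime)
    weights[:3] = k                              # line 6: omega' = (k w1, k w2, k w3, w4, ...)
    omega_prime = omega * weights                # scales column j of omega by weights[j]
    return Xc @ omega_prime                      # line 7: n x d' reduced data X'


rng = np.random.default_rng(1)
X = rng.random((100, 31))
X_reduced = ipca(X, d_prime=8, k=1e-5)
print(X_reduced.shape)  # (100, 8)
```

Because the weighting only rescales columns of ω, the first three output components are the ordinary PCA components multiplied by k, while components 4 to d' are unchanged.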

To eliminate the effect of differing scales between features, the present invention normalizes the data matrix X before applying principal component analysis to it. Specifically, each column of X (i.e. the values of one feature across different network connection records) is normalized separately: first, the maximum and minimum of the values under that feature are found; then each value under that feature is mapped by the following formula to a new value in the range 0 to 1:

$$\tilde{X}_{i,j} = \frac{X_{i,j} - X_{j,\min}}{X_{j,\max} - X_{j,\min}} \qquad (3)$$

where X̃_{i,j} is the normalized value of X_{i,j}, X_{j,max} is the maximum of column j of X, and X_{j,min} is the minimum of column j of X.

Principal component analysis is then applied to the normalized matrix X̃ to obtain the dimension-reduced data matrix X'.
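Formula (3) can be sketched column-wise with NumPy as below. The description does not say what happens when a feature is constant (maximum equals minimum, so formula (3) would divide by zero); the sketch assumes such a column maps to 0, which is a choice of this example only.

```python
import numpy as np


def min_max_normalize(X):
    """Apply formula (3) to every column of X, mapping each feature to [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)            # X_{j,min}
    col_max = X.max(axis=0)            # X_{j,max}
    span = col_max - col_min
    span[span == 0] = 1.0              # assumption: constant columns become all zeros
    return (X - col_min) / span


X = np.array([[0, 181, 5],
              [10, 239, 5],
              [5, 200, 5]])
print(min_max_normalize(X))
```

Each column is scaled independently, so a feature measured in bytes and a feature expressed as a rate end up on the same 0 to 1 scale before principal component analysis.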

Step 3: build a Bayes classifier and train it with the dimension-reduced training network data matrix

In principle, any kind of Bayes classifier can be used in the present invention, such as a Gaussian naive Bayes classifier or a Bernoulli naive Bayes classifier. This embodiment uses the Gaussian naive Bayes classifier as an example.

When all relevant probabilities are known, Bayesian decision theory considers how to choose the optimal class label based on these probabilities and the misclassification loss. In intrusion detection, each network flow must be judged normal or abnormal, so there are 2 possible class labels γ = {c₁, c₂}, where c₁ denotes the normal class and c₂ the abnormal class. For each connection record x = X_i, the class label that maximizes the posterior probability P(c | x) is selected. By Bayes' theorem, P(c | x) can be written as

$$P(c \mid x) = \frac{P(c)\,P(x \mid c)}{P(x)} \qquad (4)$$

where P(c) is the class prior probability, P(x | c) is the class-conditional probability of connection record x given class label c, and P(x) is the evidence factor used for normalization. For a given connection record x, the evidence factor P(x) does not depend on the class label, so P(c | x) depends only on P(c) and P(x | c).

The naive Bayes classifier adopts the "attribute conditional independence assumption": given the class, all attributes are assumed to be mutually independent, that is, each attribute is assumed to influence the classification result independently. Formula (4) can therefore be rewritten as

$$P(c \mid x) = \frac{P(c)\,P(x \mid c)}{P(x)} = \frac{P(c)}{P(x)} \prod_{i=1}^{d} P(x_i \mid c) \qquad (5)$$

where d is the number of attributes of each connection record (d = 31 if IPCA is not applied, and d = d' after dimensionality reduction) and x_i is the value of connection record x on the i-th attribute. Since P(x) is the same for all classes, the naive Bayes classifier is

$$h_{nb}(x) = \arg\max_{c \in \gamma} P(c) \prod_{i=1}^{d} P(x_i \mid c) \qquad (6)$$

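For continuous attributes, a probability density function can be used. Assume p(x_i | c) ~ N(μ_{c,i}, σ²_{c,i}), where μ_{c,i} and σ²_{c,i} are the mean and variance of the values of class-c samples on the i-th attribute; then

$$p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right) \qquad (7)$$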

For each connection record, the posterior probabilities of the normal and abnormal classes are computed separately, and the class with the larger value is taken as the result for that record.
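The decision rule of formulas (6) and (7) can be illustrated for a single record with the standard library only. The sketch works with logarithms, a common choice not stated in the description: it avoids underflow when many small densities are multiplied and does not change which class wins, because the logarithm is monotonic. The parameter values in the usage lines are made up for illustration.

```python
import math


def gaussian_log_pdf(x, mu, var):
    """Logarithm of formula (7) for one attribute value."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def classify(x, params):
    """Return the label maximizing formula (6), plus the log-score of every label.

    params maps a label to (prior P(c), per-attribute means, per-attribute variances).
    """
    scores = {}
    for label, (prior, means, variances) in params.items():
        log_likelihood = sum(gaussian_log_pdf(xi, mu, var)
                             for xi, mu, var in zip(x, means, variances))
        scores[label] = math.log(prior) + log_likelihood  # log P(c) + sum_i log p(x_i|c)
    return max(scores, key=scores.get), scores


# Hypothetical two-attribute parameters, for illustration only.
params = {
    "normal": (0.6, [0.0, 0.0], [1.0, 1.0]),
    "abnormal": (0.4, [3.0, 3.0], [1.0, 1.0]),
}
label, _ = classify([0.2, -0.1], params)
print(label)  # normal
```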

Step 4: perform intrusion detection on the dimension-reduced test network data matrix T' with the trained Bayes classifier.

Specifically, the Gaussian naive Bayes (GNB) classifier determines the class of each record of the dimension-reduced training set D' (during training) or test set T' (during testing) as follows. First, the conditional probability P(x_i | c) of each attribute is computed with formula (7); then the prior probabilities P(c) of the normal and abnormal classes are computed. Finally, formula (6) gives the (unnormalized) posterior probability that the record belongs to the normal or the abnormal class, and the class with the larger posterior probability is taken as the detection result for that record.
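A compact NumPy sketch of this training and testing flow follows. Training estimates P(c) and the per-class means and variances used in formula (7) from D'; testing scores each row of T' with formula (6) in log form. The small variance floor var_floor is an assumption added so that formula (7) stays finite for a zero-variance attribute, and the class name is illustrative. The usage lines use synthetic data, not the data set of the description.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: fit on D' (training), predict on T' (testing)."""

    def __init__(self, var_floor=1e-9):
        # Assumption: a small absolute floor keeps formula (7) finite for zero variance.
        self.var_floor = var_floor

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])      # log P(c)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])  # mu_{c,i}
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + self.var_floor
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Shapes: records x classes x attributes; log of formula (7) for every entry.
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                          + diff ** 2 / self.var_[None, :, :])
        scores = self.log_prior_[None, :] + log_lik.sum(axis=2)  # log of formula (6)
        return self.classes_[np.argmax(scores, axis=1)]


rng = np.random.default_rng(2)
D_prime = np.vstack([rng.normal(0.0, 1.0, (200, 8)), rng.normal(4.0, 1.0, (200, 8))])
labels = np.array(["normal"] * 200 + ["abnormal"] * 200)
model = GaussianNaiveBayes().fit(D_prime, labels)
print(model.predict(np.zeros((1, 8))))  # ['normal']
```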

Alarm mechanism: if the detection result is normal, nothing is done; if the network data are detected as abnormal, the user is alerted so that the intrusion can be discovered quickly and handled in time to avoid losses. The alarm mode is not the focus of the present invention. The full alert mode of the Snort intrusion detection system may be used, in which a directory is created for each IP address that triggers an alarm and the decoded packet data are recorded under the corresponding directory; besides the alarm information, the packet header information can also be output.
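A minimal, hypothetical alert hook is sketched below with the Python standard library. It only mimics the idea described above (one directory per alerting IP, with the warning plus header information); it does not use or reproduce Snort's own output format, and the function names, the file name alerts.log and the argument layout are all illustrative.

```python
from datetime import datetime, timezone
from pathlib import Path


def raise_alert(src_ip, message, header, log_root):
    """Append one alert line under log_root/<src_ip>/alerts.log and return that path."""
    ip_dir = Path(log_root) / src_ip
    ip_dir.mkdir(parents=True, exist_ok=True)       # one directory per alerting IP
    log_file = ip_dir / "alerts.log"
    stamp = datetime.now(timezone.utc).isoformat()
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp} ALERT {message} | header: {header}\n")
    return log_file


def handle_prediction(label, src_ip, header, log_root):
    """Normal traffic: do nothing. Abnormal traffic: alert the user."""
    if label == "normal":
        return None
    return raise_alert(src_ip, "possible intrusion detected", header, log_root)
```

In a deployment, handle_prediction would be called once per classified connection record with the label produced by the classifier.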

The model is tuned by repeatedly adjusting the weight coefficient of the improved PCA to find the most suitable value, so that the model detects better. The detection time is the sum of the model's training time and testing time, and the detection rate is the number of correctly detected records in the test set divided by the total number of test records.
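The two evaluation quantities defined above, and the search over the weight coefficient k, can be sketched with the standard library as follows. evaluate_k is a hypothetical callback that the reader would supply (for example: normalize, run IPCA with the given k, train and test the classifier, and return the detection rate). Timing uses time.perf_counter, an assumption about how detection time might be measured.

```python
import time


def detection_rate(predicted, actual):
    """Correctly detected test records divided by the total number of test records."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def detection_time(train, test):
    """Detection time = training time + testing time, measured with time.perf_counter."""
    start = time.perf_counter()
    train()
    test()
    return time.perf_counter() - start


def select_weight(candidates, evaluate_k):
    """Try each candidate k and keep the one with the highest detection rate."""
    results = {k: evaluate_k(k) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results


print(detection_rate(["normal", "abnormal", "normal", "normal"],
                     ["normal", "abnormal", "abnormal", "normal"]))  # 0.75
```

A candidate list such as [1e-4, 1e-5, 1e-6] would cover the preferred range given in this description.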

Worked example of the data processing:

The following example illustrates the data processing flow. The first record in the training set D is:

0,2,19,8,181,5450,0,0,0, (basic features of the TCP connection)

0,0,1,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)

8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00 (time-based network traffic statistical features)

After data set D is normalized according to formula (3), the first record is:

0,0.0018,0.0012,0.0016,0,0,0,0,0, (basic features of the TCP connection)

0,0,0.0037,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)

0,0,0.00,0.00,0.00,0.00,0.0016,0.00,0.00 (time-based network traffic statistical features)

All features have thus been mapped to the interval 0 to 1. After IPCA, the first connection record after dimensionality reduction is:

24.701,0,0,0.001,0,-0.001,0,0

Once the training data set D and the test data set T have been reduced in dimension, intrusion detection is performed with Gaussian naive Bayes. To evaluate the proposed model, it is compared with the classical Gaussian naive Bayes method and with the method combining traditional PCA with Gaussian naive Bayes, demonstrating the value of introducing PCA and of improving PCA. The Bayes classifier is also compared with other classifiers in intrusion detection, demonstrating its advantage in detection time.

The experimental environment is a PC with an Intel G2020 processor, 8 GB of memory and the Windows 7 operating system. The effect of common classifiers in intrusion detection is compared first, as shown in Table 1 below.

Table 1 Performance comparison of common classifiers

The comparison shows that although GNB has the lowest detection rate, it can train and test the model in only 1.42 s, whereas the other classifiers, despite their higher detection rates, cannot meet the detection-time requirements of intrusion detection. The support vector machine classifier needs more than 10 hours, and the iterative decision tree classifier needs more than 2 minutes. The present invention therefore selects GNB as the classifier for intrusion detection: although its detection rate is lower than that of the other classifiers, its detection time is much shorter, and with the improved PCA introduced by the present invention its detection rate is no lower than that of the other classifiers. Introducing PCA also greatly shortens the GNB detection time.

The improved PCA assigns different weights to the first three eigenvectors and observes how the accuracy changes, in order to find the best weight coefficient for the model. The weight coefficient values and the corresponding detection accuracy are shown in Table 2 below.
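The weight search described above amounts to evaluating each candidate k and keeping the one with the highest accuracy. In this sketch, `select_weight` is a hypothetical helper and `evaluate` stands for any user-supplied function that fits the weighted PCA and the Bayes classifier with a given k and returns its detection accuracy; the candidate values mentioned in the usage note are taken from the range 10^-4 to 10^-6 given in claim 4.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_weight(candidates: Iterable[float],
                  evaluate: Callable[[float], float]) -> Tuple[float, Dict[float, float]]:
    """Evaluate every candidate weight k and return the best k with all accuracies."""
    results: Dict[float, float] = {}
    for k in candidates:
        # evaluate(k) is expected to fit weighted PCA + classifier and return an accuracy.
        results[k] = evaluate(k)
    if not results:
        raise ValueError("no candidate weights given")
    # max() keeps the first candidate in case of ties.
    best_k = max(results, key=results.get)
    return best_k, results
```

For example, `select_weight([1e-4, 1e-5, 1e-6], evaluate)` returns the best k together with the full accuracy table, the kind of data summarized in Table 2. Scoring each k on a held-out validation split rather than the final test set would avoid an optimistic accuracy estimate.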

Table 2 Weight coefficients and accuracy

Fig. 2 and Fig. 3 compare the model of the present invention with the models before improvement in terms of detection accuracy and detection time, respectively. The results show that, compared with the classical Bayesian model, introducing PCA greatly shortens detection time, and improving PCA makes the detection accuracy of the model no lower than that of other data mining methods.

In conclusion the above is merely preferred embodiments of the present invention, being not intended to limit the scope of the present invention. All within the spirits and principles of the present invention, any modification, equivalent replacement, improvement and so on should be included in the present invention's Within protection domain.

Claims

1. A network intrusion detection method combining PCA with Bayes, characterized by comprising the following steps:

Step 1: extracting the network data features of each network connection record in the network training data set and the network test data set, and constructing a training network data feature matrix and a test network data feature matrix;

Step 2: reducing the dimensionality of the training network data feature matrix and the test network data feature matrix separately by principal component analysis, to obtain a reduced training network data feature matrix and a reduced test network data feature matrix; wherein, during the principal component analysis, the first 3 eigenvectors are weighted with a weight coefficient k = 0 to 1;

Step 3: constructing a Bayes classifier and training it with the reduced training network data feature matrix;

Step 4: performing intrusion detection on the reduced test network data feature matrix using the trained Bayes classifier.

2. The network intrusion detection method combining PCA with Bayes according to claim 1, characterized in that, in step 1, the network data features comprise basic features of the TCP connection, content features of the TCP connection, and network traffic statistical features.

3. The network intrusion detection method combining PCA with Bayes according to claim 1, characterized in that, in step 2, the values of each feature in the training network data feature matrix and the test network data feature matrix obtained in step 1 are first normalized to obtain a normalized training network data feature matrix and a normalized test network data feature matrix; principal component analysis is then performed on the normalized training and test network data feature matrices to obtain the reduced training network data feature matrix and the reduced test network data feature matrix.

4. The network intrusion detection method combining PCA with Bayes according to claim 3, characterized in that k = 10^-4 to 10^-6.

5. The network intrusion detection method combining PCA with Bayes according to claim 1, characterized in that the Bayes classifier is a Gaussian naive Bayes classifier.