CN108632278A - Network intrusion detection method combining PCA and Bayes - Google Patents


Info

Publication number
CN108632278A
Authority
CN
China
Prior art keywords
network data
data characteristics
training
characteristics matrix
network
Prior art date
Legal status
Pending
Application number
CN201810433476.2A
Other languages
Chinese (zh)
Inventor
胡昌振
任家东
刘智扬
张炳
赵小林
单纯
Current Assignee
Yanshan University
Beijing Institute of Technology BIT
Original Assignee
Yanshan University
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Yanshan University and Beijing Institute of Technology BIT
Priority to CN201810433476.2A
Publication of CN108632278A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Abstract

The invention discloses a network intrusion detection method combining PCA with Bayes. The invention enables fast and effective detection of common, general-type attacks as well as new types of attacks, with short detection time and high accuracy. The invention first applies PCA to the training data set and the test data set to obtain dimension-reduced training and test data, which reduces the model training time and detection time of the Bayes classifier; it then performs intrusion detection with the Bayes classifier, which has the shortest detection time, achieving fast detection. In addition, the invention improves PCA to raise detection accuracy, so that the method performs efficiently in both detection time and detection accuracy.

Description

Network intrusion detection method combining PCA with Bayes
Technical field
The present invention relates to the field of intrusion detection, and in particular to a network intrusion detection method combining PCA with Bayes.
Background art
While the Internet brings people convenience, it also raises many security problems. Network attacks occur all the time, and research on network intrusion detection is of practical significance and is a major challenge in the current network security field.
Dorothy Denning defined intrusion detection in 1987 as follows: by monitoring network data, intrusion behavior is detected, and an alarm is raised and a response made before the intrusion causes damage (Denning, D.E. 1987. An Intrusion-Detection Model[J]. IEEE Transactions on Software Engineering, SE-13(2): 222-232.). An important property of intrusion detection is therefore timeliness: the detection method must judge attack information quickly so that an alarm can be raised before harm occurs. Traditional intrusion detection methods fall into two main classes. The first class is rule-based intrusion detection, which relies on prior analysis of the features of a particular attack type; the attack signature is then recorded in a rule file, and detection is finally performed by matching against the rule file. Because rule-based intrusion detection detects quickly, such methods are mainly used in commercial or open-source intrusion detection systems (IDS); for example, the Snort intrusion detection system uses this method (Park, W., Ahn, S. 2017. Performance Comparison and Detection Analysis in Snort and Suricata Environment[J]. Wireless Personal Communications, 94(2): 241-252; Gaddam, R., Nandhini, M. 2017. An Analysis of Various Snort Based Techniques to Detect and Prevent Intrusions in Networks Proposal with Code Refactoring Snort Tool in Kali Linux Environment[J]. International Conference on Inventive Communication and Computational Technologies (ICICCT): 10-15). A significant problem with this method, however, is that it cannot detect new attack types; it can only detect attack types already present in the rule base. Hackers' attack means are constantly changing, new attack types emerge frequently, and new attack types are often more harmful; this method also suffers from a high false alarm rate. The second class of methods arose with the rise of machine learning and data mining in recent years, as data mining methods have been applied to intrusion detection. A data mining method trains a model on a labeled data set and then performs intrusion detection with the trained model; it detects unknown attack types well, as with SVM (Dong S K, Park J S. 2003. Network-Based Intrusion Detection with Support Vector Machines[M]. // Information Networking. Berlin: Springer: 747-756; Yendrapalli, K., Mukkamala, S., Sung, A.H., Ribeiro, B. 2007. World Congress on Engineering: 321-325), neural networks (Ryan, J., Lin, M.J., Miikkulainen, R. 1997. Intrusion Detection with Neural Networks[J]) and other methods. Applying data mining to intrusion detection requires collecting a large data set in advance, which limits online intrusion detection (Huang, C.-T., Chang, R.K.C., Huang, P., 2009. Editorial: signal processing applications in network intrusion detection systems. EURASIP J. Adv. Signal Process. 2009, 9: 1–9: 2).
Traditional intrusion detection methods focus on data mining (He, W., Hu, G., Yao, X., Kan, G., Wang, H., Xiang, H., 2008. Applying multiple time series data mining to large-scale network traffic analysis. In: Proceedings of 2008 IEEE Conference on Cybernetics and Intelligent Systems, pp. 394–399; Ghourabi, A., Abbes, T., Bouhoula, A., 2010. Data analyzer based on data mining for honeypot router. In: Proceedings of 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), IEEE, pp. 1-6; Osanaiye, O., Choo, K.-K.R., Dlodlo, M., 2016. Distributed denial of service (DDoS) resilience in cloud: review and conceptual cloud DDoS mitigation framework. J. Netw. Comput. Appl. 67, 147-165) and ordinary file analysis (Raynal, F., Berthier, Y., Biondi, P., Kaminsky, D., 2004. Honeypot forensics. In: Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004, IEEE, pp. 22-29). An, W.J. et al. combined the minimum within-class scatter of Fisher discriminant analysis with the traditional support vector machine and applied it to intrusion detection, proposing the within-class scatter minimization SVM (WCS-SVM), which has a better intrusion detection rate than the traditional support vector machine (An W.J., Liang M.G. 2013. A new intrusion detection method based on SVM with minimum within-class scatter[J]. Security and Communication Networks, 6: 1064-1074). Kabir, E. et al. proposed an intrusion detection system based on the least squares support vector machine (LS-SVM) (Kabir, E., Hu, J.K., Wang, H., Zhuo, G.P. 2017. A novel statistical technique for intrusion detection systems[J]. Future Generation Computer Systems: 303-318). Mrudula Gudadhe et al. proposed a new boosted decision tree method for intrusion detection, which combines multiple decision trees into one classifier (Mrudula Gudadhe, Prakash Prasad, Kapil Wankhade, 2010. A New Data Mining Based Network Intrusion Detection Model[J], Computer and Communication Technology (ICCCT 2010), IEEE, pp. 731-735). Sufyan, T. et al. applied a back-propagation artificial neural network model to intrusion detection, making the IDS more efficient, able to adapt to new environments and able to cope with new attack types (Sufyan T. Faraj Al-Janabi, Hadeel Amjed Saeed, 2011. A Neural Network Based Anomaly Intrusion Detection System[J], Developments in Esystems Engineering, IEEE, pp. 221-226).
Because network data sets are very large and manual labeling is time-consuming and laborious, clustering methods have been introduced into data set classification (Deepthy K Denatious, Anita John. 2012. Survey on Data Mining Techniques to Enhance Intrusion Detection[J]. Computer Communication and Informatics (ICCCI-2012), Jan. 10–12, 2012, Coimbatore, INDIA, IEEE). The Y-means clustering algorithm for intrusion detection (Yu Guan, Ali A. Ghorbani, Nabil Belacel. 2003. Y-Means: A Clustering Method For Intrusion Detection [J], Electrical and Computer Engineering, IEEE, pp. 1083-1086) overcomes two shortcomings of k-means clustering: dependence on the number of clusters and degeneracy. It automatically partitions the data set into an appropriate number of clusters, showing that intrusion detection by cluster analysis is feasible and effective. K-means (LI Han. 2010. Research and Implementation of an Anomaly Detection Model Based on Clustering Analysis[J]. Intelligence Information Processing and Trusted Computing (IPTC 2010), IEEE, HuangGang, China, pp. 458-462) is the simplest partitioning algorithm and solves the well-known clustering problem. A clustering algorithm using SOM and k-means (WANG Huai-bin, YANG Hong-liang, XU Zhi-jian, YUAN Zheng. 2010. A clustering algorithm use SOM and K-Means in Intrusion Detection [J]. E-Business and EGovernment, IEEE, pp. 1281-1284) overcomes the inability of traditional SOM to provide accurate clustering results, as well as the dependence of traditional k-means on initial values and its difficulty in finding cluster centers. The parallel clustering ensemble algorithm proposed for IDS (Hongwei Gao, Dingju Zhu, Xiaomin Wang. 2010. A Parallel Clustering Ensemble Algorithm for Intrusion Detection System. In Proceedings of 2010 Ninth International Symposium on Distributed Computing and Applications to Business [J], Engineering and Science, IEEE, pp. 450-453) achieves high speed, a high detection rate and a low false alarm rate.
The ANN classifier (Akashdeep, Manzoor I., Kumar N. 2017. A feature reduced intrusion detection system using ANN classifier[J]. Expert Systems with Applications: 249-257) also performs well in intrusion detection. Hybrid learning methods can achieve higher detection rates and lower false alarm rates (Z. Muda, W. Yassin, M.N. Sulaiman, N.I. Udzir. 2011. Intrusion Detection based on K-Means Clustering and Naive Bayes Classification[J]. 7th International Conference on IT in Asia (CITA), IEEE; Moriteru Ishida, Hiroki Takakura, Yasuo Okabe. 2011. High-Performance Intrusion Detection Using OptiGrid Clustering and Grid-based Labelling[J]. International Symposium on Applications and the Internet, IEEE, pp. 11-19; Hari Om, Aritra Kundu. 2012. A Hybrid System for Reducing the False Alarm Rate of Anomaly Intrusion Detection System[J]. Information Technology (RAIT-2012), IEEE); for example, combining clustering with classification gives good results. In addition, Syed Ali Raza Shah et al. directly compared the detection performance of machine learning methods applied directly within the Snort intrusion detection system (Syed Ali Raza Shah, Biju Issac. 2017. Performance comparison of intrusion detection systems and application of machine learning to Snort system[J]. Future Generation Computer Systems: 157-170).
Besides the data-mining-based intrusion detection methods above, flow-based intrusion detection (Umer, M.F., Sher, B.Y., Bi, Y.X. 2017. Flow-based intrusion detection: Techniques and challenges[J]. Computers & Security: 238-254) is an innovative approach for detecting intrusions in high-speed networks. Flow-based intrusion detection inspects only packet headers and does not analyze packet payloads. Filter methods perform feature selection with predefined criteria to eliminate irrelevant and redundant features from the data set (Barmejo, P., Ossa, L., Gamez, J.A., & Puerta, J.M. 2012. Fast wrapper feature subset selection in high dimensional datasets by means of filter re-ranking. Journal of Knowledge Based Systems, 25, 35-44).
Traditional file-analysis intrusion detection methods may be effective against common, general-type attacks, but not against new attack techniques. Ordinary data mining methods can handle new attack types, but their detection time is too long.
Summary of the invention
In view of this, the present invention provides a network intrusion detection method combining PCA with Bayes, which enables fast and effective detection of common, general-type attacks and of new types of attacks, with short detection time and high accuracy.
The network intrusion detection method combining PCA with Bayes of the present invention includes the following steps:
Step 1: extract the network data features of each network connection record in the network training data set and the network test data set, and build a training network data feature matrix and a test network data feature matrix;
Step 2: reduce the dimensionality of the training network data feature matrix and of the test network data feature matrix using principal component analysis, obtaining the dimension-reduced training and test network data feature matrices; during principal component analysis, the first 3 eigenvectors are weighted with a weight coefficient k between 0 and 1;
Step 3: build a Bayes classifier and train it with the dimension-reduced training network data feature matrix;
Step 4: perform intrusion detection on the dimension-reduced test network data feature matrix using the trained Bayes classifier.
Further, in Step 1, the network data features include the basic features of TCP connections, the content features of TCP connections, and network traffic statistical features.
Further, in Step 2, the values under each feature in the training network data feature matrix and the test network data feature matrix obtained in Step 1 are first normalized, giving the normalized training and test network data feature matrices; principal component analysis is then performed on the normalized matrices to obtain the dimension-reduced training and test network data feature matrices.
Further, k ranges from $10^{-6}$ to $10^{-4}$.
Further, the Bayes classifier is a Gaussian naive Bayes classifier.
Advantageous effects:
The present invention uses a Bayes classifier to perform intrusion detection quickly and can detect both existing attack types and unknown new attack types. Dimensionality reduction of the network data by principal component analysis further shortens detection time, and the improvement to principal component analysis raises detection accuracy. Compared with traditional data-mining-based network intrusion detection methods, the method of the present invention does not greatly improve detection accuracy, but its detection time is unmatched by other intrusion detection methods.
Description of the drawings
Fig. 1 is a flow chart of the intrusion detection method of the present invention.
Fig. 2 is a line chart comparing detection rates.
Fig. 3 is a line chart comparing detection times.
Detailed description of the embodiments
The present invention will now be described in detail with reference to the accompanying drawings and examples.
As network intrusion attack means keep changing, new attack types continually emerge, and traditional detection methods based on rule files cannot adapt to them; the present invention therefore adopts a data-mining-based intrusion detection approach. However, most existing data-mining intrusion detection methods have long detection times and cannot detect intrusions in time. To this end, the present invention provides a network intrusion detection method combining PCA with Bayes. PCA is first applied to the training data set D and the test data set T to obtain dimension-reduced training and test data, which reduces the model training time of the Bayes classifier and greatly shortens detection time. Intrusion detection is then performed with the Bayes classifier, which has the shortest detection time, so detection is fast and also applies to new attack types. A shorter detection time, however, also implies some loss of detection rate; the present invention raises detection accuracy by improving PCA, so that the method is efficient in both detection time and detection accuracy. The intrusion detection flow of the present invention is shown in Fig. 1 and specifically comprises the following steps:
Step 1: Extract the network data features of each network connection record in the network training data set and the network test data set
The network data features mainly include the basic features of TCP connections, the content features of TCP connections, and network traffic statistical features, but are not limited to these three kinds; any feature that reflects network traffic may be used. They are described as follows:
The basic features of a TCP connection contain basic attributes of the connection, 9 in total: (1) duration: duration of the connection in seconds, measured from the establishment of the TCP connection by the three-way handshake until the connection is closed with FIN/ACK; (2) protocol_type: protocol type, including TCP, UDP and ICMP; (3) service: network service type of the destination host; (4) flag: normal or error status of the connection; (5) src_bytes: number of data bytes from the source host to the destination host; (6) dst_bytes: number of data bytes from the destination host to the source host; (7) land: 1 if the connection is from/to the same host/port, otherwise 0; (8) wrong_fragment: number of wrong fragments; (9) urgent: number of urgent packets.
The content features of a TCP connection comprise 13 kinds in total: (1) hot: number of accesses to system-sensitive files and directories; (2) num_failed_logins: number of failed login attempts; (3) logged_in: 1 if the login succeeded, otherwise 0; (4) num_compromised: number of compromised conditions; (5) root_shell: 1 if a root shell was obtained, otherwise 0; (6) su_attempted: 1 if the "su root" command appeared, otherwise 0; (7) num_root: number of root accesses; (8) num_file_creations: number of file creation operations; (9) num_shells: number of shell prompts used; (10) num_access_files: number of operations on access control files; (11) num_outbound_cmds: number of outbound commands in an ftp session; (12) is_hot_login: whether the login belongs to the "hot" list; (13) is_guest_login: 1 if the login is a guest login, otherwise 0.
The network traffic statistical features comprise 9 kinds in total: (1) count: number of connections in the past two seconds that have the same destination host as the current connection; (2) srv_count: number of connections in the past two seconds that have the same service as the current connection; (3) serror_rate: among connections in the past two seconds with the same destination host as the current connection, the percentage with "SYN" errors; (4) srv_serror_rate: among connections in the past two seconds with the same service as the current connection, the percentage with "SYN" errors; (5) rerror_rate: among connections in the past two seconds with the same destination host as the current connection, the percentage with "REJ" errors; (6) srv_rerror_rate: among connections in the past two seconds with the same service as the current connection, the percentage with "REJ" errors; (7) same_srv_rate: among connections in the past two seconds with the same destination host as the current connection, the percentage with the same service as the current connection; (8) diff_srv_rate: among connections in the past two seconds with the same destination host as the current connection, the percentage with a different service from the current connection; (9) srv_diff_host_rate: among connections in the past two seconds with the same service as the current connection, the percentage with a different destination host from the current connection.
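The window-based traffic statistics above can be computed with a short sketch like the following. It is illustrative only: the `Connection` type, the flags treated as "SYN" and "REJ" errors (S0 to S3 and REJ, following the usual KDD-style flag names), and the choice to include the current connection in its own two-second window are all assumptions. Rates are returned as fractions in [0, 1], which matches the form of the example record later in this description (e.g. 1.00).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Connection:
    """Hypothetical connection record holding only what the window statistics need."""
    timestamp: float  # connection start time, in seconds
    dst_host: str
    service: str
    flag: str  # e.g. "SF", "S0", "REJ"


SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}  # assumed meaning of a "SYN" error
REJ_ERROR_FLAGS = {"REJ"}                   # assumed meaning of a "REJ" error


def _rate(subset: List[Connection], pred: Callable[[Connection], bool]) -> float:
    """Fraction of connections in `subset` satisfying `pred` (0.0 if empty)."""
    return sum(1 for c in subset if pred(c)) / len(subset) if subset else 0.0


def window_features(conns: List[Connection], idx: int, window: float = 2.0) -> Dict[str, float]:
    """Traffic statistics for conns[idx] over the past `window` seconds.

    Assumes `conns` is sorted by timestamp and that the window includes
    the current connection itself.
    """
    cur = conns[idx]
    recent = [c for c in conns[: idx + 1] if cur.timestamp - c.timestamp <= window]
    same_host = [c for c in recent if c.dst_host == cur.dst_host]
    same_srv = [c for c in recent if c.service == cur.service]

    def syn_err(c: Connection) -> bool:
        return c.flag in SYN_ERROR_FLAGS

    def rej_err(c: Connection) -> bool:
        return c.flag in REJ_ERROR_FLAGS

    return {
        "count": float(len(same_host)),
        "srv_count": float(len(same_srv)),
        "serror_rate": _rate(same_host, syn_err),
        "srv_serror_rate": _rate(same_srv, syn_err),
        "rerror_rate": _rate(same_host, rej_err),
        "srv_rerror_rate": _rate(same_srv, rej_err),
        "same_srv_rate": _rate(same_host, lambda c: c.service == cur.service),
        "diff_srv_rate": _rate(same_host, lambda c: c.service != cur.service),
        "srv_diff_host_rate": _rate(same_srv, lambda c: c.dst_host != cur.dst_host),
    }
```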
The basic features of a TCP connection are denoted B; there are 9 of them, so $B = \{b_1, b_2, \ldots, b_9\}$. The content features of a TCP connection are denoted C; there are 13 of them, so $C = \{c_1, c_2, \ldots, c_{13}\}$. The network traffic statistical features are denoted F; there are 9 of them, so $F = \{f_1, f_2, \ldots, f_9\}$. The training network data set is denoted D and the test network data set T, and the network connection records in the data sets are $D_i$ and $T_i$.
Definition 1: A record $D_i$ in the training set and a record $T_i$ in the test set are as follows:
$D_i = \{B, C, F\},\ i = 1, 2, \ldots, n$ (the training data set contains n records)
$T_i = \{B, C, F\},\ i = 1, 2, \ldots, m$ (the test data set contains m records)
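As a minimal sketch of Definition 1, assuming NumPy and assuming the symbolic basic features (protocol_type, service, flag) have already been mapped to integer codes (the code tables are not given here), one record is the concatenation of its 9 basic, 13 content and 9 traffic features, and D is the row-wise stack of such records. The helper names are hypothetical.

```python
import numpy as np

N_BASIC, N_CONTENT, N_TRAFFIC = 9, 13, 9  # |B|, |C|, |F| from the text


def make_record(basic, content, traffic):
    """Build one connection record D_i = {B, C, F} as a 31-dim float vector.

    Assumes protocol_type, service and flag are already integer-coded.
    """
    if (len(basic), len(content), len(traffic)) != (N_BASIC, N_CONTENT, N_TRAFFIC):
        raise ValueError("expected 9 basic, 13 content and 9 traffic features")
    return np.concatenate([basic, content, traffic]).astype(float)


def build_matrix(records):
    """Stack n records row-wise into the n x 31 feature matrix D (or T)."""
    return np.vstack(records)


# The first training record from the worked example later in this description.
d1 = make_record(
    [0, 2, 19, 8, 181, 5450, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [8, 8, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00],
)
D = build_matrix([d1, d1])
print(D.shape)  # (2, 31)
```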
Step 2: Reduce the dimensionality of the training network data feature matrix and the test network data feature matrix using the improved principal component analysis (IPCA) method
Principal component analysis (PCA) uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables, of which the first principal component has the largest variance; it is a common statistical technique for dimensionality reduction. Before intrusion detection with the Bayes classifier, the network data are reduced in dimension by principal component analysis, which greatly shortens detection time.
Definition 2: Since principal component analysis is applied to the training data and the test data in the same way, D and T are written in matrix form, giving the training network data feature matrix D and the test network data feature matrix T. Let the data matrix be X, i.e. X = D or T. A connection record in X is:
$X_i = D_i \text{ or } T_i \qquad (1)$
Principal component analysis is performed on the matrix X as follows: first, remove the mean of the data; second, compute the covariance matrix of the data matrix; third, compute the eigenvalues and eigenvectors of the covariance matrix; fourth, sort the eigenvalues from largest to smallest; fifth, multiply the data matrix X by the eigenvectors corresponding to the d' largest eigenvalues to obtain the dimension-reduced training data set D' / test data set T'. Results are better with d' = 5 to 8.
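The five steps above can be sketched in NumPy as follows. This is a minimal illustration of traditional PCA, not the patented implementation: `np.linalg.eigh` is used because the covariance matrix is symmetric, and the centered matrix is projected (multiplying the uncentered X instead shifts each component by a constant).

```python
import numpy as np


def pca(X, d_prime):
    """Traditional PCA following the five steps in the text.

    Returns the n x d' projected data and the d' largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)                  # step 1: remove the column means
    cov = np.cov(Xc, rowvar=False)           # step 2: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigenpairs (ascending order)
    order = np.argsort(eigvals)[::-1]        # step 4: sort eigenvalues, largest first
    top = order[:d_prime]
    W = eigvecs[:, top]                      # eigenvectors of the d' largest eigenvalues
    return Xc @ W, eigvals[top]              # step 5: project onto d' components
```

For example, `pca(D, 6)` keeps d' = 6 components, inside the range of 5 to 8 suggested above.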
Reducing the dimensionality of network data with principal component analysis improves detection speed but loses detection accuracy. The present invention therefore improves principal component analysis to raise detection accuracy.
It is considered that the first three eigenvectors of PCA embody the global information of the network data. When the network data are affected by some factor, the first three eigenvectors of PCA may be the most seriously contaminated; if the first three eigenvectors are processed in a suitable way, the influence of that factor on them can be reduced to some extent, thereby improving detection accuracy. Accordingly, the present invention proposes weighting the first three eigenvectors, i.e.:
$\omega' = (k\omega_1, k\omega_2, k\omega_3, \omega_4, \ldots, \omega_{d'}) \qquad (2)$
where $\omega'$ is the improved eigenvector matrix and k is an introduced weight coefficient, a number between 0 and 1, which reduces the weight of the first three components and thus the degree to which the data are contaminated; finally, $\omega'$ is used to reduce the dimensionality of the data matrix X. The pseudocode of the improved principal component analysis method IPCA is as follows.
Algorithm: IPCA
Input: training network data feature matrix or test network data feature matrix $X = D$ or $T$
Output: dimension-reduced training network data feature matrix or test network data feature matrix $X' = D'$ or $T'$
1. $X' = X - \text{mean}$ // remove the mean
2. Compute the covariance matrix $X'^{T}X'$ of the data matrix $X'$
3. Compute the eigenvalues $\lambda$ and eigenvectors of the covariance matrix $X'^{T}X'$
4. Sort the eigenvalues and take the $d'$ largest: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{d'}$
5. $\omega = (\omega_1, \omega_2, \omega_3, \omega_4, \ldots, \omega_{d'})$, where $\omega_1, \omega_2, \omega_3, \omega_4, \ldots, \omega_{d'}$ are the eigenvectors corresponding to $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots, \lambda_{d'}$
6. $\omega' = (k\omega_1, k\omega_2, k\omega_3, \omega_4, \ldots, \omega_{d'})$
7. $X' = X\omega'$
The first line removes the mean of the data matrix, and the second line computes the covariance matrix of the data matrix. The third and fourth lines compute the eigenvalues and eigenvectors of the covariance matrix and sort the eigenvalues from largest to smallest. The fifth line selects the eigenvectors corresponding to the d' largest eigenvalues, which is exactly the solution of traditional PCA. The sixth line weights the traditional PCA solution to obtain the new solution, and the last line multiplies the data matrix by the new solution to reduce the data to d' dimensions. The value of the weight coefficient is determined experimentally; preferably, k ranges from $10^{-6}$ to $10^{-4}$.
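The IPCA pseudocode can be sketched as a small NumPy class. Two assumptions go beyond the text and are flagged in the comments: the mean and the weighted eigenvectors are fitted on the training matrix D and then reused for the test matrix T, so that both are projected into the same coordinate system (the text applies the procedure to D and T respectively), and the centered data are projected. The class name and the defaults `d_prime=6` and `k=1e-5` are hypothetical choices within the ranges stated above.

```python
import numpy as np


class IPCA:
    """Sketch of the improved PCA: the eigenvectors of the three largest
    eigenvalues are scaled by the weight k before projection (Equation (2))."""

    def __init__(self, d_prime=6, k=1e-5, n_weighted=3):
        self.d_prime, self.k, self.n_weighted = d_prime, k, n_weighted

    def fit(self, X):
        # Assumption: fit once on the training matrix, reuse for the test matrix.
        self.mean_ = X.mean(axis=0)                        # line 1: remove the mean
        cov = np.cov(X - self.mean_, rowvar=False)         # line 2: d x d covariance
        eigvals, eigvecs = np.linalg.eigh(cov)             # line 3: eigenpairs
        order = np.argsort(eigvals)[::-1][: self.d_prime]  # line 4: d' largest
        W = eigvecs[:, order]                              # line 5: omega (fancy indexing copies)
        W[:, : self.n_weighted] *= self.k                  # line 6: omega' = (k w1, k w2, k w3, w4, ...)
        self.components_ = W
        return self

    def transform(self, X):
        # Line 7, applied to centered data (an assumption; see the lead-in).
        return (X - self.mean_) @ self.components_
```

With `k=1.0` the class reduces to traditional PCA, which makes it easy to compare the PCA and IPCA variants on the same data.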
To eliminate the effect of differing dimensions (units) among the features, the present invention normalizes the data matrix X before performing principal component analysis on it. Specifically, each column of X (i.e., the values of the same feature across different network connection records) is normalized separately: first find the maximum and minimum of the values under that feature, then compute the normalized value of each value under that feature with the following formula, which maps the new values into the range 0 to 1:
$\tilde{X}_{i,j} = \dfrac{X_{i,j} - X_{j,\min}}{X_{j,\max} - X_{j,\min}} \qquad (3)$
where $\tilde{X}_{i,j}$ is the normalized value of $X_{i,j}$, $X_{j,\max}$ is the maximum value in column j of X, and $X_{j,\min}$ is the minimum value in column j of X.
Principal component analysis is then performed on $\tilde{X}$ to obtain the dimension-reduced data matrix X'.
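A minimal sketch of the min-max normalization in Equation (3), assuming NumPy. It fits the column minima and maxima on one matrix so they can be reused for another; mapping constant columns to 0 and clipping values that fall outside the fitted range are assumptions, since the text does not cover those cases.

```python
import numpy as np


def minmax_fit(X):
    """Per-column minimum X_{j,min} and maximum X_{j,max}."""
    return X.min(axis=0), X.max(axis=0)


def minmax_apply(X, col_min, col_max):
    """Equation (3): map each column into [0, 1] using the fitted min and max."""
    span = col_max - col_min
    span = np.where(span == 0, 1.0, span)  # assumed: a constant column maps to 0
    return np.clip((X - col_min) / span, 0.0, 1.0)  # assumed: clip out-of-range values
```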
Step 3: Build a Bayes classifier and train it with the dimension-reduced training network data matrix
In theory, any kind of Bayes classifier can be applied in the present invention, such as the Gaussian naive Bayes classifier or the Bernoulli naive Bayes classifier. This embodiment takes the Gaussian naive Bayes classifier as an example.
When all relevant probabilities are known, Bayesian decision theory considers how to choose the optimal class label based on these probabilities and the misclassification loss. For intrusion detection, the task is to judge whether network traffic is normal or abnormal. Assume therefore that there are two possible class labels $\gamma = \{c_1, c_2\}$, where $c_1$ denotes the normal label and $c_2$ the abnormal label. For each connection record $x = X_i$, the class label that maximizes the posterior probability $P(c \mid x)$ is selected. By Bayes' theorem, $P(c \mid x)$ can be written as
$P(c \mid x) = \dfrac{P(c)\,P(x \mid c)}{P(x)} \qquad (4)$
where $P(c)$ is the class prior probability, $P(x \mid c)$ is the class-conditional probability of connection record x given class c, and $P(x)$ is the evidence factor used for normalization. For a given connection record x, the evidence factor $P(x)$ does not depend on the class label, so $P(c \mid x)$ depends only on $P(c)$ and $P(x \mid c)$.
The naive Bayes classifier adopts the "attribute conditional independence assumption": given the class, all attributes are assumed to be mutually independent; in other words, each attribute is assumed to affect the classification result independently. Equation (4) can therefore be rewritten as
$P(c \mid x) = \dfrac{P(c)}{P(x)} \prod_{i=1}^{d} P(x_i \mid c) \qquad (5)$
where d is the number of attributes of each connection record (d = 31 before IPCA is applied), and $x_i$ is the value of connection record x on the i-th attribute. Since $P(x)$ is the same for all classes, the naive Bayes classifier is expressed as
$h_{nb}(x) = \arg\max_{c \in \gamma} P(c) \prod_{i=1}^{d} P(x_i \mid c) \qquad (6)$
For continuous attributes, a probability density function can be used. Assume $p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2)$, where $\mu_{c,i}$ and $\sigma_{c,i}^2$ are respectively the mean and variance of the values of class-c samples on the i-th attribute; then
$p(x_i \mid c) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\left(-\dfrac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right) \qquad (7)$
For each connection record, the posterior probabilities of the normal and abnormal classes are computed separately, and the class with the larger posterior probability is taken as the class label of the record.
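Equations (6) and (7) can be sketched as a from-scratch Gaussian naive Bayes classifier. It is a minimal illustration, not the patented implementation: it compares $\log P(c) + \sum_i \log p(x_i \mid c)$ instead of the product in Equation (6), which selects the same class while avoiding numerical underflow, and it adds a small variance floor modelled on the `var_smoothing` default of scikit-learn's `GaussianNB` (an assumption) so that a zero-variance component does not cause division by zero. Labels 0 (normal) and 1 (abnormal) are assumed.

```python
import numpy as np


class GaussianNaiveBayes:
    """From-scratch sketch of Equations (6) and (7), computed in log space."""

    def __init__(self, var_smoothing=1e-9):
        # Assumed variance floor: var_smoothing times the largest feature variance.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        eps = self.var_smoothing * X.var(axis=0).max()
        # log P(c): class frequencies in the training set
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # mu_{c,i} and sigma^2_{c,i}: per-class mean and variance of each attribute
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        return self

    def _joint_log_likelihood(self, X):
        # log P(c) + sum_i log p(x_i | c) for every record (rows) and class (columns)
        diff = np.asarray(X, dtype=float)[:, None, :] - self.mu_[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_) + diff**2 / self.var_)
        return self.log_prior_ + log_pdf.sum(axis=2)

    def predict(self, X):
        # Equation (6): pick the class with the largest (unnormalized) posterior
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]
```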
Step 4: Perform intrusion detection on the dimension-reduced test network data matrix T' using the trained Bayes classifier.
Specifically, the Gaussian naive Bayes (GNB) classifier proceeds as follows: it determines the class of each record in the dimension-reduced training data set D' (during training) or test data set T' (during testing). First, the conditional probability $P(x_i \mid c)$ of each attribute is computed according to Equation (7); then the prior probabilities $P(c)$ of the normal and abnormal classes are computed. Finally, Equation (6) gives the (unnormalized) posterior probability that the record belongs to the normal or abnormal class, and the class with the larger posterior probability is taken as the detection result for the record.
Alarm mechanism: if the detection result is normal, no action is taken; if the network data are detected as abnormal, an alarm is raised to the user so that the intrusion can be discovered quickly and handled in time to avoid losses. The alarm method is not the focus of the present invention. The full alert mode of the Snort intrusion detection system may be used: a directory is created for each IP address that generates an alarm, and the decoded packet data are recorded in the corresponding directory; in addition to the alarm information, the packet header information can also be output.
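The alarm step can be wired to the classifier output with a hook like the one below. Everything here is hypothetical: the function names, the `alerts.log` file name and the JSON line format are illustrative assumptions that only mimic the per-IP directory layout described above, not Snort's actual output format.

```python
import json
from pathlib import Path


def raise_alarm(log_root, src_ip, record_id, header):
    """Hypothetical alarm: append one alert line under <log_root>/<src_ip>/alerts.log."""
    ip_dir = Path(log_root) / src_ip
    ip_dir.mkdir(parents=True, exist_ok=True)  # one directory per alarming IP
    alert_file = ip_dir / "alerts.log"
    with alert_file.open("a", encoding="utf-8") as fh:
        # Record the alert together with the (decoded) header fields.
        fh.write(json.dumps({"record": record_id, "header": header}) + "\n")
    return alert_file


def handle_prediction(label, log_root, src_ip, record_id, header, abnormal_label=1):
    """Do nothing for normal traffic; raise an alarm for abnormal traffic."""
    if label == abnormal_label:
        return raise_alarm(log_root, src_ip, record_id, header)
    return None
```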
The weight coefficient of the improved PCA must be adjusted repeatedly to find the most suitable value, so that the model detects better. The detection time is the sum of the model's training and testing times, and the detection rate is the number of correctly detected records in the test set divided by the total number of records in the test set.
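The two evaluation quantities and the search over k can be sketched as follows. `train_and_test` is a hypothetical callable standing in for the full pipeline (normalization, IPCA with weight k, Gaussian naive Bayes) that returns the true and predicted test labels; its wall-clock time is taken as the detection time, and the detection rate is the fraction of test records classified correctly.

```python
import time

import numpy as np


def detection_rate(y_true, y_pred):
    """Correctly detected test records divided by the total number of test records."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def search_weight(k_candidates, train_and_test):
    """Try each candidate k and keep the one with the best detection rate.

    Returns (best, results), where each entry is (k, detection_rate, detection_time).
    """
    results = []
    for k in k_candidates:
        start = time.perf_counter()
        y_true, y_pred = train_and_test(k)       # training + testing with weight k
        elapsed = time.perf_counter() - start    # detection time in seconds
        results.append((k, detection_rate(y_true, y_pred), elapsed))
    best = max(results, key=lambda r: r[1])      # first k with the highest rate
    return best, results
```

For example, `search_weight([1e-4, 1e-5, 1e-6], train_and_test)` scans the preferred range of k stated above.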
Worked example of data processing:
The data-processing flow can be illustrated with an example. The first record in training set D is as follows:
0,2,19,8,181,5450,0,0,0, (basic features of the TCP connection)
0,0,1,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)
8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00 (time-based network traffic statistical features)
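The sketch below makes the record layout explicit by splitting the 31 values into the three annotated groups (9 basic, 13 content, and 9 traffic-statistics fields). The group keys and the helper name are our own illustrative labels. Only the values and the group sizes come from the example.

```python
from typing import Dict, List, Sequence, Tuple

# First record of training set D, copied from the example above.
FIRST_RECORD = [0, 2, 19, 8, 181, 5450, 0, 0, 0,
                0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                8, 8, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00]

# (label, number of fields) per feature group, in record order: 9 + 13 + 9 = 31.
FEATURE_GROUPS: Tuple[Tuple[str, int], ...] = (("basic", 9), ("content", 13), ("traffic", 9))


def split_record(record: Sequence[float],
                 groups: Tuple[Tuple[str, int], ...] = FEATURE_GROUPS) -> Dict[str, List[float]]:
    """Slice one connection record into its feature groups, checking the length."""
    expected = sum(size for _, size in groups)
    if len(record) != expected:
        raise ValueError(f"expected {expected} fields, got {len(record)}")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, size in groups:
        parts[name] = list(record[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    for name, values in split_record(FIRST_RECORD).items():
        print(name, len(values))
```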
Data set D is normalized according to formula (3). After normalization, the first record becomes:
0,0.0018,0.0012,0.0016,0,0,0,0,0, (basic features of the TCP connection)
0,0,0.0037,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)
0,0,0.00,0.00,0.00,0.00,0.0016,0.00,0.00 (time-based network traffic statistical features)
All features have now been mapped to the interval [0, 1]. After processing with the improved PCA (IPCA), the first connection record after dimensionality reduction is:
24.701,0,0,0.001,0,-0.001,0,0
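The exact weighting formula of the improved PCA is not reproduced in this section. The sketch below is therefore a hedged approximation. It performs ordinary PCA by eigen-decomposing the covariance of the training matrix and then, as an assumption, multiplies the first three eigenvectors (largest eigenvalues) by the weight coefficient k before projecting. Fitting the projection on the training matrix and applying the same projection to the test matrix is also an assumed design choice. It keeps D′ and T′ in one coordinate system, which the classifier needs.

```python
from typing import Tuple

import numpy as np


def weighted_pca(X_train: np.ndarray, X_test: np.ndarray, n_components: int,
                 k: float, n_weighted: int = 3) -> Tuple[np.ndarray, np.ndarray]:
    """Reduce both matrices with PCA fitted on X_train.

    ASSUMPTION: the weighting multiplies the first `n_weighted` eigenvectors by k;
    the patent's own weighting scheme may differ.
    """
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    W = eigvecs[:, order].copy()                       # d x n_components projection
    W[:, :n_weighted] *= k                             # assumed weighting step
    return (X_train - mean) @ W, (X_test - mean) @ W
```

With k = 1 this reduces to plain PCA, which gives a convenient baseline when comparing against the "traditional PCA" variant.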
Once the training set D and the test set T have been reduced, intrusion detection can be performed with the Gaussian naive Bayes classifier. To evaluate the proposed model, it is compared with the classical Gaussian naive Bayes method and with a method that combines traditional PCA with Gaussian naive Bayes. The comparison demonstrates the value of introducing PCA and of improving it. The Bayes classifier is also compared with other classifiers for intrusion detection, which shows its advantage in detection time.
The experimental environment is a PC with an Intel G2020 processor, 8 GB of memory, and the Windows 7 operating system. First, the performance of common classifiers in intrusion detection is compared, as shown in Table 1 below.
Table 1. Performance comparison of common classifiers
The comparison shows that GNB has the lowest detection rate, but it can train and test the model within 1.42 s. The other classifiers have relatively higher detection rates, but their detection times do not meet the requirements of intrusion detection. The support vector machine classifier needs more than 10 hours, and the iterative decision tree classifier needs more than 2 minutes. The present invention therefore selects GNB as the intrusion detection classifier. Its detection rate is lower than that of the other classifiers, but its detection time is much shorter, and with the improved PCA introduced by the present invention its detection rate is no lower than theirs. Introducing PCA also greatly shortens the GNB time.
The improved PCA assigns different weights to the first three eigenvectors and observes how the accuracy changes, in order to find the best weight coefficient for the model. The different weight coefficient values and the corresponding detection accuracies are shown in Table 2 below.
Table 2. Weight coefficients and detection accuracy
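The weight search described above is essentially a grid search over k. The sketch below shows that loop. `evaluate` is a hypothetical callback that would rebuild the weighted PCA with coefficient k, train the classifier on D′, and return the accuracy on T′. The toy objective in the demo is artificial and is not taken from Table 2. The candidate grid shown follows the range given in claim 4.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


def select_weight(candidates: Iterable[float],
                  evaluate: Callable[[float], float]) -> Tuple[float, Dict[float, float]]:
    """Evaluate each candidate weight coefficient k and return the best one.

    Ties are resolved in favour of the candidate listed first.
    """
    results = {k: evaluate(k) for k in candidates}
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Artificial objective peaking at k = 1e-5, standing in for a real train/test run.
    best_k, table = select_weight([1e-6, 1e-5, 1e-4], lambda k: -abs(math.log10(k) + 5))
    print(best_k)  # 1e-05
```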
Fig. 2 and Fig. 3 compare the detection accuracy and the detection time of the model of the present invention with those of the models before improvement. Compared with the classical Bayesian model, introducing PCA greatly shortens the detection time, and improving PCA makes the model's detection accuracy no lower than that of other data mining methods.
In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A network intrusion detection method combining PCA and Bayes, characterized by comprising the following steps:
Step 1: extracting the network data features of each network connection record in the network training data set and the network test data set, and building a training network data feature matrix and a test network data feature matrix;
Step 2: reducing the dimensionality of the training network data feature matrix and of the test network data feature matrix by principal component analysis, to obtain a dimension-reduced training network data feature matrix and a dimension-reduced test network data feature matrix, wherein during the principal component analysis the first 3 eigenvectors are weighted with a weight coefficient k ranging from 0 to 1;
Step 3: building a Bayes classifier and training it with the dimension-reduced training network data feature matrix;
Step 4: performing intrusion detection on the dimension-reduced test network data feature matrix with the trained Bayes classifier.
2. The network intrusion detection method combining PCA and Bayes according to claim 1, characterized in that, in step 1, the network data features include basic features of the TCP connection, content features of the TCP connection, and network traffic statistical features.
3. The network intrusion detection method combining PCA and Bayes according to claim 1, characterized in that, in step 2, the values of each feature in the training network data feature matrix and the test network data feature matrix obtained in step 1 are first normalized, yielding a normalized training network data feature matrix and a normalized test network data feature matrix; principal component analysis is then performed on the normalized matrices to obtain the dimension-reduced training network data feature matrix and test network data feature matrix.
4. The network intrusion detection method combining PCA and Bayes according to claim 3, characterized in that k = 10⁻⁴ to 10⁻⁶.
5. The network intrusion detection method combining PCA and Bayes according to claim 1, characterized in that the Bayes classifier is a Gaussian naive Bayes classifier.
CN201810433476.2A 2018-05-08 2018-05-08 A kind of network inbreak detection method being combined with Bayes based on PCA Pending CN108632278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810433476.2A CN108632278A (en) 2018-05-08 2018-05-08 A kind of network inbreak detection method being combined with Bayes based on PCA

Publications (1)

Publication Number Publication Date
CN108632278A (en) 2018-10-09

Family

ID=63695907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810433476.2A Pending CN108632278A (en) 2018-05-08 2018-05-08 A kind of network inbreak detection method being combined with Bayes based on PCA

Country Status (1)

Country Link
CN (1) CN108632278A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276195A (en) * 2019-04-25 2019-09-24 北京邮电大学 A kind of smart machine intrusion detection method, equipment and storage medium
CN110138776A (en) * 2019-05-14 2019-08-16 重庆天蓬网络有限公司 Docker intrusion detection method, device and medium based on order monitoring
CN110062011A (en) * 2019-05-30 2019-07-26 海南大学 Ddos attack detection method and device based on V-SVM
CN110868414A (en) * 2019-11-14 2020-03-06 北京理工大学 Industrial control network intrusion detection method and system based on multi-voting technology
CN111553381A (en) * 2020-03-23 2020-08-18 北京邮电大学 Network intrusion detection method and device based on multiple network models and electronic equipment
CN111553381B (en) * 2020-03-23 2022-11-18 北京邮电大学 Network intrusion detection method and device based on multiple network models and electronic equipment
CN113688436A (en) * 2020-05-19 2021-11-23 天津大学 PCA and naive Bayes classification fusion hardware Trojan horse detection method
CN111988306A (en) * 2020-08-17 2020-11-24 北京邮电大学 Method and system for detecting DDoS attack traffic in network based on variational Bayes
CN112185484A (en) * 2020-10-13 2021-01-05 华北科技学院 AdaBoost model-based water quality characteristic mineral water classification method
CN113255212A (en) * 2021-05-17 2021-08-13 中国南方电网有限责任公司超高压输电公司昆明局 Model selection method for converter valve cooling system based on PCA and Bayesian classifier
CN113726785A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 Network intrusion detection method and device, computer equipment and storage medium
CN113726785B (en) * 2021-08-31 2022-11-11 平安普惠企业管理有限公司 Network intrusion detection method and device, computer equipment and storage medium
CN117650949A (en) * 2024-01-30 2024-03-05 山东鲁商科技集团有限公司 Network attack interception method and system based on RPA robot data analysis

Similar Documents

Publication Publication Date Title
CN108632278A (en) A kind of network inbreak detection method being combined with Bayes based on PCA
Yang et al. MTH-IDS: A multitiered hybrid intrusion detection system for internet of vehicles
Aljawarneh et al. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model
Depren et al. An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks
Gogoi et al. MLH-IDS: a multi-level hybrid intrusion detection method
Bai et al. A machine learning approach for rdp-based lateral movement detection
Farahani Feature selection based on cross-correlation for the intrusion detection system
Diallo et al. Adaptive clustering-based malicious traffic classification at the network edge
Garg et al. HyClass: Hybrid classification model for anomaly detection in cloud environment
Diwan et al. Feature entropy estimation (FEE) for malicious IoT traffic and detection using machine learning
Al-Fawa'reh et al. Detecting stealth-based attacks in large campus networks
Zhong et al. An adversarial learning model for intrusion detection in real complex network environments
Brandao et al. Log Files Analysis for Network Intrusion Detection
Silva et al. Attackers are not stealthy: Statistical analysis of the well-known and infamous KDD network security dataset
Zhang et al. Detection of android malware based on deep forest and feature enhancement
Sakthivelu et al. Advanced Persistent Threat Detection and Mitigation Using Machine Learning Model.
Dharamkar et al. A review of cyber attack classification technique based on data mining and neural network approach
Seniaray et al. Machine learning-based network intrusion detection system
Kosamkar et al. Data Mining Algorithms for Intrusion Detection System: An Overview
Caulkins et al. A dynamic data mining technique for intrusion detection systems
Sulaiman et al. Big data analytic of intrusion detection system
Manandhar A practical approach to anomaly-based intrusion detection system by outlier mining in network traffic
Rani et al. Analysis of machine learning and deep learning intrusion detection system in Internet of Things network
Babu et al. Improved Monarchy Butterfly Optimization Algorithm (IMBO): Intrusion Detection Using Mapreduce Framework Based Optimized ANU-Net.
Mohammed et al. An automated signature generation method for zero-day polymorphic worms based on multilayer perceptron model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181009
