CN108632278A - A network intrusion detection method based on the combination of PCA and Bayes - Google Patents
- Publication number
- CN108632278A CN201810433476.2A
- Authority
- CN
- China
- Prior art keywords
- network data
- data characteristics
- training
- characteristics matrix
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Abstract
The invention discloses a network intrusion detection method based on the combination of PCA and Bayes. The invention enables fast and effective detection of common, general attack types as well as new attack types, with short detection time and high accuracy. PCA is first applied to the training data set and the test data set to obtain dimension-reduced training and test data, which reduces the model training time and the detection time of the Bayes classifier. Intrusion detection is then performed with the Bayes classifier, which has the shortest detection time, so that detection is fast. The invention also improves PCA to raise detection accuracy, so that the method performs well in both detection time and detection accuracy.
Description
Technical field
The present invention relates to the field of intrusion detection, and in particular to a network intrusion detection method based on the combination of PCA and Bayes.
Background
While the Internet brings convenience, it also brings many security problems. Network attacks occur constantly, so research on network intrusion detection has practical significance and is a major challenge in the field of network security. Dorothy Denning defined intrusion detection in 1987: monitor network data, detect intrusion behavior, and raise an alarm and respond before the intrusion causes damage (Denning, D.E. 1987. An Intrusion-Detection Model [J]. IEEE Transactions on Software Engineering, SE-13(2): 222-232). An important property of intrusion detection is therefore timeliness: the detection method must judge attack information quickly so that an alarm can be raised before harm occurs. Traditional intrusion detection methods fall into two main classes. The first class is rule-based intrusion detection. The features of a particular attack type are analyzed in advance, the attack signature is recorded in a rule file, and detection is performed by matching against the rule file. Because rule-based detection is fast, such methods are mainly used in commercial or open-source intrusion detection systems (IDS); the Snort intrusion detection system, for example, uses this approach (Park, W., Ahn, S. 2017. Performance Comparison and Detection Analysis in Snort and Suricata Environment [J]. Wireless Personal Communications, 94(2): 241-252; Gaddam, R., Nandhini, M. 2017. An Analysis of Various Snort Based Techniques to Detect and Prevent Intrusions in Networks Proposal with Code Refactoring Snort Tool in Kali Linux Environment [J]. International Conference on Inventive Communication and Computational Technologies (ICICCT): 10-15). A significant problem with this approach is that it cannot detect new attack types; it can only detect attack types already present in the rule base. Attackers' methods change continually, new attack types appear frequently, and new attack types are often more harmful. This approach also suffers from a high false alarm rate. The second class arose with the growth of machine learning and data mining in recent years, as data mining methods were applied to intrusion detection. A data mining method trains a model on a labeled data set and then performs intrusion detection with the trained model; it detects unknown attack types well. Examples include SVM (Dong S K, Park J S. 2003. Network-Based Intrusion Detection with Support Vector Machines [M]. // Information Networking. Berlin: Springer: 747-756; Yendrapalli, K., Mukkamala, S., Sung, A.H., Ribeiro, B. 2007. World Congress on Engineering: 321-325) and neural networks (Ryan, J., Lin, M.J., Miikkulainen, R. 1997. Intrusion Detection with Neural Networks [J]). Applying data mining to intrusion detection requires collecting a large data set in advance, which limits online intrusion detection (Huang, C.-T., Chang, R.K.C., Huang, P., 2009. Editorial: signal processing applications in network intrusion detection systems. EURASIP J. Adv. Signal Process 2009, 9:1-9:2).
Traditional intrusion detection methods concentrate on data mining (He, W., Hu, G., Yao, X., Kan, G., Wang, H., Xiang, H., 2008. Applying multiple time series data mining to large-scale network traffic analysis. In: Proceedings of 2008 IEEE Conference on Cybernetics and Intelligent Systems, pp. 394-399; Ghourabi, A., Abbes, T., Bouhoula, A., 2010. Data analyzer based on data mining for honeypot router. In: Proceedings of 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), IEEE, pp. 1-6; Osanaiye, O., Choo, K.-K.R., Dlodlo, M., 2016. Distributed denial of service (ddos) resilience in cloud: review and conceptual cloud ddos mitigation framework. J. Netw. Comput. Appl. 67, 147-165) and ordinary file analysis (Raynal, F., Berthier, Y., Biondi, P., Kaminsky, D., 2004. Honeypot forensics. In: Proceedings from the Fifth Annual IEEE SMC, Information Assurance Workshop, 2004, IEEE, pp. 22-29).
An, W.J. et al. combined the minimum within-class scatter of Fisher discriminant analysis with the traditional support vector machine and applied the result to intrusion detection, proposing the within-class scatter minimum support vector machine (WCS-SVM), which achieves a better intrusion detection rate than the traditional support vector machine (An W.J., Liang M.G. 2013. A new intrusion detection method based on SVM with minimum within-class scatter [J]. Security and Communication Networks, 6: 1064-1074). Kabir, E. et al. proposed an intrusion detection system based on the least squares support vector machine (LS-SVM) (Kabir, E., Hu, J.K., Wang, H., Zhuo, G.P. 2017. A novel statistical technique for intrusion detection systems [J]. Future Generation Computer Systems: 303-318). Mrudula Gudadhe et al. applied a new boosted decision tree method to intrusion detection, which combines multiple decision trees into one classifier (Mrudula Gudadhe, Prakash Prasad, Kapil Wankhade, 2010. A New Data Mining Based Network Intrusion Detection Model [J], Computer and Communication Technology (ICCCT 2010), IEEE, pp. 731-735). Sufyan, T. et al. applied a back-propagation artificial neural network model to intrusion detection, which makes the IDS more efficient and lets the system adapt to new environments and cope with new attack types (Sufyan T. Faraj Al-Janabi, Hadeel Amjed Saeed, 2011. A Neural Network Based Anomaly Intrusion Detection System [J], Developments in Esystems Engineering, IEEE, pp. 221-226).
Because network data sets are very large and manual labeling is time-consuming and laborious, clustering methods were introduced into data set classification (Deepthy K Denatious, Anita John. 2012. Survey on Data Mining Techniques to Enhance Intrusion Detection [J]. Computer Communication and Informatics (ICCCI-2012), Jan. 10-12, 2012, Coimbatore, INDIA, IEEE). The Y-means clustering algorithm for intrusion detection (Yu Guan and Ali A. Ghorbani, Nabil Belacel. 2003. Y-Means: A Clustering Method For Intrusion Detection [J], Electrical and Computer Engineering, IEEE, pp. 1083-1086) overcomes two shortcomings of k-means: dependence on the number of clusters, and degeneracy. The method automatically partitions the data set into an appropriate number of clusters, and intrusion detection by cluster analysis is feasible and effective. K-means (LI Han. 2010. Research and Implementation of an Anomaly Detection Model Based on Clustering Analysis [J]. Intelligence Information Processing and Trusted Computing (IPTC 2010), IEEE, Huanggang, China, pp. 458-462) is one of the simplest partitioning algorithms for the well-known clustering problem. A clustering algorithm that uses SOM and k-means (WANG Huai-bin, YANG Hong-liang, XU Zhi-jian, YUAN Zheng. 2010. A clustering algorithm use SOM and K-Means in Intrusion Detection [J]. E-Business and EGovernment, IEEE, pp. 1281-1284) overcomes the inability of traditional SOM to provide precise clustering results, as well as the dependence of traditional k-means on initial values and its difficulty in finding cluster centers. A parallel clustering ensemble algorithm proposed for IDS (Hongwei Gao, Dingju Zhu, Xiaomin Wang. 2010. A Parallel Clustering Ensemble Algorithm for Intrusion Detection System [J]. In: Proceedings of 2010 Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science, IEEE, pp. 450-453) achieves high speed, a high detection rate, and a low false alarm rate. An ANN classifier (Akashdeep., Manzoor I., Kumar N. 2017. A feature reduced intrusion detection system using ANN classifier [J]. Expert Systems with Applications: 249-257) also performs well in intrusion detection.
Hybrid learning methods can reach a higher detection rate and a lower false alarm rate (Z. Muda, W. Yassin, M.N. Sulaiman, N.I. Udzir. 2011. Intrusion Detection based on K-Means Clustering and Naive Bayes Classification [J]. 7th International Conference on IT in Asia (CITA), IEEE; Moriteru Ishida, Hiroki Takakura, Yasuo Okabe. 2011. High-Performance Intrusion Detection Using OptiGrid Clustering and Grid-based Labelling [J]. International Symposium on Applications and the Internet, IEEE, pp. 11-19; Hari Om, Aritra Kundu. 2012. A Hybrid System for Reducing the False Alarm Rate of Anomaly Intrusion Detection System [J]. Information Technology (RAIT-2012), IEEE); for example, combining clustering with classification gives good results. Syed Ali Raza Shah et al. directly compared the detection performance of machine learning methods applied to the Snort intrusion detection system (Syed Ali Raza Shah, Biju Issac. 2017. Performance comparison of intrusion detection systems and application of machine learning to Snort system [J]. Future Generation Computer Systems: 157-170).
Besides the data-mining-based intrusion detection methods above, flow-based intrusion detection (Umer, M.F., Sher, B.Y., Bi, Y.X. 2017. Flow-based intrusion detection: Techniques and challenges [J]. Computers & Security: 238-254) is an innovative approach for detecting intrusions in high-speed networks. Flow-based intrusion detection inspects only the packet headers and does not analyze the packet payload. Filter methods use a selection function with predefined criteria to eliminate irrelevant or correlated features from the data set (Barmejo, P., Ossa, L., Gamez, J.A., & Puerta, J.M. 2012. Fast wrapper feature subset selection in high dimensional datasets by means of filter re ranking. Journal of Knowledge Based Systems, 25, 35-44).
Traditional file-analysis intrusion detection methods may be effective against common, general attack types, but they are not effective against new attack techniques. Common data mining methods can handle new attack types, but their detection time is too long.
Summary of the invention
In view of this, the present invention provides a network intrusion detection method based on the combination of PCA and Bayes, which enables fast and effective detection of common, general attack types and of new attack types, with short detection time and high accuracy.
The network intrusion detection method of the invention, based on the combination of PCA and Bayes, comprises the following steps:
Step 1: extract the network data features of each network connection record in the network training data set and the network test data set, and construct a training network data feature matrix and a test network data feature matrix;
Step 2: using principal component analysis, reduce the dimensionality of the training network data feature matrix and the test network data feature matrix respectively, obtaining the dimension-reduced training network data feature matrix and test network data feature matrix; during principal component analysis, the first 3 eigenvectors are weighted with a weight coefficient k in the range 0 to 1;
Step 3: construct a Bayes classifier and train it with the dimension-reduced training network data feature matrix;
Step 4: perform intrusion detection on the dimension-reduced test network data feature matrix with the trained Bayes classifier.
Further, in step 1, the network data features include the basic features of the TCP connection, the content features of the TCP connection, and network traffic statistical features.
Further, in step 2, the values under each feature in the training network data feature matrix and the test network data feature matrix obtained in step 1 are first normalized, giving the normalized training network data feature matrix and test network data feature matrix; principal component analysis is then performed on the normalized matrices to obtain the dimension-reduced training network data feature matrix and test network data feature matrix.
Further, k = 10^-4 to 10^-6.
Further, the Bayes classifier is a Gaussian naive Bayes classifier.
Beneficial effects:
The invention uses a Bayes classifier to perform intrusion detection quickly, and it can detect both existing attack types and unknown new attack types. Reducing the dimensionality of the network data by principal component analysis further shortens detection time, and the improvement to principal component analysis raises detection accuracy. Compared with traditional data-mining-based network intrusion detection methods, the method of the invention does not greatly improve detection accuracy, but its detection time is far shorter than that of other intrusion detection methods.
Description of the drawings
Fig. 1 is a flow chart of the intrusion detection method of the invention.
Fig. 2 is a line chart comparing detection rates.
Fig. 3 is a line chart comparing detection times.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
As network intrusion techniques keep changing, new attack types keep appearing, and traditional detection methods based on rule files cannot adapt to them. The invention therefore uses a data-mining-based intrusion detection method. However, most existing data-mining-based intrusion detection methods have long detection times and cannot detect intrusions in time. For this reason, the invention provides a network intrusion detection method based on the combination of PCA and Bayes. PCA is first applied to the training data set D and the test data set T to obtain dimension-reduced training and test data, which reduces the model training time of the Bayes classifier and greatly shortens detection time. Intrusion detection is then performed with the Bayes classifier, which has the shortest detection time; detection is fast and also applies to new attack types. A shorter detection time, however, also implies a loss in detection rate, so the invention improves detection accuracy by improving PCA, so that the method performs well in both detection time and detection accuracy. The flow of the intrusion detection method of the invention is shown in Fig. 1 and comprises the following steps:
Step 1: Extract the network data features of each network connection record in the network training data set and the network test data set
The network data features mainly include the basic features of the TCP connection, the content features of the TCP connection, and network traffic statistical features. They are not limited to these three groups; any feature that reflects the network traffic may be used. The features are described below.
The basic features of the TCP connection describe basic attributes of a connection and comprise 9 attributes:
(1) duration: the duration of the connection in seconds, measured from the establishment of the TCP connection by the three-way handshake until the connection ends with FIN/ACK;
(2) protocol_type: the protocol type, one of TCP, UDP, ICMP;
(3) service: the network service type of the destination host;
(4) flag: the normal or error status of the connection;
(5) src_bytes: the number of data bytes from the source host to the destination host;
(6) dst_bytes: the number of data bytes from the destination host to the source host;
(7) land: 1 if the connection is from/to the same host/port, otherwise 0;
(8) wrong_fragment: the number of wrong fragments;
(9) urgent: the number of urgent packets.
The content features of the TCP connection comprise 13 attributes:
(1) hot: the number of accesses to sensitive system files and directories;
(2) num_failed_logins: the number of failed login attempts;
(3) logged_in: 1 if the login succeeded, otherwise 0;
(4) num_compromised: the number of times a compromised condition occurred;
(5) root_shell: 1 if a root shell was obtained, otherwise 0;
(6) su_attempted: 1 if the "su root" command was attempted, otherwise 0;
(7) num_root: the number of accesses by the root user;
(8) num_file_creations: the number of file creation operations;
(9) num_shells: the number of times shell commands were used;
(10) num_access_files: the number of operations on access control files;
(11) num_outbound_cmds: the number of outbound connections in an ftp session;
(12) is_hot_login: whether the login belongs to the "hot" list;
(13) is_guest_login: 1 if the login is a guest login, otherwise 0.
The network traffic statistical features comprise 9 attributes, all computed over the past two seconds:
(1) count: the number of connections to the same destination host as the current connection;
(2) srv_count: the number of connections to the same service as the current connection;
(3) serror_rate: among the connections to the same destination host as the current connection, the percentage with "SYN" errors;
(4) srv_serror_rate: among the connections to the same service as the current connection, the percentage with "SYN" errors;
(5) rerror_rate: among the connections to the same destination host as the current connection, the percentage with "REJ" errors;
(6) srv_rerror_rate: among the connections to the same service as the current connection, the percentage with "REJ" errors;
(7) same_srv_rate: among the connections to the same destination host as the current connection, the percentage to the same service as the current connection;
(8) diff_srv_rate: among the connections to the same destination host as the current connection, the percentage to a different service from the current connection;
(9) srv_diff_host_rate: among the connections to the same service as the current connection, the percentage to a different destination host from the current connection.
The basic features of the TCP connection are denoted B; there are 9 of them, so B = {b1, b2, ..., b9}. The content features of the TCP connection are denoted C; there are 13 of them, so C = {c1, c2, ..., c13}. The network traffic statistical features are denoted F; there are 9 of them, so F = {f1, f2, ..., f9}. The training network data set is denoted D and the test network data set is denoted T. The network connection records in the data sets are Di and Ti.
Definition 1: A record Di in the training set and a record Ti in the test set are as follows:
Di = {B, C, F}, i = 1, 2, ..., n (the training data set contains n records)
Ti = {B, C, F}, i = 1, 2, ..., m (the test data set contains m records)
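Definition 1 can be illustrated with a short sketch. The function name `make_record` and the constant names are hypothetical and introduced only for illustration; the sample values are those of the first training record given in the worked example of this description, with categorical fields assumed to be already encoded as numbers.

```python
from typing import List, Sequence

# Group sizes stated in the text: 9 basic, 13 content, 9 traffic features.
N_BASIC, N_CONTENT, N_TRAFFIC = 9, 13, 9


def make_record(b: Sequence[float], c: Sequence[float], f: Sequence[float]) -> List[float]:
    """Concatenate B, C and F into one 31-dimensional record (D_i or T_i)."""
    if (len(b), len(c), len(f)) != (N_BASIC, N_CONTENT, N_TRAFFIC):
        raise ValueError("expected 9 basic, 13 content and 9 traffic features")
    return [float(v) for v in (*b, *c, *f)]


# First training record from the worked example (already numerically encoded).
B = [0, 2, 19, 8, 181, 5450, 0, 0, 0]
C = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
F = [8, 8, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00]

record = make_record(B, C, F)
D = [record]  # a feature matrix is a list of such rows, one per connection
```

Stacking n such rows gives the n x 31 training matrix D, and m rows give the test matrix T.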
Step 2: Reduce the dimensionality of the training network data feature matrix and the test network data feature matrix with the improved principal component analysis (IPCA) method
Principal component analysis (PCA) uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables, of which the first principal component has the largest variance. It is a common statistical technique for dimensionality reduction. Before intrusion detection with the Bayes classifier, the network data are reduced in dimension by principal component analysis, which greatly shortens detection time.
Definition 2: Since principal component analysis is applied to both the training data and the test data, D and T are written in matrix form, giving the training network data feature matrix D and the test network data feature matrix T. The data matrix is denoted X, so X = D or T. A connection record in X is:
$$X_i = D_i \ \text{or} \ T_i \tag{1}$$
Principal component analysis is performed on the matrix X as follows. First, remove the mean of the data. Second, compute the covariance matrix of the data matrix. Third, compute the eigenvalues and eigenvectors of the covariance matrix. Fourth, sort the eigenvalues from largest to smallest. Fifth, multiply the data matrix X by the eigenvectors corresponding to the d' largest eigenvalues to obtain the dimension-reduced training data set D' or test data set T'. Values of d' from 5 to 8 give better results.
Reducing the dimensionality of the network data by principal component analysis increases detection speed but loses detection accuracy. The invention therefore improves principal component analysis to raise detection accuracy.
The first three eigenvectors of PCA are considered to carry the overall information of the network data. When the network data are affected by some factor, the first three eigenvectors in PCA are likely to be the most seriously contaminated. If the first three eigenvectors are processed appropriately, the influence of that factor can be reduced to some extent, which improves detection accuracy. The invention therefore proposes to weight the first three eigenvectors:
$$\omega' = (k\omega_1, k\omega_2, k\omega_3, \omega_4, \dots, \omega_{d'}) \tag{2}$$
where ω' is the improved set of eigenvectors and k is the introduced weight coefficient, a number between 0 and 1. Its purpose is to reduce the weight of the first three components and thereby reduce the degree to which the data are contaminated. Finally, ω' is used to reduce the dimensionality of the data matrix X. The pseudocode of the improved principal component analysis IPCA is as follows.
Algorithm: IPCA
Input: training network data feature matrix or test network data feature matrix, X = D or T
Output: dimension-reduced training network data feature matrix or test network data feature matrix, X' = D' or T'
1. X' = X - mean  // remove the mean
2. Compute the covariance matrix X'X'^T of the data matrix X'
3. Compute the eigenvalues λ and eigenvectors of the covariance matrix X'X'^T
4. Sort the eigenvalues and take the d' largest: λ1 ≥ λ2 ≥ ... ≥ λd'
5. ω = (ω1, ω2, ω3, ω4, ..., ωd'), where ω1, ω2, ω3, ω4, ..., ωd' are the eigenvectors corresponding to λ1, λ2, λ3, λ4, ..., λd'
6. ω' = (kω1, kω2, kω3, ω4, ..., ωd')
7. X' = Xω'
Line 1 removes the mean of the data matrix, and line 2 computes the covariance matrix of the data matrix. Lines 3 and 4 compute the eigenvalues and eigenvectors of the covariance matrix and sort the eigenvalues from largest to smallest. Line 5 selects the eigenvectors corresponding to the d' largest eigenvalues, which is the solution of traditional PCA. Line 6 weights the traditional PCA solution to obtain the new solution, and the last line multiplies the data matrix by the new solution to reduce the data to d' dimensions. The size of the weight coefficient is determined experimentally; preferably, k = 10^-4 to 10^-6.
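The IPCA pseudocode can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `ipca` and its defaults are hypothetical, rows of X are assumed to be connection records (so the feature covariance is computed as the transpose of the centered matrix times itself), and, as in step 7 of the pseudocode, the original matrix X rather than the centered one is projected.

```python
import numpy as np


def ipca(X: np.ndarray, d_prime: int = 8, k: float = 1e-4) -> np.ndarray:
    """Weighted-PCA sketch: project X onto the top d' eigenvectors,
    with the first three eigenvectors scaled by k."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                # step 1: remove the mean
    cov = centered.T @ centered                  # step 2: (unnormalised) feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # step 3: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1][:d_prime]  # step 4: indices of the d' largest eigenvalues
    omega = eigvecs[:, order]                    # step 5: matching eigenvectors as columns
    weights = np.ones(d_prime)
    weights[:3] = k                              # step 6: down-weight the first three
    omega_weighted = omega * weights             # scales each column by its weight
    return X @ omega_weighted                    # step 7: X' = X omega'


# Usage on random stand-in data with 31 features per record.
rng = np.random.default_rng(0)
X_demo = rng.random((50, 31))
X_reduced = ipca(X_demo, d_prime=5, k=1e-4)      # shape (50, 5)
```

With k = 1 the function reduces to the traditional PCA projection, which makes it easy to compare the two variants on the same data.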
To eliminate the effect of differing scales among the features, the invention normalizes the data matrix X before performing principal component analysis on it. Specifically, each column of X (that is, the values of one feature across the different network connection records) is normalized separately: the maximum and minimum of the values under that feature are found first, and the following formula is then applied to each value under that feature to compute its normalized value, which lies in the range 0 to 1.
$$\tilde{X}_{i,j} = \frac{X_{i,j} - X_{j,\min}}{X_{j,\max} - X_{j,\min}} \tag{3}$$
where $\tilde{X}_{i,j}$ is the normalized value of $X_{i,j}$, $X_{j,\max}$ is the maximum value in column j of the data X, and $X_{j,\min}$ is the minimum value in column j of the data X.
Principal component analysis is then performed on $\tilde{X}$ to obtain the dimension-reduced data matrix X'.
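The column-wise normalization described above, which maps each value to (value - column minimum) / (column maximum - column minimum), can be sketched as below. The function name `min_max_normalize` is hypothetical. Mapping a constant column to 0 is an assumption made here to avoid division by zero; the text does not say how such a column is handled.

```python
from typing import List, Sequence


def min_max_normalize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column of X to the range 0-1 using its own min and max."""
    columns = list(zip(*X))                  # transpose: one tuple per feature
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in X
    ]


# Three records with two features; the second feature is constant.
normalized = min_max_normalize([[0, 10], [5, 10], [10, 10]])
```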
Step 3: Construct a Bayes classifier and train it with the dimension-reduced training network data matrix
In principle, any type of Bayes classifier can be used in the invention, such as a Gaussian naive Bayes classifier or a Bernoulli naive Bayes classifier. This embodiment uses the Gaussian naive Bayes classifier as an example.
When all relevant probabilities are known, Bayesian decision theory considers how to select the optimal class label based on these probabilities and the misclassification loss. The intrusion detection task is to judge whether network traffic is normal or abnormal. It is therefore assumed that there are 2 possible class labels γ = {c1, c2}, where c1 denotes the normal class and c2 the abnormal class. For each connection record x = Xi, the class label that maximizes the posterior probability P(c | x) is selected. By Bayes' theorem, P(c | x) can be written as
$$P(c \mid x) = \frac{P(c)\,P(x \mid c)}{P(x)} \tag{4}$$
where P(c) is the class prior probability, P(x | c) is the class-conditional probability of the connection record x with respect to the class label c, and P(x) is the evidence factor used for normalization. For a given connection record x, the evidence factor P(x) is independent of the class label, so P(c | x) depends only on P(c) and P(x | c).
The naive Bayes classifier adopts the "attribute conditional independence assumption": given the class, all attributes are assumed to be mutually independent; in other words, each attribute is assumed to influence the classification result independently. The formula above can therefore be rewritten as
$$P(c \mid x) = \frac{P(c)}{P(x)} \prod_{i=1}^{d} P(x_i \mid c) \tag{5}$$
where d is the number of attributes of each connection record (d = 31 before IPCA is applied) and $x_i$ is the value of the connection record x on the i-th attribute. Since P(x) is the same for all classes, the naive Bayes classifier is
$$h(x) = \arg\max_{c \in \gamma} P(c) \prod_{i=1}^{d} P(x_i \mid c) \tag{6}$$
For continuous attributes, a probability density function can be used. Assume $p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2)$, where $\mu_{c,i}$ and $\sigma_{c,i}^2$ are the mean and variance of the values of the class-c samples on the i-th attribute. Then
$$P(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right) \tag{7}$$
For each connection record, the posterior probabilities of the normal and abnormal classes are computed separately, and the class with the larger value is taken as the class label of the record.
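The Gaussian naive Bayes classifier described above (class priors, per-attribute Gaussian class-conditional densities, and selection of the class with the largest posterior) can be sketched in plain Python. The class name `GaussianNaiveBayes` is hypothetical. Two choices here are assumptions not stated in the text: probabilities are summed in log space for numerical stability, which does not change the selected class, and a small variance floor avoids division by zero on constant attributes.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    def fit(self, X, y, var_floor=1e-9):
        """Estimate the prior P(c) and per-attribute mean/variance for each class."""
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.priors, self.means, self.variances = {}, {}, {}
        for label, rows in groups.items():
            columns = list(zip(*rows))
            self.priors[label] = len(rows) / len(X)
            self.means[label] = [sum(col) / len(col) for col in columns]
            self.variances[label] = [
                max(sum((v - m) ** 2 for v in col) / len(col), var_floor)
                for col, m in zip(columns, self.means[label])
            ]
        return self

    def _log_posterior(self, x, label):
        """log P(c) + sum_i log P(x_i | c); the evidence P(x) is omitted."""
        total = math.log(self.priors[label])
        for v, m, s2 in zip(x, self.means[label], self.variances[label]):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total

    def predict(self, X):
        """Return, for each record, the class with the largest posterior."""
        return [max(self.priors, key=lambda c: self._log_posterior(x, c)) for x in X]


# Toy data: two well-separated classes in two dimensions.
train_X = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
train_y = ["normal"] * 3 + ["abnormal"] * 3
model = GaussianNaiveBayes().fit(train_X, train_y)
labels = model.predict([[0.02, 0.03], [0.97, 0.99]])
```

In the method of this description, the classifier would be fitted on the dimension-reduced training matrix D' and applied to the dimension-reduced test matrix T'.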
Step 4: Perform intrusion detection on the dimension-reduced test network data matrix T' with the trained Bayes classifier.
Specifically, the Gaussian naive Bayes classifier (GNB) works as follows. The Gaussian naive Bayes classifier is applied to the dimension-reduced training data set D' (during training) or test data set T' (during testing) to determine the class of each record. First, the conditional probability P(xi | c) of each attribute is computed according to formula (7). Then the class prior probabilities P(c) of the normal and abnormal classes are computed. Finally, the posterior probabilities that the record is normal and abnormal are computed by formula (6), and the class with the larger posterior probability is taken as the detection result for the record.
Alarm mechanism: if the detection result is normal, nothing is done; if the network data are detected as abnormal, an alarm is sent to the user so that the user can discover the intrusion quickly and respond in time to avoid losses. The form of the alarm is not the focus of the invention. The full mode of the Snort intrusion detection system may be used: a directory is created for each IP that generates an alarm, and the decoded packet data are recorded in the corresponding directory; besides the alarm message, the packet header information can also be output.
The weight coefficient of the improved PCA must be adjusted repeatedly to find the most suitable value, so that the model detects better. The detection time is the sum of the model's training time and testing time, and the detection rate is the number of correctly detected records in the test set divided by the total number of records in the test set.
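The two evaluation quantities defined above can be computed as in the following sketch. The helper names `detection_rate` and `timed` are hypothetical; `time.perf_counter` is used as one reasonable way to measure elapsed time, which the text does not specify.

```python
import time
from typing import Callable, Sequence, Tuple, Any


def detection_rate(predicted: Sequence[Any], actual: Sequence[Any]) -> float:
    """Correctly detected test records divided by the total number of test records."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def timed(func: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run func(*args) and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Detection time = training time + testing time; stand-in functions are timed here.
_, train_seconds = timed(sorted, [3, 1, 2])
_, test_seconds = timed(sum, [1, 2, 3])
detection_time = train_seconds + test_seconds

rate = detection_rate(["normal", "abnormal", "normal", "abnormal"],
                      ["normal", "abnormal", "abnormal", "abnormal"])
```

To tune k, these two measurements would be repeated for each candidate weight coefficient and the value giving the best detection rate retained.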
Worked example of the data processing:
The processing flow can be understood from an example. The first record in the training set D is as follows:
0,2,19,8,181,5450,0,0,0, (basic features of the TCP connection)
0,0,1,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)
8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00 (time-based network traffic statistical features)
The data set D is normalized according to formula (3). After normalization, the first record is as follows:
0,0.0018,0.0012,0.0016,0,0,0,0,0, (basic features of the TCP connection)
0,0,0.0037,0,0,0,0,0,0,0,0,0,0, (content features of the TCP connection)
0,0,0.00,0.00,0.00,0.00,0.0016,0.00,0.00 (time-based network traffic statistical features)
All features have been mapped to the interval 0 to 1. After the IPCA process, the first connection record after dimensionality reduction is as follows:
24.701,0,0,0.001,0,-0.001,0,0
After the training data set D and the test data set T have been reduced in dimension, intrusion detection can be performed with Gaussian naive Bayes. To evaluate the model proposed by the present invention, it is compared with the classical Gaussian naive Bayes method and with the method that combines traditional PCA with Gaussian naive Bayes, which demonstrates the significance of introducing PCA and of improving PCA. The Bayes classifier is also compared with other classifiers in intrusion detection, which demonstrates the advantage of the Bayes classifier in detection time.
The experimental environment is a PC with an Intel G2020 processor, 8 GB of memory, and the Windows 7 operating system. The performance of common classifiers in intrusion detection is compared first, as shown in Table 1 below.
Table 1 Comparison of common classifiers
The comparison shows that although GNB has the lowest detection rate, it can train and test the model within 1.42 s. The other classifiers have higher detection rates but cannot meet the detection-time requirements of intrusion detection. The support vector machine classifier even needs more than 10 hours, and the iterative decision tree classifier needs more than 2 minutes. The present invention therefore selects the GNB classifier for intrusion detection: although its detection rate is lower than that of the other classifiers, its detection time is shorter, and with the improved PCA introduced by the present invention its detection rate will be no lower than that of the other classifiers. Introducing PCA also greatly shortens the GNB time.
The improved PCA sets different weights for the first three eigenvectors and observes how the accuracy changes, in order to find the best weight coefficient for the model. The different weight coefficient values and the corresponding detection accuracy are shown in Table 2 below.
Table 2 Weight coefficients and accuracy
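The search for a weight coefficient amounts to a simple sweep over candidate values. The sketch below is a generic illustration, not the patent's procedure: `best_weight`, the candidate grid, and the stand-in scoring function `toy_accuracy` are hypothetical, and the scores are placeholders rather than measured accuracies.

```python
def best_weight(candidates, evaluate):
    """Return ((k, accuracy), all_results) for the best-scoring candidate.

    evaluate is a caller-supplied function k -> accuracy; in practice it
    would rerun dimensionality reduction, training and testing for each k.
    """
    results = [(k, evaluate(k)) for k in candidates]
    return max(results, key=lambda pair: pair[1]), results

# Hypothetical candidate grid and a stand-in scoring function.
candidates = [1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
toy_accuracy = lambda k: 0.9 - abs(k - 1e-5)   # placeholder, not real data
(best_k, best_acc), table = best_weight(candidates, toy_accuracy)
```

Keeping the full `table` of (k, accuracy) pairs, rather than only the best value, makes it possible to tabulate the sweep afterwards.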
The model of the present invention is compared with the models before improvement in detection accuracy and in detection time, as shown in Fig. 2 and Fig. 3. The comparison demonstrates that, relative to the classical Bayesian model, introducing PCA greatly shortens the detection time, and improving PCA makes the detection accuracy of the model no lower than that of other data mining methods.
In conclusion the above is merely preferred embodiments of the present invention, being not intended to limit the scope of the present invention.
All within the spirits and principles of the present invention, any modification, equivalent replacement, improvement and so on should be included in the present invention's
Within protection domain.
Claims (5)
1. A network intrusion detection method based on the combination of PCA and Bayes, characterized by comprising the following steps:
Step 1: extracting the network data features of each network connection record in a network training data set and a network test data set, and constructing a training network data feature matrix and a test network data feature matrix;
Step 2: performing dimensionality reduction on the training network data feature matrix and the test network data feature matrix respectively using principal component analysis, to obtain the training network data feature matrix and the test network data feature matrix after dimensionality reduction; wherein, during principal component analysis, the first 3 eigenvectors are weighted, with weight coefficient k = 0 to 1;
Step 3: constructing a Bayes classifier, and training the Bayes classifier using the training network data feature matrix after dimensionality reduction;
Step 4: performing intrusion detection on the test network data feature matrix after dimensionality reduction using the trained Bayes classifier.
2. The network intrusion detection method based on the combination of PCA and Bayes according to claim 1, characterized in that, in step 1, the network data features include the basic features of the TCP connection, the content features of the TCP connection, and network traffic statistical features.
3. The network intrusion detection method based on the combination of PCA and Bayes according to claim 1, characterized in that, in step 2, the values under each feature in the training network data feature matrix and the test network data feature matrix obtained in step 1 are first normalized, to obtain the normalized training network data feature matrix and test network data feature matrix; principal component analysis is then performed on the normalized training network data feature matrix and test network data feature matrix, to obtain the training network data feature matrix and the test network data feature matrix after dimensionality reduction.
4. The network intrusion detection method based on the combination of PCA and Bayes according to claim 3, characterized in that k = 10^-4 to 10^-6.
5. The network intrusion detection method based on the combination of PCA and Bayes according to claim 1, characterized in that the Bayes classifier is a Gaussian naive Bayes classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810433476.2A CN108632278A (en) | 2018-05-08 | 2018-05-08 | A kind of network inbreak detection method being combined with Bayes based on PCA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810433476.2A CN108632278A (en) | 2018-05-08 | 2018-05-08 | A kind of network inbreak detection method being combined with Bayes based on PCA |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108632278A (en) | 2018-10-09 |
Family
ID=63695907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810433476.2A Pending CN108632278A (en) | 2018-05-08 | 2018-05-08 | A kind of network inbreak detection method being combined with Bayes based on PCA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108632278A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110276195A (en) * | 2019-04-25 | 2019-09-24 | 北京邮电大学 | A kind of smart machine intrusion detection method, equipment and storage medium |
CN110138776A (en) * | 2019-05-14 | 2019-08-16 | 重庆天蓬网络有限公司 | Docker intrusion detection method, device and medium based on order monitoring |
CN110062011A (en) * | 2019-05-30 | 2019-07-26 | 海南大学 | Ddos attack detection method and device based on V-SVM |
CN110868414A (en) * | 2019-11-14 | 2020-03-06 | 北京理工大学 | Industrial control network intrusion detection method and system based on multi-voting technology |
CN111553381A (en) * | 2020-03-23 | 2020-08-18 | 北京邮电大学 | Network intrusion detection method and device based on multiple network models and electronic equipment |
CN111553381B (en) * | 2020-03-23 | 2022-11-18 | 北京邮电大学 | Network intrusion detection method and device based on multiple network models and electronic equipment |
CN113688436A (en) * | 2020-05-19 | 2021-11-23 | 天津大学 | PCA and naive Bayes classification fusion hardware Trojan horse detection method |
CN111988306A (en) * | 2020-08-17 | 2020-11-24 | 北京邮电大学 | Method and system for detecting DDoS attack traffic in network based on variational Bayes |
CN112185484A (en) * | 2020-10-13 | 2021-01-05 | 华北科技学院 | AdaBoost model-based water quality characteristic mineral water classification method |
CN113255212A (en) * | 2021-05-17 | 2021-08-13 | 中国南方电网有限责任公司超高压输电公司昆明局 | Model selection method for converter valve cooling system based on PCA and Bayesian classifier |
CN113726785A (en) * | 2021-08-31 | 2021-11-30 | 平安普惠企业管理有限公司 | Network intrusion detection method and device, computer equipment and storage medium |
CN113726785B (en) * | 2021-08-31 | 2022-11-11 | 平安普惠企业管理有限公司 | Network intrusion detection method and device, computer equipment and storage medium |
CN117650949A (en) * | 2024-01-30 | 2024-03-05 | 山东鲁商科技集团有限公司 | Network attack interception method and system based on RPA robot data analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108632278A (en) | A kind of network inbreak detection method being combined with Bayes based on PCA | |
Yang et al. | MTH-IDS: A multitiered hybrid intrusion detection system for internet of vehicles | |
Aljawarneh et al. | Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model | |
Depren et al. | An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks | |
Gogoi et al. | MLH-IDS: a multi-level hybrid intrusion detection method | |
Bai et al. | A machine learning approach for rdp-based lateral movement detection | |
Farahani | Feature selection based on cross-correlation for the intrusion detection system | |
Diallo et al. | Adaptive clustering-based malicious traffic classification at the network edge | |
Garg et al. | HyClass: Hybrid classification model for anomaly detection in cloud environment | |
Diwan et al. | Feature entropy estimation (FEE) for malicious IoT traffic and detection using machine learning | |
Al-Fawa'reh et al. | Detecting stealth-based attacks in large campus networks | |
Zhong et al. | An adversarial learning model for intrusion detection in real complex network environments | |
Brandao et al. | Log Files Analysis for Network Intrusion Detection | |
Silva et al. | Attackers are not stealthy: Statistical analysis of the well-known and infamous KDD network security dataset | |
Zhang et al. | Detection of android malware based on deep forest and feature enhancement | |
Sakthivelu et al. | Advanced Persistent Threat Detection and Mitigation Using Machine Learning Model. | |
Dharamkar et al. | A review of cyber attack classification technique based on data mining and neural network approach | |
Seniaray et al. | Machine learning-based network intrusion detection system | |
Kosamkar et al. | Data Mining Algorithms for Intrusion Detection System: An Overview | |
Caulkins et al. | A dynamic data mining technique for intrusion detection systems | |
Sulaiman et al. | Big data analytic of intrusion detection system | |
Manandhar | A practical approach to anomaly-based intrusion detection system by outlier mining in network traffic | |
Rani et al. | Analysis of machine learning and deep learning intrusion detection system in Internet of Things network | |
Babu et al. | Improved Monarchy Butterfly Optimization Algorithm (IMBO): Intrusion Detection Using Mapreduce Framework Based Optimized ANU-Net. | |
Mohammed et al. | An automated signature generation method for zero-day polymorphic worms based on multilayer perceptron model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181009 |