CN109257383A - BGP anomaly detection method and system - Google Patents
BGP anomaly detection method and system
- Publication number
- CN109257383A (application CN201811331848.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- optimization
- distance
- data set
- abnormal data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Complex Calculations (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a BGP anomaly detection method and system. The method includes: obtaining an anomalous data set; performing data normalization on the anomalous data set; selecting from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight that measures each feature's classification ability; optimizing a Gaussian kernel function; performing parameter optimization; and determining an optimal feature subset. The application performs parameter optimization based on an improved Gaussian kernel function and on grid search with cross-validation, thereby improving model classification accuracy, and evaluates the comprehensive performance of the model based on the optimal feature subset.
Description
Technical field
This application relates to the technical field of anomaly detection, and in particular to a BGP (Border Gateway Protocol) anomaly detection method and system.
Background art
By event consequence, BGP anomalies can be divided into data-flow hijacking anomalies and update-message surge anomalies. A data-flow hijacking anomaly redirects the victim network's traffic, forms a traffic black hole, and so on, destroying the reachability of the victim network. An update-message surge anomaly generates a large number of BGP update messages in a very short time, undermining the stability of the global Internet.
Current BGP anomaly detection methods generally fall into five classes: methods based on statistical pattern recognition, methods based on historical BGP update messages, methods based on reachability verification, methods based on time series analysis, and methods based on machine learning. Methods based on statistical pattern recognition perform pattern recognition using statistical probability theory and determine anomalies according to a distance function between patterns; they can detect both data-flow hijacking anomalies and update-message surge anomalies. However, these methods face the difficulty of correctly estimating the distribution of high-dimensional data, their detection speed is slow, and in practice the thresholds of the model parameters must be determined manually. Methods based on historical BGP update messages and methods based on reachability verification can only detect data-flow hijacking anomalies: the former use historical data to detect anomalous BGP routes, while the latter perform anomaly detection according to the reachability verification result of the destination prefix. Methods based on time series analysis and methods based on machine learning can detect update-message surge anomalies. Among them, methods based on time series analysis treat BGP update messages as a multidimensional time series and implement anomaly detection by choosing a suitable sliding time window. However, it is difficult to determine the size of the time window: a window that is too small leaves the model with insufficient information, while a window that is too large makes the model insensitive to local anomalies, so that the miss rate rises. In recent years, machine learning methods have found some application in the field of BGP anomaly detection. From the machine learning perspective, the BGP anomaly detection problem can be abstracted as a binary classification problem whose goal is to identify an unknown BGP update message as a normal message or an anomalous message, thereby realizing BGP anomaly detection.
In summary, traditional BGP anomaly detection methods suffer from a series of practical problems, such as low detection accuracy, difficulty in estimating parameter thresholds, slow detection speed, high deployment cost, and dependence on the completeness of a knowledge base.
Therefore, how to overcome the low classification accuracy and unsatisfactory performance of the prior art, and the lack of an evaluation of the comprehensive performance of the model, is an urgent problem to be solved.
Summary of the invention
In view of this, the present application provides a BGP anomaly detection method that performs parameter optimization based on an improved Gaussian kernel function and on grid search with cross-validation, thereby improving model classification accuracy, and that evaluates the comprehensive performance of the model based on an optimal feature subset.
This application provides a BGP anomaly detection method, the method comprising:
Obtaining an anomalous data set;
Performing data normalization on the anomalous data set;
Selecting from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight that measures each feature's classification ability;
Optimizing a Gaussian kernel function;
Performing parameter optimization;
Determining an optimal feature subset.
Preferably, obtaining the anomalous data set comprises:
Obtaining the anomalous data set from an autonomous system.
Preferably, performing data normalization on the anomalous data set comprises:
Replacing the population mean with the sample mean, and replacing the population standard deviation with the sample standard deviation.
Preferably, optimizing the Gaussian kernel function comprises:
Optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
Preferably, the parameter optimization comprises:
Performing parameter optimization on a support vector machine model based on grid search and cross-validation.
A BGP anomaly detection system, comprising:
An obtaining module, configured to obtain an anomalous data set;
A processing module, configured to perform data normalization on the anomalous data set;
A first determining module, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight that measures each feature's classification ability;
An optimization module, configured to optimize a Gaussian kernel function;
A parameter-search module, configured to perform parameter optimization;
A second determining module, configured to determine an optimal feature subset.
Preferably, the obtaining module is specifically configured to:
Obtain the anomalous data set from an autonomous system.
Preferably, the processing module is specifically configured to:
Replace the population mean with the sample mean, and replace the population standard deviation with the sample standard deviation.
Preferably, the optimization module is specifically configured to:
Optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
Preferably, the parameter-search module is specifically configured to:
Perform parameter optimization on the support vector machine model based on grid search and cross-validation.
In conclusion this application discloses a kind of BGP method for detecting abnormality, it is abnormal when needing to carry out Border Gateway Protocol
When detection, first then acquisition abnormal data set carries out data normalization processing to abnormal data set, selects from feature set
Between class distance can be maximized simultaneously out and minimize the feature of inter- object distance, and obtain the feature of each characteristic measure classification capacity
Weight, optimization gauss kernel function, parameter optimization determine optimal feature subset.The application can be based on improved gaussian kernel function
And parameter optimization is carried out based on grid search and cross validation, to improve category of model accuracy rate, it is based on optimal feature subset
Carry out evaluation model comprehensive performance.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of Embodiment 1 of a BGP anomaly detection method disclosed in the present application;
Fig. 2 is a flowchart of Embodiment 2 of a BGP anomaly detection method disclosed in the present application;
Fig. 3 is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in the present application;
Fig. 4 is a structural schematic diagram of Embodiment 2 of a BGP anomaly detection system disclosed in the present application;
Fig. 5 is a schematic diagram of the grid search disclosed in the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
As shown in Fig. 1, which is a flowchart of Embodiment 1 of a BGP anomaly detection method disclosed in the present application, the method may include the following steps:
S101: obtain an anomalous data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomalous data set, i.e., detection samples, is first obtained. BGP is a core decentralized autonomous routing protocol on the Internet; it realizes reachability between autonomous systems by maintaining routing tables and belongs to the class of vector routing protocols.
S102: perform data normalization on the anomalous data set;
After the anomalous data set is obtained, data normalization is further performed on it to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted.
S103: select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain a feature weight measuring each feature's classification ability;
Then, the FMS (Fisher-Markov Selector) feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability.
S104: optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is constructed based on the improved Gaussian kernel function that uses the Manhattan distance and the feature weights. The SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
S105: perform parameter optimization;
Then, parameter optimization is performed on the SVM model based on grid search and cross-validation.
S106: determine the optimal feature subset.
Finally, the concept of an optimal feature subset is proposed based on jointly considering model classification accuracy and model training time, and a construction method is given; under the optimal feature subset, the comprehensive performance of the model is optimal.
In conclusion in the above-described embodiments, when needing to carry out abnormality detection Border Gateway Protocol, obtaining first different
Then regular data collection carries out data normalization processing to abnormal data set, between selecting in feature set and can maximize class simultaneously
Distance and the feature for minimizing inter- object distance, and the feature weight of each characteristic measure classification capacity is obtained, optimization gauss core letter
Number, parameter optimization determine optimal feature subset.The application can based on improved gaussian kernel function and based on grid search with
Cross validation carries out parameter optimization, to improve category of model accuracy rate, based on optimal feature subset come evaluation model comprehensive performance.
As shown in Fig. 2, which is a flowchart of Embodiment 2 of a BGP anomaly detection method disclosed in the present application, the method may include the following steps:
S201: obtain an anomalous data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages collected during the outbreaks of Slammer, Nimda, and Code Red I are downloaded from AS513 (RIPE RIS collector rrc04, CIXP, Geneva) as the BGP anomalous data set. The routing data are converted from MRT format to ASCII format using the libBGPdump tool; the ASCII files are then parsed by an analysis tool written in C#, and statistics of 37 features are extracted (as shown in Table 1). Feature values are sampled once per minute over five days, so that 7200 samples are obtained for each anomalous event. The samples of the two days before and the two days after each event are regarded as the normal data set, and the third day is the peak period of each anomalous activity.
Table 1 Feature extraction
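The 37 extracted features of Table 1 are not reproduced in this text. As a minimal sketch of the per-minute sampling described above, the following snippet (with hypothetical record and column names) counts parsed update messages per minute and per message type; the actual feature set of the patent is broader.

```python
# Minimal sketch: per-minute counts of parsed BGP update records.
# Assumes each record is a (unix_timestamp, msg_type) pair, e.g. "A" for an
# announcement and "W" for a withdrawal; illustrative only, since the full
# 37-feature list of Table 1 is not reproduced here.
import pandas as pd

def minute_counts(records):
    df = pd.DataFrame(records, columns=["time", "type"])
    df["time"] = pd.to_datetime(df["time"], unit="s")
    df["minute"] = df["time"].dt.floor("min")
    counts = pd.crosstab(df["minute"], df["type"])   # rows: minutes, columns: message types
    return counts.add_prefix("num_")                 # e.g. num_A, num_W
```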
S202: perform data normalization on the anomalous data set;
Then, the Z-score standardization method is used to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted. Considering that the BGP data set comes from sampled statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing is given by:
x′ = (x − x̄) / S  (1)
where x̄ denotes the sample mean and S denotes the sample standard deviation.
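As a minimal sketch of the Z-score standardization in formula (1), the following snippet standardizes a feature matrix column-wise with the sample mean and the sample standard deviation.

```python
# Z-score standardization using the sample mean and sample standard deviation.
import numpy as np

def zscore(X):
    """X: (n_samples, n_features) feature matrix; returns the standardized matrix."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # ddof=1 gives the sample standard deviation S
    return (X - mean) / std
```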
S203: select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain a feature weight measuring each feature's classification ability;
Because redundant features are present, building a classification model on high-dimensional features increases computational overhead, and noisy data can also lower the classification accuracy of the model. Therefore, on the basis of the feature set obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be further removed to find the feature subset with the best class-discriminating ability, so as to reduce the dimension of the feature matrix and the computational complexity while improving model classification accuracy.
The FMS feature selection algorithm selects, according to Fisher linear analysis and the Markov random field technique, the features that simultaneously maximize the between-class distance and minimize the within-class distance. The feature selection process of the FMS algorithm is independent of the classification process; correlation is measured according to the intrinsic attributes of the data, and the features are ranked by the size of their weights, a larger weight indicating a stronger class-discriminating ability of the feature. The method is efficient: it not only guarantees a global optimum but also has low computational complexity, and is therefore suitable for large-scale data. The training data are denoted {(x_k, y_k)}, k = 1, …, n, where x_k ∈ R^p is a p-dimensional feature vector, y_k ∈ {ω_1, …, ω_g} is the class label, C_i, i = 1, …, g, denotes the i-th class, and each class C_i has n_i samples.
Definition 1: the within-class scatter matrix, the between-class scatter matrix, and the total scatter matrix are denoted S_w, S_b, and S_t, respectively; they are defined in terms of the i-th sample of class C_j, the sample mean of class C_j, and the overall sample mean.
Definition 2: the nonlinear mapping φ(·) from the input space R^p to the kernel space R^D is defined as:
φ: R^p → R^D  (2)
Definition 3: the kernel function k(·, ·) satisfies:
⟨φ(x1), φ(x2)⟩ = k(x1, x2)  (3)
where the operator ⟨·, ·⟩ denotes the dot product in the kernel space.
Definition 4: K and K^(i) are square matrices of order n and n_i, respectively, whose entries satisfy formulas (4) and (5), where k, l ∈ {1, …, n}, u, v ∈ {1, …, n_i}, and i = 1, …, g.
Definition 5: the kernel-space counterparts of S_w, S_b, and S_t are defined accordingly, and their traces are expressed in terms of sum(·), which denotes the sum of all elements of a matrix.
Definition 6: the feature selection vector is defined as:
α = [α_1, …, α_p]^T ∈ {0, 1}^p  (6)
where α_k = 1 indicates that the k-th feature is selected and α_k = 0 indicates that the k-th feature is not selected.
The features selected from a feature vector x are given by x(α) = x ⊙ α, where ⊙ denotes the Hadamard product. The feature selection criterion that maximizes class separation can therefore be converted into the unconstrained optimization problem of formula (7), where γ is a free parameter; analysis shows that γ ≤ 0 yields better classification results, and within a reasonable range the experimental performance of the classifier is insensitive to γ. To handle linearly inseparable, strongly noisy data sets caused by redundant features, an L0-norm term is added, giving the feature selection criterion of formula (8), in which the regularization factor β represents a global threshold.
Consider the linear kernel function of formula (9). Substituting it into formula (8) yields formula (10), in which θ_j, defined by formula (11), measures how strongly feature j contributes to class separation, i.e., the weight of the feature; the larger θ_j is, the more important the j-th feature.
For given β and γ, combining formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector α* ∈ {0, 1}^p satisfying formula (12): if θ_j > β, the j-th feature is selected; otherwise, the j-th feature is not selected.
The pseudo-code of the algorithm is shown as Algorithm 1 in Table 2 below; its computational complexity is O(n²p).
Table 2
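Since the pseudo-code of Table 2 and the exact expression of θ_j in formula (11) are not reproduced in this text, the following sketch only illustrates the shape of the procedure: each feature receives a weight measuring class separation (a per-feature Fisher-style score is used here as a stand-in for the patent's θ_j), and features whose weight exceeds the global threshold β are selected.

```python
# Sketch of FMS-style feature weighting and thresholding (simplified stand-in;
# the patent's exact theta_j of formula (11) is not reproduced in this text).
import numpy as np

def feature_weights(X, y):
    """X: (n, p) standardized features; y: class labels. Returns one weight per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)   # larger weight => stronger class separation

def select_features(weights, beta):
    """Selection vector alpha: alpha_j = 1 when the weight exceeds the threshold beta."""
    return weights > beta
```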
S204: optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function measures the distance between two vectors with the Euclidean distance. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors on the distance computation and thus affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the distance metric between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the computational cost is lower. Moreover, if the contribution of each feature to classification is reflected in the distance computation, the classification method will fit the data characteristics of BGP better and the classification accuracy can be further improved. Accordingly, feature weights are introduced to measure the contribution of each feature to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted k′(x, y), as shown in formula (13):
k′(x, y) = exp(−γ·δ(x, y))  (13)
where δ(x, y) denotes the Manhattan distance between the two vectors, as shown in formula (14).
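A minimal sketch of the improved kernel of formula (13): since formula (14) is not reproduced in this text, a feature-weighted Manhattan distance δ(x, y) = Σ_j w_j·|x_j − y_j| is assumed here. The returned callable computes a full Gram matrix and can be passed, for example, to scikit-learn's SVC through its kernel argument.

```python
# Sketch of the improved Gaussian kernel k'(x, y) = exp(-gamma * delta(x, y)),
# with delta assumed to be a feature-weighted Manhattan distance.
import numpy as np

def make_weighted_manhattan_kernel(weights, gamma):
    w = np.asarray(weights, dtype=float)

    def kernel(X, Y):
        # pairwise weighted Manhattan distances, then the exponential envelope
        diff = np.abs(X[:, None, :] - Y[None, :, :])   # shape (n_x, n_y, p)
        delta = (diff * w).sum(axis=-1)
        return np.exp(-gamma * delta)                  # Gram matrix of shape (n_x, n_y)

    return kernel

# Usage (illustrative): clf = sklearn.svm.SVC(kernel=make_weighted_manhattan_kernel(w, 0.5), C=1.0)
```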
S205: perform parameter optimization on the support vector machine model based on grid search and cross-validation;
The performance of an SVM model depends on a pair of important parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors: the larger C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. A value of C that is either too large or too small reduces the generalization ability of the model, so an appropriate value of C is of great significance for improving the model's classification accuracy and generalization ability. Gamma is a parameter of the polynomial, Gaussian, and sigmoid kernels; it implicitly determines the distribution of the data after they are mapped to the new feature space. The larger gamma is, the fewer support vectors there are; the smaller gamma is, the more support vectors there are; and the number of support vectors affects the speed of model training and prediction.
Considering the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm that takes overall classification accuracy as its evaluation target pays too much attention to the majority class, so that the classification performance on the minority-class samples declines. Therefore, during the search for (C, gamma), the minority-class samples must be given sufficient consideration so that the two classes of samples have the same "say" in the training process. Here, the two classes of samples are assigned weights in inverse proportion to the ratio of their sample counts, which effectively addresses the imbalance of the data.
Table 3 Weights of the two classes of samples
The selection of the kernel parameters is a difficult point; there is as yet no internationally recognized universal method, and in practice they can only be obtained through experimental comparison or experience. Therefore, grid search and cross-validation are combined here to perform parameter optimization under the constraint of an imbalanced data set (as shown in Fig. 5): the search range of (C, gamma) is divided into a grid according to the values, and each point of the grid represents one parameter combination. The range of the grid search satisfies formula (16) with a step size of 1, i.e., C ∈ {2^−5, 2^−4, …, 2^5} and gamma ∈ {2^−4, 2^−3, …, 2^0}.
At each grid point, cross-validation is carried out as follows: the total training set is divided into N subsets, of which N − 1 serve as the training set and the remaining one serves as the test set. Each time, the trained model is evaluated on the test set, yielding one classification accuracy; after all N subsets have served as the test set, the average of the N-fold cross-validation classification accuracies is taken. All points of the grid are traversed in this way, and the point with the largest average classification accuracy is taken as the best-performing (C, gamma). It should be noted that 5-fold cross-validation is used here, and that because (C, gamma) takes a limited, discrete set of values within the chosen search range, the resulting (C, gamma) may only be a locally optimal solution.
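As a sketch of the parameter search described above, the following snippet runs a (C, gamma) grid search with 5-fold cross-validation and inverse-frequency class weights using scikit-learn; the built-in RBF kernel is used for brevity, and the custom kernel from the previous sketch could be substituted via the kernel argument.

```python
# Grid search over C in {2^-5,...,2^5} and gamma in {2^-4,...,2^0} with
# 5-fold cross-validation and class weights inversely proportional to class counts.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def tune_svm(X, y):
    param_grid = {
        "C": [2.0 ** k for k in range(-5, 6)],      # step 1 in the exponent
        "gamma": [2.0 ** k for k in range(-4, 1)],
    }
    svc = SVC(kernel="rbf", class_weight="balanced")  # inverse-frequency class weights
    search = GridSearchCV(
        svc, param_grid,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_
```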
S206: determine the optimal feature subset.
Based on the FMS feature selection algorithm, the weight of each feature is obtained; the features are sorted in descending order of weight, and according to this ranking they are added to the model training set one by one. Experiments show that, because the features added early have larger weights, the classification accuracy of the model rises gradually at first; but as lower-weight features are added later, and because of the presence of noise and redundant data in the data set, the growth of the model's classification accuracy slows down and the accuracy may even decline. At the same time, the training time of the SVM model keeps increasing as the number of features increases. Therefore, simply adding more features to model training is not appropriate. On this basis, the concept of an optimal feature subset is proposed: when, with the features sorted by weight, the feature set used for model training is exactly the optimal feature subset, the model performance (i.e., the classification accuracy and the training time of the model) reaches a comprehensive optimum. Furthermore, a feature efficiency function is proposed to measure the relationship between the model's classification accuracy and its training time, so as to determine the optimal feature subset and make the model performance comprehensively optimal.
Definition 7: the function f(n) is the model classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: the function g(n) is the model training time as a function of the number of features n, n ∈ Z.
From the above definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when the model training set contains a given number of features. To evaluate the comprehensive performance of the model, a feature efficiency function is defined, as shown in Definition 9.
Definition 9: h(n) is the feature efficiency function with respect to the number of features n, expressed as h(n) = f(n)/g(n).
Intuitively, h(n) describes the classification accuracy obtained per unit time: the larger the classification accuracy per unit time, i.e., the larger h(n), the better the comprehensive performance of the model. This naturally leads to the concept of the optimum point, as given in Definition 10.
Definition 10: the point n0 at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when n = n0 the model obtains the maximum classification accuracy per unit time, and the comprehensive performance of the model is then optimal. Clearly, with the features sorted by weight, the top n0 features constitute the optimal feature subset.
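A minimal sketch of determining the optimal feature subset: following the intuition above, h(n) = f(n)/g(n) is used, where f(n) is measured by retraining and scoring the model on the top-n features ranked by weight and g(n) is the corresponding training time; train_and_score is a hypothetical callable supplied by the caller.

```python
# Select the top-n0 features that maximize the feature efficiency h(n) = f(n)/g(n).
import time
import numpy as np

def optimal_feature_subset(X, y, weights, train_and_score):
    """train_and_score(X, y) -> classification accuracy; returns indices of the top-n0 features."""
    order = np.argsort(weights)[::-1]          # features in descending weight order
    h = []
    for n in range(1, len(order) + 1):
        cols = order[:n]
        start = time.perf_counter()
        acc = train_and_score(X[:, cols], y)   # f(n)
        elapsed = time.perf_counter() - start  # g(n)
        h.append(acc / elapsed)
    n0 = int(np.argmax(h)) + 1                 # optimum point n0
    return order[:n0]
```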
As shown in Fig. 3, which is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in the present application, the system may include:
An obtaining module 301, configured to obtain an anomalous data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomalous data set, i.e., detection samples, is first obtained. BGP is a core decentralized autonomous routing protocol on the Internet; it realizes reachability between autonomous systems by maintaining routing tables and belongs to the class of vector routing protocols.
A processing module 302, configured to perform data normalization on the anomalous data set;
After the anomalous data set is obtained, data normalization is further performed on it to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted.
A first determining module 303, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
Then, the FMS feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability.
An optimization module 304, configured to optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is constructed based on the improved Gaussian kernel function that uses the Manhattan distance and the feature weights. The SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
A parameter-search module 305, configured to perform parameter optimization;
Then, parameter optimization is performed on the SVM model based on grid search and cross-validation.
A second determining module 306, configured to determine the optimal feature subset.
Finally, the concept of an optimal feature subset is proposed based on jointly considering model classification accuracy and model training time, and a construction method is given; under the optimal feature subset, the comprehensive performance of the model is optimal.
In conclusion in the above-described embodiments, when needing to carry out abnormality detection Border Gateway Protocol, obtaining first different
Then regular data collection carries out data normalization processing to abnormal data set, between selecting in feature set and can maximize class simultaneously
Distance and the feature for minimizing inter- object distance, and the feature weight of each characteristic measure classification capacity is obtained, optimization gauss core letter
Number, parameter optimization determine optimal feature subset.The application can based on improved gaussian kernel function and based on grid search with
Cross validation carries out parameter optimization, to improve category of model accuracy rate, based on optimal feature subset come evaluation model comprehensive performance.
As shown in Fig. 4, which is a structural schematic diagram of Embodiment 2 of a BGP anomaly detection system disclosed in the present application, the system may include:
An obtaining module 401, configured to obtain an anomalous data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages collected during the outbreaks of Slammer, Nimda, and Code Red I are downloaded from AS513 (RIPE RIS collector rrc04, CIXP, Geneva) as the BGP anomalous data set. The routing data are converted from MRT format to ASCII format using the libBGPdump tool; the ASCII files are then parsed by an analysis tool written in C#, and statistics of 37 features are extracted (as shown in Table 1). Feature values are sampled once per minute over five days, so that 7200 samples are obtained for each anomalous event. The samples of the two days before and the two days after each event are regarded as the normal data set, and the third day is the peak period of each anomalous activity.
Table 1 Feature extraction
A processing module 402, configured to perform data normalization on the anomalous data set;
Then, the Z-score standardization method is used to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted. Considering that the BGP data set comes from sampled statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing is given by:
x′ = (x − x̄) / S  (1)
where x̄ denotes the sample mean and S denotes the sample standard deviation.
A first determining module 403, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
Because redundant features are present, building a classification model on high-dimensional features increases computational overhead, and noisy data can also lower the classification accuracy of the model. Therefore, on the basis of the feature set obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be further removed to find the feature subset with the best class-discriminating ability, so as to reduce the dimension of the feature matrix and the computational complexity while improving model classification accuracy.
The FMS feature selection algorithm selects, according to Fisher linear analysis and the Markov random field technique, the features that simultaneously maximize the between-class distance and minimize the within-class distance. The feature selection process of the FMS algorithm is independent of the classification process; correlation is measured according to the intrinsic attributes of the data, and the features are ranked by the size of their weights, a larger weight indicating a stronger class-discriminating ability of the feature. The method is efficient: it not only guarantees a global optimum but also has low computational complexity, and is therefore suitable for large-scale data. The training data are denoted {(x_k, y_k)}, k = 1, …, n, where x_k ∈ R^p is a p-dimensional feature vector, y_k ∈ {ω_1, …, ω_g} is the class label, C_i, i = 1, …, g, denotes the i-th class, and each class C_i has n_i samples.
Definition 1: the within-class scatter matrix, the between-class scatter matrix, and the total scatter matrix are denoted S_w, S_b, and S_t, respectively; they are defined in terms of the i-th sample of class C_j, the sample mean of class C_j, and the overall sample mean.
Definition 2: the nonlinear mapping φ(·) from the input space R^p to the kernel space R^D is defined as:
φ: R^p → R^D  (2)
Definition 3: the kernel function k(·, ·) satisfies:
⟨φ(x1), φ(x2)⟩ = k(x1, x2)  (3)
where the operator ⟨·, ·⟩ denotes the dot product in the kernel space.
Definition 4: K and K^(i) are square matrices of order n and n_i, respectively, whose entries satisfy formulas (4) and (5), where k, l ∈ {1, …, n}, u, v ∈ {1, …, n_i}, and i = 1, …, g.
Definition 5: the kernel-space counterparts of S_w, S_b, and S_t are defined accordingly, and their traces are expressed in terms of sum(·), which denotes the sum of all elements of a matrix.
Definition 6: the feature selection vector is defined as:
α = [α_1, …, α_p]^T ∈ {0, 1}^p  (6)
where α_k = 1 indicates that the k-th feature is selected and α_k = 0 indicates that the k-th feature is not selected.
The features selected from a feature vector x are given by x(α) = x ⊙ α, where ⊙ denotes the Hadamard product. The feature selection criterion that maximizes class separation can therefore be converted into the unconstrained optimization problem of formula (7), where γ is a free parameter; analysis shows that γ ≤ 0 yields better classification results, and within a reasonable range the experimental performance of the classifier is insensitive to γ. To handle linearly inseparable, strongly noisy data sets caused by redundant features, an L0-norm term is added, giving the feature selection criterion of formula (8), in which the regularization factor β represents a global threshold.
Consider the linear kernel function of formula (9). Substituting it into formula (8) yields formula (10), in which θ_j, defined by formula (11), measures how strongly feature j contributes to class separation, i.e., the weight of the feature; the larger θ_j is, the more important the j-th feature.
For given β and γ, combining formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector α* ∈ {0, 1}^p satisfying formula (12): if θ_j > β, the j-th feature is selected; otherwise, the j-th feature is not selected.
The pseudo-code of the algorithm is shown as Algorithm 1 in Table 2 below; its computational complexity is O(n²p).
Table 2
An optimization module 404, configured to optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function measures the distance between two vectors with the Euclidean distance. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors on the distance computation and thus affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the distance metric between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the computational cost is lower. Moreover, if the contribution of each feature to classification is reflected in the distance computation, the classification method will fit the data characteristics of BGP better and the classification accuracy can be further improved. Accordingly, feature weights are introduced to measure the contribution of each feature to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted k′(x, y), as shown in formula (13):
k′(x, y) = exp(−γ·δ(x, y))  (13)
where δ(x, y) denotes the Manhattan distance between the two vectors, as shown in formula (14).
A parameter-search module 405, configured to perform parameter optimization on the support vector machine model based on grid search and cross-validation;
The performance of an SVM model depends on a pair of important parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors: the larger C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. A value of C that is either too large or too small reduces the generalization ability of the model, so an appropriate value of C is of great significance for improving the model's classification accuracy and generalization ability. Gamma is a parameter of the polynomial, Gaussian, and sigmoid kernels; it implicitly determines the distribution of the data after they are mapped to the new feature space. The larger gamma is, the fewer support vectors there are; the smaller gamma is, the more support vectors there are; and the number of support vectors affects the speed of model training and prediction.
Considering the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm that takes overall classification accuracy as its evaluation target pays too much attention to the majority class, so that the classification performance on the minority-class samples declines. Therefore, during the search for (C, gamma), the minority-class samples must be given sufficient consideration so that the two classes of samples have the same "say" in the training process. Here, the two classes of samples are assigned weights in inverse proportion to the ratio of their sample counts, which effectively addresses the imbalance of the data.
Table 3 Weights of the two classes of samples
The selection of the kernel parameters is a difficult point; there is as yet no internationally recognized universal method, and in practice they can only be obtained through experimental comparison or experience. Therefore, grid search and cross-validation are combined here to perform parameter optimization under the constraint of an imbalanced data set (as shown in Fig. 5): the search range of (C, gamma) is divided into a grid according to the values, and each point of the grid represents one parameter combination. The range of the grid search satisfies formula (16) with a step size of 1, i.e., C ∈ {2^−5, 2^−4, …, 2^5} and gamma ∈ {2^−4, 2^−3, …, 2^0}.
At each grid point, cross-validation is carried out as follows: the total training set is divided into N subsets, of which N − 1 serve as the training set and the remaining one serves as the test set. Each time, the trained model is evaluated on the test set, yielding one classification accuracy; after all N subsets have served as the test set, the average of the N-fold cross-validation classification accuracies is taken. All points of the grid are traversed in this way, and the point with the largest average classification accuracy is taken as the best-performing (C, gamma). It should be noted that 5-fold cross-validation is used here, and that because (C, gamma) takes a limited, discrete set of values within the chosen search range, the resulting (C, gamma) may only be a locally optimal solution.
A second determining module 406, configured to determine the optimal feature subset.
Based on the FMS feature selection algorithm, the weight of each feature is obtained; the features are sorted in descending order of weight, and according to this ranking they are added to the model training set one by one. Experiments show that, because the features added early have larger weights, the classification accuracy of the model rises gradually at first; but as lower-weight features are added later, and because of the presence of noise and redundant data in the data set, the growth of the model's classification accuracy slows down and the accuracy may even decline. At the same time, the training time of the SVM model keeps increasing as the number of features increases. Therefore, simply adding more features to model training is not appropriate. On this basis, the concept of an optimal feature subset is proposed: when, with the features sorted by weight, the feature set used for model training is exactly the optimal feature subset, the model performance (i.e., the classification accuracy and the training time of the model) reaches a comprehensive optimum. Furthermore, a feature efficiency function is proposed to measure the relationship between the model's classification accuracy and its training time, so as to determine the optimal feature subset and make the model performance comprehensively optimal.
Definition 7: the function f(n) is the model classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: the function g(n) is the model training time as a function of the number of features n, n ∈ Z.
From the above definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when the model training set contains a given number of features. To evaluate the comprehensive performance of the model, a feature efficiency function is defined, as shown in Definition 9.
Definition 9: h(n) is the feature efficiency function with respect to the number of features n, expressed as h(n) = f(n)/g(n).
Intuitively, h(n) describes the classification accuracy obtained per unit time: the larger the classification accuracy per unit time, i.e., the larger h(n), the better the comprehensive performance of the model. This naturally leads to the concept of the optimum point, as given in Definition 10.
Definition 10: the point n0 at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when n = n0 the model obtains the maximum classification accuracy per unit time, and the comprehensive performance of the model is then optimal. Clearly, with the features sorted by weight, the top n0 features constitute the optimal feature subset.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A BGP anomaly detection method, characterized in that the method comprises:
obtaining an anomalous data set;
performing data normalization on the anomalous data set;
selecting from a feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight measuring each feature's classification ability;
optimizing a Gaussian kernel function;
performing parameter optimization;
determining an optimal feature subset.
2. The method according to claim 1, characterized in that obtaining the anomalous data set comprises:
obtaining the anomalous data set from an autonomous system.
3. The method according to claim 1, characterized in that performing data normalization on the anomalous data set comprises:
replacing the population mean with the sample mean, and replacing the population standard deviation with the sample standard deviation.
4. The method according to claim 1, characterized in that optimizing the Gaussian kernel function comprises:
optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
5. The method according to claim 1, characterized in that the parameter optimization comprises:
performing parameter optimization on a support vector machine model based on grid search and cross-validation.
6. A BGP anomaly detection system, characterized by comprising:
an obtaining module, configured to obtain an anomalous data set;
a processing module, configured to perform data normalization on the anomalous data set;
a first determining module, configured to select from a feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
an optimization module, configured to optimize a Gaussian kernel function;
a parameter-search module, configured to perform parameter optimization;
a second determining module, configured to determine an optimal feature subset.
7. The system according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain the anomalous data set from an autonomous system.
8. The system according to claim 6, characterized in that the processing module is specifically configured to:
replace the population mean with the sample mean, and replace the population standard deviation with the sample standard deviation.
9. The system according to claim 6, characterized in that the optimization module is specifically configured to:
optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
10. The system according to claim 6, characterized in that the parameter-search module is specifically configured to:
perform parameter optimization on the support vector machine model based on grid search and cross-validation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811331848.7A CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811331848.7A CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109257383A true CN109257383A (en) | 2019-01-22 |
CN109257383B CN109257383B (en) | 2021-09-21 |
Family
ID=65044099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811331848.7A Active CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109257383B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111835791A (en) * | 2020-07-30 | 2020-10-27 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN112653675A (en) * | 2020-12-12 | 2021-04-13 | 海南师范大学 | Intelligent intrusion detection method and device based on deep learning |
CN112702221A (en) * | 2019-10-23 | 2021-04-23 | 中国电信股份有限公司 | BGP abnormal route monitoring method and device |
CN112905572A (en) * | 2021-01-29 | 2021-06-04 | 铁道警察学院 | Data anomaly information studying and judging model and method |
CN114535142A (en) * | 2022-01-11 | 2022-05-27 | 华南理工大学 | Data-driven intelligent determination method for dimension qualification of injection molding product |
- 2018-11-09: CN application CN201811331848.7A, patent CN109257383B, status: Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101594361A (en) * | 2009-06-02 | 2009-12-02 | 浙江大学 | Network Intrusion Detection System based on shortcut calculation of support vector machine |
CN102903075A (en) * | 2012-10-15 | 2013-01-30 | 西安电子科技大学 | Robust watermarking method based on image feature point global correction |
CN105184316A (en) * | 2015-08-28 | 2015-12-23 | 国网智能电网研究院 | Support vector machine power grid business classification method based on feature weight learning |
US20180262525A1 (en) * | 2017-03-09 | 2018-09-13 | General Electric Company | Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid |
Non-Patent Citations (2)
Title |
---|
Yang Guang, Wu Chunling, et al.: "Research on a Network Intrusion Detection Algorithm Based on RS and WSVM", Computer Simulation * |
Gao Wei, Peng Yu: "Hyperspectral Image Classification Based on Mahalanobis-Distance Multi-Kernel Learning", Chinese Journal of Scientific Instrument * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112702221A (en) * | 2019-10-23 | 2021-04-23 | 中国电信股份有限公司 | BGP abnormal route monitoring method and device |
CN111835791A (en) * | 2020-07-30 | 2020-10-27 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN111835791B (en) * | 2020-07-30 | 2022-10-28 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN112653675A (en) * | 2020-12-12 | 2021-04-13 | 海南师范大学 | Intelligent intrusion detection method and device based on deep learning |
CN112905572A (en) * | 2021-01-29 | 2021-06-04 | 铁道警察学院 | Data anomaly information studying and judging model and method |
CN114535142A (en) * | 2022-01-11 | 2022-05-27 | 华南理工大学 | Data-driven intelligent determination method for dimension qualification of injection molding product |
CN114535142B (en) * | 2022-01-11 | 2023-09-26 | 华南理工大学 | Intelligent judgment method for size qualification of injection molding product based on data driving |
Also Published As
Publication number | Publication date |
---|---|
CN109257383B (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109257383A (en) | A kind of BGP method for detecting abnormality and system | |
CN106202561B (en) | Digitlization contingency management case base construction method and device based on text big data | |
Fong et al. | Accelerated PSO swarm search feature selection for data stream mining big data | |
CN111324642A (en) | Model algorithm type selection and evaluation method for power grid big data analysis | |
Amini et al. | On density-based data streams clustering algorithms: A survey | |
WO2020147488A1 (en) | Method and device for identifying irregular group | |
Martins | A supervised machine learning approach for duplicate detection over gazetteer records | |
CN108717408A (en) | A kind of sensitive word method for real-time monitoring, electronic equipment, storage medium and system | |
CN106503086A (en) | The detection method of distributed local outlier | |
CN100416560C (en) | Method and apparatus for clustered evolving data flow through on-line and off-line assembly | |
Shimada et al. | Class association rule mining with chi-squared test using genetic network programming | |
CN110347840A (en) | Complain prediction technique, system, equipment and the storage medium of text categories | |
CN107679734A (en) | It is a kind of to be used for the method and system without label data classification prediction | |
Gu et al. | [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management | |
CN105183792B (en) | Distributed fast text classification method based on locality sensitive hashing | |
CN108304851A (en) | A kind of High Dimensional Data Streams Identifying Outliers method | |
CN113704389A (en) | Data evaluation method and device, computer equipment and storage medium | |
CN111143838A (en) | Database user abnormal behavior detection method | |
CN113762703A (en) | Method and device for determining enterprise portrait, computing equipment and storage medium | |
CN112087316B (en) | Network anomaly root cause positioning method based on anomaly data analysis | |
García-Vico et al. | Fepds: A proposal for the extraction of fuzzy emerging patterns in data streams | |
CN113298116A (en) | Attention weight-based graph embedding feature extraction method and device and electronic equipment | |
CN113837266B (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
CN117155771B (en) | Equipment cluster fault tracing method and device based on industrial Internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |