CN109257383A - BGP anomaly detection method and system - Google Patents
BGP anomaly detection method and system
- Publication number
- CN109257383A (application CN201811331848.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- optimization
- distance
- data set
- abnormal data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Complex Calculations (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a BGP anomaly detection method and system. The method includes: obtaining an anomalous data set; performing data normalization on the anomalous data set; selecting from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight that measures each feature's classification ability; optimizing a Gaussian kernel function; performing parameter optimization; and determining an optimal feature subset. The application performs parameter optimization based on an improved Gaussian kernel function and on grid search with cross-validation, thereby improving model classification accuracy, and evaluates the comprehensive performance of the model based on the optimal feature subset.
Description
Technical field
This application relates to the technical field of anomaly detection, and in particular to a BGP (Border Gateway Protocol) anomaly detection method and system.
Background art
By event consequence, BGP anomalies can be divided into data-flow hijacking anomalies and update-message surge anomalies. A data-flow hijacking anomaly redirects the victim network's traffic, forms a traffic black hole, and so on, destroying the reachability of the victim network. An update-message surge anomaly generates a large number of BGP update messages in a very short time, undermining the stability of the global Internet.
Current BGP anomaly detection methods generally fall into five classes: methods based on statistical pattern recognition, methods based on historical BGP update messages, methods based on reachability verification, methods based on time series analysis, and methods based on machine learning. Methods based on statistical pattern recognition perform pattern recognition using statistical probability theory and determine anomalies according to a distance function between patterns; they can detect both data-flow hijacking anomalies and update-message surge anomalies. However, these methods face the difficulty of correctly estimating the distribution of high-dimensional data, their detection speed is slow, and in practice the thresholds of the model parameters must be determined manually. Methods based on historical BGP update messages and methods based on reachability verification can only detect data-flow hijacking anomalies: the former use historical data to detect anomalous BGP routes, while the latter perform anomaly detection according to the reachability verification result of the destination prefix. Methods based on time series analysis and methods based on machine learning can detect update-message surge anomalies. Among them, methods based on time series analysis treat BGP update messages as a multidimensional time series and implement anomaly detection by choosing a suitable sliding time window. However, it is difficult to determine the size of the time window: a window that is too small leaves the model with insufficient information, while a window that is too large makes the model insensitive to local anomalies, so that the miss rate rises. In recent years, machine learning methods have found some application in the field of BGP anomaly detection. From the machine learning perspective, the BGP anomaly detection problem can be abstracted as a binary classification problem whose goal is to identify an unknown BGP update message as a normal message or an anomalous message, thereby realizing BGP anomaly detection.
In summary, traditional BGP anomaly detection methods suffer from a series of practical problems, such as low detection accuracy, difficulty in estimating parameter thresholds, slow detection speed, high deployment cost, and dependence on the completeness of a knowledge base.
Therefore, how to overcome the low classification accuracy and unsatisfactory performance of the prior art, and the lack of an evaluation of the comprehensive performance of the model, is an urgent problem to be solved.
Summary of the invention
In view of this, the present application provides a BGP anomaly detection method that performs parameter optimization based on an improved Gaussian kernel function and on grid search with cross-validation, thereby improving model classification accuracy, and that evaluates the comprehensive performance of the model based on an optimal feature subset.
This application provides a BGP anomaly detection method, the method comprising:
Obtaining an anomalous data set;
Performing data normalization on the anomalous data set;
Selecting from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight that measures each feature's classification ability;
Optimizing a Gaussian kernel function;
Performing parameter optimization;
Determining an optimal feature subset.
Preferably, obtaining the anomalous data set comprises:
Obtaining the anomalous data set from an autonomous system.
Preferably, performing data normalization on the anomalous data set comprises:
Replacing the population mean with the sample mean, and replacing the population standard deviation with the sample standard deviation.
Preferably, optimizing the Gaussian kernel function comprises:
Optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
Preferably, the parameter optimization comprises:
Performing parameter optimization on a support vector machine model based on grid search and cross-validation.
A BGP anomaly detection system, comprising:
An obtaining module, configured to obtain an anomalous data set;
A processing module, configured to perform data normalization on the anomalous data set;
A first determining module, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight that measures each feature's classification ability;
An optimization module, configured to optimize a Gaussian kernel function;
A parameter-search module, configured to perform parameter optimization;
A second determining module, configured to determine an optimal feature subset.
Preferably, the obtaining module is specifically configured to:
Obtain the anomalous data set from an autonomous system.
Preferably, the processing module is specifically configured to:
Replace the population mean with the sample mean, and replace the population standard deviation with the sample standard deviation.
Preferably, the optimization module is specifically configured to:
Optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
Preferably, the parameter-search module is specifically configured to:
Perform parameter optimization on the support vector machine model based on grid search and cross-validation.
In conclusion this application discloses a kind of BGP method for detecting abnormality, it is abnormal when needing to carry out Border Gateway Protocol
When detection, first then acquisition abnormal data set carries out data normalization processing to abnormal data set, selects from feature set
Between class distance can be maximized simultaneously out and minimize the feature of inter- object distance, and obtain the feature of each characteristic measure classification capacity
Weight, optimization gauss kernel function, parameter optimization determine optimal feature subset.The application can be based on improved gaussian kernel function
And parameter optimization is carried out based on grid search and cross validation, to improve category of model accuracy rate, it is based on optimal feature subset
Carry out evaluation model comprehensive performance.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of Embodiment 1 of a BGP anomaly detection method disclosed in the present application;
Fig. 2 is a flowchart of Embodiment 2 of a BGP anomaly detection method disclosed in the present application;
Fig. 3 is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in the present application;
Fig. 4 is a structural schematic diagram of Embodiment 2 of a BGP anomaly detection system disclosed in the present application;
Fig. 5 is a schematic diagram of the grid search disclosed in the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
As shown in Fig. 1, which is a flowchart of Embodiment 1 of a BGP anomaly detection method disclosed in the present application, the method may include the following steps:
S101: obtain an anomalous data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomalous data set, i.e., detection samples, is first obtained. BGP is a core decentralized autonomous routing protocol on the Internet; it realizes reachability between autonomous systems by maintaining routing tables and belongs to the class of vector routing protocols.
S102: perform data normalization on the anomalous data set;
After the anomalous data set is obtained, data normalization is further performed on it to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted.
S103: select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain a feature weight measuring each feature's classification ability;
Then, the FMS (Fisher-Markov Selector) feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability.
S104: optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is constructed based on the improved Gaussian kernel function that uses the Manhattan distance and the feature weights. The SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
S105: perform parameter optimization;
Then, parameter optimization is performed on the SVM model based on grid search and cross-validation.
S106: determine the optimal feature subset.
Finally, the concept of an optimal feature subset is proposed based on jointly considering model classification accuracy and model training time, and a construction method is given; under the optimal feature subset, the comprehensive performance of the model is optimal.
In conclusion in the above-described embodiments, when needing to carry out abnormality detection Border Gateway Protocol, obtaining first different
Then regular data collection carries out data normalization processing to abnormal data set, between selecting in feature set and can maximize class simultaneously
Distance and the feature for minimizing inter- object distance, and the feature weight of each characteristic measure classification capacity is obtained, optimization gauss core letter
Number, parameter optimization determine optimal feature subset.The application can based on improved gaussian kernel function and based on grid search with
Cross validation carries out parameter optimization, to improve category of model accuracy rate, based on optimal feature subset come evaluation model comprehensive performance.
As shown in Fig. 2, which is a flowchart of Embodiment 2 of a BGP anomaly detection method disclosed in the present application, the method may include the following steps:
S201: obtain an anomalous data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages collected during the outbreaks of Slammer, Nimda, and Code Red I are downloaded from AS513 (RIPE RIS collector rrc04, CIXP, Geneva) as the BGP anomalous data set. The routing data are converted from MRT format to ASCII format using the libBGPdump tool; the ASCII files are then parsed by an analysis tool written in C#, and statistics of 37 features are extracted (as shown in Table 1). Feature values are sampled once per minute over five days, so that 7200 samples are obtained for each anomalous event. The samples of the two days before and the two days after each event are regarded as the normal data set, and the third day is the peak period of each anomalous activity.
Table 1 Feature extraction
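The 37 extracted features of Table 1 are not reproduced in this text. As a minimal sketch of the per-minute sampling described above, the following snippet (with hypothetical record and column names) counts parsed update messages per minute and per message type; the actual feature set of the patent is broader.

```python
# Minimal sketch: per-minute counts of parsed BGP update records.
# Assumes each record is a (unix_timestamp, msg_type) pair, e.g. "A" for an
# announcement and "W" for a withdrawal; illustrative only, since the full
# 37-feature list of Table 1 is not reproduced here.
import pandas as pd

def minute_counts(records):
    df = pd.DataFrame(records, columns=["time", "type"])
    df["time"] = pd.to_datetime(df["time"], unit="s")
    df["minute"] = df["time"].dt.floor("min")
    counts = pd.crosstab(df["minute"], df["type"])   # rows: minutes, columns: message types
    return counts.add_prefix("num_")                 # e.g. num_A, num_W
```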
S202: perform data normalization on the anomalous data set;
Then, the Z-score standardization method is used to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted. Considering that the BGP data set comes from sampled statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing is given by:
x′ = (x − x̄) / S  (1)
where x̄ denotes the sample mean and S denotes the sample standard deviation.
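As a minimal sketch of the Z-score standardization in formula (1), the following snippet standardizes a feature matrix column-wise with the sample mean and the sample standard deviation.

```python
# Z-score standardization using the sample mean and sample standard deviation.
import numpy as np

def zscore(X):
    """X: (n_samples, n_features) feature matrix; returns the standardized matrix."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # ddof=1 gives the sample standard deviation S
    return (X - mean) / std
```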
S203: select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain a feature weight measuring each feature's classification ability;
Because redundant features are present, building a classification model on high-dimensional features increases computational overhead, and noisy data can also lower the classification accuracy of the model. Therefore, on the basis of the feature set obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be further removed to find the feature subset with the best class-discriminating ability, so as to reduce the dimension of the feature matrix and the computational complexity while improving model classification accuracy.
The FMS feature selection algorithm selects, according to Fisher linear analysis and the Markov random field technique, the features that simultaneously maximize the between-class distance and minimize the within-class distance. The feature selection process of the FMS algorithm is independent of the classification process; correlation is measured according to the intrinsic attributes of the data, and the features are ranked by the size of their weights, a larger weight indicating a stronger class-discriminating ability of the feature. The method is efficient: it not only guarantees a global optimum but also has low computational complexity, and is therefore suitable for large-scale data. The training data are denoted {(x_k, y_k)}, k = 1, …, n, where x_k ∈ R^p is a p-dimensional feature vector, y_k ∈ {ω_1, …, ω_g} is the class label, C_i, i = 1, …, g, denotes the i-th class, and each class C_i has n_i samples.
Definition 1: the within-class scatter matrix, the between-class scatter matrix, and the total scatter matrix are denoted S_w, S_b, and S_t, respectively; they are defined in terms of the i-th sample of class C_j, the sample mean of class C_j, and the overall sample mean.
Definition 2: the nonlinear mapping φ(·) from the input space R^p to the kernel space R^D is defined as:
φ: R^p → R^D  (2)
Definition 3: the kernel function k(·, ·) satisfies:
⟨φ(x1), φ(x2)⟩ = k(x1, x2)  (3)
where the operator ⟨·, ·⟩ denotes the dot product in the kernel space.
Definition 4: K and K^(i) are square matrices of order n and n_i, respectively, whose entries satisfy formulas (4) and (5), where k, l ∈ {1, …, n}, u, v ∈ {1, …, n_i}, and i = 1, …, g.
Definition 5: the kernel-space counterparts of S_w, S_b, and S_t are defined accordingly, and their traces are expressed in terms of sum(·), which denotes the sum of all elements of a matrix.
Definition 6: the feature selection vector is defined as:
α = [α_1, …, α_p]^T ∈ {0, 1}^p  (6)
where α_k = 1 indicates that the k-th feature is selected and α_k = 0 indicates that the k-th feature is not selected.
The features selected from a feature vector x are given by x(α) = x ⊙ α, where ⊙ denotes the Hadamard product. The feature selection criterion that maximizes class separation can therefore be converted into the unconstrained optimization problem of formula (7), where γ is a free parameter; analysis shows that γ ≤ 0 yields better classification results, and within a reasonable range the experimental performance of the classifier is insensitive to γ. To handle linearly inseparable, strongly noisy data sets caused by redundant features, an L0-norm term is added, giving the feature selection criterion of formula (8), in which the regularization factor β represents a global threshold.
Consider the linear kernel function of formula (9). Substituting it into formula (8) yields formula (10), in which θ_j, defined by formula (11), measures how strongly feature j contributes to class separation, i.e., the weight of the feature; the larger θ_j is, the more important the j-th feature.
For given β and γ, combining formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector α* ∈ {0, 1}^p satisfying formula (12): if θ_j > β, the j-th feature is selected; otherwise, the j-th feature is not selected.
The pseudo-code of the algorithm is shown as Algorithm 1 in Table 2 below; its computational complexity is O(n²p).
Table 2
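Since the pseudo-code of Table 2 and the exact expression of θ_j in formula (11) are not reproduced in this text, the following sketch only illustrates the shape of the procedure: each feature receives a weight measuring class separation (a per-feature Fisher-style score is used here as a stand-in for the patent's θ_j), and features whose weight exceeds the global threshold β are selected.

```python
# Sketch of FMS-style feature weighting and thresholding (simplified stand-in;
# the patent's exact theta_j of formula (11) is not reproduced in this text).
import numpy as np

def feature_weights(X, y):
    """X: (n, p) standardized features; y: class labels. Returns one weight per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)   # larger weight => stronger class separation

def select_features(weights, beta):
    """Selection vector alpha: alpha_j = 1 when the weight exceeds the threshold beta."""
    return weights > beta
```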
S204: optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function measures the distance between two vectors with the Euclidean distance. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors on the distance computation and thus affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the distance metric between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the computational cost is lower. Moreover, if the contribution of each feature to classification is reflected in the distance computation, the classification method will fit the data characteristics of BGP better and the classification accuracy can be further improved. Accordingly, feature weights are introduced to measure the contribution of each feature to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted k′(x, y), as shown in formula (13):
k′(x, y) = exp(−γ·δ(x, y))  (13)
where δ(x, y) denotes the Manhattan distance between the two vectors, as shown in formula (14).
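A minimal sketch of the improved kernel of formula (13): since formula (14) is not reproduced in this text, a feature-weighted Manhattan distance δ(x, y) = Σ_j w_j·|x_j − y_j| is assumed here. The returned callable computes a full Gram matrix and can be passed, for example, to scikit-learn's SVC through its kernel argument.

```python
# Sketch of the improved Gaussian kernel k'(x, y) = exp(-gamma * delta(x, y)),
# with delta assumed to be a feature-weighted Manhattan distance.
import numpy as np

def make_weighted_manhattan_kernel(weights, gamma):
    w = np.asarray(weights, dtype=float)

    def kernel(X, Y):
        # pairwise weighted Manhattan distances, then the exponential envelope
        diff = np.abs(X[:, None, :] - Y[None, :, :])   # shape (n_x, n_y, p)
        delta = (diff * w).sum(axis=-1)
        return np.exp(-gamma * delta)                  # Gram matrix of shape (n_x, n_y)

    return kernel

# Usage (illustrative): clf = sklearn.svm.SVC(kernel=make_weighted_manhattan_kernel(w, 0.5), C=1.0)
```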
S205: perform parameter optimization on the support vector machine model based on grid search and cross-validation;
The performance of an SVM model depends on a pair of important parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors: the larger C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. A value of C that is either too large or too small reduces the generalization ability of the model, so an appropriate value of C is of great significance for improving the model's classification accuracy and generalization ability. Gamma is a parameter of the polynomial, Gaussian, and sigmoid kernels; it implicitly determines the distribution of the data after they are mapped to the new feature space. The larger gamma is, the fewer support vectors there are; the smaller gamma is, the more support vectors there are; and the number of support vectors affects the speed of model training and prediction.
Considering the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm that takes overall classification accuracy as its evaluation target pays too much attention to the majority class, so that the classification performance on the minority-class samples declines. Therefore, during the search for (C, gamma), the minority-class samples must be given sufficient consideration so that the two classes of samples have the same "say" in the training process. Here, the two classes of samples are assigned weights in inverse proportion to the ratio of their sample counts, which effectively addresses the imbalance of the data.
Table 3 Weights of the two classes of samples
The selection of the kernel parameters is a difficult point; there is as yet no internationally recognized universal method, and in practice they can only be obtained through experimental comparison or experience. Therefore, grid search and cross-validation are combined here to perform parameter optimization under the constraint of an imbalanced data set (as shown in Fig. 5): the search range of (C, gamma) is divided into a grid according to the values, and each point of the grid represents one parameter combination. The range of the grid search satisfies formula (16) with a step size of 1, i.e., C ∈ {2^−5, 2^−4, …, 2^5} and gamma ∈ {2^−4, 2^−3, …, 2^0}.
At each grid point, cross-validation is carried out as follows: the total training set is divided into N subsets, of which N − 1 serve as the training set and the remaining one serves as the test set. Each time, the trained model is evaluated on the test set, yielding one classification accuracy; after all N subsets have served as the test set, the average of the N-fold cross-validation classification accuracies is taken. All points of the grid are traversed in this way, and the point with the largest average classification accuracy is taken as the best-performing (C, gamma). It should be noted that 5-fold cross-validation is used here, and that because (C, gamma) takes a limited, discrete set of values within the chosen search range, the resulting (C, gamma) may only be a locally optimal solution.
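As a sketch of the parameter search described above, the following snippet runs a (C, gamma) grid search with 5-fold cross-validation and inverse-frequency class weights using scikit-learn; the built-in RBF kernel is used for brevity, and the custom kernel from the previous sketch could be substituted via the kernel argument.

```python
# Grid search over C in {2^-5,...,2^5} and gamma in {2^-4,...,2^0} with
# 5-fold cross-validation and class weights inversely proportional to class counts.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def tune_svm(X, y):
    param_grid = {
        "C": [2.0 ** k for k in range(-5, 6)],      # step 1 in the exponent
        "gamma": [2.0 ** k for k in range(-4, 1)],
    }
    svc = SVC(kernel="rbf", class_weight="balanced")  # inverse-frequency class weights
    search = GridSearchCV(
        svc, param_grid,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_
```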
S206: determine the optimal feature subset.
Based on the FMS feature selection algorithm, the weight of each feature is obtained; the features are sorted in descending order of weight, and according to this ranking they are added to the model training set one by one. Experiments show that, because the features added early have larger weights, the classification accuracy of the model rises gradually at first; but as lower-weight features are added later, and because of the presence of noise and redundant data in the data set, the growth of the model's classification accuracy slows down and the accuracy may even decline. At the same time, the training time of the SVM model keeps increasing as the number of features increases. Therefore, simply adding more features to model training is not appropriate. On this basis, the concept of an optimal feature subset is proposed: when, with the features sorted by weight, the feature set used for model training is exactly the optimal feature subset, the model performance (i.e., the classification accuracy and the training time of the model) reaches a comprehensive optimum. Furthermore, a feature efficiency function is proposed to measure the relationship between the model's classification accuracy and its training time, so as to determine the optimal feature subset and make the model performance comprehensively optimal.
Definition 7: the function f(n) is the model classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: the function g(n) is the model training time as a function of the number of features n, n ∈ Z.
From the above definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when the model training set contains a given number of features. To evaluate the comprehensive performance of the model, a feature efficiency function is defined, as shown in Definition 9.
Definition 9: h(n) is the feature efficiency function with respect to the number of features n, expressed as h(n) = f(n)/g(n).
Intuitively, h(n) describes the classification accuracy obtained per unit time: the larger the classification accuracy per unit time, i.e., the larger h(n), the better the comprehensive performance of the model. This naturally leads to the concept of the optimum point, as given in Definition 10.
Definition 10: the point n0 at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when n = n0 the model obtains the maximum classification accuracy per unit time, and the comprehensive performance of the model is then optimal. Clearly, with the features sorted by weight, the top n0 features constitute the optimal feature subset.
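A minimal sketch of determining the optimal feature subset: following the intuition above, h(n) = f(n)/g(n) is used, where f(n) is measured by retraining and scoring the model on the top-n features ranked by weight and g(n) is the corresponding training time; train_and_score is a hypothetical callable supplied by the caller.

```python
# Select the top-n0 features that maximize the feature efficiency h(n) = f(n)/g(n).
import time
import numpy as np

def optimal_feature_subset(X, y, weights, train_and_score):
    """train_and_score(X, y) -> classification accuracy; returns indices of the top-n0 features."""
    order = np.argsort(weights)[::-1]          # features in descending weight order
    h = []
    for n in range(1, len(order) + 1):
        cols = order[:n]
        start = time.perf_counter()
        acc = train_and_score(X[:, cols], y)   # f(n)
        elapsed = time.perf_counter() - start  # g(n)
        h.append(acc / elapsed)
    n0 = int(np.argmax(h)) + 1                 # optimum point n0
    return order[:n0]
```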
As shown in Fig. 3, which is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in the present application, the system may include:
An obtaining module 301, configured to obtain an anomalous data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomalous data set, i.e., detection samples, is first obtained. BGP is a core decentralized autonomous routing protocol on the Internet; it realizes reachability between autonomous systems by maintaining routing tables and belongs to the class of vector routing protocols.
A processing module 302, configured to perform data normalization on the anomalous data set;
After the anomalous data set is obtained, data normalization is further performed on it to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted.
A first determining module 303, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
Then, the FMS feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability.
An optimization module 304, configured to optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is constructed based on the improved Gaussian kernel function that uses the Manhattan distance and the feature weights. The SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
A parameter-search module 305, configured to perform parameter optimization;
Then, parameter optimization is performed on the SVM model based on grid search and cross-validation.
A second determining module 306, configured to determine the optimal feature subset.
Finally, the concept of an optimal feature subset is proposed based on jointly considering model classification accuracy and model training time, and a construction method is given; under the optimal feature subset, the comprehensive performance of the model is optimal.
In conclusion in the above-described embodiments, when needing to carry out abnormality detection Border Gateway Protocol, obtaining first different
Then regular data collection carries out data normalization processing to abnormal data set, between selecting in feature set and can maximize class simultaneously
Distance and the feature for minimizing inter- object distance, and the feature weight of each characteristic measure classification capacity is obtained, optimization gauss core letter
Number, parameter optimization determine optimal feature subset.The application can based on improved gaussian kernel function and based on grid search with
Cross validation carries out parameter optimization, to improve category of model accuracy rate, based on optimal feature subset come evaluation model comprehensive performance.
As shown in Fig. 4, which is a structural schematic diagram of Embodiment 2 of a BGP anomaly detection system disclosed in the present application, the system may include:
An obtaining module 401, configured to obtain an anomalous data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages collected during the outbreaks of Slammer, Nimda, and Code Red I are downloaded from AS513 (RIPE RIS collector rrc04, CIXP, Geneva) as the BGP anomalous data set. The routing data are converted from MRT format to ASCII format using the libBGPdump tool; the ASCII files are then parsed by an analysis tool written in C#, and statistics of 37 features are extracted (as shown in Table 1). Feature values are sampled once per minute over five days, so that 7200 samples are obtained for each anomalous event. The samples of the two days before and the two days after each event are regarded as the normal data set, and the third day is the peak period of each anomalous activity.
Table 1 Feature extraction
A processing module 402, configured to perform data normalization on the anomalous data set;
Then, the Z-score standardization method is used to eliminate the influence of dimension and magnitude, so that different features can be compared and weighted. Considering that the BGP data set comes from sampled statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing is given by:
x′ = (x − x̄) / S  (1)
where x̄ denotes the sample mean and S denotes the sample standard deviation.
A first determining module 403, configured to select from the feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
Because redundant features are present, building a classification model on high-dimensional features increases computational overhead, and noisy data can also lower the classification accuracy of the model. Therefore, on the basis of the feature set obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be further removed to find the feature subset with the best class-discriminating ability, so as to reduce the dimension of the feature matrix and the computational complexity while improving model classification accuracy.
The FMS feature selection algorithm selects, according to Fisher linear analysis and the Markov random field technique, the features that simultaneously maximize the between-class distance and minimize the within-class distance. The feature selection process of the FMS algorithm is independent of the classification process; correlation is measured according to the intrinsic attributes of the data, and the features are ranked by the size of their weights, a larger weight indicating a stronger class-discriminating ability of the feature. The method is efficient: it not only guarantees a global optimum but also has low computational complexity, and is therefore suitable for large-scale data. The training data are denoted {(x_k, y_k)}, k = 1, …, n, where x_k ∈ R^p is a p-dimensional feature vector, y_k ∈ {ω_1, …, ω_g} is the class label, C_i, i = 1, …, g, denotes the i-th class, and each class C_i has n_i samples.
Definition 1: the within-class scatter matrix, the between-class scatter matrix, and the total scatter matrix are denoted S_w, S_b, and S_t, respectively; they are defined in terms of the i-th sample of class C_j, the sample mean of class C_j, and the overall sample mean.
Definition 2: the nonlinear mapping φ(·) from the input space R^p to the kernel space R^D is defined as:
φ: R^p → R^D  (2)
Definition 3: the kernel function k(·, ·) satisfies:
⟨φ(x1), φ(x2)⟩ = k(x1, x2)  (3)
where the operator ⟨·, ·⟩ denotes the dot product in the kernel space.
Definition 4: K and K^(i) are square matrices of order n and n_i, respectively, whose entries satisfy formulas (4) and (5), where k, l ∈ {1, …, n}, u, v ∈ {1, …, n_i}, and i = 1, …, g.
Definition 5: the kernel-space counterparts of S_w, S_b, and S_t are defined accordingly, and their traces are expressed in terms of sum(·), which denotes the sum of all elements of a matrix.
Definition 6: the feature selection vector is defined as:
α = [α_1, …, α_p]^T ∈ {0, 1}^p  (6)
where α_k = 1 indicates that the k-th feature is selected and α_k = 0 indicates that the k-th feature is not selected.
The features selected from a feature vector x are given by x(α) = x ⊙ α, where ⊙ denotes the Hadamard product. The feature selection criterion that maximizes class separation can therefore be converted into the unconstrained optimization problem of formula (7), where γ is a free parameter; analysis shows that γ ≤ 0 yields better classification results, and within a reasonable range the experimental performance of the classifier is insensitive to γ. To handle linearly inseparable, strongly noisy data sets caused by redundant features, an L0-norm term is added, giving the feature selection criterion of formula (8), in which the regularization factor β represents a global threshold.
Consider the linear kernel function of formula (9). Substituting it into formula (8) yields formula (10), in which θ_j, defined by formula (11), measures how strongly feature j contributes to class separation, i.e., the weight of the feature; the larger θ_j is, the more important the j-th feature.
For given β and γ, combining formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector α* ∈ {0, 1}^p satisfying formula (12): if θ_j > β, the j-th feature is selected; otherwise, the j-th feature is not selected.
The pseudo-code of the algorithm is shown as Algorithm 1 in Table 2 below; its computational complexity is O(n²p).
Table 2
An optimization module 404, configured to optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function measures the distance between two vectors with the Euclidean distance. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors on the distance computation and thus affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the distance metric between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the computational cost is lower. Moreover, if the contribution of each feature to classification is reflected in the distance computation, the classification method will fit the data characteristics of BGP better and the classification accuracy can be further improved. Accordingly, feature weights are introduced to measure the contribution of each feature to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted k′(x, y), as shown in formula (13):
k′(x, y) = exp(−γ·δ(x, y))  (13)
where δ(x, y) denotes the Manhattan distance between the two vectors, as shown in formula (14).
A parameter-search module 405, configured to perform parameter optimization on the support vector machine model based on grid search and cross-validation;
The performance of an SVM model depends on a pair of important parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors: the larger C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. A value of C that is either too large or too small reduces the generalization ability of the model, so an appropriate value of C is of great significance for improving the model's classification accuracy and generalization ability. Gamma is a parameter of the polynomial, Gaussian, and sigmoid kernels; it implicitly determines the distribution of the data after they are mapped to the new feature space. The larger gamma is, the fewer support vectors there are; the smaller gamma is, the more support vectors there are; and the number of support vectors affects the speed of model training and prediction.
Considering the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm that takes overall classification accuracy as its evaluation target pays too much attention to the majority class, so that the classification performance on the minority-class samples declines. Therefore, during the search for (C, gamma), the minority-class samples must be given sufficient consideration so that the two classes of samples have the same "say" in the training process. Here, the two classes of samples are assigned weights in inverse proportion to the ratio of their sample counts, which effectively addresses the imbalance of the data.
Table 3 Weights of the two classes of samples
The selection of the kernel parameters is a difficult point; there is as yet no internationally recognized universal method, and in practice they can only be obtained through experimental comparison or experience. Therefore, grid search and cross-validation are combined here to perform parameter optimization under the constraint of an imbalanced data set (as shown in Fig. 5): the search range of (C, gamma) is divided into a grid according to the values, and each point of the grid represents one parameter combination. The range of the grid search satisfies formula (16) with a step size of 1, i.e., C ∈ {2^−5, 2^−4, …, 2^5} and gamma ∈ {2^−4, 2^−3, …, 2^0}.
At each grid point, cross-validation is carried out as follows: the total training set is divided into N subsets, of which N − 1 serve as the training set and the remaining one serves as the test set. Each time, the trained model is evaluated on the test set, yielding one classification accuracy; after all N subsets have served as the test set, the average of the N-fold cross-validation classification accuracies is taken. All points of the grid are traversed in this way, and the point with the largest average classification accuracy is taken as the best-performing (C, gamma). It should be noted that 5-fold cross-validation is used here, and that because (C, gamma) takes a limited, discrete set of values within the chosen search range, the resulting (C, gamma) may only be a locally optimal solution.
A second determining module 406, configured to determine the optimal feature subset.
Based on the FMS feature selection algorithm, the weight of each feature is obtained; the features are sorted in descending order of weight, and according to this ranking they are added to the model training set one by one. Experiments show that, because the features added early have larger weights, the classification accuracy of the model rises gradually at first; but as lower-weight features are added later, and because of the presence of noise and redundant data in the data set, the growth of the model's classification accuracy slows down and the accuracy may even decline. At the same time, the training time of the SVM model keeps increasing as the number of features increases. Therefore, simply adding more features to model training is not appropriate. On this basis, the concept of an optimal feature subset is proposed: when, with the features sorted by weight, the feature set used for model training is exactly the optimal feature subset, the model performance (i.e., the classification accuracy and the training time of the model) reaches a comprehensive optimum. Furthermore, a feature efficiency function is proposed to measure the relationship between the model's classification accuracy and its training time, so as to determine the optimal feature subset and make the model performance comprehensively optimal.
Definition 7: the function f(n) is the model classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: the function g(n) is the model training time as a function of the number of features n, n ∈ Z.
From the above definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when the model training set contains a given number of features. To evaluate the comprehensive performance of the model, a feature efficiency function is defined, as shown in Definition 9.
Definition 9: h(n) is the feature efficiency function with respect to the number of features n, expressed as h(n) = f(n)/g(n).
Intuitively, h(n) describes the classification accuracy obtained per unit time: the larger the classification accuracy per unit time, i.e., the larger h(n), the better the comprehensive performance of the model. This naturally leads to the concept of the optimum point, as given in Definition 10.
Definition 10: the point n0 at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when n = n0 the model obtains the maximum classification accuracy per unit time, and the comprehensive performance of the model is then optimal. Clearly, with the features sorted by weight, the top n0 features constitute the optimal feature subset.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A BGP anomaly detection method, characterized in that the method comprises:
obtaining an anomalous data set;
performing data normalization on the anomalous data set;
selecting from a feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining a feature weight measuring each feature's classification ability;
optimizing a Gaussian kernel function;
performing parameter optimization;
determining an optimal feature subset.
2. The method according to claim 1, characterized in that obtaining the anomalous data set comprises:
obtaining the anomalous data set from an autonomous system.
3. The method according to claim 1, characterized in that performing data normalization on the anomalous data set comprises:
replacing the population mean with the sample mean, and replacing the population standard deviation with the sample standard deviation.
4. The method according to claim 1, characterized in that optimizing the Gaussian kernel function comprises:
optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
5. The method according to claim 1, characterized in that the parameter optimization comprises:
performing parameter optimization on a support vector machine model based on grid search and cross-validation.
6. A BGP anomaly detection system, characterized by comprising:
an obtaining module, configured to obtain an anomalous data set;
a processing module, configured to perform data normalization on the anomalous data set;
a first determining module, configured to select from a feature set those features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain a feature weight measuring each feature's classification ability;
an optimization module, configured to optimize a Gaussian kernel function;
a parameter-search module, configured to perform parameter optimization;
a second determining module, configured to determine an optimal feature subset.
7. The system according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain the anomalous data set from an autonomous system.
8. The system according to claim 6, characterized in that the processing module is specifically configured to:
replace the population mean with the sample mean, and replace the population standard deviation with the sample standard deviation.
9. The system according to claim 6, characterized in that the optimization module is specifically configured to:
optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the distance metric between two vectors in the Gaussian kernel function.
10. The system according to claim 6, characterized in that the parameter-search module is specifically configured to:
perform parameter optimization on the support vector machine model based on grid search and cross-validation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811331848.7A CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811331848.7A CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109257383A true CN109257383A (en) | 2019-01-22 |
CN109257383B CN109257383B (en) | 2021-09-21 |
Family
ID=65044099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811331848.7A Active CN109257383B (en) | 2018-11-09 | 2018-11-09 | BGP anomaly detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109257383B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111835791A (en) * | 2020-07-30 | 2020-10-27 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN112653675A (en) * | 2020-12-12 | 2021-04-13 | 海南师范大学 | Intelligent intrusion detection method and device based on deep learning |
CN112702221A (en) * | 2019-10-23 | 2021-04-23 | 中国电信股份有限公司 | BGP abnormal route monitoring method and device |
CN112905572A (en) * | 2021-01-29 | 2021-06-04 | 铁道警察学院 | Data anomaly information studying and judging model and method |
CN114535142A (en) * | 2022-01-11 | 2022-05-27 | 华南理工大学 | Data-driven intelligent determination method for dimension qualification of injection molding product |
- 2018-11-09: CN application CN201811331848.7A, patent CN109257383B, status: Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101594361A (en) * | 2009-06-02 | 2009-12-02 | 浙江大学 | Network Intrusion Detection System based on shortcut calculation of support vector machine |
CN102903075A (en) * | 2012-10-15 | 2013-01-30 | 西安电子科技大学 | Robust watermarking method based on image feature point global correction |
CN105184316A (en) * | 2015-08-28 | 2015-12-23 | 国网智能电网研究院 | Support vector machine power grid business classification method based on feature weight learning |
US20180262525A1 (en) * | 2017-03-09 | 2018-09-13 | General Electric Company | Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid |
Non-Patent Citations (2)
Title |
---|
Yang Guang, Wu Chunling, et al.: "Research on a Network Intrusion Detection Algorithm Based on RS and WSVM", Computer Simulation * |
Gao Wei, Peng Yu: "Hyperspectral Image Classification Based on Mahalanobis-Distance Multi-Kernel Learning", Chinese Journal of Scientific Instrument * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112702221A (en) * | 2019-10-23 | 2021-04-23 | 中国电信股份有限公司 | BGP abnormal route monitoring method and device |
CN111835791A (en) * | 2020-07-30 | 2020-10-27 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN111835791B (en) * | 2020-07-30 | 2022-10-28 | 哈尔滨工业大学 | BGP security event rapid detection system |
CN112653675A (en) * | 2020-12-12 | 2021-04-13 | 海南师范大学 | Intelligent intrusion detection method and device based on deep learning |
CN112905572A (en) * | 2021-01-29 | 2021-06-04 | 铁道警察学院 | Data anomaly information studying and judging model and method |
CN114535142A (en) * | 2022-01-11 | 2022-05-27 | 华南理工大学 | Data-driven intelligent determination method for dimension qualification of injection molding product |
CN114535142B (en) * | 2022-01-11 | 2023-09-26 | 华南理工大学 | Intelligent judgment method for size qualification of injection molding product based on data driving |
Also Published As
Publication number | Publication date |
---|---|
CN109257383B (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109257383A (en) | A kind of BGP method for detecting abnormality and system | |
CN106202561B (en) | Digitlization contingency management case base construction method and device based on text big data | |
Fong et al. | Accelerated PSO swarm search feature selection for data stream mining big data | |
CN111324642A (en) | Model algorithm type selection and evaluation method for power grid big data analysis | |
Amini et al. | On density-based data streams clustering algorithms: A survey | |
WO2020147488A1 (en) | Method and device for identifying irregular group | |
Martins | A supervised machine learning approach for duplicate detection over gazetteer records | |
CN108717408A (en) | A kind of sensitive word method for real-time monitoring, electronic equipment, storage medium and system | |
CN106503086A (en) | The detection method of distributed local outlier | |
CN100416560C (en) | Method and apparatus for clustered evolving data flow through on-line and off-line assembly | |
Shimada et al. | Class association rule mining with chi-squared test using genetic network programming | |
CN110347840A (en) | Complain prediction technique, system, equipment and the storage medium of text categories | |
CN107679734A (en) | It is a kind of to be used for the method and system without label data classification prediction | |
Gu et al. | [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management | |
CN105183792B (en) | Distributed fast text classification method based on locality sensitive hashing | |
CN108304851A (en) | A kind of High Dimensional Data Streams Identifying Outliers method | |
CN113704389A (en) | Data evaluation method and device, computer equipment and storage medium | |
CN111143838A (en) | Database user abnormal behavior detection method | |
CN113762703A (en) | Method and device for determining enterprise portrait, computing equipment and storage medium | |
CN112087316B (en) | Network anomaly root cause positioning method based on anomaly data analysis | |
García-Vico et al. | Fepds: A proposal for the extraction of fuzzy emerging patterns in data streams | |
CN113298116A (en) | Attention weight-based graph embedding feature extraction method and device and electronic equipment | |
CN113837266B (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
CN117155771B (en) | Equipment cluster fault tracing method and device based on industrial Internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |