CN109257383A - BGP anomaly detection method and system - Google Patents

Info

Publication number
CN109257383A
Authority
CN
China
Prior art keywords
feature
optimization
distance
data set
abnormal data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811331848.7A
Other languages
Chinese (zh)
Other versions
CN109257383B (en)
Inventor
王娜
杜学绘
戴仙波
任志宇
王文娟
单棣斌
杨智
刘敖迪
李少卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201811331848.7A
Publication of CN109257383A
Application granted
Publication of CN109257383B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)

Abstract

This application discloses a BGP anomaly detection method and system. The method includes: obtaining an anomaly data set; standardizing the anomaly data set; selecting from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining for each feature a weight that measures its classification ability; optimizing the Gaussian kernel function; optimizing the parameters; and determining the optimal feature subset. The application performs parameter optimization based on an improved Gaussian kernel function together with grid search and cross-validation, thereby improving the classification accuracy of the model, and evaluates the overall performance of the model on the basis of the optimal feature subset.

Description

BGP anomaly detection method and system
Technical Field
This application relates to the technical field of anomaly detection, and in particular to a BGP (Border Gateway Protocol) anomaly detection method and system.
Background
By consequence, BGP anomalies can be divided into traffic hijacking anomalies and update message surge anomalies. A traffic hijacking anomaly redirects the victim network's traffic, creates traffic black holes, and so on, destroying the reachability of the victim network. An update message surge anomaly generates a large number of BGP update messages within a very short time, undermining the stability of the global Internet.
Current BGP anomaly detection methods generally fall into five classes: methods based on statistical pattern recognition, methods based on historical BGP update messages, methods based on reachability verification, methods based on time series analysis, and methods based on machine learning. Methods based on statistical pattern recognition apply statistical probability theory to pattern recognition and identify anomalies with a distance function between patterns; they can detect both traffic hijacking anomalies and update message surge anomalies. However, they face the difficulty of correctly estimating the distribution of high-dimensional data, detection is slow, and in practice the thresholds of the model parameters must be set manually. Methods based on historical BGP update messages and methods based on reachability verification can detect only traffic hijacking anomalies: the former use historical data to detect anomalous BGP routes, and the latter detect anomalies from the result of verifying the reachability of the destination prefix. Methods based on time series analysis and methods based on machine learning can detect update message surge anomalies. Methods based on time series analysis treat BGP update messages as a multidimensional time series and detect anomalies by choosing a suitable sliding time window. However, the window size is hard to determine: a window that is too small leaves the model with too little information, while a window that is too large makes the model insensitive to local anomalies, so that the miss rate rises. In recent years, machine learning methods have found some application in the field of BGP anomaly detection. From a machine learning perspective, BGP anomaly detection can be abstracted as a binary classification problem whose goal is to label an unknown BGP update message as normal or anomalous.
In summary, traditional BGP anomaly detection methods suffer from a series of practical problems, such as low detection accuracy, difficulty in estimating parameter thresholds, slow detection, high deployment difficulty, and dependence on the completeness of a knowledge base.
Therefore, how to address the low classification accuracy and unsatisfactory results of the prior art, and its failure to evaluate the overall performance of the model, is a problem that urgently needs to be solved.
Summary of the Invention
In view of this, this application provides a BGP anomaly detection method that performs parameter optimization based on an improved Gaussian kernel function together with grid search and cross-validation, thereby improving the classification accuracy of the model, and evaluates the overall performance of the model on the basis of the optimal feature subset.
This application provides a BGP anomaly detection method, the method comprising:
obtaining an anomaly data set;
standardizing the anomaly data set;
selecting from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining for each feature a weight that measures its classification ability;
optimizing the Gaussian kernel function;
optimizing the parameters;
determining the optimal feature subset.
Preferably, obtaining the anomaly data set includes:
obtaining the anomaly data set from an autonomous system.
Preferably, standardizing the anomaly data set includes:
replacing the population mean with the sample mean and the population standard deviation with the sample standard deviation.
Preferably, optimizing the Gaussian kernel function includes:
optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, where the Manhattan distance serves as the measure of distance between two vectors in the Gaussian kernel function.
Preferably, optimizing the parameters includes:
optimizing the parameters of the support vector machine model based on grid search and cross-validation.
A BGP anomaly detection system, comprising:
an obtaining module, configured to obtain an anomaly data set;
a processing module, configured to standardize the anomaly data set;
a first determining module, configured to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a weight that measures its classification ability;
an optimization module, configured to optimize the Gaussian kernel function;
a parameter optimization module, configured to optimize the parameters;
a second determining module, configured to determine the optimal feature subset.
Preferably, the obtaining module is specifically configured to:
obtain the anomaly data set from an autonomous system.
Preferably, the processing module is specifically configured to:
replace the population mean with the sample mean and the population standard deviation with the sample standard deviation.
Preferably, the optimization module is specifically configured to:
optimize the Gaussian kernel function using the Manhattan distance and the feature weights, where the Manhattan distance serves as the measure of distance between two vectors in the Gaussian kernel function.
Preferably, the parameter optimization module is specifically configured to:
optimize the parameters of the support vector machine model based on grid search and cross-validation.
In summary, this application discloses a BGP anomaly detection method. When anomaly detection needs to be performed on the Border Gateway Protocol, an anomaly data set is first obtained and standardized; the features that simultaneously maximize the between-class distance and minimize the within-class distance are then selected from the feature set, and a weight measuring each feature's classification ability is obtained; the Gaussian kernel function is optimized, the parameters are optimized, and the optimal feature subset is determined. The application performs parameter optimization based on an improved Gaussian kernel function together with grid search and cross-validation, thereby improving the classification accuracy of the model, and evaluates the overall performance of the model on the basis of the optimal feature subset.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of Embodiment 1 of a BGP anomaly detection method disclosed in this application;
Fig. 2 is a flow chart of Embodiment 2 of a BGP anomaly detection method disclosed in this application;
Fig. 3 is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in this application;
Fig. 4 is a structural schematic diagram of Embodiment 2 of a BGP anomaly detection system disclosed in this application;
Fig. 5 is a schematic diagram of the grid search disclosed in this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.
As shown in Fig. 1, which is a flow chart of Embodiment 1 of a BGP anomaly detection method disclosed in this application, the method may include the following steps:
S101: obtain an anomaly data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomaly data set is first obtained, i.e., the samples to be examined. BGP is a core decentralized autonomous routing protocol of the Internet; it achieves reachability between autonomous systems by maintaining routing tables and is a path-vector routing protocol.
S102: standardize the anomaly data set;
After the anomaly data set has been obtained, it is standardized to eliminate the influence of units and magnitudes, so that different features can be compared and weighted.
S103: select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain for each feature a weight that measures its classification ability;
Then, the FMS (Fisher-Markov Selector) feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a weight that measures its classification ability.
S104: optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is built on the improved Gaussian kernel function, which uses the Manhattan distance and the feature weights. An SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
S105: optimize the parameters;
Then, the parameters of the SVM model are optimized based on grid search and cross-validation.
S106: determine the optimal feature subset.
Finally, taking both the model's classification accuracy and its training time into account, the concept of an optimal feature subset is proposed together with a method for constructing it; with the optimal feature subset, model performance reaches the overall optimum.
In summary, in the above embodiment, when anomaly detection needs to be performed on the Border Gateway Protocol, an anomaly data set is first obtained and standardized; the features that simultaneously maximize the between-class distance and minimize the within-class distance are then selected from the feature set, and a weight measuring each feature's classification ability is obtained; the Gaussian kernel function is optimized, the parameters are optimized, and the optimal feature subset is determined. The application performs parameter optimization based on an improved Gaussian kernel function together with grid search and cross-validation, thereby improving the classification accuracy of the model, and evaluates the overall performance of the model on the basis of the optimal feature subset.
As shown in Fig. 2, which is a flow chart of Embodiment 2 of a BGP anomaly detection method disclosed in this application, the method may include the following steps:
S201: obtain the anomaly data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages recorded during the Slammer, Nimda and Code Red I outbreaks are downloaded from AS513 (RIPE RIS, rrc04, CIXP, Geneva) as the BGP anomaly data set. The routing data are converted from MRT format to ASCII format with the libBGPdump tool, and the ASCII files are then parsed by an analysis tool written in C# to extract statistics for 37 features (as shown in Table 1). Feature values are sampled once per minute over five days, giving 7200 samples for each anomalous event. The samples from the two days before and the two days after each event are treated as the normal data set, and the third day is the peak period of each anomalous activity.
Table 1: Feature extraction
S202: standardize the anomaly data set;
Then, the Z-score standardization method is used to eliminate the influence of units and magnitudes, so that different features can be compared and weighted. Because the BGP data set comes from sampled statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing method is shown in the following formula:
$x' = \dfrac{x - \bar{x}}{S}$
where $\bar{x}$ denotes the sample mean and $S$ denotes the sample standard deviation.
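The standardization step can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `zscore_columns` and the toy data are hypothetical, the sample standard deviation is assumed to use the usual n - 1 denominator, and mapping a constant feature to 0.0 is a choice made only for this sketch.

```python
import math

def zscore_columns(rows):
    """Standardize each feature column with its sample mean and sample std (n - 1)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    # Assumption: a constant feature (S = 0) is mapped to 0.0 to avoid division by zero.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(p)]
            for r in rows]

# Hypothetical two-feature samples on very different scales.
samples = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
standardized = zscore_columns(samples)
```

After standardization both columns are on the same scale, which is what allows the later weighting and distance computations to compare features directly.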
S203: select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtain for each feature a weight that measures its classification ability;
Because of redundant features, building a classification model on high-dimensional features increases computational overhead, and noisy data also reduce the model's classification accuracy. Therefore, once the feature set has been obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be removed to find the feature subset that best distinguishes the classes, so as to reduce the dimension of the feature matrix and the computational complexity while improving the model's classification accuracy.
The FMS feature selection algorithm uses Fisher discriminant analysis and Markov random field techniques to select the features that simultaneously maximize the between-class distance and minimize the within-class distance. With FMS, feature selection is independent of the classification process: relevance is measured from intrinsic properties of the data, and the features are ranked by weight, a larger weight indicating a stronger ability to distinguish classes. The method is efficient: it guarantees a global optimum, has low computational complexity, and is suited to large-scale data. The training data are denoted $\{(x_k, y_k)\}_{k=1}^{n}$, where $x_k \in \mathbb{R}^p$ is a p-dimensional feature vector and $y_k \in \{\omega_1, \ldots, \omega_g\}$ is a class label; $C_i$, $i = 1, \ldots, g$, denotes the i-th class, and each class $C_i$ has $n_i$ samples.
Definition 1: The within-class scatter matrix, the between-class scatter matrix and the total scatter matrix are denoted $S_w$, $S_b$ and $S_t$, respectively, where $x_i^{(j)}$ denotes the i-th sample of class $C_j$, $\bar{x}^{(j)}$ denotes the sample mean of class $C_j$, and $\bar{x}$ denotes the overall sample mean.
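For reference, the three scatter matrices of Definition 1 can be written in their standard form as below. This is a reconstruction under assumptions rather than a quotation: the $1/n$ normalization and the symbols $x_i^{(j)}$, $\bar{x}^{(j)}$ and $\bar{x}$ are chosen to agree with the surrounding notation.

```latex
S_w = \frac{1}{n}\sum_{j=1}^{g}\sum_{i=1}^{n_j}
      \bigl(x_i^{(j)}-\bar{x}^{(j)}\bigr)\bigl(x_i^{(j)}-\bar{x}^{(j)}\bigr)^{T}
\qquad
S_b = \frac{1}{n}\sum_{j=1}^{g} n_j
      \bigl(\bar{x}^{(j)}-\bar{x}\bigr)\bigl(\bar{x}^{(j)}-\bar{x}\bigr)^{T}
\qquad
S_t = \frac{1}{n}\sum_{k=1}^{n}\bigl(x_k-\bar{x}\bigr)\bigl(x_k-\bar{x}\bigr)^{T}
    = S_w + S_b
```

With this normalization the total scatter decomposes exactly into the within-class and between-class parts, which is the property the selection criterion relies on.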
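Definition 2: The nonlinear mapping $\phi(\cdot)$ from the input space $\mathbb{R}^p$ to the kernel space $\mathbb{R}^D$ is defined as follows:
$\phi: \mathbb{R}^p \to \mathbb{R}^D$ (2)
Definition 3: The kernel function $k(\cdot,\cdot)$ satisfies:
$\langle \phi(x_1), \phi(x_2) \rangle = k(x_1, x_2)$ (3)
where $\langle \cdot, \cdot \rangle$ denotes the inner product in the kernel space.
Definition 4: $K$ and $K^{(i)}$ are square matrices of order $n$ and $n_i$, respectively, and satisfy:
$K_{kl} = k(x_k, x_l), \quad K^{(i)}_{uv} = k\bigl(x_u^{(i)}, x_v^{(i)}\bigr)$ (4)
where $k, l \in \{1, \ldots, n\}$, $u, v \in \{1, \ldots, n_i\}$, $i = 1, \ldots, g$.
Definition 5: In the kernel space, $S_w$, $S_b$ and $S_t$ are denoted $\tilde{S}_w$, $\tilde{S}_b$ and $\tilde{S}_t$, respectively, and their traces are:
where $\mathrm{Sum}(\cdot)$ denotes the sum of all elements of a matrix.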
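The traces in Definition 5 can be computed from Gram-matrix sums alone, without ever forming $\phi$. The sketch below assumes scatter matrices normalized by $1/n$, which gives $\mathrm{Tr}(\tilde{S}_w) = \frac{1}{n}\mathrm{Tr}(K) - \frac{1}{n}\sum_i \frac{1}{n_i}\mathrm{Sum}(K^{(i)})$, $\mathrm{Tr}(\tilde{S}_b) = \frac{1}{n}\sum_i \frac{1}{n_i}\mathrm{Sum}(K^{(i)}) - \frac{1}{n^2}\mathrm{Sum}(K)$ and $\mathrm{Tr}(\tilde{S}_t) = \frac{1}{n}\mathrm{Tr}(K) - \frac{1}{n^2}\mathrm{Sum}(K)$. The function names and toy data are hypothetical.

```python
def kernel_scatter_traces(X, y, kernel):
    """Return (Tr(S_w), Tr(S_b), Tr(S_t)) in the kernel space from Gram-matrix sums."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]      # K_kl = k(x_k, x_l)
    trace_K = sum(K[i][i] for i in range(n))
    sum_K = sum(map(sum, K))
    # Sum over classes of Sum(K^(i)) / n_i, where K^(i) is the Gram matrix of class i.
    class_term = 0.0
    for label in set(y):
        idx = [i for i in range(n) if y[i] == label]
        class_term += sum(K[u][v] for u in idx for v in idx) / len(idx)
    tr_w = trace_K / n - class_term / n
    tr_b = class_term / n - sum_K / n ** 2
    tr_t = trace_K / n - sum_K / n ** 2
    return tr_w, tr_b, tr_t

def linear_kernel(a, b):
    """Plain inner product, used here so the result can be checked by hand."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 1.0], [0.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y = [0, 0, 1, 1]
tr_w, tr_b, tr_t = kernel_scatter_traces(X, y, linear_kernel)
```

With the linear kernel the traces reduce to ordinary per-feature variances, so the identity $\mathrm{Tr}(\tilde{S}_t) = \mathrm{Tr}(\tilde{S}_w) + \mathrm{Tr}(\tilde{S}_b)$ is easy to verify on small inputs.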
Definition 6: The feature selection vector is defined as:
$\alpha = [\alpha_1, \ldots, \alpha_p]^T \in \{0,1\}^p$ (6)
where $\alpha_k = 1$ indicates that the k-th feature is selected and $\alpha_k = 0$ indicates that it is not.
The features selected from a feature vector $x$ are given by $x(\alpha) = x \odot \alpha$, where $\odot$ denotes the Hadamard product. The feature selection criterion of maximizing class separation can therefore be converted into the following unconstrained optimization problem:
where $\gamma$ is a free parameter. Analysis shows that $\gamma \le 0$ gives better classification results, and within a reasonable range of $\gamma$ the classifier's experimental performance is insensitive to $\gamma$. To handle linearly non-separable, highly noisy data sets containing redundant features, an $L_0$-norm regularization term is added, so that the feature selection criterion becomes:
where the regularization factor $\beta$ acts as a global threshold.
Consider the following linear kernel function:
Substituting it into formula (8) gives:
where $\theta_j$ is defined as in formula (11). $\theta_j$ measures how important a feature is for discriminating between classes, i.e., it is the weight of the feature. The larger $\theta_j$, the more important the j-th feature.
For given $\beta$ and $\gamma$, combining with formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector $\alpha^* \in \{0,1\}^p$ that satisfies:
If $\theta_j > \beta$, the j-th feature is selected; otherwise, if $\theta_j \le \beta$, the j-th feature is not selected.
The pseudocode of the algorithm is given as Algorithm 1 in Table 2 below; the computational complexity of the algorithm is $O(n^2 p)$.
Table 2
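As an illustration of the thresholding rule, the sketch below computes per-feature weights for the linear-kernel case and selects the features whose weight exceeds $\beta$. It rests on several assumptions and is not the patent's Algorithm 1: the criterion is taken to have the form $\mathrm{Tr}(\tilde{S}_b)(\alpha) - \gamma\,\mathrm{Tr}(\tilde{S}_t)(\alpha) - \beta\|\alpha\|_0$ with $1/n$-normalized scatter matrices, under which a linear kernel makes the objective separate per feature, so $\theta_j$ is the between-class scatter of feature j minus $\gamma$ times its total scatter. The name `fms_linear` and the toy data are hypothetical.

```python
def fms_linear(X, y, gamma, beta):
    """Return (theta, alpha): per-feature weights and the 0/1 selection vector."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    theta = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        # Total scatter of feature j (1/n normalization is an assumption).
        total = sum((v - mean) ** 2 for v in col) / n
        # Between-class scatter of feature j.
        between = 0.0
        for c in classes:
            members = [col[i] for i in range(n) if y[i] == c]
            class_mean = sum(members) / len(members)
            between += len(members) * (class_mean - mean) ** 2 / n
        theta.append(between - gamma * total)
    # Assumed rule: feature j is selected exactly when theta_j > beta.
    alpha = [1 if t > beta else 0 for t in theta]
    return theta, alpha

# Toy data: feature 0 separates the two classes, feature 1 is uninformative.
X = [[0.0, 1.0], [0.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y = [0, 0, 1, 1]
theta, alpha = fms_linear(X, y, gamma=-0.5, beta=1.0)
```

In this toy case the discriminative feature receives the larger weight and is the only one kept, which matches the intended reading of $\theta_j$ as a measure of class-separating ability.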
S204: optimize the Gaussian kernel function using the Manhattan distance and the feature weights, where the Manhattan distance serves as the measure of distance between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function uses the Euclidean distance to measure the distance between two vectors. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors in the distance calculation, which affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the measure of distance between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the computational cost is lower.
If the contribution of each feature to classification can be reflected in the distance calculation, the classification method will fit the characteristics of BGP data more closely and the classification accuracy can be further improved. Accordingly, feature weights are introduced to measure each feature's contribution to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted $k'(x, y)$, as shown in formula (13):
$k'(x, y) = \exp(-\gamma\,\delta(x, y))$ (13)
where $\delta(x, y)$ denotes the Manhattan distance between the two vectors, as shown in formula (14):
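A minimal sketch of the improved kernel follows. Formula (14) is assumed here to be the feature-weighted Manhattan distance $\delta(x, y) = \sum_j \theta_j\,|x_j - y_j|$; that form, the function names and the example weights are assumptions for illustration.

```python
import math

def weighted_manhattan(x, y, weights):
    """Assumed form of formula (14): sum of weights[j] * |x[j] - y[j]|."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def improved_gaussian_kernel(x, y, weights, gamma):
    """Formula (13): k'(x, y) = exp(-gamma * delta(x, y))."""
    return math.exp(-gamma * weighted_manhattan(x, y, weights))

# Hypothetical feature weights, e.g. taken from a feature selection step.
weights = [1.5, 0.5]
k_same = improved_gaussian_kernel([1.0, 2.0], [1.0, 2.0], weights, gamma=0.25)
k_diff = improved_gaussian_kernel([1.0, 2.0], [3.0, 0.0], weights, gamma=0.25)
```

Identical vectors give a kernel value of 1, and the value decays as the weighted distance grows, so a difference in a high-weight feature reduces similarity more than the same difference in a low-weight feature.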
S205: optimize the parameters of the support vector machine model based on grid search and cross-validation;
The performance of the SVM model depends on an important pair of parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors. The higher C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. Whether C is too large or too small, the generalization ability of the model is reduced, so an appropriate value of C is of great significance for improving the model's classification accuracy and generalization ability. gamma is a parameter of the polynomial, Gaussian and sigmoid kernels, and implicitly determines how the data are distributed after being mapped to the new feature space. The larger gamma is, the fewer the support vectors; the smaller gamma is, the more the support vectors; and the number of support vectors affects the speed of model training and prediction.
Given the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm whose evaluation objective is overall classification accuracy pays too much attention to the majority class, so that classification performance on minority-class samples declines. Therefore, in the search for the optimal (C, gamma), the minority-class samples must be given adequate consideration so that the two classes have the same "say" in training. Here the two classes of samples are each assigned a weight in inverse proportion to their sample counts, which effectively addresses the data imbalance.
Table 3: Weights of the two classes of samples
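The inverse-proportion weighting can be sketched as follows. The sample counts below are hypothetical and are not the values of Table 3; normalizing the weights to sum to 1 is also an assumption made for readability.

```python
def inverse_ratio_weights(counts):
    """Weight each class in inverse proportion to its sample count (weights sum to 1)."""
    inverse = {label: 1.0 / n for label, n in counts.items()}
    total = sum(inverse.values())
    return {label: v / total for label, v in inverse.items()}

# Hypothetical class sizes for one event.
weights = inverse_ratio_weights({"normal": 6000, "anomalous": 1200})
```

With a 5:1 majority-to-minority ratio, each minority sample counts five times as much as a majority sample, so both classes contribute equally to the training objective.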
Choosing kernel parameters is difficult. There is currently no generally accepted universal method, and in practice the parameters can only be obtained through experimental comparison or experience. Therefore, under the constraint of the imbalanced data set, grid search and cross-validation are combined here for parameter optimization (as shown in Fig. 5): the search range of (C, gamma) is divided into a grid by value, and each point in the grid represents one parameter combination. The range of the grid search satisfies formula (16) with a step of 1 in the exponent, i.e., $C \in \{2^{-5}, 2^{-4}, \ldots, 2^{5}\}$ and $\mathrm{gamma} \in \{2^{-4}, 2^{-3}, \ldots, 2^{0}\}$.
At each grid point, cross-validation is carried out as follows: the whole training set is divided into N subsets, of which N-1 are used as the training set and the remaining one as the test set. Each time the trained model is tested on the test set, a classification accuracy is obtained; after each of the N subsets has served as the test set, the average of the N-fold cross-validation classification accuracies is taken. All points in the grid are traversed in this way, and the point with the largest average classification accuracy is taken as the (C, gamma) with the best performance. Note that 5-fold cross-validation is used here, and because the search range of (C, gamma) consists of a finite set of discrete values, the resulting (C, gamma) may be only a locally optimal solution.
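The search procedure can be sketched generically as below. The grid matches the ranges stated above; everything else is an assumption for illustration: the interleaved fold assignment, the callback name `fit_and_score`, and the stand-in `toy_score`, which replaces actual SVM training so that the sketch runs on its own.

```python
from itertools import product

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) so that each fold serves once as the test set."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]

def grid_search(n_samples, fit_and_score, n_folds=5):
    """Return the (C, gamma) with the highest mean cross-validation accuracy."""
    grid = product([2.0 ** e for e in range(-5, 6)],    # C in 2^-5 .. 2^5
                   [2.0 ** e for e in range(-4, 1)])    # gamma in 2^-4 .. 2^0
    best_params, best_mean = None, float("-inf")
    for C, gamma in grid:
        scores = [fit_and_score(C, gamma, train, test)
                  for train, test in kfold_indices(n_samples, n_folds)]
        mean = sum(scores) / len(scores)
        if mean > best_mean:
            best_params, best_mean = (C, gamma), mean
    return best_params, best_mean

# Stand-in scorer for illustration only: a real one would train the SVM on `train`
# with (C, gamma) and return its classification accuracy on `test`.
def toy_score(C, gamma, train, test):
    return 1.0 / (1.0 + abs(C - 4.0) + abs(gamma - 0.25))

best_params, best_mean = grid_search(20, toy_score)
```

The grid has 11 values of C and 5 values of gamma, so 55 parameter combinations are evaluated, each with 5 train/test rounds; because only these discrete points are tried, the result is the best point on the grid rather than a guaranteed global optimum.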
S206: determine the optimal feature subset.
The FMS feature selection algorithm yields the weight of each feature. The features are sorted in descending order of weight and added to the model's training set one at a time in that order. Experiments show that, because the features added early have larger weights, the model's classification accuracy rises steadily at first; as lower-weight features are added later, however, and because of noise and redundant data in the data set, the growth of classification accuracy slows or accuracy even declines. Meanwhile, the training time of the SVM model keeps increasing with the number of features. Simply adding more features for model training is therefore inappropriate. For this reason, the concept of an optimal feature subset is proposed here: with the features sorted by weight, when the feature set used for model training is exactly the optimal feature subset, model performance (i.e., the model's classification accuracy and training time) reaches the overall optimum. Further, a feature efficiency function is proposed to measure the relationship between classification accuracy and training time and thereby determine the optimal feature subset, so that model performance reaches the overall optimum.
Definition 7: The function f(n) is the model's classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: The function g(n) is the model's training time as a function of the number of features n, n ∈ Z.
From these definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when its training set contains a given number of features. To evaluate the overall performance of the model, a feature efficiency function is defined, as in Definition 9.
Definition 9: h(n) is the feature efficiency function of the number of features n, with the following expression:
$h(n) = \dfrac{f(n)}{g(n)}$
Intuitively, h(n) describes the classification accuracy per unit time; the larger the accuracy per unit time, i.e., the larger h(n), the better the overall performance of the model. This naturally leads to the concept of the optimum point, as in Definition 10.
Definition 10: The point $n_0$ at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when $n = n_0$ the model achieves the greatest classification accuracy per unit time, at which point the overall performance of the model is optimal. Clearly, with the features sorted by weight, the top $n_0$ features form the optimal feature subset.
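Finding the optimum point amounts to an argmax over measured values. The sketch below assumes the feature efficiency function is the ratio of accuracy to training time, consistent with the description of h(n) as accuracy per unit time; the function name and the measurements are hypothetical and are not results from the source.

```python
def optimum_point(accuracy, train_time):
    """Return (n0, h) where n0 maximizes h(n) = f(n) / g(n); dicts are keyed by n."""
    h = {n: accuracy[n] / train_time[n] for n in accuracy}
    return max(h, key=h.get), h

# Hypothetical measurements when training on the top-n features by weight.
f = {1: 0.60, 2: 0.80, 3: 0.90, 4: 0.91}   # classification accuracy
g = {1: 1.0, 2: 1.2, 3: 1.5, 4: 2.0}       # training time in seconds
n0, h = optimum_point(f, g)
```

In this made-up example accuracy keeps rising up to four features, but accuracy per unit time peaks at two, so the top two features would form the optimal feature subset under this criterion.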
As shown in Fig. 3, which is a structural schematic diagram of Embodiment 1 of a BGP anomaly detection system disclosed in this application, the system may include:
an obtaining module 301, configured to obtain an anomaly data set;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), an anomaly data set is first obtained, i.e., the samples to be examined. BGP is a core decentralized autonomous routing protocol of the Internet; it achieves reachability between autonomous systems by maintaining routing tables and is a path-vector routing protocol.
a processing module 302, configured to standardize the anomaly data set;
After the anomaly data set has been obtained, it is standardized to eliminate the influence of units and magnitudes, so that different features can be compared and weighted.
a first determining module 303, configured to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a weight that measures its classification ability;
Then, the FMS feature selection algorithm is used to select from the feature set the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a weight that measures its classification ability.
an optimization module 304, configured to optimize the Gaussian kernel function;
Then, an SVM (Support Vector Machine) classification model is built on the improved Gaussian kernel function, which uses the Manhattan distance and the feature weights. An SVM is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
a parameter optimization module 305, configured to optimize the parameters;
Then, the parameters of the SVM model are optimized based on grid search and cross-validation.
a second determining module 306, configured to determine the optimal feature subset.
Finally, taking both the model's classification accuracy and its training time into account, the concept of an optimal feature subset is proposed together with a method for constructing it; with the optimal feature subset, model performance reaches the overall optimum.
In summary, in the above embodiment, when anomaly detection needs to be performed on the Border Gateway Protocol, an anomaly data set is first obtained and standardized; the features that simultaneously maximize the between-class distance and minimize the within-class distance are then selected from the feature set, and a weight measuring each feature's classification ability is obtained; the Gaussian kernel function is optimized, the parameters are optimized, and the optimal feature subset is determined. The application performs parameter optimization based on an improved Gaussian kernel function together with grid search and cross-validation, thereby improving the classification accuracy of the model, and evaluates the overall performance of the model on the basis of the optimal feature subset.
As shown in Fig. 4, which is a structural schematic diagram of embodiment 2 of a BGP anomaly detection system disclosed in the present application, the system may include:
Obtaining module 401, configured to obtain an abnormal data set from an autonomous system;
When anomaly detection needs to be performed on BGP (Border Gateway Protocol), the BGP update messages recorded during the Slammer, Nimda and Code Red I outbreaks are downloaded from AS513 (RIPE RIS, rrc04, CIXP, Geneva) as the BGP abnormal data set. The libBGPdump tool is used to convert the routing data from MRT format to ASCII format, and an analysis tool written in C# then parses the ASCII files and extracts statistics for 37 features (as shown in Table 1). Feature values are sampled once per minute over five days, which yields 7200 samples for each anomalous event. The samples from the two days before and the two days after each event are treated as the normal data set, and the third day is the peak period of each anomaly.
Table 1 Feature extraction
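The sampling arithmetic above can be sketched as follows: five days at one sample per minute give 5 × 24 × 60 = 7200 samples per event. The sketch bins already-parsed update records into one-minute windows and counts announcements and withdrawals, two plausible per-minute statistics. The record layout `(unix_timestamp, kind)` and the function name `per_minute_counts` are hypothetical; they are not taken from the C# tool described above.

```python
from collections import Counter

SAMPLES_PER_EVENT = 5 * 24 * 60  # five days, one sample per minute


def per_minute_counts(records, start_ts, n_minutes=SAMPLES_PER_EVENT):
    """Count announcements ('A') and withdrawals ('W') per one-minute bin.

    records: iterable of (unix_timestamp, kind) tuples (hypothetical layout).
    Returns a list of n_minutes (announcements, withdrawals) pairs.
    """
    ann, wd = Counter(), Counter()
    for ts, kind in records:
        minute = (ts - start_ts) // 60
        if not 0 <= minute < n_minutes:
            continue  # record falls outside the sampling window
        if kind == "A":
            ann[minute] += 1
        elif kind == "W":
            wd[minute] += 1
    return [(ann[m], wd[m]) for m in range(n_minutes)]
```

Each returned pair would be one row of a feature matrix; the remaining features of Table 1 would be accumulated in the same per-minute fashion.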
Processing module 402, configured to standardize the abnormal data set;
Then, the Z-score standardization method is used to eliminate the influence of units and magnitude, so that different features can be compared and weighted. Because the BGP data set is obtained by sampling statistics, the present invention replaces the population mean with the sample mean and the population standard deviation with the sample standard deviation. The processing method is as follows:
$x^{*} = \dfrac{x - \bar{x}}{S}$
where $\bar{x}$ denotes the sample mean and $S$ denotes the sample standard deviation.
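As a minimal sketch of this standardization step, the function below standardizes one feature column with the sample mean and the sample standard deviation (divisor n - 1). The function name `z_score` and the handling of a constant column (all zeros are returned) are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_score(values):
    """Standardize one feature column using sample mean and sample std."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the divisor)
    if s == 0:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]
```

Applied column by column to the feature matrix, this gives every feature zero sample mean and unit sample standard deviation.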
First determining module 403, configured to select, from the feature set, the features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a weight that measures its classification capability;
Because of redundant features, constructing the classification model from high-dimensional features increases the computational cost, and noisy data also reduces the classification accuracy of the model. Therefore, after the feature set has been obtained through feature extraction in the data preprocessing stage, redundant and irrelevant features need to be removed to find the feature subset with the best class-discriminating ability, thereby reducing the dimension of the feature matrix and the computational complexity while improving the classification accuracy of the model.
The FMS feature selection algorithm uses Fisher linear discriminant analysis and Markov random field techniques to select the features that simultaneously maximize the between-class distance and minimize the within-class distance. With the FMS algorithm, feature selection is independent of the classification process: relevance is measured from intrinsic properties of the data, and the features are ranked by weight, a larger weight indicating a stronger ability to discriminate between classes. The method is efficient: it guarantees a global optimum, has low computational complexity, and is suitable for large-scale data. The training data are expressed as $\{(x_k, y_k)\}_{k=1}^{n}$, where $x_k \in \mathbb{R}^p$ denotes a p-dimensional feature vector and $y_k \in \{\omega_1, \ldots, \omega_g\}$ denotes the class label; $C_i$, $i = 1, \ldots, g$, denotes the i-th class, and each class $C_i$ has $n_i$ samples.
Definition 1: The within-class scatter matrix, the between-class scatter matrix and the total scatter matrix are denoted $S_w$, $S_b$ and $S_t$ respectively. In their definitions, $x_i^{(j)}$ denotes the i-th sample of class $C_j$, $\mu_j$ denotes the sample mean of class $C_j$, and $\mu$ denotes the overall mean.
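The defining formulas of Definition 1 are not reproduced in this text. A minimal sketch, assuming the standard Fisher forms with a $1/n$ normalization (the normalization and the symbols $x_i^{(j)}$, $\mu_j$, $\mu$ are assumptions): $S_w = \frac{1}{n}\sum_{j=1}^{g}\sum_{i=1}^{n_j}(x_i^{(j)}-\mu_j)(x_i^{(j)}-\mu_j)^T$, $S_b = \frac{1}{n}\sum_{j=1}^{g} n_j(\mu_j-\mu)(\mu_j-\mu)^T$ and $S_t = \frac{1}{n}\sum_{k=1}^{n}(x_k-\mu)(x_k-\mu)^T$, so that $S_t = S_w + S_b$. The helper names below are illustrative.

```python
def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, center, weight=1.0):
    """Sum over rows of weight * (x - center)(x - center)^T, as a p x p list."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [xi - ci for xi, ci in zip(x, center)]
        for a in range(p):
            for b in range(p):
                s[a][b] += weight * d[a] * d[b]
    return s


def scatter_matrices(X, y):
    """Return (S_w, S_b, S_t), each normalized by 1/n (assumed convention)."""
    n, p = len(X), len(X[0])
    mu = mean_vec(X)
    s_t = scatter(X, mu, 1.0 / n)
    s_w = [[0.0] * p for _ in range(p)]
    s_b = [[0.0] * p for _ in range(p)]
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        mu_j = mean_vec(rows)
        w_j = scatter(rows, mu_j, 1.0 / n)          # within-class part
        b_j = scatter([mu_j], mu, len(rows) / n)    # between-class part
        for a in range(p):
            for b in range(p):
                s_w[a][b] += w_j[a][b]
                s_b[a][b] += b_j[a][b]
    return s_w, s_b, s_t
```

On any labelled data set the three results satisfy `S_t = S_w + S_b` element by element, which is a convenient sanity check.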
Definition 2: The nonlinear mapping $\phi(\cdot)$ from the input space $\mathbb{R}^p$ to the kernel space $\mathbb{R}^D$ is defined as follows:
$\phi: \mathbb{R}^p \to \mathbb{R}^D$ (2)
Definition 3: The kernel function $k(\cdot,\cdot)$ satisfies:
$\langle \phi(x_1), \phi(x_2) \rangle = k(x_1, x_2)$ (3)
where $\langle \cdot, \cdot \rangle$ denotes the inner product in the kernel space.
Definition 4: $K$ and $K^{(i)}$ are square matrices of order $n$ and $n_i$ respectively, and satisfy:
$K_{kl} = k(x_k, x_l), \quad K^{(i)}_{uv} = k(x_u^{(i)}, x_v^{(i)})$ (4)
where $k, l \in \{1, \ldots, n\}$, $u, v \in \{1, \ldots, n_i\}$, and $i = 1, \ldots, g$.
Definition 5: In the kernel space, $S_w$, $S_b$ and $S_t$ have corresponding scatter matrices, whose traces are respectively as follows:
where Sum(·) denotes the sum of all elements of a matrix.
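Formula (5) is not reproduced in this text. Under an assumed $1/n$ normalization of the scatter matrices, the kernel-space traces can be written with kernel matrices only: $\mathrm{Tr}(\tilde{S}_w) = \frac{1}{n}\mathrm{Tr}(K) - \frac{1}{n}\sum_{i=1}^{g}\frac{1}{n_i}\mathrm{Sum}(K^{(i)})$, $\mathrm{Tr}(\tilde{S}_b) = \frac{1}{n}\sum_{i=1}^{g}\frac{1}{n_i}\mathrm{Sum}(K^{(i)}) - \frac{1}{n^2}\mathrm{Sum}(K)$ and $\mathrm{Tr}(\tilde{S}_t) = \frac{1}{n}\mathrm{Tr}(K) - \frac{1}{n^2}\mathrm{Sum}(K)$. The tilde notation and the function names in the sketch below are assumptions.

```python
def linear_kernel(a, b):
    """Plain inner product, the simplest choice of k(., .)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gram(rows, kernel):
    """Kernel matrix with entries kernel(rows[k], rows[l])."""
    return [[kernel(a, b) for b in rows] for a in rows]


def total(m):
    """Sum of all matrix elements, i.e. Sum(.) in the text."""
    return sum(sum(r) for r in m)


def kernel_traces(X, y, kernel=linear_kernel):
    """Traces of the kernel-space S_w, S_b, S_t (assumed 1/n normalization)."""
    n = len(X)
    K = gram(X, kernel)
    tr_k = sum(K[i][i] for i in range(n))
    class_term = 0.0
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        class_term += total(gram(rows, kernel)) / len(rows)
    tr_w = tr_k / n - class_term / n
    tr_b = class_term / n - total(K) / n ** 2
    tr_t = tr_k / n - total(K) / n ** 2
    return tr_w, tr_b, tr_t
```

With the linear kernel these values coincide with the traces of the input-space scatter matrices, and in every case `tr_t = tr_w + tr_b`.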
Definition 6: The feature selection vector is defined as:
$\alpha = [\alpha_1, \ldots, \alpha_p]^T \in \{0,1\}^p$ (6)
where $\alpha_k = 1$ indicates that the k-th feature is selected and $\alpha_k = 0$ indicates that the k-th feature is not selected.
The features selected from the feature vector $x$ are given by $x(\alpha) = x \odot \alpha$, where $\odot$ denotes the Hadamard product. The feature selection criterion of maximizing class separation can therefore be converted into the following unconstrained optimization problem.
where $\gamma$ is a free parameter. Analysis shows that $\gamma \le 0$ gives better classification results, and within a reasonable range of $\gamma$ the experimental performance of the classifier is insensitive to $\gamma$. To handle linearly inseparable, highly noisy data sets caused by redundant features, an $L_0$-norm term is added to the optimization, so that the feature selection criterion becomes:
where the regularization factor $\beta$ acts as a global threshold.
Consider the following linear kernel function:
Substituting it into formula (8) gives:
where $\theta_j$ is defined in formula (11). $\theta_j$ measures how important a feature is for discriminating between classes, i.e. it is the weight of the feature. The larger $\theta_j$ is, the more important the j-th feature.
For given $\beta$ and $\gamma$, and in combination with formula (10), the FMS feature selection algorithm obtains an optimal feature selection vector $\alpha^{*} \in \{0,1\}^p$ that satisfies:
If $\alpha^{*}_j = 1$, the j-th feature is selected; otherwise, if $\alpha^{*}_j = 0$, the j-th feature is not selected.
The pseudocode of the algorithm is given as Algorithm 1 in Table 2 below. The computational complexity of the algorithm is $O(n^2 p)$.
Table 2
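Formulas (7) to (12) and Algorithm 1 are not reproduced in this text. The sketch below is therefore an assumption rather than the patented procedure: it takes the criterion to be $\mathrm{Tr}(\tilde{S}_b) - \gamma\,\mathrm{Tr}(\tilde{S}_t) - \beta\|\alpha\|_0$ with $1/n$-normalized scatter matrices and the linear kernel $k(x_1, x_2) = \langle x_1 \odot \alpha, x_2 \odot \alpha \rangle$. Under those assumptions the objective separates into $\sum_j (\theta_j - \beta)\alpha_j$, so feature j is selected exactly when $\theta_j > \beta$. The function names are hypothetical.

```python
def fms_linear_weights(X, y, gamma):
    """Per-feature weights theta_j for a linear kernel (assumed form)."""
    n, p = len(X), len(X[0])
    classes = {}
    for x, label in zip(X, y):
        classes.setdefault(label, []).append(x)
    theta = []
    for j in range(p):
        col = [x[j] for x in X]
        # Sum over classes of (sum of feature j within the class)^2 / n_i
        within = sum(sum(r[j] for r in rows) ** 2 / len(rows)
                     for rows in classes.values())
        sq = sum(v * v for v in col)   # sum of squares of feature j
        tot = sum(col) ** 2            # squared sum of feature j
        theta.append(within / n - gamma * sq / n
                     + (gamma - 1.0) * tot / n ** 2)
    return theta


def fms_select(theta, beta):
    """alpha_j = 1 when theta_j exceeds the global threshold beta, else 0."""
    return [1 if t > beta else 0 for t in theta]
```

For a linear kernel this runs in time proportional to $np$; the $O(n^2 p)$ figure quoted above refers to the patent's own Algorithm 1.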
Optimization module 404, configured to optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the measure of the distance between two vectors in the Gaussian kernel function;
The traditional Gaussian kernel function measures the distance between two vectors with the Euclidean distance. However, the Euclidean distance amplifies, to some extent, the effect of elements with large errors on the computed distance, which affects the classification accuracy of the SVM. For this reason, the present invention uses the Manhattan distance as the measure of the distance between two vectors in the Gaussian kernel function. In the Manhattan distance, the error of each element has the same influence on the overall distance, which makes the values more comparable, and the amount of computation is lower.
If the contribution of each feature to classification is reflected in the distance computation, the classification method fits the characteristics of BGP data more closely and classification accuracy can be improved further. Accordingly, feature weights are introduced to measure the contribution of each feature to classification, and an improved Gaussian kernel function based on the Manhattan distance and the feature weights is proposed, denoted $k'(x, y)$, as shown in formula (13):
$k'(x, y) = \exp(-\gamma\,\delta(x, y))$ (13)
where $\delta(x, y)$ denotes the Manhattan distance between the two vectors, as shown in formula (14):
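Formula (14) is not reproduced in this text. A minimal sketch of the kernel in formula (13), assuming that $\delta$ is a feature-weighted Manhattan distance $\delta(x, y) = \sum_j w_j |x_j - y_j|$ with non-negative weights $w_j$ taken from the feature selection step; this weighted form, the parameter names and the function names are assumptions. Here `gamma` is the kernel parameter, not the free parameter of the feature selection criterion.

```python
import math


def weighted_manhattan(x, y, weights):
    """Assumed form of delta(x, y): sum of w_j * |x_j - y_j|."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def improved_gaussian_kernel(x, y, weights, gamma):
    """k'(x, y) = exp(-gamma * delta(x, y)), as in formula (13)."""
    return math.exp(-gamma * weighted_manhattan(x, y, weights))


def gram_matrix(rows, weights, gamma):
    """Kernel matrix of k' over a list of samples."""
    return [[improved_gaussian_kernel(a, b, weights, gamma) for b in rows]
            for a in rows]
```

A matrix produced by `gram_matrix` could be handed to any SVM implementation that accepts a precomputed kernel.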
Optimizing module 405, configured to perform parameter optimization on the support vector machine model based on grid search and cross-validation;
The performance of the SVM model depends on a pair of important parameters (C, gamma). C is called the penalty factor and expresses the tolerance for errors. The higher C is, the less the model tolerates errors, which easily leads to overfitting; conversely, the smaller C is, the more easily the model underfits. A value of C that is too large or too small reduces the generalization ability of the model, so an appropriate value of C is important for improving both classification accuracy and generalization. Gamma is a parameter of the polynomial, Gaussian and sigmoid kernels; it implicitly determines the distribution of the data after mapping to the new feature space. The larger gamma is, the fewer the support vectors; the smaller gamma is, the more the support vectors. The number of support vectors affects the speed of model training and prediction.
Given the imbalance between the two classes of samples in the training data set (as shown in Table 3), a traditional classification algorithm that takes overall classification accuracy as its objective pays too much attention to the majority class, so that classification performance on minority-class samples declines. For this reason, in the search for (C, gamma), the minority-class samples need to be given sufficient consideration, so that the two classes have the same "say" during training. Here the two classes of samples are each assigned a weight in inverse proportion to the ratio of their sample counts, which effectively addresses the data imbalance.
Table 3 Weights of the two sample classes
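The inverse-ratio weighting can be sketched as follows. The values of Table 3 are not reproduced in this text, so the normalization used here (weights summing to 1) and the function name are assumptions; only the inverse proportionality to class size comes from the description above.

```python
from collections import Counter


def inverse_ratio_weights(labels):
    """Class weights inversely proportional to class size, summing to 1."""
    counts = Counter(labels)
    inverse = {c: 1.0 / k for c, k in counts.items()}
    z = sum(inverse.values())
    return {c: v / z for c, v in inverse.items()}
```

For two classes with $n_1$ and $n_2$ samples this gives weights $n_2/(n_1+n_2)$ and $n_1/(n_1+n_2)$, so the minority class receives the larger weight.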
Selecting kernel parameters is difficult: there is currently no generally accepted universal method, and in practice the parameters can only be obtained by experimental comparison or from experience. Therefore, under the constraint of the imbalanced data set, grid search and cross-validation are combined here for parameter optimization (as shown in Fig. 5). The search range of (C, gamma) is divided into a grid by value, and each point of the grid represents one parameter combination. The range of the grid search satisfies formula (16), with a step of 1 in the exponent, i.e. $C \in \{2^{-5}, 2^{-4}, \ldots, 2^{5}\}$ and $\text{gamma} \in \{2^{-4}, 2^{-3}, \ldots, 2^{0}\}$.
At each grid point, cross-validation is performed as follows: the whole training set is divided into N subsets, of which N-1 are used as the training set and the remaining one as the test set. Each time the trained model is evaluated on the test set, a classification accuracy is obtained; after every one of the N subsets has served as the test set, the average of the N-fold cross-validation accuracies is taken. All points of the grid are traversed in this way, and the point with the highest average classification accuracy gives the best-performing (C, gamma). Note that 5-fold cross-validation is used here, and that because the search considers only a finite, discrete set of (C, gamma) values, the resulting (C, gamma) may be only a locally optimal solution.
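The search procedure can be sketched as below, using the grid stated above and 5-fold cross-validation by default. Training and scoring the SVM are delegated to a caller-supplied callback `fit_and_score(train_idx, test_idx, C, gamma)`, which is hypothetical, as are the function names; contiguous, unshuffled folds are a simplifying assumption.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(n_samples, fit_and_score, n_folds=5):
    """Return (best_C, best_gamma, best_mean_accuracy) over the grid.

    fit_and_score(train_idx, test_idx, C, gamma) -> accuracy is a
    caller-supplied (hypothetical) callback that trains and evaluates the SVM.
    """
    folds = k_fold_indices(n_samples, n_folds)
    best = None
    for c_exp in range(-5, 6):        # C in 2^-5 ... 2^5
        for g_exp in range(-4, 1):    # gamma in 2^-4 ... 2^0
            C, gamma = 2.0 ** c_exp, 2.0 ** g_exp
            scores = []
            for i, test_idx in enumerate(folds):
                train_idx = [j for k, f in enumerate(folds) if k != i for j in f]
                scores.append(fit_and_score(train_idx, test_idx, C, gamma))
            avg = sum(scores) / len(scores)
            if best is None or avg > best[2]:
                best = (C, gamma, avg)
    return best
```

The grid has 11 × 5 = 55 points, so with 5 folds the callback is invoked 275 times; class weights for the imbalanced data would be applied inside the callback.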
Second determining module 406, configured to determine the optimal feature subset.
The weight of each feature is obtained with the FMS feature selection algorithm; the features are sorted by weight in descending order and added to the model training set one by one in that order. Experiments show that, because the features added early have larger weights, the classification accuracy of the model increases steadily at first; as lower-weight features are added later, and because of noise and redundant data in the data set, the growth of classification accuracy slows and accuracy may even decline. Meanwhile, the training time of the SVM model keeps increasing with the number of features. Simply adding more features for model training is therefore inappropriate. For this reason, the concept of an optimal feature subset is proposed here: when, with features sorted by weight, the feature set used for model training is exactly the optimal feature subset, model performance (i.e. classification accuracy and training time) reaches its overall optimum. Further, a feature efficiency function is proposed here to measure the relationship between classification accuracy and training time, in order to determine the optimal feature subset so that model performance is optimal overall.
Definition 7: The function f(n) is the model classification accuracy as a function of the number of features n, n ∈ Z.
Definition 8: The function g(n) is the model training time as a function of the number of features n, n ∈ Z.
From the above definitions, f(n) and g(n) describe, respectively, the classification accuracy and the training time of the model when the training set contains a given number of features. To evaluate the best overall performance of the model, the feature efficiency function is defined, as shown in Definition 9.
Definition 9: h(n) is the feature efficiency function of the number of features n, with the following expression: $h(n) = f(n) / g(n)$
Intuitively, h(n) describes the classification accuracy per unit time: the larger the classification accuracy per unit time, i.e. the larger h(n), the better the overall performance of the model. This naturally leads to the concept of the optimum point, as shown in Definition 10.
Definition 10: The point $n_0$ at which h(n) attains its maximum is called the optimum point of the model.
The optimum point means that when $n = n_0$ the model obtains the maximum classification accuracy per unit time, and the overall performance of the model is optimal. Clearly, with features sorted by weight, the top $n_0$ features form the optimal feature subset.
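Definitions 7 to 10 can be sketched as follows, assuming the feature efficiency function is the ratio $h(n) = f(n)/g(n)$, which is how "classification accuracy per unit time" is read here. The function names and the dictionary-based inputs are illustrative; ties are resolved in favour of the smaller feature count, which is also an assumption.

```python
def optimal_feature_count(accuracy, train_time):
    """Return (n0, h) with h[n] = accuracy[n] / train_time[n], n0 = argmax h.

    accuracy and train_time map a feature count n to f(n) and g(n).
    """
    h = {n: accuracy[n] / train_time[n] for n in accuracy}
    n0 = max(sorted(h), key=h.get)  # first maximum over ascending n
    return n0, h


def top_features(weights, n0):
    """Indices of the n0 features with the largest weights, in descending order."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:n0]
```

In use, f(n) and g(n) would be measured by training the model on the top n features for each n, after which `top_features(weights, n0)` yields the optimal feature subset.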
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A BGP anomaly detection method, characterized in that the method comprises:
obtaining an abnormal data set;
performing data standardization on the abnormal data set;
selecting, from a feature set, features that simultaneously maximize the between-class distance and minimize the within-class distance, and obtaining for each feature a feature weight that measures its classification capability;
optimizing a Gaussian kernel function;
performing parameter optimization;
determining an optimal feature subset.
2. The method according to claim 1, characterized in that obtaining the abnormal data set comprises:
obtaining the abnormal data set from an autonomous system.
3. The method according to claim 1, characterized in that performing data standardization on the abnormal data set comprises:
replacing the population mean with the sample mean, and replacing the population standard deviation with the sample standard deviation.
4. The method according to claim 1, characterized in that optimizing the Gaussian kernel function comprises:
optimizing the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the measure of the distance between two vectors in the Gaussian kernel function.
5. The method according to claim 1, characterized in that the parameter optimization comprises:
performing parameter optimization on a support vector machine model based on grid search and cross-validation.
6. A BGP anomaly detection system, characterized by comprising:
an obtaining module, configured to obtain an abnormal data set;
a processing module, configured to perform data standardization on the abnormal data set;
a first determining module, configured to select, from a feature set, features that simultaneously maximize the between-class distance and minimize the within-class distance, and to obtain for each feature a feature weight that measures its classification capability;
an optimization module, configured to optimize a Gaussian kernel function;
an optimizing module, configured to perform parameter optimization;
a second determining module, configured to determine an optimal feature subset.
7. The system according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain the abnormal data set from an autonomous system.
8. The system according to claim 6, characterized in that the processing module is specifically configured to:
replace the population mean with the sample mean, and replace the population standard deviation with the sample standard deviation.
9. The system according to claim 6, characterized in that the optimization module is specifically configured to:
optimize the Gaussian kernel function using the Manhattan distance and the feature weights, wherein the Manhattan distance serves as the measure of the distance between two vectors in the Gaussian kernel function.
10. The system according to claim 6, characterized in that the optimizing module is specifically configured to:
perform parameter optimization on a support vector machine model based on grid search and cross-validation.
CN201811331848.7A 2018-11-09 2018-11-09 BGP anomaly detection method and system Active CN109257383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811331848.7A CN109257383B (en) 2018-11-09 2018-11-09 BGP anomaly detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811331848.7A CN109257383B (en) 2018-11-09 2018-11-09 BGP anomaly detection method and system

Publications (2)

Publication Number Publication Date
CN109257383A true CN109257383A (en) 2019-01-22
CN109257383B CN109257383B (en) 2021-09-21

Family

ID=65044099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811331848.7A Active CN109257383B (en) 2018-11-09 2018-11-09 BGP anomaly detection method and system

Country Status (1)

Country Link
CN (1) CN109257383B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111835791A (en) * 2020-07-30 2020-10-27 哈尔滨工业大学 BGP security event rapid detection system
CN112653675A (en) * 2020-12-12 2021-04-13 海南师范大学 Intelligent intrusion detection method and device based on deep learning
CN112702221A (en) * 2019-10-23 2021-04-23 中国电信股份有限公司 BGP abnormal route monitoring method and device
CN112905572A (en) * 2021-01-29 2021-06-04 铁道警察学院 Data anomaly information studying and judging model and method
CN114535142A (en) * 2022-01-11 2022-05-27 华南理工大学 Data-driven intelligent determination method for dimension qualification of injection molding product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101594361A (en) * 2009-06-02 2009-12-02 浙江大学 Network Intrusion Detection System based on shortcut calculation of support vector machine
CN102903075A (en) * 2012-10-15 2013-01-30 西安电子科技大学 Robust watermarking method based on image feature point global correction
CN105184316A (en) * 2015-08-28 2015-12-23 国网智能电网研究院 Support vector machine power grid business classification method based on feature weight learning
US20180262525A1 (en) * 2017-03-09 2018-09-13 General Electric Company Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨光、巫春玲等: "《基于RS和WSVM的网络入侵检测算法研究》", 《计算机仿真》 *
高巍、彭宇: "《基于马氏距离多核学习的高光谱图像分类》", 《仪器仪表学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112702221A (en) * 2019-10-23 2021-04-23 中国电信股份有限公司 BGP abnormal route monitoring method and device
CN111835791A (en) * 2020-07-30 2020-10-27 哈尔滨工业大学 BGP security event rapid detection system
CN111835791B (en) * 2020-07-30 2022-10-28 哈尔滨工业大学 BGP security event rapid detection system
CN112653675A (en) * 2020-12-12 2021-04-13 海南师范大学 Intelligent intrusion detection method and device based on deep learning
CN112905572A (en) * 2021-01-29 2021-06-04 铁道警察学院 Data anomaly information studying and judging model and method
CN114535142A (en) * 2022-01-11 2022-05-27 华南理工大学 Data-driven intelligent determination method for dimension qualification of injection molding product
CN114535142B (en) * 2022-01-11 2023-09-26 华南理工大学 Intelligent judgment method for size qualification of injection molding product based on data driving

Also Published As

Publication number Publication date
CN109257383B (en) 2021-09-21

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant