Disclosure of Invention
The invention aims to provide a method and a system for detecting traffic anomalies in a converged communication network, so as to improve the accuracy of traffic anomaly detection in such networks.
To achieve the above purpose, the invention provides the following solution:
A method for detecting traffic anomalies in a converged communication network comprises the following steps:
acquiring traffic data of the converged communication network; the traffic data includes network state data, protocol analysis data, service operation state data, and the corresponding network traffic state; the network state data includes throughput, packet traffic, delay jitter, and call traffic; the protocol analysis data includes protocol type, protocol packet length, connection duration, port information, and IP information; the service operation state data includes service fault information; the network traffic state is either abnormal or normal;
determining, according to the traffic data, a network traffic anomaly detection model by using a non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the network traffic anomaly detection model takes the network state data, protocol analysis data, and service operation state data as input and outputs the network traffic state;
and determining the network traffic state of a converged communication network to be detected by using the network traffic anomaly detection model according to the network state data, protocol analysis data, and service operation state data of that network.
Optionally, acquiring the traffic data of the converged communication network further includes:
normalizing the traffic data.
Optionally, determining the network traffic anomaly detection model according to the traffic data by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation specifically includes:
acquiring a machine learning algorithm; the machine learning algorithm includes a support vector machine, a decision tree, and a neural network;
optimizing the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the parameters include a penalty coefficient and a kernel width;
and determining the network traffic anomaly detection model according to the traffic data and the optimized parameters of the machine learning algorithm.
Optionally, optimizing the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation specifically includes:
initializing the parameters of the non-greedy teaching-learning-based optimization algorithm, namely the number of students, the maximum number of iterations, a non-greedy coefficient, the search space of the penalty coefficient, and the search space of the kernel width;
randomly initializing N student individuals and calculating the fitness of each student individual with a K-fold cross-validation strategy, where each student individual represents a set of parameters of the machine learning algorithm;
determining the best and worst student individuals according to the fitness, and performing first-stage learning based on them; the best student individual is the one with the maximum fitness, and the worst student individual is the one with the minimum fitness;
calculating the fitness of the student individuals after the first-stage learning;
updating the student individuals based on a non-greedy strategy;
performing second-stage learning according to the historical best student individuals and randomly selected student individuals;
calculating the fitness of the student individuals after the second-stage learning;
updating the student individuals a second time based on the non-greedy strategy;
judging whether the maximum number of iterations has been reached;
if so, taking the best student individual as the optimized parameters of the machine learning algorithm;
and if not, returning to the step of determining the best and worst student individuals according to the fitness and performing first-stage learning based on them.
A converged communication network traffic anomaly detection system, comprising:
a traffic data acquisition module, configured to acquire traffic data of the converged communication network; the traffic data includes network state data, protocol analysis data, service operation state data, and the corresponding network traffic state; the network state data includes throughput, packet traffic, delay jitter, and call traffic; the protocol analysis data includes protocol type, protocol packet length, connection duration, port information, and IP information; the service operation state data includes service fault information; the network traffic state is either abnormal or normal;
a network traffic anomaly detection model determination module, configured to determine, according to the traffic data, a network traffic anomaly detection model by using a non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the network traffic anomaly detection model takes the network state data, protocol analysis data, and service operation state data as input and outputs the network traffic state;
and a network traffic state determination module, configured to determine the network traffic state of a converged communication network to be detected by using the network traffic anomaly detection model according to the network state data, protocol analysis data, and service operation state data of that network.
Optionally, the system further includes:
a traffic data normalization module, configured to normalize the traffic data.
Optionally, the network traffic anomaly detection model determination module specifically includes:
a machine learning algorithm acquisition unit, configured to acquire a machine learning algorithm; the machine learning algorithm includes a support vector machine, a decision tree, and a neural network;
a machine learning algorithm parameter optimization unit, configured to optimize the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the parameters include a penalty coefficient and a kernel width;
and a network traffic anomaly detection model determination unit, configured to determine the network traffic anomaly detection model according to the traffic data and the optimized parameters of the machine learning algorithm.
Optionally, the machine learning algorithm parameter optimization unit specifically includes:
a non-greedy teaching-learning-based optimization algorithm parameter initialization subunit, configured to initialize the parameters of the non-greedy teaching-learning-based optimization algorithm, namely the number of students, the maximum number of iterations, a non-greedy coefficient, the search space of the penalty coefficient, and the search space of the kernel width;
a first fitness calculation subunit, configured to randomly initialize N student individuals and calculate the fitness of each student individual with a K-fold cross-validation strategy, where each student individual represents a set of parameters of the machine learning algorithm;
a first-stage learning subunit, configured to determine the best and worst student individuals according to the fitness and perform first-stage learning based on them; the best student individual is the one with the maximum fitness, and the worst student individual is the one with the minimum fitness;
a second fitness calculation subunit, configured to calculate the fitness of the student individuals after the first-stage learning;
a first updating subunit, configured to update the student individuals based on a non-greedy strategy;
a second-stage learning subunit, configured to perform second-stage learning according to the historical best student individuals and randomly selected student individuals;
a third fitness calculation subunit, configured to calculate the fitness of the student individuals after the second-stage learning;
a second updating subunit, configured to update the student individuals a second time based on the non-greedy strategy;
a judging subunit, configured to judge whether the maximum number of iterations has been reached;
an optimized parameter determination subunit, configured to take the best student individual as the optimized parameters of the machine learning algorithm if the maximum number of iterations has been reached;
and an iteration subunit, configured to return, if the maximum number of iterations has not been reached, to the step of determining the best and worst student individuals according to the fitness and performing first-stage learning based on them.
According to the specific embodiments provided herein, the invention achieves the following technical effects:
In the method and system for detecting traffic anomalies in a converged communication network provided by the invention, the key parameters that affect the performance of machine-learning-based anomaly detection are optimized and selected with a non-greedy teaching-learning-based optimization algorithm. The non-greedy strategy reduces the likelihood that the parameter search falls into a local optimum, so that better parameters can be found for the machine learning algorithm; this improves the performance of the algorithm and thus the accuracy of traffic anomaly detection in the converged communication network. The method is simple, efficient, and easy to implement. It can provide accurate and reasonable predictions of abnormal traffic in converged communication networks, help guide related decision-making, and promote the intelligent and scientific development of converged communication networks, and it therefore has significant application value.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are merely some rather than all of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the invention.
The invention aims to provide a method and a system for detecting traffic anomalies in a converged communication network, so as to improve the accuracy of traffic anomaly detection in such networks.
To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the method for detecting traffic anomalies in a converged communication network provided by the invention. As shown in Fig. 1, the method includes:
S101: acquire traffic data of the converged communication network; the traffic data includes network state data, protocol analysis data, service operation state data, and the corresponding network traffic state; the network state data includes throughput, packet traffic, delay jitter, and call traffic; the protocol analysis data includes protocol type, protocol packet length, connection duration, port information, and IP information; the service operation state data includes service fault information; the network traffic state is either abnormal or normal. For the network traffic state label, "1" indicates abnormal traffic and "-1" indicates normal traffic.
The method then further includes:
normalizing the traffic data.
The traffic data is normalized to the range [0, 1] with the following formula:
$$P_i' = \frac{P_i - P_{\min}}{P_{\max} - P_{\min}}$$
where $P_i$ is the original value of the $i$-th data item on a given feature, $P_{\min}$ is the minimum value of all data on that feature, $P_{\max}$ is the maximum value of all data on that feature, and $P_i'$ is the normalized value of the $i$-th data item on that feature.
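The min-max formula can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes all features (including protocol type, port, and IP information) have already been encoded as numbers, and the function name `min_max_normalize` and the sample rows are hypothetical. A constant feature, for which $P_{\max} = P_{\min}$, would make the formula divide by zero, so the sketch maps it to 0.0 by assumption.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every feature (column) into [0, 1] with
    P_i' = (P_i - P_min) / (P_max - P_min)."""
    if not rows:
        return []
    n_features = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_features)]
    maxs = [max(r[j] for r in rows) for j in range(n_features)]
    normalized = []
    for r in rows:
        scaled = []
        for j, value in enumerate(r):
            span = maxs[j] - mins[j]
            # Assumption: a constant feature (P_max == P_min) is mapped to 0.0
            scaled.append((value - mins[j]) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical samples: [throughput, packet count, delay jitter]
data = [[100.0, 20.0, 0.5], [300.0, 40.0, 0.5], [200.0, 30.0, 0.5]]
print(min_max_normalize(data))  # [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
```

In a deployment, one would presumably compute $P_{\min}$ and $P_{\max}$ on the training data and reuse them for the network data to be detected; the text does not specify this, so treat it as an assumption.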
S102: according to the traffic data, determine a network traffic anomaly detection model by using a non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the model takes the network state data, protocol analysis data, and service operation state data as input and outputs the network traffic state.
The network traffic anomaly detection model is
$$f(x) = \operatorname{sgn}\left( \sum_{i \in \mathrm{SV}} \alpha_i y_i \, k(x_i, x) + b \right)$$
where $\operatorname{sgn}(\cdot)$ is the sign function, the sum runs over the support vectors, $x$ is the converged communication network data to be detected, $x_i$ is the training sample corresponding to the $i$-th support vector, $\alpha_i$ is the Lagrange multiplier of training sample $x_i$, $y_i$ is the label of training sample $x_i$, $b$ is the threshold, and $k(\cdot)$ is the radial basis kernel function
$$k(x_i, x) = \exp\left( -\frac{\lVert x - x_i \rVert^2}{2\sigma^2} \right)$$
where $\sigma$ is the kernel width.
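To illustrate how the decision function classifies a sample, the following Python sketch evaluates $f(x)$ with the Gaussian form of the radial basis kernel, $\exp(-\lVert x - x_i \rVert^2 / (2\sigma^2))$. The function names and the support vectors, multipliers $\alpha_i$, labels, threshold $b$, and kernel width $\sigma$ in the example are made up; in practice they come from training the support vector machine with the optimized penalty coefficient and kernel width. Mapping a score of exactly zero to "-1" is also an assumption.

```python
import math
from typing import Sequence


def rbf_kernel(x_i: Sequence[float], x: Sequence[float], sigma: float) -> float:
    """Radial basis kernel k(x_i, x) = exp(-||x - x_i||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decide(x, support_vectors, alphas, labels, b, sigma) -> int:
    """f(x) = sgn(sum_i alpha_i * y_i * k(x_i, x) + b).

    Returns 1 (abnormal traffic) or -1 (normal traffic)."""
    score = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    ) + b
    # Assumption: a score of exactly 0 is reported as normal (-1)
    return 1 if score > 0 else -1


# Hypothetical trained model with two support vectors
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
print(svm_decide([0.1, 0.1], svs, alphas, labels, b=0.0, sigma=1.0))  # 1
print(svm_decide([0.9, 0.9], svs, alphas, labels, b=0.0, sigma=1.0))  # -1
```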
As shown in Fig. 2, S102 specifically includes:
acquiring a machine learning algorithm; the machine learning algorithm includes support vector machines, decision trees, and neural networks.
Optimizing the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the parameters include, but are not limited to, the penalty coefficient and kernel width of the support vector machine.
Determining the network traffic anomaly detection model according to the traffic data and the optimized parameters of the machine learning algorithm.
Optimizing the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation specifically includes:
initializing the parameters of the non-greedy teaching-learning-based optimization algorithm, namely the number of students, the maximum number of iterations, a non-greedy coefficient, the search space of the penalty coefficient, and the search space of the kernel width.
Randomly initializing N student individuals and calculating the fitness of each student individual with a K-fold cross-validation strategy, where each student individual represents a set of parameters of the machine learning algorithm.
Determining the best and worst student individuals according to the fitness, and performing first-stage learning based on them; the best student individual is the one with the maximum fitness, and the worst student individual is the one with the minimum fitness.
Calculating the fitness of the student individuals after the first-stage learning.
Updating the student individuals based on a non-greedy strategy.
Performing second-stage learning according to the historical best student individuals and randomly selected student individuals.
Calculating the fitness of the student individuals after the second-stage learning.
Updating the student individuals a second time based on the non-greedy strategy.
Judging whether the maximum number of iterations has been reached.
If so, taking the best student individual as the optimized parameters of the machine learning algorithm.
If not, returning to the step of determining the best and worst student individuals according to the fitness and performing first-stage learning based on them.
As a specific embodiment, parameter optimization with the non-greedy teaching-learning-based optimization algorithm proceeds as follows:
Step 1: initialize the non-greedy teaching-learning-based optimization algorithm, including the number of students $N$, the maximum number of iterations $T$, the non-greedy coefficient, and the search space of the parameters of the anomaly detection algorithm $M$;
Step 2: initialize the student individuals by randomly generating $N$ student individuals $S_i = [S_{i,1}, \ldots, S_{i,D}]$ $(i = 1, 2, \ldots, N)$, where $S_{i,1}$ and $S_{i,D}$ denote the 1st and $D$-th parameters of the anomaly detection algorithm represented by the student individual; for discrete-valued parameters, the continuous values are mapped to discrete values by rounding.
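Step 2 can be sketched as below, assuming each parameter is sampled uniformly within its search bounds and that discrete parameters are rounded with Python's built-in `round`. The helper `init_students` and the two-dimensional search space in the example (one continuous parameter and one hypothetical integer-valued parameter) are illustrative only.

```python
import random
from typing import List, Tuple


def init_students(n: int, bounds: List[Tuple[float, float]],
                  discrete: List[bool], rng: random.Random) -> List[List[float]]:
    """Randomly generate n student individuals S_i = [S_i1, ..., S_iD]."""
    students = []
    for _ in range(n):
        student = []
        for (low, high), is_discrete in zip(bounds, discrete):
            value = rng.uniform(low, high)  # continuous sample in [low, high]
            # Discrete parameter: map the continuous value to a discrete one by
            # rounding (assumes integer bounds so the result stays in range)
            student.append(float(round(value)) if is_discrete else value)
        students.append(student)
    return students


# Hypothetical search space: one continuous and one integer-valued parameter
bounds = [(0.1, 100.0), (1, 20)]
discrete = [False, True]
students = init_students(5, bounds, discrete, random.Random(42))
for s in students:
    print(s)
```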
Step 3: calculate the fitness $f_i$ of each student $i$. Using the anomaly detection algorithm parameter values represented by student $i$, an internal K-fold cross-validation strategy trains anomaly detection models and computes the mean ($F1_{avg}$) and variance ($F1_{sd}$) of their F1-scores; the fitness $f_i$ of student $i$ is then calculated from $F1_{avg}$ and $F1_{sd}$.
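The following Python sketch shows one way to obtain $F1_{avg}$ and $F1_{sd}$ with K-fold cross-validation. Several assumptions are made: folds are contiguous and unshuffled, the F1-score treats label "1" (abnormal traffic) as the positive class, and `train_and_predict` is a hypothetical callback that trains a model on the training folds and predicts the held-out fold. Because the fitness formula itself is not reproduced in this text, `fitness` below is a hypothetical placeholder that rewards a high mean and penalizes spread; it is not the patent's formula.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds (requires k <= n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_f1_stats(X, y, k: int, train_and_predict: Callable) -> Tuple[float, float]:
    """Return (F1_avg, F1_sd); F1_sd is the variance, as described in Step 3."""
    scores = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        y_pred = train_and_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                   [X[i] for i in test_idx])
        scores.append(f1_score([y[i] for i in test_idx], y_pred))
    return statistics.mean(scores), statistics.pvariance(scores)


def fitness(f1_avg: float, f1_sd: float, lam: float = 1.0) -> float:
    # HYPOTHETICAL combination: higher mean F1 and lower spread give higher fitness
    return f1_avg - lam * f1_sd


def threshold_model(X_train, y_train, X_test):
    """Hypothetical stand-in for training an SVM: flags x[0] > 0.5 as abnormal."""
    return [1 if x[0] > 0.5 else -1 for x in X_test]


X = [[0.1], [0.9], [0.2], [0.8]]
y = [-1, 1, -1, 1]
f1_avg, f1_sd = cv_f1_stats(X, y, 2, threshold_model)
print(f1_avg, f1_sd, fitness(f1_avg, f1_sd))  # 1.0 0.0 1.0
```

For a real traffic dataset, where abnormal samples are usually rare, stratified and shuffled folds would likely be preferable to the contiguous folds used here.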
Step 4: perform first-stage learning and update the student individuals with the non-greedy strategy. In the first-stage update formula, rand is a random number in [0, 1], $S_i$ is the original individual (the updated value is the new individual), $S_{kbest}$ is the best individual among the current student individuals, $S_{kworst}$ is the worst individual among the current student individuals, $M_i$ is the mean of all current individuals, $T_{F1}$ is the guiding factor applied with the best and worst individuals, and $T_{F2}$ is the guiding factor of the worst individual; $T_{F1}$ and $T_{F2}$ are generated with random(a, b), a random function that draws a value from [a, b]. The non-greedy update then decides whether each student keeps its original or its new individual, using random_select(·), which chooses at random, and greedy_select(·), which always chooses the better individual. The updated student individuals enter the next stage of learning;
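Because the first-stage update formula and the non-greedy selection formula are not reproduced in this text, the sketch below substitutes well-known stand-ins and should be read only as an assumption-laden illustration. The candidate update is the standard teaching-learning-based optimization teacher phase, $S_{new} = S_i + rand \cdot (S_{kbest} - T_{F1} \cdot M)$; the patent's formula additionally involves $S_{kworst}$ and $T_{F2}$, whose exact form is not given here, so that term is omitted. The non-greedy rule assumes the non-greedy coefficient acts as a probability: with that probability the kept individual is chosen at random between the old and new ones (random_select); otherwise the fitter one is kept (greedy_select). The range [1, 2] for $T_{F1}$, clipping to the search bounds, and all function names are also assumptions.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def clip(v: Vector, bounds: List[Tuple[float, float]]) -> Vector:
    """Keep every parameter inside its search interval (assumed handling)."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]


def non_greedy_select(old: Vector, new: Vector, fit_old: float, fit_new: float,
                      ng_coef: float, rng: random.Random) -> Tuple[Vector, float]:
    """ASSUMED rule: with probability ng_coef, random_select keeps old or new at
    random; otherwise greedy_select keeps the fitter individual."""
    if rng.random() < ng_coef:
        return (new, fit_new) if rng.random() < 0.5 else (old, fit_old)
    return (new, fit_new) if fit_new >= fit_old else (old, fit_old)


def first_stage(students: List[Vector], fits: List[float],
                fitness_fn: Callable[[Vector], float],
                bounds: List[Tuple[float, float]], ng_coef: float,
                rng: random.Random) -> Tuple[List[Vector], List[float]]:
    n, dim = len(students), len(students[0])
    best = students[max(range(n), key=lambda i: fits[i])]  # S_kbest
    mean = [sum(s[j] for s in students) / n for j in range(dim)]  # M
    new_students, new_fits = [], []
    for s, f in zip(students, fits):
        t_f1 = rng.uniform(1.0, 2.0)  # ASSUMED range for random(a, b)
        # Standard TLBO teacher phase; the S_kworst / T_F2 term is omitted
        cand = clip([s[j] + rng.random() * (best[j] - t_f1 * mean[j])
                     for j in range(dim)], bounds)
        chosen, chosen_fit = non_greedy_select(s, cand, f, fitness_fn(cand), ng_coef, rng)
        new_students.append(chosen)
        new_fits.append(chosen_fit)
    return new_students, new_fits


def toy_fitness(v: Vector) -> float:
    return -(v[0] - 3.0) ** 2  # toy objective with its maximum at 3.0


rng = random.Random(0)
bounds = [(0.0, 10.0)]
students = [[1.0], [5.0], [8.0]]
fits = [toy_fitness(s) for s in students]
new_students, new_fits = first_stage(students, fits, toy_fitness, bounds, 0.0, rng)
print(new_students, new_fits)
```

With `ng_coef = 0.0` the rule reduces to the purely greedy selection of classical TLBO; a positive value occasionally keeps a worse individual, which is the mechanism the text credits with avoiding local optima.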
Step 5: perform second-stage learning and update the student individuals with the non-greedy strategy. In the second-stage update formula, $S_{k,j}$ is a randomly selected individual different from $S_{i,j}$, $S_{gbest,j}$ is the historical best of individual $S_{i,j}$, and rand is a random number in [0, 1]. The student individuals are then updated again with the non-greedy strategy, and the updated individuals enter the next learning process;
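The second-stage learning can be sketched similarly, again with a hypothetical update because the formula is not reproduced in this text. The sketch combines the standard teaching-learning-based optimization learner phase (move toward a randomly chosen peer $S_k$ if it is fitter, away from it otherwise) with an assumed pull toward each student's historical best $S_{gbest}$, followed by the same assumed non-greedy selection as in the first stage. Because the non-greedy rule may accept a worse individual, the sketch tracks the historical best separately, which is what $S_{gbest}$ refers to. All function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def clip(v: Vector, bounds: List[Tuple[float, float]]) -> Vector:
    """Keep every parameter inside its search interval (assumed handling)."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]


def non_greedy_select(old, new, fit_old, fit_new, ng_coef, rng):
    """ASSUMED rule: random choice with probability ng_coef, else keep the fitter one."""
    if rng.random() < ng_coef:
        return (new, fit_new) if rng.random() < 0.5 else (old, fit_old)
    return (new, fit_new) if fit_new >= fit_old else (old, fit_old)


def second_stage(students: List[Vector], fits: List[float],
                 pbest: List[Vector], pbest_fits: List[float],
                 fitness_fn: Callable[[Vector], float],
                 bounds: List[Tuple[float, float]], ng_coef: float,
                 rng: random.Random):
    n = len(students)  # requires at least two students
    new_students, new_fits = [], []
    new_pbest, new_pbest_fits = [list(p) for p in pbest], list(pbest_fits)
    for i in range(n):
        k = rng.choice([j for j in range(n) if j != i])  # random peer S_k, k != i
        s, peer = students[i], students[k]
        # Standard learner phase: toward a fitter peer, away from a less fit one
        sign = 1.0 if fits[k] > fits[i] else -1.0
        cand = clip([s[j] + rng.random() * sign * (peer[j] - s[j])
                     + rng.random() * (pbest[i][j] - s[j])  # HYPOTHETICAL pull toward S_gbest
                     for j in range(len(s))], bounds)
        chosen, chosen_fit = non_greedy_select(s, cand, fits[i], fitness_fn(cand), ng_coef, rng)
        new_students.append(chosen)
        new_fits.append(chosen_fit)
        if chosen_fit > new_pbest_fits[i]:  # update the historical best S_gbest
            new_pbest[i], new_pbest_fits[i] = list(chosen), chosen_fit
    return new_students, new_fits, new_pbest, new_pbest_fits


def toy_fitness(v: Vector) -> float:
    return -(v[0] - 3.0) ** 2  # toy objective with its maximum at 3.0


rng = random.Random(1)
bounds = [(0.0, 10.0)]
students = [[1.0], [5.0], [8.0]]
fits = [toy_fitness(s) for s in students]
new_students, new_fits, pbest, pbest_fits = second_stage(
    students, fits, [list(s) for s in students], list(fits),
    toy_fitness, bounds, 0.0, rng)
print(new_students, new_fits)
```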
Step 6: judge whether the maximum number of iterations $T$ has been reached; if so, go to Step 7; otherwise, go to Step 4;
Step 7: take the student individual with the maximum fitness as the optimal parameter values of the anomaly detection algorithm.
S103: determine the network traffic state of the converged communication network to be detected by using the network traffic anomaly detection model according to the network state data, protocol analysis data, and service operation state data of that network.
The key innovations and effects of the invention are as follows:
1. Based on the non-greedy teaching-learning-based optimization algorithm, the key parameters that affect the performance of the converged communication network traffic anomaly detection algorithm are optimized automatically, yielding an efficient and accurate prediction model;
2. For the selection process of the teaching-learning-based optimization algorithm, a non-greedy selection strategy is proposed to avoid becoming trapped in local optima;
3. For the learning process of the teaching-learning-based optimization algorithm, information from the worst student individuals and from each student's historical best is incorporated to improve learning efficiency;
4. To address the prediction accuracy of the anomaly detection algorithm, an evaluation method that considers both prediction accuracy and stability is proposed.
Fig. 3 is a schematic structural diagram of the converged communication network traffic anomaly detection system provided by the invention. As shown in Fig. 3, the system includes:
a traffic data acquisition module 301, configured to acquire traffic data of the converged communication network; the traffic data includes network state data, protocol analysis data, service operation state data, and the corresponding network traffic state; the network state data includes throughput, packet traffic, delay jitter, and call traffic; the protocol analysis data includes protocol type, protocol packet length, connection duration, port information, and IP information; the service operation state data includes service fault information; the network traffic state is either abnormal or normal.
A network traffic anomaly detection model determination module 302, configured to determine, according to the traffic data, a network traffic anomaly detection model by using a non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the network traffic anomaly detection model takes the network state data, protocol analysis data, and service operation state data as input and outputs the network traffic state.
A network traffic state determination module 303, configured to determine the network traffic state of a converged communication network to be detected by using the network traffic anomaly detection model according to the network state data, protocol analysis data, and service operation state data of that network.
The converged communication network traffic anomaly detection system provided by the invention further includes:
a traffic data normalization module, configured to normalize the traffic data.
The network traffic anomaly detection model determination module 302 specifically includes:
a machine learning algorithm acquisition unit, configured to acquire a machine learning algorithm; the machine learning algorithm includes support vector machines, decision trees, and neural networks.
A machine learning algorithm parameter optimization unit, configured to optimize the parameters of the machine learning algorithm by using the non-greedy teaching-learning-based optimization algorithm with K-fold cross-validation; the parameters include, but are not limited to, the penalty coefficient and kernel width of the support vector machine.
A network traffic anomaly detection model determination unit, configured to determine the network traffic anomaly detection model according to the traffic data and the optimized parameters of the machine learning algorithm.
The machine learning algorithm parameter optimization unit specifically includes:
a non-greedy teaching-learning-based optimization algorithm parameter initialization subunit, configured to initialize the parameters of the non-greedy teaching-learning-based optimization algorithm, namely the number of students, the maximum number of iterations, a non-greedy coefficient, the search space of the penalty coefficient, and the search space of the kernel width.
A first fitness calculation subunit, configured to randomly initialize N student individuals and calculate the fitness of each student individual with a K-fold cross-validation strategy, where each student individual represents a set of parameters of the machine learning algorithm.
A first-stage learning subunit, configured to determine the best and worst student individuals according to the fitness and perform first-stage learning based on them; the best student individual is the one with the maximum fitness, and the worst student individual is the one with the minimum fitness.
A second fitness calculation subunit, configured to calculate the fitness of the student individuals after the first-stage learning.
A first updating subunit, configured to update the student individuals based on a non-greedy strategy.
A second-stage learning subunit, configured to perform second-stage learning according to the historical best student individuals and randomly selected student individuals.
A third fitness calculation subunit, configured to calculate the fitness of the student individuals after the second-stage learning.
A second updating subunit, configured to update the student individuals a second time based on the non-greedy strategy.
A judging subunit, configured to judge whether the maximum number of iterations has been reached.
An optimized parameter determination subunit, configured to take the best student individual as the optimized parameters of the machine learning algorithm if the maximum number of iterations has been reached.
An iteration subunit, configured to return, if the maximum number of iterations has not been reached, to the step of determining the best and worst student individuals according to the fitness and performing first-stage learning based on them.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another. Because the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principles and embodiments of the invention; they are provided only to help understand the method and core idea of the invention. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.