CN113612660B

CN113612660B - LSTM network traffic prediction method based on population self-adaptive differential evolution

Info

Publication number: CN113612660B
Application number: CN202110883692.9A
Authority: CN
Inventors: 田军; 徐政五; 廖强; 甘露
Original assignee: Research Institute Of Yibin University Of Electronic Science And Technology; University of Electronic Science and Technology of China
Current assignee: Research Institute Of Yibin University Of Electronic Science And Technology; University of Electronic Science and Technology of China
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2023-12-08
Anticipated expiration: 2041-08-03
Also published as: CN113612660A

Abstract

The invention belongs to the field of network traffic prediction, and in particular relates to an LSTM network traffic prediction algorithm based on population self-adaptive differential evolution, which comprises the following steps: constructing a network traffic data set, constructing a population self-adaptive differential evolution algorithm, optimizing the parameters of a two-layer LSTM network, and iteratively searching for better values. The invention selects a long short-term memory (LSTM) network as the basic network model and, based on the population self-adaptive differential evolution algorithm, combines the ideas of adding chaotic individuals to the population and deleting bad individuals, searching through the variation of the individuals in the population matrix for parameters that give the LSTM network higher prediction accuracy.

Description

LSTM network traffic prediction method based on population self-adaptive differential evolution

Technical Field

The invention relates to a network traffic prediction technology, in particular to an LSTM network traffic prediction method based on population self-adaptive differential evolution.

Background

With the development of the data age, there is growing interest in whether future data trends can be mined from large amounts of sequence data, and time series prediction can provide decision support for related tasks. In the field of time series prediction, the recurrent neural network (RNN) can effectively extract the time-dependent features of a time series and thus learn from it, but the problem of vanishing or exploding gradients limits its ability to learn long-term dependencies. Because Long Short-Term Memory (LSTM) can solve the gradient problem of the traditional recurrent neural network, LSTM sequence prediction is widely used for time series prediction, especially for long time series. Combining differential evolution theory with LSTM prediction can better achieve high-precision network traffic prediction.

However, a neural network has many hyperparameters that must be set in advance. In practice, empirically chosen candidate values are usually enumerated and verified by grid search, but this approach is easily constrained by experience and does not readily find better parameters. Hyperparameter optimization based on a differential evolution algorithm, by contrast, can start from experience and also search widely for better parameters on the basis of biological theory. Several algorithms already exist for optimizing neural network parameters, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Artificial Fish-swarm Algorithm (AF), Teaching-Learning-Based Optimization (TLBO), and the Differential Evolution algorithm (DE). On the whole, these evolutionary algorithms all suffer to some degree from local convergence and long convergence time.

Disclosure of Invention

Purpose of the invention: in current LSTM network traffic prediction, the local convergence and convergence time shortcomings of traditional evolutionary algorithms lead to parameters that give poor network accuracy. To solve this problem, the invention provides an LSTM network traffic prediction method based on population self-adaptive differential evolution which, by adaptively increasing and decreasing the individuals of the population, both preserves the global character of the population and converges on better individuals.

The technical scheme is as follows: the invention relates to an LSTM network flow prediction method based on population self-adaptive differential evolution, which comprises the following steps:

S1, constructing a data set: first, the network traffic data are normalized and an input sequence length T and a delay time τ are set. Input data and output labels are then constructed: the input data are the network traffic values at the previous T moments, and the network traffic at moment T+1 is the output label. Finally, the samples are randomly shuffled; the first 75% are selected as the training set and the remaining data are used as the test set.

Here Data denotes the structure of the input data set: each row is one input, and the value of label in the corresponding row is taken as the output label value.
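The data set construction in S1 can be sketched as a sliding window over the normalized series. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_dataset`, the `seed` argument and the toy series are hypothetical, and the delay time τ is taken to be one sampling step because its exact role is not recoverable from the text.

```python
import random

def build_dataset(series, T, train_fraction=0.75, seed=0):
    """Pair each window of T consecutive values with the value that follows it."""
    # Assumption: the delay time tau is one sampling step.
    samples = [(series[i:i + T], series[i + T]) for i in range(len(series) - T)]
    rng = random.Random(seed)
    rng.shuffle(samples)                       # random scrambling of the sample order
    split = int(len(samples) * train_fraction)
    return samples[:split], samples[split:]    # first 75% for training, rest for testing

# Toy series standing in for normalized traffic values.
series = [0.1 * k for k in range(20)]
train, test = build_dataset(series, T=4)
```

Each element of `train` and `test` is an (input window, label) pair, corresponding to one row of Data and the matching entry of label.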

S2, an LSTM network is used as the network traffic prediction model, and the Adam optimization algorithm is used to train it. Training data with constructed labels are then imported, the weights of the network model are initialized, and the root mean square error (RMSE) is used as the fitness evaluation value. The number of hidden neurons in the first LSTM layer, the learning rate, the dropout rate and the number of hidden neurons in the second LSTM layer are then optimized.

S21, initializing a population matrix X of NP rows and 4 columns, where NP is the number of individuals in the population. The dimensions (columns) of the population matrix represent the four parameter values of the LSTM, and each parameter has a selection range, so each dimension of an individual is constrained. The initialization matrix X is shown below, where N is set to 4; if there are more parameters, N can be set to the corresponding number.

S22, each dimension of an individual is assigned to its corresponding parameter: the number of hidden neurons in the first LSTM layer, the Adam learning rate, the dropout rate, and the number of hidden neurons in the second LSTM layer. After these values are substituted into the model, the fitness is evaluated. The fitness evaluation is the root mean square error (RMSE) of the LSTM model on the test set:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(y(t) - y'(t)\bigr)^{2}}$$

where y(t) represents the t-th true value, y'(t) represents the t-th predicted value, and n is the test set label length.

S23, carrying out the mutation operation on the population: mutation evolves the current individuals to generate more individuals. The mutation vector is generated by combining an existing individual with the scaled difference of two other, different individuals; the specific expression is:

$$X_{r_1} = X_{r_2} + F\,(X_{r_2} - X_{r_3})$$

where $r_1$, $r_2$ and $r_3$ are mutually distinct random integers, F is the mutation factor, and $X_{r_1}$, $X_{r_2}$ and $X_{r_3}$ are the individuals in rows $r_1$, $r_2$ and $r_3$ of the population matrix.
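The mutation step can be illustrated with a short sketch that follows the expression as printed, i.e. the mutant for the current row is built from rows $r_2$ and $r_3$. Several details are assumptions rather than part of the source: the function name `mutate`, clipping the result back into each parameter range (the text only says each dimension is constrained), and leaving the neuron counts unrounded.

```python
import random

def mutate(population, current_index, F, bounds, rng):
    """Build a mutant for row `current_index` as X_r2 + F * (X_r2 - X_r3)."""
    others = [i for i in range(len(population)) if i != current_index]
    r2, r3 = rng.sample(others, 2)   # two distinct rows, both different from the current row
    mutant = []
    for d, (lo, hi) in enumerate(bounds):
        value = population[r2][d] + F * (population[r2][d] - population[r3][d])
        mutant.append(min(max(value, lo), hi))   # assumption: clip into the allowed range
    return mutant

rng = random.Random(1)
bounds = [(10, 50), (0.0001, 0.2), (0.2, 0.5), (10, 50)]
population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(5)]
mutant = mutate(population, current_index=0, F=0.5, bounds=bounds, rng=rng)
```

In a full implementation the two neuron-count dimensions would presumably be rounded to integers before building the LSTM; that step is omitted here.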

S24, performing the crossover and selection operations on the individuals: crossover randomly mixes the components of the mutation vector with those of the current (pre-mutation) vector. The vector after crossover is:

where $j_{rand}$ is a random integer generated by a random function and CR is the crossover probability. The expression involves three vectors: the next-generation individual vector obtained by mutation in step S23; the current individual $X_{r_1}$ in the population matrix X; and the vector after crossover, which is formed by randomly drawing dimensions from the different individuals to make up a new crossed individual. After crossover, the crossed individual is compared with the old individual and the one with the lower fitness value is kept, where the fitness is the RMSE returned after the values of the individual are substituted into the LSTM network.
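Because the crossover expression itself did not survive in the text, the sketch below assumes the usual binomial crossover of differential evolution: each dimension is taken from the mutant with probability CR, and dimension $j_{rand}$ is always taken from the mutant. The names `crossover`, `select` and `toy_fitness` are hypothetical; `toy_fitness` merely stands in for the RMSE returned by the LSTM.

```python
import random

def crossover(current, mutant, CR, rng):
    """Binomial crossover: mix dimensions of the mutant into the current individual."""
    j_rand = rng.randrange(len(current))   # this dimension always comes from the mutant
    return [mutant[j] if (rng.random() < CR or j == j_rand) else current[j]
            for j in range(len(current))]

def select(current, trial, fitness):
    """Greedy selection: keep whichever individual has the lower fitness (error)."""
    return trial if fitness(trial) < fitness(current) else current

rng = random.Random(0)
current = [30, 0.01, 0.3, 20]
mutant = [42, 0.05, 0.4, 12]
trial = crossover(current, mutant, CR=0.9, rng=rng)

def toy_fitness(individual):
    # Hypothetical stand-in for training the LSTM and returning its RMSE.
    return abs(individual[0] - 40)

survivor = select(current, trial, toy_fitness)
```

Forcing one dimension from the mutant guarantees that the trial individual differs from the current one, so every fitness evaluation tests a new parameter combination.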

S3, after the above operations have been carried out on all individuals of the population matrix, a brand new population matrix is generated. The fitness of each individual and the minimum fitness of each generation are calculated and recorded, and the variance of the fitness values of the individuals in each generation is then computed. The variance is calculated as:

where f is a normalization factor, $f_i$ is the fitness value of the i-th individual, the average term is the mean fitness of the individuals in population matrix X, and $\sigma^2$ is the variance.

S31, deleting harmful individuals: provided that the number of individuals in the population is within a user-defined range, a large fitness variance in the current population indicates that there are many harmful individuals. In that case the individual with the highest fitness value, i.e. the worst individual, is selected and deleted. If step S31 is performed, the diversity of the population is considered good, no chaotic individuals need to be added, and step S33 is carried out directly.

S32, adding chaotic individuals: provided that the number of individuals in the population is within the user-defined range and step S31 was not performed, if the best individual of the population changes little over two consecutive generations, i.e. the population lacks diversity, chaotic individuals are added by mapping values generated by the Logistic map into the specified range. The standard Logistic chaotic map is:

$$Z_{n+1} = \mu Z_n (1 - Z_n)$$

With μ = 4 the map is strongly chaotic. The values generated from a given initial value do not lie in the parameter range, so a range mapping is needed after the chaotic sequence has been generated. The interval of the chaotic variable $Z_n$ is mapped to the specified range by the following equation:

where $X_{min}$ and ΔX are the minimum value of the mapped result and the interval size of X, $Z_{max}$ and $Z_{min}$ are the maximum and minimum values of the chaotic vector, and $Z_n$ is the n-th chaotic vector. The generated chaotic variables are added to the population matrix X to complete the addition of chaotic individuals, and the parameter self-adaptation step S33 is then performed.
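The generation of chaotic values and their mapping into a parameter range can be sketched as follows. The sketch assumes the standard logistic map $Z_{n+1} = \mu Z_n (1 - Z_n)$ and, since the mapping equation is missing from the text, a linear rescaling $X = X_{min} + \Delta X \,(Z_n - Z_{min})/(Z_{max} - Z_{min})$ inferred from the symbols listed above. The function names and the initial value 0.37 are illustrative only.

```python
def logistic_sequence(z0, count, mu=4.0):
    """Iterate the logistic map Z_{n+1} = mu * Z_n * (1 - Z_n), starting from z0."""
    sequence, z = [], z0
    for _ in range(count):
        z = mu * z * (1.0 - z)
        sequence.append(z)
    return sequence

def map_to_range(sequence, x_min, delta_x):
    """Assumed linear mapping of chaotic values onto [x_min, x_min + delta_x]."""
    z_min, z_max = min(sequence), max(sequence)
    return [x_min + delta_x * (z - z_min) / (z_max - z_min) for z in sequence]

chaos = logistic_sequence(0.37, 8)
# Example: candidate hidden-neuron counts in [10, 50] for one dimension of a chaotic individual.
neurons = map_to_range(chaos, x_min=10, delta_x=40)
```

Initial values such as 0, 0.25, 0.5, 0.75 and 1 should be avoided with μ = 4, because the sequence then collapses to a fixed point and the rescaling would divide by zero.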

S33, if neither step S31 nor step S32 was carried out, the evolved population undergoes the following parameter adaptation:

where G is the current iteration number of the population, $G_{max}$ is the maximum number of iterations, CR is the crossover rate, and F is the mutation factor. If step S31 or S32 was carried out, the following parameter adaptation is performed instead:

where NP++ and NP-- denote incrementing and decrementing the number of individuals in the population, $G_{max}$ is the maximum number of population iterations, G is the current iteration number, $F_{min}$ is the minimum mutation factor, ΔF is the variable interval of the mutation factor, $f_i$ is the fitness value returned by the model for the i-th individual, $f_{min}$ and $f_{max}$ are the minimum and maximum individual fitness in the current population, and $F_{last}$ is the mutation factor of the previous iteration. $CR_{min}$ is the set minimum crossover probability, ΔCR is the variable interval size of the crossover probability, and $CR_{last}$ is the crossover probability of the previous iteration.

S4, output: after the final optimized parameters are obtained, they are input into the LSTM network to predict on the network traffic data set; once the predicted values are obtained, they are inverse-normalized to output the network traffic value at the next moment.

Drawings

FIG. 1 is a block diagram of an implementation of the present invention;

FIG. 2 is a schematic flow chart of the present invention;

FIG. 3 is a schematic diagram of a model architecture of the present invention;

Detailed Description

The technical scheme of the invention will be further described with reference to the accompanying drawings and examples.

Example: network traffic data captured on a local server are used. The packet capture software is set to record a network traffic statistic every 10 seconds, giving traffic data in bytes. The purpose of this embodiment is to predict the traffic value at the next moment from the network traffic of a time series, so as to support the provisioning of resources such as network bandwidth.

S1, first, the input data are normalized using the Min-Max normalization method:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where x represents the original network traffic data and x' represents the normalized data. Normalization only addresses the amplitude and offset of the data.
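Min-Max normalization and the inverse normalization used later in step S4 can be sketched as below. The function names and the sample byte counts are hypothetical; the minimum and maximum are returned so that predictions can be converted back to bytes.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; also return the min and max for later inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(scaled, lo, hi):
    """Inverse of min_max_normalize: map values in [0, 1] back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

traffic = [1200.0, 4800.0, 3000.0]          # made-up byte counts per 10-second interval
scaled, lo, hi = min_max_normalize(traffic)
restored = denormalize(scaled, lo, hi)
```

In practice the minimum and maximum would normally be taken from the training data and reused for the test data and for the predictions.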

S2, for the model parameter settings, the ranges of the four LSTM parameters are: number of hidden neurons in the first LSTM layer, [10, 50]; Adam learning rate, [0.0001, 0.2]; dropout rate, [0.2, 0.5]; number of hidden neurons in the second LSTM layer, [10, 50]. A population matrix of 20 rows and 4 columns is then generated.
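A population matrix with these ranges might be initialized as in the sketch below. Uniform random sampling, integer sampling for the two neuron counts, and the names `BOUNDS`, `INTEGER_DIMS` and `init_population` are assumptions; the text does not state how the initial values are drawn.

```python
import random

# Ranges from the embodiment: layer-1 neurons, Adam learning rate, dropout rate, layer-2 neurons.
BOUNDS = [(10, 50), (0.0001, 0.2), (0.2, 0.5), (10, 50)]
INTEGER_DIMS = {0, 3}   # assumption: neuron counts are sampled as integers

def init_population(np_size, bounds=BOUNDS, seed=0):
    """Return np_size individuals, each a list with one value per parameter range."""
    rng = random.Random(seed)
    population = []
    for _ in range(np_size):
        individual = []
        for d, (lo, hi) in enumerate(bounds):
            if d in INTEGER_DIMS:
                individual.append(rng.randint(lo, hi))   # inclusive on both ends
            else:
                individual.append(rng.uniform(lo, hi))
        population.append(individual)
    return population

X = init_population(20)   # 20 rows, 4 columns
```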

S3, the population matrix is evolved for 30 generations with the population self-adaptive differential evolution algorithm to obtain the best value found by the search. The fitness used for optimization is one of the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) returned by the LSTM network. The formulas of the three error indexes are, respectively:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - y'_i)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - y'_i)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i - y'_i\rvert$$

where $y_i$ represents the true value, $y'_i$ represents the predicted value, and N is the number of labels in the test set.
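The three error indexes follow their standard definitions and can be computed as in this short sketch; the function names and the sample values are illustrative.

```python
import math

def mse(y_true, y_pred):
    """Mean square error over paired true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
errors = (mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Whichever index is chosen serves as the fitness, so lower values indicate better individuals.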

S4, the final optimized values are substituted into the LSTM network to predict traffic from the input data, and the output data are then inverse-normalized.

The parameters found for the LSTM network by the Population Adaptive Differential Evolution (PADE) method give the lowest error value, i.e. the highest accuracy. By combining the addition of chaotic individuals with the deletion of harmful individuals, the search can cover the global range, so the parameters obtained perform better.

Claims

1. The LSTM network traffic prediction method based on the population self-adaptive differential evolution is characterized by comprising the following steps of:

S1, constructing a data set: normalizing the network traffic data and setting an input sequence length T and a delay time τ, then constructing input data and output labels, wherein the input data are the network traffic values at the previous T moments and the network traffic at moment T+1 is used as the output label, and finally randomly shuffling the samples, selecting the first 75% as the training set and using the remaining data as the test set:

wherein Data denotes the structure of the input data set, each row is one input, and the value of label in the corresponding row is taken as the output label value;

S2, using an LSTM network as the network traffic prediction model, training it with the Adam optimization algorithm, then importing training data with constructed labels, initializing the weights of the network model, using the root mean square error (RMSE) as the fitness evaluation value, and then optimizing the number of hidden neurons in the first LSTM layer, the learning rate, the dropout rate and the number of hidden neurons in the second LSTM layer:

S21, initializing a population matrix X of NP rows and 4 columns, wherein NP is the number of individuals in the population, the dimensions of the population matrix represent the four parameter values of the LSTM, each parameter has a selection range, and each dimension of an individual is therefore constrained; the initialization matrix X is shown below, where N is set to 4:

S22, assigning each dimension of an individual to its corresponding parameter: the number of hidden neurons in the first LSTM layer, the Adam learning rate, the dropout rate and the number of hidden neurons in the second LSTM layer are substituted into the model and the fitness is then evaluated, wherein the fitness evaluation is the root mean square error (RMSE) of the LSTM model on the test set:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(y(t) - y'(t)\bigr)^{2}}$$

wherein y(t) represents the t-th true value, y'(t) represents the t-th predicted value, and n is the label length of the test set;

S23, carrying out the mutation operation on the population: mutation evolves the current individuals to generate more individuals, namely, the mutation vector is generated by combining an existing individual with the scaled difference of two other, different individuals, and the specific expression is:

$$X_{r_1} = X_{r_2} + F\,(X_{r_2} - X_{r_3})$$

wherein $r_1$, $r_2$ and $r_3$ are mutually distinct random integers, F is the mutation factor, and $X_{r_1}$, $X_{r_2}$ and $X_{r_3}$ are the individuals in rows $r_1$, $r_2$ and $r_3$ of the population matrix;

S24, performing the crossover and selection operations on the individuals: crossover randomly mixes the components of the mutation vector with those of the current (pre-mutation) vector, and the vector after crossover is:

wherein $j_{rand}$ is a random integer generated by a random function, CR is the crossover probability, and the expression involves the next-generation individual vector obtained by mutation in step S23, the current individual $X_{r_1}$ in the population matrix X, and the vector after crossover; after crossover, the crossed individual is compared with the old individual and the one with the lower fitness value is kept, wherein the fitness is the RMSE returned after the values of the individual are substituted into the LSTM network;

wherein f is a normalization factor, $f_i$ is the fitness value of the i-th individual, the average term is the mean fitness of the individuals in population matrix X, and $\sigma^2$ is the variance;

S31, deleting harmful individuals: provided that the number of individuals in the population is within a user-defined range, if the fitness variance of the current population is large, there are many harmful individuals, and the individual with the highest fitness value, i.e. the worst individual, is selected and deleted; if step S31 is performed, the diversity of the population is considered good, no chaotic individuals need to be added, and step S33 is carried out directly;

$$Z_{n+1} = \mu Z_n (1 - Z_n)$$

wherein with μ = 4 the map is strongly chaotic, and the values generated from a given initial value do not lie in the parameter range, so a range mapping is needed after the chaotic sequence has been generated; the interval of the chaotic variable $Z_n$ is mapped to the specified range by the following equation:

wherein $X_{min}$ and ΔX are the minimum value of the mapped result and the interval size of X, $Z_{max}$ and $Z_{min}$ are the maximum and minimum values of the chaotic vector, and $Z_n$ is the n-th chaotic vector; the generated chaotic variables are added to the population matrix X to complete the addition of chaotic individuals, and the parameter self-adaptation step S33 is then performed;

wherein G is the current iteration number of the population, $G_{max}$ is the maximum number of iterations, CR is the crossover rate and F is the mutation factor; if step S31 or S32 was carried out, the following parameter adaptation is performed instead:

wherein NP++ and NP-- denote incrementing and decrementing the number of individuals in the population, $G_{max}$ is the maximum number of population iterations, G is the current iteration number, $F_{min}$ is the minimum mutation factor, ΔF is the variable interval of the mutation factor, $f_i$ is the fitness value returned by the model for the i-th individual, $f_{min}$ and $f_{max}$ are the minimum and maximum individual fitness in the current population, $F_{last}$ is the mutation factor of the previous iteration, $CR_{min}$ is the set minimum crossover probability, ΔCR is the variable interval size of the crossover probability, and $CR_{last}$ is the crossover probability of the previous iteration;