CN112506899A - PM2.5 data abnormal value detection method based on improved LSTM - Google Patents

Publication number
CN112506899A
CN112506899A CN202011333748.5A CN202011333748A CN112506899A CN 112506899 A CN112506899 A CN 112506899A CN 202011333748 A CN202011333748 A CN 202011333748A CN 112506899 A CN112506899 A CN 112506899A
Authority
CN
China
Prior art keywords
lstm
data
particle
lstm model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011333748.5A
Other languages
Chinese (zh)
Inventor
徐洪珍
蔡友林
周梁琦
许杰云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Institute of Technology
Original Assignee
East China Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Institute of Technology
Priority to CN202011333748.5A
Publication of CN112506899A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a PM2.5 data abnormal value detection method based on an improved LSTM, comprising the following steps: S1, collecting historical PM2.5 data and dividing the data into a training set and a test set; S2, constructing an LSTM model for detecting abnormal values in PM2.5 big data and initializing the LSTM model; S3, optimizing the LSTM model with an improved simulated annealing particle swarm algorithm and training it on the training set to obtain an LSTM model that meets the conditions; and S4, testing the optimized LSTM model on the PM2.5 data of the test set to obtain the abnormal values. The simulated annealing particle swarm algorithm is improved based on the characteristics of PM2.5 big data, so that the method is optimized to exploit the smoother prediction curve; the detection method is therefore well targeted and can obtain better results.

Description

PM2.5 data abnormal value detection method based on improved LSTM
Technical Field
The invention relates to the field of big data processing, in particular to a PM2.5 data abnormal value detection method based on improved LSTM.
Background
Abnormal value detection is a long-standing and important problem for researchers in the field of big data processing, and it has wide practical value in specific applications such as data preprocessing, behavior prediction and behavior analysis. The problem is nevertheless challenging. First, big data are often structurally complex and noisy, which is a barrier to mining their potential value in depth. Second, traditional abnormal value detection methods are not applicable to PM2.5 big data.
Current PM2.5 abnormal value detection methods fall mainly into three classes: methods based on traditional statistics, clustering methods based on deep learning, and prediction methods based on deep learning. Statistics-based methods are applicable only to low-dimensional numerical data sets and depend on indicators such as the data distribution, the parameter distribution and the expected number of outliers. Clustering methods based on deep learning require that normal points make up most of the data samples to be clustered and abnormal points only a very small part; otherwise too many abnormal samples are clustered together and the judgment is affected.
In recent years, prediction methods based on deep learning have exhibited good performance and robustness. However, when applied to PM2.5 abnormal value detection they mainly have the following problems: 1) the structure of the LSTM network is complex; when facing PM2.5 data with different characteristics, tuning the LSTM parameters requires the designer to have rich experience in neural network design and parameter tuning, and a good network structure usually requires careful adjustment that costs the designer considerable time and energy; 2) the prediction process often converges slowly and easily falls into a local optimum.
Disclosure of Invention
The invention aims to provide a PM2.5 data abnormal value detection method based on an improved LSTM to solve the problems in the prior art. In spring and autumn there are more windy periods, the wind is stronger, and the PM2.5 concentration data show a certain randomness; the LSTM model tends to learn the main variation trend of the PM2.5 concentration data and to ignore the random influences. The method is optimized for this characteristic, that is, for the resulting smoother prediction curve, so it is well targeted and can obtain better results.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a PM2.5 data abnormal value detection method based on improved LSTM, which comprises the following steps:
S1, collecting historical PM2.5 data, marking normal values and abnormal values in the historical PM2.5 data, and dividing the data into a training set and a test set;
S2, constructing an LSTM model for detecting abnormal values in PM2.5 big data, and initializing the LSTM model;
S3, optimizing the LSTM model with an improved simulated annealing particle swarm algorithm, and training the LSTM model on the training set to obtain the optimized LSTM model;
S4, testing the optimized LSTM model on the PM2.5 data of the test set to obtain the abnormal values.
Preferably, the LSTM model constructed in step S2 includes an input layer, an LSTM layer, a fully connected layer, and an output layer.
Preferably, the number of input layer neurons is 24, the prediction step of the LSTM model is initialized to 24, the number of LSTM layer neurons is initialized to 100, the fully connected layer is initialized to 3 layers, the number of fully connected layer neurons is initialized to 50, and the number of output layer neurons is 1, the output layer being used to output the predicted PM2.5 concentration data.
Preferably, the LSTM model optimization in S3 includes the following steps:
S31, initializing the parameters of the simulated annealing particle swarm algorithm;
S32, initializing the current iteration number;
S33, constructing a fitness function;
S34, training the LSTM model with the training set data, calculating the fitness value of each sample particle according to the fitness function to obtain the fitness values of all sample particles, comparing these fitness values, and selecting the minimum as the particle swarm fitness value;
S35, making each sample particle jump randomly to obtain a new sample particle, calculating the jump probability between each sample particle and its new sample particle, and selecting, according to the jump probability, the sample particles that form the new particle swarm;
S36, calculating the position and velocity of each sample particle in the new particle swarm, and updating the minimum value of the spatial position of each sample particle and the minimum value of the spatial position of the new particle swarm;
S37, judging whether the iteration number has reached the maximum iteration number; if not, executing S34 again; if so, updating the current temperature T and judging whether the updated temperature T' is greater than a preset end temperature; if it is, executing step S32; otherwise, the cooling is finished and the optimal individual is stored;
S38, obtaining the parameters of the LSTM network from the optimal sample particle, and establishing the optimized LSTM model.
Preferably, the parameters in the S31 simulated annealing particle swarm algorithm include a particle swarm size, a maximum iteration number, an acceleration factor, an inertia weight, an initial velocity and an initial position of each sample particle.
Preferably, in step S33, the fitness function is a loss function of the LSTM model, and the fitness function is:
Figure BDA0002796563910000041
in the formula, Fit is the fitness, n is the number of training samples, m is the preset maximum iteration number, $d_i$ is the actual PM2.5 value of the ith sample particle, $t_q$ is the predicted PM2.5 output value of the qth iteration, $\bar{d}$ is the actual PM2.5 mean value, and $\bar{t}$ is the mean of the predicted PM2.5 output.
Preferably, the method for calculating the transition probability by using the Metropolis criterion in step S35 includes:
Figure BDA0002796563910000042
in the formula, $P_1$ is the transition probability, k is the Boltzmann constant, $f(\cdot)$ is the fitness value, $x_{new(j)}$ is the jth individual in the new particle swarm, $x_{old(j)}$ is the jth individual in the particle swarm before jumping, $\xi$ is a preset constant, and T is the current temperature.
Preferably, the calculation method for updating the sample particle velocity in step S36 is as follows:
$V_i^{q+1} = \omega V_i^{q} + c_1 r_1 (P_i - S_i^{q}) + c_2 r_2 (P_g - S_i^{q})$
in the formula, $S_i^{q}$ is the position in space of the ith particle at the qth iteration, $V_i^{q}$ is the velocity in space of the ith particle at the qth iteration, $P_i$ is the minimum value of the spatial position of the ith particle, that is, the position at which the particle attained its historical minimum fitness value, $P_g$ is the minimum value of the spatial position of the particle swarm, defined likewise for the whole swarm, $\omega$ is the inertia weight, q is the current iteration number, $c_1$ and $c_2$ are acceleration factors, and $r_1$ and $r_2$ are random numbers in [0,1];
the positions of the sample particles are updated as follows: $S_i^{q+1} = S_i^{q} + V_i^{q+1}$
The invention discloses the following technical effects:
First, the neural network structure derived by the improved-LSTM-based PM2.5 big data abnormal value detection method is a tree structure that shares feature information at the bottom layers of the deep LSTM, which ensures that the neural network performs well on complex big data.
Second, the simulated annealing particle swarm algorithm is improved based on the characteristics of PM2.5 big data. In spring and autumn there are more windy periods, the wind is stronger and the data show a certain randomness, so the LSTM model tends to learn the main variation trend of the PM2.5 big data and to ignore the random influences; the method is optimized for the resulting smoother prediction curve, so it is well targeted and can obtain better results.
Finally, the method is a parameter-free and hyper-parameter-free algorithm that adapts well to various application scenarios and needs no extra manual tuning; it effectively reduces the time required to design and adjust the LSTM network structure, optimizes the LSTM network structure, and improves the robustness of the algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a LSTM model structure diagram of PM2.5 big data anomaly detection of the present invention;
FIG. 2 is an internal structure diagram of the LSTM model for PM2.5 big data anomaly detection according to the present invention;
FIG. 3 is a schematic flow chart of an embodiment of an LSTM model optimization method for PM2.5 big data anomaly detection according to the present invention;
FIG. 4 is a flowchart of the PM2.5 data abnormal value detection method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1-4, the invention provides a PM2.5 data abnormal value detection method based on improved LSTM, comprising the following steps:
First, historical PM2.5 data are collected and the normal values and abnormal values in the historical data are marked; within the PM2.5 data of one quarter, the first two months of data are used as the training set and the last month of data as the test set.
Second, a basic LSTM model for detecting abnormal values in PM2.5 big data is constructed and initialized:
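The split described in this step can be sketched in Python as follows. This is only an illustration: the function name `split_quarter`, the `(timestamp, value)` record layout and the synthetic hourly series are assumptions introduced here and do not come from the patent text.

```python
from datetime import datetime, timedelta


def split_quarter(records):
    """Split (timestamp, value) records of one quarter into train and test.

    The first two calendar months found in the data form the training set
    and the last month forms the test set.
    """
    months = sorted({(ts.year, ts.month) for ts, _ in records})
    if len(months) != 3:
        raise ValueError("expected exactly three calendar months of data")
    test_month = months[-1]
    train = [(ts, v) for ts, v in records if (ts.year, ts.month) != test_month]
    test = [(ts, v) for ts, v in records if (ts.year, ts.month) == test_month]
    return train, test


# Hypothetical hourly series for March to May 2020 (synthetic values, not real data).
start = datetime(2020, 3, 1)
hours = (datetime(2020, 6, 1) - start).days * 24
records = [(start + timedelta(hours=h), 35.0 + (h % 24)) for h in range(hours)]
train, test = split_quarter(records)
```

With this synthetic series, March and April land in `train` and May lands in `test`.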
The overall structure of the LSTM model for detecting abnormal values in PM2.5 big data is shown in fig. 1. The structure is distinctive in that no separate hidden layer is provided: the hidden layer differs from a fully connected layer in how the weight function acts on the input values, so when the number of neurons is optimized the hidden layer is treated as a fully connected layer, which greatly speeds up convergence of the optimization. The number of neurons in the input layer is 24, the number of neurons in the LSTM layer is initialized to 100, and the fully connected layer is initialized to 3 layers with the number of neurons initialized to 50; the number of neurons in the fully connected layers of the LSTM model is chosen to balance training speed against training difficulty. The batch size is fixed at 90 and the number of steps is fixed at 24; that is, the LSTM model processes 24 input PM2.5 concentration values (one day of data) at a time and, taking 24 values as one group, processes 90 groups (one quarter of data). The learning rate of the LSTM determines how finely the PM2.5 concentration data are learned: the smaller the learning rate, the finer the learning. The number of neurons in the output layer is always 1; the output layer outputs the predicted PM2.5 concentration data, and the prediction result is the PM2.5 concentration at the next time step.
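One way to prepare inputs of this shape is sketched below in Python. The text fixes the step count (24) and the batch size (90) but does not say how consecutive groups overlap, so the stride-1 sliding window used here is an assumption, as are the helper names `make_windows` and `make_batches`.

```python
def make_windows(series, steps=24):
    """Pair each run of `steps` consecutive values with the value that follows.

    Assumption: windows slide by one position; the source does not state
    whether consecutive 24-value groups overlap.
    """
    samples = []
    for i in range(len(series) - steps):
        window = series[i:i + steps]   # 24 hourly inputs (one day of data)
        target = series[i + steps]     # value at the next time step
        samples.append((window, target))
    return samples


def make_batches(samples, batch_size=90):
    """Group samples into batches of at most `batch_size` items."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Usage with a toy series of 30 values.
toy_samples = make_windows(list(range(30)))
toy_batches = make_batches(toy_samples)
```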
The internal structure of the LSTM model for PM2.5 data abnormal value detection is shown in fig. 2, where $x_t$ denotes the input at time t, $h_t$ denotes the output of the fully connected layer at time t, $C_t$ denotes the output of the fully connected layer memory unit at time t, and × denotes the pointwise product. σ denotes a sigmoid layer, which defines the degree to which each component is passed and has an output between 0 and 1: 0 means "pass nothing" and 1 means "pass everything".
The calculation method of each parameter comprises the following steps:
The forget gate is calculated as in formula (1), where $h_{t-1}$ is the output of the fully connected layer at time t-1, $x_t$ is the input at time t, $b_f$ is the forget gate parameter, and $\theta_f$ is the weight.
$f_t = \mathrm{sigmoid}(\theta_f \cdot [h_{t-1}, x_t] + b_f)$ (1)
The output value $i_t$ of the input gate is calculated as in formula (2), where $b_i$ is the input gate parameter and $\theta_i$ is the weight.
$i_t = \mathrm{sigmoid}(\theta_i \cdot [h_{t-1}, x_t] + b_i)$ (2)
The output value $o_t$ of the output gate is calculated as in formula (3), where $b_o$ is the output gate parameter and $\theta_o$ is the weight.
$o_t = \mathrm{sigmoid}(\theta_o \cdot [h_{t-1}, x_t] + b_o)$ (3)
The input value $\tilde{c}_t$ added to the memory cell is calculated as in formula (4), where $b_c$ is the corresponding parameter and $\theta_c$ is the weight.
$\tilde{c}_t = \tanh(\theta_c \cdot [h_{t-1}, x_t] + b_c)$ (4)
The output value $c_t$ of the memory cell is calculated as in formula (5), where $c_{t-1}$ denotes the output of the fully connected layer memory cell at time t-1.
$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$ (5)
The output value $h_t$ of the fully connected layer is calculated as in formula (6).
$h_t = o_t * \tanh(c_t)$ (6)
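A single time step of formulas (1) to (6) can be sketched in plain Python as below. The sketch assumes the standard LSTM cell-state update, in which the input gate scales the candidate value produced by formula (4); the helper names and the layout of the parameter dictionary `p` are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(theta, b, v):
    """Compute theta . v + b, with theta given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(theta, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step.

    p maps the gate names "f", "i", "o", "c" to (theta, b) pairs.
    All vectors are plain Python lists.
    """
    v = h_prev + x_t                                      # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*p["f"], v)]        # (1) forget gate
    i_t = [sigmoid(z) for z in affine(*p["i"], v)]        # (2) input gate
    o_t = [sigmoid(z) for z in affine(*p["o"], v)]        # (3) output gate
    c_cand = [math.tanh(z) for z in affine(*p["c"], v)]   # (4) candidate value
    c_t = [f * c + i * g                                  # (5) new cell state
           for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]    # (6) new output
    return h_t, c_t


# Usage: two hidden units, one input feature, all-zero weights and parameters.
zero = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step([30.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate value to 0, so the cell state is simply halved, which makes the step easy to check by hand.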
Third, the LSTM model is optimized with an improved simulated annealing particle swarm algorithm: the training set is fed into the LSTM model for training, the calculated fitness value is fed back to the improved simulated annealing particle swarm algorithm to optimize the LSTM model, and an LSTM model that meets the conditions is obtained through continued training and optimization. In the present embodiment, taking the particle swarm X as an example, the particle swarm X consists of a set of particles $(X_1, X_2, ..., X_n)$.
S31, initializing the parameters of the simulated annealing particle swarm algorithm, including the particle swarm size N, the number of iterations q, the acceleration factors $c_1$ and $c_2$, the inertia weight $\omega$, and the initial velocity $V_i$ and initial position $S_i$ of each sample particle. The particle swarm size N and the number of iterations q are set to fixed values, while the acceleration factors $c_1$ and $c_2$ and the inertia weight $\omega$ take part in the iterative updates of the velocity $V_i$ and position $S_i$. The sample particles $X_i(h_1, h_2, h_3, a)$ of the initial particle swarm are generated randomly, where $h_1$ denotes the number of neurons in the first fully connected layer of the LSTM model, $h_2$ the number of neurons in the second fully connected layer, $h_3$ the number of neurons in the third fully connected layer, and a the learning rate of the LSTM; for each parameter of an initial particle $X_i$, a rand function generates a random number within the value range of that parameter as its value. The initial temperature is $T_0 = 100$ °C, the current temperature is T, and the end temperature is $T_{min}$. The parameter settings used in the implementation are shown in Table 1:
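The random initialization of the particles can be sketched in Python as follows. The value ranges in `BOUNDS`, the swarm size of 20 and the zero initial velocity are placeholders assumed for illustration; the settings actually used are those of Table 1, which is not reproduced in the text.

```python
import random

# Hypothetical search ranges (assumptions, not the values of Table 1).
BOUNDS = {"h1": (10, 200), "h2": (10, 200), "h3": (10, 200), "a": (0.001, 0.1)}


def init_particle(rng):
    """Draw one particle X_i(h1, h2, h3, a) uniformly within BOUNDS."""
    position = [
        rng.randint(*BOUNDS["h1"]),   # neurons in fully connected layer 1
        rng.randint(*BOUNDS["h2"]),   # neurons in fully connected layer 2
        rng.randint(*BOUNDS["h3"]),   # neurons in fully connected layer 3
        rng.uniform(*BOUNDS["a"]),    # learning rate of the LSTM
    ]
    velocity = [0.0] * len(position)  # assumption: particles start at rest
    return {"position": position, "velocity": velocity}


def init_swarm(size, seed=0):
    """Create `size` particles with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [init_particle(rng) for _ in range(size)]


swarm = init_swarm(size=20)
```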
TABLE 1
Figure BDA0002796563910000091
S32, initializing the current iteration number q to 1.
And S33, constructing a fitness function. The fitness function is as shown in equation (7):
Figure BDA0002796563910000092
The fitness function Fit also serves as the loss function of the LSTM model. Its value directly expresses how well the simulated annealing particle swarm algorithm has optimized the LSTM model, and its magnitude is determined by the result of training the LSTM model on the PM2.5 data. In formula (7), n is the number of training samples, m is the set maximum iteration number, $d_i$ is the actual PM2.5 value of the ith sample particle, $t_q$ is the predicted PM2.5 output value at the qth iteration, and $\bar{d}$ and $\bar{t}$ are the actual PM2.5 mean and the mean of the predicted PM2.5 output, respectively.
The fitness function of the invention squares the difference between $d_i$ and $t_q$, which widens the spread of the resulting fitness values, and also adds the square of the difference between the mean of the actual values and the mean of the predicted outputs to the formula, so that the overall difference is taken into account together with the individual differences of the sample particles. The improved fitness function can therefore evaluate sample particles more effectively.
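Formula (7) itself survives only as an image reference, so its exact form cannot be recovered from the text. The Python sketch below shows one plausible reading of the description, a mean of squared per-sample differences plus the squared difference of the two means. It is an assumption made for illustration and should not be taken as the patented formula; in particular it does not model the role of the maximum iteration number m.

```python
def fitness(actual, predicted):
    """Assumed fitness: mean squared error + squared difference of the means.

    This is an illustrative reading of the prose description of formula (7),
    not a reconstruction of the formula itself.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    # Individual term: squared difference between actual and predicted values.
    mse = sum((d - t) ** 2 for d, t in zip(actual, predicted)) / n
    # Overall term: squared difference between the two means.
    mean_gap = (sum(actual) / n - sum(predicted) / n) ** 2
    return mse + mean_gap
```

Under this reading, a prediction that is offset by a constant is penalized twice (once per sample and once through the means), whereas errors that cancel on average are penalized only through the per-sample term.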
S34, the PM2.5 data are fed into the LSTM model for training, and the fitness value of each sample particle is calculated according to the fitness function; the fitness value reflects how well the LSTM model trains on the PM2.5 data, and in this way the fitness values of all sample particles are obtained. The fitness values of the sample particles are compared and the minimum is selected as the population fitness value. The population fitness value reflects the quality of the LSTM model: the smaller the fitness value, the better the model processes the data, so the population fitness value is the optimal solution of the current sample particle swarm and provides a basis for selecting the next optimal model;
S35, each sample particle $x_{old(j)}$ makes a random jump to obtain a new sample particle $x_{new(j)}$, and the sample particles are iterated. The fitness value $f(x_{new(j)})$ is obtained from the fitness function (formula (7)), and the transition probability is calculated by the improved Metropolis criterion. The greater the transition probability, the greater the probability that $x_{new(j)}$ is selected as an individual in the new swarm. All sample particles jump and are updated through the improved Metropolis criterion to obtain the new particle swarm.
The transition probability $P_1$ is calculated by the improved Metropolis criterion as in formula (8):
Figure BDA0002796563910000101
in the formula, $P_1$ is the transition probability, k is the Boltzmann constant, $f(x_{new(j)})$ and $f(x_{old(j)})$ are fitness values, and $x_{new(j)}$ and $x_{old(j)}$ denote the jth individual in the new particle swarm and in the old particle swarm (the particle swarm before jumping), respectively. $\xi$ is a constant. When detecting abnormal PM2.5 concentration values in spring and autumn, $\xi$ is 0.8, so that $T \cdot \xi$ is lower than T and the particles accept new states whose energy differs only slightly from the current state. When detecting abnormal PM2.5 concentration values in winter and summer, $\xi$ is 1.2, so that $T \cdot \xi$ is higher than T and the particles accept new states whose energy differs greatly from the current state. In the invention, the improved Metropolis criterion is introduced to calculate the transition probability so that, of $f(x_{new(j)})$ and $f(x_{old(j)})$, the individual with the smaller fitness value is selected as an individual in the new particle swarm, which reduces the probability of falling into a local optimum; the improved Metropolis criterion thus improves the optimization capability of the annealing algorithm.
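Formula (8) is likewise available only as an image reference. The Python sketch below assumes the usual Metropolis form, in which a better candidate is always accepted and a worse one is accepted with probability exp(-Δf / (k·ξ·T)), with the temperature scaled by the seasonal constant ξ given in the text. The default `k=1.0` (normalized units) and the function names are assumptions.

```python
import math
import random

# Seasonal constants stated in the text.
XI_BY_SEASON = {"spring": 0.8, "autumn": 0.8, "winter": 1.2, "summer": 1.2}


def jump_probability(f_new, f_old, temperature, season, k=1.0):
    """Assumed Metropolis-style acceptance probability with T scaled by xi."""
    if f_new <= f_old:
        return 1.0                      # a better (smaller) fitness is always accepted
    xi = XI_BY_SEASON[season]
    return math.exp(-(f_new - f_old) / (k * xi * temperature))


def select(x_old, x_new, f_old, f_new, temperature, season, rng):
    """Keep x_new with the jump probability, otherwise keep x_old."""
    p = jump_probability(f_new, f_old, temperature, season)
    return x_new if rng.random() < p else x_old


# Usage: the same worse candidate is accepted less readily in spring than in summer.
p_spring = jump_probability(2.0, 1.0, temperature=1.0, season="spring")
p_summer = jump_probability(2.0, 1.0, temperature=1.0, season="summer")
```

In this sketch ξ = 0.8 lowers the effective temperature, which makes the search more conservative, while ξ = 1.2 raises it and lets the search accept larger fitness differences.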
S36, the position and velocity of each sample particle in the new particle swarm are calculated. The velocity $V_i^{q+1}$ and position $S_i^{q+1}$ of a sample particle are updated using formulas (9) and (10), respectively:
$V_i^{q+1} = \omega V_i^{q} + c_1 r_1 (P_i - S_i^{q}) + c_2 r_2 (P_g - S_i^{q})$ (9)
$S_i^{q+1} = S_i^{q} + V_i^{q+1}$ (10)
in the formula, $S_i^{q}$ is the position in space of the ith particle $X_i$ at the qth iteration, $V_i^{q}$ is the velocity in space of the ith particle $X_i$ at the qth iteration, $P_i$ is the minimum value of the spatial position of particle $X_i$, $P_g$ is the minimum value of the spatial position of the particle swarm, $\omega$ is the inertia weight, q is the current iteration number, and $c_1$ and $c_2$ are acceleration factors, which are non-negative constants. $r_1$ and $r_2$ are random numbers in [0,1], which randomize the relative weights of the spatial positions of the ith particle and of the particle swarm in the velocity calculation of the sample particle;
the particle swarm X consists of a set of particles $(X_1, X_2, ..., X_n)$, which are iteratively replaced according to the updated positions and velocities.
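The update of formulas (9) and (10) can be sketched in Python as below. The text does not say whether $r_1$ and $r_2$ are drawn once per particle or once per dimension; drawing them once per update is an assumption of this sketch, as is the function name `update_particle`.

```python
import random


def update_particle(position, velocity, p_best, g_best, omega, c1, c2, rng):
    """Apply the velocity update (9) and then the position update (10)."""
    # Assumption: one pair of random factors per particle update.
    r1, r2 = rng.random(), rng.random()
    new_velocity = [
        omega * v + c1 * r1 * (pb - s) + c2 * r2 * (gb - s)
        for s, v, pb, gb in zip(position, velocity, p_best, g_best)
    ]
    new_position = [s + v for s, v in zip(position, new_velocity)]
    return new_position, new_velocity


# Usage: a particle already sitting on both best positions only keeps its inertia term.
pos, vel = update_particle(
    position=[1.0], velocity=[2.0], p_best=[1.0], g_best=[1.0],
    omega=0.5, c1=2.0, c2=2.0, rng=random.Random(0),
)
```

Because the particle in the usage example coincides with $P_i$ and $P_g$, both attraction terms vanish and the result does not depend on the random draws.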
And updating the minimum value of the spatial positions of the sample particles forming the new particle sample group and the minimum value of the spatial positions of the new particle group, wherein the specific method comprises the following steps: firstly, calculating the fitness value of each sample particle according to the fitness function, judging whether the fitness value is the historical minimum fitness value of the sample particle or not for each sample particle, and if so, taking the spatial position corresponding to the fitness value as the spatial position minimum value of the sample particle. Then, the minimum fitness value of all the sample particles at this time is selected as a group fitness value, whether the group fitness value is the historical minimum fitness value of the group particles or not is judged, and if yes, the spatial position corresponding to the group fitness value is used as the spatial position minimum value of the group particles.
S37, when $q < q_{max}$, set q = q + 1 and execute step S34 again; when $q = q_{max}$, update the current temperature T to T', where T' = 0.99T; when $T' > T_{min}$, return to step S32; when $T' \le T_{min}$, the cooling is finished and the optimal sample particle is stored, its parameters being those corresponding to the population fitness value.
S38, from the optimal sample particle $X_i(h_1, h_2, h_3, a)$, the parameters of the LSTM network are obtained to build the LSTM model. For example, if the values of $h_1$, $h_2$, $h_3$ and a in the generated optimal particle are 50, 100, 150 and 0.01 respectively, then the number of neurons in the first fully connected layer of the LSTM model is 50, the number in the second fully connected layer is 100, the number in the third fully connected layer is 150, and the learning rate of the LSTM is 0.01.
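The control flow of steps S32 to S37 can be sketched as a Python generator that lists the (temperature, iteration) pairs in the order they are visited. The cooling factor 0.99 and the initial temperature 100 come from the text; the default values of `t_min` and `q_max` are placeholders assumed for illustration, since the actual settings are in Table 1.

```python
def annealing_schedule(t0=100.0, t_min=1.0, q_max=3, cooling=0.99):
    """Yield (temperature, q) pairs in the order steps S32 to S37 visit them."""
    temperature = t0
    while True:
        for q in range(1, q_max + 1):   # S32 sets q = 1; S37 increments q
            yield temperature, q        # steps S34 to S36 would run here
        temperature *= cooling          # S37: T' = 0.99 T once q reaches q_max
        if temperature <= t_min:        # cooling is finished when T' <= T_min
            break


# Usage with a short, hypothetical schedule.
schedule = list(annealing_schedule(t0=100.0, t_min=98.5, q_max=2))
```

In the usage example the inner loop runs at T = 100 and again at T = 99; the next cooled temperature falls below the end temperature, so the schedule stops there.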
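Reading the network settings off the optimal particle can be sketched in Python as follows. The dictionary keys and the rounding of neuron counts to integers are assumptions of this sketch; rounding is included because particle positions become real-valued after velocity updates.

```python
def decode_particle(position):
    """Map a particle (h1, h2, h3, a) to a hypothetical model configuration."""
    h1, h2, h3, a = position
    return {
        # Neuron counts must be whole numbers, so real-valued positions are rounded.
        "fc_units": [int(round(h1)), int(round(h2)), int(round(h3))],
        "learning_rate": float(a),
    }


# The example particle from step S38.
config = decode_particle([50, 100, 150, 0.01])
```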
And fourthly, putting the PM2.5 big data test set into the established LSTM model to obtain prediction data, and comparing the prediction data with actual detection data to obtain abnormal values in the PM2.5 data set.
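The text does not state the rule used to compare predictions with measurements. A minimal Python sketch, assuming a fixed absolute-deviation threshold (the `threshold` parameter and the sample numbers are hypothetical), could look like this:

```python
def detect_outliers(actual, predicted, threshold):
    """Return the indices where |actual - predicted| exceeds the threshold.

    Assumption: a fixed absolute threshold; the source only says that the
    prediction data are compared with the actual detection data.
    """
    return [
        i for i, (d, t) in enumerate(zip(actual, predicted))
        if abs(d - t) > threshold
    ]


# Usage with made-up readings: only the second value deviates strongly.
flagged = detect_outliers([10.0, 50.0, 12.0], [11.0, 12.0, 12.5], threshold=5.0)
```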
Aiming at the defects of a method for detecting PM2.5 concentration big data abnormal value by an LSTM model, the invention provides an improved simulated annealing particle swarm algorithm for optimizing the LSTM model, the algorithm provides an improved Metropolis criterion according to the fitness of each corresponding individual in an old particle swarm and a new particle swarm, all the individuals in the new particle swarm are corrected according to the situation, the diversity of the individual particle swarm is increased, and the global optimization capability of the algorithm is improved.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.

Claims (8)

1. A PM2.5 data abnormal value detection method based on improved LSTM is characterized by comprising the following steps:
S1, collecting historical PM2.5 data, marking normal values and abnormal values in the historical PM2.5 data, and dividing the data into a training set and a test set;
S2, constructing an LSTM model for detecting abnormal values in PM2.5 big data, and initializing the LSTM model;
S3, optimizing the LSTM model with an improved simulated annealing particle swarm algorithm, and training the LSTM model on the training set to obtain the optimized LSTM model;
S4, testing the optimized LSTM model on the PM2.5 data of the test set to obtain the abnormal values.
2. The improved LSTM based PM2.5 data outlier detection method of claim 1, characterized by: the LSTM model constructed in step S2 includes an input layer, an LSTM layer, a fully connected layer, and an output layer.
3. The improved LSTM based PM2.5 data outlier detection method of claim 2, wherein: the number of input layer neurons is 24, the prediction step of the LSTM model is initialized to 24, the number of LSTM layer neurons is initialized to 100, the fully connected layer is initialized to 3 layers, the number of fully connected layer neurons is initialized to 50, and the number of output layer neurons is 1, the output layer being used to output the predicted PM2.5 concentration data.
4. The improved LSTM based PM2.5 data outlier detection method of claim 1, characterized in that: the LSTM model optimization in S3 comprises the following steps:
S31, initializing the parameters of the simulated annealing particle swarm algorithm;
S32, initializing the current iteration count;
S33, constructing a fitness function;
S34, training the LSTM model with the training set data, calculating the fitness value of each sample particle according to the fitness function, comparing the fitness values of all sample particles, and selecting the minimum value as the particle swarm fitness value;
S35, randomly jumping each sample particle to obtain a new sample particle, calculating the jump probability between each sample particle and its new sample particle, and selecting, according to the jump probability, the sample particles that form the new particle swarm;
S36, calculating the position and velocity of each sample particle in the new particle swarm, and updating the minimum value of the spatial position of each sample particle and the minimum value of the spatial position of the new particle swarm;
S37, judging whether the iteration count has reached the maximum number of iterations; if not, repeating S34; if so, updating the current temperature T and judging whether the updated temperature T' is greater than a preset end temperature; if it is, executing step S32; otherwise, ending the cooling and storing the optimal individual;
S38, obtaining the parameters of the LSTM network from the optimal sample particle obtained, and establishing the optimized LSTM model.
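The control flow of steps S31 to S38 can be sketched as two nested loops: an inner iteration loop that restarts each time the temperature is lowered, and an outer cooling loop. The sketch below is an illustration under stated assumptions, not the claimed implementation: the fitness is a toy sphere function rather than an LSTM training loss, the random jump is Gaussian, the acceptance rule is the textbook Metropolis form, cooling is geometric, and every name and default value (`sa_pso`, `cooling`, `jump_scale`, the search range) is hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def sa_pso(fitness: Callable[[List[float]], float], dim: int,
           swarm_size: int = 20, max_iter: int = 30,
           t_start: float = 1.0, t_end: float = 0.01, cooling: float = 0.8,
           omega: float = 0.7, c1: float = 1.5, c2: float = 1.5,
           jump_scale: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    # S31: initialise swarm size, positions and velocities (assumed ranges).
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    best_pos = [p[:] for p in pos]           # best position seen by each particle
    best_fit = [fitness(p) for p in pos]     # S33/S34: fitness of each particle
    g = min(range(swarm_size), key=lambda i: best_fit[i])
    swarm_pos, swarm_fit = best_pos[g][:], best_fit[g]

    temp = t_start
    while temp > t_end:                      # S37: stop once T' <= end temperature
        for _ in range(max_iter):            # S32: iteration count restarts here
            for i in range(swarm_size):
                f_old = fitness(pos[i])      # S34: evaluate the particle
                # S35: random jump, accepted by a Metropolis-style test.
                cand = [x + rng.gauss(0.0, jump_scale) for x in pos[i]]
                f_new = fitness(cand)
                if f_new <= f_old or rng.random() < math.exp(-(f_new - f_old) / temp):
                    pos[i], f_old = cand, f_new
                # S36: update the per-particle and swarm minima.
                if f_old < best_fit[i]:
                    best_pos[i], best_fit[i] = pos[i][:], f_old
                if f_old < swarm_fit:
                    swarm_pos, swarm_fit = pos[i][:], f_old
            for i in range(swarm_size):      # S36: velocity and position update
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (omega * vel[i][d]
                                 + c1 * r1 * (best_pos[i][d] - pos[i][d])
                                 + c2 * r2 * (swarm_pos[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
        temp *= cooling                      # S37: lower the temperature
    return swarm_pos, swarm_fit              # S38 would build the LSTM from this


def sphere(x: List[float]) -> float:
    """Toy fitness standing in for the LSTM loss."""
    return sum(v * v for v in x)


best, fit = sa_pso(sphere, dim=2)
```

In the patent's setting each fitness evaluation would involve training the LSTM model on the training set, so the evaluation count implied by these default values would be far too high; they are chosen only to keep the toy example fast.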
5. The improved LSTM-based PM2.5 data outlier detection method of claim 3, wherein: the parameters of the simulated annealing particle swarm algorithm in S31 comprise the particle swarm size, the maximum number of iterations, the acceleration factors, the inertia weight, and the initial velocity and initial position of each sample particle.
6. The improved LSTM-based PM2.5 data outlier detection method of claim 3, wherein: in step S33, the fitness function is the loss function of the LSTM model, and is given by:
Figure RE-FDA0002911628810000031
where Fit is the fitness, $n$ is the number of training samples, $m$ is the preset maximum number of iterations, $d_i$ is the actual PM2.5 value of the i-th sample particle, $t_q$ is the predicted PM2.5 output value of the q-th iteration,
Figure RE-FDA0002911628810000034
is the actual PM2.5 mean value, and
Figure RE-FDA0002911628810000033
is the predicted output PM2.5 mean value.
7. The improved LSTM-based PM2.5 data outlier detection method of claim 3, wherein: in step S35, the jump probability is calculated using the Metropolis criterion, as follows:
Figure RE-FDA0002911628810000032
where $P_1$ is the transition probability, $k$ is the Boltzmann constant, $f(\cdot)$ is the fitness value, $x_{new}(j)$ is the j-th individual in the new particle swarm, $x_{old}(j)$ is the j-th individual in the particle swarm before the jump, $\xi$ is a preset constant, and $T$ is the current temperature.
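For orientation, the textbook form of the Metropolis acceptance rule for a minimisation problem is sketched below. This is an assumption standing in for the claimed formula, which survives only as a figure reference in this text; in particular the role of the preset constant $\xi$ cannot be recovered from the text and is omitted here. The function names and the default `k=1.0` are hypothetical.

```python
import math
import random


def jump_probability(f_new: float, f_old: float, temperature: float,
                     k: float = 1.0) -> float:
    """Textbook Metropolis probability of accepting a jump (minimisation)."""
    if f_new <= f_old:
        return 1.0  # an improving or equal jump is always accepted
    # A worse jump is accepted with a probability that shrinks as T falls.
    return math.exp(-(f_new - f_old) / (k * temperature))


def accept_jump(f_new: float, f_old: float, temperature: float,
                rng=random) -> bool:
    """Draw a uniform number in [0, 1) and compare it with the probability."""
    return rng.random() < jump_probability(f_new, f_old, temperature)


# A jump that worsens the fitness by 1.0 is accepted less often when cold.
p_hot = jump_probability(2.0, 1.0, temperature=10.0)
p_cold = jump_probability(2.0, 1.0, temperature=0.1)
```

Accepting some worse jumps at high temperature is what lets the swarm leave a local minimum; as the temperature approaches the end temperature the rule reduces to accepting improvements only.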
8. The improved LSTM-based PM2.5 data outlier detection method of claim 3, wherein: the sample particle velocity in step S36 is updated as follows:
$$V_i^{q+1} = \omega V_i^{q} + c_1 r_1 (P_i - S_i^{q}) + c_2 r_2 (P_g - S_i^{q})$$
where $S_i^{q}$ is the position in space of the i-th particle at the q-th iteration, $V_i^{q}$ is the velocity in space of the i-th particle at the q-th iteration, $P_i$ is the minimum value of the spatial position of the i-th particle, $P_g$ is the minimum value of the spatial position of the particle swarm, $\omega$ is the inertia weight, $q$ is the current iteration number, $c_1$ and $c_2$ are acceleration factors, and $r_1$ and $r_2$ are random numbers in $[0, 1]$;
the sample particle positions are updated as follows:
$$S_i^{q+1} = S_i^{q} + V_i^{q+1}$$
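The velocity and position update of claim 8 can be written directly from the two formulas. The sketch below is a minimal illustration: the function name, the default values of the inertia weight and acceleration factors, and the choice to draw r1 and r2 separately for each dimension are assumptions, since the claim only states that they are random numbers in [0, 1].

```python
import random
from typing import List, Optional, Sequence, Tuple


def update_particle(position: Sequence[float], velocity: Sequence[float],
                    personal_best: Sequence[float], global_best: Sequence[float],
                    omega: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[float], List[float]]:
    """Return (new_position, new_velocity) for one particle."""
    rng = rng or random.Random()
    new_position, new_velocity = [], []
    for s, v, p_i, p_g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # V_i^{q+1} = omega*V_i^q + c1*r1*(P_i - S_i^q) + c2*r2*(P_g - S_i^q)
        v_next = omega * v + c1 * r1 * (p_i - s) + c2 * r2 * (p_g - s)
        new_velocity.append(v_next)
        # S_i^{q+1} = S_i^q + V_i^{q+1}
        new_position.append(s + v_next)
    return new_position, new_velocity


# A particle already sitting on both minima only keeps its damped velocity.
pos, vel = update_particle([1.0, 2.0], [0.5, -0.5],
                           personal_best=[1.0, 2.0], global_best=[1.0, 2.0],
                           omega=0.5)
```

In the usage example both attraction terms vanish, so the new velocity is simply the old velocity scaled by the inertia weight.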
CN202011333748.5A 2020-11-25 2020-11-25 PM2.5 data abnormal value detection method based on improved LSTM Pending CN112506899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011333748.5A CN112506899A (en) 2020-11-25 2020-11-25 PM2.5 data abnormal value detection method based on improved LSTM

Publications (1)

Publication Number Publication Date
CN112506899A true CN112506899A (en) 2021-03-16

Family

ID=74959818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011333748.5A Pending CN112506899A (en) 2020-11-25 2020-11-25 PM2.5 data abnormal value detection method based on improved LSTM

Country Status (1)

Country Link
CN (1) CN112506899A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577643A (en) * 2022-11-23 2023-01-06 广东电网有限责任公司中山供电局 Temperature prediction method and device for cable terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850891A (en) * 2015-05-29 2015-08-19 厦门大学 Intelligent optimal recursive neural network method of time series prediction
CN108986470A (en) * 2018-08-20 2018-12-11 华南理工大学 The Travel Time Estimation Method of particle swarm algorithm optimization LSTM neural network
CN109561084A (en) * 2018-11-20 2019-04-02 四川长虹电器股份有限公司 URL parameter rejecting outliers method based on LSTM autoencoder network
CN110263977A (en) * 2019-05-24 2019-09-20 河南大学 The method and device of Optimization Prediction PM2.5 based on LSTM neural network model
CN110334726A (en) * 2019-04-24 2019-10-15 华北电力大学 A kind of identification of the electric load abnormal data based on Density Clustering and LSTM and restorative procedure
CN111709182A (en) * 2020-05-25 2020-09-25 温州大学 Electromagnet fault prediction method based on SA-PSO (SA-particle swarm optimization) optimized BP (Back propagation) neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Liangqi et al.: "Research on a Bayesian-based big data outlier detection model", Computer Knowledge and Technology *

Similar Documents

Publication Publication Date Title
CN111738512B (en) Short-term power load prediction method based on CNN-IPSO-GRU hybrid model
CN110084610B (en) Network transaction fraud detection system based on twin neural network
CN112906982A (en) GNN-LSTM combination-based network flow prediction method
CN107316099A (en) Ammunition Storage Reliability Forecasting Methodology based on particle group optimizing BP neural network
CN103971160A (en) Particle swarm optimization method based on complex network
WO2019061187A1 (en) Credit evaluation method and apparatus and gradient boosting decision tree parameter adjustment method and apparatus
CN111260118A (en) Vehicle networking traffic flow prediction method based on quantum particle swarm optimization strategy
CN104200096B (en) Arrester grading ring optimization based on differential evolution algorithm and BP neural network
CN111119282A (en) Pressure monitoring point optimal arrangement method for water supply pipe network
CN106781465A (en) A kind of road traffic Forecasting Methodology
WO2018137442A1 (en) Data prediction method and device
CN110991621A (en) Method for searching convolutional neural network based on channel number
CN110598929A (en) Wind power nonparametric probability interval ultrashort term prediction method
Nevtipilova et al. Testing artificial neural network (ANN) for spatial interpolation
CN114202253A (en) Charging station load adjustable potential evaluation method and system, storage medium and server
CN114118567A (en) Power service bandwidth prediction method based on dual-channel fusion network
CN115131131A (en) Credit risk assessment method for unbalanced data set multi-stage integration model
CN112506899A (en) PM2.5 data abnormal value detection method based on improved LSTM
CN115272774A (en) Sample attack resisting method and system based on improved self-adaptive differential evolution algorithm
CN107871157B (en) Data prediction method, system and related device based on BP and PSO
CN109697531A (en) A kind of logistics park-hinterland Forecast of Logistics Demand method
CN109447231B (en) Method for solving multi-attribute bilateral matching problem under shared economic background by ant colony algorithm
CN109190820B (en) Electric power market electricity selling quantity depth prediction method considering user loss rate
CN115438842A (en) Load prediction method based on adaptive improved dayflies and BP neural network
CN110322351A (en) Multi-source driving quantization investment model under Depth Stratification strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210316
