CN116227748A - Training method and prediction method of ecological environment PM2.5 concentration prediction model

Info

Publication number: CN116227748A
Application number: CN202310509941.7A
Authority: CN
Inventors: 王兴举; 李彦伟; 王志斌; 冯雷; 杜群乐; 郭猛; 申大为
Original assignee: Shijiazhuang Tiedao University
Current assignee: Shijiazhuang Tiedao University
Priority date: 2023-05-08
Filing date: 2023-05-08
Publication date: 2023-06-06

Abstract

The invention discloses a training method and a prediction method for an ecological environment PM2.5 concentration prediction model, belonging to the technical field of environmental monitoring. The training method comprises the following steps: S1, acquiring raw data collected from an ecological environment monitoring system; S2, selecting features from the raw data based on Pearson correlation analysis, and normalizing the selected features to obtain a training set and a test set; S3, training a PSO-LSTM model and a lightGBM model on the training set, and weighting the output values of the two models to obtain the prediction model. The invention thereby addresses technical problems such as the low accuracy of existing ecological environment data and inaccurate monitoring and prediction.

Description

Training method and prediction method of ecological environment PM2.5 concentration prediction model

Technical Field

The invention relates to the technical field of construction ecological monitoring, in particular to a training method and a prediction method of an ecological environment PM2.5 concentration prediction model.

Background

Ecological environment construction and protection in construction areas have attracted great attention. As construction projects increase, the ecological environment deteriorates rapidly, for example through groundwater pollution and extreme weather caused by dust emission, and pollutants such as PM2.5 have many adverse effects on people's health, so ecological environment monitoring and control during construction are necessary. Because ecological environment monitoring covers many fields and a wide area, machine learning is used to improve the monitoring technology when ecological data are monitored and processed: environmental prediction and early-warning functions are realized, abnormal values are detected more reliably from historical data, relevant measures are taken in time to curb environmental pollution, and the damage of construction to the environment is reduced.

As all sectors of society attach growing importance to environmental protection, particularly the prevention and control of air pollution, new technologies based mainly on machine learning have begun to be used in air pollution research. Methods such as support vector machines (SVM), particle swarm optimization (PSO), long short-term memory neural networks (LSTM) and convolutional neural networks (CNN) are used to study the relation between air pollutant concentrations and other environmental factors in order to monitor and predict dust concentration.

The technical problem to be solved by the invention is to provide an ecological concentration prediction method. Existing time-series prediction models have limited accuracy when processing complex, nonlinear data, and their limited flexibility prevents them from handling continuously changing data patterns; a model that adapts dynamically to such patterns is therefore needed to improve prediction accuracy. In addition, the model must process large data sets, which requires a large amount of computation, and some existing time-series prediction models may have a high computational cost; feature selection and optimization of the input data set are therefore needed to reduce the amount of computation and the input data dimension. Based on the above problems, there is a need in the art for a method of predicting PM2.5.

Disclosure of Invention

The invention provides a training method and a prediction method for an ecological environment PM2.5 concentration prediction model. By effectively monitoring and processing ecological data with machine learning, the invention improves ecological environment monitoring technology at its root and realizes environmental prediction and early-warning functions; abnormal values are detected more reliably from historical data, so that relevant measures can be taken in time to curb environmental pollution and reduce the damage of construction to the environment.

The invention provides the following technical scheme for realizing the purpose:

In one aspect, the training method of the ecological environment PM2.5 concentration prediction model comprises the following steps:

S1, acquiring raw data collected from an ecological environment monitoring system;

S2, selecting relevant features from the raw data based on Pearson correlation analysis, and normalizing the relevant features to obtain a training set and a test set;

S3, training the PSO-LSTM model and the lightGBM model on the training set, and weighting the output values of the two models to obtain a prediction model.

The method is further improved as follows: the step S1 includes:

acquiring raw data of temperature, humidity, wind speed, air pressure, concentration of dissolved oxygen in water, soil conductivity, PM2.5 and PM10 in the construction environment.

The method is further improved as follows: the step S2 includes:

step S21: checking the integrity of the raw data obtained in step S1, and filling in the abnormal values and missing values in that raw data using a smoothing algorithm;

step S22: performing Pearson correlation analysis on the data completed in step S21 to determine the features related to PM2.5, namely wind speed, temperature, humidity, PM2.5 and PM10;

step S23: normalizing the related features, and dividing the normalized feature data into a training set and a test set.

The method is further improved as follows: the normalization processing method in the step S23 is maximum and minimum normalization.

The method is further improved as follows: training the PSO-LSTM model and the lightGBM model through the training set respectively, weighting the output values of the PSO-LSTM model and the lightGBM model, and obtaining the prediction model comprises the following steps:

training the PSO-LSTM model by a training set, and obtaining an output value of the PSO-LSTM model by the following method:

initializing the inertia weight and the bias constant of a particle swarm algorithm PSO single particle to obtain an initial weight and a bias constant; wherein the initial weights include: the weight of the input gate, the weight of the forget gate and the weight of the output gate; the bias constant includes: the bias constant of the input gate, the bias constant of the forget gate and the bias constant of the output gate;

training process: training the LSTM model by combining the inertia weight and the bias constant of the single particle to obtain the output of each unit gate of the PSO-LSTM model, and judging whether the output of each unit gate has reached the optimal solution and whether the number of iterations has reached the maximum number of iterations.

The method is further improved as follows: based on the PSO-LSTM model, in combination with the self-adaptive adjustment of single particles of the particle swarm algorithm PSO, the inertia weight and the learning factor of the particle swarm algorithm PSO are updated according to the following formula, and the updated inertia weight and learning factor are obtained:

the inertia weight improvement formula is:

[formula not reproduced]

wherein the quantities appearing in the formula are: the improved inertia weight, the minimum inertia weight, the maximum inertia weight, an activation function, the current iteration number, and the maximum number of iterations;

the improvement formula for the two learning factors is:

[formula not reproduced]

wherein the quantities appearing in the formula are the current iteration number and the maximum number of iterations; when a further condition (not reproduced) holds, the two learning factors take a separate form (not reproduced).

training the lightGBM model on the training set to obtain an initial training result of the lightGBM model, with the following specific steps:

determining an initial optimal split point of the lightGBM model according to the training set and a gain formula;

generating leaf split points of an initial lightGBM model according to the initial optimal split point;

determining the point of maximum gain among the leaf split points according to a preset split threshold, and generating a decision tree of the initial lightGBM model according to that maximum-gain point and the leaf split points;

and configuring the initial lightGBM model with the decision tree to obtain the trained lightGBM model.
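The split-point search described in these steps can be illustrated with a small sketch. The gain formula itself is not given in the text, so the sketch assumes the second-order gain commonly used in gradient boosting with a squared-error loss (every hessian equal to 1), and it scans exact feature values rather than the histogram bins LightGBM uses internally. The names `best_split`, `min_gain` and `reg_lambda` are hypothetical.

```python
def best_split(values, residuals, min_gain=0.0, reg_lambda=0.0):
    """Return (gain, threshold) of the best split on one feature, or None.

    Assumed gain: G_L^2/(n_L + lambda) + G_R^2/(n_R + lambda) - G^2/(n + lambda),
    where G is a sum of residuals (squared-error loss, hessian = 1).
    `min_gain` plays the role of the preset split threshold.
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    total_g = sum(residuals)
    parent_score = total_g ** 2 / (n + reg_lambda)

    best = None
    left_g = 0.0
    for rank in range(n - 1):
        current, following = order[rank], order[rank + 1]
        left_g += residuals[current]
        if values[current] == values[following]:
            continue  # cannot split between two identical feature values
        n_left = rank + 1
        n_right = n - n_left
        right_g = total_g - left_g
        gain = (left_g ** 2 / (n_left + reg_lambda)
                + right_g ** 2 / (n_right + reg_lambda)
                - parent_score)
        if gain > min_gain and (best is None or gain > best[0]):
            threshold = (values[current] + values[following]) / 2
            best = (gain, threshold)
    return best


# Toy data: low feature values have negative residuals, high values positive.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
residuals = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
best = best_split(values, residuals)
```

On this toy data the largest gain falls between the two clusters of feature values; raising `min_gain` above that gain makes the function return `None`, which corresponds to not splitting the leaf.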

The method is further improved as follows: the step S3 includes:

step S31: inputting the test set of the step S23 into the trained PSO-LSTM and lightGBM models;

step S32: performing a weighted combination, by the error reciprocal method, of the wind speed, temperature, humidity, PM2.5 and PM10 data predicted by the PSO-LSTM and lightGBM models to obtain the prediction model, wherein the weighting process is as follows:

[formula not reproduced]

wherein the quantities appearing in the formula are: the absolute values of the errors between each model's predicted value and the true value; the predicted value of the weighted combination model; the predicted value of the PSO-LSTM model; the predicted value of the lightGBM model; and the weight coefficients of the PSO-LSTM and lightGBM models, respectively.

In another aspect, the prediction method of the ecological environment PM2.5 concentration prediction model comprises the following steps:

S101, acquiring data of temperature, humidity, wind speed, air pressure, concentration of dissolved oxygen in water, soil conductivity, PM2.5 and PM10 in an ecological environment, and normalizing the data;

S102, inputting the normalized data into the prediction model obtained by the training method to obtain the predicted PM2.5 concentration of the ecological environment.

By adopting the technical scheme, the invention has the following technical progress:

1. Through data cleaning and compression of the data transmission frequency, the invention achieves lightweight data transmission without losing data.

2. According to the invention, all particles are updated toward the global optimum by the particle swarm algorithm PSO, and the hyperparameters corresponding to the global optimum are recorded and used as the input hyperparameters of the LSTM model, which avoids becoming trapped in a local optimum.

3. The invention adjusts the parameters of the particle swarm algorithm PSO to improve the global and local optimization capability of the algorithm, and dynamically adjusts the inertia weight linearly (or nonlinearly) according to the iteration process and the flight of the particles, so as to balance global search against convergence speed.

4. According to the invention, the weighted combination of the PSO-LSTM and lightGBM models improves the precision and accuracy of prediction.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the PSO-LSTM model of the present invention;

FIG. 2 is a flow chart of the PSO-LSTM and lightGBM models of the present invention;

FIG. 3 is a diagram of a predictive model architecture of the present invention;

FIG. 4 is a diagram of a development apparatus of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention is described in further detail below with reference to the accompanying drawings:

As shown in FIGS. 1-4, the method of the present invention comprises the following steps:

The training method of the ecological environment PM2.5 concentration prediction model comprises the following steps:

As can be seen from FIG. 1, the embodiment of the present invention acquires raw data collected from an ecological environment monitoring system, obtains a training set and a test set through Pearson correlation analysis and normalization, trains a PSO-LSTM model and a lightGBM model on the training set, and weights the output values of the two models to obtain a prediction model. The method predicts changes in PM2.5 concentration through a construction dust monitoring and prediction model; by combining the PSO-LSTM and lightGBM models, it draws on both time-series features and a decision tree model to predict more accurately.

Further, raw data of temperature and humidity, wind speed, air pressure, concentration of dissolved oxygen in water, soil conductivity, PM2.5 and PM10 in the construction environment are obtained.

The integrity of the obtained raw data is checked, and abnormal values and missing values in the raw data are replaced and filled in using a smoothing function, wherein the smoothing function is as follows:

[formula not reproduced]

wherein the input is the raw value of sensors of the same type per unit time; each sensor uploads one reading every 5 minutes, so the number of readings per hour is 12, and a missing value is filled with the average of the readings in that hour;
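A minimal sketch of this hourly-mean filling follows, assuming that readings arrive as a flat, time-ordered list and that each missing or abnormal reading is replaced by the mean of the valid readings in the same hour. The function name and the `is_valid` hook are hypothetical; the text does not say how abnormal values are detected.

```python
from statistics import mean

READINGS_PER_HOUR = 12  # one reading every 5 minutes, as stated in the text


def fill_with_hourly_mean(readings, is_valid=lambda v: v is not None):
    """Replace invalid readings with the mean of the valid readings of that hour.

    Every consecutive block of 12 values is treated as one hour.
    `is_valid` is a hypothetical hook for flagging missing or abnormal values.
    """
    filled = list(readings)
    for start in range(0, len(filled), READINGS_PER_HOUR):
        block = filled[start:start + READINGS_PER_HOUR]
        valid = [v for v in block if is_valid(v)]
        if not valid:
            continue  # nothing to average in this hour; leave the gaps as they are
        hourly_mean = mean(valid)
        for offset, value in enumerate(block):
            if not is_valid(value):
                filled[start + offset] = hourly_mean
    return filled


# One hour of PM2.5 readings with a single missing value (toy numbers).
hour = [10.0, 12.0, None, 11.0, 13.0, 12.0, 10.0, 11.0, 12.0, 13.0, 11.0, 10.0]
cleaned = fill_with_hourly_mean(hour)
```

Averaging only the valid readings is a design choice of this sketch; an hour with no valid readings is left unchanged rather than guessed.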

Pearson correlation analysis is performed on the extracted ecological environment features to obtain the contribution of each feature to the PM2.5 concentration, wherein the Pearson correlation coefficient formula is as follows:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

wherein x represents the value of a parameter among temperature, humidity, wind speed, air pressure, concentration of dissolved oxygen in water, soil conductivity and PM10, y is the PM2.5 concentration value, $\bar{x}$ and $\bar{y}$ are the means of x and y, and N is the number of predicted time points in each data set;

In this embodiment, the contribution of each variable to the PM2.5 concentration is obtained from the Pearson correlation coefficient; environmental variables that are irrelevant or only weakly correlated are removed according to the size of their contribution, and the remaining variables are used as the environmental parameters for predicting PM2.5 concentration. Pearson correlation analysis of the completed data determines that the features related to PM2.5 are wind speed, temperature, humidity, PM2.5 and PM10;
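The feature screening described here can be sketched as follows. The cut-off of 0.3 on the absolute correlation is an assumption made for illustration (the text gives no threshold), the data are toy numbers, and `select_features` is a hypothetical helper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.3):
    """Keep features whose |r| with the target reaches the (assumed) threshold."""
    selected = {}
    for name, series in features.items():
        r = pearson(series, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


pm25 = [35.0, 40.0, 50.0, 45.0, 60.0]
features = {
    "pm10": [70.0, 82.0, 99.0, 91.0, 118.0],
    "wind_speed": [3.0, 2.5, 1.5, 2.0, 1.0],
    "soil_conductivity": [0.5, 0.1, 0.4, 0.2, 0.3],
}
selected = select_features(features, pm25)
```

The absolute value is used so that strongly negative relations, such as wind speed dispersing dust in this toy data, are retained alongside positive ones.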

The related features are normalized, and the normalized feature data are divided into a training set and a test set; the normalization function is maximum-minimum normalization:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

wherein $x^{*}$ is the normalized data, $x$ is the original data, $x_{\min}$ is the minimum of the original data, and $x_{\max}$ is the maximum of the original data.
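A short sketch of max-min normalization followed by the split into training and test sets is given below. The 80/20 chronological split is an assumption (the text does not state a ratio), and the function names are hypothetical.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) of one feature column."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] with (x - min) / (max - min)."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / span for v in values]


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling (the ratio is an assumption)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


pm10 = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(pm10)
scaled = min_max_scale(pm10, lo, hi)

rows = list(range(10))
train_rows, test_rows = chronological_split(rows)
```

Keeping the rows in time order avoids placing readings that follow the test period into the training set.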

Further, the LSTM neural network model comprises an input layer, hidden layers and an output layer; in this embodiment, the long short-term memory neural network model is constructed as follows:

1. establishing an LSTM neural network model with 2 hidden layers. The first layer is the input layer: the divided data set is fed in with an input dimension of 5, and the time step (time_step) is the amount of data a sensor collects in one hour; since one reading is uploaded every 5 minutes in the experimental method, the input time-series length is set to 12. The second and third layers are hidden layers (hidden_size), with 12 neurons in the second layer and 22 neurons in the third layer. The fourth layer is the output layer (output_size), which predicts the PM2.5 value, so its feature dimension is 1;
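The input shape described in item 1 (12 time steps of 5 features, one PM2.5 output) can be illustrated by building sliding windows over the time-ordered rows. The sketch assumes that the target is the PM2.5 value of the reading immediately after each window; the prediction horizon and the column order are not stated in the text, and `make_windows` is a hypothetical helper.

```python
def make_windows(rows, target_index, time_step=12):
    """Turn time-ordered feature rows into (window, next_target) pairs.

    Each window holds `time_step` consecutive rows (one hour at 5-minute
    sampling); the target is the chosen column of the row that follows.
    """
    samples = []
    for end in range(time_step, len(rows)):
        window = rows[end - time_step:end]
        target = rows[end][target_index]
        samples.append((window, target))
    return samples


# 20 toy rows with 5 features each; column 4 stands in for PM2.5.
rows = [[float(t), 0.0, 0.0, 0.0, float(t) * 2.0] for t in range(20)]
samples = make_windows(rows, target_index=4)
first_window, first_target = samples[0]
```

With 20 rows and a window of 12, eight samples result, each of shape 12 by 5, which matches the input dimension and time step given above.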

2. setting the batch size (batch_size), learning rate (learning_rate), number of training iterations (nb_epoch) and number of hidden-layer neurons (hidden_layers) of the LSTM model as hyperparameters, wherein the batch size ranges over [1,60], the learning rate over [0.001,0.01], the number of training iterations over [50,500] and the number of hidden-layer neurons over [10,100]; determining the optimal values of the hyperparameters with the PSO algorithm, and constructing the LSTM model with the optimal hyperparameters;

3. training the PSO-LSTM model by a training set, and obtaining an output value of the PSO-LSTM model by the following method:

training process: training the LSTM model by combining the inertia weight and the bias constant of a single particle to obtain each unit gate output of the PSO-LSTM model, and judging whether each unit gate output reaches an optimal solution or not and whether the iteration number reaches the maximum iteration number or not;

4. setting the number of particles in the PSO swarm to 30, wherein the inertia weight, the individual acceleration factor and the group acceleration factor of each particle are updated in real time based on the improved formulas to obtain the updated inertia weight and learning factors:

[inertia weight formula not reproduced]

wherein the quantities appearing in the formula are: the improved inertia weight, the minimum inertia weight, the maximum inertia weight, an activation function, the current iteration number, and the maximum number of iterations;

the improvement formula for the two learning factors is:

[formula not reproduced]

wherein the quantities appearing in the formula are the current iteration number and the maximum number of iterations; when a further condition (not reproduced) holds, the two learning factors take a separate form (not reproduced);

and training the LSTM model by combining the inertia weight and the bias constant of the single particle to obtain the output of each unit gate of the PSO-LSTM model, and judging whether the output of each unit gate reaches the optimal solution or not and whether the iteration number reaches the maximum iteration number or not.
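The PSO search over the LSTM hyperparameters can be sketched as below. The improved update formulas are not available in this text, so the sketch substitutes a linearly decreasing inertia weight and linearly varying learning factors; these schedules, their constants, the function names and the toy objective (a stand-in for training an LSTM and measuring its validation error) are all assumptions. The search bounds and the swarm size of 30 are taken from the text.

```python
import random

# Search ranges quoted from the text (batch size, learning rate, epochs, neurons).
BOUNDS = {
    "batch_size": (1, 60),
    "learning_rate": (0.001, 0.01),
    "nb_epoch": (50, 500),
    "hidden_units": (10, 100),
}


def pso_search(objective, bounds, n_particles=30, max_iter=50,
               w_max=0.9, w_min=0.4, seed=0):
    """Minimise `objective` over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    names = list(bounds)
    lows = [bounds[n][0] for n in names]
    highs = [bounds[n][1] for n in names]
    dim = len(names)

    pos = [[rng.uniform(lows[d], highs[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_score = [objective(dict(zip(names, p))) for p in pos]
    g_idx = min(range(n_particles), key=p_score.__getitem__)
    g_best, g_score = p_best[g_idx][:], p_score[g_idx]

    for k in range(max_iter):  # stop when the iteration budget is used up
        progress = k / max_iter
        w = w_max - (w_max - w_min) * progress  # assumed linear decay
        c1 = 2.5 - 2.0 * progress               # assumed: individual factor shrinks
        c2 = 0.5 + 2.0 * progress               # assumed: group factor grows
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Clamp the new position back into the search range.
                pos[i][d] = min(highs[d], max(lows[d], pos[i][d] + vel[i][d]))
            score = objective(dict(zip(names, pos[i])))
            if score < p_score[i]:
                p_score[i], p_best[i] = score, pos[i][:]
                if score < g_score:
                    g_score, g_best = score, pos[i][:]
    return dict(zip(names, g_best)), g_score


def toy_objective(h):
    """Stand-in for LSTM validation error; the optimum below is arbitrary."""
    target = {"batch_size": 50, "learning_rate": 0.007,
              "nb_epoch": 300, "hidden_units": 22}
    return sum(((h[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
               for k in target)


best_params, best_score = pso_search(toy_objective, BOUNDS)
```

In practice the integer-valued hyperparameters (batch size, epochs, neurons) would be rounded before the model is built, and each objective evaluation would train a network, so the iteration budget dominates the cost.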

Further, the gradient boosting decision tree model comprises an input layer, a hidden layer and an output layer; in this embodiment, the gradient boosting decision tree model is constructed as follows:

the parameters of the gradient boosting decision tree model comprise the maximum depth of the decision tree (max_depth), the minimum number of records in a leaf node (min_data_in_leaf), the learning rate (learning_rate), the maximum number of feature bins (max_bin), the number of leaf nodes (num_leaves), the number of iterations (num_boost_round), the proportion of data used in each iteration (bagging_fraction), the proportion of features randomly used in each iteration (feature_fraction) and the bagging frequency (bagging_freq), and the gradient boosting decision tree model is constructed as follows:

building a LightGBM model and defining its parameters: the maximum depth of the decision tree (max_depth) is set to 4; the minimum number of records in a leaf node (min_data_in_leaf) is set to 20; the learning rate (learning_rate) is set to 0.1; the number of leaf nodes (num_leaves) is set to 31; the maximum number of feature bins (max_bin) is set to 255; the number of iterations (num_boost_round) is set to 100; the proportion of data used in each iteration (bagging_fraction) is set to 0.8; the bagging frequency (bagging_freq) is set to 4; the proportion of features randomly used in each iteration (feature_fraction) is set to 0.8.
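For reference, the settings listed above can be collected into a parameter dictionary of the kind the `lightgbm` Python package accepts. The `objective` entry is an assumption (regression on PM2.5 concentration); the remaining values are those quoted in the text. The block only builds the dictionary and does not import the library.

```python
# Values quoted from the text; "objective" is an assumption for this sketch.
lightgbm_params = {
    "objective": "regression",
    "max_depth": 4,
    "min_data_in_leaf": 20,
    "learning_rate": 0.1,
    "num_leaves": 31,
    "max_bin": 255,
    "bagging_fraction": 0.8,
    "bagging_freq": 4,
    "feature_fraction": 0.8,
}
NUM_BOOST_ROUND = 100

# A binary tree limited to depth 4 can hold at most 2**4 = 16 leaves,
# so max_depth rather than num_leaves is the tighter limit here.
max_leaves_at_depth = 2 ** lightgbm_params["max_depth"]
depth_is_binding = max_leaves_at_depth < lightgbm_params["num_leaves"]

# With the lightgbm package installed, these would typically be passed as
#   lightgbm.train(lightgbm_params, train_set, num_boost_round=NUM_BOOST_ROUND)
```

The depth check is included because the two tree-size limits interact: with these values the depth limit, not the leaf count, bounds the size of each tree.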

Further, the gradient boosting decision tree model is trained to obtain a test network, wherein the training process is as follows:

Learning of the gradient boosting decision tree model (lightGBM) includes: defining weak models, each of which is a decision tree model; defining the model generated after a given number of iterations as the combination of that many weak models; defining the initial model f_0; and defining the ensemble of the weak learners over m iterations as a strong learner:

[formula not reproduced]

When the LSTM neural network is trained, a sigmoid function is used as the activation function of the hidden neurons, and the activation formula is as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

In the PSO-LSTM model training, the optimal values of the hyperparameters are determined by the PSO algorithm: the batch size is 50, the learning rate is 0.007, the number of training iterations is 600, the number of neurons in the second (hidden) layer is 12, and the number of neurons in the third (hidden) layer is 22; the determined batch size, learning rate, number of training iterations and numbers of hidden-layer neurons are substituted into the LSTM model;

Training parameters are set in the LSTM neural network model: the training sample is half a year of historical sensor data, set to 48384 samples; the batch size is set to 50 and the number of training cycles (epochs) is set to 100; the parameters include the number of training samples, the batch size, the learning rate and the number of epochs;

When the LSTM neural network model is trained, the parameters of the hidden layers of the LSTM model are adjusted using the Adam optimization algorithm; the exponential decay rate beta_1 of the first-moment estimate defaults to 0.9, the exponential decay rate beta_2 of the second-moment estimate defaults to 0.999, and the small constant epsilon is set to 1e-8;
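To make the roles of these settings concrete, a single Adam update step can be written out as below. The sketch treats 1e-8 as Adam's small denominator constant (an interpretation of the text) and uses 0.007, the learning rate quoted for the PSO-LSTM model, as the step size; `adam_step` is a hypothetical helper operating on plain lists.

```python
from math import sqrt


def adam_step(params, grads, m, v, t,
              lr=0.007, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """Apply one Adam update and return (new_params, new_m, new_v).

    m and v are the running first- and second-moment estimates,
    and t is the 1-based step counter used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta_1 * m_i + (1 - beta_1) * g        # first-moment estimate
        v_i = beta_2 * v_i + (1 - beta_2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta_1 ** t)              # bias-corrected moments
        v_hat = v_i / (1 - beta_2 ** t)
        new_params.append(p - lr * m_hat / (sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# First step on a single weight with gradient 2.0.
params, m, v = adam_step([1.0], [2.0], [0.0], [0.0], t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by approximately the learning rate regardless of the gradient's magnitude.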

Setting the parameters of the LightGBM model and training: the data set collected by the sensors over half a year is loaded into the model; the training set comprises 48384 samples, the number of boosting iterations (num_boost_round) is set to 20 and the sample sampling rate (bagging_fraction) is set to 0.8; training is carried out, and the construction dust monitoring and prediction model is obtained when training finishes.

After PSO-LSTM model and lightGBM model are trained and parameter tuning is carried out respectively, a test set is input into the trained PSO-LSTM and lightGBM models, and the PSO-LSTM and lightGBM models are weighted and combined by adopting a reciprocal error method to obtain a final prediction model;

the weighting process is as follows:

[formula not reproduced]

wherein the quantities appearing in the formula are: the absolute values of the errors between each model's predicted value and the true value; the predicted value of the weighted combination model; the predicted value of the PSO-LSTM model; the predicted value of the lightGBM model; and the weight coefficients of the PSO-LSTM and lightGBM models, respectively.
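One common form of the reciprocal-error method gives each model a weight proportional to the other model's error, so that the more accurate model counts for more. The sketch below assumes that form and uses the mean absolute error as the error measure; the exact formula of the invention is not reproduced in this text, and the function names and numbers are illustrative.

```python
def mean_abs_error(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(error_lstm, error_gbm):
    """Return (w_lstm, w_gbm): each weight is the other model's error share."""
    total = error_lstm + error_gbm
    if total == 0:
        return 0.5, 0.5  # both models are exact; weight them equally
    return error_gbm / total, error_lstm / total


def combine(pred_lstm, pred_gbm, w_lstm, w_gbm):
    """Weighted sum of the two models' predictions."""
    return [w_lstm * a + w_gbm * b for a, b in zip(pred_lstm, pred_gbm)]


actual = [50.0, 60.0]
pred_lstm = [52.0, 62.0]   # toy PSO-LSTM output, MAE = 2
pred_gbm = [44.0, 54.0]    # toy lightGBM output, MAE = 6

w_lstm, w_gbm = inverse_error_weights(
    mean_abs_error(pred_lstm, actual), mean_abs_error(pred_gbm, actual))
combined = combine(pred_lstm, pred_gbm, w_lstm, w_gbm)
```

The two weights sum to 1, and in this toy case the model with one third of the error receives three times the weight.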

It should be noted that the PSO-LSTM model is a long short-term memory neural network optimized by the particle swarm algorithm, and the lightGBM model is a gradient boosting decision tree model;

based on the prediction model obtained by the training method of the ecological environment PM2.5 concentration prediction model, the process of predicting the PM2.5 concentration is as follows:

S102, inputting the normalized data into a prediction model obtained by the training method of the ecological environment PM2.5 concentration prediction model, and obtaining the PM2.5 concentration prediction value of the ecological environment.
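The prediction step can be sketched end to end as below, assuming that new readings are scaled with the minimum and maximum recorded on the training data and that the combined output is mapped back to concentration units. The two models are represented by stand-in callables, since a real PSO-LSTM would consume a 12-step window rather than a single reading; every name and number here is hypothetical.

```python
def predict_pm25(reading, feature_ranges, pm25_range,
                 model_lstm, model_gbm, w_lstm, w_gbm):
    """Normalize one reading, combine two model outputs, and denormalize.

    `feature_ranges` maps each feature name to its training (min, max);
    `model_lstm` and `model_gbm` are stand-in callables returning a
    normalized PM2.5 prediction.
    """
    scaled = {name: (reading[name] - lo) / (hi - lo)
              for name, (lo, hi) in feature_ranges.items()}
    combined = w_lstm * model_lstm(scaled) + w_gbm * model_gbm(scaled)
    lo, hi = pm25_range
    return combined * (hi - lo) + lo  # back to concentration units


feature_ranges = {"pm25": (0.0, 100.0), "pm10": (0.0, 200.0)}
reading = {"pm25": 40.0, "pm10": 100.0}

# Stub models: each simply echoes one normalized input feature.
predicted = predict_pm25(
    reading, feature_ranges, pm25_range=(0.0, 100.0),
    model_lstm=lambda f: f["pm25"], model_gbm=lambda f: f["pm10"],
    w_lstm=0.75, w_gbm=0.25)
```

Reusing the training ranges keeps the inputs on the scale the models were fitted on; recomputing the minimum and maximum from new data would shift that scale.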

The deployment process of the ecological data collection and PM2.5 concentration prediction method provided by the invention is as follows:

By constructing an ecological environment monitoring platform, the functions of collecting, storing, preprocessing, monitoring and predicting ecological environment data are realized. Where a plurality of environmental variables influence PM2.5 concentration, the PM2.5 concentration is predicted with a combined model based on PSO-LSTM and LightGBM, and, for the construction environment as a whole, the dust distribution over the entire road section is calculated by interpolation from the data of each detection device.
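The interpolation method is not specified in the text. As one possibility, the sketch below assumes piecewise-linear interpolation along the road between neighbouring detection devices, holding the nearest device's value beyond the two ends; positions and concentrations are toy values and the function name is hypothetical.

```python
def interpolate_along_road(stations, position):
    """Linearly interpolate PM2.5 at `position` (metres along the road).

    `stations` is a list of (position_m, pm25) pairs sorted by position.
    Outside the covered stretch, the nearest station's value is returned.
    """
    if position <= stations[0][0]:
        return stations[0][1]
    if position >= stations[-1][0]:
        return stations[-1][1]
    for (x0, y0), (x1, y1) in zip(stations, stations[1:]):
        if x0 <= position <= x1:
            return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


stations = [(0.0, 40.0), (100.0, 60.0), (300.0, 50.0)]
midway_first = interpolate_along_road(stations, 50.0)
midway_second = interpolate_along_road(stations, 200.0)
```

Evaluating the function on a regular grid of positions yields a dust profile for the whole section; a two-dimensional site would need a spatial method such as inverse-distance weighting instead.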

Claims

1. The training method of the ecological environment PM2.5 concentration prediction model is characterized by comprising the following steps of:

2. The method for training an ecological environment PM2.5 concentration prediction model according to claim 1, wherein the step S1 comprises:

3. The method for training an ecological environment PM2.5 concentration prediction model according to claim 1, wherein the step S2 comprises:

4. The method for training an ecological environment PM2.5 concentration prediction model according to claim 3, wherein the normalization processing method in step S23 is maximum and minimum normalization.

5. The method for training the ecological environment PM2.5 concentration prediction model according to claim 1, wherein training the PSO-LSTM and lightGBM models by the training set and weighting the output values of the two models respectively, and obtaining the prediction model comprises:

6. The training method of the ecological environment PM2.5 concentration prediction model according to claim 5, which is characterized in that based on the PSO-LSTM model, in combination with adaptive adjustment of individual particles of a particle swarm algorithm PSO, the inertia weight and learning factor of the particle swarm algorithm PSO are updated according to the following formula, so as to obtain updated inertia weight and learning factor:

[inertia weight formula not reproduced]

wherein the quantities appearing in the formula are: the improved inertia weight, the minimum inertia weight, the maximum inertia weight, an activation function, the current iteration number, and the maximum number of iterations;

the improvement formula for the two learning factors is:

[formula not reproduced]

wherein the quantities appearing in the formula are the current iteration number and the maximum number of iterations; when a further condition (not reproduced) holds, the two learning factors take a separate form (not reproduced).

7. The method for training the ecological environment PM2.5 concentration prediction model according to claim 1, wherein training the PSO-LSTM and lightGBM models by the training set and weighting the output values of the two models respectively, and obtaining the prediction model comprises:

8. The method for training an ecological environment PM2.5 concentration prediction model according to claim 2, wherein the step S3 comprises:

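[weighting formula not reproduced]

wherein the quantities appearing in the formula are: the absolute values of the errors between each model's predicted value and the true value; the predicted value of the weighted combination model; the predicted value of the PSO-LSTM model; the predicted value of the lightGBM model; and the weight coefficients of the PSO-LSTM and lightGBM models, respectively.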

9. The prediction method of the ecological environment PM2.5 concentration prediction model is characterized by comprising the following steps of:

S102, inputting the normalized data into a prediction model obtained by the training method of the ecological environment PM2.5 concentration prediction model according to any one of claims 1-8, and obtaining the ecological environment PM2.5 concentration predicted value.