CN116227748A - Training method and prediction method of ecological environment PM2.5 concentration prediction model - Google Patents

Info

Publication number
CN116227748A
Authority
CN
China
Prior art keywords
training
model
pso
lstm
lightgbm
Prior art date
Legal status
Pending
Application number
CN202310509941.7A
Other languages
Chinese (zh)
Inventor
王兴举
李彦伟
王志斌
冯雷
杜群乐
郭猛
申大为
Current Assignee
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University
Priority to CN202310509941.7A
Publication of CN116227748A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/20 Air quality improvement or preservation, e.g. vehicle emission control or emission reduction by using catalytic converters

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Biophysics (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and a prediction method for an ecological environment PM2.5 concentration prediction model, belonging to the technical field of environmental monitoring. The training method comprises the following steps: S1, acquiring raw data from an ecological environment monitoring system; S2, selecting features from the raw data by Pearson correlation analysis and normalizing them to obtain a training set and a test set; S3, training a PSO-LSTM model and a LightGBM model on the training set and weighting the output values of the two models to obtain the prediction model. This addresses technical problems such as the low accuracy of existing ecological environment data and inaccurate monitoring and prediction.

Description

Training method and prediction method of ecological environment PM2.5 concentration prediction model
Technical Field
The invention relates to the technical field of construction ecological monitoring, in particular to a training method and a prediction method of an ecological environment PM2.5 concentration prediction model.
Background
The construction and protection of the ecological environment in construction areas have attracted great attention. As the number of construction projects grows, the ecological environment is deteriorating rapidly, as shown by groundwater pollution, dust emission and associated extreme weather, and pollutants such as PM2.5 have many adverse effects on human health, so monitoring and controlling the ecological environment during construction is necessary. Because ecological environment monitoring spans many fields and covers a large area, machine learning is used to improve monitoring technology when ecological data are monitored and processed: it enables environmental prediction and early warning, allows abnormal values to be detected more reliably from historical data, and lets relevant measures be taken in time to curb environmental pollution, reducing the damage construction does to the environment.
As all sectors of society pay increasing attention to environmental protection, and in particular to the prevention and control of air pollution, new technologies based mainly on machine learning have begun to be applied in this area. Methods based on Support Vector Machines (SVM), Particle Swarm Optimization (PSO), Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN) and other neural network learning approaches are used to study the relationship between air pollutant concentrations and other environmental factors, and to monitor and predict dust concentration.
The technical problem to be solved by the invention is to provide an ecological concentration prediction method. Existing time-series prediction models have limited accuracy on complex, nonlinear data; their flexibility is limited and they cannot handle continuously changing data patterns, so a method that adapts dynamically to changing data patterns is needed to improve prediction accuracy. In addition, the model must process large datasets with heavy computation, and some existing time-series prediction models have a high computational cost, so feature selection and optimization of the input dataset are needed to reduce both the amount of computation and the input dimensionality. There is therefore a need in the art for a PM2.5 prediction method that addresses these problems.
Disclosure of Invention
The invention provides a training method and a prediction method for an ecological environment PM2.5 concentration prediction model. By using machine learning to monitor and process ecological data effectively, it improves ecological environment monitoring technology at its root and provides environmental prediction and early-warning functions; abnormal values are detected more reliably from historical data, so that measures to curb environmental pollution can be taken in time and the damage construction does to the environment is reduced.
To achieve this purpose, the invention provides the following technical solution:
In one aspect, a training method for an ecological environment PM2.5 concentration prediction model comprises the following steps:
S1, acquiring raw data from an ecological environment monitoring system;
S2, performing feature selection on the raw data based on Pearson correlation analysis, and normalizing the selected features to obtain a training set and a test set;
S3, training the PSO-LSTM model and the LightGBM model on the training set, and weighting the output values of the two models to obtain a prediction model.
As a further improvement, step S1 comprises:
acquiring raw data on temperature and humidity, wind speed, air pressure, dissolved-oxygen concentration in water, soil conductivity, PM2.5 and PM10 in the construction environment.
As a further improvement, step S2 comprises:
Step S21: checking the integrity of the raw data obtained in step S1, and filling in abnormal values and missing values in that data with a smoothing algorithm;
Step S22: performing Pearson correlation analysis on the data completed in step S21 to determine the features related to PM2.5, namely wind speed, temperature, humidity, PM2.5 and PM10;
Step S23: normalizing the related features, and dividing the normalized feature data into a training set and a test set.
As a further improvement, the normalization in step S23 is min-max normalization.
As a further improvement, training the PSO-LSTM model and the LightGBM model on the training set, and weighting their output values to obtain the prediction model, comprises the following steps:
training the PSO-LSTM model with the training set, and obtaining its output value as follows:
initializing the inertia weight and bias constants of each single particle of the particle swarm algorithm PSO to obtain initial weights and bias constants, wherein the initial weights comprise the input-gate weight, the forget-gate weight and the output-gate weight, and the bias constants comprise the input-gate bias, the forget-gate bias and the output-gate bias;
training process: training the LSTM model with each particle's inertia weight and bias constants to obtain the output of each gate unit of the PSO-LSTM model, and checking whether the gate outputs have reached the optimal solution and whether the number of iterations has reached the maximum.
As a further improvement, in the PSO-LSTM model each particle of the particle swarm algorithm PSO is adjusted adaptively: the inertia weight and learning factors of PSO are updated with the following formulas to obtain the updated inertia weight and learning factors.
The improved inertia weight is given by:
Figure SMS_1
where Figure SMS_2 is the improved inertia weight, Figure SMS_3 is the minimum inertia weight, Figure SMS_4 is the maximum inertia weight, Figure SMS_5 is an activation function, Figure SMS_6 is the current iteration number, and Figure SMS_7 is the maximum number of iterations.
The improved learning factors Figure SMS_8 and Figure SMS_9 are given by:
Figure SMS_10
where Figure SMS_11 is the current iteration number and Figure SMS_12 is the maximum number of iterations.
When Figure SMS_13 holds, the learning factors Figure SMS_14 and Figure SMS_15 are:
Figure SMS_16
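The inertia-weight and learning-factor formulas survive only as image placeholders (Figure SMS_1 and Figure SMS_10), so the sketch below does not reproduce them. It is a hypothetical illustration of the same idea: an inertia weight that decreases from a maximum to a minimum through a sigmoid activation of the iteration progress, and learning factors that change with the iteration count. The sigmoid shape, the bounds w_min = 0.4 and w_max = 0.9, the steepness, and the time-varying acceleration coefficient scheme (c1 falling from 2.5 to 0.5 while c2 rises from 0.5 to 2.5) are all assumptions, not values from the patent.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def inertia_weight(t: int, t_max: int, w_min: float = 0.4, w_max: float = 0.9,
                   steepness: float = 10.0) -> float:
    """Hypothetical sigmoid-shaped schedule, NOT the patent's Figure SMS_1.

    w stays close to w_max early (wide exploration) and falls towards w_min
    late in the run (local refinement); the switch is centred at t = t_max / 2.
    """
    progress = t / t_max
    return w_min + (w_max - w_min) * (1.0 - sigmoid(steepness * (progress - 0.5)))


def learning_factors(t: int, t_max: int, c_high: float = 2.5,
                     c_low: float = 0.5) -> tuple:
    """Time-varying acceleration coefficients (a stand-in for Figure SMS_10).

    c1 (pull towards the particle's own best) shrinks from c_high to c_low,
    while c2 (pull towards the swarm best) grows from c_low to c_high, so the
    search shifts from exploration to convergence.
    """
    progress = t / t_max
    c1 = c_high - (c_high - c_low) * progress
    c2 = c_low + (c_high - c_low) * progress
    return c1, c2


if __name__ == "__main__":
    for t in (0, 25, 50, 75, 100):
        c1, c2 = learning_factors(t, 100)
        print(f"t={t:3d}  w={inertia_weight(t, 100):.3f}  c1={c1:.2f}  c2={c2:.2f}")
```

At the midpoint the sigmoid equals 0.5, so w is exactly halfway between w_min and w_max; the condition in Figure SMS_13 is not modelled here.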
As a further improvement, training the PSO-LSTM model and the LightGBM model on the training set, and weighting their output values to obtain the prediction model, further comprises:
training the LightGBM model with the training set to obtain its initial training result, specifically:
determining a first optimal split point of the LightGBM model from the training set and a gain formula;
generating the leaf split points of an initial LightGBM model from the first optimal split point;
determining the leaf split point with the maximum gain according to a preset split threshold, and generating the decision trees of the initial LightGBM model from that maximum-gain point and the leaf split points;
configuring the initial LightGBM model with the decision trees to obtain the trained LightGBM model.
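The patent refers to a gain formula for choosing split points without stating it. As an assumption, the sketch below uses the second-order split gain common to gradient boosting libraries such as XGBoost and LightGBM, with gradient sums G and hessian sums H for the left and right children and an L2 term lambda. The min_gain argument plays the role of the preset split threshold (LightGBM's corresponding parameter is min_gain_to_split). LightGBM itself scans histogram bins and grows trees leaf-wise; this exact scan over one feature is a simplification.

```python
from typing import List, Optional, Tuple


def split_gain(g_left: float, h_left: float, g_right: float, h_right: float,
               reg_lambda: float = 1.0) -> float:
    """Second-order gain of splitting one node into left/right children."""
    def score(g: float, h: float) -> float:
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split(x: List[float], grad: List[float], hess: List[float],
               min_gain: float = 0.0,
               reg_lambda: float = 1.0) -> Optional[Tuple[float, float]]:
    """Exact scan of one feature; returns (threshold, gain) or None.

    Candidate splits whose gain does not exceed min_gain are rejected.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    g_total, h_total = sum(grad), sum(hess)
    g_left = h_left = 0.0
    best: Optional[Tuple[float, float]] = None
    for k in range(len(order) - 1):
        i, nxt = order[k], order[k + 1]
        g_left += grad[i]
        h_left += hess[i]
        if x[i] == x[nxt]:
            continue  # no threshold separates equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left,
                          reg_lambda)
        if gain > min_gain and (best is None or gain > best[1]):
            best = ((x[i] + x[nxt]) / 2.0, gain)
    return best


if __name__ == "__main__":
    # Squared-error loss 0.5 * (pred - y)^2 with every prediction at 0:
    # gradient = pred - y, hessian = 1.
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0.0, 0.0, 10.0, 10.0]
    grad = [0.0 - yi for yi in y]
    hess = [1.0] * len(y)
    print(best_split(x, grad, hess))
```

In the example, the split between 2.0 and 3.0 separates the low and high targets and receives the largest gain (80/3).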
As a further improvement, step S3 comprises:
Step S31: inputting the test set of step S23 into the trained PSO-LSTM and LightGBM models;
Step S32: combining the wind speed, temperature, humidity, PM2.5 and PM10 data predicted by the PSO-LSTM and LightGBM models with weights determined by the error reciprocal method to obtain the prediction model, where the weighting is:
Figure SMS_17
where Figure SMS_18 and Figure SMS_19 are the absolute errors between the predicted and true values of the respective models, Figure SMS_20 is the predicted value of the weighted combination model, Figure SMS_21 is the predicted value of PSO-LSTM, Figure SMS_22 is the predicted value of the LightGBM model, and Figure SMS_23 and Figure SMS_24 are the weight coefficients of the PSO-LSTM and LightGBM models, respectively.
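The weighting formula itself is only an image (Figure SMS_17). The sketch below shows the usual form of the error reciprocal method, assuming each model's weight is proportional to the reciprocal of its error and the two weights sum to 1, which for two models gives w1 = e2 / (e1 + e2) and w2 = e1 / (e1 + e2). Aggregating the absolute errors as a mean absolute error over the evaluation data is also an assumption; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_abs_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean absolute error between predictions and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def reciprocal_error_weights(e1: float, e2: float) -> Tuple[float, float]:
    """Weights proportional to 1/e_i, normalised to sum to 1.

    (1/e1) / (1/e1 + 1/e2) simplifies to e2 / (e1 + e2), so the model with
    the smaller error receives the larger weight.
    """
    total = e1 + e2
    if total == 0.0:
        return 0.5, 0.5  # both models are perfect: split evenly
    return e2 / total, e1 / total


def combine(pred_lstm: Sequence[float], pred_lgbm: Sequence[float],
            w1: float, w2: float) -> List[float]:
    """Weighted combination y = w1 * y_lstm + w2 * y_lgbm."""
    return [w1 * a + w2 * b for a, b in zip(pred_lstm, pred_lgbm)]


if __name__ == "__main__":
    truth = [10.0, 20.0, 30.0]
    pred_lstm = [11.0, 19.0, 31.0]   # MAE 1
    pred_lgbm = [13.0, 17.0, 33.0]   # MAE 3
    w1, w2 = reciprocal_error_weights(mean_abs_error(pred_lstm, truth),
                                      mean_abs_error(pred_lgbm, truth))
    print(w1, w2, combine(pred_lstm, pred_lgbm, w1, w2))
```

With errors of 1 and 3, the PSO-LSTM output receives weight 0.75 and the LightGBM output 0.25.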
In another aspect, a prediction method using the ecological environment PM2.5 concentration prediction model comprises the following steps:
S101, acquiring temperature and humidity, wind speed, air pressure, dissolved-oxygen concentration in water, soil conductivity, PM2.5 and PM10 data from the ecological environment, and normalizing the data;
S102, inputting the normalized data into the prediction model obtained by the above training method to obtain the predicted PM2.5 concentration of the ecological environment.
By adopting the above technical solution, the invention achieves the following technical advances:
1. Data cleaning and compression of the data transmission frequency allow lightweight data transmission without data loss.
2. The particle swarm algorithm PSO updates all particles towards the global optimum and records the hyperparameters corresponding to the global optimum as the input hyperparameters of the LSTM model, which avoids the particles becoming trapped in a local optimum.
3. Adjusting the parameters of the particle swarm algorithm PSO improves its global and local search ability; the inertia weight is adjusted dynamically, linearly or nonlinearly, according to the iteration progress and the particles' flight, to balance global search against convergence speed.
4. The weighted combination of the PSO-LSTM and LightGBM models improves the precision and accuracy of the prediction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the PSO-LSTM model of the present invention;
FIG. 2 is a flow chart of the PSO-LSTM and lightGBM models of the present invention;
FIG. 3 is a diagram of a predictive model architecture of the present invention;
FIG. 4 is a diagram of a development apparatus of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in further detail below with reference to the accompanying drawings:
As shown in FIGS. 1 to 4, the training method of the ecological environment PM2.5 concentration prediction model comprises the following steps:
S1, acquiring raw data from an ecological environment monitoring system;
S2, performing feature selection on the raw data based on Pearson correlation analysis, and normalizing the selected features to obtain a training set and a test set;
S3, training the PSO-LSTM model and the LightGBM model on the training set, and weighting the output values of the two models to obtain a prediction model.
As shown in FIG. 1, in this embodiment raw data are acquired from an ecological environment monitoring system, a training set and a test set are obtained through Pearson correlation analysis and normalization, the PSO-LSTM and LightGBM models are trained on the training set, and their output values are weighted to obtain the prediction model. The method predicts changes in PM2.5 concentration with a construction dust monitoring and prediction model that combines the strengths of the two models: the PSO-LSTM model captures time-series features and the LightGBM model contributes decision-tree learning, which together give better predictions.
Further, raw data on temperature and humidity, wind speed, air pressure, dissolved-oxygen concentration in water, soil conductivity, PM2.5 and PM10 in the construction environment are acquired.
The integrity of the raw data is checked, and abnormal values and missing values are replaced with the output of a smoothing function:
Figure SMS_25
where Figure SMS_26 is the raw value from sensors of the same type within the unit time (each sensor uploads one reading every 5 minutes), and Figure SMS_27 is 12, i.e. a missing value is replaced by the average of the readings in its hour;
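The smoothing step above can be sketched as an hourly-mean fill: readings arrive every 5 minutes, so each hour holds 12 values, and a missing or abnormal reading is replaced by the mean of the valid readings in the same hour. How abnormal values are detected is not stated in the text; the sketch assumes a plausible range (lower, upper) supplied by the caller, and assumes the series starts on the hour. The function names are hypothetical.

```python
import math
from typing import List, Optional

READINGS_PER_HOUR = 12  # one reading every 5 minutes


def is_bad(v: Optional[float], lower: Optional[float],
           upper: Optional[float]) -> bool:
    """Missing (None/NaN) or outside the assumed plausible range."""
    if v is None or math.isnan(v):
        return True
    if lower is not None and v < lower:
        return True
    return upper is not None and v > upper


def fill_hourly_mean(values: List[Optional[float]],
                     lower: Optional[float] = None,
                     upper: Optional[float] = None) -> List[Optional[float]]:
    """Replace bad readings by the mean of the valid readings in the same hour."""
    out = list(values)
    for start in range(0, len(out), READINGS_PER_HOUR):
        block = out[start:start + READINGS_PER_HOUR]
        good = [v for v in block if not is_bad(v, lower, upper)]
        if not good:
            continue  # whole hour invalid: left for a wider-window repair
        hour_mean = sum(good) / len(good)
        for k, v in enumerate(block):
            if is_bad(v, lower, upper):
                out[start + k] = hour_mean
    return out
```

An hour with no valid readings is left unchanged, since the text does not say how such gaps are handled.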
Pearson correlation analysis is then applied to the extracted ecological environment features to obtain the contribution of each feature to the PM2.5 concentration. The Pearson correlation coefficient is:
$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$
where x is the value of one of temperature, humidity, wind speed, air pressure, dissolved-oxygen concentration in water, soil conductivity or PM10, y is the PM2.5 concentration, $\bar{x}$ and $\bar{y}$ are their means, and N is the number of time points in each data set;
in this embodiment, the contribution of each variable to the PM2.5 concentration is obtained from the Pearson correlation coefficient; environmental variables that are uncorrelated or only weakly correlated are removed according to the magnitude of their contribution, and the remaining variables are used as environmental parameters for predicting the PM2.5 concentration. Pearson correlation analysis of the completed data identifies wind speed, temperature, humidity, PM2.5 and PM10 as the features related to PM2.5;
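The correlation-based selection above can be sketched as follows: compute Pearson's r between each candidate feature and the PM2.5 series, then keep features whose absolute correlation reaches a cut-off. The text does not give the cut-off; threshold = 0.3 is an assumed value, and the function names are hypothetical. The patent also keeps PM2.5's own history as an input, which is not scored here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient over N paired time points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)


def select_features(features: Dict[str, Sequence[float]],
                    target: Sequence[float],
                    threshold: float = 0.3) -> Tuple[List[str], Dict[str, float]]:
    """Keep features whose |r| with the target reaches the (assumed) threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores
```

Using the absolute value keeps strongly negative correlations (for example wind speed dispersing dust) as well as positive ones.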
The related features are normalized by min-max normalization, and the normalized feature data are divided into a training set and a test set. The normalization function is:
$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
where $x^{*}$ is the normalized data, $x$ is the raw data, $x_{\min}$ is the minimum of the raw data and $x_{\max}$ is the maximum of the raw data.
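A minimal sketch of the normalization and split, assuming a chronological split (the data are a time series, so no shuffling), an 80/20 ratio, and min/max statistics taken from the training portion only so the same values can be reused on the test set and on new data in step S101. None of these three choices is specified in the text; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], train_ratio: float = 0.8):
    """Earlier part for training, later part for testing (no shuffling)."""
    cut = int(len(series) * train_ratio)
    return list(series[:cut]), list(series[cut:])


def minmax(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """x* = (x - x_min) / (x_max - x_min); a constant column maps to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def normalize_train_test(series: Sequence[float], train_ratio: float = 0.8
                         ) -> Tuple[List[float], List[float], Tuple[float, float]]:
    train, test = chronological_split(series, train_ratio)
    lo, hi = min(train), max(train)   # statistics from the training part only
    return minmax(train, lo, hi), minmax(test, lo, hi), (lo, hi)


def denormalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map normalised predictions back to physical units (e.g. ug/m3)."""
    return [v * (hi - lo) + lo for v in values]
```

Because the statistics come from the training part, later values can fall outside [0, 1]; that is expected and keeps test data from leaking into training.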
Further, the LSTM neural network model comprises an input layer, hidden layers and an output layer; in this embodiment, the long short-term memory network is constructed as follows:
1. An LSTM neural network model with 2 hidden layers is established. The first layer is the input layer: the divided data set is fed in with an input dimension of 5, and the time step (time_step) is the amount of data the sensor collects in one hour; since one reading is updated every 5 minutes in this experiment, the input sequence length is set to 12. The second and third layers are hidden layers (hidden_size) with 12 and 22 neurons, respectively. The fourth layer is the output layer (output_size), which predicts the PM2.5 value, so its feature dimension is 1;
2. The batch size (batch_size), learning rate (learning_rate), number of training iterations (nb_epoch) and number of hidden-layer neurons (hidden_layers) of the LSTM model are treated as hyperparameters, with value ranges of [1, 60] for the batch size, [0.001, 0.01] for the learning rate, [50, 500] for the number of training iterations and [10, 100] for the number of hidden-layer neurons; the optimal hyperparameter values are determined by the PSO algorithm, and the LSTM model is built with them;
3. The PSO-LSTM model is trained with the training set, and its output value is obtained as follows:
the inertia weight and bias constants of each single PSO particle are initialized to obtain initial weights and bias constants, wherein the initial weights comprise the input-gate, forget-gate and output-gate weights, and the bias constants comprise the input-gate, forget-gate and output-gate biases;
training process: the LSTM model is trained with each particle's inertia weight and bias constants to obtain the output of each gate unit of the PSO-LSTM model, and it is checked whether the gate outputs have reached the optimal solution and whether the number of iterations has reached the maximum;
4. The PSO swarm is set to 30 particles; the inertia weight, individual acceleration factor and social (group) acceleration factor of each particle are updated in real time with the improved formulas below to obtain the updated inertia weight and learning factors.
The improved inertia weight is given by:
Figure SMS_34
where Figure SMS_35 is the improved inertia weight, Figure SMS_36 is the minimum inertia weight, Figure SMS_37 is the maximum inertia weight, Figure SMS_38 is an activation function, Figure SMS_39 is the current iteration number, and Figure SMS_40 is the maximum number of iterations.
The improved learning factors Figure SMS_41 and Figure SMS_42 are given by:
Figure SMS_43
where Figure SMS_44 is the current iteration number and Figure SMS_45 is the maximum number of iterations.
When Figure SMS_46 holds, the learning factors Figure SMS_47 and Figure SMS_48 are:
Figure SMS_49
With the updated inertia weight and learning factors, the LSTM model is trained again with each particle's inertia weight and bias constants to obtain the output of each gate unit of the PSO-LSTM model, and iteration stops once the gate outputs reach the optimal solution or the number of iterations reaches the maximum.
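The hyperparameter search in steps 2 to 4 can be sketched as a global-best PSO over the four ranges listed in step 2, with 30 particles. The fitness function is a placeholder: in practice it would train the LSTM with the decoded hyperparameters and return a validation error, which is far too costly for a short example, so a toy quadratic stands in. The linearly decreasing inertia weight, c1 = c2 = 2.0, the velocity clamp and the rounding of integer hyperparameters are assumptions, not the patent's improved formulas; the names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# (name, low, high, is_integer), ranges as listed in step 2
SEARCH_SPACE = [
    ("batch_size", 1, 60, True),
    ("learning_rate", 0.001, 0.01, False),
    ("nb_epoch", 50, 500, True),
    ("hidden_units", 10, 100, True),
]

Params = Dict[str, float]


def decode(position: List[float]) -> Params:
    """Clamp a particle position to the box and round integer hyperparameters."""
    params: Params = {}
    for value, (name, lo, hi, is_int) in zip(position, SEARCH_SPACE):
        v = min(max(value, lo), hi)
        params[name] = int(round(v)) if is_int else v
    return params


def pso_search(fitness: Callable[[Params], float], n_particles: int = 30,
               n_iter: int = 20, w_max: float = 0.9, w_min: float = 0.4,
               c1: float = 2.0, c2: float = 2.0,
               seed: int = 0) -> Tuple[Params, float]:
    """Minimise fitness(params) with a global-best PSO; returns (params, value)."""
    rng = random.Random(seed)
    lows = [s[1] for s in SEARCH_SPACE]
    highs = [s[2] for s in SEARCH_SPACE]
    vmax = [0.2 * (h - l) for l, h in zip(lows, highs)]  # velocity clamp
    dim = len(SEARCH_SPACE)
    pos = [[rng.uniform(lows[d], highs[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(decode(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(n_iter):
        # linearly decreasing inertia weight (a stand-in for the improved formula)
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = min(max(v, -vmax[d]), vmax[d])
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lows[d]), highs[d])
            val = fitness(decode(pos[i]))
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return decode(gbest), gbest_val


if __name__ == "__main__":
    # Toy stand-in for "train the LSTM and return validation error".
    target = {"batch_size": 32, "learning_rate": 0.005,
              "nb_epoch": 200, "hidden_units": 40}

    def toy_fitness(p: Params) -> float:
        return sum(((p[n] - target[n]) / (hi - lo)) ** 2
                   for n, lo, hi, _ in SEARCH_SPACE)

    print(pso_search(toy_fitness))
```

Positions are kept continuous and only rounded when decoded, so integer hyperparameters such as batch_size can still move smoothly during the search.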
Further, the gradient boosting decision tree model comprises an input layer, a hidden layer and an output layer; in this embodiment, the gradient boosting decision tree model is constructed as follows:
the parameters of the gradient boosting decision tree model comprise the maximum tree depth (max_depth), the minimum number of records in a leaf node (min_data_in_leaf), the learning rate (learning_rate), the maximum number of feature bins (max_bin), the number of leaf nodes (num_leaves), the number of iterations (num_boost_round), the fraction of data used in each iteration (bagging_fraction), the bagging frequency (bagging_freq) and the fraction of features randomly used in each iteration (feature_fraction); the model is constructed as follows:
a LightGBM model is built with the following parameters: the maximum tree depth (max_depth) is set to 4; the minimum number of records in a leaf node (min_data_in_leaf) is set to 20; the learning rate (learning_rate) is set to 0.1; the number of leaf nodes (num_leaves) is set to 31; the maximum number of feature bins (max_bin) is set to 255; the number of iterations (num_boost_round) is set to 100; the fraction of data used in each iteration (bagging_fraction) is set to 0.8; the bagging frequency (bagging_freq) is set to 4; and the fraction of features randomly used in each iteration (feature_fraction) is set to 0.8.
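The parameter list above can be written as a params dictionary of the kind the lightgbm Python package accepts; the parameter names below are real LightGBM parameters. The objective is not given in the text, so "regression" (L2 loss) is an assumption for a continuous PM2.5 target, and NUM_BOOST_ROUND follows this paragraph's value of 100. The helper effective_max_leaves is illustrative: a tree limited to depth 4 can hold at most 2^4 = 16 leaves, so num_leaves = 31 never becomes the binding limit under these settings.

```python
from typing import Any, Dict

# Values from the embodiment; "objective" is an assumption (not in the text).
LGBM_PARAMS: Dict[str, Any] = {
    "objective": "regression",
    "max_depth": 4,
    "min_data_in_leaf": 20,
    "learning_rate": 0.1,
    "num_leaves": 31,
    "max_bin": 255,
    "bagging_fraction": 0.8,
    "bagging_freq": 4,
    "feature_fraction": 0.8,
}
NUM_BOOST_ROUND = 100


def effective_max_leaves(params: Dict[str, Any]) -> int:
    """Largest leaf count a tree can reach: min(num_leaves, 2 ** max_depth).

    LightGBM treats max_depth <= 0 as "no depth limit".
    """
    depth = params.get("max_depth", -1)
    if depth <= 0:
        return params["num_leaves"]
    return min(params["num_leaves"], 2 ** depth)


# Reference training call (needs the lightgbm package and real data):
#   import lightgbm as lgb
#   booster = lgb.train(LGBM_PARAMS, lgb.Dataset(X_train, label=y_train),
#                       num_boost_round=NUM_BOOST_ROUND)
#   y_pred = booster.predict(X_test)
```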
Further, the gradient boosting decision tree model is trained to obtain the test network, as follows:
determining a first optimal split point of the LightGBM model from the training set and a gain formula;
generating the leaf split points of an initial LightGBM model from the first optimal split point;
determining the leaf split point with the maximum gain according to a preset split threshold, and generating the decision trees of the initial LightGBM model from that maximum-gain point and the leaf split points;
configuring the initial LightGBM model with the decision trees to obtain the trained LightGBM model.
Learning in the gradient boosting decision tree model (LightGBM) proceeds as follows. A weak model Figure SMS_50 is defined, where the weak model is a decision tree model; Figure SMS_51 denotes the model generated after Figure SMS_52 iterations, which consists of Figure SMS_53 weak models; the initial model f_0 is defined as Figure SMS_54; and the strong learner that integrates the weak learners over m iterations is:
Figure SMS_55
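Since the strong-learner formula is only an image (Figure SMS_55), the sketch below shows the generic additive form of gradient boosting it describes: start from an initial model f_0 and add one weak decision tree per iteration, F_m = F_(m-1) + learning_rate * h_m. It assumes squared loss (so each tree is fitted to the current residuals), f_0 = mean of the targets, and depth-1 trees (stumps) on a single feature instead of LightGBM's leaf-wise histogram trees. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(x: Sequence[float], residual: Sequence[float]) -> Stump:
    """Least-squares depth-1 regression tree on a single feature."""
    best = None
    values = sorted(set(x))
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2.0
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (thr, lv, rv))
    if best is None:  # every x is identical: fall back to a constant
        mean = sum(residual) / len(residual)
        return (x[0], mean, mean)
    return best[1]


def stump_predict(stump: Stump, xi: float) -> float:
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def boost(x, y, n_rounds: int = 50, learning_rate: float = 0.1):
    """F_0 = mean(y); F_m = F_(m-1) + learning_rate * h_m, h_m fitted to residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss 0.5 * (y - F)^2 the negative gradient is the residual.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, xi)
                for pi, xi in zip(pred, x)]
    return f0, learning_rate, stumps


def boost_predict(model, xi: float) -> float:
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, xi) for s in stumps)
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one tree fitting the data at once.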
When the LSTM neural network is trained, the sigmoid function is used as the activation function of the hidden neurons:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
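To show where the sigmoid activation acts, the sketch below implements one step of a standard LSTM cell: the input, forget and output gates use the sigmoid, and the candidate cell state uses tanh by convention. W, U and b hold the input-gate, forget-gate and output-gate weights and biases that the PSO particles are initialized with in the method. Stacking the four blocks in the order [input, forget, candidate, output] is an assumption of this sketch, and the example dimensions (5 features, 12 hidden units, 12 time steps) follow the embodiment.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic activation 1 / (1 + e^-z), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked as [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def run_sequence(x_seq, W, U, b):
    """Feed a (time_step, D) window through the cell; return the last hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x_seq:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, H, T = 5, 12, 12   # 5 input features, 12 hidden units, 12 time steps
    W = rng.normal(0.0, 0.1, (4 * H, D))
    U = rng.normal(0.0, 0.1, (4 * H, H))
    b = np.zeros(4 * H)
    print(run_sequence(rng.random((T, D)), W, U, b).shape)
```

Because the output gate lies in (0, 1) and tanh in (-1, 1), every hidden-state component stays strictly between -1 and 1.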
In PSO-LSTM training, the optimal hyperparameter values determined by the PSO algorithm are: batch size 50, learning rate 0.007, 600 training iterations, 12 neurons in the second (hidden) layer and 22 neurons in the third (hidden) layer; these batch size, learning rate, training iteration and hidden-neuron values are used to build the LSTM model;
The training parameters of the LSTM neural network model are set as follows: the training samples are half a year of historical sensor data, 48384 samples; the batch size is set to 50 and the number of training cycles (epochs) to 100. These parameters comprise the number of training samples, the batch size, the learning rate and the number of epochs;
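The shapes implied by these settings can be sketched as follows: each training sample is a window of 12 consecutive 5-minute readings (one hour) of the selected features, and windows are grouped into mini-batches of 50. Predicting the reading immediately after each window (a one-step-ahead horizon) is an assumption, since the text does not state the forecast horizon; the function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

TIME_STEP = 12   # 12 readings x 5 min = 1 hour of history per sample
BATCH_SIZE = 50


def make_windows(features: Sequence[Sequence[float]], target: Sequence[float],
                 time_step: int = TIME_STEP):
    """X[k] = feature rows k .. k+time_step-1; y[k] = target at k+time_step."""
    X: List[List[List[float]]] = []
    y: List[float] = []
    for k in range(len(features) - time_step):
        X.append([list(row) for row in features[k:k + time_step]])
        y.append(target[k + time_step])
    return X, y


def batches(X, y, batch_size: int = BATCH_SIZE) -> Iterator[Tuple[list, list]]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# With the 48384 readings of the text: 48384 - 12 = 48372 windows, which
# gives 967 full batches of 50 plus one final batch of 22 per epoch.
```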
When the LSTM neural network model is trained, the hidden-layer parameters are adjusted with the Adam optimization algorithm, with the exponential decay rate of the first-moment estimate beta_1 at its default of 0.9, the exponential decay rate of the second-moment estimate beta_2 at its default of 0.999, and the numerical-stability constant epsilon set to 1e-8;
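A scalar sketch of the Adam update with the stated constants shows the role of each one: beta_1 and beta_2 control the moving averages of the gradient and squared gradient, the bias corrections compensate for starting those averages at zero, and epsilon only guards the division. The learning rate is a separate setting; the default here is the PSO-selected value of 0.007 from the text. Minimising a one-dimensional quadratic is just a demonstration, and the function name is hypothetical.

```python
import math
from typing import Callable


def adam_minimize(grad_fn: Callable[[float], float], theta0: float,
                  lr: float = 0.007, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 1000) -> float:
    """Scalar Adam (Kingma and Ba); eps only guards the division."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)             # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


if __name__ == "__main__":
    # Minimise (theta - 3)^2, whose gradient is 2 * (theta - 3).
    print(adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0, lr=0.01, steps=5000))
```

On the first step m_hat / sqrt(v_hat) equals the sign of the gradient, so the parameter moves by almost exactly lr regardless of the gradient's scale.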
The LightGBM model parameters are set and the model is trained: half a year of data collected by the sensors is loaded into the model, the training set contains 48384 samples, the number of boosting iterations (num_boost_round) is set to 20 and the sample sampling rate (bagging_fraction) to 0.8; after training, the construction dust monitoring and prediction model is obtained.
After the PSO-LSTM and LightGBM models have each been trained and tuned, the test set is input into both trained models, and their outputs are combined with weights determined by the error reciprocal method to obtain the final prediction model.
The weighting is:
Figure SMS_57
where Figure SMS_58 and Figure SMS_59 are the absolute errors between the predicted and true values of the respective models, Figure SMS_60 is the predicted value of the weighted combination model, Figure SMS_61 is the predicted value of PSO-LSTM, Figure SMS_62 is the predicted value of the LightGBM model, and Figure SMS_63 and Figure SMS_64 are the weight coefficients of the PSO-LSTM and LightGBM models, respectively.
Note that the PSO-LSTM model is a long short-term memory neural network optimized by the particle swarm algorithm, and the LightGBM model is a gradient boosting decision tree model.
Using the prediction model obtained by the training method of the ecological environment PM2.5 concentration prediction model, PM2.5 concentration is predicted as follows:
S101, acquiring temperature and humidity, wind speed, air pressure, dissolved-oxygen concentration in water, soil conductivity, PM2.5 and PM10 data from the ecological environment, and normalizing the data;
S102, inputting the normalized data into the prediction model obtained by the training method, and obtaining the predicted PM2.5 concentration of the ecological environment.
The ecological data collection and PM2.5 concentration prediction method provided by the invention is deployed as follows:
an ecological environment monitoring platform is built to collect, store, preprocess, monitor and predict ecological environment data; because multiple environmental variables influence the PM2.5 concentration, PM2.5 is predicted with a combined model based on PSO-LSTM and LightGBM; and, for the whole construction environment, the dust distribution over an entire road section is calculated by interpolating the data from the individual monitoring devices.
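The text does not name the interpolation method used to estimate dust over a whole road section. As one possibility, inverse distance weighting (IDW) is sketched below; the choice of IDW, the power parameter of 2, the straight-line road geometry and the planar device coordinates are all assumptions, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates, e.g. in metres


def idw(stations: Sequence[Point], values: Sequence[float],
        target: Point, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at `target` from device readings."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(stations, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a monitoring device
        w = 1.0 / d ** power  # closer devices get larger weights
        num += w * v
        den += w
    return num / den


def road_profile(stations: Sequence[Point], values: Sequence[float],
                 start: Point, end: Point, n: int) -> List[float]:
    """Estimate concentrations at n >= 2 evenly spaced points along a straight road section."""
    pts = [(start[0] + (end[0] - start[0]) * i / (n - 1),
            start[1] + (end[1] - start[1]) * i / (n - 1)) for i in range(n)]
    return [idw(stations, values, p) for p in pts]
```

IDW needs no fitted model and reproduces the measured value exactly at each device; kriging or spline interpolation would be alternatives if the spatial correlation of dust were modeled explicitly.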

Claims (9)

1. The training method of the ecological environment PM2.5 concentration prediction model is characterized by comprising the following steps of:
S1, acquiring raw data from the ecological environment monitoring system;
S2, selecting relevant features from the raw data based on Pearson correlation analysis, and normalizing the relevant features to obtain a training set and a test set;
S3, training the PSO-LSTM and LightGBM models on the training set, and weighting the output values of the two models to obtain a prediction model.
2. The method for training an ecological environment PM2.5 concentration prediction model according to claim 1, wherein the step S1 comprises:
acquiring raw data of temperature, humidity, wind speed, air pressure, dissolved oxygen concentration in water, soil conductivity, PM2.5 and PM10 in the construction environment.
3. The method for training an ecological environment PM2.5 concentration prediction model according to claim 1, wherein the step S2 comprises:
Step S21: checking the integrity of the raw data obtained in step S1, and replacing abnormal values and filling missing values in that data with a smoothing algorithm;
Step S22: performing Pearson correlation analysis on the data completed in step S21 to determine the features related to PM2.5, namely wind speed, temperature, humidity, PM2.5 and PM10;
Step S23: normalizing the related features, and dividing the normalized feature data into a training set and a test set.
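Steps S22 and S23 can be sketched with NumPy. The sketch keeps the features whose absolute Pearson coefficient with PM2.5 reaches a threshold and then splits the rows in time order; the 0.3 threshold, the 80/20 chronological split and the helper names are assumptions, since the text gives neither a cut-off nor a split ratio.

```python
from typing import List, Sequence, Tuple

import numpy as np


def pearson_select(features: np.ndarray, target: np.ndarray,
                   names: Sequence[str], threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Return (name, r) for every feature column with |r| >= threshold."""
    selected = []
    for j, name in enumerate(names):
        r = float(np.corrcoef(features[:, j], target)[0, 1])
        if abs(r) >= threshold:  # NaN (constant column) fails this test and is dropped
            selected.append((name, r))
    return selected


def chronological_split(x: np.ndarray, y: np.ndarray, train_ratio: float = 0.8):
    """Split time-ordered rows without shuffling, so the test set follows the training set."""
    n_train = int(len(x) * train_ratio)
    return x[:n_train], x[n_train:], y[:n_train], y[n_train:]
```

Splitting in time order rather than at random is a design choice for sensor time series: it keeps future readings out of the training set, so the test error reflects genuine forecasting.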
4. The method for training an ecological environment PM2.5 concentration prediction model according to claim 3, wherein the normalization method in step S23 is min-max normalization.
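Min-max normalization maps each feature to [0, 1] with x' = (x - x_min) / (x_max - x_min). In the sketch below, which is an assumption rather than the patent's code, the minima and maxima are taken from the training set only and reused for the test set and for new readings (step S101), and an inverse transform converts normalized predictions back to concentrations. The class name `MinMaxNormalizer` is hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MinMaxNormalizer:
    """Column-wise min-max scaling fitted on training data (hypothetical helper)."""
    data_min: np.ndarray
    data_max: np.ndarray

    @classmethod
    def fit(cls, train: np.ndarray) -> "MinMaxNormalizer":
        return cls(train.min(axis=0), train.max(axis=0))

    def _span(self) -> np.ndarray:
        # Guard against constant columns, which would otherwise divide by zero.
        span = self.data_max - self.data_min
        return np.where(span > 0, span, 1.0)

    def transform(self, x: np.ndarray) -> np.ndarray:
        return (x - self.data_min) / self._span()

    def inverse_transform(self, z: np.ndarray) -> np.ndarray:
        return z * self._span() + self.data_min
```

New readings outside the training range map slightly below 0 or above 1; this sketch leaves them unclipped.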
5. The method for training the ecological environment PM2.5 concentration prediction model according to claim 1, wherein training the PSO-LSTM and LightGBM models on the training set and weighting the output values of the two models to obtain the prediction model comprises:
training the PSO-LSTM model on the training set and obtaining its output value as follows:
initializing the inertia weight and bias constants of a single particle of the particle swarm optimization (PSO) algorithm to obtain initial weights and bias constants, wherein the initial weights include the input-gate, forget-gate and output-gate weights, and the bias constants include the input-gate, forget-gate and output-gate bias constants;
training process: training the LSTM model with the inertia weight and bias constants of the single particle to obtain the output of each gate unit of the PSO-LSTM model, and judging whether the output of each gate has reached the optimal solution and whether the number of iterations has reached the maximum number of iterations.
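The PSO search in claim 5 can be sketched as a standard global-best particle swarm loop. In the claim, each particle position would hold the LSTM gate weights and bias constants and the objective would be the LSTM's training error; here, as an assumption to keep the example self-contained, the objective is supplied by the caller, and the fixed `w`, `c1`, `c2` values, bounds and tolerance are illustrative. The loop stops on either of the two conditions named in the claim: a good-enough solution or the maximum iteration count.

```python
from typing import Callable, Tuple

import numpy as np


def pso_minimize(objective: Callable[[np.ndarray], float], dim: int,
                 n_particles: int = 20, max_iter: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 bounds: Tuple[float, float] = (-1.0, 1.0),
                 tol: float = 1e-8, seed: int = 0) -> Tuple[np.ndarray, float]:
    """Global-best PSO: each row of `pos` is one candidate parameter vector."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in pos])    # personal best scores
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    for _ in range(max_iter):                            # stop 1: iteration limit
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        if gbest_val <= tol:                             # stop 2: good-enough solution
            break
    return gbest, gbest_val
```

In the patent's setting, `objective` would unpack a particle into the LSTM's gate weights and biases, evaluate the network and return its error; that wiring is not shown here.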
6. The training method of the ecological environment PM2.5 concentration prediction model according to claim 5, characterized in that, on the basis of the PSO-LSTM model and with adaptive adjustment of the individual particles of the particle swarm optimization (PSO) algorithm, the inertia weight and learning factors of the PSO algorithm are updated according to the following formulas to obtain the updated inertia weight and learning factors:
the improved inertia weight formula is:
[Equation QLYQS_1: improved inertia weight formula]
where $\omega$ is the improved inertia weight, $\omega_{\min}$ is the minimum inertia weight, $\omega_{\max}$ is the maximum inertia weight, $f(\cdot)$ is an activation function, $t$ is the current iteration number, and $T_{\max}$ is the maximum number of iterations;
the learning factors $c_1$ and $c_2$ are improved as follows:
[Equation QLYQS_10: improved learning factor formula]
where $t$ is the current iteration number and $T_{\max}$ is the maximum number of iterations;
when [condition QLYQS_13] holds, the learning factors $c_1$ and $c_2$ are:
[Equation QLYQS_16]
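Claim 6 makes the inertia weight depend on an activation function of the iteration count and adapts the learning factors over the iterations. One common way to realize such schedules is sketched below: an assumed sigmoid-shaped inertia weight that falls from near the maximum to near the minimum, and linearly time-varying acceleration coefficients in which c1 decreases while c2 increases. Both schedules, their constants (0.4, 0.9, 2.5, 0.5, slope 10) and the function names are assumptions, not the claimed formulas.

```python
import math
from typing import Tuple


def inertia_weight(t: int, t_max: int, w_min: float = 0.4,
                   w_max: float = 0.9, slope: float = 10.0) -> float:
    """Assumed sigmoid-shaped schedule: about w_max early, about w_min late."""
    x = slope * (t / t_max - 0.5)
    return w_min + (w_max - w_min) / (1.0 + math.exp(x))


def learning_factors(t: int, t_max: int, c_high: float = 2.5,
                     c_low: float = 0.5) -> Tuple[float, float]:
    """Assumed linear time-varying coefficients: c1 falls, c2 rises."""
    frac = t / t_max
    c1 = c_high - (c_high - c_low) * frac  # cognitive (self) factor
    c2 = c_low + (c_high - c_low) * frac   # social (swarm) factor
    return c1, c2
```

Each iteration, these values would replace fixed `w`, `c1`, `c2` in the PSO velocity update: a large inertia weight and cognitive factor favor exploration early, while a smaller inertia weight with a larger social factor favors convergence later.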
7. The method for training the ecological environment PM2.5 concentration prediction model according to claim 1, wherein training the PSO-LSTM and LightGBM models on the training set and weighting the output values of the two models to obtain the prediction model comprises:
training the LightGBM model on the training set to obtain an initial training result of the LightGBM model, specifically comprising:
determining a first optimal split point of the LightGBM model according to the training set and a gain formula;
generating leaf split points of an initial LightGBM model according to the first optimal split point;
determining the point of maximum leaf split gain according to a preset split threshold, and generating a decision tree of the initial LightGBM model according to that maximum-gain point and the leaf split points;
configuring the initial LightGBM model with the decision tree to obtain the trained LightGBM model.
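Claim 7's split search can be illustrated with the second-order gain used by gradient boosting libraries such as LightGBM and XGBoost, Gain = 1/2 [G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - (G_L+G_R)^2/(H_L+H_R+λ)], where G and H are sums of gradients and Hessians on each side. Whether this is the patent's "gain formula" is an assumption. The sketch scans one feature exactly (LightGBM itself scans histogram bins) and rejects splits whose gain does not exceed a preset threshold, which plays the role of the claim's preset split threshold (compare LightGBM's `min_gain_to_split`). The function name and the λ default are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def best_split(x: np.ndarray, grad: np.ndarray, hess: np.ndarray,
               lam: float = 1.0, min_gain: float = 0.0) -> Tuple[Optional[float], float]:
    """Exact search for the threshold on one feature that maximizes the split gain.

    Returns (threshold, gain); threshold is None (and gain equals min_gain)
    if no split beats min_gain.
    """
    order = np.argsort(x, kind="stable")
    xs, gs, hs = x[order], grad[order], hess[order]
    g_total, h_total = gs.sum(), hs.sum()
    parent_score = g_total ** 2 / (h_total + lam)

    best_thr: Optional[float] = None
    best_gain = min_gain
    g_left = h_left = 0.0
    for i in range(len(xs) - 1):
        g_left += gs[i]
        h_left += hs[i]
        if xs[i] == xs[i + 1]:
            continue  # no threshold can separate equal feature values
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam)
                      - parent_score)
        if gain > best_gain:
            best_gain, best_thr = float(gain), float(0.5 * (xs[i] + xs[i + 1]))
    return best_thr, best_gain
```

For squared-error loss, the gradients are `prediction - y` and the Hessians are all 1; repeating the search over all features and then within each new leaf grows a decision tree of the kind the claim describes.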
8. The method for training an ecological environment PM2.5 concentration prediction model according to claim 2, wherein the step S3 comprises:
Step S31: inputting the test set of step S23 into the trained PSO-LSTM and LightGBM models;
Step S32: combining, by weighting with the error-reciprocal method, the wind speed, temperature, humidity, PM2.5 and PM10 data predicted by the PSO-LSTM and LightGBM models to obtain a prediction model, wherein the weighting process is as follows:
$$\hat{y} = w_1\hat{y}_1 + w_2\hat{y}_2, \qquad w_1 = \frac{1/\varepsilon_1}{1/\varepsilon_1 + 1/\varepsilon_2} = \frac{\varepsilon_2}{\varepsilon_1 + \varepsilon_2}, \qquad w_2 = \frac{\varepsilon_1}{\varepsilon_1 + \varepsilon_2}$$
where $\varepsilon_1$ and $\varepsilon_2$ are the absolute errors between the predicted and true values of the PSO-LSTM and LightGBM models, respectively; $\hat{y}$ is the predicted value of the weighted combination model; $\hat{y}_1$ is the predicted value of the PSO-LSTM model; $\hat{y}_2$ is the predicted value of the LightGBM model; and $w_1$ and $w_2$ are the weight coefficients of the PSO-LSTM and LightGBM models, respectively.
9. The prediction method of the ecological environment PM2.5 concentration prediction model is characterized by comprising the following steps of:
S101, acquiring temperature, humidity, wind speed, air pressure, dissolved oxygen concentration in water, soil conductivity, PM2.5 and PM10 data from the ecological environment, and normalizing the data;
S102, inputting the normalized data into a prediction model obtained by the training method of the ecological environment PM2.5 concentration prediction model according to any one of claims 1-8 to obtain the predicted PM2.5 concentration of the ecological environment.
CN202310509941.7A 2023-05-08 2023-05-08 Training method and prediction method of ecological environment PM2.5 concentration prediction model Pending CN116227748A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310509941.7A CN116227748A (en) 2023-05-08 2023-05-08 Training method and prediction method of ecological environment PM2.5 concentration prediction model

Publications (1)

Publication Number Publication Date
CN116227748A true CN116227748A (en) 2023-06-06

Family

ID=86589543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310509941.7A Pending CN116227748A (en) 2023-05-08 2023-05-08 Training method and prediction method of ecological environment PM2.5 concentration prediction model

Country Status (1)

Country Link
CN (1) CN116227748A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340292A (en) * 2020-02-26 2020-06-26 福州大学 Integrated neural network PM2.5 prediction method based on clustering
CN111860979A (en) * 2020-07-01 2020-10-30 广西大学 Short-term load prediction method based on TCN and IPSO-LSSVM combined model
CN111898820A (en) * 2020-07-27 2020-11-06 重庆市规划设计研究院 PM2.5 hour concentration combined prediction method and system based on trend clustering and integrated tree
CN111915073A (en) * 2020-04-28 2020-11-10 同济大学 Short-term prediction method for intercity passenger flow of railway by considering date attribute and weather factor
CN113627070A (en) * 2021-05-24 2021-11-09 国网新疆电力有限公司经济技术研究院 Short-term photovoltaic power prediction method
CN114970815A (en) * 2022-05-19 2022-08-30 南京信息工程大学 Traffic flow prediction method and device based on improved PSO algorithm optimized LSTM
CN115238850A (en) * 2022-06-30 2022-10-25 西南交通大学 Mountain slope displacement prediction method based on MI-GRA and improved PSO-LSTM
CN115951014A (en) * 2022-11-21 2023-04-11 南通大学 CNN-LSTM-BP multi-mode air pollutant prediction method combining meteorological features

Similar Documents

Publication Publication Date Title
CN108009674A (en) Air PM2.5 concentration prediction methods based on CNN and LSTM fused neural networks
CN111798051A (en) Air quality space-time prediction method based on long-short term memory neural network
CN110782093B (en) PM fusing SSAE deep feature learning and LSTM2.5Hourly concentration prediction method and system
Khajavi et al. Predicting the carbon dioxide emission caused by road transport using a Random Forest (RF) model combined by Meta-Heuristic Algorithms
Jalalkamali Using of hybrid fuzzy models to predict spatiotemporal groundwater quality parameters
CN108426812A (en) A kind of PM2.5 concentration value prediction techniques based on Memory Neural Networks
CN113554466B (en) Short-term electricity consumption prediction model construction method, prediction method and device
CN109143408B (en) Dynamic region combined short-time rainfall forecasting method based on MLP
CN109308544B (en) Blue algae bloom prediction method based on contrast divergence-long and short term memory network
CN112396234A (en) User side load probability prediction method based on time domain convolutional neural network
CN111931983A (en) Precipitation prediction method and system
CN108961460B (en) Fault prediction method and device based on sparse ESGP (Enterprise service gateway) and multi-objective optimization
CN112116162A (en) Power transmission line icing thickness prediction method based on CEEMDAN-QFAOA-LSTM
CN112272074B (en) Information transmission rate control method and system based on neural network
CN116316599A (en) Intelligent electricity load prediction method
CN114492922A (en) Medium-and-long-term power generation capacity prediction method
CN115271186B (en) Reservoir water level prediction and early warning method based on delay factor and PSO RNN Attention model
CN112504682A (en) Chassis engine fault diagnosis method and system based on particle swarm optimization algorithm
Sun et al. Pruning Elman neural network and its application in bolt defects classification
CN117077870B (en) Water resource digital management method based on artificial intelligence
CN113962819A (en) Method for predicting dissolved oxygen in industrial aquaculture based on extreme learning machine
Suresh et al. IoT with evolutionary algorithm based deep learning for smart irrigation system
CN117009788A (en) Buried fluid delivery pipeline perimeter collapse early warning method, storage medium and method based on water hammer characteristic parameter set
CN116227748A (en) Training method and prediction method of ecological environment PM2.5 concentration prediction model
CN116303786A (en) Block chain financial big data management system based on multidimensional data fusion algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230606
