CN115757534A

CN115757534A - Air quality prediction method of deep learning model based on potential source contribution analysis

Info

Publication number: CN115757534A
Application number: CN202211286414.6A
Authority: CN
Inventors: 黄欣悦; 刘海隆; 黎园园; 周良军; 赵宏涛; 沈淳懿
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-10-20
Filing date: 2022-10-20
Publication date: 2023-03-07

Abstract

The invention discloses an air quality prediction method using a deep learning model based on potential source contribution analysis. It is applied in the technical field of air quality prediction and addresses the problem that the prior art considers only the geographical distance between research objects and lacks a deeper treatment of the influence of spatial diffusion. The method uses potential source contribution analysis based on airflow trajectories to construct a spatial influence graph that represents the spatial correlation among cities. A graph convolution algorithm, combined with the pollution conditions of the cities, extracts effective spatial features from the spatial influence graph, and a long short-term memory (LSTM) model learns the temporal features in past air pollutant concentration data to generate the prediction result. The result of the potential source contribution analysis is thus applied to a graph convolution network, which is combined with the LSTM model to predict urban atmospheric pollution data. The method can accurately predict urban atmospheric pollutants on a spatiotemporal scale.

Description

Air quality prediction method of deep learning model based on potential source contribution analysis

Technical Field

The invention belongs to the technical field of air quality prediction, and particularly relates to an air quality prediction technology based on potential source contribution analysis.

Background

With the continued development of urbanization, cities have grown in scale and the urban environment has deteriorated. Excessive pollutant discharge and environmental pollution incidents occur from time to time, the number of vehicles keeps increasing, and air pollution sources such as emissions from factories around cities and automobile exhaust degrade air quality. Urban air quality has become one of China's environmental problems and seriously affects public health. The pollutants that cause air pollution mainly include carbon monoxide, ozone, carbon dioxide, some nitrogen oxides, and particulate matter, of which particulate matter is the most harmful to human health. Because atmospheric fine particles are small and can act as carriers of toxic substances, people exposed to fine particulate pollution over long periods face a higher risk of serious cardiovascular and respiratory diseases. Predicting atmospheric pollution is therefore important for environmental protection and air pollution control: effective air quality prediction reflects the future trend of the atmospheric environment and allows pollution events to be prevented in advance. It is significant for strengthening atmospheric environmental protection, improving air pollution prevention and control, and promoting ecological restoration.

At present, most research on spatiotemporal air quality prediction considers only the geographical distance between research objects and lacks an in-depth study of the influence of spatial diffusion.

Disclosure of Invention

To solve the above technical problems, the invention provides an air quality prediction method using a deep learning model based on potential source contribution analysis.

The technical scheme adopted by the invention is as follows: the air quality prediction method of the deep learning model based on potential source contribution analysis comprises the following steps:

S1. Calculate the atmospheric pollution contribution factors of the surrounding cities to the city to be predicted, based on a potential source contribution analysis method;

S2. According to the contribution factors, select the cities that influence the city to be predicted and include them in the area to be analyzed, collect pollution data for the area to be analyzed, and construct a spatial influence graph; the area to be analyzed comprises the city to be predicted and the selected cities that influence it;

S3. Input the spatial influence graph and the pollution data into a graph convolution network, so that the original pollution data are converted into time-series data with spatial features;

S4. Input the time-series data with spatial features into a long short-term memory model for training, extract the temporal features of the time-series data, and predict the future atmospheric pollution conditions.

The beneficial effects of the invention are as follows. The method applies a graph convolution network and a long short-term memory network to the prediction of urban atmospheric pollution data. It uses potential source contribution analysis based on airflow trajectories to examine in depth how atmospheric pollution in surrounding cities influences a given city, and constructs a spatial influence graph to represent the spatial correlation among cities. A graph convolution algorithm, combined with the pollution conditions of the cities, then extracts effective spatial features from the spatial influence graph, and a long short-term memory model learns the temporal features in past air pollutant concentration data to generate the prediction result. The invention works in both the spatial and temporal dimensions and provides a new way of building the spatial graph model, so that deeper environment-related correlations between cities are explored and the accuracy and efficiency of atmospheric pollution prediction are improved.

Drawings

FIG. 1 illustrates the air quality prediction method of the present invention, which uses a deep learning model based on potential source contribution analysis;

FIG. 2 is a schematic diagram of a method of creating a graphical model based on potential source contribution analysis in accordance with the present invention;

FIG. 3 is the airflow trajectory for city d over a 24 hour period;

FIG. 4 is a schematic diagram of meshing;

FIG. 5 is a PSCF distribution plot;

FIG. 6 is a schematic diagram of a graph-convolution network model of the present invention;

FIG. 7 is a diagram of the long short-term memory model according to the present invention.

Detailed Description

In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.

Potential source contribution analysis is widely used to identify the possible source areas of high-concentration pollutants observed at a receptor site. The method can therefore be used to analyze in depth the pollution influence of the surrounding area on a given area, yielding a more accurate spatial distribution map from the perspective of pollution conditions. A spatial graph model is constructed from the result of the potential source contribution analysis, and the atmospheric pollution conditions are predicted from both the temporal and the spatial aspects, thereby improving air quality prediction accuracy.

As shown in fig. 1, an air quality prediction method based on a deep learning model of potential source contribution analysis includes the following steps:

S1. Collect pollution data and meteorological data for the city to be predicted, and calculate the atmospheric pollution contribution factors of the surrounding cities to the city to be predicted using a potential source contribution analysis method. In this embodiment, the surrounding cities are mainly the first ring of cities adjacent to the city to be analyzed, taken as the center, and the second ring of cities adjacent to the first ring. In practical applications, a wider set of cities may of course be considered as required.

Calculating influence factors of pollution of surrounding cities according to results of potential source contribution analysis, wherein the potential source contribution analysis step is shown in fig. 2 and specifically comprises the following sub-steps:

S11. Taking city d as an example, first collect multi-period meteorological data and the corresponding map vector data, and use the meteorological data to perform backward trajectory analysis on the city d to be predicted, obtaining the airflow trajectories of city d within 24 hours, as shown in FIG. 3. Divide the research area into a number of small horizontal grid cells, as shown in FIG. 4, with the grid resolution typically set to 0.25° × 0.25°, and count the number of trajectory points passing through grid cell (i, j) in the area, denoted $n_{ij}$.

S12. Set a pollutant concentration threshold, chosen either as the mean pollutant concentration or according to the secondary standard limit in the Ambient Air Quality Standard (GB 3095-2012). Using the original pollutant concentration values together with the airflow trajectories obtained in S11 by backward trajectory analysis, calculate the pollutant concentration values at the points on each trajectory. Count the number of polluted trajectory points that pass through grid cell (i, j) in the research area and whose pollutant concentration exceeds the threshold, denoted $m_{ij}$. The PSCF value of any grid cell (i, j) is then obtained from the following equation:

$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$

where PSCF denotes the potential source contribution factor. The final calculation yields the result shown in FIG. 5.
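The counting described in S11 and S12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `pscf_grid`, the `(lon, lat, concentration)` tuple layout, the threshold, and the sample points are all hypothetical. Only the 0.25° grid resolution, the counts n_ij and m_ij, and the ratio m_ij / n_ij are taken from the text.

```python
import math
from collections import Counter


def pscf_grid(points, threshold, resolution=0.25):
    """Return {(i, j): PSCF value} for a set of back-trajectory points.

    points: iterable of (lon, lat, concentration) tuples, where concentration
    is the pollutant value assigned to that trajectory point (assumed layout).
    """
    n = Counter()  # n_ij: all trajectory points falling in grid cell (i, j)
    m = Counter()  # m_ij: points in cell (i, j) whose concentration exceeds the threshold
    for lon, lat, conc in points:
        cell = (math.floor(lon / resolution), math.floor(lat / resolution))
        n[cell] += 1
        if conc > threshold:
            m[cell] += 1
    # PSCF_ij = m_ij / n_ij, defined only for cells crossed by at least one point
    return {cell: m[cell] / n[cell] for cell in n}


# Hypothetical sample: three points share one cell, a fourth lies in the next cell.
sample = [
    (104.05, 30.60, 90.0),
    (104.10, 30.70, 40.0),
    (104.20, 30.55, 80.0),
    (104.30, 30.60, 20.0),
]
pscf = pscf_grid(sample, threshold=75.0)
```

Cells that no trajectory crosses are simply absent from the result, which avoids dividing by zero.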

S13. Calculate the total PSCF value within the area of each city surrounding the city to be predicted and divide it by the corresponding city area to obtain the PSCF value per unit area of that surrounding city. This gives the PSCF-based spatial influence factor $\sigma_c$ of a city c on the area under study:

$$\sigma_c = \frac{\sum_{(i,j) \in c} \mathrm{PSCF}_{ij}}{S_c}$$

where $S_c$ denotes the area of surrounding city c and the sum runs over the grid cells (i, j) lying within city c.
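As a rough sketch of S13, the per-city factor can be computed by summing the PSCF values of the grid cells inside each city and dividing by that city's area. The mapping `cell_to_city`, the area values and their units, and the function name are assumptions made for illustration. The final filter applies the "greater than 0" selection rule stated in step S2.

```python
def spatial_influence_factors(pscf, cell_to_city, city_area):
    """Sum PSCF over each city's grid cells and divide by the city's area."""
    totals = {city: 0.0 for city in city_area}
    for cell, value in pscf.items():
        city = cell_to_city.get(cell)  # cells outside every listed city are ignored
        if city in totals:
            totals[city] += value
    return {city: totals[city] / city_area[city] for city in city_area}


# Hypothetical inputs: PSCF per grid cell, cell membership, and city areas.
pscf = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (2, 2): 0.9}
cell_to_city = {(0, 0): "a", (0, 1): "a", (1, 0): "b"}
city_area = {"a": 3.0, "b": 2.0}  # area units are an assumption

sigma = spatial_influence_factors(pscf, cell_to_city, city_area)

# Keep only cities whose factor is greater than 0 (the rule of step S2).
influencing = [city for city, s in sigma.items() if s > 0]
```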

S2. According to the contribution factors, select the cities whose spatial influence factor is greater than 0 as the surrounding cities that influence the city to be predicted, collect their pollution data, and construct the spatial influence graph.

S21. Establish the graph-model adjacency matrix $A_{factor}$ with city d as the point to be predicted:

A spatial influence graph $G_{factor} = (V, E, A_{factor})$ is thus obtained, where V denotes the nodes of the spatial influence graph, with $v_i$ representing city i; E denotes the edge set; and $A_{factor}$ is the adjacency matrix of the graph.
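The exact definition of the adjacency matrix is not recoverable from this text, so the following Python sketch relies on an explicit assumption: each influencing city is connected only to the city to be predicted, the edge weight is that city's spatial influence factor, and the matrix is kept symmetric. The function name `build_adjacency` and the sample values are hypothetical, and the construction used by the invention may differ.

```python
def build_adjacency(cities, target, sigma):
    """Assumed construction: weight sigma[c] on the edge between city c and the target."""
    index = {city: k for k, city in enumerate(cities)}
    size = len(cities)
    adjacency = [[0.0] * size for _ in range(size)]
    d = index[target]
    for city, weight in sigma.items():
        if city != target and weight > 0:
            c = index[city]
            adjacency[c][d] = weight
            adjacency[d][c] = weight  # symmetric by assumption
    return adjacency


# Hypothetical example: city "d" is the city to be predicted.
cities = ["d", "a", "b"]
a_factor = build_adjacency(cities, "d", {"a": 0.25, "b": 0.1})
```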

S3. Input the spatial influence graph and the PM2.5 atmospheric pollution monitoring data of all relevant cities into a graph convolution network, converting the original data into time-series data with spatial features.

A graph convolution network is used to extract the spatial features of atmospheric pollution for each city in the research area. Graph convolution is performed on the pollution concentration data of each city over the spatial influence graph constructed in step S2, so as to extract the spatial features of atmospheric pollution among the cities. The graph convolution network model is shown in FIG. 6, where v0 to v9 are the nodes of the graph and refer to different cities, and v0 is the city to be predicted. The spatial features of the city to be predicted are extracted after the graph convolution operation. The specific steps are as follows:

S31. Collect the atmospheric pollution data of all cities in the research area and standardize the data by min-max normalization (dispersion standardization), using the formula:

$$x^{*} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where $x_{max}$ is the maximum value of the PM2.5 monitoring data, $x_{min}$ is the minimum value of the PM2.5 monitoring data, $x^{*}$ is the target value after normalization, and $x$ is the data to be normalized.
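A minimal Python sketch of the min-max normalization in S31, x* = (x − x_min) / (x_max − x_min), is given below. The inverse helper, the handling of a constant series, and the sample readings are assumptions added for illustration. The inverse is needed in practice to map model outputs back to concentration units.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x* = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant series: the formula is undefined, so map everything to 0 (assumption).
        return [0.0 for _ in values], x_min, x_max
    span = x_max - x_min
    return [(x - x_min) / span for x in values], x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Invert the scaling so predictions can be reported in original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Hypothetical PM2.5 readings.
readings = [20.0, 35.0, 80.0]
scaled, lo, hi = min_max_normalize(readings)
restored = denormalize(scaled, lo, hi)
```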

S32. Symmetrically normalize the adjacency matrix of the spatial influence graph obtained in step S2, namely

$$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$$

where $\tilde{A} = A_{factor} + I$, $I$ is the identity matrix of size N × N, and $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
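The symmetric normalization commonly used in graph convolution networks adds self-loops to the adjacency matrix and scales it on both sides by the inverse square root of the degree matrix. The dependency-free Python sketch below illustrates that operation. The function name and the sample matrix are hypothetical, and non-negative edge weights are assumed so that every degree is at least 1.

```python
import math


def normalize_adjacency(a):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square matrix given as nested lists."""
    size = len(a)
    # Add self-loops: A~ = A + I
    a_tilde = [
        [a[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]
    # Diagonal of the degree matrix: row sums of A~
    degree = [sum(row) for row in a_tilde]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    return [
        [inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical two-city graph with a single edge of weight 1.
a = [[0.0, 1.0], [1.0, 0.0]]
a_hat = normalize_adjacency(a)
```

For this two-node example every entry of the normalized matrix is approximately 0.5, since both nodes have degree 2 after the self-loops are added.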

S33. Define a feature matrix $X_t$ that stores all standardized feature data of the N cities at time t, where N is the number of cities and t is the monitoring time.

S34. Perform the graph convolution operation using the symmetrically normalized result, with the calculation formula:

$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$

where $\hat{A}$ is the symmetrically normalized adjacency matrix from S32, $H^{(l+1)}$ is the output of the (l+1)-th graph convolution layer, $W^{(l)}$ is the parameter of the l-th layer, $H^{(l)}$ is the output of the l-th graph convolution layer, the input layer is $H^{(0)} = X_t$ (the feature matrix defined in S33), and σ(·) denotes the activation function.
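A single graph convolution layer multiplies the normalized adjacency matrix by the layer input and the layer weights, then applies the activation function. The sketch below shows one forward pass in plain Python. The text does not name the activation, so ReLU is an assumption here, and the weights, the feature values, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(matrix):
    """Element-wise ReLU, used here as an assumed activation function."""
    return [[max(0.0, v) for v in row] for row in matrix]


def gcn_layer(a_hat, h, w, activation=relu):
    """One graph convolution layer: activation(A_hat @ H @ W)."""
    return activation(matmul(matmul(a_hat, h), w))


# Hypothetical example with two cities and one normalized PM2.5 feature each.
a_hat = [[0.5, 0.5], [0.5, 0.5]]  # normalized adjacency (assumed values)
x_t = [[0.2], [0.6]]              # input features, one row per city
w0 = [[1.0, -1.0]]                # maps 1 input feature to 2 hidden features
h1 = gcn_layer(a_hat, x_t, w0)
```

Each output row mixes a city's own feature with those of its neighbors, which is how the spatial influence graph enters the prediction.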

S4. Input the time-series data with spatial features into a long short-term memory (LSTM) model for training, extract the temporal features of the time-series data, and predict the future atmospheric pollution conditions. The specific implementation is as follows. The spatially featured result produced by the graph convolution network of step S3 is denoted $G_t$ for the feature matrix at any time t. The convolution results of the past T time steps are combined; because the spatial influence graph is constructed from 24-hour airflow trajectories, T is taken as 1. A time sequence with spatial features is thus obtained.

The time sequence is input into the LSTM model, which extracts the temporal features of the time-series data and yields the predicted atmospheric pollution values for the next T' time steps.
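One common way to prepare such a sequence for training is to slice it into pairs of past steps and future steps. The text does not specify how training samples are formed, so the windowing scheme below, the function name `make_windows`, and the sample series are assumptions. The values T = 1 and a horizon T' between 1 and 7 follow the text.

```python
def make_windows(series, t_in, t_out):
    """Split a time-ordered list into (past t_in steps, next t_out steps) pairs."""
    pairs = []
    for start in range(len(series) - t_in - t_out + 1):
        past = series[start:start + t_in]
        future = series[start + t_in:start + t_in + t_out]
        pairs.append((past, future))
    return pairs


# Hypothetical series of spatially featured values, one per time step.
daily = [0.1, 0.2, 0.3, 0.4, 0.5]
pairs = make_windows(daily, t_in=1, t_out=3)  # T = 1, T' = 3
```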

Atmospheric pollution exhibits a certain periodicity. Taking one week as the period, T' can be chosen as an integer from 1 to 7 according to prediction needs. As can be seen from FIG. 7, each recurrent unit of the long short-term memory model consists of a cell state, three gate structures, and four neural network layers. The key component of the long short-term memory model is the cell state $c_t$, output along the top horizontal arrow in FIG. 7. The cell state runs along the entire chain, and information can easily flow to the next node in the direction of the arrow.

The long short-term memory model uses gate structures to control information, so that information can be removed from or added to the cell state. A gate can also let part of the information pass selectively, which is accomplished by a sigmoid neural network layer followed by a pointwise multiplication. The sigmoid layer outputs a number between 0 and 1 describing how much of each component may pass, with 0 meaning that nothing passes and 1 meaning that everything passes. The long short-term memory model contains three gate structures, namely an input gate, an output gate, and a forget gate. The input gate determines how much of the input information is stored in the cell state $c_t$. The output gate controls how much of the cell state $c_t$ is output as the current output value $h_t$ of the long short-term memory unit. The forget gate determines how much of the cell state at time t-1 is retained in the cell state at time t. Through the interaction of the three gates, the cell state $c_t$ at time t is obtained from $c_{t-1}$: the information passed on from time t-1 that should now be discarded is removed, and the new information obtained from the input signal at time t is added. The updated cell state is then passed, as the new cell state, to the long short-term memory unit at time t+1.
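The gate interaction can be made concrete with the standard LSTM update equations. The sketch below is written for scalar input and scalar state to keep it readable. The parameter layout `params`, the function names, and the sample values are hypothetical, and this is the textbook formulation rather than a confirmed detail of the invention.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell with scalar input and scalar state.

    params maps each of "f", "i", "o", "g" to a tuple (w_x, w_h, b).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: share of c_{t-1} to keep
    i = sigmoid(pre("i"))      # input gate: share of new information to store
    o = sigmoid(pre("o"))      # output gate: share of the cell state to expose
    g = math.tanh(pre("g"))    # candidate information from the current input
    c = f * c_prev + i * g     # updated cell state c_t
    h = o * math.tanh(c)       # current output h_t
    return h, c


# Hypothetical parameters: all zeros, so every gate opens halfway.
zero = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "o", "g")}
h_t, c_t = lstm_step(x=0.3, h_prev=0.0, c_prev=2.0, params=zero)
```

The three sigmoid layers and the one tanh layer correspond to the four neural network layers mentioned for each unit, and the returned pair is what gets passed on to the next time step.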

Future data are strongly correlated with historical data, so by extracting temporal features with the long short-term memory model, the atmospheric pollutant concentrations for the next T' time steps can be predicted.

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. An air quality prediction method of a deep learning model based on potential source contribution analysis, characterized by comprising the following steps:

S1. Calculate the atmospheric pollution contribution factors of the surrounding cities to the city to be predicted, based on a potential source contribution analysis method;

2. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 1, wherein step S1 specifically comprises the following sub-steps:

S11. Take the city to be predicted and its surrounding cities as the research area, collect multi-period meteorological data and map vector data of the research area, and use the meteorological data to perform backward trajectory analysis on the city to be predicted to obtain the airflow trajectories of the city to be predicted;

divide the research area into a number of small horizontal grid cells, with the grid resolution typically set to 0.25° × 0.25°, and count the number of trajectory points passing through grid cell (i, j) in the research area, denoted $n_{ij}$;

S12. Set a pollutant concentration threshold; using the original pollutant concentration values together with the airflow trajectories obtained in S11 by backward trajectory analysis, calculate the pollutant concentration values at the trajectory points on each trajectory; count the number of polluted trajectory points that pass through grid cell (i, j) in the research area and whose pollutant concentration exceeds the threshold, denoted $m_{ij}$; the PSCF value of grid cell (i, j) is calculated using the following equation:

$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$

where PSCF denotes the potential source contribution factor;

S13. Calculate the PSCF-based spatial influence factor $\sigma_c$ of each surrounding city on the city to be predicted, using the formula:

$$\sigma_c = \frac{\sum_{(i,j) \in c} \mathrm{PSCF}_{ij}}{S_c}$$

where $S_c$ denotes the area of surrounding city c.

3. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 2, wherein the pollutant concentration threshold is set to the mean pollutant concentration or according to the secondary standard limit in the Ambient Air Quality Standard.

4. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 3, wherein in step S2 the surrounding cities whose spatial influence factor is greater than 0 are included in the area to be analyzed.

5. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 4, wherein the spatial influence graph is constructed by establishing a graph-model adjacency matrix $A_{factor}$ with the city to be predicted as the prediction point, obtaining the spatial influence graph $G_{factor} = (V, E, A_{factor})$, where V denotes the nodes of the spatial influence graph, $v_i$ represents city i, and N is the number of cities; E denotes the edge set.

6. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 5, wherein $A_{factor}$ is expressed as:

where d represents the city to be predicted.

7. The air quality prediction method of the deep learning model based on potential source contribution analysis according to claim 6, wherein step S3 specifically comprises the following sub-steps:

$$x^{*} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where $x_{max}$ is the maximum value of the atmospheric pollution data, $x_{min}$ is the minimum value of the atmospheric pollution data, $x^{*}$ is the target value after min-max normalization (dispersion standardization), and $x$ is the data to be normalized;

$$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$$

where $\tilde{A} = A_{factor} + I$, $I$ is the identity matrix of size N × N, and $\tilde{D}$ is the diagonal degree matrix;

S33. Define a feature matrix $X_t$ that stores all standardized feature data of the N cities at time t, where t is the monitoring time;

S34. Perform the graph convolution operation using the symmetrically normalized result, with the calculation formula:

$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$

where $H^{(l)}$ is the output of the l-th graph convolution layer, the input layer is $H^{(0)} = X_t$, and σ(·) denotes the activation function.