CN114647819A

CN114647819A - Grid point processing method of environmental data based on graph convolution network

Info

Publication number: CN114647819A
Application number: CN202210325115.2A
Authority: CN
Inventors: 张晓霞; 周鹏程; 胡峰
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2022-06-21

Abstract

The invention relates to the field of data processing, and in particular to a grid point processing method of environmental data based on a graph convolution network. The method comprises: obtaining air quality monitoring data and meteorological monitoring data in a target area; performing missing-value processing on all monitoring data, and mapping the monitoring data of each station onto a grid point matrix obtained by dividing the target area; generating a dynamic wind field graph from the wind direction data and wind speed data, and calculating a wind field adjacency matrix with the Dijkstra algorithm; constructing a mask matrix at each moment from the air quality concentration data, and constructing a feature vector set Z at each moment from the wind field adjacency matrix, the mask matrix and the meteorological monitoring data; generating a target matrix Y at each moment from the mask matrix and the air quality concentration data; and inputting the feature vector set Z into a trained graph convolutional neural network model to obtain an estimation matrix P of the target matrix. The invention can improve the gridding accuracy of environmental data.

Description

Grid point processing method of environmental data based on graph convolution network

Technical Field

The invention relates to the field of data processing, and in particular to a grid point processing method of environmental data based on a graph convolution network.

Background

In recent years, environmental pollution has received increasing public attention, and the density of automatic environmental monitoring stations has increased greatly. With the rapid development of intelligent grid forecasting technology and the demand of human production activities for location-based air quality services, converting monitoring data that are discretely distributed over irregularly spaced monitoring stations into regular gridded data has important social-service and commercial value.

Common methods for converting station data into grid points include objective analysis, remote sensing inversion, data assimilation, statistical interpolation, kriging interpolation and tensor completion. However, these methods process a single type of environmental data without considering the influence of other factors on the target data, so the gridding results are not ideal.

Disclosure of Invention

To address these shortcomings, the invention provides a grid point processing method of environmental data based on a graph convolution network. The method uses meteorological monitoring data to assist in gridding environmental data, and exploits the strong fitting capacity of a neural network to fit both the influence of meteorological data on air quality data and the variation pattern of air quality data during gridding, thereby improving the gridding accuracy of environmental data.

The grid point processing method of environmental data based on the graph convolution network comprises the following steps:

S1: acquiring air quality monitoring data of N air quality monitoring stations and meteorological monitoring data of M meteorological monitoring stations in a target area, where the air quality monitoring data comprise air quality concentration data and the meteorological monitoring data comprise humidity, temperature, wind direction and wind speed data;

S2: preprocessing the missing parts of all monitoring data, and mapping the monitoring data of each station, according to the station's coordinates, onto a grid point matrix obtained by dividing the target area at a given scale;

S3: generating a dynamic wind field graph from the wind direction data and wind speed data of the meteorological monitoring data in the grid point matrix, and calculating a wind field adjacency matrix with the Dijkstra algorithm;

S4: constructing a mask matrix at each moment from the air quality concentration data in the grid point matrix, and constructing a feature vector set Z at each moment from the wind field adjacency matrix, the mask matrix and the meteorological monitoring data;

S5: generating a target matrix Y at each moment from the mask matrix and the air quality concentration data;

S6: inputting the feature vector set Z into a trained graph convolutional neural network model to obtain an estimation matrix P of the target matrix.

The grid point processing method of the environmental data based on the graph convolution network has the following beneficial effects:

1. The method converts wind speed and wind direction data into a dynamic wind field matrix, calculates a wind field adjacency matrix from it, and feeds the adjacency matrix into the model, so that the neural network model can learn how the wind field affects air quality, which improves the gridding accuracy of air quality data;

2. A spectrum-based GCN layer is added to the graph convolutional neural network model; it can filter out adverse factors affecting gridding accuracy during training, making the grid point processing better and more accurate;

3. The target matrix is randomly initialized with a mask matrix in which some entries are randomly set to zero, which improves the robustness of the model and makes its gridding of air quality more stable;

4. When environmental quality data are gridded, the F time nodes preceding the current time node provide information for the current node, so the gridded result is distributed more reasonably;

5. The invention also fully considers the influence of temperature and humidity on air quality, making the gridding result more accurate.

Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic overall flow diagram of the present invention;

FIG. 2 is a schematic diagram of generating the mask and mask2 matrices according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of generating the feature vector set Z according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of generating the target matrix Y according to an embodiment of the present invention;

FIG. 5 is a main data flow diagram according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a schematic overall flow chart of the present invention. As shown in FIG. 1, the method includes the following steps.

In this embodiment of the present invention, step S1 may include:

S11: acquiring the monitoring data of the N air quality monitoring stations in the target area at each time node, with an hourly monitoring frequency;

S12: acquiring the humidity, temperature, wind direction and wind speed data monitored by the M meteorological stations in the target area at each time node, with an hourly frequency;

S13: sorting the air quality concentration data and the meteorological monitoring data in time order, aligning them in time, and filling missing values with -1.

The air quality concentration data and the meteorological monitoring data are sorted by time to form time series, which are processed in the subsequent steps so that gridding is performed on the time series.
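The time alignment in S11 to S13 can be sketched in Python as below. This is a minimal illustration rather than the embodiment's implementation: it assumes readings arrive as a per-station dict keyed by `datetime`, and the names `align_hourly` and `MISSING` are hypothetical. Every station is placed on one shared hourly axis, and any hour without a reading receives the sentinel -1.

```python
from datetime import datetime, timedelta

MISSING = -1.0  # sentinel for missing readings, following step S13


def align_hourly(records, start, end):
    """Align per-station readings onto one hourly time axis.

    records: dict mapping a station id to a dict {datetime: value}.
    Returns (timestamps, aligned), where aligned maps each station id to a
    list of values on the common axis, with gaps filled by MISSING.
    """
    timestamps = []
    t = start
    while t <= end:
        timestamps.append(t)
        t += timedelta(hours=1)
    aligned = {
        sid: [series.get(ts, MISSING) for ts in timestamps]
        for sid, series in records.items()
    }
    return timestamps, aligned


if __name__ == "__main__":
    t0 = datetime(2022, 3, 1, 0)
    pm25 = {"A": {t0: 35.0, t0 + timedelta(hours=2): 40.0}}
    ts, aligned = align_hourly(pm25, t0, t0 + timedelta(hours=3))
    print(aligned["A"])  # [35.0, -1.0, 40.0, -1.0]
```

The same call serves the meteorological variables; each variable can be aligned separately and stacked afterwards.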

It is to be understood that, in the embodiment of the present invention, the air quality monitoring data of the air quality monitoring stations may be PM2.5 concentration data or other air quality concentration data such as PM10, CO, SO2, NO2, O3 or AQI; for convenience of description, PM2.5 concentration data are used as the example in the embodiment of the present invention.

In this embodiment of the present invention, step S2 includes the following steps:

S21: obtaining the boundary of the target area, dividing the target area into fishnet grid cells (grids) at a certain scale, and obtaining the center point coordinate $l_{ij}$ of each grid to form a center point coordinate matrix L, i.e., the grid point matrix, where i and j are the row and column coordinates of the grid point matrix, respectively;

In this embodiment, the scale may be 1 km × 1 km; in practice, those skilled in the art may choose a different scale as needed to divide the target area into different grids. The target area may be rectangular, circular or another irregular shape. If the target area is not rectangular, it is padded outward into a rectangle so that it can be divided into grids of equal size, which facilitates the subsequent gridding of the environmental data.

After the target area is divided into multiple grids, the center point coordinate of each grid is used as the coordinate of that grid, forming a center point coordinate matrix; the row and column indices of this matrix reflect the position of the grid within the target area.

S22: according to the coordinates of each air quality monitoring station and the longitude/latitude distance, traversing the grid point matrix to calculate a first distance from the station to each grid point, determining the grid point with the shortest first distance, and recording its position in a first mapping table D¹;

where the first mapping table D¹ contains the mapping coordinates of the N air quality monitoring stations, each mapped to its nearest grid point in the grid point matrix:

$$D^1 = [D^1_1, D^1_2, \dots, D^1_N]$$

where $D^1_i$ denotes the row and column indices to which the i-th air quality monitoring station is mapped in the data matrix, $i \in [1, N]$.
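A minimal sketch of S21 and S22, assuming station and grid coordinates are given in decimal degrees and that the longitude/latitude distance is the haversine great-circle distance (the text does not name a distance formula). The cell size is expressed in degrees for simplicity, so a 1 km × 1 km fishnet would need roughly 0.009° of latitude per cell; `grid_centers` and `build_mapping_table` are hypothetical names. Given the meteorological station coordinates instead, the same function yields the second mapping table D².

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def grid_centers(lat_min, lon_min, n_rows, n_cols, cell_deg):
    """Centre coordinate l_ij of every fishnet cell (row i, column j)."""
    return [
        [(lat_min + (i + 0.5) * cell_deg, lon_min + (j + 0.5) * cell_deg)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def build_mapping_table(stations, centers):
    """Map each (lat, lon) station to the (row, col) of its nearest cell centre."""
    cells = [(i, j) for i in range(len(centers)) for j in range(len(centers[0]))]
    table = []
    for lat, lon in stations:
        nearest = min(cells, key=lambda ij: haversine_km(lat, lon, *centers[ij[0]][ij[1]]))
        table.append(nearest)
    return table
```

The brute-force search over all cells mirrors the "traverse and calculate" wording of S22; for large grids a spatial index would be faster.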

S23: reading the monitoring data of all air quality monitoring stations, and filling a missing value of a grid point at the current moment with the average of the station's values at the previous and next moments, or with the average of the station's values over all moments;

In the embodiment of the invention, the monitoring data of all N air quality monitoring stations are read. If the monitoring value $v^n_t$ at the current moment is missing, i.e., $v^n_t = -1$, then let

$$v^n_t = \frac{v^n_{t-1} + v^n_{t+1}}{2}$$

If several consecutive values are missing, they are filled directly with the average value of that station. Here $v^n_t$ represents the air quality concentration value of the n-th station at time t, t−1 is the moment before t, t+1 is the moment after t, $t \in (0, L)$, and L is the length of the monitoring data time series.
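The filling rule of S23 can be sketched as follows, assuming each station's series has already been aligned with -1 marking missing hours. Treating a gap at the very start or end of the series like a run of consecutive gaps (falling back to the station mean) is an assumption, since the text does not cover that case; `fill_station_series` is a hypothetical name. The same function can be applied to each meteorological variable in S25.

```python
MISSING = -1.0  # sentinel for a missing reading, as in step S13


def fill_station_series(series):
    """Fill gaps in one station's time series.

    An isolated gap whose previous and next values are both present becomes
    their average; any other gap (a run of consecutive gaps, or a gap at the
    start or end of the series) becomes the station's overall mean.
    """
    values = list(series)
    valid = [v for v in values if v != MISSING]
    station_mean = sum(valid) / len(valid) if valid else MISSING
    filled = values[:]
    last = len(values) - 1
    for t, v in enumerate(values):
        if v != MISSING:
            continue
        # Neighbours are checked on the original series, so consecutive gaps
        # never borrow from values that were themselves just filled.
        prev_ok = t > 0 and values[t - 1] != MISSING
        next_ok = t < last and values[t + 1] != MISSING
        if prev_ok and next_ok:
            filled[t] = (values[t - 1] + values[t + 1]) / 2
        else:
            filled[t] = station_mean
    return filled
```

For example, `[10.0, -1.0, 20.0, -1.0, -1.0, 40.0]` becomes `[10.0, 15.0, 20.0, m, m, 40.0]`, where m is the mean of the observed values 10, 20 and 40.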

S24: according to the coordinates of each meteorological monitoring station and the longitude/latitude distance, traversing the grid point matrix to calculate a second distance from the station to each grid point, determining the grid point with the shortest second distance, and recording its position in a second mapping table D²;

where the second mapping table D² contains the mapping coordinates of the M meteorological monitoring stations, each mapped to its nearest grid point in the grid point matrix:

$$D^2 = [D^2_1, D^2_2, \dots, D^2_M]$$

where $D^2_i$ denotes the row and column indices to which the i-th meteorological monitoring station is mapped in the data matrix, $i \in [1, M]$.

S25: reading the meteorological monitoring data of all meteorological monitoring stations, and filling a missing value of a grid point at the current moment with the average of the station's values at the previous and next moments, or with the average of the station's values over all moments;

In the embodiment of the invention, the meteorological monitoring data of all M meteorological monitoring stations are read. For each meteorological variable, if the value $u^m_t$ is missing, i.e., $u^m_t = -1$, then let

$$u^m_t = \frac{u^m_{t-1} + u^m_{t+1}}{2}$$

where $u^m_t$ represents the meteorological data value of the m-th meteorological monitoring station at time t, t−1 is the moment before t, t+1 is the moment after t, $t \in (0, L)$, and L is the length of the monitoring data time series.

In this embodiment, for the missing portions of the meteorological monitoring data, the generated data matrix is further optimized with the Cressman filling algorithm, and the resulting optimized matrix overwrites the original matrix.
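The text does not spell out its Cressman configuration, so the sketch below assumes the common weighted-average form with a single pass and no background field. An empty cell at distance r from an observed cell receives weight (R² − r²)/(R² + r²) when r < R and 0 otherwise, and its value is the weighted mean of the observations. Distances are measured in grid cells, empty cells are `None`, and the function names are hypothetical.

```python
import math


def cressman_weight(r, radius):
    """Cressman weight (R^2 - r^2) / (R^2 + r^2) inside the influence radius, else 0."""
    if r >= radius:
        return 0.0
    return (radius ** 2 - r ** 2) / (radius ** 2 + r ** 2)


def cressman_fill(grid, radius):
    """Fill None cells from observed cells within `radius` (measured in cells).

    Observed cells are left unchanged; a cell with no observation inside the
    radius stays None.
    """
    rows, cols = len(grid), len(grid[0])
    obs = [(i, j, grid[i][j]) for i in range(rows) for j in range(cols)
           if grid[i][j] is not None]
    out = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] is not None:
                continue
            num = den = 0.0
            for oi, oj, v in obs:
                w = cressman_weight(math.hypot(i - oi, j - oj), radius)
                num += w * v
                den += w
            out[i][j] = num / den if den > 0 else None
    return out
```

A larger radius fills more cells but smooths more strongly; the radius is a tuning choice that the text leaves open.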

S26: at the current moment t, mapping the air quality concentration data of each air quality monitoring station to the corresponding positions of matrix $V_t$ according to the first mapping table D¹, and mapping the humidity, temperature, wind direction and wind speed data of each meteorological monitoring station to the corresponding positions of matrices $H_t$, $T_t$, $W_{d,t}$ and $W_{s,t}$ according to the second mapping table D²;

where $V_t$ denotes the air quality concentration data matrix at time t, and $H_t$, $T_t$, $W_{d,t}$ and $W_{s,t}$ denote the humidity, temperature, wind direction and wind speed data matrices at time t, respectively; $t \in (0, L)$, and L is the length of the monitoring data time series.

In the embodiment of the invention, for any time t, the PM2.5 data of each air quality monitoring station at time t can be mapped to the corresponding positions of matrix $V_t$ according to the first mapping table D¹, and the humidity, temperature, wind direction and wind speed data of each meteorological monitoring station at time t can be mapped to the corresponding positions of matrices $H_t$, $T_t$, $W_{d,t}$ and $W_{s,t}$ according to the second mapping table D². Here $V_t$ holds the PM2.5 monitoring values at time t mapped onto the fishnet obtained by dividing the target area at a 1 km × 1 km scale; likewise, $H_t$, $T_t$, $W_{d,t}$ and $W_{s,t}$ hold the humidity, temperature, wind direction and wind speed monitoring values at time t mapped onto the same 1 km × 1 km fishnet.
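Mapping station values onto $V_t$ (and likewise onto $H_t$, $T_t$, $W_{d,t}$, $W_{s,t}$) reduces to a scatter through the mapping table. The sketch below assumes that cells without a station stay empty (`None`) and, as an assumption not stated in the text, averages the values of several stations that fall into the same cell; `map_to_grid` is a hypothetical name.

```python
def map_to_grid(values, mapping_table, n_rows, n_cols):
    """Scatter station values into an n_rows x n_cols matrix via a mapping table.

    mapping_table[k] is the (row, col) cell of station k (e.g. D1 or D2).
    Cells with no station stay None; assumption: several stations mapped to
    the same cell are averaged.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    buckets = {}
    for value, (i, j) in zip(values, mapping_table):
        buckets.setdefault((i, j), []).append(value)
    for (i, j), vals in buckets.items():
        grid[i][j] = sum(vals) / len(vals)
    return grid
```

The resulting matrix is sparse: only the few cells that contain a station hold a value, which is why the later mask matrices are needed.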

In this embodiment of the present invention, step S3 may include:

S31: using the wind direction data matrix $W_{d,t}$ and the wind speed data matrix $W_{s,t}$ to generate a dynamic wind field graph matrix $DW_t$,

where $DW_t^{ij}$ represents the distance from the i-th grid point to the j-th grid point in the grid point matrix, $\theta_{ij}$ represents the wind direction angle from the i-th grid point to the j-th grid point, $t \in (0, L)$, and L is the length of the monitoring data time series;

S32: based on the dynamic wind field graph matrix $DW_t$, calculating the shortest path $a_{ij}$ from the i-th grid point to the j-th grid point in the grid point matrix with the Dijkstra algorithm, computing the shortest paths for every grid point in the grid point matrix, and assembling all shortest paths into the wind field adjacency matrix $A_t$.
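The exact formula for $DW_t^{ij}$ is not reproduced in this text, so the sketch below uses a hypothetical edge weight built from the same three ingredients (distance, wind direction angle, wind speed). Moving one step from a cell to a 4-neighbour costs distance / (speed × cos θ), where θ is the angle between the cell's wind direction and the step, and steps against the wind (cos θ ≤ 0) are impassable. Dijkstra is then run from every cell to fill $A_t$. The 4-neighbour layout, the direction convention and all function names are assumptions.

```python
import heapq
import math

# Steps to the 4-neighbourhood of a cell: (row offset, column offset).
STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def edge_weight(speed, direction_rad, di, dj):
    """Hypothetical travel-time weight for one grid step (di, dj).

    direction_rad is the direction the air moves toward, counter-clockwise
    from east; rows grow southward, so the step's north component is -di.
    """
    dx, dy = dj, -di
    dist = math.hypot(dx, dy)
    cos_theta = (math.cos(direction_rad) * dx + math.sin(direction_rad) * dy) / dist
    if speed <= 0 or cos_theta <= 0:
        return math.inf  # the wind does not carry air toward this neighbour
    return dist / (speed * cos_theta)


def wind_adjacency(speed, direction):
    """All-pairs shortest travel times A_t on the wind graph, via Dijkstra per source."""
    rows, cols = len(speed), len(speed[0])
    n = rows * cols

    def neighbours(u):
        i, j = divmod(u, cols)
        for di, dj in STEPS:
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                w = edge_weight(speed[i][j], direction[i][j], di, dj)
                if w < math.inf:
                    yield ni * cols + nj, w

    A = []
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in neighbours(u):
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        A.append(dist)
    return A
```

Cells are flattened row-major, so entry `A[i*cols + j]` holds the shortest travel times from cell (i, j); unreachable cells keep `math.inf`.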

In this embodiment of the present invention, step S4 includes the following steps:

S41: initializing the first mask matrix $mask_t$ and the second mask matrix $mask2_t$ at time t: when the value $V_t^{ij}$ of the air quality concentration data matrix at time t is not empty,

$$mask_t^{ij} = mask2_t^{ij} = 1$$

otherwise

$$mask_t^{ij} = mask2_t^{ij} = 0$$

Then K grid points with value 1 are randomly selected from $mask2_t$ and reset to 0 to update the second mask matrix, where i and j are the row and column coordinates of the mask matrix;
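A minimal sketch of S41, assuming empty cells of $V_t$ are stored as `None` and that K is a user-chosen integer; `build_masks` is a hypothetical name. Passing a seeded `random.Random` makes the random zeroing reproducible.

```python
import random


def build_masks(V, k, rng=None):
    """Return (mask, mask2) for one time step.

    mask[i][j] is 1 where V has a value and 0 where it is empty (None);
    mask2 starts as a copy of mask, then k randomly chosen 1-cells are reset to 0.
    """
    rng = rng or random.Random()
    mask = [[0 if v is None else 1 for v in row] for row in V]
    mask2 = [row[:] for row in mask]
    ones = [(i, j) for i, row in enumerate(mask2) for j, m in enumerate(row) if m == 1]
    for i, j in rng.sample(ones, min(k, len(ones))):
        mask2[i][j] = 0
    return mask, mask2
```

Because mask2 is derived from mask, every 1 in mask2 is also a 1 in mask; the cells that differ are the K observed cells hidden from the model.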

S42: for time t and each of the F preceding times, taking the V, $W_s$, $W_d$, H, T, A, mask and mask2 matrices, and constructing the feature vector base $D_f$ for the F times before time t, expressed as $D_f = [x_1, x_2, x_3, x_4, x_5, x_6, x_7]$;

S43: constructing the feature vector set Z from the length of the monitoring data time series and the feature vector bases, so that $Z = [D_1, D_2, \dots, D_n, \dots, D_J]$;

where $x_1, x_2, \dots, x_7$ denote, in order, the air pollution concentration matrix $V \odot mask2$, the wind speed matrix $W_s$, the wind direction matrix $W_d$, the humidity matrix H, the temperature matrix T, the wind field adjacency matrix A and the mask matrix of the F time nodes before time t; f, t and F are integers, $F \in [1, J]$, $t \in (0, L)$ and $F < L$; J is the dimension of the feature vector set; $n, J \in (0, L - F)$, and L is the length of the monitoring data time series.
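The windowing of S42 and S43 can be sketched as below. Two assumptions are labelled in the code: each window covers time t plus its F preceding times (F + 1 frames), which yields L − F samples and matches the stated range n, J ∈ (0, L − F); and empty cells of V have already been set to 0 so that the masked product is defined. `build_feature_set` and the frame dictionary keys are hypothetical.

```python
def elementwise(a, b):
    """Element-wise product of two equally sized matrices (lists of rows)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_feature_set(frames, F):
    """Build Z = [D_1, ..., D_J] from per-time frames.

    frames: list of length L; frames[s] is a dict with keys
    "V", "Ws", "Wd", "H", "T", "A", "mask", "mask2" (hypothetical layout).
    Assumption: empty cells of V are already 0.
    """
    Z = []
    for t in range(F, len(frames)):
        D = []
        # Assumption: the window is time t plus its F preceding times.
        for s in range(t - F, t + 1):
            fr = frames[s]
            x1 = elementwise(fr["V"], fr["mask2"])  # masked concentration V ⊙ mask2
            D.append([x1, fr["Ws"], fr["Wd"], fr["H"], fr["T"], fr["A"], fr["mask"]])
        Z.append(D)
    return Z
```

Each element of Z is a list of F + 1 frames, and each frame carries the seven matrices $x_1, \dots, x_7$ in the order given above.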

FIG. 2 is a schematic diagram of generating the mask matrices according to an embodiment of the present invention. As shown in FIG. 2, the first mask matrix is generated from the air quality concentration data matrix at time t: for the air quality concentration data matrix $V_t$ at time t, every matrix coordinate holding an actual value is set to 1 at the corresponding position of the first mask matrix, and every empty coordinate is set to 0, forming the first mask matrix $mask_t$. Because the first mask matrix is fully determined by the air quality concentration data and therefore has weak anti-interference capability, the invention partially randomizes the coordinates set to 1 in the first mask matrix, selecting some of them and resetting them to 0, thereby forming the second mask matrix $mask2_t$. The two mask matrices play different roles in the subsequent graph convolution network.

It can be understood that, in the embodiment of the present invention, the first mask matrix is obtained from the air quality concentration data, and the second mask matrix is obtained from the first mask matrix by random initialization. Using both the first mask matrix and the randomly initialized second mask matrix enhances the robustness of the graph convolution network model, so that the model's gridding of air quality is more stable.

FIG. 3 is a schematic diagram of generating the feature vector set according to an embodiment of the present invention. As shown in FIG. 3, for each time t, the air pollution concentration matrix $V \odot mask2$, the wind speed matrix $W_s$, the wind direction matrix $W_d$, the humidity matrix H, the temperature matrix T, the wind field adjacency matrix A and the mask matrix of the F times before t are first obtained; these are then integrated over all times t to form a feature vector set of length J.

In this embodiment of the present invention, step S5 includes the following steps:

S51: generating the target matrix at time t from the second mask matrix and the air quality concentration data matrix, expressed as $Y_t = V_t \odot mask2_t$ (element-wise product);

S52: integrating the target matrices at different moments to construct the target matrix set $Y = [Y_1, Y_2, \dots, Y_n, \dots, Y_J]$;

where $V_t$ denotes the air quality concentration data at time t mapped onto the grid point positions of the target area, and $mask2_t$ denotes the second mask matrix at time t; $t \in (0, L)$, J is the dimension of the feature vector set, $n, J \in (0, L - F)$, and L is the length of the monitoring data time series.

FIG. 4 is a schematic diagram of generating the target matrix according to an embodiment of the present invention. As shown in FIG. 4, the air quality concentration data matrix $V_t$ and the second mask matrix $mask2_t$ at each time t are obtained first, and their element-wise product is the target matrix $Y_t$ at time t.
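Because $Y_t$ is simply the element-wise product of two equally sized matrices, S51 and S52 can be sketched in a few lines. The sketch assumes empty cells of $V_t$ are stored as 0 and that samples start at t = F, matching the range n, J ∈ (0, L − F) stated for Z; `build_targets` is a hypothetical name.

```python
def build_targets(V_seq, mask2_seq, F):
    """Return [Y_F, ..., Y_{L-1}] with Y_t = V_t ⊙ mask2_t (element-wise).

    Assumption: empty cells of V_t are stored as 0, and samples start at t = F.
    """
    targets = []
    for t in range(F, len(V_seq)):
        Y_t = [[v * m for v, m in zip(row_v, row_m)]
               for row_v, row_m in zip(V_seq[t], mask2_seq[t])]
        targets.append(Y_t)
    return targets
```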

FIG. 5 shows the graph convolutional neural network model according to an embodiment of the present invention. As shown in FIG. 5, the model includes, arranged in sequence, a temporal convolution network layer, a spectrum-based graph convolution network layer, a space-based graph convolution network layer and a convolutional neural network output layer. The temporal convolution network layer extracts the time-series features of the air pollution concentration matrix, wind speed matrix, wind direction matrix, humidity matrix and temperature matrix; the spectrum-based graph convolution network layer removes noise perturbations from the wind field adjacency matrix; the space-based graph convolution network layer maps the interaction between the time-series features from the temporal convolution network layer and the wind field features from the spectrum-based graph convolution network layer; and the convolutional neural network output layer processes the interaction output by the space-based graph convolution network layer together with the mask matrix, and restores the implicit features into the target matrix.

In the invention, the TCN extracts the sequential relationships of the temporal features and better captures the trend of the data during gridding; the spectrum-based GCN filters random noise during gridding; the space-based GCN models the influence of wind on air pollutants; and the CNN generates the target matrix. The influence of humidity and temperature on air quality gridding is also fully considered, making the gridding more accurate. Meanwhile, random perturbation is added to the graph-convolution-based environmental data gridding model during training, which makes the model more stable.
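The text does not give the layer equations. As one illustration of a spectrum-based graph convolution, the sketch below uses the widely used first-order spectral approximation of Kipf and Welling, $H' = \mathrm{ReLU}(\tilde{D}^{-1/2}(S + I)\tilde{D}^{-1/2} X W)$. Two assumptions are needed to apply it here: $A_t$ holds shortest travel times (small means strongly connected), so it is first turned into an affinity $S = \exp(-A/\text{scale})$; and the directed affinity is symmetrised before normalisation. This is a stand-in for illustration, not the patent's layer, and all names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def affinity_from_travel_time(A, scale=1.0):
    """Turn shortest travel times into affinities exp(-a/scale); diagonal set to 0."""
    n = len(A)
    return [[0.0 if i == j else math.exp(-A[i][j] / scale) for j in range(n)]
            for i in range(n)]


def normalized_adjacency(S):
    """Symmetrise S, add self-loops, and apply D^-1/2 (S + I) D^-1/2."""
    n = len(S)
    s_hat = [[(S[i][j] + S[j][i]) / 2 + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in s_hat]
    return [[s_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(S, X, W):
    """One first-order spectral GCN layer: ReLU(norm_adj(S) @ X @ W).

    S: n x n affinity, X: n x f node features, W: f x h weights.
    """
    h = matmul(matmul(normalized_adjacency(S), X), W)
    return [[max(0.0, v) for v in row] for row in h]
```

Unreachable pairs (travel time `math.inf`) get affinity 0, so they contribute nothing to the aggregation.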

In the embodiment of the invention, training the graph convolutional neural network model comprises: taking the feature vector set Z as the model input to obtain the estimation matrix P of the target matrix; multiplying P element-wise by the mask matrix; comparing the element-wise product with the target matrix Y; and optimizing the model parameters according to the comparison result until the error is within a set range.

It can be understood that, in the embodiment of the present invention, the element-wise product is compared with the target matrix Y, and the model parameters are optimized accordingly, only during training. When the model is applied, i.e., when the air quality monitoring data of the N air quality monitoring stations and the meteorological monitoring data of the M meteorological monitoring stations in the target area are processed directly, only the estimation matrix P of the target matrix needs to be output; P is a dense matrix, whereas the target matrix is sparse.

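In this example, in step S6, the graph-convolution-based environmental data gridding model uses the mean absolute error (MAE) as the loss function:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is a target value, $\hat{y}_i$ is the corresponding estimated value, and n is the number of values compared.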

The error threshold is set to 5 and the model stops training when the model average loss is less than 5.
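The masked comparison used in training and the stopping rule can be sketched as below. Averaging the absolute error over every grid cell (rather than only the cells where the mask is 1) is an assumption, since the text does not say which cells enter the mean; the threshold of 5 comes from the text, and the function names are hypothetical.

```python
ERROR_THRESHOLD = 5.0  # stop training once the average loss falls below this


def masked_mae(P, mask, Y):
    """MAE between P ⊙ mask and Y, averaged over every grid cell (assumption)."""
    errors = [abs(p * m - y)
              for row_p, row_m, row_y in zip(P, mask, Y)
              for p, m, y in zip(row_p, row_m, row_y)]
    return sum(errors) / len(errors)


def should_stop(average_loss, threshold=ERROR_THRESHOLD):
    """Stopping rule from the text: stop when the average loss is below 5."""
    return average_loss < threshold
```

In a full training loop, `masked_mae` would be averaged over a batch or epoch before `should_stop` is checked.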

In the description of the present invention, it is to be understood that the terms "coaxial", "bottom", "one end", "top", "middle", "other end", "upper", "one side", "top", "inner", "outer", "front", "center", "both ends", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "disposed," "connected," "fixed," "rotated," and the like are to be construed broadly, e.g., as meaning fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; the terms may be directly connected or indirectly connected through an intermediate, and may be communication between two elements or interaction relationship between two elements, unless otherwise specifically limited, and the specific meaning of the terms in the present invention will be understood by those skilled in the art according to specific situations.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A grid point processing method of environment data based on graph convolution network is characterized by comprising the following steps:

2. The grid point processing method of environment data based on graph convolution network as claimed in claim 1, wherein: step S1 includes obtaining the air quality concentration data of the air quality monitoring stations at each moment with a first frequency period, and the humidity, temperature, wind direction and wind speed data of the meteorological monitoring stations with the same first frequency period; sorting the air quality monitoring data and the meteorological monitoring data in time order; and aligning the monitoring data in time.

3. The grid point processing method of environment data based on graph convolution network as claimed in claim 1, wherein: the step S2 includes the following steps:

S21: obtaining the boundary of the target area, dividing the target area into fishnet grid cells, i.e., grids, at a certain scale, and obtaining the center point coordinate $l_{ij}$ of each grid to form a center point coordinate matrix P, i.e., the grid point matrix, where i and j are the row and column coordinates of the grid point matrix, respectively;

S24: according to the coordinates of each meteorological monitoring station and the longitude/latitude distance, traversing the grid point matrix to calculate a second distance from the station to each grid point, determining the grid point with the shortest second distance, and recording its position in a second mapping table D²;

At the corresponding position;

And sequentially obtaining a humidity, temperature, wind direction and wind speed data matrix at the time t, wherein t belongs to (0, L), and L is the time sequence length of the monitoring data.

4. The grid point processing method of environment data based on graph convolution network according to claim 1, characterized in that: the step S3 includes the following steps:

S31: using the wind direction data matrix $W_{d,t}$ and the wind speed data matrix $W_{s,t}$ to generate a dynamic wind field graph matrix $DW_t$,

where $DW_t^{ij}$ represents the distance from the i-th grid point to the j-th grid point in the grid point matrix, $\theta_{ij}$ represents the wind direction angle from the i-th grid point to the j-th grid point, and the corresponding entry of the wind speed data matrix at time t represents the wind speed from the i-th grid point to the j-th grid point; $t \in (0, L)$, and L is the length of the monitoring data time series;

5. The grid point processing method of environment data based on graph convolution network as claimed in claim 1, wherein: the step S4 includes the following steps:

S41: initializing the first mask matrix $mask_t$ and the second mask matrix $mask2_t$ at time t: when the value $V_t^{ij}$ of the air quality concentration data matrix at time t is not empty,

$$mask_t^{ij} = mask2_t^{ij} = 1$$

otherwise

$$mask_t^{ij} = mask2_t^{ij} = 0$$

and randomly selecting K grid points with value 1 from $mask2_t$, resetting them to 0, and updating the second mask matrix;

S42: for time t and each of the F preceding times, taking the V, $W_s$, $W_d$, H, T, A, mask and mask2 matrices, and constructing the feature vector base $D_f$ for the F times before time t, expressed as $D_f = [x_1, x_2, x_3, x_4, x_5, x_6, x_7]$;

S43: constructing the feature vector set Z from the length of the monitoring data time series and the feature vector bases, so that $Z = [D_1, D_2, \dots, D_n, \dots, D_J]$;

where $x_1, x_2, \dots, x_7$ denote, in order, the air pollution concentration matrix $V \odot mask2$, the wind speed matrix $W_s$, the wind direction matrix $W_d$, the humidity matrix H, the temperature matrix T, the wind field adjacency matrix A and the mask matrix of the F time nodes before time t; f, t and F are integers, $F \in [1, J]$, $t \in (0, L)$ and $F < L$; J is the dimension of the feature vector set; $n, J \in (0, L - F)$, and L is the length of the monitoring data time series.

6. The grid point processing method of environment data based on graph convolution network as claimed in claim 1, wherein: the step S5 includes the following steps:

S51: generating the target matrix at time t from the second mask matrix and the air quality concentration data matrix, expressed as $Y_t = V_t \odot mask2_t$ (element-wise product);

S52: integrating the target matrices at different moments to construct the target matrix set $Y = [Y_1, Y_2, \dots, Y_n, \dots, Y_J]$;

7. The grid point processing method of environment data based on graph convolution network as claimed in claim 1, wherein: the graph convolutional neural network model in step S6 comprises, arranged in sequence, a temporal convolution network layer, a spectrum-based graph convolution network layer, a space-based graph convolution network layer and a convolutional neural network output layer; the temporal convolution network layer extracts the time-series features of the air pollution concentration matrix, wind speed matrix, wind direction matrix, humidity matrix and temperature matrix; the spectrum-based graph convolution network layer removes noise perturbations from the wind field adjacency matrix; the space-based graph convolution network layer maps the interaction between the time-series features from the temporal convolution network layer and the wind field features from the spectrum-based graph convolution network layer; and the convolutional neural network output layer processes the interaction output by the space-based graph convolution network layer together with the mask matrix, and restores the implicit features into the target matrix.

8. The grid point processing method of environment data based on graph convolution network as claimed in claim 1 or 7, wherein: training the graph convolutional neural network model comprises taking the feature vector set Z as the model input to obtain the estimation matrix P of the target matrix, multiplying P element-wise by the mask matrix, comparing the element-wise product with the target matrix Y, and optimizing the parameters of the graph convolutional neural network model according to the comparison result until the error is within a set range.