CN114611792A

CN114611792A - Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model

Info

Publication number: CN114611792A
Application number: CN202210238135.6A
Authority: CN
Inventors: 孙强; 陈逸彬; 徐爱兰; 蒋行健; 黄勋; 陈晓敏
Original assignee: Nantong University
Current assignee: Nantong University
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-06-10
Anticipated expiration: 2042-03-11
Also published as: CN114611792B

Abstract

The invention relates to an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, which combines a convolutional neural network (CNN) with an improved Transformer model. The CNN consists of 2 one-dimensional convolutional layers. The Transformer consists of an encoder and a decoder: the encoder has 3 encoding layers and the decoder has 3 decoding layers. On top of the traditional Transformer encoder-decoder framework, an "encoder-to-encoder" cross multi-head attention layer is added between different encoding layers of the encoder, further mining the correlation of the encoded information across encoding layers. The CNN effectively extracts useful information along the feature dimension, which addresses the insufficient information-extraction capability of the encoder in the Transformer model. The prediction method reflects the influence of multivariate data on ozone concentration more realistically and learns this influence pattern through the CNN-Transformer model, thereby giving a more accurate ozone concentration prediction.

Description

Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model

Technical Field

The invention relates to an atmospheric ozone concentration prediction method, in particular to an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model.

Background

At present, accelerating industrialization and urbanization have led to a drastic decline in air quality. The main atmospheric pollutants include sulfur dioxide, nitrogen oxides, particulate matter, carbon monoxide and ozone. Prolonged exposure to ozone at too high or too low a concentration can cause physical harm such as dyspnea. Given the effect of ozone on the human body, a method that can accurately predict ozone concentration is urgently needed so that the relevant organizations can take management and control measures.

With the development of artificial intelligence and deep learning, predicting the concentration of atmospheric pollutants with deep learning models has become a research hotspot and development trend. Existing deep learning models for ozone concentration prediction include recurrent neural networks (RNN), long short-term memory networks (LSTM), and hybrid models combining convolutional neural networks and LSTM (CNN-LSTM). The RNN is an early deep learning model from the field of machine translation, which many scholars later applied to time series prediction. For example, Biancofiore et al. used a conventional RNN model, with Pescara as the case study, taking past ozone data and other atmospheric pollutant data as training data to predict the ozone concentration 1, 3, 6, 12 and 24 hours ahead. The predictions within 6 hours were good, but the prediction error beyond 6 hours was large. Conventional RNN models tend to suffer from the long-range dependence problem, i.e., they do not give good results for long input and output sequences. At present, the mainstream model for time series prediction is the hybrid CNN-LSTM model: the CNN extracts local spatio-temporal features from the data, and the LSTM alleviates the long-range dependence problem to a certain extent compared with the traditional RNN, so combining the two gives better time series predictions.

However, the LSTM model still does not solve the long-range dependence problem well: when the input sequence is long, earlier information is continuously forgotten, so little effective information is ultimately extracted. The attention-based Transformer model proposed by Vaswani et al. solves this information-forgetting problem well. Wu et al. used a deep Transformer model to predict influenza in the United States. Their study shows that the Transformer achieves good results on univariate datasets, but its prediction performance on multivariate datasets is not notable, indicating that the model does not mine the associations between different data dimensions well. Therefore, how to improve the Transformer model and its ability to extract information from multivariate data is a problem that must be addressed. To solve these problems, the present application provides an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model.

Disclosure of Invention

The invention aims to provide an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model that predicts atmospheric ozone concentration accurately and effectively. The method addresses the poor ability of the traditional Transformer framework to extract information from multivariate data. It extracts the relationships among different features of multivariate data and makes full use of historical data to accurately predict future ozone concentration.

In order to achieve the above object, the present invention provides the following solutions:

An atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model comprises the following steps:

Model training stage:

S1.1: acquire ozone data, other atmospheric pollutant data and meteorological data as training data, process missing values and abnormal values, and normalize the processed data to eliminate the influence of dimensional differences. The other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature;

S1.2: serialize the data with a sliding window technique to form time series data. The prediction method provides a default sliding window size. The user can customize it to set the time span of the historical data used to predict future ozone concentration, and can also adjust the length of the predicted data, i.e., the time span of the output data;

S1.3: construct the CNN-Transformer model, feed the time series data into it, and start the model training process.

Model inference (prediction) stage:

S1.4: acquire in real time, as historical data, ozone data and other atmospheric pollutant data from the environmental monitoring station and meteorological data collected by the meteorological station. Then process missing and abnormal values, and normalize the processed data to eliminate the influence of dimensional differences;

S1.5: feed the data into the trained CNN-Transformer model, which outputs the prediction result.

Preferably, in step S1.1, the normalization processing is performed on the processed data, and specifically includes:

the acquired data are normalized along the feature dimension using a maximum-minimum (min-max) normalization function, limiting the data to the range [0, 1].

The min-max normalization function is formulated as follows:

$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$

where x is an original data value and x ∈ X; X denotes the original data set, comprising ozone data, other atmospheric pollutant data and meteorological data; x^* denotes the normalized data; x_min is the minimum value in the original data set; and x_max is the maximum value in the original data set.
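Per-feature min-max scaling and its inverse can be sketched as follows. The inverse is used at the end of the pipeline to map the predicted ozone value back to physical units. This is a minimal sketch that assumes rows are time steps and columns are features. The guard for a constant column is an addition not described in the document, and the function names are hypothetical.

```python
from typing import List, Tuple

Table = List[List[float]]


def min_max_fit(data: Table) -> Tuple[List[float], List[float]]:
    """Per-feature (column-wise) minimum and maximum of a 2-D dataset."""
    columns = list(zip(*data))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_normalize(data: Table, mins: List[float], maxs: List[float]) -> Table:
    """x* = (x - x_min) / (x_max - x_min), applied to each feature separately."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant column -> 0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in data
    ]


def min_max_inverse(value: float, lo: float, hi: float) -> float:
    """Map a normalized value back to the original scale: x = x* (x_max - x_min) + x_min."""
    return value * (hi - lo) + lo
```

For example, fitting on the rows [[40, 10], [80, 30], [60, 20]] gives minima [40, 10] and maxima [80, 30]. The last row then normalizes to [0.5, 0.5], and min_max_inverse(0.5, 40, 80) returns 60.0.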

Preferably, in step S1.3, the CNN-Transformer model consists mainly of three parts, a convolutional neural network (CNN), a Transformer and a deep neural network (DNN), configured as follows. The CNN comprises 2 one-dimensional convolution layers, each with 32 convolution kernels, so that 32-dimensional latent features can be extracted. The Transformer consists mainly of an encoder and a decoder. The encoder contains 1 input mapping layer, 1 positional encoding layer, 3 encoding layers and 1 "encoder-to-encoder" cross multi-head attention layer. The decoder contains 1 input mapping layer, 3 decoding layers and 1 sequence-based fully connected layer. Each encoding layer consists of 1 multi-head self-attention layer with an output dimension of 64 and 1 feed-forward fully connected layer with the same output dimension. Each of the 3 decoding layers has the same structure as an encoding layer, except that an "encoder-to-decoder" cross multi-head attention layer follows the multi-head self-attention layer. Every multi-head attention layer has 8 heads. The DNN has 2 hidden layers, with 256 neurons in the first and 128 in the second, and uses the ReLU function as the activation function. A Dropout layer follows each hidden layer, with ratios of 0.4 and 0.3, respectively. The output layer has 1 neuron.
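The hyperparameters listed above can be collected in one place so that their consistency can be checked. In particular, d_model must be divisible by the 8 heads. The following is a minimal sketch that makes two assumptions: the 64-dimensional attention output equals d_model, and the DNN is applied to each d_model-dimensional decoder output vector. The document does not give the convolution kernel size or stride, so they are omitted. The class name CnnTransformerConfig is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CnnTransformerConfig:
    """Architecture hyperparameters of step S1.3 (hypothetical container class)."""
    # CNN: 2 one-dimensional convolution layers, 32 kernels each
    conv_layers: int = 2
    conv_kernels: int = 32
    # Transformer: 3 encoding and 3 decoding layers, 8 attention heads
    encoder_layers: int = 3
    decoder_layers: int = 3
    d_model: int = 64  # assumption: d_model equals the 64-dim attention output
    num_heads: int = 8
    # DNN head: two ReLU hidden layers, each followed by dropout, then 1 output neuron
    dnn_hidden: Tuple[int, ...] = (256, 128)
    dnn_dropout: Tuple[float, ...] = (0.4, 0.3)
    output_dim: int = 1

    def __post_init__(self) -> None:
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if len(self.dnn_hidden) != len(self.dnn_dropout):
            raise ValueError("expected one dropout ratio per hidden layer")

    @property
    def head_dim(self) -> int:
        """Per-head dimension d_model / h used by multi-head attention."""
        return self.d_model // self.num_heads

    @property
    def dnn_layer_sizes(self) -> Tuple[int, ...]:
        """Layer widths of the DNN, from the decoder output down to the target dimension."""
        return (self.d_model, *self.dnn_hidden, self.output_dim)
```

With the default values, head_dim is 8 and the DNN widths are (64, 256, 128, 1).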

Preferably, in step S1.3, the model training process is as follows: the number of iterations is set to 500, the training batch size is 64, Adam is used as the optimizer, and the learning rate is 0.001. In the training phase, HuberLoss is adopted as the loss function, with the following expression:

where n is the total number of samples in the training set or the validation set, O_i is the true ozone value of the i-th sample, and P_i is the predicted ozone value of the i-th sample;
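The text above does not reproduce the HuberLoss expression itself. As a hedged sketch, the standard mean Huber loss over n samples is shown below, with O_i as the true value and P_i as the prediction. It assumes a threshold delta = 1.0, which is the default of PyTorch's torch.nn.HuberLoss; the document does not state delta. The function name is hypothetical.

```python
from typing import Sequence


def huber_loss(true: Sequence[float], pred: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss between true values O_i and predictions P_i.

    delta is an assumption (the document does not give it); 1.0 matches the
    default of torch.nn.HuberLoss.
    """
    if len(true) != len(pred) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for o, p in zip(true, pred):
        err = abs(o - p)
        if err <= delta:
            total += 0.5 * err * err               # quadratic zone for small errors
        else:
            total += delta * (err - 0.5 * delta)   # linear zone limits large errors
    return total / len(true)
```

The quadratic zone behaves like squared error near the target, and the linear zone makes the loss less sensitive to outlying ozone readings than a pure squared error.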

In each iteration, the training samples first enter the CNN model, whose 2 one-dimensional convolution layers extract useful information along the feature dimension. The output sequence is sent to the Transformer model, where the positional encoding layer adds temporal position information to the sequence. The encoder encodes the sequence carrying this position information, and the encoding result is sent to the decoder to obtain decoded data. The decoded data are fed into the DNN, which compresses the feature dimension to the target dimension of 1. The result is inverse-normalized to obtain the final predicted value, which is compared with the label value to compute the HuberLoss. Iteration continues until the model reaches its optimum, i.e., until the HuberLoss value is minimized.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention predicts atmospheric ozone concentration with a model based on the Transformer framework, exploiting its ability to extract global features from time series data.

2. The method adopts a CNN model, which effectively extracts local features and correlations among the data across multiple dimensions. It fuses the CNN with the Transformer model so that the global features of the time series data are also fully taken into account.

3. In the encoder of the Transformer model, a multi-head attention layer is added between different encoding layers. This effectively remedies the insufficient ability of the traditional Transformer encoder to extract multi-dimensional features.

Different experiments were carried out on the adopted data set, and the results demonstrate the superiority and prediction accuracy of the CNN-Transformer model.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of the atmospheric ozone concentration prediction method provided in Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the sliding window provided in Embodiment 1 of the present invention;

FIG. 3 is a diagram of the overall architecture of the hybrid CNN-Transformer model provided in Embodiment 1 of the present invention;

FIG. 4 is a diagram of the CNN model architecture provided in Embodiment 1 of the present invention;

FIG. 5 is a diagram of the encoder-decoder architecture of the Transformer model provided in Embodiment 1 of the present invention.

Detailed description of the preferred embodiments

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The invention aims to provide an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model. It addresses two problems: traditional prediction models have low prediction accuracy, and the traditional Transformer model cannot sufficiently extract multi-dimensional features.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are further described below.

As shown in FIG. 1, the present embodiment provides an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, including:

Step S1.1: acquire multivariate atmospheric pollutant data and meteorological data as historical training data, process missing and abnormal values, and normalize the processed data to eliminate the influence of dimensional differences.

Step S1.1 specifically includes:

missing values are filled by mean imputation, i.e., each missing value is replaced with the mean of all data, and abnormal values are likewise replaced with the mean of the data;
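Mean filling of one feature column can be sketched as below. Missing values are represented as None or NaN. The document does not say how abnormal values are detected, so the valid_range parameter is a hypothetical rule used only for illustration. "The mean of all data" is taken here to mean the mean of the valid entries of the same feature, which is also an assumption.

```python
import math
from typing import List, Optional, Tuple


def mean_fill(column: List[Optional[float]],
              valid_range: Tuple[float, float]) -> List[float]:
    """Replace missing (None/NaN) and abnormal values with the mean of the valid values.

    valid_range is a hypothetical abnormal-value rule: values outside
    [low, high] are treated as abnormal.
    """
    low, high = valid_range

    def is_valid(x: Optional[float]) -> bool:
        return x is not None and not math.isnan(x) and low <= x <= high

    valid = [x for x in column if is_valid(x)]
    if not valid:
        raise ValueError("column has no valid values to average")
    mean = sum(valid) / len(valid)
    return [x if is_valid(x) else mean for x in column]
```

For example, mean_fill([10.0, None, 30.0, -5.0, float("nan")], (0.0, 500.0)) averages only 10 and 30 and returns [10.0, 20.0, 30.0, 20.0, 20.0].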

the processed data are normalized with the min-max normalization function:

$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$

where x is an original data value and x ∈ X; X denotes the original data set, comprising ozone data, other atmospheric pollutant data and meteorological data; x^* denotes the normalized data; x_min is the minimum value in the original data set; and x_max is the maximum value in the original data set. The other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature.

Step S1.2: serialize the data with a sliding window technique to form time series data;

data serialization with the sliding window technique specifically comprises the following:

suppose each time series sample T_m has dimension k, where m is the sample index. The sample is written as a vector T_m = [T_{m,1}, …, T_{m,k}], where T_{m,k} is the ozone concentration. A sliding window of length N + 1 is introduced to segment the data. Among all samples, T_m, …, T_{m+N−1} serve as historical data and T_{m+N} serves as known future data. The ozone prediction P_{m+N} given by the model is compared with T_{m+N,k}. The sliding window then moves one step to generate the next sample sequence, and this process is repeated until all sample data have been scanned. FIG. 2 shows an example of serializing data with a sliding window of size 3, where S_i denotes the i-th training sample and n denotes the length of the sequence.
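The sliding-window serialization can be sketched as follows. Here history corresponds to N, so a window with horizon=1 spans N + 1 samples. The horizon parameter generalizes the single future sample T_{m+N} to several future ozone values. This reflects the adjustable output length mentioned in step S1.2, but the exact multi-step layout is an assumption. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[Sequence[float]], List[float]]


def sliding_windows(samples: Sequence[Sequence[float]], history: int,
                    horizon: int = 1, target_index: int = -1) -> List[Window]:
    """Split a multivariate series into (history window, future ozone targets) pairs.

    samples[m] is the feature vector T_m; target_index picks the ozone column
    (ozone is T_{m,k}, the last feature, in the text above).
    """
    pairs: List[Window] = []
    # Move the window one step at a time until all samples have been scanned.
    for m in range(len(samples) - history - horizon + 1):
        past = list(samples[m:m + history])                # T_m ... T_{m+N-1}
        future = [samples[m + history + j][target_index]   # ozone of T_{m+N}, ...
                  for j in range(horizon)]
        pairs.append((past, future))
    return pairs
```

With the 10-sample input used in the embodiment below, the call would be sliding_windows(data, history=10, horizon=1) for 1-day prediction.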

Step S1.3: constructing a CNN-Transformer model, and sending time series data into the CNN-Transformer model for training;

FIG. 3 shows the overall architecture of the hybrid CNN-Transformer model, which is trained as follows. First, the two one-dimensional convolution layers of the CNN convolve the historical data to extract local features between different time points of the sequence. Second, the improved Transformer model further processes the CNN features with the attention mechanism to mine the global features of the sequence. Finally, the DNN compresses the global features extracted by the Transformer, mapping the high-dimensional data to one dimension to obtain the predicted ozone concentration. The prediction is compared with the true value, the HuberLoss is calculated, and training continues until the HuberLoss value is optimal.

FIG. 4 shows the convolution operation of the CNN model. Assume that the input sequence has length n and each sample has m features, so that each element of the sequence is written T_{ij}. Here i = 1, …, n indexes the sample currently participating in the convolution, and j = 1, …, m indexes the multivariate features participating in the convolution: nitrogen dioxide, nitric oxide, sulfur dioxide, carbon monoxide, wind speed, wind direction, air pressure, maximum and minimum air temperature, and ozone. There are k convolution kernels, each of size p and moved with a stride of q, so the feature dimension finally extracted by the CNN is k. Each convolution kernel first convolves the samples T_{1j}, …, T_{pj} to generate an output f_{1h}, where h = 1, …, k. The kernel then slides q steps and repeats this process until all samples have participated in the convolution. The whole convolution calculation can be expressed by the following formula:

where h denotes the h-th convolution kernel, l denotes the l-th element of the output sequence, and r is the total length of the output sequence, which can be expressed by the following formula:

The results of all convolution kernels over all sequence samples are recorded as the set Ω = {F_1, …, F_r}, where F_l = [f_{l1}, …, f_{lk}]. Ω is the final output of the CNN layer.
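The convolution described above can be sketched in plain Python. Each kernel spans p consecutive samples and all m features, which matches the description of f_{1h} being computed from T_{1j}, …, T_{pj}. The sketch assumes no padding, one bias per kernel and no activation function; the document states none of these. Under these assumptions the output length is r = ⌊(n − p)/q⌋ + 1. As in deep learning frameworks, the kernel is not flipped (cross-correlation). The function name is hypothetical.

```python
from typing import List, Sequence


def conv1d(series: Sequence[Sequence[float]],
           kernels: Sequence[Sequence[Sequence[float]]],
           biases: Sequence[float], stride: int) -> List[List[float]]:
    """Valid (unpadded) 1-D convolution over time.

    series:  n x m   (n samples T_ij with m features each)
    kernels: k x p x m (k kernels, each p samples wide, spanning all m features)
    returns: r x k, r = (n - p) // stride + 1; row l is F_l = [f_l1, ..., f_lk]
    """
    n, p = len(series), len(kernels[0])
    if p > n:
        raise ValueError("kernel is longer than the input sequence")
    r = (n - p) // stride + 1
    output = []
    for l in range(r):
        start = l * stride  # window covers samples start .. start + p - 1
        row = []
        for weights, bias in zip(kernels, biases):
            acc = bias
            for s in range(p):
                for j, x in enumerate(series[start + s]):
                    acc += weights[s][j] * x
            row.append(acc)
        output.append(row)
    return output
```

For example, with n = 4 samples, p = 2 and q = 2, the output has r = 2 rows, and each row holds one value per kernel.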

The above description of the CNN model only helps those skilled in the art understand it more clearly and does not limit the present invention in any way.

The Transformer model consists mainly of two parts, an encoder and a decoder, described in detail below with reference to the encoder-decoder architecture of FIG. 5. As shown in FIG. 5, the time series data first pass through an input mapping layer, which maps the feature dimension extracted by the CNN to a new dimension, denoted d_model. The data are then positionally encoded. In this process, assume that the input sequence has length n. Let t denote a specific position in the sequence, i denote the i-th feature dimension at that position, and p_t^(i) denote the positional encoding function in that dimension, which can be expressed as:

$$p_{t}^{(i)}=\begin{cases}\sin(\omega_{k}t), & i=2k\\ \cos(\omega_{k}t), & i=2k+1\end{cases}$$

where i = 1, …, d_model, t = 1, …, n, and the frequency ω_k = 1/10000^(2k/d_model). Each calculated positional encoding value is added to the corresponding sample, i.e., timing information is added to the sample. The data then enter the encoding layers. Each encoding layer consists of 2 sub-layers, a multi-head self-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and layer normalization. The input of the attention layer consists of three parts, queries, keys and values, with dimensions d_k, d_k and d_v, respectively. For convenient computation, all queries and key-value pairs are packed into matrices, denoted Q, K and V, and the attention function is computed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
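The sinusoidal positional encoding described above can be sketched as follows. The sketch indexes dimensions and positions from 0, as in the original Transformer of Vaswani et al. The document counts i and t from 1, so treat the exact index offset as an assumption. Function names are hypothetical.

```python
import math
from typing import List


def positional_encoding(n: int, d_model: int) -> List[List[float]]:
    """Sinusoidal positional encoding: one d_model-dim vector for each position t.

    Even dimensions i = 2k use sin(w_k t) and odd dimensions i = 2k + 1 use
    cos(w_k t), with w_k = 1 / 10000^(2k / d_model); indices are 0-based here.
    """
    table = []
    for t in range(n):
        row = []
        for i in range(d_model):
            k = i // 2
            w_k = 1.0 / (10000 ** (2 * k / d_model))
            row.append(math.sin(w_k * t) if i % 2 == 0 else math.cos(w_k * t))
        table.append(row)
    return table


def add_positional_encoding(seq: List[List[float]]) -> List[List[float]]:
    """Add the encoding element-wise to an n x d_model sequence."""
    pe = positional_encoding(len(seq), len(seq[0]))
    return [[x + p for x, p in zip(row, p_row)] for row, p_row in zip(seq, pe)]
```

At position 0 every sine term is 0 and every cosine term is 1, so with d_model = 4 the first encoding vector is [0.0, 1.0, 0.0, 1.0].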

Building on the attention function, the multi-head attention function divides the computation into h heads, each with dimension d_model/h. Attention is computed separately for each head with the formula above. The results of all heads are then concatenated along the feature dimension and mapped to the corresponding dimension. The entire multi-head attention function can be written as:

$$M(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)W^{O}$$

where

$$\mathrm{head}_i=\mathrm{Attention}(QW_i^{Q},KW_i^{K},VW_i^{V})$$

and W_i^Q, W_i^K and W_i^V are parameter matrices that map Q, K and V to dimension d_model/h. W^O is also a parameter matrix; it maps the concatenated heads back to the initial dimension d_model. The invention adds an "encoder-to-encoder" attention layer between different encoding layers to further extract features. The output of the encoder has dimension d_model and is fed to the decoder.
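Scaled dot-product attention and the multi-head function M(Q, K, V) can be sketched in plain Python as below. The sketch omits batching, masking, dropout and training, and the helper names are hypothetical. The document does not specify how the "encoder-to-encoder" layer is wired. One plausible reading, which is only an assumption, is to call multi_head with queries from one encoding layer's output and keys and values from another layer's output.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def multi_head(q: Matrix, k: Matrix, v: Matrix,
               w_q: List[Matrix], w_k: List[Matrix], w_v: List[Matrix],
               w_o: Matrix) -> Matrix:
    """M(Q,K,V) = Concat(head_1..head_h) W^O, head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    heads = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    # Concatenate the h head outputs along the feature dimension, row by row.
    concat = [[x for head in heads for x in head[t]] for t in range(len(q))]
    return matmul(concat, w_o)
```

When all keys are identical, every attention weight is equal, so each query receives the plain average of the value rows. This is a quick sanity check for the softmax normalization.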

The decoder has a structure similar to that of the encoder. In addition to the two sub-layers of an encoding layer, each decoding layer has an "encoder-to-decoder" cross multi-head attention layer. This layer computes the attention function using the decoder input, i.e., the convolved target sequence, as the query and the encoder output as the keys and values. The cross-attention layer extracts the association between the target sequence and the input sequence, capturing the dynamic associations and computation patterns between them. The sequence-based fully connected layer maps the features at each time point of the decoding-layer output sequence to d_model dimensions. The output sequence of the decoder has the same length as the target sequence, and its feature dimension is d_model.

Finally, the fully connected layers of the DNN compress the feature dimension d_model output by the decoder to the target dimension, which is 1 in this embodiment. The output sequence is then inverse-normalized with the min-max normalization function to obtain the ozone concentration to be predicted.

To evaluate how accurately the trained CNN-Transformer model predicts ozone concentration, the invention uses HuberLoss as the loss function, specifically as follows:

the HuberLoss between the predictions given by the model and the true values is calculated by the following formula:

where n is the total number of samples in the training set or the validation set, O_i is the true ozone value of the i-th sample, and P_i is the predicted ozone value of the i-th sample. The smaller the HuberLoss, the closer the predictions of the trained CNN-Transformer model are to the true values, i.e., the higher the model accuracy. To optimize the HuberLoss, the model is trained iteratively, and all model parameters are updated after each training pass until the HuberLoss value is optimal.

Step S1.4: acquire, in real time, the multivariate atmospheric pollutant data and meteorological data provided by the monitoring station as historical reference data;

Step S1.5: feed the data into the trained CNN-Transformer model. The computed result is the prediction, i.e., the predicted atmospheric ozone concentration.

It should be noted that the time span of the predicted ozone concentration matches the time span of the target sequence provided during model training. If the target sequence used in training is the ozone concentration over 3 hours, the model can only predict the ozone concentration over the next 3 hours. The computation during prediction is the same as during training.

To help those skilled in the art better understand the scheme, Beijing is selected as the case-study city; the experimental scheme can be extended to a wider range. The experiment has two parts. The first part is model training: a large amount of historical data is used to train the constructed hybrid CNN-Transformer model, and part of the data serves as a validation set to evaluate model performance. The second part is the inference test: part of the data serves as a test set. The CNN-Transformer trained in the first part makes predictions, which are compared with the true values, and several evaluation indices are used to score the prediction accuracy of the model.

14 monitoring sites in Beijing are selected as research targets. The sample data are divided into ozone data, other multi-pollutant data (nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide) and meteorological data (wind speed, wind direction, maximum air temperature and minimum air temperature). Data from January 2014 to July 31, 2021 are used. To ensure data integrity, this embodiment applies mean filling to the partially missing or abnormal values. In this embodiment, 10% of the data are used as the test set, 20% as the validation set, and the rest as the training set.
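The 70/20/10 split can be sketched as follows. The document does not say whether the split is chronological or random, or whether it is made per monitoring site. This sketch assumes a chronological split without shuffling, a common choice for time series so that test data come after training data. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], val_frac: float = 0.2,
                        test_frac: float = 0.1) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into train / validation / test without shuffling.

    The assumed order (train, then validation, then test) keeps the test
    period strictly after the data used for fitting.
    """
    n = len(samples)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    n_train = n - n_val - n_test
    return (list(samples[:n_train]),
            list(samples[n_train:n_train + n_val]),
            list(samples[n_train + n_val:]))
```

For 100 windowed samples, this yields 70 training, 20 validation and 10 test samples.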

In the training process, the present embodiment inputs a sequence with a length of 10, that is, the model inputs ozone concentration data, other multivariate atmospheric pollutant data and meteorological data of the past 10 days. This example performed two sets of experiments, the first set of experiments to predict the 1 day future ozone concentration and the second set of experiments to predict the 3 day future ozone concentration. The training parameters in both sets of experiments were as follows: the number of iterations of the model was 500, the training batch size was 64, Adam was used as the optimizer, and the learning rate was 0.001. During the training phase, HuberLoss is used as a loss function.

It should be noted that those skilled in the art may set different hyperparameters and input sequence lengths for different types of data without following this embodiment. The parameters provided in this embodiment should not be construed as limiting the present invention.

In the inference test, the prediction accuracy of the model is measured by three evaluation indices: the root mean square error (RMSE), the mean absolute error (MAE) and the normalized root mean square error (NRMSE), given by the following formulas:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i-P_i)^2}$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert O_i-P_i\rvert$$

$$\mathrm{NRMSE}=\frac{\mathrm{RMSE}}{O_{\max}-O_{\min}}$$

where n is the total number of samples in the test set, O_i is the true value of the i-th sample, P_i is the predicted value of the i-th sample, O_max is the maximum of the observed samples, and O_min is the minimum of the observed samples. Table 1 shows the three evaluation index values for the two sets of experiments.
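The three evaluation indices can be computed as below. NRMSE is returned as a fraction, so multiply by 100 to obtain the percentages reported in Table 1. It is assumed that the metrics are computed on inverse-normalized ozone values, since the model output is inverse-normalized before comparison. Function names are hypothetical.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    if len(obs) != len(pred) or not obs:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observations O_i and predictions P_i."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(obs, pred)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nrmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE normalized by the observed range O_max - O_min (a fraction, not %)."""
    span = max(obs) - min(obs)
    if span == 0:
        raise ValueError("observed values have zero range")
    return rmse(obs, pred) / span
```

For observations [10, 20, 30, 40] and predictions [12, 18, 33, 40], the errors are 2, 2, 3 and 0. MAE is therefore 1.75, RMSE is √4.25 ≈ 2.06, and NRMSE is RMSE / 30.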

Table 1. Values of the three evaluation indices for Experiments 1 and 2 on the test set

Evaluation index	Experiment 1	Experiment 2
RMSE	7.75	16.27
MAE	5.92	12.83
NRMSE	3.61%	7.56%

Other metrics may be selected by those skilled in the art to evaluate the performance of the model. When the index value of the prediction result given by the model on the test set reaches a certain threshold value, the trained CNN-Transformer can be considered to give an accurate prediction result, and the threshold value is not limited by the invention.

The principle and embodiments of the present invention are illustrated herein by using specific examples, and the above description is only an example of the present invention for helping understanding the method and core concept of the present invention, and is not intended to limit the scope of the present invention; meanwhile, all the equivalent structures or equivalent processes performed by using the drawings and the contents of the drawings in the specification of the invention, or directly or indirectly applied to other related technical fields, are included in the scope of patent protection of the invention.

Claims

1. An atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, characterized by comprising a model training stage and a model inference (prediction) stage:

the model training phase comprises the following steps:

S1.1: acquiring ozone data, other atmospheric pollutant data and meteorological data as training data, processing missing values and abnormal values, and normalizing the processed multivariate data to eliminate the influence of dimensional differences; the other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature;

S1.2: serializing the data with a sliding window technique to form time series data;

S1.3: constructing a CNN-Transformer model, feeding the time series data into the CNN-Transformer model, and starting the model training process;

the model inference (prediction) stage comprises the following steps:

2. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.1 the normalization of the processed data specifically comprises:

normalizing the acquired data along the feature dimension with a min-max normalization function, limiting the data to the range [0, 1];

the min-max normalization function is formulated as follows:

$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$

3. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.3 the CNN-Transformer model consists of three parts, a convolutional neural network (CNN), a Transformer and a deep neural network (DNN), configured as follows: the CNN comprises 2 one-dimensional convolution layers, each with 32 convolution kernels, so that 32-dimensional latent features can be extracted; the Transformer consists of an encoder and a decoder, the encoder comprising 1 input mapping layer, 1 positional encoding layer, 3 encoding layers and 1 "encoder-to-encoder" cross multi-head attention layer, and the decoder comprising 1 input mapping layer, 3 decoding layers and 1 sequence-based fully connected layer; each of the 3 encoding layers consists of 1 multi-head self-attention layer with an output dimension of 64 and 1 feed-forward fully connected layer with the same output dimension; each of the 3 decoding layers has the same structure as an encoding layer, except that an "encoder-to-decoder" cross multi-head attention layer is added after the multi-head self-attention layer; each multi-head attention layer has 8 heads; the DNN has 2 hidden layers, with 256 neurons in the first and 128 in the second, using the ReLU function as the activation function; a Dropout layer follows each hidden layer, with ratios of 0.4 and 0.3, respectively, and the output layer has 1 neuron.

4. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.3 the model training process is: setting the number of iterations to 500 and the training batch size to 64, using Adam as the optimizer, and setting the learning rate to 0.001; in the training phase, HuberLoss is adopted as the loss function, with the following expression: