CN114611792A - Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model - Google Patents


Info

Publication number
CN114611792A
Authority
CN
China
Legal status
Granted
Application number
CN202210238135.6A
Other languages
Chinese (zh)
Other versions
CN114611792B (en)
Inventor
孙强
陈逸彬
徐爱兰
蒋行健
黄勋
陈晓敏
Current Assignee
Nantong University
Original Assignee
Nantong University
Application filed by Nantong University
Priority to CN202210238135.6A
Publication of CN114611792A
Application granted
Publication of CN114611792B
Legal status: Active



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention relates to an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, and designs a mixed CNN-Transformer model that combines a convolutional neural network with an improved Transformer model. The convolutional neural network consists of 2 one-dimensional convolutional layers; the Transformer model consists of an encoder and a decoder, with 3 encoding layers in the encoder and 3 decoding layers in the decoder. In addition, building on the traditional Transformer encoder-decoder framework, an "encoder-to-encoder" cross multi-head attention layer is added between different encoding layers of the encoder to further mine the correlations of encoded information across those layers. The CNN model effectively extracts useful information along the feature dimension, remedying the insufficient information extraction capability of the encoder in the Transformer model. The prediction method reflects the influence of multivariate data on ozone concentration more faithfully and learns this influence through the CNN-Transformer model, thereby giving more accurate ozone concentration predictions.

Description

Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model
Technical Field
The invention relates to an atmospheric ozone concentration prediction method, in particular to an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model.
Background
At present, accelerating industrialization and urbanization have led to a sharp decline in air quality. The main atmospheric pollutants include sulfur dioxide, nitrogen oxides, particulate matter, carbon monoxide and ozone. Prolonged exposure to ozone at too high or too low a concentration can cause physical harm such as dyspnea. Given the effects of ozone on the human body, a method that can accurately predict ozone concentrations is urgently needed so that the relevant authorities can take management and control measures.
With the development of artificial intelligence and deep learning, using deep learning models to predict atmospheric pollutant concentrations has become a research hotspot and trend. Existing deep-learning models for ozone concentration prediction include recurrent neural networks (RNN), long short-term memory networks (LSTM), and hybrid models combining convolutional neural networks with LSTM (CNN-LSTM). The recurrent neural network is an early deep learning model proposed in the field of machine translation, and many scholars subsequently applied it to time series prediction. For example, Biancofiore et al. used a conventional RNN model for Pescara, taking past ozone data and other atmospheric pollutant data as training data to predict the ozone concentration 1, 3, 6, 12 and 24 hours ahead; predictions up to 6 hours ahead were good, but errors beyond 6 hours were large. Conventional RNN models suffer from the long-range dependency problem, i.e., they perform poorly on long input and output sequences. At present, the mainstream model for time series prediction is the CNN-LSTM hybrid: the CNN extracts local spatio-temporal features from the data, and the LSTM alleviates the long-range dependency problem to some extent compared with the conventional RNN, so combining the two gives better results in time series prediction.
However, the LSTM model still does not fully solve the long-range dependency problem: when the input sequence is long, information is continually forgotten, so little useful information is ultimately extracted. The attention-based Transformer model proposed by Vaswani et al. addresses this information-forgetting problem well. Wu et al. used a deep Transformer model to forecast influenza prevalence in the United States; their study showed that the Transformer performs well on univariate datasets but its advantage on multivariate datasets is not significant, indicating that the model does not adequately mine the associations between different dimensions of the data. Therefore, the problem to be addressed is how to improve the Transformer model so that it extracts information from multivariate data more effectively. To solve these problems, this application provides an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model.
Disclosure of Invention
The invention aims to provide an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model that predicts atmospheric ozone concentration accurately and effectively, overcomes the weak ability of the traditional Transformer framework to extract information from multivariate data, extracts the relationships among different features in multivariate data, and makes full use of historical data to predict future ozone concentrations accurately.
In order to achieve the above object, the present invention provides the following solutions:
An atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model comprises the following steps:
Model training stage:
S1.1: acquiring ozone data, other atmospheric pollutant data and meteorological data as training data, processing missing and abnormal values, and normalizing the processed data to eliminate the effect of differing units of measurement. The other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature;
S1.2: serializing the data into time series using a sliding-window technique. The prediction method provides a default sliding-window size, which the user can customize to set the time span of the historical data used to predict future ozone concentrations. The user can also adjust the length of the predicted data, i.e., the time span of the output data;
S1.3: constructing a CNN-Transformer model, feeding the time series data into it, and starting the model training process.
Model inference (prediction) stage:
S1.4: acquiring in real time, as historical data, ozone data and other atmospheric pollutant data from an environmental monitoring station and meteorological data collected by a weather station, processing missing and abnormal values, and normalizing the processed data to eliminate the effect of differing units of measurement;
S1.5: feeding the data into the trained CNN-Transformer model, which outputs the prediction result.
Preferably, in step S1.1, the normalization processing is performed on the processed data, and specifically includes:
the acquired data are normalized in the feature dimension using a maximum-minimum normalization function, which limits the range of the data to [0, 1].
The maximum-minimum normalization function is:
$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$
where $x \in X$ is an original data value, $X$ denotes the original data set comprising ozone data, other atmospheric pollutant data and meteorological data, $x^{*}$ denotes the normalized data, $x_{\min}$ is the minimum value in the original data set, and $x_{\max}$ is the maximum value in the original data set.
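A minimal sketch of this normalization and of its inverse (which the method later uses to turn model outputs back into concentrations), using only the Python standard library. It assumes the scaling is applied per feature column, which is how we read "in the feature dimension"; the function names and the guard for constant columns are illustrative assumptions, not part of the patent.

```python
from typing import List, Tuple


def min_max_normalize(column: List[float]) -> Tuple[List[float], float, float]:
    """Scale one feature column to [0, 1] with x* = (x - x_min) / (x_max - x_min).

    Returns the scaled values together with x_min and x_max, which must be kept
    so that model outputs can later be mapped back to physical units.
    """
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant column: avoid division by zero (assumption, not covered by the patent).
        return [0.0 for _ in column], x_min, x_max
    return [(x - x_min) / span for x in column], x_min, x_max


def inverse_min_max(scaled: List[float], x_min: float, x_max: float) -> List[float]:
    """Invert the normalization: x = x* * (x_max - x_min) + x_min."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Hypothetical ozone readings for one station (illustrative values only).
ozone = [40.0, 60.0, 100.0, 80.0]
scaled, lo, hi = min_max_normalize(ozone)
print(scaled)  # [0.0, 0.3333333333333333, 1.0, 0.6666666666666666]
restored = inverse_min_max(scaled, lo, hi)
```

At inference time (step S1.4) it is natural to reuse the training-set x_min and x_max rather than recomputing them on the live data; the patent does not say which it does.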
Preferably, in step S1.3, the CNN-Transformer model mainly consists of three parts, a convolutional neural network (CNN), a Transformer and a deep linear neural network (DNN), configured as follows. The CNN model comprises 2 one-dimensional convolutional layers, each with 32 convolution kernels, so that 32-dimensional latent features can be extracted. The Transformer model mainly consists of an encoder and a decoder. Each encoder contains 1 input mapping layer, 1 positional encoding layer, 3 encoding layers and 1 "encoder-to-encoder" cross multi-head attention layer. Each decoder contains 1 input mapping layer, 3 decoding layers and 1 sequence-based fully connected layer. Each encoding layer consists of 1 multi-head self-attention layer with an output dimension of 64 and 1 feed-forward fully connected layer with the same output dimension. Each of the 3 decoding layers has the same structure as an encoding layer, except that an "encoder-to-decoder" cross multi-head attention layer follows the multi-head self-attention layer. Each multi-head attention layer has 8 heads. The DNN model has 2 hidden layers with 256 and 128 neurons respectively and uses the ReLU function as the activation function; each hidden layer is followed by a Dropout layer, with rates of 0.4 and 0.3 respectively. The output layer has 1 neuron.
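The hyperparameters listed above can be gathered in a small configuration object, sketched here in Python. Reading the "64 output dimensions" of the attention layers as d_model = 64 is our interpretation; the class name, field names and the parameter-count helper are illustrative assumptions. The kernel size and stride of the convolutional layers are not given in the patent, so they are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CnnTransformerConfig:
    """Hyperparameters listed for step S1.3 (field names are illustrative)."""
    # CNN: two 1-D convolutional layers with 32 kernels each
    conv_layers: int = 2
    conv_kernels: int = 32
    # Transformer: d_model read from "64 output dimensions"; 8 attention heads
    d_model: int = 64
    num_heads: int = 8
    encoder_layers: int = 3
    decoder_layers: int = 3
    # DNN head: two ReLU hidden layers, each followed by Dropout
    hidden_sizes: Tuple[int, ...] = (256, 128)
    dropout_rates: Tuple[float, ...] = (0.4, 0.3)
    output_size: int = 1

    @property
    def head_dim(self) -> int:
        """Per-head dimension d_model / h inside each multi-head attention layer."""
        if self.d_model % self.num_heads:
            raise ValueError("d_model must be divisible by num_heads")
        return self.d_model // self.num_heads

    def attention_param_count(self) -> int:
        """Parameters of one multi-head attention layer, assuming the Q, K, V and
        output projections are each a d_model x d_model matrix plus a bias."""
        return 4 * (self.d_model * self.d_model + self.d_model)


cfg = CnnTransformerConfig()
print(cfg.head_dim)                 # 8
print(cfg.attention_param_count())  # 16640
```

Keeping d_model divisible by the number of heads is what lets each of the 8 heads work in an 8-dimensional subspace.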
Preferably, in step S1.3, the model training process is as follows: the number of training iterations is set to 500, the batch size to 64, Adam is used as the optimizer, and the learning rate is 0.001. In the training phase, HuberLoss is used as the loss function:
$$\mathrm{HuberLoss}=\frac{1}{n}\sum_{i=1}^{n} z_i$$
$$z_i=\begin{cases}\frac{1}{2}\left(O_i-P_i\right)^2, & \left|O_i-P_i\right|\le\delta\\ \delta\left(\left|O_i-P_i\right|-\frac{1}{2}\delta\right), & \text{otherwise}\end{cases}$$
where n is the total number of samples in the training set or validation set, $O_i$ is the true ozone value of the i-th sample, $P_i$ is the predicted ozone value of the i-th sample, and $\delta$ is the threshold at which the loss switches from quadratic to linear;
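The patent does not state the Huber threshold δ. The sketch below assumes δ = 1, which is also the default of PyTorch's `torch.nn.HuberLoss`; with that setting the per-sample loss behaves like squared error for small residuals and like absolute error for large ones, so occasional extreme ozone readings pull on the model less than under plain mean squared error. The function name is illustrative.

```python
from typing import Sequence


def huber_loss(observed: Sequence[float], predicted: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss over n samples.

    Per sample, with e = O_i - P_i: 0.5 * e**2 if |e| <= delta,
    otherwise delta * (|e| - 0.5 * delta). delta = 1.0 is an assumed default.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    total = 0.0
    for o, p in zip(observed, predicted):
        err = abs(o - p)
        if err <= delta:
            total += 0.5 * err * err               # quadratic region, like MSE
        else:
            total += delta * (err - 0.5 * delta)   # linear region, like MAE
    return total / len(observed)


print(huber_loss([1.0, 2.0, 5.0], [1.5, 2.0, 2.0]))  # 0.875
```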
in each iteration, the training samples first enter the CNN model, whose 2 one-dimensional convolutional layers extract useful information along the feature dimension. The output sequence is fed to the Transformer model, where the positional encoding layer adds temporal order information; the sequence carrying this information is encoded by the encoder, and the encoding result is passed to the decoder to obtain the decoded data. The decoded data are fed to the DNN layers, which compress the feature dimension to the target dimension of 1. The result is inverse-normalized to obtain the final predicted value, which is compared with the label value to compute the HuberLoss. Iteration continues until the model reaches its optimum, i.e., until the HuberLoss is minimized.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses a Transformer-based model to predict atmospheric ozone concentration, exploiting its ability to extract global features from time series data.
2. The invention adopts a CNN model, which effectively extracts local features and correlations among the data across multiple dimensions; by fusing the CNN and Transformer models, the global features of the time series data are also fully taken into account.
3. In the encoder of the Transformer model, a multi-head attention layer is added between different encoding layers, which effectively addresses the insufficient ability of the traditional Transformer encoder to extract multi-dimensional features.
Experiments were carried out on the adopted data set, and the experimental results demonstrate the superiority and prediction accuracy of the CNN-Transformer model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the atmospheric ozone concentration prediction method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the sliding window provided in Embodiment 1 of the present invention;
FIG. 3 is a diagram of the overall architecture of the hybrid CNN-Transformer model provided in Embodiment 1 of the present invention;
FIG. 4 is a diagram of the CNN model architecture provided in Embodiment 1 of the present invention;
FIG. 5 is a diagram of the encoder-decoder architecture of the Transformer model provided in Embodiment 1 of the present invention.
Detailed description of the preferred embodiments
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention aims to provide an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, which addresses the low prediction accuracy of traditional prediction models and the insufficient ability of the traditional Transformer model to extract multi-dimensional features.
To make the above objects, features and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in fig. 1, the present embodiment provides an atmospheric ozone concentration prediction method based on a mixed CNN-Transformer model, including:
step S1.1: acquiring multivariate atmospheric pollutant data and meteorological data as historical training data, processing missing and abnormal values, normalizing the processed data, and eliminating the influence caused by dimensional difference.
Step S1.1 specifically includes:
missing values are filled by mean imputation, i.e., each missing value is replaced with the mean of all the data, and abnormal values are likewise replaced with the mean of the data;
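The patent does not say how abnormal values are detected. The sketch below assumes, hypothetically, a 3-sigma rule applied per feature column, represents missing readings as `None`, and replaces both missing and abnormal values with the column mean computed over the observed values. The function name and the threshold are assumptions.

```python
import statistics
from typing import List, Optional


def clean_column(column: List[Optional[float]], z_thresh: float = 3.0) -> List[float]:
    """Mean-fill missing values (None) and replace outliers with the mean.

    Outliers are values more than z_thresh population standard deviations
    from the mean (a hypothetical rule; the patent does not specify one).
    """
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed)
    cleaned = []
    for x in column:
        if x is None:
            cleaned.append(mean)  # missing value -> mean
        elif std > 0 and abs(x - mean) > z_thresh * std:
            cleaned.append(mean)  # abnormal value -> mean
        else:
            cleaned.append(x)
    return cleaned


# Illustrative column: a spike of 200.0 and one missing reading.
column = [10.0, 11.0, 12.0, 10.0, 11.0, 12.0, 10.0, 11.0, 12.0, 10.0, 11.0, 200.0, None]
cleaned = clean_column(column)
```

Because the mean here is computed before outliers are removed, a large spike pulls the fill value upward (about 26.7 in this toy column); computing the mean after discarding outliers is a common alternative.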
the processed data are normalized using the maximum-minimum normalization function:
$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$
where $x \in X$ is an original data value, $X$ denotes the original data set comprising ozone data, other atmospheric pollutant data and meteorological data, $x^{*}$ denotes the normalized data, $x_{\min}$ is the minimum value in the original data set, and $x_{\max}$ is the maximum value in the original data set. The other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature.
Step S1.2: serializing the data into time series data using a sliding-window technique;
data serialization with the sliding-window technique specifically comprises:
suppose the feature size of a time-series sample $T_m$ is k, where m is the sample index, and the sample is expressed as a vector $T_m=[T_{m,1},\ldots,T_{m,k}]$, where $T_{m,k}$ is the ozone concentration. A sliding window of length N+1 is used to segment the data: $T_m,\ldots,T_{m+N-1}$ serve as historical data and $T_{m+N}$ as known future data, and the model's ozone prediction $P_{m+N}$ is compared with $T_{m+N,k}$. The window then moves forward one step to generate the next sample sequence, and the sliding process is repeated until all sample data have been scanned. FIG. 2 shows an example of serializing data with the sliding-window technique, where the sliding window has a size of 3, $S_i$ denotes the i-th training sample and n denotes the length of the sequence.
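The windowing step can be sketched in Python as follows: a window of N + 1 steps yields N history vectors and one future ozone value, and the window advances one step at a time. The `horizon` argument, which takes the next several ozone values as the target sequence (as in the 3-day experiment later), is our generalization; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[Sequence[float]], List[float]]


def sliding_windows(series: Sequence[Sequence[float]], history: int,
                    horizon: int = 1, target_index: int = -1) -> List[Sample]:
    """Cut a multivariate series into (history, future-target) pairs.

    series[t] is the feature vector at time t; target_index selects the ozone
    column (the last feature, T_{m,k}, in the text's notation). The window
    advances one step at a time, producing len(series) - history - horizon + 1
    samples.
    """
    samples: List[Sample] = []
    for start in range(len(series) - history - horizon + 1):
        past = list(series[start:start + history])
        future = [series[start + history + h][target_index] for h in range(horizon)]
        samples.append((past, future))
    return samples


# Toy series: 5 time steps, 2 features [NO2, O3] (illustrative values).
series = [[1.0, 30.0], [2.0, 31.0], [3.0, 32.0], [4.0, 33.0], [5.0, 34.0]]
pairs = sliding_windows(series, history=2)
print(len(pairs))  # 3
print(pairs[0])    # ([[1.0, 30.0], [2.0, 31.0]], [32.0])
```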
Step S1.3: constructing a CNN-Transformer model, and sending time series data into the CNN-Transformer model for training;
FIG. 3 shows the overall architecture of the hybrid CNN-Transformer model, which is trained as follows. First, the two one-dimensional convolutional layers of the CNN model convolve the historical data and extract local features between different time nodes in the time series. Second, the improved Transformer model applies the attention mechanism to the features extracted by the CNN to mine global features of the time series. Finally, the DNN model compresses the global features extracted by the Transformer model, mapping the high-dimensional data to one dimension to obtain the predicted ozone concentration. The prediction is compared with the true value, the HuberLoss is computed, and training continues until the HuberLoss is optimal.
FIG. 4 shows the convolution operation of the CNN model. Assume the input sequence has length n and each sample has m features. Each sequence sample is denoted $T_{ij}$, where $i=1,\ldots,n$ indexes the sample currently participating in the convolution and $j=1,\ldots,m$ indexes the multivariate features participating in the convolution, such as nitrogen dioxide, nitric oxide, sulfur dioxide, carbon monoxide, wind speed, wind direction, air pressure, maximum and minimum air temperature, and ozone. There are k convolution kernels, each of size p and moving with stride q, so the feature dimension finally extracted by the CNN is k. Each convolution kernel first convolves the samples $T_{1j},\ldots,T_{pj}$ to generate an output $f_{1h}$, where $h=1,\ldots,k$. The kernel then slides q steps and repeats this process until all samples have participated in the convolution. The whole convolution can be expressed as:
$$f_{lh}=\sum_{i=1}^{p}\sum_{j=1}^{m} w^{(h)}_{ij}\,T_{(l-1)q+i,\,j}+b_h$$
where h denotes the h-th convolution kernel (with weights $w^{(h)}_{ij}$ and bias $b_h$), l denotes the l-th element of the output sequence, and r is the total length of the output sequence, given by:
$$r=\left\lfloor\frac{n-p}{q}\right\rfloor+1$$
the results of applying the different convolution kernels to all sequence samples are collected in the set $\Omega=\{F_1,\ldots,F_r\}$, where $F_l=[f_{l1},\ldots,f_{lk}]$; $\Omega$ is the final output of the CNN layer.
The CNN model is described here only to help those skilled in the art understand it more clearly, and the description does not limit the present invention in any way.
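A plain-Python sketch of one convolutional layer following the formula above, with k kernels of size p and stride q. Like common deep-learning libraries it computes cross-correlation (no kernel flip) without padding; the bias term, the absence of an activation and all names are assumptions made for illustration. In the patent's model, two such layers with 32 kernels each would be stacked.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_valid(T: Matrix, kernels: List[Matrix], biases: List[float], q: int = 1) -> Matrix:
    """One 1-D convolutional layer over an n x m sequence with no padding.

    Each kernel is p x m; output row l (0-based) holds, for every kernel h,
    sum_{i,j} w[h][i][j] * T[l*q + i][j] + b[h]. The result is r x k with
    r = (n - p) // q + 1. No activation is applied (the text names none).
    """
    n, m = len(T), len(T[0])
    p = len(kernels[0])
    r = (n - p) // q + 1
    out: Matrix = []
    for l in range(r):
        start = l * q  # 0-based counterpart of (l - 1) q in the formula
        row = []
        for w, b in zip(kernels, biases):
            s = b
            for i in range(p):
                for j in range(m):
                    s += w[i][j] * T[start + i][j]
            row.append(s)
        out.append(row)
    return out


# n = 4 samples, m = 2 features; k = 2 kernels of size p = 2.
T = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
sum_all = [[1.0, 1.0], [1.0, 1.0]]        # adds every value in the window
first_feature = [[1.0, 0.0], [0.0, 0.0]]  # picks feature 1 of the window's first row
print(conv1d_valid(T, [sum_all, first_feature], [0.0, 0.0], q=1))
# [[10.0, 1.0], [18.0, 3.0], [26.0, 5.0]]
print(conv1d_valid(T, [sum_all, first_feature], [0.0, 0.0], q=2))
# [[10.0, 1.0], [26.0, 5.0]]
```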
The Transformer model mainly consists of an encoder and a decoder, which are described in detail below with reference to the encoder-decoder architecture of FIG. 5. As shown in FIG. 5, the time series data first pass through an input mapping layer, which maps the feature dimension extracted by the CNN to a new dimension, denoted $d_{\text{model}}$, and the data are then position-encoded. In this process, assume the input sequence has length n, t denotes a specific position in the sequence, i denotes the i-th feature dimension at that position, and $p_t^{(i)}$ denotes the position-encoding function in this dimension, which can be expressed as:
$$p_t^{(i)}=\begin{cases}\sin(\omega_k t), & i=2k\\ \cos(\omega_k t), & i=2k+1\end{cases}$$
where $i=1,\ldots,d_{\text{model}}$, $t=1,\ldots,n$, and the frequency $\omega_k=1/10000^{2k/d_{\text{model}}}$. Each computed position-encoding value is added to the corresponding sample, i.e., temporal order information is added to the sample. The data then enter the encoding layers. Each encoding layer consists of 2 sub-layers, a multi-head self-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and layer normalization. The input of the attention layer consists of queries, keys and values, with dimensions $d_k$, $d_k$ and $d_v$ respectively. For efficient computation, all queries and key-value pairs are packed into matrices Q, K and V, and attention is computed by the attention function:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
Building on the attention function, the multi-head attention function splits it into h heads, each of dimension $d_{\text{model}}/h$, and computes attention for each head using the formula above. The per-head results are then concatenated along the feature dimension and projected to the corresponding dimension. The entire multi-head attention function can be written as:
$$M(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^{O}$$
where
$$\mathrm{head}_i=\mathrm{Attention}\left(QW_i^{Q},KW_i^{K},VW_i^{V}\right)$$
and $W_i^{Q}, W_i^{K}, W_i^{V}\in\mathbb{R}^{d_{\text{model}}\times d_{\text{model}}/h}$ are the parameter matrices that map Q, K and V to dimension $d_{\text{model}}/h$. $W^{O}$ is also a parameter matrix; it maps the concatenated heads back to the initial dimension $d_{\text{model}}$. The invention adds an "encoder-to-encoder" attention layer between different encoding layers to further extract features. The encoder output has dimension $d_{\text{model}}$ and is fed to the decoder.
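The position encoding, scaled dot-product attention and head splitting described above can be sketched in plain Python as follows. This is a minimal, untrained illustration under stated assumptions: indices are 0-based, there is no masking, and the learned projections $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ and $W^{O}$ are replaced by simple column slicing and an identity output projection. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def positional_encoding(n: int, d_model: int) -> Matrix:
    """Sinusoidal encoding: sin(w_k t) on dimension 2k, cos(w_k t) on 2k + 1,
    with w_k = 1 / 10000**(2k / d_model). Indices are 0-based here."""
    pe = []
    for t in range(n):
        row = []
        for i in range(d_model):
            k = i // 2
            w = 1.0 / (10000 ** (2 * k / d_model))
            row.append(math.sin(w * t) if i % 2 == 0 else math.cos(w * t))
        pe.append(row)
    return pe


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = len(K[0])
    k_t = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, k_t)]
    return matmul(softmax_rows(scores), V)


def multi_head_self_attention(X: Matrix, h: int) -> Matrix:
    """Split d_model into h heads of size d_model / h, attend per head, concatenate.

    Simplification: head i just uses its own column slice of X as Q, K and V,
    and W^O is the identity; a trained model would use learned matrices.
    """
    d_head = len(X[0]) // h
    heads = []
    for i in range(h):
        Xi = [row[i * d_head:(i + 1) * d_head] for row in X]
        heads.append(attention(Xi, Xi, Xi))
    return [sum((head[t] for head in heads), []) for t in range(len(X))]


# Add positional encodings to a toy sequence (n = 3 steps, d_model = 4), then attend.
n, d_model = 3, 4
x = [[0.1 * (t + j) for j in range(d_model)] for t in range(n)]
pe = positional_encoding(n, d_model)
x_pe = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, pe)]
out = multi_head_self_attention(x_pe, h=2)
```

With all-zero queries and keys every attention weight is equal, so each output row is simply the mean of the value rows, which is a quick way to sanity-check the softmax.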
The decoder is structured similarly to the encoder. In addition to the two sub-layers of each encoding layer, each decoding layer has an "encoder-to-decoder" cross multi-head attention layer, which uses the decoder input (i.e., the convolved target sequence) as the query and the encoder output as the keys and values to compute the attention function. The cross-attention layer extracts the associations between the target sequence and the input sequence, capturing the dynamic relationships and computation patterns between them. The sequence-based fully connected layer maps the features at each time point of the decoding layers' output sequence to $d_{\text{model}}$ dimensions. The decoder output sequence has the same length as the target sequence, and its feature dimension is $d_{\text{model}}$.
Finally, the fully connected DNN layers compress the feature dimension $d_{\text{model}}$ output by the decoder to the target dimension (1 in this embodiment), and the output is inverse-normalized with the maximum-minimum normalization function to obtain the predicted ozone concentration.
To evaluate the ability and accuracy of the trained CNN-Transformer model in predicting ozone concentration, the invention uses HuberLoss as the loss function, specifically:
the HuberLoss between the predictions given by the trained model and the true values is calculated as follows:
$$\mathrm{HuberLoss}=\frac{1}{n}\sum_{i=1}^{n} z_i$$
$$z_i=\begin{cases}\frac{1}{2}\left(O_i-P_i\right)^2, & \left|O_i-P_i\right|\le\delta\\ \delta\left(\left|O_i-P_i\right|-\frac{1}{2}\delta\right), & \text{otherwise}\end{cases}$$
where n is the total number of samples in the training set or validation set, $O_i$ is the true ozone value of the i-th sample, $P_i$ is the predicted ozone value of the i-th sample, and $\delta$ is the threshold at which the loss switches from quadratic to linear. The smaller the HuberLoss, the closer the predictions of the trained CNN-Transformer model are to the true values, i.e., the higher the model accuracy. To optimize the HuberLoss, the model is trained iteratively, with all model parameters updated after each training step, until the HuberLoss is optimal.
Step S1.4: acquiring, in real time, the multivariate atmospheric pollutant data and meteorological data provided by the monitoring stations as historical reference data;
Step S1.5: feeding the data into the trained CNN-Transformer model, which outputs the prediction result, i.e., the predicted atmospheric ozone concentration.
It should be noted that the time span of the predicted ozone concentration matches the time span of the target sequence used during model training: if the target sequence during training covers the ozone concentration over 3 hours, the model can predict only the ozone concentration over the next 3 hours. The computation during prediction is the same as during training.
To help those skilled in the art better understand the scheme, Beijing is selected as the case-study city; the experimental scheme can be extended to a wider range. The experiment has two parts. The first part is model training: a large amount of historical data is used to train the constructed mixed CNN-Transformer model, and part of the data is used as a validation set to evaluate model performance. The second part is the inference test: part of the data is used as the test set, the CNN-Transformer model trained in the first part makes predictions, the predicted values are compared with the true values, and several evaluation indexes are used to score the model's prediction accuracy.
14 monitoring sites in Beijing are selected as research targets, and the sample data are divided into ozone data, other multi-pollutant data (nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide) and meteorological data (wind speed, wind direction, maximum air temperature and minimum air temperature). Data from January 2014 to 31 July 2021 are used. To ensure data integrity, this embodiment fills partially missing or abnormal values with the mean. In this embodiment, 10% of the data are used as the test set, 20% as the validation set, and the rest as the training set.
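The text gives the 70% / 20% / 10% proportions but not how samples are assigned to each set. For time series, a chronological split (earliest data for training, latest for testing) is a common choice because it keeps future information out of training; the sketch below assumes that ordering, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], val_frac: float = 0.2,
                        test_frac: float = 0.1) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into train / validation / test without shuffling.

    The last test_frac of the samples forms the test set, the val_frac before
    it forms the validation set, and everything earlier is the training set.
    """
    n = len(samples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test
    return (list(samples[:n_train]),
            list(samples[n_train:n_train + n_val]),
            list(samples[n_train + n_val:]))


train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 20 10
```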
In the training process, this embodiment uses an input sequence of length 10, i.e., the model takes as input the ozone concentration data, other multivariate atmospheric pollutant data and meteorological data of the past 10 days. Two sets of experiments were performed: the first predicts the ozone concentration for the next 1 day, and the second predicts the ozone concentration for the next 3 days. The training parameters in both sets of experiments were: 500 iterations, a batch size of 64, Adam as the optimizer, and a learning rate of 0.001. HuberLoss is used as the loss function during training.
It should be noted that those skilled in the art may set different hyperparameters and input sequence lengths for different types of data without following this embodiment; the parameters provided in this embodiment should not be construed as limiting the present invention.
In the inference test, the prediction accuracy of the model is measured by three evaluation indexes, root mean square error (RMSE), mean absolute error (MAE) and normalized root mean square error (NRMSE), defined as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i-P_i\right)^2}$$
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|O_i-P_i\right|$$
$$\mathrm{NRMSE}=\frac{\mathrm{RMSE}}{O_{\max}-O_{\min}}\times 100\%$$
where n is the total number of samples in the test set, $O_i$ is the true value of the i-th sample, $P_i$ is the predicted value of the i-th sample, $O_{\max}$ is the maximum of the observed samples and $O_{\min}$ is the minimum of the observed samples. Table 1 shows the three evaluation index values for each of the two sets of experiments.
Table 1. Values of the three evaluation indexes for Experiments 1 and 2 on the test set
Evaluation index | Experiment 1 (1-day prediction) | Experiment 2 (3-day prediction)
RMSE | 7.75 | 16.27
MAE | 5.92 | 12.83
NRMSE | 3.61% | 7.56%
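The three evaluation indexes defined above can be sketched in plain Python as follows; NRMSE divides RMSE by the observed range $O_{\max}-O_{\min}$, matching the definitions given. The function names and toy values are illustrative only and are not the Beijing data.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nrmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE divided by the observed range O_max - O_min (multiply by 100 for %)."""
    return rmse(obs, pred) / (max(obs) - min(obs))


# Illustrative values only.
O = [0.0, 10.0, 20.0]
P = [1.0, 9.0, 23.0]
print(round(rmse(O, P), 3), round(mae(O, P), 3), round(nrmse(O, P), 4))  # 1.915 1.667 0.0957
```

As a consistency check on Table 1, dividing each RMSE by its NRMSE gives an observed range of about 215 in both experiments (7.75 / 0.0361 ≈ 214.7 and 16.27 / 0.0756 ≈ 215.2), which is what one would expect if both experiments share the same test data.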
Other metrics may be selected by those skilled in the art to evaluate the performance of the model. When the index value of the prediction result given by the model on the test set reaches a certain threshold value, the trained CNN-Transformer can be considered to give an accurate prediction result, and the threshold value is not limited by the invention.
The principle and embodiments of the present invention are illustrated herein by using specific examples, and the above description is only an example of the present invention for helping understanding the method and core concept of the present invention, and is not intended to limit the scope of the present invention; meanwhile, all the equivalent structures or equivalent processes performed by using the drawings and the contents of the drawings in the specification of the invention, or directly or indirectly applied to other related technical fields, are included in the scope of patent protection of the invention.

Claims (4)

1. The atmospheric ozone concentration prediction method based on the mixed CNN-Transformer model is characterized by comprising a model training stage and a model reasoning prediction stage:
the model training phase comprises the following steps:
S1.1: acquire ozone data, other atmospheric pollutant data and meteorological data as training data; handle missing and abnormal values, and normalize the processed multivariate data to eliminate the influence of differences in units and scale; the other atmospheric pollutant data comprise nitrogen dioxide, nitric oxide, sulfur dioxide and carbon monoxide, and the meteorological data comprise wind speed, wind direction, air pressure, and maximum and minimum air temperature;
S1.2: serialize the data with a sliding window to form time-series data;
S1.3: construct the CNN-Transformer model, feed the time-series data into it, and start the model training process;
the model inference and prediction phase comprises the following steps:
S1.4: acquire, in real time, ozone data and other atmospheric pollutant data from an environmental monitoring station and meteorological data from a meteorological station as historical input data; handle missing and abnormal values, and normalize the processed data to eliminate the influence of differences in units and scale;
S1.5: feed the data into the trained CNN-Transformer model, which outputs the prediction result.
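Claim 1 does not specify how missing values are filled or how long the sliding window is. The sketch below assumes linear interpolation over time for missing values and a hypothetical 24-step window that predicts the ozone value one step ahead; `fill_missing`, `make_windows`, `window`, `horizon` and the use of column 0 for ozone are all illustrative choices. Abnormal values could, for instance, be set to NaN first and then filled the same way.

```python
import numpy as np


def fill_missing(series):
    """Linearly interpolate NaNs in each column over time (assumed method)."""
    data = np.array(series, dtype=float)  # copy so the caller's array is untouched
    idx = np.arange(data.shape[0])
    for j in range(data.shape[1]):
        col = data[:, j]  # view into `data`, so assignments below update it
        mask = np.isnan(col)
        if mask.any() and (~mask).any():
            # np.interp holds the first/last valid value at the series edges
            col[mask] = np.interp(idx[mask], idx[~mask], col[~mask])
    return data


def make_windows(data, target_col=0, window=24, horizon=1):
    """Slide a window over a (time, features) array.

    Returns inputs of shape (samples, window, features) and, for each window,
    the target value `horizon` steps after the window's last time step.
    """
    X, y = [], []
    for start in range(len(data) - window - horizon + 1):
        end = start + window
        X.append(data[start:end])
        y.append(data[end + horizon - 1, target_col])
    return np.stack(X), np.array(y)
```

The same two functions would apply unchanged to the real-time historical data in step S1.4, so that training and inference inputs share one layout.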
2. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.1 normalizing the processed data specifically comprises:
normalizing the acquired data along the feature dimension with a min-max normalization function, which scales the data into the range [0, 1];
the min-max normalization function is:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where x is an original data value and x ∈ X, X denotes the original data set comprising ozone data, other atmospheric pollutant data and meteorological data, x* denotes the normalized data, x_min is the minimum value in the original data set, and x_max is the maximum value in the original data set.
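A minimal NumPy sketch of the min-max normalization above, together with the inverse normalization used later to map ozone predictions back to their original scale. It assumes, as is common practice, that x_min and x_max are computed per feature on the training data and reused at inference time. The function names are hypothetical, and the guard against constant features is an addition that the claim does not describe.

```python
import numpy as np


def fit_minmax(train):
    """Per-feature minimum and maximum computed from the training data."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)


def normalize(x, x_min, x_max):
    # x* = (x - x_min) / (x_max - x_min), applied feature by feature
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    return (np.asarray(x, dtype=float) - x_min) / span


def denormalize(x_star, x_min, x_max):
    # Inverse mapping: x = x* * (x_max - x_min) + x_min
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return np.asarray(x_star, dtype=float) * span + x_min
```

At inference, `denormalize(pred, x_min[k], x_max[k])`, with k the index of the ozone column, converts the model's normalized output back to an ozone concentration.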
3. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.3 the CNN-Transformer model consists of three parts, namely a convolutional neural network (CNN), a Transformer and a deep linear neural network (DNN), configured as follows: the CNN comprises 2 one-dimensional convolution layers, each with 32 convolution kernels, so that 32-dimensional latent features can be extracted; the Transformer consists of an encoder and a decoder, wherein the encoder comprises 1 input mapping layer, 1 positional encoding layer, 3 encoding layers and 1 "encoder-to-encoder" cross multi-head attention layer, and the decoder comprises 1 input mapping layer, 3 decoding layers and 1 sequence-wise fully connected layer; each of the 3 encoding layers consists of 1 multi-head self-attention layer with an output dimension of 64 and 1 feed-forward fully connected layer with the same output dimension; each of the 3 decoding layers has the same structure as an encoding layer, except that an "encoder-to-decoder" cross multi-head attention layer is added after its multi-head self-attention layer; every multi-head attention layer has 8 heads; the DNN has 2 hidden layers with 256 and 128 neurons respectively and uses the ReLU activation function, each hidden layer being followed by a Dropout layer with rates of 0.4 and 0.3 respectively; and the output layer has 1 neuron.
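The configuration in claim 3 can be approximated in PyTorch as sketched below. Several details are assumptions because the claim does not give them: sinusoidal positional encoding, kernel size 3 with ReLU in the convolution layers, PyTorch's default Transformer dropout of 0.1, and a decoder input formed from the last `tgt_len` CNN time steps. The encoder-side cross-attention layer and the decoder's sequence-wise fully connected layer are not modeled separately; the decoder output is simply flattened into the DNN. With the ten input variables listed in step S1.1 (ozone, four other pollutants and five meteorological variables), `n_features` would be 10. Class and argument names are hypothetical.

```python
import math

import torch
from torch import nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (assumed; the claim does not name a type)."""

    def __init__(self, d_model, max_len=512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq, d_model)
        return x + self.pe[: x.size(1)]


class CNNTransformer(nn.Module):
    def __init__(self, n_features, tgt_len=1, d_model=64):
        super().__init__()
        self.tgt_len = tgt_len
        # CNN: 2 one-dimensional convolution layers with 32 kernels each
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Input mapping layers of the encoder and the decoder
        self.enc_in = nn.Linear(32, d_model)
        self.dec_in = nn.Linear(32, d_model)
        self.pos = PositionalEncoding(d_model)
        # 3 encoding and 3 decoding layers, 8 heads, feed-forward width = d_model
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8, num_encoder_layers=3, num_decoder_layers=3,
            dim_feedforward=d_model, batch_first=True,
        )
        # DNN: 256 -> 128 -> 1 with ReLU and dropout rates 0.4 and 0.3
        self.dnn = nn.Sequential(
            nn.Flatten(),
            nn.Linear(tgt_len * d_model, 256), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 1),
        )

    def forward(self, x):  # x: (batch, window, n_features)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, window, 32)
        src = self.pos(self.enc_in(h))
        # Hypothetical decoder input: the last tgt_len CNN time steps
        tgt = self.pos(self.dec_in(h[:, -self.tgt_len:]))
        out = self.transformer(src, tgt)  # (batch, tgt_len, d_model)
        return self.dnn(out).squeeze(-1)  # (batch,) normalized ozone values
```

`CNNTransformer(n_features=10)` applied to a batch of shape (batch, window, 10) returns one normalized ozone value per sample.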
4. The method for predicting atmospheric ozone concentration based on the mixed CNN-Transformer model according to claim 1, wherein in step S1.3 the model training process is as follows: the number of iterations is set to 500, the training batch size to 64, Adam is used as the optimizer, and the learning rate is set to 0.001; in the training phase, HuberLoss is adopted as the loss function, expressed as:
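$$L_{\mathrm{Huber}} = \frac{1}{n}\sum_{i=1}^{n} L_{\delta}(O_i, P_i)$$
$$L_{\delta}(O_i, P_i) = \begin{cases} \frac{1}{2}(O_i - P_i)^2, & |O_i - P_i| \le \delta \\ \delta\left(|O_i - P_i| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases}$$
in which δ is the threshold between the quadratic and linear parts of the loss;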
where n is the total number of samples in the training set or validation set, O_i is the true ozone value of the i-th sample, and P_i is the predicted ozone value of the i-th sample;
in each iteration, the training samples first enter the CNN model, whose 2 one-dimensional convolution layers extract useful information along the feature dimension; the output sequence is sent to the Transformer model, where the positional encoding layer captures and preserves the temporal order information; the sequence carrying this information is encoded by the encoder, and the encoding result is passed to the decoder to obtain decoded data; the decoded data are sent to the DNN, which compresses the dimension to the target dimension of 1; the result is inverse-normalized to obtain the final predicted value, which is compared with the label value to compute the HuberLoss value; iteration continues until the model reaches its optimum, that is, until the HuberLoss value is minimized.
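A sketch of the training procedure in claim 4 using PyTorch. It assumes the model outputs normalized ozone values while the labels stay in original units, so predictions are inverse-normalized before the loss is computed, as the claim describes. `torch.nn.HuberLoss` with its default `delta=1.0` and `reduction='mean'` averages ½(O_i - P_i)² where |O_i - P_i| ≤ 1 and |O_i - P_i| - ½ elsewhere; the claim states no threshold, so 1.0 is an assumption, as is treating the 500 iterations as epochs. The function name `train` is hypothetical, and the loop runs a fixed number of epochs rather than detecting the minimum.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, X, y_true, o_min, o_max, epochs=500, batch_size=64, lr=1e-3):
    """Train `model` with Adam and HuberLoss on inverse-normalized predictions.

    X: (samples, window, features) normalized inputs; y_true: ozone labels in
    original units; o_min/o_max: ozone min and max used for normalization.
    Returns the mean training loss of each epoch.
    """
    loader = DataLoader(TensorDataset(X, y_true), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.HuberLoss()  # delta defaults to 1.0 (assumed threshold)
    history = []
    for _ in range(epochs):
        model.train()
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            # Inverse normalization: map normalized outputs back to ozone units
            pred = model(xb).reshape(-1) * (o_max - o_min) + o_min
            loss = loss_fn(pred, yb)  # compare with the label value
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)
        history.append(total / len(loader.dataset))
    return history
```

Any `nn.Module` that maps a (batch, window, features) tensor to one normalized value per sample can be passed as `model`. Stopping at the minimum HuberLoss, as the claim describes, is commonly realized by keeping the checkpoint with the lowest validation loss.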
CN202210238135.6A (priority date 2022-03-11, filing date 2022-03-11): Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model; status: Active (CN114611792B)

Publications (2)

Publication Number Publication Date
CN114611792A 2022-06-10
CN114611792B 2023-05-02

Family

ID=81862655

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109142171A (en) * 2018-06-15 2019-01-04 上海师范大学 The city PM10 concentration prediction method of fused neural network based on feature expansion
US20210089900A1 (en) * 2019-09-20 2021-03-25 Wuhan University Transformer dga data prediction method based on multi-dimensional time sequence frame convolution lstm
CN113095550A (en) * 2021-03-26 2021-07-09 北京工业大学 Air quality prediction method based on variational recursive network and self-attention mechanism
CN113326981A (en) * 2021-05-26 2021-08-31 北京交通大学 Atmospheric environment pollutant prediction model based on dynamic space-time attention mechanism

Non-Patent Citations (1)

Title
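Xu Ailan (徐爱兰) et al.: "Deep-learning air quality forecasting based on K-means regional partitioning", Journal of Nantong University (Natural Science Edition) *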

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN115146844A (en) * 2022-06-27 2022-10-04 北京交通大学 Multi-mode traffic short-time passenger flow collaborative prediction method based on multi-task learning
CN114926749A (en) * 2022-07-22 2022-08-19 山东大学 Near-surface atmospheric pollutant inversion method and system based on remote sensing image
CN114926749B (en) * 2022-07-22 2022-11-04 山东大学 Near-surface atmospheric pollutant inversion method and system based on remote sensing image
CN115545269A (en) * 2022-08-09 2022-12-30 南京信息工程大学 Power grid parameter identification method based on convolution self-attention Transformer model
CN116302509A (en) * 2023-02-21 2023-06-23 中船(浙江)海洋科技有限公司 Cloud server dynamic load optimization method and device based on CNN-converter
CN116091842A (en) * 2023-02-23 2023-05-09 中国人民解放军军事科学院系统工程研究院 Vision Transformer model structure optimization system, method and medium
CN116091842B (en) * 2023-02-23 2023-10-27 中国人民解放军军事科学院系统工程研究院 Vision Transformer model structure optimization system, method and medium
CN116070799A (en) * 2023-03-30 2023-05-05 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN116070799B (en) * 2023-03-30 2023-05-30 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN116913413A (en) * 2023-09-12 2023-10-20 山东省计算中心(国家超级计算济南中心) Ozone concentration prediction method, system, medium and equipment based on multi-factor driving

Similar Documents

Publication Publication Date Title
CN110889546A (en) Attention mechanism-based traffic flow model training method
CN113487061A (en) Long-time-sequence traffic flow prediction method based on graph convolution-Informer model
CN110083125B (en) Machine tool thermal error modeling method based on deep learning
CN111507046B (en) Method and system for predicting remaining service life of electric gate valve
CN111123894B (en) Chemical process fault diagnosis method based on combination of LSTM and MLP
CN111859264A (en) Time sequence prediction method and device based on Bayes optimization and wavelet decomposition
CN114841072A (en) Differential fusion Transformer-based time sequence prediction method
CN111141879B (en) Deep learning air quality monitoring method, device and equipment
CN113328755A (en) Compressed data transmission method facing edge calculation
CN115409258A (en) Hybrid deep learning short-term irradiance prediction method
CN115827335A (en) Time sequence data missing interpolation system and method based on modal crossing method
CN115730456A (en) Motor vehicle multielement tail gas prediction method and system based on double attention fusion network
CN116843012A (en) Time sequence prediction method integrating personalized context and time domain dynamic characteristics
CN116415200A (en) Abnormal vehicle track abnormality detection method and system based on deep learning
CN113935458A (en) Air pollution multi-site combined prediction method based on convolution self-coding deep learning
CN114626012A (en) GNSS sequence prediction method and system of multi-scale attention mechanism
CN113807003A (en) Track clustering method based on RPCA and depth attention self-encoder
CN114386666A (en) Wind power plant short-term wind speed prediction method based on space-time correlation
CN113657533A (en) Multi-element time sequence segmentation clustering method for space-time scene construction
CN113221450A (en) Dead reckoning method and system for sparse and uneven time sequence data
CN112562788A (en) Construction method of circular RNA-RNA binding protein relation prediction model
Wu et al. Deep community detection method for social networks
Rai et al. PM2.5 Level Forecasting using Transformer-Based Model
CN117935555A (en) Traffic flow prediction method based on bidirectional GRU hypergraph convolution model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant