CN115295086A - Modeling method, device, equipment and storage medium of air quality prediction model - Google Patents


Info

Publication number
CN115295086A
Authority
CN
China
Prior art keywords
data
monitoring
monitoring data
meteorological
air quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210931237.6A
Other languages
Chinese (zh)
Inventor
范荣春
李巍
张惠臻
范金海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Qingmiao Intelligent Technology Co.,Ltd.
Original Assignee
Xiamen New Energy Convergence Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen New Energy Convergence Intelligent Technology Co ltd
Priority to CN202210931237.6A
Publication of CN115295086A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention provide a modeling method, device, equipment and storage medium for an air quality prediction model, and relate to the technical field of air quality prediction. The modeling method includes: S1, acquiring monitoring data of multiple monitored objects; S2, preprocessing the monitoring data to eliminate abnormal data; S3, calculating correlation coefficients among all monitored objects from the preprocessed monitoring data; S4, classifying the meteorological conditions according to the correlation coefficients and obtaining a meteorological representative of each class; S5, acquiring feature sets of the various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives; and S7, constructing a concentration prediction model for each pollutant, based on the Informer deep learning model, from that pollutant's feature set. The prediction models constructed by this method predict the concentrations of the various pollutants more accurately, so that a more accurate air quality index can be calculated.

Description

Modeling method, device, equipment and storage medium of air quality prediction model
Technical Field
The invention relates to the technical field of air quality prediction, in particular to a modeling method, a device, equipment and a storage medium of an air quality prediction model.
Background
When atmospheric pollutants reach a certain concentration, they harm human health and damage the ecological environment. Establishing an air quality forecasting model, learning the air quality situation in advance, and taking corresponding measures is therefore one of the effective ways to reduce atmospheric pollution and improve ambient air quality.
The commonly used WRF-CMAQ simulation system is limited by its simulation results and by its representation of pollutant generation mechanisms, and its prediction results are not ideal. A reasonable prediction model is therefore needed to improve prediction precision and predict air quality more accurately.
In view of the above, the applicant proposed the present application after studying the existing technologies.
Disclosure of Invention
The invention provides a modeling method, device, equipment and storage medium for an air quality prediction model, which aim to solve at least one of the above technical problems.
First aspect
The embodiment of the invention provides a modeling method of an air quality prediction model, which comprises steps S1 to S5 and step S7:
S1, acquiring monitoring data of multiple monitored objects, wherein the monitored objects comprise multiple meteorological conditions and multiple pollutants;
S2, preprocessing the monitoring data to eliminate abnormal data in the monitoring data;
S3, calculating correlation coefficients among all monitored objects according to the preprocessed monitoring data;
S4, classifying the meteorological conditions according to the correlation coefficients, and obtaining a meteorological representative of each class;
S5, acquiring feature sets of the various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives; and
S7, respectively constructing concentration prediction models of the various pollutants based on an Informer deep learning model according to the feature sets of the various pollutants.
Second aspect
The embodiment of the invention provides a modeling device of an air quality prediction model, which comprises:
the data acquisition module is used for acquiring monitoring data of various monitoring objects; wherein the plurality of monitored objects comprises a plurality of meteorological conditions and a plurality of pollutants;
the preprocessing module is used for preprocessing the monitoring data so as to eliminate abnormal data in the monitoring data;
the correlation module is used for calculating correlation coefficients among all monitoring objects according to the preprocessed monitoring data;
the classification module is used for classifying the meteorological conditions according to the correlation coefficients and obtaining a meteorological representative of each class;
the feature set module is used for acquiring feature sets of the various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives;
and the construction module is used for respectively constructing concentration prediction models of various pollutants based on the Informer deep learning model according to the feature sets of various pollutants.
Third aspect
The embodiment of the invention provides modeling equipment of an air quality prediction model, which comprises a processor, a memory and a computer program stored in the memory; the computer program is executable by the processor to implement a method of modeling an air quality prediction model as described in any of the paragraphs above.
Fourth aspect
Embodiments of the present invention provide a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, an apparatus in which the computer-readable storage medium is located is controlled to execute the modeling method of the air quality prediction model according to any one of the paragraphs of the first aspect.
By adopting the technical scheme, the invention can obtain the following technical effects:
the prediction model constructed by the modeling method provided by the embodiment of the invention can be used for more accurately predicting the concentration of various pollutants, so that more accurate air quality index is obtained through calculation, and the method has good practical significance.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a modeling method according to a first embodiment of the present invention.
FIG. 2 is a logic diagram of pre-processing of monitored data.
Fig. 3 shows the raw data before detection by the Prophet algorithm.
Fig. 4 shows abnormal data after detection by the Prophet algorithm.
Fig. 5 is a graph visualizing the correlation coefficient between the respective monitored subjects.
Fig. 6 is a logic block diagram of a modeling method provided by the first embodiment of the present invention.
FIG. 7 is a framework diagram of the Informer model.
Figure 8 is a flow chart of the calculation of primary pollutants and AQI.
Fig. 9 is a schematic flow chart of another modeling method provided in the first embodiment of the present invention.
Fig. 10 is a diagram of the neighbouring relationship between adjacent monitoring stations and the current monitoring station.
Fig. 11 is a schematic structural diagram of a modeling apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to better understand the technical scheme of the invention, the following detailed description of the embodiments of the invention is made with reference to the accompanying drawings.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The word "if" as used herein may be interpreted as "at the time of …" or "when …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
First embodiment:
Referring to Figs. 1 to 10, a first embodiment of the present invention provides a modeling method of an air quality prediction model, which can be performed by a modeling device of the air quality prediction model (hereinafter, the modeling device). Specifically, one or more processors in the modeling device execute steps S1 to S5 and step S7.
S1, acquiring monitoring data of multiple monitored objects, wherein the monitored objects comprise multiple meteorological conditions and multiple pollutants.
Specifically, meteorological monitoring stations periodically monitor various meteorological conditions and pollutant concentrations. In this embodiment, the meteorological conditions comprise 16 meteorological indexes, including near-surface 2 m temperature, surface temperature, specific humidity, near-surface 10 m wind speed, near-surface 10 m wind direction, rainfall, cloud cover, boundary layer height, atmospheric pressure, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation. The pollutants comprise six common pollutants: SO2, NO2, PM10, PM2.5, O3 and CO concentrations. In other embodiments, other meteorological conditions and pollutants may also be included; the invention is not limited in this respect. Each monitoring period may be 1 hour, 15 minutes or another period, without limitation.
It will be appreciated that forecast data for the upcoming date are usually more accurate. Therefore, in this embodiment, the monitoring data include forecast data for the upcoming date in addition to the measured data. In other embodiments, the monitoring data may include only measured data, or may include forecast data issued several days before the current date; this is not limited.
It is to be understood that the modeling device may be an electronic device with computing capabilities, such as a laptop computer, desktop computer, server, smart phone, or tablet computer.
S2, preprocessing the monitoring data to eliminate abnormal data in the monitoring data.
Specifically, the measured data suffer from missing values and abnormal distributions because the equipment at monitoring sites undergoes debugging and maintenance, because devices differ from one another, and because measurements are affected by the site itself and by occasional factors in its vicinity. In the forecasting work, the server may be affected by events such as long power outages of the external supply, so that one forecast run is lost for some operating dates. Therefore, to ensure the accuracy of subsequent prediction, the data must be preprocessed: abnormal data are eliminated and missing data are repaired.
As shown in fig. 2, on the basis of the above embodiment, in an optional embodiment of the present invention, step S2 includes steps S21 to S25.
S21, detecting whether the monitoring data contain missing data where the time difference between adjacent records does not exceed two hours (or two days). When such missing data are detected, they are filled with the mean of the values in the time periods immediately before and after the missing value.
S22, detecting whether the monitoring data have a data loss of not more than 80%. When the missing proportion does not exceed 80%, the missing data are filled with the mean of the values in the time periods immediately before and after the missing value.
Specifically, because of debugging and maintenance of monitoring-station equipment and similar causes, some variable values are missing (null), continuously or intermittently, within a continuous time period. Where the time difference between adjacent data does not exceed two hours (two days), or where no more than 20% of a variable is missing from a record, the gaps are mainly filled with the mean of the values in the time periods before and after the missing value.
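The short-gap rule of S21/S22 can be sketched as below. This is a minimal illustration, assuming regularly sampled data in which a gap is measured in samples; the function name `fill_short_gaps` and the `max_gap` parameter are hypothetical, and how "two hours (two days)" maps to a sample count depends on the monitoring period.

```python
import math

def fill_short_gaps(values, max_gap):
    """Fill runs of missing values (None or NaN) of at most max_gap samples
    with the mean of the nearest valid values before and after the run.
    Runs touching either end of the series, or longer runs, are left as-is."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not missing(out[i]):
            i += 1
            continue
        start = i
        while i < n and missing(out[i]):
            i += 1
        end = i  # first index after the missing run
        # Require a valid neighbour on both sides and a short enough gap.
        if start > 0 and end < n and (end - start) <= max_gap:
            fill = (out[start - 1] + out[end]) / 2.0
            for j in range(start, end):
                out[j] = fill
    return out
```

Longer gaps are deliberately left missing so that a later step (KNN imputation in S23/S24) can handle them.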
S23, detecting abnormal values in the monitoring data with the Prophet algorithm. When an abnormal value is detected, it is set to null and then filled by K-nearest-neighbor (KNN) imputation. The confidence interval of the Prophet algorithm is 98%.
Specifically, this step handles data that deviate from their normal distribution. Anomaly detection is performed with the Prophet algorithm on the records from which null values have been removed: the confidence interval of the fitted values describing the data fluctuation (98% in this embodiment) serves as the upper and lower limits for judging whether a value is abnormal. In other embodiments, the confidence interval may be set to other values, which the invention does not specifically limit. As an example, Fig. 3 shows the distribution of the raw PM10 data, and after anomaly detection (Fig. 4) about 10 abnormal data points are found.
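The Prophet-based rule of S23 reduces to flagging observations outside the fitted confidence band. The sketch below assumes the band has already been computed; with the Python `prophet` package this would typically be `m = Prophet(interval_width=0.98)`, `m.fit(df)` on a DataFrame with `ds` and `y` columns, and `forecast = m.predict(df)`, whose `yhat_lower` and `yhat_upper` columns give the bounds. Prophet's own default `interval_width` is 0.80, so the 98% band has to be set explicitly. The helper names below are hypothetical.

```python
import numpy as np

def flag_outliers(y, lower, upper):
    """Boolean mask: True where observation y falls outside [lower, upper]."""
    y = np.asarray(y, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (y < lower) | (y > upper)

def null_outliers(y, lower, upper):
    """Copy of y with flagged outliers set to NaN, ready for KNN imputation."""
    y = np.asarray(y, dtype=float).copy()
    y[flag_outliers(y, lower, upper)] = np.nan
    return y
```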
Outliers are detected with the Prophet algorithm, which can be implemented with the Prophet library for Python. After an outlier is detected, it is set to null and the data are filled by KNN imputation. This method fills each sample that has missing values using its nearest neighbours: the K samples closest to the sample with the missing value are found according to the Euclidean distance (Equation 1), and the missing value is filled with the mean of those K samples. This is implemented with KNNImputer from the scikit-learn (sklearn) library in Python.
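$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$
where x and y are two samples and n is the number of features compared.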
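A minimal NumPy sketch of KNN imputation with the Euclidean distance of Equation (1), for readers without scikit-learn. It is an illustration, not the embodiment's code: distances are computed only over the columns observed in both rows, whereas scikit-learn's `KNNImputer` uses a `nan_euclidean` metric that additionally rescales for missing coordinates, so results can differ slightly. `knn_impute` and `k` are hypothetical names.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that column over the k nearest rows,
    using the Euclidean distance over columns observed in both rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    for i in np.where(~observed.all(axis=1))[0]:
        for col in np.where(~observed[i])[0]:
            dists = []
            for j in range(X.shape[0]):
                # A donor row must be different and have this column observed.
                if j == i or not observed[j, col]:
                    continue
                common = observed[i] & observed[j]
                if not common.any():
                    continue
                diff = X[i, common] - X[j, common]
                dists.append((np.sqrt(np.sum(diff ** 2)), j))
            if not dists:
                continue  # no usable donor: leave the value missing
            dists.sort()
            donors = [j for _, j in dists[:k]]
            out[i, col] = X[donors, col].mean()
    return out
```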
S24, detecting whether the monitoring data have a data loss greater than 80%. When more than 80% of the data are detected to be missing, the missing data are filled by KNN imputation.
Specifically, this step covers cases where some indexes are missing over wide ranges, or where data for continuous periods are largely missing, with a missing rate above 20%. Directly deleting every row that contains a missing value would discard a large amount of useful information: statistically, fewer independent observations mean fewer degrees of freedom, which inevitably lowers the prediction accuracy of the model. To preserve the accuracy of subsequent prediction, KNN imputation is also used here.
S25, normalizing the filled monitoring data with Z-Score standardization to obtain the preprocessed monitoring data.
Specifically, Z-Score standardization converts data of different magnitudes into Z-scores on a uniform scale so that they can be compared. The calculation formula is:
$$z = \frac{x - \mu}{\sigma} \tag{2}$$
where x is the raw data, μ is the sample mean, and σ is the data standard deviation (equation 3).
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} \tag{3}$$
where N is the number of samples.
After standardization, the data have a mean of 0 and a variance of 1. Processing all data in this way makes them dimensionless and avoids errors caused by differences in units.
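Equations (2) and (3) applied column-wise can be sketched as follows, assuming the data are arranged as samples by variables. The function also returns μ and σ, since predicted concentrations must be converted back to physical units before the AQI calculation; the name `z_score` is hypothetical.

```python
import numpy as np

def z_score(X):
    """Column-wise Z-score (Eq. 2) with the population standard deviation
    (Eq. 3, divide by N). Returns the scaled data together with mu and sigma."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # ddof=0, i.e. divides by N as in Eq. 3
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant columns
    return (X - mu) / sigma, mu, sigma
```

Inverting the transform is `x = z * sigma + mu`.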
S3, calculating correlation coefficients among all monitored objects according to the preprocessed monitoring data.
Specifically, each index in the data is statistically summarized and organized. The embodiment of the invention combines correlation calculation with visual analysis, computing and visualizing the correlations among pollutants, among the meteorological indexes, and between the meteorological conditions and the pollutants.
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S3 includes calculating the Pearson, Spearman and Kendall correlation coefficients between the monitored objects from the preprocessed monitoring data.
Specifically, to mine and elucidate more rigorously the correlations between meteorological conditions and pollutant concentrations, among pollutants, and among meteorological conditions, the embodiment of the invention uses three correlation coefficients: Pearson (Equation 4), Spearman (Equation 5) and Kendall (Equation 6). The three formulas are used to analyze the correlation between the meteorological features and the pollutant factors at monitoring point A; the visualization results are shown in Fig. 5. In other embodiments, other correlation coefficients may also be used to calculate the correlation between the monitored objects, without particular limitation.
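$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{4}$$
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \tag{5}$$
$$\tau = \frac{n_c - n_d}{\tfrac{1}{2}n(n-1)} \tag{6}$$
where x_i and y_i are paired observations of two monitored objects, x̄ and ȳ are their means, n is the number of pairs, d_i is the difference between the ranks of x_i and y_i, and n_c and n_d are the numbers of concordant and discordant pairs.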
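For reference, the three coefficients of Equations (4) to (6) can be computed with plain NumPy as below. This is a sketch: the Spearman function uses the d_i formula, which is exact only without ties (libraries such as SciPy use average ranks instead), and the Kendall function computes tau-a, whereas `scipy.stats.kendalltau` defaults to tau-b, which corrects for ties.

```python
import numpy as np

def pearson(x, y):
    """Eq. 4: covariance divided by the product of standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

def spearman(x, y):
    """Eq. 5 with rank differences d_i; exact only when there are no ties."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rx = np.argsort(np.argsort(x))  # ranks 0..n-1
    ry = np.argsort(np.argsort(y))
    d = rx - ry
    return float(1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1)))

def kendall(x, y):
    """Eq. 6 (tau-a): (concordant - discordant) / (n(n-1)/2)."""
    n = len(x)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    return (nc - nd) / (n * (n - 1) / 2)
```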
It can be understood that actual meteorological conditions strongly influence pollutant concentrations (for example, a reduction in humidity helps reduce ozone generation); that is, when pollutant emissions are unchanged, the diffusion of pollutants is closely related to meteorological conditions. Calculating the correlation coefficients between the monitored objects is therefore of practical value for evaluating, both quantitatively and qualitatively, how strongly meteorological conditions affect pollutant concentrations.
S4, classifying the meteorological conditions according to the correlation coefficients, and obtaining a meteorological representative of each class.
Specifically, some meteorological conditions are strongly correlated with one another; eliminating similar meteorological conditions effectively reduces the size of the raw data used for modeling.
On the basis of the above embodiment, in an optional embodiment of the present invention, step S4 includes steps S41 to S42.
S41, grouping highly correlated meteorological conditions into the same class according to the correlation coefficients, thereby classifying the meteorological conditions.
S42, selecting one meteorological condition from each class as the meteorological representative of that class.
In this embodiment, as shown in Fig. 5, the correlations among the meteorological conditions show that near-surface 2 m temperature, surface temperature, specific humidity and long-wave radiation have similar strong positive correlations, so these meteorological conditions are grouped into one class. Boundary layer height, surface temperature, sensible heat flux, latent heat flux, short-wave radiation and surface solar radiation also have similar strong positive correlations, so they are likewise grouped into one class. The resulting meteorological representatives are: specific humidity, surface temperature, humidity, near-surface 10 m wind speed, near-surface 10 m wind direction, rainfall, cloud cover and atmospheric pressure.
In other embodiments, the monitored objects may include other meteorological conditions and pollutants, so the final classes and meteorological representatives need not be the same as in this embodiment; this is not particularly limited.
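Steps S41 and S42 do not specify a correlation threshold or a rule for choosing the representative, so the following is only one possible realization: a greedy grouping by absolute correlation with a hypothetical threshold of 0.8, taking the first member of each group as its representative. In the embodiment the classes are derived from the correlations shown in Fig. 5; the function name and threshold here are assumptions.

```python
import numpy as np

def group_by_correlation(names, corr, threshold=0.8):
    """Greedy grouping: each still-ungrouped variable seeds a new group and
    absorbs every other ungrouped variable with |r| >= threshold to the seed.
    The seed (first member) is used as the group's representative."""
    corr = np.asarray(corr, dtype=float)
    unassigned = list(range(len(names)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if abs(corr[seed, j]) >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        groups.append([names[m] for m in members])
    representatives = [g[0] for g in groups]
    return groups, representatives
```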
In this embodiment, the correlations among the pollutants show that PM2.5 and PM10 have a very strong positive correlation, with a correlation coefficient close to 0.99, indicating that PM2.5 and PM10 have similar formation mechanisms. Indeed, PM2.5 refers to particulate matter with an aerodynamic diameter of 2.5 µm or less and PM10 to particulate matter of 10 µm or less, so the two are similar in pollution type and possible sources, which fully agrees with the correlation analysis. In addition, SO2 is strongly correlated with NO2, PM2.5, PM10 and especially CO; NO2 is strongly correlated with SO2, PM2.5, PM10 and especially CO; and PM2.5, PM10 and CO are also strongly correlated with SO2 and NO2.
In this embodiment, the correlations between the meteorological conditions and the pollutants show that near-surface 2 m temperature, surface temperature, specific humidity and long-wave radiation have similarly strong negative correlations with SO2, NO2, PM2.5 and PM10; atmospheric pressure is positively correlated with SO2, NO2, PM2.5 and PM10; and boundary layer height, sensible heat flux, latent heat flux, short-wave radiation and surface solar radiation have similarly strong positive correlations with ozone. These meteorological features have similar effects on changes in pollutant concentration, which confirms that the meteorological conditions within each class are strongly similar.
S5, acquiring feature sets of the various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives.
Specifically, it can be seen from step S4 that the influencing factors of the six pollutants are different. Therefore, the strongly correlated factors of each pollutant are respectively combined into the feature set of the pollutant, thereby providing a more reliable data base for the subsequent prediction of the pollutant.
On the basis of the above embodiment, in an optional embodiment of the present invention, step S5 includes step S51 to step S52.
S51, obtaining, according to the correlation coefficients, the strongly correlated meteorological representatives and other pollutants of each pollutant.
Table 1 Strongly correlated factors of the six pollutants
[Table 1 is provided as an image in the original publication.]
S52, extracting the monitoring data of the strongly correlated meteorological representatives and other pollutants from the preprocessed monitoring data, thereby obtaining the feature set of each pollutant.
Specifically, because the six pollutants have different formation mechanisms and each meteorological condition affects the formation of each pollutant to a different degree, a feature set is constructed for each pollutant from the correlation coefficients calculated in step S3. In general, two kinds of influence need to be considered when predicting pollutants: the effect of meteorological conditions on pollutant concentration, and the interactions among the pollutants.
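Step S5 can likewise be sketched as a threshold on each candidate's correlation with the target pollutant. The threshold of 0.6, the dictionary layout of `corr`, and the function name are assumptions for illustration only; Table 1 of the original lists the factors actually selected.

```python
def build_feature_sets(targets, candidates, corr, threshold=0.6):
    """For each target pollutant, keep the candidate columns (meteorological
    representatives or other pollutants) whose |r| with the target is at
    least `threshold`. corr is a dict of dicts: corr[a][b] -> coefficient."""
    feature_sets = {}
    for t in targets:
        feature_sets[t] = [c for c in candidates
                           if c != t and abs(corr[t][c]) >= threshold]
    return feature_sets
```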
S7, respectively constructing concentration prediction models of the various pollutants based on an Informer deep learning model according to the feature sets of the various pollutants.
Specifically, as can be seen from step S4, the concentrations of the six pollutants are influenced by different factors, so a separate prediction model must be established for each of them. Because pollutant formation involves various unknown nonlinear factors and complex mechanisms, the six pollutant concentration prediction models are established from the correlation results of step S4 together with a deep learning model.
The Transformer deep learning model performs well on long sequence time series, and its variant Informer can effectively capture precise long-range dependency coupling between output and input. Informer greatly reduces network parameters and processing dimensionality, replaces the step-by-step prediction previously used for long sequences with faster prediction, and lowers the time and space complexity of computation. In pollutant concentration prediction, the prediction error is reduced to the decimal level. Training of the Informer network comprises two phases: a forward propagation phase and a backward propagation phase.
The forward propagation stage is as follows:
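Select a training sample $(X_i, Y_i)$, input $X_i$ into the network, and calculate the corresponding output $\mathrm{Output}_i$:
$$\mathrm{Output}_i = F_n\left(\cdots F_2\left(F_1\left(X_i W^{(1)}\right) W^{(2)}\right) \cdots W^{(n)}\right) \tag{7}$$
where $W^{(k)}$ is the weight matrix and $F_k$ the activation function of layer k.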
In the backward propagation stage, an objective function (Equation 8) between the model output $\mathrm{Output}_i$ and the true value $Y_i$ is constructed, and the parameters are adjusted with the goal of minimizing this objective function.
[Equation (8), the objective function, is provided as an image in the original publication.]
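The two training phases can be illustrated on a toy network. The sketch below is not Informer: it implements the nested composition of Equation (7) generically and then trains the single-layer, identity-activation special case by gradient descent. Because Equation (8) is not reproduced in the text, the mean squared error used as the objective is an assumption, as are all function names.

```python
import numpy as np

def forward(X, weights, activations):
    """Eq. 7: Output = F_n(... F_2(F_1(X W1) W2) ... Wn)."""
    h = X
    for W, F in zip(weights, activations):
        h = F(h @ W)
    return h

def mse(output, Y):
    """Assumed objective for Eq. 8: mean squared error over all elements."""
    return float(np.mean((output - Y) ** 2))

def train_linear(X, Y, lr=0.1, steps=200, seed=0):
    """One layer with identity activation: forward pass, loss, analytic
    gradient of the MSE (backward pass), then a gradient-descent update."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], Y.shape[1]))
    identity = lambda z: z
    history = []
    for _ in range(steps):
        out = forward(X, [W], [identity])
        history.append(mse(out, Y))
        grad = 2.0 * X.T @ (out - Y) / out.size
        W -= lr * grad
    return W, history
```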
As shown in Fig. 7, the Informer model consists of three main parts: a data input part, an encoder part, and a decoder part.
Data input part: capturing long-range dependence requires global information such as hierarchical timestamps (week, month and year) and agnostic timestamps (holidays, events). Such information is hard to exploit in canonical self-attention, so Informer uses a uniform input representation to solve this problem. The input embedding consists of three separate parts: a scalar projection, a local timestamp (position) embedding, and a global timestamp embedding (minute, hour, week, month, holiday, etc.).
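A simplified NumPy sketch of the uniform input representation: scalar projection plus sinusoidal local (position) embedding plus global timestamp embeddings. In Informer the projection and timestamp tables are learned (the reference implementation projects values with a 1-D convolution); here they are random placeholders, and the choice of hour/weekday/month features and all names are assumptions.

```python
import numpy as np

def sinusoidal_position(L, d_model):
    """Local (position) embedding: sin on even dimensions, cos on odd ones."""
    assert d_model % 2 == 0, "d_model must be even"
    pos = np.arange(L)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def embed(values, hours, weekdays, months, d_model=8, seed=0):
    """values: (L, n_features); hours 0-23, weekdays 0-6, months 1-12.
    Random tables stand in for parameters that would be learned."""
    rng = np.random.default_rng(seed)
    L, n_feat = values.shape
    W_proj = rng.normal(scale=0.1, size=(n_feat, d_model))  # scalar projection
    hour_tab = rng.normal(scale=0.1, size=(24, d_model))
    wday_tab = rng.normal(scale=0.1, size=(7, d_model))
    month_tab = rng.normal(scale=0.1, size=(13, d_model))   # index 0 unused
    global_stamp = hour_tab[hours] + wday_tab[weekdays] + month_tab[months]
    return values @ W_proj + sinusoidal_position(L, d_model) + global_stamp
```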
Encoder part: the encoder receives massive long sequence inputs. ProbSparse self-attention replaces canonical self-attention, and self-attention distilling privileges the dominant attention scores and reduces the network size; stacking layer replicas increases robustness.
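The core of ProbSparse self-attention is the query sparsity measurement M(q, K) = max_j(q·k_j/√d) - mean_j(q·k_j/√d): only the top-u queries receive full attention. The sketch below is deliberately simplified: it computes all query-key scores (the actual method estimates M from a random sample of keys, which is where the savings come from) and gives the remaining "lazy" queries the mean of V, as in Informer's encoder. The function name and the choice of u are illustrative.

```python
import numpy as np

def probsparse_attention(Q, K, V, u):
    """Simplified ProbSparse self-attention (all keys, no sampling).
    Queries are ranked by M(q, K) = max(scores) - mean(scores); the top-u
    'active' queries get softmax attention, the rest output mean(V)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (L_Q, L_K)
    M = scores.max(axis=1) - scores.mean(axis=1)
    top = np.argsort(-M)[:u]
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))
    s = scores[top]
    s = s - s.max(axis=1, keepdims=True)         # numerical stability
    w = np.exp(s)
    w /= w.sum(axis=1, keepdims=True)
    out[top] = w @ V
    return out, top
```

With u equal to the number of queries this reduces to ordinary scaled dot-product attention, which is a convenient sanity check.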
Decoder part: the decoder receives long sequence inputs, pads the target elements with zeros, measures the weighted attention composition of the feature map, and instantly predicts the output elements in a generative style; that is, the generative decoder obtains the long sequence output in a single forward step.
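The generative decoder input can be sketched as the concatenation of a start token (the last `label_len` known steps) and zero placeholders for the `pred_len` target steps. In Informer the timestamp embeddings of the target positions are still supplied; this sketch shows only the value tensor, and the names are hypothetical.

```python
import numpy as np

def decoder_input(enc_seq, label_len, pred_len):
    """Concat(start token, zeros): the last label_len known steps followed by
    pred_len zero placeholders that the decoder fills in one forward pass."""
    enc_seq = np.asarray(enc_seq, dtype=float)
    token = enc_seq[-label_len:]
    placeholder = np.zeros((pred_len,) + enc_seq.shape[1:])
    return np.concatenate([token, placeholder], axis=0)
```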
It should be noted that, from the pollutant concentrations predicted by the concentration prediction models, the individual air quality index (IAQI) of each pollutant, the primary pollutant, and the air quality index (AQI) can be calculated. Specifically, the IAQI of each pollutant is calculated from Equation 9 and the predicted concentration of that pollutant, and the primary pollutant and the AQI are then calculated with Equation 10. The calculation flow is shown in Fig. 8.
[Formula 9 and formula 10: images not reproduced]
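Because formulas 9 and 10 survive only as images, the sketch below assumes they follow the usual Chinese AQI definition (HJ 633-2012): each IAQI is linearly interpolated between the breakpoints that bracket the concentration, the AQI is the maximum IAQI, and the pollutant(s) attaining that maximum are reported as primary when the AQI exceeds 50. Only the 24-hour PM2.5 and PM10 breakpoints are included, the cap above the top breakpoint is an assumption, and the standard's rounding rules are omitted.

```python
# 24-hour average breakpoints in ug/m3, assumed from HJ 633-2012 (only two pollutants shown).
BREAKPOINTS = {
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],
    "PM10": [0, 50, 150, 250, 350, 420, 500, 600],
}
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]


def iaqi(pollutant: str, conc: float) -> float:
    """Linear interpolation between the breakpoints that bracket the concentration."""
    bp = BREAKPOINTS[pollutant]
    if conc <= bp[0]:
        return 0.0
    for i in range(1, len(bp)):
        if conc <= bp[i]:
            lo, hi = bp[i - 1], bp[i]
            i_lo, i_hi = IAQI_LEVELS[i - 1], IAQI_LEVELS[i]
            return (i_hi - i_lo) / (hi - lo) * (conc - lo) + i_lo
    return float(IAQI_LEVELS[-1])  # above the top breakpoint: capped here (assumption)


def aqi_and_primary(concentrations: dict):
    """AQI = max of the IAQIs; primary pollutant(s) reported only when AQI > 50."""
    sub = {p: iaqi(p, c) for p, c in concentrations.items()}
    aqi = max(sub.values())
    primary = [p for p, v in sub.items() if v == aqi] if aqi > 50 else []
    return aqi, primary, sub
```

For predicted concentrations of 55 ug/m3 PM2.5 and 200 ug/m3 PM10, this gives IAQIs of 75 and 125, so the AQI is 125 and PM10 is the primary pollutant.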
The prediction models constructed by the modeling method of this embodiment can predict the concentrations of multiple pollutants more accurately, so that a more accurate air quality index is obtained by calculation, which is of good practical significance.
As shown in FIG. 9, on the basis of the above embodiment, an optional embodiment of the present invention further includes step S6 before step S7.
S6, acquiring the geographical position relationship between the current monitoring point and the adjacent monitoring point, and constructing the area cross-correlation factor of the adjacent monitoring point to the current monitoring point according to the geographical position relationship and the preprocessed monitoring data.
Specifically, pollutant concentrations in adjacent areas affect one another, so regional collaborative prediction can improve the accuracy of air quality prediction. In this embodiment, a collaborative forecasting model of the current monitoring point and the monitoring points in adjacent areas is established. Preferably, step S6 includes steps S61 to S63.
First, the factors that determine how an adjacent monitoring point influences the current monitoring point need to be identified. In this embodiment, three interacting factors are mainly considered: geographical location, wind speed, and wind direction.
S61, acquiring the geographical position relation between the current monitoring point and the adjacent monitoring point.
S62, acquiring, according to the geographical position relationship, the distance from each adjacent monitoring point to the current monitoring point and the angle at which that adjacent monitoring point has the maximum influence.
Specifically, in this embodiment, the wind direction is expressed as an angle: wind blowing toward the monitoring point from due north is defined as 0°, and angles increase clockwise (unit: °). Thus, wind reaching the monitoring point from due east has a wind direction of 90°.
First, a relative coordinate system is established between the current monitoring station and each nearby monitoring station, because wind carries pollution from one monitoring point toward the other. Specifically, as shown in FIG. 10, when wind blows from point A toward points A1, A2, and A3, the influence strength of these three monitoring points on point A is 0; when wind blows from A1, A2, or A3 toward A, their influence on A is greatest. Therefore, according to the actual effect of wind direction, the lines AA1, AA2, and AA3 are drawn and a separate relative coordinate system is established for each. For AA1, the direction A→A1 is taken as 0°, and the direction perpendicular to the line AA1 as 90°. When the wind direction is 0° or 90°, the influence strength of monitoring point A1 on monitoring point A is 0; when the wind direction is 180°, the influence of A1 on A is greatest; and as the wind direction moves from 180° toward 90° or from 180° toward 270°, the influence of A1 on A gradually decreases. The wind direction interval with influence is therefore [90°, 270°]. The relative coordinate systems for AA2 and AA3 are established in the same way as for AA1.
Then, a standard coordinate system with the current monitoring point as the origin is established. Wind blowing toward the monitoring point from due north is 0°, angles increase clockwise (unit: °), and wind reaching the monitoring point from due east is 90°. Taking this wind direction convention as the standard, the influencing wind direction angles for monitoring point A in each relative coordinate system are converted into angle values in the standard coordinate system; the resulting influencing wind direction angles are listed in Table 2.
Table 2: Influencing wind direction angles of adjacent monitoring stations
[Table 2: image not reproduced]
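Under the wind-direction convention above, wind blowing from a neighbouring point toward A arrives from that neighbour's bearing, so one plausible reading is that the maximum-influence angle max is the bearing of the neighbour as seen from A, with the influence interval spanning max ± 90°. The endpoints given for FIG. 10 in this embodiment fit this reading (172.255° and 352.255° are both 90° from 262.255°). The minimal sketch below computes the distance x0 and max from a local east/north offset; the planar approximation and the function names are assumptions.

```python
import math


def distance_and_max_angle(dx_east: float, dy_north: float):
    """Distance x0 and assumed maximum-influence angle of a neighbour seen from point A.

    (dx_east, dy_north) is the neighbour's offset from A in a local planar frame,
    assumed small enough to ignore Earth curvature.
    """
    distance = math.hypot(dx_east, dy_north)
    # Wind blowing from the neighbour to A arrives from the neighbour's bearing; in the
    # standard frame (0 deg = from north, clockwise positive) this is taken as max.
    max_angle = math.degrees(math.atan2(dx_east, dy_north)) % 360.0
    return distance, max_angle


def influence_interval(max_angle: float):
    """Endpoints of the influencing wind-direction interval, max -/+ 90 deg, in [0, 360)."""
    return (max_angle - 90.0) % 360.0, (max_angle + 90.0) % 360.0
```

A neighbour due east of A gives max = 90°, and `influence_interval(262.255)` returns the A1 endpoints 172.255° and 352.255°.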
S63, constructing, according to the distance, the maximum-influence angle, and the preprocessed monitoring data, a calculation model of the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point, where the plurality of meteorological conditions include wind direction and wind speed. The calculation model is as follows:
[Formula: area cross-correlation factor; image not reproduced]
$h(x_1) = \lvert \mathit{max} - x_1 \rvert$
where $x_0$ is the distance from the adjacent monitoring point to the current monitoring point, $x_1$ is the wind direction in absolute coordinates, $x_2$ is the wind speed, $e$ is the natural base, and $\mathit{max}$ is the maximum-influence angle.
In particular, the effect of wind direction and wind speed depends on the distance between the two monitoring points. When the distance is short, wind direction and wind speed strongly affect pollutant diffusion between them; when the distance is long, the effect is weak. Hence the distance is taken to be inversely proportional to the degree of influence. Likewise, a high wind speed strongly affects pollutant diffusion and a low wind speed affects it only weakly, so wind speed is taken to be directly proportional to the degree of influence.
Wind direction strongly affects pollutant diffusion within the influence interval and has little effect (close to 0) outside it. Within the influence interval, the influence is therefore taken to be directly related to the wind direction, being greatest at the maximum-influence angle max and decreasing toward the ends of the interval.
Taking FIG. 10 as an example: for A1→A, the influence decreases gradually as the wind direction moves from the maximum-influence angle max toward the left endpoint 172.255° and, symmetrically, from max toward 352.255°; for A2→A, it decreases from max toward 228.704° and, symmetrically, from max toward 48.704°; for A3→A, it decreases from max toward 123.783° and, symmetrically, from max toward 303.783°.
In this embodiment, the influence of the wind direction can be calculated using equation 11:
$h(x_1) = \lvert \mathit{max} - x_1 \rvert \qquad (11)$
In this embodiment, the overall effect can be summarized as equation 12:
[Equation 12: image not reproduced]
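Equation 11 can be expressed directly in code. Because equation 12 is available only as an image, this sketch does not combine the wind-direction term with distance and wind speed. Note that |max − x1| as written does not wrap at 0°/360° (for max = 350° and x1 = 10° it gives 340° rather than 20°); the `circular_deviation` variant and the interval test are assumptions added for comparison, not part of the original method.

```python
def wind_direction_term(x1: float, max_angle: float) -> float:
    """Equation (11) as written: h(x1) = |max - x1|, in degrees."""
    return abs(max_angle - x1)


def circular_deviation(x1: float, max_angle: float) -> float:
    """Wrapped alternative (assumption): smallest angle between x1 and max, in [0, 180]."""
    d = abs(max_angle - x1) % 360.0
    return min(d, 360.0 - d)


def in_influence_interval(x1: float, max_angle: float) -> bool:
    """Assumed test: True when the wind direction lies within 90 deg of max."""
    return circular_deviation(x1, max_angle) <= 90.0
```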
Preferably, step S7 includes: constructing, for each pollutant, a concentration prediction model based on the Informer deep learning model according to the feature set of that pollutant and the area cross-correlation factors of the adjacent monitoring points with respect to the current monitoring point.
Specifically, this step is the same as step S7 described above, except that the area cross-correlation factor is added to the feature set, so that the resulting concentration prediction model of each pollutant takes adjacent monitoring stations into account.
In this embodiment, the influence of adjacent monitoring stations on the current monitoring station is taken into account, which improves the accuracy of the concentration prediction model.
Table 3: Symbols used in the embodiments of the invention
[Table 3: images not reproduced]
Example II
Referring to fig. 11, an embodiment of the present invention provides a modeling apparatus for an air quality prediction model, including:
a data acquisition module 1, configured to acquire monitoring data of multiple monitored objects, where the monitored objects include multiple meteorological conditions and multiple pollutants;
a preprocessing module 2, configured to preprocess the monitoring data to remove abnormal data from the monitoring data;
a correlation module 3, configured to calculate correlation coefficients between the monitored objects from the preprocessed monitoring data;
a classification module 4, configured to classify the meteorological conditions according to the correlation coefficients and obtain a meteorological representative of each class;
a feature set module 5, configured to obtain a feature set of each pollutant from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives; and
a building module 7, configured to build, for each pollutant, a concentration prediction model based on the Informer deep learning model according to the feature set of that pollutant.
In an alternative embodiment, the preprocessing module 2 includes:
a first filling unit, configured to detect whether the monitoring data contain missing data spanning no more than two hours or two days between adjacent records and, if such missing data are detected, to fill them with the mean of the values in the two time periods before and after the missing data;
a second unit, configured to detect whether the monitoring data have a missing rate of no more than 80% and, if so, to fill the missing data with the mean of the values in the two time periods before and after the missing data;
a third unit, configured to detect abnormal values in the monitoring data using the Prophet algorithm with a 98% confidence interval and, when an abnormal value is detected, to set it to a null value and fill it by K-nearest-neighbor (KNN) imputation;
a fourth unit, configured to detect whether the monitoring data have a missing rate greater than 80% and, if so, to fill the missing data by KNN imputation; and
a normalization unit, configured to normalize the filled monitoring data by Z-Score standardization to obtain the preprocessed monitoring data.
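The mean-based gap filling and Z-Score standardization performed by these units can be sketched with NumPy as below. Treating a gap as a run of at most `max_gap` consecutive missing samples (2 for hourly or daily series) is an assumed reading of the "two hours or two days" rule, and the function names are hypothetical. The Prophet outlier check and KNN imputation are not shown; with the `prophet` package they would typically correspond to `Prophet(interval_width=0.98)` with observations outside `yhat_lower`/`yhat_upper` flagged, and to scikit-learn's `KNNImputer`, but that mapping is an assumption.

```python
import numpy as np


def fill_short_gaps(series, max_gap: int = 2) -> np.ndarray:
    """Fill interior runs of at most max_gap missing samples with the mean of the
    observed values just before and after the run; longer or edge runs stay NaN."""
    x = np.asarray(series, dtype=float).copy()
    n, i = len(x), 0
    while i < n:
        if not np.isnan(x[i]):
            i += 1
            continue
        j = i
        while j < n and np.isnan(x[j]):
            j += 1  # x[i:j] is one run of missing values
        if i > 0 and j < n and j - i <= max_gap:
            x[i:j] = (x[i - 1] + x[j]) / 2.0
        i = j
    return x


def missing_ratio(series) -> float:
    """Share of missing samples, used to choose between mean filling and KNN imputation."""
    return float(np.mean(np.isnan(np.asarray(series, dtype=float))))


def zscore(series) -> np.ndarray:
    """Z-Score standardization: (x - mean) / std, ignoring NaN."""
    x = np.asarray(series, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)
```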
The correlation module 3 is specifically configured to calculate, from the preprocessed monitoring data, the Pearson, Spearman, and Kendall correlation coefficients between the monitored objects.
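The three coefficients can be computed without extra dependencies as in the following sketch: Pearson via `np.corrcoef`, Spearman as the Pearson correlation of average ranks, and Kendall's tau-b by pair counting. The tau-b variant (which corrects for ties) and the function names are assumptions, since the document does not state which Kendall variant is used. Series are assumed to be equally long and already gap-free.

```python
import math

import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y) -> float:
    """Kendall's tau-b by direct pair counting (O(n^2)); tau-b is an assumed choice."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sx, sy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
            if sx == 0 and sy == 0:
                continue  # tied in both variables: excluded from all counts
            if sx == 0:
                ties_x += 1
            elif sy == 0:
                ties_y += 1
            elif sx == sy:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def correlation_matrices(data: dict):
    """Pairwise Pearson, Spearman and Kendall matrices for equally long, gap-free series."""
    names = list(data)
    result = {}
    for method, fn in (("pearson", pearson), ("spearman", spearman), ("kendall", kendall_tau_b)):
        m = np.eye(len(names))
        for a in range(len(names)):
            for b in range(a + 1, len(names)):
                m[a, b] = m[b, a] = fn(data[names[a]], data[names[b]])
        result[method] = m
    return names, result
```

For a monotonic but non-linear pair (x and x²), Spearman and Kendall are exactly 1 while Pearson is slightly below 1, which is one reason to compute all three.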
In an alternative embodiment, the classification module 4 comprises:
a classification unit, configured to group highly correlated meteorological conditions into the same class according to the correlation coefficients, thereby classifying the meteorological conditions; and
a representative acquiring unit, configured to select one meteorological condition from each class as the meteorological representative of that class. The meteorological representatives include: specific humidity, surface temperature, humidity, wind speed at 10 m above ground, wind direction at 10 m above ground, rainfall, cloud cover, and atmospheric pressure.
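The document does not specify how "high correlation" is decided or how the representative of a class is picked. The sketch below assumes a greedy rule with a hypothetical |r| ≥ 0.8 threshold, in which the first condition of each class serves as its representative.

```python
import numpy as np


def group_by_correlation(names, corr, threshold=0.8):
    """Greedy grouping of meteorological conditions (hypothetical rule and threshold).

    Each condition joins the first existing class whose representative it
    correlates with at |r| >= threshold; otherwise it opens a new class and
    becomes that class's representative.
    """
    corr = np.asarray(corr)
    classes = []  # (representative index, member indices)
    for i in range(len(names)):
        for rep, members in classes:
            if abs(corr[i, rep]) >= threshold:
                members.append(i)
                break
        else:
            classes.append((i, [i]))
    return {names[rep]: [names[m] for m in members] for rep, members in classes}
```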
In an alternative embodiment, feature set module 5 includes:
a strongly correlated object acquisition unit, configured to acquire, for each pollutant, the strongly correlated meteorological representatives and other pollutants according to the correlation coefficients; and
a feature set acquisition unit, configured to extract the monitoring data of the strongly correlated meteorological representatives and other pollutants from the preprocessed monitoring data, thereby obtaining the feature set of each pollutant.
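A minimal sketch of the feature-set step follows, assuming a hypothetical |r| ≥ 0.5 threshold for "strongly correlated" (the document states no threshold). Candidates are the meteorological representatives and the other pollutants, and the selected series are stacked column-wise as model inputs; the function names are illustrative.

```python
import numpy as np


def select_features(target, names, corr, candidates, threshold=0.5):
    """Candidates whose |r| with the target pollutant reaches the (hypothetical) threshold."""
    corr = np.asarray(corr)
    t = names.index(target)
    return [c for c in candidates if c != target and abs(corr[t, names.index(c)]) >= threshold]


def feature_matrix(data: dict, features):
    """Stack the selected series column-wise as the model's input features."""
    return np.column_stack([np.asarray(data[f], dtype=float) for f in features])
```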
In an alternative embodiment, the modeling apparatus further includes an area cross-correlation factor building module 6.
The area cross-correlation factor building module 6 is configured to acquire the geographical position relationship between the current monitoring point and each adjacent monitoring point, and to build the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point according to the geographical position relationship and the preprocessed monitoring data.
In an alternative embodiment, the building module 7 is specifically configured to build, for each pollutant, a concentration prediction model based on the Informer deep learning model according to the feature set of that pollutant and the area cross-correlation factors of the adjacent monitoring points with respect to the current monitoring point.
In an alternative embodiment, the area cross-correlation factor building module 6 includes:
a geographic position acquisition unit, configured to acquire the geographical position relationship between the current monitoring point and each adjacent monitoring point;
an influence parameter acquisition unit, configured to acquire, according to the geographical position relationship, the distance from each adjacent monitoring point to the current monitoring point and its maximum-influence angle; and
an area cross-correlation factor calculation unit, configured to construct, according to the distance, the maximum-influence angle, and the preprocessed monitoring data, a calculation model of the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point, where the plurality of meteorological conditions include wind direction and wind speed. The calculation model is as follows:
[Formula: area cross-correlation factor; image not reproduced]
$h(x_1) = \lvert \mathit{max} - x_1 \rvert$
where $x_0$ is the distance from the adjacent monitoring point to the current monitoring point, $x_1$ is the wind direction in absolute coordinates, $x_2$ is the wind speed, $e$ is the natural base, and $\mathit{max}$ is the maximum-influence angle.
Example III
An embodiment of the present invention provides modeling equipment for an air quality prediction model, which includes a processor, a memory, and a computer program stored in the memory. The computer program can be executed by the processor to implement the modeling method of the air quality prediction model described in any of the above embodiments.
Example IV
An embodiment of the present invention provides a computer-readable storage medium including a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the modeling method of the air quality prediction model described in any of the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A modeling method of an air quality prediction model is characterized by comprising the following steps:
acquiring monitoring data of a plurality of monitoring objects; wherein the plurality of monitored objects comprises a plurality of meteorological conditions and a plurality of pollutants;
preprocessing the monitoring data to eliminate abnormal data in the monitoring data;
calculating correlation coefficients among all monitored objects according to the preprocessed monitoring data;
classifying the meteorological conditions according to the correlation coefficient, and obtaining a meteorological representative of each classification;
acquiring feature sets of various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives;
constructing, for each pollutant, a concentration prediction model based on an Informer deep learning model according to the feature set of that pollutant.
2. The modeling method of the air quality prediction model according to claim 1, wherein preprocessing the monitoring data to eliminate abnormal data in the monitoring data specifically includes:
detecting whether the monitoring data contain missing data spanning no more than two hours or two days between adjacent records; when such missing data are detected, filling them with the mean of the values in the two time periods before and after the missing values;
detecting whether the monitoring data have a missing rate of no more than 80%; when such a missing rate is detected, filling the missing data with the mean of the values in the two time periods before and after the missing data;
detecting abnormal values in the monitoring data using the Prophet algorithm; when an abnormal value is detected, setting it to a null value and then filling it by K-nearest-neighbor imputation; wherein the confidence interval of the Prophet algorithm is 98%;
detecting whether the monitoring data have a missing rate greater than 80%; when such a missing rate is detected, filling the missing data by K-nearest-neighbor imputation;
and normalizing the filled monitoring data by Z-Score standardization to obtain the preprocessed monitoring data.
3. The modeling method of the air quality prediction model according to claim 1, wherein calculating a correlation coefficient between each monitored object according to the preprocessed monitoring data specifically includes:
calculating, according to the preprocessed monitoring data, the Pearson, Spearman, and Kendall correlation coefficients between the monitored objects respectively;
wherein classifying the meteorological conditions according to the correlation coefficients and obtaining a meteorological representative of each class specifically comprises:
grouping highly correlated meteorological conditions into the same class according to the correlation coefficients, thereby classifying the meteorological conditions;
selecting one meteorological condition from each class as the meteorological representative of that class; wherein the meteorological representatives comprise: specific humidity, surface temperature, humidity, wind speed at 10 m above ground, wind direction at 10 m above ground, rainfall, cloud cover, and atmospheric pressure.
4. The modeling method of the air quality prediction model according to claim 1, wherein obtaining the feature set of each pollutant from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives specifically comprises:
acquiring, for each pollutant, the strongly correlated meteorological representatives and other pollutants according to the correlation coefficients;
and extracting the monitoring data of the strongly correlated meteorological representatives and other pollutants from the preprocessed monitoring data, thereby obtaining the feature set of each pollutant.
5. The modeling method of the air quality prediction model according to any one of claims 1 to 4, wherein, before the step of constructing the concentration prediction model of each pollutant based on the Informer deep learning model according to the feature set of each pollutant, the modeling method further comprises:
acquiring a geographical position relationship between a current monitoring point and an adjacent monitoring point, and constructing an area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point according to the geographical position relationship and the preprocessed monitoring data;
wherein constructing the concentration prediction model of each pollutant based on the Informer deep learning model according to the feature set of each pollutant specifically comprises:
constructing, for each pollutant, a concentration prediction model based on the Informer deep learning model according to the feature set of that pollutant and the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point.
6. The modeling method of the air quality prediction model according to claim 5, wherein acquiring the geographical position relationship between the current monitoring point and the adjacent monitoring point and constructing the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point according to the geographical position relationship and the preprocessed monitoring data specifically comprises:
acquiring the geographical position relationship between the current monitoring point and the adjacent monitoring point;
acquiring, according to the geographical position relationship, the distance from the adjacent monitoring point to the current monitoring point and its maximum-influence angle;
constructing, according to the distance, the maximum-influence angle, and the preprocessed monitoring data, a calculation model of the area cross-correlation factor of the adjacent monitoring point with respect to the current monitoring point; wherein the plurality of meteorological conditions comprises wind direction and wind speed; the calculation model is as follows:
[Formula: area cross-correlation factor; image not reproduced]
$h(x_1) = \lvert \mathit{max} - x_1 \rvert$
where $x_0$ represents the distance from the adjacent monitoring point to the current monitoring point; $x_1$ represents the wind direction in absolute coordinates; $x_2$ represents the wind speed; $e$ represents the natural base; and $\mathit{max}$ represents the maximum-influence angle.
7. A modeling apparatus for an air quality prediction model, comprising:
the data acquisition module is used for acquiring monitoring data of various monitoring objects; wherein the plurality of monitored objects comprises a plurality of meteorological conditions and a plurality of pollutants;
the preprocessing module is used for preprocessing the monitoring data so as to eliminate abnormal data in the monitoring data;
the correlation module is used for calculating correlation coefficients among all monitoring objects according to the preprocessed monitoring data;
the classification module is used for classifying the meteorological conditions according to the correlation coefficient and obtaining a meteorological representative of each classification;
the feature set module is used for acquiring feature sets of various pollutants from the preprocessed monitoring data according to the correlation coefficients and the meteorological representatives;
and the construction module is used for respectively constructing concentration prediction models of various pollutants based on the Informer deep learning model according to the feature sets of various pollutants.
8. A modeling apparatus for an air quality prediction model, comprising a processor, a memory, and a computer program stored in the memory; the computer program is executable by the processor to implement a method of modeling an air quality prediction model as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program when executed controls an apparatus in which the computer-readable storage medium is located to perform the method of modeling an air quality prediction model according to any one of claims 1 to 6.
CN202210931237.6A 2022-08-04 2022-08-04 Modeling method, device, equipment and storage medium of air quality prediction model Pending CN115295086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210931237.6A CN115295086A (en) 2022-08-04 2022-08-04 Modeling method, device, equipment and storage medium of air quality prediction model

Publications (1)

Publication Number Publication Date
CN115295086A true CN115295086A (en) 2022-11-04

Family

ID=83826643

Country Status (1)

Country Link
CN (1) CN115295086A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103616A1 (en) * 2022-11-17 2024-05-23 河北先河环保科技股份有限公司 Air pollution early-warning method and apparatus, electronic device, and storage medium
CN116564431A (en) * 2023-06-02 2023-08-08 江苏捷利达环保科技有限公司 Pollution source online analysis system and method based on big data processing
CN116564431B (en) * 2023-06-02 2024-01-09 江苏捷利达环保科技有限公司 Pollution source online analysis system and method based on big data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230605

Address after: Unit G, Room 1601, No.1 Xishanwei Road, Software Park Phase III, Torch High tech Zone, Xiamen City, Fujian Province, 361000

Applicant after: Xiamen Qingmiao Intelligent Technology Co.,Ltd.

Address before: Part A, Unit 1901, No. 365, Chengyi Street, Phase III, Software Park, Xiamen, Fujian 361000

Applicant before: Xiamen New Energy Convergence Intelligent Technology Co.,Ltd.