CN113222236A - Data distribution self-adaptive cross-regional exhaust emission prediction method and system - Google Patents
- Publication number
- CN113222236A (application CN202110481944.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- exhaust
- concentration
- target
- exhaust gas
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a data distribution self-adaptive cross-regional exhaust emission prediction method and system. The method comprises the following steps: preprocessing the exhaust gas data of known regional sites and the data related to exhaust emission to obtain source domain data, and preprocessing the related data of a target region to obtain target domain data; constructing an exhaust gas concentration prediction model on the source domain data set by using a convolutional neural network; applying the model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and grading the exhaust gas concentration according to a set standard; based on data distribution adaptation, minimizing the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated values, thereby reducing the difference between the two domains; and inputting the target domain data into the trained exhaust gas concentration prediction model again to obtain a prediction result. Because data distribution adaptation reduces the difference between the source domain and the target domain as far as possible, the trained model is better suited to a target region where exhaust gas data are difficult to obtain directly, so that the exhaust gas concentration of the target region can be predicted.
Description
Technical Field
The invention relates to the technical field of environmental detection, and in particular to a data distribution self-adaptive cross-regional exhaust emission prediction method and system.
Background
Research shows that the amount of exhaust gas emitted by motor vehicles in a given space-time region is closely related to the traffic flow intensity in that region, road condition information, weather conditions, the pollutant emissions of related regions, and similar factors. Existing methods mainly predict the exhaust emission at a future time from existing exhaust measurement data; they are suited to regions that are well covered by exhaust monitoring sites able to report the exhaust gas concentration. However, not all monitoring stations can measure the exhaust gas concentration, for example road condition monitoring stations and Air Quality Index (AQI) monitoring stations, so the specific exhaust gas concentration is not easy to obtain directly for regions that have only such stations. It is therefore desirable to predict and analyze the exhaust gas concentration in a region where it cannot easily be obtained directly, by using traffic flow, air quality index, and exhaust gas concentration data from an easily monitored region.
Disclosure of Invention
The invention provides a data distribution self-adaptive cross-regional exhaust emission prediction method and system, which address a technical problem that conventional methods cannot solve: predicting regional exhaust gas when the target region has no exhaust gas concentration data but does have other related data such as the AQI (air quality index).
To achieve this purpose, the invention adopts the following technical scheme:
A data distribution self-adaptive cross-regional exhaust emission prediction method comprises the following steps:
S10: acquiring, sorting and preprocessing the exhaust gas data of known regional sites and the data related to exhaust emission to obtain source domain data; acquiring and preprocessing the related data of a target region to obtain target domain data;
S20: dividing the source domain exhaust gas concentration and related data into time series, and constructing an exhaust gas concentration prediction model on the data set by using a convolutional neural network;
S30: applying the exhaust gas concentration prediction model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
S40: based on data distribution adaptation, minimizing the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, and eliminating the difference between the two domains;
S50: inputting the target domain data into the trained exhaust gas concentration prediction model again to obtain the prediction result of the target domain exhaust gas concentration.
Further, in S10, acquiring, sorting and preprocessing the exhaust gas data of known regional sites and the data related to exhaust emission to obtain source domain data, and acquiring and preprocessing the related data of the target region to obtain target domain data, specifically comprises:
S11: respectively acquiring, from a specified official website, historical exhaust gas data of the source domain and the target domain and external factor data related to the exhaust gas concentration, and recording the highest concentration among the various pollutants on the same road section in the same time period as the exhaust gas concentration of that road section at that time and place;
S12: interpolating the historical exhaust gas data of the source domain and the target domain, identifying abnormal values with the box-plot method and processing them, and applying zero-mean normalization to remove the influence of differing dimensions and value ranges between indicators, scaling the data proportionally so that they fall within a specific range.
Further, in S20, dividing the source domain exhaust gas concentration and related data into time series and constructing an exhaust gas concentration prediction model on the data set by using a convolutional neural network specifically comprises:
S21: dividing the source domain historical exhaust observation data chronologically into observation data sequences at time interval Δt; according to the time-series length, partitioning the source domain observation data sequence into segments, the resulting set being denoted $H_s$;
S22: a convolutional neural network requires two-dimensional feature data, so the exhaust gas and related data within a given time slice are organized into a two-dimensional array with 5 rows, corresponding to 5 influence factors, and 10 columns, corresponding to the concentration of each main exhaust component and the total concentration, yielding thousands of regional exhaust distribution maps per day;
S23: randomly selecting 80% of the data set as a training set and training a convolutional neural network on it; the network uses a stack of Conv2D and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
$$\mathrm{loss} = \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$
where $y_i$ is the true value of the $i$-th sample, $y'_i$ is the estimate given by the convolutional neural network, and $n$ is the number of samples;
and taking the remaining 20% of the data set as a validation set, monitoring the performance of the convolutional neural network on the validation set during training, and stopping training when the loss on the validation set reaches its minimum, to obtain the exhaust gas prediction model suited to the source domain data set.
Further, in step S23, the weights of some neurons are randomly set to 0 (dropout) during training, i.e., some neurons are disabled.
Further, in S30, applying the exhaust gas concentration prediction model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the set standard, specifically comprises:
S31: processing the exhaust-related data in the target domain and dividing it chronologically into observation data sequences at time interval Δt; according to the time-series length, partitioning the target domain AQI observation data sequence into segments, the resulting set being denoted $H_t$; inputting the target domain data set into the trained exhaust gas concentration prediction model to obtain the corresponding estimated exhaust gas concentration $Y'_t$;
S32: because the exhaust gas concentration is defined as the highest concentration among the various pollutants at a given time and place, it is graded according to the relationship between concentration and the corresponding air pollution index; the exhaust gas concentration, in milligrams per cubic meter, is divided into five grades: (1) 0-5 is "excellent" (grade I); (2) 6-15 is "good" (grade II); (3) 16-60 is "light pollution" (grade III); (4) 61-100 is "moderate pollution" (grade IV); (5) above 100 is "severe pollution" (grade V); the exhaust gas concentration data of the source domain and the estimated exhaust gas concentration output for the target domain are both graded in this way for probability calculation.
Further, in S40, adaptively minimizing, based on the data distribution, the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, and eliminating the difference between the two domains, specifically comprises:
S41: calculating probabilities for the graded source domain exhaust gas data and target domain estimated values, where $P(S)$ and $P(T)$ are the marginal probability distributions of the source domain and the target domain; let the probability distribution of the two-dimensional discrete random variable $(X, Y)$ be $P\{X = x_i, Y = y_j\} = p_{ij}$, $i, j = 1, 2, \dots$; then the probability distributions of the random variables $X$ and $Y$, namely $P\{X = x_i\} = p_{i\cdot}$, $i = 1, 2, \dots$, and $P\{Y = y_j\} = p_{\cdot j}$, $j = 1, 2, \dots$, are called the marginal probability distributions of $(X, Y)$ with respect to $X$ and $Y$, respectively;
let $Q(S)$ and $Q(T)$ be the conditional probability distributions of the source and target data domains.
S42: the maximum mean discrepancy (MMD) is used to measure the largest difference between the mean function values of the two domains:
$\mathrm{Distance}(D_s, D_t) \approx \lVert P(x_s) - P(x_t) \rVert$
under conditional probability, the formula is expressed as:
where $n_1$ and $n_2$ respectively represent the numbers of samples in the source domain and the target domain;
applied to the target data domain, whose values are estimates, the MMD is expressed as:
to obtain a reasonable $f(\cdot)$, the kernel matrix $K$ is introduced:
and an MMD matrix $L$, each element of which is calculated as follows:
at this point the distance is converted to $\mathrm{tr}(KL) - \gamma\,\mathrm{tr}(K)$; a lower-dimensional matrix $W$ is constructed such that:
the final optimization objective of transfer component analysis (TCA) becomes:
$\text{s.t.}\; W^{\top} K H K W = I_m$
where $H$ is the centering matrix.
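For reference, the standard transfer component analysis formulation, which the fragments in S42 appear to follow, can be written as below. This is a sketch under the assumption that the method uses the usual TCA definitions; the feature map $\phi$, the trade-off parameter $\mu$, and the sample symbols $x^s_i$ and $x^t_j$ are notation introduced here and do not come from the text above.

```latex
\mathrm{MMD}(D_s, D_t)
  = \left\| \frac{1}{n_1}\sum_{i=1}^{n_1}\phi(x^s_i)
          - \frac{1}{n_2}\sum_{j=1}^{n_2}\phi(x^t_j) \right\|_{\mathcal{H}}^{2}
  = \mathrm{tr}(KL)

K = \begin{bmatrix} K_{s,s} & K_{s,t} \\ K_{t,s} & K_{t,t} \end{bmatrix},
\qquad
L_{ij} = \begin{cases}
  1/n_1^{2}      & x_i, x_j \in D_s \\
  1/n_2^{2}      & x_i, x_j \in D_t \\
  -1/(n_1 n_2)   & \text{otherwise}
\end{cases}

\widetilde{K} = K W W^{\top} K,
\qquad
\mathrm{tr}(\widetilde{K} L) = \mathrm{tr}(W^{\top} K L K W)

\min_{W}\; \mathrm{tr}(W^{\top} K L K W) + \mu\,\mathrm{tr}(W^{\top} W)
\quad \text{s.t.}\quad W^{\top} K H K W = I_m,
\qquad
H = I_{n_1+n_2} - \frac{1}{n_1+n_2}\,\mathbf{1}\mathbf{1}^{\top}
```

In this standard form, the solution $W$ is given by the $m$ leading eigenvectors of $(KLK + \mu I)^{-1} KHK$.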
Further, the historical exhaust gas data of the source domain and the target domain and the external factor data related to the exhaust gas concentration in step S11 include weather factors, road condition information and traffic flow density.
In another aspect, the invention also discloses a data distribution self-adaptive cross-regional exhaust emission prediction system, which comprises the following units:
a data acquisition and processing unit, configured to acquire, sort and preprocess the exhaust gas data of known regional sites and the data related to exhaust emission to obtain source domain data, and to acquire and preprocess the related data of a target region to obtain target domain data;
a prediction model construction unit, configured to divide the source domain exhaust gas concentration and related data into time series and to construct an exhaust gas concentration prediction model on the data set by using a convolutional neural network;
a model training unit, configured to apply the exhaust gas concentration prediction model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and to divide the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, configured to adaptively minimize, based on the data distribution, the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, and to eliminate the difference between the two domains;
and a prediction unit, configured to input the target domain data into the trained exhaust gas concentration prediction model again to obtain a prediction result of the target domain exhaust gas concentration.
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
According to the above technical scheme, the data distribution self-adaptive cross-regional exhaust emission prediction method, system and storage medium combine a cross-domain extreme learning method based on domain adaptation with conventional methods. The system extracts data from monitoring sites in a known region where the exhaust gas concentration, AQI and other data are available, trains an exhaust gas prediction model, and then, using the cross-domain extreme learning machine and the domain adaptation method, transfers the model to a target region where only the AQI and other related data, but no exhaust gas data, are available, so that the exhaust gas concentration of the target region can be predicted and analyzed.
The method overcomes the shortcomings of existing methods: a source domain exhaust gas prediction model is trained on monitoring data from known regional sites, and data distribution adaptation reduces the difference between the source domain and the target domain as far as possible, so that the trained model is better suited to a target region where exhaust gas data are not easy to obtain directly, and the exhaust gas concentration of the target region can be predicted.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a system block diagram of the present invention;
FIG. 3 shows exhaust gas concentration prediction heat maps of the present invention for (a) the source domain and (b) the target domain.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
As shown in FIG. 1, the data distribution self-adaptive cross-regional exhaust emission prediction method of this embodiment includes the following steps:
Step 1: Acquire, sort and preprocess the exhaust gas data of known regional sites and other external data directly or indirectly related to exhaust emission to obtain source domain data. Acquire and preprocess the related data of a target region (where exhaust gas data cannot be obtained directly because of environmental, technical or other constraints) to obtain target domain data.
Step 2: Divide the source domain exhaust gas concentration and the data related to it (such as the AQI) into time series, and construct an exhaust gas concentration prediction model on the data set by using a convolutional neural network.
Step 3: Apply the exhaust gas concentration prediction model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and divide the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the unified regulation of the national environmental protection agency.
Step 4: Based on data distribution adaptation, minimize the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, effectively eliminating the difference between the two domains so that the learned exhaust gas concentration prediction model adapts well to the target domain data.
Step 5: Input the target domain data (AQI and the like) into the trained exhaust gas concentration prediction model again to obtain a prediction result of the target domain exhaust gas concentration.
The steps are described in detail below.
Step S1 above: acquire, sort and preprocess the exhaust gas data of known regional sites and other external data directly or indirectly related to exhaust emission to obtain source domain data; acquire and preprocess the related data of a target region (where exhaust gas data cannot be obtained directly because of environmental, technical or other constraints) to obtain target domain data. This step comprises the following sub-steps S11-S12:
S11: Historical exhaust gas data of the source domain and the target domain, together with external factor data related to the exhaust gas concentration such as weather factors, road condition information and traffic flow intensity, are obtained from an official government website, and the highest concentration among the various pollutants on the same road section in the same time period is recorded as the exhaust gas concentration of that road section at that time and place.
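The aggregation rule in S11 can be illustrated with a minimal sketch. The record layout `(road_section, period, pollutant, concentration)`, the function name `exhaust_concentration`, and the sample values are all hypothetical; the text does not specify a data format.

```python
from collections import defaultdict

def exhaust_concentration(records):
    """Collapse pollutant readings to one value per (road_section, period).

    records: iterable of (road_section, period, pollutant, concentration).
    Returns {(road_section, period): highest concentration over all pollutants}.
    """
    peak = defaultdict(lambda: float("-inf"))
    for section, period, _pollutant, value in records:
        key = (section, period)
        peak[key] = max(peak[key], value)
    return dict(peak)

# Synthetic readings, for illustration only.
readings = [
    ("road_A", "2021-01-01T08", "CO", 4.2),
    ("road_A", "2021-01-01T08", "NOx", 7.9),
    ("road_A", "2021-01-01T09", "CO", 3.1),
    ("road_B", "2021-01-01T08", "HC", 1.5),
]
by_section = exhaust_concentration(readings)
```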
S12: The historical exhaust gas data of the source domain and the target domain are interpolated; abnormal values are identified with the box-plot method and processed; preprocessing operations such as zero-mean normalization are then applied to remove the influence of differing dimensions and value ranges between indicators, scaling the data proportionally so that they fall within a specific range.
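The three preprocessing operations in S12 might be sketched as follows for a single one-dimensional series. This is an illustration under assumptions: linear interpolation, clipping outliers to the box-plot whiskers (the text only says abnormal values are "processed"), and the whisker factor `k = 1.5` are choices made here, not details given in the text.

```python
import numpy as np

def interpolate_missing(x):
    """Fill NaN entries by linear interpolation over the sample index."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    known = ~np.isnan(x)
    x[~known] = np.interp(idx[~known], idx[known], x[known])
    return x

def clip_boxplot_outliers(x, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR], the box-plot whiskers.

    Clipping is an assumed way of handling the abnormal values.
    """
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)

def zero_mean_normalize(x):
    """Rescale to zero mean and unit standard deviation (z-score)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

# Synthetic series with two gaps and one obvious outlier.
raw = np.array([3.0, np.nan, 5.0, 4.0, 250.0, 6.0, np.nan, 5.0])
clean = zero_mean_normalize(clip_boxplot_outliers(interpolate_missing(raw)))
```

Applying the steps in this order keeps the outlier from distorting the mean and standard deviation used for normalization.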
Step S2 above: divide the source domain exhaust gas concentration and the data related to it (such as the AQI) into time series, and construct an exhaust gas concentration prediction model on the data set by using a convolutional neural network.
This step comprises the following sub-steps S21-S23:
S21: The source domain historical exhaust observation data are divided chronologically into observation data sequences at time interval Δt (Δt is taken as 1 hour). According to the time-series length, the source domain observation data sequence is partitioned into segments, the resulting set being denoted $H_s$.
S22: A convolutional neural network requires two-dimensional feature data, so the exhaust gas and related data within a given time slice are organized into picture-like two-dimensional data with 5 rows (corresponding to 5 influence factors such as weather factors, road condition information and traffic flow density) and 10 columns (the concentration of each main exhaust component and the total concentration), yielding thousands of regional exhaust distribution maps per day.
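The data layout in S21 and S22 can be sketched as array reshaping. The symbols for the sequence length were lost from the text, so the `window` parameter is a hypothetical stand-in; non-overlapping segments, a single channel axis, and the flat 50-value record per hour are likewise assumptions made for illustration.

```python
import numpy as np

N_FACTORS, N_COMPONENTS = 5, 10   # rows and columns named in S22

def to_feature_maps(flat):
    """Reshape (T, 50) hourly records into (T, 5, 10, 1) single-channel maps."""
    flat = np.asarray(flat, dtype=float)
    return flat.reshape(-1, N_FACTORS, N_COMPONENTS, 1)

def split_sequences(maps, window):
    """Cut the time axis into consecutive, non-overlapping runs of `window` steps.

    Trailing steps that do not fill a whole window are dropped.
    """
    n = maps.shape[0] // window
    return maps[: n * window].reshape(n, window, *maps.shape[1:])

# 48 hours of synthetic data at a 1-hour interval.
hourly = np.random.default_rng(0).random((48, N_FACTORS * N_COMPONENTS))
maps = to_feature_maps(hourly)
H_s = split_sequences(maps, window=24)   # shape (2, 24, 5, 10, 1)
```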
S23: 80% of the data set is randomly selected as a training set, and a convolutional neural network is trained on it. The network uses a stack of Conv2D and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
$$\mathrm{loss} = \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$
where $y_i$ is the true value of the $i$-th sample, $y'_i$ is the estimate given by the convolutional neural network, and $n$ is the number of samples. The remaining 20% of the data set is used as a validation set; the performance of the convolutional neural network on the validation set is monitored during training, and training stops when the loss on the validation set reaches its minimum, giving the exhaust gas prediction model (AQI → exhaust gas concentration) suited to the source domain data set. During training, the weights of some neurons are randomly set to 0 (dropout), i.e., some neurons are disabled, which reduces the number of parameters active in each training step and helps avoid overfitting.
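The training protocol in S23 (random 80/20 split, MSE loss, stopping at the validation minimum) can be sketched independently of the network itself. To stay self-contained, the sketch below swaps the convolutional network for a linear model trained by gradient descent; that substitution, the `patience` rule for detecting the minimum, and all hyperparameters are assumptions made here, and dropout is omitted.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, the loss named in S23."""
    return float(np.mean((y_true - y_pred) ** 2))

def train_with_early_stopping(X, y, lr=0.05, max_epochs=500, patience=10, seed=0):
    """Fit a linear stand-in model; keep the weights with the lowest validation MSE."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(0.8 * len(X))                      # random 80% / 20% split
    tr, va = order[:cut], order[cut:]
    w = np.zeros(X.shape[1])
    best_w, best_loss, stale = w.copy(), float("inf"), 0
    for _ in range(max_epochs):
        grad = 2.0 * X[tr].T @ (X[tr] @ w - y[tr]) / len(tr)   # gradient of MSE
        w -= lr * grad
        val_loss = mse(y[va], X[va] @ w)
        if val_loss < best_loss:
            best_w, best_loss, stale = w.copy(), val_loss, 0
        else:
            stale += 1
            if stale >= patience:                # validation loss stopped improving
                break
    return best_w, best_loss

# Synthetic data: 200 flattened 5x10 feature maps with a noisy linear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
true_w = rng.normal(size=50)
y = X @ true_w + 0.1 * rng.normal(size=200)
w, val_loss = train_with_early_stopping(X, y)
```

Returning the best weights seen, instead of the last ones, is what makes the stopping point correspond to the validation minimum.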
Step S3 above: apply the exhaust gas concentration prediction model to the target domain data to obtain estimated values of the target domain exhaust gas concentration, and divide the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the unified regulation of the national environmental protection agency. This step comprises the following sub-steps S31-S32:
S31: The exhaust-related data (e.g., AQI) in the target domain are processed in the same way and divided chronologically into observation data sequences at time interval Δt (Δt is taken as 1 hour). According to the time-series length, the target domain observation data sequences (AQI and the like) are partitioned into segments, the resulting set being denoted $H_t$. The target domain data set is input into the trained exhaust gas concentration prediction model to obtain the corresponding estimated exhaust gas concentration $Y'_t$.
S32: Because the exhaust gas concentration is defined as the highest concentration among the various pollutants at a given time and place, it can be graded according to the relationship between concentration and the corresponding Air Pollution Index (API) under the unified regulation of the national environmental protection agency. The exhaust gas concentration (milligrams per cubic meter) is divided into five grades: (1) 0-5 is "excellent" (grade I); (2) 6-15 is "good" (grade II); (3) 16-60 is "light pollution" (grade III); (4) 61-100 is "moderate pollution" (grade IV); (5) above 100 is "severe pollution" (grade V). The exhaust gas concentration data of the source domain and the estimated exhaust gas concentration output for the target domain are both graded in this way for probability calculation.
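The grading rule in S32 can be expressed as a threshold lookup. The listed ranges use integer endpoints, so how fractional values between two ranges (for example 5.5) are graded is not stated; the sketch assumes each grade includes its upper bound and everything above the previous one. The function names are hypothetical.

```python
import bisect
from collections import Counter

UPPER_BOUNDS = [5, 15, 60, 100]   # mg/m^3, upper edge of grades I to IV
GRADES = ["excellent", "good", "light pollution",
          "moderate pollution", "severe pollution"]

def grade(concentration):
    """Map an exhaust gas concentration (mg/m^3) to one of the five grades."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_left keeps a value equal to a bound in the lower grade.
    return GRADES[bisect.bisect_left(UPPER_BOUNDS, concentration)]

def grade_distribution(concentrations):
    """Relative frequency of each grade, as used for the probability step."""
    counts = Counter(grade(c) for c in concentrations)
    total = len(concentrations)
    return {g: counts[g] / total for g in GRADES}

shares = grade_distribution([1, 7, 7, 200])
```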
Step S4 above: based on data distribution adaptation, minimize the discrepancy between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, effectively eliminating the difference between the two domains so that the learned exhaust gas concentration prediction model adapts well to the target domain data. This step comprises the following sub-steps S41-S42:
S41: Probabilities are calculated for the graded source domain exhaust gas data and target domain estimated values, where $P(S)$ and $P(T)$ are the marginal probability distributions of the source domain and the target domain. Let the probability distribution of the two-dimensional discrete random variable $(X, Y)$ be $P\{X = x_i, Y = y_j\} = p_{ij}$, $i, j = 1, 2, \dots$; then the probability distributions of the random variables $X$ and $Y$, namely $P\{X = x_i\} = p_{i\cdot}$, $i = 1, 2, \dots$, and $P\{Y = y_j\} = p_{\cdot j}$, $j = 1, 2, \dots$, are called the marginal probability distributions of $(X, Y)$ with respect to $X$ and $Y$, respectively.
Let $Q(S)$ and $Q(T)$ be the conditional probability distributions of the source and target data domains.
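The marginal and conditional distributions described in S41 can be estimated from counts over discrete labels. In the sketch below, taking $X$ as a bucketed input feature and $Y$ as the exhaust grade index is an assumption made for illustration, as are the function names and the sample labels; the text does not state which variables are conditioned on.

```python
import numpy as np

def joint_distribution(x_labels, y_labels, n_x, n_y):
    """Empirical joint distribution p[i, j] = P{X = x_i, Y = y_j}."""
    counts = np.zeros((n_x, n_y))
    for i, j in zip(x_labels, y_labels):
        counts[i, j] += 1
    return counts / counts.sum()

def marginals(p):
    """Row sums give P{X = x_i}; column sums give P{Y = y_j}."""
    return p.sum(axis=1), p.sum(axis=0)

def conditional_y_given_x(p):
    """P{Y = y_j | X = x_i} = p_ij / p_i. ; rows with zero mass stay zero."""
    p_x = p.sum(axis=1, keepdims=True)
    return np.divide(p, p_x, out=np.zeros_like(p), where=p_x > 0)

# Hypothetical labels: 3 input buckets (X) and 5 exhaust grades (Y).
x = [0, 0, 1, 1, 2, 2, 2, 1]
y = [0, 1, 1, 2, 3, 4, 4, 2]
p = joint_distribution(x, y, n_x=3, n_y=5)
p_x, p_y = marginals(p)
cond = conditional_y_given_x(p)
```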
S42: To accomplish data distribution adaptive transfer learning, the distance between the source domain data and the target domain data must be made as small as possible in order to reduce the difference between the two domains. The maximum mean discrepancy (MMD) is used to measure the largest difference between the mean function values of the two domains:
$\mathrm{Distance}(D_s, D_t) \approx \lVert P(x_s) - P(x_t) \rVert$
under conditional probability, the formula can be expressed as:
where $n_1$ and $n_2$ respectively represent the numbers of samples in the source domain and the target domain;
applied to the target data domain, whose values are estimates, the MMD can be expressed as:
to obtain a reasonable $f(\cdot)$, the kernel matrix $K$ is introduced:
and an MMD matrix $L$, each element of which is calculated as follows:
at this point the distance is converted to $\mathrm{tr}(KL) - \gamma\,\mathrm{tr}(K)$; a lower-dimensional matrix $W$ is constructed such that:
in summary, the final optimization objective of transfer component analysis (TCA) becomes:
$\text{s.t.}\; W^{\top} K H K W = I_m$
where $H$ is the centering matrix.
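A compact numerical sketch of TCA with the matrices named in S42 is given below. It assumes the standard TCA formulation, in which $W$ consists of the $m$ leading generalized eigenvectors of $KHK$ with respect to $KLK + \mu I$; the linear kernel, the regularization weight `mu`, and the function names are choices made here and are not stated in the text.

```python
import numpy as np

def tca_matrices(Xs, Xt):
    """Build the kernel matrix K, MMD matrix L and centering matrix H."""
    n1, n2 = len(Xs), len(Xt)
    n = n1 + n2
    X = np.vstack([Xs, Xt])
    K = X @ X.T                                    # linear kernel (assumption)
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    L = np.outer(e, e)                             # 1/n1^2, 1/n2^2, -1/(n1*n2)
    H = np.eye(n) - np.ones((n, n)) / n
    return K, L, H

def tca(Xs, Xt, m=2, mu=1.0):
    """Return source/target embeddings and W with W' K H K W = I_m."""
    K, L, H = tca_matrices(Xs, Xt)
    n = K.shape[0]
    A = K @ H @ K                                  # scatter to preserve
    B = K @ L @ K + mu * np.eye(n)                 # MMD term plus regularizer
    # Solve A w = lambda B w by whitening with the Cholesky factor of B.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    S = C_inv @ A @ C_inv.T
    vals, vecs = np.linalg.eigh((S + S.T) / 2.0)
    top = np.argsort(vals)[::-1][:m]               # m leading eigenpairs
    W = C_inv.T @ vecs[:, top] / np.sqrt(vals[top])
    Z = K @ W                                      # embedded samples
    return Z[: len(Xs)], Z[len(Xs):], W

# Synthetic domains: the target is a shifted copy of the source distribution.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(40, 5))
Xt = rng.normal(size=(20, 5)) + 1.5
Zs, Zt, W = tca(Xs, Xt, m=2)
K, L, H = tca_matrices(Xs, Xt)
```

With a linear kernel, $\mathrm{tr}(KL)$ equals the squared distance between the two domain means, which gives a quick sanity check on the construction of $L$.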
Step S5 above: input the target domain data (AQI and the like) into the trained exhaust gas concentration prediction model again to obtain a prediction result of the target domain exhaust gas concentration. Specifically, once the distance between the source domain and the target domain has been minimized, the two domains are close enough that the model trained on the source domain exhaust gas data set (AQI → exhaust gas concentration) is well suited to the target domain data set. The data on the target domain (data related to the exhaust gas concentration, such as the AQI) are input into the exhaust gas concentration prediction model constructed with a convolutional neural network in step 2 to obtain the predicted exhaust gas concentration of the target domain; the test results are shown in FIG. 3.
In summary, the source domain exhaust gas prediction model is trained on monitoring data from known regional sites, and data distribution adaptation reduces the difference between the source domain and the target domain as far as possible, so that the trained model is better suited to a target domain where exhaust gas data are not easy to obtain directly, and the exhaust gas concentration of the target domain can be predicted.
In another aspect, the invention also discloses a data distribution self-adaptive cross-regional exhaust emission prediction system, comprising the following units:
a data acquisition and processing unit, configured to acquire, collate and preprocess exhaust gas data from stations in known regions and data related to exhaust emission to obtain source-domain data, and to acquire and preprocess the related data of a target region to obtain target-domain data;
a prediction model construction unit, configured to divide the source-domain exhaust gas concentration and related data into time series and to construct an exhaust gas concentration prediction model on the data set using a convolutional neural network;
a model training unit, configured to apply the exhaust gas concentration prediction model to the target-domain data to obtain estimated exhaust gas concentration data for the target domain, and to divide the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, configured to adaptively minimize, based on the data distribution, the difference between the marginal and conditional probability distributions of the source-domain exhaust gas data and the target-domain estimated exhaust gas concentration data, thereby eliminating the difference between the two domains;
and a prediction unit, configured to input the target-domain data again into the trained exhaust gas concentration prediction model to obtain the prediction result for the target-domain exhaust gas concentration.
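The grading and probability calculation handled by the model training unit and the data difference processing unit can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the thresholds are the mg/m³ bands recited in claim 5, while the function names and the treatment of non-integer values (each band's upper bound is taken as inclusive) are assumptions.

```python
from collections import Counter

# Grade names and the upper bound (mg/m^3) of the first four bands, per claim 5.
GRADES = ("excellent", "good", "light pollution",
          "moderate pollution", "severe pollution")
UPPER_BOUNDS = (5, 15, 60, 100)


def grade(concentration):
    """Map an exhaust gas concentration to one of the five grades."""
    for bound, name in zip(UPPER_BOUNDS, GRADES):
        if concentration <= bound:  # inclusive upper bound is an assumption
            return name
    return GRADES[-1]  # anything above 100 is "severe pollution"


def grade_distribution(concentrations):
    """Empirical probability of each grade over a set of concentrations."""
    if not concentrations:
        raise ValueError("at least one concentration is required")
    counts = Counter(grade(c) for c in concentrations)
    total = len(concentrations)
    return {name: counts[name] / total for name in GRADES}


# Hypothetical values: measured source-domain data and target-domain estimates.
source_distribution = grade_distribution([3, 12, 40, 75, 130, 8])
target_distribution = grade_distribution([4, 5, 20, 90])
print(source_distribution)
print(target_distribution)
```

Applying `grade_distribution` to the source-domain measurements and to the target-domain estimates yields two discrete distributions over the same five grades, which is the form needed when the difference between the two domains is compared.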
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A data distribution self-adaptive cross-regional exhaust emission prediction method, characterized by comprising the following steps:
S10: acquiring, collating and preprocessing exhaust gas data from stations in known regions and data related to exhaust emission to obtain source-domain data; acquiring and preprocessing the related data of a target region to obtain target-domain data;
S20: dividing the source-domain exhaust gas concentration and related data into time series, and constructing an exhaust gas concentration prediction model on the data set using a convolutional neural network;
S30: applying the exhaust gas concentration prediction model to the target-domain data to obtain estimated exhaust gas concentration data for the target domain, and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
S40: based on the data distribution, adaptively minimizing the difference between the marginal and conditional probability distributions of the source-domain exhaust gas data and the target-domain estimated exhaust gas concentration data, thereby eliminating the difference between the two domains;
S50: inputting the target-domain data again into the trained exhaust gas concentration prediction model to obtain the prediction result for the target-domain exhaust gas concentration.
2. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 1, characterized in that step S10, in which the exhaust gas data from stations in known regions and the data related to exhaust emission are acquired, collated and preprocessed to obtain source-domain data and the related data of the target region are acquired and preprocessed to obtain target-domain data, comprises:
S11: acquiring, from a designated official website, historical exhaust gas data of the source domain and the target domain together with external factor data related to exhaust gas concentration, and recording the highest concentration among the various pollutants on the same road section in the same time period as the exhaust gas concentration of that road section at that specific time and place;
S12: interpolating the historical exhaust gas data of the source domain and the target domain, identifying outliers with the box plot method and processing them, and performing zero-mean normalization to eliminate the influence of differences in dimension and value range between indicators, scaling the data proportionally so that they fall within a specific interval.
3. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 2, characterized in that step S20, in which the source-domain exhaust gas concentration and related data are divided into time series and an exhaust gas concentration prediction model is constructed on the data set using a convolutional neural network, comprises:
S21: dividing the historical source-domain exhaust gas observation data, in chronological order, into an observation data sequence at time intervals Δt; dividing the source-domain observation data sequence into a plurality of source-domain observation data sequences of time-series length l, denoted by Hs;
S22: because a convolutional neural network requires two-dimensional feature data, organizing the exhaust gas and related data within a given time slice into two-dimensional data in which 5 rows correspond to 5 influencing factors and 10 columns correspond to the concentrations of the main exhaust gas components and the total concentration, thereby obtaining thousands of regional exhaust gas distribution maps per day;
S23: randomly selecting 80% of the data set as a training set and training the model on it with the convolutional neural network; the network uses a stack of Conv2D and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
loss = (1/n) Σ (yi − y′i)², summed over i = 1, …, n
where n is the number of data items, yi is the true value of the i-th data item, and y′i is the estimate given by the convolutional neural network;
and taking the remaining 20% of the data set as a validation set, monitoring the performance of the convolutional neural network on the validation set during training, and stopping training when the loss on the validation set reaches its minimum, to obtain the exhaust gas prediction model suited to the source-domain data set.
4. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 3, characterized in that: in step S23, the weights of some neurons are randomly set to 0 (dropout) during training, i.e., some neurons are deactivated.
5. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 3, characterized in that step S30, in which the exhaust gas concentration prediction model is applied to the target-domain data to obtain estimated exhaust gas concentration data for the target domain and the exhaust gas concentration is divided into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard, specifically comprises:
S31: processing the exhaust-related data in the target domain and dividing it, in chronological order, into an observation data sequence at time intervals Δt; dividing the target-domain AQI observation data sequence into a plurality of sequences, denoted by Ht; inputting the target-domain data set into the trained exhaust gas concentration prediction model to obtain the corresponding estimated exhaust gas concentration Y′t of the target domain;
S32: because the exhaust gas concentration is defined as the highest concentration among the various pollutants under the given space-time condition, it is graded according to the relationship between concentration and the corresponding air pollution index; with the exhaust gas concentration in milligrams per cubic meter, five grades are defined: (1) 0-5, "excellent" (grade I); (2) 6-15, "good" (grade II); (3) 16-60, "light pollution" (grade III); (4) 61-100, "moderate pollution" (grade IV); (5) above 100, "severe pollution" (grade V); the exhaust gas concentration data of the source domain and the estimated exhaust gas concentration output for the target domain are graded by this classification for the probability calculation.
6. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 5, characterized in that step S40, in which the difference between the marginal and conditional probability distributions of the source-domain exhaust gas data and the target-domain estimated exhaust gas concentration data is adaptively minimized based on the data distribution and the difference between the two domains is eliminated, specifically comprises:
S41: performing probability calculation on the graded source-domain exhaust gas data and target-domain exhaust gas estimates, where P(S) and P(T) are the marginal probability distributions of the source domain and the target domain; let the probability distribution of the two-dimensional discrete random variable (X, Y) be P{X = xi, Y = yj} = pij, i, j = 1, 2, …; then the probability distributions of the random variables X and Y, P{X = xi} = pi·, i = 1, 2, …, and P{Y = yj} = p·j, j = 1, 2, …, are called the marginal probability distributions of (X, Y) with respect to X and Y, respectively;
let Q(S) and Q(T) be the conditional probability distributions of the source domain and the target domain:
S42: the maximum mean discrepancy (MMD) is used to measure the largest difference between their mean function values:
Distance(Ds,Dt)≈||P(xs)-P(xt)||
under conditional probability, the formula is expressed as:
wherein n1 and n2 respectively denote the numbers of samples in the source domain and the target domain;
applied to the target domain, whose exhaust gas concentrations are estimates, the MMD is expressed as:
to obtain a reasonable f(·), the kernel matrix K is introduced:
and an MMD matrix L, each element of which is calculated as follows:
at this point the distance is converted to tr(KL) − γ tr(K); a lower-dimensional matrix W is constructed such that:
the final optimization objective of TCA becomes:
s.t. W′KHKW = Im
where H is the centering matrix.
7. The data distribution self-adaptive cross-regional exhaust emission prediction method according to claim 2, characterized in that: the historical exhaust gas data of the source domain and the target domain and the external factor data related to exhaust gas concentration in step S11 include weather factors, road condition information and traffic flow density.
8. A data distribution self-adaptive cross-regional exhaust emission prediction system, characterized by comprising the following units:
a data acquisition and processing unit, configured to acquire, collate and preprocess exhaust gas data from stations in known regions and data related to exhaust emission to obtain source-domain data, and to acquire and preprocess the related data of a target region to obtain target-domain data;
a prediction model construction unit, configured to divide the source-domain exhaust gas concentration and related data into time series and to construct an exhaust gas concentration prediction model on the data set using a convolutional neural network;
a model training unit, configured to apply the exhaust gas concentration prediction model to the target-domain data to obtain estimated exhaust gas concentration data for the target domain, and to divide the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, configured to adaptively minimize, based on the data distribution, the difference between the marginal and conditional probability distributions of the source-domain exhaust gas data and the target-domain estimated exhaust gas concentration data, thereby eliminating the difference between the two domains;
and a prediction unit, configured to input the target-domain data again into the trained exhaust gas concentration prediction model to obtain the prediction result for the target-domain exhaust gas concentration.
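The preprocessing recited in step S12 of claim 2 (interpolation, box plot outlier identification, zero-mean normalization) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: linear interpolation, the 1.5 × IQR box plot fences, clipping outliers to those fences, and all function names are illustrative choices, since the claims do not specify how gaps are filled or how outliers are processed.

```python
import statistics


def interpolate(series):
    """Fill missing values (None) by linear interpolation between known neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: copy the first known value
            filled[i] = series[right]
        elif right is None:     # gap at the end: copy the last known value
            filled[i] = series[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled


def clip_outliers(values):
    """Box plot rule: clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed handling)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, low), high) for v in values]


def zscore(values):
    """Zero-mean normalization: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical hourly concentrations with one gap and one spike.
raw = [10.0, 11.0, None, 13.0, 14.0, 15.0, 16.0, 100.0]
clean = zscore(clip_outliers(interpolate(raw)))
print(clean)
```

The three functions are applied in the order given in step S12, so that the normalization statistics are computed only after gaps have been filled and outliers handled.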
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110481944.5A CN113222236A (en) | 2021-04-30 | 2021-04-30 | Data distribution self-adaptive cross-regional exhaust emission prediction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113222236A (en) | 2021-08-06 |
Family ID=77090666
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110481944.5A Pending CN113222236A (en) | 2021-04-30 | 2021-04-30 | Data distribution self-adaptive cross-regional exhaust emission prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113222236A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488316A (en) * | 2014-09-17 | 2016-04-13 | 日本电气株式会社 | Air quality prediction system and method |
CN108288109A (en) * | 2018-01-11 | 2018-07-17 | 安徽优思天成智能科技有限公司 | Motor-vehicle tail-gas concentration prediction method based on LSTM depth space-time residual error networks |
WO2018214060A1 (en) * | 2017-05-24 | 2018-11-29 | 北京质享科技有限公司 | Small-scale air quality index prediction method and system for city |
CN112132264A (en) * | 2020-09-11 | 2020-12-25 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Regional exhaust emission prediction method and system based on space-time residual perception network |
CN112598170A (en) * | 2020-12-18 | 2021-04-02 | 中国科学技术大学 | Vehicle exhaust emission prediction method and system based on multi-component fusion time network |
Non-Patent Citations (1)
Title |
---|
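Yuan Lizhen (袁立镇): "Research on Unsupervised Domain Adaptation Based on Generative Adversarial Networks" (基于生成对抗网络的无监督域适应研究), China Master's Theses Full-text Database, Engineering Science and Technology II (《中国优秀硕士学位论文全文数据库 (工程科技Ⅱ辑)》), no. 1, 15 January 2021 (2021-01-15), pages 2 * |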
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210806 |