CN113222236A - Data distribution self-adaptive cross-regional exhaust emission prediction method and system - Google Patents

Data distribution self-adaptive cross-regional exhaust emission prediction method and system

Info

Publication number
CN113222236A
Authority
CN
China
Prior art keywords
data
exhaust
concentration
target
exhaust gas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110481944.5A
Other languages
Chinese (zh)
Inventor
康宇
鲁晔
曹洋
许镇义
夏秀山
李兵兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center
Institute of Advanced Technology University of Science and Technology of China
Original Assignee
Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center
Institute of Advanced Technology University of Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center, Institute of Advanced Technology University of Science and Technology of China filed Critical Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center
Priority to CN202110481944.5A priority Critical patent/CN113222236A/en
Publication of CN113222236A publication Critical patent/CN113222236A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data distribution self-adaptive cross-regional exhaust emission prediction method and system. The method comprises the following steps: preprocessing the exhaust data of known regional sites and the data related to exhaust emission to obtain source domain data, and preprocessing the related data of the target area to obtain target domain data; constructing an exhaust gas concentration prediction model on the data set by using a convolutional neural network; feeding the target domain data to the model to obtain estimated exhaust concentrations for the target domain, and grading the exhaust concentration according to a set standard; based on the data distribution, adaptively minimizing the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, eliminating the difference between the two domains; and inputting the target domain data into the trained exhaust gas concentration prediction model again to obtain a prediction result. By reducing the difference between the source domain and the target domain as far as possible through data distribution adaptation, the method lets the trained model better fit a target domain in which exhaust data is difficult to obtain directly, so that the exhaust concentration of the target domain can be predicted.

Description

Data distribution self-adaptive cross-regional exhaust emission prediction method and system
Technical Field
The invention relates to the technical field of environmental monitoring, and in particular to a data distribution self-adaptive cross-regional exhaust emission prediction method and system.
Background
Research shows that the amount of exhaust emitted by motor vehicles in a given space-time region is closely related to the traffic flow intensity in that region, road condition information, weather conditions, the pollutant emissions of the surrounding area, and similar factors. Existing methods mainly predict the exhaust emission at a future moment from existing exhaust measurement data, which suits areas that are densely covered by exhaust monitoring sites able to provide exhaust concentrations. However, not all monitoring stations can measure exhaust concentration: road condition monitoring stations and Air Quality Index (AQI) monitoring stations, for example, cannot, so the exhaust concentration of an area served only by such stations is hard to obtain directly. It is therefore desirable to predict and analyze the exhaust concentration in such an area by using the traffic flow, air quality index and similar data, combined with the exhaust concentration data, of an easily monitored area.
Disclosure of Invention
The invention provides a data distribution self-adaptive cross-regional exhaust emission prediction method and system, which solve the technical problem that conventional methods cannot predict regional exhaust when the target region has no exhaust concentration data but does have other related data such as the AQI (air quality index).
In order to achieve the purpose, the invention adopts the following technical scheme:
A data distribution self-adaptive cross-regional exhaust emission prediction method comprises the following steps:
S10: acquiring, sorting and preprocessing the exhaust data of known regional sites and the data related to exhaust emission to obtain source domain data; acquiring and preprocessing the related data of a target area to obtain target domain data;
S20: dividing the source domain exhaust concentration and related data into time series, and constructing an exhaust concentration prediction model on the data set by using a convolutional neural network;
S30: feeding the target domain data to the exhaust concentration prediction model to obtain estimated exhaust concentrations for the target domain, and dividing the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
S40: based on the data distribution, adaptively minimizing the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, eliminating the difference between the two domains;
S50: inputting the target domain data into the trained exhaust concentration prediction model again to obtain the prediction result of the target domain exhaust concentration.
Further, step S10, which obtains the source domain data and the target domain data, specifically comprises:
S11: acquiring, from a specified official website, the historical exhaust data of the source region and the target region and the external factor data related to exhaust concentration, and recording the highest concentration among the various pollutants on the same road section in the same time period as the exhaust concentration of that road section at that time and place;
S12: interpolating the historical exhaust data of the source region and the target region, identifying outliers with the box-plot method and processing them, and applying zero-mean normalization to eliminate the influence of differing dimensions and value ranges between indicators, scaling the data proportionally so that it falls into a specific range.
Further, step S20, which divides the source domain exhaust concentration and related data into time series and constructs the exhaust concentration prediction model with a convolutional neural network, specifically comprises:
S21: dividing the historical source domain exhaust observations, in chronological order, into observation data sequences at time intervals Δt; according to the time-series length
Figure BDA0003048812380000022
the source domain observation data sequence is partitioned into
Figure BDA0003048812380000021
denoted by H_s;
S22: a convolutional neural network requires two-dimensional feature data, so the exhaust and related data of a given time slice are organized into a two-dimensional array with 5 rows, corresponding to 5 influence factors, and 10 columns, corresponding to the concentration of each main exhaust component and the total concentration, which yields thousands of regional exhaust distribution maps per day;
S23: randomly selecting 80% of the data set as the training set and training a convolutional neural network on it; the network uses a stack of Conv2D and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
$$\mathrm{loss} = \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$
In the above formula, n is the number of samples, y_i is the true value of the i-th sample, and y'_i is the estimate given by the convolutional neural network;
taking the remaining 20% of the data set as the validation set, monitoring the performance of the network on the validation set during training, and stopping training when the validation loss reaches its minimum, which yields the exhaust prediction model suited to the source domain data set.
Further, in step S23, the outputs of some neurons are randomly set to 0 (dropout) during training, i.e., some neurons are temporarily disabled.
Further, step S30, which obtains the target domain exhaust concentration estimates from the exhaust concentration prediction model and divides the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the set standard, specifically comprises:
S31: processing the exhaust-related data of the target domain and dividing it, in chronological order, into observation data sequences at time intervals Δt; according to the time-series length
Figure BDA0003048812380000032
the target domain AQI observation data sequence is partitioned into
Figure BDA0003048812380000033
denoted by H_t; the target domain data set is input into the trained exhaust concentration prediction model to obtain the corresponding target domain exhaust concentration estimate Y'_t;
S32: because the exhaust concentration is defined as the highest of the pollutant concentrations at a given time and place, it is graded according to the relationship between concentration and the corresponding air pollution index. With the concentration in milligrams per cubic meter, there are five grades: (1) 0-5 is "excellent" (grade I); (2) 6-15 is "good" (grade II); (3) 16-60 is "light pollution" (grade III); (4) 61-100 is "moderate pollution" (grade IV); (5) above 100 is "severe pollution" (grade V). The exhaust concentration data of the source data domain and the estimated exhaust concentrations output for the target data domain are graded in this way for the probability calculation.
Further, step S40, which adaptively minimizes the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates and eliminates the difference between the two domains, specifically comprises:
S41: probabilities are calculated for the graded source domain exhaust data and target domain exhaust estimates, where P(S) and P(T) are the marginal probability distributions of the source domain and the target domain. Let the probability distribution of the two-dimensional discrete random variable (X, Y) be P{X = x_i, Y = y_j} = p_ij, i, j = 1, 2, …; then the probability distributions of the random variables X and Y, P{X = x_i} = p_i·, i = 1, 2, …, and P{Y = y_j} = p_·j, j = 1, 2, …, are called the marginal probability distributions of (X, Y) with respect to X and Y, respectively;
let Q(S) and Q(T) be the conditional probability distributions of the source and target data domains:
Figure BDA0003048812380000041
S42: the distance between the two domains is measured with the maximum mean discrepancy (MMD), i.e., the largest difference between the mean function values of the two distributions:
Distance(D_s, D_t) ≈ ||P(x_s) - P(x_t)||
Under conditional probability, the formula is expressed as:
Figure BDA0003048812380000042
where n_1 and n_2 denote the numbers of samples in the source domain and the target domain, respectively;
applied to the target data domain, whose exhaust concentrations are estimates, the MMD is expressed as:
Figure BDA0003048812380000051
To obtain a suitable f(·), the kernel matrix K is introduced:
Figure BDA0003048812380000052
together with an MMD matrix L, whose elements are computed as follows:
Figure BDA0003048812380000053
The distance then becomes tr(KL) - γ tr(K). A lower-dimensional matrix W is constructed such that:
Figure BDA0003048812380000054
The final optimization objective of TCA (transfer component analysis) becomes:
Figure BDA0003048812380000055
s.t. W′KHKW = I_m
where H is the centering matrix.
Further, the external factor data related to the exhaust concentration that are acquired in step S11 together with the historical exhaust data of the source region and the target region include weather factors, road condition information and traffic flow density.
In another aspect, the invention further discloses a data distribution self-adaptive cross-regional exhaust emission prediction system, which comprises the following units:
a data acquisition and processing unit, used for acquiring, sorting and preprocessing the exhaust data of known regional sites and the data related to exhaust emission to obtain source domain data, and for acquiring and preprocessing the related data of a target area to obtain target domain data;
a prediction model construction unit, used for dividing the source domain exhaust concentration and related data into time series and constructing an exhaust concentration prediction model on the data set by using a convolutional neural network;
a model training unit, used for feeding the target domain data to the exhaust concentration prediction model to obtain estimated exhaust concentrations for the target domain, and for dividing the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, used for adaptively minimizing, based on the data distribution, the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, eliminating the difference between the two domains;
and a prediction unit, used for inputting the target domain data into the trained exhaust concentration prediction model again to obtain a prediction result of the target domain exhaust concentration.
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
According to the above technical scheme, the data distribution self-adaptive cross-regional exhaust emission prediction method, system and storage medium combine a domain-adaptation-based cross-domain extreme learning method with conventional methods. The system extracts data from monitoring sites in a known region where exhaust concentration, AQI and other data are available, trains an exhaust prediction model, and, using the cross-domain extreme learning machine and the domain adaptation method, transfers the model to a target region where only the AQI and other related data, but no exhaust data, can be obtained, so that the exhaust concentration of the target region can be predicted and analyzed.
The invention overcomes the shortcomings of existing methods: it trains the source domain exhaust prediction model on the monitoring data of known regional sites and reduces the difference between the source domain and the target domain as far as possible through data distribution adaptation, so that the trained model better fits a target domain in which exhaust data is not easy to obtain directly, and the exhaust concentration of the target domain can be predicted.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a system block diagram of the present invention;
FIG. 3 shows exhaust gas concentration prediction heat maps of the present invention for (a) the source domain and (b) the target domain.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
As shown in fig. 1, the data distribution adaptive cross-region exhaust emission prediction method according to the embodiment includes the following steps:
Step 1: acquiring, sorting and preprocessing the exhaust data of known regional sites and other external data directly or indirectly related to exhaust emission to obtain source domain data; and acquiring and preprocessing the related data of a target area (where exhaust data cannot be acquired directly because of environmental, technical or other constraints) to obtain target domain data.
Step 2: dividing the source domain exhaust concentration and the data related to it (such as the AQI) into time series, and constructing an exhaust concentration prediction model on the data set by using a convolutional neural network.
Step 3: feeding the target domain data to the exhaust concentration prediction model to obtain estimated exhaust concentrations for the target domain, and dividing the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the unified regulation of the national environmental protection agency.
Step 4: based on data distribution adaptation, minimizing the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, effectively eliminating the difference between the two domains, so that the learned exhaust concentration prediction model adapts well to the target domain data.
Step 5: inputting the target domain data (AQI and the like) into the trained exhaust concentration prediction model again to obtain a prediction result of the target domain exhaust concentration.
The following is a detailed description:
Step S1 above: acquiring, sorting and preprocessing the exhaust data of known regional sites and other external data directly or indirectly related to exhaust emission to obtain source domain data; and acquiring and preprocessing the related data of a target area (where exhaust data cannot be acquired directly because of environmental, technical or other constraints) to obtain target domain data. This specifically comprises the following sub-steps S11-S12:
S11: the historical exhaust data of the source region and the target region and the external factor data related to exhaust concentration, such as weather factors, road condition information and traffic flow intensity, are obtained from a government official website, and the highest concentration among the various pollutants on the same road section in the same time period is recorded as the exhaust concentration of that road section at that time and place.
S12: the historical exhaust data of the source region and the target region are interpolated, outliers are identified with the box-plot method and processed, and preprocessing operations such as zero-mean normalization are applied to eliminate the influence of differing dimensions and value ranges between indicators, scaling the data proportionally so that it falls into a specific range.
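As a purely illustrative sketch (not part of the disclosed method), the preprocessing of S12 might look as follows in Python with NumPy. The linear interpolation, the 1.5 × IQR whisker rule and the choice to clip outliers to the whisker limits are assumptions; the description does not specify the interpolation scheme or how outliers are handled. The function name `preprocess` and the sample readings are hypothetical.

```python
import numpy as np

def preprocess(series):
    """Interpolate gaps, clip box-plot outliers, then z-score normalize."""
    x = np.asarray(series, dtype=float).copy()
    idx = np.arange(x.size)
    missing = np.isnan(x)
    # Assumed scheme: linear interpolation over missing readings.
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    # Box-plot rule: values beyond 1.5 * IQR from the quartiles are outliers.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Assumed treatment: clip outliers to the whisker limits.
    x = np.clip(x, low, high)
    # Zero-mean normalization (z-score); guard against a constant series.
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

# Hypothetical hourly readings with one gap and one spike.
readings = [4.0, 5.0, np.nan, 6.0, 5.5, 90.0, 4.5, 5.0]
z = preprocess(readings)
```

After this step the series has zero mean and unit variance, so indicators with different units can be combined in one feature map.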
Step S2 above: dividing the source domain exhaust concentration and the data related to it (such as the AQI) into time series, and constructing an exhaust concentration prediction model on the data set by using a convolutional neural network.
This specifically comprises the following sub-steps S21-S23:
S21: the historical source domain exhaust observations are divided, in chronological order, into observation data sequences at time intervals Δt (where Δt is 1 hour). According to the time-series length
Figure BDA0003048812380000081
the source domain observation data sequence is partitioned into
Figure BDA0003048812380000082
denoted by H_s.
S22: a convolutional neural network requires two-dimensional feature data, so the exhaust and related data of a given time slice are organized into picture-like two-dimensional data with 5 rows (corresponding to 5 influence factors such as weather factors, road condition information and traffic flow density) and 10 columns (the concentration of each main exhaust component and the total concentration), which yields thousands of regional exhaust distribution maps per day.
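The partitioning in S21 and the 5 × 10 layout in S22 can be sketched as below. This is a minimal illustration under stated assumptions: the window length `seq_len` is hypothetical (the actual length is given only in a formula image), windows are taken as consecutive and non-overlapping, and the data are random placeholders rather than real measurements.

```python
import numpy as np

def split_windows(records, seq_len):
    """Cut a time-ordered array (T, ...) into consecutive windows of seq_len steps."""
    n = len(records) // seq_len
    return records[: n * seq_len].reshape(n, seq_len, *records.shape[1:])

# 48 hourly time slices, each a 5 x 10 map: 5 influence factors (rows) by
# 10 concentration columns (main components plus total). Placeholder values.
rng = np.random.default_rng(0)
hourly_maps = rng.random((48, 5, 10))

H_s = split_windows(hourly_maps, seq_len=6)   # seq_len is a hypothetical choice
cnn_input = hourly_maps.reshape(-1, 5, 10, 1)  # add a channel axis for a Conv2D-style layer
```

Each 5 × 10 map plays the role of a single-channel image, which is why a trailing channel axis is added before the data is passed to convolutional layers.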
S23: 80% of the data set is randomly selected as the training set, and a convolutional neural network is trained on it. The network uses a stack of Conv2D and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
$$\mathrm{loss} = \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$
In the above formula, n is the number of samples, y_i is the true value of the i-th sample, and y'_i is the estimate given by the convolutional neural network. The remaining 20% of the data set is used as the validation set: the performance of the network on the validation set is monitored during training, and training stops when the validation loss reaches its minimum, which yields the exhaust prediction model (AQI → exhaust concentration) suited to the source domain data set. During training, the outputs of some neurons are randomly set to 0 (dropout), i.e., some neurons are temporarily disabled, which reduces the number of parameters active in each training step and helps avoid overfitting.
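The training bookkeeping of S23 (random 80/20 split, MSE loss, stopping at the validation minimum) can be illustrated independently of any deep learning framework. The sketch below deliberately omits the network itself; the helper names and the `patience` parameter are hypothetical, and "stop when the validation loss reaches its minimum" is approximated by the common patience-based rule, which is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true values y_i and estimates y'_i."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def train_val_split(n_samples, train_frac=0.8, seed=0):
    """Randomly assign sample indices to a training and a validation set."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    cut = int(train_frac * n_samples)
    return perm[:cut], perm[cut:]

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch with the lowest validation loss seen before stopping."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

train_idx, val_idx = train_val_split(100)
example_loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
# Hypothetical validation-loss curve recorded once per epoch.
kept_epoch = early_stop_epoch([0.9, 0.5, 0.4, 0.45, 0.43, 0.41, 0.6])
```

In practice the weights saved at `kept_epoch` would be the ones retained as the source domain prediction model.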
Step S3 above: feeding the target domain data to the exhaust concentration prediction model to obtain estimated exhaust concentrations for the target domain, and dividing the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to the unified regulation of the national environmental protection agency. This specifically comprises the following sub-steps S31-S32:
S31: the exhaust-related data of the target domain (e.g., the AQI) are processed in the same way and divided, in chronological order, into observation data sequences at time intervals Δt (where Δt is 1 hour). According to the time-series length
Figure BDA0003048812380000092
the observation data sequences of the target domain (AQI and the like) are partitioned into
Figure BDA0003048812380000093
denoted by H_t. The target domain data set is input into the trained exhaust concentration prediction model to obtain the corresponding target domain exhaust concentration estimate Y'_t.
S32: because the exhaust concentration is defined as the highest of the pollutant concentrations at a given time and place, it can be graded according to the relationship between concentration and the corresponding Air Pollution Index (API), following the unified regulation of the national environmental protection agency. The exhaust concentration (milligrams per cubic meter) is divided into five grades: (1) 0-5 is "excellent" (grade I); (2) 6-15 is "good" (grade II); (3) 16-60 is "light pollution" (grade III); (4) 61-100 is "moderate pollution" (grade IV); (5) above 100 is "severe pollution" (grade V). The exhaust concentration data of the source data domain and the estimated exhaust concentrations output for the target data domain are graded in this way for the probability calculation.
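The five-grade division of S32 maps directly onto a binning operation. In the sketch below, the treatment of non-integer values that fall between the listed ranges (for example 5.5) is an assumption: each upper bound is taken as inclusive, so 5.5 falls into grade II. The sample concentrations are hypothetical.

```python
import numpy as np

GRADES = ["excellent", "good", "light pollution",
          "moderate pollution", "severe pollution"]
UPPER_BOUNDS = [5, 15, 60, 100]  # mg/m^3, inclusive upper limits of grades I-IV

def grade(conc):
    """Return the grade index (0 = grade I ... 4 = grade V) for each concentration."""
    return np.digitize(conc, UPPER_BOUNDS, right=True)

def grade_distribution(conc):
    """Empirical probability of each grade, as used in the probability calculation."""
    counts = np.bincount(grade(conc), minlength=len(GRADES))
    return counts / counts.sum()

source_conc = np.array([3.0, 12.0, 40.0, 75.0, 130.0, 8.0])  # hypothetical values
labels = [GRADES[g] for g in grade(source_conc)]
p_source = grade_distribution(source_conc)
```

Applying `grade_distribution` to both the source domain measurements and the target domain estimates gives two discrete distributions over the same five grades that can then be compared.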
Step S4 above: based on data distribution adaptation, minimizing the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, effectively eliminating the difference between the two domains, so that the learned exhaust concentration prediction model adapts well to the target domain data. This specifically comprises the following sub-steps S41-S42:
S41: probabilities are calculated for the graded source domain exhaust data and target domain exhaust estimates, where P(S) and P(T) are the marginal probability distributions of the source domain and the target domain. Let the probability distribution of the two-dimensional discrete random variable (X, Y) be P{X = x_i, Y = y_j} = p_ij, i, j = 1, 2, …; then the probability distributions of the random variables X and Y, P{X = x_i} = p_i·, i = 1, 2, …, and P{Y = y_j} = p_·j, j = 1, 2, …, are called the marginal probability distributions of (X, Y) with respect to X and Y, respectively.
Let Q(S) and Q(T) be the conditional probability distributions of the source and target data domains:
Figure BDA0003048812380000101
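For a discrete joint distribution, the marginal and conditional probabilities of S41 reduce to row and column sums of a table. The sketch below uses a hypothetical table of joint counts and assumes the textbook definition P{Y = y_j | X = x_i} = p_ij / p_i· for the conditional distribution; the exact form used in the formula image above is not reproduced here.

```python
import numpy as np

# Hypothetical joint counts: rows are values of X (e.g. an AQI band),
# columns are values of Y (e.g. an exhaust grade).
counts = np.array([[30, 10, 0],
                   [10, 25, 5],
                   [0, 5, 15]], dtype=float)

p_joint = counts / counts.sum()        # p_ij
p_x = p_joint.sum(axis=1)              # marginal of X: p_i.
p_y = p_joint.sum(axis=0)              # marginal of Y: p_.j
p_y_given_x = p_joint / p_x[:, None]   # P{Y = y_j | X = x_i}, one row per x_i
```

Each row of `p_y_given_x` sums to 1, which is a quick sanity check on the table.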
S42: to accomplish data distribution adaptive transfer learning, the distance between the source domain data and the target domain data must be made as small as possible, reducing the difference between the two domains. This distance is measured with the maximum mean discrepancy (MMD), i.e., the largest difference between the mean function values of the two distributions:
Distance(D_s, D_t) ≈ ||P(x_s) - P(x_t)||
Under conditional probability, the formula can be expressed as:
Figure BDA0003048812380000111
where n_1 and n_2 denote the numbers of samples in the source domain and the target domain, respectively.
Applied to the target data domain, whose exhaust concentrations are estimates, the MMD can be expressed as:
Figure BDA0003048812380000112
To obtain a suitable f(·), the kernel matrix K is introduced:
Figure BDA0003048812380000113
together with an MMD matrix L, whose elements are computed as follows:
Figure BDA0003048812380000114
The distance then becomes tr(KL) - γ tr(K). A lower-dimensional matrix W is constructed such that:
Figure BDA0003048812380000115
In summary, the final optimization objective of TCA (transfer component analysis) becomes:
Figure BDA0003048812380000116
s.t. W′KHKW = I_m
where H is the centering matrix.
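The following sketch shows how the quantities named above (kernel matrix K, MMD matrix L, centering matrix H, projection W) fit together in the commonly published form of transfer component analysis. It is an assumption that the formula images follow that standard form: the trade-off weight `mu`, the Gaussian kernel with bandwidth `width`, and the random placeholder features are all hypothetical. The sketch aligns only the marginal distributions; the conditional term is not implemented.

```python
import numpy as np

def rbf_kernel(X, width=1.0):
    """Gaussian kernel matrix; `width` is a hypothetical bandwidth parameter."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * width ** 2))

def mmd_matrix(n1, n2):
    """L[i, j] = 1/n1^2 (both source), 1/n2^2 (both target), -1/(n1*n2) otherwise."""
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    return np.outer(e, e)

def tca(Xs, Xt, m=2, mu=1.0, width=1.0):
    """Return m-dimensional embeddings of the source and target samples."""
    n1, n2 = len(Xs), len(Xt)
    n = n1 + n2
    K = rbf_kernel(np.vstack([Xs, Xt]), width)
    L = mmd_matrix(n1, n2)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    # W is formed from the m leading eigenvectors of (K L K + mu I)^{-1} K H K.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    top = np.argsort(-vals.real)[:m]
    W = vecs[:, top].real
    Z = K @ W
    return Z[:n1], Z[n1:], K, L

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(30, 4))  # placeholder source features
Xt = rng.normal(1.0, 1.0, size=(20, 4))  # placeholder target features, shifted
Zs, Zt, K, L = tca(Xs, Xt, m=2)
mmd_before = np.trace(K @ L)             # empirical squared MMD in kernel space
```

Here `np.trace(K @ L)` is the empirical squared MMD between the two samples, which is the tr(KL) term that the optimization drives down in the projected space.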
Step S5 above: inputting the target domain data (AQI and the like) into the trained exhaust concentration prediction model again to obtain a prediction result of the target domain exhaust concentration. Specifically, once the distance between the source domain and the target domain has been minimized, the two domains are close enough that the model trained on the source domain exhaust data set (AQI → exhaust concentration) is well suited to the target domain data set. The target domain data (data related to the exhaust concentration, such as the AQI) are input into the exhaust concentration prediction model constructed with the convolutional neural network in step S2 to obtain the predicted exhaust concentration of the target domain; the test results are shown in FIG. 3.
In summary, the source domain exhaust prediction model is trained on the monitoring data of known regional sites, and the difference between the source domain and the target domain is reduced as far as possible through data distribution adaptation, so that the trained model better fits a target domain in which exhaust data is not easy to obtain directly, and the exhaust concentration of the target domain can be predicted.
In another aspect, the invention further discloses a data distribution self-adaptive cross-regional exhaust emission prediction system, which comprises the following units:
a data acquisition and processing unit, used for acquiring, sorting and preprocessing the exhaust data of known regional sites and the data related to exhaust emission to obtain source domain data, and for acquiring and preprocessing the related data of a target area to obtain target domain data;
a prediction model construction unit, used for dividing the source domain exhaust concentration and related data into time series and constructing an exhaust concentration prediction model on the data set by using a convolutional neural network;
a model training unit, used for feeding the target domain data to the exhaust concentration prediction model to obtain estimated exhaust concentrations for the target domain, and for dividing the exhaust concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, used for adaptively minimizing, based on the data distribution, the difference between the marginal and conditional probability distributions of the source domain exhaust data and the target domain exhaust concentration estimates, eliminating the difference between the two domains;
and a prediction unit, used for inputting the target domain data into the trained exhaust concentration prediction model again to obtain a prediction result of the target domain exhaust concentration.
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A data distribution self-adaptive cross-regional exhaust emission prediction method, characterized by comprising the following steps:
S10, acquiring, sorting and preprocessing the exhaust gas data of monitoring stations in known regions and the data related to exhaust emission to obtain source domain data; acquiring and preprocessing the relevant data of a target region to obtain target domain data;
S20, dividing the source domain exhaust gas concentration and related data into time series, and constructing an exhaust gas concentration prediction model on the data set using a convolutional neural network;
S30, running the target domain data through the exhaust gas concentration prediction model to obtain estimated values of the target domain exhaust gas concentration, and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
S40, based on data distribution adaptation, minimizing the difference between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, thereby eliminating the difference between the two domains;
and S50, inputting the target domain data into the trained exhaust gas concentration prediction model again to obtain the prediction result of the target domain exhaust gas concentration.
2. The data distribution adaptive cross-regional exhaust emission prediction method according to claim 1, characterized in that: step S10, acquiring, sorting and preprocessing the exhaust gas data of monitoring stations in known regions and the data related to exhaust emission to obtain source domain data, and acquiring and preprocessing the relevant data of a target region to obtain target domain data, comprises the following steps:
S11: acquiring, from a specified official website, the historical exhaust gas data of the source region and the target region and the external factor data related to exhaust gas concentration, and recording the highest concentration among the various pollutants on the same road section in the same time period as the exhaust gas concentration of that road section at that time and place;
S12: interpolating the historical exhaust gas data of the source region and the target region, identifying abnormal values with the box plot method and processing them, and performing zero-mean normalization to eliminate the influence of differences in dimension and value range between indexes, scaling the data proportionally so that they fall into a specific range.
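The preprocessing of step S12 can be illustrated with a minimal NumPy sketch. The claim does not say how abnormal values are processed, so the sketch assumes they are treated as missing and re-interpolated; the 1.5 × IQR box-plot rule and all function names are likewise illustrative assumptions.

```python
import numpy as np


def interpolate_missing(x):
    """Fill NaN entries of a 1-D series by linear interpolation over the index."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x


def boxplot_outlier_mask(x, k=1.5):
    """Box-plot rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zero_mean_normalize(x):
    """Zero-mean, unit-variance scaling (z-score)."""
    std = x.std()
    return (x - x.mean()) / (std if std > 0 else 1.0)


def preprocess(x):
    x = interpolate_missing(x)
    outliers = boxplot_outlier_mask(x)
    x[outliers] = np.nan              # assumption: outliers are treated as missing
    x = interpolate_missing(x)
    return zero_mean_normalize(x)
```

For example, `preprocess([1, 2, float("nan"), 4, 100, 6, 7, 8])` fills the gap, replaces the spike at 100 by interpolation, and returns a series with zero mean and unit variance.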
3. The data distribution adaptive cross-regional exhaust emission prediction method according to claim 2, characterized in that: step S20, dividing the source domain exhaust gas concentration and related data into time series and constructing an exhaust gas concentration prediction model on the data set using a convolutional neural network, comprises:
S21: dividing the historical exhaust gas observation data of the source domain, in time order, into observation data sequences at time intervals Δt; dividing the source domain observation data sequence, according to the time sequence length l, into a plurality of source domain observation data sequences
Figure FDA0003048812370000021
denoted by H_s;
S22: a convolutional neural network requires two-dimensional feature data, so the exhaust gas and related data in a given time section are organized into two-dimensional data in which 5 rows correspond to the 5 influence factors and 10 columns correspond to the concentration of each main exhaust gas component and the total concentration, so that thousands of groups of regional exhaust gas distribution maps are obtained each day;
S23: randomly selecting 80% of the data set as a training set and training the model on it with a convolutional neural network; the neural network uses a stack of Conv2D layers and MaxPooling2D layers, with the mean squared error (MSE) as the loss function:
$\mathrm{loss} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$
in the above formula, $y_i$ is the true value of the i-th sample, $y'_i$ is the estimate given by the convolutional neural network, and n is the number of samples;
and taking the remaining 20% of the data set as a validation set, monitoring the performance of the convolutional neural network on the validation set during training, and stopping training when the loss on the validation set reaches its minimum, to obtain the exhaust gas prediction model suited to the source domain data set.
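The data handling in steps S21 and S23 can be sketched without any deep learning framework. The sketch below assumes non-overlapping windows of length l, a seeded random 80/20 split, and a patience-based rule for deciding when the validation loss has reached its minimum; these choices, and all function names, are illustrative assumptions, and the CNN itself is not reproduced.

```python
import numpy as np


def split_into_sequences(series, l):
    """Cut a series into consecutive, non-overlapping windows of length l."""
    series = np.asarray(series)
    n = len(series) // l
    return series[: n * l].reshape(n, l, *series.shape[1:])


def train_val_split(n_samples, train_frac=0.8, seed=0):
    """Random index split into a training set and a validation set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    cut = int(round(train_frac * n_samples))
    return perm[:cut], perm[cut:]


def mse(y_true, y_pred):
    """Mean squared error between true values and model estimates."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def early_stopping_epoch(val_losses, patience=3):
    """Epoch with the lowest validation loss, stopping after `patience`
    consecutive epochs without improvement."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

A patience window is used because the true minimum of the validation loss is only recognizable in hindsight; training frameworks typically restore the weights saved at the returned epoch.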
4. The data distribution adaptive cross-regional exhaust emission prediction method of claim 3, characterized in that: in step S23, the weights of some neurons are randomly set to 0 (dropout) during training, i.e., some neurons are disabled.
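As an illustration of the dropout mentioned in this claim, the following sketch shows the widely used "inverted dropout" formulation, in which the outputs of randomly chosen neurons are zeroed during training and the remaining outputs are rescaled. Whether the patent's model uses exactly this variant is not stated; the function name and the `rate` parameter are assumptions.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` during
    training and rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)
```

At inference time (`training=False`) the activations pass through unchanged, which is why the rescaling is applied during training.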
5. The data distribution adaptive cross-regional exhaust emission prediction method of claim 3, characterized in that: step S30, running the target domain data through the exhaust gas concentration prediction model to obtain estimated values of the target domain exhaust gas concentration and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard, specifically comprises:
S31: processing the exhaust-related data of the target domain and dividing them, in time order, into observation data sequences at time intervals Δt; dividing the target domain AQI observation data sequence into
Figure FDA0003048812370000031
denoted by H_t; the target domain data set is input into the trained exhaust gas concentration prediction model to obtain the corresponding estimated exhaust gas concentration Y′_t of the target domain;
S32: because the exhaust gas concentration is defined as the highest concentration among the various pollutants at the given time and place, it is graded according to the relationship between concentration and the corresponding air pollution index; with the exhaust gas concentration expressed in milligrams per cubic meter, five grades are defined: (1) 0-5 is "excellent" (grade I); (2) 6-15 is "good" (grade II); (3) 16-60 is "light pollution" (grade III); (4) 61-100 is "moderate pollution" (grade IV); (5) above 100 is "severe pollution" (grade V). The exhaust gas concentration data of the source domain and the estimated exhaust gas concentration output for the target domain are graded in this way for the probability calculation.
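The grading in step S32 and the resulting empirical grade distribution can be sketched as follows. The claim lists integer ranges only, so the sketch assumes that each upper bound is inclusive and that non-integer values between two ranges (for example 5.5) fall into the higher grade; the names below are illustrative.

```python
from bisect import bisect_left

# Inclusive upper bounds (mg/m^3) of grades I to IV; anything above 100 is grade V.
GRADE_UPPER_BOUNDS = [5, 15, 60, 100]
GRADE_LABELS = [
    "excellent",           # grade I
    "good",                # grade II
    "light pollution",     # grade III
    "moderate pollution",  # grade IV
    "severe pollution",    # grade V
]


def grade_index(concentration):
    """Return 0..4 for grades I..V of an exhaust gas concentration."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    return bisect_left(GRADE_UPPER_BOUNDS, concentration)


def grade_distribution(concentrations):
    """Empirical probability of each grade in a list of concentrations."""
    if not concentrations:
        raise ValueError("at least one concentration is required")
    counts = [0] * len(GRADE_LABELS)
    for c in concentrations:
        counts[grade_index(c)] += 1
    return [count / len(concentrations) for count in counts]
```

Applying `grade_distribution` to the source domain measurements and to the target domain estimates gives the two discrete distributions that the probability calculation compares.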
6. The data distribution adaptive cross-regional exhaust emission prediction method of claim 5, characterized in that: step S40, minimizing, based on data distribution adaptation, the difference between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data and eliminating the difference between the two domains, specifically comprises:
S41: performing probability calculation on the graded source domain exhaust gas data and target domain estimated exhaust gas data, where P(S) and P(T) are the marginal probability distributions of the source domain and the target domain; let the probability distribution of the two-dimensional discrete random variable (X, Y) be $P\{X = x_i, Y = y_j\} = p_{ij}$, $i, j = 1, 2, \ldots$; then the probability distributions of the random variables X and Y, $P\{X = x_i\} = p_{i\cdot}$, $i = 1, 2, \ldots$, and $P\{Y = y_j\} = p_{\cdot j}$, $j = 1, 2, \ldots$, are called the marginal probability distributions of (X, Y) with respect to X and Y, respectively;
let Q(S) and Q(T) be the conditional probability distributions of the source domain and the target domain:
Figure FDA0003048812370000032
S42: the maximum mean discrepancy (MMD) is used to measure the maximum difference between their mean function values:
Distance(D_s, D_t) ≈ ||P(x_s) − P(x_t)||
under conditional probability, the formula is expressed as:
Figure FDA0003048812370000033
where n1 and n2 denote the numbers of samples in the source domain and the target domain, respectively;
applied to the target domain data, which are estimated values, the MMD is expressed as:
Figure FDA0003048812370000041
to obtain a reasonable f(·), the kernel matrix K is introduced:
Figure FDA0003048812370000042
and an MMD matrix L, each element of which is calculated as follows:
Figure FDA0003048812370000043
at this point the distance is converted to tr(KL) − γ tr(K), and a lower-dimensional matrix W is constructed such that:
Figure FDA0003048812370000044
the final TCA optimization objective becomes:
Figure FDA0003048812370000045
s.t. W′KHKW = I_m
where H is the centering matrix.
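The distance measure of step S42 can be illustrated numerically. The sketch below computes the (biased) empirical squared MMD with an RBF kernel, shows that it equals tr(KL) when L is built with entries 1/n1², 1/n2² and −1/(n1·n2), and adds a per-grade variant as one possible reading of the conditional term. The kernel choice, the form of L, the per-grade sum, and all names are assumptions, since the corresponding formulas are not reproduced in this text.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(Xs, Xt, bandwidth=1.0):
    """Biased empirical squared MMD between source and target samples."""
    return float(
        rbf_kernel(Xs, Xs, bandwidth).mean()
        + rbf_kernel(Xt, Xt, bandwidth).mean()
        - 2.0 * rbf_kernel(Xs, Xt, bandwidth).mean()
    )


def mmd2_trace(Xs, Xt, bandwidth=1.0):
    """The same quantity written as tr(KL) with the assumed MMD matrix L."""
    n1, n2 = len(Xs), len(Xt)
    K = rbf_kernel(np.vstack([Xs, Xt]), np.vstack([Xs, Xt]), bandwidth)
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    L = np.outer(e, e)
    return float(np.trace(K @ L))


def conditional_mmd2(Xs, ys, Xt, yt_est, bandwidth=1.0):
    """Sum of per-grade squared MMDs over grades present in both domains;
    yt_est holds the grades of the *estimated* target concentrations."""
    total = 0.0
    for g in np.intersect1d(ys, yt_est):
        total += mmd2(Xs[ys == g], Xt[yt_est == g], bandwidth)
    return total
```

Here `ys` and `yt_est` would be the grade indices (I to V) of the source measurements and of the target estimates, so the conditional term compares the two domains grade by grade.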
7. The data distribution adaptive cross-regional exhaust emission prediction method according to claim 2, characterized in that: the historical exhaust gas data of the source region and the target region and the external factor data related to the exhaust gas concentration in step S11 include weather factors, road condition information, and traffic flow density.
8. A data distribution self-adaptive cross-regional exhaust emission prediction system, characterized by comprising the following units:
a data acquisition and processing unit, used for acquiring, sorting and preprocessing the exhaust gas data of monitoring stations in known regions and the data related to exhaust emission to obtain source domain data, and for acquiring and preprocessing the relevant data of a target region to obtain target domain data;
a prediction model construction unit, used for dividing the source domain exhaust gas concentration and related data into time series and constructing an exhaust gas concentration prediction model on the data set using a convolutional neural network;
a model training unit, used for running the target domain data through the exhaust gas concentration prediction model to obtain estimated values of the target domain exhaust gas concentration, and dividing the exhaust gas concentration into five grades (excellent, good, light pollution, moderate pollution and severe pollution) according to a set standard;
a data difference processing unit, used for minimizing, based on data distribution adaptation, the difference between the marginal and conditional probability distributions of the source domain exhaust gas data and the target domain estimated exhaust gas concentration data, thereby eliminating the difference between the two domains;
and a prediction unit, used for inputting the target domain data into the trained exhaust gas concentration prediction model again to obtain the prediction result of the target domain exhaust gas concentration.
CN202110481944.5A 2021-04-30 2021-04-30 Data distribution self-adaptive cross-regional exhaust emission prediction method and system Pending CN113222236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110481944.5A CN113222236A (en) 2021-04-30 2021-04-30 Data distribution self-adaptive cross-regional exhaust emission prediction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110481944.5A CN113222236A (en) 2021-04-30 2021-04-30 Data distribution self-adaptive cross-regional exhaust emission prediction method and system

Publications (1)

Publication Number Publication Date
CN113222236A true CN113222236A (en) 2021-08-06

Family

ID=77090666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110481944.5A Pending CN113222236A (en) 2021-04-30 2021-04-30 Data distribution self-adaptive cross-regional exhaust emission prediction method and system

Country Status (1)

Country Link
CN (1) CN113222236A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488316A (en) * 2014-09-17 2016-04-13 日本电气株式会社 Air quality prediction system and method
CN108288109A (en) * 2018-01-11 2018-07-17 安徽优思天成智能科技有限公司 Motor-vehicle tail-gas concentration prediction method based on LSTM depth space-time residual error networks
WO2018214060A1 (en) * 2017-05-24 2018-11-29 北京质享科技有限公司 Small-scale air quality index prediction method and system for city
CN112132264A (en) * 2020-09-11 2020-12-25 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Regional exhaust emission prediction method and system based on space-time residual perception network
CN112598170A (en) * 2020-12-18 2021-04-02 中国科学技术大学 Vehicle exhaust emission prediction method and system based on multi-component fusion time network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
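袁立镇 (Yuan Lizhen): "Research on Unsupervised Domain Adaptation Based on Generative Adversarial Networks", China Master's Theses Full-text Database (Engineering Science and Technology II), no. 1, 15 January 2021 (2021-01-15), pages 2 *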

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210806