CN117933466B - Pollution prediction method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN117933466B (application CN202410095360.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- pollution
- weather
- predicted
- time range
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention provides a pollution prediction method and device, a storage medium, and electronic equipment. The pollution prediction method comprises the following steps: judging, based on a meteorological data vector to be predicted and K groups of meteorological data vectors, whether the K groups contain a matched meteorological data vector group that matches the meteorological data vector to be predicted; if the matched meteorological data vector group exists, determining a matched historical pollution process that matches the meteorological data vector to be predicted, and determining, according to the duration of the matched historical pollution process, target meteorological data to be predicted corresponding to the initial meteorological data to be predicted; and calling a pollution process concentration prediction model, performing pollution prediction based on the target meteorological data to be predicted to obtain an initial pollution prediction result, and determining a target pollution prediction result based on the historical pollution data of the matched historical pollution process and the initial pollution prediction result. Embodiments of the invention allow pollution prediction to be carried out conveniently and improve prediction accuracy.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a pollution prediction method, a pollution prediction device, a storage medium, and an electronic device.
Background
At present, air pollution prediction is widely applied. It is an important prerequisite for air pollution prevention and control, because it is used to judge accurately the time, severity, and other characteristics of future air pollution. Current air pollution prediction methods fall mainly into numerical models and statistical models. A numerical model simulates atmospheric dynamic processes, atmospheric chemical mechanisms, and the like from input pollution emission data and meteorological data. However, both kinds of input data carry some uncertainty, so a numerical model shows some deviation when predicting future atmospheric pollutant concentrations, and it also places high demands on computer hardware and on the professional skill of the user. Compared with a numerical model, a statistical model is considerably more accurate and easier to use. However, atmospheric pollutant concentration data generally follow a skewed distribution, that is, high pollutant concentrations occur with low probability, so a statistical model is less accurate when it is trained on and used to predict high concentration values. Consequently, there is currently no good solution for carrying out pollution prediction conveniently while improving prediction accuracy.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a pollution prediction method and device, a storage medium, and an electronic device, to address the high requirements or low accuracy of the related art; that is, embodiments of the invention allow pollution prediction to be carried out conveniently and improve prediction accuracy.
According to an aspect of the present invention, there is provided a pollution prediction method, the method comprising:
obtaining K groups of meteorological data vectors, wherein each meteorological data vector is determined based on the historical pollution data of one historical pollution process, and K is a positive integer;
acquiring initial meteorological data to be predicted, and determining a meteorological data vector to be predicted corresponding to the initial meteorological data to be predicted, wherein the initial meteorological data to be predicted comprises meteorological data of a target area within a first time range, and the meteorological data vector to be predicted is determined in the same manner as each meteorological data vector in the K groups of meteorological data vectors;
judging, based on the meteorological data vector to be predicted and the K groups of meteorological data vectors, whether the K groups contain a matched meteorological data vector group that matches the meteorological data vector to be predicted;
if the matched meteorological data vector group exists in the K groups of meteorological data vectors, determining, based on the matched meteorological data vector group, a matched historical pollution process that matches the meteorological data vector to be predicted, and determining, according to the duration of the matched historical pollution process, target meteorological data to be predicted corresponding to the initial meteorological data to be predicted, wherein the target meteorological data to be predicted comprises meteorological data of the target area within a second time range, and the duration of the second time range is the duration of the matched historical pollution process;
and calling a pollution process concentration prediction model, performing pollution prediction based on the target meteorological data to be predicted to obtain an initial pollution prediction result, and determining a target pollution prediction result based on the historical pollution data of the matched historical pollution process and the initial pollution prediction result.
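The matching and time-range steps above can be sketched as follows. This is only an illustration under stated assumptions: this section does not say how a match is judged or which historical process inside a matched group supplies the duration, so the Euclidean distance to a per-group centroid, the `threshold` parameter, and the per-group `durations_hours` table are all hypothetical.

```python
import math
from datetime import datetime, timedelta

def find_matching_group(query, centroids, threshold):
    """Return the index of the closest group, or None if no group is close enough.

    Assumption: each of the K groups is summarized by a centroid, and a match
    means the Euclidean distance is at most `threshold`.
    """
    best = min(range(len(centroids)), key=lambda j: math.dist(query, centroids[j]))
    if math.dist(query, centroids[best]) <= threshold:
        return best
    return None

def second_time_range(start, duration_hours):
    """Build a time range whose length equals the matched process duration."""
    return start, start + timedelta(hours=duration_hours)

# Hypothetical centroids of K = 2 groups and a hypothetical duration per group.
centroids = [[0.0, 0.0], [5.0, 5.0]]
durations_hours = {0: 24, 1: 36}

matched = find_matching_group([4.8, 5.1], centroids, threshold=1.0)
unmatched = find_matching_group([2.5, 2.5], centroids, threshold=1.0)

# Only when a match exists is the time range re-cut to the matched duration.
window = None
if matched is not None:
    window = second_time_range(datetime(2024, 1, 1, 0, 0), durations_hours[matched])
```

The concentration prediction model and the final correction step are left out, because this section does not describe their form.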
According to another aspect of the present invention, there is provided a pollution prediction device, the device comprising:
an acquisition unit, configured to obtain K groups of meteorological data vectors, wherein each meteorological data vector is determined based on the historical pollution data of one historical pollution process, and K is a positive integer, and to acquire initial meteorological data to be predicted;
a processing unit, configured to determine a meteorological data vector to be predicted corresponding to the initial meteorological data to be predicted, wherein the initial meteorological data to be predicted comprises meteorological data of a target area within a first time range, and the meteorological data vector to be predicted is determined in the same manner as each meteorological data vector in the K groups of meteorological data vectors;
the processing unit is further configured to judge, based on the meteorological data vector to be predicted and the K groups of meteorological data vectors, whether the K groups contain a matched meteorological data vector group that matches the meteorological data vector to be predicted;
the processing unit is further configured to, if the matched meteorological data vector group exists in the K groups of meteorological data vectors, determine, based on the matched meteorological data vector group, a matched historical pollution process that matches the meteorological data vector to be predicted, and determine, according to the duration of the matched historical pollution process, target meteorological data to be predicted corresponding to the initial meteorological data to be predicted, wherein the target meteorological data to be predicted comprises meteorological data of the target area within a second time range, and the duration of the second time range is the duration of the matched historical pollution process;
the processing unit is further configured to call a pollution process concentration prediction model, perform pollution prediction based on the target meteorological data to be predicted to obtain an initial pollution prediction result, and determine a target pollution prediction result based on the historical pollution data of the matched historical pollution process and the initial pollution prediction result.
According to another aspect of the invention there is provided an electronic device comprising a processor, and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the above mentioned method.
According to another aspect of the present invention there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above mentioned method.
According to the embodiment of the invention, K groups of meteorological data vectors and initial meteorological data to be predicted are obtained, and the meteorological data vector to be predicted corresponding to the initial meteorological data to be predicted is determined. Each meteorological data vector is determined based on the historical pollution data of one historical pollution process, the initial meteorological data to be predicted includes meteorological data of a target area within a first time range, and the meteorological data vector to be predicted is determined in the same manner as each meteorological data vector in the K groups. Based on the meteorological data vector to be predicted and the K groups of meteorological data vectors, it is then judged whether the K groups contain a matched meteorological data vector group that matches the meteorological data vector to be predicted. If they do, a matched historical pollution process that matches the meteorological data vector to be predicted can be determined based on the matched meteorological data vector group, and target meteorological data to be predicted corresponding to the initial meteorological data to be predicted is determined according to the duration of the matched historical pollution process; the target meteorological data to be predicted includes meteorological data of the target area within a second time range, and the duration of the second time range is the duration of the matched historical pollution process. Further, a pollution process concentration prediction model can be called to perform pollution prediction based on the target meteorological data to be predicted, giving an initial pollution prediction result, and a target pollution prediction result is determined based on the historical pollution data of the matched historical pollution process and the initial pollution prediction result.
Therefore, the embodiment of the invention can carry out pollution prediction conveniently, without simulation through a numerical model, which improves prediction efficiency. In addition, the embodiment of the invention can correct the time range of the meteorological data to be predicted using the duration of the matched historical pollution process, and can correct the pollution prediction result using the historical pollution data of the matched historical pollution process, which effectively improves prediction accuracy.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
FIG. 1 illustrates a flow diagram of a pollution prediction method according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a schematic diagram of a growing inflection point according to an exemplary embodiment of the present invention;
FIG. 3 shows a schematic diagram of a contamination process according to an exemplary embodiment of the present invention;
FIG. 4 illustrates a flow diagram of another pollution prediction method according to an exemplary embodiment of the present invention;
FIG. 5 illustrates a flow diagram of yet another pollution prediction method according to an exemplary embodiment of the present invention;
FIG. 6 shows a schematic block diagram of a pollution prediction device according to an exemplary embodiment of the present invention;
FIG. 7 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the invention. It should be understood that the drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It should be noted that the pollution prediction method provided by the embodiment of the present invention may be executed by one or more electronic devices, which is not limited in the present invention. An electronic device may be a terminal (that is, a client) or a server; when the method is executed by a plurality of electronic devices that include at least one terminal and at least one server, the method may be executed jointly by the terminal and the server. The terminals referred to herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, smart watches, smart voice interaction devices, smart appliances, vehicle-mounted terminals, aircraft, and so on. The server referred to herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
Based on the above description, an embodiment of the present invention proposes a pollution prediction method that may be executed by the above-mentioned electronic device (a terminal or a server), or jointly by a terminal and a server. For convenience of explanation, the following description takes execution of the pollution prediction method by the electronic device as an example. As shown in FIG. 1, the pollution prediction method may include the following steps S101 to S105:
S101, obtaining K groups of meteorological data vectors, wherein each meteorological data vector is determined based on the historical pollution data of one historical pollution process, and K is a positive integer.
A historical pollution process refers to a pollution process that occurred within a corresponding historical time range. A pollution process may include a growth time range (also called a growth period) and a maintenance time range (also called a maintenance period); that is, a pollution process may be divided into a growth period and a maintenance period. The pollution data of a pollution process includes the data of that process within the growth time range and the data within the maintenance time range, and the data within a time range may include, but is not limited to, the meteorological data and pollutant concentration data in each of a plurality of monitoring periods. Optionally, the growth time range and the maintenance time range of a pollution process may or may not overlap, which is not limited in the embodiment of the present invention. Optionally, the duration of a monitoring period may be one hour, 30 minutes, or the like, which is not limited in the embodiment of the present invention. Optionally, a monitoring period may include at least one monitoring time, in which case the data of a monitoring point in a monitoring period may be determined from the monitoring data of that monitoring point at each monitoring time within the period, for example by averaging the monitoring data or by statistical analysis (such as selecting the maximum value).
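As a minimal sketch of the last point, the readings taken at the monitoring times of one monitoring period can be collapsed into a single value by averaging or by taking the maximum. The function name `aggregate_period` and the sample readings are hypothetical and are not taken from the patent.

```python
from statistics import mean

def aggregate_period(readings, how="mean"):
    """Collapse the readings at each monitoring time of one period into one value."""
    if not readings:
        raise ValueError("a monitoring period needs at least one reading")
    if how == "mean":
        return mean(readings)
    if how == "max":
        return max(readings)
    raise ValueError(f"unknown aggregation: {how}")

# Hypothetical pollutant concentration readings at four monitoring times
# within one hourly monitoring period.
readings = [40.0, 50.0, 60.0, 70.0]
hourly_mean = aggregate_period(readings, "mean")
hourly_max = aggregate_period(readings, "max")
```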
In the embodiment of the present invention, the acquisition modes of the K sets of meteorological data vectors may include, but are not limited to, the following:
The first acquisition mode is as follows: the K groups of meteorological data vectors may be determined based on historical pollution process data. In this case, the electronic device may acquire historical pollution process data, which may include the historical pollution data of each of a plurality of historical pollution processes, and determine the meteorological data vector of each historical pollution process based on the meteorological data within the growth time range in the corresponding historical pollution data, obtaining N meteorological data vectors, where N is the number of historical pollution processes in the plurality of historical pollution processes. Unsupervised learning classification processing (that is, cluster analysis) can then be performed on the N meteorological data vectors to obtain the K groups of meteorological data vectors.
A meteorological data vector may be determined in a target determination manner, which indicates a meteorological element determination manner for each of at least one meteorological element. A meteorological element determination manner is any one of the following: a mean value determination manner, a maximum value determination manner, and a highest frequency determination manner. On this basis, when determining the meteorological data vector of each historical pollution process from the meteorological data within the growth time range in each item of historical pollution data, the electronic device may, for any one of the plurality of historical pollution processes and any one of the at least one meteorological element, determine P meteorological element results of that element from the meteorological data within the growth time range included in the historical pollution data of that process. Each of the P meteorological element results is determined from the results of that element within the corresponding monitoring period, and P is the number of monitoring periods in the growth time range of that historical pollution process.
Correspondingly, if the determination manner of the meteorological element is the mean value determination manner, the mean of the P meteorological element results is taken as the result of that element in the meteorological data vector of the historical pollution process. If the determination manner is the maximum value determination manner (an extreme-value manner), the maximum value or the minimum value among the P meteorological element results is taken as the result of that element in the meteorological data vector. If the determination manner is the highest frequency determination manner, the P meteorological element results are counted to obtain the frequency of each of at least one statistical meteorological element result, and the statistical meteorological element result with the largest frequency is taken as the result of that element in the meteorological data vector.
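The three determination manners can be sketched as follows. The element names, the sample values, and the manner labels `"mean"`, `"max"`, `"min"`, and `"mode"` are illustrative assumptions; the patent names the manners but does not prescribe an encoding.

```python
from statistics import mean, mode

def summarize(results, manner):
    """Reduce the P results of one meteorological element to a single value."""
    if manner == "mean":
        return mean(results)
    if manner == "max":
        return max(results)
    if manner == "min":
        return min(results)
    if manner == "mode":
        # statistics.mode returns the first-seen value when several are tied.
        return mode(results)
    raise ValueError(f"unknown determination manner: {manner}")

def build_vector(growth_period_results, manners):
    """Build one meteorological data vector, one entry per element in `manners`."""
    return [summarize(growth_period_results[name], m) for name, m in manners.items()]

# Hypothetical results over P = 3 monitoring periods of a growth time range.
growth_period_results = {
    "temperature": [2.0, 4.0, 6.0],
    "relative_humidity": [80, 90, 85],
    "wind_speed": [1.5, 0.5, 1.0],
    "wind_direction": ["NE", "NE", "E"],
}
manners = {
    "temperature": "mean",
    "relative_humidity": "max",
    "wind_speed": "min",
    "wind_direction": "mode",
}
vector = build_vector(growth_period_results, manners)
```

A discrete entry such as wind direction would still need a numeric encoding before any distance-based clustering; that encoding is not specified in this section.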
Optionally, the meteorological data in a time range may include the meteorological data in each of a plurality of monitoring periods, and the meteorological data in a monitoring period may include the meteorological data at each monitoring point of the corresponding area in that period; accordingly, the results of a meteorological element within a monitoring period may include its results at each monitoring point within that period. Optionally, for any monitoring period in the growth time range of any historical pollution process, the electronic device may select one result (for example at random, or by taking the maximum value) from the results of the meteorological element in that monitoring period and add the selected result to the P meteorological element results; alternatively, it may compute a weighted sum of the results of the meteorological element in that monitoring period and add the weighted sum to the P meteorological element results. In this way the P meteorological element results of the element are determined. Optionally, each weight value may be set empirically or according to actual requirements, which is not limited in the embodiment of the present invention.
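A minimal sketch of combining the per-point results of one monitoring period, assuming three monitoring points and hypothetical weights. The equal-weight default is an assumption of this sketch, since the patent leaves the weights to experience or actual requirements.

```python
import random

def combine_points(point_results, how="weighted", weights=None, rng=random):
    """Combine the results at each monitoring point into one result for the period."""
    if how == "max":
        return max(point_results)
    if how == "random":
        return rng.choice(point_results)
    if how == "weighted":
        if weights is None:
            # Assumption: fall back to equal weights when none are supplied.
            weights = [1.0 / len(point_results)] * len(point_results)
        if len(weights) != len(point_results):
            raise ValueError("one weight is needed per monitoring point")
        return sum(w * r for w, r in zip(weights, point_results))
    raise ValueError(f"unknown combination: {how}")

# Hypothetical temperatures at three monitoring points for P = 2 monitoring periods.
periods = [[10.0, 12.0, 14.0], [11.0, 13.0, 15.0]]
weights = [0.5, 0.25, 0.25]
p_results = [combine_points(p, "weighted", weights) for p in periods]
p_max = [combine_points(p, "max") for p in periods]
```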
Optionally, the at least one meteorological element may include, but is not limited to: temperature, atmospheric pressure, relative humidity, wind speed, wind direction, precipitation, boundary layer height, solar irradiance, and so on; that is, one item of meteorological data may include a meteorological element result for each of the at least one meteorological element. Optionally, when a meteorological element is a continuous variable, such as temperature, atmospheric pressure, relative humidity, wind speed, precipitation, boundary layer height, or solar irradiance, its determination manner may be the mean value determination manner or the maximum value determination manner; when a meteorological element is a discrete variable, such as wind direction, its determination manner may be the highest frequency determination manner.
For example, take the case where the determination manner of a meteorological element is the highest frequency determination manner, and assume that the P meteorological element results of that element are meteorological element result 1, meteorological element result 1, and meteorological element result 2. The at least one statistical meteorological element result then includes meteorological element result 1 and meteorological element result 2. Counting the P results gives a frequency of 2 (or 2/3) for meteorological element result 1 and a frequency of 1 (or 1/3) for meteorological element result 2, so meteorological element result 1 is taken as the result of that element in the meteorological data vector of the historical pollution process. Optionally, if the frequencies of the statistical meteorological element results are the same, one statistical meteorological element result may be selected at random from the at least one statistical meteorological element result and used as the result of that element in the meteorological data vector of the historical pollution process.
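The counting in this example, including the random choice among tied results, can be reproduced with a short sketch. The function name `highest_frequency` and the string labels are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction

def highest_frequency(results, rng=random):
    """Return the most frequent result plus absolute and relative frequencies."""
    counts = Counter(results)
    top = max(counts.values())
    candidates = [value for value, count in counts.items() if count == top]
    # When several results share the largest frequency, pick one at random.
    chosen = candidates[0] if len(candidates) == 1 else rng.choice(candidates)
    relative = {value: Fraction(count, len(results)) for value, count in counts.items()}
    return chosen, dict(counts), relative

chosen, counts, relative = highest_frequency(["result 1", "result 1", "result 2"])
tied_choice, _, _ = highest_frequency(["N", "S"])
```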
Optionally, when performing unsupervised learning classification processing on the N meteorological data vectors to obtain K groups of meteorological data vectors, a K-Means clustering algorithm, a mean shift clustering algorithm, a hierarchical clustering algorithm, or the like may be adopted; the embodiment of the present invention is not limited thereto.
In the embodiment of the invention, weather is a cause of pollution formation and plays a key role in the pollution process, and the weather elements during the growth period of a pollution process promote the formation of pollution. Therefore, the mean value, maximum value, highest-frequency value, or the like of the weather data of the growth period in each historical pollution process can be used to obtain N weather data vectors, and the N weather data vectors are divided into K groups by unsupervised learning classification processing, where each group can represent the weather cause of one type of pollution process.
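As an illustration of dividing N weather data vectors into K groups, the following is a minimal pure-Python sketch of K-Means (Lloyd's iteration). It is not the embodiment's implementation: the function name `k_means`, the random initialisation, and the assumption that every vector is already numeric and comparably scaled (for example, wind direction encoded as a number) are all introduced here for illustration.

```python
import math
import random


def k_means(vectors, k, iterations=50, seed=0):
    """Group numeric vectors into k clusters; return (labels, centers)."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct input vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centers[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical vectors: [mean wind speed, mean relative humidity].
vectors = [[2.0, 60.0], [2.2, 62.0], [1.8, 58.0],
           [8.0, 20.0], [8.5, 22.0], [7.5, 18.0]]
labels, centers = k_means(vectors, k=2)
print(labels)
```

Each resulting group would then stand for one weather cause of a pollution process type; in practice a library implementation and feature scaling would likely be preferred.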
The second acquisition mode is as follows: K groups of meteorological data vectors may be stored in the storage space of the electronic device itself; in this case, the electronic device can acquire the K groups of meteorological data vectors from its own storage space.
The third acquisition mode is as follows: the electronic device may acquire a meteorological data vector download link, which can be used to download the K groups of meteorological data vectors; in this case, the electronic device may take the data downloaded via the meteorological data vector download link as the K groups of meteorological data vectors, and so on.
In the embodiment of the present invention, the above-mentioned methods for acquiring the historical pollution process data may include, but are not limited to, the following:
The first acquisition mode is as follows: the storage space of the electronic device itself may store historical pollution data under each of F historical pollution processes, where F is a positive integer greater than 1; in this case, the electronic device may select a plurality of pieces of historical pollution data from the F pieces of historical pollution data and use the selected historical pollution data as the historical pollution process data.
The second acquisition mode is as follows: the electronic device may acquire an air quality data set, where the air quality data set may include air quality data of each of at least one area, and one piece of air quality data includes at least one piece of air quality indication information for each monitoring period of one area within a time range. Each piece of air quality data in the air quality data set can then be traversed: the currently traversed air quality data is used as the current air quality data, and whether a current pollution process corresponding to the current air quality data exists is judged based on the current air quality data. If the current pollution process exists, it is used as a historical pollution process, and the pollution data under the current pollution process is added to the historical pollution process data as one piece of historical pollution data. After every piece of air quality data in the air quality data set has been traversed, the historical pollution process data is obtained. Optionally, an area may include the area where a province is located, or the areas where a plurality of cities are located, which is not limited in the embodiment of the present invention; when an area includes a plurality of cities, the area may also be referred to as a city group, i.e., an area composed of a target city and at least one city around the target city, and the analysis may be performed in units of city groups.
Optionally, the at least one piece of air quality indication information in one monitoring period may include the air quality indication information of each monitoring point in the corresponding area in that monitoring period, or the average value of the air quality indication information of the monitoring points in the corresponding area in that monitoring period, and so on; the embodiment of the present invention is not limited thereto.
Specifically, when judging whether a current pollution process corresponding to the current air quality data exists, the electronic device can detect, based on the current air quality data, whether a pollution time range exists within the time range corresponding to the current air quality data. A pollution time range refers to a time range whose duration is greater than a preset duration threshold and in which the air quality indication information in every included monitoring period is greater than a preset indication information threshold; or a pollution time range refers to a time range whose duration is greater than the preset duration threshold and in which the ratio of target air quality indication information in every included monitoring period is greater than a preset ratio threshold. Here, a piece of target air quality indication information is a piece of air quality indication information greater than the preset indication information threshold, and the ratio of target air quality indication information in a monitoring period is the ratio of the number of pieces of target air quality indication information to the number of pieces of air quality indication information in that monitoring period. Correspondingly, if the pollution time range exists, it is determined that the current pollution process corresponding to the current air quality data exists; if the pollution time range does not exist, it is determined that the current pollution process does not exist. Optionally, the air quality indication information may be an air quality index (Air Quality Index, AQI) value, or may be the concentration of a specified pollutant, which is not limited in the embodiment of the present invention; for convenience of explanation, the air quality index value is taken as the air quality indication information in the following.
Optionally, when the at least one piece of air quality indication information contains a single piece, the pollution time range may refer to a time range whose duration is greater than the preset duration threshold and in which the air quality indication information in every included monitoring period is greater than the preset indication information threshold. When the at least one piece of air quality indication information contains multiple pieces, the pollution time range may refer to a time range whose duration is greater than the preset duration threshold and in which all the air quality indication information in every included monitoring period is greater than the preset indication information threshold, or to a time range whose duration is greater than the preset duration threshold and in which the ratio of target air quality indication information in every included monitoring period is greater than the preset ratio threshold.
Optionally, the preset duration threshold, the preset indication information threshold and the preset ratio threshold may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention. For example, take a preset duration threshold of 24 hours, a preset ratio threshold of 80%, and the ratio-based definition of the pollution time range: in this case, if the air quality indication information of more than 80% of the monitoring points in the area corresponding to the current air quality data is greater than the preset indication information threshold, and this lasts for more than 24 hours, it may be determined that the pollution time range exists, that is, that a pollution process has occurred.
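The ratio-based detection of a pollution time range can be sketched as follows. This is a hedged illustration: the function name `find_pollution_ranges`, the list-of-lists input layout (one inner list of AQI values per monitoring period, one value per monitoring point) and the threshold values in the usage example are assumptions, not details taken from the embodiment.

```python
def find_pollution_ranges(aqi_by_period, value_threshold, ratio_threshold, min_periods):
    """Return (start, end) index pairs, end exclusive, of pollution time ranges.

    A monitoring period counts as polluted when the share of monitoring
    points whose AQI exceeds `value_threshold` is greater than
    `ratio_threshold`. A run of polluted periods is a pollution time range
    only if it spans more than `min_periods` monitoring periods.
    Each inner list is assumed to be non-empty.
    """
    ranges = []
    run_start = None
    for i, values in enumerate(aqi_by_period):
        exceeding = sum(1 for v in values if v > value_threshold)
        polluted = exceeding / len(values) > ratio_threshold
        if polluted and run_start is None:
            run_start = i  # a new candidate range begins
        elif not polluted and run_start is not None:
            if i - run_start > min_periods:
                ranges.append((run_start, i))
            run_start = None
    # Close a run that lasts until the end of the data.
    if run_start is not None and len(aqi_by_period) - run_start > min_periods:
        ranges.append((run_start, len(aqi_by_period)))
    return ranges


# Hourly data for five monitoring points: 3 clean hours, 25 polluted, 2 clean.
clean, polluted = [40, 50, 45, 60, 55], [150, 160, 155, 170, 90]
series = [clean] * 3 + [polluted] * 25 + [clean] * 2
print(find_pollution_ranges(series, value_threshold=100,
                            ratio_threshold=0.75, min_periods=24))
```

In the usage example, four of five points (80%) exceed the threshold in each polluted hour, so a ratio threshold of 0.75 is used to keep the strict "greater than" comparison satisfied.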
It should be noted that, before adding the pollution data under the current pollution process to the historical pollution process data, the electronic device may first determine that pollution data (here, one piece of historical pollution data). Specifically, the electronic device may determine a maintenance period start point and a maintenance period end point of the current pollution process, as well as a growth period start point and a growth period end point of the current pollution process. The current growth time range under the current pollution process may then be determined from the growth period start point and the growth period end point, and the current maintenance time range from the maintenance period start point and the maintenance period end point. On this basis, the data of the area corresponding to the current air quality data within the time range formed by the current growth time range and the current maintenance time range can be acquired as the pollution data under the current pollution process, which thus comprises the data in the current growth time range and the data in the current maintenance time range. Optionally, in the monitoring period immediately following the pollution time range, there is air quality indication information less than or equal to the preset indication information threshold, or the ratio of target air quality indication information is less than or equal to the preset ratio threshold.
Optionally, the start point of the maintenance period may be the start time of the pollution time range corresponding to the current air quality data, or may be the time at which the determination air quality indication information starts to be greater than the preset indication information threshold, which is not limited in the embodiment of the present invention; the determination air quality indication information in a monitoring period may be the average value of the air quality indication information of the monitoring points in the corresponding area in that monitoring period.
Optionally, the end point of the maintenance period includes any one of the following: the end time of the pollution time range, or the end time of a target period. Optionally, the target period may be the first monitoring period after the start point of the maintenance period in which the difference between the air quality indicator and the determination air quality indication information is greater than a preset difference; or the target period may be a period located after the pollution time range at a first preset distance from the pollution time range; or the target period may be the last period in which the determination air quality indication information has remained below the preset indication information threshold for a preset determination duration, and so on. Optionally, the air quality indicator may be the maximum air quality indication information of the current air quality data within the pollution time range, or may be the average value of the air quality indication information of the current air quality data within the pollution time range, etc. Optionally, the preset difference, the first preset distance and the preset determination duration may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
Optionally, the start point of the growth period may be a time point located before the start point of the maintenance period at a second preset distance from the start point of the maintenance period; or the start point of the growth period may be the start time of the monitoring period in which the first valley of the determination air quality indication information before the start point of the maintenance period is located, and so on. Here, the determination air quality indication information in the monitoring period before the one where a valley is located, and that in the monitoring period after it, are both greater than the value at the valley.
Optionally, the end point of the growth period may be located within the pollution time range. Optionally, the end point of the growth period may be the start point of the maintenance period, or may be the start time or end time of the monitoring period in which the inflection point of the determination air quality indication information curve is located. The determination air quality indication information curve (i.e., the inflection point curve) may be formed by the determination air quality indication information of each monitoring period in the current air quality data. Specifically, the curve may be regarded as a discrete sequence in which each piece of determination air quality indication information corresponds to one time point; the electronic device may then calculate the second derivative of the curve at time point t (i.e., the time point of the t-th piece of determination air quality indication information) and select the time point with the maximum second derivative as the growth inflection point, as shown in fig. 2, where t is a positive integer less than or equal to the number of monitoring periods in the time range corresponding to the current air quality data. In the embodiment of the invention, the electronic device can calculate the second derivative of the curve at time point t using formula 1.1:
$$f''(x_t) = f(x_{t-1}) + f(x_{t+1}) - 2f(x_t) \tag{1.1}$$
Wherein $x_t$ may refer to the t-th determination air quality indication information, i.e., the determination air quality indication information corresponding to time point t.
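The discrete second derivative of formula 1.1 and the selection of the growth inflection point can be sketched as follows. The function name `growth_inflection_index` and the sample values are hypothetical; the sketch also assumes 0-based indices and skips the first and last points, which lack one of the two neighbours the formula needs.

```python
def growth_inflection_index(values):
    """Return the index t that maximises values[t-1] + values[t+1] - 2*values[t].

    `values` is the discrete sequence of determination air quality
    indication information, one value per monitoring period.
    Returns None when fewer than three values are given.
    """
    best_t, best_d2 = None, None
    for t in range(1, len(values) - 1):
        d2 = values[t - 1] + values[t + 1] - 2 * values[t]  # formula 1.1
        if best_d2 is None or d2 > best_d2:
            best_t, best_d2 = t, d2
    return best_t


# Hypothetical hourly AQI: slow rise, then a sharp increase after index 3.
aqi = [50, 52, 55, 60, 110, 160, 165]
print(growth_inflection_index(aqi))  # prints 3
```

For the sample sequence the second differences at indices 1 to 5 are 1, 2, 45, 0 and -45, so index 3 is selected as the point where the rise accelerates most.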
For example, as shown in fig. 3, assuming that the start point of the growth period is point 1 (e.g., the time point at which point 1 is located, or the start time of the monitoring period in which point 1 is located), the end point of the growth period is point 3, the start point of the maintenance period is point 2, and the end point of the maintenance period is point 4, the current growth time range may be the time range between point 1 and point 3, and the current maintenance time range may be the time range between point 2 and point 4.
The third acquisition mode is as follows: the electronic device may acquire a pollution process data download link and use the data downloaded via the pollution process data download link as the historical pollution process data, and so on.
S102, acquiring initial weather data to be predicted, and determining a weather data vector to be predicted corresponding to the initial weather data to be predicted, wherein the initial weather data to be predicted comprises weather data of a target area in a first time range, and the determining mode of the weather data vector to be predicted is the same as the determining mode of each weather data vector in the K groups of weather data vectors.
Optionally, the duration corresponding to the first time range may be one day or two days, which is not limited in the embodiment of the present invention. It should be understood that every weather data vector in the K groups of weather data vectors is determined in the same way; that is, for any weather element of the at least one weather element, the weather element result of that weather element in every weather data vector of the K groups is determined according to the same weather element determination mode. That mode may therefore also be referred to as the weather element determination mode of that weather element under the K groups of weather data vectors.
Optionally, the target area may be any area, and the first time range may be any time range, which is not limited in the embodiment of the present invention; accordingly, the first time range may be any time range in the future, such as a day in the future.
The weather data of the target area within the first time range may include the weather data of the target area in each monitoring period in the first time range. Optionally, the initial weather data to be predicted may be determined based on the weather data of each monitoring point in the target area in each monitoring period in the first time range; that is, the weather data of any monitoring period in the initial weather data to be predicted may be determined based on the weather data of each monitoring point in the target area in that monitoring period. On this basis, for any weather element of the at least one weather element and any monitoring period in the first time range: if the weather element is a continuous variable, the weather element results of that weather element at the monitoring points in the target area in that monitoring period may be averaged to obtain the weather element result of that weather element for that monitoring period in the initial weather data to be predicted; if the weather element is a discrete variable, the weather element results of that weather element at the monitoring points in the target area in that monitoring period may be counted, and the statistical weather element result with the highest frequency is used as the weather element result of that weather element for that monitoring period in the initial weather data to be predicted. Alternatively, one monitoring point may be randomly selected from the target area, and the weather data of the selected monitoring point in any monitoring period is used as the weather data of that monitoring period in the initial weather data to be predicted, and so on.
It should be noted that, since the weather data vector to be predicted is determined in the same way as each weather data vector in the K groups of weather data vectors, the weather element determination mode of any weather element under the weather data vector to be predicted is the same as the weather element determination mode of that weather element under the K groups of weather data vectors.
For example, assuming that at least one meteorological element includes a wind speed and a wind direction, and the meteorological element determining mode of the wind speed under the K sets of meteorological data vectors is a mean value determining mode, and the meteorological element determining mode of the wind direction under the K sets of meteorological data vectors is a highest frequency determining mode, the wind speed in the meteorological data vector to be predicted can be determined based on the wind speed of the target area in each monitoring period in the first time range according to the mean value determining mode, that is, mean value operation can be performed on the wind speed of the target area in each monitoring period in the first time range to determine the wind speed in the meteorological data vector to be predicted; correspondingly, the wind direction in the weather data vector to be predicted can be determined based on the wind direction of the target area in each monitoring period in the first time range according to the highest frequency determining mode, namely, the wind direction of the target area in each monitoring period in the first time range can be counted, and the wind direction with the highest frequency is used as the wind direction in the weather data vector to be predicted, and the like.
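The wind speed and wind direction example above can be sketched as follows. The names `build_weather_vector`, `wind_speed`, `wind_direction` and the mode labels are hypothetical, and ties in the highest frequency mode are resolved here by taking the first value encountered rather than by random selection.

```python
from collections import Counter
from statistics import mean


def build_weather_vector(periods, modes):
    """Collapse per-period weather data into one weather data vector.

    `periods` holds one dict of element results per monitoring period;
    `modes` maps each element name to its determination mode, which must
    be the same mode used for the K groups of historical vectors.
    """
    vector = {}
    for element, mode in modes.items():
        series = [period[element] for period in periods]
        if mode == "mean":
            vector[element] = mean(series)
        elif mode == "max":
            vector[element] = max(series)
        elif mode == "highest_frequency":
            # most_common(1) returns the first-seen value among ties.
            vector[element] = Counter(series).most_common(1)[0][0]
        else:
            raise ValueError(f"unknown determination mode: {mode}")
    return vector


periods = [
    {"wind_speed": 2.0, "wind_direction": "NW"},
    {"wind_speed": 4.0, "wind_direction": "NW"},
    {"wind_speed": 3.0, "wind_direction": "SE"},
]
modes = {"wind_speed": "mean", "wind_direction": "highest_frequency"}
print(build_weather_vector(periods, modes))
```

Passing the same `modes` mapping when building the historical vectors and the vector to be predicted is one simple way to keep the two determination modes identical, as the method requires.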
S103, judging, based on the weather data vector to be predicted and the K groups of weather data vectors, whether a matched weather data vector group matched with the weather data vector to be predicted exists in the K groups of weather data vectors.
In the embodiment of the invention, if the matched meteorological data vector group matched with the meteorological data vector to be predicted exists in the K groups of meteorological data vectors, the occurrence of a pollution process in the target area within the first time range can be determined, the pollution process of the target area within the first time range is similar to the pollution process type corresponding to the matched meteorological data vector group, and one group of meteorological data vectors can correspond to one pollution process type.
Accordingly, if no matched weather data vector group matching the weather data vector to be predicted exists among the K groups of weather data vectors, it may be determined that no pollution process occurs in the target area within the first time range. Optionally, if no matched weather data vector group exists, the electronic device may call a pollution-free process concentration prediction model to perform pollution prediction on the initial weather data to be predicted, or on the weather data of each monitoring point in the target area in each monitoring period in the first time range, so as to obtain a target pollution prediction result. Optionally, the pollution-free process concentration prediction model may be obtained by training an initial pollution-free process concentration prediction model based on the weather data and pollutant concentration data of each monitoring point in at least one monitoring period; optionally, the initial pollution-free process concentration prediction model may be a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, an AdaBoost model (an ensemble model), or the like, which is not limited by the embodiment of the present invention.
S104, if the matched meteorological data vector group exists in the K groups of meteorological data vectors, determining a matched historical pollution process matched with the meteorological data vector to be predicted based on the matched meteorological data vector group, and determining target meteorological data to be predicted corresponding to the initial meteorological data to be predicted according to the duration of the matched historical pollution process.
The target weather data to be predicted comprises weather data of the target area in a second time range, and the duration corresponding to the second time range is equal to the duration of the matching historical pollution process; optionally, the weather data of the target area in the second time range may include weather data of the target area in each monitoring period in the second time range, and may also include weather data of each monitoring point in the target area in each monitoring period in the second time range. It should be noted that the duration of the matching historical pollution process may be the length of the time range formed by the growth time range and the maintenance time range of the matching historical pollution process.
Optionally, the start time of the second time range may be the start time of the first time range. For example, assuming that the duration is 30 hours and the first time range is from 0:00 to 24:00 on January 1, 2024, the start time of the second time range may be the start time of the first time range (i.e., 0:00 on January 1, 2024) and the duration corresponding to the second time range may be 30 hours, so the second time range may be from 0:00 on January 1, 2024 to 6:00 on January 2, 2024. Also assuming that one monitoring period is one hour, each monitoring period in the second time range may then be each hour in the second time range.
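The derivation of the second time range can be sketched as follows, assuming a first time range that starts at 0:00 on January 1, 2024, a matching historical pollution process lasting 30 hours, and hourly monitoring periods. The function name `second_time_range` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def second_time_range(first_start, duration_hours, period_hours=1):
    """Return (start, end, period_starts) of the second time range.

    The second time range starts where the first time range starts and
    lasts as long as the matching historical pollution process.
    """
    end = first_start + timedelta(hours=duration_hours)
    period_starts = []
    current = first_start
    while current < end:
        period_starts.append(current)  # start time of one monitoring period
        current += timedelta(hours=period_hours)
    return first_start, end, period_starts


start, end, periods = second_time_range(datetime(2024, 1, 1, 0), 30)
print(start, end, len(periods))
```

With these inputs the second time range ends at 6:00 on January 2, 2024 and contains 30 hourly monitoring periods.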
S105, calling a pollution process concentration prediction model, performing pollution prediction based on target weather data to be predicted to obtain an initial pollution prediction result, and determining a target pollution prediction result based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result.
In an embodiment of the present invention, one pollution prediction result may include pollutant concentration data of the target area in each monitoring period in the second time range, and may also include pollutant concentration data of each monitoring point in the target area in each monitoring period in the second time range, where one piece of pollutant concentration data includes a concentration value of each pollutant of at least one pollutant in the corresponding monitoring period. Optionally, the at least one pollutant may include, but is not limited to: nitrogen dioxide, sulfur dioxide, carbon monoxide, ozone, inhalable particulate matter (PM10), fine particulate matter (PM2.5), etc., which is not limited in the embodiment of the present invention.
According to the embodiment of the invention, after the K groups of weather data vectors and the initial weather data to be predicted are obtained and the weather data vector to be predicted corresponding to the initial weather data to be predicted is determined, whether the matched weather data vector group matched with the weather data vector to be predicted exists in the K groups of weather data vectors or not is judged based on the weather data vector to be predicted and the K groups of weather data vectors, one weather data vector is determined based on the historical pollution data in a historical pollution process, the initial weather data to be predicted comprises weather data of a target area in a first time range, and the determination mode of the weather data vector to be predicted is the same as the determination mode of each weather data vector in the K groups of weather data vectors. If the K groups of weather data vectors include the matched weather data vector group, a matched historical pollution process matched with the weather data vector to be predicted can be determined based on the matched weather data vector group, and target weather data to be predicted corresponding to the initial weather data to be predicted is determined according to the duration of the matched historical pollution process, wherein the target weather data to be predicted comprises weather data of a target area in a second time range, and the duration corresponding to the second time range is the duration. Further, a pollution process concentration prediction model can be called, pollution prediction is carried out based on target weather data to be predicted, an initial pollution prediction result is obtained, and a target pollution prediction result is determined based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result. 
Therefore, the embodiment of the invention can conveniently carry out pollution prediction without simulation through a numerical model, thereby improving the prediction efficiency; in addition, the embodiment of the invention can correct the time range related to the weather data to be predicted by matching the duration of the historical pollution process, can correct the pollution prediction result by matching the historical pollution data in the historical pollution process, and the like, and can effectively improve the prediction accuracy.
Based on the above description, the embodiment of the invention also provides a more specific pollution prediction method. The pollution prediction method may be performed by the above-mentioned electronic device (terminal or server), or may be performed jointly by the terminal and the server. For convenience of explanation, execution of the pollution prediction method by the electronic device is taken as an example below; referring to fig. 4, the pollution prediction method may include the following steps S401 to S407:
S401, acquiring K groups of meteorological data vectors, wherein one meteorological data vector is determined based on the historical pollution data of one historical pollution process, and K is a positive integer.
S402, acquiring initial weather data to be predicted, and determining a weather data vector to be predicted corresponding to the initial weather data to be predicted, wherein the initial weather data to be predicted comprises weather data of a target area in a first time range, and the determining mode of the weather data vector to be predicted is the same as the determining mode of each weather data vector in the K groups of weather data vectors.
S403, judging whether a matched weather data vector group matched with the weather data vector to be predicted exists in the K groups of weather data vectors based on the weather data vector to be predicted and the K groups of weather data vectors.
One meteorological data vector group can correspond to one central meteorological data vector; optionally, the central meteorological data vector corresponding to any meteorological data vector group may be the mean of the meteorological data vectors in that group. Specifically, the electronic device may traverse each of the K groups of weather data vectors and use the currently traversed weather data vector group as the current weather data vector group; on this basis, the current central weather data vector corresponding to the current weather data vector group can be determined, and the similarity between the weather data vector to be predicted and the current central weather data vector is calculated. Correspondingly, if the similarity between the weather data vector to be predicted and the current central weather data vector is greater than a preset similarity threshold, the current weather data vector group can be used as the matched weather data vector group matched with the weather data vector to be predicted, so that it is determined that the matched weather data vector group exists in the K groups of weather data vectors, and the traversal ends. After every one of the K groups of weather data vectors has been traversed, if no matched weather data vector group has been found, it may be determined that no matched weather data vector group exists in the K groups of weather data vectors. Optionally, the preset similarity threshold may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention; for example, the preset similarity threshold may be 90%.
In one embodiment, the electronic device may use the cosine similarity between the to-be-predicted meteorological data vector and the current central meteorological data vector as the similarity between them; specifically, the electronic device may calculate the cosine similarity using equation 2.1:
$$\cos(A, B) = \frac{A \cdot B}{|A|\,|B|} \qquad (2.1)$$
where A may represent the meteorological data vector to be predicted, B may represent the current central meteorological data vector, and |A| and |B| may represent the moduli of the vectors A and B.
In another embodiment, the electronic device may calculate a vector distance between the to-be-predicted meteorological data vector and the current center meteorological data vector, and take an inverse of the vector distance as a similarity between the to-be-predicted meteorological data vector and the current center meteorological data vector, and so on.
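The group matching of step S403 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the list-of-lists layout for the K groups, and the default threshold of 0.9 (taken from the 90% example above) are assumptions made for the sketch.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity A.B / (|A| |B|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_center(group: Sequence[Vector]) -> List[float]:
    """Central vector of a group: the element-wise mean of its vectors."""
    return [sum(column) / len(group) for column in zip(*group)]


def find_matching_group(
    query: Vector, groups: Sequence[Sequence[Vector]], threshold: float = 0.9
) -> Optional[int]:
    """Traverse the K groups; return the index of the first group whose
    central vector is more similar to the query than the threshold,
    or None if no group matches (traversal ends at the first match)."""
    for index, group in enumerate(groups):
        if cosine_similarity(query, group_center(group)) > threshold:
            return index
    return None
```

Returning `None` corresponds to the case in which no matched meteorological data vector group exists in the K groups.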
S404, if the matched meteorological data vector group exists in the K groups of meteorological data vectors, determining a matched historical pollution process matched with the meteorological data vector to be predicted based on the matched meteorological data vector group, and determining target to-be-predicted meteorological data corresponding to the initial to-be-predicted meteorological data according to the duration of the matched historical pollution process, wherein the target to-be-predicted meteorological data comprises meteorological data of a target area in a second time range.
Specifically, when determining a matching historical pollution process matched with the weather data vector to be predicted based on the matching weather data vector group, the electronic device may calculate the similarity between the weather data vector to be predicted and each weather data vector in the matching weather data vector group, and select, based on the similarity between the weather data vector to be predicted and each weather data vector in the matching weather data vector group, a weather data vector having the highest similarity with the weather data vector to be predicted from the matching weather data vector group, thereby taking the historical pollution process corresponding to the selected weather data vector as the matching historical pollution process matched with the weather data vector to be predicted. Alternatively, the similarity between the weather data vector to be predicted and one weather data vector may be cosine similarity, or may be reciprocal distance, which is not limited in the embodiment of the present invention.
Therefore, the embodiment of the invention can select the historical pollution process example closest to the weather data vector to be predicted, and take the duration of that historical pollution process example as the duration of the future process (namely, the pollution process corresponding to the weather data vector to be predicted), so that the duration of the second time range equals that duration. In this way the target weather data to be predicted reflects the weather data of one whole pollution process more accurately, which can improve the accuracy of pollution prediction for that pollution process.
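Step S404 can be sketched as below. The `HistoricalProcess` record, the use of hours as the time unit, and the caller-supplied start hour of the second time range are assumptions made for illustration; the text only fixes that the second time range has the duration of the matched historical pollution process.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HistoricalProcess:
    """Hypothetical record for one historical pollution process."""
    name: str
    vector: Sequence[float]  # its meteorological data vector
    duration_hours: int      # total duration of the process


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_matching_process(
    query: Sequence[float], matched_group: List[HistoricalProcess]
) -> HistoricalProcess:
    """Pick the process in the matched group most similar to the query."""
    return max(matched_group, key=lambda p: cosine_similarity(query, p.vector))


def second_time_range(start_hour: int, process: HistoricalProcess) -> Tuple[int, int]:
    """Second time range as (start, end): its length equals the duration
    of the matched historical pollution process."""
    return start_hour, start_hour + process.duration_hours
```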
S405, acquiring an initial concentration prediction model, and determining a target training data set based on data in a maintenance time range included in each historical pollution data.
In an embodiment of the present invention, the pollution process concentration prediction model may be trained based on the data within the maintenance time range included in each historical pollution data in the historical pollution process data, where the data within one time range includes meteorological data and pollutant concentration data within each of a plurality of monitoring periods. Data within the maintenance time range is relatively stable, so training the model on this data can accelerate model convergence and improve model accuracy.
Optionally, the initial concentration prediction model may be a GBDT model, an AdaBoost model, and so on; the embodiment of the present invention is not limited thereto.
Optionally, the meteorological data and pollutant concentration data for a monitoring period may include the meteorological data and pollutant concentration data of each of at least one monitoring point within the corresponding monitoring period (i.e., the meteorological data and pollutant concentration data in one pollution data may include the meteorological data and pollutant concentration data of each of the at least one monitoring point within each monitoring period), and one target training data may include one training data and the corresponding tag data. Specifically, when determining the target training data set based on the data in the maintenance time range included in each historical pollution data, the electronic device may, for any historical pollution data in the historical pollution process data, any monitoring point in the area corresponding to that historical pollution data, and any monitoring period in the maintenance time range of that historical pollution data, determine the meteorological data and pollutant concentration data of that monitoring point in that monitoring period; then, the meteorological data of that monitoring point in that monitoring period can be used as the training data, and the pollutant concentration data of that monitoring point in that monitoring period can be used as the tag data, so that the target training data for that monitoring point and monitoring period is obtained. On this basis, the target training data for each monitoring point and each monitoring period can be added to the target training data set to determine the target training data set.
In this case, the meteorological data and pollutant concentration data of all monitoring points in every monitoring period within the maintenance time ranges of all pollution processes can be combined into one training set (i.e., the target training data set). In other words, the target training data set may include target training data composed of the meteorological data and pollutant concentration data of any monitoring point covered by the historical pollution process data in any monitoring period of the corresponding maintenance time range.
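The construction of the target training data set in step S405 might look like the following sketch. The nested dictionary layout (keys `"growth"`, `"maintenance"`, `"weather"`, `"concentration"`) is a hypothetical data format chosen for the example; only the rule that maintenance-range data from every monitoring point and monitoring period is collected comes from the text.

```python
from typing import Dict, List, Sequence, Tuple

# One target training data: (training data, tag data), i.e. the meteorological
# features of one monitoring point in one monitoring period and the pollutant
# concentration measured at the same point and period.
Sample = Tuple[Sequence[float], float]


def build_target_training_set(historical_pollution_data: List[Dict]) -> List[Sample]:
    """Collect one sample per (monitoring point, monitoring period) from the
    maintenance time range of every historical pollution process.
    Data in the growth time range is deliberately ignored."""
    dataset: List[Sample] = []
    for pollution_data in historical_pollution_data:
        for period in pollution_data["maintenance"]:      # monitoring periods
            for observation in period.values():           # monitoring points
                dataset.append(
                    (observation["weather"], observation["concentration"])
                )
    return dataset
```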
S406, performing model training on the initial concentration prediction model by using the target training data set to obtain a pollution process concentration prediction model.
In one embodiment, the electronic device may train the initial concentration prediction model with the target training data set through 10-fold cross-validation. Correspondingly, the target training data set can be divided into 10 parts; in one training process, 9 parts are used for model training and the remaining 1 part is used for verification, so that the verification precision of the 10-fold cross-validation (such as a root mean square error loss value or a square root error loss value) is obtained after 10 training processes. On this basis, whether to end the iteration and complete model training can be judged from the verification precision of the 10-fold cross-validation (for example, the iteration ends if the verification precision is smaller than a preset precision) and/or from whether a preset number of iterations has been reached.
In another embodiment, an initial concentration prediction model may be invoked to perform pollution prediction on training data included in each target training data in the target training data set, to obtain training pollution prediction results of each training data, and calculate a model loss value (such as a root mean square error loss value or a square root error loss value) based on differences between the training pollution prediction results of each training data and the tag data of the corresponding training data; and model parameters in the initial concentration prediction model can be optimized according to the direction of reducing the model loss value, so that an optimized initial concentration prediction model is obtained, and model training is continuously carried out on the optimized initial concentration prediction model until an iteration ending condition (such as the model loss value is smaller than a loss value threshold value or the preset iteration times are reached) is reached, so that the pollution process concentration prediction model is obtained.
Optionally, the loss value threshold, the preset iteration number and the preset precision may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
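The 10-fold cross-validation described above can be sketched in plain Python as follows. The `fit` callable stands in for training a GBDT or AdaBoost model; the mean-value baseline used here is only a placeholder so that the sketch is self-contained, and pooling the squared errors of all folds into a single root mean square error is one possible reading of the "verification precision".

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]
Predictor = Callable[[Sequence[float]], float]


def k_fold_rmse(
    samples: List[Sample], fit: Callable[[List[Sample]], Predictor], k: int = 10
) -> float:
    """Train k times on k-1 folds, validate on the held-out fold, and
    return the RMSE pooled over all held-out predictions."""
    if len(samples) < k:
        raise ValueError("need at least k samples")
    folds = [samples[i::k] for i in range(k)]
    squared_errors: List[float] = []
    for held_out in range(k):
        training = [s for j, fold in enumerate(folds) if j != held_out for s in fold]
        predict = fit(training)
        squared_errors.extend((predict(x) - y) ** 2 for x, y in folds[held_out])
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_mean_baseline(training: List[Sample]) -> Predictor:
    """Placeholder model (assumption): always predicts the mean training label."""
    mean_label = sum(y for _, y in training) / len(training)
    return lambda features: mean_label
```

Under this reading, training would stop once `k_fold_rmse(...)` falls below the preset precision or the preset number of iterations is reached.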
S407, calling a pollution process concentration prediction model, performing pollution prediction based on target weather data to be predicted to obtain an initial pollution prediction result, and determining a target pollution prediction result based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result.
Optionally, when pollution prediction is performed based on the weather data to be predicted of the target, pollution prediction can be performed on the weather data of each monitoring point in the target area in each monitoring period in the second time range to obtain an initial pollution prediction result, and at this time, the initial pollution prediction result may include pollutant concentration data of each monitoring point in the target area in each monitoring period in the second time range, that is, the initial pollution prediction result may include pollutant concentration data of any monitoring point in the target area in any monitoring period in the second time range; or the meteorological data of each monitoring point in the target area in any monitoring period in the second time range can be subjected to fusion processing (such as mean value operation and/or statistical analysis and the like) to obtain the fusion meteorological data of the target area in any monitoring period in the second time range (namely the meteorological data of the target area in any monitoring period in the second time range), a pollution process concentration prediction model is called, pollution prediction is respectively carried out on the fusion meteorological data of the target area in each monitoring period in the second time range, an initial pollution prediction result is obtained, and the initial pollution prediction result can comprise the pollutant concentration data of the target area in each monitoring period in the second time range.
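The second option above, fusing the meteorological data of all monitoring points before prediction, can be illustrated with a short sketch. Mean-value fusion is one of the operations the text names; the dictionary layout and the `model` callable are hypothetical stand-ins for the trained pollution process concentration prediction model.

```python
from typing import Callable, Dict, List, Sequence


def fuse_weather(per_point: Dict[str, Sequence[float]]) -> List[float]:
    """Fuse the weather vectors of all monitoring points in one monitoring
    period into a single area-level vector by element-wise mean."""
    vectors = list(per_point.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict_area(
    periods: List[Dict[str, Sequence[float]]],
    model: Callable[[Sequence[float]], Dict[str, float]],
) -> List[Dict[str, float]]:
    """Call the model once per monitoring period in the second time range,
    on the fused weather data of that period."""
    return [model(fuse_weather(per_point)) for per_point in periods]
```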
Based on this, when determining the target pollution prediction result based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result, the specified pollutant may be determined from at least one pollutant, and Y pollutant concentration values of the specified pollutant are determined from the historical pollution data under the matching historical pollution process, one pollutant concentration value being determined based on the concentration value of the specified pollutant in one monitoring period included in that historical pollution data, Y being a positive integer. Optionally, for any monitoring period under the matching historical pollution process, the concentration values of the specified pollutant at each monitoring point in the area corresponding to the matching historical pollution process in that monitoring period can be averaged to obtain the pollutant concentration value of the specified pollutant in that monitoring period; or a monitoring point can be randomly selected from the area corresponding to the matching historical pollution process, and the concentration value of the specified pollutant at the selected monitoring point in that monitoring period is used as the pollutant concentration value of the specified pollutant in that monitoring period. Optionally, the Y pollutant concentration values may include the pollutant concentration values of the specified pollutant in each monitoring period in the maintenance time range of the matching historical pollution process, or the pollutant concentration values of the specified pollutant in each monitoring period of the matching historical pollution process, and so on; the embodiment of the present invention is not limited thereto.
Accordingly, the electronic device may determine V predicted concentration values for the specified contaminant from the initial pollution prediction result, one predicted concentration value being determined based on the concentration value of the specified contaminant during a monitoring period included in the initial pollution prediction result, V being a positive integer. Optionally, if the initial pollution prediction result includes pollutant concentration data of each monitoring point in the target area in each monitoring period in the second time range, for any monitoring period in the second time range, average calculation can be performed on concentration values of specified pollutants of each monitoring point in the target area in any monitoring period to obtain a predicted concentration value of the specified pollutants in any monitoring period; or randomly selecting one monitoring point from the target area, and taking the concentration value of the appointed pollutant of the selected monitoring point in any monitoring period as the predicted concentration value of the appointed pollutant in any monitoring period; if the initial pollution prediction result includes pollutant concentration data of the target area in each monitoring period in the second time range, the concentration value of the specified pollutant of the target area in any monitoring period in the second time range can be used as the predicted concentration value of the specified pollutant in any monitoring period. Optionally, the V predicted concentration values may include a predicted concentration value of the specified contaminant in each monitoring period in the maintenance time range of the second time range, a predicted concentration value of the specified contaminant in each monitoring period of the second time range, and the like, which is not limited by the embodiment of the present invention.
Further, the electronic device may calculate the deviation index based on a mean value between the Y contaminant concentration values and a mean value between the V predicted concentration values. Specifically, the mean value of the concentration values of the Y pollutants may be calculated, to obtain a mean value of the concentration of the pollutants, and the mean value of the V predicted concentration values may be calculated, to obtain a mean value of the predicted concentration, so as to calculate a concentration difference between the mean value of the concentration of the pollutants and the mean value of the predicted concentration, and determine a deviation index based on the concentration difference.
In one embodiment, the electronic device may calculate the deviation index using equation 2.2:
where $C_1$ may be the pollutant concentration mean and $C_2$ may be the predicted concentration mean.
In another embodiment, the electronic device may also use the concentration difference as a deviation indicator; alternatively, the ratio between the concentration difference and the predicted concentration mean may be used to calculate a deviation index, etc.
Further, if the deviation index is greater than the preset index threshold, a correction coefficient may be determined, and the concentration value of the specified pollutant in the initial pollution prediction result is corrected by using the correction coefficient, so as to obtain a target pollution prediction result, so that the target pollution prediction result includes corrected concentration values of the specified pollutant in each monitoring period in the second time range, for example, corrected concentration values of the specified pollutant in each monitoring period in the second time range in the target area, corrected concentration values of the specified pollutant in each monitoring period in the second time range in each monitoring point in the target area, and the like. Optionally, the preset index threshold may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention; the preset index threshold may be 15% by way of example.
Optionally, the electronic device may calculate the correction coefficient using equation 2.3:
Alternatively, when calculating the deviation index using the ratio between the concentration difference and the predicted concentration mean, the electronic device may also use the predicted concentration mean as the numerator and the pollutant concentration mean as the denominator to calculate the correction coefficient, and so on.
Correspondingly, if the deviation index is smaller than or equal to the preset index threshold, the electronic device can use the concentration value of the specified pollutant in the initial pollution prediction result as the concentration value of the specified pollutant in the target pollution prediction result, so as to determine the target pollution prediction result, that is, can use the initial pollution prediction result as the target pollution prediction result.
It should be appreciated that the concentration value in the target pollution prediction result of each of the at least one pollutant other than the specified pollutant is equal to the concentration value of each of the pollutants in the initial pollution prediction result, that is, the embodiment of the present invention may correct only the concentration value of the specified pollutant.
Alternatively, the specified contaminant may be any one of the at least one contaminant, or may be any plurality of the at least one contaminant, which is not limited in this embodiment of the present invention. It should be appreciated that when the number of specified contaminants is plural, the concentration value of each of the plural specified contaminants in the target pollution prediction result may be determined based on the concentration value of the corresponding specified contaminant in the initial pollution prediction result, respectively.
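The deviation check and correction for one specified pollutant can be sketched as follows. The deviation index here uses the alternative form mentioned above (concentration difference divided by the predicted concentration mean). The correction coefficient (historical mean divided by predicted mean) and its application by multiplication are assumptions made purely for illustration and are not claimed to reproduce equation 2.3; the 0.15 default follows the 15% example threshold.

```python
from statistics import mean
from typing import List, Sequence


def deviation_index(historical: Sequence[float], predicted: Sequence[float]) -> float:
    """|C1 - C2| / C2, with C1 the mean of the Y historical concentration
    values and C2 the mean of the V predicted concentration values."""
    c1, c2 = mean(historical), mean(predicted)
    if c2 == 0:
        raise ValueError("predicted concentration mean is zero")
    return abs(c1 - c2) / c2


def correct_prediction(
    historical: Sequence[float], predicted: Sequence[float], threshold: float = 0.15
) -> List[float]:
    """Return the target predicted concentrations of the specified pollutant."""
    if deviation_index(historical, predicted) <= threshold:
        # Deviation is acceptable: the initial prediction is kept unchanged.
        return list(predicted)
    # ASSUMPTION: scale the prediction toward the historical mean.
    coefficient = mean(historical) / mean(predicted)
    return [value * coefficient for value in predicted]
```

When several pollutants are specified, the same routine would be applied to each of them separately, while the concentration values of all other pollutants are carried over unchanged.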
In summary, the embodiment of the invention can define the pollution process, thereby determining each historical pollution process and the K groups of meteorological data vectors; on this basis, a future process determination may be made (i.e., using the initial weather data to be predicted to determine whether a pollution process is likely to exist in the future). Specifically, as shown in FIG. 5, a first similarity calculation may be performed to match the type, that is, to determine the matched meteorological data vector group and thereby determine that a pollution process exists, and a second similarity calculation may be performed to match the closest historical case, that is, to determine the matching historical pollution process. Further, the target weather data to be predicted can be determined based on the matching historical pollution process, and pollution prediction is performed on the target weather data to be predicted to obtain prediction results (such as the initial pollution prediction result and the target pollution prediction result), where the model refers to the pollution process concentration prediction model.
According to the embodiment of the invention, after the K groups of weather data vectors and the initial weather data to be predicted are obtained, and the weather data vector to be predicted corresponding to the initial weather data to be predicted is determined, whether a matched weather data vector group matched with the weather data vector to be predicted exists in the K groups of weather data vectors is judged based on the weather data vector to be predicted and the K groups of weather data vectors. Correspondingly, if the matched weather data vector group exists in the K groups of weather data vectors, a matched historical pollution process matched with the weather data vector to be predicted is determined based on the matched weather data vector group, and target to-be-predicted weather data corresponding to the initial to-be-predicted weather data is determined according to the duration of the matched historical pollution process, where the target to-be-predicted weather data comprises weather data of the target area in a second time range, and the duration of the second time range equals that duration.
Further, an initial concentration prediction model may be obtained and a target training data set may be determined based on the data within the maintenance time range included in each historical pollution data; the target training data set is used to train the initial concentration prediction model to obtain the pollution process concentration prediction model, which can improve the accuracy of the pollution process concentration prediction model and therefore the accuracy of pollution prediction. On this basis, the pollution process concentration prediction model can be called to perform pollution prediction based on the target weather data to be predicted, an initial pollution prediction result is obtained, and a target pollution prediction result is determined based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result. Therefore, the embodiment of the invention can conveniently perform pollution prediction, and can correct the prediction using the pollutant concentration data of the closest historical pollution process example, so that the target pollution prediction result is more reasonable and reliable and its accuracy can be improved.
Based on the description of the related embodiments of the pollution prediction method, the embodiments of the present invention also provide a pollution prediction device, which may be a computer program (including program code) running in an electronic device; as shown in fig. 6, the pollution prediction device may include an obtaining unit 601 and a processing unit 602. The pollution prediction device may perform the pollution prediction method shown in fig. 1 or fig. 4, i.e. the pollution prediction device may run the following units:
an obtaining unit 601, configured to obtain K sets of weather data vectors, where one weather data vector is determined based on historical pollution data in a historical pollution process, and K is a positive integer; acquiring initial weather data to be predicted;
the processing unit 602 is configured to determine a to-be-predicted weather data vector corresponding to the initial to-be-predicted weather data, where the initial to-be-predicted weather data includes weather data of a target area within a first time range, and a determination manner of the to-be-predicted weather data vector is the same as a determination manner of each weather data vector in the K sets of weather data vectors;
The processing unit 602 is further configured to determine, based on the weather data vector to be predicted and the K sets of weather data vectors, whether a matched weather data vector set matching the weather data vector to be predicted exists in the K sets of weather data vectors;
The processing unit 602 is further configured to determine, based on the set of matched weather data vectors, a matched historical pollution process matched with the weather data vector to be predicted, and determine target weather data to be predicted corresponding to the initial weather data to be predicted according to a duration of the matched historical pollution process, where the target weather data to be predicted includes weather data of the target area within a second time range, and the duration of the second time range corresponds to the duration;
The processing unit 602 is further configured to invoke a pollution process concentration prediction model, predict pollution based on the target weather data to be predicted, obtain an initial pollution prediction result, and determine a target pollution prediction result based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result.
In one embodiment, the obtaining unit 601 may be specifically configured to, when obtaining K sets of meteorological data vectors:
Acquiring historical pollution process data, wherein the historical pollution process data comprises historical pollution data under each historical pollution process in a plurality of historical pollution processes, one pollution process comprises an increase time range and a maintenance time range, one pollution data comprises data of the corresponding pollution process in the increase time range and data in the maintenance time range, and the data in one time range comprises meteorological data and pollutant concentration data in each monitoring period in a plurality of monitoring periods;
Determining weather data vectors under each historical pollution process based on weather data in an increasing time range in each historical pollution data respectively to obtain N weather data vectors, wherein N is the number of the historical pollution processes in the plurality of historical pollution processes;
and performing unsupervised learning classification processing on the N meteorological data vectors to obtain K groups of meteorological data vectors.
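The grouping of the N meteorological data vectors into K groups can be sketched with a small k-means routine. The text only requires unsupervised learning classification processing, so the choice of k-means, the Euclidean distance, and the deterministic initialisation from the first K vectors are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def k_means(
    vectors: List[Sequence[float]], k: int, iterations: int = 20
) -> List[List[Sequence[float]]]:
    """Partition the vectors into k groups by plain k-means."""
    # ASSUMPTION: initial centers are simply the first k vectors.
    centers = [list(v) for v in vectors[:k]]
    groups: List[List[Sequence[float]]] = []
    for _ in range(iterations):
        # Assignment step: each vector joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centers[i]))
            groups[nearest].append(v)
        # Update step: each center becomes the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            [sum(column) / len(g) for column in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return groups
```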
In another embodiment, a weather data vector is determined in a target determination manner, one determination manner is used to indicate a weather element determining mode for each of at least one weather element, and one weather element determining mode includes any one of: mean value determination mode, maximum value determination mode and highest frequency determination mode; the obtaining unit 601 may be specifically configured to, when determining the weather data vector under each of the historical pollution processes based on the weather data within the growth time range in each of the historical pollution data, respectively:
For any one of the plurality of historical pollution processes and any one of the at least one meteorological element, determining P meteorological element results for the any one meteorological element based on meteorological data within a growth time range included in historical pollution data under the any one historical pollution process, one meteorological element result of the P meteorological element results being determined based on meteorological element results of the any one meteorological element within a corresponding monitoring period, P being the number of monitoring periods in the growth time range under the any one historical pollution process;
If the weather element determining mode of any weather element comprises the average value determining mode, taking the average value among the P weather element results as the weather element result in the weather data vector of any weather element in the history pollution process; or alternatively
If the weather element determining mode of any weather element comprises the maximum value determining mode, taking the maximum value or the minimum value in the P weather element results as the weather element result in the weather data vector of any weather element in the history pollution process; or alternatively
If the weather element determining mode of any weather element comprises the highest frequency determining mode, counting the P weather element results to obtain the frequency of each statistical weather element result in at least one statistical weather element result, and taking the statistical weather element result with the largest frequency in the at least one statistical weather element result as the weather element result in the weather data vector of any weather element in the history pollution process.
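The three weather element determining modes can be illustrated as below. The element names (`temperature`, `wind_speed`, `wind_direction`), the mode strings, and the dictionary layout are hypothetical; each list holds the P results of one weather element over the monitoring periods of the growth time range.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def element_result(values: Sequence[float], mode: str) -> float:
    """Reduce the P per-period results of one weather element to one value."""
    if mode == "mean":
        return mean(values)
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    if mode == "most_frequent":
        # Highest-frequency mode: the most common result wins.
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"unknown determining mode: {mode}")


def build_weather_vector(
    growth_data: Dict[str, Sequence[float]], modes: Dict[str, str]
) -> List[float]:
    """Build the weather data vector of one historical pollution process,
    one component per weather element, in the order given by `modes`."""
    return [element_result(growth_data[name], modes[name]) for name in modes]
```

Applying the same `modes` mapping to the initial weather data to be predicted would give a to-be-predicted vector determined in the same manner as the historical vectors.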
In another embodiment, the pollution process concentration prediction model is trained based on the data in the maintenance time range included in each historical pollution data in the historical pollution process data, where the data in one time range includes meteorological data and pollutant concentration data in each of a plurality of monitoring periods, and the obtaining unit 601 is further configured to:
Acquiring an initial concentration prediction model, and determining a target training data set based on data in a maintenance time range included in each historical pollution data;
The processing unit 602 may be further configured to:
and carrying out model training on the initial concentration prediction model by adopting the target training data set to obtain the pollution process concentration prediction model.
In another embodiment, the meteorological data and contaminant concentration data for a monitoring period include: the weather data and the pollutant concentration data of each monitoring point in the corresponding monitoring period in at least one monitoring point, one target training data includes one training data and corresponding tag data, and the obtaining unit 601 may be specifically configured to, when determining the target training data set based on the data in the maintenance time range included in each of the historical pollution data:
For any historical pollution data in the historical pollution process data, any monitoring point in the area corresponding to that historical pollution data, and any monitoring period in the maintenance time range of that historical pollution data, determining the weather data and pollutant concentration data of that monitoring point in that monitoring period;
taking the meteorological data of any monitoring point in any monitoring period as training data of the any monitoring point and any monitoring period, and taking the pollutant concentration data of the any monitoring point in any monitoring period as tag data of the any monitoring point and any monitoring period, so as to obtain target training data of the any monitoring point and any monitoring period;
and adding the target training data of any monitoring point and any monitoring period to the target training data set to determine the target training data set.
In another embodiment, when determining whether a matched weather data vector group matching the weather data vector to be predicted exists in the K groups of weather data vectors based on the weather data vector to be predicted and the K groups of weather data vectors, the processing unit 602 may be specifically configured to:
Traversing each meteorological data vector group in the K groups of meteorological data vectors, and taking the currently traversed meteorological data vector group as a current meteorological data vector group;
determining a current central meteorological data vector corresponding to the current meteorological data vector group, and calculating the similarity between the meteorological data vector to be predicted and the current central meteorological data vector;
If the similarity between the weather data vector to be predicted and the current central weather data vector is larger than a preset similarity threshold, using the current weather data vector group as the matched weather data vector group matched with the weather data vector to be predicted, thereby determining that the matched weather data vector group exists in the K groups of weather data vectors, and ending the traversal;
After traversing each meteorological data vector group in the K groups of meteorological data vectors, if the matched meteorological data vector group is not traversed, determining that the matched meteorological data vector group does not exist in the K groups of meteorological data vectors.
In another embodiment, the initial pollution prediction result includes predicted concentration data of the target area within each monitoring period in the second time range, one predicted concentration data including a concentration value of each of at least one pollutant; the processing unit 602 may be specifically configured to, when determining the target pollution prediction result based on the history pollution data under the matching history pollution process and the initial pollution prediction result:
Determining a designated pollutant from the at least one pollutant, and determining Y pollutant concentration values of the designated pollutant from the historical pollution data under the matched historical pollution process, wherein one pollutant concentration value is determined based on the concentration value of the designated pollutant in a monitoring period included in the historical pollution data under the matched historical pollution process, and Y is a positive integer;
determining V predicted concentration values of the specified pollutant from the initial pollution prediction result, wherein one predicted concentration value is determined based on the concentration value of the specified pollutant in a monitoring period included in the initial pollution prediction result, and V is a positive integer;
calculating a deviation index based on the mean of the Y pollutant concentration values and the mean of the V predicted concentration values;
If the deviation index is larger than a preset index threshold, determining a correction coefficient, and correcting the concentration value of the specified pollutant in the initial pollution prediction result by adopting the correction coefficient to obtain a target pollution prediction result, so that the target pollution prediction result comprises corrected concentration values of the specified pollutant in each monitoring period in the second time range;
And if the deviation index is smaller than or equal to the preset index threshold, taking the concentration value of the specified pollutant in the initial pollution prediction result as the concentration value of the specified pollutant in the target pollution prediction result so as to determine the target pollution prediction result.
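The correction logic above can be illustrated with a short sketch. The formulas are assumptions, because this passage does not define them: the deviation index is taken here as the relative difference between the two means, and the correction coefficient as the ratio of the historical mean to the predicted mean. The function name `correct_prediction` is hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def correct_prediction(
    historical: Sequence[float],
    predicted: Sequence[float],
    index_threshold: float,
) -> List[float]:
    """Return the target concentration values for the designated pollutant.

    historical: the Y pollutant concentration values from the matched process.
    predicted:  the V predicted concentration values from the initial result.
    """
    hist_mean = fmean(historical)
    pred_mean = fmean(predicted)
    # Guard against division by zero; the prediction is then left unchanged.
    if hist_mean == 0.0 or pred_mean == 0.0:
        return list(predicted)
    # Assumed deviation index: relative difference between the two means.
    deviation_index = abs(pred_mean - hist_mean) / hist_mean
    if deviation_index > index_threshold:
        # Assumed correction coefficient: rescale toward the historical mean.
        coefficient = hist_mean / pred_mean
        return [value * coefficient for value in predicted]
    # Within tolerance: the initial prediction is used as the target result.
    return list(predicted)
```

For example, with historical values `[100, 120]`, predicted values `[50, 60]`, and a threshold of `0.2`, the assumed deviation index is 0.5, so every predicted value is scaled by 2.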
In another embodiment, the K sets of weather data vectors are determined based on historical pollution process data, the historical pollution process data comprising historical pollution data for each of a plurality of historical pollution processes, one pollution process comprising a growth time range and a maintenance time range, and one pollution data comprising data for the corresponding pollution process over the growth time range and the maintenance time range, the data for one time range comprising weather data and pollutant concentration data for each of a plurality of monitoring periods; the acquisition unit 601 may further be configured to:
Acquiring an air quality data set, wherein the air quality data set comprises air quality indication data of each area in at least one area, and one air quality indication data comprises at least one air quality indication information of each monitoring period of one area in a time range;
Traversing each air quality data in the air quality data set, taking the currently traversed air quality data as current air quality data, and judging whether a current pollution process corresponding to the current air quality data exists or not based on the current air quality data;
if the current pollution process exists, taking the current pollution process as a historical pollution process, and adding pollution data under the current pollution process into the historical pollution process data so that the pollution data under the current pollution process is taken as historical pollution data in the historical pollution process data;
And after traversing each air quality data in the air quality data set, obtaining the historical pollution process data.
In another embodiment, when determining whether there is a current pollution process corresponding to the current air quality data based on the current air quality data, the obtaining unit 601 may be specifically configured to:
Detecting whether a pollution time range exists in a time range corresponding to the current air quality data based on the current air quality data; wherein, the pollution time range refers to: the time length is greater than a preset time length threshold value, and the air quality indication information in each included monitoring period is greater than a time range of the preset indication information threshold value, or the pollution time range refers to: the time length is larger than the preset time length threshold value, and the ratio of the target air quality indication information in any monitoring period is larger than the time range of the preset ratio threshold value, wherein one target air quality indication information refers to the air quality indication information which is larger than the preset indication information threshold value, and the ratio of the target air quality indication information refers to the ratio between the number of the target air quality indication information and the number of the air quality indication information in any monitoring period;
if the pollution time range exists, determining that a current pollution process corresponding to the current air quality data exists;
if the pollution time range does not exist, determining that the current pollution process does not exist.
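The detection of a pollution time range can be sketched as a scan for a sufficiently long run of qualifying monitoring periods. As an assumption for illustration, the time length is measured in number of monitoring periods, and all names below are hypothetical. The two predicates correspond to the two alternative definitions given above.

```python
from typing import Callable, Optional, Sequence, Tuple


def all_exceed(values: Sequence[float], info_threshold: float) -> bool:
    """First definition: every indication in the period exceeds the threshold."""
    return all(v > info_threshold for v in values)


def ratio_exceeds(
    values: Sequence[float], info_threshold: float, ratio_threshold: float
) -> bool:
    """Second definition: the share of exceeding indications is large enough.

    Each period is expected to hold at least one indication value.
    """
    return sum(v > info_threshold for v in values) / len(values) > ratio_threshold


def find_pollution_range(
    periods: Sequence[Sequence[float]],
    qualifies: Callable[[Sequence[float]], bool],
    min_periods: int,
) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices of the first qualifying run whose
    length is greater than min_periods, or None if there is no such run."""
    start: Optional[int] = None
    for i, values in enumerate(periods):
        if qualifies(values):
            if start is None:
                start = i
        else:
            if start is not None and i - start > min_periods:
                return (start, i - 1)
            start = None
    # A run may extend to the last monitoring period.
    if start is not None and len(periods) - start > min_periods:
        return (start, len(periods) - 1)
    return None
```

A non-`None` result corresponds to determining that a current pollution process exists for the current air quality data.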
In another embodiment, the obtaining unit 601 may further be configured to:
Determining a maintenance period starting point and a maintenance period ending point of the current pollution process, and determining an increase period starting point and an increase period ending point of the current pollution process;
determining a current growing time range under the current pollution process by adopting the starting point of the growing period and the ending point of the growing period, and determining a current maintaining time range under the current pollution process by adopting the starting point of the maintaining period and the ending point of the maintaining period;
Determining the region corresponding to the current air quality data, and obtaining pollution data in the current pollution process in a time range formed by the current growing time range and the current maintaining time range, wherein the pollution data in the current pollution process comprises the data in the current growing time range and the data in the current maintaining time range.
According to one embodiment of the invention, the steps involved in the method of fig. 1 or 4 may be performed by the units of the pollution prediction device of fig. 6. For example, step S101 shown in fig. 1 may be performed by the acquisition unit 601 shown in fig. 6, step S102 may be performed by the acquisition unit 601 and the processing unit 602 shown in fig. 6 together, and steps S103 to S105 may each be performed by the processing unit 602 shown in fig. 6. As another example, steps S401 and S405 shown in fig. 4 may each be performed by the acquisition unit 601 shown in fig. 6, step S402 may be performed by the acquisition unit 601 and the processing unit 602 shown in fig. 6 together, steps S403, S404, S406, and S407 may each be performed by the processing unit 602 shown in fig. 6, and so on.
According to another embodiment of the present invention, each unit in the pollution prediction device shown in fig. 6 may be separately or completely combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present invention. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present invention, any pollution prediction device may also include other units, and in practical applications, these functions may also be implemented with assistance from other units, and may be implemented by cooperation of a plurality of units.
According to another embodiment of the present invention, a pollution prediction apparatus as shown in fig. 6 may be constructed, and a pollution prediction method of an embodiment of the present invention may be implemented, by running a computer program (including program code) capable of executing the steps involved in the respective methods shown in fig. 1 or 4 on a general-purpose electronic device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read only storage medium (ROM), and the like. The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described electronic device through the computer storage medium.
Based on the description of the method embodiment and the apparatus embodiment, the exemplary embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to an embodiment of the invention when executed by the at least one processor.
The exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the invention.
Referring to fig. 7, a block diagram of an electronic device 700, which may serve as a server or a client of the present invention, will now be described; it is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700, and the input unit 706 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 708 may include, but is not limited to, magnetic disks, optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the pollution prediction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the pollution prediction method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is also to be understood that the foregoing is merely illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (12)
1. A method of pollution prediction, comprising:
Obtaining K groups of meteorological data vectors, wherein one meteorological data vector is determined based on historical pollution data in a historical pollution process, and K is a positive integer;
Acquiring initial weather data to be predicted, and determining weather data vectors to be predicted corresponding to the initial weather data to be predicted, wherein the initial weather data to be predicted comprises weather data of a target area in a first time range, and the determining mode of the weather data vectors to be predicted is the same as the determining mode of each weather data vector in the K groups of weather data vectors;
Judging whether a matched weather data vector group matched with the weather data vector to be predicted exists in the K groups of weather data vectors based on the weather data vector to be predicted and the K groups of weather data vectors;
If the matched meteorological data vector group exists in the K groups of meteorological data vectors, determining a matched historical pollution process matched with the meteorological data vector to be predicted based on the matched meteorological data vector group, and determining target to-be-predicted meteorological data corresponding to the initial to-be-predicted meteorological data according to the duration of the matched historical pollution process, wherein the target to-be-predicted meteorological data comprises meteorological data of the target area in a second time range, and the corresponding duration of the second time range is the duration;
Invoking a pollution process concentration prediction model, performing pollution prediction based on the target weather data to be predicted to obtain an initial pollution prediction result, and determining a target pollution prediction result based on the history pollution data under the matched history pollution process and the initial pollution prediction result, wherein the method comprises the following steps:
Determining a designated pollutant from at least one pollutant, and determining Y pollutant concentration values of the designated pollutant from the historical pollution data under the matched historical pollution process, wherein one pollutant concentration value is determined based on the concentration value of the designated pollutant in a monitoring period included in the historical pollution data under the matched historical pollution process, and Y is a positive integer;
determining V predicted concentration values of the specified pollutant from the initial pollution prediction result, wherein one predicted concentration value is determined based on the concentration value of the specified pollutant in a monitoring period included in the initial pollution prediction result, and V is a positive integer;
calculating a deviation index based on the mean of the Y pollutant concentration values and the mean of the V predicted concentration values;
If the deviation index is larger than a preset index threshold, determining a correction coefficient, and correcting the concentration value of the specified pollutant in the initial pollution prediction result by adopting the correction coefficient to obtain a target pollution prediction result, so that the target pollution prediction result comprises corrected concentration values of the specified pollutant in each monitoring period in the second time range;
And if the deviation index is smaller than or equal to the preset index threshold, taking the concentration value of the specified pollutant in the initial pollution prediction result as the concentration value of the specified pollutant in the target pollution prediction result so as to determine the target pollution prediction result.
2. The method of claim 1, wherein the obtaining K sets of meteorological data vectors comprises:
Acquiring historical pollution process data, wherein the historical pollution process data comprises historical pollution data under each historical pollution process in a plurality of historical pollution processes, one pollution process comprises an increase time range and a maintenance time range, one pollution data comprises data of the corresponding pollution process in the increase time range and data in the maintenance time range, and the data in one time range comprises meteorological data and pollutant concentration data in each monitoring period in a plurality of monitoring periods;
Determining weather data vectors under each historical pollution process based on weather data in an increasing time range in each historical pollution data respectively to obtain N weather data vectors, wherein N is the number of the historical pollution processes in the plurality of historical pollution processes;
and performing unsupervised learning classification processing on the N meteorological data vectors to obtain K groups of meteorological data vectors.
3. The method of claim 2, wherein a weather data vector is determined in accordance with a target determination, a determination indicating a weather element determination for each of the at least one weather element, a weather element determination comprising any of: mean value determination mode, maximum value determination mode and highest frequency determination mode; the determining the weather data vector under each historical pollution process based on the weather data in the growth time range in each historical pollution data respectively comprises the following steps:
For any one of the plurality of historical pollution processes and any one of the at least one meteorological element, determining P meteorological element results for the any one meteorological element based on meteorological data within a growth time range included in historical pollution data under the any one historical pollution process, one meteorological element result of the P meteorological element results being determined based on meteorological element results of the any one meteorological element within a corresponding monitoring period, P being the number of monitoring periods in the growth time range under the any one historical pollution process;
If the weather element determining mode of any weather element comprises the average value determining mode, taking the average value among the P weather element results as the weather element result in the weather data vector of any weather element in the history pollution process; or alternatively
If the weather element determining mode of any weather element comprises the maximum value determining mode, taking the maximum value or the minimum value in the P weather element results as the weather element result in the weather data vector of any weather element in the history pollution process; or alternatively
If the weather element determining mode of any weather element comprises the highest frequency determining mode, counting the P weather element results to obtain the frequency of each statistical weather element result in at least one statistical weather element result, and taking the statistical weather element result with the largest frequency in the at least one statistical weather element result as the weather element result in the weather data vector of any weather element in the history pollution process.
4. A method according to any one of claims 1-3, wherein the pollution process concentration prediction model is trained based on data within a maintenance time frame comprised by each of the historical pollution process data, the data within a time frame comprising meteorological data and pollutant concentration data within each of a plurality of monitoring periods, the method further comprising:
Acquiring an initial concentration prediction model, and determining a target training data set based on data in a maintenance time range included in each historical pollution data;
and carrying out model training on the initial concentration prediction model by adopting the target training data set to obtain the pollution process concentration prediction model.
5. The method of claim 4, wherein the meteorological data and contaminant concentration data for a monitoring period comprises: weather data and contaminant concentration data for each of at least one monitoring point over a respective monitoring period, a target training data set comprising a training data and respective label data, said determining the target training data set based on data within a maintenance time range comprised by said respective historical contaminant data, comprising:
Determining weather data and pollutant concentration data of any monitoring point in any monitoring period from any historical pollution data of any historical pollution process data, any monitoring point in a corresponding area of any historical pollution data and any monitoring period in a maintenance time range in any historical pollution data;
taking the meteorological data of any monitoring point in any monitoring period as training data of the any monitoring point and any monitoring period, and taking the pollution concentration data of the any monitoring point in any monitoring period as tag data of the any monitoring point and any monitoring period, so as to obtain target training data of the any monitoring point and any monitoring period;
and adding the target training data of any monitoring point and any monitoring period to the target training data set to determine the target training data set.
6. A method according to any one of claims 1-3, wherein said determining whether there is a set of matching meteorological data vectors in the K sets of meteorological data vectors that match the meteorological data vectors to be predicted based on the meteorological data vectors to be predicted and the K sets of meteorological data vectors comprises:
Traversing each meteorological data vector group in the K groups of meteorological data vectors, and taking the currently traversed meteorological data vector group as a current meteorological data vector group;
determining a current central meteorological data vector corresponding to the current meteorological data vector group, and calculating the similarity between the meteorological data vector to be predicted and the current central meteorological data vector;
If the similarity between the weather data vector to be predicted and the current central weather data vector is larger than a preset similarity threshold, using the current weather data vector group as a matched weather data vector group matched with the weather data vector to be predicted so as to realize the determination that the matched weather data vector group exists in the K groups of weather data vectors, and ending traversal;
After traversing each meteorological data vector group in the K groups of meteorological data vectors, if the matched meteorological data vector group is not traversed, determining that the matched meteorological data vector group does not exist in the K groups of meteorological data vectors.
7. A method according to any one of claims 1-3, wherein the K sets of weather data vectors are determined based on historical pollution process data, the historical pollution process data comprising historical pollution data for each of a plurality of historical pollution processes, one pollution process comprising a growth time range and a maintenance time range, and one pollution data comprising data for the corresponding pollution process over the growth time range and data over the maintenance time range, the data over the one time range comprising weather data and pollutant concentration data for each of a plurality of monitoring periods; the method for acquiring the historical pollution process data comprises the following steps:
Acquiring an air quality data set, wherein the air quality data set comprises air quality indication data of each area in at least one area, and one air quality indication data comprises at least one air quality indication information of each monitoring period of one area in a time range;
Traversing each air quality data in the air quality data set, taking the currently traversed air quality data as current air quality data, and judging whether a current pollution process corresponding to the current air quality data exists or not based on the current air quality data;
if the current pollution process exists, taking the current pollution process as a historical pollution process, and adding pollution data under the current pollution process into the historical pollution process data so that the pollution data under the current pollution process is taken as historical pollution data in the historical pollution process data;
And after traversing each air quality data in the air quality data set, obtaining the historical pollution process data.
8. The method of claim 7, wherein the determining whether a current pollution process corresponding to the current air quality data exists based on the current air quality data comprises:
Detecting whether a pollution time range exists in a time range corresponding to the current air quality data based on the current air quality data; wherein, the pollution time range refers to: the time length is greater than a preset time length threshold value, and the air quality indication information in each included monitoring period is greater than a time range of the preset indication information threshold value, or the pollution time range refers to: the time length is larger than the preset time length threshold value, and the ratio of the target air quality indication information in any monitoring period is larger than the time range of the preset ratio threshold value, wherein one target air quality indication information refers to the air quality indication information which is larger than the preset indication information threshold value, and the ratio of the target air quality indication information refers to the ratio between the number of the target air quality indication information and the number of the air quality indication information in any monitoring period;
if the pollution time range exists, determining that a current pollution process corresponding to the current air quality data exists;
if the pollution time range does not exist, determining that the current pollution process does not exist.
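The two definitions of a pollution time range in claim 8 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: each monitoring period is assumed to carry a single air quality indication value (for example an hourly AQI), duration is counted in monitoring periods, and the ratio-based variant is simplified to a ratio over the whole candidate window. The function names and all parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_pollution_ranges(
    aqi: Sequence[float], aqi_threshold: float, min_periods: int
) -> List[Tuple[int, int]]:
    """First definition: maximal runs in which every value exceeds the threshold.

    Returns (start, end) index pairs with `end` exclusive; only runs longer
    than `min_periods` monitoring periods are kept.
    """
    ranges: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(aqi):
        if value > aqi_threshold:
            if start is None:
                start = i  # a new run of polluted periods begins
        elif start is not None:
            if i - start > min_periods:
                ranges.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(aqi) - start > min_periods:
        ranges.append((start, len(aqi)))
    return ranges


def exceeds_ratio(
    aqi: Sequence[float],
    aqi_threshold: float,
    ratio_threshold: float,
    min_periods: int,
) -> bool:
    """Second definition (simplified): share of above-threshold values in a window."""
    if len(aqi) <= min_periods:
        return False
    above = sum(1 for value in aqi if value > aqi_threshold)
    return above / len(aqi) > ratio_threshold
```

Under these assumptions, a current pollution process would be deemed to exist when `find_pollution_ranges` returns a non-empty list or `exceeds_ratio` returns `True` for the window under test.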
9. The method of claim 7, wherein the method further comprises:
determining a maintenance period start point and a maintenance period end point of the current pollution process, and determining a growth period start point and a growth period end point of the current pollution process;
determining a current growth time range under the current pollution process from the growth period start point and the growth period end point, and determining a current maintenance time range under the current pollution process from the maintenance period start point and the maintenance period end point;
determining the region corresponding to the current air quality data, and obtaining the pollution data under the current pollution process within the time range formed by the current growth time range and the current maintenance time range, wherein the pollution data under the current pollution process comprises the data within the current growth time range and the data within the current maintenance time range.
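As a rough illustration of the extraction step in claim 9, the sketch below selects the observations of one region that fall in the growth and maintenance time ranges. The claim does not state how the four boundary points are found, so they are taken here as given inputs. The `Observation` record, the function name, and the choice to treat the growth range as half-open (so that an observation at the shared boundary is counted only once) are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence


@dataclass
class Observation:
    """Hypothetical record: one monitoring period of one region."""
    time: datetime
    region: str
    concentration: float


def extract_process_data(
    observations: Sequence[Observation],
    region: str,
    growth_start: datetime,
    growth_end: datetime,
    maintenance_start: datetime,
    maintenance_end: datetime,
) -> Dict[str, List[Observation]]:
    """Split a region's observations into growth-range and maintenance-range data."""
    in_region = [o for o in observations if o.region == region]
    # Growth range is half-open [start, end); maintenance range is closed.
    growth = [o for o in in_region if growth_start <= o.time < growth_end]
    maintenance = [
        o for o in in_region if maintenance_start <= o.time <= maintenance_end
    ]
    return {"growth": growth, "maintenance": maintenance}
```

The returned dictionary holds both parts, mirroring the claim's statement that the pollution data comprises the data in the growth time range and the data in the maintenance time range.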
10. A pollution prediction device, the device comprising:
an acquisition unit, configured to acquire K groups of meteorological data vectors, wherein each meteorological data vector is determined based on historical pollution data under a historical pollution process, and K is a positive integer; and to acquire initial meteorological data to be predicted;
a processing unit, configured to determine a to-be-predicted meteorological data vector corresponding to the initial meteorological data to be predicted, wherein the initial meteorological data to be predicted comprises meteorological data of a target area within a first time range, and the to-be-predicted meteorological data vector is determined in the same manner as each meteorological data vector in the K groups of meteorological data vectors;
the processing unit being further configured to determine, based on the to-be-predicted meteorological data vector and the K groups of meteorological data vectors, whether a matching meteorological data vector group that matches the to-be-predicted meteorological data vector exists among the K groups of meteorological data vectors;
the processing unit being further configured to determine, based on the matching meteorological data vector group, a matching historical pollution process that matches the to-be-predicted meteorological data vector, and to determine, according to the duration of the matching historical pollution process, target meteorological data to be predicted corresponding to the initial meteorological data to be predicted, wherein the target meteorological data to be predicted comprises meteorological data of the target area within a second time range, and the duration corresponds to the second time range;
the processing unit being further configured to invoke a pollution process concentration prediction model to perform pollution prediction based on the target meteorological data to be predicted, obtaining an initial pollution prediction result, and to determine a target pollution prediction result based on the historical pollution data under the matching historical pollution process and the initial pollution prediction result, wherein the determining comprises:
determining a specified pollutant from at least one pollutant, and determining Y pollutant concentration values of the specified pollutant from the historical pollution data under the matching historical pollution process, wherein each pollutant concentration value is determined based on the concentration value of the specified pollutant in one monitoring period included in that historical pollution data, and Y is a positive integer;
determining V predicted concentration values of the specified pollutant from the initial pollution prediction result, wherein each predicted concentration value is determined based on the concentration value of the specified pollutant in one monitoring period included in the initial pollution prediction result, and V is a positive integer;
calculating a deviation index based on the mean of the Y pollutant concentration values and the mean of the V predicted concentration values;
if the deviation index is greater than a preset index threshold, determining a correction coefficient and using it to correct the concentration values of the specified pollutant in the initial pollution prediction result to obtain the target pollution prediction result, so that the target pollution prediction result comprises the corrected concentration value of the specified pollutant in each monitoring period within the second time range; and
if the deviation index is less than or equal to the preset index threshold, taking the concentration values of the specified pollutant in the initial pollution prediction result as the concentration values of the specified pollutant in the target pollution prediction result, thereby determining the target pollution prediction result.
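The deviation check and correction at the end of claim 10 can be sketched in Python. The claim text here states only that the deviation index is based on the two means and that a correction coefficient is applied when the index exceeds a threshold; it does not give formulas. The sketch therefore assumes, purely for illustration, a relative deviation `|mean_pred - mean_hist| / mean_hist` and a correction coefficient `mean_hist / mean_pred`. The function name and parameter names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def correct_prediction(
    historical: Sequence[float],
    predicted: Sequence[float],
    index_threshold: float,
) -> List[float]:
    """Return target concentrations for the specified pollutant.

    `historical` holds the Y concentration values from the matching historical
    pollution process; `predicted` holds the V values from the initial result.
    The deviation and coefficient formulas are assumptions, not the patent's.
    """
    hist_mean = mean(historical)
    pred_mean = mean(predicted)
    if hist_mean <= 0 or pred_mean <= 0:
        raise ValueError("this sketch assumes strictly positive mean concentrations")

    # Assumed deviation index: relative gap between the two means.
    deviation_index = abs(pred_mean - hist_mean) / hist_mean
    if deviation_index > index_threshold:
        # Assumed correction: rescale predictions so their mean matches history.
        coefficient = hist_mean / pred_mean
        return [value * coefficient for value in predicted]
    # Within tolerance: the initial prediction is used unchanged.
    return list(predicted)
```

For example, with historical values `[100, 120, 140]`, predicted values `[50, 60, 70]` and a threshold of 0.2, the assumed deviation index is 0.5, so every predicted value is scaled by 2.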
11. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1-9.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410095360.8A CN117933466B (en) | 2024-01-23 | 2024-01-23 | Pollution prediction method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117933466A (en) | 2024-04-26 |
CN117933466B (en) | 2024-08-20 |
Family
ID=90766002
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202410095360.8A (CN117933466B) | Active | 2024-01-23 | 2024-01-23 | Pollution prediction method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117933466B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304610A (en) * | 2017-12-22 | 2018-07-20 | 中山大学 | A kind of air high pollution process dynamics method for tracing |
CN113632101A (en) * | 2018-08-25 | 2021-11-09 | 山东诺方电子科技有限公司 | Method for predicting atmospheric pollution through vectorization analysis |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108053071A (en) * | 2017-12-21 | 2018-05-18 | 宇星科技发展(深圳)有限公司 | Regional air pollutant concentration Forecasting Methodology, terminal and readable storage medium storing program for executing |
CN109142171B (en) * | 2018-06-15 | 2021-08-03 | 上海师范大学 | Urban PM10 Concentration Prediction Method Based on Feature Dilation Fusion Neural Network |
CN113077097B (en) * | 2021-04-14 | 2023-08-25 | 江南大学 | An air quality prediction method based on deep spatio-temporal similarity |
CN116739189A (en) * | 2023-08-14 | 2023-09-12 | 中科三清科技有限公司 | Transmission tracing method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN117933466A (en) | 2024-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818023B (en) | Big data analysis method and cloud computing server in associated cloud service scene | |
CN112488183B (en) | Model optimization method, device, computer equipment and storage medium | |
US20240127795A1 (en) | Model training method, speech recognition method, device, medium, and apparatus | |
CN112101172A (en) | Weight grafting-based model fusion face recognition method and related equipment | |
CN112396613A (en) | Image segmentation method and device, computer equipment and storage medium | |
CN116882321B (en) | Meteorological influence quantitative evaluation method and device, storage medium and electronic equipment | |
CN116756522B (en) | Probability forecasting method and device, storage medium and electronic equipment | |
CN117520907A (en) | Abnormal data detection method, device and storage medium | |
CN115759413A (en) | Meteorological prediction method and device, storage medium and electronic equipment | |
CN113888381A (en) | Pollutant concentration forecasting method and device | |
CN115270013B (en) | Method and device for evaluating emission reduction measures during activity and electronic equipment | |
CN114936323B (en) | Training method and device of graph representation model and electronic equipment | |
CN117933466B (en) | Pollution prediction method and device, storage medium and electronic equipment | |
CN116776073B (en) | Pollutant concentration evaluation method and device | |
CN114490965B (en) | Question processing method and device, electronic equipment and storage medium | |
CN111241297A (en) | Map data processing method and device based on label propagation algorithm | |
CN116739189A (en) | Transmission tracing method and device, storage medium and electronic equipment | |
CN111582456B (en) | Method, apparatus, device and medium for generating network model information | |
CN113779335A (en) | Information generation method and device, electronic equipment and computer readable medium | |
CN115099875A (en) | Data classification method based on decision tree model and related equipment | |
CN115630979A (en) | Day-ahead electricity price prediction method and device, storage medium and computer equipment | |
CN113139673A (en) | Method, device, terminal and storage medium for predicting air quality | |
CN118690144B (en) | Probability forecasting method and device, storage medium and electronic equipment | |
CN117079711B (en) | Biological aerosol diffusion simulation method and device, storage medium and electronic equipment | |
CN118734205B (en) | A wind speed prediction method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||