CN117577227B - PM2.5 point location high value identification method, system, equipment and medium - Google Patents

PM2.5 point location high value identification method, system, equipment and medium

Info

Publication number
CN117577227B
Authority
CN
China
Prior art keywords
point
data
monitoring
historical data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410058043.9A
Other languages
Chinese (zh)
Other versions
CN117577227A (en
Inventor
刘保献
王莉华
沈秀娥
王小菊
王欣
李云婷
景宽
安青青
姜南
张立坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ecological Environment Monitoring Center
Original Assignee
Beijing Ecological Environment Monitoring Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ecological Environment Monitoring Center
Priority to CN202410058043.9A
Publication of CN117577227A
Application granted
Publication of CN117577227B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions

Abstract

The invention provides a PM2.5 point location high value identification method, system, equipment and medium, and relates to the technical field of environmental quality monitoring. The method comprises the following steps: determining all monitoring points of the area to be monitored, and calculating the historical data correlation coefficient between every two monitoring points according to the PM2.5 historical data and PM10 historical data of each monitoring point; among all the monitoring points, acquiring a plurality of similar point sets of a target monitoring point according to the correlation coefficients and the distances between the target monitoring point and the remaining points; constructing a machine learning model, and obtaining a PM2.5 data prediction range of the target monitoring point through the model; and comparing the actual monitoring data of the target monitoring point with the upper limit value of the PM2.5 data prediction range to identify whether the current PM2.5 data of the target monitoring point is a point location high value. The method improves the accuracy of the PM2.5 point location high value identification result.

Description

PM2.5 point location high value identification method, system, equipment and medium
Technical Field
The invention relates to the technical field of environmental quality monitoring, in particular to a PM2.5 point location high value identification method, a system, equipment and a medium.
Background
In atmospheric environment quality monitoring, point location high values are identified by deploying ground monitoring stations and analyzing hourly monitoring data; the high values are then combined with other pollution source information and with feedback from on-site environmental protection inspections. This workflow is widely applied in environmental protection supervision. At present, high-value data at monitoring points is mainly identified through various fixed rules, for example: exceeding a fixed threshold, or the difference from the surrounding points exceeding a threshold.
In an actual monitoring scenario, one city may have two types of monitoring points whose monitoring principles or equipment maintenance conditions differ because they come from different manufacturers. If a conventional high-value identification algorithm based on fixed rules is applied directly to such points, these objective factors may cause structural differences in the identification results. In addition, the specific situations of the monitoring points are complex (for example, points of different types and under different maintenance conditions would have to be differentiated manually), which makes a high-value identification algorithm based on fixed rules difficult to implement.
Disclosure of Invention
Therefore, the embodiment of the application provides a PM2.5 point location high value identification method based on a prediction algorithm, so as to achieve the purpose of improving the accuracy of PM2.5 point location high value identification results.
The embodiment of the application provides the following technical scheme: a PM2.5 point location high value identification method comprises the following steps:
Determining all monitoring points in the to-be-monitored area, and respectively calculating historical data correlation coefficients between every two monitoring points in all the monitoring points according to PM2.5 historical data and PM10 historical data of each monitoring point;
In all the monitoring points, respectively acquiring a plurality of similar point sets of the target monitoring points according to the historical data correlation coefficient between the target monitoring points and the rest of the monitoring points and the distance between the target monitoring points and the rest of the monitoring points;
According to the historical data of the target monitoring point, the historical data and the actual monitoring data of each point in the plurality of similar point sets, a machine learning model is constructed, and a PM2.5 data prediction range of the target monitoring point is obtained through model prediction;
and comparing the actual monitoring data of the target monitoring point position with the upper limit value of the PM2.5 data prediction range, and identifying whether the current PM2.5 data of the target monitoring point position is a point position high value.
According to one embodiment of the application, the PM2.5 historical data includes PM2.5 data of the past 1 year and PM2.5 data of the previous 1 month, and the PM10 historical data includes PM10 data of the past 1 year and PM10 data of the previous 1 month.
According to one embodiment of the present application, the historical data correlation coefficient is an average value of the correlation coefficient of PM2.5 historical data and the correlation coefficient of PM10 historical data;
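Wherein, the calculation formula of the correlation coefficient is as follows:
$$r_{mn} = \frac{\sum_{k=1}^{p} (x_{m,k} - \bar{x}_m)(x_{n,k} - \bar{x}_n)}{\sqrt{\sum_{k=1}^{p} (x_{m,k} - \bar{x}_m)^2}\,\sqrt{\sum_{k=1}^{p} (x_{n,k} - \bar{x}_n)^2}}$$
Wherein $r_{mn}$ is the correlation coefficient, $x_{m,k}$ is the kth historical data of the mth point, $p$ is the total hours of the historical data, $\bar{x}_m$ is the historical data average of the mth point, $x_{n,k}$ is the kth historical data of the nth point, and $\bar{x}_n$ is the historical data average of the nth point.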
According to an embodiment of the present application, in all the monitoring points, a plurality of similar point sets of the target monitoring points are respectively obtained according to the historical data correlation coefficient between the target monitoring point and the rest of the monitoring points and the distance between the target monitoring point and the rest of the monitoring points, including:
among all monitoring points, ordering all the historical data correlation coefficients between a target monitoring point and other monitoring points from high to low, obtaining a set number of corresponding monitoring points with the historical data correlation coefficients ordered at the front, and taking the corresponding monitoring points as a similar point set A of the target monitoring point;
Among all the monitoring points, the distances between the target monitoring point and the rest monitoring points are sequenced according to the sequence from near to far, and the set number of monitoring points with the distances within a set first threshold range are taken as a similar point set B of the target monitoring points; and taking the set number of monitoring points with the distance within a set second threshold range as a similar point set C of the target monitoring points.
According to one embodiment of the present application, a machine learning model is constructed according to historical data of the target monitoring point location, and historical data and actual monitoring data of each point location in the plurality of similar point location sets, and a PM2.5 data prediction range of the target monitoring point location is obtained through model prediction, including:
respectively training a plurality of machine learning models by taking at least one type of data of the historical data of the target monitoring point, the historical data of each point in the plurality of similar point sets and the actual monitoring data of each point as input data to obtain a plurality of prediction models;
And obtaining a plurality of prediction results through the plurality of prediction models, taking the second-highest value among the plurality of prediction results, and multiplying the second-highest value by a coefficient a to obtain the upper limit value of the PM2.5 data prediction range.
According to one embodiment of the application, the machine learning model is a machine learning model constructed based on XGBoost algorithm.
According to an embodiment of the present application, comparing the actual monitoring data of the target monitoring point location with the upper limit value of the PM2.5 data prediction range, and identifying whether the current PM2.5 data of the target monitoring point location is a point location high value includes:
If the PM2.5 hour data of the target monitoring point is higher than the upper limit value of the PM2.5 data prediction range, judging the current PM2.5 data of the target monitoring point as a class 1 high value;
If the PM2.5 data of the target monitoring point position for at least 3 continuous hours are all judged to be the class 1 high value, judging the current PM2.5 data of the target monitoring point position to be the class 2 high value;
And if the PM2.5 data of the target monitoring point and the monitoring point within the range of 3km around the target monitoring point are both judged to be the class 1 high value, judging the current PM2.5 data of the target monitoring point to be the class 3 high value.
The application also provides a PM2.5 point location high value identification system, which comprises:
The calculation module is used for respectively calculating historical data correlation coefficients between every two monitoring points in all the monitoring points according to the PM2.5 historical data and the PM10 historical data of each monitoring point after determining all the monitoring points in the to-be-detected area;
The similar point position acquisition module is used for respectively acquiring a plurality of similar point position sets of the target monitoring point positions in all the monitoring point positions according to the historical data correlation coefficient between the target monitoring point positions and other monitoring point positions and the distance between the target monitoring point positions and the other monitoring point positions;
The prediction module is used for constructing a machine learning model according to the historical data of the target monitoring point location, the historical data and the actual monitoring data of each point location in the plurality of similar point location sets, and obtaining a PM2.5 data prediction range of the target monitoring point location through model prediction;
And the identification module is used for comparing the actual monitoring data of the target monitoring point position with the upper limit value of the PM2.5 data prediction range and identifying whether the current PM2.5 data of the target monitoring point position is a point position high value or not.
The application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the PM2.5 point position high value identification method is realized when the processor executes the computer program.
The application also provides a computer readable storage medium storing a computer program for executing the PM2.5 point location high value identification method.
Compared with the prior art, the beneficial effects achieved by at least one of the technical schemes adopted in the embodiments of this description include at least the following. Based on historical data such as pollutant and meteorological data, the embodiment of the invention selects point data sets that are similar and related to a target point according to the point type, the current data characteristics of the point, the operation and maintenance conditions and the like, constructs a learning model from these point data sets, predicts a reasonable range for the target point data based on the similar points, and compares the actual point data with the predicted reasonable range to identify PM2.5 point location high values. The invention provides a prediction algorithm based on historical data for identifying PM2.5 point location high values. It avoids classifying and handling points by hand when very many factors must be considered, can be applied to complex and large-scale areas or projects, yields uniform and accurate high value identification results, and is particularly suitable for PM2.5 point location high value identification scenarios in which the regularities are hidden and difficult to capture with general fixed rules.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a PM2.5 point location high value identification method according to an embodiment of the invention;
FIG. 2 is a block diagram of a PM2.5 point high value identification system in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device of the present invention.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. The application may also be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As shown in fig. 1, an embodiment of the present invention provides a PM2.5 point location high value identification method, including:
S101, determining all monitoring points in a to-be-monitored area, and respectively calculating historical data correlation coefficients between every two monitoring points in all the monitoring points according to PM2.5 historical data and PM10 historical data of each monitoring point;
S102, respectively acquiring a plurality of similar point location sets of the target monitoring point locations in all the monitoring point locations according to the historical data correlation coefficient between the target monitoring point locations and other monitoring point locations and the distance between the target monitoring point locations and the other monitoring point locations;
S103, constructing a machine learning model according to the historical data of the target monitoring point, the historical data and the actual monitoring data of each point in the plurality of similar point sets, and obtaining a PM2.5 data prediction range of the target monitoring point through model prediction;
S104, comparing actual monitoring data of the target monitoring point position with the upper limit value of the PM2.5 data prediction range, and identifying whether current PM2.5 data of the target monitoring point position is a point position high value.
The PM2.5 point location high value identification method provided by the embodiment of the invention is based on meteorological data, point PM2.5 data and other pollutant data. First, points with similar data are selected for each point; a machine learning model is then constructed from these points to predict a reasonable range for the PM2.5 data of the target point; finally, the actual point data is compared with the predicted range to judge whether the point data is a high value.
According to one embodiment, in S101, the PM2.5 historical data includes PM2.5 data of the past 1 year and PM2.5 data of the previous 1 month, and the PM10 historical data includes PM10 data of the past 1 year and PM10 data of the previous 1 month.
In the specific implementation, four groups of data are taken for every monitoring point: the PM2.5 data of the past 1 year, the PM10 data of the past 1 year, the PM2.5 data of the previous 1 month (namely the period from 60 days to 30 days before the current moment) and the PM10 data of the previous 1 month. The correlation coefficient of each of the four groups is calculated between every two points, and the four coefficients are averaged to obtain the historical data correlation coefficient.
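Wherein, the calculation formula of the correlation coefficient of a single group is as follows:
$$r_{mn} = \frac{\sum_{k=1}^{p} (x_{m,k} - \bar{x}_m)(x_{n,k} - \bar{x}_n)}{\sqrt{\sum_{k=1}^{p} (x_{m,k} - \bar{x}_m)^2}\,\sqrt{\sum_{k=1}^{p} (x_{n,k} - \bar{x}_n)^2}}$$
Wherein $r_{mn}$ is the correlation coefficient, $x_{m,k}$ is the kth historical data of the mth point, $p$ is the total hours of the historical data, $\bar{x}_m$ is the historical data average of the mth point, $x_{n,k}$ is the kth historical data of the nth point, and $\bar{x}_n$ is the historical data average of the nth point.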
The correlation coefficient is a statistical index that reflects how closely two variables are linearly related. The calculation method of the embodiment of the invention mainly follows the Pearson correlation coefficient: it takes the deviation of each variable from its own mean and multiplies the two deviations together to reflect the degree of correlation between the two variables.
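The averaging of the four single-group coefficients can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary keys (`pm25_year`, `pm10_year`, `pm25_month`, `pm10_month`) are hypothetical names, the two series of each group are assumed to be aligned hour by hour with no missing values, and treating a constant series as uncorrelated is an assumption made only to avoid division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long hourly series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of the deviations from each series' own mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


# Hypothetical keys for the four data groups described in the text.
GROUPS = ("pm25_year", "pm10_year", "pm25_month", "pm10_month")


def historical_correlation(site_m, site_n):
    """Average of the four single-group Pearson coefficients of two sites."""
    return sum(pearson(site_m[g], site_n[g]) for g in GROUPS) / len(GROUPS)
```

In practice, hourly monitoring series contain gaps, so a real pipeline would first drop or impute the hours that are missing at either site; that step is outside this sketch.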
According to one embodiment, in S102, among all the monitoring points, ordering all the historical data correlation coefficients between a target monitoring point and the rest of the monitoring points from high to low, obtaining a set number of corresponding monitoring points with the historical data correlation coefficients ordered at the front, and taking the corresponding monitoring points as a similar point set a of the target monitoring points;
Among all the monitoring points, the distances between the target monitoring point and the rest monitoring points are sequenced according to the sequence from near to far, and the set number of monitoring points with the distances within a set first threshold range are taken as a similar point set B of the target monitoring points; and taking the set number of monitoring points with the distance within a set second threshold range as a similar point set C of the target monitoring points.
Specifically, the similar point set A represents points whose pollution characteristics are similar to those of the target point but which have no necessary spatial relation to it; the similar point set B represents points in the immediate vicinity of the target point; the similar point set C represents points that are still near the target point but slightly farther away. Prediction methods based on the three types of sets perform differently under different conditions. Specific examples are as follows:
1. A point located in a city park may differ considerably in its characteristics from the surrounding points (B, C) while being consistent with the points of set A;
2. A point is adjacent to an isolated fire source, so that only a small area (B) is polluted, while the slightly farther area (C) and the historically similar points (A) show no corresponding feature;
3. In a pollution transport process of a certain scale, the historically similar points (A) do not necessarily show corresponding features, whereas the surrounding area (B) and the slightly farther area (C) do, although with a certain difference in time.
The point sets A, B and C are designed to maintain the prediction performance under these different conditions: a point whose local characteristics differ from its surroundings, a local pollution source, and regional pollution transport.
In the implementation, for each target point, the other points are sorted by the average of their correlation coefficients with the target point (namely, the historical data correlation coefficient), and the 10 points with the highest similarity (excluding the target point itself) form the similar point set A of the target point. For each target point, at most 5 of the closest points whose distance is less than 3km form the point set B. For each target point, at most 10 of the closest points whose distance is greater than or equal to 3km and less than or equal to 5km form the point set C.
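The selection of the three sets can be sketched as follows. This is only an illustration under stated assumptions: the function name `similar_sets` and its parameters are hypothetical, and `corr` and `dist_km` are assumed to be precomputed dictionaries that map every other point (the target itself excluded) to its historical data correlation coefficient and to its distance from the target in kilometres. The patent text does not specify how the distance is computed.

```python
def similar_sets(corr, dist_km, n_a=10, n_b=5, n_c=10, near_km=3.0, far_km=5.0):
    """Return the similar point sets (A, B, C) of one target point.

    corr:    {site_id: historical data correlation coefficient with the target}
    dist_km: {site_id: distance to the target in km}
    Both dictionaries are assumed to exclude the target point itself.
    """
    # Set A: the n_a points with the highest correlation to the target.
    set_a = sorted(corr, key=corr.get, reverse=True)[:n_a]
    # Sort the remaining points from nearest to farthest.
    by_distance = sorted(dist_km, key=dist_km.get)
    # Set B: at most n_b nearest points closer than near_km.
    set_b = [s for s in by_distance if dist_km[s] < near_km][:n_b]
    # Set C: at most n_c nearest points between near_km and far_km (inclusive).
    set_c = [s for s in by_distance if near_km <= dist_km[s] <= far_km][:n_c]
    return set_a, set_b, set_c
```

Sets B and C may contain fewer points than their limits, or be empty, when few stations lie inside the corresponding distance band; any model that consumes them needs a rule for that case, which the text above leaves open.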
According to one embodiment, in S103, a plurality of machine learning models are trained respectively, each using as input data at least one type of data among the historical data of the target monitoring point, the historical data of each point in the plurality of similar point sets, and the actual monitoring data of each point, to obtain a plurality of prediction models; a plurality of prediction results are obtained through the plurality of prediction models, and the second-highest value among the prediction results is multiplied by a coefficient a to obtain the upper limit value of the PM2.5 data prediction range.
In specific implementation, the following 8 prediction models are constructed based on the XGBoost algorithm, with the input features and training data selected as listed in table 1.
Each prediction model is trained separately, and the best model obtained during training is selected by automatic parameter tuning. The specific models are shown in table 1.
TABLE 1
The XGBoost algorithm is a mature machine learning algorithm, so the modeling, the automatic parameter tuning details and the mathematical expressions of the model are not repeated here. However, the input data features of each model and the data set labels used to train it must follow the strict definitions below:
Model M1:
Input data features: the historical data of the target point from 25 hours to 1 hour before the hour to be predicted, and the time (hour number) of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M1 refers only to the historical data of the point itself. It characterizes the pollution variation of the point itself and is insensitive to pollution source emissions and externally transported pollution.
Model M2:
Input data features: the historical data of the target point from 25 hours to 1 hour before the hour to be predicted, the actual monitoring data of the 10 points of set A for the same 25 to 1 hours before, the actual monitoring data of the 5 points of set B for the same 25 to 1 hours before, the actual monitoring data of the 10 points of set C for the same 25 to 1 hours before, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M2 takes all factors into account (the characteristics of the point itself, nearer surrounding pollution sources, farther surrounding pollution sources, and regionally transported pollution). In theory it has the best effect, but it has the worst anti-interference capability (for example, an anomaly in any individual data item may cause an anomaly in the result).
Model M3:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M3 predicts the hourly data of the target point using only the hourly data of the similar points. Like model M1, it predicts accurately when the similar points are not affected by pollution source emissions or externally transported pollution.
Model M4:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted, the actual monitoring data of the 5 points of set B for the hour to be predicted, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M4 uses the data of the similar points and the adjacent points, and can represent the influence on the point of its own characteristics and of small-range local pollution sources.
Model M5:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted, the actual monitoring data of the 5 points of set B for the hour to be predicted, the actual monitoring data of the 10 points of set C for the hour to be predicted, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M5 adds the data of the slightly farther points (set C) on the basis of model M4, and can represent the influence on the point of its own characteristics, of small-range local pollution sources and of regionally transported pollution.
Model M6:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted and for the 25 to 1 hours before it, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M7:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted and for the 25 to 1 hours before it, the actual monitoring data of the 5 points of set B for the hour to be predicted and for the 25 to 1 hours before it, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Model M8:
Input data features: the actual monitoring data of the 10 points of set A for the hour to be predicted and for the 25 to 1 hours before it, the actual monitoring data of the 5 points of set B for the hour to be predicted and for the 25 to 1 hours before it, the actual monitoring data of the 10 points of set C for the hour to be predicted and for the 25 to 1 hours before it, and the time of the hour to be predicted.
Data set label: the actual monitoring data of the target point for the hour to be predicted.
Models M6 to M8 have advantages similar to those of models M3 to M5, but additionally use the historical 24-hour data of the corresponding point sets. Under normal conditions their prediction effect is better than that of models M3 to M5, but their anti-interference capability against abnormal data is weaker.
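How the feature vectors and labels described above might be assembled is sketched below for models M1 and M3. This is a hedged illustration, not the patent's code: all function names are hypothetical, each series is assumed to be a gap-free list of hourly values indexed by hour position `t`, the "25 to 1 hours before" window is read as lags 25 through 1 inclusive (exposed as parameters because the text also speaks of 24 hours of history), and encoding the "time of the hour to be predicted" as an hour-of-day integer is an assumption.

```python
def lag_values(series, t, first_lag=25, last_lag=1):
    """Values from hour t-first_lag to hour t-last_lag, oldest first."""
    if t - first_lag < 0:
        raise ValueError("not enough history before hour index t")
    return [series[t - lag] for lag in range(first_lag, last_lag - 1, -1)]


def features_m1(target_series, t, hour_of_day):
    """M1: the target point's own lagged data plus the hour of day."""
    return lag_values(target_series, t) + [hour_of_day]


def features_m3(set_a_series, t, hour_of_day):
    """M3: set A data at the hour to be predicted plus the hour of day."""
    return [series[t] for series in set_a_series] + [hour_of_day]


def label(target_series, t):
    """Label: the target point's actual value at the hour to be predicted."""
    return target_series[t]
```

The remaining models would combine the same two building blocks: lagged values and current-hour values of the points in sets A, B and C. The resulting rows could then be passed to any gradient-boosted regressor.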
The embodiment of the invention uses 8 types of models because a single model performs differently under different conditions (no pollution, local small-scale pollution, large-scale pollution transport and the like) and its anti-interference capability differs when it faces abnormal conditions. Using the 8 types of models together raises the lower bound of the final prediction effect.
Finally, the second-highest value among all the model prediction results is taken and multiplied by a coefficient a to obtain the upper limit of the predicted value interval. The coefficient a is a constant whose value is obtained by analyzing historical data.
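The upper-limit rule is simple enough to state as a short sketch. The function name is hypothetical, and the value of the coefficient a used in the usage note is purely illustrative, since the text only says that a is a constant derived from historical data.

```python
def prediction_upper_limit(predictions, a):
    """Second-highest model prediction multiplied by the coefficient a."""
    if len(predictions) < 2:
        raise ValueError("at least two model predictions are required")
    # Position-based second highest: if the two largest predictions are
    # equal, the second highest equals the highest.
    second_highest = sorted(predictions, reverse=True)[1]
    return second_highest * a
```

For example, with the eight illustrative predictions `[30, 42, 38, 35, 50, 41, 39, 37]` and an assumed `a = 1.2`, the second-highest prediction is 42, so the upper limit is approximately 50.4. Taking the second-highest rather than the highest value means that a single model producing an outlying prediction cannot set the limit on its own.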
According to one embodiment, in S104, the actual monitoring data of the target monitoring point is compared with the upper limit value of the PM2.5 data prediction range, and it is identified whether the current PM2.5 data of the target monitoring point is a point high value.
In specific implementation, the actual PM2.5 concentration of the point is compared with the upper concentration limit of the predicted interval according to fixed rules (high value judgment rules), and a high value result is then output. The high value judgment rules specifically adopted are as follows:
If the PM2.5 hour data of the target monitoring point is higher than the upper limit value of the PM2.5 data prediction range, judging the current PM2.5 data of the target monitoring point as a class 1 high value;
If the PM2.5 data of the target monitoring point position for at least 3 continuous hours are all judged to be the class 1 high value, judging the current PM2.5 data of the target monitoring point position to be the class 2 high value;
And if the PM2.5 data of the target monitoring point and the monitoring point within the range of 3km around the target monitoring point are both judged to be the class 1 high value, judging the current PM2.5 data of the target monitoring point to be the class 3 high value.
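The three rules above can be sketched as a small classifier over hourly series. This is an illustrative reading, not the patented implementation, and it rests on several assumptions: the function name and arguments are hypothetical; "at least 3 consecutive hours" is read as a run of class 1 hours ending at the current hour; the 3 km rule is read as "at least one neighbouring point is also class 1 in the same hour"; and the classes are treated as non-exclusive.

```python
def classify_high_values(observed, upper, neighbor_class1=None, run_hours=3):
    """Return, for each hour, the set of high value classes that apply.

    observed: hourly PM2.5 values of the target point.
    upper: hourly upper limits of the PM2.5 prediction range.
    neighbor_class1: optional list (one entry per hour) of booleans, one per
        point within 3 km, True where that point is a class 1 high value.
    """
    if len(observed) != len(upper):
        raise ValueError("observed and upper must have the same length")
    results = []
    run = 0  # length of the current run of consecutive class 1 hours
    for t, (obs, lim) in enumerate(zip(observed, upper)):
        is_class1 = obs > lim
        run = run + 1 if is_class1 else 0
        classes = set()
        if is_class1:
            classes.add(1)
            # Class 2: class 1 for at least `run_hours` consecutive hours.
            if run >= run_hours:
                classes.add(2)
            # Class 3 (assumed reading): any neighbour within 3 km is class 1.
            if neighbor_class1 is not None and any(neighbor_class1[t]):
                classes.add(3)
        results.append(classes)
    return results


# Hypothetical data: five hours, a constant upper limit, one neighbour.
observed = [30, 60, 62, 65, 40]
upper_limits = [50] * 5
neighbors = [[False], [False], [True], [False], [False]]
print(classify_high_values(observed, upper_limits, neighbors))
```

Returning a set per hour keeps the three classes independent, so one hour can be both a sustained (class 2) and a spatially corroborated (class 3) high value.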
The embodiment of the invention provides a prediction algorithm based on historical data for identifying PM2.5 point location high values. It handles point location classification and processing where very many factors must be considered, can be applied to complex, large-scale areas or projects to obtain uniform and accurate high value identification results, and is particularly suited to application scenarios in which the high values follow hidden patterns that general fixed rules (high value judgment rules) have difficulty identifying.
As shown in fig. 2, the present application further provides a PM2.5 point location high value identification system 200, including:
The calculation module 201 is configured to calculate, after all the monitoring points in the area to be monitored are determined, the historical data correlation coefficient between every two monitoring points according to the PM2.5 historical data and PM10 historical data of each monitoring point;
The similar point location acquisition module 202 is configured to obtain, from all the monitoring points, a plurality of similar point location sets of the target monitoring point according to the historical data correlation coefficients between the target monitoring point and the remaining monitoring points and the distances between the target monitoring point and the remaining monitoring points;
The prediction module 203 is configured to construct a machine learning model according to the historical data of the target monitoring point and the historical data and actual monitoring data of each point in the plurality of similar point location sets, and to obtain a PM2.5 data prediction range of the target monitoring point through model prediction;
The identification module 204 is configured to compare the actual monitoring data of the target monitoring point with the upper limit value of the PM2.5 data prediction range, and to identify whether the current PM2.5 data of the target monitoring point is a point location high value.
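The work of the calculation module and the similar point location acquisition module can be sketched as below, following the definitions in claim 1 (the historical data correlation coefficient is the average of the PM2.5 and PM10 correlation coefficients; set A is chosen by correlation, sets B and C by distance). The function names, the dictionary inputs, and the reading of sets B and C as "the nearest `count` points within each distance threshold" are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length hourly series."""
    p = len(x)
    if p != len(y) or p < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / p, sum(y) / p
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / (sd_x * sd_y)


def historical_correlation(pm25_m, pm25_n, pm10_m, pm10_n):
    """Average of the PM2.5 and PM10 correlation coefficients of two points."""
    return 0.5 * (pearson(pm25_m, pm25_n) + pearson(pm10_m, pm10_n))


def similar_sets(target, corr, dist_km, count, first_km, second_km):
    """Build similar point sets A, B and C for one target point.

    corr / dist_km: dicts mapping each other point's id to its historical
    correlation with, and distance (km) from, the target point.
    """
    others = [s for s in corr if s != target]
    # Set A: the `count` points with the highest correlation.
    set_a = sorted(others, key=lambda s: corr[s], reverse=True)[:count]
    # Sets B and C: nearest points first, limited by each distance threshold.
    by_distance = sorted(others, key=lambda s: dist_km[s])
    set_b = [s for s in by_distance if dist_km[s] <= first_km][:count]
    set_c = [s for s in by_distance if dist_km[s] <= second_km][:count]
    return set_a, set_b, set_c


# Hypothetical correlations and distances for three neighbouring points.
corr = {"s1": 0.9, "s2": 0.5, "s3": 0.8}
dist = {"s1": 10.0, "s2": 1.0, "s3": 4.0}
print(similar_sets("s0", corr, dist, count=2, first_km=3.0, second_km=5.0))
```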
The system provided by this embodiment is based on historical data for meteorology, point location PM2.5, and other pollutants. It first selects, according to the point location type, the current data characteristics of the point location, its operation and maintenance conditions, and the like, sets of point locations whose data are similar. It then builds machine learning models on these point location data sets and predicts a reasonable range for the PM2.5 data of the target point location. Finally, it compares the actual data of the point location with the predicted reasonable range to judge whether the point location data is a high value.
In one embodiment, a computer device is provided, as shown in fig. 3, including a memory 301, a processor 302, and a computer program stored on the memory and executable on the processor, where the processor implements any of the PM2.5 point location high value identification methods described above when the computer program is executed.
In particular, the computer device may be a computer terminal, a server or similar computing means.
In the present embodiment, a computer-readable storage medium storing a computer program for executing any of the above PM2.5 point location high value identification methods is provided.
In particular, computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable storage media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by computing devices, so that they can be stored in a storage device and executed by computing devices. In some cases, the steps shown or described may be performed in a different order from that shown or described here, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present application should be included in the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (7)

1. The PM2.5 point position high value identification method is characterized by comprising the following steps of:
Determining all monitoring points in the to-be-monitored area, and respectively calculating historical data correlation coefficients between every two monitoring points in all the monitoring points according to PM2.5 historical data and PM10 historical data of each monitoring point;
In all the monitoring points, respectively acquiring a plurality of similar point sets of the target monitoring points according to the historical data correlation coefficient between the target monitoring points and the rest of the monitoring points and the distance between the target monitoring points and the rest of the monitoring points;
According to the historical data of the target monitoring point, the historical data and the actual monitoring data of each point in the plurality of similar point sets, a machine learning model is constructed, and a PM2.5 data prediction range of the target monitoring point is obtained through model prediction;
comparing the actual monitoring data of the target monitoring point position with the upper limit value of the PM2.5 data prediction range, and identifying whether the current PM2.5 data of the target monitoring point position is a point position high value or not;
the historical data correlation coefficient is an average value of the correlation coefficient of PM2.5 historical data and the correlation coefficient of PM10 historical data;
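Wherein, the calculation formula of the correlation coefficient is as follows:
$$ r_{mn} = \frac{\sum_{k=1}^{p} (x_{mk} - \bar{x}_m)(x_{nk} - \bar{x}_n)}{\sqrt{\sum_{k=1}^{p} (x_{mk} - \bar{x}_m)^2} \, \sqrt{\sum_{k=1}^{p} (x_{nk} - \bar{x}_n)^2}} $$
Wherein $x_{mk}$ is the kth historical data of the mth point, p is the total hours of the historical data, $\bar{x}_m$ is the historical data average of the mth point, i.e. $\bar{x}_m = \frac{1}{p}\sum_{k=1}^{p} x_{mk}$, $x_{nk}$ is the kth historical data of the nth point, and $\bar{x}_n$ is the historical data average of the nth point, i.e. $\bar{x}_n = \frac{1}{p}\sum_{k=1}^{p} x_{nk}$;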
In all the monitoring points, respectively obtaining a plurality of similar point sets of the target monitoring points according to the historical data correlation coefficient between the target monitoring points and the rest monitoring points and the distance between the target monitoring points and the rest monitoring points, wherein the method comprises the following steps:
among all monitoring points, ordering all the historical data correlation coefficients between a target monitoring point and other monitoring points from high to low, obtaining a set number of corresponding monitoring points with the historical data correlation coefficients ordered at the front, and taking the corresponding monitoring points as a similar point set A of the target monitoring point;
Among all the monitoring points, the distances between the target monitoring point and the rest monitoring points are sequenced according to the sequence from near to far, and the set number of monitoring points with the distances within a set first threshold range are taken as a similar point set B of the target monitoring points; taking a set number of monitoring points with the distance within a set second threshold range as a similar point set C of the target monitoring points;
Constructing a machine learning model according to the historical data of the target monitoring point and the historical data and actual monitoring data of each point in the plurality of similar point sets, and obtaining a PM2.5 data prediction range of the target monitoring point through model prediction, comprises:
respectively training a plurality of machine learning models by taking at least one type of data of the historical data of the target monitoring point, the historical data of each point in the plurality of similar point sets and the actual monitoring data of each point as input data to obtain a plurality of prediction models;
And obtaining a plurality of prediction results through the plurality of prediction models, taking the second-highest value among the plurality of prediction results, and multiplying the second-highest value by a coefficient a to obtain the upper limit value of the PM2.5 data prediction range.
2. The PM2.5 spot high value identification method according to claim 1, wherein the PM2.5 history data comprises PM2.5 data of 1 year and PM2.5 data of the first 1 month, and the PM10 history data comprises PM10 data of 1 year and PM10 data of the first 1 month.
3. The PM2.5 point high value identification method according to claim 1, wherein the machine learning model is a machine learning model constructed based on XGBoost algorithm.
4. The PM2.5 point location high value identification method according to claim 1, characterized in that comparing the actual monitored data of the target monitored point location with the upper limit value of the PM2.5 data prediction range, identifying whether the current PM2.5 data of the target monitored point location is a point location high value, comprises:
If the PM2.5 hour data of the target monitoring point is higher than the upper limit value of the PM2.5 data prediction range, judging the current PM2.5 data of the target monitoring point as a class 1 high value;
If the PM2.5 data of the target monitoring point position for at least 3 continuous hours are all judged to be the class 1 high value, judging the current PM2.5 data of the target monitoring point position to be the class 2 high value;
And if the PM2.5 data of the target monitoring point and of the monitoring point within a 3 km range around the target monitoring point are both judged to be class 1 high values, judging the current PM2.5 data of the target monitoring point to be a class 3 high value.
5. A PM2.5 point high value identification system applying the PM2.5 point high value identification method according to any one of claims 1 to 4, characterized by comprising:
The calculation module is used for respectively calculating historical data correlation coefficients between every two monitoring points in all the monitoring points according to the PM2.5 historical data and the PM10 historical data of each monitoring point after determining all the monitoring points in the to-be-monitored area;
The similar point position acquisition module is used for respectively acquiring a plurality of similar point position sets of the target monitoring point positions in all the monitoring point positions according to the historical data correlation coefficient between the target monitoring point positions and other monitoring point positions and the distance between the target monitoring point positions and the other monitoring point positions;
The prediction module is used for constructing a machine learning model according to the historical data of the target monitoring point location, the historical data and the actual monitoring data of each point location in the plurality of similar point location sets, and obtaining a PM2.5 data prediction range of the target monitoring point location through model prediction;
And the identification module is used for comparing the actual monitoring data of the target monitoring point position with the upper limit value of the PM2.5 data prediction range and identifying whether the current PM2.5 data of the target monitoring point position is a point position high value or not.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the PM2.5 point location high value identification method according to any one of claims 1 to 4 when the computer program is executed by the processor.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that executes the PM2.5 point location high value identification method according to any one of claims 1 to 4.
CN202410058043.9A 2024-01-16 2024-01-16 PM2.5 point location high value identification method, system, equipment and medium Active CN117577227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410058043.9A CN117577227B (en) 2024-01-16 2024-01-16 PM2.5 point location high value identification method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN117577227A CN117577227A (en) 2024-02-20
CN117577227B true CN117577227B (en) 2024-04-16

Family

ID=89862823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410058043.9A Active CN117577227B (en) 2024-01-16 2024-01-16 PM2.5 point location high value identification method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN117577227B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801423A (en) * 2021-03-29 2021-05-14 北京英视睿达科技有限公司 Method and device for identifying abnormity of air quality monitoring data and storage medium
WO2022217839A1 (en) * 2021-04-14 2022-10-20 江南大学 Air quality prediction method based on deep spatiotemporal similarity
CN115718169A (en) * 2022-11-14 2023-02-28 河北先河环保科技股份有限公司 Method, device and equipment for positioning high-value area with atmospheric pollution and storage medium
CN116796805A (en) * 2023-05-10 2023-09-22 华南师范大学 PM2.5 concentration prediction method based on Gaussian process regression and deep learning
CN117371608A (en) * 2023-10-30 2024-01-09 重庆市畜牧科学院 Pig house multi-point temperature and humidity prediction method and system based on deep learning

Also Published As

Publication number Publication date
CN117577227A (en) 2024-02-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant