CN109614526A - Environmental monitoring data fraud means recognition methods based on higher-dimension abnormality detection model - Google Patents

Publication number
CN109614526A
Authority
CN
China
Prior art keywords
data
monitoring data
environmental monitoring
counterfeiting
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811329158.8A
Other languages
Chinese (zh)
Inventor
伯鑫
常象宇
崔维庚
汤铃
薛晓达
孙少波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiupai Data Technology Co Ltd
Environmental Engineering Assessment Center Ministry Of Environmental Protection Of People's Republic Of China
Original Assignee
Xi'an Jiupai Data Technology Co Ltd
Environmental Engineering Assessment Center Ministry Of Environmental Protection Of People's Republic Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiupai Data Technology Co Ltd and Environmental Engineering Assessment Center Ministry Of Environmental Protection Of People's Republic Of China
Priority to CN201811329158.8A
Publication of CN109614526A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method for identifying counterfeiting means in environmental monitoring data based on a high-dimensional anomaly detection model, addressing the heavy workload and high difficulty of identifying such counterfeiting means. The method includes: performing quality evaluation on the environmental monitoring data; classifying the counterfeiting means of the environmental monitoring data; and intelligently identifying the "data identification type counterfeiting means" in the environmental monitoring data. Based on high-dimensional anomaly detection model theory and using multiple analysis techniques such as multidimensional statistical analysis, spatial analysis, and index portrait analysis, the invention establishes a counterfeiting means identification model for online environmental monitoring data, produces rich suspected-counterfeiting analysis results, analyzes the suspicion degree across various enterprise attributes, and automatically identifies, warns of, and raises alarms on counterfeiting means in environmental monitoring data. This improves the working efficiency of monitoring personnel in environmental protection departments, strengthens environmental control and enterprise supervision, and provides an intelligent supervision platform for better implementing national environmental protection policy.

Description

Environmental monitoring data counterfeiting means identification method based on high-dimensional anomaly detection model
Technical Field
The invention belongs to the field of environmental monitoring and protection, and particularly relates to an environmental monitoring data counterfeiting means identification method based on a high-dimensional anomaly detection model.
Background
With population growth and social development, people are paying increasing attention to their living environment, and the state attaches great importance to environmental protection. China has vigorously promoted pollution control, successively issuing the "Ten Measures for Air" and the "Ten Measures for Water" to comprehensively address pollution in the atmospheric and water environments. As the state places growing emphasis on environmental protection and automatic pollution source monitoring data is applied ever more deeply and widely, some enterprises tamper with automatic monitoring equipment without authorization, or even falsify data, in order to evade supervision and penalties. Identifying counterfeiting of automatic monitoring data and supervising the offending enterprises according to law is therefore one of the important tasks of environmental protection departments.
At present, China has as many as 14,000 nationally key-monitored enterprises, while environmental protection departments have limited monitoring staff; relying on manual monitoring alone is difficult and burdensome. In the big data era, a new direction for the data monitoring work of environmental protection departments is to fully mine and analyze the massive automatic monitoring data they already hold, establish an identification, warning, and alarm model for automatic monitoring data, and intelligently identify counterfeiting means, counterfeiting behaviors, and counterfeiting enterprises, thereby improving the accuracy of monitoring environmental data counterfeiting, improving the working efficiency of monitoring personnel, and better implementing national environmental protection policy.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is that manual identification of counterfeiting means in environmental monitoring data is difficult and burdensome; to this end, a method for identifying counterfeiting means in environmental monitoring data is provided.
To solve the above technical problem, an embodiment of the present invention provides a method for identifying counterfeiting means in environmental monitoring data based on a high-dimensional anomaly detection model, the method including the following steps:
step S1, carrying out quality evaluation on the environmental monitoring data;
step S2, classifying the counterfeiting means of the environmental monitoring data;
step S3, intelligently identifying the "data identification type counterfeiting means" in the environmental monitoring data.
Further, the method further comprises:
step S4, pushing the identification result to the relevant users.
Further, the quality evaluation in step S1 includes: integrity of the data, variability of the data, repeatability and availability of the data, and compliance analysis.
Further:
integrity of the data includes: zero values and null values;
variability of the data includes: negative values, abnormally large values, and abnormally small values;
repeatability and availability of the data include: the running time of the data;
compliance analysis includes: over-standard values or the over-standard rate.
Further, classifying the counterfeiting means in step S2 includes dividing them into: data identification type counterfeiting means, which can be identified from online monitoring data, and non-data identification type counterfeiting means, which are difficult to identify from online monitoring data.
Further, the step S3 includes:
step S31, selecting an anomaly detection algorithm and detecting change points;
step S32, constructing a counterfeiting means identification model from the change points detected in step S31;
step S33, identifying the environmental monitoring data using the counterfeiting means identification model.
Further, the model construction process in step S32 is:
for the environmental monitoring data, the distribution function of its time series $(X_1, \ldots, X_n)$ is determined as $F\left(\frac{x-\mu}{\sigma}\right)$, where the location parameter $\mu$ and the scale parameter $\sigma$ are both greater than 0;
for environmental monitoring data from different time periods, with parameters $(\mu_1, \sigma_1)$ before and $(\mu_2, \sigma_2)$ after time index $r$:
when $\mu_1 \neq \mu_2$ and $\sigma_1 \neq \sigma_2$, $X_r$ $(1 < r < n)$ is a location-scale change point;
according to another aspect of the present invention, there is also provided a system for identifying counterfeiting means in environmental monitoring data, including: an environmental monitoring data quality evaluation module; an environmental monitoring data counterfeiting means classification module; and an intelligent identification module for data identification type counterfeiting means in the environmental monitoring data; wherein,
the environmental monitoring data quality evaluation module is used to perform data quality control evaluation on the online monitoring data of key industries according to the characteristics of each industry's data;
the environmental monitoring data counterfeiting means classification module is used to classify the counterfeiting means in the environmental monitoring data into data identification type and non-data identification type counterfeiting means;
the intelligent identification module for data identification type counterfeiting means in the environmental monitoring data is used to extract the counterfeiting features of the data identification type counterfeiting means and, using the extracted features, to construct an environmental monitoring data counterfeiting means identification model based on a high-dimensional anomaly detection model.
Further, the feature extraction includes: determining the distribution and positions of change points by establishing an information measure and criterion for detecting structural change points, and jointly analyzing the distribution and positions of the change points with a data mining algorithm to form the features used for anomaly detection modeling.
Further, the system further comprises: an identification result pushing module;
the identification result pushing module, built on a visual, workflow-based data integration and fusion tool, provides a group of nodes that store data analysis results in a database or push them to relevant users by mail, short message, FTP, Web page, or WeChat.
The technical scheme of the invention has the following beneficial effects:
in this scheme, based on high-dimensional anomaly detection model theory and multiple analysis techniques such as multidimensional statistical analysis, spatial analysis, and index portrait analysis, a counterfeiting means identification model for online environmental monitoring data is established, producing rich suspected-counterfeiting analysis results. The suspicion degree is analyzed across various enterprise attributes, and counterfeiting means in environmental monitoring data are automatically identified, with early warnings and alarms raised. This improves the working efficiency of monitoring personnel in environmental protection departments, strengthens environmental control and enterprise supervision, and provides an intelligent supervision platform for better implementing national environmental protection policy.
Drawings
To illustrate the embodiments of the present invention and the prior art more clearly, the drawings used in describing the technical solution are briefly introduced below. Those skilled in the art can obtain other drawings from these without creative effort.
FIG. 1 is a schematic flow chart of the counterfeiting means identification method according to the first embodiment of the present invention;
FIG. 2 is a flow chart of pushing the identification result in step S4 according to the first embodiment of the present invention;
FIG. 3 is a data screenshot of the suspected counterfeiting identification results output by the counterfeiting means identification model according to an embodiment of the present invention;
FIG. 4 is an enterprise ranking chart of the average suspicion degree of gas data by registration type according to an embodiment of the present invention;
FIG. 5 is an enterprise ranking chart of the average suspicion degree of water data by registration type according to an embodiment of the present invention;
FIG. 6 is an enterprise ranking chart of the average suspicion degree of gas data by enterprise scale according to an embodiment of the present invention;
FIG. 7 is an enterprise ranking chart of the average suspicion degree of water data by enterprise scale according to an embodiment of the present invention;
FIG. 8 is an enterprise ranking chart of the average suspicion degree of gas data by attention degree according to an embodiment of the present invention;
FIG. 9 is an enterprise ranking chart of the average suspicion degree of water data by attention degree according to an embodiment of the present invention;
FIG. 10 is an environmental protection portrait of a suspected counterfeiting enterprise according to an embodiment of the present invention;
FIG. 11 is an environmental protection portrait of an enterprise with suspected equipment problems according to an embodiment of the present invention;
FIG. 12 is an environmental protection portrait of a suspected normal enterprise according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
To address the heavy workload and high difficulty of identifying counterfeiting means in environmental monitoring data, the invention provides a method for identifying such counterfeiting means. By constructing a counterfeiting means identification model, counterfeit data is monitored intelligently, which improves the accuracy of monitoring environmental data counterfeiting and the working efficiency of monitoring personnel in environmental protection departments.
The technical solution of the present invention is described in detail by specific embodiments with reference to the accompanying drawings.
First embodiment
The embodiment provides a method for identifying a counterfeiting means of environmental monitoring data based on a high-dimensional anomaly detection model, and fig. 1 is a schematic flow chart of the method for identifying the counterfeiting means. As shown in fig. 1, the method comprises the steps of:
step S1, carrying out quality evaluation on the environmental monitoring data;
step S2, classifying the counterfeiting means of the environmental monitoring data;
step S3, intelligently identifying the "data identification type counterfeiting means" in the environmental monitoring data.
The method may further comprise:
step S4, pushing the identification result to the relevant users.
Wherein:
Step S1, performing quality evaluation on the environmental monitoring data.
Preferably, the environmental monitoring data in this step is online environmental monitoring data. In this embodiment, the environmental monitoring data described below refers to online environmental monitoring data unless otherwise specified.
In this embodiment, data quality control evaluation, that is, data quality evaluation, can be performed on the online monitoring data of key industries according to the characteristics of each industry's data. The data quality evaluation includes: integrity of the data, variability of the data, repeatability and availability of the data, compliance analysis, and so on.
Integrity of the data includes: zero values and null values;
variability of the data includes: negative values, abnormally large values, and abnormally small values;
repeatability and availability of the data include: the running time of the data;
compliance analysis includes: over-standard values or the over-standard rate.
The specifics of the data quality evaluation are analyzed below by way of example, in light of actual environmental monitoring conditions.
Analysis of zero values: in monitoring data for atmospheric or water pollution, for example, some records show an emission concentration of zero, and in most cases the zeros occur consecutively; consecutive zero values indicate a specific anomaly. The number of zero values in an enterprise's monitoring data also reflects the quality of its monitoring data and monitoring equipment, so the zero-value rate can serve as one of the indexes for subsequent modeling analysis.
The statistical model formula for the zero-value rate is: zero-value rate of an enterprise's monitoring data = zero-value data hours of the enterprise / total operating hours.
Analysis of null values: again taking monitoring data for atmospheric or water pollution as an example, some records have an empty emission concentration, and in most cases the null values occur consecutively; consecutive null values reflect the continuity of the enterprise's production operation. The number of null values in an enterprise's monitoring data can thus reflect how continuously the enterprise operates, so the null-value rate is also one of the indexes for modeling analysis.
The statistical model formula for the null-value rate is: null-value rate of an enterprise's monitoring data = null-value data hours of the enterprise / total operating hours.
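As an illustration of the two rate formulas above, here is a minimal Python sketch. It assumes one record per operating hour, represents a null value as `None`, and takes the total operating hours to be the number of hourly records; the function name and the sample readings are hypothetical and do not come from the patent.

```python
from typing import Optional, Sequence, Tuple


def zero_and_null_rates(hourly: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Return (zero-value rate, null-value rate) for one enterprise's hourly records."""
    total_hours = len(hourly)  # assumption: one record per operating hour
    if total_hours == 0:
        raise ValueError("no operating hours recorded")
    zero_hours = sum(1 for v in hourly if v is not None and v == 0)
    null_hours = sum(1 for v in hourly if v is None)
    return zero_hours / total_hours, null_hours / total_hours


# Hypothetical hourly emission concentrations; None marks a null record.
readings = [12.5, 0.0, 0.0, None, 8.1, None, 0.0, 9.4]
zero_rate, null_rate = zero_and_null_rates(readings)
```

For these eight sample hours, three are zero and two are null, so the sketch yields a zero-value rate of 3/8 and a null-value rate of 2/8.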
Analysis of negative values: some emission concentrations in the environmental monitoring data are negative. In most cases the negative values are instantaneous: a large negative record appears briefly and quickly returns to normal, forming a negative spike. In a few cases the emission concentration stays negative for a sustained period. The likely cause of abnormal negative data is an equipment problem.
Analysis of abnormally large values: some records show abnormally high emission concentrations. In some cases the concentration rises instantaneously and falls back at the next time point. Some instantaneous emission concentrations rise above 1000 mg/m³, occasionally even reaching 8000 mg/m³. In addition, some enterprises show high emission concentrations during certain periods: some show very high concentrations for a period and then a decrease, while others show lower concentrations at first and then persistently high concentrations from a certain point in time. Based on expert experience, an instantaneous emission concentration exceeding 1000 mg/m³ can be judged abnormal. Likely causes of abnormally large data are instrument calibration or equipment maintenance. The abnormal values in an enterprise's monitoring data reflect the quality of its monitoring data and monitoring equipment.
The abnormal rate is also one of the indexes for modeling analysis. The statistical model formula is: a record is abnormal if (converted) concentration < 0 or (converted) concentration > 1000; abnormal rate of an enterprise's monitoring data = abnormal data hours of the enterprise / total operating hours.
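The abnormality rule and the abnormal rate can be sketched as follows. This is a minimal illustration that assumes hourly converted concentrations in mg/m³ and, as a simplification not stated in the patent, treats null records (`None`) as not abnormal; the names `is_abnormal` and `abnormal_rate` are hypothetical.

```python
from typing import Optional, Sequence

ABNORMAL_UPPER = 1000.0  # mg/m^3, the expert-experience threshold quoted in the text


def is_abnormal(concentration: Optional[float]) -> bool:
    """A record is abnormal if its converted concentration is < 0 or > 1000."""
    if concentration is None:  # assumption: nulls are handled separately
        return False
    return concentration < 0 or concentration > ABNORMAL_UPPER


def abnormal_rate(hourly: Sequence[Optional[float]]) -> float:
    """Abnormal data hours divided by total operating hours."""
    total_hours = len(hourly)
    if total_hours == 0:
        raise ValueError("no operating hours recorded")
    return sum(1 for v in hourly if is_abnormal(v)) / total_hours


# Hypothetical sample: one negative spike and one abnormally large value.
sample = [20.0, -5.0, 8000.0, 1000.0, None, 35.0, 0.0, 40.0]
rate = abnormal_rate(sample)
```

Note that a value of exactly 1000 is not flagged, because the rule in the text uses a strict inequality.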
Analysis of running time: an enterprise's running time reflects the stability of its production operation and is also one of the indexes for modeling analysis.
Analysis of the over-standard rate: compliance analysis is an important way to determine whether an enterprise discharges lawfully, so the compliant value or over-standard value is an important index of the enterprise's operating condition. This embodiment uses the over-standard rate as one of the indexes for modeling analysis. Compliance of the gas data is assessed against the new standard in force from 7/1/2014: the converted emission concentration of smoke dust must not exceed 10 mg/m³, that of sulfur dioxide must not exceed 35 mg/m³, and that of nitrogen oxides must not exceed 50 mg/m³. For the environmental monitoring data, after abnormal values are removed, compliance analysis is performed on the hourly emission data: whenever the hourly average concentration at a discharge outlet of an enterprise exceeds the emission limit, the pollutant discharge of that outlet is counted as exceeding the standard once.
Based on the above analysis method and evaluation standard, the statistical model formulas for the compliance analysis are:
effective operating hours = total operating hours - abnormal operating hours
over-standard rate of an enterprise = cumulative over-standard hours of the enterprise / total effective operating hours of the enterprise
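The two compliance formulas can be illustrated with a short Python sketch. It assumes hourly average converted concentrations in mg/m³ with null records already removed, and reuses the abnormality rule from the text (below 0 or above 1000); the dictionary keys, function name, and sample values are hypothetical.

```python
from typing import Dict, Sequence

# Emission limits quoted in the text, in mg/m^3 (converted concentration).
LIMITS: Dict[str, float] = {
    "smoke_dust": 10.0,
    "sulfur_dioxide": 35.0,
    "nitrogen_oxides": 50.0,
}
ABNORMAL_UPPER = 1000.0  # mg/m^3, abnormal-value threshold from the text


def over_standard_rate(hourly: Sequence[float], limit: float) -> float:
    """Cumulative over-standard hours divided by effective operating hours."""
    total_hours = len(hourly)
    abnormal_hours = sum(1 for v in hourly if v < 0 or v > ABNORMAL_UPPER)
    effective_hours = total_hours - abnormal_hours
    if effective_hours == 0:
        raise ValueError("no effective operating hours")
    over_hours = sum(1 for v in hourly if 0 <= v <= ABNORMAL_UPPER and v > limit)
    return over_hours / effective_hours


# Hypothetical sulfur dioxide readings for one discharge outlet.
so2 = [30.0, 36.0, 40.0, -1.0, 1200.0, 20.0, 35.0, 50.0]
so2_rate = over_standard_rate(so2, LIMITS["sulfur_dioxide"])
```

In this sample, two of the eight hours are abnormal, leaving six effective hours, of which three exceed the 35 mg/m³ limit; a reading of exactly 35 is treated as compliant because the limit is "must not exceed".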
Through the data quality evaluation, data quality, equipment quality, and enterprise operating stability can be examined directly or indirectly from five aspects: abnormal values, zero values, null values, running time, and over-standard values. Accordingly, five indexes are designed (abnormal rate, zero-value rate, null-value rate, running time, and over-standard rate) and used both as indexes for modeling analysis and as important features of the enterprise environmental protection portrait, giving a quick overview of an enterprise's environmental protection status.
In subsequent model analysis, data quality is controlled according to the results of the data quality evaluation, as follows:
(1) abnormal values in the original data are screened out and removed;
(2) zero values are retained, with particular attention paid to consecutive zero values;
(3) null values are removed before model analysis to ensure the continuity of the data;
(4) running time and over-standard values can serve as reference thresholds during model analysis; enterprises whose running time is too short can be skipped and excluded from model analysis.
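The four quality-control rules above can be sketched as a small preprocessing step. This is only an illustration: the minimum running time `MIN_RUN_HOURS` is a hypothetical value (the patent gives no threshold), running time is approximated by the number of hourly records, and the function names are invented for this example.

```python
from typing import List, Optional, Sequence

ABNORMAL_UPPER = 1000.0  # mg/m^3, abnormal-value threshold from the text
MIN_RUN_HOURS = 8        # hypothetical threshold; the patent does not specify one


def prepare_series(hourly: Sequence[Optional[float]],
                   min_run_hours: int = MIN_RUN_HOURS) -> Optional[List[float]]:
    """Apply rules (1)-(4); return None when the enterprise should be skipped."""
    if len(hourly) < min_run_hours:  # rule (4): running time too short
        return None
    # Rules (1) and (3): drop abnormal values and nulls; rule (2): keep zeros.
    return [v for v in hourly if v is not None and 0 <= v <= ABNORMAL_UPPER]


def longest_zero_run(values: Sequence[float]) -> int:
    """Length of the longest run of consecutive zero values (rule (2))."""
    longest = current = 0
    for v in values:
        current = current + 1 if v == 0 else 0
        longest = max(longest, current)
    return longest


raw = [5.0, None, 0.0, 0.0, 0.0, -3.0, 1500.0, 7.0, 0.0, 6.0]
cleaned = prepare_series(raw)
zero_run = longest_zero_run(cleaned) if cleaned is not None else 0
```

Because removed records are simply dropped, neighbouring values in the cleaned series may come from non-adjacent hours; a fuller implementation would likely keep the timestamps alongside the values.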
In this step, data quality is evaluated, and abnormal data can be screened out and removed directly once the evaluation is complete. In addition, the quality evaluation results show the number or proportion of abnormal values in an enterprise's data, which indirectly reflects the quality of the enterprise's monitoring data or monitoring equipment and can serve as one of the features for modeling analysis.
Step S2, classifying the counterfeiting means of the environmental monitoring data.
Counterfeiting means for online environmental monitoring data include equipment counterfeiting and data counterfeiting, and different counterfeiting situations show up differently in the presentation system. In addition, because each industry has different business characteristics, counterfeiting means differ between industries, and the features they leave in the monitoring data are not identical. The common counterfeiting means in key industries must therefore be systematically analyzed and classified according to the features that appear in the data. Through this analysis and classification, the key counterfeiting means can be divided into data identification type counterfeiting means, which can be identified from online monitoring data, and non-data identification type counterfeiting means, which are difficult to identify from online monitoring data. For the data identification type counterfeiting means, identifiable features can be provided.
Step S3, intelligently identifying the "data identification type counterfeiting means" in the environmental monitoring data.
In this step, first, statistical analysis methods, machine learning algorithms, and the like are used to extract the counterfeiting features of the data identification type counterfeiting means. Specifically, the distribution and positions of change points are determined by establishing an information measure and criterion for detecting structural change points, and the distribution and positions of the change points are jointly analyzed with a data mining algorithm to form the features used for anomaly detection modeling. Second, using the extracted features, an intelligent, automated environmental monitoring data counterfeiting means identification model is constructed based on a high-dimensional anomaly detection model, so that counterfeiting means in the environmental monitoring data are identified intelligently.
Step S4, pushing the identification result to the relevant users.
Fig. 2 is a schematic flow chart of pushing the identification result in this step. As shown in Fig. 2, this step is implemented as a visual, workflow-based data integration and fusion process, which provides a group of nodes that store data analysis results in a database or push them directly to relevant users by mail, short message, FTP, Web page, WeChat, and the like. The process can be published as a background cloud service that users can call.
By analyzing the environmental monitoring data with the counterfeiting means identification model, a list of suspected counterfeiting enterprises can be produced, together with analysis of suspected counterfeiting start points and suspected counterfeiting time periods, giving the suspected counterfeiting start time, suspected counterfeiting end time, suspected counterfeiting type, suspicion degree, and so on.
Step S3, intelligently identifying the environmental monitoring data, further includes the following steps:
Step S31, selecting an anomaly detection algorithm and detecting change points.
A point or set of points at which a statistical model changes abruptly is called a change point, or outlier. Change points reflect qualitative changes and carry rich information. Change point detection is carried out by anomaly detection algorithms and is widely applied in fields such as industrial quality control, climate simulation, network security, and fraud detection. In this step, in light of the actual characteristics of environmental monitoring data, the online environmental monitoring data is analyzed with a statistics-based parametric anomaly detection algorithm. Change points are detected by the anomaly detection algorithm and alarms are raised promptly.
Step S32, constructing a counterfeiting means identification model from the change points detected in step S31.
In this step, the distribution model that the environmental monitoring data follows is determined by statistical analysis, and the parameters are monitored and analyzed using a location-scale parameter model and change points.
Specifically, the model construction process is as follows:
The environmental monitoring data is evaluated to obtain the distribution function of its time series $(X_1, \ldots, X_n)$, $F\left(\frac{x-\mu}{\sigma}\right)$, where the location parameter $\mu$ and the scale parameter $\sigma$ are both greater than 0.
For environmental monitoring data from different time periods, with parameters $(\mu_1, \sigma_1)$ before and $(\mu_2, \sigma_2)$ after time index $r$:
when $\mu_1 \neq \mu_2$ and $\sigma_1 \neq \sigma_2$, $X_r$ $(1 < r < n)$ is a location-scale change point. Machine learning is then performed on the location-scale change points to construct the counterfeiting means identification model.
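The patent does not specify the distribution F or the statistic used to locate the change point. Purely as an illustration, the following sketch assumes a Gaussian F and a single change point, and picks the split that maximizes the two-segment Gaussian likelihood (equivalently, minimizes the sum of n·log(variance) over the segments). The function names, the `min_size` parameter, and the synthetic series are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _gaussian_cost(segment: Sequence[float]) -> float:
    """n * log(variance): -2 x the maximized Gaussian log-likelihood,
    up to an additive term that depends only on the segment length."""
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((x - mean) ** 2 for x in segment) / n
    return n * math.log(max(variance, 1e-12))  # floor avoids log(0) on constant data


def find_location_scale_change(xs: List[float], min_size: int = 5) -> Tuple[int, float]:
    """Return (split, gain): xs[:split] and xs[split:] are the two regimes,
    and gain is the cost reduction relative to a no-change model."""
    n = len(xs)
    if n < 2 * min_size:
        raise ValueError("series too short for the chosen min_size")
    best_split, best_cost = min_size, math.inf
    for split in range(min_size, n - min_size + 1):
        cost = _gaussian_cost(xs[:split]) + _gaussian_cost(xs[split:])
        if cost < best_cost:
            best_split, best_cost = split, cost
    return best_split, _gaussian_cost(xs) - best_cost


# Synthetic series: location and scale both change after the 50th observation.
before = [10 + ((i % 5) - 2) * 0.5 for i in range(50)]
after = [30 + ((i % 5) - 2) * 3.0 for i in range(50)]
split, gain = find_location_scale_change(before + after)
```

A large `gain` suggests that a two-regime model fits much better than a single regime; how such change points are then turned into model features is not detailed in the text and is left out of this sketch.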
Step S33, identifying the environmental monitoring data using the counterfeiting means identification model.
In this step, the identification result for the environmental monitoring data includes: suspected counterfeiting start time, suspected counterfeiting end time, suspected counterfeiting type, and suspicion degree.
The counterfeiting means identification model of this embodiment is described in detail below through a specific application example.
In this application example, continuous monitoring data of flue gas emissions from stationary pollution sources (gas data for short) and online monitoring data of wastewater discharge (water data for short) are used. The data set is described as follows:
storage format: CSV files;
number of files: 3 (gas data), 2 (water data);
total size: 127 GB;
number of records: 725,641,678 (over 700 million);
number of fields: 16 (gas data), 15 (water data).
Among these fields, the enterprise code (pscode), discharge outlet code (outputcode), time (monitorme), and converted concentration (reviedzsstrength) are the core variables.
In this application, the more than 700 million online monitoring records generated by the 10,000 enterprises in the input data are processed by the counterfeiting means identification model. The output items, listed in Table 1, are: suspected counterfeiting start time, suspected counterfeiting end time, suspected counterfeiting type, and suspicion degree.
TABLE 1
Regarding the suspected counterfeiting type, analysis shows that for a considerable number of enterprises the data stays at a constant value for a long time (even a full year), which is unrealistic; such data is therefore labeled "constant value". All other types are labeled "non-constant value".
For non-constant value data, the counterfeiting means identification model outputs an index called the suspicion degree, which ranges from 0 to 1. The closer the value is to 1, the more likely the data in the suspected counterfeiting time period is counterfeit; the closer it is to 0, the less likely. For constant value data, the suspicion degree is directly set to 1, and the suspected counterfeiting type field distinguishes it from non-constant value data.
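The output items and the constant-value rule can be sketched as a small data structure. The class and function names are hypothetical, timestamps are kept as plain strings for brevity, and `model_score` stands in for whatever suspicion degree the identification model would produce; how long a run must be to count as "a long time" is not stated in the text, so this sketch simply treats any segment whose values are all identical as constant.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SuspectedSegment:
    start_time: str          # suspected counterfeiting start time
    end_time: str            # suspected counterfeiting end time
    counterfeit_type: str    # "constant value" or "non-constant value"
    suspicion_degree: float  # in [0, 1]


def label_segment(values: Sequence[float], start_time: str, end_time: str,
                  model_score: float) -> SuspectedSegment:
    """Assign type and suspicion degree following the rule in the text."""
    if not values:
        raise ValueError("segment is empty")
    if len(set(values)) == 1:  # assumption: all-identical values count as constant
        return SuspectedSegment(start_time, end_time, "constant value", 1.0)
    if not 0.0 <= model_score <= 1.0:
        raise ValueError("suspicion degree must lie in [0, 1]")
    return SuspectedSegment(start_time, end_time, "non-constant value", model_score)


# Hypothetical segments: 48 identical hourly readings, and a varying pair.
flat = label_segment([3.2] * 48, "2017-01-01 00:00", "2017-01-02 23:00", 0.3)
varying = label_segment([3.2, 3.3], "2017-01-03 00:00", "2017-01-03 01:00", 0.3)
```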
Fig. 3 is a screenshot of the suspected counterfeiting identification results output by the counterfeiting means identification model. As shown in Fig. 3, with the suspicion degree threshold set to 0.1, there are 33,155 suspected counterfeiting records with a suspicion degree greater than 0.1 (of which 29,400 are non-constant). Because this threshold is low, it yields too many suspected records to be of practical use. With the threshold at 0.5, there are 5,258 suspected counterfeiting records with a suspicion degree greater than 0.5 (of which 1,503 are non-constant); this higher threshold gives a result that is practical to act on. The analysis result thus depends on the choice of suspicion degree threshold. In this embodiment, the suspicion degree threshold is set to 0.5.
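The effect of the threshold can be illustrated with a small counting helper. The toy records below are hypothetical; only the four totals in the final consistency check come from the text. That check shows that the number of constant records is the same at both thresholds, as expected when constant data always has suspicion degree 1.

```python
def count_above(records, threshold):
    """Return (total, non_constant) counts of records above the threshold."""
    flagged = [r for r in records if r["degree"] > threshold]
    non_constant = sum(1 for r in flagged if r["type"] == "non-constant")
    return len(flagged), non_constant


# Hypothetical toy records, for illustration only.
records = [
    {"type": "constant", "degree": 1.0},
    {"type": "non-constant", "degree": 0.8},
    {"type": "non-constant", "degree": 0.3},
    {"type": "non-constant", "degree": 0.05},
]

# Consistency check on the counts quoted in the text:
# total minus non-constant gives the number of constant records.
constant_at_0_1 = 33155 - 29400
constant_at_0_5 = 5258 - 1503
```

Raising the threshold removes only non-constant records, so the constant ones remain in the result at any threshold below 1.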
Further in-depth analysis of the suspicion degree reveals more information about the enterprises. The suspicion degree analysis may include:
multi-dimensional statistical analysis; spatial analysis; and enterprise environmental protection portrait analysis.
The multi-dimensional statistical analysis includes enterprise registration type analysis, enterprise scale analysis and attention degree analysis. The spatial analysis includes province-level spatial analysis and coordinate-level spatial analysis. The enterprise environmental protection portrait analysis distinguishes suspected counterfeiting enterprises, enterprises with suspected equipment problems, and suspected normal enterprises.
The gas data and water data sets used in applying this embodiment are again taken as the example:
By registration type, the average suspicion degree of enterprises of each registration type is calculated; the ranking for gas data enterprises is shown in Fig. 4 and the ranking for water data enterprises in Fig. 5.
As shown in Figs. 4 and 5:
(1) There are 30 registration types, including personal partnership, limited liability company, Sino-foreign joint venture, state-owned enterprise, foreign-funded enterprise, collective enterprise, and so on.
(2) For enterprises with gas data, the three types with the highest average suspicion degree are: personal partnerships, individual households, and private sole proprietorships (individual sole proprietorships); the types with the lowest average suspicion degree include foreign-invested companies limited by shares and wholly owned enterprises of Hong Kong, Macao and Taiwan investors. This suggests that individually owned enterprises are comparatively weak in implementing environmental monitoring data requirements, while state-owned and foreign-funded enterprises do better. Notably, the average suspicion degree of partnership enterprises is far higher than that of other types and close to 1, with the monitoring data of a large number of these enterprises being constant or zero throughout the year. The results indicate that individually owned enterprises deserve particular attention in suspected counterfeiting analysis.
(3) For enterprises with water data, the three types with the highest average suspicion degree are: personal partnerships, private sole proprietorships (individual sole proprietorships), and individual households; the types with the lowest average suspicion degree include wholly owned enterprises of Hong Kong, Macao and Taiwan investors and jointly operated enterprises. This again suggests that individually owned enterprises are comparatively weak in implementing environmental monitoring data requirements, while state-owned and foreign-funded enterprises do better. The results indicate that individually owned enterprises deserve particular attention in suspected counterfeiting analysis.
By scale, the average suspicion degree of enterprises of each scale is calculated; the ranking for gas data enterprises is shown in Fig. 6 and the ranking for water data enterprises in Fig. 7.
As shown in Figs. 6 and 7:
(1) There are 8 scale categories, including small, medium, large class 1, large class 2, extra-large, company limited by shares, and so on.
(2) For enterprises with gas data, the three categories with the highest average suspicion degree are: other, small, and large class 1; the three with the lowest are: company limited by shares, extra-large, and medium. Small enterprises are comparatively weak in implementing environmental monitoring data requirements, while large enterprises do better. The results indicate that small enterprises deserve particular attention in suspected counterfeiting analysis.
(3) For enterprises with water data, the categories with the highest average suspicion degree include small and medium; the three with the lowest are: extra-large, other, and large class 1. Small enterprises are comparatively weak in implementing environmental monitoring data requirements, while large enterprises do better. The results indicate that small enterprises deserve particular attention in suspected counterfeiting analysis.
By attention degree, the average suspicion degree of enterprises at each attention level is calculated; the ranking for gas data enterprises is shown in Fig. 8 and the ranking for water data enterprises in Fig. 9.
As shown in Figs. 8 and 9:
(1) There are 4 attention degree categories: non-key pollution sources, provincial control, city control and national control.
(2) For enterprises with gas data, the two categories with the highest average suspicion degree are: non-key pollution sources and provincial control; the two with the lowest are: national control and city control. Enterprises that are non-key pollution sources are comparatively weak in implementing environmental monitoring data requirements, while nationally controlled enterprises do better. The results indicate that non-key pollution source enterprises deserve particular attention in suspected counterfeiting analysis.
(3) For enterprises with water data, the two categories with the highest average suspicion degree are: non-key pollution sources and city control; the two with the lowest are: national control and provincial control. Enterprises that are non-key pollution sources are comparatively weak in implementing environmental monitoring data requirements, while nationally controlled enterprises do better. The results indicate that non-key pollution source enterprises deserve particular attention in suspected counterfeiting analysis.
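The per-dimension rankings above (registration type, scale, attention degree) all reduce to the same computation: group the enterprises by one attribute, average their suspicion degrees, and sort. A minimal sketch follows; the record layout and the function name `average_by` are assumptions made for illustration, and the sample values are not taken from the figures.

```python
from collections import defaultdict


def average_by(records, key):
    """Rank the categories of `key` by average suspicion degree, highest first.

    `records` is a list of dicts, each with a "degree" entry in [0, 1] and
    the grouping attribute named by `key`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        sums[r[key]] += r["degree"]
        counts[r[key]] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical enterprises, for illustration only.
enterprises = [
    {"scale": "small", "degree": 1.0},
    {"scale": "small", "degree": 0.5},
    {"scale": "extra-large", "degree": 0.25},
    {"scale": "extra-large", "degree": 0.25},
]
ranking = average_by(enterprises, "scale")
```

The same helper can be called with a different key (for example a registration type or attention degree attribute) to produce each of the other rankings.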
By province, the average suspicion degree of the enterprises in each province is calculated, giving the province-level spatial distribution for the gas data and the water data.
By coordinates, the suspicion degree of each enterprise is displayed spatially, giving the enterprise-level coordinate distribution for the gas data and the water data.
In addition, combining the suspicion degree with other indexes, namely the abnormal rate, zero value rate, null value rate and exceedance rate, yields an environmental protection portrait of each enterprise.
Fig. 10 is the environmental protection portrait of a suspected counterfeiting enterprise. As shown in Fig. 10, such an enterprise is typically characterized by a high suspected counterfeiting index while its other indexes are relatively normal. Enterprises of this kind can be given focused attention and investigated further to confirm whether counterfeiting has occurred.
Fig. 11 is the environmental protection portrait of an enterprise with suspected equipment problems. As shown in Fig. 11, such an enterprise is typically characterized by a high abnormal rate index while its other indexes are relatively normal. The enterprise can therefore check the condition of its monitoring equipment and, if a problem is confirmed, repair or upgrade the equipment.
Fig. 12 is the environmental protection portrait of a suspected normal enterprise. As shown in Fig. 12, such an enterprise is typically characterized by low values on all indexes.
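The three portraits can be expressed as a simple rule over the five indexes. The sketch below is one hypothetical reading of "high" and "relatively normal": a single cutoff `high=0.5` applied to every index. The text gives no numeric cutoffs, so both the cutoff and the key names are assumptions.

```python
OTHER_KEYS = ("zero_rate", "null_rate", "exceed_rate")


def classify_portrait(ind, high=0.5):
    """Map an enterprise's indexes to one of the three portraits.

    `ind` holds "suspicion", "abnormal_rate", "zero_rate", "null_rate"
    and "exceed_rate", each assumed to lie in [0, 1].
    """
    suspicion_high = ind["suspicion"] >= high
    abnormal_high = ind["abnormal_rate"] >= high
    rest_low = all(ind[k] < high for k in OTHER_KEYS)
    if suspicion_high and not abnormal_high and rest_low:
        return "suspected counterfeiting"
    if abnormal_high and not suspicion_high and rest_low:
        return "suspected equipment problem"
    if not suspicion_high and not abnormal_high and rest_low:
        return "suspected normal"
    return "unclassified"  # mixed profile; needs manual review
```

Profiles that match none of the three patterns are returned as "unclassified" rather than forced into a category, since the text only characterizes the typical cases.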
Second embodiment
This embodiment provides an environmental monitoring data counterfeiting means identification system based on a high-dimensional anomaly detection model, comprising: an environmental monitoring data quality evaluation module; an environmental monitoring data counterfeiting means classification module; and an intelligent identification module for the data identification type counterfeiting means in the environmental monitoring data.
Preferably, the system may further include an identification result pushing module.
The environmental monitoring data quality evaluation module performs data quality control evaluation (data quality evaluation) on the online monitoring data of key industries, taking into account the data characteristics of each industry.
The data quality evaluation covers the integrity, variability, repeatability and availability of the data, as well as compliance analysis:
integrity of the data, including zero values and null values;
variability of the data, including negative values, abnormally large values and abnormally small values;
repeatability and availability of the data, including the running time of the data;
compliance analysis, including exceedance values or the exceedance rate.
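Several of the quality indicators listed above can be computed as simple rates over one monitoring series. The following sketch is illustrative only: it represents null readings as `None`, takes the emission `limit` as a caller-supplied parameter, and uses the total number of readings as the denominator for every rate. None of these conventions are specified in the text.

```python
def quality_metrics(values, limit):
    """Compute basic data quality rates for one monitoring series.

    `values` is a list of readings where None marks a null value;
    `limit` is the applicable emission limit (an assumed input).
    """
    n = len(values)
    if n == 0:
        raise ValueError("series is empty")
    present = [v for v in values if v is not None]
    return {
        "null_rate": (n - len(present)) / n,
        "zero_rate": sum(1 for v in present if v == 0) / n,
        "negative_rate": sum(1 for v in present if v < 0) / n,
        "exceed_rate": sum(1 for v in present if v > limit) / n,
    }


# Hypothetical readings, for illustration only.
metrics = quality_metrics([0, None, -1, 5, 12, 3, 0, 20], limit=10)
```

Abnormally large or small values would need a reference distribution per industry, which is why they are left out of this sketch.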
The environmental monitoring data counterfeiting means classification module classifies the counterfeiting means found in environmental monitoring data into data identification type and non-data identification type.
Counterfeiting of online environmental monitoring data includes equipment counterfeiting and data counterfeiting, and different counterfeiting situations produce different results in the reporting system. In addition, because each industry has its own business characteristics, counterfeiting means differ between industries, and the features they leave in the monitoring data are not entirely the same. The common counterfeiting means in key industries must therefore be systematically analyzed and classified according to the features the data presents. Through this analysis and classification, the key counterfeiting means can be divided into data identification type counterfeiting means, which can be identified from online monitoring data, and non-data identification type counterfeiting means, which are difficult to identify from online monitoring data. For the data identification type, identifiable features can be provided.
The intelligent identification module for the data identification type counterfeiting means in the environmental monitoring data extracts the counterfeiting features of those means using statistical analysis methods, machine learning algorithms and the like, and uses the extracted features to construct an intelligent, automated environmental monitoring data counterfeiting means identification model based on a high-dimensional anomaly detection model.
Specifically, the feature extraction proceeds as follows: an information criterion for detecting structural change points is established and used to determine the distribution and positions of the change points; a data mining algorithm then analyzes the distribution and positions of the change points jointly to form the features used for anomaly detection modeling. The extracted features are used to construct the intelligent, automated environmental monitoring data counterfeiting means identification model based on the high-dimensional anomaly detection model, so that counterfeiting means in environmental monitoring data can be identified intelligently.
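To make the idea of locating a change point concrete, the sketch below finds a single point at which both the mean (position) and the variance (scale) of a series change. It is not the criterion used in this embodiment, which the text does not spell out: it assumes Gaussian segments and picks the split that minimizes the sum of segment lengths times log variance, which is equivalent to maximizing the Gaussian likelihood. The names `find_change_point`, `min_seg` and `eps` are hypothetical.

```python
import math


def _variance(xs):
    """Population variance of a non-empty list."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def find_change_point(xs, min_seg=3, eps=1e-12):
    """Return r such that xs[:r] and xs[r:] are the two regimes.

    For every admissible split the Gaussian cost
    r * log(var_left) + (n - r) * log(var_right) is evaluated and the
    split with the lowest cost is returned. `eps` guards against log(0)
    on constant segments.
    """
    n = len(xs)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested segment size")
    best_r, best_cost = None, math.inf
    for r in range(min_seg, n - min_seg + 1):
        cost = (r * math.log(_variance(xs[:r]) + eps)
                + (n - r) * math.log(_variance(xs[r:]) + eps))
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r


# Hypothetical series: a quiet regime followed by a higher, noisier one.
series = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 5.0, 7.0, 3.0, 6.0, 4.0, 5.5]
r = find_change_point(series)
```

In a fuller pipeline, the positions and number of such change points per enterprise would become input features for the anomaly detection model; multiple change points could be found by applying the search recursively to each segment.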
The identification result pushing module provides a set of nodes, based on a visual, workflow-based data integration and fusion tool, through which data analysis results are pushed directly to the relevant users, either by writing them to storage or by e-mail, SMS, FTP, Web page, WeChat and similar channels. The workflow can be published as a backend cloud service, and users can call the published service.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A method for identifying an environmental monitoring data counterfeiting means based on a high-dimensional anomaly detection model is characterized by comprising the following steps:
step S1, carrying out quality evaluation on the environmental monitoring data;
step S2, classifying the counterfeiting means of the environmental monitoring data;
step S3, intelligently identifying the "data identification type counterfeiting means" in the environmental monitoring data.
2. The environmental monitoring data counterfeiting means identification method according to claim 1, wherein the method further comprises:
step S4, pushing the identification result to the relevant users.
3. The environmental monitoring data counterfeiting means identification method according to claim 1 or 2, wherein the quality evaluation in step S1 comprises: the integrity, variability, repeatability and availability of the data, and compliance analysis.
4. The environmental monitoring data counterfeiting means identification method according to claim 3, wherein:
the integrity of the data includes zero values and null values;
the variability of the data includes negative values, abnormally large values and abnormally small values;
the repeatability and availability of the data include the running time of the data;
the compliance analysis includes exceedance values or the exceedance rate.
5. The environmental monitoring data counterfeiting means identification method according to claim 1 or 2, wherein the counterfeiting means classified in step S2 comprise: data identification type counterfeiting means, which can be identified from online monitoring data, and non-data identification type counterfeiting means, which are difficult to identify from online monitoring data.
6. The environmental monitoring data counterfeiting means identification method according to claim 1 or 2, wherein step S3 further comprises:
step S31, selecting an anomaly detection algorithm and detecting change points;
step S32, constructing a counterfeiting means identification model according to the change points detected in step S31;
step S33, identifying the environmental monitoring data using the counterfeiting means identification model.
7. The environmental monitoring data counterfeiting means identification method according to claim 6, wherein the model construction process in step S32 is as follows:
for the environmental monitoring data, a distribution function of the time series $(X_1, \ldots, X_n)$ is determined, wherein the position parameter $\mu$ and the scale parameter $\sigma$ are both greater than 0;
for environmental monitoring data of different time periods:
when $\mu_1 \neq \mu_2$ and $\sigma_1 \neq \sigma_2$, there exists a position-scale change point $X_r$ ($1 < r < n$);
machine learning is performed using the position-scale change points to construct the counterfeiting means identification model.
CN201811329158.8A 2018-11-09 2018-11-09 Environmental monitoring data fraud means recognition methods based on higher-dimension abnormality detection model Pending CN109614526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811329158.8A CN109614526A (en) 2018-11-09 2018-11-09 Environmental monitoring data fraud means recognition methods based on higher-dimension abnormality detection model

Publications (1)

Publication Number Publication Date
CN109614526A true CN109614526A (en) 2019-04-12

Family

ID=66004010

Country Status (1)

Country Link
CN (1) CN109614526A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005032181A (en) * 2003-07-11 2005-02-03 Nippon Telegr & Teleph Corp <Ntt> Environment monitoring system and its authentication device
CN104091061A (en) * 2014-07-01 2014-10-08 北京金控自动化技术有限公司 Method for using normal distribution for assisting in determining effectiveness of pollution source monitoring data
CN104135521A (en) * 2014-07-29 2014-11-05 广东省环境监测中心 Method and system of identifying data abnormal values of environment automatic monitoring network
CN106709242A (en) * 2016-12-07 2017-05-24 常州大学 Method for identifying authenticity of sewage monitoring data
CN108304851A (en) * 2017-01-13 2018-07-20 重庆邮电大学 A kind of High Dimensional Data Streams Identifying Outliers method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245880A (en) * 2019-07-02 2019-09-17 浙江成功软件开发有限公司 A kind of pollution sources on-line monitoring data cheating recognition methods
CN110689257B (en) * 2019-09-24 2022-09-09 北京市天元网络技术股份有限公司 Operator big data-based fast-moving consumer goods business supervision method and device
CN110689257A (en) * 2019-09-24 2020-01-14 北京市天元网络技术股份有限公司 Fast-moving-away product business supervision method and device based on operator big data
CN112990632A (en) * 2019-12-18 2021-06-18 北京智识企业管理咨询有限公司 Regional industry competitiveness analysis system and method based on big data
CN112990632B (en) * 2019-12-18 2024-01-09 北京智识企业管理咨询有限公司 Regional industry competitiveness analysis system and method based on big data
CN110796847A (en) * 2020-01-06 2020-02-14 北京英视睿达科技有限公司 Block chain-based environment monitoring station operation and maintenance system and method
CN113012388B (en) * 2021-02-19 2023-02-24 浙江清之元信息科技有限公司 Pollution source online monitoring system and online monitoring data false identification analysis method
CN113012388A (en) * 2021-02-19 2021-06-22 浙江清之元信息科技有限公司 Pollution source online monitoring system and online monitoring data false identification analysis method
CN113655189A (en) * 2021-03-31 2021-11-16 吴超烽 Automatic monitoring data analysis and judgment system for pollution source
CN114354854A (en) * 2022-01-06 2022-04-15 武汉祁联生态科技有限公司 Abnormity detection method for flue gas monitoring data
CN114354854B (en) * 2022-01-06 2024-02-13 武汉祁联生态科技有限公司 Abnormality detection method for smoke monitoring data
WO2024030525A1 (en) * 2022-08-03 2024-02-08 Schlumberger Technology Corporation Automated record quality determination and processing for pollutant emission quantification
CN117235624A (en) * 2023-09-22 2023-12-15 中节能天融科技有限公司 Emission data falsification detection method, device and system and storage medium
CN117235624B (en) * 2023-09-22 2024-05-07 中节能数字科技有限公司 Emission data falsification detection method, device and system and storage medium
CN118313564A (en) * 2024-06-05 2024-07-09 生态环境部环境工程评估中心 Abnormality identification method, device, equipment and medium for enterprise emission monitoring data
CN118313564B (en) * 2024-06-05 2024-08-23 生态环境部环境工程评估中心 Abnormality identification method, device, equipment and medium for enterprise emission monitoring data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination