CN111710373A - Method, device, equipment and medium for detecting volatile organic compound observation data - Google Patents

Publication number
CN111710373A
Authority
CN
China
Prior art keywords
data
abnormal
volatile organic
organic compound
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010700226.8A
Other languages
Chinese (zh)
Inventor
樊旭
吴剑斌
陈焕盛
晏平仲
秦东明
王文丁
梁倩
杨佩霖
肖林鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3Clear Technology Co Ltd
Original Assignee
3Clear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3Clear Technology Co Ltd
Priority to CN202010700226.8A
Publication of CN111710373A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30: Prediction of properties of chemical compounds, compositions or mixtures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The application discloses a method, a device, equipment and a medium for detecting abnormal values of volatile organic compound observation data. The method comprises the following steps: acquiring historical observation data of volatile organic compounds for a period of time; determining abnormal threshold values and normal data of concentrations of all components in historical volatile organic compound observation data; respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using normal data; and detecting single-time volatile organic compound observation data by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model. According to the method, the volatile organic compound observation data are detected by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model, so that the accuracy of the detection result can be greatly improved, and the application blank of the machine learning algorithm in the aspect of volatile organic compound observation data abnormal detection is filled.

Description

Method, device, equipment and medium for detecting volatile organic compound observation data
Technical Field
The application relates to the technical field of environmental monitoring, in particular to a method, a device, equipment and a medium for detecting abnormal values of volatile organic compound observation data.
Background
Volatile organic compounds (VOCs) are organic compounds that have high saturated vapor pressures and are volatile at room temperature and pressure. High concentrations of VOCs adversely affect human activities and the ecological environment. Exposure to concentrations above a certain threshold can cause symptoms such as headache, nausea and vomiting; long-term contact can cause convulsions, coma and memory decline, and can even harm the liver, kidneys and central nervous system. Meanwhile, as important precursors of secondary organic aerosol and ozone, high concentrations of VOCs are one of the main factors causing urban ozone and particulate pollution.
VOCs in the atmosphere comprise hundreds of substances with complex sources, and factors such as industrial structure, underlying surface and climate cause the typical VOC components and their overall chemical activity to differ markedly between areas. This complexity directly increases the difficulty of preventing and controlling ozone and particulate pollution. Clarifying the spatio-temporal variation of VOCs in a specific area on the basis of high-quality VOCs observation data has therefore become a core problem in scientifically formulating prevention and control measures for ozone and other pollutants.
Although VOCs observation instruments have been under development for a long time, they still have more shortcomings than conventional six-parameter observation instruments. Their detection results contain a large amount of abnormal data, which greatly reduces the value of the data, so effectively identifying and eliminating the abnormal data has become a key technical point for improving the quality of VOCs observation data.
Abnormal values in VOCs observation data are mainly caused by an insufficient or excessive response of the observation instrument to changes in pollutant concentration. An insufficient response produces abnormally low observed concentrations, and an excessive response produces abnormally high observed concentrations.
Operational departments set, from experience, a global abnormal threshold for each pollutant in a given area and detect abnormal values by a threshold method: maximum and minimum thresholds are set for the observed values of the different VOC species, and observations outside this range are marked as abnormal data and removed. This empirical method has several shortcomings. It is difficult to account comprehensively for the differences between regions, so a threshold cannot be reused across observation points. The constraint relations among components are rarely considered (for example, isobutane and n-butane, isopentane and n-pentane, meta-para-xylene and o-xylene, acetylene and benzene, and isomer pairs such as trans-2-butene and cis-2-butene or trans-2-pentene and cis-2-pentene are well correlated; such well-correlated components are called feature-related component pairs). Finally, detecting and eliminating abnormal values with a global threshold makes it difficult to account for the temporal variation of the overall pollutant concentration (for example, thresholds that differ between seasons).
Disclosure of Invention
The application aims to provide a method, a device, equipment and a medium for detecting volatile organic compound observation data. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to an aspect of the embodiments of the present application, there is provided a method for detecting observed data of volatile organic compounds, including:
acquiring historical observation data of volatile organic compounds for a period of time;
determining abnormal threshold values and normal data of concentrations of all components in the historical volatile organic compound observation data; wherein the anomaly threshold comprises an anomaly high value threshold and an anomaly low value threshold;
respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data;
and detecting single-time volatile organic compound observation data by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
Further, the determining abnormal threshold values and normal data of the concentrations of the components in the historical observed data of the volatile organic compounds includes: and determining abnormal threshold values and normal data of the concentrations of all components in the historical volatile organic compound observation data by adopting a percentile threshold value method.
Further, the determining the abnormal threshold and the normal data of the concentration of each component in the historical observation data of the volatile organic compounds by adopting a percentile threshold method comprises the following steps:
arranging the concentrations of all components in the historical volatile organic compound observation data in an ascending order to obtain a sequence;
setting the percentile of abnormal values of the concentrations of the components;
determining abnormal low value threshold and abnormal high value threshold of each component concentration according to data corresponding to the abnormal value percentile;
determining data between the abnormally low threshold and the abnormally high threshold as normal data.
Further, before determining the abnormal low threshold and the abnormal high threshold of each component concentration according to the data corresponding to the abnormal percentile, the method further comprises:
calculating the serial numbers of the data items corresponding to the percentile of the abnormal values;
multiplying the number of the data of the sequence by the percentile of the abnormal value, and calculating to obtain a numerical value;
when the numerical value obtained by calculating the percentile of the abnormal value is an integer, taking the data corresponding to the data item serial number equal to the numerical value as the data corresponding to the percentile of the abnormal value;
when the numerical value obtained by calculating the percentile of the abnormal value is a non-integer, rounding the numerical value of the non-integer, and taking the data corresponding to the serial number of the data item equal to the rounded value as the data corresponding to the percentile of the abnormal value; or,
and when the numerical value obtained by calculating the abnormal value percentile is a non-integer, calculating the average value of the data corresponding to the serial number of the previous data item and the serial number of the next data item which are adjacent to the non-integer numerical value respectively, and taking the average value as the data corresponding to the abnormal value percentile.
Further, the detecting the single-time volatile organic compound observation data by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model comprises:
judging whether the concentration of each component in the single-time volatile organic compound observation data is normal data or abnormal data by using the abnormal threshold value to obtain a primary detection result;
respectively detecting and scoring each characteristic related component pair in the single volatile organic compound observation data by using the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model;
calculating the sum of all detection scoring result scores;
judging whether the single-time volatile organic compound observation data is abnormal data or not according to the sum of all detection scoring result scores;
if yes, all the single-time volatile organic compound observation data are determined to be abnormal data; and if not, taking the primary detection result as the detection result of the single-time volatile organic compound observation data.
Further, the determining whether the single-time volatile organic compound observation data is abnormal data according to the sum of the scores of all the detection scores includes:
judging whether the sum of all the detection scoring result scores is greater than the number of characteristic related component pairs at the single time multiplied by alpha; wherein alpha is a preset parameter, and the value range of alpha is [0, 100%];
if yes, judging that the single-time volatile organic compound observation data are abnormal data;
and if not, judging that the single-time volatile organic compound observation data is not abnormal data.
According to another aspect of the embodiments of the present application, there is provided a device for detecting observed data of volatile organic compounds, including:
the acquisition module is used for acquiring historical observation data of the volatile organic compounds for a period of time;
the determining module is used for determining abnormal threshold values and normal data of concentrations of all components in the historical volatile organic compound observation data; wherein the anomaly threshold comprises an anomaly high value threshold and an anomaly low value threshold;
the training module is used for respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data;
and the detection module is used for detecting single-time volatile organic compound observation data by utilizing the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
Further, the determining module is specifically configured to: and determining abnormal threshold values and normal data of the concentrations of all components in the historical volatile organic compound observation data by adopting a percentile threshold value method.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for detecting observed volatile organic compounds described above.
According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method for detecting observed volatile organic compounds described above.
The technical scheme provided by one aspect of the embodiment of the application can have the following beneficial effects:
according to the detection method for the volatile organic compound observation data, the volatile organic compound observation data are detected by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model, so that the accuracy of the detection result can be greatly improved, a more powerful support is provided for scientific prevention and control of atmospheric pollution, and the application blank of a machine learning algorithm in the aspect of volatile organic compound observation data detection is filled.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a method for detecting observed volatile organic compounds according to one embodiment of the present application;
FIG. 2 shows a flowchart of step S20 of the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S40 of the embodiment shown in FIG. 1;
FIG. 4 is a block diagram of a detection apparatus for observing data of volatile organic compounds according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As shown in fig. 1, an embodiment of the present application provides a method for detecting observed data of volatile organic compounds, including:
S10, acquiring historical observation data of the volatile organic compounds for a period of time.
Observation data of volatile organic compounds (VOCs) are collected over a period of time, typically one year. The VOCs observation data comprise the content of each component. If the data states of two components are correlated, the two components constitute a feature-related component pair. The feature-related component pairs may be, for example, isobutane and n-butane, isopentane and n-pentane, m-p-xylene and o-xylene, acetylene and benzene, trans-2-butene and cis-2-butene, trans-2-pentene and cis-2-pentene, and the like. Correlated data states mean that, under normal circumstances, the concentrations of the two components change with some consistency. Within a feature-related component pair, as long as the observed data of one component is abnormal, the observed data of both components of the pair are determined to be abnormal data.
S20, determining abnormal threshold values and normal data of concentration data of each component in historical Volatile Organic Compounds (VOCs) observation data by a percentile threshold value method; wherein the anomaly threshold includes an anomaly high value threshold and an anomaly low value threshold. By screening out the normal data in step S20, preliminary quality control of the historical observed data of volatile organic compounds can be realized.
In certain embodiments, as shown in fig. 2, step S20 includes:
S201, arranging the concentrations of the components in the historical volatile organic compound observation data in ascending order to obtain a sequence (values below 0 are not considered, since they may be caused by a fault of the monitoring instrument). Each entry in the sequence is referred to as a data item, and the position number of each data item in the sequence is referred to as its data item serial number.
For example, for the sequence $\{a_1, a_2, a_3, \ldots, a_n, \ldots, a_{100}\}$ obtained by ascending sort, where $a_1 < a_2 < a_3 < \cdots < a_n < \cdots < a_{100}$, the subscripts $1, 2, 3, \ldots, n, \ldots, 100$ of the data items represent position numbers, i.e., data item serial numbers.
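Step S201 can be sketched in Python as follows. This is only an illustration, assuming the concentrations of one component arrive as a plain list of numbers; the function names `build_sorted_sequence` and `data_item` and the sample readings are hypothetical and do not come from the application.

```python
def build_sorted_sequence(concentrations):
    """Return the ascending sequence of concentrations for one component."""
    # Values below 0 are skipped because they may stem from an instrument fault.
    return sorted(c for c in concentrations if c >= 0)


def data_item(sequence, serial_number):
    """Look up a data item by its 1-based data item serial number."""
    return sequence[serial_number - 1]


raw = [0.42, -0.10, 1.35, 0.07, 0.88]  # hypothetical readings
seq = build_sorted_sequence(raw)
print(seq)                # [0.07, 0.42, 0.88, 1.35]
print(data_item(seq, 1))  # 0.07, i.e. a_1
```

The helper converts the 1-based serial numbers used in the description to Python's 0-based list indices.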
Then, the anomaly threshold in the sequence is determined through steps S202-S205.
S202, setting the abnormal value percentile of the concentration of each component.
The abnormal value percentiles (an abnormal low value percentile and an abnormal high value percentile) of the content of each component are set according to the probability distribution characteristics (normal distribution) of the historical volatile organic compound observation data and the gradient of the data volume across percentile values. For example, the abnormal high value percentile may be set to 95% and the abnormal low value percentile may be set to 5%.
S203, calculating data corresponding to the abnormal value percentile.
In some embodiments, calculating data corresponding to the percentile of the outliers comprises:
calculating the serial numbers of the data items corresponding to the percentile of the abnormal values;
multiplying the number of the data of the sequence by the percentile of the abnormal value, and calculating to obtain a numerical value;
when the numerical value obtained by calculating the percentile of the abnormal value is an integer, taking the data corresponding to the data item serial number equal to the numerical value as the data corresponding to the percentile of the abnormal value;
For example, two years of VOCs observation data are collected and the observed acetylene contents are sorted in ascending order, yielding a sequence of 24 × 365 × 2 = 17520 data items. When the data item serial number corresponding to the abnormal value percentile is an integer, for example when the abnormal high value percentile is set to 95% and the abnormal low value percentile is set to 5%, the data item serial number corresponding to the 95th percentile is 24 × 365 × 2 × 95% = 16644 and the data item serial number corresponding to the 5th percentile is 24 × 365 × 2 × 5% = 876. The abnormal high value threshold is then the value of the 16644th data item and the abnormal low value threshold is the value of the 876th data item; the data between the 876th and the 16644th data items are determined to be normal data, and the other data are determined to be abnormal data.
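The arithmetic of this integer case can be checked with a short Python sketch. The function name `serial_number` and the stand-in sequence (in which each value simply equals its own serial number) are assumptions made for illustration, not real observations.

```python
def serial_number(n_items, percentile):
    """Number of data items multiplied by the percentile (given in percent)."""
    # Multiplying before dividing by 100 keeps whole results exact in floating point.
    return n_items * percentile / 100


n_items = 24 * 365 * 2             # 17520 data items, as in the example
high = serial_number(n_items, 95)  # 16644.0
low = serial_number(n_items, 5)    # 876.0

# Stand-in ascending sequence: the value of each item equals its serial number.
sequence = list(range(1, n_items + 1))
high_threshold = sequence[int(high) - 1]  # value of the 16644th data item
low_threshold = sequence[int(low) - 1]    # value of the 876th data item
print(high, low, high_threshold, low_threshold)
```

Both products are whole numbers here, so the data items at those serial numbers are used directly as thresholds.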
When the numerical value obtained by calculating the percentile of the abnormal value is a non-integer, rounding the numerical value of the non-integer, and taking the data corresponding to the serial number of the data item equal to the rounded value as the data corresponding to the percentile of the abnormal value; or,
and when the numerical value obtained by calculating the abnormal value percentile is a non-integer, calculating the average value of the data corresponding to the serial number of the previous data item and the serial number of the next data item which are adjacent to the non-integer numerical value respectively, and taking the average value as the data corresponding to the abnormal value percentile.
Specifically, when the numerical value corresponding to the abnormal value percentile is a non-integer, for example when the abnormal high value percentile is 97% and the abnormal low value percentile is 3%, the numerical value corresponding to the 97th percentile is 24 × 365 × 2 × 97% = 16994.4 and the numerical value corresponding to the 3rd percentile is 24 × 365 × 2 × 3% = 525.6. The numerical value may be rounded up (16994.4 rounds up to 16995 and 525.6 rounds up to 526), rounded down (16994.4 rounds down to 16994 and 525.6 rounds down to 525), or rounded to the nearest integer (16994.4 rounds to 16994 and 525.6 rounds to 526). Taking rounding to the nearest integer as an example, the abnormal high value threshold is the value of the 16994th data item and the abnormal low value threshold is the value of the 526th data item. Alternatively, the average values of the data corresponding to the previous and next data item serial numbers adjacent to 16994.4 and to 525.6 are calculated respectively; that is, the average of the values of the 16994th and 16995th data items is the data corresponding to the 97th percentile (i.e., the abnormal high value threshold), and the average of the values of the 525th and 526th data items is the data corresponding to the 3rd percentile (i.e., the abnormal low value threshold).
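The alternatives for a non-integer value can be sketched as one Python function. This is a minimal illustration under stated assumptions: the name `percentile_value`, the `mode` parameter and the stand-in sequence (each value equals its serial number) are hypothetical, and "round" is implemented as round-half-up, which the description does not specify.

```python
import math


def percentile_value(sequence, percentile, mode="round"):
    """Data corresponding to a percentile (in percent) of an ascending sequence."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    position = len(sequence) * percentile / 100
    if position.is_integer():
        # Integer case: use the data item at that serial number directly.
        return sequence[int(position) - 1]
    lower = max(1, math.floor(position))  # previous adjacent serial number
    upper = math.ceil(position)           # next adjacent serial number
    if mode == "ceil":
        return sequence[upper - 1]
    if mode == "floor":
        return sequence[lower - 1]
    if mode == "round":
        nearest = max(1, math.floor(position + 0.5))
        return sequence[nearest - 1]
    if mode == "average":
        return (sequence[lower - 1] + sequence[upper - 1]) / 2
    raise ValueError(f"unknown mode: {mode}")


sequence = list(range(1, 24 * 365 * 2 + 1))  # stand-in: value == serial number
for mode in ("ceil", "floor", "round", "average"):
    print(mode, percentile_value(sequence, 97, mode), percentile_value(sequence, 3, mode))
```

Because each stand-in value equals its serial number, the printed values can be compared directly with the serial numbers worked out in the text.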
S204, determining the abnormal low value threshold and the abnormal high value threshold of each component concentration according to the data corresponding to the abnormal value percentile.
S205, determining data between the abnormal low value threshold and the abnormal high value threshold as normal data, and determining the other data in the sequence as abnormal data.
Determining the abnormal threshold, the abnormal data and the normal data of the VOCs historical observation data realizes preliminary quality control of the VOCs historical observation data.
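Step S205 amounts to a simple partition of the historical values. The sketch below assumes that values equal to a threshold count as normal, which the description leaves open; the function name `split_by_thresholds` and the sample numbers are hypothetical.

```python
def split_by_thresholds(values, low_threshold, high_threshold):
    """Partition values into (normal, abnormal) lists using the two thresholds."""
    normal, abnormal = [], []
    for value in values:
        # Assumption: the thresholds themselves are treated as normal data.
        if low_threshold <= value <= high_threshold:
            normal.append(value)
        else:
            abnormal.append(value)
    return normal, abnormal


history = [0.02, 0.35, 0.41, 0.57, 0.63, 9.80]  # hypothetical concentrations
normal, abnormal = split_by_thresholds(history, low_threshold=0.05, high_threshold=5.0)
print(normal)    # [0.35, 0.41, 0.57, 0.63]
print(abnormal)  # [0.02, 9.8]
```

The `normal` list is what would later be passed on as training data in step S30.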
S30, respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data.
The normal data of the concentrations of the respective components obtained in step S20 are used to compose a training data set.
The training data set is used to train an isolated forest (isolation forest, IsoF) algorithm model and a local abnormal factor (local outlier factor, LOF) detection algorithm model respectively, which determines the parameters of both models. The trained IsoF model and the trained LOF model are then combined into an outlier detection model; that is, the outlier detection model is constructed on the basis of machine learning algorithms.
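One possible way to realise step S30 is with scikit-learn's `IsolationForest` and `LocalOutlierFactor`. The description does not name a library or say which features each model sees, so the following is a sketch under assumptions: one two-dimensional model per feature-related component pair, default-like hyperparameters, and synthetic stand-in data in place of real normal observations. The helper `votes` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the normal data of one pair (isobutane, n-butane):
# the two concentrations vary together, as expected of a feature-related pair.
rng = np.random.default_rng(0)
isobutane = rng.uniform(0.5, 5.0, size=500)
n_butane = 0.5 * isobutane + rng.normal(0.0, 0.05, size=500)
X_train = np.column_stack([isobutane, n_butane])

# Train both models on normal data only.
iso_forest = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
# novelty=True lets the fitted LOF model score observations it has not seen.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)


def votes(pair_observation):
    """Return (IsoF score, LOF score): 1 if flagged abnormal, else 0."""
    x = np.asarray(pair_observation, dtype=float).reshape(1, -1)
    # Both estimators return -1 for outliers and +1 for inliers.
    return int(iso_forest.predict(x)[0] == -1), int(lof.predict(x)[0] == -1)


print(votes([2.0, 1.0]))   # an observation that follows the usual relation
print(votes([20.0, 0.1]))  # an observation that breaks the relation
```

Mapping the library's -1/+1 labels to 1/0 matches the scoring convention used later in step S402.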
S40, detecting single-time volatile organic compound observation data by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
And after determining the abnormal threshold and the parameters of the two algorithm models, carrying out real-time dynamic detection on the single-time volatile organic compound observation data by using the abnormal threshold and the outlier detection model.
As shown in fig. 3, step S40 includes:
S401, judging whether the concentration data of each component in the single-time volatile organic compound observation data is normal data or abnormal data by using the abnormal threshold to obtain a primary detection result.
Specifically, concentration data between the abnormal low value threshold and the abnormal high value threshold is marked as normal data, and concentration data outside that range is marked as abnormal data; the abnormal data forms a primary single-time abnormality detection sample.
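The primary detection of step S401 can be illustrated per component as follows. The dictionary layout, the function name `primary_detection`, the threshold values and the sample observation are all assumptions for illustration; treating values equal to a threshold as normal is likewise an assumption.

```python
def primary_detection(observation, thresholds):
    """Label each component of a single-time observation as normal or abnormal."""
    result = {}
    for component, concentration in observation.items():
        low, high = thresholds[component]
        # Assumption: a value equal to a threshold is still normal.
        result[component] = "normal" if low <= concentration <= high else "abnormal"
    return result


# Hypothetical per-component (abnormal low, abnormal high) thresholds.
thresholds = {"acetylene": (0.05, 5.0), "benzene": (0.02, 3.0)}
observation = {"acetylene": 0.8, "benzene": 7.5}  # hypothetical single-time data
primary = primary_detection(observation, thresholds)
print(primary)  # {'acetylene': 'normal', 'benzene': 'abnormal'}
```

The components labelled abnormal here correspond to the primary single-time abnormality detection sample.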
S402, detecting and scoring each characteristic related component pair in the single-time volatile organic compound observation data by using the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model. And detecting each characteristic related component pair, and fully considering the influence of constraint relation factors among the components by utilizing the correlation among the components of the characteristic related component pair, thereby improving the detection accuracy.
When the characteristic related component pair is detected and scored, only the content data of one component in the characteristic related component pair needs to be detected and scored (because the data states are related, the content data of the other component in the characteristic related component pair does not need to be detected); and if the detection result of the content data of one component in the characteristic related component pair is abnormal data, marking as 1, and if the detection result of the content data of one component in the characteristic related component pair is normal data, marking as 0.
S403, calculating the sum of all detection scoring result scores (namely, the sum of all detection scoring result scores of the two models for respectively detecting the characteristic related component pairs).
S404, judging whether the single-time volatile organic compound observation data is abnormal data or not according to the sum of all the detection scoring result scores.
The sum of all the detection scoring result scores is recorded as the sum value, and it is judged whether the sum value is greater than the number of feature-related component pairs at the single time multiplied by alpha. If the sum value is greater than the number of feature-related component pairs multiplied by alpha, the single-time volatile organic compound observation data is judged to be abnormal data, all of the single-time volatile organic compound observation data is determined to be abnormal data, and the primary detection result of step S401 is discarded. If the sum value is not greater than the number of feature-related component pairs multiplied by alpha, the single-time volatile organic compound observation data is judged not to be abnormal data, and the primary detection result is taken as the detection result of the single-time volatile organic compound observation data. Alpha is a preset parameter with a value range of [0, 100%] that is set according to experience. By using the correlation between the components of each feature-related component pair, the influence of the constraint relations among components is fully considered, which improves the accuracy of judging whether the single-time volatile organic compound observation data is abnormal data. In the present embodiment, alpha is preferably 75%.
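The decision rule of steps S403 and S404 can be sketched in a few lines of Python. The function name `final_detection`, the data layout and the sample scores are hypothetical; the comparison (sum value greater than the number of pairs multiplied by alpha) follows the rule described above, with alpha defaulting to the preferred 75%.

```python
def final_detection(pair_scores, primary_result, alpha=0.75):
    """Combine model scores with the primary result for one single-time observation.

    pair_scores maps each feature-related pair to (IsoF score, LOF score),
    where a score of 1 means abnormal and 0 means normal.
    """
    sum_value = sum(isof + lof for isof, lof in pair_scores.values())
    if sum_value > len(pair_scores) * alpha:
        # The whole observation is abnormal; the primary result is discarded.
        return {component: "abnormal" for component in primary_result}
    # Otherwise the primary (threshold-based) result stands.
    return dict(primary_result)


primary = {"isobutane": "normal", "n-butane": "normal",
           "acetylene": "abnormal", "benzene": "normal"}
scores_bad = {("isobutane", "n-butane"): (1, 1), ("acetylene", "benzene"): (1, 0)}
scores_ok = {("isobutane", "n-butane"): (0, 0), ("acetylene", "benzene"): (1, 0)}

print(final_detection(scores_bad, primary))  # sum 3 > 2 * 0.75, all abnormal
print(final_detection(scores_ok, primary))   # sum 1 <= 1.5, primary result kept
```

Since each pair contributes up to two points (one per model), the sum value can reach twice the number of pairs.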
In step S40, the abnormal threshold is first used for primary detection, and the isolated forest algorithm model and the local abnormal factor detection algorithm model are then used for further detection, which improves the accuracy of the detection result.
As shown in fig. 4, another embodiment of the present application provides a device for detecting observed data of volatile organic compounds, including:
the acquisition module is used for acquiring historical observation data of the volatile organic compounds for a period of time;
the determining module is used for determining abnormal threshold values and normal data of concentrations of all components in the historical volatile organic compound observation data; wherein the anomaly threshold comprises an anomaly high value threshold and an anomaly low value threshold;
the training module is used for respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data;
and the detection module is used for detecting single-time volatile organic compound observation data by utilizing the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
In some embodiments, the determining module is specifically configured to determine the abnormal threshold values and normal data of the concentrations of all components in the historical volatile organic compound observation data by a percentile threshold method.
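The percentile threshold method, as recited in the claims (ascending sort, data item serial number equal to the number of data multiplied by the percentile, then rounding or averaging the two neighbouring items when the serial number is not an integer), might be sketched as below. The function names, the round-half-up mode, the inclusive bounds for normal data, and the example percentiles are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def percentile_item(sorted_values: Sequence[float], pct: float,
                    average_neighbors: bool = False) -> float:
    """Return the datum at 1-based serial number n * pct of an ascending sequence."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("sequence is empty")
    pos = n * pct                        # serial number, possibly non-integer
    nearest = round(pos)
    if abs(pos - nearest) < 1e-9:        # integer case (tolerating float error)
        serial = nearest
    elif average_neighbors:              # mean of the two adjacent data items
        lo = min(max(math.floor(pos), 1), n)
        hi = min(max(math.ceil(pos), 1), n)
        return (sorted_values[lo - 1] + sorted_values[hi - 1]) / 2
    else:                                # round half up (assumed rounding mode)
        serial = math.floor(pos + 0.5)
    serial = min(max(serial, 1), n)      # clamp to a valid serial number
    return sorted_values[serial - 1]


def thresholds_and_normal(values: Sequence[float], low_pct: float,
                          high_pct: float) -> Tuple[float, float, List[float]]:
    """Return (abnormal low threshold, abnormal high threshold, normal data)."""
    ordered = sorted(values)             # ascending order
    low = percentile_item(ordered, low_pct)
    high = percentile_item(ordered, high_pct)
    # Data between the two thresholds is kept as normal data (inclusive here).
    normal = [v for v in ordered if low <= v <= high]
    return low, high, normal
```

For ten concentration values 1 to 10 with hypothetical percentiles of 10% and 90%, the serial numbers are 1 and 9, so the thresholds are 1 and 9 and the value 10 is excluded from the normal data.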
As shown in fig. 5, another embodiment of the present application provides an electronic device 50, which includes a memory 501, a processor 500, and a computer program stored on the memory 501 and executable on the processor 500, and the processor 500 executes the program to implement the method for detecting the observed volatile organic compound data. The electronic device 50 further comprises a bus 502 and a communication interface 503, and the processor 500, the communication interface 503 and the memory 501 are connected in pairs via the bus 502.
Another embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor to implement the method for detecting observed volatile organic compounds described above.
The embodiments of the present application provide a method for determining the abnormal high value threshold and the abnormal low value threshold of each component of volatile organic compound observation data, and combine the abnormal threshold method, the isolated forest (IsoF) algorithm model and the local abnormal factor detection (LOF) algorithm model for abnormality detection of volatile organic compound observation data. Both the IsoF algorithm and the LOF algorithm are unsupervised machine learning algorithms.
At present, machine learning algorithms have not been applied to abnormality detection of volatile organic compound observation data. By adopting the IsoF algorithm model and the LOF algorithm model, the constraint relationships among components are fully considered, abnormal volatile organic compound observation data is screened automatically, and normal data is distinguished from abnormal data. Multivariable constraint relationships such as regional differences and temporal differences can be considered comprehensively during detection, which can considerably improve the accuracy of the detection result, provides stronger support for the scientific prevention and control of atmospheric pollution, and fills the gap in applying machine learning algorithms to abnormality detection of volatile organic compound observation data.
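As a rough illustration of how the two unsupervised models could be trained on normal data and then used to score one characteristic related component pair, the sketch below uses scikit-learn's `IsolationForest` and `LocalOutlierFactor`. The choice of library, the two-column feature layout for a pair, the default hyperparameters, and the function names are assumptions; the application does not name a specific implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def train_models(normal_pairs: np.ndarray):
    """Fit both models on normal data for one component pair.

    normal_pairs has shape (n_samples, 2): the concentrations of the two
    components of the pair (assumed feature layout).
    """
    isof = IsolationForest(random_state=0).fit(normal_pairs)
    # novelty=True lets LOF score observations not seen during fitting.
    lof = LocalOutlierFactor(novelty=True).fit(normal_pairs)
    return isof, lof


def score_pair(isof, lof, pair_values):
    """Return (IsoF score, LOF score), each 1 for abnormal and 0 for normal."""
    x = np.asarray(pair_values, dtype=float).reshape(1, -1)
    # Both estimators predict -1 for outliers and +1 for inliers.
    isof_flag = int(isof.predict(x)[0] == -1)
    lof_flag = int(lof.predict(x)[0] == -1)
    return isof_flag, lof_flag
```

The 0/1 scores returned for every pair are what the summation and comparison with α described earlier would consume.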
It should be noted that:
the term "module" is not intended to be limited to a particular physical form. Depending on the particular application, a module may be implemented as hardware, firmware, software, and/or combinations thereof. Furthermore, different modules may share common components or even be implemented by the same component. There may or may not be clear boundaries between the various modules.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed at different times and in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above embodiments merely illustrate implementations of the present application; although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A detection method of volatile organic compound observation data is characterized by comprising the following steps:
acquiring historical observation data of volatile organic compounds for a period of time;
determining abnormal threshold values and normal data of concentrations of all components in the historical volatile organic compound observation data; wherein the anomaly threshold comprises an anomaly high value threshold and an anomaly low value threshold;
respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data;
and detecting single-time volatile organic compound observation data by using the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
2. The method for detecting observed volatile organic compound data according to claim 1, wherein the determining abnormal threshold values and normal data of concentrations of components in the historical observed volatile organic compound data includes: determining abnormal threshold values and normal data of the concentrations of all components in the historical volatile organic compound observation data by adopting a percentile threshold value method.
3. The method for detecting observed volatile organic compound data according to claim 2, wherein the determining abnormal threshold values and normal data of concentrations of components in the historical observed volatile organic compound data by using a percentile threshold method comprises:
arranging the concentrations of all components in the historical volatile organic compound observation data in an ascending order to obtain a sequence;
setting the percentile of abnormal values of the concentrations of the components;
determining abnormal low value threshold and abnormal high value threshold of each component concentration according to data corresponding to the abnormal value percentile;
determining data between the abnormally low threshold and the abnormally high threshold as normal data.
4. The method for detecting observed volatile organic compound data according to claim 3, wherein before determining the abnormally low threshold value and the abnormally high threshold value of each component concentration according to the data corresponding to the abnormal percentile, the method further comprises:
calculating the serial numbers of the data items corresponding to the percentile of the abnormal values;
multiplying the number of the data of the sequence by the percentile of the abnormal value, and calculating to obtain a numerical value;
when the numerical value obtained by calculating the percentile of the abnormal value is an integer, taking the data corresponding to the data item serial number equal to the numerical value as the data corresponding to the percentile of the abnormal value;
when the numerical value obtained by calculating the percentile of the abnormal value is a non-integer, rounding the non-integer numerical value, and taking the data corresponding to the data item serial number equal to the rounded value as the data corresponding to the percentile of the abnormal value; or,
when the numerical value obtained by calculating the percentile of the abnormal value is a non-integer, calculating the average value of the data corresponding to the preceding and following data item serial numbers adjacent to the non-integer numerical value, and taking the average value as the data corresponding to the percentile of the abnormal value.
5. The method for detecting observed volatile organic compounds data according to claim 1, wherein the detecting observed volatile organic compounds data for a single time by using the anomaly threshold, the trained isolated forest algorithm model and the trained local anomaly factor detection algorithm model comprises:
judging whether the concentration of each component in the single-time volatile organic compound observation data is normal data or abnormal data by using the abnormal threshold value to obtain a primary detection result;
respectively detecting and scoring each characteristic related component pair in the single volatile organic compound observation data by using the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model;
calculating the sum of all detection scoring result scores;
judging whether the single-time volatile organic compound observation data is abnormal data or not according to the sum of all detection scoring result scores;
if yes, all the single-time volatile organic compound observation data are determined to be abnormal data; and if not, taking the primary detection result as the detection result of the single-time volatile organic compound observation data.
6. The method for detecting observed volatile organic compound data according to claim 5, wherein the step of determining whether the single-time observed volatile organic compound data is abnormal data according to the sum of all the detection score results comprises:
judging whether the sum of all the detection scoring result scores is larger than the number of single-time characteristic related component pairs multiplied by α; wherein α is a preset parameter, and the value range of α is [0, 100%];
if yes, judging that the single-time volatile organic compound observation data are abnormal data;
and if not, judging that the single-time volatile organic compound observation data is not abnormal data.
7. A detection device for volatile organic compound observation data is characterized by comprising:
the acquisition module is used for acquiring historical observation data of the volatile organic compounds for a period of time;
the determining module is used for determining abnormal threshold values and normal data of concentrations of all components in the historical volatile organic compound observation data; wherein the anomaly threshold comprises an anomaly high value threshold and an anomaly low value threshold;
the training module is used for respectively training an isolated forest algorithm model and a local abnormal factor detection algorithm model by using the normal data;
and the detection module is used for detecting single-time volatile organic compound observation data by utilizing the abnormal threshold, the trained isolated forest algorithm model and the trained local abnormal factor detection algorithm model.
8. The apparatus according to claim 7, wherein the determining module is specifically configured to: determine abnormal threshold values and normal data of the concentrations of all components in the historical volatile organic compound observation data by adopting a percentile threshold value method.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method for detecting volatile organic compound observation data as claimed in any of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, the program being executed by a processor to implement the method for detecting observed volatile organic compounds according to any one of claims 1 to 6.
CN202010700226.8A 2020-07-20 2020-07-20 Method, device, equipment and medium for detecting volatile organic compound observation data Pending CN111710373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700226.8A CN111710373A (en) 2020-07-20 2020-07-20 Method, device, equipment and medium for detecting volatile organic compound observation data

Publications (1)

Publication Number Publication Date
CN111710373A true CN111710373A (en) 2020-09-25

Family

ID=72546839

Country Status (1)

Country Link
CN (1) CN111710373A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102105794A (en) * 2008-06-04 2011-06-22 B.R.A.H.M.S有限公司 A marker for graft failure and mortality
CN108156131A (en) * 2017-10-27 2018-06-12 上海观安信息技术股份有限公司 Webshell detection methods, electronic equipment and computer storage media
CN108709945A (en) * 2018-05-25 2018-10-26 浙江省环境监测中心 A kind of VOCs stationary sources on-line monitoring monitoring method
CN109063993A (en) * 2018-07-23 2018-12-21 上海市环境监测中心 A kind of method of atmospheric environment VOCs online monitoring data quality automatic discrimination
CN109034252A (en) * 2018-08-01 2018-12-18 中国科学院大气物理研究所 The automatic identification method of air quality website monitoring data exception
CN109088903A (en) * 2018-11-07 2018-12-25 湖南大学 A kind of exception flow of network detection method based on streaming
CN110619345A (en) * 2019-07-22 2019-12-27 重庆交通大学 Cable-stayed bridge monitoring data validity-oriented label reliability comprehensive verification method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
林涛: "基于无监督学习的物联网卡流量异常检测算法", 《城市建设理论研究(电子版)》 *
王会会: "空气质量监测可视化分析系统的设计与实现", 《中国优秀硕士学位论文全文数据库 工程科技Ⅰ辑》 *
王鸣等: "基于城市大气挥发性有机物特征分析的数据质量评估方法及案例", 《中国环境监测》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340063A (en) * 2020-02-10 2020-06-26 北京华电天仁电力控制技术有限公司 Coal mill data anomaly detection method
CN111340063B (en) * 2020-02-10 2023-08-29 国能信控互联技术有限公司 Data anomaly detection method for coal mill
CN114580572A (en) * 2022-04-07 2022-06-03 中科三清科技有限公司 Abnormal value identification method and device, electronic equipment and storage medium
CN117236528A (en) * 2023-11-15 2023-12-15 成都信息工程大学 Ozone concentration forecasting method and system based on combined model and factor screening
CN117236528B (en) * 2023-11-15 2024-01-23 成都信息工程大学 Ozone concentration forecasting method and system based on combined model and factor screening
CN118133211A (en) * 2024-05-07 2024-06-04 山东世纪智慧农业科技有限公司 Black termitomyces albuminosus stick pollution evaluation method based on multidimensional sensor
CN118133211B (en) * 2024-05-07 2024-07-09 山东世纪智慧农业科技有限公司 Black termitomyces albuminosus stick pollution evaluation method based on multidimensional sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination