CN112949735A - Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining - Google Patents

Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining

Info

Publication number
CN112949735A
Authority
CN
China
Prior art keywords
outlier
weight
distance
algorithm
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110273839.2A
Other languages
Chinese (zh)
Inventor
薛善良
彭振峰
韦青燕
肖雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110273839.2A
Publication of CN112949735A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on outlier data mining. First, a remove-one partition information entropy is introduced to determine the weights of the outlier attributes. Next, a density-based clustering algorithm screens the original data set collected by the sensors to obtain a preliminary outlier data set, which improves the operating efficiency of the algorithm. Then the P weight replaces the reachable distance in the local outlier factor (LOF) algorithm. Finally, the newly defined P-weight-based local outlier factor, LOFBP, is used to calculate the outlier degree of the objects in the preliminary outlier data set. By processing a large amount of gas concentration sensor data with data mining techniques, the invention improves the data reliability of a single gas concentration sensor and lets the data from multiple gas concentration sensor arrays be treated as a whole to estimate the gas concentration in space, thereby helping hazardous chemical production and processing enterprises improve their ability to identify production safety risks and prevent production accidents.

Description

Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining
Technical Field
The invention relates to an outlier data mining method, and in particular to a method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on an improved local outlier factor (LOF) algorithm.
Background
The storage and transportation of liquid hazardous chemicals has always been a matter of life and property safety for people in China. Different liquid hazardous chemicals have different properties, but volatility is common to most of them: gasoline, LNG, liquid ammonia, alcohols and benzene are all common volatile hazardous chemicals. In practice, petrochemical enterprises need to monitor several safety indexes frequently during storage and transportation to ensure that no accidental leakage occurs in the production process. Monitoring the gas volatilized by liquid hazardous chemicals is an important basis for judging accidental leakage. In open working environments, the limitations of deploying a single gas sensor often lead to errors and even false alarms. Enterprises therefore also deploy a large number of sensors as an array for detection. The larger number of sensors reduces false alarms and improves accuracy, but the large volume of raw sensor data must be preprocessed before it can be used. Efficiently preprocessing the data, accurately identifying individual sensors with accidental errors, and mining the outlier sensor data are therefore problems of considerable significance and practical value.
Traditional outlier data mining algorithms are poorly suited to the monitoring data of liquid hazardous chemical gas concentration sensors and take too long to run on them. The data collected by a gas concentration sensor depend on actual conditions and are highly variable and unpredictable, and the causes of an outlier in the data may be multiple. Safety problems usually have strict timeliness requirements, so an outlier data mining algorithm must execute quickly and locate outlier data points accurately in order to provide timely and accurate data for subsequent analysis. The invention applies data mining techniques to outlier detection in gas concentration data, provides a method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on an improved local outlier factor algorithm, and helps subsequent algorithms process large batches of sensor data in a more targeted way.
Disclosure of Invention
The invention aims to solve the problems that existing outlier data mining algorithms are poorly suited to large amounts of sensor monitoring data and take too long to run when a gas concentration sensor array collects data in batches, and provides a method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on outlier data mining.
The technical scheme of the invention is as follows:
The method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on outlier data mining performs outlier mining on the monitoring data of liquid hazardous chemical gas concentration sensors, and can improve the efficiency of processing the volatile gas concentration monitoring data. The method is characterized in that, first, a remove-one partition information entropy is introduced to determine the weights of the outlier attributes; then the original data set acquired by the gas concentration sensors is screened with the OPTICS clustering algorithm to obtain a preliminary outlier data set, which improves the operating efficiency of the algorithm; the P weight then replaces the reachable distance in the LOF algorithm; and finally the outlier degree of the objects in the preliminary outlier data set is calculated with a newly defined local outlier factor based on the P weight (LOFBP), improving execution efficiency while maintaining the detection precision of the algorithm.
To address the excessive running time of existing outlier data mining algorithms when detecting outlier sensor data, the LOFBP algorithm uses OPTICS as a preprocessing step and redefines the outlier factor of the local outlier factor algorithm. To reduce the time complexity of outlier mining, the data set is reduced without affecting the final analysis result, which improves mining efficiency. To remedy the shortcomings of the local outlier factor algorithm, the LOFBP algorithm introduces a remove-one partition information entropy when measuring distance, replaces the reachable distance of the traditional LOF algorithm with the P weight, and redefines the local outlier factor, which can greatly improve detection accuracy.
The method specifically comprises the following steps:
Step 1: read the original data set S acquired by the gas sensors;
Step 2: calculate the remove-one partition information entropy increment $\Delta(N_i)$ for all attributes in the data set;
a) To improve the quality of outlier detection, the distance between data objects in the OPTICS algorithm is measured with a weighted distance, and the attribute weights are determined by the remove-one partition information entropy increment. Information entropy measures how much information a system contains, so the entropy value E(x) measures the uncertainty of a data set. It is defined as:
$$E(x) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i) \qquad (1)$$
In formula (1), x is a random variable whose set of possible values is $S(x) = \{x_1, x_2, \ldots, x_n\}$;
$p(x_i)$ is the probability that x takes the value $x_i$.
b) To highlight outlier attributes, the weight of an attribute is defined by the change in entropy after that attribute is removed. Let the attribute set be $N = \{N_1, N_2, \ldots, N_m\}$. Taking $N_i$ ($i = 1, 2, \ldots, m$) divides N into two parts, $\{N_i\}$ and $\{N - N_i\}$, denoted $P = \{P_1, P_2\}$, where $P_1 = \{N_i\}$ and $P_2 = \{N_1, N_2, \ldots, N_{i-1}, N_{i+1}, \ldots, N_m\}$. The remove-one partition information entropy increment $\Delta(N_i)$ is defined by formula (2); the larger its value, the more the uncertainty of the data set is reduced after $N_i$ is removed:
$$\Delta(N_i) = E(N) - E(P) \qquad (2)$$
In formula (2), $\Delta(N_i)$ is the change in information entropy after $N_i$ is removed from the set N;
E(N) is the information entropy of the attribute set N;
E(P) is calculated by formula (3):
[Formula (3): the expression for E(P) is an image in the source (BDA0002975689300000021) and is not reproduced in the extracted text.]
c) If two data objects are $p = \{p_1, p_2, \ldots, p_m\}$ and $p' = \{p'_1, p'_2, \ldots, p'_m\}$, and the weighted distance between them is denoted dist(p, p′), then the weighted distance based on the remove-one partition information entropy increment is defined as:
$$\mathrm{dist}(p, p') = \sum_{i=1}^{m} \Delta(N_i)\, d(p_i, p'_i) \qquad (4)$$
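Formulas (1) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: probabilities are estimated from observed value frequencies, the logarithm is taken as the natural logarithm (the text does not name a base), the per-attribute distance d is taken to be the absolute difference, and the weights Δ(N_i) are passed in by the caller because formula (3) for E(P) is not available in the text. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Formula (1): E(x) = -sum p(x_i) * log p(x_i).

    Assumption: p(x_i) is estimated as the relative frequency of x_i
    in `values`, and log is the natural logarithm.
    """
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def weighted_distance(p, q, weights):
    """Formula (4): sum over attributes of weight_i * d(p_i, q_i).

    Assumption: d is the absolute difference of the attribute values.
    `weights` stands in for the increments Delta(N_i), which the caller
    must supply.
    """
    return sum(w * abs(a - b) for w, a, b in zip(weights, p, q))
```

With equal weights the function reduces to a Manhattan distance; an attribute with a larger Δ(N_i) contributes proportionally more to the distance between two objects.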
Step 3: calculate the reachable distance of all objects in the data set;
If p is a core object, the reachable distance of o with respect to p is defined as the larger of the core distance of p and the distance between o and p; if p is not a core object, the reachable distance is undefined. Thus, for objects p, o ∈ S, the reachable distance is defined as follows:
$$\mathrm{reachDist}(o, p) = \begin{cases} \max\{\mathrm{coreDist}(p),\ d(p, o)\}, & p \text{ is a core object} \\ \text{undefined}, & \text{otherwise} \end{cases} \qquad (5)$$
In formula (5), reachDist(o, p) is the reachable distance of o with respect to p;
the reachable distance is calculated if and only if p is a core object.
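The reachable-distance rule of Step 3 can be sketched as follows. The parameter names `eps` and `min_pts` are the usual OPTICS neighborhood radius and minimum neighbor count; they are assumptions here, since the text does not name them, and this sketch counts only neighbors other than p itself. `None` stands for "undefined".

```python
def core_distance(dists_from_p, eps, min_pts):
    """Core distance of p, or None when p is not a core object.

    dists_from_p: distances from p to every other object (p excluded).
    Assumption: p is a core object when at least `min_pts` other objects
    lie within `eps`; the core distance is then the distance to the
    min_pts-th nearest of them.
    """
    within = sorted(d for d in dists_from_p if d <= eps)
    if len(within) < min_pts:
        return None
    return within[min_pts - 1]


def reach_dist(core_dist_p, d_po):
    """Reachable distance of o with respect to p.

    Returns the larger of coreDist(p) and d(p, o) when p is a core
    object, and None (undefined) otherwise.
    """
    if core_dist_p is None:
        return None
    return max(core_dist_p, d_po)
```

An object very close to a core object therefore never gets a reachable distance below that core object's core distance, which smooths the ordering that OPTICS produces.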
And 4, step 4: obtaining a preliminary outlier data set S using the OPTIC algorithm2
Step 4.1: after the points in the neighborhood are added into the unordered queue, the whole unordered queue is not required to be sorted, and only the minimum point of the reachable distance is taken out through comparison and stored into a temporary variable. When a new point in the non-ordered queue needs to be processed, only the minimum point of the temporary variable storage needs to be taken out, and the reachable graph is obtained through the method.
Step 5: calculate the k-distance and the k-distance neighborhood of all objects in the preliminary outlier data set, and calculate the P weight;
The P weight method is a distance-based method for finding outlier data; it measures the outlier degree of an object in a data set by its P weight. The sum of the distances between an arbitrary object p and its k nearest objects is called the P weight, and is calculated as follows:
$$W_k(p) = \sum_{i=1}^{k} d_i\big(p, nb_i(p)\big) \qquad (6)$$
In formula (6), $W_k(p)$ is the P weight;
$nb_i(p)$ is the i-th nearest neighbor of p;
$d_k(p, nb_k(p))$ is the distance from point p to the k-th object nearest to p.
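A minimal sketch of formula (6), assuming the pairwise distances are already available as a square matrix (for example the weighted distances computed during clustering). The function name and the matrix layout are illustrative choices, not part of the patent text.

```python
def p_weight(index, dist_matrix, k):
    """Formula (6): sum of the distances from one object to its k nearest objects.

    dist_matrix[i][j] is the distance between objects i and j;
    the object itself (j == index) is excluded from its own neighbors.
    """
    others = [d for j, d in enumerate(dist_matrix[index]) if j != index]
    return sum(sorted(others)[:k])


# Usage: four points on a line; the isolated point has the largest P weight.
points = [0, 1, 3, 10]
dist = [[abs(a - b) for b in points] for a in points]
weights = [p_weight(i, dist, k=2) for i in range(len(points))]
```

In the usage lines the point at 10 is far from every other point, so its P weight is the largest of the four.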
Step 6: calculate the local density Ldp based on the P weight;
Measuring the outlier degree of an object with the P weight is simple, but it can only find outlier data in regions of a single density, so the algorithm borrows the idea of the local outlier factor algorithm to improve the P weight. In the LOF algorithm, for a core point p, the reachable distance from any point o to p is defined as the larger of d(p, o) and coreDist(p); here, the reachable distance in the local outlier factor algorithm is replaced with the P weight of the core object. Since the distance between any two points has already been calculated in the OPTICS algorithm, $dist_i(p, nb_i(p))$ in formula (7) uses the weighted distances already calculated during the clustering process of the OPTICS algorithm. Given a data set S and any point p in the set, the local density $Ldp_k(p)$ based on the P weight proposed herein is given by the following formula:
[Formula (7): the expression for $Ldp_k(p)$ is an image in the source (BDA0002975689300000041) and is not reproduced in the extracted text.]
In formula (7), $N_k(p)$ is the k-distance neighborhood of object p;
$dist_i(p, nb_i(p))$ is the weighted distance based on the remove-one partition information entropy increment.
Step 7: calculate the local reachable density LOFBP based on the P weight;
Following the definition of the local outlier factor in the LOF algorithm, the P-weight-based local outlier factor in LOFBP can be defined analogously from $Ldp_k(p)$. It is obtained as the mean of the ratios of the density of each object in the ε-neighborhood of object p to the density of p, and this value is denoted $LOFBP_k(p)$. The closer $LOFBP_k(p)$ is to 1, the smaller the difference between the density of p and that of the objects in $N_k(p)$, and the more likely p and the objects in $N_k(p)$ belong to the same cluster. The smaller $LOFBP_k(p)$ is, the higher the density of p relative to the objects in $N_k(p)$; conversely, the larger the value, the more likely p is an outlier. The local outlier factor of object p is defined as follows:
[Formula (8): the expression for $LOFBP_k(p)$ is an image in the source (BDA0002975689300000042) and is not reproduced in the extracted text.]
In formula (8), $LOFBP_k(p)$ is the local reachable density of point p, which can be used as an index of how strongly the data point is an outlier;
Step 8: output the local reachable density LOFBP in descending order to obtain the outlier data.
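Steps 7 and 8 can be sketched from the verbal description alone: the score of an object is the mean ratio of its neighbors' densities to its own density, and objects are then output in descending order of score. Because formula (7) is not available in the text, the local densities are taken here as an input supplied by the caller; the function names and the list-based layout are assumptions for illustration only.

```python
def lofbp_scores(density, neighbors):
    """Mean ratio of each neighbor's density to the object's own density.

    density[i]   -- local density Ldp_k of object i, supplied by the caller
                    (formula (7) is not reproduced in the text).
    neighbors[i] -- indices of the objects in the neighborhood of object i.
    """
    scores = []
    for i, nbrs in enumerate(neighbors):
        ratios = [density[j] / density[i] for j in nbrs]
        scores.append(sum(ratios) / len(ratios))
    return scores


def rank_outliers(scores):
    """Step 8: object indices sorted by score in descending order."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Usage: object 2 is much sparser than its neighbors, so it ranks first.
density = [2.0, 2.0, 0.5]
neighbors = [[1], [0], [0, 1]]
scores = lofbp_scores(density, neighbors)
ranking = rank_outliers(scores)
```

Objects whose density matches that of their neighbors score close to 1, while an object in a sparse region surrounded by dense neighbors scores well above 1 and appears at the head of the ranking.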
The invention has the beneficial effects that:
(1) The invention overcomes the low execution efficiency of traditional outlier mining algorithms for anomaly discovery when processing sensor array data. Outlier data points are mined by analyzing the raw gas concentration sensor data. The method can assist subsequent algorithms in processing large amounts of sensing data and improve execution efficiency.
(2) The LOFBP provided by the invention uses OPTICS to preprocess the original data set of the gas concentration sensors, introduces a remove-one partition information entropy when measuring distance, replaces the reachable distance with the P weight, and redefines the local outlier factor, thereby effectively improving the accuracy of outlier mining while maintaining efficiency.
(3) Compared with the LOF, P-weight and LODCD algorithms, the LOFBP provided by the invention has the best combined performance in mining effect and operating efficiency: it mines outlier data points effectively while consuming very little time.
By analyzing the raw data of the gas concentration sensors, the method can mine outlier data points more efficiently and accurately, which assists the execution of a subsequent spatial concentration estimation algorithm or other practically significant methods, improves the execution efficiency of the algorithm under the same hardware conditions, and improves the ability to identify production safety risks.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic of a simulated data set for use with the present invention.
FIG. 3 is a graph comparing the detection accuracy of the algorithm of the present invention with three other algorithms on a simulated data set.
FIG. 4 is a graph comparing the detection accuracy of the algorithm of the present invention with three other algorithms on an Iris dataset.
FIG. 5 is a graph comparing the detection accuracy of the algorithm of the present invention with three other algorithms on the Breast-cancer data set.
FIG. 6 is a run time comparison of the algorithm of the present invention with three other algorithms.
Detailed Description
The invention is explained in more detail below with reference to the drawings and the examples.
The invention aims to solve the problems that existing outlier data mining algorithms cannot effectively detect outlier gas concentration sensor data points and take too long to run. First, the weights of the various attributes, such as sensor deployment position, wind speed, temperature and concentration, are determined using the remove-one partition information entropy. Then the original data set of the gas concentration sensors is screened with the OPTICS clustering algorithm to obtain a preliminary outlier data set, and the reachable distance in the local outlier factor algorithm is replaced with the P weight. Finally, the outlier degree of the objects in the preliminary outlier data set is calculated with the newly defined P-weight-based local outlier factor LOFBP, and the outlier data points are mined.
Fig. 1 is a flow chart of the present invention, and the specific implementation process is as follows:
Step 1: read the original data set S acquired by the gas sensors;
Step 2: calculate the remove-one partition information entropy increment $\Delta(N_i)$ for all attributes in the data set;
Step 3: calculate the reachable distance reachDist of all objects in the data set;
Step 4: obtain a preliminary outlier data set $S_2$ using the OPTICS algorithm;
Step 5: calculate the k-distance and the k-distance neighborhood of all objects in the preliminary outlier data set, and calculate the P weight;
Step 6: calculate the local density based on the P weight;
Step 7: calculate the local reachable density based on the P weight;
Step 8: output the local reachable density in descending order to obtain the outlier data.
In the experiment, one simulated data set and two UCI data sets are used to verify the effectiveness of the method and analyze its efficiency. The gas concentration sensor data used in the implementation are real experimental data from UCI that can be used without authorization. The simulated data set is shown in FIG. 2; it contains two clusters of different densities and a set of outliers. There are 500 cluster data points, represented by black dots, and 10 outliers, represented by black triangles. Statistical analysis is performed on the first 10 items of each run result: the number $R_0$ of true outliers among the first 10 items is counted, and the detection accuracy of the LOF, P-weight, LODCD and LOFBP algorithms under different k values is calculated from $R_0$. The comparison is shown in FIG. 3. FIG. 4 and FIG. 5 show the detection accuracy of the four algorithms on the Iris and Breast-cancer data sets, respectively. As FIGS. 3 to 5 show, the overall detection accuracy of the LOFBP algorithm is the highest. FIG. 6 compares the running time of the four algorithms on the Breast-cancer data set: the P-weight algorithm has the shortest running time, the LODCD algorithm has the longest, and the LOFBP and LOF algorithms are roughly equal. The results show that the LOFBP algorithm provided by the invention not only mines data more effectively but also maintains high operating efficiency.
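The evaluation described above counts the true outliers among the first 10 ranked items. The text does not give the formula that turns $R_0$ into a detection accuracy, so the sketch below assumes the simplest reading, $R_0$ divided by the number of items inspected; the function name and arguments are hypothetical.

```python
def detection_accuracy(ranked_ids, true_outliers, top_n=10):
    """Fraction of the first `top_n` ranked objects that are true outliers.

    Assumption: accuracy = R0 / top_n, where R0 is the number of true
    outliers among the first `top_n` results. The source only says the
    accuracy is calculated from R0.
    """
    r0 = sum(1 for i in ranked_ids[:top_n] if i in true_outliers)
    return r0 / top_n
```

For the simulated data set, which holds 10 outliers, this reading gives the same value whether $R_0$ is divided by the number of items inspected or by the number of true outliers.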
The gas concentration sensor data used in the implementation take ethanol as a typical liquid hazardous chemical; the experimental environment is windless, with a temperature of 22.4 °C and a humidity of 68.92%. The data are the concentration readings of 72 metal oxide gas concentration sensors, divided into nine groups of eight sensors each, sampled at 100 Hz for 20 seconds. After subtracting the time consumed by the device for sampling, 1928 samples were collected in total, giving 72 × 1928 = 138,816 sampled data points. Data processing and outlier analysis with the algorithm of the invention quickly locate the outlier sampled data points. Sorting the sampled data points by outlier index makes several kinds of subsequent processing easier. For example: 1. because the gas concentration can hardly jump under fast sampling, a data point whose outlier factor exceeds a certain threshold can be preliminarily judged to be noise caused by equipment failure and can be eliminated; 2. if the data points collected by one group of sensors generally have high outlier indexes, it can be checked whether the equipment is working in the required normal environment, since the sampling result may be affected by the working temperature or the integrated hardware environment; 3. if the data distributions of many different groups show similar clusters, the relative position of an individual sensor within the sensor array may be affecting the sampling results; and so on.
In practice, outlier analysis of a large number of data indexes makes it possible to eliminate noise points promptly during feature engineering, to analyze the distribution of data points more effectively, and to discover factors that were not fully considered when designing the experiment or production practice. It can also speed up the execution of subsequent anomaly-handling algorithms, and therefore has great practical significance.
The parts not described in the present invention are the same as the prior art or can be implemented using the prior art.

Claims (8)

1. A method for discovering abnormal volatile concentrations of liquid hazardous chemicals based on outlier data mining, characterized in that: first, a remove-one partition information entropy is introduced to determine the weights of the outlier attributes; then the original data set acquired by the gas concentration sensors is screened with the OPTICS clustering algorithm to obtain a preliminary outlier data set, improving the operating efficiency of the algorithm; the P weight then replaces the reachable distance in the LOF algorithm; and finally the outlier degree of the objects in the preliminary outlier data set is calculated with a newly defined local outlier factor based on the P weight (LOFBP), improving execution efficiency while maintaining the detection precision of the algorithm.
2. The method according to claim 1, characterized in that it comprises the following steps:
Step 1: read the gas sensor raw data set;
Step 2: calculate the remove-one partition information entropy increment of all attributes in the data set;
Step 3: calculate the reachable distance of all objects in the data set;
Step 4: obtain a preliminary outlier data set using the OPTICS algorithm;
Step 5: calculate the k-distance and the k-distance neighborhood of all objects in the preliminary outlier data set, and calculate the P weight;
Step 6: calculate the local density based on the P weight;
Step 7: calculate the local reachable density based on the P weight;
Step 8: output the local reachable density in descending order to obtain the outlier data.
3. The method of claim 2, wherein obtaining the preliminary outlier data set with the OPTICS algorithm comprises: after the points in the neighborhood are added to the unordered queue, the whole queue does not need to be sorted; the point with the minimum reachable distance is obtained simply by comparing each newly added point with the previous minimum point, and is stored in a temporary variable; when a new point from the unordered queue needs to be processed, only the minimum point stored in the temporary variable needs to be taken out; the reachability plot is obtained in this way.
4. The method of claim 2, wherein calculating the remove-one partition information entropy increment $\Delta(N_i)$ for all attributes in the data set comprises:
a) to improve the quality of outlier detection, the distance between data objects in the OPTICS algorithm is measured with a weighted distance, and the attribute weights are determined by the remove-one partition information entropy increment; information entropy measures how much information a system contains, so the entropy value E(x) measures the uncertainty of a data set; it is defined as:
$$E(x) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i) \qquad (1)$$
in formula (1), x is a random variable whose set of possible values is $S(x) = \{x_1, x_2, \ldots, x_n\}$;
$p(x_i)$ is the probability that x takes the value $x_i$;
b) to highlight outlier attributes, the weight of an attribute is defined by the change in entropy after that attribute is removed; let the attribute set be $N = \{N_1, N_2, \ldots, N_m\}$; taking $N_i$ ($i = 1, 2, \ldots, m$) divides N into two parts, $\{N_i\}$ and $\{N - N_i\}$, denoted $P = \{P_1, P_2\}$, where $P_1 = \{N_i\}$ and $P_2 = \{N_1, N_2, \ldots, N_{i-1}, N_{i+1}, \ldots, N_m\}$; the remove-one partition information entropy increment $\Delta(N_i)$ is defined by formula (2), and the larger its value, the more the uncertainty of the data set is reduced after $N_i$ is removed:
$$\Delta(N_i) = E(N) - E(P) \qquad (2)$$
in formula (2), $\Delta(N_i)$ is the change in information entropy after $N_i$ is removed from the set N;
E(N) is the information entropy of the attribute set N;
E(P) is calculated by formula (3):
[Formula (3): the expression for E(P) is an image in the source (FDA0002975689290000021) and is not reproduced in the extracted text.]
c) if two data objects are $p = \{p_1, p_2, \ldots, p_m\}$ and $p' = \{p'_1, p'_2, \ldots, p'_m\}$, and the weighted distance between them is denoted dist(p, p′), then the weighted distance based on the remove-one partition information entropy increment is defined as:
$$\mathrm{dist}(p, p') = \sum_{i=1}^{m} \Delta(N_i)\, d(p_i, p'_i) \qquad (4)$$
5. The method of claim 2, wherein, when calculating the reachable distance of all objects in the data set, if p is a core object, the reachable distance of o with respect to p is defined as the larger of the core distance of p and the distance between o and p; if p is not a core object, the reachable distance is undefined; thus, for objects p, o ∈ S, the reachable distance is defined as follows:
$$\mathrm{reachDist}(o, p) = \begin{cases} \max\{\mathrm{coreDist}(p),\ d(p, o)\}, & p \text{ is a core object} \\ \text{undefined}, & \text{otherwise} \end{cases} \qquad (5)$$
in formula (5), reachDist(o, p) is the reachable distance of o with respect to p; the reachable distance is calculated if and only if p is a core object.
6. The method of claim 2, wherein the k-distance and the k-distance neighborhood of all objects in the preliminary outlier data set are calculated, and the P weight is calculated; the P weight method is a distance-based method for finding outlier data, which measures the outlier degree of an object in a data set by its P weight; the sum of the distances between an arbitrary object p and its k nearest objects is called the P weight, and is calculated as follows:
$$W_k(p) = \sum_{i=1}^{k} d_i\big(p, nb_i(p)\big) \qquad (6)$$
in formula (6), $W_k(p)$ is the P weight;
$nb_i(p)$ is the i-th nearest neighbor of p;
$d_k(p, nb_k(p))$ is the distance from point p to the k-th object nearest to p.
7. The method of claim 2, wherein, when calculating the local density Ldp based on the P weight: measuring the outlier degree of an object with the P weight is simple, but it can only find outlier data in regions of a single density, so the algorithm borrows the idea of the local outlier factor algorithm to improve the P weight; in the LOF algorithm, for a core point p, the reachable distance from any point o to p is defined as the larger of d(p, o) and coreDist(p), and here the reachable distance in the local outlier factor algorithm is replaced with the P weight of the core object; since the distance between any two points has already been calculated in the OPTICS algorithm, $dist_i(p, nb_i(p))$ uses the weighted distances already calculated during the clustering process of the OPTICS algorithm; given a data set S and any point p in the set, the local density $Ldp_k(p)$ based on the P weight is given by the following formula:
[Formula (7): the expression for $Ldp_k(p)$ is an image in the source (FDA0002975689290000031) and is not reproduced in the extracted text.]
in formula (7), $N_k(p)$ is the k-distance neighborhood of object p; $dist_i(p, nb_i(p))$ is the weighted distance based on the remove-one partition information entropy increment.
8. The method of claim 2, wherein, when calculating the local reachable density LOFBP based on the P weight, the P-weight-based local outlier factor in LOFBP is defined analogously from $Ldp_k(p)$, following the definition of the local outlier factor in the LOF algorithm; the P-weight-based local outlier factor is obtained as the mean of the ratios of the density of each object in the ε-neighborhood of object p to the density of p, and this value is denoted $LOFBP_k(p)$; the closer $LOFBP_k(p)$ is to 1, the smaller the difference between the density of p and that of the objects in $N_k(p)$, and the more likely p and the objects in $N_k(p)$ belong to the same cluster; the smaller $LOFBP_k(p)$ is, the higher the density of p relative to the objects in $N_k(p)$; conversely, the larger the value, the more likely p is an outlier; the local outlier factor of object p is defined as follows:
[Formula (8): the expression for $LOFBP_k(p)$ is an image in the source (FDA0002975689290000032) and is not reproduced in the extracted text.]
in formula (8), $LOFBP_k(p)$ is the local reachable density of point p, which can be used as an index of how strongly the data point is an outlier.
CN202110273839.2A 2021-03-15 2021-03-15 Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining Pending CN112949735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110273839.2A CN112949735A (en) 2021-03-15 2021-03-15 Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110273839.2A CN112949735A (en) 2021-03-15 2021-03-15 Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining

Publications (1)

Publication Number Publication Date
CN112949735A true CN112949735A (en) 2021-06-11

Family

ID=76229759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110273839.2A Pending CN112949735A (en) 2021-03-15 2021-03-15 Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining

Country Status (1)

Country Link
CN (1) CN112949735A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943434A (en) * 2022-05-16 2022-08-26 南京航空航天大学 Liquid hazardous chemical substance loading and unloading crane position dynamic allocation method based on LOF outlier
CN117272215A (en) * 2023-11-21 2023-12-22 江苏达海智能系统股份有限公司 Intelligent community safety management method and system based on data mining
CN117436024A (en) * 2023-12-19 2024-01-23 湖南翰文云机电设备有限公司 Fault diagnosis method and system based on drilling machine operation data analysis

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109063733A (en) * 2018-06-27 2018-12-21 西安理工大学 Outlier detection method based on a two-parameter outlier factor

Non-Patent Citations (1)

Title
Xiao Xue et al.: "Outlier data detection algorithm based on improved OPTICS clustering and LOPW", Computer Engineering & Science, pages 885 - 892 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN114943434A (en) * 2022-05-16 2022-08-26 南京航空航天大学 Liquid hazardous chemical substance loading and unloading crane position dynamic allocation method based on LOF outlier
CN117272215A (en) * 2023-11-21 2023-12-22 江苏达海智能系统股份有限公司 Intelligent community safety management method and system based on data mining
CN117272215B (en) * 2023-11-21 2024-02-02 江苏达海智能系统股份有限公司 Intelligent community safety management method and system based on data mining
CN117436024A (en) * 2023-12-19 2024-01-23 湖南翰文云机电设备有限公司 Fault diagnosis method and system based on drilling machine operation data analysis
CN117436024B (en) * 2023-12-19 2024-03-08 湖南翰文云机电设备有限公司 Fault diagnosis method and system based on drilling machine operation data analysis

Similar Documents

Publication Publication Date Title
CN109816031B (en) Transformer state evaluation clustering analysis method based on data imbalance measurement
CN113092981B (en) Wafer data detection method and system, storage medium and test parameter adjustment method
CN107679734A (en) Method and system for classification prediction of unlabeled data
US11640328B2 (en) Predicting equipment fail mode from process trace
CN112949735A (en) Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining
CN112131575B (en) Concept drift detection method based on classification error rate and consistency prediction
CN110543907A (en) Fault classification method based on microcomputer monitoring power curve
CN114004137A (en) Multi-source meteorological data fusion and pretreatment method
CN110889441A (en) Distance and point density based substation equipment data anomaly identification method
CN113298162A (en) Bridge health monitoring method and system based on K-means algorithm
CN111709668A (en) Power grid equipment parameter risk identification method and device based on data mining technology
CN111400911A (en) GNSS deformation information identification and early warning method based on EWMA control chart
CN117669394B (en) Mountain canyon bridge long-term performance comprehensive evaluation method and system
Cui et al. Analysis and prediction of pipeline corrosion defects based on data analytics of in-line inspection
CN112329868A (en) CLARA clustering-based manufacturing and processing equipment group energy efficiency state evaluation method
CN113255810B (en) Network model testing method based on key decision logic design test coverage rate
CN115659271A (en) Sensor abnormality detection method, model training method, system, device, and medium
CN112765219B (en) Stream data abnormity detection method for skipping steady region
CN114597886A (en) Power distribution network operation state evaluation method based on interval type two fuzzy clustering analysis
JP2008258486A (en) Distribution analysis method and system, abnormality facility estimation method and system, program for causing computer to execute its distribution analysis method or its abnormality facility estimation method, and recording medium readable by computer having its program recorded therein
CN117591836B (en) Pipeline detection data analysis method and related device
CN112884167B (en) Multi-index anomaly detection method based on machine learning and application system thereof
CN113190406B (en) IT entity group anomaly detection method under cloud native observability
CN117113248B (en) Gas volume data anomaly detection method based on data driving
CN117236572B (en) Method and system for evaluating performance of dry powder extinguishing equipment based on data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination