CN111222540A

CN111222540A - Abnormal judgment dioxin detection method based on unsupervised learning of clustering

Info

Publication number: CN111222540A
Application number: CN201911155400.9A
Authority: CN
Inventors: 董圆媛; 凌志辉; 熊飞; 张然; 王姗姗; 郭蓉; 郭仁庆; 崔嘉宇
Original assignee: Nanjing Xinktech Information Technology Co ltd; JIANGSU ENVIRONMENTAL MONITORING CENTER
Current assignee: Nanjing Xinktech Information Technology Co ltd; JIANGSU ENVIRONMENTAL MONITORING CENTER
Priority date: 2019-11-22
Filing date: 2019-11-22
Publication date: 2020-06-02

Abstract

The invention provides a clustering-based unsupervised learning method for judging dioxin anomalies. The detection model comprises a normalizer, m clusterers, an evaluator and a classifier, and the method is characterized in that: the normalizer standardizes the collected raw data and passes m copies of the result, one to each clusterer; each clusterer clusters the data with the K-means algorithm (each clusterer uses a different K value), computes the intra-cluster distance, the inter-cluster distance and the DB (Davies-Bouldin) index from its own clustering result, and sends the DB index to the evaluator; after receiving the DB indices from all clusterers, the evaluator selects the clusterer with the minimum DB index, whose clustering result is taken as the optimal clustering result; and the classifier classifies the k clusters of the optimal clustering result one by one according to preset minimum and maximum intra-cluster distance thresholds.

Description

Abnormal judgment dioxin detection method based on unsupervised learning of clustering

Technical Field

The invention belongs to the field of environmental protection technology and is used for detecting and locating dioxin anomalies in the waste incineration process.

Background

Foreign researchers studied the dioxin content of raw municipal solid waste samples and found 6-50 ng I-TEQ per kilogram; for waste in China the corresponding value is 11-255 ng I-TEQ per kilogram. During incineration, incomplete combustion allows a large amount of the dioxin contained in the waste to enter the environment with the flue gas without being decomposed.

Because most of the dioxin in the flue gas is decomposed in the high-temperature environment of the waste incinerator (>850 °C), the dominant mode of dioxin formation is the heterogeneous catalytic reaction on the surface of fly ash. This reaction occurs at 250-500 °C; in this temperature range dioxin can be synthesized by catalytic reaction of precursors, and it can also form gradually through de novo synthesis, in which residual carbon in the fly ash combines with hydrogen, chlorine, oxygen and other atoms. A Japanese environmental protection expert found that the average dioxin concentration in the flue gas of waste incinerators is 14.47 ng I-TEQ/m³, far exceeding the limit of 0.1 ng I-TEQ/m³, so the flue gas must be treated.

Because dioxin detection and assay are technically very demanding, real-time dioxin monitoring is still an unsolved problem internationally. The fastest monitoring currently takes 12 hours, during which dioxin goes undetected and may be discharged unchecked. In China, online detection of dioxin toxic pollutants is not yet possible, and samples are generally sent to a laboratory for offline analysis. Dioxins in the flue gas of waste incineration plants are generally measured by high-resolution gas chromatography/high-resolution mass spectrometry (HRGC/HRMS). Its main advantages are high sensitivity and strong selectivity, allowing qualitative and quantitative analysis of individual congeners; its main disadvantages are complex sample pretreatment, high detection cost, a long analysis cycle and expensive supporting facilities. Accurate, rapid and low-cost online dioxin detection, analysis and monitoring technology is therefore the direction of future development.

At present, dioxin emission monitoring in the waste incineration process still relies mainly on chemical methods, including chromatography, immunoassay, bioassay and laser mass spectrometry, none of which allows rapid online detection; pollutant monitoring technology clearly lags behind the rapidly growing waste incineration industry. As production processes become more automated and industrial control computers are used more widely and intensively in continuous production, many important process variables remain difficult to measure with existing sensors for technical or economic reasons. The basic idea of soft measurement (soft sensing) is to select, according to some optimality criterion, a group of auxiliary variables that are easy to measure industrially and closely related to the primary variable, and to measure the primary variable online by building a mathematical model of the process. Soft measurement consists mainly of auxiliary-variable selection, data acquisition and processing, and the soft measurement model, with software taking over the function of hardware. Using soft measurement for online detection of component content is economical and reliable, responds quickly to process dynamics, and can report the component content continuously during the extraction process.

Disclosure of Invention

To address these problems in the prior art, the invention applies deep neural networks and machine learning to soft measurement of dioxin emissions. On the one hand, it exploits their strength on nonlinear problems in complex systems and avoids the difficulty of building a mathematical model of a nonlinear, multivariable complex system; on the other hand, it strengthens both the feasibility and the necessity of combining waste incineration process control with soft computing techniques such as artificial intelligence, and further enriches the soft measurement theory of dioxin emissions. The method is broadly applicable to indirect measurement in multivariable, complex nonlinear systems, and offers guidance for promptly measuring the secondary pollution that may arise in waste incineration and for optimizing the incineration state of the waste.

To achieve the purpose of the invention, the invention adopts the following technical scheme:

A clustering-based unsupervised learning method for judging dioxin anomalies uses a model comprising a Normalizer, m clusterers (Cluster), an evaluator (Assessor) and a Classifier, and is characterized in that:

the normalizer standardizes the collected raw data and passes m copies of the result, one to each clusterer; each clusterer clusters the data with the K-means algorithm (each clusterer uses a different K value), computes the intra-cluster distance, the inter-cluster distance and the DB index from its own clustering result, and sends the DB index to the evaluator; after receiving the DB indices from all clusterers, the evaluator selects the clusterer with the minimum DB index, whose clustering result is taken as the optimal clustering result; and the classifier classifies the k clusters of the optimal clustering result one by one according to preset minimum and maximum intra-cluster distance thresholds.

The Normalizer: each record has several attribute fields with different value ranges, so attributes with small values would be dominated by attributes with large values; the data are therefore standardized.

The clusterers: m clusterers are used (m = 10). Each clusterer clusters the standardized data with the K-means algorithm and computes the DB value of its clustering result. All clusterers receive the same standardized data and differ only in their value of K; the purpose is to reduce the dependence of the K-means algorithm on the choice of K.

The evaluator (Assessor): the m clusterers submit their DB indices to the evaluator, which selects the clusterer with the minimum DB index; that clusterer's clustering result is taken as the final clustering result of the model and sent to the classifier.

The Classifier: during waste incineration, most dioxin is decomposed in the high-temperature environment of the waste incinerator (>850 °C), and the data clusters after dioxin decomposition are highly similar, so their intra-cluster distance is small; a minimum intra-cluster distance min is therefore defined to identify abnormal cases in which dioxin exceeds the standard. When records with normal dioxin content far outnumber records with abnormal content, the abnormal records, having low similarity to the normal ones, cannot fall into normal clusters and become isolated points; when these isolated points are gathered into one cluster, its intra-cluster distance is too large, so a maximum intra-cluster distance max is defined. The classifier classifies each cluster of the final clustering result using the two thresholds, the minimum intra-cluster distance min and the maximum intra-cluster distance max. The decision rule is: a cluster whose intra-cluster distance is less than min or greater than max is abnormal; otherwise it is normal.

Beneficial effects: about 100,000 records, 20% of the field data collected from 2015 to 2017, were selected for verification and divided into 10 groups of 10,000 records each. Three clusterers were configured, denoted (Ci, Ki): (C1, 3), (C2, 5) and (C3, 7), where Ci is the i-th clusterer and Ki is the K value used by the i-th clusterer. Verification on this large dataset gave min = 1.34 and max = 21.69, and the method showed good performance and accuracy.

In the clustering-based unsupervised dioxin anomaly detection method provided by the invention, several clusterers cluster the data set and the model selects the optimal clustering result for detection, which gives the model a degree of intelligence. By setting different thresholds, the model can be applied to anomaly detection for any dioxin. Experimental results show that the model achieves a high detection rate and a low false alarm rate even when dioxin is jointly affected by multiple interacting factors.

Drawings

FIG. 1 is a diagram of a detection model according to an embodiment of the present invention.

Fig. 2 is a data cluster diagram under normal dioxin conditions according to an embodiment of the present invention.

Fig. 3 is a data cluster diagram with a small number of dioxin anomalies according to an embodiment of the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific examples.

The invention provides a clustering-based unsupervised learning method for judging dioxin anomalies. It addresses the problem that dioxin anomaly types cannot be identified, and aims to reduce the false alarm rate as far as possible while improving the detection rate. The model is built on two conditions:

(1) When dioxin exceeds the standard during municipal solid waste (MSW) incineration, the indicators change in highly similar ways. The indicators are: flue gas temperature at the furnace tail flue inlet, flue gas temperature at the furnace tail flue outlet, furnace outlet O2 concentration 1, furnace outlet O2 concentration 2, CO concentration, flue gas temperature at the furnace first-flue outlet, flue gas temperature at the furnace first-flue inlet, activated carbon injection amount, actual operating current of the furnace primary air fan, actual operating current of the furnace secondary air fan, and boiler steam flow.

(2) In the waste incineration process, the similarity between the normal data of the dioxin and the overproof data of the dioxin is low.

Fig. 1 is a diagram of a detection model in embodiment 1 of the present invention.

The model used in this embodiment consists of a Normalizer, m clusterers (Cluster), an evaluator (Assessor) and a Classifier. It works as follows: the normalizer standardizes the collected raw data and passes m copies of the result, one to each clusterer; each clusterer clusters the data with the K-means algorithm (each clusterer uses a different K value), computes the intra-cluster distance, the inter-cluster distance and the DB index from its own clustering result, and sends the DB index to the evaluator; after receiving the DB indices from all clusterers, the evaluator selects the clusterer with the minimum DB index, whose clustering result is taken as the optimal clustering result; and the classifier classifies the k clusters of the optimal clustering result one by one according to preset minimum and maximum intra-cluster distance thresholds. The components are described below.
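As a rough illustration of this data flow, the Python sketch below chains the four components. It is a minimal sketch under stated assumptions, not the inventors' implementation: it assumes z-score normalization with a population standard deviation, randomly seeded K-means with Euclidean distance, an intra-cluster distance of twice the mean distance to the center, a DB index in the standard Davies-Bouldin form, and the thresholds min = 1.34 and max = 21.69 reported for this embodiment. All function names, and the default K values (3, 5, 7, borrowed from the verification setup), are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def normalize(rows):
    """Normalizer: z-score each attribute (assumption: population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant attribute has std 0; divide by 1.0 instead so it maps to 0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def mean_point(cluster):
    """Cluster center: attribute-wise mean of the member records."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def kmeans(points, k, seed=0, max_iter=100):
    """One clusterer: plain K-means; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous center (assumption).
        new_centers = [mean_point(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def intra(cluster):
    """Intra-cluster distance: twice the mean distance to the center."""
    sc = mean_point(cluster)
    return 2.0 * sum(euclidean(x, sc) for x in cluster) / len(cluster)


def db_index(clusters):
    """DB index (smaller is better); inf if fewer than two clusters."""
    if len(clusters) < 2:
        return math.inf
    centers = [mean_point(c) for c in clusters]
    spreads = [intra(c) for c in clusters]
    worst = []
    for i in range(len(clusters)):
        ratios = []
        for j in range(len(clusters)):
            if j != i:
                d = euclidean(centers[i], centers[j])
                ratios.append((spreads[i] + spreads[j]) / d if d > 0 else math.inf)
        worst.append(max(ratios))
    return sum(worst) / len(worst)


def detect(rows, k_values=(3, 5, 7), d_min=1.34, d_max=21.69):
    """Normalizer -> m clusterers -> Assessor (min DB) -> Classifier."""
    data = normalize(rows)
    candidates = {k: kmeans(data, k) for k in k_values}
    best_k = min(candidates, key=lambda k: db_index(candidates[k]))
    clusters = candidates[best_k]
    labels = ["normal" if d_min <= intra(c) <= d_max else "abnormal"
              for c in clusters]
    return best_k, clusters, labels
```

Usage: `best_k, clusters, labels = detect(rows)`, where `rows` is a list of equal-length numeric records such as the 11 attributes of this embodiment. Each clusterer uses a single fixed seed here so that runs are reproducible.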

1) Normalizer

Each record has several attribute fields with different value ranges, so attributes with small values would be dominated by attributes with large values; the data are therefore standardized.

N records are selected from the historical waste incineration monitoring data; the mean and standard deviation of each attribute are computed with formulas (1) and (2), and each record is standardized with formula (3).
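Formulas (1) to (3) are not reproduced in this text. Assuming they are the usual per-attribute mean, standard deviation and z-score, $\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}$, $\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{ij}-\mu_j)^2}$ and $x'_{ij} = (x_{ij}-\mu_j)/\sigma_j$, the Normalizer can be sketched as below. The population form of the standard deviation (dividing by N), the handling of constant attributes, and the name `zscore_normalize` are assumptions.

```python
import math


def zscore_normalize(records):
    """Standardize each attribute of `records` (N rows x n attributes).

    Assumed forms of formulas (1)-(3): column mean, population standard
    deviation (divide by N), then (x - mean) / std.
    Returns (normalized_rows, means, stds).
    """
    n_rows = len(records)
    n_attrs = len(records[0])
    means = [sum(r[j] for r in records) / n_rows for j in range(n_attrs)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in records) / n_rows)
            for j in range(n_attrs)]
    normalized = [
        # A constant attribute (std == 0) carries no information; map it to 0.
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
         for j in range(n_attrs)]
        for r in records
    ]
    return normalized, means, stds


# Usage: a temperature-like and a concentration-like attribute end up on one scale.
rows = [[850.0, 6.0], [870.0, 8.0], [890.0, 10.0]]
z, means, stds = zscore_normalize(rows)
```

The means and standard deviations are returned as well, because new records arriving online would have to be standardized with the same statistics computed from the N historical records.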

2) Clusterer (Cluster)

This embodiment uses m clusterers (m = 10). Each clusterer clusters the standardized data with the K-means algorithm and computes the DB value of its clustering result. All clusterers receive the same standardized data and differ only in their value of K; the purpose is to reduce the dependence of the K-means algorithm on the choice of K.

The algorithm is as follows:

Input: the number of clusters k and n objects

Output: k clusters

Steps:

a) Randomly select k objects as the initial cluster centers.

b) (Re)assign each object to the nearest cluster, where each cluster is represented by the mean of its objects.

c) Update the cluster means, i.e. compute the mean of the objects in each cluster.

d) Repeat steps b) and c) until no cluster center changes.

The distance between a record and a cluster center is the Euclidean distance, formula (4):

$$d(X, Y) = \sqrt{\sum_{l=1}^{n} (x_l - y_l)^2} \qquad (4)$$

where X = (x_1, ..., x_n) is a record, Y = (y_1, ..., y_n) is a cluster center, and n is the number of attributes (n = 11 in this embodiment).
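Steps a) to d), with the Euclidean distance of formula (4), can be sketched in plain Python as follows. This is an illustration rather than the embodiment's code; the `seed` and `max_iter` parameters and the rule that an empty cluster keeps its previous center are assumptions added to make it deterministic and safe. The function also returns the number of iterations, analogous to the t column mentioned for the results table.

```python
import math
import random


def euclidean(x, y):
    """Formula (4): Euclidean distance between two n-attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, k, seed=None, max_iter=100):
    """K-means following steps a)-d); returns (centers, labels, iterations)."""
    rng = random.Random(seed)
    # a) Randomly select k objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for iteration in range(1, max_iter + 1):
        # b) (Re)assign each object to the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                  for p in points]
        # c) Update each center to the mean of the objects assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[c])
        # d) Stop as soon as no center changes.
        if new_centers == centers:
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter
```

Because step a) is random, two runs with different seeds can converge to different local optima; this sensitivity, together with the choice of K, is what running several clusterers and letting the evaluator choose is meant to soften.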

The attributes of each record in this embodiment are: flue gas temperature at the furnace tail flue inlet, flue gas temperature at the furnace tail flue outlet, furnace outlet O2 concentration 1, furnace outlet O2 concentration 2, CO concentration, flue gas temperature at the furnace first-flue outlet, flue gas temperature at the furnace first-flue inlet, activated carbon injection amount, actual operating current of the furnace primary air fan, actual operating current of the furnace secondary air fan, and boiler steam flow.
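One way to hold a record with these 11 attributes is a small dataclass whose `to_vector` method yields the n = 11 vector that is fed to the Normalizer. The class and field names below are hypothetical; only the list of attributes and their order come from the text.

```python
from dataclasses import astuple, dataclass


@dataclass
class IncineratorRecord:
    """One operating record with the 11 attributes (hypothetical field names)."""
    tail_flue_inlet_temp: float        # flue gas temp, furnace tail flue inlet
    tail_flue_outlet_temp: float       # flue gas temp, furnace tail flue outlet
    furnace_outlet_o2_1: float         # furnace outlet O2 concentration 1
    furnace_outlet_o2_2: float         # furnace outlet O2 concentration 2
    co_concentration: float            # CO concentration
    first_flue_outlet_temp: float      # flue gas temp, first-flue outlet
    first_flue_inlet_temp: float       # flue gas temp, first-flue inlet
    activated_carbon_injection: float  # activated carbon injection amount
    primary_fan_current: float         # primary air fan operating current
    secondary_fan_current: float       # secondary air fan operating current
    boiler_steam_flow: float           # boiler steam flow

    def to_vector(self):
        """Attribute vector in the order listed above (n = 11)."""
        return list(astuple(self))
```

Keeping the order fixed in one place matters because the Euclidean distance compares attributes position by position.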

The inter-cluster distance in this embodiment is

$$\delta(C_i, C_j) = d(SC_i, SC_j)$$

where SC_i is the center point of the i-th cluster; that is, the distance between the centers of the i-th and j-th clusters.

The intra-cluster distance is

$$\Delta(C_i) = \frac{2}{|C_i|} \sum_{p=1}^{|C_i|} d(X_p, SC_i)$$

where X_p is the p-th record in the i-th cluster and |C_i| is the number of records in the i-th cluster; that is, twice the average distance from the samples in a cluster to the cluster center.
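The two distance definitions translate directly into code. The sketch assumes records are lists of already standardized numbers and that SC_i is the attribute-wise mean of the cluster, as K-means produces; the function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def center(cluster):
    """SC_i: attribute-wise mean of the records in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def inter_cluster_distance(cluster_i, cluster_j):
    """delta(C_i, C_j): distance between the two cluster centers."""
    return euclidean(center(cluster_i), center(cluster_j))


def intra_cluster_distance(cluster):
    """Delta(C_i): twice the mean distance from each record X_p to the center."""
    sc = center(cluster)
    return 2.0 * sum(euclidean(x, sc) for x in cluster) / len(cluster)
```

For example, the cluster [[0, 0], [2, 0]] has center [1, 0], each record lies at distance 1 from it, so its intra-cluster distance is 2 × 1 = 2.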

The DB index used in this embodiment measures clustering quality: as the inter-cluster distance increases and the intra-cluster distance decreases, the DB index decreases, so a smaller DB index indicates a better clustering. The model uses the DB index so that the evaluator can select the best clustering result among the clusterers. The DB index is computed as

$$DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\Delta(C_i) + \Delta(C_j)}{\delta(C_i, C_j)}$$

where Δ(C_i) is the intra-cluster distance of the i-th cluster, δ(C_i, C_j) is the distance between the centers of the i-th and j-th clusters, and K is the value of K used in the K-means algorithm, i.e. the number of clusters in the clustering result.
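The original DB formula is not reproduced in this text. Assuming the standard Davies-Bouldin form built from the intra-cluster distance (twice the mean distance to the center) and the center-to-center inter-cluster distance, the index can be computed as below. The function names and the `inf` guard for coinciding centers are assumptions.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def center(cluster):
    """Attribute-wise mean of the records in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def db_index(clusters):
    """DB = (1/K) * sum_i max_{j != i} (intra_i + intra_j) / inter_ij."""
    k = len(clusters)
    if k < 2:
        raise ValueError("the DB index needs at least two clusters")
    centers = [center(c) for c in clusters]
    # Intra-cluster distance: twice the mean distance of members to their center.
    intra = [2.0 * sum(euclidean(x, sc) for x in c) / len(c)
             for c, sc in zip(clusters, centers)]
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            inter = euclidean(centers[i], centers[j])
            # Coinciding centers make the ratio unbounded (assumption: inf).
            ratio = (intra[i] + intra[j]) / inter if inter > 0 else math.inf
            worst = max(worst, ratio)
        total += worst
    return total / k


loose = [[[0.0, 0.0], [2.0, 0.0]], [[10.0, 0.0], [12.0, 0.0]]]
tight = [[[0.0, 0.0], [0.2, 0.0]], [[10.0, 0.0], [10.2, 0.0]]]
```

Here `loose` scores (2 + 2) / 10 = 0.4 and `tight` scores lower. Because the intra-cluster distance is twice the usual mean distance, this value is exactly twice the conventional Davies-Bouldin index, which does not change which clusterer has the minimum.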

3) Evaluator (Assessor)

The m clusterers submit their DB indices to the evaluator, which selects the clusterer with the minimum DB index; that clusterer's clustering result is taken as the final clustering result of the model and sent to the classifier.
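The evaluator's "minimum DB index" rule amounts to an argmin over the clusterers' reports. The sketch below assumes each clusterer reports its DB index under an identifier such as "C1"; the function name and the report format are hypothetical, and the values in the usage line are illustrative, not results from this patent.

```python
def select_clusterer(db_by_clusterer):
    """Assessor: return the id of the clusterer with the smallest DB index.

    db_by_clusterer maps a clusterer id (e.g. "C1") to its DB index.
    On a tie, min() keeps the first id in the dict's insertion order.
    """
    if not db_by_clusterer:
        raise ValueError("no clusterer results to evaluate")
    return min(db_by_clusterer, key=db_by_clusterer.get)


# Illustrative DB values only (not results from this patent).
reports = {"C1": 0.82, "C2": 0.61, "C3": 0.74}
best = select_clusterer(reports)
```

The clustering result of the selected clusterer would then be passed on to the classifier.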

4) Classifier (Classifier)

During waste incineration, most dioxin is decomposed in the high-temperature environment of the waste incinerator (>850 °C). The data clusters before dioxin decomposition are highly similar, as are the data clusters after decomposition, so their intra-cluster distance is small; this embodiment therefore defines a minimum intra-cluster distance min for the model to identify abnormal dioxin exceedances. When records with normal dioxin content far outnumber records with abnormal content, the abnormal records, having low similarity to the normal ones, cannot fall into a normal cluster and become isolated points; when these isolated points are gathered into one cluster, its intra-cluster distance is too large, so this embodiment defines a maximum intra-cluster distance max.

The classifier uses the two thresholds, the minimum intra-cluster distance min and the maximum intra-cluster distance max, to classify each cluster of the final clustering result as normal or abnormal. The decision rule is: a cluster whose intra-cluster distance is less than min or greater than max is abnormal; otherwise it is normal. In this embodiment the thresholds were determined by testing on 100,000 experimental records, giving min = 1.34 and max = 21.69, with good performance and accuracy.
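The decision rule can be written as a one-line classifier over the clusters' intra-cluster distances. In this sketch the thresholds are renamed `d_min` and `d_max` (to avoid shadowing Python's built-in `min` and `max`), and the defaults are the values reported for this embodiment; the function name is hypothetical, and the thresholds only make sense for data standardized and measured the same way.

```python
def classify_clusters(intra_distances, d_min=1.34, d_max=21.69):
    """Label each cluster from its intra-cluster distance.

    A cluster is abnormal if its distance is < d_min or > d_max; otherwise
    it is normal. Values exactly equal to a threshold count as normal.
    """
    return ["abnormal" if d < d_min or d > d_max else "normal"
            for d in intra_distances]


labels = classify_clusters([0.5, 1.34, 5.0, 21.69, 30.0])
```

The first and last clusters fall outside [1.34, 21.69] and are labeled abnormal; the three in between, including the two sitting exactly on a threshold, are labeled normal.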

Fig. 2 is a data cluster diagram under normal dioxin conditions.

Fig. 3 is a data cluster diagram with a small number of dioxin anomalies.

The following table shows the model's output for the 10 data groups: each row is one group, t in column 6 is the number of iterations of the K-means algorithm, and the last column gives the clusterer selected by the model and its result.

This embodiment provides a clustering-based unsupervised dioxin anomaly detection method in which several clusterers cluster the data set and the model selects the optimal clustering result for detection, which gives the model a degree of intelligence. By setting different thresholds, the model can be applied to anomaly detection for any dioxin. Experimental results show that the model achieves a high detection rate and a low false alarm rate even when dioxin is jointly affected by multiple interacting factors.

The above embodiments only illustrate the technical idea of the present invention and do not limit its protection scope; any modification made to the technical scheme on the basis of the technical idea proposed by the invention falls within the protection scope of the invention. Parts not involved in the invention can be implemented with existing technology.

Claims

1. A clustering-based unsupervised learning method for judging dioxin anomalies, characterized by comprising the following steps:

2. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 1, characterized by the Normalizer:

3. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 2, characterized in that N records are selected from the historical waste incineration monitoring data, the mean and standard deviation of each attribute are computed with formulas (1) and (2), and each record is standardized with formula (3):

4. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 1, characterized by the clusterer (Cluster):

m clusterers are used (m = 10); each clusterer clusters the standardized data with the K-means algorithm and computes the DB value of its clustering result; all clusterers receive the same standardized data and differ only in their value of K, the purpose being to reduce the dependence of the K-means algorithm on the choice of K.

5. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 4, characterized in that the K-means algorithm is as follows:

Input: the number of clusters k and n objects

Output: k clusters

Steps:

Step one: randomly select k objects as the initial cluster centers;

Step two: (re)assign each object to the nearest cluster according to the mean of the objects in each cluster;

Step three: update the cluster means, i.e. compute the mean of the objects in each cluster;

Step four: repeat steps two and three until no cluster center changes;

the distance between a record and a cluster center is the Euclidean distance:

$$d(X, Y) = \sqrt{\sum_{l=1}^{n} (x_l - y_l)^2}$$

where X is a record, Y is a cluster center, and n is the number of attributes per record.

6. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 4, characterized in that the DB value measures clustering quality: as the inter-cluster distance increases and the intra-cluster distance decreases, the DB index decreases, so a smaller DB index indicates a better clustering; the DB index is computed as

$$DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\Delta(C_i) + \Delta(C_j)}{\delta(C_i, C_j)}$$

where K is the number of clusters, Δ(C_i) is the intra-cluster distance of the i-th cluster, and δ(C_i, C_j) is the distance between the centers of the i-th and j-th clusters.

7. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 1, characterized by the evaluator (Assessor):

8. The clustering-based unsupervised learning method for judging dioxin anomalies according to claim 1, characterized by the Classifier:

during waste incineration, most dioxin is decomposed in the high-temperature environment of the waste incinerator (>850 °C), and the data clusters after dioxin decomposition are highly similar, so their intra-cluster distance is small; a minimum intra-cluster distance min is defined to identify abnormal cases in which dioxin exceeds the standard; when records with normal dioxin content far outnumber records with abnormal content, the abnormal records, having low similarity to the normal ones, cannot fall into normal clusters and become isolated points; when these isolated points are gathered into one cluster, its intra-cluster distance is too large, so a maximum intra-cluster distance max is defined;

the classifier classifies each cluster of the final clustering result using the two thresholds, the minimum intra-cluster distance min and the maximum intra-cluster distance max; the decision rule is: a cluster whose intra-cluster distance is less than min or greater than max is abnormal; otherwise it is normal.