CN115964662A

CN115964662A - Complex equipment parameter anomaly detection method based on improved density peak clustering

Info

Publication number: CN115964662A
Application number: CN202111173491.6A
Authority: CN
Inventors: 付旭云; 夏湘钊; 孙昊; 张永健; 刘雪云; 钟诗胜
Original assignee: Harbin Institute of Technology Weihai
Current assignee: Harbin Institute of Technology Weihai
Priority date: 2021-10-08
Filing date: 2021-10-08
Publication date: 2023-04-14

Abstract

The invention relates to a method for detecting parameter anomalies in complex equipment based on improved density peak clustering, which improves detection accuracy without requiring complex parameter tuning. Compared with the prior art: a density peak clustering algorithm is adopted to avoid the influence of extreme sample imbalance on detection; a local density measure based on the K nearest neighbors of each sample is introduced to overcome the subjectivity of the local density calculation; a sample assignment strategy based on the K nearest neighbors of each sample is introduced to overcome chain assignment errors during sample assignment; a new way of determining the outlier threshold and of handling outliers is provided to improve the accuracy of outlier selection and reduce the influence of outliers on the clustering process; for engines with enough abnormal samples, a weakly supervised clustering parameter adjustment strategy is provided to reduce the difficulty of parameter tuning; and for engines with insufficient abnormal samples, an unsupervised anomaly detection mode is provided for use where weak supervision is not available. Detection precision is thereby significantly improved.

Description

Complex equipment parameter anomaly detection method based on improved density peak clustering

The technical field is as follows:

the invention relates to the technical field of parameter anomaly detection for complex equipment, and in particular to a complex equipment parameter anomaly detection method based on improved density peak clustering that improves detection accuracy without requiring complex parameter tuning.

The background art comprises the following steps:

because the raw gas path parameters of complex equipment such as civil aviation engines are characterized by extreme sample imbalance and insufficient labels, most anomaly detection methods struggle to produce accurate results. For anomaly detection on complex equipment with a large number of components, different pieces of equipment run in different health states, so accuracy inevitably drops when the features extracted from all of them take part in density clustering together. The data set of a single piece of equipment is small, and the local density of a sample is then strongly influenced by the truncation distance $d_c$. In addition, density peak clustering selects the cluster centers from a decision graph formed by ρ and δ, and assigns each remaining sample j to the cluster of the nearest sample of higher density. This is efficient, but once one point is assigned to the wrong cluster, the error easily propagates to the points assigned after it. For this reason, many researchers have studied local density definitions based on the K nearest neighbors of a sample, together with new sample assignment strategies.

Existing improved density peak clustering (DPC) algorithms ultimately aim at clustering rather than anomaly detection, so outliers may be selected as cluster centers or assigned to the nearest class. However, the outliers encountered in anomaly detection are usually independent of every class and only weakly correlated with one another, so crudely treating them as cluster centers or assigning them to other classes introduces large errors into the anomaly detection task. Moreover, the number of K nearest neighbors has to be tuned repeatedly, which is highly subjective.

The invention content is as follows:

in view of the above shortcomings of the prior art, the invention provides a complex equipment parameter anomaly detection method based on improved density peak clustering that improves detection accuracy without requiring complex parameter tuning.

The invention is achieved by the following measures:

a complex equipment parameter anomaly detection method based on improved density peak clustering is characterized by comprising the following steps:

step 1: constructing a complex equipment parameter anomaly detection model, wherein a weak supervision anomaly detection model is established for equipment with enough anomalous samples, and an unsupervised anomaly detection model is established for equipment with insufficient anomalous samples;

step 2: performing anomaly detection, specifically comprising:

step 2-1: the local density is defined using an exponential kernel function of width δ = 1, as shown in equation (4), wherein:

KNN(i) is the set of the K nearest neighbors of point i,

$d_{ij}$ is the Euclidean distance between points $x_i$ and $x_j$;

the original outlier threshold is given by equations (5) to (7); under the original scheme the outliers are assigned to the nearest cluster once all other points have been clustered; because that threshold is determined by averaging, very many points are selected as outliers, and because the outliers are finally assigned to the nearest class, they may not be judged abnormal:

$k_{dist}(i) = \max_{j \in KNN(i)} \{ d_{ij} \}$ (5),

$Outlier = \{ o \mid k_{dist}(o) > threshold \}$ (7),

wherein $k_{dist}(i)$ is the distance from point i to the farthest of its K nearest neighbors;

step 2-2: a coordinate transformation is performed as in equations (8) to (10),

$(x'_{Kdist},\, y'_{Kdist}) = (x_{Kdist} - \min(x_{Kdist}),\; y_{Kdist} - \min(y_{Kdist})) \cdot M_{trans}$ (10),

wherein $x_{Kdist}$ is the vector of all x-axis values of the curve Kdist, $y_{Kdist}$ is the vector of all y-axis values of the curve Kdist, $M_{trans}$ is the transformation matrix used to rotate the coordinate axes, and $x'_{Kdist}$ and $y'_{Kdist}$ are the vectors of all values of the rotated curve on the rotated axes; after the coordinate transformation, the threshold point $Threshold' = (x'_{Kdist}(i), \min(y'_{Kdist}))$ can be found, where i is the index of that value in the ordered sequence $x'_{Kdist}$; the index i then gives $x_{Kdist}(i)$ and the corresponding outlier threshold Threshold; once the threshold is determined, the outliers are classified directly into the abnormal class and take no part in the subsequent clustering process;

step 2-3: the clustering process is as follows:

step 2-3-1: calculating the local density, determining the density peak points, determining the cluster centers through the decision graph, and assigning class labels m (m = 0, 1, 2, 3, ..., M) in sequence,

step 2-3-2: repeatedly searching the K nearest neighbors of each cluster center for points not yet assigned to a cluster, and assigning those points to the cluster of that center,

step 2-3-3: calculating the probability that each remaining point belongs to each class and assigning the point to the class with the highest probability, the probability being calculated as in equations (11) and (12): the similarity between point $x_i$ and each class is calculated, and $x_i$ is then assigned to the most probable class,

wherein KNN(i) is the set of the K nearest neighbors of point i, and $y_j = m$, m = 1, 2, 3, ..., M, denotes a class label,

$d_{ij}$ is the Euclidean distance between points $x_i$ and $x_j$, $s_{ij}$ is the similarity between points $x_i$ and $x_j$, which is higher the closer the two points are, and $P_i^m$ is the probability that point i belongs to class m,

$w_{ij}$ is the similarity weight between $x_i$ and $x_j$, which depends on the K-nearest-neighbor distribution of $x_j$;

step 3: adjusting the weakly supervised anomaly detection hyperparameters according to the weakly supervised anomaly detection result, and adjusting the unsupervised anomaly detection hyperparameters according to the unsupervised clustering evaluation indexes;

step 4: outputting the anomaly detection result.

For complex equipment with more than one abnormal sample, the following weakly supervised clustering adjustment strategy is implemented:

step 3-1: the K-nearest-neighbor number is iterated as follows: it starts at 0.9 times the total number of samples and is reduced in steps of 0.1 times the total number of samples; once it falls below 0.1 times the total number of samples, it is repeatedly halved and rounded until it reaches 1;

step 3-2: outliers and minority classes are identified at each iteration, a class being judged a minority class if it contains fewer than 0.01 times the total number of samples;

step 3-3: only one abnormal label, chosen at random, is retained; after each iteration it is checked whether the known anomaly has been detected, that is, whether it is contained in the outliers or the minority-class samples, and if it has been detected the iteration stops;

step 3-4: if the K-nearest-neighbor number has been reduced to 1 and the anomaly has still not been detected, the minority-class criterion is increased by 0.01 times the total number of samples;

step 3-5: if the minority-class criterion has been increased to 0.1 times the total number of samples and the anomaly still cannot be found, the anomaly detection algorithm is considered invalid;

step 3-6: after anomaly detection is finished, the weakly supervised clustering parameters are stored and it is checked whether the anomalies whose labels were removed have been detected: if all of them are detected, the algorithm is considered to perform well; if only some of them are detected, the algorithm is still considered effective; and if none are detected, the algorithm is considered invalid;

this completes a weakly supervised anomaly detection algorithm based on improved density peak clustering that needs no complex parameter tuning; when new data need to be checked for anomalies, they are appended to the old data and checked through the weakly supervised anomaly detection;

for complex equipment with only one abnormal sample, the weakly supervised clustering adjustment strategy is not applicable, so only an unsupervised clustering method can be used: the K-nearest-neighbor number is adjusted repeatedly according to four evaluation indexes, namely the Calinski-Harabasz (CH) index, the silhouette coefficient (SC), the Davies-Bouldin index (DBI) and the Dunn index (DVI), and the clustering with the highest score is used for anomaly detection, wherein CH evaluates compactness and separation through sums of squared distances, SC evaluates them through how distinct the cluster boundaries are, DBI evaluates them through the maximum ratio of the mean intra-class sample distance to the distance between cluster centers, and DVI evaluates them through the ratio of the smallest inter-class sample distance to the largest intra-class sample distance.

Compared with the prior art: a density peak clustering algorithm is adopted to avoid the influence of extreme sample imbalance on detection; a local density measure based on the K nearest neighbors of each sample is introduced to overcome the subjectivity of the local density calculation; a sample assignment strategy based on the K nearest neighbors of each sample is introduced to overcome chain assignment errors during sample assignment; a new way of determining the outlier threshold and of handling outliers is provided to improve the accuracy of outlier selection and reduce the influence of outliers on the clustering process; for engines with enough abnormal samples, a weakly supervised clustering parameter adjustment strategy is provided to reduce the difficulty of parameter tuning; and for engines with insufficient abnormal samples, an unsupervised anomaly detection mode is provided for use where weak supervision is not available. Detection precision is thereby significantly improved.

Description of the drawings:

FIG. 1 is a schematic block diagram of one embodiment of the present invention.

Fig. 2 is a graph of old outlier threshold values in an embodiment of the present invention.

Fig. 3 is a graph of new outlier threshold values in an embodiment of the present invention.

Fig. 4 is a graph showing the influence of the K nearest neighbor number on the clustering effect in the embodiment of the present invention.

The specific implementation mode is as follows:

the invention is further described below with reference to the accompanying drawings and examples.

Example 1:

this embodiment takes aircraft engine parameters as an example and provides an aircraft engine parameter anomaly detection method based on improved density peak clustering, as follows:

because the raw gas path parameters of civil aviation engines are characterized by extreme sample imbalance and insufficient labels, most anomaly detection methods struggle to produce accurate results. It is therefore important to develop an anomaly detection method that is unaffected by extreme sample imbalance and can extract useful information from scarce labels. Density clustering determines the class of a sample from the density of the region in which it lies, is not affected by class imbalance, and can locate abnormal points accurately, so anomaly detection based on density clustering has broad application prospects.

To address these problems, the invention provides a weakly supervised anomaly detection algorithm based on improved density peak clustering. First, to address the shortcomings of the local density calculation and the sample assignment strategy in density peak clustering, a local density definition and a sample assignment strategy based on K nearest neighbors are introduced. Second, to make density peak clustering better suited to anomaly detection and to reduce the influence of outliers on cluster assignment, an adaptive method for determining the outlier threshold is provided, and the outliers are excluded from the cluster assignment process. Finally, to make use of the labels while avoiding a complex parameter tuning process, a weakly supervised clustering parameter adjustment strategy is derived from an analysis of how the K-nearest-neighbor number affects the clustering result: the parameters are adjusted automatically by iteration until the known anomaly is detected, which yields a weakly supervised anomaly detection method that needs no complex parameter tuning.

To detect anomalies in the extracted features accurately, the anomaly detection algorithm shown in Fig. 1 is proposed. A density peak clustering algorithm is adopted to avoid the influence of extreme sample imbalance on detection; a local density measure based on the K nearest neighbors of each sample is introduced to overcome the subjectivity of the local density calculation; a sample assignment strategy based on the K nearest neighbors of each sample is introduced to overcome chain assignment errors during sample assignment; a new way of determining the outlier threshold and of handling outliers is provided to improve the accuracy of outlier selection and reduce the influence of outliers on the clustering process; for engines with enough abnormal samples, a weakly supervised clustering parameter adjustment strategy is provided to reduce the difficulty of parameter tuning; and for engines with insufficient abnormal samples, an unsupervised anomaly detection mode is provided for use where weak supervision is not available.

The Density Peak Clustering (DPC) algorithm assumes that an ideal cluster center has two basic features: 1) its local density is greater than that of the surrounding points; 2) different cluster centers lie far apart. The DPC algorithm introduces three concepts: the local density $\rho_i$ of sample i, the truncation distance $d_c$, and the distance $\delta_j$ to the nearest sample j of higher local density. $\rho_i$ and $\delta_j$ are calculated as in equations 1-3, and the truncation distance $d_c$ is a hyperparameter adjusted according to the clustering result.

In the formulas, $d_{ij}$ is the Euclidean distance between samples i and j.
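Equations 1-3 are not reproduced in this text. As a point of reference only, the widely used formulation of DPC defines the two quantities as below; it is an assumption that equations 1-3 take this form, and the block writes the distance as $\delta_i$ (indexed by the sample it belongs to) where the text above writes $\delta_j$.

```latex
% Local density with a hard cutoff at the truncation distance d_c
\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c), \qquad
\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \ge 0 \end{cases}

% Distance to the nearest sample of higher density
\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}

% Convention for the sample of highest density
\delta_i = \max_{j} d_{ij}
```

Under this formulation every sample within $d_c$ counts equally toward $\rho_i$, which is why a small data set makes the density so sensitive to the choice of $d_c$.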

For the anomaly detection task of the invention, different engines are in different health states, so accuracy inevitably drops when the features extracted from all engines take part in density clustering together. The data set of a single engine is small, and the local density of a sample is then strongly influenced by the truncation distance $d_c$. In addition, DPC selects the cluster centers from a decision graph formed by ρ and δ, and assigns each remaining sample j to the cluster of the nearest sample of higher density, so a single wrong assignment easily propagates to the points assigned after it. For this reason, many researchers have studied local density definitions based on the K nearest neighbors of a sample, together with new sample assignment strategies.

The ultimate goal of existing improved DPC algorithms is clustering rather than anomaly detection, so outliers may be selected as cluster centers or assigned to the nearest class. However, the outliers encountered in anomaly detection are usually independent of every class and only weakly correlated with one another, so crudely treating them as cluster centers or assigning them to other classes introduces large errors into the anomaly detection task. Moreover, the number of K nearest neighbors has to be tuned repeatedly, which is highly subjective. For this reason, we propose an anomaly detection algorithm and a weakly supervised clustering adjustment strategy based on Improved Density Peak Clustering (IDPC).

First, the way outliers are identified is optimized: outliers are defined directly as anomalies, which reduces their interference with the clustering result. The cluster assignment strategy is then optimized to cluster the remaining points, and minority classes containing too few samples are treated as anomalies together with the outliers. The influence of the K-nearest-neighbor number on clustering accuracy is evaluated through several evaluation indexes, and on that basis a weakly supervised clustering parameter adjustment strategy and a weakly supervised IDPC are proposed, in which the K-nearest-neighbor number is selected automatically by iteration and the iteration terminates as soon as the anomaly is found. Finally, because weak supervision is applicable only in a limited range of cases, an unsupervised IDPC is also provided.

The local density is defined using an exponential kernel function of width δ = 1, as shown in equation 4.

KNN(i): the set of the K nearest neighbors of point i.

$d_{ij}$: the Euclidean distance between points $x_i$ and $x_j$.
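Equation 4 itself is not reproduced in this text. The sketch below shows one common K-nearest-neighbor density with an exponential kernel of width 1, namely the sum of exp(-d_ij) over KNN(i); that specific form, the function name `knn_local_density` and its return layout are assumptions made for illustration, not the formula of the invention.

```python
import math

def knn_local_density(points, k):
    """Return (rho, knn), where knn[i] is a list of (distance, index) pairs."""
    n = len(points)
    rho, knn = [], []
    for i in range(n):
        # Euclidean distances d_ij from point i to every other point j.
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neighbors = dists[:k]              # KNN(i)
        knn.append(neighbors)
        # ASSUMED form of equation 4: rho_i = sum over KNN(i) of exp(-d_ij)
        rho.append(sum(math.exp(-d) for d, _ in neighbors))
    return rho, knn
```

Because only the K nearest neighbors contribute, no truncation distance has to be chosen; an isolated point simply receives a density close to zero.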

The original outlier threshold is given by equations 5-7; under the original scheme, each outlier is assigned to its nearest cluster after the remaining points have been clustered. Because that threshold is determined by averaging, a large number of points are selected as outliers, and because each outlier is finally assigned to the nearest class, it may not be judged abnormal.

$k_{dist}(i) = \max_{j \in KNN(i)} \{ d_{ij} \}$ (5)

$Outlier = \{ o \mid k_{dist}(o) > threshold \}$ (7)

In the formulas, $k_{dist}(i)$ is the distance from point i to the farthest of its K nearest neighbors.
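A minimal sketch of this original outlier rule follows. Equation 6 is not reproduced in this text; since the threshold is described as determined by averaging, the sketch assumes it is the plain mean of all k_dist values. The function names are illustrative.

```python
import math

def k_dist(points, k):
    """Equation 5: distance from each point to the farthest of its K nearest neighbors."""
    n = len(points)
    result = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j])
                       for j in range(n) if j != i)
        result.append(dists[k - 1])
    return result

def mean_threshold_outliers(points, k):
    """Return (outlier indices, threshold) under the averaging rule."""
    kd = k_dist(points, k)
    threshold = sum(kd) / len(kd)      # ASSUMED form of equation 6: plain mean
    # Equation 7: points whose k_dist exceeds the threshold are outliers.
    return [i for i, v in enumerate(kd) if v > threshold], threshold
```

On real data with a long tail of k_dist values, a mean-based threshold tends to sit well below the knee of the sorted curve, which is the over-selection problem described above.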

In Fig. 2, the green line segment marks the original outlier threshold, and Kdist denotes the curve formed by sorting all $k_{dist}(i)$ values by magnitude. The original thresholding method cannot locate the inflection point of Kdist accurately, so the new outlier thresholding method marked by the red line segment in Fig. 2 is provided.

As shown in Fig. 3, the new outlier threshold is calculated by taking the minimum of the curve Kdist as the origin and rotating the curve to the angle shown in the drawing, which yields new coordinate axes and a new curve Kdist'. The minimum of Kdist' on the new axes is the inflection point, which is taken as the outlier threshold; the corresponding position on the original axes is then recovered. The coordinate transformation is given by equations 8-10.

$(x'_{Kdist},\, y'_{Kdist}) = (x_{Kdist} - \min(x_{Kdist}),\; y_{Kdist} - \min(y_{Kdist})) \cdot M_{trans}$ (10)

In the formula:

$x_{Kdist}$: the vector of all x-axis values of the curve Kdist.

$y_{Kdist}$: the vector of all y-axis values of the curve Kdist.

$M_{trans}$: the transformation matrix used to rotate the coordinate axes.

$x'_{Kdist}$: the vector of all values of the rotated curve Kdist' on the x' axis.

$y'_{Kdist}$: the vector of all values of the rotated curve Kdist' on the y' axis.

After the coordinate transformation, the threshold point $Threshold' = (x'_{Kdist}(i), \min(y'_{Kdist}))$ can be found, where i is the index of that value in the ordered sequence $x'_{Kdist}$. The index i then gives $x_{Kdist}(i)$ and the corresponding outlier threshold Threshold. Once the threshold is determined, the outliers are classified directly into the abnormal class and take no part in the subsequent clustering process.
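The rotation-based search for the inflection point can be sketched as follows. Equations 8-9 (the rotation angle and the matrix entries) are not reproduced in this text, so the sketch assumes the curve is rotated until the chord from its first to its last point lies along the new x' axis; the function name `knee_threshold` and the use of the sample rank as the x coordinate are likewise assumptions.

```python
import math

def knee_threshold(kdist_values):
    """Return (index, threshold) of the knee of the sorted k_dist curve."""
    y = sorted(kdist_values)           # the curve Kdist
    n = len(y)
    x = list(range(n))                 # ASSUMPTION: x is the rank of each value
    # Shift so that the curve minimum is the origin (as in equation 10).
    x0, y0 = min(x), min(y)
    xs = [v - x0 for v in x]
    ys = [v - y0 for v in y]
    # ASSUMPTION: rotate so the chord from the first to the last point
    # of the curve becomes the new x' axis.
    theta = math.atan2(ys[-1] - ys[0], xs[-1] - xs[0])
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Row vector times M_trans = [[cos, -sin], [sin, cos]]; only y' is needed.
    y_rot = [-a * sin_t + b * cos_t for a, b in zip(xs, ys)]
    i = min(range(n), key=y_rot.__getitem__)   # minimum of y' marks the knee
    return i, y[i]
```

For example, `knee_threshold([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 9.0])` returns `(5, 1.5)`, so only the value 9.0 exceeds the threshold. The result depends on the relative scaling of the two axes, which this sketch leaves unnormalized.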

The complete clustering process is as follows:

1) Calculate the local density, determine the density peak points, determine the cluster centers through the decision graph, and assign class labels m (m = 0, 1, 2, 3, ..., M) in sequence.

2) Repeatedly search the K nearest neighbors of each cluster center for points not yet assigned to a cluster, and assign those points to the cluster of that center.

3) Calculate the probability that each remaining point belongs to each class, and assign the point to the class with the highest probability.

The probability is calculated as in equations 11-12: the similarity between point $x_i$ and each class is calculated, and $x_i$ is then assigned to the class with the highest probability.

KNN(i): the set of the K nearest neighbors of point i.

$y_j = m$, m = 1, 2, 3, ..., M: a class label.

$d_{ij}$: the Euclidean distance between points $x_i$ and $x_j$.

$s_{ij}$: the similarity between points $x_i$ and $x_j$; the closer the points, the higher the similarity.

$P_i^m$: the probability that point i belongs to class m.

$w_{ij}$: the similarity weight between $x_i$ and $x_j$, which depends on the K-nearest-neighbor distribution of $x_j$.

After the clustering is completed, the noise points and the minority classes are taken as the anomalies.
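Steps 2) and 3) of the clustering process can be sketched as below. Two assumptions are made and should be read as such: "repeatedly searching" is interpreted as a breadth-first expansion that continues from each newly added point, and, since equations 11-12 are not reproduced in this text, the similarity 1 / (1 + d_ij) normalized over the labeled neighbors stands in for them. Function names and data layouts are illustrative.

```python
from collections import deque

def expand_from_centers(knn, centers):
    """Step 2: grow each cluster from its center through K-nearest-neighbor links.

    knn[i] lists the neighbor indices of point i; centers maps a center
    index to its class label. Points that are never reached keep None.
    """
    labels = [None] * len(knn)
    queue = deque()
    for c, m in centers.items():
        labels[c] = m
        queue.append(c)
    while queue:
        i = queue.popleft()
        for j in knn[i]:
            if labels[j] is None:          # not yet assigned to any cluster
                labels[j] = labels[i]
                queue.append(j)
    return labels

def assign_by_vote(knn_dist, labels):
    """Step 3: pick the class with the highest neighbor-weighted probability.

    knn_dist is a list of (distance, index) pairs for KNN(i).
    ASSUMPTION: similarity 1 / (1 + d_ij) stands in for equations 11-12.
    """
    scores = {}
    for d, j in knn_dist:
        if labels[j] is not None:
            scores[labels[j]] = scores.get(labels[j], 0.0) + 1.0 / (1.0 + d)
    total = sum(scores.values())
    if total == 0:
        return None, {}                    # no labeled neighbor to vote
    probs = {m: s / total for m, s in scores.items()}
    return max(probs, key=probs.get), probs
```

Voting over several labeled neighbors, rather than copying the label of a single denser neighbor, is what limits the chain assignment errors discussed earlier.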

Weakly supervised clustering parameter adjustment strategy: the Calinski-Harabasz index (CH), the silhouette coefficient (SC), the Davies-Bouldin index (DBI) and the Dunn index (DVI) are four unsupervised indexes for evaluating a clustering result; the greater the intra-class compactness and the inter-class separation, the better the clustering. CH evaluates compactness and separation through sums of squared distances, SC evaluates them through how distinct the cluster boundaries are, DBI evaluates them through the maximum ratio of the mean intra-class sample distance to the distance between cluster centers, and DVI evaluates them through the ratio of the smallest inter-class sample distance to the largest intra-class sample distance.
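Of the four indexes, the Dunn index is the simplest to state exactly. The sketch below follows the description above (smallest inter-class sample distance divided by largest intra-class sample distance); the function name and the brute-force pairwise computation are illustrative choices, suitable only for small data sets.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest within-cluster distance."""
    clusters = {}
    for p, m in zip(points, labels):
        clusters.setdefault(m, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    # Largest distance between two samples of the same cluster.
    max_intra = max(
        (math.dist(p, q) for c in clusters.values() for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two samples of different clusters.
    min_inter = min(
        math.dist(p, q)
        for a, b in combinations(list(clusters.values()), 2)
        for p in a for q in b
    )
    return min_inter / max_intra if max_intra > 0 else math.inf
```

A higher value indicates compact, well-separated clusters: four points at the corners of a 5 x 1 rectangle score 5.0 when split along the long side and 0.2 when split along the short side.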

Fig. 4 shows the scores of the four indexes as the K-nearest-neighbor number increases from 1 to 1602 (0.9 times the total number of samples) for the data used in the invention. From an unsupervised point of view, for the anomaly detection task of the invention, a larger K-nearest-neighbor number brings the clustering result closer to the ideal case of one normal class and one abnormal class, with few classes overall. When the K-nearest-neighbor number is small, the clustering result contains many clusters, and many outliers and minority-class samples. The larger the K-nearest-neighbor number, the more samples fall into the majority class and the fewer classes the result contains, so the compactness computed from the sum of squared sample distances increases and the separation decreases. Therefore, the larger the K-nearest-neighbor number, the worse the CH score. The other indexes use a mean or a maximum instead of a sum of squares to calculate compactness and separation, which avoids the effect of the majority class having too many samples, and their scores follow similar trends. The clustering effect is therefore judged by the latter three indexes: the larger the K-nearest-neighbor number, the higher the clustering score, and the score is also high when the K-nearest-neighbor number is close to 1, so the K-nearest-neighbor number should be chosen either large or small.

Most of the data used by the invention are normal data with a concentrated distribution, and the ideal clustering result divides the data into two classes, normal and noise (abnormal). When the K-nearest-neighbor number is large (close to the total number of samples), the clustering result is governed by the data as a whole and is closer to the ideal clustering, with fewer noise points and fewer minority-class samples. When the K-nearest-neighbor number is small (close to 1), the local density reflects conditions in the immediate vicinity of each data point, the clustering is finer, more classes are produced, and there are more noise points and minority-class samples. The K-nearest-neighbor number should therefore be chosen either large or small.

For engines with more than one abnormal sample, we propose the following weakly supervised clustering adjustment strategy:

1) The K-nearest-neighbor number is iterated as follows: it starts at 0.9 times the total number of samples and is reduced in steps of 0.1 times the total number of samples; once it falls below 0.1 times the total number of samples, it is repeatedly halved and rounded until it reaches 1;

2) Outliers and minority classes are identified at each iteration, a class being judged a minority class if it contains fewer than 0.01 times the total number of samples;

3) Only one abnormal label, chosen at random, is retained; after each iteration it is checked whether the known anomaly has been detected, that is, whether it is contained in the outliers or the minority-class samples, and if it has been detected the iteration stops;

4) If the K-nearest-neighbor number has been reduced to 1 and the anomaly has still not been detected, the minority-class criterion is increased by 0.01 times the total number of samples;

5) If the minority-class criterion has been increased to 0.1 times the total number of samples and the anomaly still cannot be found, the anomaly detection algorithm is considered invalid;

6) After anomaly detection is finished, the weakly supervised clustering parameters are stored and it is checked whether the anomalies whose labels were removed have been detected: if all of them are detected, the algorithm is considered to perform well; if only some of them are detected, the algorithm is still considered effective; and if none are detected, the algorithm is considered invalid.
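The iteration in steps 1) to 5) can be sketched as two small functions. The rounding mode of the halving phase is not specified, so rounding down is assumed; `detect` is a hypothetical callable standing in for one full run of outlier selection and clustering, returning the indices of the samples flagged as outliers or minority-class members.

```python
def k_schedule(n_samples):
    """K values from 0.9*N down in steps of 0.1*N, then halving down to 1."""
    step = max(1, n_samples // 10)
    k = max(1, 9 * n_samples // 10)
    ks = []
    while k > step:                # linear phase: 0.9N, 0.8N, ...
        ks.append(k)
        k -= step
    while k > 1:                   # halving phase (ASSUMPTION: round down)
        ks.append(k)
        k //= 2
    ks.append(1)
    return ks

def tune_weakly_supervised(n_samples, known_anomaly, detect):
    """Return (k, minority_fraction) for the first setting that flags the known anomaly.

    detect(k, minority_fraction) is a HYPOTHETICAL callable returning the set
    of flagged sample indices. None is returned when no setting succeeds,
    in which case the algorithm is considered invalid (step 5).
    """
    for pct in range(1, 11):       # minority criterion 0.01, 0.02, ..., 0.10
        fraction = pct / 100
        for k in k_schedule(n_samples):
            if known_anomaly in detect(k, fraction):
                return k, fraction
    return None
```

For 100 samples the schedule is 90, 80, ..., 20, 10, 5, 2, 1. Starting from large K and stopping at the first success favors the coarsest clustering that still exposes the known anomaly, which keeps the number of flagged samples small.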

This completes the weakly supervised anomaly detection algorithm based on improved density peak clustering, which needs no complex parameter tuning. When new data need to be checked for anomalies, they are appended to the old data and checked through the weakly supervised anomaly detection.

For an engine with only one abnormal sample, the weakly supervised clustering adjustment strategy is not applicable, so only an unsupervised clustering method can be used: the K-nearest-neighbor number is adjusted repeatedly according to the four evaluation indexes, and the clustering with the highest score is used for anomaly detection. However, because the evaluation criteria of the four indexes are not consistent, the K-nearest-neighbor number is not easy to adjust, and because there is no label information for guidance, anomaly detection and false alarms are difficult to control.

The effect of an anomaly detection algorithm is usually evaluated using the false alarm rate and the ROC (Receiver Operating Characteristic) curve, where the false alarm rate is the ratio of the number of false alarms to the number of detected anomalies, and the ROC curve is calculated from indexes such as false alarms and missed detections. The false alarm rate requires a sufficient number of abnormal samples, and the ROC curve requires both a sufficient number of abnormal samples and complete supervised labels.

The data used by the method contain a large total number of samples, few abnormal samples and even fewer labels, so neither the false alarm rate nor the ROC curve can be used to judge the quality of the method. Considering that different total sample sizes tolerate different numbers of false alarms, the invention uses as its evaluation index the ratio of the total number of false alarms to the total number of samples, under the condition that the anomaly is detected.
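This evaluation index reduces to a short calculation. In the sketch below, a false alarm is taken to be any flagged sample that is not among the labeled anomalies; that reading, and the function and parameter names, are assumptions made for illustration.

```python
def false_alarm_ratio(flagged, true_anomalies, n_samples):
    """Ratio of false alarms to total samples; None if no labeled anomaly was detected."""
    flagged, true_anomalies = set(flagged), set(true_anomalies)
    if not flagged & true_anomalies:
        return None                        # the index is defined only on detection
    # ASSUMPTION: every flagged sample without an anomaly label is a false alarm.
    false_alarms = flagged - true_anomalies
    return len(false_alarms) / n_samples
```

For example, flagging samples 3, 10 and 11 out of 1000 when sample 3 is the labeled anomaly gives a ratio of 0.002. Because the labels are incomplete, some samples counted as false alarms here may in fact be unlabeled anomalies.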

To verify its effectiveness, the method provided by the invention was applied to real data. In the data used, most engines have only one abnormal label, so unsupervised IDPC is used for anomaly detection on these engines. For the other engines, which have several anomalies, one abnormal label is retained at random for weakly supervised IDPC anomaly detection, and after detection the result is judged by whether all or some of the anomalies whose labels were erased have been detected.

The application verification process of the anomaly detection algorithm based on the improved density peak clustering is as follows:

1) Constructing an anomaly detection model;

2) Setting up a weakly supervised anomaly detection model and an unsupervised anomaly detection model for engines with different numbers of anomalies: for an engine using weakly supervised anomaly detection, one anomaly label is randomly retained to guide the detection process; for an engine using unsupervised anomaly detection, the label does not participate in the detection process;

3) Performing anomaly detection;

4) Adjusting the weakly supervised anomaly detection hyperparameters according to the weakly supervised anomaly detection result, and adjusting the unsupervised anomaly detection hyperparameters according to the unsupervised clustering evaluation indexes;

5) Outputting the anomaly detection result;

6) For weakly supervised anomaly detection, the validity of the detection result is verified using the anomalies whose labels were erased; for unsupervised anomaly detection, the validity of the detection result is judged using the label, as shown in Table 1;

7) To demonstrate the superiority of the method, several density clustering methods and several feature extraction methods are cross-combined as comparison tests, with missed detections and false alarms used as the indexes for evaluating the anomaly detection results.

Table 1. Anomaly detection results

Table 1 shows the anomaly detection results of the algorithm combining MKSFA with IDPC (MKSFA + IDPC) and of the algorithm combining WSFE with IDPC. In the WSFE feature extraction model, the four old engines use features extracted by the new training model, and the other engines use features extracted by the mixed training model. Both methods provided by the invention find most of the anomalies of the 12 engines and, except for individual engines, produce few missed detections and few false alarms; combining the two methods effectively reduces false alarms further and meets the requirements of practical application.

To demonstrate the effectiveness of the proposed method and the correctness of the model settings, several different anomaly detection models are used as comparison tests below. Model 1 combines Gaussian kernel slow feature analysis feature extraction with IDPC for anomaly detection; Model 2 combines polynomial kernel slow feature analysis feature extraction with IDPC; Model 3 combines linear kernel slow feature analysis feature extraction with IDPC; Model 4 combines a feature extraction method without the WSCE loss function and classifier with IDPC; Model 5 combines a feature extraction method without the WSCE loss function and classifier with DPC; Model 6 combines WSCE with DPC; Model 7 combines WSFE with OPTICS. Table 2 shows the anomaly detection results of these models; underlined results indicate that the anomaly detection was weakly supervised.

Table 2. Anomaly detection comparison tests of model details

The results of Models 1 to 3 show that the mixed kernel function proposed in chapter two can effectively improve feature extraction and anomaly detection: the anomaly features are more pronounced and anomaly detection produces fewer false alarms, which demonstrates the effectiveness of the feature extraction method of chapter two. The results of Models 4 to 7 show that the model structure, the loss function and the other design choices of chapter three can effectively improve feature extraction: an accurate gas path baseline of the engine is mined and, at the same time, the extracted features make anomalies easier to detect, which demonstrates the effectiveness of the feature extraction method of chapter three. The anomaly detection method detects most of the known anomalies with few missed detections and few false alarms, which demonstrates the superiority of the feature extraction and anomaly detection methods.

To demonstrate that the anomaly detection method is superior to existing methods, four feature extraction methods (PCA, AE, ICA and LLE) are each combined with IDPC, DPC and OPTICS to construct 12 groups of models for anomaly detection comparison experiments. For brevity, the anomaly detection results of the seven invalid models are not listed individually; the comparison results are shown in Table 3, where underlined results indicate that the anomaly detection was weakly supervised.

Table 3. Anomaly detection comparison tests of conventional methods

As can be seen from Table 3, the features extracted by conventional feature extraction methods cannot accurately reflect the characteristics of the anomalies, so none of the anomaly detection methods can detect the anomalies effectively from them; this indicates that the method provided by the invention is an anomaly detection method with broad application prospects.

The invention mainly provides an anomaly detection algorithm based on improved density peak clustering together with a weakly supervised adjustment strategy. By analyzing the characteristics of the features extracted in the preceding step, the influence of different numbers of K nearest neighbors on the clustering result is studied. Then, for engines with sufficient abnormal samples, a weakly supervised clustering adjustment strategy is proposed by adding weakly supervised label information, realizing a weakly supervised anomaly detection algorithm that requires no complex parameter tuning. Verification through several groups of comparison experiments shows that the proposed method is better suited to the feature extraction methods described above, produces fewer missed detections and false alarms, improves performance by at least one order of magnitude over traditional algorithms, and can meet the actual requirements of airlines.

Claims

1. A complex equipment parameter anomaly detection method based on improved density peak clustering is characterized by comprising the following steps:

step 2: performing anomaly detection, specifically comprising:

step 2-1: the local density is defined using an exponential kernel function with width $\delta = 1$, as shown in the following equation:

where $KNN(i)$ is the set of K nearest neighbors of point $i$,

and $d_{ij}$ is the Euclidean distance between points $x_i$ and $x_j$,

the original outlier threshold is given by formulas (5) to (7), and the outliers are assigned to the closest cluster after the other points have been clustered; because the original threshold is determined by averaging, a very large number of points are selected as outliers, and because the outliers are finally assigned to the closest category, they may not be judged as abnormal,

$$K_{dist}(i) = \max_{j \in KNN(i)} \{ d_{ij} \} \tag{5}$$

step 2-2: a coordinate transformation is performed as in equations (8) to (10),

where $x_{Kdist}$ is the vector of all values of the curve Kdist on the x-axis, $y_{Kdist}$ is the vector of all values of the curve Kdist on the y-axis, $M_{trans}$ is the transformation matrix used to rotate the coordinate axes of the samples, $x'_{Kdist}$ is the vector of all values on the x-axis after rotation, and $y'_{Kdist}$ is the vector of all values on the y-axis after rotation,

after the coordinate transformation is completed, the threshold point $Threshold'(x'_{Kdist}(i), \min(y'_{Kdist}))$ can be found, where $i$ in $x'_{Kdist}(i)$ denotes the index of the $i$-th value of $x'_{Kdist}$ in sequence; $x_{Kdist}(i)$ and the corresponding outlier threshold $Threshold$ are determined according to the index $i$; after the threshold is determined, the outliers are directly classified into the abnormal class and do not participate in the subsequent clustering process;

step 2-3: the clustering process is as follows:

step 2-3-1: calculating the local density, determining the density peak points, determining the cluster centers through the decision graph, and assigning class labels $m$ ($m = 0, 1, 2, 3, \ldots, M$) in sequence,

step 2-3-3: calculating the probability that each remaining point belongs to each category and assigning the point to the category with the highest probability, where the probability is calculated as in formulas (11) and (12): the similarity of point $x_i$ to each class is calculated, and point $x_i$ is then assigned to the most probable class,

where $KNN(i)$ is the set of K nearest neighbors of point $i$, and $y_j = m$, $m = 1, 2, 3, \ldots, M$, denotes the category label,

step 3: adjusting the weakly supervised anomaly detection hyperparameters according to the weakly supervised anomaly detection result, and adjusting the unsupervised anomaly detection hyperparameters according to the unsupervised clustering evaluation indexes;

step 4: outputting the anomaly detection result.
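The density and outlier threshold steps of claim 1 can be illustrated by the following minimal Python sketch. Only formula (5) is reproduced in this text, so two things here are assumptions rather than the claimed formulas: the form of the local density (a sum of exp(-d_ij/δ) over the K nearest neighbors) and the rotation used to locate the threshold point (the sorted K_dist curve is rotated until the chord joining its end points is horizontal). All function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix d_ij for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def knn(dist, i, k):
    """Indices of the k nearest neighbors of point i (point i itself excluded)."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]


def local_density(dist, k, delta=1.0):
    """ASSUMED form: rho_i = sum over j in KNN(i) of exp(-d_ij / delta)."""
    return [sum(math.exp(-dist[i][j] / delta) for j in knn(dist, i, k))
            for i in range(len(dist))]


def k_dist(dist, k):
    """Formula (5): K_dist(i) = max over j in KNN(i) of d_ij."""
    return [max(dist[i][j] for j in knn(dist, i, k)) for i in range(len(dist))]


def outlier_threshold(kdist_values):
    """ASSUMED rotation: rotate the ascending K_dist curve (x = rank, y = value)
    so that its end-to-end chord is horizontal, then return the K_dist value
    at the point with the smallest rotated y coordinate."""
    ys = sorted(kdist_values)
    xs = list(range(len(ys)))
    theta = math.atan2(ys[-1] - ys[0], xs[-1] - xs[0])
    c, s = math.cos(theta), math.sin(theta)
    y_rot = [-s * x + c * y for x, y in zip(xs, ys)]
    i = min(range(len(ys)), key=lambda t: y_rot[t])
    return ys[i]


# Four tightly grouped points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
dist = pairwise_distances(points)
rho = local_density(dist, k=2)
kd = k_dist(dist, k=2)
threshold = outlier_threshold(kd)
# ASSUMPTION: a point is an outlier when its K_dist exceeds the threshold.
outliers = [i for i, v in enumerate(kd) if v > threshold]
```

In this toy data set the isolated point has both the lowest local density and the only K_dist above the threshold, so it is the only point removed before clustering. The rank axis and the distance axis are not rescaled here; in practice the location of the threshold point depends on how the two axes are scaled.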

2. The method for detecting complex equipment parameter anomalies based on improved density peak clustering, characterized in that the following weakly supervised clustering adjustment strategy is applied to complex equipment with more than one abnormal sample:

step 3-1: an iteration strategy of the K neighbor number is provided, the K neighbor number starts from 0.9 total sample number, the value is reduced at intervals of 0.1 total sample number, when the K neighbor number is less than 0.1 total sample number, the K neighbor number is continuously halved and rounded until the value of the K neighbor number is 1;

step 3-3: only one of the anomaly labels is randomly retained; after each iteration, it is judged whether the anomaly detection result detects the known anomaly, that is, whether the known anomaly is contained among the outliers and the minority-class samples; if the known anomaly is detected, the iteration stops;

step 3-4: if the number of K nearest neighbors has been reduced to 1 and no anomaly has been detected, the minority-class judgment criterion is increased by 0.01 times the total number of samples;

step 3-6: after the anomaly detection is finished, the weakly supervised clustering parameters are stored, and it is judged whether the algorithm detects the anomalies whose labels were removed: if all of them can be detected, the anomaly detection algorithm is considered to perform well; if only part of them can be detected, the algorithm is considered to perform fairly well; and if none of them can be detected, the algorithm is considered invalid.
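The iteration over the number of K nearest neighbors in claim 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: halving rounds down, the schedule switches to halving once 0.1 times the total number of samples has been tried, and `detect` is a hypothetical callable standing in for the clustering step that returns the set of flagged sample indices (outliers plus minority-class samples). Steps 3-2 and 3-5 are not reproduced in this text, so the starting value and upper bound of the minority-class criterion are hypothetical parameters.

```python
import random


def k_schedule(n_samples):
    """K values: 0.9N down to 0.1N in steps of 0.1N, then halved until 1."""
    ks = []
    for tenth in range(9, 0, -1):
        k = max(1, (tenth * n_samples) // 10)
        if not ks or k < ks[-1]:
            ks.append(k)
    k = ks[-1]
    while k > 1:
        k = max(1, k // 2)  # ASSUMPTION: halving rounds down
        ks.append(k)
    return ks


def weakly_supervised_search(n_samples, anomaly_ids, detect,
                             start_fraction=0.01, max_fraction=0.05, rng=None):
    """Keep one anomaly label, iterate K until it is flagged, then check
    how many of the erased anomalies were flagged as well."""
    if len(anomaly_ids) < 2:
        raise ValueError("the strategy needs more than one known anomaly")
    rng = rng or random.Random(0)
    kept = rng.choice(sorted(anomaly_ids))
    erased = set(anomaly_ids) - {kept}
    fraction = start_fraction  # hypothetical minority-class criterion
    while fraction <= max_fraction + 1e-12:
        for k in k_schedule(n_samples):
            flagged = set(detect(k, fraction))
            if kept in flagged:  # stop as soon as the known anomaly is found
                hit = erased & flagged
                grade = "all" if hit == erased else ("partial" if hit else "none")
                return {"k": k, "minority_fraction": fraction, "grade": grade}
        fraction += 0.01  # relax the criterion by 0.01 of the sample count
    return None


def toy_detect(k, fraction):
    """Stand-in detector: flags samples 3 and 7 once K is small enough."""
    return {3, 7} if k <= 20 else set()


result = weakly_supervised_search(100, {3, 7}, toy_detect)
```

For 100 samples the schedule visits 90, 80, ..., 10 and then 5, 2, 1, so at most twelve clustering runs are needed per value of the minority-class criterion.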

3. The method for detecting complex equipment parameter anomalies based on improved density peak clustering as claimed in claim 1, wherein, for complex equipment with only one known anomaly, the weakly supervised clustering adjustment strategy is not applicable, so only an unsupervised clustering method can be used: the number of K nearest neighbors is adjusted repeatedly according to four evaluation indexes, namely the Calinski-Harabasz (CH) coefficient, the silhouette coefficient (SC), the Davies-Bouldin index (DBI) and the Dunn index, and the clustering with the highest score is used for anomaly detection; the CH coefficient measures compactness and separation through sums of squared distances; the silhouette coefficient SC measures compactness and separation by observing how distinct the cluster boundaries are; the Davies-Bouldin index DBI measures compactness and separation through the maximum value of the mean intra-class sample distance relative to the distance between cluster centers; and the Dunn index judges compactness and separation through the ratio of the nearest distance between samples of different classes to the farthest distance between samples of the same class.
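Of the four indexes in claim 3, the Dunn index has the simplest definition and can be written out directly. The following minimal Python sketch follows the wording of the claim (nearest distance between samples of different classes divided by farthest distance between samples of the same class); the function name is hypothetical. The other three indexes are available in scikit-learn as `calinski_harabasz_score`, `silhouette_score` and `davies_bouldin_score`.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Nearest inter-class sample distance divided by the farthest
    intra-class sample distance; larger values indicate better clustering."""
    min_inter = math.inf
    max_intra = 0.0
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_intra = max(max_intra, d)
        else:
            min_inter = min(min_inter, d)
    if math.isinf(min_inter) or max_intra == 0.0:
        raise ValueError("need at least two classes and one class with two distinct points")
    return min_inter / max_intra


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, well separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # clusters that cut across the gap
```

A clustering that groups nearby points together scores higher than one that splits them, which is the property used when comparing candidate values of the number of K nearest neighbors.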