
CN108052974B - Fault diagnosis method, system, equipment and storage medium

Info

Publication number: CN108052974B
Application number: CN201711320019.4A
Authority: CN
Inventors: 张莉; 薛杨涛; 王邦军; 凌兴宏; 张召; 李凡长
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2022-05-17
Anticipated expiration: 2037-12-12
Also published as: CN108052974A

Abstract

The application discloses a fault diagnosis method, system, device and storage medium. The method comprises: acquiring, respectively, the feature data sets corresponding to an initial feature set when the target device operates normally and when it operates under fault, to obtain training data comprising a normal feature data set and a fault feature data set; calculating, respectively, the Kullback-Leibler (KL) distance between each feature data in the normal feature data set and the corresponding feature data in the fault feature data set, to obtain a KL distance set; performing cross validation on the training data using support vector machine classification to obtain a validation result; determining the features related to fault operation from the initial feature set according to the validation result and the magnitudes of the KL distances in the KL distance set, to obtain an optimal feature set; and, when data to be diagnosed of the target device is acquired, determining the feature data corresponding to the optimal feature set from the data to be diagnosed, and then performing fault diagnosis on the target device using that feature data. The application effectively improves the accuracy of subsequent fault diagnosis results.

Description

Fault diagnosis method, system, equipment and storage medium

Technical Field

The present invention relates to the field of fault diagnosis technologies, and in particular, to a fault diagnosis method, system, device, and storage medium.

Background

At present, whether a piece of equipment has failed is usually determined by manual judgment. On the one hand, subjective human factors make the reliability of the fault diagnosis result unstable; on the other hand, a large amount of labor is consumed and the diagnosis efficiency is low. To reduce labor cost and improve diagnosis efficiency, support vector machines have gradually been adopted for equipment fault diagnosis, but the accuracy of the diagnosis result still needs to be improved.

In summary, how to improve the accuracy of the fault diagnosis result is a problem to be solved urgently at present.

Disclosure of Invention

In view of this, the present invention provides a fault diagnosis method, system, device and storage medium, which can improve the accuracy of the fault diagnosis result. The specific scheme is as follows:

in a first aspect, the present invention discloses a fault diagnosis method, including:

acquiring, respectively, the feature data sets corresponding to an initial feature set when the target device operates normally and when it operates under fault, to obtain training data comprising a normal feature data set and a fault feature data set; wherein the initial feature set comprises a plurality of features;

calculating, respectively, the Kullback-Leibler (KL) distance between each feature data in the normal feature data set and the corresponding feature data in the fault feature data set, to obtain a KL distance set;

performing cross validation on the training data by adopting a support vector machine to obtain a validation result;

determining features related to fault operation from the initial feature set according to the verification result and the size of the KL distance in the KL distance set to obtain an optimal feature set;

and when the data to be diagnosed of the target equipment is acquired, determining the characteristic data corresponding to the optimal characteristic set from the data to be diagnosed, and then performing corresponding fault diagnosis on the target equipment by using the characteristic data.

Optionally, the step of respectively calculating a KL distance corresponding to each feature data in the normal feature data set and a corresponding feature data in the fault feature data set includes:

respectively determining Gaussian distribution corresponding to each feature data in the normal feature data set and the fault feature data set;

and respectively calculating the KL distance between the Gaussian distribution corresponding to each feature data in the normal feature data set and the Gaussian distribution corresponding to the corresponding feature data in the fault feature data set.

Optionally, the step of respectively determining a gaussian distribution corresponding to each feature data in the normal feature data set and the fault feature data set includes:

respectively carrying out standardization processing on each feature data in the normal feature data set and the fault feature data set, and then respectively determining the Gaussian distribution corresponding to each standardized feature data in the normal feature data set and the fault feature data set.

Optionally, the step of performing cross validation on the training data by using support vector machine classification includes:

and performing ten-fold cross validation on the training data by adopting support vector machine classification.

Optionally, the step of determining, according to the verification result and the size of the KL distance in the KL distance set, the feature related to the fault operation from the initial feature set to obtain an optimal feature set includes:

screening KL distances satisfying a preset condition from the KL distance set to obtain a target KL distance;

screening the characteristics corresponding to the target KL distance from the initial characteristic set to obtain target characteristics;

and determining the characteristics related to the fault operation from the target characteristics according to the verification result to obtain an optimal characteristic set.

Optionally, the step of screening KL distances satisfying a preset condition from the KL distance set includes:

sorting the KL distance sets in a descending order to obtain sorted distance sets;

and screening out KL distances in a preset number from the sorted distance set.

Optionally, the step of screening KL distances satisfying a preset condition from the KL distance set includes:

screening out, from the KL distance set, the KL distances that are greater than a preset threshold.

In a second aspect, the present invention discloses a fault diagnosis system, comprising:

the characteristic data acquisition module is used for respectively acquiring characteristic data sets corresponding to the initial characteristic set when the target equipment runs in a normal mode and a fault mode to obtain training data comprising the normal characteristic data sets and the fault characteristic data sets; wherein the initial set of features comprises a plurality of features;

a KL distance calculating module, configured to calculate a corresponding KL distance between each feature data in the normal feature data set and a corresponding feature data in the fault feature data set, respectively, to obtain a KL distance set;

the cross validation module is used for performing cross validation on the training data by adopting a support vector machine to obtain a validation result;

the characteristic determining module is used for determining characteristics related to fault operation from the initial characteristic set according to the verification result and the size of the KL distance in the KL distance set to obtain an optimal characteristic set;

and the fault diagnosis module is used for determining the characteristic data corresponding to the optimal characteristic set from the data to be diagnosed when the data to be diagnosed of the target equipment is obtained, and then performing corresponding fault diagnosis on the target equipment by using the characteristic data.

In a third aspect, the invention discloses a fault diagnosis device, comprising a processor and a memory; wherein the processor implements the fault diagnosis method disclosed above when executing the fault diagnosis program stored in the memory.

In a fourth aspect, the present invention discloses a computer-readable storage medium for storing a fault diagnosis program; wherein the fault diagnosis program implements the fault diagnosis method disclosed above when executed by a processor.

It can be seen that, after the training data comprising the normal feature data set and the fault feature data set is obtained, the invention calculates the KL distance between each feature data in the normal feature data set and the corresponding feature data in the fault feature data set, and performs cross validation on the training data using support vector machine classification. The features related to fault operation are then determined from the initial feature set according to the validation result and the calculated KL distances. After the data to be diagnosed is obtained, the feature data corresponding to the features related to fault operation can be determined from it; because this feature data reflects whether the equipment is in a fault operation state, it can subsequently be used to diagnose the equipment. In other words, before fault diagnosis is performed on the equipment, the features related to fault operation are determined based on the KL distance, and because these features reflect the characteristics of fault operation, the accuracy of subsequent fault diagnosis results is effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of a fault diagnosis method disclosed in the embodiments of the present invention;

FIG. 2a is a graph illustrating the results of monitoring "fault 21" using an SVM;

FIG. 2b is a graph illustrating the results of monitoring "fault 21" using KL-FS-SVM;

FIG. 3 is a schematic structural diagram of a fault diagnosis system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a fault diagnosis method which, as shown in FIG. 1, comprises the following steps:

Step S11: acquiring, respectively, the feature data sets corresponding to an initial feature set when the target device operates normally and when it operates under fault, to obtain training data comprising a normal feature data set and a fault feature data set; wherein the initial feature set comprises a plurality of features.

It should be noted that, in this embodiment, when the feature data set corresponding to the initial feature set of the target device during fault operation is obtained, the feature data set corresponding to the initial feature set of the target device during operation under a type of fault may be obtained, so that a corresponding fault feature data set is finally obtained; of course, the embodiment may also acquire the feature data sets corresponding to the initial feature set when the target device operates under K types of faults, so that corresponding K types of fault feature data sets are finally obtained. Wherein the value of K is an integer greater than 1.

Step S12: calculating, respectively, the Kullback-Leibler (KL) distance between each feature data in the normal feature data set and the corresponding feature data in the fault feature data set, to obtain a KL distance set.

In this embodiment, the step of calculating the KL distance corresponding to each feature data in the normal feature data set and the corresponding feature data in the fault feature data set respectively may specifically include:

respectively determining the Gaussian distribution corresponding to each feature data in the normal feature data set and the fault feature data set; and respectively calculating the KL distance between the Gaussian distribution corresponding to each feature data in the normal feature data set and the Gaussian distribution corresponding to the corresponding feature data in the fault feature data set.
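As an illustration of this step, the following Python sketch fits a univariate Gaussian to each feature column of the normal and fault data and evaluates the closed-form KL divergence between the two fits. It assumes plain (not generalized) Gaussians, population standard deviations, and the normal-operation distribution as the first argument of the divergence; the names `gaussian_kl` and `kl_distance_set` are hypothetical and do not come from the patent.

```python
import math
from statistics import fmean, pstdev

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def kl_distance_set(normal_rows, fault_rows):
    """Return one KL distance per feature (column); rows are samples."""
    distances = []
    for normal_col, fault_col in zip(zip(*normal_rows), zip(*fault_rows)):
        mu0, s0 = fmean(normal_col), pstdev(normal_col)
        mu1, s1 = fmean(fault_col), pstdev(fault_col)
        distances.append(gaussian_kl(mu0, s0, mu1, s1))
    return distances

# Toy data (assumed): feature 0 shifts under fault, feature 1 does not.
normal = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
fault = [[5.0, 1.0], [6.0, 2.0], [7.0, 3.0]]
kl_set = kl_distance_set(normal, fault)
```

In this toy case the shifted feature receives a large distance and the unchanged feature a distance of zero, which is the property the later screening step relies on.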

Because the initially acquired feature data may differ in dimension (unit) and order of magnitude, in this embodiment the feature data must be standardized before the corresponding Gaussian distribution is determined. That is, the step of determining the Gaussian distribution corresponding to each feature data in the normal feature data set and the fault feature data set respectively may specifically include:

respectively standardizing each feature data in the normal feature data set and the fault feature data set, and then respectively determining the Gaussian distribution corresponding to each standardized feature data in the normal feature data set and the fault feature data set.

It is understood that, for each type of fault, the present embodiment will perform the processing procedures disclosed in the above step S12 and the following steps S13, S14 and S15 for the corresponding fault feature data set.

Step S13: and performing cross validation on the training data by adopting a support vector machine to obtain a validation result.

The step of performing cross validation on the training data by using support vector machine classification may specifically include: and performing ten-fold cross validation on the training data by adopting support vector machine classification.
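A minimal sketch of ten-fold cross validation is given below. The patent does not name an SVM implementation, so the classifier is passed in as a hypothetical callable `fit_predict`; the `midpoint_classifier` used in the example is only a stand-in and is not an SVM. All function names and the toy data are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Average accuracy over k folds.

    fit_predict(train_X, train_y, test_X) -> predicted labels; it stands in
    for training an SVM on the training folds and applying it to the held-out fold.
    """
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predictions = fit_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

def midpoint_classifier(train_X, train_y, test_X):
    """Stand-in classifier: threshold the first feature at the midpoint of the class means."""
    mean0 = sum(x[0] for x, t in zip(train_X, train_y) if t == 0) / train_y.count(0)
    mean1 = sum(x[0] for x, t in zip(train_X, train_y) if t == 1) / train_y.count(1)
    cut = (mean0 + mean1) / 2.0
    return [int(x[0] > cut) for x in test_X]

# Toy data (assumed): label 0 = normal, label 1 = fault, well separated.
X = [[float(v)] for v in range(20)] + [[float(v)] for v in range(100, 120)]
y = [0] * 20 + [1] * 20
score = cross_validate(X, y, midpoint_classifier)
```

Passing the classifier as a callable keeps the splitting logic independent of whichever SVM library is used in practice.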

Step S14: and determining the characteristics related to the fault operation from the initial characteristic set according to the verification result and the size of the KL distance in the KL distance set to obtain an optimal characteristic set.

In this embodiment, the step of determining, according to the verification result and the size of the KL distance in the KL distance set, the feature related to the faulty operation from the initial feature set to obtain the optimal feature set may specifically include:

screening KL distances satisfying a preset condition from the KL distance set to obtain a target KL distance; screening the features corresponding to the target KL distance from the initial feature set to obtain target features; and determining the features related to fault operation from the target features according to the validation result, to obtain an optimal feature set.

In a specific embodiment, the step of screening KL distances satisfying a preset condition from the KL distance set may specifically include:

sorting the KL distance sets in a descending order to obtain sorted distance sets; and screening out KL distances in a preset number from the sorted distance set.
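The descending sort followed by a fixed-count selection can be sketched as follows. The function name `top_n_features` and the example distances are illustrative assumptions.

```python
def top_n_features(kl_distances, n):
    """Return the indices of the n features with the largest KL distance, largest first."""
    ranked = sorted(range(len(kl_distances)),
                    key=lambda j: kl_distances[j], reverse=True)
    return ranked[:n]

# Assumed example: one KL distance per feature.
kl_distances = [0.12, 3.40, 0.05, 1.75]
selected = top_n_features(kl_distances, 2)
print(selected)  # [1, 3]
```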

It is understood that the preset number can be specifically set according to actual needs, and is not limited herein.

In another specific embodiment, the step of screening KL distances satisfying the preset condition from the KL distance set may specifically include:

screening out, from the KL distance set, the KL distances that are greater than a preset threshold.

It is understood that the preset threshold may be set according to actual needs and is not limited herein.
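Threshold-based screening amounts to a single filter over the KL distance set. In the sketch below, the function name and the threshold value of 1.0 are assumptions chosen for illustration only.

```python
def features_above_threshold(kl_distances, threshold):
    """Return the indices of features whose KL distance exceeds the threshold."""
    return [j for j, d in enumerate(kl_distances) if d > threshold]

# Assumed example values.
kl_values = [0.12, 3.40, 0.05, 1.75]
kept = features_above_threshold(kl_values, threshold=1.0)
print(kept)  # [1, 3]
```

Unlike the fixed-count variant, the number of retained features here depends on the data.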

Step S15: and when the data to be diagnosed of the target equipment is acquired, determining the characteristic data corresponding to the optimal characteristic set from the data to be diagnosed, and then performing corresponding fault diagnosis on the target equipment by using the characteristic data.

In this embodiment, after the feature data corresponding to the optimal feature set is determined from the data to be diagnosed, corresponding fault diagnosis may be performed on the target device by using the feature data and combining a Support Vector Machine (SVM).
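The diagnosis step can be pictured as a column selection followed by a classifier call. The sketch below assumes the optimal feature set is stored as a list of column indices and that a trained classifier is available as a callable; `toy_classifier` is a stand-in decision rule, not a trained SVM, and all names are hypothetical.

```python
def diagnose(sample, optimal_features, classifier):
    """Keep only the optimal-feature entries of a sample and classify the reduced vector."""
    reduced = [sample[j] for j in optimal_features]
    return classifier(reduced)

def toy_classifier(features):
    """Stand-in decision rule used only to make the example runnable."""
    return "fault" if features[0] > 2.0 else "normal"

optimal_features = [3, 0]  # assumed indices of the selected features
label_a = diagnose([0.1, 0.2, 0.3, 4.5], optimal_features, toy_classifier)
label_b = diagnose([0.1, 0.2, 0.3, 0.4], optimal_features, toy_classifier)
```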

It can be seen that, in the embodiments of the present invention, after the training data including the normal feature data set and the fault feature data set is acquired, the KL distance corresponding to each feature data in the normal feature data set and the corresponding feature data in the fault feature data set is calculated, and cross validation is performed on the training data by using a support vector machine, and then the features related to fault operation are determined from the initial feature set according to the validation result and the calculated KL distance. Therefore, before fault diagnosis is carried out on the equipment, the method and the device for fault diagnosis determine the characteristics related to fault operation based on the KL distance, and the characteristics can reflect the characteristics of fault operation, so that the accuracy of subsequent fault diagnosis results is effectively improved.

Based on the foregoing embodiments, the embodiments of the present invention were tested on the Tennessee Eastman Process (TEP) data set. The TEP data set comprises a feature data set corresponding to normal operation and feature data sets corresponding to 21 different faults. For each fault, the training set contains 480 fault feature data and the test set contains 960 observations, each of which comprises 52 variables. The test set begins with normal data, and the fault is introduced from the 161st sample. All data are sampled every 3 minutes and are generated by the TEP simulation software. In this embodiment, 500 training data from the normal feature data set and the 480 training data of one fault are taken as the input of the training set, and fault detection is performed on the test set of each fault. The specific implementation steps are as follows:

1) For the normal feature data set collected in the industrial process, $X^0 = \{x_i^0\}_{i=1}^{N_0}$ with $x_i^0 \in \mathbb{R}^m$, and the fault feature data sets $X^k = \{x_i^k\}_{i=1}^{N_k}$ with $x_i^k \in \mathbb{R}^m$, $k = 1, \dots, n$, $N_0$ and $N_k$ are the numbers of samples in the normal feature data set and the k-th fault feature data set, respectively, m is the number of features, and n is the number of fault categories. Here $N_0 = 500$, $N_k = 480$, $m = 52$ and $n = 21$. The feature data are then standardized. The standardization formula is:

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $\mu_j$ is the mean of the j-th feature data of the normal feature data set, and $\sigma_j$ is the standard deviation of the j-th feature data of the normal feature data set.
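A short sketch of this standardization is shown below. It assumes the population standard deviation (`statistics.pstdev`) and does not guard against constant features with zero deviation; `fit_standardizer` and `standardize` are hypothetical names.

```python
from statistics import fmean, pstdev

def fit_standardizer(normal_rows):
    """Compute per-feature mean and standard deviation from the normal feature data set."""
    columns = list(zip(*normal_rows))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply (x - mean) / std feature by feature, using the normal-data statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Assumed toy data: two normal samples, one fault sample, two features.
normal_rows = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardizer(normal_rows)
scaled_fault = standardize([[5.0, 20.0]], means, stds)
```

Because the statistics come from the normal data only, fault samples keep their offset from normal behaviour after scaling.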

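2) Assuming that each feature data obeys a different Gaussian distribution, a generalized Gaussian distribution is estimated for the corresponding feature data $x_j^k$ of each class of data. The distribution of each feature data can be approximated by a Gaussian density function:

$$p(x; \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right)$$

where $\Gamma(\cdot)$ is the Gamma function. Different parameters $\alpha$ and $\beta$ represent different Gaussian density functions, that is, different Gaussian distributions. The invention therefore specifies that the Gaussian distribution of the j-th feature data of the k-th class is determined by the parameters $\alpha_j^k$ and $\beta_j^k$, which are obtained by maximum likelihood estimation of the Gaussian density function. For samples $x_1, \dots, x_N$ of one feature with log-likelihood $L$, the maximum likelihood estimation is calculated as follows:

$$\frac{\partial L}{\partial \alpha} = -\frac{N}{\alpha} + \sum_{i=1}^{N} \frac{\beta\,|x_i|^{\beta}\,\alpha^{-\beta}}{\alpha} = 0$$

$$\frac{\partial L}{\partial \beta} = \frac{N}{\beta} + \frac{N\,\Psi(1/\beta)}{\beta^{2}} - \sum_{i=1}^{N} \left(\frac{|x_i|}{\alpha}\right)^{\beta} \ln\frac{|x_i|}{\alpha} = 0$$

where $\Psi(z) = \Gamma'(z)/\Gamma(z)$. The estimate $\hat{\beta}$ is determined first from the above formulas: it is initialized with a moment-method estimate $\beta_0$ and then iterated with Newton's method,

$$\beta_{t+1} = \beta_t - \frac{f(\beta_t)}{f'(\beta_t)}$$

to obtain $\hat{\beta}$. $f(\beta)$ is calculated as follows:

$$f(\beta) = 1 + \frac{\Psi(1/\beta)}{\beta} - \frac{\sum_{i=1}^{N} |x_i|^{\beta} \ln|x_i|}{\sum_{i=1}^{N} |x_i|^{\beta}} + \frac{1}{\beta}\ln\left(\frac{\beta}{N}\sum_{i=1}^{N} |x_i|^{\beta}\right)$$

After $\hat{\beta}$ is obtained, $\hat{\alpha}$ is calculated as follows:

$$\hat{\alpha} = \left(\frac{\hat{\beta}}{N}\sum_{i=1}^{N} |x_i|^{\hat{\beta}}\right)^{1/\hat{\beta}}$$

Once $\hat{\alpha}_j^k$ and $\hat{\beta}_j^k$ are obtained, the Gaussian distribution of the j-th feature data of the k-th class is obtained.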
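The estimation procedure described in this step (moment-method initialization, Newton iteration for the shape parameter, then the scale parameter) can be sketched in Python as follows. The sketch assumes the common zero-mean, two-parameter generalized Gaussian form with scale `alpha` and shape `beta`. It approximates the derivative in the Newton update by a central difference rather than an analytic expression, and it implements the digamma function with a standard recurrence plus asymptotic series so that only the standard library is needed. All function names are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def moment_initial_beta(abs_x):
    """Match (E|x|)^2 / E[x^2] to Gamma(2/b)^2 / (Gamma(1/b) Gamma(3/b)) by bisection."""
    m1 = sum(abs_x) / len(abs_x)
    m2 = sum(a * a for a in abs_x) / len(abs_x)
    target = math.log(m1 * m1 / m2)

    def log_ratio(b):
        return 2 * math.lgamma(2 / b) - math.lgamma(1 / b) - math.lgamma(3 / b)

    lo, hi = 0.1, 10.0  # assumed search range for the shape parameter
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if log_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def likelihood_equation(beta, abs_x):
    """f(beta); its root is the maximum likelihood estimate of the shape parameter."""
    s = sum(a ** beta for a in abs_x)
    s_log = sum(a ** beta * math.log(a) for a in abs_x)
    return (1.0 + digamma(1.0 / beta) / beta - s_log / s
            + math.log(beta * s / len(abs_x)) / beta)

def fit_generalized_gaussian(samples, iterations=50, tol=1e-8, h=1e-5):
    """Return (alpha, beta) estimated from samples; exact zeros are skipped (log undefined)."""
    abs_x = [abs(v) for v in samples if v != 0.0]
    beta = moment_initial_beta(abs_x)
    for _ in range(iterations):
        f = likelihood_equation(beta, abs_x)
        slope = (likelihood_equation(beta + h, abs_x)
                 - likelihood_equation(beta - h, abs_x)) / (2 * h)
        step = f / slope
        beta = max(beta - step, 0.05)  # keep the shape parameter positive
        if abs(step) < tol:
            break
    alpha = (beta / len(abs_x) * sum(a ** beta for a in abs_x)) ** (1.0 / beta)
    return alpha, beta

# Sanity check on synthetic standard-normal data, for which beta = 2 and alpha = sqrt(2).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(4000)]
alpha_hat, beta_hat = fit_generalized_gaussian(data)
```

For standard-normal samples the estimates should land near a shape of 2 and a scale of about 1.414, which gives a quick check that the iteration behaves as intended.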

3) The KL divergence measures the distance between two distributions P and Q, with densities $p(x)$ and $q(x)$, and is calculated as:

$$D(P \,\|\, Q) = \int p(x) \ln\frac{p(x)}{q(x)}\,dx$$

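Substituting the Gaussian density function of feature j into the KL distance therefore gives the KL distance between the corresponding feature data of feature j under the normal condition and under the k-th fault condition:

$$D_j^k = \ln\left(\frac{\beta_j^0\,\alpha_j^k\,\Gamma(1/\beta_j^k)}{\beta_j^k\,\alpha_j^0\,\Gamma(1/\beta_j^0)}\right) + \left(\frac{\alpha_j^0}{\alpha_j^k}\right)^{\beta_j^k} \frac{\Gamma\left((\beta_j^k + 1)/\beta_j^0\right)}{\Gamma(1/\beta_j^0)} - \frac{1}{\beta_j^0}$$

where $(\alpha_j^0, \beta_j^0)$ and $(\alpha_j^k, \beta_j^k)$ are the parameters estimated for feature j under normal operation and under the k-th fault, respectively.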
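For reference, the closed-form KL divergence between two zero-mean generalized Gaussian densities can be evaluated with the standard library alone. The sketch below assumes the two-parameter form with scale `alpha` and shape `beta`; which distribution plays the role of P is left to the caller, and the name `ggd_kl` is hypothetical.

```python
import math

def ggd_kl(alpha_p, beta_p, alpha_q, beta_q):
    """KL(P || Q) between zero-mean generalized Gaussians (scale alpha, shape beta)."""
    log_term = (math.log(beta_p * alpha_q) - math.log(beta_q * alpha_p)
                + math.lgamma(1.0 / beta_q) - math.lgamma(1.0 / beta_p))
    ratio_term = ((alpha_p / alpha_q) ** beta_q
                  * math.exp(math.lgamma((beta_q + 1.0) / beta_p)
                             - math.lgamma(1.0 / beta_p)))
    return log_term + ratio_term - 1.0 / beta_p

# With beta = 2 and alpha = sigma * sqrt(2) the density is an ordinary Gaussian,
# so the result can be compared with the Gaussian closed form.
kl_gauss = ggd_kl(math.sqrt(2.0), 2.0, 2.0 * math.sqrt(2.0), 2.0)
```

Working with `math.lgamma` instead of `math.gamma` avoids overflow when the shape parameter is small.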

4) For the same fault type k, the KL distances corresponding to the different features j are denoted $D^k = \{D_1^k, D_2^k, \dots, D_m^k\}$. $D^k$ is sorted in descending order, and the correspondingly ordered feature set is denoted R.

5) Ten-fold cross validation is performed on the training data using support vector machine classification, and the feature subset F with the best classification performance is taken from R; F is the optimal feature set.
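One plausible way to take the best subset F from the ranked set R is to score the first 1, 2, ..., m ranked features and keep the best-scoring prefix. The patent does not spell out the search strategy, so the prefix search below is an assumption. The scoring callable `score_subset` stands in for ten-fold SVM cross validation, and the toy scores are invented purely to make the example runnable.

```python
def best_prefix(ranked_features, score_subset):
    """Score each prefix of the ranked feature list and return the best one.

    score_subset(features) -> validation accuracy; it stands in for running
    ten-fold SVM cross validation on the given feature subset.
    Ties keep the shorter prefix.
    """
    best_score, best_subset = float("-inf"), []
    for size in range(1, len(ranked_features) + 1):
        subset = ranked_features[:size]
        score = score_subset(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Assumed ranking R and made-up accuracies indexed by subset size.
R = [1, 3, 0, 2]
toy_scores = {1: 0.80, 2: 0.92, 3: 0.92, 4: 0.85}
F, accuracy = best_prefix(R, lambda subset: toy_scores[len(subset)])
```

Keeping the shorter prefix on ties favours smaller feature sets when accuracy is equal.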

After the optimal feature set is obtained, the present embodiment can be examined based on the test set. Specifically, the method comprises the following steps:

1) Test data of the industrial process, $X^t = \{x_i^t\}_{i=1}^{N_t}$ with $x_i^t \in \mathbb{R}^m$ (m is the number of features), are collected in real time. Here there are 960 test samples and m = 52 features. The test data are standardized; the corresponding standardization formula is:

$$\tilde{x}^t_{ij} = \frac{x^t_{ij} - \mu_j}{\sigma_j}$$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the j-th feature data used for standardization in the training stage.

2) Features of the test data are selected according to the obtained optimal feature set F to form the input data, which are classified with a support vector machine; the output indicates whether the test sample is faulty and, if so, which fault it belongs to.

Following the KL-distance-based feature selection method provided by the embodiment of the invention, the normal feature data set and the fault feature data sets of the TEP are used as training data, and the fault test data sets of the TEP are tested. The experiments show that combining the KL-distance-based feature selection method with the support vector machine (KL-FS-SVM) improves on the fault diagnosis results of the conventional support vector machine (SVM), and that, for complex process data, the KL-distance-based feature selection method outperforms conventional feature selection methods such as Fscore and Relief. Table 1 lists the fault diagnosis results for different faults. For "fault 20", the detection rates of Fscore-SVM and Relief-SVM are lower than that of the SVM, which suggests that many features contribute to this fault and that they differ little from one another. The proposed KL-distance-based method, selecting the first 47 features, obtains a detection rate slightly higher than that of the SVM without feature selection (80.75% versus 80.63%), which indicates the effectiveness of the feature selection method proposed in the embodiment of the present invention. With the KL-FS-SVM fault diagnosis model, the detection rates of "fault 11" and "fault 21" are improved from 83.50% to 87.88% and from 12.88% to 100%, respectively.

Table 1

Fault type	SVM	Fscore-SVM	Relief-SVM	KL-FS-SVM
Fault 11	83.50%	76.88%	84.75%	87.88%
Fault 20	80.63%	79.00%	78.88%	80.75%
Fault 21	12.88%	13.38%	42.13%	100%

Table 2

Fault type	Fscore	Relief	KL-FS
Fault 11	43	15	2
Fault 20	27	29	47
Fault 21	33	35	1

In addition to the fault detection rate, Table 2 compares the numbers of features in the optimal feature sets obtained by the Fscore, Relief and KL-FS feature selection methods. KL-FS selects the fewest features for "fault 11" (2 features) and "fault 21" (1 feature), while for "fault 20" it selects 47. KL-FS also improves the diagnostic result of the SVM, most markedly for "fault 21", as shown in FIG. 2a and FIG. 2b. FIG. 2a is a schematic diagram of the monitoring result of "fault 21" when the SVM is used, and FIG. 2b is a schematic diagram of the monitoring result of "fault 21" when KL-FS-SVM is used.

Correspondingly, the embodiment of the present invention further discloses a fault diagnosis system, as shown in fig. 3, the system includes:

the characteristic data acquisition module 11 is configured to acquire characteristic data sets corresponding to the initial characteristic set of the target device during normal operation and fault operation, respectively, to obtain training data including a normal characteristic data set and a fault characteristic data set; wherein the initial set of features comprises a plurality of features;

a KL distance calculating module 12, configured to calculate a corresponding KL distance between each feature data in the normal feature data set and a corresponding feature data in the fault feature data set, respectively, to obtain a KL distance set;

a cross validation module 13, configured to perform cross validation on the training data by using a support vector machine to obtain a validation result;

the characteristic determining module 14 is configured to determine characteristics related to fault operation from the initial characteristic set according to the verification result and the size of the KL distance in the KL distance set, so as to obtain an optimal characteristic set;

and the fault diagnosis module 15 is configured to determine, when data to be diagnosed of the target device is obtained, feature data corresponding to the optimal feature set from the data to be diagnosed, and then perform corresponding fault diagnosis on the target device by using the feature data.

For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

Furthermore, the invention also discloses a fault diagnosis device, which comprises a processor and a memory; wherein the processor implements the fault diagnosis method disclosed above when executing the fault diagnosis program stored in the memory. For more specific steps of the method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

Further, the present invention also discloses a computer readable storage medium for storing a fault diagnosis program; wherein the fault diagnosis program implements the fault diagnosis method disclosed above when executed by a processor. For more specific steps of the method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The fault diagnosis method, system, device and storage medium provided by the present invention are described in detail above, and the principle and the implementation of the present invention are explained in this document by applying specific examples, and the description of the above examples is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A fault diagnosis method, comprising:

when the data to be diagnosed of the target equipment is obtained, determining feature data corresponding to the optimal feature set from the data to be diagnosed, and then performing corresponding fault diagnosis on the target equipment by using the feature data;

wherein the step of calculating the KL distance between each feature data in the normal feature data set and the corresponding feature data in the fault feature data set respectively includes:

respectively calculating the KL distance between the Gaussian distribution corresponding to each feature data in the normal feature data set and the Gaussian distribution corresponding to the corresponding feature data in the fault feature data set;

the step of determining the features related to the fault operation from the initial feature set according to the verification result and the size of the KL distance in the KL distance set to obtain an optimal feature set includes:

2. The method according to claim 1, wherein the step of determining the gaussian distribution corresponding to each of the normal characteristic data set and the fault characteristic data set respectively comprises:

respectively carrying out standardization processing on each feature data in the normal feature data set and the fault feature data set, and then respectively determining Gaussian distribution corresponding to each standardized feature data in the normal feature data set and the fault feature data set.

3. The method of fault diagnosis according to claim 1, wherein said step of cross-validating on said training data using support vector machine classification comprises:

4. The fault diagnosis method according to claim 1, wherein the step of screening KL distances satisfying a preset condition from the KL distance set includes:

and screening out KL distances in a preset number from the sorted distance set.

5. The fault diagnosis method according to claim 1, wherein the step of screening KL distances satisfying a preset condition from the KL distance set includes:

6. A fault diagnosis system, comprising:

the fault diagnosis module is used for determining feature data corresponding to the optimal feature set from the data to be diagnosed when the data to be diagnosed of the target equipment is obtained, and then performing corresponding fault diagnosis on the target equipment by using the feature data;

the KL distance calculation module is further configured to determine the Gaussian distribution corresponding to each feature data in the normal feature data set and the fault feature data set respectively, and to calculate, respectively, the KL distance between the Gaussian distribution corresponding to each feature data in the normal feature data set and the Gaussian distribution corresponding to the corresponding feature data in the fault feature data set;

the characteristic determination module is further used for screening KL distances meeting preset conditions from the KL distance set to obtain a target KL distance; screening the characteristics corresponding to the target KL distance from the initial characteristic set to obtain target characteristics; and determining the characteristics related to the fault operation from the target characteristics according to the verification result to obtain an optimal characteristic set.

7. A fault diagnosis apparatus comprising a processor and a memory; wherein the processor implements the fault diagnosis method according to any one of claims 1 to 5 when executing the fault diagnosis program stored in the memory.

8. A computer-readable storage medium for storing a fault diagnosis program; wherein the fault diagnosis program implements the fault diagnosis method according to any one of claims 1 to 5 when executed by a processor.