CN115081468A

CN115081468A - Multi-task convolutional neural network fault diagnosis method based on knowledge migration

Info

Publication number: CN115081468A
Application number: CN202110276577.5A
Authority: CN
Inventors: 刘若楠; 蒲宇胜; 王煜; 胡清华
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-03-15
Filing date: 2021-03-15
Publication date: 2022-09-20

Abstract

The invention discloses a multi-task convolutional neural network fault diagnosis method based on knowledge migration, which comprises the following steps: step 1: preprocessing data; step 2: learning from a coarse structure to a fine structure; and step 3: multi-task knowledge migration from coarse to fine. The invention shows significant advantages in large-scale fault diagnosis, because a good initialization of the CNN parameters obtained from the coarse-grained task can effectively avoid poor local minima. Meanwhile, effective discriminative information is retained and passed to the fine-grained task to realize effective fault identification. Compared with a flat CNN, the PKT-MCNN provided by the application converges to a better local minimum, which verifies the significant influence of progressive knowledge transfer on CNN learning.

Description

Multi-task convolutional neural network fault diagnosis method based on knowledge migration

Technical Field

The invention relates to the technical field of intelligent fault diagnosis, in particular to a multi-task convolutional neural network fault diagnosis method based on knowledge migration.

Background

Fault diagnosis is becoming increasingly important in modern society as an effective tool to maintain safe operation of industrial systems and reduce maintenance costs of unnecessary routine shutdowns. Therefore, various diagnostic methods have been proposed to detect faults early and accurately. In recent years, with the development of sensors and information technology, industry data is rapidly accumulated, and Deep Learning (DL) based diagnostic methods are being promoted. Based on a deep framework and a plurality of nonlinear layers, the DL algorithm can adaptively learn high-level representation characteristics, and the inherent defects of the traditional diagnosis method are overcome.

Most deep-learning-based methods work well when the number of fault types is small, but fail to converge to satisfactory results in large-scale fault diagnosis, because a large number of fault types can lead to inter-class distance imbalance and to local minima in the neural network. In a large-scale diagnosis task covering many fault types, DL-based methods have several drawbacks. First, a DL-based method tends to fall into a local minimum under random initialization, and handling a large number of fault types aggravates the problem because the number of parameters grows and the structure becomes more complicated. Second, as the label space grows, the upper bound of the generalization error increases, which degrades the final diagnostic performance. Third, from the perspective of algorithm design, more fault categories can cause an imbalance of intra-class and inter-class distances, that is, large intra-class distances together with small inter-class distances. For example, similar faults of one component lie close together and are difficult to distinguish, while faults of different subsystems lie far apart and are easy to classify. In such large-scale fault diagnosis tasks, the inter-class distance between some similar faults may even be smaller than the intra-class distance of other faults, which makes conventional deep learning methods difficult to apply. Large-scale fault diagnosis of complex systems with many fault types has therefore remained a difficult problem.

Disclosure of Invention

The application provides a multi-task convolutional neural network fault diagnosis method based on knowledge migration, which overcomes the current shortcomings of fault diagnosis for large complex systems: using known fault information, it learns fault information into the network from coarse to fine by means of transfer learning, and thereby obtains better performance.

The invention provides a multi-task convolutional neural network fault diagnosis method based on knowledge migration,

the method comprises the following steps:

step 1: preprocessing data;

step 2: learning from a coarse structure to a fine structure;

and step 3: multi-task knowledge migration from coarse to fine.

Preferably, step 1 specifically comprises: processing the input sample signal to obtain an input sample matrix.

Preferably, step 2 specifically comprises:

step 21: constructing a similarity graph of the fault types;

step 22: spectral clustering of the fault types;

step 23: outputting the knowledge structure.

Preferably, steps 21 and 22 specifically include: clustering similar fault types to form coarse-grained knowledge.

Preferably, step 23 specifically includes:

step 231: training a neural network for extracting coarse-grained knowledge;

step 232: extracting coarse-grained knowledge;

step 233: obtaining a network with coarse-grained knowledge extracted;

step 234: transferring the knowledge of the network to a fine-grained network through transfer learning.

Preferably, step 3 specifically comprises:

step 31: constructing a multi-task convolutional neural network based on progressive knowledge transfer (PKT-MCNN);

step 32: training the PKT-MCNN model by using progressive knowledge transfer;

step 33: testing a new sample by using the trained PKT-MCNN model;

step 34: outputting the fault type of the test sample.

Preferably, in step 3, the PKT-MCNN training process includes three stages:

(1) training a coarse-grained task to master coarse-grained knowledge;

(2) simultaneously training coarse-grained and fine-grained tasks in a multi-task mode;

(3) training the fine-grained task alone by fine-tuning the parameters updated in stage (2).

Preferably, in step 3, the PKT algorithm is embedded on top of the multi-task CNN.

PKT adjusts the attention given to the two tasks, namely the coarse-grained task and the fine-grained task, through a weight parameter λ.

Specifically,

in the first stage, λ is set to 1 so that only the coarse-grained task is learned; because the weight of the fine-grained task is 0, its loss is not back-propagated;

in the second stage, λ is gradually reduced to 0 according to the number of training epochs, so that the two tasks are learned simultaneously and coarse-grained knowledge is gradually transferred to the fine-grained task; the second-stage λ, denoted λ₂, is calculated from the following equation:

where B1 and B3 are the numbers of training epochs of the first stage and the second stage respectively, b is the current epoch number, and Bmax is the maximum epoch number;

finally, the value of λ at the third stage is opposite to the value of λ at the first stage, and λ is set to 0 to fine tune the fine-grained task without updating the parameters of the coarse-grained task.
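
The three-stage behaviour of λ described above can be sketched as a small schedule function. The equation for λ₂ is not reproduced in this text, so the linear decay used for the second stage is purely an assumption for illustration, and the names `pkt_lambda`, `stage1_epochs` and `stage2_epochs` are hypothetical.

```python
def pkt_lambda(b, stage1_epochs, stage2_epochs):
    """Weight of the coarse-grained task at epoch b (1-based).

    Stage 1: lambda = 1 (only the coarse-grained task is learned).
    Stage 2: lambda decays towards 0 (ASSUMED linear; the source formula is missing).
    Stage 3: lambda = 0 (only the fine-grained task is fine-tuned).
    """
    if b <= stage1_epochs:
        return 1.0
    if b <= stage1_epochs + stage2_epochs:
        return 1.0 - (b - stage1_epochs) / stage2_epochs
    return 0.0


# Two epochs of stage 1, four of stage 2, then stage 3.
schedule = [pkt_lambda(b, 2, 4) for b in range(1, 9)]
print(schedule)  # [1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

Any monotonically decreasing function that starts near 1 and reaches 0 at the end of the second stage would fit the description equally well.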

Compared with the prior art, the invention has the following advantages:

The invention shows significant advantages in large-scale fault diagnosis, because a good initialization of the CNN parameters obtained from the coarse-grained task can effectively avoid poor local minima. Meanwhile, effective discriminative information is retained and passed to the fine-grained task to realize effective fault identification. Compared with a flat CNN, the PKT-MCNN provided by the application converges to a better local minimum, which verifies the significant influence of progressive knowledge transfer on CNN learning.

The method can intelligently learn a reasonable knowledge structure consistent with the physical composition of the nuclear power system, is very good at handling large-scale fault diagnosis under a special physical background, and extracts and transfers coarse-grained knowledge to a final fine-grained diagnosis task. The method provided by the application can become a promising tool in future industrial big data analysis research.

Drawings

FIG. 1 is a flow chart of a method of the present application;

FIG. 2 is a flow chart of a coarse-to-fine structure learning method of the present application;

FIG. 3 is a diagram of the PKT-MCNN model architecture of the present application.

Detailed Description

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, a method for diagnosing a fault of a multitask convolutional neural network based on knowledge migration according to an embodiment of the present invention includes the following steps:

step 1: preprocessing data;

The input signal is the raw signal collected from the equipment, usually vibration data. It is processed by noise reduction, spectrum extraction, and similar means to obtain a matrix that is fed to the subsequent algorithm; this matrix is called the input sample matrix.
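
As a rough illustration of this step, the sketch below turns a one-dimensional vibration signal into an input sample matrix. The patent does not specify the filter, the frame length, or the spectral transform, so the moving-average filter, the non-overlapping framing, and the FFT magnitude used here are all assumptions, and `to_sample_matrix` is a hypothetical helper name.

```python
import numpy as np


def to_sample_matrix(signal, frame_len=256, smooth=5):
    """Turn a 1-D raw signal into a 2-D input sample matrix (illustrative only)."""
    signal = np.asarray(signal, dtype=float)
    # Noise reduction: simple moving-average filter (an assumed choice).
    kernel = np.ones(smooth) / smooth
    denoised = np.convolve(signal, kernel, mode="same")
    # Split into non-overlapping frames, dropping the incomplete tail.
    n_frames = len(denoised) // frame_len
    frames = denoised[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Spectrum extraction: magnitude of the one-sided FFT of each frame.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Scale to [0, 1] so that different samples are comparable.
    return spectrum / (spectrum.max() + 1e-12)


# Synthetic "vibration" signal: a 50 Hz tone plus noise.
t = np.arange(4096) / 4096.0
raw = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.default_rng(0).standard_normal(4096)
X = to_sample_matrix(raw)
print(X.shape)  # (16, 129)
```

Each row of `X` is the spectrum of one frame, so the matrix can be passed to a CNN as a single-channel image.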

Step 2: learning from a coarse structure to a fine structure;

as shown in fig. 2, the method specifically comprises the following steps:

step 21: constructing a similarity graph of the fault types;

in the present embodiment, the fault type similarity graph is constructed as follows:

to automatically extract the coarse-grained knowledge structure, the information in the data is used to group similar fault types into one common superior node of the structure, which represents the coarse-grained concept of a fault.

Given the $j$-th sample of the $i$-th fault type, the similarity graph $G = (V, E)$ of the fault types contains the similarity information for every pair of fault types, where each vertex $v_i \in V$ is a fault type and each edge $e_{ij} \in E$ is the similarity between the two connected vertices $v_i$ and $v_j$. To construct this graph, the similarity of each pair of fault types must be calculated. Unlike instance-level clustering, each fault type needs to be represented before the similarity is calculated. In this work, each fault type is vectorized and represented by the centroid of all its samples. Mathematically, the vertex $v_i$ in the similarity graph is expressed as

where $M_i$ is the number of samples of the $i$-th fault. This representation has two advantages for large-scale fault diagnosis. On the one hand, for many fault samples it is efficient to compute the first-order statistic, i.e. the mean vector. On the other hand, representing a fault type by its mean vector can mitigate the adverse effects of the noisy samples that are typically present in large data sets. With this vertex representation, the similarity $e_{ij}$ between two fault types $i$ and $j$ can be calculated through the Gaussian similarity. The formula is as follows:
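
The two equations referred to in this passage are not reproduced in this text. A reconstruction consistent with the surrounding definitions is sketched here; the sample symbol $x_i^{j}$ and the exact form of the exponent (with $2\sigma^{2}$ in the denominator) are assumptions rather than the patent's own notation:

```latex
v_i = \frac{1}{M_i} \sum_{j=1}^{M_i} x_i^{j},
\qquad
e_{ij} = \exp\!\left( -\frac{\lVert v_i - v_j \rVert^{2}}{2\sigma^{2}} \right)
```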

where $\sigma$ is a scaling factor that scales the value of $e_{ij}$ to $[0, 1]$.
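
The construction of the similarity graph can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Gaussian kernel is taken as exp(-||v_i - v_j||^2 / (2σ^2)), which is one common form and may differ from the patent's missing formula, and `similarity_graph` is a hypothetical name.

```python
import numpy as np


def similarity_graph(samples, labels, sigma=1.0):
    """Return fault types, their centroids, and the pairwise similarity matrix G."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    fault_types = np.unique(labels)
    # Vertex v_i: mean vector (centroid) of all samples of fault type i.
    centroids = np.stack([samples[labels == f].mean(axis=0) for f in fault_types])
    # Squared Euclidean distance between every pair of centroids.
    diff = centroids[:, None, :] - centroids[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Edge e_ij: Gaussian similarity in [0, 1] (ASSUMED kernel form).
    G = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return fault_types, centroids, G


# Four toy fault types: 0 and 1 are close together, 2 and 3 are close together.
samples = [[0.0, 0.0], [0.2, 0.0], [0.4, 0.2], [0.6, 0.2],
           [5.0, 5.0], [5.2, 5.0], [5.4, 5.2], [5.6, 5.2]]
labels = [0, 0, 1, 1, 2, 2, 3, 3]
fault_types, centroids, G = similarity_graph(samples, labels)
print(np.round(G, 3))
```

In the toy data, the similarity between fault types 0 and 1 is close to 1, while the similarity between fault types 0 and 2 is close to 0, which is the structure the subsequent clustering step relies on.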

Step 22: spectral clustering of fault types;

Using the similarity graph $G$, the fault types are assigned to different coarse-grained fault concepts by a clustering technique. The method adopts the normalized cut (NCut) algorithm to cut the graph into several subgraphs, which can be formalized as optimizing the following objective function:

$$\min_{H} \operatorname{Tr}\left(H^{\top} L H\right) \quad \text{s.t.} \quad H^{\top} D H = I$$

where $k$ is the number of coarse-grained fault concepts to be clustered; $h_{ij}$ are the elements of the matrix $H$, with $h_{ij} = 1/|C_j|$ if $v_i \in C_j$ and $h_{ij} = 0$ otherwise; $i \in \{1, 2, \ldots, N\}$; $j \in \{1, 2, \ldots, k\}$; $L = D - G$ is the Laplacian matrix; $D$ is the degree matrix of $G$; $I$ is the identity matrix; and $|\cdot|$ is the cardinality of a set. Shi and Malik demonstrated that this objective can be approximated with the eigenvector of $L$ associated with the second smallest eigenvalue. Thus, $k$ coarse-grained fault concepts $C_1, C_2, \ldots, C_k$ are obtained and a two-level knowledge structure $T$ is constructed, in which each coarse-grained concept comprises several fine-grained fault types.

Through the above two steps, similar fault types are clustered to form coarse-grained knowledge.
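
For the simplest case of two coarse-grained concepts, the relaxed normalized cut can be sketched as follows. The sketch assumes the usual relaxation, in which the graph is split by the sign of the eigenvector with the second smallest eigenvalue of the generalized problem L y = μ D y; for more than two concepts, a common extension (not shown here) is to cluster the rows of the first k eigenvectors. The function name `ncut_bipartition` is hypothetical.

```python
import numpy as np


def ncut_bipartition(G):
    """Split the vertices of a similarity graph into two groups (relaxed NCut)."""
    G = np.asarray(G, dtype=float)
    d = G.sum(axis=1)                 # vertex degrees (diagonal of D)
    L = np.diag(d) - G                # Laplacian L = D - G
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian D^{-1/2} L D^{-1/2}.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map the second eigenvector back to the generalized problem L y = mu D y.
    y = d_inv_sqrt * eigvecs[:, 1]
    # The sign of y gives the two-way partition.
    return (y > 0).astype(int)


# Toy similarity matrix: fault types 0 and 1 are similar, as are 2 and 3.
G_toy = np.array([[1.00, 0.90, 0.01, 0.01],
                  [0.90, 1.00, 0.01, 0.01],
                  [0.01, 0.01, 1.00, 0.90],
                  [0.01, 0.01, 0.90, 1.00]])
clusters = ncut_bipartition(G_toy)
print(clusters)
```

The sign of an eigenvector is arbitrary, so which group is labelled 0 and which is labelled 1 may differ between runs or platforms; only the grouping itself is meaningful.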

Step 23: outputting the knowledge structure.

Wherein, step 23 specifically comprises the following steps:

step 231: training a neural network for extracting coarse-grained knowledge;

step 232: extracting coarse-grained knowledge;

step 233: obtaining a network with coarse-grained knowledge extracted;
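
One way to picture the two-level knowledge structure that these steps rely on is as a mapping from each coarse-grained concept to its fine-grained fault types, together with the inverse mapping that supplies a coarse label for every sample. The sketch below assumes that representation; the fault names and the helper `build_knowledge_structure` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_knowledge_structure(fault_types, cluster_ids):
    """Build the two-level structure T and the fine-to-coarse label map."""
    structure = defaultdict(list)
    for fault, cluster in zip(fault_types, cluster_ids):
        structure[int(cluster)].append(fault)
    # Inverse map: used to derive the coarse label of each training sample.
    fine_to_coarse = {fault: coarse
                      for coarse, faults in structure.items()
                      for fault in faults}
    return dict(structure), fine_to_coarse


# Hypothetical fault names and cluster assignments from the clustering step.
T, fine_to_coarse = build_knowledge_structure(
    ["pump_leak", "pump_wear", "valve_stuck", "valve_drift"], [0, 0, 1, 1])
print(T)
print(fine_to_coarse["valve_drift"])
```

With such a map, every training sample carries both a fine-grained label and a coarse-grained label, which is what a network with a coarse-grained task and a fine-grained task needs.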

Step 3: multi-task knowledge migration from coarse to fine.

The method specifically comprises the following steps:

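step 31: constructing a multi-task convolutional neural network based on progressive knowledge transfer (PKT-MCNN);

step 32: training the PKT-MCNN model by using progressive knowledge transfer;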

step 33: testing a new sample by using the trained PKT-MCNN model;

step 34: outputting the fault type of the test sample.

In a preferred embodiment, FIG. 3 shows the architecture of the PKT-MCNN model of the present application. In step 3, the training process of the PKT-MCNN includes three stages:

(1) training a coarse-grained task to master coarse-grained knowledge;

(2) simultaneously training coarse-grained and fine-grained tasks in a multi-task mode;

(3) fine-grained tasks are trained individually by fine-tuning the parameters updated in stage (2).

PKT controls and switches between the stages through the weights of the loss layer; each weight represents the attention currently given to the corresponding task. The loss function used here is the cross-entropy loss, denoted

where $r_i$ is 0 or 1, corresponding to the true label; $p_i(\cdot)$ produces the predicted score and is a function of the learnable parameters $W_f$. Therefore, the loss function $L_M$ of the PKT-MCNN is defined as a weighted combination of the coarse-grained and fine-grained task losses:

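PKT adjusts the attention given to the two tasks, namely the coarse-grained task and the fine-grained task, through a weight parameter λ.

Specifically,

in the first stage, λ is set to 1 so that only the coarse-grained task is learned; because the weight of the fine-grained task is 0, its loss is not back-propagated;

in the second stage, λ is gradually reduced to 0 according to the number of training epochs, so that the two tasks are learned simultaneously and coarse-grained knowledge is gradually transferred to the fine-grained task; the second-stage λ, denoted λ₂, is calculated from the following equation: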

wherein, B1 and B3 are respectively the training epoch numbers of the first stage and the second stage; b is the current epoch number, Bmax is the maximum epoch number;
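
The weighted combination of the two task losses can be sketched as below. The loss equations are not reproduced in this text, so the form λ · L_coarse + (1 − λ) · L_fine is an assumption inferred from the description (λ = 1 trains only the coarse-grained task, λ = 0 only the fine-grained task); averaging over the batch and the names `cross_entropy` and `pkt_loss` are likewise assumptions.

```python
import numpy as np


def cross_entropy(probs, onehot):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    onehot = np.asarray(onehot, dtype=float)
    return float(-(onehot * np.log(probs)).sum(axis=1).mean())


def pkt_loss(coarse_probs, coarse_onehot, fine_probs, fine_onehot, lam):
    """ASSUMED combination: lam weights the coarse task, (1 - lam) the fine task."""
    coarse = cross_entropy(coarse_probs, coarse_onehot)
    fine = cross_entropy(fine_probs, fine_onehot)
    return lam * coarse + (1.0 - lam) * fine


# One sample, two coarse-grained concepts and four fine-grained fault types.
coarse_p, coarse_y = [[0.5, 0.5]], [[1, 0]]
fine_p, fine_y = [[0.25, 0.25, 0.25, 0.25]], [[0, 1, 0, 0]]
for lam in (1.0, 0.5, 0.0):
    print(lam, round(pkt_loss(coarse_p, coarse_y, fine_p, fine_y, lam), 4))
```

With λ = 1 the result equals the coarse-grained loss alone and with λ = 0 the fine-grained loss alone, matching the first and third training stages.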

The technical means not described in detail in the present application are known techniques.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A multi-task convolutional neural network fault diagnosis method based on knowledge migration is characterized in that,

the method comprises the following steps:

step 1: preprocessing data;

step 2: learning from a coarse structure to a fine structure;

and step 3: multi-task knowledge migration from coarse to fine.

2. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 1,

the step 1 specifically comprises the following steps: processing the input sample signal to obtain an input sample matrix.

3. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 1,

the step 2 specifically comprises the following steps:

step 21: constructing a similarity graph of the fault types;

step 22: spectral clustering of the fault types;

step 23: outputting the knowledge structure.

4. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 3,

the steps 21 and 22 specifically include: clustering similar fault types to form coarse-grained knowledge.

5. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 3,

step 23 specifically includes:

step 231: training a neural network for extracting coarse-grained knowledge;

step 232: extracting coarse-grained knowledge;

step 233: obtaining a network with coarse-grained knowledge extracted;

6. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 1,

the step 3 specifically comprises the following steps:

step 32: training the PKT-MCNN model by using progressive knowledge transfer;

step 33: testing a new sample by using the trained PKT-MCNN model;

step 34: outputting the fault type of the test sample.

7. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 6,

in step 3, the PKT-MCNN training process includes three stages:

(1) training a coarse-grained task to master coarse-grained knowledge;

8. The knowledge migration based multitask convolutional neural network fault diagnosis method of claim 6,

in step 3, the PKT algorithm is embedded on top of the multitasking CNN,

specifically,

in the second stage, λ is gradually reduced to 0 according to the number of training epochs, so that the two tasks are learned simultaneously and coarse-grained knowledge is gradually transferred to the fine-grained task; the second-stage λ, denoted λ₂, is calculated from the following equation: