CN114818996B - Method and system for mechanical fault diagnosis based on federated domain generalization

Info

Publication number: CN114818996B
Application number: CN202210738070.1A
Authority: CN
Inventors: 宋艳; 李沂滨; 贾磊; 崔明; 王代超
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-10-11
Anticipated expiration: 2042-06-28
Also published as: CN114818996A

Abstract

The invention discloses a method and a system for mechanical fault diagnosis based on federated domain generalization, relating to the technical field of fault diagnosis. In the first step, the central server randomly initializes a global model and sends it to all clients. In the second step, each client independently trains the model using its own training data set. In the third step, the models trained by all clients are sent to the server, which averages all model parameters to obtain a global model. In the fourth step, the clients and the central server cooperatively train the global model. In the testing stage, the server sends the global model to the client holding the target domain data to complete fault diagnosis. The invention exploits the inherent relationship between the labels and features of the source domain data, and completes the training of the global fault diagnosis model by weighting and aggregating, in the central server, the training losses and model parameters of the different client models.

Description

Method and system for mechanical fault diagnosis based on federated domain generalization

Technical Field

The invention relates to the technical field of fault diagnosis, and in particular to a method and a system for mechanical fault diagnosis based on federated domain generalization.

Background

Mechanical fault data usually come from different types of equipment, different working conditions, or different operating environments. A fault diagnosis model trained jointly on such data tends to predict new data with low accuracy and poor generalization. Domain generalization and domain adaptation, two transfer learning approaches, address this domain shift problem by aligning the feature spaces of the data. In both approaches, labeled data are generally called source domain data, and the unlabeled data to be predicted are called target domain data. Domain adaptation achieves fault prediction on the target domain data by aligning the feature spaces of the source domain data and the target domain data. Unlike domain adaptation, domain generalization has only source domain data and no target domain data; it achieves domain transfer by exploiting the internal relationship between data features and labels in the source domain.

Some prior work has studied fault diagnosis based on domain generalization. The document [Deep Domain Generalization Combining A Priori Diagnosis Knowledge Toward Cross-Domain Fault Diagnosis of Rolling Bearing] proposes a rolling bearing fault diagnosis method based on domain generalization. The method eliminates potential differences among multiple domains when the target domain contains only healthy samples, and achieves efficient fault diagnosis. The document [Conditional Adversarial Domain Generalization With a Single Discriminator for Bearing Fault Diagnosis] proposes a conditional adversarial domain generalization method with a single discriminator, which aims to extract domain-invariant features from data collected under different working conditions and generalize them to new fault data. The document [Intelligent Fault Identification Based on Multisource Domain Generalization Towards Actual Diagnosis Scenario] proposes a novel intelligent fault identification method based on multiple source domains. The method uses local Fisher discriminant analysis to describe the discriminant structure of each source domain as a point on the Grassmann manifold. By preserving the local within-class structure, local Fisher discriminant analysis can learn an effective discriminant from multimodal fault data. The document [A New Multiple Source Domain Adaptation Fault Diagnosis Method Between Different Rotating Machines] proposes a multi-source domain adaptation fault diagnosis method. The method uses a multi-adversarial learning strategy to obtain a domain-aligned feature representation that remains discriminative for the target domain. The document [Deep Adversarial Domain Adaptation Model for Bearing Fault Diagnosis] proposes a deep adversarial domain adaptation model for rolling bearing fault diagnosis. The model constructs an adversarial domain adaptation network to address the inconsistent feature distributions of the source domain and the target domain.

As can be seen, data-driven fault diagnosis algorithms train the diagnostic model on a large amount of fault data. To guarantee the effectiveness of deep learning methods, as much fault data as possible must therefore be pooled. However, because of data security and privacy requirements, pooling the data of different clients is not allowed in most cases. Federated learning was proposed to use data jointly while keeping each client's data secure, and to solve the data silo problem in deep learning. In federated learning, the learning task is solved jointly by multiple participating devices (i.e., clients) under the coordination of a central server. From a theoretical perspective, researchers worldwide have studied common scientific problems in federated learning, such as non-independent and identically distributed (non-IID) data, unlabeled data, and security. From an application perspective, a great deal of research has examined how to combine federated learning with specific application scenarios, such as finance, medical care, robotics, and smart cities.

Federated learning uses the data of different clients to collaboratively train a model. However, because the devices of different clients differ in operating conditions or models, the data usually exhibit domain shift, so federated transfer learning has attracted growing attention from researchers. Zhang et al. [Federated Transfer Learning for Intelligent Fault Diagnostics Using Deep Adversarial Networks With Data Privacy] propose a federated transfer learning method for fault diagnosis that designs different network model structures for different clients. The document [Data privacy preserving federated transfer learning in machinery fault diagnostics using prior distributions] proposes a federated transfer learning method for mechanical fault diagnosis. The method indirectly addresses the domain shift problem by using prior distributions, and performs fault diagnosis by extracting domain-invariant features of different users.

Existing federated transfer learning fault diagnosis methods consider data security and the domain shift between the source domain and the target domain. However, they assume that target domain data exist and participate in training, and they consider neither the case where target domain data are unavailable nor the personalized training of models.

Disclosure of Invention

To address the above problems, the invention provides a method and a system for mechanical fault diagnosis based on federated domain generalization, which train a personalized fault diagnosis model for each client based on the differences among the source domain data.

In the invention, to ensure data security, fault data and fault features are shared neither among the clients nor between the clients and the central server. In addition, the method trains a global fault diagnosis model in the central server using partial model parameters and weighted losses of the different clients.

In order to achieve the purpose, the invention is realized by the following technical scheme:

the first aspect of the disclosure provides a mechanical fault diagnosis method based on federated domain generalization, which comprises the following steps:

the central server randomly initializes the global model and sends the global model to all the clients;

the client independently trains the model by using the training data set of the client;

sending the models trained by all the clients to a central server, and averaging all the model parameters in the central server to obtain a global model;

the central server sends the global model to all the clients, and the clients and the central server cooperate to train the global model;

and the central server sends the trained global model to a client containing target domain data to complete fault diagnosis.
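The parameter-averaging step listed above can be illustrated with a minimal Python sketch. It assumes that every client model has the same layer names and shapes and that each layer is stored as a flat list of floats; the `Params` layout and the layer names `feature.conv1` and `classifier.fc` are hypothetical and are not taken from the patent.

```python
from typing import Dict, List

# Hypothetical layout: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def average_parameters(client_params: List[Params]) -> Params:
    """Return the element-wise mean of the clients' parameters."""
    if not client_params:
        raise ValueError("need at least one client model")
    n = len(client_params)
    averaged: Params = {}
    for name in client_params[0]:
        # Group the same weight position across all clients, then average.
        columns = zip(*(params[name] for params in client_params))
        averaged[name] = [sum(column) / n for column in columns]
    return averaged


# Two toy client models with identical structure.
clients = [
    {"feature.conv1": [0.0, 2.0], "classifier.fc": [1.0]},
    {"feature.conv1": [2.0, 4.0], "classifier.fc": [3.0]},
]
global_params = average_parameters(clients)
```

The sketch uses an unweighted mean, matching the description of averaging all model parameters; weighting clients by data set size would be a different design choice that the text does not mention.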

Further, the central server sends the global model to all the clients, and the clients complete the following tasks:

calculating classification loss based on the classification loss function and sending the classification loss to a central server;

acquiring the output features of each client's feature extraction network, and sending the covariance matrix of the output features to the central server;

and calculating the invariant risk minimization loss based on the invariant risk minimization loss function and sending the invariant risk minimization loss to the central server.
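As an illustration of the second client task above, the following sketch computes a covariance matrix from a batch of feature vectors in plain Python. It assumes the unbiased estimator (division by n - 1) and flat feature vectors; the patent does not state which estimator is used, and `feature_covariance` is a hypothetical helper name.

```python
from typing import List


def feature_covariance(features: List[List[float]]) -> List[List[float]]:
    """Unbiased d x d covariance of n feature vectors, each of size d."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    # Mean of every feature dimension over the batch.
    means = [sum(row[c] for row in features) / n for c in range(d)]
    centered = [[row[c] - means[c] for c in range(d)] for row in features]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


# Three toy 2-dimensional feature vectors from one client.
batch = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
cov = feature_covariance(batch)
```

Only this d x d matrix would leave the client in such a scheme, not the feature vectors themselves, which is consistent with the statement that fault data and features are not shared.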

Further, the classification loss of the $k$-th client is denoted $L_c^k$.

The classification loss function is:

$$L_c^k = \ell\left(\hat{y}^k,\, y^k\right)$$

where $x^k$ is the training data of the $k$-th client, $y^k$ is the true label of the training data set of the $k$-th client, and $\hat{y}^k$ is the prediction result for $x^k$.

Further, the invariant risk minimization loss function is as follows:

$$\mathrm{IRM} = \frac{1}{b}\sum_{(i,j)} \nabla_{w}\,\ell\left(w \cdot f_i^k,\, y_i^k\right) \cdot \nabla_{w}\,\ell\left(w \cdot f_j^k,\, y_j^k\right)$$

where IRM is the invariant risk minimization loss; $\nabla$ denotes gradient computation; $b$ represents the number of $x_i$ and $x_j$; $x_i$ and $x_j$ are the input data; $y_i$ and $y_j$ are the input labels; $f_i$ is the feature output after $x_i$ passes through the feature extraction network $F$; $f_j$ is the feature output after $x_j$ passes through the feature extraction network $F$; $(f_i^k, y_i^k)$ is the $i$-th group of features and labels of the $k$-th client; $(f_j^k, y_j^k)$ is the $j$-th group of features and labels of the $k$-th client; $\ell$ is the classification loss function; $F$ is the feature extraction network; and $w$ is a scalar.

Further, the central server receives the classification losses and invariant risk minimization losses of all the clients and the covariance matrices of the features; the central server calculates the second-order statistical feature distance between the feature covariance matrices of every two clients, and obtains the feature distance metric loss based on the feature distance metric loss function.

Further, the feature distance metric loss function is as follows:

$$L_d = \frac{1}{4d^2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left\| C_i - C_j \right\|_F^2$$

where $L_d$ is the feature distance metric loss; $N$ represents the number of clients; $\|\cdot\|_F$ denotes the F-norm (Frobenius norm) of a matrix; $d$ is the size of the feature vector; $C$ is a covariance matrix; and $C_i$, $C_j$ represent the feature covariance matrices of any two clients.

Further, a global loss value is calculated at the central server based on the global loss function, and the global model of the central server is trained by back propagation based on the global loss value.

Further, the global loss function is as follows:

$$L = \sum_{k=1}^{N} w_k L_c^k + \mathrm{IRM} + L_d, \qquad w_k = \frac{L_c^k}{\sum_{m=1}^{N} L_c^m}$$

where $L_c^k$, IRM, and $L_d$ are the classification loss, the invariant risk minimization loss, and the feature distance metric loss, respectively; $k$ denotes the $k$-th client; $w_k$ is the weight of the classification loss of each client, where $k$ is an integer between 1 and $N$; $L_c^k$ denotes the loss value of the samples of the $k$-th source domain data set; and $N$ represents the number of clients.

A second aspect of the present disclosure provides a mechanical fault diagnosis system based on federated domain generalization, including:

a central server and a client; the central server comprises a global feature extraction network and a global classification network, and the central server simultaneously carries out information interaction with a plurality of clients; the central server is also used for initializing the global model.

Furthermore, each client comprises a feature extraction network and a classification network, wherein N clients comprise N source domain data sets, and the (N + 1) th client comprises a target domain data set.

The beneficial effects of the above-mentioned embodiment of the present invention are as follows:

The method adopts federated learning, so source domain data need not be disclosed to untrusted third parties; the privacy of the source domain data is protected, data security is guaranteed, and a global fault diagnosis model is trained on the source domain data of all clients. Compared with other domain adaptation methods, the federated domain generalization fault diagnosis method provided by the invention improves data security and fault diagnosis accuracy, improves the interpretability of the machine learning method, and fundamentally addresses the problem of cross-domain fault diagnosis.

The method jointly considers the inherent relationship between data features and labels, the spatial distance between client data features, and the transfer among different client models. It takes the intrinsic causal relationship between the fault features and fault types of each client as the training objective, and trains under a federated learning framework to obtain a fault diagnosis model with strong generalization capability.

The method provided by the invention does not share fault data or features. It reduces the differences between data of different domains through the training and transfer of part of the model parameters in each client, and adopts a model transfer strategy in the feature extraction layers of the client model, reducing the model training workload without affecting the generalization capability of the model.

The method provided by the invention quantifies the differences in feature space distance between different source domains, and achieves domain generalization by weighting the model losses of different clients.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention; the exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.

FIG. 1 is a structural relationship diagram of the central server and the clients in conventional federated learning;

FIG. 2 is a framework diagram of the federated domain generalization method of the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiment 1:

Let $N$ clients hold $N$ source domain data sets $\{D_S^k\}_{k=1}^{N}$, and let the $(N+1)$-th client hold a target domain data set $D_T = \{x_T^i\}_{i=1}^{n_T}$, where $n_T$ represents the total number of samples of the target domain data set, $x_T^i$ denotes the $i$-th sample of the target domain data set, and $i$ is an integer ranging from 1 to $n_T$. $D_S^k$ denotes the samples of the $k$-th source domain data set, where $k$ is an integer between 1 and $N$. The target domain data set does not participate in model training and is used only for testing; only the source domains participate in training.

Each client has a feature extraction network and a classification network. Let the set of $N$ feature extraction network models be $\{F_k\}_{k=1}^{N}$ and the set of $N$ classification network models be $\{C_k\}_{k=1}^{N}$. Let the global feature extraction network model in the central server be $F_G$ and the global classification network model be $C_G$.

Federated learning was proposed to use data jointly while keeping each client's data secure, and to solve the data silo problem in deep learning. The distributed architecture of the clients and the central server in conventional federated learning is shown in FIG. 1. In federated learning, the learning task is solved jointly by multiple participating devices (i.e., clients) under the coordination of a central server.

Existing federated transfer learning fault diagnosis methods consider data security and the domain shift between the source domain and the target domain. However, they assume that target domain data exist and participate in training, and they consider neither the case where target domain data are unavailable nor the personalized training of models. To address these problems, the invention trains a personalized fault diagnosis model for each client based on the differences among the source domain data. As shown in FIG. 2, it exploits the inherent relationship between the labels and features of the source domain data, and completes the training of the global model by performing, in the central server, weighted aggregation of the loss gradients over the training losses and model parameters of the different client models. To ensure data security, fault data and fault features are shared neither among the clients nor between the clients and the central server. In addition, the method uses partial model parameters and weighted losses of the different clients to aggregate loss gradients in the central server, thereby training the model and overcoming the shortcomings that target domain data are unavailable and that models are not trained in a personalized way.

The first embodiment of the disclosure provides a mechanical fault diagnosis method based on federated domain generalization. In the training phase, the central server first transmits a randomly initialized global model to all clients. Each client independently trains the model using its own training data set and then transmits its model to the central server, which averages all model parameters to obtain a global model and transmits the averaged model back to the clients. The clients transmit their losses to the central server, and the central server trains the global model based on the loss values. Finally, the trained global model is sent to the client holding the target domain data, and the fault data to be tested are input to the trained global model for fault diagnosis. The method specifically comprises the following steps:

First, the central server sends the randomly initialized global models $F_G$ and $C_G$ to each client.

Second, each client takes $F_G$ and $C_G$ as its initial models and trains the model parameters on its own data set $D_S^k$ to obtain the client models $F_k$ and $C_k$, where $D_S^k$ denotes the samples of the $k$-th source domain data set and $k$ is an integer between 1 and $N$.

Third, the sets of all client models $\{F_k\}_{k=1}^{N}$ and $\{C_k\}_{k=1}^{N}$ are sent to the central server, which averages all model parameters to obtain the global models $F_G$ and $C_G$, where $F_G$ is the global model of the central server's feature extraction network and $C_G$ is the global model of the central server's classification network. The specific feature extraction network and classification network structures are shown in the following table:

TABLE 1 network architecture

As shown in the above table, the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) layer, and a one-dimensional max pooling layer. Each of the three one-dimensional convolutional layers has 128 convolution kernels; the kernel sizes are 17, 17, and 3, respectively, and the convolution stride is 1 in each. The batch normalization layers have no parameters; the parameter of each of the three rectified linear unit layers is 0.2; and the parameters of the three one-dimensional max pooling layers are 16, 16, and 2, respectively.

The classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer, and a Softmax layer. The parameter of the fully connected layer is 512, the parameter of the rectified linear unit layer is 0.2, the Dropout (random zeroing) parameter is 0.3, and the parameter of the Softmax layer is the number of fault categories.
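To make the layer parameters above concrete, the following sketch traces how the signal length would shrink through the three convolution and pooling groups. It is a worked arithmetic example under assumptions that the text does not state: no padding in the convolutions, a pooling stride equal to the pooling size, and floor rounding. The input length of 4096 is the sample length reported for the experiments; the function names are hypothetical.

```python
CONV_KERNELS = (17, 17, 3)   # kernel sizes of the three 1-D convolutions
POOL_SIZES = (16, 16, 2)     # parameters of the three 1-D max pooling layers
CHANNELS = 128               # convolution kernels per layer


def conv1d_out(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1-D convolution, assuming no padding."""
    return (length - kernel) // stride + 1


def maxpool1d_out(length: int, kernel: int) -> int:
    """Output length of 1-D max pooling, assuming stride == kernel size."""
    return length // kernel


def feature_length(sample_length: int) -> int:
    """Signal length after the three convolution + pooling groups."""
    length = sample_length
    for conv_kernel, pool_size in zip(CONV_KERNELS, POOL_SIZES):
        length = conv1d_out(length, conv_kernel)
        length = maxpool1d_out(length, pool_size)
    return length


length = feature_length(4096)   # 4096 -> 4080 -> 255 -> 239 -> 14 -> 12 -> 6
flattened = CHANNELS * length   # size of the flattened feature vector
```

Under these assumptions the flattened feature vector fed to the fully connected layer would have 768 elements; a different padding or pooling stride would change this figure.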

Fourth, the central server sends the global models $F_G$ and $C_G$ to all clients, and the clients complete the following tasks:

Calculate the classification loss $L_c^k = \ell\left(\hat{y}^k,\, y^k\right)$, where $x^k$ is the training data of the $k$-th client, $y^k$ is the true label of the training data set of the $k$-th client, and $\hat{y}^k$ is the prediction result for $x^k$. Send the classification loss $L_c^k$ to the central server.

Obtain the output features of each client's feature extraction network, and send the covariance matrix $C_k$ of the output features to the central server.

Calculate the invariant risk minimization (IRM) loss. Invariant risk minimization assumes that the data distributions of different domains differ, but that the causal relationship between data features and labels is constant: the causal relationship between labels and features does not change with the operating conditions or environment. The purpose of invariant risk minimization is to find the latent invariance shared by different domains. The invariant risk minimization loss function is as follows:

$$\mathrm{IRM} = \frac{1}{b}\sum_{(i,j)} \nabla_{w}\,\ell\left(w \cdot f_i^k,\, y_i^k\right) \cdot \nabla_{w}\,\ell\left(w \cdot f_j^k,\, y_j^k\right)$$

where IRM is the invariant risk minimization loss; $\nabla$ denotes gradient computation; $b$ represents the number of $x_i$ and $x_j$; $x_i$ and $x_j$ are the input data; $y_i$ and $y_j$ are the input labels; $f_i$ is the feature output after $x_i$ passes through the feature extraction network $F$; $f_j$ is the feature output after $x_j$ passes through the feature extraction network $F$; $(f_i^k, y_i^k)$ is the $i$-th group of features and labels of the $k$-th client; $(f_j^k, y_j^k)$ is the $j$-th group of features and labels of the $k$-th client; $\ell$ is the classification loss function; $F$ is the feature extraction network; and $w$ is a scalar.
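The invariant risk minimization loss described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it uses cross-entropy as the classification loss, treats the classifier outputs before Softmax as the quantity scaled by the scalar w, evaluates the gradient at w = 1 as in the commonly used IRMv1 formulation, and multiplies the mean gradients of two sample groups from the same client. All function names and the `Sample` layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (classifier outputs before Softmax, integer label).
Sample = Tuple[Sequence[float], int]


def cross_entropy(logits: Sequence[float], label: int, w: float = 1.0) -> float:
    """Cross-entropy of softmax(w * logits) against an integer label."""
    scaled = [w * z for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return log_sum - scaled[label]


def grad_wrt_scale(logits: Sequence[float], label: int) -> float:
    """Analytic d/dw of cross_entropy(logits, label, w), evaluated at w = 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    expected = sum(e / total * z for e, z in zip(exps, logits))
    return expected - logits[label]


def irm_penalty(group_i: List[Sample], group_j: List[Sample]) -> float:
    """Product of the mean scale-gradients of two groups from one client."""
    g_i = sum(grad_wrt_scale(z, y) for z, y in group_i) / len(group_i)
    g_j = sum(grad_wrt_scale(z, y) for z, y in group_j) / len(group_j)
    return g_i * g_j


# Two toy sample groups from the same client.
group_i = [([2.0, 0.0], 0), ([0.0, 1.0], 1)]
group_j = [([1.0, -1.0], 0), ([0.5, 0.5], 1)]
penalty = irm_penalty(group_i, group_j)
```

Using two separate groups gives an estimate of the squared gradient norm that is not inflated by within-group noise, which is one plausible reason for the paired indices i and j in the definitions above.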

Fifth, the central server receives the losses $L_c^k$ and IRM of all the clients and the covariance matrices $C_k$ of the features. The central server calculates the second-order statistical feature distance between the feature covariance matrices of every two clients; the feature distance metric loss function is as follows:

$$L_d = \frac{1}{4d^2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left\| C_i - C_j \right\|_F^2$$

where $L_d$ is the feature distance metric loss; $d$ is the size of the feature vector; $N$ represents the number of clients; $\|\cdot\|_F$ denotes the F-norm (Frobenius norm) of a matrix; $k$ denotes the $k$-th client; and $C_i$, $C_j$ represent the feature covariance matrices of any two clients.
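The pairwise second-order feature distance described above can be sketched in plain Python. The sketch assumes a CORAL-style normalisation of 1/(4d²) and a sum over all unordered client pairs; both are assumptions, since the text names only the F-norm, the feature size d, and the number of clients N. The helper names are hypothetical.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def squared_frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Squared F-norm of the difference between two equally sized matrices."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def feature_distance_loss(covariances: List[Matrix]) -> float:
    """Sum of scaled covariance distances over every pair of clients."""
    d = len(covariances[0])            # feature vector size
    scale = 1.0 / (4.0 * d * d)        # assumed CORAL-style normalisation
    return sum(
        scale * squared_frobenius_distance(c_i, c_j)
        for c_i, c_j in combinations(covariances, 2)
    )


# Toy covariance matrices received from three clients.
covs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
loss = feature_distance_loss(covs)
```

In this toy case the first and third clients have identical statistics and contribute nothing, while the second client differs from both and accounts for the whole loss.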

Sixth, a global loss value is calculated on the central server; the global loss function is as follows:

$$L = \sum_{k=1}^{N} w_k L_c^k + \mathrm{IRM} + L_d, \qquad w_k = \frac{L_c^k}{\sum_{m=1}^{N} L_c^m}$$

where $k$ denotes the $k$-th client; $w_k$ is the weight of the classification loss of each client, where $k$ is an integer between 1 and $N$; and $L_c^k$ denotes the loss value of the samples of the $k$-th source domain data set. Through this weighting, clients with poor classification performance (large loss values) contribute a larger share of the loss, so the worst-performing model receives the most weight in global model training.
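The loss weighting described above can be illustrated with a short sketch. It assumes that each client's weight is its classification loss divided by the sum of all clients' classification losses, and that the invariant risk minimization and feature distance terms are added without extra coefficients; neither detail is confirmed by the text, and the function names are hypothetical.

```python
from typing import List


def loss_weights(classification_losses: List[float]) -> List[float]:
    """Weight each client in proportion to its classification loss."""
    total = sum(classification_losses)
    if total <= 0.0:
        raise ValueError("classification losses must sum to a positive value")
    return [loss / total for loss in classification_losses]


def global_loss(
    classification_losses: List[float],
    irm_losses: List[float],
    distance_loss: float,
) -> float:
    """Weighted classification losses plus the IRM and distance terms."""
    weights = loss_weights(classification_losses)
    weighted = sum(w * loss for w, loss in zip(weights, classification_losses))
    return weighted + sum(irm_losses) + distance_loss


# Three toy clients; the third classifies worst and gets the largest weight.
cls_losses = [1.0, 1.0, 2.0]
weights = loss_weights(cls_losses)
total_loss = global_loss(cls_losses, [0.0, 0.0, 0.0], 0.5)
```

With these numbers the third client carries half of the total weight, which matches the stated intent that poorly performing clients dominate the global update.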

Seventh, the fault diagnosis model of the central server is trained by back propagation based on the global loss value. Because the low-level layers of the feature extraction network extract general features of the fault data, their parameters differ little across clients. To reduce the training burden, a model transfer strategy is adopted: the parameters of the low-level layers of the global model are frozen and only the parameters of the high-level layers are trained, which reduces the training workload without affecting the generalization capability of the model.

The model transfer strategy means that, while the clients and the server cooperatively train the model, the low-level network parameters can be frozen during training in the central server and only the parameters of the higher-level layers are trained, because the features extracted by the low-level layers of the feature extraction network are general features.
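A minimal sketch of the freezing idea is shown below: a plain gradient-descent step that skips every parameter whose name starts with a frozen prefix. Which layers count as low-level is not specified in the text, so the choice of `feature.conv1` as the frozen layer, the parameter names, and the learning rate are all hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def sgd_step(
    params: Params,
    grads: Params,
    frozen_prefixes: Tuple[str, ...],
    learning_rate: float,
) -> Params:
    """One gradient-descent step that leaves frozen layers untouched."""
    updated: Params = {}
    for name, values in params.items():
        if name.startswith(frozen_prefixes):
            updated[name] = list(values)  # frozen: copy without change
        else:
            updated[name] = [
                v - learning_rate * g for v, g in zip(values, grads[name])
            ]
    return updated


params = {"feature.conv1": [1.0], "feature.conv3": [1.0], "classifier.fc": [1.0]}
grads = {"feature.conv1": [0.5], "feature.conv3": [0.5], "classifier.fc": [0.5]}
new_params = sgd_step(params, grads, ("feature.conv1",), learning_rate=0.5)
```

Because frozen layers are skipped entirely, their gradients never need to be computed in a real framework, which is where the reduction in training workload would come from.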

Finally, the trained global model is sent to the client holding the target domain data, and the fault data to be tested are input for fault diagnosis.

Experimental verification:

The fault diagnosis accuracy and security of the fault diagnosis model obtained by the application are verified through the following experiments.

1) Data set: the bearing fault data set was provided by Case Western Reserve University (CWRU). In this data set, the bearings have three different fault diameters: 7, 14, and 21 mils. The label information is shown in Table 2, and the working condition information is shown in Table 3. The sample length of the data set is 4096. The training and testing scheme of the invention is shown in Table 4.

Case 1 indicates that the 1 st experiment uses the data sets numbered 0 and 1 in table 3 as source domain 1 and source domain 2, respectively, and the data set numbered 3 as the target domain.

Case 2 indicates that the 2nd experiment uses the data sets numbered 1 and 2 in Table 3 as source domain 1 and source domain 2, respectively, and the data set numbered 3 as the target domain.

TABLE 2 CWRU fault data set label information

TABLE 3 CWRU equipment working conditions

TABLE 4 Source and target domain information in the CWRU experiments

2) Experimental results: the experimental results are shown in Table 5. As can be seen from Table 5, compared with the domain adaptation method in the paper [Conditional Adversarial Domain Generalization With a Single Discriminator for Bearing Fault Diagnosis], the method provided by the invention improves the fault diagnosis accuracy on the target domain while ensuring the data security of the source domains.

TABLE 5 Experimental results

Embodiment 2:

the second embodiment of the present disclosure provides a mechanical fault diagnosis system based on federated domain generalization, including:

the central server is used for initializing a global model, the central server comprises a global feature extraction network and a global classification network, and the central server simultaneously carries out information interaction with a plurality of clients; each client comprises a feature extraction network and a classification network, N clients comprise N source domain data sets, and the (N + 1) th client comprises a target domain data set.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A mechanical fault diagnosis method based on federated domain generalization, characterized by comprising the following steps:

the central server randomly initializes the global model and sends the global model to all the clients; the client completes the following tasks:

calculating the classification loss based on the classification loss function and sending the classification loss to the central server;

calculating the invariant risk minimization loss based on the invariant risk minimization loss function and sending the invariant risk minimization loss to the central server;

sending the models trained by all the clients to the central server, and averaging all the model parameters in the central server to obtain a global model; the specific process is as follows: the central server receives the classification losses and invariant risk minimization losses of all the clients and the covariance matrices of the features; the central server calculates the second-order statistical feature distance between the feature covariance matrices of every two clients, and obtains the feature distance metric loss based on the feature distance metric loss function;

the classification loss of the $k$-th client is denoted $L_c^k$, and the classification loss function is:

$$L_c^k = \ell\left(\hat{y}^k,\, y^k\right)$$

where $x^k$ is the training data of the $k$-th client, $y^k$ is the true label of the training data set of the $k$-th client, and $\hat{y}^k$ is the prediction result for $x^k$;

the invariant risk minimization loss function is as follows:

$$\mathrm{IRM} = \frac{1}{b}\sum_{(i,j)} \nabla_{w}\,\ell\left(w \cdot f_i^k,\, y_i^k\right) \cdot \nabla_{w}\,\ell\left(w \cdot f_j^k,\, y_j^k\right)$$

where IRM is the invariant risk minimization loss; $\nabla$ denotes gradient computation; $b$ represents the number of $x_i$ and $x_j$; $x_i$ and $x_j$ are the input data; $y_i$ and $y_j$ are the input labels; $f_i$ is the feature output after $x_i$ passes through the feature extraction network $F$; $f_j$ is the feature output after $x_j$ passes through the feature extraction network $F$; $(f_i^k, y_i^k)$ is the $i$-th group of features and labels of the $k$-th client; $(f_j^k, y_j^k)$ is the $j$-th group of features and labels of the $k$-th client; $\ell$ is the classification loss function; $F$ is the feature extraction network; and $w$ is a scalar;

the feature distance metric loss function is as follows:

$$L_d = \frac{1}{4d^2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left\| C_i - C_j \right\|_F^2$$

where $d$ is the size of the feature vector; $k$ denotes the $k$-th client; and $C_i$, $C_j$ represent the feature covariance matrices of any two clients.

2. The federated domain generalization-based mechanical fault diagnosis method as claimed in claim 1, wherein a global loss value is calculated on the central server based on a global loss function, and the global model of the central server is trained by back propagation based on the global loss value.

3. The federated domain generalization-based mechanical fault diagnosis method of claim 2, wherein the global loss function is as follows:

$$L = \sum_{k=1}^{N} w_k L_c^k + \mathrm{IRM} + L_d, \qquad w_k = \frac{L_c^k}{\sum_{m=1}^{N} L_c^m}$$

where $k$ denotes the $k$-th client; $w_k$ is the weight of the classification loss of each client, where $k$ is an integer between 1 and $N$; $L_c^k$ denotes the loss value of the samples of the $k$-th source domain data set; and $N$ represents the number of clients.

4. A diagnostic system for the federated domain generalization-based mechanical fault diagnosis method as defined in any one of claims 1 to 3, comprising:

a central server and clients; the central server comprises a global feature extraction network and a global classification network, and the central server simultaneously carries out information interaction with a plurality of clients; the central server is also used for initializing a global model; the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit layer, and a one-dimensional max pooling layer; the classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer, and a Softmax layer.

5. The diagnostic system of claim 4, wherein each client comprises a feature extraction network and a classification network; N clients comprise N source domain data sets, and the (N + 1)-th client comprises a target domain data set; the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit layer, and a one-dimensional max pooling layer; the classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer, and a Softmax layer.