CN112101489A

CN112101489A - Equipment fault diagnosis method driven by the fusion of federated learning and deep learning

Info

Publication number: CN112101489A
Application number: CN202011291656.5A
Authority: CN
Inventors: 董志红; 赵宜斌; 王岩
Original assignee: Tianjin Development Zone Jingnuo Hanhai Data Technology Co ltd
Current assignee: Tianjin Development Zone Jingnuo Hanhai Data Technology Co ltd
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2020-12-18

Abstract

The invention discloses an equipment fault diagnosis method driven by the fusion of federated learning and deep learning, comprising the following steps: S1, setting up a distributed training sub-end LSTM fault diagnosis model that can run in each trusted factory; S2, preprocessing the trusted factory data by resampling, EMD decomposition, labeling and normalization; S3, training the distributed training sub-end LSTM fault diagnosis model at each trusted factory; S4, uploading the bottom-layer parameters of the intermediate training model to a central server; S5, the central server aggregating the bottom-layer model parameters and distributing them to each distributed training sub-end LSTM fault diagnosis model for updating; and S6, completing the training of the globally shared equipment fault diagnosis model driven by the fusion of federated learning and deep learning. The invention preserves the data security and production data privacy of each factory, improves the accuracy of fault diagnosis, and achieves privacy protection and reliable classification diagnosis of fault data from key equipment.

Description

Equipment fault diagnosis method driven by the fusion of federated learning and deep learning

Technical Field

The invention relates to the field of equipment fault diagnosis, in particular to an equipment fault diagnosis method driven by the fusion of federated learning and deep learning.

Background

Modern equipment is becoming larger, more precise and more intelligent. Its structure and functions are growing more complex, maintenance is becoming harder, and the cost of keeping it running normally rises day by day. Developing reliable equipment monitoring technology and establishing a scientific and efficient fault diagnosis system, so as to ensure the safe and stable operation of mechanical equipment, is therefore a necessary requirement of modern industrial development.

With the development of Internet of Things technology, key equipment collects large amounts of operating data such as temperature, rotating speed, power and vibration, and how to use these data for effective fault early warning and diagnosis has become a frontier research topic. Machine learning has been widely applied with good results. The article [Liu Jing et al. A maintenance decision method with risk control [J]. Computer Integrated Manufacturing Systems, 2010(10): 2087-2093] proposes a maintenance decision method with risk control, which applies association rules from data mining to fault diagnosis and analyzes the output values of a BP neural network to determine the maintenance decision. The article [Kontao et al. Intelligent composite fault diagnosis method for rolling bearings [J]. Mechanical Transmission, 2016(12): 139-] addresses intelligent composite fault diagnosis of rolling bearings. The article [Zhaojianpo et al. Research on rotating machinery state prediction based on long short-term memory networks [J]. Noise and Vibration Control, 2017(4)] uses empirical mode decomposition to extract relevant features from collected vibration signal data and feeds them into a single-layer LSTM network model to predict the operating state of rotating machinery. Further improving the accuracy of fault diagnosis requires more and richer fault data for training, but plant operating data is highly private, and traditional models require the training data to be gathered at a central location, which makes large-scale collaborative training difficult.

Disclosure of Invention

The invention provides an equipment fault diagnosis method driven by the fusion of federated learning and deep learning. Its aim is to run the fault diagnosis model at the trusted factory side and to train it in the decentralized manner of federated learning: the federated learning framework combines the data collected at all trusted factories to train a globally shared fault diagnosis model, thereby improving the accuracy of plant equipment fault diagnosis.

In order to solve the technical problems, the invention provides the following technical scheme:

An equipment fault diagnosis method driven by the fusion of federated learning and deep learning comprises the following steps:

S1, setting up a distributed training sub-end LSTM fault diagnosis model on the real-time data source generated at the trusted factory side: a mathematical model is set up according to the business problem, namely an LSTM neural network comprising an input layer, a hidden layer, random deactivation (Dropout) and an output layer with the softmax activation function; after the model is trained, the model deployed at the trusted factory side performs fault analysis by directly reading in the data collected from the equipment;

S2, extracting features from the data collected by the trusted factories: to meet the training requirements of the distributed training sub-end LSTM fault diagnosis model of the previous step, training uses the data held on the local storage medium, and the training set is loaded by reading it in. According to the fault characteristics, the common part of the data collected by the trusted factories is extracted, and these common features are horizontally combined with the data sets within each factory domain;

S3, local model training: each local distributed training sub-end LSTM fault diagnosis model is trained as a traditional single-machine LSTM neural network would be, with the same Dropout and learning-rate training parameters; the sample batch size (batchsize), the client selection percentage and the iteration parameters of the model are also set;

S4, uploading intermediate parameters of the local models: after the training model at a local trusted factory completes one iteration, it uploads its intermediate training parameters to a central server, which serves as the training collaborator throughout the training process;

S5, parameter fusion and return: after gathering the model parameters uploaded by all models, the central server, acting as training collaborator, calculates the update gradient with the federated distributed equipment training algorithm defined on the basis of federated learning, and then distributes the update matrix back to each local model so that it can update itself;

S6, completing the training of the globally shared model: training stops once each local model reaches the number of iterations set by the training collaborator; the training collaborator confirms that the joint loss function has converged, parameter uploading stops and updating stops. Once the globally shared model has obtained its optimal parameters, the globally shared model on the central server and the distributed training sub-end LSTM fault diagnosis models at the trusted factories have the same structure, so in each analysis of a plant equipment fault diagnosis problem the model fits the data features well after reading the input samples; the results are mapped to the corresponding plant equipment fault types.
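The interplay of steps S3 to S6 can be sketched as a single loop. The following is a minimal illustration only, not the patented implementation: a one-parameter linear model stands in for the LSTM, the squared loss stands in for the diagnosis loss, and the names `local_train`, `mean_loss` and `federated_training`, the learning rate and the tolerance are all assumptions made for this example.

```python
import random

def local_train(w, data, lr=0.5, epochs=1):
    """S3: gradient descent on one factory's private data (toy model y = w * x)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mean_loss(w, data):
    """Average squared loss of the toy model on one data set."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def federated_training(factories, rounds=200, tol=1e-10):
    """S4-S6: upload local parameters, aggregate on the server, send back, stop."""
    n_total = sum(len(d) for d in factories)
    w_global, prev_loss = 0.0, float("inf")
    for _ in range(rounds):  # iteration budget set by the training collaborator
        # S3 + S4: every factory trains locally and uploads (sample count, parameter).
        uploads = [(len(d), local_train(w_global, d)) for d in factories]
        # S5: the server fuses the uploads, weighting each by its sample count.
        w_global = sum(n_k * w_k for n_k, w_k in uploads) / n_total
        # S6: stop once the joint loss no longer changes.
        joint_loss = sum(len(d) * mean_loss(w_global, d) for d in factories) / n_total
        if abs(prev_loss - joint_loss) < tol:
            break
        prev_loss = joint_loss
    return w_global

rng = random.Random(0)
# Three hypothetical factories with different amounts of private data, all y = 3x.
factories = [[(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(n))]
             for n in (20, 50, 80)]
w = federated_training(factories)
print(round(w, 3))
```

Only the parameter `w` and the sample counts ever leave a factory in this sketch; the raw `(x, y)` pairs stay local, which is the property the method relies on.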

Further, the process of the distributed training sub-end LSTM fault diagnosis model in step S1 includes the following steps:

1-1) first, process the fault data samples of the equipment's time-series sample data: use data that is abnormal and whose fault type has been determined, and additionally add a large number of normal samples;

1-2) resampling: undersample the time-series sample data with a K-nearest-neighbor method, selecting the normal samples whose average Euclidean distance to their 3 farthest fault samples is smallest, thereby avoiding the problems caused by imbalanced data;

1-3) feature engineering: decompose the original vibration signal of the sample time-series data by empirical mode decomposition (EMD) and observe the stationary intrinsic mode function (IMF) components at each level; the decomposition yields 7 components, IMF1 to IMF7, and a residual, where each IMF component represents an intrinsic mode function present in the original vibration signal;

1-4) number the intrinsic mode functions (IMFs) of the time-series data to obtain the model input data $X$, and determine the model output data $Y$ according to the number;

1-5) normalize the data: compute the sample mean $\mu$ and the standard deviation $\sigma$, and normalize the data $X$ to obtain the normalized data $X^{*}$:

$$X^{*} = \frac{X - \mu}{\sigma}$$

1-6) divide the data set: arrange the data in random order, and split the input data $X$ and the output data $Y$ in the same proportion to obtain a training data set and a test data set;

1-7) train the model: set checkpoints, saving the model parameters once every iteration (Epoch); adjust the number of iterations (Epoch) and the random deactivation (Dropout) parameter, and observe the training loss (train loss) and the validation loss (val loss).
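Steps 1-5 and 1-6 are simple enough to sketch with the standard library. This is an illustration under assumptions: the patent does not say whether the sample or population standard deviation is meant (the population form is used here), the 10% test share is borrowed from the experiment described later, and the function names and toy data are hypothetical.

```python
import random
import statistics

def zscore(values):
    """Step 1-5: subtract the mean and divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population form: an assumption
    return [(v - mu) / sigma for v in values]

def split(inputs, labels, test_ratio=0.1, seed=0):
    """Step 1-6: shuffle X and Y with one permutation, then cut both at the same index."""
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)
    cut = len(order) - int(len(order) * test_ratio)
    train, test = order[:cut], order[cut:]
    return ([inputs[i] for i in train], [labels[i] for i in train],
            [inputs[i] for i in test], [labels[i] for i in test])

signal = [0.2, -0.1, 0.4, 0.9, -0.6, 0.3, 0.0, -0.3, 0.5, 0.7]  # toy feature values
labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]                          # hypothetical class numbers
x_norm = zscore(signal)
x_train, y_train, x_test, y_test = split(x_norm, labels)
```

Shuffling indices rather than the two lists separately keeps every input paired with its own label, which is what splitting "in the same proportion" requires.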

Further, in step S5, calculating the update gradient with the federated distributed equipment training algorithm defined on the basis of federated learning includes the following steps:

2-1) all trusted factories jointly fix the training objective of the local distributed training sub-end LSTM fault diagnosis models, namely:

$$\min_{w \in \mathbb{R}^{d}} f(w), \qquad f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$

where $n$ is the size of the data set formed by all trusted factories participating in the joint training, $w \in \mathbb{R}^{d}$ is the $d$-dimensional model parameter vector, $\mathbb{R}$ denotes the real numbers, $f_i(w) = \ell(x_i, y_i; w)$ is the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$, and $x_i$ and $y_i$ are the $i$-th training data point and its associated label.

2-2) the objective can be rewritten as:

$$f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{i \in P_k} f_i(w)$$

where $K$ is the number of trusted factories participating in federated learning, $P_k$ is the index set of the data points located at trusted factory $k$, $n_k = |P_k|$ is the cardinality of $P_k$, that is, the size of the set, and $F_k(w)$ is the average loss over the private data set computed at the distributed training sub-end LSTM fault diagnosis model parameters $w$;

2-3) for each factory $k$,

$$w_{t+1}^{k} = w_t - \eta\, g_k$$

so that

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k}$$

where $t = 1, 2, \ldots$ indexes the communication rounds, $\eta$ is the fixed learning rate of each trusted factory, $g_k = \nabla F_k(w_t)$ is the gradient of the average loss over the private data set computed at $w_t$, $w_t$ is the distributed training sub-end LSTM fault diagnosis model parameter vector of round $t$, $w_{t+1}^{k}$ is the round $t+1$ distributed training sub-end LSTM fault diagnosis model parameter vector of the $k$-th trusted factory, and $\nabla f_i(w)$ is the gradient of the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$;

2-4) the base of the natural logarithm, $e$, and the loss of the diagnostic model are incorporated into the weight of each factory's local model parameter vector for balancing (the resulting weighting formula is not preserved in this text).
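A small numerical check may make steps 2-2 and 2-3 more concrete. It is a sketch under stated assumptions: a scalar model y = w * x with squared loss replaces the LSTM, the three data sets are invented, and only the sample-count weights n_k / n are used; the loss-and-e based weighting of step 2-4 is not reproduced because its formula is not available here.

```python
def grad_avg_loss(w, data):
    """g_k: gradient of the average squared loss F_k at w for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

# Hypothetical private data sets P_k of three factories, as (x_i, y_i) pairs.
plants = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(0.5, 1.2), (1.5, 2.8), (3.0, 6.3)],
    [(2.5, 5.1)],
]
n = sum(len(p) for p in plants)
w_t, eta = 0.5, 0.01

# Route 1: the server aggregates gradients, w_{t+1} = w_t - eta * sum_k (n_k / n) * g_k.
g = [grad_avg_loss(w_t, p) for p in plants]
g_weighted = sum(len(p) / n * g_k for p, g_k in zip(plants, g))
w_next_server = w_t - eta * g_weighted

# Route 2: each factory steps locally, then the server averages the parameters.
w_local = [w_t - eta * g_k for g_k in g]
w_next_avg = sum(len(p) / n * w_k for p, w_k in zip(plants, w_local))

# Cross-check: the weighted factory gradients equal the gradient on the pooled data.
pooled = [point for p in plants for point in p]
g_pooled = grad_avg_loss(w_t, pooled)
```

Both routes give the same `w_{t+1}` because the weights n_k / n sum to one, so averaging the locally updated parameters is the same as taking one step along the pooled gradient without ever pooling the data.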

The beneficial effects of the above technical scheme are as follows:

(1) to address the data islands that arise from factories protecting their data privacy, a Fault Diagnosis Method Driven by Federated Learning and Deep Learning (FDMDFL & DL for short) is proposed. By aggregating the bottom-layer model parameters uploaded by each distributed factory and fusing the fault types specific to each distributed factory, the method effectively overcomes the data-island difficulty encountered in large-scale collaborative research on data;

(2) in the imbalanced-data processing stage, an undersampling method based on neighbor partitioning is proposed: samples are selected by K-nearest-neighbor undersampling with added heuristic rules, which avoids the information loss of random undersampling, removes the interference of data noise with the model, and takes the distribution characteristics of the samples fully into account;

(3) because fault time-series data has complex temporal correlations, a Distributed Training Sub-end LSTM fault diagnosis model (DTSTLSTM) is proposed. It uses Empirical Mode Decomposition (EMD) for feature processing and denoising, and a deep neural network based on the Long Short-Term Memory (LSTM) structure to extract features along the time dimension, so as to improve fault classification accuracy.
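The undersampling rule behind effect (2), keeping the normal samples whose average Euclidean distance to their 3 farthest fault samples is smallest, can be sketched as follows. The function name, the two-dimensional toy points and the choice to keep as many normal samples as there are fault samples are assumptions for illustration; the additional heuristic rules mentioned above are not specified and are not modeled.

```python
import math

def undersample_normals(normal, fault, keep, k=3):
    """Keep the `keep` normal samples with the smallest mean distance
    to their k farthest fault samples."""
    def score(sample):
        dists = sorted(math.dist(sample, f) for f in fault)
        farthest = dists[-k:]
        return sum(farthest) / len(farthest)
    return sorted(normal, key=score)[:keep]

# Toy feature vectors: 4 fault samples and 6 normal samples (majority class).
fault = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
normal = [(0.5, 0.5), (5.0, 5.0), (0.4, 0.6), (-3.0, 2.0), (0.6, 0.5), (9.0, 0.0)]
balanced_normal = undersample_normals(normal, fault, keep=len(fault))
```

Unlike random undersampling, the selection is deterministic and favors normal samples that lie close to the whole fault class, so remote points such as `(9.0, 0.0)` are dropped first.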

Drawings

FIG. 1 is the diagnostic flow chart of the equipment fault diagnosis method driven by the fusion of federated learning and deep learning;

FIG. 2 is a schematic diagram of the equipment fault diagnosis framework driven by the fusion of federated learning and deep learning;

FIG. 3 shows the distributed training sub-end LSTM fault diagnosis model;

FIG. 4 is the accuracy plot of the diagnostic model;

FIG. 5 is the loss plot of the diagnostic model;

FIG. 6 is a plot of the diagnostic accuracy of the different classification methods;

FIG. 7 shows the vibration signals of the rolling bearing;

FIG. 8 shows the distribution of the trusted factory data sets;

FIG. 9 shows the EMD decomposition of the normal data;

FIG. 10 shows the EMD decomposition of the inner ring fault data;

FIG. 11 is the pseudo code of Algorithm 1;

FIG. 12 shows the test set accuracy of the diagnostic model;

FIG. 13 describes the bearing fault experimental data.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

S1, setting up a distributed training sub-end LSTM fault diagnosis model on the real-time data source generated at the trusted factory side: a mathematical model is set up according to the business problem, namely an LSTM neural network comprising an input layer, a hidden layer, random deactivation (Dropout) and an output layer with the softmax activation function. After the model is trained, the model deployed at the trusted factory side performs fault analysis by directly reading in the data collected from the equipment;

The experimental verification of the invention uses experimental data containing three kinds of fault, namely the inner ring fault (Inner Raceway Fault), the outer ring fault (Outer Raceway Fault) and the rolling element fault (Ball Fault), together with normal data (Normal).

The flow of the distributed training sub-end LSTM fault diagnosis model proposed by the invention is described below, and the model flow diagram is shown in FIG. 3. The symbols in the figure denote, in turn, the input data, the predicted output, the Adam optimization function, the state of the current cell, the output of the current cell, and an LSTM unit.
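For readers unfamiliar with the quantities named in FIG. 3, the following sketch runs one standard LSTM unit over a short window and feeds the final cell output to a softmax layer. It is an illustration only: the weights are random rather than trained, the sizes are arbitrary, Dropout is left out because it is inactive at inference time, and the four classes are simply the normal state plus the three fault types of the experiment. None of the names below come from the patent.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM unit update. W maps the concatenation [h_prev, x] to the four gate
    pre-activations (input, forget, output, candidate), stacked row-wise."""
    hidden = len(h_prev)
    z_in = h_prev + x
    pre = [sum(w * v for w, v in zip(row, z_in)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[:hidden]]
    f = [sigmoid(p) for p in pre[hidden:2 * hidden]]
    o = [sigmoid(p) for p in pre[2 * hidden:3 * hidden]]
    g = [math.tanh(p) for p in pre[3 * hidden:]]
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]  # cell state
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]                        # cell output
    return h, c

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
n_in, hidden, n_classes = 1, 4, 4  # 4 classes: normal, inner ring, outer ring, ball
W = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + n_in)] for _ in range(4 * hidden)]
b = [0.0] * (4 * hidden)
V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_classes)]

h, c = [0.0] * hidden, [0.0] * hidden
for sample in [0.1, -0.3, 0.8, 0.2]:  # a short normalized vibration window
    h, c = lstm_step([sample], h, c, W, b)
probs = softmax([sum(v * h_j for v, h_j in zip(row, h)) for row in V])
```

The index of the largest entry of `probs` would be the predicted class; in the method described here that index is what gets mapped to an equipment fault type.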

1-5) normalize the data: compute the sample mean $\mu$ and the standard deviation $\sigma$, and normalize the data $X$ to obtain the normalized data $X^{*}$:

$$X^{*} = \frac{X - \mu}{\sigma}$$

1-6) divide the data set to obtain a training data set and a test data set;

S3, local model training: each local distributed training sub-end LSTM fault diagnosis model is trained as a traditional single-machine LSTM neural network would be, with the same Dropout and learning-rate training parameters. The sample batch size, the client selection percentage and the iteration parameters of the model are also set;

FIG. 2 shows the specific implementation of the training cooperation of the equipment fault diagnosis model driven by the fusion of federated learning and deep learning, corresponding to S4 and S5.

Specifically, the distributed model training algorithm uses the local data of each trusted factory to collaboratively train a globally shared model. Assume that $K$ trusted factories participate in federated learning, $n$ is the size of the data set formed by all trusted factories participating in the joint training, $w \in \mathbb{R}^{d}$ is the $d$-dimensional model parameter vector, $\mathbb{R}$ denotes the real numbers, $f_i(w) = \ell(x_i, y_i; w)$ is the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$, and $x_i$ and $y_i$ are the $i$-th training data point and its associated label. Before joining the joint training, all trusted factories jointly fix the training objective of the local distributed training sub-end LSTM fault diagnosis models, namely:

$$\min_{w \in \mathbb{R}^{d}} f(w), \qquad f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$

During joint training, let $P_k$ be the index set of the data points located at trusted factory $k$ and let $n_k = |P_k|$ be its cardinality, that is, the $k$-th trusted factory is assumed to hold $n_k$ data points. With $F_k(w)$ denoting the average loss over the private data set at the distributed training sub-end LSTM fault diagnosis model parameters $w$, the objective can be rewritten as:

$$f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{i \in P_k} f_i(w)$$

The central server first initializes the distributed training sub-end LSTM fault diagnosis model parameters. In each communication round $t = 1, 2, \ldots$, a certain percentage of the factories participating in the joint training is selected at random to communicate directly with the central server. Each selected trusted factory then downloads the current global model parameters from the central server. With the fixed learning rate $\eta$, each trusted factory computes, at the distributed training sub-end LSTM fault diagnosis model parameters $w_t$, the gradient of the average loss over its private data set, $g_k = \nabla F_k(w_t)$. The trusted factories update the distributed training sub-end LSTM fault diagnosis model synchronously and upload their parameter updates to the central server. The central server aggregates the uploaded model parameters to further optimize the globally shared central model:

$$w_{t+1} = w_t - \eta \sum_{k=1}^{K} \frac{n_k}{n}\, g_k$$

For each factory $k$,

$$w_{t+1}^{k} = w_t - \eta\, g_k$$

and combining this with the aggregation update above gives

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k}$$

where $\nabla f_i(w)$ is the gradient of the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$, so that $g_k = \frac{1}{n_k}\sum_{i \in P_k} \nabla f_i(w_t)$.

Because differences in local sample size and sample types among the trusted factories may strongly affect the diagnosis results of the globally shared model, the base of the natural logarithm, $e$, and the loss of the diagnosis model are incorporated into the weight of each trusted factory's local model parameter vector for balancing (the resulting weighting formula is not preserved in this text).

After each round of model parameter aggregation and updating, the central server re-evaluates the performance of the new global model by computing its error and accuracy on the test set, then saves the model again and distributes its parameters. In this way the globally shared model responds quickly to changes, and the overall speed of model training improves.

To improve the overall efficiency of the system, the central server is equipped with global controls for the various hyper-parameters and with a model performance monitor, and decides promptly whether to stop training according to the accuracy of the new models uploaded by the distributed training sub-ends. The federated distributed equipment training algorithm has three key hyper-parameters: B, C and E. B is the batch size used when a distributed training sub-end LSTM fault diagnosis model is updated during training; a factory can use it to adjust the speed at which its local data trains the model, which ultimately affects the training efficiency of the globally shared model. C (C ∈ [0, 1]) is the fraction of factories selected whenever a new model is to be trained, and the factories of each round are selected at random. For example, C = 0.3 means that 3 of all 10 trusted factories are randomly selected to take part in the joint training of that round; factories that are not selected can only be selected again in a later round. This parameter both randomizes the factory data automatically and controls the amount of data taking part in training, which favors sample diversity and speeds up model convergence. E is the number of iterations each selected factory runs over its local training set in a round. When E is too large, the local characteristics captured by each factory's distributed training sub-end LSTM fault diagnosis model are strengthened while the characteristics shared with other factories are correspondingly weakened; when E is too small, the opposite holds.
Adjusting the number of iterations (Epoch) therefore affects the performance of the globally shared model and also changes the local characteristics of the distributed training sub-end LSTM fault diagnosis models, so choosing its value is particularly critical.
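The role of C can be illustrated in a few lines. The sketch below assumes that the number of selected factories is C times K rounded to the nearest integer, with at least one factory per round; the patent only gives the example of C = 0.3 selecting 3 of 10 factories, so the rounding rule, the function name and the seed are assumptions.

```python
import random

def select_factories(factory_ids, fraction, rng):
    """Pick max(1, round(C * K)) factories uniformly at random, without repeats."""
    count = max(1, round(fraction * len(factory_ids)))
    return rng.sample(factory_ids, count)

rng = random.Random(42)
factories = list(range(10))  # K = 10 trusted factories
for t in range(1, 4):        # three communication rounds
    chosen = select_factories(factories, fraction=0.3, rng=rng)
    print(f"round {t}: factories {sorted(chosen)}")
```

Each round draws a fresh subset, so a factory skipped in one round remains eligible in the next, which matches the description of C above.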

Algorithm 1 gives the pseudo code of the equipment fault diagnosis method driven by the fusion of federated learning and deep learning, comprising the central server execution part (Server executions) and the factory update part (factory update), as shown in FIG. 11.

The general structural flow of the specific fault diagnosis method is shown in fig. 1.

The experimental verification of the equipment fault diagnosis method driven by the fusion of federated learning and deep learning is as follows:

Description of data: the experimental data are bearing fault data from the electrical engineering laboratory of Case Western Reserve University (CWRU), 1,341,856 data points in total; the bearing is a 6205-2RS JEM SKF deep groove ball bearing. Single-point faults of 3 severity levels were introduced on the inner ring, the outer ring and the rolling element of the bearing by electrical discharge machining; the fault diameters are 0.007 inches (mild), 0.014 inches (moderate) and 0.021 inches (severe), and the fault depths are 0.011, 0.050 and 0.150 inches respectively. The three kinds of fault were set at both the motor drive end (Drive End) and the fan end (Fan End). Vibration sensors sampling at 12 kHz were placed at the motor drive end, the fan end and the base, and 21 groups of data covering 6 fault types in total were collected. The data sample information used in the experiment is shown in FIG. 13.

On the Case Western Reserve data set, vibration data acquired at 12 kHz were plotted for the different states: Normal, rolling element fault (Ball Fault), inner ring fault (Inner Raceway Fault) and outer ring fault (Outer Raceway Fault). In FIG. 7, Acceleration is the acceleration and Time Step is the time. The plots of the sample data show that the amplitude of the fault vibration signals is clearly larger than that of the normal signal, and that sequences of large amplitude recur periodically in the fault signals.

FIG. 9 and FIG. 10 show the empirical mode decomposition (EMD) of the normal data and of the inner ring fault data, respectively. IMF1 to IMF7 are signal components at different frequencies, arranged from high frequency to low frequency, and the right-hand part shows the instantaneous frequency of each IMF component. The figures show that the normal and abnormal signals differ markedly in their residuals and instantaneous frequency distributions: the instantaneous frequency distribution of the normal signal is smooth, whereas that of the abnormal signal fluctuates strongly.

Data processing: in the training mechanism, the fraction of factories selected each time is determined by the size of the factory data sets taking part in the joint training. When the total data volume of the jointly training factories is small, the fraction is raised to improve model accuracy; conversely, when the total data volume is large, the fraction is lowered to speed up model training. In other words, the best trade-off between accuracy and speed is chosen. In this experiment, 10 factory data sets were used (assumed to be independent and identically distributed), and in each round 3 of the 10 factories were randomly selected to take part in the joint training. After the preprocessed data sets were assigned to the corresponding factories, each factory held between 2000 and 5000 training samples (as shown in FIG. 8), the number differing from factory to factory, plus 10% test set data. Once the selected factories finish training the model on their local data sets, the server is updated: the updates of the bottom-layer parameters of all factory-trained models are accumulated on the central server and aggregated by it. Aggregation follows the normalization mechanism described above, producing a global update that is provided to all local factory models. The globally shared model is updated accordingly and is then ready for the next update iteration, and the steps above are run for multiple iterations so that the models aggregate better.

To verify the equipment fault diagnosis model driven by the fusion of federated learning and deep learning, comparison experiments were carried out on the 6 types of data sets with the following training methods: Central Training (CT for short), Federated Averaging (FA for short), FDMDFL & DL, and FDMDFL & DL on the imbalanced data set (Imbalanced-FDMDFL & DL, Imb-FDMDFL & DL for short). The experimental results for the 21 groups of fault data in the 6 types of data sets are shown in FIG. 12.

The experimental results in FIG. 12 and FIG. 6 show the following. Regarding imbalanced data, the data processed by K-nearest-neighbor undersampling gives higher accuracy in the diagnostic model, owing to the good balance of the data. Regarding the classification method, FDMDFL & DL is clearly superior to FA and CT for every number of input nodes, because the aggregation algorithm used by FDMDFL & DL adjusts the weight of each party dynamically according to the information of each round, so that every participating factory contributes as much as it can. Regarding the number of input layer nodes, the more input nodes, the higher the accuracy. The reason is not hard to see: a longer sequence contains more fault features and is more easily captured by the model, whereas a sequence that is too short, for example only 20 points, may carry no usable fault information at all. In conclusion, the best result is obtained by the equipment fault diagnosis model driven by the fusion of federated learning and deep learning that is built with K-nearest-neighbor undersampling and 1000 FDMDFL & DL input nodes: the accuracy of classifying the 21 groups of data into 7 fault classes reaches 93.46%, which demonstrates the effectiveness of the method.
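The number of input nodes discussed above corresponds to the length of the signal window fed to the model. The patent does not state how the windows are cut, so the following is only one plausible sketch: non-overlapping windows by default, with a hypothetical `step` argument for overlapping ones, applied to a placeholder signal.

```python
def make_windows(signal, length, step=None):
    """Cut a 1-D signal into windows of `length` samples. `step` defaults to
    `length` (non-overlapping windows); trailing leftover samples are dropped."""
    step = step or length
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]

signal = [0.001 * i for i in range(5000)]  # placeholder for a vibration record
short_windows = make_windows(signal, length=20)
long_windows = make_windows(signal, length=1000)
overlapping = make_windows(signal, length=1000, step=500)
```

At the 12 kHz sampling rate of the experiment, a 1000-sample window spans roughly 83 ms of vibration while a 20-sample window spans under 2 ms, which is consistent with the observation that very short sequences carry little fault information.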

FIG. 4 and FIG. 5 show the accuracy and the loss value, respectively, for 1000 input nodes. The dashed line with square markers gives the accuracy and loss of FDMDFL & DL, the dashed line with dot markers those of CT, the dashed line with triangle markers those of FA, and the dashed line with cross markers those of Imb-FDMDFL & DL. Regarding the training process, the trusted factories taking part in FDMDFL & DL and FA are drawn at random by the central server in each round, so some trusted factories are selected many times during training and test well, while others are selected less often and test poorly, which makes the test curves zigzag. After many iterations, however, the test curves smooth out; the FDMDFL & DL test results are better than the FA test results, and both are clearly better than CT. Regarding training speed, the trusted factories taking part in FDMDFL & DL and FA train synchronously, and their private data sets do not need to be uploaded to a central data center. Without revealing private or sensitive factory data, this greatly reduces training time, saves model training cost and speeds up training. CT, by contrast, spends a large amount of time uploading data to the data center, and the volume of data to train on after centralization is huge, which lengthens the training cycle of the whole model and slows down model update iterations. The time cost of the federated distributed framework is therefore lower than that of central training.

Claims

1. An equipment fault diagnosis method driven by the fusion of federated learning and deep learning, characterized by comprising the following steps:

S2, extracting features from the data collected by the trusted factories: to meet the training requirements of the distributed training sub-end LSTM fault diagnosis model of the previous step, training uses the data held on the local storage medium, and the training set is loaded by reading it in; according to the fault characteristics, the common part of the data collected by the trusted factories is extracted, and these common features are horizontally combined with the data sets within each factory domain;

S6, completing the training of the globally shared model: training stops once each local model reaches the number of iterations set by the training collaborator; the training collaborator confirms that the joint loss function has converged, parameter uploading stops and updating stops; once the globally shared model has obtained its optimal parameters, the globally shared model on the central server and the distributed training sub-end LSTM fault diagnosis models at the trusted factories have the same structure, so in each analysis of a plant equipment fault diagnosis problem the model fits the data features after reading the input samples; the results are mapped to the corresponding plant equipment fault types.

2. The equipment fault diagnosis method driven by the fusion of federated learning and deep learning according to claim 1, characterized in that the process of the distributed training sub-end LSTM fault diagnosis model in step S1 includes the following steps:

1-5) normalizing the data: computing the sample mean $\mu$ and the standard deviation $\sigma$, and normalizing the data $X$ to obtain the normalized data $X^{*}$:

$$X^{*} = \frac{X - \mu}{\sigma}$$

1-6) dividing the data set to obtain a training data set and a test data set;

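3. The equipment fault diagnosis method driven by the fusion of federated learning and deep learning according to claim 1, characterized in that in step S5 the update gradient is calculated with the federated distributed equipment training algorithm defined on the basis of federated learning, in the following steps:

2-1) all trusted factories jointly fix the training objective of the local distributed training sub-end LSTM fault diagnosis models, namely:

$$\min_{w \in \mathbb{R}^{d}} f(w), \qquad f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$

where $n$ is the size of the data set formed by all trusted factories participating in the joint training, $w \in \mathbb{R}^{d}$ is the $d$-dimensional model parameter vector, $\mathbb{R}$ denotes the real numbers, $f_i(w) = \ell(x_i, y_i; w)$ is the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$, and $x_i$ and $y_i$ are the $i$-th training data point and its associated label;

2-2) converting the objective into:

$$f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{i \in P_k} f_i(w)$$

where $K$ is the number of trusted factories participating in federated learning, $P_k$ is the index set of the data points located at trusted factory $k$, $n_k = |P_k|$ is the cardinality of $P_k$, that is, the size of the set, and $F_k(w)$ is the average loss over the private data set computed at the distributed training sub-end LSTM fault diagnosis model parameters $w$;

2-3) for each factory $k$,

$$w_{t+1}^{k} = w_t - \eta\, g_k$$

so that

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k}$$

where $t = 1, 2, \ldots$ indexes the communication rounds, $\eta$ is the fixed learning rate of each trusted factory, $g_k = \nabla F_k(w_t)$ is the gradient of the average loss over the private data set computed at $w_t$, and $\nabla f_i(w)$ is the gradient of the loss of the prediction made on sample $(x_i, y_i)$ with the distributed training sub-end LSTM fault diagnosis model parameters $w$;

2-4) the base of the natural logarithm, $e$, and the loss of the diagnostic model are incorporated into the weight of each factory's local model parameter vector for balancing (the resulting weighting formula is not preserved in this text).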