CN116910652A - An equipment fault diagnosis method based on federated self-supervised learning - Google Patents
- Publication number: CN116910652A (application CN202310893683.7A)
- Authority: CN (China)
- Prior art keywords: client, feature extractor, data set, local, data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/10—Pre-processing; Data cleansing
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
Description
Technical field
The invention belongs to the field of rotating-equipment fault diagnosis and relates to machine learning, deep learning, and time-series classification; it specifically concerns an equipment fault diagnosis method based on federated self-supervised learning.
Background art
Rotating equipment such as aircraft engines and gas turbines is widely used in modern industry and is becoming increasingly complex and precise. Faults such as bearing damage or blade breakage can lead to serious accidents and enormous economic losses. Correctly identifying the operating state of equipment and intervening promptly at the first signs of failure is therefore important for improving production efficiency and reducing the cost of disasters.
Most current rotating-equipment fault diagnosis methods are data-driven: from large amounts of data, they mine the mapping from vibration signals to equipment states. Commonly used methods include support vector machines, decision trees, and neural networks. Deep neural networks are one of the current mainstream research directions because of their ability to extract features automatically. For example, the method disclosed in the patent "A machine-learning-based fault diagnosis method for rotating equipment in cement production" uses a one-dimensional convolutional neural network and a fully connected neural network to extract vibration features, and then uses ensemble learning to obtain a diagnosis from multiple classifiers. The method disclosed in the patent "A rotating equipment fault diagnosis method, system and readable storage medium based on a deep residual network" uses a deep residual network to extract fault features from vibration signals.
Although existing methods achieve high diagnostic accuracy, they train their models on large, fully labeled data sets, which is infeasible in many settings. The most common limitation is the lack of labels: vibration signals must be annotated by experts with domain knowledge, which is costly, so in practice only a small fraction of the data is labeled. Data security must also be considered. Different customers operate equipment under different working conditions, and to ensure model robustness, data should be collected from as many customers as possible. Customers, however, may be unwilling to share their data for training a better model, whether for commercial reasons or out of concern about the risk of data leakage.
Summary of the invention
To address the problems and needs described in the background art, the present invention proposes an equipment fault diagnosis method based on federated self-supervised learning, which can train a fault diagnosis model from multiple small, label-scarce data sets that are scattered across clients and cannot be shared, and then use the model for online diagnosis. Self-supervised learning is used to learn effective fault feature representations from large amounts of unlabeled data, and federated learning is used to train a fault diagnosis model with global knowledge across multiple clients without requiring any client to upload its local data. The method can therefore train an effective fault diagnosis model in scenarios where fault data sets are small, scattered, and largely unlabeled.
The specific technical solution of the present invention comprises the following steps:
S1: The server initializes the weights of the feature extractor and sends them to every client; each client uses them as the initial weights of its local feature extractor.
S2: Each client uses sensors to collect the signals generated by its local equipment during operation, records them as local vibration data, and preprocesses the data to obtain an unlabeled data set and a labeled data set.
S3: Under the federated self-supervised learning framework, each client trains its local feature extractor on its unlabeled data set, thereby obtaining a trained local feature extractor.
S4: Each client trains a classifier on its labeled data set under the supervised learning framework to obtain the corresponding client classifier; in each client, the current feature extractor connected to the client classifier forms the fault diagnosis model.
S5: The sensor data of the equipment to be diagnosed is preprocessed and fed into the fault diagnosis model of the corresponding client to obtain the equipment diagnosis result.
In S1, the feature extractor is a convolutional neural network with residual connections.
In S2, each client performs the following steps:
S21: Use an accelerometer to collect the signals generated by the local equipment during operation and record them as local vibration data.
S22: Randomly select a preset proportion of the local vibration data and label it according to the true state of the equipment, yielding an initial labeled data set; the unselected local vibration data forms the initial unlabeled data set.
S23: Use a sliding window to split every signal in the initial labeled and unlabeled data sets into multiple segments, yielding the segmented labeled and unlabeled data sets.
S24: Numerically scale the segmented labeled and unlabeled data sets with the max-min method to obtain the final labeled and unlabeled data sets.
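The segmentation and scaling of S23–S24 can be sketched as follows. This is a minimal illustration, not the patent's code; the window length, stride, and [0, 1] target range follow the embodiment described later, and the function names are my own.

```python
import numpy as np

def segment(signal: np.ndarray, window: int = 1024, stride: int = 1024) -> np.ndarray:
    """S23: split a 1-D signal into fixed-length segments with a sliding window."""
    n_segments = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window]
                     for i in range(n_segments)])

def min_max_scale(segments: np.ndarray) -> np.ndarray:
    """S24: scale each segment independently into the [0, 1] interval."""
    lo = segments.min(axis=1, keepdims=True)
    hi = segments.max(axis=1, keepdims=True)
    return (segments - lo) / (hi - lo + 1e-12)  # epsilon guards constant segments

raw = np.sin(np.linspace(0, 100, 4096))  # stand-in for one vibration record
windows = min_max_scale(segment(raw))
print(windows.shape)  # (4, 1024)
```

With stride equal to the window length the segments do not overlap, matching the non-overlapping split used in the embodiment.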
S3 specifically comprises:
S31: In each training round, every client trains its local feature extractor on its unlabeled data set under the self-supervised learning framework, and uploads the resulting local feature extractor weights to the server.
S32: The server aggregates the local feature extractor weights of all clients into global feature extractor weights and sends them back to every client's local feature extractor.
S33: Repeat S31–S32 for multiple rounds of global weight updates until a preset number of rounds is reached; the final global feature extractor weights are sent to every client's local feature extractor, so that each client obtains a trained local feature extractor.
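The round structure of S31–S33 can be sketched as below. This is an illustrative skeleton only: the local self-supervised update is replaced by a stub, and all names and numbers are assumptions, not from the patent.

```python
import numpy as np

def local_train(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Stub for one round of local self-supervised training (S31)."""
    return weights - 0.01 * rng.normal(size=weights.shape)  # pretend gradient step

def federated_rounds(n_clients=3, n_rounds=40, dim=8, data_sizes=(100, 200, 300)):
    rng = np.random.default_rng(0)
    global_w = rng.normal(size=dim)                    # S1: server initializes weights
    total = sum(data_sizes)
    for _ in range(n_rounds):                          # S33: repeat for preset rounds
        local_ws = [local_train(global_w.copy(), rng)  # S31: each client trains locally
                    for _ in range(n_clients)]
        global_w = sum((nk / total) * w                # S32: data-size-weighted average
                       for nk, w in zip(data_sizes, local_ws))
    return global_w                                    # broadcast as final extractor weights

final_w = federated_rounds()
print(final_w.shape)  # (8,)
```

Only weights travel between client and server in this loop; the raw data arrays never leave `local_train`, which is the privacy property the method relies on.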
In S31, each client performs the following steps:
S311: Apply two different data augmentation methods to every unlabeled sample in the unlabeled data set to obtain a pair of augmented samples.
S312: Under the self-supervised learning framework, run one round of local feature extractor training on the augmented-sample-pair data set derived from the unlabeled data set, and upload the weights of the local feature extractor after the current round to the server.
In S311, the first augmented sample x̃¹ is obtained by adding Gaussian noise to each unlabeled sample and then scaling the noisy data; the second augmented sample x̃² is obtained by smoothly warping the spacing of the time steps of each unlabeled sample and then adding noise.
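A minimal sketch of the two augmentations in S311, assuming the common jitter-and-scale and time-warp implementations from the time-series augmentation literature; the patent does not specify exact parameters, so `sigma` and `knots` here are illustrative.

```python
import numpy as np

def jitter_scale(x: np.ndarray, rng, sigma: float = 0.1) -> np.ndarray:
    """First view x~1: add Gaussian noise, then rescale the noisy signal."""
    noisy = x + rng.normal(scale=sigma, size=x.shape)
    return noisy * rng.normal(loc=1.0, scale=sigma)

def time_warp_jitter(x: np.ndarray, rng, sigma: float = 0.1, knots: int = 4) -> np.ndarray:
    """Second view x~2: smoothly warp the time-step spacing, then add noise."""
    t = np.arange(len(x), dtype=float)
    # build a monotone warp from a few randomly perturbed control points
    anchors = np.linspace(0, len(x) - 1, knots + 2)
    perturbed = np.clip(anchors + rng.normal(scale=sigma * len(x) / knots,
                                             size=anchors.shape), 0, len(x) - 1)
    perturbed[0], perturbed[-1] = 0, len(x) - 1
    warp = np.interp(t, anchors, np.sort(perturbed))   # warped sampling positions
    warped = np.interp(warp, t, x)                     # resample the signal
    return warped + rng.normal(scale=sigma, size=x.shape)

rng = np.random.default_rng(42)
x = np.sin(np.linspace(0, 6 * np.pi, 1024))
v1, v2 = jitter_scale(x, rng), time_warp_jitter(x, rng)
print(v1.shape, v2.shape)  # (1024,) (1024,)
```

The two views keep the fault-relevant structure of the signal while differing in noise and timing, which is what the contrastive losses below exploit.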
During training, the local feature extractor outputs a first feature matrix Z¹ and a second feature matrix Z² (one per augmented view), and its weights are optimized with a gradient descent algorithm. The loss function comprises a first loss and a second loss. The first loss loss1 combines a contextual contrastive term l_c and a temporal contrastive term l_t, weighted by the hyperparameters α and β and averaged over the batch.
Here N is the batch size; α and β are the first and second hyperparameters; 𝟙[·] is the indicator function; i and j are the first and second sample indices within a batch; Z̃¹ and Z̃² are the first and second cropped feature matrices, obtained by cropping Z¹ from the first start position s1 to the first end position e1 and Z² from the second start position s2 to the second end position e2; T is the length of the time dimension of the feature matrix; z̃²ᵢ,ₜ′ is the feature of sample i at time step t′ in Z̃²; z̃¹ᵢ,ₜ is the feature of sample i at time step t in Z̃¹; and z̃¹ᵢ, z̃²ᵢ, z̃²ⱼ denote samples i and j in the respective cropped matrices.
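The displayed formulas for l_c, l_t, and loss1 are not reproduced in this copy of the text. A plausible reconstruction, assuming the NT-Xent-style contextual/temporal contrastive pair (as in the TS2Vec family of methods) that the surrounding symbol definitions match, is:

```latex
l_c(i,t) = -\log
  \frac{\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{2}_{i,t}\big)}
       {\sum_{j=1}^{N}\Big(\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{2}_{j,t}\big)
        + \mathbb{1}_{[i\neq j]}\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{1}_{j,t}\big)\Big)}

l_t(i,t) = -\log
  \frac{\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{2}_{i,t}\big)}
       {\sum_{t'=1}^{T}\Big(\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{2}_{i,t'}\big)
        + \mathbb{1}_{[t\neq t']}\exp\!\big(\tilde{z}^{1}_{i,t}\cdot\tilde{z}^{1}_{i,t'}\big)\Big)}

\mathrm{loss1} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}
  \big(\alpha\, l_c(i,t) + \beta\, l_t(i,t)\big)
```

In this reconstruction the contextual term contrasts samples i and j at a fixed time step, while the temporal term contrasts time steps t and t′ of a fixed sample, matching the roles described in the text; the dot denotes an inner product over the feature dimension.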
The second loss function loss2 constrains the distance between the features produced by the current feature extractor and those produced in earlier training rounds. It is computed from Z¹⁽ᵏ,ʳ⁾ and Z²⁽ᵏ,ʳ⁾, the first and second feature matrices extracted by the k-th client's local feature extractor in round r; Z¹⁽ᵏ,ᵖ⁾ and Z²⁽ᵏ,ᵖ⁾, the corresponding matrices from an earlier round p; and the features extracted by each client using the global feature extractor weights received from the server in round r.
In S32, the server aggregates the local feature extractor weights of all clients by weighted averaging:
θ_G = Σ_k ( |D_k| / |D| ) · θ_k
where k is the client index, |D_k| is the data volume of the k-th client, |D| is the total data volume across all clients, θ_G is the global feature extractor weight, and θ_k is the local feature extractor weight of the k-th client.
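A minimal numeric sketch of this weighted average, with illustrative data volumes and two-dimensional stand-in weights:

```python
import numpy as np

# hypothetical per-client data volumes |D_k| and local weights theta_k
data_sizes = np.array([100, 300, 600])
thetas = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

coeffs = data_sizes / data_sizes.sum()              # |D_k| / |D| = 0.1, 0.3, 0.6
theta_global = (coeffs[:, None] * thetas).sum(axis=0)
print(theta_global)  # [0.7 0.9]
```

Clients with more data pull the global weights harder, which is the standard FedAvg behavior the formula describes.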
In S4, each client performs the following steps:
S41: Feed the labeled data set into the local feature extractor with the updated weights to obtain a feature matrix data set.
S42: Use the feature matrix data set as the input of a support vector machine and train it to obtain the client classifier.
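The classifier-training step of S4 (features from the frozen extractor, then a support vector machine) can be sketched with scikit-learn. The frozen extractor is replaced here by a fixed random projection, and the synthetic labeled set and all parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
projection = rng.normal(size=(1024, 64))  # stand-in for the frozen local feature extractor

def extract_features(segments: np.ndarray) -> np.ndarray:
    """Map 1024-point segments to 64-dimensional feature vectors."""
    return segments @ projection

# hypothetical labeled set: two equipment states with different signal statistics
X = np.vstack([rng.normal(loc=0.0, size=(50, 1024)),
               rng.normal(loc=1.0, size=(50, 1024))])
y = np.array([0] * 50 + [1] * 50)

features = extract_features(X)
clf = SVC(kernel="rbf").fit(features, y)  # train the client classifier on the features
print(clf.score(features, y))
```

Because the extractor is frozen at this point, only the lightweight SVM needs the scarce labels, which is the point of splitting S3 and S4.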
In S5, the sensor data of the equipment to be diagnosed is segmented with a sliding window and normalized with the max-min method before being fed into the fault diagnosis model of the corresponding client.
In the described method, federated learning trains a feature extractor that is effective for all clients from their scattered data; during training, self-supervised learning learns useful knowledge from large amounts of unlabeled data; and supervised learning on a small amount of labeled data improves the final performance of the classifier.
Compared with the prior art, the beneficial effects of the present invention are:
1. Model training is performed entirely on the clients; customer data never needs to be uploaded to the server, so customers need not worry about data leakage.
2. The method can learn knowledge from large amounts of unlabeled data and therefore performs well in current equipment fault diagnosis applications, where labels are generally scarce.
3. The invention uses contrastive self-supervised learning to train a robust model from small unlabeled data sets, and uses federated learning to aggregate a fault feature extractor with global knowledge from multiple clients while avoiding the sharing of local client data. It thereby solves the problem that rotating-equipment fault data sets are small, scattered, and lack labels, which makes it difficult to train a high-accuracy diagnosis model.
Brief description of the drawings
Figure 1 is a flow chart of the specific steps of the present invention.
Figure 2 is a schematic diagram of the training and use of the model in the present invention.
Figure 3 is a schematic diagram of the test bench in the embodiment of the present invention.
Figure 4 shows the data distribution of the three clients in the embodiment of the present invention.
Figure 5 is the confusion matrix of the fault diagnosis results of the first client in the embodiment of the present invention.
Figure 6 is the confusion matrix of the fault diagnosis results of the second client in the embodiment of the present invention.
Figure 7 is the confusion matrix of the fault diagnosis results of the third client in the embodiment of the present invention.
Detailed description
The present invention is further described below using the Case Western Reserve University bearing fault data set (CWRU) as the data for a specific embodiment.
The CWRU test bench consists of a motor, a torque transducer, and a dynamometer. The motor shaft is supported by the bearing under test; an accelerometer mounted on the motor samples one second of the vibration signal at 12 kHz. A schematic of the test bench is shown in Figure 3. Three fault types exist in the bearings, namely inner-race damage, outer-race damage, and roller damage, and each fault type is further divided into three severity levels. Counting the normal state, the bearing data therefore covers ten states in total.
Figures 1 and 2 show the workflow of the present invention. Combined with the CWRU data set, the embodiment comprises the following steps:
S1: The server initializes the weights of the feature extractor and sends them to every client; each client uses them as the initial weights of its local feature extractor.
In S1, the feature extractor is a convolutional neural network with residual connections (ResNet), a widely used deep learning model. The server and the clients use the same feature extractor architecture. To suit time-series data, the ResNet convolution kernels are restricted to slide only along the time dimension.
S2: Each client uses sensors to collect the signals generated by its local equipment during operation, records them as local vibration data, and preprocesses the data to obtain an unlabeled data set and a labeled data set.
In S2, each client performs the following steps:
S21: Use an accelerometer to collect the signals generated by the local equipment during operation and record them as local vibration data. Specifically, as shown in Figure 3, an accelerometer is mounted on the motor near the bearing. The motor is run, and at some later moment the accelerometer samples a ten-second signal while the bearing state is recorded.
S22: Randomly select a preset proportion of the local vibration data and label it according to the true equipment state it reflects, splitting the selection into an initial labeled data set and an initial test set at a ratio of 1:2; the unselected local vibration data forms the initial unlabeled data set. In this embodiment, the preset proportion is set to 30%.
S23: Use a sliding window of length 1024 with a stride of 1024 to split every signal of the initial labeled data set, the initial unlabeled data set, and the initial test set into non-overlapping segments, yielding the segmented labeled, unlabeled, and test sets. The CWRU data set contains vibration data for 161 bearings; this embodiment uses three clients as an example, so the 161 bearing records are segmented and then partitioned into the three clients' local data sets in a non-IID manner. The data distribution is shown in Figure 4; as can be seen, the distributions of the three clients differ considerably. For each client, the ratio of the local labeled, unlabeled, and test set sizes is approximately 1:7:2.
S24: Use the max-min method to numerically scale the segmented labeled, unlabeled, and test sets into the interval [0, 1], obtaining the final unlabeled, labeled, and test sets.
S3: Under the federated self-supervised learning framework, each client trains its local feature extractor on its unlabeled data set, thereby obtaining a trained local feature extractor.
S3 specifically comprises:
S31: In each training round, every client trains its local feature extractor on its unlabeled data set under the self-supervised learning framework and uploads the resulting weights to the server.
In S31, each client performs the following steps:
S311: Apply two different data augmentation methods to every unlabeled sample in the unlabeled data set to obtain a pair of augmented samples.
In S311, the first augmented sample x̃¹ is obtained by adding Gaussian noise to each unlabeled sample and then scaling the noisy data; the second augmented sample x̃² is obtained by smoothly warping the spacing of the time steps of each unlabeled sample and then adding noise. A batch of the unlabeled data set is denoted x_train = {x₁, x₂, ..., x_N}, where x ∈ R^L and N is the batch size. In this embodiment, the same noise and scaling factors, both sampled from a standard Gaussian distribution, are used for the whole batch.
S312: Under the self-supervised learning framework, run one round of local feature extractor training on the augmented-sample-pair data set and upload the weights of the local feature extractor after the current round to the server. In this embodiment, a ResNet extracts the feature representations of the augmented samples. The ResNet convolution kernel size is fixed at [3, 1] and restricted to slide only along the time dimension. The output dimension of the ResNet, i.e. the length of the feature representation, is set to 64.
During training, the local feature extractor f outputs a first feature matrix Z¹ and a second feature matrix Z², where T denotes the length of the time dimension of the feature matrices, i.e. the number of time steps. The extractor weights are optimized with a gradient descent algorithm; this embodiment uses the Adam optimizer with a learning rate of 3e-4. The loss function comprises a first loss and a second loss. In the first loss loss1, N is the batch size, and α and β are the first and second hyperparameters, which adjust the contribution weights of l_c and l_t. 𝟙[·] is the indicator function, equal to 1 when the condition in brackets holds and 0 otherwise; i and j are the first and second sample indices within a batch. l_c is the contextual contrastive loss, which shrinks the feature distance between the two augmented samples of a pair output by the feature extractor while enlarging the distance between an augmented sample and all other samples. l_t is the temporal contrastive loss, which further constrains the extractor output through time-step similarity: it shrinks the feature distance between augmented pairs at the same time-step position, while enlarging the distance between augmented pairs at different time-step positions and between different time steps of the same sample. l_c and l_t are contrastive loss terms designed in the spirit of the NT-Xent loss; they construct supervision signals from the context dependence and the time dependence between augmented samples within a batch, helping the feature extractor benefit from unlabeled data. Random cropping is applied to the first feature matrix Z¹ and the second feature matrix Z², splitting them into shorter segments: Z̃¹ is the first cropped feature matrix, obtained by cropping Z¹ from the first start position s1 to the first end position e1, and Z̃² is the second cropped feature matrix, obtained by cropping Z² from the second start position s2 to the second end position e2. Z̃¹ and Z̃² overlap on the interval [s2 : e1]. z̃²ᵢ,ₜ′ denotes the feature of sample i at time step t′ in Z̃²; z̃¹ᵢ,ₜ denotes the feature of sample i at time step t in Z̃¹; and z̃¹ᵢ, z̃²ᵢ, z̃²ⱼ denote samples i and j in the respective cropped matrices.
第二损失函数loss2用于约束特征特征提取器与之前训练轮次特征提取器的距离,第二损失函数loss2的计算公式如下:The second loss function loss2 is used to constrain the distance between the feature extractor and the feature extractor of the previous training round. The calculation formula of the second loss function loss2 is as follows:
其中,分别表示第k个客户端第r轮的本地特征提取器提取的第一特征矩阵/>和第一特征矩阵/>分别表示第k个客户端第p轮的本地特征提取器提取的第一特征矩阵/>和第一特征矩阵/>表示每个客户端使用在第r轮开始训练时从服务器接收的全局特征提取器权重提取的特征。in, Respectively represent the first feature matrix extracted by the local feature extractor of the k-th client in the r-th round/> and the first characteristic matrix/> Respectively represent the first feature matrix extracted by the local feature extractor of the p-th round of the k-th client/> and the first characteristic matrix/> Represents the features extracted by each client using the global feature extractor weights received from the server at the start of training in round r.
In this embodiment, the length of the feature representation retained after cropping is 32; within a batch, all data share the same cropping starting position, which is sampled from the uniform distribution U[0, 32].
S32: The server aggregates the local feature extractor weights of all clients to obtain the global feature extractor weights, and delivers these weights to each client's local feature extractor as the initial weights for that client's next training round;
In S32, the server aggregates the local feature extractor weights of all clients by weighted averaging, computed as follows:

θG = Σk (|Dk| / |D|) · θk
where k is the client index, |D| and |Dk| denote the total data volume across all clients and the data volume of the k-th client respectively, and θG and θk denote the global feature extractor weights and the local feature extractor weights of the k-th client respectively.
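This weighted-average aggregation is the standard FedAvg update and can be sketched over PyTorch state dicts (client weights and data sizes below are toy values):

```python
import torch

def fedavg(client_weights, client_sizes):
    """Aggregate client state dicts by data-volume-weighted averaging:
    theta_G = sum_k (|D_k| / |D|) * theta_k."""
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * w[name].float()
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# three clients holding a single scalar parameter each
clients = [{"w": torch.tensor([1.0])},
           {"w": torch.tensor([2.0])},
           {"w": torch.tensor([4.0])}]
sizes = [100, 100, 200]  # |D_1|, |D_2|, |D_3|; |D| = 400
theta_g = fedavg(clients, sizes)  # 0.25*1 + 0.25*2 + 0.5*4 = 2.75
```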
S33: Repeat S31-S32 to update the global feature extractor weights over multiple rounds until the preset number of rounds is reached, then deliver the final global feature extractor weights to each client's local feature extractor, so that each client obtains a trained local feature extractor.
In this embodiment, once all three clients have completed feature extractor training and uploaded their model weights, the server aggregates these weights into new global weights and sends them back to the clients. The number of training rounds is set to 40.
S4: Each client trains a classifier on its labeled data set under a supervised learning framework to obtain its client classifier; in each client, the fault diagnosis model is formed by connecting the current feature extractor to the client classifier;
In S4, each client performs the following steps:
S41: Input the labeled data set into the local feature extractor with the updated weights to obtain a feature matrix data set;
S42: Use the feature matrix data set as the input to a support vector machine and train the support vector machine to obtain the client classifier.
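Steps S41-S42 amount to fitting an SVM on frozen extracted features, e.g. with scikit-learn (the synthetic features below merely stand in for the extractor's output):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# stand-ins for the feature matrix data set produced by the frozen extractor:
# two fault classes with well-separated feature distributions
X = np.concatenate([rng.normal(0.0, 1.0, (50, 16)),
                    rng.normal(3.0, 1.0, (50, 16))])
y = np.array([0] * 50 + [1] * 50)  # fault labels from the labeled data set

clf = SVC(kernel="rbf")  # support vector machine as the client classifier
clf.fit(X, y)
train_acc = clf.score(X, y)
```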
S5: Feed the preprocessed sensor data of the equipment to be diagnosed into the corresponding client's fault diagnosis model to obtain the corresponding equipment diagnosis result.
In this embodiment, the test set obtained in step S24 is precisely the preprocessed data to be diagnosed. The fault diagnosis model classifies the test set data to produce the equipment diagnosis results.
The equipment diagnosis results were compared with the actual results, with accuracy (Acc) and macro-averaged F1 score (MF1) selected as the evaluation metrics for model performance. In this embodiment, the results for these metrics are shown in Table 1.
Table 1 is the model performance evaluation table
The evaluation results show that the method of the present invention successfully diagnoses the equipment with high diagnostic accuracy, demonstrating that the method is feasible and effective. Figures 5, 6, and 7 show the confusion matrices of the diagnosis results for the three clients. Although the data distributions of the clients differ considerably, the method of the present invention achieves extremely high diagnostic accuracy in all cases.
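The Acc and MF1 metrics used in the evaluation are standard and available in scikit-learn; a toy illustration (the labels below are invented, not the patent's actual results):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 1, 1, 2, 2]  # actual fault classes
y_pred = [0, 0, 1, 0, 2, 2]  # diagnosis model output

acc = accuracy_score(y_true, y_pred)             # fraction correct: 5/6
mf1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
```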
The above embodiment implements the present invention on the Case Western Reserve University bearing fault data set, but the fault diagnosis method of the present invention is not limited to bearings. Any similar scheme that collects equipment operation data through sensors and performs equipment fault diagnosis in accordance with the principles and ideas of the present invention falls within the protection scope of this patent.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310893683.7A CN116910652A (en) | 2023-07-20 | 2023-07-20 | An equipment fault diagnosis method based on federated self-supervised learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116910652A true CN116910652A (en) | 2023-10-20 |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117992873A (en) * | 2024-03-20 | 2024-05-07 | 合肥工业大学 | Transformer fault classification method and model training method based on heterogeneous federated learning |
CN117992873B (en) * | 2024-03-20 | 2024-06-11 | 合肥工业大学 | Transformer fault classification method and model training method based on heterogeneous federated learning |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||