CN110414383A - Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application


Info

Publication number
CN110414383A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
domain
target
source domain
Prior art date
Legal status
Pending
Application number
CN201910624662.9A
Other languages
Chinese (zh)
Inventor
袁烨
周倍同
程骋
李星毅
马贵君
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910624662.9A
Publication of CN110414383A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/192: Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194: References adjustable by an adaptive method, e.g. learning
    • G06F2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08: Feature extraction
    • G06F2218/12: Classification; Matching


Abstract

The invention relates to a Wasserstein distance-based adversarial transfer learning method for convolutional neural networks and its application. The method comprises: using the convolutional neural network to be transferred to obtain a source-domain feature set and a source-domain fault-judgment set from a labeled source-domain sample set, and a target feature set from a target-domain sample set; then, with the goals of maximizing the Wasserstein distance between the source-domain feature set and the target feature set and minimizing the sum of that Wasserstein distance and the judgment loss on the source-domain fault-judgment set, performing adversarial transfer learning of the convolutional neural network based on a convergence criterion. The invention introduces the Wasserstein distance into the transfer learning of a convolutional neural network: maximizing the Wasserstein distance sharpens the sensitivity with which the features extracted from the two sample sets can be distinguished, while minimizing the sum of the Wasserstein distance and the source-domain judgment loss improves the judgment accuracy of the convolutional neural network. The method guarantees fault-diagnosis capability while imposing low requirements on sample data and network structure, is applicable to transfer between multiple working conditions, and has strong practical applicability.

Description

Convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application

Technical Field

The invention belongs to the field of industrial process fault diagnosis and, more particularly, relates to a Wasserstein distance-based adversarial transfer learning method for convolutional neural networks and its application.

Background Art

Fault diagnosis aims to isolate faults in a system by monitoring and analyzing machine status using acquired measurement data and other information. Doing so manually requires highly skilled and experienced experts, which motivates the use of artificial intelligence techniques for fault-diagnosis decision making. Deploying a real-time fault-diagnosis framework allows maintenance teams to act early to replace or repair affected components, improving productivity and ensuring operational safety. Accurate diagnosis of bearing faults is therefore of critical significance to the reliability and safety of mechanical manufacturing systems.

Many advanced signal-processing and machine-learning techniques have been applied to fault diagnosis. In the past few years, deep learning models (e.g., deep belief networks, sparse autoencoders, and especially convolutional neural networks) have shown better fitting and learning capability than rule-based and model-based methods in fault-diagnosis tasks. Transfer learning, in turn, aims to leverage knowledge from a related source domain (with sufficient labeled data) to learn in a target domain (with insufficient or no labeled data), saving the considerable time and computational cost of rebuilding a new fault-diagnosis model from scratch.

However, the above deep learning models still face two difficulties. 1) Most methods rely on the independent and identically distributed (i.i.d.) assumption, i.e., the sample sets of the source-domain and target-domain tasks must share the same distribution. The adaptability of a pre-trained network is therefore limited when facing a new diagnostic task, where different operating conditions and physical properties may cause a distribution discrepancy between the new sample set (target sample set) and the original sample set (source sample set). As a result, for a new fault-diagnosis task a deep learning model usually has to be trained from scratch, wasting computational resources and training time. 2) Insufficient labeled or unlabeled data in the target domain is another common problem. In practical industrial settings it is extremely difficult, for a new diagnostic task, to collect enough typical samples to build a large-scale, high-quality dataset for network training, since it is hard to install enough sensors in enclosed equipment and industrial labeling usually requires expensive manual effort. The challenge of domain adaptation is therefore that labeled data either cannot be collected in the target domain at all, or only in small quantities.

To address these challenges, current deep transfer learning algorithms have made several improvements. The first is instance-based transfer, which assigns appropriate weights to data collected in the source domain and transplants the selected source-domain data into the target domain as a data supplement; this method is model-independent but requires high similarity between the data of the two domains. The second is network-structure-based transfer, which transfers part of a network pre-trained on the source domain, including its structure and connection parameters, and converts it into part of the deep neural network used in the target domain; this method works well only for certain architectures, such as LeNet, AlexNet, VGG, Inception, and ResNet. The third is domain-fusion-based transfer, of which the maximum mean discrepancy (MMD) method is the most common; however, its computational cost grows quadratically with the number of samples, which limits the applicability of MMD in many practical applications with large datasets.

Therefore, developing a deep transfer learning algorithm with strong industrial practicability and high fault-diagnosis capability is an urgent technical problem to be solved in the field of industrial process fault diagnosis.

Summary of the Invention

The present invention provides a Wasserstein distance-based adversarial transfer learning method for convolutional neural networks and its application, to solve the technical problem that existing deep transfer learning methods, while guaranteeing industrial-process fault-diagnosis capability, place high demands on actual sample data and/or neural-network structure and are therefore difficult to apply to real industrial processes.

The technical solution of the present invention to the above technical problem is as follows: a Wasserstein distance-based adversarial transfer learning method for convolutional neural networks, comprising:

Step 1: from the labeled source-domain dataset and the target-domain dataset, determine a labeled source-domain sample set and a target-domain sample set;

Step 2: using the convolutional neural network to be transferred, obtain a source-domain feature set and a source-domain fault-judgment set from the labeled source-domain sample set, and a target feature set from the target-domain sample set;

Step 3: with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of that Wasserstein distance and the judgment loss of the source-domain fault-judgment set, adjust the parameters of the convolutional neural network; based on a convergence criterion, either repeat step 1 or complete the adversarial transfer learning of the convolutional neural network.
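Written out (a hedged formalization using the notation introduced in the detailed description below, where $\theta_f$ denotes the feature-extractor parameters, $\theta_d$ the fault-classifier parameters, $\theta_c$ the domain-critic parameters, $l_c$ the classification loss, and $l_{wd}$ the empirical Wasserstein distance), the two opposing goals of step 3 amount to the min-max problem

$$\min_{\theta_f,\,\theta_d}\left\{ l_c + \lambda \max_{\theta_c} l_{wd} \right\},$$

where $\lambda$ balances domain confusion against classification accuracy.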

The beneficial effects of the present invention are as follows. The invention introduces the Wasserstein distance into the transfer learning of a convolutional neural network: maximizing the Wasserstein distance sharpens the sensitivity with which the features extracted from the two sample sets can be distinguished, while minimizing the sum of the Wasserstein distance and the judgment loss of the source-domain fault-judgment set improves the judgment accuracy of the convolutional neural network. The network parameters are optimized with respect to both objectives, which oppose each other during training, thereby realizing deep adversarial transfer learning. The convolutional neural network trained by the invention can therefore minimize the distribution discrepancy between the source and target domains, and unsupervised transfer learning with unlabeled target-domain data is sufficient to obtain highly accurate fault-diagnosis capability. This alleviates, to a certain extent, the prior-art problems of wasted computational resources and training time caused by rebuilding a deep learning model from scratch for each new fault-diagnosis task, and of the lack of sufficient labeled data in the target domain. Moreover, the method places no particularly high requirements on sample data or network structure while guaranteeing industrial-process fault-diagnosis capability, making it suitable for fault diagnosis in real industrial processes.

On the basis of the above technical solution, the present invention can be further improved as follows.

Further, the Wasserstein distance is computed by using a domain-similarity evaluation neural network to map the source-domain feature set to a source-domain real-number set and the target feature set to a target real-number set, and subtracting the mean of the target real-number set from the mean of the source-domain real-number set.

A further beneficial effect of the present invention is as follows. The invention introduces a domain-similarity evaluation neural network that maps the source-domain and target feature sets to real numbers and computes the Wasserstein distance from those real numbers. Training the domain-similarity evaluation network with the goal of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Computing the Wasserstein distance from the means of real numbers is simple and highly reliable.
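As an illustrative sketch only (not the claimed method's fixed implementation), the mean-difference estimate described above can be written in a few lines of Python; the two-layer critic with a 128-unit hidden layer and a scalar output follows the DC network described in the embodiments, and all parameter names are assumptions:

```python
import numpy as np

def critic(h, W1, b1, W2, b2):
    """Two-layer domain-similarity critic mapping each feature vector to a
    real number (hidden width 128 -> scalar output, as in the embodiments)."""
    z = np.maximum(0.0, h @ W1 + b1)  # ReLU hidden layer
    return (z @ W2 + b2).ravel()      # one real number per sample

def empirical_wasserstein(h_s, h_t, params):
    """Empirical Wasserstein estimate: mean of the source real-number set
    minus the mean of the target real-number set."""
    return critic(h_s, *params).mean() - critic(h_t, *params).mean()
```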

Further, step 3 comprises:

Step 3.1: with the goal of increasing the Wasserstein distance between the source-domain feature set and the target feature set, optimize the parameters of the domain-similarity evaluation network to obtain, after a preset number of iterations, a new domain-similarity evaluation network;

Step 3.2: based on the labeled source-domain sample set, the target-domain sample set, and the new domain-similarity evaluation network, and with the goal of reducing the sum of the Wasserstein distance and the judgment loss of the convolutional neural network, optimize the parameters of the convolutional neural network to obtain, after a preset number of iterations, a new convolutional neural network;

Step 3.3: based on the convergence criteria of the domain-similarity evaluation network and the convolutional neural network, stop training and complete the adversarial transfer learning of the convolutional neural network, or repeat step 1.

A further beneficial effect of the present invention is as follows. Training the domain-similarity evaluation network with the goal of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Afterwards, the parameters of the domain-similarity evaluation network are fixed and the parameters of the convolutional neural network are adjusted: on the one hand, the judgment loss of the source-domain sample set is obtained from the convolutional neural network; on the other hand, the source-domain and target-domain feature sets extracted by the convolutional neural network are fed into the trained domain-similarity evaluation network to obtain the Wasserstein distance, and the fault-judgment convolutional neural network is trained with the goal of minimizing the sum of the Wasserstein distance and the judgment loss. By thus alternating between training the domain-similarity evaluation network and training the convolutional neural network, and iterating repeatedly until both networks converge, the training of the convolutional neural network is completed and the network acquires a high fault-judgment capability.

Further, in step 3.1 the Wasserstein distance is the difference between the mean of the source-domain real-number set and the mean of the target real-number set, subject to a gradient penalty;

in step 3.2, the Wasserstein distance is the difference between the mean of the source-domain real-number set and the mean of the target real-number set.

A further beneficial effect of the present invention is as follows. The invention introduces a gradient penalty to constrain the domain-similarity evaluation network, preventing its parameters from becoming overly complex, reducing computational complexity, and improving the convergence speed of both the domain-similarity evaluation network and the convolutional neural network. In addition, when optimizing the parameters of the convolutional neural network the gradient penalty is ignored; it does not affect the learning of the feature representation, so the penalty does not impair the fault-diagnosis capability of the convolutional neural network.

Further, the Wasserstein distance represents the mean of a source-domain real-number subset minus the mean of a target real-number subset, wherein the source-domain subset is obtained from the source-domain real-number set through a Lipschitz constraint and the target subset is obtained from the target real-number set through the Lipschitz constraint.

A further beneficial effect of the present invention is as follows. Based on the Lipschitz constraint, the real-number sets required to compute the Wasserstein distance are obtained and the data dimensionality is reduced, improving the convergence speed of the domain-similarity evaluation network and the convolutional neural network while preserving the fault-diagnosis capability of the convolutional neural network.

Further, the Lipschitz constraint is specifically a first-order Lipschitz constraint.

A further beneficial effect of the present invention is as follows. Using the first-order Lipschitz constraint to compute the Wasserstein distance guarantees the continuity and differentiability of the convolutional neural network, and improves the convergence speed of the domain-similarity evaluation network and the convolutional neural network while preserving the fault-diagnosis capability of the convolutional neural network.

Further, when the target-domain sample set contains some labeled samples, step 2 becomes:

using the convolutional neural network to be transferred, obtain the source-domain feature set and source-domain fault-judgment set of the labeled source-domain sample set as well as the target feature set and target fault-judgment set of the target-domain sample set;

and step 3 becomes:

with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of the Wasserstein distance, the judgment loss of the source-domain fault-judgment set, and the judgment loss of the target fault-judgment set, adjust the parameters of the convolutional neural network; based on the convergence criterion, repeat step 1, or complete the adversarial transfer learning of the convolutional neural network.

A further beneficial effect of the present invention is as follows. Even when facing transfer between tasks that are related but not sufficiently similar (for example, transfer learning between different sensor locations), the deep adversarial transfer learning method of the invention greatly improves transfer-learning accuracy by adding only a small number of labeled target-domain samples, compared with the unsupervised case using large numbers of unlabeled samples. The invention is therefore applicable not only to unlabeled target-domain sample sets but also to partially labeled ones, and can be applied flexibly. Moreover, when a partially labeled target-domain sample set is used, the judgment accuracy of the adversarially transferred convolutional neural network for industrial-process faults is further improved, giving strong practicability.

The present invention also provides an industrial-process fault-diagnosis convolutional neural network, trained by any of the Wasserstein distance-based adversarial transfer learning methods described above.

The beneficial effect is as follows. Based on the correlation between tasks, the convolutional neural network trained by any of the above Wasserstein distance-based adversarial transfer learning methods transfers, through deep adversarial transfer learning, the convolutional neural network corresponding to the source domain into a high-accuracy convolutional neural network applicable to both the source and target domains, alleviating to a certain extent the prior-art problems of wasted computational resources and training time caused by rebuilding a deep learning model from scratch for a new fault-diagnosis task and of the lack of sufficient sample data in the target domain.

The present invention also provides an industrial-process fault-diagnosis method: based on any industrial-process fault-diagnosis convolutional neural network described above, when a new sample of the target domain described in any of the above Wasserstein distance-based adversarial transfer learning methods is received, the industrial-process fault-judgment result corresponding to the new sample is obtained.

The beneficial effect is as follows. Using a convolutional neural network trained by any of the above Wasserstein distance-based adversarial transfer learning methods for industrial-process fault diagnosis achieves higher fault-diagnosis accuracy while ensuring safety and efficiency.

The present invention also provides a storage medium storing instructions which, when read by a computer, cause the computer to execute any of the above Wasserstein distance-based adversarial transfer learning methods and/or any of the industrial-process fault-diagnosis methods described above.

Brief Description of the Drawings

Fig. 1 is a block flow diagram of a Wasserstein distance-based adversarial transfer learning method for convolutional neural networks provided by an embodiment of the present invention;

Fig. 2 is a flowchart of the Wasserstein distance-based adversarial transfer learning method provided by an embodiment of the present invention;

Fig. 3 is a visualization of the convolutional-neural-network outputs for the transfer task US(C)→US(A) provided by an embodiment of the present invention;

Fig. 4 is a visualization of the convolutional-neural-network outputs for the transfer task US(E)→US(F) provided by an embodiment of the present invention;

Fig. 5 shows diagnostic accuracy versus number of labeled samples for task (a) US(E)→US(F) and task (b) S(E)→S(F), provided by an embodiment of the present invention.

Detailed Description of the Embodiments

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

Embodiment 1

A Wasserstein distance-based adversarial transfer learning method 100 for convolutional neural networks, as shown in Fig. 1, comprises:

Step 110: from the labeled source-domain dataset and the target-domain dataset, determine a labeled source-domain sample set and a target-domain sample set;

Step 120: using the convolutional neural network to be transferred, obtain the source-domain feature set and source-domain fault-judgment set of the labeled source-domain sample set and the target feature set of the target-domain sample set;

Step 130: with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of the Wasserstein distance and the judgment loss of the source-domain fault-judgment set, adjust the parameters of the convolutional neural network; based on the convergence criterion, repeat step 110, or complete the adversarial transfer learning of the convolutional neural network.

It should be noted that, before the present method is applied, a convolutional neural network must first be trained on the source-domain dataset; this network is the convolutional neural network to be transferred.

Specifically, the convolutional neural network model is first pre-trained using the labeled source-domain dataset $X_s$. A convolutional layer containing a filter $w$ of size $k$ and a bias $b$ is used to compute features. An output feature $v_i$ is obtained from the filter $w$ and a nonlinear activation function $\Gamma$ as $v_i = \Gamma(w * u_j + b)$, where $u_j$ is the input data representing the $j$-th sub-vector of the source-domain dataset $X_s$ and $*$ denotes the convolution operation. The rectified linear unit (ReLU) is used as the nonlinear activation function to reduce the risk of vanishing gradients affecting the deep-learning optimization process. The feature map is therefore defined as $v = [v_1, v_2, \ldots, v_L]$, where $L = (pN - s)/I_{cv} + 1$ is the number of features, $p$ is the padding size, $N$ is the input dimension, $s$ is the filter size, and $I_{cv}$ is the stride of the convolution operation.
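For concreteness, a minimal NumPy sketch of the convolutional feature computation described above follows; it is illustrative only, and the variable names mirror the notation in the text:

```python
import numpy as np

def conv1d_relu(x, w, b, stride=2, padding=0):
    """Compute v_i = ReLU(w * u_j + b) over all sub-vectors u_j of x.
    The number of features is L = (padded length - s) // stride + 1."""
    if padding:
        x = np.pad(x, padding)
    s = len(w)
    L = (len(x) - s) // stride + 1
    v = np.empty(L)
    for i in range(L):
        u = x[i * stride : i * stride + s]   # j-th sub-vector of the input
        v[i] = max(0.0, np.dot(w, u) + b)    # ReLU activation
    return v
```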

Then, a max-pooling layer is applied to the feature map to extract the maximum feature value within each stride range; the filter size of the max pooling is $\beta$ and its stride is $I_{pl}$.

By alternately stacking multiple convolutional layers and max-pooling layers (with variable filter sizes), a multi-layer structure describing the features is constructed. The output features of the multi-layer structure are flattened and passed to fully connected layers for classification, producing the final output as a probability distribution over the labels. The convolutional neural network pre-trained in the source domain uses the Softmax function to obtain the final classification result. To measure the difference between the predicted labels $\hat{y}_s$ and the true labels $y_s$ in the source domain, the loss $l_c$ is computed using the cross entropy, i.e.,

$$l_c = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k} y_{s,k}^{(i)} \log \hat{y}_{s,k}^{(i)}.$$

The transferable features of the target domain with unlabeled data are obtained directly through the feature extractor of the convolutional neural network trained above (i.e., the network structure before the fully connected layers, consisting of the multi-layer structure of alternately stacked convolutional and max-pooling layers).

The next step is to measure the difference in feature distribution between the source and target datasets. Specifically, adversarial training based on the Wasserstein distance in a common latent space of the two feature distributions can be used to learn to extract invariant features. The feature extractor of the pre-trained convolutional neural network model is used to learn features from the two domains. For $n < N_s, N_t$, given two mini-batches of instances from $X_s$ and $X_t$, both are passed through the feature extractor $r_f$ with parameters $\theta_f$, directly generating the source-domain features $h_s = r_f(x_s)$ and the target-domain features $h_t = r_f(x_t)$. Let $\mathbb{P}_{h_s}$ and $\mathbb{P}_{h_t}$ denote the distributions of $h_s$ and $h_t$, respectively, where the subscripts $(\cdot)_s$ and $(\cdot)_t$ denote source-domain and target-domain information.

The purpose of domain adaptation via the Wasserstein distance is to optimize the parameters $\theta_f$ so as to reduce the distance between the distributions $\mathbb{P}_{h_s}$ and $\mathbb{P}_{h_t}$, i.e., to learn invariant features of the two domains.

This embodiment adopts the idea of optimizing the model by back-propagation. The Wasserstein distance is introduced into the transfer learning of the convolutional neural network: maximizing the Wasserstein distance sharpens the sensitivity with which the features extracted from the two sample sets can be distinguished, and minimizing the sum of the Wasserstein distance and the judgment loss of the source-domain fault-judgment set improves the judgment accuracy of the convolutional neural network. The network parameters are changed with respect to both objectives, which oppose each other during training, realizing deep adversarial transfer learning. The convolutional neural network trained in this embodiment can therefore minimize the distribution discrepancy between the source and target domains, and unsupervised transfer learning with unlabeled target-domain data suffices to obtain highly accurate fault-diagnosis capability, alleviating to a certain extent the prior-art problems of wasted computational resources and training time caused by rebuilding a deep learning model from scratch for a new fault-diagnosis task and of the lack of sufficient sample data in the target domain.

Preferably, the Wasserstein distance is computed by using the domain-similarity evaluation neural network to map the source-domain feature set to a source-domain real-number set and the target feature set to a target real-number set, and subtracting the mean of the target real-number set from the mean of the source-domain real-number set.

This embodiment introduces a domain-similarity evaluation neural network that maps the source-domain and target feature sets to real numbers and computes the Wasserstein distance from these real numbers. Training the domain-similarity evaluation network with the goal of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Computing the Wasserstein distance from the means of real numbers is simple and highly reliable.

Preferably, step 130 comprises:

Step 131: with the goal of increasing the Wasserstein distance between the source-domain feature set and the target feature set, adjust the parameters of the domain-similarity evaluation network to obtain, after a preset number of iterations, a new domain-similarity evaluation network;

Step 132: based on the labeled source-domain sample set, the target-domain sample set, and the new domain-similarity evaluation network, and with the goal of reducing the sum of the Wasserstein distance and the judgment loss of the convolutional neural network, change the parameters of the convolutional neural network to obtain, after a preset number of iterations, a new convolutional neural network;

Step 133: based on the convergence criteria of the domain-similarity evaluation network and the convolutional neural network, stop training and complete the adversarial transfer learning of the convolutional neural network, or repeat step 110.

Training the domain-similarity evaluation network with the goal of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Afterwards, the parameters of the domain-similarity evaluation network are fixed and the parameters of the convolutional neural network are adjusted: on the one hand, the judgment loss of the source-domain sample set is obtained from the convolutional neural network; on the other hand, the source-domain and target-domain feature sets extracted by the convolutional neural network are fed into the trained domain-similarity evaluation network to obtain the Wasserstein distance, and the fault-judgment convolutional neural network is trained with the goal of minimizing the sum of the Wasserstein distance and the judgment loss. By thus alternating between training the domain-similarity evaluation network and training the convolutional neural network, and iterating repeatedly until both networks converge, the training of the convolutional neural network is completed and the network acquires a high fault-judgment capability.

Preferably, the Wasserstein distance represents the mean of the source-domain real-number subset minus the mean of the target real-number subset, where the source-domain subset is obtained from the source-domain real-number set through a Lipschitz constraint and the target subset is obtained from the target real-number set through the Lipschitz constraint.

Based on the Lipschitz constraint, the real-number sets required to compute the Wasserstein distance are obtained and the data dimensionality is reduced, which improves the convergence speed of the domain-similarity evaluation network and the convolutional neural network while preserving the fault-diagnosis capability of the convolutional neural network.

Preferably, the Lipschitz constraint is specifically a first-order Lipschitz constraint.

Using the first-order Lipschitz constraint to compute the Wasserstein distance reduces computational complexity and improves the convergence speed of the domain-similarity evaluation network and the convolutional neural network while preserving the fault-diagnosis capability of the convolutional neural network.

Preferably, the Wasserstein distance is a gradient-penalty-based Wasserstein distance.

The gradient penalty is introduced to constrain the neural network, preventing the parameters from becoming overly complex, reducing computational complexity, and improving the convergence speed of the domain-similarity evaluation network and the convolutional neural network.

Specifically, as shown in Fig. 2, in step 130 a neural network for evaluating domain similarity (denoted Domain Critic, abbreviated DC) is introduced to learn a mapping, written here as $f_c$, with parameters $\theta_c$ that maps the features of the source and target domains to real numbers.

The Wasserstein distance can then be computed through the duality

$$W_1(\mathbb{P}_{h_s}, \mathbb{P}_{h_t}) = \sup_{\|f_c\|_L \le 1} \mathbb{E}_{h \sim \mathbb{P}_{h_s}}[f_c(h)] - \mathbb{E}_{h \sim \mathbb{P}_{h_t}}[f_c(h)],$$

where the supremum is taken over all first-order Lipschitz functions $f_c$, and the empirical Wasserstein distance can be approximated as

$$l_{wd} = \frac{1}{n_s}\sum_{x_s \in X_s} f_c(r_f(x_s)) - \frac{1}{n_t}\sum_{x_t \in X_t} f_c(r_f(x_t)),$$

where $l_{wd}$ denotes the DC loss between the source-domain data $X_s$ and the target-domain data $X_t$ (named the empirical Wasserstein-1 distance). The maximum of $l_{wd}$ is then sought subject to the Lipschitz constraint; in practice the DC parameters $\theta_c$ can be trained in combination with a gradient penalty

$$l_{grad} = \left(\|\nabla_h f_c(h)\|_2 - 1\right)^2,$$

in which the feature representation $h$ consists of the generated source- and target-domain features (i.e., $h_s$ and $h_t$) together with points $h_r$ randomly selected along the straight lines between pairs of $h_s$ and $h_t$. Since the Wasserstein-1 distance is differentiable and continuous almost everywhere, the DC is trained by solving the optimization problem

$$\max_{\theta_c}\left\{ l_{wd} - \rho\, l_{grad} \right\},$$

where $\rho$ is the balance coefficient.
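A hedged TensorFlow sketch of the domain-critic objective above follows; `critic` is assumed to be a small Keras model mapping features to a scalar (128 -> 1, as in the embodiments), and all names are illustrative rather than the patent's exact implementation:

```python
import tensorflow as tf

def critic_loss(critic, h_s, h_t, rho=10.0):
    """Negative of the DC objective l_wd - rho * l_grad; minimizing this with
    the critic optimizer is equivalent to maximizing the objective."""
    # Empirical Wasserstein-1 distance: difference of critic output means.
    l_wd = tf.reduce_mean(critic(h_s)) - tf.reduce_mean(critic(h_t))

    # Gradient penalty at points h_r sampled on lines between h_s/h_t pairs.
    eps = tf.random.uniform([tf.shape(h_s)[0], 1])
    h_r = eps * h_s + (1.0 - eps) * h_t
    with tf.GradientTape() as tape:
        tape.watch(h_r)
        f_r = critic(h_r)
    grads = tape.gradient(f_r, h_r)
    l_grad = tf.reduce_mean((tf.norm(grads, axis=1) - 1.0) ** 2)

    return -(l_wd - rho * l_grad)
```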

Using an unlabeled target-domain sample set as above is a form of unsupervised feature learning for domain adaptation, but the features learned in the two domains may not yet be separated well enough. The ultimate goal of this embodiment is to develop an accurate deep-transfer-learning classifier for the target domain, which requires incorporating supervised learning on labeled source-domain (and target-domain, if available) data into the invariant-feature-learning problem. A discriminator (with two fully connected layers) is then used to further reduce the distance between the source- and target-domain feature distributions. In this step, the DC parameters $\theta_c$ are those trained above, and the parameters $\theta_f$ are updated to optimize the minimization operator.

The final objective function can be expressed in terms of the discriminator's cross-entropy loss $l_c$ and the above empirical Wasserstein distance $l_{wd}$ related to the domain discrepancy, namely

$$\min_{\theta_f,\,\theta_d}\left\{ l_c + \lambda \max_{\theta_c}\left[ l_{wd} - \rho\, l_{grad} \right] \right\},$$

where $\theta_d$ denotes the parameters of the discriminator and $\lambda$ is a hyperparameter determining the degree of domain confusion. When optimizing the above minimization operator, the gradient penalty $l_{grad}$ is ignored (i.e., $\rho$ is set to 0), since it should not affect the learning process of the representation.
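Putting the pieces together, one alternating update of the adversarial training (critic maximization followed by extractor/classifier minimization with the penalty ignored) might look as follows. This is a sketch under stated assumptions, reusing the hypothetical `critic_loss` above; `extractor`, `classifier`, and `critic` are assumed to be Keras models and the optimizers are created by the caller:

```python
import tensorflow as tf

def train_step(extractor, classifier, critic, opt_c, opt_fd,
               x_s, y_s, x_t, lam=0.1, rho=10.0, n_critic=10):
    """One WD-DTL update: n_critic critic steps, then one minimization step."""
    ce = tf.keras.losses.SparseCategoricalCrossentropy()  # integer labels

    for _ in range(n_critic):  # maximize l_wd - rho * l_grad over theta_c
        with tf.GradientTape() as tape:
            loss_c = critic_loss(critic, extractor(x_s), extractor(x_t), rho)
        grads = tape.gradient(loss_c, critic.trainable_variables)
        opt_c.apply_gradients(zip(grads, critic.trainable_variables))

    with tf.GradientTape() as tape:  # minimize l_c + lam * l_wd (rho = 0)
        h_s, h_t = extractor(x_s), extractor(x_t)
        l_c = ce(y_s, classifier(h_s))
        l_wd = tf.reduce_mean(critic(h_s)) - tf.reduce_mean(critic(h_t))
        loss = l_c + lam * l_wd
    variables = extractor.trainable_variables + classifier.trainable_variables
    grads = tape.gradient(loss, variables)
    opt_fd.apply_gradients(zip(grads, variables))
```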

Preferably, when the target-domain sample set contains some labeled samples, step 120 becomes:

using the convolutional neural network to be transferred, obtain the source-domain feature set and source-domain fault-judgment set of the labeled source-domain sample set as well as the target feature set and target fault-judgment set of the target-domain sample set. Step 130 becomes: with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of the Wasserstein distance, the judgment loss of the source-domain fault-judgment set, and the judgment loss of the target fault-judgment set, optimize the parameters of the convolutional neural network; based on the convergence criterion, repeat step 110, or complete the adversarial transfer learning of the convolutional neural network.

Even when facing transfer between tasks that are related but not sufficiently similar (for example, transfer learning between different sensor locations), the deep adversarial transfer learning method of this embodiment greatly improves transfer-learning accuracy by adding only a small number of labeled target-domain samples, compared with the unsupervised case using large numbers of unlabeled samples. The invention is therefore applicable not only to unlabeled target-domain sample sets but also to partially labeled ones, and can be applied flexibly. Moreover, when a partially labeled target-domain sample set is used, the diagnostic accuracy of the adversarially transferred convolutional neural network for industrial-process faults is further improved, giving strong practicability.

Embodiment 2

An industrial-process fault-diagnosis convolutional neural network, trained by any of the Wasserstein distance-based adversarial transfer learning methods described in Embodiment 1.

Based on the correlation between tasks, the convolutional neural network trained by any of the above Wasserstein distance-based adversarial transfer learning methods transfers, through deep adversarial transfer learning, the convolutional neural network corresponding to the source domain into a high-accuracy convolutional neural network applicable to both the source and target domains, alleviating to a certain extent the prior-art problems of wasted computational resources and training time caused by rebuilding a deep learning model from scratch for a new fault-diagnosis task and of the lack of sufficient sample data in the target domain.

Embodiment 3

An industrial-process fault-diagnosis method: based on any industrial-process fault-diagnosis convolutional neural network described in Embodiment 2, when a new sample of the target domain described in any of the Wasserstein distance-based adversarial transfer learning methods of Embodiment 1 is received, the industrial-process fault-judgment result corresponding to the new sample is obtained.

Using a convolutional neural network trained by any of the above Wasserstein distance-based adversarial transfer learning methods for industrial-process fault diagnosis achieves higher fault-diagnosis accuracy while ensuring safety and efficiency.

To verify the effectiveness of the above deep adversarial transfer learning on fault-diagnosis problems, this embodiment uses the benchmark bearing-fault dataset obtained from the Case Western Reserve University data center. Four types of bearing condition are monitored (normal, inner-race fault, outer-race fault, and roller fault); the digital signals are sampled at 12 kHz, and drive-end bearing-fault data are also collected at a 48 kHz sampling rate. Each fault type occurs at different severities (fault diameters of 0.007, 0.014, and 0.021 inch). Each type of faulty bearing is mounted on a test motor run at four different speeds (1797 rpm, 1772 rpm, 1750 rpm, and 1730 rpm). The vibration signal of each experiment is recorded for fault diagnosis.

Data preprocessing: simple preprocessing techniques are applied to the bearing dataset: 1) the signals are segmented so that each sample in $X_s$ and $X_t$ contains 2000 measurement points; 2) the frequency-domain power spectrum of each sample is computed using the fast Fourier transform (FFT); 3) the left half of the symmetric power spectrum computed by the FFT is used as the input of the deep-transfer-learning model. Each input sample therefore has 1000 measurement values.
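The three preprocessing steps can be sketched in a few lines of Python (an assumption-laden illustration, not the patent's exact implementation):

```python
import numpy as np

def preprocess(signal, seg_len=2000):
    """Segment the raw vibration signal into 2000-point samples, compute each
    segment's FFT power spectrum, and keep the left half (1000 values)."""
    n_seg = len(signal) // seg_len
    segments = signal[: n_seg * seg_len].reshape(n_seg, seg_len)
    power = np.abs(np.fft.fft(segments, axis=1)) ** 2  # power spectrum
    return power[:, : seg_len // 2]                    # 1000 values/sample
```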

Validation is based on two unsupervised scenarios and one supervised scenario (see Table 1), specifically:

(1) Unsupervised transfer between motor speeds (US-Speed): in this case, the data obtained at the motor drive end are tested and fault severity is ignored. A four-class task is constructed (normal, plus the three fault conditions of inner-race, outer-race, and roller faults) across four domains with different motor speeds: 1797 rpm (US(A)), 1772 rpm (US(B)), 1750 rpm (US(C)), and 1730 rpm (US(D)).

(2) Unsupervised transfer between two sensor locations (US-Location): in this case, the focus is on domain adaptation between different sensor locations, ignoring fault severity and motor-speed differences. Again, a four-class task (normal and three faults) is constructed for the source and target domains, where the vibration-acceleration data are acquired by two sensors located at the drive end (US(E)) and at the fan end of the motor housing (US(F)), respectively.

(3) Supervised transfer between the datasets of the two sensor locations (S-Location): this scenario uses the same settings as the preceding US-Location scenario, but a small amount of labeled target-domain data (approximately 0.5%) is added to the source domain, aiming to improve classification performance.

Table 1

For comparison, other methods are also tested on the same datasets, including:

(1) Convolutional neural network (CNN): this model is the pre-trained network described in Embodiment 1, trained on the labeled source-domain data and used directly to test classification results on the target domain.

(2) Deep adaptation network (DAN): transferable features are learned through MK-MMD in a deep neural network. The MMD metric is an integral probability metric that measures the distance between two probability distributions by mapping samples into a reproducing kernel Hilbert space (RKHS).

In addition, to evaluate the feature-extraction capability of the convolutional neural network against traditional statistical features, the results of traditional transfer-learning methods using statistical (hand-crafted) features are also compared, including transfer component analysis (TCA), joint distribution adaptation (JDA), and correlation alignment (CORAL).

The specific implementation details are as follows.

TensorFlow is used as the software framework for the experiments, and all models are trained with Adam. Each method is tested five times over 5000 iterations, and the best result of each test is recorded. The mean and the 95% confidence interval of the classification accuracy are used for comparison. The sample sizes of the motor-speed tasks (A), (B), (C), and (D) are 1026, 1145, 1390, and 1149, respectively; the sample sizes of the sensor-location tasks (E) and (F) are 3790 and 4710, respectively. For all experiments, the mini-batch size n for each training or diagnosis step is fixed at 32.

Convolutional neural network (CNN): the architecture consists of two convolutional layers (Conv1-Conv2), two max-pooling layers (Pool1-Pool2), and two fully connected layers (FC1-FC2). The activation function of the output layer is Softmax, while ReLU is used in the convolutional layers. The numbers of neurons in FC1 and FC2 are 128 and 4, respectively. The number of filters, kernel size, and stride of each layer are listed in Table 2. Before transfer, the CNN model is fine-tuned to achieve the best validation accuracy for all transfer scenarios.
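
A minimal Keras sketch of this backbone, consistent with Table 2, is given below; the input signal length (here 400 points, treated as a one-dimensional signal so that the 1x20 kernels become width-20 Conv1D kernels) and the padding scheme are assumptions:

import tensorflow as tf

def build_cnn(input_len=400, num_classes=4):
    # Conv1-Pool1-Conv2-Pool2-FC1-FC2 stack from Table 2
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(input_len, 1)),
        tf.keras.layers.Conv1D(8, 20, strides=2, padding="same",
                               activation="relu", name="Conv1"),
        tf.keras.layers.MaxPool1D(pool_size=2, strides=2, name="Pool1"),
        tf.keras.layers.Conv1D(16, 20, strides=2, padding="same",
                               activation="relu", name="Conv2"),
        tf.keras.layers.MaxPool1D(pool_size=2, strides=2, name="Pool2"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu", name="FC1"),
        tf.keras.layers.Dense(num_classes, activation="softmax", name="FC2"),
    ])

model = build_cnn()
model.summary()  # prints the layer structure under these assumptions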

Deep Adaptation Network (DAN): the convolutional layers (Conv1-Conv2) of the convolutional neural network serve as the feature extractor. To minimize the domain distance between the source and target domains, FC1 is used as the adaptation hidden layer. The final hidden-layer representations of both domains are embedded in an RKHS to reduce the MK-MMD distance. The final objective function is a combination of the MK-MMD loss and the classification loss.
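
A hedged sketch of this combined objective follows; the Gaussian kernel bandwidths and the trade-off weight lam are illustrative assumptions, not values from this application:

import tensorflow as tf

def gaussian_mmd2(f_s, f_t, sigmas=(1.0, 2.0, 4.0)):
    # Multi-kernel MMD^2 between source and target FC1 feature batches
    def k(a, b):
        d2 = (tf.reduce_sum(a**2, 1)[:, None]
              + tf.reduce_sum(b**2, 1)[None, :]
              - 2.0 * tf.matmul(a, b, transpose_b=True))
        return tf.add_n([tf.exp(-d2 / (2.0 * s**2)) for s in sigmas]) / len(sigmas)
    return (tf.reduce_mean(k(f_s, f_s)) + tf.reduce_mean(k(f_t, f_t))
            - 2.0 * tf.reduce_mean(k(f_s, f_t)))

def dan_loss(probs_s, ys, f_s, f_t, lam=1.0):
    # Final objective: source classification loss + weighted MK-MMD loss
    ce = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(ys, probs_s))
    return ce + lam * gaussian_mmd2(f_s, f_t)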

The Wasserstein-distance-based deep adversarial transfer learning model (WD-DTL) of this application: similar to DAN, the convolutional layers (Conv1-Conv2) are used to extract features. The hidden layers of the domain critic (DC) network have 128 and 1 nodes, respectively. The number of critic training steps C per batch is set to 10. The learning rates of the discriminator and the DC are α1 = 10^-3 and α2 = 2×10^-4, respectively. The gradient-penalty coefficient ρ is set to 10. The balance coefficient λ in the minimization objective is 0.1 for the motor-speed transfer tasks and 0.8 for the sensor-location transfer tasks.
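
A minimal sketch of the resulting adversarial losses follows; the 128/1 critic nodes and the gradient-penalty coefficient of 10 come from the description above, while the interpolation scheme and variable names follow the standard WGAN-GP recipe and are assumptions about the implementation:

import tensorflow as tf

# Domain critic (DC): hidden layers with 128 and 1 nodes, as stated above.
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])

RHO = 10.0  # gradient-penalty coefficient rho

def wasserstein_estimate(f_s, f_t):
    # Empirical Wasserstein estimate: mean critic score of source minus target
    return tf.reduce_mean(critic(f_s)) - tf.reduce_mean(critic(f_t))

def critic_loss(f_s, f_t):
    # Minimized by the DC for C steps per batch (i.e., maximizes the estimate)
    eps = tf.random.uniform([tf.shape(f_s)[0], 1], 0.0, 1.0)
    f_hat = eps * f_s + (1.0 - eps) * f_t  # random interpolations between domains
    with tf.GradientTape() as tape:
        tape.watch(f_hat)
        d_hat = critic(f_hat)
    grads = tape.gradient(d_hat, f_hat)
    gp = tf.reduce_mean((tf.norm(grads, axis=1) - 1.0) ** 2)
    return -wasserstein_estimate(f_s, f_t) + RHO * gp

def feature_extractor_loss(probs_s, ys, f_s, f_t, lam=0.1):
    # Minimization objective: classification loss + lambda * Wasserstein estimate
    ce = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(ys, probs_s))
    return ce + lam * wasserstein_estimate(f_s, f_t)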

For the traditional transfer learning methods TCA, JDA, and CORAL, the regularization coefficient λ is selected from {0.001, 0.01, 0.1, 1.0, 10, 100}. An SVM is used for classification with TCA and CORAL.
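
As an illustration of this classical pipeline, the following hedged sketch applies the standard CORAL covariance alignment and then classifies with an SVM; the random feature arrays are placeholders for the hand-crafted statistical features, and the regularization value is one candidate from the grid above:

import numpy as np
from scipy.linalg import fractional_matrix_power
from sklearn.svm import SVC

def coral(xs, xt, reg=1.0):
    # Recolor source features so their covariance matches the target domain's
    cs = np.cov(xs, rowvar=False) + reg * np.eye(xs.shape[1])
    ct = np.cov(xt, rowvar=False) + reg * np.eye(xt.shape[1])
    xs_white = xs @ fractional_matrix_power(cs, -0.5)
    return np.real(xs_white @ fractional_matrix_power(ct, 0.5))

rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 16))              # labeled source features (placeholder)
ys = rng.integers(0, 4, size=200)            # four-class labels
xt = rng.normal(loc=0.5, size=(200, 16))     # unlabeled target features (placeholder)

clf = SVC(C=1.0).fit(coral(xs, xt), ys)      # SVM on the aligned source features
pred = clf.predict(xt)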

Table 2

Layer      Filters   Kernel size   Stride
Conv1/2    8/16      1x20          2
Pool1/2    -         1x2           2

Based on the above implementation details, the following results are obtained:

The transfer-task results of WD-DTL and the other methods are shown in Table 3. For the transfer tasks with unlabeled target-domain data (i.e., US-Speed and US-Location), the deep transfer learning models clearly outperform the plain convolutional neural network, with average accuracy improvements of roughly 13.6% and 25% on the motor-speed and sensor-location transfer tasks, respectively. Furthermore, except for the transfer task US(D)→US(A), where WD-DTL falls short of DAN by less than 1%, the transfer accuracy of WD-DTL exceeds most DAN results (an average increase of 5%).

Table 3 (classification accuracy in %, mean (±95% confidence interval))

Task            TCA      JDA             CORAL    CNN             DAN             WD-DTL
US(A)→US(B)     26.55    65.07(±7.55)    59.18    82.75(±6.77)    92.97(±3.88)    97.52(±3.09)
US(A)→US(C)     46.80    51.31(±1.56)    62.14    78.65(±4.54)    85.32(±5.26)    94.43(±2.99)
US(A)→US(D)     26.57    57.70(±8.59)    49.83    82.99(±5.89)    89.39(±4.37)    95.05(±2.12)
US(B)→US(A)     26.63    71.19(±1.21)    53.57    84.14(±6.63)    94.43(±2.95)    96.80(±1.10)
US(B)→US(C)     26.60    69.80(±5.67)    57.28    85.41(±9.44)    90.43(±4.62)    99.69(±0.59)
US(B)→US(D)     26.57    88.50(±1.96)    60.53    86.09(±4.63)    87.37(±5.42)    95.51(±2.52)
US(C)→US(A)     26.63    56.42(±2.52)    54.03    76.50(±3.76)    89.88(±1.57)    92.16(±2.61)
US(C)→US(B)     26.66    69.18(±1.90)    76.66    82.75(±5.51)    92.93(±1.57)    96.03(±6.27)
US(C)→US(D)     46.75    77.45(±0.83)    70.34    87.04(±6.81)    90.66(±5.24)    97.56(±3.31)
US(D)→US(A)     46.74    61.72(±5.48)    59.78    79.23(±6.96)    90.88(±1.82)    89.82(±2.41)
US(D)→US(B)     46.79    74.03(±0.86)    59.73    79.73(±5.49)    87.91(±2.42)    95.16(±3.67)
US(D)→US(C)     26.60    65.24(±4.18)    63.02    80.64(±4.23)    92.94(±3.96)    99.62(±0.80)
Average         33.32    67.35(±3.53)    56.01    82.10(±5.89)    90.42(±3.59)    95.75(±2.62)
US(E)→US(F)     19.05    57.35(±0.47)    47.97    39.07(±2.22)    56.89(±2.73)    64.17(±7.16)
US(F)→US(E)     20.45    66.34(±4.47)    39.87    39.95(±3.84)    55.97(±3.17)    64.24(±3.87)
Average         19.75    61.85(±2.47)    43.92    39.51(±3.03)    56.43(±2.95)    64.20(±5.52)
S(E)→S(F)       20.43    65.48(±0.57)    51.77    54.04(±7.67)    59.68(±4.61)    65.69(±3.74)
S(F)→S(E)       19.02    59.07(±0.56)    47.88    50.47(±5.74)    58.78(±5.67)    64.15(±5.52)
Average         19.73    62.28(±0.57)    49.83    52.26(±6.71)    59.23(±5.14)    64.92(±4.63)

In summary, the following observations can be drawn: 1) WD-DTL achieves the best transfer accuracy, with an average score of 95.75%; 2) even without domain adaptation, the convolutional neural network method, owing to its excellent feature-detection ability, is already capable of good classification performance on the motor-speed transfer tasks; 3) the WD-DTL method proposed by the present invention shows a good ability to solve supervised problems with a small amount of labeled data. The supervised transfer tasks S(E)→S(F) and S(F)→S(E) use only 0.5% of the sample size of the unsupervised case, yet achieve performance as good as the unsupervised case that uses 100% of the unlabeled samples.

Based on the above results, the experimental analysis is as follows:

Feature visualization: t-distributed stochastic neighbor embedding (t-SNE) is used for nonlinear dimensionality reduction to visualize the networks. For the transfer task between motor speeds (US-Speed), the task US(C)→US(A) is randomly selected to visualize the learned feature representations at different motor speeds. Figure 3 shows the comparison. The clusters in Fig. 3(c), formed by the WD-DTL proposed by the present invention, are better separated than the CNN results in Fig. 3(a) and the DAN domain-adaptation results in Fig. 3(b). More importantly, Fig. 3(c) shows a clear improvement in domain adaptation, since the source-domain and target-domain features fall almost in the same clusters.
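
A minimal sketch of this kind of t-SNE visualization follows; the random arrays are placeholders standing in for the learned FC1 feature representations of the two domains:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
f_src = rng.normal(size=(300, 128))            # placeholder source-domain features
f_tgt = rng.normal(0.3, 1.0, size=(300, 128))  # placeholder target-domain features

emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(
    np.vstack([f_src, f_tgt]))
plt.scatter(emb[:300, 0], emb[:300, 1], s=5, label="source")
plt.scatter(emb[300:, 0], emb[300:, 1], s=5, label="target")
plt.legend()
plt.show()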

For the transfer tasks between different sensor locations (US-Location and S-Location), the t-SNE results of the transfer task US(E)→US(F) are shown in Fig. 4. Although fault types 1, 2, and 3 are difficult to separate into distinct clusters, WD-DTL shows better clustering results than CNN and DAN. It must be emphasized that these results were obtained using 100% of the target-domain samples (4710), and even in this case the performance is not satisfactory. This raises the question of how to enhance transfer learning performance when the signals in the source and target domains are correlated but not sufficiently similar.

Effect of sample size on unsupervised and supervised accuracy: Figure 5 shows the accuracy curves of WD-DTL for the tasks US(E)→US(F) and S(E)→S(F) under US-Location and S-Location, with the number of samples increasing from 10 to 2500. When the number of samples exceeds 2500, the diagnostic accuracy saturates near a fixed value, so only the results from 10 to 2500 are shown. In Fig. 5(a), the accuracy of WD-DTL rises from 59.47%, with the final test accuracy limited to around 64%. As the number of samples increases, the fault-diagnosis accuracy of WD-DTL remains higher than that of DAN and CNN. This analysis shows that, for the unsupervised case, increasing the number of samples improves transfer learning accuracy; however, even with 100% of the target-domain samples, the improvement is limited (less than 5%). To address this, Fig. 5(b) uses a small amount of labeled data to improve fault-diagnosis accuracy, which corresponds to practical industrial applications where labeled data are scarce. The figure shows that when the number of labeled samples exceeds 20 (out of 4710 total samples), the transfer learning accuracy of WD-DTL surpasses the 100%-sample-size case of Fig. 5(a) (the blue region in Fig. 5(a)). More specifically, a transfer learning accuracy of 80% can be achieved with only 100 labeled samples (equivalent to 25 per fault class), indicating that the proposed WD-DTL is also an excellent framework for supervised transfer tasks.

Embodiment 4

A storage medium storing instructions which, when read by a computer, cause the computer to execute any of the Wasserstein-distance-based convolutional neural network adversarial transfer learning methods of Embodiment 1 and/or any of the industrial process fault diagnosis methods of Embodiment 3.

The convolutional neural network architecture is first constructed to extract features, and the Wasserstein distance is introduced to learn domain-invariant feature representations. Through the adversarial training process, the domain discrepancy is significantly reduced. The related technical solutions are the same as above and are not repeated here.

Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A Wasserstein-distance-based convolutional neural network adversarial transfer learning method, characterized by comprising:

Step 1: from a labeled source-domain dataset and a target-domain dataset, determining a labeled source-domain sample set and a target-domain sample set;

Step 2: using the convolutional neural network to be transferred, obtaining a source-domain feature set and a source-domain fault-judgment set from the labeled source-domain sample set, and a target feature set from the target-domain sample set;

Step 3: with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault-judgment set, optimizing the parameters of the convolutional neural network; and, based on a convergence criterion, either repeating Step 1 or completing the adversarial transfer learning of the convolutional neural network.

2. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to claim 1, characterized in that the Wasserstein distance is obtained by: using a domain-similarity evaluation neural network to map the source-domain feature set to a source-domain real-number set and the target feature set to a target real-number set, and subtracting the mean of the target real-number set from the mean of the source-domain real-number set.

3. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to claim 2, characterized in that Step 3 comprises:

Step 3.1: with the goal of increasing the Wasserstein distance between the source-domain feature set and the target feature set, optimizing the parameters of the domain-similarity evaluation neural network to obtain, after a preset number of iterations, a new domain-similarity evaluation neural network;

Step 3.2: based on the labeled source-domain sample set, the target-domain sample set, and the new domain-similarity evaluation neural network, with the goal of reducing the sum of the Wasserstein distance and the judgment loss value of the convolutional neural network, optimizing the parameters of the convolutional neural network to obtain, after a preset number of iterations, a new convolutional neural network;

Step 3.3: based on the convergence criteria of the domain-similarity evaluation neural network and the convolutional neural network, stopping training and completing the adversarial transfer learning of the convolutional neural network, or repeating Step 1.
4. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to claim 3, characterized in that, in Step 3.1, the Wasserstein distance is the gradient-penalty-based difference between the mean of the source-domain real-number set and the mean of the target real-number set; and in Step 3.2, the Wasserstein distance is the difference between the mean of the source-domain real-number set and the mean of the target real-number set.

5. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to claim 2, characterized in that the Wasserstein distance is expressed as the mean of a source-domain real-number subset minus the mean of a target real-number subset, wherein the source-domain real-number subset is obtained from the source-domain real-number set through a Lipschitz constraint, and the target real-number subset is obtained from the target real-number set through the same Lipschitz constraint.

6. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to claim 5, characterized in that the Lipschitz constraint is specifically a first-order Lipschitz constraint.

7. The Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to any one of claims 1 to 6, characterized in that, when the target-domain sample set includes partially labeled samples, Step 2 is: using the convolutional neural network to be transferred, obtaining the source-domain feature set and source-domain fault-judgment set of the labeled source-domain sample set, and the target feature set and target fault-judgment set of the target-domain sample set; and Step 3 is: with the goal of maximizing the Wasserstein distance between the source-domain feature set and the target feature set while minimizing the sum of the Wasserstein distance, the judgment loss value of the source-domain fault-judgment set, and the judgment loss value of the target fault-judgment set, optimizing the parameters of the convolutional neural network; and, based on a convergence criterion, either repeating Step 1 or completing the adversarial transfer learning of the convolutional neural network.

8. An industrial process fault diagnosis convolutional neural network, characterized in that it is obtained by training with the Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to any one of claims 1 to 7.
9. An industrial process fault diagnosis method, characterized in that, based on the industrial process fault diagnosis convolutional neural network according to claim 8, when a new sample from the target domain described in the Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to any one of claims 1 to 7 is received, the industrial-process fault-judgment result corresponding to the new sample is obtained.

10. A storage medium, characterized in that the storage medium stores instructions which, when read by a computer, cause the computer to execute the Wasserstein-distance-based convolutional neural network adversarial transfer learning method according to any one of claims 1 to 7 and/or the industrial process fault diagnosis method according to claim 9.
CN201910624662.9A 2019-07-11 2019-07-11 Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application Pending CN110414383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910624662.9A CN110414383A (en) 2019-07-11 2019-07-11 Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910624662.9A CN110414383A (en) 2019-07-11 2019-07-11 Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application

Publications (1)

Publication Number Publication Date
CN110414383A true CN110414383A (en) 2019-11-05

Family

ID=68361085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910624662.9A Pending CN110414383A (en) 2019-07-11 2019-07-11 Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application

Country Status (1)

Country Link
CN (1) CN110414383A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105300693A (en) * 2015-09-25 2016-02-03 东南大学 Bearing fault diagnosis method based on transfer learning
CN105606363A (en) * 2016-01-29 2016-05-25 济南大学 Bearing fault diagnosis method based on domain adaptation
CN109376769A (en) * 2018-09-21 2019-02-22 广东技术师范学院 Information transfer method for multi-task classification based on generative adversarial neural network
CN109947086A (en) * 2019-04-11 2019-06-28 清华大学 Method and system for mechanical fault migration diagnosis based on adversarial learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cheng Cheng et al.: "Wasserstein Distance based Deep Adversarial Transfer Learning for Intelligent Fault Diagnosis", arXiv *
Ming Zhang et al.: "A Deep Transfer Model With Wasserstein Distance Guided Multi Adversarial Network for Bearing Fault Diagnosis Under Different Working Condition", IEEE Access *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837865A (en) * 2019-11-08 2020-02-25 北京计算机技术及应用研究所 Domain adaptation method based on representation learning and transfer learning
CN110995475A (en) * 2019-11-20 2020-04-10 国网湖北省电力有限公司信息通信公司 Power communication network fault detection method based on transfer learning
CN111092836B (en) * 2019-12-13 2022-05-17 中国人民解放军空军工程大学 Method and device for identifying signal modulation mode
CN111092836A (en) * 2019-12-13 2020-05-01 中国人民解放军空军工程大学 Signal modulation mode identification method and device
CN111026058A (en) * 2019-12-16 2020-04-17 浙江大学 Semi-supervised deep learning fault diagnosis method based on Watherstein distance and self-encoder
CN111178427A (en) * 2019-12-27 2020-05-19 杭州电子科技大学 A method for deep autoencoder embedding clustering based on Sliced-Wasserstein distance
CN111178427B (en) * 2019-12-27 2022-07-26 杭州电子科技大学 A Method for Image Dimensionality Reduction and Embedding Clustering Based on Sliced-Wasserstein Distance Deep Autoencoder
CN111442926A (en) * 2020-01-11 2020-07-24 哈尔滨理工大学 A fault diagnosis method for different types of rolling bearings under variable load based on deep feature migration
CN111290947A (en) * 2020-01-16 2020-06-16 华南理工大学 Cross-software defect prediction method based on countermeasure judgment
CN110987436A (en) * 2020-03-05 2020-04-10 天津开发区精诺瀚海数据科技有限公司 Bearing fault diagnosis method based on excitation mechanism
CN111428803A (en) * 2020-03-31 2020-07-17 山东大学 A Deep Domain Adaptive Image Classification Method Based on Wasserstein Distance
CN111858345A (en) * 2020-07-23 2020-10-30 深圳慕智科技有限公司 Image sample generation capability multi-dimensional evaluation method based on antagonistic sample definition
CN112232293A (en) * 2020-11-09 2021-01-15 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN112683532A (en) * 2020-11-25 2021-04-20 西安交通大学 Cross-working condition countermeasure diagnostic method for bearing
CN113010013A (en) * 2021-03-11 2021-06-22 华南理工大学 Wasserstein distance-based motor imagery electroencephalogram migration learning method
CN113139590A (en) * 2021-04-13 2021-07-20 索信达(北京)数据技术有限公司 Dimension reduction method and device for time series data, computer equipment and storage medium
CN113139590B (en) * 2021-04-13 2022-03-22 索信达(北京)数据技术有限公司 Dimension reduction method and device for time series data, computer equipment and storage medium
CN113536697A (en) * 2021-08-24 2021-10-22 江南大学 Bearing residual life prediction method based on improved residual error network and WGAN
CN114021285A (en) * 2021-11-17 2022-02-08 上海大学 Rotary machine fault diagnosis method based on mutual local countermeasure transfer learning
CN114021285B (en) * 2021-11-17 2024-04-12 上海大学 Rotary machine fault diagnosis method based on mutual local countermeasure migration learning
CN116992953A (en) * 2023-09-27 2023-11-03 苏州捷杰传感技术有限公司 Model training method, fault diagnosis method and device
CN116992953B (en) * 2023-09-27 2024-04-19 苏州捷杰传感技术有限公司 Model training method, fault diagnosis method and device

Similar Documents

Publication Publication Date Title
CN110414383A (en) Convolutional Neural Network Adversarial Transfer Learning Method Based on Wasserstein Distance and Its Application
CN113935406B (en) Mechanical equipment unsupervised fault diagnosis method based on countermeasure flow model
CN110849626B (en) Self-adaptive sparse compression self-coding rolling bearing fault diagnosis system
CN110555273A (en) bearing life prediction method based on hidden Markov model and transfer learning
Peng et al. Fault feature extractor based on bootstrap your own latent and data augmentation algorithm for unlabeled vibration signals
CN104751229B (en) Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values
CN111709448A (en) A Mechanical Fault Diagnosis Method Based on Migration Relation Network
CN112763214B (en) Fault diagnosis method of rolling bearing based on multi-label zero-sample learning
CN107941537A (en) A kind of mechanical equipment health state evaluation method
CN102788696B (en) Evaluation method for health degree of bearing on basis of improved BP (Back Propagation) neural network and fuzzy set theory
CN105678343B (en) An abnormal noise diagnosis method for hydroelectric generating units based on sparse expression of adaptive weighted group
CN102313577A (en) Equipment health state evaluation and recession prediction method based on multi-channel sensing signals
CN110705812A (en) Industrial fault analysis system based on fuzzy neural network
Xu et al. Oversmoothing relief graph convolutional network-based fault diagnosis method with application to the rectifier of high-speed trains
CN113310689B (en) Aeroengine transmission system fault diagnosis method based on domain self-adaptive graph convolution network
CN114118138A (en) A Bearing Composite Fault Diagnosis Method Based on Multi-label Domain Adaptive Model
CN114548199A (en) Multi-sensor data fusion method based on deep migration network
CN116756483B (en) Mechanical fault diagnosis method, device and equipment under condition that target working condition data are unavailable
WO2019178930A1 (en) Fault diagnosis method for mechanical device
CN113239610A (en) Domain self-adaptive rolling bearing fault diagnosis method based on Wasserstein distance
CN114048546A (en) Graph convolution network and unsupervised domain self-adaptive prediction method for residual service life of aircraft engine
CN114564987A (en) Rotary machine fault diagnosis method and system based on graph data
CN114398992A (en) An Intelligent Fault Diagnosis Method Based on Unsupervised Domain Adaptation
CN117171907A (en) Rolling bearing residual life prediction method and system
CN118690276A (en) A high-speed rail wheel-rail sensing adversarial learning damage identification method

Legal Events

Date Code Title Description
PB01    Publication
SE01    Entry into force of request for substantive examination
RJ01    Rejection of invention patent application after publication (Application publication date: 20191105)