WO2024060381A1 - Incremental device fault diagnosis method - Google Patents

Incremental device fault diagnosis method

Info

Publication number
WO2024060381A1
WO2024060381A1 · PCT/CN2022/131657 (CN2022131657W)
Authority
WO
WIPO (PCT)
Prior art keywords
fault diagnosis
diagnosis model
model
sample set
sample
Prior art date
Application number
PCT/CN2022/131657
Other languages
French (fr)
Chinese (zh)
Inventor
乔非 (QIAO Fei)
关柳恩 (GUAN Liuen)
Original Assignee
同济大学 (Tongji University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University (同济大学)
Publication of WO2024060381A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/12 - Computing arrangements based on biological models using genetic models
    • G06N 3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 10/00 - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A 10/40 - Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Definitions

  • the invention belongs to the technical field of equipment fault diagnosis and relates to an incremental equipment fault diagnosis method.
  • retraining the model on the complete sample set incurs relatively high time and computation costs; if only new samples are used to adjust the model parameters, the model gradually forgets old samples; too high an update frequency causes unnecessary consumption, while too low an update frequency makes it difficult to ensure model performance.
  • incremental learning update methods have been proposed, which can continuously learn new knowledge from newly generated samples while retaining most of the old knowledge; they do not need to save all historical samples, reduce storage-space occupation, and make full use of historical training results to improve model training efficiency.
  • the selected samples are essentially locally optimal samples rather than globally optimal ones.
  • the existing technology has the disadvantages of high model complexity and inability to effectively manage historical samples.
  • the purpose of the present invention is to provide an incremental equipment fault diagnosis method driven by a small number of labeled old samples to overcome the problems of high model complexity and poor historical sample management effects.
  • An incremental equipment fault diagnosis method which uses a trained fault diagnosis model to process the data to be diagnosed with the influx of new samples to obtain fault diagnosis results of the equipment.
  • the construction method of the fault diagnosis model includes:
  • Step S1 Obtain and process sensor data related to device status, and build a complete sample set
  • Step S2 Construct an initial fault diagnosis model based on the deep neural network, and use the complete sample set to train the initial fault diagnosis model;
  • Step S3 Based on the sample retention method in the genetic algorithm, selectively retain important sample subsets from the complete sample set that are used to characterize the statistical characteristics of the complete sample set;
  • Step S4 When new samples pour in, based on the structure and parameters of the initial fault diagnosis model, an intermediate fault diagnosis model is constructed and its parameters are initialized;
  • Step S5 Adjust the objective function of the intermediate fault diagnosis model for parameter optimization. Based on the knowledge distillation algorithm, use the new sample set and the important sample subset to jointly train the intermediate fault diagnosis model, obtain and test the final fault diagnosis model, and end.
  • the described construction of a complete sample set includes normalizing the sensor data, limiting values to [0, 1]; aligning the sensors in time and cutting the record into several signal segments, each segment being used as one sample to construct the complete sample set.
  • sample retention method based on genetic algorithms, including:
  • step S350 Perform roulette selection, two-point crossover and multi-point mutation operations on the population, and return to step S340 until the iteration stop condition is met and the final population is generated;
  • the fitness calculation method includes:
  • Step S341 Decode all individuals in the population generated by the current iteration to obtain the sample subset corresponding to each individual;
  • Step S342 Input the complete sample set and the current sample subset into the fault diagnosis model respectively to obtain their respective logits vector sets;
  • Step S343 Calculate the mean centers of the logits vector sets of the complete sample set and the current sample subset, obtaining $\hat{\mu}$ and $\mu$ respectively, and calculate the fitness of each individual.
  • the calculation formula for the fitness of each individual is $F = 1 / \left\| \hat{\mu} - \mu \right\|_2$, where F is the fitness of each individual and $\hat{\mu}$, $\mu$ are the mean centers above.
  • the construction of the intermediate fault diagnosis model and initialization of its parameters include:
  • Step S410 Construct an intermediate fault diagnosis model with the same structure as the initial fault diagnosis model, and update the number of output neurons of the intermediate fault diagnosis model, where the number of output neurons is the same as the number of fault categories included in the sample set;
  • Step S420 Load the neuron weights and biases of the initial fault diagnosis model into the intermediate fault diagnosis model as its initial training weights and parameters, and initialize the excess neuron weights and biases to simulate zero output values.
  • the final fault diagnosis model is implemented based on the knowledge distillation algorithm, including:
  • Step S510 Freeze the initial fault diagnosis model parameters so that they do not participate in the parameter optimization process, and merge the new sample set and important sample subsets into a training sample set;
  • Step S520 Input the training sample set into the initial fault diagnosis model and the intermediate fault diagnosis model at the same time; under the adjustment of the temperature coefficient T, obtain the soft labels and the soft prediction distribution of the old categories respectively, calculate the distillation loss between the two, and thereby obtain the total distillation loss function.
  • Step S530 Input the training sample set into the intermediate fault diagnosis model to obtain the prediction distribution of all categories, and calculate the cross-entropy loss between the prediction distribution of all categories and the real labels of the training sample set;
  • Step S540 Add the distillation loss and the cross-entropy loss to obtain the total loss.
  • the total loss function is used as the objective function to reversely optimize the parameters of the intermediate fault diagnosis model to obtain the final fault diagnosis model.
  • testing the final fault diagnosis model includes setting the temperature coefficient T to 1, inputting test samples into the model to obtain classification results, and performing performance evaluation.
  • the extra neuron weights and biases are initialized to 1×10⁻⁶.
  • T represents the temperature coefficient, T > 1; softmax is the normalized exponential function; cls_n and cls_o represent the numbers of new and old categories respectively; $\hat{h}_i$ and $h_i$ represent the i-th pixel of the feature map output by a given layer of the old model and the new model respectively; $\hat{q}_i$ represents the soft label output by the old model and $q_i$ the soft prediction distribution related to the old categories output by the new model; θ represents the parameters of the deep neural network; ρ_l represents the constant coefficient of the l-th distillation network layer; $L_{kd}^{(l)}$ represents the distillation loss of the l-th network layer; and $L_{kd}$ represents the total distillation loss function.
  • the present invention has the following characteristics:
  • This invention, based on the knowledge distillation algorithm, uses the new sample set and the important sample subset to form a total loss function as the objective for jointly training the constructed intermediate fault diagnosis model, yielding the final trained model. This realizes migration of old samples and effective learning of new samples, so the model not only can discriminate new fault characteristics and new fault types but also maintains good memory of historical samples.
  • the present invention constructs an intermediate fault diagnosis model from the structure and parameters of the initial fault diagnosis model, which can mine the latent correlation between new and old samples from the initial fault diagnosis model, adaptively constrain the direction of model parameter optimization, and further reduce memory consumption.
  • the present invention establishes a fault diagnosis model based on deep learning, which has excellent nonlinear feature representation and fitting capabilities, and can extract key features from a large number of samples and accurately identify fault categories.
  • Figure 1 is a flow chart of the incremental equipment fault diagnosis method of the present invention
  • Figure 2 is a schematic diagram of chromosome coding according to the present invention.
  • Figure 3 is a flow chart of the GA-based sample retention method of the present invention.
  • Figure 4 is a schematic diagram of the update model structure of the present invention.
  • Figure 5 is an application flow chart of the knowledge distillation method of the present invention, wherein (5a): a flow chart of the knowledge distillation method of the present invention applied to the training process; (5b): a flow chart of the knowledge distillation method of the present invention applied to the testing process;
  • Figure 6 shows the experimental results of the incremental equipment fault diagnosis method of the present invention, where (6a): experimental results when the sample retention number is 5; (6b): experimental results when the sample retention number is 10; (6c): experimental results when the sample retention number is 20; (6d): experimental results when the sample retention number is 30.
  • This embodiment proposes an incremental equipment fault diagnosis method.
  • This method uses a trained fault diagnosis model to process the data to be diagnosed with the influx of new samples, and obtains the fault diagnosis results of the equipment.
  • the construction method of the fault diagnosis model includes:
  • Figure 1 shows a schematic flow chart of the incremental equipment fault diagnosis method, including the following steps:
  • Step S1 Obtain and process sensor data related to device status, and build a complete sample set
  • Step S2 Construct an initial fault diagnosis model based on the deep neural network, train the initial fault diagnosis model using a complete sample set, and optimize its parameters;
  • Step S3 selectively retaining an important sample subset from the complete sample set for characterizing the statistical characteristics of the complete sample set
  • Step S4 When new samples pour in, build an intermediate fault diagnosis model based on the initial fault diagnosis model and initialize parameters;
  • Step S5 adjust the objective function of the intermediate fault diagnosis model parameter optimization, based on the knowledge distillation algorithm, use the new sample set and the important sample subset to jointly train the intermediate fault diagnosis model, obtain and test the final fault diagnosis model, and end.
  • step S1 includes:
  • Step S110 Select data representing various faults and health states of the equipment from a variety of sensor data for monitoring equipment status, where the status data must be continuously measured and recorded.
  • Step S120 Normalize the sensor data, limiting values to [0, 1]; align the sensors in time and cut the record into several signal segments, each segment serving as one sample to construct the complete sample set.
  • step S2 includes:
  • Step S210 Build a deep neural network model based on CNN, and use fully connected layers and softmax layers to implement multi-classification tasks.
  • the size of the input feature map and the number of channels are determined based on the sampling size and the number of data sources.
  • the final output is the probability vector that the input sample belongs to each fault category.
  • Step S220 Preset all hyperparameters related to model training, such as learning rate, maximum number of iterations, etc. Use a complete sample set to train the fault diagnosis model, and use the Adam optimizer to optimize the parameters of the model.
  • step S3 selects and stores important sample subsets from the complete sample set to characterize the statistical characteristics of the complete sample set, specifically including:
  • Step S310 Screen the samples that are correctly classified by the initial fault diagnosis model from the complete sample set
  • Step S320 Chromosome binary encoding.
  • the indices of the filtered sample set are binary encoded, with the code length determined by the (even) size of the sample subset to be constructed. See Figure 2: "11" denotes the sample with index 11 in the complete sample set, encoded in binary as the gene "1011", and so on;
  • Step S330 randomly initialize the population
  • Step S340 Calculate the fitness of each individual in the population
  • Step S350 use roulette selection, two-point crossover and multi-point mutation as the selection operator, crossover operator and mutation operator to generate a new population, return to step S340 until the iteration stop condition is met and the final population is generated;
  • Step S360 Decode the optimal individuals in the final population to obtain all sample indices, use corresponding samples to construct sample subsets, and generate important sample subsets.
  • the calculation method of fitness in step S340 includes:
  • Step S341 Decode all individuals in the population generated by the current iteration to obtain the sample subset corresponding to each individual;
  • Step S342 Input the complete sample set and the current sample subset into the fault diagnosis model respectively to obtain the logits vector set of the complete sample set and the logits vector set of the current sample subset.
  • the logits vector is the output vector of the last fully connected layer.
  • Step S343 Calculate the mean centers of the logits vector sets of the complete sample set and the current sample subset, obtaining $\hat{\mu}$ and $\mu$ respectively, then calculate the fitness $F$ to obtain the fitness of each individual in the current population.
  • constructing an intermediate fault diagnosis model in step S4 and optimizing its parameters include:
  • Step S410 Construct an intermediate fault diagnosis model with the same structure as the initial fault diagnosis model. Update the number of output neurons of the intermediate fault diagnosis model. The number of output neurons is the same as the number of fault categories included in the sample set, see Figure 4;
  • Step S420 Load the weights and biases of the initial fault diagnosis model into this round's intermediate fault diagnosis model as its initial training weights and parameters. Since the initial fault diagnosis model has fewer output-layer neurons than the intermediate fault diagnosis model, the extra neuron weights and biases are initialized to 1×10⁻⁶ to simulate zero output values.
  • the final fault diagnosis model obtained in step S5 includes:
  • Step S510 Freeze the initial fault diagnosis model parameters so that they do not participate in the parameter optimization process, and merge the new sample set and important sample subsets into a training sample set;
  • Step S520 Input the training sample set into the initial fault diagnosis model and the intermediate fault diagnosis model at the same time to obtain, under the adjustment of the temperature coefficient T, the soft labels and the soft prediction distribution of the old categories respectively, and calculate the distillation loss between the two, weighted by the ratio of the numbers of old and new categories;
  • Step S530 input the training sample set into the intermediate fault diagnosis model to obtain the predicted distribution of all categories, and calculate the cross entropy loss between the predicted distribution of all categories and the true label of the training sample set;
  • Step S540 Add the distillation loss and the cross-entropy loss to obtain the total loss.
  • the total loss function is used as the objective function to reversely optimize the parameters of the intermediate fault diagnosis model to obtain the final fault diagnosis model.
  • testing the final fault diagnosis model includes setting the temperature coefficient T to 1, inputting the test sample into the model to obtain the classification results, and conducting performance evaluation.
  • Figure (5a) is a schematic flowchart of the application of the knowledge distillation method in the training process of the present invention
  • Figure (5b) is a schematic flowchart of the knowledge distillation method of the present invention applied to the testing process.
  • the distillation loss is calculated on the feature maps output by multiple intermediate layers, and the loss coefficient is adjusted according to the change in sample-category proportions. The specific calculation is as follows:
  • T represents the temperature coefficient, T > 1; softmax is the normalized exponential function; cls_n and cls_o represent the numbers of new and old categories respectively; $\hat{h}_i$ and $h_i$ represent the i-th pixel of the feature map output by a given layer of the old model and the new model respectively; $\hat{q}_i$ represents the soft label output by the old model and $q_i$ the soft prediction distribution related to the old categories output by the new model; θ represents the parameters of the deep neural network; ρ_l represents the constant coefficient of the l-th distillation network layer; $L_{kd}^{(l)}$ represents the distillation loss of the l-th network layer; and $L_{kd}$ represents the total distillation loss function.
  • an adaptability coefficient is set according to the change in the numbers of new and old categories; when each category contains a comparable number of samples, the ratio of the number of old categories cls_o to the number of new categories cls_n is used for the adjustment, and T² is used to compensate for the effect of the temperature coefficient on the distillation-loss magnitude; $L_{kd}$ is the total distillation loss function, ρ_l is the constant coefficient of the l-th distillation network layer, and cls_n and cls_o denote the numbers of new and old categories respectively.
  • the Case Western Reserve University (CWRU) bearing data set in the United States is used for case study and analysis.
  • the information of the CWRU data set is as follows: three accelerometers collect vibration data at different ends, namely the drive end accelerometer data (DE), the fan end accelerometer data (FE) and the basic acceleration data (BA); there are 4 different operating speeds.
  • DE drive end accelerometer data
  • FE fan end accelerometer data
  • BA basic acceleration data
  • the operating speeds are 1730, 1750, 1772 and 1797 rpm respectively; there are 3 different fault diameters, 0.007, 0.014 and 0.021 inches respectively; there are 3 fault states, namely inner ring fault (IRF), rolling element fault (BF) and outer ring fault (ORF), where the outer ring fault also includes 3 measuring points: the 6 o'clock position directly in the load zone, the 3 o'clock position orthogonal to the load zone, and the 12 o'clock position opposite the load zone.
  • IRF inner ring fault
  • BF rolling element fault
  • ORF outer ring fault
  • This embodiment uses the status data of bearings with fault diameters of 0.007, 0.014 and 0.021 inches operating at a rotation speed of 1797 rpm, covering three faults: IRF, BF and ORF (load-zone measuring point at the 6 o'clock position).
  • Bearing status data includes DE, FE and BA data, with a sampling frequency of 12kHz.
  • the fault status of the data set is divided into 9 categories.
  • the specific fault type numbers are shown in Table 1, and the category number range is 0-8; the amount of data contained in each fault type is shown in Table 2.
  • the model structure parameters are shown in Table 3, where cls represents the current number of fault categories; the outputs of the convolutional layers with stride 2 are used to calculate the distillation loss.
  • in the initial stage there are only samples of fault categories 0-2; in each subsequent incremental stage, two new categories of samples appear for model-update learning.
  • incremental stage i (i = 1, 2, 3) denotes the i-th incremental learning stage with its newly arriving samples.
  • the experimental results of incremental equipment fault diagnosis are shown in Figure 6.
  • This experiment compared the effects of using only the sample retention method versus combining knowledge distillation with sample retention for incremental learning, where "sample retention N" means using only the GA-based sample retention method and retaining N key samples.
  • "Knowledge distillation + sample retention N" means using the method based on knowledge distillation and sample retention for incremental learning and retaining N key samples.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and other media that can store program code.

Abstract

An incremental device fault diagnosis method, using a trained fault diagnosis model to process data to be diagnosed having a new sample influx, to obtain a fault diagnosis result of a device. A construction method for the fault diagnosis model comprises: 1) constructing a complete sample set; 2) constructing an initial fault diagnosis model, and training the initial fault diagnosis model by using the complete sample set; 3) selectively retaining important sample subsets by using a sample retention method; 4) during the new sample influx, constructing an intermediate fault diagnosis model on the basis of the structure and parameters of the initial fault diagnosis model; and 5) on the basis of a knowledge distillation algorithm, jointly training the intermediate fault diagnosis model by using a new sample set and the important sample subsets, and obtaining and testing a final fault diagnosis model. The present invention realizes effective learning of new samples and retention of old samples, so that not only is the capability to determine new fault types achieved, but also good memory capability is kept for historical samples.

Description

An incremental equipment fault diagnosis method

Technical field
The invention belongs to the technical field of equipment fault diagnosis and relates to an incremental equipment fault diagnosis method.
Background art
The complexity of modern systems has increased, which places higher requirements on safety and stability: equipment faults need to be identified, diagnosed, and recovered from quickly to avoid serious economic losses and personal-safety accidents. At present, establishing a fault diagnosis model relies on the strong assumption that training samples and test samples are identically distributed. However, as equipment continues to operate, new fault characteristics and fault types may appear and the original model no longer applies, so the fault diagnosis model must be updated in a timely manner. Updating a fault diagnosis model usually requires combining new and old samples, that is, batch learning on the complete sample set. However, equipment running for a long time accumulates a large number of historical samples, so storage cost is high; retraining the model on the complete sample set incurs high time and computation costs; adjusting the model parameters with new samples only makes the model gradually forget old samples; and too high an update frequency causes unnecessary consumption, while too low a frequency makes it difficult to ensure model performance. To solve these problems, incremental learning update methods have been proposed: they continuously learn new knowledge from newly generated samples while retaining most of the old knowledge, do not need to save all historical samples (reducing storage-space occupation), and make full use of historical training results to improve model training efficiency.
To equip diagnosis models with incremental learning capability, functional extensions at the algorithm level have been built on machine learning, deep learning, and other methods. Among them, deep learning algorithms have advantages over traditional machine learning algorithms in abstract feature representation and are widely used in the field of fault diagnosis; most incremental learning methods for deep-learning-based fault diagnosis models extend the model structure and parameters, but each model update requires carefully designing additional network structures or allocating weights.
For learning new samples, Chinese publication CN112508192A discloses an incremental stacked broad learning system with a deep structure, which updates the model incrementally by stacking multiple broad learning systems. Although only the newly added parameters are trained, the new model structure tends to become complicated and its feature-mapping relationships are heavily restricted by the fixed old parameters, which is not conducive to improving the model's fitting ability. For historical sample retention, in order to substantially reduce data storage cost and model retraining cost, attempts have been made to select and retain important samples from the complete sample set as a substitute for it. The common prior-art method is sample retention based on the nearest mean of exemplars (NME), but it only considers the optimal sample of the current iteration rather than the whole, so the selected samples are essentially locally optimal. When a sample in the complete sample set lies very close to the mean center, multiple copies of that sample may enter the sample subset, reducing the diversity of the sample set, causing serious information loss, and leaving the model unable to effectively retain knowledge of old samples. In summary, the prior art suffers from high model complexity and an inability to effectively manage historical samples.
Summary of the invention
The purpose of the present invention is to provide an incremental equipment fault diagnosis method driven by a small number of labeled old samples, so as to overcome the problems of high model complexity and poor historical-sample management.

The purpose of the present invention can be achieved through the following technical solutions:

An incremental equipment fault diagnosis method, which uses a trained fault diagnosis model to process the data to be diagnosed as new samples pour in, obtaining fault diagnosis results for the equipment. The construction of the fault diagnosis model includes:

Step S1: Obtain and process sensor data related to the equipment status, and construct a complete sample set;

Step S2: Construct an initial fault diagnosis model based on a deep neural network, and train it with the complete sample set;

Step S3: Based on a genetic-algorithm sample retention method, selectively retain from the complete sample set an important sample subset that characterizes the statistical properties of the complete sample set;

Step S4: When new samples pour in, construct an intermediate fault diagnosis model based on the structure and parameters of the initial fault diagnosis model, and initialize its parameters;

Step S5: Adjust the objective function used for parameter optimization of the intermediate fault diagnosis model; based on the knowledge distillation algorithm, jointly train the intermediate fault diagnosis model with the new sample set and the important sample subset, obtain and test the final fault diagnosis model, and end.
Further, constructing the complete sample set includes normalizing the sensor data, limiting values to [0, 1]; aligning the sensors in time and cutting the record into several signal segments, each segment serving as one sample to construct the complete sample set.
Further, the selective retention of the important sample subset is implemented through a genetic-algorithm-based sample retention method, including:

S310: Screen the sample set correctly classified by the initial fault diagnosis model;

S320: Binary-encode the indices of the screened sample set to form genes;

S330: Randomly initialize the population;

S340: Calculate the fitness of each individual in the population;

S350: Perform roulette-wheel selection, two-point crossover, and multi-point mutation on the population, and return to step S340 until the iteration stop condition is met, generating the final population;

S360: Decode the optimal individual in the final population to obtain the important sample subset.
Further, the fitness calculation includes:

Step S341: Decode all individuals in the population generated by the current iteration to obtain the sample subset corresponding to each individual;

Step S342: Input the complete sample set and the current sample subset into the fault diagnosis model respectively to obtain their respective logits vector sets;
Step S343: Calculate the mean centers of the logits vector sets of the complete sample set and the current sample subset, obtaining $\hat{\mu}$ and $\mu$ respectively, and calculate the fitness of each individual.
Further, the fitness of each individual is calculated as

$$F = \frac{1}{\left\| \hat{\mu} - \mu \right\|_2}$$

where F is the fitness of each individual.
Further, constructing the intermediate fault diagnosis model and initializing its parameters includes:

Step S410: Construct an intermediate fault diagnosis model with the same structure as the initial fault diagnosis model, and update the number of its output neurons to equal the number of fault categories contained in the sample set;

Step S420: Load the neuron weights and biases of the initial fault diagnosis model into the intermediate fault diagnosis model as its initial training weights and parameters, and initialize the extra neuron weights and biases to simulate zero output values.
Further, obtaining the final fault diagnosis model is implemented based on the knowledge distillation algorithm, including:

Step S510: Freeze the parameters of the initial fault diagnosis model so that they do not participate in parameter optimization, and merge the new sample set and the important sample subset into a training sample set;

Step S520: Input the training sample set into the initial fault diagnosis model and the intermediate fault diagnosis model simultaneously; under the adjustment of the temperature coefficient T, obtain the soft labels and the soft prediction distribution of the old categories respectively, calculate the distillation loss between the two, and thereby obtain the total distillation loss function;

Step S530: Input the training sample set into the intermediate fault diagnosis model to obtain the prediction distribution over all categories, and calculate the cross-entropy loss between this prediction distribution and the true labels of the training sample set;

Step S540: Add the distillation loss and the cross-entropy loss to obtain the total loss; the total loss function serves as the objective function for reverse optimization of the intermediate fault diagnosis model's parameters, yielding the final fault diagnosis model.

Further, testing the final fault diagnosis model includes setting the temperature coefficient T to 1, inputting test samples into the model to obtain classification results, and performing performance evaluation.
Further, the extra neuron weights and biases are initialized to 1×10⁻⁶.
Further, the total distillation loss function is given by:

$$\hat{q}_i = \mathrm{softmax}\left(\hat{h}_i / T\right), \qquad q_i = \mathrm{softmax}\left(h_i / T\right)$$

$$L_{kd}^{(l)}(\theta) = -\sum_i \hat{q}_i \log q_i$$

$$L_{kd} = T^2 \cdot \frac{cls_o}{cls_n} \sum_l \rho_l L_{kd}^{(l)}$$

where T denotes the temperature coefficient, T > 1; softmax is the normalized exponential function; cls_n and cls_o denote the numbers of new and old categories respectively; $\hat{h}_i$ and $h_i$ denote the i-th pixel of the feature map output by a given layer of the old model and the new model respectively; $\hat{q}_i$ denotes the soft label output by the old model and $q_i$ the soft prediction distribution related to the old categories output by the new model; θ denotes the parameters of the deep neural network; ρ_l denotes the constant coefficient of the l-th distillation network layer; $L_{kd}^{(l)}$ denotes the distillation loss of the l-th network layer; and $L_{kd}$ denotes the total distillation loss function.
Compared with the prior art, the present invention has the following features:

1. Based on the knowledge distillation algorithm, the invention uses the new sample set and the important sample subset to form a total loss function as the objective for jointly training the constructed intermediate fault diagnosis model, yielding the final model. This realizes migration of old samples and effective learning of new samples, so the model can not only discriminate new fault characteristics and new fault types but also maintain good memory of historical samples.

2. The invention constructs the intermediate fault diagnosis model from the structure and parameters of the initial fault diagnosis model, which can mine the latent correlation between new and old samples from the initial model, adaptively constrain the direction of parameter optimization, and further reduce memory consumption.

3. The invention establishes the fault diagnosis model based on deep learning, which has excellent nonlinear feature representation and fitting capability and can extract key features from a large number of samples and accurately identify fault categories.
Description of the drawings
Figure 1 is a flow chart of the incremental equipment fault diagnosis method of the present invention;

Figure 2 is a schematic diagram of the chromosome coding of the present invention;

Figure 3 is a flow chart of the GA-based sample retention method of the present invention;

Figure 4 is a schematic diagram of the updated model structure of the present invention;

Figure 5 is an application flow chart of the knowledge distillation method of the present invention, where (5a) is a flow chart of the method applied to the training process and (5b) a flow chart of the method applied to the testing process;

Figure 6 shows the experimental results of the incremental equipment fault diagnosis method of the present invention, where (6a)-(6d) are the experimental results when the sample retention number is 5, 10, 20, and 30 respectively.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention and provides a detailed implementation and specific operating procedure, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment:
This embodiment proposes an incremental equipment fault diagnosis method, which uses a trained fault diagnosis model to process the data to be diagnosed as new samples pour in, obtaining the fault diagnosis results of the equipment. The construction of the fault diagnosis model is as follows.

Figure 1 shows the flow of the incremental equipment fault diagnosis method, which includes the following steps:

Step S1: Obtain and process sensor data related to the equipment status, and construct a complete sample set;

Step S2: Construct an initial fault diagnosis model based on a deep neural network, train it with the complete sample set, and optimize its parameters;

Step S3: Selectively retain from the complete sample set an important sample subset that characterizes its statistical properties;

Step S4: When new samples pour in, construct an intermediate fault diagnosis model based on the initial fault diagnosis model and initialize its parameters;

Step S5: Adjust the objective function used for parameter optimization of the intermediate fault diagnosis model; based on the knowledge distillation algorithm, jointly train the intermediate model with the new sample set and the important sample subset, obtain and test the final fault diagnosis model, and end.
Step S1 includes:

Step S110: From the various sensor data monitoring the equipment status, select the data that characterize the equipment's fault and health states, where the status data must be continuously measurable and recordable.

Step S120: Normalize the sensor data, limiting values to [0, 1]; align the sensors in time and cut the record into several signal segments, each segment serving as one sample to construct the complete sample set.
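The following is a minimal sketch of steps S110-S120, assuming min-max normalization and non-overlapping fixed-length segmentation (the patent fixes neither choice); all function and variable names are illustrative:

```python
import numpy as np

def build_sample_set(sensor_signals, segment_length):
    """Step S110-S120 sketch: normalize each sensor channel to [0, 1], align the
    channels in time, and cut the record into fixed-length segments (one sample each)."""
    normalized = []
    for sig in sensor_signals:                    # e.g. [DE, FE, BA] vibration records
        sig = np.asarray(sig, dtype=np.float64)
        # Min-max normalization limits values to [0, 1]
        normalized.append((sig - sig.min()) / (sig.max() - sig.min() + 1e-12))
    # Align channels by truncating to the shortest common length
    min_len = min(len(sig) for sig in normalized)
    aligned = np.stack([sig[:min_len] for sig in normalized])   # (channels, time)
    # Non-overlapping segmentation; each segment is one sample
    n_seg = min_len // segment_length
    samples = aligned[:, :n_seg * segment_length]
    samples = samples.reshape(len(normalized), n_seg, segment_length)
    return samples.transpose(1, 0, 2)             # (samples, channels, segment_length)
```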
Step S2 includes:

Step S210: Build a deep neural network model based on a CNN, using fully connected and softmax layers for the multi-class task. The input feature-map size and number of channels are determined by the sampling size and the number of data sources; the final output is the probability vector of the input sample belonging to each fault category.

Step S220: Preset all hyperparameters related to model training, such as the learning rate and the maximum number of iterations. Train the fault diagnosis model with the complete sample set, using the Adam optimizer to optimize the model parameters.
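A minimal PyTorch sketch of the classifier of steps S210-S220 follows. The channel counts, kernel sizes, and two-layer depth are assumptions for illustration only (the actual structure parameters are in Table 3, reproduced as an image in the original publication); three input channels match the DE/FE/BA sensors used later in the embodiment:

```python
import torch
import torch.nn as nn

class DiagnosisCNN(nn.Module):
    """1-D CNN fault classifier: convolutional feature extractor followed by a
    fully connected output layer; softmax turns logits into category probabilities."""
    def __init__(self, in_channels=3, num_classes=3, segment_length=1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2),  # stride-2 feature
            nn.ReLU(),                                              # maps feed distillation
        )
        self.fc = nn.Linear(32 * (segment_length // 4), num_classes)

    def forward(self, x):                  # x: (batch, channels, segment_length)
        return self.fc(self.features(x).flatten(1))  # logits; softmax applied outside

model = DiagnosisCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # step S220: Adam optimizer
```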
As shown in Figure 3, step S3 selects and stores an important sample subset from the complete sample set to characterize its statistical properties, specifically including:

Step S310: Screen the samples in the complete sample set that are correctly classified by the initial fault diagnosis model;

Step S320: Chromosome binary encoding. The indices of the screened sample set are binary-encoded, with the code length determined by the (even) size of the sample subset to be constructed. Referring to Figure 2, "11" denotes the sample with index 11 in the complete sample set, encoded in binary as the gene "1011", and so on;

Step S330: Randomly initialize the population;

Step S340: Calculate the fitness of each individual in the population;

Step S350: Use roulette-wheel selection, two-point crossover, and multi-point mutation as the selection, crossover, and mutation operators to generate a new population; return to step S340 until the iteration stop condition is met, generating the final population;

Step S360: Decode the optimal individual in the final population to obtain all sample indices, construct the sample subset from the corresponding samples, and generate the important sample subset (a sketch of the encoding and operators is given below).
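A sketch of the encoding and the genetic operators of steps S320-S350, assuming fixed-width binary index genes and bit-flip mutation; the gene width and mutation rate are illustrative:

```python
import random

BITS_PER_INDEX = 10   # gene width; must cover the largest sample index (illustrative)

def encode(indices):
    """Chromosome = concatenation of fixed-width binary codes of sample indices,
    e.g. index 11 -> '0000001011' (step S320)."""
    return "".join(format(i, f"0{BITS_PER_INDEX}b") for i in indices)

def decode(chromosome):
    return [int(chromosome[i:i + BITS_PER_INDEX], 2)
            for i in range(0, len(chromosome), BITS_PER_INDEX)]

def roulette_select(population, fitnesses):            # step S350: selection operator
    total = sum(fitnesses)
    return random.choices(population, weights=[f / total for f in fitnesses], k=2)

def two_point_crossover(a, b):                          # step S350: crossover operator
    i, j = sorted(random.sample(range(len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def multi_point_mutation(chrom, rate=0.01):             # step S350: mutation operator
    return "".join(c if random.random() > rate else str(1 - int(c)) for c in chrom)
```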
The fitness calculation in step S340 includes:

Step S341: Decode all individuals in the population generated by the current iteration to obtain the sample subset corresponding to each individual;

Step S342: Input the complete sample set and the current sample subset into the fault diagnosis model respectively to obtain the logits vector set of each, where the logits vector is the output vector of the last fully connected layer.
Step S343: Calculate the mean centers of the logits vector sets of the complete sample set and the current sample subset, obtaining $\hat{\mu}$ and $\mu$ respectively; then compute the fitness $F = 1 / \left\| \hat{\mu} - \mu \right\|_2$ to obtain the fitness of each individual in the current population.
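The fitness formula in the published document is an embedded image; the sketch below assumes the natural reading of step S343, i.e. the reciprocal of the distance between the two logits mean centers, so that subsets whose mean center lies closer to that of the complete set score higher:

```python
import numpy as np

def fitness(logits_full, logits_subset):
    """Fitness of one individual: reciprocal distance between the logits mean
    center of the complete sample set and that of the candidate subset."""
    mu_full = logits_full.mean(axis=0)    # mean center of the complete sample set
    mu_sub = logits_subset.mean(axis=0)   # mean center of the decoded subset
    return 1.0 / (np.linalg.norm(mu_full - mu_sub) + 1e-12)
```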
In a specific implementation, constructing the intermediate fault diagnosis model in step S4 and initializing its parameters includes:

Step S410: Construct an intermediate fault diagnosis model with the same structure as the initial fault diagnosis model, and update the number of its output neurons to equal the number of fault categories contained in the sample set, as shown in Figure 4;

Step S420: Load the weights and biases of the initial fault diagnosis model into this round's intermediate fault diagnosis model as its initial training weights and parameters. Since the initial fault diagnosis model has fewer output-layer neurons than the intermediate fault diagnosis model, the extra neuron weights and biases are initialized to 1×10⁻⁶ to simulate zero output values.
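A PyTorch sketch of the output-layer expansion in steps S410-S420, assuming the output layer is an `nn.Linear` as in the CNN sketch above; only the output layer is shown, since the remaining layers are copied unchanged:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def expand_output_layer(old_fc: nn.Linear, num_new_classes: int) -> nn.Linear:
    """Build the intermediate model's output layer: copy the old neurons'
    weights/biases, initialize the extra neurons to 1e-6 to mimic zero outputs."""
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + num_new_classes)
    new_fc.weight.fill_(1e-6)                               # extra neurons ~ zero output
    new_fc.bias.fill_(1e-6)
    new_fc.weight[: old_fc.out_features] = old_fc.weight    # transfer old knowledge
    new_fc.bias[: old_fc.out_features] = old_fc.bias
    return new_fc
```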
Obtaining the final fault diagnosis model in step S5 includes:

Step S510: Freeze the parameters of the initial fault diagnosis model so that they do not participate in parameter optimization, and merge the new sample set and the important sample subset into a training sample set;

Step S520: Input the training sample set into the initial fault diagnosis model and the intermediate fault diagnosis model simultaneously to obtain, under the adjustment of the temperature coefficient T, the soft labels and the soft prediction distribution of the old categories respectively, and calculate the distillation loss between the two, weighted by the ratio of the numbers of old and new categories;

Step S530: Input the training sample set into the intermediate fault diagnosis model to obtain the prediction distribution over all categories, and calculate the cross-entropy loss between this prediction distribution and the true labels of the training sample set;

Step S540: Add the distillation loss and the cross-entropy loss to obtain the total loss; the total loss function serves as the objective function for reverse optimization of the intermediate fault diagnosis model's parameters, yielding the final fault diagnosis model.
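A simplified PyTorch sketch of one optimization step of steps S510-S540, distilling only on the output layer for brevity (the patent additionally distills intermediate feature maps, per the formulas given below) and applying the T² and cls_o/cls_n weighting described later; the default category counts match this embodiment:

```python
import torch
import torch.nn.functional as F

def update_step(old_model, new_model, x, y, optimizer, T=2.0, cls_o=3, cls_n=2):
    """One training step: distillation loss on the old-category outputs
    (frozen teacher) plus cross-entropy over all categories."""
    with torch.no_grad():                                 # step S510: teacher frozen
        soft_labels = F.softmax(old_model(x) / T, dim=1)  # old-category soft labels
    logits = new_model(x)
    log_soft_preds = F.log_softmax(logits[:, :cls_o] / T, dim=1)
    # Step S520: distillation loss, weighted by T^2 and the old/new category ratio
    distill = -(soft_labels * log_soft_preds).sum(dim=1).mean()
    distill = (cls_o / cls_n) * (T ** 2) * distill
    ce = F.cross_entropy(logits, y)                       # step S530: cross-entropy
    loss = distill + ce                                   # step S540: total loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```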
Testing the final fault diagnosis model includes setting the temperature coefficient T to 1, inputting the test samples into the model to obtain the classification results, and performing performance evaluation.
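With T = 1 the model reduces to a plain softmax classifier; a sketch, assuming `final_model` and `x_test` follow the earlier sketches:

```python
import torch

with torch.no_grad():
    probs = torch.softmax(final_model(x_test), dim=1)  # per-category probabilities
    predictions = probs.argmax(dim=1)                  # predicted fault categories
```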
Figure (5a) is a schematic flow chart of the knowledge distillation method of the present invention applied to the training process, and Figure (5b) of the method applied to the testing process.
In a specific implementation, when the training sample set is input into the initial fault diagnosis model and the intermediate fault diagnosis model in step S520, the distillation loss is calculated on the feature maps output by multiple intermediate layers, and the loss coefficient is adjusted according to the change in sample-category proportions. The specific calculation is as follows.

The distillation loss function of a single network layer is calculated as:

$$\hat{q}_i = \mathrm{softmax}\left(\hat{h}_i / T\right), \qquad q_i = \mathrm{softmax}\left(h_i / T\right)$$

$$L_{kd}^{(l)}(\theta) = -\sum_i \hat{q}_i \log q_i$$

where T denotes the temperature coefficient, T > 1; softmax is the normalized exponential function; cls_n and cls_o denote the numbers of new and old categories respectively; $\hat{h}_i$ and $h_i$ denote the i-th pixel of the feature map output by a given layer of the old model and the new model respectively; $\hat{q}_i$ denotes the soft label output by the old model and $q_i$ the soft prediction distribution related to the old categories output by the new model; θ denotes the parameters of the deep neural network; ρ_l denotes the constant coefficient of the l-th distillation network layer; $L_{kd}^{(l)}$ denotes the distillation loss of the l-th network layer; and $L_{kd}$ denotes the total distillation loss function.
For a deep neural network, knowledge distillation is applied to every network layer with a downsampling function. Since such layers are usually evenly distributed, the number of distillation layers can be adjusted dynamically according to the network depth.
An adaptability coefficient is set according to the change in the numbers of new and old categories; when each category contains a comparable number of samples, the ratio of the number of old categories cls_o to the number of new categories cls_n is used for the adjustment. In addition, considering the influence of the temperature coefficient T on the magnitude of the distillation loss, T² is used for compensation.
The total distillation loss function is calculated as:

$$L_{kd} = T^2 \cdot \frac{cls_o}{cls_n} \sum_l \rho_l L_{kd}^{(l)}$$

where $L_{kd}$ is the total distillation loss function, $\rho_l$ and $L_{kd}^{(l)}$ denote the constant coefficient and the distillation loss of the l-th distillation network layer, and cls_n and cls_o denote the numbers of new and old categories respectively.
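A sketch implementing the per-layer and total distillation losses above, assuming each model exposes the feature maps of its stride-2 layers as a list (as Table 3 suggests for this embodiment); the pixel-wise softmax over the flattened feature map is one plausible reading of the "i-th pixel" formulation:

```python
import torch
import torch.nn.functional as F

def layer_distill_loss(feat_old, feat_new, T):
    """L_kd^(l): temperature-softened pixel-wise distributions of the teacher
    (old model) and student (new model) feature maps of one distillation layer."""
    q_hat = F.softmax(feat_old.flatten(1) / T, dim=1)      # soft label (old model)
    log_q = F.log_softmax(feat_new.flatten(1) / T, dim=1)  # soft prediction (new model)
    return -(q_hat * log_q).sum(dim=1).mean()

def total_distill_loss(feats_old, feats_new, rhos, T, cls_o, cls_n):
    """L_kd = T^2 * (cls_o / cls_n) * sum_l rho_l * L_kd^(l)."""
    per_layer = [rho * layer_distill_loss(fo, fn, T)
                 for rho, fo, fn in zip(rhos, feats_old, feats_new)]
    return (T ** 2) * (cls_o / cls_n) * sum(per_layer)
```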
In a specific implementation, to verify the performance of the embodiment, the Case Western Reserve University (CWRU) bearing data set is used for case study and analysis. The CWRU data set is as follows: three accelerometers collect vibration data at different ends, namely the drive end accelerometer data (DE), the fan end accelerometer data (FE), and the basic acceleration data (BA); there are four operating speeds, 1730, 1750, 1772, and 1797 rpm; three fault diameters, 0.007, 0.014, and 0.021 inches; and three fault states, namely inner ring fault (IRF), rolling element fault (BF), and outer ring fault (ORF), where the outer ring fault also includes three measuring points: the 6 o'clock position directly in the load zone, the 3 o'clock position orthogonal to the load zone, and the 12 o'clock position opposite the load zone.
This embodiment uses the status data of bearings with fault diameters of 0.007, 0.014, and 0.021 inches operating at 1797 rpm, covering the three faults IRF, BF, and ORF (load-zone measuring point at the 6 o'clock position). The bearing status data include the DE, FE, and BA channels, sampled at 12 kHz. The fault states of the data set are divided into 9 categories; the fault-type numbers are shown in Table 1 (category numbers 0-8), and the amount of data per fault type is shown in Table 2. The model structure parameters are shown in Table 3, where cls denotes the current number of fault categories; the outputs of the convolutional layers with stride 2 are used to calculate the distillation loss.
Table 1 Fault type numbers in the CWRU data set
Table 2 Data volume of each fault type in the CWRU data set
Table 3 Network model parameters
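A sketch of the preprocessing used to build the sample set (min-max normalization to [0, 1] followed by slicing each time-aligned channel into fixed-length segments); the segment length of 1024 points is an assumption, not a value stated here:

```python
import numpy as np

def build_samples(signal: np.ndarray, seg_len: int = 1024) -> np.ndarray:
    """Normalize a 1-D vibration signal to [0, 1] and cut it into
    non-overlapping segments; each segment becomes one sample."""
    s = (signal - signal.min()) / (signal.max() - signal.min() + 1e-12)
    n_seg = len(s) // seg_len
    return s[: n_seg * seg_len].reshape(n_seg, seg_len)
```

The same routine would be applied to each of the DE, FE, and BA channels after time alignment.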
This embodiment proceeds in four stages: one initial learning stage and three incremental learning stages, referred to as the "initial stage" and "incremental stage i" (i = 1, 2, 3), respectively. In the initial stage only samples of fault classes 0-2 are available; in each subsequent incremental stage, samples of two new classes arrive for model update learning. The results of the incremental equipment fault diagnosis experiment are shown in Figure 6. The experiment compares incremental learning with the sample retention method alone against incremental learning combining knowledge distillation with sample retention, where "sample retention N" denotes using only the GA-based sample retention method with N key samples retained, and "knowledge distillation + sample retention N" denotes using both knowledge distillation and sample retention with N key samples retained.
As Figure 6(a) shows, "sample retention 5" yields the worst incremental learning performance throughout the experiment: by incremental stage 3, the model's diagnostic accuracy on the test samples of classes 0-2, 3-4, and 5-6 has dropped to 42.16%, 56.39%, and 51.55%, respectively. With "knowledge distillation + sample retention 5", owing to the regularization effect, the accuracy at incremental stage 3 on classes 0-2, 3-4, and 5-6 is 63.17%, 90.59%, and 55.05%, respectively, a substantial overall improvement.
Similarly, Figure 6(b) shows that by incremental stage 3, "knowledge distillation + sample retention 10" reaches diagnostic accuracies of 83.73%, 97.44%, and 70.79% on the test samples of classes 0-2, 3-4, and 5-6, overall higher than the incremental learning results of "sample retention 10" (61.37%, 79.71%, and 76.97%).
In Figures 6(c) and 6(d), when the number of retained samples reaches 20 and 30, the incremental learning performance remains essentially stable under the combined effect of knowledge distillation and sample retention, with good fault-class discrimination on both new and old classes. The overall accuracy difference between "knowledge distillation + sample retention 30" and "knowledge distillation + sample retention 20" is 2.29%; considering incremental stage 3 alone, the difference is 1.35%. In other words, retaining 10 more samples per old class brings only a small gain in overall diagnostic accuracy. Moreover, "knowledge distillation + sample retention 20" achieves an overall diagnostic accuracy 1.01% higher than "sample retention 30". This indicates that the model accuracy is close to saturation, and that knowledge distillation can further reduce the number of retained samples while improving the diagnostic accuracy on old-class samples.
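To make the incremental update concrete, the following PyTorch sketch expands the classifier when new classes arrive and runs one training step on the combined objective (cross-entropy plus distillation, cf. steps S510-S540 below). The attribute name `fc` for the output layer is an assumption, and the distillation term shown here covers the output layer only, for brevity; the new output weights and biases are initialized to 1×10⁻⁶ so that the new logits start near zero.

```python
import copy
import torch
import torch.nn.functional as F

def expand_classifier(old_model: torch.nn.Module, n_new: int,
                      eps: float = 1e-6) -> torch.nn.Module:
    """Clone the old model and widen its output layer by n_new neurons,
    warm-starting from the old weights."""
    new_model = copy.deepcopy(old_model)
    old_fc = new_model.fc                      # assumed output-layer name
    out_old, in_feats = old_fc.weight.shape
    new_fc = torch.nn.Linear(in_feats, out_old + n_new)
    with torch.no_grad():
        new_fc.weight[:out_old] = old_fc.weight    # keep old knowledge
        new_fc.bias[:out_old] = old_fc.bias
        new_fc.weight[out_old:].fill_(eps)         # near-zero new logits
        new_fc.bias[out_old:].fill_(eps)
    new_model.fc = new_fc
    return new_model

def train_step(new_model, old_model, x, y, opt, T, cls_o, cls_n):
    """One step on total loss = cross-entropy + distillation loss."""
    with torch.no_grad():
        old_logits = old_model(x)                  # frozen teacher
    new_logits = new_model(x)
    ce = F.cross_entropy(new_logits, y)
    kd = -(F.softmax(old_logits / T, dim=1)
           * F.log_softmax(new_logits[:, :cls_o] / T, dim=1)
           ).sum(dim=1).mean() * (T ** 2) * cls_o / cls_n
    loss = ce + kd
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```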
If the above functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The preferred embodiments of the present invention are described in detail above. It should be understood that a person of ordinary skill in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (10)

  1. An incremental equipment fault diagnosis method, characterized in that the method uses a trained fault diagnosis model to process data to be diagnosed into which new samples continually arrive, to obtain a fault diagnosis result of the equipment, the construction of the fault diagnosis model comprising:
    Step S1: acquiring and processing sensor data related to the equipment state, and constructing a complete sample set;
    Step S2: constructing an initial fault diagnosis model based on a deep neural network, and training the initial fault diagnosis model with the complete sample set;
    Step S3: based on a sample retention method using a genetic algorithm, selectively retaining from the complete sample set an important sample subset that characterizes the statistical properties of the complete sample set;
    Step S4: when new samples arrive, constructing an intermediate fault diagnosis model based on the structure and parameters of the initial fault diagnosis model, and initializing its parameters;
    Step S5: adjusting the objective function used for parameter optimization of the intermediate fault diagnosis model, jointly training the intermediate fault diagnosis model with the new sample set and the important sample subset based on a knowledge distillation algorithm, and obtaining and testing the final fault diagnosis model, then ending.
  2. The incremental equipment fault diagnosis method according to claim 1, characterized in that constructing the complete sample set comprises normalizing the sensor data so that the values are confined to [0, 1], aligning the sensors in time, and cutting the signals into several segments, each segment serving as one sample for building the complete sample set.
  3. The incremental equipment fault diagnosis method according to claim 1, characterized in that the selective retention of the important sample subset is implemented by a genetic-algorithm-based sample retention method, comprising:
    S310: screening the set of samples correctly classified by the initial fault diagnosis model;
    S320: binary-encoding the indices of the screened sample set to form genes;
    S330: randomly initializing a population;
    S340: calculating the fitness of each individual in the population;
    S350: performing roulette-wheel selection, two-point crossover, and multi-point mutation on the population, and returning to step S340 until an iteration stop condition is met, generating a final population;
    S360: decoding the optimal individual in the final population to obtain the important sample subset.
  4. The incremental equipment fault diagnosis method according to claim 3, characterized in that the calculation of the fitness comprises:
    Step S341: decoding all individuals in the population generated in the current iteration to obtain the sample subset corresponding to each individual;
    Step S342: inputting the complete sample set and the current sample subset into the fault diagnosis model respectively to obtain their respective sets of logits vectors;
    Step S343: calculating the mean centers of the logits vector sets of the complete sample set and of the current sample subset, obtaining μ̄ and μ respectively, and calculating the fitness of each individual.
  5. The incremental equipment fault diagnosis method according to claim 4, characterized in that the fitness of each individual is calculated as

    F = 1 / (1 + ‖μ̄ − μ‖₂)

    where F is the fitness of each individual.
  6. The incremental equipment fault diagnosis method according to claim 1, characterized in that constructing the intermediate fault diagnosis model and initializing its parameters comprises:
    Step S410: constructing an intermediate fault diagnosis model with the same structure as the initial fault diagnosis model, and updating the number of its output neurons so that it equals the number of fault categories contained in the sample set;
    Step S420: loading the neuron weights and biases of the initial fault diagnosis model into the intermediate fault diagnosis model as its initial training weights and parameters, and initializing the additional neuron weights and biases so as to approximate zero output values.
  7. The incremental equipment fault diagnosis method according to claim 1, characterized in that obtaining the final fault diagnosis model is implemented based on a knowledge distillation algorithm, comprising:
    Step S510: freezing the parameters of the initial fault diagnosis model so that they do not take part in parameter optimization, and merging the new sample set and the important sample subset into a training sample set;
    Step S520: inputting the training sample set into both the initial fault diagnosis model and the intermediate fault diagnosis model, obtaining, under adjustment by a temperature coefficient T, the soft labels and the soft prediction distribution for the old categories respectively, deriving the total distillation loss function, and calculating the distillation loss between the two;
    Step S530: inputting the training sample set into the intermediate fault diagnosis model to obtain the prediction distribution over all categories, and calculating the cross-entropy loss between the prediction distribution over all categories and the true labels of the training sample set;
    Step S540: adding the distillation loss and the cross-entropy loss to obtain the total loss, and using the total loss function as the objective function to optimize the parameters of the intermediate fault diagnosis model by back-propagation, obtaining the final fault diagnosis model.
  8. The incremental equipment fault diagnosis method according to claim 1, characterized in that testing the final fault diagnosis model comprises setting the temperature coefficient T to 1, inputting test samples into the model to obtain classification results, and performing a performance evaluation.
  9. The incremental equipment fault diagnosis method according to claim 6, characterized in that the additional neuron weights and biases are initialized to 1×10⁻⁶.
  10. The incremental equipment fault diagnosis method according to claim 7, characterized in that the total distillation loss function is given by:

    ŷᵢᵒ = softmax(hᵢᵒ / T)

    ŷᵢⁿ = softmax(hᵢⁿ / T)

    L_kd^(l)(θ) = −Σᵢ ŷᵢᵒ · log ŷᵢⁿ

    L_kd = T² · (cls_o / cls_n) · Σ_l ρ_l · L_kd^(l)

    where T is the temperature coefficient, T > 1; softmax is the normalized exponential function; cls_n and cls_o are the numbers of new and old categories, respectively; hᵢᵒ and hᵢⁿ denote the i-th pixel of the feature map output by a given layer of the old model and of the new model, respectively; ŷᵢᵒ denotes the soft label output by the old model; ŷᵢⁿ denotes the soft prediction distribution related to the old categories output by the new model; θ denotes the parameters of the deep neural network; ρ_l denotes the constant coefficient of the l-th distillation network layer; L_kd^(l) denotes the distillation loss of the l-th distillation network layer; and L_kd denotes the total distillation loss function.
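As a non-limiting illustration of the GA-based sample retention recited in claims 3 to 5, the sketch below operates on binary masks over the correctly classified candidate samples. The reciprocal-distance fitness mirrors the formula in claim 5, and all names, rates, and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(pop: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Fitness per binary individual: closeness between the mean logits
    vector of the selected subset and that of the complete sample set."""
    mu_full = logits.mean(axis=0)
    f = np.zeros(len(pop))
    for k, ind in enumerate(pop):
        if ind.any():                              # guard: empty subset
            mu_sub = logits[ind.astype(bool)].mean(axis=0)
            f[k] = 1.0 / (1.0 + np.linalg.norm(mu_full - mu_sub))
    return f

def roulette(pop: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Roulette-wheel selection proportional to fitness (step S350)."""
    return pop[rng.choice(len(pop), size=len(pop), p=f / f.sum())]

def two_point_crossover(pop: np.ndarray, pc: float = 0.8) -> np.ndarray:
    """Swap the gene segment between two random cut points."""
    pop = pop.copy()
    for i in range(0, len(pop) - 1, 2):
        if rng.random() < pc:
            a, b = np.sort(rng.choice(pop.shape[1], 2, replace=False))
            pop[i, a:b], pop[i + 1, a:b] = (pop[i + 1, a:b].copy(),
                                            pop[i, a:b].copy())
    return pop

def multi_point_mutation(pop: np.ndarray, pm: float = 0.01) -> np.ndarray:
    """Flip each bit independently with probability pm."""
    return np.where(rng.random(pop.shape) < pm, 1 - pop, pop)
```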
PCT/CN2022/131657 2022-09-20 2022-11-14 Incremental device fault diagnosis method WO2024060381A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211143419.3A CN115510963A (en) 2022-09-20 2022-09-20 Incremental equipment fault diagnosis method
CN202211143419.3 2022-09-20

Publications (1)

Publication Number Publication Date
WO2024060381A1 (en)


Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131657 WO2024060381A1 (en) 2022-09-20 2022-11-14 Incremental device fault diagnosis method

Country Status (2)

Country Link
CN (1) CN115510963A (en)
WO (1) WO2024060381A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116432091B (en) * 2023-06-15 2023-09-26 山东能源数智云科技有限公司 Equipment fault diagnosis method based on small sample, construction method and device of model
CN117313000B (en) * 2023-09-19 2024-03-15 北京交通大学 Motor brain learning fault diagnosis method based on sample characterization topology
CN117390407B (en) * 2023-12-13 2024-04-05 国网山东省电力公司济南供电公司 Fault identification method, system, medium and equipment of substation equipment
CN117407797B (en) * 2023-12-15 2024-03-29 山东能源数智云科技有限公司 Equipment fault diagnosis method and model construction method based on incremental learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162018A (en) * 2019-05-31 2019-08-23 天津开发区精诺瀚海数据科技有限公司 The increment type equipment fault diagnosis method that knowledge based distillation is shared with hidden layer
CN110866365A (en) * 2019-11-22 2020-03-06 北京航空航天大学 Mechanical equipment intelligent fault diagnosis method based on partial migration convolutional network
US20200175384A1 (en) * 2018-11-30 2020-06-04 Samsung Electronics Co., Ltd. System and method for incremental learning
CN113205142A (en) * 2021-05-08 2021-08-03 浙江大学 Target detection method and device based on incremental learning
CN113741394A (en) * 2021-09-06 2021-12-03 河海大学 Industrial equipment fault diagnosis system based on semi-supervised incremental learning
CN114429153A (en) * 2021-12-31 2022-05-03 苏州大学 Lifetime learning-based gearbox increment fault diagnosis method and system

Also Published As

Publication number Publication date
CN115510963A (en) 2022-12-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22959382

Country of ref document: EP

Kind code of ref document: A1