CN114609994B

CN114609994B - Fault diagnosis method and device based on multi-granularity regularized rebalancing incremental learning

Info

Publication number: CN114609994B
Application number: CN202210174747.3A
Authority: CN
Inventors: 王煜; 陈慧彤; 胡清华
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2022-02-24
Filing date: 2022-02-24
Publication date: 2023-11-07
Anticipated expiration: 2042-02-24
Also published as: CN114609994A

Abstract

The invention discloses a fault diagnosis method and device based on multi-granularity regularization rebalancing increment learning, wherein the method comprises the following steps: constructing a continuous label with multi-granularity information, and optimizing by utilizing KL divergence loss; the feature extraction layer is used for obtaining feature expression vectors of new and old categories, decision output of the current model is constrained to be identical to output distribution of the model before incremental learning through knowledge distillation, relatively low weight is applied to the new categories with a large number of samples based on a multi-granularity regularization term, gap between new fault categories and gradient update of the old fault categories is balanced, and category imbalance is relieved; and adopting a two-stage training strategy, performing a first-stage training by using the data which can be acquired currently, updating the feature extraction layer, decoupling with the feature extraction layer during a second-stage training, namely freezing parameters of the feature extraction layer, and retraining the classifier by adopting a resampled balance training subset. The device comprises: a processor and a memory. The invention improves the continuous learning capability of the fault diagnosis model, thereby improving the fault recognition capability of the device.

Description

Fault diagnosis method and device based on multi-granularity regularized rebalanced incremental learning

技术领域Technical Field

本发明涉及机器学习的智能故障诊断领域，尤其涉及一种基于多粒度正则化重平衡增量学习的故障诊断方法及装置。The present invention relates to the field of intelligent fault diagnosis of machine learning, and in particular to a fault diagnosis method and device based on multi-granularity regularized rebalanced incremental learning.

背景技术Background Art

随着当代科技的飞速发展，大型装备不断推陈出新，在交通、能源、电力和航空航天等各领域中得到了广泛的使用，其安全稳定、可靠高效的运行与国家发展、国防建设息息相关。同时，随着运行时间的推移，重大装备不可避免的会发生故障，导致灾难性事故时有发生；而大型设备的复杂性和精密性导致其故障难以通过简单的探测手段检出。因此，如何设计面向大型设备智能故障诊断研究成为了工业智能化所面临的一个亟需解决的问题。With the rapid development of contemporary science and technology, large-scale equipment is constantly being innovated and widely used in various fields such as transportation, energy, electricity, and aerospace. Its safe, stable, reliable and efficient operation is closely related to national development and national defense construction. At the same time, as the operation time goes by, major equipment will inevitably fail, resulting in catastrophic accidents from time to time; and the complexity and precision of large-scale equipment make it difficult to detect its failures through simple detection methods. Therefore, how to design intelligent fault diagnosis research for large-scale equipment has become an urgent problem facing industrial intelligence.

现有智能化故障诊断方法通过收集装备不同时刻的传感器信息，基于所整理的数据设计深度神经网络训练后进行部署，针对新收集的传感器数据进行预测以判断设备当前属于正常状态或是某个故障状态。这种方法虽然能够达到不错的性能，但是智能故障诊断模型一旦部署将无法更新。通常地，装备的故障类型随着使用年限的增长会逐渐发生变化，之前未发生过的故障会接踵而至。现有方法由于仅能识别学习过的旧类型故障，因此难以进行实际使用。Existing intelligent fault diagnosis methods collect sensor information from equipment at different times, design deep neural networks based on the collected data, and then deploy them after training. They predict the newly collected sensor data to determine whether the equipment is currently in a normal state or a certain fault state. Although this method can achieve good performance, the intelligent fault diagnosis model cannot be updated once deployed. Generally, the fault type of equipment will gradually change with the increase of service life, and faults that have not occurred before will follow one after another. Existing methods are difficult to use in practice because they can only identify old types of faults that have been learned.

增量学习是使得模型进行更新以掌握新任务的一种方式，即模型在先前任务的基础上学习新任务，达到同时处理新旧任务的能力。一般地，增量学习模型在学习大量新任务数据的同时，可保留少部分旧任务代表性数据进行旧任务的知识保存。但是，由于新任务的数据量远高于旧任务，造成了深度神经网络模型在学习新任务时对旧任务相关的重要参数的改变，导致其对新旧任务的识别产生了严重偏差，即参数偏向于学习新任务，使得旧任务的识别能力大大下降。Incremental learning is a way to update the model to master new tasks, that is, the model learns new tasks based on previous tasks, so as to achieve the ability to handle new and old tasks at the same time. Generally, while learning a large amount of new task data, the incremental learning model can retain a small amount of representative data of old tasks to preserve the knowledge of old tasks. However, since the amount of data for new tasks is much higher than that for old tasks, the deep neural network model changes important parameters related to the old tasks when learning new tasks, resulting in serious deviations in its recognition of new and old tasks, that is, the parameters are biased towards learning new tasks, which greatly reduces the recognition ability of old tasks.

针对此问题，现有方法设计了一些偏差纠正的方法，添加线性纠正层、对新旧类别的输出余弦归一化等，但它们在很大程度上依赖于新旧类之间的偏差关系的假设。在实际应用中，这种假设很难成立，因此会损害模型的性能，并且难以适应复杂的实际应用场景。因此，如何解决增量学习中对旧任务的遗忘问题，是智能化故障诊断模型能否有效大规模应用的关键。To address this problem, existing methods have designed some deviation correction methods, such as adding linear correction layers, normalizing the output cosine of new and old categories, etc. However, they rely heavily on the assumption of the deviation relationship between the new and old categories. In practical applications, this assumption is difficult to hold, which will damage the performance of the model and make it difficult to adapt to complex practical application scenarios. Therefore, how to solve the problem of forgetting old tasks in incremental learning is the key to whether the intelligent fault diagnosis model can be effectively applied on a large scale.

发明内容Summary of the invention

本发明提供了一种基于多粒度正则化重平衡增量学习的故障诊断方法及装置，本发明考虑到故障诊断任务增量学习中新旧类型故障的偏差分布难以获得先验假设的问题，设计了一种基于多粒度正则化重平衡方法，约束模型能同时学习平衡的数据分布以及不同故障类别之间的相关性，使得模型在准确学习识别新故障时减少对已学习旧类别故障的知识遗忘，从而提高故障诊断模型的持续学习能力，进而提高对大型装备的故障识别能力，大幅度增加设备的安全性，降低故障率，详见下文描述：The present invention provides a fault diagnosis method and device based on multi-granularity regularized rebalancing incremental learning. Considering the problem that it is difficult to obtain a priori assumptions about the deviation distribution of new and old types of faults in incremental learning of fault diagnosis tasks, the present invention designs a multi-granularity regularized rebalancing method. The constraint model can simultaneously learn the balanced data distribution and the correlation between different fault categories, so that the model can reduce the knowledge forgetting of the learned old category faults when accurately learning to identify new faults, thereby improving the continuous learning ability of the fault diagnosis model, and then improving the fault identification ability of large equipment, greatly increasing the safety of the equipment, and reducing the failure rate. See the following description for details:

第一方面，一种基于多粒度正则化重平衡增量学习的故障诊断方法，所述方法包括：In a first aspect, a fault diagnosis method based on multi-granularity regularized rebalanced incremental learning is provided, the method comprising:

对划分后的数据集进行类别词向量表示，获取其语义标签所对应的词向量，使用K-means算法进行聚类获取一个两层的多粒度结构；构建带有多粒度信息的连续标签，并利用KL散度损失进行优化；The divided data set is represented by category word vectors, the word vectors corresponding to its semantic labels are obtained, and the K-means algorithm is used for clustering to obtain a two-layer multi-granularity structure; continuous labels with multi-granularity information are constructed and optimized using KL divergence loss;

通过特征提取层获取新旧类别的特征表示向量，通过知识蒸馏约束当前模型的决策输出与增量学习前模型的输出分布相同，基于多粒度正则化项，对于样本数目多的新故障类别施加相对低的权重，平衡新故障类别与旧故障类别梯度更新的差距，减轻类别不均衡性；The feature representation vectors of new and old categories are obtained through the feature extraction layer. The decision output of the current model is constrained to be the same as the output distribution of the model before incremental learning through knowledge distillation. Based on the multi-granularity regularization term, a relatively low weight is applied to new fault categories with a large number of samples to balance the gap between the gradient updates of new fault categories and old fault categories, thereby reducing category imbalance.

采取两阶段训练策略，使用当前能够获取到的数据进行第一阶段训练，更新特征提取层，在第二阶段训练时将分类器与特征提取层解耦，即冻结特征提取层的参数，采用重采样的平衡训练子集重新训练分类器。A two-stage training strategy is adopted. The currently available data is used for the first stage training to update the feature extraction layer. In the second stage training, the classifier is decoupled from the feature extraction layer, that is, the parameters of the feature extraction layer are frozen, and the classifier is retrained using a resampled balanced training subset.

其中，所述第一阶段训练过程为：将新类别样本D_new与旧类范例集D′_old混合，记作D_t作为模型的输入，经过特征提取层、分类器得到网络的输出，优化目标为最小化蒸馏损失和多粒度正则化的重平衡模块损失的加权和。Among them, the first stage training process is: mix the new category samples D _new with the old category example set D′ _old , denoted as D _t as the input of the model, and obtain the output of the network through the feature extraction layer and classifier. The optimization goal is to minimize the weighted sum of the distillation loss and the rebalancing module loss of multi-granularity regularization.

进一步地，所述第二阶段训练过程为：将数据采样成类别平衡的训练子集，冻结特征提取层的参数，单独重新训练分类器；优化目标为最小化蒸馏损失和多粒度正则化的重平衡模块损失的加权和。Furthermore, the second stage training process is: sampling the data into category-balanced training subsets, freezing the parameters of the feature extraction layer, and retraining the classifier separately; the optimization goal is to minimize the weighted sum of the distillation loss and the multi-granularity regularized rebalancing module loss.

第二方面，一种基于多粒度正则化重平衡增量学习的故障诊断装置，所述装置包括：处理器和存储器，所述存储器中存储有程序指令，所述处理器调用存储器中存储的程序指令以使装置执行第一方面中的任一项所述的方法步骤。In a second aspect, a fault diagnosis device based on multi-granularity regularized rebalanced incremental learning is provided, the device comprising: a processor and a memory, the memory storing program instructions, the processor calling the program instructions stored in the memory to enable the device to execute any one of the method steps described in the first aspect.

第三方面，一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序包括程序指令，所述程序指令被处理器执行时使所述处理器执行第一方面中的任一项所述的方法步骤。In a third aspect, a computer-readable storage medium stores a computer program, wherein the computer program includes program instructions, and when the program instructions are executed by a processor, the processor executes any one of the method steps described in the first aspect.

本发明提供的技术方案的有益效果是：The beneficial effects of the technical solution provided by the present invention are:

1、本发明同时考虑重平衡建模和故障类别间的相关关系，进一步提高对新旧类的学习能力，减轻模型的灾难性遗忘现象，提高故障诊断的识别能力，进而提高对大型装备的故障识别能力，大幅度增加设备的安全性，降低设备的故障率；1. The present invention simultaneously considers the correlation between rebalance modeling and fault categories, further improves the learning ability of new and old categories, reduces the catastrophic forgetting phenomenon of the model, improves the recognition ability of fault diagnosis, and further improves the fault recognition ability of large equipment, greatly increases the safety of equipment, and reduces the failure rate of equipment;

2、本发明针对工业场景大型装备的智能化故障诊断任务提出了一种新的多粒度正则化的重新平衡方法，此方法无需提前假设新旧类别之间的偏差关系，改进了智能故障诊断模型增量学习的旧任务灾难性遗忘问题；2. This paper proposes a new multi-granularity regularized rebalancing method for intelligent fault diagnosis tasks of large equipment in industrial scenarios. This method does not need to assume the deviation relationship between new and old categories in advance, and improves the catastrophic forgetting problem of old tasks in incremental learning of intelligent fault diagnosis models;

3、本发明设计了一个多粒度正则化的重平衡模块，根据不同故障类别的样本数量约束其对模型更新的影响，并通过构建故障类的层次结构，将基于该层次结构的类相关性嵌入到学习过程中；3. The present invention designs a multi-granularity regularized rebalancing module, which constrains the impact of different fault categories on model updating according to the number of samples, and embeds the class correlation based on the hierarchy into the learning process by constructing a hierarchy of fault classes;

4、本发明可以在故障诊断等增量学习任务中达到最先进的性能，与对比方法相比，平均精度提升最高可达7.47％和9.99％。4. The present invention can achieve state-of-the-art performance in incremental learning tasks such as fault diagnosis, with average precision improvements of up to 7.47% and 9.99% compared to the comparative methods.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为一种基于多粒度正则化重平衡增量学习的故障诊断方法流程图；FIG1 is a flow chart of a fault diagnosis method based on multi-granularity regularized rebalanced incremental learning;

图2为一大型故障诊断数据集FASON的多粒度结构构建示意图；FIG2 is a schematic diagram of a multi-granularity structure construction of a large fault diagnosis data set FASON;

图3为对比示意图；Figure 3 is a comparative schematic diagram;

其中，该图应用在大型故障诊断数据集上，采用多粒度正则化与仅采用重平衡方法相比，对全部类别的准确率的提升，可以看到绝大多数的类别在使用了多粒度正则化后准确率有显著地提升；Among them, this figure is applied to a large fault diagnosis data set. Compared with the rebalancing method alone, the accuracy of all categories is improved by using multi-granularity regularization. It can be seen that the accuracy of most categories is significantly improved after using multi-granularity regularization;

图4为一种基于多粒度正则化重平衡增量学习的故障诊断装置的结构示意图；FIG4 is a schematic diagram of the structure of a fault diagnosis device based on multi-granularity regularized rebalanced incremental learning;

表1为不同方法在公开数据集上，最后一阶段的准确率和平均准确率的对比；Table 1 shows the comparison of the accuracy and average accuracy of the last stage of different methods on the public dataset;

表2为不同方法在真实场景收集的故障诊断数据集上，最后一阶段的准确率和平均准确率的对比；Table 2 shows the comparison of the accuracy and average accuracy of the last stage of different methods on the fault diagnosis dataset collected in real scenarios;

表3为在FARON-22/22设置下的消融实验，表明本方法的各个组件的有效性。Table 3 shows the ablation experiments in the FARON-22/22 setting, demonstrating the effectiveness of each component of the proposed method.

具体实施方式DETAILED DESCRIPTION

为使本发明的目的、技术方案和优点更加清楚，下面对本发明实施方式作进一步地详细描述。In order to make the objectives, technical solutions and advantages of the present invention more clear, the embodiments of the present invention are described in further detail below.

实施例1Example 1

由于已学习过的旧故障类仅保存了少量样本，新故障类有大量的样本进行学习，为了解决背景技术中存在的问题，故障诊断任务中增量式学习新故障类别，会导致旧故障类别识别能力的严重退化，本发明实施例提出了一种基于多粒度正则化重平衡增量学习的故障诊断方法，在故障诊断中增量式学习新故障的同时保持以往旧故障类别的识别能力，从而通过快速高效的故障诊断模型更新和持续学习，参见图1，该方法包括以下步骤：Since only a small number of samples of the old fault classes that have been learned are saved, and a large number of samples of the new fault classes are learned, in order to solve the problem existing in the background technology, incremental learning of new fault categories in fault diagnosis tasks will lead to serious degradation of the recognition ability of old fault categories. The embodiment of the present invention proposes a fault diagnosis method based on multi-granularity regularized rebalanced incremental learning, which incrementally learns new faults in fault diagnosis while maintaining the recognition ability of old fault categories in the past, thereby quickly and efficiently updating and continuously learning the fault diagnosis model. Referring to FIG1, the method includes the following steps:

101：重平衡损失；101: rebalancing loss;

其中，重平衡损失基于新旧故障类别的样本数量学习类别的更新权重，通过增大样本稀疏的旧故障类别的梯度更新实现对其识别能力的保持。Among them, the rebalancing loss learns the updated weights of the categories based on the number of samples of the new and old fault categories, and maintains its recognition ability by increasing the gradient update of the old fault categories with sparse samples.

102：多粒度正则化项；102: Multi-granularity regularization term;

其中，由于仅使用重平衡损失会影响新类故障的学习，因此提出了一种多粒度正则化项，利用故障类别的层次结构信息约束模型学习故障类别间的关系，从而提升新旧类的识别能力。Among them, since only using rebalancing loss will affect the learning of new types of faults, a multi-granularity regularization term is proposed to use the hierarchical structure information of fault categories to constrain the model to learn the relationship between fault categories, thereby improving the recognition ability of new and old classes.

103：对分类器进行训练。103: Train the classifier.

其中，采取两阶段训练策略，使用当前能够获取到的数据进行第一阶段训练，更新特征提取层，在第二阶段训练时与特征提取层解耦，即冻结特征提取层的参数，采用重采样的平衡训练子集重新训练分类器。Among them, a two-stage training strategy is adopted. The currently available data is used for the first stage training to update the feature extraction layer. In the second stage training, it is decoupled from the feature extraction layer, that is, the parameters of the feature extraction layer are frozen, and the classifier is retrained using a resampled balanced training subset.

综上所述，本发明实施例通过上述步骤101-步骤103实现了同时考虑重平衡建模和故障类别间的相关关系，进一步提高对新旧类的学习能力，减轻模型的灾难性遗忘现象，提高故障诊断的识别能力，大幅度增加设备的安全性，降低设备的故障率。In summary, the embodiment of the present invention realizes simultaneous consideration of the correlation between rebalancing modeling and fault categories through the above steps 101 to 103, further improving the learning ability of new and old categories, alleviating the catastrophic forgetting phenomenon of the model, improving the recognition ability of fault diagnosis, greatly increasing the safety of the equipment, and reducing the failure rate of the equipment.

实施例2Example 2

下面结合图1、计算公式、表格对实施例1中的方案进行进一步地介绍，详见下文描述：The scheme in Example 1 is further introduced below in conjunction with FIG. 1 , calculation formulas, and tables. See the following description for details:

一、模型框架1. Model Framework

模型框架由一个特征提取模块、知识蒸馏、多粒度正则化的重平衡建模以及一个分类器构成，分为两阶段训练，参见图1所示。其中以下两个组成部分为方法核心：The model framework consists of a feature extraction module, knowledge distillation, multi-granularity regularized rebalancing modeling, and a classifier, and is divided into two stages of training, as shown in Figure 1. The following two components are the core of the method:

(1)重平衡建模：在训练过程中使用重平衡策略来减轻数据不平衡的影响；(1) Rebalanced modeling: Use a rebalancing strategy during training to mitigate the impact of data imbalance;

其中，上述的重平衡策略为本领域技术人员所公知，本发明实施例对此不做赘述。The above-mentioned rebalancing strategy is well known to those skilled in the art, and will not be elaborated in detail in the embodiment of the present invention.

(2)多粒度正则化：由于只使用再平衡策略会导致新类别分类精度的下降(即新类别的欠拟合)，因此本发明实施例设计了一个多粒度正则化项，使得模型考虑类相关性。(2) Multi-granularity regularization: Since only using the rebalancing strategy will lead to a decrease in the classification accuracy of new categories (i.e., underfitting of new categories), the embodiment of the present invention designs a multi-granularity regularization term so that the model takes class correlation into account.

二、数据划分、采样及构建多粒度结构2. Data Partitioning, Sampling, and Building Multi-granularity Structures

1、数据集划分1. Dataset division

本发明实施例将模型当前状态下学习过的类别称为旧类别，将未被学习过的类别并且在这一阶段即将被学习的类别称为新类别。对于增量学习故障诊断任务，需要在实验数据集上划分出不同批次的类别以模拟新类别的到来。为了保证实验公平性，类别学习顺序按照iCaRL、BiC、IL2M等对比方法(本领域技术人员所公知)统一设置，将随机数种子设置为1993。划分用n/m标识，其中n代表初始阶段学习的旧类别数，m代表在之后每轮需要学习的新类别数量，直到数据集的全部类别学习完毕。对CIFAR10、CIFAR100、MiniImageNet数据集以及FARON故障诊断数据集进行划分，具体细节如下：In the embodiment of the present invention, the categories that have been learned in the current state of the model are called old categories, and the categories that have not been learned and are about to be learned in this stage are called new categories. For incremental learning fault diagnosis tasks, it is necessary to divide different batches of categories on the experimental data set to simulate the arrival of new categories. In order to ensure the fairness of the experiment, the category learning order is uniformly set according to the comparison methods such as iCaRL, BiC, IL2M (well known to those skilled in the art), and the random number seed is set to 1993. The division is marked with n/m, where n represents the number of old categories learned in the initial stage, and m represents the number of new categories that need to be learned in each subsequent round until all categories of the data set are learned. The CIFAR10, CIFAR100, MiniImageNet data sets and FARON fault diagnosis data sets are divided, and the specific details are as follows:

(1)CIFAR10数据集共10个类别，每个类别包含5000个样本用于训练，以及1000个样本用于测试。将该数据集划分为1/1和2/2两种设置，即将10个类别平均划分为1类、2类的增量批次。(1) The CIFAR10 dataset has 10 categories, each of which contains 5000 samples for training and 1000 samples for testing. The dataset is divided into two settings: 1/1 and 2/2, that is, the 10 categories are evenly divided into 1-category and 2-category incremental batches.

(2)CIFAR100数据集共100个类别，每个类别包含500个样本用于训练，以及100个样本用于测试。将100个类别中的50个类别作为初始学习使用类别，将剩下的50个类别中的每5类、10类划分成一个增量学习批次，记作50/5、50/10的划分模式。(2) The CIFAR100 dataset has 100 categories, each of which contains 500 samples for training and 100 samples for testing. 50 of the 100 categories are used as the initial learning categories, and every 5 or 10 categories of the remaining 50 categories are divided into an incremental learning batch, which is recorded as a 50/5 or 50/10 division mode.

(3)MiniImageNet数据集包含100个类别，每个类别有600个样本。本发明实施例以9：1的比例将每个类别的600个样本划分为训练集和测试集。实验使用10/10,20/20的划分模式。(3) The MiniImageNet dataset contains 100 categories, each with 600 samples. The embodiment of the present invention divides the 600 samples of each category into a training set and a test set at a ratio of 9:1. The experiment uses a 10/10 and 20/20 division mode.

(4)FARON故障诊断数据集。数据集为大型装备故障诊断数据，数据中共有121个传感器输出维度，即从121个不同方面描述系统当前的运行状态。为了进行有效的故障诊断建模，所用数据需包含不同工况下的正常运行数据和故障数据。正常运行数据经过5.32小时的运行，共收集了76632个信号样本。该数据集共66个类别，包括1个正常类别和55个故障类别。该数据集学习顺序的随机种子设置为36，保证正常类别在初始阶段被学习。实验使用11/11,22/22和33/33的划分模式。(4) FARON fault diagnosis data set. The data set is large-scale equipment fault diagnosis data. There are 121 sensor output dimensions in the data, that is, the current operating status of the system is described from 121 different aspects. In order to carry out effective fault diagnosis modeling, the data used must include normal operating data and fault data under different working conditions. After 5.32 hours of normal operating data, a total of 76,632 signal samples were collected. The data set has 66 categories, including 1 normal category and 55 fault categories. The random seed of the learning order of this data set is set to 36 to ensure that the normal category is learned in the initial stage. The experiment uses 11/11, 22/22 and 33/33 partitioning modes.

2、采样策略2. Sampling strategy

实验使用随机采样策略，并假设最大存储样本空间是固定的，用K表示最大存储样本数。对于公开数据集，设置最大存储样本数K为2000。故障诊断数据集的最大存储样本数K为8000。The experiment uses a random sampling strategy and assumes that the maximum storage sample space is fixed, with K representing the maximum number of stored samples. For the public dataset, the maximum number of stored samples K is set to 2000. The maximum number of stored samples K for the fault diagnosis dataset is 8000.

受存储空间有限的限制，旧类别的训练集与新类别的训练集中各类的样本数不相同，但令旧类别的验证集和新类别的验证集中各类样本数目相同，用于平衡训练阶段。旧类别的训练集和测试集的样本总和需小于等于存储空间的上限K。具体地，本发明实施例设置旧类别的训练集与验证集的样本比例为9:1，旧类别训练集旧类别验证集以及新类别验证集新类别训练集的样本数量如下述公式所示：Due to the limited storage space, the number of samples in the training set of the old category is different from that in the training set of the new category, but the number of samples in the validation set of the old category is the same as that in the validation set of the new category, which is used to balance the training phase. The sum of the samples in the training set and the test set of the old category must be less than or equal to the upper limit K of the storage space. Specifically, the embodiment of the present invention sets the sample ratio of the training set of the old category to the validation set to 9:1, and the sample ratio of the training set of the old category is 9:1. Old category validation set And the new category validation set New category training set The sample size is given by the following formula:

3、对划分的数据集构建多粒度结构3. Construct a multi-granular structure for the divided data set

GloVe(Global Vectors for Word Representation)是一种常用的基于全局词频统计的词矢量表示方法，通过将单词映射到高维空间来实现。利用预训练的词向量库，可将每个单词转换成300维的词向量。在获得了词语在特征空间中的表示后，本发明实施例使用聚类算法对语义向量聚类，利用此方法可以将目前出现的类别建成一个两层的层次结构。本发明实施例使用了K-means算法对标签转换成的词向量进行聚类。GloVe (Global Vectors for Word Representation) is a commonly used word vector representation method based on global word frequency statistics, which is achieved by mapping words to a high-dimensional space. Using a pre-trained word vector library, each word can be converted into a 300-dimensional word vector. After obtaining the representation of the words in the feature space, the embodiment of the present invention uses a clustering algorithm to cluster the semantic vectors. This method can be used to build the currently appearing categories into a two-layer hierarchical structure. The embodiment of the present invention uses the K-means algorithm to cluster the word vectors converted from the labels.

其中，K-means算法的优化公式如下式所示：Among them, the optimization formula of the K-means algorithm is shown as follows:

其中，x＝(x₁,x₂,...,x_n)，代表每个点的特征表示。μ_i为簇S_i中所有点的均值。该算法的优化目标为：找到满足上式的簇S_i，将n个点划分至k个簇中，使得簇内所有点的误差平方和最小。参数k越小，则聚成的超类越少，同一个超类下的类别间相关性越弱；参数k越大，则聚成的超类越多，同属于一个超类的类别间的相关性越强。Where x = (x ₁ , x ₂ , ..., x _n ), represents the feature representation of each point. μ _i is the mean of all points in cluster S _i . The optimization goal of the algorithm is to find a cluster S _i that satisfies the above formula and divide n points into k clusters so that the sum of square errors of all points in the cluster is minimized. The smaller the parameter k is, the fewer superclasses are clustered and the weaker the correlation between categories under the same superclass is; the larger the parameter k is, the more superclasses are clustered and the stronger the correlation between categories belonging to the same superclass is.

算法流程是首先导入GloVe预训练模型获取语义标签所对应的词向量。接着使用K-means算法进行聚类得到一个两层的层次结构。The algorithm flow is to first import the GloVe pre-trained model to obtain the word vector corresponding to the semantic label. Then use the K-means algorithm to cluster to obtain a two-layer hierarchical structure.

三、特征提取模块3. Feature Extraction Module

本发明实施例使用残差网络(ResNet)作为特征提取器，对于CIFAR和MiniImageNet数据集，分别使用ResNet32和ResNet18网络提取特征。对于大型故障诊断数据集，使用CNN网络提取特征。The embodiment of the present invention uses a residual network (ResNet) as a feature extractor. For CIFAR and MiniImageNet datasets, ResNet32 and ResNet18 networks are used to extract features respectively. For large fault diagnosis datasets, CNN networks are used to extract features.

其中，ResNet32、ResNet18和CNN网络均是本领域技术人员所公知，本发明实施例对此不做赘述。Among them, ResNet32, ResNet18 and CNN networks are well known to those skilled in the art, and will not be described in detail in the embodiments of the present invention.

四、知识蒸馏模块4. Knowledge Distillation Module

将上一轮学习N_old类的模型作为教师模型，将此轮需要学习的N_old+N_new类的模型称为学生模型。学生模型的输入为采样后的旧类别样本和新类别样本的并集，符号表示为D_t＝D′_old∪D_new，相应的输出为同样的数据经过教师模型后，将输出在经过softmax层之前对教师模型输出的概率每一项除以温度系数T，并将这一结果记作软标签π(z′)。将学生模型输出的概率每一项除以温度T，记作π(z)。接着对π(z)截取前N_old类的预测概率并计算蒸馏损失，蒸馏损失函数如下所示：The model of the N _old class learned in the previous round is used as the teacher model, and the model of the N _old + N _new class to be learned in this round is called the student model. The input of the student model is the union of the old class samples and the new class samples after sampling, symbolically represented by D _t =D′ _old ∪D _new , and the corresponding output is The same data will be output after passing through the teacher model Before passing through the softmax layer, each probability item output by the teacher model is divided by the temperature coefficient T, and the result is recorded as the soft label π(z′). Each probability item output by the student model is divided by the temperature T, recorded as π(z). Then, the predicted probability of the previous N _old class is intercepted from π(z) and the distillation loss is calculated. The distillation loss function is as follows:

其中，上述的“’”表示采样后的，例如：D_o′_ld表示采样后的旧类别样本。T为蒸馏温度，温度T设置的越大，则会生成越均匀的概率分布；当T设置为1时，蒸馏损失函数等价于交叉熵损失函数。本发明实施例将T设置为2。The above “'” indicates the sampled ones, for example, D _o ′ _ld indicates the old category samples after sampling. T is the distillation temperature. The higher the temperature T is set, the more uniform the probability distribution will be. When T is set to 1, the distillation loss function is equivalent to the cross entropy loss function. In the embodiment of the present invention, T is set to 2.

下述算法1总结了知识蒸馏技术应用于该发明的工作过程。The following Algorithm 1 summarizes the working process of applying knowledge distillation technology to this invention.

五、多粒度正则化的重平衡模块5. Multi-granularity regularized rebalancing module

多粒度正则化的重平衡模块包含两个组成部分。一是重平衡建模。在训练过程中使用再平衡策略来减轻数据不平衡的影响；二是多粒度正则化。由于只使用重平衡策略会导致新类别分类精度的下降，因此设计了一个多粒度正则化项，使模型考虑类相关性。The rebalancing module with multi-granularity regularization consists of two components. The first is rebalancing modeling. A rebalancing strategy is used during training to mitigate the impact of data imbalance. The second is multi-granularity regularization. Since using only the rebalancing strategy will lead to a decrease in the classification accuracy of new categories, a multi-granularity regularization term is designed to make the model consider class correlation.

重平衡建模表现为对于样本稀疏类别施加相对高的权重，对于样本数目多的类别施加相对低的权重，平衡新类别与旧类别梯度更新的差距，减轻类别不均衡性。类别权重如下：Rebalancing modeling is performed by applying relatively high weights to categories with sparse samples and relatively low weights to categories with a large number of samples, balancing the gap between the gradient updates of new categories and old categories and reducing category imbalance. The category weights are as follows:

其中，z_i表示网络的输出，n_i表示每个类别的样本数；γ表示一个超参数，定义为：Among them, _zi represents the output of the network, _ni represents the number of samples in each category, and γ represents a hyperparameter, which is defined as:

其中，N表示类别的总数，z_j表示网络输出，log(·)表示样本属于类别j的预测概率。重平衡损失表示为：Where N represents the total number of categories, _zj represents the network output, and log(·) represents the predicted probability that the sample belongs to category j. The rebalancing loss is expressed as:

本发明实施例借助重平衡损失，提升旧类别的监督强度，使新类别和旧类别因样本数的显著差异产生的负面影响得到进一步减弱。除了利用重平衡方法，根据不同故障类别的样本数量约束其对模型更新的影响，本发明实施例进一步设计了一个多粒度正则化项，通过构建故障类的层次结构，将基于该层次结构的类相关性嵌入到学习过程中。The embodiment of the present invention uses the rebalancing loss to improve the supervision strength of the old categories, so that the negative impact caused by the significant difference in the number of samples between the new and old categories is further weakened. In addition to using the rebalancing method to constrain the impact of the number of samples of different fault categories on model updates, the embodiment of the present invention further designs a multi-granularity regularization term, which embeds the class correlation based on the hierarchy into the learning process by constructing a hierarchy of fault classes.

首先通过使用数据集或对语义进行聚类来获得类的层次结构，利用其语义信息，通过GloVe将语义标签转换为词向量，利用K-means算法进行语义聚类，得到二层次结构。随后，构建带有多粒度信息的连续标签，并利用KL散度损失进行优化。具体计算方式如下：First, we use the dataset or semantic clustering to obtain the class hierarchy. We use its semantic information to convert the semantic labels into word vectors through GloVe, and use the K-means algorithm for semantic clustering to obtain a two-level structure. Then, we construct continuous labels with multi-granularity information and optimize them using KL divergence loss. The specific calculation method is as follows:

构建带有多粒度信息的连续标签的计算方式为：The calculation method for constructing continuous labels with multi-granularity information is:

其中，C表示真实标签的节点，A表示当前类节点，M表示多粒度标签，N表示叶节点集。β表示该函数的一个超参数，其取值范围为(0，+∞)，它控制标签的分布情况。β取值越小，标签分布越均匀；β越大，标签分布越尖锐。公式d(N_i,N_j)度量了层次结构中两个细粒度类别之间的距离。LCS(最小公共子树)表示两个节点中的最小公共子树。Height(B)表示其根为节点B的子树的高度。带有多粒度信息的连续标签记为 Among them, C represents the node of the true label, A represents the current class node, M represents the multi-granularity label, and N represents the leaf node set. β represents a hyperparameter of the function, and its value range is (0, +∞), which controls the distribution of the label. The smaller the value of β, the more uniform the label distribution; the larger the value of β, the sharper the label distribution. The formula d(N _i ,N _j ) measures the distance between two fine-grained categories in the hierarchy. LCS (least common subtree) represents the smallest common subtree between two nodes. Height(B) represents the height of the subtree whose root is node B. The continuous label with multi-granularity information is recorded as

多粒度正则化损失函数的计算方式为：The calculation method of the multi-granularity regularization loss function is:

即利用KL散度计算和预测概率q_i之间的差异。That is, using KL divergence calculation and the difference between the predicted probability q _i .

多粒度正则化的重平衡模块的损失函数为：The loss function of the rebalancing module with multi-granularity regularization is:

L_M＝(1-λ)L_CB+L_H (13) _LM ＝(1-λ) _LCB + _LH (13)

其中， in,

下述算法2总结了多粒度正则化的重平衡模块。The following Algorithm 2 summarizes the rebalancing module with multi-granularity regularization.

六、分类器6. Classifier

分类器为一层全连接网络，输出神经元个数和该阶段以及之前阶段所见的全部类别个数相等，并且随着增量学习的进行增加相应个数的神经元。The classifier is a one-layer fully connected network with the number of output neurons equal to the number of all categories seen in this stage and the previous stage, and the corresponding number of neurons is increased as incremental learning proceeds.

七、训练策略7. Training Strategy

模型分为两阶段训练。在第一阶段，训练过程为将新类别样本D_new与旧类范例集D_o′_ld混合，记作D_t作为模型的输入。经过特征提取层、分类器得到网络的输出，优化目标为最小化蒸馏损失和多粒度正则化的重平衡模块损失的加权和。The model is trained in two stages. In the first stage, the training process is to mix the new category samples D _new with the old category example set D _o ′ _ld , recorded as D _t as the input of the model. After the feature extraction layer and the classifier, the output of the network is obtained. The optimization goal is to minimize the weighted sum of the distillation loss and the rebalancing module loss of multi-granularity regularization.

总损失函数公式如(14)所示：The total loss function formula is shown in (14):

L＝L_M+αλL_D (14)L＝L _M +αλL _D (14)

其中，用于调节旧知识保持的程度。随着增量学习的进行，参数λ的值逐渐变大，对于旧类别知识的保留逐渐加强。α＝10^-x，用于将损失L_D调整到与L_M同一数量级上，x一般赋值0或1。in, Used to adjust the degree of retention of old knowledge. As incremental learning proceeds, the value of parameter λ gradually increases, and the retention of old category knowledge gradually strengthens. α＝10 ^-x , used to adjust the loss _LD to the same order of magnitude as _LM , and x is generally assigned 0 or 1.

在不平衡的数据下训练结束之后，第二阶段训练为加入一个额外的消除分类器偏差的步骤，即将CNN层与分类器解耦，冻结CNN层的参数，重新训练分类器。这是在使用不平衡的训练集训练完成后，作为额外的一个步骤进行。在构建完成平衡的验证集后，训练之前冻结CNN层的参数。训练时利用验证集重新训练分类器。特别地，这一步骤将在除了初始阶段的所有阶段(使用不平衡样本训练之后)进行，因为初始阶段的样本是平衡的。After the training is completed with imbalanced data, the second stage of training is to add an additional step to eliminate the classifier bias, that is, decoupling the CNN layer from the classifier, freezing the parameters of the CNN layer, and retraining the classifier. This is done as an additional step after the training is completed with the imbalanced training set. After the balanced validation set is built, the parameters of the CNN layer are frozen before training. The classifier is retrained using the validation set during training. In particular, this step will be performed in all stages except the initial stage (after training with imbalanced samples) because the samples in the initial stage are balanced.

综上所述，本发明实施例通过上述部分实现了同时考虑重平衡建模和故障类别间的相关关系，进一步提高对新旧类的学习能力，减轻模型的灾难性遗忘现象，提高故障诊断的识别能力，大幅度增加设备的安全性，降低设备的故障率。In summary, the embodiments of the present invention realize the simultaneous consideration of the correlation between rebalancing modeling and fault categories through the above parts, further improve the learning ability of new and old categories, reduce the catastrophic forgetting phenomenon of the model, improve the recognition ability of fault diagnosis, greatly increase the safety of the equipment, and reduce the failure rate of the equipment.

实施例3Example 3

下面结合图2、图3，具体的实验数据对实施例1和2中的方案进行可行性验证，详见下文描述：In conjunction with FIG. 2 and FIG. 3 , specific experimental data are used to verify the feasibility of the schemes in Examples 1 and 2, as described below:

首先使用公开数据集进行方法的验证。数据集的层次结构均由语义聚类逐渐构建。First, we use a public dataset to verify the method. The hierarchical structure of the dataset is gradually constructed by semantic clustering.

对于CIFAR10和CIFAR100，学习率从0.01开始，在第70和第140个迭代除以10。对于MiniImageNet，批处理大小设置为128。学习率从0.01开始，每30个迭代完成后学习率除以0.1，采用SGD优化器，权重衰减为2e-4，动量为0.9。CIFAR10、CIFAR100和MiniImageNet的训练阶段分别设置为200、200和100。数据批次大小均设置为128，采用Adam优化器。对于内存限制K，在所有的公共数据集上设置K＝2,000。For CIFAR10 and CIFAR100, the learning rate starts from 0.01 and is divided by 10 at the 70th and 140th iterations. For MiniImageNet, the batch size is set to 128. The learning rate starts from 0.01 and is divided by 0.1 after every 30 iterations. The SGD optimizer is used with a weight decay of 2e-4 and a momentum of 0.9. The training epochs for CIFAR10, CIFAR100, and MiniImageNet are set to 200, 200, and 100, respectively. The data batch size is set to 128, and the Adam optimizer is used. For the memory limit K, K = 2,000 is set on all public datasets.

其次使用故障诊断数据集进行方法的验证。图2为一大型故障诊断数据集的多粒度结构构建示意图。在图2右侧，可以看到编号为1，5，27，28的类别的特征近似，同属于给水泵故障这一粗粒度类别。以FARON-22/22实验设置为例，编号为5的细粒度故障类别在初始阶段被学习，编号为1，27，28的类别在最后一轮增量学习阶段被学习。通过多粒度正则化，可以将数据的结构信息加入到增量学习中，保持故障类别之间的相关关系。Secondly, the method is verified using a fault diagnosis dataset. Figure 2 is a schematic diagram of the multi-granularity structure construction of a large fault diagnosis dataset. On the right side of Figure 2, it can be seen that the features of the categories numbered 1, 5, 27, and 28 are similar, and they belong to the coarse-grained category of water pump faults. Taking the FARON-22/22 experimental setting as an example, the fine-grained fault category numbered 5 is learned in the initial stage, and the categories numbered 1, 27, and 28 are learned in the last round of incremental learning. Through multi-granularity regularization, the structural information of the data can be added to the incremental learning to maintain the correlation between fault categories.

本发明实施例在该数据集学习顺序的随机种子设置为36，保证正常类别在初始阶段被学习。对于该数据集本发明实施例采用Adam优化器，并将学习率设置为1e-5训练50个epoch。设置内存上限K＝8,000。The random seed of the learning sequence of the data set in the embodiment of the present invention is set to 36 to ensure that the normal category is learned in the initial stage. For this data set, the embodiment of the present invention uses the Adam optimizer and sets the learning rate to 1e-5 for training 50 epochs. The memory limit K is set to 8,000.

本发明实施例中的实验遵循标准评估体系，通过增量准确率和平均准确率评价模型对旧知识的保持能力和对新知识的学习能力。其中平均准确率是度量包括初始阶段在内增量学习每个阶段的平均精度。其中，平均增量准确率和平均准确率的数值越高，代表模型对新类别样本的学习和旧类别知识的保持能力越强。将实验结果与六种现有方法进行比较，包括：iCaRL、LwF、LUCIR、MT、BiC和PODNet。The experiments in the embodiments of the present invention follow the standard evaluation system, and evaluate the model's ability to retain old knowledge and learn new knowledge through incremental accuracy and average accuracy. The average accuracy measures the average accuracy of each stage of incremental learning, including the initial stage. The higher the average incremental accuracy and the average accuracy, the stronger the model's ability to learn new category samples and retain old category knowledge. The experimental results are compared with six existing methods, including: iCaRL, LwF, LUCIR, MT, BiC, and PODNet.

表1中展示了在CIFAR10、CIFAR100和MiniImageNet数据集上的最后增量阶段的准确率和平均准确率，综合体现了本发明实施例提供模型的性能。表2展示了在实际故障诊断数据集上的各个增量阶段的准确率和平均准确率，体现了本发明实施例提供的模型在实际复杂场景FARON下的性能。本发明实施例考虑到类别不均衡性质、利用了类别间的相关关系，对模型施加了约束，因此可提高模型性能。Table 1 shows the accuracy and average accuracy of the last incremental stage on the CIFAR10, CIFAR100 and MiniImageNet datasets, which comprehensively reflects the performance of the model provided by the embodiment of the present invention. Table 2 shows the accuracy and average accuracy of each incremental stage on the actual fault diagnosis dataset, which reflects the performance of the model provided by the embodiment of the present invention under the actual complex scenario FARON. The embodiment of the present invention takes into account the imbalanced nature of the categories, utilizes the correlation between the categories, and imposes constraints on the model, thereby improving the model performance.

表1在三种公共数据集上的增量学习结果(准确率)。Table 1 Incremental learning results (accuracy) on three public datasets.

其中，last表示最后一个阶段的准确率，Avg表示平均增量准确率。Among them, last represents the accuracy of the last stage, and Avg represents the average incremental accuracy.

表2故障诊断数据任务的增量学习结果(准确率)Table 2 Incremental learning results (accuracy) of fault diagnosis data tasks

此外，本发明实施例对于自身组件的有效性进行验证，如表3所示。图3可视化了采用重平衡、多粒度正则化的方法(MGRB)与仅仅采用重平衡方法(RB)的每个类别的准确率提升情况。In addition, the embodiment of the present invention verifies the effectiveness of its own components, as shown in Table 3. FIG3 visualizes the accuracy improvement of each category using the rebalancing and multi-granularity regularization method (MGRB) and the rebalancing method (RB) alone.

表3FARON大型故障诊断数据集的消融实验结果Table 3 Ablation experiment results of FARON large fault diagnosis dataset

其中，CE是交叉熵损失的简称，CB是多粒度正则化的重平衡模块中的类平衡分类损失部分。MG表示多粒度正则化的重平衡模块中的多粒度正则化部分。Decoupling表示是否有第二阶段训练。Among them, CE is the abbreviation of cross entropy loss, CB is the class balance classification loss part in the rebalancing module of multi-granularity regularization. MG represents the multi-granularity regularization part in the rebalancing module of multi-granularity regularization. Decoupling indicates whether there is a second stage of training.

实施例4Example 4

一种视频关键点的动态捕捉装置，参见图3，该装置包括：处理器1和存储器2，存储器2中存储有程序指令，处理器1调用存储器2中存储的程序指令以使装置执行实施例1中的以下方法步骤：A device for capturing dynamic key points of a video, see FIG3 , the device comprises: a processor 1 and a memory 2, the memory 2 stores program instructions, the processor 1 calls the program instructions stored in the memory 2 to enable the device to execute the following method steps in Example 1:

基于多粒度正则化重平衡增量学习的故障诊断方法，所述方法包括：A fault diagnosis method based on multi-granularity regularized rebalanced incremental learning, the method comprising:

对划分后的数据集进行词向量表示，获取语义标签所对应的词向量，使用K-means算法进行聚类获取一个两层的多粒度结构；构建带有多粒度信息的连续标签，并利用KL散度损失进行优化；The divided data set is represented by word vectors, the word vectors corresponding to the semantic labels are obtained, and the K-means algorithm is used for clustering to obtain a two-layer multi-granularity structure; continuous labels with multi-granularity information are constructed and optimized using KL divergence loss;

通过特征提取层获取新旧类别的特征表示向量，通过知识蒸馏约束当前模型的决策输出与增量学习前模型的输出分布相同，基于多粒度正则化项，对于样本数目多的类别施加相对低的权重，平衡新故障类别与旧故障类别梯度更新的差距，减轻类别不均衡性；The feature extraction layer obtains feature representation vectors of new and old categories. Through knowledge distillation, the decision output of the current model is constrained to be the same as the output distribution of the model before incremental learning. Based on multi-granularity regularization terms, relatively low weights are applied to categories with a large number of samples to balance the gap between the gradient updates of new fault categories and old fault categories, thereby reducing category imbalance.

采取两阶段训练策略，使用当前能够获取到的数据进行第一阶段训练，在第二阶段训练时与特征提取层解耦，即冻结特征提取层的参数，采用重采样的平衡训练子集重新训练分类器。A two-stage training strategy is adopted. The currently available data is used for the first stage training. In the second stage training, it is decoupled from the feature extraction layer, that is, the parameters of the feature extraction layer are frozen, and the classifier is retrained using a resampled balanced training subset.

其中，所述构建带有多粒度信息的连续标签具体为：The construction of continuous labels with multi-granularity information is specifically as follows:

其中，C表示真实标签的节点，A表示当前类节点，M表示多粒度标签，N表示叶节点集，β表示该函数的一个超参数，公式d(N_i,N_j)度量了层次结构中两个细粒度类别之间的距离，LCS表示包含两个节点的最小公共子树，Height(B)表示根为节点B的子树的高度，带有多粒度信息的连续标签记为 Where C represents the node of the true label, A represents the current class node, M represents the multi-granularity label, N represents the leaf node set, β represents a hyperparameter of the function, the formula d(N _i ,N _j ) measures the distance between two fine-grained categories in the hierarchy, LCS represents the minimum common subtree containing two nodes, Height(B) represents the height of the subtree with the root as node B, and the continuous label with multi-granularity information is recorded as

进一步地，多粒度正则化项的损失函数的计算方式为：Furthermore, the loss function of the multi-granularity regularization term is calculated as:

多粒度正则化的重平衡的损失函数为：The loss function of the rebalanced multi-granularity regularization is:

L_M＝(1-λ)L_CB+L_H _LM ＝(1-λ) _LCB + _LH

总损失函数为：The total loss function is:

L＝L_M+αλL_D L＝ _LM + _αλLD

其中，α＝10^-x，用于将损失L_D调整到与L_M同一数量级上。in, α=10 ^-x , is used to adjust the loss _LD to the same order of magnitude as _LM .

其中，第一阶段训练过程为：将新类别样本D_new与旧类范例集D_o′_ld混合，记作D_t作为模型的输入，经过特征提取层、分类器得到网络的输出，优化目标为最小化蒸馏损失和多粒度正则化的重平衡模块损失的加权和。The first stage training process is as follows: the new category samples D _new are mixed with the old category example set D _o ′ _ld , denoted as D _t as the input of the model, and the network output is obtained through the feature extraction layer and classifier. The optimization goal is to minimize the weighted sum of the distillation loss and the rebalancing module loss of multi-granularity regularization.

进一步地，第二阶段训练过程为：将数据采样成类别平衡的训练子集，冻结特征提取层的参数，单独重新训练分类器；优化目标为最小化蒸馏损失和多粒度正则化的重平衡模块损失的加权和。Furthermore, the second stage training process is: sampling the data into category-balanced training subsets, freezing the parameters of the feature extraction layer, and retraining the classifier separately; the optimization goal is to minimize the weighted sum of the distillation loss and the rebalancing module loss of multi-granularity regularization.

这里需要指出的是，以上实施例中的装置描述是与实施例中的方法描述相对应的，本发明实施例在此不做赘述。It should be pointed out here that the device description in the above embodiment corresponds to the method description in the embodiment, and the embodiment of the present invention will not be described in detail here.

上述的处理器1和存储器2的执行主体可以是计算机、单片机、微控制器等具有计算功能的器件，具体实现时，本发明实施例对执行主体不做限制，根据实际应用中的需要进行选择。The execution subjects of the above-mentioned processor 1 and memory 2 can be devices with computing functions such as computers, single-chip microcomputers, and microcontrollers. In specific implementation, the embodiment of the present invention does not limit the execution subject and it is selected according to the needs of actual applications.

存储器2和处理器1之间通过总线3传输数据信号，本发明实施例对此不做赘述。The data signal is transmitted between the memory 2 and the processor 1 via the bus 3, which will not be described in detail in the embodiment of the present invention.

实施例5Example 5

基于同一发明构思，本发明实施例还提供了一种计算机可读存储介质，存储介质包括存储的程序，在程序运行时控制存储介质所在的设备执行上述实施例中的方法步骤。Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, the storage medium includes a stored program, and when the program is running, the device where the storage medium is located is controlled to execute the method steps in the above embodiment.

该计算机可读存储介质包括但不限于快闪存储器、硬盘、固态硬盘等。The computer-readable storage medium includes but is not limited to a flash memory, a hard disk, a solid-state drive, and the like.

这里需要指出的是，以上实施例中的可读存储介质描述是与实施例中的方法描述相对应的，本发明实施例在此不做赘述。It should be pointed out here that the description of the readable storage medium in the above embodiment corresponds to the description of the method in the embodiment, and the embodiment of the present invention will not be described in detail here.

在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本发明实施例的流程或功能。In the above embodiments, all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented by software, all or part of the embodiments may be implemented in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present invention are generated.

计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中，或者通过计算机可读存储介质进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质或者半导体介质等。The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. Computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be accessed by the computer or a data storage device such as a server or a data center that includes one or more available media. The available medium may be a magnetic medium or a semiconductor medium, etc.

本发明实施例对各器件的型号除做特殊说明的以外，其他器件的型号不做限制，只要能完成上述功能的器件均可。Unless otherwise specified, the models of the components in the embodiments of the present invention are not limited, and any device that can perform the above functions may be used.

本领域技术人员可以理解附图只是一个优选实施例的示意图，上述本发明实施例序号仅仅为了描述，不代表实施例的优劣。Those skilled in the art can understand that the accompanying drawing is only a schematic diagram of a preferred embodiment, and the serial numbers of the embodiments of the present invention are only for description and do not represent the advantages and disadvantages of the embodiments.

以上所述仅为本发明的较佳实施例，并不用以限制本发明，凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A fault diagnosis method based on multi-granularity regularized rebalancing incremental learning, characterized in that the method includes:

Perform word vector representation on the divided fault diagnosis data set to obtain the word vectors corresponding to the semantic labels, and use the K-means algorithm to cluster the word vectors converted from the semantic labels to obtain a two-layer multi-granularity structure;

Construct continuous labels with multi-granularity information and optimize using KL divergence loss;

The feature representation vectors of the new and old fault categories are obtained through the feature extraction layer. The fault categories learned in the current state of the model are called old fault categories, and the fault categories that have not been learned and are about to be learned at this stage are called New fault categories;

The decision output of the current model is constrained by knowledge distillation to be the same as the output distribution of the model before incremental learning. Based on multi-granularity regularization terms, relatively low weights are applied to categories with a large number of samples to balance the gradient update of new fault categories and old fault categories. gap;

Adopt a two-stage training strategy, use the currently acquired data for the first stage training, update the feature extraction layer, and decouple the classifier from the feature extraction layer during the second stage training, that is, freeze the parameters of the feature extraction layer and use resampling Retrain the classifier on the balanced training subset;

The construction of continuous labels with multi-granularity information is specifically:

Among them, C represents the real label node, A represents the current class node, M represents the multi-granularity label, N represents the leaf node set, β represents a hyperparameter of the function, and the formula d(N _i ,N _j ) measures the hierarchical structure The distance between two fine-grained categories, LCS represents the minimum common subtree containing two nodes, Height(B) represents the height of the subtree whose root is node B, and continuous labels with multi-granularity information are recorded as

The calculation method of the loss function of the multi-granularity regularization term is:

The loss function of multi-granularity regularization rebalancing is:

L _M =(1-λ)L _CB +L _H

The rebalancing loss is expressed as:

Among them, w _i represents the category weight, z _i represents the output of the network, N represents the total number of categories, and z _j represents the network output;

The total loss function is:

L＝L _M +αλL _D

in, α=10 ^-x , used to adjust the distillation loss _LD to the same order of magnitude as _LM ;

Among them, the model that learned the N _old category in the previous round is used as the teacher model, and the model that needs to learn the N _old + N _new category in this round is called the student model; the input of the student model is the sampled old category sample and the new category sample. The union of , the symbol is expressed as D _t =D _o ′ _ld ∪D _new , and the corresponding output is After the same data passes through the teacher model, the output Before passing through the softmax layer, divide each probability item output by the teacher model by the temperature coefficient T, and record this result as a soft label π(z′). Divide each probability item output by the student model by the temperature T, record as π(z); the first-stage training process is: mix the new fault category sample D _new with the old fault category example set D _o ′ _ld , denoted as D _t as the input of the model, through the feature extraction layer and classifier Obtain the output of the network, and the optimization goal is to minimize the weighted sum of distillation loss and multi-granularity regularization rebalancing module loss;

The second-stage training process is: sampling the data into a category-balanced training subset, freezing the parameters of the feature extraction layer, and retraining the classifier separately; the optimization goal is to minimize the distillation loss and the rebalancing module loss of multi-granularity regularization weighted sum.

2. A fault diagnosis device based on multi-granularity regularized rebalancing incremental learning, characterized in that the device includes: a processor and a memory, program instructions are stored in the memory, and the processor calls the memory stored in the memory. program instructions to cause the apparatus to perform the method steps of claim 1.

3. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and when executed by a processor, the program instructions cause the processor to execute the right The method steps described in claim 1.