CN117252083A

CN117252083A - A bearing remaining life prediction method and system that combines degradation stage division and sub-domain adaptation

Info

Publication number: CN117252083A
Application number: CN202310849683.7A
Authority: CN
Inventors: 宋磊; 杜俊蓉; 贵轩昂; 张健; 郭丽丽; 李续志
Original assignee: Technology and Engineering Center for Space Utilization of CAS
Current assignee: Technology and Engineering Center for Space Utilization of CAS
Priority date: 2023-07-12
Filing date: 2023-07-12
Publication date: 2023-12-19
Anticipated expiration: 2043-07-12
Also published as: CN117252083B

Abstract

The invention discloses a bearing remaining life prediction method and system that combines degradation stage division and subdomain adaptation. The method includes the following steps: S1, construction of a health indicator curve; S2, health stage division; S3, construction and training of a remaining life prediction model; S4, remaining life prediction. Its advantage is that it maps the different health stages within the life cycle to different subdomains, distributing the overall discrepancy handled by global alignment across the per-stage discrepancies handled by local alignment. In the transfer prediction stage, multi-order metrics based on data labels and membership degrees are used to pull corresponding degradation stages closer together, completing fuzzy substructure alignment and cross-domain regression. The invention achieves finer-grained feature alignment, yields a model that generalizes better, and delivers more accurate remaining life prediction.

Description

A bearing remaining life prediction method and system combining degradation stage division and subdomain adaptation

Technical Field

The present invention relates to the technical field of condition monitoring and health management of bearing equipment, and in particular to a bearing remaining life prediction method and system combining degradation stage division and subdomain adaptation.

Background Art

Bearings are important components of mechanical equipment. As operating time increases, bearing performance gradually degrades, causing faults and even affecting equipment availability, so ensuring the safety and reliability of bearings is an important problem. Prognostics and health management (PHM) for bearings has therefore become a key topic: it aims to monitor the health of equipment, predict problems that may arise, and avoid serious failures or accidents. Remaining useful life (RUL) prediction is the core of the PHM task. Its main job is to predict how long a running device or component can continue operating from the current moment until it loses its operating capability, so that a preventive maintenance strategy can be formulated from the prediction.

At present, according to the techniques used in the prediction process, bearing remaining life prediction methods can be roughly divided into two categories: model-based methods and data-driven methods. (1) Model-based methods use prior knowledge to build a model that characterizes bearing degradation, that is, they construct an empirical model from mathematical and physical models and use it to predict the remaining life of the bearing. Such methods are interpretable and can predict well when the failure mechanism is fully understood and a suitable model is constructed. However, they require extensive expert knowledge and generalize poorly, and the physical properties of bearings are difficult to understand completely, which limits their application. (2) Data-driven methods map the degradation process to a function relating health state and monitoring data, extracting and learning patterns from the available data to characterize degradation behavior. Traditional data-driven methods mostly use simple machine learning algorithms whose simple structure cannot fully mine the latent information in the data. With the development of artificial intelligence, deep learning has performed very well in bearing remaining life prediction thanks to its powerful feature extraction and data processing capabilities.

However, deep learning models need large amounts of high-quality training data to learn the underlying patterns, and they require the test data to follow the same distribution as the training data. In practical remaining life prediction these assumptions face the following challenges: (1) Devices usually operate under complex, time-varying working conditions, so there is covariate shift between training and test data, and high-quality, identically distributed training data cannot be guaranteed. As a result, the performance of an RUL prediction model drops sharply when it is applied to other devices, a phenomenon known as the "cross-domain problem". (2) As the reliability and safety of devices keep improving, full life-cycle data become increasingly difficult to obtain, and incomplete test data deviate even further in distribution from the training data.

In response to these challenges, transfer learning methods built on deep learning models have emerged and have become a focus of current research on data-driven methods.

In cross-domain bearing remaining life prediction, the data distributions of the source domain and the target domain differ, and the difference is reduced by extracting domain-invariant features common to both domains. In practice, however, the overall distribution of target domain data that is missing part of the life cycle changes, which increases its difference from full-cycle source domain data and makes direct global domain adaptation difficult. Moreover, the full life cycle of a bearing contains different health stages, each with distinct structural characteristics and a markedly different distribution. Global alignment ignores these differences in the internal structure of the life cycle, which may cause different substructures to be mismatched and prevents the common latent local features of the data from being mined. How to predict bearing remaining life accurately under incomplete-data, cross-domain conditions is therefore an urgent problem.

Summary of the Invention

The object of the present invention is to provide a bearing remaining life prediction method and system combining degradation stage division and subdomain adaptation, so as to solve the aforementioned problems in the prior art.

To achieve the above object, the present invention adopts the following technical solution:

A bearing remaining life prediction method combining degradation stage division and subdomain adaptation, comprising the following steps:

S1. Health indicator curve construction:

Obtain the original data set, extract time-domain and frequency-domain features from the source domain data and the target domain data in it, and apply secondary indicator optimization and correlation analysis to the resulting multi-dimensional features to obtain a health indicator curve;

S2. Health stage division:

Process the health indicator curve with a time-series window weighted clustering algorithm to obtain, for each item of source domain and target domain data, the health stage label and its fuzzy membership degree, and split the data proportionally into a training set and a test set;

S3. Construction and training of the remaining life prediction model:

The remaining life prediction model includes a fuzzy subdomain feature extractor and an RUL regressor. The training set is input into the model to train it. During training, the total loss is obtained from the alignment loss of the substructure fuzzy alignment module in the fuzzy subdomain feature extractor and the regression loss of the RUL regressor; the parameters of the fuzzy subdomain feature extractor and the RUL regressor are optimized by minimizing the total loss, and the trained remaining life prediction model is obtained and saved;

S4. Remaining life prediction:

Input the test set into the trained remaining life prediction model to obtain the remaining life prediction result.

Preferably, the secondary indicator optimization is as follows: an inter-feature difference alignment operation is applied to the multi-dimensional features to obtain relative indicators, and an intra-feature compression and smoothing operation is then applied to the relative indicators to obtain denoised indicators.

Preferably, the correlation analysis is as follows: for the indicators after secondary optimization, the Pearson correlation coefficient is used to measure how strongly the data at each moment correlate with the data at the initial moment, and the health indicator is constructed from it.

Preferably, the source domain data are monitoring data collected from electromechanical equipment whose bearings have run to failure; the target domain data are monitoring data from electromechanical equipment whose bearings have not run to failure;

The source domain data are supervised data labeled with the bearing remaining life; the target domain data are unsupervised data without bearing remaining life labels.

Preferably, step S2 specifically includes the following:

S21. Divide the health indicator curve into windows and replace each single-point value with a statistic of its window, achieving fuzzy smoothing of the health indicator curve;

S22. Cluster the fuzzy-smoothed health indicator curve, iteratively optimizing the fuzzy membership degree of each data point and the cluster center of each class, to obtain the health stage label and fuzzy membership degree of each item of source domain and target domain data;

S23. Split the source domain data and target domain data, together with their health stage labels and fuzzy membership degrees, proportionally into a training set and a test set.

Preferably, step S3 specifically includes the following:

S31. Input the training set into the fuzzy subdomain feature extractor and use its deep neural network to obtain latent feature representations of the source domain data and the target domain data in the training set, yielding the corresponding high-dimensional feature matrices;

S32. Input the high-dimensional feature matrices of the source domain and target domain data into the substructure fuzzy alignment module of the fuzzy subdomain feature extractor, and obtain the alignment loss by computing the FLMMD and the FLCORAL between the source domain and target domain feature matrices;

S33. Input the time-series feature matrices of the aligned source domain data and of the training-set target domain data, as output by the fuzzy subdomain feature extractor, into the RUL regressor; output the remaining life predictions for the source domain data and the training-set target domain data; and compute the mean squared error between the source domain predictions and the true values as the regression loss;

S34. Compute the total loss from the alignment loss and the regression loss;

S35. Minimize the total loss and adjust the network parameters of the fuzzy subdomain feature extractor and the RUL regressor through feedback, training both networks until training is complete and the trained remaining life prediction model is obtained.

Preferably, step S32 specifically includes the following:

S321. First alignment: fuzzy local maximum mean discrepancy (FLMMD), which is based on maximum mean discrepancy and takes into account the probability that each sample belongs to every class, achieves finer-grained fuzzy subdomain alignment; the FLMMD between the source domain and target domain feature matrices is computed as the first loss term;

S322. Second alignment: Fuzzy Local CORAL (FLCORAL), which is based on the second-order statistic method Correlation Alignment and likewise takes into account the probability that each sample belongs to every class, performs fine-grained alignment on second-order statistics; the FLCORAL between the source domain and target domain feature matrices is computed as the second loss term;

S323. Combine the first and second loss terms to obtain the alignment loss.

Preferably, the fuzzy subdomain feature extractor is a ResNet50 feature extractor.

Preferably, the RUL regressor is a regression predictor based on a fully connected network.

Another object of the present invention is to provide a bearing remaining life prediction system combining degradation stage division and subdomain adaptation. The system is used to implement any of the methods described above and comprises:

Curve construction module: used to construct the health indicator curve;

Stage division module: used to divide the health stages;

Prediction model construction module: used to build and train the remaining life prediction model;

Life prediction module: used to predict the remaining life.

The beneficial effects of the present invention are: 1. The stages present in the life cycle are mapped to different subdomains, so that the overall discrepancy handled by global alignment is distributed across the per-stage discrepancies handled by local alignment. 2. For health stage division, a time-series fuzzy clustering algorithm suited to time-series window data is proposed, giving a unified, standardized process for health indicator construction and stage division. In the transfer stage, multi-order metrics based on data labels and membership degrees pull corresponding degradation stages closer together, completing fuzzy substructure alignment and cross-domain regression. 3. Finer-grained feature alignment is achieved, yielding a model that generalizes better and predicts remaining life more accurately.

Brief Description of the Drawings

FIG. 1 is a flow chart of the principle of the bearing remaining life prediction method in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the basic principle of transfer learning in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the intra-feature compression and smoothing operation in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the principle of TWW-FCM in an embodiment of the present invention;

FIG. 5 is a schematic diagram of the framework of the fuzzy subdomain adaptive regression network (FSARN) in an embodiment of the present invention.

Detailed Description

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

Embodiment 1

As shown in FIG. 1 to FIG. 5, this embodiment provides a bearing remaining life prediction method combining degradation stage division and subdomain adaptation, comprising the following steps:

I. Health indicator curve construction:

The original data set is obtained, time-domain and frequency-domain features are extracted from the source domain data and the target domain data in it, and secondary indicator optimization and correlation analysis are applied to the resulting multi-dimensional features to obtain a health indicator curve.

The secondary indicator optimization produces relative indicators and denoised indicators. First, an inter-feature difference alignment operation is applied to the multi-dimensional features to obtain the relative indicators; then an intra-feature compression and smoothing operation (as shown in FIG. 3) is applied to the relative indicators to obtain the denoised indicators.

The correlation analysis is as follows: for the indicators after secondary optimization, the Pearson correlation coefficient is used to measure how strongly the data at each moment correlate with the data at the initial moment, and the health indicator (HI) is constructed from it.
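The correlation analysis can be sketched as follows. This is only a minimal illustration, assuming the optimized indicators are stored as one feature vector per time step; the function names `pearson` and `health_indicator` are hypothetical and do not come from the patent text.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # The coefficient is undefined for a constant vector;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def health_indicator(features):
    """features[t] is the optimized feature vector at time step t.

    Returns HI[t], the correlation between the vector at time t and the
    vector at the initial time step (assumed here to be index 0).
    """
    reference = features[0]
    return [pearson(row, reference) for row in features]
```

Under this construction the HI starts at 1 and falls as the feature vector drifts away from its initial pattern, which is the behavior a degradation curve needs.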

The source domain data are monitoring data collected from electromechanical equipment whose bearings have run to failure. The target domain data are monitoring data from electromechanical equipment whose bearings have not run to failure, so only the monitoring data collected during the early part of the life cycle are available.

II. Health stage division:

The health indicator curve is processed with the time-series window weighted clustering algorithm (TWW-FCM) to obtain, for each item of source domain and target domain data, the health stage label and its fuzzy membership degree, and the data are split proportionally into a training set and a test set.

As shown in FIG. 4, TWW-FCM is a clustering algorithm suited to dividing time-series data into stages. After the HI curve is fuzzy-smoothed, it is clustered to obtain the health stage (HS) labels and the fuzzy membership degrees. The steps are as follows:

1. Divide the health indicator curve into windows and replace each single-point value with a statistic of its window, achieving fuzzy smoothing of the health indicator curve.

2. Cluster the fuzzy-smoothed health indicator curve, iteratively optimizing the fuzzy membership degree of each data point and the cluster center of each class, to obtain the health stage label and fuzzy membership degree of each item of source domain and target domain data.

3. Split the source domain data and target domain data, together with their health stage labels and fuzzy membership degrees, proportionally into a training set and a test set.
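Steps 1 and 2 above can be sketched as follows. This is a simplified illustration and not the TWW-FCM algorithm itself: it assumes the window statistic is the mean of a trailing window and uses plain fuzzy c-means without the patent's window weighting. All function names and default parameters are hypothetical.

```python
def window_smooth(hi, width):
    """Replace each point by the mean of the trailing window ending at it."""
    out = []
    for t in range(len(hi)):
        window = hi[max(0, t - width + 1): t + 1]
        out.append(sum(window) / len(window))
    return out


def _memberships(values, centers, power):
    """Fuzzy membership of every value in every cluster."""
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share full
            # membership among them to avoid dividing by zero.
            hits = dists.count(0.0)
            rows.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
        else:
            rows.append([1.0 / sum((dk / dj) ** power for dj in dists)
                         for dk in dists])
    return rows


def fuzzy_c_means(values, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy c-means on 1-D data; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly over the data range.
    centers = [lo + (hi - lo) * (c + 0.5) / n_clusters
               for c in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        u = _memberships(values, centers, power)
        new_centers = []
        for c in range(n_clusters):
            weights = [row[c] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total
                if total > 0 else centers[c])
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(values, centers, power)


def stage_labels(memberships):
    """Hard health stage label: the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Each membership row sums to 1, so it can be carried forward unchanged as the per-sample stage probabilities that the alignment losses weight by.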

III. Construction and training of the remaining life prediction model:

The remaining life prediction model includes a fuzzy subdomain feature extractor and an RUL regressor. The training set is input into the model to train it. During training, the total loss is obtained from the alignment loss of the substructure fuzzy alignment module in the fuzzy subdomain feature extractor and the regression loss of the RUL regressor; the parameters of the fuzzy subdomain feature extractor and the RUL regressor are optimized by minimizing the total loss, and the trained remaining life prediction model is obtained and saved.

The specific steps are as follows:

1. Input the training set into the fuzzy subdomain feature extractor and use its deep neural network to obtain latent feature representations of the source domain data and the target domain data in the training set, yielding the corresponding high-dimensional feature matrices;

2. Input the high-dimensional feature matrices of the source domain and target domain data into the substructure fuzzy alignment module of the fuzzy subdomain feature extractor, and obtain the alignment loss by computing the FLMMD and the FLCORAL between the source domain and target domain feature matrices.

The high-dimensional feature matrices of the source domain and target domain data are input into the substructure fuzzy alignment module, which learns domain-invariant representations at the subdomain level by minimizing the differences in feature distribution between substructures. The module performs two alignments:

2.1. First alignment: fuzzy local maximum mean discrepancy (FLMMD), which is based on maximum mean discrepancy (MMD). FLMMD takes into account the probability that each sample belongs to every class, achieving finer-grained fuzzy subdomain alignment and bringing the source domain and target domain data closer together in the mapped feature space. The substructure fuzzy alignment module computes the FLMMD between the source domain and target domain feature matrices as the first loss term.

2.2. Second alignment: Fuzzy Local CORAL (FLCORAL), which is based on the second-order statistic method Correlation Alignment (CORAL). FLCORAL likewise takes into account the probability that each sample belongs to every class and performs fine-grained alignment on second-order statistics, capturing complex differences between the two domains. The substructure fuzzy alignment module computes the FLCORAL between the source domain and target domain feature matrices as the second loss term.

2.3. Combine the first and second loss terms to obtain the alignment loss. The two loss terms of the substructure fuzzy alignment module are combined into the alignment loss, where γ is the weight coefficient of one of the two terms.

3. Input the time-series feature matrices of the aligned source domain data and of the training-set target domain data, as output by the fuzzy subdomain feature extractor, into the RUL regressor; output the remaining life predictions for the source domain data and the training-set target domain data; and compute the mean squared error (MSE) between the source domain predictions and the true values as the regression loss.

4. Compute the total model loss from the alignment loss and the regression loss, where β is the penalty coefficient of one of the two terms.

5. Minimize the total loss and adjust the network parameters of the fuzzy subdomain feature extractor and the RUL regressor through feedback, training both networks until training is complete and the trained remaining life prediction model is obtained.

Minimizing the alignment loss makes it possible to learn domain-invariant representations and obtain domain-invariant feature matrices for the source and target domains. Specifically, the FLMMD term reduces the difference between the first-order moments of the source and target domains, while the FLCORAL term reduces the difference between their second-order moments, measuring the discrepancy between the two domains by the distance between the covariances of the source and target samples. The two terms work together to bring the source and target data distributions closer.
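The two alignment terms can be sketched as follows. The patent's exact FLMMD and FLCORAL formulas are not reproduced in this text, so this sketch is an assumption built from the description above. It uses the fuzzy memberships as per-stage sample weights, a linear kernel for the first-order term (so it reduces to the distance between weighted means), and the usual CORAL scaling of 1/(4d²) for the second-order term. All function names are hypothetical.

```python
def class_weights(memberships):
    """Normalize memberships per stage so each stage's weights sum to 1.

    memberships[i][c] is the fuzzy membership of sample i in stage c.
    """
    n_stages = len(memberships[0])
    totals = [sum(row[c] for row in memberships) for c in range(n_stages)]
    return [[row[c] / totals[c] if totals[c] > 0 else 0.0
             for c in range(n_stages)] for row in memberships]


def weighted_mean(feats, weights, c):
    """Membership-weighted mean feature vector for stage c."""
    dim = len(feats[0])
    return [sum(w[c] * x[d] for x, w in zip(feats, weights))
            for d in range(dim)]


def weighted_cov(feats, weights, c):
    """Membership-weighted covariance matrix for stage c."""
    mu = weighted_mean(feats, weights, c)
    dim = len(mu)
    return [[sum(w[c] * (x[a] - mu[a]) * (x[b] - mu[b])
                 for x, w in zip(feats, weights))
             for b in range(dim)] for a in range(dim)]


def _has_mass(weights, c):
    return sum(w[c] for w in weights) > 0


def fuzzy_local_mmd(src, src_u, tgt, tgt_u):
    """First-order term: squared distance between per-stage weighted means."""
    ws, wt = class_weights(src_u), class_weights(tgt_u)
    n_stages = len(src_u[0])
    total = 0.0
    for c in range(n_stages):
        # Skip stages absent from either domain (e.g. incomplete target data).
        if not (_has_mass(ws, c) and _has_mass(wt, c)):
            continue
        ms, mt = weighted_mean(src, ws, c), weighted_mean(tgt, wt, c)
        total += sum((a - b) ** 2 for a, b in zip(ms, mt))
    return total / n_stages


def fuzzy_local_coral(src, src_u, tgt, tgt_u):
    """Second-order term: squared Frobenius distance between covariances."""
    ws, wt = class_weights(src_u), class_weights(tgt_u)
    n_stages, dim = len(src_u[0]), len(src[0])
    total = 0.0
    for c in range(n_stages):
        if not (_has_mass(ws, c) and _has_mass(wt, c)):
            continue
        cs, ct = weighted_cov(src, ws, c), weighted_cov(tgt, wt, c)
        total += sum((cs[a][b] - ct[a][b]) ** 2
                     for a in range(dim) for b in range(dim)) / (4 * dim * dim)
    return total / n_stages
```

A pure shift of the target features changes the first-order term but leaves the second-order term at zero, which illustrates why the two terms complement each other. In practice these losses would be written in a differentiable framework so that they can be minimized together with the regression loss.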

For the fuzzy subdomain feature extractor and the RUL regressor connected in series, a total model loss is obtained. With minimization of the total model loss as the optimization objective, the network parameters of the fuzzy subdomain feature extractor and the RUL regressor are adjusted through feedback, training both networks until the training termination condition is met and the trained remaining life prediction model is obtained.
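The loss formulas themselves do not survive in this text. One form consistent with the description is given below as an assumption only: the symbol names are chosen here, and the choice of which term γ and β multiply is a guess. Only the source-domain MSE is stated explicitly in the text.

```latex
\mathcal{L}_{\text{align}} = \mathcal{L}_{\text{FLMMD}} + \gamma\,\mathcal{L}_{\text{FLCORAL}},
\qquad
\mathcal{L}_{\text{reg}} = \frac{1}{n_s}\sum_{i=1}^{n_s}\left(\hat{y}_i^{s} - y_i^{s}\right)^2,
\qquad
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{reg}} + \beta\,\mathcal{L}_{\text{align}}
```

Here $n_s$ is the number of labeled source samples, $y_i^{s}$ is the true remaining life, and $\hat{y}_i^{s}$ is the regressor output. The target domain contributes only through the alignment term, because it carries no remaining life labels.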

In this embodiment, the fuzzy subdomain feature extractor is a ResNet50 feature extractor, and the RUL regressor is a regression predictor based on a fully connected network.

IV. Remaining life prediction:

Another object of the present invention is to provide a bearing remaining life prediction system combining degradation stage division and subdomain adaptation. The system is used to implement the method described above and comprises:

Embodiment 2

This embodiment uses a concrete example to describe in detail how the method of the present invention is carried out:

I. Experimental data set acquisition and health indicator (HI) construction

The data set used in this embodiment is the IEEE PHM Challenge 2012 bearing data set, acquired on the PRONOSTIA test platform. The platform consists of three parts: a rotating part, a load part, and a data acquisition part.

The rotating part has an electric power of 250 W, and power is transmitted to the bearing through the rotating shaft. The load part is a pneumatic jack that applies a dynamic load of 4000 N to the bearing so that it degrades rapidly. The data acquisition part records vibration and temperature data. The vibration sensor consists of two miniature accelerometers mounted at 90° to each other, one on the horizontal axis and one on the vertical axis. The sampling frequency is 25.6 kHz; a 0.1 s sample is taken every 10 s, and the test stops when the vibration amplitude reaches 20 g.
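The sampling figures above fix the size of each vibration snapshot. The short calculation below only restates that arithmetic; the constant and function names are hypothetical.

```python
SAMPLING_RATE_HZ = 25_600        # 25.6 kHz
SNAPSHOT_SECONDS = 0.1           # length of each recording
SNAPSHOT_PERIOD_SECONDS = 10     # one recording every 10 s

# Number of samples per accelerometer channel in one snapshot.
samples_per_snapshot = round(SAMPLING_RATE_HZ * SNAPSHOT_SECONDS)

# Fraction of the run that is actually recorded.
duty_cycle = SNAPSHOT_SECONDS / SNAPSHOT_PERIOD_SECONDS


def elapsed_seconds(snapshot_index):
    """Time since the start of the run at which a snapshot begins."""
    return snapshot_index * SNAPSHOT_PERIOD_SECONDS
```

Each snapshot therefore holds 2560 samples per channel, and the snapshot index serves directly as a time axis with a 10 s step.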

The data set covers three operating conditions: the first is a load of 4000 N at 1800 rpm, the second a load of 4200 N at 1650 rpm, and the third a load of 5000 N at 1500 rpm. HI construction comprises the following steps:

1. Time-frequency statistical feature acquisition:

As shown in FIG. 2, to fully exploit the latent features of the time-series data, 17 time-domain features and 8 frequency-domain features are extracted from the raw signal, so that the trend of the bearing vibration signal can be studied from multiple angles. Time-domain features describe how the signal changes along the time axis and characterize the operating trend of the equipment well; frequency-domain features characterize the operating state of the equipment well and make it easier to separate and remove noise. After extraction, the time-frequency features are denoted p = {p_1, p_2, …, p_i, …, p_I}, where I = 25.
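The text does not list which 17 time-domain and 8 frequency-domain features are used. The sketch below therefore computes only a few common ones for a single snapshot, as an illustration of the kind of statistics involved. The feature selection and function names are assumptions, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def time_domain_features(signal):
    """A handful of common time-domain statistics of one snapshot."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    fourth = sum((x - mean) ** 4 for x in signal) / n
    return {
        "mean": mean,
        "rms": rms,
        "peak_to_peak": max(signal) - min(signal),
        # Non-excess kurtosis: fourth central moment over variance squared.
        "kurtosis": fourth / var ** 2 if var > 0 else 0.0,
        "crest_factor": peak / rms if rms > 0 else 0.0,
    }


def spectral_centroid(signal, sampling_rate_hz):
    """Magnitude-weighted mean frequency, via a naive O(n^2) DFT."""
    n = len(signal)
    num = den = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        mag = abs(coeff)
        num += (k * sampling_rate_hz / n) * mag
        den += mag
    return num / den if den > 0 else 0.0
```

For real snapshots of 2560 samples an FFT routine would replace the naive DFT. Computing such features for every snapshot gives one 25-dimensional vector p per time step.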

2. Secondary feature index extraction:

The features extracted in step 1 differ from one another individually, so they do not share a common starting level, and they also contain spurious fluctuations and noise. To eliminate these effects, the present invention proposes two secondary optimized feature indices: a relative index and a denoising index. The relative index eliminates individual differences, narrows distribution gaps, and aligns the feature vectors. Because the statistical features start at different levels, relative gaps exist in the early flat portion of the data and the absolute range of variation is affected. The relative index is calculated with formula (1):

where p_i denotes a feature, i is the feature index, p_norm is the mean of the feature during stable operation, and K is the duration of the stable stage. After this inter-individual difference alignment, the time-frequency features become the relative indices p_r = {p_r1, p_r2, …, p_ri, …, p_rI}.
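Formula (1) itself is not reproduced in this text. One plausible reading of the definitions, assumed here and not confirmed by the source, is that each feature series is divided by its mean over the first K stable samples, so that every feature starts near 1. The sketch below implements that assumption; `relative_index` is a hypothetical name.

```python
def relative_index(feature, k):
    """Scale a feature series by its mean over the first k (stable) samples.

    Assumption: the relative index is the ratio p_i / p_norm, where p_norm is
    the stable-stage mean. The source may define it differently.
    """
    if not 0 < k <= len(feature):
        raise ValueError("k must be between 1 and len(feature)")
    p_norm = sum(feature[:k]) / k
    if p_norm == 0:
        raise ValueError("stable-stage mean is zero; ratio undefined")
    return [v / p_norm for v in feature]
```

With this normalization, two features whose raw magnitudes differ by orders of magnitude both sit at about 1 during the stable stage, which is the alignment effect the text describes.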

The denoising index reduces abnormal data points, compresses the fluctuation range of local sequences, and preserves the monotonicity of the data trend. Within a sliding window, the i-th denoising index p_di is calculated as:

where p_rj is the j-th value of the relative index obtained from formula (1), w_i is the length of the sliding window, and x_i is the i-th value of the time series corresponding to p_r. k_i and b_i are the parameters of the linear regression model fitted on the i-th sliding window, which can be obtained by the least squares method; formulas (3) and (4) give the closed-form expressions for k_i and b_i respectively. H_i is the sliding-window sample after compression of local sequence fluctuations, given by formula (5), and d_i is the starting point of H_i:

where y is the value of p_r. min(H_i) and max(H_i) are the lower and upper bounds of the index, which reduce the influence of abnormal data on the trend. They are expressed as:

min(H_i) = μ - 3σ

max(H_i) = μ + 3σ

In formula (6), μ and σ are the mean and standard deviation of W_i, respectively. After this intra-individual compression and smoothing, the relative indices yield the denoising indices p_d = {p_d1, p_d2, …, p_di, …, p_dI}.

The resulting denoising index p_d is the final data p′ obtained after secondary feature index extraction.
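Formulas (2) to (5) are not reproduced in this text, so the exact way the fitted line and the bounds are combined is not shown here. The two building blocks that the text does name can be sketched independently: the closed-form least-squares fit of y = k·x + b on one window, and clipping a window to μ ± 3σ. The function names are illustrative, and the population standard deviation is an assumption.

```python
import math

def fit_line(xs, ys):
    """Closed-form least squares for y = k*x + b on one sliding window."""
    w = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = w * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values must not all be equal")
    k = (w * sxy - sx * sy) / denom
    b = (sy - k * sx) / w
    return k, b

def clip_3sigma(window):
    """Clip every value in the window to [mu - 3*sigma, mu + 3*sigma]."""
    w = len(window)
    mu = sum(window) / w
    sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / w)
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    return [min(max(v, lower), upper) for v in window]
```

With the population standard deviation, a single point can exceed the 3σ bound only when the window holds more than ten samples, so short windows leave every value unchanged.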

3. Constructing the HI curve:

The HI is an indicator that reflects the change in health state over the whole life cycle and is usually represented as a curve with a trend. Bearing degradation is a gradual process: the initial stage is regarded as the normal state, and as faults develop during operation, the degradation-stage data deviate from the normal-state data and their correlation with the initial state changes. The correlation between later-stage data and initial-stage data can therefore be expressed as a set of data points reflecting the different health stages. Accordingly, the present invention performs correlation analysis on the transformed 25-dimensional features, using the Pearson correlation coefficient to measure the degree of correlation between the data at two time points, obtaining the trend along the time axis and constructing a general HI. Specifically, the initial portion of the optimized 25-dimensional secondary feature index p′ is selected as the baseline for the normal state, and the Pearson correlation coefficient between the initial moment and every other moment is computed as the HI measuring equipment state, expressed as follows:

Taking the source-domain data set as an example, x and y denote the processed multidimensional feature vectors at different time points, x, y = p′_j, p′_k, with j, k ∈ {1, 2, …, N_s}. Computing the Pearson coefficients in sequence yields the correlation coefficient vector R = {r₁₁, r₁₂, …, r_1N}, which is the HI curve. The HI curve serves as the basis for the subsequent HS division and is also fed into the deep transfer network as one of the features.
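The HI construction described above can be sketched with the standard Pearson correlation coefficient. In this illustration `features[t]` stands for the feature vector p′ at time step t, and the first vector is taken as the normal-state baseline; a single baseline vector (rather than an averaged initial portion) is a simplifying assumption.

```python
import math

def pearson(x, y):
    """Standard Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant vector: correlation undefined")
    return cov / (sx * sy)

def health_index(features):
    """Correlate the feature vector at every time step with the first one."""
    baseline = features[0]
    return [pearson(baseline, f) for f in features]
```

The first entry of the result is always 1, and the values fall as the feature vector drifts away from its initial pattern, which gives the decreasing trend expected of an HI curve.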

II. Health stage (HS) division

Because fault thresholds usually differ between bearing types, an adaptive HS division method is needed. The present invention proposes a Time-series Window Weighted Fuzzy C-Means algorithm (TWW-FCM). The HI curve obtained above contains local noise and fluctuations, so the conventional FCM algorithm often misclassifies noisy points, producing discontinuous health stages such as brief stage jumps or "false stages". A decision based only on the current single point is therefore unreliable, and the points within a certain range before and after it must be used to determine the stage. TWW-FCM, which is suited to stage division of time-series data, is proposed for this purpose; its principle is shown in Figure 4. The procedure is as follows:

First, the HI curve is divided into windows, and a statistic of each window replaces the current single-point value. The data then reflect the state over a period of time and become smoother and more continuous, with fewer abrupt changes and oscillations, which better matches the pattern of equipment degradation and benefits data analysis and modeling. The present invention uses a Gaussian function as the window, defined as:

where r_i is the i-th value on the HI curve, μ is the sample mean, and σ is the sample standard deviation. The data are then windowed: for each data point r_i, the value of the Gaussian function at that point is used as the weight, and a weighted average is taken over all data points in the window. The processed HI, the sequence G, is given by formula (9):

G = (g₁, g₂, …, g_N) (9)
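The Gaussian window definition is not reproduced in this text. One reading of the description, assumed here, is that within each window every point is weighted by a Gaussian of its own value around the window mean, so that outlying values contribute less to the average. The window half-width and the function name are hypothetical.

```python
import math

def gaussian_window_smooth(hi, half_width=2):
    """Replace each HI value by a Gaussian-weighted average of its window.

    Assumption: the weight of a point is exp(-(r - mu)^2 / (2 * sigma^2)),
    with mu and sigma taken from the window around the current position.
    """
    n = len(hi)
    smoothed = []
    for i in range(n):
        window = hi[max(0, i - half_width): min(n, i + half_width + 1)]
        mu = sum(window) / len(window)
        var = sum((r - mu) ** 2 for r in window) / len(window)
        if var == 0:
            # A flat window has nothing to smooth.
            smoothed.append(mu)
            continue
        weights = [math.exp(-((r - mu) ** 2) / (2 * var)) for r in window]
        smoothed.append(sum(w * r for w, r in zip(weights, window)) / sum(weights))
    return smoothed
```

Under this reading, an isolated dip in an otherwise flat HI segment is pulled back toward its neighbors, which is the suppression of short-lived jumps that the text asks for.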

After fuzzy smoothing of the HI curve, it is clustered to obtain the health stage labels. First, the number of clusters N is determined, and the membership value u_ij of each data point to each cluster is randomly initialized. The membership u_ij is the degree to which data point i belongs to the j-th cluster and takes a value between 0 and 1. The fuzzy membership u_ij is calculated as shown in formula (10):

where ‖·‖ denotes the Euclidean distance between the feature sample x_j and the cluster center c_i, and q is the fuzziness factor. Then, based on the fuzzy membership of each data point, the cluster centers C = (c₁, c₂, …, c_i, …, c_N) are computed; for the i-th cluster, the coordinate of its center c_i is calculated by formula (11):

TWW-FCM iteratively optimizes the membership values and the cluster center of each class to minimize the objective function J until the stopping condition is met. The objective function J and its constraints are shown in formula (12):

0 ≤ u_ij ≤ 1, 1 ≤ i ≤ C, 1 ≤ j ≤ N (12)

Each cluster center is randomly initialized at the start of the iteration, and the iteration stops once a local minimum or saddle point of the objective function is reached. The stopping condition is that the change in the membership matrix between two consecutive iterations falls below the termination threshold ε. The convergence condition is:

‖U^(k+1) - U^(k)‖ < ε (13)

Finally, TWW-FCM yields the health stage c_i ∈ {1, 2, …, N} to which each data point belongs and the corresponding fuzzy membership vector u_i = (u_i1, u_i2, …, u_iN). The health stage labels and the fuzzy membership matrix serve as input data for the deep transfer network.
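The clustering loop can be sketched with the textbook fuzzy C-means updates on a one-dimensional series. This is plain FCM applied to an already smoothed HI sequence, not a confirmed reproduction of formulas (10) to (13): the updates are the standard ones (centers as membership^q-weighted means, memberships from inverse distances), the stopping test uses the largest absolute change in the membership matrix, and all names and default values are assumptions.

```python
import random

def fuzzy_c_means(values, n_clusters, q=2.0, eps=1e-6, max_iter=200, seed=0):
    """Return (centers, memberships, hard_labels) for a 1-D series.

    memberships[i][j] is the degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n = len(values)
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [0.0] * n_clusters
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership^q.
        for j in range(n_clusters):
            num = sum((u[i][j] ** q) * values[i] for i in range(n))
            den = sum(u[i][j] ** q for i in range(n))
            centers[j] = num / den
        # Membership update from inverse distances to the centers.
        new_u = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0:
                hits = [1.0 if d == 0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (q - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    labels = [max(range(n_clusters), key=lambda j: row[j]) for row in u]
    return centers, u, labels
```

The hard label is the cluster with the largest membership, while the full membership rows are what the later alignment losses use as sample weights.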

III. Deep feature extraction and feature alignment by the fuzzy subdomain feature extractor:

The fuzzy subdomain feature extractor combines the health stage label of each data point with its fuzzy membership to perform deep feature extraction and feature alignment on the multidimensional time-frequency features. Unlike ordinary subdomain adaptation, which uses hard domain labels, the present invention introduces stage uncertainty into subdomain adaptation: the fuzzy membership generated by TWW-FCM represents the probability that a sample belongs to each subdomain, which enables better subdomain adaptation. A deep neural network first obtains latent feature representations of the source-domain and target-domain data. These representations are then fed into the sub-structure fuzzy alignment module (SFAM), which learns domain-invariant representations at the subdomain level by minimizing the domain discrepancy between substructures. In the present invention, SFAM includes two multi-order subdomain distance metrics: Fuzzy Local Maximum Mean Discrepancy (FLMMD) and Fuzzy Local CORAL (FLCORAL).

FLMMD is described first. MMD is a non-parametric distance estimate used to measure the difference between two distributions and is widely used in domain adaptation. The original MMD measures the overall distribution difference and ignores the class information of the subdomains within a domain. The FLMMD proposed in the present invention also takes into account the probability that each sample belongs to every class, achieving finer-grained fuzzy subdomain alignment and bringing the source-domain and target-domain data closer in the mapped feature space.

Assume a source domain D_S = {(x_i^S, u_i^S, y_i^S)}, i = 1, …, n_S, and a target domain D_T = {(x_j^T, u_j^T)}, j = 1, …, n_T, where x^S and x^T are the source-domain and target-domain samples, u^S and u^T are their corresponding fuzzy memberships, and y^S is the source-domain label. D_S and D_T are divided into N health stages, i.e., subdomains D_S^(n) and D_T^(n), where n ∈ {1, 2, …, N} is the class label. The FLMMD between two distributions p and q is defined as the expectation of the reproducing kernel Hilbert space (RKHS) distance between the mean embeddings within each subdomain. Accordingly, the squared form of the MMD is shown in formula (14):

where x^S and x^T are samples in D_S^(n) and D_T^(n) respectively, and p⁽ⁿ⁾ and q⁽ⁿ⁾ are the distributions of D_S^(n) and D_T^(n). φ(·) denotes a function that maps the input to an RKHS with characteristic kernel k, where k(x^S, x^T) = ⟨φ(x^S), φ(x^T)⟩. Assuming that each sample belongs to each class according to a weight wⁿ, the unbiased estimate is expressed as:

where w_i^Sn and w_j^Tn denote the weights of x_i^S and x_j^T belonging to class n. The sums of w_i^Sn over i and of w_j^Tn over j both equal 1, and the sum of w_i^Sn φ(x_i) over the samples in D_S is the weighted sum over class n. For sample x_i, the weight w_i^n in FLMMD is calculated as shown in formula (16):

where u_in is the n-th element of the membership vector u_i. Because the stage fuzzy membership reflects well the probability distribution of a sample over the different health stages, the present invention uses the fuzzy memberships u^S and u^T to calculate the sample weights w^Sn and w^Tn.
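A membership-weighted local MMD of this kind can be sketched as follows. The weight normalization w_in = u_in / Σ_j u_jn is assumed from the statement that the weights of each class sum to 1, the estimate averages the per-class kernel terms over the classes, and the Gaussian kernel with a fixed bandwidth is a placeholder for whatever kernel the invention uses. All function names are hypothetical.

```python
import math

def class_weights(memberships):
    """w[i][n] = u_in / sum_j u_jn, so each class column sums to 1."""
    n_classes = len(memberships[0])
    totals = [sum(row[n] for row in memberships) for n in range(n_classes)]
    return [[row[n] / totals[n] if totals[n] > 0 else 0.0
             for n in range(n_classes)] for row in memberships]

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel between two feature vectors (bandwidth is an assumption)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def flmmd(xs, us, xt, ut, kernel=rbf_kernel):
    """Membership-weighted squared MMD averaged over the classes."""
    ws, wt = class_weights(us), class_weights(ut)
    n_classes = len(us[0])
    total = 0.0
    for n in range(n_classes):
        ss = sum(ws[i][n] * ws[j][n] * kernel(a, b)
                 for i, a in enumerate(xs) for j, b in enumerate(xs))
        tt = sum(wt[i][n] * wt[j][n] * kernel(a, b)
                 for i, a in enumerate(xt) for j, b in enumerate(xt))
        st = sum(ws[i][n] * wt[j][n] * kernel(a, b)
                 for i, a in enumerate(xs) for j, b in enumerate(xt))
        total += ss + tt - 2.0 * st
    return total / n_classes
```

Because every sample contributes to every class in proportion to its membership, a point near a stage boundary influences the alignment of both neighboring stages instead of being forced into one.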

Although MMD is the most commonly used metric in domain adaptation, it only computes a first-order moment and may not fully capture the complex differences between two domains. The present invention therefore also introduces the CORAL loss, a second-order statistic between the source and target features. Specifically, the CORAL loss measures the difference between two domains by the distance between the covariances of the source and target samples. The proposed FLCORAL loss lets the model account more fully for the fuzzy differences between subdomains, further improving generalization. The CORAL loss is calculated as follows:

where ‖·‖_F² denotes the squared matrix Frobenius norm. V^S and V^T are the covariance matrices of the d-dimensional source and target data, which can be calculated by formula (18):

where 1 is a column vector with all elements equal to 1, and n_S and n_T are the numbers of source and target samples. Assigning weights to the samples introduces finer-grained subdomain information and narrows the second-order statistical distance at the subdomain level. The unbiased estimate is expressed as:

where the covariance matrices are those of the source-domain and target-domain input data. As in FLMMD, the source-domain and target-domain sample weights can be calculated from the fuzzy membership u_i:

To align the hidden features of the source and target domains, the activated hidden states h^S and h^T are needed. In one batch, n_b labeled samples from the source domain and n_b unlabeled samples from the target domain are given, where n_b is the batch size. As shown in Figure 5, the feature extractor in the FSARN network generates the activated hidden states h^S and h^T. Because φ(·) cannot be computed directly, the FLMMD and FLCORAL distances are calculated by formula (21):

where k_mmd(·,·) and k_coral(·,·) denote the kernel functions of FLMMD and FLCORAL respectively. Once the weighted distribution discrepancy between the source and target domains has been estimated, each pair of corresponding subdomains can be aligned by minimizing that discrepancy.
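The second-order term can be illustrated with a membership-weighted covariance per class, compared between domains with the squared Frobenius norm and the 1/(4d²) scaling of the standard CORAL loss. The weighted covariance used here (weights normalized to sum to 1 per class, no bias correction) is an assumption, and the sketch works directly on feature vectors rather than through the kernel k_coral mentioned in the text.

```python
def column_weights(memberships, n):
    """Normalize the memberships of class n so they sum to 1."""
    total = sum(row[n] for row in memberships)
    if total <= 0:
        raise ValueError("class has zero total membership")
    return [row[n] / total for row in memberships]

def weighted_cov(xs, w):
    """Covariance matrix of the rows of xs under sample weights w (sum to 1)."""
    d = len(xs[0])
    mean = [sum(wi * x[k] for wi, x in zip(w, xs)) for k in range(d)]
    return [[sum(wi * (x[a] - mean[a]) * (x[b] - mean[b]) for wi, x in zip(w, xs))
             for b in range(d)] for a in range(d)]

def flcoral(xs, us, xt, ut):
    """Average over classes of ||V_S - V_T||_F^2 / (4 d^2)."""
    d = len(xs[0])
    n_classes = len(us[0])
    total = 0.0
    for n in range(n_classes):
        vs = weighted_cov(xs, column_weights(us, n))
        vt = weighted_cov(xt, column_weights(ut, n))
        frob2 = sum((vs[a][b] - vt[a][b]) ** 2 for a in range(d) for b in range(d))
        total += frob2 / (4 * d * d)
    return total / n_classes
```

A pure shift of the target features leaves this term at zero while a rescaling does not, which shows why it complements a mean-based distance rather than duplicating it.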

IV. RUL prediction by the RUL regressor:

The RUL regressor is a prediction network with three fully connected layers. It receives the features produced by the feature extractor and predicts the RUL; the result is expressed as:

y_t = σ(w_o g_t + b_o) (22)

where g_t denotes the output sequence of the fuzzy subdomain feature extractor at time t, y_t is the RUL prediction at time t, σ(·) is the sigmoid function, w_o denotes the trainable parameters, and b_o is a scalar. The regressor uses the mean-square error (MSE) as the regression loss function:

where the two quantities compared are the target regression value and the estimated value of each sample.
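Formula (22) and the MSE loss can be sketched for a single output unit as follows. The sketch covers only the final sigmoid layer, not the three fully connected layers, and since a sigmoid output lies between 0 and 1 it presumes that the RUL targets are normalized to that range, which is an inference rather than something the text states.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_rul(g_t, w_o, b_o):
    """y_t = sigmoid(w_o . g_t + b_o) for one feature vector g_t."""
    return sigmoid(sum(w * g for w, g in zip(w_o, g_t)) + b_o)

def mse(targets, predictions):
    """Mean of the squared differences between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)
```

For example, a zero feature vector with zero bias gives a prediction of 0.5, the midpoint of the normalized RUL range.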

To narrow the distance between the source and target domains and optimize the model's predictions, the problem shown in formula (24) must be solved:

where J(·,·) denotes the loss function between the true RUL value y_i and the predicted value f(x_i), and the second term measures all subdomain discrepancies between the source and target domains.

In summary, the FSARN loss consists of two parts: the fuzzy subdomain alignment loss and the RUL prediction loss. The objective function is expressed as:

where β is the weight coefficient of the subdomain alignment loss and γ is the weight coefficient of the CORAL loss. Subdomain adaptation and regression fitting are thus carried out simultaneously, and the optimization problem is given by formula (26):

In the proposed deep network, minimizing the subdomain alignment loss brings the source-domain and target-domain data closer by reducing the discrepancy between subdomains, and minimizing the RUL prediction loss learns RUL prediction knowledge from the source domain. During training, parameters are updated by backpropagation, with stochastic gradient descent (SGD) as the optimizer. At each training step, the model parameters are updated by formula (27):

where η is the learning rate, i.e., the step size taken by SGD as training proceeds. θ_E and θ_R are the parameters of the feature extractor and the RUL predictor, respectively. By following the gradient to minimize the loss function, the data distributions of the source and target domains are brought as close as possible, and high-level features in the hidden layers are learned automatically, enabling RUL prediction on unlabeled target-domain data.
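Formulas (25) and (27) are not reproduced in this text. From the roles given to β and γ, one plausible composition, assumed here, is total = L_rul + β·(L_flmmd + γ·L_flcoral), and the parameter update is the usual gradient step θ ← θ - η·∂L/∂θ. The sketch below shows both on plain numbers; in a real network the gradients would come from backpropagation.

```python
def total_loss(l_rul, l_flmmd, l_flcoral, beta, gamma):
    """Assumed composition: regression loss plus weighted alignment loss."""
    return l_rul + beta * (l_flmmd + gamma * l_flcoral)

def sgd_step(params, grads, lr):
    """One gradient-descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

def minimize_quadratic(start, target, lr=0.1, steps=100):
    """Toy use of sgd_step: minimize (theta - target)^2 from a starting point."""
    theta = [start]
    for _ in range(steps):
        grad = [2.0 * (theta[0] - target)]
        theta = sgd_step(theta, grad, lr)
    return theta[0]
```

The toy minimizer shows only the update rule converging on a one-parameter problem; it says nothing about suitable learning rates or loss weights for the actual network.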

Among domain-adaptation-based remaining life prediction methods, existing approaches mostly focus on global alignment of the source and target domains and ignore the local information contained in the health stages. Local alignment achieves finer-grained feature alignment and enhances the applicability of the model. However, mining the common latent features of the data when the target-domain data are incomplete and the health states differ greatly between stages is difficult. The bearing remaining life prediction method proposed in the present invention, which combines degradation stage division and subdomain adaptation, uses a Fuzzy Subdomain Adaptation Regression Network (FSARN) with multi-order metrics based on data labels and memberships to bring corresponding degradation stages closer, completing fuzzy substructure alignment and cross-domain regression and improving prediction accuracy and model generalization.

By adopting the above technical solution disclosed in the present invention, the following beneficial effects are obtained:

The present invention provides a bearing remaining life prediction method and system combining degradation stage division and subdomain adaptation. It maps the stages present in the life cycle to different subdomains, distributing the overall discrepancy of global alignment across the stage discrepancies of local alignment. In the health stage division stage, a time-series fuzzy clustering algorithm suited to windowed time-series data is proposed, giving a unified, standardized process for health indicator construction and stage division. In the transfer stage, multi-order metrics based on data labels and memberships bring corresponding degradation stages closer, completing fuzzy substructure alignment and cross-domain regression. The method achieves finer-grained feature alignment, builds a model with better generalization, and delivers more accurate remaining life prediction.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A bearing residual life prediction method combining degradation phase division and sub-domain adaptation, characterized in that the method comprises the following steps:

S1, constructing a health index curve:

acquiring an original data set, extracting time-domain and frequency-domain features of the source domain data and target domain data in the original data set, and performing secondary index optimization and correlation analysis on the resulting multidimensional features to obtain a health index curve;

S2, health stage division:

processing the health index curve with a time-series window weighted clustering algorithm to obtain the health stage label and fuzzy membership of each item of source domain data and target domain data, and dividing them into a training set and a test set in proportion;

S3, constructing and training a residual life prediction model:

the residual life prediction model comprises a fuzzy subdomain feature extractor and an RUL regressor; the training set is input into the residual life prediction model to train it; during training, a total model loss is obtained from the alignment loss of the sub-structure fuzzy alignment module in the fuzzy subdomain feature extractor and the regression loss of the RUL regressor, the parameters of the fuzzy subdomain feature extractor and the RUL regressor are optimized by minimizing the total model loss, and the trained residual life prediction model is obtained and stored;

S4, residual life prediction:

inputting the test set into the trained residual life prediction model to perform residual life prediction and obtain a residual life prediction result.

2. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: the secondary index optimization consists of performing an inter-feature difference alignment operation on the multidimensional features to obtain relative indices, and then performing an intra-feature compression and smoothing operation on the relative indices to obtain denoising indices.

3. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: the correlation analysis consists of using the Pearson correlation coefficient to measure, on the indices after secondary optimization, the degree of correlation between the data at each moment and the data at the initial moment, and constructing the health index.

4. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: the source domain data are electromechanical equipment monitoring data acquired when electromechanical equipment with a bearing runs to failure; the target domain data are monitoring data acquired from electromechanical equipment whose bearing has not yet run to failure;

the source domain data are supervised data labeled with the residual life of the bearing; the target domain data are unsupervised data without a bearing residual life label.

5. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: step S2 specifically includes the following,

S21, dividing the health index curve into windows, and replacing the current single-point value with a statistic of the window to achieve fuzzy smoothing of the health index curve;

S22, clustering the fuzzy-smoothed health index curve, and obtaining the health stage labels and fuzzy memberships of the source domain data and the target domain data by iteratively optimizing the fuzzy membership of each data point and the cluster center of each class;

S23, dividing the source domain data, the target domain data, and the health stage labels and fuzzy memberships to which they belong into a training set and a test set in proportion.

6. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: step S3 specifically includes the following,

S31, inputting the training set into the fuzzy subdomain feature extractor, and using the deep neural network of the fuzzy subdomain feature extractor to obtain latent feature representations of the source domain data and the target domain data in the training set, yielding the high-dimensional feature matrices corresponding to the source domain data and the target domain data;

S32, inputting the high-dimensional feature matrices corresponding to the source domain data and the target domain data into the sub-structure fuzzy alignment module of the fuzzy subdomain feature extractor, and obtaining the alignment loss by calculating the FLMMD and the FLCORAL between the feature matrices of the source domain data and the target domain data;

S33, inputting the aligned time-series feature matrices of the source domain data and the target domain data in the training set, output by the fuzzy subdomain feature extractor, into the RUL regressor, outputting the predicted residual life of the source domain data and the target domain data in the training set, and calculating the mean square error between the predicted residual life and the true value for the source domain as the regression loss;

S34, calculating the total model loss from the alignment loss and the regression loss;

S35, minimizing the total model loss and adjusting the network parameters of the fuzzy subdomain feature extractor and the RUL regressor by feedback, thereby training both networks until training is complete and obtaining the trained residual life prediction model.

7. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 6, wherein: step S32 specifically includes the following,

S321, first-part alignment: based on the fuzzy local maximum mean discrepancy (FLMMD), which extends the maximum mean discrepancy by also considering the probability that each sample belongs to every class, finer-grained fuzzy subdomain alignment is achieved; the FLMMD between the feature matrices of the source domain data and the target domain data is calculated as the first partial loss;

S322, second-part alignment: based on Fuzzy Local CORAL (FLCORAL), which extends the second-order statistic Correlation Alignment by also considering the probability that each sample belongs to every class, fine-grained alignment is performed on the second-order statistics; the FLCORAL between the feature matrices of the source domain data and the target domain data is calculated as the second partial loss;

S323, combining the first partial loss and the second partial loss to obtain the alignment loss.

8. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: the fuzzy subdomain feature extractor is a ResNet50 feature extractor.

9. The method for predicting bearing residual life by combining degradation phase division and sub-domain adaptation according to claim 1, wherein: the RUL regressor is a regression predictor based on a fully connected network.

10. A bearing residual life prediction system combining degradation phase division and sub-domain adaptation, characterized in that: the system is used for implementing the method of any one of the preceding claims 1 to 9, and comprises,

a curve construction module: used for constructing a health index curve;

a stage division module: used for dividing the health stages;

a prediction model building module: used for constructing and training a residual life prediction model;

the residual life prediction model comprises a fuzzy subdomain feature extractor and an RUL regressor; the training set is input into the residual life prediction model to train it; during training, a total loss is obtained from the alignment loss of the sub-structure fuzzy alignment module in the fuzzy subdomain feature extractor and the regression loss of the RUL regressor, the parameters of the fuzzy subdomain feature extractor and the RUL regressor are optimized by minimizing the total loss, and the trained residual life prediction model is obtained and stored;

a life prediction module: used for predicting the residual life;