CN118036451A

CN118036451A - Lifetime prediction method, system and storage medium for rotating machinery under limited sample

Info

Publication number: CN118036451A
Application number: CN202410127257.7A
Authority: CN
Inventors: 刘淑杰; 王宇; 吕帅; 刘耕硕; 孟祥杰
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Filing date: 2024-01-30
Publication date: 2024-05-14

Abstract

The invention provides a life prediction method, system and storage medium for rotating machinery under limited samples, and belongs to the technical field of remaining useful life prediction for rotating machinery. First, vibration signals are collected under one working condition as source domain data and under other working conditions as target domain data, and features are extracted from them to form a feature set. A meta-learner that fuses a self-attention mechanism with a convolutional neural network is built as the training model. An adaptive weighted loss function is constructed, meta-tasks generated from the source domain data are used as input, and the meta-learner is trained with the Reptile meta-learning framework. The target domain then provides a very small number of samples to construct a regression task, and a gradient update is performed on the meta-learner so that the model adapts quickly to the new domain and predicts the remaining useful life of the target domain samples. The resulting model captures the running state of the equipment more accurately, handles time-series data better, and achieves higher accuracy when predicting the remaining useful life of the equipment.

Description

Lifetime prediction method, system and storage medium for rotating machinery under limited sample

Technical Field

The invention belongs to the technical field of residual life prediction of rotary machinery, and relates to a life prediction method, a system and a storage medium for rotary machinery under a limited sample.

Background

In modern industrial production processes, rotary machines are used as key components, and stable and reliable operation of the rotary machines is important for guaranteeing continuity of production processes, improving production efficiency and guaranteeing safety of workplaces. Failure of the rotating machinery often results in a stagnation of the production line, not only affecting production efficiency, but also possibly leading to high maintenance costs and potential safety risks. Therefore, accurate prediction of the Remaining Useful Life (RUL) of these mechanical devices is of great economic and safety importance for optimizing maintenance planning, preventing unexpected downtime, reducing maintenance costs, and avoiding potential safety accidents.

In recent years, with the rapid development of machine learning, particularly deep learning, data-driven methods have shown excellent capabilities in feature extraction, pattern recognition and the processing of high-dimensional nonlinear data, and have become a research hotspot in the field of RUL prediction. Such methods typically include data acquisition, feature engineering, model training, and verification. By analyzing a large amount of operating data collected from sensors and similar sources, a machine learning model can learn complex patterns of equipment degradation and make accurate predictions. In particular, deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) show great potential in processing sequence data and high-dimensional features.

However, despite the excellent results achieved by data-driven methods in laboratory environments, they often face significant challenges in practical industrial applications. On the one hand, detailed operating and fault data are difficult and costly to obtain in large quantities in an industrial environment, so training samples are limited and the model has to be trained and adapted with small samples. On the other hand, when a model is trained in one environment (source domain) and deployed in another, different environment (target domain), a "domain shift" phenomenon is often encountered: the data distribution and the features learned by the model in the source domain cannot fully represent the target domain, so the generalization capability of the model is reduced and the expected effect cannot be achieved in the target domain. This phenomenon is particularly common in the field of Prognostics and Health Management (PHM), because the actual working environment tends to be more complex and variable than the laboratory environment; different operating conditions, equipment wear states and environmental changes may all lead to domain shift. These problems significantly affect the accuracy and reliability of RUL predictions and are a critical technical challenge to be solved.

Disclosure of Invention

The invention provides a life prediction method, system and storage medium for rotating machinery under limited samples, and aims to solve the problem in the prior art that the accuracy of remaining useful life prediction for rotating machinery is low when the amount of data is limited and domain shift exists. In particular, in practical application scenarios, existing machine learning methods struggle to achieve high-accuracy predictions because the available operating and fault data are limited.

In order to achieve the above purpose, the invention adopts the following technical scheme:

In one aspect, the present invention provides a life prediction method for a rotating machine under a limited sample, comprising the following steps:

Step 1: a vibration sensor collects vibration signals of the rotating machine under one working condition as source domain data, and collects equipment vibration signals under other working conditions as target domain data.

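Step 2: perform time-domain and frequency-domain analysis on the vibration signal x = [x₁, x₂, …, x_N] of the collected source domain data and target domain data to extract key features, wherein the key features comprise:

Mean (mean): $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$;

Standard deviation (std): $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$;

Root mean square value (rms): $x_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$;

Maximum value (max): $\max(x_i)$;

Minimum value (min): $\min(x_i)$;

Range (peak to peak): $\max(x_i) - \min(x_i)$;

Kurtosis (kurt): $\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma}\right)^4$;

Skewness (skewness): $\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma}\right)^3$;

Crest factor (crest factor): $\frac{\max|x_i|}{x_{\mathrm{rms}}}$;

Shape factor (shape factor): $\frac{x_{\mathrm{rms}}}{\overline{|x|}}$, where the mean absolute value is $\overline{|x|} = \frac{1}{N}\sum_{i=1}^{N}|x_i|$;

Clearance factor (clearance factor): $\frac{\max|x_i|}{\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^2}$;

Impulse factor (impulse factor): $\frac{\max|x_i|}{\overline{|x|}}$;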

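Peak frequency (peak frequency): $\mathrm{freqs}[\arg\max_f \mathrm{PSD}(f)]$, where PSD(f) is the power spectral density of the signal and f is the frequency;

Total power (total power): $\sum_f \mathrm{PSD}(f)$;

Spectral centroid (spectral centroid): $\frac{\sum_f f \cdot \mathrm{PSD}(f)}{\sum_f \mathrm{PSD}(f)}$;

Spectral kurtosis (spectral kurt): the kurtosis of the power spectral density values, where psc_i is the power spectral density value of the i-th frequency bin and $\overline{psc}$ is the average value of the power spectral densities;

Spectral skewness (spectral skew): the skewness of the power spectral density values;

Energy of the IMF component (IMF_i): $E_i = \sum_t \mathrm{IMF}_i(t)^2$, where IMF_i(t) represents the i-th IMF component in the Empirical Mode Decomposition (EMD).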
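The time-domain part of the feature list can be sketched in a few lines of plain Python. This is only an illustration under assumptions that the text does not confirm: the function name `time_domain_features` is hypothetical, the textbook definitions are used, and variance is normalized by 1/N. A constant signal would cause a division by zero and is not handled.

```python
import math

def time_domain_features(x):
    """Return common time-domain statistics of a 1-D vibration signal."""
    n = len(x)
    mean = sum(x) / n
    # Population (1/N) variance; the normalisation is an assumption.
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    # Squared mean of square-rooted amplitudes, used by the clearance factor.
    sqrt_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "max": max(x),
        "min": min(x),
        "peak_to_peak": max(x) - min(x),
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / var ** 2,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "crest_factor": peak / rms,
        "shape_factor": rms / mean_abs,
        "clearance_factor": peak / sqrt_amp,
        "impulse_factor": peak / mean_abs,
    }

# A square wave of amplitude 1: rms, crest factor and kurtosis are all 1.
signal = [1.0, -1.0, 1.0, -1.0]
features = time_domain_features(signal)
```

The frequency-domain and EMD features would additionally need a spectrum estimate and a decomposition routine, which are left out here.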

Step 3: calculate the Pearson correlation coefficient between each extracted source domain feature and the machine running time:

$r_{Xt} = \frac{\sum_i (X_i - \bar{X})(t_i - \bar{t})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (t_i - \bar{t})^2}}$

where X_i is the i-th element of the original feature sequence, $\bar{X}$ is the mean of the original feature sequence, t_i is the i-th element of the time series, and $\bar{t}$ is the mean of the time series. The 15 features with the highest r_Xt are selected and standardized, i.e. $X_i' = \frac{X_i - \bar{X}}{\sigma}$, where σ is the standard deviation of the feature sequence. The sample length is then extended with a sliding window: SW_k ＝ [X_i, X_(i+1), …, X_(i+W-1)], where i = k × S, k is the index of the window, W is the window width and S is the step size, such that every window lies entirely within the sequence. The expanded features are collected to form the feature set.
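Step 3 can be illustrated with a small stdlib-only sketch. The helper names (`pearson`, `zscore`, `sliding_windows`, `top_k_features`) are hypothetical, and features are ranked by the raw coefficient because the text says "highest"; whether the absolute value is intended is not stated.

```python
import math

def pearson(xs, ts):
    """Pearson correlation between a feature sequence and the time axis."""
    n = len(xs)
    mx, mt = sum(xs) / n, sum(ts) / n
    cov = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    st = math.sqrt(sum((t - mt) ** 2 for t in ts))
    return cov / (sx * st)

def zscore(xs):
    """Standardise a sequence to zero mean and unit standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def sliding_windows(seq, width, step):
    """Window k starts at i = k * step and holds `width` consecutive items."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]

def top_k_features(features, ts, k):
    """Names of the k features with the highest correlation to time."""
    ranked = sorted(features, key=lambda name: pearson(features[name], ts),
                    reverse=True)
    return ranked[:k]

t = [0, 1, 2, 3, 4, 5]
feature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
r = pearson(feature, t)                       # perfectly linear trend
windows = sliding_windows(zscore(feature), width=3, step=1)
```

With a sequence of length 6, width 3 and step 1, the sketch yields 4 windows of 3 elements each.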

Step 4: build a meta-learner as the training model. Its encoder is built on the Conformer architecture and further extracts features from the feature set; it consists, in order, of a 1D convolution and several Conformer blocks. The input data X first passes through the 1D convolution layer, which converts it into a higher-level feature representation and adjusts its dimensions to suit the Conformer blocks:

X′＝Conv1D(X)

After feature extraction and dimension transformation by the 1D convolution layer, X′ enters the Conformer blocks for further feature extraction. Each Conformer block consists, in order, of a feed-forward module (FFM), a multi-head self-attention module (MultiHead), a convolution module (CM), a second feed-forward module (FFM2) and layer normalization (Layer Norm). Within the feed-forward module, the data X′ is processed as follows:

First, layer normalization is applied to X′:

$X_{LN} = \frac{X' - \mathrm{E}[X']}{\sqrt{\mathrm{Var}[X'] + \epsilon}} \odot \gamma + \beta$

where E[X′] is the mean of X′, Var[X′] is the variance of X′, ε is a small positive number that avoids division by zero, γ is a learnable scaling parameter that readjusts the scale of the normalized data to preserve the expressive power of the model, β is a learnable offset parameter that readjusts the center of the normalized data, and ⊙ denotes element-wise multiplication;

Normalized data X _LN is transformed by the linear layer:

X_Linear＝W_LinearX_LN+b_Linear

Wherein W _Linear represents a weight matrix of the linear layer, and b _Linear represents a bias term of the linear layer;

The Swish activation function is applied to the linearly transformed data X_Linear:

X_Swish＝X_Linear·sigmoid(X_Linear)

where sigmoid(·) denotes the sigmoid function;

The Swish-activated data X_Swish passes through a Dropout layer to reduce overfitting:

X_Dropout＝Dropout(X_Swish)

Data X _Dropout after Dropout passes again through the linear layer:

X_Linear2＝W_Linear2X_Dropout+b_Linear2

Wherein W _Linear2 represents the weight matrix of the linear layer, and b _Linear2 represents the bias term of the linear layer;

The data X _Linear2 after the second linear transformation passes through Dropout layer again:

X_Dropout2＝Dropout(X_Linear2)

Adding X' to the Dropout result through residual connection to obtain FFM final output:

X_FFM＝X_Dropout2+X′
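The FFM steps above can be traced on a single feature vector. This is a minimal sketch, not the patented implementation: the parameter dictionary layout is hypothetical, and Dropout is omitted because it acts as the identity at inference time.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise one feature vector, then rescale and shift it."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def linear(x, weight, bias):
    """Compute W x + b, with `weight` given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def swish(x):
    """Swish activation: v * sigmoid(v), applied element-wise."""
    return [v / (1.0 + math.exp(-v)) for v in x]

def feed_forward_module(x, p):
    """LayerNorm -> Linear -> Swish -> Linear -> residual add."""
    h = layer_norm(x, p["gamma"], p["beta"])
    h = linear(h, p["w1"], p["b1"])
    h = swish(h)
    h = linear(h, p["w2"], p["b2"])
    return [a + b for a, b in zip(h, x)]  # X_FFM = X_Dropout2 + X'

# With a zero second linear layer the module reduces to the identity,
# which makes the residual path easy to check.
params = {
    "gamma": [1.0, 1.0], "beta": [0.0, 0.0],
    "w1": [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0, 0.0],
    "w2": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "b2": [0.0, 0.0],
}
x = [1.0, 3.0]
out = feed_forward_module(x, params)
```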

Then, X _FFM is fed into the multi-headed self-attention module (MultiHead), which computes a query (Q), a key (K), and a value (V):

Q＝X_FFMW^Q,K＝X_FFMW^K,V＝X_FFMW^V

Wherein W ^Q、W^K、W^V represents the weight matrix of the query, key and value, respectively;

The single-head self-attention is computed as:

$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$

where T denotes the matrix transpose, d_k is the dimension of each key vector, and softmax is the softmax function;

Q, K and V are obtained from several different sets of W^Q, W^K, W^V, giving several different groups of queries (Q), keys (K) and values (V); Attention(Q, K, V) is computed for each group, and each group corresponds to one "head". Each head focuses on a different subspace representation; the outputs of all heads are then concatenated and passed through another linear transformation to produce the final output representation:

MultiHead(Q,K,V)＝Concat(head₁,head₂,…,head_h)W^O

Wherein Concat (x) represents that the outputs of all the heads are spliced together in a specific dimension, W ^O is a weight matrix of the final linear transformation, and h is the total number of heads;

the final step of the multi-headed self-attention module combines the final output with input X _FFM, passing on to the next module through the residual connection:

X_MultiHead＝MultiHead(Q,K,V)+X_FFM
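A single attention head can be written out with nested lists to make the scaled dot-product step concrete. The sketch below assumes Q, K and V have already been projected; a multi-head version would repeat it with different projections and concatenate the results. The helper names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]          # K^T
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Identical keys give uniform weights, so each output row is the
# average of the value rows.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(q, k, v)
```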

Then X_MultiHead enters the convolution module (CM), where it is first processed by layer normalization:

X_LN＝LayerNorm(X_MultiHead)

The layer-normalized data passes through a pointwise convolution that adjusts the feature dimension:

X_PWConv＝Conv1D_pointwise(X_LN)

where Conv1D_pointwise(·) denotes a pointwise convolution.

The data is then activated via GLU:

X_GLU＝X_PWConv⊙sigmoid(X_PWConv)

The GLU-activated data enters a 1D depthwise convolution:

X_DepthConv＝Conv1D_depthwise(X_GLU)

Batch normalization is applied to the depthwise-convolved data:

X_BN＝BatchNorm(X_DepthConv)

The data then passes through the Swish activation function:

X_Swish＝X_BN·sigmoid(X_BN)

The Swish-activated data passes through a second pointwise convolution:

X_PWConv2＝Conv1D_pointwise(X_Swish)

The data passes through a Dropout layer to reduce overfitting:

X_Dropout＝Dropout(X_PWConv2)

The Dropout output is added through a residual connection to X_MultiHead, the input of the convolution module, to give the module's final output:

X_CM＝X_Dropout+X_MultiHead

X_CM then passes through a second feed-forward module (FFM2) and one layer normalization, which completes feature extraction. The extracted features are passed to a linear regression layer, which maps them to the predicted value of the remaining useful life.
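The two convolution types used in the convolution module differ in what they mix: a pointwise convolution (kernel size 1) mixes channels at each time step, while a depthwise convolution filters each channel along time with its own kernel. The sketch below illustrates this with nested lists; zero "same" padding and an odd kernel size are assumptions, and the function names are hypothetical.

```python
def pointwise_conv(x, weight):
    """x: [channels_in][time]; weight: [channels_out][channels_in]."""
    steps = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
            for row in weight]

def depthwise_conv(x, kernels):
    """x: [channels][time]; kernels: [channels][k], one kernel per channel."""
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2                      # zero 'same' padding
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([sum(kv * padded[t + j] for j, kv in enumerate(kernel))
                    for t in range(len(channel))])
    return out

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# One output channel that sums both input channels at every time step.
mixed = pointwise_conv(x, [[1.0, 1.0]])

# Channel 0 keeps its values; channel 1 is summed over a 3-sample window.
filtered = depthwise_conv(x, [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
```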

Step 5: to optimize the performance of the model, an adaptive weighted loss function is constructed to address the sample label imbalance in the full-life-cycle samples of the equipment:

$L = \frac{1}{N'}\sum_{i=1}^{N'}\left(\frac{\max(\mathrm{freq}(y))}{\mathrm{freq}(y_i)}\right)^{\beta'}(\hat{y}_i - y_i)^2$

where N′ is the total number of samples, β′ is an exponent parameter that controls the strength of the weight distribution, $\hat{y}_i$ is the RUL value predicted by the model, y_i is the true RUL value of the sample, max(freq(y)) is the highest frequency with which any true RUL value occurs in the samples, and freq(y_i) is the frequency with which samples with RUL value equal to y_i occur in the source domain samples.
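One plausible reading of this weighting is a mean squared error in which each sample is scaled by (max(freq(y)) / freq(y_i)) raised to β′, so rare RUL labels count more. The sketch below implements that reading; the exact form is an assumption, the function name is hypothetical, and continuous RUL labels would in practice need binning before frequencies are counted.

```python
from collections import Counter

def adaptive_weighted_mse(y_true, y_pred, beta=1.0):
    """Frequency-weighted mean squared error (assumed form of the loss)."""
    freq = Counter(y_true)                 # occurrences of each label value
    max_freq = max(freq.values())
    weights = [(max_freq / freq[y]) ** beta for y in y_true]
    squared = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(w * e for w, e in zip(weights, squared)) / len(y_true)

# Label 1 occurs three times and label 0 once, so the error on the
# rare label is weighted three times as heavily when beta = 1.
y_true = [1, 1, 1, 0]
y_pred = [0, 0, 0, 1]
loss_weighted = adaptive_weighted_mse(y_true, y_pred, beta=1.0)
loss_plain = adaptive_weighted_mse(y_true, y_pred, beta=0.0)
```

Setting `beta` to 0 removes the weighting and recovers the ordinary mean squared error.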

Step 6: generate a large number of meta-tasks T_j from the source domain data and take the meta-task set as input; the meta-learner is trained with the Reptile meta-learning framework based on the adaptive weighted loss function constructed in step 5. First, the parameters θ of the model encoder are randomly initialized. Then θ is updated in batches: m tasks are selected for each update, and for each task T_j the global parameters θ are copied to temporary parameters θ′_j. Multi-step gradient descent is performed on θ′_j for task T_j, with the update formula:

$\theta'_j \leftarrow \theta'_j - \alpha \nabla_{\theta'_j} L_{T_j}(f_{\theta'_j})$

where α is the inner-loop learning rate, L_{T_j} is the loss of step 5 evaluated on the samples (x_i, y_i) of task T_j, and f_{θ′_j}(x_i) is the RUL prediction of the meta-learner for sample x_i under parameters θ′_j. The global parameters θ are then updated using the knowledge learned from the tasks T_j, with the update formula:

$\theta \leftarrow \theta + \eta \frac{1}{m}\sum_{j=1}^{m}(\theta'_j - \theta)$

where η is the outer-loop learning rate.
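The two-level update can be illustrated on a toy problem. The sketch below runs Reptile on one-dimensional linear regression tasks y = s·x with different slopes s. It is a simplification under stated assumptions: plain mean squared error replaces the adaptive weighted loss, the model is a two-parameter line rather than a Conformer encoder, and all names and hyperparameter values are hypothetical.

```python
import random

def grad_step(theta, batch, lr):
    """One gradient-descent step on the MSE of y = w * x + b."""
    w, b = theta
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return (w - lr * gw, b - lr * gb)

def reptile(tasks, theta, inner_lr, outer_lr, inner_steps, meta_iters, m, rng):
    for _ in range(meta_iters):
        adapted = []
        for batch in rng.sample(tasks, m):       # pick m tasks per update
            theta_j = theta                      # copy global -> temporary
            for _ in range(inner_steps):         # inner loop
                theta_j = grad_step(theta_j, batch, inner_lr)
            adapted.append(theta_j)
        # Outer loop: move theta toward the average adapted parameters.
        theta = tuple(t + outer_lr * (sum(a[i] for a in adapted) / m - t)
                      for i, t in enumerate(theta))
    return theta

def mean_loss(theta, tasks):
    """Average MSE of the line (w, b) over all tasks."""
    w, b = theta
    return sum(sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)
               for batch in tasks) / len(tasks)

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
tasks = [[(x, slope * x) for x in xs] for slope in (1.5, 2.0, 2.5)]

theta0 = (0.0, 0.0)
theta = reptile(tasks, theta0, inner_lr=0.1, outer_lr=0.1,
                inner_steps=5, meta_iters=200, m=2, rng=random.Random(0))
initial_loss = mean_loss(theta0, tasks)
final_loss = mean_loss(theta, tasks)
```

The learned θ is an initialization that sits close to all tasks at once, so a few gradient steps on a new task are enough to adapt it.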

Step 7: after model training is completed, a very small number of samples are provided from the target domain to create a new regression task T_new. A gradient update is then performed on the meta-learner using T_new, giving a predictor suited to the new working condition. The model thus adapts quickly to the new domain and predicts the remaining useful life of the target domain samples effectively.

In a second aspect, the present invention provides a life prediction system for a rotary machine under a limited sample, for performing the life prediction method of the first aspect, comprising:

a data acquisition unit, comprising a vibration sensor and other monitoring equipment, for collecting vibration signals and operating data of the rotating machinery under different working conditions;

a data processing unit, comprising at least one processor and associated memory, the processor being configured with an instruction set for performing feature extraction, normalization, sliding-window processing, and model training and prediction;

a storage unit for storing the collected raw data, the processed data set, the model parameters, the learning rates, the meta-task information and other necessary configuration information;

a model training unit for implementing the network structure of the Conformer-based encoder, including the multi-head self-attention mechanism, the feed-forward module and the convolution module;

a model optimization unit that applies the Reptile-based meta-learning framework to optimize the initialization of the global parameters so that the model adapts quickly to new tasks;

a user interface that allows a user to input equipment working condition information, adjust training parameters, start the training process and evaluate model performance;

an output unit for displaying the predicted remaining useful life of the equipment and related analysis reports, and for visualizing model performance.

In a third aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the lifetime prediction method of the first aspect; the computer readable storage medium includes, but is not limited to, a hard disk drive, a solid state drive, an optical disk, a USB flash drive, a network storage, or any other form of media capable of being read by a computer system; wherein the computer instructions include a network communication component to support remote data transmission, cloud computing resource access, and distributed data processing.

The invention has the beneficial effects that:

Compared with the prior art: 1. a multi-dimensional feature extraction strategy spanning the time domain and the frequency domain improves the expressive power of the data, so that the model captures the running state of the equipment more accurately; 2. the sliding window technique extends the time dimension of the samples, which strengthens the model's handling of time-series data; 3. the Conformer encoder, which combines a self-attention mechanism with a convolutional neural network, improves the model's ability to capture temporal and frequency-domain characteristics and thereby achieves higher accuracy when predicting the remaining useful life of the equipment; 4. the adaptive weighted loss function reduces the prediction bias caused by sample imbalance, particularly when the data samples under different working conditions differ greatly; 5. with the Reptile meta-learning framework, the model adapts quickly to new tasks in practical application environments where samples are scarce.

Drawings

FIG. 1 is a flow chart of the present invention.

Fig. 2 shows the overall structure of the meta-learner.

Fig. 3 shows the structure of the feed-forward module.

Fig. 4 shows the structure of the convolution module.

Fig. 5 shows the prediction result of the embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the present invention.

The flow of the invention is shown in FIG. 1. The data set D is divided into a training set D_train (source domain) and a test set D_test (target domain), and a large number of meta-tasks T_i are generated from the training set D_train, where T_i＝{(x₁,y₁),(x₂,y₂),…,(x_N′,y_N′)}; that is, only a small number of samples in each meta-task are used for model training. A higher-order learning algorithm uses this series of predefined meta-tasks to learn how to generate the model best suited to a new task T_new. After training is completed, the input is a new task T_new represented by the test set D_test, and the output is a particular model F; formally, F is the result of applying the higher-order learning algorithm to T_new. Once the model F is obtained, it can be applied to the data set D_test of the new task to predict the remaining useful life (RUL), i.e., RUL = F(D_test). In this process, the higher-order learning algorithm does not directly learn the mapping from input data to output predictions; instead, it uses a loss function L to learn how to generate a model F that realizes such a mapping, with very few samples in each meta-task for the model to adapt quickly.

To describe the flow more clearly, it is explained in more detail below with reference to the accompanying drawings and a specific embodiment. In this embodiment, full-life-cycle signals of a slurry pump collected at an industrial site are used as training and test data. The slurry pump has two operating states: in operating state 1 the pump runs at lower power (137 kW) and rotation speed (800 rpm); in operating state 2 the power and rotation speed are increased to 178 kW and 980 rpm, respectively. The pump head is 30 m, the inlet and outlet diameter is 150 mm, and the total operating time of the pump is 1200 hours. The vibration sensor is mounted at the non-drive end of the pump; its sampling frequency is 12800 Hz, a signal is collected every half hour, and each acquisition lasts 2.56 s.
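The acquisition settings imply the size of the raw data. The short calculation below derives the number of samples per acquisition and an upper bound on the number of acquisitions, assuming the half-hourly schedule runs without gaps over the full 1200 hours; the variable names are illustrative.

```python
sampling_rate_hz = 12_800
record_seconds = 2.56
interval_hours = 0.5
total_hours = 1_200

# Samples in one 2.56 s acquisition: 12800 * 2.56 = 32768 = 2**15.
samples_per_record = round(sampling_rate_hz * record_seconds)

# Acquisitions over the pump's life if no acquisition is skipped.
records_over_life = round(total_hours / interval_hours)
```

Each acquisition is reduced to one feature vector, so the full-life-cycle feature sequence has at most a few thousand time steps per operating state.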

A method for life prediction for a rotating machine under a limited sample, the overall steps comprising:

S1, collect vibration signals x = [x₁, x₂, …, x_N] of the slurry pump in operating state 1 with the vibration sensor as source domain data, and collect vibration signals in operating state 2 with the vibration sensor as target domain data.

S2, performing time domain and frequency domain analysis on the collected vibration signals to extract a plurality of key features.

S3, calculate the Pearson correlation coefficient between each extracted feature and the running time of the slurry pump:

$r_{Xt} = \frac{\sum_i (X_i - \bar{X})(t_i - \bar{t})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (t_i - \bar{t})^2}}$

The 15 features with the highest r_Xt are selected and standardized, i.e. $X_i' = \frac{X_i - \bar{X}}{\sigma}$, where X_i is the original feature value and $\bar{X}$ and σ are the mean and standard deviation of the feature, respectively. A sliding window is then applied to extend the sample length: SW_k ＝ [X_i, X_(i+1), …, X_(i+W-1)], where i = k × S, k is the index of the window, W is the window width and S is the step size, such that every window lies entirely within the sequence. The window width is set to 30 and the step size to 1; after processing, each sample has shape (15, 30), and the samples together form the feature set.

S4, build a meta-learner as the training model. The overall structure of the meta-learner is shown in Fig. 2: the encoder is built on the Conformer architecture and further extracts features from the feature set; it consists, in order, of a 1D convolution and several Conformer blocks. The features extracted by the encoder are flattened (Flatten) and passed to a single linear-layer regressor that outputs the remaining useful life.

S5, construct the adaptive weighted loss function of the model, as follows:

$L = \frac{1}{N'}\sum_{i=1}^{N'}\left(\frac{\max(\mathrm{freq}(y))}{\mathrm{freq}(y_i)}\right)^{\beta'}(\hat{y}_i - y_i)^2$

S6, generate a large number of meta-tasks from the source domain data and take the meta-task set as input; the meta-learner is trained with the Reptile meta-learning framework based on the adaptive weighted loss function.

S7, after training is finished, 10 target domain samples, labeled with their true RUL (a continuous value in units of time), form a new regression task T_new, and a gradient update is performed on the meta-learner with the samples in T_new to obtain a life prediction model applicable to the other working condition. In this embodiment, the number of samples in each meta-task is set to 16 and the number of target domain samples is set to 10. That is, during training the model learns how to predict the remaining useful life of the equipment accurately from only 16 samples; after training, only 10 labeled target domain samples are needed to adapt to the whole target domain, and the model can then be used directly for life prediction under the new working condition.

Some of the steps are described in detail below:

in step S2, the extracted features include:

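Mean (mean): $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$;

Standard deviation (std): $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$;

Root mean square value (rms): $x_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$;

Maximum value (max): $\max(x_i)$;

Minimum value (min): $\min(x_i)$;

Range (peak to peak): $\max(x_i) - \min(x_i)$;

Kurtosis (kurt): $\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma}\right)^4$;

Skewness (skewness): $\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma}\right)^3$;

Crest factor (crest factor): $\frac{\max|x_i|}{x_{\mathrm{rms}}}$;

Shape factor (shape factor): $\frac{x_{\mathrm{rms}}}{\overline{|x|}}$, where the mean absolute value is $\overline{|x|} = \frac{1}{N}\sum_{i=1}^{N}|x_i|$;

Clearance factor (clearance factor): $\frac{\max|x_i|}{\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^2}$;

Impulse factor (impulse factor): $\frac{\max|x_i|}{\overline{|x|}}$;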

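Peak frequency (peak frequency): $\mathrm{freqs}[\arg\max_f \mathrm{PSD}(f)]$, where PSD(f) is the power spectral density of the signal and f is the frequency;

Total power (total power): $\sum_f \mathrm{PSD}(f)$;

Spectral centroid (spectral centroid): $\frac{\sum_f f \cdot \mathrm{PSD}(f)}{\sum_f \mathrm{PSD}(f)}$;

Spectral skewness (spectral skew): the skewness of the power spectral density values;

Energy of the IMF component (IMF_i): $E_i = \sum_t \mathrm{IMF}_i(t)^2$, where IMF_i(t) represents the i-th IMF component in the Empirical Mode Decomposition (EMD);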

In step S4, the specific operation of the Conformer-based encoder on the input data is: the input data X first goes through the 1D convolution layer:

X′＝Conv1D(X)

After feature extraction and dimension transformation by the 1D convolution layer, X′ enters the Conformer blocks for further feature extraction. The structure of a Conformer block is shown in FIG. 2; it consists, in order, of a feed-forward module (FFM), a multi-head self-attention module (MultiHead), a convolution module (CM), a second feed-forward module (FFM2) and layer normalization (Layer Norm). The structure of the feed-forward module is shown in Fig. 3; within it, the data X′ is processed as follows:

First, layer normalization is applied to X′:

X_LN＝LayerNorm(X′)

Normalized data X _LN is transformed by the linear layer:

X_Linear＝W_LinearX_LN+b_Linear

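The Swish activation function is applied to the linearly transformed data X_Linear:

X_Swish＝X_Linear·sigmoid(X_Linear)

The Swish-activated data X_Swish passes through a Dropout layer to reduce overfitting:

X_Dropout＝Dropout(X_Swish)

After Dropout, X_Dropout passes through a second linear layer:

X_Linear2＝W_Linear2X_Dropout+b_Linear2

The data X_Linear2 after the second linear transformation passes through a Dropout layer again:

X_Dropout2＝Dropout(X_Linear2)

X′ is added to the Dropout output through a residual connection to give the final FFM output:

X_FFM＝X_Dropout2+X′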

Then, X _FFM is fed into the multi-headed self-attention module (MultiHead), which computes a query (Q), a key (K) and a value (V):

Q＝X_FFMW^Q,K＝X_FFMW^K,V＝X_FFMW^V

The single-head attention is computed as:

$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$

Q, K and V are obtained from several different sets of W^Q, W^K, W^V, giving several different groups of queries (Q), keys (K) and values (V); Attention(Q, K, V) is computed for each group, and each group corresponds to one "head". Each head focuses on a different subspace representation; the outputs of all heads are then concatenated and passed through another linear transformation to produce the final output representation:

MultiHead(Q,K,V)＝Concat(head₁,head₂,…,head_h)W^O

where head_i is the attention output of the i-th group of queries, keys and values, and h is the total number of heads. The result is combined with the input X_FFM through a residual connection:

X_MultiHead＝MultiHead(Q,K,V)+X_FFM

Then X_MultiHead enters the convolution module (CM), whose structure is shown in Fig. 4. It is first processed by layer normalization and by a pointwise convolution that adjusts the feature dimension:

X_LN＝LayerNorm(X_MultiHead)

X_PWConv＝Conv1D_pointwise(X_LN)

The data is then activated via GLU:

X_GLU＝X_PWConv⊙sigmoid(X_PWConv)

The GLU-activated data enters a 1D depthwise convolution:

X_DepthConv＝Conv1D_depthwise(X_GLU)

Batch normalization is applied to the depthwise-convolved data:

X_BN＝BatchNorm(X_DepthConv)

The data then passes through the Swish activation function:

X_Swish＝X_BN·sigmoid(X_BN)

The Swish-activated data passes through a second pointwise convolution:

X_PWConv2＝Conv1D_pointwise(X_Swish)

The data passes through a Dropout layer to reduce overfitting:

X_Dropout＝Dropout(X_PWConv2)

The Dropout output is added through a residual connection to X_MultiHead, the input of the convolution module, to give the final output:

X_CM＝X_Dropout+X_MultiHead

X_CM then passes through a second feed-forward module (FFM2) and one layer normalization, which completes feature extraction. A linear layer is used as the regressor to predict the remaining useful life from the extracted features.

In this step, the relevant parameters of the present embodiment are set as follows: in the self-attention mechanism, the number of heads is 4 and the dimension is 128; the overall dropout rate is 0.1; the number of Conformer blocks is 2; and a single linear layer is used as the regressor.

In step S5, β′ is the exponent parameter that controls the strength of the weight distribution; it is set to 1 in this embodiment.

In step S6, the Reptile meta-learning algorithm trains the meta-learner as follows:

S6.1, initialize the local parameters of the task: copy the global model parameters θ to the temporary parameters θ′_j, i.e. θ′_j ← θ;

S6.2, acquire task data: obtain a batch of data from meta-task T_j, comprising N′ samples {(x₁,y₁),(x₂,y₂),…,(x_N′,y_N′)};

S6.3, inner-loop optimization: in the inner loop, perform multi-step gradient descent on the temporary parameters θ′_j, with the objective of minimizing a weighted mean-square-error loss whose weights are determined by a function of class frequency, namely:

$\theta'_j \leftarrow \theta'_j - \alpha \nabla_{\theta'_j} \frac{1}{N'}\sum_{i=1}^{N'}\left(\frac{\max(\mathrm{freq}(y))}{\mathrm{freq}(y_i)}\right)^{\beta'}\left(f_{\theta'_j}(x_i) - y_i\right)^2$

where α is the inner-loop learning rate, set to 0.001 in this embodiment, and f_{θ′_j}(x_i) is the RUL prediction of the meta-learner for sample x_i under parameters θ′_j;

S6.4, update the global parameters: after the inner loop ends, the global parameters θ are updated with the local parameter updates of all meta-tasks, i.e. by the average difference between the local parameters of each task and the global parameters:

$\theta \leftarrow \theta + \eta \frac{1}{m}\sum_{j=1}^{m}(\theta'_j - \theta)$

where η is the outer-loop learning rate, which controls the step size of the global parameter update and is set to 0.1 in this embodiment.

The final result of this embodiment is shown in Fig. 5. In the later stage of the slurry pump's operating time, the prediction curve closely matches the true RUL curve, which shows that the model grasps task-related information and adapts to the new working condition. The root mean square error (RMSE) between the predicted values and the true RUL is 0.083, which indicates the accuracy and reliability of the life prediction method in an actual industrial scenario.

The present embodiment also provides a life prediction system for a rotary machine under a limited sample, for executing the above life prediction method, including:

The present embodiment also provides a computer-readable storage medium storing computer instructions for causing a computer to execute the lifetime prediction method of the first aspect; the computer readable storage medium includes, but is not limited to, a hard disk drive, a solid state drive, an optical disk, a USB flash drive, a network storage, or any other form of media capable of being read by a computer system; wherein the computer instructions include a network communication component to support remote data transmission, cloud computing resource access, and distributed data processing.

Although the invention has been described in detail with reference to preferred embodiments, the description is not intended to limit the scope of the invention. Those skilled in the art with access to the present teachings may effect numerous modifications to, additions to, or substitutions for, the disclosed embodiments without departing from the spirit and scope of the invention. Therefore, any simplified or equivalent changes and modifications are considered to be covered by the technical solution of the present invention, provided that they still meet the spirit and scope of the present invention as described in the present specification.

Claims

1. A life prediction method for a rotating machine under a limited sample, the life prediction method comprising the steps of:

Step 1: respectively acquiring equipment vibration signals under a certain working condition and other working conditions by using a vibration sensor, and respectively serving as source domain data and target domain data;

Step 2: performing time-domain and frequency-domain analysis on the vibration signals $X = [x_1, x_2, \ldots, x_N]$ of the collected source domain data and target domain data to extract key features;

Step 3: calculating the Pearson correlation coefficient between each extracted source domain feature and the mechanical running time, selecting and standardizing the several features with the highest Pearson correlation coefficients, and forming a feature set after expanding the sample length by using a sliding window technique;

Step 4: constructing a meta learner as a training model by using an encoder based on Conformer architecture, performing further feature extraction based on a source domain feature set, fusing a self-attention mechanism and a convolutional neural network, and outputting a predicted value of the residual life;

Step 5: constructing an adaptive weighted loss function to address the label imbalance among the samples of the equipment's full life cycle;

Step 6: generating meta-tasks from the source domain data as input, and training the meta learner by using the Reptile meta-learning framework based on the adaptive weighted loss function constructed in step 5;

Step 7: after training is completed, a few samples are provided on the target domain to form a regression task, gradient updating is carried out on the meta learner, a predictor suitable for new working conditions is obtained, and the residual service life of the samples of the target domain is predicted.

2. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 1, wherein in said step 2, the key features include:

Mean (mean):

Standard deviation (std):

Root mean square value (rms):

Maximum value (max): max(x_i);

Minimum value (min): min(x_i);

Range (peak to peak): max(x_i) − min(x_i);

Kurtosis:

Skewness:

Crest factor:

Shape factor:

Clearance factor:

Impulse factor:

Peak frequency: freqs[argmax(PSD(f))], where f is the frequency;

Total power: Σ PSD(f);

Spectral centroid:

Spectral skewness:
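
The features listed above can be sketched as follows. Several formulas are not reproduced in the text, so this sketch falls back on commonly used textbook definitions, which are assumptions and may differ from those of the invention: population (1/N) statistics, a non-excess kurtosis, and a PSD taken as the squared DFT magnitude divided by N. The function names `time_features` and `spectral_features` and the sampling rate `fs` are hypothetical, and spectral skewness is omitted.

```python
import cmath
import math


def time_features(x):
    """Time-domain features under conventional (assumed) definitions."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n      # population variance (assumption)
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    mean_sqrt_abs = sum(math.sqrt(abs(v)) for v in x) / n
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "max": max(x),
        "min": min(x),
        "peak_to_peak": max(x) - min(x),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * var ** 2),
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "crest_factor": peak / rms,
        "shape_factor": rms / mean_abs,
        "clearance_factor": peak / mean_sqrt_abs ** 2,
        "impulse_factor": peak / mean_abs,
    }


def spectral_features(x, fs):
    """Frequency-domain features from a naive one-sided DFT (O(N^2), sketch only)."""
    n = len(x)
    half = n // 2 + 1
    freqs = [k * fs / n for k in range(half)]
    psd = []
    for k in range(half):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(coeff) ** 2 / n)            # assumed PSD estimate
    total = sum(psd)
    peak_index = max(range(half), key=psd.__getitem__)
    return {
        "peak_frequency": freqs[peak_index],
        "total_power": total,
        "spectral_centroid": sum(f * p for f, p in zip(freqs, psd)) / total,
    }


# Hypothetical test signal: a 5 Hz sine sampled at 64 Hz for one second.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(time_features(signal)["crest_factor"], spectral_features(signal, 64)["peak_frequency"])
```

A constant signal has zero variance and would cause a division by zero in this sketch; a practical implementation would guard against that case.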

3. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 1, wherein said step 3 specifically comprises: calculating the Pearson correlation coefficient between each extracted source domain feature and the machine running time:

$$r_{Xt} = \frac{\sum_i (X_i - \bar{X})(t_i - \bar{t})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (t_i - \bar{t})^2}}$$

where $X_i$ is the i-th element of the original feature value sequence, $\bar{X}$ is the mean of the original feature sequence, $t_i$ is the i-th element of the time series, and $\bar{t}$ is the mean of the time series; selecting the 15 features with the highest $r_{Xt}$ and standardizing them, i.e., replacing each $X_i$ with $(X_i - \bar{X})/\sigma$, where σ is the standard deviation of the feature sequence; extending the sample length using a sliding window technique: $SW_k = [X_i, X_{i+1}, \ldots, X_{i+W-1}]$, where $i = k \times S$, k is the index of the window, W is the window width, and S is the step size, with k = 0, 1, 2, … up to the last window that fits within the sequence; and collecting the expanded features to form the feature set.
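
The selection, standardization, and windowing of step 3 can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function names, the dictionary layout of `features`, and the use of the population standard deviation are assumptions.

```python
import math


def pearson(x, t):
    """Pearson correlation coefficient between a feature sequence and time."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    cov = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    st = math.sqrt(sum((b - mt) ** 2 for b in t))
    return cov / (sx * st)


def standardize(x):
    """Z-score standardization: subtract the mean and divide by sigma."""
    n = len(x)
    m = sum(x) / n
    sigma = math.sqrt(sum((a - m) ** 2 for a in x) / n)  # population sigma (assumption)
    return [(a - m) / sigma for a in x]


def sliding_windows(seq, width, step):
    """Windows SW_k = seq[k*S : k*S + W] for every k whose window fits in seq."""
    last_k = (len(seq) - width) // step
    return [seq[k * step:k * step + width] for k in range(last_k + 1)]


def build_feature_set(features, t, top_k, width, step):
    """Rank features by correlation with time, keep the top_k, then window them."""
    ranked = sorted(features, key=lambda name: pearson(features[name], t), reverse=True)
    selected = ranked[:top_k]
    # One row per time step, one column per selected feature.
    rows = list(zip(*(standardize(features[name]) for name in selected)))
    return selected, sliding_windows(rows, width, step)


# Hypothetical feature sequences over ten time steps.
time_axis = [float(i) for i in range(10)]
demo = {
    "rms": [2.0 * i + 1.0 for i in range(10)],
    "noise": [1.0 if i % 2 == 0 else -1.0 for i in range(10)],
    "inverse": [10.0 - i for i in range(10)],
}
names, windows = build_feature_set(demo, time_axis, top_k=2, width=4, step=2)
print(names, len(windows))
```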

4. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 1, wherein said step 4 specifically comprises: building a meta learner as the training model, wherein an encoder is built on the Conformer architecture to further extract features from the feature set, the encoder comprising, in sequence, a 1D convolution and a plurality of Conformer blocks; the input data X first passes through the 1D convolution layer, which converts it into a higher-level feature representation and adjusts its dimensions to suit the Conformer blocks:

X′＝Conv1D(X)

X', obtained after feature extraction and dimension transformation by the 1D convolution layer, enters the Conformer blocks for further feature extraction; the extracted features are passed as input to a linear regression layer, which maps them to the actual predicted value, namely the predicted value of the residual life.
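
The data flow of claim 4 (1D convolution, Conformer blocks, linear regression layer) can be outlined as below. This is a sketch under stated assumptions: the Conformer blocks are left out, the convolution uses no padding, and the regression head averages over time before the linear layer, a pooling choice that the text does not specify. The function names `conv1d` and `regression_head` are hypothetical.

```python
def conv1d(x, kernels, bias):
    """1D convolution over time without padding.

    x: list of T time steps, each a list of C_in values.
    kernels: one [K][C_in] filter per output channel.
    """
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        frame = []
        for kern, b in zip(kernels, bias):
            s = b
            for dk in range(k):
                for c, w in enumerate(kern[dk]):
                    s += w * x[start + dk][c]
            frame.append(s)
        out.append(frame)
    return out


def regression_head(features, weights, bias):
    """Mean-pool over time (assumption), then map to a scalar RUL estimate."""
    steps = len(features)
    pooled = [sum(f[c] for f in features) / steps for c in range(len(features[0]))]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


# Hypothetical input: 3 time steps, 2 features; one output channel, kernel size 2.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x_prime = conv1d(x, kernels=[[[1.0, 0.0], [0.0, 1.0]]], bias=[0.0])
# The Conformer blocks would transform x_prime here; they are skipped in this sketch.
print(regression_head(x_prime, weights=[2.0], bias=1.0))
```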

5. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 4, wherein each Conformer block comprises, in sequence, a first forward propagation (feed-forward) module, a multi-head self-attention module, a convolution module, a second forward propagation module, and layer normalization; in the forward propagation module, the data X' is processed as follows:

first, the data X' is subjected to layer normalization:

$$X_{LN} = \gamma \odot \frac{X' - E[X']}{\sqrt{\mathrm{Var}[X'] + \epsilon}} + \beta$$

where E[X'] represents the mean of X', Var[X'] represents the variance of X', ε is a small positive number that avoids division by zero, γ is the scaling parameter that readjusts the scale of the normalized data, β is the offset parameter that readjusts the center of the normalized data, and ⊙ represents element-wise multiplication;

The normalized data X_LN is transformed by a linear layer:

X_Linear＝W_Linear X_LN+b_Linear

The Swish activation function is applied to the linearly transformed data X_Linear:

X_Swish＝X_Linear·sigmoid(X_Linear)

where sigmoid(·) represents the sigmoid function;

The Swish-activated data X_Swish passes through a Dropout layer to reduce overfitting:

X_Dropout＝Dropout(X_Swish)

The data X_Dropout then passes through a second linear layer, which has its own weight matrix and bias term;

the data after the second linear transformation again passes through a Dropout layer;

X' is added to the Dropout result through a residual connection to obtain the final output X_FFM of the forward propagation module;
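
The sequence of operations in the forward propagation module can be sketched as follows. The sketch assumes that layer normalization acts on the feature dimension of each time step and that the residual adds X' directly to the result, as the text describes; the parameter dictionary and all function names are hypothetical.

```python
import math
import random


def layer_norm(row, gamma, beta, eps=1e-5):
    """Normalize one time step over its features, then scale and shift."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(row, gamma, beta)]


def linear(row, weight, bias):
    """Affine map; weight is stored as [out][in]."""
    return [sum(w * v for w, v in zip(w_row, row)) + b for w_row, b in zip(weight, bias)]


def swish(row):
    """Swish activation: v * sigmoid(v)."""
    return [v / (1.0 + math.exp(-v)) for v in row]


def dropout(row, p, training, rng):
    """Inverted dropout; the identity when not training."""
    if not training or p == 0.0:
        return list(row)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in row]


def feed_forward_module(x, params, p=0.1, training=False, rng=random):
    """LayerNorm -> Linear -> Swish -> Dropout -> Linear -> Dropout -> residual."""
    out = []
    for row in x:
        h = layer_norm(row, params["gamma"], params["beta"])
        h = linear(h, params["w1"], params["b1"])
        h = swish(h)
        h = dropout(h, p, training, rng)
        h = linear(h, params["w2"], params["b2"])
        h = dropout(h, p, training, rng)
        out.append([a + b for a, b in zip(row, h)])  # residual connection with X'
    return out
```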

then, X_FFM is sent to the multi-head self-attention module, and the query Q, key K, and value V are calculated:

Q＝X_FFM W^Q, K＝X_FFM W^K, V＝X_FFM W^V

Calculating a single head self-attention weight:

Q, K, and V are transformed through multiple sets of different W^Q, W^K, W^V to obtain multiple sets of different queries Q, keys K, and values V, and the corresponding Attention(Q, K, V) is calculated for each set; each Attention(Q, K, V) is equivalent to one "head"; the outputs of all the heads are then concatenated, and the final output representation is generated through another linear transformation:

MultiHead(Q,K,V)＝Concat(head₁,head₂,…,head_h)W^O

where Concat(·) represents splicing the outputs of all the heads together along a specific dimension, W^O is the weight matrix of the final linear transformation, and h is the total number of heads; head_i＝Attention(QW_i^Q, KW_i^K, VW_i^V);

X_MultiHead＝MultiHead(Q,K,V)+X_FFM
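
The multi-head self-attention step with its residual connection can be sketched as follows. The single-head formula is not reproduced in the text, so the sketch assumes the widely used scaled dot-product form softmax(QK^T/√d_k)V, with d_k the key dimension. As a simplification, each head applies a single projection directly to X_FFM. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention (assumed form of the single-head formula)."""
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head_self_attention(x, heads, w_o):
    """heads: list of (W_q, W_k, W_v), each [d_model][d_head]; w_o: [h*d_head][d_model]."""
    head_outputs = [
        attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)) for wq, wk, wv in heads
    ]
    # Concatenate the head outputs along the feature dimension at every time step.
    concat = [sum((h[t] for h in head_outputs), []) for t in range(len(x))]
    projected = matmul(concat, w_o)
    # Residual connection: X_MultiHead = MultiHead(Q, K, V) + X_FFM.
    return [[a + b for a, b in zip(xr, pr)] for xr, pr in zip(x, projected)]
```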

then, X_MultiHead enters the convolution module, where it is first processed by layer normalization and then by a pointwise convolution Conv1D_pointwise, which yields X_PWConv;

the data is then activated via GLU:

X_GLU＝X_PWConv⊙sigmoid(X_PWConv)

the GLU-activated data enters a 1D depthwise convolution:

X_DepthConv＝Conv1D_depthwise(X_GLU)

the depthwise-convolved data is batch normalized:

X_BN＝BatchNorm(X_DepthConv)

the data then passes through the Swish activation function;

the Swish-activated data again passes through a pointwise convolution;

the data passes through a Dropout layer to reduce overfitting, and the output of the convolution module is denoted X_CM;

thereafter, X_CM passes through the second forward propagation module and one layer normalization to complete the feature extraction.
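
The convolution module can be sketched as follows. The sketch makes several assumptions: the gating follows the formula in the claim (X_PWConv ⊙ sigmoid(X_PWConv)), the depthwise convolution is zero-padded so that the sequence length is preserved, batch normalization uses per-channel statistics of a single sequence, learnable scale and shift parameters are omitted, and Dropout acts as the identity at inference time. All function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_norm(row, eps=1e-5):
    """Layer normalization over the features of one time step (no scale/shift)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def pointwise_conv(x, weight):
    """Kernel-size-1 convolution: the same [out][in] matrix at every time step."""
    return [[sum(w * v for w, v in zip(w_row, row)) for w_row in weight] for row in x]


def depthwise_conv(x, kernels):
    """One odd-length filter per channel, zero-padded to keep the length."""
    steps, pad = len(x), len(kernels[0]) // 2
    out = []
    for t in range(steps):
        row = []
        for c, kern in enumerate(kernels):
            s = 0.0
            for j, w in enumerate(kern):
                idx = t + j - pad
                if 0 <= idx < steps:
                    s += w * x[idx][c]
            row.append(s)
        out.append(row)
    return out


def batch_norm(x, eps=1e-5):
    """Per-channel normalization using statistics over time (single sequence)."""
    stats = []
    for col in zip(*x):
        m = sum(col) / len(col)
        stats.append((m, sum((a - m) ** 2 for a in col) / len(col)))
    return [[(a - m) / math.sqrt(v + eps) for a, (m, v) in zip(row, stats)] for row in x]


def convolution_module(x, w_in, dw_kernels, w_out):
    h = [layer_norm(row) for row in x]
    h = pointwise_conv(h, w_in)
    h = [[v * sigmoid(v) for v in row] for row in h]  # gating as written in the claim
    h = depthwise_conv(h, dw_kernels)
    h = batch_norm(h)
    h = [[v * sigmoid(v) for v in row] for row in h]  # Swish
    h = pointwise_conv(h, w_out)
    return h  # Dropout is the identity at inference time
```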

6. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 1, wherein the adaptive weighted loss function in step 5 is:

where N' represents the total number of samples, β' is an exponential scaling parameter that controls the intensity of the weight distribution, $\hat{y}_i$ represents the RUL value predicted by the model, y_i represents the true RUL value of the sample, max(freq(y)) represents the highest frequency with which any true RUL value occurs in the samples, and freq(y_i) represents the frequency with which samples whose RUL value equals y_i occur in the source domain samples.
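
The loss formula itself is not reproduced in the text. The following sketch shows one plausible form built only from the quantities defined above; it is an assumption, not the formula of the invention. Each squared error is weighted by (max(freq(y)) / freq(y_i)) raised to the power β', so that rarely occurring RUL labels contribute more. Frequencies are counted over exact label values, which is a further assumption.

```python
from collections import Counter


def adaptive_weighted_loss(y_pred, y_true, beta=0.5):
    """ASSUMED form: mean of (max_freq / freq(y_i))**beta * (y_hat_i - y_i)**2."""
    freq = Counter(y_true)            # freq(y_i): occurrences of each true label
    max_freq = max(freq.values())     # max(freq(y))
    total = 0.0
    for p, y in zip(y_pred, y_true):
        weight = (max_freq / freq[y]) ** beta   # assumed weighting rule
        total += weight * (p - y) ** 2          # assumed squared-error term
    return total / len(y_true)


# Hypothetical labels: three samples share the label 1.0, one sample has 0.5.
y_true = [1.0, 1.0, 1.0, 0.5]
y_pred = [0.9, 0.9, 0.9, 0.4]
print(adaptive_weighted_loss(y_pred, y_true, beta=1.0))
```

With β' = 0 every weight equals 1 and the sketch reduces to the ordinary mean squared error, which illustrates how β' controls the intensity of the reweighting.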

7. The method for predicting the lifetime of a rotary machine under a limited sample according to claim 1, wherein said step 6 specifically comprises: generating a large number of meta-tasks from the source domain data; taking the set of meta-tasks as input, the meta learner is trained using the Reptile meta-learning framework based on the adaptive weighted loss function constructed in step 5: first, the parameter θ of the model encoder is randomly initialized; then, the parameter θ is updated in batches, with m tasks selected for each update; for each task T_j, the global parameter θ is copied to a temporary parameter θ'_j, and multi-step gradient descent is performed on θ'_j on task T_j, with the update formula:

where α is the inner-loop learning rate, and the model output appearing in the formula is the RUL prediction for sample x_i made by the meta learner configured with parameter θ'_j; the global parameter θ is then updated using the knowledge learned from the tasks T_j, with the update formula:

where η is the outer-loop learning rate.
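
The training procedure of claim 7 can be sketched on a toy problem as follows. The two update formulas are not reproduced in the text, so the sketch assumes the usual Reptile form: the inner loop applies θ'_j ← θ'_j − α∇L, and the outer loop moves θ toward the average of the adapted parameters, θ ← θ + η·mean_j(θ'_j − θ). For brevity the model is a scalar linear regressor and the loss is a plain mean squared error rather than the adaptive weighted loss; all names and hyperparameter values are hypothetical.

```python
import random


def predict(theta, x):
    """Toy stand-in for the meta learner: y = a * x + b."""
    return theta[0] * x + theta[1]


def mse_loss(theta, task):
    return sum((predict(theta, x) - y) ** 2 for x, y in task) / len(task)


def mse_grad(theta, task):
    ga = gb = 0.0
    for x, y in task:
        err = predict(theta, x) - y
        ga += 2.0 * err * x / len(task)
        gb += 2.0 * err / len(task)
    return [ga, gb]


def inner_loop(theta, task, alpha, steps):
    """Copy theta to theta'_j and run several gradient-descent steps on one task."""
    theta_j = list(theta)
    for _ in range(steps):
        grad = mse_grad(theta_j, task)
        theta_j = [p - alpha * g for p, g in zip(theta_j, grad)]
    return theta_j


def reptile(tasks, alpha=0.05, eta=0.5, steps=5, m=4, iterations=200, seed=0):
    rng = random.Random(seed)
    theta = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]  # random initialization
    for _ in range(iterations):
        batch = rng.sample(tasks, m)                           # m tasks per update
        adapted = [inner_loop(theta, task, alpha, steps) for task in batch]
        # Assumed outer update: step toward the mean of the adapted parameters.
        theta = [p + eta * sum(a[i] - p for a in adapted) / m for i, p in enumerate(theta)]
    return theta


def make_task(slope, rng, n=10):
    """Hypothetical regression task: noiseless samples of y = slope * x."""
    return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


rng = random.Random(1)
source_tasks = [make_task(rng.uniform(1.5, 2.5), rng) for _ in range(8)]
theta = reptile(source_tasks)

# Few-shot adaptation on a new task, mirroring step 7 of claim 1.
target_task = make_task(2.2, rng, n=5)
adapted = inner_loop(theta, target_task, alpha=0.05, steps=20)
print(mse_loss(theta, target_task), mse_loss(adapted, target_task))
```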

8. A life prediction system for a rotating machine under a limited sample for performing the life prediction method of any one of claims 1 to 7, said life prediction system comprising:

9. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the lifetime prediction method of any one of claims 1 to 7.

10. A computer readable storage medium according to claim 9, wherein the computer readable storage medium includes, but is not limited to, a hard disk drive, a solid state drive, an optical disk, a USB flash drive, a network storage, or any other form of media, readable by a computer system.