CN113408076A

CN113408076A - Small sample mechanical residual life prediction method based on support vector machine model

Info

Publication number: CN113408076A
Application number: CN202110783436.2A
Authority: CN
Inventors: 黄贤振; 王树凤; 李禹雄; 丁鹏飞
Original assignee: Yangzhou Super Machine Tool Co ltd
Current assignee: Yangzhou Super Machine Tool Co ltd
Priority date: 2021-07-12
Filing date: 2021-07-12
Publication date: 2021-09-17

Abstract

The invention relates to a method for predicting the residual life of machinery from small samples based on support vector machine models, which comprises the following steps: selecting a plurality of measurement variables with obvious trends from the operation signal data of the mechanical product as characteristic quantities describing the mechanical degradation process, and obtaining the characteristic values of the characteristic quantities; preprocessing the characteristic values by smoothing the characteristic quantities of each mechanical degradation process with a Butterworth filter algorithm and mapping the characteristic values into the same range by a normalization method; converting the mechanical residual life at each time point in the existing data into the proportion of the residual life to the total life, so that the residual life is normalized to the range [0, 1]; dividing the data into a plurality of combinations of training sets and verification sets, each combination corresponding to one support vector machine model; training each support vector machine model; and using the support vector machine models to predict the residual life online. The method makes full use of limited mechanical operation data and can accurately predict the residual service life of the machine.

Description

Small sample mechanical residual life prediction method based on support vector machine model

Technical Field

The invention relates to a method for predicting the residual life of a small sample machine based on a support vector machine model, and belongs to the field of residual life prediction of mechanical products.

Background

In the prior art, the residual life of a machine is defined as the length of time from the current point in time until the machine fails completely. Residual life prediction is an important part of prognostics and health management (PHM). Accurate residual life prediction can provide effective suggestions and schemes for the maintenance and replacement of machinery, thereby greatly improving the reliability and stability of the system.

Data-driven methods for predicting the residual life of machinery are the mainstream of current research. A data-driven method uses a machine learning algorithm to establish a mapping between the monitoring data and the residual life of the machine. Common methods include artificial neural networks (ANN), neuro-fuzzy (NF) systems, and deep learning (DL), among others. However, current machine learning algorithms need a large number of high-quality sample points to train the model, and when the number of samples is small or the overall quality of the samples is poor, the error of the final prediction is large. In engineering practice, it often takes days or even months for a machine to go from normal operation to complete failure, so collecting full-life machine operation signal data is often costly. In addition, environmental noise and machine manufacturing errors can strongly interfere with the observed signals, reducing the quality of the data. The support vector machine (SVM) is a machine learning algorithm that is applicable under small-sample conditions and has seen some application in predicting the residual life of machinery.

The disadvantage is that, owing to improper selection of model parameters and large differences between the signals of different components, SVM-based prediction methods are prone to over-fitting and under-fitting, which reduces prediction accuracy.

Disclosure of Invention

The invention aims to overcome the problems in the prior art by providing a method for predicting the residual life of machinery from small samples based on support vector machine models. The method makes full use of limited machine operation data, establishes a plurality of SVM models, calculates the optimal model parameters with an optimization algorithm, and introduces an automatic weight-update algorithm to allocate the proportion of each model, thereby achieving accurate prediction of the residual life of the machine.

In order to solve the technical problems, the invention provides a method for predicting the residual life of a small sample machine based on a support vector machine model, which comprises the following steps:

step 1, analyzing operation signal data of a mechanical product, selecting a plurality of measurement variables with obvious trends as characteristic quantities for describing a mechanical degradation process, and obtaining characteristic values of the characteristic quantities;

step 2, preprocessing the selected characteristic values: smoothing the characteristic quantity of each mechanical degradation process by adopting a Butterworth filter algorithm to reduce random noise; the characteristic values of the characteristic quantities are mapped into the same range through a normalization method, so that the support vector machine model is conveniently trained;

step 3, converting the mechanical residual life at each time point in the existing data into the proportion of the residual life to the total life, so that the residual life is normalized to the range [0, 1];

step 4, dividing a plurality of combinations of training sets and verification sets according to the scale and the characteristics of the existing data, wherein each combination corresponds to a support vector machine model;

step 5, training each support vector machine model;

step 6, adopting each support vector machine model to predict the residual life online.

As a preferred embodiment of the present invention, the specific process of smoothing the characteristic quantity of each mechanical degradation process by using the Butterworth filter algorithm in step 2 is as follows: filtering the original characteristic quantity, wherein the formula is as follows:

$$|H(j\omega)|^2 = \frac{G_0^2}{1 + (\omega/\omega_c)^{2n}}$$

in the formula: $n$ represents the filtering order; $\omega_c$ represents the cut-off frequency; $G_0$ represents the gain at zero frequency. Components with frequencies above the cut-off frequency are filtered out, while components below the cut-off frequency are retained.
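As an illustration of how the filtering order and cut-off frequency shape the filter, the following minimal sketch evaluates the standard Butterworth low-pass magnitude response, assumed here to be the formula referred to above. The function name `butterworth_gain` and the example frequencies are hypothetical.

```python
import math


def butterworth_gain(omega: float, omega_c: float, n: int, g0: float = 1.0) -> float:
    """Magnitude response of an n-th order Butterworth low-pass filter.

    omega   -- frequency at which the gain is evaluated
    omega_c -- cut-off frequency
    n       -- filtering order
    g0      -- gain at zero frequency
    """
    return g0 / math.sqrt(1.0 + (omega / omega_c) ** (2 * n))


# Gain well below, at, and well above a cut-off of 1.0 for a 4th-order filter.
for omega in (0.1, 1.0, 10.0):
    print(omega, butterworth_gain(omega, omega_c=1.0, n=4))
```

A higher order n makes the transition around the cut-off frequency steeper, which is why the order and cut-off are tuned per characteristic quantity.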

As a further preferable aspect of the present invention, the formula according to which the normalization processing in step 3 is based is as follows:

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$

in the formula: $y$ is the result of the normalization; $x$ is the original data value; $y_{max}$ and $y_{min}$ are respectively the upper and lower boundaries of the range to be normalized to; $x_{max}$ and $x_{min}$ are the upper and lower bounds of the original data values.
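The normalization formula can be sketched in a few lines. The function name `min_max_normalize` is hypothetical, and the sketch assumes the bounds x_min and x_max are taken from the sequence being normalized.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Map values linearly from [x_min, x_max] onto [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant sequence")
    scale = (y_max - y_min) / (x_max - x_min)
    return [scale * (x - x_min) + y_min for x in values]


print(min_max_normalize([2.0, 4.0, 6.0]))             # maps onto [0, 1]
print(min_max_normalize([2.0, 4.0, 6.0], -1.0, 1.0))  # maps onto [-1, 1]
```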

As a further preferable embodiment of the present invention, step 5 specifically includes the following steps:

step 5-1, taking the preprocessed characteristic value as an input quantity theta of the model, and taking the proportion of the residual life to the total life as an output quantity R of the model;

step 5-2, setting a multi-index optimization target to enable the optimization direction to cover the prediction precision and the fitting speed of the support vector machine;

step 5-3, performing parameter optimization on each individual support vector machine by adopting an artificial bee colony optimization algorithm to obtain the model parameters with the optimal regression and prediction effects;

step 5-4, verifying the validity of each model using the test-set data.

As a further preferred embodiment of the present invention, the input quantity in step 5-1 is $\theta = [\theta^1, \theta^2, \ldots, \theta^N]$, where $N$ is the number of characteristic quantities and $\theta^j$ is the j-th characteristic quantity, containing $K$ data; the output quantity is $R = [r_1, r_2, \ldots, r_K]/T$, where $r_i$ is the residual life of the machine at time point $t_i$ and $T$ is the total life of the machine.
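A minimal sketch of constructing the output quantity R follows. It assumes that the residual life at time point t_i is the total life minus the elapsed time, in line with the definition given in the Background; the function name `residual_life_ratio` is hypothetical.

```python
def residual_life_ratio(time_points, total_life):
    """Return r_i / T for each time point t_i, assuming r_i = T - t_i."""
    if total_life <= 0:
        raise ValueError("total_life must be positive")
    return [(total_life - t) / total_life for t in time_points]


# A machine with a total life of 200 cycles, sampled at four time points.
print(residual_life_ratio([0, 50, 100, 200], total_life=200))
```

The ratio starts at 1 for a new machine and falls to 0 at failure, so machines with different total lives share a common output range.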

As a further preferable scheme of the present invention, in step 5-2, the multi-index optimization target is set, and the target parameters are as follows:

in the formula, mse is the mean square error of the regression effect, and it is assumed that the actual value and the regression result in the training set are λ and

then the equation for mse is:

d represents an error index describing the prediction effect of the verification set, and v is a corresponding weight.

As a further preferable scheme of the invention, the step 5-3 specifically comprises the following steps:

step 5-31: setting an initial population size D and a maximum number of iterations Max, and randomly generating initial solution vectors, wherein the formula is as follows:

$$\omega_d = l + \mathrm{rand}(0,1) \times (u - l)$$

where $\omega_d$ ($1 \le d \le D$) is a generated solution vector containing $H$ elements, and $u$ and $l$ are the upper and lower bounds of the solution vector, respectively;

step 5-32: searching for a new solution vector around the initial solution vector according to the following formula:

$$\omega'_{d,m} = \omega_{d,m} + \varphi(\omega_{d,m} - \omega_{b,m})$$

where $\omega_{d,m}$ is the m-th element of the solution vector $\omega_d$, $\omega_{b,m}$ is the m-th element of the solution vector $\omega_b$, $1 \le m \le H$, and $\varphi$ is a random number in the range [0, 1];

step 5-33: and calculating the fitness fit of the new solution vector, wherein the formula is as follows:

where $fit(\omega_d)$ is the optimization objective function;

step 5-34: giving each solution vector an acceptance probability according to the fitness, and selecting an ideal solution according to the probability, wherein the formula is as follows:

step 5-35: repeating steps 5-31 to 5-34 and updating the optimal solution in each iteration until the maximum number of iterations is reached or the optimal value has not changed over several iterations, thereby obtaining the model parameters with the optimal regression and prediction effects.
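Steps 5-31 to 5-35 can be sketched as a simplified artificial bee colony loop. This is only a sketch under stated assumptions, not the patented implementation: the fitness `1 / (1 + f)` and the fitness-proportional selection are the conventional artificial bee colony choices, assumed here; the scout phase and the early-stopping rule are omitted; and the function name `abc_minimize` and the quadratic test objective are hypothetical.

```python
import random


def abc_minimize(objective, lower, upper, pop_size=20, max_iter=50, seed=0):
    """Simplified artificial bee colony search for a non-negative objective."""
    rng = random.Random(seed)
    dim = len(lower)

    # Step 5-31: omega_d = l + rand(0, 1) * (u - l)
    pop = [[lower[m] + rng.random() * (upper[m] - lower[m]) for m in range(dim)]
           for _ in range(pop_size)]
    cost = [objective(p) for p in pop]

    def try_neighbour(d):
        # Step 5-32: omega'_{d,m} = omega_{d,m} + phi * (omega_{d,m} - omega_{b,m})
        b = rng.choice([i for i in range(pop_size) if i != d])
        m = rng.randrange(dim)
        phi = rng.random()  # [0, 1) as in the text; classic ABC draws from [-1, 1]
        cand = list(pop[d])
        cand[m] = min(max(cand[m] + phi * (cand[m] - pop[b][m]), lower[m]), upper[m])
        c = objective(cand)
        if c < cost[d]:  # greedy selection keeps the better solution
            pop[d], cost[d] = cand, c

    for _ in range(max_iter):
        for d in range(pop_size):  # employed-bee phase
            try_neighbour(d)
        # Steps 5-33 and 5-34 (assumed form): fitness, then proportional selection
        fit = [1.0 / (1.0 + c) for c in cost]
        for d in rng.choices(range(pop_size), weights=fit, k=pop_size):
            try_neighbour(d)

    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


# Hypothetical usage: minimize a simple quadratic in two parameters.
best, value = abc_minimize(lambda w: sum(x * x for x in w), [-5.0, -5.0], [5.0, 5.0])
print(best, value)
```

In the method, the objective would be the multi-index target evaluated on a trained support vector machine, and the two parameters would be those of the model being tuned.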

As a further preferable embodiment of the present invention, step 6 specifically includes the following steps:

step 6-1, applying a moving average in real time to the mechanical signal of the machine currently in operation to reduce random noise;

step 6-2, determining the average extreme value of each characteristic quantity from the existing data set, and normalizing the mechanical signal of the machine currently in operation;

step 6-3, calculating the Euclidean distance between each characteristic quantity and the corresponding characteristic quantity in the training samples to evaluate the similarity between the current machine and the machines in the training set;

step 6-4, assigning a weight to each individual model according to the similarity between the machines in its training set and the currently observed machine, and obtaining the current residual life proportion by weighted average, giving the residual life prediction result;

step 6-5, averaging the residual life prediction results over a period of time to further reduce the prediction error.
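The real-time averaging used in steps 6-1 and 6-5 can be sketched as a trailing moving average over the most recent samples. The class name `MovingAverage` and the window length are assumptions for illustration.

```python
from collections import deque


class MovingAverage:
    """Trailing moving average over the most recent `window` samples."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._buf = deque(maxlen=window)  # old samples drop out automatically

    def update(self, value):
        """Add a new sample and return the current average."""
        self._buf.append(value)
        return sum(self._buf) / len(self._buf)


ma = MovingAverage(window=3)
print([ma.update(v) for v in [3.0, 6.0, 9.0, 12.0]])
```

Because only past samples are used, the same structure works both for smoothing incoming signals and for averaging successive predictions online.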

As a further preferable aspect of the present invention, the calculation formula of the euclidean distance in step 6-3 is as follows:

wherein, θ' and θ are the characteristic value of the current measuring machine and the characteristic value of a certain machine in the training set, respectively.
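A minimal sketch of the distance calculation, assuming θ' and θ are given as equal-length sequences of normalized characteristic values; the function name `euclidean_distance` is hypothetical.

```python
import math


def euclidean_distance(current, reference):
    """Euclidean distance between two equal-length sequences of values."""
    if len(current) != len(reference):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(current, reference)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

A smaller distance indicates that the current machine behaves more like the training machine it is compared with.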

As a further preferable embodiment of the present invention, the calculation formula of the remaining life prediction result in step 6-4 is as follows:

wherein $w_i$ is the weight of the i-th support vector machine and $r_i$ is the prediction result of the i-th support vector machine.
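The weighted combination can be sketched as follows. The sketch assumes a weighted average in which the weights are divided by their sum, so it also works when the weights are not already normalized; the function name `weighted_prediction` is hypothetical.

```python
def weighted_prediction(predictions, weights):
    """Weighted average of the individual model predictions r_i with weights w_i."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("predictions and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, predictions)) / total


# Two models; the first is trusted three times as much as the second.
print(weighted_prediction([1.0, 0.0], [3.0, 1.0]))
```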

Compared with the prior art, the invention has the following beneficial effects: the small-sample mechanical residual life prediction method based on support vector machines makes more effective use of limited data samples to establish the support vector machine models, is suitable for engineering applications in which data samples are difficult to obtain, and has wider applicability than other machine learning algorithms that need large-scale, high-quality data. An optimization target constructed from multiple indexes is introduced, and the optimal model parameters are obtained through an artificial bee colony optimization algorithm, which improves the prediction precision of each support vector machine model and enhances the universality of the models. Meanwhile, a weight calculation method based on Euclidean distance is introduced, and the weight corresponding to each support vector machine is obtained according to the similarity between the characteristic quantities and the training samples, which reduces over-fitting and under-fitting, further improves prediction efficiency and stability, and has important engineering value.

Drawings

The invention will be described in further detail with reference to the following drawings and detailed description, which are provided for reference and illustration purposes only and are not intended to limit the invention.

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is a flow chart of the artificial bee colony algorithm of the present invention.

FIG. 3 shows the online prediction effect of a single support vector machine on the residual life of a machine.

FIG. 4 shows the online prediction effect of the weighted support vector machines on the residual life of a machine.

Detailed Description

As shown in fig. 1, the method for predicting the remaining life of a small sample machine based on a support vector machine model of the invention comprises the following steps:

step 5, training each support vector machine model;

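The specific process of smoothing the characteristic quantity of each mechanical degradation process by adopting the Butterworth filter algorithm in step 2 is as follows: filtering the original characteristic quantity, wherein the formula is as follows:

$$|H(j\omega)|^2 = \frac{G_0^2}{1 + (\omega/\omega_c)^{2n}} \tag{1}$$

in the formula: $n$ represents the filtering order; $\omega_c$ represents the cut-off frequency; $G_0$ represents the gain at zero frequency. Components with frequencies above the cut-off frequency are filtered out, while components below the cut-off frequency are retained.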

The formula according to which the normalization process in step 3 is based is as follows:

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min} \tag{2}$$

in the formula: $y$ is the result of the normalization; $x$ is the original data value; $y_{max}$ and $y_{min}$ are respectively the upper and lower boundaries of the range to be normalized to; $x_{max}$ and $x_{min}$ are the upper and lower bounds of the original data values.

The step 5 specifically comprises the following steps:

step 5-1, using the preprocessed characteristic values as the input quantity θ of the model, $\theta = [\theta^1, \theta^2, \ldots, \theta^N]$, where $N$ is the number of characteristic quantities and $\theta^j$ is the j-th characteristic quantity, containing $K$ data;

the ratio of the residual life to the total life is used as the output quantity R of the model, $R = [r_1, r_2, \ldots, r_K]/T$, where $r_i$ is the residual life of the machine at time point $t_i$ and $T$ is the total life of the machine.

Step 5-2, setting a multi-index optimization target to enable the optimization direction to cover the prediction precision and the fitting speed of the support vector machine; the target parameters were as follows:

then the equation for mse is:

in formula (3), d represents an error index describing the prediction effect on the verification set, and v is the corresponding weight, representing the influence of each index on the optimization result. In the method, three error indexes d are selected to evaluate the prediction effect on the verification set. Suppose that the actual values and the predicted results in the verification set are γ and

the three index calculation formulas are respectively as follows:

d₁, root mean square error (RMSE):

d₂, mean relative error (MRE):

d₃, degree of fit (Convergence):

where

in formulas (7) to (9), $t_{FPT}$ is the initial time point, $t_i$ is the i-th time point, and $m(t_i)$ is the relative error between the true value and the predicted value at time point $t_i$.
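A minimal sketch of the first two error indexes, assuming they follow the conventional definitions of root mean square error and mean relative error (relative to the actual value). The function names are hypothetical, and the degree-of-fit index is not sketched because its definition cannot be recovered from the text.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal in length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_relative_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


print(rmse([1.0, 1.0], [4.0, 5.0]))
print(mean_relative_error([2.0, 4.0], [1.0, 5.0]))
```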

Step 5-3, performing parameter optimization on each individual support vector machine by adopting an artificial bee colony optimization algorithm to obtain the model parameters with the optimal regression and prediction effects, which specifically comprises the following steps:

step 5-31: as shown in fig. 2, an initial population size D is set, a maximum number of iterations Max is set, and an initial solution vector is randomly generated, where the formula is as follows:

$$\omega_d = l + \mathrm{rand}(0,1) \times (u - l) \tag{10}$$

where $\omega_d$ ($1 \le d \le D$) is a generated solution vector containing $H$ elements, and $u$ and $l$ are the upper and lower bounds of the solution vector, respectively.

Step 5-32: searching for a new solution vector around the initial solution vector according to equation (11):

$$\omega'_{d,m} = \omega_{d,m} + \varphi(\omega_{d,m} - \omega_{b,m}) \tag{11}$$

where $\omega_{d,m}$ is the m-th ($1 \le m \le H$) element of the solution vector $\omega_d$, $\omega_{b,m}$ is the m-th ($1 \le m \le H$) element of the solution vector $\omega_b$, and $\varphi$ is a random number in the range [0, 1];

where $f(\omega_d)$ is the optimization objective function, namely formula (3);

Step 5-4, verifying the validity of each model using the test-set data.

The step 6 specifically comprises the following steps:

Step 6-3, calculating the Euclidean distance between each characteristic quantity and the corresponding characteristic quantity in the training samples to evaluate the similarity between the current machine and the machines in the training set; the calculation formula of the Euclidean distance is as follows:

where θ' and θ are respectively the characteristic values of the machine currently being measured and the characteristic values of a given machine in the training set; the distance between them reflects how similar their variation is.

Step 6-4, assigning a weight to each individual model according to the similarity between the machines in its training set and the currently observed machine, and obtaining the current residual life proportion by weighted average, giving the residual life prediction result, wherein the calculation formula is as follows:

Next, a remaining life prediction is performed for the CMAPSS turbine engine data set, comprising the steps of:

step 1, selecting a proper characteristic value:

The CMAPSS engine data contains 14 indicators describing the state of health of the engine, including fan inlet temperature, compressor outlet temperature, turbine outlet temperature, fan inlet pressure, bypass duct pressure, relative blade speed, core speed, and engine pressure ratio, among others. The pressure ratio, the static pressure at the compressor outlet and the relative blade speed show more obvious trends and better reflect the degradation process of the engine, so the method selects these three variables as characteristic values.

Steps 2 and 3, setting an appropriate filtering order and cut-off frequency according to the frequency characteristics of the different characteristic values, filtering the characteristic values according to formula (1), and normalizing the characteristic values according to formula (2).

Step 4, dividing a training set and a verification set: to simulate small sample conditions, only 5 of the 100 samples in the training set were selected as known data, and the other samples were selected as unknown data. A total of 10 support vector machine models can be trained by selecting 3 groups from 5 groups of samples as a training set and the remaining 2 groups as a verification set.
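The count of 10 models follows from choosing 3 of the 5 known samples for training, C(5, 3) = 10. A short sketch enumerating the splits, with engine labels 1 to 5 used as illustrative identifiers:

```python
from itertools import combinations

engines = [1, 2, 3, 4, 5]

# Each split trains on 3 engines and validates on the remaining 2.
splits = [
    (train, tuple(e for e in engines if e not in train))
    for train in combinations(engines, 3)
]

for train, validation in splits:
    print("train:", train, "validation:", validation)
print(len(splits), "models")
```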

Step 5, training each support vector machine model:

Step 5-1, determining the input and output quantities: a 3 × K input matrix is constructed from the preprocessed characteristic values, where K is the total number of time points of the 3 groups of samples. Because the engines in the CMAPSS data set run in a steady state for an initial period, residual life values greater than 125 are set to 125 when constructing the output quantity, and the residual life proportion is then calculated accordingly.

Step 5-2, setting a multi-index optimization target according to the required optimization effect: since in engineering practice, overestimation of the remaining lifetime will have more serious consequences than underestimation, it is desirable to avoid overestimation as much as possible. When the objective function is constructed, a higher weight can be distributed to the index of the fitting degree, so that the prediction result is converged as soon as possible.

Step 5-3, optimizing the model parameters with the artificial bee colony algorithm. For the CMAPSS engine model, the radial basis kernel function is selected when training the support vector machines, so the parameters to be determined in the model are the penalty coefficient C and the radial basis kernel width g. The population size is set to 20 and the maximum number of iterations to 50, and optimization stops when the optimal value has not changed for 10 iterations. The parameters are optimized according to formulas (10) to (13), finally giving the optimal parameters of the 10 support vector machine models; the results are shown in Table 1.

TABLE 1 support vector machine model optimization parameters

Step 5-4, verifying the validity of the models according to the test set: the regression effect of each model and its test effect on the verification set are shown in Table 2.

TABLE 2 support vector machine model Effect

Step 6, adopting each support vector machine model to predict the residual life:

Steps 6-1 and 6-2, applying a moving average to the measured characteristic values to reduce their random noise. According to the data in the training set, the average maximum values of the three characteristic quantities are 2388.2587, 8.5309 and 554.3199, respectively; the three characteristic quantities are normalized by these three average maximum values to construct the input quantities.

Step 6-3, calculating the Euclidean distance between the measured characteristic values and the characteristic values in the training set to evaluate their similarity. Tables 3 and 4 show the Euclidean distances between the three characteristic quantities and the corresponding characteristic values of each individual engine in the training set at different time points.

TABLE 3 Euclidean distance of characteristic values under 20-cycle operation

TABLE 4 Euclidean distance of characteristic values under 150-cycle operation

Step 6-4, determining the weight of each support vector machine model. Taking the 20-cycle and 150-cycle time points as examples, the reciprocal of the Euclidean distance of each characteristic value reflects the similarity, so the similarity between the current test engine and engines 1 to 5 can be written as follows:

20 cycle period:

150 cycle period:

each support vector machine contains data of 3 engines as training samples, so the applicability of each support vector machine to the current test engine can be determined by adding the similarity s, and the weight of each support vector machine can be determined by the proportion of the applicability. Thus, at 20 cycle period, the fitness and weight of the model are as shown in table 5:

TABLE 5 suitability and weight of each model for 20 cycle run

At 150 cycle period, the fitness and weight of the model are shown in table 6:

TABLE 6 suitability and weight for each model over a 150-cycle run
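The weighting procedure of step 6-4 can be sketched as follows. The sketch assumes one aggregate Euclidean distance per training engine, takes its reciprocal as the similarity s, sums the similarities of the three engines behind each model to get its applicability, and normalizes the applicabilities into weights. The function name `model_weights` and the dictionary layout are hypothetical.

```python
from itertools import combinations


def model_weights(distances, train_size=3):
    """Weights of the models built from every `train_size`-engine combination.

    distances -- {engine_id: Euclidean distance to the current machine}, all > 0
    Returns {tuple of training engine ids: weight}, with weights summing to 1.
    """
    if any(d <= 0 for d in distances.values()):
        raise ValueError("distances must be positive")
    similarity = {engine: 1.0 / d for engine, d in distances.items()}
    models = list(combinations(sorted(distances), train_size))
    # Applicability of a model: summed similarity of its training engines.
    applicability = [sum(similarity[e] for e in model) for model in models]
    total = sum(applicability)
    return {model: a / total for model, a in zip(models, applicability)}
```

Models trained on engines that resemble the current machine receive larger weights, so their predictions dominate the weighted average.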

The online prediction effect of a single support vector machine on the residual life of the engine is shown in FIG. 3, and the online prediction effect of the weighted support vector machine on the residual life of the engine is shown in FIG. 4. Wherein:

The predicted residual life ratios of the 10 support vector machine models at 20 operating cycles are, respectively: 0.9870, 0.9431, 0.9852, 0.9193, 0.9998, 1.0009, 0.9263, 0.9535, 0.9060, 1.0533. According to equation (13), the weighted residual life ratio is 0.9696.

The predicted residual life ratios of the 10 support vector machine models at 150 operating cycles are, respectively: 0.7986, 0.6720, 0.8327, 0.5157, 0.6356, 0.5680, 0.6677, 0.8355, 0.6606, 0.5564. According to equation (13), the weighted residual life ratio is 0.6794.

Step 6-5, averaging the prediction results to further reduce the prediction error. First, the residual life proportion is converted into an actual life value, and the residual life values of the five most recent predicted time points are averaged to obtain the final predicted value at the current time point. The residual life at 20 cycles and 150 cycles can be calculated as follows:

the actual remaining life at the two time points was 125 and 102, and it can be seen that the predicted results are very close to the actual values.

Table 7 shows the error evaluation of the predicted remaining life values at all time points.

TABLE 7 residual Life prediction error

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention. In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention. Technical features of the present invention which are not described may be implemented by or using the prior art, and will not be described herein.

Claims

1. A small sample mechanical residual life prediction method based on a support vector machine model is characterized by comprising the following steps:

step 5, training each support vector machine model;

2. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 1, wherein the smoothing of the feature quantity of each mechanical degradation process by using the butterworth filter algorithm in the step 2 is specifically performed by: filtering the original characteristic quantity, wherein the formula is as follows:

in the formula: $n$ represents the filtering order; $\omega_c$ represents the cut-off frequency; $G_0$ represents the gain at zero frequency; components with frequencies above the cut-off frequency are filtered out, while components below the cut-off frequency are retained.

3. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 1, wherein the normalization in the step 3 is based on the following formula:

y＝(y_max-y_min)(x-x_min)/(x_max-x_min)+y_min，

4. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 1, wherein the step 5 specifically comprises the following steps:

step 5-4, verifying the validity of each model using the test-set data.

5. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 4, wherein: the input quantity in step 5-1 is $\theta = [\theta^1, \theta^2, \ldots, \theta^N]$, where $N$ is the number of characteristic quantities and $\theta^j$ is the j-th characteristic quantity, containing $K$ data; the output quantity is $R = [r_1, r_2, \ldots, r_K]/T$, where $r_i$ is the residual life of the machine at time point $t_i$ and $T$ is the total life of the machine.

6. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 4, wherein the multi-index optimization target is set in step 5-2, and the target parameters are as follows:

then the equation for mse is:

7. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 4, wherein the step 5-3 specifically comprises the following steps:

step 5-31: setting an initial population size D and a maximum number of iterations Max, and randomly generating initial solution vectors, wherein the formula is as follows: $\omega_d = l + \mathrm{rand}(0,1) \times (u - l)$, where $\omega_d$ ($1 \le d \le D$) is a generated solution vector containing $H$ elements, and $u$ and $l$ are the upper and lower bounds of the solution vector, respectively;

$\omega'_{d,m} = \omega_{d,m} + \varphi(\omega_{d,m} - \omega_{b,m})$, where $\omega_{d,m}$ is the m-th element of the solution vector $\omega_d$, $\omega_{b,m}$ is the m-th element of the solution vector $\omega_b$, $1 \le m \le H$, and $\varphi$ is a random number in the range [0, 1];

where $fit(\omega_d)$ is the optimization objective function;

8. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 1, wherein the step 6 specifically comprises the following steps:

9. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 8, wherein: the calculation formula of the Euclidean distance in the step 6-3 is as follows:

10. The method for predicting the residual life of the small sample machine based on the support vector machine model according to claim 8, wherein: the calculation formula of the residual life prediction result in the step 6-4 is as follows: