CN111814342A

CN111814342A - Complex equipment reliability hybrid model and construction method thereof

Info

Publication number: CN111814342A
Application number: CN202010688009.1A
Authority: CN
Inventors: 张琳; 张保山; 刘捷; 李波; 汪文峰; 张搏; 魏桥; 魏圣军; 高娜
Original assignee: Air Force Engineering University of PLA
Current assignee: Air Force Engineering University of PLA
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2020-10-23
Anticipated expiration: 2040-07-16
Also published as: CN111814342B

Abstract

The invention discloses a complex equipment reliability hybrid model and a construction method thereof. The method comprises: (S100) constructing a mixed distribution model from normal, Weibull, exponential and lognormal distributions, and setting an observed equipment life data set to obtain the mixed density function of the observed data; (S200) estimating the unknown parameters of the mixed density function with the EM algorithm, the estimation being converted into an optimization problem by means of maximum likelihood estimation; (S300) optimizing the complex equipment reliability modeling module with Bayesian random classification and the K-means algorithm to obtain a complex equipment reliability hybrid model based on Bayesian random classification. The method optimizes the distribution parameters with the EM algorithm and combines Bayesian random classification with the K-means algorithm to optimize the reliability modeling module of the complex equipment, which greatly improves the accuracy of parameter estimation and increases the iteration speed.

Description

Complex equipment reliability hybrid model and construction method thereof

Technical Field

The invention relates to reliability modeling of complex equipment, and in particular to a complex equipment reliability hybrid model and a construction method thereof.

Background

Equipment faults or failures are random and difficult to avoid, so the times at which they occur are also random. The fault behaviour of different equipment, or of the same equipment under different conditions, follows different distribution types; the general application ranges of the common fault distribution types are shown in Table 1.

TABLE 1 Fault distribution function types and their application ranges

As can be seen from Table 1, the failure rates of equipment of the same type obey a specific probability distribution function; if the parameter values of the failure probability distribution function of a type of equipment can be obtained, its failure rate can be accurately characterized. Fig. 1(a) shows a fault distribution whose fault probability obeys a normal distribution: when an equipment fault obeys a single fault distribution function, the fault rate at each moment is relatively easy to obtain. When the equipment fault distribution is as shown in Fig. 1(b) and (c), that is, when the equipment fault is influenced by a combination of several fault distribution functions at the same time, the complexity of the fault distribution increases greatly. Moreover, because the working environment, use intensity, manufacturing process and other conditions differ, the fault distribution functions of equipment of the same type differ considerably and their fault characteristics often appear ambiguous, so it is very difficult to find a method that accurately characterizes the fault probability distribution function of the equipment.

For the reliability prediction problem in Fig. 1(b) and (c), scholars have proposed a series of models, such as black box theory based on modeling of time-between-failures data, and stochastic process theory based on the system state. Both reliability modeling theories have defects. Black box theory can only model simple faults, and its accuracy is low when system faults are caused by multiple failure modes. Stochastic process theory requires that the life distribution of each part of the system, the repair time distribution after a fault, and the other distributions all be exponential; if the system does not satisfy this assumption, stochastic process modeling becomes very difficult.

Disclosure of Invention

The invention aims to provide a complex equipment reliability hybrid model and a construction method thereof, which solve the problem that existing methods based on black box theory can only model simple faults. For the various faults in a mixed distribution, the distribution parameters are optimized by the EM (expectation-maximization) algorithm, and the complex equipment reliability modeling module is optimized by combining Bayesian random classification with the K-means algorithm, which greatly improves the accuracy of parameter estimation and increases the iteration speed.

In order to achieve the above object, the present invention provides a method for constructing a complex device reliability hybrid model, the method comprising:

(S100): constructing a mixed distribution model from normal, Weibull, exponential and lognormal distributions, and setting an observed equipment life data set y = (y₁, y₂, …, y_n)^T, where y_j denotes the sample value of the j-th collected point, j = 1, 2, …, n, and n is the total number of collected data; the observed equipment life data come from a mixed distribution formed by combining a normal distribution, an exponential distribution, a Weibull distribution and a lognormal distribution, whose weights in the mixed distribution are {π₁, π₂, π₃, π₄} and sum to 1; the mixed density function of the observed data y_j is given by formula (1):

wherein,

in the formula, the vector of all unknown parameters is Ψ = (π₁, π₂, π₃, π₄, μ₁, σ₁, μ₂, σ₂, λ, θ, β); π₁, π₂, π₃, π₄ denote the weights of the normal, exponential, Weibull and lognormal distributions in the total distribution, respectively; σ₁ and μ₁ are the parameters of the normal distribution; θ and β are the parameters of the Weibull distribution; λ is the parameter of the exponential distribution; σ₂ and μ₂ are the parameters of the lognormal distribution; π without a subscript denotes the circle constant; f₁(y_j) is f₁(y_j; μ₁, σ₁), f₂(y_j) is f₂(y_j; λ), f₃(y_j) is f₃(y_j; θ, β), and f₄(y_j) is f₄(y_j; μ₂, σ₂);

(S200) estimating the unknown parameters Ψ in the mixed density function with the EM algorithm, the estimation of Ψ being converted into an optimization problem by means of maximum likelihood estimation, in which the objective function is the likelihood function L(Ψ) or the equivalent log-likelihood function ln L(Ψ), and the domain is the entire parameter space;

the likelihood function for the unknown parameter Ψ is:

the equivalent log-likelihood function ln L(Ψ) for the unknown parameter Ψ is:

(S300) optimizing the complex equipment reliability modeling module with Bayesian random classification and the K-means algorithm to obtain a complex equipment reliability hybrid model based on Bayesian random classification, comprising the following steps:

(S310) classifying the equipment life data set by K-means cluster analysis, and performing maximum likelihood estimation of the independent distribution parameters on each class of data to obtain the prior distribution parameters of each independent distribution, wherein the distribution parameter corresponding to the i-th class of life data is Ψ_i, and the ratio of the data amount g_i of each class of life data to the total data amount G of the life data is the weight w_i of each independent distribution, w_i = g_i/G;

(S320) calculating the density function value f_ij(y_j) of the j-th point under the i-th independent distribution, the density function values f_ij(y_j) of the four distributions being calculated by formulas (2) to (5) respectively, and normalizing according to formula (14) to obtain p_ij, where p_ij denotes the posterior probability that the j-th point belongs to the i-th independent distribution:

(S330) assigning to the j-th point the posterior probability interval of the i-th independent distribution:

wherein p_j0 = 0; for each point a random probability r_j is generated from a uniform distribution, the class of the j-th point is determined by which independent distribution's posterior probability interval r_j falls in, and the parameters of each independent distribution are updated according to the current sample classification;

(S340) calculating, according to formula (15), the maximum likelihood function value of the sample classification after K-means clustering, and selecting the sample classification with the largest likelihood function value as the current optimal classification:

wherein N represents the number of independent distributions;

(S350) repeating (S320) to (S340) until the maximum likelihood function converges or the maximum number of iterations is reached.

Preferably, in step (S200), under the EM framework each y_j is regarded as coming from one component of a finite mixture model, and z = (z₁, z₂, …, z_n)^T is the indicator vector of the unobservable components, wherein

In formula (8), i = 1, 2, 3, 4 and j = 1, 2, …, n;

the observed data vector y = (y₁, y₂, …, y_n)^T and the missing data vector z = (z₁, z₂, …, z_n)^T are combined to give the complete data vector x = (y^T, z^T)^T; then, in the mixed distribution model, the log-likelihood function ln L_c(Ψ) of the parameter Ψ based on the complete data is:

preferably, in step (S200), the parameter estimation method based on the EM algorithm includes:

E step: computing the conditional expectation Q(Ψ; Ψ^k) of the joint distribution; in the (k+1)-th iteration of the EM algorithm:

given y and the current Ψ^k, the conditional expectation of the log-likelihood function of the complete data is:

wherein τ_i(y_j; Ψ^k) is the posterior probability that the j-th observed datum y_j belongs to the i-th component of the finite mixture model; the superscript k denotes the k-th iteration;

then, combining the log-likelihood function ln L_c(Ψ) of the parameter Ψ based on the complete data with the conditional expectation of the log-likelihood function of the complete data, formulas (9) and (11) give:

wherein π_i are the weights of the respective distributions;

M step: updating the estimate Ψ^(k+1) of Ψ so that Q(Ψ; Ψ^k) attains its maximum over the entire parameter space of Ψ; the parameter iteration formula is as follows:

determining whether the parameter Ψ^(k+1) has converged; if not, the E step and the M step are repeated until convergence, and Ψ^(k+1) is output.
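The stopping rule just described can be sketched as a small driver loop. This is only an illustration: the function names, the tolerance `tol` and the iteration cap `max_iter` are assumptions, since the text does not state how convergence of Ψ^(k+1) is measured, and `em_update` stands for one combined E step and M step.

```python
def has_converged(psi_old, psi_new, tol=1e-6):
    """True when no parameter changed by tol or more (tol is an assumed value)."""
    return max(abs(a - b) for a, b in zip(psi_old, psi_new)) < tol


def iterate_until_converged(psi0, em_update, max_iter=1000, tol=1e-6):
    """Apply em_update (one E step plus one M step) until the estimate stabilizes."""
    psi = list(psi0)
    for _ in range(max_iter):
        psi_next = em_update(psi)
        if has_converged(psi, psi_next, tol):
            return psi_next
        psi = psi_next
    return psi  # iteration cap reached without meeting the tolerance
```

Any update rule with a fixed point can be used to exercise the loop, for example `iterate_until_converged([0.0], lambda p: [0.5 * v + 1.0 for v in p])`.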

It is another object of the present invention to provide a complex equipment reliability hybrid model constructed by the method.

The complex equipment reliability hybrid model and its construction method solve the problem that existing models based on black box theory can only be established for simple faults, and have the following advantages:

The model constructed by the method uses the EM algorithm to optimize the distribution parameters. Although the EM algorithm is easy to implement, reliable and easy to understand, it must traverse the whole data set in every iteration, so it iterates slowly when the data volume is large; and, unlike a genetic algorithm, it has no global search capability, so whether it reaches the global optimum depends heavily on whether the selected initial parameters are appropriate. The invention combines a Bayesian random classification method, which randomly assigns the data to the independent distributions, to make up for these defects.

The model constructed by the method overcomes the tendency of a hybrid model built with the traditional EM algorithm to fall into a local optimum. The parameters of each independent distribution are adjusted adaptively according to the data characteristics, and the weight of each independent distribution can be used to decide whether that distribution is retained. In parameter fitting the model performs far better than the hybrid model built with the traditional EM algorithm: it needs only 7 iterations and reaches a parameter estimation accuracy of 99.97%, whereas the hybrid model built with the traditional EM algorithm needs about 35 iterations and reaches an accuracy of only 82.17%.

Drawings

Fig. 1 is a three-dimensional diagram of a distribution function of a fault of a conventional device.

FIG. 2 is a flow chart of the method of the present invention.

Fig. 3 is a computation framework diagram of the conventional EM algorithm.

FIG. 4 is a variation graph of the normal distribution parameter σ₁.

FIG. 5 is a variation graph of the normal distribution parameter μ₁.

Fig. 6 is a graph of the variation of the exponential distribution parameter λ.

Fig. 7 is a diagram showing the variation of the weibull distribution parameter β.

Fig. 8 is a diagram showing a variation of the weibull distribution parameter θ.

FIG. 9 is a variation graph of the lognormal distribution parameter μ₂.

FIG. 10 is a variation graph of the lognormal distribution parameter σ₂.

FIG. 11 is a variation graph of the distribution weights π₁, π₂, π₃, π₄.

FIG. 12 is a Bayesian stochastic classification reliability hybrid model error graph.

FIG. 13 is a hybrid model error map of a conventional EM algorithm.

FIG. 14 is a graph of a Bayesian stochastic classification reliability mixture model cumulative distribution function.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

Example 1

A method for constructing a complex equipment reliability hybrid model, whose flow chart is shown in Fig. 2. The hybrid model based on Bayesian random classification is constructed according to black box theory, and Bayesian random classification is used to remedy the tendency of the traditional EM algorithm to become trapped in a local optimum. The faults addressed by the hybrid model result from the combined action of several fault distributions (Table 1); sudden failures are random, whereas degradation failures are predictable.

The EM algorithm is an optimization algorithm that performs maximum likelihood estimation (MLE) by iteration, and is generally used to estimate the parameters of a probability model containing hidden (latent) variables or missing (incomplete) data. Its basic process is as follows: first, the values of the model parameters are estimated from the given observed data; then the values of the missing data are estimated from the parameter values obtained in the previous step, and the parameter values are re-estimated from the estimated missing data together with the previously observed data; this is repeated until convergence, at which point the iteration ends. Fig. 3 shows the computation framework of the EM algorithm.

The specific steps of the traditional EM algorithm are as follows:

(S1) input: observed data x = (x₁, x₂, …, x_n)^T, joint distribution p(x, z; θ), conditional distribution p(z | x; θ) and maximum number of iterations J, where z = (z₁, z₂, …, z_n)^T is the unobserved latent data, θ is the model parameter, and T denotes transposition;

(S2) randomly initializing the value θ⁰ of the model parameter θ, i.e. initializing the shape and position of curve (1);

(S3) starting EM algorithm iteration:

(S31) E step: computing the conditional expectation L(θ, θ^j) of the joint distribution, i.e. the increment from curve (1) to curve (2):

Q_i(z_i)＝P(z_i|x_i,θ^j) (1’)

wherein Q_i(z_i) is the distribution of the latent data z_i.

(S32) M step: maximizing L(θ, θ^j) to obtain θ^(j+1), i.e. calculating the parameter θ at which curve (2) intersects curve (4):

(S33) if θ^(j+1) has converged, the algorithm ends; otherwise it returns to step (S31) for another E-step iteration, i.e. when curve (3) intersects curve (4), the other parameter of curve (3) is changed and the iteration continues.

(S4) outputting the model parameter θ.
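Steps (S1) to (S4) can be illustrated on a deliberately small case. The sketch below is not the four-distribution model of the invention: it assumes a mixture of just two normal components so that both the E step and the M step have simple closed forms, and the function name, starting values and tolerance are all hypothetical.

```python
import math


def em_two_normals(x, mu, sigma, weight0, max_iter=200, tol=1e-10):
    """Toy EM for a two-component normal mixture (illustration only)."""
    mu, sigma = list(mu), list(sigma)
    w = [weight0, 1.0 - weight0]
    for _ in range(max_iter):
        # E step: posterior probability of each component for each point.
        resp = []
        for xi in x:
            d = [w[k] * math.exp(-0.5 * ((xi - mu[k]) / sigma[k]) ** 2)
                 / (sigma[k] * math.sqrt(2.0 * math.pi)) for k in range(2)]
            total = d[0] + d[1]
            resp.append((d[0] / total, d[1] / total))
        # M step: weighted means, standard deviations and mixing weights.
        new_mu, new_sigma, new_w = [], [], []
        for k in range(2):
            nk = sum(r[k] for r in resp)
            m = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - m) ** 2 for r, xi in zip(resp, x)) / nk
            new_mu.append(m)
            new_sigma.append(max(math.sqrt(var), 1e-6))  # floor avoids sigma = 0
            new_w.append(nk / len(x))
        shift = max(abs(a - b) for a, b in zip(mu, new_mu))
        mu, sigma, w = new_mu, new_sigma, new_w
        if shift < tol:  # convergence check corresponding to (S33)
            break
    return mu, sigma, w
```

For two well-separated groups such as `[-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]` with starting means `(1.0, 8.0)`, the estimated means settle on the two group centers. A poor starting point can still lead to a local optimum, which is the weakness the later steps address.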

The method for constructing the complex equipment reliability hybrid model specifically comprises the following steps:

(S100): constructing a mixed distribution model from normal, Weibull, exponential and lognormal distributions, and setting an observed equipment life data set y = (y₁, y₂, …, y_n)^T, where y_j denotes the sample value of the j-th collected point, j = 1, 2, …, n, and n is the total number of collected data. The observed data come from a mixed distribution formed by combining a normal distribution, an exponential distribution, a Weibull distribution and a lognormal distribution, whose weights in the mixed distribution are {π₁, π₂, π₃, π₄} and sum to 1. The mixed density function of the observed data y_j is then given by formula (1):

wherein,

in the formula, the vector of all unknown parameters is Ψ = (π₁, π₂, π₃, π₄, μ₁, σ₁, μ₂, σ₂, λ, θ, β); π₁, π₂, π₃, π₄ denote the weights of the normal, exponential, Weibull and lognormal distributions in the total distribution, respectively; σ₁ and μ₁ are the parameters of the normal distribution; θ and β are the parameters of the Weibull distribution; λ is the parameter of the exponential distribution; σ₂ and μ₂ are the parameters of the lognormal distribution; π_i denotes the weight of each distribution in the total distribution; f₁(y_j) is f₁(y_j; μ₁, σ₁).

For a mixed distribution, it is difficult to tell from the data alone which distribution each sample value y_j comes from; in this sense the observations do not contain all the information about the data and are "incomplete data". Estimating the finite mixture model of the four distributions therefore comes down to estimating the parameter vector Ψ.
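The mixed density of step (S100) can be sketched as a weighted sum of four component densities. The component formulas (2) to (5) are not reproduced in this text, so the parameterizations below are assumptions: λ is treated as a rate, θ as the Weibull scale and β as the Weibull shape. The function names are illustrative only.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(y, lam):
    """Exponential density; ASSUMPTION: lam is a rate, f = lam * exp(-lam * y)."""
    return lam * math.exp(-lam * y) if y >= 0 else 0.0


def weibull_pdf(y, theta, beta):
    """Weibull density; ASSUMPTION: theta is the scale and beta the shape."""
    if y <= 0:  # treated as outside the support
        return 0.0
    t = y / theta
    return (beta / theta) * t ** (beta - 1.0) * math.exp(-(t ** beta))


def lognormal_pdf(y, mu, sigma):
    """Lognormal density; mu and sigma are the mean and std of ln(y)."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, pi, mu1, sigma1, lam, theta, beta, mu2, sigma2):
    """Weighted sum of the four densities; pi = (pi1, pi2, pi3, pi4), sum 1."""
    components = (
        normal_pdf(y, mu1, sigma1),
        exponential_pdf(y, lam),
        weibull_pdf(y, theta, beta),
        lognormal_pdf(y, mu2, sigma2),
    )
    return sum(w * f for w, f in zip(pi, components))
```

The weight order follows the text: normal, exponential, Weibull, lognormal. With the assumed parameterizations, a Weibull component with β = 1 and scale θ coincides with an exponential component of rate 1/θ, which is a convenient sanity check.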

(S200) estimating the unknown parameters Ψ in the mixed density function with the EM algorithm, the estimation of Ψ being converted into an optimization problem by means of maximum likelihood estimation (MLE), in which the objective function is the likelihood function L(Ψ) or the equivalent log-likelihood function ln L(Ψ) and the domain is the entire parameter space, specifically as follows:

the likelihood function for the unknown parameter Ψ is:

Under the EM framework, each y_j is regarded as coming from one component of a finite mixture model, and z = (z₁, z₂, …, z_n)^T is the indicator vector of the unobservable components, wherein

wherein i = 1, 2, 3, 4 and j = 1, 2, …, n.

The observed data vector y = (y₁, y₂, …, y_n)^T and the missing data vector z = (z₁, z₂, …, z_n)^T are combined to give the complete data vector x = (y^T, z^T)^T. Then, in the mixed distribution model, the log-likelihood function ln L_c(Ψ) of the parameter Ψ based on the complete data is:
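The formula images for this step are not reproduced in this text. As a hedged reconstruction, the standard finite-mixture forms that match the surrounding definitions (likelihood, log-likelihood, component indicator and complete-data log-likelihood) would read as follows; the pairing with formula numbers (6) to (9) is an assumption.

```latex
\begin{align}
L(\Psi) &= \prod_{j=1}^{n} \sum_{i=1}^{4} \pi_i f_i(y_j)
  && \text{(likelihood)} \\
\ln L(\Psi) &= \sum_{j=1}^{n} \ln\Bigl( \sum_{i=1}^{4} \pi_i f_i(y_j) \Bigr)
  && \text{(log-likelihood)} \\
z_{ij} &= \begin{cases} 1, & y_j \text{ comes from component } i \\ 0, & \text{otherwise} \end{cases}
  && \text{(indicator)} \\
\ln L_c(\Psi) &= \sum_{i=1}^{4} \sum_{j=1}^{n} z_{ij} \bigl[ \ln \pi_i + \ln f_i(y_j) \bigr]
  && \text{(complete-data log-likelihood)}
\end{align}
```

Here f_i(y_j) abbreviates the i-th component density evaluated with its own parameters, for example f₂(y_j; λ).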

the parameter estimation steps based on the EM algorithm are as follows:

E step: using the E step of the traditional EM algorithm, the conditional expectation Q(Ψ; Ψ^k) of the joint distribution is computed. In the (k+1)-th iteration of the EM algorithm, the conditional expectation of the log-likelihood function of the complete data is:

wherein τ_i(y_j; Ψ^k) is the posterior probability that the j-th observed datum y_j belongs to the i-th component of the finite mixture model; the superscript k denotes the k-th iteration.
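The posterior probabilities τ_i(y_j; Ψ^k) can be sketched as a normalization of weighted component densities, which is the usual E step of a finite mixture. The sketch assumes the densities f_i(y_j) have already been evaluated at the current parameters; the function name and the nested-list layout are illustrative.

```python
def e_step(weights, densities):
    """Return tau[i][j], the posterior probability that point j is from component i.

    weights[i]      -- current mixing weight pi_i
    densities[i][j] -- component density f_i(y_j) at the current parameters
    Assumes every point has a positive density under at least one component.
    """
    n_comp, n_pts = len(weights), len(densities[0])
    tau = [[0.0] * n_pts for _ in range(n_comp)]
    for j in range(n_pts):
        joint = [weights[i] * densities[i][j] for i in range(n_comp)]
        total = sum(joint)
        for i in range(n_comp):
            tau[i][j] = joint[i] / total
    return tau
```

For each point the values of τ over the components sum to 1, so they can be read directly as soft class memberships.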

From formulae (9) and (11), it is possible:

wherein π_i are the weights of the respective distributions.

M step: using the M step of the traditional EM algorithm, the estimate Ψ^(k+1) of Ψ is updated so that Q(Ψ; Ψ^k) attains its maximum over the entire parameter space of Ψ; the parameter iteration formula is as follows:
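The iteration formulas themselves are not reproduced in this text. As a hedged sketch, the standard weighted maximum likelihood updates for a finite mixture are shown below for the mixing weights, the normal component and the exponential component (λ assumed to be a rate). The lognormal component can reuse the normal update on ln y_j; the Weibull shape has no closed-form update and would need a numerical solver, which is omitted here.

```python
import math


def m_step_weights(tau):
    """New mixing weights: average posterior probability of each component."""
    n = len(tau[0])
    return [sum(row) / n for row in tau]


def m_step_normal(tau_i, y):
    """Weighted mean and standard deviation for the normal component."""
    s = sum(tau_i)
    mu = sum(t * v for t, v in zip(tau_i, y)) / s
    var = sum(t * (v - mu) ** 2 for t, v in zip(tau_i, y)) / s
    return mu, math.sqrt(var)


def m_step_exponential(tau_i, y):
    """Weighted rate estimate; ASSUMPTION: lambda is a rate parameter."""
    return sum(tau_i) / sum(t * v for t, v in zip(tau_i, y))
```

Here `tau` is the matrix of posterior probabilities from the E step, with one row per component and one column per observation.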

(S300) modeling of complex equipment reliability hybrid model based on Bayesian stochastic classification

The complex equipment reliability modeling module is optimized with Bayesian random classification and the K-means algorithm to obtain a complex equipment reliability hybrid model based on Bayesian random classification.

Although the EM algorithm is easy to implement, reliable and easy to understand, it must traverse the whole data set in every iteration, so it iterates slowly when the data volume is large. Unlike a genetic algorithm, it has no global search capability, so whether it reaches the global optimum depends heavily on whether the selected initial parameters are appropriate. The traditional EM algorithm initializes the parameters to be estimated by random or experience-based selection, so the iteration easily falls into a local optimum. A mixed model is generally obtained by superposing several independent distributions that overlap to some extent; initializing the parameters from an absolute (hard) classification is therefore inconsistent with the theoretical situation.

To address these problems with the initial parameters of the EM algorithm, a Bayesian random classification method is used to randomly assign the data to the independent distributions. The specific steps are as follows:

(S310) classifying the equipment life data set by K-means cluster analysis, and performing maximum likelihood estimation of the independent distribution parameters on each class of data to obtain the prior distribution parameters of each independent distribution, wherein the distribution parameter corresponding to the i-th class of life data is Ψ_i, and the ratio of the data amount g_i of each class of life data to the total data amount G of the life data is the weight w_i of each independent distribution, w_i = g_i/G.
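The clustering and weighting part of step (S310) can be sketched with a plain one-dimensional K-means (Lloyd's algorithm). The initialization from evenly spaced order statistics is an assumption made so that the example is deterministic; the text does not say how the cluster centers are initialized. The per-class maximum likelihood fit of Ψ_i is not shown.

```python
def kmeans_1d(data, k, max_iter=100):
    """Cluster scalar life data into k classes; returns (labels, centers)."""
    s = sorted(data)
    # ASSUMPTION: deterministic start from evenly spaced order statistics.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda i: abs(y - centers[i])) for y in data]
        new_centers = []
        for i in range(k):
            members = [y for y, lab in zip(data, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:  # assignments can no longer change
            break
        centers = new_centers
    return labels, centers


def cluster_weights(labels, k):
    """w_i = g_i / G: share of the data that falls in class i."""
    total = len(labels)
    return [labels.count(i) / total for i in range(k)]
```

With four clusters, `cluster_weights(labels, 4)` gives the initial weights w₁ to w₄, which sum to 1 by construction.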

wherein p_j0 = 0. For each point a random probability r_j is generated from a uniform distribution; the class of the point is determined by which independent distribution's posterior probability interval r_j falls in, and the parameters of each independent distribution are updated according to the current sample classification.
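The random assignment can be sketched by stacking the posterior probabilities of a point into consecutive intervals on [0, 1) and locating the uniform draw r_j among them. Formula (14) is not reproduced in this text, so normalizing the weighted densities w_i·f_ij(y_j) is an assumption, as are the function names.

```python
import bisect
import random
from itertools import accumulate


def posterior(weights, dens_j):
    """p_ij for one point; ASSUMPTION: weighted densities are normalized."""
    joint = [w * f for w, f in zip(weights, dens_j)]
    total = sum(joint)
    return [v / total for v in joint]


def classify(p_j, r_j):
    """Index i whose interval [p_j1 + ... + p_j(i-1), p_j1 + ... + p_ji) holds r_j."""
    upper = list(accumulate(p_j))  # upper ends of the stacked intervals
    # Clamp in case rounding leaves the last upper end slightly below 1.
    return min(bisect.bisect_right(upper, r_j), len(p_j) - 1)


def random_classification(posteriors, rng):
    """Draw one uniform r_j per point and classify it."""
    return [classify(p_j, rng.random()) for p_j in posteriors]
```

A point with posterior (0.1, 0.2, 0.3, 0.4) is thus sent to the fourth distribution with probability 0.4, which keeps the overlap between distributions instead of forcing a hard assignment. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.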

where N represents the number of independent distributions.
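Keeping the best classification across iterations can be sketched as follows. Formula (15) is not reproduced in this text, so the score used here, the sum of ln(w_i·f_i(y_j)) over the points assigned to each class i, is an assumed form of the classification likelihood, and both function names are hypothetical.

```python
import math


def classification_log_likelihood(labels, weights, densities):
    """ASSUMED score: sum over points of ln(w_i * f_i(y_j)) for the assigned class i.

    labels[j]       -- class index assigned to point j
    weights[i]      -- weight w_i of independent distribution i
    densities[i][j] -- density f_i(y_j) under the current parameters
    """
    return sum(math.log(weights[i] * densities[i][j]) for j, i in enumerate(labels))


def keep_best(best, candidate):
    """best and candidate are (score, labels) pairs; best may be None at the start."""
    if best is None or candidate[0] > best[0]:
        return candidate
    return best
```

In a loop over random classifications, each candidate is scored and passed through `keep_best`, so the retained labels always correspond to the largest score seen so far.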

Specific application of the model of example 1

To verify the accuracy of the model constructed by the invention, a mixed distribution of complex equipment reliability is constructed from normal, exponential, Weibull and lognormal distributions, and data are generated from it with randomly selected mixed distribution parameters; the specific parameter settings are given as mixed distribution 1 in Table 2. The generated reliability data set has a size of 10000, the number of clusters of the Bayesian random classification reliability hybrid model is 4, and the maximum number of iterations is set to 1000. The error is defined as the absolute value of the difference between the random mixed distribution that generated the data and the complex equipment reliability hybrid model based on Bayesian random classification, and an error smaller than 1 × 10^-3 is the convergence condition. A reliability hybrid model constructed with the traditional EM algorithm is used for comparison, with the same parameter settings as the hybrid model based on Bayesian random classification, and the effectiveness of the proposed model is verified by comparing the iteration errors of the two algorithms.
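Generating such a test data set can be sketched with Python's standard `random` module. The parameter values in the usage note are hypothetical placeholders, not the values of Table 2, which is not reproduced in this text; the sketch also assumes λ is a rate and θ, β are the Weibull scale and shape.

```python
import random


def sample_mixture(n, pi, mu1, sigma1, lam, theta, beta, mu2, sigma2, seed=0):
    """Draw n life values from the four-component mixture; returns (data, labels)."""
    rng = random.Random(seed)
    samplers = (
        lambda: rng.gauss(mu1, sigma1),            # normal component
        lambda: rng.expovariate(lam),              # exponential, lam is a rate
        lambda: rng.weibullvariate(theta, beta),   # Weibull (scale, shape)
        lambda: rng.lognormvariate(mu2, sigma2),   # lognormal component
    )
    # First pick the component of each point with probabilities pi, then sample.
    labels = rng.choices(range(4), weights=pi, k=n)
    data = [samplers[i]() for i in labels]
    return data, labels
```

A call such as `sample_mixture(10000, (0.3, 0.2, 0.3, 0.2), 50.0, 5.0, 0.1, 30.0, 2.0, 3.0, 0.5)` yields a data set of the size used in the test. The returned labels are the true component memberships, which are hidden from the estimation procedure and are useful only for checking the fit afterwards.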

TABLE 2 respective mixing distribution parameters

As can be seen from Figs. 4 to 11 and Fig. 14, the Bayesian random classification reliability hybrid model constructed by the invention overcomes the tendency of the hybrid model built with the traditional EM algorithm to fall into a local optimum, adjusts the parameters of each independent distribution adaptively according to the data characteristics, and uses the weight of each independent distribution to decide whether that distribution is retained. As can be seen from Figs. 12 and 13, in parameter fitting the Bayesian random classification reliability hybrid model performs far better than the hybrid model built with the traditional EM algorithm: it needs only 7 iterations and reaches a parameter estimation accuracy of 99.97%, whereas the hybrid model built with the traditional EM algorithm begins to converge only after about 35 iterations and reaches only 82.17%.

While the present invention has been described in detail with reference to the preferred embodiments thereof, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

1. A method for constructing a complex equipment reliability hybrid model is characterized by comprising the following steps:

(S100): constructing a mixed distribution model from normal, Weibull, exponential and lognormal distributions, and setting an observed equipment life data set y = (y₁, y₂, …, y_n)^T, where y_j denotes the sample value of the j-th collected point, j = 1, 2, …, n, and n is the total number of collected data; the observed equipment life data come from a mixed distribution formed by combining a normal distribution, an exponential distribution, a Weibull distribution and a lognormal distribution, whose weights in the mixed distribution are {π₁, π₂, π₃, π₄} and sum to 1; the mixed density function of the observed data y_j is given by formula (1):

wherein,

the likelihood function for the unknown parameter Ψ is:

(S300) optimizing the reliability modeling module of the complex equipment by adopting Bayesian random classification and K-means algorithm to obtain a complex equipment reliability mixed model based on Bayesian random classification, which comprises the following steps:

(S310) classifying the equipment life data set by K-means cluster analysis, and performing maximum likelihood estimation of the independent distribution parameters on each class of data to obtain the prior distribution parameters of each independent distribution, wherein the distribution parameter corresponding to the i-th class of life data is Ψ_i, and the ratio of the data amount g_i of each class of life data to the total data amount G of the life data is the weight w_i of each independent distribution, w_i = g_i/G;

(S340) calculating the maximum likelihood function value of the sample classification after Kmeans clustering according to the formula (15), and selecting the sample classification with the maximum likelihood function value as the current optimal classification:

wherein N represents the number of independent distributions;

2. The method for constructing a complex equipment reliability hybrid model according to claim 1, wherein in step (S200), under the EM framework each y_j is regarded as coming from one component of a finite mixture model, and z = (z₁, z₂, …, z_n)^T is the indicator vector of the unobservable components, wherein

In formula (8), i = 1, 2, 3, 4 and j = 1, 2, …, n;

the observed data vector y = (y₁, y₂, …, y_n)^T and the missing data vector z = (z₁, z₂, …, z_n)^T are combined to give the complete data vector x = (y^T, z^T)^T; then, in the mixed distribution model, the log-likelihood function ln L_c(Ψ) of the parameter Ψ based on the complete data is:

3. The method for constructing a complex equipment reliability hybrid model according to claim 2, wherein in step (S200), the parameter estimation method based on the EM algorithm comprises:

the conditional expectation of the log-likelihood function of the complete data is:

From formulae (9) and (11), it is possible:

wherein π_i are the weights of the respective distributions;

4. A complex device reliability hybrid model constructed by the method of any one of claims 1-3.