CN103093094A - Software failure time forecasting method based on kernel partial least squares regression algorithm

Info

Publication number: CN103093094A
Application number: CN2013100130053A
Authority: CN
Inventors: 蒋云良; 楼俊钢; 江建慧; 申情; 范婧
Original assignee: Huzhou University
Current assignee: Huzhou University
Priority date: 2013-01-14
Filing date: 2013-01-14
Publication date: 2013-05-08

Abstract

The invention discloses a software failure time prediction method based on the kernel partial least squares regression algorithm. By applying kernel function techniques, the software reliability prediction problem is converted into a regression estimation problem, which is then solved with the kernel partial least squares regression algorithm. The method takes full account of the small-sample nature of software reliability prediction: the kernel function technique can handle cases in which the number of observed variables exceeds the number of observed samples and in which multicollinearity exists among the variables, so the model 'overfitting' that arises in modeling approaches such as neural networks does not occur. With this method, model parameters are adjusted automatically and continuously to fit the dynamic changes in the failure process, thereby achieving adaptive prediction of software reliability and effectively improving the adaptive capability of the software failure prediction model.

Description

Software Failure Time Prediction Method Based on Kernel Partial Least Squares Regression Algorithm

【Technical field】

The invention relates to software reliability testing and evaluation, and in particular to a method for predicting software failure time data for the next failure or over a longer period in the future.

【Background art】

Software reliability is the probability that software will not fail within a specified time under specified conditions. Stochastic process reliability models are the most studied and most widely used class of software reliability growth models. However, the statistical components of real reliability problems cannot be described by classical statistical distribution functions alone, and stochastic process models require many a priori assumptions about the properties of software faults and about the software failure process. As a result, each model shows large differences in prediction accuracy across projects; that is, the models have poor applicability.

Methods based on kernel function theory are aimed specifically at prediction and classification with small-sample data. They have achieved very good results in many fields similar to reliability prediction and are suited to complex problems such as software reliability prediction. With the help of computer technology, such models have adaptive and learning capabilities and perform well in both applicability and prediction ability. The good behavior that kernel-based software reliability models show with limited samples can, to a large extent, address problems such as over-learning in neural networks, which makes them an important avenue in current research on software reliability models.

【Summary of the invention】

The technical problem to be solved by the present invention is to provide a software failure time prediction method based on the kernel partial least squares regression algorithm that achieves adaptive prediction of software reliability and effectively improves the adaptability of the software failure prediction model. To this end, the present invention adopts the following technical solution, which comprises the following steps:

(1) Observe and record the sequential software failure data set, and normalize all input and output data;

(2) Through reasonable abstraction and assumptions, transform the software failure time prediction problem into a function regression problem;

(3) Select the kernel function used for prediction, and give the initial values of its parameters;

(4) Select the number of failure data used for learning;

(5) Use the kernel partial least squares regression algorithm to perform learning and optimization on the different failure data sets;

(6) Finally, use the optimized parameters to predict new failure times.

Further, in step (2) the software failure time prediction problem is transformed into a function regression problem as follows:

Assume that the software failure times observed so far are $t_1, t_2, \ldots, t_n$, and let $t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$, so that $t_l$ follows a fixed but unknown conditional distribution function $F(t_l \mid t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$. Predicting $t_{k+1}$ given $t_1, t_2, \ldots, t_k$ then becomes: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$, where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$.
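The construction of the observation pairs $(T_i, t_{m+i})$ can be sketched in a few lines. This is a minimal illustration, assuming each input $T_i$ holds the $m$ failure times immediately preceding its target; the function name `make_windows` and the sample failure times are hypothetical and not taken from the source.

```python
def make_windows(times, m):
    """Build (input, target) pairs from a sequence of failure times.

    Each input holds m consecutive failure times and the target is the
    failure time that immediately follows them.
    """
    if m < 1 or m >= len(times):
        raise ValueError("need 1 <= m < len(times)")
    inputs = [list(times[i:i + m]) for i in range(len(times) - m)]
    targets = [times[i + m] for i in range(len(times) - m)]
    return inputs, targets


# Hypothetical cumulative failure times t_1..t_6 (illustrative values only).
times = [3.0, 7.0, 12.0, 20.0, 31.0, 45.0]
m = 3
X, y = make_windows(times, m)

# The (k-m+1)-th input is the last m observed times; its target t_{k+1}
# is what the regression model is asked to estimate.
next_input = times[-m:]
```

With $k = 6$ and $m = 3$ this yields $k - m = 3$ training pairs, matching the count given in the text.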

The kernel function used in step (3) is a Gaussian kernel function, with initial parameter value $g = 1$.
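As an illustration of the kernel choice in step (3), the sketch below evaluates a Gaussian kernel and builds a kernel matrix with $g = 1$. It assumes the standard Gaussian (RBF) form $\exp(-g\,\|x - y\|^2)$, which may differ in the exponent from the formula as printed in this document; the function names and sample vectors are hypothetical.

```python
import math


def gaussian_kernel(x, y, g=1.0):
    """Standard Gaussian (RBF) kernel exp(-g * ||x - y||^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def kernel_matrix(xs, g=1.0):
    """Kernel matrix with entries K[i][j] = k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, g) for xj in xs] for xi in xs]


# Illustrative inputs, assumed to be already normalized.
xs = [[0.1, 0.2], [0.2, 0.4], [0.4, 0.9]]
K = kernel_matrix(xs, g=1.0)
```

The resulting matrix is symmetric with ones on the diagonal, since every vector is at distance zero from itself.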

The number of failure data in step (4) is an integer between 5 and 8.

In step (5), the kernel partial least squares regression algorithm performs learning and optimization on the different failure data sets through the following process:

Step 1: the input data are the $k$-dimensional vectors $X = \{x_1, x_2, \ldots, x_l\}$, and the outputs are the vectors $y^s$, $s = 1, 2, \ldots, m$.

Step 2: construct the kernel matrix $K_{ij} = \kappa(x_i, x_j)$, $i, j = 1, 2, \ldots, l$, where

$\kappa(x, y) = e^{-g \langle x - y,\, x - y \rangle^{2}}$

Step 3: let $K_1 = K$, [formula missing]; take $u_j$ as the first row of [formula missing] and normalize it, $u_j = u_j / \|u_j\|$.

Step 4: repeat the update [formula missing], followed by $u_j = u_j / \|u_j\|$, until convergence.

Step 5: compute $\tau_j = K_j u_j$, [formula missing],

$K_{j+1} = (I - \tau_j \tau_j' / \|\tau_j\|^2)\, K_j\, (I - \tau_j \tau_j' / \|\tau_j\|^2)$

Step 6: compute $B = [\beta_1, \ldots, \beta_k]$ and $T = [\tau_1, \ldots, \tau_k]$, and obtain the coefficients $\alpha = B (T' K B)^{-1} T' Y$.
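The six steps above can be sketched with NumPy as follows. Several update formulas are not reproduced in this text, so the sketch fills them in from the commonly published dual (kernel) PLS formulation: the direction is initialized from the first column of a working copy of $Y$, updated as $u \leftarrow \hat{Y}\hat{Y}'K_j u$, and $\hat{Y}$ is deflated alongside $K_j$. Those filled-in steps, the function name `kpls_fit`, the standard Gaussian kernel form, and the sample data are all assumptions rather than the patent's own text.

```python
import numpy as np


def kpls_fit(K, Y, n_components, tol=1e-10, max_iter=500):
    """Dual (kernel) PLS sketch; returns alpha so that predictions are K_new @ alpha."""
    K = np.asarray(K, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(K), -1)
    Kj, Yh = K.copy(), Y.copy()          # Step 3: K_1 = K, working copy of Y (assumed)
    n = len(K)
    B, T = [], []
    for _ in range(n_components):
        u = Yh[:, 0].copy()              # initial direction from Yh (assumed)
        u /= np.linalg.norm(u)
        for _ in range(max_iter):        # Step 4: iterate until convergence
            u_new = Yh @ (Yh.T @ (Kj @ u))   # assumed update rule
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        tau = Kj @ u                     # Step 5: tau_j = K_j u_j
        tt = tau @ tau
        c = Yh.T @ tau / tt              # assumed output loading
        Yh = Yh - np.outer(tau, c)       # assumed deflation of the outputs
        P = np.eye(n) - np.outer(tau, tau) / tt
        Kj = P @ Kj @ P                  # K_{j+1} as given in Step 5
        B.append(u)
        T.append(tau)
    B = np.column_stack(B)
    T = np.column_stack(T)
    # Step 6: alpha = B (T' K B)^{-1} T' Y, using the original K and Y.
    return B @ np.linalg.solve(T.T @ K @ B, T.T @ Y)


# Illustrative, made-up inputs and targets, assumed already scaled to [0.1, 0.9].
X = np.array([[0.10, 0.18, 0.27], [0.18, 0.27, 0.39], [0.27, 0.39, 0.52],
              [0.39, 0.52, 0.66], [0.52, 0.66, 0.78]])
Y = np.array([0.39, 0.52, 0.66, 0.78, 0.90])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-1.0 * sq)                    # standard Gaussian kernel, g = 1 (assumed form)
alpha = kpls_fit(K, Y, n_components=2)
fitted = K @ alpha                       # predictions on the training inputs
```

A prediction for a new input is obtained by evaluating the kernel between that input and each training input and multiplying the resulting row by `alpha`. No centering of the kernel matrix is performed here, since none is described in the steps above.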

The present invention takes full account of the small-sample nature of software failure data. It uses kernel function theory as its main tool and, combined with the dynamic behavior exhibited by the software failure process, transforms the software reliability prediction problem into a regression estimation problem, which it solves with the kernel partial least squares regression algorithm.

The present invention uses the covariance information between the input and output variables to extract latent features of the data. It can handle cases in which the observed variables outnumber the observed samples and in which multicollinearity exists among the variables, so the model "overfitting" produced by modeling methods such as neural networks does not occur. In the new prediction method, as software failures continue to occur, the model parameters are adjusted automatically and continuously to follow the dynamic changes in the failure process, thereby achieving adaptive prediction of software reliability and effectively improving the adaptability of the software failure prediction model.

【Description of drawings】

FIG. 1 is a flow chart of the software failure time prediction method of the present invention.

【Detailed description of the embodiments】

1) Data normalization

When a regression estimation algorithm is used for learning and prediction, all input and output data must first be normalized to the interval [0.1, 0.9]. The conversion formula is:

$y = \frac{0.8}{\Delta} \times x + \left(0.9 - 0.8 \times \frac{x_{\max}}{\Delta}\right)$

where $y$ is the normalized value, $x$ is the actual value, $x_{\max}$ is the maximum value in the data set, $x_{\min}$ is the minimum value, and $\Delta = x_{\max} - x_{\min}$. After prediction, the following mapping converts the data back to actual values:

$x = \frac{y - 0.9}{0.8} \times \Delta + x_{\max}$
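The scaling to [0.1, 0.9] and the mapping back to actual values can be checked with a short sketch. The forward formula used here is simply the algebraic inverse of the back-mapping $x = \frac{y - 0.9}{0.8} \times \Delta + x_{\max}$; the function names and sample values are hypothetical.

```python
def fit_scaler(values):
    """Return (x_max, delta) for a data set, with delta = x_max - x_min."""
    x_max, x_min = max(values), min(values)
    delta = x_max - x_min
    if delta == 0:
        raise ValueError("all values are equal; cannot scale")
    return x_max, delta


def normalize(x, x_max, delta):
    """Map an actual value into [0.1, 0.9]."""
    return 0.9 + 0.8 * (x - x_max) / delta


def denormalize(y, x_max, delta):
    """Map a normalized value back to the actual scale."""
    return (y - 0.9) / 0.8 * delta + x_max


# Illustrative failure times (made-up values).
values = [120.0, 300.0, 520.0]
x_max, delta = fit_scaler(values)
scaled = [normalize(v, x_max, delta) for v in values]
restored = [denormalize(s, x_max, delta) for s in scaled]
```

The smallest value maps to 0.1 and the largest to 0.9, and applying the back-mapping recovers the original values up to floating-point rounding.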

2) Problem transformation

In the software reliability prediction model based on kernel function theory, the relationship between a software failure time and the $m$ failure times that precede it is modeled. The single-step prediction problem can then be stated as: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$, where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$. Similarly, taking the next input vector as input, the following failure time can be predicted, and later failure times can be predicted in the same way.
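Prediction over several future failures can be sketched as a loop around any single-step predictor. The formulas for the later inputs are not reproduced in this text, so the sketch assumes the common recursive scheme in which each predicted value is appended to the input window in place of the not-yet-observed failure time. The function `forecast` and the stand-in predictor `last_gap` are hypothetical and only illustrate the control flow.

```python
def forecast(history, m, steps, predict_one):
    """Predict `steps` future failure times by feeding predictions back in."""
    window = list(history[-m:])          # last m observed failure times
    predictions = []
    for _ in range(steps):
        nxt = predict_one(window)        # single-step prediction
        predictions.append(nxt)
        window = window[1:] + [nxt]      # slide the window forward (assumed scheme)
    return predictions


def last_gap(window):
    """Stand-in predictor for illustration only: repeat the last inter-failure gap."""
    return window[-1] + (window[-1] - window[-2])


history = [3.0, 7.0, 12.0, 20.0]         # made-up failure times
future = forecast(history, m=3, steps=2, predict_one=last_gap)
```

In practice `predict_one` would be the trained regression model; the stand-in is used here so that the example runs on its own.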

3) Selected kernel function and initial parameter values

4) Determining the kernel function parameter values

Kernel parameter selection is essentially an optimization problem, and a grid search is used to select the kernel function parameters. For example, when predicting with an SVM using a Gaussian kernel, two parameters must be determined: the penalty factor $C$ and the kernel parameter $g$. In the grid method, $C$ ranges over $[C_1, C_2]$ with step $C_s$, and $g$ ranges over $[g_1, g_2]$ with step $g_t$. A model is trained for each parameter pair $(C, g)$, and the pair that gives the best results is chosen as the model parameters.
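The grid search over $(C, g)$ can be sketched as two nested loops over evenly stepped ranges. The scoring function below is a made-up error surface standing in for "train a model with $(C, g)$ and measure its prediction error"; the names `frange`, `grid_search`, and `toy_error`, and the ranges used, are hypothetical.

```python
def frange(start, stop, step):
    """Evenly stepped values from start to stop inclusive."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_search(c_range, g_range, validation_error):
    """Try every (C, g) pair on the grid and keep the one with the lowest error."""
    best = None
    for C in frange(*c_range):
        for g in frange(*g_range):
            err = validation_error(C, g)
            if best is None or err < best[0]:
                best = (err, C, g)
    return best


def toy_error(C, g):
    """Made-up error surface with its minimum at C = 4, g = 0.5 (illustration only)."""
    return (C - 4.0) ** 2 + (g - 0.5) ** 2


# C in [1, 8] with step 1; g in [0.25, 2.0] with step 0.25.
best_err, best_C, best_g = grid_search((1.0, 8.0, 1.0), (0.25, 2.0, 0.25), toy_error)
```

For a model with only a kernel parameter, the same loop applies with the $C$ range reduced to a single value.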

5) Kernel partial least squares regression algorithm

Solving a kernel regression problem can be described as follows: given a set of vectors $x_i$ and their corresponding target values $t_i$ as input, find the relationship between $x_i$ and $t_i$ so that, for a new vector $x_*$, the corresponding target value $t_*$ can be predicted; each $t_i$ is an arbitrary real number. Assume that the relationship between $x$ and $t$ has the following form:

$t = y(x; w) = \sum_{i=1}^{M} w_i\, k(x, x_i) + w_0$

where $k(x, x_i)$ is the kernel function. The purpose of the kernel regression estimation algorithm is to find suitable weights $w_i$. The algorithm is as follows:

Step 3: let $K_1 = K$, [formula missing]; take $u_j$ as the first row of [formula missing] and normalize it, $u_j = u_j / \|u_j\|$.

Step 4: repeat the update [formula missing], followed by $u_j = u_j / \|u_j\|$, until convergence.

Step 5: compute $\tau_j = K_j u_j$, [formula missing].
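Once the weights are known, the kernel expansion $t = \sum_{i=1}^{M} w_i\, k(x, x_i) + w_0$ given earlier in this section is evaluated directly. The sketch below assumes the standard Gaussian kernel $\exp(-g\,\|x - x_i\|^2)$; the function name `predict` and the sample vectors and weights are hypothetical.

```python
import math


def predict(x, support, weights, w0=0.0, g=1.0):
    """Evaluate t = sum_i w_i * k(x, x_i) + w_0 with a Gaussian kernel (assumed form)."""
    def k(a, b):
        return math.exp(-g * sum((p - q) ** 2 for p, q in zip(a, b)))
    return w0 + sum(w * k(x, xi) for w, xi in zip(weights, support))


# Made-up training vectors and weights, for illustration only.
support = [[0.1, 0.2], [0.3, 0.5]]
weights = [2.0, -1.0]
t_star = predict([0.1, 0.2], support, weights, w0=0.5)
```

Each training vector contributes in proportion to its weight and to its similarity to the new input, so vectors far from $x_*$ have little influence on $t_*$.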

To provide a reasonable comparison and analysis of the proposed model, it was evaluated experimentally on 10 real failure data sets from different types of software, as shown in Table 1. These data sets describe the failure process of each software system, and each data point contains two observed statistics: cumulative execution time and cumulative number of failures. In the experiments, the training set covers the complete system failure process from the start of testing. To let the kernel function learn adequately, the first third of each data set is used as learning data, and the predictions for the remaining two thirds are compared with the real data.
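The evaluation protocol described above (learn on the first third, compare predictions on the rest) can be sketched as follows. The error measure shown is an average relative error in percent, which is one common choice and an assumption here, since this text does not define the AE value; the function names and numbers are hypothetical.

```python
def split_first_third(data):
    """Use the first third of a data set for learning and the rest for evaluation."""
    cut = len(data) // 3
    return data[:cut], data[cut:]


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / actual, in percent (assumed error measure)."""
    total = sum(abs(p - a) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up cumulative failure times, for illustration only.
data = [5.0, 9.0, 14.0, 22.0, 30.0, 41.0, 55.0, 70.0, 90.0]
train, test = split_first_third(data)
error = average_relative_error([10.0, 20.0], [11.0, 18.0])
```

With nine data points, three are used for learning and six are held out for comparison with the predictions.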

The table lists the AE values of each model on the ten data sets. Models 1-6 are, respectively, SRGM with Logistic TEF, SRGM with Rayleigh TEF, Delayed S-Shaped Model with Logistic TEF, Delayed S-Shaped Model with Rayleigh TEF, the G-O model, and the Yamada Delayed S-Shaped model. Model 7 is the method of the present invention, where a, b, c, and d denote the kernel function used: Gaussian Function, Linear Function, Polynomial Function, and Symmetric Triangle Function, respectively.

Table 1: AE values predicted by each model on 10 data sets

Conclusion: across data sets, prediction performance varies with the kernel function and with the regression estimation method used. The software reliability prediction model based on the kernel partial least squares regression algorithm can effectively improve the prediction performance and applicability of the model.

The above embodiment illustrates the present invention and does not limit it; any solution obtained by a simple variation of the present invention falls within its scope of protection.

Claims

1. A software failure time prediction method based on the kernel partial least squares regression algorithm, characterized in that it comprises the following steps:

(1) Observe and record the sequential software failure data set, and normalize all input and output data;

(2) Through reasonable abstraction and assumptions, transform the software failure time prediction problem into a function regression problem;

(3) Select the kernel function used for prediction, and give the initial values of its parameters;

(4) Select the number of failure data used for learning;

(5) Use the kernel partial least squares regression algorithm to perform learning and optimization on the different failure data sets;

(6) Finally, use the optimized parameters to predict new failure times.

2. The software failure time prediction method based on the kernel partial least squares regression algorithm as claimed in claim 1, characterized in that, in step (2), the software failure time prediction problem is transformed into a function regression problem using the following method:

Assume that the software failure times observed so far are $t_1, t_2, \ldots, t_n$, and let $t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$, so that $t_l$ follows a fixed but unknown conditional distribution function $F(t_l \mid t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$. Predicting $t_{k+1}$ given $t_1, t_2, \ldots, t_k$ then becomes: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$,

where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$.

3. The software failure time prediction method based on kernel principal component regression algorithm as claimed in claim 1, wherein the kernel function used in step (3) is a Gaussian kernel function with initial parameter value $g = 1$, and the number of failure data in step (4) is an integer between 5 and 8.

4. The software failure time prediction method based on the kernel partial least squares regression algorithm as claimed in claim 1, characterized in that, in step (5), the kernel partial least squares regression algorithm performs learning and optimization on the different failure data sets through the following process:

Step 1: the input data are the $k$-dimensional vectors $X = \{x_1, x_2, \ldots, x_l\}$, and the outputs are the vectors $y^s$, $s = 1, 2, \ldots, m$.

Step 2: construct the kernel matrix $K_{ij} = \kappa(x_i, x_j)$, $i, j = 1, 2, \ldots, l$, where $\kappa(x, y) = e^{-g \langle x - y,\, x - y \rangle^{2}}$.

Step 3: let $K_1 = K$, [formula missing]; take $u_j$ as the first row of [formula missing] and normalize it, $u_j = u_j / \|u_j\|$.

Step 4: repeat the update [formula missing], followed by $u_j = u_j / \|u_j\|$, until convergence.

Step 5: compute $\tau_j = K_j u_j$, [formula missing],

$K_{j+1} = (I - \tau_j \tau_j' / \|\tau_j\|^2)\, K_j\, (I - \tau_j \tau_j' / \|\tau_j\|^2)$

Step 6: compute $B = [\beta_1, \ldots, \beta_k]$ and $T = [\tau_1, \ldots, \tau_k]$, and obtain the coefficients $\alpha = B (T' K B)^{-1} T' Y$.