CN103106139B - Software failure time prediction method based on relevance vector regression estimation - Google Patents

Info

Publication number
CN103106139B
Authority
CN
China
Prior art keywords
software
software failure
sigma
relevance vector
failure time
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310013004.9A
Other languages
Chinese (zh)
Other versions
CN103106139A (en)
Inventor
蒋云良
楼俊钢
沈张果
范婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huzhou University
Original Assignee
Huzhou University

Abstract

The invention discloses a software failure time prediction method based on relevance vector regression estimation. Each software failure time is learned together with the m failure times preceding it, so as to capture the intrinsic dependence between failure times, and a software reliability prediction method based on the relevance vector machine is built on this basis. The method takes full account of the small-sample nature of software reliability prediction, and its use of kernel functions overcomes both the case where there are more observed variables than observation samples and multicollinearity among the variables, so the "over-fitting" produced by modeling methods such as neural networks does not occur. In the new prediction method, as software failures continue to occur, the model parameters are continually and automatically adjusted to the dynamic changes of the failure process, which realizes adaptive prediction of software reliability and effectively improves the adaptive capacity of the software failure prediction model.

Description

Software failure time prediction method based on relevance vector regression estimation
[technical field]
The present invention relates to a method for predicting software failure time data, either for the next failure or over a longer future period, during software reliability testing and evaluation.
[background technology]
Software reliability is the probability that software does not fail under specified conditions and within a specified time. Traditional approaches to reliability prediction rest on large-sample statistics and are prone to problems such as over-learning and poor applicability.
Statistical learning theory is built on a more solid theoretical foundation and provides a unified framework for learning from finite samples. It subsumes many existing methods and is expected to help solve problems that were previously hard to solve, such as neural network structure selection and local minima. The relevance vector machine (RVM) is a sparse Bayesian learning model proposed by Tipping in 2001. It has been applied successfully in many fields, such as object tracking, 3D pose estimation, 3D model recovery, power load forecasting, and channel equalization.
[summary of the invention]
The technical problem to be solved by the present invention is to provide a software failure time prediction method based on relevance vector regression estimation that realizes adaptive prediction of software reliability. To this end, the invention adopts the following technical solution, which comprises the steps of:
(1) first observing and recording the set of successive software failure times, and normalizing all input and output data;
(2) converting the software failure time prediction problem into a function regression problem through abstraction and assumptions;
(3) selecting the kernel function used for prediction and giving initial values for its parameters;
(4) selecting the number of failure data points used for learning;
(5) using the relevance vector regression estimation algorithm to learn and optimize over the different sets of failure times;
(6) finally, using the optimized parameters to predict the new failure time.
Further, the conversion of the software failure time prediction problem into a function regression problem in step (2) is carried out as follows:
Assume that the software failure times that have occurred are $t_1, t_2, \ldots, t_n$, and let $t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$; then $t_l$ follows a fixed but unknown conditional distribution function $F(t_l \mid t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$. Predicting $t_{k+1}$ when $t_1, t_2, \ldots, t_k$ are known becomes: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$, where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$.
The kernel function used in step (3) is the Gaussian kernel function $\kappa(x, y) = e^{-g\langle x-y,\, x-y\rangle^{2}}$, with initial parameter value $g = 1$.
The number of failure data points used for learning in step (4) is an integer between 5 and 8.
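The kernel of step (3) can be sketched in a few lines. This is a minimal illustration, assuming the standard Gaussian form exp(-g·‖x−y‖²); the exponent in the extracted formula is ambiguous, and if the squared inner product is meant literally the distance term would be squared once more. The names `gaussian_kernel` and `gram_matrix` are hypothetical, not from the patent.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], g: float = 1.0) -> float:
    """Gaussian kernel, assumed here to be exp(-g * ||x - y||^2).

    g = 1 is the initial parameter value named in the text.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def gram_matrix(points: Sequence[Sequence[float]], g: float = 1.0) -> List[List[float]]:
    """Kernel value for every pair of input vectors."""
    return [[gaussian_kernel(p, q, g) for q in points] for p in points]
```

A larger `g` makes the kernel more local, so each training vector influences only nearby inputs; this is the parameter that the later tuning step adjusts.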
Further, the learning and optimization over the different sets of failure times with the relevance vector regression estimation algorithm in step (5) comprises the following process:
(5.1) Given a set of vectors $\{x_i\}_{i=1}^{N}$ and the corresponding target values $\{t_i\}_{i=1}^{N}$ as input, assume that the relationship between $x$ and $t$ satisfies:
$$p(t_i) = \mathcal{N}(t_i \mid y(x_i; w), \sigma^2)$$
(5.2) Let the probability distribution of $t$ be:
$$p(t \mid w, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(t_i \mid y(x_i; w), \sigma^2) = (2\pi\sigma^2)^{-\frac{N}{2}} \exp\left(-\frac{\|t - \Phi w\|^2}{2\sigma^2}\right)$$
where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$, $\phi(x_n) = [1, k(x_n, x_1), k(x_n, x_2), \ldots, k(x_n, x_N)]^T$, and
$w = [w_0, w_1, \ldots, w_N]^T$.
(5.3) Define a prior probability distribution for each weight $w_i$:
$$p(w \mid \alpha) = \prod_{i=0}^{N} \sqrt{\frac{\alpha_i}{2\pi}} \exp\left(-\frac{\alpha_i w_i^2}{2}\right)$$
where $\alpha_i$ is the hyperparameter that determines the prior distribution of $w_i$, and
$\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_N)$.
(5.4) Compute the posterior distribution of the unknowns:
$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)}$$
(5.5) Integrating and simplifying gives:
$$p(w \mid t, \alpha, \sigma^2) = (2\pi)^{-\frac{N+1}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{(w - \mu)^T \Sigma^{-1} (w - \mu)}{2}\right\},$$
$$p(t \mid \alpha, \sigma^2) = (2\pi)^{-\frac{N}{2}} |\Omega|^{-\frac{1}{2}} \exp\left\{-\frac{t^T \Omega^{-1} t}{2}\right\}$$
where $\mu = \sigma^{-2} \Sigma \Phi^T t$, $\Sigma = (A + \sigma^{-2} \Phi^T \Phi)^{-1}$, $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$, and $\Omega = \sigma^2 I + \Phi A^{-1} \Phi^T$.
(5.6) Compute an approximate solution for $p(t_* \mid t)$:
$$(\alpha_{MP}, \sigma_{MP}^2) = \arg\max_{\alpha, \sigma^2} p(\alpha, \sigma^2 \mid t), \qquad p(t_* \mid t) \approx \int p(t_* \mid w, \alpha_{MP}, \sigma_{MP}^2)\, p(w \mid t, \alpha_{MP}, \sigma_{MP}^2)\, dw$$
(5.7) Solve for $\alpha_{MP}$ and $\sigma_{MP}^2$ iteratively with the following formulas:
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\|t - \Phi\mu\|^2}{N - \sum_{i=0}^{N} \gamma_i}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii}.$$
With the above technical solution, the present invention uses the RVM to learn each software failure time together with the m failure times preceding it, capturing the intrinsic dependence between failure times, and builds a software reliability prediction method based on the relevance vector machine. Through the use of kernel functions, the software reliability prediction problem is converted into a regression estimation problem, which is solved with the relevance vector regression estimation algorithm. In the new prediction method, as software failures continue to occur, the model parameters are continually and automatically adjusted to the dynamic changes of the failure process, thereby realizing adaptive prediction of software reliability.
[description of the drawings]
Fig. 1 is a flow chart of the software failure time prediction method of the invention.
[detailed description of the invention]
1) Data normalization
Before a regression estimation algorithm is used for learning and prediction, all input and output data must first be normalized to the interval [0.1, 0.9]. The conversion formula is $y = \frac{0.8}{\Delta} x + \left(0.9 - 0.8 \times \frac{x_{max}}{\Delta}\right)$, where $y$ is the normalized value, $x$ is the actual value, $x_{max}$ is the maximum of the data set, $x_{min}$ is the minimum, and $\Delta = x_{max} - x_{min}$. After prediction, the data are mapped back to actual values with the inverse mapping $x = \frac{(y - 0.9)\Delta}{0.8} + x_{max}$.
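The normalization step can be illustrated with a short sketch. It applies the conversion formula stated above; the inverse mapping is an assumption obtained by solving that formula for x, and the function names `normalize` and `denormalize` are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize(data: Sequence[float]) -> Tuple[List[float], float, float]:
    """Map data linearly onto [0.1, 0.9]; also return (x_min, x_max)."""
    x_min, x_max = min(data), max(data)
    delta = x_max - x_min
    if delta == 0:
        raise ValueError("data must contain at least two distinct values")
    # y = (0.8 / delta) * x + (0.9 - 0.8 * x_max / delta)
    scaled = [0.8 / delta * x + (0.9 - 0.8 * x_max / delta) for x in data]
    return scaled, x_min, x_max


def denormalize(y: float, x_min: float, x_max: float) -> float:
    """Inverse of normalize for a single value (assumed inverse mapping)."""
    delta = x_max - x_min
    return (y - 0.9) * delta / 0.8 + x_max
```

Keeping `x_min` and `x_max` from the training data is what allows a predicted value, which lives on the normalized scale, to be mapped back to an actual failure time.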
2) Problem conversion
Assume that the software failure times that have occurred are $t_1, t_2, \ldots, t_n$, and let $t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$; then $t_l$ follows a fixed but unknown conditional distribution function $F(t_l \mid t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$. Learning the software failure time data with the RVM captures the intrinsic dependence between failure times. The input of the RVM is the $m$-dimensional vector $[t_{l-m}, t_{l-m+1}, \ldots, t_{l-1}]$ and its output is $t_l$, so the overall input sequence of the RVM is $t_1, t_2, \ldots, t_n, \ldots$
and the output sequence is $t_{m+1}, t_{m+2}, \ldots, t_n, t_{n+1}, \ldots$.
If the failure time sequence used for RVM learning is $t_1, t_2, \ldots, t_k$ ($k > m$), then predicting $t_{k+1}$ when $t_1, t_2, \ldots, t_k$ are known becomes: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$, where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$. Taking the predicted value as part of the next input, the following failure time can then be predicted, and later failure times can be obtained in the same way.
The mean of the predicted value is given by:
$$\hat{t}_{k+1} = \sum_{i=1}^{m} w_i K(T_{i+1}, T_i) + w_0$$
The probabilistic predictive distribution is:
$$p(t_{k+1} \mid T) \approx \mathcal{N}(t_{k+1} \mid y_{k+1}, \sigma_{k+1}^2)$$
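The construction of the training pairs can be sketched as a sliding window over the failure time sequence. This is a minimal illustration, assuming each input holds the m failure times immediately preceding its target; the function name `make_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    times: Sequence[float], m: int
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Build the k - m training pairs and the input for the next prediction.

    Returns (inputs, targets, next_input), where inputs[i] holds the m
    failure times preceding targets[i], and next_input holds the last m
    failure times, used to predict the failure after the final one.
    """
    k = len(times)
    if not 0 < m < k:
        raise ValueError("need 0 < m < len(times)")
    inputs = [list(times[i:i + m]) for i in range(k - m)]
    targets = [times[i + m] for i in range(k - m)]
    next_input = list(times[k - m:])
    return inputs, targets, next_input
```

With seven failure times and m = 3 this yields four training pairs, matching the count k − m given in the text.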
3) Select the kernel function used for prediction and give initial values for its parameters
4) Determine the values of the kernel function parameters
Kernel parameter selection is essentially an optimization problem, and grid search is used to select the kernel parameters. For example, when predicting with an SVM that uses a Gaussian kernel, two parameters must be determined: the penalty factor $C$ and the kernel parameter $g$. In the grid method, $C \in [C_1, C_2]$ varies with step $C_s$ and $g \in [g_1, g_2]$ varies with step $g_t$; a model is trained for every parameter pair $(C, g)$, and the pair that performs best is chosen as the model parameters.
5) Relevance vector regression estimation algorithm
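The grid search described here can be sketched as two nested loops over the parameter ranges. This is only an outline: the `error` callback, which would train a model for one (C, g) pair and return its validation error, is a hypothetical placeholder supplied by the caller, and the names `grid` and `grid_search` are illustrative.

```python
import math
from typing import Callable, List, Optional, Tuple


def grid(start: float, stop: float, step: float) -> List[float]:
    """Values start, start + step, ... up to and including stop."""
    count = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [start + i * step for i in range(count)]


def grid_search(
    error: Callable[[float, float], float],
    c_range: Tuple[float, float, float],
    g_range: Tuple[float, float, float],
) -> Tuple[float, float, float]:
    """Return (best_error, C, g); each range is (low, high, step)."""
    best: Optional[Tuple[float, float, float]] = None
    for c in grid(*c_range):
        for g in grid(*g_range):
            err = error(c, g)  # train and evaluate a model for this pair
            if best is None or err < best[0]:
                best = (err, c, g)
    if best is None:
        raise ValueError("empty grid")
    return best
```

The cost grows with the product of the two grid sizes, so coarse steps followed by a finer search around the best pair is a common refinement.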
Solving a regression problem with the RVM can be described as follows: given a set of vectors $\{x_i\}_{i=1}^{N}$ and the corresponding target values $\{t_i\}_{i=1}^{N}$ as input, find the relationship between $x_i$ and $t_i$ so that, for a new vector $x_*$, the corresponding target value $t_*$ can be predicted; $t_i$ may be any real number. The relationship between $x$ and $t$ satisfies:
$$p(t_i) = \mathcal{N}(t_i \mid y(x_i; w), \sigma^2)$$
It is reasonable to assume that the $t_i$ are mutually independent random variables.
Given $w$ and $\sigma^2$, the probability distribution of $t$ is
$$p(t \mid w, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(t_i \mid y(x_i; w), \sigma^2) = (2\pi\sigma^2)^{-\frac{N}{2}} \exp\left(-\frac{\|t - \Phi w\|^2}{2\sigma^2}\right)$$
where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$ and $\phi(x_n) = [1, k(x_n, x_1), k(x_n, x_2), \ldots, k(x_n, x_N)]^T$;
$w = [w_0, w_1, \ldots, w_N]^T$. A prior probability distribution is defined for each weight $w_i$:
$$p(w \mid \alpha) = \prod_{i=0}^{N} \sqrt{\frac{\alpha_i}{2\pi}} \exp\left(-\frac{\alpha_i w_i^2}{2}\right)$$
where $\alpha_i$ is the hyperparameter that determines the prior distribution of $w_i$, and $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_N)$.
From the prior distribution of the weights and the likelihood of the sample set, the posterior distribution of the unknowns is obtained with Bayes' formula:
$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)}$$
Therefore, given a new vector $x_*$, the predictive probability distribution of $t_*$ is:
$$p(t_* \mid t) = \int p(t_* \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2 \mid t)\, dw\, d\alpha\, d\sigma^2,$$
$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t)$$
Thus,
$$p(w \mid t, \alpha, \sigma^2) = \frac{p(w, \alpha, \sigma^2 \mid t)}{p(\alpha, \sigma^2 \mid t)} = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{\int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw}$$
In the expression above, $p(t \mid w, \sigma^2)$ and $p(w \mid \alpha)$ are both products of Gaussian functions; integrating and simplifying gives:
$$p(w \mid t, \alpha, \sigma^2) = (2\pi)^{-\frac{N+1}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{(w - \mu)^T \Sigma^{-1} (w - \mu)}{2}\right\},$$
$$p(t \mid \alpha, \sigma^2) = (2\pi)^{-\frac{N}{2}} |\Omega|^{-\frac{1}{2}} \exp\left\{-\frac{t^T \Omega^{-1} t}{2}\right\}$$
where $\mu = \sigma^{-2} \Sigma \Phi^T t$, $\Sigma = (A + \sigma^{-2} \Phi^T \Phi)^{-1}$, $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$, and $\Omega = \sigma^2 I + \Phi A^{-1} \Phi^T$. An approximate solution for $p(t_* \mid t)$ can then be found:
$$(\alpha_{MP}, \sigma_{MP}^2) = \arg\max_{\alpha, \sigma^2} p(\alpha, \sigma^2 \mid t), \qquad p(t_* \mid t) \approx \int p(t_* \mid w, \alpha_{MP}, \sigma_{MP}^2)\, p(w \mid t, \alpha_{MP}, \sigma_{MP}^2)\, dw$$
Both factors in the integrand are Gaussian, so the integral evaluates to:
$$p(t_* \mid t) \approx \mathcal{N}(t_* \mid y_*, \sigma_*^2), \qquad y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*),$$
$$\phi(x_*) = [1, k(x_*, x_1), k(x_*, x_2), \ldots, k(x_*, x_N)]^T$$
Finally, it remains to solve for $\alpha_{MP}$ and $\sigma_{MP}^2$:
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\|t - \Phi\mu\|^2}{N - \sum_{i=0}^{N} \gamma_i}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii},$$
where $\Sigma_{ii}$ is the $i$-th diagonal element of $\Sigma$. Starting from initial guesses for $\alpha$ and $\sigma^2$ and repeatedly applying the updates above, the values approach $\alpha_{MP}$ and $\sigma_{MP}^2$.
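The re-estimation loop and the predictive distribution can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the Gaussian kernel is assumed to be exp(-g·‖x−y‖²); the clipping constants, the initial guesses, the fixed iteration count, and all function names are hypothetical; and the small Gauss-Jordan inverse is only adequate for the small sample sizes discussed here.

```python
import math
from typing import List, Sequence, Tuple

ALPHA_MIN, ALPHA_MAX, SIGMA2_MIN = 1e-6, 1e9, 1e-6  # numerical guards (assumed)
Matrix = List[List[float]]


def kernel(x: Sequence[float], y: Sequence[float], g: float) -> float:
    """Gaussian kernel, assumed here to be exp(-g * ||x - y||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))


def phi(x: Sequence[float], train: Sequence[Sequence[float]], g: float) -> List[float]:
    """Basis vector [1, k(x, x_1), ..., k(x, x_N)]."""
    return [1.0] + [kernel(x, xi, g) for xi in train]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (fine for small N)."""
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def posterior(design: Matrix, ts: Sequence[float], alpha: Sequence[float],
              sigma2: float) -> Tuple[List[float], Matrix]:
    """Sigma = (A + sigma^-2 Phi^T Phi)^-1 and mu = sigma^-2 Sigma Phi^T t."""
    n, m = len(design), len(alpha)
    beta = 1.0 / sigma2
    prec = [[beta * sum(design[r][i] * design[r][j] for r in range(n))
             + (alpha[i] if i == j else 0.0) for j in range(m)] for i in range(m)]
    cov = invert(prec)
    phi_t = [sum(design[r][i] * ts[r] for r in range(n)) for i in range(m)]
    mu = [beta * sum(cov[i][j] * phi_t[j] for j in range(m)) for i in range(m)]
    return mu, cov


def fit_rvm(xs: Sequence[Sequence[float]], ts: Sequence[float], g: float = 1.0,
            iters: int = 50) -> Tuple[List[float], Matrix, float]:
    """Iterate the alpha and sigma^2 updates; return (mu, Sigma, sigma^2)."""
    n = len(xs)
    design = [phi(x, xs, g) for x in xs]           # N x (N+1) matrix Phi
    alpha = [1.0] * (n + 1)                        # initial guess for alpha
    mean_t = sum(ts) / n
    sigma2 = max(SIGMA2_MIN, 0.1 * sum((t - mean_t) ** 2 for t in ts) / n)
    for _ in range(iters):
        mu, cov = posterior(design, ts, alpha, sigma2)
        gamma = [1.0 - a * cov[i][i] for i, a in enumerate(alpha)]
        # alpha_i = gamma_i / mu_i^2, clipped; near-zero weights are pinned
        alpha = [ALPHA_MAX if (gm <= 0.0 or w * w < 1e-12)
                 else min(ALPHA_MAX, max(ALPHA_MIN, gm / (w * w)))
                 for gm, w in zip(gamma, mu)]
        resid = sum((t - sum(p * w for p, w in zip(row, mu))) ** 2
                    for row, t in zip(design, ts))
        sigma2 = max(SIGMA2_MIN, resid / max(n - sum(gamma), 1e-6))
    mu, cov = posterior(design, ts, alpha, sigma2)
    return mu, cov, sigma2


def predict(x_new: Sequence[float], xs: Sequence[Sequence[float]], mu: Sequence[float],
            cov: Matrix, sigma2: float, g: float = 1.0) -> Tuple[float, float]:
    """Predictive mean mu^T phi(x*) and variance sigma^2 + phi^T Sigma phi."""
    f = phi(x_new, xs, g)
    mean = sum(w * v for w, v in zip(mu, f))
    quad = sum(f[i] * cov[i][j] * f[j] for i in range(len(f)) for j in range(len(f)))
    return mean, sigma2 + quad
```

A typical call would be `mu, cov, s2 = fit_rvm(inputs, targets, g)` followed by `predict(next_input, inputs, mu, cov, s2, g)` on normalized data. Weights whose α reaches the upper clip contribute almost nothing to the prediction, which is how the model becomes sparse; a fuller implementation would remove those basis functions outright and test for convergence instead of running a fixed number of iterations.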
To provide a sound comparison and analysis of the model, 10 real failure data sets from different types of software were used to evaluate the proposed model experimentally, as shown in Table 2. These data sets describe the failure process of each software system, and each data point comprises two observed statistics: cumulative execution time and cumulative failure count. In the experiments, each data set covers the complete system failure process from the start of testing. So that the kernel function can learn adequately, the first one-third of every data set is used as learning data, and the remaining two-thirds are predicted and compared with the real data.
Table 1 lists the AE value of each model on the ten data sets. Models 1-6 are, respectively, SRGM with Logistic TEF, SRGM with Rayleigh TEF, Delayed S-Shaped Model with Logistic TEF, Delayed S-Shaped Model with Rayleigh TEF, the G-O model, and Yamada Delayed S-Shaped. Model 7 is the method of the present invention, where a, b, c, and d denote the kernel function used: Gaussian function, linear function, polynomial function, and symmetric triangle function, respectively.
Table 1: AE values of each model's predictions on the 10 data sets
Conclusion: across the different data sets, prediction performance varies with the kernel function and the regression estimation method used; a software reliability prediction model based on the relevance vector regression estimation algorithm can effectively improve the prediction performance and applicability of the model.

Claims (3)

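1. A software failure time prediction method based on relevance vector regression estimation, characterized in that it comprises the steps of:
(1) first observing and recording the set of successive software failure times, and normalizing all input and output data;
(2) converting the software failure time prediction problem into a function regression problem through abstraction and assumptions;
(3) selecting the kernel function used for prediction and giving initial values for its parameters;
(4) selecting the number of failure data points used for learning;
(5) using the relevance vector regression estimation algorithm to learn and optimize over the different sets of failure times;
(6) finally, using the optimized parameters to predict the new failure time;
wherein the learning and optimization over the different sets of failure times with the relevance vector regression estimation algorithm in step (5) comprises the following process:
(5.1) given a set of vectors $\{x_i\}_{i=1}^{N}$ and the corresponding target values $\{t_i\}_{i=1}^{N}$ as input, assuming that the relationship between $x$ and $t$ satisfies:
$$p(t_i) = \mathcal{N}(t_i \mid y(x_i; w), \sigma^2)$$
(5.2) letting the probability distribution of $t$ be:
$$p(t \mid w, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(t_i \mid y(x_i; w), \sigma^2) = (2\pi\sigma^2)^{-\frac{N}{2}} \exp\left(-\frac{\|t - \Phi w\|^2}{2\sigma^2}\right)$$
where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$, $\phi(x_n) = [1, k(x_n, x_1), k(x_n, x_2), \ldots, k(x_n, x_N)]^T$, and
$w = [w_0, w_1, \ldots, w_N]^T$;
(5.3) defining a prior probability distribution for each weight $w_i$:
$$p(w \mid \alpha) = \prod_{i=0}^{N} \sqrt{\frac{\alpha_i}{2\pi}} \exp\left(-\frac{\alpha_i w_i^2}{2}\right)$$
where $\alpha_i$ is the hyperparameter that determines the prior distribution of $w_i$, and
$\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_N)$;
(5.4) computing the posterior distribution of the unknowns:
$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)}$$
(5.5) integrating and simplifying to obtain:
$$p(w \mid t, \alpha, \sigma^2) = (2\pi)^{-\frac{N+1}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{(w - \mu)^T \Sigma^{-1} (w - \mu)}{2}\right\},$$
$$p(t \mid \alpha, \sigma^2) = (2\pi)^{-\frac{N}{2}} |\Omega|^{-\frac{1}{2}} \exp\left\{-\frac{t^T \Omega^{-1} t}{2}\right\}$$
where $\mu = \sigma^{-2} \Sigma \Phi^T t$, $\Sigma = (A + \sigma^{-2} \Phi^T \Phi)^{-1}$, $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$, and $\Omega = \sigma^2 I + \Phi A^{-1} \Phi^T$;
(5.6) computing an approximate solution for $p(t_* \mid t)$:
$$(\alpha_{MP}, \sigma_{MP}^2) = \arg\max_{\alpha, \sigma^2} p(\alpha, \sigma^2 \mid t), \qquad p(t_* \mid t) \approx \int p(t_* \mid w, \alpha_{MP}, \sigma_{MP}^2)\, p(w \mid t, \alpha_{MP}, \sigma_{MP}^2)\, dw$$
(5.7) solving for $\alpha_{MP}$ and $\sigma_{MP}^2$ iteratively with the following formulas:
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\|t - \Phi\mu\|^2}{N - \sum_{i=0}^{N} \gamma_i}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii}.$$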
2. The software failure time prediction method based on relevance vector regression estimation as claimed in claim 1, characterized in that the conversion of the software failure time prediction problem into a function regression problem in step (2) is carried out as follows:
Assume that the software failure times that have occurred are $t_1, t_2, \ldots, t_n$, and let $t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$; then $t_l$ follows a fixed but unknown conditional distribution function $F(t_l \mid t_{l-m}, t_{l-m+1}, \ldots, t_{l-1})$. Predicting $t_{k+1}$ when $t_1, t_2, \ldots, t_k$ are known becomes: given the $k-m$ observations $(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{k-m}, t_k)$ and the $(k-m+1)$-th input $T_{k-m+1}$, estimate the $(k-m+1)$-th output value $\hat{t}_{k+1}$, where $T_i$ denotes the $m$-dimensional vector $[t_i, t_{i+1}, \ldots, t_{m+i-1}]$.
3. The software failure time prediction method based on relevance vector regression estimation as claimed in claim 1, characterized in that the kernel function used in step (3) is the Gaussian kernel function $\kappa(x, y) = e^{-g\langle x-y,\, x-y\rangle^{2}}$, with initial parameter value $g = 1$; and the number of failure data points used for learning in step (4) is an integer between 5 and 8.
CN201310013004.9A 2013-01-14 2013-01-14 Software failure time prediction method based on relevance vector regression estimation Active CN103106139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310013004.9A CN103106139B (en) 2013-01-14 2013-01-14 Software failure time prediction method based on relevance vector regression estimation

Publications (2)

Publication Number Publication Date
CN103106139A CN103106139A (en) 2013-05-15
CN103106139B true CN103106139B (en) 2016-06-15

Family

ID=48314017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310013004.9A Active CN103106139B (en) 2013-01-14 2013-01-14 Software failure time prediction method based on relevance vector regression estimation

Country Status (1)

Country Link
CN (1) CN103106139B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111887A (en) * 2014-07-01 2014-10-22 Jiangsu University of Science and Technology Software fault prediction system and method based on Logistic model
CN105260304B (en) * 2015-10-19 2018-03-23 Huzhou University Software reliability prediction method based on QBGSA RVR

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1667587A (en) * 2005-04-11 2005-09-14 Beihang University Software reliability estimation method based on expanded Markov-Bayesian network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209575A1 (en) * 2011-02-11 2012-08-16 Ford Global Technologies, Llc Method and System for Model Validation for Dynamic Systems Using Bayesian Principal Component Analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
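Kernel function methods for software reliability prediction (软件可靠性预测的核函数方法); 楼俊钢 et al.; Computer Science (《计算机科学》); 2012-04-30; Vol. 39, No. 4; abstract, p. 145 right column line 4 to p. 147 right column 11th line from the bottom, Figs. 1-3 *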

Also Published As

Publication number Publication date
CN103106139A (en) 2013-05-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant