CN109460356B - Data fusion method for software fault prediction - Google Patents
Data fusion method for software fault prediction
- Publication number
- CN109460356B (application CN201811218891.2A)
- Authority
- CN
- China
- Prior art keywords
- software
- data
- evidence
- fault prediction
- prediction data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
Abstract
The invention discloses a data fusion method for software fault prediction. The method first judges, from the inherent data characteristics of two software fault prediction data sets, the degree to which they can be fused: it extracts the features of the fault data, judges the consistency of the data sets using D-S evidence theory, and then checks against a set threshold whether the fusion condition is met, thereby realizing fault prediction data fusion. The method makes it possible to fuse software fault prediction data from different sources, such as system joint debugging, third-party software testing, system operation, and historical data of similar software, for fault prediction. It can thereby enlarge the sample size of the software fault prediction data, shorten the acquisition time of the data, and improve the precision of software fault prediction.
Description
Technical Field
The invention relates to a data fusion technology, in particular to a data fusion method for software fault prediction.
Background
To make software failure prediction accurate, both a sufficiently long failure data acquisition time and a sufficiently large amount of prediction sample data are required.
A large amount of fault data can be collected over the whole life cycle of the software. This data generally carries no time labels, but it shares data characteristics such as defect type, severity level, root cause, occurrence position, and trigger condition with the fault prediction data collected while the system is running. If some method could judge whether the two types of data have the same failure cause, so that data with the same failure cause is fused and eliminated as duplicate while data with different failure causes is merged into the existing fault prediction data, the sample size could be enlarged and the data acquisition time shortened. The prior art, however, offers no method that solves this problem.
Disclosure of Invention
The invention aims to provide a data fusion method for software failure prediction.
The technical scheme for realizing the purpose of the invention is as follows: a data fusion method for software failure prediction comprises the following steps:
step 1, defining an identification framework and an identification feature set of software failure prediction data, and introducing a D-S evidence theory into a judgment process of consistency of the software failure prediction data set;
step 2, based on the distribution rule of the software fault prediction data set on the identification features, converting the problem determined by each identification feature credibility function into an F test problem of sample variance in mathematical statistics, and performing evidence synthesis calculation according to a synthesis rule of a D-S evidence theory;
step 3, setting a threshold appropriate to the field of the software system, and fusing the data set to be fused into the software failure prediction data set when the fusion degree reaches or exceeds the threshold, thereby realizing the software failure prediction data fusion.
Compared with the prior art, the invention has the following remarkable advantages:
(1) the invention can enrich the sample size of the software failure prediction data on the premise of not prolonging the acquisition time of the failure prediction data when the software runs;
(2) according to the method, the time labels can be added to the data collected in the software full life cycle stage fused into the software failure prediction data set, so that the software failure prediction is effectively supported;
(3) the invention can shorten the data acquisition time of software fault prediction and improve the efficiency of software fault prediction on the premise of equal prediction precision.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a data fusion method for software failure prediction according to the present invention.
FIG. 2 is a flow chart of an implementation of the software failure prediction data fusion method based on the D-S evidence theory.
Detailed Description
Based on a study of the principles of D-S evidence theory and of how software fault prediction data is distributed over its features, the invention defines the features of software fault prediction data as the identification features of D-S evidence theory, and defines the significance level of the sample variance test as the credibility function on a single identification feature, so that D-S evidence theory can be applied to decide whether data sets can be fused. Then, by slicing the software fault prediction data and extracting and comparing test case features, time tags are added to the full-life-cycle fault prediction data that is fused into the software fault prediction data set.
As shown in fig. 1, a data fusion method for software failure prediction includes the following steps:
step 1, constructing an identification framework and an identification feature set of software fault prediction data, and performing consistency judgment on the software fault prediction data set by using a D-S evidence theory; the specific method comprises the following steps:
step 1-1, defining an identification framework $\Theta = \{\text{fusible}, \text{non-fusible}\}$, with focal elements $A = \{\text{fusible}\}$ and $B = \{\text{non-fusible}\}$;
step 1-2, defining the identification feature set, whose $i$-th element is the data feature $M_i$;
step 1-3, letting $m_i$ denote the basic credibility function derived from data feature $M_i$, where the probability of fusibility on $m_i$ is $m_i(A) = a_i$, the probability of non-fusibility on $m_i$ is $m_i(B) = b_i$, and the probability of the remaining cases is $c_i = 1 - a_i - b_i$.
Step 2, based on the distribution rule of the software fault prediction data set on the identification features, converting the reliability function determination problem of each identification feature into an F test problem of sample variance in mathematical statistics, and performing evidence synthesis calculation according to a synthesis rule of a D-S evidence theory to obtain the fusion degree of the software fault prediction data set; the method specifically comprises the following steps:
step 2-1, for the $i$-th data feature $M_i$, assume that the data $X_i = \{X_{i1}, X_{i2}, \ldots, X_{im}\}$ and $Y_i = \{Y_{i1}, Y_{i2}, \ldots, Y_{in}\}$ are two independent samples from the same population and that the population follows a normal distribution. If the assumption holds, the two groups of data should have the same variance, and this is used to determine the credibility function on a single feature. When the samples $X$ and $Y$ follow normal distributions with equal population variances, the ratio of their sample variances $S_X^2$ and $S_Y^2$ follows an F distribution, i.e.
$$F = \frac{S_X^2}{S_Y^2} \sim F(m-1,\ n-1)$$
where $m$ and $n$ are the sample sizes of $X$ and $Y$, respectively. The sample variance ratio $S_X^2 / S_Y^2$ is used to determine the probability $r_i$ that the population variances $\sigma_X^2$ and $\sigma_Y^2$ of $X$ and $Y$ are equal, i.e. the significance level of the F test. The credibility that samples $X$ and $Y$ are fusible on feature $M_i$ is therefore: $m_i(A) = a_i = r_i$;
Step 2-2, constructing the synthesis rule for the case where the evidences do not conflict. In this case the D-S combination formula is applied directly to combine the evidences pairwise, giving:
$$k = m_i(A)\,m_j(B) + m_i(B)\,m_j(A)$$
$$m_{ij}(A) = \frac{m_i(A)\,m_j(\Theta) + m_i(\Theta)\,m_j(A) + m_i(A)\,m_j(A)}{1-k}$$
$$m_{ij}(B) = \frac{m_i(B)\,m_j(\Theta) + m_i(\Theta)\,m_j(B) + m_i(B)\,m_j(B)}{1-k}$$
$$m_{ij}(\Theta) = \frac{m_i(\Theta)\,m_j(\Theta)}{1-k}$$
where $m_i(\Theta) = c_i$, and $m_{ij}(A)$, $m_{ij}(B)$ and $m_{ij}(\Theta)$ are the probabilities with which the identification features $m_i$ and $m_j$ jointly support $A$, $B$ and the remaining cases, respectively;
simplifying gives:
$$k = a_i b_j + b_i a_j$$
$$a_{ij} = \frac{a_i c_j + c_i a_j + a_i a_j}{1-k},\quad b_{ij} = \frac{b_i c_j + c_i b_j + b_i b_j}{1-k},\quad c_{ij} = \frac{c_i c_j}{1-k}$$
wherein i ≥ 1 and k ≥ j
$$m_{ij}(A) = a_{ij};\quad m_{ij}(B) = b_{ij};\quad m_{ij}(\Theta) = c_{ij};$$
the basic credibility function $m_0$ under the action of all data features is obtained by applying this pairwise combination to the single-feature credibility functions $m_i$ in sequence, and
$$a_0 = m_0(A)$$
is the fusibility of $X$ and $Y$ inferred from all data features;
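The pairwise combination of step 2-2 can be sketched in Python as follows. Each basic credibility function is represented as a triple `(a, b, c)` standing for $m(A)$, $m(B)$ and $m(\Theta)$. The names `combine` and `fusibility`, the tuple representation, and the handling of total conflict ($k = 1$) are illustrative assumptions, not part of the text.

```python
from functools import reduce


def combine(mi, mj):
    """Combine two basic credibility functions given as (a, b, c) triples."""
    ai, bi, ci = mi
    aj, bj, cj = mj
    # Conflict coefficient: one source supports A while the other supports B.
    k = ai * bj + bi * aj
    if k >= 1.0:
        raise ValueError("evidences are in total conflict; the rule is undefined")
    norm = 1.0 - k
    a_ij = (ai * cj + ci * aj + ai * aj) / norm
    b_ij = (bi * cj + ci * bj + bi * bj) / norm
    c_ij = (ci * cj) / norm
    return (a_ij, b_ij, c_ij)


def fusibility(evidences):
    """Fold the pairwise rule over all features and return a0 = m0(A)."""
    m0 = reduce(combine, evidences)
    return m0[0]
```

Because the rule is applied through a left fold, `fusibility([m1, m2, m3, m4])` combines $m_1$ with $m_2$, then the result with $m_3$, and so on, which matches the "in sequence" wording of the text.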
Step 2-3, constructing the synthesis rule for the case where the evidences conflict. In this case the evidence is processed with a weighted average evidence combination method, as follows:
① Calculate the distance $d$ between evidences. Each evidence $m_i$ is viewed as a vector, i.e.
$$m_i = [m_i(A),\ m_i(B),\ m_i(\Theta)]$$
and the distance between $m_i$ and $m_j$ is
$$d_{ij} = \sqrt{\tfrac{1}{2}\left(\|m_i\|^2 + \|m_j\|^2 - 2\langle m_i, m_j\rangle\right)}$$
where $\langle m_i, m_j\rangle$ denotes the inner product of the vectors and $\|m_i\|^2 = \langle m_i, m_i\rangle$. The distance matrix $DM = (d_{ij})$ collects the pairwise distances between evidences.
② Calculate the pairwise similarity of the evidences. The similarity between evidences is defined as $S_{ij} = 1 - d_{ij}$, which gives the similarity matrix $SM = (S_{ij})$.
③ Calculate the credibility of the evidence. The degree of support of the other evidences for $m_i$ is
$$\mathrm{Sup}(m_i) = \sum_{j \ne i} S_{ij}$$
and the credibility $C(m_i)$ of evidence $m_i$ can then be expressed as
$$C(m_i) = \frac{\mathrm{Sup}(m_i)}{\sum_{j} \mathrm{Sup}(m_j)}$$
④ Take the weighted average of the evidence. With the credibility $C(m_i)$ as the weight of $m_i$, this gives
$$m = \sum_{i} C(m_i)\, m_i$$
The fusibility of $X$ and $Y$ is then
$$a_0 = m(A).$$
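The weighted average evidence combination of step 2-3 can be sketched as below. The distance, support and credibility formulas follow the usual form of this method with a plain vector inner product; where the text does not show a formula explicitly, the form used here is an assumption. The function name `weighted_average_evidence` and the equal-weight fallback for the degenerate case of zero total support are also hypothetical.

```python
import math


def weighted_average_evidence(evidences):
    """Return the credibility-weighted average of (a, b, c) evidence triples."""
    n = len(evidences)
    if n == 0:
        raise ValueError("at least one evidence is required")

    def distance(p, q):
        # ||p||^2 + ||q||^2 - 2<p, q> equals ||p - q||^2 for a plain inner product.
        return math.sqrt(0.5 * sum((u - v) ** 2 for u, v in zip(p, q)))

    # Support of evidence i: summed similarity (1 - distance) to all other evidences.
    support = [
        sum(1.0 - distance(evidences[i], evidences[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    total = sum(support)
    # Credibility: normalized support. Equal weights if there is no support at all.
    weights = [s / total for s in support] if total > 0 else [1.0 / n] * n
    return tuple(
        sum(w * m[t] for w, m in zip(weights, evidences)) for t in range(3)
    )
```

The first component of the returned triple is $a_0 = m(A)$. An evidence that disagrees with the majority receives low support and therefore a small weight, so it pulls the average less than a plain mean would.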
Step 3, setting a threshold appropriate to the field of the software system, and fusing the data set to be fused into the software failure prediction data set when the fusion degree reaches or exceeds the threshold, thereby realizing the software failure prediction data fusion.
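The threshold decision of step 3 can be sketched as follows. The function name `fuse_if_consistent` and the list representation of the data sets are hypothetical, and the default threshold of 0.95 is only the value used in the example later in the text. The sketch simply appends the candidate records and does not model the elimination of duplicate records mentioned in the background section.

```python
def fuse_if_consistent(prediction_set, candidate_set, a0, threshold=0.95):
    """Merge candidate_set into prediction_set when the fusion degree a0 meets the threshold.

    Returns the resulting data set and a flag telling whether fusion took place.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    if a0 >= threshold:
        return list(prediction_set) + list(candidate_set), True
    return list(prediction_set), False
```

The threshold is left as a parameter because the text sets it per application field rather than fixing one value.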
The present invention will be described in detail with reference to examples.
Examples
With reference to fig. 2, the invention discloses a software failure prediction data fusion method based on a D-S evidence theory, which comprises the following steps:
the first step is as follows: defining an identification framework and an identification feature set of software fault prediction data, and introducing a D-S evidence theory into a judgment process of consistency of the software fault prediction data set;
In engineering practice, software failure prediction data is generally described by attributes such as defect type and root cause. Analyzing the attributes common to general software failure prediction data and reliability-test software failure prediction data yields 4 typical attributes: M1 (defect type), M2 (root cause), M3 (severity level), and M4 (occurrence location). If two items of software failure prediction data agree on all 4 attribute descriptions, they are considered to be the same software failure prediction data and therefore fusible. M1 (defect type), M2 (root cause), M3 (severity level), and M4 (occurrence location) are accordingly defined as the identification features of the software failure prediction data, and the basic credibility function derived from data feature $M_i$ is defined as $m_i$, where $m_i(A) = a_i$, $m_i(B) = b_i$, and $c_i = 1 - a_i - b_i$.
The second step is that: based on the distribution rule of the software fault prediction data set on the identification features, converting the problem determined by the credibility function on each identification feature into an F test problem of sample variance in mathematical statistics, and performing evidence synthesis calculation according to the synthesis rule of the D-S evidence theory;
Statistical analysis of a large number of software failure prediction data samples shows that the distribution of software failure prediction data over a given identification feature is approximately normal. The general software failure prediction data set X and the reliability software failure prediction data set Y are both samples from the population Z. If the sample sizes of X and Y are large enough, their variances must be equal or approximately equal. In general, however, the sample sizes are not large, so if the data in the two samples are not completely or approximately consistent, their variances necessarily differ. To ensure that enough software failure prediction data inconsistent with the data in Y can be selected from the general data set X, the fusibility threshold is set to 0.95.
First, the credibility values of samples X and Y on each identification feature are calculated with the credibility function described above: $m_1(A) = r_1$, $m_2(A) = r_2$, $m_3(A) = r_3$, $m_4(A) = r_4$;
Secondly, $a_0$ is obtained either with the D-S combination formula or with the weighted average evidence combination method;
Thirdly, a threshold is set according to the field of the software system. When the fusion degree $a_0$ reaches or exceeds the threshold, the data in the software failure prediction data set X and the data in the software failure prediction data set Y are considered approximately consistent, and the software failure prediction data in X does not help to expand the sample size; otherwise, the data in X can be added to Y, which facilitates subsequent reliability testing and evaluation.
The third step: the data set to be fused is fused into the software failure prediction data set, so that the software failure prediction data fusion is realized, and the purposes of expanding the sample capacity of the software failure prediction data, shortening the acquisition time of the software failure prediction data and improving the software failure prediction precision are achieved.
Claims (3)
1. A data fusion method for software failure prediction is characterized by comprising the following steps:
step 1, constructing an identification framework and an identification feature set of software fault prediction data, and performing consistency judgment on the software fault prediction data set by using a D-S evidence theory;
step 2, based on the distribution rule of the software fault prediction data set on the identification features, converting the reliability function determination problem of each identification feature into an F test problem of sample variance in mathematical statistics, and performing evidence synthesis calculation according to a synthesis rule of a D-S evidence theory to obtain the fusion degree of the software fault prediction data set;
step 3, setting a threshold appropriate to the field of the software system, and fusing the data set to be fused into the software failure prediction data set when the fusion degree reaches or exceeds the threshold, thereby realizing the software failure prediction data fusion.
2. The data fusion method for software failure prediction according to claim 1, wherein step 1 specifically comprises:
step 1-1, defining an identification framework $\Theta = \{\text{fusible}, \text{non-fusible}\}$, with focal elements $A = \{\text{fusible}\}$ and $B = \{\text{non-fusible}\}$;
step 1-2, defining the identification feature set, whose $i$-th element is the data feature $M_i$;
step 1-3, letting $m_i(\cdot)$ denote the basic credibility function derived from the data features, where the probability of fusibility on $m_i(\cdot)$ is $m_i(A) = a_i$, the probability of non-fusibility on $m_i(\cdot)$ is $m_i(B) = b_i$, the probability of the remaining cases is $c_i = 1 - a_i - b_i$, and $i$ denotes the index.
3. The data fusion method for software failure prediction according to claim 2, wherein the step 2 is to convert the reliability function determination problem of each recognition feature into an F-test problem of sample variance in mathematical statistics based on the distribution rule of the software failure prediction data set on the recognition features, and perform evidence synthesis calculation according to the synthesis rule of the D-S evidence theory, specifically:
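step 2-1, for the $i$-th basic credibility function $m_i(\cdot)$, assume that the data $X_i = \{X_{i1}, X_{i2}, \ldots, X_{ip}\}$ and $Y_i = \{Y_{i1}, Y_{i2}, \ldots, Y_{iq}\}$ are two independent samples from the same normally distributed population. If the assumption holds, the two groups of data should have the same variance, and this is used to determine the credibility function on a single feature. As is known from mathematical statistics, when the samples $X_i$ and $Y_i$ both follow normal distributions with equal population variances, the ratio of their sample variances $S_{X_i}^2$ and $S_{Y_i}^2$ follows an F distribution, i.e.
$$F = \frac{S_{X_i}^2}{S_{Y_i}^2} \sim F(p-1,\ q-1)$$
where $p$ and $q$ are the sample sizes of $X_i$ and $Y_i$, respectively. The sample variance ratio $S_{X_i}^2 / S_{Y_i}^2$ is then used to determine the probability $r_i$ that the population variances $\sigma_{X_i}^2$ and $\sigma_{Y_i}^2$ of $X_i$ and $Y_i$ are equal, i.e. the significance level of the F test. The basic credibility function $m_i(\cdot)$ of samples $X_i$ and $Y_i$ is thus obtained as: $m_i(A) = a_i = r_i$;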
Step 2-2, constructing the synthesis rule for the case where the evidences do not conflict. In this case the D-S combination formula is applied directly to combine the evidences pairwise, giving:
$$k = m_i(A)\,m_j(B) + m_i(B)\,m_j(A)$$
$$m_{ij}(A) = \frac{m_i(A)\,m_j(\Theta) + m_i(\Theta)\,m_j(A) + m_i(A)\,m_j(A)}{1-k}$$
$$m_{ij}(B) = \frac{m_i(B)\,m_j(\Theta) + m_i(\Theta)\,m_j(B) + m_i(B)\,m_j(B)}{1-k}$$
$$m_{ij}(\Theta) = \frac{m_i(\Theta)\,m_j(\Theta)}{1-k}$$
where $m_i(\Theta) = c_i$; $m_{ij}(A)$, $m_{ij}(B)$ and $m_{ij}(\Theta)$ are the probabilities with which the basic credibility functions $m_i(\cdot)$ and $m_j(\cdot)$ jointly support $A$, $B$ and the remaining cases, respectively; and $i$, $j$ denote indices;
simplifying gives:
$$k = a_i b_j + b_i a_j$$
$$a_{ij} = \frac{a_i c_j + c_i a_j + a_i a_j}{1-k},\quad b_{ij} = \frac{b_i c_j + c_i b_j + b_i b_j}{1-k},\quad c_{ij} = \frac{c_i c_j}{1-k}$$
wherein i ≥ 1 and j ≥ k;
$$m_{ij}(A) = a_{ij};\quad m_{ij}(B) = b_{ij};\quad m_{ij}(\Theta) = c_{ij};$$
the basic credibility function $m_0(\cdot)$ under the action of all data features is obtained by applying this pairwise combination to the single-feature credibility functions $m_i(\cdot)$ in sequence, and
$$a_0 = m_0(A)$$
is the fusion degree of $X_i$ and $Y_i$ inferred from all data features;
Step 2-3, constructing the synthesis rule for the case where the evidences conflict. In this case the evidence is processed with a weighted average evidence combination method, as follows:
① Calculate the distance $d$ between evidences. Each basic credibility function $m_i(\cdot)$ is viewed as a vector, i.e.
$$m_i = [m_i(A),\ m_i(B),\ m_i(\Theta)]$$
and the distance between $m_i$ and $m_j$ is
$$d_{ij} = \sqrt{\tfrac{1}{2}\left(\|m_i\|^2 + \|m_j\|^2 - 2\langle m_i, m_j\rangle\right)}$$
where $\langle m_i, m_j\rangle$ denotes the inner product of the vectors and $\|m_i\|^2 = \langle m_i, m_i\rangle$. The distance matrix $D_M = (d_{ij})$ represents the pairwise distances between evidences.
② Calculate the pairwise similarity of the evidences. The similarity between evidences is defined as $S_{ij} = 1 - d_{ij}$, which gives the similarity matrix $S_M = (S_{ij})$.
③ Calculate the credibility of the evidence. The degree of support of the other evidences for $m_i$ is
$$\mathrm{Sup}(m_i) = \sum_{j=1,\ j \ne i}^{n} S_{ij}$$
where $n$ represents the number of rows of the matrix;
the credibility $C(m_i)$ of the basic credibility function $m_i(\cdot)$ can then be expressed as
$$C(m_i) = \frac{\mathrm{Sup}(m_i)}{\sum_{j=1}^{n} \mathrm{Sup}(m_j)}$$
④ Take the weighted average of the evidence. With the credibility $C(m_i)$ as the weight of $m_i$, this gives
$$m = \sum_{i=1}^{n} C(m_i)\, m_i$$
The fusion degree of $X_i$ and $Y_i$ is then
$$a_0 = m(A).$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811218891.2A CN109460356B (en) | 2018-10-19 | 2018-10-19 | Data fusion method for software fault prediction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811218891.2A CN109460356B (en) | 2018-10-19 | 2018-10-19 | Data fusion method for software fault prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109460356A CN109460356A (en) | 2019-03-12 |
CN109460356B (en) | 2021-12-28 |
Family
ID=65607917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811218891.2A Active CN109460356B (en) | 2018-10-19 | 2018-10-19 | Data fusion method for software fault prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109460356B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102721941A (en) * | 2012-06-20 | 2012-10-10 | 北京航空航天大学 | Method for fusing and diagnosing fault information of circuit of electric meter on basis of SOM (self-organized mapping) and D-S (Dempster-Shafer) theories |
CN103557884A (en) * | 2013-09-27 | 2014-02-05 | 杭州银江智慧城市技术集团有限公司 | Multi-sensor data fusion early warning method for monitoring electric transmission line tower |
CN103984623A (en) * | 2014-04-28 | 2014-08-13 | 天津大学 | Software security risk assessment method based on defect detection |
CN106198749A (en) * | 2015-05-08 | 2016-12-07 | 中国科学院声学研究所 | A kind of data fusion method of multiple sensor based on Metal Crack monitoring |
CN107222322A (en) * | 2016-03-22 | 2017-09-29 | 中国移动通信集团陕西有限公司 | A kind of communication failure diagnostic method and device |
CN107797931A (en) * | 2017-11-13 | 2018-03-13 | 长春长光精密仪器集团有限公司 | A kind of method for evaluating software quality and system based on second evaluation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7383239B2 (en) * | 2003-04-30 | 2008-06-03 | Genworth Financial, Inc. | System and process for a fusion classification for insurance underwriting suitable for use by an automated system |
Non-Patent Citations (4)
Title |
---|
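Data-fusion method for flaws detection based on reliability evaluation; Li Wang et al.; 2017 36th Chinese Control Conference (CCC); 2017-07-28; full text * |
Three-layer information fusion for braking system fault diagnosis; Shaojin Wang et al.; 2012 5th International Conference on BioMedical Engineering and Informatics; 2012-10-18; full text * |
Fusion method based on improved PSO and D-S and its application in intelligent diagnosis (基于改进PSO和D-S的融合方法及其在智能诊断上的应用); Lü Pengliang et al.; Computer Integrated Manufacturing Systems (计算机集成制造系统); 2014-09-23; Vol. 21, No. 8; full text * |
Information fusion methods for fault diagnosis (故障诊断的信息融合方法); Zhu Daqi et al.; Control and Decision (控制与决策); 2007-12-15; Vol. 22, No. 12; full text * |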
Also Published As
Publication number | Publication date |
---|---|
CN109460356A (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110704231A (en) | Fault processing method and device | |
JP7116103B2 (en) | Method, Apparatus, and Device for Predicting Optical Module Failure | |
CN109977895B (en) | Wild animal video target detection method based on multi-feature map fusion | |
CN112257963B (en) | Defect prediction method and device based on spaceflight software defect data distribution outlier | |
CN114978037B (en) | Solar cell performance data monitoring method and system | |
CN110544047A (en) | Bad data identification method | |
CN111221807A (en) | Cloud service-oriented industrial equipment big data quality testing method and architecture | |
CN111274084A (en) | Fault diagnosis method, device, equipment and computer readable storage medium | |
CN114325405A (en) | Battery pack consistency analysis method, modeling method, device, equipment and medium | |
CN111191720B (en) | Service scene identification method and device and electronic equipment | |
CN113536066A (en) | Data anomaly detection algorithm determination method and device and computer equipment | |
CN117076258A (en) | Remote monitoring method and system based on Internet cloud | |
CN109670549B (en) | Data screening method and device for thermal power generating unit and computer equipment | |
CN109460356B (en) | Data fusion method for software fault prediction | |
CN113033624A (en) | Industrial image fault diagnosis method based on federal learning | |
CN111614504A (en) | Power grid regulation and control data center service characteristic fault positioning method and system based on time sequence and fault tree analysis | |
CN107517474B (en) | Network analysis optimization method and device | |
CN116151799A (en) | BP neural network-based distribution line multi-working-condition fault rate rapid assessment method | |
CN112014821B (en) | Unknown vehicle target identification method based on radar broadband characteristics | |
CN115588157A (en) | Performance data processing method and system of cross-linked low-smoke low-halogen polyolefin material | |
CN114936614A (en) | Operation risk identification method and system based on neural network | |
CN111258788B (en) | Disk failure prediction method, device and computer readable storage medium | |
CN114020905A (en) | Text classification external distribution sample detection method, device, medium and equipment | |
CN114020971A (en) | Abnormal data detection method and device | |
CN112083707A (en) | Industrial control physical signal processing method, controller and processing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: 222001 No.18 Shenghu Road, Lianyungang City, Jiangsu Province; Patentee after: The 716th Research Institute of China Shipbuilding Corp.; Address before: 222001 No.18 Shenghu Road, Lianyungang City, Jiangsu Province; Patentee before: 716TH RESEARCH INSTITUTE OF CHINA SHIPBUILDING INDUSTRY Corp. |