CN104899135A - Software defect prediction method and system - Google Patents
Publication number: CN104899135A
Application number: CN201510247157.9A
Authority: CN (China)
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention relates to a software defect prediction method and system. The method comprises: obtaining sample software modules and clustering them to obtain cluster subsets; calculating the Gaussian parameters of each cluster subset and generating pseudo-defect samples from those parameters; combining the software defect sample set with the pseudo-defect samples to obtain an updated defect sample set; training on the updated defect sample set to obtain a defect prediction model; and performing defect prediction on the software under test with that model and outputting the prediction result. Because the pseudo-defect samples add defect data to the training set, the defect prediction model can estimate and fit the defect data better, which improves the accuracy of software defect prediction.
Description
Technical Field
The invention relates to the technical field of software security, in particular to a software defect prediction method and a software defect prediction system.
Background
With the development of information technology, software keeps growing in complexity and scale. A sound mechanism for controlling and predicting software quality helps enterprises develop high-quality software products and reduce production and maintenance costs; it also matters for improving customer satisfaction, building a good corporate image, and strengthening market competitiveness. Software quality therefore receives increasing attention, and how to predict and improve it has become one of the hot spots of current research.
Traditional software defect prediction methods use a prediction model based on machine learning. The model takes the metric data vector of a software module as input and predicts whether the module is defective through preprocessing, feature extraction, model training, prediction, and similar steps. Because of inherent problems such as the model's performance evaluation criteria and inductive bias, defective and non-defective software modules are treated equally and the objective is the maximum overall prediction accuracy, so the detection rate for software defects remains low. Traditional software defect prediction methods therefore suffer from low prediction accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide a software defect prediction method and system with high prediction accuracy.
A software defect prediction method comprises the following steps:
acquiring a sample software module and carrying out clustering processing to obtain a clustering subset;
calculating Gaussian parameters of the clustering subsets, and generating pseudo-defect samples according to the Gaussian parameters;
obtaining an updated defect sample set according to the software defect sample set and the pseudo defect sample;
training according to the updated defect sample set to obtain a defect prediction model;
and performing defect prediction on the software module to be tested according to the defect prediction model, and outputting a prediction result.
A software defect prediction system comprises:
the clustering module is used for acquiring the sample software module and carrying out clustering processing to obtain a clustering subset;
the calculation module is used for calculating the Gaussian parameters of the clustering subsets and generating pseudo-defect samples according to the Gaussian parameters;
the updating module is used for obtaining an updated defect sample set according to the software defect sample set and the pseudo defect sample;
the training module is used for training according to the updated defect sample set to obtain a defect prediction model;
and the prediction module is used for predicting the defects of the software module to be tested according to the defect prediction model and outputting a prediction result.
In the software defect prediction method and system described above, sample software modules are obtained and clustered to obtain cluster subsets. The Gaussian parameters of each cluster subset are calculated, pseudo-defect samples are generated from the Gaussian parameters, and an updated defect sample set is obtained from the software defect sample set and the pseudo-defect samples. A defect prediction model is trained on the updated defect sample set, defect prediction is performed on the software module under test with that model, and the prediction result is output. Adding defect data in this way lets the defect prediction model estimate and fit the defect data better, which improves the accuracy of the model and hence of software defect prediction.
Drawings
FIG. 1 is a flowchart of a software defect prediction method in one embodiment;
FIG. 2 is a flow chart of obtaining sample software modules and performing clustering to obtain a cluster subset in one embodiment;
FIG. 3 is a flowchart of obtaining the center point of each sample vector, taking that sample vector as the starting point, in one embodiment;
FIG. 4 is a flowchart illustrating the calculation of Gaussian parameters of a cluster subset and the generation of pseudo-defect samples based on the Gaussian parameters according to an embodiment;
FIG. 5 is a flowchart illustrating defect prediction for a software module under test according to the defect prediction model in an embodiment;
FIG. 6 is a flowchart of a software defect prediction method in another embodiment;
FIG. 7 is a block diagram of a software defect prediction system in one embodiment;
FIG. 8 is a block diagram of a clustering module in one embodiment;
FIG. 9 is a block diagram of a center point calculation unit in an embodiment;
FIG. 10 is a block diagram of a calculation module in one embodiment;
FIG. 11 is a block diagram of a prediction module in one embodiment;
FIG. 12 is a block diagram of a software defect prediction system in accordance with another embodiment.
Detailed Description
A software defect prediction method, as shown in FIG. 1, includes the following steps.
Step S110: acquire sample software modules and cluster them to obtain cluster subsets. A sample software module is a software module that is already known to be defective or defect-free; clustering sorts the sample software modules into cluster subsets. In one embodiment, as shown in FIG. 2, step S110 includes steps S112 to S118.
Step S112: mark each sample software module to obtain its defect mark. For example, for the i-th sample software module, $i = 1, 2, \ldots, M$, where $M$ is the number of sample software modules, the defect mark is $y_i = 1$ if the module has a defect and $y_i = 0$ if it has none. The marking scheme and the values of the defect marks are not unique: in other embodiments a defective sample software module may be marked 0 and a defect-free one marked 1, and so on.
Step S114: perform static measurement on each sample software module to obtain its sample vector. For the i-th sample software module, $i = 1, 2, \ldots, M$, the static metrics in this embodiment may include Halstead metrics, McCabe metrics, Khoshgoftaar metrics, CK metrics, and the like. Measurement yields $n$ metric values, denoted $x_{i1}, x_{i2}, \ldots, x_{in}$, which form the sample vector $x_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$ of the module. All sample vectors together constitute the software defect sample set $\{x_i \mid i = 1, 2, \ldots, M\}$.
Step S116: taking each sample vector in turn as a starting point, obtain the center point of that sample vector. The center points serve as the basis for clustering. This embodiment clusters with the mean shift (MeanShift) method, which requires little computation and therefore speeds up the cluster analysis. As shown in FIG. 3, step S116 includes steps S1162 to S1166.
Step S1162: taking the sample vector as the starting point, calculate the mean shift vector of the sample vector, specifically:
$$M_h(x) = \frac{1}{K} \sum_{x_i \in S_h(x)} (x_i - x)$$
where $M_h$ is the mean shift vector of the sample vector $x$; $S_h(x)$ is the set of the $K$ sample vectors that lie in the high-dimensional spherical region of constant radius $h$, that is, that satisfy $(x - x_i)^T (x - x_i) < h^2$; $x_i$ is a sample vector in $S_h(x)$; and $T$ denotes transposition.
Step S1164: judge whether the mean shift vector of the sample vector is greater (in magnitude) than a preset threshold. If so, take the sum of the sample vector and the mean shift vector as the new sample vector and return to step S1162; if not, proceed to step S1166. The threshold is set in advance and can be adjusted to the actual situation. In other words, if the mean shift vector $M_h$ is greater than the threshold, $x_i + M_h$ becomes the new starting point and a new mean shift vector $M_h$ is calculated.
Step S1166: take the sum of the sample vector and the mean shift vector as the center point of the sample vector. That is, if $M_h$ is less than or equal to the threshold, $x_i + M_h$ is confirmed as the center point.
Steps S1162 to S1166 are repeated for each sample vector until all sample vectors have been traversed, which generates $M$ center points.
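The iteration in steps S1162 to S1166 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes the flat (unweighted) mean shift form, in which the shift is the average offset to the neighbours inside radius h, and the names `mean_shift_vector`, `find_center`, `eps`, and `max_iter` are hypothetical. The iteration cap is an added safety guard that the source does not mention.

```python
import math


def mean_shift_vector(x, samples, h):
    """Average offset (x_i - x) over the samples strictly inside radius h of x."""
    neighbours = [s for s in samples
                  if sum((a - b) ** 2 for a, b in zip(x, s)) < h * h]
    if not neighbours:
        # Assumption: with no neighbours there is nothing to shift towards.
        return [0.0] * len(x)
    k = len(neighbours)
    return [sum(s[d] - x[d] for s in neighbours) / k for d in range(len(x))]


def find_center(start, samples, h, eps=1e-6, max_iter=1000):
    """Follow mean shift steps from `start` until the shift is at most eps."""
    x = list(start)
    for _ in range(max_iter):
        shift = mean_shift_vector(x, samples, h)
        # Both branches of the threshold test use x + M_h: it is either the
        # next starting point (S1164) or the center point (S1166).
        x = [a + b for a, b in zip(x, shift)]
        if math.sqrt(sum(v * v for v in shift)) <= eps:
            break
    return x
```

Calling `find_center(x_i, samples, h)` once per sample vector yields the M center points used for clustering.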
Step S118: cluster the sample vectors according to their center points to obtain the cluster subsets. Sample vectors that converge to the same center point are placed in one class, which forms $L$ cluster subsets.
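Grouping sample vectors by center point might look like the sketch below. Because iterated center points usually agree only approximately, the example treats two centers as "the same" when they lie within a tolerance `tol`; that tolerance, like the function name `group_by_center`, is an assumption and does not come from the source.

```python
import math


def group_by_center(centers, tol=1e-3):
    """Group sample indices whose center points coincide up to `tol`.

    centers[i] is the center point reached from sample vector i.
    Returns (modes, clusters): the distinct centers, and for each of them
    the list of sample indices that converged to it.
    """
    modes = []
    clusters = []
    for idx, center in enumerate(centers):
        for c, mode in enumerate(modes):
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(center, mode)))
            if distance <= tol:
                clusters[c].append(idx)
                break
        else:
            # No existing center is close enough: open a new cluster subset.
            modes.append(list(center))
            clusters.append([idx])
    return modes, clusters
```

The number of returned clusters plays the role of L in the text.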
Step S120: calculate the Gaussian parameters of the cluster subsets and generate pseudo-defect samples from the Gaussian parameters. Constructing a Gaussian-mixture distribution function for software defects describes the defect distribution better. The Gaussian parameters are then calculated from the relationship between the defect distribution and the sample vectors, which lays the foundation for generating pseudo-defects.
Gaussian parameter estimation is performed on each cluster subset. Assume the k-th cluster subset is $\{x_i^k \mid i = 1, 2, \ldots, M_k\}$, containing $M_k$ samples. In one embodiment, the Gaussian parameters include the mean and the variance. As shown in FIG. 4, step S120 includes steps S121 to S125.
Step S121: calculate the mean of the cluster subset, specifically:
$$\mu_k = \frac{1}{M_k} \sum_{i=1}^{M_k} x_i^k = \{\mu_k^1, \mu_k^2, \ldots, \mu_k^n\}$$
where $\mu_k$ is the mean, $M_k$ is the number of sample vectors $x_i^k$ in the cluster subset, and $\mu_k^n$ is the metric value of the mean $\mu_k$ in the n-th dimension.
Step S122: calculate the variance of the cluster subset in each dimension, specifically:
$$(\sigma_k^j)^2 = \frac{1}{M_k} \sum_{i=1}^{M_k} (x_{ij}^k - \mu_k^j)^2, \quad j = 1, 2, \ldots, n$$
where $(\sigma_k^j)^2$ is the variance of the cluster subset in the j-th dimension, $n$ is the number of dimensions, $x_{ij}^k$ is the metric value of the sample vector $x_i^k$ in the j-th dimension, and $\mu_k^j$ is the metric value of the mean $\mu_k$ in the j-th dimension.
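As a small illustration of steps S121 and S122, the mean and per-dimension variance of one cluster subset can be computed as below. The sketch assumes the variance is normalized by the number of samples in the subset (the population form); the source does not show which normalization is used, and the name `gaussian_parameters` is hypothetical.

```python
def gaussian_parameters(cluster):
    """Return (mean, variance) of a cluster subset, one value per dimension.

    `cluster` is a non-empty list of equal-length sample vectors.
    Assumption: variance is divided by the sample count M_k, not M_k - 1.
    """
    m_k = len(cluster)
    n = len(cluster[0])
    mean = [sum(x[j] for x in cluster) / m_k for j in range(n)]
    variance = [sum((x[j] - mean[j]) ** 2 for x in cluster) / m_k
                for j in range(n)]
    return mean, variance
```

For the two-sample subset `[[1, 2], [3, 6]]` this gives a mean of `[2.0, 4.0]` and a variance of `[1.0, 4.0]`.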
Step S123: generate a random number for the cluster subset in each dimension according to the variance of the cluster subset in that dimension.
Specifically, for the j-th dimension, a random number $\Delta\lambda_k^j$ is generated according to the Gaussian distribution $N(0, (\sigma_k^j)^2)$. To do so, 12 random variables $r_1, r_2, \ldots, r_{12}$ uniformly distributed on $[0, 1]$ are drawn, and then
$$\Delta\lambda_k^j = \sigma_k^j \left( \sum_{i=1}^{12} r_i - 6 \right)$$
The constants used in the random number calculation are not unique and can be adjusted to the actual situation.
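The use of 12 uniform random variables suggests the classic approximation in which the sum of 12 draws from [0, 1] has mean 6 and variance 1, so subtracting 6 and scaling by the standard deviation gives an approximately Gaussian value. The sketch below assumes that reading; the function name `approx_gaussian` and the `rng` parameter are hypothetical.

```python
import random


def approx_gaussian(sigma, rng=random):
    """Approximate a draw from N(0, sigma^2) using 12 uniform variables.

    Each uniform on [0, 1] has mean 1/2 and variance 1/12, so the sum of 12
    has mean 6 and variance 1; the result always lies within 6 * sigma of 0.
    """
    total = sum(rng.random() for _ in range(12))
    return sigma * (total - 6.0)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing trained models.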
Step S124: obtain the random vector of the cluster subset from its random numbers in each dimension. Once a random number has been calculated for every dimension according to step S123, the random vector is:
$$\Delta\Lambda_k = \{\Delta\lambda_k^1, \Delta\lambda_k^2, \ldots, \Delta\lambda_k^n\}$$
where $\Delta\Lambda_k$ is the random vector and $\Delta\lambda_k^n$ is the random number of the cluster subset in the n-th dimension.
Step S125: obtain a pseudo-defect sample from the mean and the random vector of the cluster subset, specifically:
$$t = \mu_k + \Delta\Lambda_k$$
where $t$ is the pseudo-defect sample, $\mu_k$ is the mean of the cluster subset, and $\Delta\Lambda_k$ is the random vector.
Steps S121 to S125 are repeated for each cluster subset to obtain $L$ pseudo-defect samples. A pseudo-defect sample represents the sample vector of a virtually obtained defective software module, and the defect mark corresponding to each pseudo-defect sample may be set to 0.
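Steps S124 and S125 can be illustrated as follows, assuming the pseudo-defect sample is the cluster mean plus a zero-mean random vector whose components are scaled by the per-dimension standard deviation. The function names and the `(mean, variance)` pair layout are hypothetical choices for this sketch.

```python
import math
import random


def pseudo_defect_sample(mean, variance, rng=random):
    """Return one pseudo-defect sample: cluster mean plus a random vector."""
    # One draw per dimension; sum of 12 uniforms minus 6 approximates N(0, 1).
    delta = [math.sqrt(v) * (sum(rng.random() for _ in range(12)) - 6.0)
             for v in variance]
    return [m + d for m, d in zip(mean, delta)]


def pseudo_defect_samples(cluster_params, rng=random):
    """Generate one pseudo-defect sample per (mean, variance) pair."""
    return [pseudo_defect_sample(mean, variance, rng)
            for mean, variance in cluster_params]
```

With L cluster subsets the second function returns L samples, one per subset, matching the count stated in the text.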
Step S130: obtain the updated defect sample set from the software defect sample set and the pseudo-defect samples. Let the set of pseudo-defect samples be $\{t_i \mid i = 1, 2, \ldots, L\}$, $L$ pseudo-defect samples in total. The original software defect sample set $\{x_i \mid i = 1, 2, \ldots, M\}$ and the pseudo-defect sample set $\{t_i \mid i = 1, 2, \ldots, L\}$ are merged to obtain the updated defect sample set $\{x'_i \mid i = 1, 2, \ldots, M + L\}$.
Step S140: train on the updated defect sample set to obtain the defect prediction model. The added defect data let the defect prediction model estimate and fit the defect data better, which improves the accuracy of the model.
In one embodiment, step S140 trains a defect prediction model based on a support vector machine on the updated defect sample set, specifically:
$$\lambda^* = \arg\max_{\lambda} \left( \sum_{i=1}^{M+L} \lambda_i - \frac{1}{2} \sum_{i=1}^{M+L} \sum_{j=1}^{M+L} \lambda_i \lambda_j y_i y_j k(x_i, x_j) \right)$$
$$\text{S.T.} \quad \sum_{i=1}^{M+L} \lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \quad i = 1, 2, \ldots, M + L$$
where $x_i$ and $x_j$ are the i-th and j-th sample vectors in the updated defect sample set; $y_i$ and $y_j$ are the defect marks corresponding to the i-th and j-th sample vectors; $\lambda_i$ and $\lambda_j$ are the weights of the i-th and j-th sample vectors; S.T. denotes the constraint conditions; $C$ is a constant; and $M + L$ is the number of sample vectors in the updated defect sample set. In this embodiment, $k(x_i, x_j)$ denotes the dot product of $x_i$ and $x_j$; other embodiments may use other operations.
$\arg\max_{\lambda}$ denotes the value of $\lambda$ at which the expression attains its maximum. The sample vectors $x_i$ and $x_j$ of the updated defect sample set are substituted into the expression, and the weights $\lambda_i$ and $\lambda_j$ are determined where it takes its maximum, which finally gives the weights of all sample vectors in the updated defect sample set.
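To make the roles of the weights, the defect marks, the constant C, and the kernel concrete, the sketch below evaluates a candidate weight vector. It assumes the objective is the standard soft-margin support vector machine dual, which matches the symbols described in the text but is not shown explicitly in the source; the function names are hypothetical, and the example evaluates the objective only, it does not solve the optimization.

```python
def dot(u, v):
    """Dot-product kernel, the choice named for this embodiment."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(lam, xs, ys, kernel=dot):
    """Value of sum(lam_i) - 1/2 * sum_ij lam_i lam_j y_i y_j k(x_i, x_j)."""
    n = len(xs)
    quadratic = sum(lam[i] * lam[j] * ys[i] * ys[j] * kernel(xs[i], xs[j])
                    for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quadratic


def is_feasible(lam, ys, c, tol=1e-9):
    """Check the constraints 0 <= lam_i <= C and sum(lam_i * y_i) = 0."""
    in_box = all(-tol <= value <= c + tol for value in lam)
    balanced = abs(sum(l * y for l, y in zip(lam, ys))) <= tol
    return in_box and balanced
```

A solver would search the feasible region for the weights that maximize `dual_objective`. For two points `[1, 0]` and `[-1, 0]` with marks `+1` and `-1` (an assumption for the example), the weights `[0.5, 0.5]` are feasible for `C = 1` and give an objective value of 0.5.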
Step S150: perform defect prediction on the software module under test according to the defect prediction model, and output the prediction result. The defect prediction model is applied to the unknown software module under test to obtain and output a prediction result, which informs staff that defect prediction for that module is complete.
In one embodiment, as shown in FIG. 5, step S150 includes step S152 and step S154.
Step S152: perform static measurement on the software module under test to obtain its sample vector. The process is similar to step S114 and is not repeated here.
Step S154: perform defect prediction on the software module under test according to its sample vector and the defect prediction model, specifically:
$$g(x) = \operatorname{sgn}\left( \sum_{i=1}^{M+L} \lambda_i y_i K(x_i, x) + b \right)$$
where $g(x)$ is the defect mark of the software module under test; sgn converts its argument to an integer variable, taking the value 1 when the argument is greater than 0 and 0 when it is less than or equal to 0; $x_i$ is the i-th sample vector in the updated defect sample set; $y_i$ is the defect mark corresponding to the i-th sample vector; $\lambda_i$ is the weight of the i-th sample vector obtained from the defect prediction model; $M + L$ is the number of sample vectors in the updated defect sample set; $x$ is the sample vector of the software module under test; and $b$ is a constant. In this embodiment, $K(x_i, x)$ likewise denotes the dot product of $x_i$ and $x$. The integer conversion corresponds to the convention of step S112, in which the defect mark is $y_i = 1$ if there is a defect and $y_i = 0$ if there is none, and it may be adjusted according to how the defect mark is defined.
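The decision rule of step S154 can be sketched as a weighted kernel sum followed by thresholding at zero. The code assumes the weights and the constant b have already been produced by training; the name `predict_defect` is hypothetical, and the `+1`/`-1` training marks in the usage note are an assumption made so that the toy numbers work out.

```python
def dot(u, v):
    """Dot-product kernel, the choice named for this embodiment."""
    return sum(a * b for a, b in zip(u, v))


def predict_defect(x, train_xs, train_ys, lam, b, kernel=dot):
    """Return 1 if the weighted kernel sum plus b is positive, else 0."""
    score = sum(l * y * kernel(xi, x)
                for l, y, xi in zip(lam, train_ys, train_xs)) + b
    # Integer conversion as described: 1 when score > 0, 0 when score <= 0.
    return 1 if score > 0 else 0
```

With training vectors `[1, 0]` and `[-1, 0]`, marks `+1` and `-1`, weights `[0.5, 0.5]`, and `b = 0`, the vector `[2, 0]` scores 2 and is reported as defective, while `[-2, 0]` scores -2 and is reported as defect-free.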
In practice, software fails less often than it runs normally, so failing software modules make up a relatively small share of all software modules. However, when one of these few failing modules is wrongly predicted to be defect-free and put into use, the resulting economic and social losses can be immeasurable.
In the software defect prediction method described above, the sample software modules are clustered into cluster subsets, Gaussian analysis is performed on each cluster subset to obtain its Gaussian parameters, and pseudo-defect samples are generated from those parameters. Adding defect data in this way to form the updated defect sample set used for training lets the defect prediction model estimate and fit the defect data better, which improves the accuracy of the model and hence of software defect prediction.
In one embodiment, as shown in FIG. 6, the software defect prediction method further includes step S160 after step S150.
Step S160: output alarm information when the software module under test has a defect. The alarm informs staff promptly, so that they can identify the defective software module and service it afterwards, which makes operation more convenient. The alarm may be an audible and visual alarm, a voice alarm, or a preset picture or text shown on a display screen.
The invention also provides a software defect prediction system, as shown in fig. 7, which comprises a clustering module 110, a calculating module 120, an updating module 130, a training module 140 and a prediction module 150.
The clustering module 110 is configured to acquire sample software modules and cluster them to obtain cluster subsets. A sample software module is a software module that is already known to be defective or defect-free; clustering sorts the sample software modules into cluster subsets. In one embodiment, as shown in FIG. 8, the clustering module 110 includes a marking unit 112, a measurement unit 114, a center point calculation unit 116, and a clustering unit 118.
The marking unit 112 is configured to mark each sample software module to obtain its defect mark. For example, for the i-th sample software module, $i = 1, 2, \ldots, M$, where $M$ is the number of sample software modules, the defect mark is $y_i = 1$ if the module has a defect and $y_i = 0$ if it has none. The marking scheme and the values of the defect marks are not unique: in other embodiments a defective sample software module may be marked 0 and a defect-free one marked 1, and so on.
The measurement unit 114 is configured to perform static measurement on each sample software module to obtain its sample vector. For the i-th sample software module, $i = 1, 2, \ldots, M$, the static metrics in this embodiment may include Halstead metrics, McCabe metrics, Khoshgoftaar metrics, CK metrics, and the like. Measurement yields $n$ metric values, denoted $x_{i1}, x_{i2}, \ldots, x_{in}$, which form the sample vector $x_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$ of the module. All sample vectors together constitute the software defect sample set $\{x_i \mid i = 1, 2, \ldots, M\}$.
The center point calculation unit 116 is configured to obtain the center point of each sample vector, taking that sample vector as the starting point. The center points serve as the basis for clustering. This embodiment clusters with the mean shift (MeanShift) method, which requires little computation and therefore speeds up the cluster analysis. As shown in FIG. 9, the center point calculation unit 116 includes a first unit 1162, a second unit 1164, and a third unit 1166.
The first unit 1162 is configured to calculate the mean shift vector of the sample vector, taking the sample vector as the starting point, specifically:
$$M_h(x) = \frac{1}{K} \sum_{x_i \in S_h(x)} (x_i - x)$$
where $M_h$ is the mean shift vector of the sample vector $x$; $S_h(x)$ is the set of the $K$ sample vectors that lie in the high-dimensional spherical region of constant radius $h$, that is, that satisfy $(x - x_i)^T (x - x_i) < h^2$; $x_i$ is a sample vector in $S_h(x)$; and $T$ denotes transposition.
The second unit 1164 is configured to judge whether the mean shift vector of the sample vector is greater (in magnitude) than a preset threshold. The threshold is set in advance and can be adjusted to the actual situation.
The third unit 1166 is configured to, when the mean shift vector of the sample vector is greater than the preset threshold, take the sum of the sample vector and the mean shift vector as the new sample vector and control the first unit 1162 to calculate the mean shift vector again; and, when the mean shift vector is less than or equal to the preset threshold, take the sum of the sample vector and the mean shift vector as the center point of the sample vector.
In other words, if the mean shift vector $M_h$ is greater than the threshold, $x_i + M_h$ becomes the new starting point and the first unit 1162 is controlled to calculate a new mean shift vector $M_h$. If $M_h$ is less than or equal to the threshold, $x_i + M_h$ is confirmed as the center point.
The calculation is repeated for each sample vector until all sample vectors have been traversed, which generates $M$ center points.
The clustering unit 118 is configured to cluster the sample vectors according to their center points to obtain the cluster subsets. Sample vectors that converge to the same center point are placed in one class, which forms $L$ cluster subsets.
The calculation module 120 is configured to calculate the Gaussian parameters of the cluster subsets and generate pseudo-defect samples from the Gaussian parameters. Constructing a Gaussian-mixture distribution function for software defects describes the defect distribution better. The Gaussian parameters are then calculated from the relationship between the defect distribution and the sample vectors, which lays the foundation for generating pseudo-defects.
Gaussian parameter estimation is performed on each cluster subset. Assume the k-th cluster subset is $\{x_i^k \mid i = 1, 2, \ldots, M_k\}$, containing $M_k$ samples. In one embodiment, the Gaussian parameters include the mean and the variance. As shown in FIG. 10, the calculation module 120 includes a mean calculation unit 121, a variance calculation unit 122, a random number generation unit 123, a random vector generation unit 124, and a pseudo-defect sample generation unit 125.
The mean calculation unit 121 is configured to calculate the mean of the cluster subset, specifically:
$$\mu_k = \frac{1}{M_k} \sum_{i=1}^{M_k} x_i^k = \{\mu_k^1, \mu_k^2, \ldots, \mu_k^n\}$$
where $\mu_k$ is the mean, $M_k$ is the number of sample vectors $x_i^k$ in the cluster subset, and $\mu_k^n$ is the metric value of the mean $\mu_k$ in the n-th dimension.
The variance calculation unit 122 is configured to calculate the variance of the cluster subset in each dimension, specifically:
$$(\sigma_k^j)^2 = \frac{1}{M_k} \sum_{i=1}^{M_k} (x_{ij}^k - \mu_k^j)^2, \quad j = 1, 2, \ldots, n$$
where $(\sigma_k^j)^2$ is the variance of the cluster subset in the j-th dimension, $n$ is the number of dimensions, $x_{ij}^k$ is the metric value of the sample vector $x_i^k$ in the j-th dimension, and $\mu_k^j$ is the metric value of the mean $\mu_k$ in the j-th dimension.
The random number generation unit 123 is configured to generate a random number for the cluster subset in each dimension according to the variance of the cluster subset in that dimension.
Specifically, for the j-th dimension, a random number $\Delta\lambda_k^j$ is generated according to the Gaussian distribution $N(0, (\sigma_k^j)^2)$. To do so, 12 random variables $r_1, r_2, \ldots, r_{12}$ uniformly distributed on $[0, 1]$ are drawn, and then
$$\Delta\lambda_k^j = \sigma_k^j \left( \sum_{i=1}^{12} r_i - 6 \right)$$
The constants used in the random number calculation are not unique and can be adjusted to the actual situation.
The random vector generation unit 124 is configured to obtain the random vector of the cluster subset from its random numbers in each dimension. Once a random number has been calculated for every dimension, the random vector is:
$$\Delta\Lambda_k = \{\Delta\lambda_k^1, \Delta\lambda_k^2, \ldots, \Delta\lambda_k^n\}$$
where $\Delta\Lambda_k$ is the random vector and $\Delta\lambda_k^n$ is the random number of the cluster subset in the n-th dimension.
The pseudo-defect sample generation unit 125 is configured to obtain a pseudo-defect sample from the mean and the random vector of the cluster subset, specifically:
$$t = \mu_k + \Delta\Lambda_k$$
where $t$ is the pseudo-defect sample, $\mu_k$ is the mean of the cluster subset, and $\Delta\Lambda_k$ is the random vector.
The calculation is repeated for each cluster subset to obtain $L$ pseudo-defect samples. A pseudo-defect sample represents the sample vector of a virtually obtained defective software module, and the defect mark corresponding to each pseudo-defect sample may be set to 0.
The updating module 130 is configured to obtain the updated defect sample set from the software defect sample set and the pseudo-defect samples. Let the set of pseudo-defect samples be $\{t_i \mid i = 1, 2, \ldots, L\}$, $L$ pseudo-defect samples in total. The original software defect sample set $\{x_i \mid i = 1, 2, \ldots, M\}$ and the pseudo-defect sample set $\{t_i \mid i = 1, 2, \ldots, L\}$ are merged to obtain the updated defect sample set $\{x'_i \mid i = 1, 2, \ldots, M + L\}$.
The training module 140 is configured to train on the updated defect sample set to obtain the defect prediction model. The added defect data let the defect prediction model estimate and fit the defect data better, which improves the accuracy of the model.
In one embodiment, the training module 140 trains a defect prediction model based on a support vector machine on the updated defect sample set, specifically:
$$\lambda^* = \arg\max_{\lambda} \left( \sum_{i=1}^{M+L} \lambda_i - \frac{1}{2} \sum_{i=1}^{M+L} \sum_{j=1}^{M+L} \lambda_i \lambda_j y_i y_j k(x_i, x_j) \right)$$
$$\text{S.T.} \quad \sum_{i=1}^{M+L} \lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \quad i = 1, 2, \ldots, M + L$$
where $x_i$ and $x_j$ are the i-th and j-th sample vectors in the updated defect sample set; $y_i$ and $y_j$ are the defect marks corresponding to the i-th and j-th sample vectors; $\lambda_i$ and $\lambda_j$ are the weights of the i-th and j-th sample vectors; S.T. denotes the constraint conditions; $C$ is a constant; and $M + L$ is the number of sample vectors in the updated defect sample set. In this embodiment, $k(x_i, x_j)$ denotes the dot product of $x_i$ and $x_j$; other embodiments may use other operations.
$\arg\max_{\lambda}$ denotes the value of $\lambda$ at which the expression attains its maximum. The sample vectors $x_i$ and $x_j$ of the updated defect sample set are substituted into the expression, and the weights $\lambda_i$ and $\lambda_j$ are determined where it takes its maximum, which finally gives the weights of all sample vectors in the updated defect sample set.
The prediction module 150 is configured to perform defect prediction on the software module to be tested according to the defect prediction model, and output a prediction result. And performing defect prediction on the unknown software module to be tested by using the defect prediction model to obtain and output a prediction result, and informing a worker of completing the defect prediction on the software module to be tested.
In one embodiment, as shown in FIG. 11, prediction module 150 includes a processing unit 152 and a prediction unit 154.
The processing unit 152 is configured to perform static measurement on the software module to be tested, so as to obtain a sample vector of the software module to be tested. The specific process of performing static measurement on the software module to be tested is similar to the operation principle of the measurement unit 114, and is not described herein again.
The prediction unit 154 is configured to perform defect prediction on the software module to be tested according to the sample vector of the software module to be tested and the defect prediction model, specifically:

$$g(x) = \operatorname{sgn}\left(\sum_{i=1}^{M+L}\lambda_i y_i K(x_i, x) + b\right)$$

wherein $g(x)$ represents the defect mark of the software module to be tested; sgn denotes converting the expression $\sum_{i=1}^{M+L}\lambda_i y_i K(x_i, x) + b$ into an integer variable, taking 1 when the expression is greater than 0 and 0 when it is less than or equal to 0; $x_i$ is the i-th sample vector in the updated defect sample set; $y_i$ is the defect mark corresponding to the i-th sample vector; $\lambda_i$ is the weight of the i-th sample vector obtained by the defect prediction model; $M+L$ is the number of sample vectors in the updated defect sample set; $x$ is the sample vector of the software module to be tested; and $b$ is a constant. As above, in this embodiment $K(x_i, x)$ denotes the dot product of $x_i$ and $x$. The integer conversion corresponds to the convention of the marking unit 112 in this embodiment, in which the defect mark is $y_i = 1$ if a defect is present and $y_i = 0$ if no defect is present; the correspondence may be adjusted according to the definition of the defect mark.
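The decision rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the weights λ_i and the constant b have already been obtained by training, and assuming a dot-product kernel. The name `predict_defect` and the toy values are hypothetical; the labels are passed through as supplied, and the example uses +1/-1 as in a conventional SVM rather than the 1/0 marking of the embodiment.

```python
def dot(u, v):
    """Dot-product kernel K(u, v)."""
    return sum(a * b for a, b in zip(u, v))


def predict_defect(x, sample_xs, sample_ys, lam, b, kernel=dot):
    """Return 1 (defect) if sum_i lam_i y_i K(x_i, x) + b > 0, otherwise 0."""
    score = sum(
        l * y * kernel(xi, x)
        for l, y, xi in zip(lam, sample_ys, sample_xs)
    ) + b
    return 1 if score > 0 else 0


# Toy model (hypothetical values, not taken from the patent).
sample_xs = [[1.0, 0.0], [-1.0, 0.0]]
sample_ys = [1, -1]
lam = [0.5, 0.5]
b = 0.0

flag_a = predict_defect([2.0, 1.0], sample_xs, sample_ys, lam, b)   # positive score
flag_b = predict_defect([-3.0, 0.0], sample_xs, sample_ys, lam, b)  # negative score
```

A score of exactly 0 maps to 0, matching the "less than or equal to 0" branch of the rule.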
In practice, defective software modules make up a relatively small share of all software modules, because the probability of a software failure is lower than the probability of normal operation. However, when these few defective modules are mispredicted as defect-free and then put into practical use, the resulting economic and social losses can be immeasurable.
In the software defect prediction system described above, the sample software modules are clustered into clustering subsets, Gaussian analysis is performed on each clustering subset to obtain Gaussian parameters, and pseudo-defect samples are then generated according to the Gaussian parameters. Adding this extra defect data produces an updated defect sample set for training, so that the defect prediction model can estimate and fit the defect data better, which improves the accuracy of the model and therefore the accuracy of software defect prediction.
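The pseudo-defect generation step can be sketched as follows. This is a minimal illustration under stated assumptions: the variance is normalised by the number of sample vectors in the subset (the normalisation is an assumption of the sketch), and each Gaussian random number is approximated by summing 12 uniform variables on [0, 1] and subtracting 6, which gives a value with mean 0 and variance 1 before scaling. The function names are hypothetical.

```python
import random


def gaussian_parameters(cluster):
    """Per-dimension mean and variance of one clustering subset.

    Assumption: variance is divided by the subset size (population form).
    """
    m = len(cluster)
    n = len(cluster[0])
    mean = [sum(v[j] for v in cluster) / m for j in range(n)]
    var = [sum((v[j] - mean[j]) ** 2 for v in cluster) / m for j in range(n)]
    return mean, var


def pseudo_defect_sample(cluster, rng=random):
    """Return one pseudo-defect sample: subset mean plus a random vector."""
    mean, var = gaussian_parameters(cluster)
    sample = []
    for mu_j, var_j in zip(mean, var):
        # Sum of 12 uniform [0, 1] draws minus 6 approximates N(0, 1).
        z = sum(rng.random() for _ in range(12)) - 6.0
        sample.append(mu_j + (var_j ** 0.5) * z)
    return sample


# Toy clustering subset (hypothetical metric values).
cluster = [[1.0, 2.0], [3.0, 2.0]]
t = pseudo_defect_sample(cluster, rng=random.Random(0))
```

Because the second dimension of the toy subset has zero variance, the second component of `t` always equals the subset mean in that dimension, while the first component varies around 2.0.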
In one embodiment, as shown in FIG. 12, the software defect prediction system may further include an alarm module 160.
The alarm module 160 is used for outputting alarm information when the software module to be tested has a defect. Outputting alarm information alerts the staff promptly, so that they can identify the defective software module for subsequent repair, which improves ease of operation. Specifically, the alarm may be given by sound and light, by voice, or by displaying a preset picture or text on a display screen.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A software defect prediction method is characterized by comprising the following steps:
acquiring a sample software module and carrying out clustering processing to obtain a clustering subset;
calculating Gaussian parameters of the clustering subsets, and generating pseudo-defect samples according to the Gaussian parameters;
obtaining an updated defect sample set according to the software defect sample set and the pseudo defect sample;
training according to the updated defect sample set to obtain a defect prediction model;
and performing defect prediction on the software module to be tested according to the defect prediction model, and outputting a prediction result.
2. The software defect prediction method of claim 1, wherein the step of obtaining sample software modules and performing clustering to obtain a cluster subset comprises the steps of:
respectively marking the sample software modules to obtain defect marks of the sample software modules;
respectively carrying out static measurement on the sample software modules to obtain sample vectors of all the sample software modules;
respectively taking each sample vector as a starting point, and acquiring a central point of the sample vector;
and clustering the sample vectors according to the central points of the sample vectors to obtain a clustering subset.
3. The software defect prediction method of claim 2, wherein the step of calculating the center point of the sample vector using each sample vector as a starting point comprises the steps of:
taking the sample vector as a starting point, calculating a meanshift vector of the sample vector, specifically:

$$M_h(x) = \frac{1}{K}\sum_{x_i \in S_h(x)} (x_i - x)$$

wherein $M_h(x)$ denotes the meanshift vector of the sample vector $x$; $S_h(x)$ denotes the high-dimensional spherical region of radius $h$, $h$ being a constant, that is, the set of $K$ sample vectors satisfying the relationship $(x - x_i)^T (x - x_i) < h^2$; $x_i$ is a sample vector in $S_h(x)$; and $T$ denotes transpose;
judging whether the meanshift vector of the sample vector is larger than a preset threshold value or not;
if not, taking the sum of the sample vector and the meanshift vector as the central point of the sample vector;
and if so, taking the sum of the sample vector and the meanshift vector as a new sample vector, and returning to the step of calculating the meanshift vector of the sample vector by taking the sample vector as a starting point.
4. The software defect prediction method of claim 1, wherein the gaussian parameters comprise a mean and a variance; the step of calculating Gaussian parameters of the clustering subsets and generating pseudo-defect samples according to the Gaussian parameters comprises the following steps:
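calculating the mean value of the clustering subset, specifically:

$$\mu_k = \frac{1}{M_k}\sum_{i=1}^{M_k} x_i^k = \left(\mu_k^1, \mu_k^2, \dots, \mu_k^n\right)$$

wherein $\mu_k$ denotes the mean value, $M_k$ is the number of sample vectors $x_i^k$ in the clustering subset, and $\mu_k^n$ represents the metric value of the mean value $\mu_k$ in the n-th dimension;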
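calculating the variance of the clustering subset in each dimension, specifically:

$$\left(\sigma_k^j\right)^2 = \frac{1}{M_k}\sum_{i=1}^{M_k}\left(x_{i,j}^k - \mu_k^j\right)^2,\quad j = 1, \dots, n$$

wherein $(\sigma_k^j)^2$ represents the variance of the clustering subset in the j-th dimension, $n$ is the number of dimensions, $x_{i,j}^k$ represents the metric value of the sample vector $x_i^k$ in the j-th dimension, and $\mu_k^j$ represents the metric value of the mean value $\mu_k$ in the j-th dimension;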
according to the variance of the clustering subset in each dimension, correspondingly generating a random number of the clustering subset in each dimension, specifically: for dimension $j$, generating a random number $\Delta\lambda_k^j$ according to the Gaussian distribution $N\left(0, (\sigma_k^j)^2\right)$; taking 12 random variables $r_1, \dots, r_{12}$ uniformly distributed on $[0, 1]$, then

$$\Delta\lambda_k^j = \sigma_k^j\left(\sum_{i=1}^{12} r_i - 6\right);$$
obtaining a random vector of the clustering subset according to the random number of the clustering subset in each dimension, specifically:

$$\Delta\Lambda_k = \left(\Delta\lambda_k^1, \Delta\lambda_k^2, \dots, \Delta\lambda_k^n\right)$$

wherein $\Delta\Lambda_k$ is the random vector, and $\Delta\lambda_k^n$ is the random number of the clustering subset in the n-th dimension;
obtaining a pseudo-defect sample according to the mean value and the random vector of the clustering subset, specifically:

$$t = \mu_k + \Delta\Lambda_k$$

wherein $t$ is the pseudo-defect sample, $\mu_k$ is the mean value of the clustering subset, and $\Delta\Lambda_k$ is the random vector.
5. The software defect prediction method of claim 1, wherein the step of training to obtain a defect prediction model according to the updated defect sample set specifically comprises:

$$\hat{\lambda} = \arg\max_{\lambda}\left[\sum_{i=1}^{M+L}\lambda_i - \frac{1}{2}\sum_{i=1}^{M+L}\sum_{j=1}^{M+L}\lambda_i\lambda_j y_i y_j K(x_i, x_j)\right]$$

$$\text{S.T.}\quad \sum_{i=1}^{M+L}\lambda_i y_i = 0,\qquad 0 \le \lambda_i \le C,\quad i = 1, \dots, M+L$$

wherein $\arg\max_{\lambda}$ denotes the value of $\lambda$ at which the bracketed expression attains its maximum;
$x_i$ and $x_j$ are respectively the i-th and j-th sample vectors in the updated defect sample set; $y_i$ and $y_j$ are respectively the defect marks corresponding to the i-th and j-th sample vectors in the updated defect sample set; $\lambda_i$ and $\lambda_j$ respectively represent the weights of the i-th and j-th sample vectors in the updated defect sample set; S.T. denotes the constraint condition, $C$ is a constant, and $M+L$ is the number of sample vectors in the updated defect sample set.
6. The software defect prediction method of claim 1, wherein the step of performing defect prediction on the software module to be tested according to the defect prediction model comprises:
performing static measurement on the software module to be measured to obtain a sample vector of the software module to be measured;
performing defect prediction on the software module to be tested according to the sample vector of the software module to be tested and the defect prediction model, specifically:

$$g(x) = \operatorname{sgn}\left(\sum_{i=1}^{M+L}\lambda_i y_i K(x_i, x) + b\right)$$

wherein $g(x)$ represents the defect mark of the software module to be tested; sgn denotes converting the expression $\sum_{i=1}^{M+L}\lambda_i y_i K(x_i, x) + b$ into an integer variable, taking 1 when the expression is greater than 0 and 0 when it is less than or equal to 0; $x_i$ is the i-th sample vector in the updated defect sample set; $y_i$ is the defect mark corresponding to the i-th sample vector in the updated defect sample set; $\lambda_i$ represents the weight of the i-th sample vector in the updated defect sample set obtained by the defect prediction model; $M+L$ is the number of sample vectors in the updated defect sample set; $x$ is the sample vector of the software module to be tested; and $b$ is a constant.
7. The software defect prediction method of claim 1, further comprising a step of outputting alarm information when the software module to be tested has a defect after the step of performing defect prediction on the software module to be tested according to the defect prediction model.
8. A software defect prediction system, comprising:
the clustering module is used for acquiring the sample software module and carrying out clustering processing to obtain a clustering subset;
the calculation module is used for calculating the Gaussian parameters of the clustering subsets and generating pseudo-defect samples according to the Gaussian parameters;
the updating module is used for obtaining an updated defect sample set according to the software defect sample set and the pseudo defect sample;
the training module is used for training according to the updated defect sample set to obtain a defect prediction model;
and the prediction module is used for predicting the defects of the software module to be tested according to the defect prediction model and outputting a prediction result.
9. The software defect prediction system of claim 8, wherein the clustering module comprises:
the marking unit is used for respectively marking the sample software modules to obtain defect marks of the sample software modules;
the measurement unit is used for respectively carrying out static measurement on the sample software modules to obtain sample vectors of all the sample software modules;
the central point calculating unit is used for respectively taking each sample vector as a starting point and acquiring the central point of the sample vector;
and the clustering unit is used for clustering the sample vectors according to the central points of the sample vectors to obtain a clustering subset.
10. The software defect prediction system of claim 8, wherein the gaussian parameters include mean and variance; the calculation module comprises:
the mean value calculating unit is used for calculating a mean value of the clustering subset, specifically:

$$\mu_k = \frac{1}{M_k}\sum_{i=1}^{M_k} x_i^k = \left(\mu_k^1, \mu_k^2, \dots, \mu_k^n\right)$$

wherein $\mu_k$ denotes the mean value, $M_k$ is the number of sample vectors $x_i^k$ in the clustering subset, and $\mu_k^n$ represents the metric value of the mean value $\mu_k$ in the n-th dimension;
the variance calculating unit is used for calculating the variance of the clustering subset in each dimension, specifically:

$$\left(\sigma_k^j\right)^2 = \frac{1}{M_k}\sum_{i=1}^{M_k}\left(x_{i,j}^k - \mu_k^j\right)^2,\quad j = 1, \dots, n$$

wherein $(\sigma_k^j)^2$ represents the variance of the clustering subset in the j-th dimension, $n$ is the number of dimensions, $x_{i,j}^k$ represents the metric value of the sample vector $x_i^k$ in the j-th dimension, and $\mu_k^j$ represents the metric value of the mean value $\mu_k$ in the j-th dimension;
a random number generating unit, configured to correspondingly generate a random number of the clustering subset in each dimension according to the variance of the clustering subset in each dimension, specifically: for dimension $j$, generating a random number $\Delta\lambda_k^j$ according to the Gaussian distribution $N\left(0, (\sigma_k^j)^2\right)$; taking 12 random variables $r_1, \dots, r_{12}$ uniformly distributed on $[0, 1]$, then

$$\Delta\lambda_k^j = \sigma_k^j\left(\sum_{i=1}^{12} r_i - 6\right);$$
a random vector generating unit, configured to obtain a random vector of the clustering subset according to the random number of the clustering subset in each dimension, specifically:

$$\Delta\Lambda_k = \left(\Delta\lambda_k^1, \Delta\lambda_k^2, \dots, \Delta\lambda_k^n\right)$$

wherein $\Delta\Lambda_k$ is the random vector, and $\Delta\lambda_k^n$ is the random number of the clustering subset in the n-th dimension;
a pseudo-defect sample generating unit, configured to obtain a pseudo-defect sample according to the mean value and the random vector of the clustering subset, specifically:

$$t = \mu_k + \Delta\Lambda_k$$

wherein $t$ is the pseudo-defect sample, $\mu_k$ is the mean value of the clustering subset, and $\Delta\Lambda_k$ is the random vector.
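For illustration only, the center-point iteration recited in claim 3 can be sketched in Python. The sketch assumes the meanshift vector is the average offset from the current point to the sample vectors lying within radius h of it, and that "larger than a preset threshold" is judged on the Euclidean length of that vector. The function names, the default threshold and the `max_iter` safeguard are hypothetical additions, not part of the claims.

```python
def mean_shift_vector(x, samples, h):
    """Average of (x_i - x) over the samples with (x - x_i)^T (x - x_i) < h^2."""
    neighbours = [
        s for s in samples
        if sum((a - b) ** 2 for a, b in zip(x, s)) < h * h
    ]
    if not neighbours:
        return [0.0] * len(x)
    k = len(neighbours)
    return [sum(s[j] - x[j] for s in neighbours) / k for j in range(len(x))]


def center_point(x, samples, h, threshold=1e-6, max_iter=1000):
    """Shift x by its meanshift vector until the shift is not above threshold."""
    current = list(x)
    for _ in range(max_iter):  # max_iter is a safeguard assumed by this sketch
        shift = mean_shift_vector(current, samples, h)
        current = [c + d for c, d in zip(current, shift)]
        if sum(d * d for d in shift) ** 0.5 <= threshold:
            break  # the sum of the vector and its meanshift vector is the center
    return current


# Toy sample vectors (hypothetical): two nearby points and one far away.
samples = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
centers = [center_point(s, samples, h=3.0) for s in samples]
```

Sample vectors whose center points coincide, here the first two, would then be grouped into the same clustering subset.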
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510247157.9A CN104899135B (en) | 2015-05-14 | 2015-05-14 | Software Defects Predict Methods and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510247157.9A CN104899135B (en) | 2015-05-14 | 2015-05-14 | Software Defects Predict Methods and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104899135A true CN104899135A (en) | 2015-09-09 |
CN104899135B CN104899135B (en) | 2017-10-20 |
Family
ID=54031810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510247157.9A Active CN104899135B (en) | 2015-05-14 | 2015-05-14 | Software Defects Predict Methods and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104899135B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050071807A1 (en) * | 2003-09-29 | 2005-03-31 | Aura Yanavi | Methods and systems for predicting software defects in an upcoming software release |
CN101556553A (en) * | 2009-03-27 | 2009-10-14 | 中国科学院软件研究所 | Defect prediction method and system based on requirement change |
US20100180259A1 (en) * | 2009-01-15 | 2010-07-15 | Raytheon Company | Software Defect Forecasting System |
CN103810101A (en) * | 2014-02-19 | 2014-05-21 | 北京理工大学 | Software defect prediction method and system |
CN103810102A (en) * | 2014-02-19 | 2014-05-21 | 北京理工大学 | Method and system for predicting software defects |
Non-Patent Citations (3)
Title |
---|
丁智: "基于高斯分布随机样本生成的小样本聚类算法", 《电脑知识与技术》 * |
乔辉: "软件缺陷预测技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
杨林 等: "聚类分析在软件缺陷度量应用中的研究", 《计算机工程与应用》 * |
Also Published As
Publication number | Publication date |
---|---|
CN104899135B (en) | 2017-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||