CN106991047B - Method and system for predicting object-oriented software defects - Google Patents



Publication number
CN106991047B
Authority
CN
China
Prior art keywords
fitness
extreme value
particle
data set
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710187847.9A
Other languages
Chinese (zh)
Other versions
CN106991047A (en)
Inventor
朱朝阳
韩丽芳
张信明
王志宏
陈相舟
应欢
李怡康
李梦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Hebei Electric Power Co Ltd
Original Assignee
University of Science and Technology of China USTC
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Hebei Electric Power Co Ltd
Application filed by University of Science and Technology of China USTC, State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, State Grid Hebei Electric Power Co Ltd
Priority to CN201710187847.9A
Publication of CN106991047A
Application granted
Publication of CN106991047B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting object-oriented software defects, comprising the following steps: processing a training data set to obtain effective characteristic attributes, and establishing a new training data set according to the effective characteristic attributes; training a Support Vector Machine (SVM) according to the new training data set, and performing parameter optimization through Particle Swarm Optimization (PSO), wherein the parameters comprise a penalty factor and a Gaussian kernel bandwidth; and performing defect prediction on the prediction data by using the SVM model according to the optimized parameters, and acquiring a prediction result. The beneficial effects of the invention are as follows: processing the training data set to obtain effective characteristic attributes and building a new training data set from them avoids the curse of dimensionality, reduces processing cost, and increases data processing speed; and optimizing the parameters with PSO selects the optimal parameters, which improves the accuracy of defect prediction.

Description

Method and system for predicting object-oriented software defects
Technical Field
The present invention relates to the field of software defect prediction, and more particularly, to a method and system for predicting object-oriented software defects.
Background
Over its long development history, information system software has mainly been built with object-oriented design techniques. In essence, object-oriented system design is the process of finding solutions for the structural and functional models of the software. As software structures and models grow more complex and the number of objects grows, the security problems of the whole information system's software become more serious. Finding and fixing software defects and vulnerabilities as early as possible during development safeguards national production and normal market operation, and is also an important way to reduce later testing cost and cycle time and to improve software quality.
Software defect prediction techniques can be divided into static and dynamic defect prediction techniques. Existing static defect prediction techniques are mostly based on machine learning classification algorithms such as decision trees, random forests, naive Bayes, BP neural networks, and artificial immune systems. All of these methods have some defect prediction capability, but each has its own problems. For example, decision trees are prone to overfitting and ignore the correlation between feature attributes; naive Bayes requires known prior probabilities and places high demands on attribute independence; neural networks easily fall into local optima or underfit and, like Bayesian models, need defect-related factors to be obtained from expert experience, with low computational efficiency; support vector machines have good learning and generalization capabilities, but there is no uniform and efficient method for setting their optimal parameters. Moreover, when handling object-oriented software, all of these algorithms must process a very large number of class and object feature attributes to measure the software, which causes the curse of dimensionality, makes detection take too long, and reduces the practicability of the prediction model.
Therefore, it is necessary to provide a software defect prediction method to improve the accuracy of the software prediction result.
Disclosure of Invention
The invention provides a method for predicting object-oriented software defects, which is used for solving the problem of low accuracy of software prediction results.
In order to solve the above problem, according to an aspect of the present invention, there is provided a method for predicting object-oriented software defects, the method comprising:
processing a training data set to obtain effective characteristic attributes, and establishing a new training data set according to the effective characteristic attributes, wherein the training data set comprises: a defective data set and a non-defective data set;
training a Support Vector Machine (SVM) according to the new training data set, and performing parameter optimization through Particle Swarm Optimization (PSO), wherein the parameters comprise: a penalty factor and a Gaussian kernel bandwidth; and
performing defect prediction on the prediction data by using the SVM model according to the optimized parameters, and acquiring a prediction result.
Preferably, the processing the training data set to obtain effective characteristic attributes, and establishing a new training data set according to the effective characteristic attributes includes:
normalizing the weight of the characteristic attribute corresponding to each sample data in the training data set;
randomly selecting a sample data, and respectively selecting a sample data of the same type and a sample data of a different type with the minimum Euclidean distance from the sample data;
calculating and updating the weight of each characteristic attribute corresponding to the sample data according to a weight calculation formula, wherein the weight calculation formula is as follows:
Figure BDA0001255304380000021
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $\max(D(t_k))$ is the maximum Euclidean distance between all samples on attribute $t_k$, and $n$ is the number of iterations;
repeating the above two steps a preset number of times, and calculating the average weight of each characteristic attribute;
comparing the average weight of each characteristic attribute with a preset threshold, and selecting the characteristic attributes whose average weight is larger than the preset threshold as effective characteristic attributes; and
selecting the data corresponding to the effective characteristic attributes to establish a new training data set.
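The weight calculation formula is present above only as a figure reference. For orientation, the classic Relief update, which uses exactly the symbols defined above, has the form below. This is an assumption about the missing figure, not a transcription of it.

```latex
\[
Weight(t_k) \leftarrow Weight(t_k)
  - \frac{D(T_i, T_{hit}, t_k)}{n \cdot \max(D(t_k))}
  + \frac{D(T_i, T_{miss}, t_k)}{n \cdot \max(D(t_k))}
\]
```

Under this form, an attribute gains weight when it separates a sample from its nearest different-type neighbour and loses weight when it separates the sample from its nearest same-type neighbour.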
Preferably, the training a support vector machine SVM according to the new training data set and performing parameter optimization through a Particle Swarm Optimization (PSO) algorithm includes:
initializing settings for data, wherein the data comprises: a first learning factor, a second learning factor, an inertial weight, an iteration number, and a particle swarm, the particle swarm comprising: the position and velocity of the particle;
calculating the prediction accuracy of the SVM model according to the position of each particle to serve as fitness;
comparing the fitness with the individual fitness extreme value, and if the fitness is superior to the individual fitness extreme value, updating the individual fitness extreme value of the particle and the best position corresponding to the fitness by using the fitness; if the fitness is better than the individual fitness extreme value of all other particles and the group fitness extreme value in the previous iteration, updating the group fitness extreme value in the current iteration and the best position corresponding to the fitness extreme value by using the fitness; and
judging whether the maximum number of iterations has been reached or the population fitness extreme value is not less than a preset population fitness extreme value;
if the maximum number of iterations has been reached or the population fitness extreme value is not less than the preset population fitness extreme value, outputting the best position corresponding to the population fitness extreme value at that moment as the optimal parameter value; and
if the maximum number of iterations has not been reached and the population fitness extreme value is less than the preset population fitness extreme value, updating the position and velocity of each particle and returning to the step of calculating the prediction accuracy of the SVM model according to the position of each particle as the fitness, until the maximum number of iterations is reached or the population fitness extreme value is not less than the preset population fitness extreme value, and then outputting the best position corresponding to the population fitness extreme value at that moment as the optimal parameter value.
Preferably, wherein said updating the position and velocity of each particle comprises:
Figure BDA0001255304380000041
Figure BDA0001255304380000042
Figure BDA0001255304380000043
wherein $V_i^{k+1}$ is the updated velocity of particle $i$ at the $(k+1)$-th iteration, $w^k$ is the inertia weight at the $k$-th iteration, $V_i^k$ is the velocity of particle $i$ at the $k$-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle $i$ at the $k$-th iteration, $S_i^k$ is the position of particle $i$ at the $k$-th iteration, $GBest^k$ is the best position corresponding to the population fitness extremum at the $k$-th iteration, $S_i^{k+1}$ is the position of particle $i$ at the $(k+1)$-th iteration, $num$ is the number of particles, $close^k$ is the degree of population clustering at the $k$-th iteration,
Figure BDA0001255304380000044
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter length of the solution space, $w$ is the inertia weight, $w_{min}$ is the lower bound of the inertia weight, and $w_{max}$ is the upper bound of the inertia weight.
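The quantities named above suggest an inertia weight that adapts to how spread out the swarm is. The Python sketch below computes the mean distance of the particles from their center of gravity, scaled by the solution-space diameter, and maps it into the range between the lower and upper bounds. The linear mapping in `inertia_weight` and both function names are assumptions for illustration; the patent's own formulas are in the figures that did not survive.

```python
import math


def clustering_degree(positions, diameter):
    """Mean Euclidean distance of the particles from the swarm's mean
    position, divided by the maximum diameter of the solution space."""
    num = len(positions)
    dim = len(positions[0])
    centroid = [sum(p[d] for p in positions) / num for d in range(dim)]
    mean_dist = sum(math.dist(p, centroid) for p in positions) / num
    return mean_dist / diameter


def inertia_weight(close, w_min, w_max):
    """ASSUMPTION: a linear map from the clustering degree to [w_min, w_max].
    The actual mapping used by the patent is not recoverable from the text."""
    close = min(max(close, 0.0), 1.0)  # keep the ratio inside [0, 1]
    return w_min + (w_max - w_min) * close
```

A widely spread swarm then gets a larger inertia weight (more exploration) and a tightly clustered one a smaller weight (more exploitation), which is one common design choice for adaptive PSO variants.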
Preferably, before the predicting the defect of the prediction data by using the SVM model according to the optimized parameter and obtaining the prediction result, the method further comprises:
performing defect prediction on a test data set by using the SVM model according to the optimized parameters, and verifying the accuracy of the optimized parameters.
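The verification step compares predictions on a labelled test data set with the known labels. A minimal Python sketch of such a check follows, computing the accuracy together with the precision, recall, and F value. The function name and the convention that label 1 means "defective" are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F value for a binary defect label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_value": f_value}
```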
According to another aspect of the present invention, there is provided a system for predicting object-oriented software defects, the system comprising: a data processing unit, a parameter optimization unit and a defect prediction unit,
the data processing unit is configured to process a training data set, acquire an effective characteristic attribute, and establish a new training data set according to the effective characteristic attribute, where the training data set includes: a defective data set and a non-defective data set;
the parameter optimization unit is configured to train a Support Vector Machine (SVM) according to the new training data set, and perform parameter optimization through Particle Swarm Optimization (PSO), where the parameters include: a penalty factor and a Gaussian kernel bandwidth; and
the defect prediction unit is configured to perform defect prediction on the prediction data by using the SVM model according to the optimized parameters, and acquire a prediction result.
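The division into three units can be pictured as three pluggable stages chained together. The Python sketch below is only an illustration of that data flow: the class name, the field names, and the callable signatures are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


@dataclass
class DefectPredictionSystem:
    # Data processing unit: returns the indices of the effective attributes.
    select_features: Callable[[List[Row], List[int]], List[int]]
    # Parameter optimization unit: returns (penalty factor, kernel bandwidth).
    optimize_parameters: Callable[[List[Row], List[int]], Tuple[float, float]]
    # Defect prediction unit: trains with the given parameters and predicts.
    train_and_predict: Callable[..., List[int]]

    def run(self, train_x, train_y, predict_x):
        kept = self.select_features(train_x, train_y)
        # Build the new (reduced) data sets from the effective attributes only.
        reduced_train = [[row[j] for j in kept] for row in train_x]
        reduced_pred = [[row[j] for j in kept] for row in predict_x]
        c, sigma = self.optimize_parameters(reduced_train, train_y)
        return self.train_and_predict(reduced_train, train_y,
                                      reduced_pred, c, sigma)
```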
Preferably, the processing the training data set by the data processing unit to obtain effective feature attributes, and establishing a new training data set according to the effective feature attributes includes:
normalizing the weight of the characteristic attribute corresponding to each sample data in the training data set;
randomly selecting a sample data, and respectively selecting a sample data of the same type and a sample data of a different type with the minimum Euclidean distance from the sample data;
calculating and updating the weight of each characteristic attribute corresponding to the sample data according to a weight calculation formula, wherein the weight calculation formula is as follows:
Figure BDA0001255304380000051
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $\max(D(t_k))$ is the maximum Euclidean distance between all samples on attribute $t_k$, and $n$ is the number of iterations;
calculating, a preset number of times, the weight of each characteristic attribute for a plurality of sample data, and calculating the average weight of each characteristic attribute;
comparing the average weight of each characteristic attribute with a preset threshold, and selecting the characteristic attributes whose average weight is larger than the preset threshold as effective characteristic attributes; and
selecting the data corresponding to the effective characteristic attributes to establish a new training data set.
Preferably, the training of the support vector machine SVM by the parameter optimization unit according to the new training data set and the parameter optimization by the particle swarm optimization PSO include:
initializing settings for data, wherein the data comprises: a first learning factor, a second learning factor, an inertial weight, an iteration number, and a particle swarm, the particle swarm comprising: the position and velocity of the particle;
calculating the prediction accuracy of the SVM model according to the position of each particle to serve as fitness;
comparing the fitness with the individual fitness extreme value, and if the fitness is superior to the individual fitness extreme value, updating the individual fitness extreme value of the particle and the best position corresponding to the fitness by using the fitness; if the fitness is better than the individual fitness extreme value of all other particles and the group fitness extreme value in the previous iteration, updating the group fitness extreme value in the current iteration and the best position corresponding to the fitness extreme value by using the fitness; and
judging whether the maximum number of iterations has been reached or the population fitness extreme value is not less than a preset population fitness extreme value;
if the maximum number of iterations has been reached or the population fitness extreme value is not less than the preset population fitness extreme value, outputting the best position corresponding to the population fitness extreme value at that moment as the optimal parameter value; and
if the maximum number of iterations has not been reached and the population fitness extreme value is less than the preset population fitness extreme value, updating the position and velocity of each particle and returning to the step of calculating the prediction accuracy of the SVM model according to the position of each particle as the fitness, until the maximum number of iterations is reached or the population fitness extreme value is not less than the preset population fitness extreme value, and then outputting the best position corresponding to the population fitness extreme value at that moment as the optimal parameter value.
Preferably, wherein said updating the position and velocity of each particle comprises:
Figure BDA0001255304380000061
Figure BDA0001255304380000062
Figure BDA0001255304380000063
wherein $V_i^{k+1}$ is the updated velocity of particle $i$ at the $(k+1)$-th iteration, $w^k$ is the inertia weight at the $k$-th iteration, $V_i^k$ is the velocity of particle $i$ at the $k$-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle $i$ at the $k$-th iteration, $S_i^k$ is the position of particle $i$ at the $k$-th iteration, $GBest^k$ is the best position corresponding to the population fitness extremum at the $k$-th iteration, $S_i^{k+1}$ is the position of particle $i$ at the $(k+1)$-th iteration, $num$ is the number of particles, $close^k$ is the degree of population clustering at the $k$-th iteration,
Figure BDA0001255304380000064
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter length of the solution space, $w$ is the inertia weight, $w_{min}$ is the lower bound of the inertia weight, and $w_{max}$ is the upper bound of the inertia weight.
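The update formulas themselves are present above only as figure references. As a point of comparison, the textbook PSO velocity and position updates written with the same symbols are given below. The random factors $r_1, r_2 \in [0, 1]$ belong to the textbook form and are not defined in this text, so this is an assumption about the general shape of the missing formulas, not a reconstruction of them.

```latex
\begin{align}
V_i^{k+1} &= w^k V_i^k + c_1 r_1 \left(PBest_i^k - S_i^k\right)
             + c_2 r_2 \left(GBest^k - S_i^k\right) \\
S_i^{k+1} &= S_i^k + V_i^{k+1}
\end{align}
```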
Preferably, wherein the system further comprises:
a verification unit, configured to, before the defect prediction unit 803 performs defect prediction on the prediction data by using the SVM model according to the optimized parameters and obtains the prediction result, perform defect prediction on a test data set by using the SVM model according to the optimized parameters, and verify the accuracy of the optimized parameters.
The invention has the beneficial effects that:
1. The technical scheme of the invention processes the training data set to obtain the effective characteristic attributes and establishes a new training data set according to them, thereby avoiding the curse of dimensionality, reducing processing cost, and improving data processing speed.
2. The technical scheme of the invention trains the SVM according to the new training data set and performs parameter optimization through Particle Swarm Optimization (PSO) to select the optimal parameters, thereby improving the accuracy of defect prediction.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow diagram of a method 100 for predicting object-oriented software defects, according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method 200 of processing a training data set according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method 300 for parameter optimization using PSO according to an embodiment of the present invention;
FIG. 4 is a graph comparing accuracy across data sets according to embodiments of the present invention;
FIG. 5 is a graph comparing accuracy across data sets according to embodiments of the present invention;
FIG. 6 is a comparison graph of recall on various data sets according to an embodiment of the present invention;
FIG. 7 is a graph comparing F values on various data sets according to an embodiment of the present invention; and
FIG. 8 is a block diagram illustrating a system 800 for predicting object-oriented software bugs, according to an embodiment of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the present invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a flow diagram of a method 100 for predicting object-oriented software defects, according to an embodiment of the present invention. As shown in FIG. 1, the method 100 processes the training data set to obtain the effective characteristic attributes and establishes a new training data set from them, then trains a Support Vector Machine (SVM) on the new training data set while optimizing its parameters through Particle Swarm Optimization (PSO), and finally performs defect prediction on the prediction data by using the SVM model with the optimized parameters to obtain the prediction result, thereby improving the accuracy of software defect prediction. The method 100 starts at step 101, in which a training data set is processed to obtain effective characteristic attributes and a new training data set is established according to the effective characteristic attributes, where the training data set includes: a defective data set and a non-defective data set.
FIG. 2 is a flow chart of a method 200 for processing a training data set according to an embodiment of the present invention. As shown in FIG. 2, the method 200 starts at step 201, in which the weight of the characteristic attribute corresponding to each sample data in the training data set is normalized. In an embodiment of the present invention, the training data set is $T = \{T_1, T_2, \ldots, T_i\}$ with $T_i = (t_1, t_2, \ldots, t_k)$, where $t_k$ is a characteristic attribute. The weight of each characteristic attribute of the samples in the training data set is initialized to 0, attributes represented by non-numerical values are digitized, and all numerical values are normalized by the max-min normalization method.
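The max-min normalization in step 201 can be sketched as follows in Python. The function name is hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero; the text does not say how such columns are handled.

```python
def min_max_normalize(rows):
    """Rescale every attribute (column) of the data set to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # ASSUMPTION: a constant column carries no information, map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]
```

Putting every attribute on the same scale matters here because the later steps compare samples by Euclidean distance, which would otherwise be dominated by attributes with large numeric ranges.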
Preferably, in step 202, a sample is randomly selected, and the sample of the same type and the sample of a different type that have the smallest Euclidean distance to it are selected. In an embodiment of the invention, a sample $T_i$ is randomly selected from the training data set, and the same-type sample $T_{hit}$ and the different-type sample $T_{miss}$ with the smallest Euclidean distance to $T_i$ are selected, where the type indicates whether or not there is a defect. If $T_i$ is a defective sample, then $T_{hit}$ is a defective sample and $T_{miss}$ is a defect-free sample; if $T_i$ is a defect-free sample, then $T_{hit}$ is a defect-free sample and $T_{miss}$ is a defective sample.
Preferably, in step 203, the weight value of each feature attribute corresponding to the sample data is calculated and updated according to a weight value calculation formula, where the weight value calculation formula is:
Figure BDA0001255304380000091
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $\max(D(t_k))$ is the maximum Euclidean distance between all samples on attribute $t_k$, and $n$ is the number of iterations.
Preferably, the above two steps are repeated according to a preset number of times in step 204, and an average weight of each feature attribute is calculated.
Preferably, in step 205, the average weight of each feature attribute is compared with a preset threshold, and the feature attribute with the average weight greater than the preset threshold is selected as the effective feature attribute. In the embodiment of the invention, the average weight of each feature attribute is sorted, and then the feature attributes with the average weight larger than a preset threshold are selected for reservation.
Preferably, in step 206, the data corresponding to the effective characteristic attributes are selected to establish a new training data set. In an embodiment of the invention, the retained characteristic attributes form a characteristic attribute set, and the training data corresponding to this set are then selected for training and classification with the support vector machine model.
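Steps 202 to 205 can be sketched together in Python as below. This follows the classic Relief scheme, which matches the symbols defined for the weight formula, but the exact update is an assumption because the formula figure is missing. All function names are hypothetical, and the sketch assumes numeric samples with at least two samples of each type.

```python
import math
import random


def _nearest(samples, labels, i, same_type):
    """Index of the sample closest to sample i among those of the same
    (T_hit) or of a different (T_miss) type."""
    candidates = [j for j in range(len(samples))
                  if j != i and (labels[j] == labels[i]) == same_type]
    return min(candidates, key=lambda j: math.dist(samples[i], samples[j]))


def relief_weights(samples, labels, n_iter, seed=0):
    """Average Relief-style weight of every attribute over n_iter draws."""
    rng = random.Random(seed)
    dims = len(samples[0])
    # max(D(t_k)): largest distance between any two samples on attribute k.
    spans = [(max(s[k] for s in samples) - min(s[k] for s in samples)) or 1.0
             for k in range(dims)]
    weights = [0.0] * dims  # weights start at 0, as in step 201
    for _ in range(n_iter):
        i = rng.randrange(len(samples))
        hit = _nearest(samples, labels, i, same_type=True)
        miss = _nearest(samples, labels, i, same_type=False)
        for k in range(dims):
            d_hit = abs(samples[i][k] - samples[hit][k])
            d_miss = abs(samples[i][k] - samples[miss][k])
            # ASSUMPTION: classic Relief update, averaged over n_iter.
            weights[k] += (d_miss - d_hit) / (n_iter * spans[k])
    return weights


def select_features(weights, threshold):
    """Indices of the attributes whose average weight exceeds the threshold."""
    return [k for k, w in enumerate(weights) if w > threshold]
```

For example, with samples `[(0.0, 0.2), (0.1, 0.9), (0.9, 0.1), (1.0, 0.8)]` and labels `[0, 0, 1, 1]`, the first attribute separates the two types and receives a positive weight, while the second behaves like noise and receives a negative one.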
Preferably, in step 102, a Support Vector Machine (SVM) is trained according to the new training data set, and parameter optimization is performed through Particle Swarm Optimization (PSO), wherein the parameters include: a penalty factor and a Gaussian kernel bandwidth. FIG. 3 is a flowchart of a method 300 for performing parameter optimization by PSO according to an embodiment of the present invention. As shown in FIG. 3, the method 300 starts at step 301, in which the data are initialized, wherein the data include: a first learning factor, a second learning factor, an inertia weight, a number of iterations, and a particle swarm, the particle swarm comprising: the position and velocity of the particles. In an embodiment of the invention, the parameters optimized by the PSO algorithm are the penalty factor C and the bandwidth σ of the Gaussian kernel. The initialization covers: the velocity interval, the learning factors $c_1$ and $c_2$, the inertia weight $w$, the number of iterations $n$, and the particle swarm, where the particle swarm is expressed as $S = \{(s_{1\_c}, s_{1\_\sigma}), (s_{2\_c}, s_{2\_\sigma}), \ldots, (s_{num\_c}, s_{num\_\sigma})\}$ and includes the position $(s_{i\_c}, s_{i\_\sigma})$ and velocity $(v_{i\_c}, v_{i\_\sigma})$ of each particle and the population size $num$. The position of a particle corresponds to the penalty factor and Gaussian kernel bandwidth of the SVM model, i.e. the penalty factor $s_{i\_c}$ and the Gaussian kernel bandwidth $s_{i\_\sigma}$.
Preferably, the prediction accuracy of the SVM model is calculated as the fitness according to the position of each particle in step 302. In an embodiment of the invention, the fitness of the current iteration of the particle is calculated for each particle position
Figure BDA0001255304380000101
In the present invention, the prediction accuracy of the SVM model obtained with the current penalty factor $s_{i\_c}$ and Gaussian kernel bandwidth $s_{i\_\sigma}$ is used as the return value of the fitness function, namely:
Figure BDA0001255304380000102
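The idea of using prediction accuracy as the fitness of a particle can be sketched as below. The kernel uses the common parametrization $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$, which is an assumption because the patent's own kernel definition is not in the text. The `train_and_predict` callable is a hypothetical stand-in for whichever SVM implementation is used; it is not an API of any real library.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel with bandwidth sigma (common parametrization)."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def fitness(position, train_and_predict, val_x, val_y):
    """Fitness of one particle: accuracy of the SVM built from its position.

    position is (penalty factor, Gaussian kernel bandwidth).
    train_and_predict(c, sigma, val_x) is a HYPOTHETICAL callable that trains
    an SVM with those parameters and returns predicted labels for val_x.
    """
    c, sigma = position
    predictions = train_and_predict(c, sigma, val_x)
    correct = sum(1 for p, t in zip(predictions, val_y) if p == t)
    return correct / len(val_y)
```

Libraries that parametrize the Gaussian kernel as $\exp(-\gamma \lVert x - y \rVert^2)$ correspond to $\gamma = 1 / (2\sigma^2)$ under this parametrization.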
Preferably, the fitness is compared with the individual fitness extreme value in step 303, and if the fitness is better than the individual fitness extreme value, the individual fitness extreme value of the particle and the corresponding best position are updated with this fitness; and if the fitness is also better than the individual fitness extreme values of all other particles and the group fitness extreme value of the previous iteration, the group fitness extreme value of the current iteration and the corresponding best position are updated with this fitness.
Preferably, in step 304, it is determined whether the maximum iteration number is reached or the population fitness extreme value is greater than a preset population fitness extreme value, and if the maximum iteration number is reached or the population fitness extreme value is not less than the preset population fitness extreme value, the step 305 is performed; if the maximum iteration number is not reached and the population fitness extreme value is smaller than the preset population fitness extreme value, step 306 is entered.
Preferably, the best position corresponding to the extreme value of the population fitness at this time is output as the optimal parameter value in step 305.
Preferably, in step 306, the position and velocity of each particle are updated and the method returns to step 302. That is, if the maximum number of iterations has not been reached and the population fitness extreme value is less than the preset population fitness extreme value, the position and velocity of each particle are updated and step 302 is repeated until the maximum number of iterations is reached or the population fitness extreme value is not less than the preset population fitness extreme value, whereupon the best position corresponding to the population fitness extreme value at that moment is output as the optimal parameter value. In the embodiment of the present invention, if the fitness of the current position of particle i is better than the individual fitness extremum $fitness(PBest_i)$, the best position corresponding to the individual fitness extremum of the particle is updated with that position; if the fitness is also better than the individual extrema of all other particles and the population extremum $fitness(GBest^{k-1})$ of the previous iteration, the best position corresponding to the group extremum of the current iteration is updated with that position. If the maximum number of iterations is reached or the current population extremum $fitness(GBest^k)$ meets the accuracy requirement, the iteration stops and the best position $GBest^k$ corresponding to the group extremum is output as the optimal parameters for training the SVM model.
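The bookkeeping of steps 303 to 305 can be sketched as below. The names `Best`, `update_bests` and `should_stop` are hypothetical helpers introduced for illustration; "better" is taken to mean "higher", since the fitness is a prediction accuracy.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Position = Tuple[float, float]  # (penalty factor, Gaussian kernel bandwidth)


@dataclass
class Best:
    position: Position
    fitness: float


def update_bests(positions: Sequence[Position], fitnesses: Sequence[float],
                 pbest: List[Best], gbest: Best) -> Best:
    """Step 303: refresh each particle's individual best and the group best."""
    for i, (pos, fit) in enumerate(zip(positions, fitnesses)):
        if fit > pbest[i].fitness:
            pbest[i] = Best(pos, fit)   # individual fitness extreme value
        if fit > gbest.fitness:
            gbest = Best(pos, fit)      # group fitness extreme value
    return gbest


def should_stop(iteration: int, max_iterations: int,
                gbest: Best, target_fitness: float) -> bool:
    """Step 304: stop at the iteration limit or once the preset fitness is reached."""
    return iteration >= max_iterations or gbest.fitness >= target_fitness
```

When `should_stop` returns true, `gbest.position` plays the role of the optimal parameter value output in step 305.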
Preferably, wherein said updating the position and velocity of each particle comprises:
Figure BDA0001255304380000114
Figure BDA0001255304380000115
Figure BDA0001255304380000116
wherein $V_i^{k+1}$ is the updated velocity of particle i at the (k+1)-th iteration, $w_k$ is the inertial weight at the k-th iteration, $V_i^k$ is the velocity of particle i at the k-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle i at the k-th iteration, $S_i^k$ is the position of particle i at the k-th iteration, $GBest^k$ is the best position corresponding to the group fitness extremum at the k-th iteration, $S_i^{k+1}$ is the position of particle i at the (k+1)-th iteration, num is the number of particles, $close_k$ is the degree of population clustering at the k-th iteration,
Figure BDA0001255304380000117
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter of the solution space, w is the inertial weight, $w_{min}$ is the lower bound of the inertial weight, and $w_{max}$ is the upper bound of the inertial weight. In an embodiment of the present invention, $c_1$ and $c_2$ mainly affect the balance between the individual memory and the population memory of the particles; when the velocity and position of the particles are updated, $c_1$ is set to 1.6 and $c_2$ to 1.5. The inertial weight w mainly affects the balance between a particle's history and its current state. If the value is too large, the search emphasizes global exploration and neglects local search, so a particle approaching the optimal solution does not settle and instead overshoots it; if the value is too small, the particles move too slowly and cannot approach the optimal solution quickly. The invention therefore provides a particle swarm optimization algorithm with dynamic inertial weight: once the population has quickly concentrated near the optimal solution, the moving speed is gradually reduced so that each particle can search the fitness of its surrounding space more finely and accurately, which enhances the overall performance of the standard PSO algorithm.
Defining a variable close to represent the aggregation degree of the population, wherein the aggregation degree of the population at each iteration k is as follows:
Figure BDA0001255304380000121
wherein the value range of close is (0,1),
Figure BDA0001255304380000122
denotes the Euclidean distance of each particle from the average position (center of gravity) of the population, and $|S_{max} - S_{min}|$ represents the maximum diameter of the solution space. close describes how closely the particle swarm has gathered around the optimal solution space after each iteration: the larger its value, the more dispersed the swarm; the smaller its value, the more concentrated. After the particles have aggregated, the value of the inertial weight w needs to be gradually reduced. By quantifying the aggregation degree of the particles, the aggregation degree can be mapped to the value range of the inertial weight, so that the inertial weight for different degrees of concentration can be obtained. To this end, the calculation formula of w is set as follows:
Figure BDA0001255304380000123
With this formula, once the swarm has quickly approached the optimal region, the local search is refined so that the algorithm converges as soon as possible and obtains the optimal solution. Here $w_{min}$ and $w_{max}$ are the lower bound and upper bound of w, set to 0.8 and 1.2, respectively.
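A rough sketch of one dynamic-inertia update follows, under several loudly stated assumptions. The exact formulas for close and w are given in the figures and are not reproduced here: this sketch assumes close is the mean particle-to-centroid distance divided by the diagonal of the search box, and assumes a linear map from close to w between $w_{min}$ and $w_{max}$. The velocity and position updates use the standard PSO form that matches the symbol definitions above, with rand1() and rand2() drawn once per particle. All function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random
from typing import List, Sequence, Tuple


def clustering_degree(positions: Sequence[Sequence[float]],
                      s_min: Sequence[float], s_max: Sequence[float]) -> float:
    """ASSUMED form of close: mean distance to the centroid / search-box diagonal."""
    num = len(positions)
    dim = len(positions[0])
    center = [sum(p[d] for p in positions) / num for d in range(dim)]
    mean_dist = sum(math.dist(p, center) for p in positions) / num
    return mean_dist / math.dist(s_min, s_max)


def inertia_weight(close: float, w_min: float = 0.8, w_max: float = 1.2) -> float:
    """ASSUMED linear map: a dispersed swarm (large close) gets a larger weight."""
    return w_min + (w_max - w_min) * close


def update_particle(position: Sequence[float], velocity: Sequence[float],
                    pbest: Sequence[float], gbest: Sequence[float], w: float,
                    c1: float = 1.6, c2: float = 1.5,
                    rng: random.Random = random.Random(0)
                    ) -> Tuple[List[float], List[float]]:
    """Standard PSO velocity and position update for one particle."""
    r1, r2 = rng.random(), rng.random()  # rand1(), rand2()
    new_velocity = [w * v + c1 * r1 * (pb - s) + c2 * r2 * (gb - s)
                    for s, v, pb, gb in zip(position, velocity, pbest, gbest)]
    new_position = [s + v for s, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

The defaults mirror the values stated in the text ($c_1$ = 1.6, $c_2$ = 1.5, w between 0.8 and 1.2); clamping velocities to the initialized velocity interval is omitted for brevity.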
Preferably, in step 103, defect prediction is performed on the prediction data by the SVM model with the optimized parameters, and a prediction result is obtained. Preferably, before step 103, the method further includes: performing defect prediction on the test data set by the SVM model with the optimized parameters, and verifying the accuracy of the optimized parameters. In the embodiment of the invention, to verify the performance of the proposed defect prediction method, the advantages of the proposed model are illustrated by four indexes (accuracy, precision, recall and F value) on four data sets. The proposed model was implemented in MATLAB and compared with LE-SVM and LE-KNN. Four experimental data sets conforming to the CK metrics were used to verify the effectiveness of the defect prediction method. The first is the class-level data for KC1 provided by the National Aeronautics and Space Administration (NASA), comprising 145 samples with 89 characteristic attributes, of which 60 samples are defect-free and 85 are defective. The second is the eclipse2.0 data set, based on real data from the open-source Eclipse project, with 6728 samples comprising 975 defective and 5753 non-defective samples. The third is the eclipse3.0 data set, comprising 9470 samples, of which 1522 are defective. The last is the ant-1.7 data set, which has 745 samples, and 166 samples without defects. For the eclipse and ant data sets, whether a sample is defective is represented by its number of bugs, so these values are first converted to the logical variables 1 and 0, representing defective and defect-free.
Meanwhile, because the manifold learning algorithm suffers from data point loss when reducing high-dimensional data, 700 samples are randomly selected from the last three data sets and randomly divided into two groups of equal size, used as the training set and the test set, respectively.
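The two preprocessing steps described above (converting bug counts to 0/1 labels, then drawing 700 samples and halving them) can be sketched as follows. The function names and the fixed seed are illustrative assumptions; the original experiments were run in MATLAB.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def binarize(bug_counts: Sequence[int]) -> List[int]:
    """Map a bug count to 1 (defective) or 0 (defect-free)."""
    return [1 if count > 0 else 0 for count in bug_counts]


def sample_and_split(samples: Sequence[T], k: int = 700,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly draw k samples without replacement and split them into two halves."""
    rng = random.Random(seed)
    chosen = rng.sample(list(samples), k)
    half = k // 2
    return chosen[:half], chosen[half:]  # (training set, test set)
```

Because `random.sample` already returns the selection in random order, cutting the list in the middle gives a random split into two groups of equal size.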
Table 1 is the confusion matrix of actual defect status and predicted results. As shown in Table 1, the total number of test samples is N1 + N2 + N3 + N4, the number of correctly predicted samples is N1 + N4, and the number of incorrectly predicted samples is N2 + N3.
TABLE 1 Confusion matrix of actual defect status and predicted results
Figure BDA0001255304380000131
Accuracy represents the ratio of the number of samples with correct prediction results (defective modules detected as defective, and non-defective modules not misjudged) to the total number of samples to be predicted, and is calculated as follows:
$$Accuracy = \frac{N1 + N4}{N1 + N2 + N3 + N4}$$
Precision represents the ratio of the number of samples that are actually defective and predicted to be defective to the number of all samples predicted to be defective, and can be expressed as:
Figure BDA0001255304380000141
Recall represents the ratio of the number of samples that are actually defective and predicted to be defective to the number of all actually defective samples, and is calculated as follows:
Figure BDA0001255304380000142
The F value is the harmonic mean of precision and recall, and is calculated as follows:
$$F = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
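The four indexes can be computed from confusion-matrix counts as in the sketch below. Because Table 1 is only available as an image, the mapping of N1 to N4 onto cells is not assumed here; the code instead uses the conventional names `tp`, `fp`, `fn` and `tn` (true/false positives/negatives, with "positive" meaning predicted defective), and returns 0.0 for an undefined ratio as an illustrative convention.

```python
from typing import Dict


def defect_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F value from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_value": f_value}


if __name__ == "__main__":
    print(defect_metrics(tp=40, fp=10, fn=20, tn=30))
```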
FIG. 4 is a graph comparing accuracy across data sets according to embodiments of the present invention. As shown in fig. 4, the prediction method of the present invention is superior to the comparison algorithms on all four data sets. The main reason is that the Relief algorithm removes attributes that are unfavorable for classification, such as attributes with too small a numerical difference, which makes the prediction result more accurate; in addition, PSO finds the penalty factor and Gaussian kernel bandwidth that optimize the performance of the trained SVM model, further improving the accuracy of the prediction result. The parameters of the comparison algorithms can be obtained only through experience, without an optimization process, so their results lie some distance from the optimal solution and their predictions fall short of the proposed model: the accuracy of the proposed model on the four data sets is 8.2% to 12.2% higher than that of the comparison models.
FIG. 5 is a graph comparing precision across data sets according to embodiments of the present invention. FIG. 6 is a graph comparing recall on various data sets, according to an embodiment of the invention. FIG. 7 is a graph comparing F-values on various data sets according to an embodiment of the present invention. FIGS. 5, 6 and 7 respectively show the precision, recall and F value of the proposed method and the comparison algorithms on the four data sets. It can be seen that the three indexes of the LE-SVM and LE-KNN algorithms are similar on the latter three data sets, and that their difference on CL-KC1 is mainly in recall. The main reason is that manifold learning, when reducing the feature dimension, does not retain a subset of the original feature attributes but stores the main information of the original data set in a newly generated low-dimensional data set; for such data, the choice of prediction method has less influence on the prediction result. This also explains why the difference between the accuracies of the two algorithms in fig. 4 is relatively small.
The prediction method provided by the invention uses the efficient Relief algorithm, which is well suited to binary classification problems, for dimension reduction. Because the solution space of the penalty factor and Gaussian kernel bandwidth is optimized, the algorithm has corresponding optimal parameters for different test sets, and the improved dynamic inertial weight PSO algorithm avoids, as far as possible, the problems of fixed convergence speed and local optima, so that the best defect prediction result is obtained. Averaging each index of the three methods over the four data sets shows that, compared with the LE-SVM algorithm, the proposed prediction method achieves 9.9% higher precision, 5.6% higher recall, and a 7.7% lead in F value.
FIG. 8 is a block diagram illustrating a system 800 for predicting object-oriented software bugs, according to an embodiment of the present invention. As shown in fig. 8, the system 800 for predicting object-oriented software defects includes: a data processing unit 801, a parameter optimization unit 802, and a defect prediction unit 803. Preferably, the data processing unit 801 processes a training data set to obtain effective feature attributes, and establishes a new training data set according to the effective feature attributes, where the training data set includes: a defective data set and a non-defective data set. Preferably, the processing the training data set in the data processing unit 801 to obtain effective feature attributes, and establishing a new training data set according to the effective feature attributes includes:
normalizing the weight of the characteristic attribute corresponding to each sample data in the training data set;
randomly selecting a sample data, and respectively selecting a sample data of the same type and a sample data of a different type with the minimum Euclidean distance from the sample data;
calculating and updating the weight of each characteristic attribute corresponding to the sample data according to a weight calculation formula, wherein the weight calculation formula is as follows:
Figure BDA0001255304380000151
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is the sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $max(D(t_k))$ is the maximum Euclidean distance of all samples on attribute $t_k$, and n is the number of iterations.
Calculating the weight of each characteristic attribute corresponding to a plurality of sample data according to the preset times, and calculating the average weight of each characteristic attribute;
comparing the average weight of each characteristic attribute with a preset threshold, and selecting the characteristic attribute with the average weight larger than the preset threshold as an effective characteristic attribute; and
and selecting data corresponding to the effective characteristic attributes to establish a new training data set.
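The feature-selection steps above can be sketched as a Relief-style procedure. This is an illustration under assumptions rather than the patent's exact formula, which is only available as a figure: the update used here is the textbook Relief rule (reward separation from the nearest different-type sample, penalize separation from the nearest same-type sample, each scaled by the attribute's range and by the number of iterations), and the names `relief_weights` and `select_features` are hypothetical. Each class needs at least two samples, and `math.dist` requires Python 3.8 or later.

```python
import math
import random
from typing import List, Sequence


def relief_weights(samples: Sequence[Sequence[float]], labels: Sequence[int],
                   iterations: int, seed: int = 0) -> List[float]:
    """Average Relief-style weight per characteristic attribute."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Range of each attribute, used to scale per-attribute distances.
    spans = []
    for k in range(n_features):
        column = [s[k] for s in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def nearest(i: int, same_class: bool) -> int:
        """Index of the closest sample of the same (hit) or a different (miss) type."""
        candidates = [j for j in range(len(samples))
                      if j != i and (labels[j] == labels[i]) == same_class]
        return min(candidates, key=lambda j: math.dist(samples[i], samples[j]))

    weights = [0.0] * n_features
    for _ in range(iterations):
        i = rng.randrange(len(samples))          # randomly selected sample
        hit, miss = nearest(i, True), nearest(i, False)
        for k in range(n_features):
            d_hit = abs(samples[i][k] - samples[hit][k]) / spans[k]
            d_miss = abs(samples[i][k] - samples[miss][k]) / spans[k]
            weights[k] += (d_miss - d_hit) / iterations
    return weights


def select_features(weights: Sequence[float], threshold: float) -> List[int]:
    """Indices of attributes whose average weight exceeds the preset threshold."""
    return [k for k, w in enumerate(weights) if w > threshold]
```

The new training data set is then built by keeping only the columns returned by `select_features`.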
Preferably, the parameter optimization unit 802 trains a support vector machine SVM according to the new training data set, and performs parameter optimization through a particle swarm optimization PSO, where the parameters include: penalty factor and gaussian kernel bandwidth. Preferably, the training of the support vector machine SVM according to the new training data set in the parameter optimization unit 802 and the parameter optimization by the particle swarm optimization PSO include:
initializing settings for data, wherein the data comprises: a first learning factor, a second learning factor, an inertial weight, an iteration number, and a particle swarm, the particle swarm comprising: the position and velocity of the particle;
calculating the prediction accuracy of the SVM model according to the position of each particle to serve as fitness;
comparing the fitness with the individual fitness extreme value, and if the fitness is superior to the individual fitness extreme value, updating the individual fitness extreme value of the particle and the best position corresponding to the fitness by using the fitness; if the fitness is better than the individual fitness extreme value of all other particles and the group fitness extreme value in the previous iteration, updating the group fitness extreme value in the current iteration and the best position corresponding to the fitness extreme value by using the fitness; and
judging whether the maximum iteration times is reached or the population fitness extreme value is larger than a preset population fitness extreme value,
if the maximum iteration times is reached or the group fitness extreme value is not less than the preset group fitness extreme value, outputting the best position corresponding to the group fitness extreme value at the moment as an optimal parameter value;
and if the maximum iteration times are not reached and the population fitness extreme value is smaller than the preset population fitness extreme value, updating the position and the speed of each particle, returning to the step to calculate the prediction accuracy of the SVM model according to the position of each particle to be used as the fitness until the maximum iteration times are reached or the population fitness extreme value is not smaller than the preset population fitness extreme value, and outputting the best position corresponding to the population fitness extreme value at the moment as the optimal parameter value.
Preferably, wherein said updating the position and velocity of each particle comprises:
Figure BDA0001255304380000171
Figure BDA0001255304380000172
Figure BDA0001255304380000173
wherein $V_i^{k+1}$ is the updated velocity of particle i at the (k+1)-th iteration, $w_k$ is the inertial weight at the k-th iteration, $V_i^k$ is the velocity of particle i at the k-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle i at the k-th iteration, $S_i^k$ is the position of particle i at the k-th iteration, $GBest^k$ is the best position corresponding to the group fitness extremum at the k-th iteration, $S_i^{k+1}$ is the position of particle i at the (k+1)-th iteration, num is the number of particles, $close_k$ is the degree of population clustering at the k-th iteration,
Figure BDA0001255304380000174
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter of the solution space, w is the inertial weight, $w_{min}$ is the lower bound of the inertial weight, and $w_{max}$ is the upper bound of the inertial weight.
Preferably, the defect prediction unit 803 performs defect prediction on the prediction data by the SVM model with the optimized parameters, and obtains a prediction result. Preferably, the system further comprises a verification unit 804, configured to, before the defect prediction unit 803 performs defect prediction on the prediction data and obtains a prediction result, perform defect prediction on the test data set by the SVM model with the optimized parameters and verify the accuracy of the optimized parameters.
The system 800 for predicting object-oriented software defects according to the embodiment of the present invention corresponds to the method 100 for predicting object-oriented software defects according to another embodiment of the present invention, and will not be described herein again.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (6)

1. A method for predicting object-oriented software bugs, the method comprising:
processing an original training data set to obtain effective characteristic attributes, and establishing an updated training data set according to the effective characteristic attributes, wherein the updated training data set comprises: a defective data set and a non-defective data set;
training a Support Vector Machine (SVM) according to the updated training data set, and performing parameter optimization through a Particle Swarm Optimization (PSO), wherein the parameters comprise: penalty factor and gaussian kernel bandwidth; and
performing defect prediction on the prediction data by utilizing an SVM model according to the optimized parameters, and acquiring a prediction result;
wherein, the training of the SVM according to the updated training data set and the parameter optimization through PSO comprise:
step 1, initializing and setting data, wherein the data comprises: a first learning factor, a second learning factor, an inertial weight, an iteration number, and a particle swarm, the particle swarm comprising: the position and velocity of the particle;
step 2, calculating the prediction accuracy of the SVM model according to the position of each particle to serve as fitness;
step 3, comparing the fitness with an individual fitness extreme value, and if the fitness is better than the individual fitness extreme value, updating the individual fitness extreme value of the particle and the best position corresponding to the fitness by using the fitness; if the fitness is better than the individual fitness extreme value of all other particles and the group fitness extreme value in the previous iteration, updating the group fitness extreme value in the current iteration and the best position corresponding to the fitness extreme value by using the fitness;
step 4, judging whether the maximum iteration times is reached or the group fitness extreme value is larger than a preset group fitness extreme value,
if the maximum iteration times is reached or the population fitness extreme value is not less than the preset population fitness extreme value, entering the step 5;
if the maximum iteration times are not reached and the population fitness extreme value is smaller than the preset population fitness extreme value, entering step 6;
step 5, outputting the best position corresponding to the extreme value of the group fitness at the moment as an optimal parameter value; and
step 6, updating the position and the speed of each particle, and returning to the step 2;
the updating the position and the velocity of each particle comprises:
Figure 229911DEST_PATH_IMAGE001
wherein $V_i^{k+1}$ is the updated velocity of particle i at the (k+1)-th iteration, $w_k$ is the inertial weight at the k-th iteration, $V_i^k$ is the velocity of particle i at the k-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle i at the k-th iteration, $S_i^k$ is the position of particle i at the k-th iteration, $GBest^k$ is the best position corresponding to the group fitness extremum at the k-th iteration, $S_i^{k+1}$ is the position of particle i at the (k+1)-th iteration, num is the number of particles, $close_k$ is the degree of population clustering at the k-th iteration,
Figure 317953DEST_PATH_IMAGE002
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter of the solution space, w is the inertial weight, $w_{min}$ is the lower bound of the inertial weight, and $w_{max}$ is the upper bound of the inertial weight; both rand1() and rand2() are random functions.
2. The method of claim 1, wherein the processing the original training data set to obtain valid feature attributes and establishing an updated training data set according to the valid feature attributes comprises:
step 1, normalizing the weight of the characteristic attribute corresponding to each sample data in the original training data set;
step 2, randomly selecting one sample data, and respectively selecting one sample data of the same type and one sample data of a different type with the minimum Euclidean distance from the sample data;
step 3, calculating and updating the weight of each characteristic attribute corresponding to the sample data according to a weight calculation formula, wherein the weight calculation formula is as follows:
Figure 116145DEST_PATH_IMAGE003
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is the sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $max(D(t_k))$ is the maximum Euclidean distance of all samples on attribute $t_k$, and n is the number of iterations;
step 4, repeating the step 2 and the step 3 according to preset times, and calculating the average weight of each characteristic attribute;
step 5, comparing the average weight of each characteristic attribute with a preset threshold, and selecting the characteristic attribute with the average weight larger than the preset threshold as an effective characteristic attribute; and
and 6, selecting data corresponding to the effective characteristic attributes to establish an updated training data set.
3. The method of claim 1, further comprising, before said performing a defect prediction on the prediction data by using an SVM model according to the optimized parameters and obtaining a prediction result:
and performing defect prediction on the test data set by utilizing an SVM model according to the optimized parameters, and verifying the accuracy of the optimized parameters.
4. A system for predicting object-oriented software bugs, the system comprising: a data processing unit, a parameter optimization unit and a defect prediction unit,
the data processing unit is configured to process an original training data set, obtain an effective characteristic attribute, and establish an updated training data set according to the effective characteristic attribute, where the updated training data set includes: a defective data set and a non-defective data set;
the parameter optimization unit is configured to train a Support Vector Machine (SVM) according to the updated training data set, and perform parameter optimization through a Particle Swarm Optimization (PSO), where the parameters include: penalty factor and gaussian kernel bandwidth; and
the defect prediction unit is used for predicting defects of prediction data by utilizing an SVM model according to the optimized parameters and acquiring a prediction result;
the parameter optimization unit trains the SVM according to the updated training data set and performs parameter optimization through a Particle Swarm Optimization (PSO), and the parameter optimization method comprises the following steps:
initializing settings for data, wherein the data comprises: a first learning factor, a second learning factor, an inertial weight, an iteration number, and a particle swarm, the particle swarm comprising: the position and velocity of the particle;
calculating the prediction accuracy of the SVM model according to the position of each particle to serve as fitness;
comparing the fitness with the individual fitness extreme value, and if the fitness is superior to the individual fitness extreme value, updating the individual fitness extreme value of the particle and the best position corresponding to the fitness by using the fitness; if the fitness is better than the individual fitness extreme value of all other particles and the group fitness extreme value in the previous iteration, updating the group fitness extreme value in the current iteration and the best position corresponding to the fitness extreme value by using the fitness; and
judging whether the maximum iteration times is reached or the population fitness extreme value is larger than a preset population fitness extreme value,
if the maximum iteration times is reached or the group fitness extreme value is not less than the preset group fitness extreme value, outputting the best position corresponding to the group fitness extreme value at the moment as an optimal parameter value;
if the maximum iteration times are not reached and the population fitness extreme value is smaller than the preset population fitness extreme value, updating the position and the speed of each particle, returning to the step to calculate the prediction accuracy of the SVM model according to the position of each particle to be used as the fitness until the maximum iteration times are reached or the population fitness extreme value is not smaller than the preset population fitness extreme value, and outputting the best position corresponding to the population fitness extreme value at the moment to be used as the optimal parameter value;
the updating the position and the velocity of each particle comprises:
Figure 128094DEST_PATH_IMAGE001
wherein $V_i^{k+1}$ is the updated velocity of particle i at the (k+1)-th iteration, $w_k$ is the inertial weight at the k-th iteration, $V_i^k$ is the velocity of particle i at the k-th iteration, $c_1$ and $c_2$ are fixed parameters, $PBest_i^k$ is the best position corresponding to the individual fitness extremum of particle i at the k-th iteration, $S_i^k$ is the position of particle i at the k-th iteration, $GBest^k$ is the best position corresponding to the group fitness extremum at the k-th iteration, $S_i^{k+1}$ is the position of particle i at the (k+1)-th iteration, num is the number of particles, $close_k$ is the degree of population clustering at the k-th iteration,
Figure 328131DEST_PATH_IMAGE004
is the Euclidean distance of each particle from the mean position (center of gravity) of the population, $|S_{max} - S_{min}|$ is the maximum diameter of the solution space, w is the inertial weight, $w_{min}$ is the lower bound of the inertial weight, and $w_{max}$ is the upper bound of the inertial weight; both rand1() and rand2() are random functions.
5. The system of claim 4, wherein the data processing unit processes the original training data set to obtain valid feature attributes, and establishes an updated training data set according to the valid feature attributes, comprising:
normalizing the weight of the characteristic attribute corresponding to each sample data in the original training data set;
randomly selecting a sample data, and respectively selecting a sample data of the same type and a sample data of a different type with the minimum Euclidean distance from the sample data;
calculating and updating the weight of each characteristic attribute corresponding to the sample data according to a weight calculation formula, wherein the weight calculation formula is as follows:
Figure 852654DEST_PATH_IMAGE005
wherein $t_k$ is a characteristic attribute, $Weight(t_k)$ is the weight corresponding to characteristic attribute $t_k$, $T_i$ is the sample data, $T_{miss}$ is the sample data of a different type with the smallest Euclidean distance to $T_i$, $T_{hit}$ is the sample data of the same type with the smallest Euclidean distance to $T_i$, $D(T_i, T_{miss}, t_k)$ is the Euclidean distance between $T_i$ and $T_{miss}$ on characteristic attribute $t_k$, $D(T_i, T_{hit}, t_k)$ is the Euclidean distance between $T_i$ and $T_{hit}$ on characteristic attribute $t_k$, $max(D(t_k))$ is the maximum Euclidean distance of all samples on attribute $t_k$, and n is the number of iterations;
calculating the weight of each characteristic attribute corresponding to a plurality of sample data according to the preset times, and calculating the average weight of each characteristic attribute;
comparing the average weight of each characteristic attribute with a preset threshold, and selecting the characteristic attribute with the average weight larger than the preset threshold as an effective characteristic attribute; and
and selecting data corresponding to the effective characteristic attributes to establish an updated training data set.
6. The system of claim 4, further comprising:
a verification unit, configured to, before defect prediction is performed on the prediction data by the SVM model with the optimized parameters and a prediction result is obtained, perform defect prediction on a test data set by the SVM model with the optimized parameters and verify the accuracy of the optimized parameters.
CN201710187847.9A 2017-03-27 2017-03-27 Method and system for predicting object-oriented software defects Active CN106991047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710187847.9A CN106991047B (en) 2017-03-27 2017-03-27 Method and system for predicting object-oriented software defects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710187847.9A CN106991047B (en) 2017-03-27 2017-03-27 Method and system for predicting object-oriented software defects

Publications (2)

Publication Number Publication Date
CN106991047A CN106991047A (en) 2017-07-28
CN106991047B CN106991047B (en) 2020-11-17

Family

ID=59413359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710187847.9A Active CN106991047B (en) 2017-03-27 2017-03-27 Method and system for predicting object-oriented software defects

Country Status (1)

Country Link
CN (1) CN106991047B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886160B (en) * 2017-10-25 2021-03-26 河北工程大学 BP neural network interval water demand prediction method
CN108304316B (en) * 2017-12-25 2021-04-06 浙江工业大学 Software defect prediction method based on collaborative migration
CN109522627B (en) * 2018-11-01 2022-12-02 西安电子科技大学 Fan blade icing prediction method based on SCADA (Supervisory control and data acquisition) data
CN110134108B (en) * 2019-05-14 2021-10-22 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Code defect testing method and device
CN110659719B (en) * 2019-09-19 2022-02-08 江南大学 Aluminum profile flaw detection method
CN111177011A (en) * 2020-01-02 2020-05-19 腾讯科技(深圳)有限公司 Software test-free prediction method, device, equipment and storage medium
CN111459838B (en) * 2020-04-20 2021-09-03 武汉大学 Software defect prediction method and system based on manifold alignment
CN111611010B (en) * 2020-04-24 2021-10-08 武汉大学 Interpretable method for code modification real-time defect prediction
CN112416783B (en) * 2020-11-25 2022-05-20 武汉联影医疗科技有限公司 Method, device, equipment and storage medium for determining software quality influence factors
CN113268434B (en) * 2021-07-08 2022-07-26 北京邮电大学 Software defect prediction method based on Bayes model and particle swarm optimization
CN114092768B (en) * 2021-11-30 2024-09-20 苏州浪潮智能科技有限公司 Screening method and device for training models in training model group and electronic equipment
CN114154730A (en) * 2021-12-08 2022-03-08 杭州电子科技大学 Software maintenance scale prediction method based on multilayer iteration

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8881095B1 (en) * 2012-03-30 2014-11-04 Sprint Communications Company L.P. Software defect prediction
CN104702460A (en) * 2013-12-10 2015-06-10 中国科学院沈阳自动化研究所 Method for detecting anomaly of Modbus TCP (transmission control protocol) communication on basis of SVM (support vector machine)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
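Feature Selection for High-Dimensional Data and Research on Ensemble Learning Based on Feature Selection; Zhang Lixin; China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology; 20050715; I138-1 *
Research on a Software Defect Prediction Model Based on ACO_SVM; Jiang Huiyan, Zong Mao, Liu Xiangying; Chinese Journal of Computers; 20110630; Vol. 34 (No. 6); 1148-1154 *
Object-Oriented Land Use Classification Based on Hybrid ReliefF and PSO Feature Selection; Xiao Yan, Jiang Qigang, Wang Bin, et al.; Transactions of the Chinese Society of Agricultural Engineering; 20160228; Vol. 32 (No. 4); 211-216 *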

Also Published As

Publication number Publication date
CN106991047A (en) 2017-07-28

Similar Documents

Publication Publication Date Title
CN106991047B (en) Method and system for predicting object-oriented software defects
CN108564129B (en) Trajectory data classification method based on generative adversarial network
CN110084610B (en) Network transaction fraud detection system based on twin neural network
CN110111113B (en) Abnormal transaction node detection method and device
CN108304316B (en) Software defect prediction method based on collaborative migration
CN111126564A (en) Neural network structure searching method, device and equipment
CN107103332A (en) Relevance vector machine classification method for large-scale datasets
CN110349597B (en) Voice detection method and device
CN112465040A (en) Software defect prediction method based on class imbalance learning algorithm
CN103279746B (en) Face recognition method and system based on support vector machine
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
CN115048988B (en) Unbalanced data set classification fusion method based on Gaussian mixture model
CN109948735A (en) Multi-label classification method, system, device and storage medium
CN113111804B (en) Face detection method and device, electronic equipment and storage medium
CN113541985A (en) Internet of things fault diagnosis method, training method of model and related device
CN110378389A (en) A kind of Adaboost classifier calculated machine creating device
EP3685266A1 (en) Power state control of a mobile device
CN112270405A (en) Filter pruning method and system of convolution neural network model based on norm
CN115112372A (en) Bearing fault diagnosis method and device, electronic equipment and storage medium
CN112884569A (en) Credit assessment model training method, device and equipment
CN114330650A (en) Small sample characteristic analysis method and device based on evolutionary element learning model training
CN106251861A (en) Method for detecting abnormal sounds in public places based on scene modeling
CN109948738B (en) Energy consumption abnormity detection method and device for coating drying chamber
CN111352926B (en) Method, device, equipment and readable storage medium for data processing
He et al. Single-stage heavy-tailed food classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant