CN108763096A - Software defect prediction method based on deep belief network and support vector machine - Google Patents


Info

Publication number
CN108763096A
Authority
CN
China
Legal status
Pending
Application number
CN201810571352.0A
Other languages
Chinese (zh)
Inventor
单纯
熊雯洁
位华
胡昌振
毛俐旻
Current Assignee
Beijing Institute of Technology BIT
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Technology BIT
Beijing Institute of Computer Technology and Applications
Priority date
2018-06-06
Filing date
2018-06-06
Publication date
2018-11-06
Application filed by Beijing Institute of Technology BIT and Beijing Institute of Computer Technology and Applications
Priority to CN201810571352.0A
Publication of CN108763096A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a software defect prediction method based on a deep belief network and support vector machine (DBN-SVM). A deep belief network (DBN) reduces the dimensionality of the software metric attributes extracted from the software to be predicted; the reduced data are then classified by a support vector machine (SVM) to obtain the software defect prediction result. By using this new software defect distribution prediction model, DBN-SVM, the invention addresses the loss of prediction accuracy in software defect distribution prediction caused by the data redundancy that high-dimensional metrics introduce.

Description

Software defect prediction method based on deep belief network and support vector machine
Technical field
The present invention relates to software defect prediction technology, and in particular to a software defect prediction method based on a deep belief network and support vector machine.
Background art
Software defect distribution prediction plays an important role in the software development process: timely and accurate prediction of defective software modules greatly improves the allocation of software testing resources. Static analysis can find defects in software before it is released, without reducing the software's runtime efficiency.
Therefore, in recent years many researchers have formed training samples by extracting the software metric attributes of software modules and built software defect distribution prediction models with machine learning techniques, applying machine learning to static software defect prediction. Traditional defect prediction models mainly apply common supervised machine learning algorithms, when sufficient defect data are available, to train on and predict the defect data of the same software. Commonly used algorithms include decision tree (DT), random forest (RF), naive Bayes (NB), logistic regression (LR), support vector machine (SVM) and artificial neural network (ANN).
However, as modern large-scale software systems grow in size and complexity, building a defect prediction model with machine learning means dealing with very high-dimensional metric data. In software defect prediction, too many metric attributes cause data redundancy, which leads to higher prediction cost and lower prediction accuracy. In addition, because the data are nonlinear, using an SVM alone introduces interference when it constructs the separating hyperplane for prediction.
Summary of the invention
In view of this, the present invention provides a software defect distribution prediction method based on a deep belief network and support vector machine (DBN-SVM). By using this new software defect distribution prediction model, DBN-SVM, it addresses the loss of prediction accuracy in software defect distribution prediction caused by the data redundancy that high-dimensional metrics introduce.
To solve the above technical problem, the invention is implemented as follows:
A software defect prediction method based on a deep belief network and support vector machine (DBN-SVM), comprising:
reducing, with a deep belief network (DBN), the dimensionality of the software metric attributes extracted from the software to be predicted;
classifying the dimensionality-reduced data with a support vector machine (SVM) to obtain the software defect prediction result.
Preferably, the sample data set of the DBN-SVM model is the software defect prediction data set MDP from the US National Aeronautics and Space Administration (NASA).
Preferably, the JM1, MC1 and PC5 data sets in MDP are selected for training and verification.
Preferably, training the DBN-SVM model comprises the following steps:
Step 1: select part of the sample data set as training set X and use it to pre-train the DBN, obtaining the dimensionality-reduced training set X1 and the constructed DBN;
Step 2: input the training set X1 into the SVM classifier to train the classifier;
Step 3: select a test set Y from the sample data set, input it into the trained DBN-SVM to obtain prediction results, and compare them with the actual results to verify the model's effectiveness.
Preferably, the sample data set is randomly divided into 10 subsets. Each time, one subset is chosen as the test set and the remaining 9 subsets serve as the training set; this ten-fold cross-validation gives 10 verification experiments in total. The 10 experimental results are compared with the actual results, and the average of the ten comparisons is used to evaluate the model's performance.
Beneficial effects:
(1) The present invention builds a two-layer network from a DBN and an SVM. The DBN addresses the redundancy of software metric attribute data by reducing its dimensionality. Classifying the reduced data with the SVM gives the final defect distribution prediction higher accuracy than other traditional prediction techniques; at the same time, the use of the DBN effectively speeds up neural network training compared with other neural-network-based defect prediction methods.
(2) Compared with traditional prediction methods, this patent can efficiently extract and process the data features of the source program, and it outperforms defect prediction methods such as SVM, LLE-SVM and NPE-SVM.
Description of the drawings
Fig. 1 shows the different data sets in the MDP data.
Fig. 2 shows the software metric attributes contained in the data sets.
Fig. 3 shows the structure used to pre-train the DBN.
Fig. 4 shows the two-layer network built from the DBN and the SVM.
Detailed description of the embodiments
The present invention will now be described in detail with reference to the accompanying drawings and examples.
To predict the various defects in software more accurately and thereby improve software quality, it is necessary to reduce the dimensionality of high-dimensional software metric data. Manifold learning is an important method for handling high-dimensional data: it can discover the real structure hidden in high-dimensional software metric data. The methods researchers have mainly proposed so far include locally linear embedding (LLE), neighborhood preserving embedding (NPE) and Isomap. The metric data after dimensionality reduction must then be classified by a prediction model built with machine learning methods.
Inspired by the success of deep learning in image processing, speech recognition and natural language processing, the applicant believes that deep learning methods such as the deep belief network (DBN) can also be effective for detecting software defects. A DBN (Deep Belief Net) is a neural network algorithm made up of several layers of neurons, whose building block is the restricted Boltzmann machine (RBM). When a DBN is used to learn features, training the weights between its neurons preserves the characteristics of the original features while reducing their dimensionality. The DBN can effectively reduce the data dimension in the first layer, reducing the interference that high-dimensional data cause for the SVM. Compared with other neural networks, the main advantage of the DBN is faster computation, which addresses the slow training of large neural networks.
Therefore, the present invention proposes a new software defect distribution prediction model that combines a deep belief network with a support vector machine. A deep belief network is built to extract effective features from the software metric attributes, achieving data dimensionality reduction; a defect distribution prediction model is then built from the reduced data, with an SVM as its base classifier.
As can be seen, the present invention builds a two-layer network from a DBN and an SVM, as shown in Fig. 4. The DBN addresses the redundancy of software metric attribute data by reducing its dimensionality. Classifying the reduced data with the SVM gives the final defect distribution prediction higher accuracy than other traditional prediction techniques; at the same time, the use of the DBN effectively speeds up neural network training compared with other neural-network-based defect prediction methods.
The present invention is described in detail below.
Step 1: Obtain the software defect prediction data set.
The experimental data used in the present invention come from NASA's MDP, which is widely used in software defect prediction research. It contains 13 data sets, as shown in Fig. 1. Each data set contains multiple samples; each sample corresponds to one software module, and each software module is described by 21 static code attributes and 1 label attribute. The static code attributes characterize each record and include lines of code (LOC), Halstead attributes and McCabe attributes; the final label indicates whether the software module corresponding to the record is defective: true if it is defective, false if it is defect-free. In this patent, the JM1, MC1 and PC5 data sets from NASA are selected for training and verification.
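A minimal sketch of how one MDP record might be split into its static code attributes and its true/false defect label. It assumes, hypothetically, a comma-separated layout with the 21 numeric attributes first and the label last; actual MDP file formats and attribute counts may differ, and `parse_record`/`load_dataset` are illustrative names, not part of the patent.

```python
# Sketch only. ASSUMPTION: each record is one comma-separated line with the
# 21 numeric static code attributes followed by a "true"/"false" label.

N_STATIC_ATTRIBUTES = 21  # per the data set description


def parse_record(line):
    """Split a record into its static code attributes and a 0/1 defect label."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_STATIC_ATTRIBUTES + 1:
        raise ValueError(f"expected {N_STATIC_ATTRIBUTES + 1} fields, got {len(fields)}")
    *attrs, label = fields
    label = label.lower()
    if label not in ("true", "false"):
        raise ValueError(f"unexpected label {label!r}")
    # Defective modules (true) map to 1, defect-free modules (false) to 0.
    return [float(a) for a in attrs], 1 if label == "true" else 0


def load_dataset(lines):
    """Parse many records, skipping blank lines; returns (X, y)."""
    X, y = [], []
    for line in lines:
        if line.strip():
            features, label = parse_record(line)
            X.append(features)
            y.append(label)
    return X, y
```

Separating the label at load time also matches the later pre-training step, where the label attribute is split off before the attributes enter the bottom RBM.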
The user can select specific data sets from it as the sample data set.
This step is completed before the defect distribution prediction model starts predicting.
Step 2: Select a training set from the sample data set, use it to build and pre-train the DBN, and obtain the dimensionality-reduced training set and the constructed DBN.
This step selects training set X and test set Y from the sample data set obtained in step 1. Training set X is used to build and pre-train the DBN; test set Y is used to test the resulting model.
To verify the predictive ability of the model, this patent uses ten-fold cross-validation. First, the sample data set obtained in step 1 is randomly divided into 10 subsets. In each experiment, one subset is chosen as the test set and the remaining 9 subsets serve as the training set, for 10 verification experiments in total. The model's performance is evaluated by the average over the 10 experiments. The training set and test set used in each experiment are denoted X and Y respectively.
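The ten-fold split can be sketched in Python as below. `ten_fold_splits` is a hypothetical helper; the fixed `seed` is an assumption added for reproducibility, and because the patent does not say whether the folds are stratified by defect label, this sketch does not stratify.

```python
import random


def ten_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Sample indices are shuffled once and dealt into k near-equal subsets;
    each subset serves as the test set Y exactly once, while the other
    k - 1 subsets together form the training set X.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx
```

For 25 samples this yields 10 (train, test) pairs whose test sets have 2 or 3 indices each and together cover every sample exactly once.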
The DBN is pre-trained with training set X.
The DBN in the prediction model of this patent consists of 5 stacked RBMs; its structure is shown in Fig. 3.
The training set X obtained above is split into two parts: the decision label attribute is separated out, and the remaining attributes form the input of the visible layer of the bottom RBM. Training adjusts the weight parameters between the hidden layer and the visible layer of this RBM; the hidden layer then serves as the input of the second RBM, and the training step is repeated upward. After the second RBM has been fully trained, its weights are checked and new weights are generated, and so on until the top RBM is trained: there, the decision label is combined with the output of the preceding hidden layer to form the input of this RBM, and the output is produced after training. This output is the pre-trained training set. By adjusting the inter-layer weights through layer-by-layer training, the DBN reduces the data redundancy of the pre-trained training set it finally outputs and extracts the effective attributes.
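A sketch of the greedy layer-wise pre-training described above, using NumPy and one-step contrastive divergence (CD-1) to train each Bernoulli RBM. Several details are assumptions not stated in the patent: min-max scaling of the attributes to [0, 1] so that they suit Bernoulli visible units, the learning rate, epoch count and batch size, and the layer widths in the usage note. The patent also feeds the label into the top RBM together with the previous hidden output; this sketch omits that step because the text does not say how unlabeled test inputs are handled, and it performs no supervised fine-tuning.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # visible-hidden weights
        self.b_v = np.zeros(n_visible)  # visible-layer bias
        self.b_h = np.zeros(n_hidden)   # hidden-layer bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def fit(self, data, epochs=10, lr=0.05, batch_size=32):
        for _ in range(epochs):
            order = self.rng.permutation(len(data))
            for start in range(0, len(data), batch_size):
                v0 = data[order[start:start + batch_size]]
                h0 = self.hidden_probs(v0)  # positive phase
                h_sample = (self.rng.random(h0.shape) < h0).astype(float)
                v1 = self.visible_probs(h_sample)  # one Gibbs step: reconstruction
                h1 = self.hidden_probs(v1)  # negative phase
                n = len(v0)
                self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
                self.b_v += lr * (v0 - v1).mean(axis=0)
                self.b_h += lr * (h0 - h1).mean(axis=0)
        return self


def min_max_scale(X, lo=None, hi=None):
    """Rescale attributes to [0, 1]; pass the training lo/hi when scaling test data."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return np.clip((X - lo) / span, 0.0, 1.0), lo, hi


def pretrain_dbn(X, layer_sizes, seed=0, **fit_kwargs):
    """Greedy layer-wise pre-training: each RBM is trained on the previous RBM's output."""
    rng = np.random.default_rng(seed)
    layers, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng).fit(data, **fit_kwargs)
        layers.append(rbm)
        data = rbm.hidden_probs(data)  # becomes the next RBM's visible input
    return layers, data  # data is the reduced training set X1


def transform(layers, X):
    """Propagate new data (e.g. a scaled test set Y) through the trained stack."""
    for rbm in layers:
        X = rbm.hidden_probs(X)
    return X
```

For example, with five hypothetical layer widths `[18, 15, 12, 10, 8]`, `pretrain_dbn(X, [18, 15, 12, 10, 8])` turns a scaled 21-attribute training set into an 8-dimensional X1, and `transform(layers, Y_scaled)` applies the same stack to a test set scaled with the training set's `lo`/`hi`. Using hidden-unit probabilities rather than samples as the next layer's input is a common choice that keeps the transform deterministic.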
This step yields the dimensionality-reduced training set, i.e. the pre-trained training set X1, together with the constructed DBN.
This step is completed before the defect distribution prediction model starts predicting.
Step 3: Use the pre-trained training set X1 obtained in step 2 as the input data set of step 3 to train the SVM classifier.
In this step, the pre-trained training set X1 from step 2 serves as the input data set for building the SVM classifier, which is then trained for classification. A penalty factor and slack variables are introduced into the SVM model to improve prediction accuracy, which turns SVM training into the optimization problem given by formulas (1) and (2):
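$$\min_{\omega,\,b,\,\xi}\ \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{n}\xi_i \qquad (1)$$
$$\text{s.t.}\quad y_i\left(\omega^{T}\gamma(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, n \qquad (2)$$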
Here C is the penalty coefficient, ξ_i is the slack variable of the i-th sample, ω is the d-dimensional vector orthogonal to the separating hyperplane, b is the bias term, y_i ∈ {-1, +1} is the class label of sample x_i, and γ(x) is the feature mapping induced by the kernel function used by the SVM, so that K(x_i, x_j) = γ(x_i)ᵀγ(x_j); in DBN-SVM the kernel is the radial basis function (RBF) kernel. n is the total number of samples fed into the SVM classifier for training, here the samples of the pre-trained training set X1.
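To make formulas (1) and (2) concrete, the sketch below evaluates the soft-margin objective for a given ω and b. It assumes a linear SVM, i.e. the identity feature map γ(x) = x, so that the objective can be computed explicitly (with the RBF kernel γ is implicit and the problem is normally solved in its dual form), and labels coded as y_i ∈ {-1, +1}, for example true as +1 and false as -1. For fixed ω and b, the smallest slack that satisfies (2) is ξ_i = max(0, 1 - y_i(ωᵀx_i + b)), the hinge loss. The code only evaluates the objective; it does not train the SVM.

```python
def slack(w, b, x, y):
    """Smallest xi_i >= 0 satisfying constraint (2) for one sample (y in {-1, +1})."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def primal_objective(w, b, X, Y, C):
    """Objective (1) with the identity feature map: 0.5 * ||w||^2 + C * sum(xi_i)."""
    regulariser = 0.5 * sum(wj * wj for wj in w)
    return regulariser + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))
```

With ω = (1, 0), b = 0 and three positive samples at x = 2, 0.5 and -1 on the first axis, the slacks are 0, 0.5 and 2, so the objective is 0.5 + 2.5C: a larger C penalizes margin violations more heavily.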
The RBF function is selected as the kernel of the support vector machine, and its parameters are optimized by grid search over a defined value interval and step size combined with cross-validation. This method optimizes the parameters of the SVM model by finding the best values of C and g, where C is the penalty coefficient and g is the kernel parameter, which improves the classification accuracy of the SVM.
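The RBF kernel and the (C, g) grid search can be sketched as below. The kernel form K(x, z) = exp(-g‖x - z‖²) is the standard RBF kernel. The search ranges are an assumption, since the patent gives no interval or step: C = 2^-5 ... 2^15 and g = 2^-15 ... 2^3 with an exponent step of 2 follow the convention commonly used with LIBSVM. `cv_score` is a hypothetical callback that would train and cross-validate an RBF-SVM for one (C, g) pair and return its mean accuracy.

```python
import math
from itertools import product


def rbf_kernel(x, z, g):
    """K(x, z) = exp(-g * ||x - z||^2); g is the kernel parameter tuned below."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def log2_grid(start, stop, step):
    """Powers of two 2^start, 2^(start + step), ..., up to 2^stop inclusive."""
    return [2.0 ** e for e in range(start, stop + 1, step)]


def grid_search(cv_score, c_range=(-5, 15, 2), g_range=(-15, 3, 2)):
    """Return (best_score, C, g) maximising a cross-validation score.

    cv_score(C, g) is a hypothetical callback, e.g. the mean accuracy of an
    RBF-SVM trained with penalty C and kernel parameter g over the CV folds.
    """
    best = None
    for C, g in product(log2_grid(*c_range), log2_grid(*g_range)):
        score = cv_score(C, g)
        if best is None or score > best[0]:  # keep the first best pair on ties
            best = (score, C, g)
    return best
```

An exponentially spaced grid covers several orders of magnitude with few evaluations; a finer grid can then be searched around the best pair if needed.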
This step yields the trained SVM classifier.
This step is completed before the model is used for prediction.
Step 4: Use the test set Y obtained in step 2 as the test input set and perform defect prediction with the DBN and SVM obtained in steps 2 and 3 to obtain the test results.
This step tests and verifies the training results of the preceding steps.
Test set Y is fed into the DBN trained in step 2 for dimensionality reduction, which extracts the effective feature attributes and yields the pre-trained test set Y1. Y1 is then used as the input of the SVM classifier trained in step 3, and the final prediction results produced by the defect prediction model are compared with the actual results. If the prediction meets the requirements, training is complete; otherwise, the sample data set is re-selected and training continues.
For the sample data set divided into 10 subsets, the above process is run for each subset in turn, and the average of the 10 experiments is used to evaluate the model's performance.
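Step 4 and the ten-fold averaging can be sketched as follows. `dbn_transform` and `classify` stand for the trained DBN and SVM from steps 2 and 3 (hypothetical callables here). Using accuracy as the comparison with the actual results is an assumption: the patent does not name its metric, and for imbalanced defect data measures such as recall or F-measure are often reported as well.

```python
def accuracy(predicted, actual):
    """Fraction of modules whose predicted defect label equals the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def evaluate_fold(dbn_transform, classify, Y, actual):
    """Step 4 for one fold: reduce test set Y with the trained DBN, classify Y1, compare."""
    Y1 = dbn_transform(Y)
    return accuracy(classify(Y1), actual)


def cross_validated_score(fold_scores):
    """Average of the per-fold comparisons, used to evaluate the model."""
    if not fold_scores:
        raise ValueError("no fold scores")
    return sum(fold_scores) / len(fold_scores)
```

In the ten-fold setting, `evaluate_fold` would be called once per fold with that fold's trained DBN and SVM, and `cross_validated_score` would receive the ten resulting values.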
Steps 1 to 4 above yield the trained DBN-SVM. In use, the deep belief network DBN reduces the dimensionality of the software metric attributes extracted from the software to be predicted, and the reduced data are classified by the support vector machine to obtain the software defect prediction result.
As stated above, the advantages of the invention are:
(1) It combines static detection of software security defects with machine learning algorithms.
(2) It combines the deep belief network (DBN) algorithm with support vector machine (SVM) technology, solving both the data redundancy problem in software defect data sets and the data nonlinearity problem, which effectively improves the prediction metrics and the prediction accuracy.
In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A software defect prediction method based on a deep belief network and support vector machine (DBN-SVM), characterized by comprising:
reducing, with a deep belief network (DBN), the dimensionality of the software metric attributes extracted from the software to be predicted;
classifying the dimensionality-reduced data with a support vector machine to obtain the software defect prediction result.
2. The method according to claim 1, characterized in that the sample data set of the DBN-SVM model is the software defect prediction data set MDP of the US National Aeronautics and Space Administration (NASA).
3. The method according to claim 2, characterized in that the JM1, MC1 and PC5 data sets in MDP are selected for training and verification.
4. The method according to claim 1, characterized in that training the DBN-SVM model comprises the following steps:
Step 1: selecting part of the sample data set as training set X for pre-training the DBN, obtaining the dimensionality-reduced training set X1 and the constructed DBN;
Step 2: inputting the training set X1 into the SVM classifier to train the classifier;
Step 3: selecting a test set Y from the sample data set, inputting it into the trained DBN-SVM to obtain prediction results, and comparing them with the actual results to verify the model's effectiveness.
5. The method according to claim 4, characterized in that the sample data set is randomly divided into 10 subsets; each time, one subset is chosen as the test set and the remaining 9 subsets serve as the training set, using ten-fold cross-validation for a total of 10 verification experiments; the 10 experimental results are compared with the actual results, and the average of the ten comparisons is used to evaluate the model's performance.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810571352.0A CN108763096A (en) 2018-06-06 2018-06-06 Software defect prediction method based on deep belief network and support vector machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810571352.0A CN108763096A (en) 2018-06-06 2018-06-06 Software defect prediction method based on deep belief network and support vector machine

Publications (1)

Publication Number Publication Date
CN108763096A 2018-11-06

Family

ID=63999083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810571352.0A Pending CN108763096A (en) 2018-06-06 2018-06-06 Software defect prediction method based on deep belief network and support vector machine

Country Status (1)

Country Link
CN (1) CN108763096A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109612513A (en) * 2018-12-17 2019-04-12 安徽农业大学 A kind of online method for detecting abnormality towards extensive higher-dimension sensing data
CN110285976A (en) * 2019-07-09 2019-09-27 哈尔滨工业大学(威海) Multi-dimensional time sequence information based on DBN drives Fault Diagnosis of Aeroengines method
CN111026664A (en) * 2019-12-09 2020-04-17 遵义职业技术学院 Program detection method and detection system based on ANN and application
CN111522743A (en) * 2020-04-17 2020-08-11 北京理工大学 Software defect prediction method based on gradient lifting tree support vector machine
CN113396444A (en) * 2019-02-07 2021-09-14 腓特烈斯港齿轮工厂股份公司 Method and device for automatically identifying product defects of products and/or for automatically identifying causes of product defects
CN114706780A (en) * 2022-04-13 2022-07-05 北京理工大学 Software defect prediction method based on Stacking ensemble learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810102A (en) * 2014-02-19 2014-05-21 北京理工大学 Method and system for predicting software defects
CN103810101A (en) * 2014-02-19 2014-05-21 北京理工大学 Software defect prediction method and system
CN107957946A (en) * 2017-12-01 2018-04-24 北京理工大学 Software Defects Predict Methods based on neighborhood insertion protection algorism support vector machines

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
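Yin Ruigang (殷瑞刚) et al.: "A survey of unsupervised learning methods in deep learning", Computer Systems & Applications (《计算机系统应用》) *
Gan Lu (甘露): "Research on software defect prediction technology based on deep learning", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *
Chen Xushan (陈栩杉) et al.: "Lecture series on deep learning theory and its applications (I), Lecture 2: Overview of basic deep learning theory", Military Communications Technology (《军事通信技术》) *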

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109612513A (en) * 2018-12-17 2019-04-12 安徽农业大学 A kind of online method for detecting abnormality towards extensive higher-dimension sensing data
CN113396444A (en) * 2019-02-07 2021-09-14 腓特烈斯港齿轮工厂股份公司 Method and device for automatically identifying product defects of products and/or for automatically identifying causes of product defects
CN113396444B (en) * 2019-02-07 2023-08-22 腓特烈斯港齿轮工厂股份公司 Method and device for automatically identifying product defects of a product and/or for automatically identifying product defect causes of product defects
CN110285976A (en) * 2019-07-09 2019-09-27 哈尔滨工业大学(威海) Multi-dimensional time sequence information based on DBN drives Fault Diagnosis of Aeroengines method
CN111026664A (en) * 2019-12-09 2020-04-17 遵义职业技术学院 Program detection method and detection system based on ANN and application
CN111026664B (en) * 2019-12-09 2020-12-22 遵义职业技术学院 Program detection method and detection system based on ANN and application
CN111522743A (en) * 2020-04-17 2020-08-11 北京理工大学 Software defect prediction method based on gradient lifting tree support vector machine
CN111522743B (en) * 2020-04-17 2021-10-22 北京理工大学 Software defect prediction method based on gradient lifting tree support vector machine
CN114706780A (en) * 2022-04-13 2022-07-05 北京理工大学 Software defect prediction method based on Stacking ensemble learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181106
