CN107515822A - Software defect localization method based on multi-objective optimization - Google Patents
Software defect localization method based on multi-objective optimization
- Publication number
- CN107515822A (application number CN201710700316.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses a software defect localization method based on multi-objective optimization, comprising: collecting the code files, bug reports and developer information of a software project; extracting keywords from the code files and bug reports, and computing a text similarity function between code files and bug reports based on the bag-of-words model; computing a structural complexity function for each code file from its structural feature metrics; computing a developer unfamiliarity function from the developer information of each code file; and, for a given bug report, ranking the code files with a two-stage sorting method based on multi-objective optimization that uses the similarity function, the structural complexity function and the developer unfamiliarity function, and outputting the code files with high defect suspicion. The method is computationally simple and highly extensible, locates defective code quickly and effectively, applies to different types of code files, and suits the development and maintenance of large-scale software products.
Description
Technical field
The present invention relates to the field of software engineering, and in particular to a software defect localization method based on multi-objective optimization.
Background
Software defects are inevitable during software development and maintenance, and as software grows in scale the number of defects grows with it. For example, in 2013 alone, 3389 bug reports were filed for the Eclipse Platform. The main purpose of software debugging is to discover, locate, understand and remove defects, and development practice within software companies shows that defect localization is a difficult, time-consuming and labor-intensive key activity in the debugging process.
Locating defects by manually debugging code is often costly and inefficient. Moreover, manual debugging usually relies on the personal experience of developers or maintainers and is not reusable. To locate defects efficiently, researchers have proposed many automated debugging methods. Depending on whether test cases are executed, automated defect localization methods fall into two categories, dynamic and static. Dynamic methods analyze the internal structure of the program under test, collect the execution traces and results of test cases, and determine the location of the defective code based on a particular model. Static methods focus on the code files and the internal structure of the program, such as control dependencies and data dependencies, extract features from them, score the code files using machine learning or similar methods, and output a list of code files with high defect suspicion.
Among existing static defect localization methods, computing the text similarity between code files and bug reports with information retrieval techniques is a mainstream idea. For example, BugLocator computes, based on the vector space model, the text similarity between the current bug report and both the code files and historical bug reports; the BLUiR tool, based on structured information retrieval, assigns different weights to class names, method names and similar elements in code files. However, existing methods require historical data and train weight parameters by supervised learning, so their computational complexity is high and they are not suited to the real-time development and maintenance of large-scale software.
In summary, the defect localization methods of the prior art have high computational complexity and are inconvenient for large-scale software development and maintenance.
Summary of the invention
The invention provides a software defect localization method based on multi-objective optimization which, compared with the prior art, is computationally simple, improves debugging efficiency, and effectively saves labor and time.
The software defect localization method based on multi-objective optimization comprises:
S1, collecting the code files, bug reports and developer experience information of the software under test, where a code file is a class file for an object-oriented language and a single source file for a procedural language; the bug reports contain the software defect data to be located; and the code author data recorded during software project development includes the developer experience information;
S2, applying a keyword extraction method to the code files and the bug reports to obtain code file keywords and bug report keywords;
S3, representing the code file keywords and the bug report keywords with the bag-of-words model as a code file S and a bug report R, and defining a similarity function from the code file S and the bug report R;
S4, from the structural feature metrics of the code file S, obtaining the structural complexity function of the code file S using a structural complexity function generation algorithm;
S5, computing the developer unfamiliarity function from the developer experience information;
S6, using the similarity function, the structural complexity function and the developer unfamiliarity function, ranking the code files S with a two-stage sorting method based on multi-objective optimization to obtain a complete ranking of the code files;
S7, marking the top-k code files in the complete ranking, i.e. the first k positions of the ranking, as code files with high defect suspicion, where k is a positive integer that can be adjusted manually according to the severity and complexity of the software defect.
Further, the keyword extraction method comprises:
S21, parsing the code files and the bug reports into sets of unordered identifiers;
S22, filtering punctuation marks, operands, operators and programming-language reserved words out of the sets to obtain filtered sets;
S23, for identifiers composed of compound words, splitting each identifier into single words at its capital letters;
S24, stemming the English words in the filtered sets, thereby obtaining the keyword sets of the code files and the bug reports.
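Steps S21 to S24 can be illustrated with a minimal Python sketch. The reserved-word list `RESERVED`, the helper names and the suffix-stripping `crude_stem` are hypothetical stand-ins chosen for illustration; the source does not name a particular stemmer, and a practical implementation would likely use an established one such as the Porter stemmer.

```python
import re

# Hypothetical, deliberately small reserved-word list (assumption for illustration).
RESERVED = {"public", "class", "void", "return", "new", "if", "else", "for", "while", "int"}

def split_compound(identifier):
    """S23: split a camelCase or PascalCase identifier at its capital letters."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)

def crude_stem(word):
    """S24 stand-in: strip one common suffix. This is NOT a full stemming algorithm."""
    for suffix in ("ing", "ion", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_keywords(text):
    """Return the keywords of a code file or bug report as a list (a bag of words)."""
    # S21: keep identifiers only; punctuation, operators and numeric operands never match.
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    keywords = []
    for identifier in identifiers:
        if identifier in RESERVED:                      # S22: drop reserved words
            continue
        for part in split_compound(identifier):         # S23: split compound words
            keywords.append(crude_stem(part.lower()))   # S24: reduce to a root form
    return keywords

print(extract_keywords("public void delegateTask() { return 1 + x; }"))
```

The result is kept as a list rather than a set so that keyword counts remain available to the term-frequency computation in S3.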
Further, S3 comprises:
S31, constructing a corpus V from the code file keywords and the bug report keywords;
S32, applying the bag-of-words model to the corpus V to represent the code file and the bug report as a code file S and a bug report R, where S and R are both one-dimensional vectors whose components $w_t^S$ and $w_t^R$ are the weights of keyword t in the code file S and the bug report R respectively, each being the product of the term frequency $tf_t^d$ and the inverse document frequency $idf_t$, computed as follows:
$$w_t^d = tf_t^d \times idf_t \qquad (1)$$
$$tf_t^d = \log(n_t^d) + 1$$
$$idf_t = \log\left(\frac{N}{N_t}\right)$$
where $n_t^d$ is the number of occurrences of keyword t in file d, d is the code file S or the bug report R, N is the total number of files, and $N_t$ is the number of files containing keyword t;
S33, defining the similarity function from the code file S and the bug report R; the similarity function is:
$$Sim_R(S) = \frac{R^T S}{\|R\| \cdot \|S\|} \qquad (2)$$
where $R^T S$ is the inner product of R and S, T denotes vector transposition, and $\|R\|$ and $\|S\|$ are the norms of R and S, i.e. the square root of the sum of the squares of all elements.
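The weighting of S32 and the similarity of S33 can be sketched in Python as follows, using $tf_t^d = \log(n_t^d) + 1$ and $idf_t = \log(N/N_t)$ as given in claim 3. Two points are assumptions, not statements of the source: the natural logarithm is used because the base is not specified, and a keyword absent from a file is given weight 0 because $\log(0)$ is undefined. The function names and sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """documents: dict mapping a name to its keyword list.
    Returns (vocabulary, dict mapping each name to its weight vector)."""
    vocabulary = sorted({t for words in documents.values() for t in words})
    n_docs = len(documents)                                   # N
    doc_freq = Counter(t for words in documents.values() for t in set(words))  # N_t
    vectors = {}
    for name, words in documents.items():
        counts = Counter(words)                               # n_t^d
        vectors[name] = [
            (math.log(counts[t]) + 1.0) * math.log(n_docs / doc_freq[t])
            if counts[t] else 0.0                             # assumption: absent term -> 0
            for t in vocabulary
        ]
    return vocabulary, vectors

def cosine_similarity(r, s):
    """Sim_R(S): inner product of R and S divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(r, s))
    norm = math.sqrt(sum(a * a for a in r)) * math.sqrt(sum(b * b for b in s))
    return dot / norm if norm else 0.0

docs = {
    "bug": ["null", "pointer", "parser"],
    "A.java": ["parser", "parser", "token"],
    "B.java": ["socket", "timeout"],
}
_, vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs["bug"], vecs["A.java"]))  # positive: both mention "parser"
print(cosine_similarity(vecs["bug"], vecs["B.java"]))  # zero: no shared keyword
```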
Further, the structural feature metrics comprise:
Lines of code (LOC, Lines of Code): the total number of executable statement lines in the code file;
Maximum McCabe cyclomatic complexity (MAX_CC, Max McCabe Complexity): the maximum McCabe cyclomatic complexity of any method/function in the code file;
Number of code changes (NOC, Number of Correct): the number of times the code file has been modified;
Number of depended files (DFN, Depended File Number): the number of other code files on which the code file depends;
Non-commented lines of code (NLC, Noncommented Lines of Code): the total number of lines in the code file minus the number of comment lines.
Further, the structural complexity function generation algorithm comprises:
SS1, representing each code file by its structural feature vector, $S_a = \{a_1, a_2, a_3, a_4, a_5\}$, where each $a_i$ is a feature metric;
SS2, unifying the scale of the feature metrics with a normalization equation to obtain normalized feature metrics; the formula is as follows:
where $a_{min}$ and $a_{max}$ denote the minimum and maximum of the feature metric, i = 1, 2, 3, 4, 5;
SS3, defining the structural complexity function of the code file from the normalized feature metrics; the structural complexity function is
where a is a feature metric, i = 1, 2, 3, 4, 5.
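The normalization of SS2 can be sketched in Python under a stated assumption: since the text defines only $a_{min}$ and $a_{max}$, the sketch uses min-max scaling, $(a_i - a_{min}) / (a_{max} - a_{min})$, computed per metric across all code files. This choice is inferred, not confirmed by the text, and the sketch stops at the normalized vector rather than guessing how SS3 aggregates it. The function name and sample values are hypothetical.

```python
def normalize_metrics(metric_vectors):
    """metric_vectors: dict mapping a file name to [LOC, MAX_CC, NOC, DFN, NLC].
    Returns the same mapping with every metric scaled to [0, 1] (assumed min-max scaling)."""
    columns = list(zip(*metric_vectors.values()))   # one tuple per metric
    lows = [min(column) for column in columns]      # a_min per metric
    highs = [max(column) for column in columns]     # a_max per metric
    normalized = {}
    for name, vector in metric_vectors.items():
        normalized[name] = [
            (a - low) / (high - low) if high > low else 0.0  # constant metric -> 0
            for a, low, high in zip(vector, lows, highs)
        ]
    return normalized

metrics = {
    "A.java": [100, 5, 2, 3, 80],
    "B.java": [300, 15, 6, 3, 240],
    "C.java": [200, 10, 4, 3, 160],
}
print(normalize_metrics(metrics)["C.java"])
```

Scaling matters here because the five metrics have very different ranges; without it, LOC would dominate any combined score.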
Further, S5 comprises:
S51, mapping the cumulative time $Y_{exp}$ that a developer has been engaged in development work to the developer experience index EXP;
S52, defining the developer unfamiliarity function from the developer experience index EXP,
where $S_P$ denotes the set of developers of the code file S, $EXP_i$ denotes the experience index of developer i, and i is a positive integer.
Further, the mapping between the cumulative time $Y_{exp}$ (in years) that a developer has been engaged in development work and the developer experience index EXP is:
$Y_{exp} < 0.5$, EXP = 1;
$0.5 \leq Y_{exp} < 1$, EXP = 2;
$1 \leq Y_{exp} < 3$, EXP = 3;
$Y_{exp} \geq 3$, EXP = 5.
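The mapping from cumulative development time to the experience index is a simple step function and can be written directly in Python. The function name `experience_index` is hypothetical; the thresholds are those listed above, with the time given in years.

```python
def experience_index(years):
    """Map cumulative development time Y_exp (in years) to the experience index EXP."""
    if years < 0.5:
        return 1
    if years < 1:
        return 2
    if years < 3:
        return 3
    return 5

for y in (0.2, 0.5, 2.9, 3):
    print(y, experience_index(y))
```

Each lower bound is inclusive, so a developer with exactly 3 years of experience receives EXP = 5.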
Further, S6 comprises:
S61, defining a multi-objective optimization problem from the similarity function $Sim_R(S)$, the structural complexity function $Comp(S)$ and the developer unfamiliarity function $Rusd(S)$, formulated as follows:
where R is the bug report, S is a code file, the components $y_1$, $y_2$ and $y_3$ of the triple $(y_1, y_2, y_3)$ are the values of the similarity function, the structural complexity function and the developer unfamiliarity function respectively, $\Gamma$ is the set of code files, and Y is the solution set of the similarity function, the structural complexity function and the developer unfamiliarity function;
S62, applying the layered fast non-dominated sorting method to the code file set $\Gamma$, dividing $\Gamma$ into different non-dominated layers $F_l$, l = 1, 2, ..., m, and taking the sequence of non-dominated layers as the first-stage ordering, where m is the number of non-dominated layers and the smaller l is, the higher the defect suspicion of the code in $F_l$;
S63, sorting the code files within each non-dominated layer $F_l$ (l = 1, 2, ..., m) a second time, in descending order of the similarity function $Sim_R(S)$, to obtain the second-stage ordering;
S64, concatenating the first-stage ordering and the second-stage ordering to obtain the complete ranking of the code files, where S62 comprises:
S621, letting code files $S_i \in \Gamma$, $S_j \in \Gamma$, $i \neq j$,
be such that $Sim_R(S_i) > Sim_R(S_j)$, $Comp(S_i) > Comp(S_j)$ and $Rusd(S_i) > Rusd(S_j)$;
then the relation between $S_i$ and $S_j$ is: $S_i$ dominates $S_j$, denoted $S_i \succ S_j$,
where R is the bug report, $\Gamma$ is the code file set, $Sim_R(\cdot)$ is the text similarity function, $Comp(\cdot)$ is the structural complexity function, and $Rusd(\cdot)$ is the developer unfamiliarity function;
S622, computing for the code file $S_i$ the dominated set $D_i$ and the domination counter $n_i$:
for each code file $S_j \in \Gamma$, if $S_i$ dominates $S_j$, then $D_i = D_i \cup \{S_j\}$;
if $S_j$ dominates $S_i$, then $n_i$ is incremented by 1; otherwise $S_i$ does not dominate $S_j$ and $S_j$ does not dominate $S_i$, and $D_i$ and $n_i$ remain unchanged;
S623, generating the first non-dominated layer $F_1$, which contains all code files $S_i$ whose domination counter $n_i = 0$, i = 1, 2, 3, ..., $|F_1|$, where $|F_1|$ is the cardinality of layer $F_1$;
S624, starting from $F_1$, iteratively computing the non-dominated layers $F_l$ (l = 2, ..., m), where m is the number of non-dominated layers generated; $F_{l+1}$ is computed from $F_l$: for each $S_i \in F_l$ and each $S_j \in D_i$, the domination counter $n_j$ is decremented by 1, and if $n_j = 0$, then $F_{l+1} = F_{l+1} \cup \{S_j\}$, where i = 1, 2, 3, ..., $|F_l|$, j = 1, 2, 3, ..., $|D_i|$, $|F_l|$ is the cardinality of layer $F_l$, and $|D_i|$ is the cardinality of the dominated set $D_i$.
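Steps S621 to S624 can be sketched in Python as follows. The sketch follows the dominance relation exactly as stated in S621, i.e. strictly greater on all three objectives, which is stricter than the usual Pareto dominance. The function names and the sample objective values are hypothetical.

```python
def dominates(a, b):
    """S621: a dominates b when a is strictly greater on every objective."""
    return all(x > y for x, y in zip(a, b))

def fast_non_dominated_sort(objectives):
    """objectives: dict mapping a file name to (Sim_R, Comp, Rusd).
    Returns the non-dominated layers F_1, ..., F_m as a list of lists of names."""
    dominated = {s: [] for s in objectives}   # D_i: files dominated by S_i
    counter = {s: 0 for s in objectives}      # n_i: number of files dominating S_i
    # S622: fill D_i and n_i by pairwise comparison.
    for si, fi in objectives.items():
        for sj, fj in objectives.items():
            if si == sj:
                continue
            if dominates(fi, fj):
                dominated[si].append(sj)
            elif dominates(fj, fi):
                counter[si] += 1
    # S623: the first layer holds every file that no other file dominates.
    fronts = [[s for s in objectives if counter[s] == 0]]
    # S624: peel off one layer at a time by decrementing the counters.
    while fronts[-1]:
        next_front = []
        for si in fronts[-1]:
            for sj in dominated[si]:
                counter[sj] -= 1
                if counter[sj] == 0:
                    next_front.append(sj)
        fronts.append(next_front)
    return fronts[:-1]                        # drop the trailing empty layer

objectives = {
    "A": (0.9, 0.8, 0.7),
    "B": (0.5, 0.4, 0.3),
    "C": (0.2, 0.9, 0.1),
    "D": (0.1, 0.1, 0.05),
}
print(fast_non_dominated_sort(objectives))
```

In the sample data, A and C are mutually non-dominated and form the first layer; B is dominated only by A, and D is dominated by all others, so they fall into the second and third layers.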
Further, k=10.
The beneficial effects of the invention are as follows: by jointly considering the text similarity between bug reports and code files, the structural features of code files, and developer experience data, and by ranking the code files with a two-stage sorting method based on multi-objective optimization and outputting the code files with high defect suspicion, the invention simplifies the computation, improves the generality and extensibility of the localization method, locates defective code effectively, and improves the efficiency of defect repair.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings required in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the overall framework of the software defect localization method based on multi-objective optimization;
Fig. 2 is a schematic diagram of the two-stage sorting method based on multi-objective optimization;
Fig. 3 is a flow chart of ranking code files with the fast non-dominated sorting method.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solution of the invention, the invention is described in further detail below with reference to specific embodiments.
An embodiment of the invention provides a software defect localization method based on multi-objective optimization which, as shown in Fig. 1, comprises:
S1, collecting the code files, bug reports and developer experience information of the software under test, where a code file is a class file for an object-oriented language and a single source file for a procedural language; the bug reports contain the software defect data to be located; and the code author data recorded during software project development includes the developer experience information;
S2, applying a keyword extraction method to the code files and the bug reports to obtain code file keywords and bug report keywords;
S3, representing the code file keywords and the bug report keywords with the bag-of-words model as a code file S and a bug report R, and defining a similarity function from the code file S and the bug report R;
S4, from the structural feature metrics of the code file S, obtaining the structural complexity function of the code file S using a structural complexity function generation algorithm;
S5, computing the developer unfamiliarity function from the developer experience information;
S6, using the similarity function, the structural complexity function and the developer unfamiliarity function, ranking the code files S with a two-stage sorting method based on multi-objective optimization to obtain a complete ranking of the code files;
S7, marking the top-k code files in the complete ranking as code files with high defect suspicion, with k = 10; the value of k can be adjusted manually according to the severity and complexity of the software defect.
In the above method steps, the keyword extraction method comprises:
S21, parsing the code files and the bug reports into sets of unordered identifiers;
S22, filtering punctuation marks, operands, operators and programming-language reserved words out of the sets to obtain filtered sets;
S23, for identifiers composed of compound words, splitting each identifier into single words at its capital letters;
S24, stemming the English words in the filtered sets, for example reducing the words "delegating", "delegate" and "delegation" appearing in a document to their common root form "delegat", thereby obtaining the keyword sets of the code files and the bug reports.
S3 comprises:
S31, constructing a corpus V from the code file keywords and the bug report keywords;
S32, applying the bag-of-words model to the corpus V to represent the code file and the bug report as a code file S and a bug report R, where S and R are both one-dimensional vectors whose components $w_t^S$ and $w_t^R$ are the weights of keyword t in the code file S and the bug report R respectively, each being the product of the term frequency $tf_t^d$ and the inverse document frequency $idf_t$, computed as follows:
$$w_t^d = tf_t^d \times idf_t \qquad (1)$$
$$tf_t^d = \log(n_t^d) + 1$$
$$idf_t = \log\left(\frac{N}{N_t}\right)$$
where $n_t^d$ is the number of occurrences of keyword t in file d, d is the code file S or the bug report R, N is the total number of files, and $N_t$ is the number of files containing keyword t;
S33, defining the similarity function from the code file S and the bug report R; the similarity function is:
$$Sim_R(S) = \frac{R^T S}{\|R\| \cdot \|S\|} \qquad (2)$$
where $R^T S$ is the inner product of R and S, T denotes vector transposition, and $\|R\|$ and $\|S\|$ are the norms of R and S, i.e. the square root of the sum of the squares of all elements.
In S4, the structural complexity function of a code file is computed from the structural feature metrics of the code file. The complexity of code is reflected mainly in aspects such as the number of lines of code and the number of functions called; the more complex the structure of the code, the harder it is for developers to keep under control, and the more likely the code is to contain errors. Therefore, to compute the complexity of a code file, 5 code structural feature metrics are defined, as shown in the table below:
Metric | Abbreviation | Meaning |
Lines of code | LOC | Total number of executable statement lines in the code file |
Maximum McCabe cyclomatic complexity | MAX_CC | Maximum McCabe cyclomatic complexity of any method/function in the code file |
Number of code changes | NOC | Number of times the code file has been modified |
Number of depended files | DFN | Number of other code files on which the code file depends |
Non-commented lines of code | NLC | Total number of lines in the code file minus the number of comment lines |
Table 1: Code file structural metrics
The structural feature metrics above make up the complexity indicators of a code file, and each code file S can be expressed as a structural feature metric vector $S_a = \{a_1, a_2, a_3, a_4, a_5\}$, where each $a_i$ is a feature metric. Because the metrics are measured on different scales, they must be brought to a common scale with a normalization equation; the formula is as follows:
where i is the index of the feature metric and $a_{min}$ and $a_{max}$ denote the minimum and maximum of the corresponding feature metric. From the normalized feature metrics above, the structural complexity function of the code file is defined as follows:
In S5, the developer unfamiliarity function is computed from the developer information of the code file. In general, code written by experienced programmers has clear logic and good formatting and is easy to read, whereas novice programmers often struggle to write code in a good style and their code is more prone to errors. Development practice in the software industry shows that most software development requires several people to participate and that the experience levels of developers differ, so the quality of the code files they write is uneven. The developer experience index EXP is therefore designed and defined as follows, where $Y_{exp}$ denotes the cumulative time engaged in development work:
Development experience ($Y_{exp}$), years | EXP value |
$Y_{exp} < 0.5$ | 1 |
$0.5 \leq Y_{exp} < 1$ | 2 |
$1 \leq Y_{exp} < 3$ | 3 |
$Y_{exp} \geq 3$ | 5 |
Table 2: Developer experience index
The defect proneness of a code file is assessed from the developer experience index EXP above: the higher the experience value, the lower the defect proneness, and vice versa, so the two are inversely proportional. The developer unfamiliarity function of the code file S is thus defined as:
where $S_P$ denotes the set of developers of the code file S, and $EXP_i$ denotes the experience index of developer i.
In S6, for a given bug report, the code files are ranked with the two-stage sorting method based on multi-objective optimization, using the similarity function, the structural complexity function and the developer unfamiliarity function, and the code files with high defect suspicion are output. Based on these three functions, locating the defective code file is first converted into a multi-objective optimization problem: the larger the values obtained when a code file S is substituted into the similarity function, the structural complexity function and the developer unfamiliarity function, the more likely S is to contain a defect, and the more developers should focus on and inspect S. The problem of finding the code files S that meet this requirement can therefore be expressed as a multi-objective optimization problem with one decision variable and three objective variables:
where the components $y_1$, $y_2$ and $y_3$ of the triple $(y_1, y_2, y_3)$ denote the values of the similarity function, the structural complexity function and the developer unfamiliarity function respectively, and $\Gamma$ is the set of all code files in the software project, i.e. $\Gamma$ is the decision variable set of the multi-objective optimization problem above.
The first-stage sorting of the code files is then carried out with the layered fast non-dominated sorting method (fast-non-dominated-sort), whose processing steps are as follows.
First, the dominance relation between code files is defined on the decision variable set: suppose there exist $S_i \in \Gamma$ and $S_j \in \Gamma$ with $i \neq j$ such that $Sim_R(S_i) > Sim_R(S_j)$, $Comp(S_i) > Comp(S_j)$ and $Rusd(S_i) > Rusd(S_j)$; then the relation between $S_i$ and $S_j$ is: $S_i$ dominates $S_j$, denoted $S_i \succ S_j$. Here R is the bug report, $\Gamma$ is the code file set, $Sim_R(\cdot)$ is the text similarity function, $Comp(\cdot)$ is the structural complexity function, and $Rusd(\cdot)$ is the developer unfamiliarity function.
Second, the first-stage sorting of the code files is completed following the specific steps of the fast non-dominated sorting method. The input of this sorting method is the code file set $\Gamma$. The dominance relations between the code files in $\Gamma$ are determined from the similarity function, the structural complexity function and the developer unfamiliarity function above, and the code files are divided into different non-dominated layers $F_l$ (l = 1, 2, ..., m), where $F_l$ is the set of code files $S_i$ at non-dominated level l and m is the number of non-dominated layers. The output of the first-stage sorting is therefore $\{F_1, F_2, \ldots, F_m\}$.
The specific steps of ranking code files with the fast non-dominated sorting method can be divided into two parts. The first part computes, for each single code file $S_i$ (i = 1, 2, ..., $|\Gamma|$), the dominated set $D_i$ and the domination counter $n_i$, and computes $F_1$; $D_i$ is such that every $S_j \in D_i$ is dominated by $S_i$. The second part divides the code files into the corresponding non-dominated layers $F_l$ (l = 2, ..., m) by updating on the basis of $D_i$ and $n_i$.
The process of computing $D_i$ and $n_i$ for a code file $S_i$ is, for any $S_j \in \Gamma$: a) if $S_i$ dominates $S_j$, then $D_i = D_i \cup \{S_j\}$; b) if $S_j$ dominates $S_i$, then $n_i$ is incremented by 1; c) otherwise, $D_i$ and $n_i$ remain unchanged. If $n_i = 0$, then $F_1 = F_1 \cup \{S_i\}$.
The second part starts from $F_1$ and iteratively updates the domination counters on the basis of $F_l$ (l = 1, 2, ..., m-1), thereby dividing the code files into the corresponding non-dominated layers $F_l$ (l = 2, ..., m). For l = 1, ..., m-1, each iteration proceeds as follows: for each $S_i \in F_l$ and each $S_j \in D_i$, $n_j$ is decremented by 1, and if $n_j = 0$, then $F_{l+1} = F_{l+1} \cup \{S_j\}$.
The layered sorting algorithm above divides the code files of the project into different non-dominated layers $\{F_1, F_2, \ldots, F_m\}$. The code files in $F_1$ form the Pareto optimal solution set of the multi-objective optimization problem, $F_2$ comes next, and so on, with the defect suspicion decreasing from each layer to the next. However, the code files within a layer $F_l$ (l = 1, 2, ..., m) are not comparable with one another, i.e. there is no ordering of defect suspicion among the code files within $F_l$. To solve this problem, in the second-stage sorting the code files within each $F_l$ (l = 1, 2, ..., m) are sorted a second time by their text similarity function $Sim_R(S)$ with the defect report. Since the number of code files within each $F_l$ is limited, the second stage may use an algorithm such as bubble sort to order the code files within each non-dominated layer by $Sim_R(S)$.
After the two stages of sorting, the complete ranking of the code files is obtained, and the top-k code files in the ranking are output as the code files with high defect suspicion, where k = 10; the value of k can be adjusted manually according to the severity and complexity of the software defect. Developers should focus on and inspect the code files with high defect suspicion.
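The second-stage sorting and the top-k selection can be sketched in Python as follows. The sketch assumes the non-dominated layers have already been computed and are passed in as a list of lists. It uses Python's built-in `sorted` in place of the bubble sort mentioned above; the resulting order is the same, and only the sorting algorithm differs. The function name and sample values are hypothetical.

```python
def two_stage_ranking(fronts, similarity, k=10):
    """fronts: non-dominated layers F_1..F_m, each a list of file names.
    similarity: dict mapping a file name to Sim_R(S).
    Returns the top-k files of the complete ranking."""
    ranked = []
    for front in fronts:                      # first stage: layer order F_1, F_2, ...
        # second stage: descending similarity inside each layer
        ranked.extend(sorted(front, key=lambda s: similarity[s], reverse=True))
    return ranked[:k]

fronts = [["A", "C"], ["B"], ["D"]]
similarity = {"A": 0.4, "B": 0.9, "C": 0.6, "D": 0.1}
print(two_stage_ranking(fronts, similarity, k=3))
```

In this sample, B has the highest similarity but still ranks behind A and C, because the layer order from the first stage takes precedence over similarity.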
The invention makes full use of the bug reports, code file information and developer information available during software development and maintenance. It defines feature indicators at the file level, defines objective functions from three different aspects, and uses an unsupervised approach: the layered fast non-dominated sorting method (fast-non-dominated-sort) performs the first-stage sorting of the code files, the code files within each non-dominated layer are then sorted in the second stage by the text similarity function, and the code with high defect suspicion is output.
In summary, the invention is computationally simple, general and extensible, locates defective code quickly and effectively, can be used with different types of code files, programming languages and platforms, and suits the development and maintenance of large-scale software products.
The foregoing is only a specific embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.
Claims (9)
1. A software defect localization method based on multi-objective optimization, characterized by comprising:
S1, collecting the code files, bug reports and developer experience information of the software under test;
S2, applying a keyword extraction method to the code files and the bug reports to obtain code file keywords and bug report keywords;
S3, representing the code file keywords and the bug report keywords with the bag-of-words model as a code file S and a bug report R, and defining a similarity function from the code file S and the bug report R;
S4, from the structural feature metrics of the code file S, obtaining the structural complexity function of the code file S using a structural complexity function generation algorithm;
S5, computing the developer unfamiliarity function from the developer experience information;
S6, using the similarity function, the structural complexity function and the developer unfamiliarity function, ranking the code files S with a two-stage sorting method based on multi-objective optimization to obtain a complete ranking of the code files;
S7, marking the first k code files in the complete ranking as code files with high defect suspicion, where k is a positive integer.
2. The software defect localization method based on multi-objective optimization according to claim 1, characterized in that the keyword extraction method comprises:
S21, parsing the code files and the bug reports into sets of unordered identifiers;
S22, filtering punctuation marks, operands, operators and programming-language reserved words out of the sets to obtain filtered sets;
S23, for identifiers composed of compound words, splitting each identifier into single words at its capital letters;
S24, stemming the English words in the filtered sets, thereby obtaining the keyword sets of the code files and the bug reports.
3. The software defect localization method based on multi-objective optimization according to claim 1, characterized in that S3 comprises:
S31, constructing a corpus V from the code file keywords and the bug report keywords;
S32, applying the bag-of-words model to the corpus V to represent the code file and the bug report as a code file S and a bug report R, where S and R are both one-dimensional vectors whose components $w_t^S$ and $w_t^R$ are the weights of keyword t in the code file S and the bug report R respectively, each being the product of the term frequency $tf_t^d$ and the inverse document frequency $idf_t$, computed as follows:
$$w_t^d = tf_t^d \times idf_t \tag{1}$$

$$tf_t^d = \log\left(n_t^d\right) + 1$$

$$idf_t = \log\left(\frac{N}{N_t}\right)$$
where $n_t^d$ is the number of occurrences of keyword t in file d, d is the code file S or the BUG report R, N is the
total number of files d, and $N_t$ is the number of files containing keyword t;
S33: define the similarity function from the code file S and the BUG report R; the similarity function is:
$$Sim_R(S) = \frac{R^T S}{\|R\| \cdot \|S\|} \tag{2}$$
where $R^T S$ is the inner product of R and S, T denotes vector transposition, and $\|R\|$ and $\|S\|$ are the norms of R and S
respectively, i.e. the square root of the sum of the squares of all elements.
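The weighting in formula (1) and the cosine similarity in formula (2) can be illustrated with a small Python sketch. The function names and the sample documents are hypothetical. Two assumptions are made beyond the claim text: a keyword that does not occur in a file gets weight 0 (the logarithm of 0 is undefined), and the similarity is taken as 0 when either vector has zero norm.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict name -> list of keywords.

    Returns (vocabulary, dict name -> weight vector) using
    tf = log(n) + 1 and idf = log(N / N_t) as in formula (1).
    """
    vocab = sorted({t for words in docs.values() for t in words})
    n_docs = len(docs)  # N
    doc_freq = Counter(t for words in docs.values() for t in set(words))  # N_t
    vectors = {}
    for name, words in docs.items():
        counts = Counter(words)  # n_t^d
        vectors[name] = [
            (math.log(counts[t]) + 1) * math.log(n_docs / doc_freq[t])
            if counts[t] else 0.0  # assumption: absent keyword -> weight 0
            for t in vocab
        ]
    return vocab, vectors


def cosine_similarity(r, s):
    """Formula (2): inner product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(r, s))
    norm = math.sqrt(sum(x * x for x in r)) * math.sqrt(sum(y * y for y in s))
    return dot / norm if norm else 0.0  # assumption: zero vector -> 0


if __name__ == "__main__":
    docs = {
        "bug": ["null", "pointer", "parser"],
        "A.java": ["parser", "parser", "token"],
        "B.java": ["render", "widget"],
    }
    _, vec = tfidf_vectors(docs)
    for name in ("A.java", "B.java"):
        print(name, cosine_similarity(vec["bug"], vec[name]))
```

Here `A.java` shares the keyword `parser` with the report and therefore scores above `B.java`, which shares none.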
4. The software defect positioning method based on multi-objective optimization according to claim 1, characterized in that the
structural feature metrics comprise: number of lines of code, maximum McCabe cyclomatic complexity, number of code revisions,
number of files depended on, and number of non-comment lines of code.
5. The software defect positioning method based on multi-objective optimization according to claim 1, characterized in that the
structure complexity function generating algorithm comprises:
SS1: represent each code file as a structural feature vector $S_a = \{a_1, a_2, a_3, a_4, a_5\}$, where each a is a feature
metric;
SS2: unify the scales of the feature metrics using a normalization equation, obtaining normalized feature metrics by the following formula:
$$a_i' = \begin{cases} 0, & \text{if } a_i = a_{min} \\ \dfrac{a_i - a_{min}}{a_{max} - a_{min}}, & \text{if } a_{min} < a_i < a_{max} \\ 1, & \text{if } a_i = a_{max} \end{cases} \tag{3}$$
where $a_{min}$ and $a_{max}$ are the minimum and maximum values of the feature metric, i = 1, 2, 3, 4, 5;
SS3: from the normalized feature metrics, define the structure complexity function of the code file; the structure
complexity function is
$$Comp(S) = \frac{1}{5} \sum_{i=1}^{5} a_i \tag{4}$$
where $a_i$ is a feature metric, i = 1, 2, 3, 4, 5.
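A minimal Python sketch of SS2 and SS3 is given below, with hypothetical function names and sample values. It assumes that $a_{min}$ and $a_{max}$ are taken per metric across all code files, that the average in formula (4) is computed over the normalized values as SS3 describes, and that a metric with identical values in every file normalizes to 0 (a case the formula leaves undefined).

```python
def normalize_metric(values):
    """Formula (3): min-max scaling of one metric across all files."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant metric carries no information -> 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def structure_complexity(metric_table):
    """metric_table: dict file -> list of raw metrics [a1..a5].

    Returns dict file -> Comp(S), the mean of the normalized metrics.
    """
    names = list(metric_table)
    # Transpose so each column holds one metric for every file.
    columns = list(zip(*(metric_table[n] for n in names)))
    normalized = [normalize_metric(col) for col in columns]
    return {
        name: sum(col[idx] for col in normalized) / len(normalized)
        for idx, name in enumerate(names)
    }


if __name__ == "__main__":
    # Order: lines of code, max McCabe complexity, revisions,
    # files depended on, non-comment lines of code (illustrative values).
    metrics = {
        "A.java": [100, 10, 5, 3, 80],
        "B.java": [200, 20, 10, 6, 160],
        "C.java": [150, 15, 5, 3, 120],
    }
    print(structure_complexity(metrics))
```

With these sample values the file that is largest on every metric receives a complexity of 1 and the smallest receives 0.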
6. The software defect positioning method based on multi-objective optimization according to claim 1, characterized in that S5
comprises:
S51: map the cumulative time $Y_{exp}$ that a developer has been engaged in development to a developer experience metric index EXP;
S52: define the developer unfamiliarity function from the developer experience metric index EXP:
$$Rusd(S) = \sum_{i \in S_P} \frac{1}{EXP_i} \tag{5}$$
where $S_P$ is the set of developers of the code file S, $EXP_i$ is the experience metric index of developer i, and i is a
positive integer.
7. The software defect positioning method based on multi-objective optimization according to claim 6, characterized in that the
mapping between the cumulative time $Y_{exp}$ that the developer has been engaged in development and the developer experience metric index EXP is:
$Y_{exp} < 0.5$: EXP = 1;
$0.5 \le Y_{exp} < 1$: EXP = 2;
$1 \le Y_{exp} < 3$: EXP = 3;
$Y_{exp} \ge 3$: EXP = 5.
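The mapping of claim 7 and formula (5) can be sketched in Python as follows. The function names are hypothetical, and the unit of $Y_{exp}$ is not stated in this passage, so the values are used as given.

```python
def experience_index(y_exp):
    """Map cumulative development time Y_exp to the index EXP (claim 7)."""
    if y_exp < 0.5:
        return 1
    if y_exp < 1:
        return 2
    if y_exp < 3:
        return 3
    return 5


def unfamiliarity(developer_times):
    """Formula (5): sum of 1 / EXP_i over the developers of one code file.

    developer_times: iterable of Y_exp values, one per developer in S_P.
    """
    return sum(1 / experience_index(y) for y in developer_times)


if __name__ == "__main__":
    # One newcomer (EXP = 1) and one veteran (EXP = 5).
    print(unfamiliarity([0.2, 4]))
```

A file touched mainly by inexperienced developers therefore receives a higher value than one maintained by experienced developers.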
8. The software defect positioning method based on multi-objective optimization according to claim 6, characterized in that S6
comprises:
S61: from the similarity function $Sim_R(S)$, the structure complexity function Comp(S), and the developer unfamiliarity
function Rusd(S), define a multi-objective optimization problem as follows:
$$\begin{cases} \max & Y = \left(Sim_R(S),\ Comp(S),\ Rusd(S)\right) \\ s.t. & S \in \Gamma \\ & Y = (y_1, y_2, y_3) \end{cases} \tag{6}$$
where R is the BUG report, S is the code file, the components $y_1$, $y_2$ and $y_3$ of the triple $(y_1, y_2, y_3)$ are
the values of the similarity function, the structure complexity function, and the developer unfamiliarity function respectively, Γ is the
set of code files, and Y is the solution set of the three functions;
S62: apply the layer-based fast non-dominated sorting method to the code file set Γ, dividing Γ into
non-dominated layers $F_l$, l = 1, 2, ..., m; the order of the non-dominated layers is taken as the first-stage ranking,
where m is the number of non-dominated layers;
S63: within each non-dominated layer $F_l$ (l = 1, 2, ..., m), sort the code files by the similarity function $Sim_R(S)$ in
descending order, obtaining the second-stage ranking;
S64: concatenate the first-stage ranking and the second-stage ranking to obtain the complete ranking of the code files;
wherein S62 comprises:
S621: let $S_i \in \Gamma$ and $S_j \in \Gamma$ be code files with i ≠ j;
if $Sim_R(S_i) > Sim_R(S_j)$, $Comp(S_i) > Comp(S_j)$ and $Rusd(S_i) > Rusd(S_j)$,
then the relation between $S_i$ and $S_j$ is: $S_i$ dominates $S_j$;
where R is the BUG report, Γ is the code file set, $Sim_R(\cdot)$ is the text similarity function, $Comp(\cdot)$
is the structure complexity function, and $Rusd(\cdot)$ is the developer unfamiliarity function;
S622: for each code file $S_i$, calculate its dominated set $D_i$ and its domination counter $n_i$:
for each code file $S_j \in \Gamma$, if $S_i$ dominates $S_j$, then $D_i = D_i \cup \{S_j\}$;
if $S_j$ dominates $S_i$, then $n_i$++; otherwise, $S_i$ does not dominate $S_j$ and $S_j$ does not dominate $S_i$, and $D_i$ and $n_i$ remain unchanged;
S623: generate the first non-dominated layer $F_1$, which contains all code files $S_i$ whose domination counter $n_i = 0$, i = 1, 2, 3, ...,
$|F_1|$, where $|F_1|$ is the cardinality of the layer $F_1$;
S624: with $F_1$ as the initial value, iteratively compute the non-dominated layers $F_l$ (l = 2, ..., m), where m is the number of
non-dominated layers generated; $F_{l+1}$ is computed from $F_l$: for $S_i \in F_l$ and $S_j \in D_i$, decrement the domination counter $n_j$--, and if $n_j = 0$, then $F_{l+1} = F_{l+1} \cup \{S_j\}$,
where i = 1, 2, 3, ..., $|F_l|$, j = 1, 2, 3, ..., $|D_i|$, $|F_l|$ is the cardinality of the layer $F_l$, and $|D_i|$ is
the cardinality of the dominated set $D_i$.
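The two-stage ranking of S61 to S64, including the layering of S621 to S624, can be sketched in Python as below. The function names and the sample scores are hypothetical. The sketch follows the domination relation as worded in S621 (strictly larger on all three objectives), which is stricter than the usual Pareto definition, and it interprets "concatenate" in S64 as listing the layers in order with each layer internally sorted by similarity.

```python
def dominates(a, b):
    """S621: a dominates b if it is strictly larger on every objective."""
    return all(x > y for x, y in zip(a, b))


def fast_non_dominated_sort(scores):
    """scores: dict file -> (Sim, Comp, Rusd). Returns layers F_1..F_m."""
    dominated = {f: [] for f in scores}  # D_i: files dominated by i
    counter = {f: 0 for f in scores}     # n_i: number of files dominating i
    for i in scores:                     # S622
        for j in scores:
            if i == j:
                continue
            if dominates(scores[i], scores[j]):
                dominated[i].append(j)
            elif dominates(scores[j], scores[i]):
                counter[i] += 1
    layers = [[f for f in scores if counter[f] == 0]]  # S623: F_1
    while layers[-1]:                                  # S624: F_{l+1} from F_l
        next_layer = []
        for i in layers[-1]:
            for j in dominated[i]:
                counter[j] -= 1
                if counter[j] == 0:
                    next_layer.append(j)
        layers.append(next_layer)
    return layers[:-1]  # drop the trailing empty layer


def two_stage_rank(scores):
    """S62-S64: order by layer, then by similarity (descending) in a layer."""
    ranking = []
    for layer in fast_non_dominated_sort(scores):
        ranking.extend(sorted(layer, key=lambda f: scores[f][0], reverse=True))
    return ranking


if __name__ == "__main__":
    scores = {
        "A.java": (0.90, 0.80, 1.2),
        "B.java": (0.50, 0.40, 0.7),
        "C.java": (0.95, 0.10, 0.3),
        "D.java": (0.20, 0.05, 0.2),
    }
    print(two_stage_rank(scores))
```

The top k suspicious files of step S7 would then be the first k entries of the returned list, for example `two_stage_rank(scores)[:k]`.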
9. The software defect positioning method based on multi-objective optimization according to claim 1, characterized in that, in
S7, k = 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710700316.5A CN107515822B (en) | 2017-08-16 | 2017-08-16 | Software defect positioning method based on multiple-objection optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107515822A (en) | 2017-12-26 |
CN107515822B (en) | 2019-09-03 |
Family
ID=60723239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710700316.5A Active CN107515822B (en) | 2017-08-16 | 2017-08-16 | Software defect positioning method based on multiple-objection optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107515822B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101231614A (en) * | 2008-02-02 | 2008-07-30 | 南京大学 | Method for locating software unsoundness base on execution track block semblance |
CN105786704A (en) * | 2016-02-22 | 2016-07-20 | 南京大学 | Work amount sensitive bug positioning technology effectiveness evaluation method |
Non-Patent Citations (2)
Title |
---|
LU HUIHUA et al.: "Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction", 2014 IEEE 25th International Symposium on Software Reliability Engineering * |
CHEN Xiang et al.: "Research on static software defect prediction methods" (静态软件缺陷预测方法研究), Journal of Software (软件学报) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110580217A (en) * | 2018-06-08 | 2019-12-17 | 阿里巴巴集团控股有限公司 | software code health degree detection method, processing method and device and electronic equipment |
CN110580217B (en) * | 2018-06-08 | 2023-05-05 | 阿里巴巴集团控股有限公司 | Software code health degree detection method, processing method, device and electronic equipment |
CN109767438A (en) * | 2019-01-09 | 2019-05-17 | 电子科技大学 | A kind of thermal-induced imagery defect characteristic recognition methods based on dynamic multi-objective optimization |
CN109767438B (en) * | 2019-01-09 | 2021-06-08 | 电子科技大学 | Infrared thermal image defect feature identification method based on dynamic multi-objective optimization |
CN111831541A (en) * | 2019-04-22 | 2020-10-27 | 西安邮电大学 | Software defect positioning method based on risk track |
CN111831541B (en) * | 2019-04-22 | 2022-10-28 | 西安邮电大学 | Software defect positioning method based on risk track |
CN112328475A (en) * | 2020-10-28 | 2021-02-05 | 南京航空航天大学 | Defect positioning method for multiple suspicious code files |
CN112328475B (en) * | 2020-10-28 | 2021-11-30 | 南京航空航天大学 | Defect positioning method for multiple suspicious code files |
CN114510431A (en) * | 2022-04-20 | 2022-05-17 | 武汉理工大学 | Workload-aware intelligent contract defect prediction method, system and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107515822B (en) | 2019-09-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||