Feature location method based on multi-source software data analysis
Technical field
The present invention relates to a feature location method, in particular to a feature location method based on multi-source software data analysis, and belongs to the technical field of software engineering.
Background art
As the information society grows ever more dependent on software, users place more and higher demands on existing software systems, so these systems must be continually upgraded and maintained. The modification requests behind such upgrades and maintenance are commonly referred to as features. In a software system, a feature represents a function that is defined according to the requirements and acceptance level of developers and users. Software maintenance and evolution may involve various modification activities, such as adding new functions, improving existing functions, and fixing bugs. Determining where a known, specific function is located in the source code is called feature location. The feature location process is: determine a start node; select the next node to visit; visit that node; judge whether the node is relevant to the feature under investigation; and check whether all relevant nodes have been obtained. To implement a modification request in the current software, the start node of the request must first be found accurately; if this feature position cannot be found, the whole modification process cannot be completed smoothly.
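By way of illustration only, the node-by-node process described above can be sketched as a worklist traversal. The dependency graph, the relevance test, and the choice to stop expanding from irrelevant nodes are all assumptions made for this sketch; the text does not prescribe them.

```python
from collections import deque

def locate_feature(graph, start, is_relevant):
    """Visit nodes reachable from `start` and keep those judged relevant."""
    relevant = []
    visited = {start}
    queue = deque([start])            # determine the start node
    while queue:                      # stop once no candidate nodes remain
        node = queue.popleft()        # select and visit the next node
        if not is_relevant(node):     # judge relevance to the feature
            continue                  # assumption: irrelevant nodes are not expanded
        relevant.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return relevant

# Hypothetical dependency graph: node -> nodes it calls or references.
graph = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": [], "e4": []}
result = locate_feature(graph, "e1", lambda n: n != "e3")
```

In practice the relevance judgment is made by a developer or by an analysis such as those described later in this document; the lambda here only stands in for it.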
Current feature location research mainly comprises methods based on static program structure and methods based on dynamic program execution. These two kinds of methods perform feature location only through static analysis of the target program or dynamic analysis of the target code, respectively. They cannot capture other important feature information, such as the modification history of the target program, which reduces the accuracy and completeness of feature location. Moreover, software data includes not only static and dynamic information but also process information about software evolution; using only one of these types of information can make the feature location result inaccurate and incomplete.
The prior art includes the Java Platform Debugger Architecture, abbreviated JPDA. JPDA is a complete set of debugging tools and interfaces provided by the Java virtual machine. Through the interfaces and protocols that JPDA provides, debugger developers can extend and customize Java debugging applications to meet the needs of specific developers, and can build debugging tools that developers want to use. JPDA consists of three main parts: 1. the Java Virtual Machine Tool Interface (JVMTI), which defines the services a virtual machine (VM) must provide for debugging, including debugging information (such as stack information), debugging actions (such as a client setting a breakpoint), and notifications (such as notifying the client when a breakpoint is reached); 2. the Java Debug Wire Protocol (JDWP), which defines the format of the information and requests transferred between the debugged process and the debugger front end; 3. the Java Debug Interface (JDI), which defines the debugging interface that a debugger can use to interact conveniently with a remote debugging service.
The prior art also includes the Test and Performance Tools Platform, abbreviated TPTP. TPTP is a top-level project of the Eclipse Foundation that provides a full-featured set of open-source testing and performance tools. It covers the whole testing and performance life cycle, from early testing to the monitoring of production applications, including test authoring and execution, monitoring, tracing and profiling, and log analysis.
Summary of the invention
The object of the present invention is to provide a feature location method based on multi-source software data analysis, which solves the technical problem in the prior art that only a single type of feature information is analyzed and mined, leaving the feature location result inaccurate and incomplete.
This object is achieved as follows: the feature location method based on multi-source software data analysis comprises the following steps:
Step 1: retrieve the current software system using information retrieval: query the source code of the current software system for the program code relevant to a current modification request, and denote that program code as feature information a;
Step 2: mine historical evolution information using data mining: query the evolution history repository for historical modification requests relevant to the current modification request, take the union of the modified elements of the relevant historical modification requests, and denote the resulting program code as feature information b;
Step 3: analyze execution traces using dynamic analysis: the execution traces comprise marked execution information and full execution information; subtract the marked execution information from the full execution information and denote the result as pending execution information; then perform static analysis on the pending execution information to obtain the information set within it that is relevant to the current modification request; take the union of this information set and the marked execution information, and denote the resulting program code as feature information c;
Step 4: compute the intersection of the three kinds of feature information a, b and c, and output the feature location result m.
The retrieval steps of the information retrieval technique are as follows:
A) Build the corpus: define the document granularity and build a corpus at that granularity level;
B) Natural language processing: preprocess the corpus using natural language processing techniques; the preprocessing comprises deleting source code operators and programming language keywords, splitting identifiers and compound words, and reducing words to their stems;
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current modification request.
The marked execution information is collected using JPDA, and the full execution information is collected using TPTP.
Compared with the prior art, the beneficial effects of the present invention are: 1. information retrieval, data mining and dynamic analysis are combined to perform feature location on the current software system, the evolution history repository and the execution traces, respectively, achieving feature location based on multi-source software data analysis; compared with prior-art methods that perform feature location on only a single type of information, the feature location result of the present invention has higher accuracy, completeness and efficiency; 2. the three techniques used by the invention, namely information retrieval, data mining and dynamic analysis, are mature, so the invention is easy to operate and implement; 3. the invention can be used for feature location at the class level and the method level; the granularity level can be chosen in light of practical considerations such as cost analysis, which provides a flexible selection framework for practical multi-granularity feature location.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention.
Fig. 2 is a block diagram of the working principle of the information retrieval technique of the present invention.
Fig. 3 is a block diagram of the working principle of the data mining technique of the present invention.
Fig. 4 is a block diagram of the working principle of the dynamic analysis technique of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the feature location method based on multi-source software data analysis comprises the following steps:
Build a feature location model based on multi-source software data analysis: define a four-tuple (m, a, b, c), where m is the final feature location result and a, b and c are the feature information extracted from three different data sources.
Feature information a is located as follows: retrieve the current software system using information retrieval, query the source code of the current software system for the program code relevant to the current modification request, and denote that program code as feature information a.
Fig. 2 shows the block diagram of the working principle of the information retrieval technique of the present invention. The specific retrieval steps are as follows:
A) Build the corpus: define the document granularity and build a corpus at that granularity level. The defined granularity may be package, class or method;
B) Natural language processing: preprocess the corpus using natural language processing techniques. The preprocessing comprises deleting source code operators and programming language keywords; splitting identifiers and compound words, for example splitting "impactAnalysis" into "impact" and "Analysis"; and reducing words to their stems, for example reducing "impacted" to "impact";
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current modification request. If the retrieved source code elements are e1, e2, e4, e6, e8 and e10, then feature information a = {e1, e2, e4, e6, e8, e10}.
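By way of illustration only, steps A) to C) can be sketched as follows. The keyword list, the suffix-stripping stemmer (a crude stand-in for a real stemmer), the function names and the three-element corpus are all assumptions made for this sketch and are not part of the described method.

```python
import re

# Assumption: a tiny illustrative keyword list, not a full language keyword set.
KEYWORDS = {"public", "void", "int", "return", "class", "new", "if", "else"}

def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into their component words."""
    parts = []
    for chunk in identifier.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return parts

def stem(word):
    """Naive suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Drop operators and keywords, split identifiers, and stem each word."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)  # operators never match
    terms = set()
    for token in tokens:
        if token in KEYWORDS:
            continue
        for part in split_identifier(token):
            terms.add(stem(part.lower()))
    return terms

def retrieve(corpus, request):
    """Return the corpus elements that share at least one term with the request."""
    request_terms = preprocess(request)
    return sorted(name for name, text in corpus.items()
                  if preprocess(text) & request_terms)

# Hypothetical method-level corpus: element name -> source text.
corpus = {
    "e1": "public void impactAnalysis(int depth) { return; }",
    "e2": "class ImpactedSet { }",
    "e3": "int parse_config(int x) { return x + 1; }",
}
a = retrieve(corpus, "impact analysis")
```

Here the corpus is keyed at method or class granularity; choosing package granularity would only change what each corpus entry contains.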
Feature information b is located as follows: mine historical evolution information using data mining. Fig. 3 shows the block diagram of the working principle of the data mining technique of the present invention. First, historical modification requests are extracted from the evolution history repository and their information is integrated. Assume that, after integration, the historical modification requests and their corresponding modified elements are as shown in Table 1:
Then, the evolution history repository is queried for historical modification requests relevant to the current modification request; that is, a similarity matrix computation is performed between the historical modification requests and the current modification request. Finally, the union of the modified elements of the relevant historical modification requests is taken, and the resulting program code is denoted as feature information b. If historical modification requests c2 and c5 are relevant to the current modification request, then feature information b = c2 ∪ c5 = {e2, e3, e5, e7, e4, e12}.
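By way of illustration only, the matching and union steps can be sketched as follows. The text does not specify how similarity is computed, so Jaccard similarity over request terms and the threshold of 0.3 are assumptions. The request terms, the entry c1, and the way the modified elements are split between c2 and c5 are also hypothetical; only their union is taken from the example above.

```python
def jaccard(left, right):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    left, right = set(left), set(right)
    union = left | right
    return len(left & right) / len(union) if union else 0.0

def mine_history(history, current_terms, threshold=0.3):
    """Select similar historical requests and merge their modified elements."""
    related = [request_id for request_id, (terms, _) in history.items()
               if jaccard(terms, current_terms) >= threshold]
    merged = set()
    for request_id in related:
        merged |= history[request_id][1]   # union of modified elements
    return related, merged

# Hypothetical history: request id -> (request terms, modified elements).
history = {
    "c1": ({"login", "timeout"}, {"e9"}),
    "c2": ({"impact", "analysis", "graph"}, {"e2", "e3", "e5", "e7"}),
    "c5": ({"impact", "set", "analysis"}, {"e4", "e12"}),
}
related, b = mine_history(history, {"impact", "analysis"})
```

With these inputs, c2 and c5 each score 2/3 against the current request and c1 scores 0, so only c2 and c5 contribute to b.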
Feature information c is located as follows: analyze execution traces using dynamic analysis. Fig. 4 shows the block diagram of the working principle of the dynamic analysis technique of the present invention. First, JPDA is used to collect the marked execution information and TPTP is used to collect the full execution information; the marked execution information is subtracted from the full execution information, and the result is denoted as pending execution information. Then static analysis is performed on the pending execution information to obtain the information set within it that is relevant to the current modification request; the union of this information set and the marked execution information is taken, and the resulting program code is denoted as feature information c. Assume that the marked execution information is g1 = {e1, e3, e4} and the full execution information is g2 = {e1, e2, e3, e4, e5, e6, e7, e8}. The pending execution information is then g3 = g2 - g1 = {e2, e5, e6, e7, e8}. Static analysis of g3 yields the information set g4 in g3 that is relevant to the current modification request. If g4 = {e2, e5}, the union of g1 and g4 gives feature information c = g1 ∪ g4 = {e1, e2, e3, e4, e5}.
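By way of illustration only, the set operations in this step can be sketched as follows, using the values of g1 and g2 from the example above. Trace collection with JPDA and TPTP is not shown, and the static analysis is replaced by a hypothetical relevance predicate that simply accepts e2 and e5; the function name is likewise an assumption.

```python
def locate_from_traces(marked, full, is_relevant):
    """Compute pending info, its relevant subset, and the combined result."""
    pending = full - marked                          # g3 = g2 - g1
    related = {e for e in pending if is_relevant(e)} # g4, via the stand-in analysis
    return pending, related, marked | related        # c = g1 ∪ g4

g1 = {"e1", "e3", "e4"}                                     # marked execution info
g2 = {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"}       # full execution info

# Stand-in for static analysis: assume e2 and e5 are judged relevant.
g3, g4, c = locate_from_traces(g1, g2, lambda e: e in {"e2", "e5"})
```

The subtraction removes elements already known to belong to the feature, so the more expensive static analysis only has to examine the remaining candidates.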
The feature location result m is computed as follows: compute the intersection of the three kinds of feature information a, b and c, and output the feature location result m, that is, m = a ∩ b ∩ c = {e1, e2, e4, e6, e8, e10} ∩ {e2, e3, e5, e7, e4, e12} ∩ {e1, e2, e3, e4, e5} = {e2, e4}.
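By way of illustration only, the final intersection can be checked with the example values of a, b and c given above. The function name `combine` is an assumption made for this sketch.

```python
def combine(*sources):
    """Intersect the feature information from all data sources."""
    return set.intersection(*map(set, sources))

a = {"e1", "e2", "e4", "e6", "e8", "e10"}   # from information retrieval
b = {"e2", "e3", "e5", "e7", "e4", "e12"}   # from evolution history mining
c = {"e1", "e2", "e3", "e4", "e5"}          # from dynamic analysis

m = combine(a, b, c)
```

Because intersection is commutative and associative, the result does not depend on the order in which a, b and c are computed.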
The present invention is not limited to the above embodiment. For example, the steps for locating feature information a, feature information b and feature information c have no required order and may be performed in any order. On the basis of the technical solution disclosed herein, those skilled in the art can, based on the disclosed technical content and without creative effort, make substitutions and variations to some of the technical features, and all such substitutions and variations fall within the protection scope of the present invention.