Feature location method based on multi-source software data analysis
Technical field
The present invention relates to a feature location method, in particular to a feature location method based on multi-source software data analysis, and belongs to the technical field of software engineering.
Background art
As the information society grows ever more dependent on software, users place more and higher demands on existing software systems, so these systems must be continually upgraded and maintained. A modification request for such an upgrade or maintenance task is conventionally called a feature. In a software system, a feature can represent a function, and this function is defined by the requirements of developers and users and by what they find acceptable. Software maintenance and evolution may include various modification activities, such as adding new functions, improving existing functions, and fixing bugs. Determining where a known specific function is implemented in the source code is called feature location. The process of feature location is: determine the start node; select the next node to visit; visit that node; judge whether the node is relevant to the feature under investigation; check whether all relevant nodes have been obtained. To implement a modification request in existing software, the start node of the request must first be found accurately; if the location of the feature cannot be found, the modification as a whole cannot be completed smoothly.
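The generic feature location loop described above can be illustrated with a minimal Python sketch. The dependency graph, the `explore` function, and the `is_relevant` predicate are hypothetical names introduced here for illustration, and the policy of expanding only the neighbours of relevant nodes is an assumption, not something the description prescribes.

```python
from collections import deque

def explore(graph, start, is_relevant):
    """Walk a dependency graph from a start node and collect relevant nodes.

    graph: dict mapping a node to the nodes it depends on (hypothetical input).
    is_relevant: callable deciding whether a node belongs to the feature.
    """
    relevant = set()
    visited = {start}
    worklist = deque([start])          # determine the start node
    while worklist:                    # stop when no candidate nodes remain
        node = worklist.popleft()      # select and visit the next node
        if not is_relevant(node):      # judge relevance to the feature
            continue
        relevant.add(node)
        # Assumption: only neighbours of relevant nodes are worth visiting.
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                worklist.append(neighbour)
    return relevant

# Toy example with made-up nodes and relevance judgments.
graph = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": ["e5"]}
found = explore(graph, "e1", lambda n: n in {"e1", "e2", "e4"})
print(sorted(found))  # ['e1', 'e2', 'e4']
```

In this sketch the whole search depends on the start node, which is why an inaccurate start node undermines the rest of the modification process.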
Current feature location research mainly comprises feature location methods based on the static structure of a program and feature location methods based on dynamic program slicing. These two kinds of methods perform feature location only by statically analyzing the target program and by dynamically analyzing the target code, respectively. Important feature information, such as the modification history of the target program, cannot be captured, which reduces the accuracy and completeness of feature location. In addition, software data comprises not only static and dynamic information but also process information about software evolution; using only one of these types of information may make the feature location result inaccurate and incomplete.
The prior art includes the Java Platform Debugger Architecture, abbreviated JPDA. JPDA is a complete set of tools and interfaces for debugging the Java virtual machine. Using the interfaces and protocols that JPDA provides, debugger developers can extend and customize Java debugging applications to meet the needs of particular developers and build debugging tools that developers want to use. JPDA mainly consists of three parts: (1) the Java Virtual Machine Tool Interface (JVMTI), which defines the services a virtual machine (VM) must provide for debugging, including debugging information (such as stack information), debugging actions (such as a client setting a breakpoint), and notifications (such as notifying the client when a breakpoint is reached); (2) the Java Debug Wire Protocol (JDWP), which defines the format of the information and requests transferred between the process being debugged and the debugger front end; (3) the Java Debug Interface (JDI), which defines the debugging interface available to debugger developers so that they can interact conveniently with a remote debugging service.
The prior art also includes the Test and Performance Tools Platform, abbreviated TPTP. TPTP is a top-level project of the Eclipse Foundation that provides a full-featured set of open-source testing and performance tools. It covers the entire testing and performance life cycle, from early testing to the monitoring of production applications, and includes test authoring and execution, monitoring, tracing and profiling, and log analysis.
Summary of the invention
The object of the present invention is to provide a feature location method based on multi-source software data analysis, which solves the technical problem in the prior art that only a single type of feature information is analyzed and mined, making the feature location result inaccurate and incomplete.
The object of the present invention is achieved as follows: a feature location method based on multi-source software data analysis, comprising the following steps:
Step 1: retrieve from the current software system using an information retrieval technique: query the source code of the current software system for program code related to the current modification request, and denote that program code as feature information a;
Step 2: mine historical evolution information using a data mining technique: query the evolution history repository for historical modification requests related to the current modification request, take the union of the elements modified in the related historical modification requests, and denote the resulting program code as feature information b;
Step 3: analyze execution traces using a dynamic analysis technique: the execution traces comprise marked execution information and complete execution information; take the set difference of the complete execution information and the marked execution information, and denote the result as pending execution information; then perform static analysis on the pending execution information to obtain the set of information in it that is related to the current modification request; take the union of that information set and the marked execution information, and denote the resulting program code as feature information c;
Step 4: compute the intersection of the three kinds of feature information a, b, and c, and output the feature location result m.
The retrieval steps of the information retrieval technique are as follows:
A) Build the corpus: define the document granularity and build a corpus at that granularity level;
B) Natural language processing: preprocess the corpus using natural language processing techniques; the preprocessing comprises deleting source code operators and programming language keywords, splitting identifiers and compound words, and stemming words to their roots;
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current modification request.
The marked execution information is collected using JPDA, and the complete execution information is collected using TPTP.
Compared with the prior art, the beneficial effects of the present invention are: (1) information retrieval, data mining, and dynamic analysis are applied respectively to the current software system, the evolution history repository, and the execution traces, realizing feature location based on multi-source software data analysis; compared with prior-art methods that locate features using only a single type of information, the feature location result of the present invention is more accurate and more complete, and is obtained more efficiently; (2) information retrieval, data mining, and dynamic analysis are all relatively mature techniques, so the present invention is easy to implement and operate; (3) the present invention can be used for feature location at the class level or the method level, and the granularity level can be chosen according to practical considerations such as cost analysis, providing a flexible selection framework for multi-granularity feature location in practice.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a block diagram of the working principle of the information retrieval technique of the present invention.
Fig. 3 is a block diagram of the working principle of the data mining technique of the present invention.
Fig. 4 is a block diagram of the working principle of the dynamic analysis technique of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the feature location method based on multi-source software data analysis comprises the following steps:
Construct the feature location model based on multi-source software data analysis: define a four-tuple <m, a, b, c>, where m is the final feature location result and a, b, and c are the feature information extracted from three different data sources.
Feature information a is located as follows: the current software system is searched using an information retrieval technique; the source code of the current software system is queried for program code related to the current modification request, and that program code is denoted as feature information a.
Fig. 2 shows a block diagram of the working principle of the information retrieval technique of the present invention. The specific retrieval steps are as follows:
A) Build the corpus: define the document granularity and build a corpus at that granularity level. The document granularity may be package, class, or method;
B) Natural language processing: preprocess the corpus using natural language processing techniques. The preprocessing comprises: deleting source code operators and programming language keywords; splitting identifiers and compound words, for example, "impactAnalysis" is split into "impact" and "Analysis"; and stemming words to their roots, for example, "impacted" is reduced to "impact";
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current modification request. Suppose this source code is e1, e2, e4, e6, e8, e10; then feature information a = {e1, e2, e4, e6, e8, e10}.
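Steps A) to C) above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the keyword list is abbreviated, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and the corpus contents, function names, and the keyword-overlap matching rule are hypothetical and not taken from the invention.

```python
import re

# Hypothetical, abbreviated keyword list; a real one would cover the whole language.
JAVA_KEYWORDS = {"public", "private", "void", "int", "return", "new", "class", "static"}

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]

def naive_stem(word):
    """Crude suffix stripping; an assumption standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Drop operators and keywords, split identifiers, and stem each word."""
    terms = []
    # Matching identifiers only discards operators, punctuation, and literals.
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        if identifier in JAVA_KEYWORDS:
            continue
        terms.extend(naive_stem(w) for w in split_identifier(identifier))
    return terms

def retrieve(corpus, modification_request):
    """Return the corpus documents sharing at least one term with the request."""
    query_terms = set(preprocess(modification_request))
    return {name for name, source in corpus.items()
            if query_terms & set(preprocess(source))}

# Toy corpus at method granularity (contents are made up for illustration).
corpus = {
    "e1": "void computeImpactSet() { return impactedNodes; }",
    "e3": "int parseHeader(int x) { return x + 1; }",
}
print(split_identifier("impactAnalysis"))    # ['impact', 'analysis']
print(naive_stem("impacted"))                # impact
print(retrieve(corpus, "impact analysis"))   # {'e1'}
```

A practical implementation would more likely rank documents with a weighting scheme rather than use plain term overlap; the overlap rule is chosen here only to keep the sketch short.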
Feature information b is located as follows: historical evolution information is mined using a data mining technique. Fig. 3 shows a block diagram of the working principle of the data mining technique of the present invention. First, historical modification requests are extracted from the evolution history repository and their information is integrated. Suppose that, after integration, the historical modification requests and their corresponding modified elements are as shown in Table 1.
Then, the evolution history repository is queried for historical modification requests related to the current modification request, by performing a similarity matrix computation on the historical modification requests and the current modification request. Finally, the union of the elements modified in the related historical modification requests is taken, and the resulting program code is denoted as feature information b. Suppose historical modification requests c2 and c5 are related to the current modification request; then feature information b = c2 ∪ c5 = {e2, e3, e5, e7, e4, e12}.
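The mining step above can be illustrated with a short Python sketch. The text does not specify how the similarity computation is carried out, so Jaccard similarity over request keywords and the threshold of 0.3 are assumptions. Table 1 is not reproduced here, so the keyword sets and the individual element sets assigned to c1, c2, and c5 are hypothetical; only the union for c2 and c5 matches the example in the text.

```python
def jaccard(left, right):
    """Jaccard similarity of two term collections (assumed similarity measure)."""
    left, right = set(left), set(right)
    union = left | right
    return len(left & right) / len(union) if union else 0.0

def mine_history(history, current_terms, threshold=0.3):
    """Select related historical requests and union their modified elements.

    history: dict of request id -> (request terms, modified elements).
    threshold: hypothetical cut-off for treating a request as related.
    """
    related = [request_id for request_id, (terms, _) in history.items()
               if jaccard(terms, current_terms) >= threshold]
    feature_b = set()
    for request_id in related:
        feature_b |= history[request_id][1]   # union of modified elements
    return related, feature_b

# Hypothetical history data; only the union of c2 and c5 follows the text.
history = {
    "c1": ({"login", "page"}, {"e9"}),
    "c2": ({"impact", "analysis", "graph"}, {"e2", "e3", "e5"}),
    "c5": ({"impact", "set", "analysis"}, {"e4", "e7", "e12"}),
}
related, b = mine_history(history, {"impact", "analysis"})
print(related)                          # ['c2', 'c5']
print(sorted(b, key=lambda e: int(e[1:])))
# ['e2', 'e3', 'e4', 'e5', 'e7', 'e12']
```

The threshold controls how many historical requests count as related: a lower value enlarges b and a higher value shrinks it.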
Feature information c is located as follows: execution traces are analyzed using a dynamic analysis technique. Fig. 4 shows a block diagram of the working principle of the dynamic analysis technique of the present invention. First, JPDA is used to collect the marked execution information and TPTP is used to collect the complete execution information; the set difference of the complete execution information and the marked execution information is taken, and the result is denoted as pending execution information. Then static analysis is performed on the pending execution information to obtain the set of information in it that is related to the current modification request; the union of that information set and the marked execution information is taken, and the resulting program code is denoted as feature information c. Suppose the marked execution information is g1 = {e1, e3, e4} and the complete execution information is g2 = {e1, e2, e3, e4, e5, e6, e7, e8}; then the pending execution information is g3 = g2 - g1 = {e2, e5, e6, e7, e8}. Static analysis of g3 yields the information set g4 related to the current modification request. If g4 = {e2, e5}, taking the union of g1 and g4 gives feature information c = g1 ∪ g4 = {e1, e2, e3, e4, e5}.
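The set operations in this step can be checked with a small Python sketch. It assumes the traces have already been collected and are available as sets of element names; the `locate_from_traces` function is a hypothetical name, and the `is_relevant` predicate is a stub standing in for the static analysis, whose internals the text does not describe.

```python
def locate_from_traces(marked, complete, is_relevant):
    """Derive feature information c from two execution traces.

    marked: elements in the marked execution information (g1).
    complete: elements in the complete execution information (g2).
    is_relevant: stub for the static analysis deciding relevance.
    """
    pending = complete - marked                         # g3 = g2 - g1
    relevant = {e for e in pending if is_relevant(e)}   # g4, a subset of g3
    return pending, relevant, marked | relevant         # c = g1 ∪ g4

g1 = {"e1", "e3", "e4"}
g2 = {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"}
# The stub mirrors the example in the text, where g4 = {e2, e5}.
g3, g4, c = locate_from_traces(g1, g2, lambda e: e in {"e2", "e5"})
print(sorted(g3))  # ['e2', 'e5', 'e6', 'e7', 'e8']
print(sorted(c))   # ['e1', 'e2', 'e3', 'e4', 'e5']
```

Because g4 is drawn only from g3, elements that were never executed cannot enter c through this step.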
The feature location result m is computed as follows: the intersection of the three kinds of feature information a, b, and c is computed and output as the feature location result m, that is, m = a ∩ b ∩ c = {e1, e2, e4, e6, e8, e10} ∩ {e2, e3, e5, e7, e4, e12} ∩ {e1, e2, e3, e4, e5} = {e2, e4}.
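The final combination can be reproduced with Python sets, using the example values given for a, b, and c. The `combine` helper is a hypothetical name introduced for illustration.

```python
def combine(*sources):
    """Intersect the feature information sets from all data sources."""
    return set.intersection(*sources)

a = {"e1", "e2", "e4", "e6", "e8", "e10"}   # from information retrieval
b = {"e2", "e3", "e5", "e7", "e4", "e12"}   # from history mining
c = {"e1", "e2", "e3", "e4", "e5"}          # from dynamic analysis

m = combine(a, b, c)
print(sorted(m))  # ['e2', 'e4']
```

In this example, e1 appears in a and c but not in b, so it is excluded from m; only elements supported by all three data sources are kept.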
The present invention is not limited to the embodiment described above. For example, the steps for locating feature information a, feature information b, and feature information c have no required order and may be performed in any order. On the basis of the technical solution disclosed by the present invention, those skilled in the art can, without creative effort, make substitutions and variations to some of its technical features according to the disclosed technical content, and all such substitutions and variations fall within the scope of protection of the present invention.