Software key function identification method based on g-kernel decomposition
Technical Field
The invention relates to methods for identifying key functions in software, and in particular to a software key function identification method based on g-kernel decomposition.
Background
Computer software now reaches into every aspect of daily life and has become indispensable. As demands on functionality and performance grow, software becomes larger and more complex, and its quality becomes harder to guarantee. When new requirements arise, existing software usually has to be adapted to them through maintenance work. The complexity of the software makes this maintenance increasingly difficult and keeps its cost high, at more than 60% of the total cost of the software. Maintaining existing software first requires understanding it, and the complexity of the software makes that understanding difficult. Providing an effective technique that helps maintainers understand the software, thereby simplifying maintenance and reducing its cost, is therefore a technical problem to be solved.
One possible approach is to understand software starting from its key elements (packages, classes, functions, attributes, etc.): the key elements are understood first, then the elements associated with them, so that the entire software is understood step by step. Several approaches have been proposed for identifying key elements in software. Zaidman et al. construct a static class dependency graph and identify key classes using the HITS algorithm. Zhou Yuming et al. abstract a software system at class granularity as a class dependency graph and identify key classes using methods such as the PageRank algorithm, HITS, and betweenness centrality. Jiang Shujuan et al. construct a state transition model for software and identify key classes by calculating the complexity of the nodes of the state transition tree. Pan Weifeng et al. construct software structure graphs at class granularity and package granularity, and identify key classes and key packages using the PageRank algorithm. Although some work exists on identifying key elements in software, the following disadvantages remain:
(1) Existing work focuses mainly on static analysis of software code and lacks dynamic analysis of the software as it actually runs. Static analysis does not run the software and depends only on its source code; the relationships it extracts among elements are worst-case relationships and may include redundant ones. Dynamic analysis runs the software and collects the elements and the relationships among them during execution, so it represents the real interactions among elements. Dynamic analysis is more accurate than static analysis.
(2) The existing work mainly aims at the identification of key packages and key classes and lacks the identification of key functions.
Packages and classes are relatively coarse-grained software elements, while functions are relatively fine-grained ones. A technique for identifying key functions can make up for the shortcomings of existing work, so that the key elements of software can be identified comprehensively from coarse to fine granularity, providing technical support for software understanding, software testing, and software maintenance.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a software key function identification method based on g-kernel decomposition.
The technical problem of the invention is mainly solved by the following technical scheme: a software key function identification method based on g-kernel decomposition comprises the following steps:
(1) Abstracting software written in the Java language into a function dependency graph FG = (N, D) at function granularity, wherein N is the set of function nodes in the software; $D = \{(f_i, f_j)\}$ ($f_i \in N$, $f_j \in N$) is the set of undirected edges and represents the calling relationships among functions; each edge is assigned a non-negative integer as the strength value of the function-call relationship.
(2) Calculating, based on the FG constructed in step (1), the g-core number g(i) of each function node i as the importance value of the function corresponding to that node.
(3) Sorting the function nodes in descending order of the g-core numbers obtained in step (2) to obtain the key functions.
Further, the functions in step (1) and the call relationships among them are obtained from the actual execution of the Java software on the Java virtual machine; this is a dynamic analysis, not a static analysis based on source code.
Further, the strength value of an edge in step (1) is the number of calls between the two functions. The number of calls is likewise obtained from the actual execution of the Java software on the Java virtual machine, by dynamic analysis rather than static analysis based on source code.
Further, the calculation of the g-core number g(i) of node i in step (2) specifically includes the following sub-steps:
(2.1) Calculating the weighted degree of every function node in the FG obtained in step (1). The weighted degree $w_j$ of function node j is defined as the sum of the strength values of all edges in the FG that are connected to that node, i.e.:
$$w_j = \sum_{m \in V_j} w(j, m)$$
wherein $V_j$ is the set of neighbor function nodes of function node j, and w(j, m) is the strength value on the edge (j, m).
(2.2) Calculating the degree of every function node in the FG obtained in step (1). The degree $k_j$ of function node j is defined as the number of edges in the FG that are connected to that node.
(2.3) Calculating the geometric mean degree of every function node in the FG obtained in step (1). The geometric mean degree $s_j$ of function node j is the integer closest to the arithmetic square root of the product of $w_j$ and $k_j$. It is calculated as:
$$s_j = \mathrm{round}\left(\sqrt{w_j \cdot k_j}\right)$$
where round(n) returns the integer closest to the value n.
(2.4) Obtaining the g-cores (g = 1, 2, 3, …) of the FG obtained in step (1): function nodes whose geometric mean degree is smaller than g, together with their connected edges, are removed from the FG repeatedly, with the geometric mean degree recalculated on the remaining graph, until no such node remains; the resulting subgraph is the g-core of the FG.
(2.5) Calculating the g-core numbers of all function nodes in the FG obtained in step (1): the function nodes in the g-core and in the (g + 1)-core are compared, and if a function node exists in the g-core but has been removed from the (g + 1)-core, the g-core number of that function node is g.
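Sub-steps (2.1) to (2.3) can be illustrated with a minimal Python sketch. The function name `node_measures` and the dictionary-of-pairs edge representation are assumptions made for illustration and are not part of the method as claimed.

```python
import math

def node_measures(edges):
    """Return {node: (w, k, s)} for an undirected weighted graph.

    `edges` maps an unordered pair (a, b) to its non-negative integer
    strength value. This representation is an illustrative assumption.
    """
    w = {}  # (2.1) weighted degree: sum of strengths of incident edges
    k = {}  # (2.2) degree: number of incident edges
    for (a, b), strength in edges.items():
        for node in (a, b):
            w[node] = w.get(node, 0) + strength
            k[node] = k.get(node, 0) + 1
    # (2.3) geometric mean degree: nearest integer to sqrt(w * k).
    # For an integer product the square root never lies exactly halfway
    # between two integers, so the tie-breaking rule of round() is moot.
    return {n: (w[n], k[n], round(math.sqrt(w[n] * k[n]))) for n in w}
```

For instance, on a hypothetical triangle a, b, c with strength 4 on every edge plus one pendant edge (c, d) of strength 1, node c gets w = 9, k = 3 and s = round(√27) = 5.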
Further, in step (3), the function nodes are sorted in descending order using the bubble sort algorithm.
Further, in step (3), after the descending sort, the functions ranked in the top 15% (rounded up) are taken as the identified key functions.
Compared with the prior art, the invention has the following advantages and positive effects:
(1) The FG is constructed from dynamic analysis of the Java software during execution and represents the functions in the software and the real interactions among them. It is more accurate than static analysis based on software source code, and to a certain extent solves the problem of inaccurate models in key element identification methods based on static analysis.
(2) The invention provides a method for identifying key functions in software based on g-kernel decomposition. To a certain extent it solves the problem that existing methods focus only on identifying coarse-grained key packages and key classes and ignore fine-grained key functions, and it can provide technical support for software understanding, software testing, code maintenance, and similar work.
Drawings
FIG. 1 shows source code written in the Java language according to an embodiment of the invention;
FIG. 2 shows the FG constructed according to an embodiment of the invention;
FIG. 3 shows the g-kernel decomposition process according to an embodiment of the invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to an embodiment and the accompanying drawings.
The invention provides a software key function identification method based on g-kernel decomposition, which comprises the following specific steps:
(1) Software written in the Java language is abstracted at function granularity into a function dependency graph FG = (N, D). Fig. 1 shows a piece of Java source code. When this code runs on the JVM, the main function is executed first; main calls the add function 9 times and the sub1 function 1 time; sub1, when executed, calls sub2 and add 1 time each. From this execution the FG shown in Fig. 2 is obtained, in which the text beside each node is the name of the function corresponding to that node. Here N = {main, sub1, sub2, add} is the set of function nodes; D = {(main, sub1), (main, add), (sub1, sub2), (sub1, add)} is the set of undirected edges and represents the call relationships among functions; the number on each edge is the number of calls between the two functions.
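As a rough illustration of how the FG of Fig. 2 could be assembled from a recorded run, the following Python sketch counts call events per unordered function pair. The event list, the function name `build_function_graph`, and the decision to ignore recursive self-calls are assumptions for illustration; the source does not specify how call events are recorded on the JVM.

```python
from collections import Counter

def build_function_graph(call_events):
    """Build FG = (N, D) from (caller, callee) events of one run.

    Returns the node set and a dict mapping each undirected edge,
    stored as a sorted pair, to its strength (number of calls).
    Assumption: calls in both directions add to the same edge, and
    recursive self-calls are ignored.
    """
    strengths = Counter()
    for caller, callee in call_events:
        if caller == callee:
            continue  # assumption: no self-loops in the FG
        strengths[tuple(sorted((caller, callee)))] += 1
    nodes = {f for edge in strengths for f in edge}
    return nodes, dict(strengths)

# Call events matching the run described for Fig. 1; only the counts
# matter here, not the order in which the calls occurred.
events = ([("main", "add")] * 9
          + [("main", "sub1"), ("sub1", "sub2"), ("sub1", "add")])
nodes, edges = build_function_graph(events)
```

With these events, `edges` holds strength 9 for the pair (add, main) and strength 1 for each of the other three pairs, which matches the edge labels described for Fig. 2.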
(2) Based on the FG constructed in step (1), the g-core number g(i) of each function node i is calculated as the importance value of the function corresponding to that node. The calculation of g(i) comprises the following sub-steps, which together are referred to as the g-kernel decomposition process of the FG:
(2.1) Calculating the weighted degree of every function node in the FG obtained in step (1). The weighted degree $w_j$ of function node j is defined as the sum of the strength values of all edges in the FG that are connected to that node, i.e.:
$$w_j = \sum_{m \in V_j} w(j, m)$$
wherein $V_j$ is the set of neighbor function nodes of function node j, and w(j, m) is the strength value on the edge (j, m). Thus, in Fig. 2 the weighted degree of function node main is $w_{main} = 9 + 1 = 10$, that of sub1 is $w_{sub1} = 1 + 1 + 1 = 3$, that of sub2 is $w_{sub2} = 1$, and that of add is $w_{add} = 9 + 1 = 10$.
(2.2) Calculating the degree of every function node in the FG obtained in step (1). The degree $k_j$ of function node j is defined as the number of edges in the FG that are connected to that node. Thus, in Fig. 2 the degree of function node main is $k_{main} = 1 + 1 = 2$, that of sub1 is $k_{sub1} = 1 + 1 + 1 = 3$, that of sub2 is $k_{sub2} = 1$, and that of add is $k_{add} = 1 + 1 = 2$.
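(2.3) Calculating the geometric mean degree of every function node in the FG obtained in step (1). The geometric mean degree $s_j$ of function node j is the integer closest to the arithmetic square root of the product of $w_j$ and $k_j$. It is calculated as:
$$s_j = \mathrm{round}\left(\sqrt{w_j \cdot k_j}\right)$$
where round(n) returns the integer closest to the value n. Thus, in Fig. 2 the geometric mean degree of function node main is $s_{main} = \mathrm{round}(\sqrt{10 \times 2}) = 4$, that of sub1 is $s_{sub1} = \mathrm{round}(\sqrt{3 \times 3}) = 3$, that of sub2 is $s_{sub2} = \mathrm{round}(\sqrt{1 \times 1}) = 1$, and that of add is $s_{add} = \mathrm{round}(\sqrt{10 \times 2}) = 4$.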
(2.4) Obtaining the g-cores (g = 1, 2, 3, …) of the FG obtained in step (1): function nodes whose geometric mean degree is smaller than g, together with their connected edges, are removed from the FG repeatedly until no such node remains; the resulting subgraph is the g-core of the FG. The g-kernel decomposition process of the FG in Fig. 2 is shown in Fig. 3.
Computing the 1-core (g = 1). From step (2.3), $s_{main} = 4 \ge 1$, $s_{sub1} = 3 \ge 1$, $s_{sub2} = 1 \ge 1$, and $s_{add} = 4 \ge 1$, so no node is removed. The resulting 1-core is shown in the second sub-graph from the left in Fig. 3.
Computing the 2-core (g = 2). Starting from the 1-core, because $s_{sub2} = 1 < 2$, node sub2 and its connected edge are removed from the 1-core. The resulting 2-core is shown in the third sub-graph from the left in Fig. 3.
Computing the 3-core (g = 3). Starting from the 2-core, the geometric mean degree of each node is recalculated within the 2-core (the third sub-graph from the left in Fig. 3). Because $s_{sub1} = \mathrm{round}(\sqrt{2 \times 2}) = 2 < 3$, node sub1 and its connected edges are removed from the 2-core. The resulting 3-core is shown in the fourth sub-graph from the left in Fig. 3.
Computing the 4-core (g = 4). Starting from the 3-core, the geometric mean degree of each node is recalculated within the 3-core (the fourth sub-graph from the left in Fig. 3). Because $s_{main} = s_{add} = \mathrm{round}(\sqrt{9 \times 1}) = 3 < 4$, the nodes main and add and their connected edge are removed from the 3-core. The resulting 4-core is empty, i.e. the FG of Fig. 2 has no 4-core.
(2.5) Calculating the g-core numbers of all function nodes in the FG obtained in step (1): the function nodes in the g-core and in the (g + 1)-core are compared, and if a function node exists in the g-core but has been removed from the (g + 1)-core, the g-core number of that function node is g. In Fig. 3, node sub2 is in the 1-core but not in the 2-core, so its g-core number is 1. Similarly, the g-core number of node sub1 is 2, and the g-core numbers of main and add are both 3.
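Sub-steps (2.4) and (2.5) amount to a peeling procedure, which the following Python sketch illustrates on the FG of Fig. 2. The function name `g_core_decomposition`, the adjacency-dictionary representation, and the constant `FIG2_EDGES` are assumptions made for illustration; the sketch also assumes the graph has no self-loops.

```python
import math

# Edge strengths of the FG in Fig. 2, as given in step (1).
FIG2_EDGES = {("main", "sub1"): 1, ("main", "add"): 9,
              ("sub1", "sub2"): 1, ("sub1", "add"): 1}

def g_core_decomposition(edges):
    """Return (g-core number per node, node list of each g-core)."""
    adj = {}
    for (a, b), strength in edges.items():
        adj.setdefault(a, {})[b] = strength
        adj.setdefault(b, {})[a] = strength

    def s(node):
        # Geometric mean degree, recalculated on the current subgraph.
        w = sum(adj[node].values())
        k = len(adj[node])
        return round(math.sqrt(w * k))

    core_number, cores = {}, {}
    g = 1
    while adj:
        # (2.4) repeatedly remove nodes whose geometric mean degree < g.
        while True:
            doomed = [n for n in adj if s(n) < g]
            if not doomed:
                break
            for n in doomed:
                for m in adj.pop(n):
                    del adj[m][n]
                # (2.5) n was in the (g-1)-core but is not in the g-core.
                core_number[n] = g - 1
        cores[g] = sorted(adj)
        g += 1
    return core_number, cores

core_number, cores = g_core_decomposition(FIG2_EDGES)
```

On this input the sketch yields g-core numbers of 1 for sub2, 2 for sub1, and 3 for main and add, with an empty 4-core, in agreement with the walk-through above. Nodes removed together in one pass are handled as a batch; for a measure that can only decrease as neighbors are removed, as here, the order of removal does not change the resulting cores.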
(3) Based on the g-core numbers of the function nodes obtained in step (2), the function nodes are sorted in descending order using the bubble sort algorithm, and the functions ranked in the top 15% (a value commonly adopted in the prior art) are taken as the identified key functions. From the result of step (2.5), the ranking is (main = add: 3) > (sub1: 2) > (sub2: 1). The number of key functions is 15% × 4 = 0.6, rounded up to 1, so the identified key function is main or add.
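Step (3) can be sketched in Python as follows. The function name `rank_key_functions` and its `percent` parameter are assumptions for illustration. The source leaves open how a tie at the cutoff is resolved ("main or add"); this sketch simply keeps whichever tied node comes first in the input, because the bubble sort used here is stable.

```python
def rank_key_functions(core_number, percent=15):
    """Sort nodes by g-core number (descending) and take the top share.

    Returns (ranked list of (name, g-core number), key function names).
    """
    ranked = list(core_number.items())
    n = len(ranked)
    # Bubble sort in descending order; swapping only on a strict
    # "less than" keeps tied nodes in their original relative order.
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if ranked[j][1] < ranked[j + 1][1]:
                ranked[j], ranked[j + 1] = ranked[j + 1], ranked[j]
    # Number of key functions: percent of n, rounded up, computed in
    # integer arithmetic to avoid floating-point edge cases.
    top = (percent * n + 99) // 100
    return ranked, [name for name, _ in ranked[:top]]

ranked, key_functions = rank_key_functions(
    {"main": 3, "sub1": 2, "sub2": 1, "add": 3})
```

With the g-core numbers from step (2.5), the ranked order is main, add, sub1, sub2 and a single key function is selected. An implementation that prefers to report every node tied at the cutoff could extend the selection to include add as well.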
The specific embodiments described herein merely illustrate the spirit of the invention. The tie between main and add in the g-core number is only one situation that can arise in practice and does not represent all situations. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute alternatives, without departing from the spirit of the invention or the scope defined in the appended claims.