CN107832080A

CN107832080A - Component importance measurement method based on node betweenness in a software evolution environment

Info

Publication number: CN107832080A
Application number: CN201710977888.8A
Authority: CN
Inventors: 成蕾; 林英; 李彤; 谢仲文; 莫启; 秦江龙; 王晓芳; 郑交交; 李响; 杨真谛; 郑明�
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2017-10-17
Filing date: 2017-10-17
Publication date: 2018-03-23

Abstract

The invention belongs to the technical field of software component importance measurement and discloses a component importance measurement method based on node betweenness in a software evolution environment. Taking the software architecture as a blueprint and support, it proposes a directed graph model of the software architecture and introduces node betweenness to measure the importance of components. The request dependency and service dependency of the components are analyzed, and Pearson correlation coefficients are used to find the factor most relevant to node betweenness. Experiments on the source code of a large number of open source software systems show that measuring component importance by node betweenness is effective and that the sum of a component's request dependency and service dependency is the factor most relevant to its node betweenness, which points to a further direction for measuring component importance by dependency relationships.

Description

Component importance measurement method based on node betweenness in software evolution environment

Technical Field

The invention belongs to the technical field of software component importance measurement, and particularly relates to a component importance measurement method based on node betweenness in a software evolution environment.

Background

Software systems have gradually developed into combined deliveries of services and components, and they are continuously adjusted and extended as society's needs develop. As a result their scale grows, and their structure comes to span multiple levels, granularities and integration modes. The term evolution is used to describe this continuous change, which is common in software systems: software evolution is the series of complex change activities through which a software system gradually changes until it reaches the desired form.

Software has two basic properties: construction and evolution. Software Architecture (SA) has matured as a blueprint that supports understanding of the overall software structure at the macro level. However, as software systems grow in functionality and scale, understanding and controlling software evolution becomes more complex and increasingly difficult. Traditional measurement methods have contributed significantly to software evolution and reveal certain of its characteristics. However, these methods share a weakness: they get caught up in the complicated details of the software structure too early and pay too little attention to the macro level, so it is difficult to grasp the software structure as a whole.

In the 1990s, Bohner described software changes using the concept of reachability matrices, based on a process framework for software change analysis, but gave no notion of how much each constituent element contributes to the software. Valverde et al. were the first to analyze object-oriented software systems in this way, abstracting the class diagrams of the systems into directed network graphs. Myers, Valverde, Moura et al. used directed networks to represent the structure of software systems and proposed a refactoring-based software model on that basis. Later, a group of domestic researchers, wangnao et al., used weighted networks to analyze the software networks of complex software systems, and others such as Kingshihui and Zhangukin carried out software structure analyses, obtaining a series of results.

In summary, the problems of the prior art are as follows: traditional component measurement methods all get caught up in the complicated details of the software structure too early, pay too little attention to the macro level, and make it difficult to grasp the software structure as a whole. To date there is no standard, generally agreed factor for measuring the importance of components in a complex software system. Moreover, because of the complexity of software architectures, structurally similar nodes often appear in current software architecture measurement, so the differences between software components and their importance cannot be rigorously reflected. The present technique provides a component importance measurement method that takes these aspects into account at a reasonable computational cost; it ranks the importance of the components in a software architecture by combining node dependency and node betweenness, and uses the Pearson correlation coefficient for verification.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a component importance measurement method based on node betweenness in a software evolution environment.

The present invention is implemented as follows:

A component importance measurement method based on node betweenness in a software evolution environment takes the software architecture as a blueprint and support, proposes an unweighted directed graph model of the software architecture, and introduces node betweenness to measure the importance of components; the request dependency, service dependency and total dependency of the components are analyzed, and Pearson correlation coefficients are used to find the factor most relevant to node betweenness.

Further, the directed graph model of the software architecture, which takes the software architecture as a blueprint and support, is defined as follows:

1) The model G of the SA of a software system is an unweighted directed graph, given by the triple <N_G, V(G), E(G)>, where:

N_G is the name of the SA model of the software system;

V(G) is the set of nodes, each representing a component of the software system;

E(G) is the set of unweighted directed edges, each representing a relationship between components of the software system;

2) A component v represented by a node is a pair <N_c, F_c>, where:

N_c is the name of the component;

F_c is the functional description of the component;

3) An interaction between components is an unweighted directed edge e, given by the triple <E_n, v_i, v_j>, where:

E_n is the unique identifier of the directed edge;

v_i is the component that initiates the dependency, i.e. the start node;

v_j is the component that accepts the dependency, i.e. the end node;

<v_i, v_j> denotes that node v_i points to node v_j;

4) In the SA model G = <N_G, V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges that have v_i as their start node is the request dependency of v_i, denoted d_req(v_i);

5) In the SA model G = <V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges that have v_i as their end node is the service dependency of v_i, denoted d_ser(v_i);

The sum of the request dependency and the service dependency of a component is the total dependency of the component, denoted d_sum(v_i);

6) Given a graph G = <V(G), E(G)> and a node v_i ∈ V(G), the ratio of the number of shortest paths in G that pass through v_i to the number of all shortest paths in G is the node betweenness of v_i, denoted C(v_i) and calculated by formula (1), in which δ_st is the total number of shortest paths from node s to node t, and δ_st(v) is the number of those shortest paths that pass through node v.

Further, the method for measuring the importance of a component by introducing node betweenness comprises the following steps:

acquiring the components and the associations between them: each class in the source code is taken as one component, and the source code is scanned to obtain the component name identifiers and the relationships between components;

processing the relationship data between components and mapping it into an adjacency matrix: if the number of components is n, the relationships between components are mapped into an n × n adjacency matrix M, whose diagonal entries M_11, M_22, ..., M_nn default to 0, except for components that call themselves;

calculating the request dependency, the service dependency and the total dependency of the component at each node on the basis of the adjacency matrix M;

calculating the node betweenness of each node: the shortest paths of the whole graph are calculated to obtain the total number of shortest paths in the graph and the number of shortest paths passing through each node, and the node betweenness of each component is then calculated according to formula (1); the important components in the SA are identified according to the magnitude of the node betweenness;

calculating the Pearson correlation coefficients between the node betweenness and, respectively, the request dependency, the service dependency and the total dependency of the components, each according to formula (2), and analyzing which factor is most relevant to node betweenness;

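the calculation formula of the Pearson correlation coefficient used is:

$$P(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (2)$$

where X and Y are the two equal-length vectors whose correlation is to be computed, n is their common length, $\bar{X}$ and $\bar{Y}$ are the means of X and Y respectively, and swapping X and Y does not change the value of the Pearson correlation coefficient.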

Further, the method for processing the relationship data between components and mapping it into the adjacency matrix comprises the following:

input: the component name identifier linked list Name and the inter-component interaction linked list Connection;

output: the adjacency matrix Matrix of the SA model;

initialization: the two-dimensional array Matrix stores the adjacency matrix of the SA model, and its numbers of rows and columns both equal the length of the linked list Name;

all diagonal entries M_11, M_22, ..., M_nn are initialized to 0, except for components that call themselves;

starting from the integer pointer i = 0, loop:

assign to the variable row the index, within the linked list Name, of the start component of the i-th element Connection[i] of the linked list Connection;

assign to the variable column the index, within the linked list Name, of the end component of Connection[i];

set Matrix[row][column] to 1;

set i to i + 1; if i is smaller than the length of the linked list Connection, the loop continues; otherwise the loop terminates and the adjacency matrix is obtained;

after the adjacency matrix of the SA model is obtained, the request dependency and the service dependency of each component are calculated.
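The loop described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes Name is a plain list of strings and Connection a list of (start, end) name pairs, and the function name `build_adjacency_matrix` is hypothetical.

```python
def build_adjacency_matrix(names, connections):
    """Map component relationships onto an n x n 0/1 adjacency matrix.

    names: list of component name identifiers (the Name list).
    connections: list of (start, end) name pairs (the Connection list).
    Both representations are assumptions made for this sketch.
    """
    index = {name: i for i, name in enumerate(names)}  # name -> row/column index
    n = len(names)
    # Every entry starts at 0; a diagonal entry becomes 1 only if a
    # self-call such as ("A", "A") is listed in connections.
    matrix = [[0] * n for _ in range(n)]
    for start, end in connections:
        row = index[start]    # component that initiates the dependency
        column = index[end]   # component that accepts the dependency
        matrix[row][column] = 1
    return matrix


names = ["A", "B", "C"]
connections = [("A", "B"), ("B", "C"), ("A", "C")]
matrix = build_adjacency_matrix(names, connections)
```

A dictionary lookup replaces the repeated search through Name for each index; the resulting matrix is the same.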

Further, calculating the shortest paths of the whole graph comprises:

input: the adjacency matrix Matrix of the SA model;

output: the full-graph shortest paths Pathes of the SA model;

initialization: the integers i and j serve as indices of the elements Matrix[i][j] of the adjacency matrix, where i and j denote different nodes; every pair of nodes is traversed, the shortest paths between them are found from the contents of the adjacency matrix, and each shortest path found is stored into the full-graph shortest paths Pathes.
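The text does not name the search technique used to find the shortest paths. One possible sketch, assuming a breadth-first search from every node (a common choice for unweighted graphs) and hypothetical function names, collects every shortest path between every ordered pair of distinct nodes:

```python
from collections import deque


def all_shortest_paths(matrix):
    """Return every shortest path between every ordered pair of distinct nodes.

    Paths are lists of node indices. Breadth-first search is an assumption
    of this sketch; the source only says the paths are found from the matrix.
    """
    n = len(matrix)
    pathes = []
    for source in range(n):
        dist = {source: 0}
        preds = {source: []}   # predecessors of each node on shortest paths
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if not matrix[u][v] or v == u:
                    continue
                if v not in dist:               # first time v is reached
                    dist[v] = dist[u] + 1
                    preds[v] = [u]
                    queue.append(v)
                elif dist[v] == dist[u] + 1:    # another equally short route
                    preds[v].append(u)
        for target in dist:
            if target != source:
                pathes.extend(_expand(preds, source, target))
    return pathes


def _expand(preds, source, target):
    """Rebuild all shortest paths source -> target from the predecessor lists."""
    if target == source:
        return [[source]]
    return [path + [target]
            for p in preds[target]
            for path in _expand(preds, source, p)]


# Diamond graph: 0 -> 1 -> 3 and 0 -> 2 -> 3.
diamond = [[0, 1, 1, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 1],
           [0, 0, 0, 0]]
pathes = all_shortest_paths(diamond)
```

Keeping all tied shortest paths, rather than one per pair, matters here because the betweenness ratio counts paths, not node pairs.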

Further, calculating the node betweenness of each node comprises:

input: the full-graph shortest paths Pathes of the SA model and the component name identifier linked list Name;

output: the node betweenness of each component of the SA model;

initialization: the linked list Betweenness stores the node betweenness of the SA model, and its length equals the length of the component name identifier linked list Name;

with the integers k = 0 and i = 0, the linked list Name is traversed from its first element; for each node Name[i], k is reset to 0 and then increased by 1 for every path in the full-graph shortest paths Pathes that contains Name[i], and after the traversal of Pathes is finished, the node betweenness Betweenness[i] of node Name[i] is k divided by the length of the linked list Pathes.
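The counting procedure above can be sketched as follows, assuming each entry of Pathes is a list of component names; the function name `node_betweenness` is hypothetical. As described, a path counts toward a node whenever the node appears on it, endpoints included, which differs from the textbook betweenness centrality that excludes the endpoints.

```python
def node_betweenness(pathes, names):
    """Betweenness[i] = (number of shortest paths containing Name[i]) / len(Pathes)."""
    total = len(pathes)
    betweenness = []
    for name in names:
        # k counts the shortest paths on which this node appears.
        k = sum(1 for path in pathes if name in path)
        betweenness.append(k / total if total else 0.0)
    return betweenness


names = ["A", "B", "C"]
pathes = [["A", "B"], ["B", "C"], ["A", "B", "C"]]
scores = node_betweenness(pathes, names)  # B lies on every path
```

Sorting the components by these scores in descending order gives the importance ranking that the method uses.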

The invention also aims to provide a component importance measurement system based on node betweenness in a software evolution environment.

The invention has the following advantages and positive effects: it takes the software architecture as a blueprint and support, proposes a directed graph model of the software architecture, and introduces node betweenness to measure the importance of components. The request dependency and service dependency of components are analyzed, and Pearson correlation coefficients are used to find the factor most relevant to node betweenness. Experiments on the source code of a large number of open source software systems show that measuring component importance by node betweenness is effective, and that the total dependency of a component (the sum of its request dependency and service dependency) is the factor most relevant to its node betweenness, which points to a further direction for measuring component importance by dependency relationships.

According to experimental statistics, compared with traditional software evolution measurement, which gets trapped in complex details too early and neglects the macro structure, measuring the importance of the components of a complex software system in this way saves on average 12% of the time spent on unnecessary component observation and evaluation. In addition, measuring component importance by combining node dependency with node betweenness is more accurate than using node betweenness alone and distinguishes structurally similar nodes more precisely, eliminating on average about 9% of similar nodes.

The technique is found to be effective in software systems with a good software architecture: regardless of the scale and functional type of the software, the trends of the total dependency of the components and of the node betweenness always agree. There is no obvious correlation between the request dependency and the service dependency of a component; that is, the two show no regular correspondence, and a component with high request dependency may have either high or low service dependency. The total dependency and the node betweenness of the components fluctuate in essentially the same way, which means that components with high total dependency also have high node betweenness. The higher the node betweenness of a component, the more important its function and position in the software architecture.

By calculating the total dependency and the node betweenness of the components, the importance of each component in the whole software architecture can be measured clearly, so that the evolution of important nodes can be better controlled when the software architecture evolves, the evolution risk is reduced, and the activities and components that are hard to control during evolution are easier to monitor and manage. The request dependency and the service dependency of components in a software architecture show no regular correlation, whereas the total dependency and the node betweenness of components usually show a strong or extremely strong positive correlation; that is, components with high total dependency also have high node betweenness.

Drawings

Fig. 1 is a flowchart of a component importance measurement method based on node betweenness in a software evolution environment according to an embodiment of the present invention.

Fig. 2 is a model relationship diagram of an SA provided in an embodiment of the present invention.

Fig. 3 shows line graphs of the request dependency, service dependency, total dependency and node betweenness of the components of eclipse3.0 provided by the embodiment of the present invention.

In the figure: (a) distribution curves of the request dependency and service dependency of the components; (b) percentage line graph of the total dependency and node betweenness of the components.

Fig. 4 shows line graphs of the request dependency, service dependency, total dependency and node betweenness of the components of Jabref provided by the embodiment of the present invention.

In the figure: A. distribution curves of the request dependency and service dependency of the components; B. percentage line graph of the total dependency and node betweenness of the components.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The application of the principles of the present invention will now be further described with reference to the accompanying drawings.

As shown in Fig. 1, the component importance measurement method based on node betweenness in a software evolution environment provided by the embodiment of the present invention includes the following steps:

S101: taking the software architecture as a blueprint and support, proposing an unweighted directed graph model of the software architecture, and introducing node betweenness to measure the importance of components;

S102: analyzing the request dependency, service dependency and total dependency of the components, and using Pearson correlation coefficients to find the factors most relevant to node betweenness.

The invention is further described below with reference to a specific analysis.

The model of the software architecture is:

Since there is no universally accepted definition of SA, the present invention adopts a relatively popular and simple one: SA is a high-level abstraction of the components and connections that make up a system, with the interactions between components treated as connections.

A component implements specific functions needed in the system, conforms to a set of interface standards and implements a group of interfaces. It appears in the system as a data or computing unit that carries certain functions, or as a reusable software module oriented to the software system architecture, and it is a replaceable part that actually exists in the system.

In the present invention, a component is regarded as an opaque whole, regardless of its internal structure. When a software system instance is viewed as an SA, the interactions and dependencies between the components in the SA are directional and unweighted, so the model of the SA can be defined as follows:

Definition 1 (software architecture model): the model G of the SA of a software system instance is an unweighted directed graph, given by the triple <N_G, V(G), E(G)>:

(1) N_G is the name of the SA model of the software system instance;

(2) V(G) is the set of nodes, each representing a component of the software system;

(3) E(G) is the set of unweighted directed edges, each representing a relationship between components of the software system.

Definition 2 (component): a component v represented by a node is a pair <N_c, F_c>:

(1) N_c is the name of the component;

(2) F_c is the functional description of the component.

Definition 3 (inter-component association): an interaction between components is an unweighted directed edge e, given by the triple <E_n, v_i, v_j>:

(1) E_n is the unique identifier of the directed edge;

(2) v_i is the component that initiates the dependency, i.e. the start node;

(3) v_j is the component that accepts the dependency, i.e. the end node;

(4) <v_i, v_j> denotes that node v_i points to node v_j.
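Definitions 1 to 3 map naturally onto a small set of record types. The sketch below is one possible encoding, assuming Python dataclasses; the class and field names (`Component`, `Edge`, `SAModel`, `edge_id`, and so on) are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """Definition 2: a component <N_c, F_c>."""
    name: str            # N_c, the component name
    function: str = ""   # F_c, the functional description


@dataclass(frozen=True)
class Edge:
    """Definition 3: an unweighted directed edge <E_n, v_i, v_j>."""
    edge_id: str   # E_n, unique identifier of the edge
    source: str    # v_i, component that initiates the dependency
    target: str    # v_j, component that accepts the dependency


@dataclass
class SAModel:
    """Definition 1: the SA model <N_G, V(G), E(G)>."""
    name: str                                        # N_G
    components: dict = field(default_factory=dict)   # V(G): name -> Component
    edges: dict = field(default_factory=dict)        # E(G): edge_id -> Edge

    def add_component(self, component):
        self.components[component.name] = component

    def add_edge(self, edge):
        # Both endpoints must already be components of the model.
        if edge.source not in self.components or edge.target not in self.components:
            raise KeyError("edge endpoints must be components of the model")
        self.edges[edge.edge_id] = edge


model = SAModel("demo")
model.add_component(Component("Parser", "parses input"))
model.add_component(Component("Lexer", "splits input into tokens"))
model.add_edge(Edge("e1", "Parser", "Lexer"))  # Parser depends on Lexer
```

Components are kept opaque, as the text requires: only the name and a description are stored, with no internal structure.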

Definition 4 (request dependency of a component): in the SA model G = <N_G, V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges that have v_i as their start node is called the request dependency of v_i, denoted d_req(v_i).

The request dependency of a component describes the extent to which the component depends on other modules. The higher the request dependency of a component, the more components it directly depends on, the more complex its behavior, and the higher its level in the component hierarchy.

Definition 5 (service dependency of a component): in the SA model G = <V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges that have v_i as their end node is called the service dependency of v_i, denoted d_ser(v_i).

The service dependency of a component characterizes the extent to which other modules in the SA directly depend on it. The higher the service dependency of a component, the more strongly it is directly depended upon and the higher its reuse rate in the SA, which means that its behavior and function are more fixed.

The sum of the request dependency and the service dependency of a component is called the total dependency of the component, denoted d_sum(v_i).

Definition 6 (node betweenness): given a graph G = <V(G), E(G)> and a node v_i ∈ V(G), the ratio of the number of shortest paths in G that pass through v_i to the number of all shortest paths in G is called the node betweenness of v_i, denoted C(v_i) and calculated by formula (1), in which δ_st is the total number of shortest paths from node s to node t, and δ_st(v) is the number of those shortest paths that pass through node v.

Node betweenness is an important global geometric quantity that reflects the role and influence of a node in the whole graph. Once the SA model is abstracted into a directed graph and node betweenness is introduced into SA evolution, the position and influence of the node corresponding to each component within the whole SA can be observed directly. Node betweenness is therefore an important index of how critical a component is, and where it sits, during SA evolution, and it offers direct guidance and strong practical value for controlling the scope and strength of a component's influence before and after the SA evolves.

In the experiments of the invention, classes are modeled as components and class relationships as the unweighted directed edges of the SA model; the correspondence is shown in Fig. 2.

The method for measuring the importance of components in an evolution environment provided by the embodiment of the invention is as follows:

Evolution is an inevitable activity of every software system. As the overall structure of a system grows more complex and its number of components increases, finding the important components in the structure provides a basis for monitoring and controlling software evolution and is an important part of understanding and evaluating the evolution, so it is particularly important for software evolution work.

The importance of a component is measured in 5 steps:

(1) Obtain the components and the associations between them. Each class in the source code is taken as a component, and the source code is scanned to obtain the component name identifiers and the relationships between components.

(2) Process the relationship data between components and map it into an adjacency matrix. If the number of components is n, the relationships between components are mapped into an n × n adjacency matrix M, whose diagonal entries M_11, M_22, ..., M_nn default to 0, except for components that call themselves. For example, if component 1 depends on component 2 and component 2 does not depend on component 1, then M_12 = 1 and M_21 = 0.

(3) Calculate the request dependency, the service dependency and the total dependency of the component at each node on the basis of the adjacency matrix M.

(4) Calculate the node betweenness of each node. The shortest paths of the whole graph are calculated; once the total number of shortest paths in the graph and the number of shortest paths passing through each node are obtained, the node betweenness of each component is calculated according to formula (1). The important components in the SA are identified according to the magnitude of the node betweenness.

(5) Calculate the Pearson correlation coefficients between the node betweenness and, respectively, the request dependency, the service dependency and the total dependency of the components, each according to formula (2), and analyze which factor is most relevant to node betweenness.
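Step (1) does not say how the source code is scanned. A deliberately naive sketch is shown below, assuming Java-like sources with one class per file and treating any mention of another class name as a dependency; the function `scan_sources` and the regular expression are hypothetical and would miss many real-world cases (nested classes, imports, comments).

```python
import re

# Matches a class declaration and captures the class name.
CLASS_DECL = re.compile(r"\bclass\s+([A-Za-z_]\w*)")


def scan_sources(sources):
    """Return (names, connections) from a mapping of file name -> source text.

    Assumes one top-level class per file, for illustration only.
    """
    owner = {}  # class name -> source text of the file declaring it
    for text in sources.values():
        match = CLASS_DECL.search(text)
        if match:
            owner[match.group(1)] = text
    names = sorted(owner)
    connections = []
    for name in names:
        for other in names:
            # A whole-word mention of another class is read as a dependency.
            if other != name and re.search(r"\b%s\b" % re.escape(other), owner[name]):
                connections.append((name, other))  # name depends on other
    return names, connections


sources = {
    "Order.java": "class Order { Customer buyer; }",
    "Customer.java": "class Customer { String name; }",
}
names, connections = scan_sources(sources)
```

The two returned lists play the roles of the Name and Connection lists consumed by the adjacency matrix step.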

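The calculation formula of the Pearson correlation coefficient used in the present invention is:

$$P(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (2)$$

where X and Y are the two equal-length vectors whose correlation is to be computed, n is their common length, $\bar{X}$ and $\bar{Y}$ are the means of X and Y respectively, and swapping X and Y does not change the value of the Pearson correlation coefficient.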

The Pearson correlation coefficient is interpreted as follows: an absolute value in [0.8, 1.0] indicates extremely strong correlation; [0.6, 0.8], strong correlation; [0.4, 0.6], moderate correlation; [0.2, 0.4], weak correlation; and [0, 0.2], very weak or no correlation.

The Pearson correlation coefficient is used to calculate the correlation between the node betweenness and the request dependency, service dependency and total dependency of the components. In the experiments, the observed values of node betweenness are paired with those of request dependency, service dependency and total dependency; the pairs are independent of one another, and the standard deviation of each variable is nonzero, so the Pearson correlation coefficient is well defined.
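As a minimal sketch, the standard Pearson coefficient and the interpretation bands above can be written in plain Python as follows. The function names are hypothetical, and because the stated intervals share their endpoints, assigning a boundary value such as 0.6 to the stronger band is an assumption of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Matches the requirement that the standard deviation be nonzero.
        raise ValueError("undefined when either vector has zero standard deviation")
    return cov / math.sqrt(var_x * var_y)


def strength(r):
    """Map a coefficient to the bands given in the text (boundaries: assumption)."""
    a = abs(r)
    if a >= 0.8:
        return "extremely strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak or none"


r = pearson([1, 2, 3], [2, 4, 6])  # perfectly linear, increasing
```

Calling `pearson(d_sum, c)` with the per-component total dependency and betweenness vectors, then `strength` on the result, reproduces the kind of judgment the method makes.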

Algorithm

Algorithm 1: obtaining the adjacency matrix of the SA model.

Input: the component name identifier linked list Name and the inter-component interaction linked list Connection.

Output: the adjacency matrix Matrix of the SA model.

Initialization: the two-dimensional array Matrix stores the adjacency matrix of the SA model, and its numbers of rows and columns both equal the length of the linked list Name.

For example, part of the matrix obtained in experiment one with eclipse3.0 is as follows:

Once the adjacency matrix of the SA model is obtained, the request dependency and the service dependency of each component can be calculated. To calculate the request dependency of a component, the 1s in row Matrix[i][] are counted, and the accumulated total is the request dependency of the component corresponding to node v_i; similarly, to calculate the service dependency of a component, the 1s in column Matrix[][j] are counted, and the accumulated total is the service dependency of the component corresponding to node v_j. The request dependency and the service dependency of each component are summed to obtain the total dependency of the node.
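Because Matrix[row][column] = 1 records that the row component depends on the column component, request dependency is a row sum and service dependency a column sum. A short sketch, with the hypothetical function name `dependencies`:

```python
def dependencies(matrix):
    """Return (d_req, d_ser, d_sum) as lists indexed by node."""
    n = len(matrix)
    # Row i holds the edges that start at node i: request dependency.
    d_req = [sum(matrix[i]) for i in range(n)]
    # Column j holds the edges that end at node j: service dependency.
    d_ser = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Total dependency is the sum of the two.
    d_sum = [req + ser for req, ser in zip(d_req, d_ser)]
    return d_req, d_ser, d_sum


# A -> B, A -> C, B -> C
example = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0]]
d_req, d_ser, d_sum = dependencies(example)
```

In graph terms these are the out-degree, in-degree and total degree of each node.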

Algorithm 2: full-graph shortest path algorithm for the SA directed graph model.

Input: the adjacency matrix Matrix of the SA model.

Output: the full-graph shortest paths Pathes of the SA model.

Initialization: the linked list Pathes stores all shortest paths in the graph.

Algorithm 3: node betweenness calculation algorithm for the SA model.

Input: the full-graph shortest paths Pathes of the SA model and the component name identifier linked list Name.

Output: the node betweenness of each component of the SA model.

Initialization: the linked list Betweenness stores the node betweenness of the SA model, and its length equals the length of the component name identifier linked list Name.

The invention is further described below with reference to specific experiments.

1. Experimental analysis

Nearly one hundred open source software systems were selected for the invention, covering a variety of functions such as software development platforms, programming language source packages and open source professional software. They fall into three classes by number of nodes: small-scale software with fewer than 50 nodes, medium-scale software with 50 to 200 nodes, and large-scale software with more than 200 nodes.

The results show that the method is practical and effective: the trends of the total dependency of the components and of the node betweenness always agree, regardless of software scale and functional category. How node betweenness manifests depends on the structural design of the software. For well-designed software with a good architecture the experimental results are the most satisfactory: the total dependency and node betweenness of the components change in step, and the differences in node betweenness between components are more pronounced. Conversely, in a software system without good architectural support, the node betweenness values of the components are nearly identical, which makes the components hard to tell apart.

Owing to space limitations, two typical examples are selected for analysis: eclipse3.0, a large-scale software system, and the source code of Jabref, a medium-scale one.

1.1 experiment one

The source code of eclipse3.0 was used as experimental data.

FIG. 3 is a percentage line graph of request dependencies and service dependencies of components, total dependencies and node betweenness of components of eclipse3.0, wherein the range change is small because the node betweenness takes values between [ -1,1], and the value of the total dependencies is much larger than that of the node betweenness, so that the relationship and trend of the total dependencies and the node betweenness can be more clearly seen, and the percentage line graph is adopted for analysis. In the figure, the Y-axis represents the size of the request dependency of the component and the service dependency of the component, and the X-axis represents the node. Wherein, (a) a distribution curve of request dependencies, service dependencies of the component; (b) percentage line graph of total dependency and node betweenness.

The line graph shows no obvious correlation between the request dependency and the service dependency of a component: there is no regular correspondence between the two, and a component with high request dependency may have either high or low service dependency. By contrast, the fluctuations of the total dependency and of the node betweenness are basically consistent, which means that components with high total dependency also have high node betweenness. The higher the node betweenness of a component, the more important its function and position in the software architecture.

Formula (2) is used to calculate the correlation between degree and node betweenness; the Pearson correlation coefficient is used to judge whether the correlation suggested by the line graph actually exists.
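The correlation check described above can be sketched in a few lines of Python. This is a minimal sketch of the standard Pearson correlation coefficient, not the code used in the experiments; the function name `pearson` and the sample values are hypothetical and serve only as illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (used in the denominator).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-component values, NOT taken from the experiments.
d_sum = [2, 5, 1, 7, 3]
betweenness = [0.10, 0.35, 0.00, 0.50, 0.20]
print(round(pearson(d_sum, betweenness), 4))
```

Swapping the two arguments gives the same value, which matches the statement that the order of X and Y does not affect the result.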

TABLE 1 eclipse3.0 correlation analysis

P(X,Y)	Pearson correlation coefficient value
P(d_ser, d_req)	-0.00283943
P(d_ser, C)	0.49037850
P(d_req, C)	0.51885469
P(d_sum, C)	0.69526782

The Pearson correlation coefficients confirm the trend shown by the line graph. The total dependency of a component is strongly positively correlated with node betweenness: the larger the total dependency, the larger the node betweenness. The service dependency and the request dependency are each moderately positively correlated with node betweenness, while the request dependency and the service dependency show an extremely weak negative correlation with each other, or none at all.

1.2 Experiment two

The source code of Jabref was used as experimental data.

FIG. 4 shows, for Jabref, the distribution of the request dependencies and service dependencies of the components, together with a percentage line graph of the total dependency and node betweenness. In the figure, the Y-axis represents the size of the request dependency and the service dependency of a component, and the X-axis represents the nodes. Panel A shows the distribution of the request dependency and service dependency of the components; panel B shows the percentage line graph of the total dependency and node betweenness of the components.

In the line graph of Jabref, the request dependency and the service dependency of the components show no regular correlation, whereas the fluctuations of the total dependency and of the node betweenness almost overlap. This indicates that in Jabref, too, components with high total dependency have high node betweenness.

TABLE 2 Jabref correlation analysis

P(X,Y)	Pearson correlation coefficient value
P(d_ser, d_req)	0.11752896
P(d_ser, C)	0.62832354
P(d_req, C)	0.60910461
P(d_sum, C)	0.80746250

The correlation analysis of Jabref again confirms the trend shown by the line graph. The request dependency and the service dependency of a component show an extremely weak positive correlation with each other, or none at all; each of them is positively correlated with node betweenness; and the total dependency is extremely strongly positively correlated with node betweenness, so node betweenness increases as the total dependency increases.
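The verbal labels used for the two tables (extremely weak, moderate, strong, extremely strong) are consistent with a common rule of thumb that splits the absolute value of the coefficient into bands of width 0.2. The text does not state its thresholds, so the bands below are an assumption, and the function name `correlation_strength` is hypothetical. The coefficient values are copied from Table 1 and Table 2.

```python
def correlation_strength(r):
    """Map a Pearson coefficient to a verbal label (assumed thresholds)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient lies in [-1, 1]")
    magnitude = abs(r)
    # Assumed bands; the source text does not give explicit cut-offs.
    if magnitude >= 0.8:
        return "extremely strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "extremely weak or none"


# Coefficients as reported in Table 1 (eclipse3.0) and Table 2 (Jabref).
table_1 = {
    "P(d_ser, d_req)": -0.00283943,
    "P(d_ser, C)": 0.49037850,
    "P(d_req, C)": 0.51885469,
    "P(d_sum, C)": 0.69526782,
}
table_2 = {
    "P(d_ser, d_req)": 0.11752896,
    "P(d_ser, C)": 0.62832354,
    "P(d_req, C)": 0.60910461,
    "P(d_sum, C)": 0.80746250,
}

for title, table in (("eclipse3.0", table_1), ("Jabref", table_2)):
    for pair, r in table.items():
        print(title, pair, correlation_strength(r))
```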

The Pearson correlation coefficients between the service dependency and the request dependency of the components of eclipse3.0 and Jabref show that the two quantities are distributed irregularly and that there is no correlation between them.

According to the Pearson correlation analysis, the total dependency and the node betweenness of a component in the software architecture show a strong to extremely strong positive correlation (0.695 for eclipse3.0 and 0.807 for Jabref), whereas the correlation of service dependency with node betweenness and the correlation of request dependency with node betweenness are less stable from one system to another. From the definition of node betweenness, the node betweenness of a component is directly related to its service dependency, request dependency and total dependency. The larger the total dependency of a component, the higher its position and importance in the whole software architecture, and the higher the corresponding node betweenness; conversely, the lower the total dependency of a component, down to none at all, the lower its position and importance in the whole software architecture, and the lower its node betweenness.

The total dependency of a component is the factor most closely related to changes in node betweenness, and it is a key factor when the position and importance of a component in the whole software architecture are judged through node betweenness.

The experiments show that measuring the importance of components in a software evolution environment with node betweenness is effective, and that it overcomes the shortcomings of traditional measurement methods in capturing the macroscopic characteristics of the software architecture.

Across the software architecture, components with high request dependency tend to have poor independence: they depend heavily on lower-level or other basic components, are highly coupled, and have complex functions. Components with high service dependency generally have high cohesion, a stable structure, and a single function.

By calculating the total dependency and the node betweenness of the components, the importance of each component in the whole software architecture can be measured clearly. When the software architecture evolves, the evolution of the important nodes can then be tracked more closely, which reduces evolution risk and makes it easier to monitor and manage the activities and components that are hard to control during evolution. The request dependency and the service dependency of the components in a software architecture have no regular correlation, whereas the total dependency and the node betweenness usually show a strong or extremely strong positive correlation; that is, components with high total dependency also have high node betweenness.

The strong to extremely strong positive correlation between total dependency and node betweenness points to another direction of analysis for measuring component importance. Particularly for a very large software architecture, where the large number of nodes in the source code must be reduced in dimension during calculation, the total dependency of a component, which can be computed more quickly, is the focus of the next stage of analysis.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A component importance measurement method based on node betweenness in a software evolution environment, characterized in that the method takes the software architecture as a blueprint and support, provides an unweighted directed graph model of the software architecture, and introduces node betweenness to measure the importance of a component; and the request dependency, the service dependency and the total dependency of the components are analyzed using Pearson correlation coefficients to find the factor most relevant to node betweenness.

2. The method for measuring the importance of a component based on node betweenness in a software evolution environment according to claim 1, wherein the step of providing the directed graph model of the software architecture by taking the software architecture as a blueprint and a support comprises the following steps:

N_G is the name of the SA model of the software system;

2) The component V represented by a node is a two-tuple <nc, fc>:

nc is the name of the component;

fc is a functional description of the component;

E_n is the unique identifier of the directed edge;

V_i is the component that initiates the dependency and is the starting node;

V_j is the component that accepts the dependency and is the termination node;

<V_i, V_j> indicates that node V_i points to node V_j;

4) In the SA model G = <N_G, V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges with v_i as the starting node is the request dependency of component v_i, denoted d_req(v_i);

5) In the SA model G = <V(G), E(G)>, for a component v_i ∈ V(G), the total number of edges with v_i as the termination node is the service dependency of component v_i, denoted d_ser(v_i);

The sum of the request dependency and the service dependency of a component is the total dependency of the component, denoted d_sum(v_i).
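The three dependency measures defined in this claim correspond to the out-degree, in-degree and total degree of a node in the directed graph. A minimal sketch, assuming the edges are available as (starting node, termination node) pairs; the function name `dependencies` and the sample graph are hypothetical.

```python
from collections import Counter


def dependencies(components, edges):
    """Return {component: (d_req, d_ser, d_sum)} for a directed edge list.

    d_req counts edges that start at the component (out-degree),
    d_ser counts edges that end at the component (in-degree),
    d_sum is their sum.
    """
    d_req = Counter(start for start, _ in edges)
    d_ser = Counter(end for _, end in edges)
    return {v: (d_req[v], d_ser[v], d_req[v] + d_ser[v]) for v in components}


# Hypothetical four-component architecture, for illustration only.
components = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]

for name, (req, ser, total) in dependencies(components, edges).items():
    print(name, "d_req =", req, "d_ser =", ser, "d_sum =", total)
```

A component with no edges simply gets (0, 0, 0), which matches the case of a component with no dependency at all.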

3. The method for measuring the importance of a component based on node betweenness in a software evolution environment as claimed in claim 1, wherein said method for introducing node betweenness to measure the importance of a component comprises:

the Pearson correlation coefficient is calculated as:

$$P(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^{2}}}$$

where X and Y are two vectors of equal length n whose correlation is to be computed, X_i and Y_i are their i-th elements, \bar{X} and \bar{Y} are the means of X and Y respectively, and the order of X and Y does not affect the value of the Pearson correlation coefficient.

4. The method for measuring the importance of a component based on node betweenness in a software evolution environment as claimed in claim 3, wherein processing the relationship data between components and mapping it into an adjacency matrix comprises the following steps:

Input: the component name identification linked list Name and the inter-component interaction relationship linked list Connection;

Output: the adjacency matrix Matrix of the SA model;

starting with the integer pointer i = 0, loop:

set Matrix[row][column] to 1;

set the pointer i to i + 1; if this value is smaller than the length of the linked list Connection, the loop continues; otherwise the loop terminates and the adjacency matrix is obtained.
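The loop in this claim can be sketched as follows. The text here does not state how row and column are obtained, so the sketch assumes that each entry of Connection is an (initiating component, accepting component) pair, that row is the position of the initiator in Name, and that column is the position of the acceptor. Python lists stand in for the linked lists, and the function name `build_adjacency_matrix` is hypothetical.

```python
def build_adjacency_matrix(name, connection):
    """Map component relations to a 0/1 adjacency matrix.

    name:       list of component names (stands in for the Name linked list)
    connection: list of (initiator, acceptor) name pairs (assumed layout of
                the Connection linked list)
    """
    index = {component: k for k, component in enumerate(name)}
    matrix = [[0] * len(name) for _ in name]
    i = 0
    while i < len(connection):
        initiator, acceptor = connection[i]
        row = index[initiator]      # assumed: row = component that requests
        column = index[acceptor]    # assumed: column = component that serves
        matrix[row][column] = 1
        i += 1
    return matrix


# Hypothetical input, for illustration only.
name = ["A", "B", "C"]
connection = [("A", "B"), ("B", "C"), ("A", "C")]
for row in build_adjacency_matrix(name, connection):
    print(row)
```

Because the graph model is unweighted, repeated relations between the same pair of components still yield a single 1 in the matrix.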

5. The method for measuring the importance of a component based on node betweenness in a software evolution environment as claimed in claim 3, wherein the calculating the shortest path of the whole graph comprises:

Input: the adjacency matrix Matrix of the SA model;

Output: the full-graph shortest paths Pathes of the SA model;

Initialization: integers i and j are used as indices of each element Matrix[i][j] of the adjacency matrix, where i and j denote different nodes; every pair of nodes is traversed, the shortest paths between all pairs of nodes are found from the contents of the adjacency matrix, and each shortest path found is stored in the full-graph shortest paths Pathes.
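The claim does not name a particular shortest-path algorithm. Since the graph is unweighted, one possible sketch runs a breadth-first search from every node. It keeps all shortest paths between a pair of nodes rather than a single one, which is an assumption made here because betweenness is usually computed from the share of shortest paths passing through a node. The names `all_shortest_paths` and `_expand` are hypothetical.

```python
from collections import deque


def all_shortest_paths(matrix):
    """Return {(source, target): [path, ...]} for every reachable ordered pair.

    matrix is a 0/1 adjacency matrix; paths are lists of node indices.
    """
    n = len(matrix)
    pathes = {}
    for source in range(n):
        dist = {source: 0}     # BFS distance from source
        preds = {source: []}   # predecessors lying on some shortest path
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if matrix[u][v] != 1:
                    continue
                if v not in dist:                 # first time v is reached
                    dist[v] = dist[u] + 1
                    preds[v] = [u]
                    queue.append(v)
                elif dist[v] == dist[u] + 1:      # another equally short route
                    preds[v].append(u)
        for target in dist:
            if target != source:
                pathes[(source, target)] = _expand(preds, source, target)
    return pathes


def _expand(preds, source, target):
    """Rebuild every shortest path from source to target from predecessors."""
    if target == source:
        return [[source]]
    return [path + [target]
            for p in preds[target]
            for path in _expand(preds, source, p)]


# Hypothetical diamond-shaped graph: 0 -> 1 -> 3 and 0 -> 2 -> 3.
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
for pair, paths in sorted(all_shortest_paths(matrix).items()):
    print(pair, paths)
```

Unreachable pairs are simply absent from the result, so a node with no outgoing edges contributes no entries as a source.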

6. The method for measuring the importance of a component based on node betweenness in a software evolution environment of claim 3, wherein calculating the node betweenness of each node comprises:

Input: the full-graph shortest paths Pathes of the SA model and the component name identification linked list Name;

Output: the node betweenness of each component of the SA model;
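The detailed steps of this claim are not reproduced in the text here, so the following is only a sketch under the standard definition of betweenness: for every ordered pair of nodes, each interior node of a shortest path receives a share equal to one divided by the number of shortest paths for that pair. No normalization is applied, whereas the method's own formula may normalize the result. The layout assumed for Pathes (a mapping from a (source, target) pair to a list of paths) and the function name `node_betweenness` are hypothetical.

```python
def node_betweenness(pathes, name):
    """Unnormalized betweenness of every component.

    pathes: {(source, target): [path, ...]}, each path a list of component
            names from source to target (assumed layout of Pathes)
    name:   list of all component names
    """
    score = {component: 0.0 for component in name}
    for paths in pathes.values():
        share = 1.0 / len(paths)         # each shortest path counts equally
        for path in paths:
            for node in path[1:-1]:      # endpoints are not counted
                score[node] += share
    return score


# Hypothetical shortest paths of a diamond-shaped graph A -> {B, C} -> D.
name = ["A", "B", "C", "D"]
pathes = {
    ("A", "B"): [["A", "B"]],
    ("A", "C"): [["A", "C"]],
    ("B", "D"): [["B", "D"]],
    ("C", "D"): [["C", "D"]],
    ("A", "D"): [["A", "B", "D"], ["A", "C", "D"]],
}
for component, value in node_betweenness(pathes, name).items():
    print(component, value)
```

In this example B and C each carry half of the shortest paths from A to D, while the endpoints A and D carry none.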

7. A component importance measurement system based on node betweenness in a software evolution environment, the system using the component importance measurement method based on node betweenness in a software evolution environment according to claim 1.