CN112988216A

CN112988216A - Software architecture recovery method based on functional structure

Info

Publication number: CN112988216A
Application number: CN202110270867.9A
Authority: CN
Inventors: 张莉; 贾航; 葛宁; 周雨飞; 李延旭; 王茵迪
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2021-06-18
Anticipated expiration: 2041-03-12
Also published as: CN112988216B

Abstract

The invention relates to a software architecture recovery method based on a functional structure, belongs to the technical field of software architecture recovery, and solves the problem that software architectures recovered by existing methods are difficult to understand. The method comprises the following steps: constructing functional structure knowledge, in which each function comprises a parent function and/or a child function, and the lowest-level functions directly contain function-related classes; mapping the parent functions, child functions and lowest-level functions to parent components, child components and leaf components respectively, and dividing the function-related classes into the corresponding leaf components; removing the classes already contained in the components from the set of all classes to obtain the unrecovered classes; calculating the correlation between each unrecovered class and each leaf component, and dividing each class whose correlation exceeds a correlation threshold into the leaf component with which it has the highest correlation; repeatedly traversing all unrecovered classes to obtain updated components; and, if all classes are recovered, recovering the software architecture of the software based on the updated components.

Description

Software architecture recovery method based on functional structure

Technical Field

The invention relates to the technical field of software architecture recovery, in particular to a software architecture recovery method based on a functional structure.

Background

Scholars and practitioners regard the software architecture as an important basis for software development and evolution, but three problems arise in practice. First, in the context of agile development, practitioners hold that "working software over comprehensive documentation" applies: they design a software architecture without sufficient validation, or design none at all, and start implementing the software system directly from the requirements provided by the user. Second, even if the software architecture is designed and sufficiently verified, development often does not follow the designed architecture, so the implemented architecture becomes inconsistent with the designed one. Third, as software continues to evolve, the source code is often updated while the software architecture is not, which the industry calls "software architecture erosion and drift". Ultimately, a missing or outdated software architecture reduces software quality and makes iteration and maintenance difficult; these problems are common in current software.

In response to these problems, many scholars and practitioners study software architecture recovery technology, that is, how to extract a group of components from source code entities to form the architecture of a software system, so that the software architecture documentation can be reconstructed and updated, the quality of the software improved, and the risks of software development, maintenance and evolution reduced.

Software architecture recovery is currently an active research field with numerous results, such as Bunch, ACDC, LIMBO, MCA, ECA, ARC and ZBR. Most of this work recovers the software architecture with structural high cohesion and low coupling as the recovery target. Although this work has achieved certain results, it recovers from the structural information of the source code and does not use, or does not fully consider, functional structure knowledge and functional semantic information. As a result, the recovered components have no clear functional semantics, and the recovered software architecture is difficult to understand and may even be unreasonable. Current research therefore still gives insufficient consideration to functional structure knowledge in software architecture recovery, and the average recovery accuracy of existing recovery methods is less than 50%.

Disclosure of Invention

In view of the foregoing analysis, embodiments of the present invention are directed to providing a method for recovering a software architecture based on a functional structure, so as to solve the problem that the software architecture is difficult to understand and even unreasonable due to the fact that the functional structure is not fully considered in the existing software architecture recovery process.

The invention discloses a software architecture recovery method based on a functional structure, which comprises the following steps:

constructing functional structure knowledge of the software whose architecture is to be recovered; each function in the functional structure knowledge comprises a parent function and/or a child function, and comprises function-related classes; wherein the lowest-level functions directly contain the function-related classes;

respectively mapping a parent function, a child function and a lowest-level function into a corresponding parent component, a corresponding child component and a corresponding leaf component, and dividing the function-related classes directly contained in the lowest-level function into the leaf components mapped by the lowest-level function;

acquiring structural information and text information from a source code of a class, and acquiring keywords of all classes and all leaf components based on the text information and names and function descriptions of the leaf components;

removing the classes already contained in the components from the set of all classes to obtain the unrecovered classes; based on the structural information and the keywords, obtaining the correlation between each unrecovered class and each leaf component, and dividing each class whose correlation exceeds a correlation threshold into the leaf component with which it has the highest correlation; repeatedly traversing all unrecovered classes to obtain updated components;

and if all classes are recovered, recovering the software architecture of the software based on the updated components.

On the basis of the scheme, the invention also makes the following improvements:

Further, obtaining the keywords of all classes and all leaf components based on the text information and the names and function descriptions of the leaf components comprises:

for each class, extracting the vocabulary of the class from the corresponding text information, and performing word segmentation and ranking on the extracted vocabulary to obtain the keywords of the class;

for each leaf component, extracting the vocabulary of all classes contained in the leaf component and the vocabulary in the name and function description of the leaf component, and performing word segmentation and ranking on the extracted vocabulary to obtain the keywords of the leaf component.

Further, the correlation between each unrecovered class and each leaf component is obtained by performing the following operations:

obtaining structural features of the unrecovered class based on the structural information; based on the structural features, obtaining the structural correlation between the unrecovered class and the leaf component;

generating corresponding text vectors based on the keywords of the unrecovered class and of the leaf component respectively; obtaining the functional semantic correlation between the unrecovered class and the leaf component based on the text vectors;

obtaining the correlation between each unrecovered class and each leaf component based on the structural correlation, the functional semantic correlation and their corresponding weights.

Further, the structural features include:

a dependence feature, a depended-on feature, an association feature, an associated feature, an inheritance feature, an inherited feature, an implementation feature, and an implemented feature.

Further, obtaining the structural correlation between the unrecovered class and the leaf component based on the structural features comprises:

counting the relationships between each unrecovered class and the related classes in each leaf component with which the structural features exist;

normalizing the obtained relationship counts;

obtaining the structural correlation between the unrecovered class and the leaf component based on the normalized relationship counts.

Further, the method further comprises:

if unrecovered classes remain, clustering the remaining unrecovered classes into pseudo leaf components, wherein the number of pseudo leaf components is equal to the number of clusters;

and recovering the software architecture of the software based on the updated component and the pseudo leaf component.

Further, the pseudo leaf component is obtained by performing the following operations:

obtaining structure-text characteristics of the remaining unrecovered classes based on the structure information and the text information;

generating a corresponding feature vector based on the structure-text features;

and clustering the feature vectors to obtain the clustering result of the remaining unrecovered classes, and assigning one pseudo leaf component to the classes grouped into each cluster.

Further, the structure-text features include:

call features, inheritance features, annotation features, and named fragment features.

Further, the method further comprises:

iteratively clustering the pseudo leaf components to form components of a higher functional abstraction level;

restoring the software architecture of the software based on the updated components, the pseudo leaf components, and the components of the higher functional abstraction level.

Further, the components of the higher functional abstraction level are formed by performing the following operations:

taking the named fragment features of all classes contained in each pseudo leaf component as the named fragment features of the corresponding pseudo leaf component;

generating a named fragment feature vector for each pseudo leaf component based on the named fragment features of the pseudo leaf component;

and clustering the named fragment feature vectors to obtain a clustering result of the pseudo leaf components, and assigning one component of a higher functional abstraction level to the pseudo leaf components grouped into each cluster.

Compared with the prior art, the invention can realize at least one of the following beneficial effects:

The invention uses functional structure knowledge to assist software architecture recovery and thereby improves recovery accuracy. It can also recover a reasonable and functionally understandable software architecture that helps software architects and developers understand the software system as a whole, so that they can iterate on and maintain the system better and with lower risk.

In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

Drawings

The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.

FIG. 1 is a flowchart of a software architecture recovery method based on a functional structure according to an embodiment of the present invention;

FIG. 2 is a flowchart of another software architecture recovery method based on a functional structure according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a class call chain obtained by instrumentation techniques in an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a recovery procedure of a high correlation class allocation stage after a correlation threshold is introduced according to an embodiment of the present invention;

FIG. 5 is a flowchart of iterative recovery in the high correlation class assignment stage after a correlation threshold is introduced in an embodiment of the present invention.

Detailed Description

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.

A specific embodiment of the present invention discloses a software architecture recovery method based on a functional structure. Its flowcharts are shown in FIG. 1 and FIG. 2, and the method includes the following steps:

Step S1: constructing functional structure knowledge of the software whose architecture is to be recovered; each function in the functional structure knowledge comprises a parent function and/or a child function, and comprises function-related classes; wherein the lowest-level functions directly contain the function-related classes;

From the perspective of software system functionality, a function is described mainly by its name, alternative names and related terms, and further by its main role, its purpose, the hierarchical relationships between functions, and the classes that may implement it. Therefore, the functional structure knowledge (also called "function point knowledge") constructed from the functions of the software system includes the following elements:

(1) function name (N): name of function.

(2) Function description (D): describes the role and purpose of a function, and includes a function description for each parent function, child function and lowest-level function.

(3) Related terms of function (T): alternative names, abbreviations and other related words for describing functions.

(4) Set of function-related terms (TS): the collection of function-related terms. For n function-related terms, the set of function-related terms $TS = \{T_1, T_2, T_3, \dots, T_n\}$ is obtained.

(5) Parent Function (PF) of function: a parent (higher level) function representing a function, the parent function being a high level abstract description of a child function. It should be noted that, the parent function and the child function are in direct relation, and one function may have 0 or 1 parent function.

(6) Child function (CF) of function: the child (lower level) functions that the function contains, the child functions being a detailed description of the parent function. It should be noted that the child functions in this definition are directly related to the function. A function may have 0, 1 or more child functions.

(7) Function-related classes (RC): the collection of core classes involved in implementing the function, excluding common tool classes. Denoting a class by the symbol C, the function-related classes of a function can be written as $RC = \{C_1, C_2, \dots, C_{n_1}\}$, where $n_1$ is the number of classes related to the function. It should be noted that only the functions with the lowest abstraction level (i.e. functions without child functions) directly contain function-related classes.

(8) Function (F): with respect to the above six elements, namely the function name, the function description, the set of function-related terms, the parent function, the child functions and the function-related classes, a function can be regarded as a six-tuple, expressed as $F = (N, D, TS, PF, CF, RC)$.

(9) Functional Structure Knowledge (FSK): the knowledge composed of the functions of a software system and the containment hierarchy between those functions. For a software system S, the functional structure knowledge is a set of functions, including the containment relationships among them, and can be expressed as

$$FSK = \{F_1, F_2, \dots, F_{n_2}\}$$

where $F_i = (N_i, D_i, TS_i, PF_i, CF_i, RC_i)$ denotes the i-th function and $n_2$ is the number of functions that the software system contains.
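The six-tuple and the set of functions defined above can be pictured as a small data model. The following is a minimal Python sketch, not part of the embodiment: the names `Function` and `validate_fsk`, the dictionary keyed by function name, and the example data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Function:
    """One function F = (N, D, TS, PF, CF, RC); field names are illustrative."""
    name: str                                                  # N
    description: str = ""                                      # D
    terms: List[str] = field(default_factory=list)             # TS
    parent: Optional[str] = None                               # PF: 0 or 1 parent
    children: List[str] = field(default_factory=list)          # CF: 0, 1 or more
    related_classes: List[str] = field(default_factory=list)   # RC


def validate_fsk(fsk: Dict[str, Function]) -> None:
    """Check two constraints stated in the definitions above."""
    for f in fsk.values():
        # Only lowest-level functions (no child functions) directly hold classes.
        if f.children and f.related_classes:
            raise ValueError(f"{f.name}: only lowest-level functions hold classes")
        # Parent and child links must describe the same direct relationship.
        for child in f.children:
            if fsk[child].parent != f.name:
                raise ValueError(f"{child}: parent link does not match {f.name}")


# Hypothetical example data: FSK as a set of functions keyed by name.
fsk = {
    "Account": Function("Account", "Manage user accounts", children=["AddAccount"]),
    "AddAccount": Function("AddAccount", "Create an account", terms=["register"],
                           parent="Account",
                           related_classes=["com.example.AccountController"]),
}
validate_fsk(fsk)
```

Storing the parent and child links by name keeps the containment hierarchy explicit without nesting objects, which is one possible design among several.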

The constituent elements included in the functional knowledge can be viewed and looked up through table 1.

Table 1 constituent elements of functional knowledge

Specifically, in step S1, the functional structure knowledge of the software may be constructed in the following manner:

Constructing functional structure knowledge by applying domain knowledge:

(1) based on the source code of the software project and the related technical manual, the functional structure knowledge is constructed:

The source code of the software project and the related technical manuals (such as requirement documents, design documents, function specifications and other related material) are read and combined with the relevant domain knowledge of the software system to construct preliminary functional structure knowledge; this can be done by domain experts of the software system.

(2) To obtain more comprehensive and complete functional structure knowledge, the above approach can be supplemented by constructing functional structure knowledge from the dynamic running information and the static semantic information of the source code.

On one hand, when the dynamic running information of the source code is used, an instrumentation program is first built to extract the classes called while the software system runs. Second, the instrumentation program is embedded into the software system at run time, and the class call chain that implements a function is obtained by exercising the corresponding function of the software system, as shown in FIG. 3. Finally, the classes shared with other functions and the tool classes are removed to obtain the core classes related to the function, from which the function-related class information in the functional structure knowledge is constructed. On the other hand, the invention can also use the static semantic information of the source code to construct the functional structure knowledge. The static semantic information of the source code carries rich functional semantic knowledge, and a person reading the source code can learn the functions of the software system from it. Therefore, functional structure knowledge can be constructed by analyzing static semantic information of the source code, such as package names, class file names, field names, method names, annotations and comments, with the help of a source code analysis tool.
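The final filtering step of the dynamic approach can be sketched with plain set operations. This is a minimal illustration under stated assumptions: the call chains are assumed to be already collected as lists of class names per function, the function name `core_classes` is hypothetical, and "shared" is interpreted here as appearing in the call chain of any other function.

```python
from typing import Dict, List, Set


def core_classes(call_chains: Dict[str, List[str]],
                 tool_classes: Set[str]) -> Dict[str, Set[str]]:
    """For each function, drop tool classes and classes shared with other functions."""
    result: Dict[str, Set[str]] = {}
    for func, chain in call_chains.items():
        # Collect every class seen in the call chain of any other function.
        shared: Set[str] = set()
        for other, other_chain in call_chains.items():
            if other != func:
                shared.update(other_chain)
        result[func] = set(chain) - shared - tool_classes
    return result


# Hypothetical call chains recorded by an instrumentation program.
chains = {
    "AddAccount": ["AccountController", "AccountService", "StringUtils", "Logger"],
    "Transfer": ["TransferController", "TransferService", "Logger"],
}
related = core_classes(chains, tool_classes={"StringUtils"})
```

Here `Logger` is removed because both functions call it, and `StringUtils` is removed because it is listed as a tool class.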

Since a software architect designs a function as a component when designing a software architecture, software architecture recovery essentially abstracts several components from a group of classes, and thus requires mapping functions as components;

Step S2: mapping the parent functions, child functions and lowest-level functions to the corresponding parent components, child components and leaf components respectively, and dividing the function-related classes directly contained in each lowest-level function into the leaf component to which that function is mapped. Further, the name and function description of each function are taken as the name and function description of the corresponding component: the function name is used as the component name, and the function description and related terms are used to describe the function of the component. The containment hierarchy of the components is built from the containment hierarchy of the functions: for example, the relationship between a parent function and a child function becomes the functional containment relationship between a parent component and a child component.
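The mapping in step S2 is essentially a one-to-one copy of the function hierarchy into a component hierarchy. The sketch below illustrates this in Python; the `Component` class, the dictionary layout of the input and the way description and terms are joined are assumptions made for the example, not details given by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Component:
    name: str
    description: str
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)
    classes: Set[str] = field(default_factory=set)

    @property
    def is_leaf(self) -> bool:
        # A leaf component corresponds to a lowest-level function.
        return not self.children


def map_functions_to_components(fsk: Dict[str, dict]) -> Dict[str, Component]:
    """Map every function to a component with the same name and hierarchy."""
    components: Dict[str, Component] = {}
    for name, f in fsk.items():
        # The function description and related terms describe the component.
        desc = " ".join([f.get("description", "")] + f.get("terms", [])).strip()
        comp = Component(name=name, description=desc, parent=f.get("parent"),
                         children=list(f.get("children", [])))
        if comp.is_leaf:
            # Only leaf components receive the function-related classes.
            comp.classes = set(f.get("related_classes", []))
        components[name] = comp
    return components


# Hypothetical functional structure knowledge, written as plain dictionaries.
fsk = {
    "Account": {"description": "Manage user accounts", "children": ["AddAccount"]},
    "AddAccount": {"description": "Create an account", "terms": ["register"],
                   "parent": "Account", "related_classes": ["AccountController"]},
}
components = map_functions_to_components(fsk)
```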

However, when functional structure knowledge is built, it is difficult to find all the classes related to a function; that is, some classes are related to a function but have not yet been divided into the component corresponding to that function. Therefore, a complete recovery result cannot be obtained after steps S1 and S2 alone. In view of this, the present embodiment divides unrecovered classes into the related leaf components by calculating their correlation with the leaf components. The specific process is described in step S3 and step S4:

step S3: acquiring structural information and text information from a source code of a class, and acquiring keywords of all classes and all leaf components based on the text information and names and function descriptions of the leaf components;

step S31: obtaining structural information and text information from the source code of the class, wherein:

the structure information includes: the dependency relationship, association relationship, inheritance relationship and implementation relationship between classes;

the text information includes: fully qualified names of classes, field names, method names, annotations, comments, and fully qualified names of packages to which the classes belong.

Because software architecture recovery essentially abstracts several components from a set of classes, structural information and text information must be extracted from the source code of the classes. JDT (Java Development Tools) is a toolset from the Eclipse platform for extracting, parsing and analyzing source files (class files) written in the Java language. In this embodiment, JDT is used to obtain the structural information and text information from the source code of the classes; the extracted information is described in Table 2.

Table 2 extracted source code static information

It should be noted that, in the Java language, dependency and association relationships between classes could be extracted from import declarations. However, because of developer carelessness or a lack of development conventions, class A may import class B through an import declaration without actually using it. If relationships were extracted from import declarations alone, relationships that do not really exist between classes would be extracted, which would degrade the software architecture recovery result. The present invention therefore identifies and extracts association and dependency relationships between classes from their fields and methods, rather than relying on import declarations.

By performing step S31, the structural information and the text information are extracted, and further processing is subsequently required for these information.

Step S32: based on the text information obtained in S31 and the function information of the leaf components, the keywords of the classes and the keywords of the leaf components are extracted.

For classes: the vocabulary of the class is extracted from its text information, the extracted vocabulary is segmented into words, and finally the segmented words are ranked by a keyword extraction algorithm to obtain the keywords of the class.

For leaf components: vocabulary is extracted from all classes contained in the leaf component and from the function information of the leaf component (such as the component name, function description and related terms); the vocabulary extracted in both ways is segmented into words, and finally the segmented words are ranked by a keyword extraction algorithm to obtain the keywords of the leaf component.

the word segmentation method used in the embodiment is as follows:

Because the extracted vocabulary contains many compound words and follows several naming conventions, such as camel case naming and underscore naming, it must be segmented according to the convention used.

1) For words in camel case, segmentation proceeds as follows:

Determine whether the name contains capital letters. If not, the name is taken as a single word. Otherwise, the name is split into several words with the capital letters as boundaries, where the i-th capital letter is the first letter of the (i+1)-th word, i = 1, …, $n_3$, and $n_3$ is the number of capital letters. After splitting, all letters are converted to lower case. For example, segmenting AccountController and addAccount and converting all letters to lower case yields the words account, controller and add.

2) For words in underscore naming, the parts before and after each underscore are split with the underscore as the boundary. For example, segmenting account_num and converting all letters to lower case yields the words account and num.

3) For words containing proper nouns, such as DFSClient, the proper noun is kept as one word: segmentation and conversion to lower case yield the words dfs and client.
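The three segmentation rules can be approximated with a single regular expression. The following Python sketch is one possible implementation, not the one used in the embodiment; the function name `split_identifier` is hypothetical, and treating a run of capital letters followed by a capitalized word as a proper noun is an assumption that happens to cover the DFSClient example.

```python
import re
from typing import List

# Alternatives, tried in order:
#   1. a run of capitals that is followed by a capitalized word (DFS in DFSClient)
#   2. an optional capital followed by lower-case letters (Account, add)
#   3. any remaining run of capitals (HTTP)
#   4. a run of digits
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier: str) -> List[str]:
    """Split a camel-case or underscore-style identifier into lower-case words."""
    words: List[str] = []
    for part in identifier.split("_"):          # underscore naming
        words.extend(w.lower() for w in _WORD.findall(part))  # camel case
    return words


print(split_identifier("AccountController"))  # ['account', 'controller']
print(split_identifier("addAccount"))         # ['add', 'account']
print(split_identifier("account_num"))        # ['account', 'num']
print(split_identifier("DFSClient"))          # ['dfs', 'client']
```

Because characters such as dots are simply skipped by the pattern, the same function also splits fully qualified names into their parts.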

The keyword extraction method used in this embodiment:

This embodiment adopts the efficient TextRank algorithm to extract several keywords from a group of words as the text of a class or component. TextRank is a keyword extraction algorithm based on graph ranking. Its basic principle is to represent the words as a directed weighted graph and rank them with a voting mechanism to obtain the keywords. In this software architecture recovery method, the input of the TextRank algorithm is a group of processed words extracted from classes and components, and the output is a group of keywords sorted in descending order of importance.

Through the above operations, a set of keywords of each class and a set of keywords of the component are obtained.
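To make the ranking step concrete, the following is a simplified TextRank-style sketch in Python. It is an illustration only: the co-occurrence window of 2, the damping factor of 0.85, the fixed iteration count and the use of symmetric (undirected) co-occurrence edges are all assumptions, and the function name `textrank_keywords` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def textrank_keywords(words: List[str], window: int = 2, damping: float = 0.85,
                      iterations: int = 50, top_k: int = 10) -> List[str]:
    """Rank words by a weighted PageRank over a co-occurrence graph."""
    # Link words that occur within `window` positions of each other.
    weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                weights[w][words[j]] += 1.0
                weights[words[j]][w] += 1.0

    vocab = sorted(set(words))
    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        new_scores = {}
        for w in vocab:
            rank = 0.0
            # Each neighbour "votes" for w in proportion to the edge weight.
            for nb, wt in weights[w].items():
                rank += wt / sum(weights[nb].values()) * scores[nb]
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores
    # Highest score first; ties are broken alphabetically.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical segmented words of one class.
keywords = textrank_keywords(["account", "add", "account", "delete",
                              "account", "query"])
```

In this example "account" co-occurs with every other word, so it receives the most votes and is ranked first.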

Step S4: removing the classes already contained in the components from the set of all classes to obtain the unrecovered classes; based on the structural information and the keywords, obtaining the correlation between each unrecovered class and each leaf component (also called seed component), and dividing each class whose correlation exceeds a correlation threshold into the leaf component with which it has the highest correlation; repeatedly traversing all unrecovered classes until no class has a correlation exceeding the correlation threshold; and obtaining the updated components. Specifically:

Step S41: obtaining structural features of the unrecovered classes based on the structural information; based on the structural features, obtaining the structural correlation between each unrecovered class and each leaf component;

From a structural point of view, if one class depends on, is depended on by, associates with, is associated by, inherits from, is inherited by, implements or is implemented by another class, the two classes have a certain communicational cohesion and logical cohesion. The more related classes in a leaf component with which an unrecovered class has such relationships, the greater the structural correlation between the unrecovered class and that leaf component, and the more likely the class belongs to it. Therefore, the following 8 structural features are obtained from the structural information of the source code and used to calculate the structural correlation between a class and a component:

(1) Dependence feature: if class $C_1$ depends on a class $C_2$ in the leaf component, the "dependence" feature is obtained.

(2) Depended-on feature: if class $C_1$ is depended on by a class $C_2$ in the leaf component, the "depended-on" feature is obtained.

(3) Association feature: if class $C_1$ associates with a class $C_2$ in the leaf component, the "association" feature is obtained.

(4) Associated feature: if class $C_1$ is associated by a class $C_2$ in the leaf component, the "associated" feature is obtained.

(5) Inheritance feature: if class $C_1$ inherits from a class $C_2$ in the leaf component, the "inheritance" feature is obtained.

(6) Inherited feature: if class $C_1$ is inherited by a class $C_2$ in the leaf component, the "inherited" feature is obtained.

(7) Implementation feature: if class $C_1$ implements an interface class $C_2$ in the leaf component, the "implementation" feature is obtained.

(8) Implemented feature: if interface class $C_1$ is implemented by a class $C_2$ in the leaf component, the "implemented" feature is obtained.

To calculate the structural correlation between an unrecovered class C and a leaf component Com, the relationships between the unrecovered class and the related classes in each leaf component that exhibit the above structural features are first counted, and the resulting counts are then normalized. This gives the structural correlation $corr_{struct}(C, Com)$ between the unrecovered class and the leaf component, with values in the range [0, 1].

The more relationships a class has with the classes in a leaf component, the greater its structural correlation with that leaf component and the larger the value of $corr_{struct}$; otherwise the value is smaller. The algorithm is described in Algorithm 1.
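The counting and normalization can be illustrated as follows. This Python sketch is not Algorithm 1 itself, whose text is not reproduced here: the relation triples, the function names, and in particular the normalization (dividing by the total number of relationships the class has with all leaf components) are assumptions chosen so that the result lies in [0, 1].

```python
from typing import Dict, Iterable, Set, Tuple

# (source class, target class, kind), e.g. ("A", "B", "depends").
Relation = Tuple[str, str, str]


def relation_count(cls: str, members: Set[str],
                   relations: Iterable[Relation]) -> int:
    """Count relationships between `cls` and the classes of one leaf component.

    Outgoing edges cover the dependence, association, inheritance and
    implementation features; incoming edges cover their passive counterparts.
    """
    count = 0
    for src, dst, _kind in relations:
        if (src == cls and dst in members) or (dst == cls and src in members):
            count += 1
    return count


def structural_correlation(cls: str, component: str,
                           components: Dict[str, Set[str]],
                           relations: Iterable[Relation]) -> float:
    """Normalized relationship count in [0, 1] (assumed normalization)."""
    relations = list(relations)
    total = sum(relation_count(cls, members, relations)
                for members in components.values())
    if total == 0:
        return 0.0
    return relation_count(cls, components[component], relations) / total


# Hypothetical data: class A is unrecovered; X and Y are leaf components.
relations = [("A", "B", "depends"), ("C", "A", "inherits"), ("A", "D", "associates")]
leaves = {"X": {"B", "C"}, "Y": {"D"}}
corr_x = structural_correlation("A", "X", leaves, relations)
corr_y = structural_correlation("A", "Y", leaves, relations)
```

With this data, two of the three relationships of class A involve component X, so its structural correlation with X is twice that with Y.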

Step S42: respectively generating corresponding text vectors based on the unrecovered class and the keywords of the leaf components; obtaining functional semantic relevance between the unrecovered class and the leaf components based on the text vector;

and analyzing from the aspect of functional semantics, if the keywords contained in one class are similar to the keywords contained in the leaf components, certain functional semantic correlation exists between the class and the leaf components. Therefore, it is necessary to extract keywords from the class and leaf components, generate a text vector, and determine whether there is a certain functional semantic correlation between the unrecovered class and the leaf component by calculating the text similarity therebetween.

The text similarity calculation method comprises the following steps:

In this embodiment, cosine similarity is used to calculate the text similarity between an unrecovered class and a leaf component. Each keyword of the class and of the leaf component is first encoded with one-hot encoding, which converts the keywords into a text vector of 0s and 1s that abstractly represents the text of the class or leaf component. The cosine of the angle between the two text vectors is then computed: the larger the cosine value (that is, the smaller the angle), the more similar the two text vectors, and hence the more similar the texts of the unrecovered class and the leaf component. The cosine function for the text similarity between an unrecovered class and a leaf component is shown in formula (1), where ClassText and ComText denote the text vectors of the unrecovered class and the leaf component respectively, i denotes the i-th element of a text vector, and $n_4$ is the length of the text vector.

$$\cos(ClassText, ComText) = \frac{\sum_{i=1}^{n_4} ClassText_i \cdot ComText_i}{\sqrt{\sum_{i=1}^{n_4} ClassText_i^2}\,\sqrt{\sum_{i=1}^{n_4} ComText_i^2}} \tag{1}$$
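As an illustration of one-hot encoding followed by cosine similarity, consider the following Python sketch. The function names and the choice of building the shared vocabulary from the union of both keyword sets are assumptions made for the example.

```python
import math
from typing import Iterable, List


def one_hot(keywords: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Encode a keyword set as a 0/1 vector over a shared vocabulary."""
    present = set(keywords)
    return [1 if word in present else 0 for word in vocabulary]


def cosine_similarity(a: List[int], b: List[int]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_similarity(class_keywords: Iterable[str],
                    component_keywords: Iterable[str]) -> float:
    """Text similarity between a class and a leaf component."""
    class_keywords = list(class_keywords)
    component_keywords = list(component_keywords)
    # Assumption: the vocabulary is the union of both keyword sets.
    vocabulary = sorted(set(class_keywords) | set(component_keywords))
    return cosine_similarity(one_hot(class_keywords, vocabulary),
                             one_hot(component_keywords, vocabulary))


# One shared keyword out of two on each side gives 1 / (sqrt(2) * sqrt(2)) = 0.5.
sim = text_similarity(["account", "add"], ["account", "delete"])
```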

Step S43: based on the structural relevance, functional semantic relevance, and their corresponding weights, a relevance of each unrecovered class to each leaf member is obtained.

After the structural relevance, the functional semantic relevance and their weights have been obtained for an unrecovered class and a leaf component, the correlation between the class and the leaf component is calculated using formula (2):

corr(C,Com)＝w_struct·corr_struct(C,Com)+w_func·corr_func(C,Com) (2)

where w_struct is the structural correlation weight, i.e. the weight configured for the structural correlation between a class and a component, with values in the range [0,1]; and w_func is the functional semantic correlation weight, i.e. the weight configured for the functional semantic correlation between a class and a component, with values in the range [0,1].

The correlation between an unrecovered class and each leaf component (seed component) constructed from the functional structure knowledge can be calculated by equation (2) to obtain a correlation vector F, as shown in equation (3), where n₅ is the number of leaf components; m unrecovered classes generate a correlation matrix M, as shown in equation (4).

F＝(corr(C,Com₁),corr(C,Com₂),···,corr(C,Com_n₅)) (3)
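Formula (2) and the construction of the correlation vectors can be sketched in Python as follows. The function names, the nested-list layout, the default weights of 0.5 and the sample values are assumptions chosen for illustration only.

```python
def corr(corr_struct, corr_func, w_struct=0.5, w_func=0.5):
    """Formula (2): weighted sum of structural and functional semantic correlation."""
    return w_struct * corr_struct + w_func * corr_func


def correlation_matrix(struct, func, w_struct=0.5, w_func=0.5):
    """struct[i][j] and func[i][j]: correlations of unrecovered class i with leaf component j.

    Each returned row is one correlation vector F in the sense of formula (3).
    """
    return [
        [corr(s, f, w_struct, w_func) for s, f in zip(struct_row, func_row)]
        for struct_row, func_row in zip(struct, func)
    ]


# Two unrecovered classes, two leaf components (made-up values).
struct = [[0.8, 0.1], [0.0, 0.4]]
func = [[0.6, 0.2], [0.1, 0.9]]
M = correlation_matrix(struct, func, w_struct=0.4, w_func=0.6)
print(M)
```

Each row of `M` corresponds to one unrecovered class and each column to one leaf component, so the largest entry in a row identifies the leaf component most correlated with that class.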

Once the correlation of each unrecovered class to each leaf component has been obtained in step S43 from the structural correlation, the functional semantic correlation and their corresponding weights, the classes are recovered based on the correlation matrix and the correlation threshold:

A class is recovered according to the correlation corr(C,Com) between class C and leaf component Com. The correlation threshold is the minimum correlation at which a class can be assigned to a leaf component; it constrains the recovery so that irrelevant classes are not assigned to leaf components. When the correlation is above the correlation threshold, the class is assigned to the leaf component; when it is below the threshold, the class is not recovered and remains in the set of unrecovered classes, i.e. it is not assigned to any leaf component for the time being. Note that recovery is an iterative process: after each round of recovery, the correlation matrix is regenerated, the leaf component most correlated with each unrecovered class is determined again, and if the maximum correlation reaches the correlation threshold, the class is assigned to that leaf component. The recovery process stops when an iteration no longer changes the set of unrecovered classes. Fig. 3 and Fig. 4 show, respectively, the single-round recovery flow and the iterative recovery flow after the correlation matrix and the correlation threshold are introduced.
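The iterative recovery described here can be sketched in Python as below. This is a simplified illustration, not the flow of Fig. 3 or Fig. 4: the function name, the `correlation` callback and the toy link-based correlation measure are hypothetical, and a strict "greater than" comparison against the threshold is assumed.

```python
def recover_iteratively(unrecovered, components, correlation, threshold):
    """Assign classes to leaf components until a full round changes nothing.

    unrecovered: set of class names; components: dict leaf name -> set of class names;
    correlation(cls, leaf_name, components): hypothetical callback returning corr(C, Com).
    """
    unrecovered = set(unrecovered)
    components = {name: set(classes) for name, classes in components.items()}
    changed = True
    while changed and unrecovered:
        changed = False
        # Correlations are recomputed every round because the components have grown.
        assignments = {}
        for cls in sorted(unrecovered):
            best = max(sorted(components), key=lambda name: correlation(cls, name, components))
            if correlation(cls, best, components) > threshold:
                assignments[cls] = best
        for cls, name in assignments.items():
            components[name].add(cls)
            unrecovered.discard(cls)
            changed = True
    return components, unrecovered


links = {("X", "A"), ("Y", "X")}


def correlation(cls, name, components):
    # Toy measure: 1.0 if cls is linked to any class already in the component, else 0.0.
    linked = any((cls, c) in links or (c, cls) in links for c in components[name])
    return 1.0 if linked else 0.0


components, remaining = recover_iteratively(
    {"X", "Y", "Z"}, {"Com1": {"A"}, "Com2": {"B"}}, correlation, 0.5
)
print(components, remaining)
```

In this toy run, class Y only becomes assignable in the second round, after X has joined Com1, which is why the correlation is regenerated after each round; class Z never reaches the threshold and stays unrecovered.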

By executing step S4, a recovery result of the high correlation class allocation stage in fig. 2 (corresponding to the "first stage" in fig. 2) is obtained. At this time, some or all of the unrecovered classes in step S2 are assigned to the corresponding leaf members.

Step S5: if all classes are recovered, recovering the software architecture of the software based on the updated components.

In the architecture recovery at this stage, the constructed functions may not completely match the functions present in the source code, so some classes cannot be assigned to the components constructed from the functional structure knowledge. These classes require further recovery; see steps S6 and S7.

Step S6: if the residual unrecovered classes exist, clustering the residual unrecovered classes into pseudo leaf components, wherein the number of the pseudo leaf components is equal to the number of clusters; and recovering the software architecture based on the updated component and the pseudo leaf component. Specifically, the pseudo leaf component is obtained by performing the following operations:

Step S61: obtaining structure-text features of the remaining unrecovered classes based on the structure information and the text information;

When recovering the leaf components, the following features are selected:

1) Calling feature (also known as "call class"): if both class A and class B call the same class C, there may be communicational cohesion between them, marked as the "call class C" feature f_1c. This is a structural feature.

2) Inheritance feature (also known as "inherit class/implement interface"): if both class A and class B inherit the same class C or implement the same interface C, there may be logical cohesion between them, marked as the "inherit class C" feature f_2c. This is a structural feature.

3) Annotation feature (also known as "tag annotation"): if class A and class B are both tagged with the same annotation C, there may be functional cohesion between them, marked as the "tag annotation C" feature f_3c. This is a functional semantic feature.

4) Named fragment feature (also known as "include named fragment"): if both class A and class B contain the same named fragment C, there may be functional cohesion between them, marked as the "include named fragment C" feature f_4c. Named fragments are derived from the keywords formed after identifiers are split into words. This is a functional semantic feature.

Step S62: generating a corresponding feature vector based on the structure-text features. The feature vector and the feature matrix used when recovering the leaf components are generated as follows:

In this stage, different weights are given to the structural features and the functional semantic features, defined as follows:

Structural feature weight (w_s): the weight configured for structural features, with values in the range [0,1].

Functional semantic feature weight (w_f): the weight configured for functional semantic features, with values in the range [0,1].

The sum of the two weights is 1.

Assuming that the software system S has m classes and interfaces in total, including n named fragments and k annotations, m "call class" features, m "inherit class/implement interface" features, n "include named fragment" features and k "tag annotation" features are generated, totaling (2m + n + k) features. If an entity has a given feature, the corresponding feature value is 1; otherwise it is 0. Thus, each entity can generate a feature vector of dimension (2m + n + k).

For class C_i, the feature vector U_i carrying the structural and functional semantic feature weights can be obtained:

For p classes that are not recovered, the feature matrix A can be obtained:
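As an illustration of how such a weighted (2m + n + k)-dimensional vector might be assembled, consider the Python sketch below. The expressions for U_i and A are not reproduced in this text, so the scaling of each binary feature by the weight of its group, the dictionary keys and all class names are assumptions.

```python
def feature_vector(cls, entities, fragments, annotations, w_s=0.5, w_f=0.5):
    """Weighted (2m + n + k)-dimensional vector for one class.

    cls: dict with hypothetical keys 'calls', 'inherits', 'fragments', 'annotations' (sets).
    entities: all m classes/interfaces; fragments: n named fragments; annotations: k annotations.
    """
    # m "call class" features followed by m "inherit class/implement interface" features.
    structural = [1 if e in cls["calls"] else 0 for e in entities]
    structural += [1 if e in cls["inherits"] else 0 for e in entities]
    # n "include named fragment" features followed by k "tag annotation" features.
    semantic = [1 if f in cls["fragments"] else 0 for f in fragments]
    semantic += [1 if a in cls["annotations"] else 0 for a in annotations]
    # Assumption: each binary feature is scaled by the weight of its feature group.
    return [w_s * v for v in structural] + [w_f * v for v in semantic]


entities = ["OrderDao", "BaseService", "OrderService"]
fragments = ["order", "service"]
annotations = ["Service"]
order_service = {
    "calls": {"OrderDao"},
    "inherits": {"BaseService"},
    "fragments": {"order", "service"},
    "annotations": {"Service"},
}
U = feature_vector(order_service, entities, fragments, annotations, w_s=0.4, w_f=0.6)
A = [U]  # feature matrix: one row per unrecovered class
print(U)
```

With m = 3, n = 2 and k = 1 the vector has 2·3 + 2 + 1 = 9 entries, and the weights 0.4 and 0.6 sum to 1 as required.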

Step S63: clustering the feature vectors to obtain the clustering result of the remaining unrecovered classes, and assigning one pseudo leaf component to the classes grouped into the same cluster.

A feature vector is generated for each unrecovered class, and together these vectors form a feature matrix. The feature matrix is used as input to the affinity propagation (AP) algorithm to cluster the unrecovered classes, yielding a set of pseudo leaf components.

To better meet the requirements of software architecture recovery, the pseudo leaf components can themselves be clustered to form components of higher functional abstraction levels, so that the hierarchy of the recovered software architecture is better reflected. The specific process is shown in step S7:

Step S7: iteratively clustering the pseudo leaf components; recovering the software architecture of the software based on the updated components, the pseudo leaf components, and the components of higher functional abstraction levels.

Step S71: taking the named fragment characteristics of all classes contained in each pseudo leaf component as the named fragment characteristics of the corresponding pseudo leaf component;

When recovering the higher-level components, the following feature is selected:

Named fragment feature (also known as "include named fragment"): if both component A and component B contain the same named fragment C, there may be functional cohesion between them, marked as the "include named fragment C" feature r_c. Named fragments are derived from the keywords formed after identifiers are split into words. This is a functional semantic feature.

Step S72: generating a named fragment feature vector for each pseudo leaf component based on the named fragment features of the pseudo leaf components. The feature vector and the feature matrix used when recovering the higher-level components are generated as follows:

Assuming there are currently m named fragments, m "include named fragment" features are generated. If an entity has a given feature, the corresponding feature value is 1; otherwise it is 0. Thus, each entity can generate a feature vector of dimension m.

For component Com_i, its feature vector V_i can be obtained:

For p components, the feature matrix B can be obtained:
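Steps S71 and S72 can be sketched in Python as follows. The expressions for V_i and B are not reproduced in this text; the function name, the dictionary layouts and the component and class names are hypothetical, and the column order (alphabetical by fragment) is an arbitrary choice.

```python
def component_fragment_matrix(pseudo_leaves, class_fragments):
    """Build the 0/1 named-fragment matrix B for pseudo leaf components.

    pseudo_leaves: dict component name -> list of class names.
    class_fragments: dict class name -> set of named fragments.
    """
    # Step S71: a component's fragments are the union of its classes' fragments.
    com_fragments = {
        name: set().union(*(class_fragments[c] for c in classes))
        for name, classes in pseudo_leaves.items()
    }
    vocabulary = sorted(set().union(*com_fragments.values()))
    # Step S72: one row V_i per component, one column per named fragment.
    B = [[1 if f in com_fragments[name] else 0 for f in vocabulary] for name in pseudo_leaves]
    return vocabulary, B


pseudo_leaves = {"P1": ["OrderCache", "OrderIndex"], "P2": ["UserCache"]}
class_fragments = {
    "OrderCache": {"order", "cache"},
    "OrderIndex": {"order", "index"},
    "UserCache": {"user", "cache"},
}
vocabulary, B = component_fragment_matrix(pseudo_leaves, class_fragments)
print(vocabulary, B)
```

The rows of `B` are the vectors that the clustering in step S73 would take as input.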

Step S73: clustering the named fragment feature vectors to obtain the clustering result of the pseudo leaf components, and assigning a component of a higher functional abstraction level to the pseudo leaf components grouped into the same cluster.

By executing steps S6, S7, the recovery result of the low correlation-class aggregation stage in fig. 2 (corresponding to the "second stage" in fig. 2) is obtained.

First, the feature matrix generated from the unrecovered classes is input to the AP algorithm, which outputs a clustering result after computation. The clustering result gives the cluster number to which each unrecovered class belongs, i.e. the number of its pseudo leaf component, and classes with the same cluster number are assigned to the same pseudo leaf component. The higher-level components are then recovered iteratively according to the specified number of recovery layers. Finally, a group of components with a functional containment hierarchy is recovered, completing the software architecture recovery of the low-correlation class aggregation stage.
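The mapping from cluster numbers to pseudo leaf components can be sketched in Python as below. The clustering itself is not implemented here: the `labels` list stands in for the output of the AP algorithm, and the function name, the naming prefix and the class names are hypothetical.

```python
from collections import defaultdict


def group_by_cluster(items, labels, prefix="PseudoLeaf"):
    """Group items that share a cluster number into one component."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[f"{prefix}{label}"].append(item)
    return dict(groups)


classes = ["OrderCache", "UserCache", "OrderIndex", "MailSender"]
labels = [0, 1, 0, 2]  # hypothetical output of the AP clustering step
pseudo_leaves = group_by_cluster(classes, labels)
print(pseudo_leaves)
```

The number of resulting pseudo leaf components equals the number of distinct cluster numbers. The same grouping could be reused with a different prefix when the pseudo leaf components are clustered into higher-level components.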

After the pseudo leaf components and the components of higher functional abstraction levels are obtained, the hierarchical relations among all the components can be organized to form a complete software architecture recovery result.

Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims

1. A software architecture recovery method based on a functional structure is characterized by comprising the following steps:

acquiring structural information and text information from a source code of a class, and acquiring keywords of all classes and all leaf components based on the text information and names and function descriptions of the leaf components; removing classes included in the building block from all classes to obtain unrecovered classes; based on the structural information and the keywords, obtaining the correlation between each unrecovered class and each leaf component, and dividing the class with the correlation exceeding a correlation threshold value into the leaf components with the highest correlation; repeatedly traversing all unrecovered classes to obtain an updated component;

2. The functional structure-based software architecture recovery method of claim 1, wherein obtaining the keywords of all classes and all leaf components based on the text information and the names and function descriptions of the leaf components comprises:

3. The functional structure-based software architecture recovery method of claim 2, wherein the correlation of each unrecovered class to each leaf component is obtained by performing the following operations:

4. The functional structure-based software architecture recovery method of claim 3, wherein the structural features comprise:

5. The functional structure-based software architecture recovery method of claim 3 or 4, wherein obtaining the structural correlation between the unrecovered classes and the leaf components based on the structural features comprises:

normalizing the obtained relation number;

6. The functional structure-based software architecture recovery method of any one of claims 1-5, wherein the method further comprises:

7. The functional structure-based software architecture recovery method of claim 6, wherein the pseudo leaf component is obtained by performing the following operations:

generating a corresponding feature vector based on the structure-text features;

8. The functional structure-based software architecture recovery method of claim 7, wherein the structure-text features comprise:

9. The functional structure-based software architecture recovery method of any one of claims 6-8, wherein the method further comprises:

10. The functional structure-based software architecture recovery method of claim 9, wherein the components of the higher functional abstraction level are formed by performing the following operations: