CN118093439B - Microservice extraction method and system based on consistent graph clustering - Google Patents


Info

Publication number
CN118093439B
CN118093439B (application CN202410487715.8A)
Authority
CN
China
Prior art keywords
view
graph
information
consistent
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410487715.8A
Other languages
Chinese (zh)
Other versions
CN118093439A (en)
Inventor
位雪银
李静
吴金龙
顾荣斌
何旭东
方晓蓉
邵佳炜
张皛
潘晨灵
刘文意
周忠冉
李马峰
蔡世龙
潘安顺
顾亚林
张俊杰
邱文元
富思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
State Grid Corp of China SGCC
State Grid Shanghai Electric Power Co Ltd
State Grid Electric Power Research Institute
Original Assignee
Nanjing University of Aeronautics and Astronautics
State Grid Corp of China SGCC
State Grid Shanghai Electric Power Co Ltd
State Grid Electric Power Research Institute
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics, State Grid Corp of China SGCC, State Grid Shanghai Electric Power Co Ltd, State Grid Electric Power Research Institute filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202410487715.8A priority Critical patent/CN118093439B/en
Publication of CN118093439A publication Critical patent/CN118093439A/en
Application granted granted Critical
Publication of CN118093439B publication Critical patent/CN118093439B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a microservice extraction method and system based on consistent graph clustering, comprising monolithic-program structure dependency view construction, monolithic-program semantic view construction, feature embedding representation learning based on a consistency-graph-enhanced graph Transformer, and microservice extraction based on the k-means clustering algorithm. The invention constructs a structure dependency view and a semantic view by extracting the dependency relationships among classes in the monolithic program and the text information used when each class is created, then generates a consistent graph with the consistency-graph-enhanced graph Transformer, realizing unified modeling of the structural and semantic information of the monolithic program, and finally splits the monolithic program by applying the k-means clustering algorithm to the obtained consistent graph. By combining the multi-view information of the monolithic program to build the consistency-graph-enhanced graph Transformer, the invention improves microservice extraction performance in terms of both functionality and modularity.

Description

Microservice extraction method and system based on consistent graph clustering
Technical Field
The invention relates to the technical fields of software engineering and artificial intelligence, and in particular to a microservice extraction method and system based on consistent graph clustering.
Background
In a traditional monolithic architecture, the application is developed as a single unit; as requirements evolve and the business iterates, the code size of the monolithic application keeps growing, causing a series of problems. Unlike the monolithic architecture, a microservice architecture develops a single application as a set of small services, where each service is responsible for a specific function. These services are highly cohesive, loosely coupled, and can be developed, tested, and deployed independently. In recent years, with the continuous development of cloud computing, the advantages of the microservice architecture over monolithic programs have become increasingly apparent, so more and more companies want to refactor their legacy monolithic programs into microservices in order to better reap the benefits of cloud deployment. However, most existing microservice extraction is performed manually, which is time-consuming, labor-intensive, and error-prone, while traditional automatic microservice extraction methods cannot effectively integrate the rich structural and semantic information of the monolithic program.
Microservice extraction refers to the process of dividing a monolithic application into multiple microservices, where each microservice is responsible for an independent function, with high cohesion within each microservice and low coupling between microservices. Microservice extraction methods can be divided into three types according to how data is collected: static analysis, dynamic analysis, and combinations of the two. Some researchers have proposed a knowledge-graph-based method for partitioning microservices: by analyzing the system's modules, functions, entities, and dependence on hardware resources, entities are divided into data entities based on system data storage and operations, module entities based on system services, functional entities based on system functions, and resource entities based on system hardware dependencies; the monolithic application is then expressed as a graph over these four entity types, and candidate microservices are finally obtained with the Louvain algorithm. Other researchers have proposed the microservice partitioning tool CARGO, which creates a system dependency graph of the monolithic application through static analysis. This graph reflects the various relationships that exist in the monolithic application, including call-return edges, data-flow edges, heap-dependency edges, and database-transaction edges; a community detection algorithm then refines the current partition using this context- and flow-sensitive system dependency graph, yielding highly cohesive, loosely coupled microservices.
Some researchers first determine the main tasks of the application to be split and their sub-tasks; the program flow is then defined to determine which tasks can be converted into functions and how the functions interact with each other; finally, the actual functions are specified and those that are too small are merged to generate candidate microservices. This method is applicable to applications of any scale. Other researchers have combined the advantages of static and dynamic analysis strategies and proposed a hybrid solution for decomposing monolithic applications: a dynamic call matrix is constructed from runtime analysis, while static analysis of the code identifies the semantic information of classes and the interactions between them, from which a static structure matrix and a semantic matrix are built. Finally, candidate microservices are obtained with the DBSCAN clustering algorithm.
In summary, with the development of artificial intelligence technology in recent years, microservice extraction has made great progress; however, despite extensive research, there is still no method that jointly models the rich structural and semantic information of monolithic programs, so microservice extraction performance remains somewhat low.
Disclosure of Invention
The invention aims to provide a microservice extraction method and system based on consistent graph clustering that can effectively exploit the rich structural and semantic information of monolithic programs.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A microservice extraction method based on consistent graph clustering, comprising the following steps:
Step S1: extracting the call, inheritance, implementation, and reference relationships among classes in a Java monolithic program with a static analysis strategy, obtaining the node attribute matrix and the adjacency matrix of the structure dependency view from these inter-class dependencies, and constructing the structure dependency view from the two matrices;
Step S2: extracting the text information used when each class is created, generating the node attribute matrix of the semantic view with the TF-IDF algorithm, computing the relevance of node embeddings with cosine similarity to generate the adjacency matrix of the semantic view, and constructing the semantic view of the monolithic program from the two matrices;
Step S3: modeling the consistency and inconsistency information of the structure dependency view and the semantic view with a consistency-graph-enhanced graph Transformer, maximizing the consistency information with a designed similarity loss function, and generating a consistent graph that uniformly models the structural and semantic information of the monolithic program;
Step S4: clustering the consistent graph obtained in step S3 with the k-means algorithm and extracting the microservices.
Further, in step S1, constructing the structure dependency view from the inter-class dependencies comprises the following steps:
Step S11: parsing the source code of the Java application to obtain the underlying abstract syntax tree of the Java source code;
Step S12: extracting the following information from the underlying abstract syntax tree: class names, annotation lists, variable names, reference lists, lists of methods called from other classes, lists of inherited classes, and implementation lists;
Step S13: from the information extracted in step S12, obtaining the call, inheritance, reference, and implementation dependencies between classes, counting the number of dependencies between any two classes, and obtaining the node attribute matrix $X^s$ of the structure dependency view, defined as $X^s_{ij} = a$, where $X^s_{ij}$ is the value in row $i$, column $j$ of the structure-dependency-view node attribute matrix and $a$ is the number of dependencies between node $v_i$ and node $v_j$;
Step S14: if a dependency exists between two classes, creating an edge between them to obtain the edge set $E$, and constructing the adjacency matrix $A^s$ of the structure dependency view, defined as $A^s_{ij} = 1$ if $(v_i, v_j) \in E$ and $A^s_{ij} = 0$ otherwise, where $v_i$ and $v_j$ denote nodes $i$ and $j$ and $A^s_{ij}$ is the value in row $i$, column $j$ of the structure-dependency-view adjacency matrix;
Step S15: from the node attribute matrix $X^s$ and the adjacency matrix $A^s$ of the structure view, obtaining the structure dependency view $G^s = (V, E, A^s, X^s)$, where $V$ is the node set and each node $v_i \in V$ represents a class in the overall application.
Further, in step S2, constructing the semantic view of the monolithic program comprises the following steps:
Step S21: extracting the text information used in each class definition from the information obtained in step S12, splitting each identifier in the text into separate words according to camel-case naming, and filtering out stop words;
Step S22: measuring the TF-IDF value of each word to obtain a matrix $X^t \in \mathbb{R}^{N \times M}$, where $N$ is the number of nodes and $M$ is the number of all words extracted from the monolithic application, and using this matrix as the node attribute matrix of the semantic view;
Step S23: from the node attribute matrix $X^t$, computing the cosine similarity between any two classes to obtain the adjacency matrix $A^t$ of the semantic view;
Step S24: from the node attribute matrix $X^t$ and the adjacency matrix $A^t$ of the semantic view, obtaining the semantic view $G^t = (V, E^t, A^t, X^t)$, where $V$ is the node set.
Further, in step S3, generating the consistent graph comprises the following steps:
Step S31: inputting the structure dependency view and the semantic view of the monolithic program into the consistency-graph-enhanced graph Transformer encoder to learn the consistency and inconsistency information of the two views, expressed as $Z^s_c, Z^s_p$ and $Z^t_c, Z^t_p$, where $Z^s_c$ and $Z^s_p$ are the consistent and view-specific features of the structure dependency view, and $Z^t_c$ and $Z^t_p$ are the consistent and view-specific features of the semantic view;
a similarity loss function $\mathcal{L}_{sim}$ is designed for training, which maximizes the agreement between the consistent features $Z^s_c$ and $Z^t_c$ while separating them from the view-specific features; here $\odot$ denotes element-wise (Hadamard) multiplication of the corresponding positions of two matrices;
Step S32: inputting the consistent and view-specific features of both views obtained in step S31 into a feed-forward layer; after normalization and nonlinear activation, new consistent and view-specific features of the structure dependency view and the semantic view are obtained, denoted $\tilde{Z}^s_c, \tilde{Z}^s_p$ and $\tilde{Z}^t_c, \tilde{Z}^t_p$;
Step S33: inputting the features obtained in step S32 into the consistency-graph-enhanced graph Transformer decoder to obtain the reconstructed features $\hat{X}^s$ of the structure dependency view and $\hat{X}^t$ of the semantic view, and computing a reconstruction loss over the rows of the node attribute matrices, where $X^s_i$ and $\hat{X}^s_i$ are the node representations in row $i$ of the original and reconstructed node attribute matrices of the structure view, and $X^t_j$ and $\hat{X}^t_j$ are those in row $j$ of the semantic view;
the adjacency matrices of the two views are likewise reconstructed, and a corresponding loss is computed, where $h^s_i$ and $h^t_j$ denote row $i$ and row $j$ of the representations obtained by concatenating the consistent and view-specific features produced by the encoder for the structure dependency view and the semantic view, respectively;
Step S34: concatenating the consistent features of the structure dependency view and the semantic view obtained in step S32 to obtain the consistent graph $Z = [\tilde{Z}^s_c \,\|\, \tilde{Z}^t_c]$.
Further, in step S31, for the structure view $G^s$, the corresponding consistent features $Z^s_c$ and view-specific features $Z^s_p$ are obtained from the hierarchical multi-head self-attention module, whose input concatenates the node attributes with a position encoding $P^s$ obtained by eigen-decomposition of the graph Laplacian matrix; here $d$ denotes the feature-vector dimension, $W$ the parameters of the fully connected layer, and the nonlinear mapping layer has its own parameters and mapping dimension.
Further, in step S32, the consistent features $Z^s_c$ of the structure view are passed through the feed-forward layer to obtain $\tilde{Z}^s_c$, where $\mathrm{Norm}$ denotes the normalization operation, $D$ is the degree matrix of the adjacency matrix $A^s$, $W$ denotes the graph-convolution-layer parameters, and $\sigma$ is the nonlinear activation function.
Further, in step S4, clustering the consistent graph obtained in step S3 with the k-means algorithm and extracting the microservices comprises the following steps:
for the consistent graph $Z$ obtained in step S3, the nodes are clustered with the k-means algorithm; assuming the clusters are $C_1, \dots, C_K$, the clustering loss
$\mathcal{L}_{kmeans} = \sum_{k=1}^{K} \sum_{z \in C_k} \| z - \mu_k \|^2$
is computed, where $z$ is the embedding vector of a node and $\mu_k$ is the centroid of cluster $C_k$; this loss is combined with the reconstruction losses to form a new loss function $\mathcal{L}$, whose combination weight is a hyperparameter; by minimizing $\mathcal{L}$, the parameters of the consistency-graph-enhanced graph Transformer are trained and optimized.
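Since the formula images are not reproduced in the text, the following is a hedged sketch of how the k-means loss and a weighted combination with a reconstruction loss might look; the weight `lam` and the additive combination are assumptions.

```python
def kmeans_loss(Z, assign, centroids):
    """Sum of squared distances from each embedding to its cluster centroid."""
    total = 0.0
    for z, k in zip(Z, assign):
        total += sum((zi - ci) ** 2 for zi, ci in zip(z, centroids[k]))
    return total

def combined_loss(rec_loss, Z, assign, centroids, lam=0.1):
    # Assumed combination: reconstruction loss plus weighted clustering loss.
    return rec_loss + lam * kmeans_loss(Z, assign, centroids)

# Toy embeddings, assignments, and centroids (all hypothetical).
Z = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
loss = combined_loss(2.0, Z, [0, 0, 1], {0: [0.05, 0.0], 1: [1.0, 1.0]}, lam=0.5)
```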
In a second aspect, the invention also discloses a microservice extraction system based on consistent graph clustering, comprising an information extraction module, a hierarchical multi-head self-attention module, a feed-forward layer module based on a graph convolutional neural network, and a consistency-graph-enhanced graph Transformer decoder module;
the information extraction module parses the Java source code, extracts the dependency relationships among classes and the text information used when each class is created, and constructs the structure dependency view and the semantic view;
the hierarchical multi-head self-attention module attends to the consistency and inconsistency information of the views simultaneously, learning and modeling the consistency information while removing the inconsistency information, and maximizes the consistency information of the two views with the designed similarity loss function;
the feed-forward layer module based on the graph convolutional neural network takes the output of the hierarchical multi-head self-attention module and the adjacency matrix of the corresponding view as inputs to obtain a better feature embedding representation;
the consistency-graph-enhanced graph Transformer decoder module comprises a masked multi-head self-attention module, a multi-head attention module, and a feed-forward layer module based on a graph convolutional neural network; the concatenation of the consistency and inconsistency information of each view is input into the decoder module to obtain the reconstructed node features, and a reconstruction loss function is designed from the reconstruction errors.
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program, where the computer program causes a computer to execute the micro-service extraction method based on consistent graph clustering as described above.
In a fourth aspect, the present invention also discloses an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the micro-service extraction method based on the consistent graph clustering when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
First, the microservice extraction method and system based on consistent graph clustering convert the monolithic application into structure dependency view and semantic view representations, effectively handling the rich structural and semantic information of the monolithic program.
Second, the invention provides a graph Transformer model based on consistency-graph enhancement, whose hierarchical multi-head self-attention module can model the consistent and inconsistent parts of the views simultaneously, learning and removing the inconsistent parts while keeping the consistent parts, thereby obtaining a consistent graph and realizing unified modeling of the structural and semantic information of the monolithic program.
Third, to better exploit the structural information of the graph, the method and system use a feed-forward layer built from graph convolution layers to further extract features, improving microservice extraction performance in terms of functionality and modularity.
Drawings
FIG. 1 is a flow chart of a method for microservice extraction based on consistent graph clustering in accordance with the present invention;
FIG. 2 is a frame diagram of a method for microservice extraction based on consistent graph clustering in accordance with the present invention;
Fig. 3a to 3d are ablation experimental results provided in the embodiment of the present invention, wherein fig. 3a is an ICP index, fig. 3b is an SM index, fig. 3c is an IFN index, and fig. 3d is an NED index;
FIGS. 4a to 4d are graphs of the effect of different embedding dimensions on different indicators of different data sets according to an embodiment of the present invention, wherein FIG. 4a is an ICP indicator, FIG. 4b is an SM indicator, FIG. 4c is an IFN indicator, and FIG. 4d is a NED indicator;
FIG. 5 is a comparison of the performance of different methods on the JPetStore dataset;
FIG. 6 is a comparison of the performance of different methods on the DayTrader dataset;
FIG. 7 is a comparison of the performance of different methods on the Plants dataset;
FIG. 8 is a comparison of the performance of different methods on the AcmeAir dataset.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, the invention discloses a microservice extraction method based on consistent graph clustering, which comprises, in order, monolithic-program structure dependency view construction, monolithic-program semantic view construction, feature embedding representation learning based on the consistency-graph-enhanced graph Transformer, and microservice extraction based on the k-means clustering algorithm. First, the semantic view and the structure dependency view are constructed by combining the semantic information of the monolithic program with the dependency relationships between classes; then, the consistency-graph-enhanced graph Transformer is used to generate a consistent graph of the two views, realizing unified modeling of the structural and semantic information of the monolithic program; finally, the consistent graph is clustered with the k-means algorithm to extract the microservices.
FIG. 2 shows the structure of the microservice extraction system based on consistent graph clustering. As can be seen from FIG. 2, the monolithic application is first converted into abstract syntax tree form by parsing; the dependency relationships among classes and the text information of each class are then identified from the abstract syntax tree, and the structure dependency view and the semantic view are constructed, where the structure view reflects the dependencies among classes and the semantic view reflects the semantic features of each class. Next, the consistency-graph-enhanced graph Transformer encoder separates the consistent and inconsistent information of each view, and the resulting consistent features are fused to obtain the consistent graph. Meanwhile, to extract a better representation, a graph-Transformer-based decoder reconstructs the consistent and inconsistent information of each view back into the original node features.
The microservice extraction method based on consistent graph clustering is described in detail below with reference to the accompanying drawings.
S1, construction of the monolithic-program structure dependency view. Step S1 specifically comprises the following sub-steps:
Step S11: parsing the source code of the Java application to obtain the underlying abstract syntax tree of the Java source code;
Step S12: extracting the following information from the abstract syntax tree: class names, annotation lists, variable names, reference lists, lists of methods called from other classes, lists of inherited classes, and implementation lists;
Step S13: from the extracted information, obtaining the call, inheritance, reference, and implementation dependencies between classes, counting the number of dependencies between any two classes, and obtaining the node attribute matrix $X^s$ of the structure dependency view, defined as $X^s_{ij} = a$, where $X^s_{ij}$ is the value in row $i$, column $j$ of the structure-dependency-view node attribute matrix and $a$ is the number of dependencies between node $v_i$ and node $v_j$;
Step S14: if a dependency exists between two classes, creating an edge between them to obtain the edge set $E$, and constructing the adjacency matrix $A^s$ of the structure dependency view, defined as $A^s_{ij} = 1$ if $(v_i, v_j) \in E$ and $A^s_{ij} = 0$ otherwise, where $v_i$ and $v_j$ denote nodes $i$ and $j$ and $A^s_{ij}$ is the value in row $i$, column $j$ of the structure-dependency-view adjacency matrix;
Step S15: from the node attribute matrix $X^s$ and the adjacency matrix $A^s$ of the structure view, obtaining the structure dependency view $G^s = (V, E, A^s, X^s)$, where $V$ is the node set and each node $v_i \in V$ represents a class in the overall application.
By parsing the source code of the monolithic program, the underlying abstract syntax tree of the Java source code can be obtained. Define the relations as follows: when class a calls one of the methods of class b, there is a call relationship from a to b; when class a inherits from class b, there is an inheritance relationship from a to b; when class a implements the interface class b, there is an implementation relationship from a to b; and when class b is used as the type of a method parameter in class a, there is a reference relationship from a to b. The following information is then extracted from the abstract syntax tree: class names, annotation lists, variable names, reference lists, lists of methods called from other classes, lists of inherited classes, and implementation lists. From the extracted information, the four kinds of inter-class dependencies (call, inheritance, implementation, and reference) can be obtained.
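The four dependency rules can be illustrated with a toy extractor. A real implementation walks the abstract syntax tree, whereas this regex sketch only handles the simple declaration shown and is purely illustrative (the class names are hypothetical).

```python
import re

def extract_relations(java_src):
    """Toy extraction of inherit/implement/reference relations from one
    Java class declaration; call relations would need real AST analysis."""
    rels = []
    m = re.search(
        r"class\s+(\w+)(?:\s+extends\s+(\w+))?(?:\s+implements\s+([\w,\s]+))?",
        java_src)
    cls, parent, ifaces = m.group(1), m.group(2), m.group(3)
    if parent:
        rels.append((cls, parent, "inherit"))
    if ifaces:
        for i in ifaces.split(","):
            rels.append((cls, i.strip(), "implement"))
    # Method parameter types yield reference relations.
    for ref in re.findall(r"\(\s*(\w+)\s+\w+\s*\)", java_src):
        rels.append((cls, ref, "reference"))
    return rels

src = "class OrderService extends BaseService implements Auditable { void save(Order o) {} }"
rels = extract_relations(src)
```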
The four dependencies above are used to build the structure dependency view. The dependency relationships between any two classes are analyzed from the information extracted in the previous step, and the structure dependency view $G^s = (V, E, A^s, X^s)$ is constructed, where $V$ is the node set and each node $v_i \in V$ represents a class in the overall application. $A^s \in \mathbb{R}^{N \times N}$ is the adjacency matrix, where $N$ is the number of nodes; when any call, inheritance, implementation, or reference dependency exists from node $i$ to node $j$, a directed edge from $i$ to $j$ is created and $A^s_{ij} = 1$, otherwise $A^s_{ij} = 0$. If a node $m$ has no edges to any other node, i.e., $m$ is an isolated node, it is discarded. $X^s$ is the node attribute matrix, which counts the number of dependencies between any two nodes: $X^s_{ij} = a$ indicates that there are $a$ dependencies between node $i$ and node $j$. Intuitively, the stronger the dependency between class $i$ and class $j$, the more likely they belong to the same microservice.
S2, construction of the monolithic-program semantic view. Step S2 specifically comprises the following sub-steps:
Step S21: extracting the text information used in each class definition from the information obtained in step S12, splitting each identifier in the text into separate words according to camel-case naming, and filtering out stop words;
Step S22: measuring the TF-IDF (Term Frequency-Inverse Document Frequency) value of each word to obtain a matrix $X^t \in \mathbb{R}^{N \times M}$, where $N$ is the number of nodes and $M$ is the number of all words extracted from the monolithic application, and using this matrix as the node attribute matrix of the semantic view;
Step S23: from the node attribute matrix $X^t$, computing the cosine similarity between any two classes to obtain the adjacency matrix $A^t$ of the semantic view;
Step S24: from the node attribute matrix $X^t$ and the adjacency matrix $A^t$ of the semantic view, obtaining the semantic view $G^t = (V, E^t, A^t, X^t)$, where $V$ is the node set, the same as $V$ in the structure view.
The monolithic application contains rich semantic information: class names, parameter names, method names, and variable names within methods reflect business concepts, and the annotations outline functionality. The more similar the semantic features of two classes, the more likely they implement the same business function, i.e., the greater the likelihood that they belong to the same microservice.
For each of the $N$ classes in the monolithic application, the text information used in the class definition is extracted from the source code, including class names, annotations, parameter names, method names, and variable names within methods. Each identifier in the text (method names, variable names, etc.) is first split into separate words according to camel-case naming, and stop words are then filtered out. Finally, a word vector is obtained for each class; based on these word vectors and the vocabulary, which consists of all the preprocessed words extracted from the monolithic application, the TF-IDF (Term Frequency-Inverse Document Frequency) value of each word is measured, yielding a matrix $X^t \in \mathbb{R}^{N \times M}$, where $M$ is the number of words in the vocabulary extracted from the monolithic application. This matrix is used as the node attribute matrix of the semantic view. Meanwhile, the cosine similarity between every two classes is computed to obtain the adjacency matrix $A^t$ of the semantic view. The semantic view $G^t = (V, E^t, A^t, X^t)$ is thus obtained, where $V$ is the node set, the same as $V$ in the structure view.
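The identifier preprocessing described above (camel-case splitting plus stop-word filtering) can be sketched as follows; the stop-word list is a tiny illustrative subset, not the one used by the method.

```python
import re

STOP_WORDS = {"the", "a", "of", "get", "set"}  # illustrative subset (assumption)

def split_camel(identifier):
    """Split a camelCase / PascalCase identifier into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [p.lower() for p in parts]

def preprocess(identifiers):
    words = []
    for ident in identifiers:
        words.extend(w for w in split_camel(ident) if w not in STOP_WORDS)
    return words

# Hypothetical identifiers taken from one class definition.
tokens = preprocess(["getUserName", "OrderDAO", "saveOrder"])
```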
S3, feature embedding representation learning based on the consistency-graph-enhanced graph Transformer. Step S3 specifically comprises the following sub-steps:
1) Feature embedding representation learning. The structure dependency view $G^s$ and the semantic view $G^t$ obtained above are each input into the consistency-graph-enhanced graph Transformer encoder to obtain the consistent and inconsistent information of each view. Taking the structure dependency view $G^s$ as an example, the position encoding is computed from the normalized graph Laplacian $L = I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $I$ is the identity matrix and $\tilde{D}$ is the degree matrix of $\tilde{A}$; the eigenvalues and eigenvectors of $L$ are computed, and the eigenvectors, arranged in descending order of their corresponding eigenvalues, serve as the position encoding.
The hierarchical multi-head self-attention module takes as input the concatenation of the node attributes and the position encoding, projected by a fully connected layer with parameters $W$, where $d$ is the feature-vector dimension. The module then produces the consistent features and the inconsistent features, with the nonlinear mapping layer parameters and mapping dimension as additional learnable quantities.
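The Laplacian position encoding of step 1) can be sketched as follows, assuming the standard symmetric normalized Laplacian; the descending eigenvalue ordering follows the text, and the toy adjacency matrix is hypothetical.

```python
import numpy as np

def laplacian_position_encoding(A, dim):
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)      # eigh, since L is symmetric
    order = np.argsort(vals)[::-1]      # descending eigenvalues, per the text
    return vecs[:, order[:dim]]

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # toy 3-class dependency graph
P = laplacian_position_encoding(A, dim=2)
```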
After the hierarchical multi-head attention module, the structure view and the semantic view are each split into a consistent part and a view-specific part, expressed as $Z^s_c, Z^s_p$ and $Z^t_c, Z^t_p$. A similarity loss $\mathcal{L}_{sim}$ is designed for training, which maximizes the agreement between the consistent features of the two views; here $\odot$ denotes element-wise (Hadamard) multiplication of the corresponding positions of two matrices.
To better exploit the structural information of the graph, the invention uses two graph convolution layers as the feed-forward layer to obtain a better embedded representation. Taking the consistency feature Z^con_stru of the structural view as an example, the feed-forward layer yields:
where LayerNorm denotes the layer normalization operation, D is the degree matrix of A_stru, W1 and W2 denote the parameters of the graph convolution layers, and ReLU denotes the nonlinear activation function. Through this step the updated consistency and proprietary features of the structural view and the semantic view are obtained.
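A minimal sketch of this two-layer graph-convolution feed-forward block; the self-loops and symmetric normalization are common GCN choices and are our assumption, since this text names only a degree matrix, LayerNorm, and ReLU:

```python
import numpy as np

def gcn_feedforward(H, A, W1, W2):
    # Two graph convolutions followed by LayerNorm, used as the feed-forward layer.
    A = A + np.eye(len(A))                # self-loops (assumed)
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))   # D^{-1/2} (A + I) D^{-1/2}
    relu = lambda x: np.maximum(x, 0.0)
    out = A_hat @ relu(A_hat @ H @ W1) @ W2
    mu = out.mean(axis=1, keepdims=True)  # LayerNorm over the feature dimension
    sd = out.std(axis=1, keepdims=True) + 1e-6
    return (out - mu) / sd
```

Unlike a plain linear feed-forward layer, each node's new representation mixes in its neighbors' features through A_hat, which is what the ablation variant GC-VCG-a removes.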
2) View information reconstruction.
To obtain more representative features, the node features are reconstructed with the consistent graph enhancement graph Transformer decoder: the concatenated consistency and proprietary features of each view, together with the corresponding adjacency matrix, are taken as decoder input to obtain the reconstructed node attribute matrices X̂_stru and X̂_sem, and a loss value is calculated:
where x_stru,i and x̂_stru,i denote the node representations of the i-th row in the node attribute matrix and the reconstructed node attribute matrix of the structural view, and x_sem,j and x̂_sem,j denote the node representations of the j-th row in the corresponding matrices of the semantic view.
To better learn the structural information of the views, the adjacency matrices are also reconstructed and the following loss is calculated:
where z_stru,i and z_sem,j denote the i-th and j-th rows of the representations obtained by concatenating the consistency and proprietary features produced by the encoder for the structural view and the semantic view, respectively.
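The two reconstruction terms can be sketched as below; the mean-squared attribute error and the inner-product adjacency decoder are stand-ins chosen by us, since the exact loss expressions are not reproduced in this text:

```python
import numpy as np

def attribute_recon_loss(X, X_hat):
    # Reconstruction error between original and decoded node attribute rows.
    return float(np.mean((X - X_hat) ** 2))

def adjacency_recon_loss(Z, A):
    # Decode A_hat[i, j] = sigmoid(z_i . z_j) and score it with binary
    # cross-entropy against the observed adjacency matrix.
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    eps = 1e-9
    return float(-np.mean(A * np.log(A_hat + eps)
                          + (1 - A) * np.log(1 - A_hat + eps)))
```

Both terms push the learned embeddings to retain the attribute content and the link structure of their view.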
3) Consistent graph generation. The consistency features Z^con_stru and Z^con_sem of the structural and semantic views are concatenated to obtain the consistent graph Z = [Z^con_stru || Z^con_sem].
S4, micro-service extraction based on a k-means clustering algorithm. The step S4 specifically includes the following sub-steps:
Based on the consistent graph Z obtained in step S3, the nodes are clustered with the K-means algorithm. Assuming the clusters are divided into (C_1, C_2, ..., C_k), the following loss is calculated:
L_clu = Σ_{i=1}^{k} Σ_{z∈C_i} ||z − μ_i||²
where z denotes the embedding vector of a node and μ_i denotes the centroid of cluster C_i, computed as the mean of the embedding vectors assigned to C_i.
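The clustering objective above is the standard k-means loss; a small sketch with our own helper names:

```python
import numpy as np

def kmeans_loss(Z, assign, centroids):
    # L_clu = sum over clusters i of sum_{z in C_i} ||z - mu_i||^2
    return float(sum(np.sum((Z[assign == i] - mu) ** 2)
                     for i, mu in enumerate(centroids)))

def update_centroids(Z, assign, k):
    # mu_i is the mean of the embedding vectors currently assigned to C_i.
    return np.stack([Z[assign == i].mean(axis=0) for i in range(k)])
```

In the full method this loss is one term of the overall objective, so minimizing it shapes the embeddings themselves, not just the cluster assignments.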
The similarity loss, the two reconstruction losses, and the clustering loss are combined into a new overall loss function, where α, β, γ, and δ are hyperparameters; the parameters of the consistent graph enhancement graph Transformer are trained and optimized by minimizing this loss function.
During training, the dimensions of the hierarchical multi-head self-attention module and of the two-layer graph convolution in the feed-forward layer are fixed, the number of attention heads is h = 8, and the encoder and decoder each use one layer; the hyperparameters α, β, γ, and δ in the loss function are selected automatically by a hyperparameter optimizer. The invention first pre-trains the model, initializes the clusters and centroids from the pre-training results, then formally trains the model, optimizing the whole network by back-propagation of the minimized overall loss function.
At this point the training process and micro-service extraction of the invention are complete. All experiments were performed on a server running Windows 10 (64-bit) equipped with an NVIDIA GeForce GTX 1660 Ti GPU and 16 GB of memory. The method was implemented in Python with PyTorch, and the Adam optimizer was used during training. To evaluate the invention, four public monolithic applications were used: Acme-Air, Plants, JPetstore, and DayTrader; for each application all covered classes are extracted and independent classes are removed. Acme-Air is an application implementing airline-related business; after removing independent classes, 45 classes are used to construct the structural and semantic views. Plants is an internet storefront application that uses 34 classes after removal. JPetstore is an online pet store system that uses 24 classes after removal. DayTrader is an online stock trading system that uses 59 classes after removal. Because the choice of the number of micro services K influences the result, for each data set K values in a suitable range are selected to obtain a series of results, and the median is taken as the final result; a K range is set separately for JPetstore, DayTrader, Plants, and Acme-Air.
The invention was compared with the baseline methods Bunch, CoGCN, FoSCI, MEM, Mono2Micro, GCN, and GAT. For all data sets the parameters were set as follows: 150 pre-training epochs, 100 formal training epochs, and a learning rate of 0.0005.
Performance is evaluated from four perspectives (functional independence, modularity, interaction complexity, and micro-service granularity), using four evaluation metrics:
(1) IFN (Independence of Functionality). IFN measures the number of interfaces in a micro service and can be expressed as:
IFN = (1/K) Σ_{i=1}^{K} ifn_i
where K represents the number of extracted micro services and ifn_i represents the number of interfaces in micro service i. The lower the value of IFN, the better the quality of the resulting decomposition.
(2) SM (Structural Modularity). SM measures the cohesion of classes within a micro service and the coupling of classes between micro services. It is defined as:
SM = (1/K) Σ_{i=1}^{K} μ_i / N_i² − (1 / (K(K−1)/2)) Σ_{i<j} σ_{i,j} / (2 N_i N_j)
where K represents the number of extracted micro services, μ_i represents the number of edges inside micro service i, N_i represents the number of classes in micro service i, and σ_{i,j} represents the number of edges between micro service i and micro service j. Higher cohesion and lower coupling give a higher SM value, so a higher SM is better.
(3) ICP (Inter-Partition Call Percentage). ICP measures the percentage of static calls occurring between micro services and reflects the degree of dependence among them. It can be expressed as:
ICP = Σ_{i≠j} c_{i,j} / Σ_{i,j} c_{i,j}
where K represents the number of extracted micro services and c_{i,j} represents the number of calls between micro service i and micro service j. A lower ICP value means lower coupling between micro services and better decomposition quality.
(4) NED (Non-Extreme Distribution). NED measures the granularity of the decomposition through the size of each micro service and can be expressed as:
NED = 1 − (Σ_{i=1}^{K} x_i) / K
where K represents the number of extracted micro services and x_i indicates whether micro service i has a suitable size: x_i = 1 if the number of classes in micro service i lies within the non-extreme range, and x_i = 0 otherwise. The lower the NED value, the better the decomposition. The main objective is to verify the functional independence and modularity of the micro services extracted by the invention, and the test results are evaluated mainly with IFN, SM, ICP, and NED. The results, compared with the experimental performance of the other methods, are shown in figs. 5 to 8. The invention achieves very promising results.
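The four metrics can be sketched for a given decomposition; the formulas follow the common definitions in the decomposition literature (including an assumed non-extreme size range for NED, which this text leaves unspecified), so treat them as illustrative rather than the exact equations:

```python
def decomposition_metrics(partition, calls, interfaces, lo=5, hi=20):
    # partition: list of sets of class ids, one set per microservice
    # calls: dict {(a, b): count} of static calls from class a to class b
    # interfaces: list with the interface count of each microservice
    # lo, hi: assumed non-extreme size bounds for NED
    K = len(partition)
    svc = {c: i for i, s in enumerate(partition) for c in s}
    sizes = [len(s) for s in partition]
    intra = [0] * K                       # mu_i: calls inside service i
    inter = [[0] * K for _ in range(K)]   # sigma_ij: calls between services
    for (a, b), n in calls.items():
        i, j = svc[a], svc[b]
        if i == j:
            intra[i] += n
        else:
            inter[i][j] += n
    ifn = sum(interfaces) / K                         # lower is better
    cohesion = sum(intra[i] / sizes[i] ** 2 for i in range(K)) / K
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    coupling = (sum((inter[i][j] + inter[j][i]) / (2 * sizes[i] * sizes[j])
                    for i, j in pairs) / len(pairs)) if pairs else 0.0
    sm = cohesion - coupling                          # higher is better
    total = sum(calls.values())
    cross = sum(n for (a, b), n in calls.items() if svc[a] != svc[b])
    icp = cross / total if total else 0.0             # lower is better
    ned = 1 - sum(1 for s in sizes if lo <= s <= hi) / K  # lower is better
    return ifn, sm, icp, ned
```

Such a helper makes it easy to compare candidate decompositions produced with different K values, as the median-over-K procedure above requires.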
In terms of overall performance, figs. 5 to 8 list the ICP, SM, IFN, and NED values of the different methods on the four data sets. For the monolithic program JPetstore, the invention obtains the best SM and NED values and the second-best ICP and IFN values among all methods. For DayTrader, it achieves the best results on the ICP and IFN metrics, with SM second only to Bunch. For Plants, it obtains the best IFN value; its SM result is second to MEM, and its ICP is better than FoSCI, CoGCN, GCN, and GAT. For Acme-Air, the invention clearly outperforms the other methods on the ICP and IFN metrics, and the dual-view GCN and GAT baselines constructed here also obtain good IFN values. Overall, the invention improves functionality and modularity compared with the other methods.
To examine the necessity of each component of the algorithm, an ablation study was conducted with five variants of the method: GC-VCG-stru uses only the structure dependency view; GC-VCG-sem uses only the semantic view; GC-VCG-a replaces the graph convolution layers in the autoencoder feed-forward layer with linear layers; GC-VCG-b, when constructing the node attribute matrix of the structural view, counts the four kinds of dependency relations between classes separately, obtaining four matrices that are then concatenated as the node attribute matrix before extracting micro services; GC-VCG-c uses a graph Transformer autoencoder to extract embedded representations of the structural view and the semantic view separately, then fuses and clusters the two representations to produce the micro services. The experimental results are shown in figs. 3a to 3d. The full method achieves the best results on almost all data sets and metrics, verifying that the two constructed views represent the monolithic program more comprehensively and that the consistent graph enhancement graph Transformer achieves unified modeling of the program's structural and semantic information.
Figs. 4a to 4d show the effect of different node embedding dimensions on the various metrics. For the ICP metric, Acme-Air achieves relatively good results at an embedding dimension of 32 or 256 and relatively poor results at 128, while JPetstore and Plants do relatively well at 128 and DayTrader does well at 256. For the SM metric, the four data sets each achieve relatively good results at different embedding dimensions. For the IFN metric, JPetstore, DayTrader, and Plants achieve their relatively best results at dimension 128, while Acme-Air performs worst there. For the NED metric, JPetstore and Acme-Air achieve their relatively best results at dimension 64, while DayTrader and Plants do so at 32. Since model performance varies with the feature embedding dimension across applications and metrics, the embedding dimension is set as a range, and the model automatically selects a suitable value for each data set and metric.
The classes in each extracted micro-service show very high functional similarity, indicating that each micro-service serves a single function and satisfies the single-responsibility principle; class interactions are largely concentrated inside micro-services, with few interactions between them, satisfying the high-cohesion, low-coupling property.
In summary, addressing the problem of micro-service extraction from monolithic applications, the invention models the feature representations of similar entity relations in monolithic programs and proposes a micro-service extraction method based on consistent graph clustering. It combines the rich structural and semantic information of the monolithic program to build a multi-view representation, uses the consistent graph enhancement graph Transformer for feature embedding representation learning to achieve unified modeling of the structural view and semantic view, and finally obtains candidate micro-services with the k-means clustering algorithm. Extensive validation experiments on four common data sets show that the proposed method effectively improves the modularity and functional quality of micro-service extraction.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A micro-service extraction method based on consistent graph clustering, characterized by comprising the following steps:
Step S1: extracting calling relations, inheritance relations, implementation relations and reference relations among classes in a JAVA monolithic program by adopting a static analysis strategy, obtaining a node attribute matrix of the structure dependent view and an adjacency matrix of the structure dependent view based on the dependency relationships among the classes, and constructing the structure dependent view according to the node attribute matrix of the structure dependent view and the adjacency matrix of the structure dependent view;
Step S2: extracting the text information used in the process of creating each class, generating a node attribute matrix of the semantic view based on the TF-IDF algorithm, calculating the relevance of the node embedded representations using cosine similarity to generate an adjacency matrix of the semantic view, and constructing the semantic view of the monolithic program according to the node attribute matrix of the semantic view and the adjacency matrix of the semantic view;
Step S3: respectively modeling the consistency information and inconsistency information of the structural view and the semantic view using the consistent graph enhancement graph Transformer, maximizing the consistency information using a designed similarity loss function, generating a consistent graph, and uniformly modeling the structural information and semantic information of the monolithic program;
Step S4: clustering the consistent graph obtained in step S3 based on the k-means algorithm, and extracting micro services;
in step S1, the process of constructing a structure dependency view based on the dependency relationships between classes and the adjacency matrix includes the steps of:
Step S11: analyzing source codes of JAVA application programs to obtain a bottom abstract syntax tree of JAVA source codes;
Step S12: the following information is extracted from the underlying abstract syntax tree: class names, annotation lists, variable names, reference lists, method lists called from other classes, lists of inherited classes, reference lists, and implementation lists;
step S13: according to the information extracted in the step S12, obtaining calling dependency relationship, inheritance dependency relationship, reference dependency relationship and implementation dependency relationship among classes, counting the number of the dependency relationship among any two classes, and obtaining a node attribute matrix X stru of the structure dependency view; the node attribute matrix of the structure dependent view is defined as follows:
Xstru(i,j)=a
wherein X_stru(i, j) represents the value in the i-th row and j-th column of the structure dependent view node attribute matrix, and a represents the number of dependency relationships existing between node v_i and node v_j;
Step S14: if a dependency relationship exists between two classes, an edge is created between them, yielding the edge set E and the adjacency matrix A_stru of the structure dependent view, defined as follows:
A_stru(i, j) = 1 if an edge exists between v_i and v_j, and A_stru(i, j) = 0 otherwise,
wherein v_i, v_j represent node i and node j respectively, and A_stru(i, j) represents the value in the i-th row and j-th column of the structure dependent view adjacency matrix;
Step S15: obtaining the structure dependent view G_stru = (V, A_stru, X_stru) according to the node attribute matrix X_stru and the adjacency matrix A_stru of the structural view, wherein V represents the node set and v_i ∈ V represents a class in the whole application program;
in step S2, the process of constructing the semantic view of the single program includes the following steps:
step S21: extracting text information used in each class definition according to the information obtained in the step S12, dividing each word in the text by using a hump naming method, splitting the word into a plurality of words, and filtering out stop words;
step S22: measuring the TF-IDF value of each word to obtain a matrix of size N×L, where N = |V| represents the number of nodes and L represents the number of all words extracted from the monolithic application; taking this matrix as the node attribute matrix X_sem of the semantic view;
step S23: according to the node attribute matrix X_sem, calculating the cosine similarity between any two classes to obtain the adjacency matrix A_sem of the semantic view;
Step S24: obtaining the semantic view G_sem = (V, A_sem, X_sem) according to the node attribute matrix X_sem and the adjacency matrix A_sem of the semantic view, wherein V represents the node set;
In step S3, the process of generating the coincidence map includes the steps of:
Step S31: respectively inputting the structure dependent view and the semantic view of the monolithic program into the consistent graph enhancement graph Transformer encoder, and learning the consistency information and inconsistency information of the two views, denoted Z^con_stru, Z^pri_stru and Z^con_sem, Z^pri_sem, wherein Z^con_stru and Z^pri_stru represent the consistency features and proprietary features of the structure dependent view, and Z^con_sem and Z^pri_sem represent the consistency features and proprietary features of the semantic view respectively;
the following loss functions were designed for training:
wherein the expression of the Sim function is Sim(A, B) = −Σ[(A·A^T − B·B^T) ⊙ (A·A^T − B·B^T)], where ⊙ denotes the multiplication of the corresponding positions of the two matrices;
Step S32: the consistency features and proprietary features of the structure dependent view and the semantic view obtained in step S31 are respectively input into a feed-forward layer, and after normalization and nonlinear activation operations, the new consistency features and proprietary features of the structure dependent view and the semantic view are obtained;
Step S33: inputting the consistency features and proprietary features of the structure dependent view and the semantic view obtained in step S32 into the consistent graph enhancement graph Transformer decoder to obtain the reconstructed features X̂_stru of the structure dependent view and X̂_sem of the semantic view, and calculating a loss value:
wherein x_stru,i and x̂_stru,i represent the node representations of the i-th row in the node attribute matrix and the reconstructed node attribute matrix of the structural view, and x_sem,j and x̂_sem,j represent the node representations of the j-th row in the node attribute matrix and the reconstructed node attribute matrix of the semantic view;
reconstructing the adjacency matrix of the structure dependent view and the adjacency matrix of the semantic view, and calculating the following loss:
wherein z_stru,i and z_sem,j represent the i-th row and the j-th row of the representations obtained by concatenating the consistency features and proprietary features produced by the encoder for the structure dependent view and the semantic view respectively;
step S34: concatenating the consistency features of the structure dependent view and the semantic view obtained in step S32 to obtain the consistent graph Z = [Z^con_stru || Z^con_sem].
2. The method for extracting micro services based on consistent graph clustering as claimed in claim 1, wherein in step S31, for the structural view G_stru, the corresponding consistency features Z^con_stru and proprietary features Z^pri_stru are obtained as follows: the position encoding is obtained by factorization of the Laplacian matrix of the graph, d_u represents the dimension of the eigenvectors, W is a parameter of the fully connected layer, W_Q, W_K and W_V are the nonlinear mapping layer parameters, and d_q = d_k = d_v = d_model represents the dimension of the mapping.
3. The method for extracting micro services based on consistent graph clustering as claimed in claim 1, wherein in step S32, the consistency feature Z^con_stru of the structural view is passed through the feed-forward layer to obtain the updated feature, wherein LayerNorm denotes the layer normalization operation, D is the degree matrix of the matrix A_stru, W1 and W2 represent the parameters of the graph convolution layers, and ReLU(x) = max(0, x) represents the nonlinear activation function.
4. The method for extracting micro services based on consistent graph clustering as claimed in claim 1, wherein in step S4, the process of clustering the consistent graph obtained in step S3 with the k-means algorithm and extracting micro services comprises the following steps:
For the consistent graph Z obtained in step S3, the nodes are clustered using the K-means algorithm; assuming the clusters are divided into (C_1, C_2, ..., C_k), the following loss is calculated:
L_clu = Σ_{i=1}^{k} Σ_{z∈C_i} ||z − μ_i||²
where z represents the embedding vector of a node and μ_i represents the centroid of cluster C_i; the similarity loss, the two reconstruction losses and the clustering loss are combined into a new loss function, wherein α, β, γ and δ are hyperparameters; the parameters of the consistent graph enhancement graph Transformer are trained and optimized by minimizing this loss function.
5. A micro-service extraction system based on consistent graph clustering implementing the method of claim 1, wherein the micro-service extraction system comprises an information extraction module, a hierarchical multi-head self-attention module, a feed-forward layer module based on a graph convolutional neural network and a consistent graph enhancement graph Transformer decoder module; the information extraction module analyzes the JAVA source code, extracts the dependency relationships among classes and the text information used in the class creation process, and constructs the structure dependent view and the semantic view;
the hierarchical multi-head self-attention module simultaneously focuses on consistency information and non-consistency information of views, learns and models the consistency information while removing the non-consistency information, and maximizes the consistency information of the two views based on a designed similarity loss function;
the feedforward layer module based on the graph convolution neural network obtains better characteristic embedded representation by taking the output of the hierarchical multi-head self-attention module and the adjacent matrix of the corresponding view as inputs;
The consistent graph enhancement graph Transformer decoder module comprises a masked multi-head self-attention module, a multi-head attention module and a feed-forward layer module based on the graph convolutional neural network; the concatenation of the consistent information and inconsistent information of each view is input into the decoder module to obtain the reconstructed node features, and a reconstruction loss function is designed based on the reconstruction errors.
6. A computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the consistency map clustering-based micro service extraction method according to any one of claims 1 to 4.
7. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which processor, when executing the computer program, implements a method for micro-service extraction based on consistent graph clustering as claimed in any one of claims 1 to 4.
CN202410487715.8A 2024-04-23 Microservice extraction method and system based on consistent graph clustering Active CN118093439B (en)


Publications (2)

Publication Number Publication Date
CN118093439A CN118093439A (en) 2024-05-28
CN118093439B true CN118093439B (en) 2024-07-05


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985245A (en) * 2020-08-21 2020-11-24 江南大学 Attention cycle gating graph convolution network-based relation extraction method and system
CN114647465A (en) * 2022-05-23 2022-06-21 南京航空航天大学 Single program splitting method and system for multi-channel attention-chart neural network clustering



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant