CN113760778B - Word vector model-based micro-service interface division evaluation method - Google Patents

Info

Publication number
CN113760778B
Authority
CN
China
Prior art keywords
interface
micro
service
cluster
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111316694.6A
Other languages
Chinese (zh)
Other versions
CN113760778A (en
Inventor
李莹
夏轩轩
张凌飞
朱晓莉
方燕翎
毛义华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongyi Science And Technology Co ltd
Binhai Industrial Technology Research Institute of Zhejiang University
Original Assignee
Tianjin Zhongyi Science And Technology Co ltd
Binhai Industrial Technology Research Institute of Zhejiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongyi Science And Technology Co ltd, Binhai Industrial Technology Research Institute of Zhejiang University filed Critical Tianjin Zhongyi Science And Technology Co ltd
Priority to CN202111316694.6A priority Critical patent/CN113760778B/en
Publication of CN113760778A publication Critical patent/CN113760778A/en
Application granted granted Critical
Publication of CN113760778B publication Critical patent/CN113760778B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention provides a word vector model-based micro-service interface division evaluation method comprising the following steps: the server side constructs a micro-service cluster; log data are collected to restore the distributed link calling process among all micro-service applications; model training: each graph-shaped call chain is split into linear call subchains, interface names are extracted in calling order to form an interface character string array, and the manually defined micro-service interface division set Ω is obtained; a word vector model is trained on the interface character string array to obtain a word vector for each interface name; interface division evaluation: taking the number K of micro-service application categories in the current cluster as the number of clusters, the word vectors are clustered with the K-means algorithm to obtain a cluster division set; with the K-means cluster division set as the reference, the rationality of the interface division set Ω is evaluated with the Purity algorithm. The method is based on the call relations observed while the micro-service interfaces actually run, uses mathematical methods to re-divide the interface set, compares the result with the manually divided micro-service interfaces, and guides optimization of the existing micro-service architecture.

Description

Word vector model-based micro-service interface division evaluation method
Technical Field
The invention belongs to the field of micro service interfaces, and particularly relates to a micro service interface division evaluation method based on a word vector model.
Background
The traditional monolithic application architecture is generally built on Tomcat middleware. As the system grows, this architecture increases its complexity, makes cooperation among developers difficult, and hinders smooth continuous integration and continuous delivery. In actual operation, failures easily cascade, and the architecture cannot keep up with the rapidly growing business scale of internet companies.
Compared with the traditional monolithic architecture, the micro-service architecture decomposes functions into discrete services, each sufficiently cohesive, so that system coupling is reduced. Services can be scaled horizontally and vertically and deployed independently, a problem in one service does not paralyze the whole system, and the system is not locked into a particular technology stack for the long term. Projects that adopt a micro-service architecture can achieve rapid iteration, frequent releases, and integrated development and operations.
Given these advantages, more and more companies are splitting monolithic applications into micro-service architectures. For example, the patent document with publication number CN112988122A discloses a monolithic application splitting tool and method based on the correlation between functional characteristics and micro-services, and the patent document with publication number CN111026468A discloses a backend splitting strategy based on micro-services.
However, when a monolithic application system has complex business logic, a huge code base, and numerous modules coupled together, it is challenging to work out an ideal micro-service structure by manual decomposition. Unreasonable service interface division leads to more complex service dependencies and cumulatively increases call latency between services, and sometimes even simple functions become difficult to build. As a result, development slows down and migration becomes more difficult.
To build a better micro-service architecture and reduce call latency between services, the rationality of the micro-service interface division needs to be measured and objectively evaluated.
Disclosure of Invention
In view of the above, the present invention aims to provide a micro-service interface division evaluation method based on a word vector model, so as to solve the problem of low efficiency caused by unreasonable interface division and complex inter-service dependencies.
To achieve this purpose, the technical scheme of the invention is realized as follows:
A micro-service interface division evaluation method based on a word vector model comprises the following steps:
S1, collecting data, specifically comprising the following steps:
S11, the server side constructs a micro-service cluster;
S12, collecting and restoring the distributed link calling process among the micro-service applications to form a graph-shaped call chain;
S2, setting a word vector model and inputting an interface character string array to obtain a word vector for each interface name, specifically comprising the following steps:
S21, splitting the graph-shaped call chain into m linear call subchains by depth-first search (DFS), extracting interface names in calling order to form an interface character string array, and obtaining the manually defined micro-service interface division set Ω;
S22, training the word vector model on the interface character string array from step S21 to obtain a word vector for each interface name;
S3, interface division evaluation, comprising the following steps:
S31, taking the number K of micro-service application categories as the number of clusters, and clustering the word vectors of the interface names with the K-means algorithm to obtain the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$;
S32, taking the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$ as the reference, evaluating the rationality of the manually defined micro-service interface division set Ω with the Purity algorithm.
Further, in step S11, the method by which the server side constructs the micro-service cluster comprises:
the server side constructs the micro-service cluster on the basis of Spring Cloud; the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through Feign clients.
Further, in step S12, the method for collecting and restoring the distributed link calling process among the micro-service applications and forming a graph-shaped call chain comprises:
adding the dependency of the link tracing tool SOFATracer, the Spring Cloud OpenFeign dependency, and the dependency of the data collection tool Zipkin to the configuration file of each micro-service application, and using SOFATracer to instrument the Spring Cloud OpenFeign component so as to obtain the link calling process of each micro-service application;
introducing the link collection and display tool Zipkin into each project, starting a Zipkin server to receive the link log data reported by SOFATracer, and having Zipkin clean the link log data to form a graph-shaped call chain, thereby restoring the distributed link calling process.
Further, the SOFATracer configuration parameters include:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data remotely to Zipkin;
com.alipay.sofa.tracer.zipkin.baseUrl, which specifies the Zipkin server address to which data are reported.
The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the project's log directory; each record in the log contains the following parameters:
app, the name of the current micro-service application;
url, the address of the requested interface;
traceId, the ID that SOFATracer uses to identify a unique request;
spanId, the level of the request within the whole call link;
the naming rule of the spanId is the parent spanId followed by the child's sequence number, so it carries the context of the call chain; collecting the spans that share the same traceId yields a complete link tree.
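As a rough illustration of how records sharing a traceId can be reassembled into a link tree, the following Python sketch derives each span's parent by dropping the last dot-separated component of its spanId. The dotted spanId format (for example 0, 0.1, 0.1.1), the record layout, the sample values, and the function name build_link_trees are assumptions made for the example, not details taken from the text above.

```python
from collections import defaultdict


def build_link_trees(records):
    """Group span records by traceId and attach each span to its parent.

    Returns {traceId: {spanId: [child spanIds in call order]}}.
    """
    trees = defaultdict(dict)
    for rec in records:
        trees[rec["traceId"]].setdefault(rec["spanId"], [])
    for spans in trees.values():
        # Sort numerically so that "0.2" comes before "0.10".
        ordered = sorted(spans, key=lambda s: [int(p) for p in s.split(".")])
        for span_id in ordered:
            if "." not in span_id:
                continue  # root span, no parent
            parent = span_id.rsplit(".", 1)[0]
            if parent in spans:  # assumption: the parent span was also logged
                spans[parent].append(span_id)
    return dict(trees)


# Hypothetical log records (field values are made up for illustration).
sample_records = [
    {"app": "gateway", "url": "http://host-a/order/create", "traceId": "t1", "spanId": "0"},
    {"app": "order", "url": "http://host-b/stock/check", "traceId": "t1", "spanId": "0.2"},
    {"app": "order", "url": "http://host-c/user/get", "traceId": "t1", "spanId": "0.1"},
    {"app": "user", "url": "http://host-d/device/getInfo", "traceId": "t1", "spanId": "0.1.1"},
    {"app": "gateway", "url": "http://host-a/order/list", "traceId": "t2", "spanId": "0"},
]
sample_trees = build_link_trees(sample_records)
```

Keeping only child lists per span is enough for the later depth-first traversal; the app and url fields can be looked up by spanId when interface names are needed.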
Further, in step S21, the method for extracting the interface names in calling order and forming the interface character string array comprises:
converting each linear call subchain into an interface character string whose names are separated by spaces, so that the m linear call subchains form an interface character string array of length m; each interface character string represents the interface calling process of one sub-request, and the extracted interface granularity is the parent path in the interface address, which represents a resource type within a micro-service application;
deduplicating all extracted interface names and dividing them, according to the micro-service application to which each interface name belongs, into k clusters $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$, where Ω is the manually defined micro-service interface division set and k is the number of micro-service application categories in the current cluster.
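The splitting and grouping described above can be sketched in Python as follows. The sketch assumes that a linear call subchain is a root-to-leaf path of the call tree, which is one plausible reading of the DFS step; the tree layout, the sample names, and the helper names are hypothetical.

```python
from collections import defaultdict


def split_into_subchains(children, root):
    """Return every root-to-leaf path of a call tree using an iterative DFS."""
    paths, stack = [], [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        # Push in reverse so that children are visited in call order.
        for kid in reversed(kids):
            stack.append((kid, path + [kid]))
    return paths


def to_interface_strings(children, root, api_of):
    """Turn each subchain into a space-separated string of interface names."""
    return [" ".join(api_of[s] for s in path)
            for path in split_into_subchains(children, root)]


def partition_by_app(api_app_pairs):
    """Deduplicate interface names and group them by owning application."""
    omega = defaultdict(set)
    for api, app in api_app_pairs:
        omega[app].add(api)
    return dict(omega)


# Hypothetical call tree: spanId -> child spanIds, and spanId -> interface name.
children = {"0": ["0.1", "0.2"], "0.1": ["0.1.1"]}
api_of = {"0": "sa", "0.1": "sd", "0.1.1": "sc", "0.2": "sg"}
corpus = to_interface_strings(children, "0", api_of)
omega = partition_by_app([("sa", "gateway"), ("sd", "order"),
                          ("sc", "user"), ("sg", "order"), ("sd", "order")])
```

Here corpus plays the role of the interface character string array and omega the role of Ω; in a real run one tree per traceId would be traversed and the strings concatenated into a single array.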
Further, in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library;
the specific steps of training the word vector model are as follows:
setting the dimension S of the generated word vectors, the window size C, and the minimum word frequency min_count = 1;
inputting the interface character string array and establishing a sliding window of size C over each interface character string;
the center word of the sliding window serves as the training target and the remaining words in the window serve as the input nodes of the neural network; each slide of the window generates one training sample, and repeated iterative training yields the word vector set $V = \{v_1, v_2, \ldots, v_n\}$ of the interface names.
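To make the sliding-window step concrete, the following Python sketch enumerates (context, center word) training samples from an interface character string array. It assumes that a window of size C covers the center word plus C // 2 names on each side and that the window is simply truncated at the ends of a string; both choices, and the function name cbow_samples, are assumptions of this example rather than details stated above.

```python
def cbow_samples(interface_strings, window_size=5):
    """Yield (context names, center name) pairs for CBOW training."""
    half = window_size // 2
    samples = []
    for line in interface_strings:
        names = line.split()
        for i, target in enumerate(names):
            # Names to the left and right of the center word inside the window.
            context = names[max(0, i - half):i] + names[i + 1:i + 1 + half]
            if context:
                samples.append((context, target))
    return samples


samples = cbow_samples(["a1 a2 a3 a4 a5 a6"], window_size=5)
```

With a library implementation the same corpus would be passed in as lists of tokens; the pairs above only show what each slide of the window contributes to training.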
Further, in step S31, the specific steps of clustering the word vectors of the interface names with the K-means algorithm to obtain the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$ are as follows:
taking the number K of micro-service application categories in the current cluster from step S21 as the number of clusters of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$ from the interface word vector set V as the initial mean vector of each cluster $c_i$ in the set C, and initializing each cluster as $c_i = \{\mu_i\}$, $i \in \{1, 2, \ldots, k\}$;
computing the distance $d_{ji}$ between each interface word vector $v_j$, $j \in \{1, 2, \ldots, n\}$, and each mean vector $\mu_i$, and determining the cluster label of $v_j$ from the nearest mean vector as $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$, that is, $\lambda_j \in \{1, 2, \ldots, k\}$ is the value of i at which $d_{ji}$ is smallest; the interface word vector $v_j$ is then placed into the corresponding cluster:
$c_{\lambda_j}^{(t)} = c_{\lambda_j}^{(t-1)} \cup \{v_j\}$, where $t = 1, 2, 3, \ldots$ is the iteration index and the initial cluster is $c_i^{(0)} = \{\mu_i\}$;
after one iteration is finished, recomputing the center point of each cluster $c_i \in \{c_1, c_2, \ldots, c_k\}$:
$\mu'_i = \frac{1}{|c_i|} \sum_{v \in c_i} v$
updating the mean vector $\mu_i$ of the current cluster to $\mu'_i$, and then searching again, for each interface word vector $v_j$, for the center point closest to it;
repeating the loop until the set C no longer changes between two consecutive iterations, which finally yields the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$.
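The loop above can be sketched in plain Python as shown below. Two points are assumptions of this sketch and are not taken from the text: the distance is computed as 1 minus the cosine of the angle between the vectors, so that the arg min selects the most similar mean vector, and a cluster that becomes empty keeps its previous mean. The sample vectors and function names are illustrative only.

```python
import math
import random


def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    # Assumption: distance = 1 - cosine, so smaller means more similar.
    return 1.0 - sum(x * y for x, y in zip(unit(a), unit(b)))


def kmeans(vectors, k, seed=0, max_iter=100):
    """Cluster vectors into k groups; returns (labels, mean vectors)."""
    rng = random.Random(seed)
    means = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each vector with its nearest mean.
        new_labels = [min(range(k), key=lambda i: cosine_distance(v, means[i]))
                      for v in vectors]
        if new_labels == labels:
            break  # the partition did not change between two iterations
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


# Two well-separated groups of hypothetical 2-dimensional word vectors.
vectors = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.05],
           [0.0, 1.0], [0.1, 1.0], [0.05, 0.9]]
labels, means = kmeans(vectors, k=2)
```

The label list can be converted into the set C by grouping interface names that share a label.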
Further, the specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:
normalizing the interface word vector $v_j$ and each mean vector $\mu_i$ into unit vectors;
taking the dot product of the normalized unit vectors of $v_j$ and $\mu_i$ to obtain their inner product, which is the cosine of the angle between the two vectors in vector space, and taking this cosine value as the distance $d_{ji}$ between the two vectors.
The cosine ranges over [-1, 1]: the closer the cosine between two vectors is to -1, the larger their semantic difference, and the closer it is to 1, the higher their semantic similarity.
Further, in step S32, the formula of the Purity algorithm is:
$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{i=1}^{k} \max_{j} |\omega_i \cap c_j|$
where N denotes the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ denotes the manually defined micro-service interface division set, and $C = \{c_1, c_2, \ldots, c_k\}$ denotes the K-means cluster division set;
Purity ∈ [0, 1], and a value closer to 1 indicates a more reasonable division of the micro-service interfaces.
Each cluster $\omega_i$ is assigned a category j, the assignment rule being that j is the category whose interface word vectors $v \in c_j$ occur most often in $\omega_i$; the number of occurrences of category-j word vectors in each cluster $\omega_i$ is counted, and these counts are summed and normalized to obtain the final Purity score.
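A small Python sketch of the Purity computation follows, with both partitions represented as lists of sets of interface names. The interface names and the two partitions are made-up values chosen so the arithmetic is easy to follow.

```python
def purity(omega, clusters):
    """Purity of the manual partition omega against the reference clusters."""
    n = sum(len(w) for w in omega)
    # For each manual cluster, count its largest overlap with any reference cluster.
    return sum(max(len(w & c) for c in clusters) for w in omega) / n


# Hypothetical manual partition and K-means result over six interface names.
manual = [{"sa", "sb", "sc"}, {"sd", "se"}, {"sf"}]
reference = [{"sa", "sb", "sd"}, {"sc", "se"}, {"sf"}]
score = purity(manual, reference)
```

In this example the largest overlaps are 2, 1, and 1, so the score is (2 + 1 + 1) / 6; identical partitions give a score of 1.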
Compared with the prior art, the word vector model-based micro-service interface division evaluation method has the following beneficial effects:
The method is based on the call relations observed while the micro-service interfaces actually run. It uses mathematical methods such as the word vector model, K-means clustering, and the Purity algorithm to re-divide the interface set, compares the result with the manually divided micro-service interfaces, and computes an evaluation score for the manual interface division. This score guides further optimization and adjustment of the existing micro-service architecture so that it better conforms to the micro-service principle of high cohesion and low coupling.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a micro-service interface partitioning evaluation method based on a word vector model according to the present invention;
FIG. 2 is a process diagram of a restore request call chain according to the present invention;
FIG. 3 is a diagram illustrating a word vector model according to the present invention;
FIG. 4 is a schematic diagram of the K-means clustering algorithm and the Purity algorithm according to the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in fig. 1, the micro-service interface division evaluation method based on the word vector model mainly includes a data collection stage S1, a model training stage S2, and an interface evaluation stage S3.
S1, the data collection stage, comprising the following steps:
S11, the server side constructs a micro-service cluster, and each micro-service application independently collects its instrumentation logs;
S12, collecting and restoring the distributed link calling process among the micro-service applications to form a graph-shaped call chain;
S2, the model training stage: setting a word vector model and inputting the preprocessed interface character string array to obtain the word vector representation of each interface name, specifically comprising the following steps:
S21, splitting the graph-shaped call chain into m linear call subchains by depth-first search (DFS), extracting interface names in calling order to form an interface character string array that serves as the training data of the word vector model, and obtaining the manually divided micro-service interface set Ω;
S22, training the word vector model on the interface character string array from step S21 to obtain a word vector for each interface name;
S3, the interface evaluation stage, comprising the following steps:
S31, taking the number K of micro-service application categories as the number of clusters, and clustering the word vectors of the interface names with the K-means algorithm to obtain the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$;
S32, taking the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$ as the reference, evaluating the rationality of the manually defined micro-service interface division set Ω with the Purity algorithm.
In step S11, the method by which the server side constructs the micro-service cluster is as follows:
the server side constructs the micro-service cluster on the basis of Spring Cloud; the SOFATracer dependency, the Spring Cloud OpenFeign dependency, and the Zipkin dependency are added to the pom file of each project module, and the parameters required by the link tracing tool SOFATracer and the data collection tool Zipkin are added to the configuration file of each micro-service application, including:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data remotely to Zipkin;
com.alipay.sofa.tracer.zipkin.baseUrl, which specifies the Zipkin server address to which data are reported.
After the dependencies and parameters of each micro-service project are configured, the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through Feign clients.
The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the project's log directory; each record in the log contains the following parameters:
app, the name of the current micro-service application;
url, the address of the requested interface;
traceId, the ID that SOFATracer uses to identify a unique request;
spanId, the level of the request within the whole call link.
In step S12, the method for collecting and restoring the distributed link calling process among the micro-service applications and forming a graph-shaped call chain is as follows:
a Zipkin server is started, and the SOFATracer component integrated into each micro-service application reports the Spring Cloud OpenFeign summary log to the Zipkin server; optionally, depending on the data volume, the Zipkin server is configured to persist the log data to a database such as MySQL or Elasticsearch.
As shown in fig. 2, the reported link log data are first extracted from the database. Records with the same traceId come from the same request. The naming rule of the spanId parameter in each record is the parent spanId followed by the child's sequence number, so it carries the context of the call chain, and the position of each record within the call chain of its request is restored according to its spanId. The format of the request url is "micro-service application address/micro-service resource class name/method name in the class", for example "http://122.224.64.250:8083/device/getInfo";
the resource segment of the url parameter, such as device, is extracted as the interface api of the record, and finally each request is restored to a graph-shaped call chain. As shown in the first dotted box of fig. 2, A, B, …, G denote records with the same traceId in the database; traceId and spanId are parameters carried by the records, whereas api is a parameter produced by this extraction rather than carried in the log.
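The api extraction can be sketched with the Python standard library as follows. The sketch assumes that the resource name is always the first path segment of the url, as in the example address above; the function name extract_api is hypothetical.

```python
from urllib.parse import urlparse


def extract_api(url):
    """Return the first path segment of a request url, or '' if there is none."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""


api = extract_api("http://122.224.64.250:8083/device/getInfo")
```

If a deployment places a common prefix (such as a version segment) before the resource name, the index of the segment to keep would need to be adjusted accordingly.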
In step S21, the method for generating the word vector model training data is as follows:
the link data of each request are traversed by depth-first search (DFS), and all graph-shaped call chains are split into m linear call subchains, as shown in the second dotted box of fig. 2. Each subchain is then traversed, the api parameter of each record is extracted in calling order, and each linear call subchain is converted into an interface character string whose names are separated by spaces, such as "sa sd sc sg". Each interface character string represents the interface calling process of one sub-request, and the m linear call subchains form an interface character string array of length m.
All extracted interface names (sa, sb, sc, and so on) are deduplicated and divided, according to the micro-service application name to which each belongs, into k clusters $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$, where Ω is the manually defined micro-service interface division set and k is the number of micro-service application categories in the current cluster.
The interface character string array serves as the training corpus of the word vector model in step S22.
As shown in fig. 3, in step S22 the word vector model is the CBOW model among the word vector models provided by the Python gensim library. The CBOW model is a three-layer neural network comprising an input layer, a hidden layer, and an output layer;
the specific steps of training the word vector model are as follows:
setting the training parameters of the word vector model: the dimension of the generated word vectors S = 100, the window size C = 5, and the minimum word frequency min_count = 1 (no interface that appears on a request link should be ignored);
an interface character string array is input, and a sliding window of size C is established over each interface character string; a1, a2, …, a6 in fig. 3 denote the interface names contained in one interface character string. The center word a3 of the sliding window serves as the training target, and the remaining words a1, a2, a4, and a5 in the window serve as the input nodes of the neural network. Each interface name is converted into an N-dimensional one-hot code, where N is the number of extracted, deduplicated interface names. The one-hot codes of the 4 input nodes are each multiplied by the shared input weight matrix $W_{N \times S}$ to obtain 4 vectors, whose weighted average gives an S-dimensional hidden layer vector. The hidden layer vector is multiplied by the output weight matrix $W'_{N \times S}$ to obtain the output vector, which is compared with the one-hot code of the center word a3 to update the weight matrices W and W'. Each slide of the window generates one training sample. After repeated iterative training, the output weight matrix $W'_{N \times S}$ is taken as the interface word vector matrix, each row of which corresponds to an S-dimensional interface word vector, and finally the word vector set $V = \{v_1, v_2, \ldots, v_n\}$ of the interface names is obtained; the distribution of the set V in space is shown in the first dashed box of fig. 4.
Interface word vectors with similar contexts in the call chain lie close to each other in the space, whereas interface word vectors with very different contexts lie far apart.
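The forward pass and weight update described above can be illustrated with a single training step written in plain Python. This is a simplified sketch: it uses an unweighted average for the hidden layer, a full softmax with cross-entropy loss, and plain gradient descent, all of which are assumptions of the example; library implementations typically use faster approximations. The matrix sizes, learning rate, and function name are illustrative.

```python
import math
import random


def cbow_step(W_in, W_out, context_ids, target_id, lr=0.1):
    """Run one CBOW update in place and return the loss before the update."""
    S = len(W_in[0])
    # Hidden layer: average of the input-matrix rows selected by the context.
    h = [sum(W_in[c][d] for c in context_ids) / len(context_ids) for d in range(S)]
    # Output layer: one score per interface name, turned into probabilities.
    scores = [sum(w * x for w, x in zip(row, h)) for row in W_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target_id])
    # Error against the one-hot code of the center word.
    err = [p - (1.0 if n == target_id else 0.0) for n, p in enumerate(probs)]
    grad_h = [sum(err[n] * W_out[n][d] for n in range(len(W_out))) for d in range(S)]
    for n, row in enumerate(W_out):
        for d in range(S):
            row[d] -= lr * err[n] * h[d]
    for c in context_ids:
        for d in range(S):
            W_in[c][d] -= lr * grad_h[d] / len(context_ids)
    return loss


# Toy setup: N = 6 interface names (a1..a6), S = 3 dimensions.
rng = random.Random(0)
N, S = 6, 3
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(S)] for _ in range(N)]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(S)] for _ in range(N)]
# Context a1, a2, a4, a5 (ids 0, 1, 3, 4) predicting center word a3 (id 2).
losses = [cbow_step(W_in, W_out, [0, 1, 3, 4], 2) for _ in range(50)]
```

After training, each row of W_out corresponds to one interface name, matching the description of W' as the interface word vector matrix.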
In step S31, the specific steps of clustering the word vectors of the interface names with the K-means algorithm to obtain the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$ are as follows:
taking the number K of micro-service application categories in the current cluster from step S21 as the number of clusters of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$ from the interface word vector set V as the initial mean vector of each cluster $c_i$ in the set C, and initializing each cluster as $c_i = \{\mu_i\}$, $i \in \{1, 2, \ldots, k\}$;
computing the distance $d_{ji}$ between each interface word vector $v_j$, $j \in \{1, 2, \ldots, n\}$, and each mean vector $\mu_i$, and determining the cluster label of $v_j$ from the nearest mean vector as $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$, that is, $\lambda_j \in \{1, 2, \ldots, k\}$ is the value of i at which $d_{ji}$ is smallest; the interface word vector $v_j$ is then placed into the corresponding cluster:
$c_{\lambda_j}^{(t)} = c_{\lambda_j}^{(t-1)} \cup \{v_j\}$, where $t = 1, 2, 3, \ldots$ is the iteration index and the initial cluster is $c_i^{(0)} = \{\mu_i\}$;
after one iteration is finished, recomputing the center point of each cluster $c_i \in \{c_1, c_2, \ldots, c_k\}$:
$\mu'_i = \frac{1}{|c_i|} \sum_{v \in c_i} v$
updating the mean vector $\mu_i$ of the current cluster to $\mu'_i$, and then searching again, for each interface word vector $v_j$, for the center point closest to it;
repeating the loop until the set C no longer changes between two consecutive iterations, which finally yields the K-means cluster division set $C = \{c_1, c_2, \ldots, c_k\}$.
The specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:
normalizing the interface word vector $v_j$ and each mean vector $\mu_i$ into unit vectors;
taking the dot product of the normalized unit vectors of $v_j$ and $\mu_i$ to obtain their inner product, which is the cosine of the angle between the two vectors in vector space, and taking this cosine value as the distance $d_{ji}$ between the two vectors.
The cosine ranges over [-1, 1]: the closer the cosine between two vectors is to -1, the larger their semantic difference, and the closer it is to 1, the higher their semantic similarity.
In step S32, the Purity algorithm formula is:
$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{i=1}^{k} \max_{j} |\omega_i \cap c_j|$
where N denotes the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ denotes the manually defined micro-service interface division set, and $C = \{c_1, c_2, \ldots, c_k\}$ denotes the K-means cluster division set;
Purity ∈ [0, 1], and a value closer to 1 indicates a more reasonable division of the micro-service interfaces.
The process of the Purity algorithm is shown in fig. 4, where solid circles represent interface word vectors not yet classified by the K-means algorithm, and open circles, open triangles, and open squares represent interface word vectors assigned to different classes by the K-means algorithm. The second dashed box in fig. 4 shows the distribution of the interface word vectors in the set C, and the third dashed box shows their distribution in the set Ω. The Purity formula assigns each cluster $\omega_i$ a category j, the assignment rule being that j is the category whose interface word vectors $v \in c_j$ occur most often in $\omega_i$; the number of occurrences of category-j interface word vectors in each cluster $\omega_i$ is counted, and these counts are summed and normalized to obtain the final Purity score.
Based on the call relationships observed while the micro-service interfaces are actually running, the invention uses a word vector model, K-means clustering and the Purity algorithm to re-partition the interface set, compares the result with the manually partitioned micro-service interfaces, and computes an evaluation score for the manual interface partition. The score guides further optimization and adjustment of the existing micro-service architecture so that it better conforms to the micro-service principle of high cohesion and low coupling.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A micro-service interface division evaluation method based on a word vector model, characterized by comprising the following steps:
S1, collecting data, specifically comprising the following steps:
S11, the server side constructs a micro-service cluster;
S12, collecting and restoring the distributed link call process among the micro-service applications to form a graph-shaped call chain;
S2, setting a word vector model and inputting an interface string array to obtain the word vectors of the interface names, specifically comprising the following steps:
S21, dividing the graph-shaped call chain into m linear call sub-chains by depth-first search (DFS), extracting the interface names in call order to form an interface string array, and obtaining the manual micro-service interface partition set $\Omega$;
S22, inputting the interface string array from step S21 into the configured word vector model to obtain the word vectors of the interface names;
S3, interface division evaluation, comprising the following steps:
S31, taking the number of categories $k$ of micro-service applications in the current cluster as the number of clusters, and clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $C = \{c_1, c_2, \dots, c_k\}$ of the K-means algorithm;
S32, taking the cluster partition set $C = \{c_1, c_2, \dots, c_k\}$ of the K-means algorithm as the reference, and evaluating the rationality of the manual micro-service interface partition set $\Omega$ with the Purity algorithm.
2. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S11 the server side constructs the micro-service cluster as follows:
the server side builds the micro-service cluster on Spring Cloud and enables the service-discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients on the startup class of each micro-service application, and the micro-service applications call one another through the Feign client.
3. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S12 the distributed link call process among the micro-service applications is collected and restored into a graph-shaped call chain as follows:
adding the dependencies of the link-tracing tool SOFATracer, Spring Cloud OpenFeign and the data-collection tool Zipkin to the configuration file of each micro-service application, and using SOFATracer to instrument the Spring Cloud OpenFeign component so as to obtain the link call process of each micro-service application;
introducing the link collection and display tool Zipkin into each project, starting a Zipkin server, receiving the link log data reported by SOFATracer, and cleaning the link log data to form the graph-shaped call chain, thereby restoring the distributed link call process.
4. The micro-service interface division evaluation method based on the word vector model according to claim 3, wherein the configuration parameters of SOFATracer include:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data remotely to Zipkin;
com.alipay.sofa.tracer.zipkin.baseUrl, the address of the Zipkin server to which the data is reported;
the Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and each record in the log contains the following parameters:
app, the name of the current micro-service application;
url, the request interface address;
traceId, the ID that identifies a unique request in SOFATracer;
spanId, the level of the request within the whole call link;
a spanId is named as the parent spanId plus the sequence number of the child and therefore carries the call-chain context, and the spanIds sharing the same traceId are collected to form a complete link tree.
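The way spanIds with the same traceId combine into a link tree can be illustrated with a short sketch. It assumes dot-separated spanIds such as `0`, `0.1` and `0.1.1`; the separator is an assumption of this sketch, since the text only states that a spanId is the parent spanId plus a child number. The function name `build_link_trees` and the sample records are hypothetical, while the field names follow the log parameters listed above.

```python
from collections import defaultdict

def build_link_trees(records):
    """Group log records by traceId and link each span to its parent span.

    Returns {traceId: {parent_spanId: [child_spanId, ...]}}; the root span is
    stored under the parent key None. Assumes dot-separated spanIds.
    """
    trees = defaultdict(lambda: defaultdict(list))
    for rec in records:
        span_id = rec["spanId"]
        # The parent spanId is everything before the last dot (assumed format).
        parent = span_id.rsplit(".", 1)[0] if "." in span_id else None
        trees[rec["traceId"]][parent].append(span_id)
    return {trace: dict(tree) for trace, tree in trees.items()}

# Made-up summary-log records for one request.
records = [
    {"traceId": "t1", "spanId": "0", "app": "gateway", "url": "/order/create"},
    {"traceId": "t1", "spanId": "0.1", "app": "order", "url": "/stock/lock"},
    {"traceId": "t1", "spanId": "0.2", "app": "order", "url": "/pay/charge"},
    {"traceId": "t1", "spanId": "0.1.1", "app": "stock", "url": "/log/write"},
]
trees = build_link_trees(records)
```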
5. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S21 the interface names are extracted in call order to form the interface string array as follows:
converting each linear call sub-chain into an interface string whose interface names are separated by spaces, so that the m linear call sub-chains form an interface string array of length m, each interface string representing the interface call process of one sub-request, and the granularity of an extracted interface being the parent path in the interface address, which represents a resource class name in the micro-service application;
de-duplicating all extracted interface names and dividing them, according to the category of the micro-service application to which each interface name belongs, into $k$ clusters $\Omega = \{\omega_1, \omega_2, \dots, \omega_k\}$, where $\Omega$ denotes the manual micro-service interface partition set and $k$ denotes the number of categories of micro-service applications in the current cluster.
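The splitting of a graph-shaped call chain into linear sub-chains, and the two steps above, can be sketched as follows. This is a minimal illustration under stated assumptions: the call graph is given as an adjacency mapping, and the names `linear_subchains`, `graph` and `app_of` as well as all sample values are hypothetical.

```python
def linear_subchains(graph, root):
    """Enumerate every root-to-leaf path of a call graph by depth-first search."""
    chains = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        # Skip callees already on the current path so a cyclic call cannot loop forever.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            chains.append(path)
            continue
        # Push in reverse so callees are explored in their original call order.
        for callee in reversed(callees):
            stack.append((callee, path + [callee]))
    return chains

# Made-up call graph keyed by the parent path of each interface address.
graph = {"/order": ["/stock", "/pay"], "/stock": ["/log"], "/pay": [], "/log": []}
chains = linear_subchains(graph, "/order")

# One space-separated interface string per linear sub-chain.
interface_strings = [" ".join(chain) for chain in chains]

# Hypothetical mapping from interface name to the micro-service application that owns it.
app_of = {"/order": "order-app", "/stock": "stock-app", "/log": "stock-app", "/pay": "pay-app"}
omega = {}
for name in {n for chain in chains for n in chain}:  # de-duplicated interface names
    omega.setdefault(app_of[name], set()).add(name)
```

Here `interface_strings` plays the role of the interface string array of length m, and the values of `omega` play the role of the clusters in the manual partition set.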
6. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S22 the word vector model is the CBOW model among the word vector models provided by the Python gensim library;
the word vector model is trained by the following specific steps:
setting the dimension S of the generated word vectors, the window size C, and the minimum word frequency min_count = 1;
inputting the interface string array and establishing a sliding window of size C on each interface string;
taking the center word of the sliding window as the training target and the remaining words in the window as the input nodes of the neural network, generating one piece of training data each time the window slides, and obtaining, through repeated iterative training, the word vector representation set $V = \{v_1, v_2, \dots, v_n\}$ of the interface names.
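How the sliding window produces CBOW training data can be shown with a small pure-Python sketch. It only generates the (context, target) pairs and does not train a network. The function name `cbow_pairs` is hypothetical, and `window` is taken here as the number of words on each side of the center word, which is an assumption, since the text only gives a window size C.

```python
def cbow_pairs(interface_strings, window):
    """Return (context_words, target_word) pairs from space-separated interface strings.

    `window` is the number of words taken on each side of the center word
    (an assumption of this sketch).
    """
    pairs = []
    for line in interface_strings:
        words = line.split()
        for i, target in enumerate(words):
            # Words before and after the center word, clipped at the string boundaries.
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            if context:
                pairs.append((context, target))
    return pairs

# Made-up interface string array with two call sub-chains.
pairs = cbow_pairs(["/order /stock /log", "/order /pay"], window=1)
```

Each pair corresponds to one position of the window: the context words are the inputs and the center word is the prediction target.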
7. The micro-service interface division evaluation method based on the word vector model according to claim 5, wherein in step S31 the word vectors of the interface names are clustered with the K-means algorithm to obtain the cluster partition set $C = \{c_1, c_2, \dots, c_k\}$ of the K-means algorithm by the following specific steps:
taking the number of categories $k$ of micro-service applications in the current cluster as the number of clusters of the K-means algorithm, first randomly selecting $k$ vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ from the interface word vector set $V$ as the initial mean vectors of the clusters $c_i$ in the set $C$, and initializing each cluster $c_i = \{\mu_i\}$, $i \in \{1, 2, \dots, k\}$;
computing the distance $d_{ji}$ between each interface word vector $v_j$, $j \in \{1, 2, \dots, n\}$, and each mean vector $\mu_i$, determining the cluster label of $v_j$ from the nearest mean vector as $\lambda_j = \arg\min_{i \in \{1, 2, \dots, k\}} d_{ji}$, where $\lambda_j \in \{1, 2, \dots, k\}$ is the value of $i$ at which the distance $d_{ji}$ is minimal, and assigning the interface word vector $v_j$ to the corresponding cluster
$$c_{\lambda_j} = c_{\lambda_j} \cup \{v_j\}$$
Initial
Figure FDA0003430787020000042
after one iteration is finished, for each cluster $c_i \in \{c_1, c_2, \dots, c_k\}$, recalculating the center point
$$\mu'_i = \frac{1}{\lvert c_i \rvert} \sum_{v \in c_i} v$$
updating the mean vector $\mu_i$ of the current cluster to $\mu'_i$, and then searching again, for each interface word vector $v_j$, for the center point closest to it;
repeating the loop until the set $C$ no longer changes between two successive iterations, finally obtaining the cluster partition set $C = \{c_1, c_2, \dots, c_k\}$ of the K-means algorithm.
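The clustering loop described in this claim can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the patented implementation: the names `cosine_distance` and `kmeans`, the fixed random seed and the iteration cap are hypothetical. The text takes the cosine value itself as $d_{ji}$ while selecting the arg min; because a larger cosine means greater similarity, this sketch uses 1 minus the cosine as the distance so that the nearest mean vector is the most similar one.

```python
import math
import random

def cosine_distance(a, b):
    """1 - cos(a, b): small when the vectors point the same way (sketch's assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def kmeans(vectors, k, seed=0, max_iter=100):
    """Return one cluster label per vector, following the steps in the text."""
    rng = random.Random(seed)
    means = [list(v) for v in rng.sample(vectors, k)]  # k random initial mean vectors
    labels = None
    for _ in range(max_iter):
        # Assign each vector to the cluster whose mean vector is nearest.
        new_labels = [
            min(range(k), key=lambda i: cosine_distance(v, means[i])) for v in vectors
        ]
        if new_labels == labels:  # partition unchanged between two iterations
            break
        labels = new_labels
        # Recompute each mean vector as the average of its cluster members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:  # keep the old mean if a cluster ends up empty
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up two-dimensional word vectors forming two clearly separated groups.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = kmeans(vectors, k=2)
```

The label values themselves depend on the random initial selection; only the grouping of the vectors is meaningful.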
8. The micro-service interface division evaluation method based on the word vector model according to claim 7, wherein the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is computed as follows:
normalizing the interface word vector $v_j$ and each mean vector $\mu_i$ into unit vectors;
computing the dot product of the normalized unit vectors of $v_j$ and $\mu_i$ to obtain their inner product, i.e. the cosine of the angle between them in the vector space, and taking this cosine value as the distance $d_{ji}$ between the two vectors.
9. The micro-service interface division evaluation method based on the word vector model according to claim 5, wherein in step S32 the Purity formula is:
$$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{i=1}^{k} \max_{j \in \{1, 2, \dots, k\}} \lvert \omega_i \cap c_j \rvert$$
where $N$ denotes the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \dots, \omega_k\}$ denotes the manual micro-service interface partition set, and $C = \{c_1, c_2, \dots, c_k\}$ denotes the cluster partition set of the K-means algorithm;
$\mathrm{Purity} \in [0, 1]$, and the closer the value is to 1, the more reasonable the partition of the micro-service interfaces.
CN202111316694.6A 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method Active CN113760778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111316694.6A CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111316694.6A CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Publications (2)

Publication Number Publication Date
CN113760778A CN113760778A (en) 2021-12-07
CN113760778B true CN113760778B (en) 2022-02-08

Family

ID=78784664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111316694.6A Active CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Country Status (1)

Country Link
CN (1) CN113760778B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115061836B (en) * 2022-08-16 2022-11-08 浙江大学滨海产业技术研究院 Micro-service splitting method based on graph embedding algorithm for interface layer
CN116112569B (en) * 2023-02-23 2023-07-21 安超云软件有限公司 Micro-service scheduling method and management system
CN117311801B (en) * 2023-11-27 2024-04-09 湖南科技大学 Micro-service splitting method based on networking structural characteristics

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6965886B2 (en) * 2001-11-01 2005-11-15 Actimize Ltd. System and method for analyzing and utilizing data, by executing complex analytical models in real time
CN107580018A (en) * 2017-07-28 2018-01-12 北京北信源软件股份有限公司 The tracking and device of a kind of distributed system
CN109670022A (en) * 2018-12-13 2019-04-23 南京航空航天大学 A kind of java application interface use pattern recommended method based on semantic similarity
CN109921927A (en) * 2019-02-20 2019-06-21 苏州人之众信息技术有限公司 Real-time calling D-chain trace method based on micro services
CN109948710A (en) * 2019-03-21 2019-06-28 杭州电子科技大学 Micro services recognition methods based on API similarity
CN110262972A (en) * 2019-06-17 2019-09-20 中国科学院软件研究所 A kind of failure testing tool and method towards micro services application
CN111459760A (en) * 2020-04-01 2020-07-28 交通银行股份有限公司太平洋信用卡中心 Micro-service monitoring method and device and computer storage medium
CN111459766A (en) * 2019-11-14 2020-07-28 国网浙江省电力有限公司信息通信分公司 Calling chain tracking and analyzing method for micro-service system
CN111552509A (en) * 2020-04-30 2020-08-18 深圳前海微众银行股份有限公司 Method and device for determining dependency relationship between interfaces
CN111651451A (en) * 2020-04-25 2020-09-11 复旦大学 Scene-driven single system micro-service splitting method
CN111984346A (en) * 2020-08-12 2020-11-24 八维通科技有限公司 Method, system, device and storage medium for call chain tracking in micro-service environment
CN112148254A (en) * 2019-06-27 2020-12-29 Sap欧洲公司 Application evaluation system for achieving interface design consistency between microservices
CN112650614A (en) * 2020-12-30 2021-04-13 平安消费金融有限公司 Call chain monitoring method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107957989B9 (en) * 2017-10-23 2021-01-12 创新先进技术有限公司 Cluster-based word vector processing method, device and equipment
CN110377686B (en) * 2019-07-04 2021-09-17 浙江大学 Address information feature extraction method based on deep neural network model
US11693849B2 (en) * 2019-11-21 2023-07-04 Dell Products L.P. Consistent structured data hash value generation across formats and platforms

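Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A scenario-driven, bottom-up microservice decomposition method for monolithic systems" (original title: 场景驱动且自底向上的单体系统微服务拆分方法); Ding Dan et al.; Journal of Software (软件学报); 2020-11-30; Vol. 31, No. 11; full text *
"Microservice granularity evaluation from the bounded-context perspective" (original title: 限界上下文视角下的微服务粒度评估); Zhong Chenxing et al.; Journal of Software (软件学报); 2019-10-31; Vol. 30, No. 10; full text *
"Research progress on microservice-oriented software development methods" (original title: 面向微服务软件开发方法研究进展); Wu Huayao; Journal of Computer Research and Development (计算机研究与发展); 2020-03-15; Vol. 57, No. 3; full text *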

Also Published As

Publication number Publication date
CN113760778A (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant