CN113743467B

CN113743467B - Use case diagram similarity judging method based on maximum common subgraph calculation

Info

Publication number: CN113743467B
Application number: CN202110886966.XA
Authority: CN
Inventors: 汪烨; 宋师哲; 周澳回; 姜波
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2024-01-12
Anticipated expiration: 2041-08-03
Also published as: CN113743467A

Abstract

The invention belongs to the technical field of software development and discloses a use case diagram similarity judging method based on maximum common subgraph calculation, comprising the following steps: step 1: preprocessing the UML use case diagrams to be compared and representing each as a directed graph; step 2: calculating the maximum common subgraph of the directed graphs to be compared; step 3: calculating the similarity using a similarity determination algorithm. The maximum common subgraph algorithm used by the method follows a simple procedure and analyzes the graph structure directly, which makes it efficient and convenient to use and gives the method strong applicability.

Description

Use case diagram similarity judging method based on maximum common subgraph calculation

Technical Field

The invention belongs to the technical field of software development, and particularly relates to a use case diagram similarity judging method based on maximum common subgraph calculation.

Background

In software development, software reuse strategies are frequently used: existing software artifacts, including code fragments, designs, test data, and cost estimates, are used to build new software. Software reuse saves development cost and time and improves the development process. As software grows more complex, reuse is no longer limited to code and now extends to every stage of the software lifecycle, including requirements analysis, design, testing, and even maintenance.

Requirements analysis is a key basis for software design, implementation, testing, and maintenance, and gives developers their working direction and development strategy. The use case diagram is the simplest representation of the interaction between users and a system. Because it is intuitive, standardized, and accurate, it has become the most commonly used tool in the requirements analysis stage and plays a vital role in collecting software requirements. Reusing use case diagrams helps developers build new use case models in a short time and determine requirements quickly, which increases work efficiency. However, the semi-structured nature of use case diagrams makes them difficult to reuse.

In research on judging the similarity of UML use case diagrams, Reza Fauzan, Daniel Siahaan, Siti Rochimah et al. proposed a method that combines cosine similarity with word-level semantic similarity to determine the similarity between two UML use case diagrams. The method preprocesses and structures the two diagrams and judges the semantic similarity of two words using the Wu-Palmer measure and WordNet. The Wu-Palmer measure yields a similarity value in the range [0, 1] for two words in WordNet, where 0 indicates no similarity and 1 indicates complete similarity. Because the text appearing in a UML use case diagram may consist of several words, the method combines cosine similarity with the Wu-Palmer measure for a comprehensive measurement. The disadvantage of this approach is that it cannot correctly judge the similarity of two UML use case diagrams that come from different projects but are similar in structure.

As shown in fig. 1 and fig. 2, if the two UML use case diagrams are compared using the semantic similarity method proposed by Reza Fauzan et al., they will be judged dissimilar (i.e., the value falls below the set threshold) because their text similarity is low. In reality, however, the two diagrams differ only in text content and are very similar in structure; in an actual software reuse process, the UML use case diagram shown in fig. 1 can be reused to generate the UML use case diagram shown in fig. 2. This gap between the expected and the actual result shows that, when reusing UML use case diagrams, we need to consider not only text semantic similarity but also the influence of structural similarity. The present method formalizes the two UML use case diagrams and calculates their maximum common subgraph as the basis for similarity judgment, which effectively addresses the weak applicability of semantic similarity to structural similarity judgment.

Furthermore, Mohammad Nazir Arifin, Daniel Siahaan et al. proposed a method that determines the similarity between two UML use case diagrams using minimum edit distance and word-level semantic similarity. This method likewise preprocesses and structures the two diagrams, measures their structural similarity by the minimum edit distance between them, and then combines the structural similarity with the text semantic similarity under certain weights to obtain the final similarity. However, the minimum edit distance algorithm used in that method is complex and has high time complexity, so its execution efficiency is low. The maximum common subgraph algorithm used by the present method follows a simple procedure and analyzes the graph structure directly, which makes it efficient and convenient to use and gives the method strong applicability.

Disclosure of Invention

The invention aims to provide a use case diagram similarity judging method based on maximum common subgraph calculation, which supports software reuse and ultimately improves software development efficiency.

To solve the above technical problem, the specific technical scheme of the use case diagram similarity judging method based on maximum common subgraph calculation is as follows:

a use case diagram similarity judging method based on maximum common subgraph calculation comprises the following steps:

step 1: preprocessing the UML use case diagrams to be compared and representing each as a directed graph;

step 2: calculating the maximum common subgraph of the directed graphs to be compared;

step 3: calculating the similarity using a similarity determination algorithm.

Further, the UML use case diagram is composed of participants, use cases, and relationships among elements, wherein the participants of the UML use case diagram are users, organizations, or external systems that interact with the application or system; the use cases are functions contained in the system; and the relationships among elements comprise association, generalization, inclusion, and extension relationships.

Further, the step 1 comprises the following specific steps:

step 1.1: the preprocessing of the UML use case diagram extracts the elements and data in the diagram by converting it into a structured markup format; the known UML use case diagrams are built with an open-source UML modeling tool, so each use case diagram (model) is exported as an XML Metadata Interchange (XMI) file;

step 1.2: parsing the XMI file and representing the elements as a directed graph; let g(V, E) be a graph comprising a vertex set V and a directed edge set E, where a vertex in V represents a participant or a use case, and a directed edge in E represents a relationship between participants, between a participant and a use case, or between use cases.

Further, the theoretical basis of UML use case diagram similarity judgment based on the maximum common subgraph is:

the closer two graphs are in structure, the more parts they have in common, i.e., a common subgraph exists between them; the maximum common subgraph of the two graphs can therefore be used to compare their degree of structural similarity; the related concepts are defined as follows:

definition one (subgraph): given two graphs c1(Vc, Ec) and g(V, E), graph c1 is called a subgraph of graph g, written $c_1 \subseteq g$, if and only if:

(1) $V_c \subseteq V$;

(2) $E_c \subseteq E$;

definition two (maximum common subgraph): given two graphs g1 and g2, if there is a graph m that satisfies the following conditions:

(1) $m \subseteq g_1$;

(2) $m \subseteq g_2$;

and no graph m′ satisfies the following conditions:

(1) $m' \subseteq g_1$;

(2) $m' \subseteq g_2$;

(3) $|m'| > |m|$;

then graph m is the maximum common subgraph of graphs g1 and g2, denoted maxcsg(g1, g2);

step 2.1: denoting the maximum common subgraph of g1 and g2 as maxcsg, traversing and comparing the nodes of g1 and g2, and taking the nodes of the same type that are common to g1 and g2 as the nodes of maxcsg;

step 2.2: traversing again the nodes of the graph maxcsg obtained in step 2.1, and, if two nodes are adjacent in both g1 and g2 and the edges connecting them are of the same type, generating an edge of the corresponding type and adding it to maxcsg;

step 2.3: obtaining the maximum common subgraph maxcsg of g1 and g2.

Further, the step 3 comprises the following specific steps:

after the maximum common subgraph maxcsg is obtained, the similarity is calculated from the proportion of the nodes and edges of maxcsg in the compared graphs, and the similarity calculation formula is as follows:

$$\mathrm{SimUseCase}(g_1, g_2) = \alpha \sum_{v \in VT} \gamma_v \frac{\mathrm{VertexNum}(maxcsg, v)}{\max(\mathrm{VertexNum}(g_1, v), \mathrm{VertexNum}(g_2, v))} + \beta \sum_{x \in ET} \theta_x \frac{\mathrm{EdgeNum}(maxcsg, x)}{\max(\mathrm{EdgeNum}(g_1, x), \mathrm{EdgeNum}(g_2, x))}$$

wherein VT represents the element types existing in the UML use case diagram, namely participants (including ordinary participants and generalized participants) and use cases (including ordinary use cases, generalized use cases, extended use cases, and included use cases); $\gamma_v$ represents the manually defined weight set for nodes of each element type, with $\sum_{v \in VT} \gamma_v = 1$; VertexNum(maxcsg, v) represents the number of nodes of type v in the maximum common subgraph maxcsg; max(VertexNum(g1, v), VertexNum(g2, v)) represents the larger of the numbers of nodes of the corresponding type in g1 and g2; ET represents the relationship types existing in the UML use case diagram, namely association, generalization, inclusion, and extension; $\theta_x$ represents the manually defined weight set for edges of each type, with $\sum_{x \in ET} \theta_x = 1$; EdgeNum(maxcsg, x) represents the number of edges of type x in the maximum common subgraph maxcsg; max(EdgeNum(g1, x), EdgeNum(g2, x)) represents the larger of the numbers of edges of the corresponding type in g1 and g2; and α and β are set manually, with α, β ∈ (0, 1).

The use case diagram similarity judging method based on maximum common subgraph calculation has the following advantages: the maximum common subgraph algorithm used by the method follows a simple procedure and analyzes the graph structure directly, which makes it efficient and convenient to use and gives the method strong applicability.

Drawings

FIG. 1 is a UML use case diagram of a banking system;

FIG. 2 is a UML use case diagram of a warehouse management system;

FIG. 3 is an overview of the method of the present invention;

FIG. 4 is a UML use case diagram of the bank counter system in an embodiment of the present invention;

FIG. 5 is a UML use case diagram of the warehouse management system in an embodiment of the present invention;

FIG. 6 is the bank counter system directed graph g1(V₁, E₁);

FIG. 7 is the warehouse management system directed graph g2(V₂, E₂);

FIG. 8 is a diagram of the maximum common subgraph generation process of the present invention.

Detailed Description

For a better understanding of the purpose, structure, and function of the present invention, the use case diagram similarity judging method based on maximum common subgraph calculation is described in further detail below with reference to the accompanying drawings.

An overview of the present method is shown in fig. 3. To judge the similarity between two UML use case diagrams, we compute their maximum common subgraph. To obtain a maximum common subgraph that the algorithm can use, we first preprocess the UML use case diagrams to facilitate computation. The algorithm proceeds in the following steps:

1) Preprocess the UML use case diagrams. In this step, we first convert each UML use case diagram into an XMI file, and then convert the XMI file into a directed graph using custom rules.

2) Calculate the maximum common subgraph of the compared directed graphs. In this step, the maximum common subgraph of the directed graphs to be compared is computed with the maximum common subgraph algorithm.

3) Calculate the similarity. In this step, we substitute the maximum common subgraph into the custom similarity formula to obtain the similarity between the two diagrams.

The method comprises the following specific steps:

(1) Preprocessing

The UML use case diagram consists of participants, use cases, and relationships among elements. The participants of the UML use case diagram are users, organizations, or external systems that interact with the application or system, and are typically drawn as a stick figure. Use cases are functions contained within the system and are typically drawn as an oval. The relationships among elements include association, generalization, inclusion, and extension relationships. The roles and notations of the relationships are shown in Table 1:

TABLE 1 Element relationship table

The preprocessing of the UML use case diagram extracts the elements and data in the diagram by converting it into a structured markup format. The known UML use case diagrams are built with an open-source UML modeling tool, so we export each use case diagram (model) as an XML Metadata Interchange (XMI) file.

The bank counter system and the warehouse management system are representative of financial service systems and warehousing service systems, respectively, so we use them as examples to describe the method. The UML use case diagrams of the bank counter system and the warehouse management system are shown in fig. 4 and fig. 5:

In the UML use case diagram of the bank counter system in fig. 4, there are four participants (ordinary depositor, VIP depositor, depositor, teller), seven use cases (transfer, deposit, withdrawal, loss reporting, personal transfer, public transfer, freeze account), and thirteen relationships (association, inclusion, extension, generalization).

In the UML use case diagram of the warehouse management system in fig. 5, there are five participants (temporary buyer, in-process buyer, warehouse buyer, logistics driver, warehouse manager), six use cases (maintenance, warehouse-in, purchase, ex-warehouse, self-maintenance, factory-return maintenance), and eleven relationships (association, inclusion, generalization).

We first use a tool to convert these UML use case diagrams into files in XMI format.

The next step is to parse the XMI file and represent the elements as a directed graph. The method defines g(V, E) as a graph comprising a vertex set V and a directed edge set E. A vertex in V represents a participant or a use case. A directed edge in E represents a relationship between participants, between a participant and a use case, or between use cases.
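The parsing step can be sketched as follows. This is only an illustration under stated assumptions: the XMI fragment, its namespace URI, and the `source`/`target` attributes on the association element are simplified and hypothetical (real exports differ between modeling tools and usually encode association ends differently), and `parse_xmi` is a name introduced here, not part of the method.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for this sample; real tools use versioned URIs.
XMI_NS = "http://www.omg.org/XMI"

# Simplified, hypothetical XMI fragment (not the output of any specific tool).
SAMPLE = f"""<xmi:XMI xmlns:xmi="{XMI_NS}">
  <packagedElement xmi:type="uml:Actor" xmi:id="a1" name="Depositor"/>
  <packagedElement xmi:type="uml:UseCase" xmi:id="u1" name="Transfer"/>
  <packagedElement xmi:type="uml:Association" xmi:id="e1" source="a1" target="u1"/>
</xmi:XMI>"""


def parse_xmi(text):
    """Return (vertices, edges) of the directed graph g(V, E).

    vertices: {id: (name, type)}; edges: [(source id, target id, type)].
    """
    root = ET.fromstring(text)
    vertices, edges = {}, []
    for el in root.iter("packagedElement"):
        # Namespaced attributes are addressed as "{uri}local" in ElementTree.
        kind = el.get(f"{{{XMI_NS}}}type", "").split(":")[-1]
        ident = el.get(f"{{{XMI_NS}}}id")
        if kind in ("Actor", "UseCase"):
            vertices[ident] = (el.get("name"), kind)
        elif kind == "Association":
            edges.append((el.get("source"), el.get("target"), kind))
    return vertices, edges


vertices, edges = parse_xmi(SAMPLE)
print(vertices)
print(edges)
```

Keeping the element type next to each vertex and edge matters because the later steps compare nodes and edges by type rather than by name.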

According to this rule, we convert the participants ordinary depositor and VIP depositor in the XMI file of the bank counter system into vertices FA1 and FA2, respectively, and the participants depositor and teller into vertices A1 and A2, respectively. We then convert the use cases transfer, deposit, withdrawal, and loss reporting into vertices U1, U2, U3, and U4, respectively, and the use cases freeze account, personal transfer, and public transfer into vertices T1, B1, and B2, respectively. At the same time, we convert the generalization relationships between ordinary depositor and depositor and between VIP depositor and depositor into one-way edges (i.e., the two vertices are connected by a single arc) ef1 and ef2. The association relationships between the participant depositor and the use cases transfer, deposit, withdrawal, and loss reporting are converted into two-way edges (i.e., the two vertices are connected by two arcs with opposite directions) eg1, eg2, eg3, and eg4, respectively; the association relationships between the participant teller and the use cases transfer, deposit, withdrawal, and loss reporting are converted into two-way edges eg5, eg6, eg7, and eg8, respectively; and the relationships between the use cases freeze account, personal transfer, and public transfer and the use case loss reporting are converted into one-way edges et1, eb1, and eb2, respectively.

Similarly, we convert the participants temporary buyer and in-process buyer in the XMI file of the warehouse management system into vertices FA3 and FA4, respectively, and the participants warehouse buyer, logistics driver, and warehouse manager into vertices A3, A4, and A5, respectively. We then convert the use cases maintenance, warehouse-in, purchase, and ex-warehouse into vertices U5, U6, U7, and U8, respectively, and the use cases self-maintenance and factory-return maintenance into vertices B3 and B4, respectively. At the same time, we convert the generalization relationships between temporary buyer and warehouse buyer and between in-process buyer and warehouse buyer into one-way edges ef3 and ef4. The association relationships between the participant warehouse buyer and the use cases warehouse-in and purchase are converted into two-way edges eg9 and eg10, respectively; the association relationships between the participant warehouse manager and the use cases maintenance, warehouse-in, purchase, and ex-warehouse are converted into two-way edges eg11, eg12, eg13, and eg14, respectively; the association relationship between the participant logistics driver and the use case ex-warehouse is converted into the two-way edge eg15; and the relationships between the use cases self-maintenance and factory-return maintenance and the use case maintenance are converted into one-way edges eb3 and eb4, respectively.
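As a rough check of the conversion, the bank counter system graph g1 can be written down as plain data and its element counts compared with those stated for fig. 4. The representation below (the tuple layout and the names `PREFIX_TYPE`, `vertex_type`, `G1_EDGES`, `G1_VERTICES`) is an assumption made for illustration. The endpoints of et1, eb1, and eb2 follow the text, which names loss reporting (U4) as the other end; the arc directions chosen for them are assumptions.

```python
from collections import Counter

# Label prefix -> element type, following the label conventions of Table 2.
PREFIX_TYPE = {
    "A": "participant",
    "FA": "participant (generalization)",
    "U": "use case",
    "B": "use case (include)",
    "T": "use case (extension)",
}


def vertex_type(label):
    """Strip the trailing serial number and look up the label prefix."""
    return PREFIX_TYPE[label.rstrip("0123456789")]


# Edges of g1 as (label, source, target, type). A two-way association stands
# for two opposite arcs; it is stored once here and counted as one edge.
G1_EDGES = (
    [("ef1", "FA1", "A1", "generalization"), ("ef2", "FA2", "A1", "generalization")]
    + [(f"eg{i}", "A1", f"U{i}", "association") for i in range(1, 5)]
    + [(f"eg{i + 4}", "A2", f"U{i}", "association") for i in range(1, 5)]
    # Assumed arc directions; U4 (loss reporting) is the endpoint named in the text.
    + [("et1", "T1", "U4", "extension"),
       ("eb1", "U4", "B1", "inclusion"),
       ("eb2", "U4", "B2", "inclusion")]
)

# Every vertex of g1 appears as an endpoint of at least one edge.
G1_VERTICES = sorted({v for _, s, t, _ in G1_EDGES for v in (s, t)})

vertex_counts = Counter(vertex_type(v) for v in G1_VERTICES)
edge_counts = Counter(kind for *_, kind in G1_EDGES)

print(len(G1_VERTICES), "vertices,", len(G1_EDGES), "edges")
print(dict(vertex_counts))
print(dict(edge_counts))
```

The per-type counts produced this way are exactly the VertexNum and EdgeNum quantities that the similarity formula later consumes.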

The directed graphs after conversion are shown in fig. 6 and fig. 7. The labels corresponding to the different element types in fig. 6 and fig. 7 are listed in the directed graph label correspondence table, Table 2:

TABLE 2 Use case directed graph label correspondence table

No.	Label	Type	No.	Label	Type
1	A1	Participant	24	eg2	Association relationship
2	A2	Participant	25	eg3	Association relationship
3	A3	Participant	26	eg4	Association relationship
4	A4	Participant	27	eg5	Association relationship
5	A5	Participant	28	eg6	Association relationship
6	FA1	Participant (generalization)	29	eg7	Association relationship
7	FA2	Participant (generalization)	30	eg8	Association relationship
8	FA3	Participant (generalization)	31	eg9	Association relationship
9	FA4	Participant (generalization)	32	eg10	Association relationship
10	U1	Use case	33	eg11	Association relationship
11	U2	Use case	34	eg12	Association relationship
12	U3	Use case	35	eg13	Association relationship
13	U4	Use case	36	eg14	Association relationship
14	U5	Use case	37	eg15	Association relationship
15	U6	Use case	38	ef1	Generalization relationship
16	U7	Use case	39	ef2	Generalization relationship
17	U8	Use case	40	ef3	Generalization relationship
18	B1	Use case (include)	41	ef4	Generalization relationship
19	B2	Use case (include)	42	eb1	Inclusion relationship
20	B3	Use case (include)	43	eb2	Inclusion relationship
21	B4	Use case (include)	44	eb3	Inclusion relationship
22	T1	Use case (extension)	45	eb4	Inclusion relationship
23	eg1	Association relationship	46	et1	Extension relationship

It should be noted that no generalization relationship between use cases occurs in this example; if such a relationship occurs in an application, the corresponding node is labeled FUX, where X is the serial number within that label type.

(2) Calculating the maximum common subgraph of the compared directed graphs

The theoretical basis of UML use case diagram similarity judgment based on the maximum common subgraph is as follows: the closer two graphs are in structure, the more parts they have in common, i.e., a common subgraph exists between them. The maximum common subgraph of the two graphs can therefore be used to compare their degree of structural similarity. Before the comparison, the relevant concepts are defined.

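Definition one (subgraph): given two graphs c1(Vc, Ec) and g(V, E), graph c1 is called a subgraph of graph g, written $c_1 \subseteq g$, if and only if:

(1) $V_c \subseteq V$;

(2) $E_c \subseteq E$.

Definition two (maximum common subgraph): given two graphs g1 and g2, if there is a graph m that satisfies the following conditions:

(1) $m \subseteq g_1$;

(2) $m \subseteq g_2$;

and no graph m′ satisfies the following conditions:

(1) $m' \subseteq g_1$;

(2) $m' \subseteq g_2$;

(3) $|m'| > |m|$;

then graph m is the maximum common subgraph of graphs g1 and g2, denoted maxcsg(g1, g2).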

Solving for the maximum common subgraph takes two major steps. Here we use the graphs g1(V₁, E₁) and g2(V₂, E₂) shown in fig. 6 and fig. 7 for illustration; a schematic diagram is shown in fig. 8:

In the first step, we denote the maximum common subgraph of g1 and g2 as maxcsg, then traverse and compare the nodes of g1 and g2, and take the nodes of the same type that are common to g1 and g2 as the nodes of maxcsg.

In the second step, we traverse again the nodes of the graph maxcsg obtained in the first step; if two nodes are adjacent in both g1 and g2 and the edges connecting them are of the same type, an edge of the corresponding type is generated and added to maxcsg.

Through the above steps, we obtain the maximum common subgraph maxcsg of g1 and g2.
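The two steps can be sketched in a few lines. The text does not spell out how a node of g1 is matched with a particular node of g2, so this sketch assumes a simple rule: nodes of the same type are paired in listing order. That pairing rule, the graph representation, and the function name `max_common_subgraph` are assumptions for illustration; a different pairing can retain more edges.

```python
def max_common_subgraph(g1, g2):
    """Sketch of the two-step procedure.

    Each graph is (vertices, edges) with vertices: {label: type} and
    edges: set of (source, target, type).
    """
    v1, e1 = g1
    v2, e2 = g2

    # Step 1: pair nodes of the same type (assumed rule: listing order).
    unused = {}
    for label, vtype in v2.items():
        unused.setdefault(vtype, []).append(label)
    pairing = {}
    for label, vtype in v1.items():
        candidates = unused.get(vtype, [])
        if candidates:
            pairing[label] = candidates.pop(0)
    nodes = {label: v1[label] for label in pairing}

    # Step 2: keep an edge when both endpoints are paired and the paired
    # nodes are joined by an edge of the same type in g2.
    edges = {
        (s, t, kind)
        for (s, t, kind) in e1
        if s in pairing and t in pairing and (pairing[s], pairing[t], kind) in e2
    }
    return nodes, edges


# Tiny illustrative graphs (not the graphs of fig. 6 and fig. 7).
small_g1 = (
    {"A1": "participant", "U1": "use case", "U2": "use case"},
    {("A1", "U1", "association"), ("A1", "U2", "association")},
)
small_g2 = (
    {"A3": "participant", "U5": "use case"},
    {("A3", "U5", "association")},
)

mcs_nodes, mcs_edges = max_common_subgraph(small_g1, small_g2)
print(mcs_nodes)
print(mcs_edges)
```

Here U2 has no counterpart of the same type left in the second graph, so it and its association are dropped, leaving one participant, one use case, and one association.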

(3) Similarity calculation

After the maximum common subgraph maxcsg is obtained, the similarity is calculated from the proportion of the nodes and edges of maxcsg in the compared graphs. The specific similarity calculation formula is shown in formula 1:

$$\mathrm{SimUseCase}(g_1, g_2) = \alpha \sum_{v \in VT} \gamma_v \frac{\mathrm{VertexNum}(maxcsg, v)}{\max(\mathrm{VertexNum}(g_1, v), \mathrm{VertexNum}(g_2, v))} + \beta \sum_{x \in ET} \theta_x \frac{\mathrm{EdgeNum}(maxcsg, x)}{\max(\mathrm{EdgeNum}(g_1, x), \mathrm{EdgeNum}(g_2, x))} \quad (1)$$

wherein VT represents the element types existing in the UML use case diagram, such as participants (including ordinary participants and generalized participants) and use cases (including ordinary use cases, generalized use cases, extended use cases, and included use cases); $\gamma_v$ represents the manually defined weight set for nodes of each element type, with $\sum_{v \in VT} \gamma_v = 1$; VertexNum(maxcsg, v) represents the number of nodes of type v in the maximum common subgraph maxcsg; max(VertexNum(g1, v), VertexNum(g2, v)) represents the larger of the numbers of nodes of the corresponding type in g1 and g2; ET represents the relationship types existing in the UML use case diagram, such as association, generalization, inclusion, and extension; $\theta_x$ represents the manually defined weight set for edges of each type, with $\sum_{x \in ET} \theta_x = 1$; EdgeNum(maxcsg, x) represents the number of edges of type x in the maximum common subgraph maxcsg; max(EdgeNum(g1, x), EdgeNum(g2, x)) represents the larger of the numbers of edges of the corresponding type in g1 and g2; and α and β are set manually, with α, β ∈ (0, 1).

Based on the present example, we set α and β to 0.55 and 0.45, respectively; the γ values of the ordinary participant, generalized participant, ordinary use case, generalized use case, extended use case, and included use case to 0.2, 0.2, 0.2, 0, 0.2, and 0.2, respectively; and the θ values of the association, generalization, inclusion, and extension relationships to 0.2, 0.266, 0.266, and 0.266, respectively. This gives SimUseCase(g1, g2) = 0.7102.
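The arithmetic behind this value can be retraced with a short sketch. Several inputs are assumptions rather than figures stated in the text: the six γ weights are read as 0.2, 0.2, 0.2, 0, 0.2, 0.2; the per-type counts of g1 and g2 are taken from the vertex and edge lists given for fig. 6 and fig. 7; the maximum common subgraph is assumed to keep two ordinary participants, two generalized participants, four ordinary use cases, two included use cases, six association edges, two generalization edges, and two inclusion edges; and a type absent from both graphs is assumed to contribute zero. The short type keys and the function name `sim_use_case` are hypothetical.

```python
def sim_use_case(v_mcs, v1, v2, e_mcs, e1, e2, gamma, theta, alpha, beta):
    """Weighted ratio of maxcsg counts to the larger count in g1 and g2."""

    def weighted_ratio(common, c1, c2, weights):
        total = 0.0
        for kind, weight in weights.items():
            denom = max(c1.get(kind, 0), c2.get(kind, 0))
            if denom:  # assumed: a type absent from both graphs adds nothing
                total += weight * common.get(kind, 0) / denom
        return total

    return (alpha * weighted_ratio(v_mcs, v1, v2, gamma)
            + beta * weighted_ratio(e_mcs, e1, e2, theta))


# Node types: A ordinary participant, FA generalized participant, U ordinary
# use case, FU generalized use case, T extended use case, B included use case.
gamma = {"A": 0.2, "FA": 0.2, "U": 0.2, "FU": 0.0, "T": 0.2, "B": 0.2}
theta = {"association": 0.2, "generalization": 0.266,
         "inclusion": 0.266, "extension": 0.266}

v_g1 = {"A": 2, "FA": 2, "U": 4, "T": 1, "B": 2}
v_g2 = {"A": 3, "FA": 2, "U": 4, "B": 2}
e_g1 = {"association": 8, "generalization": 2, "inclusion": 2, "extension": 1}
e_g2 = {"association": 7, "generalization": 2, "inclusion": 2}

# Assumed composition of maxcsg (see the lead-in).
v_mcs = {"A": 2, "FA": 2, "U": 4, "B": 2}
e_mcs = {"association": 6, "generalization": 2, "inclusion": 2}

score = sim_use_case(v_mcs, v_g1, v_g2, e_mcs, e_g1, e_g2,
                     gamma, theta, alpha=0.55, beta=0.45)
print(round(score, 4))
```

Under these assumptions the node term is about 0.7333 and the edge term is 0.682, so the weighted sum agrees with the reported value to four decimal places.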

This completes the similarity judgment for g1 and g2; in practical applications, each parameter can be adjusted as required.

It will be understood that the invention has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims

1. A use case diagram similarity judging method based on maximum common subgraph calculation, characterized by comprising the following steps:

step 1: preprocessing the UML use case diagrams to be compared and representing each as a directed graph; the UML use case diagram is composed of participants, use cases, and relationships among elements, wherein the participants of the UML use case diagram are users, organizations, or external systems that interact with the application or system; the use cases are functions contained in the system; and the relationships among elements comprise association, generalization, inclusion, and extension relationships;

step 1.1: the preprocessing of the UML use case diagram extracts the elements and data in the diagram by converting it into a structured markup format; the known UML use case diagrams are built with an open-source UML modeling tool, so each use case diagram is exported as an XML Metadata Interchange (XMI) file;

step 1.2: parsing the XMI file and representing the elements as a directed graph; let g(V, E) be a graph comprising a vertex set V and a directed edge set E, wherein a vertex in V represents a participant or a use case, and a directed edge in E represents a relationship between participants, between a participant and a use case, or between use cases;

step 3: calculating the similarity using a similarity determination algorithm.

2. The use case diagram similarity judging method based on maximum common subgraph calculation according to claim 1, wherein the theoretical basis of UML use case diagram similarity judgment based on the maximum common subgraph is: the closer two graphs are in structure, the more parts they have in common, i.e., a common subgraph exists between them, so that the maximum common subgraph of the two graphs can be used to compare their degree of structural similarity; the related concepts are defined as follows:

definition one (subgraph): given two graphs c1(Vc, Ec) and g(V, E), graph c1 is called a subgraph of graph g, written $c_1 \subseteq g$, if and only if:

(1) $V_c \subseteq V$;

(2) $E_c \subseteq E$;

definition two (maximum common subgraph): given two graphs g1 and g2, if there is a graph m that satisfies the following conditions:

(1) $m \subseteq g_1$;

(2) $m \subseteq g_2$;

and no graph m′ satisfies the following conditions:

(1) $m' \subseteq g_1$;

(2) $m' \subseteq g_2$;

(3) $|m'| > |m|$;

then graph m is the maximum common subgraph of graphs g1 and g2, denoted maxcsg(g1, g2);

step 2.3: obtaining the maximum common subgraph maxcsg of g1 and g2.

3. The use case diagram similarity judging method based on maximum common subgraph calculation according to claim 2, wherein step 3 comprises the following specific steps:

$$\mathrm{SimUseCase}(g_1, g_2) = \alpha \sum_{v \in VT} \gamma_v \frac{\mathrm{VertexNum}(maxcsg, v)}{\max(\mathrm{VertexNum}(g_1, v), \mathrm{VertexNum}(g_2, v))} + \beta \sum_{x \in ET} \theta_x \frac{\mathrm{EdgeNum}(maxcsg, x)}{\max(\mathrm{EdgeNum}(g_1, x), \mathrm{EdgeNum}(g_2, x))}$$

wherein VT represents the participant and use case element types existing in the UML use case diagram; $\gamma_v$ represents the manually defined weight set for nodes of each element type, with $\sum_{v \in VT} \gamma_v = 1$; VertexNum(maxcsg, v) represents the number of nodes of type v in the maximum common subgraph maxcsg; max(VertexNum(g1, v), VertexNum(g2, v)) represents the larger of the numbers of nodes of the corresponding type in g1 and g2; ET represents the relationship types existing in the UML use case diagram, namely association, generalization, inclusion, and extension; $\theta_x$ represents the manually defined weight set for edges of each type, with $\sum_{x \in ET} \theta_x = 1$; EdgeNum(maxcsg, x) represents the number of edges of type x in the maximum common subgraph maxcsg; max(EdgeNum(g1, x), EdgeNum(g2, x)) represents the larger of the numbers of edges of the corresponding type in g1 and g2; and α and β are set manually, with α, β ∈ (0, 1).