CN113254013B - Reusable component mining method for complex business process

Info

Publication number: CN113254013B
Application number: CN202110804713.3A
Authority: CN
Inventors: 潘鑫; 李贞昊; 雷航; 荣燊; 李若尘; 柳叶康; 肖泾军
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2021-09-24
Anticipated expiration: 2041-07-16
Also published as: CN113254013A

Abstract

The invention discloses a reusable component mining method oriented to complex business processes and belongs to the technical field of software reuse. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which greatly speeds up component construction while preserving component quality. Only the business flowcharts of a system need to be input for the reusable components of that system to be mined automatically; no other configuration is required. This shortens the development cycle of industrial management software, reduces development difficulty, and supports lower subsequent maintenance costs. Applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse techniques can be better exploited to improve the development efficiency and reduce the development cost of industrial management software.

Description

Reusable component mining method for complex business process

Technical Field

The invention belongs to the technical field of software reuse and particularly relates to a reusable component mining method oriented to complex business processes; the classification number is G06K.

Background

As a carrier of intelligent manufacturing, industrial management software is deeply integrated into industrial design and manufacturing processes and has become the information core of the manufacturing industry. As the market grows, the architecture of industrial management software becomes increasingly complex and quality requirements rise, so developing such software quickly and efficiently has become the central difficulty of current industrial management software development. Most industrial management software contains a large number of identical transaction processes across its complex business scenarios, and component-based software development has become one of the mainstream ways to improve development efficiency. Existing reusable component mining methods mainly mine object-oriented APIs by usage frequency and are of limited practical use in the field of industrial management software. In view of the characteristics of industrial management software systems, the invention performs process similarity analysis on business processes and automatically mines sets of similar sub-processes from them to serve as component modules of the software. This greatly speeds up component construction while preserving component quality, so that software reuse techniques can be better exploited in developing industrial management software.

Disclosure of Invention

The invention addresses the problem in the prior art that industrial software is developed slowly and inefficiently because its business scenarios are complex.

The technical scheme of the invention is a reusable component mining method for complex business processes, comprising the following steps:

S1, inputting the business flowchart set of the system, and converting it through preprocessing into a set of symbolically represented graph models;

S2, mining structurally similar frequent sub-processes from the graph model set, all structurally similar frequent sub-processes together forming a set;

S3, calculating the behavior similarity of the frequent sub-processes in the set, and clustering the set according to the similarity to form component alternative sets;

S4, evaluating all component alternative sets on the basis of reusable-component feasibility, and calculating the feasibility of each alternative set forming a component;

S5, judging each component alternative set according to its feasibility: if the feasibility is not less than the feasibility index, constructing the reusable component; otherwise, abandoning construction of the component.

Further, the step S1 includes:

S11, converting each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; and nText is the specific behavioral description of the node;

S12, converting each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its starting node; nTo is the unique identification number of its destination node; and eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional ones;

S13, combining the results of S11 and S12, converting the business flowchart into a symbolically represented graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.

Further, the step S2 includes:

S21, counting the frequency of edges and nodes having the same function and, according to the preset minimum support min_spt, removing the edges and nodes whose frequency is less than min_spt to obtain a new graph model; whether edges and nodes have the same function is determined by the eLable of the edge and the nLabel of the node;

S22, re-marking the edges and nodes of the new graph model by frequency class: edges or nodes with the same frequency receive the same mark, and the higher the frequency, the smaller the lexicographic order of the mark; the mapping between identification numbers and marks and the correspondence between marked edges and original edges are stored at the same time;

S23, selecting the mark with the highest frequency, the edges and nodes corresponding to that mark forming the maximum frequent sub-process A, and mining the maximum frequent sub-process A, one frequent sub-process being mined each time;

S24, mining the mark with the second-highest frequency, then the mark with the third-highest frequency, and so on by the method of step S23 until all frequent sub-processes have been mined, obtaining the frequent sub-process set.
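The pruning in step S21 can be illustrated with a minimal Python sketch. It assumes that frequency means support, that is, the number of graphs in which a label occurs, and it represents each graph as a hypothetical pair (nodes, edges) in which nodes maps nId to nLabel and edges is a list of (nFrom, nTo, eLable) tuples; neither this layout nor the function name prune_infrequent comes from the patent.

```python
from collections import Counter

def prune_infrequent(graphs, min_spt=2):
    """Drop node and edge labels whose support is below min_spt.

    graphs: list of (nodes, edges); nodes maps nId -> nLabel and
    edges is a list of (nFrom, nTo, eLable). This layout is an assumption.
    """
    node_support, edge_support = Counter(), Counter()
    for nodes, edges in graphs:
        # Count each label at most once per graph (support, not raw count).
        node_support.update(set(nodes.values()))
        edge_support.update({(nodes[f], e, nodes[t]) for f, t, e in edges})

    pruned = []
    for nodes, edges in graphs:
        kept_nodes = {n: lab for n, lab in nodes.items()
                      if node_support[lab] >= min_spt}
        kept_edges = [
            (f, t, e) for f, t, e in edges
            if f in kept_nodes and t in kept_nodes
            and edge_support[(nodes[f], e, nodes[t])] >= min_spt
        ]
        pruned.append((kept_nodes, kept_edges))
    return pruned, node_support, edge_support
```

An edge is identified here by the labels of its two end nodes together with its own eLable, so two connecting lines count as having the same function only when all three agree.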

Further, the step S23 includes:

S231, taking the maximum frequent sub-process A as the first frequent sub-process, each subsequent frequent sub-process having one edge fewer than the previous one; judging whether the current frequent sub-process satisfies the minimum DFS code;

S232, if the minimum DFS code is not satisfied, ending the mining of this sub-process;

S233, if the minimum DFS code is satisfied, performing rightmost path expansion on the frequent sub-process to obtain new frequent sub-processes;

The rightmost path expansion method is: given a graph G and a DFS tree T of G (the set of visited vertices of G is expanded iteratively until a complete DFS tree is created), a new edge e is added between the rightmost node and another node on the rightmost path, or a new node is introduced and connected to a node on the rightmost path;

S234, judging whether the new frequent sub-process satisfies the minimum support min_spt and, if so, storing it in the frequent sub-process set.

Further, the step S3 includes:

S31, calculating the semantic similarity of the nodes of two frequent sub-processes;

S32, adding a hierarchy influence factor on the basis of the node semantic similarity, and calculating the behavior similarity of the frequent sub-processes;

S33, clustering the frequent sub-processes with a hierarchical clustering algorithm to form the component alternative sets.

Further, the step S31 includes:

S311, training a model with Word2Vec and vectorizing the two nodes being compared to obtain their vectors;

S312, setting the weight of nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes;

S313, calculating the semantic similarity of all nodes in the two frequent sub-processes pairwise by the methods of S311 and S312.

Further, the step S32 includes:

S321, calculating the hierarchy influence factor of each layer of the frequent sub-processes, where n denotes the graph model depth of the frequent sub-processes;

S322, taking the set of nodes of each of the two frequent sub-processes at layer i, the number of elements in that set, and the j-th node of layer i of each frequent sub-process, and calculating the hierarchical similarity of the sub-processes from the node similarity;

S323, combining the hierarchical similarity and the hierarchy influence factor, calculating the behavior similarity between two frequent sub-processes having n layers.

Further, the step S33 includes:

S331, calculating the distance between two clusters based on the behavior similarity of the frequent sub-processes;

S332, regarding each frequent sub-process as an initial cluster;

S333, finding the two closest clusters and merging them, and repeating this until all frequent sub-processes form one cluster;

S334, recording the cluster division of each layer of the clustering process to form the component alternative sets.

Further, the specific method of step S4 is as follows:

S41, calculating the intra-cluster similarity ICS(C) of each component alternative set over the frequent sub-processes it contains, indexed by j;

S42, counting, for each cluster of the component alternative set, the number of sub-processes that appear in the same original flowchart;

S43, counting, for each cluster of the component alternative set, the number of sub-processes that appear in different original flowcharts;

S44, letting k be the number of clusters in the component alternative set, setting a global coincidence rate weight and a local coincidence rate weight, and calculating the component coincidence rate of the component alternative set from the weighted sub-process counts;

S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the component alternative set as a component.

Further, the step S5 includes:

S51, setting a component feasibility index and judging whether each component alternative set meets it;

S52, if the feasibility is greater than or equal to the feasibility index, outputting the alternative set and constructing the reusable component;

S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
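The decision in steps S51 to S53 amounts to a threshold filter, sketched below in Python. The function select_components and its parameters are hypothetical names; the feasibility score itself is passed in as a caller-supplied function, because this sketch makes no assumption about how the score is computed.

```python
def select_components(candidates, feasibility, feasibility_index):
    """Split candidate sets by the feasibility threshold (steps S51-S53).

    candidates: iterable of component alternative sets.
    feasibility: caller-supplied function returning a numeric score.
    feasibility_index: threshold; a score >= threshold is accepted.
    """
    accepted, rejected = [], []
    for cand in candidates:
        if feasibility(cand) >= feasibility_index:
            accepted.append(cand)   # S52: build a reusable component
        else:
            rejected.append(cand)   # S53: abandon construction
    return accepted, rejected
```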

Advantageous effects

Industrial management software systems contain a large number of identical transaction processes across their business scenarios. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which greatly speeds up component construction while preserving component quality.

Only the business flowcharts of a system need to be input for the reusable components of that system to be mined automatically; no other configuration is required. This shortens the development cycle of industrial management software, reduces development difficulty, and supports lower subsequent maintenance costs.

Applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse techniques can be better exploited to improve the development efficiency and reduce the development cost of industrial management software.

The invention can analyze the business flowchart set of industrial management software and quickly mine highly reusable sub-processes from complex business flows as reusable components, which greatly speeds up component construction and makes better use of software reuse in development.

Drawings

FIG. 1 is a flow chart of the scheme of the invention.

FIG. 2 is an example of the conversion of the business process flow diagram of S1 of the present invention into a graph model.

FIG. 3 is a flowchart illustrating the step S2 according to the present invention.

FIG. 4 is a flowchart illustrating the step S3 according to the present invention.

FIG. 5 is a diagram illustrating a rightmost path expansion method according to the present invention.

Detailed Description

In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.

As shown in FIG. 1, the flowchart of the scheme, the reusable component mining method oriented to complex business processes includes:

S1, building a front-end flowchart drawing function in JavaScript, inputting the business flowchart set of an industrial management software system on the front-end page, obtaining a data set in JSON format, and saving the file as flowchart.

The flowchart preprocessing module proceeds as follows:

S11, reading in the file flowchart.csv and parsing the JSON-format data in Python. A new Node class is created, and each flowchart node is converted into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow, as shown in Table 1; and nText is the specific behavioral description of the node;

TABLE 1

Node state class	Corresponding nLabel
Start	1
End	2
Step	3
Condition	4
Document	5
Data	6
Frequent sub-process	7
Annotation	8

S12, a new Edge class is created, and each flowchart connecting line is converted into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its starting node; nTo is the unique identification number of its destination node; and eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional ones, as shown in Table 2;

TABLE 2

Connection type	Corresponding eLable
Ordinary connecting line	1
Conditional connecting line Y	2
Conditional connecting line N	3

S13, combining the results of S11 and S12, a new class Graph is created and the business flowchart is converted into a pair G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples. After each flowchart has been converted into a Graph object, it is added to the graph model set graph_set.
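The data model of steps S11 to S13 can be sketched in Python as follows. The field names and the label values follow the text and Tables 1 and 2; the layout of the input dictionary (keys "nodes", "links", "type", "branch" and so on) and the English type names are hypothetical, since the JSON schema produced by the front end is not given.

```python
from dataclasses import dataclass, field

# Label values follow Tables 1 and 2; the string keys are illustrative only.
NODE_LABELS = {"start": 1, "end": 2, "step": 3, "condition": 4,
               "document": 5, "data": 6, "subprocess": 7, "annotation": 8}
EDGE_LABELS = {"normal": 1, "yes": 2, "no": 3}

@dataclass(frozen=True)
class Node:
    nId: int
    nLabel: int
    nText: str

@dataclass(frozen=True)
class Edge:
    eId: int
    nFrom: int
    nTo: int
    eLable: int  # spelling follows the patent text

@dataclass
class Graph:
    N: list = field(default_factory=list)  # list of Node
    E: list = field(default_factory=list)  # list of Edge

def to_graph(raw):
    """Convert one parsed flowchart (hypothetical dict layout) to a Graph."""
    g = Graph()
    for n in raw["nodes"]:
        g.N.append(Node(n["id"], NODE_LABELS[n["type"]], n["text"]))
    for link in raw["links"]:
        label = EDGE_LABELS[link.get("branch", "normal")]
        g.E.append(Edge(link["id"], link["from"], link["to"], label))
    return g
```

Under these assumptions, graph_set could be built as `[to_graph(raw) for raw in parsed_flowcharts]`, where parsed_flowcharts is whatever list the JSON parsing step yields.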

S2, mining structurally similar sub-processes in the directed graph model data set with the frequent sub-process mining algorithm gSpan and recording them as sets, each set containing several structurally similar frequent sub-processes. The algorithm parameter is set as follows: the minimum support min_spt of frequent sub-processes is 2.

The specific steps of the frequent sub-process algorithm gSpan are shown in FIG. 3 and are as follows:

S21, traversing the graphs, counting the occurrence frequency of the labels of the edges and nodes, storing the edge frequencies and node frequencies separately, and sorting them from high to low. According to the preset minimum support min_spt, the nodes and edges with lower frequency are removed to obtain a new graph G. Whether a sub-process is frequent is judged by its support: a sub-process whose support is greater than or equal to the minimum support min_spt is considered frequent;

S22, creating the DFSEdge class, relabeling the edges of graph G as DFSEdge(0, 1, vevlb), and adding them to the DFS code. The marking rule follows frequency: the higher the frequency, the smaller the lexicographic order of the assigned label. The mapping between the original labels and the new labels is stored at the same time;

S23, the edges with the highest frequency form a set E, which is sorted in lexicographic order: the smaller the lexicographic order, the higher the frequency and the earlier the corresponding edge is ranked;

S24, each edge in set E can be regarded as the simplest frequent sub-process, and depth-first mining is carried out starting from it.

Further, the step S24 includes:

S241, judging whether the frequent sub-process is a minimum DFS code;

S244, if it is not a minimum DFS code, ending the mining of this frequent sub-process;

S242, if it is a minimum DFS code, performing rightmost path expansion on the frequent sub-process to obtain new frequent sub-processes. Rightmost path expansion: given a graph G and a DFS tree T of G (the visited vertex set is expanded iteratively until a complete DFS tree is created), a new edge e may be added between the rightmost node and another node on the rightmost path (backward expansion), or a new node may be introduced and connected to a node on the rightmost path (forward expansion). Both kinds of expansion occur on the rightmost path, as shown in FIG. 5;

S243, judging whether the new frequent sub-process satisfies the minimum support min_spt; if so, adding it to the result set and returning to S241 to continue the recursive mining;
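The rightmost path expansion of step S242 can be illustrated with a topology-only Python sketch. It assumes a DFS code is given as a list of (frm, to) pairs of DFS discovery indices, with forward edges satisfying frm < to and backward edges frm > to; labels, edge direction and the minimum DFS code test are left out, and the function names are hypothetical.

```python
def rightmost_path(dfs_code):
    """Return the vertices on the rightmost path, root first.

    dfs_code: list of (frm, to) DFS discovery indices in code order.
    """
    path_edges = []
    expected_to = None
    # Walk the code backwards, following forward edges up to the root.
    for idx in range(len(dfs_code) - 1, -1, -1):
        frm, to = dfs_code[idx]
        if frm < to and (expected_to is None or to == expected_to):
            path_edges.append(idx)
            expected_to = frm
    path_edges.reverse()
    if not path_edges:
        return []
    vertices = [dfs_code[path_edges[0]][0]]
    vertices += [dfs_code[i][1] for i in path_edges]
    return vertices

def extension_sites(dfs_code):
    """List candidate backward and forward extensions as (frm, to) pairs."""
    path = rightmost_path(dfs_code)
    if not path:
        return [], []
    rightmost = path[-1]
    existing = {(f, t) for f, t in dfs_code} | {(t, f) for f, t in dfs_code}
    # Backward: rightmost vertex to an earlier vertex on the rightmost path.
    backward = [(rightmost, v) for v in path[:-1]
                if (rightmost, v) not in existing]
    # Forward: a new vertex attached to any vertex on the rightmost path.
    new_vertex = max(max(f, t) for f, t in dfs_code) + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward
```

For the code [(0, 1), (1, 2), (2, 0), (1, 3)] the rightmost path is 0, 1, 3, so the only backward candidate is (3, 0) and the forward candidates attach vertex 4 to vertex 3, 1 or 0. Each candidate would still have to pass the support check of S243.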

S3, calculating the behavior similarity of every two elements of the set of structurally similar frequent sub-processes, measuring the distance between frequent sub-processes according to the similarity, and hierarchically clustering the set on the basis of the distance to form the component alternative sets.

The specific process of step S3 is shown in FIG. 4 and is as follows:

S31, calculating the node semantic similarity Node_similarity of pairwise frequent sub-processes from the nText information of the frequent sub-process nodes;

The specific steps of calculating the node semantic similarity are as follows:

S311, importing a Python module into the program and using it to train a Word2Vec model on corpora such as Wikipedia and engineering terms, and saving the file as embedding_64.model;

S312, traversing all flowchart nodes in the frequent sub-process set and adding their nText to a collection. A Python module is imported into the program, and its function is used to segment every nText in the collection into words. The file embedding_64.model is then imported to load the word2vec model, and finally the model vectorizes the segmented words; each result is recorded in the form vec(n) and stored in a word vector matrix;

S313, using a loop statement, performing the following operations on any two different frequent sub-flowcharts in each set: a breadth-first search obtains, for each layer, the nText of the corresponding nodes of the two frequent sub-processes; each nText is segmented with the word segmentation module, the word vector values are looked up in the word vector matrix, and the mean of the word vectors gives the semantic vector of each node. The weight of nouns in the node semantics is set at the same time, and the semantic similarity of the nodes in the frequent sub-processes is calculated.
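The averaging of word vectors into a node vector in S313 can be sketched as follows. The comparison by cosine similarity is an assumption of this sketch, and the noun weight is deliberately left out; word_vectors stands in for the trained Word2Vec model as a plain dictionary of toy vectors, and all function names are hypothetical.

```python
import math

def mean_vector(words, word_vectors):
    """Average the vectors of the words that have an embedding."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def node_similarity(words_a, words_b, word_vectors):
    """Similarity of two nodes given their segmented nText word lists."""
    va = mean_vector(words_a, word_vectors)
    vb = mean_vector(words_b, word_vectors)
    if va is None or vb is None:
        return 0.0  # no known word in one of the nodes
    return cosine(va, vb)
```

A noun weight could be introduced by replacing the plain mean with a weighted mean, but how the weight enters the similarity is not assumed here.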

S32, adding the hierarchy influence factor Hierarchical_Weight on the basis of the node semantic similarity Node_similarity within the sub-processes, and calculating the behavior similarity Behavioral_Similarity of the frequent sub-processes.

The specific steps for calculating the behavior similarity of the sub-processes are as follows:

S321, the layer containing the nodes with in-degree 0 in the frequent sub-flowchart model is recorded as layer 0, the layer containing the child nodes of layer 0 as layer 1, and so on. During the breadth-first search of step S313, the graph model depth of the frequent sub-process is calculated at the same time and recorded as n; finally, the hierarchy influence factor of each layer is calculated in a loop;

S322, during the breadth-first search of step S313, the set of nodes of layer i of each of the two frequent sub-processes is collected and the number of nodes in each set is counted. Using the node similarity obtained in S313, the similarities of the corresponding nodes of layer i are summed and divided by the number of nodes of layer i to obtain the hierarchical similarity of the processes;

S323, combining the hierarchical similarity and the hierarchy influence factor, calculating the behavior similarity between the two frequent sub-processes having n layers.
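Steps S321 to S323 can be sketched in Python as a layering pass followed by a weighted sum. The layering rule (in-degree 0 is layer 0, children one layer deeper) and the per-layer mean of corresponding node similarities follow the text; the combination as a weighted sum and the weight values are assumptions, with the weights supplied by the caller, and the function names are hypothetical.

```python
from collections import deque

def bfs_layers(node_ids, edges):
    """Group nodes by layer: in-degree-0 nodes form layer 0.

    edges: list of (nFrom, nTo). A node reachable by several paths is
    placed at the layer where the breadth-first search first reaches it.
    """
    children = {n: [] for n in node_ids}
    indegree = {n: 0 for n in node_ids}
    for f, t in edges:
        children[f].append(t)
        indegree[t] += 1
    layer_of = {n: 0 for n in node_ids if indegree[n] == 0}
    queue = deque(layer_of)
    while queue:
        n = queue.popleft()
        for c in children[n]:
            if c not in layer_of:
                layer_of[c] = layer_of[n] + 1
                queue.append(c)
    depth = max(layer_of.values(), default=-1) + 1
    layers = [[] for _ in range(depth)]
    for n in node_ids:
        if n in layer_of:
            layers[layer_of[n]].append(n)
    return layers

def behavioral_similarity(layers_a, layers_b, node_sim, weights):
    """Weighted sum over layers of the mean similarity of paired nodes.

    weights: one hierarchy factor per layer, chosen by the caller.
    """
    total = 0.0
    for layer_a, layer_b, w in zip(layers_a, layers_b, weights):
        pairs = list(zip(layer_a, layer_b))
        level_sim = sum(node_sim(a, b) for a, b in pairs) / len(pairs)
        total += w * level_sim
    return total
```

The depth n of a sub-process is `len(bfs_layers(...))`; nodes are paired by position within a layer, which presumes the two sub-processes are structurally similar, as they are after step S2.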

S33, clustering each frequent sub-process set with a hierarchical clustering algorithm, dividing the set at different levels according to the distance between clusters to form the component alternative sets.

The hierarchical clustering comprises the following specific steps:

S331, calculating the distance between two clusters according to the Complete-Link definition of hierarchical clustering, based on the behavior similarity of pairwise frequent sub-processes;

S332, regarding each frequent sub-process of the frequent sub-process set as an initial cluster;

S333, repeating the following until all frequent sub-processes form one cluster: finding the two closest clusters, merging them, renumbering the merged cluster, and deleting the corresponding rows and columns of the distance matrix. At each repetition, the clustering result and the merge distance are stored in a result matrix;

S334, traversing the result matrix and recording the set divisions at the different levels to form the component alternative sets;
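Steps S331 to S334 describe agglomerative clustering with the complete-link criterion, which a short pure-Python sketch can make concrete. Taking the distance between two sub-processes as 1 minus their behavior similarity is an assumption of this sketch, and the function name and return layout are hypothetical; the partition recorded after each merge plays the role of the per-level division results.

```python
def complete_link_clustering(items, similarity):
    """Agglomerative clustering; returns the partition after every merge.

    similarity: function of two items returning a value in [0, 1].
    Distance is taken as 1 - similarity (an assumption of this sketch).
    """
    clusters = [[x] for x in items]
    history = [[list(c) for c in clusters]]  # level 0: singletons
    merges = []                              # (merge distance, merged cluster)

    def dist(c1, c2):
        # Complete link: the farthest pair decides the cluster distance.
        return max(1.0 - similarity(a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((d, merged))
        history.append([list(c) for c in clusters])
    return history, merges
```

This version recomputes distances at every step for clarity rather than maintaining a distance matrix; each entry of history is one candidate division of the frequent sub-process set.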

S4, evaluating all component alternative sets on the basis of reusable-component feasibility, and calculating the feasibility of each component alternative set forming a component.

The specific process of step S4 is as follows:

S41, calculating the intra-cluster similarity of each component alternative set, adding the results to a collection, and converting that collection into an object for further processing;

S42, counting, for each cluster of the component alternative set, the number of frequent sub-processes that appear in the same original flowchart;

S43, counting, for each cluster of the component alternative set, the number of frequent sub-processes that appear in different original flowcharts;

S44, letting k be the number of clusters in the component alternative set, setting a global coincidence rate weight and a local coincidence rate weight, calculating the component coincidence rate of the component alternative set, and adding it to a collection. A function imported from a module is then used to normalize the data of the converted object;

S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the alternative set as a component.

Step S5 includes:

S51, setting the component feasibility index and judging whether the feasibility of each component alternative set meets the feasibility index;

Claims

1. A reusable component mining method for complex business processes is characterized by comprising the following steps:

all frequent sub-processes with similar structures forming a set;

S3, calculating over the set and clustering it to form the component alternative sets;

S4, evaluating all component alternative sets on the basis of reusable-component feasibility, and calculating the feasibility of each alternative set forming a component;

S5, judging each component alternative set according to its feasibility.

2. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S1 includes:

S12, converting each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its starting node; nTo is the unique identification number of its destination node; and eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional ones;

S13, combining the results of S11 and S12, converting the business flowchart into a symbolically represented graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.

3. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S2 includes:

S22, the graph model;

obtaining the frequent sub-process set.

4. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S31 includes:

S311, training a model with Word2Vec and vectorizing the two nodes being compared to obtain their vectors;

S312, setting the weight of nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes.

5. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S32 includes:

where n denotes the graph model depth of the frequent sub-processes;

S322, taking the set of nodes of each of the two frequent sub-processes at layer i and the number of elements in that set, calculating the hierarchical similarity of the sub-processes;

S323, combining the hierarchical similarity and the hierarchy influence factor, calculating the behavior similarity between two frequent sub-processes having n layers.

6. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S33 includes:

the distance between two clusters;

S332, regarding each frequent sub-process as an initial cluster.

7. The mining method for reusable components oriented to complex business processes, according to claim 1, characterized in that the specific method of step S4 is:

S41, calculating the intra-cluster similarity ICS(C) of each component alternative set over the frequent sub-processes it contains, indexed by j;

S42, counting over each cluster of the component alternative set;

S43, counting over each cluster of the component alternative set;

S44, letting k be the number of clusters in the component alternative set, setting a global coincidence rate weight and a local coincidence rate weight, and calculating the component coincidence rate of the component alternative set from the weighted sub-process counts.

8. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S5 includes:

S51, setting the component feasibility index and judging whether the feasibility of each component alternative set meets the feasibility index;

9. The method for mining reusable components oriented to complex business processes, according to claim 3, wherein the step S23 includes:

the rightmost path expansion method is: given a graph G and a DFS tree T of G (the set of visited vertices of G is expanded iteratively until a complete DFS tree is created), a new edge e is added between the rightmost node and another node on the rightmost path, or a new node is introduced and connected to a node on the rightmost path.