CN113254013A - Reusable component mining method for complex business process - Google Patents

Publication number
CN113254013A
Authority
CN
China
Prior art keywords
processes
sub
frequent sub
node
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110804713.3A
Other languages
Chinese (zh)
Other versions
CN113254013B (en)
Inventor
潘鑫
李贞昊
雷航
荣燊
李若尘
柳叶康
肖泾军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110804713.3A
Publication of CN113254013A
Application granted
Publication of CN113254013B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Abstract

The invention discloses a reusable component mining method for complex business processes and belongs to the technical field of software reuse. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which greatly speeds up component construction while preserving component quality. Only the business flowcharts of a system need to be input for its reusable components to be mined automatically, with no other configuration; this shortens the development cycle of industrial management software, lowers development difficulty, and supports the reduction of later maintenance costs. Applied to an enterprise's actual software flowchart data, the method mines reusable components automatically, so that software reuse can be better exploited in development, supporting higher development efficiency and lower development cost for industrial management software.

Description

Reusable component mining method for complex business process
Technical Field
The invention belongs to the technical field of software reuse, and in particular relates to a reusable component mining method for complex business processes; the classification number is G06K.
Background
As a carrier of intelligent manufacturing, industrial management software has been deeply integrated into industrial design and manufacturing and has become the information core of the manufacturing industry. As the market grows, the architecture of industrial management software becomes more complex and its quality requirements rise, so developing it quickly and efficiently is now the main difficulty. Most industrial management software contains a large number of identical transaction-handling processes across complex business scenarios, and component-based software development has become one of the mainstream ways to speed up development. Existing reusable component mining methods mainly mine object-oriented APIs by usage frequency and are of limited practical use for industrial management software. In view of the characteristics of industrial management software systems, performing process similarity analysis on business processes and automatically mining sets of similar sub-processes from them as component modules can greatly speed up component construction while preserving component quality, so that software reuse is better exploited in developing industrial management software.
Disclosure of Invention
The invention addresses the slow and inefficient development of industrial software in the prior art, which is caused by complex business scenarios.
The technical solution of the invention is a reusable component mining method for complex business processes, comprising the following steps:
S1. Input the business flowchart set of the system and convert it, through preprocessing, into a set of symbolically represented graph models;
S2. Mine structurally similar frequent sub-processes from the graph model set; all structurally similar frequent sub-processes [formula image 001] form the set [formula image 002], [formula image 003];
S3. Compute the behavioral similarity of the frequent sub-processes in the set [formula image 002], and cluster the set [formula image 002] according to that similarity to form the component candidate set [formula image 004];
S4. Evaluate every component candidate set [formula image 004] on the basis of reusable-component feasibility, computing the feasibility of forming a component from the candidate set;
S5. Judge the component candidate set [formula image 004] by its feasibility: if the feasibility is not less than the feasibility index, construct the reusable component; otherwise, abandon construction of the component.
Further, the step S1 includes:
S11. Convert each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the node's state in the process; and nText is the specific behavior description of the node;
S12. Convert each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connection; nFrom is the unique identification number of the connection's start node; nTo is the unique identification number of the connection's destination node; and eLable is the state of the connection, used to distinguish ordinary connections from conditional connections;
S13. Combining the results of S11 and S12, convert each business flowchart into a symbolically represented graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.
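By way of illustration only, the graph model of S11 to S13 might be represented in Python roughly as follows. This is a minimal sketch rather than the implementation of the invention: the field names follow the triple and quadruple above (including the source spelling eLable), while the `successors` helper, the example flowchart, and the label values used in it are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    nId: int      # unique identification number of the node
    nLabel: int   # value encoding the node's state in the process
    nText: str    # behavior description of the node


@dataclass(frozen=True)
class Edge:
    eId: int      # unique identification number of the connection
    nFrom: int    # nId of the start node
    nTo: int      # nId of the destination node
    eLable: int   # connection state (spelling follows the source)


@dataclass
class Graph:
    N: list = field(default_factory=list)  # collection of Node triples
    E: list = field(default_factory=list)  # collection of Edge quadruples

    def successors(self, n_id):
        """Hypothetical helper: nIds reachable from n_id over one edge."""
        return [e.nTo for e in self.E if e.nFrom == n_id]


# Illustrative three-node flowchart: start -> step -> end.
g = Graph(
    N=[Node(1, 1, "start"), Node(2, 3, "review order"), Node(3, 2, "end")],
    E=[Edge(1, 1, 2, 1), Edge(2, 2, 3, 1)],
)
```

Frozen dataclasses keep nodes and edges hashable, which is convenient when they are later counted or stored in sets.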
Further, the step S2 includes:
S21. Count the frequency of edges and nodes that have the same function and, according to the preset minimum support min_spt, remove the edges and nodes whose frequency is below min_spt to obtain a new graph model [formula image 005]; whether edges and nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22. Relabel the edges and nodes of the graph model [formula image 005] by frequency class: edges or nodes with the same frequency receive the same label, and the higher the frequency, the smaller the lexicographic order of the label; at the same time, store the mapping between identification numbers and labels and the correspondence between relabeled edges and original edges;
S23. Select the label with the highest frequency; the edges and nodes carrying that label form the maximal frequent sub-process A; mine A, obtaining one frequent sub-process [formula image 001] per mining pass;
S24. After mining is finished for that label, mine the label with the second-highest frequency in the same way as step S23, then the third-highest, and so on until all frequent sub-processes [formula image 001] have been mined, giving the frequent sub-process set [formula image 002].
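The counting, pruning, and relabeling of S21 and S22 can be sketched as below. The sketch rests on several assumptions that the text does not fix: each graph is given as a dict of node labels plus a list of (nFrom, nTo, eLable) tuples, support is counted once per graph (as in the usual gSpan formulation), and ties in frequency are broken by the original label so that distinct labels stay distinct. The function name `prune_and_relabel` is hypothetical.

```python
from collections import Counter


def prune_and_relabel(graphs, min_spt):
    """graphs: list of (node_labels, edges); node_labels maps nId -> nLabel,
    edges is a list of (nFrom, nTo, eLable) tuples."""
    node_freq, edge_freq = Counter(), Counter()
    for node_labels, edges in graphs:
        # Assumption: a label counts once per graph in which it occurs.
        node_freq.update(set(node_labels.values()))
        edge_freq.update({label for _, _, label in edges})

    def rank(freq):
        # Keep labels meeting min_spt; higher frequency -> smaller new label.
        kept = [l for l, c in freq.items() if c >= min_spt]
        ordered = sorted(kept, key=lambda l: (-freq[l], l))
        return {old: new for new, old in enumerate(ordered)}

    node_map, edge_map = rank(node_freq), rank(edge_freq)

    pruned = []
    for node_labels, edges in graphs:
        nodes = {n: node_map[l] for n, l in node_labels.items() if l in node_map}
        kept_edges = [(u, v, edge_map[l]) for u, v, l in edges
                      if l in edge_map and u in nodes and v in nodes]
        pruned.append((nodes, kept_edges))
    return pruned, node_map, edge_map


# Two illustrative graphs; node label 4 and edge label 2 occur only once.
graphs = [
    ({1: 1, 2: 3, 3: 2}, [(1, 2, 1), (2, 3, 1)]),
    ({1: 1, 2: 3, 3: 4, 4: 2}, [(1, 2, 1), (2, 3, 1), (3, 4, 2)]),
]
pruned, node_map, edge_map = prune_and_relabel(graphs, min_spt=2)
```

The returned `node_map` and `edge_map` play the role of the stored mapping between original and new labels, so mined sub-processes can be translated back afterwards.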
Further, the step S23 includes:
S231. The maximal frequent sub-process A is the first frequent sub-process, and each subsequent frequent sub-process has one edge fewer than the previous one; judge whether the current frequent sub-process has the minimum DFS code;
S232. If it does not have the minimum DFS code, end the mining of this sub-process;
S233. If it has the minimum DFS code, perform rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
The rightmost path extension method is as follows: given a graph G and a DFS tree T of G (the set of visited vertices of G is expanded iteratively until a complete DFS tree is built), add a new edge e between the rightmost node and another node on the rightmost path, or introduce a new node and connect it to a node on the rightmost path;
S234. Judge whether the new frequent sub-process meets the minimum support min_spt; if it does, store the new frequent sub-process in the frequent sub-process set [formula image 002].
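The rightmost path extension described above can be illustrated with a small sketch that only enumerates where a new edge may be attached. It is a simplification under stated assumptions: a DFS code is reduced to a list of (from, to) DFS-index pairs, labels and edge direction are ignored, and neither the minimum DFS code test nor support counting is shown. The function names are hypothetical.

```python
def rightmost_path(dfs_code):
    """Return DFS indices from the root (0) to the rightmost vertex.

    dfs_code is a list of (frm, to) pairs; frm < to marks a forward edge,
    frm > to a backward edge.
    """
    path, target = [], None
    for frm, to in reversed(dfs_code):
        # Walk back along the forward edges that lead to the rightmost vertex.
        if frm < to and (target is None or to == target):
            path.append(to)
            target = frm
    path.append(0)
    return path[::-1]


def extension_slots(dfs_code):
    """List candidate backward and forward extensions as index pairs."""
    path = rightmost_path(dfs_code)
    rightmost = path[-1]
    existing = {frozenset(edge) for edge in dfs_code}
    # Backward: from the rightmost vertex to an earlier vertex on the path.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in existing]
    # Forward: from any vertex on the path to a brand-new vertex.
    new_vertex = max([0] + [max(edge) for edge in dfs_code]) + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward


code = [(0, 1), (1, 2), (2, 0), (1, 3)]
path = rightmost_path(code)
backward, forward = extension_slots(code)
```

Restricting growth to the rightmost path is what lets a gSpan-style search avoid generating the same sub-process through many different edge orders.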
Further, the step S3 includes:
S31. Calculate the semantic similarity of the nodes of two frequent sub-processes;
S32. Starting from the semantic similarity of the frequent sub-process nodes, add a hierarchy influence factor and calculate the behavioral similarity of the frequent sub-processes;
S33. Cluster the frequent sub-processes with a hierarchical clustering algorithm to form the component candidate set [formula image 006].
Further, the step S31 includes:
S311. Train a model with Word2Vec and vectorize node [formula image 007] and node [formula image 008] to obtain [formula image 009] and [formula image 010];
S312. Set the weight [formula image 011] of nouns in the node semantics and calculate the semantic similarity of nodes in the sub-processes, denoted [formula image 012]:
[formula image 013]
S313. Using the methods of S311 and S312, calculate pairwise the semantic similarity of all nodes in the two frequent sub-processes.
Further, the step S32 includes:
S321. Calculate the hierarchy influence factor [formula image 014] of each layer of the frequent sub-processes, where n denotes the graph model depth of the frequent sub-processes:
[formula image 015]
S322. [formula image 016] denotes the set of nodes of frequent sub-process [formula image 017] at layer i, and [formula image 018] denotes the number of elements in that set; [formula image 019] and [formula image 020] denote the j-th node at layer i of frequent sub-process [formula image 017] and of frequent sub-process [formula image 021], respectively. From the node similarity [formula image 022], calculate the layer similarity [formula image 023] of the sub-processes:
[formula image 024]
S323. Combining the layer similarity [formula image 025] and the hierarchy influence factor [formula image 026], calculate the behavioral similarity [formula image 029] between frequent sub-process [formula image 017] and frequent sub-process [formula image 027], each having n layers:
[formula image 030]
Further, the step S33 includes:
S331. Based on the behavioral similarity of the frequent sub-processes, calculate the distance [formula image 033] between clusters [formula image 031] and [formula image 032]:
[formula image 034]
S332. Treat each frequent sub-process as an initial cluster;
S333. Find the two closest clusters and merge them, repeating until all frequent sub-processes form a single cluster;
S334. Record the cluster partition at each level of the clustering process to form the component candidate set [formula image 035].
Further, the step S4 includes:
S41. Calculate the intra-cluster similarity ICS(C) of the component candidate set:
[formula image 036]
where [formula image 038] denotes the j-th frequent sub-process;
S42. Count the number of sub-processes of [formula image 040] in component candidate set [formula image 039] that appear in the same original flowchart, denoted [formula image 041];
S43. Count the number of sub-processes of [formula image 040] in component candidate set [formula image 039] that appear in different original flowcharts, denoted [formula image 042];
S44. Let k be the number of [formula image 040] sets in component candidate set [formula image 039], [formula image 043]; set the global coincidence-rate weight [formula image 044] and the local coincidence-rate weight [formula image 045], and calculate the component coincidence rate of the component candidate set:
[formula image 046]
where the weighted sub-process count similarity is [formula image 047];
S45. Considering both the intra-cluster similarity and the coincidence rate, calculate the feasibility of the component candidate set as a component:
[formula image 048]
Further, the step S5 includes:
S51. Set a component feasibility index and judge whether the component candidate set [formula image 049] meets it;
S52. If the feasibility is greater than or equal to the feasibility index, output the candidate set and construct the reusable component;
S53. If the feasibility is less than the feasibility index, abandon construction of the component.
Advantageous effects
Industrial management software systems contain a large number of identical transaction-handling processes across their business scenarios. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them as component modules of the software, greatly speeding up component construction while preserving component quality;
With this method, only the business flowcharts of a system need to be input for its reusable components to be mined automatically, with no other configuration; this shortens the development cycle of industrial management software, lowers development difficulty, and supports the reduction of later maintenance costs;
Applied to an enterprise's actual software flowchart data, the method mines reusable components automatically, so that software reuse can be better exploited in development, supporting higher development efficiency and lower development cost for industrial management software.
The invention can analyze the business flowchart set of industrial management software and quickly mine highly reusable sub-processes from complex business flows as reusable components, greatly speeding up component construction and making better use of software reuse in development.
Drawings
FIG. 1 is a flow chart of the scheme of the invention.
FIG. 2 is an example of the conversion of the business process flow diagram of S1 of the present invention into a graph model.
FIG. 3 is a flowchart illustrating the step S2 according to the present invention.
FIG. 4 is a flowchart illustrating the step S3 according to the present invention.
FIG. 5 is a diagram illustrating a rightmost path expansion method according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in FIG. 1, the flowchart of the solution, the reusable component mining method for complex business processes of the invention comprises:
S1. Build a front-end flowchart drawing function in JavaScript; enter the business flowchart set of an industrial management software system on the front-end page to obtain a data set in JSON format, and save the file as flowchart.
The flowchart preprocessing module works as follows:
S11. Read the file flowcsv and parse the JSON-formatted data in Python. Create a Node class and convert each flowchart node into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the node's state in the process, as shown in Table 1; and nText is the specific behavior description of the node;
TABLE 1
Node state class        Corresponding nLabel
Start                   1
End                     2
Step                    3
Condition               4
Document                5
Data                    6
Frequent sub-process    7
Annotation              8
S12. Create an Edge class and convert each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connection; nFrom is the unique identification number of the connection's start node; nTo is the unique identification number of the connection's destination node; and eLable is the state of the connection, used to distinguish ordinary connections from conditional connections, as shown in Table 2;
TABLE 2
Connection type             Corresponding eLable
Ordinary connection         1
Conditional connection Y    2
Conditional connection N    3
S13. Combining the results of S11 and S12, create a Graph class and convert each business flowchart into a pair G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples. After each flowchart has been converted into a Graph object, add it to the graph model set graph_set.
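As an illustration of this preprocessing, the following sketch maps a JSON description of one flowchart onto the triples and quadruples using the values of Tables 1 and 2. The JSON layout (keys `nodes`, `edges`, `id`, `type`, `text`, `from`, `to`), the type names used as dictionary keys, and the function name `flowchart_to_graph` are all assumptions; the actual format exported by the front end is not given in the text.

```python
import json

# Label values taken from Table 1 and Table 2; the key strings are assumed.
NODE_LABELS = {"start": 1, "end": 2, "step": 3, "condition": 4,
               "document": 5, "data": 6, "frequent sub-process": 7,
               "annotation": 8}
EDGE_LABELS = {"ordinary": 1, "condition_Y": 2, "condition_N": 3}


def flowchart_to_graph(raw):
    """Convert one JSON flowchart into (N, E) lists of plain tuples."""
    doc = json.loads(raw)
    # Node = (nId, nLabel, nText)
    nodes = [(n["id"], NODE_LABELS[n["type"]], n["text"]) for n in doc["nodes"]]
    # Edge = (eId, nFrom, nTo, eLable)
    edges = [(e["id"], e["from"], e["to"], EDGE_LABELS[e["type"]])
             for e in doc["edges"]]
    return nodes, edges


# Hypothetical flowchart with one condition node.
sample = {
    "nodes": [
        {"id": 1, "type": "start", "text": "begin"},
        {"id": 2, "type": "condition", "text": "stock available?"},
        {"id": 3, "type": "end", "text": "done"},
    ],
    "edges": [
        {"id": 1, "from": 1, "to": 2, "type": "ordinary"},
        {"id": 2, "from": 2, "to": 3, "type": "condition_Y"},
    ],
}
graph_set = [flowchart_to_graph(json.dumps(sample))]
```

An unknown node or connection type raises a KeyError here, which surfaces malformed input early instead of silently assigning a label.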
S2. Using the frequent sub-process algorithm gSpan, mine structurally similar sub-processes from the directed graph model data set and record them as sets [formula image 050]; each set contains several structurally similar frequent sub-processes, denoted [formula image 051]. Algorithm parameter: the minimum support min_spt of frequent sub-processes is set to 2.
The specific steps of the frequent sub-process algorithm gSpan are shown in FIG. 3 and are as follows:
S21. Traverse [formula image 052], use [formula image 053] to count how often the labels of edges and nodes occur in the graphs, store the counts in [formula image 054] and [formula image 055] respectively, and sort them by frequency from high to low. According to the preset minimum support min_spt, remove from [formula image 054] and [formula image 055] the nodes and edges whose frequency is too low, obtaining a new graph G. Whether a sub-process is frequent is judged by its support: a sub-process whose support is greater than or equal to the minimum support min_spt is considered frequent;
S22. Create a DFSEdge class, relabel the edges of graph G as DFSEdge(0, 1, vevlb), and add them to the DFS code [formula image 056]. The labeling rule follows frequency from high to low: the higher the frequency, the smaller the lexicographic order of the assigned label. At the same time, store the mapping between original labels and new labels, denoted [formula image 057].
S23. The edges with the highest frequency form a set E. Sort E in lexicographic order: the smaller the lexicographic order, the higher the frequency and the earlier the corresponding edge is ranked.
S24. Each edge in E can be regarded as the simplest frequent sub-process, and depth-first mining is carried out starting from it.
Further, the step S24 includes:
S241. Judge whether the frequent sub-process has the minimum DFS code;
S242. If it has the minimum DFS code, perform rightmost path extension on the frequent sub-process to obtain new frequent sub-processes. Rightmost path extension: given a graph G and a DFS tree T of G (the set of visited vertices is expanded iteratively until a complete DFS tree is built), a new edge e may be added between the rightmost node and another node on the rightmost path (backward extension), or a new node may be introduced and connected to a node on the rightmost path (forward extension). Both kinds of extension occur on the rightmost path, as shown in FIG. 5;
S243. Judge whether the new frequent sub-process meets the minimum support min_spt; if it does, add the new frequent sub-process to the result set and return to S241 to continue mining recursively;
S244. If it does not have the minimum DFS code, end the mining of this frequent sub-process.
S3. For any two elements [formula image 059] and [formula image 060] of a structurally similar frequent sub-process set [formula image 058], calculate their behavioral similarity, use the similarity to measure the distance between frequent sub-processes, and hierarchically cluster each set by that distance to form the component candidate set [formula image 061].
The specific process of step S3 is shown in FIG. 4 and is as follows:
S31. From the nText information of the frequent sub-process nodes, calculate the semantic similarity Node_similarity of the nodes of each pair of frequent sub-processes [formula image 059] and [formula image 060].
The specific steps for calculating the semantic similarity of the nodes are as follows:
S311. Import the Python [formula image 062] module in the program, use it to train a Word2Vec model on corpora such as Wikipedia and engineering terminology, and save the model file as embedding_64.model;
S312. Traverse all flowchart nodes in the frequent sub-process sets and add each node's nText to a collection [formula image 063]. Import the Python [formula image 064] module and use its [formula image 065] function to segment every nText in [formula image 063] into words. Then load the Word2Vec model from the file embedding_64.model, use it to vectorize the segmented words, denote each word vector as vec(n), and store the vectors in the word-vector matrix [formula image 066].
S313. Using a loop, perform the following on every pair of distinct frequent sub-processes in each set of [formula image 058]: through breadth-first search, obtain for each layer the nText of the corresponding nodes [formula image 067] and [formula image 068] of the two frequent sub-processes [formula image 059] and [formula image 060]; segment each nText with the [formula image 064] module, look up the word vectors in the word-vector matrix [formula image 066], and average them to obtain the node semantic vectors [formula image 069] and [formula image 070]. Also set the weight [formula image 071] of nouns in the node semantics, and calculate the semantic similarity [formula image 072] of the nodes in the frequent sub-processes according to the following formula, denoted [formula image 073]:
[formula image 074]
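One possible reading of S313 is sketched below. Because the similarity formula itself is not recoverable from the text, the sketch makes explicit assumptions: nouns are weighted by `noun_weight` and all other words by `1 - noun_weight` when averaging word vectors, and the two node vectors are compared by cosine similarity. The toy two-dimensional vectors stand in for the trained Word2Vec model, and the function names are hypothetical.

```python
import math


def node_vector(words, vectors, nouns, noun_weight):
    """Weighted mean of the word vectors of one segmented nText."""
    dim = len(next(iter(vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for word in words:
        if word not in vectors:
            continue  # out-of-vocabulary words are skipped
        # Assumed weighting scheme: nouns count more than other words.
        w = noun_weight if word in nouns else 1.0 - noun_weight
        total = [t + w * x for t, x in zip(total, vectors[word])]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for a trained word-vector matrix.
vectors = {"review": [1.0, 0.0], "order": [0.0, 1.0], "ship": [1.0, 1.0]}
nouns = {"order"}
a = node_vector(["review", "order"], vectors, nouns, noun_weight=0.7)
b = node_vector(["ship", "order"], vectors, nouns, noun_weight=0.7)
similarity = cosine(a, b)
```

With a noun weight above 0.5, two nodes that act on the same object ("order") come out as highly similar even when their verbs differ, which matches the stated intent of weighting nouns.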
S32. Starting from the node semantic similarity Node_similarity of the sub-processes, add the hierarchy influence factor Hierarchical_Weight and calculate the behavioral similarity Behavioral_Similarity of the frequent sub-processes.
The specific steps for calculating the behavioral similarity of the sub-processes are as follows:
S321. The layer containing the nodes with in-degree 0 in the frequent sub-process graph model is layer 0, the layer containing the child nodes of layer 0 is layer 1, and so on. During the breadth-first search of step S313, also calculate the graph model depth of the frequent sub-processes, denoted n, and then loop over the layers to calculate the hierarchy influence factor [formula image 075] of each layer according to the following formula:
[formula image 076]
S322. During the breadth-first search of step S313, collect the sets of layer-i nodes of [formula image 059] and [formula image 060] and count the nodes in each set. From the node similarity [formula image 073] obtained in S313, sum the similarities of the corresponding layer-i nodes to obtain [formula image 077], and divide by the number of layer-i nodes [formula image 078] to obtain the layer similarity [formula image 079] of the processes, denoted [formula image 080]:
[formula image 081]
S323. Combining the layer similarity [formula image 080] and the hierarchy influence factor [formula image 082], calculate according to the following formula the behavioral similarity [formula image 083] between the n-layer frequent sub-processes [formula image 059] and [formula image 060], denoted [formula image 084]:
[formula image 085]
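Steps S321 to S323 can be sketched as follows. The layering by breadth-first search from in-degree-0 nodes follows the text; everything else is an assumption, since the formulas are not recoverable: the hierarchy influence factor is taken to decrease linearly with depth and to sum to 1, corresponding nodes are paired by their position within a layer, and the layer sum is divided by the larger of the two layer sizes. The function names are hypothetical.

```python
from collections import deque


def bfs_layers(nodes, edges):
    """Group nodes by layer; nodes with in-degree 0 form layer 0."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    layer = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(layer)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in layer:  # first visit fixes the layer
                layer[v] = layer[u] + 1
                queue.append(v)
    if not layer:
        return []
    layers = [[] for _ in range(max(layer.values()) + 1)]
    for n in nodes:  # keep input order inside each layer
        if n in layer:
            layers[layer[n]].append(n)
    return layers


def behavior_similarity(layers_a, layers_b, node_sim):
    """Weighted sum of per-layer similarities (weights are an assumption)."""
    n = max(len(layers_a), len(layers_b))
    if n == 0:
        return 0.0
    raw = [n - i for i in range(n)]  # shallower layers weigh more
    weights = [r / sum(raw) for r in raw]
    total = 0.0
    for i in range(n):
        la = layers_a[i] if i < len(layers_a) else []
        lb = layers_b[i] if i < len(layers_b) else []
        size = max(len(la), len(lb))
        paired = sum(node_sim(x, y) for x, y in zip(la, lb))
        total += weights[i] * (paired / size if size else 0.0)
    return total


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
layers = bfs_layers(nodes, edges)
exact = lambda x, y: 1.0 if x == y else 0.0  # stand-in for Node_similarity
self_similarity = behavior_similarity(layers, layers, exact)
```

In practice `node_sim` would be the semantic similarity of S31; the exact-match function is used here only so the result is easy to check by hand.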
S33. Cluster each frequent sub-process set [formula image 086] with a hierarchical clustering algorithm, partitioning the set at different levels according to the distance between clusters to form the component candidate set [formula image 087].
The specific steps of the hierarchical clustering are as follows:
S331. Following the Complete-Link definition of hierarchical clustering and based on the pairwise behavioral similarity of the frequent sub-processes, calculate the distance between clusters [formula image 088] and [formula image 089], denoted [formula image 090], and [formula image 091]:
[formula image 092]
S332. Treat each frequent sub-process of the frequent sub-process set as an initial cluster [formula image 093];
S333. Repeat the following until all frequent sub-processes form a single cluster: find the two closest clusters [formula image 094] and [formula image 095], merge them, renumber the clusters [formula image 096] as [formula image 097] [formula image 098], and delete row [formula image 099] and column [formula image 099] of the distance matrix [formula image 090]. At each repetition, store the clustering result and the merge distance in the structure matrix [formula image 100];
S334. Traverse the result matrix [formula image 100] and record the set partitions at the different levels to form the component candidate set [formula image 101].
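The clustering loop of S331 to S334 can be illustrated with a small pure-Python sketch of complete-link agglomerative clustering. Two points are assumptions rather than statements of the text: the distance between two frequent sub-processes is taken as 1 minus their behavioral similarity, and the partition at every merge level is kept as a list instead of the structure matrix described above. The function name `complete_link` is hypothetical.

```python
def complete_link(sim, names):
    """Agglomerative clustering with complete linkage.

    sim[i][j] is the behavioral similarity in [0, 1]; the distance between
    two items is assumed to be 1 - sim[i][j]. Returns one
    (merge_distance, partition) entry per merge level.
    """
    clusters = [[i] for i in range(len(names))]  # S332: one item per cluster
    history = []

    def dist(a, b):
        # Complete link: the largest pairwise distance between two clusters.
        return max(1.0 - sim[i][j] for i in a for j in b)

    while len(clusters) > 1:  # S333: merge the two closest clusters
        d, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
        # S334: record the partition reached at this level.
        history.append((d, [sorted(names[i] for i in c) for c in clusters]))
    return history


names = ["f1", "f2", "f3"]
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
history = complete_link(sim, names)
```

Each recorded partition is one candidate division of the set; the feasibility evaluation of step S4 then decides which candidate sets are worth turning into components.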
S4, set of all component alternatives
Figure 974796DEST_PATH_IMAGE102
And evaluating based on the feasibility of the multiplexing component, and calculating the feasibility of the component formed by the component candidate set.
The specific process of the step 4 is as follows:
s41 rootAccording to
Figure 266100DEST_PATH_IMAGE090
Computing a set of component alternatives according to the formula
Figure 861029DEST_PATH_IMAGE103
Degree of intra-cluster similarity of
Figure 387825DEST_PATH_IMAGE104
Is added to
Figure 525546DEST_PATH_IMAGE105
In, convert to
Figure 846806DEST_PATH_IMAGE106
Object, is marked as
Figure 69977DEST_PATH_IMAGE107
Figure 666043DEST_PATH_IMAGE108
S42, statistical component alternative set
Figure 517324DEST_PATH_IMAGE103
In
Figure 150431DEST_PATH_IMAGE109
The number of frequent sub-processes appearing in the same original flow chart is recorded as
Figure 719953DEST_PATH_IMAGE041
S43, statistical component alternative set
Figure 729497DEST_PATH_IMAGE103
In
Figure 435285DEST_PATH_IMAGE109
The number of frequent sub-processes appearing in different original flow charts is recorded as
Figure 629506DEST_PATH_IMAGE042
S44, letting k be the number of [formula image 109] in the component candidate set [formula image 103], [formula image 110]; setting the global coincidence-rate weight [formula image 111] and the local coincidence-rate weight [formula image 112]; computing the component coincidence rate [formula image 113] of the component candidate set by the following formula and adding it to [formula image 114]. The [formula image 116] function of the [formula image 115] module is imported, and the function [formula image 117] is used to normalize the data of [formula image 114] after it has been converted to a [formula image 106] object; the result is denoted [formula image 118]:
[formula image 119]
where:
[formula image 120]
S45, taking both the intra-cluster similarity and the coincidence rate into account, computing the feasibility of the candidate set [formula image 121] as a component, denoted [formula image 122], by the following formula:
[formula image 123]
Step S5 includes:
S51, setting a component feasibility index and judging whether the [formula image 124] of the component candidate set [formula image 121] meets the feasibility index;
S52, if it is greater than or equal to the feasibility index, outputting the candidate set and constructing a reusable component;
S53, if it is less than the feasibility index, abandoning construction of the component.
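The decision in steps S51 to S53 can be sketched as a simple threshold filter. This is a minimal illustration only: the function name `select_components`, the dictionary layout, and the sample scores are assumptions, and the feasibility values are taken as already computed by step S4.

```python
def select_components(feasibility_by_candidate, feasibility_index):
    """Split candidate sets into accepted and rejected ones (S51-S53).

    feasibility_by_candidate: hypothetical mapping from a candidate-set
    name to its feasibility score from step S4.
    """
    accepted, rejected = [], []
    for name, feasibility in feasibility_by_candidate.items():
        if feasibility >= feasibility_index:
            # S52: meets the index, so the candidate set is output
            # and a reusable component would be built from it.
            accepted.append(name)
        else:
            # S53: below the index, so construction is abandoned.
            rejected.append(name)
    return accepted, rejected


# Illustrative scores, not taken from the patent.
accepted, rejected = select_components({"C1": 0.8, "C2": 0.5, "C3": 0.6}, 0.6)
```

Note that a score exactly equal to the index is accepted, matching the "greater than or equal to" wording of S52.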

Claims (10)

1. A reusable component mining method for complex business processes, characterized by comprising the following steps:
S1, inputting the set of business flow charts of a system and converting it, through preprocessing, into a set of symbolically represented graph models;
S2, mining structurally similar frequent sub-processes from the graph model set, all mined frequent sub-processes [formula image 001] forming the set [formula image 002], [formula image 003];
S3, computing the behavior similarity of the frequent sub-processes in the set [formula image 002] and clustering the set [formula image 002] according to that similarity to form component candidate sets [formula image 004];
S4, evaluating all component candidate sets [formula image 004] for feasibility as reusable components, and computing the feasibility of forming a component from each candidate set;
S5, judging from the feasibility whether a component candidate set [formula image 004] meets the feasibility index; if the feasibility is greater than or equal to the feasibility index, constructing the reusable component, otherwise abandoning construction of the component.
2. The reusable component mining method for complex business processes according to claim 1, wherein step S1 includes:
S11, converting each flow chart node in the business flow chart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; and nText is the specific behavior description of the node;
S12, converting each flow chart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its start node; nTo is the unique identification number of its destination node; and eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional ones;
S13, combining the results of S11 and S12 to convert the business flow chart into a symbolically represented graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.
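A minimal sketch of the graph model of claim 2, assuming Python dataclasses as the container. The field names follow the claim (including its spelling `eLable`); the helper `build_graph`, the consistency check, and the label values used in the example are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    nId: int       # unique identification number of the node
    nLabel: str    # value for the node's state in the flow
    nText: str     # behavior description of the node


@dataclass(frozen=True)
class Edge:
    eId: int       # unique identification number of the connecting line
    nFrom: int     # nId of the start node
    nTo: int       # nId of the destination node
    eLable: str    # ordinary vs. conditional line (spelling as in the claim)


@dataclass
class Graph:
    N: list = field(default_factory=list)  # set of Node triples
    E: list = field(default_factory=list)  # set of Edge quadruples


def build_graph(raw_nodes, raw_edges):
    """Hypothetical helper: build G = (N, E) from plain tuples (S11-S13)."""
    nodes = [Node(*n) for n in raw_nodes]
    known_ids = {n.nId for n in nodes}
    edges = [Edge(*e) for e in raw_edges]
    for e in edges:
        # Added sanity check, not part of the claim: edges must join known nodes.
        if e.nFrom not in known_ids or e.nTo not in known_ids:
            raise ValueError(f"edge {e.eId} references an unknown node")
    return Graph(nodes, edges)


g = build_graph(
    [(1, "start", "Receive order"), (2, "task", "Check stock")],
    [(1, 1, 2, "normal")],
)
```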
3. The reusable component mining method for complex business processes according to claim 1, wherein step S2 includes:
S21, counting the frequency of edges and nodes having the same function and, according to a preset minimum support min_spt, removing those whose frequency is less than min_spt to obtain a new graph model [formula image 005]; whether edges or nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22, re-labeling the edges and nodes of the graph model [formula image 005] by frequency class, the labeling rule being that the higher the frequency, the smaller the lexicographic order of the label, and that edges or nodes with the same frequency receive the same label; at the same time, storing the mapping between the identification numbers, the labels, and the original edges and nodes;
S23, selecting the label with the highest frequency, the edges and nodes corresponding to that label forming the maximal frequent sub-process A; mining the maximal frequent sub-process A, each mining pass yielding one frequent sub-process [formula image 001];
S24, mining in turn the label with the second-highest frequency, then the label with the third-highest frequency, and so on by the method of step S23 until all frequent sub-processes [formula image 001] have been mined, obtaining the frequent sub-process set [formula image 002].
4. The reusable component mining method for complex business processes according to claim 1, wherein step S3 includes:
S31, computing the semantic similarity of the nodes of two frequent sub-processes;
S32, adding a hierarchy influence factor to the semantic similarity of the frequent sub-process nodes, and computing the behavior similarity of the frequent sub-processes;
S33, clustering the frequent sub-processes with a hierarchical clustering algorithm to form the component candidate sets [formula image 006].
5. The reusable component mining method for complex business processes according to claim 1, characterized in that the specific method of step S4 is:
S41, computing the intra-cluster similarity ICS(C) of the component candidate set:
[formula image 007]
where [formula image 009] denotes the j-th frequent sub-process;
S42, counting the number of sub-processes of cluster [formula image 011] in the component candidate set [formula image 010] that appear in the same original flow chart, denoted [formula image 012];
S43, counting the number of sub-processes of cluster [formula image 011] in the component candidate set [formula image 010] that appear in different original flow charts, denoted [formula image 013];
S44, letting k be the number of clusters [formula image 011] in the component candidate set [formula image 010], [formula image 014]; setting the global coincidence-rate weight [formula image 015] and the local coincidence-rate weight [formula image 016]; and computing the component coincidence rate of the component candidate set:
[formula image 017]
where the weighted sub-process similarity number is:
[formula image 018]
S45, taking both the intra-cluster similarity and the coincidence rate into account, computing the feasibility of the component candidate set as a component:
[formula image 019]
6. The reusable component mining method for complex business processes according to claim 1, wherein step S5 includes:
S51, setting a component feasibility index and judging whether the [formula image 021] of the component candidate set [formula image 020] meets the feasibility index;
S52, if it is greater than or equal to the feasibility index, outputting the candidate set and constructing a reusable component;
S53, if it is less than the feasibility index, abandoning construction of the component.
7. The reusable component mining method for complex business processes according to claim 3, wherein step S23 includes:
S231, taking the maximal frequent sub-process A as the first frequent sub-process, each subsequent frequent sub-process having one edge fewer than the previous one; judging whether the current frequent sub-process satisfies the minimum DFS code;
S232, if the minimum DFS code is not satisfied, ending the mining of this sub-process;
S233, if the minimum DFS code is satisfied, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
the rightmost path extension method being: given a graph G and a DFS tree T of G, where T covers the vertices of G visited so far, T is extended repeatedly until a complete DFS tree is built, each extension either adding a new edge e between the rightmost node and another node on the rightmost path, or introducing a new node and connecting it to a node on the rightmost path;
S234, judging whether each new frequent sub-process meets the minimum support min_spt and, if so, storing it in the frequent sub-process set [formula image 002].
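The rightmost path extension described in S233 matches the extension rule used by gSpan-style miners. The sketch below only enumerates where new edges may be attached; it omits labels, the minimum DFS code test of S231, and the support check of S234. The representation of a DFS code as a list of `(i, j)` discovery-index pairs and both function names are assumptions made for illustration.

```python
def rightmost_path(dfs_code):
    """Return the discovery indices from the root to the rightmost vertex.

    dfs_code: list of (i, j) pairs of DFS discovery indices; a pair with
    i < j is a forward (tree) edge, a pair with i > j is a backward edge.
    """
    parent = {}
    for i, j in dfs_code:
        if i < j:
            parent[j] = i  # forward edge: vertex j was discovered from i
    if not parent:
        return [0]         # single-vertex pattern
    v = max(parent)        # rightmost vertex = last vertex discovered
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path[::-1]


def extension_slots(dfs_code):
    """List candidate backward and forward extensions of a DFS code."""
    path = rightmost_path(dfs_code)
    rightmost = path[-1]
    existing = {frozenset(e) for e in dfs_code}
    # New edge between the rightmost vertex and another vertex on the path.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in existing]
    # New vertex attached to a vertex on the rightmost path.
    new_vertex = rightmost + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward


# Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 1.
backward, forward = extension_slots([(0, 1), (1, 2), (2, 0), (1, 3)])
```

In this example the rightmost path is 0, 1, 3, so vertex 2 is not eligible for extension, which is what keeps the search from generating the same pattern along several routes.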
8. The reusable component mining method for complex business processes according to claim 4, wherein step S31 includes:
S311, training a model with Word2Vec and vectorizing node [formula image 022] and node [formula image 023] to obtain [formula image 024] and [formula image 025];
S312, setting the weight [formula image 026] of nouns in the node semantics and computing the semantic similarity of nodes in the sub-processes, denoted [formula image 027]:
[formula image 028]
S313, computing the semantic similarity of all nodes in the two frequent sub-processes pairwise by the methods of S311 and S312.
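The similarity formula of S312 is a formula image that is not reproduced in this text, so the following is not the claimed formula. It is only a sketch of one plausible shape, assuming that the noun part and the remaining words of each node text have already been embedded as separate vectors (for instance by a trained Word2Vec model) and that the two cosine similarities are mixed linearly by the noun weight. The function names and the mixing rule are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def node_similarity(noun_u, other_u, noun_v, other_v, noun_weight):
    """Assumed mixing rule: weight the noun similarity by noun_weight."""
    return (noun_weight * cosine(noun_u, noun_v)
            + (1 - noun_weight) * cosine(other_u, other_v))


# Toy 2-d vectors standing in for Word2Vec embeddings.
score = node_similarity([1, 0], [0, 1], [1, 0], [1, 0], noun_weight=0.7)
```

With identical noun vectors and orthogonal remaining-word vectors, the score equals the noun weight, which shows how a larger weight makes the nouns dominate the comparison.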
9. The reusable component mining method for complex business processes according to claim 4, wherein step S32 includes:
S321, computing the hierarchy influence factor [formula image 029] of each layer of the frequent sub-process, where n denotes the depth of the graph model of the frequent sub-process:
[formula image 030]
S322, [formula image 031] denotes the set of nodes at the i-th layer of frequent sub-process [formula image 032], and [formula image 033] denotes the number of elements in that set; [formula image 034] and [formula image 035] denote the j-th node at the i-th layer of frequent sub-process [formula image 032] and of frequent sub-process [formula image 036], respectively; computing the hierarchy similarity [formula image 038] of the sub-processes from the node similarity [formula image 037]:
[formula image 039]
S323, combining the hierarchy similarity [formula image 040] and the hierarchy influence factor [formula image 041], computing the behavior similarity between frequent sub-process [formula image 032] and frequent sub-process [formula image 042], each having n layers:
[formula image 045]
10. The reusable component mining method for complex business processes according to claim 4, wherein step S33 includes:
S331, computing, from the behavior similarity of the frequent sub-processes, the distance [formula image 048] between clusters [formula image 046] and [formula image 047]:
[formula image 049]
S332, treating each frequent sub-process as an initial cluster;
S333, finding the two closest clusters and merging them, and repeating this process until all frequent sub-processes form a single cluster;
S334, recording the cluster partition at each level of the clustering process to form the component candidate sets [formula image 050].
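Steps S332 to S334 describe bottom-up (agglomerative) hierarchical clustering with every intermediate partition kept. The cluster distance of S331 is a formula image that is not reproduced here, so the sketch below substitutes average linkage over `1 - similarity` as an assumption. The function names and the similarity dictionary keyed by `frozenset` pairs are also illustrative.

```python
from itertools import combinations


def cluster_distance(a, b, sim):
    """Assumed distance: mean of (1 - similarity) over all cross pairs."""
    total = sum(1.0 - sim[frozenset((x, y))] for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(items, sim):
    """Merge the two closest clusters until one remains; keep every level."""
    clusters = [frozenset([x]) for x in items]   # S332: singleton clusters
    levels = [list(clusters)]
    while len(clusters) > 1:                     # S333: repeat merging
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], sim))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))            # S334: record this level
    return levels


# Illustrative pairwise behavior similarities, not taken from the patent.
sim = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p1", "p3")): 0.2,
    frozenset(("p2", "p3")): 0.3,
}
levels = agglomerate(["p1", "p2", "p3"], sim)
```

Each entry of `levels` is one partition of the frequent sub-processes, so the list as a whole corresponds to the candidate sets that step S4 would then score.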
CN202110804713.3A 2021-07-16 2021-07-16 Reusable component mining method for complex business process Active CN113254013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110804713.3A CN113254013B (en) 2021-07-16 2021-07-16 Reusable component mining method for complex business process

Publications (2)

Publication Number Publication Date
CN113254013A true CN113254013A (en) 2021-08-13
CN113254013B CN113254013B (en) 2021-09-24

Family

ID=77180513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110804713.3A Active CN113254013B (en) 2021-07-16 2021-07-16 Reusable component mining method for complex business process

Country Status (1)

Country Link
CN (1) CN113254013B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185395A (en) * 2023-04-21 2023-05-30 华能信息技术有限公司 Flow component templatization definition method and system

Also Published As

Publication number Publication date
CN113254013B (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant