CN113254013B - Reusable component mining method for complex business process - Google Patents
- Publication number
- CN113254013B (application CN202110804713.3A)
- Authority
- CN
- China
- Prior art keywords
- processes
- sub
- frequent sub
- node
- frequent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/36—Software reuse
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Abstract
The invention discloses a reusable component mining method for complex business processes and belongs to the technical field of software reuse. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which speeds up component construction while preserving component quality. Given only the business flowcharts of a system, the method automatically mines the reusable components of that system with no other configuration, which shortens the development cycle of industrial management software, reduces development difficulty, and supports lower software maintenance costs later on. Applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse can be put to better use in development, improving the development efficiency and reducing the development cost of industrial management software.
Description
Technical Field
The invention belongs to the technical field of software reuse, and particularly relates to a reusable component mining method for complex business processes; the classification number is G06K.
Background
As a carrier of intelligent manufacturing, industrial management software has been deeply integrated into industrial design and manufacturing processes and has become the information core of the manufacturing industry. As the market grows, the architecture of industrial management software becomes more complex and its quality requirements rise, so developing such software quickly and efficiently is the main difficulty of current industrial management software development. Most industrial management software contains a large number of identical transaction handling processes across its complex business scenarios, and component-based software development has therefore become one of the mainstream ways to improve development efficiency. Existing reusable component mining methods mainly mine object-oriented APIs by usage frequency, which is of limited practical use for industrial management software. In view of the characteristics of industrial management software systems, the invention performs process similarity analysis on business processes and automatically mines sets of similar sub-processes from them to serve as component modules of the software. This speeds up component construction while preserving component quality, so that software reuse can be put to better use in developing industrial management software.
Disclosure of Invention
The invention addresses the slow and inefficient development of industrial software in the prior art, which is caused by complex business scenarios.
The technical scheme of the invention is a reusable component mining method for complex business processes, comprising the following steps:
S1, inputting the business flowchart set of a system and converting it, through preprocessing, into a set of symbolic graph models;
S2, mining structurally similar frequent sub-processes from the graph model set, all structurally similar frequent sub-processes forming a set;
S3, calculating the behavior similarity of the frequent sub-processes in each set and clustering the set according to the similarity to form component candidate sets;
S4, evaluating all component candidate sets on the basis of reusable-component feasibility and calculating the feasibility of each candidate set as a component;
S5, judging, according to the feasibility, whether a component candidate set meets the feasibility index; if its feasibility is not less than the feasibility index, constructing the reusable component, otherwise abandoning construction of the component.
Further, the step S1 includes:
S11, converting each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; nText is the specific behavior description of the node;
S12, converting each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its start node; nTo is the unique identification number of its destination node; eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional connecting lines;
S13, combining the results of S11 and S12, converting the business flowchart into a symbolic graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.
Further, the step S2 includes:
S21, counting the frequency of edges and nodes with the same function and, according to the preset minimum support min_spt, removing those whose frequency is less than min_spt to obtain a new graph model; whether edges or nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22, relabeling the edges and nodes of the new graph model by frequency; the labeling rule is that edges or nodes are labeled according to their frequency, and the higher the frequency, the smaller the lexicographic order of the assigned label; the mapping between identification numbers and labels, and the correspondence between labeled edges and original edges, are stored at the same time;
S23, selecting the label with the highest frequency; the edges and nodes corresponding to that label form the maximum frequent sub-process A; mining the maximum frequent sub-process A, one frequent sub-process at a time;
S24, mining the label with the second-highest frequency, then the label with the third-highest frequency, and so on, in the manner of step S23, until all frequent sub-processes have been mined, obtaining the frequent sub-process set.
Further, the step S23 includes:
S231, the maximum frequent sub-process A is the first frequent sub-process, and each subsequent frequent sub-process has one edge fewer than the previous one; judging whether the current frequent sub-process has the minimum DFS code;
S232, if it does not have the minimum DFS code, ending the mining of this sub-process;
S233, if it has the minimum DFS code, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
the rightmost path extension method is: given a graph G and a DFS tree T of G (the set of visited vertices of G is expanded iteratively until a complete DFS tree is built), a new edge e is added between the rightmost node and another node on the rightmost path, or a new node is introduced and connected to a node on the rightmost path;
S234, judging whether the new frequent sub-process meets the minimum support min_spt, and if so, storing the new frequent sub-process in the frequent sub-process set.
Further, the step S3 includes:
S31, calculating the semantic similarity between the nodes of two frequent sub-processes;
S32, adding a hierarchy influence factor to the semantic similarity of the frequent sub-process nodes and calculating the behavior similarity of the frequent sub-processes;
S33, clustering the frequent sub-processes with a hierarchical clustering algorithm to form the component candidate sets.
Further, the step S31 includes:
S311, training a model with Word2Vec and vectorizing the semantics of the two nodes being compared to obtain their semantic vectors;
S312, setting the weight of nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes, recorded as:
S313, calculating the semantic similarity of all nodes in the two frequent sub-processes pairwise using the methods of S311 and S312.
Further, the step S32 includes:
S321, calculating the hierarchy influence factor of each layer of the frequent sub-processes, where n is the graph model depth of the frequent sub-processes:
S322, given the set of nodes of each frequent sub-process at layer i, the number of elements in that set, and the j-th nodes at layer i of the two frequent sub-processes being compared, calculating the hierarchical similarity of the sub-processes from the node similarity:
S323, combining the hierarchical similarity and the hierarchy influence factor to calculate the behavior similarity between two frequent sub-processes with n layers:
Further, the step S33 includes:
S331, calculating the distance between two clusters based on the behavior similarity of the frequent sub-processes:
S332, regarding each frequent sub-process as an initial cluster;
S333, finding the two closest clusters and merging them, and repeating this process until all frequent sub-processes belong to one cluster;
S334, recording the cluster partition at each level of the clustering process to form the component candidate sets.
Further, the specific method of step S4 is as follows:
S41, calculating the intra-cluster similarity ICS(C) of the component candidate set:
S42, counting, for a cluster in the component candidate set, the number of sub-processes that appear in the same original flowchart;
S43, counting, for a cluster in the component candidate set, the number of sub-processes that appear in different original flowcharts;
S44, letting k be the number of clusters in the component candidate set, setting the global coincidence rate weight and the local coincidence rate weight, and calculating the component coincidence rate of the component candidate set:
S45, combining the intra-cluster similarity and the coincidence rate to calculate the feasibility of the component candidate set as a component:
Further, the step S5 includes:
S51, setting a component feasibility index and judging whether the feasibility of the component candidate set meets it;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing the reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
Advantageous effects
An industrial management software system contains a large number of identical transaction handling processes across its business scenarios. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which speeds up component construction while preserving component quality;
given only the business flowcharts of a system, the method automatically mines the reusable components of that system with no other configuration, which shortens the development cycle of industrial management software, reduces development difficulty, and supports lower software maintenance costs later on;
applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse can be put to better use in development, improving the development efficiency and reducing the development cost of industrial management software.
The invention can analyze the business flowchart set of industrial management software and quickly mine highly reusable sub-processes from complex business flows as reusable components, which speeds up component construction and makes better use of software reuse in development.
Drawings
FIG. 1 is a flow chart of the scheme of the invention.
FIG. 2 is an example of the conversion of the business process flow diagram of S1 of the present invention into a graph model.
FIG. 3 is a flowchart illustrating the step S2 according to the present invention.
FIG. 4 is a flowchart illustrating the step S3 according to the present invention.
FIG. 5 is a diagram illustrating a rightmost path expansion method according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in fig. 1, which is a flowchart of a solution of the present invention, a mining method for reusable components oriented to complex business processes of the present invention includes:
S1, building a JavaScript-based front-end flowchart drawing function, entering the business flowchart set of an industrial management software system on the front-end page, obtaining a data set in JSON format, saving the file as flowchart.
The flow chart preprocessing module comprises the following specific processes:
S11, reading the file flowchart.csv and parsing the JSON-format data in Python. A new class Node is created, and each flowchart node is converted into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow, as shown in Table 1; nText is the specific behavior description of the node;
TABLE 1
Node state class | Corresponding nLabel |
Start | 1 |
End | 2 |
Step | 3 |
Condition | 4 |
Document | 5 |
Data | 6 |
Frequent sub-process | 7 |
Annotation | 8 |
S12, a new class Edge is created, and each flowchart connecting line is converted into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its start node; nTo is the unique identification number of its destination node; eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional connecting lines, as shown in Table 2;
TABLE 2
Connecting line type | Corresponding eLable |
Ordinary connecting line | 1 |
Conditional connecting line Y | 2 |
Conditional connecting line N | 3 |
S13, combining the results of S11 and S12, a new class Graph is created, and the business flowchart is converted into a pair G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples. After each flowchart has been converted into a Graph object, it is added to the graph model set graph_set.
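The conversion in S11 to S13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class layout, the `build_graph` helper, and the dictionary keys of the input records (`id`, `label`, `text`, `from`, `to`) are hypothetical, because the JSON schema of the flowchart file is not given in the text. The label values follow Table 1 and Table 2.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class NodeLabel(IntEnum):
    """nLabel values as listed in Table 1."""
    START = 1
    END = 2
    STEP = 3
    CONDITION = 4
    DOCUMENT = 5
    DATA = 6
    SUB_PROCESS = 7
    ANNOTATION = 8


class EdgeLabel(IntEnum):
    """eLable values as listed in Table 2."""
    COMMON = 1
    CONDITION_YES = 2
    CONDITION_NO = 3


@dataclass(frozen=True)
class Node:
    nId: int
    nLabel: NodeLabel
    nText: str


@dataclass(frozen=True)
class Edge:
    eId: int
    nFrom: int
    nTo: int
    eLable: EdgeLabel  # spelling kept as in the source text


@dataclass
class Graph:
    N: List[Node] = field(default_factory=list)
    E: List[Edge] = field(default_factory=list)


def build_graph(raw_nodes, raw_edges) -> Graph:
    """Build G = (N, E) from parsed records (hypothetical input shape)."""
    nodes = [Node(n["id"], NodeLabel(n["label"]), n["text"]) for n in raw_nodes]
    known = {n.nId for n in nodes}
    edges = []
    for e in raw_edges:
        # Reject connecting lines whose endpoints are not in the flowchart.
        if e["from"] not in known or e["to"] not in known:
            raise ValueError(f"edge {e['id']} references an unknown node")
        edges.append(Edge(e["id"], e["from"], e["to"], EdgeLabel(e["label"])))
    return Graph(nodes, edges)


# One tiny flowchart: start -> condition -(Y)-> end.
graph_set = [
    build_graph(
        [{"id": 1, "label": 1, "text": "start"},
         {"id": 2, "label": 4, "text": "order approved?"},
         {"id": 3, "label": 2, "text": "end"}],
        [{"id": 1, "from": 1, "to": 2, "label": 1},
         {"id": 2, "from": 2, "to": 3, "label": 2}],
    )
]
```

Using enumerations for the two label tables makes an out-of-range label fail at conversion time rather than during mining.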
S2, mining structurally similar sub-processes from the directed graph model data set using the frequent sub-process (frequent subgraph) mining algorithm gSpan and recording them as sets; each set contains several structurally similar frequent sub-processes. The algorithm parameter is set as follows: the minimum support min_spt of frequent sub-processes is 2.
The specific steps of the frequent sub-process algorithm gSpan are shown in FIG. 3 and are as follows:
S21, traversing the graph model set, counting the occurrence frequency of the labels of the edges and nodes in the graphs, storing the edge frequencies and node frequencies separately, and sorting them from high to low. According to the preset minimum support min_spt, the nodes and edges with lower frequency are removed to obtain a new graph G. Whether a sub-process is frequent is judged by its support: if the support of a sub-process is greater than or equal to the minimum support min_spt, the sub-process is considered frequent;
S22, creating a DFSEdge class, relabeling the edges of graph G as DFSEdge(0, 1, vevlb) and adding them to DFSEncoding; the labeling rule is frequency-based: the higher the frequency, the smaller the lexicographic order of the assigned label. At the same time, the mapping between the original labels and the new labels is stored;
S23, forming a set E from the edges with the highest frequency, and sorting the set E in lexicographic order: the smaller the lexicographic order, the higher the frequency and the earlier the corresponding edge is ranked.
S24, each edge in the set E can be regarded as the simplest frequent sub-process, and depth-first mining is carried out starting from these frequent sub-processes.
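The counting, pruning, and relabeling of S21 and S22 can be sketched as follows. This is an illustration only: the tuple-based graph representation and the function names are hypothetical, and the sketch assumes that the support of a label is the number of graphs in which it occurs (the usual gSpan convention), which the text does not spell out.

```python
from collections import Counter


def frequent_labels(graphs, min_spt=2):
    """graphs: list of (node_labels, edges), where node_labels maps a node id
    to its nLabel and edges is a list of (from_id, to_id, eLable) tuples."""
    node_support, edge_support = Counter(), Counter()
    for node_labels, edges in graphs:
        # Count each label at most once per graph (per-graph support).
        node_support.update(set(node_labels.values()))
        edge_support.update(
            {(node_labels[u], el, node_labels[v]) for u, v, el in edges}
        )
    keep_nodes = {l for l, c in node_support.items() if c >= min_spt}
    keep_edges = {
        t for t, c in edge_support.items()
        if c >= min_spt and t[0] in keep_nodes and t[2] in keep_nodes
    }
    return node_support, edge_support, keep_nodes, keep_edges


def relabel_by_frequency(support, keep):
    """Map each kept label to a rank; rank 0 is the most frequent label."""
    ordered = sorted(keep, key=lambda label: (-support[label], label))
    return {label: rank for rank, label in enumerate(ordered)}


def prune(graph, keep_nodes, keep_edges):
    """Drop nodes and edges whose labels are below the minimum support."""
    node_labels, edges = graph
    nodes = {n: l for n, l in node_labels.items() if l in keep_nodes}
    kept = [
        (u, v, el) for u, v, el in edges
        if u in nodes and v in nodes and (nodes[u], el, nodes[v]) in keep_edges
    ]
    return nodes, kept


g1 = ({1: 1, 2: 3, 3: 4, 4: 2}, [(1, 2, 1), (2, 3, 1), (3, 4, 2)])
g2 = ({1: 1, 2: 3, 3: 5, 4: 2}, [(1, 2, 1), (2, 3, 1), (3, 4, 1)])
node_support, edge_support, keep_nodes, keep_edges = frequent_labels([g1, g2])
node_rank = relabel_by_frequency(node_support, keep_nodes)
pruned_g1 = prune(g1, keep_nodes, keep_edges)
```

In this example only the edge from a Start node to a Step node occurs in both graphs, so it is the only edge that survives pruning with min_spt = 2.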
Further, the step S24 includes:
S241, judging whether the frequent sub-process has the minimum DFS code.
S244, if it is not the minimum DFS code, ending the mining of this frequent sub-process;
S242, if it is the minimum DFS code, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes. Rightmost path extension: given a graph G and a DFS tree T of G (the set of visited vertices is expanded iteratively until a complete DFS tree is built), a new edge e may be added between the rightmost node and another node on the rightmost path (backward extension), or a new node may be introduced and connected to a node on the rightmost path (forward extension). Both extensions occur on the rightmost path, as shown in FIG. 5;
S243, judging whether the new frequent sub-process meets the minimum support min_spt; if so, adding the new frequent sub-process to the result set and returning to S241 to continue the recursive mining;
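The rightmost path extension described in S242 can be illustrated with a small sketch. It is a simplification under stated assumptions: a DFS code is reduced to a list of (i, j) discovery-index pairs, labels and edge direction are ignored, and the function names are hypothetical. It only enumerates the positions at which backward and forward extensions may be attached; it does not perform the minimum DFS code test or the support count.

```python
def rightmost_path(dfs_code):
    """Return the vertex indices on the rightmost path, root first.

    dfs_code: list of (i, j) pairs of DFS discovery indices; i < j is a
    forward edge (j is newly discovered), i > j is a backward edge.
    """
    if not dfs_code:
        return []
    path = []
    target = None
    # Walk the code backwards, following forward edges from the rightmost
    # vertex up to the root.
    for i, j in reversed(dfs_code):
        if i < j and (target is None or j == target):
            path.append(j)
            target = i
    path.append(0)
    return path[::-1]


def candidate_extensions(dfs_code):
    """Positions where rightmost path extension may add one edge."""
    path = rightmost_path(dfs_code)
    rightmost = path[-1]
    existing = {frozenset(e) for e in dfs_code}
    new_vertex = max(max(i, j) for i, j in dfs_code) + 1
    # Backward extension: rightmost vertex to an earlier vertex on the path.
    backward = [
        (rightmost, v) for v in path[:-1]
        if frozenset((rightmost, v)) not in existing
    ]
    # Forward extension: a new vertex attached to any vertex on the path,
    # deepest vertex first.
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward


code = [(0, 1), (1, 2), (2, 0), (1, 3)]
path = rightmost_path(code)
backward, forward = candidate_extensions(code)
```

For the example code the rightmost path is 0, 1, 3, so vertex 2 is not a valid attachment point: restricting growth to the rightmost path is what keeps the search from generating the same sub-process through many different orders.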
S3, calculating the behavior similarity of every two elements in each set of structurally similar frequent sub-processes, measuring the distance between frequent sub-processes according to this similarity, and performing hierarchical clustering of the sets based on the distance to form the component candidate sets.
The specific process of step S3 is shown in FIG. 4 and is as follows:
S31, calculating the node semantic similarity Node_similarity between every two frequent sub-processes according to the nText information of their nodes;
The specific steps of calculating the semantic similarity of the nodes are as follows:
S311, importing a Python module into the program, training a Word2Vec model on corpora such as Wikipedia and engineering terminology, and saving the model to the file embedding_64.model;
S312, traversing all flowchart nodes in the frequent sub-process sets and adding their nText to a collection. A Python module is imported into the program, and its word segmentation function is used to segment every nText in the collection. The file embedding_64.model is then loaded to obtain the word2vec model, the model is used to vectorize the semantic information of the segmented words, recorded in the form vec(n), and the word vector matrix is stored;
S313, using a loop to perform the following operations on any two different frequent sub-flowcharts in each set: obtaining, by breadth-first search, the nText of the corresponding nodes of the two frequent sub-processes at each layer; segmenting the nText with the word segmentation module; looking up the word vectors in the word vector matrix; averaging them to obtain the semantic vectors of the two nodes; and, with the weight of nouns in the node semantics set, calculating the semantic similarity of the nodes in the frequent sub-processes according to the following formula, recorded as:
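One possible reading of the node comparison in S313 is sketched below. The exact similarity formula is not reproduced in this text, so the sketch assumes a noun-weighted mean of word vectors followed by cosine similarity; that choice, the `noun_weight` parameter, the function names, and the toy embeddings are all hypothetical and stand in for the trained Word2Vec model and the word segmentation step.

```python
import math


def node_vector(tokens, embeddings, noun_weight=1.0):
    """tokens: list of (word, is_noun) pairs from a node's nText.

    Returns the weighted mean of the known word vectors, or None when no
    word of the node is in the vocabulary.
    """
    total, weight_sum = None, 0.0
    for word, is_noun in tokens:
        vec = embeddings.get(word)
        if vec is None:
            continue  # out-of-vocabulary words are skipped
        w = noun_weight if is_noun else 1.0
        if total is None:
            total = [w * x for x in vec]
        else:
            total = [t + w * x for t, x in zip(total, vec)]
        weight_sum += w
    if total is None:
        return None
    return [t / weight_sum for t in total]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def node_similarity(tokens_a, tokens_b, embeddings, noun_weight=1.0):
    va = node_vector(tokens_a, embeddings, noun_weight)
    vb = node_vector(tokens_b, embeddings, noun_weight)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


# Toy 2-dimensional embeddings, for illustration only.
toy = {"submit": [1.0, 0.0], "order": [0.0, 1.0], "send": [0.9, 0.1]}
a = [("submit", False), ("order", True)]
b = [("send", False), ("order", True)]
sim_ab = node_similarity(a, b, toy, noun_weight=2.0)
```

Weighting nouns more heavily makes two nodes that act on the same business object score as similar even when their verbs differ.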
S32, based on the node semantic similarity Node_similarity of the sub-processes, adding the hierarchy influence factor Hierarchical_Weight and calculating the behavior similarity Behavioral_Similarity of the frequent sub-processes.
The specific steps for calculating the behavior similarity of the sub-processes are as follows:
S321, recording the layer of the nodes with in-degree 0 in the frequent sub-flowchart model as layer 0, the layer of the child nodes of layer 0 as layer 1, and so on. During the breadth-first search in step S313, the graph model depth of the frequent sub-processes is calculated at the same time and recorded as n; finally, the hierarchy influence factor of each layer is calculated in a loop according to the following formula:
S322, during the breadth-first search of step S313, collecting the set of nodes at layer i of each of the two frequent sub-processes and counting the nodes in the set. At the same time, using the node similarity obtained in S313, the similarities of the corresponding nodes at layer i are summed and divided by the number of nodes at layer i to obtain the hierarchical similarity of the processes, recorded as:
S323, combining the hierarchical similarity and the hierarchy influence factor, calculating the behavior similarity between two frequent sub-processes with n layers according to the following formula, recorded as:
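The layering of S321 and the averaging of S322 can be sketched as follows. The layer assignment and the per-layer mean follow the description; the hierarchy influence factor formula is not reproduced in this text, so the sketch takes the per-layer weights as a caller-supplied list and combines the layers as a weighted sum, which is an assumption. All function names are hypothetical.

```python
from collections import deque


def bfs_layers(nodes, edges):
    """Layer 0 holds the nodes with in-degree 0; children of layer i form
    layer i + 1 (each node keeps the layer of its first visit)."""
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    depth = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(depth)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in depth:  # also guards against cycles
                depth[v] = depth[u] + 1
                queue.append(v)
    layers = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for n in nodes:
        if n in depth:
            layers[depth[n]].append(n)
    return layers


def layer_similarity(layer_a, layer_b, node_sim):
    """Mean similarity of corresponding nodes in one layer (S322)."""
    pairs = list(zip(layer_a, layer_b))
    if not pairs:
        return 0.0
    return sum(node_sim(x, y) for x, y in pairs) / len(pairs)


def behavioral_similarity(layers_a, layers_b, node_sim, weights):
    """Weighted sum over layers; `weights` stands in for the hierarchy
    influence factors, whose formula is assumed to be given elsewhere."""
    return sum(
        w * layer_similarity(la, lb, node_sim)
        for w, la, lb in zip(weights, layers_a, layers_b)
    )


layers_a = bfs_layers(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
)
layers_b = [["a"], ["b", "x"], ["d"]]
toy_sim = lambda x, y: 1.0 if x == y else 0.5
score = behavioral_similarity(layers_a, layers_b, toy_sim, [1 / 3] * 3)
```

Because the compared sub-processes come from the same structurally similar set, their layers line up one to one, which is what makes the node-by-node pairing in `layer_similarity` meaningful.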
S33, clustering each frequent sub-process set with a hierarchical clustering algorithm, dividing the sets at different levels according to the distance between clusters to form the component candidate sets.
The hierarchical clustering comprises the following specific steps:
S331, according to the Complete-Link definition of hierarchical clustering and based on the pairwise behavior similarity of the frequent sub-processes, calculating the distance between two clusters, recorded as:
S333, repeating the following process until all frequent sub-processes belong to one cluster: finding the two closest clusters, merging them, renumbering the clusters, and deleting the corresponding row and column from the distance matrix. At each repetition, the clustering result and the merge distance are stored in a result matrix;
S334, traversing the result matrix and recording the set partitions at different levels to form the component candidate sets;
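The clustering loop of S331 to S334 can be sketched in plain Python as below. Complete-link distance (the largest pairwise distance between members of two clusters) follows the description; converting behavior similarity to distance as 1 minus similarity is an assumption, because the distance formula is not reproduced in this text. The function names and the threshold-based `cut` helper are hypothetical.

```python
def complete_link(dist):
    """dist: symmetric distance matrix (list of lists).

    Returns the merges in order, each as (members_a, members_b, distance).
    """
    clusters = {i: (i,) for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Complete link: distance of the farthest pair of members.
                d = max(dist[p][q] for p in a for q in b)
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters.pop(j)
    return merges


def cut(merges, n, threshold):
    """Partition obtained by applying only merges with distance <= threshold."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b, d in merges:
        if d <= threshold:
            parent[find(b[0])] = find(a[0])
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
# Assumed conversion from similarity to distance.
distance = [[1.0 - s for s in row] for row in similarity]
merges = complete_link(distance)
partition = cut(merges, len(distance), threshold=0.5)
```

Recording every merge together with its distance, as S333 describes, is what allows partitions at several levels to be read back afterwards without rerunning the clustering.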
S4, evaluating all component candidate sets on the basis of reusable-component feasibility and calculating the feasibility of each component candidate set as a component.
The specific process of step S4 is as follows:
S41, calculating the intra-cluster similarity of each component candidate set according to the following formula, storing the results in a collection that is converted into an object for subsequent processing, recorded as:
S42, counting the number of frequent sub-processes in the component candidate set that appear in the same original flowchart;
S43, counting the number of frequent sub-processes in the component candidate set that appear in different original flowcharts;
S44, letting k be the number of clusters in the component candidate set, setting the global coincidence rate weight and the local coincidence rate weight, calculating the component coincidence rate of the component candidate set according to the following formula, and adding it to the collection. A normalization function is imported from a module and used to normalize the data that was converted into objects, recorded as:
S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the candidate set as a component according to the following formula:
Step S5 includes:
S51, setting a component feasibility index and judging whether the feasibility of the component candidate set meets it;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing the reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
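The normalization mentioned in S44 and the decision of S51 to S53 can be sketched as follows. Min-max scaling is only one plausible choice of normalization and is an assumption here, since the text does not name the function used; the feasibility scores in the example are made-up inputs, because the feasibility formula itself is not reproduced. The function names are hypothetical.

```python
def min_max(values):
    """Scale values to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_components(feasibility, feasibility_index):
    """Keep the candidate sets whose feasibility is >= the index (S52);
    the others are abandoned (S53)."""
    return [
        name for name, score in feasibility.items()
        if score >= feasibility_index
    ]


scaled = min_max([2, 4, 6])
# Illustrative scores only, not results from the source.
accepted = select_components({"C1": 0.8, "C2": 0.4, "C3": 0.6}, 0.6)
```

A candidate set whose feasibility equals the index is accepted, matching the "greater than or equal to" wording of S52.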
Claims (9)
1. A reusable component mining method for complex business processes is characterized by comprising the following steps:
S1, inputting the business flowchart set of a system and converting it, through preprocessing, into a set of symbolic graph models;
S2, mining structurally similar frequent sub-processes from the graph model set, all structurally similar frequent sub-processes forming a set;
S3, calculating the behavior similarity of the frequent sub-processes in each set and clustering the set according to the similarity to form component candidate sets;
S31, calculating the semantic similarity between the nodes of two frequent sub-processes;
S32, adding a hierarchy influence factor to the semantic similarity of the frequent sub-process nodes and calculating the behavior similarity of the frequent sub-processes;
S33, clustering the frequent sub-processes with a hierarchical clustering algorithm to form the component candidate sets;
S4, evaluating all component candidate sets on the basis of reusable-component feasibility and calculating the feasibility of each candidate set as a component;
2. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S1 includes:
S11, converting each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; nText is the specific behavior description of the node;
S12, converting each flowchart connecting line into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connecting line; nFrom is the unique identification number of its start node; nTo is the unique identification number of its destination node; eLable is the state of the connecting line, used to distinguish ordinary connecting lines from conditional connecting lines;
S13, combining the results of S11 and S12, converting the business flowchart into a symbolic graph model G = (N, E), where N is the set of Node triples and E is the set of Edge quadruples.
3. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S2 includes:
S21, counting the frequency of edges and nodes with the same function and, according to the preset minimum support min_spt, removing those whose frequency is less than min_spt to obtain a new graph model; whether edges or nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22, relabeling the edges and nodes of the new graph model by frequency; the labeling rule is that edges or nodes are labeled according to their frequency, and the higher the frequency, the smaller the lexicographic order of the assigned label; the mapping between identification numbers and labels, and the correspondence between labeled edges and original edges, are stored at the same time;
S23, selecting the label with the highest frequency; the edges and nodes corresponding to that label form the maximum frequent sub-process A; mining the maximum frequent sub-process A, one frequent sub-process at a time;
4. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S31 includes:
S311, training a model with Word2Vec and vectorizing the semantics of the two nodes being compared to obtain their semantic vectors;
S312, setting the weight of nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes, recorded as:
S313, calculating the semantic similarity of all nodes in the two frequent sub-processes pairwise using the methods of S311 and S312.
5. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S32 includes:
S321, calculating the hierarchy influence factor of each layer of the frequent sub-processes, where n is the graph model depth of the frequent sub-processes:
S322, given the set of nodes of each frequent sub-process at layer i, the number of elements in that set, and the j-th nodes at layer i of the two frequent sub-processes being compared, calculating the hierarchical similarity of the sub-processes from the node similarity:
S323, combining the hierarchical similarity and the hierarchy influence factor to calculate the behavior similarity between two frequent sub-processes with n layers:
6. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S33 includes:
S331, calculating the distance between two clusters based on the behavior similarity of the frequent sub-processes:
S332, regarding each frequent sub-process as an initial cluster;
S333, finding the two closest clusters and merging them, and repeating this process until all frequent sub-processes belong to one cluster;
7. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S4 includes:
S41, calculating the intra-cluster similarity ICS(C) of the component candidate set:
S42, counting, for each cluster in the component candidate set, the number of sub-processes that appear in the same original flow chart;
S43, counting, for each cluster in the component candidate set, the number of sub-processes that appear in different original flow charts;
S44, letting k be the number of clusters in the component candidate set, setting a global coincidence rate weight and a local coincidence rate weight, and calculating the component coincidence rate of the component candidate set:
where the weighted sub-process similarity number is:
S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the component candidate set as a component:
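As an illustration of S41 only, the sketch below computes an intra-cluster similarity. The ICS(C) formula is not reproduced in this text, so the sketch assumes the mean pairwise behavior similarity of the cluster members; the coincidence rate of S44 and the feasibility of S45 are left out because their formulas cannot be recovered here. The function name is hypothetical.

```python
from itertools import combinations

def intra_cluster_similarity(cluster, sim):
    """Assumed ICS: mean behavior similarity over all unordered pairs of
    sub-processes in the cluster. A single-member cluster scores 1.0."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0
    return sum(sim[a][b] for a, b in pairs) / len(pairs)

# Behavior similarity of three frequent sub-processes (illustrative values).
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
tight = intra_cluster_similarity((0, 1), sim)
loose = intra_cluster_similarity((0, 1, 2), sim)
```

A tightly knit cluster such as `(0, 1)` scores higher than the loose cluster `(0, 1, 2)`, which is the property a feasibility measure would reward.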
8. The method for mining reusable components oriented to complex business processes, according to claim 1, wherein the step S5 includes:
S51, setting a component feasibility index, and judging whether the feasibility of the component candidate set satisfies the feasibility index;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing a reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
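The decision in S51 to S53 is a threshold test, which might be sketched as below. The feasibility values are assumed to have been computed already in step S4; the function name, the candidate names and the threshold value are hypothetical.

```python
def select_components(feasibility_by_candidate, feasibility_index):
    """Split candidate sets into those accepted as reusable components
    (feasibility >= index, S52) and those abandoned (S53)."""
    accepted, abandoned = [], []
    for name, feasibility in feasibility_by_candidate.items():
        if feasibility >= feasibility_index:
            accepted.append(name)
        else:
            abandoned.append(name)
    return accepted, abandoned

# Illustrative feasibility scores for three candidate sets.
candidates = {"approval_flow": 0.82, "archive_flow": 0.60, "notify_flow": 0.41}
accepted, abandoned = select_components(candidates, feasibility_index=0.6)
```

Note that a candidate whose feasibility equals the index is accepted, matching the "greater than or equal to" wording of S52.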
9. The method for mining reusable components oriented to complex business processes, according to claim 3, wherein the step S23 includes:
S231, taking the maximal frequent sub-process A as the first frequent sub-process, each subsequent frequent sub-process having one edge fewer than the previous one; judging whether the current frequent sub-process satisfies the minimum DFS code;
S232, if the minimum DFS code is not satisfied, ending the mining process for the sub-process;
S233, if the minimum DFS code is satisfied, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
the rightmost path extension method is as follows: given a graph G and a DFS tree T of G, where T covers the set of vertices of G that have been visited, T is repeatedly extended until a complete DFS tree is created, each extension either adding a new edge e between the rightmost node and another node on the rightmost path, or introducing a new node and connecting it to a node on the rightmost path.
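The rightmost path extension can be sketched as follows, in the style commonly used by DFS-code based frequent subgraph miners. The sketch makes simplifying assumptions that are not stated in the claim: the graph is treated as undirected and unlabeled, and a DFS code is a list of `(i, j)` pairs of discovery indices, where `i < j` marks a forward (tree) edge and `i > j` a backward edge. The function names are hypothetical.

```python
def rightmost_path(code):
    """Return the vertices on the rightmost path of the DFS tree, root first.
    The rightmost vertex is the most recently discovered one."""
    if not code:
        return [0]
    rightmost = max(max(i, j) for i, j in code)
    path = [rightmost]
    for i, j in reversed(code):
        if i < j and j == path[-1]:  # forward edge leading into the current vertex
            path.append(i)
    return path[::-1]

def rightmost_extensions(code):
    """Enumerate candidate one-edge extensions of a DFS code:
    - backward: a new edge from the rightmost vertex to another vertex
      on the rightmost path (skipping edges that already exist);
    - forward: a new vertex attached to any vertex on the rightmost path."""
    path = rightmost_path(code)
    rightmost = path[-1]
    existing = {frozenset(edge) for edge in code}
    backward = [
        (rightmost, v) for v in path[:-1]
        if frozenset((rightmost, v)) not in existing
    ]
    forward = [(v, rightmost + 1) for v in reversed(path)]
    return backward, forward

# Triangle 0-1-2 with an extra vertex 3 attached to vertex 1.
code = [(0, 1), (1, 2), (2, 0), (1, 3)]
path = rightmost_path(code)
backward, forward = rightmost_extensions(code)
```

Each candidate extension would then be checked for frequency and for the minimum DFS code condition of S231 before being kept as a new frequent sub-process.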
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804713.3A CN113254013B (en) | 2021-07-16 | 2021-07-16 | Reusable component mining method for complex business process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804713.3A CN113254013B (en) | 2021-07-16 | 2021-07-16 | Reusable component mining method for complex business process |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113254013A (en) | 2021-08-13 |
CN113254013B (en) | 2021-09-24 |
Family
ID=77180513
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804713.3A Active CN113254013B (en) | 2021-07-16 | 2021-07-16 | Reusable component mining method for complex business process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254013B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116185395B (en) * | 2023-04-21 | 2023-07-14 | 华能信息技术有限公司 | Flow component templatization definition method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015021404A2 (en) * | 2013-08-08 | 2015-02-12 | Systamedic Inc. | Method for knowledge extraction through data mining |
CN104767736A (en) * | 2015-03-23 | 2015-07-08 | 电子科技大学 | Method for separating unknown single protocol data stream into different types of data frames |
CN104954453A (en) * | 2015-06-02 | 2015-09-30 | 浙江工业大学 | Data mining REST service platform based on cloud computing |
CN109902284A (en) * | 2018-12-30 | 2019-06-18 | 中国科学院软件研究所 | A kind of unsupervised argument extracting method excavated based on debate |
CN111984688A (en) * | 2020-08-19 | 2020-11-24 | 中国银行股份有限公司 | Method and device for determining business knowledge association relation |
CN112764749A (en) * | 2021-01-18 | 2021-05-07 | 电子科技大学 | Method and system for generating software functional interface group |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063727B (en) * | 2018-06-19 | 2021-02-05 | 东软集团股份有限公司 | Method and device for calculating track frequency, storage medium and electronic equipment |
CN109272155B (en) * | 2018-09-11 | 2021-07-06 | 郑州向心力通信技术股份有限公司 | Enterprise behavior analysis system based on big data |
Non-Patent Citations (4)
Title |
---|
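An Improved Deep Neural Network Model for Job Matching;Yu Deng et al.;《in Proceedings of the 2018 International conference on Artificial intelligence and Big Data》;20180628;106-112 * |
Business flow analysis method based on user behavior similarity from Web logs;Jia Xiaobei et al.;《Journal of Yangtze University》;20180310;Vol. 15 (No. 5);69-75 * |
Design and implementation of a document copy detection system based on similarity estimation;Pan Xin;《China Master's Theses Full-text Database, Information Science and Technology》;20160315(No. 3);I138-7838 * |
Research on data mining technology and applications for continuous product quality control;Tan Jun;《China Doctoral Dissertations Full-text Database, Information Science and Technology》;20141215(No. 12);I138-25 * |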
Also Published As
Publication number | Publication date |
---|---|
CN113254013A (en) | 2021-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245981B (en) | Crowd type identification method based on mobile phone signaling data | |
Dong et al. | Tablesense: Spreadsheet table detection with convolutional neural networks | |
US11651150B2 (en) | Deep learning based table detection and associated data extraction from scanned image documents | |
CN112765358A (en) | Taxpayer industry classification method based on noise label learning | |
Xue et al. | Res2tim: Reconstruct syntactic structures from table images | |
Liu et al. | LF-YOLO: A lighter and faster yolo for weld defect detection of X-ray image | |
CN107577702B (en) | Method for distinguishing traffic information in social media | |
CN113255321B (en) | Financial field chapter-level event extraction method based on article entity word dependency relationship | |
CN110889310B (en) | Financial document information intelligent extraction system and method | |
CN111737477A (en) | Intellectual property big data-based intelligence investigation method, system and storage medium | |
CN113254013B (en) | Reusable component mining method for complex business process | |
Xia et al. | A deep Siamese postclassification fusion network for semantic change detection | |
CN103544186A (en) | Method and equipment for discovering theme key words in picture | |
CN110750588A (en) | Multi-source heterogeneous data fusion method, system, device and storage medium | |
CN110827131B (en) | Tax payer credit evaluation method based on distributed automatic feature combination | |
CN114863091A (en) | Target detection training method based on pseudo label | |
CN110287292A (en) | A kind of judge's measurement of penalty irrelevance prediction technique and device | |
CN115437952A (en) | Statement level software defect detection method based on deep learning | |
Shen et al. | Divide rows and conquer cells: Towards structure recognition for large tables | |
CN114444484A (en) | Document-level event extraction method and system based on double-layer graph | |
CN111597806A (en) | Method, equipment and medium for identifying short message text template based on statistical model | |
CN114780403A (en) | Software defect prediction method and device based on enhanced code attribute graph | |
CN114519344A (en) | Discourse element sub-graph prompt generation and guide-based discourse-level multi-event extraction method | |
Shi et al. | Graph Guided Transformer: An Image-Based Global Learning Framework for Hyperspectral Image Classification | |
Chen et al. | Named entity recognition in multi-level contexts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||