CN113254013A - Reusable component mining method for complex business process - Google Patents
- Publication number
- CN113254013A (application CN202110804713.3A)
- Authority
- CN
- China
- Prior art keywords
- processes
- sub
- frequent sub
- node
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/36—Software reuse
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Abstract
The invention discloses a reusable component mining method for complex business processes and belongs to the technical field of software reuse. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which greatly increases the speed of component construction while preserving component quality. The method needs only the business flowcharts of a system as input to mine that system's reusable components automatically, with no other configuration; this shortens the development cycle of industrial management software, reduces development difficulty, and supports lower software maintenance costs later on. Applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse can be put to better use in development, supporting higher development efficiency and lower development cost for industrial management software.
Description
Technical Field
The invention belongs to the technical field of software reuse and in particular relates to a reusable component mining method for complex business processes; the classification number is G06K.
Background
As a carrier of intelligent manufacturing, industrial management software has become deeply integrated into industrial design and manufacturing and is now the information core of the manufacturing industry. As the market grows, the architecture of industrial management software becomes more complex and quality requirements rise, so developing it quickly and efficiently is the central difficulty of current industrial management software development. Most industrial management software contains a large number of identical transaction-handling processes across its complex business scenarios, and component-based software development has become one of the mainstream ways to speed up development. Existing reusable component mining methods mainly mine object-oriented APIs by usage frequency and are of limited practical use for industrial management software. Given the characteristics of industrial management software systems, performing process similarity analysis on business processes and automatically mining sets of similar sub-processes as component modules can greatly increase the speed of component construction while preserving component quality, so that software reuse serves the development of industrial management software better.
Disclosure of Invention
The invention addresses the slow and inefficient development of industrial software in the prior art, which is caused by complex business scenarios.
The technical solution of the invention is a reusable component mining method for complex business processes, comprising the following steps:
S1, inputting the business flowchart set of the system and converting it, through preprocessing, into a set of symbolically represented graph models;
S2, mining structurally similar frequent sub-processes from the graph model set; all structurally similar frequent sub-processes form a set;
S3, calculating the behavior similarity of the frequent sub-processes in the set and clustering the set according to that similarity to form component candidate sets;
S4, evaluating every component candidate set on the basis of reusable-component feasibility and calculating the feasibility of forming a component from the candidate set;
S5, judging from the feasibility whether the component candidate set meets the feasibility index: if the feasibility is not less than the index, the reusable component is constructed; otherwise construction of the component is abandoned.
Further, the step S1 includes:
S11, converting each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; nText is the specific behavior description of the node;
S12, converting each flowchart connection into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connection; nFrom is the unique identification number of the start node of the connection; nTo is the unique identification number of the destination node of the connection; eLable is the state of the connection, whose purpose is to distinguish ordinary connections from conditional connections;
S13, combining the results of S11 and S12 to convert the business flowchart into a symbolically represented graph model G = (N, E), where N is the set of triples Node and E is the set of quadruples Edge.
Further, the step S2 includes:
S21, counting the frequency of edges and nodes that have the same function and, according to the preset minimum support min_spt, removing those whose frequency is less than min_spt to obtain a new graph model; whether edges and nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22, relabeling the edges and nodes of the graph model by frequency: edges or nodes with the same frequency receive the same label, and the higher the frequency, the smaller the lexicographic order of the label; the mapping between identification numbers and labels and the correspondence between relabeled edges and original edges are stored at the same time;
S23, selecting the label with the highest frequency; the edges and nodes corresponding to that label form the maximum frequent sub-process A, which is then mined, one frequent sub-process at a time;
S24, after mining according to the method of step S23 is finished, mining the label with the second-highest frequency, then the label with the third-highest frequency, and so on in turn until all frequent sub-processes have been mined, obtaining the frequent sub-process set.
Further, the step S23 includes:
S231, the maximum frequent sub-process A is the first frequent sub-process, and each subsequent frequent sub-process is reduced by one edge relative to the previous frequent sub-process; judging whether the current frequent sub-process is a minimum DFS code;
S232, if it is not a minimum DFS code, the mining of this sub-process ends;
S233, if it is a minimum DFS code, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
The rightmost path extension method is as follows: given a graph G and a DFS tree T of G (built by iteratively expanding the set of visited vertices of G until a complete DFS tree is created), a new edge e is added between the rightmost node and another node on the rightmost path, or a new node is introduced and connected to a node on the rightmost path;
S234, judging whether the new frequent sub-process meets the minimum support min_spt and, if it does, storing the new frequent sub-process in the frequent sub-process set;
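The rightmost path extension described above can be illustrated with a minimal Python sketch. It is a structural simplification, not the patent's implementation: a DFS code is assumed to be a list of `(i, j)` pairs of DFS discovery indices (forward edge when `i < j`, backward edge otherwise), labels and edge direction are ignored, and the function names `rightmost_path` and `candidate_extensions` are hypothetical.

```python
def rightmost_path(dfs_code):
    """Return the vertices on the rightmost path, from the root (index 0)
    to the rightmost (most recently discovered) vertex."""
    parent = {}
    rightmost = 0
    for i, j in dfs_code:
        if i < j:                      # forward edge: j is discovered from i
            parent[j] = i
            rightmost = max(rightmost, j)
    path = [rightmost]
    while path[-1] in parent:          # walk tree edges back up to the root
        path.append(parent[path[-1]])
    return path[::-1]


def candidate_extensions(dfs_code):
    """Enumerate the two kinds of extension allowed on the rightmost path."""
    path = rightmost_path(dfs_code)
    rm = path[-1]
    existing = {frozenset(e) for e in dfs_code}
    # Backward extension: rightmost vertex -> earlier vertex on the rightmost path.
    backward = [(rm, v) for v in path[:-1] if frozenset((rm, v)) not in existing]
    # Forward extension: a vertex on the rightmost path -> one newly introduced vertex.
    new_vertex = rm + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward


# Triangle 0-1-2 with an extra branch 1-3; the rightmost path is 0 -> 1 -> 3.
code = [(0, 1), (1, 2), (2, 0), (1, 3)]
backward, forward = candidate_extensions(code)
```

Restricting growth to the rightmost path is what keeps the search from generating the same sub-process through many different extension orders; the minimum DFS code check of S231 then discards the duplicates that remain.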
Further, the step S3 includes:
S31, calculating the semantic similarity of the nodes of each pair of frequent sub-processes;
S32, adding a hierarchy influence factor to the semantic similarity of the frequent sub-process nodes and calculating the behavior similarity of the frequent sub-processes;
S33, clustering the frequent sub-process set with a hierarchical clustering algorithm to form the component candidate sets.
Further, the step S31 includes:
S311, training a model with Word2Vec and vectorizing the semantics of the two nodes being compared to obtain their semantic vectors;
S312, setting the weight of nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes, recorded as:
S313, calculating the semantic similarity of all nodes of the two frequent sub-processes pairwise using the methods of S311 and S312.
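As a rough illustration of S311 to S313, the sketch below builds a node vector as a weighted mean of word vectors and compares two nodes by cosine similarity. The similarity formula of S312 is not reproduced in this text, so both the cosine measure and the way the noun weight enters (nouns weighted `noun_weight`, other words `1 - noun_weight`) are assumptions, and the tiny hand-written `vectors` table stands in for a trained Word2Vec model.

```python
import math


def node_vector(tokens, vectors, noun_weight=0.7):
    """tokens: list of (word, is_noun) pairs from a node's nText.
    Returns the weighted mean of the known word vectors (assumed scheme)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, is_noun in tokens:
        if word not in vectors:        # skip out-of-vocabulary words
            continue
        w = noun_weight if is_noun else 1.0 - noun_weight
        total = [t + w * x for t, x in zip(total, vectors[word])]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy 2-dimensional embeddings (hypothetical values, not a trained model).
vectors = {"submit": [1.0, 0.0], "send": [1.0, 0.0], "order": [0.0, 1.0]}

node_a = [("submit", False), ("order", True)]
node_b = [("send", False), ("order", True)]
sim_ab = cosine(node_vector(node_a, vectors), node_vector(node_b, vectors))
sim_cd = cosine(node_vector([("order", True)], vectors),
                node_vector([("submit", False)], vectors))
```

In a real pipeline the `vectors` lookup would come from the trained embedding model and the noun flags from a part-of-speech tagger; both are outside the scope of this sketch.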
Further, the step S32 includes:
S321, calculating the hierarchy influence factor of each layer of the frequent sub-processes, where n is the depth of the graph model of the frequent sub-process:
S322, given the set of nodes of a frequent sub-process in layer i, the number of elements in that set, and the j-th nodes of layer i of the two frequent sub-processes being compared, calculating the hierarchical similarity of the sub-processes from the node similarity:
S323, combining the hierarchical similarity and the hierarchy influence factor, calculating the behavior similarity between two frequent sub-processes that have n layers:
Further, the step S33 includes:
S331, calculating the distance between two clusters on the basis of the behavior similarity of the frequent sub-processes:
S332, regarding each frequent sub-process as an initial cluster;
S333, finding the two clusters closest to each other and merging them, and repeating this process until all frequent sub-processes form a single cluster;
S334, recording the cluster division of each level of the clustering process to form the component candidate sets.
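Steps S332 to S334 amount to agglomerative clustering. The sketch below is one minimal way to write it, using the complete-link definition named in the detailed description. The distance formula of S331 is not reproduced in this text, so taking the distance as `1 - similarity` is an assumption, and `complete_link` is a hypothetical name.

```python
def complete_link(sim):
    """sim: symmetric matrix of behavior similarities in [0, 1].
    Returns one (members_a, members_b, merge_distance) tuple per merge,
    i.e. the division recorded at each level of the hierarchy."""
    clusters = [frozenset([i]) for i in range(len(sim))]  # S332: one cluster each

    def dist(a, b):
        # Complete link: distance of the farthest pair across the two clusters.
        return max(1.0 - sim[i][j] for i in a for j in b)

    history = []
    while len(clusters) > 1:                              # S333: merge until one cluster
        d, x, y = min((dist(a, b), x, y)
                      for x, a in enumerate(clusters)
                      for y, b in enumerate(clusters) if x < y)
        a, b = clusters[x], clusters[y]
        history.append((sorted(a), sorted(b), d))         # S334: record this level
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [a | b]
    return history


# Three frequent sub-processes; 0 and 1 behave alike, 2 differs from both.
sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
history = complete_link(sim)
```

Cutting the recorded hierarchy at a chosen merge distance yields one candidate partition per level; this quadratic pairwise search is adequate for small sets, while a library routine would be preferable for large ones.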
Further, the specific method of step S4 is as follows:
S41, calculating the intra-cluster similarity ICS(C) of the component candidate set:
S42, counting the number of sub-processes in a cluster of the component candidate set that appear in the same original flowchart;
S43, counting the number of sub-processes in a cluster of the component candidate set that appear in different original flowcharts;
S44, letting k be the number of clusters in the component candidate set, setting the global coincidence rate weight and the local coincidence rate weight, and calculating the component coincidence rate of the component candidate set:
S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the component candidate set as a component:
Further, the step S5 includes:
S51, setting a component feasibility index and judging whether the feasibility of the component candidate set meets the index;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing the reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
Advantageous effects
An industrial management software system contains a large number of identical transaction-handling processes across its business scenarios. The method performs process similarity analysis on business processes and mines sets of similar sub-processes from them to serve as component modules of the software, which greatly increases the speed of component construction while preserving component quality;
The method needs only the business flowcharts of a system as input to mine that system's reusable components automatically, with no other configuration; this shortens the development cycle of industrial management software, reduces development difficulty, and supports lower software maintenance costs later on;
Applied to the actual software flowchart data of an enterprise, the method mines reusable components automatically, so that software reuse can be put to better use in development, supporting higher development efficiency and lower development cost for industrial management software.
The invention can analyze the business flowchart set of industrial management software and quickly mine highly reusable sub-processes from complex business flows to serve as reusable components, greatly increasing the speed of component construction and making better use of software reuse in development.
Drawings
FIG. 1 is a flow chart of the scheme of the invention.
FIG. 2 is an example of the conversion of the business process flow diagram of S1 of the present invention into a graph model.
FIG. 3 is a flowchart illustrating the step S2 according to the present invention.
FIG. 4 is a flowchart illustrating the step S3 according to the present invention.
FIG. 5 is a diagram illustrating a rightmost path expansion method according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
FIG. 1 is a flowchart of the solution of the invention. The reusable component mining method for complex business processes of the invention includes:
S1, building a front-end flowchart drawing function in JavaScript; the business flowchart set of an industrial management software system is entered on the front-end page, yielding a data set in JSON format, and the file is saved as flowchart.
The flow chart preprocessing module comprises the following specific processes:
S11, reading the file flowcsv and parsing the JSON-formatted data in Python. A new Node class is created and each flowchart node is converted into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow, as shown in Table 1; nText is the specific behavior description of the node;
TABLE 1
Node state | Corresponding nLabel |
Start | 1 |
End | 2 |
Step | 3 |
Condition | 4 |
Document | 5 |
Data | 6 |
Frequent sub-process | 7 |
Annotation | 8 |
S12, a new Edge class is created and each flowchart connection is converted into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connection; nFrom is the unique identification number of the start node of the connection; nTo is the unique identification number of the destination node of the connection; eLable is the state of the connection, whose purpose is to distinguish ordinary connections from conditional connections, as shown in Table 2;
TABLE 2
Connection type | Corresponding eLable |
Ordinary connection | 1 |
Conditional connection Y | 2 |
Conditional connection N | 3 |
S13, combining the results of S11 and S12, a new Graph class is created and the business flowchart is converted into a pair G = (N, E), where N is the set of triples Node and E is the set of quadruples Edge. After each flowchart has been converted into a Graph object, it is added to the graph model set graph_set.
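The preprocessing of S11 to S13 can be sketched in Python as follows. The field names `nId`, `nLabel`, `nText`, `eId`, `nFrom`, `nTo` and `eLable` and the names `Node`, `Edge`, `Graph` and `graph_set` follow the description; the JSON layout (keys `nodes`, `edges`, `id`, `label`, `text`, `from`, `to`) and the function `parse_flowchart` are hypothetical, since the export format of the front end is not given.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    nId: str      # unique identification number of the node
    nLabel: int   # node state code from Table 1 (1 = start, 3 = step, ...)
    nText: str    # behavior description of the node


@dataclass(frozen=True)
class Edge:
    eId: str      # unique identification number of the connection
    nFrom: str    # nId of the start node
    nTo: str      # nId of the destination node
    eLable: int   # connection type from Table 2 (spelling follows the source)


@dataclass
class Graph:
    N: list = field(default_factory=list)   # set of Node triples
    E: list = field(default_factory=list)   # set of Edge quadruples


def parse_flowchart(raw: str) -> Graph:
    """Convert one JSON-encoded flowchart (assumed layout) into G = (N, E)."""
    data = json.loads(raw)
    g = Graph()
    for n in data["nodes"]:
        g.N.append(Node(n["id"], n["label"], n["text"]))
    for e in data["edges"]:
        g.E.append(Edge(e["id"], e["from"], e["to"], e["label"]))
    return g


sample = json.dumps({
    "nodes": [{"id": "n1", "label": 1, "text": "start"},
              {"id": "n2", "label": 4, "text": "stock available?"},
              {"id": "n3", "label": 2, "text": "end"}],
    "edges": [{"id": "e1", "from": "n1", "to": "n2", "label": 1},
              {"id": "e2", "from": "n2", "to": "n3", "label": 2}],
})
graph_set = [parse_flowchart(sample)]
```

Frozen dataclasses keep the triples and quadruples immutable and hashable, which is convenient when labels are later counted or used as dictionary keys.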
S2, mining structurally similar sub-processes in the directed graph model data set with the frequent sub-process mining algorithm gSpan and recording them as sets, each of which contains several structurally similar frequent sub-processes. Algorithm parameter: the minimum support min_spt of frequent sub-processes is set to 2.
The specific steps of the gSpan algorithm are shown in FIG. 3 and are as follows:
S21, traversing the graph model set, counting the occurrence frequency of the labels of the edges and nodes in the graphs, storing the edge counts and node counts separately, and sorting each from high to low by frequency. According to the preset minimum support min_spt, the nodes and edges with lower frequency are removed to obtain a new graph G. Whether a sub-process is frequent is judged by its support: if the support is greater than or equal to the minimum support min_spt, the sub-process is considered frequent;
S22, creating a DFSEdge class, relabeling the edges of graph G as DFSEdge(0, 1, vevlb) and adding them to DFSEncoding. The labeling rule follows frequency from high to low, i.e. the higher the frequency, the smaller the lexicographic order of the assigned label. The mapping between the original labels and the new labels is stored at the same time;
S23, forming a set E from the edges with the highest frequency and sorting E in lexicographic order: the smaller the lexicographic order, the higher the frequency and the earlier the corresponding edge is ranked.
S24, each edge in the set E can be regarded as the simplest frequent sub-process, and depth-first mining is carried out starting from it.
Further, the step S24 includes:
S241, judging whether the frequent sub-process is a minimum DFS code;
S242, if it is a minimum DFS code, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes. Rightmost path extension: given a graph G and a DFS tree T of G (the set of visited vertices is expanded iteratively until a complete DFS tree is created), a new edge e may be added between the rightmost node and another node on the rightmost path (backward extension), or a new node may be introduced and connected to a node on the rightmost path (forward extension). Both kinds of extension occur on the rightmost path, as shown in FIG. 5;
S243, judging whether the new frequent sub-process meets the minimum support min_spt; if it does, adding the new frequent sub-process to the result set and returning to S241 to continue the recursive mining;
S244, if it is not a minimum DFS code, the mining of this frequent sub-process ends;
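The pruning and relabeling of S21 and S22 can be sketched as below. Several points are assumptions rather than statements from the source: support is taken in the usual gSpan sense (the number of graphs in which a label occurs), each graph is reduced to a `(nodes, edges)` pair of plain Python containers, ties in frequency are broken by the original label, and the function names are hypothetical.

```python
from collections import Counter


def label_support(graphs):
    """graphs: list of (nodes, edges) with nodes = {nId: nLabel}
    and edges = [(nFrom, nTo, eLable)]. Counts each label once per graph."""
    node_sup, edge_sup = Counter(), Counter()
    for nodes, edges in graphs:
        node_sup.update(set(nodes.values()))
        edge_sup.update({lab for _, _, lab in edges})
    return node_sup, edge_sup


def prune(graphs, min_spt):
    """Drop nodes and edges whose label support is below min_spt (S21)."""
    node_sup, edge_sup = label_support(graphs)
    pruned = []
    for nodes, edges in graphs:
        kept = {i: l for i, l in nodes.items() if node_sup[l] >= min_spt}
        kept_edges = [(u, v, l) for u, v, l in edges
                      if edge_sup[l] >= min_spt and u in kept and v in kept]
        pruned.append((kept, kept_edges))
    return pruned


def relabel(support):
    """Map original labels to new ones so that more frequent labels
    receive smaller new labels (S22); returns {old_label: new_label}."""
    ordered = sorted(support, key=lambda l: (-support[l], l))
    return {old: new for new, old in enumerate(ordered)}


g1 = ({"a": 1, "b": 3, "c": 2}, [("a", "b", 1), ("b", "c", 1)])
g2 = ({"x": 1, "y": 3, "z": 4, "w": 2},
      [("x", "y", 1), ("y", "z", 2), ("z", "w", 3)])
pruned = prune([g1, g2], min_spt=2)
node_sup, edge_sup = label_support([g1, g2])
node_map = relabel(node_sup)
```

Pruning infrequent labels before mining shrinks the search space, because no frequent sub-process can contain an edge or node whose label is itself infrequent.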
S3, calculating the behavior similarity of every two elements of a set of structurally similar frequent sub-processes, measuring the distance between frequent sub-processes by that similarity, and hierarchically clustering the set on the basis of the distance to form the component candidate sets.
The specific process of step S3 is shown in FIG. 4 and is as follows:
S31, calculating, from the nText information of the frequent sub-process nodes, the node semantic similarity Node_similarity of each pair of frequent sub-processes;
The specific steps of calculating the semantic similarity of the nodes are as follows:
S311, importing a Python module into the program and using it to train a Word2Vec model on corpora such as Wikipedia and engineering terms, saving the file as embedding_64.model;
S312, traversing all flowchart nodes in the frequent sub-process sets and adding each nText to a collection. A Python word-segmentation module is imported into the program and its segmentation function is applied to every nText. The file embedding_64.model is then imported to load the word2vec model, and finally the model vectorizes the semantics of the segmented words, which are recorded in the form vec(n) and stored in a word vector matrix;
S313, using a loop statement, performing the following operations on any two different frequent sub-flowcharts in each set: a breadth-first search obtains, for each layer, the nText of the corresponding nodes of the two frequent sub-processes; each nText is segmented, the word vectors are looked up in the word vector matrix, and the semantic vector of each node is obtained by averaging; the weight of nouns in the node semantics is set at the same time, and the semantic similarity of the nodes in the frequent sub-processes is calculated according to the following formula, recorded as:
S32, adding the hierarchy influence factor Hierarchical_Weight to the node semantic similarity Node_similarity of the sub-processes and calculating the behavior similarity Behavioral_Similarity of the frequent sub-processes.
The specific steps for calculating the behavior similarity of the sub-processes are as follows:
S321, the layer containing the nodes with in-degree 0 in the frequent sub-flowchart model is recorded as layer 0, the layer containing the child nodes of layer 0 as layer 1, and so on. During the breadth-first search of step S313, the graph model depth of the frequent sub-process is calculated at the same time and recorded as n; finally, the hierarchy influence factor of each layer is calculated in a loop according to the following formula:
S322, during the breadth-first search of step S313, collecting the set of nodes of layer i of each of the two frequent sub-processes and counting the nodes in each set. At the same time, according to the node similarity obtained in S313, the similarities of the corresponding nodes of layer i are summed and divided by the number of nodes in layer i to give the hierarchical similarity of the process, recorded as:
S323, combining the hierarchical similarity and the hierarchy influence factor, calculating according to the following formula the behavior similarity between the two frequent sub-processes that have n layers, recorded as:
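The formulas of S321 to S323 are not reproduced in this text, so only the layering they rely on is sketched here: nodes with in-degree 0 form layer 0, and a breadth-first search assigns every other node the layer at which it is first reached. The function name `layers` and the treatment of the depth n as the number of layers are assumptions.

```python
from collections import deque


def layers(nodes, edges):
    """nodes: list of node ids; edges: list of (nFrom, nTo) pairs.
    Returns a list of layers, each a list of node ids in discovery order."""
    children = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    # Layer 0: every node with in-degree 0.
    level = {n: 0 for n in nodes if indeg[n] == 0}
    queue = deque(level)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in level:         # first visit fixes the node's layer
                level[v] = level[u] + 1
                queue.append(v)
    depth = max(level.values(), default=-1) + 1
    by_layer = [[] for _ in range(depth)]
    for n, l in level.items():
        by_layer[l].append(n)
    return by_layer


# A diamond-shaped sub-process: a branches to b and c, which rejoin at d.
result = layers(["a", "b", "c", "d"],
                [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
n = len(result)
```

With the layers in hand, the node sets of layer i and their sizes needed in S322 are simply `result[i]` and `len(result[i])`.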
S33, clustering each frequent sub-process set with a hierarchical clustering algorithm; the set is divided at different levels according to the distance between clusters to form the component candidate sets.
The hierarchical clustering comprises the following specific steps:
S331, calculating the distance between two clusters according to the Complete-Link definition of hierarchical clustering and on the basis of the behavior similarity of each pair of frequent sub-processes, recorded as:
S333, repeating the following process until all frequent sub-processes form a single cluster: finding the two clusters closest to each other, merging them, renumbering the clusters, and deleting the corresponding row and column of the distance matrix. At each repetition, the clustering result and the merge distance are stored in a result matrix;
S334, traversing the result matrix and recording the set divisions of the different levels to form the component candidate sets;
S4, evaluating every component candidate set on the basis of reusable-component feasibility and calculating the feasibility of forming a component from the component candidate set.
The specific process of step S4 is as follows:
S41, calculating the intra-cluster similarity of the component candidate set according to the following formula; the result is added to a collection, which is converted into an object, recorded as:
S42, counting the number of frequent sub-processes in a cluster of the component candidate set that appear in the same original flowchart;
S43, counting the number of frequent sub-processes in a cluster of the component candidate set that appear in different original flowcharts;
S44, letting k be the number of clusters in the component candidate set, setting the global coincidence rate weight and the local coincidence rate weight, calculating the component coincidence rate of the component candidate set according to the following formula, and adding it to a collection. A normalization function from an imported module is then applied to the converted object to normalize the data, recorded as:
S45, combining the intra-cluster similarity and the coincidence rate, calculating according to the following formula the feasibility of the candidate set as a component:
Step S5 includes:
S51, setting a component feasibility index and judging whether the feasibility of the component candidate set meets the index;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing the reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
Claims (10)
1. A reusable component mining method for complex business processes, characterized by comprising the following steps:
S1, inputting the business flowchart set of the system and converting it, through preprocessing, into a set of symbolically represented graph models;
S2, mining structurally similar frequent sub-processes from the graph model set; all structurally similar frequent sub-processes form a set;
S3, calculating the behavior similarity of the frequent sub-processes in the set and clustering the set according to that similarity to form component candidate sets;
S4, evaluating every component candidate set on the basis of reusable-component feasibility and calculating the feasibility of forming a component from the candidate set;
2. The reusable component mining method for complex business processes according to claim 1, wherein the step S1 includes:
S11, converting each flowchart node in the business flowchart set into a triple Node = (nId, nLabel, nText), where nId is the unique identification number of the node; nLabel is the value corresponding to the state of the node in the flow; nText is the specific behavior description of the node;
S12, converting each flowchart connection into a quadruple Edge = (eId, nFrom, nTo, eLable), where eId is the unique identification number of the connection; nFrom is the unique identification number of the start node of the connection; nTo is the unique identification number of the destination node of the connection; eLable is the state of the connection, whose purpose is to distinguish ordinary connections from conditional connections;
S13, combining the results of S11 and S12 to convert the business flowchart into a symbolically represented graph model G = (N, E), where N is the set of triples Node and E is the set of quadruples Edge.
3. The reusable component mining method for complex business processes according to claim 1, wherein the step S2 includes:
S21, counting the frequency of edges and nodes that have the same function and, according to the preset minimum support min_spt, removing those whose frequency is less than min_spt to obtain a new graph model; whether edges and nodes have the same function is determined from the eLable of the edge and the nLabel of the node;
S22, relabeling the edges and nodes of the graph model by frequency: the higher the frequency, the smaller the lexicographic order of the label, and edges or nodes with the same frequency receive the same label; the mapping between identification numbers and labels and the correspondence between relabeled edges and original edges are stored at the same time;
S23, selecting the label with the highest frequency; the edges and nodes corresponding to that label form the maximum frequent sub-process A, which is then mined, one frequent sub-process at a time;
4. The reusable component mining method for complex business processes according to claim 1, wherein the step S3 includes:
S31, calculating the semantic similarity of the nodes of each pair of frequent sub-processes;
S32, adding a hierarchy influence factor to the semantic similarity of the frequent sub-process nodes and calculating the behavior similarity of the frequent sub-processes;
5. The reusable component mining method for complex business processes according to claim 1, wherein the specific method of step S4 is:
S41, calculating the intra-cluster similarity ICS(C) of the component candidate set:
S42, counting the number of sub-processes in a cluster of the component candidate set that appear in the same original flowchart;
S43, counting the number of sub-processes in a cluster of the component candidate set that appear in different original flowcharts;
S44, letting k be the number of clusters in the component candidate set, setting the global coincidence rate weight and the local coincidence rate weight, and calculating the component coincidence rate of the component candidate set:
S45, combining the intra-cluster similarity and the coincidence rate, calculating the feasibility of the component candidate set as a component:
6. The reusable component mining method for complex business processes according to claim 1, wherein the step S5 includes:
S51, setting a component feasibility index and judging whether the feasibility of the component candidate set meets the index;
S52, if the feasibility is greater than or equal to the feasibility index, outputting the candidate set and constructing the reusable component;
S53, if the feasibility is less than the feasibility index, abandoning construction of the component.
7. The method for mining reusable components oriented to complex business processes, according to claim 3, wherein the step S23 includes:
S231, taking the maximum frequent sub-process A as the first frequent sub-process, each subsequent frequent sub-process being obtained by removing one edge from the previous frequent sub-process, and judging whether the current frequent sub-process satisfies the minimum DFS code;
S232, if the minimum DFS code is not satisfied, ending the mining of that sub-process;
S233, if the minimum DFS code is satisfied, performing rightmost path extension on the frequent sub-process to obtain new frequent sub-processes;
the rightmost path extension method being: given a graph G and a DFS tree T of G, where T covers the set of vertices of G that have already been visited, the tree T is repeatedly extended until a complete DFS tree is created, either by adding a new edge e between the rightmost node and another node on the rightmost path, or by introducing a new node and connecting it to a node on the rightmost path.
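The rightmost path extension step can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes an undirected, unlabeled graph whose DFS code is a list of `(i, j)` pairs of DFS discovery indices (forward edges have `i < j`), and the function names `rightmost_path` and `rightmost_extensions` are hypothetical. It only enumerates candidate edges; the minimum DFS code check of S231 is not shown.

```python
def rightmost_path(dfs_code):
    """Return the DFS indices on the rightmost path, root first."""
    parent = {}
    for i, j in dfs_code:
        if i < j:  # forward edge: node j was discovered from node i
            parent[j] = i
    # The rightmost node is the most recently discovered one.
    rightmost = max((max(i, j) for i, j in dfs_code), default=0)
    path = [rightmost]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def rightmost_extensions(dfs_code):
    """List candidate edges that rightmost path extension may add."""
    path = rightmost_path(dfs_code)
    rightmost = path[-1]
    existing = {frozenset(edge) for edge in dfs_code}
    # Backward edges: rightmost node to another node on the rightmost path.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in existing]
    # Forward edges: a new node attached to a node on the rightmost path.
    new_node = rightmost + 1
    forward = [(v, new_node) for v in reversed(path)]
    return backward + forward


# Illustrative input (assumed): a triangle 0-1-2 with a pendant node 3 on node 1.
code = [(0, 1), (1, 2), (2, 0), (1, 3)]
candidates = rightmost_extensions(code)
```

In a full miner each candidate would then be checked for frequency and for the minimum DFS code before being kept.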
8. The method for mining reusable components oriented to complex business processes, according to claim 4, wherein the step S31 includes:
S311, training a model using Word2Vec and vectorizing the two nodes being compared to obtain their respective vectors;
S312, setting a weight for the nouns in the node semantics and calculating the semantic similarity of the nodes in the sub-processes:
S313, calculating the semantic similarity of all nodes in the two frequent sub-processes pairwise, using the methods of S311 and S312.
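As a rough illustration of S311 to S313, the sketch below computes a noun-weighted node similarity. The claim's own formula is not reproduced in this text, so the combination used here (cosine similarity of the noun vectors and of the remaining words, blended by `noun_weight`) is an assumption, as are the names `node_similarity`, `mean_vector`, and the toy embedding table that stands in for a trained Word2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(words, embeddings):
    """Average the embeddings of the known words; zero vector if none are known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def node_similarity(node_a, node_b, embeddings, noun_weight=0.7):
    """Assumed blend: nouns count for noun_weight, other words for the rest."""
    noun_sim = cosine(mean_vector(node_a["nouns"], embeddings),
                      mean_vector(node_b["nouns"], embeddings))
    other_sim = cosine(mean_vector(node_a["others"], embeddings),
                       mean_vector(node_b["others"], embeddings))
    return noun_weight * noun_sim + (1.0 - noun_weight) * other_sim


# Toy 2-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "order": [1.0, 0.0],
    "invoice": [0.8, 0.6],
    "create": [0.0, 1.0],
}

create_order = {"nouns": ["order"], "others": ["create"]}
create_invoice = {"nouns": ["invoice"], "others": ["create"]}
similarity = node_similarity(create_order, create_invoice, TOY_EMBEDDINGS)
```

Applying `node_similarity` to every pair of nodes from two frequent sub-processes gives the pairwise table that S313 calls for.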
9. The method for mining reusable components oriented to complex business processes, according to claim 4, wherein the step S32 includes:
S321, calculating a hierarchy influence factor for each layer of the frequent sub-processes, where n denotes the depth of the graph model of the frequent sub-processes:
S322, taking the set of nodes in the i-th layer of each frequent sub-process and the number of elements in that set, and taking the j-th node in the i-th layer of each of the two frequent sub-processes, calculating the hierarchical similarity of the sub-processes from the node similarity:
S323, taking both the hierarchical similarity and the hierarchy influence factor into account, calculating the behavioral similarity between two frequent sub-processes having n layers:
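The layered comparison in S321 to S323 can be sketched as follows. The formulas for the hierarchy influence factor and the hierarchical similarity are not reproduced in this text, so the choices below are assumptions made only for illustration: layer weights that decrease linearly with depth and sum to 1, positional pairing of the j-th nodes in each layer, and normalization by the larger layer size. All function names are hypothetical.

```python
def layer_weights(depth):
    """Assumed influence factors: earlier layers weigh more; weights sum to 1."""
    total = depth * (depth + 1) / 2
    return [(depth - i) / total for i in range(depth)]


def layer_similarity(layer_a, layer_b, node_sim):
    """Pair the j-th nodes of both layers and average over the larger layer."""
    if not layer_a and not layer_b:
        return 1.0
    paired = sum(node_sim(a, b) for a, b in zip(layer_a, layer_b))
    return paired / max(len(layer_a), len(layer_b))


def behavioral_similarity(proc_a, proc_b, node_sim):
    """Weighted sum of per-layer similarities; proc_* are lists of layers."""
    depth = max(len(proc_a), len(proc_b))
    total = 0.0
    for i, weight in enumerate(layer_weights(depth)):
        layer_a = proc_a[i] if i < len(proc_a) else []
        layer_b = proc_b[i] if i < len(proc_b) else []
        total += weight * layer_similarity(layer_a, layer_b, node_sim)
    return total


def exact_match(a, b):
    """Stand-in node similarity: 1.0 for identical labels, else 0.0."""
    return 1.0 if a == b else 0.0


# Two invented three-layer sub-processes that differ in one node.
proc_a = [["receive order"], ["check stock", "check credit"], ["ship"]]
proc_b = [["receive order"], ["check stock", "reject"], ["ship"]]
score = behavioral_similarity(proc_a, proc_b, exact_match)
```

In practice `exact_match` would be replaced by the semantic node similarity of step S31.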
10. The method for mining reusable components oriented to complex business processes, according to claim 4, wherein the step S33 includes:
S331, calculating the distance between two clusters based on the behavioral similarity of the frequent sub-processes:
S332, treating each frequent sub-process as an initial cluster;
S333, finding the two closest clusters and merging them, and repeating this process until all frequent sub-processes form a single cluster;
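Steps S331 to S333 describe bottom-up (agglomerative) clustering. A minimal sketch follows. The claim's cluster distance formula is not reproduced in this text, so the sketch assumes average linkage over `1 - similarity`; the similarity matrix and the names `cluster_distance` and `agglomerate` are likewise illustrative.

```python
from itertools import combinations


def cluster_distance(c1, c2, sim):
    """Assumed average-linkage distance: mean of (1 - similarity) over all pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(1.0 - sim[a][b] for a, b in pairs) / len(pairs)


def agglomerate(sim):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(sim))]  # S332: one cluster per sub-process
    merges = []
    while len(clusters) > 1:
        # S333: pick the closest pair of clusters.
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], sim))
        distance = cluster_distance(x, y, sim)
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
        merges.append((x, y, distance))
    return merges


# Invented behavioral similarities between four frequent sub-processes.
SIMILARITY = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
history = agglomerate(SIMILARITY)
```

The merge history records which clusters were joined at which distance, so a set of clusters can be read off at any intermediate level.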
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804713.3A CN113254013B (en) | 2021-07-16 | 2021-07-16 | Reusable component mining method for complex business process |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113254013A (en) | 2021-08-13 |
CN113254013B (en) | 2021-09-24 |
Family
ID=77180513
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804713.3A Active CN113254013B (en) | 2021-07-16 | 2021-07-16 | Reusable component mining method for complex business process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254013B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116185395A (en) * | 2023-04-21 | 2023-05-30 | 华能信息技术有限公司 | Flow component templatization definition method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015021404A2 (en) * | 2013-08-08 | 2015-02-12 | Systamedic Inc. | Method for knowledge extraction through data mining |
CN104767736A (en) * | 2015-03-23 | 2015-07-08 | 电子科技大学 | Method for separating unknown single protocol data stream into different types of data frames |
CN104954453A (en) * | 2015-06-02 | 2015-09-30 | 浙江工业大学 | Data mining REST service platform based on cloud computing |
CN109063727A (en) * | 2018-06-19 | 2018-12-21 | 东软集团股份有限公司 | Calculate method, apparatus, storage medium and the electronic equipment of track frequency |
CN109272155A (en) * | 2018-09-11 | 2019-01-25 | 郑州向心力通信技术股份有限公司 | A kind of corporate behavior analysis system based on big data |
CN109902284A (en) * | 2018-12-30 | 2019-06-18 | 中国科学院软件研究所 | A kind of unsupervised argument extracting method excavated based on debate |
CN111984688A (en) * | 2020-08-19 | 2020-11-24 | 中国银行股份有限公司 | Method and device for determining business knowledge association relation |
CN112764749A (en) * | 2021-01-18 | 2021-05-07 | 电子科技大学 | Method and system for generating software functional interface group |
Non-Patent Citations (4)
Title |
---|
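YU DENG et al.: "An Improved Deep Neural Network Model for Job Matching", 《IN PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA》 * |
潘鑫 (Pan Xin): "Design and Implementation of a Document Copy Detection System Based on Similarity Estimation" (基于相似度估计文档复制检测系统的设计与实现), 《China Master's Theses Full-text Database, Information Science and Technology》 * |
谭军 (Tan Jun): "Research on Data Mining Technology and Applications for Continuous Product Quality Control" (面向产品持续质量控制的数据挖掘技术与应用研究), 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 * |
贾小贝 (Jia Xiaobei) et al.: "Business Flow Analysis Method Based on User Behavior Similarity from Web Logs" (基于Web日志的用户行为相似度的业务流分析方法), 《Journal of Yangtze University》 * |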
Also Published As
Publication number | Publication date |
---|---|
CN113254013B (en) | 2021-09-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110245981B (en) | Crowd type identification method based on mobile phone signaling data | |
CN112765358B (en) | Taxpayer industry classification method based on noise label learning | |
Dong et al. | Tablesense: Spreadsheet table detection with convolutional neural networks | |
Xue et al. | Res2tim: Reconstruct syntactic structures from table images | |
CN110889310B (en) | Financial document information intelligent extraction system and method | |
CN107577702B (en) | Method for distinguishing traffic information in social media | |
CN113255321B (en) | Financial field chapter-level event extraction method based on article entity word dependency relationship | |
CN111737477A (en) | Intellectual property big data-based intelligence investigation method, system and storage medium | |
CN103544186A (en) | Method and equipment for discovering theme key words in picture | |
CN111274817A (en) | Intelligent software cost measurement method based on natural language processing technology | |
CN113254013B (en) | Reusable component mining method for complex business process | |
CN114863091A (en) | Target detection training method based on pseudo label | |
CN110827131A (en) | Tax payer credit evaluation method based on distributed automatic feature combination | |
CN115437952A (en) | Statement level software defect detection method based on deep learning | |
CN114238524A (en) | Satellite frequency-orbit data information extraction method based on enhanced sample model | |
CN114444484A (en) | Document-level event extraction method and system based on double-layer graph | |
Shen et al. | Divide rows and conquer cells: Towards structure recognition for large tables | |
CN113469005A (en) | Recognition method of bank receipt, related device and storage medium | |
CN111597806A (en) | Method, equipment and medium for identifying short message text template based on statistical model | |
US20220076109A1 (en) | System for contextual and positional parameterized record building | |
CN114780403A (en) | Software defect prediction method and device based on enhanced code attribute graph | |
CN114519344A (en) | Discourse element sub-graph prompt generation and guide-based discourse-level multi-event extraction method | |
CN115017144A (en) | Method for identifying judicial writing case element entity based on graph neural network | |
Chen et al. | Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer | |
Shi et al. | Graph Guided Transformer: An Image-Based Global Learning Framework for Hyperspectral Image Classification |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |