Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a cross-organization multi-instance sub-process model mining method, which addresses the low accuracy of the models that traditional process mining techniques mine from event logs containing multi-instance sub-process information.
A second object of the present invention is to provide a cross-organization multi-instance sub-process model mining system.
The first object of the invention is achieved by the following technical scheme: a cross-organization multi-instance sub-process model mining method, comprising the following steps:
1) Acquiring basic data, including a lifecycle event log of a cross-organization multi-instance sub-process;
2) According to the lifecycle event log obtained in step 1), mining the nesting relations among the activities of the lifecycle event log and representing them with a nested relation tree;
3) Constructing a hierarchical event log according to the nested relation tree mined in step 2);
4) Performing multi-instance identification and reconstruction on the sub-logs of the hierarchical event log constructed in step 3) to obtain a reconstructed hierarchical event log;
5) Mining a cross-organization multi-instance sub-process model from the reconstructed hierarchical event log of step 4).
Further, in step 1), the lifecycle event log of the cross-organization multi-instance sub-process is a lifecycle event log with multi-instance sub-process information; the cross-organization multi-instance sub-process means that an enterprise outsources part of its internal business to other enterprises, thereby improving its working efficiency and reducing its operating cost; the multi-instance sub-process means that, in a business scenario with a parent process and a sub-process, a calling relation exists between activities of the parent process and the sub-process, and the sub-process can be instantiated multiple times during a call, i.e., multiple instances run in parallel; the lifecycle event log with multi-instance sub-process information is a multiset of cases, where a case is a finite sequence of activities that contains calling relations between activities and interleaved behavior of multi-instance sub-processes, and each activity in the lifecycle event log has both start and end lifecycle information.
Further, in step 2), the nesting relations between activities are mined from the lifecycle event log obtained in step 1); the nesting relation is defined as follows: in a trace σ of the lifecycle event log, activity a nests activity b if the following conditions are satisfied:
(1) the i-th position in trace σ is activity a, and its lifecycle state is start; (2) the j-th position in trace σ is activity b, and its lifecycle state is start; (3) the k-th position in trace σ is activity b, and its lifecycle state is end; (4) the l-th position in trace σ is activity a, and its lifecycle state is end; where i, j, k, l satisfy i < j < k < l;
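The four conditions above can be checked mechanically. The following is a minimal Python sketch, assuming a trace is given as a list of `(activity, state)` pairs with state `"start"` or `"end"`; the names `nests` and `nesting_pairs` are illustrative and not part of the invention. It applies the definition literally (existence of positions i < j < k < l) and does not try to pair the start and end events of one particular instance.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (activity, lifecycle state: "start" or "end")


def nests(trace: List[Event], a: str, b: str) -> bool:
    """True if a starts at i, b starts at j, b ends at k, a ends at l, i < j < k < l."""
    n = len(trace)
    for i in range(n):
        if trace[i] != (a, "start"):
            continue
        for j in range(i + 1, n):
            if trace[j] != (b, "start"):
                continue
            for k in range(j + 1, n):
                if trace[k] != (b, "end"):
                    continue
                # any later end event of a completes the pattern
                if any(trace[l] == (a, "end") for l in range(k + 1, n)):
                    return True
    return False


def nesting_pairs(trace: List[Event]) -> Set[Tuple[str, str]]:
    """All ordered pairs (a, b) of distinct activities such that a nests b."""
    acts = {act for act, _ in trace}
    return {(a, b) for a in acts for b in acts if a != b and nests(trace, a, b)}


# Hypothetical trace: A encloses B and then C.
example = [("A", "start"), ("B", "start"), ("B", "end"),
           ("C", "start"), ("C", "end"), ("A", "end")]
print(nesting_pairs(example))
```

The brute-force search is quadratic or worse in the trace length; it is written for readability, not for large logs.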
the specific process of the nesting relation mining between activities is as follows:
2.1) Taking the lifecycle event log as input to obtain the event information required for mining the nesting relations between activities;
2.2) Mining the nesting relations between activities: according to the definition of the nesting relation, mining all nesting relations among the activities in the lifecycle event log and constructing the nested activity set; all nesting relations among the activities are represented by a nested relation tree;
the nested activity set mentioned in the above process is a set of activities satisfying: for any activity a in the nested activity set, there exists an activity b in the activity set A such that activity a nests activity b; where the activity set A is the activity set of the lifecycle event log L, i.e., the set of all distinct activities;
The nested relation tree is a triple $HTree(L) = (rootAct, HNode(rootAct), \eta)$, where:
① $rootAct \subseteq A$ is the root node set, where $A$ is the activity set of the lifecycle event log $L$, i.e., the set of all distinct activities; the root node set satisfies: i. no nesting relation exists between activities in the root node set; ii. no activity in the root node set is nested by any other activity;
② $HNode(rootAct) = \{Act_a \mid a \in NA \cap rootAct\}$ is the child node set of the root node set rootAct, where $NA$ is the nested activity set of the lifecycle event log $L$; $Act_a$ is the child node set of activity $a$, and every activity $b$ in $Act_a$ satisfies that activity $a$ nests activity $b$;
③ $\eta$ is the mapping from the nested activity set to child node sets, i.e., $\eta(a) = Act_a$, satisfying: i. for any activity $a$ in the nested activity set $NA$, there is a child node set $Act_a$ such that activity $a$ nests every activity $b$ in $Act_a$; ii. no nesting relation exists between the activities within a child node set; iii. the mapping from nested activities to child node sets forms a tree structure.
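As an illustration of the definition, the following Python sketch derives the root node set, the nested activity set and the mapping η from a set of nesting pairs. It assumes the pairs have already been mined and include transitive nestings; the function name `build_nesting_tree` and the rule used to keep only direct children (drop b when another activity nested by a also nests b) are assumptions of this sketch, not wording from the method.

```python
from typing import Dict, Set, Tuple


def build_nesting_tree(activities: Set[str], pairs: Set[Tuple[str, str]]):
    """Return (root_act, na, eta) for nesting pairs (a, b) meaning 'a nests b'."""
    nested_by: Dict[str, Set[str]] = {}
    for a, b in pairs:
        nested_by.setdefault(a, set()).add(b)

    inner = {b for _, b in pairs}      # activities nested by some other activity
    root_act = activities - inner      # roots are never nested by anything
    na = set(nested_by)                # nested activity set NA

    eta: Dict[str, Set[str]] = {}
    for a, inside in nested_by.items():
        # Keep only direct children so that no nesting exists inside Act_a.
        eta[a] = {b for b in inside
                  if not any((c, b) in pairs for c in inside if c != b)}
    return root_act, na, eta


# Hypothetical input: A nests B and C, B nests C, D stands alone.
roots, na, eta = build_nesting_tree({"A", "B", "C", "D"},
                                    {("A", "B"), ("A", "C"), ("B", "C")})
print(roots, na, eta)
```

Here C ends up as a child of B only, which keeps the child node set of A free of internal nesting, as condition ii of the mapping requires.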
Further, in step 3), a hierarchical event log is constructed according to the nested relation tree obtained in step 2); the nested relation tree is a triple $HTree(L) = (rootAct, HNode(rootAct), \eta)$, where:
① $rootAct \subseteq A$ is the root node set, where $A$ is the activity set of the lifecycle event log $L$, i.e., the set of all distinct activities; the root node set satisfies: i. no nesting relation exists between activities in the root node set; ii. no activity in the root node set is nested by any other activity;
② $HNode(rootAct) = \{Act_a \mid a \in NA \cap rootAct\}$ is the child node set of the root node set rootAct, where $NA$ is the nested activity set of the lifecycle event log $L$; $Act_a$ is the child node set of activity $a$, and every activity $b$ in $Act_a$ satisfies that activity $a$ nests activity $b$;
③ $\eta$ is the mapping from the nested activity set to child node sets, i.e., $\eta(a) = Act_a$, satisfying: i. for any activity $a$ in the nested activity set $NA$, there is a child node set $Act_a$ such that activity $a$ nests every activity $b$ in $Act_a$; ii. no nesting relation exists between the activities within a child node set; iii. the mapping from nested activities to child node sets forms a tree structure;
the nesting relation mentioned in the above process is defined as follows: in a trace σ of the lifecycle event log, activity a nests activity b if the following conditions are satisfied:
(1) the i-th position in trace σ is activity a, and its lifecycle state is start; (2) the j-th position in trace σ is activity b, and its lifecycle state is start; (3) the k-th position in trace σ is activity b, and its lifecycle state is end; (4) the l-th position in trace σ is activity a, and its lifecycle state is end; where i, j, k, l satisfy i < j < k < l;
The nested activity set mentioned in the above process is a set of activities satisfying: for any activity a in the nested activity set, there exists an activity b in the activity set A such that activity a nests activity b; where the activity set A is the activity set of the lifecycle event log L, i.e., the set of all distinct activities;
the hierarchical event log is defined as $HL(L) = (rootLog, HL(rootLog))$, where:
① rootLog is the root node log, where ζ is the trace set of the lifecycle event log L, σ is any trace in the lifecycle event log L, σ(i) denotes the activity at the i-th position of the trace, and rootAct is the root node set of the lifecycle event log; every trace σ in the root node log satisfies: i. all activities in trace σ belong to the root node set of the nested relation tree; ii. if the i-th position of the trace is activity a, the j-th position is activity b, and i < j, then the end time of activity a is earlier than the end time of activity b;
② $HL(rootLog) = \{(a, NLog_a) \mid a \in NA \cap rootAct\}$ is the set of sub-logs of the root node log, where NA is the nested activity set of the lifecycle event log, a is an activity in the nested activity set, and rootAct is the root node set of the nested relation tree, satisfying:
if no activity in the root node set nests another activity, the sub-log is empty; otherwise, any activity b in a trace σ of the sub-log satisfies: i. there is an activity c in the root node set that nests activity b; ii. if the i-th position in trace σ of the sub-log is activity b, the j-th position is activity d, and i < j, then the end time of activity b is earlier than the end time of activity d;
The specific process of constructing the hierarchical event log is as follows:
3.1) Taking the lifecycle event log and the nested relation tree as input to obtain the root node information and the nesting relation information;
3.2) Constructing the root node set of the lifecycle event log;
3.3) Constructing the sub-process event log corresponding to each root node in the root node set of step 3.2), and removing the activities corresponding to these root nodes from the nested activity set;
3.4) Iteratively performing steps 3.2) and 3.3) until the nested activity set is empty, thereby obtaining the hierarchical event log.
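The construction in steps 3.1) to 3.4) can be sketched in Python as follows. This is a simplified single-pass variant rather than the iterative procedure itself, and it rests on assumptions that are not in the text: event positions stand in for timestamps, each nesting activity has at most one open instance at a time within a trace, and `eta` maps every nesting activity to its direct children. The name `build_hierarchical_log` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (activity, "start" | "end")


def build_hierarchical_log(log: List[List[Event]], root_act: Set[str],
                           eta: Dict[str, Set[str]]):
    """Return (root_log, sub_logs); traces list activities ordered by end event."""
    # Root node log: completed root activities, in order of their end events.
    root_log = [[act for act, st in trace if st == "end" and act in root_act]
                for trace in log]

    sub_logs: Dict[str, List[List[str]]] = {}
    for trace in log:
        open_at: Dict[str, int] = {}   # nesting activity -> index of its start
        for idx, (act, st) in enumerate(trace):
            if st == "start":
                open_at[act] = idx
            elif act in eta and act in open_at:
                # Events strictly between the start and end of the nesting activity.
                window = trace[open_at.pop(act) + 1: idx]
                sub = [b for b, s in window if s == "end" and b in eta[act]]
                sub_logs.setdefault(act, []).append(sub)
    return root_log, sub_logs


# Hypothetical log: A nests B and D, B nests C, E is a plain root activity.
log = [[("A", "start"), ("B", "start"), ("C", "start"), ("C", "end"),
        ("B", "end"), ("D", "start"), ("D", "end"), ("A", "end"),
        ("E", "start"), ("E", "end")]]
root_log, sub_logs = build_hierarchical_log(
    log, {"A", "E"}, {"A": {"B", "D"}, "B": {"C"}})
print(root_log, sub_logs)
```

Each sub-trace still mixes the events of parallel sub-process instances; separating them is the job of the multi-instance identification in step 4).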
Further, in step 4), multi-instance identification and reconstruction are performed on the sub-logs of each level according to the hierarchical event log constructed in step 3); the hierarchical event log is defined as $HL(L) = (rootLog, HL(rootLog))$, where:
① rootLog is the root node log, where ζ is the trace set of the lifecycle event log L, σ is any trace in the lifecycle event log L, σ(i) denotes the activity at the i-th position of the trace, and rootAct is the root node set of the lifecycle event log; every trace σ in the root node log satisfies: i. all activities in trace σ belong to the root node set of the nested relation tree; ii. if the i-th position of the trace is activity a, the j-th position is activity b, and i < j, then the end time of activity a is earlier than the end time of activity b;
② $HL(rootLog) = \{(a, NLog_a) \mid a \in NA \cap rootAct\}$ is the set of sub-logs of the root node log, where NA is the nested activity set of the lifecycle event log, a is an activity in the nested activity set, and rootAct is the root node set of the nested relation tree, satisfying:
if no activity in the root node set nests another activity, the sub-log is empty; otherwise, any activity b in a trace σ of the sub-log satisfies: i. there is an activity c in the root node set that nests activity b; ii. if the i-th position in trace σ of the sub-log is activity b, the j-th position is activity d, and i < j, then the end time of activity b is earlier than the end time of activity d;
the multi-instance identification and reconstruction means splitting each interleaved trace with multi-instance behavior in a sub-log into several traces by assigning the best case to each activity, thereby obtaining a new sub-log, and updating the new sub-log into the hierarchical event log to obtain the reconstructed hierarchical event log;
multi-instance identification and reconstruction are performed on the sub-logs of the hierarchical event log by the following specific process:
4.1) Inputting the hierarchical event log, obtaining the sub-logs of each level and constructing a sub-log set;
4.2) Performing multi-instance identification on each sub-log in the sub-log set by the following specific process:
4.2.1) Inputting the sub-log $NLog_a$ and initializing the activity frequency matrix M and the sub-log sequence; the rows and columns of the activity frequency matrix M consist of all distinct activities in the sub-log together with a start activity and an end activity; for any two activities a and b, the entry M(a, b) is the frequency of the directly-follows relation from a to b; the start activity and the end activity do not represent any actual activity and only indicate that a trace starts or ends at a certain activity; no transition points to the start activity, and no transition leaves the end activity;
the relative frequency of the directly-follows relation from activity a to activity b, denoted by P(a > b), is defined as:
$P(a > b) = \dfrac{|a > b|}{\sum_{c} |a > c|}$
where |a > b| is the number of times activity b directly follows activity a, and |a > c| is the number of times activity c directly follows activity a;
the frequency of the directly-follows relation between activities is defined as follows: the frequency with which activity b directly follows activity a is denoted by |a > b|, i.e., the number of times that b directly follows a over all traces of the sub-log $NLog_a$;
the directly-follows relation mentioned above is defined as follows: $NLog_a$ is the sub-log of activity a; in a trace σ of the sub-log $NLog_a$, activity c directly follows activity b if: (1) the i-th position is activity b; (2) the (i+1)-th position is activity c;
the sub-log sequence is denoted $SLog_a$, where $\#_{trans}(\sigma(i))$ denotes the lifecycle state of the activity at the i-th position of trace σ; the sub-log sequence $SLog_a$ satisfies: i. if the i-th position in the sub-log sequence is activity a, the j-th position is activity b, and i < j, then the end time of activity a is earlier than the end time of activity b; ii. the lifecycle state of every activity in the sub-log sequence is complete;
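The activity frequency matrix of step 4.2.1) can be sketched as a sparse counter of directly-follows pairs. This Python sketch assumes a sub-log is a list of traces, each a list of activity names already ordered by end time; the marker strings `"S_e"` and `"C_e"` for the artificial start and end activities and the function name `frequency_matrix` are illustrative choices.

```python
from collections import Counter
from typing import List

START, END = "S_e", "C_e"  # artificial start and end activities


def frequency_matrix(sub_log: List[List[str]]) -> Counter:
    """Count |a > b| over all traces, including the artificial start/end markers."""
    m: Counter = Counter()
    for trace in sub_log:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            m[(a, b)] += 1   # b directly follows a once more
    return m


m = frequency_matrix([["a", "b"], ["a", "c"]])
print(m[("S_e", "a")], m[("a", "b")], m[("b", "a")])
```

A `Counter` returns 0 for pairs that were never observed, which matches an all-zero initial matrix without allocating every cell.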
4.2.2) According to the activity frequency matrix M, assigning the best case to each activity in the sub-log sequence by the following specific process:
4.2.2.1) Inputting the sub-log sequence $SLog_a$ and letting the current activity q point to the first activity in $SLog_a$;
4.2.2.2) Assigning the best case to the current activity according to the activity frequency matrix M, where there are two situations:
first, if the enabled activity set is empty, or every activity b in the enabled activity set satisfies $M(b, q) < M(S_e, q)$, then a new case σ is started at the current activity q, and q becomes an enabled activity and joins the enabled activity set; where $S_e$ is the start activity; M(b, q) is the entry of activities b and q in the activity frequency matrix, i.e., the frequency with which q directly follows b; $M(S_e, q)$ is the entry of activities $S_e$ and q in the activity frequency matrix, i.e., the frequency with which q directly follows $S_e$; the enabled activity set is the set of activities in the enabled state;
second, if there is an enabled activity b in the enabled activity set for which M(b, q) is the largest and $M(b, q) > M(S_e, q)$, i.e., q most frequently directly follows b, then: i. if $M(q, c) < M(q, C_e)$ holds for every activity c in the activity set, the current activity q is assigned to the case of the enabled activity b, the case is ended at activity q, and b is removed from the enabled activity set, where $C_e$ is the end activity, M(q, c) is the frequency with which c directly follows q, and $M(q, C_e)$ is the frequency with which $C_e$ directly follows q; ii. otherwise, the current activity q is assigned to the case of the enabled activity b, q becomes an enabled activity and joins the enabled activity set, and b is removed from the enabled activity set, i.e., no new case is started at q and no running case is ended; if neither situation applies, any enabled activity b is selected from the enabled activity set, the current activity q is assigned to the case of b, b is removed from the enabled activity set, and q becomes an enabled activity and joins the enabled activity set;
the current activity q is then pointed to the next activity in the sub-log sequence $SLog_a$;
4.2.2.3) Iteratively performing step 4.2.2.2) until the current activity q points to null, and outputting the new sub-log;
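One pass of the case assignment in step 4.2.2) can be sketched in Python as below. The sketch makes several assumptions that go beyond the text: the matrix is a dictionary keyed by `(a, b)` with missing pairs read as 0, the enabled activity set is kept as a list of `(activity, case index)` pairs so that two running cases may share the same last activity, ties for the largest M(b, q) are resolved by list order, and the "select any enabled activity" fallback picks the first entry. The names `assign_cases`, `START` and `END` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

START, END = "S_e", "C_e"  # artificial start and end activities


def assign_cases(sequence: List[str], m: Dict[Tuple[str, str], int],
                 activities: Iterable[str]) -> List[List[str]]:
    """Split an interleaved activity sequence into cases using the matrix m."""
    acts = list(activities)

    def freq(a: str, b: str) -> int:
        return m.get((a, b), 0)

    cases: List[List[str]] = []
    enabled: List[Tuple[str, int]] = []   # (last activity, index of its case)
    for q in sequence:
        from_start = freq(START, q)
        # Situation 1: starting a new case is more plausible than any continuation.
        if not enabled or all(freq(b, q) < from_start for b, _ in enabled):
            cases.append([q])
            enabled.append((q, len(cases) - 1))
            continue
        best = max(enabled, key=lambda e: freq(e[0], q))
        if freq(best[0], q) > from_start:
            # Situation 2: continue the best case; close it if q mostly ends traces.
            ends_case = all(freq(q, c) < freq(q, END) for c in acts)
        else:
            # Fallback: pick an arbitrary enabled activity (here, the first).
            best, ends_case = enabled[0], False
        enabled.remove(best)
        cases[best[1]].append(q)
        if not ends_case:
            enabled.append((q, best[1]))
    return cases


# Hypothetical matrix in which every trace is a -> b -> c.
m = {(START, "a"): 2, ("a", "b"): 2, ("b", "c"): 2, ("c", END): 2}
print(assign_cases(["a", "a", "b", "b", "c", "c"], m, {"a", "b", "c"}))
```

In the full procedure this pass alternates with recomputing the matrix from the resulting cases until the matrix no longer changes.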
4.2.3) According to the new sub-log, calculating the frequencies of the directly-follows relation between activities, and updating the activity frequency matrix M;
4.2.4) Iteratively executing steps 4.2.2) and 4.2.3) until no value changes when the activity frequency matrix M is updated, and outputting the reconstructed sub-log;
4.3) Repeatedly executing step 4.2) until the sub-log set is empty, and outputting the reconstructed hierarchical event log.
Further, in step 5), a cross-organization multi-instance sub-process model is mined from the hierarchical event log reconstructed in step 4); the cross-organization multi-instance sub-process model is a hierarchical multi-instance Petri net defined as $HPMN = (Q, PN_{N0}, NA, map)$, where: i. $Q$ is the set of sub-models $PN_{t_i}$ corresponding to all nested transitions $t_i$, and NA is the nested activity set; ii. $PN_{N0}$ is the top-level Petri net with nested transitions; iii. NA is the set of all nested transitions; iv. $map: NA \to \{1, +\} \times (Q \setminus \{PN_{N0}\})$, i.e., if $map(t_i) = (1, PN_{t_i})$, the nested transition $t_i$ calls its sub-process once; otherwise $map(t_i) = (+, PN_{t_i})$, and the nested transition $t_i$ calls multiple sub-process instances;
The nested transitions mentioned above are activities that may trigger multi-instance sub-process behavior;
the Petri net with nested transitions mentioned above is a pair $PN_N = (PN, \beta)$ satisfying: (1) PN is a labeled Petri net; (2) β is the nested transition function, which satisfies: for any t in the transition set T, β(t) = g means that t is a normal transition, and β(t) = n means that t is a nested transition;
the labeled Petri net mentioned above is a four-tuple $PN = (P, T, F, l)$, where P is the place set, T is the transition set, $F \subseteq (P \times T) \cup (T \times P)$ is the set of directed arcs representing the flow relation, and $l: T \to \Gamma$ is the transition labeling function from the transition set T to the label set Γ, indicating that each transition has a corresponding label;
the specific process of mining the hierarchical multi-instance Petri net model from the reconstructed hierarchical event log is as follows: taking the reconstructed hierarchical event log as input, the sub-process event log of each level is mined with a traditional process mining method, namely the Inductive Miner, to obtain the sub-process models, which together yield the cross-organization multi-instance sub-process model.
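To make the tuple definitions concrete, the following Python sketch models a Petri net with nested transitions and the hierarchical model as plain data classes with a structural check. The class and field names (`NestedPetriNet`, `HPMN`, `mapping`, `is_well_formed`) are hypothetical, the multiplicity is encoded as the strings `"1"` and `"+"`, and the sketch does not perform any mining; the nets would in practice come from a discovery algorithm such as the Inductive Miner.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class NestedPetriNet:
    places: Set[str]
    transitions: Set[str]
    arcs: Set[Tuple[str, str]]          # F, a subset of (P x T) U (T x P)
    labels: Dict[str, str]              # l: T -> label set
    nested: Set[str] = field(default_factory=set)  # transitions marked as nested

    def is_well_formed(self) -> bool:
        arcs_ok = all(
            (src in self.places and dst in self.transitions)
            or (src in self.transitions and dst in self.places)
            for src, dst in self.arcs
        )
        return (arcs_ok and self.nested <= self.transitions
                and set(self.labels) == self.transitions)


@dataclass
class HPMN:
    top: NestedPetriNet
    # map: nested transition -> (multiplicity "1" or "+", sub-model)
    mapping: Dict[str, Tuple[str, NestedPetriNet]]

    def is_well_formed(self) -> bool:
        nets = [self.top] + [pn for _, pn in self.mapping.values()]
        # Every nested transition on any level must have a sub-model.
        nets_ok = all(net.is_well_formed() and net.nested <= set(self.mapping)
                      for net in nets)
        return nets_ok and all(mult in ("1", "+")
                               for mult, _ in self.mapping.values())


sub = NestedPetriNet({"p0", "p1"}, {"t_b"},
                     {("p0", "t_b"), ("t_b", "p1")}, {"t_b": "b"})
top = NestedPetriNet({"p0", "p1"}, {"t_a"},
                     {("p0", "t_a"), ("t_a", "p1")}, {"t_a": "a"},
                     nested={"t_a"})
print(HPMN(top, {"t_a": ("+", sub)}).is_well_formed())
```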
The second object of the invention is achieved by the following technical scheme: a cross-organization multi-instance sub-process model mining system, comprising a data acquisition module, an activity nesting relation construction module, a hierarchical event log construction module and a cross-organization multi-instance sub-process model mining module;
The data acquisition module is used for acquiring a lifecycle event log that carries multi-instance sub-process information and in which each activity has both start and end lifecycle information;
the activity nesting relation construction module is used for mining the nesting relations among activities from the lifecycle event log to obtain the nested activity set, and then constructing the nested relation tree from the nesting relations among activities and the nested activity set;
the hierarchical event log construction module is used for converting the event log acquired by the data acquisition module into a hierarchical event log according to the activity nesting relations obtained by the activity nesting relation construction module, and then performing multi-instance identification and reconstruction on the sub-logs of each level in the hierarchical event log to obtain a reconstructed hierarchical event log;
the cross-organization multi-instance sub-process model mining module is used for mining the sub-process model of each level from the reconstructed hierarchical event log obtained by the hierarchical event log construction module, finally obtaining the cross-organization multi-instance sub-process model.
Further, the data acquisition module performs the following operations:
the data acquisition module acquires a lifecycle event log with multi-instance sub-process information; the multi-instance sub-process means that, in a business scenario with a parent process and a sub-process, a calling relation exists between activities of the parent process and the sub-process, and the sub-process can be instantiated multiple times during a call, i.e., multiple instances run in parallel; the lifecycle event log with multi-instance sub-process information is a multiset of cases, where a case is a finite sequence of activities that contains calling relations between activities and interleaved behavior of multi-instance sub-processes, and each activity in the lifecycle event log has both start and end lifecycle information.
Further, the activity nesting relation construction module performs the following operations:
according to the lifecycle event log obtained by the data acquisition module, obtaining the event information required for mining the activity nesting relations; the nesting relation is defined as follows: in a trace σ of the lifecycle event log, activity a nests activity b if the following conditions are satisfied:
(1) the i-th position in trace σ is activity a, and its lifecycle state is start; (2) the j-th position in trace σ is activity b, and its lifecycle state is start; (3) the k-th position in trace σ is activity b, and its lifecycle state is end; (4) the l-th position in trace σ is activity a, and its lifecycle state is end; where i, j, k, l satisfy i < j < k < l;
then, according to this event information, mining all activity nesting relations in the lifecycle event log, constructing the nested activity set, and representing all activity nesting relations with a nested relation tree;
the nested activity set is a set of activities satisfying: for any activity a in the nested activity set, there exists an activity b in the activity set A such that activity a nests activity b; where the activity set A is the activity set of the lifecycle event log L, i.e., the set of all distinct activities;
The nested relation tree is a triple $HTree(L) = (rootAct, HNode(rootAct), \eta)$, where:
① $rootAct \subseteq A$ is the root node set, where $A$ is the activity set of the lifecycle event log $L$, i.e., the set of all distinct activities; the root node set satisfies: i. no nesting relation exists between activities in the root node set; ii. no activity in the root node set is nested by any other activity;
② $HNode(rootAct) = \{Act_a \mid a \in NA \cap rootAct\}$ is the child node set of the root node set rootAct, where $NA$ is the nested activity set of the lifecycle event log $L$; $Act_a$ is the child node set of activity $a$, and every activity $b$ in $Act_a$ satisfies that activity $a$ nests activity $b$;
③ $\eta$ is the mapping from the nested activity set to child node sets, i.e., $\eta(a) = Act_a$, satisfying: i. for any activity $a$ in the nested activity set $NA$, there is a child node set $Act_a$ such that activity $a$ nests every activity $b$ in $Act_a$; ii. no nesting relation exists between the activities within a child node set; iii. the mapping from nested activities to child node sets forms a tree structure.
Further, the hierarchical event log construction module performs the following operations:
obtaining the root node information and the nesting relation information according to the activity nesting relations and the nested relation tree obtained by the activity nesting relation construction module; then constructing the root node set of the lifecycle event log; then constructing the sub-process event log corresponding to each root node in the root node set, and removing the activities corresponding to these root nodes from the nested activity set; repeating the same operations on the sub-process event logs until the nested activity set is empty, thereby constructing the hierarchical event log;
The hierarchical event log is defined as $HL(L) = (rootLog, HL(rootLog))$, where:
① rootLog is the root node log, where ζ is the trace set of the lifecycle event log L, σ is any trace in the lifecycle event log L, σ(i) denotes the activity at the i-th position of the trace, and rootAct is the root node set of the lifecycle event log; every trace σ in the root node log satisfies: i. all activities in trace σ belong to the root node set of the nested relation tree; ii. if the i-th position of the trace is activity a, the j-th position is activity b, and i < j, then the end time of activity a is earlier than the end time of activity b;
② $HL(rootLog) = \{(a, NLog_a) \mid a \in NA \cap rootAct\}$ is the set of sub-logs of the root node log, where NA is the nested activity set of the lifecycle event log, a is an activity in the nested activity set, and rootAct is the root node set of the nested relation tree, satisfying:
if no activity in the root node set nests another activity, the sub-log is empty; otherwise, any activity b in a trace σ of the sub-log satisfies: i. there is an activity c in the root node set that nests activity b; ii. if the i-th position in trace σ of the sub-log is activity b, the j-th position is activity d, and i < j, then the end time of activity b is earlier than the end time of activity d;
Then, multi-instance identification and reconstruction are performed on the sub-logs of each level in the hierarchical event log to obtain a reconstructed hierarchical event log; the multi-instance identification and reconstruction means splitting each interleaved trace with multi-instance behavior in a sub-log into several traces by assigning the best case to each activity, thereby obtaining a new sub-log; finally, the sub-logs in the hierarchical event log are replaced by the new sub-logs.
Further, the cross-organization multi-instance sub-process model mining module performs the following operations:
mining the sub-process event logs of the reconstructed hierarchical event log obtained by the hierarchical event log construction module with a traditional process mining method to obtain the sub-process models, and finally obtaining the cross-organization multi-instance sub-process model;
the cross-organization multi-instance sub-process model is a hierarchical multi-instance Petri net; the hierarchical multi-instance Petri net is $HPMN = (Q, PN_{N0}, NA, map)$, where: i. $Q$ is the set of sub-models $PN_t$ of all nested transitions $t$, and NA is the nested activity set; ii. $PN_{N0}$ is the top-level Petri net with nested transitions; iii. NA is the set of all nested transitions; iv. $map: NA \to \{1, +\} \times (Q \setminus \{PN_{N0}\})$, i.e., if $map(a) = (1, PN_a)$, activity a calls its sub-process once; otherwise $map(a) = (+, PN_a)$, and activity a calls multiple sub-process instances;
the nested transitions are nested activities that may trigger multi-instance sub-process behavior;
the Petri net with nested transitions is a pair $PN_N = (PN, \beta)$ satisfying: (1) PN is a labeled Petri net; (2) β is the nested transition function, which satisfies: for any t in the transition set T, β(t) = g means that t is a normal transition, and β(t) = n means that t is a nested transition;
the labeled Petri net is a four-tuple $PN = (P, T, F, l)$, where P is the place set, T is the transition set, $F \subseteq (P \times T) \cup (T \times P)$ is the set of directed arcs representing the flow relation, and $l: T \to \Gamma$ is the transition labeling function from the transition set T to the label set Γ, indicating that each transition has a corresponding label.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The event log used by the invention is a lifecycle event log, which, combined with existing process mining methods, is better suited to mining a cross-organization multi-instance sub-process model.
2. The invention defines a task nesting relation for the lifecycle event log, overcoming the limitation that traditional process mining techniques cannot mine task relations in a cross-organization multi-instance sub-process scenario.
3. The invention converts the lifecycle event log into a hierarchical event log in which the sub-process event log of each level records the events of a parent process calling a sub-process, and is therefore better suited to expressing the hierarchical calls between parent processes and sub-processes in a cross-organization scenario.
4. Multi-instance identification and reconstruction are performed on the sub-logs so that no single instance contains interleaved traces; the sub-process model of each level can then be mined with existing mining methods, which improves the quality of the mined model.
5. The method has broad application prospects in cross-organization multi-instance sub-process model mining.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Example 1
As shown in Fig. 1, this embodiment discloses a cross-organization multi-instance sub-process model mining method. The method mines the nesting relations between tasks from the acquired lifecycle event log to obtain a nested relation tree, constructs a hierarchical event log by analyzing the nesting relations between tasks, traverses the sub-process event log of each level and performs multi-instance identification and reconstruction on the sub-logs, and finally mines the cross-organization multi-instance sub-process model from the hierarchical event log. It includes the following steps:
1) Acquiring basic data, including a lifecycle event log of a cross-organization multi-instance sub-process; the lifecycle event log of the cross-organization multi-instance sub-process is a lifecycle event log with multi-instance sub-process information; the cross-organization multi-instance sub-process means that an enterprise outsources part of its internal business to other enterprises, thereby improving its working efficiency and reducing its operating cost; the multi-instance sub-process means that, in a business scenario with a parent process and a sub-process, a calling relation exists between activities of the parent process and the sub-process, and the sub-process can be instantiated multiple times during a call, i.e., multiple instances run in parallel; the lifecycle event log with multi-instance sub-process information is a multiset of cases, where a case is a finite sequence of activities that contains calling relations between activities and interleaved behavior of multi-instance sub-processes, and each activity in the lifecycle event log has both start and end lifecycle information.
2) Mining the nesting relations between the activities of the lifecycle event log and representing them by a nested relation tree. The nesting relation is defined as follows: in a trace σ of the lifecycle event log, activity a nests activity b if the following conditions are satisfied:
(1) the i-th position of trace σ is activity a, and its lifecycle state is start; (2) the j-th position of trace σ is activity b, and its lifecycle state is start; (3) the k-th position of trace σ is activity b, and its lifecycle state is end; (4) the l-th position of trace σ is activity a, and its lifecycle state is end; where i, j, k and l satisfy i < j < k < l.
Taking the trace σ = <USRs, POs, POc, TEPs, TEPc, USRc> as an example, the trace contains three activities USR, PO and TEP, each with a start event and an end event; for example, the start and end events of USR are USRs and USRc, respectively. The nesting relations in trace σ are that activity USR nests activity PO and activity USR nests activity TEP; the positions of USR and PO in the trace are shown in FIG. 2.
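By way of non-limiting illustration, the nesting condition can be sketched in Python as follows. The sketch assumes that each event is written as the activity name followed by `s` (start) or `c` (end), as in the example trace; the function names `parse` and `nests` are hypothetical and not part of the method. Because positions i < j < k < l exist exactly when the earliest possible choice at each position succeeds, one left-to-right scan is sufficient.

```python
def parse(event):
    """Split a label such as 'USRs' into (activity, lifecycle state)."""
    return event[:-1], event[-1]  # assumed convention: 's' = start, 'c' = end


def nests(trace, a, b):
    """Return True if activity a nests activity b in the trace (i < j < k < l)."""
    events = [parse(e) for e in trace]

    def first(target, after):
        # Index of the first occurrence of `target` strictly after `after`.
        for pos in range(after + 1, len(events)):
            if events[pos] == target:
                return pos
        return None

    pos = -1
    # Look for: start of a, start of b, end of b, end of a, in this order.
    for target in ((a, "s"), (b, "s"), (b, "c"), (a, "c")):
        pos = first(target, pos)
        if pos is None:
            return False
    return True


sigma = ["USRs", "POs", "POc", "TEPs", "TEPc", "USRc"]
print(nests(sigma, "USR", "PO"), nests(sigma, "USR", "TEP"), nests(sigma, "PO", "TEP"))
# prints: True True False
```
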
The specific process of mining the nesting relations between activities is as follows:
2.1) Taking the lifecycle event log as input, obtain the event information required for mining the nesting relations between activities;
2.2) Mining the nesting relations between activities: according to the definition of the nesting relation, mine all nesting relations between activities in the lifecycle event log and construct the nested activity set; all nesting relations between activities are represented by a nested relation tree.
The nested activity set mentioned above is a set of activities satisfying: for any activity a in the nested activity set, there exists an activity b in the activity set A such that a nests b; the activity set A is the activity set of the lifecycle event log L, i.e. the set of all distinct activities;
the nested relation tree is a triple HTree(L) = (rootAct, HNode(rootAct), η), where:
① rootAct ⊆ A is the root node set, where A is the activity set of the lifecycle event log L, i.e. the set of all distinct activities; the root node set satisfies: i. no nesting relation exists between the activities of the root node set; ii. no activity in the root node set is nested by another activity;
② HNode(rootAct) = {Act_a | a ∈ NA ∩ rootAct} is the child node set of the root node set rootAct, where NA is the nested activity set of the lifecycle event log L; Act_a is the child node set of activity a, and every activity b in Act_a satisfies that a nests b;
③ η is a mapping from the nested activity set to child node sets, i.e. η(a) = Act_a, satisfying: i. for any activity a in the nested activity set NA there exists a child node set Act_a such that a nests every activity b in Act_a; ii. no nesting relation exists between the activities within a child node set; iii. the mapping from nested activities to child node sets forms a tree structure.
Taking the lifecycle event log L = [<USRs,JPSs,JPSc,POs,POc,TEPs,TEPc,USRc,MROs,MROc,TPLs,EOIs,EOIs,EOIc,EOIc,EFRs,EFRs,EFRc,PEBs,EFRc,PEBs,PEBc,CLLs,PEBc,CLLc,CLLs,CLLc,CQIs,ATSs,COIc,ATSs,ALs,NTCs,ALc,NTCc,NTCs,NTCc,TPLc,UCRs,UCRc>^100, <USRs,JPSs,JPSc,POs,POc,TEPs,TEPc,USRc,MROs,MROc,TPLs,EOIs,EOIc,EOIs,EFRs,EFRc,EOIc,PEBs,EFRs,PEBc,CLLs,EFRc,CLLc,CQIs,COIc,PEBs,ALs,ALc,NTCs,NTCc,PEBc,CLLs,CLLc,ATSs,ATSc,NTCs,NTCc,TPLc,UCRs,UCRc>^100] as an example, applying the above steps yields the nested activity set {USR, TPL}; the nested relation tree is shown in FIG. 3.
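By way of non-limiting illustration, the construction of the root node set, the nested activity set NA and the mapping η can be sketched in Python as follows. The sketch makes several assumptions that the description does not state: a nesting relation is recorded as soon as it holds in at least one trace, events are named as the activity followed by `s` or `c`, and the small two-level log used here is hypothetical and deliberately free of interleaved instances. The names `nests` and `nested_relation_tree` are illustrative only.

```python
from itertools import permutations


def nests(trace, a, b):
    """True if a starts, then b starts, then b ends, then a ends (i < j < k < l)."""
    pos = -1
    for target in (a + "s", b + "s", b + "c", a + "c"):
        try:
            pos = trace.index(target, pos + 1)
        except ValueError:
            return False
    return True


def nested_relation_tree(log):
    """Return (root_act, nested_acts, eta) for a log given as a list of traces."""
    activities = {event[:-1] for trace in log for event in trace}
    # Assumption: (a, b) is kept if a nests b in at least one trace.
    relation = {(a, b) for a, b in permutations(activities, 2)
                if any(nests(trace, a, b) for trace in log)}
    nested_acts = {a for a, _ in relation}            # NA: activities that nest others
    root_act = activities - {b for _, b in relation}  # activities nested by nobody
    eta = {}
    for a in nested_acts:
        below = {b for x, b in relation if x == a}
        # Keep direct children only: drop b if another descendant of a nests b.
        eta[a] = {b for b in below if not any((c, b) in relation for c in below)}
    return root_act, nested_acts, eta


# Hypothetical log: A nests B, B nests C, D stands alone.
log = [["As", "Bs", "Cs", "Cc", "Bc", "Ac", "Ds", "Dc"]]
root_act, nested_acts, eta = nested_relation_tree(log)
print(sorted(root_act), sorted(nested_acts))
```

Restricting η(a) to direct children is what makes the mapping a tree, in line with condition iii above.
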
3) Constructing a hierarchical event log from the nested relation tree obtained by mining; the nested relation tree, the nesting relation and the nested activity set are as defined in step 2);
the hierarchical event log is defined as HL(L) = (rootLog, HL(rootLog)), where:
① rootLog is the root node log, where ζ is the trace set of the lifecycle event log L, σ is any trace in the lifecycle event log L, σ(i) denotes the activity at the i-th position of the trace, and rootAct is the root node set of the lifecycle event log; every trace in the root node log satisfies: i. all activities in trace σ belong to the root node set of the nested relation tree; ii. if the i-th position of the trace is activity a, the j-th position is activity b and i < j, then the end time of activity a is earlier than the end time of activity b;
② HL(rootLog) = {(a, NLog_a) | a ∈ NA ∩ rootAct} is the set of sub-logs of the root node log, where NA is the nested activity set of the lifecycle event log, a is an activity in the nested activity set, and rootAct is the root node set of the nested relation tree; it satisfies the following:
if no activity in the root node set is a nested activity, i.e. none of them nests another activity, the sub-log is empty; otherwise, every activity b in a trace σ of the sub-log satisfies: i. there exists an activity c in the root node set such that c nests b; ii. if the i-th position in trace σ of the sub-log is activity b, the j-th position is activity d and i < j, then the end time of activity b is earlier than the end time of activity d;
the specific process of constructing the hierarchical event log is as follows:
3.1) Taking the lifecycle event log and the nested relation tree as input, obtain the root node information and the nesting relation information;
3.2) Constructing the root node set of the lifecycle event log;
3.3) Constructing the sub-process event log corresponding to each root node in the root node set of step 3.2), and removing the activities corresponding to those root nodes from the nested activity set;
3.4) Iterating steps 3.2) and 3.3) until the nested activity set is empty, which yields the hierarchical event log.
Taking the lifecycle event log L = [<USRs,JPSs,JPSc,POs,POc,TEPs,TEPc,USRc,MROs,MROc,TPLs,EOIs,EOIs,EOIc,EOIc,EFRs,EFRs,EFRc,PEBs,EFRc,PEBs,PEBc,CLLs,PEBc,CLLc,CLLs,CLLc,CQIs,ATSs,COIc,ATSs,ALs,NTCs,ALc,NTCc,NTCs,NTCc,TPLc,UCRs,UCRc>^100, <USRs,JPSs,JPSc,POs,POc,TEPs,TEPc,USRc,MROs,MROc,TPLs,EOIs,EOIc,EOIs,EFRs,EFRc,EOIc,PEBs,EFRs,PEBc,CLLs,EFRc,CLLc,CQIs,COIc,PEBs,ALs,ALc,NTCs,NTCc,PEBc,CLLs,CLLc,ATSs,ATSc,NTCs,NTCc,TPLc,UCRs,UCRc>^100] as an example, applying the above steps yields the root log rootLog = [<USRs,USRc,MROs,MROc,TPLs,TPLc,UCRs,UCRc>^200] and the nested activity set NA(rootLog) = {USR, TPL}. The sub-process event log corresponding to the nested activity USR is NLog_USR = [<JPSs,JPSc,POs,POc,TEPs,TEPc>^200], and the sub-process event log corresponding to the nested activity TPL is NLog_TPL = [<EOIs,EOIs,EOIc,EOIc,EFRs,EFRs,EFRc,PEBs,EFRc,PEBs,PEBc,CLLs,PEBc,CLLc,CLLs,CLLc,CQIs,ATSs,COIc,ATSs,ALs,NTCs,ALc,NTCc,NTCs,NTCc>^100, <EOIs,EOIc,EOIs,EFRs,EFRc,EOIc,PEBs,EFRs,PEBc,CLLs,EFRc,CLLc,CQIs,COIc,PEBs,ALs,ALc,NTCs,NTCc,PEBc,CLLs,CLLc,ATSs,ATSc,NTCs,NTCc>^100].
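By way of non-limiting illustration, one level of the split into a root trace and sub-traces can be sketched in Python as follows. The sketch assumes that the root node set and the nested activity set are already known, that root activities do not overlap and occur at most once per trace, and that events are named as the activity followed by `s` or `c`. The function name `split_by_roots` is hypothetical, and the trace used is a shortened version of the example trace in which the TPL sub-process is reduced to a single activity.

```python
def split_by_roots(trace, root_act, nested_acts):
    """Project a trace onto the root level and cut out one sub-trace per nested root activity."""
    root_trace, sub_traces, open_parent = [], {}, None
    for event in trace:
        activity = event[:-1]
        if activity in root_act:
            root_trace.append(event)
            if activity in nested_acts:
                # The start event opens the sub-trace, the end event closes it.
                open_parent = activity if event.endswith("s") else None
                sub_traces.setdefault(activity, [])
        elif open_parent is not None:
            # Events between the start and end of a nested root activity belong to it.
            sub_traces[open_parent].append(event)
    return root_trace, sub_traces


trace = ["USRs", "JPSs", "JPSc", "POs", "POc", "TEPs", "TEPc", "USRc",
         "MROs", "MROc", "TPLs", "EOIs", "EOIc", "TPLc", "UCRs", "UCRc"]
root_trace, sub_traces = split_by_roots(
    trace, root_act={"USR", "MRO", "TPL", "UCR"}, nested_acts={"USR", "TPL"})
print(root_trace)
print(sub_traces["USR"])
```

For deeper hierarchies, the same function would be applied again to each sub-trace with the root node set of that level, which corresponds to the iteration in step 3.4).
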
4) Performing multi-instance identification and reconstruction on the sub-logs of the hierarchical event log; the hierarchical event log HL(L) = (rootLog, HL(rootLog)) is as defined in step 3);
multi-instance identification and reconstruction means splitting each interleaved trace that exhibits multi-instance behavior in a sub-log into several traces by assigning the best case to each activity, which yields a new sub-log; the new sub-log then replaces the old one in the hierarchical event log, giving the reconstructed hierarchical event log;
the specific process of performing multi-instance identification and reconstruction on the sub-logs of the hierarchical event log is as follows:
4.1) Taking the hierarchical event log as input, obtain the sub-logs of every level and construct the sub-log set;
4.2) Performing multi-instance identification on each sub-log in the sub-log set, as follows:
4.2.1) Taking the sub-log NLog_a as input, initialize the activity frequency matrix M and the sub-log sequence SLog_a. The rows and columns of M consist of all distinct activities in the sub-log plus a start activity and an end activity; for any two activities a and b, the entry M(a, b) is the frequency of the directly-follows relation from a to b. The start activity and the end activity do not represent any actual activity; they only indicate that a trace starts or ends at a certain activity. No transition points to the start activity, and no transition leaves the end activity;
the frequency of the directly-follows relation between activity a and activity b, denoted P(a > b), is defined as:
P(a > b) = |a > b| / Σ_c |a > c|
where |a > b| is the number of times activity a is directly followed by activity b, |a > c| is the number of times activity a is directly followed by activity c, and the sum runs over all activities c;
the count |a > b| is the number of times that activity a is directly followed by activity b in all traces of the sub-log NLog_a;
the directly-follows relation mentioned above is defined as follows: NLog_a is the sub-log of activity a; in a trace σ of NLog_a, activity b is directly followed by activity c if: (1) the i-th position is activity b; (2) the (i+1)-th position is activity c;
the sub-log sequence SLog_a, in whose definition #trans(σ(i)) denotes the lifecycle state of the activity at the i-th position of trace σ, satisfies: i. if the i-th position in the sub-log sequence is activity a, the j-th position is activity b and i < j, then the end time of activity a is earlier than the end time of activity b; ii. the lifecycle state of every activity in the sub-log sequence is complete;
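By way of non-limiting illustration, the activity frequency matrix can be sketched in Python as follows. The sketch assumes that M stores the count |a > b| normalized over all successors of a, that each trace is given as a list of activity names (for example its end events), and that the artificial start and end activities are written `S_e` and `C_e`. The function names are hypothetical.

```python
from collections import Counter

START, END = "S_e", "C_e"  # artificial start and end activities


def directly_follows_counts(log):
    """Count |a > b| over all traces, padding each trace with START and END."""
    counts = Counter()
    for trace in log:
        padded = [START, *trace, END]
        counts.update(zip(padded, padded[1:]))
    return counts


def frequency_matrix(log):
    """Return M as a dict with M[(a, b)] = |a > b| / sum over c of |a > c|."""
    counts = directly_follows_counts(log)
    row_totals = Counter()
    for (a, _), n in counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}


log = [["a", "b"], ["a", "c"], ["a", "b"]]
M = frequency_matrix(log)
print(M[("a", "b")], M[(START, "a")])
```

Pairs that never occur are simply absent from the dictionary and can be read as zero, and no pair ever has `S_e` as its second element or `C_e` as its first, matching the remark that nothing points to the start activity or leaves the end activity.
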
4.2.2) Assigning the best case to each activity in the sub-log sequence according to the activity frequency matrix M, as follows:
4.2.2.1) Taking the sub-log sequence SLog_a as input, let the current activity q point to the first activity of SLog_a;
4.2.2.2) Assigning the best case to the current activity according to the activity frequency matrix M; there are two situations:
first, if the enabled activity set is empty, or every activity b in the enabled activity set satisfies M(b, q) < M(S_e, q), a new case σ is started at the current activity q, and q becomes an enabled activity and is added to the enabled activity set; here S_e is the start activity; M(b, q) is the entry of the activity frequency matrix for activities b and q, i.e. the frequency with which b is directly followed by q; M(S_e, q) is the entry for S_e and q, i.e. the frequency with which S_e is directly followed by q; the enabled activity set is the set of activities in the enabled state;
second, if there is an enabled activity b in the enabled activity set whose value M(b, q) is the largest and M(b, q) > M(S_e, q), i.e. b is the enabled activity most frequently directly followed by q, then: i. if M(q, c) < M(q, C_e) holds for every activity c in the activity set, the current activity q is assigned to the case of the enabled activity b, the case is ended at q, and b is removed from the enabled activity set; here C_e is the end activity; M(q, c) is the entry for q and c, i.e. the frequency with which q is directly followed by c; M(q, C_e) is the entry for q and C_e, i.e. the frequency with which q is directly followed by C_e; ii. otherwise, the current activity q is assigned to the case of the enabled activity b, q becomes an enabled activity and is added to the enabled activity set, and b is removed from the enabled activity set; that is, no new case is started at q and no running case is ended; if neither situation applies, any enabled activity b is selected from the enabled activity set, the current activity q is assigned to the case of b, b is removed from the enabled activity set, and q becomes an enabled activity and is added to the enabled activity set;
the current activity q is then moved to the next activity of the sub-log sequence SLog_a;
4.2.2.3) Iterating step 4.2.2.2) until the current activity q points to null, and outputting the new sub-log;
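By way of non-limiting illustration, the case assignment of step 4.2.2) can be sketched in Python as follows. The sketch is one possible reading of the rules above: the matrix M is assumed to be a dictionary keyed by activity pairs with missing pairs read as zero, the enabled activity of an open case is taken to be its last activity, and when several open cases tie for the largest value the first one is chosen. The name `assign_cases` and the toy matrix are hypothetical.

```python
START, END = "S_e", "C_e"  # artificial start and end activities


def assign_cases(sequence, M, activities):
    """Greedily split an interleaved activity sequence into cases using matrix M."""

    def m(x, y):
        return M.get((x, y), 0.0)

    cases = []    # every case built so far, in creation order
    enabled = []  # indices of open cases; the last activity of each one is enabled
    for q in sequence:
        best = max(enabled, key=lambda i: m(cases[i][-1], q), default=None)
        if best is None or m(cases[best][-1], q) < m(START, q):
            # First situation: starting a new case beats continuing any open case.
            cases.append([q])
            enabled.append(len(cases) - 1)
            continue
        b = cases[best][-1]
        cases[best].append(q)  # q joins the case of b and replaces b as enabled activity
        ends_here = all(m(q, c) < m(q, END) for c in activities)
        if m(b, q) > m(START, q) and ends_here:
            enabled.remove(best)  # second situation, item i: the case ends at q
    return cases


# Toy matrix: every case is the sequence a, b.
M = {(START, "a"): 1.0, ("a", "b"): 1.0, ("b", END): 1.0}
print(assign_cases(["a", "a", "b", "b"], M, activities={"a", "b"}))
# prints: [['a', 'b'], ['a', 'b']]
```

Storing case indices instead of activity names lets two open cases wait at the same activity, which is the typical situation when several instances of one sub-process run in parallel.
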
4.2.3) Computing the frequencies of the directly-follows relation between activities from the new sub-log, and updating the activity frequency matrix M;
4.2.4) Iterating steps 4.2.2) and 4.2.3) until updating the activity frequency matrix M no longer changes any value, and outputting the reconstructed sub-log;
4.3) Repeating step 4.2) until the sub-log set is empty, and outputting the reconstructed hierarchical event log.
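By way of non-limiting illustration, the alternation of steps 4.2.2) to 4.2.4) can be sketched in Python as a fixed-point loop. The case assignment and the matrix update are passed in as functions so that the sketch stays independent of how they are implemented; the names `reconstruct`, `assign` and `recount` are hypothetical, and the bound `max_rounds` is an added safeguard that the description does not mention.

```python
def reconstruct(sequence, M, assign, recount, max_rounds=100):
    """Alternate case assignment and matrix update until M stops changing."""
    new_log = []
    for _ in range(max_rounds):
        new_log = assign(sequence, M)  # step 4.2.2): split the sequence into cases
        new_M = recount(new_log)       # step 4.2.3): recompute the frequencies
        if new_M == M:                 # step 4.2.4): no value changed, stop
            break
        M = new_M
    return new_log


# Toy stand-ins, only to show the control flow.
def toy_assign(sequence, M):
    return [list(sequence)]


def toy_recount(log):
    return {"cases": len(log)}


print(reconstruct("ab", {}, toy_assign, toy_recount))
# prints: [['a', 'b']]
```
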
5) Mining the cross-organization multi-instance sub-process model from the reconstructed hierarchical event log. The cross-organization multi-instance sub-process model is a hierarchical multi-instance Petri net, defined as HPMN = (Q, PN_N0, NA, map), where: i. Q is the set of models and contains the sub-model PN_ti corresponding to every nested transition t_i ∈ NA, where NA is the nested activity set; ii. PN_N0 is the top-level Petri net with nested transitions; iii. NA is the set of all nested transitions; iv. map: NA → {1, +} × (Q \ {PN_N0}); if map(t_i) = (1, PN_ti), the nested transition t_i calls its sub-process once; otherwise map(t_i) = (+, PN_ti), i.e. the nested transition t_i calls several instances of its sub-process;
the nested transitions mentioned above are activities that may trigger multi-instance sub-process behavior;
the Petri net with nested transitions mentioned above is a pair PN_N = (PN, β) satisfying: (1) PN is a labeled Petri net; (2) β is the nested transition function, which satisfies: for any transition t in the transition set T, β(t) = g means that t is an ordinary transition, and β(t) = n means that t is a nested transition;
the labeled Petri net mentioned above is a four-tuple PN = (P, T, F, l), where P is the place set, T is the transition set, F ⊆ (P × T) ∪ (T × P) is the set of directed arcs representing the flow relation, and l: T → Γ is the transition labeling function that maps the transition set T to the label set Γ, i.e. every transition has a corresponding label;
The specific process of mining the hierarchical multi-instance Petri net model from the reconstructed hierarchical event log is as follows: taking the reconstructed hierarchical event log as input, each sub-process event log is mined with a conventional process mining method, namely the Inductive Miner, to obtain the sub-process models, which finally yields the cross-organization multi-instance sub-process model.
Taking the cross-organization multi-instance sub-process model of FIG. 4 as an example, the top-level process model contains two nested transitions, USR and TPL; the nested transition USR corresponds to the sub-process model PN_USR, which contains three ordinary transitions, and the nested transition TPL corresponds to the sub-process model PN_TPL, which contains eight ordinary transitions.
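By way of non-limiting illustration, the tuples defined in step 5) can be sketched as Python data structures. All class and function names are hypothetical. The nets built below are simple sequences and do not reproduce FIG. 4: the sub-model of TPL is truncated to two placeholder transitions, and the multiplicities "1" and "+" are assigned only for illustration. The nested transition function β is represented by the set of transitions it marks as nested.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledPetriNet:
    places: set
    transitions: set
    arcs: set     # F, a subset of (P x T) | (T x P)
    labels: dict  # l: T -> label

    def __post_init__(self):
        for src, dst in self.arcs:
            place_to_transition = src in self.places and dst in self.transitions
            transition_to_place = src in self.transitions and dst in self.places
            if not (place_to_transition or transition_to_place):
                raise ValueError(f"arc {(src, dst)} must connect a place and a transition")


@dataclass
class NestedPetriNet:
    net: LabeledPetriNet
    nested: set = field(default_factory=set)  # transitions marked as nested by beta


@dataclass
class HPMN:
    top: NestedPetriNet  # PN_N0
    submodels: dict      # nested transition -> its sub-model
    multiplicity: dict   # nested transition -> "1" or "+"

    def map(self, t):
        """Return (multiplicity, sub-model) for the nested transition t."""
        return self.multiplicity[t], self.submodels[t]


def sequence_net(names):
    """Hypothetical helper: a net that fires the given transitions one after another."""
    places = {f"p{i}" for i in range(len(names) + 1)}
    arcs = set()
    for i, t in enumerate(names):
        arcs |= {(f"p{i}", t), (t, f"p{i + 1}")}
    return LabeledPetriNet(places, set(names), arcs, {t: t for t in names})


top = NestedPetriNet(sequence_net(["USR", "MRO", "TPL", "UCR"]), nested={"USR", "TPL"})
model = HPMN(
    top,
    submodels={"USR": NestedPetriNet(sequence_net(["JPS", "PO", "TEP"])),
               "TPL": NestedPetriNet(sequence_net(["EOI", "EFR"]))},  # truncated placeholder
    multiplicity={"USR": "1", "TPL": "+"},  # illustrative values
)
print(model.map("TPL")[0])
```
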
Example 2
This embodiment discloses a cross-organization multi-instance sub-process model mining system. The system architecture is shown in FIG. 5 and comprises a data acquisition module, an activity nesting relation construction module, a hierarchical event log construction module and a cross-organization multi-instance sub-process model mining module;
the data acquisition module is used for acquiring a lifecycle event log that carries multi-instance sub-process information and in which every activity has both start and end lifecycle information;
the activity nesting relation construction module is used for mining the nesting relations between activities from the lifecycle event log to obtain the nested activity set, and then constructing the nested relation tree from the nesting relations between activities and the nested activity set;
the hierarchical event log construction module is used for constructing the event log acquired by the data acquisition module into a hierarchical event log according to the nesting relations obtained by the activity nesting relation construction module, and then performing multi-instance identification and reconstruction on the sub-logs of every level of the hierarchical event log to obtain the reconstructed hierarchical event log;
the cross-organization multi-instance sub-process model mining module is used for mining the sub-process model of every level from the reconstructed hierarchical event log obtained by the hierarchical event log construction module, and finally obtaining the cross-organization multi-instance sub-process model.
The data acquisition module performs the following operations:
the data acquisition module acquires a lifecycle event log carrying multi-instance sub-process information. A multi-instance sub-process means that, in a business scenario with a parent process and a sub-process, a calling relation exists between activities of the parent process and the sub-process, and the sub-process can be instantiated several times while being called, i.e. several instances run in parallel. The lifecycle event log with multi-instance sub-process information is a multiset of cases; a case is a finite sequence of activities that contains calling relations between activities and interleaved multi-instance sub-process behavior, and every activity in the lifecycle event log carries both start and end lifecycle information.
The activity nesting relation construction module performs the following operations:
from the lifecycle event log obtained by the data acquisition module, the event information required for mining the nesting relations between activities is obtained; the nesting relation means that, in a trace σ of the lifecycle event log, activity a nests activity b if the following conditions are satisfied:
(1) the i-th position of trace σ is activity a, and its lifecycle state is start; (2) the j-th position of trace σ is activity b, and its lifecycle state is start; (3) the k-th position of trace σ is activity b, and its lifecycle state is end; (4) the l-th position of trace σ is activity a, and its lifecycle state is end; where i, j, k and l satisfy i < j < k < l;
then, using this event information, all nesting relations between activities in the lifecycle event log are mined, the nested activity set is constructed, and all nesting relations between activities are represented by a nested relation tree;
the nested activity set is a set of activities satisfying: for any activity a in the nested activity set, there exists an activity b in the activity set A such that a nests b; the activity set A is the activity set of the lifecycle event log L, i.e. the set of all distinct activities;
the nested relation tree is a triple HTree(L) = (rootAct, HNode(rootAct), η), where:
① rootAct ⊆ A is the root node set, where A is the activity set of the lifecycle event log L, i.e. the set of all distinct activities; the root node set satisfies: i. no nesting relation exists between the activities of the root node set; ii. no activity in the root node set is nested by another activity;
② HNode(rootAct) = {Act_a | a ∈ NA ∩ rootAct} is the child node set of the root node set rootAct, where NA is the nested activity set of the lifecycle event log L; Act_a is the child node set of activity a, and every activity b in Act_a satisfies that a nests b;
③ η is a mapping from the nested activity set to child node sets, i.e. η(a) = Act_a, satisfying: i. for any activity a in the nested activity set NA there exists a child node set Act_a such that a nests every activity b in Act_a; ii. no nesting relation exists between the activities within a child node set; iii. the mapping from nested activities to child node sets forms a tree structure.
The hierarchical event log construction module performs the following operations:
the root node information and the nesting relation information are obtained from the nesting relations and the nested relation tree produced by the activity nesting relation construction module; the root node set of the lifecycle event log is then constructed; next, the sub-process event log corresponding to each root node in the root node set is constructed, and the activities corresponding to those root nodes are removed from the nested activity set; the same operations are applied to the sub-process event logs until the nested activity set is empty, which yields the hierarchical event log;
the hierarchical event log is defined as HL(L) = (rootLog, HL(rootLog)), where:
① rootLog is the root node log, where ζ is the trace set of the lifecycle event log L, σ is any trace in the lifecycle event log L, σ(i) denotes the activity at the i-th position of the trace, and rootAct is the root node set of the lifecycle event log; every trace in the root node log satisfies: i. all activities in trace σ belong to the root node set of the nested relation tree; ii. if the i-th position of the trace is activity a, the j-th position is activity b and i < j, then the end time of activity a is earlier than the end time of activity b;
② HL(rootLog) = {(a, NLog_a) | a ∈ NA ∩ rootAct} is the set of sub-logs of the root node log, where NA is the nested activity set of the lifecycle event log, a is an activity in the nested activity set, and rootAct is the root node set of the nested relation tree; it satisfies the following:
if no activity in the root node set is a nested activity, i.e. none of them nests another activity, the sub-log is empty; otherwise, every activity b in a trace σ of the sub-log satisfies: i. there exists an activity c in the root node set such that c nests b; ii. if the i-th position in trace σ of the sub-log is activity b, the j-th position is activity d and i < j, then the end time of activity b is earlier than the end time of activity d;
then, multi-instance identification and reconstruction is performed on the sub-logs of every level of the hierarchical event log to obtain the reconstructed hierarchical event log; multi-instance identification and reconstruction means splitting each interleaved trace that exhibits multi-instance behavior in a sub-log into several traces by assigning the best case to each activity, which yields a new sub-log; finally, the sub-logs in the hierarchical event log are replaced by the new sub-logs;
the cross-organization multi-instance sub-process model mining module performs the following operations:
from the reconstructed hierarchical event log obtained by the hierarchical event log construction module, the sub-process event logs are mined with a conventional process mining method to obtain the sub-process models, which finally yields the cross-organization multi-instance sub-process model;
the cross-organization multi-instance sub-process model is a hierarchical multi-instance Petri net; the hierarchical multi-instance Petri net is HPMN = (Q, PN_N0, NA, map), where: i. Q is the set of models and contains the sub-model PN_t corresponding to every nested transition t ∈ NA, where NA is the nested activity set; ii. PN_N0 is the top-level Petri net with nested transitions; iii. NA is the set of all nested transitions; iv. map: NA → {1, +} × (Q \ {PN_N0}); if map(a) = (1, PN_a), activity a calls its sub-process once; otherwise map(a) = (+, PN_a), i.e. activity a calls several instances of its sub-process;
the nested transitions are nested activities that may trigger multi-instance sub-process behavior;
the Petri net with nested transitions is a pair PN_N = (PN, β) satisfying: (1) PN is a labeled Petri net; (2) β is the nested transition function, which satisfies: for any transition t in the transition set T, β(t) = a means that t is an ordinary transition, and β(t) = n means that t is a nested transition;
the labeled Petri net is a four-tuple PN = (P, T, F, l), where P is the place set, T is the transition set, F ⊆ (P × T) ∪ (T × P) is the set of directed arcs representing the flow relation, and l: T → Γ is the transition labeling function that maps the transition set T to the label set Γ, i.e. every transition has a corresponding label.
In summary, the above scheme provides a new method for cross-organization multi-instance sub-process mining. Mining a hierarchical multi-instance Petri net model is an effective means of mining cross-organization multi-instance sub-process models: it addresses the inability of conventional process mining techniques to identify and mine multi-instance sub-process behavior in cross-organization business processes, advances cross-organization multi-instance sub-process model mining technology, has practical application value, and is worth popularizing.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection; any variation made according to the shape and principle of the present invention shall fall within the scope of protection of the present invention.