CN115712676A - Method and system for identifying unmarked business process event log case - Google Patents


Info

Publication number
CN115712676A
CN115712676A CN202211445119.0A CN202211445119A CN115712676A CN 115712676 A CN115712676 A CN 115712676A CN 202211445119 A CN202211445119 A CN 202211445119A CN 115712676 A CN115712676 A CN 115712676A
Authority
CN
China
Prior art keywords
activity
case
event log
unmarked
activities
Legal status
Withdrawn
Application number
CN202211445119.0A
Other languages
Chinese (zh)
Inventor
刘聪
王颖
陆婷
郭娜
李彩虹
张冬梅
郑凯
Current Assignee
Shandong University of Technology
Original Assignee
Shandong University of Technology
Application filed by Shandong University of Technology
Priority to CN202211445119.0A
Publication of CN115712676A
Current legal status: Withdrawn

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for case identification in unmarked business process event logs. The method comprises the following steps: 1) acquire the basic data, namely an unmarked business process event log; 2) mine the dependency relationships and dependency degree values between activities from that log; 3) using the mined dependency degree values, mine the concurrency relationships between activities and construct a dependency graph; 4) using the mined concurrency relationships and dependency graph, mine the mutually exclusive activities and cyclic activities; 5) using the concurrency relationships, the dependency graph, and the mutually exclusive and cyclic activities, construct case trees to perform case identification on the activities of the unmarked business process event log, obtaining an event log with case identifiers. The invention solves the problem that traditional process mining techniques cannot mine a process model from unmarked business process event logs.

Description

Method and system for identifying unmarked business process event log case
Technical Field
The invention relates to the technical field of process mining, in particular to a case identification method and system for unmarked business process event logs. It mainly addresses the problem that current process mining techniques have difficulty mining a process model effectively from unmarked business process event logs.
Background
Process mining is an emerging research focus in business process management. It aims to extract process-related information from business process event logs, providing a factual basis for understanding, improving and redesigning enterprise business processes. The IEEE process mining working group classifies event logs into five maturity levels, from highest to lowest. Logs at the highest level are trustworthy and complete, their events are well defined, and the recorded events and their attributes have clear semantics, as in the semantically annotated logs of BPM systems. Logs at the lowest level are of poor quality: the recorded events may not match reality and some events may be missing, as with records of paper documents routed within an organization or paper medical records. The working group therefore lists cleaning event data as one of the challenges of process mining. Process mining requires a standard event log in which every event is associated with a process instance. When data are recorded and collected, however, it may be impossible to associate events with specific process instances; the case attribute is then absent or lost, and the log becomes an unmarked event log. In an unmarked business process event log it is uncertain whether two events are related, and the number of process instances is unknown, so designing and building a process model from such a log is an extremely complex, time-consuming, labor-intensive and challenging task.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a case identification method for unmarked business process event logs, solving the problem that traditional process mining techniques cannot mine a process model from such logs.
The invention also provides a case identification system for unmarked business process event logs.
The first object of the invention is achieved by the following technical solution: a case identification method for unmarked business process event logs, comprising the following steps:
1) Acquire the basic data, namely an unmarked business process event log;
2) Mine the dependency relationships and dependency degree values between activities from the basic data obtained in step 1);
3) Mine the concurrency relationships between activities and construct a dependency graph from the dependency degree values mined in step 2);
4) Mine the mutually exclusive activities and cyclic activities from the concurrency relationships and dependency graph obtained in step 3);
5) Construct case trees from the concurrency relationships and dependency graph obtained in step 3) and the mutually exclusive and cyclic activities obtained in step 4), and use them to perform case identification on the activities of the unmarked business process event log, obtaining an event log with case identifiers.
Further, in step 1), the unmarked business process event log is an event log whose case attribute is missing, represented as a sequence of events carrying timestamp information. An event log is the data generated while a process is executed; it is a multiset of cases, where a case is the process instance to which an event belongs, i.e., one execution of the process model.
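As a minimal illustration of the input described in step 1), the sketch below assumes the raw data arrive as hypothetical (timestamp, activity) records with no case column, and orders them by timestamp to obtain the event sequence UL. The record format and the function name `to_unmarked_log` are assumptions for illustration only, not part of the patent.

```python
from datetime import datetime


def to_unmarked_log(records):
    """Return UL: activity names ordered by timestamp.

    Assumed (hypothetical) input: (ISO-8601 timestamp, activity) pairs, no case id.
    sorted() is stable, so events with equal timestamps keep their input order.
    """
    ordered = sorted(records, key=lambda rec: datetime.fromisoformat(rec[0]))
    return [activity for _, activity in ordered]


records = [
    ("2024-01-01T09:05:00", "B"),
    ("2024-01-01T09:00:00", "A"),
    ("2024-01-01T09:07:00", "A"),
    ("2024-01-01T09:10:00", "C"),
]
print(to_unmarked_log(records))  # ['A', 'B', 'A', 'C']
```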
Further, in step 2), the dependency relationship between activities is defined on the unmarked business process event log UL: an occurrence of activity b depending on activity a is a position i of UL, with 1 ≤ i ≤ |UL| − 1, such that the i-th activity is a and the (i + 1)-th activity is b. The dependency degree of activity b on activity a is calculated as follows:
[Formula defining R(a → b): image not reproduced in the source]
where R(a → b) is the dependency degree of activity b on activity a; a → b denotes that activity b depends on activity a; |a → b| is the number of times activity b depends on activity a in UL; |UL| is the total number of activities (events) in UL; |a| and |b| are the frequencies of activities a and b in UL; a is the activity depended on and b is the dependent activity.
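The formula for R(a → b) survives only as an image, so the following Python sketch stops at the quantities the text says it is built from: the direct-succession counts |a → b|, the activity frequencies |a| and |b|, and the log length |UL|. The function name `dependency_counts` is hypothetical; the sample log is the first example log of Example 1.

```python
from collections import Counter


def dependency_counts(ul):
    """Return (|a -> b| counts, |a| frequencies, |UL|) for an unmarked log UL.

    |a -> b| counts positions i (1 <= i <= |UL| - 1) with UL[i] = a and UL[i+1] = b.
    Combining these into R(a -> b) is omitted: the formula is not reproduced in the source.
    """
    follows = Counter(zip(ul, ul[1:]))  # (a, b) -> |a -> b|
    freq = Counter(ul)                  # a -> |a|
    return follows, freq, len(ul)


ul = list("ABCADCBDABCDACBDACDABD")  # first example log of Example 1
follows, freq, n = dependency_counts(ul)
print(n, freq["A"], follows[("A", "B")], follows[("D", "A")])  # 22 6 3 4
```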
Further, in step 3), the concurrency relationship between activities is defined as follows: in the unmarked business process event log, activities c and d are concurrent, written c || d, if they satisfy:
① R(c → d) > ω ∨ R(d → c) > ω
② [formula image not reproduced in the source]
where ω and θ are input thresholds, R(c → d) is the dependency degree of activity d on activity c, and R(d → c) is the dependency degree of activity c on activity d. Condition ① requires that the dependency degree of d on c or of c on d exceed the threshold ω, i.e., it screens for activity pairs with a high dependency degree; condition ② requires that the two dependency degrees R(c → d) and R(d → c) be close;
The concurrent activity set is constructed from the dependency degree values between activities as follows:
3.1) Input the unmarked business process event log UL and obtain the dependency degree values between activities;
3.2) Add each activity pair satisfying the concurrency relationship to the concurrent activity set ParallelSet;
3.3) Repeat step 3.2) until all activity pairs have been traversed, then output ParallelSet;
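Steps 3.1) to 3.3) can be sketched as a loop over activity pairs. Because condition ② is not reproduced in the source, the sketch assumes, as a labeled guess, that "close" is measured the same way as in edge condition (3) of the dependency graph, i.e. |R(c → d) − R(d → c)| / (R(c → d) + R(d → c)) < θ. The dependency degrees are supplied as a hypothetical dictionary keyed by ordered activity pairs, and the function names are illustrative.

```python
from itertools import combinations


def close_enough(r_cd, r_dc, theta):
    """ASSUMED form of condition 2 (the source formula is an image):
    relative difference of the two dependency degrees below theta."""
    total = r_cd + r_dc
    return total > 0 and abs(r_cd - r_dc) / total < theta


def parallel_set(activities, R, omega, theta):
    """Steps 3.1-3.3: every unordered pair {c, d} satisfying conditions 1 and 2."""
    result = set()
    for c, d in combinations(sorted(activities), 2):
        r_cd, r_dc = R.get((c, d), 0.0), R.get((d, c), 0.0)
        high = r_cd > omega or r_dc > omega           # condition 1
        if high and close_enough(r_cd, r_dc, theta):  # condition 2 (assumed form)
            result.add(frozenset((c, d)))
    return result


# Hypothetical dependency degrees, not taken from the document's example.
R = {("A", "B"): 2.0, ("B", "A"): 0.2, ("B", "C"): 1.8, ("C", "B"): 1.6}
print(parallel_set({"A", "B", "C"}, R, omega=1.0, theta=0.3) == {frozenset({"B", "C"})})  # True
```

Representing each concurrent pair as a `frozenset` makes c || d and d || c the same entry, which matches the symmetric notation used in the text.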
The dependency graph is a 2-tuple DG = (N, E), where N is the set of vertices and A is the activity set of the unmarked business process event log UL (the defining formulas are images not reproduced in the source);
E is the set of dependency edges between nodes (its defining formula is an image not reproduced in the source), and every (e, f) ∈ E satisfies: (1) R(e → f) ≥ ω, i.e., the dependency degree of activity f on activity e is at least the threshold ω; (2) activities e and f are not concurrent; (3) |R(e → f) − R(f → e)| / (R(e → f) + R(f → e)) < θ, i.e., the dependency degree of f on e is close to that of e on f; (4) when e is the start activity startAct, f is not the end activity endAct (equivalently, when f is endAct, e is not startAct);
Here ω and θ are the input thresholds, R(e → f) is the dependency degree of activity f on activity e, and R(f → e) is the dependency degree of activity e on activity f; the start activity startAct is the first activity of the unmarked business process event log, and the end activity endAct is its last activity.
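A sketch of the edge set E, encoding conditions (1) to (4) literally as stated above, including the "< θ" test of condition (3). R is a hypothetical dictionary of dependency degrees keyed by ordered pairs and ParallelSet a set of unordered pairs; neither the data layout nor the function name `dependency_edges` comes from the source.

```python
def dependency_edges(activities, R, omega, theta, parallel, start_act, end_act):
    """Return the edge set E of DG = (N, E): ordered pairs (e, f) meeting conditions (1)-(4).

    R (hypothetical layout): dict mapping (e, f) -> R(e -> f); missing pairs count as 0.
    parallel: set of frozenset pairs, i.e. ParallelSet.
    """
    edges = set()
    for e in activities:
        for f in activities:
            r_ef, r_fe = R.get((e, f), 0.0), R.get((f, e), 0.0)
            total = r_ef + r_fe
            if r_ef < omega or total == 0:              # (1) requires R(e -> f) >= omega
                continue
            if frozenset((e, f)) in parallel:           # (2) e and f must not be concurrent
                continue
            if abs(r_ef - r_fe) / total >= theta:       # (3) as written: ratio must be < theta
                continue
            if e == start_act and f == end_act:         # (4) no direct startAct -> endAct edge
                continue
            edges.add((e, f))
    return edges


# Hypothetical dependency degrees for illustration only.
R = {("A", "B"): 1.2, ("B", "A"): 1.0, ("B", "C"): 2.0,
     ("C", "B"): 0.1, ("A", "C"): 1.5, ("C", "A"): 1.5}
print(sorted(dependency_edges({"A", "B", "C"}, R, 1.0, 0.3, set(), "A", "C")))
# [('A', 'B'), ('B', 'A'), ('C', 'A')]
```

Here A → C is dropped only by condition (4), and B → C only by condition (3).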
Further, in step 4), activities a and b are mutually exclusive if they satisfy: (1) activity b does not appear on any path from activity a to the end activity endAct in the dependency graph; (2) activity a does not appear on any path from activity b to endAct in the dependency graph; (3) activities a and b are not concurrent;
The path mentioned above is defined on the dependency graph DG = (N, E): for any two nodes c and d, if there are nodes $c_1, c_2, \ldots, c_n$ such that $c_j \neq c_k$ for all $1 \le j < k \le n$, and $c \to c_1, c_1 \to c_2, \ldots, c_n \to d$, then there is a path p(c, d) from activity c to activity d, namely $p(c, d) = (c, c_1, \ldots, c_n, d)$;
The end activity endAct mentioned above is the last activity of the unmarked business process event log;
Activity e is a cyclic activity if it satisfies: (1) |e| > |startAct|, i.e., the frequency of activity e in UL is greater than the frequency of the start activity startAct; (2) there is a path from node e back to node e in the dependency graph;
The start activity startAct mentioned above is the first activity of the unmarked business process event log;
Mutually exclusive activities and cyclic activities are mined from the concurrency relationships and the dependency graph as follows:
4.1) Input the unmarked business process event log UL and obtain the concurrent activity set and the dependency graph;
4.2) Using the dependency graph, add each activity pair satisfying the mutual exclusion relationship to the mutually exclusive activity set ExclusiveSet;
4.3) Repeat step 4.2) until all activity pairs have been traversed, then output ExclusiveSet;
4.4) Taking the frequency of the start activity startAct as the baseline: if an activity occurs more often than startAct and the dependency graph contains a path from that activity back to itself, the activity is cyclic and is added to the cyclic activity set LoopSet;
4.5) Repeat step 4.4) until all activities have been traversed, then output LoopSet.
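Steps 4.1) to 4.5) can be sketched by enumerating paths of the dependency graph as defined above (intermediate nodes pairwise distinct). The graph is assumed to be a successor dictionary and the helper names are hypothetical. Path enumeration is exponential in the worst case, so this is only suitable for small graphs such as those in the examples.

```python
from collections import Counter
from itertools import combinations


def paths(succ, src, dst):
    """Yield every path (src, c1, ..., cn, dst) whose intermediate nodes are pairwise
    distinct, following the path definition above (exponential: small graphs only)."""
    def walk(node, inter):
        for nxt in succ.get(node, ()):
            if nxt == dst:
                yield (src, *inter, dst)
            if nxt not in inter:  # keep intermediates distinct
                yield from walk(nxt, inter + (nxt,))
    yield from walk(src, ())


def on_some_path(succ, src, dst, target):
    """True if target appears on at least one path from src to dst."""
    return any(target in p for p in paths(succ, src, dst))


def exclusive_set(activities, succ, parallel, end_act):
    """Steps 4.2-4.3: unordered pairs {a, b} satisfying exclusion conditions (1)-(3)."""
    result = set()
    for a, b in combinations(sorted(activities), 2):
        if (not on_some_path(succ, a, end_act, b)          # (1)
                and not on_some_path(succ, b, end_act, a)  # (2)
                and frozenset((a, b)) not in parallel):    # (3)
            result.add(frozenset((a, b)))
    return result


def loop_set(ul, succ, start_act):
    """Steps 4.4-4.5: activities more frequent than startAct that lie on a path back to themselves."""
    freq = Counter(ul)
    return {e for e in freq if freq[e] > freq[start_act] and any(paths(succ, e, e))}


succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}  # hypothetical XOR-shaped dependency graph
print(exclusive_set({"A", "B", "C", "D"}, succ, set(), "D") == {frozenset({"B", "C"})})  # True
print(sorted(loop_set(list("ABCBCD"), {"A": ["B"], "B": ["C"], "C": ["B", "D"]}, "A")))  # ['B', 'C']
```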
Further, in step 5), a case tree is a 4-tuple ctree(σ) = (Node, Root, F, Leaves), where Node is the set of activity nodes, each drawn from A, the activity set of the unmarked business process event log UL; every node carries three items of basic information: its case, its subscript and its activity name, where the subscript is the position of the activity in UL; Root ∈ Node is the root node; F is the set of directed arcs representing the direction between nodes (its defining formulas are images not reproduced in the source): if node a points to node b, then a is the parent node of b and b is a child node of a; Leaves ⊆ Node is the set of leaf nodes, and every leaf node has no child nodes;
Case identification reconstructs the unmarked business process event log by adding case information to each activity, yielding an event log with case information. Its specific steps are as follows:
5.1) Input the unmarked business process event log UL and obtain the concurrent activity set, the cyclic activity set, the mutually exclusive activity set and the dependency graph;
5.2) Initialize the case tree set CaseTree and set the case counter σ = 0;
5.3) Let e be the current activity to be assigned:
If e is the start activity startAct, a new case begins: create a new case tree σ whose root node and leaf node are both e, update CaseTree = CaseTree ∪ {σ}, and move to the next case, σ = σ + 1. Otherwise, one of the following two cases applies:
Case 1: select from CaseTree a case tree σ′ that satisfies: i. σ′ contains no activity mutually exclusive with e; ii. σ′ does not contain activity e, or e is a cyclic activity. Then select a leaf node leaf of σ′ and add e to σ′ if either of the following conditions holds. Condition 1: leaf and e are concurrent activities, and an activity n that is an entry node of e in the dependency graph is present in σ′; e is then added to σ′ as a leaf node whose parent is n. Condition 2: R(leaf → e) ≥ ω, where R(leaf → e) is the dependency degree of e on leaf; e is then added to σ′ as a leaf node whose parent is leaf;
Case 2: otherwise, assign e to a preferred case tree from CaseTree, chosen as follows: select the case tree rt containing a leaf node leaf such that R(leaf → e) ≥ R(leaf′ → e) for every leaf node leaf′ of every case tree, where R(leaf′ → e) is the dependency degree of e on leaf′; add e to rt as a leaf node whose parent is leaf;
5.4) Repeat step 5.3), traversing the activities of the unmarked business process event log in order from the first to the last, to complete case identification; then read the case information of each activity from the case trees and output the reconstructed, labeled event log.
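The following Python sketch is one possible reading of steps 5.1) to 5.4), not the patented implementation. Its assumptions are labeled in the code: case trees are tried oldest first in Case 1; an "entry node" of e is taken to be a predecessor of e in the dependency graph, supplied as a hypothetical `preds` dictionary; when several tree nodes qualify as n, the first one is used; and Case 2 is applied only when no tree satisfies Case 1. Relations are passed as sets of unordered pairs, and all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class CaseTree:
    """One case tree; nodes are positions in UL, each labeled with its activity."""
    case_id: int
    label: dict = field(default_factory=dict)   # position -> activity name
    parent: dict = field(default_factory=dict)  # position -> parent position (None = root)

    def add(self, pos, act, parent_pos):
        self.label[pos] = act
        self.parent[pos] = parent_pos

    def leaves(self):
        parents = set(self.parent.values())
        return [p for p in self.label if p not in parents]


def try_case_1(trees, pos, e, R, omega, parallel, exclusive, loops, preds):
    for t in trees:  # ASSUMPTION: trees are tried oldest first
        acts = set(t.label.values())
        if any(frozenset((a, e)) in exclusive for a in acts):
            continue                                   # condition i
        if e in acts and e not in loops:
            continue                                   # condition ii
        for leaf in t.leaves():
            la = t.label[leaf]
            if frozenset((la, e)) in parallel:         # condition 1
                # ASSUMPTION: entry node = predecessor of e in the dependency graph (preds)
                n = next((p for p, a in t.label.items() if a in preds.get(e, ())), None)
                if n is not None:
                    t.add(pos, e, n)
                    return t
            if R.get((la, e), 0.0) >= omega:           # condition 2
                t.add(pos, e, leaf)
                return t
    return None


def identify_cases(ul, R, omega, parallel, exclusive, loops, preds, start_act):
    """Return the case id (sigma) assigned to every position of UL."""
    trees, case_of = [], []
    for pos, e in enumerate(ul):
        if e == start_act:                             # a new case begins
            t = CaseTree(len(trees))
            t.add(pos, e, None)
            trees.append(t)
        else:
            t = try_case_1(trees, pos, e, R, omega, parallel, exclusive, loops, preds)
            if t is None:                              # Case 2: best leaf over all trees
                if not trees:
                    raise ValueError("UL must begin with the start activity")
                _, t, leaf = max(((R.get((tr.label[lf], e), 0.0), tr, lf)
                                  for tr in trees for lf in tr.leaves()),
                                 key=lambda x: x[0])
                t.add(pos, e, leaf)
        case_of.append(t.case_id)
    return case_of


R = {("A", "B"): 2.0, ("B", "C"): 2.0}  # hypothetical dependency degrees
print(identify_cases(list("ABACBC"), R, 1.0, set(), set(), set(),
                     {"B": {"A"}, "C": {"B"}}, "A"))  # [0, 0, 1, 0, 1, 1]
```

The two interleaved executions of A, B, C are separated because condition ii blocks a second B or C from joining a tree that already contains one.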
The second object of the invention is achieved by the following technical solution: a case identification system for unmarked business process event logs, comprising a data acquisition module, an activity dependency mining module, an activity relationship mining module and an unmarked business process event log case identification module;
the data acquisition module acquires the unmarked business process event log. An event log is the data generated while a process is executed; it is a multiset of cases, where a case is the process instance to which an event belongs, i.e., one execution of the process model. The unmarked business process event log is an event log without case attributes, represented as a sequence of events carrying timestamp information;
the activity dependency mining module obtains the dependency relationships and dependency degree values between activities;
the activity relationship mining module obtains the concurrency, mutual exclusion and cyclic relationships between activities;
the unmarked business process event log case identification module reconstructs the unmarked log, adding case attributes to activities based on case trees to obtain an event log with case information.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method uses only the unmarked business process event log and needs no additional event attributes, so it applies to a wide range of scenarios.
2. The invention mines event characteristics from the unmarked log, overcomes the limitation that existing case identification techniques cannot handle cyclic activities, substantially improves the mining of concurrency relationships, and improves case identification accuracy.
3. By performing case identification on unmarked logs so that every event carries case information, the invention improves the accuracy of the mined model and provides a new case identification method for the process mining field.
4. The method for mining process models from unmarked business process event logs is highly practical and has broad application prospects in process mining and case identification.
Drawings
FIG. 1 is a schematic diagram of the logical flow of the method of the invention.
FIG. 2 shows the plug-in interface of the method as implemented in ProM.
FIG. 3 shows the interface for setting the threshold ω.
FIG. 4 shows the interface for setting the threshold θ.
FIG. 5 shows a dependency graph produced by the method.
FIG. 6 is an architecture diagram of the system of the invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
As shown in FIG. 1, this embodiment discloses a case identification method for unmarked business process event logs, implemented as a plug-in in the ProM tool as shown in FIG. 2. The method uses the acquired unmarked business process event log to mine the dependency relationships and dependency degree values between activities, then mines the concurrency relationships and constructs a dependency graph, then mines the mutually exclusive and cyclic activities, and finally performs case identification on the activities of the log based on case trees to obtain an event log with case identifiers. It comprises the following steps:
1) Acquire the basic data, namely an unmarked business process event log. The unmarked business process event log is an event log whose case attribute is missing, represented as a sequence of events carrying timestamp information. An event log is the data generated while a process is executed; it is a multiset of cases, where a case is the process instance to which an event belongs, i.e., one execution of the process model.
2) Mine the dependency relationships and dependency degree values between activities. The dependency relationship is defined on the unmarked business process event log UL: an occurrence of activity b depending on activity a is a position i of UL, with 1 ≤ i ≤ |UL| − 1, such that the i-th activity is a and the (i + 1)-th activity is b. The dependency degree of activity b on activity a is calculated as follows:
[Formula defining R(a → b): image not reproduced in the source]
where R(a → b) is the dependency degree of activity b on activity a; a → b denotes that activity b depends on activity a; |a → b| is the number of times activity b depends on activity a in UL; |UL| is the total number of activities (events) in UL; |a| and |b| are the frequencies of activities a and b in UL; a is the activity depended on and b is the dependent activity.
Taking the unmarked business process event log UL = <A, B, C, A, D, C, B, D, A, B, C, D, A, C, B, D, A, C, D, A, B, D> as an example: R(A → B) = 1.5, R(A → C) = 2.0, R(A → D) = 0.57, R(B → A) = 0.0, R(B → C) = 1.5, R(B → D) = 2.86, R(C → A) = 0.5, R(C → B) = 2.5, R(C → D) = 1.14, R(D → A) = 3.43, R(D → B) = 0.0, R(D → C) = 0.57.
3) Mine the concurrency relationships between activities and construct the dependency graph from the dependency degree values mined in the previous step. In the unmarked business process event log, activities c and d are concurrent, written c || d, if they satisfy:
① R(c → d) > ω ∨ R(d → c) > ω
② [formula image not reproduced in the source]
where ω and θ are input thresholds, R(c → d) is the dependency degree of activity d on activity c, and R(d → c) is the dependency degree of activity c on activity d. Condition ① requires that the dependency degree of d on c or of c on d exceed the threshold ω, i.e., it screens for activity pairs with a high dependency degree; condition ② requires that the two dependency degrees R(c → d) and R(d → c) be close;
Taking the unmarked business process event log UL = <A, B, C, A, C, B, D, A, B, C, D, A, C, B, C, D, B, A, C, D, B, D> as an example: R(A → B) = 1.33, R(A → C) = 2.0, R(B → A) = 0.67, R(B → C) = 2.0, R(B → D) = 1.6, R(C → A) = 0.67, R(C → B) = 1.33, R(C → D) = 2.4, R(D → A) = 1.6, R(D → B) = 1.6, R(D → C) = 0.96. Taking ω = 1.0 and θ = 0.3 yields the concurrent activity set ParallelSet = {B || C}; the setting of ω is shown in FIG. 3 and that of θ in FIG. 4.
The dependency graph is a 2-tuple DG = (N, E), where N is the set of vertices and A is the activity set of the unmarked business process event log UL (the defining formulas are images not reproduced in the source);
E is the set of dependency edges between nodes (its defining formula is an image not reproduced in the source), and every (e, f) ∈ E satisfies: (1) R(e → f) ≥ ω, i.e., the dependency degree of activity f on activity e is at least the threshold ω; (2) activities e and f are not concurrent; (3) |R(e → f) − R(f → e)| / (R(e → f) + R(f → e)) < θ, i.e., the dependency degree of f on e is close to that of e on f; (4) when e is the start activity startAct, f is not the end activity endAct (equivalently, when f is endAct, e is not startAct);
Here ω and θ are the input thresholds, R(e → f) is the dependency degree of activity f on activity e, and R(f → e) is the dependency degree of activity e on activity f; the start activity startAct is the first activity of the unmarked business process event log, and the end activity endAct is its last activity.
Taking the above unmarked business process event log UL as an example, its dependency graph is shown in FIG. 5.
4) Mine the mutually exclusive activities and cyclic activities from the mined concurrency relationships and dependency graph. Activities a and b are mutually exclusive if they satisfy: (1) activity b does not appear on any path from activity a to the end activity endAct in the dependency graph; (2) activity a does not appear on any path from activity b to endAct in the dependency graph; (3) activities a and b are not concurrent;
The path mentioned above is defined on the dependency graph DG = (N, E): for any two nodes c and d, if there are nodes $c_1, c_2, \ldots, c_n$ such that $c_j \neq c_k$ for all $1 \le j < k \le n$, and $c \to c_1, c_1 \to c_2, \ldots, c_n \to d$, then there is a path p(c, d) from activity c to activity d, namely $p(c, d) = (c, c_1, \ldots, c_n, d)$;
The end activity endAct mentioned above is the last activity of the unmarked business process event log;
Activity e is a cyclic activity if it satisfies: (1) |e| > |startAct|, i.e., the frequency of activity e in UL is greater than the frequency of the start activity startAct; (2) there is a path from node e back to node e in the dependency graph;
The start activity startAct mentioned above is the first activity of the unmarked business process event log;
Taking the unmarked business process event log UL = <A, B, C, A, C, B, D, A, B, C, D, A, C, B, C, D, B, A, C, D, B, D> as an example, its dependency graph is shown in FIG. 5. The mutually exclusive activity set of UL is empty, and since |A| = 6, |B| = 6, |C| = 6 and |D| = 6, no activity occurs more often than the start activity A, so the cyclic activity set of UL is also empty.
5) Perform case identification on the activities of the unmarked business process event log based on case trees. A case tree is a 4-tuple ctree(σ) = (Node, Root, F, Leaves), where Node is the set of activity nodes, each drawn from A, the activity set of the unmarked business process event log UL; every node carries three items of basic information: its case, its subscript and its activity name, where the subscript is the position of the activity in UL; Root ∈ Node is the root node; F is the set of directed arcs representing the direction between nodes (its defining formulas are images not reproduced in the source): if node a points to node b, then a is the parent node of b and b is a child node of a; Leaves ⊆ Node is the set of leaf nodes, and every leaf node has no child nodes;
Case identification reconstructs the unmarked business process event log by adding case information to each activity, yielding an event log with case information. Its specific steps are as follows:
5.1) Input the unmarked business process event log UL and obtain the concurrent activity set, the cyclic activity set, the mutually exclusive activity set and the dependency graph;
5.2) Initialize the case tree set CaseTree and set the case counter σ = 0;
5.3) Let e be the current activity to be assigned:
If e is the start activity startAct, a new case begins: create a new case tree σ whose root node and leaf node are both e, update CaseTree = CaseTree ∪ {σ}, and move to the next case, σ = σ + 1. Otherwise, one of the following two cases applies:
Case 1: select from CaseTree a case tree σ′ that satisfies: i. σ′ contains no activity mutually exclusive with e; ii. σ′ does not contain activity e, or e is a cyclic activity. Then select a leaf node leaf of σ′ and add e to σ′ if either of the following conditions holds. Condition 1: leaf and e are concurrent activities, and an activity n that is an entry node of e in the dependency graph is present in σ′; e is then added to σ′ as a leaf node whose parent is n. Condition 2: R(leaf → e) ≥ ω, where R(leaf → e) is the dependency degree of e on leaf; e is then added to σ′ as a leaf node whose parent is leaf;
Case 2: otherwise, assign e to a preferred case tree from CaseTree, chosen as follows: select the case tree rt containing a leaf node leaf such that R(leaf → e) ≥ R(leaf′ → e) for every leaf node leaf′ of every case tree, where R(leaf′ → e) is the dependency degree of e on leaf′; add e to rt as a leaf node whose parent is leaf;
5.4) Repeat step 5.3), traversing the activities of the unmarked business process event log in order from the first to the last, to complete case identification; then read the case information of each activity from the case trees and output the reconstructed, labeled event log.
Taking the unmarked business process event log UL = <A, B, C, A, C, B, D, D, A, A, B, C, D, A, C, B, C, D, B, A, C, D, B, D> as an example, with ω = 1.0 and θ = 0.3, we obtain the concurrent activity set ParallelSet = {B || C}, [formula image not reproduced in the source], startAct = A and endAct = D; applying the above steps yields the event log with case information L = [<A, B, C, D>², <A, C, B, D>⁴].
Example 2
This embodiment discloses a case identification system for unmarked business process event logs, used to implement the case identification method of Example 1. As shown in FIG. 6, the system comprises the following functional modules:
the data acquisition module, which acquires the unmarked business process event log;
the activity dependency mining module, which obtains the dependency relationships and dependency degree values between activities;
the activity relationship mining module, which obtains the concurrency, mutual exclusion and cyclic relationships between activities;
the unmarked business process event log case identification module, which reconstructs the unmarked log and adds case attributes to activities based on case trees, obtaining an event log with case information (i.e., case identifiers).
In summary, the invention provides a new method and system for identifying cases in unmarked business process event logs, using the data characteristics of such logs as the basis for case identification. This overcomes the limitation that traditional process mining techniques cannot mine unmarked business process event logs, advances both case identification and the mining of unmarked business process event logs, and has practical application value that merits wider adoption.
The embodiments described above are merely preferred embodiments of the invention and do not limit its scope; any change made according to the shape or principle of the invention shall fall within its scope of protection.

Claims (7)

1. A method for identifying cases in an unmarked business process event log, characterized by comprising the following steps:
1) Acquiring basic data, namely an unmarked business process event log;
2) Mining the dependency relationships and dependency degree values between activities from the basic data obtained in step 1);
3) Mining the concurrency relationships between activities and constructing a dependency graph from the dependency degree values mined in step 2);
4) Mining the mutually exclusive activities and cyclic activities from the concurrency relationships and the dependency graph obtained in step 3);
5) Constructing case trees to perform case identification on the activities in the unmarked business process event log, according to the concurrency relationships and dependency graph obtained in step 3) and the mutually exclusive and cyclic activities mined in step 4), to obtain an event log with case identifiers.
2. The method for identifying cases in an unmarked business process event log according to claim 1, wherein in step 1), the unmarked business process event log is an event log lacking case information and is represented as a sequence of events with timestamp information; an event log is the data generated while a process executes and is a multiset of cases, where a case is the process instance to which an event in the event log belongs, i.e., one execution of the process model.
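A minimal sketch of this representation, assuming each raw event is a (timestamp, activity) tuple (a hypothetical format; the patent does not fix one): ordering the events by timestamp and keeping only the activity names yields the sequence UL used in the following steps. The helper name `to_unmarked_log` is hypothetical.

```python
def to_unmarked_log(events):
    """Order (timestamp, activity) events by time and keep only activity names.

    sorted() is stable, so events sharing a timestamp keep their input order.
    """
    return [activity for _, activity in sorted(events, key=lambda ev: ev[0])]


print(to_unmarked_log([(3, "C"), (1, "A"), (2, "B"), (2, "D")]))
```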
3. The method for identifying cases in an unmarked business process event log according to claim 2, wherein in step 2), the dependency relationship between activities is defined as follows: in the unmarked business process event log UL, activity b depends on activity a when activity a at position i of UL is immediately followed by activity b at position i + 1, where 1 ≤ i ≤ |UL| − 1; the dependency degree value of activity b on activity a is calculated as follows:
[Formula image not reproduced in this text: R(a → b) expressed in terms of |a → b|, |UL|, |a| and |b|]
where R(a → b) is the dependency degree value of activity b on activity a; a → b denotes that activity b depends on activity a; |a → b| is the number of times activity b depends on activity a in the unmarked business process event log UL; |UL| is the total number of activities in UL; |a| and |b| are the frequencies of activities a and b in UL; a is the depended-on activity and b is the dependent activity.
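The counts used in this definition can be computed directly from UL. In the sketch below, `dependency_counts` follows the definition above. `assumed_dependency` is a hypothetical stand-in, because the actual formula for R(a → b) is available only as an image: it combines the same four quantities as a lift-style ratio |a → b|·|UL| / (|a|·|b|), which is an assumption, not the patent's formula. The example UL is the one used in the description.

```python
from collections import Counter


def dependency_counts(ul):
    """Return |UL|, activity frequencies |a|, and direct-succession counts |a -> b|."""
    freq = Counter(ul)
    succ = Counter(zip(ul, ul[1:]))  # pairs at positions i and i + 1
    return len(ul), freq, succ


def assumed_dependency(a, b, n, freq, succ):
    """HYPOTHETICAL R(a -> b) = |a -> b| * |UL| / (|a| * |b|); not the patent's formula."""
    if freq[a] == 0 or freq[b] == 0:
        return 0.0
    return succ[(a, b)] * n / (freq[a] * freq[b])


ul = list("ABCACBDDAABCDACBCDBACDBD")  # the example log from the description
n, freq, succ = dependency_counts(ul)
# |UL| = 24, each of A, B, C, D occurs 6 times; B -> C occurs 3 times, C -> B twice.
print(n, freq, succ[("B", "C")], succ[("C", "B")])
```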
4. The method for identifying cases in an unmarked business process event log according to claim 3, wherein in step 3), activities c and d in the unmarked business process event log are in a concurrency relationship, written c || d, if they satisfy:
① R(c → d) > ω ∨ R(d → c) > ω
② [formula image not reproduced in this text: a closeness condition on R(c → d) and R(d → c) with threshold θ]
where ω and θ are input thresholds, R(c → d) is the dependency degree value of activity d on activity c, and R(d → c) is the dependency degree value of activity c on activity d. Condition ① requires the dependency degree value of d on c, or of c on d, to exceed the threshold ω, which screens for activity pairs with a high degree of dependency; condition ② requires the dependency degree values of d on c and of c on d to be close;
the concurrent activity set is constructed from the dependency degree values between activities as follows:
3.1) Input the unmarked business process event log UL and obtain the dependency degree values between activities;
3.2) Add each activity pair that satisfies the concurrency relationship to the concurrent activity set ParallelSet;
3.3) Repeat step 3.2) until all activity pairs have been traversed, then output the concurrent activity set ParallelSet;
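Steps 3.1) to 3.3) amount to testing every activity pair. The sketch below assumes R is a dictionary keyed by (a, b) pairs, with missing pairs counting as 0, and assumes condition ② is the closeness ratio |R(c → d) − R(d → c)| / (R(c → d) + R(d → c)) < θ, by analogy with condition (3) of the dependency graph definition in this claim; the patent's exact condition ② is in an image. The function names are hypothetical.

```python
from itertools import combinations


def is_concurrent(R, c, d, omega, theta):
    """Condition 1 as stated; condition 2 ASSUMED to be a closeness ratio below theta."""
    rcd, rdc = R.get((c, d), 0.0), R.get((d, c), 0.0)
    if not (rcd > omega or rdc > omega):  # condition 1: high dependency
        return False
    return abs(rcd - rdc) / (rcd + rdc) < theta  # assumed condition 2


def parallel_set(R, activities, omega, theta):
    """Steps 3.1-3.3: test every unordered activity pair, keep concurrent ones."""
    return {frozenset(pair) for pair in combinations(sorted(activities), 2)
            if is_concurrent(R, *pair, omega, theta)}


R = {("B", "C"): 2.0, ("C", "B"): 1.5, ("A", "B"): 2.0, ("B", "A"): 0.5}
print(parallel_set(R, {"A", "B", "C"}, omega=1.0, theta=0.3))
```

Pairs are stored as `frozenset`s because concurrency is symmetric, so c || d and d || c are the same entry.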
the dependency graph is a 2-tuple DG = (N, E), where N is the vertex set, taken from the activity set of the unmarked business process event log UL, and E is the set of dependency edges between nodes; every (e, f) ∈ E satisfies: (1) R(e → f) ≥ ω, i.e., the dependency degree value of activity f on activity e is at least the threshold ω; (2) activities e and f are not in a concurrency relationship; (3) |R(e → f) − R(f → e)| / (R(e → f) + R(f → e)) < θ, i.e., the dependency degree value of activity f on activity e is close to that of activity e on activity f; (4) the edge does not run from the start activity startAct to the end activity endAct, i.e., when e is startAct, f is not endAct;
here ω and θ are input thresholds, R(e → f) is the dependency degree value of activity f on activity e, and R(f → e) is the dependency degree value of activity e on activity f; the start activity startAct is the first activity of the unmarked business process event log, and the end activity endAct is its last activity.
5. The method for identifying cases in an unmarked business process event log according to claim 4, wherein in step 4), activities a and b are mutually exclusive if they satisfy: (1) activity b does not lie on any path from activity a to the end activity endAct in the dependency graph; (2) activity a does not lie on any path from activity b to the end activity endAct in the dependency graph; (3) activities a and b are not in a concurrency relationship;
the path mentioned above is defined on the dependency graph DG = (N, E): for any two nodes c and d, if there exist nodes $c_1, c_2, \ldots, c_n$ with $c_j \neq c_k$ whenever $1 \le j < k \le n$, such that $c \to c_1, c_1 \to c_2, \ldots, c_n \to d$, then there is a path p(c, d) from activity c to activity d, namely $p(c, d) = (c, c_1, \ldots, c_n, d)$;
the end activity endAct mentioned above is the last activity of the unmarked business process event log;
a loop (cyclic) activity is defined as follows: activity e is a loop activity if (1) |e| > |startAct|, i.e., the frequency of activity e in the unmarked business process event log UL is greater than the frequency of the start activity startAct; and (2) the dependency graph contains a path from node e back to node e;
the start activity startAct mentioned above is the first activity of the unmarked business process event log;
the mutually exclusive activities and loop activities are mined from the concurrency relationships and the dependency graph as follows:
4.1) Input the unmarked business process event log UL and obtain the concurrent activity set and the dependency graph;
4.2) Using the dependency graph, add each activity pair satisfying the mutual exclusion relationship to the mutually exclusive activity set;
4.3) Repeat step 4.2) until all activity pairs have been traversed, then output the mutually exclusive activity set ExclusiveSet;
4.4) Based on the frequency of the start activity startAct: if the frequency of an activity is greater than that of startAct and the dependency graph contains a path from that activity back to itself, the activity is a loop activity and is added to the loop activity set LoopSet;
4.5) Repeat step 4.4) until all activities have been traversed, then output the loop activity set LoopSet.
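Steps 4.1) to 4.5) can be sketched as below, with the dependency graph given as a set of (source, target) edges. The path test is an approximation: b is treated as lying on a path from a to endAct when b is reachable from a and endAct is b itself or reachable from b, which does not enforce the distinct-node requirement of the path definition above. The loop test (a path from e back to e) uses plain reachability. All function names are hypothetical.

```python
from itertools import combinations


def reachable(edges, src):
    """Nodes reachable from src by following at least one dependency edge."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    seen, stack = set(), list(succ.get(src, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(succ.get(v, ()))
    return seen


def lies_on_path_to_end(edges, a, b, end):
    """Approximate test that b lies on some path from a to end (reachability only)."""
    return b in reachable(edges, a) and (b == end or end in reachable(edges, b))


def exclusive_set(edges, activities, end, parallel):
    """Steps 4.2-4.3: non-concurrent pairs that avoid each other's paths to endAct."""
    result = set()
    for a, b in combinations(sorted(activities), 2):
        if frozenset((a, b)) in parallel:
            continue
        if lies_on_path_to_end(edges, a, b, end) or lies_on_path_to_end(edges, b, a, end):
            continue
        result.add(frozenset((a, b)))
    return result


def loop_set(edges, freq, start):
    """Steps 4.4-4.5: more frequent than startAct and on a cycle through itself."""
    return {e for e in freq if freq[e] > freq[start] and e in reachable(edges, e)}


edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
print(exclusive_set(edges, {"A", "B", "C", "D"}, "D", set()))
```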
6. The method for identifying cases in an unmarked business process event log according to claim 5, wherein in step 5), a case tree is a 4-tuple ctree(σ) = (Node, Root, F, Leaves),
where Node is the set of activity nodes, and every node carries the basic information case, subscript and activity name, the subscript being the position of the activity in the unmarked business process event log UL; Root ∈ Node is the root node; F is the set of directed arcs expressing the directional relationships between nodes: if node a points to node b, then node a is the parent node of node b and node b is a child node of node a; Leaves ⊆ Node is the set of leaf nodes, and every leaf node has no child node;
case identification reconstructs the unmarked business process event log by adding case information to each activity, yielding an event log with case information; its specific steps are as follows:
5.1) Input the unmarked business process event log UL and obtain the concurrent activity set, the loop activity set, the mutually exclusive activity set and the dependency graph;
5.2) Initialize the case tree set CaseTree = ∅ and the case identifier σ = 0;
5.3) If the current activity to be assigned is activity e, then:
if activity e is the start activity startAct, a new case begins: create a new case tree σ whose root node and leaf node are both e, update the case tree set CaseTree = CaseTree ∪ {σ}, and advance the case identifier to σ = σ + 1; otherwise, proceed according to one of the following two cases:
in the first case: select from the case tree set CaseTree a case tree σ' that satisfies: i. σ' contains no activity that is mutually exclusive with e; ii. σ' does not contain activity e, or activity e is a loop activity. Then select a leaf node leaf from σ' and apply whichever of the following two conditions holds. Condition one: if leaf and activity e are concurrent activities, and an activity n that is an entry node of activity e in the dependency graph exists in σ', add activity e to σ' as a leaf node whose parent node is n. Condition two: if R(leaf → e) ≥ ω, where R(leaf → e) is the dependency degree value of activity e on leaf, add activity e to σ' as a leaf node whose parent node is leaf;
in the second case: the case tree to which the activity is assigned is chosen preferentially according to the following principle: select from CaseTree a case tree rt that contains a leaf node leaf satisfying R(leaf → e) ≥ R(leaf' → e), where leaf' denotes any leaf node of any case tree and R(leaf' → e) is the dependency degree value of activity e on activity leaf'; then add activity e to rt as a leaf node whose parent node is leaf;
5.4) Repeat step 5.3), traversing the unmarked business process event log in order from its first activity to its last. Once case identification is complete, read the case information of each activity from the case trees and output the reconstructed, labeled event log.
7. A system for identifying cases in an unmarked business process event log, used to implement the method for identifying cases in an unmarked business process event log according to any one of claims 1 to 6, comprising:
a data acquisition module, used to acquire the unmarked business process event log;
an activity dependency mining module, used to obtain the dependency relationships and dependency degree values between activities;
an activity relationship mining module, used to obtain the concurrency, mutual exclusion and loop relationships between activities;
and a case identification module, used to reconstruct the unmarked business process event log by adding case attributes to activities based on the case trees, obtaining an event log with case information, i.e., case identifiers.
CN202211445119.0A 2022-11-18 2022-11-18 Method and system for identifying unmarked business process event log case Withdrawn CN115712676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211445119.0A CN115712676A (en) 2022-11-18 2022-11-18 Method and system for identifying unmarked business process event log case

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211445119.0A CN115712676A (en) 2022-11-18 2022-11-18 Method and system for identifying unmarked business process event log case

Publications (1)

Publication Number Publication Date
CN115712676A true CN115712676A (en) 2023-02-24

Family

ID=85233857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211445119.0A Withdrawn CN115712676A (en) 2022-11-18 2022-11-18 Method and system for identifying unmarked business process event log case

Country Status (1)

Country Link
CN (1) CN115712676A (en)

Similar Documents

Publication Publication Date Title
CN107665191B (en) Private protocol message format inference method based on extended prefix tree
CN102332125B (en) Workflow mining method based on subsequent tasks
CN105912666B (en) A kind of mixed structure data high-performance storage of facing cloud platform, querying method
CN103218692B (en) Workflow mining method based on dependence analysis between activity
CN102880684B (en) The workflow modeling method with combined authentication is excavated based on log recording
CN106503872B (en) A kind of business process system construction method based on basic business active set
CN113612749B (en) Intrusion behavior-oriented tracing data clustering method and device
CN106713273B (en) A kind of protocol keyword recognition methods based on dictionary tree pruning search
CN110737466A (en) Source code coding sequence representation method based on static program analysis
CN103150163A (en) Map/Reduce mode-based parallel relating method
CN106203631A (en) The parallel Frequent Episodes Mining of description type various dimensions sequence of events and system
CN114443854A (en) Processing method and device of multi-source heterogeneous data, computer equipment and storage medium
CN106557881B (en) Business process system construction method based on business activity execution sequence
CN113139712B (en) Machine learning-based extraction method for incomplete rules of activity attributes of process logs
CN108897680B (en) Software system operation profile construction method based on SOA
CN109086385A (en) A kind of operation flow low frequency Behavior mining method based on Petri network
CN115712676A (en) Method and system for identifying unmarked business process event log case
CN108647220A (en) Based on event indirectly prior to the scientific workflow method for digging of relationship
CN112069136A (en) Outsourcing model mining method for emergency handling process of emergency event
CN110049023B (en) Unknown protocol reverse identification method and system based on machine learning
CN110046265B (en) Subgraph query method based on double-layer index
CN104199649B (en) The path method for decomposing of interactive information between a kind of process for father and son
CN113947374A (en) Process mining system based on causal concurrency network
CN102929896A (en) Data mining method based on privacy protection
CN115203290A (en) Fault diagnosis method based on multi-dimensional prefix span algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230224