Disclosure of Invention
The invention provides a process model mining method for multiple concurrent triangle 2-degree loops. It addresses the problem that, when existing algorithms mine multiple concurrent triangle 2-degree loops, the mined model tends to deviate from the original model.
The invention adopts the following technical scheme:
A process model mining method for multiple concurrent triangle 2-degree loops comprises the following steps:
Step 1: define the triangle 2-degree loop concurrency block on the basis of the definition of the triangle 2-degree loop, and divide the activities into subject activities and callback activities according to their occurrence counts;
Defining the triangle 2-degree loop, denoted by $\Delta{>}_L$ or ${<}\Delta_L$
Let $N = (P, T; F, M)$ be a Petri net model and let $a$, $b$ be two transitions in $N$. Then $a\ \Delta{>}_L\ b$ or $b\ {<}\Delta_L\ a$ if and only if:
(2) suppose $M_1 \in R(M_0)$ such that $M_1[a\rangle M_2$ and there is no $M_1[\sigma\rangle M_2$, where $\sigma$ is an occurrence sequence; then only $M_2[b\rangle M_1$ exists, if $M_2$ is not a final marking and there exists $M_2[x\rangle M_3$, where $x \in T$ and $a \neq x \neq b$;
Suppose activities $a_i$, $b_i$ form a triangle 2-degree loop satisfying $a_i\ \Delta{>}_L\ b_i$ or $b_i\ {<}\Delta_L\ a_i$; then $a_i$ is a subject activity and $b_i$ is a callback activity. The set of all subject activities and the set of all callback activities are called the subject activity set and the callback activity set, respectively, and are formally defined as follows:
Defining the subject activity set and the callback activity set
Let $Bo_L$ be the subject activity set and $C_L$ the callback activity set, where:
Defining the triangle 2-degree loop concurrency block
Let the activities in the pairs $(a_1, b_1), (a_2, b_2), \dots, (a_n, b_n)$ all satisfy $a_i\ \Delta{>}_L\ b_i$. When the $n$ triangle 2-degree loops are concurrent, there exist unique transitions $x$ and $y$ such that:
(1) $x = {}^{\bullet}({}^{\bullet}a_1) \cap {}^{\bullet}({}^{\bullet}a_2) \cap \dots \cap {}^{\bullet}({}^{\bullet}a_n)$;
(2) $y = (a_1^{\bullet})^{\bullet} \cap (a_2^{\bullet})^{\bullet} \cap \dots \cap (a_n^{\bullet})^{\bullet}$;
The structure formed by $x$, $y$ and the $n$ concurrent triangle 2-degree loops is called a triangle 2-degree loop concurrency block, where $h_\Delta = x$ is the block head activity and $t_\Delta = y$ is the block tail activity;
Two activities that occur consecutively in a trace form a direct-following relation, from which the concurrency relation and the causal relation can be derived. The direct-following set is defined as follows:
Defining the direct-following set
The elements of the direct-following set $D_L$ are the pairs of activities that satisfy the $>_L$ relation in some trace, i.e. $D_L = \{(a, b) \mid a >_L b\}$.
The activities that form a loop structure appear in the log several times, and the relationship between their occurrence counts is an important criterion for detecting the loop. The number of occurrences of an activity is defined as follows:
Defining the number of occurrences of an activity
Let $L$ be a log, trace $\sigma \in L$ and activity $a \in \sigma$. $sum(a, \sigma)$ denotes the number of occurrences of the activity in the trace, and $sum(a, L)$ denotes the total number of occurrences of the activity in the log;
Algorithm 1: classification algorithm for subject activities and callback activities
Input: a log $L$ satisfying local completeness;
Output: the subject activity set $Bo_L$ and the callback activity set $C_L$;
Step (1): create a one-dimensional array LTM to count activity occurrences; create the direct-following set $D_L$, the subject activity set $Bo_L$, the callback activity set $C_L$ and the block head activity $h_\Delta$ of the triangle 2-degree loop concurrency block, and initialize them;
Step (2): traverse the log $L$; put the start activities into the start activity set $T_I$, the end activities into the end activity set $T_O$ and all activities into the activity set $T_L$; form pairs from consecutively occurring activities and put them into the direct-following set $D_L$;
Step (3): traverse the log $L$, count the number of occurrences of each activity in $T_L$, and store the counts in the corresponding positions of the array LTM;
Step (4): traverse the array LTM; if the difference between two elements of the array is greater than 0 and the two corresponding activities are in a concurrency relation in the log, traverse any trace in the log and assign the activity immediately preceding the first of the two activities to the block head activity $h_\Delta$;
Step (5): traverse the activity set $T_L$; put the activities that are in a causal relation with the block head activity $h_\Delta$ into $Bo_L$, and put the activities that are in both a concurrency relation and a direct-following relation with activities in $Bo_L$ into $C_L$;
Step (6): return the subject activity set $Bo_L$ and the callback activity set $C_L$;
Step 2: define the position at which an activity first appears in a trace, and delete incorrect activity matches by pruning so as to obtain the correct activity matches;
Defining the position at which an activity first appears in a trace
Let trace $\sigma \in L$ and activity $a \in \sigma$; $first(a, \sigma)$ denotes the position index at which activity $a$ first occurs in trace $\sigma$;
Defining the first-marking position matrix
Let $L$ be a log and $Bo_L \cup C_L$ the set of subject and callback activities. The first-marking position matrix is $FM[|L|][|Bo_L \cup C_L|]$, which satisfies, for $\sigma_i \in L$ and $a_j \in Bo_L \cup C_L$, $FM[\sigma_i][a_j] = first(a_j, \sigma_i)$;
Matching results are obtained from the first-marking position matrix. The set of all matching results is called the matching result set. The matching result and the matching result set are defined as follows:
Defining the matching result and the matching result set
A matching result is a pair $mt_l = (a, b)$ in which one activity is a subject activity and the other is a callback activity; the two activities in the pair cannot both be subject activities or both be callback activities.
The matching result set $MT_L$ is the set of all matching results $mt_l$, i.e. $MT_L = \{(a, b) \mid (a \in Bo_L \wedge b \in C_L) \vee (b \in Bo_L \wedge a \in C_L)\}$;
Algorithm 2: matching algorithm for subject activities and callback activities
Input: a log $L$ satisfying local completeness, the subject activity set $Bo_L$ and the callback activity set $C_L$;
Output: the matching result set $MT_L$;
Step (1): create the first-marking position matrix $FM[|L|][|Bo_L \cup C_L|]$, the matching result set $MT_L$ and the matching result $mt_l$, and initialize them;
Step (2): form the Cartesian product of the activities in $Bo_L$ and $C_L$, assign each resulting pair to $mt_l$, and put all $mt_l$ into $MT_L$;
Step (3): traverse the log $L$, record the position at which each activity in $Bo_L \cup C_L$ first appears in each trace, and store it in the corresponding entry of $FM[|L|][|Bo_L \cup C_L|]$;
Step (4): traverse $FM[|L|][|Bo_L \cup C_L|]$; if, in a trace, the position of an activity in $C_L$ is smaller than the position of an activity in $Bo_L$, delete the pairs formed by these two activities from $MT_L$;
Step (5): return the matching result set $MT_L$;
Step 3: obtain the AlphaMatch algorithm and complete the mining of the process model with multiple concurrent triangle 2-degree loops;
Defining the AlphaMatch algorithm
Let $L$ be an activity-based log. Then $\mathrm{AlphaMatch}(L)$ is defined as follows:
(9) $P_L = \{P_{(A,B)} \mid (A, B) \in Y_L\} \cup \{i_L, o_L\}$;
(10) $F_L = \{(a, P_{(A,B)}) \mid (A, B) \in Y_L \wedge a \in A\} \cup \{(P_{(A,B)}, b) \mid (A, B) \in Y_L \wedge b \in B\} \cup \{(i_L, t) \mid t \in T_I\} \cup \{(t, o_L) \mid t \in T_O\}$;
(11) $\mathrm{AlphaMatch}(L) = (P_L, T_L, F_L)$;
The AlphaMatch algorithm classifies the subject activities and callback activities, matches the subject activities with the callback activities, returns the correctly matched result set $MT_L$, and derives the relations between the activities in the result set.
The invention has the beneficial effects that:
In the process model mining method for multiple concurrent triangle 2-degree loops, the concept of the triangle 2-degree loop concurrency block is first introduced on the basis of the definition of the triangle 2-degree loop, and the activities are divided into subject activities and callback activities according to their occurrence counts. Incorrect activity matches are then deleted by pruning, based on the position at which each activity first appears in a trace, so that the correct activity matches are obtained. The method is simple to implement, easy to operate, has low dependence on log completeness and high accuracy, and does not mine relations that are absent from the log.
The algorithm is implemented as a plug-in on the ProM platform. Experimental analysis shows that the algorithm mines multiple concurrent triangle 2-degree loops accurately and effectively, and that the models obtained by the method have higher accuracy and fitness.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings:
defining traces and event logs
Let $A$ be a set of activities. A trace $\sigma \in A^{*}$ is a sequence of activities, and an event log $L$ is a multiset of traces, i.e., $L \in \mathcal{B}(A^{*})$.
A Petri net is a model for describing distributed systems; it can describe the structure of a system and simulate its operation. Formally, a Petri net is a directed bipartite graph without isolated nodes.
Defining a Petri Net
A Petri net is a four-tuple $N = (P, T; F, M)$, where $P$ is a finite set of places and $T$ is a finite set of transitions. $N$ satisfies:
(3) $F \subseteq (P \times T) \cup (T \times P)$ is the set of directed arcs, called the flow relation;
(4) $M: P \to \{0, 1, 2, 3, \dots\}$ is called a marking of $N$, and $M_0$ denotes the initial marking;
(5) $\mathrm{dom}(F) \cup \mathrm{cod}(F) = P \cup T$,
where $\mathrm{dom}(F) = \{x \mid \exists y: (x, y) \in F\}$ and $\mathrm{cod}(F) = \{y \mid \exists x: (x, y) \in F\}$.
The pre-set and post-set describe the local neighborhood of a place or transition, and are defined as follows:
Defining the pre-set and the post-set
Let $N = (P, T; F, M)$ be a Petri net. For $x \in P \cup T$, let
${}^{\bullet}x = \{y \mid y \in P \cup T \wedge (y, x) \in F\}$
$x^{\bullet} = \{y \mid y \in P \cup T \wedge (x, y) \in F\}$
${}^{\bullet}x$ is called the pre-set of $x$, and $x^{\bullet}$ is called the post-set of $x$.
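The pre-set and post-set definitions above can be illustrated with a minimal Python sketch. The small net used here (places `p1`..`p3`, transitions `t1`, `t2`) and the helper names `preset` and `postset` are assumptions introduced only for this illustration; they do not appear in the figures of this description.

```python
def preset(x, flow):
    """Return the pre-set of node x: all y with (y, x) in the flow relation."""
    return {y for (y, z) in flow if z == x}


def postset(x, flow):
    """Return the post-set of node x: all y with (x, y) in the flow relation."""
    return {z for (y, z) in flow if y == x}


# Hypothetical flow relation F of a sequential net: p1 -> t1 -> p2 -> t2 -> p3
F = {("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p3")}

print(preset("p2", F))   # {'t1'}
print(postset("p2", F))  # {'t2'}
print(preset("p1", F))   # set()
```

Nested expressions such as the pre-set of the pre-set of a transition, used in the definition of the concurrency block, could be obtained by applying `preset` to every element of a pre-set and taking the union.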
Any two activities in a trace stand in some ordering relation to each other. The four common ordering relations are as follows:
Defining the log-based ordering relations
Let $L$ be an activity-based event log, $\sigma$ a trace in the log, and $a$, $b$ any two activities that occur in log $L$. Then:
(1) $a >_L b$ if and only if there is a trace $\sigma = \langle t_1, t_2, t_3, \dots, t_n \rangle$ and $i \in \{1, 2, \dots, n-1\}$ such that $\sigma \in L$, $t_i = a$ and $t_{i+1} = b$;
(2) $a \to_L b$ if and only if $a >_L b$ and not $b >_L a$;
(3) $a \#_L b$ if and only if neither $a >_L b$ nor $b >_L a$;
(4) $a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$.
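As a minimal sketch of these four relations, the following Python snippet derives them from the direct-following pairs of a log. The function names `directly_follows` and `ordering` are assumptions made for this illustration; the two traces are the first two traces of the example logs used later in this description.

```python
def directly_follows(log):
    """Collect all pairs (a, b) such that b directly follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def ordering(a, b, log):
    """Classify the log-based ordering relation between activities a and b."""
    df = directly_follows(log)
    ab, ba = (a, b) in df, (b, a) in df
    if ab and ba:
        return "||"   # concurrency
    if ab:
        return "->"   # causality from a to b
    if ba:
        return "<-"   # causality from b to a
    return "#"        # no direct-following in either direction


log = [list("ekacjf"), list("ekcajf")]
print(ordering("k", "a", log))  # ->
print(ordering("a", "c", log))  # ||
print(ordering("e", "j", log))  # #
```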
The process model mining method for multiple concurrent triangle 2-degree loops is described in detail below.
A process model mining method for multiple concurrent triangle 2-degree loops comprises the following steps:
Step 1: define the triangle 2-degree loop concurrency block on the basis of the definition of the triangle 2-degree loop, and divide the activities into subject activities and callback activities according to their occurrence counts;
Defining the triangle 2-degree loop, denoted by $\Delta{>}_L$ or ${<}\Delta_L$
Let $N = (P, T; F, M)$ be a Petri net model and let $a$, $b$ be two transitions in $N$. Then $a\ \Delta{>}_L\ b$ or $b\ {<}\Delta_L\ a$ if and only if:
(2) suppose $M_1 \in R(M_0)$ such that $M_1[a\rangle M_2$ and there is no $M_1[\sigma\rangle M_2$, where $\sigma$ is an occurrence sequence; then only $M_2[b\rangle M_1$ exists, if $M_2$ is not a final marking and there exists $M_2[x\rangle M_3$, where $x \in T$ and $a \neq x \neq b$;
Defining the Alpha algorithm
Let $L$ be an activity-based event log. Then $\alpha(L)$ is defined as follows:
(6) $P_L = \{P_{(A,B)} \mid (A, B) \in Y_L\} \cup \{i_L, o_L\}$;
(8) $\alpha(L) = (P_L, T_L, F_L)$;
Defining the locally complete log
Let $a$, $b$ be any two activities such that $b$ can directly follow $a$. A log in which every such behavior $a >_L b$ occurs at least once in some trace is called a locally complete log.
A mold is typically built from several parts. A plastic casting mold is formed by joining an upper and a lower special metal groove into a cavity; liquid plastic is then poured into the cavity, cooled and molded, and turned into plastic products in subsequent processes. To produce a plastic casting mold, the semi-finished upper and lower grooves are first produced separately. Because the mold requires very high precision, the semi-finished grooves usually do not meet the joining standard, so both grooves must be polished and calibrated, on the one hand so that the edges meet the joining requirement and on the other hand so that the cavity is polished. The process can be abstracted into the following steps:
1) Prepare the mold production material.
2) Produce the semi-finished grooves.
3) Measure the upper groove of the mold.
4) If the upper groove meets the standard, wait for joining; otherwise polish and calibrate the upper groove and return to step 3.
5) Measure the lower groove of the mold.
6) If the lower groove meets the standard, wait for joining; otherwise polish and calibrate the lower groove and return to step 5.
7) After both grooves have been measured as qualified, join them into a plastic casting mold.
8) The finished mold enters the subsequent plastic production flow.
Steps 3 and 5 can be carried out concurrently, as can steps 4 and 6. The process model of the mold machining is shown in FIG. 1:
As the model in FIG. 1 shows, two triangle 2-degree loops are concurrent in the model, and a and b, as well as c and d, have a clear precedence relationship. Activity a must be executed first, then activity b; after activity b completes, activity a must be executed again, so activity b always occurs between two occurrences of activity a. Activities c and d follow the same relationship. The model can generate two types of logs. One type is a complete log containing explicit triangle 2-degree loop behaviors such as "aba" and "cdc" [12], for example the log $L_1$ = <e k a c j f, e k c a j f, e k a b a c j f, e k a c b a j f, e k c d c a j, e k c d a c j f, e k a b a c d c j f, e k c d c a b a j f, …>. The other type is a locally complete log without explicit loop behaviors such as "aba" and "cdc", for example the log $L_2$ = <e k a c j f, e k c a j f, e k a b c a j f, e k a c b a j f, e k c d a c j f, e k c a d c j f, e k a c d b a c j f, e k a c b d a c j f>. For log $L_2$, existing algorithms cannot mine the correct model. Taking the Alpha+ algorithm as an example, its mining result for log $L_2$ is shown in FIG. 2. Because the Alpha+ algorithm only mines the concurrency relations among a, b, c and d and does not mine any triangle 2-degree loop, the resulting model contains two independent transitions and is clearly not the correct model.
Although no explicit loop behavior occurs in the log, the log still retains the characteristics of the loop: (1) the order in which activities a and b occur in a trace is fixed; (2) the relationship between the occurrence counts of activities a and b in the log is unchanged.
The research focus is to use these structural features to mine a model containing multiple concurrent triangle 2-degree loops from a locally complete log that lacks explicit loop behavior. The core of the problem is how to correctly match two activities into a triangle 2-degree loop. For the model shown in FIG. 2, for example, the core of the problem is how to match activities b and d with activities a and c to form triangle 2-degree loops.
The algorithm is described in detail by taking a log generated by the model in fig. 1 as an example.
Suppose activities $a_i$, $b_i$ form a triangle 2-degree loop satisfying $a_i\ \Delta{>}_L\ b_i$ or $b_i\ {<}\Delta_L\ a_i$; then $a_i$ is a subject activity and $b_i$ is a callback activity. The set of all subject activities and the set of all callback activities are called the subject activity set and the callback activity set, respectively, and are formally defined as follows:
Defining the subject activity set and the callback activity set
Let $Bo_L$ be the subject activity set and $C_L$ the callback activity set, where:
Defining the triangle 2-degree loop concurrency block
Let the activities in the pairs $(a_1, b_1), (a_2, b_2), \dots, (a_n, b_n)$ all satisfy $a_i\ \Delta{>}_L\ b_i$. When the $n$ triangle 2-degree loops are concurrent, there exist unique transitions $x$ and $y$ such that:
(1) $x = {}^{\bullet}({}^{\bullet}a_1) \cap {}^{\bullet}({}^{\bullet}a_2) \cap \dots \cap {}^{\bullet}({}^{\bullet}a_n)$;
(2) $y = (a_1^{\bullet})^{\bullet} \cap (a_2^{\bullet})^{\bullet} \cap \dots \cap (a_n^{\bullet})^{\bullet}$.
The structure formed by $x$, $y$ and the $n$ concurrent triangle 2-degree loops is called a triangle 2-degree loop concurrency block, where $h_\Delta = x$ is the block head activity and $t_\Delta = y$ is the block tail activity.
As shown in FIG. 3, the activities in each of the pairs (c, d) and (a, b) form a triangle 2-degree loop, where a and c are subject activities and b and d are callback activities. The block head activity k, the block tail activity j and the two concurrent triangle 2-degree loops together form a triangle 2-degree loop concurrency block.
Two activities that occur consecutively in a trace form a direct-following relation, from which other relations such as the concurrency relation and the causal relation can be derived. The direct-following set is defined as follows:
Defining the direct-following set
The elements of the direct-following set $D_L$ are the pairs of activities that satisfy the $>_L$ relation in some trace, i.e. $D_L = \{(a, b) \mid a >_L b\}$.
For example, in $\sigma_3$ = <e a c f>, $e >_L a$, $a >_L c$ and $c >_L f$, therefore $D_L = \{(e, a), (a, c), (c, f)\}$.
The activities that form a loop structure may appear in the log many times, and the relationship between their occurrence counts is an important criterion for detecting the loop. The number of occurrences of an activity is defined as follows:
Defining the number of occurrences of an activity
Let $L$ be a log, trace $\sigma \in L$ and activity $a \in \sigma$. $sum(a, \sigma)$ denotes the number of occurrences of the activity in the trace, and $sum(a, L)$ denotes the total number of occurrences of the activity in the log.
For example, for trace $\sigma_1$ = <eacf>, $sum(a, \sigma_1) = 1$; for $L$ = {<ecabaf>}, $sum(a, L) = 2$.
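The two counting functions can be sketched in a few lines of Python. The names `sum_in_trace` and `sum_in_log` are assumptions chosen for this illustration (they stand for $sum(a, \sigma)$ and $sum(a, L)$); traces are represented as lists of single-letter activities.

```python
def sum_in_trace(a, trace):
    """sum(a, sigma): number of occurrences of activity a in one trace."""
    return trace.count(a)


def sum_in_log(a, log):
    """sum(a, L): total number of occurrences of activity a over all traces."""
    return sum(sum_in_trace(a, trace) for trace in log)


sigma1 = list("eacf")
L = [list("ecabaf")]
print(sum_in_trace("a", sigma1))  # 1
print(sum_in_log("a", L))         # 2
```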
The characteristics of the activities in the log correctly reflect the model structure. The characteristics of the triangle 2-degree loop structure are abstracted into a position theorem and a number theorem, and proofs are given.
Theorem 1 (position theorem): if activities $a_i, b_i \in \sigma$, $\sigma \in L$, and $a_i\ \Delta{>}_L\ b_i$, then $first(a_i, \sigma) < first(b_i, \sigma)$.
Proof: let activities $a_i, b_i \in \sigma$, $\sigma \in L$, satisfy $a_i\ \Delta{>}_L\ b_i$, where $a_i$ is the subject activity and $b_i$ is the callback activity. By the definition of the triangle 2-degree loop, the loop necessarily generates a sequence of the form <$a_i b_i a_i b_i a_i b_i \dots b_i a_i$>, in which the first activity to occur is $a_i$, and only afterwards does the first $b_i$ occur. Hence the index of the first $a_i$ in $\sigma$ must be smaller than the index of the first $b_i$ in $\sigma$. Q.E.D.
Besides satisfying the position theorem, the number of activities in the loop structure has a certain rule.
Theorem 2 (number theorem): if activities $a_i$ and $b_i$ satisfy $a_i\ \Delta{>}_L\ b_i$, then for any trace $\sigma$ and any log $L$ generated by a model containing the triangle 2-degree loop formed by $a_i$, $b_i$: $sum(a_i, \sigma) - sum(b_i, \sigma) = 1$ and $sum(a_i, L) - sum(b_i, L) = |L|$.
Proof:
1) If the triangle 2-degree loop is not entered, the number of $a_i$ in the trace is 1 and the number of $b_i$ is 0, so the theorem holds.
2) If the triangle 2-degree loop is entered, the loop formed by $a_i$, $b_i$ necessarily generates a sequence of the form <$a_i b_i a_i b_i a_i b_i \dots b_i a_i$>. The first activity to occur is $a_i$, followed by pairs <$b_i a_i$>, so the number of $a_i$ in $\sigma$ always exceeds the number of $b_i$ by exactly 1, and the theorem holds in this case as well.
Every trace satisfies this quantity relation, so the total number of $a_i$ in the log exceeds the total number of $b_i$ by $1 \times |L| = |L|$. Q.E.D.
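As a quick numerical check of the number theorem, the following Python sketch tests both equalities on the eight traces of the example log $L_3$ used in this description. The function name `satisfies_number_theorem` is an assumption made for this illustration, and the check is only a sanity test on one log, not part of the claimed method.

```python
from collections import Counter

# Traces of the example log L3, one list of activities per trace.
L3 = [t.split() for t in [
    "e k a b c a j f", "e k a c b a j f", "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f", "e k a c j f", "e k c a j f",
]]


def satisfies_number_theorem(a, b, log):
    """Check sum(a, s) - sum(b, s) == 1 per trace and sum(a, L) - sum(b, L) == |L|."""
    per_trace = all(t.count(a) - t.count(b) == 1 for t in log)
    total = Counter(x for t in log for x in t)
    return per_trace and total[a] - total[b] == len(log)


print(satisfies_number_theorem("a", "b", L3))  # True
print(satisfies_number_theorem("c", "d", L3))  # True
print(satisfies_number_theorem("a", "d", L3))  # False
```

For the pair (a, d) the log totals still differ by $|L|$, but the per-trace equality fails, for instance in the first trace, where a occurs twice and d does not occur.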
Algorithm 1 is the classification algorithm for subject activities and callback activities. It classifies the subject activities and callback activities in the log mainly according to the definition of the triangle 2-degree loop concurrency block and the quantity relation in Theorem 2, and puts them into $Bo_L$ and $C_L$ respectively.
Algorithm 1: classification algorithm for subject activities and callback activities
Input: a log $L$ satisfying local completeness;
Output: the subject activity set $Bo_L$ and the callback activity set $C_L$;
Step (1): create a one-dimensional array LTM to count activity occurrences; create the direct-following set $D_L$, the subject activity set $Bo_L$, the callback activity set $C_L$ and the block head activity $h_\Delta$ of the triangle 2-degree loop concurrency block, and initialize them.
Step (2): traverse the log $L$; put the start activities into the start activity set $T_I$, the end activities into the end activity set $T_O$ and all activities into the activity set $T_L$; form pairs from consecutively occurring activities and put them into the direct-following set $D_L$.
Step (3): traverse the log $L$, count the number of occurrences of each activity in $T_L$, and store the counts in the corresponding positions of the array LTM.
Step (4): traverse the array LTM; if the difference between two elements of the array is greater than 0 and the two corresponding activities are in a concurrency relation in the log, traverse any trace in the log and assign the activity immediately preceding the first of the two activities to the block head activity $h_\Delta$.
Step (5): traverse the activity set $T_L$; put the activities that are in a causal relation with the block head activity $h_\Delta$ into $Bo_L$, and put the activities that are in both a concurrency relation and a direct-following relation with activities in $Bo_L$ into $C_L$.
Step (6): return the subject activity set $Bo_L$ and the callback activity set $C_L$.
Take the log $L_3$ = [<e k a b c a j f>, <e k a c b a j f>, <e k c d a c j f>, <e k c a d c j f>, <e k a c d b a c j f>, <e k a c b d a c j f>, <e k a c j f>, <e k c a j f>] as an example. In Step (1), all elements of the LTM array are 0. Step (2) yields $D_L = \{e >_L k,\ a >_L c,\ c >_L b,\ c >_L d,\ b >_L c,\ b >_L a,\ k >_L c,\ b >_L d,\ d >_L c,\ d >_L b,\ c >_L j,\ a >_L d,\ k >_L a,\ d >_L a,\ a >_L j,\ c >_L a,\ a >_L b,\ j >_L f\}$. Step (3) fills the LTM array; its counts are shown in Table 1. Step (4) yields the block head activity $h_\Delta = k$. In Step (5), activities a and c each occur 12 times and activities b and d each occur 4 times, which satisfies Theorem 2; hence $Bo_L = \{c, a\}$ and $C_L = \{d, b\}$. Step (6) returns $Bo_L$ and $C_L$.
Table 1. LTM obtained by Algorithm 1 for $L_3$
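Steps (2), (3) and (5) of Algorithm 1 can be sketched in Python as follows. This is only an illustrative reading under stated assumptions: the block head activity is passed in as a parameter instead of being derived as in Step (4), the helper name `classify` is hypothetical, and the condition that a callback activity occurs less often than its subject activity is an assumed interpretation of the count criterion.

```python
from collections import Counter

# Traces of the example log L3, one list of activities per trace.
L3 = [t.split() for t in [
    "e k a b c a j f", "e k a c b a j f", "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f", "e k a c j f", "e k c a j f",
]]


def classify(log, head):
    """Hypothetical helper: Steps (2), (3) and (5) for a given block head."""
    # Step (2): direct-following set D_L.
    d_l = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    # Step (3): occurrence counts (the LTM array, here a Counter).
    ltm = Counter(x for t in log for x in t)

    def causal(x, y):
        return (x, y) in d_l and (y, x) not in d_l

    def concurrent(x, y):
        return (x, y) in d_l and (y, x) in d_l

    # Step (5): subject activities are in a causal relation with the block head.
    bo = {x for x in ltm if causal(head, x)}
    # Callback activities are concurrent with a subject activity and occur
    # less often than it (assumed reading of the count criterion).
    c = {y for y in set(ltm) - bo
         if any(concurrent(y, x) and ltm[y] < ltm[x] for x in bo)}
    return bo, c, ltm


bo, c, ltm = classify(L3, head="k")
print(sorted(bo), sorted(c))                   # ['a', 'c'] ['b', 'd']
print(ltm["a"], ltm["c"], ltm["b"], ltm["d"])  # 12 12 4 4
```

With the block head k, the sketch reproduces the sets $Bo_L$ and $C_L$ and the occurrence counts stated in the worked example.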
Step 2: define the position at which an activity first appears in a trace, and delete incorrect activity matches by pruning so as to obtain the correct activity matches;
Defining the position at which an activity first appears in a trace
Let trace $\sigma \in L$ and activity $a \in \sigma$. $first(a, \sigma)$ denotes the position index at which activity $a$ first occurs in trace $\sigma$.
For example, for trace $\sigma_1$ = <eacf>, $first(a, \sigma_1) = 2$; for trace $\sigma_2$ = <ecabaf>, $first(a, \sigma_2) = 3$.
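A minimal Python sketch of $first(a, \sigma)$, using 1-based positions as in the two examples above. The function name `first` mirrors the notation of this description; representing a trace as a list of single-letter activities is an assumption of the sketch.

```python
def first(a, trace):
    """Return the 1-based position at which activity a first occurs in trace.

    The definition requires a to occur in the trace; list.index raises
    ValueError otherwise.
    """
    return trace.index(a) + 1


print(first("a", list("eacf")))    # 2
print(first("a", list("ecabaf")))  # 3
```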
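Defining the first-marking position matrix
Let $L$ be a log and $Bo_L \cup C_L$ the set of subject and callback activities. The first-marking position matrix is $FM[|L|][|Bo_L \cup C_L|]$, which satisfies, for $\sigma_i \in L$ and $a_j \in Bo_L \cup C_L$, $FM[\sigma_i][a_j] = first(a_j, \sigma_i)$.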
Matching results are obtained from the first-marking position matrix. The set of all matching results is called the matching result set. The matching result and the matching result set are defined as follows:
Defining the matching result and the matching result set
A matching result is a pair $mt_l = (a, b)$ in which one activity is a subject activity and the other is a callback activity. The two activities in the pair cannot both be subject activities or both be callback activities.
The matching result set $MT_L$ is the set of all matching results $mt_l$, i.e. $MT_L = \{(a, b) \mid (a \in Bo_L \wedge b \in C_L) \vee (b \in Bo_L \wedge a \in C_L)\}$.
Algorithm 2 is the matching algorithm for subject activities and callback activities. It first puts all candidate matching results into the matching result set and then performs activity matching according to Theorem 1, using the position at which each activity first appears. The matching process is a pruning process: if the two activities in a matching result $mt_l$ do not satisfy Theorem 1, the matching result is deleted.
Algorithm 2: matching algorithm for subject activities and callback activities
Input: a log $L$ satisfying local completeness, the subject activity set $Bo_L$ and the callback activity set $C_L$;
Output: the matching result set $MT_L$;
Step (1): create the first-marking position matrix $FM[|L|][|Bo_L \cup C_L|]$, the matching result set $MT_L$ and the matching result $mt_l$, and initialize them.
Step (2): form the Cartesian product of the activities in $Bo_L$ and $C_L$, assign each resulting pair to $mt_l$, and put all $mt_l$ into $MT_L$.
Step (3): traverse the log $L$, record the position at which each activity in $Bo_L \cup C_L$ first appears in each trace, and store it in the corresponding entry of $FM[|L|][|Bo_L \cup C_L|]$.
Step (4): traverse $FM[|L|][|Bo_L \cup C_L|]$; if, in a trace, the position of an activity in $C_L$ is smaller than the position of an activity in $Bo_L$, delete the pairs formed by these two activities from $MT_L$.
Step (5): return the matching result set $MT_L$.
Take $L_3$ as an example. In Step (1), all elements of the FM matrix are 0. Step (2) puts the Cartesian product of $C_L$ and $Bo_L$ into $MT_L$, giving $MT_L$ = {(a, b), (a, d), (b, a), (d, a), (c, b), (c, d), (b, c), (d, c)}. Step (3) obtains the position at which each activity in $C_L \cup Bo_L$ first occurs in each trace; the resulting $FM[|L|][|Bo_L \cup C_L|]$ is shown in Table 2. Step (4) searches for activities in $C_L$ whose position index is smaller than that of an activity in $Bo_L$. In $\sigma_1$, for example, $first(b, \sigma_1) = 2 < first(c, \sigma_1) = 3$. By Theorem 1, subject activity c and callback activity b therefore cannot be matched, and the two pairs (b, c) and (c, b) are deleted from $MT_L$. In $\sigma_3$, $first(d, \sigma_3) = 2 < first(a, \sigma_3) = 3$; in the same way, the two pairs (a, d) and (d, a) are deleted from $MT_L$. After FM has been traversed, the four pairs (a, b), (b, a), (c, d) and (d, c) remain in $MT_L$, and the activities in each remaining pair can form a triangle 2-degree loop.
Table 2. First-marking position matrix FM of $L_3$
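The pruning idea of Algorithm 2 can be sketched in Python as follows. The function name `match` is hypothetical, and two details are assumptions of this sketch rather than statements of the description: positions are 1-based indices in the full trace (only their relative order matters for the pruning), and an entry of 0 marks an activity that does not occur in a trace, in which case the trace is skipped for the pairs involving that activity.

```python
from itertools import product

# Traces of the example log L3, one list of activities per trace.
L3 = [t.split() for t in [
    "e k a b c a j f", "e k a c b a j f", "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f", "e k a c j f", "e k c a j f",
]]


def match(log, bo, c):
    """Hypothetical sketch of Algorithm 2: prune pairs that violate Theorem 1."""
    acts = bo | c
    # Step (3): FM as one dict per trace; 0 means "activity absent" (assumption).
    fm = [{a: (t.index(a) + 1 if a in t else 0) for a in acts} for t in log]
    # Step (2): all candidate pairs, in both orders.
    mt = set(product(bo, c)) | set(product(c, bo))
    # Step (4): delete a pair when the callback first occurs before the subject.
    for row in fm:
        for subject, callback in product(bo, c):
            if row[subject] and row[callback] and row[callback] < row[subject]:
                mt.discard((subject, callback))
                mt.discard((callback, subject))
    return mt


mt = match(L3, {"a", "c"}, {"b", "d"})
print(sorted(mt))  # [('a', 'b'), ('b', 'a'), ('c', 'd'), ('d', 'c')]
```

On $L_3$ the sketch leaves the same four pairs as the worked example: the first trace removes the pairs of b and c, and the third trace removes the pairs of a and d.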
Step 3: obtain the AlphaMatch algorithm and complete the mining of the process model with multiple concurrent triangle 2-degree loops;
Defining the AlphaMatch algorithm
Let $L$ be an activity-based log. Then $\mathrm{AlphaMatch}(L)$ is defined as follows:
(9) $P_L = \{P_{(A,B)} \mid (A, B) \in Y_L\} \cup \{i_L, o_L\}$;
(10) $F_L = \{(a, P_{(A,B)}) \mid (A, B) \in Y_L \wedge a \in A\} \cup \{(P_{(A,B)}, b) \mid (A, B) \in Y_L \wedge b \in B\} \cup \{(i_L, t) \mid t \in T_I\} \cup \{(t, o_L) \mid t \in T_O\}$;
(11) $\mathrm{AlphaMatch}(L) = (P_L, T_L, F_L)$.
Compared with the classical Alpha algorithm, the AlphaMatch algorithm classifies the subject activities and the callback activities, matches the subject activities with the callback activities, and finally returns the correctly matched result set $MT_L$ and derives the relations between the activities in the result set. Taking $L_3$ as an example, the relations between activities mined by the AlphaMatch algorithm are shown in Table 3.
As Table 3 shows, activities a and b, and c and d, are matched together. The resulting model for $L_3$ is shown in FIG. 4 and is consistent with the original model of the case.
Table 3. Footprint of $L_3$
Example 1
The concurrent structure of triangle 2-degree loops is widely used in fields such as mold production, part machining, flexible manufacturing, precision instrument production, medical instrument production and sensor production.
Taking a production process model of a ball bearing as an example, a locally complete log without explicit loop behaviors such as "aba" is obtained through the following steps:
1) input the process model for ball bearing production, which contains three concurrent triangle 2-degree loops, as shown in FIG. 5;
2) run the "Perform a simple simulation of a (stochastic) Petri net" plug-in in ProM to obtain a log of the original model;
3) manually select the locally complete logs that meet the requirements. The attributes of the logs used in the experiments are shown in Table 4:
The experiments compare the results mined by the AlphaMatch algorithm, the Alpha+ algorithm, the ILP algorithm and the Inductive Miner-infrequent (IMF) algorithm.
Table 4. Log attributes
Log $L_4$, obtained from the original model shown in FIG. 5, is imported, and the mining results of the Alpha+ algorithm, the ILP algorithm, the Inductive Miner-infrequent (IMF) algorithm and the AlphaMatch algorithm are compared. The result of the Alpha+ algorithm is shown in FIG. 6. Because the log is not fully complete, the Alpha+ algorithm only mines the concurrency and causal relations of activities a, b, c, d, g and h. It does not mine the relations between the three callback activities and the other activities, so the model in FIG. 6 contains three independent transitions and differs greatly from the original model. This model is therefore not reasonable.
FIG. 7 shows the model mined by the ILP algorithm. Compared with the Alpha+ algorithm, the ILP algorithm yields a model in which the subject activities and callback activities are matched correctly, but the model contains many ordering relations that exist neither in the original model nor in the log, for example $e \to_L b$ and $e \to_L d$. The model obtained by the ILP algorithm is therefore not reasonable.
FIG. 8 shows the model mined by the Inductive Miner-infrequent (IMF) algorithm. This algorithm does not match activities; instead it separates the callback activities and the subject activities into two parts and adds a large number of invisible transitions, which makes the model structure relatively complex. In addition, if subject activity a occurs first and the other two subject activities have not yet occurred, any of the three callback activities may occur immediately after activity a. The sequences generated in this case, for example "aha", may be impossible in the original model. The model in FIG. 8 is therefore not reasonable.
FIG. 9 shows the model mined by the method of the present invention. Compared with the model mined by the Alpha+ algorithm, the model in FIG. 9 matches the activities correctly and contains no independent transitions. Compared with the model mined by the ILP algorithm, it contains no erroneous relations between activities. Compared with the model mined by the IMF algorithm, it does not produce sequences that the original model cannot generate. The model is consistent with the original model.
In summary, in terms of the mined model, the model mined by the present algorithm is consistent with the original model, which is a clear advantage over the other algorithms.
The four resulting models are then analyzed in terms of fitness. Logs $L_4$, $L_5$, $L_6$ and $L_7$ of different sizes and different degrees of completeness, all generated by the original model, are imported. Of the four logs, $L_7$ contains the most traces and has the strongest completeness. The models and logs are fed into the "Replay a Log on Petri Net for Performance Analysis" plug-in of the ProM platform to obtain the fitness of the models mined by the four algorithms; the statistics are shown in FIG. 10. The fitness obtained by the AlphaMatch algorithm and the IMF algorithm is always 1, higher than that of the other two algorithms. However, because the IMF algorithm splits the subject activities and callback activities into two parts for mining, its model can also generate sequences such as "ada" and "aha" that the original model cannot generate, so that model is unreasonable. When mining logs $L_4$, $L_5$ and $L_6$, the ILP algorithm yields models with low fitness; when mining log $L_7$, its fitness also reaches 1 owing to the stronger log completeness, and the ILP algorithm then also yields the correct model. By contrast, the present algorithm maintains a high fitness even when completeness is poor, so it has an advantage over the ILP algorithm with respect to log completeness requirements. Since none of the four logs is a fully complete log as required by the Alpha+ algorithm, the Alpha+ algorithm cannot obtain the relations between the callback activities, and the fitness of its model is relatively low.
In summary, the present algorithm has an advantage in the fitness of the resulting models.
The accuracy of the models obtained by the four algorithms is then analyzed. The accuracy is computed with the "Check Precision based on Align-ETConformance" plug-in in ProM; the statistics are shown in FIG. 11. Because three independent transitions occur in the model mined by the Alpha+ algorithm, the accuracy of that model is the lowest. Because the model mined by the IMF algorithm can generate a large number of activity sequences that the original model cannot generate, its accuracy is also not high. Because the completeness of logs $L_4$, $L_5$ and $L_6$ is weak, the models mined by the ILP algorithm differ from the original model to some extent, so their accuracy is slightly lower than that of the present algorithm. When the ILP algorithm mines $L_7$, however, the stronger log completeness yields a correct model consistent with the original model, and the accuracy of the ILP model equals that of the model obtained by the present algorithm. By contrast, the present algorithm has lower requirements on log completeness and achieves higher accuracy.
In conclusion, the model obtained by the present algorithm has a clear advantage in terms of accuracy.
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.