CN108984746B

CN108984746B - Multi-concurrent triangular 2-degree-cycle process model mining method

Info

Publication number: CN108984746B
Application number: CN201810780704.3A
Authority: CN
Inventors: 杜玉越; 孙慧明; 田银花; 王路; 亓亮; 张福新
Original assignee: Shenzhen Xieer Information System Co ltd
Current assignee: Shenzhen Xieer Information System Co ltd
Priority date: 2018-07-17
Filing date: 2018-07-17
Publication date: 2022-05-06
Anticipated expiration: 2038-07-17
Also published as: CN108984746A

Abstract

The invention provides a process model mining method for multiple concurrent triangular 2-degree cycles. First, the concept of a triangular 2-degree cycle concurrency block is introduced based on the definition of the triangular 2-degree cycle, and activities are divided into main activities and callback activities according to their occurrence counts. Then, based on the position at which each activity first appears in a trace, incorrect activity matches are deleted using a pruning approach, yielding the correct activity matches. The method is simple to implement, easy to operate, depends little on log completeness, is highly accurate, and does not mine relations that do not exist in the log. Experimental analysis shows that the algorithm can accurately and effectively mine multiple concurrent triangular 2-degree cycles, and that the resulting model has higher precision and fitness.

Description

Multi-concurrent triangular 2-degree-cycle process model mining method

Technical Field

The invention relates to the field of process mining, and in particular to a method for mining process models that contain multiple concurrent triangular 2-degree cycles.

Background

With the development of computers and the internet, more and more enterprises use information systems to handle their business, and these systems generate large volumes of log files. Process mining, an emerging discipline, aims to extract valuable process-related information from these log files. Process mining has three main applications: process discovery (Process Discovery), conformance checking (Conformance Checking) and process enhancement (Process Enhancement). Process discovery is one of the most challenging tasks in process mining.

In general, process discovery generates a model from an event log that contains no a priori information. Once the model is obtained, it is generally evaluated against four criteria: fitness, precision (accuracy), simplicity and generalization. Fitness represents the ability of the traces in the log to be replayed on the model; precision represents the ability of the model to replay the log; simplicity represents the complexity of the model; generalization represents the ability of the model to allow future behavior. Fitness and precision are the two most important criteria for judging process models.

Against different backgrounds, scholars at home and abroad have proposed many process discovery algorithms aimed at different problems. The literature: van der Aalst W M P, Weijters T, Maruster L. Workflow mining: discovering process models from event logs [J]. IEEE Transactions on Knowledge & Data Engineering, 2004, 16(9): 1128-.

The literature: Wen Lijie, van der Aalst W M P, Wang Jianmin, et al. Mining process models with non-free-choice constructs [J]. Data Mining and Knowledge Discovery, 2007, 15(2): 145-.

The literature: Weijters A J M M, Dongen B F V, Medeiros A K A. Process mining: extending the α-algorithm to mine short loops [C]. Springer Berlin Heidelberg, 2004: 151-. However, when the log satisfies only local completeness and contains no explicit loop behavior such as "aba", neither Alpha nor its extended algorithms can mine the correct model.

The literature: Weijters A J M M, Aalst W M P, Medeiros A K A. Process mining with the Heuristics Miner algorithm [J]. Eindhoven University of Technology, 2006: 1-34, proposes a heuristic process mining algorithm that replays logs according to dependencies. It has great advantages in processing incomplete, noisy logs, but is only average at handling short loops.

The literature: Medeiros A K A D, Weijters A J M M, Aalst W M P V D. Genetic process mining [J]. Data Mining & Knowledge Discovery, 2007, 14(2): 245-304, applies the idea of genetic algorithms to process mining. The method parallelizes well and processes logs quickly, but is not very efficient when short loops are hidden in a large-scale model.

The literature: van der Werf J M E M, Dongen B F V, Hurkens C A J, et al. Process discovery using Integer Linear Programming [J]. Fundamenta Informaticae, 2008, 94(3): 368-387, proposes the ILP algorithm (Integer Linear Programming algorithm), which can solve the short-loop mining problem to some extent but places high demands on the completeness of the log.

The literature: Lin Leilei, Zhou Hua, Dai Fei, et al. An extended Alpha algorithm for mining two-degree cycles [J]. Computer Integrated Manufacturing Systems, 2018, 24(03): 591-601, innovatively divides 2-degree cycles into triangular 2-degree cycles and diamond 2-degree cycles, and proposes a proximity model to solve 2-degree short-loop mining without explicit loop behavior. The proximity model can solve the problem to some extent. However, it is a probabilistic model computed from correlations and relies on a large number of logs. When the log is small, or when the activities of a triangular 2-degree cycle rarely occur adjacent to each other, its ability to recognize triangular 2-degree cycles is limited; in particular, when several triangular 2-degree cycles are concurrent, the matching of activities into triangular 2-degree cycles easily goes wrong.

Disclosure of Invention

The invention aims to provide a process model mining method for multiple concurrent triangular 2-degree cycles, addressing the problem that, with existing methods, the mined model easily deviates from the original model when multiple concurrent triangular 2-degree cycles are mined.

The invention adopts the following technical scheme:

A process model mining method for multiple concurrent triangular 2-degree cycles comprises the following steps:

Step 1: introduce the triangular 2-degree cycle concurrency block based on the definition of the triangular 2-degree cycle, and divide the activities into main activities and callback activities according to their occurrence counts;

Definition: triangular 2-degree cycle, denoted by $\Delta{>}_L$ or ${<}\Delta_L$;

Let $N = (P, T; F, M)$ be a Petri net model and let $a$, $b$ be two transitions in $N$. Then $a\,\Delta{>}_L\,b$ (or $b\,{<}\Delta_L\,a$) if and only if:

(1)

(2) suppose $M_1 \in R(M_0)$ such that $M_1[a\rangle M_2$ and there is no $M_1[\sigma\rangle M_2$, where $\sigma$ is an occurrence sequence; then only $M_2[b\rangle M_1$ exists, and if $M_2$ is not the final marking there also exists $M_2[x\rangle M_3$, where $x \in T$ and $a \neq x \neq b$;

Suppose activities $a_i$, $b_i$ form a triangular 2-degree cycle satisfying $a_i\,\Delta{>}_L\,b_i$ or $b_i\,{<}\Delta_L\,a_i$, where $a_i$ is the subject (main) activity and $b_i$ is the callback activity. The set of all subject activities and the set of all callback activities are called the subject activity set and the callback activity set respectively, formally defined as follows:

Definition: subject activity set and callback activity set

Let $Bo_L$ be the subject activity set and $C_L$ the callback activity set, where:

(1)

(2)

Definition: triangular 2-degree cycle concurrency block

Let the activities in the two-tuples $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$ all satisfy $a_i\,\Delta{>}_L\,b_i$. When the $n$ triangular 2-degree cycles are concurrent, there exist unique transitions $x$ and $y$ such that:

(1) $x = {}^{\bullet}({}^{\bullet}a_1) \cap {}^{\bullet}({}^{\bullet}a_2) \cap \ldots \cap {}^{\bullet}({}^{\bullet}a_n)$;

(2) $y = (a_1^{\bullet})^{\bullet} \cap (a_2^{\bullet})^{\bullet} \cap \ldots \cap (a_n^{\bullet})^{\bullet}$;

The structure formed by $x$, $y$ and the $n$ concurrent triangular 2-degree cycles is called a triangular 2-degree cycle concurrency block, where $h_\Delta = x$ is the block head activity and $t_\Delta = y$ is the block tail activity;

Two activities that occur consecutively in a trace form a directly-follows relation. The concurrency relation and the causal relation can be derived from the directly-follows relation. The directly-follows set is defined as follows:

Definition: directly-follows set

The elements of the directly-follows set $D_L$ are the two-tuples formed by all activities in the traces that satisfy the $>_L$ relation, i.e. $D_L = \{(a, b) \mid a >_L b\}$;

Activities that form a loop structure appear in the log several times, and the relation between their occurrence counts is an important reference for identifying the loop. The number of occurrences of an activity is defined as follows:

Definition: number of occurrences of an activity

Let $L$ be a log, $\sigma \in L$ a trace and $a \in \sigma$ an activity; $\mathit{sum}(a, \sigma)$ denotes the number of occurrences of the activity in the trace, and $\mathit{sum}(a, L)$ denotes the total number of occurrences of the activity in the log;

Algorithm 1: classification algorithm for subject activities and callback activities

Input: a log $L$ satisfying local completeness;

Output: the subject activity set $Bo_L$ and the callback activity set $C_L$;

Step (1): create a one-dimensional array LTM for counting activity occurrences, create the directly-follows set $D_L$, the subject activity set $Bo_L$, the callback activity set $C_L$ and the head activity $h_\Delta$ of the triangular 2-degree cycle concurrency block, and initialize them;

Step (2): traverse the log $L$; put the start activities into the start activity set $T_I$, the end activities into the end activity set $T_O$ and all activities into the activity set $T_L$; form two-tuples from consecutively occurring activities and put them into the directly-follows set $D_L$;

Step (3): traverse the log $L$, count the number of occurrences of each activity in $T_L$, and store the counts in the corresponding positions of the one-dimensional array LTM;

Step (4): traverse the one-dimensional array LTM; if the difference between two elements of the array is greater than 0 and the two corresponding activities are concurrent in the log, traverse any trace of the log and assign the activity immediately preceding the first of the two activities to the block head activity $h_\Delta$;

Step (5): traverse the activity set $T_L$; put the activities that are in a causal relation with the block head activity $h_\Delta$ into the subject activity set $Bo_L$, and put the activities that are in both a concurrency relation and a directly-follows relation with activities in the subject activity set into the callback activity set $C_L$;

Step (6): return the subject activity set $Bo_L$ and the callback activity set $C_L$;

Step 2: define the position at which an activity first occurs in a trace, and delete incorrect activity matches using a pruning approach so as to obtain the correct activity matches;

Definition: position at which an activity first occurs in a trace

Let $\sigma \in L$ be a trace and $a \in \sigma$ an activity; $\mathit{first}(a, \sigma)$ denotes the position index at which activity $a$ first occurs in trace $\sigma$;

Definition: first-occurrence position matrix

Let $L$ be a log and $Bo_L \cup C_L$ the set of subject and callback activities. The first-occurrence position matrix $FM[|L|][|Bo_L \cup C_L|]$ satisfies, for every trace $\sigma_i \in L$ and every activity $a_j \in Bo_L \cup C_L$:

$FM[\sigma_i][a_j] = \mathit{first}(a_j, \sigma_i)$;

Matching results are obtained from the first-occurrence position matrix. The set of all matching results is called the matching result set. The matching result and the matching result set are defined as follows:

Definition: matching result and matching result set

A matching result is a two-tuple $mt_l = (a, b)$, where $a$ is a subject activity and $b$ is a callback activity. The two activities in a two-tuple cannot both be subject activities or both be callback activities, i.e. one of them is a subject activity and the other is a callback activity.

The matching result set $MT_L$ is the set of the matching results $mt_l$, i.e. $MT_L = \{(a, b) \mid (a \in Bo_L \wedge b \in C_L) \vee (b \in Bo_L \wedge a \in C_L)\}$;

Algorithm 2: matching algorithm for subject activities and callback activities

Input: a log $L$ satisfying local completeness, the subject activity set $Bo_L$ and the callback activity set $C_L$;

Output: the matching result set $MT_L$;

Step (1): create the first-occurrence position matrix $FM[|L|][|Bo_L \cup C_L|]$, the matching result set $MT_L$ and the matching result $mt_l$, and initialize them;

Step (2): form the Cartesian product of the activities in the subject activity set $Bo_L$ and the callback activity set $C_L$, assign each resulting two-tuple to $mt_l$, and put all $mt_l$ into the matching result set $MT_L$;

Step (3): traverse the log $L$, record the position at which each activity of $Bo_L \cup C_L$ first occurs in each trace, and store it in the corresponding position of the two-dimensional array $FM[|L|][|Bo_L \cup C_L|]$;

Step (4): traverse the two-dimensional array $FM[|L|][|Bo_L \cup C_L|]$; if the position of an activity in the callback activity set $C_L$ is smaller than the position of an activity in the subject activity set $Bo_L$, delete the two-tuples formed by these two activities from the matching result set $MT_L$;

Step (5): return the matching result set $MT_L$;

Step 3: obtain the AlphaMatch algorithm and complete the mining of the process model with multiple concurrent triangular 2-degree cycles;

Definition: AlphaMatch algorithm

Let $L$ be an activity-based log; then AlphaMatch($L$) is defined as follows:

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9) $P_L = \{P_{(A,B)} \mid (A,B) \in Y_L\} \cup \{i_L, o_L\}$;

(10) $F_L = \{(a, P_{(A,B)}) \mid (A,B) \in Y_L \wedge a \in A\} \cup \{(P_{(A,B)}, b) \mid (A,B) \in Y_L \wedge b \in B\} \cup \{(i_L, t) \mid t \in T_I\} \cup \{(t, o_L) \mid t \in T_O\}$;

(11) $\mathrm{AlphaMatch}(L) = (P_L, T_L, F_L)$;

The AlphaMatch algorithm classifies the subject activities and callback activities, matches subject activities with callback activities, returns the correctly matched result set $MT_L$, and derives the relations between the activities in the result set.

The invention has the beneficial effects that:

In the process model mining method for multiple concurrent triangular 2-degree cycles, the concept of the triangular 2-degree cycle concurrency block is first introduced based on the definition of the triangular 2-degree cycle, and the activities are divided into main activities and callback activities according to their occurrence counts. Then, based on the position at which each activity first appears in a trace, incorrect activity matches are deleted using a pruning approach, yielding the correct activity matches. The method is simple to implement, easy to operate, depends little on log completeness, is highly accurate, and does not mine relations that do not exist in the log.

Finally, the algorithm is implemented as a plug-in on the ProM platform. Experimental analysis shows that the algorithm can accurately and effectively mine multiple concurrent triangular 2-degree cycles, and that the resulting model has higher precision and fitness.

Drawings

Fig. 1 shows the process model of plastic casting mold production.

Fig. 2 shows the result of mining log $L_2$ with the Alpha+ algorithm.

Fig. 3 is a structural diagram of a triangular concurrency block.

Fig. 4 shows the mining result of the AlphaMatch algorithm.

Fig. 5 shows the master model of ball bearing production.

Fig. 6 shows the model mined by the Alpha+ algorithm in Example 1.

Fig. 7 shows the model mined by the ILP algorithm in Example 1.

Fig. 8 shows the model mined by the Inductive Miner - infrequent (IMf) algorithm in Example 1.

Fig. 9 shows the model mined by the AlphaMatch algorithm in Example 1.

Fig. 10 is a comparison of fitness.

Fig. 11 is a comparison of precision.

Detailed Description

Embodiments of the invention are described below with reference to the accompanying drawings:

Definition: traces and event logs

Let $A$ be a set of activities. A trace $\sigma \in A^*$ is a sequence of activities, and an event log $L$ is a multiset of traces, i.e. $L \in \mathcal{B}(A^*)$.

The Petri net is a model for describing a distributed system, can describe the structure of the system and can simulate the operation of the system. The Petri net is a directed bipartite graph without isolated nodes in form.

Definition: Petri net

A Petri net is a four-tuple $N = (P, T; F, M)$, where $P$ is a finite set of places and $T$ is a finite set of transitions. $N$ satisfies:

(1) $P \cup T \neq \emptyset$;

(2) $P \cap T = \emptyset$;

(3) $F \subseteq (P \times T) \cup (T \times P)$ is a set of directed arcs, called the flow relation;

(4) $M: P \to \{0, 1, 2, 3, \ldots\}$ is called a marking of $N$, and $M_0$ denotes the initial marking;

(5) $\mathrm{dom}(F) \cup \mathrm{cod}(F) = P \cup T$,

where $\mathrm{dom}(F) = \{x \mid \exists y: (x, y) \in F\}$ and $\mathrm{cod}(F) = \{y \mid \exists x: (x, y) \in F\}$.

The pre-set and post-set describe the local neighborhood of a place or transition, and are defined as follows:

Definition: pre-set and post-set

Let $N = (P, T; F, M)$ be a Petri net. For $x \in P \cup T$, define

${}^{\bullet}x = \{y \mid y \in P \cup T \wedge (y, x) \in F\}$

$x^{\bullet} = \{y \mid y \in P \cup T \wedge (x, y) \in F\}$

${}^{\bullet}x$ is called the pre-set of $x$, and $x^{\bullet}$ is called the post-set of $x$.

Any two activities in a trace stand in some ordering relation. The four common ordering relations are as follows:

Definition: log-based ordering relations

Let $L$ be an activity-based event log, $\sigma$ a trace in the log, and $a$, $b$ any two activities occurring in $L$. Then:

(1) $a >_L b$ if and only if there is a trace $\sigma = \langle t_1, t_2, t_3, \ldots, t_n \rangle$ and an index $i \in \{1, 2, \ldots, n-1\}$ such that $\sigma \in L$, $t_i = a$ and $t_{i+1} = b$;

(2) $a \to_L b$ if and only if $a >_L b$ and not $b >_L a$;

(3) $a \#_L b$ if and only if neither $a >_L b$ nor $b >_L a$;

(4) $a \|_L b$ if and only if $a >_L b$ and $b >_L a$.
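The four relations can be derived mechanically from the directly-follows pairs of a log. The following is a minimal Python sketch of that derivation, not the patented implementation; the function names `directly_follows` and `relation`, the string labels and the two-trace toy log are assumptions made for illustration only.

```python
def directly_follows(log):
    """Collect every pair (x, y) such that y immediately follows x in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def relation(a, b, df):
    """Classify the ordering relation between a and b from the directly-follows set df."""
    ab, ba = (a, b) in df, (b, a) in df
    if ab and ba:
        return "||"   # concurrency: a > b and b > a
    if ab:
        return "->"   # causality: a > b and not b > a
    if ba:
        return "<-"   # reverse causality (label added here for convenience)
    return "#"        # neither a > b nor b > a


# Toy log (assumption): a and c may occur in either order between e and f.
log = [list("eacf"), list("ecaf")]
df = directly_follows(log)

print(relation("e", "a", df))  # ->
print(relation("a", "c", df))  # ||
print(relation("e", "f", df))  # #
```

Because every relation is decided from `df` alone, the quality of the result depends only on which directly-follows pairs the log happens to contain.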

The process model mining method for multiple concurrent triangular 2-degree cycles is described in detail below.

A process model mining method for multiple concurrent triangular 2-degree cycles comprises the following steps:

Definition: triangular 2-degree cycle, denoted by $\Delta{>}_L$ or ${<}\Delta_L$

Let $N = (P, T; F, M)$ be a Petri net model and let $a$, $b$ be two transitions in $N$. Then $a\,\Delta{>}_L\,b$ (or $b\,{<}\Delta_L\,a$) if and only if:

(1)

(2) suppose $M_1 \in R(M_0)$ such that $M_1[a\rangle M_2$ and there is no $M_1[\sigma\rangle M_2$, where $\sigma$ is an occurrence sequence; then only $M_2[b\rangle M_1$ exists, and if $M_2$ is not the final marking there also exists $M_2[x\rangle M_3$, where $x \in T$ and $a \neq x \neq b$;

Definition: Alpha algorithm

Let $L$ be an activity-based event log; then Alpha($L$) is defined as follows:

(1)

(2)

(3)

(4)

(5)

(6)P_L＝{P_(A,B)|(A,B)∈Y_L}∪{i_L,o_L}；

(7)

(8) $\mathrm{Alpha}(L) = (P_L, T_L, F_L)$;
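For reference, the classical Alpha algorithm is commonly stated as below. This is the textbook formulation and is only assumed to correspond item by item to the definition above; item (6) above agrees with it. Here $\mathit{first}(\sigma)$ and $\mathit{last}(\sigma)$ denote the first and last activity of a trace, which is a different use of the name than the two-argument $\mathit{first}(a, \sigma)$ defined later in this description.

```latex
\begin{align*}
T_L &= \{\, t \in T \mid \exists\, \sigma \in L:\ t \in \sigma \,\} \\
T_I &= \{\, t \in T \mid \exists\, \sigma \in L:\ t = \mathit{first}(\sigma) \,\} \\
T_O &= \{\, t \in T \mid \exists\, \sigma \in L:\ t = \mathit{last}(\sigma) \,\} \\
X_L &= \{\, (A,B) \mid A \subseteq T_L \wedge A \neq \emptyset \wedge B \subseteq T_L \wedge B \neq \emptyset \\
    &\qquad \wedge\ \forall a \in A\ \forall b \in B:\ a \to_L b \\
    &\qquad \wedge\ \forall a_1, a_2 \in A:\ a_1 \#_L a_2
            \ \wedge\ \forall b_1, b_2 \in B:\ b_1 \#_L b_2 \,\} \\
Y_L &= \{\, (A,B) \in X_L \mid \forall (A',B') \in X_L:\
        A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \,\} \\
P_L &= \{\, P_{(A,B)} \mid (A,B) \in Y_L \,\} \cup \{ i_L, o_L \} \\
F_L &= \{\, (a, P_{(A,B)}) \mid (A,B) \in Y_L \wedge a \in A \,\}
       \cup \{\, (P_{(A,B)}, b) \mid (A,B) \in Y_L \wedge b \in B \,\} \\
    &\qquad \cup\ \{\, (i_L, t) \mid t \in T_I \,\} \cup \{\, (t, o_L) \mid t \in T_O \,\} \\
\alpha(L) &= (P_L, T_L, F_L)
\end{align*}
```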

Definition: locally (partially) complete log

Let $a$, $b$ be any two activities in the log such that $b$ can directly follow $a$. A log in which every such behavior $a >_L b$ occurs at least once in some trace is called a locally complete log.

A mold is typically built from several parts. A plastic casting mold is formed by joining an upper and a lower special metal groove into a cavity; liquid plastic is then poured into the cavity, cooled and shaped, and processed into plastic products in subsequent steps. In producing a plastic casting mold, the semi-finished upper and lower grooves are usually produced separately first. Because the mold requires very high precision, the semi-finished upper and lower grooves in most cases do not meet the joining standard, so both grooves must be polished and calibrated, on the one hand so that the edges meet the joining requirement, and on the other hand to polish the cavity. The process can be abstracted into the following steps:

1) Prepare the mold production materials.
2) Produce the semi-finished grooves.
3) Measure the upper groove of the mold.
4) If the upper groove meets the standard, wait for joining; otherwise polish and calibrate the upper groove and return to step 3.
5) Measure the lower groove of the mold.
6) If the lower groove meets the standard, wait for joining; otherwise polish and calibrate the lower groove and return to step 5.
7) Once both the upper and the lower groove have passed measurement, join the two grooves into a plastic casting mold.
8) The finished mold enters the subsequent plastic production flow.

Steps 3 and 5 can be carried out concurrently, and so can steps 4 and 6. The process model of the mold machining is shown in Fig. 1.

As shown in Fig. 1, two triangular 2-degree cycles are concurrent in the model, and a and b, as well as c and d, have a clear precedence relationship. Activity a must be performed first, then activity b; after activity b is completed, activity a must be performed again, and activity b can only be performed between two occurrences of activity a. Activities c and d follow the same relationship. The model can generate two types of logs. One type is a complete log containing explicit triangular 2-degree cycle behavior such as "aba" and "cdc" [12], for example log $L_1$ = [<e k a c j f>, <e k c a j f>, <e k a b a c j f>, <e k a c b a j f>, <e k c d c a j>, <e k c d a c j f>, <e k a b a c d c j f>, <e k c d c a b a j f>, …]. The other type is a locally complete log without explicit triangular cycle behavior such as "aba" or "cdc", for example log $L_2$ = [<e k a c j f>, <e k c a j f>, <e k a b c a j f>, <e k a c b a j f>, <e k c d a c j f>, <e k c a d c j f>, <e k a c d b a c j f>, <e k a c b d a c j f>]. For log $L_2$, existing algorithms cannot mine the correct model. Taking the Alpha+ algorithm as an example, the result of mining log $L_2$ with Alpha+ is shown in Fig. 2. Because the Alpha+ algorithm only mines the concurrency relations among a, b, c and d, no triangular 2-degree cycle is mined. The resulting model therefore has two independent transitions and is clearly not the correct model.
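The difference between the two kinds of log can be checked directly: a complete log contains some trace in which "aba" or "cdc" occurs as three consecutive events, while the locally complete log $L_2$ contains none. The Python sketch below illustrates this check; the helper name `has_explicit_loop` is hypothetical, and `L1_sample` holds only four of the traces listed for $L_1$.

```python
def has_explicit_loop(trace, main, callback):
    """True if main, callback, main occur as three consecutive events in the trace."""
    pattern = [main, callback, main]
    return any(trace[i:i + 3] == pattern for i in range(len(trace) - 2))


# Four of the traces listed for L1 (a sample, not the whole log).
L1_sample = [t.split() for t in [
    "e k a c j f", "e k a b a c j f",
    "e k a b a c d c j f", "e k c d c a b a j f",
]]

# The eight traces listed for L2.
L2 = [t.split() for t in [
    "e k a c j f", "e k c a j f", "e k a b c a j f", "e k a c b a j f",
    "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f",
]]

pairs = [("a", "b"), ("c", "d")]
l1_explicit = any(has_explicit_loop(t, m, c) for t in L1_sample for m, c in pairs)
l2_explicit = any(has_explicit_loop(t, m, c) for t in L2 for m, c in pairs)

print(l1_explicit, l2_explicit)  # True False
```

An algorithm that relies on seeing "aba" therefore has nothing to work with in $L_2$, which is the situation the method targets.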

Although no explicit behavior of the loop occurs in the log, the log still maintains the characteristics of the loop: (1) the sequence of the occurrences of the activities a and b in the trace is fixed; (2) the relation of the number of activities a and b in the log is not changed.

For a locally complete log lacking explicit loop behavior, mining a model containing several concurrent triangular 2-degree cycles by means of the above structural features is of significant research interest. The core of the problem is how to correctly match two activities into a triangular 2-degree cycle. For the model shown in Fig. 2, for example, the core of the problem is how to match activities b, d with activities a, c into triangular 2-degree cycles.

The algorithm is described in detail by taking a log generated by the model in fig. 1 as an example.

Suppose activities $a_i$, $b_i$ form a triangular 2-degree cycle satisfying $a_i\,\Delta{>}_L\,b_i$ or $b_i\,{<}\Delta_L\,a_i$, where $a_i$ is the subject (main) activity and $b_i$ is the callback activity. The set of all subject activities and the set of all callback activities are called the subject activity set and the callback activity set respectively, formally defined as follows:

Definition: subject activity set and callback activity set

Let $Bo_L$ be the subject activity set and $C_L$ the callback activity set, where:

(1)

(2)

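Definition: triangular 2-degree cycle concurrency block

Let the activities in the two-tuples $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$ all satisfy $a_i\,\Delta{>}_L\,b_i$. When the $n$ triangular 2-degree cycles are concurrent, there exist unique transitions $x$ and $y$ such that:

(1) $x = {}^{\bullet}({}^{\bullet}a_1) \cap {}^{\bullet}({}^{\bullet}a_2) \cap \ldots \cap {}^{\bullet}({}^{\bullet}a_n)$;

(2) $y = (a_1^{\bullet})^{\bullet} \cap (a_2^{\bullet})^{\bullet} \cap \ldots \cap (a_n^{\bullet})^{\bullet}$.

The structure formed by $x$, $y$ and the $n$ concurrent triangular 2-degree cycles is called a triangular 2-degree cycle concurrency block, where $h_\Delta = x$ is the block head activity and $t_\Delta = y$ is the block tail activity.

As shown in Fig. 3, the activities in each of the two-tuples (c, d) and (a, b) form a triangular 2-degree cycle, where a and c are subject activities and b and d are callback activities. The block head activity k, the block tail activity j and the two concurrent triangular 2-degree cycles together form a triangular 2-degree cycle concurrency block.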
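The block head and block tail can be computed from a flow relation with the pre-set and post-set operators. The sketch below does this in Python for a small net; the place names `pa`, `qa`, `pc`, `qc` and the exact arcs are assumptions chosen to mimic the structure described for Fig. 3, not a transcription of the figure.

```python
def preset(x, F):
    """Nodes with an arc into x."""
    return {u for (u, v) in F if v == x}


def postset(x, F):
    """Nodes with an arc out of x."""
    return {v for (u, v) in F if u == x}


def lift(op, xs, F):
    """Apply a pre-set or post-set operator to every node of a set and union the results."""
    return set().union(*(op(x, F) for x in xs))


def block_head_tail(F, subject_activities):
    """Intersect the double pre-sets (head) and double post-sets (tail) of the subject activities."""
    heads = [lift(preset, preset(a, F), F) for a in subject_activities]
    tails = [lift(postset, postset(a, F), F) for a in subject_activities]
    return set.intersection(*heads), set.intersection(*tails)


# Hypothetical flow relation: k feeds both cycles, j joins them.
F = {
    ("k", "pa"), ("pa", "a"), ("a", "qa"), ("qa", "b"), ("b", "pa"), ("qa", "j"),
    ("k", "pc"), ("pc", "c"), ("c", "qc"), ("qc", "d"), ("d", "pc"), ("qc", "j"),
}

head, tail = block_head_tail(F, ["a", "c"])
print(head, tail)  # {'k'} {'j'}
```

Taken alone, the double pre-set of a is {k, b} and that of c is {k, d}; the intersection removes the callback activities and leaves the single head k, which is why the definition intersects over all n cycles.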

Two activities that occur consecutively in a trace form a directly-follows relation. Other relations, such as the concurrency relation and the causal relation, can be derived from the directly-follows relation. The directly-follows set is defined as follows:

Definition: directly-follows set

The elements of the directly-follows set $D_L$ are the two-tuples formed by all activities in the traces that satisfy the $>_L$ relation. For example, in $\sigma_3 = \langle e\ a\ c\ f \rangle$ we have $e >_L a$, $a >_L c$ and $c >_L f$, therefore $D_L = \{(e, a), (a, c), (c, f)\}$.

Activities that form a loop structure may appear in the log many times, and the relation between their occurrence counts is an important reference for identifying the loop. The number of occurrences of an activity is defined as follows:

Definition: number of occurrences of an activity

Let $L$ be a log, $\sigma \in L$ a trace and $a \in \sigma$ an activity. $\mathit{sum}(a, \sigma)$ denotes the number of occurrences of the activity in the trace, and $\mathit{sum}(a, L)$ denotes the total number of occurrences of the activity in the log.

For example, for the trace $\sigma_1 = \langle e\ a\ c\ f \rangle$, $\mathit{sum}(a, \sigma_1) = 1$; for $L = \{\langle e\ c\ a\ b\ a\ f \rangle\}$, $\mathit{sum}(a, L) = 2$.

The characteristics of activities in the log correctly reflect the model structure. The characteristics of the triangular 2-degree cycle structure are abstracted into a position theorem and a quantity theorem, and proofs are given.

Theorem 1 (position theorem): if activities $a_i, b_i \in \sigma$, $\sigma \in L$, and $a_i\,\Delta{>}_L\,b_i$, then $\mathit{first}(a_i, \sigma) < \mathit{first}(b_i, \sigma)$.

Proof: let activities $a_i, b_i \in \sigma$, $\sigma \in L$, satisfy $a_i\,\Delta{>}_L\,b_i$, with $a_i$ the subject activity and $b_i$ the callback activity. By the definition of the triangular 2-degree cycle, the cycle necessarily generates a sequence of the form $\langle a_i b_i a_i b_i a_i b_i \ldots b_i a_i \rangle$, in which the first activity to occur is $a_i$ and only then does the first $b_i$ occur. Hence the index of the first $a_i$ in $\sigma$ must be smaller than the index of the first $b_i$ in $\sigma$. Q.E.D.

Besides satisfying the position theorem, the numbers of activities in the loop structure also follow a rule.

Theorem 2 (quantity theorem): if activities $a_i$ and $b_i$ satisfy $a_i\,\Delta{>}_L\,b_i$, then for any trace $\sigma$ and any log $L$ generated by a model containing the triangular 2-degree cycle formed by $a_i$, $b_i$: $\mathit{sum}(a_i, \sigma) - \mathit{sum}(b_i, \sigma) = 1$ and $\mathit{sum}(a_i, L) - \mathit{sum}(b_i, L) = |L|$.

Proof:

1) If the triangular 2-degree cycle is not entered, the number of $a_i$ in the trace is 1 and the number of $b_i$ is 0, so the theorem holds.

2) If the triangular 2-degree cycle is entered, then as shown above the cycle formed by $a_i$, $b_i$ necessarily generates a sequence of the form $\sigma = \langle a_i b_i a_i b_i a_i b_i \ldots b_i a_i \rangle$. The first activity to occur is $a_i$, followed by pairs $\langle b_i a_i \rangle$, so the total number of $a_i$ in $\sigma$ is always 1 more than the total number of $b_i$, and the theorem also holds.

Since every trace satisfies the above quantity relation, the total number of $a_i$ in the log exceeds the total number of $b_i$ by $1 \times |L| = |L|$. Q.E.D.
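Both theorems can be checked mechanically on a concrete log. The Python sketch below does so for the eight traces of log $L_3$ used in the worked example of this description; the function names `first` and `theorems_hold` are illustrative, and positions are counted from 1 at the start of the trace (only their relative order matters for Theorem 1).

```python
def first(activity, trace):
    """1-based position of the first occurrence of activity in trace."""
    return trace.index(activity) + 1


def theorems_hold(main, callback, log):
    """Check Theorem 1 (position) and Theorem 2 (quantity) for one candidate pair."""
    position = all(first(main, t) < first(callback, t)
                   for t in log if main in t and callback in t)
    per_trace = all(t.count(main) - t.count(callback) == 1 for t in log)
    total = (sum(t.count(main) for t in log)
             - sum(t.count(callback) for t in log)) == len(log)
    return position and per_trace and total


L3 = [t.split() for t in [
    "e k a b c a j f", "e k a c b a j f", "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f", "e k a c j f", "e k c a j f",
]]

print(theorems_hold("a", "b", L3))  # True
print(theorems_hold("c", "d", L3))  # True
print(theorems_hold("a", "d", L3))  # False
print(theorems_hold("c", "b", L3))  # False
```

The two correct pairs pass both checks, while the mismatched pairs (a, d) and (c, b) fail, which is the property the matching step exploits.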

Algorithm 1 is the classification algorithm for subject activities and callback activities. It classifies the subject activities and callback activities in the log mainly according to the definition of the triangular 2-degree cycle concurrency block and the quantity relation of Theorem 2, and puts them into $Bo_L$ and $C_L$ respectively.

Algorithm 1: classification algorithm for subject activities and callback activities

Input: a log $L$ satisfying local completeness;

Output: the subject activity set $Bo_L$ and the callback activity set $C_L$;

Step (1): create a one-dimensional array LTM for counting activity occurrences, create the directly-follows set $D_L$, the subject activity set $Bo_L$, the callback activity set $C_L$ and the head activity $h_\Delta$ of the triangular 2-degree cycle concurrency block, and initialize them.

Step (2): traverse the log $L$; put the start activities into the start activity set $T_I$, the end activities into the end activity set $T_O$ and all activities into the activity set $T_L$; form two-tuples from consecutively occurring activities and put them into the directly-follows set $D_L$.

Step (3): traverse the log $L$, count the number of occurrences of each activity in $T_L$, and store the counts in the corresponding positions of the one-dimensional array LTM.

Step (4): traverse the one-dimensional array LTM; if the difference between two elements of the array is greater than 0 and the two corresponding activities are concurrent in the log, traverse any trace of the log and assign the activity immediately preceding the first of the two activities to the block head activity $h_\Delta$.

Step (5): traverse the activity set $T_L$; put the activities that are in a causal relation with the block head activity $h_\Delta$ into the subject activity set $Bo_L$, and put the activities that are in both a concurrency relation and a directly-follows relation with activities in the subject activity set into the callback activity set $C_L$.

Step (6): return the subject activity set $Bo_L$ and the callback activity set $C_L$.

Take log $L_3$ = [<e k a b c a j f>, <e k a c b a j f>, <e k c d a c j f>, <e k c a d c j f>, <e k a c d b a c j f>, <e k a c b d a c j f>, <e k a c j f>, <e k c a j f>] as an example. In step (1), all elements of the LTM array are 0. Step (2) yields $D_L = \{e >_L k,\ a >_L c,\ c >_L b,\ c >_L d,\ b >_L c,\ b >_L a,\ k >_L c,\ b >_L d,\ d >_L c,\ d >_L b,\ c >_L j,\ a >_L d,\ k >_L a,\ d >_L a,\ a >_L j,\ c >_L a,\ a >_L b,\ j >_L f\}$. Step (3) fills the LTM array; the counts are shown in Table 1. Step (4) yields the block head activity $h_\Delta = k$. In step (5), activities a and c each occur 12 times and activities b and d each occur 4 times, which satisfies Theorem 2 ($12 - 4 = 8 = |L|$); hence $Bo_L = \{c, a\}$ and $C_L = \{d, b\}$. Step (6) returns $Bo_L$ and $C_L$.

Table 1: LTM obtained by Algorithm 1 for $L_3$
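The steps of Algorithm 1 can be sketched in Python as follows. This is one possible reading of the steps, not the ProM plug-in itself: the function name `classify` is hypothetical, "difference greater than 0" is interpreted as "the two counts differ", candidate pairs are scanned in alphabetical order and the first qualifying pair determines the block head, and the callback test requires concurrency with some subject activity.

```python
from collections import Counter


def classify(log):
    """Return (block head, subject activities, callback activities, occurrence counts)."""
    activities = sorted({a for t in log for a in t})
    df = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}  # directly-follows set D_L
    ltm = Counter(a for t in log for a in t)                          # occurrence counts (LTM)

    def concurrent(x, y):
        return (x, y) in df and (y, x) in df

    def causal(x, y):
        return (x, y) in df and (y, x) not in df

    # Step (4): first pair with different counts that is concurrent in the log.
    head = None
    for i, x in enumerate(activities):
        for y in activities[i + 1:]:
            if ltm[x] != ltm[y] and concurrent(x, y):
                trace = next(t for t in log if x in t and y in t)
                pos = min(trace.index(x), trace.index(y))
                if pos > 0:
                    head = trace[pos - 1]  # activity just before the earlier of the two
                break
        if head is not None:
            break

    # Step (5): subject activities follow the head causally; callbacks are concurrent with them.
    subject = {a for a in activities if head is not None and causal(head, a)}
    callback = {a for a in activities
                if a not in subject and any(concurrent(a, m) for m in subject)}
    return head, subject, callback, ltm


L3 = [t.split() for t in [
    "e k a b c a j f", "e k a c b a j f", "e k c d a c j f", "e k c a d c j f",
    "e k a c d b a c j f", "e k a c b d a c j f", "e k a c j f", "e k c a j f",
]]

head, subject, callback, ltm = classify(L3)
print(head, sorted(subject), sorted(callback))  # k ['a', 'c'] ['b', 'd']
print(ltm["a"], ltm["b"])                       # 12 4
```

On $L_3$ the sketch reproduces the values stated in the example: block head k, $Bo_L = \{a, c\}$, $C_L = \{b, d\}$, with a and c occurring 12 times and b and d occurring 4 times.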

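Definition: position at which an activity first occurs in a trace

Let $\sigma \in L$ be a trace and $a \in \sigma$ an activity. $\mathit{first}(a, \sigma)$ denotes the position index at which activity $a$ first occurs in trace $\sigma$.

For example, for the trace $\sigma_1 = \langle e\ a\ c\ f \rangle$, $\mathit{first}(a, \sigma_1) = 2$; for the trace $\sigma_2 = \langle e\ c\ a\ b\ a\ f \rangle$, $\mathit{first}(a, \sigma_2) = 3$.

Definition: first-occurrence position matrix

Let $L$ be a log and $Bo_L \cup C_L$ the set of subject and callback activities. The first-occurrence position matrix $FM[|L|][|Bo_L \cup C_L|]$ satisfies, for every trace $\sigma_i \in L$ and every activity $a_j \in Bo_L \cup C_L$: $FM[\sigma_i][a_j] = \mathit{first}(a_j, \sigma_i)$.

Matching results are obtained from the first-occurrence position matrix. The set of all matching results is called the matching result set. The matching result and the matching result set are defined as follows:

Definition: matching result and matching result set

A matching result is a two-tuple $mt_l = (a, b)$, where $a$ is a subject activity and $b$ is a callback activity. The two activities in a two-tuple cannot both be subject activities or both be callback activities, i.e. one of them is a subject activity and the other is a callback activity.

The matching result set $MT_L$ is the set of the matching results $mt_l$, i.e. $MT_L = \{(a, b) \mid (a \in Bo_L \wedge b \in C_L) \vee (b \in Bo_L \wedge a \in C_L)\}$.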

Algorithm 2 is the matching algorithm for subject activities and callback activities. It first puts all candidate matching results into the matching result set, and then matches activities according to Theorem 1 using the positions at which the activities first occur. The matching process uses pruning: if the two activities in a matching result $mt_l$ do not satisfy Theorem 1, that matching result is deleted.

Algorithm 2: matching algorithm for subject activities and callback activities

Input: a log $L$ satisfying local completeness, the subject activity set $Bo_L$ and the callback activity set $C_L$;

Output: the matching result set $MT_L$;

Step (1): create the first-occurrence position matrix $FM[|L|][|Bo_L \cup C_L|]$, the matching result set $MT_L$ and the matching result $mt_l$, and initialize them.

Step (2): form the Cartesian product of the activities in the subject activity set $Bo_L$ and the callback activity set $C_L$, assign each resulting two-tuple to $mt_l$, and put all $mt_l$ into the matching result set $MT_L$.

Step (3): traverse the log $L$, record the position at which each activity of $Bo_L \cup C_L$ first occurs in each trace, and store it in the corresponding position of the two-dimensional array $FM[|L|][|Bo_L \cup C_L|]$.

Step (4): traverse the two-dimensional array $FM[|L|][|Bo_L \cup C_L|]$; if the position of an activity in the callback activity set $C_L$ is smaller than the position of an activity in the subject activity set $Bo_L$, delete the two-tuples formed by these two activities from the matching result set $MT_L$.

Step (5): return the matching result set $MT_L$.

Take L₃ as an example. In step (1), all elements of the FM matrix are 0. Step (2) puts the Cartesian product of C_L and Bo_L into MT_L, which gives MT_L = {(a, b), (a, d), (b, a), (d, a), (c, b), (c, d), (b, c), (d, c)}. Step (3) obtains the position at which each activity in C_L ∪ Bo_L first occurs in each trace; the resulting FM[|L|][|Bo_L ∪ C_L|] is shown in Table 2. Step (4) searches for activities in C_L whose position index is smaller than that of an activity in Bo_L. Taking σ₁ as an example, first(b, σ₁) = 2 < first(c, σ₁) = 3. According to Theorem 1, this proves that subject activity c and callback activity b do not match, so the two pairs (b, c) and (c, b) are deleted from MT_L. In σ₃, first(d, σ₃) = 2 < first(a, σ₃) = 3; in the same way, the two pairs (a, d) and (d, a) are deleted from MT_L. After FM has been traversed, four pairs remain in MT_L: (a, b), (b, a), (c, d) and (d, c). The activities in each pair of the matching result set can form a triangle 2-degree cycle.

Table 2: First-occurrence position matrix FM of L₃
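
The pruning in steps (2) to (5) can be sketched as follows. This is an illustrative reading of Algorithm 2, not the patent's implementation. It assumes that traces are strings of single-letter activities, that Bo_L = {a, c} and C_L = {b, d} are already known from Algorithm 1, and that an activity absent from a trace keeps the initial value 0 and is skipped during pruning. Positions are absolute 1-based indices in the full trace, so individual values need not coincide with those quoted in the example; only the ordering within a trace drives the pruning.

```python
from itertools import product

L3 = ["ekabcajf", "ekacbajf", "ekcdacjf", "ekcadcjf",
      "ekacdbacjf", "ekacbdacjf", "ekacjf", "ekcajf"]
BO = ["a", "c"]  # subject activities (result of Algorithm 1 in the example)
C = ["b", "d"]   # callback activities

def first(activity, trace):
    """1-based first position of `activity` in `trace`; 0 if absent (assumed convention)."""
    return trace.find(activity) + 1  # str.find returns -1 when absent

def match_activities(log, subject, callback):
    # Step (2): put both orientations of every subject/callback pair into MT.
    mt = set()
    for a, b in product(subject, callback):
        mt.update({(a, b), (b, a)})
    # Step (3): first-occurrence position matrix, one row (dict) per trace.
    activities = list(subject) + list(callback)
    fm = [{x: first(x, trace) for x in activities} for trace in log]
    # Step (4): prune pairs whose callback activity first appears before the subject activity.
    for row in fm:
        for a, b in product(subject, callback):
            if row[a] and row[b] and row[b] < row[a]:
                mt.discard((a, b))
                mt.discard((b, a))
    # Step (5): return the pruned matching result set (and FM for inspection).
    return mt, fm

MT, FM = match_activities(L3, BO, C)
print(sorted(MT))
```

On L₃ the sketch leaves the four pairs (a, b), (b, a), (c, d) and (d, c), the same set that remains in the example.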

Definition: AlphaMatch algorithm

Let L be an activity-based log. Then AlphaMatch(L) is defined as follows:

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9) P_L = {P_(A,B) | (A, B) ∈ Y_L} ∪ {i_L, o_L};

(11) AlphaMatch(L) = (P_L, T_L, F_L).

Compared with the classical Alpha algorithm, the AlphaMatch algorithm classifies activities into subject activities and callback activities, matches the subject activities with the callback activities, and finally returns the correctly matched result set MT_L together with the relationships between the activities in that set. Taking L₃ as an example, the relationships between activities mined by the AlphaMatch algorithm are shown in Table 3.

As can be seen from Table 3, activity a is matched with b and activity c is matched with d. The resulting model for L₃ is shown in FIG. 4 and is consistent with the original model of the case.

Table 3: Footprint of L₃
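
For contrast with the AlphaMatch footprint, the sketch below derives the relations of the classical Alpha algorithm (causality `->`, reverse causality `<-`, concurrency `||`, no relation `#`) from the directly-following pairs of L₃. This is the textbook construction, not the AlphaMatch definition, and it does not reproduce Table 3; the string encoding of traces and the function names are assumptions of the sketch.

```python
L3 = ["ekabcajf", "ekacbajf", "ekcdacjf", "ekcadcjf",
      "ekacdbacjf", "ekacbdacjf", "ekacjf", "ekcajf"]

def directly_follows(log):
    """All pairs (x, y) such that y immediately follows x in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def footprint(log):
    """Classical Alpha relation for every ordered pair of activities."""
    df = directly_follows(log)
    activities = sorted({x for trace in log for x in trace})
    table = {}
    for x in activities:
        for y in activities:
            forward, backward = (x, y) in df, (y, x) in df
            if forward and backward:
                table[(x, y)] = "||"   # each directly follows the other
            elif forward:
                table[(x, y)] = "->"   # x causes y
            elif backward:
                table[(x, y)] = "<-"   # y causes x
            else:
                table[(x, y)] = "#"    # never adjacent
    return table

fp = footprint(L3)
print(fp[("e", "k")], fp[("a", "b")], fp[("a", "d")], fp[("e", "f")])
```

In this classical view, a is related to b and to d in the same way (both `||`), so the directly-following relation alone does not indicate which callback activity belongs to which subject activity.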

Example 1

Concurrent structures of triangle 2-degree cycles are widely found in fields such as mold production, part processing, flexible manufacturing, precision instrument production, medical instrument production and sensor production.

Taking a production process model of a ball bearing as an example, the invention obtains a locally complete log that contains no explicit loop behavior such as "aba" through the following steps:

1) input the process model for ball bearing production, which comprises three concurrent triangle 2-degree cycles, as shown in FIG. 5;

2) run the "Perform a simple simulation of a (stochastic) Petri net" plug-in in ProM to obtain a log of the original model;

3) manually select locally complete logs that meet the requirements. The attributes of the logs used in the experiments are shown in Table 4:

The experiments compare the results mined by the AlphaMatch algorithm, the Alpha+ algorithm, the ILP algorithm and the Inductive Miner - infrequent (IMF) algorithm.

Table 4: Log attributes

Log L₄, generated from the original model shown in FIG. 5, was imported, and the mining results of the Alpha+ algorithm, the ILP algorithm, the Inductive Miner - infrequent (IMF) algorithm and the AlphaMatch algorithm were compared. The result of the Alpha+ algorithm is shown in FIG. 6. Because the log is not fully complete, the Alpha+ algorithm only mines the concurrency and causal relationships of activities a, b, c, d, g and h. It does not mine the relationships between the three callback activities and the other activities, so the model in FIG. 6 contains three independent transitions and differs greatly from the original model. This model is therefore not reasonable.

FIG. 7 shows the model mined by the ILP algorithm. Compared with the Alpha+ algorithm, the ILP algorithm yields a model in which the subject activities and callback activities are matched correctly, but the model contains many ordering relationships that exist neither in the original model nor in the log, e.g. e →_L b and e →_L d. The model obtained by the ILP algorithm is therefore not reasonable.

FIG. 8 shows the model mined by the Inductive Miner - infrequent (IMF) algorithm. This algorithm does not match activities; instead it separates the callback activities and the subject activities into two parts and adds a large number of invisible transitions, which makes the model structure relatively complex. In addition, if subject activity a has occurred and the other two subject activities have not yet occurred, any one of the three callback activities may occur immediately after a. Sequences generated in this way, for example "aha", may not be producible by the original model. Therefore, the model in FIG. 8 is not reasonable.

FIG. 9 shows the model mined by the method of the present invention. Compared with the model mined by the Alpha+ algorithm, the model in FIG. 9 matches the activities correctly and contains no independent transitions. Compared with the model mined by the ILP algorithm, it contains no erroneous relationships between activities. Compared with the model mined by the Inductive Miner - infrequent (IMF) algorithm, it does not generate sequences that the original model cannot produce, and it is consistent with the original model.

In summary, in terms of the mined model itself, the model mined by the present method is consistent with the original model, which is a clear advantage over the other algorithms.

The four resulting models were then analyzed in terms of fitness. Logs L₄, L₅, L₆ and L₇ of different sizes and different degrees of completeness, all generated by the original model, were imported. Of the four logs, L₇ contains the most traces and is the most complete. Each model and log were fed to the "Replay a Log on Petri Net for Performance Analysis" plug-in of the ProM platform to obtain the fitness of the models mined by the four algorithms; the statistical results are shown in FIG. 10. The fitness obtained by the AlphaMatch algorithm and the Inductive Miner - infrequent (IMF) algorithm is always 1, higher than that of the other two algorithms. However, because the IMF algorithm mines the subject activities and the callback activities as two separate parts, its model can also generate sequences such as "ada" and "aha" that the original model cannot generate, so it is an unreasonable model. For logs L₄, L₅ and L₆, the models obtained by the ILP algorithm have low fitness; for log L₇, the fitness reaches 1 owing to the higher log completeness, and the ILP algorithm then also yields the correct model. In contrast, the present algorithm maintains a high fitness even when completeness is poor, so it has an advantage over the ILP algorithm with respect to the log completeness required. Since none of the four logs is fully complete as the Alpha+ algorithm requires, the Alpha+ algorithm cannot obtain the relationships between callback activities, and the fitness of its model is relatively low.

In summary, the present algorithm has an advantage in the fitness of the resulting model.

The accuracy of the models obtained by the four algorithms was then analyzed, using the "Check Precision based on Align-ETConformance" plug-in in ProM; the statistical results are shown in FIG. 11. Because the model mined by the Alpha+ algorithm contains three independent transitions, its accuracy is the lowest. Because the model mined by the Inductive Miner - infrequent (IMF) algorithm can generate a large number of activity sequences that the original model cannot generate, its accuracy is also not high. For logs L₄, L₅ and L₆, whose completeness is weak, the model mined by the ILP algorithm differs somewhat from the original model, so its accuracy is slightly lower than that of the present algorithm. For L₇, whose completeness is strong, the ILP algorithm obtains a correct model consistent with the original model, and its accuracy equals that of the present algorithm. In contrast, the present algorithm places lower requirements on log completeness and achieves higher accuracy.

In conclusion, the model obtained by the present algorithm has a clear advantage in terms of accuracy.

It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims

1. A process model mining method for multiple concurrent triangle 2-degree cycles, characterized by comprising the following steps:

defining the triangle 2-degree cycle, denoted by Δ>_L or <Δ_L;

(1)

defining a subject activity set and a callback activity set

Let Bo_L be the subject activity set and C_L the callback activity set, wherein:

(1)

(2)

defining triangle 2 degree cyclic concurrency blocks

(1) x = •(•a₁) ∩ •(•a₂) ∩ … ∩ •(•a_n);

(2) y = (a₁•)• ∩ (a₂•)• ∩ … ∩ (a_n•)•;

the structure formed by x, y and the n triangle 2-degree cycles is called a triangle 2-degree cycle concurrency block, wherein h_Δ = x is the block head activity and t_Δ = y is the block tail activity;

defining directly following collections

defining number of occurrences of an activity

Given a log L, a trace σ ∈ L and an activity a ∈ σ, sum(a, σ) denotes the number of occurrences of activity a in trace σ, and sum(a, L) denotes the total number of occurrences of activity a in the log;

Algorithm 1: classification algorithm for subject activities and callback activities

Input: a log L satisfying local completeness;

Output: subject activity set Bo_L and callback activity set C_L;

step (2): traverse the log L; put the start activities into the start activity set T_I, the end activities into the end activity set T_o, and all activities into the activity set T_L, and group consecutively occurring activities into pairs, which are put into the directly-following set D_L;

step (5): traverse the activity set T_L; put the activities that satisfy a causal relationship with the block head activity h_Δ into the subject activity set Bo_L, and put the activities that satisfy concurrency and direct-following relationships with activities in the subject activity set into the callback activity set C_L;

step (6): return the subject activity set Bo_L and the callback activity set C_L;

defining the position where an activity first appears in a trace

defining a first time marker position matrix

FM[σ_i][a_j] = first(a_j, σ_i);

defining a match result and a set of match results

Algorithm 2 subject activity and callback activity matching algorithm

Output: matching result set MT_L;

step (4): traverse the two-dimensional array FM[|L|][|Bo_L ∪ C_L|]; if the position of an activity in the callback activity set C_L is smaller than the position of an activity in the subject activity set Bo_L, delete the pairs formed by these two activities from the matching result set MT_L;

step (5): return the matching result set MT_L;

defining AlphaMatch Algorithm

Let L be an activity-based log, then AlphaMatch (L) is defined as follows:

(1)