CN109753591B

CN109753591B - Business process predictive monitoring method

Info

Publication number: CN109753591B
Application number: CN201811510292.8A
Authority: CN
Inventors: 王伟; 曹健
Original assignee: Jiangyin Zhuri Information Technology Co ltd
Current assignee: Jiangyin Zhuri Information Technology Co ltd
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2024-01-09
Anticipated expiration: 2038-12-11
Also published as: CN109753591A

Abstract

The invention discloses a business process predictive monitoring method that performs sequence encoding and sequence distance measurement based on an event log. The algorithm encodes the activity sequence using frequent activity sets, assigns different weights to the frequent activity subsequences and the data attributes, and searches for similar historical data to perform predictive monitoring. This approach monitors a running process effectively even when the process model cannot be known, and adapts to model changes caused by concept drift. Mining and analyzing the historical information recorded in the event log makes it possible to predictively monitor the execution status of the current process, for example by predicting the next activity, the outcome of the process, the probability of an anomaly, and the execution time of an activity. Predictive monitoring predicts a series of subsequent activities from the current incomplete event trace prefix, which helps the enterprise monitor the status of each process execution, discover risks in time to take countermeasures in advance, and improve its ability to schedule resources.

Description

Business process predictive monitoring method

Technical Field

The invention relates to a business process predictive monitoring method, and in particular to one based on frequent activity set sequence encoding and distance measurement.

Background

Predictive business process monitoring falls into two broad categories, depending on whether a process model is mined. Some researchers focus on predictive monitoring after mining a process model; the other kind drops the assumption that a process model must be obtained and bases its predictions solely on the log. In activity-sequence-based prediction, sequence encoding is an important component. Sequence encoding methods include Boolean encoding, frequency encoding, index-based encoding and the like. Boolean encoding encodes each attribute value as a one-dimensional feature whose value indicates whether that attribute value appears in the log trace. Frequency-based encoding constructs the same feature vector as Boolean encoding, except that each value is the number of times the attribute value appears in the log trace. Boolean and frequency-based encoding can only encode discrete attributes, not continuous ones, and neither can represent the order in which activities occur. In index-based encoding, by contrast, each feature corresponds to an event in the sequence, and its value is the activity represented by that event together with its attribute values.
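The three encodings named above can be illustrated with a minimal Python sketch. The function names, the activity alphabet and the padding convention are assumptions made for illustration only; they do not come from the patent text, and only activity labels (not other attributes) are encoded here.

```python
from collections import Counter

def boolean_encoding(trace, alphabet):
    """One feature per activity: 1 if it occurs anywhere in the trace, else 0."""
    present = set(trace)
    return [1 if a in present else 0 for a in alphabet]

def frequency_encoding(trace, alphabet):
    """One feature per activity: how many times it occurs in the trace."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]

def index_encoding(trace, length, pad=None):
    """One feature per position: the activity at that position.

    The trace is truncated or padded to a fixed length (an assumed convention).
    """
    return list(trace[:length]) + [pad] * max(0, length - len(trace))

alphabet = ["A", "B", "C", "D"]   # hypothetical activity alphabet
trace = ["A", "C", "A", "B"]      # hypothetical trace prefix

print(boolean_encoding(trace, alphabet))
print(frequency_encoding(trace, alphabet))
print(index_encoding(trace, 5))
```

Only the index-based variant keeps the order of activities, which is the limitation of the other two that the background section points out.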

Most currently available predictive monitoring assumes stationary business processes and proposes prediction methods based on mined process models. In practice, enterprises often do not model the complete end-to-end process and manage it through a workflow system, so certain steps are outside the control of the workflow system. In other scenarios, external parties need to know or predict the progress of a process but, for reasons of data privacy protection, can only obtain information about the occurrence of certain events rather than all activities and inter-activity relationships of the whole process. Moreover, the process may change over time, so that the event logs do not all correspond to the same process model. In these cases, prediction cannot be based on a known process model.

For prediction based on event log information, researchers have proposed treating an execution trace as a symbol sequence and building a prediction model through sequence comparison. However, the specificity of each process and the variability of the external environment pose further challenges to process prediction; to overcome them, other features that affect the process, such as data attribute information, must also be taken into account.

Disclosure of Invention

In view of the problems and shortcomings of the prior art, the invention provides a business process predictive monitoring method that solves the problem of predictively monitoring a running process when the process model cannot be learned, and that can adapt to the parallel activities, low-frequency activities and concept drift that occur in practice.

The invention solves the above technical problems through the following technical solution:

The invention provides a business process predictive monitoring method comprising the following steps:

S1, computing frequent activity sets through a mutual information formula, retaining the order information of the key activities in each frequent activity set, and encoding the activity sequence using the frequent activity sets;

S2, assigning different weights to the frequent activity sets through a random forest algorithm to form a feature vector set;

S3, within each frequent activity set, measuring the distance between two traces using the edit distance, and weighting the distances over the frequent activity sets to obtain the distance between the whole traces;

S4, searching for the K most similar historical traces using the weighted distance, and voting on or averaging the target values of the K similar historical traces to obtain the prediction result.

On the basis of conforming to the common knowledge in the field, the above preferred conditions can be arbitrarily combined to obtain the preferred examples of the invention.

The positive effects of the invention are as follows:

The invention provides a log-based prediction technique, with a new sequence encoding method and a new sequence distance measure for monitoring and predicting a process. Compared with Boolean encoding, frequency encoding and index-based methods, the method takes into account the influence of different attributes and activities on the prediction target, handles inter-sequence distance measurement effectively in the presence of low-frequency and parallel activities, and adapts to model changes caused by concept drift as the log is updated.

Drawings

FIG. 1 is a block diagram of a business process predictive monitoring algorithm according to a preferred embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

This embodiment provides a business process predictive monitoring method based on frequent activity set sequence encoding and distance measurement, which performs sequence encoding and sequence distance measurement based on event logs. The algorithm encodes the activity sequence using frequent activity sets, assigns different weights to the frequent activity subsequences and the data attributes, and searches for similar historical data to perform predictive monitoring. This approach monitors a running process effectively even when the process model cannot be known, and adapts to model changes caused by concept drift.

As shown in FIG. 1, the predictive monitoring algorithm comprises the following steps:

S1-frequent activity set encoding

For activities $a_i$ and $a_j$, the mutual information is defined as

$$I(a_i : a_j) = \log \frac{p(a_i, a_j)}{p(a_i)\,p(a_j)}$$

where $p(a_i, a_j)$ is the probability that activities $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, $p(a_i)$ is the independent probability that $a_i$ occurs in a log trace, and $p(a_j)$ is the independent probability that $a_j$ occurs in a log trace.
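As a rough illustration of how these quantities could be estimated from an event log, the sketch below counts, per trace, which activities occur and which ordered pairs occur. It assumes the mutual information takes the pointwise form $\log\big(p(a_i, a_j) / (p(a_i)\,p(a_j))\big)$ and that each probability is estimated as a fraction of traces; the function name `ordered_mutual_information` and the toy log are hypothetical.

```python
import math
from collections import Counter

def ordered_mutual_information(traces):
    """Estimate I(a_i : a_j) for every ordered pair observed in the log.

    p(a_i, a_j): share of traces in which a_i occurs before some a_j.
    p(a):        share of traces in which a occurs at all.
    Assumes the pointwise form log(p(a_i, a_j) / (p(a_i) * p(a_j))).
    """
    n = len(traces)
    occurs, precedes = Counter(), Counter()
    for trace in traces:
        seen, pairs = set(), set()
        for act in trace:
            # every activity seen earlier in this trace precedes `act`
            pairs.update((prev, act) for prev in seen if prev != act)
            seen.add(act)
        occurs.update(seen)      # each activity counted once per trace
        precedes.update(pairs)   # each ordered pair counted once per trace
    return {
        (a, b): math.log((c / n) / ((occurs[a] / n) * (occurs[b] / n)))
        for (a, b), c in precedes.items()
    }

log = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["D", "C"]]  # toy event log
mi = ordered_mutual_information(log)
frequent_pairs = {pair for pair, value in mi.items() if value > 0}
print(frequent_pairs)
```

Pairs with a positive value are the candidates for belonging to the same frequent activity set; pairs that never occur in order are simply absent from the result.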

Let $l_k$ be a frequent activity set; then for all $a_i, a_j \in l_k$, $I(a_i : a_j) > 0$. Assume the activities can be divided into multiple frequent sub-activity sets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$, so that $n$ sub-feature vectors $f_1, f_2, \ldots, f_n$ can be constructed. A log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$ can be divided into $N$ subsets $E^* = \{E_1, E_2, \ldots, E_N\}$: for an event $e_j$ with $\pi_A(e_j) = a_i$, $e_j \in E_i$. Sorting all events in $E_i$ by occurrence time yields the ordered event sequence of $E_i$ [formula missing from the source text].

If the attribute set is [missing from the source text], the feature vector is $f_k = (\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,s}, \alpha_{2,1}, \ldots, \alpha_{2,s}, \ldots, \alpha_{m,s})$, where [definition missing from the source text].

The feature sub-vectors constitute the feature vector $f = (f_1, f_2, \ldots, f_n)$.
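The partitioning of a trace by frequent activity sets can be sketched as follows. This is a simplified illustration that keeps only the activity labels of each sub-sequence (the attribute components $\alpha$ are left out), and it assumes events are given as hypothetical `(activity, timestamp)` pairs.

```python
def encode_by_frequent_sets(trace, frequent_sets):
    """Split a trace into one time-ordered activity sub-sequence per frequent set.

    trace:         list of (activity, timestamp) events (assumed layout)
    frequent_sets: list of sets of activity names, one per frequent activity set
    Returns a list f = [f_1, ..., f_n] of tuples of activity names.
    """
    ordered = sorted(trace, key=lambda event: event[1])  # order by occurrence time
    return [
        tuple(act for act, _ in ordered if act in fs)
        for fs in frequent_sets
    ]

trace = [("A", 1), ("X", 2), ("B", 3), ("C", 5), ("A", 4)]  # toy events
frequent_sets = [{"A", "B"}, {"C", "X"}]                     # toy frequent sets
print(encode_by_frequent_sets(trace, frequent_sets))
```

Because order is kept only inside each set, interleaving between activities of different sets (for example, parallel branches) does not change the encoding.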

S2-feature selection

The attributes are screened using a random forest. The sequence information features are denoted $f = (f_1, f_2, \ldots, f_n)$ and the attribute features $g = (g_1, g_2, \ldots, g_m)$; combining them gives the features $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$.

For each feature $X_i$, its importance $\mathrm{Vim}(X_i)$ is calculated. For classification problems, the Gini coefficient is used to evaluate feature importance; for regression problems, the minimum mean square error is used.

For feature $X_i$, assume the event log can be divided into $E = \{E_1, E_2, \ldots, E_K\}$; its Gini coefficient can then be expressed as [formula missing from the source text].

The decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$. The importance $\mathrm{Vim}_j$ of feature $X_i$ in decision tree $t_j$ is the change in its Gini coefficient before and after branch $p$:

$$\mathrm{Vim}_{p,j}(E, X_i) = \mathrm{Gini}_p(E, X_i) - \mathrm{Gini}_l(E, X_i) - \mathrm{Gini}_r(E, X_i)$$

where $\mathrm{Gini}_p$ is the Gini coefficient before branching, and $\mathrm{Gini}_l$ and $\mathrm{Gini}_r$ are the Gini coefficients after branching. This gives the importance of each feature:

$$V = (v_1, v_2, \ldots, v_{m+n})$$

$$v_i = \mathrm{Vim}(E, X_i)$$
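The Gini expression itself is not reproduced in this text. The sketch below therefore assumes the standard Gini impurity $1 - \sum_k p_k^2$ over class proportions, and computes the per-branch importance exactly as the difference written above. The function names are hypothetical. Note that common CART-style implementations weight the two child impurities by their share of samples, which the expression above does not do.

```python
from collections import Counter

def gini(labels):
    """Standard Gini impurity 1 - sum_k p_k^2 (an assumed form, see lead-in)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def branch_importance(parent, left, right):
    """Vim for one branch: Gini before the split minus both Gini values after it.

    Follows the unweighted difference Gini_p - Gini_l - Gini_r as written above.
    """
    return gini(parent) - gini(left) - gini(right)

parent = ["ok", "ok", "late", "late"]        # toy target values at a node
left, right = ["ok", "ok"], ["late", "late"]  # toy split on some feature X_i
print(branch_importance(parent, left, right))
```

A split that separates the classes perfectly, as in the toy data, recovers the full parent impurity as its importance.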

S3-feature distance measurement

The edit distance $\mathrm{lev}$ is defined as the minimum number of insert, delete and replace operations required to convert one sequence into another. Let the sequence information features be $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded by one frequent sub-activity set. For traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $\mathrm{lev}(f_i(\sigma_a), f_i(\sigma_b))$ is the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ is the Euclidean distance between their attributes.
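The edit distance defined above is the classic Levenshtein distance. A minimal dynamic-programming sketch in Python, working on any two sequences (strings or tuples of activity names), might look like this:

```python
def lev(a, b):
    """Levenshtein distance: minimum number of inserts, deletes and replaces
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete x
                curr[j - 1] + 1,            # insert y
                prev[j - 1] + (x != y),     # replace x by y (free if equal)
            ))
        prev = curr
    return prev[-1]

print(lev(("A", "B", "A"), ("A", "A")))
```

Applied per frequent activity set, the distance only penalizes differences inside a set, so two traces that differ solely in how unrelated sets are interleaved stay close.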

An importance weight is assigned to each distance, which gives [formula missing from the source text], the trace distance measurement function of the algorithm.
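Since the weighted formula is not reproduced in this text, the following is only a sketch under a stated assumption: that the trace distance is a weighted sum of the per-set edit distances and the per-attribute distances, with the importance values from S2 as weights. The function `trace_distance`, its argument layout and the use of scalar attributes (for which the Euclidean distance is an absolute difference) are all hypothetical.

```python
def lev(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def trace_distance(f_a, f_b, g_a, g_b, v_seq, v_attr):
    """Assumed form: sum_i v_i * lev(f_i(a), f_i(b)) + sum_j v_j * |g_j(a) - g_j(b)|.

    f_a, f_b: per-frequent-set activity sequences of the two traces
    g_a, g_b: scalar attribute values of the two traces
    v_seq, v_attr: importance weights for sequence and attribute features
    """
    seq_part = sum(w * lev(x, y) for w, x, y in zip(v_seq, f_a, f_b))
    attr_part = sum(w * abs(x - y) for w, x, y in zip(v_attr, g_a, g_b))
    return seq_part + attr_part

d = trace_distance(
    f_a=[("A", "B", "A"), ("X", "C")], f_b=[("A", "A"), ("X", "C")],
    g_a=[10.0], g_b=[7.0],
    v_seq=[0.5, 0.3], v_attr=[0.2],   # toy importance weights
)
print(d)
```

In practice the attributes would probably need scaling so that a large-valued attribute does not dominate the edit distances; the text does not say how this is handled.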

S4-find K nearest neighbor prediction

Based on the trace filtering module, the K most similar historical traces are found using the distance formula given in the previous section. The target values of the K similar historical traces are voted on (discrete values) or averaged (continuous values) to obtain the prediction result.
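The nearest-neighbor step can be sketched generically in Python. The function `knn_predict` and the layout of `history` as `(encoded_trace, target)` pairs are assumptions for illustration; any trace distance function can be passed in.

```python
from collections import Counter
from statistics import mean

def knn_predict(query, history, distance, k, discrete=True):
    """Predict a target from the k historical traces closest to `query`.

    history:  list of (encoded_trace, target) pairs (assumed layout)
    distance: callable taking two encoded traces and returning a number
    discrete: majority vote if True, arithmetic mean otherwise
    """
    nearest = sorted(history, key=lambda item: distance(query, item[0]))[:k]
    targets = [target for _, target in nearest]
    if discrete:
        return Counter(targets).most_common(1)[0][0]
    return mean(targets)

# Toy usage with one-dimensional "traces" and absolute difference as distance.
history = [(1.0, "ok"), (1.2, "ok"), (5.0, "late"), (0.9, "late")]
print(knn_predict(1.0, history, lambda a, b: abs(a - b), k=3))
```

Ties in the vote are resolved here by whichever label `Counter.most_common` returns first, which is an arbitrary choice of this sketch.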

S5-concept drift detection

The business process itself changes over time. Such changes fall into two cases: the frequent relations among activities change, or they do not.

When the frequent relations among activities change and a new execution trace is added to the historical data, the mutual information values of all related activities are updated synchronously. When the relationship between a mutual information value and the threshold changes, the frequent activity sets are recalculated.

If the frequent relations among activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs from before the change and selects the most similar execution data.
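The first case, updating mutual information as traces arrive and detecting a threshold crossing, could be sketched as below. The class `MutualInfoTracker` is hypothetical, it assumes the pointwise form $\log\big(p(a_i, a_j) / (p(a_i)\,p(a_j))\big)$ with trace-level counts, and for clarity it re-evaluates every pair on each update rather than only the activities in the new trace.

```python
import math
from collections import Counter

class MutualInfoTracker:
    """Keeps trace-level counts and reports when a pair crosses the threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.n = 0                  # number of traces seen
        self.occurs = Counter()     # traces containing each activity
        self.precedes = Counter()   # traces where a occurs before b

    def _mi(self, a, b):
        # log( (joint/n) / ((occ_a/n) * (occ_b/n)) ), simplified
        joint = self.precedes[(a, b)]
        return math.log(joint * self.n / (self.occurs[a] * self.occurs[b]))

    def frequent_pairs(self):
        return {p for p in self.precedes if self._mi(*p) > self.threshold}

    def add_trace(self, trace):
        """Add one trace; return True if the frequent sets need recalculating."""
        before = self.frequent_pairs()
        seen, pairs = set(), set()
        for act in trace:
            pairs.update((prev, act) for prev in seen if prev != act)
            seen.add(act)
        self.n += 1
        self.occurs.update(seen)
        self.precedes.update(pairs)
        return self.frequent_pairs() != before

tracker = MutualInfoTracker()
for t in [["A", "B"], ["C"], ["A", "B"]]:
    print(t, tracker.add_trace(t))
```

A `True` result marks the moment at which the frequent activity sets, and with them the trace encoding, would have to be rebuilt.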

Business process management has become an important part of modern enterprises. Enterprises record the execution of their business processes as event logs. Mining and analyzing the historical information recorded in the event log makes it possible to predictively monitor the execution status of the current process, for example by predicting the next activity, the outcome of the process, the probability of an anomaly, and the execution time of an activity. Predictive monitoring predicts a series of subsequent activities from the current incomplete event trace prefix, which helps the enterprise monitor the status of each process execution, discover risks in time to take countermeasures in advance, and improve its ability to schedule resources.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.

Claims

1. A business process predictive monitoring method, characterized by comprising the following steps:

S1, computing frequent activity sets through a mutual information formula, retaining the order information of the key activities in each frequent activity set, and encoding the activity sequence using the frequent activity sets,

wherein the mutual information of activities $a_i$ and $a_j$ is defined as:

$$I(a_i : a_j) = \log \frac{p(a_i, a_j)}{p(a_i)\,p(a_j)}$$

where $p(a_i, a_j)$ is the probability that activities $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, $p(a_i)$ is the independent probability that $a_i$ occurs in a log trace, and $p(a_j)$ is the independent probability that $a_j$ occurs in a log trace,

let $l_k$ be a frequent activity set; then for all $a_i, a_j \in l_k$, $I(a_i : a_j) > 0$; assume the activities are divided into multiple frequent sub-activity sets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$, and construct $n$ sub-feature vectors $f_1, f_2, \ldots, f_n$; a log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$ is divided into $z$ subsets $E^* = \{E_1, E_2, \ldots, E_z\}$; for an event $e_j$ with $\pi_A(e_j) = a_i$, $e_j \in E_i$; sorting all events in $E_i$ by occurrence time gives [formula missing from the source text];

the feature sub-vectors constitute the feature vector $f = (f_1, f_2, \ldots, f_n)$;

2. The business process predictive monitoring method of claim 1, wherein the method further comprises step S5: when the frequent relations among activities change and a new execution trace is added to the historical data, synchronously updating the mutual information values of all related activities, and recalculating the frequent activity sets when the relationship between a mutual information value and the threshold changes;

when the frequent relations among activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs from before the change and selects the most similar execution data.

3. The business process predictive monitoring method of claim 1, wherein, in step S2,

the sequence information features are denoted $f = (f_1, f_2, \ldots, f_n)$,

the attribute features are denoted $g = (g_1, g_2, \ldots, g_m)$,

and combining them gives the features $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$;

for feature $X_i$, assuming the event log is divided into $E = \{E_1, E_2, \ldots, E_K\}$, its Gini coefficient is [formula missing from the source text];

the decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$, and the importance $\mathrm{Vim}_j$ of feature $X_i$ in decision tree $t_j$ is the change in its Gini coefficient before and after branch $p$:

$$\mathrm{Vim}_{p,j}(E, X_i) = \mathrm{Gini}_p(E, X_i) - \mathrm{Gini}_l(E, X_i) - \mathrm{Gini}_r(E, X_i)$$

where $\mathrm{Gini}_p$ is the Gini coefficient before branching, and $\mathrm{Gini}_l$ and $\mathrm{Gini}_r$ are the Gini coefficients after branching, thereby obtaining the importance of each feature:

$$V = (v_1, v_2, \ldots, v_{m+n})$$

$$v_i = \mathrm{Vim}(E, X_i).$$

4. The business process predictive monitoring method of claim 3, characterized in that, in step S3, the edit distance $\mathrm{lev}$ is defined as the minimum number of insert, delete and replace operations required to convert one sequence into another;

let the sequence information features be $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded by one frequent sub-activity set; for traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $\mathrm{lev}(f_i(\sigma_a), f_i(\sigma_b))$ is the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ is the Euclidean distance between their attributes;

an importance weight is assigned to each distance, which gives [formula missing from the source text],

thereby obtaining the trace distance measurement function of the algorithm.