CN109753591B - Business process predictive monitoring method - Google Patents


Info

Publication number
CN109753591B
Authority
CN
China
Legal status
Active
Application number
CN201811510292.8A
Other languages
Chinese (zh)
Other versions
CN109753591A (en)
Inventor
王伟
曹健
Current Assignee
Jiangyin Zhuri Information Technology Co ltd
Original Assignee
Jiangyin Zhuri Information Technology Co ltd

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a business process predictive monitoring method that performs sequence encoding and sequence distance measurement based on an event log. The algorithm encodes the activity sequence using frequent activity sets, assigns different weights to the frequent activity subsequences and the data attributes, and searches for similar historical data for predictive monitoring. This approach monitors a process during execution even when the process model cannot be known, and adapts to model changes caused by concept drift. Mining and analyzing the historical information recorded in the event log makes it possible to predictively monitor the execution status of the current process, for example by predicting the next activity, the outcome of the process, the probability of an anomaly, or the execution time of an activity. Predictive monitoring uses the current incomplete event trace prefix to predict a series of subsequent activities, which helps an enterprise monitor the status of each process execution, discover risks in time to take countermeasures in advance, and improve its ability to schedule resources.

Description

Business process predictive monitoring method
Technical Field
The invention relates to a business process predictive monitoring method, and in particular to a business process predictive monitoring method based on frequent activity set sequence encoding and distance measurement.
Background
Predictive business process monitoring falls into two broad categories, depending on whether a process model is mined. Some researchers mine a process model first and perform predictive monitoring on top of it. Others drop the assumption that a process model must be obtained and make predictions from the event log alone. In prediction based on activity sequences, sequence encoding is an important component. Sequence encoding methods include Boolean encoding, frequency encoding, and index-based encoding. Boolean encoding maps each attribute value to a one-dimensional feature whose value indicates whether that attribute value appears in the log trace. Frequency-based encoding builds the same feature vector as Boolean encoding, except that each value is the number of times the attribute value appears in the log trace. Boolean and frequency-based encoding can only encode discrete attributes, not continuous ones, and neither can represent the order in which activities occur. In index-based encoding, by contrast, each feature corresponds to one event in the sequence, and its value is the activity represented by that event together with its attribute values.
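The three encodings described above can be illustrated with a minimal Python sketch. The function names, the toy activity alphabet, and the padding convention for index-based encoding are assumptions made for illustration; attribute values are omitted and only activity labels are encoded.

```python
from collections import Counter

def boolean_encoding(trace, alphabet):
    """One feature per activity: 1 if it occurs anywhere in the trace, else 0."""
    present = set(trace)
    return [1 if a in present else 0 for a in alphabet]

def frequency_encoding(trace, alphabet):
    """One feature per activity: how many times it occurs in the trace."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]

def index_encoding(trace, length, pad=None):
    """One feature per position: the activity of the i-th event (padded or cut to `length`)."""
    return (list(trace) + [pad] * length)[:length]

alphabet = ["A", "B", "C", "D"]   # hypothetical activity labels
trace = ["A", "C", "A", "B"]

bool_vec = boolean_encoding(trace, alphabet)
freq_vec = frequency_encoding(trace, alphabet)
idx_vec = index_encoding(trace, 5)

# A reordered trace gets the same Boolean and frequency vectors,
# but a different index-based vector.
reordered = ["B", "A", "C", "A"]
same_bool = boolean_encoding(reordered, alphabet) == bool_vec
same_freq = frequency_encoding(reordered, alphabet) == freq_vec
same_idx = index_encoding(reordered, 5) == idx_vec
```

Here `same_bool` and `same_freq` are true while `same_idx` is false, which is the loss of ordering information noted above.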
Most existing predictive monitoring approaches assume stable business processes and propose prediction methods based on mined process models. In practice, enterprises often do not model complete end-to-end processes and manage them through a workflow system, so some steps lie outside the workflow system's control. In other scenarios, external parties need to know or predict the progress of a process but, for data privacy reasons, can only obtain information about the occurrence of certain events rather than all activities and inter-activity relationships of the whole process. In addition, a process may change over time, so that the event logs no longer correspond to a single process model. In these cases, prediction cannot rely on a known process model.
For prediction based on event log information, researchers have proposed treating an execution trace as a symbol sequence and building a prediction model through sequence comparison. However, the specific characteristics of each process and the variability of the external environment pose further challenges for process prediction. Overcoming them requires comprehensively considering other features that affect the process, such as data attribute information.
Disclosure of Invention
In view of the problems and shortcomings of the prior art, the invention provides a business process predictive monitoring method. It addresses predictive monitoring of running processes when the process model cannot be known, and it can cope with parallel activities, low-frequency activities, and concept drift as they occur in practice.
The invention solves the above technical problems through the following technical solution:
The invention provides a business process predictive monitoring method comprising the following steps:
S1: calculating frequent activity sets using a mutual information formula, retaining the order information of key activities within each frequent activity set, and encoding the activity sequence using the frequent activity sets;
S2: assigning different weights to the frequent activity sets by means of a random forest algorithm to form a feature vector set;
S3: measuring the distance between two traces within each frequent activity set using the edit distance, and weighting the distances across frequent activity sets to obtain the distance between the whole traces;
S4: searching for the K most similar historical traces using the weighted distance, and voting on or averaging the target values of these K traces to obtain the prediction result.
On the basis of conforming to the common knowledge in the field, the above preferred conditions can be arbitrarily combined to obtain the preferred examples of the invention.
The beneficial effects of the invention are as follows:
The invention provides a log-based prediction technique, with a new sequence encoding method and a new sequence distance measure for monitoring and predicting processes. Compared with Boolean encoding, frequency encoding, and index-based encoding, the method takes into account the influence of different attributes and activities on the prediction target, handles inter-sequence distance measurement in the presence of low-frequency and parallel activities, and adapts to model changes caused by concept drift as the log is updated.
Drawings
FIG. 1 is a block diagram of a business process predictive monitoring algorithm according to a preferred embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
This embodiment provides a business process predictive monitoring method based on frequent activity set sequence encoding and distance measurement, which performs sequence encoding and sequence distance measurement based on event logs. The algorithm encodes the activity sequence using frequent activity sets, assigns different weights to the frequent activity subsequences and the data attributes, and searches for similar historical data for predictive monitoring. This approach monitors a process during execution even when the process model cannot be known, and adapts to model changes caused by concept drift.
As shown in FIG. 1, the predictive monitoring algorithm has the following steps:
S1-frequent activity set encoding
For activities $a_i$ and $a_j$, the mutual information $I(a_i : a_j)$ is defined as
[formula not reproduced in the source text]
where $p(a_i, a_j)$ is the probability that activities $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, $p(a_i)$ is the independent probability that $a_i$ occurs in a log trace, and $p(a_j)$ is the independent probability that $a_j$ occurs in a log trace.
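A minimal Python sketch of estimating these probabilities from a log follows. Because the formula itself is not reproduced in this text, the sketch assumes the standard pointwise form $\log \frac{p(a_i, a_j)}{p(a_i)\,p(a_j)}$; the function name, the use of first occurrences to decide "before", and the toy log are likewise assumptions.

```python
import math
from collections import Counter

def pointwise_mutual_information(traces):
    """Estimate I(a_i : a_j) for every ordered pair observed in the log.

    Assumption: p(a_i, a_j) is the fraction of traces in which the first
    occurrence of a_i precedes the first occurrence of a_j.
    """
    n = len(traces)
    single = Counter()    # number of traces containing each activity
    ordered = Counter()   # number of traces where a occurs before b
    for trace in traces:
        first_pos = {}
        for pos, activity in enumerate(trace):
            first_pos.setdefault(activity, pos)
        single.update(first_pos.keys())
        for a, pa in first_pos.items():
            for b, pb in first_pos.items():
                if pa < pb:
                    ordered[(a, b)] += 1
    pmi = {}
    for (a, b), count in ordered.items():
        p_ab = count / n
        pmi[(a, b)] = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
    return pmi

# Hypothetical event log: each trace is a list of activity labels.
log = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["D", "C"]]
pmi = pointwise_mutual_information(log)
```

In this toy log, `("A", "B")` has positive mutual information (A is followed by B more often than independence would suggest), while `("A", "C")` is negative.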
Let $l_k$ be a frequent activity set; then for all $a_i, a_j \in l_k$, $I(a_i : a_j) > 0$. Assume that the activities can be divided into multiple frequent sub-activity sets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$, so that $n$ sub-feature vectors $f_1, f_2, \ldots, f_n$ can be constructed. A log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$ can be partitioned into $N$ subsets $E^* = \{E_1, E_2, \ldots, E_N\}$: for an event $e_j$ with $\pi_A(e_j) = a_i$, $e_j \in E_i$. Sorting all events in $E_i$ by occurrence time yields [formula not reproduced in the source text]
If the attribute set is [formula not reproduced in the source text], the feature vector is $f_k = (\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,s}, \alpha_{2,1}, \ldots, \alpha_{2,s}, \ldots, \alpha_{m,s})$, where [formula not reproduced in the source text]
The feature sub-vectors together constitute the feature vector $f = (f_1, f_2, \ldots, f_n)$.
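The partition-and-sort step can be sketched in Python as follows. This is a simplified illustration: events are assumed to be `(activity, timestamp)` pairs, each sub-vector holds only the time-ordered activity labels of one frequent activity set, and the attribute entries $\alpha$ (whose definition is not reproduced in this text) are left out. The names `encode_trace` and `frequent_sets` are hypothetical.

```python
def encode_trace(events, frequent_sets):
    """Split a trace by frequent activity set and order each part by time.

    events: list of (activity, timestamp) pairs.
    frequent_sets: list of sets of activity labels (l_1, ..., l_n).
    Returns one ordered activity subsequence per frequent activity set.
    """
    sub_vectors = []
    for activity_set in frequent_sets:
        subset = [e for e in events if e[0] in activity_set]
        subset.sort(key=lambda e: e[1])   # order by occurrence time
        sub_vectors.append(tuple(activity for activity, _ in subset))
    return tuple(sub_vectors)

frequent_sets = [{"A", "B"}, {"C", "D"}]   # hypothetical frequent activity sets
events = [("C", 3), ("A", 1), ("B", 2), ("D", 5), ("A", 4)]
encoded = encode_trace(events, frequent_sets)

# A different interleaving of the two sets yields the same encoding.
interleaved = [("A", 1), ("C", 2), ("B", 3), ("A", 4), ("D", 5)]
encoded_interleaved = encode_trace(interleaved, frequent_sets)
```

Because order is kept only within each set, two traces that differ only in how activities of different sets interleave receive the same encoding, which is one way to read the source's claim about handling parallel activities.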
S2-feature selection
The attributes are screened using a random forest. The sequence information features are denoted $f = (f_1, f_2, \ldots, f_n)$ and the attribute features $g = (g_1, g_2, \ldots, g_m)$; combining them gives the feature set $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$.
For each feature $X_i$, its importance $Vim(X_i)$ is computed. For classification problems, the Gini coefficient is used to evaluate feature importance; for regression problems, the minimum mean squared error is used.
For feature $X_i$, assume the event log can be divided into $E = \{E_1, E_2, \ldots, E_K\}$; its Gini coefficient can then be expressed as [formula not reproduced in the source text]
The decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$. The importance $Vim_j$ of feature $X_i$ in decision tree $t_j$ is the change in its Gini coefficient before and after branch $p$:
$Vim_{p,j}(E, X_i) = Gini_p(E, X_i) - Gini_l(E, X_i) - Gini_r(E, X_i)$
where $Gini_p$ denotes the Gini coefficient before branching, and $Gini_l$ and $Gini_r$ denote the Gini coefficients after branching. This yields the importance of each feature:
$V = (v_1, v_2, \ldots, v_{m+n})$
$v_i = Vim(E, X_i)$
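The Gini-based importance of a single split can be sketched in Python. The Gini expression is not reproduced in this text, so the sketch assumes the usual impurity $1 - \sum_k (|E_k|/|E|)^2$ over class labels; the function names and the `weighted` flag are illustrative additions, not part of the source.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels (assumed standard form)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_importance(parent, left, right, weighted=False):
    """Change in Gini impurity caused by one split.

    weighted=False follows the formula as written above
    (Gini_p - Gini_l - Gini_r); weighted=True scales each child by its
    share of the samples, as common random forest implementations do.
    """
    if weighted:
        n = len(parent)
        return (gini(parent)
                - len(left) / n * gini(left)
                - len(right) / n * gini(right))
    return gini(parent) - gini(left) - gini(right)

parent = ["ok", "ok", "late", "late"]   # hypothetical outcome labels
pure_split = split_importance(parent, ["ok", "ok"], ["late", "late"])
useless_unweighted = split_importance(parent, ["ok", "late"], ["ok", "late"])
useless_weighted = split_importance(parent, ["ok", "late"], ["ok", "late"],
                                    weighted=True)
```

A split that separates the classes perfectly scores 0.5 here under both variants. For an uninformative split the unweighted form gives -0.5 while the weighted form gives 0, which is why the sketch exposes both.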
S3-feature distance measurement
The edit distance $lev$ is defined as the minimum number of insert, delete, and replace operations required to convert one sequence into another. Let the sequence information features be $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded by one frequent sub-activity set. For traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $lev(f_i(\sigma_a), f_i(\sigma_b))$ denotes the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ denotes the Euclidean distance between their attribute features.
Assigning an importance weight to each distance gives
[formula not reproduced in the source text]
which is the trace distance measurement function of the algorithm.
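A Python sketch of such a distance function follows. The combined formula is not reproduced in this text, so the sketch assumes a plain weighted sum of per-set edit distances and per-attribute Euclidean distances; `trace_distance`, its parameters, and the example weights are hypothetical.

```python
import math

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def trace_distance(seq_a, seq_b, attrs_a, attrs_b, seq_weights, attr_weights):
    """Assumed form: sum_i v_i * lev(f_i) + sum_j v_(n+j) * d(g_j)."""
    seq_part = sum(w * levenshtein(fa, fb)
                   for w, fa, fb in zip(seq_weights, seq_a, seq_b))
    attr_part = sum(w * math.dist(ga, gb)
                    for w, ga, gb in zip(attr_weights, attrs_a, attrs_b))
    return seq_part + attr_part

# Two traces, each encoded as one activity subsequence per frequent activity set
# plus numeric attribute vectors (all values are made up for illustration).
seq_a = (("A", "B", "A"), ("C", "D"))
seq_b = (("A", "B"), ("D", "C"))
attrs_a = ((3.0,), (10.0,))
attrs_b = ((0.0,), (10.0,))
total = trace_distance(seq_a, seq_b, attrs_a, attrs_b,
                       seq_weights=[0.5, 0.25], attr_weights=[0.1, 0.15])
```

With these inputs the per-set edit distances are 1 and 2 and the attribute distances are 3.0 and 0.0, so the weighted total is 0.5 + 0.5 + 0.3 = 1.3.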
S4-find K nearest neighbor prediction
Based on the trace filtering module, the K most similar historical traces are found using the distance function described in the previous section. The prediction result is obtained by voting on (for discrete values) or averaging (for continuous values) the target values of these K similar historical traces.
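The nearest-neighbour step can be sketched in Python as below. The distance function is passed in as a parameter; the simple position-wise mismatch count used in the example is only a stand-in for the weighted trace distance of step S3, and all names and data are hypothetical.

```python
from collections import Counter

def knn_predict(query, history, distance, k, discrete=True):
    """Predict a target from the k historical traces closest to `query`.

    history: list of (encoded_trace, target) pairs.
    discrete=True  -> majority vote over the k targets.
    discrete=False -> arithmetic mean of the k targets.
    """
    neighbours = sorted(history, key=lambda item: distance(query, item[0]))[:k]
    targets = [target for _, target in neighbours]
    if discrete:
        return Counter(targets).most_common(1)[0][0]
    return sum(targets) / len(targets)

def mismatch_distance(a, b):
    """Placeholder distance: position-wise mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

outcome_history = [
    (("A", "B", "C"), "approve"),
    (("A", "B", "D"), "reject"),
    (("A", "B", "C", "E"), "approve"),
    (("X", "Y"), "reject"),
]
predicted_outcome = knn_predict(("A", "B", "C"), outcome_history,
                                mismatch_distance, k=3)

duration_history = [(("A", "B"), 4.0), (("A", "C"), 6.0), (("X", "Y"), 100.0)]
predicted_duration = knn_predict(("A", "B"), duration_history,
                                 mismatch_distance, k=2, discrete=False)
```

The first call votes among the three closest traces and returns `"approve"`; the second averages the two closest durations and returns `5.0`.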
S5-concept drift detection
A business process changes over time. Two cases are distinguished: the frequent relationships between activities change, or they remain unchanged.
When the frequent relationships between activities change and a new execution trace is added to the historical data, the mutual information values of all related activities are updated synchronously. When the relationship between a mutual information value and the threshold changes, the frequent activity sets are recalculated.
If the frequent relationships between activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs from before the change and selects the most similar execution data.
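The first case can be sketched in Python as an incremental tracker that updates counts when a trace arrives and reports whether any pair has crossed the threshold. The class name, the pointwise form of mutual information, the first-occurrence ordering rule, and the default threshold of 0 are assumptions; the source does not specify these details.

```python
import math
from collections import Counter

class MutualInformationTracker:
    """Keeps co-occurrence counts up to date and flags threshold crossings."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.n = 0                    # number of traces seen so far
        self.single = Counter()       # traces containing each activity
        self.ordered = Counter()      # traces where a occurs before b
        self.frequent_pairs = set()   # pairs currently above the threshold

    def _pmi(self, pair):
        a, b = pair
        p_ab = self.ordered[pair] / self.n
        return math.log(p_ab / ((self.single[a] / self.n) * (self.single[b] / self.n)))

    def add_trace(self, trace):
        """Add one trace; return True if the frequent activity sets need recalculating."""
        first_pos = {}
        for pos, activity in enumerate(trace):
            first_pos.setdefault(activity, pos)
        self.n += 1
        self.single.update(first_pos.keys())
        for a, pa in first_pos.items():
            for b, pb in first_pos.items():
                if pa < pb:
                    self.ordered[(a, b)] += 1
        # Every value depends on n, so all pairs are re-evaluated.
        current = {pair for pair in self.ordered if self._pmi(pair) > self.threshold}
        changed = current != self.frequent_pairs
        self.frequent_pairs = current
        return changed

tracker = MutualInformationTracker()
flags = [tracker.add_trace(t)
         for t in (["A", "B"], ["C", "D"], ["A", "B"], ["B", "A"])]
```

In this toy run the second trace makes `("A", "B")` and `("C", "D")` frequent, the third changes nothing, and the reversed fourth trace pushes `("A", "B")` back below the threshold, so `flags` is `[False, True, False, True]`.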
Business process management has become an important part of modern enterprises. An enterprise records the execution of its business processes to form an event log. Mining and analyzing the historical information recorded in the event log makes it possible to predictively monitor the execution status of the current process, for example by predicting the next activity, the outcome of the process, the probability of an anomaly, or the execution time of an activity. Predictive monitoring uses the current incomplete event trace prefix to predict a series of subsequent activities, which helps the enterprise monitor the status of each process execution, discover risks in time to take countermeasures in advance, and improve its ability to schedule resources.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.

Claims (4)

1. A business process predictive monitoring method, characterized by comprising the following steps:
S1: calculating frequent activity sets using a mutual information formula, retaining the order information of key activities within each frequent activity set, and encoding the activity sequence using the frequent activity sets,
wherein the mutual information $I(a_i : a_j)$ of activities $a_i$ and $a_j$ is defined as: [formula not reproduced in the source text]
where $p(a_i, a_j)$ is the probability that activities $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, $p(a_i)$ is the independent probability that $a_i$ occurs in a log trace, and $p(a_j)$ is the independent probability that $a_j$ occurs in a log trace,
let $l_k$ be a frequent activity set; then for all $a_i, a_j \in l_k$, $I(a_i : a_j) > 0$; assume that the activities are divided into multiple frequent sub-activity sets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$, and construct $n$ sub-feature vectors $f_1, f_2, \ldots, f_n$; for a log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$, divide it into $z$ subsets $E^* = \{E_1, E_2, \ldots, E_z\}$, where for an event $e_j$ with $\pi_A(e_j) = a_i$, $e_j \in E_i$; sorting all events in $E_i$ by occurrence time yields [formula not reproduced in the source text]
if the attribute set is [formula not reproduced in the source text], the feature vector is $f_k = (\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,s}, \alpha_{2,1}, \ldots, \alpha_{2,s}, \ldots, \alpha_{m,s})$, where [formula not reproduced in the source text]
the feature sub-vectors together constitute the feature vector $f = (f_1, f_2, \ldots, f_n)$;
S2: assigning different weights to the frequent activity sets by means of a random forest algorithm to form a feature vector set;
S3: measuring the distance between two traces within each frequent activity set using the edit distance, and weighting the distances across frequent activity sets to obtain the distance between the whole traces;
S4: searching for the K most similar historical traces using the weighted distance, and voting on or averaging the target values of these K traces to obtain the prediction result.
2. The business process predictive monitoring method of claim 1, further comprising step S5: when the frequent relationships between activities change and a new execution trace is added to the historical data, synchronously updating the mutual information values of all related activities, and, when the relationship between a mutual information value and the threshold changes, recalculating the frequent activity sets;
when the frequent relationships between activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs from before the change and selects the most similar execution data.
3. The business process predictive monitoring method of claim 1, wherein, in step S2,
the sequence information features are denoted $f = (f_1, f_2, \ldots, f_n)$,
the attribute features are denoted $g = (g_1, g_2, \ldots, g_m)$,
and combining them gives the feature set $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$;
for feature $X_i$, assuming the event log is divided into $E = \{E_1, E_2, \ldots, E_K\}$, the Gini coefficient is [formula not reproduced in the source text]
the decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$, and the importance $Vim_j$ of feature $X_i$ in decision tree $t_j$ is the change in its Gini coefficient before and after branch $p$:
$Vim_{p,j}(E, X_i) = Gini_p(E, X_i) - Gini_l(E, X_i) - Gini_r(E, X_i)$
where $Gini_p$ denotes the Gini coefficient before branching, and $Gini_l$ and $Gini_r$ denote the Gini coefficients after branching, thereby obtaining the importance of each feature
$V = (v_1, v_2, \ldots, v_{m+n})$
$v_i = Vim(E, X_i)$.
4. The business process predictive monitoring method of claim 3, wherein, in step S3, the edit distance $lev$ is defined as the minimum number of insert, delete, and replace operations required to convert one sequence into another;
let the sequence information features be $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded by one frequent sub-activity set; for traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $lev(f_i(\sigma_a), f_i(\sigma_b))$ denotes the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ denotes the Euclidean distance between their attribute features;
assigning an importance weight to each distance gives
[formula not reproduced in the source text]
which is the trace distance measurement function of the algorithm.
CN201811510292.8A 2018-12-11 2018-12-11 Business process predictive monitoring method Active CN109753591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811510292.8A CN109753591B (en) 2018-12-11 2018-12-11 Business process predictive monitoring method

Publications (2)

Publication Number Publication Date
CN109753591A 2019-05-14
CN109753591B 2024-01-09

Family

ID=66403689

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant