CN109753591B - Business process predictive monitoring method - Google Patents
- Publication number
- CN109753591B (application CN201811510292.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a business process predictive monitoring method that performs sequence encoding and sequence distance measurement based on an event log. The algorithm encodes activity sequences using frequent activity sets, assigns different weights to frequent activity subsequences and data attributes, and searches historical data for similar cases to perform predictive monitoring. This approach monitors running process instances even when the process model is unknown, and it adapts to model changes caused by concept drift. By mining the historical information recorded in the event log, the method can predictively monitor the execution status of the current process, for example predicting the next activity, the process outcome, the probability that the process becomes abnormal, and activity execution times. Predictive monitoring uses the prefix of the current, incomplete event trace to predict subsequent activities, which helps an enterprise monitor each process execution, detect risks in time so that countermeasures can be taken in advance, and improve its ability to schedule resources.
Description
Technical Field
The invention relates to a business process predictive monitoring method, in particular to a business process predictive monitoring method based on frequent activity set sequence encoding and distance measurement.
Background
Predictive business process monitoring falls into two broad categories depending on whether a process model is mined: some researchers perform predictive monitoring after discovering a process model, while another line of work relaxes this assumption, requires no process model, and makes predictions from the event log alone. In prediction based on activity sequences, sequence encoding is an important component. Sequence encoding methods include Boolean encoding, frequency-based encoding, and index-based encoding. Boolean encoding represents each attribute value as a one-dimensional feature whose value indicates whether that attribute value appears in the log trace. Frequency-based encoding builds the same feature vector as Boolean encoding, except that each value is the number of times the attribute value appears in the log trace. Boolean and frequency-based encoding can only encode discrete attributes, not continuous ones, and neither can represent the order in which activities occur. Index-based encoding addresses ordering: each feature corresponds to one event in the sequence, and its value is the activity represented by that event together with its attribute values.
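To make the contrast between the three encodings concrete, the sketch below encodes activity labels only (event attributes are omitted for brevity). The function names and the padding symbol `"-"` are illustrative assumptions, not part of the method.

```python
from collections import Counter


def boolean_encoding(trace, vocabulary):
    """One feature per activity: 1 if the activity occurs in the trace, else 0."""
    present = set(trace)
    return [1 if activity in present else 0 for activity in vocabulary]


def frequency_encoding(trace, vocabulary):
    """Same features as Boolean encoding; each value is the occurrence count."""
    counts = Counter(trace)  # missing activities count as 0
    return [counts[activity] for activity in vocabulary]


def index_encoding(trace, length, pad="-"):
    """One feature per event position; each value is that event's activity."""
    return [trace[i] if i < len(trace) else pad for i in range(length)]


vocabulary = ["A", "B", "C", "D"]
print(boolean_encoding(["A", "B", "B", "D"], vocabulary))    # [1, 1, 0, 1]
print(frequency_encoding(["A", "B", "B", "D"], vocabulary))  # [1, 2, 0, 1]
print(frequency_encoding(["D", "B", "B", "A"], vocabulary))  # [1, 2, 0, 1]
print(index_encoding(["A", "B", "B", "D"], 5))               # ['A', 'B', 'B', 'D', '-']
```

The two frequency vectors are identical even though the traces run in opposite orders, which is the ordering limitation described above; only the index-based encoding tells them apart.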
Most existing predictive monitoring approaches assume a stationary business process and make predictions based on a mined process model. In practice, however, enterprises often do not model complete end-to-end processes and manage them through a workflow system, so some steps lie outside the workflow system's control. In other scenarios, an external party needs to know or predict the progress of a process but, for data privacy reasons, can only observe the occurrence of certain events, not all activities of the process or the relationships between them. Moreover, a process may change over time, so that the event log no longer corresponds to a single process model. In these cases, predictions cannot rely on a known process model.
For prediction based only on event log information, researchers have proposed sequence comparison methods that treat an execution trace as a sequence of symbols and build a prediction model from it. However, the specificity of each process and the variability of the external environment make process prediction harder; to address this, other factors that affect the process, such as data attributes, must also be taken into account.
Disclosure of Invention
To address the problems and shortcomings of the prior art, the invention provides a business process predictive monitoring method that performs predictive monitoring of running processes without requiring a process model to be learned, and that can handle parallel activities, low-frequency activities, and concept drift as they occur in practice.
The invention solves the above technical problems through the following technical solution:
The invention provides a business process predictive monitoring method comprising the following steps:
S1: compute frequent activity sets using a mutual information formula, preserve the order of key activities within each frequent activity set, and encode activity sequences using the frequent activity sets;
S2: assign different weights to the frequent activity sets using a random forest algorithm to form a feature vector set;
S3: measure the distance between two traces within each frequent activity set using the edit distance, and weight the per-set distances to obtain the distance between the complete traces;
S4: use the weighted distance to find the K most similar historical traces, and vote on or average the target values of these K traces to obtain the prediction result.
Where consistent with common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain preferred examples of the invention.
The beneficial effects of the invention are as follows:
the invention provides a log-based prediction technique, with a new sequence encoding method and a new sequence distance measure for predictive process monitoring. Compared with Boolean encoding, frequency-based encoding, and index-based encoding, the method accounts for the different influence of individual attributes and activities on the prediction target, handles inter-sequence distance measurement effectively in the presence of low-frequency and parallel activities, and adapts to model changes caused by concept drift as the log is updated.
Drawings
FIG. 1 is a block diagram of a business process predictive monitoring algorithm according to a preferred embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
This embodiment provides a business process predictive monitoring method based on frequent activity set sequence encoding and distance measurement, which performs sequence encoding and sequence distance measurement based on event logs. The algorithm encodes activity sequences using frequent activity sets, assigns different weights to frequent activity subsequences and data attributes, and searches historical data for similar cases to perform predictive monitoring. This approach monitors running process instances even when the process model is unknown, and it adapts to model changes caused by concept drift.
As shown in FIG. 1, the predictive monitoring algorithm comprises the following steps:
S1: Frequent activity set encoding
For activities $a_i$ and $a_j$, the mutual information $I(a_i : a_j)$ is defined in terms of the following probabilities:
$p(a_i, a_j)$ is the probability that $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, and $p(a_i)$ and $p(a_j)$ are the independent probabilities that $a_i$ and $a_j$, respectively, occur in a log trace.
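The sketch below estimates $I(a_i : a_j)$ from an event log under three assumptions made for illustration: it uses a pointwise-mutual-information form, $\log \frac{p(a_i, a_j)}{p(a_i)\,p(a_j)}$; it estimates each probability as a fraction of traces; and it reads "$a_i$ appears before $a_j$" as some occurrence of $a_i$ preceding some occurrence of $a_j$. None of these choices should be taken as the patent's exact formula.

```python
import math


def occurs_before(trace, a, b):
    """True if some occurrence of activity a precedes some occurrence of b."""
    if a not in trace or b not in trace:
        return False
    first_a = trace.index(a)
    last_b = len(trace) - 1 - trace[::-1].index(b)
    return first_a < last_b


def mutual_information(event_log, a, b):
    """Assumed PMI form: log(p(a, b) / (p(a) * p(b))), probabilities as trace fractions."""
    n = len(event_log)
    p_a = sum(a in trace for trace in event_log) / n
    p_b = sum(b in trace for trace in event_log) / n
    p_ab = sum(occurs_before(trace, a, b) for trace in event_log) / n
    if p_ab == 0:
        return float("-inf")  # a never precedes b anywhere in the log
    return math.log(p_ab / (p_a * p_b))


event_log = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["D"]]
print(mutual_information(event_log, "A", "B"))  # log(4/3), about 0.288
print(mutual_information(event_log, "B", "A"))  # -inf
```

Under this assumed form, a positive score means $a_i$ precedes $a_j$ more often than independence would predict, which is the kind of pairwise relation the $I(a_i : a_j) > 0$ condition on frequent activity sets selects.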
Let $l_k$ be a frequent activity set; then $I(a_i : a_j) > 0$ for all $a_i, a_j \in l_k$. Suppose the activities can be divided into multiple frequent activity subsets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$; then $n$ feature sub-vectors $f_1, f_2, \ldots, f_n$ can be constructed. A log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$ can be divided into $N$ subsets $E^* = \{E_1, E_2, \ldots, E_N\}$, where an event $e_j$ whose activity $\pi_A(e_j)$ belongs to $l_i$ is assigned to $E_i$. Sorting the events in each $E_i$ by occurrence time yields an ordered subsequence.
For the given attribute set, the feature sub-vector is $f_k = (\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,s}, \alpha_{2,1}, \ldots, \alpha_{2,s}, \ldots, \alpha_{m,s})$.
The feature sub-vectors together form the feature vector $f = (f_1, f_2, \ldots, f_n)$.
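The partition step of S1 can be sketched as follows, assuming events are dictionaries with hypothetical keys `"activity"` and `"time"`, that the frequent activity sets are already known, and that activities belonging to no frequent set are simply dropped. Each ordered subsequence is returned as its list of activity labels, the form later compared by edit distance in S3; how attribute values fill the $\alpha$ entries is not modelled here.

```python
def split_trace(trace, frequent_sets):
    """Divide a trace into one time-ordered activity subsequence per frequent set.

    trace: list of events, each a dict with (hypothetical) keys "activity" and "time".
    frequent_sets: list of sets of activity names, l_1 ... l_n.
    """
    subsets = [[] for _ in frequent_sets]
    for event in trace:
        for i, activity_set in enumerate(frequent_sets):
            if event["activity"] in activity_set:  # pi_A(e_j) belongs to l_i
                subsets[i].append(event)
    # Sort each E_i by occurrence time and keep only the activity labels.
    return [
        [ev["activity"] for ev in sorted(subset, key=lambda ev: ev["time"])]
        for subset in subsets
    ]


trace = [
    {"activity": "A", "time": 1},
    {"activity": "D", "time": 4},
    {"activity": "C", "time": 2},
    {"activity": "B", "time": 3},
    {"activity": "X", "time": 5},
]
print(split_trace(trace, [{"A", "B"}, {"C", "D"}]))  # [['A', 'B'], ['C', 'D']]
```

In this sketch each frequent set is compared on its own, so two parallel branches (here A/B and C/D) that interleave differently across traces do not disturb each other's subsequences.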
S2: Feature selection
The attributes are screened using a random forest. The sequence information feature is denoted $f = (f_1, f_2, \ldots, f_n)$ and the attribute features $g = (g_1, g_2, \ldots, g_m)$; combining them gives the feature $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$.
For each feature $X_i$, its importance $\mathrm{Vim}(X_i)$ is calculated. For classification problems, feature importance is evaluated with the Gini coefficient; for regression problems, it is evaluated with the minimum mean squared error.
For feature $X_i$, suppose the event log can be divided into $E = \{E_1, E_2, \ldots, E_K\}$; its Gini coefficient $\mathrm{Gini}(E, X_i)$ is computed over these subsets.
The decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$. The importance $\mathrm{Vim}_j$ of feature $X_i$ in decision tree $t_j$ is the change in the Gini coefficient before and after branching at node $p$:
$$\mathrm{Vim}_{p,j}(E, X_i) = \mathrm{Gini}_p(E, X_i) - \mathrm{Gini}_l(E, X_i) - \mathrm{Gini}_r(E, X_i)$$
where $\mathrm{Gini}_p$ is the Gini coefficient before branching, and $\mathrm{Gini}_l$ and $\mathrm{Gini}_r$ are the Gini coefficients of the left and right child nodes after branching. This yields the importance of each feature:
$$V = (v_1, v_2, \ldots, v_{m+n}), \qquad v_i = \mathrm{Vim}(E, X_i)$$
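The importance computation can be sketched as below. It assumes the standard Gini coefficient $1 - \sum_k p_k^2$ over class labels, and it represents the trained forest by a hypothetical list of split records `(feature_index, parent_labels, left_labels, right_labels)` gathered from all trees; per-feature importances are summed over splits without normalisation.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_importance(parent, left, right):
    """Vim_{p,j} = Gini_p - Gini_l - Gini_r, as written in the text (children unweighted)."""
    return gini(parent) - gini(left) - gini(right)


def feature_importances(splits, n_features):
    """Sum the split importances of each feature over every node of every tree.

    splits: hypothetical records (feature_index, parent, left, right), where each
    list holds the target labels of the samples reaching that node.
    Returns V = (v_1, ..., v_{m+n}).
    """
    v = [0.0] * n_features
    for feature_index, parent, left, right in splits:
        v[feature_index] += split_importance(parent, left, right)
    return v


splits = [
    (0, ["ok", "ok", "late", "late"], ["ok", "ok"], ["late", "late"]),  # pure split
    (1, ["ok", "late", "ok", "late"], ["ok", "late"], ["ok", "late"]),  # useless split
]
print(feature_importances(splits, 3))  # [0.5, -0.5, 0.0]
```

With unweighted children, a split that separates nothing scores negatively, as feature 1 does here. The widely used mean-decrease-in-impurity variant (for example in scikit-learn) instead weights each child's Gini by its share of the parent's samples.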
S3: Feature distance measurement
The edit distance $\mathrm{lev}$ is defined as the minimum number of insertions, deletions, and substitutions required to convert one sequence into another. Let the sequence information feature be $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded from one frequent activity subset. For traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $\mathrm{lev}(f_i(\sigma_a), f_i(\sigma_b))$ denotes the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ denotes their Euclidean distance on attribute $g_i$.
Weighting each of these distances by its feature importance yields the trace distance function of the algorithm.
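A minimal sketch of the S3 distance follows. It assumes the weighted distance is a linear combination: each frequent-set subsequence contributes its importance weight times the Levenshtein distance, and each numeric attribute contributes its weight times the Euclidean distance, which for a single scalar attribute is the absolute difference. The function names and argument layout are hypothetical.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    previous = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        current = [i]
        for j, y in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def trace_distance(seqs_a, seqs_b, attrs_a, attrs_b, seq_weights, attr_weights):
    """Hypothetical weighted distance between two traces.

    seqs_*: per-frequent-set activity subsequences f_1 ... f_n.
    attrs_*: numeric attribute values g_1 ... g_m.
    *_weights: importances from S2 for the sequence and attribute features.
    """
    seq_part = sum(w * levenshtein(a, b)
                   for w, a, b in zip(seq_weights, seqs_a, seqs_b))
    attr_part = sum(w * abs(a - b)
                    for w, a, b in zip(attr_weights, attrs_a, attrs_b))
    return seq_part + attr_part


print(levenshtein("kitten", "sitting"))  # 3
print(trace_distance([["A", "B"], ["C", "D"]], [["A", "B"], ["D"]],
                     [100.0], [80.0], [0.5, 0.3], [0.2]))  # about 4.3
```

In practice the attribute values would usually be normalised first, so that a large-valued attribute such as an amount does not swamp the edit-distance terms.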
S4: K-nearest-neighbor prediction
Using the trace filtering module and the distance function defined in S3, the K most similar historical traces are found. Their target values are then voted on (discrete targets) or averaged (continuous targets) to obtain the prediction result.
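The neighbour search and aggregation can be sketched as follows. The distance is passed in as a function (for example, a weighted trace distance as in S3), and the `history` list of `(trace, target)` pairs stands for traces that passed the trace filtering module, whose internals are not modelled here. All names are illustrative.

```python
from collections import Counter
from statistics import mean


def knn_predict(prefix, history, distance, k, discrete=True):
    """Predict a target for a running trace prefix from its k nearest historical traces.

    history: list of (trace, target) pairs; distance(a, b) returns a non-negative number.
    Discrete targets are decided by majority vote, continuous targets by the mean.
    """
    neighbours = sorted(history, key=lambda item: distance(prefix, item[0]))[:k]
    targets = [target for _, target in neighbours]
    if discrete:
        return Counter(targets).most_common(1)[0][0]
    return mean(targets)


def gap(a, b):
    """Toy distance between one-number stand-ins for traces."""
    return abs(a - b)


outcomes = [(1, "on_time"), (2, "on_time"), (10, "late"), (11, "late")]
durations = [(1, 5.0), (2, 7.0), (10, 100.0)]
print(knn_predict(0, outcomes, gap, k=3))                   # on_time
print(knn_predict(0, durations, gap, k=2, discrete=False))  # 6.0
```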
S5: Concept drift detection
A business process changes over time. Such changes fall into two cases: either the frequent relationships between activities change, or they remain unchanged.
When the frequent relationships between activities change, the mutual information values of all related activities are updated synchronously as new execution traces are added to the historical data. When a mutual information value moves to the other side of the threshold, the frequent activity sets are recalculated.
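The update rule can be sketched as an incremental counter. This is a hypothetical illustration that assumes a PMI-style score $\log \frac{p(a, b)}{p(a)\,p(b)}$ with probabilities taken as trace fractions; it reports every ordered activity pair whose score crosses the threshold when a trace arrives, which is the signal to recompute the frequent activity sets.

```python
import math
from collections import Counter
from itertools import permutations


class MutualInfoTracker:
    """Keep running trace counts so I(a_i : a_j) can be refreshed as traces arrive.

    Assumes a PMI-style score log(p(a, b) / (p(a) p(b))) with probabilities as
    trace fractions; pairs crossing `threshold` signal a frequent-set recomputation.
    """

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.n = 0                # number of traces seen
        self.single = Counter()   # traces containing activity a
        self.ordered = Counter()  # traces in which a occurs before b

    def score(self, a, b):
        if self.ordered[(a, b)] == 0:
            return float("-inf")
        p_ab = self.ordered[(a, b)] / self.n
        p_a = self.single[a] / self.n
        p_b = self.single[b] / self.n
        return math.log(p_ab / (p_a * p_b))

    def add_trace(self, trace):
        """Add one trace; return the ordered pairs whose side of the threshold changed."""
        pairs = list(permutations(sorted(set(self.single) | set(trace)), 2))
        before = {p: self.n > 0 and self.score(*p) > self.threshold for p in pairs}
        self.n += 1
        self.single.update(set(trace))
        for a, b in permutations(set(trace), 2):
            first_a = trace.index(a)
            last_b = len(trace) - 1 - trace[::-1].index(b)
            if first_a < last_b:  # some a precedes some b
                self.ordered[(a, b)] += 1
        return [p for p in pairs if (self.score(*p) > self.threshold) != before[p]]


tracker = MutualInfoTracker(threshold=0.0)
print(tracker.add_trace(["A", "B"]))  # []
print(tracker.add_trace(["A", "B"]))  # []
print(tracker.add_trace(["C"]))       # [('A', 'B')]
```

In this toy run, a trace containing neither A nor B still pushes their score above zero, because $p(A)$ and $p(B)$ fall while the pair stays perfectly ordered; for that reason the sketch rechecks all known pairs rather than only those in the new trace.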
If the frequent relationships between activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs recorded before the change, so that the most similar execution data is selected.
Business process management has become an important part of modern enterprises. Enterprises record the execution of their business processes in event logs. Mining the historical information recorded in these logs enables predictive monitoring of the current process, for example predicting the next activity, the process outcome, the probability that the process becomes abnormal, and activity execution times. Predictive monitoring uses the prefix of the current, incomplete event trace to predict subsequent activities, which helps the enterprise monitor each process execution, detect risks in time so that countermeasures can be taken in advance, and improve its ability to schedule resources.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.
Claims (4)
1. A business process predictive monitoring method, characterized by comprising the following steps:
S1: computing frequent activity sets using a mutual information formula, preserving the order of key activities within each frequent activity set, and encoding activity sequences using the frequent activity sets, wherein
the mutual information $I(a_i : a_j)$ of activities $a_i$ and $a_j$ is defined in terms of $p(a_i, a_j)$, the probability that $a_i$ and $a_j$ appear in the same log trace with $a_i$ occurring before $a_j$, and $p(a_i)$ and $p(a_j)$, the independent probabilities that $a_i$ and $a_j$, respectively, occur in a log trace;
letting $l_k$ be a frequent activity set, $I(a_i : a_j) \gg 0$ for all $a_i, a_j \in l_k$; assuming the activities are divided into multiple frequent activity subsets $L = \{l_1, l_2, \ldots, l_n\}$ with $l_i = \{a_1, a_2, \ldots, a_m\}$, $n$ feature sub-vectors $f_1, f_2, \ldots, f_n$ are constructed; a log trace $\sigma = \langle e_1, e_2, \ldots, e_s \rangle$ is divided into $z$ subsets $E^* = \{E_1, E_2, \ldots, E_z\}$, where an event $e_j$ whose activity $\pi_A(e_j)$ belongs to $l_i$ is assigned to $E_i$; and the events in each $E_i$ are sorted by occurrence time to obtain an ordered subsequence;
for the given attribute set, the feature sub-vector is $f_k = (\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,s}, \alpha_{2,1}, \ldots, \alpha_{2,s}, \ldots, \alpha_{m,s})$;
the feature sub-vectors form the feature vector $f = (f_1, f_2, \ldots, f_n)$;
S2: assigning different weights to the frequent activity sets using a random forest algorithm to form a feature vector set;
S3: measuring the distance between two traces within each frequent activity set using the edit distance, and weighting the per-set distances to obtain the distance between the complete traces;
S4: using the weighted distance to find the K most similar historical traces, and voting on or averaging the target values of these K traces to obtain a prediction result.
2. The business process predictive monitoring method of claim 1, further comprising step S5: when the frequent relationships between activities change and a new execution trace is added to the historical data, synchronously updating the mutual information values of all related activities, and recalculating the frequent activity sets when the relationship between a mutual information value and the threshold changes;
when the frequent relationships between activities do not change, the search for similar historical traces and features in the trace filtering module and the feature distance measurement module automatically filters out logs recorded before the change and selects the most similar execution data.
3. The business process predictive monitoring method of claim 1, wherein, in step S2,
the sequence information feature is denoted $f = (f_1, f_2, \ldots, f_n)$,
the attribute features are denoted $g = (g_1, g_2, \ldots, g_m)$,
and combining them gives the feature $X = (f_1, f_2, \ldots, f_n, g_1, g_2, \ldots, g_m) = (X_1, X_2, \ldots, X_{m+n})$;
for feature $X_i$, the event log is assumed to be divided into $E = \{E_1, E_2, \ldots, E_K\}$, over which the Gini coefficient is computed;
the decision forest $T$ comprises decision trees $t_1, \ldots, t_j, \ldots, t_c$, and the importance $\mathrm{Vim}_j$ of feature $X_i$ in decision tree $t_j$ is the change in the Gini coefficient before and after branching at node $p$:
$$\mathrm{Vim}_{p,j}(E, X_i) = \mathrm{Gini}_p(E, X_i) - \mathrm{Gini}_l(E, X_i) - \mathrm{Gini}_r(E, X_i)$$
where $\mathrm{Gini}_p$ denotes the Gini coefficient before branching, and $\mathrm{Gini}_l$ and $\mathrm{Gini}_r$ denote the Gini coefficients of the left and right child nodes after branching, thereby obtaining the importance of each feature:
$$V = (v_1, v_2, \ldots, v_{m+n}), \qquad v_i = \mathrm{Vim}(E, X_i).$$
4. The business process predictive monitoring method of claim 3, wherein, in step S3, the edit distance $\mathrm{lev}$ is defined as the minimum number of insertions, deletions, and substitutions required to convert one sequence into another;
the sequence information feature is $f = (f_1, f_2, \ldots, f_n)$, where $f_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n})$ is the feature vector encoded from one frequent activity subset; for traces $\sigma_a = \langle e_{a,1}, e_{a,2}, \ldots, e_{a,K} \rangle$ and $\sigma_b$, $\mathrm{lev}(f_i(\sigma_a), f_i(\sigma_b))$ denotes the distance between $\sigma_a$ and $\sigma_b$ on $f_i$, and $d(g_i(\sigma_a), g_i(\sigma_b))$ denotes their Euclidean distance on attribute $g_i$;
weighting each distance by its importance yields the trace distance function of the algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811510292.8A CN109753591B (en) | 2018-12-11 | 2018-12-11 | Business process predictive monitoring method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811510292.8A CN109753591B (en) | 2018-12-11 | 2018-12-11 | Business process predictive monitoring method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109753591A CN109753591A (en) | 2019-05-14 |
CN109753591B true CN109753591B (en) | 2024-01-09 |
Family
ID=66403689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811510292.8A Active CN109753591B (en) | 2018-12-11 | 2018-12-11 | Business process predictive monitoring method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109753591B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659275B (en) * | 2019-09-23 | 2022-02-08 | 东华大学 | Dynamic production environment abnormity monitoring system facing to real-time data flow |
CN110956309A (en) * | 2019-10-30 | 2020-04-03 | 南京大学 | Flow activity prediction method based on CRF and LSTM |
CN111191897B (en) * | 2019-12-23 | 2023-06-30 | 浙江传媒学院 | Business process online compliance prediction method and system based on bidirectional GRU neural network |
CN111353702A (en) * | 2020-02-28 | 2020-06-30 | 中国工商银行股份有限公司 | Change operation risk calculation method and device |
CN111782620A (en) * | 2020-06-19 | 2020-10-16 | 多加网络科技(北京)有限公司 | Credit link automatic tracking platform and method thereof |
CN112052273B (en) * | 2020-07-27 | 2021-08-31 | 杭州电子科技大学 | Method for extracting next candidate activity of multi-angle business process |
CN112612765A (en) * | 2020-12-21 | 2021-04-06 | 山东理工大学 | Flow variant difference analysis method and system based on drift detection |
CN113537712B (en) * | 2021-06-10 | 2022-03-08 | 杭州电子科技大学 | Business process residual activity sequence prediction method based on trajectory replay |
CN114757592B (en) * | 2022-06-15 | 2022-10-21 | 北京乐开科技有限责任公司 | Workflow engine and RPA (resilient packet Access) fusion arrangement method and system |
CN115878421B (en) * | 2022-12-09 | 2023-11-14 | 国网湖北省电力有限公司信息通信公司 | Data center equipment level fault prediction method, system and medium |
CN116225513B (en) * | 2023-05-09 | 2023-07-04 | 安徽思高智能科技有限公司 | RPA dynamic flow discovery method and system based on concept drift |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193804A (en) * | 2017-06-02 | 2017-09-22 | 河海大学 | A kind of refuse messages text feature selection method towards word and portmanteau word |
CN107808258A (en) * | 2017-11-21 | 2018-03-16 | 杭州电子科技大学 | The optimal employee's distribution method of workflow based on traffic log and collaboration mode |
- 2018-12-11: CN201811510292.8A (CN109753591B), status active
Also Published As
Publication number | Publication date |
---|---|
CN109753591A (en) | 2019-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109753591B (en) | Business process predictive monitoring method | |
KR102274389B1 (en) | Method for building anomaly pattern detection model using sensor data, apparatus and method for detecting anomaly using the same | |
US9379951B2 (en) | Method and apparatus for detection of anomalies in integrated parameter systems | |
AU2018203375A1 (en) | Method and system for data based optimization of performance indicators in process and manufacturing industries | |
KR20190019493A (en) | It system fault analysis technique based on configuration management database | |
CN112800116B (en) | Method and device for detecting abnormity of service data | |
KR101910926B1 (en) | Technique for processing fault event of it system | |
JP7195264B2 (en) | Automated decision-making using step-by-step machine learning | |
US9799007B2 (en) | Method of collaborative software development | |
CN110334208B (en) | LKJ fault prediction diagnosis method and system based on Bayesian belief network | |
Kadwe et al. | A review on concept drift | |
KR102359090B1 (en) | Method and System for Real-time Abnormal Insider Event Detection on Enterprise Resource Planning System | |
CN113032238A (en) | Real-time root cause analysis method based on application knowledge graph | |
CN115001753B (en) | Method and device for analyzing associated alarms, electronic equipment and storage medium | |
CN116457802A (en) | Automatic real-time detection, prediction and prevention of rare faults in industrial systems using unlabeled sensor data | |
CN114676435A (en) | Knowledge graph-based software vulnerability availability prediction method | |
JP5889759B2 (en) | Missing value prediction device, missing value prediction method, missing value prediction program | |
WO2023009027A1 (en) | Method and system for warning of upcoming anomalies in a drilling process | |
JP7367196B2 (en) | Methods and systems for identification and analysis of regime shifts | |
JP2023520066A (en) | Data processing for industrial machine learning | |
CN116668083A (en) | Network traffic anomaly detection method and system | |
CN112348318B (en) | Training and application method and device of supply chain risk prediction model | |
CN116186603A (en) | Abnormal user identification method and device, computer storage medium and electronic equipment | |
KR20190030193A (en) | Technique for processing fault event of it system | |
CN112201340B (en) | Electrocardiogram disease determination method based on Bayesian network filtering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |