CN117495071A - Flow discovery method and system based on predictive log enhancement - Google Patents

Flow discovery method and system based on predictive log enhancement

Info

Publication number
CN117495071A
Authority
CN
China
Prior art keywords
sequence
training
prediction
header
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311851217.9A
Other languages
Chinese (zh)
Other versions
CN117495071B (en
Inventor
裴学良
陈伟雄
邓逸
郑超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd
Priority to CN202311851217.9A
Publication of CN117495071A
Application granted
Publication of CN117495071B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a flow discovery method based on predictive log enhancement, which relates to the field of flow discovery and comprises the following steps: S1: dividing the tracks into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$; S2: computing a training header sequence set $S_A^{\mu}$ and a prediction header sequence set $S_B^{\mu}$; S3: training the prediction model $M^{\mu-1}$ on the training header sequence set $S_A^{\mu}$ to obtain the prediction model $M^{\mu}$; S4: extracting the completed header sequences from the reference header sequence set $S_B^{\mu'}$ and merging them with $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$, and taking the remaining header sequences of $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$; S5: computing the best evaluation score $E_{best}$ from the flow model $P_A^{\mu+1}$, and, if $\mu$ equals the maximum number of iterations $\beta$ or $E_{best}$ has converged, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model. Through the continuously updated training header sequence set, prediction header sequence set, and prediction model, the invention reduces the influence of noise and abnormal data on the flow model and obtains a more accurate optimal flow model.

Description

Flow discovery method and system based on predictive log enhancement
Technical Field
The invention relates to the field of process discovery, in particular to a process discovery method and system based on predictive log enhancement.
Background
Flow discovery is the most challenging technique in the field of flow mining. A flow discovery algorithm mainly processes serialized log information to generate a visual business flow model that represents activities, sequences, and dependencies, typically with graphical symbols that make the model easier to understand and share. Automatically identifying and modeling business flows gives organizations opportunities to improve efficiency, reduce risk, and increase customer satisfaction.
Flow prediction algorithms are analytical techniques based on historical event data or event logs, intended to predict the next activity type or other state that may occur in the future. These algorithms typically use a sequence of events that have been recorded to build a model to identify and predict patterns and trends in the business process, to help organizations provide data-driven decision support, or to use real-time predictions of the model to help businesses better monitor the business process.
Current flow discovery methods depend too heavily on the quality of the log data. When the log file is weakly structured or contains a large number of noise events, it is difficult to mine an effective flow model: the resulting model, which was expected to visualize the business flow, either fails to summarize the event log accurately or provides no meaningful abstraction of it. Meanwhile, current log preprocessing methods rely mainly on domain knowledge to screen and filter out flow tracks that might disturb flow discovery; such methods depend too heavily on the data characteristics of each scenario and may cause useful information to be lost.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a process discovery method based on predictive log enhancement, which can obtain a more accurate optimal process model.
The invention provides a flow discovery method based on predictive log enhancement, which comprises the following steps:
S1: acquiring the tracks in a business process log and dividing them into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, where $\mu$ is the iteration number;
S2: computing the training header sequence set $S_A^{\mu}$ of the training track set $L_A^{\mu}$ and the prediction header sequence set $S_B^{\mu}$ of the prediction track set $L_B^{\mu}$;
S3: training the prediction model $M^{\mu-1}$ on the training header sequence set $S_A^{\mu}$ to obtain the prediction model $M^{\mu}$;
S4: inputting the prediction header sequence set $S_B^{\mu}$ into the prediction model $M^{\mu}$ for event activity prediction to obtain a reference header sequence set $S_B^{\mu'}$; extracting the completed header sequences from $S_B^{\mu'}$ and merging them with $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; taking the remaining header sequences of $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$;
S5: computing the flow model $P_A^{\mu+1}$ from the training track set $L_A^{\mu+1}$, and computing the best evaluation score $E_{best}$ from the flow model $P_A^{\mu+1}$; if $\mu$ equals the maximum number of iterations $\beta$ or $E_{best}$ has converged, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise, letting $\mu = \mu + 1$ and returning to step S1.
Preferably, step S1 specifically includes:
S11: obtaining a business process log $L = \{\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_\lambda, \ldots\}$, where $\lambda$ is the index of a track, $\sigma_\lambda$ is the $\lambda$-th track, and $A$ is the set of all activity types recorded in the business process log;
S12: letting $\sigma_\lambda = \{e_1, e_2, \ldots, e_m, \ldots\}$, where $m$ is the index of an event, $e_m = (\lambda, a_m)$ is the $m$-th event executed in track $\sigma_\lambda$, and $a_m \in A$ is the activity type of event $e_m$;
S13: setting a division ratio $\tau$ and randomly dividing the tracks in the business process log $L$ into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, such that the numbers of tracks in $L_A^{\mu}$ and $L_B^{\mu}$ are in the ratio $\tau : (1-\tau)$.
Preferably, step S2 specifically includes:
S21: setting the header sequence length to $w$;
S22: traversing each track of the training track set $L_A^{\mu}$ in chronological order and converting each activity subsequence contained in the track into a header sequence of fixed length $w$ by padding and truncation, to obtain the corresponding training header sequence set $S_A^{\mu}$;
S23: obtaining, in the same way as in step S22, the prediction header sequence set $S_B^{\mu}$ corresponding to the prediction track set $L_B^{\mu}$.
Preferably, the step S3 specifically includes:
S31: letting the training header sequence set be $S_A^{\mu} = \{s_1, s_2, \ldots, s_k, \ldots\}$, where $k$ is the index of a header sequence and $s_k$ is the $k$-th header sequence;
S32: initializing an activity feature matrix $H$ and constructing a mapping table $f_{tran}$; mapping the activity types in header sequence $s_k$ to activity sequence numbers $q_k$ through $f_{tran}$, and constructing the corresponding sequence feature $v_k$ from the activity sequence numbers $q_k$ and the activity feature matrix $H$;
S33: repeating step S32 for all header sequences to obtain the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$;
S34: setting the end-of-flow symbol to [E], and inputting the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$ together with the actual next activity sequence numbers into the prediction model $M^{\mu-1}$ for training, to obtain the prediction model $M^{\mu}$.
Preferably, step S4 specifically includes:
S41: acquiring the prediction header sequence set $S_B^{\mu} = \{s_1, s_2, \ldots, s_z, \ldots\}$, where $z$ is the index of a header sequence and $s_z$ is the $z$-th header sequence; letting the initial reference header sequence set be $S_{B0}^{\mu'} = S_B^{\mu}$;
S42: acquiring the activity types $\langle a_1, a_2, \ldots, a_w \rangle$ of header sequence $s_z$, where $w$ is the header sequence length and $a_w$ is the activity type of the $w$-th event;
S43: inputting the sequence feature $v_z$ corresponding to header sequence $s_z$ into the prediction model $M^{\mu}$ for event activity prediction to obtain the activity type $y$ of the next event; letting the current reference header sequence set be $S_{Bz}^{\mu'} = S_{B(z-1)}^{\mu'} \cup \langle a_1, a_2, \ldots, a_w, y \rangle$; letting $z = z + 1$;
S44: repeating steps S42-S43 until the sequence features of all header sequences have been input into the prediction model $M^{\mu}$, to obtain the reference header sequence set $S_B^{\mu'}$;
S45: taking the header sequences in the reference header sequence set $S_B^{\mu'}$ that carry the end-of-flow symbol [E] as completed header sequences, and merging the completed header sequences with the training track set $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; taking the remaining header sequences in $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$.
Preferably, step S5 specifically includes:
S51: on the training track set $L_A^{\mu+1}$, obtaining a flow model $P_A^{\mu+1}$ with a flow discovery algorithm, and computing the evaluation score $E^{\mu+1}$ of the flow model $P_A^{\mu+1}$ on the training track set $L_A^{\mu}$; if $E^{\mu+1} > E_{best}$, assigning the value of $E^{\mu+1}$ to $E_{best}$;
S52: determining whether $E_{best}$ has converged; if so, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; if not, going to step S53;
S53: determining whether $\mu$ equals the maximum number of iterations $\beta$; if so, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise, letting $\mu = \mu + 1$ and returning to step S1.
The storage device stores instructions and data for implementing the predictive log enhancement-based flow discovery method.
A predictive log enhancement based process discovery system, comprising: a processor and a storage device; the processor loads and executes the instructions and data in the storage device for implementing the predictive log enhancement-based flow discovery method.
The invention has the following beneficial effects:
The invention extracts a training header sequence set and a prediction header sequence set from the tracks in the business process log, trains the prediction model on the training header sequence set, and inputs the prediction header sequence set into the prediction model to update both sets. Through the continuously updated training header sequence set, prediction header sequence set, and prediction model, the influence of noise and abnormal data on the flow model is reduced, so that a more accurate optimal flow model is obtained and its robustness and generality are improved.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a system architecture diagram of an embodiment of the present invention;
The achievement of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the present invention provides a predictive log enhancement-based flow discovery method, including:
S1: acquiring the tracks in a business process log and dividing them into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, where $\mu$ is the iteration number;
S2: computing the training header sequence set $S_A^{\mu}$ of the training track set $L_A^{\mu}$ and the prediction header sequence set $S_B^{\mu}$ of the prediction track set $L_B^{\mu}$;
S3: training the prediction model $M^{\mu-1}$ on the training header sequence set $S_A^{\mu}$ to obtain the prediction model $M^{\mu}$;
S4: inputting the prediction header sequence set $S_B^{\mu}$ into the prediction model $M^{\mu}$ for event activity prediction to obtain a reference header sequence set $S_B^{\mu'}$; extracting the completed header sequences from $S_B^{\mu'}$ and merging them with $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; taking the remaining header sequences of $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$;
S5: computing the flow model $P_A^{\mu+1}$ from the training track set $L_A^{\mu+1}$, and computing the best evaluation score $E_{best}$ from the flow model $P_A^{\mu+1}$; if $\mu$ equals the maximum number of iterations $\beta$ or $E_{best}$ has converged, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise, letting $\mu = \mu + 1$ and returning to step S1.
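The outer iteration of steps S1 to S5 can be sketched as a small control loop. This is a minimal sketch under stated assumptions: the text does not say how convergence of the best score is judged, so the `tol` and `patience` parameters are hypothetical, and `step` is a placeholder callable standing in for one full pass of S1 to S5 that returns the evaluation score of that pass.

```python
from typing import Callable, Tuple


def iterate_discovery(
    step: Callable[[int], float],
    beta: int,
    tol: float = 1e-6,
    patience: int = 2,
) -> Tuple[int, float]:
    """Run step(mu) for mu = 1..beta and track the best score E_best.

    Stops early when E_best has not improved by more than `tol` for
    `patience` consecutive iterations (an assumed convergence rule).
    Returns the last iteration number and E_best.
    """
    e_best = float("-inf")
    stale = 0  # consecutive iterations without improvement
    mu = 0
    for mu in range(1, beta + 1):
        score = step(mu)  # one pass of S1..S5, returns the evaluation score
        if score > e_best + tol:
            e_best = score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            break  # E_best is treated as converged
    return mu, e_best
```

A real `step` would also need to keep the flow model that produced the best score; that bookkeeping is omitted here.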
Further, complete business process log data is obtained from the log recording system, and low-frequency noise events and repeated cycling events are removed from it to obtain the business process log L used as the algorithm input;
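The text does not define "low-frequency" or "repeated cycling" precisely. The sketch below is one possible reading, assuming that an activity is low-frequency when it occurs fewer than a hypothetical `min_count` times in the whole log, and that a repeated cycling event is an immediate repetition of the same activity within a track.

```python
from collections import Counter
from typing import List, Sequence


def clean_log(traces: Sequence[Sequence[str]], min_count: int = 2) -> List[List[str]]:
    """Drop rare activities and collapse immediate repetitions in each track."""
    counts = Counter(activity for trace in traces for activity in trace)
    cleaned: List[List[str]] = []
    for trace in traces:
        out: List[str] = []
        for activity in trace:
            if counts[activity] < min_count:
                continue  # assumed definition of a low-frequency noise event
            if out and out[-1] == activity:
                continue  # assumed definition of a repeated cycling event
            out.append(activity)
        if out:
            cleaned.append(out)
    return cleaned
```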
the step S1 specifically comprises the following steps:
S11: obtaining a business process log $L = \{\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_\lambda, \ldots\}$, where $\lambda$ is the index of a track, $\sigma_\lambda$ is the $\lambda$-th track, and $A$ is the set of all activity types recorded in the business process log;
S12: letting $\sigma_\lambda = \{e_1, e_2, \ldots, e_m, \ldots\}$, where $m$ is the index of an event, $e_m = (\lambda, a_m)$ is the $m$-th event executed in track $\sigma_\lambda$, and $a_m \in A$ is the activity type of event $e_m$;
S13: setting a division ratio $\tau$ and randomly dividing the tracks in the business process log $L$ into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, such that the numbers of tracks in $L_A^{\mu}$ and $L_B^{\mu}$ are in the ratio $\tau : (1-\tau)$.
Specifically, for example, when $\tau = 2/3$, the business process log $L$ is randomly divided into the sets $L_A^{\mu}$ and $L_B^{\mu}$, whose track counts are in the ratio 2:1.
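The random division of step S13 can be illustrated with a short sketch. The function name `split_traces` and the `seed` parameter are assumptions introduced here for reproducibility; rounding the cut point to the nearest integer is also an assumption, since the text only states the target ratio.

```python
import random
from typing import List, Optional, Sequence, Tuple

Trace = Tuple[str, ...]


def split_traces(
    traces: Sequence[Trace], tau: float, seed: Optional[int] = None
) -> Tuple[List[Trace], List[Trace]]:
    """Randomly split tracks into (training, prediction) with ratio tau : (1 - tau)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    shuffled = list(traces)
    random.Random(seed).shuffle(shuffled)  # random division of the log
    cut = round(len(shuffled) * tau)       # size of the training track set
    return shuffled[:cut], shuffled[cut:]
```

With nine tracks and `tau = 2/3`, the call returns six training tracks and three prediction tracks, matching the 2:1 example.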
Further, the step S2 specifically includes:
S21: setting the header sequence length to $w$;
S22: traversing each track of the training track set $L_A^{\mu}$ in chronological order and converting each activity subsequence contained in the track into a header sequence of fixed length $w$ by padding and truncation, to obtain the corresponding training header sequence set $S_A^{\mu}$;
S23: obtaining, in the same way as in step S22, the prediction header sequence set $S_B^{\mu}$ corresponding to the prediction track set $L_B^{\mu}$.
Further, step S22 specifically includes:
S221: the tracks in $L_A^{\mu}$ are $\{\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_\lambda, \ldots\}$, where $\sigma_\lambda \in L_A^{\mu}$ is the $\lambda$-th track traversed, $1 \le \lambda \le |L_A^{\mu}|$, and $|L_A^{\mu}|$ is the number of elements in $L_A^{\mu}$;
S222: for the track $\sigma_\lambda = \{e_1, e_2, \ldots, e_m\}$, the activity type subsequence obtained at the $t$-th traversal step is $\langle e_1.a, e_2.a, \ldots, e_t.a \rangle$, where $1 \le t \le m$; if $t < w$, $w - t$ empty activity types are padded in front of the subsequence to obtain the header sequence $s_t = \langle 0, 0, \ldots, e_1.a, e_2.a, \ldots, e_t.a \rangle$; if $t > w$, the last $w$ events are kept to obtain the header sequence $s_t = \langle e_{t-w+1}.a, e_{t-w+2}.a, \ldots, e_t.a \rangle$;
S223: updating $S_A^{\mu} = S_A^{\mu} \cup \{s_t\}$ and $t = t + 1$, where $|s_t| = w$; if $t < m$, returning to step S222; otherwise, continuing with the next track in $L_A^{\mu}$: when $\lambda + 1 \le |L_A^{\mu}|$, updating $\lambda = \lambda + 1$ and resetting $t = 0$; once all tracks in $L_A^{\mu}$ have been traversed, outputting the final $S_A^{\mu}$.
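The padding and truncation of step S222 can be sketched for a single track as follows. This is a minimal illustration, assuming the empty activity type is represented by the integer `0` as in the text and that a prefix with exactly `t = w` events is used unchanged; the function name `head_sequences` is hypothetical.

```python
from typing import List, Sequence, Tuple, Union

PAD = 0  # empty activity type used for front padding

Head = Tuple[Union[str, int], ...]


def head_sequences(activities: Sequence[str], w: int) -> List[Head]:
    """Return one fixed-length header sequence per prefix of a track."""
    heads: List[Head] = []
    for t in range(1, len(activities) + 1):
        prefix: List[Union[str, int]] = list(activities[:t])
        if t < w:
            prefix = [PAD] * (w - t) + prefix  # pad w - t empty types in front
        else:
            prefix = prefix[-w:]               # keep only the last w events
        heads.append(tuple(prefix))
    return heads
```

For the track `a, b, c, d` and `w = 3` this yields `(0, 0, a)`, `(0, a, b)`, `(a, b, c)`, and `(b, c, d)`, each of length `w`.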
Further, the step S3 specifically includes:
S31: letting the training header sequence set be $S_A^{\mu} = \{s_1, s_2, \ldots, s_k, \ldots\}$, where $k$ is the index of a header sequence and $s_k$ is the $k$-th header sequence;
S32: initializing an activity feature matrix $H$ and constructing a mapping table $f_{tran}$; mapping the activity types in header sequence $s_k$ to activity sequence numbers $q_k$ through $f_{tran}$, and constructing the corresponding sequence feature $v_k$ from the activity sequence numbers $q_k$ and the activity feature matrix $H$;
S33: repeating step S32 for all header sequences to obtain the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$;
S34: setting the end-of-flow symbol to [E], and inputting the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$ together with the actual next activity sequence numbers into the prediction model $M^{\mu-1}$ for training, to obtain the prediction model $M^{\mu}$.
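Steps S31 to S33 can be read as an embedding lookup. The text does not state how $v_k$ is built from $q_k$ and $H$, so the sketch below assumes that each activity sequence number selects one row of $H$; the random initialization range, the dimension `d`, and all function names are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence


def build_mapping(head_seqs: Sequence[Sequence[Hashable]]) -> Dict[Hashable, int]:
    """Build f_tran: activity type -> activity sequence number (padding 0 -> 0)."""
    f_tran: Dict[Hashable, int] = {0: 0}
    for seq in head_seqs:
        for activity in seq:
            if activity not in f_tran:
                f_tran[activity] = len(f_tran)
    return f_tran


def init_feature_matrix(n_activities: int, d: int, seed: int = 0) -> List[List[float]]:
    """Randomly initialize the activity feature matrix H with shape n_activities x d."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_activities)]


def sequence_feature(
    seq: Sequence[Hashable], f_tran: Dict[Hashable, int], H: List[List[float]]
) -> List[List[float]]:
    """Map a header sequence to q_k, then look up the rows of H (assumed construction)."""
    q = [f_tran[activity] for activity in seq]
    return [H[index] for index in q]
```

In a trained model `H` would normally be learned together with the network weights rather than kept fixed.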
Specifically, the prediction model is a two-layer LSTM cyclic neural network, and the training process is as follows:
The vectors in the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$ are concatenated and input into a two-layer LSTM recurrent neural network, and the output is passed through the fully connected layer Linear and a softmax layer to obtain the prediction result of the model.
Here $i$ equals the number of sequence features, $hid\_X_0$ is a randomly initialized hidden-layer feature representation of the neural network, and $d$ is the number of dimensions of a sequence feature;
the complete calculation process of the neural network is as follows:
Based on the model prediction result $Y$, with the corresponding next event activity in the original log sequence as the ground truth (the end symbol [E] if there is no next event activity), cross-entropy is used as the loss function; the model parameters are trained and adjusted continuously, and the prediction model $M^{\mu}$ corresponding to the minimum loss is saved.
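The formulas for this network are not reproduced in the text. One reading consistent with the surrounding description is sketched below; the symbols $X$, $out$, $hid\_X$, and $y^{*}$ and the shape $i \times d$ are assumptions introduced here, not notation confirmed by the source.

```latex
\begin{aligned}
X &= \mathrm{concat}(v_1, v_2, \ldots, v_i) \in \mathbb{R}^{i \times d} \\
(out,\ hid\_X) &= \mathrm{LSTM}_{\text{2-layer}}(X,\ hid\_X_0) \\
Y &= \mathrm{softmax}(\mathrm{Linear}(out)) \\
\mathcal{L} &= -\sum_{c} \mathbb{1}[c = y^{*}] \, \log Y_c
\end{aligned}
```

Here $y^{*}$ denotes the true next activity (or [E]), and $\mathcal{L}$ is the usual cross-entropy between $Y$ and that label.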
Further, the step S4 specifically includes:
S41: acquiring the prediction header sequence set $S_B^{\mu} = \{s_1, s_2, \ldots, s_z, \ldots\}$, where $z$ is the index of a header sequence and $s_z$ is the $z$-th header sequence; letting the initial reference header sequence set be $S_{B0}^{\mu'} = S_B^{\mu}$;
S42: acquiring the activity types $\langle a_1, a_2, \ldots, a_w \rangle$ of header sequence $s_z$, where $w$ is the header sequence length and $a_w$ is the activity type of the $w$-th event;
S43: inputting the sequence feature $v_z$ corresponding to header sequence $s_z$ into the prediction model $M^{\mu}$ for event activity prediction to obtain the activity type $y$ of the next event; letting the current reference header sequence set be $S_{Bz}^{\mu'} = S_{B(z-1)}^{\mu'} \cup \langle a_1, a_2, \ldots, a_w, y \rangle$; letting $z = z + 1$;
S44: repeating steps S42-S43 until the sequence features of all header sequences have been input into the prediction model $M^{\mu}$, to obtain the reference header sequence set $S_B^{\mu'}$;
S45: taking the header sequences in the reference header sequence set $S_B^{\mu'}$ that carry the end-of-flow symbol [E] as completed header sequences, and merging the completed header sequences with the training track set $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; taking the remaining header sequences in $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$.
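Steps S41 to S45 can be sketched as a single pass over the prediction header sequences. The trained LSTM is replaced here by a placeholder callable `predict_next`, which is an assumption for illustration only. The text does not say whether padding is stripped or the extended sequence is re-truncated to length $w$ before the next iteration, so the sketch leaves the sequences as they are.

```python
from typing import Callable, List, Sequence, Tuple

END = "[E]"  # end-of-flow symbol

Seq = Tuple[object, ...]


def augment(
    pred_heads: Sequence[Seq],
    train_traces: Sequence[Seq],
    predict_next: Callable[[Seq], object],
) -> Tuple[List[Seq], List[Seq]]:
    """Extend each header sequence by one predicted activity and split the result.

    Sequences whose predicted activity is END count as completed and are merged
    into the training set; the rest form the next prediction header sequence set.
    """
    completed: List[Seq] = []
    remaining: List[Seq] = []
    for head in pred_heads:
        y = predict_next(head)        # stands in for the prediction model
        extended = tuple(head) + (y,)
        if y == END:
            completed.append(extended)
        else:
            remaining.append(extended)
    return list(train_traces) + completed, remaining
```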
Further, the step S5 specifically includes:
S51: on the training track set $L_A^{\mu+1}$, obtaining a flow model $P_A^{\mu+1}$ with a flow discovery algorithm, and computing the evaluation score $E^{\mu+1}$ of the flow model $P_A^{\mu+1}$ on the training track set $L_A^{\mu}$; if $E^{\mu+1} > E_{best}$, assigning the value of $E^{\mu+1}$ to $E_{best}$;
Specifically, the evaluation score is the F-measure score, and the calculation formula of the F-measure is as follows:
where $L$ and $P$ are the numbers of flow variants in the business process log and in the flow model, respectively, and $L \cap P$ is the number of identical flow variants contained in both;
S52: determining whether $E_{best}$ has converged; if so, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; if not, going to step S53;
S53: determining whether $\mu$ equals the maximum number of iterations $\beta$; if so, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise, letting $\mu = \mu + 1$ and returning to step S1.
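The F-measure formula itself is not reproduced in the text. The sketch below assumes the conventional definition as the harmonic mean of precision $|L \cap P| / |P|$ and recall $|L \cap P| / |L|$ over sets of flow variants; the function name and the representation of a variant as a tuple of activity names are hypothetical.

```python
from typing import Iterable, Sequence


def f_measure(
    log_variants: Iterable[Sequence[str]], model_variants: Iterable[Sequence[str]]
) -> float:
    """F-measure between the variants seen in the log and those allowed by the model."""
    log_set = {tuple(v) for v in log_variants}      # distinct variants in the log
    model_set = {tuple(v) for v in model_variants}  # distinct variants of the model
    common = len(log_set & model_set)
    if common == 0:
        return 0.0
    precision = common / len(model_set)
    recall = common / len(log_set)
    return 2 * precision * recall / (precision + recall)
```

Under this definition the score simplifies to $2|L \cap P| / (|L| + |P|)$, so a log with 2 variants and a model with 3 variants that share 1 variant score 0.4.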
The storage device stores instructions and data for implementing the predictive log enhancement-based flow discovery method.
Referring to fig. 2, a predictive log enhancement-based flow discovery system 401 includes: a processor 402 and a storage device 403; the processor 402 loads and executes the instructions and data in the storage device 403 to implement the predictive log enhancement-based flow discovery method.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. do not denote any order, but rather the terms first, second, third, etc. are used to interpret the terms as labels.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. A predictive log enhancement based process discovery method, comprising:
S1: acquiring the tracks in a business process log and dividing them into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, where $\mu$ is the iteration number;
S2: computing the training header sequence set $S_A^{\mu}$ of the training track set $L_A^{\mu}$ and the prediction header sequence set $S_B^{\mu}$ of the prediction track set $L_B^{\mu}$;
S3: training the prediction model $M^{\mu-1}$ on the training header sequence set $S_A^{\mu}$ to obtain the prediction model $M^{\mu}$;
S4: inputting the prediction header sequence set $S_B^{\mu}$ into the prediction model $M^{\mu}$ for event activity prediction to obtain a reference header sequence set $S_B^{\mu'}$; extracting the completed header sequences from $S_B^{\mu'}$ and merging them with $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; taking the remaining header sequences of $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$;
S5: computing the flow model $P_A^{\mu+1}$ from the training track set $L_A^{\mu+1}$, and computing the best evaluation score $E_{best}$ from the flow model $P_A^{\mu+1}$; if $\mu$ equals the maximum number of iterations $\beta$ or $E_{best}$ has converged, outputting the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise, letting $\mu = \mu + 1$ and returning to step S1.
2. The predictive log enhancement-based process discovery method according to claim 1, wherein step S1 is specifically:
S11: obtaining a business process log $L = \{\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_\lambda, \ldots\}$, where $\lambda$ is the index of a track, $\sigma_\lambda$ is the $\lambda$-th track, and $A$ is the set of all activity types recorded in the business process log;
S12: letting $\sigma_\lambda = \{e_1, e_2, \ldots, e_m, \ldots\}$, where $m$ is the index of an event, $e_m = (\lambda, a_m)$ is the $m$-th event executed in track $\sigma_\lambda$, and $a_m \in A$ is the activity type of event $e_m$;
S13: setting a division ratio $\tau$ and randomly dividing the tracks in the business process log $L$ into a training track set $L_A^{\mu}$ and a prediction track set $L_B^{\mu}$, such that the numbers of tracks in $L_A^{\mu}$ and $L_B^{\mu}$ are in the ratio $\tau : (1-\tau)$.
3. The predictive log enhancement-based process discovery method according to claim 1, wherein step S2 specifically comprises:
S21: setting the header sequence length to $w$;
S22: traversing each track of the training track set $L_A^{\mu}$ in chronological order and converting each activity subsequence contained in the track into a header sequence of fixed length $w$ by padding and truncation, to obtain the corresponding training header sequence set $S_A^{\mu}$;
S23: obtaining, in the same way as in step S22, the prediction header sequence set $S_B^{\mu}$ corresponding to the prediction track set $L_B^{\mu}$.
4. The predictive log enhancement-based process discovery method according to claim 1, wherein step S3 is specifically:
S31: let the training header sequence set $S_A^{\mu} = \{s_1, s_2, \ldots, s_k, \ldots\}$, where $k$ denotes the number of the header sequence and $s_k$ represents the $k$-th header sequence;
S32: initialize an activity feature matrix $H$ and construct a mapping table $f_{tran}$; map the activity types in header sequence $s_k$ to the corresponding activity sequence numbers $q_k$ through the mapping table $f_{tran}$, and construct the corresponding sequence feature $v_k$ from the activity sequence numbers $q_k$ and the activity feature matrix $H$;
S33: repeat step S32 to obtain the sequence features corresponding to all header sequences, giving the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$;
S34: set the end symbol of the flow to [E], and input the sequence feature set $\{v_1, v_2, \ldots, v_k, \ldots\}$ together with the actual next activity sequence numbers into the prediction model $M_{\mu-1}$ for training, obtaining the prediction model $M_{\mu}$.
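The feature construction of steps S32 and S33 resembles an embedding lookup, and a minimal sketch under that assumption follows. The helper names, the random initialization range, and the reserved `[PAD]` row are hypothetical; the training of the prediction model in step S34 is not shown because the claims do not specify its architecture.

```python
import random


def build_mapping(activity_types, specials=("[PAD]", "[E]")):
    """Mapping table: activity type -> activity sequence number."""
    symbols = list(specials) + sorted(set(activity_types) - set(specials))
    return {symbol: index for index, symbol in enumerate(symbols)}


def init_feature_matrix(n_rows, dim, seed=0):
    """Activity feature matrix H with one row of `dim` values per sequence number."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


def sequence_feature(header, f_tran, H):
    """Look up the row of H for each activity in the header sequence."""
    q = [f_tran[activity] for activity in header]   # activity sequence numbers
    return [H[index] for index in q]                # sequence feature
```

In a trainable model the rows of H would typically be updated during training rather than left at their initial values; this sketch only shows the lookup.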
5. The predictive log enhancement-based process discovery method according to claim 1, wherein step S4 is specifically:
S41: obtain the prediction header sequence set $S_B^{\mu} = \{s_1, s_2, \ldots, s_z, \ldots\}$, where $z$ denotes the number of the header sequence and $s_z$ represents the $z$-th header sequence; let the initial reference header sequence set $S_{B0}^{\mu'} = S_B^{\mu}$;
S42: obtain the activity type sequence $\langle a_1, a_2, \ldots, a_w \rangle$ of header sequence $s_z$, where $w$ is the length of the header sequence and $a_w$ is the activity type of the $w$-th event;
S43: input the sequence feature $v_z$ corresponding to header sequence $s_z$ into the prediction model $M_{\mu}$ to predict the event activity, obtaining the activity type $y$ of the next event; let the current reference header sequence set $S_{Bz}^{\mu'} = S_{B(z-1)}^{\mu'} \cup \langle a_1, a_2, \ldots, a_w, y \rangle$; let $z = z + 1$;
S44: repeat steps S42-S43 until the sequence features corresponding to all header sequences have been input into the prediction model $M_{\mu}$, obtaining the reference header sequence set $S_B^{\mu'}$;
S45: take the header sequences in the reference header sequence set $S_B^{\mu'}$ that end with the end-of-flow symbol [E] as completed header sequences, and merge the completed header sequences with the training track set $L_A^{\mu}$ to obtain the training track set $L_A^{\mu+1}$; take the remaining header sequences in the reference header sequence set $S_B^{\mu'}$ as the prediction header sequence set $S_B^{\mu+1}$.
6. The predictive log enhancement-based process discovery method according to claim 1, wherein step S5 is specifically:
S51: on the training track set $L_A^{\mu+1}$, obtain a flow model $P_A^{\mu+1}$ based on a process discovery algorithm, and compute the evaluation score $E_{\mu+1}$ of the flow model $P_A^{\mu+1}$ on the training track set $L_A^{\mu}$; if $E_{\mu+1} > E_{best}$, assign the value of $E_{\mu+1}$ to $E_{best}$;
S52: determine whether $E_{best}$ has converged; if so, output the flow model $P_A^{\mu+1}$ as the optimal flow model; if not, go to step S53;
S53: determine whether $\mu$ equals the maximum number of iterations $\beta$; if so, output the flow model $P_A^{\mu+1}$ as the optimal flow model; otherwise let $\mu = \mu + 1$ and return to step S1.
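The stopping logic of steps S51 to S53 can be illustrated with a small helper. The convergence test used here (the best score changed by less than a tolerance `eps` in this iteration) is an assumption, since the claims do not define convergence; the function name and parameters are likewise hypothetical.

```python
def update_and_check(e_new, e_best, mu, beta, eps=1e-3):
    """Update the best evaluation score and decide whether to stop.

    Returns (e_best, stop). `stop` is True when the best score is
    treated as converged or when mu has reached the maximum number
    of iterations beta.
    """
    previous = e_best
    if e_new > e_best:                 # S51: keep the better score
        e_best = e_new
    converged = abs(e_best - previous) < eps   # S52: assumed convergence test
    return e_best, converged or mu == beta      # S53: iteration limit
```

Starting from `e_best = float("-inf")` ensures that the first iteration is never treated as converged under this test.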
7. A storage device, characterized in that the storage device stores instructions and data for implementing the predictive log enhancement-based flow discovery method of any one of claims 1-6.
8. A predictive log enhancement-based process discovery system, characterized by comprising a processor and a storage device, wherein the processor loads and executes the instructions and data in the storage device to implement the predictive log enhancement-based flow discovery method of any one of claims 1-6.
CN202311851217.9A 2023-12-29 2023-12-29 Flow discovery method and system based on predictive log enhancement Active CN117495071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311851217.9A CN117495071B (en) 2023-12-29 2023-12-29 Flow discovery method and system based on predictive log enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311851217.9A CN117495071B (en) 2023-12-29 2023-12-29 Flow discovery method and system based on predictive log enhancement

Publications (2)

Publication Number Publication Date
CN117495071A true CN117495071A (en) 2024-02-02
CN117495071B CN117495071B (en) 2024-05-14

Family

ID=89669428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311851217.9A Active CN117495071B (en) 2023-12-29 2023-12-29 Flow discovery method and system based on predictive log enhancement

Country Status (1)

Country Link
CN (1) CN117495071B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050278322A1 (en) * 2004-05-28 2005-12-15 Ibm Corporation System and method for mining time-changing data streams
CN102880684A (en) * 2012-09-13 2013-01-16 杭州电子科技大学 Workflow modeling method based on log record mining and combination verification
CN105677876A (en) * 2016-01-12 2016-06-15 国家电网公司 Method for log mining based on physical level database
CN113919319A (en) * 2021-10-15 2022-01-11 中国人民解放军国防科技大学 Script event prediction method based on action scene reinforcement
US20220058558A1 (en) * 2018-12-21 2022-02-24 Odaia Intelligence Inc. Accurate and transparent path prediction using process mining
CN114358445A (en) * 2022-03-21 2022-04-15 山东建筑大学 Business process residual time prediction model recommendation method and system
CN114757432A (en) * 2022-04-27 2022-07-15 浙江传媒学院 Future execution activity and time prediction method and system based on flow log and multi-task learning
CN115238583A (en) * 2022-07-27 2022-10-25 山东理工大学 Business process remaining time prediction method and system supporting incremental logs
CN115525693A (en) * 2022-09-20 2022-12-27 山东理工大学 Incremental event log-oriented process model mining method and system
WO2023057512A1 (en) * 2021-10-05 2023-04-13 Deepmind Technologies Limited Retrieval augmented reinforcement learning
CN116450704A (en) * 2023-04-03 2023-07-18 国家电网有限公司大数据中心 Automatic generation method and generation device of flow model
US20230306343A1 (en) * 2022-03-23 2023-09-28 Digiwin Software Co., Ltd Business process management system and method thereof
CN116822920A (en) * 2023-05-23 2023-09-29 北京杰成合力科技有限公司 Flow prediction method based on cyclic neural network

Also Published As

Publication number Publication date
CN117495071B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
US10360517B2 (en) Distributed hyperparameter tuning system for machine learning
CN110321603B (en) Depth calculation model for gas path fault diagnosis of aircraft engine
US20060217939A1 (en) Time series analysis system, time series analysis method, and time series analysis program
CN110138595A (en) Time link prediction technique, device, equipment and the medium of dynamic weighting network
US6947915B2 (en) Multiresolution learning paradigm and signal prediction
CN110659742A (en) Method and device for acquiring sequence representation vector of user behavior sequence
US11423043B2 (en) Methods and systems for wavelet based representation
CN116383096B (en) Micro-service system anomaly detection method and device based on multi-index time sequence prediction
CN113298131A (en) Attention mechanism-based time sequence data missing value interpolation method
CN114694379B (en) Traffic flow prediction method and system based on self-adaptive dynamic graph convolution
CN114880482A (en) Graph embedding-based relation graph key personnel analysis method and system
Voke et al. A Framework for Feature Selection using Data Value Metric and Genetic Algorithm
Falini et al. Spline based Hermite quasi-interpolation for univariate time series
Kruse et al. Data mining applications in the automotive industry
Chouzenoux et al. Sparse graphical linear dynamical systems
CN113380340A (en) Training method and device of fly ash concentration prediction model and computer equipment
US11176502B2 (en) Analytical model training method for customer experience estimation
CN117495071B (en) Flow discovery method and system based on predictive log enhancement
CN116522070A (en) Non-supervision intelligent fault diagnosis method and system for mechanical parts
US20230206054A1 (en) Expedited Assessment and Ranking of Model Quality in Machine Learning
JPH10143343A (en) Association type plant abnormality diagnosis device
CN115329962A (en) Visual interpretation method of normal form graph model
CN112232557A (en) Switch machine health degree short-term prediction method based on long-term and short-term memory network
CN114618167A (en) Anti-cheating detection model construction method and anti-cheating detection method
CN116305531B (en) Spacecraft health evolution model modeling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant