CN113704215B - Business process event log sampling method, system, storage medium and computing device - Google Patents

Info

Publication number
CN113704215B
Authority
CN
China
Prior art keywords
track
log
intersection
starting point
point set
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110914759.0A
Other languages
Chinese (zh)
Other versions
CN113704215A (en)
Inventor
刘聪
苏轩
张帅鹏
李彩虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiecheng Heli Technology Co ltd
Original Assignee
Beijing Jiecheng Heli Technology Co ltd
Application filed by Beijing Jiecheng Heli Technology Co ltd
Priority to CN202110914759.0A
Publication of CN113704215A
Application granted
Publication of CN113704215B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Abstract

The invention discloses a business process event log sampling method, system, storage medium and computing device. The method comprises the following steps: 1) acquiring the log direct-following activity relation set, the starting point set and the ending point set of an event log; 2) judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets; if all of the judgment results are empty sets, finishing the track traversal of the event log and outputting the sample log; if the judgment results are not all empty sets, selecting any one of four event log sampling methods, namely a full traversal sampling method, a set coverage sampling method, a sampling method based on track length and a sampling method based on track frequency; 3) selecting tracks to form a new log, the new log being the sample log. Through sampling, the invention can effectively obtain a sufficiently representative sample log while ensuring the completeness of the log.

Description

Business process event log sampling method, system, storage medium and computing device
Technical Field
The invention relates to the technical field of process mining of event logs, and in particular to a business process event log sampling method, system, storage medium and computing device.
Background
Process mining is a new discipline connecting the fields of data science and business process management. Its aim is to extract valid information about business processes from event logs in order to discover, monitor and improve real business processes. Process discovery is one of the most challenging process mining tasks, and researchers at home and abroad have proposed many process discovery methods, such as Alpha Miner, Heuristics Miner, Inductive Miner, Tsinghua-Alpha and Split Miner. Because of hardware limitations such as I/O and memory, most discovery methods are no longer suitable for processing an entire large data set on a single machine. Re-implementing existing process discovery algorithms on current distributed platforms, such as the well-known MapReduce framework, is time-consuming, does not generalize, and requires developers to have extensive knowledge of the underlying discovery methods, so a new method is urgently needed to solve these problems. Event log sampling offers an alternative way of improving discovery efficiency without re-implementing existing discovery methods. However, the performance of existing event log sampling methods still cannot meet the requirements of practical applications. The business process event log sampling method of the invention provides a feasible solution to these problems: it greatly improves log sampling efficiency while guaranteeing model mining quality, ensures log completeness, and yields a simpler and higher-quality process model.
Disclosure of Invention
The first object of the present invention is to overcome the drawbacks of existing event log sampling methods by providing a business process event log sampling method. It addresses the problems that existing event log sampling methods cannot process large-scale event logs or process them inefficiently: taking a large-scale event log as input, it obtains a sufficiently representative sample log that is much smaller than the original log and can be processed more efficiently.
A second object of the present invention is to provide a business process event log sampling system.
A third object of the present invention is to provide a storage medium.
It is a fourth object of the present invention to provide a computing device.
The first object of the invention is achieved by the following technical scheme: the business process event log sampling method comprises the following steps:
1) Acquiring three sets from the event log, namely the log direct-following activity relation set, the starting point set and the ending point set;
2) According to the three sets obtained in step 1), judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets; if all of the judgment results are empty sets, finishing the track traversal of the event log and outputting the sample log; if the judgment results are not all empty sets, selecting any one of four event log sampling methods, namely a full traversal sampling method, a set coverage sampling method, a sampling method based on track length and a sampling method based on track frequency;
3) According to the event log sampling method selected in step 2), selecting tracks to form a new log, the new log being the sample log.
Further, in step 1), the event log is composed of cases and each case is composed of events; the events of a case are represented in the form of a track. An event has a plurality of attributes, and in the invention an event is represented by its activity. The sets are defined as follows:
a. a direct-following activity relation means that, in one track of the event log, activity b immediately follows activity a, recorded as <a, b>; the log direct-following activity relation set is the union of the direct-following activity relation sets of all tracks in the log and is recorded as dfrSetLog;
b. the starting points of all tracks form the starting point set of the log, recorded as StartSet;
c. the ending points of all tracks form the ending point set of the log, recorded as EndSet.
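The three set definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the function name `log_sets`, the representation of a track as a tuple of activity names, and the skipping of empty tracks are assumptions made for the sketch. The input is the example log given in Example 1 of this description.

```python
from typing import Iterable, Sequence

Track = Sequence[str]  # one track = the ordered activities of one case


def log_sets(log: Iterable[Track]):
    """Return (dfrSetLog, StartSet, EndSet) for an event log."""
    dfr_set_log, start_set, end_set = set(), set(), set()
    for track in log:
        if not track:  # assumption: empty tracks contribute nothing
            continue
        start_set.add(track[0])   # starting point of the track
        end_set.add(track[-1])    # ending point of the track
        # every adjacent pair <a, b> is a direct-following activity relation
        dfr_set_log.update(zip(track, track[1:]))
    return dfr_set_log, start_set, end_set


# Example log of Example 1 (9 tracks, 6 activities).
L = [tuple(t) for t in
     ("ade", "abce", "bcef", "bdf", "cd", "acd", "bcd", "ade", "bcef")]
dfr, starts, ends = log_sets(L)
print(sorted(dfr))
print(sorted(starts), sorted(ends))
```

Sets are used instead of lists because all later steps only need membership tests, intersections and deletions.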
Further, in step 3), if the full traversal sampling method is selected, the tracks of the event log are traversed in sequence starting from the first track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
If the set coverage sampling method is selected, all tracks in the log are traversed and the track whose direct-following activity relation set has the largest intersection with the log direct-following activity relation set is selected. The track is added to the sample log under the condition that the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set is not an empty set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
If the sampling method based on track length is selected, where the track length is the number of activities contained in a track, the lengths of all tracks in the event log are first counted and sorted in descending order, and the tracks are then traversed in sequence starting from the longest track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. This continues until the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
If the sampling method based on track frequency is selected, where the track frequency is the number of times a track occurs during the track traversal of the event log, the track frequencies of the event log are first counted and a deduplication operation is performed, where deduplication means that among identical tracks only one track, carrying the largest frequency, is kept. The tracks are then sorted in descending order of track frequency and traversed in sequence starting from the track with the largest frequency. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
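As described above, the full traversal, track-length and track-frequency methods share the same add-and-delete loop and differ only in the order in which tracks are visited. The following Python sketch makes that explicit. It is an illustration under stated assumptions, not the plug-in code: all function names are hypothetical, tracks are assumed to be non-empty tuples of activity names, and ties in length or frequency are assumed to keep their original log order.

```python
from collections import Counter


def _sets(log):
    """dfrSetLog, StartSet and EndSet of the log."""
    dfr, starts, ends = set(), set(), set()
    for t in log:
        dfr.update(zip(t, t[1:]))
        starts.add(t[0])
        ends.add(t[-1])
    return dfr, starts, ends


def sample_in_order(log, ordered_tracks):
    """Shared loop: keep a track if it still covers something, then shrink the sets."""
    dfr, starts, ends = _sets(log)
    sample = []
    for t in ordered_tracks:
        if not (dfr or starts or ends):
            break  # all three sets are empty: stop the traversal
        t_dfr = set(zip(t, t[1:])) & dfr
        if t_dfr or t[0] in starts or t[-1] in ends:
            sample.append(t)
            dfr -= t_dfr            # delete the covered relations
            starts.discard(t[0])    # delete the covered starting point
            ends.discard(t[-1])     # delete the covered ending point
    return sample


def full_traversal(log):
    return sample_in_order(log, log)  # original log order


def length_based(log):
    # longest tracks first; sorted() is stable, so ties keep log order
    return sample_in_order(log, sorted(log, key=len, reverse=True))


def frequency_based(log):
    freq = Counter(log)  # deduplication: one entry per distinct track
    return sample_in_order(log, sorted(freq, key=freq.get, reverse=True))


L = [tuple(t) for t in
     ("ade", "abce", "bcef", "bdf", "cd", "acd", "bcd", "ade", "bcef")]
for method in (full_traversal, length_based, frequency_based):
    print(method.__name__, ["".join(t) for t in method(L)])
```

On the example log of Example 1, all three functions select the same six distinct tracks, only in different orders; `<b,c,d>` is never selected because every relation and end point it contains is already covered when it is visited.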
The second object of the invention is achieved by the following technical scheme: the business process event log sampling system comprises an event log data acquisition module, a track set intersection judgment module, an event log sampling selection module and a sample log track selection module;
the event log data acquisition module is used for acquiring the log direct-following activity relation set, the starting point set and the ending point set;
the track set intersection judgment module is used for judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets;
the event log sampling selection module is used either for selecting one of four event log sampling methods, namely a full traversal sampling method, a set coverage sampling method, a sampling method based on track length and a sampling method based on track frequency, or for directly finishing the track traversal of the event log and outputting the sample log;
the sample log track selection module is used for selecting tracks to form a new log, the new log being the sample log.
Further, the event log data acquisition module performs the following operations:
acquiring the starting point set, the ending point set and the log direct-following activity relation set of the event log, where the event log is composed of cases and each case is composed of events; the events of a case are represented in the form of a track; an event has a plurality of attributes and is represented by its activity; the sets are defined as follows:
a. a direct-following activity relation means that, in one track of the event log, activity b immediately follows activity a, recorded as <a, b>; the log direct-following activity relation set is the union of the direct-following activity relation sets of all tracks in the log and is recorded as dfrSetLog;
b. the starting points of all tracks form the starting point set of the log, recorded as StartSet;
c. the ending points of all tracks form the ending point set of the log, recorded as EndSet.
Further, the track set intersection judgment module performs the following operations:
according to the log direct-following activity relation set, the starting point set and the ending point set obtained by the event log data acquisition module, judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets.
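The judgment performed by this module amounts to three set intersections per track. A minimal Python sketch follows; the names `judge` and `all_empty` are hypothetical, a track is assumed to be a non-empty tuple of activity names, and how the caller reacts to the result is deliberately left outside the sketch.

```python
def judge(track, dfr_set_log, start_set, end_set):
    """Return the three intersections evaluated for one track."""
    start_hit = {track[0]} & start_set     # track starting point vs. StartSet
    end_hit = {track[-1]} & end_set        # track ending point vs. EndSet
    dfr_hit = set(zip(track, track[1:])) & dfr_set_log  # track relations vs. dfrSetLog
    return start_hit, end_hit, dfr_hit


def all_empty(*sets):
    """True when every judgment result is an empty set."""
    return not any(sets)


# Hypothetical remaining sets part-way through a traversal.
remaining_dfr, remaining_starts, remaining_ends = {("a", "c")}, {"c"}, set()
print(all_empty(*judge(("b", "c", "d"), remaining_dfr, remaining_starts, remaining_ends)))
print(judge(("c", "d"), remaining_dfr, remaining_starts, remaining_ends))
```

Returning the intersections themselves, rather than a single boolean, lets the later deletion step reuse them without recomputing.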
Further, the event log sampling selection module performs the following operations according to the judgment result obtained by the track set intersection judgment module:
a. if all of the judgment results are empty sets, finishing the track traversal of the event log and outputting the sample log;
b. if the judgment results are not all empty sets, selecting one of four event log sampling methods, namely: a full traversal sampling method, a set coverage sampling method, a sampling method based on track length and a sampling method based on track frequency.
Further, the sample log track selection module performs the following operations:
a. if the full traversal sampling method is selected, the tracks of the event log are traversed in sequence starting from the first track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
b. if the set coverage sampling method is selected, all tracks in the log are traversed and the track whose direct-following activity relation set has the largest intersection with the log direct-following activity relation set is selected. The track is added to the sample log under the condition that the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set is not an empty set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
c. if the sampling method based on track length is selected, where the track length is the number of activities contained in a track, the lengths of all tracks in the event log are first counted and sorted in descending order, and the tracks are then traversed in sequence starting from the longest track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. This continues until the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
d. if the sampling method based on track frequency is selected, where the track frequency is the number of times a track occurs during the track traversal of the event log, the track frequencies of the event log are first counted and a deduplication operation is performed, where deduplication means that among identical tracks only one track, carrying the largest frequency, is kept. The tracks are then sorted in descending order of track frequency and traversed in sequence starting from the track with the largest frequency. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
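Of the four operations above, set coverage sampling is the only one that re-ranks the remaining tracks after every selection. It can be read as a greedy set cover, sketched below in Python. Several points are assumptions of this sketch rather than statements of the description: the function name, the use of distinct tracks only, the reading of the add condition as "at least one intersection is non-empty", and the tie-breaking rule that prefers tracks still covering a starting or ending point.

```python
def set_coverage_sampling(log):
    """Greedy sketch: repeatedly pick the track covering the most remaining relations."""
    tracks = list(dict.fromkeys(log))  # distinct tracks, first-seen order
    dfr, starts, ends = set(), set(), set()
    for t in tracks:
        dfr.update(zip(t, t[1:]))
        starts.add(t[0])
        ends.add(t[-1])

    def gain(t):
        # primary key: size of the intersection with the remaining dfrSetLog;
        # secondary key (assumption): starting/ending points still uncovered
        covered_dfr = len(set(zip(t, t[1:])) & dfr)
        covered_points = (t[0] in starts) + (t[-1] in ends)
        return (covered_dfr, covered_points)

    sample = []
    while tracks and (dfr or starts or ends):
        best = max(tracks, key=gain)
        if gain(best) == (0, 0):
            break  # no remaining track covers anything
        sample.append(best)
        tracks.remove(best)
        dfr -= set(zip(best, best[1:]))
        starts.discard(best[0])
        ends.discard(best[-1])
    return sample


L = [tuple(t) for t in
     ("ade", "abce", "bcef", "bdf", "cd", "acd", "bcd", "ade", "bcef")]
print(["".join(t) for t in set_coverage_sampling(L)])
```

Each round scans all remaining tracks, so this sketch is quadratic in the number of distinct tracks, whereas the three order-based methods make a single pass after sorting.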
The third object of the invention is achieved by the following technical scheme: a storage medium storing a program which, when executed by a processor, implements the business process event log sampling method described above.
The fourth object of the invention is achieved by the following technical scheme: a computing device comprising a processor and a memory for storing a program executable by the processor, the processor implementing the business process event log sampling method described above when executing the program stored in the memory.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. for a large-scale event log, the sample log obtained by sampling with the more efficient business process event log sampling method of the invention preserves the completeness of the log;
2. sampling with the more efficient business process event log sampling method greatly improves event log sampling efficiency while guaranteeing model mining quality, and provides four new sampling methods for the process mining field;
3. the method can be combined with big data technology and deployed on a distributed system to process ultra-large-scale event logs more efficiently;
4. the method is widely applicable to process discovery on large-scale logs, is highly practical, and has broad prospects in process discovery, conformance checking and other process mining fields.
Drawings
FIG. 1 is a schematic diagram of the logic flow of the present invention.
FIG. 2 is an interface diagram of the method of the present invention implemented as a ProM tool plug-in.
FIG. 3 is a diagram of the selection interface for the four sampling methods of the present invention.
FIG. 4 is a diagram of the original event log of a use case of the present invention.
FIG. 5 is a diagram of the sample event log of a use case of the present invention.
FIG. 6 is a diagram of the system architecture of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Example 1
This embodiment discloses a business process event log sampling method which is implemented as a plug-in of the ProM tool, as shown in FIG. 2. As shown in FIG. 1, an original event log is input, and its log direct-following activity relation set, starting point set and ending point set are obtained; after one of the four event log sampling methods is selected, sampling is performed according to the corresponding sampling strategy to obtain a sample event log. The method specifically includes the following steps:
1) The log direct-following activity relation set, the starting point set and the ending point set of the event log are obtained. The event log is composed of cases and each case is composed of events; the events of a case are represented in the form of a track. An event has a plurality of attributes and is represented by its activity. The sets are defined as follows: a direct-following activity relation means that, in one track of the event log, activity b immediately follows activity a, recorded as <a, b>; the starting points of all tracks form the starting point set, and the ending points form the ending point set. The three sets for this step are therefore determined as follows:
The example event log L contains 9 tracks over a total of 6 activities, denoted σ(1) = <a,d,e>, σ(2) = <a,b,c,e>, σ(3) = <b,c,e,f>, σ(4) = <b,d,f>, σ(5) = <c,d>, σ(6) = <a,c,d>, σ(7) = <b,c,d>, σ(8) = <a,d,e>, σ(9) = <b,c,e,f>. Listing each distinct track once, L = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>, <b,c,d>]. FIG. 4 shows the original event log input when the invention is used; applying an event log sampling method to it finally yields the sample log shown in FIG. 5.
a. the log direct-following activity relation set is recorded as dfrSetLog, dfrSetLog = [<a, d>, <d, e>, <a, b>, <b, c>, <c, e>, <e, f>, <b, d>, <d, f>, <c, d>, <a, c>];
b. the starting point set of the log is recorded as StartSet, StartSet = [a, b, c];
c. the ending point set of the log is recorded as EndSet, EndSet = [e, f, d];
2) Judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets; if all of the judgment results are empty sets, finishing the track traversal of the event log and outputting the sample log; if the judgment results are not all empty sets, first selecting the business process event log sampling plug-in (named Business Process Event Log Sampling Plugin) in the ProM 6 platform, and then selecting one of four event log sampling methods, namely: (1) the full traversal sampling method (Brute Force Sampling); (2) the set coverage sampling method (Set Coverage Sampling); (3) the sampling method based on track length (Trace Length-based Sampling); (4) the sampling method based on track frequency (Trace Frequency-based Sampling); FIG. 3 shows the selection interface of the sampling methods;
3) According to the event log sampling method selected in step 2), tracks are selected to form a new log, the new log being the sample log, specifically as follows:
a. if the full traversal sampling method is selected, the tracks of the event log are traversed in sequence starting from the first track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a, d, e>, <a, b, c, e>, <b, c, e, f>, <b, d, f>, <c, d>, <a, c, d>].
b. if the set coverage sampling method is selected, all tracks in the log are traversed and the track whose direct-following activity relation set has the largest intersection with the log direct-following activity relation set is selected. The track is added to the sample log under the condition that the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set is not an empty set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a, d, e>, <a, b, c, e>, <b, c, e, f>, <b, d, f>, <c, d>, <a, c, d>].
c. if the sampling method based on track length is selected, where the track length is the number of activities contained in a track, the lengths of all tracks in the event log are first counted and sorted in descending order, and the tracks are then traversed in sequence starting from the longest track. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. This continues until the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a, d, e>, <a, b, c, e>, <b, c, e, f>, <b, d, f>, <c, d>, <a, c, d>].
d. if the sampling method based on track frequency is selected, where the track frequency is the number of times a track occurs during the track traversal of the event log, the track frequencies of the event log are first counted and a deduplication operation is performed, where deduplication means that among identical tracks only one track, carrying the largest frequency, is kept. The tracks are then sorted in descending order of track frequency and traversed in sequence starting from the track with the largest frequency. A track is added to the sample log when at least one of the following is not an empty set: the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set. The intersection of the track direct-following activity relation set with the log direct-following activity relation set is then deleted from the log direct-following activity relation set, the intersection of the track starting point with the starting point set is deleted from the starting point set, and the intersection of the track ending point with the ending point set is deleted from the ending point set. Track traversal stops when the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a, d, e>, <a, b, c, e>, <b, c, e, f>, <b, d, f>, <c, d>, <a, c, d>].
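The sample log L' reported above can be checked against the original log L. The Python sketch below assumes that "completeness" means the sample log reproduces the same direct-following activity relation set, starting point set and ending point set as the original log; that reading, and the names `log_sets` and `is_complete`, are assumptions of this illustration.

```python
def log_sets(log):
    """dfrSetLog, StartSet and EndSet of a log of non-empty tracks."""
    dfr, starts, ends = set(), set(), set()
    for t in log:
        dfr.update(zip(t, t[1:]))
        starts.add(t[0])
        ends.add(t[-1])
    return dfr, starts, ends


def is_complete(original, sample):
    """True when the sample preserves all three sets of the original log."""
    return log_sets(sample) == log_sets(original)


# Original example log L (9 tracks) and the reported sample log L' (6 tracks).
L = [tuple(t) for t in
     ("ade", "abce", "bcef", "bdf", "cd", "acd", "bcd", "ade", "bcef")]
L_prime = [tuple(t) for t in ("ade", "abce", "bcef", "bdf", "cd", "acd")]

print(is_complete(L, L_prime))
print(f"kept {len(L_prime)} of {len(L)} tracks")
```

Dropping any further track from L' breaks the check in this example; removing `<a,c,d>`, for instance, loses the relation `<a, c>`.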
Example 2
This embodiment discloses a business process event log sampling system which, as shown in FIG. 6, comprises an event log data acquisition module, a track set intersection judgment module, an event log sampling selection module and a sample log track selection module;
the event log data acquisition module is used for acquiring the log direct-following activity relation set, the starting point set and the ending point set;
the track set intersection judgment module is used for judging whether the intersection of the track starting point with the starting point set, the intersection of the track ending point with the ending point set, and the intersection of the track direct-following activity relation set with the log direct-following activity relation set are empty sets;
the event log sampling selection module is used either for selecting one of four event log sampling methods, namely a full traversal sampling method, a set coverage sampling method, a sampling method based on track length and a sampling method based on track frequency, or for directly finishing the track traversal of the event log and outputting the sample log;
the sample log track selection module is used for selecting tracks to form a new log, the new log being the sample log.
The event log data acquisition module performs the following operations:
Obtain the starting point set, the ending point set and the log direct-following activity relation set of the event log, wherein the event log consists of cases, each case consists of events, and the events of a case are represented in the form of a track. Events have a number of attributes; in the present invention an event is represented by its activity. The three sets are obtained as follows. The example event log L contains 9 tracks over a total of 6 activities, denoted σ(1) = <a,d,e>, σ(2) = <a,b,c,e>, σ(3) = <b,c,e,f>, σ(4) = <b,d,f>, σ(5) = <c,d>, σ(6) = <a,c,d>, σ(7) = <b,c,d>, σ(8) = <a,d,e>, σ(9) = <b,c,e,f>. Listing each distinct track once, L = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>, <b,c,d>].
a. The log direct-following activity relation set is denoted dfrSetLog, dfrSetLog = [<a,d>, <d,e>, <a,b>, <b,c>, <c,e>, <e,f>, <b,d>, <d,f>, <c,d>, <a,c>];
b. The starting point set of the log is denoted StartSet, StartSet = [a, b, c];
c. The ending point set of the log is denoted EndSet, EndSet = [e, f, d].
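The three sets of the example log can be reproduced with a short Python sketch. It is given for illustration only: the function name `build_sets` and the representation of each track as a tuple of activity names are assumptions, not part of the disclosed system.

```python
# Example event log with its 9 tracks (tuple layout is an assumption).
EXAMPLE_LOG = [
    ("a", "d", "e"), ("a", "b", "c", "e"), ("b", "c", "e", "f"),
    ("b", "d", "f"), ("c", "d"), ("a", "c", "d"), ("b", "c", "d"),
    ("a", "d", "e"), ("b", "c", "e", "f"),
]


def build_sets(log):
    """Hypothetical helper returning (dfrSetLog, StartSet, EndSet) of a log."""
    dfr_set_log, start_set, end_set = set(), set(), set()
    for track in log:
        if not track:
            continue  # an empty track contributes nothing
        start_set.add(track[0])   # first activity of the track
        end_set.add(track[-1])    # last activity of the track
        # Each pair of consecutive activities is one direct-following relation.
        dfr_set_log.update(zip(track, track[1:]))
    return dfr_set_log, start_set, end_set


dfr_set_log, start_set, end_set = build_sets(EXAMPLE_LOG)
```

Python sets are unordered, so the result should be compared with the listed sets by membership rather than by position.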
The track set intersection judgment module performs the following operations:
According to the log direct-following activity relation set, the starting point set and the ending point set obtained by the event log data acquisition module, judge whether the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set are empty sets.
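The judgment amounts to three set intersections per track. A minimal sketch follows, assuming tracks are non-empty tuples of activity names; the helper names `intersections` and `all_empty` are hypothetical and chosen only for this illustration.

```python
def intersections(track, dfr_set_log, start_set, end_set):
    """Return the three intersections of one non-empty track with the log sets."""
    track_dfr = set(zip(track, track[1:]))  # direct-following relations of the track
    return (
        track_dfr & dfr_set_log,    # shared direct-following relations
        {track[0]} & start_set,     # track starting point vs. starting point set
        {track[-1]} & end_set,      # track ending point vs. ending point set
    )


def all_empty(track, dfr_set_log, start_set, end_set):
    """True when all three intersections are empty sets."""
    return not any(intersections(track, dfr_set_log, start_set, end_set))


# Usage with a state in which only <a,c> remains in the relation set:
remaining_dfr, remaining_start, remaining_end = {("a", "c")}, set(), set()
skip_bcd = all_empty(("b", "c", "d"), remaining_dfr, remaining_start, remaining_end)
keep_acd = not all_empty(("a", "c", "d"), remaining_dfr, remaining_start, remaining_end)
```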
The event log sampling selection module performs the following operations:
a. If all the judgment results are empty sets, finish the track traversal of the event log and output the sample log;
b. If the judgment results are not all empty sets, first select the business process event log sampling plug-in (named Business Process Event Log Sampling Plugin) in the ProM 6 platform, and then select one of four event log sampling methods, namely: (1) the full traversal sampling method (Brute Force Sampling); (2) the set coverage sampling method (Set Coverage Sampling); (3) the sampling method based on track length (Track Length-based Sampling); (4) the sampling method based on track frequency (Track Frequency-based Sampling).
The sample log track selection module performs the following operations:
a. If the full traversal sampling method is selected, traverse the tracks in sequence starting from the first track of the event log. When at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, add the track to the sample log, then delete from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point. Stop the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>].
b. If the set coverage sampling method is selected, traverse all tracks in the log and select the track whose direct-following activity relation set has the largest intersection with the log direct-following activity relation set. Provided that at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, add the track to the sample log, then delete from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point. Stop the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>].
c. If the sampling method based on track length is selected, where the track length is the number of activities contained in a track, first count the lengths of all tracks in the event log and sort the tracks in descending order of length, then traverse them in sequence starting from the longest track. When at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, add the track to the sample log, then delete from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point. Stop the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>].
d. If the sampling method based on track frequency is selected, where the track frequency is the number of times a track occurs in the event log, first count the frequency of each track in the event log and perform a deduplication operation, where deduplication means keeping, for each group of identical tracks, a single entry carrying the largest frequency count; then sort the tracks in descending order of track frequency and traverse them in sequence, starting from the track with the largest track frequency. When at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, add the track to the sample log, then delete from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point. Stop the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets. The resulting sample log L' of the example event log is thus L' = [<a,d,e>, <a,b,c,e>, <b,c,e,f>, <b,d,f>, <c,d>, <a,c,d>].
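The operations of the sample log track selection module can be sketched in Python for the full traversal, track length and set coverage methods. The sketch rests on assumptions the text leaves open: tracks are non-empty tuples of activity names, all function names are hypothetical, ties in length keep their original log order, and the set coverage variant breaks ties (and handles an already empty relation set) by the number of starting and ending points a track still covers.

```python
EXAMPLE_LOG = [
    ("a", "d", "e"), ("a", "b", "c", "e"), ("b", "c", "e", "f"),
    ("b", "d", "f"), ("c", "d"), ("a", "c", "d"), ("b", "c", "d"),
    ("a", "d", "e"), ("b", "c", "e", "f"),
]


def track_sets(track):
    """Direct-following relations, starting point and ending point of a track."""
    return set(zip(track, track[1:])), {track[0]}, {track[-1]}


def log_sets(log):
    """Log-level dfrSetLog, StartSet and EndSet."""
    dfr, start, end = set(), set(), set()
    for track in log:
        d, s, e = track_sets(track)
        dfr |= d
        start |= s
        end |= e
    return dfr, start, end


def sample_in_order(log, ordered_tracks):
    """Shared loop: visit tracks in the given order until all sets are empty."""
    dfr_left, start_left, end_left = log_sets(log)
    sample = []
    for track in ordered_tracks:
        if not (dfr_left or start_left or end_left):
            break
        d, s, e = track_sets(track)
        if d & dfr_left or s & start_left or e & end_left:
            sample.append(track)
            dfr_left -= d
            start_left -= s
            end_left -= e
    return sample


def full_traversal_sampling(log):
    return sample_in_order(log, log)


def length_based_sampling(log):
    # sorted() is stable, so tracks of equal length keep their log order.
    return sample_in_order(log, sorted(log, key=len, reverse=True))


def set_coverage_sampling(log):
    dfr_left, start_left, end_left = log_sets(log)
    sample = []
    while dfr_left or start_left or end_left:
        def gain(track):
            d, s, e = track_sets(track)
            # Assumed tie-break: relation coverage first, then start/end coverage.
            return len(d & dfr_left), len(s & start_left) + len(e & end_left)

        best = max(log, key=gain)
        d, s, e = track_sets(best)
        sample.append(best)
        dfr_left -= d
        start_left -= s
        end_left -= e
    return sample


full_sample = full_traversal_sampling(EXAMPLE_LOG)
length_sample = length_based_sampling(EXAMPLE_LOG)
coverage_sample = set_coverage_sampling(EXAMPLE_LOG)
```

Under these assumptions all three functions select the same six tracks as the sample log L' for the example log; only the full traversal variant also reproduces the listed order.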
Example 3
The present embodiment discloses a storage medium storing a program that, when executed by a processor, implements the business process event log sampling method described in embodiment 1.
The storage medium in this embodiment may be a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), a USB flash drive, a removable hard disk, or the like.
Example 4
This embodiment discloses a computing device comprising a processor and a memory for storing a program executable by the processor, wherein the business process event log sampling method is implemented when the processor executes the program stored in the memory.
The computing device described in this embodiment may be a desktop computer, a notebook computer, a smartphone, a PDA handheld terminal, a tablet computer, a programmable logic controller (PLC), or another terminal device with a processor.
In summary, with the above scheme, the invention provides a new approach to the problem that existing event log sampling methods cannot effectively process the information in large-scale event logs, which makes process model discovery inefficient. A sufficiently representative sample log can be obtained effectively through sampling, so the invention has practical value and is worth popularizing.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope; any variation made in accordance with the shape and principles of the present invention shall be covered by its scope of protection.

Claims (4)

1. A business process event log sampling method, characterized by comprising the following steps:
1) Acquiring three sets of the event log, namely the log direct-following activity relation set, the starting point set and the ending point set;
the event log is composed of cases, the cases are composed of events, the events in a case are represented in the form of a track, the events have a plurality of attributes, the events are represented by activities, and the sets are defined as follows:
a. a direct-following activity relation means that, in one track of the event log, activity b immediately follows activity a, which is written <a,b>; the log direct-following activity relation set consists of the direct-following activity relations of every track in the log and is denoted dfrSetLog;
b. the starting points of all tracks form the starting point set, and the starting point set of the log is denoted StartSet;
c. the ending points of all tracks form the ending point set, and the ending point set of the log is denoted EndSet;
2) Judging, according to the three sets obtained in step 1), whether the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set are empty sets; if the judgment results are all empty sets, finishing the track traversal of the event log and outputting the sample log; if the judgment results are not all empty sets, selecting any one of four event log sampling methods, namely the full traversal sampling method, the set coverage sampling method, the sampling method based on track length and the sampling method based on track frequency;
3) According to the event log sampling method selected in step 2), selecting tracks to form a new log, wherein the new log is the sample log;
if the full traversal sampling method is selected, traversing the tracks in sequence starting from the first track of the event log; when at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, adding the track to the sample log, and deleting from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point; and stopping the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
if the set coverage sampling method is selected, traversing all tracks in the log and selecting the track whose direct-following activity relation set has the largest intersection with the log direct-following activity relation set; provided that at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, adding the track to the sample log, and deleting from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point; and stopping the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
if the sampling method based on track length is selected, wherein the track length is the number of activities contained in a track, first counting the lengths of all tracks in the event log and sorting the tracks in descending order of length, then traversing them in sequence starting from the longest track; when at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, adding the track to the sample log, and deleting from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point; and stopping the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets;
if the sampling method based on track frequency is selected, wherein the track frequency is the number of times a track occurs in the event log, first counting the frequency of each track in the event log and performing a deduplication operation, wherein deduplication means keeping, for each group of identical tracks, a single entry carrying the largest frequency count; then sorting the tracks in descending order of track frequency and traversing them in sequence starting from the track with the largest track frequency; when at least one of the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set is not an empty set, adding the track to the sample log, and deleting from the log direct-following activity relation set its intersection with the track direct-following activity relation set, from the starting point set its intersection with the track starting point, and from the ending point set its intersection with the track ending point; and stopping the track traversal once the log direct-following activity relation set, the starting point set and the ending point set are all empty sets.
2. A business process event log sampling system, characterized by being used for realizing the business process event log sampling method according to claim 1, and comprising an event log data acquisition module, a track set intersection judgment module, an event log sampling selection module and a sample log track selection module;
the event log data acquisition module is used for acquiring the log direct-following activity relation set, the starting point set and the ending point set;
the track set intersection judgment module is used for judging whether the intersection of the track starting point and the starting point set, the intersection of the track ending point and the ending point set, and the intersection of the track direct-following activity relation set and the log direct-following activity relation set are empty sets;
the event log sampling selection module is used for either selecting one of four event log sampling methods, namely the full traversal sampling method, the set coverage sampling method, the sampling method based on track length and the sampling method based on track frequency, or directly finishing the track traversal of the event log and outputting the sample log;
the sample log track selection module is used for selecting tracks to form a new log, and the new log is the sample log.
3. A storage medium storing a program which, when executed by a processor, implements the business process event log sampling method of claim 1.
4. A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the business process event log sampling method of claim 1.
CN202110914759.0A 2021-08-10 2021-08-10 Business process event log sampling method, system, storage medium and computing device Active CN113704215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914759.0A CN113704215B (en) 2021-08-10 2021-08-10 Business process event log sampling method, system, storage medium and computing device

Publications (2)

Publication Number Publication Date
CN113704215A CN113704215A (en) 2021-11-26
CN113704215B true CN113704215B (en) 2023-10-13

Family

ID=78652112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914759.0A Active CN113704215B (en) 2021-08-10 2021-08-10 Business process event log sampling method, system, storage medium and computing device

Country Status (1)

Country Link
CN (1) CN113704215B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238243B (en) * 2021-12-17 2023-02-03 杭州电子科技大学 Local log sampling method for process discovery

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416365A (en) * 2018-02-06 2018-08-17 山东科技大学 Concurrent Complete Log method for digging based on distance
CN112632018A (en) * 2020-12-21 2021-04-09 山东理工大学 Business process event log sampling method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10860949B2 (en) * 2016-05-02 2020-12-08 Verizon Media Inc. Feature transformation of event logs in machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
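Temporal activity representation learning method based on event log enhancement; Ni Weijian, Sun Yujian, et al.; Computer Integrated Manufacturing Systems (《计算机集成制造系统》); Vol. 25 (No. 4); full text *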

Also Published As

Publication number Publication date
CN113704215A (en) 2021-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211221

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Jiecheng Software Co.,Ltd.

Address before: 266 Xincun West Road, Zhangdian District, Zibo City, Shandong Province

Applicant before: Shandong University of Technology

TA01 Transfer of patent application right

Effective date of registration: 20230602

Address after: 408, 4th Floor, No. 6 Zhongguancun South Street, Haidian District, Beijing, 100080

Applicant after: Beijing Jiecheng Heli Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: Shenzhen Jiecheng Software Co.,Ltd.

GR01 Patent grant