CN110032494B - Double-granularity noise log filtering method based on incidence relation - Google Patents


Info

Publication number: CN110032494B
Authority: CN (China)
Legal status: Active
Application number: CN201910218832.3A
Other languages: Chinese (zh)
Other versions: CN110032494A
Inventors: 孙笑笑 (Sun Xiaoxiao), 侯文杰 (Hou Wenjie), 俞东进 (Yu Dongjin), 潘建梁 (Pan Jianliang)
Current assignee: Hangzhou Dianzi University
Original assignee: Hangzhou Dianzi University
Priority date: 2019-03-21
Filing date: 2019-03-21
Publication dates: 2019-07-19 (CN110032494A); 2020-05-26 (CN110032494B, granted)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a dual-granularity noise log filtering method based on an incidence relation. The method computes a mixed dependency from a local dependency and a global dependency between events and uses it to perform, simultaneously, fine-grained filtering of noise events within the log and coarse-grained filtering of noise traces. Compared with traditional log filtering methods, it offers the following benefits: 1. it adopts a dual-granularity filtering mechanism that applies different filtering to different noise scenarios, achieving good filtering while retaining as much of the original log data as possible; 2. when the filtered log is used for process mining, the accuracy of the discovered process model can be greatly improved and the model becomes easier to understand.

Description

Double-granularity noise log filtering method based on incidence relation
Technical Field
The invention relates to the field of process mining, in particular to a dual-granularity noise log filtering method based on an incidence relation.
Background
Process mining aims to extract useful information from the event logs recorded by process-aware information systems, helping stakeholders understand how processes are actually executed. Process discovery is an important part of process mining: it constructs a process model that can reproduce the behavior recorded in the event log. A high-precision model gives an intuitive picture of how the business process is actually executed.
In a business process management system, the activities of a business process are performed according to a carefully designed process model, and their execution is recorded in a log so that stakeholders can analyze and monitor the process. In practice, however, most business processes have no standardized process model, or the model drifts away from the actual process as the business keeps evolving, so the actual execution behavior has to be extracted from the process's logs with process discovery techniques. Noise in the log degrades the quality of the discovered model: applying process discovery to a noisy log yields models containing invisible tasks and non-free-choice constructs, which increases their complexity and makes them harder to understand. Common log noise falls into three types: missing noise events (some events of the process are not logged for some reason), redundant noise events (some events are logged repeatedly), and misplaced noise events (some events are logged in the wrong order within the trace).
Noise filtering algorithms can remove noise events from the log and thereby greatly improve the accuracy of the discovered process model. Existing log noise filtering algorithms fall roughly into two classes by filtering granularity: coarse-grained and fine-grained. Coarse-grained filtering removes every trace containing a noise event from the original log; for small logs, removing whole traces can change the structure of the mined model considerably. Fine-grained filtering removes only the noise events and keeps the remaining events of the trace, but there is no guarantee that removing an event does not introduce new noise into the trace, and this approach cannot handle missing noise events.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a dual-granularity noise log filtering method based on an incidence relation that effectively addresses the problems above. The technical solution adopted by the invention is as follows:
A dual-granularity noise log filtering method based on an incidence relation comprises the following steps:
(1) Input an original log file and preprocess it into a log set consisting of multiple process traces $\sigma$. Each trace consists of several process events $e_i$, $\sigma = \langle e_1, \ldots, e_n \rangle$, and the set of all process events in all traces is denoted $\varepsilon$, i.e., $e \in \varepsilon$;
(2) Over all process traces in the log set, count the frequency dependency $DFD(e_i, e_j)$ between every two process events;
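As an illustration of step (2), the directly-follows counts can be gathered in one pass over the log. The following is a minimal Python sketch, assuming each trace is a sequence of event names; the function name `directly_follows_counts` and the three sample traces are hypothetical and only reuse the letters of the worked example given later in this description.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows_counts(log: Iterable[Sequence[str]]) -> Counter:
    """Count DFD(a, b): how often event b directly follows event a, over all traces."""
    dfd: Counter = Counter()
    for trace in log:
        # zip pairs each event with its immediate successor in the same trace
        for a, b in zip(trace, trace[1:]):
            dfd[(a, b)] += 1
    return dfd


# Hypothetical sample log of three traces
log = [list("ABCDEFGH"), list("ABCEGH"), list("ABCDEGH")]
dfd = directly_follows_counts(log)
print(dfd[("A", "B")], dfd[("C", "D")], dfd[("E", "G")])  # 3 2 2
```

A `Counter` returns 0 for pairs that never occur, which matches the intuitive reading of a zero directly-follows frequency.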
(3) Then compute, for every two events, the local dependency $Dep_{local}(e_i, e_j)$, the global dependency $Dep_{global}(e_i, e_j)$ and the mixed dependency $Dep_{mixed}(e_i, e_j)$;
The local dependency $Dep_{local}(e_i, e_j)$ is calculated as follows:
[Formula image: definition of $Dep_{local}(e_i, e_j)$ in terms of $C_1$, $C_2$, $D_{suc}(e_i)$ and $D_{pre}(e_j)$]
where $C_1$ and $C_2$ are constants; $D_{suc}(e_i)$ is the successor density, i.e., the average occurrence frequency of all successor events of $e_i$; and $D_{pre}(e_j)$ is the predecessor density, i.e., the average occurrence frequency of all predecessor events of $e_j$. The successor and predecessor densities are calculated as follows:
$$D_{pre}(e_k) = N_{pre}(e_k) / |U_{pre}(e_k)|$$
$$D_{suc}(e_k) = N_{suc}(e_k) / |U_{suc}(e_k)|$$
[Formula images: two further definitions, not reproduced]
where $D_{pre}(e_k)$ is the predecessor density of event $e_k$ and $D_{suc}(e_k)$ its successor density; $N_{pre}(e_k)$ is the number of directly-follows relations in which $e_k$ is the following event, and $N_{suc}(e_k)$ the number in which $e_k$ is the preceding event; $U_{pre}(e_k)$ is the predecessor set of $e_k$ and $|U_{pre}(e_k)|$ the number of event classes in it; $U_{suc}(e_k)$ is the successor set of $e_k$ and $|U_{suc}(e_k)|$ the number of event classes in it;
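The density definitions can be sketched in Python as follows. The exact formulas for $N_{pre}$, $N_{suc}$, $U_{pre}$ and $U_{suc}$ are given only as images, so this sketch assumes the reading suggested by the text: $U_{pre}(e_k)$ is the set of distinct events $x$ with $DFD(x, e_k) > 0$, and $N_{pre}(e_k)$ is the sum of those $DFD(x, e_k)$ values, with the successor side defined symmetrically. The function name `densities` and the sample counts are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping, Set, Tuple


def densities(dfd: Mapping[Tuple[str, str], int]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (D_pre, D_suc) for every event that has predecessors / successors.

    Assumed reading: U_pre(e) = {x : DFD(x, e) > 0}, N_pre(e) = sum of DFD(x, e) over U_pre(e);
    U_suc and N_suc are defined symmetrically. Events without predecessors (e.g. start
    events) get no D_pre entry, and events without successors get no D_suc entry.
    """
    n_pre: Dict[str, int] = defaultdict(int)
    n_suc: Dict[str, int] = defaultdict(int)
    u_pre: Dict[str, Set[str]] = defaultdict(set)
    u_suc: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), freq in dfd.items():
        if freq <= 0:
            continue
        n_suc[a] += freq  # a precedes b: counts toward a's successor side
        u_suc[a].add(b)
        n_pre[b] += freq  # b follows a: counts toward b's predecessor side
        u_pre[b].add(a)
    d_pre = {e: n_pre[e] / len(u_pre[e]) for e in u_pre}
    d_suc = {e: n_suc[e] / len(u_suc[e]) for e in u_suc}
    return d_pre, d_suc


# Hypothetical directly-follows counts
dfd = Counter({("A", "B"): 4, ("A", "C"): 2, ("B", "C"): 3})
d_pre, d_suc = densities(dfd)
print(d_suc["A"], d_pre["C"])  # 3.0 2.5
```

Here A has two successor classes (B, C) with total frequency 6, giving a successor density of 3.0; C has two predecessor classes (A, B) with total frequency 5, giving a predecessor density of 2.5.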
The global dependency $Dep_{global}(e_i, e_j)$ is calculated as follows:
[Formula image: definition of $Dep_{global}(e_i, e_j)$]
$$\theta = \max\{DFD(e_x, e_y)\}$$
[Formula image, not reproduced]
where $\theta$ is the largest frequency dependency in the log and $\zeta$ is the global noise factor used to separate out global noise events.
The mixed dependency $Dep_{mixed}(e_i, e_j)$ is calculated as follows:
$$Dep_{mixed}(e_i, e_j) = \alpha \cdot Dep_{local}(e_i, e_j) + (1 - \alpha) \cdot Dep_{global}(e_i, e_j)$$
where $\alpha$ is a trade-off factor that balances the weights of the local and global dependencies.
(4) From the mixed dependencies computed in the previous step, construct the mixed dependency matrix of all process events in the log set;
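Steps (3) and (4) can be combined into a single matrix-building routine. Because the formulas for $Dep_{local}$ and $Dep_{global}$ are available only as images, this Python sketch takes them as caller-supplied functions and applies only the mixing rule stated above; the function name `mixed_dependency_matrix` and the sample scores are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

DepFn = Callable[[str, str], float]


def mixed_dependency_matrix(events: Iterable[str], dep_local: DepFn, dep_global: DepFn,
                            alpha: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Return M[(e_i, e_j)] = alpha * Dep_local + (1 - alpha) * Dep_global for all ordered pairs."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    evs = list(events)
    return {(ei, ej): alpha * dep_local(ei, ej) + (1 - alpha) * dep_global(ei, ej)
            for ei in evs for ej in evs}


# Hypothetical stand-in scores; the patent's Dep_local and Dep_global formulas are not reproduced
local_scores = {("A", "B"): 0.9, ("B", "A"): 0.1}
global_scores = {("A", "B"): 0.7}
m = mixed_dependency_matrix("AB",
                            lambda a, b: local_scores.get((a, b), 0.0),
                            lambda a, b: global_scores.get((a, b), 0.0))
print(round(m[("A", "B")], 2), round(m[("B", "A")], 2))  # 0.8 0.05
```

Storing the matrix as a dictionary keyed by ordered event pairs makes the lookup in step 56) a constant-time operation; a dense two-dimensional array indexed by event class would work equally well.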
(5) Filter the log noise as follows:
51) Construct an empty log set to store the filtered traces;
52) Take a trace $\sigma$ from the log set and initialize the discard value of $\sigma$ to 1;
53) Take the start event $e_{start}$ of $\sigma$ and add it to an empty event sequence $\sigma_{filter}$;
54) Take the current event $e_i$ in the order of events in $\sigma$;
55) Take the next event $e_{i+1}$ after the current event in the trace;
56) Look up the mixed dependency $Dep_{mixed}(e_i, e_{i+1})$ of $e_i$ and $e_{i+1}$ in the mixed dependency matrix and first perform fine-grained filtering of events. If $Dep_{mixed}(e_i, e_{i+1})$ is not less than the mixed dependency threshold $\beta$, $e_{i+1}$ is judged a normal event and added to $\sigma_{filter}$; $e_{i+1}$ becomes the current event ($i = i + 1$) and the method returns to step 55). If $Dep_{mixed}(e_i, e_{i+1})$ is less than $\beta$, $e_{i+1}$ is judged a noise event and the discard value of $\sigma$ is corrected with a penalty function.
The penalty function is:
[Formula image: penalty function]
where the penalty factor determines how strongly the penalty function penalizes the trace;
If the corrected discard value is not lower than the preset abandon threshold, return to step 55), with $e_i$ remaining the current event; if the corrected discard value falls below the abandon threshold, perform coarse-grained filtering of the trace: $\sigma$ is judged a noise trace and the method returns to step 52);
57) If event $e_{i+1}$ is the end event $e_{end}$ of the current trace $\sigma$, add the filtered trace $\sigma_{filter}$ to the filtered log set;
58) Repeat steps 52) to 57) until all traces in the original log set have been taken;
59) Output the filtered log set.
(6) Regenerate the log file from the output filtered log set.
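The control flow of step (5) can be sketched in Python as below, assuming the mixed dependency matrix is a dictionary keyed by ordered event pairs. The penalty formula is given only as an image, so the default `penalize` callback here is a hypothetical stand-in that multiplies the discard value by the penalty factor 0.8; the patent's own penalty function should be substituted where it is available. As in the worked example later in this description, after a noise event the current event stays $e_i$ and is compared with the next event in the trace. The names `filter_trace` and `filter_log` are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = Dict[Tuple[str, str], float]


def filter_trace(trace: Sequence[str], dep: Matrix, beta: float = 0.5,
                 abandon: float = 0.7,
                 penalize: Callable[[float], float] = lambda w: 0.8 * w
                 ) -> Optional[List[str]]:
    """Steps 52)-57) for one trace: return the filtered trace, or None for a noise trace.

    `penalize` is a hypothetical stand-in for the patent's penalty function.
    Pairs missing from the matrix are treated as having zero dependency (assumption).
    """
    if not trace:
        return []
    kept = [trace[0]]      # 53) the start event seeds sigma_filter
    current = trace[0]     # 54) current event e_i
    discard = 1.0          # 52) discard value initialised to 1
    for nxt in trace[1:]:  # 55) next event e_{i+1}
        if dep.get((current, nxt), 0.0) >= beta:
            kept.append(nxt)             # 56) normal event: keep it and advance
            current = nxt
        else:
            discard = penalize(discard)  # 56) noise event: drop it, e_i stays current
            if discard < abandon:
                return None              # coarse-grained: the whole trace is noise
    return kept            # 57) end of trace reached


def filter_log(log: Sequence[Sequence[str]], dep: Matrix, **kwargs) -> List[List[str]]:
    """Steps 51), 58) and 59): collect every trace that survives filter_trace."""
    filtered_log: List[List[str]] = []
    for trace in log:
        result = filter_trace(trace, dep, **kwargs)
        if result is not None:
            filtered_log.append(result)
    return filtered_log


# Hypothetical mixed dependency values for illustration
dep = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.1}
print(filter_log(["ABCD", "ABDD"], dep))  # [['A', 'B', 'C']]
```

In the usage line, "ABCD" loses only its last event (one penalty, discard value 0.8), while "ABDD" incurs two penalties (0.8, then 0.64) and is dropped as a whole, showing the two granularities side by side.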
Preferably, the log set in step (1) contains all execution instances of the business process: each process trace $\sigma$ corresponds to one execution instance and is an ordered sequence of process events $e$, each of which records one executed activity of the business process.
Preferably, the frequency dependency $DFD(e_i, e_j)$ in step (2) is the directly-follows degree, i.e., the total number of times event $e_j$ directly follows event $e_i$ across all process instances.
Preferably, the global noise factor ζ described in step (3) is 0.02.
Preferably, the mixed dependency threshold $\beta$ in step (5) is 0.5.
Preferably, the trade-off factor $\alpha$ in step (3) is 0.5.
Preferably, the penalty factor in step (5) is 0.8.
Preferably, the abandon threshold in step (5) is 0.7.
The filtering method provided by the invention considers the dependency relationships between events from both a global and a local perspective and uses them to decide whether an event is noise. Compared with traditional log filtering methods, it offers the following benefits: 1. a dual-granularity filtering mechanism applies different filtering to different noise scenarios, achieving good filtering while retaining as much of the original log data as possible; 2. when the filtered log is used for process mining, the accuracy of the discovered process model can be greatly improved and the model becomes easier to understand.
Drawings
FIG. 1 is a flow chart of a dual-granularity noise log filtering method based on an incidence relation according to the present invention;
FIG. 2 is a schematic diagram of an example of noise filtering according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, certain specific details are set forth in the following detailed description to provide a better understanding of the invention. It will be apparent to one skilled in the art that the invention may be practiced without these specific details.
As shown in FIG. 1, the dual-granularity noise log filtering method based on an association relationship according to the invention comprises the following steps:
(1) Input an original log file and preprocess it into a log set consisting of multiple process traces $\sigma$. Each trace consists of several process events $e_i$, $\sigma = \langle e_1, \ldots, e_n \rangle$, and the set of all process events in all traces is denoted $\varepsilon$, i.e., $e \in \varepsilon$;
The log set contains all execution instances of the business process: each process trace $\sigma$ corresponds to one execution instance and is an ordered sequence of process events $e$, each of which records one executed activity of the business process.
(2) Over all process traces in the log set, count the frequency dependency $DFD(e_i, e_j)$ between every two process events.
The frequency dependency $DFD(e_i, e_j)$ is the directly-follows degree, i.e., the total number of times event $e_j$ directly follows event $e_i$ across all process instances.
(3) Then compute, for every two events, the local dependency $Dep_{local}(e_i, e_j)$, the global dependency $Dep_{global}(e_i, e_j)$ and the mixed dependency $Dep_{mixed}(e_i, e_j)$;
The local dependency $Dep_{local}(e_i, e_j)$ is calculated as follows:
[Formula image: definition of $Dep_{local}(e_i, e_j)$ in terms of $C_1$, $C_2$, $D_{suc}(e_i)$ and $D_{pre}(e_j)$]
where $C_1$ and $C_2$ are constants; $D_{suc}(e_i)$ is the successor density, i.e., the average occurrence frequency of all successor events of $e_i$; and $D_{pre}(e_j)$ is the predecessor density, i.e., the average occurrence frequency of all predecessor events of $e_j$. The successor and predecessor densities are calculated as follows:
$$D_{pre}(e_k) = N_{pre}(e_k) / |U_{pre}(e_k)|$$
$$D_{suc}(e_k) = N_{suc}(e_k) / |U_{suc}(e_k)|$$
[Formula images: two further definitions, not reproduced]
where $D_{pre}(e_k)$ is the predecessor density of event $e_k$ and $D_{suc}(e_k)$ its successor density; $N_{pre}(e_k)$ is the number of directly-follows relations in which $e_k$ is the following event, and $N_{suc}(e_k)$ the number in which $e_k$ is the preceding event; $U_{pre}(e_k)$ is the predecessor set of $e_k$ and $|U_{pre}(e_k)|$ the number of event classes in it; $U_{suc}(e_k)$ is the successor set of $e_k$ and $|U_{suc}(e_k)|$ the number of event classes in it;
The global dependency $Dep_{global}(e_i, e_j)$ is calculated as follows:
[Formula image: definition of $Dep_{global}(e_i, e_j)$]
$$\theta = \max\{DFD(e_x, e_y)\}$$
[Formula image, not reproduced]
where $\theta$ is the largest frequency dependency in the log and $\zeta$ is the global noise factor used to separate out global noise events; $\zeta$ is set to 0.02.
The mixed dependency $Dep_{mixed}(e_i, e_j)$ is calculated as follows:
$$Dep_{mixed}(e_i, e_j) = \alpha \cdot Dep_{local}(e_i, e_j) + (1 - \alpha) \cdot Dep_{global}(e_i, e_j)$$
where $\alpha$ is a trade-off factor that balances the weights of the local and global dependencies; $\alpha$ is set to 0.5.
(4) From the mixed dependencies computed in the previous step, construct the mixed dependency matrix of all process events in the log set;
(5) Filter the log noise as follows:
51) Construct an empty log set to store the filtered traces;
52) Take a trace $\sigma$ from the log set and initialize the discard value of $\sigma$ to 1;
53) Take the start event $e_{start}$ of $\sigma$ and add it to an empty event sequence $\sigma_{filter}$;
54) Take the current event $e_i$ in the order of events in $\sigma$;
55) Take the next event $e_{i+1}$ after the current event in the trace;
56) Look up the mixed dependency $Dep_{mixed}(e_i, e_{i+1})$ of $e_i$ and $e_{i+1}$ in the mixed dependency matrix and first perform fine-grained filtering of events. If $Dep_{mixed}(e_i, e_{i+1})$ is not less than the mixed dependency threshold $\beta$ (set to 0.5), $e_{i+1}$ is judged a normal event and added to $\sigma_{filter}$; $e_{i+1}$ becomes the current event ($i = i + 1$) and the method returns to step 55). If $Dep_{mixed}(e_i, e_{i+1})$ is less than $\beta$, $e_{i+1}$ is judged a noise event and the discard value of $\sigma$ is corrected with a penalty function.
The penalty function is:
[Formula image: penalty function]
where the penalty factor determines how strongly the penalty function penalizes the trace; it is set to 0.8;
If the corrected discard value is not lower than the preset abandon threshold, return to step 55), with $e_i$ remaining the current event; if the corrected discard value falls below the abandon threshold, perform coarse-grained filtering of the trace: $\sigma$ is judged a noise trace and the method returns to step 52). The abandon threshold is set to 0.7.
57) If event $e_{i+1}$ is the end event $e_{end}$ of the current trace $\sigma$, add the filtered trace $\sigma_{filter}$ to the filtered log set;
58) Repeat steps 52) to 57) until all traces in the original log set have been taken;
59) Output the filtered log set.
(6) Regenerate the log file from the output filtered log set.
The technical effect of the method described above is further demonstrated by the following embodiment.
Examples
The steps of this embodiment are the same as those described above and are not repeated here. Part of the implementation process and its results are shown below:
Data source acquisition: in this embodiment, the original log file is read with the Java toolkit JDOM. The root node of the log file is obtained, the child element named Process is taken from the root, and all child elements named ProcessInstance are taken from the Process node. A ProcessInstance node holds all information about one execution instance of the process and usually contains several elements named AuditTrailEntry; each AuditTrailEntry records the details of one event in that instance through event attributes such as the timestamp, the event name and the resource. The event information is then screened: redundant information is removed and only the event name is kept, the events of each instance are sorted by their start timestamps, and the result is stored as a trace $\sigma = \langle e_1, \ldots, e_n \rangle$. Each trace takes the id attribute of its ProcessInstance element as its trace id, and the multiset of all traces in the log is stored as the original log set.
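For readers reproducing the data source step without Java, the same traversal can be sketched with Python's standard `xml.etree.ElementTree` module instead of JDOM. The sketch assumes the log follows the MXML layout named in the text (Process, ProcessInstance, AuditTrailEntry) and that each AuditTrailEntry stores the event name in a `WorkflowModelElement` child and an ISO 8601 time in a `Timestamp` child; these child-element names are assumptions about the input format, and the function name `read_mxml` is hypothetical. Entries are ordered by their Timestamp value only; the sketch does not distinguish start from complete events.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, List


def read_mxml(xml_text: str) -> Dict[str, List[str]]:
    """Map each ProcessInstance id to its event names ordered by timestamp."""
    root = ET.fromstring(xml_text)
    log: Dict[str, List[str]] = {}
    for process in root.findall("Process"):
        for instance in process.findall("ProcessInstance"):
            stamped = []
            for entry in instance.findall("AuditTrailEntry"):
                name = entry.findtext("WorkflowModelElement")  # event name (assumed tag)
                stamp = entry.findtext("Timestamp")            # ISO 8601 time (assumed tag)
                if name is None or stamp is None:
                    continue  # skip entries without a name or timestamp
                stamped.append((datetime.fromisoformat(stamp), name))
            # sort by time only; the stable sort keeps file order for equal timestamps
            # (assumes all timestamps are consistently timezone-aware or naive)
            stamped.sort(key=lambda pair: pair[0])
            log[instance.get("id", "")] = [name for _, name in stamped]
    return log


SAMPLE = """<WorkflowLog>
  <Process id="demo">
    <ProcessInstance id="1">
      <AuditTrailEntry><WorkflowModelElement>B</WorkflowModelElement>
        <Timestamp>2019-03-21T09:00:00+00:00</Timestamp></AuditTrailEntry>
      <AuditTrailEntry><WorkflowModelElement>A</WorkflowModelElement>
        <Timestamp>2019-03-21T08:00:00+00:00</Timestamp></AuditTrailEntry>
    </ProcessInstance>
    <ProcessInstance id="2">
      <AuditTrailEntry><WorkflowModelElement>C</WorkflowModelElement>
        <Timestamp>2019-03-21T10:00:00+00:00</Timestamp></AuditTrailEntry>
    </ProcessInstance>
  </Process>
</WorkflowLog>"""

print(read_mxml(SAMPLE))  # {'1': ['A', 'B'], '2': ['C']}
```

Keying the result by ProcessInstance id mirrors the text's use of that attribute as the trace id; `list(log.values())` then gives the multiset of traces used as the original log set.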
FIG. 2 shows in detail how the method of the invention performs dual-granularity noise log filtering based on the association relation on two traces (Example 1 and Example 2):
Example 1: trace $\sigma_1 = \langle ABCDEFGH \rangle$
1) Take the start event A of $\sigma_1$ and add it to the empty trace sequence $\sigma_f$;
2) Take event B, which follows event A, and compute the mixed dependency $Dep_{mixed}(A, B) = 0.80$, which is greater than the mixed dependency threshold 0.5, so event B is a normal (non-noise) event and is added to $\sigma_f$;
3) Take event C, which follows event B, and compute $Dep_{mixed}(B, C) = 0.75$, which is greater than the threshold 0.5, so event C is a normal (non-noise) event and is added to $\sigma_f$;
4) Take event D, which follows event C, and compute $Dep_{mixed}(C, D) = 0.85$, which is greater than the threshold 0.5, so event D is a normal (non-noise) event and is added to $\sigma_f$;
5) Take event E, which follows event D, and compute $Dep_{mixed}(D, E) = 0.87$, which is greater than the threshold 0.5, so event E is a normal (non-noise) event and is added to $\sigma_f$;
6) Take event F, which follows event E, and compute $Dep_{mixed}(E, F) = 0.26$, which is less than the threshold 0.5, so event F is a noise event and is not added to $\sigma_f$. Correcting the discard value of $\sigma_1$ with the penalty function gives 0.9, which is greater than the abandon threshold 0.7, so $\sigma_1$ is still treated as a normal (non-noise) trace;
7) Take event G, which follows event F; since F was discarded, E is still the current event, so compute $Dep_{mixed}(E, G) = 0.87$, which is greater than the threshold 0.5, so event G is a normal (non-noise) event and is added to $\sigma_f$;
8) Take event H, which follows event G, and compute $Dep_{mixed}(G, H) = 0.85$, which is greater than the threshold 0.5, so event H is a normal (non-noise) event and is added to $\sigma_f$;
9) Event H is the end event of $\sigma_1$, so the filtered trace $\sigma_f = \langle ABCDEGH \rangle$ is added to the filtered log set.
Example 2: trace $\sigma_2 = \langle ABCEGH \rangle$
1) Take the start event A of $\sigma_2$ and add it to the empty trace sequence $\sigma_f$;
2) Take event B, which follows event A, and compute the mixed dependency $Dep_{mixed}(A, B) = 0.80$, which is greater than the mixed dependency threshold 0.5, so event B is a normal (non-noise) event and is added to $\sigma_f$;
3) Take event C, which follows event B, and compute $Dep_{mixed}(B, C) = 0.75$, which is greater than the threshold 0.5, so event C is a normal (non-noise) event and is added to $\sigma_f$;
4) Take event E, which follows event C, and compute $Dep_{mixed}(C, E) = 0.26$, which is less than the threshold 0.5, so event E is a noise event and is not added to $\sigma_f$. Correcting the discard value of $\sigma_2$ with the penalty function gives 0.9, which is greater than the abandon threshold 0.7, so $\sigma_2$ is still treated as a normal (non-noise) trace;
5) Take event G, which follows event E; since E was discarded, C is still the current event, so compute $Dep_{mixed}(C, G) = 0.01$, which is less than the threshold 0.5, so event G is a noise event and is not added to $\sigma_f$. Correcting the discard value of $\sigma_2$ with the penalty function gives 0.72, which is still greater than the abandon threshold 0.7, so $\sigma_2$ is still treated as a normal (non-noise) trace;
6) Take event H, which follows event G, and compute $Dep_{mixed}(C, H) = 0.01$, which is less than the threshold 0.5, so event H is a noise event and is not added to $\sigma_f$. Correcting the discard value of $\sigma_2$ with the penalty function gives 0.58, which is less than the abandon threshold 0.7, so $\sigma_2$ is judged a noise trace and is not added to the filtered log set.
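The two examples can be replayed with a short Python sketch that feeds the mixed dependency values quoted above into the filtering loop. The penalty formula itself is not reproduced in this text, so the sketch uses a hypothetical multiplicative stand-in (discard value multiplied by the penalty factor 0.8). With that stand-in the discard values become 0.8 and 0.64 rather than the 0.9, 0.72 and 0.58 reported above, and $\sigma_2$ is abandoned one event earlier, but the final decisions agree: $\sigma_1$ is kept as $\langle ABCDEGH \rangle$ and $\sigma_2$ is discarded. The function name `filter_trace` is illustrative only.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def filter_trace(trace: Sequence[str], dep: Dict[Tuple[str, str], float],
                 beta: float = 0.5, abandon: float = 0.7,
                 penalize: Callable[[float], float] = lambda w: 0.8 * w
                 ) -> Tuple[Optional[List[str]], List[float]]:
    """Return (filtered trace or None if abandoned, discard value after each penalty)."""
    kept, current, discard, history = [trace[0]], trace[0], 1.0, []
    for nxt in trace[1:]:
        if dep.get((current, nxt), 0.0) >= beta:
            kept.append(nxt)             # normal event: keep it and advance
            current = nxt
        else:
            discard = penalize(discard)  # hypothetical multiplicative penalty
            history.append(round(discard, 2))
            if discard < abandon:
                return None, history     # noise trace: abandon it entirely
    return kept, history


# Mixed dependency values quoted in Example 1 and Example 2
DEP = {("A", "B"): 0.80, ("B", "C"): 0.75, ("C", "D"): 0.85, ("D", "E"): 0.87,
       ("E", "F"): 0.26, ("E", "G"): 0.87, ("G", "H"): 0.85,
       ("C", "E"): 0.26, ("C", "G"): 0.01, ("C", "H"): 0.01}

kept1, hist1 = filter_trace("ABCDEFGH", DEP)
kept2, hist2 = filter_trace("ABCEGH", DEP)
print("".join(kept1), hist1)  # ABCDEGH [0.8]
print(kept2, hist2)           # None [0.8, 0.64]
```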

Claims (8)

1. A dual-granularity noise log filtering method based on an incidence relation, characterized by comprising the following steps:
(1) inputting an original log file and preprocessing it to generate a log set consisting of a plurality of process traces $\sigma$, each process trace consisting of a plurality of process events $e_i$, $\sigma = \langle e_1, \ldots, e_n \rangle$, and the set of all process events $e$ in all process traces being denoted $\varepsilon$, i.e., $e \in \varepsilon$;
(2) counting, over all process traces in the log set, the frequency dependency $DFD(e_i, e_j)$ between every two process events;
(3) further calculating, for every two events, the local dependency $Dep_{local}(e_i, e_j)$, the global dependency $Dep_{global}(e_i, e_j)$ and the mixed dependency $Dep_{mixed}(e_i, e_j)$;
the local dependency $Dep_{local}(e_i, e_j)$ being calculated as follows:
[Formula image: definition of $Dep_{local}(e_i, e_j)$ in terms of $C_1$, $C_2$, $D_{suc}(e_i)$ and $D_{pre}(e_j)$]
wherein $C_1$ and $C_2$ are constants; $D_{suc}(e_i)$ denotes the successor density, i.e., the average occurrence frequency of all successor events of $e_i$; $D_{pre}(e_j)$ denotes the predecessor density, i.e., the average occurrence frequency of all predecessor events of $e_j$; the successor and predecessor densities are calculated as follows:
$$D_{pre}(e_k) = N_{pre}(e_k) / |U_{pre}(e_k)|$$
$$D_{suc}(e_k) = N_{suc}(e_k) / |U_{suc}(e_k)|$$
[Formula images: two further definitions, not reproduced]
wherein $D_{pre}(e_k)$ is the predecessor density of event $e_k$ and $D_{suc}(e_k)$ its successor density; $N_{pre}(e_k)$ is the number of directly-follows relations in which $e_k$ is the following event, and $N_{suc}(e_k)$ the number in which $e_k$ is the preceding event; $U_{pre}(e_k)$ is the predecessor set of $e_k$ and $|U_{pre}(e_k)|$ the number of event classes in it; $U_{suc}(e_k)$ is the successor set of $e_k$ and $|U_{suc}(e_k)|$ the number of event classes in it;
the global dependency $Dep_{global}(e_i, e_j)$ being calculated as follows:
[Formula image: definition of $Dep_{global}(e_i, e_j)$]
$$\theta = \max\{DFD(e_x, e_y)\}$$
[Formula image, not reproduced]
wherein $\zeta$ is a global noise factor used to separate out global noise events;
the mixed dependency $Dep_{mixed}(e_i, e_j)$ being calculated as follows:
$$Dep_{mixed}(e_i, e_j) = \alpha \cdot Dep_{local}(e_i, e_j) + (1 - \alpha) \cdot Dep_{global}(e_i, e_j)$$
wherein $\alpha$ is a balance factor for balancing the proportions of the global dependency and the local dependency;
(4) constructing, from the mixed dependencies calculated in the previous step, the mixed dependency matrix of all process events in the log set;
(5) filtering the log noise, specifically comprising the following steps:
51) constructing an empty log set for storing the filtered traces;
52) taking a trace $\sigma$ from the log set and initializing the discard value of $\sigma$ to 1;
53) taking the start event $e_{start}$ of $\sigma$ and adding it to an empty event sequence $\sigma_{filter}$;
54) taking the current event $e_i$ in the order of events in $\sigma$;
55) taking the next event $e_{i+1}$ after the current event in the trace;
56) looking up the mixed dependency $Dep_{mixed}(e_i, e_{i+1})$ of $e_i$ and $e_{i+1}$ in the mixed dependency matrix and first performing fine-grained filtering of events: if $Dep_{mixed}(e_i, e_{i+1})$ is not less than the mixed dependency threshold $\beta$, judging $e_{i+1}$ a normal event, adding it to $\sigma_{filter}$, making $e_{i+1}$ the current event ($i = i + 1$) and returning to step 55); if $Dep_{mixed}(e_i, e_{i+1})$ is less than $\beta$, judging $e_{i+1}$ a noise event and correcting the discard value of $\sigma$ with a penalty function,
the penalty function being:
[Formula image: penalty function]
wherein the penalty factor determines the degree of penalty of the penalty function;
if the corrected discard value is not lower than the set abandon threshold, returning to step 55); if the corrected discard value is lower than the abandon threshold, performing coarse-grained filtering of the trace, judging the trace $\sigma$ a noise trace and returning to step 52);
57) if event $e_{i+1}$ is the end event $e_{end}$ of the current trace $\sigma$, adding the filtered trace $\sigma_{filter}$ to the filtered log set;
58) repeating steps 52) to 57) until all traces in the original log set have been taken;
59) outputting the filtered log set;
(6) regenerating the log file from the output filtered log set.
2. The correlation-based dual-granularity noise log filtering method according to claim 1, wherein the log set in step (1) contains all execution instances of the business process, i.e., each process trace $\sigma$ corresponds to one execution instance of the business process and is an ordered sequence of a plurality of process events $e$, each process event $e$ being one record of an executed activity of the business process.
3. The correlation-based dual-granularity noise log filtering method according to claim 1, wherein the frequency dependency $DFD(e_i, e_j)$ in step (2) denotes the directly-follows degree, i.e., the total number of times event $e_j$ directly follows event $e_i$ across all process instances.
4. The correlation-based dual-granularity noise log filtering method according to claim 1, wherein the global noise factor ζ in step (3) is 0.02.
5. The correlation-based dual-granularity noise log filtering method as claimed in claim 1, wherein the mixedness threshold β in step (5) is 0.5.
6. The correlation-based dual-granularity noise log filtering method as claimed in claim 1, wherein the weighting factor α in the step (5) is 0.5.
7. The correlation-based dual-granularity noise log filtering method according to claim 1, wherein the penalty factor in step (5) is 0.8.
8. The correlation-based dual-granularity noise log filtering method according to claim 1, wherein the discard threshold in step (5) is 0.7.
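The event-level and trace-level decisions of step 56) in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the mixed dependency and the penalty function are passed in as callables because their formulas are defined elsewhere (the penalty formula is not reproduced in this text). The names `filter_trace`, `dep_mixed` and `penalty`, the starting discard value of 1.0, keeping the first event as the initial current event, and examining the next event of σ after a noise event while the current event stays unchanged are all hypothetical choices.

```python
from typing import Callable, List, Optional, Sequence


def filter_trace(
    trace: Sequence[str],
    dep_mixed: Callable[[str, str], float],  # hypothetical: returns Dep_mixed(e_i, e_{i+1})
    penalty: Callable[[float], float],       # hypothetical: returns the corrected discard value
    beta: float = 0.5,                       # mixedness threshold (claim 5)
    discard_threshold: float = 0.7,          # discard threshold (claim 8)
    initial_discard: float = 1.0,            # assumption: starting discard value of sigma
) -> Optional[List[str]]:
    """Return sigma_filter, or None if sigma is judged to be a noise trace."""
    if not trace:
        return []
    current = trace[0]  # assumption: the first event is the initial current event
    filtered = [current]
    discard = initial_discard
    for nxt in trace[1:]:
        if dep_mixed(current, nxt) >= beta:
            # Fine-grained filtering: normal event, keep it and make it current.
            filtered.append(nxt)
            current = nxt
        else:
            # Noise event: drop it, keep the current event, correct the discard value.
            discard = penalty(discard)
            if discard < discard_threshold:
                # Coarse-grained filtering: the whole trace is judged to be noise.
                return None
    # End of sigma reached without discarding it: keep the filtered trace.
    return filtered


if __name__ == "__main__":
    # Toy stand-ins for illustration only: a lookup table for Dep_mixed and a
    # multiplicative penalty; neither is the formula used in the patent.
    deps = {("a", "b"): 0.9, ("b", "c"): 0.9}
    dep = lambda x, y: deps.get((x, y), 0.0)
    pen = lambda d: d * 0.8
    print(filter_trace(["a", "x", "b", "c"], dep, pen))  # ['a', 'b', 'c']
    print(filter_trace(["a", "x", "y", "b"], dep, pen))  # None
```

With the toy values, one noise event lowers the discard value to 0.8 and the trace survives, while a second noise event drops it below 0.7 and the whole trace is discarded, which is the dual-granularity behaviour the step describes.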
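The outer loop of steps 52) to 59) in claim 1 can be sketched as below, again as a hedged Python illustration. `filter_log` and `trace_filter` are hypothetical names; the trace filter is any callable that returns the filtered trace σ_filter, or None when coarse-grained filtering has judged σ to be a noise trace. Reading step 52) as "take the next trace out of the original log set" is an assumption based on step 58).

```python
from typing import Callable, Iterable, List, Optional, Sequence

TraceFilter = Callable[[Sequence[str]], Optional[List[str]]]


def filter_log(log: Iterable[Sequence[str]], trace_filter: TraceFilter) -> List[List[str]]:
    """Take each trace in turn, filter it, and collect the surviving traces."""
    filtered_log: List[List[str]] = []
    for trace in log:                 # step 52) (assumed): take the next trace
        result = trace_filter(trace)  # event- and trace-level filtering up to step 56)
        if result is None:            # noise trace: discard it and go back to step 52)
            continue
        filtered_log.append(result)   # step 57): add sigma_filter to the filtered log set
    return filtered_log               # steps 58)-59): loop until exhausted, then output


if __name__ == "__main__":
    # Toy trace filter for illustration only: drop any trace containing "z".
    drop_z = lambda t: None if "z" in t else list(t)
    print(filter_log([["a", "b"], ["a", "z"], ["c"]], drop_z))  # [['a', 'b'], ['c']]
```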
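Step (6) of claim 1 does not fix a file format for the regenerated log. As one hypothetical possibility, the filtered log set could be written as a CSV event log with one row per event; the function name `write_log_csv`, the column names `case_id` and `activity`, and numbering cases from 1 are assumptions.

```python
import csv
import io
from typing import Sequence, TextIO


def write_log_csv(filtered_log: Sequence[Sequence[str]], out: TextIO) -> None:
    """Write one row per event: a case id (trace index from 1) and the activity."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case_id", "activity"])
    for case_id, trace in enumerate(filtered_log, start=1):
        for activity in trace:
            writer.writerow([case_id, activity])


if __name__ == "__main__":
    buf = io.StringIO()
    write_log_csv([["a", "b"], ["c"]], buf)
    print(buf.getvalue(), end="")
```

When writing to a real file, it would be opened with `open(path, "w", newline="")`, as the `csv` module documentation recommends.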
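The structure described in claim 2 (a log set of traces, each an ordered sequence of events) maps naturally onto a few Python types. The class and field names below, including the optional `timestamp` attribute, are hypothetical; the claim only requires that each event records one executed activity and that the events within a trace are ordered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """Process event e: one record of an executed activity."""
    activity: str
    timestamp: str = ""  # hypothetical extra attribute, not required by claim 2


@dataclass
class Trace:
    """Process trace sigma: the ordered events of one execution instance."""
    case_id: str
    events: List[Event] = field(default_factory=list)

    @property
    def end_event(self) -> Event:
        # e_end is the last event of the ordered sequence (IndexError if empty).
        return self.events[-1]


# The log set: all execution instances (traces) of the business process.
EventLog = List[Trace]

if __name__ == "__main__":
    log: EventLog = [Trace("case-1", [Event("register"), Event("approve")])]
    print(log[0].end_event.activity)  # approve
```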
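The directly-follows count DFD(e_i, e_j) of claim 3 can be computed in a single pass over the log set. A minimal sketch, assuming each trace is given as a list of activity names and that events are identified by activity name; `directly_follows_frequency` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows_frequency(log: Iterable[Sequence[str]]) -> Counter:
    """DFD(e_i, e_j): total number of times e_j directly follows e_i in all traces."""
    counts: Counter = Counter()
    for trace in log:
        # zip pairs every event with its immediate successor in the same trace
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    dfd = directly_follows_frequency([["a", "b", "c"], ["a", "b"], ["a", "c"]])
    print(dfd[("a", "b")], dfd[("b", "c")], dfd[("c", "a")])  # 2 1 0
```

Pairs that never occur read as 0 from the `Counter`, so the result can be queried directly for any (e_i, e_j).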
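For reference, the parameter values fixed in claims 4 to 8 can be gathered into one immutable configuration object. The class and field names are hypothetical labels for ζ, β, α, the penalty factor and the discard threshold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterParams:
    """Parameter values fixed in claims 4-8 (field names are hypothetical)."""
    global_noise_factor: float = 0.02  # zeta, step (3), claim 4
    mixedness_threshold: float = 0.5   # beta, step (5), claim 5
    weighting_factor: float = 0.5      # alpha, step (5), claim 6
    penalty_factor: float = 0.8        # step (5), claim 7
    discard_threshold: float = 0.7     # step (5), claim 8


if __name__ == "__main__":
    params = FilterParams()
    variant = replace(params, mixedness_threshold=0.6)  # modified copy; params is unchanged
    print(params.mixedness_threshold, variant.mixedness_threshold)  # 0.5 0.6
```

Freezing the object keeps the values constant during a filtering run; `dataclasses.replace` returns a modified copy when other settings are to be tried.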

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910218832.3A CN110032494B (en) 2019-03-21 2019-03-21 Double-granularity noise log filtering method based on incidence relation

Publications (2)

Publication Number Publication Date
CN110032494A CN110032494A (en) 2019-07-19
CN110032494B true CN110032494B (en) 2020-05-26

Family

ID=67236382

Country Status (1)

Country Link
CN (1) CN110032494B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597686B (en) * 2019-08-18 2022-10-18 南京理工大学 Noise-tolerant process mining method based on mixed event log
CN112052990B (en) * 2020-08-21 2021-05-04 杭州电子科技大学 CNN-BiLSTM hybrid model-based next activity prediction method for multi-angle business process
CN113176977A (en) * 2021-04-27 2021-07-27 南开大学 Interleaved log analysis method for networking workflow of construction
CN114564473B (en) * 2022-04-28 2022-07-12 江苏益柏锐信息科技有限公司 Data processing method, equipment and medium based on ERP enterprise management system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103198146A (en) * 2013-04-19 2013-07-10 中国科学院计算技术研究所 Real-time event filtering method and real-time event filtering system oriented to network stream data
CN103593484A (en) * 2013-12-03 2014-02-19 南京安讯科技有限责任公司 Method for filtering garbage logs during mobile phone internet surfing
KR20170127876A (en) * 2016-05-13 2017-11-22 한국전자통신연구원 System and method for dealing with troubles through fault analysis of log
CN107909344A (en) * 2017-11-21 2018-04-13 杭州电子科技大学 Workflow logs iterative task recognition methods based on relational matrix

Non-Patent Citations (1)

Title
Li Yu et al. "Filtering log data: Finding the needles in the Haystack". ResearchGate, 2015-06-27, pp. 1-10 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Sun Xiaoxiao

Inventor after: Hou Wenjie

Inventor after: Yu Dongjin

Inventor after: Pan Jianliang

Inventor before: Sun Xiaoxiao

Inventor before: Yu Dongjin

Inventor before: Hou Wenjie

Inventor before: Pan Jianliang

GR01 Patent grant