CN115907443A - Method, medium and electronic device for process mining analysis - Google Patents

Info

Publication number
CN115907443A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211464343.4A
Other languages
Chinese (zh)
Inventor
王健
袁野
高煜光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hongji Information Technology Co Ltd
Original Assignee
Shanghai Hongji Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hongji Information Technology Co Ltd filed Critical Shanghai Hongji Information Technology Co Ltd
Priority to CN202211464343.4A priority Critical patent/CN115907443A/en
Publication of CN115907443A publication Critical patent/CN115907443A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a method, a medium and an electronic device for process mining analysis, wherein the method comprises the following steps: deleting a bidirectional edge between a target node pair on the direct following graph to obtain a target direct following graph, wherein the target node pair comprises two nodes belonging to a parallel relation; and carrying out flow mining analysis according to the target direct following graph. According to the method and the device, the direct following graph is effectively simplified by deleting the parallel staggered edges on the direct following graph, and the complexity of the direct following graph is reduced, so that the method and the device are more beneficial to process mining and process optimization.

Description

Method, medium and electronic device for process mining analysis
Technical Field
The present application relates to the field of process intelligence, and in particular, to a method, a medium, and an electronic device for process mining analysis.
Background
The Process Mining (Process Mining) is to mine Process knowledge of different dimensions among business activities by using an event log containing business execution information stored in an enterprise information system, establish a Process model capable of reflecting the actual business Process execution Process of an enterprise, and diagnose and optimize the original business Process on the basis of the Process knowledge. That is, the process mining is a process of mining and constructing a business process model from an Event log (Event Logs), and the obtained process model can well reflect the process behavior recorded by the Event log.
The types of process model include the DFG (directly-follows graph, referred to herein as the direct following graph). If the event log has few entries, the resulting direct following graph is simple; if the log has many entries and covers many process cases, the resulting direct following graph is very complex. On the one hand, a very complex direct following graph makes the real process logic hard to understand, so logical analysis fails; on the other hand, process analysis such as process mining becomes much more complex when it is based on such a graph.
Disclosure of Invention
The embodiment of the application aims to provide a method, a medium and an electronic device for process mining analysis, and the method, the medium and the electronic device for process mining analysis effectively simplify a direct follow graph by deleting parallel staggered edges on the direct follow graph, reduce the complexity of the direct follow graph, and are more beneficial to process mining and process optimization.
In a first aspect, an embodiment of the present application provides a method for process mining analysis, where the method includes: deleting a bidirectional edge between a target node pair on the direct following graph to obtain a target direct following graph, wherein the target node pair comprises two nodes belonging to a parallel relation; and carrying out flow mining analysis according to the target direct following graph.
According to some embodiments of the application, the direct following graph is simplified by deleting the bidirectional edge between the two nodes in the parallel relationship, so that the complexity of the direct following graph is effectively reduced, and the technical effect of performing flow mining or flow optimization based on the direct following graph can be improved.
In some embodiments, before deleting the bidirectional edge between the target node pair on the direct following graph, the method further comprises: if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and no short loop exists between the first node and the second node, confirming that the first node and the second node form a candidate target node pair; or, if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and at least one of the first node and the second node has a self-loop, confirming that the first node and the second node form a candidate target node pair; and selecting at least some node pairs from the candidate target node pairs as the target node pairs.
Some embodiments of the present application provide a strategy for identifying two nodes in a parallel relationship. The strategy identifies parallel interleaving on the direct following graph that is neither a self-loop nor a short loop, and deletes the bidirectional edges between the nodes in such a relationship. This avoids mistakenly deleting edges that correspond to interleaving of another kind, such as a short loop, so the direct following graph is simplified while still reflecting the actual process as faithfully as possible.
In some embodiments, the parallel interleaving relationship characterizes that, in one or more traces, there is a first segment from a third node to a fourth node and a second segment from the fourth node to the third node; if the first segment and the second segment are in the same trace, they may be adjacent or non-adjacent. The self-loop characterizes that a trace contains a segment from a node to itself. The short loop characterizes that one trace contains a first segment from a fifth node to a sixth node and a second segment from the sixth node to the fifth node, and the first segment and the second segment are adjacent.
Some embodiments of the present application provide a method for determining a self-loop, a short-loop, and a parallel interleaving relationship, and the method can determine the self-loop, the short-loop, and the parallel interleaving relationship existing in all tracks, thereby effectively excluding two nodes that do not belong to a target node pair, and avoiding performing an erroneous deletion on a bidirectional edge between the nodes.
In some embodiments, before deleting the bidirectional edge between the target node pair on the direct following graph, the method further comprises: obtaining all traces from the event log; and identifying all target node pairs from all the traces.
Some embodiments of the present application provide a method of screening target node pairs from trace data in an event log.
In some embodiments, said identifying all target node pairs through said all trajectories comprises: searching and counting attribute information for identifying the parallel relationship in all the tracks, wherein the attribute information comprises self loops, short loops, parallel staggered relationship, the number of self loops, the number of short loops and the number of parallel staggered relationship existing in each track; and searching the target node pairs in all the tracks according to the attribute information.
According to some embodiments of the application, the target node pairs are screened by counting attribute information used for identifying the parallel relationship, and the accuracy of the obtained target node pairs can be improved.
In some embodiments, the finding and counting attribute information for identifying a parallel relationship in all the tracks includes: if a segment <…, a, a, …> from a node a to itself exists in one track, confirming that a self-loop of the node a exists; and counting the self-loops of the node a in all the tracks to obtain the number of self-loops of the node a.
Some embodiments of the present application provide a method of how to determine whether a self-loop exists based on a trajectory.
In some embodiments, the finding and counting attribute information for identifying a parallel relationship in all the tracks includes: if a segment <…, a, b, a, …> that goes from the node a to the node b and returns from the node b to the node a exists in one track, confirming that a short loop exists between the node a and the node b; and counting the short loops from the node a to the node b existing in all the tracks to obtain the number of short loops from the node a to the node b.
Some embodiments of the present application provide a method of how to determine whether a short loop exists through a trajectory.
In some embodiments, the finding and counting attribute information for identifying a parallel relationship in all the tracks includes: if a segment <…, a, b, …> from the node a to the node b and a segment <…, b, a, …> from the node b to the node a exist in one or more tracks, confirming that a parallel interleaving relationship exists between the node a and the node b; and counting the total number of segments from the node a to the node b to obtain a first cross correlation coefficient, and counting the total number of segments from the node b to the node a to obtain a second cross correlation coefficient.
Some embodiments of the present application provide a way to determine whether there are two nodes in a parallel staggered relationship through a trace.
In some embodiments, said finding said target node pair in all said trajectories according to said attribute information comprises: if the node a and the node b simultaneously satisfy the following three formulas, confirming that the node a and the node b form a pair of the target node pair:
|a → b| > 0 and |b → a| > 0 (formula 1)
(formula 2: equation image BDA0003955713430000041, not reproduced in this text)
(formula 3: equation image BDA0003955713430000042, not reproduced in this text)
Wherein |a → b| is used to characterize the first cross correlation coefficient and |b → a| the second cross correlation coefficient; the first number of short loops (its symbol is equation image BDA0003955713430000043) is the total number of segments <…, a, b, a, …> present in all tracks; the second number of short loops (its symbol is equation image BDA0003955713430000044) is the total number of segments <…, b, a, b, …> present in all tracks; |b → b| is used to characterize a first number of self-loops, which is the number of self-loops of node b; and |a → a| is used to characterize a second number of self-loops, which is the number of self-loops of node a.
Some embodiments of the present application provide a method for screening a target node pair according to attribute information and a corresponding calculation formula, so that the screened node pair belongs to nodes having a parallel relationship of bidirectional edges, and a direct follow-up graph can be effectively simplified by deleting the bidirectional edges between the nodes.
In some embodiments, the performing a flow mining analysis according to the target direct follow-up graph includes: and diagnosing or optimizing the business process according to the target direct following graph.
Some embodiments of the present application provide a method for business process diagnostics or optimization based on a target direct follow-up graph.
In a second aspect, some embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, may implement the method as described in any of the embodiments of the first aspect.
In a third aspect, some embodiments of the present application provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, may implement the method according to any of the embodiments of the first aspect.
In a fourth aspect, some embodiments of the present application provide an apparatus for process mining analysis, including: the target direct following graph acquisition module is configured to delete a bidirectional edge between a target node pair on a direct following graph to obtain a target direct following graph, wherein the target node pair comprises two nodes belonging to a parallel relation; and the flow mining analysis module is configured to perform flow mining analysis according to the target direct following graph.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a system architecture diagram for processing log data according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for process mining analysis according to an embodiment of the present disclosure;
FIG. 3 is a second flowchart of a method for process mining analysis according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a self-loop provided by an embodiment of the present application;
FIG. 5 is a schematic view of a short ring provided in accordance with an embodiment of the present application;
FIG. 6 is an example of a direct follow-up provided by an embodiment of the present application;
FIG. 7 is an illustration of a target follower graph obtained after removing bidirectional edges between pairs of target nodes from the direct follower graph of FIG. 6 according to an embodiment of the present application;
FIG. 8 is a block diagram illustrating an apparatus for process mining analysis according to an embodiment of the present disclosure;
fig. 9 is a schematic composition diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not construed as indicating or implying relative importance.
In order to introduce the technical solutions of the embodiments of the present application more clearly, the concept related to event logs is first exemplified.
An event log file consists of several log entries. Each log entry contains three key fields necessary for process mining: the process instance number (case id), the activity (also called an action), and the timestamp. Each process instance corresponds to one case id; one process instance may include multiple activities, and each activity has a start execution time. Arranging all log entries that belong to the same case id in time order yields a sequence of activities, which represents the order in which the activities occurred while the process instance was executed. This sequence is a trace of the process execution. It is understood that the trace data in some embodiments of the present application includes one or more such traces. In the related art, each trace may be characterized as a sequence of nodes, each node corresponding to an activity (each distinct activity in the node sequence is represented by a unique letter). Two or more adjacent nodes in a trace form a segment, and two adjacent segments are adjacent in the node sequence. A particular activity instance can be uniquely identified by the position of the activity in the trace, and different cases may have the same trace.
For example, Table 1 is log data obtained for the reimbursement service of an enterprise. The activity sequence of the process instance with case id 1 in Table 1 is, in order: register request, examine thoroughly, check ticket, decide, and reject request; all activities of this sequence constitute one trace. The activity sequence of the process instance with case id 2 in Table 1 is, in order: register request, check ticket, examine thoroughly, decide, and pay compensation; all activities of this sequence constitute another trace. That is, a trace is a sequence of activities that records the execution of one process instance, and each element of the sequence is an event that occurred in that case. The process instances and traces of the remaining log entries in Table 1 can be read in the same way and are not described further for brevity.
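The grouping and ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `build_traces` is a hypothetical helper, each log entry is assumed to be a `(case_id, activity, timestamp)` tuple with an ISO 8601 timestamp, and the timestamps in the sample data are invented (only the activity order of cases 1 and 2 follows the text).

```python
from collections import defaultdict
from datetime import datetime


def build_traces(log_entries):
    """Group (case_id, activity, timestamp) entries into one trace per case."""
    events_by_case = defaultdict(list)
    for case_id, activity, timestamp in log_entries:
        events_by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    traces = {}
    for case_id, events in events_by_case.items():
        # Order the events of one case by timestamp; sorted() is stable,
        # so events with equal timestamps keep their order in the log.
        ordered = sorted(events, key=lambda event: event[0])
        traces[case_id] = tuple(activity for _, activity in ordered)
    return traces


# Hypothetical entries: the timestamps are invented for illustration.
log_entries = [
    (1, "register request", "2022-01-01T09:00:00"),
    (2, "register request", "2022-01-01T09:05:00"),
    (1, "examine thoroughly", "2022-01-01T10:00:00"),
    (2, "check ticket", "2022-01-01T10:10:00"),
    (1, "check ticket", "2022-01-01T11:00:00"),
    (2, "examine thoroughly", "2022-01-01T11:20:00"),
    (1, "decide", "2022-01-01T12:00:00"),
    (2, "decide", "2022-01-01T12:30:00"),
    (1, "reject request", "2022-01-01T13:00:00"),
    (2, "pay compensation", "2022-01-01T13:30:00"),
]
traces = build_traces(log_entries)
print(traces[1])
print(traces[2])
```

The entries of the two cases are deliberately interleaved in the input to show that the trace order depends only on the case id and the timestamp, not on the order of the entries in the file.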
It should be noted that, each activity is represented by letters on the process model corresponding to the directly following graph, an activity corresponds to a node on the process model graph (for example, an activity corresponds to a node in table 1 below, and the same activity node is identified identically), and the number on the edge between the node and the node on the directly following graph is used to record the total number of process instances including the segment.
TABLE 1 event log (the table is an image, BDA0003955713430000081, in the original publication and is not reproduced in this text)
Referring to fig. 1, fig. 1 is a system for processing log data according to some embodiments of the present application, where the system includes a plurality of terminal devices (for example, a first terminal 110 and a second terminal 120 shown in fig. 1, and the present application does not limit the specific number of the terminal devices) that generate event logs and a flow mining server 130, where the terminal devices in fig. 1 send the respective generated event logs to the flow mining server 130, for example, the first terminal 110 sends a first event log 101 to the flow mining server 130 and the second terminal 120 sends a second event log 102 to the flow mining server 130.
In some embodiments of the present application, the process mining server 130 is configured to generate a corresponding target follower graph from the received event log data.
It should be noted that fig. 1 is only an exemplary architecture of the present application, and in some embodiments of the present application, event log data generated by each device may also be sent to the flow mining server 130 through a gateway. Those skilled in the art can design a corresponding implementation architecture according to actual situations.
As shown in FIG. 2, this figure illustratively provides a method of process mining analysis performed by the process mining server 130 of FIG. 1.
The method for process mining analysis provided by some embodiments of the present application exemplarily includes:
s101, deleting a bidirectional edge between a target node pair on the direct following graph to obtain the target direct following graph, wherein the target node pair comprises two nodes belonging to a parallel relation.
And S102, carrying out flow mining analysis according to the target direct following graph. For example, S102 illustratively includes: and diagnosing or optimizing the business process according to the target direct following graph.
That is to say, some embodiments of the present application simplify the direct following graph by performing S101 and S102 to delete the bidirectional edge located between two nodes in the parallel relationship, so that the complexity of the direct following graph is effectively reduced, and the technical effect of performing flow mining or flow optimization based on the direct following graph can be improved.
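The deletion in S101 can be sketched as follows, assuming the target node pairs are already known. The representation is an assumption made for illustration: the direct following graph is a dictionary that maps a `(source, target)` edge to its count, and `remove_bidirectional_edges` is a hypothetical helper name.

```python
def remove_bidirectional_edges(dfg, target_pairs):
    """Return a copy of the graph without the edges a->b and b->a of each pair."""
    simplified = dict(dfg)
    for a, b in target_pairs:
        # Drop both directions; a missing edge is ignored.
        simplified.pop((a, b), None)
        simplified.pop((b, a), None)
    return simplified


# Hypothetical graph built from three cases <a, b, c, d> and two cases <a, c, b, d>.
dfg = {
    ("a", "b"): 3, ("b", "c"): 3, ("c", "d"): 3,
    ("a", "c"): 2, ("c", "b"): 2, ("b", "d"): 2,
}
target_dfg = remove_bidirectional_edges(dfg, [("b", "c")])
print(sorted(target_dfg))
```

The input graph is left unchanged, so the original and the simplified graph can be compared side by side.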
It should be noted that, in order to avoid deleting the bidirectional edge between the two nodes by mistake, the target node pair described in S101 needs to be identified before performing S101 in some embodiments of the present application.
The following describes an exemplary process of obtaining a target node pair.
For example, in some embodiments of the present application, the target node pair is selected from candidate target node pairs; that is, the two nodes that make up a target node pair first need to satisfy the characteristics of a candidate target node pair. For example, in some embodiments of the application, the process of obtaining candidate target node pairs and target node pairs includes: if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and no short loop exists between the first node and the second node, determining that the first node and the second node form a candidate target node pair; or, if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and at least one of the first node and the second node has a self-loop, determining that the first node and the second node form a candidate target node pair; and selecting at least some of the candidate target node pairs as the target node pairs. It should be noted that, in some embodiments of the present application, a candidate target node pair may be used directly as a target node pair. In other embodiments, in order to identify more accurately two nodes in a parallel relationship that are connected by a bidirectional edge, the proportional relationship between the two nodes is also determined, and whether the candidate target node pair is a target node pair is decided according to that proportional relationship. For example, formula 3 below is used to check whether the value of the proportional relationship meets the requirement: two nodes that meet it are determined to be a target node pair, and a candidate target node pair that does not satisfy formula 3 is not used as a target node pair.
The definitions of the parallel interleaving relationship, self-loops, and short loops described above in some embodiments of the present application are as follows.
For example, in some embodiments of the present application, the parallel interleaving relationship is used to characterize that a first segment from a third node to a fourth node exists and a second segment from the fourth node to the third node exists in one or more tracks, and the first segment and the second segment are adjacent segments or non-adjacent segments if they are in the same track. For example, if a segment <…, a, b, …> from node a to node b and a segment <…, b, a, …> from node b to node a exist in one or more tracks, it is confirmed that a parallel interleaving relationship exists between node a and node b.
For example, in some embodiments of the present application, the self-loop is used to characterize the existence of a segment of a node to itself in a trajectory.
For example, in some embodiments of the present application, the short loop is used to characterize that a first segment from a fifth node to a sixth node exists and a second segment from the sixth node to the fifth node exists in one track, and the first segment and the second segment are adjacent segments in that track. For example, if a track contains a segment <…, a, b, a, …> that goes from node a to node b (an example of the first segment) and then from node b back to node a (an example of the second segment), the existence of a short loop between node a and node b is confirmed. It will be appreciated that the segment <…, a, b, a, …> includes a first segment (from node a to node b) and a second segment (from node b to node a) that are adjacent.
That is to say, some embodiments of the present application provide a method for determining a self-loop, a short-loop, and a parallel interleaving relationship, and by using these methods, the self-loop, the short-loop, and the parallel interleaving relationship existing in all tracks can be determined, so that two nodes not belonging to a target node pair can be effectively excluded, and a bidirectional edge between these nodes is prevented from being deleted erroneously.
In combination with the above, in some embodiments of the present application, before S101 deletes the bidirectional edge between the target node pair on the direct following graph, the method further includes:
In the first step, all traces are obtained from the event log.
That is, all traces are extracted from the event log data to obtain the trace data.
In the second step, all target node pairs are identified from all the traces.
For example, the second step illustratively includes:
and a first sub-step of searching and counting attribute information for identifying the parallel relationship in all the tracks, wherein the attribute information comprises self loops, short loops, parallel staggered relationships, the number of self loops, the number of short loops and the number of parallel staggered relationships existing in each track.
For example, in some embodiments of the present application, the process by which the first sub-step identifies self-loops and the number of self-loops includes: if a segment <…, a, a, …> from the node a to itself exists in one track, confirming that a self-loop of the node a exists; and counting the self-loops of the node a in all the tracks to obtain the number of self-loops of the node a. Some embodiments of the present application thus provide a method of determining from a track whether a self-loop exists.
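Self-loop counting as described above can be sketched as follows. The sketch assumes that each track is a tuple of node labels and that every occurrence of a segment <…, a, a, …> counts once, which is one possible reading of the counting rule; `count_self_loops` is a hypothetical helper name.

```python
from collections import Counter


def count_self_loops(traces):
    """Count, per node, the segments in which the node directly follows itself."""
    counts = Counter()
    for trace in traces:
        # Compare every element with its direct successor.
        for current, following in zip(trace, trace[1:]):
            if current == following:
                counts[current] += 1
    return counts


# Hypothetical traces for illustration.
traces = [("a", "a", "b"), ("a", "b", "b", "b")]
self_loops = count_self_loops(traces)
print(self_loops["a"], self_loops["b"])
```

Under this reading a run <b, b, b> contributes two self-loop segments for node b; counting at most one self-loop per track would be an equally plausible alternative.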
For example, in some embodiments of the present application, the process by which the first sub-step identifies short loops and the number of short loops includes: if a track contains a segment <…, a, b, a, …> that goes from node a to node b and back from node b to node a (that is, the track contains a first segment from node a to node b and a second segment from node b to node a, and the two segments are adjacent in the track), confirming that a short loop from node a to node b exists; and counting the short loops from node a to node b in all the tracks to obtain the number of short loops from node a to node b. Some embodiments of the present application thus provide a method of determining from a track whether a short loop exists.
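Short-loop counting as described above can be sketched with a sliding window of three nodes. The sketch assumes that tracks are tuples of node labels, that each occurrence of <…, a, b, a, …> counts once, and that a run such as <a, a, a> is treated as a self-loop rather than a short loop; `count_short_loops` is a hypothetical helper name.

```python
from collections import Counter


def count_short_loops(traces):
    """Count segments <x, y, x> with x != y, keyed by the ordered pair (x, y)."""
    counts = Counter()
    for trace in traces:
        # Slide a window of three consecutive nodes over the trace.
        for first, middle, last in zip(trace, trace[1:], trace[2:]):
            if first == last and first != middle:
                counts[(first, middle)] += 1
    return counts


# Hypothetical traces for illustration.
traces = [("a", "b", "a", "c"), ("c", "b", "a", "b")]
short_loops = count_short_loops(traces)
print(short_loops[("a", "b")], short_loops[("b", "a")])
```

The key `("a", "b")` corresponds to the segment <…, a, b, a, …> and the key `("b", "a")` to <…, b, a, b, …>, so the two directions are counted separately.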
For example, in some embodiments of the present application, the process by which the first sub-step identifies parallel interleaving relationships and their coefficients includes: if a segment <…, a, b, …> from a node a to a node b and a segment <…, b, a, …> from the node b to the node a exist in one or more tracks, confirming that a parallel interleaving relationship exists between the node a and the node b; and counting the total number of segments from the node a to the node b to obtain a first cross correlation coefficient, and counting the total number of segments from the node b to the node a to obtain a second cross correlation coefficient.
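The two coefficients |a → b| and |b → a| and the detection of interleaved pairs can be sketched as follows. The helper names `directly_follows_counts` and `interleaved_pairs` are hypothetical, tracks are assumed to be tuples of string labels, and every occurrence of a segment is counted.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count every segment <x, y> of two adjacent nodes over all traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def interleaved_pairs(counts):
    """Return the unordered pairs {a, b} with segments in both directions."""
    return {
        (a, b)
        for (a, b) in counts
        # a < b reports each pair once and skips self-loops (a == b).
        if a < b and counts.get((b, a), 0) > 0
    }


# Hypothetical traces: three cases <a, b, c, d> and two cases <a, c, b, d>.
traces = [("a", "b", "c", "d")] * 3 + [("a", "c", "b", "d")] * 2
follows = directly_follows_counts(traces)
pairs = interleaved_pairs(follows)
print(follows[("b", "c")], follows[("c", "b")], pairs)
```

Here |b → c| is 3 and |c → b| is 2, so b and c are the only interleaved pair in the sample.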
And a second sub-step of searching the target node pairs in all the tracks according to the attribute information.
For example, in some embodiments of the present application, the second sub-step illustratively comprises: if the node a and the node b simultaneously satisfy the following three formulas, confirming that the node a and the node b form a pair of the target node pair:
|a → b| > 0 and |b → a| > 0 (formula 1)
(formula 2: equation image BDA0003955713430000121, not reproduced in this text)
(formula 3: equation image BDA0003955713430000122, not reproduced in this text)
Wherein |a → b| is used to characterize the first cross correlation coefficient and |b → a| the second cross correlation coefficient; the first number of short loops (its symbol is equation image BDA0003955713430000123) is the total number of segments <…, a, b, a, …> present in all tracks; the second number of short loops (its symbol is equation image BDA0003955713430000124) is the total number of segments <…, b, a, b, …> present in all tracks; |b → b| is used to characterize a first number of self-loops, which is the number of self-loops of node b; and |a → a| is used to characterize a second number of self-loops, which is the number of self-loops of node a.
Some embodiments of the present application provide a method of screening target node pairs from trace data in an event log.
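The screening of one node pair can be sketched as follows. Formulas 2 and 3 are equation images that are not reproduced in this text, so the sketch rests on loudly stated assumptions: the second check follows the prose description of candidate pairs (no short loop between the nodes, or at least one node with a self-loop), and the third check is an illustrative balance measure, the relative difference between |a → b| and |b → a| compared against a hypothetical threshold `epsilon`. Neither the measure nor the value 0.3 comes from the application.

```python
def is_target_pair(a, b, follows, short_loops, self_loops, epsilon=0.3):
    """Decide whether nodes a and b form a target node pair (assumed reading)."""
    ab = follows.get((a, b), 0)
    ba = follows.get((b, a), 0)
    # Formula 1: segments exist in both directions.
    if ab == 0 or ba == 0:
        return False
    # Assumed reading of formula 2: no short loop between a and b,
    # or at least one of the two nodes has a self-loop.
    no_short_loop = short_loops.get((a, b), 0) + short_loops.get((b, a), 0) == 0
    has_self_loop = self_loops.get(a, 0) + self_loops.get(b, 0) > 0
    if not (no_short_loop or has_self_loop):
        return False
    # Assumed stand-in for formula 3: the two directions must be balanced.
    return abs(ab - ba) / (ab + ba) < epsilon


# Hypothetical counts for illustration.
follows = {("b", "c"): 3, ("c", "b"): 2}
balanced = is_target_pair("b", "c", follows, {}, {})
short_looped = is_target_pair("b", "c", follows, {("b", "c"): 1}, {})
with_self_loop = is_target_pair("b", "c", follows, {("b", "c"): 1}, {"b": 1})
skewed = is_target_pair("b", "c", {("b", "c"): 9, ("c", "b"): 1}, {}, {})
print(balanced, short_looped, with_self_loop, skewed)
```

The four calls exercise one check each: a balanced pair without loops is accepted, a short loop alone rejects the pair, a self-loop on one node retains it, and a strongly one-sided pair is rejected by the balance check.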
The method for flow mining analysis provided by the embodiment of the application is exemplarily described below with reference to fig. 3.
S201, reading the log file.
S202, generating a direct following graph.
The direct following graph DFG is constructed as follows: for each trace, each activity is taken as a node, and a link (edge) is created between activities whose timestamps are adjacent, so that the resulting network forms the direct following graph DFG.
For example, the generated direct-follow graph is shown in fig. 4, five nodes a, b, c1, c2 and d are included in the flow model, and these nodes are also connected by edges, and the numbers on the edges in fig. 4 are used to characterize the number of flow instances containing the segment, i.e., the case number.
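The construction in S202 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `build_dfg` is a hypothetical helper, traces are assumed to be tuples of activity labels, and each edge weight is taken to be the number of cases that contain the segment, following the description of the numbers on the edges.

```python
from collections import Counter


def build_dfg(traces):
    """Return the node set and the edge weights of a direct following graph."""
    nodes = set()
    edges = Counter()
    for trace in traces:
        nodes.update(trace)
        # Use a set so that an edge repeated inside one trace
        # is counted once for that case.
        edges.update(set(zip(trace, trace[1:])))
    return nodes, edges


# Hypothetical traces for illustration.
traces = [("a", "b", "a", "b", "d"), ("a", "b", "d")]
nodes, edges = build_dfg(traces)
print(sorted(nodes), edges[("a", "b")], edges[("b", "a")], edges[("b", "d")])
```

In the first trace the segment <a, b> occurs twice, but the edge a → b still has weight 2 because only two cases contain it.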
S203, obtaining the track data according to the log file, and identifying self-loop, short-loop and parallel interleaving relations according to the track data.
Firstly, a trace set is constructed according to the log file, and the case number of each trace is counted.
The log entries in the log file are sorted by process instance number (caseID), and the log entries with the same caseID form one trace, which yields the trace data. Within each trace, the entries are sorted by timestamp to obtain the activity sequence; traces with the same activity sequence represent the same execution path.
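Counting the cases of each trace can be sketched as follows, assuming the traces are already available as a mapping from caseID to activity sequence. `count_trace_cases` is a hypothetical helper name, and the sample data is invented.

```python
from collections import Counter


def count_trace_cases(traces_by_case):
    """Count how many cases share each distinct activity sequence."""
    return Counter(tuple(trace) for trace in traces_by_case.values())


# Hypothetical traces keyed by caseID.
traces_by_case = {
    1: ("a", "b", "c", "d"),
    2: ("a", "c", "b", "d"),
    3: ("a", "b", "c", "d"),
}
trace_set = count_trace_cases(traces_by_case)
print(trace_set[("a", "b", "c", "d")], trace_set[("a", "c", "b", "d")])
```

Cases 1 and 3 follow the same execution path, so the trace set holds two distinct traces with case counts 2 and 1.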
And secondly, searching and counting self-loop, short-loop, parallel staggered relation and quantity in a trace set corresponding to the track data.
As shown in fig. 4, the self-loop relationship self(a) means that, among all the traces, there is an edge pointing from node a to itself (denoted a → a or self(a)); in a trace, a → a appears as <…,a,a,…>.
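A possible way to count self-loops is sketched below. It counts every occurrence of the segment <…,a,a,…>; whether the count should be per occurrence or per case is not stated here, so per-occurrence counting is an assumption, as are the function name and the sample traces.

```python
from collections import Counter


def count_self_loops(traces):
    """Return {a: number of occurrences of a directly followed by a}."""
    counts = Counter()
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            if x == y:
                counts[x] += 1
    return counts


# Hypothetical traces.
self_loops = count_self_loops([["a", "a", "b"], ["a", "b", "b", "b"]])
print(self_loops)
```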
As shown in fig. 5, the short-loop relationship short(a, b) is defined for two given nodes a and b: when |a → a| = 0 and the short-loop condition between a and b is met, a is said to form a short-loop relationship with b. The short-loop relation from a to b means that, in some trace, there is an edge from node a to node b, and that occurrence of node b in turn has an edge back to node a. In a trace, this relation appears as <…,a,b,a,…>. Correspondingly, the short-loop relation from b to a appears as <…,b,a,b,…>.
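The segment patterns <…,a,b,a,…> and <…,b,a,b,…> can be counted with a sliding window of three activities, as in the sketch below. The function name and the sample trace are illustrative assumptions, and the additional requirement |a → a| = 0 from the definition is not checked here.

```python
from collections import Counter


def count_short_loops(traces):
    """Return {(a, b): number of occurrences of the segment a, b, a with a != b}."""
    counts = Counter()
    for trace in traces:
        for x, y, z in zip(trace, trace[1:], trace[2:]):
            if x == z and x != y:
                counts[(x, y)] += 1
    return counts


# Hypothetical trace containing both a,b,a and b,a,b.
short_loops = count_short_loops([["a", "b", "a", "b", "c"]])
print(short_loops)
```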
The parallel interleaving relationship (a, b) means that a segment from node a to node b, denoted a → b, and a segment from node b to node a, denoted b → a, both exist in the traces. When a bidirectional edge exists between a and b in the direct following graph DFG, a parallel interleaving relationship is said to exist between the two nodes. It will be appreciated that the parallel interleaving relationship includes the short-loop case.
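The following sketch lists the node pairs that have a bidirectional edge, together with the two segment counts |a → b| and |b → a|. The function name and sample traces are assumptions; the third trace shows that a short loop is also reported as a parallel interleaving relationship, as noted above.

```python
from collections import Counter


def interleaved_pairs(traces):
    """Return {(a, b): (|a -> b|, |b -> a|)} for pairs connected in both directions."""
    follows = Counter()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))
    pairs = {}
    for (x, y), n_xy in follows.items():
        # x < y reports each unordered pair once and skips self-loops.
        if x < y and (y, x) in follows:
            pairs[(x, y)] = (n_xy, follows[(y, x)])
    return pairs


# Hypothetical traces; the last one is a short loop between x and y.
pairs = interleaved_pairs([
    ["a", "c1", "c2", "d"],
    ["a", "c2", "c1", "d"],
    ["x", "y", "x"],
])
print(pairs)
```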
S204, screening the target node pairs according to the information identified in S203.
Executing S204 amounts to searching for parallel relationships in the trace set.
Given the DFG and two nodes a and b, the nodes a and b are said to be in a parallel relationship, denoted a || b, when the following three conditions are all satisfied.
|a → b| > 0 and |b → a| > 0 (formula 1)
[formula image: condition on the short-loop counts between a and b, combined with the self-loop counts of a and b] (formula 2)
[formula image: condition on the proportion between |a → b| and |b → a|] (formula 3)
The meanings of the parameters in the above three formulas are as described above and are not repeated here. Formula 1 means that the traces contained in the log include segments from node a to node b as well as segments from node b to node a, which is the basic condition for a parallel relationship. Formula 2 reflects the fact that two nodes satisfying formula 1 are not necessarily in a parallel relationship: formula 2 excludes the two nodes of a short loop from the candidate target node pairs (i.e., it prevents a short loop from being recognized as a parallel relationship), while retaining in the candidate target node pairs those parallel interleaving relationships in which at least one of the two nodes has a self-loop. Formula 3 controls the proportion between the segments from node a to node b and the segments from node b to node a.
That is, in some embodiments of the present application, node a and node b are in a parallel relationship when both <…,a,b,…> and <…,b,a,…> exist in one or more traces and <…,a,b,a,…> does not exist.
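A hedged sketch of the screening for candidate target node pairs is given below. It applies formula 1 and one reading of formula 2: the pair is kept if no segment <…,a,b,a,…> or <…,b,a,b,…> occurs, or if |a → a| + |b → b| > 0. Writing the short-loop condition as "both counts sum to zero" is an assumption, and the proportion control of formula 3 is not included. The function name and sample traces are hypothetical.

```python
from collections import Counter


def candidate_target_pairs(traces):
    """Return the set of pairs (a, b), a < b, passing formula 1 and the assumed formula 2."""
    follows = Counter()  # |x -> y|: occurrences of y directly after x
    short = Counter()    # occurrences of the segment x, y, x with x != y
    for trace in traces:
        follows.update(zip(trace, trace[1:]))
        for x, y, z in zip(trace, trace[1:], trace[2:]):
            if x == z and x != y:
                short[(x, y)] += 1

    candidates = set()
    for a, b in list(follows):
        if a >= b:
            continue  # skip self-loops and visit each unordered pair once
        interleaved = follows[(a, b)] > 0 and follows[(b, a)] > 0  # formula 1
        no_short_loop = short[(a, b)] + short[(b, a)] == 0  # assumed form
        has_self_loop = follows[(a, a)] + follows[(b, b)] > 0
        if interleaved and (no_short_loop or has_self_loop):
            candidates.add((a, b))
    return candidates


# Hypothetical traces: c1/c2 interleave, x/y form a short loop,
# p/q form a short-loop pattern but p also has a self-loop.
candidates = candidate_target_pairs([
    ["a", "c1", "c2", "d"],
    ["a", "c2", "c1", "d"],
    ["x", "y", "x", "z"],
    ["p", "q", "p", "p"],
])
print(candidates)
```

In this sketch the pair (x, y) is rejected as a short loop, while (p, q) is retained because node p has a self-loop.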
It can be understood that, in the DFG of fig. 6, the parallel relationship appears as the bidirectional edge between c1 and c2. Because this bidirectional edge does not reflect a control-flow relationship between c1 and c2 but only the order in which they happened to occur, in some embodiments of the present application it should be deleted; otherwise, when the DFG is large, the relationships in the process become difficult to understand. The embodiments of the present application can effectively simplify the DFG by deleting such parallel interleaving edges. That is, c1 and c2 of fig. 4 may be identified as a target node pair by performing an embodiment of the present application.
S205, deleting the bidirectional edges between the target node pairs in the direct following graph to obtain the target direct following graph.
Taking fig. 6 as an example, the above steps determine that the c1 node and the c2 node of fig. 6 constitute a target node pair, so the target direct following graph shown in fig. 7 is obtained after the deletion operation of S205 is performed. Comparing fig. 6 with fig. 7 shows that fig. 7 is a simplified version of the direct following graph of fig. 6.
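The deletion operation of S205 can be sketched as removing both directed edges of every target node pair from an edge dictionary. The function name, the dictionary representation of the DFG and the edge counts are illustrative assumptions and do not reproduce the actual numbers of fig. 6.

```python
def remove_bidirectional_edges(dfg, target_pairs):
    """Return a copy of dfg without the edges a -> b and b -> a of each target pair."""
    simplified = dict(dfg)  # leave the original graph untouched
    for a, b in target_pairs:
        simplified.pop((a, b), None)
        simplified.pop((b, a), None)
    return simplified


# Hypothetical DFG with a bidirectional edge between c1 and c2.
dfg = {
    ("b", "c1"): 1, ("b", "c2"): 1,
    ("c1", "c2"): 1, ("c2", "c1"): 1,
    ("c1", "d"): 1, ("c2", "d"): 1,
}
target_dfg = remove_bidirectional_edges(dfg, [("c1", "c2")])
print(sorted(target_dfg))
```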
S206, outputting the target direct following graph for further process mining analysis.
Referring to fig. 8, fig. 8 shows an apparatus for process mining analysis provided in an embodiment of the present application. It should be understood that the apparatus corresponds to the method embodiment of fig. 2 described above and is capable of performing the steps of that method embodiment; for the specific functions of the apparatus, reference may be made to the description above, and a detailed description is omitted here to avoid redundancy. The apparatus includes at least one software functional module that can be stored in a memory in the form of software or firmware, or built into the operating system of the apparatus. The apparatus for process mining analysis includes: a target direct following graph obtaining module 301 and a process mining analysis module 302.
The target direct following graph obtaining module 301 is configured to delete a bidirectional edge between a target node pair on the direct following graph, so as to obtain a target direct following graph, where the target node pair includes two nodes belonging to a parallel relationship.
The process mining analysis module 302 is configured to perform process mining analysis according to the target direct following graph.
According to some embodiments of the application, the direct following graph is simplified by deleting the bidirectional edge between two nodes in a parallel relationship, which effectively reduces the complexity of the direct following graph and improves the results of process mining or process optimization based on it.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and redundant description is not repeated here.
Some embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, may implement the method of any of the embodiments included in the method of flow mining analysis according to the first aspect described above.
As shown in fig. 9, some embodiments of the present application provide an electronic device 500, where the electronic device 500 includes a memory 510, a processor 520, and a computer program stored on the memory 510 and executable on the processor 520, and when the processor 520 reads the program through a bus 530 and executes the program, the method according to any of the embodiments included in the method for flow mining analysis described above can be implemented.
Processor 520 may process digital signals and may include various computing structures, such as a complex instruction set computer architecture, a reduced instruction set computer architecture, or an architecture that implements a combination of multiple instruction sets. In some examples, processor 520 may be a microprocessor.
Memory 510 may be used to store instructions that are executed by processor 520 or data related to the execution of the instructions. The instructions and/or data may include code for performing some or all of the functions of one or more of the modules described in embodiments of the application. The processor 520 of the disclosed embodiments may be used to execute instructions in the memory 510 to implement the method shown in fig. 2. Memory 510 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program code.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

Claims (12)

1. A method of process mining analysis, the method comprising:
deleting a bidirectional edge between a target node pair on the direct following graph to obtain a target direct following graph, wherein the target node pair comprises two nodes belonging to a parallel relation;
and performing process mining analysis according to the target direct following graph.
2. The method of claim 1, wherein prior to said deleting a bidirectional edge between a target node pair on the direct following graph, the method further comprises:
if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and that no short loop exists between the first node and the second node, determining that the first node and the second node form a candidate target node pair; or, if it is determined from the trace data that a parallel interleaving relationship exists between a first node and a second node and that at least one of the first node and the second node has a self-loop, determining that the first node and the second node form a candidate target node pair;
selecting at least a portion of node pairs from the candidate target node pairs as the target node pairs.
3. The method of claim 2,
the parallel interleaving relationship is used for representing that a first segment from a third node to a fourth node and a second segment from the fourth node to the third node exist in one or more traces, and, if the first segment and the second segment are in the same trace, the first segment and the second segment are adjacent segments or non-adjacent segments;
the self-loop is used for representing that a segment from a node to itself exists in a trace;
the short loop is used for representing that a first segment from a fifth node to a sixth node and a second segment from the sixth node to the fifth node exist in one trace, and the first segment and the second segment are adjacent segments.
4. The method of claim 1, wherein prior to the deleting a bidirectional edge between a target node pair on the direct following graph, the method further comprises:
obtaining all traces according to the event log;
identifying all target node pairs through all the traces.
5. The method of claim 4,
identifying all target node pairs through all the traces includes:
searching for and counting attribute information for identifying the parallel relationship in all the traces, wherein the attribute information comprises the self-loops, short loops and parallel interleaving relationships existing in each trace, together with the number of self-loops, the number of short loops and the number of parallel interleaving relationships;
and searching for the target node pairs in all the traces according to the attribute information.
6. The method of claim 5, wherein the searching for and counting of attribute information for identifying parallel relationships in all the traces comprises:
if a segment <…,a,a,…> from a node a to itself exists in a trace, confirming that a self-loop of the node a exists;
and counting the self-loops of the node a in all the traces to obtain the number of self-loops of the node a.
7. The method of claim 6, wherein the searching for and counting of attribute information for identifying parallel relationships in all the traces comprises:
if a segment <…,a,b,a,…> that returns to the node a from the node a through a node b exists in a trace, confirming that a short loop between the node a and the node b exists;
and counting the short loops from the node a to the node b in all the traces to obtain the number of short loops from the node a to the node b.
8. The method of claim 7, wherein the searching for and counting of attribute information for identifying parallel relationships in all the traces comprises:
if a segment <…,a,b,…> from a node a to a node b exists and a segment <…,b,a,…> from the node b to the node a exists in one or more traces, confirming that a parallel interleaving relationship exists between the node a and the node b;
and counting the total number of segments from the node a to the node b to obtain a first interleaving relationship count, and counting the total number of segments from the node b to the node a to obtain a second interleaving relationship count.
9. The method of claim 8, wherein said searching for said target node pairs in all the traces according to said attribute information comprises:
if the node a and the node b simultaneously satisfy the following three formulas, confirming that the node a and the node b form one of the target node pairs:
|a → b| > 0 and |b → a| > 0 (formula 1)
[formula image: condition on the first and second numbers of short loops] or |a → a| + |b → b| > 0 (formula 2)
[formula image] (formula 3)
wherein |a → b| is used to characterize the first interleaving relationship count and |b → a| is used to characterize the second interleaving relationship count; the first number of short loops is the total number of segments <…,a,b,a,…> present in all the traces; the second number of short loops is the total number of segments <…,b,a,b,…> present in all the traces; |b → b| is used to characterize a first number of self-loops, which is the number of self-loops of node b; and |a → a| is used to characterize a second number of self-loops, which is the number of self-loops of node a.
10. The method of claim 1, wherein said performing a process mining analysis based on said target direct follow-up graph comprises:
and diagnosing or optimizing the business process according to the target direct following graph.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 10.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program is adapted to implement the method of any of claims 1-10.
Application number: CN202211464343.4A
Priority date: 2022-11-22
Filing date: 2022-11-22
Title: Method, medium and electronic device for process mining analysis
Publication number: CN115907443A (en)
Publication date: 2023-04-04
Legal status: Pending
Family ID: 86484393
Country: CN


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination