CN117112644A - Anomaly discovery method and device based on process mining - Google Patents


Info

Publication number
CN117112644A
Authority
CN
China
Prior art keywords
cost
log data
cost function
search
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310890540.0A
Other languages
Chinese (zh)
Inventor
金涛
孙沛瑜
王建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202310890540.0A
Publication of CN117112644A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an anomaly discovery method and device based on process mining. The method comprises the following steps: acquiring log data to be processed, wherein the log data to be processed comprises at least log data to be mined and log data to be detected; computing a process model from the log data to be mined; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model, wherein the cost function comprises a transition cost function and an event cost function; and performing beam-search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding anomalies in the log data to be processed according to the target alignment scheme. The invention accelerates process consistency detection by using beam search and improves the accuracy of the alignment calculation by using the cost function, so that the detection is both faster and more accurate.

Description

Anomaly discovery method and device based on process mining
Technical Field
The invention relates to the technical field of computer data analysis, in particular to an anomaly discovery method and device based on process mining.
Background
The operation and maintenance of an industrial system comprises important tasks such as personalized evaluation of equipment states, anomaly detection, fault diagnosis and fault prediction. With the rapid development of information technology, industrial systems and information technology are becoming increasingly tightly integrated. At present, sensors are widely used in industry to monitor the state indicators of an industrial system at regular intervals, and process mining technology is then generally used to analyze the sensor data so as to find anomalies in the industrial system, thereby providing strong support for intelligent operation and production safety control.
Process mining technology mainly consists of the following: a process model that can represent the original log data is computed from that data by a model mining method, and process consistency detection is then performed between the process model and new log data obtained from the sensors, thereby finding anomalies. The purpose of anomaly discovery based on process mining is to find and locate anomalies occurring in real events, so that their causes can be analyzed and the process model corrected, and the industrial system thereby optimized and improved.
In order to find the position where an anomaly occurs, an alignment-based process consistency detection method is generally used. However, this method computes the alignment scheme by search, and because of the complexity of the process model the number of states that must be explored during the search is very large, so the search takes considerable time. How to accelerate the search for an alignment scheme has therefore been a focus of research in recent years.
Many solutions have been proposed for accelerating process consistency detection, such as methods based on sampling and estimation, methods based on subset selection and edit distance, methods based on a trie (dictionary tree), methods based on the earth mover's distance (EMD), and so on. These methods each have merits and drawbacks, but they share the problem that efficiency and accuracy cannot both be achieved. For example, some algorithms can only calculate a process consistency metric for the process model and the log data as a whole, and cannot calculate the corresponding anomalies and their positions for each event log; some algorithms need to compute all event sequences that the process model can represent, so the efficiency gain is limited; and some estimation-based algorithms can only obtain a range for the metric rather than a specific value.
Disclosure of Invention
The invention provides an anomaly discovery method and device based on process mining, which address the shortcoming of the prior art that efficiency and accuracy cannot both be achieved, and which maintain high accuracy while accelerating the process consistency detection algorithm.
The invention provides an anomaly discovery method based on process mining, which comprises the following steps:
Acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected;
calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result;
calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function;
and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
According to the anomaly discovery method based on process mining provided by the invention, according to the sampling result and the process model, the cost is calculated by using a cost function, and the anomaly discovery method concretely comprises the following steps:
according to the sampling result and the process model, calculating to obtain a basic alignment scheme by using an alignment-based process consistency detection method;
counting the number of each event synchronization step, model step and log step in the basic alignment scheme to obtain a statistical result;
and calculating the cost by using a cost function according to the statistical result.
According to the anomaly discovery method based on process mining provided by the invention, the cost is calculated by using a cost function according to the statistical result, and the anomaly discovery method concretely comprises the following steps:
calculating the transition cost of each transition in the basic alignment scheme by utilizing a transition cost function according to the statistical result;
wherein the transition cost function comprises:
where Cost(transition) is the transition cost function; t_cnt represents the sum of the numbers of synchronization steps and model steps; skip represents the number of model steps; totnum represents the sum of the numbers of synchronization steps, model steps and log steps;
calculating the event cost of each event in the basic alignment scheme by using an event cost function according to the statistical result;
wherein the event cost function comprises:
where Cost(event) is the event cost function; totnum represents the sum of the numbers of synchronization steps, model steps and log steps; insert represents the number of log steps; e_cnt represents the sum of the numbers of synchronization steps and log steps;
the transition costs of all transitions and the event costs of all events together constitute the cost.
According to the anomaly discovery method based on process mining provided by the invention, the process consistency detection based on the beam search is carried out on the log data to be processed according to the cost function, the process model and the cost, so as to obtain a target alignment scheme, which comprises the following steps:
S1: extracting an event sequence from the log data to be processed, constructing a search queue, and adding an initial state pre-constructed according to the event sequence to the search queue;
S2: extracting from the search queue a preset beam-width number of search states with the lowest cost, and deleting the search states that are not extracted and have the same cost as the extracted search states;
S3: expanding each extracted search state by using the process model, enumerating in turn each enabled (fireable) transition of the search state, and calculating the successor search states reached by a model step, a log step and a synchronization step, to obtain a set of possible states;
S4: calculating the cost of each successor search state in the set of possible states by using the cost, and adding the successor search state and its cost to the search queue;
S5: repeating steps S2-S4 until the termination state is found or the maximum number of search iterations is reached, to obtain a target search queue;
S6: traversing all event sequences in the log data to be processed, all target search queues together forming the target alignment scheme.
According to the anomaly discovery method based on process mining provided by the invention, the log data to be detected is sampled to obtain a sampling result, and the method specifically comprises the following steps:
Randomly sampling the log data to be detected or sampling according to the occurrence frequency to obtain a sampling result;
wherein the frequency of occurrence is a frequency of occurrence of a sequence of events included in the log data to be detected.
According to the anomaly discovery method based on process mining provided by the invention, the log data to be detected is sampled to obtain a sampling result, and the method specifically comprises the following steps:
and clustering the event sequences in the log data to be detected, and selecting a clustering center as a sampling result.
The invention also provides an abnormality discovery device based on process mining, comprising:
the acquisition unit is used for acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected;
the model and sampling unit is used for calculating to obtain a process model according to the log data to be mined; sampling the log data to be detected to obtain a sampling result;
the calculating unit is used for calculating cost by utilizing a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function;
and the detection unit is used for carrying out process consistency detection based on beam search on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
The invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the abnormality discovery method based on process mining when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a process mining based anomaly discovery method as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which when executed by a processor implements a process mining based anomaly discovery method as described in any one of the above.
According to the anomaly discovery method and device based on process mining provided by the invention, log data to be processed is acquired, the log data to be processed comprising at least log data to be mined and log data to be detected; a process model is computed from the log data to be mined; the log data to be detected is sampled to obtain a sampling result; a cost is calculated by using a cost function according to the sampling result and the process model, the cost function comprising a transition cost function and an event cost function; and beam-search-based process consistency detection is performed on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and anomalies in the log data to be processed are found according to the target alignment scheme. The invention accelerates process consistency detection by using beam search and improves the accuracy of the alignment calculation by using the improved cost function, so that a more accurate alignment scheme is obtained in less time and anomalies are found more quickly and accurately.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a first schematic flow chart of the anomaly discovery method based on process mining provided by the present invention;
FIG. 2 is a second schematic flow chart of the anomaly discovery method based on process mining provided by the present invention;
FIG. 3 is a schematic structural diagram of the anomaly discovery device based on process mining provided by the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Reference numerals:
310: acquisition unit; 320: model and sampling unit; 330: calculation unit; 340: detection unit;
410: processor; 420: communication interface; 430: memory; 440: communication bus.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The anomaly discovery method based on process mining of the present invention is described below with reference to FIGS. 1-2. FIG. 1 is a first flow chart of the anomaly discovery method based on process mining provided by the present invention. As shown in FIG. 1, the method includes the following steps:
step 110: and acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected.
It should be noted that the log data to be processed is event log data from the operation and maintenance of an industrial system. It is composed of event sequences obtained from terminals such as sensors, and each event sequence records which events occurred, in order, during the real operation of the industrial system.
Based on the log data to be processed, the invention aims to find anomalies in industrial processes. An anomaly is an event at which the process deviates from its expected behavior, and generally refers to the various abnormal conditions in industrial production. Anomalies typically cause a mismatch between the process model and the event sequences in the log, and this mismatch can be discovered by computing an alignment scheme. It should be noted that a single event sequence may contain one or more anomalies.
Further, according to its data flow, the log data to be processed is divided into log data to be mined, which is used for mining a process model, and log data to be detected, which is used for process consistency detection. It should be appreciated that a process model is an abstraction of a real process in mathematical language. Processes are now generally described using a Petri net, which consists of places, transitions, tokens and arcs; the state of the net (its marking) is represented by the distribution of tokens over the places.
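As an illustration of the Petri net notions above, the following is a minimal sketch and not the implementation of the invention. The class name `PetriNet`, the place and transition names, and the choice of a `Counter` to hold the marking are all assumptions made for this example.

```python
from collections import Counter


class PetriNet:
    """Minimal Petri net sketch: a transition consumes tokens from its
    input places and produces tokens in its output places."""

    def __init__(self, transitions):
        # transitions: name -> (list of input places, list of output places)
        self.transitions = transitions

    def enabled(self, marking):
        """Names of transitions whose input places hold enough tokens."""
        return [
            name
            for name, (ins, _) in self.transitions.items()
            if all(marking[p] >= n for p, n in Counter(ins).items())
        ]

    def fire(self, marking, name):
        """Return the marking reached by firing an enabled transition."""
        ins, outs = self.transitions[name]
        new = Counter(marking)
        new.subtract(Counter(ins))
        new.update(Counter(outs))
        return +new  # unary plus drops places left with zero tokens


# Hypothetical two-step process: start -> (a) -> p1 -> (b) -> end
net = PetriNet({"a": (["start"], ["p1"]), "b": (["p1"], ["end"])})
m0 = Counter({"start": 1})   # initial marking: one token in "start"
m1 = net.fire(m0, "a")       # marking after firing transition "a"
```

Here the marking is the distribution of tokens over places, so firing a transition is simply a change of that distribution.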
It should be noted that the distinction between log data to be mined and log data to be detected is determined at input time. That is, when the log data to be processed is acquired, it has already been decided which log files are used for model mining and which are used for process consistency detection. In some embodiments, the log files before a given point in time are taken as the log data to be mined, and the log files after that time are taken as the log data to be detected.
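The time-based split mentioned in the embodiment can be sketched as follows. This is only an illustration: the function name `split_log`, the representation of each log entry as a (timestamp, event sequence) pair, and the treatment of an entry exactly at the cutoff as "to be detected" are assumptions.

```python
from datetime import datetime


def split_log(traces, cutoff):
    """Split (timestamp, event_sequence) pairs at a cutoff time.

    Entries before the cutoff are used for model mining; entries at or
    after the cutoff are used for process consistency detection.
    """
    to_mine = [seq for ts, seq in traces if ts < cutoff]
    to_detect = [seq for ts, seq in traces if ts >= cutoff]
    return to_mine, to_detect


# Hypothetical log with one trace on each side of the cutoff
log = [
    (datetime(2023, 1, 1), ["a", "b"]),
    (datetime(2023, 6, 1), ["a", "c"]),
]
mined, detected = split_log(log, datetime(2023, 3, 1))
```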
Step 120: calculating according to the log data to be mined to obtain a process model; and sampling the log data to be detected to obtain a sampling result.
No fixed order is required between the process model mining step and the sampling step. In some embodiments, process model mining is performed first, followed by sampling; it is also possible to sample first and then mine the model.
Specifically, a process mining method is applied to the event log data designated for model mining. Process mining here refers to a method that takes event log data as input and outputs a process model that can represent that event log data. An existing model mining algorithm may be selected to mine the log data to be mined. In practice, the model mining algorithm may be, for example, inductive mining, and the invention is not limited in this respect.
In addition, the log records that need to undergo process consistency detection are sampled. The event sequences are sampled by random sampling, by sampling based on occurrence frequency, or by clustering, to obtain a sampling result.
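Two of the sampling options can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and "sampling based on occurrence frequency" is interpreted here as keeping the most frequent distinct event sequences, which is only one possible reading of the text.

```python
import random
from collections import Counter


def sample_random(traces, k, seed=0):
    """Draw up to k event sequences uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(traces, min(k, len(traces)))


def sample_by_frequency(traces, k):
    """Keep the k most frequent distinct event sequences
    (an assumed interpretation of frequency-based sampling)."""
    counts = Counter(tuple(t) for t in traces)
    return [list(seq) for seq, _ in counts.most_common(k)]


# Hypothetical log data to be detected
traces = [["a", "b"], ["a", "b"], ["a", "c"], ["b"], ["a", "b"], ["a", "c"]]
frequent = sample_by_frequency(traces, 2)
randomly_chosen = sample_random(traces, 2)
```

Frequency-based selection favors the dominant behavior in the log, whereas random sampling preserves rare sequences with a probability proportional to how often they occur.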
It should be noted that the process consistency detection refers to a method of taking a process model and log data as inputs and discovering anomalies in the process model and log data as outputs. The log record that needs to be subjected to process consistency detection refers to log data to be detected.
Step 130: calculating a cost by using a cost function according to the sampling result and the process model; the cost functions include a transition cost function and an event cost function.
Specifically, the cost is calculated from the sampling result by using a preset cost function. It should be noted that, before the cost is calculated, the process model is used to perform process consistency detection on the sampling result to obtain a basic alignment scheme; the numbers of alignment steps in the basic alignment scheme are then passed through the cost function to calculate the cost. The cost comprises the cost corresponding to each transition and the cost corresponding to each event in the log data. For ease of distinction, the cost corresponding to a transition is called the transition cost, and the cost corresponding to an event is called the event cost. In practice, the cost covers every alignment step that may be taken during the alignment process, and it is used to calculate the cost of each search state and of the target alignment scheme in the subsequent beam-search-based process consistency detection.
Step 140: and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
Specifically, using the cost calculated in step 130, process consistency detection incorporating the beam search method is performed on the process model and all log data to be processed to obtain a target alignment scheme, and anomalies are found according to the target alignment scheme.
In some embodiments, all event sequences in log data to be mined are sampled in turn, the cost is calculated, and the process consistency detection based on the beam search is performed according to a cost function and a process model, so that an alignment scheme is obtained.
It should be noted that the basic principle of the beam search method is to divide the search into stages and to expand only some of the search states in each stage, so that the search is accelerated by reducing the number of states traversed. The method was originally proposed to accelerate search in general, and it can also be used to accelerate the search for an alignment scheme between a process model and an event sequence. In essence, beam search trades some search accuracy for higher search speed.
The process consistency detection part introduces optimizations such as the beam search method and the improved cost function on top of an alignment-based process consistency detection algorithm. The process consistency detection algorithm is thereby accelerated while high accuracy is maintained, and a better alignment scheme is obtained.
Based on the above embodiment, in the method, according to the sampling result and the process model, a cost function is used to calculate a cost, which specifically includes:
step 210: and calculating to obtain a basic alignment scheme by using an alignment-based process consistency detection method according to the sampling result and the process model.
Specifically, the basic alignment scheme is found by a search method. A search state is (current Petri net marking, current matching index in the event sequence); the initial state is (initial marking, 0), and the termination state is (termination marking, length of the event sequence). The search expands the states for the sampling result based on the process model: when a search state is expanded, each enabled (fireable) transition is enumerated in turn, and finally the possibility of a log step is enumerated, so that a basic alignment scheme is obtained.
Step 220: and counting the number of each event synchronization step, model step and log step in the basic alignment scheme to obtain a statistical result.
Specifically, counting how many synchronization steps, model steps and log steps are respectively carried out on each event in the basic alignment scheme, and obtaining the number of the synchronization steps, the model steps and the log steps of each event in the basic alignment scheme as a statistical result.
It should be appreciated that in an alignment scheme a legal alignment step can be written as (t, a), with t ∈ T ∪ {>>} and a ∈ A ∪ {>>}, where t and a cannot both be the skip event ">>", T is the set of all transitions in the Petri net, and A is the set of events. Such a pair (t, a) indicates either that t corresponds to a during the alignment process or that one of t and a is a skip event. If one of t and a is a skip event, the other, which is not a skip event, has found no corresponding transition or event during the alignment. To ensure that every alignment step is meaningful, that is, that it contributes to the alignment process, t and a are not allowed to be skip events at the same time. Alignment steps are classified into three types: a synchronization step, in which neither t nor a is a skip event; a log step, in which t is a skip event and a is not; and a model step, in which t is not a skip event but a is.
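The three step types described above can be distinguished mechanically, as in the following minimal sketch. The function name `step_type` and the string labels it returns are assumptions for illustration; only the skip symbol ">>" comes from the text.

```python
SKIP = ">>"  # skip event symbol used in alignment steps


def step_type(t, a):
    """Classify an alignment step (t, a) as 'sync', 'log' or 'model'."""
    if t == SKIP and a == SKIP:
        # Not a legal step: t and a cannot both be skip events.
        raise ValueError("t and a cannot both be skip events")
    if t == SKIP:
        return "log"    # event in the log with no matching transition
    if a == SKIP:
        return "model"  # transition in the model with no matching event
    return "sync"       # transition and event correspond to each other
```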
In some embodiments, in the statistical result, for a given event or transition in the basic alignment scheme, the number of occurrences in synchronization steps is denoted sync, the number of occurrences in model steps is denoted skip, and the number of occurrences in log steps is denoted insert. The statistics further include: the sum of the numbers of occurrences of synchronization steps, model steps and log steps in the basic alignment scheme, denoted totnum; the sum of the numbers of occurrences of model steps and synchronization steps, denoted t_cnt; and the sum of the numbers of occurrences of log steps and synchronization steps, denoted e_cnt.
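The counting can be sketched as follows. This is an illustration under assumptions rather than the implementation of the invention: the function name `count_steps` and the dictionary layout are hypothetical, skip and t_cnt are kept per transition, insert and e_cnt per event, and totnum is assumed to be the total number of alignment steps over all basic alignments, which the text does not state unambiguously.

```python
from collections import Counter

SKIP = ">>"


def count_steps(alignments):
    """Count step statistics over a list of alignments, each a list of
    (transition, event) steps."""
    sync_t, sync_e = Counter(), Counter()
    skip, insert = Counter(), Counter()
    totnum = 0  # assumed global: all sync, model and log steps
    for alignment in alignments:
        for t, a in alignment:
            totnum += 1
            if t == SKIP:
                insert[a] += 1   # log step
            elif a == SKIP:
                skip[t] += 1     # model step
            else:
                sync_t[t] += 1   # synchronization step, counted for
                sync_e[a] += 1   # both the transition and the event
    t_cnt = {t: sync_t[t] + skip[t] for t in set(sync_t) | set(skip)}
    e_cnt = {a: sync_e[a] + insert[a] for a in set(sync_e) | set(insert)}
    return {"skip": dict(skip), "insert": dict(insert),
            "t_cnt": t_cnt, "e_cnt": e_cnt, "totnum": totnum}


# Hypothetical basic alignment with one model step and one log step
stats = count_steps([[("t_a", "a"), ("t_b", SKIP), (SKIP, "x"), ("t_c", "c")]])
```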
Step 230: and calculating the cost by using a cost function according to the statistical result.
Specifically, the cost is calculated from the statistical result by using a preset cost function. It should be appreciated that the cost function of the present invention is an improved cost function, comprising a transition cost function for each transition in the process model and an event cost function for each event; correspondingly, the cost comprises transition costs and event costs. When the cost is calculated, the transition cost corresponding to each transition in the model and the event cost corresponding to each event in the log data are calculated separately. All transition costs and all event costs together constitute the cost. The calculated cost is used for the subsequent process consistency detection.
Based on the above embodiment, in the method, calculating the cost by using a cost function according to the statistical result specifically includes:
calculating the transition cost of each transition in the basic alignment scheme by utilizing a transition cost function according to the statistical result;
wherein the transition cost function comprises:
where Cost(transition) is the transition cost function; t_cnt represents the sum of the numbers of synchronization steps and model steps; skip represents the number of model steps; totnum represents the sum of the numbers of synchronization steps, model steps and log steps;
calculating the event cost of each event in the basic alignment scheme by using an event cost function according to the statistical result;
wherein the event cost function comprises:
where Cost(event) is the event cost function; totnum represents the sum of the numbers of synchronization steps, model steps and log steps; insert represents the number of log steps; e_cnt represents the sum of the numbers of synchronization steps and log steps;
the transition costs of all transitions and the event costs of all events together constitute the cost.
Specifically, for each transition in the basic alignment scheme, its transition cost is calculated using the transition cost function, and for each event in the basic alignment scheme, its event cost is calculated using the event cost function. The transition costs of all transitions and the event costs of all events in the basic alignment scheme together constitute the cost.
Based on the above embodiment, in the method, performing, according to the cost function, the process model and the cost, a process consistency detection based on a beam search on the log data to be processed, to obtain a target alignment scheme, which specifically includes:
S1: extracting an event sequence from the log data to be processed, constructing a search queue, and adding an initial state pre-constructed according to the event sequence to the search queue;
S2: extracting from the search queue a preset beam-width number of search states with the lowest cost, and deleting the search states that were not extracted but have the same cost as an extracted search state;
S3: expanding each extracted search state by using the process model: enumerating each excitable transition of the search state in turn, and calculating the subsequent search states reached by a model step, a log step and a synchronization step, to obtain a possible state set;
S4: calculating the cost of each subsequent search state in the possible state set by using the cost, and adding the subsequent search state and its cost to the search queue;
S5: repeating steps S2 to S4 until the termination state is reached or the maximum number of search iterations is reached, to obtain a target search queue;
S6: traversing all event sequences in the log data to be processed, all target search queues together forming the target alignment scheme.
Specifically, process consistency detection based on beam search is performed using the process model, according to the sampling result. In practice, beam search is used to find the minimum-cost alignment scheme; a search must be carried out for each event sequence, and the search queues obtained for all event sequences together form the target alignment scheme. For each event sequence in the log data to be processed, a search queue is constructed and the initial state is added to it. At each search step, the preset beam-width number of search states with the lowest cost are extracted from the search queue, and each is expanded using the process model. A search state is a pair (current Petri net marking, current matching index into the event sequence); the initial state is (initial marking, 0) and the termination state is (termination marking, length of the event sequence). For example: ({start}, 0); ({1}, 1); ({2}, 2); ({3}, 3); ({4}, 4); ({5}, 5); ({end}, 6).
When the lowest-cost search states are extracted from the search queue, the search states that are not extracted but have the same cost are deleted from the queue at the same time. When an extracted search state is expanded using the process model, each excitable (that is, enabled) transition of the search state is enumerated in turn, and the subsequent search states reached by a model step, a log step and a synchronization step are calculated to obtain the possible state set. In one embodiment, for the search state ({1}, 1), for example, two transitions can be fired next: f (reject the application) and g (accept the application and continue with the subsequent steps). The state then has three subsequent states, ({2}, 2), ({end}, 1) and ({1}, 2), corresponding respectively to firing g and adding a synchronization step; firing f and adding a model step; and firing no transition and adding a log step. These three subsequent states form the possible state set.
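The expansion of a single search state can be sketched as below. The sketch assumes a safe Petri net whose marking is a set of place names, and the `Transition` type, the `successors` function and the three-transition net fragment are hypothetical names chosen to mirror the ({1}, 1) example. It enumerates a model step for every enabled transition, so for ({1}, 1) it also yields the model step on g, which the example in the text does not list.

```python
from typing import NamedTuple

class Transition(NamedTuple):
    label: str
    inputs: frozenset    # places consumed when the transition fires
    outputs: frozenset   # places produced when the transition fires

def successors(marking, index, trace, transitions):
    """Return (step kind, label, next marking, next index) for one search state."""
    result = []
    for t in transitions:
        if t.inputs <= marking:                        # transition is enabled
            fired = (marking - t.inputs) | t.outputs
            if index < len(trace) and trace[index] == t.label:
                # Synchronization step: fire the transition and consume the event.
                result.append(("sync", t.label, fired, index + 1))
            # Model step: fire the transition without consuming an event.
            result.append(("model", t.label, fired, index))
    if index < len(trace):
        # Log step: consume the event without firing any transition.
        result.append(("log", trace[index], marking, index + 1))
    return result

# Hypothetical fragment of the net behind the example in the text.
NET = [
    Transition("a", frozenset({"start"}), frozenset({"1"})),
    Transition("g", frozenset({"1"}), frozenset({"2"})),
    Transition("f", frozenset({"1"}), frozenset({"end"})),
]
TRACE = ["a", "g", "b", "c", "d", "e"]

for step in successors(frozenset({"1"}), 1, TRACE, NET):
    print(step)
```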
Then, the cost of each subsequent search state in the possible state set is calculated using the cost, and the subsequent search state and its cost are added to the search queue. After that, the steps of extracting and expanding search states are repeated until the termination state is reached or the maximum number of search iterations is reached, yielding a target search queue, which contains the alignment scheme of one event sequence. The search path from the initial state to the termination state is the alignment scheme found for the event sequence. Note that for every search state there is a search path from the initial state to that state, which represents the partial alignment scheme searched so far; the cost of the search state is the sum of the costs of the alignment steps of this partial alignment scheme. In one embodiment, for a certain Petri net and the event sequence <a, g, b, c, d, e>, computing the alignment scheme passes through the search states ({start}, 0), ({1}, 1), ({2}, 2), ({3}, 3), ({4}, 4), ({5}, 5), ({end}, 6). The partial alignment schemes corresponding to these 7 search states are, respectively, <>; <(a, a)>; <(a, a), (g, g)>; <(a, a), (g, g), (b, b)>; <(a, a), (g, g), (b, b), (c, c)>; <(a, a), (g, g), (b, b), (c, c), (d, d)>; <(a, a), (g, g), (b, b), (c, c), (d, d), (e, e)>, where (a, a), (g, g), (b, b), (c, c), (d, d), (e, e) are alignment steps. The cost of each search state is the sum of the costs of the alignment steps in its corresponding alignment scheme. The previously calculated cost includes the cost of every alignment step that may be used here, such as the alignment steps (a, >>) and (g, >>) and the alignment steps (>>, a) and (>>, b), each of which has its own, larger or smaller, cost.
Finally, after the target search queue of the extracted event sequence is obtained, all event sequences in the log data to be processed are traversed, and all target search queues together form the target alignment scheme. In essence, the search repeatedly computes the subsequent states of each extracted state until a termination state is found; its goal is to find an alignment scheme between the model and the event sequence currently being processed.
The cost derived from the alignment scheme obtained with the prior-art alignment-based process consistency detection method is taken as the cost of the sampling result. Search states are put into the search queue together with their cost, the search states with the lowest cost are extracted from the queue, and the other search states whose cost equals that of an extracted search state are deleted. In implementation, all search states in the current search queue are sorted by cost in ascending order, the first preset-beam-width search states are taken out, and the remaining search states whose cost equals that of a taken-out state are deleted. That is, throughout the search, among all states of equal cost, at most the preset beam-width number are selected for expansion.
Then the extracted search states are expanded. The cost of each new state produced by the expansion is calculated using the cost function. When a search state is expanded, each excitable transition is enumerated in turn, and finally the possibility of performing a log step is enumerated. The cost function is in essence used to calculate the total cost of a search state and of the final alignment scheme, and thereby to evaluate the quality of an alignment scheme. Note that the preset beam width is finite and can be chosen flexibly to balance accuracy against computational cost. In one embodiment, the preset beam width is 2.
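Steps S2 to S5 for one event sequence can be sketched as follows. This is an illustrative simplification under stated assumptions, not the claimed implementation: the process model is a hypothetical strictly sequential model, so a search state is (position in the model, index into the event sequence) instead of (marking, index); synchronization steps cost 0; and model and log steps cost 1.0 unless a per-label cost is supplied. The names `beam_search`, `make_expand` and `MODEL` are made up for the example.

```python
SKIP = ">>"                      # assumed marker for the empty side of a step
MODEL = ["a", "b", "c"]          # hypothetical sequential process model

def make_expand(trace, model_cost, log_cost):
    """Build the expansion function (step S3) for one event sequence."""
    def expand(state):
        pos, idx = state
        if pos < len(MODEL):
            label = MODEL[pos]
            if idx < len(trace) and trace[idx] == label:
                yield (label, label), 0.0, (pos + 1, idx + 1)                # sync step
            yield (SKIP, label), model_cost.get(label, 1.0), (pos + 1, idx)  # model step
        if idx < len(trace):
            event = trace[idx]
            yield (event, SKIP), log_cost.get(event, 1.0), (pos, idx + 1)    # log step
    return expand

def beam_search(initial, is_final, expand, beam_width=2, max_iters=1000):
    """Return (cost, alignment) of the first final state taken out, else None."""
    queue = [(0.0, initial, [])]          # (cost so far, state, alignment steps)
    for _ in range(max_iters):
        if not queue:
            return None
        queue.sort(key=lambda item: item[0])
        beam = queue[:beam_width]         # S2: take the cheapest states
        taken_costs = {cost for cost, _, _ in beam}
        # S2: delete states left behind that tie in cost with a taken-out state.
        queue = [item for item in queue[beam_width:] if item[0] not in taken_costs]
        for cost, state, path in beam:
            if is_final(state):
                return cost, path
            for step, step_cost, nxt in expand(state):   # S3 and S4
                queue.append((cost + step_cost, nxt, path + [step]))
    return None                           # S5: iteration limit reached

trace = ["a", "c"]                        # the event for "b" is missing
result = beam_search(
    initial=(0, 0),
    is_final=lambda s: s == (len(MODEL), len(trace)),
    expand=make_expand(trace, model_cost={}, log_cost={}),
)
print(result)
```

With a beam width of 2 the search keeps at most two states per round, which is why it can be faster than an exhaustive alignment search but is not guaranteed to return the minimum-cost alignment in general.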
Based on the above embodiment, in the method, sampling the log data to be detected to obtain a sampling result, specifically includes:
randomly sampling the log data to be detected or sampling according to the occurrence frequency to obtain a sampling result;
wherein the frequency of occurrence is a frequency of occurrence of a sequence of events included in the log data to be detected.
Specifically, the log data to be detected is sampled randomly or according to the occurrence frequency. The random sampling is to sample by randomly grabbing the event sequence in the log data to be detected. Sampling according to the frequency of occurrence includes: the occurrence frequency of each event sequence of log data to be detected is calculated, sampling is carried out according to the occurrence frequency, and the finally sampled event sequence is the sampling result.
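Both sampling options can be sketched in a few lines. The sketch assumes the log is a list of event sequences; `sample_random` and `sample_by_frequency` are hypothetical helper names. It reads "sampling according to the occurrence frequency" as keeping the most frequent distinct sequences, which is only one possible reading; frequency-weighted random sampling would be another.

```python
import random
from collections import Counter

def sample_random(traces, k, seed=0):
    """Pick k event sequences uniformly at random, without replacement."""
    rng = random.Random(seed)             # fixed seed only for reproducibility
    return rng.sample(traces, min(k, len(traces)))

def sample_by_frequency(traces, k):
    """Pick the k most frequent distinct event sequences."""
    counts = Counter(tuple(trace) for trace in traces)
    return [list(variant) for variant, _ in counts.most_common(k)]

LOG = [
    ["a", "b"], ["a", "b"], ["a", "c"],
    ["a", "b"], ["a", "c"], ["d"],
]
print(sample_random(LOG, 3))
print(sample_by_frequency(LOG, 2))
```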
Based on the above embodiment, in the method, sampling the log data to be detected to obtain a sampling result, specifically includes:
and clustering the event sequences in the log data to be detected, and selecting a clustering center as a sampling result.
Specifically, the sampling method further comprises the steps of directly clustering event sequences in log data to be detected, and selecting a clustering center as a sampling result. In some embodiments, the sampling is performed using a K-media clustering method.
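Clustering-based sampling can be sketched as follows. The distance measure and the clustering details are not specified in the text, so this sketch makes its own assumptions: it uses the edit distance between event sequences and a simple medoid-style alternating update, so that every cluster center is an actual event sequence from the log. The function names and the naive initialization are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def cluster_centers(traces, k, max_iters=20):
    """Return k representative event sequences (cluster centers)."""
    # Work on distinct sequences so that two centers are never identical.
    unique = [list(t) for t in dict.fromkeys(tuple(t) for t in traces)]
    k = min(k, len(unique))
    centers = list(range(k))              # naive initialization: first k sequences
    for _ in range(max_iters):
        clusters = {c: [] for c in centers}
        for i, trace in enumerate(unique):
            nearest = min(centers, key=lambda c: edit_distance(trace, unique[c]))
            clusters[nearest].append(i)
        # New center of a cluster: the member closest to all other members.
        updated = [
            min(members, key=lambda m: sum(edit_distance(unique[m], unique[o]) for o in members))
            for members in clusters.values()
        ]
        if sorted(updated) == sorted(centers):
            break
        centers = updated
    return [unique[c] for c in centers]

LOG = [list("abc"), list("abd"), list("abc"), list("xyz"), list("xyw"), list("xy")]
print(cluster_centers(LOG, 2))
```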
The anomaly discovery method based on process mining provided by the invention has the following advantages:
(1) Process consistency detection responds quickly: the alignment scheme that alignment-based process consistency detection produces for the process model and the log data can be computed faster, and the position at which an anomaly occurs can be located;
(2) During process consistency detection, a new cost function is used that takes the frequency of anomalies into account, which improves the accuracy of the alignment scheme search.
Based on the above embodiments, a specific embodiment of performing one anomaly discovery by using the process consistency detection method for process mining provided by the present invention is provided, as shown in fig. 2, including the following steps:
S101: performing process model mining, wherein the process model is mined from the acquired log data to be processed using a generalized mining algorithm.
S102: sampling the log data to be detected using a random sampling method to obtain sampled event sequences as the sampling result.
Calculating the cost according to the sampling result comprises the following steps:
S103: calculating a basic alignment scheme of the sampled event sequences using the alignment-based process consistency detection method;
S104: counting the alignment steps in the alignment scheme, where the alignment scheme refers to the basic alignment scheme;
S105: evaluating the cost function according to the alignment statistics to obtain the cost;
S106: finally, calculating the alignment scheme again, this time using the beam search method, to obtain the target alignment scheme, and finding anomalies according to the target alignment scheme.
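The last part of S106, reading anomalies off a target alignment, can be sketched as below. The text does not spell out this step, so the sketch assumes the common interpretation that every non-synchronization step is a deviation: a model step marks an activity that the model requires but the log lacks, and a log step marks an event that the model does not allow at that point. The function name, the labels "missing" and "unexpected", and the sample alignment are hypothetical.

```python
SKIP = ">>"  # assumed marker for the empty side of an alignment step

def find_anomalies(alignment):
    """List deviations as (position in the event sequence, kind, label)."""
    anomalies = []
    trace_pos = 0
    for log_side, model_side in alignment:
        if log_side == SKIP:
            # Model step: the model fired a transition with no matching event.
            anomalies.append((trace_pos, "missing", model_side))
        elif model_side == SKIP:
            # Log step: the log contains an event the model could not replay.
            anomalies.append((trace_pos, "unexpected", log_side))
            trace_pos += 1
        else:
            trace_pos += 1                # synchronization step, no deviation
    return anomalies

ALIGNMENT = [("a", "a"), (SKIP, "g"), ("x", SKIP), ("b", "b")]
print(find_anomalies(ALIGNMENT))
```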
According to the anomaly discovery method based on process mining, log data to be processed is obtained, and the log data to be processed at least comprises log data to be mined and log data to be detected; calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function; and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme. According to the method, according to given event log data, abnormal points of the log data are finally detected through four-stage processing of process model mining, event sequence sampling, cost function calculation and alignment scheme calculation, process consistency detection is accelerated through a beam searching method, accuracy of alignment result calculation is improved through an improved cost function, an accurate alignment scheme can be obtained while acceleration is achieved, and further abnormality discovery is quicker and more accurate.
The process mining-based abnormality discovery apparatus provided by the present invention will be described below, and the process mining-based abnormality discovery apparatus described below and the process mining-based abnormality discovery method described above may be referred to in correspondence with each other. Fig. 3 is a schematic structural diagram of the abnormality discovery apparatus based on process mining according to the present invention, and as shown in fig. 3, the apparatus includes an acquisition unit 310, a model and sampling unit 320, a calculation unit 330, and a detection unit 340. Wherein,
an acquisition unit 310, configured to acquire log data to be processed, where the log data to be processed at least includes log data to be mined and log data to be detected;
the model and sampling unit 320 is configured to calculate a process model according to the log data to be mined; sampling the log data to be detected to obtain a sampling result;
a calculating unit 330, configured to calculate a cost using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function;
and a detection unit 340, configured to perform beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost, obtain a target alignment scheme, and find an anomaly of the log data to be processed according to the target alignment scheme.
Based on the above embodiment, in the apparatus, according to the sampling result and the process model, a cost function is used to calculate a cost, which specifically includes:
according to the sampling result and the process model, calculating to obtain a basic alignment scheme by using an alignment-based process consistency detection method;
counting the number of each event synchronization step, model step and log step in the basic alignment scheme to obtain a statistical result;
and calculating the cost by using a cost function according to the statistical result.
Based on the above embodiment, in the device, calculating the cost by using a cost function according to the statistical result specifically includes:
calculating the transition cost of each transition in the basic alignment scheme by utilizing a transition cost function according to the statistical result;
wherein the transition cost function comprises:
where Cost(transition) is the transition cost function; t_cnt represents the sum of the numbers of synchronization steps and model steps; skip represents the number of model steps; totnum represents the sum of the numbers of synchronization steps, model steps and log steps;
calculating the event cost of each event in the basic alignment scheme by using an event cost function according to the statistical result;
The event cost function includes:
where Cost(event) is the event cost function; totnum represents the sum of the numbers of synchronization steps, model steps and log steps; insert represents the number of log steps; e_cnt represents the sum of the numbers of synchronization steps and log steps;
the transition costs of all transitions and the event costs of all events constitute the cost.
Based on the above embodiment, in the apparatus, performing, according to the cost function, the process model and the cost, a process consistency detection based on a beam search on the log data to be processed, to obtain a target alignment scheme, which specifically includes:
S1: extracting an event sequence from the log data to be processed, constructing a search queue, and adding an initial state pre-constructed according to the event sequence to the search queue;
S2: extracting from the search queue a preset beam-width number of search states with the lowest cost, and deleting the search states that were not extracted but have the same cost as an extracted search state;
S3: expanding each extracted search state by using the process model: enumerating each excitable transition of the search state in turn, and calculating the subsequent search states reached by a model step, a log step and a synchronization step, to obtain a possible state set;
S4: calculating the cost of each subsequent search state in the possible state set by using the cost, and adding the subsequent search state and its cost to the search queue;
S5: repeating steps S2 to S4 until the termination state is reached or the maximum number of search iterations is reached, to obtain a target search queue;
S6: traversing all event sequences in the log data to be processed, all target search queues together forming the target alignment scheme.
Based on the above embodiment, in the device, the sampling the log data to be detected to obtain a sampling result specifically includes:
randomly sampling the log data to be detected or sampling according to the occurrence frequency to obtain a sampling result;
wherein the frequency of occurrence is a frequency of occurrence of a sequence of events included in the log data to be detected.
Based on the above embodiment, in the device, the sampling the log data to be detected to obtain a sampling result specifically includes:
and clustering the event sequences in the log data to be detected, and selecting a clustering center as a sampling result.
According to the abnormality discovery device based on process mining, the log data to be processed is obtained, and at least comprises the log data to be mined and the log data to be detected; calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function; and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme. The invention accelerates the process consistency detection by using the beam searching method, improves the accuracy of the alignment result calculation by using the improved cost function, realizes the acceleration and can obtain a more accurate alignment scheme at the same time, thereby finding the abnormality more rapidly and accurately.
Fig. 4 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a process mining-based anomaly discovery method comprising: acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected; calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function; and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the anomaly discovery method based on process mining provided by the above methods, and the method includes: acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected; calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function; and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the process mining-based anomaly discovery method provided above, the method comprising: acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected; calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result; calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function; and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An anomaly discovery method based on process mining, comprising:
acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected;
calculating according to the log data to be mined to obtain a process model; sampling the log data to be detected to obtain a sampling result;
calculating a cost by using a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function;
and performing beam search-based process consistency detection on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
2. The anomaly discovery method based on process mining of claim 1, wherein calculating a cost using a cost function based on the sampling result and the process model, comprises:
according to the sampling result and the process model, calculating to obtain a basic alignment scheme by using an alignment-based process consistency detection method;
counting the number of each event synchronization step, model step and log step in the basic alignment scheme to obtain a statistical result;
and calculating the cost by using a cost function according to the statistical result.
3. The anomaly discovery method based on process mining of claim 2, wherein calculating a cost using a cost function based on the statistics comprises:
calculating the transition cost of each transition in the basic alignment scheme by utilizing a transition cost function according to the statistical result;
wherein the transition cost function comprises:
where Cost(transition) is the transition cost function; t_cnt represents the sum of the numbers of synchronization steps and model steps; skip represents the number of model steps; totnum represents the sum of the numbers of synchronization steps, model steps and log steps;
calculating the event cost of each event in the basic alignment scheme by using an event cost function according to the statistical result;
the event cost function includes:
where Cost(event) is the event cost function; totnum represents the sum of the numbers of synchronization steps, model steps and log steps; insert represents the number of log steps; e_cnt represents the sum of the numbers of synchronization steps and log steps;
the transition costs of all transitions and the event costs of all events constitute the cost.
4. The anomaly discovery method based on process mining of claim 3, wherein performing a process consistency detection based on a beam search on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme specifically comprises:
S1: extracting an event sequence from the log data to be processed, constructing a search queue, and adding an initial state pre-constructed according to the event sequence to the search queue;
S2: extracting from the search queue a preset beam-width number of search states with the lowest cost, and deleting the search states that were not extracted but have the same cost as an extracted search state;
S3: expanding each extracted search state by using the process model: enumerating each excitable transition of the search state in turn, and calculating the subsequent search states reached by a model step, a log step and a synchronization step, to obtain a possible state set;
S4: calculating the cost of each subsequent search state in the possible state set by using the cost, and adding the subsequent search state and its cost to the search queue;
S5: repeating steps S2 to S4 until the termination state is reached or the maximum number of search iterations is reached, to obtain a target search queue;
S6: traversing all event sequences in the log data to be processed, all target search queues together forming the target alignment scheme.
5. The anomaly discovery method based on process mining according to claim 1, wherein the sampling of the log data to be detected to obtain a sampling result specifically includes:
randomly sampling the log data to be detected or sampling according to the occurrence frequency to obtain a sampling result;
wherein the frequency of occurrence is a frequency of occurrence of a sequence of events included in the log data to be detected.
6. The anomaly discovery method based on process mining according to claim 1, wherein the sampling of the log data to be detected to obtain a sampling result specifically includes:
and clustering the event sequences in the log data to be detected, and selecting a clustering center as a sampling result.
7. An anomaly discovery device based on process mining, comprising:
the acquisition unit is used for acquiring log data to be processed, wherein the log data to be processed at least comprises log data to be mined and log data to be detected;
the model and sampling unit is used for calculating to obtain a process model according to the log data to be mined; sampling the log data to be detected to obtain a sampling result;
the calculating unit is used for calculating cost by utilizing a cost function according to the sampling result and the process model; the cost function comprises a transition cost function and an event cost function;
and the detection unit is used for carrying out process consistency detection based on beam search on the log data to be processed according to the cost function, the process model and the cost to obtain a target alignment scheme, and finding out the abnormality of the log data to be processed according to the target alignment scheme.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the process mining-based anomaly discovery method of any one of claims 1 to 6 when the program is executed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the anomaly discovery method based on process mining of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the anomaly discovery method based on process mining of any one of claims 1 to 6.
CN202310890540.0A 2023-07-19 2023-07-19 Anomaly discovery method and device based on process mining Pending CN117112644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310890540.0A CN117112644A (en) 2023-07-19 2023-07-19 Anomaly discovery method and device based on process mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310890540.0A CN117112644A (en) 2023-07-19 2023-07-19 Anomaly discovery method and device based on process mining

Publications (1)

Publication Number Publication Date
CN117112644A true CN117112644A (en) 2023-11-24

Family

ID=88801076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310890540.0A Pending CN117112644A (en) 2023-07-19 2023-07-19 Anomaly discovery method and device based on process mining

Country Status (1)

Country Link
CN (1) CN117112644A (en)

Similar Documents

Publication Title
CN112148772A (en) Alarm root cause identification method, device, equipment and storage medium
CN112181758B (en) Fault root cause positioning method based on network topology and real-time alarm
KR20070011432A (en) Processing data in a computerised system
CN111435343B (en) Automatic generation and online updating method and system for computer system log template
CN113239365B (en) Vulnerability repairing method based on knowledge graph
US8954468B2 (en) Extracting a meaningful frequent itemset
CN117220920A (en) Firewall policy management method based on artificial intelligence
CN109189840B (en) Streaming online log analysis method
CN117112644A (en) Anomaly discovery method and device based on process mining
CN114465875B (en) Fault processing method and device
CN110336817B (en) Unknown protocol frame positioning method based on TextRank
EP3367275A1 (en) Biological sequence data processing method and device
Cao et al. A Fast Randomized Algorithm for Finding the Maximal Common Subsequences
CN113065130A (en) Log classification method and related device
CN117690153B (en) Text detection method, device and equipment based on deterministic finite automaton
CN117312350B (en) Steel industry carbon emission data management method and device
WO2019227227A1 (en) A method of digital signal feature extraction comprising multiscale analysis
CN117873905B (en) Method, device, equipment and medium for code homology detection
CN118227741A (en) Crown block system alarm log analysis method and device, electronic equipment and storage medium
CN111753148B (en) Transaction type matching method and device
CN116662795A (en) Model training method and related equipment
CN118260346A (en) Log fuzzy retrieval system and method based on time sequence data
CN117573798A (en) Database index recommendation method and device
CN117573806A (en) Name matching method and device without separator
CN116418742A (en) Method, device and storage medium for solving IPFIX hash collision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination