CN113139712B - Machine learning-based extraction method for incomplete rules of activity attributes of process logs - Google Patents

Machine learning-based extraction method for incomplete rules of activity attributes of process logs

Info

Publication number
CN113139712B
Authority
CN
China
Prior art keywords
activity
flow
incomplete
path
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110257681.XA
Other languages
Chinese (zh)
Other versions
CN113139712A (en)
Inventor
聂富强
叶旺
孙曜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202110257681.XA
Publication of CN113139712A
Application granted
Publication of CN113139712B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a machine learning-based method for extracting incomplete rules of activity attributes of a flow log. The method comprises the following steps: step 1, preprocessing the log data by extracting the flow log data recorded in a business flow information management system, converting the XES-format log data into a CSV format suitable for machine learning algorithms, and dividing the flow log data into flow activity paths, one per flow instance; step 2, after the flow log data are preprocessed, encoding each flow activity path and converting the flow activity paths into flow feature vectors; and step 3, classifying the flow feature vectors with a classification and regression decision tree from machine learning to construct a classification decision tree. The invention can greatly improve data analysis efficiency and provides a reference for analyzing the causes of missing log data. The method has good universality, high accuracy, and is easy to understand.

Description

Machine learning-based extraction method for incomplete rules of activity attributes of process logs
Technical Field
The invention relates to the field of business process management, and in particular to a machine learning-based method for extracting incomplete rules of activity attributes of process logs.
Background
Process mining is a technology in the field of business process management. It is mainly used to optimize existing enterprise resources by analyzing the process logs recorded in a business process information management system. Research on business process mining is divided into three main areas: process model discovery, conformance checking, and model improvement. Process model discovery mines a process model from historical process logs; conformance checking measures the degree of agreement between the mined model and the original model; model improvement concerns how a mined model can be used to improve and optimize a known model, organizational structure, and so on. At present, process model discovery is the most studied of the three, and it covers four mining dimensions: control flow, organization, case, and time. Which dimensions can be mined depends primarily on the data dimensions available in the flow log.
A flow log is the execution history of flow instances recorded by the business flow information management system. Fig. 1 shows a flow log fragment in which one flow instance (case) typically contains multiple events (also called activities or tasks). An event has a number of attributes, such as the ID of the flow instance, the ID of the event, the execution timestamp of the event, and the execution resources of the activity (the activity performer, the performing role, the devices required for execution, and so on). Most existing process mining methods assume complete flow log data. However, for technical reasons (such as system faults and resource limitations) or human reasons (such as manual recording errors), the log information system usually records some data noise, such as missing data, inaccurately recorded data, and irrelevant data. For example, the event timestamps in Fig. 1 should be accurate to the minute, but for some reason they are either not recorded or not recorded precisely enough. In data analysis this phenomenon is known as "garbage in, garbage out": business analysis based on poor-quality data produces only meaningless results, so improving data quality is important for business process mining. There are two main ways to improve data quality: first, improving the capture method when the data are generated, so that log data are recorded as accurately as possible; second, repairing the data after the log data have been acquired. Repairing log data mainly means filling in missing values or replacing inaccurate values with predicted values, but the prediction accuracy often falls short of what is needed.
Therefore, the invention uses a machine learning method to extract rules describing where log attribute values are missing or inaccurately recorded. This supports the first approach by helping to analyze why the log data became noisy.
Event attributes in the flow log play a key role in process mining. For example, the case ID is typically used to group flow activities by case; the execution timestamp of an activity is usually used to discover the flow execution path and to mine the control-flow structure (such as choice, parallelism, loops, and repetition in the flow model); and the activity performers and performing roles are commonly used for organizational mining, for business analysis based on the mined model, for optimizing the organizational structure, and so on. When event attribute values are missing or inaccurately recorded, it is difficult to mine an accurate flow model. The completeness of the log event attributes therefore determines the accuracy of process mining.
Disclosure of Invention
The invention provides a machine learning-based method for extracting incomplete rules of activity attributes of a flow log. It aims to find patterns in where activity attribute values are missing or inaccurately recorded in the flow log, and thereby supports the analysis of why those values are missing or inaccurate.
A method for extracting incomplete rules of activity attributes of a flow log based on machine learning comprises the following steps:
step 1, preprocessing the log data: first extracting the flow log data recorded in a business flow information management system, converting the XES-format log data into a CSV format suitable for machine learning algorithms, and dividing the flow log data into a set of flow activity paths, one per flow instance;
step 2, after the flow log data are preprocessed, encoding each flow activity path and converting the flow activity paths into flow feature vectors;
and step 3, classifying the flow feature vectors with a classification and regression decision tree from machine learning, and constructing a classification decision tree.
Further, the step 1 is specifically implemented as follows:
Let the Case ID be 364868. The flow instance with Case ID 364868 is converted into a flow activity path and recorded as trace = <A, B, C, D, E>, where A, B, C, D, and E are unique identifiers of activity types. If every activity attribute in the flow activity path is complete, the flow activity path is complete. If an activity attribute contains a missing or inaccurate value, the activity with that value is denoted by "-" in the flow activity path; for example, if the time attribute value of activity B in the trace is missing, the flow activity path is recorded as trace = <A, -, C, D, E>.
Further, the step 2 is specifically implemented as follows:
The flow activity paths are encoded using one-hot encoding: each activity of each path in the preprocessed flow log data is traversed, and if an activity attribute value is found to be missing or inaccurately recorded, a prefix variable Vpre is added to the ID of the immediately preceding activity and a suffix variable Vsuf is added to the ID of the immediately following activity, so that the encoded activities are distinguished from the original activities in the flow feature vector.
Further, the prefix variable and the suffix variable are calculated as follows:
$$V_{suf} = \sum Type_{activity} + 1$$
$$V_{pre} = V_{suf} \times 2$$
where $Type_{activity}$ denotes an activity type in the overall flow log, so the sum is the total number N of activity types in the flow log data. N is taken as the base variable; the suffix variable is the base variable plus 1, and the prefix variable is 2 times the suffix variable. Both variables are positive integers, and adding them makes the encoded activities distinguishable from the original activities.
Further, if a flow activity path contains an activity with an incomplete value, the label value of the feature vector corresponding to that flow activity path is set to 1; otherwise it is set to 0.
Further, the step 3 is specifically implemented as follows:
In the constructed classification decision tree, the leaf nodes represent the number of flow activity paths that contain incomplete attribute values, and the non-leaf nodes represent information about the activities before and after the activities with incomplete attribute values. The condition X ≤ Q in a non-leaf node is the decision condition of the path: when feature X is less than or equal to Q, the decision tree branches to the left; when feature X is greater than Q, it branches to the right, where Q is a set threshold. The samples in a non-leaf node give the number of flow activity paths S = S1 + S2, where S1 flow activity paths contain no incomplete values and S2 flow activity paths contain incomplete values. The S2 flow activity paths containing incomplete values are then analyzed to reach a conclusion.
Further, the value of the threshold Q is 0.5, chosen to follow the convention of the scikit-learn machine learning library.
Further, the analysis rule for the flow activity path including the incomplete value is as follows:
Rule 1: whether, in the flow paths containing an incomplete activity attribute value, an activity with the same ID is executed before the activity containing the incomplete value;
Rule 2: whether, in the flow paths containing an incomplete activity attribute value, an activity with the same ID is executed after the activity containing the incomplete value.
The invention has the following beneficial effects:
The invention aims to find patterns in where activity attribute values are missing or inaccurately recorded in the flow log, and supports the analysis of why those values are missing or inaccurate.
In the experiment, the classification accuracy is high: the data set contains 6042 paths in total, of which 3369 contain missing values, and only 28 paths are misclassified, giving a classification accuracy of 99.2%. Without this method, a data analyst would have to search massive historical log data for the locations of missing values without any assistance, which is time-consuming and laborious. The method can greatly improve data analysis efficiency and provides a reference for analyzing the causes of missing log data. The method provided by the invention has good universality, high accuracy, and is easy to understand.
Drawings
Fig. 1 is a flow log fragment.
Fig. 2 is a flowchart of the encoding algorithm.
Fig. 3 is a path encoding diagram.
Fig. 4 is an example decision tree produced by the proposed method.
Fig. 5 is a flow chart of the present invention.
Fig. 6 is a decision tree trained based on real data.
Fig. 7 is a log activity type table.
Detailed Description
The invention is further described below with reference to the drawings and examples.
Flow logs are typically recorded in the log information system in the XES (eXtensible Event Stream) data format. XES is an XML-based event log standard with few format restrictions and high extensibility. However, it is difficult to train a decision tree directly on XES data, so the log data are preprocessed first: the flow log data recorded in the business flow information management system are extracted, the XES log data are converted into a CSV format suitable for machine learning algorithms, and the flow log data are divided into flow activity paths (also called flow activity sequences or flow activity traces), one per flow instance. For example, in Fig. 1, the flow instance with Case ID 364868 is converted into a flow activity path and recorded as trace = <A, B, C, D, E>, where A, B, C, D, and E are unique identifiers of activity types. If every activity attribute in the flow activity path is complete, the flow activity path is also complete. If an activity attribute contains a missing or inaccurate value, the activity with that value is denoted by "-" in the flow activity path; for example, when the time attribute value of activity B in this case is missing, the flow activity path is recorded as trace = <A, -, C, D, E>.
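The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names `case_id`, `activity_id`, and `timestamp`, the sample rows, and the helper `build_traces` are hypothetical, an empty timestamp is treated as the incomplete attribute, and events are assumed to appear in execution order in the CSV file.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical CSV export of an XES log; column names and rows are assumptions.
CSV_TEXT = """case_id,activity_id,timestamp
364868,A,2013-01-02 10:15
364868,B,
364868,C,2013-01-02 11:40
364868,D,2013-01-02 12:05
364868,E,2013-01-02 13:30
364869,A,2013-01-03 09:00
364869,C,2013-01-03 09:45
"""

MISSING = "-"  # marker used for an activity with an incomplete attribute


def build_traces(csv_text):
    """Group events by case ID, in file order, into flow activity paths."""
    traces = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty timestamp is treated as an incomplete attribute value.
        complete = bool(row["timestamp"].strip())
        symbol = row["activity_id"] if complete else MISSING
        traces.setdefault(row["case_id"], []).append(symbol)
    return traces


traces = build_traces(CSV_TEXT)
for case_id, trace in traces.items():
    print(case_id, "<" + ", ".join(trace) + ">")
```

Because the missing attribute here is the timestamp itself, the sketch relies on file order rather than sorting by time; a real log may need a separate ordering key.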
After the flow log data are preprocessed, each flow activity path must be encoded and converted into a flow feature vector. The invention encodes the flow activity paths using one-hot encoding: each activity of each path in the preprocessed flow log data is traversed, and if an activity attribute value is found to be missing or inaccurately recorded, a prefix variable Vpre is added to the ID (the unique identifier) of the immediately preceding activity and a suffix variable Vsuf is added to the ID of the immediately following activity. Fig. 2 shows the flow of the encoding algorithm.
The algorithm first traverses each activity in each flow activity path and checks whether its attributes are incomplete. If they are, the prefix variable and the suffix variable are added to the IDs of the preceding and following activities, respectively; the algorithm then checks whether each resulting feature value already exists in the feature set, skipping it if so and storing it in the set if not. If the activity has no incomplete attribute value, the algorithm moves directly to the next activity, until the whole flow activity path has been traversed.
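One possible reading of this encoding flow is sketched below. The function name `encode_traces`, the sample traces, and the handling of boundary cases (an incomplete activity at the start or end of a path, or two incomplete activities in a row) are assumptions, since the description does not cover them. The values 26 and 13 are the prefix and suffix variables the description uses for a log with 13 activity types numbered from 0.

```python
MISSING = "-"  # marks an activity whose attribute value is missing or inaccurate


def encode_traces(traces, v_pre, v_suf):
    """Return (feature_set, vectors, labels) for a list of traces.

    Each trace is a list of integer activity IDs, with MISSING in place of an
    activity whose attribute is incomplete.
    """
    per_trace = []    # feature values found in each trace
    feature_set = []  # global feature set, without duplicates
    for trace in traces:
        found = set()
        for i, act in enumerate(trace):
            if act != MISSING:
                continue  # complete activity: move on to the next one
            # The activity immediately before gets the prefix variable.
            if i > 0 and trace[i - 1] != MISSING:
                found.add(trace[i - 1] + v_pre)
            # The activity immediately after gets the suffix variable.
            if i + 1 < len(trace) and trace[i + 1] != MISSING:
                found.add(trace[i + 1] + v_suf)
        for value in sorted(found):
            if value not in feature_set:  # never store a feature twice
                feature_set.append(value)
        per_trace.append(found)
    feature_set.sort()
    # One-hot row per trace: 1 if the feature occurs in that trace, else 0.
    vectors = [[1 if f in found else 0 for f in feature_set] for found in per_trace]
    labels = [1 if MISSING in trace else 0 for trace in traces]
    return feature_set, vectors, labels


traces = [
    [0, MISSING, 2, 3],  # incomplete activity between 0 and 2
    [0, 1, 2, 3],        # complete path
    [4, 0, MISSING],     # incomplete activity at the end, preceded by 0
]
features, vectors, labels = encode_traces(traces, v_pre=26, v_suf=13)
print(features)  # [15, 26]
print(vectors)   # [[1, 1], [0, 0], [0, 1]]
print(labels)    # [1, 0, 1]
```

Feature 26 (activity 0 plus the prefix variable) reads as "activity 0 occurs immediately before an incomplete activity", and feature 15 (activity 2 plus the suffix variable) as "activity 2 occurs immediately after one".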
The prefix variable and the suffix variable are added to the IDs of the activities immediately before and after the activity containing the incomplete attribute value, mainly so that the encoded activities can be distinguished from the original activities in the flow feature vector. The prefix variable and the suffix variable are calculated as follows:
$$V_{suf} = \sum Type_{activity} + 1$$
$$V_{pre} = V_{suf} \times 2$$
where $Type_{activity}$ denotes an activity type in the overall flow log, so the sum is the total number N of activity types in the flow log data. N is taken as the base variable; the suffix variable is the base variable plus 1 (if the activity type IDs start from 0, the base variable can be used directly as the suffix variable), and the prefix variable is 2 times the suffix variable. Both variables are positive integers, and adding them makes the encoded activities distinguishable from the original activities. If a flow activity path contains an activity with an incomplete value, the label value of the corresponding feature vector is set to 1; otherwise it is set to 0. For example, as shown in Fig. 3, the activity type IDs are encoded from 0 and there are 13 activity types in total, so the suffix variable is 13 and the prefix variable is 26. If the flow activity path contains an activity with an incomplete attribute value, the corresponding feature value in the flow feature vector is 1 and the label value is 1; if it contains no such activity, the feature value is 0 and the label value is 0.
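The choice of these two variables can be checked with a short sketch. The helper `prefix_suffix` and its `ids_start_at_zero` flag are hypothetical names; the sketch only confirms that, for 13 activity types, the original IDs, the suffixed IDs, and the prefixed IDs fall into three non-overlapping ranges.

```python
def prefix_suffix(num_types, ids_start_at_zero=False):
    """Return (v_pre, v_suf) for a log with num_types activity types."""
    # The base variable is the total number N of activity types. When IDs run
    # from 0 to N-1, N itself can serve as the suffix variable.
    v_suf = num_types if ids_start_at_zero else num_types + 1
    v_pre = 2 * v_suf
    return v_pre, v_suf


v_pre, v_suf = prefix_suffix(13, ids_start_at_zero=True)
ids = range(13)                       # activity IDs 0..12
original = set(ids)
suffixed = {i + v_suf for i in ids}   # 13..25
prefixed = {i + v_pre for i in ids}   # 26..38
print(v_suf, v_pre)  # 13 26
print(original.isdisjoint(suffixed),
      original.isdisjoint(prefixed),
      suffixed.isdisjoint(prefixed))  # True True True
```

With IDs numbered from 1 instead, the same helper gives a suffix variable of 14 and a prefix variable of 28, and the three ranges are again disjoint.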
After the flow activity paths are converted into feature vectors by the encoding algorithm, the flow feature vectors are classified with a classification and regression tree (CART) from machine learning. A CART is a binary tree that repeatedly divides the data into two parts according to the features. The invention trains a CART classification tree in which the leaf nodes represent the number of flow activity paths that contain incomplete attribute values, and the non-leaf nodes represent information about the activities before and after the activities with incomplete attribute values. Fig. 4 shows an example of a constructed decision tree. The condition "26.0 <= 0.5" in the root node is the decision condition of the path: when feature 26.0 is 0, the decision tree branches to the left, and when the feature is 1, it branches to the right. The value 0.5 is used because it follows the convention of the scikit-learn machine learning library; any other value that separates 0 from 1 could be chosen. The samples in the root node give the number of flow activity paths (6042), of which 2350 contain no incomplete values and 3692 contain incomplete attribute values.
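To illustrate how a CART-style tree arrives at a root test such as "26 <= 0.5", the following sketch picks a single split by weighted Gini impurity using only the standard library. It is a one-level simplification, not the patent's implementation; the helper names and the toy data are invented for illustration. scikit-learn's `DecisionTreeClassifier` applies the same kind of search recursively and places thresholds midway between adjacent feature values, which is why 0/1 features yield 0.5.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def best_split(vectors, labels, feature_names, threshold=0.5):
    """Pick the feature whose 'value <= threshold' test has the lowest weighted Gini."""
    best = None
    n = len(labels)
    for j, name in enumerate(feature_names):
        left = [y for x, y in zip(vectors, labels) if x[j] <= threshold]
        right = [y for x, y in zip(vectors, labels) if x[j] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, name, left, right)
    return best


# Toy data (invented): columns are the encoded features 15 and 26.
feature_names = [15, 26]
vectors = [[0, 1], [0, 1], [1, 1], [1, 0], [0, 0], [0, 0], [0, 0]]
labels = [1, 1, 1, 1, 0, 0, 0]  # 1 = path contains an incomplete value

score, name, left, right = best_split(vectors, labels, feature_names)
print(f"root test: {name} <= 0.5")
print("left  (feature is 0):", len(left), "paths,", sum(left), "with incomplete values")
print("right (feature is 1):", len(right), "paths,", sum(right), "with incomplete values")
```

On this toy data the search selects feature 26, so the right branch collects the paths in which activity 0 immediately precedes an incomplete activity.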
The left branch of the root node represents the number of flow activity paths without incomplete values, and the right branch represents the number of flow activity paths with missing values. In this example, the incomplete rule for log attribute values is that in most (3581) of the flow paths containing an incomplete activity attribute value, an activity with ID 0 is executed before the activity with the incomplete value. The classification of the decision tree further shows that in the other flow paths containing incomplete values, an activity with ID 0 is always executed after the activity with the incomplete value. From these two rules generated by the decision tree, a common rule for incomplete attribute values of log activities can be extracted from the sample log data: an activity with ID 0 is always executed before or after the activity with the incomplete attribute value.
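The kind of rule read off the tree can be cross-checked directly on the traces. The sketch below counts which activity IDs occur immediately before and after an incomplete activity; the function `neighbour_counts` and the sample traces are hypothetical, and the counts are per occurrence rather than per path.

```python
from collections import Counter

MISSING = "-"  # marks an activity whose attribute value is incomplete


def neighbour_counts(traces):
    """Count activity IDs that occur immediately before/after an incomplete activity."""
    before, after = Counter(), Counter()
    for trace in traces:
        for i, act in enumerate(trace):
            if act != MISSING:
                continue
            if i > 0 and trace[i - 1] != MISSING:
                before[trace[i - 1]] += 1
            if i + 1 < len(trace) and trace[i + 1] != MISSING:
                after[trace[i + 1]] += 1
    return before, after


# Invented sample traces with integer activity IDs.
traces = [
    [0, MISSING, 2],
    [0, MISSING, 2, 3],
    [1, 0, MISSING, 2],
    [4, MISSING, 0],
    [0, 1, 2],
]
before, after = neighbour_counts(traces)
print("before:", before.most_common())
print("after: ", after.most_common())
```

A dominant ID in either counter corresponds to a candidate rule of the form "an activity with this ID is executed before (or after) the activity with the incomplete value".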
The above is the research idea of the invention. The specific workflow is shown in Fig. 5, and the validity of the method is verified on a real log data set. The experimental data set comes from an information management system of Volvo IT Belgium and was the subject of the 2013 Business Process Intelligence Challenge; the workflow it records runs from the occurrence of a system fault to the recovery of normal operation. The log data contain 6042 flow instances and 13 different types of activities in total, each with a unique activity ID and activity name, as shown in Fig. 7. The activity ID is used later as an element of the activity sequence. All experiments were run on a machine with the Windows 10 Professional operating system, an Intel i7-7700 3.60 GHz CPU, and 16.0 GB of memory.
Example 1:
First, the data format is converted with a process mining tool, the flow instances are converted into activity sequences, and the activity sequences are then one-hot encoded into feature vectors. Because the data in the log information system are complete and contain no incomplete values, the experiment randomly deletes values of specified activity attributes to simulate imperfect log records caused by system faults or human error. The method works for both missing and inaccurate log attribute values; missing values are used as the example below. When the activity ID in a flow instance is 5, 6, 7, 9, or 10, a random number is generated, and if the random number is greater than 45, the timestamp value in the activity attributes is deleted. After the experimental data are processed, the feature vectors are input to the machine learning algorithm to train a decision tree, shown in Fig. 6. Analysis of the experimental result shows that, at the root node, 3369 flow paths in the log data contain missing attribute values and 2673 flow paths contain none. One common rule for the missing attribute values is that an activity with ID 0 always occurs before the activity containing the missing value, and some (48) activities containing missing attribute values are executed after an activity with ID 0; further, an activity with ID 2 is always executed after an activity containing a missing value.
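The deletion step of this experiment might be simulated along the following lines. The description gives the activity IDs and the cutoff of 45 but not the range of the random number, so the 0 to 100 integer range, the event schema, and the function `delete_timestamps` are assumptions.

```python
import random

TARGET_IDS = {5, 6, 7, 9, 10}  # activity IDs named in the experiment
CUTOFF = 45                    # delete when the random number exceeds this value


def delete_timestamps(events, rng, low=0, high=100):
    """Return a copy of events with some timestamps blanked out.

    events: list of dicts with 'activity_id' and 'timestamp' keys (assumed schema).
    The [low, high] range of the random integer is an assumption.
    """
    damaged = []
    for event in events:
        event = dict(event)  # copy so the input stays untouched
        if event["activity_id"] in TARGET_IDS and rng.randint(low, high) > CUTOFF:
            event["timestamp"] = None
        damaged.append(event)
    return damaged


rng = random.Random(0)  # fixed seed so the run is repeatable
events = [{"activity_id": i % 13, "timestamp": f"t{i}"} for i in range(1000)]
damaged = delete_timestamps(events, rng)
missing = [e for e in damaged if e["timestamp"] is None]
print(len(missing), "of", len(events), "timestamps deleted")
```

Only events whose activity ID is in the target set can lose their timestamp, which mirrors the way the experiment restricts the simulated faults to specific activities.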
In the experiment, the classification accuracy is high: the data set contains 6042 paths in total, of which 3369 contain missing values, and only 28 paths are misclassified, giving a classification accuracy of 99.2%. Without this method, a data analyst would have to search massive historical log data for the locations of missing values without any assistance, which is time-consuming and laborious. The method can greatly improve data analysis efficiency and provides a reference for analyzing the causes of missing log data. The method provided by the invention has good universality, high accuracy, and is easy to understand.
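The description does not say which denominator the 99.2% figure uses. As a reading aid only, the arithmetic below computes the accuracy both against all paths and against the paths containing missing values, using the three counts reported above.

```python
total_paths = 6042          # all flow activity paths in the data set
paths_with_missing = 3369   # paths containing missing values
misclassified = 28          # paths not classified correctly

acc_over_all = 1 - misclassified / total_paths
acc_over_missing = 1 - misclassified / paths_with_missing
print(f"{acc_over_all:.1%}")      # 99.5%
print(f"{acc_over_missing:.1%}")  # 99.2%
```

The reported 99.2% matches the second reading, in which the 28 errors are taken relative to the 3369 paths that contain missing values.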

Claims (5)

1. A method for extracting incomplete rules of activity attributes of a process log based on machine learning, characterized by comprising the following steps:
step 1, preprocessing the log data: first extracting the flow log data recorded in a business flow information management system, converting the XES-format log data into a CSV format suitable for machine learning algorithms, and dividing the flow log data into flow activity paths, one per flow instance;
step 2, after the flow log data are preprocessed, encoding each flow activity path and converting the flow activity paths into flow feature vectors;
step 3, classifying the flow feature vectors with a classification and regression decision tree from machine learning to construct a classification decision tree;
the step 1 is specifically implemented as follows:
setting the Case ID to 364868, converting the flow instance with Case ID 364868 into a flow activity path, and recording the flow activity path as trace = <A, B, C, D, E>, wherein A, B, C, D, and E are unique identifiers of activity types; if every activity attribute in the flow activity path is complete, the flow activity path is also complete; if an activity attribute in the flow activity path contains a missing or inaccurate value, the activity with that value is denoted by "-" in the flow activity path, and if the time attribute value of activity B in the flow activity path is missing, the flow activity path is recorded as trace = <A, -, C, D, E>;
the step 2 is specifically implemented as follows:
encoding the flow activity paths using one-hot encoding: first traversing each activity of each path in the preprocessed flow log data; if an activity attribute value is found to be missing or inaccurately recorded, adding a prefix variable Vpre to the ID of the immediately preceding activity and a suffix variable Vsuf to the ID of the immediately following activity, so that the encoded activities are distinguished from the original activities in the flow feature vector;
the step 3 is specifically implemented as follows:
the leaf nodes in the constructed classification decision tree represent the number of flow activity paths that contain incomplete attribute values, and the non-leaf nodes represent information about the activities before and after the activities with incomplete attribute values; the condition X ≤ Q in a non-leaf node is the decision condition of the path: when feature X is less than or equal to Q, the decision tree branches to the left; when feature X is greater than Q, the decision tree branches to the right; wherein Q is a set threshold; the samples in a non-leaf node give the number of flow activity paths S = S1 + S2, wherein S1 flow activity paths contain no incomplete values and S2 flow activity paths contain incomplete values; and the S2 flow activity paths containing incomplete values are analyzed to reach a conclusion.
2. The method for extracting incomplete rules of activity attributes of a process log based on machine learning according to claim 1, wherein a prefix variable and a suffix variable are calculated according to the following formula:
$$V_{suf} = \sum Type_{activity} + 1$$
$$V_{pre} = V_{suf} \times 2$$
wherein $Type_{activity}$ denotes an activity type in the whole flow; the total number N of activity types in the flow log data is taken as the base variable, the suffix variable is the base variable plus 1, and the prefix variable is 2 times the suffix variable.
3. The method for extracting incomplete rules of activity attributes of a process log based on machine learning according to claim 1 or 2, wherein if a flow activity path contains an activity with an incomplete value, the label value of the feature vector corresponding to that flow activity path is set to 1, and otherwise is set to 0.
4. The method for extracting incomplete rules of activity attributes of a process log based on machine learning according to claim 3, wherein the value of the threshold Q is 0.5, chosen to follow the convention of the scikit-learn machine learning library.
5. The method for extracting incomplete rules of activity attributes of a process log based on machine learning according to claim 4, wherein the analysis rules for the flow activity paths containing incomplete values are as follows:
rule 1: whether, in the flow paths containing an incomplete activity attribute value, an activity with the same ID is executed before the activity containing the incomplete value;
rule 2: whether, in the flow paths containing an incomplete activity attribute value, an activity with the same ID is executed after the activity containing the incomplete value.
CN202110257681.XA 2021-03-09 2021-03-09 Machine learning-based extraction method for incomplete rules of activity attributes of process logs Active CN113139712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110257681.XA CN113139712B (en) 2021-03-09 2021-03-09 Machine learning-based extraction method for incomplete rules of activity attributes of process logs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110257681.XA CN113139712B (en) 2021-03-09 2021-03-09 Machine learning-based extraction method for incomplete rules of activity attributes of process logs

Publications (2)

Publication Number Publication Date
CN113139712A (en) 2021-07-20
CN113139712B (en) 2024-02-09

Family

ID=76810975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110257681.XA Active CN113139712B (en) 2021-03-09 2021-03-09 Machine learning-based extraction method for incomplete rules of activity attributes of process logs

Country Status (1)

Country Link
CN (1) CN113139712B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748219B1 (en) 2022-09-14 2023-09-05 International Business Machines Corporation Application event logging augmentation
CN117787680A (en) * 2024-02-27 2024-03-29 西安敦讯信息技术有限公司 Business process mining method and equipment based on management system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750286A (en) * 2011-04-21 2012-10-24 常州蓝城信息科技有限公司 Novel decision tree classifier method for processing missing data
CN106156260A (en) * 2015-04-28 2016-11-23 阿里巴巴集团控股有限公司 The method and apparatus that a kind of shortage of data is repaired
CN111915018A (en) * 2020-07-31 2020-11-10 第四范式(北京)技术有限公司 Rule extraction method and system based on GBDT model

Also Published As

Publication number Publication date
CN113139712A (en) 2021-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant