CN115587081A - Automatic segmentation method and device for flow logs and storage medium - Google Patents

Publication number
CN115587081A
Authority
CN
China
Prior art keywords
event
events
probability
log
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038946.8A
Other languages
Chinese (zh)
Inventor
唐琦松
林平
吴鑫
赵基
靳志业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai I Search Software Co ltd
Original Assignee
Shanghai I Search Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai I Search Software Co ltd
Priority to CN202211038946.8A
Publication of CN115587081A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a device and a storage medium for automatically segmenting a flow log. The method comprises: obtaining a flow log collected in advance, and preprocessing or mapping its original attribute values to obtain attribute values; for any record in the flow log, predicting characteristic values from the attribute values of that record and of the N-1 records preceding it; using the characteristic values to calculate the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when their probabilities exceed the threshold; and, once the start and end events among all the events have been determined, segmenting the flow log according to a preset segmentation mode. The method determines the type of event to which each log record belongs and automatically splits flow logs in different environments, which addresses the low efficiency of splitting flow logs.

Description

Automatic segmentation method and device for flow logs and storage medium
Technical Field
The invention relates to a method and a device for automatically segmenting a flow log and a storage medium, and belongs to the technical field of flow segmentation.
Background
To understand how employees actually carry out an enterprise's business processes, all of their business operations need to be recorded automatically and the corresponding business processes restored from those records. When operation logs are collected automatically, one person often works on several services in succession, or runs the same service several times, so the collected operation log has to be segmented by service content. The usual approach is to list all event types and let the user specify the start and end events by hand before splitting, which gives fairly accurate results.
However, segmenting services this way is cumbersome in real scenarios: the volume of service logs is large, manually selecting start and end events or checking the segmentation results is inefficient, and events whose service attributes are not obvious, or which belong to several services at once, are difficult to judge.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a method, a device and a storage medium for automatically splitting a flow log, and realizes the function of automatically splitting the flow log under different environments.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the present invention provides an automatic segmentation method for a flow log, including:
acquiring a flow log collected in advance, and preprocessing or mapping the original attribute values of the flow log to obtain attribute values;
for any record in the flow log, predicting characteristic values from the attribute values of that record and of the N-1 records preceding it;
calculating from the characteristic values the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when their probabilities exceed the threshold;
and, after determining which of all the events are classified as start and end events, segmenting the flow log according to a preset segmentation mode.
Further, the characteristic values comprise: a continuous relation characteristic value, a dependency relation characteristic value and a data change characteristic value.
Further, the continuous relation characteristic value is predicted as follows: the occurrence probability of the current event is predicted from the N-1 consecutive events before it, and this occurrence probability expresses the continuous relation.
Further, the dependency relation characteristic value is predicted as follows: the probability that an event is a start event or an end event is predicted from the sequential dependency between pairs of events in the flow log.
Further, the data change characteristic value is predicted as follows: the probability that an event is a start event or an end event is predicted from whether the data in corresponding attributes changes between the previous event and the current event in the flow log; the data change value is 0 if the data is unchanged and 1 if it has changed.
Further, calculating from the characteristic values the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when the threshold is exceeded includes:
training a classifier on training data consisting of manually specified start events, end events and other events;
finding every position at which an event occurs and, at each position, using the classifier to predict the probability that the event is the start of a business process, the end of a business process or another event, based on its continuity with the preceding N-1 events, its dependency relation with the preceding event and its data change characteristic; and determining the event to be a start event or an end event when the corresponding probability exceeds a preset threshold, wherein the classification result of the classifier supports manual adjustment.
Further, when the probabilities that a pair of events are a start event and an end event respectively exceed the set threshold, the coverage rate of the corresponding business process is also evaluated: pairs with higher coverage receive higher priority, and pairs with low coverage are skipped. The coverage rate is the ratio of the number of service log records contained in the business process to the total number of log records.
Further, if the time interval between two events in the flow log exceeds a set time, the former event is automatically determined to be an end event and the latter event a start event.
In a second aspect, the present invention provides an automatic segmentation apparatus for flow logs, including:
a preprocessing unit, configured to acquire a flow log collected in advance and to preprocess or map the original attribute values of the flow log to obtain attribute values;
a prediction unit, configured to predict, for any record in the flow log, characteristic values from the attribute values of that record and of the N records preceding it;
a calculating unit, configured to calculate from the characteristic values the probability that each event is a start event, an end event or another event, to compare the probabilities with a preset threshold, and to list a pair of events as a start event and an end event when their probabilities exceed the threshold;
and a segmentation unit, configured to segment the flow log according to a preset segmentation mode after the start and end events among all the events have been determined.
In a third aspect, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the preceding claims.
Compared with the prior art, the invention has the following beneficial effects:
1. by acquiring the attributes of the flow log, the invention determines the type of event to which each log record belongs;
2. the invention defines start and end event pairs through characteristic values and key indicators, which enables flow logs to be split automatically;
3. by adjusting the coverage rate and the weights, the invention supports splitting flow logs in different environments.
Drawings
Fig. 1 is a flowchart of an automatic splitting method for a flow log according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
As shown in fig. 1, this embodiment introduces an automatic splitting method for a flow log, including:
acquiring a flow log collected in advance, and preprocessing or mapping the original attribute values of the flow log to obtain attribute values;
for any record in the flow log, predicting characteristic values from the attribute values of that record and of the N-1 records preceding it;
calculating from the characteristic values the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when their probabilities exceed the threshold;
and, after determining which of all the events are classified as start and end events, segmenting the flow log according to a preset segmentation mode.
The automatic segmentation method provided by this embodiment comprises the following steps:
S1, preprocessing or mapping the collected original attributes.
For example, when the original attributes of a web page are collected, the attributes obtained are the URL and the operated component (such as a button). During mapping, only the domain name in the URL and the label of the component are retained. For example, using the search button on www.baidu.com is mapped to the character string 'www.baidu.com search', which serves as both the name and the unique identifier of the event. Records with the same name belong to the same event.
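The mapping in S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name `event_name` and the idea of passing the component label as a separate string are assumptions made for the example.

```python
from urllib.parse import urlparse


def event_name(url: str, component_label: str) -> str:
    """Map a raw (URL, component label) pair to '<domain> <label>'."""
    # urlparse only fills in the host when a scheme or a '//' prefix is
    # present, so bare inputs such as 'www.baidu.com/index' get '//' added.
    parsed = urlparse(url if "//" in url else "//" + url)
    domain = parsed.hostname or ""
    return f"{domain} {component_label.strip()}"


print(event_name("https://www.baidu.com/s?wd=patent", "search"))
```

Two records that map to the same string are then treated as occurrences of the same event.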
S2, predicting three characteristic values from the original attribute values and the context of each operation: continuity, dependency and data change.
Continuity: the occurrence probability of the current event is predicted from the N-1 consecutive events before it, and this probability expresses the continuous relation. For example, if the three events A, B and C often appear together, then once A and B have been seen, C follows with high probability, so the continuity characteristic of C is high. If the three events A, B and D rarely occur together, the predicted probability that D follows B is very low, so a D that follows A and B may mark a service breakpoint.
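One way to estimate this continuity value is a relative-frequency n-gram model, sketched below. The source does not say which estimator is used, so the counting scheme and the names `continuity_model` and `continuity` are assumptions.

```python
from collections import Counter


def continuity_model(events, n=3):
    """Count n-grams and their (n-1)-event prefixes over one event sequence."""
    last = len(events) - n + 1
    ngrams = Counter(tuple(events[i:i + n]) for i in range(last))
    prefixes = Counter(tuple(events[i:i + n - 1]) for i in range(last))
    return ngrams, prefixes


def continuity(ngrams, prefixes, context, current):
    """Estimate P(current | previous n-1 events) by relative frequency."""
    seen = prefixes[tuple(context)]
    return ngrams[tuple(context) + (current,)] / seen if seen else 0.0


log = list("ABCABCABCABD")
ngrams, prefixes = continuity_model(log, n=3)
print(continuity(ngrams, prefixes, "AB", "C"))  # 0.75
print(continuity(ngrams, prefixes, "AB", "D"))  # 0.25
```

In this toy log, C follows "A, B" three times out of four, so its continuity is high, while the single D after "A, B" scores low and is a candidate breakpoint.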
Dependency: being the start event of a service means that the other events of the same service appear after it, and being the end event of a service means that the other events of the same service all appear before it. The dependency characteristic value is predicted as follows: the probability that an event is a start event or an end event is predicted from the order dependency between pairs of events in the flow log, where the dependency is judged from how often the current event occurs together with each of the other events. Given two events A and B, if A and B always occur together and A always precedes B, then B is considered to depend on A. The total number of times A and B occur consecutively, the number of times A precedes B and the number of times B precedes A are counted, and from these the probability that A depends on B and the probability that B depends on A are calculated.
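The counting described above can be sketched as follows. The source does not give the exact formula, so this example assumes that "consecutive occurrence" means adjacent records and that each probability is the share of adjacent A/B pairs in the given order; the function name `dependency` is hypothetical.

```python
from collections import Counter


def dependency(events, a, b):
    """Return (P(b depends on a), P(a depends on b)) from adjacent pairs."""
    pairs = Counter(zip(events, events[1:]))
    a_first = pairs[(a, b)]  # times a is immediately followed by b
    b_first = pairs[(b, a)]  # times b is immediately followed by a
    total = a_first + b_first
    if total == 0:
        return 0.0, 0.0
    return a_first / total, b_first / total


log = list("ABXABXBAXAB")
print(dependency(log, "A", "B"))  # (0.75, 0.25)
```

Here A precedes B in three of the four adjacent occurrences, so B is likely to depend on A.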
Data change: when a service ends, some change often occurs, such as, but not limited to, a window being closed (the window title changes) or the browser URL changing. If a data change occurs after an event, that event may be the end event of a service or of an operation fragment.
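A minimal sketch of the binary data change value, comparing two consecutive records. The attribute keys `window_title` and `url` are assumed names for illustration; the source only says that attributes such as the window title or browser URL are compared.

```python
def data_change(prev_attrs, curr_attrs, keys=("window_title", "url")):
    """Return 1 if any watched attribute differs between two records, else 0."""
    return int(any(prev_attrs.get(k) != curr_attrs.get(k) for k in keys))


before = {"window_title": "Order form", "url": "https://example.com/order"}
after = {"window_title": "Home", "url": "https://example.com/"}
print(data_change(before, before), data_change(before, after))
```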
S3, using a classifier to predict start and end event pairs (that is, the probability that a given pair of events are a start event and an end event respectively) from the three characteristic values of continuity, dependency and data change, as follows:
training a classifier on training data consisting of manually specified start events, end events and other events;
finding every position at which an event occurs and, at each position, using the classifier to predict the probability that the event is the start of a business process, the end of a business process or another event, based on its continuity with the preceding N-1 events, its dependency relation with the preceding event and its data change characteristic; and determining the event to be a start event or an end event when the corresponding probability exceeds a preset threshold, wherein the classification result of the classifier supports manual adjustment.
The start and end event pairs that exceed the threshold are sorted by coverage rate and presented to the user for selection; alternatively, only the top three may be recommended. The weights of the classifier are either defined by the user or learned from labeled data.
This embodiment uses user-labeled data, in which each sample is the set of characteristic values of one event (continuity, dependency and data change), each a floating-point number between 0 and 1;
the labels are 0, 1 and 2, where 0 represents another event, 1 a start event and 2 an end event. Each labeled training sample belongs to exactly one of the three event types, and its event type is set to the corresponding label to obtain the trained regression classifier.
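The training and thresholding in S3 can be illustrated with a small softmax (multinomial logistic) regression written in plain Python. The source only says "regression classifier", so the choice of model, the learning rate, the number of epochs, the threshold of 0.6 and the toy samples are all assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, features):
    """Return [P(other), P(start), P(end)] for one feature vector."""
    x = list(features) + [1.0]  # trailing 1.0 acts as the bias input
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def train(samples, labels, n_classes=3, lr=0.5, epochs=2000):
    """Fit a softmax regression by per-sample gradient descent."""
    dim = len(samples[0]) + 1
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            probs = predict_proba(weights, features)
            x = list(features) + [1.0]
            for c in range(n_classes):
                error = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * error * x[j]
    return weights


def classify(probs, threshold=0.6):
    """Return 1 (start) or 2 (end) only if that probability exceeds the threshold."""
    best = max((1, 2), key=lambda c: probs[c])
    return best if probs[best] > threshold else 0


# Toy labelled samples: (continuity, dependency, data_change), all in [0, 1].
samples = [(0.9, 0.5, 0.0), (0.8, 0.4, 0.0),   # label 0: other events
           (0.1, 0.9, 0.0), (0.2, 0.8, 0.0),   # label 1: start events
           (0.8, 0.1, 1.0), (0.7, 0.2, 1.0)]   # label 2: end events
labels = [0, 0, 1, 1, 2, 2]
weights = train(samples, labels)
print(classify(predict_proba(weights, (0.15, 0.85, 0.0))))
```

Because the learned weights are an explicit matrix, they can also be overridden by hand, which matches the statement that the weights may be user-defined or learned from labeled data.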
In addition, special cases need to be considered, such as a period with no operation. After finishing a service, staff are quite likely to take a break, so there may be several minutes without activity; such periods of inactivity can be treated directly as a kind of end event.
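A minimal sketch of this inactivity rule, assuming timestamps in seconds and a hypothetical limit of five minutes: the record before a long pause is marked as an end event and the record after it as a start event.

```python
def mark_idle_boundaries(timestamps, max_gap_seconds=300):
    """Return (end_indices, start_indices) implied by long pauses."""
    ends, starts = [], []
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > max_gap_seconds:
            ends.append(i - 1)   # last record before the pause
            starts.append(i)     # first record after the pause
    return ends, starts


print(mark_idle_boundaries([0, 10, 20, 500, 510, 1200]))  # ([2, 4], [3, 5])
```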
After candidate start and end event pairs have been screened out, that is, when the probabilities that a pair of events are a start event and an end event respectively exceed the set threshold, the coverage rate of the business process (the ratio of the number of service log records it contains to the total number of log records) is also evaluated: pairs with higher coverage have higher priority, and pairs with low coverage are skipped. Other characteristics may also be taken into account, such as, but not limited to, service length and service operation frequency. For example, it is generally assumed that there is no break between a start event and its end event, and that a service does not consist of only one or two operations.
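The coverage ranking can be sketched as follows. The cut-off `min_coverage` and the way segments are delimited (first start, then the next end) are assumptions for illustration; `top_k=3` mirrors the option of recommending only the top three pairs.

```python
def coverage(events, start, end):
    """Fraction of log records inside segments opened by `start` and closed by `end`."""
    covered, open_at = 0, None
    for i, name in enumerate(events):
        if open_at is None and name == start:
            open_at = i
        elif open_at is not None and name == end:
            covered += i - open_at + 1
            open_at = None
    return covered / len(events) if events else 0.0


def rank_pairs(events, candidate_pairs, min_coverage=0.3, top_k=3):
    """Skip low-coverage pairs and return the rest, highest coverage first."""
    scored = [(coverage(events, s, e), (s, e)) for s, e in candidate_pairs]
    kept = [item for item in scored if item[0] >= min_coverage]
    return sorted(kept, reverse=True)[:top_k]


log = list("SabEzzSaEz")
print(rank_pairs(log, [("a", "b"), ("S", "E")]))  # [(0.7, ('S', 'E'))]
```

The pair ('a', 'b') covers only 2 of the 10 records and is skipped, while ('S', 'E') covers 7 of them.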
Whether for an individual characteristic or for the overall priority rating, different scenarios can be accommodated by adjusting the weights. When there are only a few weights, they can be set manually, after which the calculated start and end event pairs are evaluated for validity. The user may also manually select several start and end event pairs, and the weights are then adjusted so that the manually selected pairs rank near the top.
S4, segmenting the event log according to the determined start and end events. When start events and end events match one to one, the log is split directly; when there are multiple start events and multiple end events, several segmentation modes are possible. In this embodiment, the earliest start event is matched with the earliest end event.
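The matching rule of this embodiment (earliest start paired with the earliest end that follows it) can be sketched as below. The function name `split_log` is hypothetical, and records outside any start/end pair are simply dropped in this illustration.

```python
def split_log(events, start_events, end_events):
    """Cut the log into segments, pairing the earliest start with the earliest end."""
    segments, open_at = [], None
    for i, name in enumerate(events):
        if open_at is None and name in start_events:
            open_at = i                      # earliest unmatched start
        elif open_at is not None and name in end_events:
            segments.append(events[open_at:i + 1])
            open_at = None                   # wait for the next start
    return segments


log = ["S", "a", "S", "b", "E", "x", "E", "S", "c", "E"]
print(split_log(log, {"S"}, {"E"}))
# [['S', 'a', 'S', 'b', 'E'], ['S', 'c', 'E']]
```

A second start seen before an end is kept inside the open segment, and an end with no open start is ignored; other matching policies would give different cuts.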
Example 2
This embodiment provides an automatic segmentation device for flow logs, including:
a preprocessing unit, configured to acquire a flow log collected in advance and to preprocess or map the original attribute values of the flow log to obtain attribute values;
a prediction unit, configured to predict, for any record in the flow log, characteristic values from the attribute values of that record and of the N records preceding it;
a calculating unit, configured to calculate from the characteristic values the probability that each event is a start event, an end event or another event, to compare the probabilities with a preset threshold, and to list a pair of events as a start event and an end event when their probabilities exceed the threshold;
and a segmentation unit, configured to segment the flow log according to a preset segmentation mode after the start and end events among all the events have been determined.
Example 3
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program carries out the steps of the method of Example 1.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for automatically segmenting flow logs, characterized by comprising the following steps:
acquiring a flow log collected in advance, and preprocessing or mapping the original attribute values of the flow log to obtain attribute values;
for any record in the flow log, predicting characteristic values from the attribute values of that record and of the N-1 records preceding it;
calculating from the characteristic values the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when their probabilities exceed the threshold;
and, after determining which of all the events are classified as start and end events, segmenting the flow log according to a preset segmentation mode.
2. The method for automatically splitting flow logs according to claim 1, wherein the characteristic values comprise: a continuous relationship characteristic value, a dependency relationship characteristic value, a data change characteristic value.
3. The method for automatically splitting flow logs according to claim 2, wherein the continuous relationship characteristic value is predicted as follows: the occurrence probability of the current event is predicted from the N-1 consecutive events before it, and this occurrence probability expresses the continuous relation.
4. The method for automatically splitting flow logs according to claim 3, wherein the dependency relationship characteristic value is predicted as follows: the probability that an event is a start event or an end event is predicted from the sequential dependency between pairs of events in the flow log.
5. The method for automatically splitting flow logs according to claim 4, wherein the data change characteristic value is predicted as follows: the probability that an event is a start event or an end event is predicted from whether the data in corresponding attributes changes between the previous event and the current event in the flow log; the data change value is 0 if the data is unchanged and 1 if it has changed.
6. The method of claim 5, wherein calculating from the characteristic values the probability that each event is a start event, an end event or another event, comparing the probabilities with a preset threshold, and listing a pair of events as a start event and an end event when the threshold is exceeded includes:
training a classifier on training data consisting of manually specified start events, end events and other events;
finding every position at which an event occurs and, at each position, using the classifier to predict the probability that the event is the start of a business process, the end of a business process or another event, based on its continuity with the preceding N-1 events, its dependency relation with the preceding event and its data change characteristic; and determining the event to be a start event or an end event when the corresponding probability exceeds a preset threshold, wherein the classification result of the classifier supports manual adjustment.
7. The method according to claim 1, wherein when the probabilities that a pair of events are a start event and an end event respectively exceed a set threshold, the coverage rate of the corresponding business process is also evaluated; pairs with higher coverage receive higher priority and pairs with low coverage are skipped, the coverage rate being the ratio of the number of service log records contained in the business process to the total number of log records.
8. The method according to claim 1, wherein if the time interval between two events in the flow log exceeds a set time, the former event is automatically determined to be an end event and the latter event a start event.
9. An automatic segmentation device for flow logs, characterized by comprising:
a preprocessing unit, configured to acquire a flow log collected in advance and to preprocess or map the original attribute values of the flow log to obtain attribute values;
a prediction unit, configured to predict, for any record in the flow log, characteristic values from the attribute values of that record and of the N records preceding it;
a calculating unit, configured to calculate from the characteristic values the probability that each event is a start event, an end event or another event, to compare the probabilities with a preset threshold, and to list a pair of events as a start event and an end event when their probabilities exceed the threshold;
and a segmentation unit, configured to segment the flow log according to a preset segmentation mode after the start and end events among all the events have been determined.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program when executed by a processor implements the steps of the method of any one of claims 1 to 8.
CN202211038946.8A 2022-08-29 2022-08-29 Automatic segmentation method and device for flow logs and storage medium Pending CN115587081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038946.8A CN115587081A (en) 2022-08-29 2022-08-29 Automatic segmentation method and device for flow logs and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211038946.8A CN115587081A (en) 2022-08-29 2022-08-29 Automatic segmentation method and device for flow logs and storage medium

Publications (1)

Publication Number Publication Date
CN115587081A 2023-01-10

Family

ID=84771555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038946.8A Pending CN115587081A (en) 2022-08-29 2022-08-29 Automatic segmentation method and device for flow logs and storage medium

Country Status (1)

Country Link
CN (1) CN115587081A (en)

Similar Documents

Publication Publication Date Title
US7171620B2 (en) System and method for managing document retention of shared documents
CN108932945B (en) Voice instruction processing method and device
US20080195378A1 (en) Question and Answer Data Editing Device, Question and Answer Data Editing Method and Question Answer Data Editing Program
CN107145445A (en) The automatic analysis method and system of the daily record that reports an error of software automated testing
US20020069197A1 (en) Method and apparatus for categorizing information, and a computer product
CN112579728B (en) Behavior abnormity identification method and device based on mass data full-text retrieval
CN109275047B (en) Video information processing method and device, electronic equipment and storage medium
CN104702492A (en) Garbage message model training method, garbage message identifying method and device thereof
CN109190036B (en) Recommendation method and device, electronic equipment and storage medium
CN110874744B (en) Data anomaly detection method and device
CN112765003B (en) Risk prediction method based on APP behavior log
CN111090822A (en) Business object pushing method and device
CN112508638B (en) Data processing method and device and computer equipment
CN111026961A (en) Method and system for indexing data of interest within multiple data elements
CN111814759A (en) Method and device for acquiring face quality label value, server and storage medium
CN103324641B (en) Information record recommendation method and device
CN109165119B (en) Electronic commerce data processing method and system
CN110795614A (en) Index automatic optimization method and device
CN110378190A (en) Video content detection system and detection method based on topic identification
CN112433993B (en) Network data processing and analyzing system based on computer
CN116484109B (en) Customer portrait analysis system and method based on artificial intelligence
CN115587081A (en) Automatic segmentation method and device for flow logs and storage medium
CN112130759A (en) Parameter configuration method, system and related device of storage system
CN107391551B (en) Web service data analysis method and system based on data mining
CN109409844A (en) The management method and device of netpage user's operation behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination