CN117406972B

CN117406972B - RPA high-value flow instance discovery method and system based on fitness analysis

Info

Publication number: CN117406972B
Application number: CN202311714610.3A
Authority: CN
Inventors: 裴学良; 邓逸; 郑超; 袁水平
Original assignee: Anhui Sigao Intelligent Technology Co ltd
Current assignee: Anhui Sigao Intelligent Technology Co ltd
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-02-13
Anticipated expiration: 2043-12-14
Also published as: CN117406972A

Abstract

The invention provides an RPA high-value flow instance discovery method based on fitness analysis, comprising the following steps. S1: acquiring an interaction log L and preprocessing it to obtain a new interaction log L. S2: clustering the new interaction log L to obtain a clustering result set, and obtaining a flow model set from the clustering result set. S3: simulating the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and calculating a high-value flow instance set from the flow instance set. S4: converting the high-value flow instance set into an RPA executable script. The method clusters the interaction log to obtain the flow model set, obtains flow instances by simulating the running state of the flow models, and determines the high-value flow instances by matching the flow instances against the clusters and calculating the fitness of each match, so that flows with high potential value can be identified automatically.

Description

RPA high-value flow instance discovery method and system based on fitness analysis

Technical Field

The invention relates to the field of process mining and automation, in particular to an RPA high-value process instance discovery method and system based on fitness analysis.

Background

Robotic Process Automation (RPA) automates software processes by simulating manual operation: by interacting with the interface of an application program or system, it completes operations such as data entry, data processing, and system integration, which reduces the time consumed by manual operation, reduces human errors caused by repetitive work, and improves enterprise benefits.

Flow discovery is the most challenging task in the field of flow mining: it takes a serialized event log as input, discovers structured flow traces in that input, and outputs a business flow model. Before implementing RPA, an organization can use flow discovery to gain a comprehensive understanding of a business flow, including its steps, activities, inputs, and outputs, as well as the associated rules and constraints. This provides a basis for developing and configuring RPA robots that can accurately simulate and automatically execute specific business flows.

Conformance checking is one of the important technologies in the field of flow mining: based on the flow model produced by flow discovery, it checks whether the actual flow execution is consistent with the standard flow model, thereby determining whether the business is compliant, and it identifies deviating and non-compliant business events, their routes, and their causes through analysis.

Existing methods for finding high-value RPA flows mainly judge the importance of a flow within the overall scenario on the basis of business logic or subjective experience, in particular by identifying high-frequency flow events and error-prone flow events, so as to exploit the advantages of RPA flow automation. However, such methods depend too heavily on subjective assumptions, and the high-value strategy must be readjusted whenever the business scenario changes.

Disclosure of Invention

In order to solve the technical problems, the invention provides an RPA high-value flow instance discovery method based on fitness analysis, which comprises the following steps:

S1: acquiring an interaction log L, and preprocessing the interaction log L to obtain a new interaction log L;

S2: clustering the new interaction log L to obtain a clustering result set, and obtaining a flow model set from the clustering result set;

S3: simulating the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and calculating a high-value flow instance set from the flow instance set;

S4: converting the high-value flow instance set into an RPA executable script.

Preferably, step S1 specifically includes:

S11: obtaining an interaction log L = {σ_1, σ_2, σ_3, ..., σ_u} generated by operations in an application program, where σ_u = <e_1, ..., e_t> is a flow path and u is the number of the flow path; obtaining each event e_t = (ζ, a, t) in the interaction log L, where ζ is the unique identifier of the flow path to which the event belongs, a is the activity type of the event, and t is the time;

S12: setting a frequency range (0, g) for low-frequency events and counting the number of occurrences of each activity type; if an activity type occurs fewer than g times, deleting from the interaction log L every flow path that contains an event of that activity type;

S13: traversing the events in the remaining flow paths and, whenever two adjacent events have the same activity type, deleting the later event from the flow path, so as to obtain the new interaction log L.
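The low-frequency filtering of step S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the log is held as a dictionary from flow path identifier to a list of (ζ, a, t) tuples, that "occurrences of an event" means occurrences of its activity type across the whole log, and the function name `drop_low_frequency_paths` is hypothetical.

```python
from collections import Counter


def drop_low_frequency_paths(log, g):
    """Remove every flow path containing an activity type seen fewer than g times.

    log: dict mapping a flow path id to a list of (zeta, activity, time) events.
    g:   upper bound of the low-frequency range (0, g); an assumed integer count.
    """
    # Count how often each activity type occurs across the whole log.
    counts = Counter(a for events in log.values() for (_, a, _) in events)
    rare = {a for a, c in counts.items() if c < g}
    # Keep only the flow paths that contain no rare activity type.
    return {
        pid: events
        for pid, events in log.items()
        if not any(a in rare for (_, a, _) in events)
    }


log = {
    "p1": [("p1", "open", 1), ("p1", "type", 2), ("p1", "save", 3)],
    "p2": [("p2", "open", 1), ("p2", "type", 2), ("p2", "save", 3)],
    "p3": [("p3", "open", 1), ("p3", "crash", 2)],
}
filtered = drop_low_frequency_paths(log, g=2)
```

In this example the activity type "crash" occurs once, so the whole path "p3" is dropped while "p1" and "p2" are kept unchanged.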

Preferably, step S2 specifically includes:

S21: obtaining the flow paths of the new interaction log L, where the number of flow path types is n; obtaining the feature vector of each flow path and calculating the similarity between the feature vectors; and, using a clustering algorithm and the similarity, clustering flow paths with similar feature vectors into the same cluster S_x, where x is the number of the cluster and 1 ≤ x ≤ n, so as to obtain a clustering result set S_1, ..., S_n;

S22: inputting the clustering result set into a heuristic process discovery algorithm and outputting a flow model set P_1, ..., P_n.
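The text does not name the heuristic process discovery algorithm used in S22. As one possible illustration, the sketch below assumes the dependency measure of the well-known Heuristics Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), and keeps the edges whose dependency reaches a threshold. The function names and the threshold value are assumptions, and each flow path is reduced to its list of activity types.

```python
from collections import Counter


def dependency_measures(paths):
    """Heuristics-Miner-style dependency for each directly-follows pair (a, b)."""
    follows = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            follows[(a, b)] += 1
    dep = {}
    for (a, b), ab in follows.items():
        if a == b:
            # Self-loop variant of the measure.
            dep[(a, b)] = ab / (ab + 1)
        else:
            ba = follows.get((b, a), 0)
            dep[(a, b)] = (ab - ba) / (ab + ba + 1)
    return dep


def discover_edges(paths, threshold):
    """Keep the edges of the flow model whose dependency is at least threshold."""
    return {pair for pair, d in dependency_measures(paths).items() if d >= threshold}


# One cluster S_x, with each flow path reduced to its activity types.
cluster = [["open", "type", "save"], ["open", "type", "save"], ["open", "save"]]
edges = discover_edges(cluster, threshold=0.6)
```

Here "open" → "type" and "type" → "save" each score 2/3 and are kept, while the rarer shortcut "open" → "save" scores 1/2 and is filtered out.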

Preferably, step S21 specifically includes:

S211: for a flow path σ_l, taking the frequency with which each activity type occurs among the events of the flow path as the type feature α, taking the adjacency relations among the events in the flow path as the transition feature β, and taking γ = [α, β] as the feature vector of the flow path σ_l, where l is the number of the flow path;

S212: randomly selecting n flow paths as cluster centers O = {o_1, o_2, ..., o_n}, where o_n is the n-th cluster center; calculating the feature-vector similarity between every other flow path and each cluster center; assigning each non-center flow path to the cluster S_x of the cluster center with which it has the greatest similarity, to obtain an iterative clustering result set; and calculating a metric value for the iterative clustering result set;

S213: repeating step S212 and selecting the iterative clustering result set with the minimum metric value as the output clustering result set S_1, ..., S_n.
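The feature vector γ = [α, β] of step S211 can be sketched as follows. The encoding is an assumption: α is taken as the raw count of each activity type and β as the count of each ordered pair of adjacent activity types, both laid out over a fixed, caller-supplied list of activity types. The function name `feature_vector` is hypothetical.

```python
from collections import Counter


def feature_vector(path, activities):
    """Build gamma = [alpha, beta] for one flow path.

    path:       list of activity types in time order.
    activities: fixed ordered list of all activity types in the log.
    """
    # alpha: how often each activity type occurs in the path.
    type_counts = Counter(path)
    alpha = [type_counts[a] for a in activities]
    # beta: how often activity b directly follows activity a.
    transition_counts = Counter(zip(path, path[1:]))
    beta = [transition_counts[(a, b)] for a in activities for b in activities]
    return alpha + beta


activities = ["open", "type", "save"]
gamma = feature_vector(["open", "type", "save"], activities)
```

With three activity types the vector has 3 + 9 = 12 components; fixing the order of `activities` makes the vectors of different flow paths directly comparable.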

Preferably, the step S3 specifically includes:

S31: building a flow executor and inputting the flow model set P_1, ..., P_n into the flow executor to obtain a flow instance set corresponding to the flow model set, where the flow instance set contains a flow instance subset for the i-th flow model P_i, I_j^i denotes the j-th flow instance generated by the i-th flow model P_i, tr_λ = (j, a', λ) denotes a node of the flow instance I_j^i, j is the unique identifier of the flow instance, and a' is the activity type of the node;

S32: in the flow instance subset, finding the flow instances that match the flow paths in cluster S_x, counting the number of matches, and calculating the fitness of each match;

S33: setting a frequency range (0, μ) for low-frequency flow instances and, for each flow instance in the flow instance subset whose number of matches is greater than μ, calculating the mean of its fitness values to obtain the value of the flow instance;

S34: setting a value threshold θ and taking the flow instances whose value is greater than θ as high-value flow instances, to obtain the high-value flow instance set Q_x of cluster S_x;

S35: repeating steps S32-S34 to obtain the high-value flow instance sets Q_1, ..., Q_n of all clusters.

Preferably, step S32 specifically includes:

S321: setting a skip cost and an insertion cost for nodes of activity type a'; the sets A^skip and A^insert record the nodes skipped and inserted during each traversal, and both sets are initially empty;

S322: to obtain A^skip and A^insert during matching, selecting a flow path σ_u^x = <e_1, ..., e_t> in cluster S_x and traversing the events in σ_u^x in time order, while maintaining a sequence of the flow instance nodes matched by the traversal up to time m, where t is the time, m is the number of time steps traversed, and temp is the length of the sequence;

S323: when t = m+1, converting the event e_(m+1) = (k, a, m+1) in σ_u^x into a node of a flow instance and inserting it into the sequence to generate a new sequence, using an operator that converts an input event into a node of a flow instance;

judging whether the new sequence exists in the flow model P_x; if it exists, updating the sequence, letting m = m+1, and returning to step S323;

otherwise, the sequence is considered invalid, and the process proceeds to step S324;

S324: considering inserting several nodes into the sequence, or skipping the latest node, until a sequence that is valid for the flow model P_x is obtained;

if both inserting nodes and skipping the node can make the sequence valid, calculating a first cost and a second cost, selecting the operation with the lower cost, updating the corresponding set A^skip or A^insert, letting m = m+1, and returning to step S323; here the inserted nodes form a set, π is the number of a node, and the sets are updated by a merging (union) operation;

S325: repeating steps S323-S324 until all events in the flow path σ_u^x have been traversed; at this point the sequence is valid for the flow model P_x, that is, there exists a flow instance I_j^x in the flow instance subset that matches the sequence;

S326: based on A^skip and A^insert, calculating the fitness of the matched flow instance I_j^x; the fitness is calculated by the following formula:

where z_j is the number of times the flow instance I_j^x has been matched by flow paths in cluster S_x, str.a' is the activity type of a node in the set A^skip, itr.a' is the activity type of a node in the set A^insert, and tr.a' is the activity type of a node in the flow instance I_j^x;

updating z_j = z_j + 1 and resetting the sets A^skip and A^insert to empty;

S327: traversing the flow paths in cluster S_x and repeating steps S322-S326, so as to obtain, for the flow instances of P_x, the number of times each is matched by the flow paths in cluster S_x and the fitness of each match.
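The idea of matching a flow path against a flow instance with skip and insertion costs can be illustrated with a small dynamic-programming alignment. This is a swapped-in technique, not the incremental traversal of S322-S325, and the fitness formula of S326 is not reproduced in this text; the sketch assumes the alignment-based fitness commonly used in conformance checking, 1 minus the cheapest alignment cost divided by the worst-case cost of skipping every event and inserting every node. All names are hypothetical.

```python
def align(path, instance, skip_cost, insert_cost):
    """Cheapest alignment of a flow path with a flow instance.

    Returns (cost, skipped, inserted), where skipped plays the role of A^skip
    and inserted the role of A^insert. Inputs are lists of activity types.
    """
    n, m = len(path), len(instance)
    # best[i][j] aligns the first i events with the first j instance nodes.
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0.0, [], [])
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:  # skip the event path[i-1]
                c, s, ins = best[i - 1][j]
                options.append((c + skip_cost[path[i - 1]], s + [path[i - 1]], ins))
            if j > 0:  # insert the instance node instance[j-1]
                c, s, ins = best[i][j - 1]
                options.append((c + insert_cost[instance[j - 1]], s, ins + [instance[j - 1]]))
            if i > 0 and j > 0 and path[i - 1] == instance[j - 1]:
                options.append(best[i - 1][j - 1])  # synchronous move, no cost
            best[i][j] = min(options, key=lambda o: o[0])
    return best[n][m]


def fitness(path, instance, skip_cost, insert_cost):
    """Assumed fitness: 1 - cheapest cost / worst-case cost."""
    cost, _, _ = align(path, instance, skip_cost, insert_cost)
    worst = sum(skip_cost[a] for a in path) + sum(insert_cost[a] for a in instance)
    return 1.0 - cost / worst if worst else 1.0


costs = {"open": 1.0, "type": 1.0, "save": 1.0}
cost, skipped, inserted = align(["open", "type", "save"], ["open", "save"], costs, costs)
fit = fitness(["open", "type", "save"], ["open", "save"], costs, costs)
```

In the example the cheapest alignment skips the single event "type", so the cost is 1 against a worst case of 5. Raising the skip cost of an activity type makes deviations on that activity weigh more heavily, which corresponds to the per-activity cost adjustment the description mentions.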

Preferably, the calculation formula of the value of the flow instance is:

where Score(I_j^x) is the value of the flow instance I_j^x, I_j^x is the j-th flow instance in the flow instance subset, z_j is the number of times the flow instance I_j^x has been matched by flow paths in cluster S_x, and each fitness value of the flow instance I_j^x is an element of the set V_j^x.
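Because the value formula itself is not reproduced in this text, the following sketch relies on the wording of steps S33 and S34: the value of a flow instance is assumed to be the mean of its recorded fitness values when its number of matches exceeds μ, and 0 otherwise (the zero for low-frequency instances is an assumption). Instances whose value exceeds θ are then selected. The function names are hypothetical.

```python
def score(fitness_values, mu):
    """Assumed value of one flow instance: mean fitness if matched more than mu times."""
    z = len(fitness_values)  # z_j: number of matches recorded for the instance
    if z <= mu:
        return 0.0  # low-frequency instance, treated as having no value
    return sum(fitness_values) / z


def high_value_instances(fitness_by_instance, mu, theta):
    """Return the identifiers of instances whose value exceeds the threshold theta."""
    return [
        name
        for name, values in fitness_by_instance.items()
        if score(values, mu) > theta
    ]


# V_j^x for three flow instances of one cluster (illustrative numbers only).
recorded = {
    "I1": [0.9, 0.8, 1.0],
    "I2": [1.0],
    "I3": [0.4, 0.5, 0.3],
}
selected = high_value_instances(recorded, mu=2, theta=0.7)
```

"I2" fits perfectly but was matched only once, so it is excluded as low-frequency; "I3" is matched often but fits poorly; only "I1" is selected.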

Preferably, step S4 specifically includes:

S41: for every high-value flow instance in the high-value flow instance set, if parallel flow instance branches exist, merging all nodes on the parallel branches into a new node; if a node exists whose functional complexity exceeds that of a single RPA flow step, splitting the node into several child nodes, so as to obtain a processed high-value flow instance set;

S42: converting the processed high-value flow instance set into an RPA executable script by means of an RPA instruction converter.

An RPA high-value flow instance discovery system based on fitness analysis comprises the following modules:

the preprocessing module is used for acquiring an interaction log L, preprocessing the interaction log L and acquiring a new interaction log L;

the flow model set acquisition module is used for clustering the new interaction logs L to obtain a clustering result set, and obtaining a flow model set through the clustering result set;

the high-value flow instance set acquisition module is used for simulating the running state of each flow model in the flow model set through the flow executor to obtain a flow instance set, and obtaining the high-value flow instance set through the calculation of the flow instance set;

and the executable script generation module is used for converting the high-value flow instance set into the RPA executable script.

The invention has the following beneficial effects:

the method can identify the flow with high potential value in an automatic mode, reduce the requirements of manual intervention and subjective judgment, and improve the flow automation benefit.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

The objects, functional features, and advantages of the present invention are further described below with reference to the accompanying drawings and the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, the application provides an RPA high-value process instance discovery method based on fitness analysis, which determines a high-value process instance by comparing the fitness of an actual process instance and a standard process, so that a process with high potential value can be more easily identified.

The method can be carried out automatically, which reduces the need for manual intervention and subjective judgment. It can also be customized to different business requirements and flow characteristics; for example, an enterprise can adjust the costs assigned to different activity types and the fitness threshold according to actual conditions, so as to adapt to different business scenarios and flow changes.

In addition, enterprises may gain insight and understanding of the process instances through fitness-based methods. Having identified high value flow instances, the enterprise can further analyze and evaluate the impact and potential of these instances, providing powerful support for decision making and optimization.

The method comprises the following steps:

Further, the step S1 specifically includes:

S11: obtaining an interaction log L = {σ_1, σ_2, σ_3, ..., σ_u} generated by operations in an application program, where σ_u = <e_1, ..., e_t> is a flow path and u is the number of the flow path; obtaining each event e_t = (ζ, a, t) in the interaction log L, where ζ is the unique identifier of the flow path to which the event belongs, a is the activity type of the event, and t is the time;

S13: traversing the events in the remaining flow paths and, whenever two adjacent events have the same activity type, deleting the later event from the flow path, so as to obtain the new interaction log L;

Specifically, if two adjacent events e_t = (ζ, a, t) and e_(t+1) = (ζ, a, t+1) have the same activity type, the repeated event is removed from the flow path and only the first-occurring event e_t is retained.
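The removal of adjacent repeated events can be sketched with `itertools.groupby`, which groups consecutive items sharing a key. This is a minimal sketch assuming events are (ζ, a, t) tuples already sorted by time; the function name `collapse_repeats` is hypothetical.

```python
from itertools import groupby


def collapse_repeats(path):
    """Keep only the first event of every run of adjacent events with the same activity type."""
    # groupby yields one group per run of consecutive equal keys;
    # next(run) takes the earliest event of that run.
    return [next(run) for _, run in groupby(path, key=lambda event: event[1])]


path = [
    ("p1", "open", 1),
    ("p1", "type", 2),
    ("p1", "type", 3),
    ("p1", "save", 4),
    ("p1", "type", 5),
]
collapsed = collapse_repeats(path)
```

Only adjacent repeats are removed: the "type" event at time 3 is dropped, while the later "type" event at time 5 is kept because a "save" event separates it from the earlier run.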

Further, the step S2 specifically includes:

Specifically, the flow paths are encoded as features under an event-type view and an event-transition view, the number n of flow path types is set, the feature-vector similarity between all flow paths is calculated, and flow paths with similar feature values are clustered into the same cluster by a clustering algorithm;

Further, step S21 specifically includes:

S212: randomly selecting n flow paths as cluster centers O = {o_1, o_2, ..., o_n}, where o_n is the n-th cluster center; calculating the feature-vector similarity between every other flow path and each cluster center; assigning each non-center flow path to the cluster S_x of the cluster center with which it has the greatest similarity, to obtain an iterative clustering result set; and calculating a metric value for the iterative clustering result set;

Specifically, the feature similarity between any flow path and a cluster center is calculated by the following formula:

where the norm in the formula is the two-norm of a vector, and the cluster center is the x-th selected cluster center.

For each flow path that is not a cluster center, the cluster center with the greatest feature similarity is selected, and the flow path joins that center's cluster;

A metric function evaluates the average similarity between the flow paths and their center points after a given round of clustering, computed over the clustering result obtained with the selected cluster centers;

The metric value obtained in each iteration is compared with the metric value retained so far and the retained value is updated accordingly; when the metric value converges or the maximum number of iterations is exceeded, the cluster division corresponding to the retained metric value is taken as the result of clustering the interaction log by flow path features.
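The repeated random-center clustering of steps S212 and S213 can be sketched as below. Since the similarity formula is not reproduced in this text, the sketch assumes cosine similarity (which uses the two-norm) and uses the average of 1 minus similarity as the metric, so that a smaller metric is better, in line with the "minimum metric value" of S213. The fixed iteration count, the seed, and all function names are assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two feature vectors, using the two-norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def cluster_once(vectors, n, rng):
    """One round of S212: random centers, then nearest-center assignment."""
    centers = rng.sample(range(len(vectors)), n)
    clusters = {c: [c] for c in centers}
    dissimilarity = 0.0
    for i, v in enumerate(vectors):
        if i in clusters:
            continue  # cluster centers are already placed
        best = max(centers, key=lambda c: cosine(v, vectors[c]))
        clusters[best].append(i)
        dissimilarity += 1.0 - cosine(v, vectors[best])
    others = len(vectors) - n
    metric = dissimilarity / others if others else 0.0
    return list(clusters.values()), metric


def cluster(vectors, n, iterations=20, seed=0):
    """S213: repeat the round and keep the result with the smallest metric."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        result, metric = cluster_once(vectors, n, rng)
        if best is None or metric < best[1]:
            best = (result, metric)
    return best


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters, metric = cluster(vectors, n=2)
```

Each returned cluster is a list of indices into `vectors`, with the cluster center first. A convergence test on the metric could replace the fixed number of iterations.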

Further, the step S3 specifically includes:

S31: building a flow executor and inputting the flow model set P_1, ..., P_n into the flow executor to obtain a flow instance set corresponding to the flow model set, where the flow instance set contains a flow instance subset for the i-th flow model P_i, I_j^i denotes the j-th flow instance generated by the i-th flow model P_i, tr_λ = (j, a', λ) denotes a node of the flow instance I_j^i, j is the unique identifier of the flow instance, and a' is the activity type of the node;

S33: setting a frequency range (0, μ) for low-frequency flow instances and, for each flow instance in the flow instance subset whose number of matches is greater than μ, calculating the mean of its fitness values to obtain the value of the flow instance;

Further, the step S32 specifically includes:

S321: setting a skip cost and an insertion cost for nodes of activity type a'; the sets A^skip and A^insert record the nodes skipped and inserted during each traversal, and both sets are initially empty;

S323: when t = m+1, converting the event e_(m+1) = (k, a, m+1) in σ_u^x into a node of a flow instance and inserting it into the sequence to generate a new sequence, using an operator that converts an input event into a node of a flow instance;

otherwise, the sequence is considered invalid, and the process proceeds to step S324;

updating z_j = z_j + 1 and resetting the sets A^skip and A^insert to empty;

Further, the calculation formula of the value of the flow instance is:

Further, the step S4 specifically includes:

Specifically, each flow node is converted into a Script-Action tag in the script, and the operation attributes and operation targets required by the flow node are converted into Script-Command tags; each Script-Command contains the complete interaction information needed to direct one basic operation, and the Script-Commands are arranged in the order of the operations.
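The conversion of flow nodes into Script-Action and Script-Command tags can be sketched with Python's standard `xml.etree.ElementTree` module. Only the two tag names come from the description; the root tag `Script`, the attribute names `name`, `attribute`, and `target`, the input dictionary layout, and the function name `to_script` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_script(instance):
    """Serialize one high-value flow instance as an XML script string.

    instance: list of nodes in execution order; each node is a dict with an
              "activity" name and a list of "commands" (attribute and target).
    """
    root = ET.Element("Script")  # hypothetical root tag
    for node in instance:
        # One Script-Action per flow node.
        action = ET.SubElement(root, "Script-Action", {"name": node["activity"]})
        # One Script-Command per basic operation, kept in operation order.
        for command in node["commands"]:
            ET.SubElement(
                action,
                "Script-Command",
                {"attribute": command["attribute"], "target": command["target"]},
            )
    return ET.tostring(root, encoding="unicode")


instance = [
    {"activity": "login", "commands": [{"attribute": "input", "target": "#username"}]},
    {"activity": "submit", "commands": [{"attribute": "click", "target": "#ok"}]},
]
xml_text = to_script(instance)
```

Keeping the nodes and commands in list order preserves the operation sequence in the generated script, which is what the description requires of the Script-Command arrangement.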

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. do not denote any order, but rather the terms first, second, third, etc. are used to interpret the terms as labels.

The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. The RPA high-value flow instance discovery method based on fitting degree analysis is characterized by comprising the following steps:

S4: converting the high-value flow instance set into an RPA executable script;

the step S1 specifically comprises the following steps:

the step S2 specifically comprises the following steps:

S21: obtaining the flow paths of the new interaction log L, where the number of flow path types is n; obtaining the feature vector of each flow path and calculating the similarity between the feature vectors; and, using a clustering algorithm and the similarity, clustering flow paths with similar feature vectors into the same cluster S_x, where x is the number of the cluster and 1 ≤ x ≤ n, so as to obtain a clustering result set S_1, ..., S_n;

S22: inputting the clustering result set into a heuristic process discovery algorithm and outputting a flow model set P_1, ..., P_n;

The step S3 specifically comprises the following steps:

S35: repeating steps S32-S34 to obtain the high-value flow instance sets Q_1, ..., Q_n of all clusters;

The step S32 specifically includes:

S321: setting a skip cost and an insertion cost for nodes of activity type a'; the sets A^skip and A^insert record the nodes skipped and inserted during each traversal, and both sets are initially empty;

judging whether the new sequence exists in the flow model P_x; if it exists, updating the sequence, letting m = m+1, and returning to step S323;

otherwise, the sequence is considered invalid, and the process proceeds to step S324;

S324: considering inserting several nodes into the sequence, or skipping the latest node, until a sequence that is valid for the flow model P_x is obtained;

if both inserting nodes and skipping the node can make the sequence valid, calculating a first cost and a second cost, selecting the operation with the lower cost, updating the corresponding set A^skip or A^insert, letting m = m+1, and returning to step S323; here the inserted nodes form a set, π is the number of a node, and the sets are updated by a merging (union) operation;

S325: repeating steps S323-S324 until all events in the flow path σ_u^x have been traversed; at this point the sequence is valid for the flow model P_x, that is, there exists a flow instance I_j^x in the flow instance subset that matches the sequence;

S326: based on A^skip and A^insert, calculating the fitness of the matched flow instance I_j^x; the fitness is calculated by the following formula:

updating z_j = z_j + 1 and resetting the sets A^skip and A^insert to empty;

2. The RPA high-value flow instance discovery method based on fitness analysis according to claim 1, wherein step S21 specifically comprises:

S212: randomly selecting n flow paths as cluster centers O = {o_1, o_2, ..., o_n}, where o_n is the n-th cluster center; calculating the feature-vector similarity between every other flow path and each cluster center; assigning each non-center flow path to the cluster S_x of the cluster center with which it has the greatest similarity, to obtain an iterative clustering result set; and calculating a metric value for the iterative clustering result set;

3. The RPA high-value process instance discovery method based on fitness analysis according to claim 1, wherein a calculation formula of a value of a process instance is:

4. The RPA high-value flow instance discovery method based on fitness analysis according to claim 1, wherein step S4 specifically comprises:

5. An RPA high value flow instance discovery system based on fitness analysis, comprising:

the executable script generation module is used for converting the high-value flow instance set into an RPA executable script;

the work flow of the preprocessing module is specifically as follows:

the workflow of the flow model set acquisition module is specifically:

The workflow of the high-value flow instance set acquisition module is specifically:

The step S32 specifically includes:

S322: to obtain A^skip and A^insert during matching, selecting a flow path σ_u^x = <e_1, ..., e_t> in cluster S_x and traversing the events in σ_u^x in time order, while maintaining a sequence of the flow instance nodes matched by the traversal up to time m, where t is the time, m is the number of time steps traversed, and temp is the length of the sequence;

otherwise, the sequence is considered invalid, and the process proceeds to step S324;

updating z_j = z_j + 1 and resetting the sets A^skip and A^insert to empty;