CN117406972B - RPA high-value flow instance discovery method and system based on fitness analysis - Google Patents

Publication number
CN117406972B
Authority
CN
China
Prior art keywords
flow
instance
value
event
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311714610.3A
Other languages
Chinese (zh)
Other versions
CN117406972A (en)
Inventor
裴学良
邓逸
郑超
袁水平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd
Priority to CN202311714610.3A
Publication of CN117406972A
Application granted
Publication of CN117406972B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention provides an RPA high-value flow instance discovery method based on fitness analysis, which comprises the following steps: S1: acquiring an interaction log L and preprocessing it to obtain a new interaction log L; S2: clustering the new interaction log L to obtain a clustering result set, and deriving a flow model set from the clustering result set; S3: simulating the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and computing a high-value flow instance set from the flow instance set; S4: converting the high-value flow instance set into an RPA executable script. The method clusters the interaction log to obtain the flow model set, obtains flow instances by simulating the running state of each flow model, and determines the high-value flow instances by matching the flow instances against the clusters and computing the fitness of each match, so that flows with high potential value can be identified automatically.

Description

RPA high-value flow instance discovery method and system based on fitness analysis
Technical Field
The invention relates to the field of process mining and automation, in particular to an RPA high-value process instance discovery method and system based on fitness analysis.
Background
Robotic Process Automation (RPA) is a form of software process automation that simulates manual operation: by interacting with the interface of an application program or system, it completes operations such as data entry, data processing, and system integration. This reduces the time consumed by manual work, reduces the human errors caused by repetitive work, and improves enterprise efficiency.
Flow discovery is the most challenging task in the flow mining field: it takes a serialized event log as input, discovers structured flow traces in it, and outputs a business flow model. Before RPA is implemented, flow discovery lets an organization gain a comprehensive understanding of the business process, including its steps, activities, inputs and outputs, and the associated rules and constraints. This provides the basis for developing and configuring RPA robots that can accurately simulate and automatically perform specific business processes.
Conformance checking is one of the important techniques in the flow mining field: based on the flow model produced by flow discovery, it checks whether the actual flow execution is consistent with the standard flow model, infers from this whether the business is compliant, and analyzes deviating and non-compliant business events, their routes, and their causes.
Existing methods for finding high-value RPA flows mainly judge the importance of a flow within the overall scenario from business logic or subjective experience, in particular by identifying high-frequency and error-prone flow events, so as to exploit the advantages of RPA automation. However, such methods depend too heavily on subjective assumptions, and the high-value strategy must be readjusted whenever the business scenario changes.
Disclosure of Invention
In order to solve the technical problems, the invention provides an RPA high-value flow instance discovery method based on fitness analysis, which comprises the following steps:
S1: acquire an interaction log L and preprocess it to obtain a new interaction log L;
S2: cluster the new interaction log L to obtain a clustering result set, and derive a flow model set from the clustering result set;
S3: simulate the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and compute a high-value flow instance set from the flow instance set;
S4: convert the high-value flow instance set into an RPA executable script.
Preferably, step S1 specifically includes:
S11: acquire the interaction log $L = \{\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_u\}$ generated by operations in an application program, where $\sigma_u = \langle e_1, \dots, e_t \rangle$ is a flow path and $u$ is the number of the flow path; acquire each event $e_t = (\zeta, a, t)$ in the interaction log L, where $\zeta$ is the unique identifier of the flow path to which the event belongs, $a$ is the activity type of the event, and $t$ is the time;
S12: set the frequency range $(0, g)$ of low-frequency events and count the occurrences of each activity type; if an activity type occurs fewer than $g$ times, delete from the interaction log L every flow path that contains an event of that activity type;
S13: traverse the events in the remaining flow paths and, whenever two adjacent events have the same activity type, delete the later event from the flow path, thereby obtaining the new interaction log L.
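Steps S12 and S13 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each event is a `(path_id, activity, time)` tuple, that frequency is counted per activity type across the whole log, and the function name `preprocess` and the sample log are hypothetical.

```python
from collections import Counter

def preprocess(log, g):
    """log: list of flow paths, each a list of (path_id, activity, time) events."""
    # S12: count how often each activity type occurs across the whole log.
    counts = Counter(a for path in log for (_, a, _) in path)
    rare = {a for a, c in counts.items() if c < g}
    # Drop every flow path that contains an event of a low-frequency type.
    kept = [p for p in log if not any(a in rare for (_, a, _) in p)]
    # S13: collapse runs of adjacent events with the same activity type,
    # keeping only the first event of each run.
    cleaned = []
    for path in kept:
        out = []
        for ev in path:
            if not out or out[-1][1] != ev[1]:
                out.append(ev)
        cleaned.append(out)
    return cleaned

# Hypothetical sample log: "crash" occurs once, so path p3 is removed for g = 2.
log = [
    [("p1", "open", 1), ("p1", "type", 2), ("p1", "type", 3), ("p1", "save", 4)],
    [("p2", "open", 1), ("p2", "type", 2), ("p2", "save", 3)],
    [("p3", "open", 1), ("p3", "crash", 2)],
]
new_log = preprocess(log, g=2)
```

In this sample, the repeated `type` event at time 3 is dropped from path `p1`, while the first occurrence at time 2 is kept.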
Preferably, step S2 specifically includes:
S21: obtain the flow paths of the new interaction log L, where the number of flow path categories is $n$; obtain the feature vector of each flow path; compute the similarity between the feature vectors of the flow paths; and use a clustering algorithm with this similarity to group flow paths with similar feature vectors into the same cluster $S_x$, where $x$ is the number of the cluster and $1 \le x \le n$, thereby obtaining the clustering result set $S_1, \dots, S_n$;
S22: input the clustering result set into a heuristic process discovery algorithm and output the flow model set $P_1, \dots, P_n$.
Preferably, step S21 specifically includes:
S211: for a flow path $\sigma_l$, take the frequency with which each activity type occurs among its events as the type feature $\alpha$, take the adjacency relationships between its events as the transition feature $\beta$, and use $\gamma = [\alpha, \beta]$ as the feature vector of $\sigma_l$, where $l$ is the number of the flow path;
S212: randomly select $n$ flow paths as the cluster centers $O = \{o_1, o_2, \dots, o_n\}$, where $o_n$ is the $n$-th cluster center; compute the feature-vector similarity between every other flow path and each cluster center; assign each non-center flow path to the cluster $S_x$ of the cluster center with which its similarity is largest, yielding the clustering result set of this iteration; and compute the metric value of that clustering result set;
S213: repeat step S212 and select the iteration's clustering result set with the smallest metric value as the output clustering result set $S_1, \dots, S_n$.
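The feature vector of step S211 can be illustrated with a short sketch. It assumes that the type feature is a count per activity type and the transition feature is a count per ordered pair of adjacent activity types, both laid out over a fixed alphabet; the text does not specify the exact encoding, and the name `feature_vector` is hypothetical.

```python
from collections import Counter

def feature_vector(activities, alphabet):
    """activities: ordered activity types of one flow path.
    alphabet: fixed, ordered list of all activity types in the log."""
    type_counts = Counter(activities)
    pair_counts = Counter(zip(activities, activities[1:]))
    # alpha: how often each activity type occurs in the path.
    alpha = [type_counts[a] for a in alphabet]
    # beta: how often each ordered pair (a, b) occurs as adjacent events.
    beta = [pair_counts[(a, b)] for a in alphabet for b in alphabet]
    return alpha + beta  # gamma = [alpha, beta]

alphabet = ["open", "save", "type"]
gamma = feature_vector(["open", "type", "save"], alphabet)
```

Using one shared alphabet for all flow paths keeps every vector the same length, so any two paths can be compared position by position.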
Preferably, the step S3 specifically includes:
S31: build a flow executor and input the flow model set $P_1, \dots, P_n$ into it to obtain the flow instance set corresponding to the flow model set, in which each flow model $P_i$ has its own subset of flow instances; $I_j^i$ denotes the $j$-th flow instance generated by the $i$-th flow model $P_i$, and $tr_\lambda = (j, a', \lambda)$ denotes a node of the flow instance $I_j^i$, where $j$ is the unique identifier of the flow instance and $a'$ is the activity type of the node;
S32: in the flow instance subset of flow model $P_x$, find the flow instances that match the flow paths of cluster $S_x$, count the number of matches of each flow instance, and compute the fitness of each match;
S33: set the frequency range $(0, \mu)$ of low-frequency flow instances and, for each flow instance in the subset whose number of matches is greater than $\mu$, compute the average of its fitness values to obtain the value of the flow instance;
S34: set a value threshold $\theta$, take the flow instances whose value is greater than $\theta$ as high-value flow instances, and obtain the high-value flow instance set $Q_x$ of cluster $S_x$;
S35: repeat steps S32-S34 to obtain the high-value flow instance sets $Q_1, \dots, Q_n$ of all clusters.
Preferably, step S32 specifically includes:
S321: assign each node of activity type $a'$ a skip cost and an insertion cost (the cost symbols are not reproduced in the source text); the sets $A_{skip}$ and $A_{insert}$ record the nodes skipped and inserted during each traversal, and both are initially empty;
S322: to obtain $A_{skip}$ and $A_{insert}$ during matching, select a flow path $\sigma_u^x = \langle e_1, \dots, e_t \rangle$ in cluster $S_x$ and traverse the events of $\sigma_u^x$ in time order; set a sequence that represents the flow instance nodes matched by the traversal up to time $m$, where $t$ is the time, $m$ is the number of events traversed, and $temp$ is the length of the sequence;
S323: when $t = m + 1$, convert the event $e_{m+1} = (k, a, m+1)$ of $\sigma_u^x$ into a flow instance node by means of a conversion operator, which maps an input event to a node of a flow instance, and append the node to the matched sequence to generate a candidate sequence;
determine whether the candidate sequence exists in the flow model $P_x$; if it does, update the matched sequence to the candidate sequence, let $m = m + 1$, and return to step S323;
otherwise the candidate sequence is considered invalid, and the process advances to step S324;
S324: consider inserting several nodes into the sequence or skipping the latest node until a sequence that is valid for the flow model $P_x$ is obtained;
if both inserting nodes and skipping the node can make the sequence valid, compute the first cost and the second cost of the two operations, select the operation with the lower cost, and update the set $A_{skip}$ or $A_{insert}$ accordingly with a merging operation; let $m = m + 1$ and return to step S323; here the inserted nodes form a set and $\pi$ is the number of a node;
S325: repeat steps S323-S324 until all events in the flow path $\sigma_u^x$ have been traversed; at this point the matched sequence is valid for the flow model $P_x$, that is, there is a flow instance $I_j^x$ in the flow instance subset of $P_x$ that matches the sequence exactly;
S326: based on the sets $A_{skip}$ and $A_{insert}$, compute the fitness of the matched flow instance $I_j^x$. The fitness formula (not reproduced in the source text) is expressed in terms of the following quantities: $z_j$ is the number of times the flow instance $I_j^x$ has been matched by flow paths in cluster $S_x$, $str.a'$ is the activity type of a node in the set $A_{skip}$, $itr.a'$ is the activity type of a node in the set $A_{insert}$, and $tr.a'$ is the activity type of a node in the flow instance $I_j^x$;
update $z_j = z_j + 1$ and reset the sets $A_{skip}$ and $A_{insert}$ to empty;
S327: traverse the flow paths in cluster $S_x$, repeating steps S322-S326, to obtain the number of times each flow instance of $P_x$ is matched by the flow paths of cluster $S_x$ and the fitness of each match.
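The skip-or-insert matching of steps S321-S325 can be illustrated with a simplified sketch. Note the substitution: the text walks the flow path greedily against the flow model, whereas this sketch uses dynamic programming to align a flow path with one already chosen candidate flow instance, allowing only skips (dropping a path event) and insertions (adding an instance node). The function name `align`, the list-of-activity-types representation, and the unit costs are assumptions for illustration.

```python
def align(path, instance, skip_cost, insert_cost):
    """Cheapest way to turn `path` into `instance` using only skips and insertions.
    Returns (total_cost, skipped_activities, inserted_activities)."""
    n, m = len(path), len(instance)
    # best[i][j] = (cost, skipped, inserted) for path[i:] versus instance[j:]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[n][m] = (0.0, [], [])
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m and path[i] == instance[j]:
                options.append(best[i + 1][j + 1])  # activities agree: no cost
            if i < n:  # skip the path event (recorded in A_skip)
                c, s, ins = best[i + 1][j]
                options.append((c + skip_cost[path[i]], [path[i]] + s, ins))
            if j < m:  # insert the instance node (recorded in A_insert)
                c, s, ins = best[i][j + 1]
                options.append((c + insert_cost[instance[j]], s, [instance[j]] + ins))
            best[i][j] = min(options, key=lambda o: o[0])
    return best[0][0]

# Hypothetical unit costs for every activity type.
costs = dict.fromkeys(["open", "popup", "type", "check", "save"], 1.0)
cost, a_skip, a_insert = align(
    ["open", "popup", "type", "save"],   # observed flow path
    ["open", "type", "check", "save"],   # candidate flow instance
    skip_cost=costs, insert_cost=costs,
)
```

Here the cheapest alignment skips the `popup` event and inserts the `check` node. A greedy walk as described in the text makes the same kind of local skip-versus-insert decision at each invalid step, but it is not guaranteed to reach the globally cheapest alignment.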
Preferably, the calculation formula of the value of the flow instance is:
(formula not reproduced in the source text)
where $Score(I_j^x)$ is the value of the flow instance $I_j^x$, $I_j^x$ is the $j$-th flow instance in the flow instance subset of $P_x$, $z_j$ is the number of times $I_j^x$ is matched by flow paths in cluster $S_x$, and $V_j^x$ collects the fitness values of $I_j^x$, each fitness value being an element of $V_j^x$.
Preferably, step S4 specifically includes:
S41: for every high-value flow instance in the high-value flow instance set, if parallel flow instance branches exist, merge all nodes on the parallel branches into one new node; if a node's functional complexity exceeds that of a single RPA flow step, split the node into several child nodes; this yields the processed high-value flow instance set;
S42: convert the processed high-value flow instance set into an RPA executable script by means of an RPA instruction converter.
An RPA high-value flow instance discovery system based on fitness analysis comprises the following modules:
a preprocessing module, configured to acquire an interaction log L and preprocess it to obtain a new interaction log L;
a flow model set acquisition module, configured to cluster the new interaction log L to obtain a clustering result set and to derive a flow model set from the clustering result set;
a high-value flow instance set acquisition module, configured to simulate the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and to compute the high-value flow instance set from the flow instance set;
an executable script generation module, configured to convert the high-value flow instance set into an RPA executable script.
The invention has the following beneficial effects:
The method identifies flows with high potential value automatically, reduces the need for manual intervention and subjective judgment, and improves the benefit of flow automation.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
The objects, functional features, and advantages of the present invention are further described below with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, the application provides an RPA high-value flow instance discovery method based on fitness analysis, which determines high-value flow instances by comparing the fitness of actual flow instances against the standard flow, so that flows with high potential value are easier to identify.
The method can be carried out automatically, which reduces the need for manual intervention and subjective judgment. It can also be customized to different business requirements and flow characteristics; for example, an enterprise can adjust the cost values of different activity types and the fitness threshold to suit its actual conditions, so as to adapt to different business scenarios and flow changes.
In addition, the fitness-based method gives an enterprise deeper insight into its flow instances. Once high-value flow instances have been identified, the enterprise can further analyze and evaluate their impact and potential, which supports decision-making and optimization.
The method comprises the following steps:
S1: acquire an interaction log L and preprocess it to obtain a new interaction log L;
S2: cluster the new interaction log L to obtain a clustering result set, and derive a flow model set from the clustering result set;
S3: simulate the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and compute a high-value flow instance set from the flow instance set;
S4: convert the high-value flow instance set into an RPA executable script.
Further, the step S1 specifically includes:
S11: acquire the interaction log $L = \{\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_u\}$ generated by operations in an application program, where $\sigma_u = \langle e_1, \dots, e_t \rangle$ is a flow path and $u$ is the number of the flow path; acquire each event $e_t = (\zeta, a, t)$ in the interaction log L, where $\zeta$ is the unique identifier of the flow path to which the event belongs, $a$ is the activity type of the event, and $t$ is the time;
S12: set the frequency range $(0, g)$ of low-frequency events and count the occurrences of each activity type; if an activity type occurs fewer than $g$ times, delete from the interaction log L every flow path that contains an event of that activity type;
S13: traverse the events in the remaining flow paths and, whenever two adjacent events have the same activity type, delete the later event from the flow path, thereby obtaining the new interaction log L.
Specifically, if two immediately adjacent events $e_t = (\zeta, a, t)$ and $e_{t+1} = (\zeta, a, t+1)$ have the same activity type, the repeated event is removed from the flow path and only the first occurrence $e_t$ is kept.
Further, the step S2 specifically includes:
S21: obtain the flow paths of the new interaction log L, where the number of flow path categories is $n$; obtain the feature vector of each flow path; compute the similarity between the feature vectors of the flow paths; and use a clustering algorithm with this similarity to group flow paths with similar feature vectors into the same cluster $S_x$, where $x$ is the number of the cluster and $1 \le x \le n$, thereby obtaining the clustering result set $S_1, \dots, S_n$;
Specifically, the flow paths are encoded as features from two views, the event type view and the event transition view; the number $n$ of flow path categories is set; the feature-vector similarity between all flow paths is computed; and a clustering algorithm groups flow paths with similar feature values into the same cluster;
S22: input the clustering result set into a heuristic process discovery algorithm and output the flow model set $P_1, \dots, P_n$.
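Step S22 does not name a specific heuristic discovery algorithm. As one hedged illustration, the sketch below applies the dependency measure used by the Heuristics Miner, $(|a>b| - |b>a|) / (|a>b| + |b>a| + 1)$, to the flow paths of one cluster and keeps the edges above a threshold. The function name `dependency_graph`, the threshold of 0.5, and the edge-dictionary model representation are assumptions.

```python
from collections import Counter

def dependency_graph(paths, threshold=0.5):
    """paths: flow paths of one cluster, each a list of activity types.
    Returns {(a, b): dependency} for edges whose dependency >= threshold."""
    # Directly-follows counts: how often b immediately follows a.
    df = Counter(pair for p in paths for pair in zip(p, p[1:]))
    edges = {}
    for (a, b), n_ab in df.items():
        n_ba = df[(b, a)]  # Counter returns 0 for unseen pairs
        dep = (n_ab - n_ba) / (n_ab + n_ba + 1)
        if dep >= threshold:
            edges[(a, b)] = dep
    return edges

# Hypothetical cluster: three identical paths plus one shortcut path.
cluster_paths = [["open", "type", "save"]] * 3 + [["open", "save"]]
model = dependency_graph(cluster_paths)
```

Because step S13 removes adjacent events of the same activity type, self-loops do not arise in this input, so the separate loop formula of the Heuristics Miner is left out of the sketch.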
Further, step S21 specifically includes:
S211: for a flow path $\sigma_l$, take the frequency with which each activity type occurs among its events as the type feature $\alpha$, take the adjacency relationships between its events as the transition feature $\beta$, and use $\gamma = [\alpha, \beta]$ as the feature vector of $\sigma_l$, where $l$ is the number of the flow path;
S212: randomly select $n$ flow paths as the cluster centers $O = \{o_1, o_2, \dots, o_n\}$, where $o_n$ is the $n$-th cluster center; compute the feature-vector similarity between every other flow path and each cluster center; assign each non-center flow path to the cluster $S_x$ of the cluster center with which its similarity is largest, yielding the clustering result set of this iteration; and compute the metric value of that clustering result set;
S213: repeat step S212 and select the iteration's clustering result set with the smallest metric value as the output clustering result set $S_1, \dots, S_n$.
Specifically, the feature similarity between any flow path and a cluster center is calculated by the following formula:
(formula not reproduced in the source text)
where the formula uses the two-norm of a vector and an index that selects one of the cluster centers.
For each non-center flow path, the cluster center with the largest feature similarity is selected to form a cluster.
An evaluation function (not reproduced in the source text) measures the average similarity between the flow paths and their center points after a given round of clustering, computed over the clustering result obtained from that round's cluster centers.
The metric value is recalculated in each iteration (the comparison rule given at this point is not reproduced in the source text). When the metric value converges or the maximum number of iterations is exceeded, the corresponding cluster division is taken as the result of clustering the interaction log L by flow path features, $S_1, \dots, S_n$.
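Because the similarity and metric formulas are not legible in this text, the following sketch fills them with explicit assumptions: cosine similarity (which uses the two-norm mentioned above) as the similarity, and the mean dissimilarity of non-center flow paths to their centers as the metric, so that a smaller metric is better as in step S213. The names `cosine`, `cluster_once`, and `cluster`, the restart count, and the seed are all hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the unreproduced formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_once(vectors, n, rng):
    """One S212 round: random centers, then assign each path to its most similar center."""
    centers = rng.sample(range(len(vectors)), n)
    clusters = {c: [c] for c in centers}
    total = 0.0
    for i, vec in enumerate(vectors):
        if i in clusters:
            continue  # cluster centers stay in their own cluster
        best = max(centers, key=lambda c: cosine(vec, vectors[c]))
        clusters[best].append(i)
        total += 1.0 - cosine(vec, vectors[best])
    # Assumed metric: mean dissimilarity of non-center paths (lower is better).
    metric = total / max(1, len(vectors) - n)
    return metric, list(clusters.values())

def cluster(vectors, n, restarts=20, seed=0):
    """S213: repeat the round and keep the result with the smallest metric."""
    rng = random.Random(seed)
    rounds = [cluster_once(vectors, n, rng) for _ in range(restarts)]
    return min(rounds, key=lambda r: r[0])

# Hypothetical feature vectors: two groups pointing in orthogonal directions.
vectors = [[1, 0], [2, 0], [0, 1], [0, 3]]
metric, clusters = cluster(vectors, n=2)
```

Each cluster is returned as a list of indices into `vectors`. Restarting from fresh random centers, rather than refining one set of centers, mirrors the "repeat S212 and keep the best" wording of step S213.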
Further, the step S3 specifically includes:
S31: build a flow executor and input the flow model set $P_1, \dots, P_n$ into it to obtain the flow instance set corresponding to the flow model set, in which each flow model $P_i$ has its own subset of flow instances; $I_j^i$ denotes the $j$-th flow instance generated by the $i$-th flow model $P_i$, and $tr_\lambda = (j, a', \lambda)$ denotes a node of the flow instance $I_j^i$, where $j$ is the unique identifier of the flow instance and $a'$ is the activity type of the node;
S32: in the flow instance subset of flow model $P_x$, find the flow instances that match the flow paths of cluster $S_x$, count the number of matches of each flow instance, and compute the fitness of each match;
S33: set the frequency range $(0, \mu)$ of low-frequency flow instances and, for each flow instance in the subset whose number of matches is greater than $\mu$, compute the average of its fitness values to obtain the value of the flow instance;
S34: set a value threshold $\theta$, take the flow instances whose value is greater than $\theta$ as high-value flow instances, and obtain the high-value flow instance set $Q_x$ of cluster $S_x$;
S35: repeat steps S32-S34 to obtain the high-value flow instance sets $Q_1, \dots, Q_n$ of all clusters.
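The text does not describe how the flow executor of step S31 generates flow instances. One simple possibility, shown here purely as an assumption, is to represent a flow model as a successor map and enumerate every start-to-end walk up to a length bound. The function name `simulate`, the `start` and `end` markers, and the sample model are hypothetical.

```python
def simulate(model, start, end, max_len=10):
    """Enumerate flow instances as activity sequences from `start` to `end`.
    model: {activity: [possible next activities]} (an assumed representation)."""
    instances = []

    def walk(seq):
        if seq[-1] == end:
            instances.append(seq)
            return
        if len(seq) >= max_len:
            return  # the length bound keeps models with loops from running forever
        for nxt in model.get(seq[-1], []):
            walk(seq + [nxt])

    walk([start])
    return instances

# Hypothetical flow model with an optional "check" step and a shortcut to "save".
model = {"open": ["type", "save"], "type": ["check", "save"], "check": ["save"]}
instances = simulate(model, "open", "save")
```

Each returned sequence plays the role of one flow instance of the model; these are the candidates that the flow paths of the corresponding cluster are matched against in step S32.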
Further, the step S32 specifically includes:
S321: assign each node of activity type $a'$ a skip cost and an insertion cost (the cost symbols are not reproduced in the source text); the sets $A_{skip}$ and $A_{insert}$ record the nodes skipped and inserted during each traversal, and both are initially empty;
S322: to obtain $A_{skip}$ and $A_{insert}$ during matching, select a flow path $\sigma_u^x = \langle e_1, \dots, e_t \rangle$ in cluster $S_x$ and traverse the events of $\sigma_u^x$ in time order; set a sequence that represents the flow instance nodes matched by the traversal up to time $m$, where $t$ is the time, $m$ is the number of events traversed, and $temp$ is the length of the sequence;
S323: when $t = m + 1$, convert the event $e_{m+1} = (k, a, m+1)$ of $\sigma_u^x$ into a flow instance node by means of a conversion operator, which maps an input event to a node of a flow instance, and append the node to the matched sequence to generate a candidate sequence;
determine whether the candidate sequence exists in the flow model $P_x$; if it does, update the matched sequence to the candidate sequence, let $m = m + 1$, and return to step S323;
otherwise the candidate sequence is considered invalid, and the process advances to step S324;
S324: consider inserting several nodes into the sequence or skipping the latest node until a sequence that is valid for the flow model $P_x$ is obtained;
if both inserting nodes and skipping the node can make the sequence valid, compute the first cost and the second cost of the two operations, select the operation with the lower cost, and update the set $A_{skip}$ or $A_{insert}$ accordingly with a merging operation; let $m = m + 1$ and return to step S323; here the inserted nodes form a set and $\pi$ is the number of a node;
S325: repeat steps S323-S324 until all events in the flow path $\sigma_u^x$ have been traversed; at this point the matched sequence is valid for the flow model $P_x$, that is, there is a flow instance $I_j^x$ in the flow instance subset of $P_x$ that matches the sequence exactly;
S326: based on the sets $A_{skip}$ and $A_{insert}$, compute the fitness of the matched flow instance $I_j^x$. The fitness formula (not reproduced in the source text) is expressed in terms of the following quantities: $z_j$ is the number of times the flow instance $I_j^x$ has been matched by flow paths in cluster $S_x$, $str.a'$ is the activity type of a node in the set $A_{skip}$, $itr.a'$ is the activity type of a node in the set $A_{insert}$, and $tr.a'$ is the activity type of a node in the flow instance $I_j^x$;
update $z_j = z_j + 1$ and reset the sets $A_{skip}$ and $A_{insert}$ to empty;
S327: traverse the flow paths in cluster $S_x$, repeating steps S322-S326, to obtain the number of times each flow instance of $P_x$ is matched by the flow paths of cluster $S_x$ and the fitness of each match.
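The fitness formula of step S326 is not legible in this text, so the sketch below should be read as one plausible shape only, not as the patented formula. It assumes fitness is 1 minus the cost of the recorded skips and insertions, normalized by the combined skip and insertion cost of every node in the matched flow instance, and clamped at 0. The function name `fitness` and the unit costs are hypothetical.

```python
def fitness(skipped, inserted, instance, skip_cost, insert_cost):
    """Assumed normalized fitness in [0, 1]; 1.0 means no skips or insertions.
    skipped / inserted: activity types recorded in A_skip / A_insert.
    instance: activity types of the matched flow instance's nodes."""
    penalty = sum(skip_cost[a] for a in skipped) + sum(insert_cost[a] for a in inserted)
    # Assumed normalizer: cost of skipping and inserting every instance node.
    worst = sum(skip_cost[a] + insert_cost[a] for a in instance)
    if not worst:
        return 0.0
    return max(0.0, 1.0 - penalty / worst)

# Hypothetical unit costs and one recorded match.
costs = dict.fromkeys(["open", "popup", "type", "check", "save"], 1.0)
v = fitness(
    skipped=["popup"],
    inserted=["check"],
    instance=["open", "type", "check", "save"],
    skip_cost=costs,
    insert_cost=costs,
)
```

With per-activity costs, an enterprise can make deviations on critical activity types weigh more than deviations on incidental ones, which is the kind of adjustment the description mentions.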
Further, the calculation formula of the value of the flow instance is:
(formula not reproduced in the source text)
where $Score(I_j^x)$ is the value of the flow instance $I_j^x$, $I_j^x$ is the $j$-th flow instance in the flow instance subset of $P_x$, $z_j$ is the number of times $I_j^x$ is matched by flow paths in cluster $S_x$, and $V_j^x$ collects the fitness values of $I_j^x$, each fitness value being an element of $V_j^x$.
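Steps S33 and S34 describe the value as the average fitness of a flow instance that was matched more than $\mu$ times, followed by a threshold $\theta$. A minimal sketch under that reading follows; the function name `high_value_instances`, the dictionary layout, and the sample numbers are hypothetical.

```python
from statistics import mean

def high_value_instances(matches, mu, theta):
    """matches: {instance_id: [fitness of each match]}, i.e. the values in V_j^x.
    Returns {instance_id: score} for the high-value flow instances."""
    scores = {}
    for inst, fits in matches.items():
        # S33: ignore low-frequency instances (matched mu times or fewer).
        if len(fits) > mu:
            scores[inst] = mean(fits)
    # S34: keep only instances whose value exceeds the threshold theta.
    return {inst: s for inst, s in scores.items() if s > theta}

# Hypothetical match records for one cluster.
matches = {
    "I1": [0.75, 1.0, 1.0, 0.75],  # frequent and well fitting
    "I2": [1.0],                   # perfect fit but matched only once
    "I3": [0.5, 0.5, 0.25],        # frequent but poorly fitting
}
q_x = high_value_instances(matches, mu=2, theta=0.8)
```

The frequency filter runs before the threshold, so a rarely matched instance is excluded even when its single match fits perfectly.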
Further, the step S4 specifically includes:
S41: for every high-value flow instance in the high-value flow instance set, if parallel flow instance branches exist, merge all nodes on the parallel branches into one new node; if a node's functional complexity exceeds that of a single RPA flow step, split the node into several child nodes; this yields the processed high-value flow instance set;
S42: convert the processed high-value flow instance set into an RPA executable script by means of an RPA instruction converter.
Specifically, each flow node is converted into a Script-Action tag in the script, and the operation attributes and operation targets required by the flow node are converted into Script-Command tags; each Script-Command contains the complete interaction information needed to direct one basic operation, and the Script-Commands are arranged in the order of the operations.
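The tag structure described above can be sketched with the Python standard library. Only the tag names Script-Action and Script-Command come from the text; the root element `Script`, the attribute names `name`, `operation`, and `target`, and the dictionary layout of a flow node are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def to_script(instance):
    """instance: ordered list of flow nodes, each a dict with an 'activity'
    and an ordered list of 'commands' (assumed layout)."""
    root = ET.Element("Script")  # hypothetical root element
    for node in instance:
        # One Script-Action per flow node.
        action = ET.SubElement(root, "Script-Action", name=node["activity"])
        # One Script-Command per basic operation, kept in operation order.
        for cmd in node["commands"]:
            ET.SubElement(action, "Script-Command",
                          operation=cmd["operation"], target=cmd["target"])
    return ET.tostring(root, encoding="unicode")

# Hypothetical high-value flow instance with two nodes.
instance = [
    {"activity": "open", "commands": [{"operation": "click", "target": "#file-menu"}]},
    {"activity": "save", "commands": [
        {"operation": "click", "target": "#save"},
        {"operation": "type", "target": "#filename"},
    ]},
]
script = to_script(instance)
```

Building the script through an XML API rather than string concatenation keeps attribute values escaped and the command order explicit.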
An RPA high-value flow instance discovery system based on fitness analysis comprises the following modules:
a preprocessing module, configured to acquire an interaction log L and preprocess it to obtain a new interaction log L;
a flow model set acquisition module, configured to cluster the new interaction log L to obtain a clustering result set and to derive a flow model set from the clustering result set;
a high-value flow instance set acquisition module, configured to simulate the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and to compute the high-value flow instance set from the flow instance set;
an executable script generation module, configured to convert the high-value flow instance set into an RPA executable script.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. do not denote any order, but rather the terms first, second, third, etc. are used to interpret the terms as labels.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (5)

1. An RPA high-value flow instance discovery method based on fitness analysis, characterized by comprising the following steps:
S1: acquiring an interaction log L and preprocessing it to obtain a new interaction log L;
S2: clustering the new interaction log L to obtain a clustering result set, and deriving a flow model set from the clustering result set;
S3: simulating the running state of each flow model in the flow model set with a flow executor to obtain a flow instance set, and computing a high-value flow instance set from the flow instance set;
S4: converting the high-value flow instance set into an RPA executable script;
the step S1 specifically comprises the following steps:
S11: acquiring the interaction log $L = \{\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_u\}$ generated by operations in an application program, where $\sigma_u = \langle e_1, \dots, e_t \rangle$ is a flow path and $u$ is the number of the flow path; acquiring each event $e_t = (\zeta, a, t)$ in the interaction log L, where $\zeta$ is the unique identifier of the flow path to which the event belongs, $a$ is the activity type of the event, and $t$ is the time;
S12: setting the frequency range $(0, g)$ of low-frequency events and counting the occurrences of each activity type; if an activity type occurs fewer than $g$ times, deleting from the interaction log L every flow path that contains an event of that activity type;
S13: traversing the events in the remaining flow paths and, whenever two adjacent events have the same activity type, deleting the later event from the flow path, thereby obtaining the new interaction log L;
the step S2 specifically comprises the following steps:
S21: obtaining the flow paths of the new interaction log L, where the number of flow path categories is $n$; obtaining the feature vector of each flow path; computing the similarity between the feature vectors of the flow paths; and using a clustering algorithm with this similarity to group flow paths with similar feature vectors into the same cluster $S_x$, where $x$ is the number of the cluster and $1 \le x \le n$, thereby obtaining the clustering result set $S_1, \dots, S_n$;
S22: inputting the clustering result set into a heuristic process discovery algorithm and outputting the flow model set $P_1, \dots, P_n$;
Step S3 specifically comprises the following steps:
S31: building a flow executor and inputting the flow model set P_1, ..., P_n into the flow executor to obtain the flow instance set corresponding to the flow model set, wherein the flow instance set contains a flow instance subset for the i-th flow model P_i, I_j^i denotes the j-th flow instance generated by the i-th flow model P_i, and tr_λ = (j, a', λ) denotes a node of a flow instance, where j is the unique identifier of the flow instance and a' is the activity type of the node;
S32: in the flow instance subset of the flow model P_x, finding the flow instances that match the flow paths in cluster S_x, and calculating the number of matches and the fitness of each match;
S33: setting a frequency range (0, μ) for low-frequency flow instances and, for each flow instance in the flow instance subset whose number of matches is greater than μ, calculating the average of its fitness values to obtain the value of the flow instance;
S34: setting a value threshold θ, and taking the flow instances whose value is greater than θ as high-value flow instances, thereby obtaining the high-value flow instance set Q_x of cluster S_x;
S35: repeating steps S32 to S34 to obtain the high-value flow instance sets Q_1, ..., Q_n of all clusters;
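Steps S33 and S34 can be illustrated with a short sketch. It assumes that the value of a flow instance is the plain arithmetic mean of its fitness values, as S33 describes in words; the exact formula of the claims is not reproduced in this text. The function name `high_value_instances` and the sample data are hypothetical.

```python
def high_value_instances(fitness_by_instance, mu, theta):
    """fitness_by_instance: {instance_id: [fitness of each match]}."""
    scores = {}
    for instance_id, fits in fitness_by_instance.items():
        # S33: only instances matched more than mu times are scored.
        if len(fits) > mu:
            scores[instance_id] = sum(fits) / len(fits)
    # S34: keep the instances whose value exceeds the threshold theta.
    selected = {i for i, score in scores.items() if score > theta}
    return scores, selected

# Hypothetical matching results for one cluster S_x.
matches = {"I1": [0.9, 0.8, 1.0], "I2": [1.0], "I3": [0.5, 0.6, 0.4]}
scores, q_x = high_value_instances(matches, mu=2, theta=0.7)
```

Here "I2" fits perfectly but was matched only once, so it is treated as low-frequency and excluded, while "I3" is frequent but falls below the value threshold.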
Step S32 specifically comprises:
S321: setting a skip cost and an insertion cost for nodes of activity type a'; the sets A_skip and A_insert record the nodes skipped and inserted during each traversal, and both are initially empty;
S322: to obtain A_skip and A_insert during matching, selecting a flow path σ_u^x = <e_1, ..., e_t> from cluster S_x and traversing the events of σ_u^x in time order; a sequence is defined to represent the flow instance nodes matched by the traversal up to time m, where t is the time, m is the number of the time, the elements of the sequence are nodes, and temp is the length of the sequence;
S323: when t = m+1, converting the event e_{m+1} = (k, a, m+1) of σ_u^x into a node of a flow instance by means of an operator that converts an input event into a flow instance node, and inserting the node into the sequence to generate a candidate sequence;
judging whether the candidate sequence exists in the flow model P_x; if it exists, updating the sequence to the candidate sequence, letting m = m+1, and returning to step S323;
otherwise, considering the candidate sequence invalid and proceeding to step S324;
S324: considering inserting several nodes into the sequence or skipping the latest node, until a sequence that is valid for the flow model P_x is obtained;
if both inserting nodes and skipping the node can make the sequence valid, calculating a first cost and a second cost, selecting the operation scheme with the lower cost, updating the set A_skip or A_insert accordingly, letting m = m+1, and returning to step S323; wherein the inserted nodes form a set, π is the number of a node, and the update is a merging operation;
S325: repeating steps S323 and S324 until all events in the flow path σ_u^x have been traversed, at which point the sequence is valid for the flow model P_x, i.e. there exists a flow instance I_j^x in the flow instance subset that matches the sequence;
S326: based on A_skip and A_insert, calculating the fitness of the matched flow instance I_j^x; the formula for the fitness calculation is as follows:
wherein z_j is the number of times the flow instance I_j^x has been matched by flow paths in cluster S_x, str.a' is the activity type of a node in the set A_skip, itr.a' is the activity type of a node in the set A_insert, and tr.a' is the activity type of a node in the flow instance I_j^x;
updating z_j = z_j + 1 and resetting the sets A_skip and A_insert to empty;
S327: traversing the flow paths in cluster S_x and repeating steps S322 to S326, thereby calculating the number of matches between the flow instances of P_x and the flow paths of cluster S_x, and the fitness of each matched flow instance.
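The skip-or-insert matching of steps S321 to S325 can be sketched as below. This is a simplified illustration under several assumptions: the flow model is represented as a map from each activity to its allowed successors, a sequence counts as valid when every step follows such a successor link, inserted nodes are found by a breadth-first search, and costs are supplied as plain functions. The names `bridge` and `align` are hypothetical. The fitness of S326 is not computed because its formula is not reproduced in this text.

```python
from collections import deque

def bridge(model, src, dst):
    """Shortest list of intermediate nodes that lets dst follow src, or None."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, inserted = queue.popleft()
        for nxt in model.get(node, ()):
            if nxt == dst:
                return inserted
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, inserted + [nxt]))
    return None

def align(path, model, skip_cost, insert_cost, start="START"):
    """Match a flow path against the model; returns (sequence, A_skip, A_insert)."""
    seq, a_skip, a_insert = [], [], []
    for activity in path:
        last = seq[-1] if seq else start
        # S323: append the event if the extended sequence stays valid.
        if activity in model.get(last, ()):
            seq.append(activity)
            continue
        # S324: compare the cost of inserting nodes with the cost of skipping.
        missing = bridge(model, last, activity)
        cost_skip = skip_cost(activity)
        cost_insert = None if missing is None else sum(insert_cost(a) for a in missing)
        if cost_insert is not None and cost_insert < cost_skip:
            a_insert.extend(missing)
            seq.extend(missing + [activity])
        else:
            a_skip.append(activity)
    return seq, a_skip, a_insert

# Hypothetical flow model and flow path.
model = {"START": {"open"}, "open": {"edit"}, "edit": {"save"}, "save": {"close"}}
seq, a_skip, a_insert = align(
    ["open", "save", "print", "close"], model,
    skip_cost=lambda a: 1.0, insert_cost=lambda a: 0.5,
)
```

In this example the missing "edit" node is inserted because inserting it is cheaper than skipping "save", whereas "print" cannot be reached in the model at all and is therefore skipped.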
2. The RPA high-value flow instance discovery method based on fitness analysis according to claim 1, wherein step S21 specifically comprises:
S211: for a flow path σ_l, taking the frequency of the activity types of the events occurring in the flow path as a type feature α, taking the adjacency relationships among the events in the flow path as a transition feature β, and taking γ = [α, β] as the feature vector of the flow path σ_l, where l is the number of the flow path;
S212: randomly selecting n flow paths as clustering centers O = {o_1, o_2, ..., o_n}, where o_n is the n-th clustering center; calculating the similarity of the feature vectors between the other flow paths and the clustering centers; assigning each non-center flow path to the cluster S_x of the clustering center with which its similarity is largest, to obtain an iterative clustering result set; and calculating a metric value of the iterative clustering result set;
S213: repeating step S212, and selecting the iterative clustering result set with the smallest metric value as the output clustering result set S_1, ..., S_n.
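The clustering of steps S211 to S213 can be sketched as follows. The claim does not name the similarity measure or the metric, so this sketch assumes cosine similarity and, as the metric, the summed dissimilarity (1 minus similarity) of each flow path to its assigned center. The names `features`, `cosine`, `cluster_once`, and `cluster`, the number of rounds, and the sample paths are all assumptions.

```python
import math
import random
from collections import Counter

def features(path):
    """S211: activity frequencies (alpha) plus adjacent-pair frequencies (beta)."""
    vec = Counter(path)
    vec.update(zip(path, path[1:]))
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def cluster_once(vecs, n, rng):
    """S212: pick n random centers and assign every other path to the most similar one."""
    centers = rng.sample(range(len(vecs)), n)
    clusters = {c: [c] for c in centers}
    metric = 0.0
    for i, vec in enumerate(vecs):
        if i in clusters:
            continue
        best = max(centers, key=lambda c: cosine(vec, vecs[c]))
        clusters[best].append(i)
        metric += 1.0 - cosine(vec, vecs[best])
    return metric, clusters

def cluster(paths, n, rounds=20, seed=0):
    """S213: repeat S212 and keep the result with the smallest metric value."""
    rng = random.Random(seed)
    vecs = [features(p) for p in paths]
    return min((cluster_once(vecs, n, rng) for _ in range(rounds)), key=lambda r: r[0])

# Hypothetical flow paths forming two obvious groups.
paths = [
    ["open", "edit", "save"],
    ["open", "edit", "save", "close"],
    ["login", "query", "export"],
    ["login", "query", "export", "logout"],
]
metric, clusters = cluster(paths, n=2)
```

The returned `clusters` maps the index of each center path to the indices of the paths assigned to it.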
3. The RPA high-value flow instance discovery method based on fitness analysis according to claim 1, wherein the value of a flow instance is calculated by the following formula:
wherein Score(I_j^x) is the value of the flow instance I_j^x, I_j^x is the j-th flow instance in the flow instance subset, z_j is the number of times the flow instance I_j^x has been matched by flow paths in cluster S_x, and V_j^x is the set of fitness values of the flow instance I_j^x, each individual fitness value being an element of V_j^x.
4. The RPA high-value flow instance discovery method based on fitness analysis according to claim 1, wherein step S4 specifically comprises:
S41: for every high-value flow instance in the high-value flow instance set, if parallel flow instance branches exist, merging all nodes on the parallel branches into a new node; if there is a node whose functional complexity exceeds that of a single RPA flow step, splitting the node into a plurality of child nodes; thereby obtaining a processed high-value flow instance set;
S42: converting the processed high-value flow instance set into an RPA executable script by means of an RPA instruction converter.
5. An RPA high-value flow instance discovery system based on fitness analysis, comprising:
a preprocessing module, configured to acquire an interaction log L and preprocess the interaction log L to obtain a new interaction log L;
a flow model set acquisition module, configured to cluster the new interaction log L to obtain a clustering result set, and to obtain a flow model set from the clustering result set;
a high-value flow instance set acquisition module, configured to simulate, by means of a flow executor, the running state of each flow model in the flow model set to obtain a flow instance set, and to obtain a high-value flow instance set by calculation over the flow instance set;
an executable script generation module, configured to convert the high-value flow instance set into an RPA executable script;
the workflow of the preprocessing module is specifically as follows:
S11: obtaining an interaction log L = {σ_1, σ_2, σ_3, ..., σ_u} generated by operations in an application program, where σ_u = <e_1, ..., e_t> is a flow path and u is the number of the flow path; obtaining each event e_t = (ζ, a, t) in the interaction log L, where ζ is the unique identifier of the flow path to which the event belongs, a is the activity type of the event, and t is the time;
S12: setting a frequency range (0, g) for low-frequency events, counting the number of occurrences of each activity type, and, if an activity type occurs fewer than g times, deleting from the interaction log L every flow path that contains an event of that activity type;
S13: traversing the events in the remaining flow paths and, if two adjacent events have the same activity type, deleting the latter event from the flow path, thereby obtaining a new interaction log L;
the workflow of the flow model set acquisition module is specifically as follows:
S21: obtaining the flow paths of the new interaction log L, wherein the number of types of flow paths is n; obtaining the feature vector of each flow path; calculating the similarity between the feature vectors of the flow paths; and clustering flow paths with similar feature vectors into the same cluster S_x by means of a clustering algorithm and the similarity, where x is the number of the cluster and 1 ≤ x ≤ n, thereby obtaining a clustering result set S_1, ..., S_n;
S22: inputting the clustering result set into a heuristic process discovery algorithm, and outputting a flow model set P_1, ..., P_n;
the workflow of the high-value flow instance set acquisition module is specifically as follows:
S31: building a flow executor and inputting the flow model set P_1, ..., P_n into the flow executor to obtain the flow instance set corresponding to the flow model set, wherein the flow instance set contains a flow instance subset for the i-th flow model P_i, I_j^i denotes the j-th flow instance generated by the i-th flow model P_i, and tr_λ = (j, a', λ) denotes a node of a flow instance, where j is the unique identifier of the flow instance and a' is the activity type of the node;
S32: in the flow instance subset of the flow model P_x, finding the flow instances that match the flow paths in cluster S_x, and calculating the number of matches and the fitness of each match;
S33: setting a frequency range (0, μ) for low-frequency flow instances and, for each flow instance in the flow instance subset whose number of matches is greater than μ, calculating the average of its fitness values to obtain the value of the flow instance;
S34: setting a value threshold θ, and taking the flow instances whose value is greater than θ as high-value flow instances, thereby obtaining the high-value flow instance set Q_x of cluster S_x;
S35: repeating steps S32 to S34 to obtain the high-value flow instance sets Q_1, ..., Q_n of all clusters;
Step S32 specifically comprises:
S321: setting a skip cost and an insertion cost for nodes of activity type a'; the sets A_skip and A_insert record the nodes skipped and inserted during each traversal, and both are initially empty;
S322: to obtain A_skip and A_insert during matching, selecting a flow path σ_u^x = <e_1, ..., e_t> from cluster S_x and traversing the events of σ_u^x in time order; a sequence is defined to represent the flow instance nodes matched by the traversal up to time m, where t is the time, m is the number of the time, the elements of the sequence are nodes, and temp is the length of the sequence;
S323: when t = m+1, converting the event e_{m+1} = (k, a, m+1) of σ_u^x into a node of a flow instance by means of an operator that converts an input event into a flow instance node, and inserting the node into the sequence to generate a candidate sequence;
judging whether the candidate sequence exists in the flow model P_x; if it exists, updating the sequence to the candidate sequence, letting m = m+1, and returning to step S323;
otherwise, considering the candidate sequence invalid and proceeding to step S324;
S324: considering inserting several nodes into the sequence or skipping the latest node, until a sequence that is valid for the flow model P_x is obtained;
if both inserting nodes and skipping the node can make the sequence valid, calculating a first cost and a second cost, selecting the operation scheme with the lower cost, updating the set A_skip or A_insert accordingly, letting m = m+1, and returning to step S323; wherein the inserted nodes form a set, π is the number of a node, and the update is a merging operation;
S325: repeating steps S323 and S324 until all events in the flow path σ_u^x have been traversed, at which point the sequence is valid for the flow model P_x, i.e. there exists a flow instance I_j^x in the flow instance subset that matches the sequence;
S326: based on A_skip and A_insert, calculating the fitness of the matched flow instance I_j^x; the formula for the fitness calculation is as follows:
wherein z_j is the number of times the flow instance I_j^x has been matched by flow paths in cluster S_x, str.a' is the activity type of a node in the set A_skip, itr.a' is the activity type of a node in the set A_insert, and tr.a' is the activity type of a node in the flow instance I_j^x;
updating z_j = z_j + 1 and resetting the sets A_skip and A_insert to empty;
S327: traversing the flow paths in cluster S_x and repeating steps S322 to S326, thereby calculating the number of matches between the flow instances of P_x and the flow paths of cluster S_x, and the fitness of each matched flow instance.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311714610.3A CN117406972B (en) 2023-12-14 2023-12-14 RPA high-value flow instance discovery method and system based on fitness analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311714610.3A CN117406972B (en) 2023-12-14 2023-12-14 RPA high-value flow instance discovery method and system based on fitness analysis

Publications (2)

Publication Number Publication Date
CN117406972A CN117406972A (en) 2024-01-16
CN117406972B true CN117406972B (en) 2024-02-13

Family

ID=89492855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311714610.3A Active CN117406972B (en) 2023-12-14 2023-12-14 RPA high-value flow instance discovery method and system based on fitness analysis

Country Status (1)

Country Link
CN (1) CN117406972B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693317A (en) * 2012-05-29 2012-09-26 华为软件技术有限公司 Method and device for data mining process generating
CN104881435A (en) * 2015-05-05 2015-09-02 中国海洋石油总公司 Data mining based automatic research flow well logging evaluation expert system
CN114926073A (en) * 2022-06-02 2022-08-19 南京英诺森软件科技有限公司 Method for automatic process mining based on RPA decomposition log
CN115759979A (en) * 2022-11-16 2023-03-07 上海弘玑信息技术有限公司 Process intelligent processing method and system based on RPA and process mining
CN115878081A (en) * 2023-02-23 2023-03-31 安徽思高智能科技有限公司 High-value RPA demand analysis system based on process discovery
CN115953123A (en) * 2022-12-19 2023-04-11 中移信息技术有限公司 Method, device and equipment for generating robot automation flow and storage medium
CN115952919A (en) * 2023-01-16 2023-04-11 哈尔滨工业大学(威海) Intelligent risk prediction method based on process mining
CN116225513A (en) * 2023-05-09 2023-06-06 安徽思高智能科技有限公司 RPA dynamic flow discovery method and system based on concept drift
CN116628228A (en) * 2023-07-19 2023-08-22 安徽思高智能科技有限公司 RPA flow recommendation method and computer readable storage medium
CN117170648A (en) * 2023-09-08 2023-12-05 上海艺赛旗软件股份有限公司 Robot flow automation component recommendation method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020174093A1 (en) * 2001-05-17 2002-11-21 Fabio Casati Method of identifying and analyzing business processes from workflow audit logs
US11281936B2 (en) * 2018-12-31 2022-03-22 Kofax, Inc. Systems and methods for identifying processes for robotic automation and building models therefor
US11433536B2 (en) * 2019-09-19 2022-09-06 UiPath, Inc. Process understanding for robotic process automation (RPA) using sequence extraction
US11249729B2 (en) * 2019-10-14 2022-02-15 UiPath Inc. Providing image and text data for automatic target selection in robotic process automation
US11403118B2 (en) * 2019-12-30 2022-08-02 UiPath Inc. Enhanced target selection for robotic process automation
US11294793B1 (en) * 2020-10-23 2022-04-05 UiPath Inc. Robotic process automation (RPA) debugging systems and methods
US11934416B2 (en) * 2021-04-13 2024-03-19 UiPath, Inc. Task and process mining by robotic process automations across a computing environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
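RPA robots help the financial digital transformation of electric power enterprises; 李锐; 黄彩云; 中国新技术新产品; 2020-07-25 (No. 14); full text *
Design of an automated application system for the signing and sealing workflow of appraisal documents in the context of a market economy; 王松; 财富时代; 2019-12-25 (No. 12); full text *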

Also Published As

Publication number Publication date
CN117406972A (en) 2024-01-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant