CN116778189A - RPA flow processing analysis method and computer equipment - Google Patents


Info

Publication number
CN116778189A
CN116778189A (application number CN202310915645.7A)
Authority
CN
China
Prior art keywords
image element
knowledge representation
image
target image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310915645.7A
Other languages
Chinese (zh)
Inventor
刘艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujing Technology Shenzhen Co ltd
Original Assignee
Fujing Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujing Technology Shenzhen Co ltd filed Critical Fujing Technology Shenzhen Co ltd
Priority to CN202310915645.7A
Publication of CN116778189A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an RPA flow processing analysis method and computer equipment. The method acquires a plurality of associated image elements of a first target image element in an RPA element page image to be picked up, and obtains a knowledge representation of the first target image element from the first target image element and its context image elements. Through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the two are integrated to obtain an integrated knowledge representation of the first target image element, from which the corresponding target element pickup result is obtained. Because the semantic information contained in the knowledge representation of the first target image element is perfected according to the associated image elements, the integrated knowledge representation represents the semantic information of the first target image element in the RPA element page image to be picked up more accurately, which improves the accuracy of the target element pickup result obtained through the integrated knowledge representation.

Description

RPA flow processing analysis method and computer equipment
Technical Field
The present application relates to, but is not limited to, the technical fields of RPA, data processing, and machine learning, and in particular to an RPA flow processing analysis method and a computer device.
Background
RPA (Robotic Process Automation) refers to techniques that use software robots or automated tools to perform routine, repetitive tasks. RPA can simulate human behavior and automatically perform a series of prescribed tasks, thereby improving work efficiency, reducing errors, and lightening the personnel burden. It may be applied across industries and departments, such as finance, insurance, human resources, and customer service, and is well suited to structured data and repetitive work. RPA can also integrate with existing applications and systems, interacting through interface operations or APIs to execute automated processes and tasks, which saves time and cost while providing greater accuracy and consistency. However, in certain scenarios, such as remote desktops, virtual machines, and custom software, API-based targeting of operation objects is not possible. In the prior art, RPA elements in such scenarios are picked up by combining various target detection techniques, for example joint recognition based on image search, optical character detection, and template matching; this approach is complex, so a method that ensures accurate, efficient, and simple element pickup in the RPA process is needed.
Disclosure of Invention
In view of this, the embodiments of the present application at least provide an RPA flow processing analysis method and a computer device, which address the above technical problems.
The technical scheme of the embodiment of the application is realized as follows:
in one aspect, an embodiment of the present application provides an RPA process analysis method, applied to a computer device, where the method includes:
acquiring a plurality of associated image elements of a first target image element in an RPA element page image to be picked up, wherein the plurality of associated image elements are used for representing various element possibilities contained in the first target image element;
acquiring knowledge representation of the first target image element through the first target image element and a context image element of the first target image element in the RPA element page image to be picked up;
integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements to obtain an integrated knowledge representation of the first target image element;
and obtaining a target element pickup result corresponding to the RPA element page image to be picked up through the integrated knowledge representation.
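The four steps above can be sketched end to end as follows. The toy four-dimensional vectors, the cosine commonality, the softmax-weighted fusion by vector addition, and the prototype-based pickup are all illustrative assumptions, not the patent's prescribed implementation.

```python
import numpy as np

def knowledge_representation(target, context):
    # Derive a representation of the target element from the element itself
    # and its surrounding (context) elements; mean pooling is a stand-in.
    return np.mean([target] + context, axis=0)

def commonality(a, b):
    # Cosine similarity as one possible "commonality measurement result".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def integrate(target_repr, assoc_reprs):
    # Weight each associated representation by its commonality with the
    # target, then fuse by weighted summation (one possible integration).
    w = np.array([commonality(target_repr, r) for r in assoc_reprs])
    w = np.exp(w) / np.exp(w).sum()  # normalise the weights
    return target_repr + (w[:, None] * np.array(assoc_reprs)).sum(axis=0)

def pick_element(integrated, candidate_types):
    # Choose the candidate meaning whose prototype vector is most similar
    # to the integrated knowledge representation.
    return max(candidate_types, key=lambda kv: commonality(integrated, kv[1]))[0]

target = np.array([1.0, 0.0, 0.0, 1.0])
context = [np.array([0.5, 0.5, 0.0, 1.0])]
assoc = [np.array([1.0, 0.0, 0.0, 0.9]), np.array([0.0, 1.0, 1.0, 0.0])]
repr_ = knowledge_representation(target, context)
integrated = integrate(repr_, assoc)
result = pick_element(integrated, [("submit", assoc[0]), ("close", assoc[1])])
```

Here the target's representation is closer to the first (hypothetical "submit") prototype, so the pickup result follows that meaning.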
In some embodiments, the obtaining a knowledge representation of the first target image element by the first target image element and a contextual image element of the first target image element in the RPA element page image to be picked up comprises:
inputting the first target image element and the context image element into a knowledge representation mining network;
and performing salient feature embedding mapping on the first target image element and the context image element through the knowledge representation mining network to obtain knowledge representation of the first target image element.
In some embodiments, the performing salient feature embedding mapping on the first target image element and the context image element, obtaining the knowledge representation of the first target image element includes:
acquiring a first search array, a first anchor array and a first result array of the first target image element;
acquiring a second anchor array and a second result array of the contextual image element;
performing a standardization operation on the multiplication result of the first search array and the first anchor array and on the multiplication result of the first search array and the second anchor array, to obtain a first significant eccentricity factor of the first target image element and a second significant eccentricity factor of the context image element with respect to the first target image element;
summing the multiplication result of the first significant eccentricity factor and the first result array and the multiplication result of the second significant eccentricity factor and the second result array, to obtain the knowledge representation of the first target image element.
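Reading the "search array", "anchor array" and "result array" above as the query, key and value of an attention-style computation is an interpretation, not terminology the patent uses; under that assumption, the two steps can be sketched as:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def salient_feature_embedding(q_target, k_target, v_target, k_ctx, v_ctx):
    # Standardise the two query-key products to obtain the first (target)
    # and second (context) "significant eccentricity factors"...
    scores = np.array([np.dot(q_target, k_target), np.dot(q_target, k_ctx)])
    w_target, w_ctx = softmax(scores / np.sqrt(len(q_target)))
    # ...then sum the factor-weighted result arrays to obtain the knowledge
    # representation of the first target image element.
    return w_target * v_target + w_ctx * v_ctx

q = np.array([1.0, 0.0])                      # first search array
repr_ = salient_feature_embedding(q,
                                  np.array([1.0, 0.0]),   # first anchor
                                  np.array([1.0, 1.0]),   # first result
                                  np.array([0.0, 1.0]),   # second anchor
                                  np.array([2.0, 0.0]))   # second result
```

The scaled softmax here plays the role of the standardization operation; the patent does not fix a particular normalisation.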
The integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements through the commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the obtaining the integrated knowledge representation of the first target image element includes:
the knowledge representation mining network performs the following operations:
determining a plurality of first correlation eccentricity factors between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the first correlation eccentricity factors being used to characterize the degree of association of the corresponding associated image element with the first target image element;
and integrating the knowledge representation of the RPA element page image to be picked up with the knowledge representation of the plurality of associated image elements through a plurality of first correlation eccentric factors to obtain the integrated knowledge representation of the first target image element.
In some embodiments, the integrating the knowledge representation of the RPA element page image to be picked up with the knowledge representations of the plurality of associated image elements by a plurality of first correlation eccentricity factors, obtaining the integrated knowledge representation of the first target image element includes:
fusing knowledge representation of the RPA element page image to be picked up with knowledge representation of the plurality of associated image elements through a plurality of first correlation eccentric factors to obtain fused knowledge representation of the first target image element;
performing multi-head salient feature embedding mapping on the fusion knowledge representation to obtain a plurality of salient feature embedding mapping arrays of the first target image element;
combining the plurality of salient feature embedding mapping arrays to obtain salient feature embedding mapping tensors;
and performing downsampling operation on the salient feature embedding mapping tensor to obtain the integrated knowledge representation of the first target image element.
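A minimal sketch of the multi-head mapping, combination, and downsampling steps above; the random per-head projections and the mean-pooling downsample are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_heads = 8, 4
fused = rng.normal(size=dim)                  # fused knowledge representation

# One salient feature embedding mapping array per head; random projections
# stand in for the learned per-head mappings.
projections = [rng.normal(size=(dim, dim)) for _ in range(n_heads)]
head_arrays = [proj @ fused for proj in projections]

# Combine the per-head arrays into a salient feature embedding mapping tensor...
tensor = np.stack(head_arrays)                # shape (4, 8)

# ...and downsample it to obtain the integrated knowledge representation.
integrated = tensor.mean(axis=0)              # shape (8,)
```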
In some embodiments, the debugging process of the knowledge characterization mining network includes:
obtaining a debugging learning sample, wherein the debugging learning sample comprises an RPA element page image learning sample, a target element pickup result sample and a commonality measurement result sample between the RPA element page image learning sample and the target element pickup result sample;
inputting the RPA element page image learning sample and the target element pickup result sample into the knowledge representation mining network;
extracting the integrated knowledge representation of the target image element sample in the RPA element page image learning sample and the integrated knowledge representation of the target element pickup result sample through the knowledge representation mining network;
optimizing the knowledge representation mining network internal configuration variables through loss between a commonality measurement result and the commonality measurement result sample between the integrated knowledge representation of the target image element sample and the integrated knowledge representation of the target element pickup result sample.
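The debugging procedure above can be sketched as a toy regression of the predicted commonality onto the labelled commonality measurement sample. The linear map standing in for the knowledge representation mining network, the cosine commonality, and the finite-difference optimisation are all assumptions for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))              # internal configuration variables
page_feat = rng.normal(size=4)           # RPA element page image learning sample
result_feat = rng.normal(size=4)         # target element pickup result sample
label = 0.9                              # commonality measurement result sample

def loss_fn(W):
    a, b = W @ page_feat, W @ result_feat  # the two integrated representations
    return (cosine(a, b) - label) ** 2     # loss vs. the labelled commonality

losses, lr, eps = [], 0.05, 1e-5
for _ in range(100):
    losses.append(loss_fn(W))
    grad = np.zeros_like(W)              # finite-difference gradient, for brevity
    for i in range(4):
        for j in range(4):
            W[i, j] += eps
            grad[i, j] = (loss_fn(W) - losses[-1]) / eps
            W[i, j] -= eps
    W -= lr * grad                       # optimise the configuration variables
```

In a real network the gradient would come from backpropagation; the shape of the objective, pulling the predicted commonality toward the labelled sample, is the point of the sketch.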
In some embodiments, the acquiring a plurality of associated image elements of a first target image element in the RPA element page image to be picked up comprises:
traversing a set of associated image elements to find the target image element associated with the first target image element, wherein the set of associated image elements stores a plurality of image elements and a plurality of associated image elements corresponding to each image element;
determining a plurality of associated image elements corresponding to the target image element as a plurality of associated image elements of the first target image element;
The acquiring process of the first target image element comprises the following steps:
performing image segmentation processing on the RPA element page image to be picked up to obtain a plurality of contrast image elements of the RPA element page image to be picked up;
and when any one of the plurality of contrast image elements is consistent with any one of a set of associated image elements, determining the any one contrast image element as the first target image element, wherein the set of associated image elements stores a plurality of image elements and a plurality of associated image elements corresponding to each image element.
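The anchor/result (key/value) organisation of the associated image element set described above can be sketched with a plain dictionary; the element signatures and candidate meanings are hypothetical:

```python
# Each stored image element acts as an anchor (key); its associated image
# elements, i.e. its possible meanings, are the result (value).
associated_set = {
    "button_A": ["submit", "jump", "close"],   # hypothetical meanings
    "input_B": ["search_box", "login_field"],
}

def first_target_elements(contrast_elements, assoc_set):
    # A contrast image element consistent with any stored image element
    # becomes a first target image element.
    return [e for e in contrast_elements if e in assoc_set]

segments = ["button_A", "icon_C", "input_B"]   # toy segmentation output
targets = first_target_elements(segments, associated_set)
associations = {t: associated_set[t] for t in targets}
```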
In some embodiments, the performing image segmentation processing on the RPA element page image to be picked up to obtain a plurality of contrast image elements of the RPA element page image to be picked up includes:
performing image segmentation processing on the RPA element page image to be picked up through different strategies to obtain a plurality of contrast image element sets respectively corresponding to the different strategies, wherein each contrast image element set comprises a plurality of contrast image elements in the RPA element page image to be picked up, the number of image blocks of different contrast image elements in the same contrast image element set is the same, and meanwhile, the number of image blocks of the contrast image elements in different contrast image element sets is different;
The determining any one of the plurality of contrast image elements as the first target image element when the any one of the plurality of contrast image elements is consistent with any one of the set of associated image elements comprises:
and when the plurality of contrast image elements belonging to different contrast image element sets are respectively consistent with the plurality of image elements in the associated image element set, determining the contrast image element with the largest number of image blocks in the plurality of contrast image elements belonging to the different contrast image element sets as the first target image element.
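The selection rule above, keeping the matching contrast image element built from the most image blocks, can be sketched as follows; the candidate list is illustrative:

```python
# Each candidate that matched the associated image element set:
# (element id, number of image blocks it is built from).
matches = [("btn_small", 1), ("btn_medium", 4), ("btn_large", 9)]

def pick_first_target(candidates):
    # The contrast image element with the largest block count wins, since
    # it covers the element at the coarsest matching scale.
    return max(candidates, key=lambda c: c[1])[0]

first_target = pick_first_target(matches)
```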
In some embodiments, the method of obtaining knowledge representation of the plurality of associated image elements comprises:
for any associated image element, inputting the any associated image element into a knowledge representation mining network;
and performing salient feature embedding mapping on a plurality of image elements in any associated image element through the knowledge representation mining network to obtain knowledge representation of any associated image element.
In some embodiments, the performing salient feature embedding mapping on the plurality of image elements in the any associated image element, and obtaining the knowledge representation of the any associated image element includes:
for any one image element of the plurality of image elements in the any associated image element, acquiring a third search array, a third anchor array and a third result array of the any one image element;
acquiring a fourth anchor array and a fourth result array of the rest of image elements except any image element in a plurality of image elements in any associated image element;
performing standardization operation on the multiplication result of the third search array and the third anchor array and the multiplication result of the third search array and the fourth anchor array to obtain a third significant eccentric factor of any one image element and a fourth significant eccentric factor of the rest image elements on the any image element;
summing the multiplication result of the third significant eccentric factor and the third result array and the multiplication result of the fourth significant eccentric factor and the fourth result array to obtain knowledge representation of any one image element;
integrating knowledge representation of a plurality of image elements in any associated image element to obtain knowledge representation of any associated image element.
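The final integration step above can be sketched as pooling the per-element representations; mean pooling is one reasonable choice, since the patent does not fix the integration operator:

```python
import numpy as np

# Knowledge representations of the image elements inside one associated
# image element (toy 3-dimensional vectors).
element_reprs = np.array([[1.0, 0.0, 2.0],
                          [3.0, 0.0, 0.0],
                          [2.0, 3.0, 1.0]])

# Pool them into the knowledge representation of the associated element.
associated_repr = element_reprs.mean(axis=0)
```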
In another embodiment, the application also provides a computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the steps in the method described above when the program is executed.
The application has at least the following beneficial effects. The RPA flow processing analysis method and the computer equipment first acquire a plurality of associated image elements of a first target image element in the RPA element page image to be picked up, where the associated image elements represent the various element possibilities contained in the first target image element. A knowledge representation of the first target image element is then obtained from the first target image element and its context image elements in the RPA element page image to be picked up. Next, through the commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the two are integrated to obtain an integrated knowledge representation of the first target image element. Finally, the target element pickup result corresponding to the RPA element page image to be picked up is obtained from the integrated knowledge representation. Because the associated image elements are added to the analysis process when mining the knowledge representation of the first target image element, the semantic information contained in that knowledge representation is perfected, and the resulting integrated knowledge representation represents the semantic information of the first target image element in the RPA element page image to be picked up more accurately, which improves the accuracy of the target element pickup result obtained through the integrated knowledge representation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic implementation flow chart of an RPA flow processing analysis method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a composition structure of an RPA flow processing analysis device according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application will be further elaborated with reference to the accompanying drawings and examples. The described embodiments should not be construed as limiting the application; all other embodiments obtained by one skilled in the art without making inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where there is no conflict. The terms "first/second/third" merely distinguish similar objects and do not imply a particular ordering; where allowed, the order or precedence may be interchanged so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the application only and is not intended to be limiting of the application.
The embodiment of the application provides an RPA flow processing analysis method which can be executed by a processor of computer equipment. The computer device may refer to a device with data processing capability such as a server, a notebook computer, a tablet computer, a desktop computer, a smart television, a mobile device (e.g., a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, a portable game device), etc.
Fig. 1 is a schematic implementation flow chart of an RPA flow processing analysis method according to an embodiment of the present application, as shown in fig. 1, the method includes steps S110 to S140 as follows:
step S110, a plurality of associated image elements of a first target image element in the RPA element page image to be picked up are acquired, where the plurality of associated image elements are used to characterize various element possibilities contained in the first target image element.
The RPA element page image to be picked up may be the target page on which an RPA robot executes a task, for example an operation interface in a remote desktop or a virtual system. When the task is executed, the operation elements contained in the target page, such as buttons, input boxes, and links, need to be picked up for subsequent automatic operations; buttons include, for example, submit, start, end, and skip buttons, and each operation element corresponds to an operation. The first target image element is an image element in the RPA element page image to be picked up whose element type is still to be identified and which is difficult to identify. In other words, the target image element may admit multiple recognition possibilities: for example, a button A contained in the page may carry different meanings in different scenes, such as jumping, submitting, ending, or closing. Each associated image element represents one meaning the first target image element may have, the plurality of associated image elements represent its different possible meanings, and the meaning of each associated image element is explicit.
Step S120, obtaining a knowledge representation of the first target image element by the first target image element and a context image element of the first target image element in the RPA element page image to be picked up.
Because the first target image element may carry different meanings in different interface environments, the interface environment, namely the context image element of the first target image element in the RPA element page image to be picked up, is analyzed together with the first target image element. The knowledge representation of the first target image element obtained from the first target image element and its context image elements therefore embodies the semantic information of the first target image element in the RPA element page image to be picked up. A knowledge representation of an image element is the feature information of the image element extracted by a machine learning model; the knowledge the model acquires about the image element can be carried by a feature vector, a matrix, or a tensor.
Step S130, integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements to obtain an integrated knowledge representation of the first target image element through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements.
Because the plurality of associated image elements can represent several different meanings of the first target image element, its knowledge representation is integrated with their knowledge representations through a commonality measurement result, namely a similarity measurement between the knowledge representations, which can be embodied as a similarity or matching degree. One way to compute it is a vector distance: the smaller the vector distance, the larger the commonality measurement result. The integration adds additional information to the knowledge representation of the first target image element, so that the integrated knowledge representation represents the semantic information of the first target image element in the RPA element page image to be picked up more accurately. Knowledge representation integration, i.e. fusion, is performed for example by vector addition, stitching, or concatenation, which the present application does not limit.
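One way to realise the distance-based commonality measurement described above is to map the vector distance so that smaller distances yield larger commonality; the 1/(1+d) mapping is an illustrative choice, not mandated by the patent:

```python
import numpy as np

def commonality(a, b):
    # Smaller Euclidean distance between knowledge representations yields
    # a larger commonality measurement result, in (0, 1].
    return 1.0 / (1.0 + np.linalg.norm(a - b))

target = np.array([1.0, 0.0, 1.0])         # target representation
assoc_close = np.array([1.0, 0.0, 0.9])    # semantically close candidate
assoc_far = np.array([0.0, 5.0, 0.0])      # semantically distant candidate

c_close = commonality(target, assoc_close)
c_far = commonality(target, assoc_far)
```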
And step S140, acquiring a target element pickup result corresponding to the RPA element page image to be picked up through integrating knowledge representation.
In the present application, because the integrated knowledge representation accurately and completely represents the semantic information of the first target image element in the RPA element page image to be picked up, the target element pickup result obtained from the integrated knowledge representation can accurately restore the real semantics of the target image element, and the element pickup accuracy is improved. The target element pickup result is, for example, the identified type of the target image element, i.e., the meaning the element represents, such as submit, input, or jump.
Based on the method provided by the embodiment of the application, when knowledge representation mining is performed on the first target image element, the associated image elements are added into an analysis process, and different associated image elements can correspond to different meanings of the first target image element. Through the degree of association between the knowledge representation of the associated image element and the knowledge representation of the first target image element, the knowledge representation of the associated image element and the knowledge representation of the first target image element are integrated, semantic information contained in the knowledge representation of the first target image element is perfected, so that the semantic information of the first target image element in the RPA element page image to be picked up can be accurately represented by the integrated knowledge representation, and the accuracy of a target element pickup result obtained through the integrated knowledge representation can be improved.
As another embodiment, the RPA flow processing analysis method provided by the present application may include the following steps:
step S210, acquiring a first target image element from an RPA element page image to be picked up.
Optionally, image segmentation processing is performed on the RPA element page image to be picked up to obtain a plurality of contrast image elements of the RPA element page image to be picked up. The segmentation may cut the target page into a plurality of image blocks according to a preset pixel size; the preset pixel size can be selected adaptively according to the actual pixels of the page, which the present application does not limit. When any one of the plurality of contrast image elements is consistent with any image element in the associated image element set, that contrast image element is taken as the first target image element, where the associated image element set stores a plurality of image elements and a plurality of associated image elements corresponding to each image element.
The associated image element set stores a plurality of image elements and, for each, a plurality of corresponding associated image elements. In other words, the image elements serve as anchors (keys), the associated image elements corresponding to each image element are stored as results (values), and the corresponding associated image elements can be looked up by image element. Alternatively, the set of associated image elements may be generated by a graph network, in which image elements are represented by feature vectors and the associated image elements corresponding to a target image element are connected to it by edges. On this basis, after the RPA element page image to be picked up has been segmented, the contrast image elements are looked up in the associated image element set to determine the first target image element, and the identification of the first target image element is fast.
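The block-wise segmentation described above can be sketched as follows, with an assumed 2-pixel block size on a toy 4×4 page image:

```python
import numpy as np

def segment(page, block):
    # Cut the page image into non-overlapping blocks of the preset pixel
    # size; each block is one candidate contrast image element.
    h, w = page.shape
    return [page[r:r + block, c:c + block]
            for r in range(0, h, block)
            for c in range(0, w, block)]

page_image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 page
blocks = segment(page_image, 2)
```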
Optionally, image segmentation processing is performed on the RPA element page image to be picked up through different strategies to obtain a plurality of contrast image element sets corresponding to the different strategies respectively, where the different strategies perform image segmentation according to different pixel scales. Each contrast image element set comprises a plurality of contrast image elements in the RPA element page image to be picked up, and one contrast image element is constructed from a combination of at least one image block. The number of image blocks of different contrast image elements in the same contrast image element set is the same, while the number of image blocks of contrast image elements in different contrast image element sets is different. When a plurality of contrast image elements belonging to different contrast image element sets are each consistent with image elements in the associated image element set, the contrast image element with the largest number of image blocks among them is determined as the first target image element.
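The largest-block-count selection rule can be sketched as follows (purely illustrative; `matches` pairs each matching contrast element's block count with an identifier):

```python
def pick_largest_match(matches):
    """Among contrast elements from different segmentation strategies that
    all match a stored image element, keep the one built from the most
    image blocks."""
    return max(matches, key=lambda m: m[0])[1]
```

Preferring the match with the most image blocks favors the coarsest segmentation scale that still matches, i.e. the most complete version of the element.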
Step S220, acquiring a plurality of associated image elements of the first target image element in the RPA element page image to be picked up, where the plurality of associated image elements are used to characterize various element possibilities contained in the first target image element.
Optionally, the associated image element set is traversed for the first target image element to obtain the matching target image element, and the plurality of associated image elements corresponding to that target image element are determined as the plurality of associated image elements of the first target image element. Based on the above embodiment, the plurality of associated image elements of the first target image element can be efficiently determined in the associated image element set, so that the integrated knowledge representation of the first target image element can be acquired through the plurality of associated image elements, and the semantic information of the first target image element in the RPA element page image to be picked up can be represented more accurately.
In step S230, knowledge representation of the first target image element is obtained by the first target image element and the contextual image element of the first target image element in the RPA element page image to be picked up.
Optionally, the first target image element and the context image element are input into a knowledge representation mining network, and the knowledge representation of the first target image element is obtained by performing salient feature embedding mapping on the first target image element and the context image element through the knowledge representation mining network. Based on the above embodiment, because the first target image element and the context image element jointly reflect the semantic information of the first target image element in the RPA element page image to be picked up, salient feature embedding mapping may be performed on both, and the knowledge representation obtained through this mapping can accurately and completely represent the meaning of the first target image element. Here, the salient feature embedding mapping is a process of performing embedding mapping on the target element based on an attention mechanism (e.g., an internal attention mechanism), and the embedding mapping may be encoded by an Encoder. For example, the first target image element and the context image element are each input into the knowledge representation mining network, which may be any feasible neural network model, such as a Transformer. A first search array, a first anchor array and a first result array of the first target image element are obtained through the knowledge representation mining network, where the search array is the Query in the attention mechanism, the anchor array is the Key, and the result array is the Value; each such array may be a one-dimensional array, namely a vector.
A second anchor array and a second result array of the context image element are obtained through the knowledge representation mining network; a standardization operation is performed on the multiplication result of the first search array and the first anchor array and the multiplication result of the first search array and the second anchor array to obtain a first saliency eccentric factor of the first target image element and a second saliency eccentric factor of the context image element with respect to the first target image element; and the multiplication result of the first saliency eccentric factor and the first result array and the multiplication result of the second saliency eccentric factor and the second result array are summed to obtain the knowledge representation of the first target image element. Optionally, the first search array and the first anchor array are used to obtain the first saliency eccentric factor of the first target image element, the first result array is used to characterize the first target image element, and the first saliency eccentric factor and the first result array are used to obtain the knowledge representation of the first target image element. The standardization operation maps the multiplication results of different values into a specific value range, for example uniformly into [0,1]. The saliency eccentric factor of a target image element is the corresponding degree of attention influence of that element and can be embodied by giving a corresponding weight; in other words, the saliency eccentric factor is a weight that can adjust the importance of the corresponding target image element.
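The Query/Key/Value computation described above can be sketched as single-query attention, assuming vector-valued arrays and softmax as the standardization operation (a simplification, not the patent's exact network):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_representation(q_target, keys, values):
    """Single-query attention: normalize Query-Key products into saliency
    eccentric factors, then sum the factor-weighted Value arrays. Row 0 of
    keys/values belongs to the target element itself; the remaining rows
    belong to its context image elements."""
    scores = keys @ q_target      # Query multiplied with each Key (anchor array)
    weights = softmax(scores)     # standardization -> saliency eccentric factors
    return weights @ values      # weighted sum of result arrays
```

The factor-weighted sum keeps the representation in the same space as the Value arrays, while the softmax confines the saliency eccentric factors to a normalized range.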
Step S240, integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements, through the commonality measurement results between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, to obtain an integrated knowledge representation of the first target image element.
Optionally, a knowledge representation mining network is adopted, and a plurality of first correlation eccentric factors between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements are determined through the commonality measurement results between them, where a first correlation eccentric factor represents the degree of association between the corresponding associated image element and the first target image element. The knowledge representation of the first target image element is then integrated with the knowledge representations of the plurality of associated image elements through the plurality of first correlation eccentric factors to obtain the integrated knowledge representation of the first target image element. Based on the above embodiment, if the commonality measurement result between the knowledge representation of the first target image element and the knowledge representation of one associated image element is large, that associated image element is closer to the semantic information of the first target image element in the RPA element page image to be picked up, and the first correlation eccentric factor between the two knowledge representations may be configured with a larger value. This perfects the semantic information contained in the knowledge representation of the first target image element, so that the obtained integrated knowledge representation may more accurately represent the semantic information of the first target image element in the RPA element page image to be picked up.
The following describes the process of acquiring the first correlation eccentric factor and the process of acquiring the integrated knowledge representation. For the acquisition of the first correlation eccentric factor, optionally, the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements are input into the knowledge representation mining network, which may adopt the following formula 1 when acquiring the commonality measurement results between them:
αn = 2 × sigmoid(2 × (M1 × V1 + M2 × Vn)) − 1
in determining a plurality of first correlation eccentric factors between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the following equation 2 may be employed:
Wn = exp(αn) / ∑m exp(αm)
wherein αn is the commonality measurement result, V1 is the knowledge representation of the first target image element, Vn is the knowledge representation of the n-th associated image element, and n is the index of the associated image element. M1 and M2 are parameter variable arrays generated when debugging the knowledge representation mining network, and Wn is the first correlation eccentric factor. Because the knowledge representation V1 of the first target image element carries the context information of the RPA element page image to be picked up, the meaning of each associated image element in the current page can be accurately acquired based on the above calculation manner.
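A numeric sketch of formulas 1 and 2, treating M1 and M2 as parameter vectors reduced by a dot product so that αn is a scalar (this shape choice is an assumption for illustration; the patent does not fix the array shapes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def commonality(v1, vn, m1, m2):
    """Formula 1: alpha_n = 2*sigmoid(2*(M1.V1 + M2.Vn)) - 1,
    which maps the commonality measurement result into (-1, 1)."""
    return 2.0 * sigmoid(2.0 * (m1 @ v1 + m2 @ vn)) - 1.0

def correlation_factors(alphas):
    """Formula 2: W_n = exp(alpha_n) / sum_m exp(alpha_m), i.e. a
    softmax over the commonality measurement results."""
    e = np.exp(np.asarray(alphas))
    return e / e.sum()
```

With zero parameters the sigmoid sits at 0.5 and αn is 0; the softmax then distributes the first correlation eccentric factors evenly across the associated image elements.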
Optionally, the first correlation eccentric factor is positively correlated with the commonality measurement result between the knowledge representation of the first target image element and the knowledge representation of the associated image element. In other words, the larger the commonality measurement result between the knowledge representation of the first target image element and the knowledge representation of an associated image element, the larger the first correlation eccentric factor between them; conversely, the smaller the commonality measurement result between the knowledge representation of the first target image element and the knowledge representation of another associated image element, the smaller the first correlation eccentric factor between them.
For the process of acquiring the integrated knowledge representation, optionally, a knowledge representation mining network is used to fuse the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements through the plurality of first correlation eccentric factors (e.g., each representation is first weighted by its factor and the weighted results are then summed), so as to obtain a fused knowledge representation of the first target image element. Multi-head salient feature embedding mapping is then performed on the fused knowledge representation through the knowledge representation mining network to obtain a plurality of salient feature embedding mapping arrays of the first target image element, and the integrated knowledge representation of the first target image element is obtained from these arrays through the knowledge representation mining network. The multi-head salient feature embedding mapping encodes the representation with different adjustment arrays, which enables deeper feature information mining and enhances the feature representation effect of the integrated knowledge representation.
The process of obtaining the integrated knowledge representation can be further refined into the following three layers for detailed description: first, the way of obtaining the fused knowledge representation of the first target image element is introduced; then, the way of obtaining the plurality of salient feature embedding mapping arrays of the first target image element is introduced; finally, the way of obtaining the integrated knowledge representation of the first target image element is described.
First, for the way of obtaining the fused knowledge representation of the first target image element, the following formula 3 may optionally be used:
Vf = (V1 + ∑n Wn × Vn) / 2
wherein Vf is the fused knowledge representation of the first target image element.
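A sketch of formula 3 under the reconstruction above (the /2 averaging and the factor-weighted sum over associated representations are read from the surrounding text, so treat them as assumptions):

```python
import numpy as np

def fuse(v1, associated, weights):
    """Average the target representation V1 with the sum of the
    associated representations weighted by their first correlation
    eccentric factors W_n."""
    weighted = sum(w * vn for w, vn in zip(weights, associated))
    return (np.asarray(v1) + weighted) / 2.0
```

With a single associated element of weight 1, the result is simply the midpoint of the two representations.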
The composition structure of the knowledge representation mining network is briefly introduced below. Optionally, the first target image element and the plurality of associated image elements are input into an encoder of the knowledge representation mining network, and the encoder performs embedding mapping encoding on them to obtain the knowledge representation V1 of the first target image element and the knowledge representations of the plurality of associated image elements. The knowledge representation V1 of the first target image element and the knowledge representations of the plurality of associated image elements are input into a first correlation eccentric factor operator of the knowledge representation mining network, which calculates the plurality of first correlation eccentric factors by adopting formulas 1 and 2; the first correlation eccentric factor operator is an attention operator. The knowledge representation mining network then integrates each first correlation eccentric factor with the corresponding knowledge representation, and the obtained result (a vector) is loaded into a sum-up mapping operator of the knowledge representation mining network, which may be a fully connected network, to calculate the fused knowledge representation of the first target image element through formula 3.
For the way of obtaining the plurality of salient feature embedding mapping arrays of the first target image element, the multi-head salient feature embedding mapping is set as a two-head salient feature embedding mapping, such as saliency head 1 and saliency head 2, and each saliency head is matched with three corresponding adjustment arrays (e.g., two-dimensional matrices) Mq, Mk and Mv. For saliency head 1, the fused knowledge representation of the first target image element is multiplied with the three adjustment arrays Mq, Mk and Mv respectively through the knowledge representation mining network to obtain a search array, an anchor array and a result array of the fused knowledge representation. The search array is multiplied with the transpose of the anchor array through the knowledge representation mining network to obtain a product figure1, the product figure1 is subjected to a standardization operation to obtain a saliency eccentric factor Wx of the first target image element, and the saliency eccentric factor Wx is multiplied with the result array of the first target image element to obtain salient feature embedding mapping array 1 of the first target image element. For saliency head 2, the fused knowledge representation of the first target image element is likewise multiplied with its three adjustment arrays Mq, Mk and Mv respectively through the knowledge representation mining network to obtain a search array, an anchor array and a result array of the fused knowledge representation.
The search array is multiplied with the transpose of the anchor array through the knowledge representation mining network to obtain a product figure2, the product figure2 is subjected to a standardization operation to obtain a saliency eccentric factor Wy of the first target image element, and the saliency eccentric factor Wy is multiplied with the result array of the first target image element to obtain salient feature embedding mapping array 2 of the first target image element.
For the way of obtaining the integrated knowledge representation of the first target image element, the plurality of salient feature embedding mapping arrays are combined through the knowledge representation mining network to obtain a salient feature embedding mapping tensor, and a downsampling operation is performed on the salient feature embedding mapping tensor by the knowledge representation mining network to obtain the integrated knowledge representation of the first target image element. With the multi-head salient feature embedding mapping set as the two-head salient feature embedding mapping of the second layer, the knowledge representation mining network combines, namely splices, salient feature embedding mapping array 1 and salient feature embedding mapping array 2 of the first target image element to obtain the salient feature embedding mapping tensor. The salient feature embedding mapping tensor is then fully connected through the knowledge representation mining network, namely integrated with the fully connected weight vectors and summed with the fully connected offset vectors, to obtain the integrated knowledge representation of the first target image element.
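The splice-then-fully-connect step can be sketched as follows (all shapes, and the identity-based weight matrix, are illustrative assumptions):

```python
import numpy as np

def multi_head_integrate(head_outputs, w, b):
    """Concatenate the per-head salient feature embedding mapping arrays,
    then apply a fully connected projection (weight matrix w plus offset b)
    to obtain the integrated knowledge representation."""
    concat = np.concatenate(head_outputs)  # splice head 1 and head 2 outputs
    return concat @ w + b                  # fully connected mapping

head1 = np.array([1.0, 0.0])
head2 = np.array([0.0, 1.0])
w = np.eye(4)[:, :2]                       # project 4 spliced dims down to 2
b = np.zeros(2)
integrated = multi_head_integrate([head1, head2], w, b)
```

The projection plays the role of the downsampling operation: the spliced tensor has twice the width of one head, and the fully connected mapping reduces it back to the representation dimension.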
In acquiring knowledge representation of a plurality of associated image elements, the following steps may be employed:
optionally, for any associated image element, that associated image element is input into the knowledge representation mining network, and salient feature embedding mapping is performed on the plurality of image elements in the associated image element through the knowledge representation mining network to obtain the knowledge representation of the associated image element. For example, for any one image element among the plurality of image elements in any associated image element, a third search array, a third anchor array and a third result array of that image element are acquired through the knowledge representation mining network, and a fourth anchor array and a fourth result array of the remaining image elements (other than that image element) are acquired through the knowledge representation mining network. A standardization operation is performed on the multiplication result of the third search array and the third anchor array and the multiplication result of the third search array and the fourth anchor array, to obtain a third saliency eccentric factor of the image element and a fourth saliency eccentric factor of the remaining image elements with respect to the image element. The multiplication result of the third saliency eccentric factor and the third result array and the multiplication result of the fourth saliency eccentric factor and the fourth result array are summed through the knowledge representation mining network to obtain the knowledge representation of the image element. Finally, the knowledge representations of the plurality of image elements in the associated image element are integrated through the knowledge representation mining network to obtain the knowledge representation of the associated image element.
Step S250, obtaining a target element pickup result corresponding to the RPA element page image to be picked up through the integrated knowledge representation.
Optionally, when the commonality measurement result between the integrated knowledge representation of any target element pickup result and the integrated knowledge representation of the first target image element meets the target commonality measurement result requirement, that target element pickup result is determined as the target element pickup result corresponding to the RPA element page image to be picked up. A commonality measurement result meets the target commonality measurement result requirement when it is not smaller than a preset commonality measurement value. Based on the above embodiment, the target element pickup result corresponding to the RPA element page image to be picked up may be determined through the commonality measurement result between the integrated knowledge representation of the first target image element and the integrated knowledge representation of each target element pickup result, and the determination speed of the target element pickup result is fast.
For example, N target element pickup results 1, 2, 3 … N are included. Let the integrated knowledge representation of the first target image element be Vf, and the integrated knowledge representations of the N target element pickup results be V1, V2, V3 … Vn respectively. The Euclidean distance between the integrated knowledge representation Vf of the first target image element and the integrated knowledge representation of each of the N target element pickup results is determined to obtain the commonality measurement results. If the preset commonality measurement value is S1, the target element pickup result whose commonality measurement result is greater than S1 is determined as the target element pickup result corresponding to the RPA element page image to be picked up.
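One way to realize the distance-based commonality check (the 1/(1+d) mapping from Euclidean distance to a commonality score is an illustrative assumption; the patent only states that the commonality measurement is derived from the Euclidean distance):

```python
import numpy as np

def pick_targets(v_f, candidates, s1):
    """Select the candidate pickup results whose commonality with v_f
    exceeds the preset value s1. Smaller Euclidean distance yields
    higher commonality under the 1/(1+d) mapping."""
    picked = []
    for idx, v in enumerate(candidates):
        d = np.linalg.norm(np.asarray(v_f) - np.asarray(v))
        if 1.0 / (1.0 + d) > s1:
            picked.append(idx)
    return picked
```

An identical representation has distance 0 and commonality 1, so it always passes any threshold below 1, while distant candidates are filtered out.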
When the integrated knowledge representation of a target element pickup result is obtained, optionally, image segmentation processing is performed on the target element pickup result to obtain a plurality of contrast image elements of the target element pickup result. When any one of the plurality of contrast image elements of the target element pickup result is consistent with any image element in the associated image element set (which stores the plurality of image elements and the plurality of associated image elements corresponding to each image element), that contrast image element is determined as a second target image element. The plurality of associated image elements corresponding to the second target image element are obtained from the associated image element set, the knowledge representation of the second target image element is obtained through the second target image element and the context image element of the second target image element in the target element pickup result, the commonality measurement results between the knowledge representation of the second target image element and the knowledge representations of the plurality of associated image elements corresponding to the second target image element are obtained, and these knowledge representations are integrated accordingly to obtain the integrated knowledge representation of the second target image element, namely the integrated knowledge representation of the target element pickup result.
In the above embodiment of the present application, different associated image elements may correspond to different meanings of the first target image element. Through the degree of association between the knowledge representation of the associated image element and the knowledge representation of the first target image element, the knowledge representation of the associated image element and the knowledge representation of the first target image element are integrated, semantic information contained in the knowledge representation of the first target image element is perfected, so that the semantic information of the first target image element in the RPA element page image to be picked up can be accurately represented by the integrated knowledge representation, and the accuracy of a target element pickup result obtained through the integrated knowledge representation can be improved.
The debugging process of the knowledge representation mining network is described below, which may specifically include the following steps:
step S310, obtaining a debug learning sample.
The debugging and learning sample is data for debugging (training) the network, and comprises an RPA element page image learning sample, a target element pickup result sample, and a commonality measurement result sample between the RPA element page image learning sample and the target element pickup result sample.
For example, a common measurement result sample between the RPA element page image learning sample and the target element pickup result sample is represented by Y and N, wherein Y represents that the common measurement result of the RPA element page image learning sample and the target element pickup result sample is high, or the RPA element page image learning sample is matched with the target element pickup result sample, and N represents that the common measurement result of the RPA element page image learning sample and the target element pickup result sample is low, or the RPA element page image learning sample is not matched with the target element pickup result sample.
Step S320, the RPA element page image learning sample and the target element pickup result sample are input to a knowledge representation mining network.
Step S330, extracting the integrated knowledge representation of the target image element sample and the integrated knowledge representation of the target element pickup result sample in the RPA element page image learning sample through the knowledge representation mining network.
Optionally, the target image element sample in the RPA element page image learning sample is obtained through the knowledge representation mining network, the identification target image element of the target element pickup result sample is obtained through the knowledge representation mining network, the plurality of associated image elements of the target image element sample and the plurality of associated image elements of the identification target image element are obtained through the knowledge representation mining network, and the integrated knowledge representation of the target image element sample and the integrated knowledge representation of the target element pickup result sample are obtained through the knowledge representation mining network.
For example, the target image element sample is obtained from the RPA element page image learning sample through a knowledge extraction operator of the knowledge representation mining network, the plurality of associated image elements corresponding to the target image element sample are obtained from the associated image element set, and the knowledge representation Va of the target image element sample and the knowledge representations Vr of the plurality of associated image elements are extracted. The identification target image element in the target element pickup result sample is acquired through the knowledge extraction operator of the knowledge representation mining network, the plurality of associated image elements corresponding to the identification target image element are acquired from the associated image element set, and the knowledge representation Vb of the identification target image element and the knowledge representations Vs of the plurality of associated image elements are extracted. The knowledge representation Va of the target image element sample and the knowledge representations Vr of the plurality of associated image elements are input into an internal focusing characteristic analysis operator (namely, a network operator based on a self-attention mechanism) of the knowledge representation mining network, which integrates them to obtain the integrated knowledge representation of the target image element sample; likewise, the knowledge representation Vb of the identification target image element and the knowledge representations Vs of the plurality of associated image elements are input into the internal focusing characteristic analysis operator, which integrates them to obtain the integrated knowledge representation of the identification target image element. The integrated knowledge representation of the target image element sample and the integrated knowledge representation of the identification target image element are input into a joint mapping integration operator of the knowledge representation mining network (namely, a network operator based on a multi-head attention mechanism that combines the information of a plurality of heads to perform saliency mapping). Multi-head salient feature embedding mapping is performed on the integrated knowledge representation of the target image element sample through the joint mapping integration operator to obtain a plurality of salient feature embedding mapping arrays corresponding to it, and multi-head salient feature embedding mapping is likewise performed on the integrated knowledge representation of the identification target image element based on the joint mapping integration operator to obtain a plurality of salient feature embedding mapping arrays corresponding to it.
The plurality of salient feature embedding mapping arrays corresponding to the integrated knowledge representation of the target image element sample are input into a summarizing decision operator (e.g., a fully connected network operator) of the knowledge representation mining network, and are decided (fully connected mapping) through the summarizing decision operator to obtain the integrated knowledge representation of the target image element sample. Similarly, the plurality of salient feature embedding mapping arrays corresponding to the integrated knowledge representation of the identification target image element are input into the summarizing decision operator and decided through it to obtain the integrated knowledge representation of the identification target image element, and the integrated knowledge representation of the target element pickup result sample is obtained through the integrated knowledge representation of the identification target image element.
Step S340, the internal configuration variables of the knowledge representation mining network are optimized through the loss between the commonality measurement result, computed between the integrated knowledge representation of the target image element sample and the integrated knowledge representation of the target element pickup result sample, and the commonality measurement result sample.
For example, the network-internal configuration variables of the knowledge representation mining network (including its various parameters, hyper-parameters, and so on) may be optimized based on a contrastive loss function, or, in other embodiments, based on a cross-entropy loss function.
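As an illustration of the contrastive option, a classic pairwise contrastive loss over the two integrated knowledge representations could look like the following; the Euclidean distance and margin form are one common choice, not necessarily the patent's:

```python
import numpy as np

def contrastive_loss(z1, z2, label, margin=1.0):
    """Pairwise contrastive loss: pull matched pairs (label=1) together,
    push unmatched pairs (label=0) at least `margin` apart.
    z1, z2 are the two integrated knowledge representations."""
    d = np.linalg.norm(z1 - z2)
    return label * d**2 + (1 - label) * max(margin - d, 0.0)**2
```

Averaging this loss over a batch of (sample, pickup-result, commonality-label) triples and backpropagating would drive the two representations of matching pairs toward each other.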
When the knowledge representation mining network is debugged, the associated image elements are added into the analysis process, and different associated image elements represent different meanings of the first target image element. The knowledge representation of the target image element sample and the semantic information contained in the identification target image element are refined according to the associated image elements, so the two resulting integrated knowledge representations can more accurately express the semantic information of the target image element sample in the RPA element page image learning sample and that of the identification target image element in the target element pickup result, thereby improving the knowledge representation mining effect of the knowledge representation mining network.
Based on the foregoing embodiments, the embodiments of the present application provide an RPA flow processing analysis device. Each unit included in the device, and each module included in each unit, may be implemented by a processor in a computer device; it can, of course, also be implemented by dedicated logic circuitry. In practice, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
Fig. 2 is a schematic diagram of a composition structure of an RPA flow processing analysis device according to an embodiment of the present application, and as shown in fig. 2, an RPA flow processing analysis device 200 includes:
an associated element obtaining module 210, configured to obtain a plurality of associated image elements of a first target image element in an RPA element page image to be picked up, where the plurality of associated image elements are used to characterize various element possibilities contained in the first target image element;
a knowledge representation extraction module 220, configured to obtain a knowledge representation of the first target image element through the first target image element and a context image element of the first target image element in the RPA element page image to be picked up;
A knowledge representation integration module 230, configured to integrate the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements to obtain an integrated knowledge representation of the first target image element according to a result of a commonality metric between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements;
and the RPA element picking module 240 is configured to obtain a target element picking result corresponding to the RPA element page image to be picked up through the integrated knowledge representation.
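The integration performed by the knowledge representation integration module 230 can be sketched as a commonality-weighted fusion over the associated image elements' representations. Here the commonality metric is taken to be a softmax over dot-product similarities, and the residual-style combination at the end is an assumption; all names are illustrative:

```python
import numpy as np

def integrate_with_associated(target_repr, associated_reprs):
    """Weight each associated element's representation by its commonality
    (softmax over dot-product similarity) with the first target image
    element, then fuse into an integrated knowledge representation."""
    sims = np.array([target_repr @ a for a in associated_reprs])
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()  # the "first correlation eccentricity factors"
    fused = sum(w * a for w, a in zip(weights, associated_reprs))
    return target_repr + fused  # residual-style integration (assumed)
```

The weights play the role of the claims' "first correlation eccentricity factors": each one reflects how strongly the corresponding associated image element relates to the first target image element.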
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and for technical details that are not disclosed in the embodiments of the apparatus of the present application, reference should be made to the description of the embodiments of the method of the present application.
It should be noted that, in the embodiments of the present application, if the RPA process analysis method is implemented in the form of a software functional module and is sold or used as a separate product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, or the part of it contributing to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, embodiments of the application are not limited to any specific hardware, software, or firmware, or any combination of hardware, software, and firmware.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes part or all of the steps in the method when executing the program.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, when run in a computer device, causes a processor in the computer device to perform some or all of the steps for carrying out the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, the storage medium, the computer program and the computer program product of the present application, reference should be made to the description of the embodiments of the method of the present application.
Fig. 3 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application, as shown in fig. 3, the hardware entity of the computer device 1000 includes: a processor 1001 and a memory 1002, wherein the memory 1002 stores a computer program executable on the processor 1001, the processor 1001 implementing the steps in the method of any of the embodiments described above when the program is executed.
The memory 1002 is configured to store instructions and applications executable by the processor 1001, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by each module in the processor 1001 and the computer device 1000; it may be implemented by a flash memory (FLASH) or a random access memory (Random Access Memory, RAM).
The processor 1001 implements the steps of the RPA flow process analysis method of any one of the above when executing a program. The processor 1001 generally controls the overall operation of the computer device 1000.
Embodiments of the present application provide a computer storage medium storing one or more programs executable by one or more processors to implement the steps of the RPA flow process analysis method of any of the embodiments above.
It should be noted here that: the description of the storage medium and apparatus embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus of the present application, please refer to the description of the method embodiments of the present application. The processor may be at least one of an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, and a microprocessor. It will be appreciated that the electronic device implementing the above processor function may be another device, and embodiments of the present application are not specifically limited in this regard.
The computer storage medium/Memory may be a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable programmable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable Read Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a magnetic random access Memory (Ferromagnetic Random Access Memory, FRAM), a Flash Memory (Flash Memory), a magnetic surface Memory, an optical disk, or a Read Only optical disk (Compact Disc Read-Only Memory, CD-ROM); but may also be various terminals such as mobile phones, computers, tablet devices, personal digital assistants, etc., that include one or any combination of the above-mentioned memories.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the steps/processes described above do not imply an order of execution; the execution order of each step/process should be determined by its function and inherent logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of the units is only one logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims (10)

1. An RPA process analysis method, applied to a computer device, comprising:
acquiring a plurality of associated image elements of a first target image element in an RPA element page image to be picked up, wherein the plurality of associated image elements are used for representing various element possibilities contained in the first target image element;
acquiring knowledge representation of the first target image element through the first target image element and a context image element of the first target image element in the RPA element page image to be picked up;
integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements to obtain an integrated knowledge representation of the first target image element;
And obtaining a target element pickup result corresponding to the RPA element page image to be picked up through the integrated knowledge representation.
2. The method of claim 1, wherein the obtaining a knowledge representation of the first target image element by the first target image element and a contextual image element of the first target image element in the RPA element page image to be picked up comprises:
inputting the first target image element and the context image element into a knowledge representation mining network;
and performing salient feature embedding mapping on the first target image element and the context image element through the knowledge representation mining network to obtain knowledge representation of the first target image element.
3. The method of claim 2, wherein the performing salient feature embedding mapping on the first target image element and the context image element to obtain the knowledge representation of the first target image element comprises:
acquiring a first search array, a first anchor array and a first result array of the first target image element;
acquiring a second anchor array and a second result array of the contextual image element;
Performing standardization operation on the multiplication result of the first search array and the first anchor array and the multiplication result of the first search array and the second anchor array to obtain a first significant eccentricity factor of the first target image element and a second significant eccentricity factor of the context image element to the first target image element;
summing the multiplication result of the first significant eccentricity factor and the first result array and the multiplication result of the second significant eccentricity factor and the second result array to obtain knowledge representation of the first target image element;
the integrating the knowledge representation of the first target image element with the knowledge representations of the plurality of associated image elements through the commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the obtaining the integrated knowledge representation of the first target image element includes:
the mining network is characterized by knowledge to perform the following operations:
determining a plurality of first correlation eccentricity factors between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements through a commonality measurement result between the knowledge representation of the first target image element and the knowledge representations of the plurality of associated image elements, the first correlation eccentricity factors being used to characterize the degree of association of the corresponding associated image element with the first target image element;
And integrating the knowledge representation of the RPA element page image to be picked up with the knowledge representation of the plurality of associated image elements through a plurality of first correlation eccentric factors to obtain the integrated knowledge representation of the first target image element.
4. A method according to claim 3, wherein integrating the knowledge representation of the RPA element page image to be picked with the knowledge representations of the plurality of associated image elements by a plurality of first correlation eccentricity factors, resulting in an integrated knowledge representation of the first target image element comprises:
fusing knowledge representation of the RPA element page image to be picked up with knowledge representation of the plurality of associated image elements through a plurality of first correlation eccentric factors to obtain fused knowledge representation of the first target image element;
performing multi-head salient feature embedding mapping on the fusion knowledge representation to obtain a plurality of salient feature embedding mapping arrays of the first target image element;
combining the plurality of salient feature embedding mapping arrays to obtain salient feature embedding mapping tensors;
and performing downsampling operation on the salient feature embedding mapping tensor to obtain the integrated knowledge representation of the first target image element.
5. The method of claim 4, wherein the knowledge characterization mining network debugging process comprises:
obtaining a debugging learning sample, wherein the debugging learning sample comprises an RPA element page image learning sample, a target element pickup result sample and a commonality measurement result sample between the RPA element page image learning sample and the target element pickup result sample;
inputting the RPA element page image learning sample and the target element pickup result sample into the knowledge representation mining network;
extracting the integrated knowledge representation of the target image element sample in the RPA element page image learning sample and the integrated knowledge representation of the target element pickup result sample through the knowledge representation mining network;
optimizing the knowledge representation mining network internal configuration variables through loss between a commonality measurement result and the commonality measurement result sample between the integrated knowledge representation of the target image element sample and the integrated knowledge representation of the target element pickup result sample.
6. The method of claim 1, wherein the acquiring a plurality of associated image elements of a first target image element in the RPA element page image to be picked up comprises:
Traversing in a set of associated image elements a target image element associated with the first target image element, the set of associated image elements storing a plurality of image elements and a plurality of associated image elements corresponding to each of the image elements;
determining a plurality of associated image elements corresponding to the target image element as a plurality of associated image elements of the first target image element;
the acquiring process of the first target image element comprises the following steps:
performing image segmentation processing on the RPA element page image to be picked up to obtain a plurality of contrast image elements of the RPA element page image to be picked up;
and when any one of the plurality of contrast image elements is consistent with any one of a set of associated image elements, determining the any one contrast image element as the first target image element, wherein the set of associated image elements stores a plurality of image elements and a plurality of associated image elements corresponding to each image element.
7. The method of claim 6, wherein the performing image segmentation processing on the RPA element page image to be picked up to obtain a plurality of contrast image elements of the RPA element page image to be picked up includes:
Performing image segmentation processing on the RPA element page image to be picked up through different strategies to obtain a plurality of contrast image element sets respectively corresponding to the different strategies, wherein each contrast image element set comprises a plurality of contrast image elements in the RPA element page image to be picked up, the number of image blocks of different contrast image elements in the same contrast image element set is the same, and meanwhile, the number of image blocks of the contrast image elements in different contrast image element sets is different;
the determining any one of the plurality of contrast image elements as the first target image element when the any one of the plurality of contrast image elements is consistent with any one of the set of associated image elements comprises:
and when the plurality of contrast image elements belonging to different contrast image element sets are respectively consistent with the plurality of image elements in the associated image element set, determining the contrast image element with the largest number of image blocks in the plurality of contrast image elements belonging to the different contrast image element sets as the first target image element.
8. The method according to any one of claims 1 to 7, wherein the method for obtaining knowledge representation of the plurality of associated image elements comprises:
For any associated image element, inputting the any associated image element into a knowledge representation mining network;
and performing salient feature embedding mapping on a plurality of image elements in any associated image element through the knowledge representation mining network to obtain knowledge representation of any associated image element.
9. The method of claim 8, wherein performing salient feature embedding mapping on a plurality of image elements in the any associated image element to obtain a knowledge representation of the any associated image element comprises:
for any one image element of a plurality of image elements in any associated image element, acquiring a third search array, a third anchor array and a third result array of the any one image element;
acquiring a fourth anchor array and a fourth result array of the rest of image elements except any image element in a plurality of image elements in any associated image element;
performing standardization operation on the multiplication result of the third search array and the third anchor array and the multiplication result of the third search array and the fourth anchor array to obtain a third significant eccentric factor of any one image element and a fourth significant eccentric factor of the rest image elements on the any image element;
Summing the multiplication result of the third significant eccentric factor and the third result array and the multiplication result of the fourth significant eccentric factor and the fourth result array to obtain knowledge representation of any one image element;
integrating knowledge representation of a plurality of image elements in any associated image element to obtain knowledge representation of any associated image element.
10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-9 when the program is executed.
CN202310915645.7A 2023-07-25 2023-07-25 RPA flow processing analysis method and computer equipment Pending CN116778189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310915645.7A CN116778189A (en) 2023-07-25 2023-07-25 RPA flow processing analysis method and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310915645.7A CN116778189A (en) 2023-07-25 2023-07-25 RPA flow processing analysis method and computer equipment

Publications (1)

Publication Number Publication Date
CN116778189A true CN116778189A (en) 2023-09-19

Family

ID=88011610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310915645.7A Pending CN116778189A (en) 2023-07-25 2023-07-25 RPA flow processing analysis method and computer equipment

Country Status (1)

Country Link
CN (1) CN116778189A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455910A (en) * 2023-12-22 2024-01-26 广州金和精密机电设备有限公司 Winding identification method and winding equipment based on machine vision
CN117455910B (en) * 2023-12-22 2024-03-26 广州金和精密机电设备有限公司 Winding identification method and winding equipment based on machine vision

Similar Documents

Publication Publication Date Title
CN108304882B (en) Image classification method and device, server, user terminal and storage medium
WO2019184464A1 (en) Detection of near-duplicate image
CN109002784B (en) Street view identification method and system
CN116681957B (en) Image recognition method based on artificial intelligence and computer equipment
CN116778189A (en) RPA flow processing analysis method and computer equipment
CN111078924B (en) Image retrieval method, device, terminal and storage medium
CN113344016A (en) Deep migration learning method and device, electronic equipment and storage medium
US8655016B2 (en) Example-based object retrieval for video surveillance
CN116206334A (en) Wild animal identification method and device
CN113837257B (en) Target detection method and device
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN111985616B (en) Image feature extraction method, image retrieval method, device and equipment
CN111428612A (en) Pedestrian re-identification method, terminal, device and storage medium
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
US20220122341A1 (en) Target detection method and apparatus, electronic device, and computer storage medium
CN115984977A (en) Living body detection method and system
CN115082999A (en) Group photo image person analysis method and device, computer equipment and storage medium
CN111400621B (en) Position information authenticity verification method and device and electronic equipment
CN114155388A (en) Image recognition method and device, computer equipment and storage medium
CN114385993A (en) Identity detection method, device and readable medium
CN113111206A (en) Image searching method and device, electronic equipment and storage medium
CN113496243A (en) Background music obtaining method and related product
CN116152723B (en) Intelligent video monitoring method and system based on big data
JP7395082B1 (en) Image search device, image search method, and image search program
CN117272003B (en) Method and device for analyzing bending creep resistance test data of artificial board and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination