CN115019029B - RPA element intelligent positioning method based on neural automaton

Info

Publication number
CN115019029B
CN115019029B (application CN202210944163.XA)
Authority
CN
China
Prior art keywords
state
rpa
picture
automaton
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210944163.XA
Other languages
Chinese (zh)
Other versions
CN115019029A (en)
Inventor
王昊 (Wang Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Real Intelligence Technology Co ltd
Original Assignee
Hangzhou Real Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Real Intelligence Technology Co ltd filed Critical Hangzhou Real Intelligence Technology Co ltd
Priority to CN202210944163.XA priority Critical patent/CN115019029B/en
Publication of CN115019029A publication Critical patent/CN115019029A/en
Application granted granted Critical
Publication of CN115019029B publication Critical patent/CN115019029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/245: Aligning, centring, orientation detection or correction of the image by locating a pattern; special marks for positioning
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/40: Extraction of image or video features
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of page element positioning and discloses an RPA element intelligent positioning method based on a neural automaton. The specific operation steps comprise data acquisition and transformation, model design, model training and model reasoning. The neural automaton has five states and corresponding actions: A means element recognition is rejected; B means element recognition passes and the automaton moves to the tile on the right; C means recognition passes and it moves to the tile below; D means recognition passes and it moves to the tile on the left; E means recognition passes and it moves to the tile above. The RPA element intelligent positioning method based on the neural automaton reduces dependence on large amounts of data by reducing the modeling scale, improves the efficiency of feature utilization by using only a single model, and, by processing boundary information only, accelerates reasoning and eliminates interference from irrelevant factors.

Description

RPA element intelligent positioning method based on neural automaton
Technical Field
The invention relates to the technical field of page element positioning, in particular to an RPA element intelligent positioning method based on a neural automaton.
Background
RPA (Robotic Process Automation) is a process automation technology. With the process editor provided by RPA software, a business operation process capable of automatic execution can be designed and configured, packaged in the form of a software robot or virtual robot, and deployed to production environments and business systems for execution, simulating a series of human operations on a computer, such as mouse movement, mouse clicking, keyboard input, opening web pages, acquiring page information, creating files, entering file content, saving files and deleting files.
The existing traditional RPA technology interacts with business systems by parsing interface layout and code. For example, various mouse and keyboard operations in an operating system are realized through the API (Application Programming Interface) provided by some desktop application automation tools; positioning and operating browser page elements such as buttons, input boxes and text lines are realized by parsing the CSS (Cascading Style Sheets) structure, JavaScript (a scripting language for developing web pages) code and the like of browser pages; positioning and operating elements in a software interface are realized by parsing the source code of office software. This technology places high requirements on the visibility and openness of the operation object: the position and attribute information of the operation object must be acquired through an interface or source code in order to execute the corresponding operations.
In addition, this technology is problematic in some application scenarios. For operation objects such as remote desktops, virtual systems, and office software developed in-house by clients, only a page picture can be obtained, and positioning and operation cannot be performed through an API or by parsing source code. To solve this problem, a solution provided in the prior art is to use computer vision technology from the field of AI (artificial intelligence) to match, locate and operate operation elements. For example, a "submit" button on a remote desktop is first located within the whole page picture by means of object detection or template matching, and the "submit" action is then completed with mouse movement and click operations. The target detection approach has the disadvantage that the deep learning model it uses needs a large number of sample pictures annotated with detection boxes during training to achieve high positioning accuracy. For a web page or common office software, a large amount of sample data can be constructed automatically by synthesis; however, for business software developed by a client itself or rarely used, labeled sample data are difficult to obtain, and since the deep learning model has not learned similar samples before, the effect of element detection and positioning is not ideal.
The template matching approach has two problems: first, the traditional, single matching mode based on picture pixel values or "feature points" does not match well; second, as application scenarios expand, the number of page elements, i.e. templates, to be matched grows, and without a reasonable retrieval structure, template query and matching become slow, which affects use.
Recently there has been a trend of combining the above technologies to solve the element positioning problem, but this increases both implementation difficulty and computation.
In summary, when solving the element positioning problem, the prior art is either too simplistic, too difficult to implement, or inconvenient to use, so it is necessary to design a technology that considers development and deployment efficiency at the same time to solve the above problems.
Disclosure of Invention
In order to achieve the purpose, the invention provides the following technical scheme: an RPA element intelligent positioning method based on a neural automaton comprises the following specific operation steps:
step 1, dividing a picture into a plurality of smaller tiles and, based on a pre-trained tile feature extraction model, extracting the features of each tile to form a feature map;
step 2, obtaining the image block state of each image block based on a pre-trained state prediction model, comprising:
2.1 Select any position (i, j) of the feature map y and start with the corresponding feature y_{i,j}; at the initial time t_0, the state of the automaton is z_0.
2.2 The automaton's state transition model z_t = g_φ(z_{t-1}, y_{i,j}) yields the state at the next moment t.
2.3 The decision model π_ψ then determines the automaton's next action u_t = π_ψ(z_t) and marks the tile at location (i, j) with u_t.
2.4 The action u_t moves the automaton one unit from the current position (i, j) to the next position, for example to (i+1, j) for a rightward move, with the downward, leftward and upward moves defined analogously;
step 3, finding at least one ordered tile action-mark sequence meeting a preset condition, wherein each action mark in the sequence corresponds to one tile in the picture, and all tiles corresponding to the sequence form a closed loop according to the indication of the action marks;
step 4, obtaining the position and the size of the RPA element based on the ordered tile state sequence.
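For illustration only (not part of the claims), the following minimal Python sketch shows how steps 1-4 fit together once a per-tile `step` function is available; the function names, the (row, column) coordinate convention and the grid-boundary handling are assumptions made for this sketch:

```python
# Illustrative sketch of the tile walk in steps 2-4: follow the per-tile actions
# until the visited tiles close into a loop, then derive position and size.
from typing import Callable, List, Tuple

A, B, C, D, E = "A", "B", "C", "D", "E"                 # reject / right / down / left / up
MOVE = {B: (0, 1), C: (1, 0), D: (0, -1), E: (-1, 0)}   # (row, col) offsets per action

def trace_element(step: Callable[[Tuple[int, int]], str],
                  start: Tuple[int, int],
                  n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Follow B/C/D/E action marks from `start` until the walk closes into a loop."""
    pos, visited = start, []
    while pos not in visited:
        action = step(pos)
        if action == A:                    # rejection: no element boundary at this tile
            return []
        visited.append(pos)
        dr, dc = MOVE[action]
        pos = (pos[0] + dr, pos[1] + dc)
        if not (0 <= pos[0] < n_rows and 0 <= pos[1] < n_cols):
            return []                      # walked off the grid: no closed loop found
    rows = [r for r, _ in visited]
    cols = [c for _, c in visited]
    # the closed loop encloses the element: minima give the top-left boundary tile,
    # maxima give the bottom-right boundary tile (step 4)
    return [(min(rows), min(cols)), (max(rows), max(cols))]
```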
Wherein the solving process for the model parameters θ, φ and ψ, i.e. the (pre-)training process of the models, comprises:
The first step, data acquisition and transformation: the acquired data are a picture x and tile states s, where the tile states form a matrix and each element represents a tile.
Secondly, model design: from a given picture, features are extracted using a CNN to obtain the feature map y = f(x). Running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined. Note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-to-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM. The RPA elements can then be located through the state trajectory that the neural automaton has traversed.
Thirdly, model training: the model parameters θ are solved by optimizing the problem θ* = argmin_θ L(D; θ).
The training picture is encoded by the CNN into a feature map, each tile corresponding to a feature vector y_{i,j} on the feature map. A complete boundary containing B, C, D, E is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)} (here the sequence index t determines the coordinates (i, j) of each feature vector). For this sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP maps the memories in parallel to the predicted state sequence {ŝ_t}. The loss is then calculated with a cross-entropy (CrossEntropy) loss function, L = −Σ_t log ŝ_t(s_t), and the back-propagation algorithm is used to reduce the loss L; using the RMSprop optimizer, the model parameters θ that can correctly complete the task are obtained after the loss converges.
And fourthly, model reasoning: no loss is calculated at this time; with the model parameters optimized above, a predicted state label sequence is obtained through the state-vector transitions and feature reading actions of the model design, and the bounding box of the RPA element is calculated from it.
The design of the neural automaton and its specific operation steps are as follows.
The neural automaton has five states and corresponding actions. A: element recognition is rejected. B: element recognition passes and the automaton moves to the tile on the right. C: element recognition passes and the automaton moves to the tile below. D: element recognition passes and the automaton moves to the tile on the left. E: element recognition passes and the automaton moves to the tile above.
Before running, the neural automaton needs to position specific elements in the picture. The picture is first input to a CNN network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, the picture being divided by a grid.
Each tile divided from the picture corresponds to a vector. The neural automaton runs in this grid space, takes the vector corresponding to a tile as input, and determines the next input by itself according to the current state, until it returns to the starting point, thereby obtaining the position and size of one interface element.
The neural automaton walks over the entire picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located; the automaton then continues over other regions of the picture looking for other RPA elements, until all tiles have been visited or excluded.
Preferably, the neural automaton has an internal state vector, and the next state vector is obtained by inference from the input vector and the automaton's own state vector; this behavior can be conveniently modeled using a recurrent neural network.
Preferably, there is a vector prototype for each state; other vectors sufficiently similar to the prototype are also regarded as that state, with the cosine of the angle between vectors generally used as the measure of their similarity. The state vector space is partitioned into different regions by the neighborhoods of the state prototypes, corresponding to the individual states; in practice, an MLP (multi-layer perceptron) is used to model the mapping from the vector space to the state space.
According to yet another embodiment of the present invention, there is provided an electronic device including a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, is capable of implementing the above-described neural automaton-based RPA element intelligent localization method.
According to still another embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of implementing the above neural automata-based RPA element intelligent localization method.
Compared with the prior art, the invention provides an RPA element intelligent positioning method based on a neural automaton, which has the following beneficial effects:
compared with target detection and feature extraction based on deep learning, the RPA element intelligent positioning method based on the neural automata limits the receptive field of a visual model to a local sub-pixel space of an image instead of the whole image, so that the parameter quantity required by modeling can be reduced by a plurality of orders of magnitude, and the data quantity required by the training of the model is easier to obtain.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a first example diagram of the present invention;
FIG. 3 is a second example diagram of the present invention;
FIG. 4 is a third example diagram of the present invention;
FIG. 5 is a state vector transition diagram of the present invention;
FIG. 6 is a fourth example diagram of the present invention;
FIG. 7 is a fifth example diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The related art and technical terms involved in the present invention will be briefly described below so that the skilled person can better understand the present solution.
A neural automaton:
the automaton comprises inputs, states and corresponding actions, and in general we define a finite state automaton as a five-tuple: (Q, Σ, δ, Q0, F), where Q is the set of all states, Σ is the set of input characters, δ is the state transfer function (the cartesian product of the fields Q and Σ is defined, the value field is Q), Q0 and F are the starting states, respectively (F is the set, is a subset of Q; Q0 is an element of Q, there is only one initial state).
The automaton can recognize certain patterns. For example, suppose we want to recognize the patterns yz, yxz, yxxz, yxxxxz (a y, then any number of x, then a z). Let Q = {c, q0, q, o}, Σ = {x, y, z}, and let δ be as in the following table (blank cells are undefined; the o/z entry is restored here for consistency with the pattern yz):

δ: Q ✕ Σ → Q | x | y | z
c            |   |   |
q0           |   | o |
q            | q |   | c
o            | q |   | c
Here c is a termination state; the machine halts (recognition fails) when an undefined condition is encountered. Consider the input string yxxz: the automaton starts in state q0; after receiving the initial character y, its state changes to o according to the table; it then receives the next character x and the state changes to q; the second x leaves the state unchanged; finally, receiving z changes the state to the end state c, and recognition succeeds.
In particular, we can use vectors to represent Q and Σ while representing the state transition function δ with a neural network, which yields a neural automaton. This model is good at handling vector sequences. Much like tracing within a painting, the region of interest is the region around the brush rather than the entire painting, so we use this computational model to handle the RPA element boundary recognition problem.
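As a concrete illustration of the automaton above, the following small Python sketch implements the transition table for the y x* z patterns (the o-row transition on z, required for recognizing yz, is the restored entry noted above):

```python
# Finite automaton recognizing yz, yxz, yxxz, ...: delta as a transition dictionary.
DELTA = {                                   # state transition function delta: Q x Sigma -> Q
    ("q0", "y"): "o",
    ("o", "x"): "q", ("o", "z"): "c",
    ("q", "x"): "q", ("q", "z"): "c",
}

def accepts(s: str) -> bool:
    state = "q0"                            # q0 is the unique starting state
    for ch in s:
        state = DELTA.get((state, ch))      # undefined transition: recognition fails
        if state is None:
            return False
    return state == "c"                     # c is the terminating (accepting) state

assert accepts("yz") and accepts("yxxz") and not accepts("yxy")
```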
The process of the invention is shown in fig. 1, and specifically comprises the following steps:
step one, extracting a feature vector of a target image block;
step two, obtaining the image block state of each image block based on a pre-trained state prediction model;
step three, finding the next tile and its corresponding feature according to the current ordered tile-state sequence;
and step four, until all the blocks in the ordered block state sequence form a closed loop according to the indication of the block states, thereby obtaining the position and the size of the RPA element.
The invention uses an artificial neural network as a component, which is characterized by a huge parameter space; finding the optimal parameters θ during implementation requires several elements, namely: data D, a loss function L, and an optimization (parameter search) algorithm for the optimization problem θ* = argmin_θ L(D; θ).
The phase in which the optimization algorithm runs is called the training phase; during it, our goal is to find the optimal parameters θ* enabling the model to perform a task. For example, the task of the invention is to find RPA elements in a picture; solving the task with the trained model is called the inference stage.
In the training stage, the technical scheme provided by the invention can be divided into the following steps:
1. data acquisition and transformation
The data used by the invention are a picture x and tile states s, where the tile states form a matrix and each element represents a tile, as shown in fig. 4. However, considering that the cost of labeling tile by tile is too high, the data used by the invention can be obtained by transforming ordinary axis-aligned bounding-box annotation data (as shown in fig. 7); the specific transformation method is as follows:
the first step is as follows: the tiles are sliced into smaller tiles (which may overlap between tiles to increase robustness) and numbered.
The second step: fig. 4 shows how to label each tile with the label of its corresponding state, according to whether the tile contains a bounding box and the type of boundary it contains (whether it contains corners). The neural automaton has five states and corresponding actions:
a: the element identification is rejected.
B: the element identifies the tile that passes and moves to the right (the next input is the right tile).
C: the element identifies the tile that passes and moves to the bottom (the next input is the bottom tile).
D: the element identifies the tile that passes and moves to the left (the next input is the tile on the left).
E: the element identifies the tile that passes and moves to the top (the next input is the top tile).
The labeling method for the tiles is as follows: for the set S of tiles that each element boundary crosses clockwise:
(1) If the two tiles to the right of and below a tile in S also belong to S, the tile is labeled B;
(2) If the boundary segment covered by the tile runs rightward in the clockwise direction, the tile is labeled B;
(3) If the boundary segment covered by the tile runs downward in the clockwise direction, the tile is labeled C;
(4) If the boundary segment covered by the tile runs leftward in the clockwise direction, the tile is labeled D;
(5) If the boundary segment covered by the tile runs upward in the clockwise direction, the tile is labeled E;
(6) The remaining tiles are labeled A, indicating they are unrelated to the boundary and need not participate in the model's inference computation.
It should be noted that the requirement for labeled data depends on the behavioral complexity of the algorithm model; a complex model requires more training data to make its behavior pattern converge to the expected one under sufficient constraints. Both the behavioral complexity and the parameter count of this model are far lower than those of the prior art based on convolutional neural networks, so its demand for data is also far lower. Moreover, a labeling scheme in units of tiles is more efficient than one in units of pixels, saving more manpower.
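A hedged sketch of this transformation is given below; the tile size, the (row, column) convention and the assumption that each bounding box spans at least 2 ✕ 2 tiles are choices made for illustration only:

```python
# Sketch of rules (1)-(6): convert one axis-aligned bounding box into tile labels A-E,
# so that the B/C/D/E tiles trace the element boundary clockwise.
import numpy as np

def box_to_tile_states(n_rows, n_cols, box, tile=8):
    """box = (x0, y0, x1, y1) in pixels; returns an (n_rows, n_cols) grid of labels."""
    s = np.full((n_rows, n_cols), "A", dtype="<U1")       # default: unrelated to boundary
    r0, c0 = box[1] // tile, box[0] // tile               # tile containing top-left corner
    r1, c1 = (box[3] - 1) // tile, (box[2] - 1) // tile   # tile containing bottom-right corner
    s[r0, c0:c1] = "B"           # top edge runs rightward (clockwise); includes top-left corner
    s[r0:r1, c1] = "C"           # right edge runs downward; includes top-right corner
    s[r1, c0 + 1:c1 + 1] = "D"   # bottom edge runs leftward; includes bottom-right corner
    s[r0 + 1:r1 + 1, c0] = "E"   # left edge runs upward; includes bottom-left corner
    return s

print(box_to_tile_states(5, 5, (8, 8, 32, 32)))           # a 3x3 loop of B/C/D/E labels
```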
2. Model design
Given a picture x, a CNN extracts features to obtain the feature map y = f(x). Each tile in the picture corresponds to a coordinate (i, j) on the feature map, and the coordinate (i, j) corresponds to a feature vector y_{i,j}.
Running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP then yields the state of the current time t, s_t = MLP(h_t). Based on this state, the coordinates (i, j) of the feature to extract at the next time t+1 are determined according to the action-state mapping fixed at data preparation. Note that when the rejection state A occurs, the LSTM memory at the current time t must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state (B, C, D or E) occurs, the memory h_{t-1} of the previous time t−1 is passed to the LSTM. Through the state trajectory traversed by the automaton (the sequence of coordinates (i, j) arranged in order of time t), the RPA element can be located: the sequence encloses the boundary of the RPA element, and the position can be determined from the boundary and its corresponding coordinates.
3. Model training
Model training optimizes the following problem to solve for the model parameters θ: θ* = argmin_θ L(D; θ).
In the present invention, the data D refers specifically to the pictures x and the tile state labels s (corresponding to the expected states A, B, C, D, E of the neural automaton).
The training picture is encoded by the CNN into a feature map, each tile corresponding to a feature vector y_{i,j} on the feature map. A complete boundary containing B, C, D, E is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)} (here the sequence index t determines the coordinates (i, j) of each feature vector). For each RPA element, a learning sample {(y_t, s_t)} can be constructed, in which the labels s_t comprise the four tile labels B, C, D, E forming a complete closed loop, and the region enclosed by the corresponding tiles corresponds to the RPA element. To represent the learning samples of several RPA elements on one picture, the union of the learning samples corresponding to all the RPA elements is taken as the learning sample of that picture. When calculating the loss function L, the loss of the learning sample corresponding to each RPA element is computed first, and the losses of all RPA elements in the union are summed to obtain the global loss L. For each sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP maps the memory sequence in parallel to the predicted state sequence {ŝ_t}. The loss is then calculated with the cross-entropy (CrossEntropy) loss function, L = −Σ_t log ŝ_t(s_t), and the back-propagation algorithm is used to reduce the loss L; using the RMSprop optimizer, the model parameters θ that can correctly complete the task are obtained after the loss converges.
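A minimal training sketch for a single learning sample {(y_t, s_t)} follows; the stand-in tensors, dimensions and learning rate are assumptions, with only the cross-entropy loss and the RMSprop optimizer taken from the text above:

```python
# Training sketch: LSTM over the boundary feature sequence, parallel MLP mapping,
# cross-entropy loss against the labels, reduced by backpropagation with RMSprop.
import torch
import torch.nn as nn

feat_seq = torch.randn(12, 1, 512)               # {y_t}: boundary-tile features (stand-in)
state_seq = torch.randint(1, 5, (12,))           # {s_t}: labels in {B=1, C=2, D=3, E=4}

lstm = nn.LSTM(512, 512)                         # consumes the whole sequence {y_t}
mlp = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 5))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.RMSprop(list(lstm.parameters()) + list(mlp.parameters()), lr=1e-4)

for _ in range(100):                             # iterate until the loss converges
    mem_seq, _ = lstm(feat_seq)                  # memory sequence {h_t}, shape (T, 1, 512)
    logits = mlp(mem_seq.squeeze(1))             # parallel mapping to predicted states
    loss = loss_fn(logits, state_seq)            # cross-entropy against {s_t}
    opt.zero_grad(); loss.backward(); opt.step() # backpropagation reduces the loss
```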
4. Model reasoning
Unlike the training phase, there is no pre-labeled state sequence s_t. Provided the model is fully trained, the prediction ŝ_t can be used as an approximate substitute for s_t, letting the model derive the next state s_{t+1} from the current state s_t by itself; the sequence of states s_t produced recursively at successive time points then corresponds to the boundary contour of one RPA element. Notably, the model in the initial hidden state h_0 can also be used to find element outlines starting from B-state tiles, so after finding an RPA element (or at the very beginning) the model in hidden state h_0 is used to find a B-state tile as s_0, and the recursion of the s_t sequence is started from that tile. It should be noted that under the clockwise BCDE action-state setting, an s_0 = B tile found in left-to-right, top-to-bottom reading order must be the first tile of a tile sequence.
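The recursion described above can be sketched as follows, reusing the `cell` and `mlp` modules from the training sketch; the grid-boundary handling and greedy decoding are simplifying assumptions:

```python
# Inference sketch: with no labels available, the predicted state replaces s_t and
# drives the walk until the visited tiles close into a loop (one element boundary).
import torch

MOVE = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}    # B, C, D, E as (row, col) offsets

@torch.no_grad()
def trace(feature_map, cell, mlp, start):
    """feature_map: (n_rows, n_cols, FEAT) tensor; returns the visited tile coordinates."""
    h = torch.zeros(1, cell.hidden_size)                  # initial hidden state h_0
    c = torch.zeros(1, cell.hidden_size)
    pos, path = start, []
    while pos not in path:
        path.append(pos)
        h, c = cell(feature_map[pos[0], pos[1]].unsqueeze(0), (h, c))
        s = int(mlp(h).argmax(dim=-1))                    # predicted state stands in for s_t
        if s == 0:                                        # rejection A: no boundary here
            return []
        pos = (pos[0] + MOVE[s][0], pos[1] + MOVE[s][1])  # next tile chosen by the automaton
    return path                                           # closed loop of boundary tiles
```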
The technical scheme provided by the invention can comprise the following steps:
s1, dividing the picture into a plurality of smaller picture blocks x ij (or x) t ) And using CNN feature extraction model to obtain its correspondent feature vector y ij (or y) t )。
S2, based on the pre-trained state transition model, initially recognizing the hidden state h 0 And the feature y 0 Obtaining the corresponding hidden state h of each image block in a recursion way t H is then determined by using MLP decision model t Mapping to an action State s t It is necessary to point out s t Block x, which determines that the neuro-automaton is to access at the next time t +1 t Repeating the steps 1 and 2 to obtain s t+1 . It should be noted that when s t+1 And s t When incompatible, the hidden state of the model should be reset to the initial state h 0 E.g., B followed by E and skipping CD we consider incompatible.
S3, finding an ordered sequence of block states satisfying a complete and continuous BCDE condition, each block state S in the ordered sequence of block states t Corresponding to a tile x in a picture t (or x) ij ) And all tiles x in the ordered sequence of tile states t (or x) ij ) According to the state s of the picture block t The indication of (c) forms a closed loop.
S4, based on the ordered graph block state sequence S t Obtain the corresponding sequence of tiles "x t "or (" "x") ij "at this time, the position and size of the RPA element can be obtained by using the coordinate information ij in the tile sequence.
In the specific embodiment, the specific implementation flow of the invention is as follows:
as shown in fig. 2, it is necessary to locate the center-floated shadow-specific element.
First, the picture is input to a CNN network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, divided as shown by the grid (10 ✕ …) in fig. 3, each tile corresponding to a vector (for example 512-dimensional, output by the CNN).
The neural automaton of the present invention operates in the grid space shown in fig. 3, takes the vector corresponding to the block as input, and determines the next input by itself according to the current state.
For example, the neural automaton in initial state A is placed in the upper-left grid cell; inference on the corresponding vector yields the next state B, whereupon it decides to read the next input vector from the grid cell to its right. Repeating this process yields a trajectory containing the various states, as shown in fig. 4:
When it returns in state E to the beginning (a tile visited before), we have a tile sequence [x_0, x_1, x_2, …, x_t]. The tiles themselves contain coordinate information (i, j), so the sequence can also be written [x_{i_0 j_0}, x_{i_1 j_1}, x_{i_2 j_2}, …, x_{ij}]. The set of tile coordinates {i_0 j_0, i_1 j_1, i_2 j_2, …, ij} defines the coordinate range [min(i), min(j), max(i), max(j)] corresponding to the RPA element, giving the position (the minima of i and j correspond to the coordinates of the element's upper-left corner) and the size (the maxima correspond to the lower-right corner); subtracting the upper-left coordinates from the lower-right coordinates yields the element's width and height, i.e. the size of the RPA interface element.
It should be noted that the neural automaton of the present invention has an internal state vector (for example 512-dimensional); the next state vector is obtained by inference from the input vector and the automaton's own state vector, and this behavior can be conveniently modeled using a recurrent neural network (for example a long short-term memory network). The transitions between the various state vectors are shown in fig. 5.
Each state has a vector prototype (for example 512-dimensional, obtainable by learning); other vectors sufficiently similar to the prototype are also regarded as that state, with the cosine of the angle between vectors usually used as the measure of their similarity. The state vector space is partitioned into different regions by the neighborhoods of the state prototypes, corresponding to the individual states; as shown in fig. 6, the hollow arrow is most similar to arrow E. In practice, an MLP (Multi-Layer Perceptron) should be used to model the mapping from the vector space to the state space.
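For intuition only, the prototype view can be sketched as a nearest-prototype classifier under cosine similarity; the learned prototypes here are random stand-ins, and the patent itself models this mapping with an MLP:

```python
# Sketch of the prototype view: assign a state vector to the state whose prototype
# has the largest cosine similarity with it.
import torch
import torch.nn.functional as F

prototypes = torch.randn(5, 512)                 # one prototype per state A-E (stand-ins)

def nearest_state(state_vec: torch.Tensor) -> int:
    sims = F.cosine_similarity(state_vec.unsqueeze(0), prototypes, dim=-1)
    return int(sims.argmax())                    # index into A..E

print(nearest_state(torch.randn(512)))
```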
The automaton walks over the whole picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located (the position and size of the RPA element can obviously be deduced from the coordinate information corresponding to the complete, contiguous BCDE state sequence); the automaton then continues over other regions of the picture to find other RPA elements, until all tiles have been visited or excluded.
According to yet another embodiment of the present invention, there is provided an electronic device including a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, is capable of implementing the above-described neural automaton-based RPA element intelligent localization method.
According to still another embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of implementing the above neural automaton-based RPA element smart positioning method.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. An RPA element intelligent positioning method based on a neural automaton is characterized in that: the specific operation steps are as follows:
step 1, dividing a picture into a plurality of smaller picture blocks, and obtaining feature vectors corresponding to the picture blocks by using a convolutional neural network feature extraction model;
step 2, obtaining the image block state of each image block based on a pre-trained state prediction model; the training process of the state prediction model comprises the following steps:
the method comprises the following steps of firstly, acquiring and transforming data, wherein the acquired data comprise a picture x and a picture block state s, the picture block state is a matrix, and each element represents a picture block;
secondly, model design: from a given picture x, a convolutional neural network extracts features to obtain the feature map y = f(x); running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP multi-layer perceptron then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined; note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM; the RPA element can then be located through the state trajectory traversed by the neural automaton;
step 3, finding at least one ordered tile state sequence meeting a preset condition, wherein each tile state in the ordered tile state sequence corresponds to one tile in the picture, and all the tiles in the ordered tile state sequence form a closed loop according to the indication of the tile states;
and 4, obtaining the position and the size of the RPA element based on the ordered tile state sequence.
2. The RPA element intelligent positioning method based on the neural automaton as claimed in claim 1, wherein: the training process of the state prediction model comprises the following steps:
the training picture is encoded into a feature map by a convolutional neural network, each tile corresponding to a feature vector y_{i,j} on the feature map; a complete boundary containing BCDE is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)}, in which case the sequence index t determines the coordinates (i, j) of each feature vector; for this sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP multi-layer perceptron maps the memory sequence in parallel to the predicted state sequence {ŝ_t}; the loss L is then calculated using the cross-entropy loss function, and the back-propagation algorithm is used to reduce the loss L; the model parameters θ that can correctly complete the task are obtained after the loss converges;
the neural automaton has five states and corresponding actions, A: rejection of element identification, B: element identification through and move to right tile, C: element identification by and to the lower tile, D: element identification through and to the left, E: the element identifies the tile that passes and moves to the top.
3. The RPA element intelligent positioning method based on the neural automaton according to claim 2, wherein: when reasoning is carried out with the model, no loss is calculated; using the optimized model parameters, a predicted state label sequence is obtained through the state-vector transitions and feature reading actions of the model design, and the bounding box of the RPA element is calculated from it.
4. The RPA element intelligent positioning method based on the neural automaton as claimed in claim 2, wherein: the neural automaton specifically comprises the following operation steps:
before operation, specific elements in the picture need to be positioned: the picture is first input to a convolutional neural network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, the picture being divided by a grid;
each tile divided from the picture corresponds to a vector; the neural automaton runs in this grid space, takes the vector corresponding to a tile as input, and determines the next input by itself according to the current state, until it returns to the starting point, thereby obtaining the position and size of one interface element;
the neural automaton walks over the entire picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located; the automaton then continues over other regions of the picture looking for other RPA elements, until all tiles have been visited or excluded.
5. The RPA element intelligent positioning method based on neural automaton according to claim 4, wherein: the neural automaton has an internal state vector, and the next state vector can be obtained by reasoning the input vector and the state vector of the neural automaton.
6. The RPA element intelligent positioning method based on neural automaton according to claim 5, wherein: there is a vector prototype for each state vector, other vectors sufficiently similar to the vector prototype are also identified as the state, the cosine of the angle between the vectors is used as a measure of their similarity, the state vector space is partitioned into different regions by the neighborhood of each state prototype to correspond to the individual states, and in practice, the MLP multi-layer perceptron is used to model the mapping of the vector space to the state space.
7. An RPA element intelligent positioning device based on a neural automaton, characterized in that: it comprises a feature extraction module, a state updating module, a state sequence acquisition module and an element positioning module, specifically:
the characteristic extraction module is used for dividing the image into a plurality of smaller image blocks and obtaining characteristic vectors corresponding to the image blocks by using a convolutional neural network characteristic extraction model;
the state updating module is used for obtaining the image block state of each image block based on a pre-trained state prediction model; the training process of the state prediction model comprises the following steps:
firstly, data are acquired and transformed, the acquired data comprise a picture x and a picture block state s, wherein the picture block state is a matrix, and each element represents a picture block;
secondly, model design: from a given picture x, a convolutional neural network extracts features to obtain the feature map y = f(x); running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP multi-layer perceptron then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined; note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM; the RPA element can then be located through the state trajectory traversed by the neural automaton;
the state sequence acquisition module is used for finding at least one ordered block state sequence meeting a preset condition, each block state in the ordered block state sequence corresponds to one block in the picture, and all blocks in the ordered block state sequence form a closed loop according to the indication of the block state;
an element positioning module to obtain a position and a size of an RPA element based on the ordered sequence of tile states.
8. An RPA element smart location storage medium having stored thereon computer instructions, characterized in that: the computer instructions, when executed by a processor, implement the steps of a method for intelligent location of RPA elements as recited in any of claims 1-6.
9. An RPA element smart positioning computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when running the computer program, implements a RPA element smart location method as recited in any one of claims 1-6.
CN202210944163.XA 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton Active CN115019029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210944163.XA CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210944163.XA CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Publications (2)

Publication Number Publication Date
CN115019029A CN115019029A (en) 2022-09-06
CN115019029B 2022-11-04

Family

ID=83065943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210944163.XA Active CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Country Status (1)

Country Link
CN (1) CN115019029B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964027B (en) * 2023-03-16 2023-06-30 杭州实在智能科技有限公司 Desktop embedded RPA flow configuration system and method based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491885A (en) * 2018-03-28 2018-09-04 广东工业大学 A kind of autoCAD graphic blocks identifying method and devices based on Naive Bayes Classifier
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN113298250A (en) * 2020-02-24 2021-08-24 福特全球技术公司 Neural network for localization and object detection
CN113391871A (en) * 2021-08-17 2021-09-14 杭州实在智能科技有限公司 RPA element intelligent fusion picking method and system
CN113741882A (en) * 2021-09-16 2021-12-03 杭州分叉智能科技有限公司 RPA graphical instruction design method
CN114419611A (en) * 2020-10-12 2022-04-29 八维智能股份有限公司 Real-time short message robot system and method for automatically detecting character lines in digital image
CN114556391A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence layer based process extraction for robot process automation
CN114556244A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Process evolution and workflow micro-optimization for robotic process automation
CN114556305A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence based process identification, extraction and automation for robotic process automation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190324781A1 (en) * 2018-04-24 2019-10-24 Epiance Software Pvt. Ltd. Robotic script generation based on process variation detection
US11726802B2 (en) * 2018-09-28 2023-08-15 Servicenow Canada Inc. Robust user interface related robotic process automation
JP2023523374A (en) * 2020-04-30 2023-06-05 ユーアイパス,インコーポレイテッド A Machine Learning Model Retraining Pipeline for Robotic Process Automation

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491885A (en) * 2018-03-28 2018-09-04 广东工业大学 A kind of autoCAD graphic blocks identifying method and devices based on Naive Bayes Classifier
CN114556391A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence layer based process extraction for robot process automation
CN114556244A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Process evolution and workflow micro-optimization for robotic process automation
CN114556305A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence based process identification, extraction and automation for robotic process automation
CN113298250A (en) * 2020-02-24 2021-08-24 福特全球技术公司 Neural network for localization and object detection
CN114419611A (en) * 2020-10-12 2022-04-29 八维智能股份有限公司 Real-time short message robot system and method for automatically detecting character lines in digital image
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN113391871A (en) * 2021-08-17 2021-09-14 杭州实在智能科技有限公司 RPA element intelligent fusion picking method and system
CN113741882A (en) * 2021-09-16 2021-12-03 杭州分叉智能科技有限公司 RPA graphical instruction design method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dipali Baviskar et al., "Efficient Automated Processing of the Unstructured Documents Using Artificial Intelligence: A Systematic Literature Review and Future Directions", IEEE Access, vol. 9, 2021-05-24, pp. 72894-72936 *
Qinghong Guo et al., "Key-Region and Layout Learning for Contract Intelligent Identification", 2021 IEEE International Conference on Emergency Science and Information Technology, 2021, pp. 57-61 *
Qin Haibo et al., "Analysis of RPA Process Automation Technology" (《RPA流程自动化技术分析》), Techniques of Automation and Applications (《自动化技术与应用》), vol. 41, no. 5, May 2022, pp. 1-4 *
Tian Gaoliang et al., "Research on the Application of Financial Robots Based on RPA Technology" (《基于RPA技术的财务机器人应用研究》), Finance and Accounting Monthly (《财会月刊》), no. 18, Aug. 2019, pp. 10-14 *

Also Published As

Publication number Publication date
CN115019029A (en) 2022-09-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant