CN115019029B - RPA element intelligent positioning method based on neural automaton

Info

Publication number
CN115019029B
CN115019029B (application CN202210944163.XA)
Authority
CN
China
Prior art keywords
state
rpa
picture
automaton
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210944163.XA
Other languages
Chinese (zh)
Other versions
CN115019029A (en)
Inventor
王昊 (Wang Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Real Intelligence Technology Co ltd
Original Assignee
Hangzhou Real Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Real Intelligence Technology Co ltd filed Critical Hangzhou Real Intelligence Technology Co ltd
Priority to CN202210944163.XA priority Critical patent/CN115019029B/en
Publication of CN115019029A publication Critical patent/CN115019029A/en
Application granted granted Critical
Publication of CN115019029B publication Critical patent/CN115019029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/245: Aligning, centring, orientation detection or correction of the image by locating a pattern; special marks for positioning
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/40: Extraction of image or video features
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of page element positioning and discloses an RPA element intelligent positioning method based on a neural automaton. The specific operation steps comprise data acquisition and transformation, model design, model training and model reasoning. The neural automaton has five states and corresponding actions: A means element recognition is rejected; B means element recognition passes and the automaton moves to the tile on the right; C means recognition passes and it moves to the tile below; D means recognition passes and it moves to the tile on the left; E means recognition passes and it moves to the tile above. The RPA element intelligent positioning method based on the neural automaton reduces dependence on large amounts of data by reducing the modeling scale, improves the efficiency of feature utilization by using only a single model, and, by processing boundary information only, accelerates reasoning and eliminates interference from irrelevant factors.

Description

RPA element intelligent positioning method based on neural automaton
Technical Field
The invention relates to the technical field of page element positioning, in particular to an RPA element intelligent positioning method based on a neural automaton.
Background
RPA (Robotic Process Automation) is a process automation technology. With the process editor provided by RPA software, a business operation process capable of automatic execution can be designed and configured, packaged in the form of a software robot or virtual robot, and deployed to production environments and business systems for execution, simulating a series of human operations on a computer, such as mouse movement, mouse clicking, keyboard input, opening web pages, acquiring page information, creating files, entering file content, saving files and deleting files.
The existing traditional RPA technology interacts with business systems by parsing interface layout and code. For example, various mouse and keyboard operations in an operating system are realized through the API (Application Programming Interface) provided by some desktop application automation tools; positioning and operating browser page elements such as buttons, input boxes and text lines are realized by parsing the CSS (Cascading Style Sheets) structure, JavaScript (a scripting language for developing web pages) code and the like of browser pages; positioning and operating elements in a software interface are realized by parsing the source code of office software. This technology places high requirements on the visibility and openness of the operation object: the position and attribute information of the operation object must be acquired through an interface or source code in order to execute the corresponding operations.
In addition, this technology is problematic in some application scenarios. For operation objects such as remote desktops, virtual systems, and office software developed in-house by clients, only a page picture can be obtained, and positioning and operation cannot be performed through an API or by parsing source code. To solve this problem, a solution provided in the prior art is to use computer vision technology from the field of AI (artificial intelligence) to match, locate and operate operation elements. For example, a "submit" button on a remote desktop is first located within the whole page picture by means of object detection or template matching, and the "submit" action is then completed with mouse movement and click operations. The target detection approach has the disadvantage that the deep learning model it uses needs a large number of sample pictures annotated with detection boxes during training to achieve high positioning accuracy. For a web page or common office software, a large amount of sample data can be constructed automatically by synthesis; however, for business software developed by a client itself or rarely used, labeled sample data are difficult to obtain, and since the deep learning model has not learned similar samples before, the effect of element detection and positioning is not ideal.
The template matching approach has two problems: first, the traditional, single matching mode based on picture pixel values or "feature points" does not match well; second, as application scenarios expand, the number of page elements, i.e. templates, to be matched grows, and without a reasonable retrieval structure, template query and matching become slow, which affects use.
Recently there has been a trend of combining the above technologies to solve the element positioning problem, but this increases both implementation difficulty and computation.
In summary, when solving the element positioning problem, the prior art is either too simplistic, too difficult to implement, or inconvenient to use, so it is necessary to design a technology that considers development and deployment efficiency at the same time to solve the above problems.
Disclosure of Invention
In order to achieve the purpose, the invention provides the following technical scheme: an RPA element intelligent positioning method based on a neural automaton comprises the following specific operation steps:
step 1, dividing a picture into a plurality of smaller tiles and, based on a pre-trained tile feature extraction model, extracting the features of each tile to form a feature map;
step 2, obtaining the image block state of each image block based on a pre-trained state prediction model, comprising:
2.1 Select any position (i, j) of the feature map y and start with the corresponding feature y_{i,j}; at the initial time t_0, the state of the automaton is z_0.
2.2 The automaton's state transition model z_t = g_φ(z_{t-1}, y_{i,j}) yields the state at the next moment t.
2.3 The decision model π_ψ then determines the automaton's next action u_t = π_ψ(z_t) and marks the tile at location (i, j) with u_t.
2.4 The action u_t moves the automaton one unit from the current position (i, j) to the next position, for example to (i+1, j) for a rightward move, with the downward, leftward and upward moves defined analogously;
step 3, finding at least one ordered tile action-mark sequence meeting a preset condition, wherein each action mark in the sequence corresponds to one tile in the picture, and all tiles corresponding to the sequence form a closed loop according to the indication of the action marks;
step 4, obtaining the position and the size of the RPA element based on the ordered tile state sequence.
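For illustration only (not part of the claims), the following minimal Python sketch shows how steps 1-4 fit together once a per-tile `step` function is available; the function names, the (row, column) coordinate convention and the grid-boundary handling are assumptions made for this sketch:

```python
# Illustrative sketch of the tile walk in steps 2-4: follow the per-tile actions
# until the visited tiles close into a loop, then derive position and size.
from typing import Callable, List, Tuple

A, B, C, D, E = "A", "B", "C", "D", "E"                 # reject / right / down / left / up
MOVE = {B: (0, 1), C: (1, 0), D: (0, -1), E: (-1, 0)}   # (row, col) offsets per action

def trace_element(step: Callable[[Tuple[int, int]], str],
                  start: Tuple[int, int],
                  n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Follow B/C/D/E action marks from `start` until the walk closes into a loop."""
    pos, visited = start, []
    while pos not in visited:
        action = step(pos)
        if action == A:                    # rejection: no element boundary at this tile
            return []
        visited.append(pos)
        dr, dc = MOVE[action]
        pos = (pos[0] + dr, pos[1] + dc)
        if not (0 <= pos[0] < n_rows and 0 <= pos[1] < n_cols):
            return []                      # walked off the grid: no closed loop found
    rows = [r for r, _ in visited]
    cols = [c for _, c in visited]
    # the closed loop encloses the element: minima give the top-left boundary tile,
    # maxima give the bottom-right boundary tile (step 4)
    return [(min(rows), min(cols)), (max(rows), max(cols))]
```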
Wherein the solving process for the model parameters θ, φ and ψ, i.e. the (pre-)training process of the models, comprises:
The first step, data acquisition and transformation: the acquired data are a picture x and tile states s, where the tile states form a matrix and each element represents a tile.
Secondly, model design: from a given picture, features are extracted using a CNN to obtain the feature map y = f(x). Running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined. Note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-to-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM. The RPA elements can then be located through the state trajectory that the neural automaton has traversed.
Thirdly, model training: the model parameters θ are solved by optimizing the problem θ* = argmin_θ L(D; θ).
The training picture is encoded by the CNN into a feature map, each tile corresponding to a feature vector y_{i,j} on the feature map. A complete boundary containing B, C, D, E is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)} (here the sequence index t determines the coordinates (i, j) of each feature vector). For this sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP maps the memories in parallel to the predicted state sequence {ŝ_t}. The loss is then calculated with a cross-entropy (CrossEntropy) loss function, L = −Σ_t log ŝ_t(s_t), and the back-propagation algorithm is used to reduce the loss L; using the RMSprop optimizer, the model parameters θ that can correctly complete the task are obtained after the loss converges.
And fourthly, model reasoning: no loss is calculated at this time; with the model parameters optimized above, a predicted state label sequence is obtained through the state-vector transitions and feature reading actions of the model design, and the bounding box of the RPA element is calculated from it.
The design of the neural automaton and its specific operation steps are as follows.
The neural automaton has five states and corresponding actions. A: element recognition is rejected. B: element recognition passes and the automaton moves to the tile on the right. C: element recognition passes and the automaton moves to the tile below. D: element recognition passes and the automaton moves to the tile on the left. E: element recognition passes and the automaton moves to the tile above.
Before running, the neural automaton needs to position specific elements in the picture. The picture is first input to a CNN network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, the picture being divided by a grid.
Each tile divided from the picture corresponds to a vector. The neural automaton runs in this grid space, takes the vector corresponding to a tile as input, and determines the next input by itself according to the current state, until it returns to the starting point, thereby obtaining the position and size of one interface element.
The neural automaton walks over the entire picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located; the automaton then continues over other regions of the picture looking for other RPA elements, until all tiles have been visited or excluded.
Preferably, the neural automaton has an internal state vector, and the next state vector is obtained by inference from the input vector and the automaton's own state vector; this behavior can be conveniently modeled using a recurrent neural network.
Preferably, there is a vector prototype for each state; other vectors sufficiently similar to the prototype are also regarded as that state, with the cosine of the angle between vectors generally used as the measure of their similarity. The state vector space is partitioned into different regions by the neighborhoods of the state prototypes, corresponding to the individual states; in practice, an MLP (multi-layer perceptron) is used to model the mapping from the vector space to the state space.
According to yet another embodiment of the present invention, there is provided an electronic device including a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, is capable of implementing the above-described neural automaton-based RPA element intelligent localization method.
According to still another embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of implementing the above neural automata-based RPA element intelligent localization method.
Compared with the prior art, the invention provides an RPA element intelligent positioning method based on a neural automaton, which has the following beneficial effects:
compared with target detection and feature extraction based on deep learning, the RPA element intelligent positioning method based on the neural automata limits the receptive field of a visual model to a local sub-pixel space of an image instead of the whole image, so that the parameter quantity required by modeling can be reduced by a plurality of orders of magnitude, and the data quantity required by the training of the model is easier to obtain.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a first example diagram of the present invention;
FIG. 3 is a second example diagram of the present invention;
FIG. 4 is a third example diagram of the present invention;
FIG. 5 is a state vector transition diagram of the present invention;
FIG. 6 is a fourth example diagram of the present invention;
FIG. 7 is a fifth example diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The related art and technical terms involved in the present invention will be briefly described below so that the skilled person can better understand the present solution.
A neural automaton:
the automaton comprises inputs, states and corresponding actions, and in general we define a finite state automaton as a five-tuple: (Q, Σ, δ, Q0, F), where Q is the set of all states, Σ is the set of input characters, δ is the state transfer function (the cartesian product of the fields Q and Σ is defined, the value field is Q), Q0 and F are the starting states, respectively (F is the set, is a subset of Q; Q0 is an element of Q, there is only one initial state).
The automaton can recognize certain patterns. For example, suppose we want to recognize the patterns yz, yxz, yxxz, yxxxxz (a y, then any number of x, then a z). Let Q = {c, q0, q, o}, Σ = {x, y, z}, and let δ be as in the following table (blank cells are undefined; the o/z entry is restored here for consistency with the pattern yz):

δ: Q ✕ Σ → Q | x | y | z
c            |   |   |
q0           |   | o |
q            | q |   | c
o            | q |   | c
Here c is a termination state; the machine halts (recognition fails) when an undefined condition is encountered. Consider the input string yxxz: the automaton starts in state q0; after receiving the initial character y, its state changes to o according to the table; it then receives the next character x and the state changes to q; the second x leaves the state unchanged; finally, receiving z changes the state to the end state c, and recognition succeeds.
In particular, we can use vectors to represent Q and Σ while representing the state transition function δ with a neural network, which yields a neural automaton. This model is good at handling vector sequences. Much like tracing within a painting, the region of interest is the region around the brush rather than the entire painting, so we use this computational model to handle the RPA element boundary recognition problem.
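As a concrete illustration of the automaton above, the following small Python sketch implements the transition table for the y x* z patterns (the o-row transition on z, required for recognizing yz, is the restored entry noted above):

```python
# Finite automaton recognizing yz, yxz, yxxz, ...: delta as a transition dictionary.
DELTA = {                                   # state transition function delta: Q x Sigma -> Q
    ("q0", "y"): "o",
    ("o", "x"): "q", ("o", "z"): "c",
    ("q", "x"): "q", ("q", "z"): "c",
}

def accepts(s: str) -> bool:
    state = "q0"                            # q0 is the unique starting state
    for ch in s:
        state = DELTA.get((state, ch))      # undefined transition: recognition fails
        if state is None:
            return False
    return state == "c"                     # c is the terminating (accepting) state

assert accepts("yz") and accepts("yxxz") and not accepts("yxy")
```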
The process of the invention is shown in fig. 1, and specifically comprises the following steps:
step one, extracting a feature vector of a target image block;
step two, obtaining the image block state of each image block based on a pre-trained state prediction model;
step three, finding the next tile and its corresponding feature according to the current ordered tile-state sequence;
and step four, until all the blocks in the ordered block state sequence form a closed loop according to the indication of the block states, thereby obtaining the position and the size of the RPA element.
The invention uses an artificial neural network as a component, which is characterized by a huge parameter space; finding the optimal parameters θ during implementation requires several elements, namely: data D, a loss function L, and an optimization (parameter search) algorithm for the optimization problem θ* = argmin_θ L(D; θ).
The phase in which the optimization algorithm runs is called the training phase; during it, our goal is to find the optimal parameters θ* enabling the model to perform a task. For example, the task of the invention is to find RPA elements in a picture; solving the task with the trained model is called the inference stage.
In the training stage, the technical scheme provided by the invention can be divided into the following steps:
1. data acquisition and transformation
The data used by the invention are a picture x and tile states s, where the tile states form a matrix and each element represents a tile, as shown in fig. 4. However, considering that the cost of labeling tile by tile is too high, the data used by the invention can be obtained by transforming ordinary axis-aligned bounding-box annotation data (as shown in fig. 7); the specific transformation method is as follows:
the first step is as follows: the tiles are sliced into smaller tiles (which may overlap between tiles to increase robustness) and numbered.
The second step: fig. 4 shows how to label each tile with the label of its corresponding state, according to whether the tile contains a bounding box and the type of boundary it contains (whether it contains corners). The neural automaton has five states and corresponding actions:
a: the element identification is rejected.
B: the element identifies the tile that passes and moves to the right (the next input is the right tile).
C: the element identifies the tile that passes and moves to the bottom (the next input is the bottom tile).
D: the element identifies the tile that passes and moves to the left (the next input is the tile on the left).
E: the element identifies the tile that passes and moves to the top (the next input is the top tile).
The labeling method for the tiles is as follows: for the set S of tiles that each element boundary crosses clockwise:
(1) If the two tiles to the right of and below a tile in S also belong to S, the tile is labeled B;
(2) If the boundary segment covered by the tile runs rightward in the clockwise direction, the tile is labeled B;
(3) If the boundary segment covered by the tile runs downward in the clockwise direction, the tile is labeled C;
(4) If the boundary segment covered by the tile runs leftward in the clockwise direction, the tile is labeled D;
(5) If the boundary segment covered by the tile runs upward in the clockwise direction, the tile is labeled E;
(6) The remaining tiles are labeled A, indicating they are unrelated to the boundary and need not participate in the model's inference computation.
It should be noted that the requirement for labeled data depends on the behavioral complexity of the algorithm model; a complex model requires more training data to make its behavior pattern converge to the expected one under sufficient constraints. Both the behavioral complexity and the parameter count of this model are far lower than those of the prior art based on convolutional neural networks, so its demand for data is also far lower. Moreover, a labeling scheme in units of tiles is more efficient than one in units of pixels, saving more manpower.
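A hedged sketch of this transformation is given below; the tile size, the (row, column) convention and the assumption that each bounding box spans at least 2 ✕ 2 tiles are choices made for illustration only:

```python
# Sketch of rules (1)-(6): convert one axis-aligned bounding box into tile labels A-E,
# so that the B/C/D/E tiles trace the element boundary clockwise.
import numpy as np

def box_to_tile_states(n_rows, n_cols, box, tile=8):
    """box = (x0, y0, x1, y1) in pixels; returns an (n_rows, n_cols) grid of labels."""
    s = np.full((n_rows, n_cols), "A", dtype="<U1")       # default: unrelated to boundary
    r0, c0 = box[1] // tile, box[0] // tile               # tile containing top-left corner
    r1, c1 = (box[3] - 1) // tile, (box[2] - 1) // tile   # tile containing bottom-right corner
    s[r0, c0:c1] = "B"           # top edge runs rightward (clockwise); includes top-left corner
    s[r0:r1, c1] = "C"           # right edge runs downward; includes top-right corner
    s[r1, c0 + 1:c1 + 1] = "D"   # bottom edge runs leftward; includes bottom-right corner
    s[r0 + 1:r1 + 1, c0] = "E"   # left edge runs upward; includes bottom-left corner
    return s

print(box_to_tile_states(5, 5, (8, 8, 32, 32)))           # a 3x3 loop of B/C/D/E labels
```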
2. Model design
Given a picture x, a CNN extracts features to obtain the feature map y = f(x). Each tile in the picture corresponds to a coordinate (i, j) on the feature map, and the coordinate (i, j) corresponds to a feature vector y_{i,j}.
Running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP then yields the state of the current time t, s_t = MLP(h_t). Based on this state, the coordinates (i, j) of the feature to extract at the next time t+1 are determined according to the action-state mapping fixed at data preparation. Note that when the rejection state A occurs, the LSTM memory at the current time t must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state (B, C, D or E) occurs, the memory h_{t-1} of the previous time t−1 is passed to the LSTM. Through the state trajectory traversed by the automaton (the sequence of coordinates (i, j) arranged in order of time t), the RPA element can be located: the sequence encloses the boundary of the RPA element, and the position can be determined from the boundary and its corresponding coordinates.
3. Model training
Model training optimizes the following problem to solve for the model parameters θ: θ* = argmin_θ L(D; θ).
In the present invention, the data D refers specifically to the pictures x and the tile state labels s (corresponding to the expected states A, B, C, D, E of the neural automaton).
The training picture is encoded by the CNN into a feature map, each tile corresponding to a feature vector y_{i,j} on the feature map. A complete boundary containing B, C, D, E is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)} (here the sequence index t determines the coordinates (i, j) of each feature vector). For each RPA element, a learning sample {(y_t, s_t)} can be constructed, in which the labels s_t comprise the four tile labels B, C, D, E forming a complete closed loop, and the region enclosed by the corresponding tiles corresponds to the RPA element. To represent the learning samples of several RPA elements on one picture, the union of the learning samples corresponding to all the RPA elements is taken as the learning sample of that picture. When calculating the loss function L, the loss of the learning sample corresponding to each RPA element is computed first, and the losses of all RPA elements in the union are summed to obtain the global loss L. For each sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP maps the memory sequence in parallel to the predicted state sequence {ŝ_t}. The loss is then calculated with the cross-entropy (CrossEntropy) loss function, L = −Σ_t log ŝ_t(s_t), and the back-propagation algorithm is used to reduce the loss L; using the RMSprop optimizer, the model parameters θ that can correctly complete the task are obtained after the loss converges.
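A minimal training sketch for a single learning sample {(y_t, s_t)} follows; the stand-in tensors, dimensions and learning rate are assumptions, with only the cross-entropy loss and the RMSprop optimizer taken from the text above:

```python
# Training sketch: LSTM over the boundary feature sequence, parallel MLP mapping,
# cross-entropy loss against the labels, reduced by backpropagation with RMSprop.
import torch
import torch.nn as nn

feat_seq = torch.randn(12, 1, 512)               # {y_t}: boundary-tile features (stand-in)
state_seq = torch.randint(1, 5, (12,))           # {s_t}: labels in {B=1, C=2, D=3, E=4}

lstm = nn.LSTM(512, 512)                         # consumes the whole sequence {y_t}
mlp = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 5))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.RMSprop(list(lstm.parameters()) + list(mlp.parameters()), lr=1e-4)

for _ in range(100):                             # iterate until the loss converges
    mem_seq, _ = lstm(feat_seq)                  # memory sequence {h_t}, shape (T, 1, 512)
    logits = mlp(mem_seq.squeeze(1))             # parallel mapping to predicted states
    loss = loss_fn(logits, state_seq)            # cross-entropy against {s_t}
    opt.zero_grad(); loss.backward(); opt.step() # backpropagation reduces the loss
```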
4. Model reasoning
Unlike the training phase, there is no pre-labeled state sequence s_t. Provided the model is fully trained, the prediction ŝ_t can be used as an approximate substitute for s_t, letting the model derive the next state s_{t+1} from the current state s_t by itself; the sequence of states s_t produced recursively at successive time points then corresponds to the boundary contour of one RPA element. Notably, the model in the initial hidden state h_0 can also be used to find element outlines starting from B-state tiles, so after finding an RPA element (or at the very beginning) the model in hidden state h_0 is used to find a B-state tile as s_0, and the recursion of the s_t sequence is started from that tile. It should be noted that under the clockwise BCDE action-state setting, an s_0 = B tile found in left-to-right, top-to-bottom reading order must be the first tile of a tile sequence.
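The recursion described above can be sketched as follows, reusing the `cell` and `mlp` modules from the training sketch; the grid-boundary handling and greedy decoding are simplifying assumptions:

```python
# Inference sketch: with no labels available, the predicted state replaces s_t and
# drives the walk until the visited tiles close into a loop (one element boundary).
import torch

MOVE = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}    # B, C, D, E as (row, col) offsets

@torch.no_grad()
def trace(feature_map, cell, mlp, start):
    """feature_map: (n_rows, n_cols, FEAT) tensor; returns the visited tile coordinates."""
    h = torch.zeros(1, cell.hidden_size)                  # initial hidden state h_0
    c = torch.zeros(1, cell.hidden_size)
    pos, path = start, []
    while pos not in path:
        path.append(pos)
        h, c = cell(feature_map[pos[0], pos[1]].unsqueeze(0), (h, c))
        s = int(mlp(h).argmax(dim=-1))                    # predicted state stands in for s_t
        if s == 0:                                        # rejection A: no boundary here
            return []
        pos = (pos[0] + MOVE[s][0], pos[1] + MOVE[s][1])  # next tile chosen by the automaton
    return path                                           # closed loop of boundary tiles
```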
The technical scheme provided by the invention can comprise the following steps:
s1, dividing the picture into a plurality of smaller picture blocks x ij (or x) t ) And using CNN feature extraction model to obtain its correspondent feature vector y ij (or y) t )。
S2, based on the pre-trained state transition model, initially recognizing the hidden state h 0 And the feature y 0 Obtaining the corresponding hidden state h of each image block in a recursion way t H is then determined by using MLP decision model t Mapping to an action State s t It is necessary to point out s t Block x, which determines that the neuro-automaton is to access at the next time t +1 t Repeating the steps 1 and 2 to obtain s t+1 . It should be noted that when s t+1 And s t When incompatible, the hidden state of the model should be reset to the initial state h 0 E.g., B followed by E and skipping CD we consider incompatible.
S3, finding an ordered sequence of block states satisfying a complete and continuous BCDE condition, each block state S in the ordered sequence of block states t Corresponding to a tile x in a picture t (or x) ij ) And all tiles x in the ordered sequence of tile states t (or x) ij ) According to the state s of the picture block t The indication of (c) forms a closed loop.
S4, based on the ordered graph block state sequence S t Obtain the corresponding sequence of tiles "x t "or (" "x") ij "at this time, the position and size of the RPA element can be obtained by using the coordinate information ij in the tile sequence.
In the specific embodiment, the specific implementation flow of the invention is as follows:
as shown in fig. 2, it is necessary to locate the center-floated shadow-specific element.
First, the picture is input to a CNN network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, divided as shown by the grid (10 ✕ …) in fig. 3, each tile corresponding to a vector (for example 512-dimensional, output by the CNN).
The neural automaton of the present invention operates in the grid space shown in fig. 3, takes the vector corresponding to the block as input, and determines the next input by itself according to the current state.
For example, the neural automaton in initial state A is placed in the upper-left grid cell; inference on the corresponding vector yields the next state B, whereupon it decides to read the next input vector from the grid cell to its right. Repeating this process yields a trajectory containing the various states, as shown in fig. 4:
When it returns in state E to the beginning (a tile visited before), we have a tile sequence [x_0, x_1, x_2, …, x_t]. The tiles themselves contain coordinate information (i, j), so the sequence can also be written [x_{i_0 j_0}, x_{i_1 j_1}, x_{i_2 j_2}, …, x_{ij}]. The set of tile coordinates {i_0 j_0, i_1 j_1, i_2 j_2, …, ij} defines the coordinate range [min(i), min(j), max(i), max(j)] corresponding to the RPA element, giving the position (the minima of i and j correspond to the coordinates of the element's upper-left corner) and the size (the maxima correspond to the lower-right corner); subtracting the upper-left coordinates from the lower-right coordinates yields the element's width and height, i.e. the size of the RPA interface element.
It should be noted that the neural automaton of the present invention has an internal state vector (for example 512-dimensional); the next state vector is obtained by inference from the input vector and the automaton's own state vector, and this behavior can be conveniently modeled using a recurrent neural network (for example a long short-term memory network). The transitions between the various state vectors are shown in fig. 5.
Each state has a vector prototype (for example 512-dimensional, obtainable by learning); other vectors sufficiently similar to the prototype are also regarded as that state, with the cosine of the angle between vectors usually used as the measure of their similarity. The state vector space is partitioned into different regions by the neighborhoods of the state prototypes, corresponding to the individual states; as shown in fig. 6, the hollow arrow is most similar to arrow E. In practice, an MLP (Multi-Layer Perceptron) should be used to model the mapping from the vector space to the state space.
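For intuition only, the prototype view can be sketched as a nearest-prototype classifier under cosine similarity; the learned prototypes here are random stand-ins, and the patent itself models this mapping with an MLP:

```python
# Sketch of the prototype view: assign a state vector to the state whose prototype
# has the largest cosine similarity with it.
import torch
import torch.nn.functional as F

prototypes = torch.randn(5, 512)                 # one prototype per state A-E (stand-ins)

def nearest_state(state_vec: torch.Tensor) -> int:
    sims = F.cosine_similarity(state_vec.unsqueeze(0), prototypes, dim=-1)
    return int(sims.argmax())                    # index into A..E

print(nearest_state(torch.randn(512)))
```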
The automaton walks over the whole picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located (the position and size of the RPA element can obviously be deduced from the coordinate information corresponding to the complete, contiguous BCDE state sequence); the automaton then continues over other regions of the picture to find other RPA elements, until all tiles have been visited or excluded.
According to yet another embodiment of the present invention, there is provided an electronic device including a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, is capable of implementing the above-described neural automaton-based RPA element intelligent localization method.
According to still another embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of implementing the above neural automaton-based RPA element smart positioning method.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. An RPA element intelligent positioning method based on a neural automaton is characterized in that: the specific operation steps are as follows:
step 1, dividing a picture into a plurality of smaller picture blocks, and obtaining feature vectors corresponding to the picture blocks by using a convolutional neural network feature extraction model;
step 2, obtaining the image block state of each image block based on a pre-trained state prediction model; the training process of the state prediction model comprises the following steps:
the method comprises the following steps of firstly, acquiring and transforming data, wherein the acquired data comprise a picture x and a picture block state s, the picture block state is a matrix, and each element represents a picture block;
secondly, model design: from a given picture x, a convolutional neural network extracts features to obtain the feature map y = f(x); running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP multi-layer perceptron then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined; note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM; the RPA element can then be located through the state trajectory traversed by the neural automaton;
step 3, finding at least one ordered tile state sequence meeting a preset condition, wherein each tile state in the ordered tile state sequence corresponds to one tile in the picture, and all the tiles in the ordered tile state sequence form a closed loop according to the indication of the tile states;
and 4, obtaining the position and the size of the RPA element based on the ordered tile state sequence.
2. The RPA element intelligent positioning method based on the neural automaton as claimed in claim 1, wherein: the training process of the state prediction model comprises the following steps:
the training picture is encoded into a feature map by a convolutional neural network, each tile corresponding to a feature vector y_{i,j} on the feature map; a complete boundary containing BCDE is found in the state label s, and the corresponding feature vectors y_{i,j} are collected in order into the feature-and-state sequence {(y_t, s_t)}, in which case the sequence index t determines the coordinates (i, j) of each feature vector; for this sequence, the feature vector sequence {y_t} is fed into the LSTM to obtain the memory sequence {h_t}, and the MLP multi-layer perceptron maps the memory sequence in parallel to the predicted state sequence {ŝ_t}; the loss L is then calculated using the cross-entropy loss function, and the back-propagation algorithm is used to reduce the loss L; the model parameters θ that can correctly complete the task are obtained after the loss converges;
the neural automaton has five states and corresponding actions, A: rejection of element identification, B: element identification through and move to right tile, C: element identification by and to the lower tile, D: element identification through and to the left, E: the element identifies the tile that passes and moves to the top.
3. The RPA element intelligent positioning method based on the neural automaton according to claim 2, wherein: when reasoning is carried out with the model, no loss is calculated; using the optimized model parameters, a predicted state label sequence is obtained through the state-vector transitions and feature reading actions of the model design, and the bounding box of the RPA element is calculated from it.
4. The RPA element intelligent positioning method based on the neural automaton as claimed in claim 2, wherein: the neural automaton specifically comprises the following operation steps:
before operation, specific elements in the picture need to be positioned: the picture is first input to a convolutional neural network to obtain a feature map, where each position of the feature map corresponds to an area of the input picture, the picture being divided by a grid;
each tile divided from the picture corresponds to a vector; the neural automaton runs in this grid space, takes the vector corresponding to a tile as input, and determines the next input by itself according to the current state, until it returns to the starting point, thereby obtaining the position and size of one interface element;
the neural automaton walks over the entire picture in left-to-right, top-to-bottom reading order until it encounters an RPA element, whose states change in turn to B, C, D, E, whereupon the RPA element is located; the automaton then continues over other regions of the picture looking for other RPA elements, until all tiles have been visited or excluded.
5. The RPA element intelligent positioning method based on neural automaton according to claim 4, wherein: the neural automaton has an internal state vector, and the next state vector can be obtained by reasoning the input vector and the state vector of the neural automaton.
6. The RPA element intelligent positioning method based on neural automaton according to claim 5, wherein: there is a vector prototype for each state vector, other vectors sufficiently similar to the vector prototype are also identified as the state, the cosine of the angle between the vectors is used as a measure of their similarity, the state vector space is partitioned into different regions by the neighborhood of each state prototype to correspond to the individual states, and in practice, the MLP multi-layer perceptron is used to model the mapping of the vector space to the state space.
7. An RPA element intelligent positioning device based on a neural automaton, characterized in that: it comprises a feature extraction module, a state updating module, a state sequence acquisition module and an element positioning module, specifically:
the characteristic extraction module is used for dividing the image into a plurality of smaller image blocks and obtaining characteristic vectors corresponding to the image blocks by using a convolutional neural network characteristic extraction model;
the state updating module is used for obtaining the image block state of each image block based on a pre-trained state prediction model; the training process of the state prediction model comprises the following steps:
firstly, data are acquired and transformed, the acquired data comprise a picture x and a picture block state s, wherein the picture block state is a matrix, and each element represents a picture block;
secondly, model design: from a given picture x, a convolutional neural network extracts features to obtain the feature map y = f(x); running the LSTM point-by-point over y_{i,j} on the feature map yields the memory at time t, h_t = LSTM(h_{t-1}, y_{i,j}); the MLP multi-layer perceptron then yields the state s_t = MLP(h_t), from which the coordinates (i, j) of the next feature are determined; note that when a rejection state occurs, the memory information of the LSTM must be cleared (h_t ← h_0) and the LSTM input head moved, using the state-action mapping, to point to a different feature point y_{i,j}; when a non-rejection state occurs, the memory information h_{t-1} of the previous step is passed to the LSTM; the RPA element can then be located through the state trajectory traversed by the neural automaton;
the state sequence acquisition module is used for finding at least one ordered block state sequence meeting a preset condition, each block state in the ordered block state sequence corresponds to one block in the picture, and all blocks in the ordered block state sequence form a closed loop according to the indication of the block state;
an element positioning module to obtain a position and a size of an RPA element based on the ordered sequence of tile states.
8. An RPA element smart location storage medium having stored thereon computer instructions, characterized in that: the computer instructions, when executed by a processor, implement the steps of a method for intelligent location of RPA elements as recited in any of claims 1-6.
9. An RPA element smart positioning computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when running the computer program, implements a RPA element smart location method as recited in any one of claims 1-6.
CN202210944163.XA 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton Active CN115019029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210944163.XA CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210944163.XA CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Publications (2)

Publication Number Publication Date
CN115019029A CN115019029A (en) 2022-09-06
CN115019029B 2022-11-04

Family

ID=83065943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210944163.XA Active CN115019029B (en) 2022-08-08 2022-08-08 RPA element intelligent positioning method based on neural automaton

Country Status (1)

Country Link
CN (1) CN115019029B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964027B (en) * 2023-03-16 2023-06-30 杭州实在智能科技有限公司 Desktop embedded RPA flow configuration system and method based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491885A (en) * 2018-03-28 2018-09-04 广东工业大学 A kind of autoCAD graphic blocks identifying method and devices based on Naive Bayes Classifier
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN113298250A (en) * 2020-02-24 2021-08-24 福特全球技术公司 Neural network for localization and object detection
CN113391871A (en) * 2021-08-17 2021-09-14 杭州实在智能科技有限公司 RPA element intelligent fusion picking method and system
CN113741882A (en) * 2021-09-16 2021-12-03 杭州分叉智能科技有限公司 RPA graphical instruction design method
CN114419611A (en) * 2020-10-12 2022-04-29 八维智能股份有限公司 Real-time short message robot system and method for automatically detecting character lines in digital image
CN114556391A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence layer based process extraction for robot process automation
CN114556244A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Process evolution and workflow micro-optimization for robotic process automation
CN114556305A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence based process identification, extraction and automation for robotic process automation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190324781A1 (en) * 2018-04-24 2019-10-24 Epiance Software Pvt. Ltd. Robotic script generation based on process variation detection
US11726802B2 (en) * 2018-09-28 2023-08-15 Servicenow Canada Inc. Robust user interface related robotic process automation
JP2023523374A (en) * 2020-04-30 2023-06-05 ユーアイパス,インコーポレイテッド A Machine Learning Model Retraining Pipeline for Robotic Process Automation

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491885A (en) * 2018-03-28 2018-09-04 广东工业大学 A kind of autoCAD graphic blocks identifying method and devices based on Naive Bayes Classifier
CN114556391A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence layer based process extraction for robot process automation
CN114556244A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Process evolution and workflow micro-optimization for robotic process automation
CN114556305A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Artificial intelligence based process identification, extraction and automation for robotic process automation
CN113298250A (en) * 2020-02-24 2021-08-24 福特全球技术公司 Neural network for localization and object detection
CN114419611A (en) * 2020-10-12 2022-04-29 八维智能股份有限公司 Real-time short message robot system and method for automatically detecting character lines in digital image
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN113391871A (en) * 2021-08-17 2021-09-14 杭州实在智能科技有限公司 RPA element intelligent fusion picking method and system
CN113741882A (en) * 2021-09-16 2021-12-03 杭州分叉智能科技有限公司 RPA graphical instruction design method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dipali Baviskar et al., "Efficient Automated Processing of the Unstructured Documents Using Artificial Intelligence: A Systematic Literature Review and Future Directions", IEEE Access, vol. 9, 2021-05-24, pp. 72894-72936 *
Qinghong Guo et al., "Key-Region and Layout Learning for Contract Intelligent Identification", 2021 IEEE International Conference on Emergency Science and Information Technology, 2021, pp. 57-61 *
Qin Haibo et al., "Analysis of RPA Process Automation Technology" (《RPA流程自动化技术分析》), Techniques of Automation and Applications (《自动化技术与应用》), vol. 41, no. 5, May 2022, pp. 1-4 *
Tian Gaoliang et al., "Research on the Application of Financial Robots Based on RPA Technology" (《基于RPA技术的财务机器人应用研究》), Finance and Accounting Monthly (《财会月刊》), no. 18, Aug. 2019, pp. 10-14 *

Also Published As

Publication number Publication date
CN115019029A (en) 2022-09-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant