CN110780965A - Vision-based process automation method, device and readable storage medium - Google Patents


Info

Publication number
CN110780965A
CN110780965A
Authority
CN
China
Prior art keywords
interface
image
preset
result
vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911020138.7A
Other languages
Chinese (zh)
Other versions
CN110780965B (en)
Inventor
吴子凡
张潮宇
何元钦
陈天健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201911020138.7A priority Critical patent/CN110780965B/en
Publication of CN110780965A publication Critical patent/CN110780965A/en
Application granted granted Critical
Publication of CN110780965B publication Critical patent/CN110780965B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based process automation method, device and readable storage medium. The vision-based process automation method comprises the following steps: receiving an interface image and analyzing it to obtain an analysis result; combing the association relationships of the interface elements in the interface image based on a preset knowledge graph template and the analysis result, so as to establish an interface analysis graph database; and executing a preset automation process based on the interface analysis graph database. The invention solves the technical problem of poor RPA applicability in the prior art.

Description

Vision-based process automation method, device and readable storage medium
Technical Field
The invention relates to the technical field of neural networks in financial technology (Fintech), and in particular to a vision-based process automation method, device and readable storage medium.
Background
With the continuous development of financial technology, especially internet finance, more and more technologies (such as distributed computing, blockchain and artificial intelligence) are applied to the financial field. At the same time, the financial industry places higher requirements on these technologies, for example higher requirements on the distribution of the industry's backlog of work.
With the continuous development of computer software and artificial intelligence, RPA (Robotic Process Automation) is more and more widely applied in daily work. At present, RPA software must be installed on the local machine before it can run: it obtains the state of the page, performs the necessary inputs, and thereby completes mechanical, repetitive work. However, for systems that do not allow third-party software to be installed, existing RPA software has no corresponding working capability. For example, a financial system in a certain bank can only be logged into through specific hardware, so RPA software cannot be installed on it to execute an automated process, and people can only rely on manual labor to perform the mechanical work, causing unnecessary waste of human resources. Therefore, the prior art has the technical problem of poor RPA applicability.
Disclosure of Invention
The invention mainly aims to provide a vision-based process automation method, device and readable storage medium, so as to solve the technical problem of poor RPA applicability in the prior art.
In order to achieve the above object, an embodiment of the present invention provides a vision-based process automation method, which is applied to a vision-based process automation device, and the vision-based process automation method includes:
receiving an interface image, and analyzing the interface image to obtain an analysis result;
based on a preset knowledge graph template and the analysis result, combing the incidence relation of interface elements in the interface image to establish an interface analysis graph database;
and executing a preset automation process based on the interface analysis graph database.
Optionally, the step of combing the association relationship of the interface elements in the interface image based on the preset knowledge graph template and the analysis result to establish an interface analysis graph database includes:
based on the analysis result, combing the incidence relation of the interface elements in the interface image to obtain the incidence relation of the interface elements;
inputting the interface elements into the preset knowledge graph template based on the interface element association relation to obtain an interface element knowledge graph;
and establishing a search condition module corresponding to the interface element knowledge graph so as to establish the interface analysis graph database based on the interface element knowledge graph and the search condition module.
Optionally, the interface element comprises an interface, an interface region and a region element, the parsing result comprises an image classification result, a semantic segmentation result and a target detection result,
the step of combing the incidence relation of the interface elements in the interface image based on the analysis result to obtain the incidence relation of the interface elements comprises the following steps:
based on the target detection result, combing the incidence relation of the area elements to obtain the incidence relation of the area elements;
based on the area element incidence relation and the semantic segmentation result, combing the incidence relation of the interface area to obtain the incidence relation of the interface area;
and combing the incidence relation of the interface based on the incidence relation of the interface area and the image classification result to obtain the incidence relation of the interface elements.
Optionally, the step of executing a preset automation process based on the interface resolution graph database includes:
acquiring an operation command corresponding to the preset automation process, and extracting input information corresponding to the operation command from a preset input information database;
and searching interface element information corresponding to the input information based on the interface analysis graph database so as to execute a preset automatic process.
Optionally, the parsing result comprises an image classification result, a semantic segmentation result and a target detection result,
the step of analyzing the interface image to obtain an analysis result comprises the following steps:
inputting the interface image into a preset image classification model to identify the interface image and obtain an image classification result;
inputting the interface image into a preset semantic segmentation model, and performing semantic segmentation on the interface image to obtain a semantic segmentation result;
and inputting the interface image subjected to semantic segmentation into a preset target detection model, and carrying out target detection on the interface image to obtain a target detection result.
Optionally, the step of inputting the interface image into a preset image classification model to identify the interface image, and obtaining the image classification result includes:
inputting the interface image into the preset image classification model to perform convolution and pooling alternative processing on the interface image for preset times to obtain a plurality of image classification characteristic graphs corresponding to the interface image;
and fully connecting the image classification characteristic graphs to obtain image classification characteristic vectors, and extracting interface information in the image classification characteristic vectors to obtain the image classification result.
Optionally, the step of inputting the interface image into a preset semantic segmentation model, performing semantic segmentation on the interface image, and obtaining the semantic segmentation result includes:
inputting the interface image into the preset semantic segmentation model to encode the interface image to obtain an encoding result;
and decoding the coding result to obtain the semantic segmentation result.
Optionally, the step of inputting the interface image after semantic segmentation into a preset target detection model, performing target detection on the interface image, and obtaining the target detection result includes:
inputting the interface image subjected to semantic segmentation into the preset target detection model to select candidate regions in the interface image in a frame mode to obtain target frames corresponding to the candidate regions;
performing convolution and pooling alternative processing on each target frame for preset times to obtain a plurality of target frame characteristic graphs corresponding to each target frame;
and fully connecting the plurality of target frame feature maps to obtain target feature vectors corresponding to the target frames, and extracting target information in the target feature vectors to obtain the target detection result.
The present invention also provides a vision-based process automation apparatus applied to a vision-based process automation device, the vision-based process automation apparatus including:
the analysis module is used for receiving the interface image and analyzing the interface image to obtain an analysis result;
the combing module is used for combing the association relationships of the interface elements in the interface image based on the preset knowledge graph template and the analysis result, so as to establish an interface analysis graph database;
and the execution module is used for executing a preset automatic process based on the interface analysis graph database.
Optionally, the combing module comprises:
the combing unit is used for combing the incidence relation of the interface elements in the interface image based on the analysis result to obtain the incidence relation of the interface elements;
the input unit is used for inputting the interface elements into the preset knowledge graph template based on the interface element association relation to obtain an interface element knowledge graph;
and the matching unit is used for establishing a search condition module corresponding to the interface element knowledge graph so as to establish the interface analysis graph database based on the interface element knowledge graph and the search condition module.
Optionally, the combing unit comprises:
the first combing subunit is used for combing the association relation of the area elements based on the target detection result to obtain the association relation of the area elements;
the second combing subunit is used for combing the incidence relation of the interface region based on the incidence relation of the region elements and the semantic segmentation result to obtain the incidence relation of the interface region;
and the third combing subunit is used for combing the association relationships of the interface based on the interface region association relationships and the image classification result, so as to obtain the interface element association relationships.
Optionally, the execution module includes:
the extraction unit is used for acquiring the operation command corresponding to the preset automation process and extracting the input information corresponding to the operation command from a preset input information database;
and the searching unit is used for searching the interface element information corresponding to the input information based on the interface analysis graph database so as to execute a preset automatic process.
Optionally, the parsing module includes:
the image recognition unit is used for inputting the interface image into a preset image classification model so as to recognize the interface image and obtain an image classification result;
the semantic segmentation unit is used for inputting the interface image into a preset semantic segmentation model, performing semantic segmentation on the interface image and obtaining a semantic segmentation result;
and the target detection unit is used for inputting the interface image after the semantic segmentation into a preset target detection model, and carrying out target detection on the interface image to obtain the target detection result.
Optionally, the image recognition unit includes:
a first convolution and pooling subunit, configured to input the interface image into the preset image classification model, so as to perform convolution and pooling alternative processing on the interface image for a preset number of times, so as to obtain a plurality of image classification feature maps corresponding to the interface image;
and the first full-connection unit is used for performing full connection on the plurality of image classification characteristic graphs to obtain image classification characteristic vectors and extracting interface information in the image classification characteristic vectors to obtain the image classification result.
Optionally, the semantic segmentation unit includes:
the coding unit is used for inputting the interface image into the preset semantic segmentation model so as to code the interface image and obtain a coding result;
and the decoding unit is used for decoding the coding result to obtain the semantic segmentation result.
Optionally, the target detection unit includes:
a framing unit, configured to input the interface image after semantic segmentation into the preset target detection model, so as to frame candidate regions in the interface image, and obtain target frames corresponding to the candidate regions;
the second convolution and pooling unit is used for performing convolution and pooling alternative processing on each target frame for preset times to obtain a plurality of target frame feature maps corresponding to each target frame;
and the second full-connection unit is used for performing full-connection on the plurality of target frame feature maps to obtain target feature vectors corresponding to the target frames, and extracting target information in the target feature vectors to obtain the target detection result.
The present invention also provides a vision-based process automation apparatus, including: a memory, a processor, and a program of the vision-based process automation method stored on the memory and executable on the processor, which when executed by the processor, may implement the steps of the vision-based process automation method as described above.
The present invention also provides a readable storage medium having stored thereon a program for implementing a vision-based process automation method, which when executed by a processor implements the steps of the vision-based process automation method as described above.
The method comprises: receiving an interface image and analyzing it to obtain an analysis result; combing the association relationships of the interface elements in the interface image based on a preset knowledge graph template and the analysis result, so as to establish an interface analysis graph database; and executing a preset automation process based on the interface analysis graph database. That is, the application analyzes the interface image and, based on the analysis result, combs the relationships between the interface elements in the interface image to establish the interface analysis graph database, and then executes the preset automation process based on that database. In this way, the application can obtain the state of an interface purely by analyzing an image of it, without installing third-party software on the target system or obtaining the interface state through a system interface. Automated processes can therefore be realized even for systems that do not allow third-party software to be installed, or systems whose interfaces are closed, which greatly improves the applicability of RPA and solves the technical problem of poor RPA applicability in the prior art.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic flow chart diagram of a first embodiment of a vision-based process automation method of the present invention;
FIG. 2 is a schematic flow chart of a second embodiment of the vision-based process automation method of the present invention;
fig. 3 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The present invention provides a vision-based process automation method applied to a vision-based process automation device, and in a first embodiment of the vision-based process automation method of the present application, referring to fig. 1, the vision-based process automation method includes:
step S10, receiving an interface image, analyzing the interface image and obtaining an analysis result;
In this embodiment, it should be noted that the interface image contains multiple types of interface elements, including interfaces, interface regions and region elements. An interface may be, for example, a web interface or a software interface; an interface region may be a working area, a navigation area, or the like; and a region element may be a drop-down box, an input box, a text box, or the like. The analysis result includes an image classification result, a semantic segmentation result and a target detection result. The interface image may be captured by a camera or another image-capturing device.
Receiving an interface image and analyzing it to obtain an analysis result. Specifically, the interface image is received and analyzed to identify the type of interface, giving an image classification result; the interface image is then segmented into interface regions, giving a semantic segmentation result; and target detection is then performed on the region elements of each interface region, giving a target detection result.
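For illustration only (this sketch is not part of the original disclosure), the three-stage parsing described above can be pictured in Python as follows; the model objects `classifier`, `segmenter` and `detector` and the result fields are assumed placeholders for whatever preset models an implementation uses.

```python
# Minimal sketch of the three-stage interface-image parsing pipeline.
# All names and result shapes here are illustrative assumptions.
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ParseResult:
    image_classification: Dict[str, Any]    # interface type and attributes
    semantic_segmentation: Any              # per-pixel interface-region labels
    target_detection: List[Dict[str, Any]]  # region elements with positions

def parse_interface_image(image, classifier, segmenter, detector) -> ParseResult:
    cls_result = classifier(image)            # identify the interface type
    seg_result = segmenter(image)             # split the image into interface regions
    det_result = detector(image, seg_result)  # detect region elements in the segmented image
    return ParseResult(cls_result, seg_result, det_result)
```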
Step S20, based on a preset knowledge graph template and the analysis result, combing the incidence relation of the interface elements in the interface image to establish an interface analysis graph database;
In this embodiment, it should be noted that the preset knowledge graph template is a blank knowledge graph containing a number of knowledge nodes, including primary, secondary and tertiary knowledge nodes. Knowledge nodes can be connected by specific line segments to represent association relationships between them. For example, suppose the primary knowledge node corresponds to an interface, the secondary knowledge nodes correspond to interface regions, and the tertiary knowledge nodes correspond to region elements. If interface elements are in a dependency relationship, they are entered at the corresponding levels and their knowledge nodes are connected; if two region elements belonging to different interface regions are in a peer (flat-level) relationship, the knowledge nodes corresponding to the two region elements are connected to the knowledge nodes of their respective interface regions, and the knowledge nodes of those interface regions are connected to each other. The association relationships include the membership of an interface region to an interface, the membership of a region element to an interface region, and the association relationships between region elements, which include sub-element, peer, affiliation and other relationships.
Based on a preset knowledge graph template and the analysis result, the association relationships of the interface elements in the interface image are combed so as to establish an interface analysis graph database. Specifically, based on the analysis result, the association relationships between the interface elements are combed to obtain the interface element association relationships; based on these relationships, the interface elements in the interface image are entered into the knowledge nodes of the preset knowledge graph template to obtain a knowledge graph corresponding to the interface image; and the knowledge graph is stored in a preset interface analysis graph database template to establish the interface analysis graph database.
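As a hedged illustration of this step (not taken from the patent), the interface analysis graph can be sketched with a generic graph library such as networkx, with interface, region and element nodes connected by membership and peer edges; the dict shapes used below are assumptions standing in for the analysis results described above.

```python
# Sketch only: the keys interface_type, regions, tag, type, pos, region and peers
# are assumed field names, not terms fixed by the patent.
import networkx as nx

def build_interface_graph(cls_result, seg_result, det_result):
    g = nx.DiGraph()
    interface = cls_result["interface_type"]                 # primary knowledge node
    g.add_node(interface, level="interface")
    for region in seg_result["regions"]:                     # secondary knowledge nodes
        g.add_node(region["name"], level="region", bbox=region["bbox"])
        g.add_edge(interface, region["name"], relation="contains")
    for elem in det_result:                                   # tertiary knowledge nodes
        g.add_node(elem["tag"], level="element", type=elem["type"], pos=elem["pos"])
        g.add_edge(elem["region"], elem["tag"], relation="contains")
        for peer in elem.get("peers", []):                    # peer (flat-level) elements,
            g.add_edge(elem["tag"], peer, relation="peer")    # recorded as explicit edges here
    return g
```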
The step of combing the incidence relation of the interface elements in the interface image based on the preset knowledge graph template and the analysis result to establish an interface analysis graph database comprises the following steps:
step S21, based on the analysis result, combing the incidence relation of the interface elements in the interface image to obtain the incidence relation of the interface elements;
In this embodiment, based on the analysis result, the interface elements in the interface image are combed for association relationships to obtain the interface element association relationships. Specifically, based on information such as the type, attributes and position of each interface element in the analysis result, the association relationships between the interface elements are combed; these relationships include membership relationships, peer (flat-level) relationships and the like. The combing process includes converting the analysis information of each interface element into a specific tag, binding the tag to the interface, and then associating the interface elements with one another through their tags.
Wherein the interface elements comprise an interface, an interface area and area elements, the analysis result comprises an image classification result, a semantic segmentation result and a target detection result,
the step of combing the incidence relation of the interface elements in the interface image based on the analysis result to obtain the incidence relation of the interface elements comprises the following steps:
step S211, based on the target detection result, combing the association relation of the area elements to obtain the association relation of the area elements;
in this embodiment, it should be noted that the target detection result includes information such as a category, an attribute, and a position of each area element.
Based on the target detection result, the region elements are combed for association relationships to obtain the region element association relationships. Specifically, the association relationships between the region elements are combed based on the target detection result, and the interface region to which each region element belongs is determined from its position information, giving the association relationship between the region elements and the interface regions. For example, suppose region element A is an input box and region element B is an output box. When target detection is performed on the interface image, corresponding tags a and b are attached to A and B according to the target detection result, where each tag contains the detection result, and identification codes are set for tags a and b respectively. Since region element A is an input box and region element B is an output box, A and B are in a peer (flat-level) relationship, so the identification codes of tags a and b are also peer-relationship identification codes. For example, if the identification code of the peer relationship is 0001, the identification code of tag a is a-0001 and the identification code of tag b is b-0001.
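A minimal sketch of this tag and identification-code scheme (the helper name and dict layout are assumptions for illustration, not the patent's):

```python
# Peer ("flat-level") elements share a relationship code, e.g. 0001, so the
# identification code of tag a becomes "a-0001" and that of tag b "b-0001".
PEER_RELATION_CODE = "0001"

def make_element_tag(name: str, detection: dict, relation_code: str) -> dict:
    return {
        "id": f"{name}-{relation_code}",  # identification code bound to the element
        "detection": detection,           # category / attributes / position from detection
    }

tag_a = make_element_tag("a", {"type": "input_box",  "pos": (40, 120)}, PEER_RELATION_CODE)
tag_b = make_element_tag("b", {"type": "output_box", "pos": (40, 200)}, PEER_RELATION_CODE)
assert tag_a["id"] == "a-0001" and tag_b["id"] == "b-0001"  # same suffix marks the peer relation
```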
Step S212, based on the area element association relation and the semantic segmentation result, carrying out association relation carding on the interface area to obtain an interface area association relation;
in this embodiment, it should be noted that the semantic segmentation result includes information such as a category, an attribute, and a position of each interface region.
Based on the area element association relationship and the semantic segmentation result, combing the association relationship of the interface areas to obtain an interface area association relationship, specifically, based on the area element association relationship and the semantic segmentation result, combing the association relationship between the interface areas to obtain an association relationship between the interface areas, and based on the position information of the interface areas, obtaining a membership relationship between the interface areas and the interface, for example, combing the association relationship between the interface areas through identification codes in preset tags corresponding to the interface areas.
And S213, combing the incidence relation of the interface based on the incidence relation of the interface area and the image classification result to obtain the incidence relation of the interface elements.
In this embodiment, it should be noted that the image classification result includes information such as the type and attribute of the interface.
And combing the association relation of the interfaces based on the association relation of the interface regions and the image classification result to obtain the association relation of the interface elements, specifically, combing the association relation between the interfaces based on the association relation of the interface regions and the image classification result to obtain the association relation between the interfaces, and determining the membership relation between the interface regions and the interfaces based on the association relation of the interface regions, for example, combing the association relation between the interfaces through identification codes in preset labels corresponding to the interfaces.
In addition, this embodiment also provides another way to comb the association relationships between the interface elements, where the interface elements include interfaces, interface regions and region elements. First, based on the semantic segmentation result and the image classification result, the association relationships between the interface regions are combed to obtain the relationships between the interface regions, the membership of the interface regions to the interface, and the membership of the region elements to the interface regions, that is, the interface region association relationships. Then, based on the target detection result and the interface region association relationships, the association relationships between the interface elements belonging to each interface region are combed to obtain the region element association relationships. Finally, the interface element association relationships are obtained based on the interface region association relationships, the region element association relationships and the image classification result. For example, tag matching may be used to comb the association relationships between region elements. Specifically, suppose region element A is an input box and region element B is an output box. When the interface image is detected, corresponding tags a and b are attached to A and B according to the target detection result, where each tag contains the detection result, and identification codes are set for tags a and b respectively. Since region element A is an input box and region element B is an output box, A and B are in a peer relationship, so the identification codes of tags a and b are also peer-relationship identification codes; for example, if the identification code of the peer relationship is 0001, the identification code of tag a is a-0001 and the identification code of tag b is b-0001. Similarly, the interface region association relationships can be combed through tag matching.
Step S22, inputting the interface element into the preset knowledge graph template based on the interface element incidence relation to obtain an interface element knowledge graph;
in this embodiment, it should be noted that the preset knowledge graph template includes a knowledge graph module, and the knowledge graph module is configured to store each interface element.
And inputting the interface elements into the preset knowledge graph template based on the interface element association relationship to obtain an interface element knowledge graph, specifically, inputting the interface elements into knowledge nodes of corresponding levels in the knowledge graph module based on the membership relationship between interfaces and interface areas and the membership relationship between interface areas and area elements, and further connecting the knowledge nodes by specific line segments based on the interface element association relationship to obtain the interface element knowledge graph.
Step S23, establishing a search condition module corresponding to the interface element knowledge graph, so as to establish the interface analysis graph database based on the interface element knowledge graph and the search condition module.
In this embodiment, it should be noted that the search condition includes information that can be used to query the interface element, such as a keyword, a tag, and a character string, the preset knowledge graph template includes a search condition template, and the search condition module is configured to store the search condition corresponding to each interface element.
A search condition module corresponding to the interface element knowledge graph is established, so that the interface analysis graph database can be established based on the interface element knowledge graph and the search condition module. Specifically, based on the association relationship information between the interface elements in the interface element knowledge graph, the element feature information of each interface element (including features such as its category, position and attributes) and the storage location information of each interface element, search keywords corresponding to each interface element are matched to obtain search conditions. The search conditions are then entered into the search condition template to establish a search condition module corresponding to each interface element in the interface element knowledge graph, and the interface analysis graph database is established based on the interface element knowledge graph and the search condition module.
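One way to picture the search condition module, as a sketch under assumed data shapes (reusing the graph built in the earlier sketch; the patent does not prescribe this structure), is a keyword index over the knowledge-graph nodes:

```python
# Maps search keywords (type, level, node name) to knowledge-graph node ids so an
# operation command can later locate the interface element it needs.
from collections import defaultdict

def build_search_index(graph):
    index = defaultdict(set)
    for node, attrs in graph.nodes(data=True):
        for keyword in (attrs.get("type"), attrs.get("level"), node):
            if keyword:
                index[str(keyword)].add(node)
    return index

def lookup(index, *search_conditions):
    """Return node ids matching every given search condition."""
    sets = [index.get(str(c), set()) for c in search_conditions]
    return set.intersection(*sets) if sets else set()
```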
And step S30, executing a preset automation process based on the interface analysis database.
In this embodiment, a preset automation process is executed based on the interface resolution graph database, specifically, an operation instruction of the preset automation process is obtained, and an association relationship between a required interface element and an interface element corresponding to the operation instruction is queried in the interface resolution graph database based on search information in the operation instruction, and further, the preset automation process is executed according to the association relationship between the required interface element and the interface element based on execution information in the operation instruction.
Wherein the step of executing a preset automation process based on the interface resolution graph database comprises:
step S31, acquiring an operation command corresponding to the preset automation process, and extracting input information corresponding to the operation command from a preset input information database;
in this embodiment, it should be noted that the operation command includes information such as an execution code and an identification tag.
The method comprises the steps of obtaining an operation command corresponding to the preset automation process, extracting input information corresponding to the operation command from a preset input information database, specifically obtaining the operation command corresponding to the preset automation process, and inquiring input information corresponding to the operation command from a preset input information database based on identification information in the operation command, wherein the input information comprises process information and search conditions for executing the preset automation process.
And step S32, searching interface element information corresponding to the input information based on the interface analysis graph database to execute a preset automation process.
In this embodiment, interface element information corresponding to the input information is searched for based on the interface analysis graph database, so as to execute the preset automation process. Specifically, the interface elements required by the preset automation process and the association relationships between them are searched for in the interface analysis graph database according to the search conditions in the input information, and the preset automation process is then executed according to the process information of the preset automation process, based on the required interface elements and their association relationships. For example, a Bluetooth receiver may be installed on the host machine where the preset automation process runs; the process information of the preset automation process is converted into simulated operation information for the host's input devices, such as a mouse and keyboard, the operation information is sent to the Bluetooth receiver, and the Bluetooth receiver controls the host machine to complete the preset automation process.
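A non-authoritative sketch of steps S31/S32 under the assumptions above: `graph_db.find` and `send_input_event` are hypothetical stand-ins for the element lookup in the interface analysis graph database and for the input-device transport (e.g. a Bluetooth receiver on the host) mentioned in the text.

```python
def execute_preset_process(command, input_db, graph_db, send_input_event):
    # Pull the input information for this operation command from the preset database.
    input_info = input_db[command["id"]]  # process info + search conditions
    for step in input_info["steps"]:
        element = graph_db.find(step["search_condition"])  # locate the needed interface element
        x, y = element["pos"]
        # Simulate a mouse/keyboard operation and hand it to the transport, which
        # replays it on the host machine running the preset automation process.
        send_input_event({"device": "mouse", "action": step["action"], "x": x, "y": y})
```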
In this embodiment, an interface image is received and analyzed to obtain an analysis result; based on a preset knowledge graph template and the analysis result, the association relationships of the interface elements in the interface image are combed to establish an interface analysis graph database; and a preset automation process is then executed based on the interface analysis graph database. That is, this embodiment analyzes the interface image and, based on the analysis result, combs the relationships between the interface elements in the interface image to establish the interface analysis graph database, and then executes the preset automation process based on that database. In this way, the state of the interface can be obtained by analyzing the interface image so as to execute the preset automation process, without installing third-party software on the system or obtaining the interface state through a system interface. Automated processes can therefore be realized for systems that do not allow third-party software to be installed, systems whose interfaces are closed, and the like, which greatly improves the applicability of RPA and solves the technical problem of poor RPA applicability in the prior art.
Further, referring to fig. 2, in another embodiment of the vision-based process automation method based on the first embodiment in the present application, the parsing result includes an image classification result, a semantic segmentation result and a target detection result,
in step S10, the step of analyzing the interface image to obtain an analysis result includes:
step S11, inputting the interface image into a preset image classification model to identify the interface image and obtain the image classification result;
in this embodiment, it should be noted that the preset image classification model is a neural network model that has been trained based on deep learning.
The interface image is input into a preset image classification model to identify the interface image and obtain the image classification result. Specifically, the interface image is input into the preset image classification model, and convolution and pooling are performed alternately on the interface image a preset number of times to obtain the convolution and pooling results of the model. These results are then fully connected to obtain an image classification feature vector corresponding to the interface image, and the feature information in this vector is extracted to obtain the image classification result. Here, convolution refers to the process of element-wise multiplication between the image matrix corresponding to an image and a convolution kernel to obtain image feature values, where the convolution kernel is a weight matrix corresponding to the interface image features; pooling refers to the process of aggregating the feature values obtained by convolution into new feature values; and full connection can be regarded as a special convolution whose result is a one-dimensional vector corresponding to the image.
The step of inputting the interface image into a preset image classification model to identify the interface image and obtain the image classification result comprises the following steps:
step S111, inputting the interface image into the preset image classification model to perform convolution and pooling alternative processing on the interface image for preset times to obtain a plurality of image classification characteristic graphs corresponding to the interface image;
In this embodiment, the interface image is input into the preset image classification model and subjected to alternating convolution and pooling a preset number of times to obtain a plurality of image classification feature maps corresponding to the interface image. Specifically, the interface image is input into the preset image classification model and convolved to obtain the convolution result of the model; the convolution result is then pooled to obtain the pooling result of the model; and the convolution and pooling steps are repeated. After the preset number of rounds of convolution and pooling, a plurality of image classification feature maps corresponding to the interface image are obtained, and these feature maps together contain all of the image feature information of the interface image.
Step S112, fully connecting the image classification characteristic graphs to obtain image classification characteristic vectors, and extracting interface information in the image classification characteristic vectors to obtain the image classification result.
In this embodiment, the image classification feature maps are fully connected to obtain an image classification feature vector, and the interface information in this vector is extracted to obtain the image classification result. Specifically, the image classification feature maps are fully connected to obtain the corresponding image classification feature vector; this vector contains all interface features of the interface image, such as the interface type and interface attributes. The interface information in the image classification feature vector, which includes all of these interface features, is then extracted, and the image is classified and identified based on the interface features to obtain the image classification result.
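For readers who want a concrete picture, a minimal PyTorch sketch of such a classification model follows; the layer sizes and the number of convolution/pooling rounds are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class InterfaceClassifier(nn.Module):
    """Preset image classification model: alternating convolution/pooling, then full connection."""
    def __init__(self, num_interface_types: int, rounds: int = 3):
        super().__init__()
        layers, channels = [], 3
        for i in range(rounds):                              # preset number of conv+pool rounds
            layers += [nn.Conv2d(channels, 32 * (i + 1), 3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]
            channels = 32 * (i + 1)
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)                  # collapse the feature maps
        self.fc = nn.Linear(channels, num_interface_types)   # full connection -> interface type scores

    def forward(self, interface_image: torch.Tensor) -> torch.Tensor:
        maps = self.features(interface_image)                # image classification feature maps
        vector = self.pool(maps).flatten(1)                  # image classification feature vector
        return self.fc(vector)                               # interface type scores
```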
Step S12, inputting the interface image into a preset semantic segmentation model, and performing semantic segmentation on the interface image to obtain a semantic segmentation result;
in this embodiment, it should be noted that the preset semantic segmentation model includes a convolutional neural network.
The interface image is input into a preset semantic segmentation model and semantically segmented to obtain a semantic segmentation result. Specifically, the interface image is input into the convolutional neural network and encoded, that is, down-sampled, to obtain an encoding result. The encoding result is an image matrix output by the convolutional neural network, in which the pixel values represent the identification and classification results of the pixels. For example, if the pixel values in the image matrix consist of 0 and 1, a value of 1 indicates that the corresponding pixel belongs to the navigation bar region and a value of 0 indicates that it belongs to the background region. The encoding result is then decoded, that is, up-sampled, to obtain the semantic segmentation result.
The interface image is input into a preset semantic segmentation model, the interface image is subjected to semantic segmentation, and the semantic segmentation result is obtained through the steps of:
step S121, inputting the interface image into the preset semantic segmentation model to encode the interface image to obtain an encoding result;
in this embodiment, it should be noted that the encoding includes convolution processing, pooling processing, and the like.
The interface image is input into the preset semantic segmentation model and encoded to obtain an encoding result. Specifically, the interface image is input into the convolutional neural network and subjected to alternating convolution and pooling a preset number of times, extracting the features of each pixel in the interface image, that is, obtaining high-level semantic information. The pixels in the interface image are then classified and identified based on the high-level semantic information to obtain an identification and classification result. For example, suppose the interface image consists of a navigation bar region and a background region. After the interface image is input into the convolutional neural network, the extracted features of each pixel are identified and the probabilities P1 and P2 that the pixel belongs to the navigation bar region and the background region are computed, with P1 + P2 = 1. If P1 is greater than P2, the pixel belongs to the navigation bar region; if P1 is less than P2, the pixel belongs to the background region. The pixels are thus divided into two classes, one corresponding to the navigation bar region and the other to the background region, and the encoding result is output based on this identification and classification result.
And S122, decoding the coding result to obtain the semantic segmentation result.
In this embodiment, it should be noted that the decoding includes deconvolution, inverse pooling, and the like, the semantic segmentation result is a semantic segmentation image, and the resolution of the semantic segmentation image and the resolution of the interface image should be consistent.
The encoding result is decoded to obtain the semantic segmentation result. Specifically, deconvolution is performed on the image matrix corresponding to the encoding result, that is, the image matrix is multiplied by a transposed weight matrix of the convolutional neural network to obtain the semantic image matrix corresponding to the semantic segmentation image, and the semantic segmentation image corresponding to that matrix is output. Different regions in the semantic segmentation image are then distinguished by different colours according to the classification of the pixels, giving the semantic segmentation result.
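A minimal PyTorch encoder-decoder sketch of the segmentation model described above (channel counts and depth are assumptions; the description only requires down-sampling followed by up-sampling back to the input resolution):

```python
import torch
import torch.nn as nn

class InterfaceSegmenter(nn.Module):
    def __init__(self, num_region_classes: int = 2):  # e.g. navigation bar vs. background
        super().__init__()
        self.encoder = nn.Sequential(                  # encoding: down-sample to H/4 x W/4
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(                  # decoding: up-sample back to H x W
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, num_region_classes, 2, stride=2),
        )

    def forward(self, interface_image: torch.Tensor) -> torch.Tensor:
        code = self.encoder(interface_image)           # encoding result (image matrix)
        logits = self.decoder(code)                    # per-pixel class scores
        return logits.softmax(dim=1)                   # P1, P2, ... per pixel, summing to 1
```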
And step S13, inputting the interface image after semantic segmentation into a preset target detection model, and carrying out target detection on the interface image to obtain the target detection result.
In this embodiment, it should be noted that the preset target detection model is a neural network model that has been trained based on deep learning.
The semantically segmented interface image is input into a preset target detection model and target detection is performed on it to obtain the target detection result. Specifically, the semantically segmented interface image is input into the preset target detection model, alternating convolution and pooling are applied a preset number of times followed by full connection to obtain a region target feature vector, and the region targets and attribute information in that feature vector are extracted to obtain the target detection result.
Inputting the interface image subjected to semantic segmentation into a preset target detection model, performing target detection on the interface image, and obtaining the target detection result, wherein the step of inputting the interface image subjected to semantic segmentation into the preset target detection model comprises the following steps:
step S131, inputting the interface image after semantic segmentation into the preset target detection model to perform frame selection on candidate areas in the interface image to obtain target frames corresponding to the candidate areas;
in this embodiment, it should be noted that the candidate region refers to a region that may be a region element in the interface region.
The semantically segmented interface image is input into the preset target detection model, and candidate regions in the interface image are frame-selected to obtain the target frames corresponding to the candidate regions. Specifically, the candidate regions in the interface image are frame-selected, and the size of each target frame is determined by the boundary of its candidate region so that the candidate region lies inside the target frame, giving one or more target frames corresponding to the candidate regions.
Step S132, performing convolution and pooling alternative processing on each target frame for preset times to obtain a plurality of target frame feature maps corresponding to each target frame;
in this embodiment, the target frame refers to a picture region framed by the target frame.
Alternating convolution and pooling are performed on each target frame a preset number of times to obtain a plurality of target frame feature maps corresponding to each target frame. Specifically, each target frame is convolved to obtain a target frame convolution result, the convolution result is pooled to obtain a target frame pooling result, and the convolution and pooling steps are repeated; after the preset number of rounds, a plurality of target frame feature maps corresponding to each target frame are obtained.
Step S133, performing full connection on the plurality of target frame feature maps to obtain target feature vectors corresponding to the target frames, and extracting target information in the target feature vectors to obtain the target detection result.
In this embodiment, it should be noted that the target feature vector contains all of the target information of the target frame, including the region target and attribute information, for example whether the candidate region corresponding to the target frame is an interface element, the type of the interface element, and the position of the interface element. Based on this target information, the target frame can be detected and identified to obtain the target detection result.
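As an illustration (not the patent's prescribed architecture), the per-target-frame processing can be sketched as a small PyTorch head applied to each frame-selected candidate region; the candidate-region proposer is left as a placeholder function supplied by the caller.

```python
import torch
import torch.nn as nn

class TargetFrameHead(nn.Module):
    """Preset rounds of convolution/pooling plus full connection on one target frame."""
    def __init__(self, num_element_types: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classify = nn.Linear(64, num_element_types)  # element type (input box, text box, ...)
        self.box = nn.Linear(64, 4)                       # refined target-frame position

    def forward(self, target_frame: torch.Tensor):
        vector = self.pool(self.features(target_frame)).flatten(1)  # target feature vector
        return self.classify(vector), self.box(vector)

def detect_elements(segmented_image, propose_candidate_frames, head: TargetFrameHead):
    results = []
    for frame in propose_candidate_frames(segmented_image):  # frame-select candidate regions
        scores, box = head(frame.unsqueeze(0))               # add a batch dimension
        results.append({"scores": scores, "box": box})
    return results
```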
In this embodiment, the interface image is input into a preset image classification model to identify the interface image and obtain the image classification result; the interface image is input into a preset semantic segmentation model and semantically segmented to obtain the semantic segmentation result; and the semantically segmented interface image is then input into a preset target detection model and target detection is performed on it to obtain the target detection result. That is, this embodiment provides methods for obtaining the image classification result, the semantic segmentation result and the target detection result, in other words a method of analyzing an interface image to obtain an analysis result. This method requires neither installing RPA software on the system nor obtaining page information through a system interface, so interface analysis of a closed system can be realized, which lays a foundation for solving the technical problem of poor RPA applicability in the prior art.
Referring to fig. 3, fig. 3 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 3, the vision-based process automation device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the vision-based process automation device may further include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
Those skilled in the art will appreciate that the vision-based process automation device configuration shown in fig. 3 does not constitute a limitation of the vision-based process automation device, and that the device may include more or fewer components than those shown, a combination of certain components, or a different arrangement of components.
As shown in fig. 3, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, and a vision-based process automation program. The operating system is a program that manages and controls the hardware and software resources of the vision-based process automation device and supports the operation of the vision-based process automation program as well as other software and/or programs. The network communication module is used to enable communication between the components within the memory 1005 and with other hardware and software in the vision-based process automation system.
In the vision-based process automation device shown in fig. 3, the processor 1001 is configured to execute the vision-based process automation program stored in the memory 1005 to implement the steps of any one of the vision-based process automation methods described above.
The specific implementation of the vision-based process automation device of the present invention is substantially the same as that of the above-mentioned embodiments of the vision-based process automation method, and is not described herein again.
The present invention also provides a vision-based process automation apparatus, including:
the analysis module is used for receiving the interface image and analyzing the interface image to obtain an analysis result;
the combing module is used for combing the association relation of the interface elements in the interface image based on the preset knowledge graph template and the analysis result, so as to establish an interface analysis graph database;
and the execution module is used for executing a preset automatic process based on the interface analysis graph database.
Optionally, the combing module comprises:
the combing unit is used for combing the association relation of the interface elements in the interface image based on the analysis result to obtain an interface element association relation;
the input unit is used for inputting the interface elements into the preset knowledge graph template based on the interface element association relation to obtain an interface element knowledge graph;
and the matching unit is used for establishing a search condition module corresponding to the interface element knowledge graph so as to establish the interface analysis graph database based on the interface element knowledge graph and the search condition module.
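As a rough illustration of what the interface element knowledge graph and its search condition module might look like, the sketch below stores interface, region and element nodes in a networkx directed graph and exposes a simple attribute-based search. The node attributes and the choice of networkx are assumptions made here for illustration; the patent does not specify a storage format.

```python
import networkx as nx


def build_interface_graph(interface_label, regions, elements):
    """Build an interface -> region -> element knowledge graph.

    `regions` is assumed to map region ids to bounding boxes; `elements` is
    assumed to be a list of dicts with 'id', 'type', 'bbox' and 'region_id'.
    """
    g = nx.DiGraph()
    g.add_node(interface_label, kind="interface")
    for region_id, bbox in regions.items():
        g.add_node(region_id, kind="region", bbox=bbox)
        g.add_edge(interface_label, region_id, relation="contains")
    for el in elements:
        g.add_node(el["id"], kind="element", type=el["type"], bbox=el["bbox"])
        g.add_edge(el["region_id"], el["id"], relation="contains")
    return g


def find_elements(g, **conditions):
    """Search condition module: return element nodes whose attributes match."""
    return [
        node for node, attrs in g.nodes(data=True)
        if attrs.get("kind") == "element"
        and all(attrs.get(k) == v for k, v in conditions.items())
    ]
```

A call such as find_elements(g, type="button") would then return every button node recorded in the interface analysis graph database.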
Optionally, the combing unit comprises:
the first combing subunit is used for combing the association relation of the region elements based on the target detection result to obtain a region element association relation;
the second combing subunit is used for combing the association relation of the interface regions based on the region element association relation and the semantic segmentation result to obtain an interface region association relation;
and the third combing subunit is used for combing the association relation of the interface based on the interface region association relation and the image classification result to obtain the interface element association relation.
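The three-level combing could, for example, be reduced to a simple spatial containment test: each detected element is attached to the segmented region whose box contains its centre, and all regions are attached to the classified interface. The data shapes below (region boxes distilled from the segmentation result, (type, bbox) pairs from the detection result) are illustrative assumptions rather than the patent's required representation.

```python
def center_in(box, region_box):
    """Treat an element as belonging to a region if its centre lies inside it."""
    (x1, y1, x2, y2), (rx1, ry1, rx2, ry2) = box, region_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return rx1 <= cx <= rx2 and ry1 <= cy <= ry2


def comb_associations(image_class, region_boxes, detections):
    """Derive interface -> region -> element association relations."""
    associations = {"interface": image_class, "regions": {}}
    for region_id, rbox in region_boxes.items():
        members = [
            {"type": element_type, "bbox": element_box}
            for element_type, element_box in detections
            if center_in(element_box, rbox)
        ]
        associations["regions"][region_id] = {"bbox": rbox, "elements": members}
    return associations
```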
Optionally, the execution module includes:
the extraction unit is used for acquiring the operation command corresponding to the preset automation process and extracting the input information corresponding to the operation command from a preset input information database;
and the searching unit is used for searching the interface element information corresponding to the input information based on the interface analysis graph database so as to execute a preset automatic process.
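One way the execution module could act on a single step of the preset automation process is sketched below: the operation command's input information is fetched from the preset input information database, the matching element is looked up, and the action is replayed as simulated mouse and keyboard input. The command and record layouts and the use of pyautogui are illustrative assumptions only.

```python
import pyautogui  # assumed input-simulation library; the patent names no specific one


def execute_step(element_index, command, input_info_db):
    """Run one step of a preset automation process.

    `element_index` is assumed to map element types to records retrieved from
    the interface analysis graph database, `command` to carry 'name', 'target'
    and 'action' fields, and `input_info_db` to map command names to the input
    information extracted for them.
    """
    text = input_info_db.get(command["name"], "")
    candidates = element_index.get(command["target"], [])
    if not candidates:
        raise LookupError("no interface element of type " + command["target"])
    x1, y1, x2, y2 = candidates[0]["bbox"]
    pyautogui.click((x1 + x2) // 2, (y1 + y2) // 2)  # click the element's centre
    if command["action"] == "type" and text:
        pyautogui.typewrite(text)                    # fill in the input information
```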
Optionally, the parsing module includes:
the image recognition unit is used for inputting the interface image into a preset image classification model so as to recognize the interface image and obtain an image classification result;
the semantic segmentation unit is used for inputting the interface image into a preset semantic segmentation model, performing semantic segmentation on the interface image and obtaining a semantic segmentation result;
and the target detection unit is used for inputting the interface image after the semantic segmentation into a preset target detection model, and carrying out target detection on the interface image to obtain the target detection result.
Optionally, the image recognition unit includes:
a first convolution and pooling subunit, configured to input the interface image into the preset image classification model, so as to perform alternating convolution and pooling processing on the interface image a preset number of times and obtain a plurality of image classification feature maps corresponding to the interface image;
and the first full-connection unit is used for fully connecting the plurality of image classification feature maps to obtain image classification feature vectors, and extracting the interface information in the image classification feature vectors to obtain the image classification result.
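A toy PyTorch version of such an image classification model, with a preset number of alternating convolution and pooling stages followed by full connection, might look as follows. The layer sizes and the use of PyTorch are assumptions for illustration; the patent only fixes the overall structure.

```python
import torch
import torch.nn as nn


class InterfaceClassifier(nn.Module):
    """Alternating convolution/pooling stages followed by a fully connected layer."""

    def __init__(self, num_interface_classes, num_stages=3):
        super().__init__()
        layers, channels = [], 3
        for i in range(num_stages):                     # preset number of conv+pool stages
            out_channels = 32 * (i + 1)
            layers += [nn.Conv2d(channels, out_channels, 3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]
            channels = out_channels
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(4)             # fixed-size classification feature maps
        self.fc = nn.Linear(channels * 4 * 4, num_interface_classes)  # full connection

    def forward(self, x):
        feature_maps = self.features(x)                 # image classification feature maps
        vector = torch.flatten(self.pool(feature_maps), 1)  # classification feature vector
        return self.fc(vector)                          # interface information -> class scores
```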
Optionally, the semantic segmentation unit includes:
the coding unit is used for inputting the interface image into the preset semantic segmentation model so as to code the interface image and obtain a coding result;
and the decoding unit is used for decoding the coding result to obtain the semantic segmentation result.
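A minimal encoder-decoder sketch of the preset semantic segmentation model is shown below, again in PyTorch and with arbitrarily chosen layer sizes: the encoder downsamples the screenshot into an encoding result and the decoder upsamples it back into per-pixel region labels.

```python
import torch.nn as nn


class InterfaceSegmenter(nn.Module):
    """Toy encoder-decoder for the preset semantic segmentation model."""

    def __init__(self, num_region_classes):
        super().__init__()
        self.encoder = nn.Sequential(                   # encode: downsample the interface image
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(                   # decode: upsample back to pixel labels
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, num_region_classes, 2, stride=2),
        )

    def forward(self, x):
        encoding = self.encoder(x)                      # encoding result
        return self.decoder(encoding)                   # per-pixel scores for each region class
```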
Optionally, the target detection unit includes:
a framing unit, configured to input the interface image after semantic segmentation into the preset target detection model, so as to frame candidate regions in the interface image, and obtain target frames corresponding to the candidate regions;
the second convolution and pooling unit is used for performing alternating convolution and pooling processing on each target frame a preset number of times to obtain a plurality of target frame feature maps corresponding to each target frame;
and the second full-connection unit is used for fully connecting the plurality of target frame feature maps to obtain target feature vectors corresponding to the target frames, and extracting the target information in the target feature vectors to obtain the target detection result.
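Finally, a per-target-frame head in the spirit of the detection unit above could take each framed candidate region, apply alternating convolution and pooling, fully connect the resulting feature maps into a target feature vector, and read off whether the crop is an interface element, which type it is, and a refined position. The crop size, channel counts and PyTorch implementation are illustrative assumptions.

```python
import torch
import torch.nn as nn


class TargetFrameHead(nn.Module):
    """Per-candidate-region head: conv/pool stages, full connection, three outputs."""

    def __init__(self, num_element_types):
        super().__init__()
        self.features = nn.Sequential(                  # alternating convolution and pooling
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 8 * 8, 256)            # full connection -> target feature vector
        self.is_element = nn.Linear(256, 2)             # interface element or not
        self.element_type = nn.Linear(256, num_element_types)
        self.position = nn.Linear(256, 4)               # refined bounding box of the element

    def forward(self, crops):                           # crops: (N, 3, 32, 32) candidate regions
        feature_maps = self.features(crops)             # target frame feature maps
        vector = torch.relu(self.fc(torch.flatten(feature_maps, 1)))
        return self.is_element(vector), self.element_type(vector), self.position(vector)
```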
The specific implementation of the vision-based process automation apparatus of the present invention is substantially the same as that of the embodiments of the vision-based process automation method described above and is not described herein again.
The present invention also provides a readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any one of the vision-based process automation methods described above.
The specific implementation of the readable storage medium of the present invention is substantially the same as that of the embodiments of the vision-based process automation method described above and is not described herein again.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structural or process modification made using the contents of the present specification and drawings, or any direct or indirect application in other related technical fields, is likewise included within the scope of the present invention.

Claims (10)

1. A vision-based process automation method, characterized in that the vision-based process automation method comprises:
receiving an interface image, and analyzing the interface image to obtain an analysis result;
based on a preset knowledge graph template and the analysis result, combing the association relation of interface elements in the interface image to establish an interface analysis graph database;
and executing a preset automation process based on the interface analysis graph database.
2. The vision-based process automation method of claim 1, wherein the step of combing the association relation of the interface elements in the interface image based on the preset knowledge graph template and the analysis result to establish an interface analysis graph database comprises:
based on the analysis result, combing the association relation of the interface elements in the interface image to obtain an interface element association relation;
inputting the interface elements into the preset knowledge graph template based on the interface element association relation to obtain an interface element knowledge graph;
and establishing a search condition module corresponding to the interface element knowledge graph so as to establish the interface analysis graph database based on the interface element knowledge graph and the search condition module.
3. The vision-based process automation method of claim 2, wherein the interface elements include an interface, interface regions, and region elements, and the analysis result includes an image classification result, a semantic segmentation result, and a target detection result,
the step of combing the association relation of the interface elements in the interface image based on the analysis result to obtain the interface element association relation comprises:
based on the target detection result, combing the association relation of the region elements to obtain a region element association relation;
based on the region element association relation and the semantic segmentation result, combing the association relation of the interface regions to obtain an interface region association relation;
and combing the association relation of the interface based on the interface region association relation and the image classification result to obtain the interface element association relation.
4. The vision-based process automation method of claim 1, wherein the step of executing a preset automation process based on the interface resolution database comprises:
acquiring an operation command corresponding to the preset automation process, and extracting input information corresponding to the operation command from a preset input information database;
and searching interface element information corresponding to the input information based on the interface analysis graph database so as to execute a preset automatic process.
5. The vision-based process automation method of claim 1, wherein the analysis result includes an image classification result, a semantic segmentation result, and a target detection result,
the step of analyzing the interface image to obtain an analysis result comprises the following steps:
inputting the interface image into a preset image classification model to identify the interface image and obtain an image classification result;
inputting the interface image into a preset semantic segmentation model, and performing semantic segmentation on the interface image to obtain a semantic segmentation result;
and inputting the interface image subjected to semantic segmentation into a preset target detection model, and carrying out target detection on the interface image to obtain a target detection result.
6. The vision-based process automation method of claim 5, wherein the step of inputting the interface image into a preset image classification model to identify the interface image and obtain the image classification result comprises:
inputting the interface image into the preset image classification model to perform alternating convolution and pooling processing on the interface image a preset number of times, so as to obtain a plurality of image classification feature maps corresponding to the interface image;
and fully connecting the plurality of image classification feature maps to obtain image classification feature vectors, and extracting interface information in the image classification feature vectors to obtain the image classification result.
7. The vision-based process automation method of claim 5, wherein the step of inputting the interface image into a preset semantic segmentation model, performing semantic segmentation on the interface image, and obtaining the semantic segmentation result comprises:
inputting the interface image into the preset semantic segmentation model to encode the interface image to obtain an encoding result;
and decoding the coding result to obtain the semantic segmentation result.
8. The vision-based process automation method of claim 5, wherein the step of inputting the interface image after semantic segmentation into a preset target detection model, performing target detection on the interface image, and obtaining the target detection result comprises:
inputting the interface image subjected to semantic segmentation into the preset target detection model to frame candidate regions in the interface image and obtain target frames corresponding to the candidate regions;
performing alternating convolution and pooling processing on each target frame a preset number of times to obtain a plurality of target frame feature maps corresponding to each target frame;
and fully connecting the plurality of target frame feature maps to obtain target feature vectors corresponding to the target frames, and extracting target information in the target feature vectors to obtain the target detection result.
9. A vision-based process automation device, characterized in that the vision-based process automation device comprises: a memory, a processor, and a program stored on the memory for implementing the vision-based process automation method,
the memory is for storing a program for implementing a vision-based process automation method;
the processor is configured to execute a program implementing the vision-based process automation method to implement the steps of the vision-based process automation method according to any one of claims 1 to 8.
10. A readable storage medium having stored thereon a program for implementing a vision-based process automation method, the program being executed by a processor to implement the steps of the vision-based process automation method as claimed in any one of claims 1 to 8.
CN201911020138.7A 2019-10-24 2019-10-24 Vision-based process automation method, equipment and readable storage medium Active CN110780965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911020138.7A CN110780965B (en) 2019-10-24 2019-10-24 Vision-based process automation method, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911020138.7A CN110780965B (en) 2019-10-24 2019-10-24 Vision-based process automation method, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110780965A true CN110780965A (en) 2020-02-11
CN110780965B CN110780965B (en) 2023-10-20

Family

ID=69387606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911020138.7A Active CN110780965B (en) 2019-10-24 2019-10-24 Vision-based process automation method, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110780965B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080044084A1 (en) * 2006-08-16 2008-02-21 Shih-Jong J. Lee Integrated human-computer interface for image recognition
US9075918B1 (en) * 2014-02-25 2015-07-07 International Business Machines Corporation System and method for creating change-resilient scripts
CN105808416A (en) * 2014-12-27 2016-07-27 南车株洲电力机车研究所有限公司 An automatic test method and system for man-machine graphic interaction interfaces
CN108304243A (en) * 2018-02-06 2018-07-20 中国平安人寿保险股份有限公司 Interface creating method, device, computer equipment and storage medium
CN108846020A (en) * 2018-05-22 2018-11-20 北京易知创新数据科技有限公司 Knowledge mapping automated construction method, system are carried out based on multi-source heterogeneous data
CN109118347A (en) * 2018-07-20 2019-01-01 苏宁易购集团股份有限公司 A kind of automation collaboration method and system
CN110008034A (en) * 2018-11-22 2019-07-12 阿里巴巴集团控股有限公司 Task automated execution method, apparatus, electronic equipment and storage medium
CN110275834A (en) * 2019-06-25 2019-09-24 中国工商银行股份有限公司 User interface automatization test system and method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035101A (en) * 2020-06-30 2020-12-04 北京来也网络科技有限公司 Method, apparatus, medium, and device for creating command library by combining RPA and AI
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN112101357B (en) * 2020-11-03 2021-04-27 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN112488177A (en) * 2020-11-26 2021-03-12 金蝶软件(中国)有限公司 Image matching method and related equipment
CN115455227A (en) * 2022-09-20 2022-12-09 上海弘玑信息技术有限公司 Graphical interface element searching method, electronic device and storage medium
CN116384494A (en) * 2023-06-05 2023-07-04 安徽思高智能科技有限公司 RPA flow recommendation method and system based on multi-modal twin neural network
CN116384494B (en) * 2023-06-05 2023-08-08 安徽思高智能科技有限公司 RPA flow recommendation method and system based on multi-modal twin neural network

Also Published As

Publication number Publication date
CN110780965B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN110780965B (en) Vision-based process automation method, equipment and readable storage medium
US20210256320A1 (en) Machine learning artificialintelligence system for identifying vehicles
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
Rahman et al. Smartphone-based hierarchical crowdsourcing for weed identification
CN110688454A (en) Method, device, equipment and storage medium for processing consultation conversation
CN111461101B (en) Method, device, equipment and storage medium for identifying work clothes mark
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
KR102296274B1 (en) Method for providing object recognition with deep learning using fine tuning by user
CN114861836B (en) Model deployment method based on artificial intelligence platform and related equipment
CN113239227B (en) Image data structuring method, device, electronic equipment and computer readable medium
CN114169381A (en) Image annotation method and device, terminal equipment and storage medium
CN112560993A (en) Data screening method and device, electronic equipment and storage medium
CN110765293A (en) Method and system for automatically opening two-dimensional code, electronic device and storage medium
CN111881777A (en) Video processing method and device
CN110019813B (en) Life insurance case searching method, searching device, server and readable storage medium
CN116578925B (en) Behavior prediction method, device and storage medium based on feature images
CN111126112A (en) Candidate region determination method and device
Pohudina et al. Method for identifying and counting objects
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN110704153B (en) Interface logic analysis method, device and equipment and readable storage medium
CN115098679A (en) Method, device, equipment and medium for detecting abnormality of text classification labeling sample
CN110717521A (en) Intelligent service implementation method and device and computer readable storage medium
CN104484414A (en) Processing method and device of favourite information
CN114066669B (en) Cloud manufacturing-oriented manufacturing service discovery method
CN115565201B (en) Taboo picture identification method, apparatus and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant