Background
In the field of Robotic Process Automation (RPA), a software robot that automates a process frequently needs to access control elements on a software interface (referred to as interface elements for short) and operate on them to carry out the corresponding tasks.
Artificial Intelligence (AI) is a branch of technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Research in AI covers robotics, speech recognition, image recognition, natural language processing and expert systems, among other areas.
In the prior art, to ensure the accuracy of an automated process, a software robot running the process must accurately locate the target element and then perform the automated operation on it. In application scenarios such as remote desktops or virtual machines, interface elements are generally detected by computer vision techniques, and feature attributes of the interface elements are extracted and used as the basis for matching them while the process runs.
However, this matching approach is unstable: it easily produces mismatched or unmatched target elements, which lowers the accuracy of the automated process.
Disclosure of Invention
The invention provides a method and a device for matching software interface elements by combining RPA and AI. They improve the accuracy of matching interface elements on a software interface during robotic process automation, are simple to implement, and are stable and reliable in effect.
In a first aspect, the present disclosure provides a method for matching software interface elements by combining RPA and AI, including:
extracting the interface elements in the current software interface by using Optical Character Recognition (OCR) technology;
matching the feature information of a target element against the interface elements in the current software interface to obtain distribution information of the target element on the current software interface; and
executing an access operation on the target element according to the distribution information.
In one possible design, extracting the interface elements in the current software interface by using OCR technology includes:
capturing an interface image (screenshot) of the current software interface; and
extracting all interface elements from the interface image by OCR technology or a pre-trained deep learning model.
In one possible design, matching the feature information of the target element against the interface elements in the current software interface to obtain the distribution information of the target element on the current software interface includes:
searching the current software interface for a second anchor element that matches a first anchor element, according to the category information, position information and text information of the first anchor element; and
determining the distribution information of the target element on the current software interface according to the positional relationship between the target element and the first anchor element and the position of the second anchor element in the current software interface, where the distribution information includes coordinate information of at least one shape point of the target element and size information of the target element, the shape points defining the area occupied by the target element.
In one possible design, before matching the feature information of the target element against the interface elements in the current software interface, the method further includes:
capturing an interface image of a template software interface;
extracting all interface elements from the interface image of the template software interface as candidate elements by OCR technology or a pre-trained deep learning model;
selecting, from the candidate elements, a target element and a first anchor element associated with the target element, where the first anchor element includes any one or more of an icon element, a text element and a button element whose appearance does not change; and
generating the feature information of the target element according to the target element and the first anchor element, where the feature information includes the positional relationship between the target element and the first anchor element, and the category information, position information and text information of the first anchor element.
In one possible design, before executing the access operation on the target element according to the distribution information, the method further includes:
detecting the degree of overlap between the area indicated by the distribution information and the interface elements in the current software interface to obtain an overlap value; and
if the overlap value is greater than a preset threshold, executing the access operation on the target element.
In one possible design, the method further includes:
if the overlap value is not greater than the preset threshold, determining that the target element is invalid and feeding back a matching-failure prompt.
In a second aspect, the present disclosure also provides a device for matching software interface elements by combining RPA and AI, including:
an extraction module, configured to extract the interface elements in the current software interface by using OCR technology;
a matching module, configured to match the feature information of a target element against the interface elements in the current software interface to obtain distribution information of the target element on the current software interface; and
an execution module, configured to execute an access operation on the target element according to the distribution information.
In one possible design, the extraction module is specifically configured to:
capture an interface image of the current software interface; and
extract all interface elements from the interface image by OCR technology or a pre-trained deep learning model.
In one possible design, the matching module is specifically configured to:
search the current software interface for a second anchor element that matches a first anchor element, according to the category information, position information and text information of the first anchor element; and
determine the distribution information of the target element on the current software interface according to the positional relationship between the target element and the first anchor element and the position of the second anchor element in the current software interface, where the distribution information includes coordinate information of at least one shape point of the target element and size information of the target element, the shape points defining the area occupied by the target element.
In one possible design, the device further includes an acquisition module, configured to, before the feature information of the target element is matched against the interface elements in the current software interface:
capture an interface image of a template software interface;
extract all interface elements from the interface image of the template software interface as candidate elements by OCR technology or a pre-trained deep learning model;
select, from the candidate elements, a target element and a first anchor element associated with the target element, where the first anchor element includes any one or more of an icon element, a text element and a button element whose appearance does not change; and
generate the feature information of the target element according to the target element and the first anchor element, where the feature information includes the positional relationship between the target element and the first anchor element, and the category information, position information and text information of the first anchor element.
In one possible design, the device further includes an overlap judgment module, configured to:
detect the degree of overlap between the area indicated by the distribution information and the interface elements in the current software interface to obtain an overlap value; and
if the overlap value is greater than a preset threshold, execute the access operation on the target element.
In one possible design, the device further includes:
a feedback module, configured to determine that the target element is invalid and feed back a matching-failure prompt when the overlap value is not greater than the preset threshold.
In a third aspect, the present disclosure also provides an electronic device, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform, by executing the executable instructions, any one of the methods of the first aspect for matching software interface elements by combining RPA and AI.
In a fourth aspect, the embodiments of the present disclosure also provide a storage medium storing a computer program which, when executed by a processor, implements any one of the methods of the first aspect for matching software interface elements by combining RPA and AI.
In the method and device provided by the invention, the interface elements in the current software interface are extracted by OCR technology; the feature information of the target element is matched against these interface elements to obtain the distribution information of the target element on the current software interface; and the access operation is executed on the target element according to the distribution information. This improves the accuracy of matching interface elements on a software interface during robotic process automation, with a simple implementation and a stable, reliable effect.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the embodiments disclosed herein without creative effort shall fall within the protection scope of the present disclosure.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present disclosure and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the field of Robotic Process Automation (RPA), a software robot that automates a process frequently needs to access control elements on a software interface (interface elements for short) and operate on them to carry out the corresponding tasks. In the prior art, to ensure the accuracy of an automated process, a software robot running the process must accurately locate the target element and then perform the automated operation on it. In application scenarios such as remote desktops or virtual machines, interface elements are generally detected by computer vision techniques, and feature attributes of the interface elements are extracted and used as the basis for matching them while the process runs. However, this matching approach is unstable: it easily produces mismatched or unmatched target elements, which lowers the accuracy of the automated process.
In view of the above technical problems, the present disclosure provides a method and a device for matching software interface elements by combining RPA and AI, which improve the accuracy of matching interface elements on a software interface during robotic process automation, are simple to implement, and are stable and reliable in effect.
Fig. 1 is an application scenario diagram of the method for matching software interface elements by combining RPA and AI according to an example embodiment of the present disclosure. As shown in Fig. 1, the interface elements in a software interface mainly include text, icons and controls. A control element usually carries a text label (Label) that identifies it: a button typically contains a short text indicating its function (e.g., "OK" or "Cancel"), and an input box typically has a short text to its left or above it indicating its purpose (e.g., "username" or "password"). These identifying labels can therefore be used as auxiliary information when searching for an interface element, and this disclosure refers to them as "anchor points". More generally, an anchor point is a reference point, similar to a landmark, that is stable in appearance (although its position may vary), easy to recognize, and globally unique. An anchor point may be an icon or a piece of text.
Accordingly, text elements are detected by OCR technology, which yields the position and character content of each piece of text in the interface; icons and control elements are detected by a deep learning object detection algorithm (such as SSD or Faster R-CNN), which yields their positions and categories.
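The output of this extraction step can be pictured as a single list of elements, each with a category, a bounding box and, for text, its content. The sketch below is illustrative only: it assumes the OCR engine yields `(text, box)` pairs, the object detector yields `(category, box)` pairs, and boxes use a `(left, top, right, bottom)` pixel convention; `UIElement` and `merge_detections` are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention: (left, top, right, bottom) in screen pixels.
Box = Tuple[int, int, int, int]


@dataclass
class UIElement:
    """One interface element extracted from a screenshot (hypothetical structure)."""
    category: str               # e.g. "text", "icon", "button", "input"
    box: Box
    text: Optional[str] = None  # only set for elements recognized by OCR


def merge_detections(
    ocr_results: List[Tuple[str, Box]],
    detector_results: List[Tuple[str, Box]],
) -> List[UIElement]:
    """Combine OCR text regions and object-detector outputs into one element list."""
    elements = [UIElement("text", box, text) for text, box in ocr_results]
    elements += [UIElement(category, box) for category, box in detector_results]
    # Sort top-to-bottom, then left-to-right, for a stable reading order.
    elements.sort(key=lambda e: (e.box[1], e.box[0]))
    return elements


elements = merge_detections(
    ocr_results=[("Username", (20, 40, 100, 60))],
    detector_results=[("input", (110, 36, 300, 64))],
)
```

Keeping text and detected controls in one list lets the later matching and overlap checks treat every extracted element uniformly.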
Then, the software robot searches the current software interface for a second anchor element that matches the first anchor element, according to the category information, position information and text information of the first anchor element, and determines the distribution information of the target element on the current software interface according to the positional relationship between the target element and the first anchor element and the position of the second anchor element in the current software interface. Here, the first anchor element is an anchor of the template software interface and the second anchor element is an anchor of the current software interface; an anchor element may be any one or more of an icon element, a text element and a button element whose appearance does not change. If the anchor element is an icon, the search is performed by template matching; if it is text, the search is performed by string matching. In this way, the second anchor element matching the first anchor element is found in the current software interface. The distribution information of the target element on the current software interface is then determined by combining the positional relationship between the target element and the first anchor element in the template software interface with the position of the second anchor element in the current software interface, which yields the area of the target element as a candidate region. The distribution information of an interface element may be described by the coordinate information of at least one shape point, which may be a vertex or the center point of the element, together with the size information of the element.
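The string-matching search for a text anchor can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: elements are assumed to be `(category, text, box)` tuples, text similarity is measured with the standard library's `difflib.SequenceMatcher.ratio()`, and the `0.8` similarity cut-off and the "closest to the template position wins ties" rule are illustrative choices. `find_text_anchor` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed convention
Element = Tuple[str, str, Box]   # (category, text, box), assumed representation


def _center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def find_text_anchor(
    anchor: Element,
    candidates: List[Element],
    min_similarity: float = 0.8,
) -> Optional[Element]:
    """Find the element in the current interface that best matches a text anchor.

    The category must agree, the text similarity must reach `min_similarity`,
    and among equally similar candidates the one closest to the anchor's
    template position wins.
    """
    category, text, box = anchor
    ax, ay = _center(box)
    best: Optional[Element] = None
    best_key: Optional[Tuple[float, float]] = None
    for cand in candidates:
        c_category, c_text, c_box = cand
        if c_category != category:
            continue  # category information must match
        score = SequenceMatcher(None, text.lower(), c_text.lower()).ratio()
        if score < min_similarity:
            continue  # text information does not match closely enough
        cx, cy = _center(c_box)
        # Higher similarity first, then smaller distance to the template position.
        key = (score, -(((cx - ax) ** 2 + (cy - ay) ** 2) ** 0.5))
        if best_key is None or key > best_key:
            best, best_key = cand, key
    return best
```

For the "Username" label of a login form, the exact text wins over a near-miss such as "User name", and a button that happens to carry the same text is skipped because its category differs.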
A rectangular interface element may be described by its four vertices, and a circular interface element by its center point: for a circular button, knowing the center position and the radius of the circle is enough to determine its area. Using the coordinate transformation between the coordinates of the anchor region and the coordinates of the interface element, the coordinates of the element's shape points can be determined quickly, and from them the element's position, size and other information.
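The coordinate transformation from the anchor to the target can be illustrated in its simplest form. This sketch assumes the transformation is a pure translation (the target keeps its template offset from the anchor and its template size, with no scaling between the two interfaces); the disclosure does not fix the exact form of the transformation, and `locate_target`, `rect_vertices` and `circle_to_box` are hypothetical helpers.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed convention
Point = Tuple[int, int]


def locate_target(template_target: Box, template_anchor: Box, current_anchor: Box) -> Box:
    """Project the target's template box onto the current interface.

    Assumes a pure translation: the anchor's displacement between the template
    and current interfaces is applied unchanged to the target.
    """
    dx = current_anchor[0] - template_anchor[0]
    dy = current_anchor[1] - template_anchor[1]
    left, top, right, bottom = template_target
    return (left + dx, top + dy, right + dx, bottom + dy)


def rect_vertices(box: Box) -> List[Point]:
    """Four vertices (shape points) of a rectangular element, clockwise from top-left."""
    left, top, right, bottom = box
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


def circle_to_box(center: Point, radius: int) -> Box:
    """Bounding box of a circular element given its center point and radius."""
    cx, cy = center
    return (cx - radius, cy - radius, cx + radius, cy + radius)


# Template: anchor "Username" at (20, 40, 100, 60), input box at (110, 36, 300, 64).
# Current interface: the same anchor was found at (60, 140, 140, 160).
target = locate_target((110, 36, 300, 64), (20, 40, 100, 60), (60, 140, 140, 160))
```

Here the anchor moved by (40, 100), so the candidate region for the input box becomes (150, 136, 340, 164). If the two interfaces can differ in scale, a scale factor derived from the anchor sizes would also have to be applied.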
Finally, once the distribution information of the target element has been obtained, the target element can be accessed, for example by picking it up or performing simulated operations on it. In one possible implementation, before the access operation is executed according to the distribution information, the method further includes: detecting the degree of overlap between the area indicated by the distribution information and the interface elements in the current software interface to obtain an overlap value; if the overlap value is greater than a preset threshold, determining that the target element is valid; and if the overlap value is not greater than the preset threshold, determining that the target element is invalid and feeding back a matching-failure prompt.
This approach improves the accuracy of matching interface elements on a software interface during robotic process automation, with a simple implementation and a stable, reliable effect.
Fig. 2 is a flowchart of a method for matching software interface elements by combining RPA and AI according to an example embodiment of the present disclosure. As shown in Fig. 2, the method provided in this embodiment may include the following steps.
Step 101: Extract the interface elements in the current software interface by using OCR technology.
In this embodiment, the software robot may capture an interface image of the current software interface and then extract all interface elements from it by OCR technology or a pre-trained deep learning model.
Specifically, the interface elements in a software interface mainly include text, icons and controls, and a control element usually carries a text label (Label) that identifies it, such as the text "OK" or "Cancel" inside a button, or the text "username" or "password" to the left of or above an input box. These labels can assist the search for an interface element and are referred to in this disclosure as anchor points: reference points, similar to landmarks, that are stable in appearance (although their position may vary), easy to recognize, and globally unique. An anchor point may be an icon or a piece of text. Text elements are detected by OCR technology, which yields the position and character content of each piece of text in the interface; icons and control elements are detected by a deep learning object detection algorithm (such as SSD or Faster R-CNN), which yields their positions and categories.
Step 102: Match the feature information of the target element against the interface elements in the current software interface to obtain the distribution information of the target element on the current software interface.
In this embodiment, the software robot may search the current software interface for a second anchor element that matches the first anchor element, according to the category information, position information and text information of the first anchor element, and determine the distribution information of the target element on the current software interface according to the positional relationship between the target element and the first anchor element and the position of the second anchor element in the current software interface. The distribution information includes coordinate information of at least one shape point of the target element and size information of the target element, the shape points defining the area occupied by the target element.
Specifically, the first anchor element is an anchor of the template software interface, the second anchor element is an anchor of the current software interface, and an anchor element may be any one or more of an icon element, a text element and a button element whose appearance does not change. If the anchor element is an icon, the search is performed by template matching; if it is text, the search is performed by string matching. In this way, the second anchor element matching the first anchor element is found in the current software interface. The distribution information of the target element on the current software interface is then determined by combining the positional relationship between the target element and the first anchor element in the template software interface with the position of the second anchor element in the current software interface, which yields the area of the target element as a candidate region. The distribution information of an interface element may be described by the coordinate information of at least one shape point (a vertex or the center point of the element) together with its size information: a rectangular element may be described by its four vertices, and a circular element by its center point, since the center position and the radius of a circular button determine its area.
Using the coordinate transformation between the coordinates of the anchor region and the coordinates of the interface element, the coordinates of the element's shape points can be determined quickly, and from them the element's position, size and other information.
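For icon anchors, the template-matching search mentioned in step 102 can be illustrated with an exhaustive sum-of-squared-differences (SSD) scan. This is a teaching sketch only, assuming grayscale images held as lists of pixel rows; a production implementation would normally call a library routine such as OpenCV's `cv2.matchTemplate`, typically with a normalized scoring method and an acceptance threshold. `match_template_ssd` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixel rows; an assumed in-memory representation


def match_template_ssd(screen: Image, template: Image) -> Optional[Tuple[int, int, float]]:
    """Exhaustive template matching by sum of squared differences (SSD).

    Returns (x, y, ssd) for the top-left corner of the best position;
    lower SSD means a closer match and 0 means an exact match.
    Returns None when the template is larger than the screen.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        return None
    best: Optional[Tuple[int, int, float]] = None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = 0.0
            for dy in range(th):
                for dx in range(tw):
                    diff = screen[y + dy][x + dx] - template[dy][dx]
                    ssd += diff * diff
            if best is None or ssd < best[2]:
                best = (x, y, ssd)
    return best


# A 4x5 "screen" containing a 2x2 icon of value 9 with its top-left at x=2, y=1.
screen = [[0] * 5 for _ in range(4)]
for yy in (1, 2):
    for xx in (2, 3):
        screen[yy][xx] = 9
icon = [[9, 9], [9, 9]]
```

`match_template_ssd(screen, icon)` locates the icon at (2, 1) with an SSD of 0; in practice the best SSD would also be compared against a maximum before the icon is accepted as the second anchor element.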
Step 103: Execute the access operation on the target element according to the distribution information.
In this embodiment, once the distribution information of the target element has been obtained, the target element can be accessed, for example by picking it up or performing simulated operations on it.
In one possible implementation, before the access operation is executed according to the distribution information, the method further includes: detecting the degree of overlap between the area indicated by the distribution information and the interface elements in the current software interface to obtain an overlap value; and if the overlap value is greater than a preset threshold, executing the access operation on the target element.
Specifically, the overlap between the obtained candidate region and the interface elements extracted in step 101 is measured by Intersection over Union (IoU). If the IoU result is greater than the set threshold, the candidate region is considered valid.
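The IoU check can be written in a few lines. In this sketch the box convention `(left, top, right, bottom)` and the default threshold of 0.5 are assumptions (the disclosure only speaks of a preset threshold), and `validate_candidate` is a hypothetical helper that returns `None` to signal a matching failure for the caller to report.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes (0.0 when disjoint)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def validate_candidate(candidate: Box, elements: List[Box], threshold: float = 0.5) -> Optional[Box]:
    """Return the extracted element that best overlaps the candidate region.

    Returns None (a matching failure the caller should report) when no
    element's IoU with the candidate exceeds `threshold`.
    """
    best = max(elements, key=lambda e: iou(candidate, e), default=None)
    if best is None or iou(candidate, best) <= threshold:
        return None
    return best
```

Returning the extracted element rather than the raw candidate region means the access operation acts on a box the detector actually found, which absorbs small offsets in the projected position.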
In another possible implementation, if the overlap value is not greater than the preset threshold, the candidate region is determined to be invalid and a matching-failure prompt is fed back.
In one possible implementation, when there are multiple anchor elements and the candidate regions determined from them differ, the IoU between each candidate region and the interface elements extracted in step 101 is computed, the candidate regions whose IoU exceeds the set threshold are identified, and the access operation is executed on the interface elements matched with those candidate regions.
In another possible implementation, when there are multiple anchor elements and the candidate regions determined from them differ, the IoU between each candidate region and the interface elements extracted in step 101 is computed, the interface element with the highest comprehensive matching degree across all candidate regions is determined, and the access operation is executed on that element. The comprehensive matching degree of an interface element may be the sum of its matching degrees with the individual candidate regions, or may be determined in another preset manner, which is not limited in this application.
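The two multi-anchor strategies can be contrasted in a short sketch. It assumes `(left, top, right, bottom)` boxes and an illustrative threshold of 0.5, and uses the summed IoU as the comprehensive matching degree, which is only one of the options the text allows. `filter_by_threshold` and `best_by_total_overlap` are hypothetical names.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes (0.0 when disjoint)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_by_threshold(
    candidates: List[Box], elements: List[Box], threshold: float = 0.5
) -> List[Tuple[Box, Box]]:
    """Variant 1: pair each candidate region with its best-overlapping element,
    keeping only the pairs whose IoU exceeds `threshold`."""
    pairs = []
    for cand in candidates:
        best = max(elements, key=lambda e: iou(cand, e), default=None)
        if best is not None and iou(cand, best) > threshold:
            pairs.append((cand, best))
    return pairs


def best_by_total_overlap(candidates: List[Box], elements: List[Box]) -> Optional[Box]:
    """Variant 2: pick the element with the highest comprehensive matching degree,
    taken here as the sum of its IoU with every candidate region."""
    return max(elements, key=lambda e: sum(iou(c, e) for c in candidates), default=None)
```

The summed score lets anchors that agree reinforce one another, so a single anchor that was matched to the wrong place (producing a stray candidate region) does not decide the outcome on its own.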
Specifically, when matching fails, the software robot may also notify the user with a prompt message. Matching fails when the overlap value between the area indicated by the distribution information and the interface elements in the current software interface is not greater than the preset threshold.
In this embodiment, the interface elements in the current software interface are extracted; the feature information of the target element is matched against them to obtain the distribution information of the target element on the current software interface; and the access operation is executed on the target element according to the distribution information. This improves the accuracy of matching interface elements on a software interface during robotic process automation, with a simple implementation and a stable, reliable effect.
Fig. 3 is a flowchart of a method for matching software interface elements by combining RPA and AI according to another example embodiment of the present disclosure. As shown in Fig. 3, the method provided in this embodiment may include the following steps.
Step 201: Acquire the first anchor element of the template software interface and the feature information of the target element.
In this embodiment, an interface image of the template software interface may be captured; all interface elements are extracted from it as candidate elements by OCR technology or a pre-trained deep learning model; a target element and a first anchor element associated with the target element are selected from the candidate elements, where the first anchor element includes any one or more of an icon element, a text element and a button element whose appearance does not change; and the feature information of the target element is generated from the target element and the first anchor element. The feature information includes the positional relationship between the target element and the first anchor element, and the category information, position information and text information of the first anchor element.
Specifically, after an interface image of the template software interface is captured, text elements are detected by OCR technology, which yields the position and character content of each piece of text in the interface, and icons and control elements are detected by a deep learning object detection algorithm (such as SSD or Faster R-CNN), which yields their positions and categories. All extracted interface elements are taken as candidate elements, from which the target element to be operated on and the anchor elements that assist in locating it are designated. Taking a mailbox login interface as an example, the input box control is the target element to be operated on, and text such as "username" or "password" can be selected as the anchor element. Feature information is generated from the target element, the anchor elements and related information and stored in the source code of the RPA process; it mainly includes the category and position of the target element and the category, position and text content of each anchor element. At run time, the anchor elements are matched first, and the position of the target element on the current software interface is then determined from the matched anchor elements. The specific matching implementation is not repeated here.
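The feature information recorded at design time can be pictured as a small serializable record. The schema below is hypothetical: the field names, the use of a top-left offset to express the positional relationship, and JSON as the storage format inside the RPA process are all assumptions made for illustration, not details given in the disclosure.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed convention


@dataclass
class FeatureInfo:
    """Hypothetical schema for the feature information of one target element."""
    target_category: str
    target_box: Box              # target position in the template interface
    anchor_category: str
    anchor_box: Box              # first-anchor position in the template interface
    anchor_text: Optional[str]   # None for icon anchors
    offset: Tuple[int, int]      # target top-left minus anchor top-left


def build_feature_info(target_category: str, target_box: Box, anchor_category: str,
                       anchor_box: Box, anchor_text: Optional[str] = None) -> FeatureInfo:
    """Record the target and its anchor, plus their positional relationship."""
    offset = (target_box[0] - anchor_box[0], target_box[1] - anchor_box[1])
    return FeatureInfo(target_category, target_box, anchor_category, anchor_box, anchor_text, offset)


def to_json(info: FeatureInfo) -> str:
    """Serialize the record for embedding in the RPA process definition."""
    return json.dumps(asdict(info))


def from_json(raw: str) -> FeatureInfo:
    """Rebuild the record; JSON has no tuples, so coordinate lists become tuples again."""
    data = json.loads(raw)
    return FeatureInfo(**{k: tuple(v) if isinstance(v, list) else v for k, v in data.items()})


# Mailbox login example: the input box is the target, the "Username" label is the anchor.
info = build_feature_info("input", (110, 36, 300, 64), "text", (20, 40, 100, 60), "Username")
```

Storing the anchor's category, position and text alongside the offset gives the run-time search everything it needs without keeping the template screenshot itself.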
Step 202: Extract the interface elements in the current software interface by using OCR technology.
Step 203: Match the feature information of the target element against the interface elements in the current software interface to obtain the distribution information of the target element on the current software interface.
Step 204: Execute the access operation on the target element according to the distribution information.
In this embodiment, for the specific implementation process and technical principle of steps 202 to 204, refer to the related description of steps 101 to 103 in the method shown in Fig. 2; details are not repeated here.
In this embodiment, the interface elements in the current software interface are extracted; the feature information of the target element is matched against them to obtain the distribution information of the target element on the current software interface; and the access operation is executed on the target element according to the distribution information. This improves the accuracy of matching interface elements on a software interface during robotic process automation, with a simple implementation and a stable, reliable effect.
In addition, this embodiment captures an interface image of the template software interface, extracts all interface elements from it as candidate elements by OCR technology or a pre-trained deep learning model, selects from the candidate elements the target element and an associated first anchor element (any one or more of an icon element, a text element and a button element whose appearance does not change), and generates the feature information of the target element, which includes the positional relationship between the target element and the first anchor element and the category information, position information and text information of the first anchor element.
Fig. 4 is a schematic structural diagram of a matching apparatus for software interface elements combining RPA and AI according to an exemplary embodiment of the present disclosure. As shown in fig. 4, the matching apparatus of this embodiment may include:
an extracting module 31, configured to extract the interface elements in the current software interface by Optical Character Recognition (OCR);
a matching module 32, configured to match the feature information of the target element against the interface elements in the current software interface to obtain the distribution information of the target element on the current software interface; and
an execution module 33, configured to perform an access operation on the target element according to the distribution information.
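The division into modules 31 to 33 can be pictured as three pluggable callables run in sequence. The sketch below is one hypothetical way to wire them, assuming the matching module already holds the target's feature information (for example, in a closure) and returns `None` when no match is found; the class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, TypeVar

E = TypeVar("E")  # interface element type
D = TypeVar("D")  # distribution information type


@dataclass
class MatchingApparatus(Generic[E, D]):
    extract: Callable[[], List[E]]           # extracting module 31
    match: Callable[[List[E]], Optional[D]]  # matching module 32 (holds the feature information)
    execute: Callable[[D], None]             # execution module 33

    def run(self) -> bool:
        """Extract, match, then access; return False when the target cannot be located."""
        elements = self.extract()
        distribution = self.match(elements)
        if distribution is None:
            return False
        self.execute(distribution)
        return True
```

Keeping the modules as interchangeable callables mirrors the "possible designs" below, where the extracting module may use OCR or a deep learning model without the other modules changing.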
In one possible design, the extracting module 31 is specifically configured to:
capture an interface image of the current software interface; and
extract all interface elements from the interface image by Optical Character Recognition (OCR) or a pre-trained deep learning model.
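For the OCR branch, the recognised words must be turned into interface elements with a position and text. The sketch below assumes word-level OCR output laid out as parallel lists keyed by `text`, `left`, `top`, `width`, `height` and `conf` (the layout returned by pytesseract's `image_to_data` with a dictionary output type); the OCR call itself, the deep learning branch for icons and buttons, and the confidence cut-off of 60 are outside or assumptions of this sketch. `elements_from_ocr` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass(frozen=True)
class Element:
    category: str
    box: Box
    text: Optional[str] = None


def elements_from_ocr(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[Element]:
    """Convert word-level OCR results into text elements (hypothetical helper)."""
    elements = []
    for i, word in enumerate(data["text"]):
        word = str(word).strip()
        # Skip empty boxes and low-confidence words; conf may arrive as str or number.
        if not word or float(data["conf"][i]) < min_conf:
            continue
        box = (int(data["left"][i]), int(data["top"][i]),
               int(data["width"][i]), int(data["height"][i]))
        elements.append(Element("text", box, word))
    return elements
```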
In one possible design, the matching module 32 is specifically configured to:
search the current software interface for a second anchor element that matches the first anchor element, according to the category information, position information and text information of the first anchor element; and
determine the distribution information of the target element on the current software interface according to the positional relationship between the target element and the first anchor element and the position of the second anchor element in the current software interface. The distribution information includes the coordinates of at least one shape point of the target element and the size of the target element, where the shape points define the region occupied by the target element.
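The two steps of the matching module can be sketched as follows, under the same assumed conventions as the template stage: boxes are `(x, y, width, height)`, the positional relationship is a top-left offset, and the target's template size is reused. As a simplification, a candidate second anchor must have exactly the same category and text as the first anchor, and among several candidates the one closest to the first anchor's template position is chosen; real OCR text may need fuzzier comparison. The shape points are taken to be the four corners of the target region. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass(frozen=True)
class Element:
    category: str
    box: Box
    text: Optional[str] = None


@dataclass(frozen=True)
class FeatureInfo:
    dx: int
    dy: int
    target_size: Tuple[int, int]
    anchor_category: str
    anchor_box: Box
    anchor_text: Optional[str]


@dataclass(frozen=True)
class Distribution:
    corners: Tuple[Tuple[int, int], ...]  # shape points: four corners of the target region
    size: Tuple[int, int]                 # (width, height) of the target element


def find_second_anchor(info: FeatureInfo, elements: Sequence[Element]) -> Optional[Element]:
    """Match on category and text; break ties by distance to the template position."""
    candidates = [e for e in elements
                  if e.category == info.anchor_category and e.text == info.anchor_text]
    if not candidates:
        return None
    ax, ay = info.anchor_box[0], info.anchor_box[1]
    return min(candidates, key=lambda e: math.hypot(e.box[0] - ax, e.box[1] - ay))


def locate_target(info: FeatureInfo, elements: Sequence[Element]) -> Optional[Distribution]:
    """Apply the stored offset to the second anchor to obtain the distribution information."""
    anchor = find_second_anchor(info, elements)
    if anchor is None:
        return None
    x, y = anchor.box[0] + info.dx, anchor.box[1] + info.dy
    w, h = info.target_size
    return Distribution(((x, y), (x + w, y), (x + w, y + h), (x, y + h)), (w, h))
```

Because the target is located relative to a stable anchor rather than by its own appearance, a shift of the whole dialog (for example, a window moved on a remote desktop) is absorbed by the offset.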
The apparatus of this embodiment can be used to carry out the technical solution of the method embodiment shown in fig. 2; its implementation principle and technical effects are similar and are not repeated here.
In this embodiment, the interface elements in the current software interface are extracted; the feature information of the target element is matched against these interface elements to obtain the distribution information of the target element on the current software interface; and the access operation is performed on the target element according to the distribution information. This improves the accuracy with which interface elements on a software interface are matched during robotic process automation, with a simple implementation and a stable, reliable effect.
On the basis of the embodiment shown in fig. 4, fig. 5 is a schematic structural diagram of a matching apparatus for software interface elements combining RPA and AI according to another exemplary embodiment of the present disclosure. As shown in fig. 5, the matching apparatus of this embodiment further includes:
an obtaining module 34, configured to perform the following before the feature information of the target element is matched against the interface elements in the current software interface:
capture an interface image of the template software interface;
extract all interface elements from the interface image of the template software interface as candidate elements by Optical Character Recognition (OCR) or a pre-trained deep learning model;
select, from the candidate elements, a target element and a first anchor element associated with the target element, where the first anchor element comprises any one or more of an icon element, a text element and a key (button) element whose form does not change; and
generate the feature information of the target element from the target element and the first anchor element, the feature information including the positional relationship between the target element and the first anchor element, together with the category information, position information and text information of the first anchor element.
In one possible design, the apparatus further includes an overlap determination module 35, configured to:
detect the degree of overlap between the region corresponding to the distribution information and the interface elements in the current software interface to obtain an overlap threshold (the measured degree of overlap); and
perform the access on the target element if the overlap threshold is greater than a preset value.
In one possible design, the apparatus further includes:
a feedback module 36, configured to determine that the target element is invalid and to return a matching-failure prompt when the overlap threshold is not greater than the preset value.
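Modules 35 and 36 together form a validate-or-reject step. The sketch below assumes the degree of overlap is measured as intersection over union (IoU) between the located region and each interface element detected on the current interface, keeps the best score, and uses 0.5 as the preset value; the disclosure specifies neither the overlap measure nor the preset value, so both are assumptions, as are the function names.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def check_overlap(region: Box, elements: Iterable[Box],
                  preset: float = 0.5) -> Tuple[bool, float, str]:
    """Hypothetical modules 35/36: accept the match or report a matching failure."""
    best = max((iou(region, e) for e in elements), default=0.0)
    if best > preset:
        return True, best, "ok"
    return False, best, "matching failed: target element is invalid"
```

For a located region `(230, 88, 150, 24)` and a detected element `(232, 90, 150, 24)`, the intersection is 148 × 22 = 3256 and the union is 3600 + 3600 − 3256 = 3944, giving an overlap of about 0.83, so the access would proceed.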
The apparatus of this embodiment can be used to carry out the technical solutions of the method embodiments shown in fig. 2 and fig. 3; its implementation principles and technical effects are similar and are not repeated here.
In this embodiment, the interface elements in the current software interface are extracted; the feature information of the target element is matched against these interface elements to obtain the distribution information of the target element on the current software interface; and the access operation is performed on the target element according to the distribution information. This improves the accuracy with which interface elements on a software interface are matched during robotic process automation, with a simple implementation and a stable, reliable effect.
Fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the electronic device 40 of this embodiment includes:
a processor 401; and
a memory 402 for storing instructions executable by the processor, which may be a flash memory;
wherein the processor 401 is configured to perform the steps of the above method by executing the executable instructions; for details, refer to the description of the foregoing method embodiments.
Optionally, the memory 402 may be either separate from or integrated with the processor 401.
When the memory 402 is a device independent of the processor 401, the electronic device 40 may further include:
a bus 403 for connecting the processor 401 and the memory 402.
The present embodiment further provides a readable storage medium storing a computer program; when at least one processor of the electronic device executes the computer program, the electronic device performs the methods provided by the above embodiments.
The present embodiment further provides a program product comprising a computer program stored in a readable storage medium. At least one processor of the electronic device can read the computer program from the readable storage medium, and executing the computer program causes the electronic device to implement the methods provided by the above embodiments.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.