CN115269107B - Method, medium and electronic device for processing interface image - Google Patents

Publication number
CN115269107B
Authority
CN
China
Prior art keywords
node
nodes
pair
graph
adjacent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211205653.4A
Other languages
Chinese (zh)
Other versions
CN115269107A (en)
Inventor
杭天欣
康佳慧
高煜光
张泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongji Information Technology Co ltd
Shanghai Hongji Information Technology Co Ltd
Original Assignee
Beijing Hongji Information Technology Co ltd
Shanghai Hongji Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongji Information Technology Co ltd, Shanghai Hongji Information Technology Co Ltd
Priority to CN202211205653.4A
Publication of CN115269107A
Application granted
Publication of CN115269107B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method, a medium, and an electronic device for processing an interface image. The method comprises: acquiring parent nodes of at least one hierarchy corresponding to elements on an interface image to be detected, and obtaining an element structure tree according to the parent nodes, wherein the parent nodes of any hierarchy are obtained from the node pair features of each pair of adjacent nodes on the element graph of that hierarchy; and providing the element structure tree. Some embodiments of the present application provide a simple and efficient method of generating an element structure tree, so that a robot can locate elements on an interface image according to the element structure tree.

Description

Method, medium and electronic device for processing interface image
Technical Field
The present application relates to the field of robotic process automation, and in particular, to a method, medium, and electronic device for processing an interface image.
Background
In the implementation of Robotic Process Automation (RPA), for a common interface (which may be a web page or an interface image corresponding to an app), before a software robot clicks a button (as an example of an element on the interface), the position and semantics of the button need to be accurately recognized. The accuracy of the related technology depends on the combined accuracy of multiple models, such as object detection, template matching, and OCR (Optical Character Recognition).
Because the software robot in the related art relies on semantic information given by OCR when searching for elements in the interface, it has poor robustness to changes in language version or to changes in color and shape.
Therefore, how to improve the accuracy with which the robot finds elements on the interface has become a technical problem to be solved urgently.
Disclosure of Invention
To solve the problems identified in the Background section, some embodiments of the present application provide a simple and efficient method for generating an element structure tree (for example, the parent nodes of each layer of the element structure tree are obtained by identifying whether adjacent nodes in the constructed element graph of each hierarchy have a common parent node). As a result, a software robot does not need to decide which button to select based on cumbersome OCR results; instead, it finds the position of the corresponding element (for example, a button) on the interface through the element hierarchy shown in the element structure tree, thereby improving the accuracy and speed of element positioning.
In a first aspect, an embodiment of the present application provides a method for processing an interface image, where the method includes: acquiring parent nodes of at least one hierarchy corresponding to elements on an interface image to be detected, and obtaining an element structure tree according to the parent nodes, wherein the parent nodes of any hierarchy are obtained from the node pair features of each pair of adjacent nodes on the element graph of that hierarchy; and providing the element structure tree, so that a robot locates the elements on the interface to be detected according to the element hierarchy shown by the element structure tree.
According to some embodiments of the present application, the parent nodes of each layer are obtained from the node pair features of the adjacent nodes on the constructed element graph of that layer, and the element structure tree is then obtained, so that a robot can locate elements on the interface image to be detected according to the element structure tree.
In some embodiments, the acquiring of parent nodes of at least one hierarchy corresponding to elements on the interface image to be detected includes: constructing an i-th layer element graph, wherein i is an integer greater than or equal to 1, each node of the i-th layer element graph represents one element of that layer, and each edge connects a pair of adjacent nodes of that layer; and if the i-th layer element graph contains a plurality of nodes, obtaining, from the i-th layer element graph, node combinations that have a common parent node, so as to obtain the i-th layer parent nodes, wherein a node combination comprises at least one pair of adjacent nodes, and a pair of adjacent nodes corresponds to one edge on the i-th layer element graph.
According to some embodiments of the present application, to obtain the parent nodes of a layer, the element graph of that layer is constructed first; the adjacent nodes that share a common parent node are then identified on that element graph, which yields the parent nodes of the layer.
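The layer-by-layer procedure described above can be sketched as a loop. This is a minimal illustration under stated assumptions, not the patented implementation: `build_graph` and `share_parent` are hypothetical callables standing in for the graph construction step and the common-parent decision, and treating every connected group of "shares a parent" edges as one node combination is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Node = object  # a leaf label (str) or a tuple of child nodes
Edge = Tuple[int, int]


def build_structure_tree(
    elements: List[str],
    build_graph: Callable[[List[Node]], List[Edge]],
    share_parent: Callable[[Node, Node], bool],
) -> Node:
    """Merge nodes layer by layer until a single root remains (hypothetical sketch)."""
    layer: List[Node] = list(elements)
    while len(layer) > 1:
        edges = build_graph(layer)  # element graph of the current layer
        parent = list(range(len(layer)))  # union-find forest over node indices

        def find(n: int) -> int:
            while parent[n] != n:
                parent[n] = parent[parent[n]]
                n = parent[n]
            return n

        # Join the two ends of every edge judged to share a parent.
        for u, v in edges:
            if share_parent(layer[u], layer[v]):
                parent[find(u)] = find(v)

        groups: Dict[int, List[Node]] = {}
        for idx, node in enumerate(layer):
            groups.setdefault(find(idx), []).append(node)
        next_layer = [g[0] if len(g) == 1 else tuple(g) for g in groups.values()]
        if len(next_layer) == len(layer):
            # Nothing merged on this layer: attach all nodes to one root.
            return tuple(layer)
        layer = next_layer
    return layer[0]
```

Each pass corresponds to one hierarchy level: the nodes of the next layer are the parent nodes found on the current one.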
In some embodiments, the constructing of the i-th layer element graph comprises: determining, according to the distribution of the nodes of the i-th layer element graph on the interface image to be detected, which pairs of nodes need to be connected by edges.
When constructing the element graph of each layer, some embodiments of the present application determine the graph according to the distribution of the nodes (that is, the elements) on the interface image to be detected. Because the spatial positions of elements that share a common parent node generally follow certain regularities, a strongly associated element graph can be constructed for each layer in this way, which speeds up obtaining the element structure tree.
In some embodiments, the i-th layer element graph comprises an m-th node, wherein the constructing of the i-th layer element graph comprises: starting from the m-th node, moving in any one preset direction toward the boundary of the interface image to be detected, and taking the first node encountered during the movement as an adjacent node of the m-th node; and connecting the m-th node and the first node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge.
Some embodiments of the present application use a collision-based graph construction method to obtain the element graph of each layer, and can construct a strongly associated element graph for each layer with fewer connections, thereby reducing the running time of the model and of the entire RPA flow.
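The collision-based construction can be sketched as follows. This is an illustration under assumptions that the text does not state: elements are axis-aligned boxes `(left, top, right, bottom)`, the preset directions are left, right, up, and down, and the movement starts from the center of each box. The function name `collision_edges` is hypothetical. Because every box lies inside the image, stopping at the image boundary is implicit.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def collision_edges(boxes: List[Box]) -> Set[Tuple[int, int]]:
    """Connect each node to the first node hit when moving in each direction."""
    edges: Set[Tuple[int, int]] = set()
    for m, (l, t, r, b) in enumerate(boxes):
        cx, cy = (l + r) / 2, (t + b) / 2
        best: Dict[str, Tuple[float, int]] = {}  # direction -> (distance, node)
        for n, (l2, t2, r2, b2) in enumerate(boxes):
            if n == m:
                continue
            candidates = []
            if t2 <= cy <= b2:  # a horizontal move from the center crosses box n
                if l2 >= cx:
                    candidates.append(("right", l2 - cx))
                elif r2 <= cx:
                    candidates.append(("left", cx - r2))
            if l2 <= cx <= r2:  # a vertical move from the center crosses box n
                if t2 >= cy:
                    candidates.append(("down", t2 - cy))
                elif b2 <= cy:
                    candidates.append(("up", cy - b2))
            for direction, dist in candidates:
                if direction not in best or dist < best[direction][0]:
                    best[direction] = (dist, n)
        # Only the nearest hit per direction becomes an adjacent node.
        for _, n in best.values():
            edges.add((min(m, n), max(m, n)))
    return edges
```

In a row of three boxes, the outer two are not connected to each other because the middle box is hit first, which is how this method keeps the number of edges small.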
In some embodiments, the i-th layer element graph comprises an m-th node, wherein the constructing of the i-th layer element graph comprises: acquiring all nodes at a preset distance from the m-th node to obtain at least one second node, wherein the at least one second node serves as an adjacent node of the m-th node; and connecting each of the at least one second node to the m-th node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge.
Some embodiments of the present application use a fixed-distance graph construction method to obtain the element graph of each layer, and can construct a strongly associated element graph for each layer with fewer connections, thereby reducing the running time of the model and of the entire RPA flow.
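A minimal sketch of the fixed-distance construction is given below. It assumes that each element is reduced to its center point and that "at a preset distance" means "within the preset distance" (Euclidean); both are interpretations rather than statements from the text, and `fixed_distance_edges` is a hypothetical name.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def fixed_distance_edges(centers: List[Point], max_dist: float) -> Set[Tuple[int, int]]:
    """Connect every pair of nodes whose centers lie within max_dist of each other."""
    edges: Set[Tuple[int, int]] = set()
    for m in range(len(centers)):
        for n in range(m + 1, len(centers)):
            if math.dist(centers[m], centers[n]) <= max_dist:
                edges.add((m, n))
    return edges
```

The preset distance trades recall for graph size: a larger value adds more candidate pairs for the classifier to evaluate.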
In some embodiments, the i-th layer element graph comprises an m-th node, wherein the constructing of the i-th layer element graph comprises: acquiring at least some of the nodes at a preset distance from the m-th node to obtain at least one third node, wherein each of the at least one third node serves as an adjacent node of the m-th node, and the at least some nodes are located in a sector area; and connecting each of the at least one third node to the m-th node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge.
Some embodiments of the present application use a sector-based nearest-element graph construction method to obtain the element graph of each layer, and can construct a strongly associated element graph for each layer with fewer connections, thereby reducing the running time of the model and of the entire RPA flow.
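The sector-based variant can be sketched in the same style. The description later characterizes the sector as part of a circle centered on the node with the preset distance as its radius; how the sector is oriented is not specified, so the parameters `direction_deg` (sector axis) and `half_angle_deg` (half opening angle) are assumptions introduced for illustration, as is the function name.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def sector_edges(
    centers: List[Point],
    max_dist: float,
    direction_deg: float,
    half_angle_deg: float,
) -> Set[Tuple[int, int]]:
    """Connect node m to nodes inside a circular sector centered on m."""
    edges: Set[Tuple[int, int]] = set()
    for m, (mx, my) in enumerate(centers):
        for n, (nx, ny) in enumerate(centers):
            if n == m:
                continue
            dx, dy = nx - mx, ny - my
            if math.hypot(dx, dy) > max_dist:
                continue  # outside the circle of radius max_dist
            bearing = math.degrees(math.atan2(dy, dx))
            # Signed angular difference to the sector axis, in [-180, 180).
            diff = (bearing - direction_deg + 180) % 360 - 180
            if abs(diff) <= half_angle_deg:
                edges.add((min(m, n), max(m, n)))
    return edges
```

With `half_angle_deg=180` the sector covers the whole circle and the result matches the fixed-distance construction.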
In some embodiments, the obtaining, from the i-th layer element graph, of node combinations having a common parent node to obtain the i-th layer parent nodes includes: obtaining the node pair feature of each pair of adjacent nodes on the i-th layer element graph to obtain a node pair feature set, wherein a node pair feature is characterized by the element attributes of the two elements corresponding to the pair of adjacent nodes; and inputting the node pair feature set into a target binary classification model, and determining, through the target binary classification model, whether each pair of adjacent nodes on the i-th layer element graph has a common parent node, so as to obtain the i-th layer parent nodes.
Some embodiments of the present application use a trained binary classification model to predict whether adjacent nodes on the element graph of any layer share the same parent node; for example, the target binary classification model makes its prediction from the node pair features of the adjacent nodes on the element graph of the corresponding layer. Predicting whether adjacent nodes share a common parent node with a trained binary classification model can speed up obtaining the element structure tree.
In some embodiments, the element attributes include: element position, element category, and element image semantics; alternatively, the node pair feature comprises at least one of a difference feature describing the difference between the pair of adjacent nodes, a common feature describing the commonality of the pair of adjacent nodes, or a correlation feature describing the correlation of the pair of adjacent nodes.
Some embodiments of the present application define the element attributes (i.e., the data types included in the node features) and the relationships between nodes that the node pair features reflect, thereby improving the objectivity of the entire technical solution.
In some embodiments, the obtaining of the node pair features of any pair of adjacent nodes on the i-th layer element graph includes: calculating the node pair feature of the pair of adjacent nodes consisting of the k-th node and the p-th node using the following formula:
pair feature = [a + b, a - b, a, b, a/(b + x), a, b]
where pair feature denotes the node pair feature, and x is a positive number less than 1.
Some embodiments of the present application provide a formula that quantifies the difference, commonality, and correlation between the two nodes of a pair of adjacent nodes, so as to improve the objectivity of the obtained node pair features.
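The formula can be illustrated with a short sketch. Several points are assumptions, since the text does not define them: `a` and `b` are taken to be the node feature vectors of the k-th and p-th nodes, the arithmetic is applied element-wise, and the result is the concatenation of the listed terms. The term list is copied as printed; the repeated `a` and `b` may be an artifact of the published text. The small positive `x` keeps the quotient finite when an entry of `b` is zero.

```python
from typing import List


def pair_feature(a: List[float], b: List[float], x: float = 1e-6) -> List[float]:
    """Concatenate the terms of the printed formula, element-wise (hypothetical sketch)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if not 0 < x < 1:
        raise ValueError("x must be a positive number less than 1")
    total = [p + q for p, q in zip(a, b)]        # a + b
    diff = [p - q for p, q in zip(a, b)]         # a - b
    ratio = [p / (q + x) for p, q in zip(a, b)]  # a / (b + x)
    # Term order follows the printed formula, including the repeated a and b.
    return total + diff + list(a) + list(b) + ratio + list(a) + list(b)
```

For feature vectors of length d, this layout yields a pair feature of length 7d. The sketch assumes no entry of `b` equals `-x`.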
In some embodiments, any pair of adjacent nodes in the i-th layer element graph includes a k-th node and a p-th node, the k-th node corresponding to a k-th element of the i-th layer element graph and the p-th node corresponding to a p-th element of the i-th layer element graph, wherein before the obtaining of the node pair features of any pair of adjacent nodes on the i-th layer element graph, the method further comprises: acquiring the element positions, element categories, and element image semantics of the k-th element and the p-th element to obtain the element attributes of the two elements; and the obtaining of the node pair features of any pair of adjacent nodes on the i-th layer element graph comprises: combining the element attributes of the two elements to obtain the node pair feature.
In some embodiments of the present application, to obtain the node pair feature of a pair of adjacent nodes, the node features of the two nodes that make up the pair are obtained first.
In some embodiments, the acquiring of the element positions, element categories, and element image semantics of the k-th element and the p-th element to obtain the element attributes of the two elements includes: obtaining the element positions and element categories of the k-th element and the p-th element through a target detection model; cropping, according to the element positions, the image corresponding to the k-th element from the interface image to be detected to obtain a k-th sub-picture, and cropping the image corresponding to the p-th element to obtain a p-th sub-picture; and inputting the k-th sub-picture and the p-th sub-picture into a feature extractor to obtain the element image semantics corresponding to the k-th element and the element image semantics corresponding to the p-th element.
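The cropping step can be sketched as below. The image representation (a list of pixel rows) and the box convention `(left, top, right, bottom)` with exclusive right and bottom bounds are assumptions for illustration; the detection model and the feature extractor are not shown because the text does not specify them.

```python
from typing import List, Tuple

Image = List[List[int]]           # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-picture of one element out of the interface image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

Each sub-picture would then be passed to the feature extractor to obtain the element image semantics.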
Some embodiments of the present application provide a method of obtaining element attributes of an original element.
In some embodiments, the element categories include: a bordered image, a button, a label, or an editable input box.
Some embodiments of the present application provide a variety of categories of elements on an interface image that can be applied to aspects of the present application.
In some embodiments, before the inputting of the node pair feature set into the target binary classification model, the method further comprises: training a binary classification model according to at least one training interface image and annotation data to obtain the target binary classification model, wherein the annotation data marks whether each pair of adjacent nodes on the element graph of each hierarchy has a common parent node.
Some embodiments of the present application obtain the target binary classification model by training a binary classification model. Training requires supervision data (i.e., the annotation data) that marks whether the two elements forming a pair (i.e., the two nodes of a pair of adjacent nodes) belong to the same parent node: if the two elements belong to the same parent node, they are connected and the supervision label is true; otherwise the supervision label is false. A target binary classification model trained with this annotation data is able to judge, from the node pair feature of any pair of adjacent nodes on the element graph of any hierarchy, whether the two nodes of the pair (each corresponding to an element on that element graph) share the same parent node.
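The supervision labels described above can be derived as in the following sketch. It assumes the annotation is available as a mapping `parent_of` from node index to an annotated parent identifier; that mapping and the function name are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[int, int]


def label_edges(edges: List[Edge], parent_of: Dict[int, Hashable]) -> Dict[Edge, bool]:
    """Label an edge True when both of its nodes are annotated with the same parent."""
    return {(u, v): parent_of[u] == parent_of[v] for u, v in edges}
```

Only pairs that are edges of the element graph receive labels, so the training set has the same sparsity as the graphs seen at inference time.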
In some embodiments, the training of the binary classification model according to the at least one training interface image and the annotation data includes: performing element detection on any one of the at least one training interface image to obtain at least one element; acquiring the attribute features of each of the at least one element to obtain element attributes, wherein the element attributes comprise an element position; constructing an initial training element graph according to the element positions; and training the binary classification model according to the initial training element graph and the annotation data.
Some embodiments of the present application provide a method for constructing a hierarchical element graph for training based on element positions.
In some embodiments, the initial training element graph includes a q-th node, wherein the constructing of the initial training element graph according to the element positions includes: starting from the q-th node, moving in any one preset direction toward the boundary of the interface image to be detected, and taking the first node encountered during the movement as an adjacent node of the q-th node; connecting the q-th node and the first node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge; and repeating the above steps to obtain the initial training element graph. Alternatively, acquiring all nodes at a preset distance from the q-th node to obtain at least one second node, wherein the at least one second node serves as an adjacent node of the q-th node; connecting each of the at least one second node to the q-th node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge; and repeating the above steps to obtain the initial training element graph. Alternatively, acquiring at least some of the nodes at a preset distance from the q-th node to obtain at least one third node, wherein each of the at least one third node serves as an adjacent node of the q-th node, the at least some nodes are located in a sector area, and the sector area is part of a circle centered on the q-th node with the preset distance as its radius; connecting each of the at least one third node to the q-th node with an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge; and repeating the above steps to obtain the initial training element graph.
Some embodiments of the present application provide various methods of constructing a multi-level element graph for training.
In some embodiments, the training of the binary classification model according to the initial training element graph and the annotation data comprises: obtaining the node feature of each node on the initial training element graph, wherein the node feature is the element attributes of the corresponding element, and the element attributes further include element category and element image semantics; obtaining, according to the node features, the node pair feature of each pair of adjacent nodes on the initial training element graph to obtain a node pair feature set; marking whether each pair of adjacent nodes has a common parent node to obtain the annotation data; and training the binary classification model according to the annotation data and the node pair feature set to obtain the target binary classification model.
Some embodiments of the present application provide a method for training a binary classification model with the constructed node pair features and the annotation data, so that the trained target binary classification model is able to predict, from input node pair features, whether adjacent nodes have a common parent node.
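The training step can be illustrated with a stand-in model. The text does not name the type of binary classification model, so logistic regression trained by full-batch gradient descent is used here purely as an assumption; the function names, learning rate, and epoch count are hypothetical.

```python
import math
from typing import List, Tuple


def train_binary_classifier(
    features: List[List[float]],
    labels: List[int],
    lr: float = 1.0,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Fit logistic regression on node pair features (stand-in for the unspecified model)."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def shares_parent(weights: List[float], bias: float, x: List[float]) -> bool:
    """Predict True when the pair is judged to have a common parent node."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0
```

In practice the input rows would be the node pair features and the labels would come from the annotation data; any classifier with a comparable fit/predict interface could take the place of this stand-in.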
In a second aspect, some embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, may implement the method as described in any of the embodiments of the first aspect.
In a third aspect, some embodiments of the present application provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, may implement the method according to any of the embodiments of the first aspect.
In a fourth aspect, some embodiments of the present application provide an apparatus for processing an interface image, the apparatus comprising: a hierarchy parent node acquisition module configured to acquire parent nodes of at least one hierarchy corresponding to elements on the interface image to be detected and to obtain an element structure tree according to the parent nodes, wherein the parent nodes of any hierarchy are obtained from the node pair features of each pair of adjacent nodes on the element graph of that hierarchy; and a providing module configured to provide the element structure tree.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic diagram of a web interface of the related art provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for processing an interface image according to an embodiment of the present application;
FIG. 3 is a block diagram of a system for processing an interface image according to an embodiment of the present application;
FIG. 4 is a schematic composition diagram of an element semantic acquisition module according to an embodiment of the present application;
FIG. 5 is a second flowchart of a method for processing an interface image according to an embodiment of the present application;
FIG. 6 is a third flowchart of a method for processing an interface image according to an embodiment of the present application;
FIGS. 7-12 are schematic diagrams of a first layer element graph obtained using a collision-based graph construction method as provided in some embodiments of the present application;
FIG. 13 is a schematic illustration of training a binary classification model provided by some embodiments of the present application;
FIG. 14 is a flowchart for obtaining node pair features provided by some embodiments of the present application;
FIG. 15 is a block diagram illustrating an apparatus for processing an interface image according to some embodiments of the present application;
FIG. 16 is a schematic diagram of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
RPA technology can simulate the operations that employees perform on a computer with a keyboard and mouse in their daily work, and can replace humans in performing operations such as logging in to a system, operating software, reading and writing data, downloading files, and reading mail. Used as an enterprise's virtual workforce, the automation robot frees employees from repetitive, low-value work so that they can devote their energy to high-value-added work, allowing the enterprise to reduce costs and increase efficiency while carrying out its digital and intelligent transformation.
An RPA robot is a software robot that replaces manual tasks in business processes and interacts with the front-end systems of a computer as a human would. RPA can therefore be regarded as a software robot running on a personal PC or a server that, by imitating the operations users perform on the computer, automatically repeats activities such as retrieving mail, downloading attachments, logging in to systems, and processing and analyzing data in place of humans, and does so quickly, accurately, and reliably. Like a traditional physical robot, it addresses the speed and accuracy problems of human work through specifically configured rules. However, a traditional physical robot combines software and hardware and can work only when software runs on specific supporting hardware, whereas an RPA robot exists purely at the software level and can be deployed to any PC or server to complete the specified work as long as the corresponding software is installed.
That is, RPA is a way of performing business operations with "digital staff" instead of people, together with its related technology. In essence, RPA uses software automation technology to simulate a human operating objects such as systems, software, web pages, and documents on a computer without human intervention, acquiring business information and executing business actions, and ultimately achieving automated process handling, labor cost savings, and improved processing efficiency. As this description shows, one of the core technologies of RPA is locating and picking elements on an interface; for example, when a button click needs to be simulated, the button element must first be located on the interface. In some embodiments of the present application, the constructed element structure tree assists the robot in locating elements on the operation interface; for example, the robot may locate the elements on the interface to be detected according to the element hierarchy shown by the constructed element structure tree, which improves the accuracy and speed of locating elements on the interface compared with the related technical solutions.
Referring to FIG. 1, which shows an interface image, the following describes an exemplary process of robotic process automation in conjunction with FIG. 1.
FIG. 1 shows a web interface, namely the Baidu search interface. The interface includes a plurality of elements, namely a first element 101, a second element 102, a third element 103, a fourth element 104, a fifth element 105, a sixth element 106, a seventh element 107, an eighth element 108, a ninth element 109, and a tenth element 190, wherein the first to seventh elements are all of the hyperlink type, the eighth element is of the editable input box type, the ninth element is of the button type, and the tenth element 190 is a bordered image.
Robotic process automation means that the robot simulates a human in performing the corresponding operations on each element shown in FIG. 1.
The related art realizes robotic process automation by having an element detection module, a template matching module based on image features, an OCR module, and other modules work in series. In contrast, some embodiments of the present application obtain an element structure tree of the web interface of FIG. 1 at the design stage, and then use the element structure tree to help the robot locate the position of the button, so that the robot can smoothly perform a click operation on the button.
It should be noted that fig. 1 is only used for exemplarily illustrating the working scenario and the working process of the present application, and should not be construed as limiting the application scenario of the technical solution of the present application.
Some embodiments of the present application use the term "original element" for an element recognized directly from the interface image to be detected or from a training interface image. The word "original" distinguishes such elements from combined elements, where a combined element consists of a plurality of original elements that the method provided by some embodiments of the present application has identified as having a common parent node. Unless otherwise specified, "element" below may refer to either an original element or a combined element.
A method for processing an interface image provided by some embodiments of the present application is illustratively set forth below in conjunction with fig. 2.
As shown in fig. 2, some embodiments of the present application provide a method of processing an interface image, the method including:
s101, acquiring parent nodes of at least one hierarchy corresponding to elements (called original elements in some embodiments of the present application) on an interface image to be detected, and obtaining an element structure tree according to the parent nodes.
As described above, the RPA technology can simulate the operation of an employee on a computer through a keyboard and a mouse in daily work, and can perform operations of logging in a system, operating software, reading and writing data, downloading files, reading mails and the like on an interface (for example, an interface corresponding to a web page or an application) instead of a human, where the interface image to be detected in S101 is an image corresponding to the interfaces. That is, in the implementation process of RPA, an image of an interface that requires actual operation of the robot may be used as the interface image to be detected in S101.
For example, in some embodiments of the present application, elements and element attributes of the elements on the interface image to be detected are already recognized, and the element attributes of the elements may be directly read and a parent node of at least one hierarchy may be obtained according to the element attributes.
For example, in some embodiments of the present application, the element attributes of each element must first be obtained from the interface image to be detected. In this case S101 exemplarily includes: identifying the elements on the interface image to be detected (that is, obtaining the element attributes of each element on the interface image, where the element attributes include, for example, element position, element category and element image semantics); and acquiring parent nodes of at least one hierarchy corresponding to the elements on the interface image to be detected, and obtaining the element structure tree according to the parent nodes.
It should be noted that, in some embodiments of the present application, the element categories identified on the interface to be detected exemplarily include: image with border, button, icon, mark symbol, textbox (an editable input box), icon_button (both an icon and a button), icon_button_text (an icon, a button and text at once), and so on. Some embodiments of the present application do not limit the categories of the original elements on the interface image. It is understood that the position attribute information of each identified element may also be determined by performing S101, for example by acquiring the coordinates of each identified element on the interface image.
The parent nodes of any level obtained in S101 are obtained from the element graph constructed for the current level. Specifically, they are obtained from the node pair features of each pair of adjacent nodes on the element graph of the current level; for example, whether the two nodes of a pair of adjacent nodes have a common parent node is determined according to the node pair features. It should be noted that some embodiments of the present application predict whether adjacent nodes have a common parent node through a binary classification model. Those skilled in the art may also use other ways to determine, from the constructed node pair features, whether a pair of adjacent nodes has a common parent node.
In S101, the element graph of the current layer uses a node to represent an element of the current layer and an edge to represent a pair of adjacent nodes of the current layer, where an element of the current layer is an original element or a combined element, and a combined element comprises a plurality of original elements with a common parent node. For example, a combined element includes a plurality of identified original elements having a common parent node, or a plurality of combined elements of a previous level having a common parent node (it is understood that such a combined element also includes a plurality of original elements having a common parent node), or one or more original elements together with one or more combined elements of a previous level.
It should be noted that, to obtain the parent nodes of each layer, some embodiments of the present application use recursion. For example, after the ith-layer parent nodes are obtained from the ith-layer element graph, the (i+1)th-layer element graph is constructed from the ith-layer parent nodes (each parent node included in the ith-layer parent nodes is taken as a node on the graph), and the (i+1)th-layer parent nodes are then obtained from the (i+1)th-layer element graph. By identifying new parent nodes, a plurality of original elements, a plurality of combined elements, or a mixture of original elements and combined elements are aggregated, until the root node is found and the construction of the element structure tree is complete. It is understood that the ith-layer element graph is the element graph of any layer.
S102, providing the element structure tree so that the robot can locate elements on the interface to be detected according to the element hierarchical relation shown by the element structure tree. For example, the element structure tree is provided to a robot, and the robot locates an element on the interface to be operated according to the element structure tree so as to complete the related operation on that element. For example, the related operations include clicking, entering text content, or other operations.
That is, the obtained element structure tree may be provided to the program robot, and the robot determines the position of the target element on the interface to be operated according to the element structure tree and completes the corresponding operation.
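As an illustration of how a robot might use the element hierarchical relation, the following minimal sketch searches an element structure tree for a target element and returns the path from the root node. The `TreeNode` class, the `path_to` function and the element names are hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    # Hypothetical label of an original element or a combined element.
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def path_to(root: TreeNode, target: str) -> Optional[List[str]]:
    """Depth-first search: names from the root down to the target, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = path_to(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


# Hypothetical tree: a combined element "login_form" groups two original elements.
tree = TreeNode("root", [
    TreeNode("login_form", [TreeNode("username_box"), TreeNode("submit_button")]),
    TreeNode("logo"),
])
print(path_to(tree, "submit_button"))
```

The returned path names every ancestor of the target, which is one way a hierarchical relation can narrow down where an element sits on the interface.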
The steps of fig. 2 are illustrated below in conjunction with fig. 3, a software architecture diagram of some embodiments of the present application.
The method for processing an interface image according to some embodiments of the present application may be implemented by the target element detection model 110, the element semantic acquisition module 140, and the parent node acquisition modules 120 of each hierarchy of fig. 3.
The target element detection model 110 of fig. 3 is configured to detect and recognize an element included on an interface image input to the model, for example, the element described in S101 is obtained by the target element detection model 110 recognizing the interface image to be detected. Specifically, element position information and element category information of each element are obtained. It is understood that the target element detection model is a model obtained by training the element detection model, and the trained target element detection model 110 has a function of identifying the element position and the element type on the input interface image (or each layer of element map). For example, the target element detection model may employ a yolov5 neural network model that includes a convolutional neural network layer. It should be noted that, the embodiment of the present application is not limited to the specific deep learning network model used by the element detection model or the target element detection model, and any deep learning model that can recognize multiple elements on the interface image may be used by those skilled in the art.
The element semantic acquisition module 140 of fig. 3 is configured to derive the element image semantics of each identified element from the information output by the target element detection model 110.
The composition of the element semantic acquisition module 140 is exemplarily illustrated in fig. 4, in which the element semantic acquisition module 140 includes a cropping module 141 and a feature extractor 142. For example, the interface image to be detected is input into the target element detection model 110 in fig. 4 to obtain the element position and the element category of each identified element. The element positions and the interface image to be detected are provided to the cropping module 141, which crops the image region corresponding to each element out of the interface image to obtain sub-pictures containing the elements. The sub-pictures are then input into the feature extractor 142 to obtain the element image semantics corresponding to each sub-picture. It should be noted that some embodiments of the present application use the element image semantics, the element position and the element category of an element as its element attributes, and the element attributes are used to describe the node features of the nodes on the element graph of each level.
The parent node acquisition modules 120 of each hierarchy of fig. 3 are configured to perform at least the acquisition of the parent nodes of at least one hierarchy corresponding to the elements described in S101. For example, the parent node acquisition modules 120 of each hierarchy are configured to perform at least the following steps:
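The cropping and feature extraction flow can be sketched as follows. This is only a sketch under stated assumptions: the image is a plain nested list of grayscale values, boxes use a hypothetical `(x0, y0, x1, y1)` convention, and `mean_intensity` is a stand-in for the feature extractor 142, not the extractor used by the present application.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]            # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # hypothetical (x0, y0, x1, y1), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-picture covered by one element's detection box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def mean_intensity(sub_picture: Image) -> List[float]:
    """Placeholder extractor returning a one-dimensional 'semantics' vector."""
    pixels = [p for row in sub_picture for p in row]
    return [sum(pixels) / len(pixels)]


def element_attributes(
    image: Image,
    detections: List[Tuple[Box, str]],
    extractor: Callable[[Image], List[float]] = mean_intensity,
) -> List[Dict]:
    """Combine position, category and image semantics into element attributes."""
    attributes = []
    for box, category in detections:
        sub_picture = crop(image, box)
        attributes.append(
            {"position": box, "category": category, "semantics": extractor(sub_picture)}
        )
    return attributes


# Hypothetical 4x4 image and two hypothetical detections.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
detections = [((0, 0, 2, 2), "button"), ((2, 2, 4, 4), "icon")]
attributes = element_attributes(image, detections)
print(attributes)
```

Passing the extractor as a parameter keeps the cropping step independent of whichever network produces the element image semantics.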
first, an ith layer element diagram is constructed (the ith layer element diagram constructing module 121 shown in fig. 3 is adopted), wherein the value range of i is an integer greater than or equal to 1.
If i = 1, the element graph is the first-layer element graph. It can be understood that the first-layer element graph is the element graph constructed from all original elements obtained by performing element recognition on the interface image to be detected.
If i =2, the element corresponding to each node on the corresponding second-layer element graph may be one or more original elements, or may be a combined element composed of a plurality of original elements identified through the first iteration and having a common parent node.
If i =3, the element corresponding to each node on the corresponding layer 3 element graph may be one or more original elements, may be a combined element formed by a plurality of original elements having a common parent node and identified through the second iteration, and may even be a combined element formed by a plurality of original elements and combined elements having a common parent node and identified through the second iteration.
Different elements corresponding to each layer of element diagram will be described later with reference to a specific example, and are not described herein in detail to avoid repetition.
Secondly, if the ith-layer element graph is confirmed to contain a plurality of nodes, node combinations with a common parent node are acquired from the ith-layer element graph to obtain the ith-layer parent nodes, where a node combination comprises at least one pair of adjacent nodes, and one pair of adjacent nodes corresponds to one edge on the ith-layer element graph.
The first step and the second step are repeated until the obtained target-layer element graph contains only one node, at which point the element structure tree is confirmed to be obtained; the element on the target-layer element graph is the root node.
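The repetition of the first and second steps can be sketched as a loop. In this sketch, `find_parent_groups` is a hypothetical callable that stands in for graph construction plus the parent-node decision, `make_parent` is a hypothetical callable that builds a combined element, and the fallback that groups all remaining nodes under one root when no group is found is an assumption added here so that the loop always terminates.

```python
from typing import Any, Callable, List, Sequence


def build_structure_tree(
    leaves: Sequence[Any],
    find_parent_groups: Callable[[List[Any]], List[List[int]]],
    make_parent: Callable[[List[Any]], Any],
) -> Any:
    """Merge nodes layer by layer until a single root node remains."""
    if not leaves:
        raise ValueError("at least one original element is required")
    layer = list(leaves)
    while len(layer) > 1:
        # Each group lists the indices of nodes that share a parent node.
        groups = find_parent_groups(layer)
        if len(groups) >= len(layer):
            # Assumption: if nothing was merged, put every node under one root.
            groups = [list(range(len(layer)))]
        layer = [
            layer[g[0]] if len(g) == 1 else make_parent([layer[i] for i in g])
            for g in groups
        ]
    return layer[0]


def pair_up(layer: List[Any]) -> List[List[int]]:
    """Toy grouping rule: consecutive nodes share a parent node."""
    return [list(range(i, min(i + 2, len(layer)))) for i in range(0, len(layer), 2)]


tree = build_structure_tree(["e1", "e2", "e3", "e4", "e5"], pair_up, tuple)
print(tree)
```

A node that ends up alone in its group is carried unchanged into the next layer, matching the case where a single element has no common parent node with its neighbors at the current layer.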
To improve the accuracy of the resulting element structure tree, some embodiments of the present application employ a binary classification model to determine whether each pair of adjacent nodes on the element graph of each layer has a common parent node. For example, the process in the second step of obtaining the node combinations with a common parent node from the ith-layer element graph to obtain the ith-layer parent nodes may exemplarily include: obtaining the node pair feature of each pair of adjacent nodes on the ith-layer element graph to obtain a node pair feature set, where a node pair feature is characterized by the element attributes of the two elements corresponding to the pair of adjacent nodes; and inputting the node pair feature set into a target binary classification model, and determining through the target binary classification model whether each pair of adjacent nodes on the ith-layer element graph has a common parent node, to obtain the ith-layer parent nodes. It should be noted that the target binary classification model is used to predict whether a common parent node exists between adjacent nodes on the ith-layer element graph (i.e., two nodes connected by an edge on the element graph).
In view of the above, if a binary classification model is used to identify multiple elements having a common parent node, the node pair features of the adjacent nodes on the corresponding element graph need to be constructed. As shown in fig. 3, the parent node acquisition module 120 of each hierarchy further includes a node pair feature constructing module 122 and a target binary classification model 123, where the node pair feature constructing module 122 is configured to construct the node pair features of each pair of connected nodes on the ith-layer element graph, and the target binary classification model 123 is configured to determine, according to the node pair features, whether the corresponding adjacent nodes have a common parent node and, if so, to identify one or more parent nodes of the hierarchy.
That is, some embodiments of the present application determine, through the target binary classification model 123, whether two adjacent nodes on the element graph of each layer (i.e., two nodes connected by an edge on the element graph) have a common parent node, so as to obtain all parent nodes of the corresponding layer. Multiple nodes having a common parent node serve as one node when forming the element graph of the next layer, and the next-layer element graph is then constructed from these nodes by one of the composition methods described below (e.g., the collision composition method, the circle composition method, or the fan composition method).
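One possible way to turn the pairwise decisions into the nodes of the next layer is sketched below with a union-find structure. Treating the positive pairs transitively (if nodes A and B share a parent node and A and C share a parent node, all three form one group) is an assumption of this sketch; the function name and the index-based node representation are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_common_parent(
    node_count: int, positive_pairs: Iterable[Tuple[int, int]]
) -> List[List[int]]:
    """Group node indices linked by pairs classified as sharing a parent node."""
    parent = list(range(node_count))

    def find(i: int) -> int:
        # Follow parent links to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in positive_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller index as representative for a stable group order.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: Dict[int, List[int]] = {}
    for i in range(node_count):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]


# Hypothetical result: the pairs (1, 2) and (1, 4) were classified as positive.
groups = group_by_common_parent(5, [(1, 2), (1, 4)])
print(groups)
```

Nodes that appear in no positive pair stay in singleton groups and are carried into the next layer as single nodes.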
The implementation process of determining the parent node of each hierarchy by using the target two-classification model is exemplarily described below with reference to fig. 5.
Fig. 5 illustrates how the method provided by some embodiments of the present application, which combines recursion with binary classification (i.e., identifying whether adjacent nodes have a common parent node), obtains the sets of tree nodes included in the element structure tree (the tree node set of each layer includes the parent nodes of the corresponding layer). As shown in fig. 5, the method for processing an interface image exemplarily includes:
A. the recursion starts.
B. Element detection and feature extraction.
For example, element detection is performed on the interface image to be detected through the target element detection model in fig. 3 to obtain all original elements, element positions and element categories of all the original elements, a sub-picture is obtained by clipping the original elements on the interface image to be detected, and feature extraction is performed on the sub-picture to obtain element image semantics corresponding to the original elements, so that element detection and feature extraction are completed.
The node information in fig. 5 includes both the node characteristics corresponding to the original element (i.e., the element position, the element category, and the element image semantics corresponding to the node), and the node characteristics of each node in the nth layer tree node set composed of the parent node and/or the node corresponding to the original element obtained after the target binary classification model performs the binary classification task. The node features are all characterized by element positions, element categories and element image semantics of elements corresponding to the nodes.
C. And judging whether the node is a root node.
Fig. 5 also determines whether the corresponding nth-layer tree node set is a root node according to the node information, if so, the recursion ends to output the structure tree (i.e., the element structure tree), otherwise, the next step of processing is executed.
D. Composition and data construction, performed, for example, according to the node information. It should be noted that composition according to the node information means constructing the next-layer element graph, and data construction means obtaining the node features of each node on the element graph and then obtaining, from those node features, the node pair features of each pair of adjacent nodes.
The node pair feature set is characterized as a pair data set. The content output by the composition and data construction of fig. 5 may be used to obtain the node pair feature set (i.e., the Pairs data set of fig. 5).
E. Binary classification task, namely performing the binary classification task on the node pair feature set to identify the parent nodes of the corresponding element graph, so as to obtain the nth-layer tree node set.
That is, the recursion and binary classification process of fig. 5 can be summarized as follows:
Step 10, beginning the recursion. The recursion usually constructs, for the child nodes in the tree, the parent nodes of the next layer, so the element detection and feature extraction module is executed first to obtain the node information (at the beginning of the recursion, the nodes correspond to the leaf nodes in the tree).
Step 11, judging from the number of nodes whether a root node has been reached: if the number is 1, the node is the root node and the recursion ends, giving the element structure tree; if the number is greater than 1, the composition and data construction module is executed (see above for details) to obtain a pair data set.
Step 12, sending the pair data set into the binary classification task module, and constructing the sets of nth-layer tree nodes according to the classification result.
Step 13, integrating the node information in each set to obtain the feature information of the parent node of all the nodes in the set, passing it to step 11, and repeating this cycle until the element structure tree is finally obtained.
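Step 13 does not spell out how the node information of a set is integrated. The sketch below shows one plausible integration rule and is entirely an assumption: the parent position is the bounding box enclosing all child boxes, the parent semantics is the element-wise mean of the child semantics, and the parent category is a hypothetical label `"combined"`.

```python
from typing import Dict, List


def merge_node_info(children: List[Dict]) -> Dict:
    """Integrate the node information of one set into a parent node record."""
    boxes = [child["position"] for child in children]
    # Assumption: the parent box is the smallest box enclosing every child box.
    position = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    # Assumption: the parent semantics is the element-wise mean over children.
    dims = len(children[0]["semantics"])
    semantics = [
        sum(child["semantics"][d] for child in children) / len(children)
        for d in range(dims)
    ]
    return {"position": position, "category": "combined", "semantics": semantics}


# Hypothetical node information for two elements that share a parent node.
children = [
    {"position": (0, 0, 10, 10), "category": "icon", "semantics": [1.0, 3.0]},
    {"position": (20, 5, 30, 15), "category": "button", "semantics": [3.0, 5.0]},
]
parent_info = merge_node_info(children)
print(parent_info)
```

Because the parent record has the same fields as a child record, it can be fed back into step 11 without special handling.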
Step 12 and step 13 are described below by way of example in connection with fig. 6. The upper left diagram of fig. 6 is the constructed first-layer element graph (i.e., the element graph constructed directly from the result of element recognition on the interface image to be detected). It includes nine original elements recognized from the interface image to be detected, and edges connecting different nodes that are obtained according to a composition method described below (e.g., the collision composition method, the fixed distance composition method, or the like).
It should be noted that, in some embodiments of the present application, the node pair features of the adjacent nodes connected by each edge are obtained from the first-layer element graph and then input into the target binary classification model to obtain the classification result shown in the upper right diagram of fig. 6. As can be seen from that diagram, processing the first-layer element graph with the target binary classification model yields the following combined elements: node 2 (corresponding to element 2), node 4 (corresponding to element 4) and node 7 (corresponding to element 7), that is, two pairs of adjacent nodes, have a common parent node (the three elements under this parent node subsequently constitute one combined element that serves as a node on the next-layer element graph); node 5 (corresponding to element 5) and its adjacent node 6 (corresponding to element 6) also have a common parent node (the two elements under this parent node subsequently constitute one combined element that serves as a node on the next-layer element graph); and node 6 (corresponding to element 6) and its adjacent node 9 (corresponding to element 9) also have a common parent node (the two elements under this parent node subsequently constitute one combined element that serves as a node on the next-layer element graph). The first-layer parent nodes are thus obtained.
The lower right diagram of fig. 6 is the element distribution diagram obtained from the upper right diagram of fig. 6. In it, a plurality of nodes having a common parent node in the upper right diagram are taken as one parent node (for example, element 2, element 4 and element 7 are taken as one parent node, corresponding to the second parent node of the lower right diagram; element 6 and element 9 are taken as one parent node, corresponding to the fifth parent node of the lower right diagram; and element 5 and element 6 are taken as one parent node, corresponding to the fourth parent node of the lower right diagram), and a node corresponding to a single original element without a common parent node is also taken as one parent node (for example, element 1 is taken as one parent node, corresponding to the first parent node of the lower right diagram, and element 3 is taken as one parent node, corresponding to the third parent node of the lower right diagram).
Composition is then performed on the element distribution diagram in the lower right diagram of fig. 6 according to a composition method (e.g., the collision composition method, the fixed distance composition method, etc.), that is, it is determined whether the parent nodes in the lower right diagram need to be connected by edges, to obtain the second-layer element graph shown in the lower left diagram of fig. 6. The second-layer element graph includes a plurality of nodes and edges connecting different nodes (the edges are likewise obtained according to the collision composition method, the fixed distance composition method, etc., and each edge represents a pair of adjacent nodes of the current layer). It can be understood that the second parent node of this diagram (corresponding to one node of the element graph of the current layer) corresponds to a combined element composed of three original elements having a common parent node.
As can be seen from the above description, the above-mentioned patterning related to fig. 5 or step 11 is to construct any layer element diagram, and the patterning process is exemplarily illustrated below by taking the construction process of the ith layer element diagram as an example. It can be understood that the composition is to obtain the edges connecting the related nodes on the corresponding element graph, one node on the corresponding element graph corresponds to one element of the layer, and the position of each node on the element graph is determined by the element position of the corresponding element.
It should be noted that, in some embodiments of the present application, constructing the ith-layer element graph exemplarily includes: determining whether two nodes are adjacent nodes according to how the nodes included in the ith-layer element graph are distributed on the interface image to be detected, that is, determining whether the two nodes need to be connected by an edge. In some embodiments of the present application, an edge connects two nodes when they are confirmed to be adjacent nodes; otherwise the two nodes are not connected by an edge.
The process of determining each edge included on the ith-layer element graph is exemplarily described below for the collision composition method. It should be noted that the collision composition method according to some embodiments of the present application can construct a strongly correlated initial graph with a minimal number of connections, thereby reducing the running time of the model and thus the running time of the RPA as a whole.
In some embodiments of the present application, the ith layer element diagram includes an mth node, wherein the constructing the ith layer element diagram illustratively includes: moving to the boundary of the interface image to be detected according to any one preset direction by taking the m-th node as a starting point, and taking a first node found in the moving process as an adjacent node of the m-th node; and connecting the mth node with the first node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by using an edge.
For example, the process of obtaining the element attributes and constructing the first-layer element graph is described below, taking the collision composition method as an example.
Step 1, inputting the interface picture to be detected into the target detection model, and obtaining the element position and element category information of all original elements.
Step 2, according to the element position information in step 1, cutting out from the interface picture to be detected a small picture at the position of each original element to obtain the cropped pictures.
Step 3, inputting the cropped pictures of step 2 into a pre-prepared feature extractor E, and obtaining the image feature information of each original element, that is, the element image semantics. For example, the feature extractor network is a resnet18 network trained on the ImageNet dataset; because interface elements have a certain universality, the data in the ImageNet dataset can cover the features of the interface elements.
Step 4, integrating the element position and element category information of step 1 with the element image semantics of step 3 to obtain the multidimensional information (namely the element attributes) of each element. Since each element is also a node in the structure tree, the multidimensional information of the elements is also referred to herein as node information or node features.
Step 5, constructing the first-layer element graph from the node position information in the node information by using the collision composition method (that is, determining according to the collision composition method whether an edge exists between the two nodes corresponding to any two original elements). The graph is built with the collision composition method starting from the leftmost node and proceeding by BFS (Breadth First Search). For example, the collision composition method is as follows: taking a certain node as the center, x directions (namely, a plurality of preset directions) are set, and the rectangular detection frame of the node is moved once along each of the x directions. The frame is the rectangular detection frame of the element; its size is not fixed but is determined by the element detection result, so a large element has a large frame and a small element has a small frame. The motion is virtual: its purpose is to find the nearest node in a given preset direction, and it is not physically carried out because its result is simulated at the algorithm level. The first other node encountered during the motion in a direction is the neighbor of the node in that direction, and the two nodes are connected (shown as a connecting edge between the two nodes in the first-layer element graph). If no other node is encountered during the motion in a certain direction, the image boundary has been reached and the node has no neighbor in that direction. For example, in some embodiments of the present application, x may take values such as 4, 8 or 12, where x = 4 (i.e., the four directions up, down, left and right) is preferred for its balance of accuracy and speed.
A detailed process of constructing the upper left diagram of fig. 6 (i.e., the first-layer element graph) by the collision composition method is exemplarily described below with reference to figs. 7 to 12; note that the lines representing the connecting edges between two nodes are drawn differently in fig. 6 than in figs. 7 to 12.
Figs. 7 to 12 schematically illustrate the process of constructing the first-layer element graph. In this example, nine original elements in total, i.e., element 1, element 2, element 3, element 4, and so on up to element 9, are identified from the interface image to be detected. Starting from fig. 7, the neighbor nodes of each node are searched in turn along the plurality of set directions, beginning with element 1 at the top left corner, to obtain each pair of adjacent nodes. As shown in fig. 7, the node corresponding to element 2 is a neighbor node of the node corresponding to element 1, and the two nodes form a pair of adjacent nodes. In fig. 8, the nodes corresponding to element 4 and element 7 are both neighbor nodes of the node corresponding to element 2, giving two pairs of adjacent nodes; because the node corresponding to element 2 was already identified as a neighbor node of the node corresponding to element 1, only one edge connects these two nodes. In fig. 9, the node corresponding to element 5 is a neighbor node of the node corresponding to element 3, giving a further pair of adjacent nodes. In fig. 10, the nodes corresponding to element 7 and element 5 are both neighbor nodes of the node corresponding to element 4, giving two pairs of adjacent nodes. In fig. 11, the node corresponding to element 8 is a neighbor node of the node corresponding to element 5, giving a pair of adjacent nodes. In fig. 12, the nodes corresponding to element 7 and element 9 are neighbor nodes of the node corresponding to element 6, giving two pairs of adjacent nodes, which completes the construction of the first-layer element graph.
It will be readily appreciated that the collision composition of figs. 7 to 12 starts at element 1 and proceeds in order until element 9. Taking element 1 as an example, among the four degrees of freedom (i.e., the four preset directions) up, down, left and right, there is no element above, below or to the left, and the nearest element on the right is element 2, so element 1 is connected to element 2. Taking element 2 as an example, among the four degrees of freedom up, down, left and right, there is no element above, the nearest element on the right is element 4 and the nearest element below is element 7, so element 2 is connected to elements 4 and 7 respectively; the nearest element on the left is element 1, and since elements 1 and 2 are already connected, no repeated connection is needed.
The collision mapping method described above yields a first-level element map as shown in fig. 12 (i.e., the top-left diagram of fig. 6), which includes nodes corresponding to the original elements and links between adjacent nodes.
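The collision composition method with x = 4 can be sketched as follows. This is a simplified illustration under stated assumptions: boxes use a hypothetical `(x0, y0, x1, y1)` convention, a moving frame "meets" another box when the two overlap on the axis perpendicular to the motion, and every node is visited in index order rather than by a breadth-first traversal from the leftmost node.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1)


def _overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the open intervals (a0, a1) and (b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def first_hit(boxes: List[Box], m: int, direction: str) -> Optional[int]:
    """Index of the first box met when sliding box m in the given direction."""
    sx0, sy0, sx1, sy1 = boxes[m]
    best, best_gap = None, None
    for k, (x0, y0, x1, y1) in enumerate(boxes):
        if k == m:
            continue
        if direction == "right":
            gap, in_path = x0 - sx1, _overlaps(sy0, sy1, y0, y1)
        elif direction == "left":
            gap, in_path = sx0 - x1, _overlaps(sy0, sy1, y0, y1)
        elif direction == "down":
            gap, in_path = y0 - sy1, _overlaps(sx0, sx1, x0, x1)
        else:  # "up"
            gap, in_path = sy0 - y1, _overlaps(sx0, sx1, x0, x1)
        if in_path and gap >= 0 and (best_gap is None or gap < best_gap):
            best, best_gap = k, gap
    return best  # None means the image boundary was reached first


def collision_edges(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Undirected edges between each node and its first hit in four directions."""
    edges = set()
    for m in range(len(boxes)):
        for direction in ("up", "down", "left", "right"):
            k = first_hit(boxes, m, direction)
            if k is not None:
                edges.add((min(m, k), max(m, k)))  # stored once per pair
    return sorted(edges)


# Hypothetical layout: four elements arranged in a 2x2 grid.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
print(collision_edges(boxes))
```

Storing each edge as an ordered index pair in a set reflects the rule that two nodes already connected are not connected again.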
The process of determining each edge included on the ith layer element map to obtain the ith layer element map is exemplarily described below in a fixed distance patterning method.
In some embodiments of the present application, the ith-layer element graph includes an mth node, and constructing the ith-layer element graph exemplarily includes: acquiring all nodes within a preset distance of the mth node to obtain at least one second node, where the at least one second node serves as the adjacent nodes of the mth node; and connecting each second node of the at least one second node to the mth node by an edge, where two nodes confirmed as adjacent nodes are connected by one edge.
That is to say, some embodiments of the present application may also determine the edges on the corresponding element graph by using the fixed distance composition method: taking each traversed element (that is, one node on the graph) as the center, all elements within a certain range of that element are connected to it as adjacent nodes of the node corresponding to that element.
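A minimal sketch of the fixed distance composition method follows. It assumes that the distance between two elements is measured between the centers of their detection boxes, which the description does not specify; the function names and the `(x0, y0, x1, y1)` box convention are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1)


def center(box: Box) -> Tuple[float, float]:
    """Center point of a detection box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def fixed_distance_edges(boxes: List[Box], radius: float) -> List[Tuple[int, int]]:
    """Connect every pair of nodes whose centers lie within the preset distance."""
    centers = [center(b) for b in boxes]
    edges = []
    for m in range(len(centers)):
        for k in range(m + 1, len(centers)):
            # Assumption: Euclidean distance between box centers.
            if math.dist(centers[m], centers[k]) <= radius:
                edges.append((m, k))
    return edges


# Hypothetical layout: four elements in a 2x2 grid, centers 20 units apart.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
print(fixed_distance_edges(boxes, radius=25))
```

With a radius of 25 the diagonal pairs (about 28.3 units apart) are left unconnected, while a larger radius would connect every pair, so the preset distance directly controls how dense the element graph is.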
The following describes, by way of example, how a sector graph construction method determines each edge of the ith layer element graph, thereby obtaining the ith layer element graph.
In some embodiments of the present application, the ith layer element graph includes an mth node, and constructing the ith layer element graph illustratively includes: acquiring at least some of the nodes within a preset distance of the mth node to obtain at least one third node, wherein each third node of the at least one third node serves as an adjacent node of the mth node, the at least some nodes are located in a sector area, and the sector area is part of a circle centered on the mth node with the preset distance as its radius; and connecting each third node of the at least one third node to the mth node with an edge, wherein any two nodes confirmed as adjacent nodes are connected by one edge.
That is, some embodiments of the present application obtain the edges of the corresponding element graph with a sector graph construction method: taking each traversed element (i.e., one node on the graph) as the center, the full 360 degrees is divided into n sector intervals, and the nearest element in each sector interval is connected to the center element as its adjacent element; the two corresponding nodes are called adjacent nodes.
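A minimal sketch of the sector construction is given below. It assumes that elements are represented by center points, that the n sector intervals are equal and start at angle 0, and that the preset distance is optional (unbounded by default); the names `build_sector_graph`, `n_sectors` and `radius` are hypothetical.

```python
import math


def build_sector_graph(centers, n_sectors=4, radius=float("inf")):
    """For each node, link the nearest node found in each angular sector."""
    edges = set()
    width = 2 * math.pi / n_sectors
    for m, (xm, ym) in enumerate(centers):
        nearest = {}  # sector index -> (distance, node index)
        for j, (xj, yj) in enumerate(centers):
            if j == m:
                continue
            dist = math.hypot(xj - xm, yj - ym)
            if dist > radius:
                continue  # outside the circle around node m
            angle = math.atan2(yj - ym, xj - xm) % (2 * math.pi)
            # Clamp guards against an angle that rounds up to exactly 2*pi.
            sector = min(int(angle // width), n_sectors - 1)
            if sector not in nearest or dist < nearest[sector][0]:
                nearest[sector] = (dist, j)
        for _, j in nearest.values():
            edges.add((min(m, j), max(m, j)))
    return sorted(edges)


centers = [(0, 0), (2, 1), (4, 2), (-1, 3)]
edges = build_sector_graph(centers, n_sectors=4)
```

In this example nodes 0 and 2 are not connected, because node 1 lies in the same sector as node 2 when seen from node 0 and is closer.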
It should be noted that, in some embodiments of the present application, the element attributes include: the positions of the elements on the interface image to be detected, the element categories and the element image semantics. In some embodiments of the present application, the node pair characteristics include at least one of difference characteristics describing a difference of the arbitrary pair of neighboring nodes, common characteristics describing a commonality of the arbitrary pair of neighboring nodes, or correlation characteristics describing a correlation of the arbitrary pair of neighboring nodes.
The following example illustrates a method for obtaining node pair characteristics (i.e. obtaining the pair data set of fig. 5) by taking two neighboring nodes as an example.
In some embodiments of the present application, the any pair of neighboring nodes includes a kth node and a pth node, a node feature (i.e., an element attribute of a corresponding element) of the kth node is characterized as a, and a node feature (i.e., an element attribute of a corresponding element) of the pth node is characterized as b, where the obtaining a node pair feature of any pair of neighboring nodes on the ith-level element graph includes: calculating the node pair characteristics of the adjacent nodes consisting of the kth node and the pth node by adopting the following formula:
pair feature = [a + b, a - b, a × b, a/(b + x), a, b]
where pair feature denotes the node pair feature, and x is a positive number less than 1.
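As a hedged illustration of the formula, the sketch below treats a and b as equal-length numeric vectors, applies the sum, difference, product and quotient element-wise, and concatenates the six parts. The element-wise reading and the function name `pair_feature` are assumptions made for illustration.

```python
def pair_feature(a, b, x=0.01):
    """Concatenate [a + b, a - b, a * b, a / (b + x), a, b] element-wise."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    total = [ai + bi for ai, bi in zip(a, b)]
    diff = [ai - bi for ai, bi in zip(a, b)]
    prod = [ai * bi for ai, bi in zip(a, b)]
    # x keeps the denominator away from zero when an entry of b is 0.
    quot = [ai / (bi + x) for ai, bi in zip(a, b)]
    return total + diff + prod + quot + list(a) + list(b)


feature = pair_feature([1.0, 2.0], [3.0, 0.0])
```

For node features of length d the node pair feature has length 6d, so the example above yields 12 values.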
The following illustrates how node pair characteristics are derived from node characteristics of neighboring nodes.
In some embodiments of the present application, any pair of adjacent nodes in the ith layer element graph includes a kth node and a pth node, the kth node corresponds to a kth element of the ith layer element graph, and the pth node corresponds to a pth element of the ith layer element graph, where before obtaining node pair features of any pair of adjacent nodes on the ith layer element graph, the method further includes: acquiring the element positions, element categories and element image semantics of the kth element and the pth element to obtain the element attributes of the two elements (namely, the two elements corresponding to the pair of adjacent nodes); and obtaining node pair features of any pair of adjacent nodes on the ith layer element graph includes: combining the element attributes of the two elements to obtain the node pair feature. For example, the node pair feature calculation formula above is used to combine the two attribute features. It should be noted that the combination in some embodiments of the present application may also consist of directly concatenating the two element attributes, or of summing and subtracting the two element attributes.
For example, in some embodiments of the present application, acquiring the element positions, element categories and element image semantics of the kth element and the pth element to obtain the element attributes of the two elements includes: obtaining the element positions and element categories of the kth element and the pth element through a target detection model; cropping the image corresponding to the kth element from the interface image to be detected according to the element position to obtain a kth sub-picture, and cropping the image corresponding to the pth element to obtain a pth sub-picture; and inputting the kth sub-picture and the pth sub-picture into a feature extractor to obtain the element image semantics corresponding to the kth element and the element image semantics corresponding to the pth element. It will be appreciated that the same process may also be used to obtain the element attribute information of every element (including elements that do not form a pair of adjacent nodes with other elements); the details are not repeated here.
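The attribute acquisition steps above (detect, crop, extract) can be sketched as follows. The application does not name a particular target detection model or feature extractor, so `toy_detector` and `toy_feature_extractor` are hypothetical stand-ins, and the image is assumed to be a nested list of pixel values purely to keep the sketch self-contained.

```python
def crop(image, box):
    """Cut the sub-picture for a bounding box (x0, y0, x1, y1) out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def element_attributes(image, detector, feature_extractor):
    """Return position, category and image semantics for every detected element."""
    attributes = []
    for box, category in detector(image):
        sub_picture = crop(image, box)
        attributes.append({
            "location": box,
            "class": category,
            "img_feature": feature_extractor(sub_picture),
        })
    return attributes


def toy_detector(image):
    # Hypothetical stand-in for a target detection model: fixed boxes and categories.
    return [((0, 0, 2, 2), "button"), ((2, 0, 4, 2), "label")]


def toy_feature_extractor(sub_picture):
    # Hypothetical stand-in for a feature extractor: mean pixel value.
    pixels = [p for row in sub_picture for p in row]
    return [sum(pixels) / len(pixels)]


image = [[0, 0, 9, 9],
         [0, 0, 9, 9]]
attrs = element_attributes(image, toy_detector, toy_feature_extractor)
```

Passing the detector and the extractor in as callables keeps the sketch independent of any specific model; a real system would substitute trained models for the two stand-ins.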
It should be noted that the element categories include: a bordered image, a button, a label, or an editable input box.
The following describes, by way of example, the process of training the binary classification model to obtain the target binary classification model.
It should be noted that the obtained node pair feature set (the pair dataset, described below) is used as the input and the manually labeled data is used as the supervision information; both are fed to a binary classification model, which is trained on the prediction task of judging whether the two nodes of a pair of adjacent nodes share the same parent node. The supervision information indicates whether the two elements constituting the pair belong to the same parent node: if the two nodes belong to the same parent node, they are connected and the supervision information is true; otherwise, the supervision information is false.
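The supervision information can be pictured with the short sketch below, which derives a true/false label for every pair of adjacent nodes from a manual annotation that maps each element to its parent. The mapping `parent_of`, the parent names and the function name `label_pairs` are hypothetical and serve only as an illustration.

```python
def label_pairs(edges, parent_of):
    """Label each adjacent pair True if both elements share the same parent."""
    return [parent_of[u] == parent_of[v] for u, v in edges]


# Edges of a small element graph and a manual annotation of each element's parent.
edges = [(0, 1), (1, 2)]
parent_of = {0: "form", 1: "form", 2: "toolbar"}
labels = label_pairs(edges, parent_of)
```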
As shown in fig. 13, in some embodiments of the present application, before inputting the node pair feature set into the target binary classification model, the method further comprises: training a binary classification model 124 according to at least one training interface image and annotation data 125 to obtain the target binary classification model 123, wherein the annotation data marks whether each pair of adjacent nodes on the element graph of each hierarchy has a common parent node.
For example, training the binary classification model according to the at least one training interface image and the annotation data includes: performing element detection on any training interface image to obtain at least one original element; obtaining the attribute features of each original element of the at least one original element to obtain the element attributes of each original element, wherein the element attributes comprise element positions; constructing an initial training element graph according to the element positions (for the specific process, reference may be made to the construction of the first-layer element graph described above, or to the construction process described for the q-th node in the example below); and training the binary classification model according to the initial training element graph and the annotation data.
The method of constructing the initial training element map for training is the same as the method of constructing each layer element map described above.
For example, the initial training element graph includes a q-th node, and constructing the initial training element graph according to the original element attributes includes: taking the q-th node as a starting point, moving toward the boundary of the interface image to be detected in any one preset direction, and taking the first node found during the movement as an adjacent node of the q-th node; connecting the q-th node to the first node with an edge, wherein any two nodes confirmed as adjacent nodes are connected by one edge; and repeating these steps to obtain the initial training element graph. Alternatively, the constructing includes: acquiring all nodes within a preset distance of the q-th node to obtain at least one second node, wherein the at least one second node serves as an adjacent node of the q-th node; connecting each second node of the at least one second node to the q-th node with an edge, wherein any two nodes confirmed as adjacent nodes are connected by one edge; and repeating these steps to obtain the initial training element graph. Alternatively, the constructing includes: acquiring at least some of the nodes within a preset distance of the q-th node to obtain at least one third node, wherein each third node of the at least one third node serves as an adjacent node of the q-th node, the at least some nodes are located in a sector area, and the sector area is part of a circle centered on the q-th node with the preset distance as its radius; connecting each third node of the at least one third node to the q-th node with an edge, wherein any two nodes confirmed as adjacent nodes are connected by one edge; and repeating these steps to obtain the initial training element graph.
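The first of the three alternatives above (moving from the q-th node toward the image boundary and taking the first node found) can be sketched as follows. This is one possible reading, not the application's definitive procedure: elements are assumed to be axis-aligned bounding boxes (x0, y0, x1, y1), the preset directions are assumed to be left, right, up and down, and a node counts as "found" when it lies in the swept strip and overlaps the q-th node's extent perpendicular to the direction of movement. The names `first_hit` and `build_collision_graph` are hypothetical.

```python
def overlaps(a0, a1, b0, b1):
    """True if the open intervals (a0, a1) and (b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def first_hit(boxes, q, direction):
    """Index of the first box met when sweeping box q in `direction`, or None."""
    x0, y0, x1, y1 = boxes[q]
    best = None  # (gap, index) of the closest box found so far
    for j, (u0, v0, u1, v1) in enumerate(boxes):
        if j == q:
            continue
        if direction == "right" and u0 >= x1 and overlaps(y0, y1, v0, v1):
            gap = u0 - x1
        elif direction == "left" and u1 <= x0 and overlaps(y0, y1, v0, v1):
            gap = x0 - u1
        elif direction == "down" and v0 >= y1 and overlaps(x0, x1, u0, u1):
            gap = v0 - y1
        elif direction == "up" and v1 <= y0 and overlaps(x0, x1, u0, u1):
            gap = y0 - v1
        else:
            continue
        if best is None or gap < best[0]:
            best = (gap, j)
    return None if best is None else best[1]


def build_collision_graph(boxes, directions=("left", "right", "up", "down")):
    """Repeat the sweep for every node and every preset direction."""
    edges = set()
    for q in range(len(boxes)):
        for direction in directions:
            j = first_hit(boxes, q, direction)
            if j is not None:  # None means the image boundary was reached first
                edges.add((min(q, j), max(q, j)))
    return sorted(edges)


boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (0, 20, 10, 30)]
edges = build_collision_graph(boxes)
```

In the example, box 0 is linked to box 1 but not to box 2, because box 1 is the first element encountered when moving right from box 0.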
In order to enable the trained target binary classification model to have the capability of judging whether the adjacent nodes have the common parent node according to the node pair characteristics of the adjacent nodes on the element diagram of the current layer, the node pair characteristics of the adjacent nodes on the training image also need to be constructed when the binary classification model is trained, and the construction mode of the node pair characteristics can refer to the description above.
For example, training the binary classification model according to the initial training element graph and the annotation data includes: obtaining the node features of each node on the initial training element graph, wherein the node features are the element attributes of the corresponding elements, and the element attributes further include element category and element image semantics (for the acquisition of these features, refer to the acquisition of the corresponding parameters above); obtaining the node pair features of each pair of adjacent nodes on the initial training element graph according to the node features to obtain a node pair feature set (specifically, refer to the above process of obtaining node pair features, or to the process described below in conjunction with fig. 14); marking, for each pair of adjacent nodes, whether the pair has a common parent node, to obtain the annotation data; and training the binary classification model according to the annotation data and the node pair feature set to obtain the target binary classification model.
The process of obtaining the node pair feature data on the training interface image is described below with reference to fig. 14.
The node information of fig. 14 is the obtained node features, i.e., element attributes, of each node on the initial training element graph, where the node features include the element positions, element categories, and element image semantics of the corresponding elements.
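The final training step can be illustrated with a deliberately small sketch. The application does not specify the type of binary classification model, so logistic regression trained with plain stochastic gradient descent is swapped in here as an assumption, and the single made-up feature per pair (a normalized gap between two elements) stands in for a real node pair feature. The names `train_binary_classifier` and `has_common_parent` are hypothetical.

```python
import math


def train_binary_classifier(features, labels, lr=0.5, epochs=1000):
    """Fit a logistic-regression classifier with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of "same parent"
            error = p - (1.0 if y else 0.0)  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def has_common_parent(model, x):
    """Predict True when the model scores the pair above the 0.5 threshold."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x)) > 0.0


# Toy node pair feature set and the matching annotation data.
pair_features = [[0.1], [0.2], [0.9], [1.0]]
labels = [True, True, False, False]
model = train_binary_classifier(pair_features, labels)
predictions = [has_common_parent(model, x) for x in pair_features]
```

Any classifier that accepts a fixed-length feature vector and outputs a true/false decision could take the place of this model; the sketch only shows how the node pair feature set and the annotation data fit together.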
Step 5, collision mapping method
An initial training element graph (i.e., the initial graph g0 shown in fig. 14) is constructed by using a collision mapping method according to node position information in the node information.
Step 6, feature construction
And constructing the node characteristics of each node according to all information in the node information (or called node characteristics), namely element positions, element categories and element image semantic characteristics. The node characteristic of each node is composed in the form of: node = [ node.
Step 7, constructing a corrected element graph (namely the corrected element graph g1 of fig. 14), in which each node carries feature information, namely the node features of that node.
The node features constructed in step 6 are put into the corresponding element graph to obtain the corrected element graph g1 with node feature information. Each node in the graph can carry a node feature, and "put" here means: according to the node features of step 6, the node feature of each node in the initial graph g0 is set, thereby obtaining the corrected element graph g1. Compared with the initial graph g0, the corrected element graph g1 has no structural change; it only carries the additional node features.
Step 8, forming node pairs from connected nodes
According to the corrected element graph g1, every two nodes connected by an edge (namely, adjacent nodes) form a pair, and the feature of the pair (namely, the node pair feature) is constructed from the features of the two nodes, so that a pair dataset (namely, the node pair feature set) is obtained. The pair feature is constructed as follows:
a = [node1.class + node1.location + node1.img_feature]
b = [node2.class + node2.location + node2.img_feature]
pair feature = [a + b, a - b, a × b, a/(b + 0.01), a, b]
where a is the feature of node 1 and b is the feature of node 2, and the pair feature is built by concatenating the sum, difference, product and quotient of a and b with a and b themselves. Since all values in a and b are non-negative, each quotient is computed as a divided by (b + 0.01), so that the denominator is never 0.
Note that in some embodiments of the present application the node pair feature may also simply be pair feature = [a, b]. In other embodiments, the difference, the commonality, and the correlation between the two node features are captured, giving the node pair feature pair feature = [a + b, a - b, a × b, a/(b + 0.01), a, b]; compared with the first form, a node pair feature that expresses the difference, commonality, and correlation is easier for the model to learn.
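Step 8 as a whole can be sketched as below: the node feature of each node is built from its class, location and image feature, and one pair feature is produced for every edge of g1. Several points are assumptions made for illustration: the "+" inside the brackets for a and b is read as concatenation, the element category is encoded as a one-hot vector over a hypothetical category list, locations are normalized to [0, 1], and image features are non-negative. The names `node_feature`, `pair_feature` and `build_pair_dataset` are hypothetical.

```python
# Hypothetical category list, loosely following the element categories named above.
CATEGORIES = ["bordered_image", "button", "label", "input_box"]


def node_feature(node):
    """Concatenate the one-hot class, the location and the image feature."""
    one_hot = [1.0 if node["class"] == c else 0.0 for c in CATEGORIES]
    return one_hot + list(node["location"]) + list(node["img_feature"])


def pair_feature(a, b, eps=0.01):
    """[a + b, a - b, a * b, a / (b + eps), a, b], element-wise then concatenated."""
    parts = [
        [ai + bi for ai, bi in zip(a, b)],
        [ai - bi for ai, bi in zip(a, b)],
        [ai * bi for ai, bi in zip(a, b)],
        [ai / (bi + eps) for ai, bi in zip(a, b)],
        list(a),
        list(b),
    ]
    return [value for part in parts for value in part]


def build_pair_dataset(nodes, edges):
    """One pair feature per edge of the corrected element graph g1."""
    return [pair_feature(node_feature(nodes[u]), node_feature(nodes[v]))
            for u, v in edges]


nodes = [
    {"class": "button", "location": [0.1, 0.1, 0.2, 0.2], "img_feature": [0.5, 0.0]},
    {"class": "label", "location": [0.3, 0.1, 0.4, 0.2], "img_feature": [0.0, 0.7]},
]
edges = [(0, 1)]
dataset = build_pair_dataset(nodes, edges)
```

With 4 class entries, 4 location entries and 2 image feature entries, each node feature has length 10 and each pair feature has length 60.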
The relevant technical means of the node characteristics (i.e., the element attributes), the node pair characteristics (i.e., the node pair characteristic calculation formula), and the like involved in the training process may also refer to the relevant description of the above application process, and these contents are not described in detail in order to avoid repetition.
Referring to fig. 15, fig. 15 shows an apparatus for processing an interface image according to an embodiment of the present application. It should be understood that the apparatus corresponds to the method embodiment of fig. 2 and is capable of performing the steps of that method embodiment; for the specific functions of the apparatus, reference may be made to the description above, and detailed descriptions are omitted here where appropriate to avoid repetition. The apparatus comprises at least one software functional module that can be stored in a memory in the form of software or firmware or be built into the operating system of the apparatus, and the apparatus for processing the interface image comprises: an each-hierarchy parent node acquisition module 111 and a providing module 112.
The each-hierarchy parent node acquisition module 111 is configured to obtain the parent nodes of at least one hierarchy corresponding to the elements on the interface image to be detected, and to obtain an element structure tree from the parent nodes, where the parent nodes of any hierarchy are obtained from the node pair features of each pair of adjacent nodes on the element graph of that hierarchy.
The providing module 112 is configured to provide the element structure tree so that the robot locates elements on the interface image according to the element structure tree.
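How the parent nodes of one hierarchy might be derived from the per-pair decisions can be sketched as follows. The application states only that nodes with a common parent node are combined; merging them with a union-find structure is an assumption introduced here, and the name `group_into_parents` is hypothetical.

```python
def group_into_parents(num_nodes, edges, same_parent):
    """Merge nodes linked by edges predicted to share a parent (union-find)."""
    parent = list(range(num_nodes))

    def find(i):
        # Follow links to the group representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (u, v), share in zip(edges, same_parent):
        if share:
            parent[find(u)] = find(v)

    groups = {}
    for i in range(num_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
same_parent = [True, True, False, True]  # e.g. outputs of the binary classifier
groups = group_into_parents(5, edges, same_parent)
```

Each resulting group would become one parent node of the next layer, and repeating the graph construction and grouping on those parent nodes, layer by layer, would yield the element structure tree.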
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the method for processing an interface image, and will not be described in detail herein.
As shown in fig. 16, some embodiments of the present application provide an electronic device 500, which includes a memory 510, a processor 520, and a computer program stored on the memory 510 and executable on the processor, wherein when the processor 520 reads a program through a bus 530 and executes the program, the method as described in any embodiment of the method for processing an interface image can be implemented.
Processor 520 may process digital signals and may include various computing architectures, such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture that implements a combination of instruction sets. In some examples, processor 520 may be a microprocessor.
Memory 510 may be used to store instructions that are executed by processor 520 or data related to the execution of the instructions. The instructions and/or data may include code for performing some or all of the functions of one or more of the modules described in embodiments of the application. The processor 520 of the disclosed embodiment may be configured to execute the instructions in the memory 510 to implement the methods shown in the above-described method of processing an interface image. Memory 510 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

Claims (18)

1. A method of processing an interface image, the method comprising:
acquiring at least one hierarchy father node corresponding to an element on an interface image to be detected, and acquiring an element structure tree according to the father node, wherein the father node of any hierarchy is acquired through node pair characteristics of each adjacent node on an element graph of the layer, and the node pair characteristics are acquired by combining element attribute information of two elements corresponding to the adjacent nodes;
and providing the element structure tree so that the robot positions the elements on the interface to be detected according to the element hierarchical relation displayed by the element structure tree.
2. The method as claimed in claim 1, wherein the obtaining of at least one hierarchy level parent node corresponding to an element on the interface image to be detected comprises:
constructing an ith layer element graph, wherein the value range of i is an integer which is more than or equal to 1, and the ith layer element graph adopts one node to represent one element of the layer and adopts edges to represent adjacent nodes of the layer;
and if the ith layer element graph contains a plurality of nodes, acquiring a node combination with a common father node from the ith layer element graph to obtain the ith layer father node, wherein the node combination comprises at least one pair of adjacent nodes, and the pair of adjacent nodes corresponds to one edge on the ith layer element graph.
3. The method of claim 2, wherein the constructing the i-th layer element map comprises: and determining whether the two nodes need to be connected by edges according to the distribution characteristics of the nodes included in the ith layer element diagram on the interface image to be detected.
4. The method of claim 3, wherein the i-th layer element graph includes an m-th node, wherein,
the constructing of the ith layer element map comprises the following steps:
moving to the boundary of the interface image to be detected according to any one preset direction by taking the m-th node as a starting point, and taking a first node found in the moving process as an adjacent node of the m-th node;
and connecting the mth node with the first node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by one edge.
5. The method of claim 3, wherein the i-th layer element graph includes an m-th node, wherein,
the building of the ith layer element map comprises the following steps:
acquiring all nodes with a preset distance from the mth node to obtain at least one second node, wherein the at least one second node is used as an adjacent node of the mth node;
and connecting each second node in the at least one second node with the mth node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by adopting an edge.
6. The method of claim 3, wherein the i-th layer element graph includes an m-th node, wherein,
the constructing of the ith layer element map comprises the following steps:
acquiring at least part of nodes which are away from the mth node by a preset distance to obtain at least one third node, wherein each third node in the at least one third node is used as an adjacent node of the mth node, and the at least part of nodes are located in a sector area;
and connecting each third node in the at least one third node with the mth node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by adopting an edge.
7. The method of any one of claims 2 to 6,
the obtaining a node combination with a common father node from the ith element graph to obtain an ith father node includes:
obtaining node pair characteristics of any pair of adjacent nodes on the ith layer element graph to obtain a node pair characteristic set, wherein the node pair characteristics are characterized by element attributes of two elements corresponding to any pair of adjacent nodes;
and inputting the node pair feature set into a target two-classification model, and determining whether each adjacent node on the ith layer element graph has a common father node or not through the target two-classification model to obtain the ith layer father node.
8. The method of claim 7,
the element attributes include: element location, element category, and element image semantics; or,
the node pair characteristics comprise at least one of difference characteristics, common characteristics or correlation characteristics, wherein the difference characteristics are used for describing the difference of any pair of adjacent nodes, the common characteristics are used for describing the commonality of any pair of adjacent nodes, and the correlation characteristics are used for describing the correlation of any pair of adjacent nodes.
9. The method of claim 8, wherein the arbitrary pair of neighboring nodes includes a kth node and a pth node, the node characteristic of the kth node is characterized as a, the node characteristic of the pth node is characterized as b, the node characteristic is characterized by an element attribute of the corresponding element, wherein,
the obtaining node pair characteristics of any pair of adjacent nodes on the ith layer element graph comprises:
calculating the node pair characteristics of the adjacent nodes consisting of the kth node and the pth node by adopting the following formula:
pair feature = [a + b, a - b, a × b, a/(b + x), a, b]
wherein pair feature characterizes the node pair feature, and x is a positive number less than 1.
10. The method of claim 7, wherein any pair of adjacent nodes in the i-th level element graph includes a kth node corresponding to a kth element of the i-th level element graph and a pth node corresponding to a pth element of the i-th level element graph, wherein,
before obtaining node pair characteristics of any pair of adjacent nodes on the ith layer element graph, the method further comprises:
acquiring element positions, element categories and element image semantics of the kth element and the pth element to obtain element attributes of the two elements;
the obtaining node pair characteristics of any pair of adjacent nodes on the ith layer element graph comprises:
and combining the element attributes of the two elements to obtain the node pair characteristic.
11. The method of claim 10, wherein the obtaining of the element positions, element categories, and element image semantics of the kth element and the pth element to obtain element attributes of the two elements comprises:
obtaining the element positions and element categories of the kth element and the pth element through a target detection model;
intercepting an image corresponding to the kth element from an interface image to be detected according to the element position to obtain a kth sub-picture, and intercepting an image corresponding to the pth element to obtain a pth sub-picture;
and inputting the kth sub-picture and the pth sub-picture into a feature extractor to obtain an element image semantic corresponding to the kth element and obtain an element image semantic corresponding to the pth element.
12. The method of claim 8, wherein the element categories include: bordered images, buttons, tabs, or editable input boxes.
13. The method of claim 7, wherein prior to said inputting the node pair feature set into a target two classification model, the method further comprises:
and training a two-classification model according to at least one training interface image and marking data to obtain the target two-classification model, wherein the marking data is used for marking whether each adjacent node on the element graph of each hierarchy has a common father node.
14. The method of claim 13, wherein training the classification model based on the at least one training interface image and the annotation data comprises:
performing element detection on any training interface image in the at least one training interface image to obtain at least one element;
acquiring attribute characteristics of each element in the at least one element to obtain an element attribute, wherein the element attribute comprises an element position;
constructing an initial training element graph according to the element positions;
and training the two classification models according to the initial training element graph and the labeling data.
15. The method of claim 14, comprising a qth node on the initial training element graph, wherein,
the constructing of the initial training element graph according to the element positions comprises:
moving towards the boundary of the interface image to be detected according to any one preset direction by taking the q-th node as a starting point, and taking a first node found in the moving process as an adjacent node of the q-th node;
connecting the q-th node with the first node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by using an edge;
repeating the steps to obtain the initial training element diagram;
or,
acquiring all nodes within a preset distance of the q-th node to obtain at least one second node, wherein the at least one second node is used as an adjacent node of the q-th node;
connecting each second node of the at least one second node with the q-th node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by adopting an edge;
repeating the steps to obtain the initial training element diagram;
or,
acquiring at least part of nodes which are away from the q-th node by a preset distance to obtain at least one third node, wherein each third node in the at least one third node is used as an adjacent node of the q-th node, the at least part of nodes are positioned in a sector area, and the sector area belongs to a part of area of a circle which takes the q-th node as a circle center and takes the preset distance as a radius;
connecting each third node of the at least one third node with the q-th node by adopting an edge, wherein two nodes confirmed as adjacent nodes are connected by adopting an edge;
and repeating the steps to obtain the initial training element diagram.
16. The method of claim 15,
the training of the binary classification model according to the initial training element graph and the labeled data comprises:
obtaining a node feature of each node on the initial training element graph, wherein the node feature is the element attributes of the corresponding element, and the element attributes further include: element category and element image semantics;
obtaining, according to the node features, a node pair feature of each pair of adjacent nodes on the initial training element graph to obtain a node pair feature set;
labeling, for each pair of adjacent nodes, whether the pair has a common parent node, to obtain the labeled data;
and training the binary classification model according to the labeled data and the node pair feature set to obtain the target binary classification model.
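The training steps of this claim can be sketched as follows. The claim does not say how a node pair feature is derived from two node features, nor which binary classifier is used, so both choices here are assumptions: the pair feature is the element-wise absolute difference of the two node feature vectors, and the classifier is a small logistic regression trained by stochastic gradient descent. The function names and the numeric encoding of node features are hypothetical.

```python
import math

def pair_features(node_feats, edges):
    """One feature vector per adjacent pair: |feat(a) - feat(b)| (assumed design)."""
    return [[abs(u - v) for u, v in zip(node_feats[a], node_feats[b])]
            for a, b in edges]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Stand-in binary classifier: logistic regression fitted with plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            g = p - yi                      # gradient of the log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Return 1 if the pair is predicted to share a common parent node, else 0."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Toy data: each node feature is [x, y, class_id], all values hypothetical.
node_feats = {0: [0.0, 0.2, 1.0], 1: [0.1, 0.2, 1.0],
              2: [0.9, 0.2, 0.0], 3: [1.0, 0.2, 0.0]}
edges = [(0, 1), (1, 2), (2, 3)]  # adjacent node pairs of the element graph
labels = [1, 0, 1]                # 1 = the pair has a common parent node

X = pair_features(node_feats, edges)
model = train_logistic(X, labels)
```

In this toy setup, pairs whose elements are close together and of the same class are labeled as sharing a parent, so the difference-based pair feature separates the two labels with a linear boundary. A real system would replace both the feature construction and the classifier with whatever the training data supports.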
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 16.
18. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program is adapted to implement the method of any of claims 1-16.
CN202211205653.4A 2022-09-30 2022-09-30 Method, medium and electronic device for processing interface image Active CN115269107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211205653.4A CN115269107B (en) 2022-09-30 2022-09-30 Method, medium and electronic device for processing interface image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211205653.4A CN115269107B (en) 2022-09-30 2022-09-30 Method, medium and electronic device for processing interface image

Publications (2)

Publication Number Publication Date
CN115269107A CN115269107A (en) 2022-11-01
CN115269107B CN115269107B (en) 2022-12-27

Family

ID=83757827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211205653.4A Active CN115269107B (en) 2022-09-30 2022-09-30 Method, medium and electronic device for processing interface image

Country Status (1)

Country Link
CN (1) CN115269107B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201719862D0 (en) * 2017-11-29 2018-01-10 Yellow Line Parking Ltd Hierarchical image interpretation system
CN112446647A (en) * 2020-12-14 2021-03-05 上海众源网络有限公司 Abnormal element positioning method and device, electronic equipment and storage medium
CN112891945B (en) * 2021-03-26 2022-11-18 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN114637448B (en) * 2022-03-04 2023-04-21 上海弘玑信息技术有限公司 Data processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115269107A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
Mahdavi et al. ICDAR 2019 CROHME+ TFD: Competition on recognition of handwritten mathematical expressions and typeset formula detection
US10191889B2 (en) Systems, apparatuses and methods for generating a user interface by performing computer vision and optical character recognition on a graphical representation
Cliche et al. Scatteract: Automated extraction of data from scatter plots
JP4781924B2 (en) White space graph and tree for content adaptive scaling of document images
US8737739B2 (en) Active segmentation for groups of images
US8243988B1 (en) Clustering images using an image region graph
CN115268719B (en) Method, medium and electronic device for positioning target element on interface
US20070038937A1 (en) Method, Program, and Device for Analyzing Document Structure
CN106202514A (en) Accident based on Agent is across the search method of media information and system
CN112463976B (en) Knowledge graph construction method taking crowd sensing task as center
Patnaik et al. Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks
WO2022005663A1 (en) Computerized information extraction from tables
Le Bodic et al. An integer linear program for substitution-tolerant subgraph isomorphism and its use for symbol spotting in technical drawings
Chen et al. Towards complete icon labeling in mobile applications
US11600088B2 (en) Utilizing machine learning and image filtering techniques to detect and analyze handwritten text
Healey et al. Interest driven navigation in visualization
Li et al. A method based on an adaptive radius cylinder model for detecting pole-like objects in mobile laser scanning data
CN111949306A (en) Pushing method and system supporting fragmented learning of open-source project
Manandhar et al. Magic layouts: Structural prior for component detection in user interface designs
CN114387608A (en) Table structure identification method combining convolution and graph neural network
Alahmadi VID2XML: Automatic Extraction of a Complete XML Data From Mobile Programming Screencasts
CN115546465A (en) Method, medium and electronic device for positioning element position on interface
CN115269107B (en) Method, medium and electronic device for processing interface image
US20230410543A1 (en) List and tabular data extraction system and method
Nieddu et al. In Codice Ratio: A crowd-enabled solution for low resource machine transcription of the Vatican Registers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant