CN113177168B

CN113177168B - Positioning method based on Web element attribute characteristics

Info

Publication number: CN113177168B
Application number: CN202110474540.3A
Authority: CN
Inventors: 刘春刚; 许凯; 赵东旭; 田永军
Original assignee: Shanghai Yunda Information Technology Co ltd
Current assignee: Shanghai Yunda Information Technology Co ltd
Priority date: 2021-04-29
Filing date: 2021-04-29
Publication date: 2023-12-01
Anticipated expiration: 2041-04-29
Also published as: CN113177168A

Abstract

The application discloses a positioning method based on Web element attribute characteristics, comprising the following steps: step one: determining the attribute names to be verified according to the type of the target element; step two: acquiring the characteristic values of the target element; step three: constructing an XPath according to the element characteristic values. The method uses the characteristic attributes of the target element as positioning information. These attributes are independent of the DOM structure and the page display style and mostly indicate the business purpose of the element (for example, a common submit button carries the characteristic attribute value type="submit"), so they remain stable after the DOM structure or the page style changes. Different characteristic attributes are customized and read for different element types, and a dynamic characteristic attribute extraction algorithm obtains the characteristic attribute values that are unique to the target element on the page, thereby achieving more robust element positioning.

Description

Positioning method based on Web element attribute characteristics

Technical Field

The application relates to the technical field of element positioning, in particular to a positioning method based on Web element attribute characteristics.

Background

RPA (Robotic Process Automation) is a working mode that replaces manual work and frees people from repetitive, redundant, rule-based workflows. RPA is non-invasive automation: it locates target elements through various recognition techniques rather than by injecting code.

Accurately positioning elements is a basic core function of an RPA product. In the field of Web automation, traditional Web element positioning methods mostly use, as the characteristic value, the traversal path between the target element and the root node of the page DOM structure (for example, HTML -> BODY -> DIV -> DIV[2] -> SPAN). Modern Web pages display increasingly complex content with increasingly rich interaction, and the current mainstream Web frameworks (React, Vue, Angular) adopt Virtual DOM technology to improve performance. This technology achieves high-performance interactive display by dynamically adding, deleting, and modifying DOM elements. Such dynamic additions and deletions change the traversal path between the root node and the target element, so the element cannot be positioned correctly during page automation and the automated operation cannot be completed. In addition, as Web technology matures, Web development is gradually being standardized and componentized. To keep the experience consistent across browsers, a component presents the same appearance and interaction in different browsers (Chrome/Firefox/Edge), yet the DOM elements and styles rendered for the same component differ between browser types and even between release versions of the same browser. As a result, replacing or upgrading the browser can prevent page automation from positioning elements accurately, and the automated operation cannot be completed.
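The fragility of a traversal path can be illustrated with a minimal Python sketch. It is not part of the original disclosure: the helper names `traversal_path` and `path_to_span` and the two sample pages are hypothetical, and the standard-library `xml.etree.ElementTree` parser stands in for a browser DOM on well-formed markup.

```python
import xml.etree.ElementTree as ET


def traversal_path(root, target):
    """Return the positional path from root to target, e.g. html/body/div/div[2]/span."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # Add a 1-based index only when siblings share the tag, as in DIV[2].
        if len(same_tag) > 1:
            position = next(i for i, c in enumerate(same_tag, 1) if c is node)
            steps.append(f"{node.tag}[{position}]")
        else:
            steps.append(node.tag)
        node = parent
    steps.append(root.tag)
    return "/".join(reversed(steps))


def path_to_span(markup):
    """Parse the markup and return the traversal path of its first <span>."""
    root = ET.fromstring(markup)
    return traversal_path(root, root.find(".//span"))


BEFORE = "<html><body><div><div/><div><span>42</span></div></div></body></html>"
# Same page after one extra <div> is inserted dynamically before the target's parent.
AFTER = "<html><body><div><div/><div/><div><span>42</span></div></div></body></html>"

path_before = path_to_span(BEFORE)
path_after = path_to_span(AFTER)
print(path_before)
print(path_after)
```

The two printed paths are html/body/div/div[2]/span and html/body/div/div[3]/span: a single inserted sibling is enough to invalidate a path recorded earlier, even though the target element itself is unchanged.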

Therefore, a positioning method based on Web element attribute characteristics is proposed to solve the above problems.

Disclosure of Invention

The application aims to provide a positioning method based on Web element attribute characteristics, so as to solve the problem described in the background art that, with most existing positioning methods, replacing or upgrading the browser prevents page automation from positioning elements accurately and completing the automated operation.

In order to achieve the above purpose, the present application provides the following technical solution: a positioning method based on Web element attribute characteristics, in which characteristic attribute values are preferentially selected according to the type of element and a target element XPath query string is constructed from those values, the method comprising the following steps:

step one: determining attribute names to be verified according to different types of target elements;

step two: acquiring a characteristic value of a target element;

step three: constructing an XPath according to the element characteristic values.

Preferably, in step one, when determining the attribute names to be verified, the element attributes are first divided into general attributes and proprietary attributes, wherein:

the general attributes are attributes that are common to HTML elements and frequently used, specifically: ID, the element ID field, which specifies a unique identifier for the element; Name, the element Name field, which specifies the name of the element; Title, the element Title field, which is commonly used to explain the purpose of the element and is the text shown in the tooltip when the mouse hovers over the element; Text, the text inside the element;

the proprietary attributes are attributes read specifically for each element type, as follows:

Input HTMLElement proprietary attributes: Type, which indicates whether the Input element is used for entering content or for an operation such as form submission;

Img HTMLElement proprietary attributes: Src, which specifies the file path of the embedded picture; Alt, which specifies a text description of the image, used as alternative text when the picture cannot be displayed or read aloud to the user by a screen reader;

Iframe HTMLElement proprietary attributes: Src, the address of the embedded page;

A HTMLElement proprietary attributes: Href, which contains the URL or URL fragment that the hyperlink points to; Target, which specifies where to display the linked resource;

finally, the element attribute array AttrList is obtained from the general attributes and the proprietary attributes of the target element.
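Step one can be sketched as a simple lookup. The following Python sketch is illustrative only: the constant and function names are hypothetical, the lowercase attribute spellings follow HTML convention rather than the capitalized names used above, and "text" is treated as a pseudo-attribute standing for the text inside the element.

```python
# General attributes checked for every element type ("text" is the inner text).
GENERAL_ATTRIBUTES = ["id", "name", "title", "text"]

# Proprietary attributes read only for specific element types.
PROPRIETARY_ATTRIBUTES = {
    "input": ["type"],
    "img": ["src", "alt"],
    "iframe": ["src"],
    "a": ["href", "target"],
}


def build_attr_list(tag_name):
    """Return the AttrList for an element: general attributes, then proprietary ones."""
    return GENERAL_ATTRIBUTES + PROPRIETARY_ATTRIBUTES.get(tag_name.lower(), [])


print(build_attr_list("INPUT"))
print(build_attr_list("div"))
```

An element type without an entry in the table, such as div, simply receives the general attributes. Placing the general attributes first is an assumption of this sketch; the text above does not fix an order within the AttrList.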

Preferably, the specific process of acquiring the characteristic values of the target element in step two comprises:

(1) Constructing a characteristic attribute map AMap for the target element and an integer Count, where Count records the number of elements on the whole page that can be matched by the current AMap, and its initial value is infinity;

(2) Taking the attribute names K from the AttrList one by one, acquiring the value V of attribute K on the target element, and adding the pair to the AMap; searching the whole page for all elements that match every attribute in the AMap and recording the number N of elements that currently satisfy the condition; if N is not smaller than Count, adding attribute K does not position the target element any more precisely, so K and its value V are deleted from the AMap; otherwise, Count is updated to N; this step is repeated until all attributes in the AttrList have been traversed or the attributes in the AMap match only the target element;

(3) Checking Count: if it is 1, the target element can be uniquely positioned through the AMap, and the attribute values in the AMap are the characteristic values of the target element; if it is greater than 1, other elements on the page also satisfy the attribute values in the AMap, and a further characteristic value is needed to uniquely identify the target element; in that case a depth-first traversal of the DOM tree is used to obtain the list ElementArray of elements that satisfy the attribute values in the AMap, the position Index of the target element in ElementArray is recorded, and the Index together with all attribute values in the AMap serves as the characteristic value of the target element.
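Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, `xml.etree.ElementTree` stands in for the browser DOM, candidates are restricted to elements with the same tag as the target, attributes missing on the target are skipped, and Index is 1-based.

```python
import math
import xml.etree.ElementTree as ET


def attr_value(elem, name):
    """Read attribute `name`; 'text' is treated as the element's own leading text."""
    if name == "text":
        return (elem.text or "").strip() or None
    return elem.get(name)


def find_matches(root, tag, amap):
    """Depth-first (document order) list of `tag` elements matching every pair in amap."""
    return [
        e for e in root.iter(tag)
        if all(attr_value(e, k) == v for k, v in amap.items())
    ]


def extract_features(root, target, attr_list):
    """Greedy selection of steps (1)-(3); returns (amap, count, index)."""
    amap, count = {}, math.inf              # step (1)
    for k in attr_list:                     # step (2)
        v = attr_value(target, k)
        if v is None:
            continue                        # assumption: skip attributes the target lacks
        amap[k] = v
        n = len(find_matches(root, target.tag, amap))
        if n >= count:
            del amap[k]                     # K did not narrow the match set
        else:
            count = n
        if count == 1:
            break                           # the target is uniquely matched
    index = None
    if count != 1:                          # step (3): position within ElementArray
        element_array = find_matches(root, target.tag, amap)
        count = len(element_array)
        index = next(i for i, e in enumerate(element_array, 1) if e is target)
    return amap, count, index


PAGE = """
<form>
  <input type="text" name="user"/>
  <input type="submit" name="go"/>
  <input type="submit" name="go"/>
</form>
"""
root = ET.fromstring(PAGE)
inputs = root.findall("input")
attr_list = ["id", "name", "title", "text", "type"]
unique_result = extract_features(root, inputs[0], attr_list)
ambiguous_result = extract_features(root, inputs[2], attr_list)
print(unique_result)
print(ambiguous_result)
```

For the first input, the name attribute alone is unique, so the loop stops with Count equal to 1 and no Index. For the third input, name narrows the candidates to two, type does not narrow them further and is removed again, and the element is finally distinguished by its Index of 2 in the ElementArray.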

Preferably, in step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern; XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers.

Preferably, the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two; because this XPath is a positioning characteristic value derived from the attributes of the element, it can still stably position the target element after the HTML page structure changes.
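One possible way to turn the characteristic values into an XPath string is sketched below in Python. The function names, the use of `normalize-space(.)` for the Text attribute, and the parenthesized form for the Index are assumptions of this sketch; the disclosure does not prescribe a concrete XPath shape.

```python
def xpath_literal(value):
    """Quote a string for XPath 1.0, which has no escape character."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: join single-quoted pieces with a literal "'".
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def build_xpath(tag, amap, index=None):
    """Build an XPath string from the feature map and an optional 1-based index."""
    predicates = []
    for name, value in amap.items():
        if name == "text":
            predicates.append(f"[normalize-space(.)={xpath_literal(value)}]")
        else:
            predicates.append(f"[@{name}={xpath_literal(value)}]")
    xpath = f"//{tag}{''.join(predicates)}"
    # Parentheses make the index apply to the whole match list in document
    # order rather than to each element's position among its siblings.
    return f"({xpath})[{index}]" if index is not None else xpath


print(build_xpath("input", {"type": "submit", "name": "go"}))
print(build_xpath("input", {"name": "go"}, 2))
```

The first call yields //input[@type='submit'][@name='go'] and the second yields (//input[@name='go'])[2]. In a browser, such a string could be evaluated with the standard `document.evaluate` API.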

Preferably, in steps one and two an element with distinctive characteristics may be selected as an anchor element for which the attribute characteristic set is generated, and in step three an XPath that uniquely positions the target element is generated according to the position of the target element relative to the anchor element.
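Anchor-based positioning might be expressed with XPath axes, as in the following Python sketch. The function name, its parameters, and the choice of axes are hypothetical illustrations; the disclosure states only that the XPath is generated from the position of the target element relative to the anchor element.

```python
# Standard XPath 1.0 axes that relate one element to another.
XPATH_AXES = {
    "following-sibling", "preceding-sibling", "following", "preceding",
    "descendant", "ancestor", "parent", "child",
}


def build_anchor_xpath(anchor_tag, anchor_attrs, axis, target_tag, position=1):
    """Locate the anchor by its attributes, then step along `axis` to the target."""
    if axis not in XPATH_AXES:
        raise ValueError(f"unsupported axis: {axis}")
    predicates = ""
    for name, value in anchor_attrs.items():
        if "'" in value:
            # Simplification of this sketch: single quotes are not escaped.
            raise ValueError("attribute values containing ' are not handled here")
        predicates += f"[@{name}='{value}']"
    return f"//{anchor_tag}{predicates}/{axis}::{target_tag}[{position}]"


anchor_xpath = build_anchor_xpath("label", {"for": "email"}, "following-sibling", "input")
print(anchor_xpath)
```

The call produces //label[@for='email']/following-sibling::input[1], which selects the first input sibling after a label whose attributes are distinctive. In XPath 1.0 a position of 1 denotes the nearest element on both forward and reverse axes, so the same default also works for preceding-sibling or ancestor.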

Compared with the prior art, the application has the following beneficial effects: the positioning method uses the characteristic attributes of the target element as positioning information; these attributes are independent of the DOM structure and the page display style and mostly indicate the business purpose of the element (for example, a common submit button carries the characteristic attribute value type="submit"), so they remain stable after the DOM structure or the page style changes.

Drawings

FIG. 1 is a schematic flow chart of acquiring the characteristic attribute names of the target element according to the application;

FIG. 2 is a schematic flow chart of extracting the characteristic attributes of the target element with the dynamic characteristic attribute extraction algorithm of the application.

Detailed Description

The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments, and all other embodiments obtained by those skilled in the art without making creative efforts based on the embodiments of the present application are included in the protection scope of the present application.

Referring to FIGS. 1-2, the present application provides a technical solution: a positioning method based on Web element attribute characteristics, in which characteristic attribute values are preferentially selected according to the type of element and a target element XPath query string is constructed from those values, the method comprising the following steps:

step two: acquiring a characteristic value of a target element;

Further, in step one, when determining the attribute names to be verified, the element attributes are first divided into general attributes and proprietary attributes, wherein:

Iframe HTMLElement proprietary attributes: Src, the address of the embedded page;

The specific process of obtaining the characteristic value of the target element in the second step further comprises the following steps:

(2) Taking the attribute names K from the AttrList one by one, acquiring the value V of attribute K on the target element, and adding the pair to the AMap; searching the whole page for all elements that match every attribute in the AMap and recording the number N of elements that currently satisfy the condition; if N is not smaller than Count, adding attribute K does not position the target element any more precisely, so K and its value V are deleted from the AMap; otherwise, Count is updated to N; this step is repeated until all attributes in the AttrList have been traversed or the attributes in the AMap match only the target element;

In step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern; XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers.

In the application, the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two; because this XPath is a positioning characteristic value derived from the attributes of the element, it can still stably position the target element after the HTML page structure changes.
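The stability claim can be checked on a small example. The Python sketch below is illustrative: the two page fragments and the helper `locate` are hypothetical, and the limited XPath subset of the standard-library `xml.etree.ElementTree` module stands in for a browser's XPath engine.

```python
import xml.etree.ElementTree as ET

# ElementTree needs the relative form ".//"; a browser would accept "//input[...]".
ATTRIBUTE_XPATH = ".//input[@type='submit'][@name='go']"

ORIGINAL = (
    "<html><body><form>"
    "<input type='text' name='q'/><input type='submit' name='go'/>"
    "</form></body></html>"
)
# The same form after extra wrapper elements are added around every control.
RESTRUCTURED = (
    "<html><body><div class='card'><form>"
    "<div><input type='text' name='q'/></div>"
    "<div><span><input type='submit' name='go'/></span></div>"
    "</form></div></body></html>"
)


def locate(markup, xpath):
    """Parse the markup and return every element matched by the XPath."""
    return ET.fromstring(markup).findall(xpath)


match_counts = [len(locate(page, ATTRIBUTE_XPATH)) for page in (ORIGINAL, RESTRUCTURED)]
print(match_counts)
```

Both pages yield exactly one match, so the printed list is [1, 1]: the attribute-based XPath still finds the submit button although every ancestor path to it has changed.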

In step three, an XPath that uniquely positions the target element is generated according to the position of the target element relative to the anchor element.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that in the description of the present specification, descriptions of terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples. Because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: elements described as separate components may or may not be physically separate, and components shown as elements may or may not be physical elements; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without undue burden.

In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A positioning method based on Web element attribute characteristics, characterized in that: characteristic attribute values are preferentially selected according to the type of element, and a target element XPath query string is constructed from the characteristic attribute values, the method comprising the following steps:

step two: acquiring a characteristic value of a target element;

step three: constructing XPath according to the element characteristic values;

in the first step, when determining the attribute name to be verified, the element attribute is first divided into a general attribute and a proprietary attribute, wherein:

Iframe HTMLElement proprietary attributes: Src, the address of the embedded page;

finally, the element attribute array AttrList is obtained from the general attributes and the proprietary attributes of the target element;

the specific process for obtaining the characteristic value of the target element in the second step comprises the following steps:

(3) Checking Count: if it is 1, the target element can be uniquely positioned through the AMap, and the attribute values in the AMap are the characteristic values of the target element; if it is greater than 1, other elements on the page also satisfy the attribute values in the AMap, and a further characteristic value is needed to uniquely identify the target element; in that case a depth-first traversal of the DOM tree is used to obtain the list ElementArray of elements that satisfy the attribute values in the AMap, the position Index of the target element in ElementArray is recorded, and the Index together with the attribute values in the AMap serves as the characteristic value of the target element;

in step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern; XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers;

the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two, wherein the XPath is a positioning characteristic value derived from the attributes of the element and can still stably position the target element after the HTML page structure changes.

2. The positioning method based on Web element attribute characteristics according to claim 1, characterized in that: in step one and step two, an element with distinctive characteristics is selected as an anchor element and an attribute characteristic set is generated, and in step three an XPath that uniquely positions the target element is generated according to the position of the target element relative to the anchor element.