CN113177168B - Positioning method based on Web element attribute characteristics - Google Patents

Positioning method based on Web element attribute characteristics

Info

Publication number
CN113177168B
Authority
CN
China
Prior art keywords
attribute
target
elements
characteristic
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110474540.3A
Other languages
Chinese (zh)
Other versions
CN113177168A (en)
Inventor
刘春刚
许凯
赵东旭
田永军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunda Information Technology Co ltd
Original Assignee
Shanghai Yunda Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunda Information Technology Co ltd
Priority to CN202110474540.3A
Publication of CN113177168A
Application granted
Publication of CN113177168B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a positioning method based on Web element attribute characteristics, comprising the following steps: step one, determining the attribute names to be verified according to the type of the target element; step two, acquiring the characteristic values of the target element; step three, constructing an XPath according to the element characteristic values. The method uses the characteristic attributes of the target element as positioning information. These attributes are independent of the DOM structure and the page display style, and mostly indicate the business purpose of the element (for example, a common submit button carries type="submit"), so they remain stable after the DOM structure or the page style changes. Different characteristic attributes are read for different types of elements, and a dynamic characteristic attribute extraction algorithm obtains the attribute values that uniquely identify the target element on the page, thereby achieving more robust element positioning.

Description

Positioning method based on Web element attribute characteristics
Technical Field
The application relates to the technical field of element positioning, in particular to a positioning method based on Web element attribute characteristics.
Background
RPA (Robotic Process Automation) is a way of working that replaces manual labor and frees people from repetitive, redundant, rule-based workflows. RPA is non-invasive automation: it locates target elements with various recognition techniques rather than by injecting code.
In RPA products, accurately locating elements is a basic core function. In the field of Web automation, traditional Web element positioning methods mostly use the traversal path from the root node to the target element in the page DOM structure as the characteristic value (for example, HTML -> BODY -> DIV -> DIV[2] -> SPAN). Modern Web pages display increasingly complex content with increasingly rich interaction, and the current mainstream Web frameworks (React, Vue, Angular) use Virtual DOM technology to improve performance: they achieve high-performance interactive display by dynamically adding, deleting, and modifying DOM elements. Dynamically adding and deleting DOM elements changes the traversal path between the root node and the target element, so page automation can no longer locate the element correctly and the automated operation cannot be completed. In addition, as Web technology matures, Web development is becoming standardized and componentized. To ensure a consistent cross-browser experience, that is, the same appearance and interaction in different browsers (Chrome/Firefox/Edge), the DOM elements and styles rendered for the same component differ between browser types and even between release versions of the same browser. As a result, replacing the browser or upgrading its version can prevent page automation from locating elements accurately, and the automated operation cannot be completed.
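The fragility of a structural path can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in for a browser DOM; the two page snippets and the `id='total'` attribute are hypothetical and are not taken from the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before and after a framework inserts one extra <div>.
BEFORE = ("<html><body><div><div/>"
          "<div><span id='total'>42</span></div></div></body></html>")
AFTER = ("<html><body><div><div/><div/>"
         "<div><span id='total'>42</span></div></div></body></html>")

# Structural path, i.e. HTML -> BODY -> DIV -> DIV[2] -> SPAN.
STRUCTURAL = "./body/div/div[2]/span"
# Attribute-based path: depends only on the element's own attribute.
BY_ATTRIBUTE = ".//span[@id='total']"


def locate(page: str, path: str):
    """Return the first element matching path, or None if nothing matches."""
    return ET.fromstring(page).find(path)


print(locate(BEFORE, STRUCTURAL) is not None)   # structural path works at first
print(locate(AFTER, STRUCTURAL) is not None)    # and breaks after the insertion
print(locate(AFTER, BY_ATTRIBUTE) is not None)  # attribute path still works
```

In this sketch the inserted `<div>` shifts the target out of position 2, so the structural path no longer finds it, while the attribute-based path is unaffected.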
Therefore, a positioning method based on Web element attribute characteristics is proposed to solve the above problems.
Disclosure of Invention
The application aims to provide a positioning method based on Web element attribute characteristics, in order to solve the problem described in the background art: with most existing positioning methods, replacing the browser or upgrading its version prevents page automation from locating elements accurately, so the automated operation cannot be completed.
To achieve the above purpose, the present application provides the following technical solution: a positioning method based on Web element attribute characteristics, which preferentially selects characteristic attribute values according to the element type and uses them to construct an XPath query string for the target element. The method comprises the following steps:
Step one: determining the attribute names to be verified according to the type of the target element;
Step two: acquiring the characteristic values of the target element;
Step three: constructing an XPath according to the element characteristic values.
Preferably, in step one, when determining the attribute names to be verified, the element attributes are first divided into generic attributes and element-specific attributes, where:
the generic attributes are attributes that are common to HTML elements and frequently used, specifically: ID, the element ID attribute, which specifies a unique identifier for the element; Name, the element name attribute, which specifies the name of the element; Title, the element title attribute, which is commonly used to explain the purpose of the element and is the text shown in the tooltip when the mouse hovers over the element; Text, the text inside the element;
the element-specific attributes are attributes read according to the element type, specifically:
specific to the input element (HTMLElement): Type, which indicates whether the input element is used for entering content or for an operation such as form submission;
specific to the img element (HTMLElement): Src, which specifies the file path of the embedded picture; Alt, which specifies a text description of the image, used as alternative text when the picture cannot be displayed or read aloud to the user by a screen reader;
specific to the iframe element (HTMLElement): Src, the address of the embedded page;
specific to the a element (HTMLElement): Href, the URL or URL fragment that the hyperlink points to; Target, which specifies where to display the linked resource;
finally, the element attribute array AttrList is obtained from the generic attributes and the element-specific attributes of the target element.
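Step one can be sketched as a simple lookup. The attribute names below follow the lists above. The function name `build_attr_list`, the lowercase spelling of the names, and the ordering (generic attributes first) are assumptions made for illustration, not details given in the application.

```python
from typing import Dict, List

# Generic attributes checked for every element, in the order listed in the text.
# "text" is a pseudo-attribute standing for the text inside the element.
GENERIC_ATTRS: List[str] = ["id", "name", "title", "text"]

# Element-specific attributes, keyed by lowercase tag name.
SPECIFIC_ATTRS: Dict[str, List[str]] = {
    "input": ["type"],
    "img": ["src", "alt"],
    "iframe": ["src"],
    "a": ["href", "target"],
}


def build_attr_list(tag: str) -> List[str]:
    """Return AttrList, the attribute names to verify for an element of this tag."""
    return GENERIC_ATTRS + SPECIFIC_ATTRS.get(tag.lower(), [])


print(build_attr_list("img"))
print(build_attr_list("div"))
```

Element types without an entry in the table fall back to the generic attributes only.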
Preferably, the specific process of acquiring the characteristic values of the target element in step two comprises:
(1) Constructing a target element characteristic attribute map AMap and an integer Count, where Count records the number of elements on the whole page that match the current AMap, and its initial value is infinity;
(2) Taking the attribute names K from AttrList one by one, acquiring the value V of the target element for the current attribute name K, and adding the pair to AMap; searching the whole page for all elements that match every attribute in AMap and recording the number N of elements that currently satisfy the condition; if N is not smaller than Count, adding the current attribute K does not locate the target element more precisely, and K and its value V are deleted from AMap; otherwise, Count is updated to N; this step is repeated until all attributes in AttrList have been traversed or the attributes in AMap match only the target element;
(3) Checking Count: if it is 1, the target element can be uniquely located through AMap, and the attribute values in AMap are the characteristic values of the target element; if it is greater than 1, other elements on the page also satisfy the attribute values in AMap, and a further characteristic value is needed to uniquely identify the target element; a depth-first traversal algorithm based on the DOM tree structure is used to obtain the list ElementArray of elements that satisfy the attribute values in AMap, and the position Index of the target element in ElementArray is recorded; Index together with all the attribute values in AMap is then used as the characteristic value of the target element.
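The greedy loop of sub-steps (1) and (2) can be sketched as follows. This is a minimal sketch rather than the application's implementation. It uses `xml.etree.ElementTree` in place of a browser DOM, skips attributes that the target element does not carry, and matches against elements of any tag. All three choices are assumptions, and the sample form is hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def attr_value(el: ET.Element, name: str) -> Optional[str]:
    """Read an attribute; the pseudo-attribute "text" is the element's own text."""
    if name == "text":
        return (el.text or "").strip() or None
    return el.get(name)


def matches(el: ET.Element, amap: Dict[str, str]) -> bool:
    """True if el carries every attribute/value pair in amap."""
    return all(attr_value(el, k) == v for k, v in amap.items())


def extract_features(root: ET.Element, target: ET.Element,
                     attr_list: List[str]) -> Tuple[Dict[str, str], float]:
    amap: Dict[str, str] = {}   # AMap
    count = math.inf            # Count starts at infinity
    for k in attr_list:
        v = attr_value(target, k)
        if v is None:
            continue            # assumption: absent attributes are skipped
        amap[k] = v
        n = sum(1 for el in root.iter() if matches(el, amap))
        if n >= count:
            del amap[k]         # K did not narrow the match set, so drop it
        else:
            count = n
        if count == 1:
            break               # AMap already matches only the target
    return amap, count


PAGE = """
<html><body><form>
  <input name="q" type="text"/>
  <input name="q" type="submit"/>
  <input name="other" type="submit"/>
</form></body></html>
"""
root = ET.fromstring(PAGE)
target = root.findall(".//input")[1]
amap, count = extract_features(
    root, target, ["id", "name", "title", "text", "type"])
print(amap, count)
```

For this page, `name` alone matches two inputs, and adding `type` narrows the match to the target, so the loop ends with Count equal to 1. When Count stays above 1, sub-step (3) adds the position Index.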
Preferably, in step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern. XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers.
Preferably, the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two. Because this XPath is a positioning characteristic value derived from the attributes of the element, it can still locate the target element stably after the HTML page structure changes.
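One possible way to turn the characteristic values into an XPath string is sketched below. The application does not give the exact string format, so this form is an assumption: a `//tag[...]` expression with `and`-joined predicates, `normalize-space(.)` for the text pseudo-attribute, and a 0-based Index converted to XPath's 1-based position. The helper names are hypothetical.

```python
from typing import Dict, Optional


def xpath_literal(value: str) -> str:
    """Quote a string as an XPath 1.0 literal (XPath 1.0 has no escape sequences)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types occur: split on single quotes and glue with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def build_xpath(tag: str, amap: Dict[str, str],
                index: Optional[int] = None) -> str:
    """Build an XPath from AMap and, if given, the 0-based Index (assumed)."""
    predicates = []
    for name, value in amap.items():
        if name == "text":
            predicates.append(f"normalize-space(.)={xpath_literal(value)}")
        else:
            predicates.append(f"@{name}={xpath_literal(value)}")
    path = f"//{tag}"
    if predicates:
        path += "[" + " and ".join(predicates) + "]"
    if index is not None:
        # Parentheses make the position apply to the whole matched node set.
        path = f"({path})[{index + 1}]"
    return path


print(build_xpath("input", {"name": "q", "type": "submit"}))
print(build_xpath("a", {"text": "More"}, index=2))
```

Wrapping the expression in parentheses before applying the position matters: `//a[...][3]` selects the third matching child of each parent, whereas `(//a[...])[3]` selects the third match in document order.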
Preferably, in step one and step two, an element with distinctive characteristics is selected as an anchor element and its attribute characteristic set is generated; in step three, an XPath that uniquely locates the target element is generated according to the position of the target element relative to the anchor element.
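The anchor variant can be sketched by appending a relative step to the anchor's attribute-based XPath. The page, the `id` values, and the helper `anchored_xpath` are hypothetical. The relative step `../input` (the input that shares a parent with the anchor) is only one possible relation, chosen here because `xml.etree.ElementTree` supports it.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the inputs have no attributes of their own,
# but each sits next to a label with a distinctive id.
PAGE = """
<html><body>
  <div><span id="user-label">User</span><input/></div>
  <div><span id="pass-label">Password</span><input/></div>
</body></html>
"""


def anchored_xpath(anchor_xpath: str, relative_step: str) -> str:
    """Join the anchor's XPath with a step leading from the anchor to the target."""
    return f"{anchor_xpath}/{relative_step}"


root = ET.fromstring(PAGE)
xpath = anchored_xpath(".//span[@id='pass-label']", "../input")
hits = root.findall(xpath)
print(xpath, len(hits))
```

Here the anchor is located purely by its attribute, and only the final short step depends on page structure.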
Compared with the prior art, the application has the following beneficial effects: the positioning method uses the characteristic attributes of the target element as positioning information. These attributes are independent of the DOM structure and the page display style, and mostly indicate the business purpose of the element (for example, a common submit button carries type="submit"), so they remain stable after the DOM structure and the page style change.
Drawings
FIG. 1 is a schematic flow chart of acquiring the characteristic attribute names of the target element;
FIG. 2 is a schematic flow chart of extracting the characteristic attributes of the target element with the dynamic characteristic attribute extraction algorithm of the application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments, and all other embodiments obtained by those skilled in the art without making creative efforts based on the embodiments of the present application are included in the protection scope of the present application.
Referring to FIGS. 1-2, the present application provides the following technical solution: a positioning method based on Web element attribute characteristics, which preferentially selects characteristic attribute values according to the element type and uses them to construct an XPath query string for the target element. The method comprises the following steps:
Step one: determining the attribute names to be verified according to the type of the target element;
Step two: acquiring the characteristic values of the target element;
Step three: constructing an XPath according to the element characteristic values.
Further, in step one, when determining the attribute names to be verified, the element attributes are first divided into generic attributes and element-specific attributes, where:
the generic attributes are attributes that are common to HTML elements and frequently used, specifically: ID, the element ID attribute, which specifies a unique identifier for the element; Name, the element name attribute, which specifies the name of the element; Title, the element title attribute, which is commonly used to explain the purpose of the element and is the text shown in the tooltip when the mouse hovers over the element; Text, the text inside the element;
the element-specific attributes are attributes read according to the element type, specifically:
specific to the input element (HTMLElement): Type, which indicates whether the input element is used for entering content or for an operation such as form submission;
specific to the img element (HTMLElement): Src, which specifies the file path of the embedded picture; Alt, which specifies a text description of the image, used as alternative text when the picture cannot be displayed or read aloud to the user by a screen reader;
specific to the iframe element (HTMLElement): Src, the address of the embedded page;
specific to the a element (HTMLElement): Href, the URL or URL fragment that the hyperlink points to; Target, which specifies where to display the linked resource;
finally, the element attribute array AttrList is obtained from the generic attributes and the element-specific attributes of the target element.
Further, the specific process of acquiring the characteristic values of the target element in step two comprises:
(1) Constructing a target element characteristic attribute map AMap and an integer Count, where Count records the number of elements on the whole page that match the current AMap, and its initial value is infinity;
(2) Taking the attribute names K from AttrList one by one, acquiring the value V of the target element for the current attribute name K, and adding the pair to AMap; searching the whole page for all elements that match every attribute in AMap and recording the number N of elements that currently satisfy the condition; if N is not smaller than Count, adding the current attribute K does not locate the target element more precisely, and K and its value V are deleted from AMap; otherwise, Count is updated to N; this step is repeated until all attributes in AttrList have been traversed or the attributes in AMap match only the target element;
(3) Checking Count: if it is 1, the target element can be uniquely located through AMap, and the attribute values in AMap are the characteristic values of the target element; if it is greater than 1, other elements on the page also satisfy the attribute values in AMap, and a further characteristic value is needed to uniquely identify the target element; a depth-first traversal algorithm based on the DOM tree structure is used to obtain the list ElementArray of elements that satisfy the attribute values in AMap, and the position Index of the target element in ElementArray is recorded; Index together with all the attribute values in AMap is then used as the characteristic value of the target element.
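The fallback in sub-step (3) can be sketched as a pre-order depth-first walk that collects ElementArray and then looks up the target's position. Whether Index is 0-based or 1-based is not stated in the application, so the sketch assumes 0-based. It also compares plain attributes only, and the sample page is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional


def depth_first(el: ET.Element) -> Iterator[ET.Element]:
    """Yield el and all of its descendants in pre-order (document order)."""
    yield el
    for child in el:
        yield from depth_first(child)


def index_among_matches(root: ET.Element, target: ET.Element,
                        amap: Dict[str, str]) -> Optional[int]:
    """Return the 0-based Index of target in ElementArray, or None if absent."""
    element_array = [el for el in depth_first(root)
                     if all(el.get(k) == v for k, v in amap.items())]
    for i, el in enumerate(element_array):
        if el is target:
            return i
    return None


# Three submit inputs at different depths share the same attribute values.
PAGE = ("<html><body><form><input type='submit'/>"
        "<div><input type='submit'/></div></form>"
        "<input type='submit'/></body></html>")
root = ET.fromstring(PAGE)
target = root[0][1]            # the input that is a direct child of <body>
index = index_among_matches(root, target, {"type": "submit"})
print(index)
```

The target is visited last because the walk descends into `<form>` before it reaches the sibling `<input>`, so its Index reflects document order rather than nesting depth.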
In step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern. XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers.
In the application, the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two. Because this XPath is a positioning characteristic value derived from the attributes of the element, it can still locate the target element stably after the HTML page structure changes.
In step three, an XPath that uniquely locates the target element is generated according to the position of the target element relative to the anchor element.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that in the description of the present specification, descriptions of the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples. Because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: units illustrated as separate components may or may not be physically separate, and components illustrated as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code. It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (2)

1. A positioning method based on Web element attribute characteristics, characterized in that: the Web element positioning method preferentially selects characteristic attribute values according to the element type and uses them to construct an XPath query string for the target element, wherein the Web element positioning method comprises the following steps:
step one: determining the attribute names to be verified according to the type of the target element;
step two: acquiring the characteristic values of the target element;
step three: constructing an XPath according to the element characteristic values;
in step one, when determining the attribute names to be verified, the element attributes are first divided into generic attributes and element-specific attributes, wherein:
the generic attributes are attributes that are common to HTML elements and frequently used, specifically: ID, the element ID attribute, which specifies a unique identifier for the element; Name, the element name attribute, which specifies the name of the element; Title, the element title attribute, which is commonly used to explain the purpose of the element and is the text shown in the tooltip when the mouse hovers over the element; Text, the text inside the element;
the element-specific attributes are attributes read according to the element type, specifically:
specific to the input element (HTMLElement): Type, which indicates whether the input element is used for entering content or for an operation such as form submission;
specific to the img element (HTMLElement): Src, which specifies the file path of the embedded picture; Alt, which specifies a text description of the image, used as alternative text when the picture cannot be displayed or read aloud to the user by a screen reader;
specific to the iframe element (HTMLElement): Src, the address of the embedded page;
specific to the a element (HTMLElement): Href, the URL or URL fragment that the hyperlink points to; Target, which specifies where to display the linked resource;
finally, the element attribute array AttrList is obtained from the generic attributes and the element-specific attributes of the target element;
the specific process of acquiring the characteristic values of the target element in step two comprises:
(1) constructing a target element characteristic attribute map AMap and an integer Count, wherein Count records the number of elements on the whole page that match the current AMap, and its initial value is infinity;
(2) taking the attribute names K from AttrList one by one, acquiring the value V of the target element for the current attribute name K, and adding the pair to AMap; searching the whole page for all elements that match every attribute in AMap and recording the number N of elements that currently satisfy the condition; if N is not smaller than Count, adding the current attribute K does not locate the target element more precisely, and K and its value V are deleted from AMap; otherwise, Count is updated to N; this step is repeated until all attributes in AttrList have been traversed or the attributes in AMap match only the target element;
(3) checking Count: if it is 1, the target element can be uniquely located through AMap, and the attribute values in AMap are the characteristic values of the target element; if it is greater than 1, other elements on the page also satisfy the attribute values in AMap, and a further characteristic value is needed to uniquely identify the target element; a depth-first traversal algorithm based on the DOM tree structure is used to obtain the list ElementArray of elements that satisfy the attribute values in AMap, and the position Index of the target element in ElementArray is recorded; Index together with all the attribute values in AMap is then used as the characteristic value of the target element;
in step three, XPath refers to the XML Path Language, which can be used to test whether a node in a document matches a given pattern; XPath provides a rich set of functions, flexibly supports matching on multiple attributes, and is supported by all mainstream browsers;
the XPath string of the target element is constructed from the element attribute characteristic values obtained in step two, wherein the XPath is a positioning characteristic value derived from the attributes of the element and can still locate the target element stably after the HTML page structure changes.
2. The positioning method based on Web element attribute characteristics according to claim 1, characterized in that: in step one and step two, an element with distinctive characteristics is selected as an anchor element and its attribute characteristic set is generated; in step three, an XPath that uniquely locates the target element is generated according to the position of the target element relative to the anchor element.
CN202110474540.3A 2021-04-29 2021-04-29 Positioning method based on Web element attribute characteristics Active CN113177168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110474540.3A CN113177168B (en) 2021-04-29 2021-04-29 Positioning method based on Web element attribute characteristics

Publications (2)

Publication Number Publication Date
CN113177168A CN113177168A (en) 2021-07-27
CN113177168B CN113177168B (en) 2023-12-01

Family

ID=76925351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110474540.3A Active CN113177168B (en) 2021-04-29 2021-04-29 Positioning method based on Web element attribute characteristics

Country Status (1)

Country Link
CN (1) CN113177168B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495775B (en) * 2021-09-07 2021-12-03 长沙博为软件技术股份有限公司 Combined positioning system, method, equipment and medium for RPA positioning control element
CN115062206B (en) * 2022-05-30 2023-04-07 上海弘玑信息技术有限公司 Webpage element searching method and electronic equipment
CN115033822B (en) * 2022-06-14 2024-05-17 壹沓科技(上海)有限公司 Element positioning method, device, equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514292A (en) * 2013-10-09 2014-01-15 南京大学 Webpage data extraction method based on semi-supervised learning of small sample
CN108804472A (en) * 2017-05-04 2018-11-13 腾讯科技(深圳)有限公司 A kind of webpage content extraction method, device and server
CN110297752A (en) * 2018-03-23 2019-10-01 华为软件技术有限公司 Acquisition methods and device, automatization test system, the storage medium of control element
CN111046317A (en) * 2019-12-27 2020-04-21 北京奇艺世纪科技有限公司 Page data acquisition method, device, equipment and computer readable storage medium
CN111079043A (en) * 2019-12-05 2020-04-28 北京数立得科技有限公司 Key content positioning method
CN111368241A (en) * 2020-03-05 2020-07-03 苏州数字力量教育科技有限公司 Webpage element identification method based on XPath
CN112182468A (en) * 2020-10-14 2021-01-05 北京新纽科技有限公司 Positioning and analyzing method compatible with client interface element and webpage element

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7912755B2 (en) * 2005-09-23 2011-03-22 Pronto, Inc. Method and system for identifying product-related information on a web page

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
generating xpath expressions for structured web data record segmentation;Tomas Grigalis等;ICIST 2012:information and software technologies;38-47 *
利用JTidy和XML实现Web数据信息的批量提取;刘钊夏;何明昕;;计算机工程与设计;第31卷(第06期);1243-1246 *
基于DOM的Deep Web查询接口属性抽取方法;石龙;强保华;何倩;吴春明;谌超;;桂林电子科技大学学报;第32卷(第06期);468-472 *
跨站脚本漏洞渗透测试技术;王丹;顾明昌;赵文兵;;哈尔滨工程大学学报;第38卷(第11期);1769-1774 *

Also Published As

Publication number Publication date
CN113177168A (en) 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant