CN112347332A - XPath-based crawler target positioning method - Google Patents
- Publication number
- CN112347332A (application CN202011287213.9A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- xpath
- blocks
- content
- monitoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Abstract
The invention belongs to the technical fields of computer web technology and information capture, and in particular relates to a crawler target positioning method based on the web page path language XPath. The method comprises the following steps: step 1, loading the website information and acquiring the web page corresponding to the website; step 2, finding the relative position of the existing content in the web page according to the monitoring position; step 3, dividing the web page into blocks, each of which contains the content of the monitoring position; and step 4, determining the monitoring range through human-computer interaction. The invention lets users monitor and acquire information (news, notices, and other content) according to their actual requirements. It partitions the web page into blocks based on the tree structure of the page, presents the blocks visually, and locates the user's requirement precisely through human-computer interaction.
Description
Technical Field
The invention belongs to the technical fields of computer web technology and information capture, and in particular relates to a crawler target positioning method based on the web page path language XPath.
Background
As more and more information is brought into people's lives, it becomes more difficult to obtain accurate and effective information in time. In this era of "information explosion", it is becoming increasingly difficult to accurately obtain a variety of information such as education, shopping, news, etc. that are of interest to individuals.
Crawler target positioning is important for information capture, for monitoring changes in web page information, and for similar tasks. Website subscription services based on RSS, which were popular a few years ago, notify users when website content is updated, but they do not use any crawler target positioning method, so their update reminders are not targeted. Most web page target positioning methods developed later identify a block by its attributes, such as id or class; these methods are limited by the naming conventions of the web page source code and do not meet users' real needs.
Disclosure of Invention
To address these problems, the invention provides a crawler target positioning method that is accurate and matches the user's intention.
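To achieve this purpose, the invention provides the following technical solution:
An XPath-based crawler target positioning method comprises the following steps:
step 1, loading the website information and acquiring the web page corresponding to the website;
step 2, finding the relative position of the existing content in the web page according to the monitoring position;
step 3, dividing the web page into blocks, each of which contains the content of the monitoring position;
and step 4, determining the monitoring range through human-computer interaction.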
In a further refinement of the technical solution, step 1 crawls the source code of the input web page and presents the target web page visually in the form of an embedded web page.
In a further refinement of the technical solution, step 2 comprises selecting a monitoring position on the target web page and entering the text that currently exists at that position; the XPath corresponding to the text is then found by traversing the DOM tree structure of the web page source code.
In a further refinement of the technical solution, the specific method for finding the XPath of the existing content at the monitoring position in step 2 is: traverse the DOM tree nodes of the HTML web page, find the tree nodes that match the input content, and store the paths of those nodes.
In a further refinement of the technical solution, the web page blocking method in step 3 is as follows. Web page blocking is based on the DOM tree structure of the HTML web page: the path from the root node to a leaf node represents all web page blocks that contain the existing content of the monitoring position, and these blocks are marked in the web page. Blocking is divided into longitudinal blocking and transverse blocking according to the number of XPaths returned in step 2. When exactly one XPath is returned, there is a single definite position and only longitudinal blocking is needed; when more than one XPath is returned, transverse blocking is performed first, followed by longitudinal blocking.
In a further refinement of the technical solution, step 4 specifically includes:
step 4.1, if the returned XPath is empty, the content is entered again;
step 4.2, if exactly one XPath is returned, the web page is presented in blocks according to step 3, and the user enters the number representing the desired web page block, which feeds back the block to be monitored;
and step 4.3, if more than one XPath is returned, the first interaction is presented as transverse blocks and the second interaction as longitudinal blocks, and the user selects the exact monitoring position.
In a further refinement of the technical solution, step 3 is specifically: based on the existing content of the monitoring position and on the web page structure, the ranges that contain the content of the monitoring position are marked out in the web page by longitudinal blocking and transverse blocking, and the ranges are marked in different colors.
Compared with the prior art, the technical solution has the following beneficial effects:
The invention lets users monitor and acquire information (news, notices, and other content) according to their actual requirements. It partitions the web page into blocks based on the tree structure of the page, presents the blocks visually, and locates the user's requirement precisely through human-computer interaction.
Compared with methods based on attribute values, the method is more general and applies to any HTML-based web page. In addition, combined with a crawler, the method enables effective information monitoring, and a problem feedback module for a web page can use it to pinpoint the position of a problem, saving manpower and material resources. The invention is also user-friendly: it avoids specialist computing terminology, requires no tutorial, and keeps the human-computer interaction concise and easy to understand.
Drawings
FIG. 1 is a flow chart of a crawler target location method based on XPath;
FIG. 2 is a simplified web page partition and DOM tree structure diagram;
FIG. 3 is a simplified diagram of a transverse block;
FIG. 4 is a first schematic view of a page;
FIG. 5 is a second schematic page view;
FIG. 6 is a third schematic page diagram.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to FIG. 1, which is a flowchart of the XPath-based crawler target positioning method, the method comprises the following steps.
In step 1, after the user enters a target website address, the back end downloads the HTML (HyperText Markup Language) file of the web page using Python's requests library. It then uses regular expressions to extract the original paths of resources such as CSS (Cascading Style Sheets), JavaScript, and images referenced in the file, and downloads and stores those resources according to their paths, so that the complete original web page is presented rather than bare HTML text. The downloaded target web page is loaded into the input page so that the user can conveniently view it and make selections.
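The resource-discovery part of this step might look roughly like the sketch below. It is an illustration under stated assumptions, not the patent's implementation: the regular expression, the function names `extract_resource_urls` and `fetch_html`, and the example.com URLs are all hypothetical; only the use of the requests library and of regular expressions comes from the description.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: src/href attribute values that end in a common
# stylesheet, script, or image extension (optionally with a query string).
RESOURCE_RE = re.compile(
    r'(?:src|href)\s*=\s*["\']([^"\']+\.(?:css|js|png|jpe?g|gif)(?:\?[^"\']*)?)["\']',
    re.IGNORECASE,
)

def extract_resource_urls(html, base_url):
    """Return absolute URLs of CSS, JavaScript, and image resources in html."""
    urls = []
    for raw in RESOURCE_RE.findall(html):
        absolute = urljoin(base_url, raw)
        if absolute not in urls:  # keep the first occurrence, preserve order
            urls.append(absolute)
    return urls

def fetch_html(url, timeout=10):
    """Download a page with requests (imported lazily; needs network access)."""
    import requests
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text

# Offline demonstration on a made-up page.
sample = '''<html><head><link rel="stylesheet" href="/static/main.css">
<script src="js/app.js?v=3"></script></head>
<body><img src="http://cdn.example.com/logo.png"><a href="/news.html">n</a></body></html>'''
resources = extract_resource_urls(sample, "http://example.com/sports/index.html")
```

Resolving each path against the page URL with `urljoin` is a design choice of this sketch; it lets relative and absolute resource paths be downloaded and stored in the same way.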
In step 2, the relative position of the existing content in the web page is determined from the monitoring position entered by the user; this is one of the key steps of the invention.
In an HTML web page, the DOM represents the HTML document as a tree structure. The method traverses the nodes of the DOM tree, finds the nodes whose content matches the content entered by the user, records the paths of those nodes in the HTML code of the web page, and expresses the paths in XPath. A path expressed in XPath runs from the root node <html> to the node containing the content in the DOM tree of the HTML web page. The etree.HTML function in Python's lxml library converts the HTML document from a string into an _Element object, so that the xpath method can be used to evaluate XPath expressions, and the find method of etree's ElementTree can be used to search for the matching XPath.
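The traversal described above can be sketched as follows. To stay dependency-free, this sketch swaps in the standard-library `html.parser` module instead of lxml, and it always writes a positional index on every step (for example `/html[1]/body[1]`), which is still valid XPath. The names `TextLocator` and `find_xpaths` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag or text content.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TextLocator(HTMLParser):
    """Collect positional XPaths of elements whose own text contains a query."""

    def __init__(self, query):
        super().__init__()
        self.query = query
        self.stack = []        # XPath steps of the currently open elements
        self.counters = [{}]   # per-depth tag counts, used as sibling indices
        self.matches = []

    def handle_starttag(self, tag, attrs):
        counts = self.counters[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return             # void elements are counted but never opened
        self.stack.append(f"{tag}[{counts[tag]}]")
        self.counters.append({})

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counters.pop()

    def handle_data(self, data):
        if self.query in data and self.stack:
            path = "/" + "/".join(self.stack)
            if path not in self.matches:
                self.matches.append(path)

def find_xpaths(html, query):
    """Return the XPath of every element whose text contains query."""
    locator = TextLocator(query)
    locator.feed(html)
    locator.close()
    return locator.matches

page = ("<html><body><div><p>intro</p></div>"
        "<div><h1>Big news today</h1><ul><li><a>Big news</a></li></ul></div>"
        "</body></html>")
full_match = find_xpaths(page, "Big news today")
partial_match = find_xpaths(page, "Big news")
```

A complete query yields one path and a shorter query yields two, which corresponds to the one-XPath and several-XPath cases that the later steps distinguish.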
In step 3, the invention uses two types of web page blocking, longitudinal and transverse, to handle different situations. Longitudinal blocking classifies the parent and child nodes along a single XPath; transverse blocking marks the different positions that correspond to different contents. Both are defined below.
Step 3.1, longitudinal blocking
In an HTML web page, the start and end of each tag delimit the region that the tag covers, and the web page can be divided into blocks on this basis. The XPath obtained in the previous step represents the path from the root node <html> to the node containing the content, and each node on the path corresponds to a block in the page. For example, on the Newcastle website, the path of a news headline in the web page is /html/body/div[9]/div[6]/div[2]/div/div[2]/h1[1]; the corresponding web page blocks and simplified path are shown in FIG. 2, a simplified diagram of the web page blocks and the DOM tree structure. Along this path in the DOM tree, the page region of a child node is a subset of the region of its parent node, and each node on the path represents one block region of the web page. The invention parses the XPath using Python's convenient string slicing.
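Splitting one XPath into the nested blocks it passes through can be sketched with plain string operations. The function name `ancestor_paths` is hypothetical, and starting the list at /html/body (rather than /html, which would cover the whole page) is a choice made for this sketch.

```python
def ancestor_paths(xpath):
    """Split an absolute XPath into the path of every block along it."""
    steps = [step for step in xpath.split("/") if step]
    # Begin at /html/body; each longer prefix is a smaller, nested block.
    return ["/" + "/".join(steps[:i]) for i in range(2, len(steps) + 1)]

blocks = ancestor_paths("/html/body/div[9]/div[6]/div[2]/div/div[2]/h1[1]")
```

Each entry is a prefix of the next, which mirrors the statement that a child node's region is a subset of its parent's region.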
For visual presentation, a frame is added around each block. Technically, the XPath is parsed into its parts; for example, XPath(content) is split into /html/body, /html/body/div[9], …, /html/body/div[9]/div[6]/div[2]/div/div[2]/h1[1]. The web page blocks are then framed in color through CSS selectors, whose specific rules are detailed in Table 1. Each CSS selector applies a style of the form { border: red solid thick; }, and the modified page is displayed to the user for selection.
Table 1. Specific rules for the CSS selectors
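One possible way to frame blocks with CSS is sketched below. It does not reproduce the rules of Table 1: the mapping from a positional XPath step such as `div[9]` to the CSS selector `div:nth-of-type(9)`, the `border` shorthand, the color palette, and the function names are all assumptions made for illustration.

```python
import re

# One XPath step: a tag name with an optional positional index, e.g. div[9].
STEP_RE = re.compile(r"^([A-Za-z][\w-]*)(?:\[(\d+)\])?$")
PALETTE = ["red", "orange", "green", "blue", "purple"]  # hypothetical colors

def xpath_to_css(xpath):
    """Translate a simple positional XPath into an equivalent CSS selector."""
    parts = []
    for step in filter(None, xpath.split("/")):
        match = STEP_RE.match(step)
        if match is None:
            raise ValueError(f"unsupported XPath step: {step!r}")
        tag, index = match.groups()
        parts.append(f"{tag}:nth-of-type({index})" if index else tag)
    return " > ".join(parts)

def border_rules(block_paths):
    """Build one numbered, colored border rule per block path."""
    rules = []
    for number, path in enumerate(block_paths, start=1):
        color = PALETTE[(number - 1) % len(PALETTE)]
        rules.append(
            f"/* block {number} */ {xpath_to_css(path)} "
            f"{{ border: {color} solid thick; }}"
        )
    return rules

rules = border_rules(["/html/body", "/html/body/div[9]"])
```

The child combinator `>` keeps each selector tied to the exact path, so a rule frames only the intended block and not other elements with the same tag elsewhere in the page.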
Step 3.2, transverse blocking
Transverse blocking marks the different positions that correspond to different contents. For example, on the Newcastle website, if the input content is "Liipu", the content appears at more than one position: /html/body/div[10]/div[7]/div[1]/div[5]/div[2]/ul/li[11]/a and /html/body/div[10]/div[14]/div[2]/div[1]/li[4]/a. Because such blocks generally do not intersect, this case can be represented by the simplified diagram in FIG. 3.
This blocking method follows the way HTML is organized: each element corresponds to a region of the web page, so analyzing the page by element also divides it into regions.
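Transverse blocking can be sketched as numbering the matched positions so the user can pick one, with a check that two blocks do not contain each other. The function names and the two sample paths are hypothetical.

```python
def transverse_blocks(xpaths):
    """Number each matched position so the user can choose one by number."""
    return {number: path for number, path in enumerate(xpaths, start=1)}

def overlaps(path_a, path_b):
    """True if one block contains the other (one path is a prefix of the other)."""
    a, b = path_a.split("/"), path_b.split("/")
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]

# Made-up matches for a short query that appears in two separate lists.
matches = [
    "/html/body/div[10]/div[7]/ul/li[11]/a",
    "/html/body/div[10]/div[14]/ul/li[4]/a",
]
numbered = transverse_blocks(matches)
disjoint = not overlaps(matches[0], matches[1])
```

Comparing whole path steps rather than raw string prefixes avoids treating `div[1]` as an ancestor of `div[10]`.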
One advantage of the method is that it uses both transverse and longitudinal blocking and positions the target in two dimensions, so the crawler target is determined more accurately.
A further advantage is that the method incorporates interaction with the user, so the selected monitoring range is closer to what the user needs.
In step 4, human-computer interaction is used to determine the specific position that the user wants to monitor. The system presents the blocks produced in the previous step, the user selects the specific range to be monitored, and the user's requirement is located precisely in a question-and-answer manner.
After analyzing the target website and the existing content of the monitoring position, the invention performs different operations according to the number of XPaths returned.
Step 4.1, the XPath returned in step 2 is empty
The user is reminded that the input content does not appear on the website and should be checked, and the process jumps back to step 2 for new input.
Step 4.2, exactly one XPath is returned in step 2
Longitudinal blocking is performed directly according to step 3 and presented to the user. The user enters the number representing the desired web page block, which feeds back the block to be monitored.
Step 4.3, more than one XPath is returned in step 2
In this case the content appears at more than one position in the web page, and two rounds of human-computer question and answer are needed. Assuming n (n > 1) XPaths are returned, the first interaction presents the transverse blocks of step 3, and the user selects one of the n separate blocks; the second interaction presents the longitudinal blocks of step 3 for the selected position, and the user selects the exact position to be monitored.
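The three cases of step 4 can be sketched as a single dispatch function. This is a simplified illustration: the function names are hypothetical, and the `ask(prompt, options)` callback, which returns the 1-based number chosen by the user, stands in for the real interactive interface.

```python
def ancestor_paths(xpath):
    """Longitudinal blocks: every prefix of the path from /html/body down."""
    steps = [step for step in xpath.split("/") if step]
    return ["/" + "/".join(steps[:i]) for i in range(2, len(steps) + 1)]

def choose_monitoring_block(xpaths, ask):
    """Return the block the user wants monitored, or None if nothing matched."""
    if not xpaths:
        return None                      # step 4.1: caller asks for new input
    if len(xpaths) == 1:
        target = xpaths[0]               # step 4.2: a single position
    else:
        # Step 4.3, first interaction: pick one of the transverse blocks.
        target = xpaths[ask("Which position?", xpaths) - 1]
    # Last interaction in both cases: pick one of the longitudinal blocks.
    blocks = ancestor_paths(target)
    return blocks[ask("Which block?", blocks) - 1]

def scripted(answers):
    """Build an ask() callback that replays fixed answers (for demonstration)."""
    replies = iter(answers)
    return lambda prompt, options: next(replies)

two_matches = ["/html/body/div[1]/p[1]", "/html/body/div[2]/ul[1]/li[1]/a[1]"]
picked = choose_monitoring_block(two_matches, scripted([2, 3]))
single = choose_monitoring_block(["/html/body/div[1]/p[1]"], scripted([1]))
nothing = choose_monitoring_block([], scripted([]))
```

Passing the interaction in as a callback keeps the decision logic testable without a user interface.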
Specific examples are as follows:
1. user input
Target website url: http:// sports
Monitoring the existing content of the position: plum blossom: the two people are the best chief in the heart of the people who teach the people to give a lot of people
Target keywords: c Rou
Because the "existing content of the monitoring position" entered by the user is complete, only one XPath is returned: /html/body/div[4]/div[5]/div[2]/div/ul[1]/li[2]/a. As shown in FIG. 4 (page diagram one), longitudinal blocking can be performed directly.
If the "existing content of the monitoring position" entered by the user is not complete enough, for example only the two characters "Meixi", the content is found at several positions in the web page and several results are returned. In that case transverse blocking is performed first, as shown in FIG. 5 (page diagram two).
Suppose the user selects 1. Longitudinal blocking is then performed, as shown in FIG. 6 (page diagram three), and the user determines the final monitoring position by choosing from 1 to 4. Whenever the two characters "C Rou" appear within the monitoring range, the user is reminded.
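The reminder at the end of this example can be sketched as a simple check on the text of the monitored block. The function name `check_block`, the returned fields, and the sample strings are hypothetical; comparing against the previously seen text is an assumption about how a change would be detected.

```python
def check_block(previous_text, current_text, keyword):
    """Report whether the monitored block changed and mentions the keyword."""
    changed = previous_text.strip() != current_text.strip()
    return {"changed": changed, "keyword_found": keyword in current_text}

unchanged = check_block("A beats B", "A beats B", "C Rou")
updated = check_block("A beats B", "C Rou scores twice", "C Rou")
```

A crawler would call such a check each time it re-reads the selected block and would notify the user when `keyword_found` is true.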
The invention draws the frames of the web page blocks in different colors, each corresponding to a different number, and presents the correspondence between numbers and colors as a table on the human-computer interaction interface. It obtains the user's target requirement by collecting the numbers the user feeds back, so that human-computer communication follows fixed rules.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrases "comprising … …" or "comprising … …" does not exclude the presence of additional elements in a process, method, article, or terminal that comprises the element. Further, herein, "greater than," "less than," "more than," and the like are understood to exclude the present numbers; the terms "above", "below", "within" and the like are to be understood as including the number.
Although the embodiments have been described, once the basic inventive concept is obtained, other variations and modifications of these embodiments can be made by those skilled in the art, so that the above embodiments are only examples of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes using the contents of the present specification and drawings, or any other related technical fields, which are directly or indirectly applied thereto, are included in the scope of the present invention.
Claims (7)
1. A crawler target positioning method based on XPath is characterized by comprising the following specific steps:
step 1, loading the website information and acquiring the web page corresponding to the website;
step 2, finding the relative position of the existing content in the web page according to the monitoring position;
step 3, dividing the web page into blocks, each of which contains the content of the monitoring position;
and step 4, determining the monitoring range through human-computer interaction.
2. The XPath-based crawler target positioning method of claim 1, wherein step 1 crawls the source code of the input web page and presents the target web page visually in the form of an embedded web page.
3. The XPath-based crawler target positioning method of claim 1, wherein step 2 comprises selecting a monitoring position on the target web page and entering the text that currently exists at that position; and finding the XPath corresponding to the text by traversing the DOM tree structure of the web page source code.
4. The XPath-based crawler target positioning method of claim 3, wherein the specific method for finding the XPath of the existing content at the monitoring position in step 2 is: traversing the DOM tree nodes of the HTML web page, finding the tree nodes that match the input content, and storing the paths of those nodes.
5. The XPath-based crawler target positioning method of claim 3, wherein the web page blocking method in step 3 is: web page blocking is based on the DOM tree structure of the HTML web page, the path from the root node to a leaf node represents all web page blocks that contain the existing content of the monitoring position, and these blocks are marked in the web page; blocking is divided into longitudinal blocking and transverse blocking according to the number of XPaths returned in step 2; when exactly one XPath is returned, there is a single definite position and only longitudinal blocking is needed; when more than one XPath is returned, transverse blocking is performed first, followed by longitudinal blocking.
6. The XPath-based crawler target positioning method of claim 3, wherein step 4 specifically comprises:
step 4.1, if the returned XPath is empty, the content is entered again;
step 4.2, if exactly one XPath is returned, the web page is presented in blocks according to step 3, and the user enters the number representing the desired web page block, which feeds back the block to be monitored;
and step 4.3, if more than one XPath is returned, the first interaction is presented as transverse blocks and the second interaction as longitudinal blocks, and the user selects the exact monitoring position.
7. The XPath-based crawler target positioning method of claim 1, wherein step 3 is specifically: based on the existing content of the monitoring position and on the web page structure, the ranges that contain the content of the monitoring position are marked out in the web page by longitudinal blocking and transverse blocking, and the ranges are marked in different colors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011287213.9A CN112347332A (en) | 2020-11-17 | 2020-11-17 | XPath-based crawler target positioning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011287213.9A CN112347332A (en) | 2020-11-17 | 2020-11-17 | XPath-based crawler target positioning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112347332A true CN112347332A (en) | 2021-02-09 |
Family
ID=74364091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011287213.9A Pending CN112347332A (en) | 2020-11-17 | 2020-11-17 | XPath-based crawler target positioning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347332A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8166054B2 (en) * | 2008-05-29 | 2012-04-24 | International Business Machines Corporation | System and method for adaptively locating dynamic web page elements |
CN102831121A (en) * | 2011-06-15 | 2012-12-19 | 阿里巴巴集团控股有限公司 | Method and system for extracting webpage information |
US20130024441A1 (en) * | 2011-07-22 | 2013-01-24 | Alibaba Group Holding Limited | Configuring web crawler to extract web page information |
CN104965901A (en) * | 2015-06-30 | 2015-10-07 | 北京奇虎科技有限公司 | Method and apparatus for grabbing content of target page |
CN106294885A (en) * | 2016-10-09 | 2017-01-04 | 华东师范大学 | A kind of data collection towards isomery webpage and mask method |
CN107220250A (en) * | 2016-03-21 | 2017-09-29 | 北大方正集团有限公司 | A kind of template configuration method and system |
CN107729475A (en) * | 2017-10-16 | 2018-02-23 | 深圳视界信息技术有限公司 | Web page element acquisition method, device, terminal and computer-readable recording medium |
CN108733405A (en) * | 2017-04-13 | 2018-11-02 | 富士通株式会社 | The method and apparatus that training webpage distribution indicates model |
CN109325204A (en) * | 2018-09-13 | 2019-02-12 | 武汉伯远生物科技有限公司 | Web page contents extraction method |
CN110110198A (en) * | 2017-12-28 | 2019-08-09 | 中移(苏州)软件技术有限公司 | A kind of method for abstracting web page information and device |
CN110134841A (en) * | 2018-02-09 | 2019-08-16 | 鼎复数据科技(北京)有限公司 | The customized real-time method for obtaining website data |
CN110222251A (en) * | 2019-05-27 | 2019-09-10 | 浙江大学 | A kind of Service encapsulating method based on Web-page segmentation and searching algorithm |
CN110390038A (en) * | 2019-07-25 | 2019-10-29 | 中南民族大学 | Segment method, apparatus, equipment and storage medium based on dom tree |
-
2020
- 2020-11-17 CN CN202011287213.9A patent/CN112347332A/en active Pending
Non-Patent Citations (2)
Title |
---|
李桐宇: "面向领域的网页内容提取及语义标签生成框架", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
陈晓雷: "自适应Web数据抽取技术研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7406459B2 (en) | Concept network | |
EP2057557B1 (en) | Joint optimization of wrapper generation and template detection | |
US9336279B2 (en) | Hidden text detection for search result scoring | |
US9594730B2 (en) | Annotating HTML segments with functional labels | |
US20060212446A1 (en) | Method and system for assessing relevant properties of work contexts for use by information services | |
US20080235567A1 (en) | Intelligent form filler | |
US20050028156A1 (en) | Automatic method and system for formulating and transforming representations of context used by information services | |
US20060161564A1 (en) | Method and system for locating information in the invisible or deep world wide web | |
CN102662969B (en) | Internet information object positioning method based on webpage structure semantic meaning | |
CN106202514A (en) | Accident based on Agent is across the search method of media information and system | |
US20090019015A1 (en) | Mathematical expression structured language object search system and search method | |
CN103443786A (en) | Machine learning method to identify independent tasks for parallel layout in web browsers | |
CN109906450A (en) | For the method and apparatus by similitude association to electronic information ranking | |
CN105045875A (en) | Personalized information retrieval method and apparatus | |
CN102741838A (en) | System and method for block segmenting, identifying and indexing visual elements, and searching documents | |
US10810181B2 (en) | Refining structured data indexes | |
KR102157218B1 (en) | Data transformation method for spatial data's semantic annotation | |
KR20190131778A (en) | Web Crawler System for Collecting a Structured and Unstructured Data in Hidden URL | |
US8150878B1 (en) | Device method and computer program product for sharing web feeds | |
Shestakov et al. | DEQUE: querying the deep web | |
CN106372232B (en) | Information mining method and device based on artificial intelligence | |
Wang et al. | Enriching descriptions for public web services using information captured from related web pages on the internet | |
CN112347332A (en) | XPath-based crawler target positioning method | |
CN111666479A (en) | Method for searching web page and computer readable storage medium | |
CN115033643A (en) | Data synchronization method, electronic device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210209 |