CN113779377B - Crawler searching method based on barrier-free detection result deduplication - Google Patents

Crawler searching method based on barrier-free detection result deduplication

Info

Publication number
CN113779377B
Authority
CN
China
Prior art keywords
link
links
detection
pages
barrier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110849849.6A
Other languages
Chinese (zh)
Other versions
CN113779377A (en)
Inventor
卜佳俊
杨文武
周晟
王炜
于智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202110849849.6A
Publication of CN113779377A
Application granted
Publication of CN113779377B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A crawler searching method based on barrier-free detection result deduplication. The total number of pages to be crawled is preset; links are repeatedly taken from a URL queue and visited to obtain the web page source code. A selected subset of detection rules is applied to the source code, and the detection results are combined into a feature matrix. After all links extracted from a web page have been visited, the matrices obtained from all the linked pages are clustered with the DBSCAN algorithm. Pages are randomly sampled from each cluster to serve as the representative pages of that cluster; links are extracted from the representative pages and added to the URL (uniform resource locator) queue. The other web pages in a cluster are marked as "skipped" because their barrier-free detection results are similar to those of the representative pages, and the count of crawled pages is increased directly without actually crawling them. The invention is used in the web page link crawling stage of automatic detection of the user-friendliness of website pages; by controlling the number of crawled pages it speeds up crawling and therefore detection.

Description

Crawler searching method based on barrier-free detection result deduplication
Technical field:
The invention belongs to the field of barrier-free information access, and in particular relates to a crawler module applied in the barrier-free detection step.
Background:
In the era of big data, people increasingly obtain the information they need from the massive data on the Internet. As the concept of "Internet Plus" takes hold, information has acquired a broader meaning against the background of social development. How to use information technology to achieve equal access to information on the Internet, so that everyone, including people with disabilities and other vulnerable groups, can conveniently obtain and use information, is a current focus of public attention.
A barrier-free (accessible) web page is one from which people with disabilities, as well as able-bodied people with special needs, can obtain any information on the web. Achieving this requires both barrier-free web content and barrier-free assistive software for use on the web. With the rapid development of Internet technology, the ways in which information is presented in web pages have become increasingly diverse. To show more information on a page and to draw the user's attention to a particular kind of information, developers often present it as floating windows, sidebar advertisements, and the like. This frequently gives users a poor experience and raises the barriers faced by vulnerable groups (visually impaired people, elderly people), who can no longer reach the normal content of the page through assistive means. Barrier-free construction of web pages is therefore necessary to lower the threshold for vulnerable groups to obtain information.
Barrier-free detection of web pages is an important part of that construction. It finds the designs in a page that hinder target users from obtaining information, and provides an effective basis for subsequent barrier-free optimization of the website. Detecting a whole website first requires acquiring its pages with a crawler. Most current crawlers use breadth-first search with content-based deduplication. As barrier-free detection becomes more intelligent, detecting a single page takes longer and longer. Running barrier-free detection on every page of a website greatly increases the detection cost and hinders a quick, objective evaluation of the site's barrier-free level, so other crawler search methods are urgently needed to improve detection efficiency.
Summary of the invention:
To address these problems, the invention provides a crawler searching method based on barrier-free detection result deduplication. Compared with traditional breadth-first search, the method judges page similarity from barrier-free detection results, which reduces both the number of web pages that must be crawled and the number of calls to complex barrier-free detection methods. Compared with judging page similarity from the content and structure of web pages, the method takes less time and fewer resources, improves the diversity of the website's barrier-free detection results, and speeds up barrier-free detection of the website.
The crawler searching method based on the barrier-free detection result deduplication comprises the following specific steps:
S1. Obtain from user input the link to the website home page and the total number of web pages to be acquired, totalcount. Add the home page link to the URL queue.
S2. Take the link at the head of the queue of URLs to be crawled and visit it to obtain the web page source code. Increase the number of visited links, finishcount, by 1.
S3. If the number of visited links finishcount and the number of links marked as skipped skipcount satisfy finishcount + skipcount ≥ totalcount, end the process; otherwise continue.
S4. Extract a barrier-free detection item matrix from the web page source code.
S41. Select a subset of rules from GB/T 37668-2019, "Information technology: Technical requirements and test methods for Internet content accessibility". The selected rules meet the following criteria: they are simple to implement, depending only on the web page source code and not on image, video, or audio information; and they are fast to check, with the total time for all rules on a single web page not exceeding 1 second. According to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap.
S42. Apply the rules selected in step S41 to the web page source code. For each rule, the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown.
S43. Concatenate the vectors obtained for the detection rules in step S42 into a matrix in a fixed order; the rules may, for example, be ordered by rule number. The resulting matrix is $M = [r_1, r_2, r_3, \dots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule.
S5. When link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B. Find the parent link of the currently visited link; if not all child links of that parent link have been visited, return to step S2. Otherwise, collect the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \dots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link. Perform cluster analysis on the set with DBSCAN, a density-based clustering method, dividing $C$ into several clusters.
S6. For each cluster obtained in step S5, sample at the ratio $\lambda$. Put the sampled results into the set $R = \{H_1, H_2, H_3, \dots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \dots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix.
S7. For the set $R$ obtained in step S6, extract the links from each element and add them to the URL queue. For the set $T$ obtained in step S6, extract the links from each element and add all of them to the set $P = \{U_1, U_2, U_3, \dots, U_n\}$, where $U_i$ is the $i$-th link. The links in $P$ are the links deduplicated on the basis of barrier-free detection results; they are marked as skipped, and the number of skipped links skipcount is increased by $\mathrm{Card}(P)$, the number of elements in $P$.
S8. If finishcount + skipcount ≥ totalcount, enough web pages have been acquired and the process ends; otherwise return to step S2.
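The control flow of steps S1 to S8 can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the function name `crawl`, the injected callables `fetch`, `extract_links`, `detect` and `pick_representatives`, and the toy `SITE` dictionary are all hypothetical. It also assumes that a link already seen (queued or skipped) is never queued again, which the text does not state explicitly.

```python
from collections import deque


def crawl(home, total_count, fetch, extract_links, detect, pick_representatives):
    """Run the S1-S8 loop; return (visited_urls, skipped_urls)."""
    queue = deque([(home, None)])   # S1: entries are (url, parent_url)
    pending = {None: 1}             # number of unvisited children per parent
    batches = {None: []}            # visited children per parent: (url, html, matrix)
    seen = {home}                   # assumption: a link is handled at most once
    visited, skipped = [], []
    finish_count = skip_count = 0

    while queue:
        url, parent = queue.popleft()                 # S2
        html = fetch(url)
        finish_count += 1
        visited.append(url)
        if finish_count + skip_count >= total_count:  # S3
            break

        batches[parent].append((url, html, detect(html)))  # S4
        pending[parent] -= 1
        if pending[parent] > 0:                       # S5: siblings still unvisited
            continue

        pages = batches.pop(parent)
        del pending[parent]
        # S5 + S6 are delegated: the callable returns the indices of the
        # representative pages chosen among this parent's children.
        representatives = pick_representatives([m for _, _, m in pages])

        for index, (page_url, page_html, _) in enumerate(pages):   # S7
            new_links = []
            for link in extract_links(page_html):
                if link not in seen:
                    seen.add(link)
                    new_links.append(link)
            if index in representatives:
                if new_links:
                    pending[page_url] = len(new_links)
                    batches[page_url] = []
                    queue.extend((link, page_url) for link in new_links)
            else:
                skipped.extend(new_links)
                skip_count += len(new_links)          # skipcount += Card(P)

        if finish_count + skip_count >= total_count:  # S8
            break

    return visited, skipped


# Toy site: each page's "source" is just its URL, and links come from a dict.
SITE = {"/": ["/a", "/b", "/c"], "/a": ["/a1"], "/b": ["/b1"], "/c": ["/c1"],
        "/a1": [], "/b1": [], "/c1": []}


def demo(total_count):
    return crawl("/", total_count, fetch=lambda url: url,
                 extract_links=lambda html: SITE[html],
                 detect=lambda html: [0, 0, 0, 0],
                 pick_representatives=lambda matrices: {0})
```

In the toy run, `pick_representatives` always keeps the first sibling, so the links found on the other siblings are counted as skipped rather than visited.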
In summary, the crawler searching method based on barrier-free detection result deduplication provided by the invention has the following beneficial effects:
(1) By reducing the number of web pages to be crawled, it speeds up barrier-free detection of the website as a whole.
(2) Compared with similarity judgment based on web page content and structure, it integrates the barrier-free detection step with the crawling step; because the barrier-free detection results themselves serve as the features, no extra computation is required.
(3) The method is general: it uses barrier-free detection results as features, does not depend on the content or structure of the website, and can be applied to different types of websites.
Description of the drawings:
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a general flow chart of a crawler search method based on barrier-free detection result deduplication provided by the invention.
Fig. 2 shows a flowchart of obtaining a web page barrier-free detection result matrix in a general flowchart of the crawler search method based on barrier-free detection result deduplication provided by the invention.
Detailed description:
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Taking a website as an example, the method comprises the following specific steps:
S1. Obtain from user input the link to the website home page and the total number of web pages to be acquired, totalcount. Add the home page link to the URL queue.
S2. Take the link at the head of the queue of URLs to be crawled and visit it to obtain the web page source code. Increase the number of visited links, finishcount, by 1.
S3. If the number of visited links finishcount and the number of links marked as skipped skipcount satisfy finishcount + skipcount ≥ totalcount, end the process; otherwise continue.
S4. Extract a barrier-free detection item matrix from the web page source code.
S41. Select a subset of rules from GB/T 37668-2019, "Information technology: Technical requirements and test methods for Internet content accessibility". The selected rules meet the following criteria:
1. They are simple to implement, depending only on the web page source code and not on image, video, or audio information.
2. They are fast to check: the total time for all rules on a single web page does not exceed 1 second.
According to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap.
S42. Apply the rules selected in step S41 to the web page source code. For each rule, the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown.
S43. Concatenate the vectors obtained for the detection rules in step S42 into a matrix in a fixed order; the rules may, for example, be ordered by rule number. The resulting matrix is $M = [r_1, r_2, r_3, \dots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule.
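As an illustration of steps S42 and S43, the sketch below turns per-rule outcomes into the vector $r = [N_s, N_p, N_f, N_i]$ and stacks the vectors in rule-number order to form $M$. The `ImgAltCheck` class is a hypothetical stand-in for one source-code-only rule (it treats every `<img>` tag as a detection point); it is not taken from GB/T 37668-2019, and the pass/fail/unknown criteria in it are assumptions made purely for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class ImgAltCheck(HTMLParser):
    """Hypothetical rule: each <img> tag is one detection point."""

    def __init__(self):
        super().__init__()
        self.outcomes = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        if alt is None:
            self.outcomes.append("fail")      # no usable alt attribute
        elif alt.strip() == "":
            self.outcomes.append("unknown")   # empty alt: cannot decide from source alone
        else:
            self.outcomes.append("pass")


def img_alt_outcomes(html):
    """Return the list of per-detection-point outcomes for the hypothetical rule."""
    checker = ImgAltCheck()
    checker.feed(html)
    checker.close()
    return checker.outcomes


def rule_vector(outcomes):
    """Build r = [N_s, N_p, N_f, N_i] from a list of outcome strings (S42)."""
    counts = Counter(outcomes)
    return [len(outcomes), counts["pass"], counts["fail"], counts["unknown"]]


def detection_matrix(html, rules):
    """Build M = [r_1, ..., r_n], ordering the rules by their number (S43).

    `rules` maps a rule number to a callable that returns outcomes for `html`.
    """
    return [rule_vector(rules[rule_id](html)) for rule_id in sorted(rules)]
```

Sorting by rule number keeps the row order identical across pages, so that matrices from different pages can be compared element by element.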
S5. When link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B. Find the parent link of the currently visited link; if not all child links of that parent link have been visited, return to step S2. Otherwise, collect the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \dots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link. Perform cluster analysis on the set with DBSCAN, a density-based clustering method, dividing $C$ into several clusters.
S6. For each cluster obtained in step S5, sample at the ratio $\lambda$. Put the sampled results into the set $R = \{H_1, H_2, H_3, \dots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \dots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix.
S7. For the set $R$ obtained in step S6, extract the links from each element and add them to the URL queue. For the set $T$ obtained in step S6, extract the links from each element and add all of them to the set $P = \{U_1, U_2, U_3, \dots, U_n\}$, where $U_i$ is the $i$-th link. The links in $P$ are the links deduplicated on the basis of barrier-free detection results; they are marked as skipped, and the number of skipped links skipcount is increased by $\mathrm{Card}(P)$, the number of elements in $P$.
S8. If finishcount + skipcount ≥ totalcount, enough web pages have been acquired and the process ends; otherwise return to step S2.
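The clustering of step S5 and the sampling of step S6 might look like the following sketch. It is written under several assumptions that the text does not fix: each matrix is flattened into a vector and compared by Euclidean distance, the DBSCAN parameters `eps` and `min_samples` are chosen by the caller, each noise point is treated as a cluster of its own, and at least one page is sampled per cluster. The small pure-Python `dbscan` is included only to keep the example self-contained; a library implementation could be used instead.

```python
import math
import random
from collections import deque


def flatten(matrix):
    """Turn M = [r_1, ..., r_n] into a single feature vector."""
    return [value for row in matrix for value in row]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)               # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:            # not a core point (for now: noise)
            labels[i] = -1
            continue
        labels[i] = cluster
        frontier = deque(seeds)
        while frontier:
            j = frontier.popleft()
            if labels[j] == -1:                 # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:   # j is a core point: keep expanding
                frontier.extend(reachable)
        cluster += 1
    return labels


def split_by_cluster(labels, ratio, rng):
    """Return (R, T): indices of sampled representatives and of the remaining pages."""
    groups = {}
    for index, label in enumerate(labels):
        # Assumption: every noise point forms its own singleton cluster.
        key = ("noise", index) if label == -1 else ("cluster", label)
        groups.setdefault(key, []).append(index)
    representatives = []
    for members in groups.values():
        # Assumption: sample at least one and at most all pages of a cluster.
        k = min(len(members), max(1, math.ceil(ratio * len(members))))
        representatives.extend(rng.sample(members, k))
    chosen = set(representatives)
    rest = [i for i in range(len(labels)) if i not in chosen]
    return sorted(representatives), rest
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when comparing crawl runs.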
FIG. 1 shows the overall flowchart of the crawler searching method based on barrier-free detection result deduplication provided by the invention.
FIG. 2 shows the flowchart for obtaining the web page barrier-free detection result matrix within that overall flow; it corresponds to steps S41 to S43 described above.
The embodiments described in this specification merely illustrate forms in which the inventive concept may be implemented. The scope of protection of the invention should not be construed as limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that those skilled in the art can conceive based on the inventive concept.

Claims (1)

1. A crawler searching method based on barrier-free detection result deduplication, comprising the following steps:
S1. obtaining from user input the link to the website home page and the total number of web pages to be acquired, totalcount; adding the home page link to a URL queue;
S2. taking the link at the head of the queue of URLs to be crawled and visiting it to obtain the web page source code; increasing the number of visited links, finishcount, by 1;
S3. if the number of visited links finishcount and the number of links marked as skipped skipcount satisfy finishcount + skipcount ≥ totalcount, ending the process, and otherwise continuing;
S4. extracting a barrier-free detection item matrix from the web page source code;
S41. selecting a subset of rules from GB/T 37668-2019, "Information technology: Technical requirements and test methods for Internet content accessibility"; the selected rules meeting the following criteria:
1. they are simple to implement, depending only on the web page source code and not on image, video, or audio information;
2. they are fast to check, the total time for all rules on a single web page not exceeding 1 second;
according to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap;
S42. applying the rules selected in step S41 to the web page source code, wherein for each rule the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown;
S43. concatenating the vectors obtained for the detection rules in step S42 into a matrix in a fixed order, wherein the rules may be ordered by rule number; the resulting matrix being $M = [r_1, r_2, r_3, \dots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule;
S5. when link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B; finding the parent link of the currently visited link, and if not all child links of that parent link have been visited, returning to step S2; otherwise, collecting the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \dots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link; performing cluster analysis on the set with DBSCAN, a density-based clustering method, dividing $C$ into several clusters;
S6. for each cluster obtained in step S5, sampling at the ratio $\lambda$, putting the sampled results into the set $R = \{H_1, H_2, H_3, \dots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \dots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix;
S7. for the set $R$ obtained in step S6, extracting the links from each element and adding them to the URL queue; for the set $T$ obtained in step S6, extracting the links from each element and adding all of them to the set $P = \{U_1, U_2, U_3, \dots, U_n\}$, where $U_i$ is the $i$-th link; the links in $P$ being the links deduplicated on the basis of barrier-free detection results and marked as skipped, and the number of skipped links skipcount being increased by $\mathrm{Card}(P)$, the number of elements in $P$;
S8. if finishcount + skipcount ≥ totalcount, enough web pages have been acquired and the process ends; otherwise returning to step S2.
CN202110849849.6A 2021-07-27 2021-07-27 Crawler searching method based on barrier-free detection result deduplication Active CN113779377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110849849.6A CN113779377B (en) 2021-07-27 2021-07-27 Crawler searching method based on barrier-free detection result deduplication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110849849.6A CN113779377B (en) 2021-07-27 2021-07-27 Crawler searching method based on barrier-free detection result deduplication

Publications (2)

Publication Number Publication Date
CN113779377A CN113779377A (en) 2021-12-10
CN113779377B true CN113779377B (en) 2024-03-22

Family

ID=78836163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110849849.6A Active CN113779377B (en) 2021-07-27 2021-07-27 Crawler searching method based on barrier-free detection result deduplication

Country Status (1)

Country Link
CN (1) CN113779377B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115373649B (en) * 2022-07-26 2023-03-31 哈尔滨亿时代数码科技开发有限公司 Dynamic internet content barrier-free transformation method and device and website content barrier-free transformation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101694658A (en) * 2009-10-20 2010-04-14 浙江大学 Method for constructing webpage crawler based on repeated removal of news
CN101989303A (en) * 2010-11-02 2011-03-23 浙江大学 Automatic barrier-free network detection method
CN103279548A (en) * 2013-06-06 2013-09-04 浙江大学 Method for performing barrier-free detection on websites
CN112257073A (en) * 2020-10-29 2021-01-22 重庆邮电大学 Webpage duplicate removal method based on improved DBSCAN algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013051005A2 (en) * 2011-07-06 2013-04-11 Kanani Hirenkumar Nathalal A method of a web based product crawler for products offering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
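Fast barrier-free detection sampling method based on URL clustering (in English); Meng-ni ZHANG; Can WANG; Jia-jun BU; Zhi YU; Yu ZHOU; Chun CHEN; Frontiers of Information Technology & Electronic Engineering (English edition); 20150603 (No. 006); 449-456 *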

Also Published As

Publication number Publication date
CN113779377A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant