CN113779377B - Crawler searching method based on barrier-free detection result deduplication - Google Patents
Crawler searching method based on barrier-free detection result deduplication
- Publication number
- CN113779377B (application number CN202110849849.6A)
- Authority
- CN
- China
- Prior art keywords
- link
- links
- detection
- pages
- barrier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A crawler search method based on deduplication of barrier-free (accessibility) detection results. The total number of pages to crawl is preset. Links are taken from a URL queue in a loop; each link is accessed and the web page source code is obtained. A selected subset of rules is checked against the web page source code, and the detection results are combined into a feature matrix. After all links extracted from a web page have been accessed, the matrices obtained for all of the linked pages are clustered with the DBSCAN algorithm. Each cluster is randomly sampled to obtain its representative pages; links are extracted from the representative pages and added to the URL (uniform resource locator) queue. Because the barrier-free detection results of the other pages in a cluster are similar to those of the representative pages, those pages are marked as "skipped" and the crawled-page count is increased directly without actually crawling them. The invention is used in the link-crawling stage of automatically detecting how user-friendly a website's pages are; by controlling the number of pages crawled, it speeds up crawling and thereby improves detection efficiency.
Description
Technical field:
The invention belongs to the field of information accessibility, and in particular relates to a crawler module used in the barrier-free detection step.
Background art:
In the era of big data, people increasingly obtain the information they need from the massive data on the Internet. As the "Internet Plus" concept has taken hold, information has acquired a broader meaning against the background of social development. How to use information technology to achieve equal access to information in the Internet environment, so that everyone, including people with disabilities and other vulnerable groups, can conveniently obtain and use information, is a current focus of public attention.
A barrier-free web means that people with disabilities, as well as able-bodied people with special needs, can obtain any information on the network. This requires both the web content itself and the assistive software technologies used on the web to be barrier-free. With the rapid development of Internet technology, data is presented in web pages in increasingly diverse forms. To show more information in a page and to draw the user's attention to a particular type of information, developers often present it as floating windows, sidebar advertisements and the like. This approach often gives users a poor experience and also makes it harder for vulnerable groups (visually impaired people, elderly people) to obtain information, because they can no longer reach the normal content of the page through assistive means. To lower the barrier to information for vulnerable groups, barrier-free construction of web pages is therefore necessary.
Barrier-free detection of web pages is an important link in barrier-free web construction. It uncovers designs in a page that hinder target users from obtaining information, and so provides an effective basis for subsequent barrier-free optimization of the website. Detecting a whole website first requires obtaining the site's pages with a crawler. Most current crawlers use breadth-first search together with content-based deduplication. As barrier-free detection of web pages becomes more intelligent, detecting a single page takes longer and longer. Running detection on every page of a website therefore greatly increases the detection cost and makes it hard to evaluate the site's degree of accessibility quickly and objectively, so a different crawler search method is urgently needed to improve detection efficiency.
Summary of the invention:
To address the above problems and difficulties, the invention provides a crawler searching method based on barrier-free detection result deduplication. Compared with traditional breadth-first search, the method judges the similarity of barrier-free detection results to reduce the number of web pages that must be crawled, which also reduces the number of calls to complex barrier-free detection methods. Compared with methods that judge page similarity from page content and structure, it takes less time and fewer resources, increases the diversity of the barrier-free detection results collected for a website, and speeds up barrier-free detection of the website.
The crawler searching method based on the barrier-free detection result deduplication comprises the following specific steps:
S1, obtain the link of the website home page and the total number of web pages to be acquired, totalCount, from user input. Add the home page link to the URL queue.
S2, take the first link from the queue of URLs to be crawled and access it to obtain the web page source code. Increase the number of accessed links, finishCount, by 1.
S3, if the number of accessed links finishCount and the number of links marked as skipped skipCount satisfy finishCount + skipCount ≥ totalCount, end the flow; otherwise continue with the steps below.
S4, extract the barrier-free detection item matrix from the web page source code.
S41, select a subset of rules from GB/T 37668-2019 (Information technology, technical requirements and test methods for Internet content accessibility). The selected rules meet the following criteria: they are simple to implement, depending only on the web page source code and not involving image, video or audio information; and they are fast to check, with all rules together taking no more than 1 second on a single web page. According to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap.
S42, apply the rules selected in step S41 to the web page source code. For each rule, the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown.
S43, concatenate the vectors obtained in step S42 for the individual detection rules into a matrix in a fixed order; the order of the rules is fixed and can follow the rule numbers. The resulting matrix has the form $M = [r_1, r_2, r_3, \ldots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule.
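Steps S42 and S43 can be illustrated with a minimal sketch. The rule identifiers in `RULE_ORDER`, the `"pass"`/`"fail"`/`"unknown"` outcome strings and both function names are hypothetical; the text only requires that each rule yield $[N_s, N_p, N_f, N_i]$ and that the rule order be fixed.

```python
from typing import Dict, List

# Hypothetical rule identifiers; the text only requires a fixed order.
RULE_ORDER = ["non_text_link", "non_text_control", "non_text_content"]


def rule_vector(outcomes: List[str]) -> List[int]:
    """Return r = [N_s, N_p, N_f, N_i] for one rule.

    `outcomes` holds one entry per detection point:
    "pass", "fail" or "unknown" (assumed encoding).
    """
    n_p = outcomes.count("pass")
    n_f = outcomes.count("fail")
    n_i = outcomes.count("unknown")
    return [len(outcomes), n_p, n_f, n_i]


def feature_matrix(results: Dict[str, List[str]]) -> List[List[int]]:
    """Stack the per-rule vectors in the fixed RULE_ORDER to form M."""
    return [rule_vector(results.get(rule, [])) for rule in RULE_ORDER]


# Detection outcomes for one page, keyed by rule (made-up sample data).
page_results = {
    "non_text_content": ["pass", "fail", "unknown"],
    "non_text_link": ["pass", "pass"],
}
M = feature_matrix(page_results)
print(M)
```

Because the rows always follow `RULE_ORDER`, matrices from different pages are directly comparable row by row, which is what the later clustering step relies on. A rule with no detection points on a page contributes an all-zero row.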
S5, when link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B. Find the parent link of the currently accessed link; if not all child links of that parent link have been accessed yet, return to step S2 and continue. Otherwise, collect the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \ldots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link. Perform cluster analysis on the set with the density-based clustering method DBSCAN, dividing $C$ into several clusters.
S6, sample each cluster obtained in step S5 according to the proportion $\lambda$. Put the sampled results into the set $R = \{H_1, H_2, H_3, \ldots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \ldots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix.
S7, for the set $R$ obtained in step S6, extract the links from each element and add them to the URL queue. For the set $T$ obtained in step S6, extract the links from each element and add all of them to the set $P = \{U_1, U_2, U_3, \ldots, U_n\}$, where $U_i$ is the $i$-th link. The links in $P$ are the links deduplicated according to the barrier-free detection results; they are marked as skipped, and the number of skipped links skipCount is increased by the number of elements in $P$, Card($P$).
S8, if finishCount + skipCount ≥ totalCount, enough web pages have been acquired and the flow ends; otherwise, repeat from step S2.
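The control flow of steps S1 to S8 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `SITE` is a hypothetical in-memory site standing in for real page access, `split` is a hypothetical callback standing in for steps S4 to S6 (feature extraction, clustering and sampling), and treating the home page as having no parent, so that its links are queued directly, is an assumption because the text does not say how the first page is handled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory site: URL -> links found in that page's source.
SITE: Dict[str, List[str]] = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1", "/b2"],
    "/c": ["/c1"],
}

Split = Callable[[List[str]], Tuple[List[str], List[str]]]


def crawl(home: str, total_count: int, split: Split) -> Tuple[int, List[str]]:
    """Return (finish_count, skipped_links) for the sketched flow."""
    queue = deque([home])                      # S1
    parent_of: Dict[str, str] = {}
    children: Dict[str, List[str]] = {}        # parent -> queued child links
    accessed: Dict[str, List[str]] = {}        # parent -> children accessed
    seen = {home}
    skipped: List[str] = []                    # the set P, accumulated
    finish_count = 0

    def enqueue_links(page: str) -> None:
        fresh = [u for u in SITE.get(page, []) if u not in seen]
        seen.update(fresh)
        children[page] = fresh
        for u in fresh:
            parent_of[u] = page
        queue.extend(fresh)

    while queue:
        url = queue.popleft()                  # S2: access the link
        finish_count += 1
        if finish_count + len(skipped) >= total_count:   # S3
            break
        parent = parent_of.get(url)
        if parent is None:                     # assumption: home page case
            enqueue_links(url)
            continue
        accessed.setdefault(parent, []).append(url)
        if len(accessed[parent]) < len(children[parent]):
            continue                           # S5: siblings still pending
        representatives, rest = split(accessed[parent])  # S4-S6 stand-in
        for page in representatives:           # S7: follow set R
            enqueue_links(page)
        for page in rest:                      # S7: links of set T go to P
            fresh = [u for u in SITE.get(page, []) if u not in seen]
            seen.update(fresh)
            skipped.extend(fresh)
        if finish_count + len(skipped) >= total_count:   # S8
            break
    return finish_count, skipped


def keep_first(pages: List[str]) -> Tuple[List[str], List[str]]:
    """Toy split: the first page represents its whole sibling group."""
    return pages[:1], pages[1:]


print(crawl("/", 7, keep_first))
```

In this toy run the budget of 7 pages is reached after only 4 real accesses, because the 3 links found on the non-representative pages are counted as skipped rather than fetched.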
In summary, the invention provides a crawler searching method based on barrier-free detection result deduplication, which has the following beneficial effects:
(1) By reducing the number of the web pages to be crawled, the speed of the overall barrier-free detection of the website is improved.
(2) Compared with similarity judgment methods based on web page content and web page structure, the method integrates the barrier-free detection step with the crawling step and uses the barrier-free detection results as features, so no extra computation is needed.
(3) The method is general: it uses the barrier-free detection results as features and does not depend on the content or structure of the website, so it can be applied to different types of websites.
Description of the drawings:
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a general flow chart of a crawler search method based on barrier-free detection result deduplication provided by the invention.
FIG. 2 shows the flowchart for obtaining the web page barrier-free detection result matrix within the general flow of the crawler search method based on barrier-free detection result deduplication provided by the invention.
Detailed description of the embodiments:
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Taking a website as an example, the method comprises the following specific steps:
S1, obtain the link of the website home page and the total number of web pages to be acquired, totalCount, from user input. Add the home page link to the URL queue.
S2, take the first link from the queue of URLs to be crawled and access it to obtain the web page source code. Increase the number of accessed links, finishCount, by 1.
S3, if the number of accessed links finishCount and the number of links marked as skipped skipCount satisfy finishCount + skipCount ≥ totalCount, end the flow; otherwise continue with the steps below.
S4, extract the barrier-free detection item matrix from the web page source code.
S41, select a subset of rules from GB/T 37668-2019 (Information technology, technical requirements and test methods for Internet content accessibility). The selected rules meet the following criteria:
1. They are simple to implement, depending only on the web page source code and not involving image, video or audio information.
2. They are fast to check: all rules together take no more than 1 second on a single web page.
According to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap.
S42, apply the rules selected in step S41 to the web page source code. For each rule, the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown.
S43, concatenate the vectors obtained in step S42 for the individual detection rules into a matrix in a fixed order; the order of the rules is fixed and can follow the rule numbers. The resulting matrix has the form $M = [r_1, r_2, r_3, \ldots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule.
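To make the idea of a source-only rule concrete, the following sketch produces one vector $r$ from raw HTML using Python's standard `html.parser`. The check itself is a hypothetical stand-in, not the rule text of GB/T 37668-2019: it treats every `<img>` as one detection point, counts a non-empty `alt` as pass, a missing `alt` as fail, and an empty `alt` as unknown (it may mark a decorative image, which cannot be decided from the source alone).

```python
from html.parser import HTMLParser
from typing import List


class ImgAltCheck(HTMLParser):
    """Hypothetical 'non-text content' check: one detection point per <img>."""

    def __init__(self) -> None:
        super().__init__()
        self.passed = 0
        self.failed = 0
        self.unknown = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.failed += 1                      # no text alternative at all
        elif (attributes["alt"] or "").strip():
            self.passed += 1                      # non-empty alt text
        else:
            self.unknown += 1                     # empty alt: undecidable here


def check_img_alt(source: str) -> List[int]:
    """Return r = [N_s, N_p, N_f, N_i] for the sketched rule."""
    checker = ImgAltCheck()
    checker.feed(source)
    checker.close()
    total = checker.passed + checker.failed + checker.unknown
    return [total, checker.passed, checker.failed, checker.unknown]


sample_source = (
    '<p><img src="a.png" alt="Logo">'
    '<img src="b.png">'
    '<img src="c.png" alt=""></p>'
)
r = check_img_alt(sample_source)
print(r)
```

A check of this kind needs only a single pass over the page source and no image decoding, which matches the two selection criteria in S41.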
S5, when link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B. Find the parent link of the currently accessed link; if not all child links of that parent link have been accessed yet, return to step S2 and continue. Otherwise, collect the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \ldots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link. Perform cluster analysis on the set with the density-based clustering method DBSCAN, dividing $C$ into several clusters.
S6, sample each cluster obtained in step S5 according to the proportion $\lambda$. Put the sampled results into the set $R = \{H_1, H_2, H_3, \ldots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \ldots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix.
S7, for the set $R$ obtained in step S6, extract the links from each element and add them to the URL queue. For the set $T$ obtained in step S6, extract the links from each element and add all of them to the set $P = \{U_1, U_2, U_3, \ldots, U_n\}$, where $U_i$ is the $i$-th link. The links in $P$ are the links deduplicated according to the barrier-free detection results; they are marked as skipped, and the number of skipped links skipCount is increased by the number of elements in $P$, Card($P$).
S8, if finishCount + skipCount ≥ totalCount, enough web pages have been acquired and the flow ends; otherwise, repeat from step S2.
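The clustering and sampling of steps S5 and S6 can be sketched in pure Python as below. Several details are assumptions, because the text does not fix them: each matrix is flattened and compared by Euclidean distance, the DBSCAN parameters `eps` and `min_pts` are arbitrary illustrative values, at least one page is kept per cluster, and pages that DBSCAN labels as noise are all kept as representatives. The function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def flatten(matrix: Sequence[Sequence[int]]) -> List[float]:
    """Turn a detection item matrix M into one feature vector."""
    return [float(value) for row in matrix for value in row]


def dbscan(points: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: one cluster label per point, -1 for noise."""
    unvisited, noise = -2, -1
    labels = [unvisited] * len(points)

    def neighbours(i: int) -> List[int]:
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] != unvisited:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:         # core point: keep expanding
                queue.extend(more)
    return labels


def sample_clusters(labels: List[int], ratio: float,
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """Split page indices into representatives R and the remainder T."""
    groups: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    representatives: List[int] = []
    rest: List[int] = []
    for label, members in groups.items():
        if label == -1:                      # assumption: keep all noise pages
            representatives.extend(members)
            continue
        k = min(len(members), max(1, math.ceil(ratio * len(members))))
        chosen = set(rng.sample(members, k))
        representatives.extend(m for m in members if m in chosen)
        rest.extend(m for m in members if m not in chosen)
    return representatives, rest


# Made-up matrices for four sibling pages (two rules each).
matrices = [
    [[3, 3, 0, 0], [2, 1, 1, 0]],
    [[3, 3, 0, 0], [2, 1, 1, 0]],
    [[3, 2, 1, 0], [2, 1, 1, 0]],
    [[40, 5, 30, 5], [9, 0, 9, 0]],
]
labels = dbscan([flatten(m) for m in matrices], eps=2.0, min_pts=2)
R, T = sample_clusters(labels, ratio=0.3, rng=random.Random(0))
print(labels, R, T)
```

Here the first three pages fall into one cluster and the fourth is noise, so one page from the cluster plus the outlier form $R$ and the other two pages form $T$. In practice a library implementation of DBSCAN could replace the hand-written one; it is written out here only to keep the sketch self-contained.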
FIG. 1 shows the general flowchart of the crawler search method based on barrier-free detection result deduplication provided by the invention.
FIG. 2 shows the flowchart for obtaining the web page barrier-free detection result matrix within the general flow of the crawler search method based on barrier-free detection result deduplication provided by the invention:
S41, select a subset of rules from GB/T 37668-2019 (Information technology, technical requirements and test methods for Internet content accessibility). The selected rules meet the following criteria: 1. They are simple to implement, depending only on the web page source code and not involving image, video or audio information. 2. They are fast to check: all rules together take no more than 1 second on a single web page. According to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap.
S42, apply the rules selected in step S41 to the web page source code. For each rule, the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown.
S43, concatenate the vectors obtained in step S42 for the individual detection rules into a matrix in a fixed order; the order of the rules is fixed and can follow the rule numbers. The resulting matrix has the form $M = [r_1, r_2, r_3, \ldots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule.
The embodiments described in this specification merely illustrate forms in which the inventive concept may be implemented. The scope of protection of the invention should not be regarded as limited to the specific forms set forth in the embodiments; it also extends to equivalent technical means that a person skilled in the art can conceive of on the basis of the inventive concept.
Claims (1)
1. A crawler searching method based on barrier-free detection result deduplication comprises the following steps:
S1, obtaining the link of the website home page and the total number of web pages to be acquired, totalCount, from user input; adding the home page link to a URL queue;
S2, taking the first link from the queue of URLs to be crawled and accessing it to obtain the web page source code; increasing the number of accessed links, finishCount, by 1;
S3, if the number of accessed links finishCount and the number of links marked as skipped skipCount satisfy finishCount + skipCount ≥ totalCount, ending the flow, otherwise continuing with the steps below;
S4, extracting the barrier-free detection item matrix from the web page source code;
S41, selecting a subset of rules from GB/T 37668-2019 (Information technology, technical requirements and test methods for Internet content accessibility); the selected rules meet the following criteria:
1. they are simple to implement, depending only on the web page source code and not involving image, video or audio information;
2. they are fast to check, with all rules together taking no more than 1 second on a single web page;
according to these criteria, 7 accessibility rules are selected from the national standard, named: non-text links, non-text controls, non-text content, user contact feedback, real-time user contact feedback, consistent navigation, in-site search, and sitemap;
S42, applying the rules selected in step S41 to the web page source code, wherein for each rule the detection result has the form $r = [N_s, N_p, N_f, N_i]$, where $N_s$ is the number of detection points, $N_p$ is the number of detection points whose result is pass, $N_f$ is the number of detection points whose result is fail, and $N_i$ is the number of detection points whose result is unknown;
S43, concatenating the vectors obtained in step S42 for the individual detection rules into a matrix in a fixed order, wherein the order of the rules is fixed and can follow the rule numbers; the resulting matrix has the form $M = [r_1, r_2, r_3, \ldots, r_n]$, where $r_i$ is the vector corresponding to the $i$-th rule;
S5, when link A is extracted from the web page source code corresponding to link B, link B is called the parent link of link A, and link A is a child link of link B; finding the parent link of the currently accessed link, and if not all child links of that parent link have been accessed yet, returning to step S2 and continuing; otherwise, collecting the matrices obtained in step S4 into the set $C = \{M_1, M_2, M_3, \ldots, M_n\}$, where $M_i$ is the barrier-free detection item matrix of the web page source code corresponding to the $i$-th child link; performing cluster analysis on the set with the density-based clustering method DBSCAN, dividing $C$ into several clusters;
S6, sampling each cluster obtained in step S5 according to the proportion $\lambda$, putting the sampled results into the set $R = \{H_1, H_2, H_3, \ldots, H_m\}$ and the remaining results into $T = \{H_{m+1}, \ldots, H_n\}$, where $H_i$ is the original web page source code corresponding to the $i$-th barrier-free detection item matrix;
S7, for the set $R$ obtained in step S6, extracting the links from each element and adding them to the URL queue; for the set $T$ obtained in step S6, extracting the links from each element and adding all of them to the set $P = \{U_1, U_2, U_3, \ldots, U_n\}$, where $U_i$ is the $i$-th link; the links in $P$ are the links deduplicated according to the barrier-free detection results, are marked as skipped, and the number of skipped links skipCount is increased by the number of elements in $P$, Card($P$);
S8, if finishCount + skipCount ≥ totalCount, enough web pages have been acquired and the flow ends; otherwise, repeating from step S2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849849.6A CN113779377B (en) | 2021-07-27 | 2021-07-27 | Crawler searching method based on barrier-free detection result deduplication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849849.6A CN113779377B (en) | 2021-07-27 | 2021-07-27 | Crawler searching method based on barrier-free detection result deduplication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113779377A CN113779377A (en) | 2021-12-10 |
CN113779377B true CN113779377B (en) | 2024-03-22 |
Family
ID=78836163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110849849.6A Active CN113779377B (en) | 2021-07-27 | 2021-07-27 | Crawler searching method based on barrier-free detection result deduplication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113779377B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115373649B (en) * | 2022-07-26 | 2023-03-31 | 哈尔滨亿时代数码科技开发有限公司 | Dynamic internet content barrier-free transformation method and device and website content barrier-free transformation method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101694658A (en) * | 2009-10-20 | 2010-04-14 | 浙江大学 | Method for constructing webpage crawler based on repeated removal of news |
CN101989303A (en) * | 2010-11-02 | 2011-03-23 | 浙江大学 | Automatic barrier-free network detection method |
CN103279548A (en) * | 2013-06-06 | 2013-09-04 | 浙江大学 | Method for performing barrier-free detection on websites |
CN112257073A (en) * | 2020-10-29 | 2021-01-22 | 重庆邮电大学 | Webpage duplicate removal method based on improved DBSCAN algorithm |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013051005A2 (en) * | 2011-07-06 | 2013-04-11 | Kanani Hirenkumar Nathalal | A method of a web based product crawler for products offering |
Non-Patent Citations (1)
Title |
---|
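A fast barrier-free detection sampling method based on URL clustering (in English); Meng-ni ZHANG; Can WANG; Jia-jun BU; Zhi YU; Yu ZHOU; Chun CHEN; Frontiers of Information Technology & Electronic Engineering (English edition); 2015-06-03 (No. 006); 449-456 * |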
Also Published As
Publication number | Publication date |
---|---|
CN113779377A (en) | 2021-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110825957B (en) | Deep learning-based information recommendation method, device, equipment and storage medium | |
US8554759B1 (en) | Selection of documents to place in search index | |
US7844594B1 (en) | Information search, retrieval and distillation into knowledge objects | |
Srikant et al. | Mining web logs to improve website organization | |
JP5148278B2 (en) | Method and system for selecting a language for text segmentation | |
JP5384837B2 (en) | System and method for annotating documents | |
US8538943B1 (en) | Providing images of named resources in response to a search query | |
US7231405B2 (en) | Method and apparatus of indexing web pages of a web site for geographical searchine based on user location | |
US20150088846A1 (en) | Suggesting keywords for search engine optimization | |
US7072890B2 (en) | Method and apparatus for improved web scraping | |
US8271495B1 (en) | System and method for automating categorization and aggregation of content from network sites | |
JP6785921B2 (en) | Picture search method, device, server and storage medium | |
CN110929145B (en) | Public opinion analysis method, public opinion analysis device, computer device and storage medium | |
CN103955529A (en) | Internet information searching and aggregating presentation method | |
AU2007243784A1 (en) | Propagating useful information among related web pages, such as web pages of a website | |
CN102436564A (en) | Method and device for identifying falsified webpage | |
JP2005182817A (en) | Query recognizer | |
CN107193987A (en) | Obtain the methods, devices and systems of the search term related to the page | |
CN105868290B (en) | Method and device for displaying search results | |
US20110238653A1 (en) | Parsing and indexing dynamic reports | |
WO2021068681A1 (en) | Tag analysis method and device, and computer readable storage medium | |
US20220215065A1 (en) | Intelligent browser bookmark management | |
CN113779377B (en) | Crawler searching method based on barrier-free detection result deduplication | |
Li | [Retracted] Internet Tourism Resource Retrieval Using PageRank Search Ranking Algorithm | |
CN105204806A (en) | Individual display method and device for mobile terminal webpage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||