CN103744944A

CN103744944A - Method for re-filtering in webpage or data crawling by web crawler

Info

Publication number: CN103744944A
Application number: CN201310754635.6A
Authority: CN
Inventors: 朱龙腾
Original assignee: SHANGHAI BOSHI INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Current assignee: SHANGHAI BOSHI INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Priority date: 2013-12-31
Filing date: 2013-12-31
Publication date: 2014-04-23

Abstract

The invention discloses a method for re-filtering when a web crawler crawls webpages or data. The method comprises: inputting keywords of the information to be searched; searching for URL (uniform resource locator) addresses by a server; crawling the information of the target webpages from the searched URL addresses; inputting secondary search keywords; crawling the webpage information again; and outputting the target information. The method re-filters webpages on top of the web crawler's automatic webpage search. Because the amount of information on the Internet is very large, finding target information by hand takes considerable manpower, and one still cannot tell whether the information found is the best available. The method refines the search information so that the target information can be acquired conveniently and effectively.

Description

Method for re-filtering when a web crawler captures webpages or data

Field of the invention

The present invention relates to a method for capturing webpages during a search process, and belongs to the field of network technology.

Background art

A web crawler is a program that automatically fetches webpages. It downloads pages from the World Wide Web for a search engine and is an important component of a search engine. A traditional crawler starts from the URLs of one or several initial pages, obtains the URLs found on those pages, and, while capturing webpages, continually extracts new URLs from the current page and puts them into a queue until a stop condition of the system is met. A web crawler is thus a program or script that automatically captures World Wide Web information according to certain rules. Other, less commonly used names include ant, automatic indexer, emulator, and worm. The accuracy with which a crawler retrieves target webpages is still not very high, which makes it harder to obtain the information we need. For this reason, we propose a method by which a web crawler re-filters when capturing webpages or data.
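The queue-driven behaviour of a traditional crawler described above can be illustrated with a minimal sketch. It is not taken from the patent: the in-memory link table `LINKS`, the function name `crawl`, and the page limit `max_pages` used as the stop condition are all assumptions made for illustration, and no real network access takes place.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
LINKS = {
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/c", "http://example.com/d"],
    "http://example.com/c": ["http://example.com/a"],
    "http://example.com/d": [],
}


def crawl(seeds, links, max_pages):
    """Visit pages breadth-first from the seed URLs.

    Stops when the queue is empty or max_pages pages have been visited
    (an assumed stop condition; the source only says "a stop condition").
    """
    queue = deque(seeds)
    seen = set(seeds)      # URLs already queued, so none is queued twice
    visited = []           # URLs in the order they were captured
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        # Extract new URLs from the current page and put them into the queue.
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited


order = crawl(["http://example.com/a"], LINKS, max_pages=3)
```

A real crawler would replace the lookup in `LINKS` with an HTTP fetch and link extraction; the queue and stop-condition logic would stay the same.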

Summary of the invention

To solve the problem that current web crawlers capture target webpages inaccurately during the search process, the present invention provides a method for re-filtering when a web crawler captures webpages or data. The invention comprises the following steps:

Step 1: input the keyword of the information to be searched;

Step 2: the server searches for URL addresses;

Step 3: capture the information of the target webpages from the searched URL addresses;

Step 4: input a secondary search keyword;

Step 5: capture the webpage information again;

Step 6: output the target information.
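The six steps above can be sketched as a two-pass filter. This is only an illustration under stated assumptions: the corpus `PAGES` stands in for the web, the function names `search_urls`, `capture`, and `refilter` are hypothetical, matching is plain case-insensitive substring search, and Step 5 is modelled as filtering the pages already captured rather than fetching them a second time. The source does not specify any of these details.

```python
# Hypothetical corpus standing in for the web: URL -> page text.
PAGES = {
    "http://example.com/python-intro": "Python tutorial for beginners",
    "http://example.com/python-web": "Python web crawler tutorial",
    "http://example.com/java-intro": "Java tutorial for beginners",
}


def search_urls(keyword, pages):
    """Step 2: return the URLs whose page text contains the keyword."""
    kw = keyword.lower()
    return [url for url, text in pages.items() if kw in text.lower()]


def capture(urls, pages):
    """Step 3: capture the page information for each searched URL."""
    return [(url, pages[url]) for url in urls]


def refilter(captured, secondary_keyword):
    """Steps 4-5: keep only captured pages that also match the secondary keyword."""
    kw = secondary_keyword.lower()
    return [(url, text) for url, text in captured if kw in text.lower()]


first_pass = capture(search_urls("python", PAGES), PAGES)  # steps 1-3
target = refilter(first_pass, "crawler")                    # steps 4-5
for url, text in target:                                    # step 6
    print(url, "-", text)
```

The first pass keeps both Python pages; the secondary keyword then narrows the result to the single page about crawlers.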

Effect of the invention: the invention filters webpages again on top of the web crawler's automatic webpage search. The amount of information on the Internet is now very large; finding target information by hand takes considerable manpower, and one still cannot tell whether the information found is the best available. The method refines the search information and provides a convenient and effective way to obtain the target information.

Brief description of the drawings

Fig. 1 is a flow chart of the method for re-filtering when a web crawler captures webpages or data.

Embodiment

Embodiment: referring to Fig. 1, the flow chart of the method for re-filtering when a web crawler captures webpages or data, the present embodiment consists of the following steps:

Step 1: input the keyword of the information to be searched;

Step 2: the server searches for URL addresses;

Step 3: capture the information of the target webpages from the searched URL addresses;

Step 4: input a secondary search keyword;

Step 5: capture the webpage information again;

Step 6: output the target information.

The length of the keyword entered for the information to be searched is not limited. Before searching for URL addresses, the server first analyzes the keyword and then selects the URL addresses to search. The information captured from the target webpages at the searched URL addresses is displayed in the form of a list. The secondary search keyword is a more specific descriptive term for the target information.
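The keyword analysis and list display described above might look like the following sketch. The source does not say how the server analyzes the keyword, so splitting on whitespace and requiring every term to match is an assumption, as are the names `analyze_keyword`, `select_urls`, `as_list`, and the sample `INDEX`.

```python
def analyze_keyword(keyword):
    """Split a keyword of any length into lowercase terms, dropping repeats in order."""
    terms = []
    for term in keyword.lower().split():
        if term not in terms:
            terms.append(term)
    return terms


def select_urls(terms, index):
    """Select the URLs whose description contains every analyzed term."""
    return [url for url, desc in index.items()
            if all(t in desc.lower() for t in terms)]


def as_list(urls, index):
    """Format the captured results as a numbered list for display."""
    return [f"{i}. {index[url]} ({url})" for i, url in enumerate(urls, start=1)]


# Hypothetical index: URL -> short description of the page.
INDEX = {
    "http://example.com/1": "Used cars in Shanghai",
    "http://example.com/2": "New cars in Beijing",
    "http://example.com/3": "Used bikes in Shanghai",
}

terms = analyze_keyword("used  Shanghai used")
lines = as_list(select_urls(terms, INDEX), INDEX)
```

A more specific secondary keyword such as "bikes" would then be applied to the listed results in the same way to narrow them further.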

It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiment, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiment should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be included in the invention. Any reference numeral in a claim should not be construed as limiting the claim concerned.

Claims

1. A method for re-filtering when a web crawler captures webpages or data, characterized in that it is realized by the following steps:

Step 1: input the keyword of the information to be searched;

Step 2: the server searches for URL addresses;

Step 3: capture the information of the target webpages from the searched URL addresses;

Step 4: input a secondary search keyword;

Step 5: capture the webpage information again;

Step 6: output the target information.

2. The method for re-filtering when a web crawler captures webpages or data according to claim 1, characterized in that: in step 2, before searching for URL addresses, the server first analyzes the keyword and then selects the URL addresses to search.

3. The method for re-filtering when a web crawler captures webpages or data according to claim 1, characterized in that: in step 3, the information of the target webpages captured from the searched URL addresses is displayed in the form of a list.

4. The method for re-filtering when a web crawler captures webpages or data according to claim 1, characterized in that: in step 4, the secondary search keyword is a more specific descriptive term for the target information.