CN108595543A - Data grab method, device and network crawler system - Google Patents

Data grab method, device and network crawler system

Info

Publication number
CN108595543A
Authority
CN
China
Prior art keywords
proxy server
data
downloader
host node
selector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810306740.6A
Other languages
Chinese (zh)
Inventor
田春燕
付鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Knownsec Information Technology Co Ltd
Original Assignee
Beijing Knownsec Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Knownsec Information Technology Co Ltd filed Critical Beijing Knownsec Information Technology Co Ltd
Priority to CN201810306740.6A priority Critical patent/CN108595543A/en
Publication of CN108595543A publication Critical patent/CN108595543A/en
Pending legal-status Critical Current

Abstract

Embodiments of the present application provide a data grabbing method, a data grabbing device and a network crawler system. The method and device are applied to a network crawler system that includes a host node, a selector and a proxy server pool. In response to a data grabbing instruction, the host node creates a downloader that includes a proxy server blacklist and assigns a data grabbing task to the downloader. The downloader requests a proxy server from the selector through the host node. The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server. When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data. When structured data can be extracted, the extracted structured data is sent to the host node and stored. In this way, interruption and blocking of the web crawler can be avoided.

Description

Data grab method, device and network crawler system
Technical field
The present application relates to the field of Internet technology, and in particular to a data grabbing method, a data grabbing device and a network crawler system.
Background technology
A web crawler, also known as a web spider, web robot, web chaser, ant, automatic indexer, emulator or worm, is a program or script that automatically grabs web pages and data on the Internet according to certain rules. A web crawler collects and downloads content from the Internet according to a configured strategy, extracts structured data from the downloaded content, and stores the extracted structured data.
Some general-purpose network crawler systems currently exist. As requirements change, a programmer only needs to change the strategy in such a system; the other parts keep using the system's original programs, so there is no need to rewrite a web crawler.
It can be seen that existing network crawler systems focus mainly on how to collect and download data efficiently. In the current network environment, however, the network resources available to a network crawler system are limited. In particular, for traffic-protection purposes, many websites forbid a large number of accesses from the same IP (Internet Protocol) address within a short time. These websites block access from such IP addresses, which interrupts or blocks the network crawler system and in turn reduces its efficiency.
Summary of the application
In view of this, the purpose of the present application includes providing a data grabbing method, a data grabbing device and a network crawler system to alleviate the above problem.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a data grabbing method applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool. The method includes:
In response to a data grabbing instruction, the host node creates a downloader for collecting and downloading data from the network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
Optionally, in the data grabbing method provided by the first aspect of the embodiments of the present application, the method further includes:
When the proxy server that the selector determines in the proxy server pool is unavailable, removing that proxy server from the proxy server pool and determining another proxy server.
Optionally, in the data grabbing method provided by the first aspect of the embodiments of the present application, the method further includes:
When the target proxy server is in the proxy server blacklist, the downloader again requests a proxy server from the selector through the host node.
Optionally, in the data grabbing method provided by the first aspect of the embodiments of the present application, the method further includes:
When structured data cannot be extracted, the downloader records the target proxy server in the proxy server blacklist and again requests a proxy server from the selector through the host node.
Optionally, in the data grabbing method provided by the first aspect of the embodiments of the present application, the method further includes:
When the data grabbing task ends, deleting the downloader and the proxy server blacklist.
In a second aspect, an embodiment of the present application further provides a data grabbing device applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool. The device includes:
A creation module, configured to control the host node to respond to a data grabbing instruction by creating a downloader for collecting and downloading data from the network, the downloader including a proxy server blacklist;
A task allocation module, configured to control the host node to assign a data grabbing task to the newly created downloader;
A request module, configured to control the downloader to request a proxy server from the selector through the host node;
A selection module, configured to control the selector to determine a proxy server in the proxy server pool and, when that proxy server is available, to assign it to the downloader as the target proxy server;
A task execution module, configured to control the downloader, when the target proxy server is not in the proxy server blacklist, to execute the data grabbing task through the target proxy server to grab data from the network, and to extract structured data from the grabbed data;
A storage module, configured to send the extracted structured data to the host node when structured data can be extracted, and to control the host node to store the structured data in a preset database.
Optionally, in the data grabbing device provided by the second aspect of the embodiments of the present application, the device further includes:
A removal module, configured to control the selector, when the proxy server it determines in the proxy server pool is unavailable, to remove that proxy server from the proxy server pool and determine another proxy server.
Optionally, in the data grabbing device provided by the second aspect of the embodiments of the present application, the request module is further configured to control the downloader, when the target proxy server is in the proxy server blacklist, to again request a proxy server from the selector through the host node.
Optionally, in the data grabbing device provided by the second aspect of the embodiments of the present application, the device further includes:
A recording module, configured to control the downloader, when structured data cannot be extracted, to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector through the host node.
In a third aspect, an embodiment of the present application further provides a network crawler system including a host node, a selector and a proxy server pool;
In response to a data grabbing instruction, the host node creates a downloader for collecting and downloading data from the network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
Compared with the prior art, the embodiments of the present application have the following beneficial effects:
In the data grabbing method, data grabbing device and network crawler system provided by the embodiments of the present application, the network crawler system includes a host node, a selector and a proxy server pool. In response to a data grabbing instruction, the host node creates a downloader that includes a proxy server blacklist and assigns a data grabbing task to the downloader. The downloader requests a proxy server from the selector through the host node. The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server. When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data. When structured data can be extracted, the extracted structured data is sent to the host node and stored. In this way, interruption and blocking of the web crawler can be avoided, which in turn improves the efficiency of the network crawler system.
Description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a connection block diagram of a network crawler system provided by an embodiment of the present application;
Fig. 2 is a flow diagram of a data grabbing method provided by an embodiment of the present application;
Fig. 3 is a functional block diagram of a data grabbing device provided by an embodiment of the present application;
Fig. 4 is another functional block diagram of the data grabbing device provided by an embodiment of the present application.
Reference numerals: 100 - network crawler system; 110 - data grabbing device; 111 - creation module; 112 - task allocation module; 113 - request module; 114 - selection module; 115 - task execution module; 116 - storage module; 117 - removal module; 118 - recording module; 119 - deletion module; 120 - host node; 130 - selector; 140 - proxy server pool; 150 - downloader; 160 - back-end database; 200 - network.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Fig. 1 is a connection block diagram of a network crawler system 100 provided by an embodiment of the present application. The network crawler system 100 includes a data grabbing device 110, a host node (Master) 120, a selector (Selector) 130 and a proxy server pool (Proxy Pool) 140. The host node 120 is communicatively connected to a back-end database 160, the selector 130 is communicatively connected to the proxy server pool 140, and the host node 120 is communicatively connected to the selector 130.
In this embodiment, the host node 120 is the main part of the network crawler system 100 and is responsible for task scheduling, database storage and the like. The proxy server pool 140 stores available proxy servers. The selector 130 is responsible for maintaining the proxy server pool 140, choosing a proxy server (Proxy) from the proxy server pool 140, and detecting whether the chosen proxy server is available. The back-end database 160 is used to store the structured data that the host node 120 obtains.
In implementation, when the host node 120 receives a data grabbing instruction to start a data grabbing task, it creates a downloader (Downloader) 150, which includes a cache for storing the proxy server blacklist (Proxy Blacklist Cache). The downloader 150 is responsible for collecting and downloading content from the network 200, structuring the data, and so on.
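The relationships among these components can be sketched with a few Python dataclasses. This is only an illustration under stated assumptions: the class and field names (`ProxyPool`, `Selector`, `Downloader`, `HostNode`, `handle_grab_instruction`) are hypothetical, proxies are represented as plain `host:port` strings, and an in-memory list stands in for the back-end database 160.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyPool:
    """Proxy server pool (140): the proxies currently believed usable."""
    proxies: set[str] = field(default_factory=set)


@dataclass
class Selector:
    """Selector (130): maintains the pool and chooses proxies from it."""
    pool: ProxyPool


@dataclass
class Downloader:
    """Downloader (150): carries its own proxy server blacklist cache."""
    task: str
    proxy_blacklist: set[str] = field(default_factory=set)


@dataclass
class HostNode:
    """Host node (120): schedules tasks and stores structured data."""
    selector: Selector
    # Hypothetical stand-in for the back-end database (160).
    database: list[dict] = field(default_factory=list)
    downloaders: list[Downloader] = field(default_factory=list)

    def handle_grab_instruction(self, task: str) -> Downloader:
        # Creating the downloader also creates its (empty) blacklist.
        downloader = Downloader(task=task)
        self.downloaders.append(downloader)
        return downloader
```

Because each `Downloader` gets its own `proxy_blacklist`, an entry recorded by one downloader does not affect another, which matches the description of the blacklist being created with, and maintained by, its downloader.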
Fig. 2 is a flow diagram of a data grabbing method provided by an embodiment of the present application. The data grabbing method is applied to the network crawler system 100 shown in Fig. 1. The method is described in detail below with reference to Fig. 2.
Step S201: in response to a data grabbing instruction, the host node 120 creates a downloader 150 for collecting and downloading data from the network 200, the downloader 150 including a proxy server blacklist.
The data grabbing instruction may be the instruction corresponding to an operation, performed by the user in a corresponding user interface, that starts data grabbing. The proxy server blacklist is created together with the downloader 150 and is maintained by the downloader 150.
In this embodiment, the host node 120 may create one, two or more downloaders 150 according to actual conditions, which this embodiment does not limit. By creating two or more downloaders 150, the data to be grabbed can be distributed among those downloaders 150, which improves efficiency.
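The text does not say how the data to be grabbed is divided among several downloaders. As one possible illustration, the sketch below splits a list of URLs round-robin; the function name `split_work` and the round-robin policy are assumptions, not part of the described method.

```python
def split_work(urls: list[str], downloader_count: int) -> list[list[str]]:
    """Divide the URLs to grab among a number of downloaders (assumed policy)."""
    if downloader_count < 1:
        raise ValueError("downloader_count must be at least 1")
    # Round-robin: the URL at index i goes to downloader i % downloader_count.
    return [urls[i::downloader_count] for i in range(downloader_count)]
```

For example, `split_work(["a", "b", "c", "d", "e"], 2)` gives the first downloader `["a", "c", "e"]` and the second `["b", "d"]`.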
Step S202: the host node 120 assigns a data grabbing task to the newly created downloader 150.
When the downloader 150 has been created successfully, the host node 120 assigns a data grabbing task to the downloader 150 according to the data grabbing instruction.
Step S203: the downloader 150 requests a proxy server from the selector 130 through the host node 120.
Step S204: the selector 130 determines a proxy server in the proxy server pool 140 and detects whether that proxy server is available. If so, step S205 is executed; if not, step S206 is executed.
In implementation, the downloader 150 may send a request for a proxy server to the host node 120. On receiving the request, the host node 120 controls the selector 130 to randomly determine a proxy server in the proxy server pool 140. Note that the selector 130 checks the availability of every proxy server it determines.
Step S205: the selector 130 assigns the proxy server to the downloader 150 as the target proxy server.
In this embodiment, when the selector 130 judges that the determined proxy server is available, the selector 130 assigns it to the downloader 150 as the target proxy server.
Step S206: the selector 130 removes the proxy server from the proxy server pool 140 and returns to step S204.
In this embodiment, when the selector 130 judges that the determined proxy server is unavailable, it removes that proxy server from the proxy server pool 140, determines another proxy server in the proxy server pool 140, and again detects whether the newly determined proxy server is available. This repeats until a determined proxy server is available, at which point that proxy server is assigned to the downloader 150 as the target proxy server.
In this way, the proxy server pool 140 is updated in time, and the network crawler system 100 is prevented from being interrupted because a selected proxy server is unavailable.
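Steps S204 to S206 can be sketched as a small loop. This is a minimal sketch under stated assumptions: `select_proxy` and its parameters are hypothetical names, the pool is a set of `host:port` strings, and the availability check is injected as a callable because the text does not say how availability is detected. Returning `None` when the pool runs empty is also an assumption; the text does not cover that case.

```python
import random
from typing import Callable, Optional


def select_proxy(
    pool: set[str],
    is_available: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a usable proxy at random, pruning unusable ones from the pool."""
    rng = rng or random.Random()
    while pool:
        # S204: randomly determine one proxy server in the pool.
        candidate = rng.choice(sorted(pool))
        if is_available(candidate):
            # S205: hand it out as the target proxy server.
            return candidate
        # S206: remove the unavailable proxy, then go back to S204.
        pool.discard(candidate)
    # Assumption: an exhausted pool is reported as None.
    return None
```

The pool is mutated in place, so proxies found to be unavailable during one request are never offered again on later requests.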
Step S207: on obtaining the target proxy server assigned by the selector 130, the downloader 150 judges whether the target proxy server is in the proxy server blacklist. If so, the method returns to step S203 so that the downloader again requests a proxy server from the selector 130 through the host node 120; if not, step S208 is executed.
In this embodiment, the proxy servers in the proxy server blacklist are blocked proxy servers. Through the above steps, a blocked proxy server is not used as the selected proxy server, which prevents the network crawler system 100 from being interrupted because its proxy server is blocked.
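The blacklist check of step S207 can be sketched as follows. The names `acquire_target_proxy` and `request_proxy` are hypothetical; `request_proxy` stands for the whole round trip of steps S203 to S205 through the host node. The `max_requests` cap is an assumption added so that the sketch terminates even if every proxy on offer is blacklisted, a case the text does not discuss.

```python
from typing import Callable, Optional


def acquire_target_proxy(
    request_proxy: Callable[[], Optional[str]],
    blacklist: set[str],
    max_requests: int = 10,
) -> Optional[str]:
    """Keep requesting proxies until one is not on the downloader's blacklist."""
    for _ in range(max_requests):
        # S203-S205: ask the selector (via the host node) for a proxy.
        proxy = request_proxy()
        if proxy is None:
            # Assumption: the selector has nothing left to offer.
            return None
        # S207: accept the proxy only if it is not blacklisted.
        if proxy not in blacklist:
            return proxy
    return None
```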
Step S208: the downloader 150 executes the data grabbing task through the target proxy server to grab data from the network 200, extracts structured data from the grabbed data, and judges whether structured data can be extracted. If so, step S209 is executed; if not, step S210 is executed.
Step S209: the downloader 150 sends the extracted structured data to the host node 120.
Step S210: the downloader 150 records the target proxy server in the proxy server blacklist and returns to step S203.
In this embodiment, because the proxy server blacklist is pre-set, it may not include every blocked proxy server. For this reason, after grabbing data, the downloader 150 judges whether the grabbed data can be structured. If it can, the target proxy server is not blocked; if it cannot, the target proxy server is blocked.
When the target proxy server is blocked, it can be added to the proxy server blacklist, and a proxy server is requested again.
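Steps S207 to S210 together can be sketched as a retry loop in which a failed extraction is treated as evidence that the proxy is blocked. All names here (`grab_with_blacklist`, `request_proxy`, `fetch`, `extract`) are hypothetical, the network fetch and the extraction are injected as callables, `extract` is assumed to return `None` when no structured data can be extracted, and the `max_attempts` cap is an added assumption.

```python
from typing import Callable, Optional


def grab_with_blacklist(
    url: str,
    request_proxy: Callable[[], Optional[str]],
    fetch: Callable[[str, str], str],
    extract: Callable[[str], Optional[dict]],
    blacklist: set[str],
    max_attempts: int = 5,
) -> Optional[dict]:
    """Grab one URL, blacklisting any proxy whose response cannot be structured."""
    for _ in range(max_attempts):
        proxy = request_proxy()          # S203-S205
        if proxy is None:
            return None                  # assumption: no proxy left to try
        if proxy in blacklist:
            continue                     # S207: request another proxy
        raw = fetch(url, proxy)          # S208: grab data through the proxy
        record = extract(raw)            # S208: try to extract structured data
        if record is not None:
            return record                # S209: hand the result to the host node
        blacklist.add(proxy)             # S210: treat the proxy as blocked
    return None
```

In this sketch the caller, playing the role of the host node, would store the returned record in its database (step S211).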
Step S211: the host node 120 stores the structured data in a preset database.
The preset database may be the back-end database 160 that is communicatively connected to the host node 120.
Optionally, in this embodiment, the host node 120 may detect in real time whether the data grabbing task has ended and, if it has, delete the downloader 150 and the proxy server blacklist maintained by the downloader 150.
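Deleting the downloader together with the blacklist it maintains at the end of a task can be sketched with a context manager. This swaps the real-time detection described above for Python's scope-based cleanup, purely for illustration; the name `downloader_session` and the dictionary-based registry are assumptions.

```python
from contextlib import contextmanager


@contextmanager
def downloader_session(registry: dict, task_id: str):
    """Create a downloader for one task and delete it when the task ends."""
    # The blacklist lives inside the downloader, so it shares its lifetime.
    downloader = {"task_id": task_id, "proxy_blacklist": set()}
    registry[task_id] = downloader
    try:
        yield downloader
    finally:
        # Task ended (normally or with an error): drop downloader and blacklist.
        registry.pop(task_id, None)
```

A caller would wrap the task body in `with downloader_session(registry, "t1") as d:`; once the block exits, the registry no longer holds the downloader or its blacklist.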
Through the above process, proxy pool technology is used to automatically assign and maintain the proxy servers used by the downloader 150, which addresses the problem of the IP address of the network crawler system 100 being blocked under current network conditions. Interruption and blocking of the network crawler system 100 are avoided, the structure is stable and mature, and the efficiency of crawling data is improved.
Fig. 3 is a functional block diagram of a data grabbing device 110 provided by an embodiment of the present application, which is applied to the network crawler system 100 shown in Fig. 1.
The data grabbing device 110 includes a creation module 111, a task allocation module 112, a request module 113, a selection module 114, a task execution module 115 and a storage module 116.
The creation module 111 is configured to control the host node 120 to respond to a data grabbing instruction by creating a downloader 150 for collecting and downloading data from the network 200, the downloader 150 including a proxy server blacklist.
In this embodiment, for a description of the creation module 111, refer to the detailed description of step S201 shown in Fig. 2; that is, step S201 may be executed by the creation module 111.
The task allocation module 112 is configured to control the host node 120 to assign a data grabbing task to the newly created downloader 150.
In this embodiment, for a description of the task allocation module 112, refer to the detailed description of step S202 shown in Fig. 2; that is, step S202 may be executed by the task allocation module 112.
The request module 113 is configured to control the downloader 150 to request a proxy server from the selector 130 through the host node 120.
In this embodiment, for a description of the request module 113, refer to the detailed description of step S203 shown in Fig. 2; that is, step S203 may be executed by the request module 113.
The selection module 114 is configured to control the selector 130 to determine a proxy server in the proxy server pool 140 and, when that proxy server is available, to assign it to the downloader 150 as the target proxy server.
In this embodiment, for a description of the selection module 114, refer to the detailed descriptions of steps S204 and S205 shown in Fig. 2; that is, steps S204 and S205 may be executed by the selection module 114.
The task execution module 115 is configured to control the downloader 150, when the target proxy server is not in the proxy server blacklist, to execute the data grabbing task through the target proxy server to grab data from the network 200, and to extract structured data from the grabbed data.
Optionally, the request module 113 may be further configured to control the downloader 150, when the target proxy server is in the proxy server blacklist, to again request a proxy server from the selector 130 through the host node 120.
The storage module 116 is configured to send the extracted structured data to the host node 120 when structured data can be extracted, and to control the host node 120 to store the structured data in a preset database.
Optionally, as shown in Fig. 4, the data grabbing device 110 may further include a removal module 117.
The removal module 117 is configured to control the selector 130, when the proxy server it determines in the proxy server pool 140 is unavailable, to remove that proxy server from the proxy server pool 140 and determine another proxy server.
Optionally, the data grabbing device 110 may further include a recording module 118.
The recording module 118 is configured to control the downloader 150, when structured data cannot be extracted, to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector 130 through the host node 120.
Optionally, the data grabbing device 110 may further include a deletion module 119.
The deletion module 119 is configured to delete the downloader 150 and the proxy server blacklist when the data grabbing task ends.
For descriptions of the above modules, refer to the detailed descriptions of the corresponding steps above, which are not repeated here.
In summary, in the data grabbing method, data grabbing device and network crawler system 100 provided by the embodiments of the present application, the network crawler system 100 includes a host node 120, a selector 130 and a proxy server pool 140. In response to a data grabbing instruction, the host node 120 creates a downloader 150 that includes a proxy server blacklist and assigns a data grabbing task to the downloader 150. The downloader 150 requests a proxy server from the selector 130 through the host node 120. The selector 130 determines a proxy server in the proxy server pool 140 and, when that proxy server is available, assigns it to the downloader 150 as the target proxy server. When the target proxy server is not in the proxy server blacklist, the downloader 150 executes the data grabbing task through the target proxy server to grab data from the network 200, and extracts structured data from the grabbed data. When structured data can be extracted, the extracted structured data is sent to the host node 120 and stored. In this way, interruption and blocking of the web crawler can be avoided.
The above are only preferred embodiments of the present application and are not intended to limit it. For a person skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A data grabbing method, characterized in that it is applied to a network crawler system, the network crawler system comprising a host node, a selector and a proxy server pool; the method comprises:
in response to a data grabbing instruction, the host node creating a downloader for collecting and downloading data from the network, the downloader comprising a proxy server blacklist;
the host node assigning a data grabbing task to the newly created downloader;
the downloader requesting a proxy server from the selector through the host node;
the selector determining a proxy server in the proxy server pool and, when that proxy server is available, assigning it to the downloader as the target proxy server;
when the target proxy server is not in the proxy server blacklist, the downloader executing the data grabbing task through the target proxy server to grab data from the network, and extracting structured data from the grabbed data;
when structured data can be extracted, sending the extracted structured data to the host node;
the host node storing the structured data in a preset database.
2. The data grabbing method according to claim 1, characterized in that the method further comprises:
when the proxy server that the selector determines in the proxy server pool is unavailable, removing that proxy server from the proxy server pool and determining another proxy server.
3. The data grabbing method according to claim 1 or 2, characterized in that the method further comprises:
when the target proxy server is in the proxy server blacklist, the downloader again requesting a proxy server from the selector through the host node.
4. The data grabbing method according to claim 1 or 2, characterized in that the method further comprises:
when structured data cannot be extracted, the downloader recording the target proxy server in the proxy server blacklist and again requesting a proxy server from the selector through the host node.
5. The data grabbing method according to claim 1 or 2, characterized in that the method further comprises:
when the data grabbing task ends, deleting the downloader and the proxy server blacklist.
6. A data grabbing device, characterized in that it is applied to a network crawler system, the network crawler system comprising a host node, a selector and a proxy server pool; the device comprises:
a creation module, configured to control the host node to respond to a data grabbing instruction by creating a downloader for collecting and downloading data from the network, the downloader comprising a proxy server blacklist;
a task allocation module, configured to control the host node to assign a data grabbing task to the newly created downloader;
a request module, configured to control the downloader to request a proxy server from the selector through the host node;
a selection module, configured to control the selector to determine a proxy server in the proxy server pool and, when that proxy server is available, to assign it to the downloader as the target proxy server;
a task execution module, configured to control the downloader, when the target proxy server is not in the proxy server blacklist, to execute the data grabbing task through the target proxy server to grab data from the network, and to extract structured data from the grabbed data;
a storage module, configured to send the extracted structured data to the host node when structured data can be extracted, and to control the host node to store the structured data in a preset database.
7. The data grabbing device according to claim 6, characterized in that the device further comprises:
a removal module, configured to control the selector, when the proxy server it determines in the proxy server pool is unavailable, to remove that proxy server from the proxy server pool and determine another proxy server.
8. The data grabbing device according to claim 6 or 7, characterized in that
the request module is further configured to control the downloader, when the target proxy server is in the proxy server blacklist, to again request a proxy server from the selector through the host node.
9. The data grabbing device according to claim 6 or 7, characterized in that the device further comprises:
a recording module, configured to control the downloader, when structured data cannot be extracted, to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector through the host node.
10. A network crawler system, characterized by comprising a host node, a selector and a proxy server pool, wherein:
In response to a data grabbing instruction, the host node creates a downloader for collecting and downloading data from the network, the downloader including a proxy server blacklist;
The host node distributes a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to capture data from the network, and extracts structured data from the captured data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
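The interaction among the host node, selector, proxy server pool and downloader in claim 10 may be easier to follow as a small end-to-end sketch. Everything named here is an assumption for illustration only: the classes `Selector`, `Downloader` and `HostNode`, the rotation of the pool after each assignment, the `max_requests` bound, the in-memory SQLite table standing in for the "preset database", and the stubbed `fetch` and `extract` callbacks that replace real network access and parsing.

```python
import json
import sqlite3
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set

Record = Dict[str, str]


class Selector:
    """Hands out proxies from the pool, dropping unavailable ones."""

    def __init__(self, pool: Iterable[str], is_available: Callable[[str], bool]) -> None:
        self.pool = deque(pool)
        self.is_available = is_available

    def assign(self) -> Optional[str]:
        while self.pool:
            candidate = self.pool[0]
            if self.is_available(candidate):
                self.pool.rotate(-1)  # assumption: rotate so retries see another proxy
                return candidate
            self.pool.popleft()  # unavailable: remove from the pool and redetermine
        return None


class Downloader:
    """Executes one grabbing task and keeps its own proxy blacklist."""

    def __init__(self, host: "HostNode") -> None:
        self.host = host
        self.blacklist: Set[str] = set()

    def grab(self, url: str, fetch, extract, max_requests: int) -> bool:
        for _ in range(max_requests):
            proxy = self.host.request_proxy()  # request goes through the host node
            if proxy is None:
                return False
            if proxy in self.blacklist:
                continue  # blacklisted target proxy: request again
            record = extract(fetch(url, proxy))
            if record is not None:
                self.host.store(url, record)  # send structured data to the host node
                return True
            self.blacklist.add(proxy)  # extraction failed: blacklist and retry
        return False


class HostNode:
    """Creates downloaders, relays proxy requests and stores results."""

    def __init__(self, selector: Selector, db: sqlite3.Connection) -> None:
        self.selector = selector
        self.db = db
        self.db.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, payload TEXT)")

    def request_proxy(self) -> Optional[str]:
        return self.selector.assign()

    def store(self, url: str, record: Record) -> None:
        self.db.execute("INSERT INTO items VALUES (?, ?)", (url, json.dumps(record)))
        self.db.commit()

    def handle_grab_instruction(self, url: str, fetch, extract, max_requests: int = 10) -> bool:
        downloader = Downloader(self)  # new downloader with an empty blacklist
        return downloader.grab(url, fetch, extract, max_requests)  # distribute the task


# Usage with made-up proxies: one is unavailable, one returns an empty page.
db = sqlite3.connect(":memory:")
selector = Selector(["dead:1", "banned:2", "good:3"], lambda p: not p.startswith("dead"))
host = HostNode(selector, db)
ok = host.handle_grab_instruction(
    "http://example.com/item",
    fetch=lambda url, proxy: "" if proxy.startswith("banned") else "<h1>Item</h1>",
    extract=lambda html: {"heading": "Item"} if "<h1>" in html else None,
)
rows = db.execute("SELECT url, payload FROM items").fetchall()
```

Keeping the blacklist inside each downloader, rather than in the selector, mirrors the claim wording: a proxy server that one downloader found useless for its task stays in the shared pool and can still be assigned to other downloaders.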
CN201810306740.6A 2018-04-08 2018-04-08 Data grab method, device and network crawler system Pending CN108595543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810306740.6A CN108595543A (en) 2018-04-08 2018-04-08 Data grab method, device and network crawler system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810306740.6A CN108595543A (en) 2018-04-08 2018-04-08 Data grab method, device and network crawler system

Publications (1)

Publication Number Publication Date
CN108595543A true CN108595543A (en) 2018-09-28

Family

ID=63621143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810306740.6A Pending CN108595543A (en) 2018-04-08 2018-04-08 Data grab method, device and network crawler system

Country Status (1)

Country Link
CN (1) CN108595543A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902386A (en) * 2014-04-11 2014-07-02 复旦大学 Multi-thread network crawler processing method based on connection proxy optimal management
US20160241670A1 (en) * 2013-09-25 2016-08-18 Verizon Digital Media Services Inc. Instantaneous non-blocking content purging in a distributed platform
CN106534244A (en) * 2015-09-14 2017-03-22 中国移动通信集团公司 Scheduling method and device for proxy resources
CN107395782A (en) * 2017-07-19 2017-11-24 北京理工大学 A kind of IP limitation controlled source information extraction methods based on agent pool
CN107832355A (en) * 2017-10-23 2018-03-23 北京金堤科技有限公司 The method and device that a kind of agency of crawlers obtains

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670103A (en) * 2018-12-14 2019-04-23 深圳中兴飞贷金融科技有限公司 Data grab method and device
CN109871226A (en) * 2019-02-25 2019-06-11 深圳前海达闼云端智能科技有限公司 Configuration method, device, medium and the electronic equipment of downloader
CN109871226B (en) * 2019-02-25 2022-05-17 深圳前海达闼云端智能科技有限公司 Configuration method, device and medium of downloader and electronic equipment
CN110740447A (en) * 2019-10-22 2020-01-31 福州汇思博信息技术有限公司 remote log grabbing method for Android terminal
CN111343253A (en) * 2020-02-14 2020-06-26 苏宁金融科技(南京)有限公司 Information extraction method and system
CN112637049A (en) * 2020-12-16 2021-04-09 广州索答信息科技有限公司 Data capture system and method

Similar Documents

Publication Publication Date Title
CN108595543A (en) Data grab method, device and network crawler system
CN107832355B (en) A kind of method and device that the agency of crawlers obtains
CN102054028B (en) Method for implementing web-rendering function by using web crawler system
CN113037777B (en) Honeypot bait distribution method and device, storage medium and electronic equipment
CN104144142B (en) A kind of Web bug excavation methods and system
CN102833240A (en) Malicious code capturing method and system
CN109948026A (en) A kind of web data crawling method, device, equipment and medium
CN108121511A (en) Data processing method, device and equipment in a kind of distributed edge storage system
EP3477894A1 (en) Method and device for controlling virtualized broadband remote access server (vbras), and communication system
WO2020211561A1 (en) Data processing method and device, storage medium and electronic device
CN111209460A (en) Data acquisition system and method based on script crawler framework
CN107547526A (en) The data processing method and device combined a kind of cloud
CN108829792A (en) Distributed darknet excavating resource system and method based on scrapy
CN111831275B (en) Method, server, medium and computer equipment for arranging micro-scene script
CN107026871A (en) A kind of Web vulnerability scanning methods based on cloud computing
CN109359263B (en) User behavior feature extraction method and system
CN109710440A (en) Abnormality eliminating method, device, storage medium and the terminal device of webpage front-end
CN111367629A (en) Delayed task processing method and device
CN109600385A (en) A kind of access control method and device
CN112769838B (en) Access user filtering method, device, equipment and storage medium
CN109062590A (en) A kind of method and system of game SDK online updating
CN111523074A (en) Acquisition system for dynamic page sensitive data of front-end rendering website
CN104753861A (en) Security event handling method and device
CN108133041A (en) Data collecting system and method based on web crawlers and data transfer technology
CN114363036A (en) Network attack path acquisition method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 311501, Unit 1, Building 5, Courtyard 1, Futong East Street, Chaoyang District, Beijing

Applicant after: Beijing Zhichuangyu Information Technology Co., Ltd.

Address before: Room 311501, Unit 1, Building 5, Courtyard 1, Futong East Street, Chaoyang District, Beijing

Applicant before: Beijing Knows Chuangyu Information Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180928