CN108595543A - Data grab method, device and network crawler system - Google Patents
Data grab method, device and network crawler system
- Publication number: CN108595543A
- Application number: CN201810306740.6A
- Authority: CN (China)
- Prior art keywords: proxy server, data, downloader, host node, selector
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The embodiments of the present application provide a data grabbing method, a data grabbing device and a network crawler system. The method and device are applied to a network crawler system that includes a host node, a selector and a proxy server pool. In response to a data grabbing instruction, the host node creates a downloader that includes a proxy server blacklist and assigns a data grabbing task to the downloader. The downloader requests a proxy server from the selector through the host node. The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server. When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network and extracts structured data from the grabbed data. When structured data can be extracted, it is sent to the host node and stored. In this way, interruption and blocking of the web crawler can be avoided.
Description
Technical field
This application relates to the field of Internet technology, and in particular to a data grabbing method, a data grabbing device and a network crawler system.
Background
A web crawler, also called a web spider, web robot, web chaser, ant, automatic indexer, emulator or worm, is a program or script that automatically grabs web pages and data from the Internet according to certain rules. Following a defined strategy, a web crawler collects and downloads content from the Internet, extracts structured data from the downloaded content, and stores the extracted structured data.
Some general-purpose network crawler systems already exist. As requirements change, a programmer only needs to modify the strategy in such a system; the other parts keep using the system's original programs, so there is no need to write a new web crawler.
It can be seen that existing network crawler systems focus mainly on how to collect and download data efficiently. In the current network environment, however, the network resources available to a network crawler system are limited. In particular, to protect their traffic, many websites forbid a large number of accesses from the same IP (Internet Protocol) address within a short time. These websites block access from such IP addresses, which interrupts or blocks the network crawler system and in turn reduces its efficiency.
Summary of the application
In view of this, the purpose of the present application includes providing a data grabbing method, a data grabbing device and a network crawler system to alleviate the above problem.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a data grabbing method applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool. The method includes:
The host node, in response to a data grabbing instruction, creates a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as a target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
Optionally, the data grabbing method provided according to the first aspect of the embodiments of the present application further includes:
When the proxy server that the selector determines in the proxy server pool is unavailable, removing that proxy server from the proxy server pool and determining another proxy server.
Optionally, the data grabbing method provided according to the first aspect of the embodiments of the present application further includes:
When the target proxy server is in the proxy server blacklist, the downloader again requests a proxy server from the selector through the host node.
Optionally, the data grabbing method provided according to the first aspect of the embodiments of the present application further includes:
When structured data cannot be extracted, the downloader records the target proxy server in the proxy server blacklist and again requests a proxy server from the selector through the host node.
Optionally, the data grabbing method provided according to the first aspect of the embodiments of the present application further includes:
When the data grabbing task ends, deleting the downloader and the proxy server blacklist.
In a second aspect, an embodiment of the present application further provides a data grabbing device applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool. The device includes:
A creation module, configured to control the host node to respond to a data grabbing instruction by creating a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
A task allocation module, configured to control the host node to assign a data grabbing task to the newly created downloader;
A request module, configured to control the downloader to request a proxy server from the selector through the host node;
A selection module, configured to control the selector to determine a proxy server in the proxy server pool and, when that proxy server is available, to assign it to the downloader as a target proxy server;
A task execution module, configured to, when the target proxy server is not in the proxy server blacklist, control the downloader to execute the data grabbing task through the target proxy server to grab data from the network, and to extract structured data from the grabbed data;
A storage module, configured to, when structured data can be extracted, send the extracted structured data to the host node and control the host node to store the structured data in a preset database.
Optionally, the data grabbing device provided according to the second aspect of the embodiments of the present application further includes:
A removal module, configured to, when the proxy server that the selector determines in the proxy server pool is unavailable, control the selector to remove that proxy server from the proxy server pool and determine another proxy server.
Optionally, in the data grabbing device provided according to the second aspect of the embodiments of the present application, the request module is further configured to, when the target proxy server is in the proxy server blacklist, control the downloader to again request a proxy server from the selector through the host node.
Optionally, the data grabbing device provided according to the second aspect of the embodiments of the present application further includes:
A recording module, configured to, when structured data cannot be extracted, control the downloader to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector through the host node.
In a third aspect, an embodiment of the present application further provides a network crawler system including a host node, a selector and a proxy server pool;
The host node, in response to a data grabbing instruction, creates a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as a target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
Compared with the prior art, the embodiments of the present application have the following beneficial effects:
In the data grabbing method, device and network crawler system provided by the embodiments of the present application, the network crawler system includes a host node, a selector and a proxy server pool. In response to a data grabbing instruction, the host node creates a downloader that includes a proxy server blacklist and assigns a data grabbing task to the downloader. The downloader requests a proxy server from the selector through the host node. The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as the target proxy server. When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network and extracts structured data from the grabbed data. When structured data can be extracted, it is sent to the host node and stored. In this way, interruption and blocking of the web crawler can be avoided, which improves the efficiency of the network crawler system.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a connection block diagram of a network crawler system provided by an embodiment of the present application;
Fig. 2 is a flow diagram of a data grabbing method provided by an embodiment of the present application;
Fig. 3 is a functional block diagram of a data grabbing device provided by an embodiment of the present application;
Fig. 4 is another functional block diagram of the data grabbing device provided by an embodiment of the present application.
Reference numerals: 100: network crawler system; 110: data grabbing device; 111: creation module; 112: task allocation module; 113: request module; 114: selection module; 115: task execution module; 116: storage module; 117: removal module; 118: recording module; 119: deletion module; 120: host node; 130: selector; 140: proxy server pool; 150: downloader; 160: back-end database; 200: network.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
Fig. 1 is a connection block diagram of a network crawler system 100 provided by an embodiment of the present application. The network crawler system 100 includes a data grabbing device 110, a host node 120 (Master), a selector 130 (Selector) and a proxy server pool 140 (Proxy Pool). The host node 120 is communicatively connected to a back-end database 160, the selector 130 is communicatively connected to the proxy server pool 140, and the host node 120 is communicatively connected to the selector 130.
In this embodiment, the host node 120 is the main part of the network crawler system 100 and is responsible for task scheduling, database storage and the like. The proxy server pool 140 stores available proxy servers. The selector 130 is responsible for maintaining the proxy server pool 140, selecting a proxy server (Proxy) from the proxy server pool 140, and checking whether the selected proxy server is available. The back-end database 160 stores the structured data obtained by the host node 120.
In implementation, when the host node 120 receives a data grabbing instruction and starts a data grabbing task, it creates a downloader 150 (Downloader), which includes a cache for storing the proxy server blacklist (Proxy Blacklist Cache). The downloader 150 is responsible for collecting and downloading content from the network 200, structuring the data, and so on.
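As a rough illustration of the components described above, the following Python sketch models a host node that creates downloaders, each carrying its own blacklist cache. The class names `HostNode` and `Downloader`, the method `handle_grab_instruction`, the optional seeding of the blacklist, and the example URLs and proxy addresses are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Downloader:
    """Hypothetical downloader 150: one task plus its own blacklist cache."""
    task_url: str
    proxy_blacklist: set = field(default_factory=set)


@dataclass
class HostNode:
    """Hypothetical host node 120: creates and tracks downloaders."""
    downloaders: list = field(default_factory=list)

    def handle_grab_instruction(self, task_url, known_blocked=()):
        # The blacklist is created together with the downloader; seeding it
        # with proxies already known to be blocked is an assumption.
        downloader = Downloader(task_url, set(known_blocked))
        self.downloaders.append(downloader)
        return downloader


host = HostNode()
first = host.handle_grab_instruction("https://example.com/a", ["10.0.0.1:8080"])
second = host.handle_grab_instruction("https://example.com/b")
first.proxy_blacklist.add("10.0.0.2:8080")
print(sorted(first.proxy_blacklist), sorted(second.proxy_blacklist))
```

Because each downloader owns its blacklist, an entry added by one downloader does not affect the others in this sketch.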
Fig. 2 is a flow diagram of a data grabbing method provided by an embodiment of the present application. The data grabbing method is applied to the network crawler system 100 shown in Fig. 1 and is described in detail below with reference to Fig. 2.
Step S201: the host node 120, in response to a data grabbing instruction, creates a downloader 150 for collecting and downloading data from the network 200, the downloader 150 including a proxy server blacklist.
The data grabbing instruction can be the instruction corresponding to an operation that a user performs in a user interface to start data grabbing. The proxy server blacklist is created together with the downloader 150 and is maintained by the downloader 150.
In this embodiment, the host node 120 can create one, two or more downloaders 150 according to actual conditions, which is not limited in this embodiment. By creating two or more downloaders 150, the data to be grabbed can be distributed among them for grabbing, which improves efficiency.
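One simple way to distribute the data to be grabbed across several downloaders is a round-robin split. The sketch below is only an illustration: the application does not specify a distribution strategy, and the function name `split_tasks` and the URL list are hypothetical.

```python
def split_tasks(urls, downloader_count):
    """Split a list of URLs into one sub-list per downloader (round-robin)."""
    if downloader_count < 1:
        raise ValueError("at least one downloader is required")
    # Slice i takes items i, i + n, i + 2n, ... so the shares stay balanced.
    return [urls[i::downloader_count] for i in range(downloader_count)]


urls = [f"https://example.com/page/{n}" for n in range(5)]
for index, share in enumerate(split_tasks(urls, 2)):
    print(index, share)
```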
Step S202: the host node 120 assigns a data grabbing task to the newly created downloader 150.
When the downloader 150 has been created successfully, the host node 120 assigns a data grabbing task to it according to the data grabbing instruction.
Step S203: the downloader 150 requests a proxy server from the selector 130 through the host node 120.
Step S204: the selector 130 determines a proxy server in the proxy server pool 140 and checks whether that proxy server is available. If so, step S205 is executed; if not, step S206 is executed.
In implementation, the downloader 150 can send a request for a proxy server to the host node 120. On receiving the request, the host node 120 controls the selector 130 to determine a proxy server at random in the proxy server pool 140. Note that the selector 130 checks the availability of every proxy server it determines.
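The application does not state how the selector checks whether a proxy server is available. One minimal possibility, shown here purely as an assumption, is a TCP reachability probe. The function name `is_proxy_reachable` and the timeout value are hypothetical, and a real check might also send a test request through the proxy.

```python
import socket


def is_proxy_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A probe like this only shows that the proxy accepts connections; it does not show whether a target website has blocked the proxy, which is the case the blacklist handles.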
Step S205: the selector 130 assigns the proxy server to the downloader 150 as the target proxy server.
In this embodiment, when the selector 130 judges that the determined proxy server is available, the selector 130 assigns it to the downloader 150 as the target proxy server.
Step S206: the selector 130 removes the proxy server from the proxy server pool 140, and the process returns to step S204.
In this embodiment, when the selector 130 judges that the determined proxy server is unavailable, it removes that proxy server from the proxy server pool 140, determines another proxy server in the proxy server pool 140, and checks whether the newly determined proxy server is available. This repeats until a determined proxy server is available, at which point that proxy server is assigned to the downloader 150 as the target proxy server.
In this way, the proxy server pool 140 is updated in time, and the network crawler system 100 is prevented from being interrupted because the selected proxy server is unavailable.
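The loop formed by steps S204 to S206 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `select_target_proxy`, the injected `is_available` callback and the proxy addresses are hypothetical, and returning `None` when the pool runs empty is an added choice, since the application does not describe that case.

```python
import random


def select_target_proxy(pool, is_available, rng=random):
    """Pick proxies at random until an available one is found.

    Unavailable proxies are removed from `pool` in place (step S206).
    Returns None when the pool runs empty (an assumption, see lead-in).
    """
    while pool:
        candidate = rng.choice(pool)      # step S204: random determination
        if is_available(candidate):
            return candidate              # step S205: becomes the target proxy
        pool.remove(candidate)            # step S206: drop it and try again
    return None


pool = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
alive = {"10.0.0.2:8080"}
target = select_target_proxy(pool, lambda proxy: proxy in alive)
print(target, pool)
```

Because selection is random, the pool may still contain unchecked unavailable proxies after a successful call; they are removed only when they are actually picked.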
Step S207: on obtaining the target proxy server assigned by the selector 130, the downloader 150 judges whether the target proxy server is in the proxy server blacklist. If so, the process returns to step S203 to request a proxy server from the selector 130 through the host node 120 again; if not, step S208 is executed.
In this embodiment, the proxy servers in the proxy server blacklist are proxy servers that have been blocked. The above step prevents a blocked proxy server from being used, and thus prevents the network crawler system 100 from being interrupted because its proxy server is blocked.
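The blacklist check of step S207 amounts to requesting proxies repeatedly until one is not blacklisted. In the sketch below, `request_proxy` stands in for the request sent through the host node to the selector. The function name `request_usable_proxy` and the `max_requests` limit are hypothetical additions; the application simply returns to step S203 without a stated limit.

```python
def request_usable_proxy(request_proxy, blacklist, max_requests=10):
    """Ask for proxies until one is not in the downloader's blacklist.

    Returns None if `max_requests` offers were all blacklisted. The limit
    is an added safety measure, not part of the application.
    """
    for _ in range(max_requests):
        proxy = request_proxy()           # step S203: request via the host node
        if proxy not in blacklist:        # step S207: blacklist check
            return proxy                  # proceed to step S208
    return None


offers = iter(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
blacklist = {"10.0.0.1:8080", "10.0.0.2:8080"}
print(request_usable_proxy(lambda: next(offers), blacklist))
```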
Step S208: the downloader 150 executes the data grabbing task through the target proxy server to grab data from the network 200, extracts structured data from the grabbed data, and judges whether structured data can be extracted. If so, step S209 is executed; if not, step S210 is executed.
Step S209: the downloader 150 sends the extracted structured data to the host node 120.
Step S210: the downloader 150 records the target proxy server in the proxy server blacklist, and the process returns to step S203.
In this embodiment, because the proxy server blacklist is preset, it may not include every blocked proxy server. For this reason, after grabbing data, the downloader 150 judges whether the grabbed data can be structured. If it can, the target proxy server is not blocked; if it cannot, the target proxy server is blocked.
When the target proxy server is blocked, it can be added to the proxy server blacklist, and a proxy server is requested again.
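A single pass through steps S208 to S210 can be sketched as below. The application does not say how structured data is extracted, so this sketch assumes, for illustration only, that a successful response is JSON and that anything else (such as a block page) counts as a failed extraction. The names `grab_once`, `extract_structured` and `fake_fetch` are hypothetical, and `fake_fetch` replaces a real HTTP request made through the proxy.

```python
import json


def extract_structured(raw):
    """Hypothetical extractor: parse the payload as JSON, None on failure."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return None


def grab_once(url, proxy, fetch, blacklist, extract=extract_structured):
    """Run one grab attempt through `proxy` (steps S208 to S210).

    Returns the structured record, or None after blacklisting the proxy.
    """
    raw = fetch(url, proxy)               # step S208: grab through the proxy
    record = extract(raw)
    if record is None:
        blacklist.add(proxy)              # step S210: treat the proxy as blocked
    return record                         # step S209 when not None


def fake_fetch(url, proxy):
    # Stand-in for a real request: one proxy gets a block page instead of data.
    if proxy == "10.0.0.1:8080":
        return "<html>Access denied</html>"
    return '{"title": "ok"}'


blacklist = set()
print(grab_once("https://example.com", "10.0.0.1:8080", fake_fetch, blacklist))
print(grab_once("https://example.com", "10.0.0.2:8080", fake_fetch, blacklist))
print(blacklist)
```

After a `None` result the caller would request a new proxy, which corresponds to the return to step S203.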
Step S211: the host node 120 stores the structured data in a preset database.
The preset database can be the back-end database 160 that is communicatively connected to the host node 120.
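As an illustration of step S211, the sketch below stores a structured record with Python's built-in `sqlite3` module. The choice of SQLite, the table name `records`, its columns and the function name `store_structured` are assumptions; the application only refers to a preset database.

```python
import json
import sqlite3


def store_structured(conn, record):
    """Insert one structured record as JSON text and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    cursor = conn.execute(
        "INSERT INTO records (payload) VALUES (?)", (json.dumps(record),)
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")        # in-memory database for the example
row_id = store_structured(conn, {"title": "ok", "url": "https://example.com"})
stored = conn.execute(
    "SELECT payload FROM records WHERE id = ?", (row_id,)
).fetchone()
print(json.loads(stored[0]))
```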
Optionally, in this embodiment, the host node 120 can detect in real time whether the data grabbing task has ended. If it has, the host node 120 can delete the downloader 150 and the proxy server blacklist that the downloader 150 maintains.
Through the above process, proxy pool technology is used to automatically assign and maintain the proxy servers used by the downloader 150, which solves the problem that the IP address of the network crawler system 100 is blocked under current network conditions. Interruption and blocking of the network crawler system 100 are avoided, the structure is stable and mature, and the efficiency of crawling data is improved.
Fig. 3 is a functional block diagram of a data grabbing device 110 provided by an embodiment of the present application, which is applied to the network crawler system 100 shown in Fig. 1.
The data grabbing device 110 includes a creation module 111, a task allocation module 112, a request module 113, a selection module 114, a task execution module 115 and a storage module 116.
The creation module 111 is configured to control the host node 120 to respond to a data grabbing instruction by creating a downloader 150 for collecting and downloading data from the network 200, the downloader 150 including a proxy server blacklist.
In this embodiment, for a description of the creation module 111, refer to the detailed description of step S201 shown in Fig. 2; that is, step S201 can be executed by the creation module 111.
The task allocation module 112 is configured to control the host node 120 to assign a data grabbing task to the newly created downloader 150.
In this embodiment, for a description of the task allocation module 112, refer to the detailed description of step S202 shown in Fig. 2; that is, step S202 can be executed by the task allocation module 112.
The request module 113 is configured to control the downloader 150 to request a proxy server from the selector 130 through the host node 120.
In this embodiment, for a description of the request module 113, refer to the detailed description of step S203 shown in Fig. 2; that is, step S203 can be executed by the request module 113.
The selection module 114 is configured to control the selector 130 to determine a proxy server in the proxy server pool 140 and, when that proxy server is available, to assign it to the downloader 150 as the target proxy server.
In this embodiment, for a description of the selection module 114, refer to the detailed description of steps S204 and S205 shown in Fig. 2; that is, steps S204 and S205 can be executed by the selection module 114.
The task execution module 115 is configured to, when the target proxy server is not in the proxy server blacklist, control the downloader 150 to execute the data grabbing task through the target proxy server to grab data from the network 200, and to extract structured data from the grabbed data.
Optionally, the request module 113 can also be configured to, when the target proxy server is in the proxy server blacklist, control the downloader 150 to again request a proxy server from the selector 130 through the host node 120.
The storage module 116 is configured to, when structured data can be extracted, send the extracted structured data to the host node 120 and control the host node 120 to store the structured data in a preset database.
Optionally, as shown in Fig. 4, the data grabbing device 110 can also include a removal module 117.
The removal module 117 is configured to, when the proxy server that the selector 130 determines in the proxy server pool 140 is unavailable, control the selector 130 to remove that proxy server from the proxy server pool 140 and determine another proxy server.
Optionally, the data grabbing device 110 can also include a recording module 118.
The recording module 118 is configured to, when structured data cannot be extracted, control the downloader 150 to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector 130 through the host node 120.
Optionally, the data grabbing device 110 can also include a deletion module 119.
The deletion module 119 is configured to delete the downloader 150 and the proxy server blacklist when the data grabbing task ends.
For a description of the above modules, refer to the detailed description of the corresponding steps above, which is not repeated here.
In conclusion data grab method provided by the embodiments of the present application, device and network crawler system 100, network is climbed
Worm system 100 includes host node 120, selector 130 and proxy server pond 140.120 response data fetching instruction of host node,
Establishment includes the downloader 150 of proxy server blacklist, and distributes data grabber task to downloader 150;Downloader 150 is logical
Host node 120 is crossed to 130 request agency server of selector;Selector 130 determines agency's clothes in proxy server pond 140
Business device distributes to downloader 150 when the proxy server is available using the proxy server as target proxy server;When
When target proxy server is not in proxy server blacklist, downloader 150 executes data by target proxy server and grabs
Take task to capture data from network 200, and from the extracting data structural data grabbed;When structuring can be extracted
When data, the structural data extracted is sent to host node 120 and is stored.In this way, can avoid in 200 reptile of network
Disconnected and obstruction.
The foregoing is merely the preferred embodiments of the application, are not intended to limit this application, for the skill of this field
For art personnel, the application can have various modifications and variations.Within the spirit and principles of this application, any made by repair
Change, equivalent replacement, improvement etc., should be included within the protection domain of the application.
Claims (10)
1. A data grabbing method, characterized in that it is applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool; the method includes:
The host node, in response to a data grabbing instruction, creates a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as a target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
2. The data grabbing method according to claim 1, characterized in that the method further includes:
When the proxy server that the selector determines in the proxy server pool is unavailable, removing that proxy server from the proxy server pool and determining another proxy server.
3. The data grabbing method according to claim 1 or 2, characterized in that the method further includes:
When the target proxy server is in the proxy server blacklist, the downloader again requests a proxy server from the selector through the host node.
4. The data grabbing method according to claim 1 or 2, characterized in that the method further includes:
When structured data cannot be extracted, the downloader records the target proxy server in the proxy server blacklist and again requests a proxy server from the selector through the host node.
5. The data grabbing method according to claim 1 or 2, characterized in that the method further includes:
When the data grabbing task ends, deleting the downloader and the proxy server blacklist.
6. A data grabbing device, characterized in that it is applied to a network crawler system, the network crawler system including a host node, a selector and a proxy server pool; the device includes:
A creation module, configured to control the host node to respond to a data grabbing instruction by creating a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
A task allocation module, configured to control the host node to assign a data grabbing task to the newly created downloader;
A request module, configured to control the downloader to request a proxy server from the selector through the host node;
A selection module, configured to control the selector to determine a proxy server in the proxy server pool and, when that proxy server is available, to assign it to the downloader as a target proxy server;
A task execution module, configured to, when the target proxy server is not in the proxy server blacklist, control the downloader to execute the data grabbing task through the target proxy server to grab data from the network, and to extract structured data from the grabbed data;
A storage module, configured to, when structured data can be extracted, send the extracted structured data to the host node and control the host node to store the structured data in a preset database.
7. The data grabbing device according to claim 6, characterized in that the device further includes:
A removal module, configured to, when the proxy server that the selector determines in the proxy server pool is unavailable, control the selector to remove that proxy server from the proxy server pool and determine another proxy server.
8. The data grabbing device according to claim 6 or 7, characterized in that:
The request module is further configured to, when the target proxy server is in the proxy server blacklist, control the downloader to again request a proxy server from the selector through the host node.
9. The data grabbing device according to claim 6 or 7, characterized in that the device further includes:
A recording module, configured to, when structured data cannot be extracted, control the downloader to record the target proxy server in the proxy server blacklist and to again request a proxy server from the selector through the host node.
10. A network crawler system, characterized in that it includes a host node, a selector and a proxy server pool;
The host node, in response to a data grabbing instruction, creates a downloader for collecting and downloading data from a network, the downloader including a proxy server blacklist;
The host node assigns a data grabbing task to the newly created downloader;
The downloader requests a proxy server from the selector through the host node;
The selector determines a proxy server in the proxy server pool and, when that proxy server is available, assigns it to the downloader as a target proxy server;
When the target proxy server is not in the proxy server blacklist, the downloader executes the data grabbing task through the target proxy server to grab data from the network, and extracts structured data from the grabbed data;
When structured data can be extracted, the extracted structured data is sent to the host node;
The host node stores the structured data in a preset database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306740.6A CN108595543A (en) | 2018-04-08 | 2018-04-08 | Data grab method, device and network crawler system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108595543A (en) | 2018-09-28 |
Family ID: 63621143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810306740.6A Pending CN108595543A (en) | 2018-04-08 | 2018-04-08 | Data grab method, device and network crawler system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595543A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902386A (en) * | 2014-04-11 | 2014-07-02 | 复旦大学 | Multi-thread network crawler processing method based on connection proxy optimal management |
US20160241670A1 (en) * | 2013-09-25 | 2016-08-18 | Verizon Digital Media Services Inc. | Instantaneous non-blocking content purging in a distributed platform |
CN106534244A (en) * | 2015-09-14 | 2017-03-22 | 中国移动通信集团公司 | Scheduling method and device for proxy resources |
CN107395782A (en) * | 2017-07-19 | 2017-11-24 | 北京理工大学 | A kind of IP limitation controlled source information extraction methods based on agent pool |
CN107832355A (en) * | 2017-10-23 | 2018-03-23 | 北京金堤科技有限公司 | The method and device that a kind of agency of crawlers obtains |
2018-04-08: CN application CN201810306740.6A filed (publication CN108595543A); status: Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670103A (en) * | 2018-12-14 | 2019-04-23 | 深圳中兴飞贷金融科技有限公司 | Data grab method and device |
CN109871226A (en) * | 2019-02-25 | 2019-06-11 | 深圳前海达闼云端智能科技有限公司 | Configuration method, device, medium and the electronic equipment of downloader |
CN109871226B (en) * | 2019-02-25 | 2022-05-17 | 深圳前海达闼云端智能科技有限公司 | Configuration method, device and medium of downloader and electronic equipment |
CN110740447A (en) * | 2019-10-22 | 2020-01-31 | 福州汇思博信息技术有限公司 | remote log grabbing method for Android terminal |
CN111343253A (en) * | 2020-02-14 | 2020-06-26 | 苏宁金融科技(南京)有限公司 | Information extraction method and system |
CN112637049A (en) * | 2020-12-16 | 2021-04-09 | 广州索答信息科技有限公司 | Data capture system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108595543A (en) | Data grab method, device and network crawler system | |
CN107832355B (en) | A kind of method and device that the agency of crawlers obtains | |
CN102054028B (en) | Method for implementing web-rendering function by using web crawler system | |
CN113037777B (en) | Honeypot bait distribution method and device, storage medium and electronic equipment | |
CN104144142B (en) | A kind of Web bug excavation methods and system | |
CN102833240A (en) | Malicious code capturing method and system | |
CN109948026A (en) | A kind of web data crawling method, device, equipment and medium | |
CN108121511A (en) | Data processing method, device and equipment in a kind of distributed edge storage system | |
EP3477894A1 (en) | Method and device for controlling virtualized broadband remote access server (vbras), and communication system | |
WO2020211561A1 (en) | Data processing method and device, storage medium and electronic device | |
CN111209460A (en) | Data acquisition system and method based on script crawler framework | |
CN107547526A (en) | The data processing method and device combined a kind of cloud | |
CN108829792A (en) | Distributed darknet excavating resource system and method based on scrapy | |
CN111831275B (en) | Method, server, medium and computer equipment for arranging micro-scene script | |
CN107026871A (en) | A kind of Web vulnerability scanning methods based on cloud computing | |
CN109359263B (en) | User behavior feature extraction method and system | |
CN109710440A (en) | Abnormality eliminating method, device, storage medium and the terminal device of webpage front-end | |
CN111367629A (en) | Delayed task processing method and device | |
CN109600385A (en) | A kind of access control method and device | |
CN112769838B (en) | Access user filtering method, device, equipment and storage medium | |
CN109062590A (en) | A kind of method and system of game SDK online updating | |
CN111523074A (en) | Acquisition system for dynamic page sensitive data of front-end rendering website | |
CN104753861A (en) | Security event handling method and device | |
CN108133041A (en) | Data collecting system and method based on web crawlers and data transfer technology | |
CN114363036A (en) | Network attack path acquisition method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: Room 311501, Unit 1, Building 5, Courtyard 1, Futong East Street, Chaoyang District, Beijing; Applicant after: Beijing Zhichuangyu Information Technology Co., Ltd.; Address before: Room 311501, Unit 1, Building 5, Courtyard 1, Futong East Street, Chaoyang District, Beijing; Applicant before: Beijing Knows Chuangyu Information Technology Co., Ltd. |
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180928 |