CN107885843A - Method and device for intelligent crawler tasks - Google Patents
Method and device for intelligent crawler tasks
- Publication number
- CN107885843A (application number CN201711106956.XA)
- Authority
- CN
- China
- Prior art keywords
- crawler
- task
- rule
- target network
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and device for intelligent crawler tasks. The method comprises: generating a first crawler task according to a preconfigured first crawler rule; obtaining a target link according to the first crawler task; using the target link as the input of a second crawler rule; and storing the corresponding data content by means of a second crawler task generated from the second crawler rule. Through intelligent crawler task management, the invention can obtain data content that is reached only after two or more network jumps, thereby making crawler tasks more intelligent.
Description
Technical field
The present invention relates to the field of crawler technology, and in particular to a method and device for intelligent crawler tasks.
Background
A crawler fetches the resource content of a target webpage and can acquire the required resources in a directed manner according to the user's specific purpose. However, users often face the following problem: the resource content they actually need is not on the target webpage they initially input, but on the webpage pointed to by a target link within that page. In other words, the user cannot obtain the actually needed resource content directly with the crawler. How to solve this problem properly has become an urgent issue in the industry.
Summary of the invention
The present invention provides a method and device for intelligent crawler tasks which, through intelligent crawler task management, can obtain data content that is reached only after two or more network jumps, thereby making crawler tasks more intelligent.
According to a first aspect of the embodiments of the present invention, a method for intelligent crawler tasks is provided, comprising:
Generating a first crawler task according to a preconfigured first crawler rule;
Obtaining a target link according to the first crawler task;
Using the target link as the input of a second crawler rule;
Storing the corresponding data content by means of a second crawler task generated from the second crawler rule.
In one embodiment, generating the first crawler task according to the preconfigured first crawler rule comprises:
Generating, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
Sending the first crawler job to the control center;
Generating, at the control center, a corresponding first crawler task according to the first crawler job.
In one embodiment, obtaining the target link according to the first crawler task comprises:
The control center sends the first crawler task to the executor;
The executor obtains the page content of at least one webpage to be crawled;
The target link of the first crawler task is parsed from the page content;
The target link of the first crawler task is sent to the control center.
In one embodiment, using the target link as the input of the second crawler rule comprises:
The control center resolves the target webpage from the target link;
The target webpage is used as the webpage to be crawled by the second crawler rule.
In one embodiment, storing the corresponding data content by means of the second crawler task generated from the second crawler rule comprises:
The executor obtains the webpage to be crawled by the second crawler rule;
The target storage content is parsed out according to the second crawler rule;
The target storage content is downloaded and stored in a preset database.
According to a second aspect of the embodiments of the present invention, a device for intelligent crawler tasks is provided, comprising:
A generation module, configured to generate a first crawler task according to a preconfigured first crawler rule;
An acquisition module, configured to obtain a target link according to the first crawler task;
An input module, configured to use the target link as the input of a second crawler rule;
A storage module, configured to store the corresponding data content by means of a second crawler task generated from the second crawler rule.
In one embodiment, the generation module comprises:
A first generation submodule, configured to generate, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
A sending submodule, configured to send the first crawler job to the control center;
A second generation submodule, configured to generate, at the control center, a corresponding first crawler task according to the first crawler job.
In one embodiment, the acquisition module comprises:
A first sending submodule, configured for the control center to send the first crawler task to the executor;
A first acquisition submodule, configured for the executor to obtain the page content of at least one webpage to be crawled;
A first parsing submodule, configured to parse the target link of the first crawler task from the page content;
A second sending submodule, configured to send the target link of the first crawler task to the control center.
In one embodiment, the input module comprises:
A second parsing submodule, configured for the control center to resolve the target webpage from the target link;
An input submodule, configured to use the target webpage as the webpage to be crawled by the second crawler rule.
In one embodiment, the storage module comprises:
A second acquisition submodule, configured for the executor to obtain the webpage to be crawled by the second crawler rule;
A third parsing submodule, configured to parse out the target storage content according to the second crawler rule;
A storage submodule, configured to download the target storage content and store it in a preset database.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 2 is a flowchart of step S11 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of step S12 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 4 is a flowchart of step S13 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 5 is a flowchart of step S14 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 6 is a block diagram of a device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 7 is a block diagram of the generation module 61 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 8 is a block diagram of the acquisition module 62 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 9 is a block diagram of the input module 63 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 10 is a block diagram of the storage module 64 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein only illustrate and explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a method for intelligent crawler tasks according to an exemplary embodiment. As shown in Fig. 1, the method comprises the following steps S11-S14:
In step S11, a first crawler task is generated according to a preconfigured first crawler rule;
In step S12, a target link is obtained according to the first crawler task;
In step S13, the target link is used as the input of a second crawler rule;
In step S14, the corresponding data content is stored by means of a second crawler task generated from the second crawler rule.
In one embodiment, a crawler fetches the resource content of a target webpage and can acquire the required resources in a directed manner according to the user's specific purpose. However, users often face the following problem: the resource content they actually need is located on the webpage pointed to by a target link within the target webpage they initially input, so they cannot obtain that content directly with the crawler. The technical solution of this embodiment solves this problem.
A first crawler task is generated according to a preconfigured first crawler rule. Specifically, the front end generates a corresponding first crawler job according to the preconfigured first crawler rule and sends the first crawler job to the control center, which generates a corresponding first crawler task according to the first crawler job.
A target link is obtained according to the first crawler task. Specifically, the control center sends the first crawler task to the executor. The executor obtains the page content of at least one webpage to be crawled, parses the target link of the first crawler task from the page content, and sends the target link to the control center.
The target link is used as the input of the second crawler rule. Specifically, the control center resolves the target webpage from the target link and uses the target webpage as the webpage to be crawled by the second crawler rule.
The corresponding data content is stored by means of the second crawler task generated from the second crawler rule. Specifically, the executor obtains the webpage to be crawled by the second crawler rule, parses out the target storage content according to the second crawler rule, and downloads and stores the target storage content in a preset database.
Through intelligent crawler task management, the technical solution of this embodiment can obtain data content that is reached only after two or more network jumps, thereby making crawler tasks more intelligent.
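The four steps S11-S14 can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: it hypothetically models every crawler rule as a regular expression, applies one link rule per network jump (so passing two link rules models the "two or more jumps" case), and injects page fetching as a function so the example runs without network access. The names `chained_crawl`, `link_rules` and `content_rule` are illustrative only.

```python
import re
from typing import Callable, Dict, List


def chained_crawl(
    seed_urls: List[str],
    link_rules: List[re.Pattern],
    content_rule: re.Pattern,
    fetch: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Follow one link rule per network jump, then apply the content rule.

    Hypothetical model of steps S11-S14: each link rule turns the pages of
    the current stage into the target links crawled by the next stage; the
    pages reached after the last jump are parsed with content_rule.
    """
    urls = list(seed_urls)
    for rule in link_rules:
        next_urls: List[str] = []
        for url in urls:
            for link in rule.findall(fetch(url)):
                if link not in next_urls:  # crawl each target link only once
                    next_urls.append(link)
        urls = next_urls
    # The final stage plays the role of the second (last) crawler rule.
    return {url: content_rule.findall(fetch(url)) for url in urls}


if __name__ == "__main__":
    # In-memory stand-in for the web; the example.com URLs are placeholders.
    pages = {
        "https://example.com/list": '<a href="https://example.com/video/19">Ep. 19</a>',
        "https://example.com/video/19": '<li class="cast">Actor X</li><li class="cast">Actor Y</li>',
    }
    result = chained_crawl(
        ["https://example.com/list"],
        [re.compile(r'href="([^"]+)"')],
        re.compile(r'<li class="cast">([^<]+)</li>'),
        pages.__getitem__,
    )
    print(result)
```

Injecting `fetch` keeps the crawl logic separate from networking, which mirrors the split between the control center (deciding what to crawl next) and the executor (actually downloading pages).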
In one embodiment, as shown in Fig. 2, step S11 comprises the following steps S21-S23:
In step S21, the front end generates a corresponding first crawler job according to the preconfigured first crawler rule;
In step S22, the first crawler job is sent to the control center;
In step S23, the control center generates a corresponding first crawler task according to the first crawler job.
In one embodiment, the user can configure a first crawler rule at the front end according to his or her own intent. After configuration is complete, the front end generates a first crawler job from the configured first crawler rule; the first crawler job represents the user's first crawling demand. The front end sends the first crawler job to the control center, which generates from it a first crawler task that the executor can run.
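To make the hand-off from rule to job to task concrete, here is a minimal data-model sketch. The field names (`seed_urls`, `link_pattern`) and the choice of one task per webpage to be crawled are assumptions picked to match the three-website example in step S12; the original does not specify a data format, and "sending" the job to the control center (step S22) is modeled simply as passing the object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlerRule:
    """Rule configured by the user at the front end (hypothetical fields)."""
    name: str
    seed_urls: List[str]   # webpages to be crawled
    link_pattern: str      # how target links are recognized on those pages


@dataclass
class CrawlerJob:
    """The front end's crawler job: the user's crawling demand."""
    rule: CrawlerRule


@dataclass
class CrawlerTask:
    """A unit of work the executor can run: one rule on one start URL."""
    job_name: str
    url: str
    link_pattern: str


def build_job(rule: CrawlerRule) -> CrawlerJob:
    # Step S21: the front end wraps the configured rule into a job.
    return CrawlerJob(rule=rule)


def generate_tasks(job: CrawlerJob) -> List[CrawlerTask]:
    # Step S23: the control center splits the job into runnable tasks,
    # here one task per webpage to be crawled.
    rule = job.rule
    return [CrawlerTask(rule.name, url, rule.link_pattern) for url in rule.seed_urls]
```

For a rule that lists video websites A, B and C, `generate_tasks(build_job(rule))` yields three tasks, one per website, which is the split described for step S12.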
In one embodiment, as shown in Fig. 3, step S12 comprises the following steps S31-S34:
In step S31, the control center sends the first crawler task to the executor;
In step S32, the executor obtains the page content of at least one webpage to be crawled;
In step S33, the target link of the first crawler task is parsed from the page content;
In step S34, the target link of the first crawler task is sent to the control center.
In one embodiment, the control center sends the first crawler task to the executor, and the executor obtains the page content of at least one webpage to be crawled specified by the user, for example video website A, video website B and video website C. The control center may generate three first crawler tasks, one each for video website A, video website B and video website C, and send all three to the executor. The executor first obtains the page content of the webpages to be crawled on video websites A, B and C, parses the target links of these first crawler tasks from that page content, and sends the target links to the control center.
In one embodiment, as shown in Fig. 4, step S13 comprises the following steps S41-S42:
In step S41, the control center resolves the target webpage from the target link;
In step S42, the target webpage is used as the webpage to be crawled by the second crawler rule.
In one embodiment, the control center resolves the target webpage pointed to by the target link of the first crawler task and uses that target webpage as the webpage to be crawled by the second crawler rule. For example, the control center resolves the target webpage α pointed to by the target link "http://v.youku.com/v_show/id_XMzEyNjc2OTk2MA==.htmlSpm=a2hww.20027244.m_250003.5~5~5~5~A" and uses target webpage α as the webpage to be crawled by the second crawler rule.
In one embodiment, as shown in Fig. 5, step S14 comprises the following steps S51-S53:
In step S51, the executor obtains the webpage to be crawled by the second crawler rule;
In step S52, the target storage content is parsed out according to the second crawler rule;
In step S53, the target storage content is downloaded and stored in a preset database.
In one embodiment, the executor obtains the webpage to be crawled by the second crawler rule, generates a tree structure of that webpage, and at the same time displays the original page to the user. The target storage content is parsed out according to the second crawler rule, then downloaded and stored in the preset database. For example, the executor obtains the tree structure of the webpage to be crawled by the second crawler rule, parses out as the target storage content the cast list of the video "Emergency Department Doctor: Episode 19", and downloads and stores the full contents of that cast list in the preset database.
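Steps S52 and S53 can be sketched with `re` and the standard-library `sqlite3` module. The assumptions are hypothetical: the second crawler rule is modeled as a regular expression with one capture group (a real rule would more likely walk the page tree described above), and the "preset database" is a single SQLite table whose `UNIQUE` constraint makes repeated crawls of the same page idempotent.

```python
import re
import sqlite3
from typing import List


def extract_content(page_html: str, second_rule: str) -> List[str]:
    """Step S52 (sketch): apply the second crawler rule to the page."""
    return [match.strip() for match in re.findall(second_rule, page_html)]


def store_content(conn: sqlite3.Connection, url: str, items: List[str]) -> None:
    """Step S53 (sketch): store extracted items in a preset table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawled ("
        " url TEXT NOT NULL, item TEXT NOT NULL, UNIQUE (url, item))"
    )
    # INSERT OR IGNORE skips rows already stored by an earlier crawl.
    conn.executemany(
        "INSERT OR IGNORE INTO crawled (url, item) VALUES (?, ?)",
        [(url, item) for item in items],
    )
    conn.commit()


if __name__ == "__main__":
    html = '<ul><li class="cast"> Actor X </li><li class="cast">Actor Y</li></ul>'
    cast = extract_content(html, r'<li class="cast">([^<]+)</li>')
    db = sqlite3.connect(":memory:")
    store_content(db, "https://example.com/video/19", cast)
    print(db.execute("SELECT item FROM crawled ORDER BY item").fetchall())
```

Using parameterized `?` placeholders rather than string formatting keeps crawled text, which is untrusted input, from being interpreted as SQL.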
In one embodiment, Fig. 6 is a block diagram of a device for intelligent crawler tasks according to an exemplary embodiment. As shown in Fig. 6, the device comprises a generation module 61, an acquisition module 62, an input module 63 and a storage module 64.
The generation module 61 is configured to generate a first crawler task according to a preconfigured first crawler rule;
The acquisition module 62 is configured to obtain a target link according to the first crawler task;
The input module 63 is configured to use the target link as the input of a second crawler rule;
The storage module 64 is configured to store the corresponding data content by means of a second crawler task generated from the second crawler rule.
As shown in Fig. 7, the generation module 61 comprises a first generation submodule 71, a sending submodule 72 and a second generation submodule 73.
The first generation submodule 71 is configured to generate, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
The sending submodule 72 is configured to send the first crawler job to the control center;
The second generation submodule 73 is configured to generate, at the control center, a corresponding first crawler task according to the first crawler job.
As shown in Fig. 8, the acquisition module 62 comprises a first sending submodule 81, a first acquisition submodule 82, a first parsing submodule 83 and a second sending submodule 84.
The first sending submodule 81 is configured for the control center to send the first crawler task to the executor;
The first acquisition submodule 82 is configured for the executor to obtain the page content of at least one webpage to be crawled;
The first parsing submodule 83 is configured to parse the target link of the first crawler task from the page content;
The second sending submodule 84 is configured to send the target link of the first crawler task to the control center.
As shown in Fig. 9, the input module 63 comprises a second parsing submodule 91 and an input submodule 92.
The second parsing submodule 91 is configured for the control center to resolve the target webpage from the target link;
The input submodule 92 is configured to use the target webpage as the webpage to be crawled by the second crawler rule.
As shown in Fig. 10, the storage module 64 comprises a second acquisition submodule 101, a third parsing submodule 102 and a storage submodule 103.
The second acquisition submodule 101 is configured for the executor to obtain the webpage to be crawled by the second crawler rule;
The third parsing submodule 102 is configured to parse out the target storage content according to the second crawler rule;
The storage submodule 103 is configured to download the target storage content and store it in the preset database.
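The device splits its work between a control center and an executor. The sketch below shows one possible way to wire the two together with a task queue. The class names, the `stage` field and the mapping of methods to the numbered submodules are illustrative assumptions, and the fetch, parse and store functions are injected so that either real or fake implementations can be plugged in.

```python
import queue
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    stage: int  # 1 = first crawler rule, 2 = second crawler rule
    url: str


class ControlCenter:
    """Dispatches tasks and turns returned target links into stage-2 tasks."""

    def __init__(self) -> None:
        self.to_executor: "queue.Queue[Task]" = queue.Queue()

    def dispatch(self, task: Task) -> None:  # cf. sending submodules 72 / 81
        self.to_executor.put(task)

    def receive_target_links(self, links: List[str]) -> None:  # cf. 91 / 92
        for link in links:
            self.dispatch(Task(stage=2, url=link))


class Executor:
    """Runs queued tasks with injected fetch / parse / store functions."""

    def __init__(
        self,
        center: ControlCenter,
        fetch: Callable[[str], str],
        parse_links: Callable[[str], List[str]],
        parse_content: Callable[[str], List[str]],
        store: Callable[[str, List[str]], None],
    ) -> None:
        self.center = center
        self.fetch = fetch
        self.parse_links = parse_links
        self.parse_content = parse_content
        self.store = store

    def run_until_idle(self) -> None:
        while True:
            try:
                task = self.center.to_executor.get_nowait()
            except queue.Empty:
                return
            html = self.fetch(task.url)  # cf. acquisition submodules 82 / 101
            if task.stage == 1:
                # cf. 83 / 84: parse target links and report them to the center
                self.center.receive_target_links(self.parse_links(html))
            else:
                # cf. 102 / 103: parse target storage content and store it
                self.store(task.url, self.parse_content(html))


if __name__ == "__main__":
    pages = {
        "https://example.com/list": '<a href="https://example.com/video/19">Ep</a>',
        "https://example.com/video/19": '<li class="cast">Actor X</li>',
    }
    db: Dict[str, List[str]] = {}
    center = ControlCenter()
    executor = Executor(
        center,
        pages.__getitem__,
        lambda h: re.findall(r'href="([^"]+)"', h),
        lambda h: re.findall(r'<li class="cast">([^<]+)</li>', h),
        db.__setitem__,
    )
    center.dispatch(Task(stage=1, url="https://example.com/list"))
    executor.run_until_idle()
    print(db)
```

Routing every target link back through the control center, rather than letting the executor follow it directly, matches the description above: the control center decides what the second crawler rule crawls, and the executor only fetches, parses and stores.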
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above embodiments may be freely combined. Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
- 1. A method for intelligent crawler tasks, characterized by comprising: generating a first crawler task according to a preconfigured first crawler rule; obtaining a target link according to the first crawler task; using the target link as the input of a second crawler rule; and storing the corresponding data content by means of a second crawler task generated from the second crawler rule.
- 2. The method according to claim 1, characterized in that generating the first crawler task according to the preconfigured first crawler rule comprises: generating, at a front end, a corresponding first crawler job according to the preconfigured first crawler rule; sending the first crawler job to a control center; and generating, at the control center, a corresponding first crawler task according to the first crawler job.
- 3. The method according to claim 1, characterized in that obtaining the target link according to the first crawler task comprises: sending, by the control center, the first crawler task to an executor; obtaining, by the executor, the page content of at least one webpage to be crawled; parsing the target link of the first crawler task from the page content; and sending the target link of the first crawler task to the control center.
- 4. The method according to claim 1, characterized in that using the target link as the input of the second crawler rule comprises: resolving, by the control center, the target webpage from the target link; and using the target webpage as the webpage to be crawled by the second crawler rule.
- 5. The method according to claim 1, characterized in that storing the corresponding data content by means of the second crawler task generated from the second crawler rule comprises: obtaining, by the executor, the webpage to be crawled by the second crawler rule; parsing out the target storage content according to the second crawler rule; and downloading and storing the target storage content in a preset database.
- 6. A device for intelligent crawler tasks, characterized by comprising: a generation module, configured to generate a first crawler task according to a preconfigured first crawler rule; an acquisition module, configured to obtain a target link according to the first crawler task; an input module, configured to use the target link as the input of a second crawler rule; and a storage module, configured to store the corresponding data content by means of a second crawler task generated from the second crawler rule.
- 7. The device according to claim 6, characterized in that the generation module comprises: a first generation submodule, configured to generate, at a front end, a corresponding first crawler job according to the preconfigured first crawler rule; a sending submodule, configured to send the first crawler job to a control center; and a second generation submodule, configured to generate, at the control center, a corresponding first crawler task according to the first crawler job.
- 8. The device according to claim 6, characterized in that the acquisition module comprises: a first sending submodule, configured for the control center to send the first crawler task to an executor; a first acquisition submodule, configured for the executor to obtain the page content of at least one webpage to be crawled; a first parsing submodule, configured to parse the target link of the first crawler task from the page content; and a second sending submodule, configured to send the target link of the first crawler task to the control center.
- 9. The device according to claim 6, characterized in that the input module comprises: a second parsing submodule, configured for the control center to resolve the target webpage from the target link; and an input submodule, configured to use the target webpage as the webpage to be crawled by the second crawler rule.
- 10. The device according to claim 6, characterized in that the storage module comprises: a second acquisition submodule, configured for the executor to obtain the webpage to be crawled by the second crawler rule; a third parsing submodule, configured to parse out the target storage content according to the second crawler rule; and a storage submodule, configured to download the target storage content and store it in a preset database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711106956.XA CN107885843A (en) | 2017-11-10 | 2017-11-10 | Method and device for intelligent crawler tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711106956.XA CN107885843A (en) | 2017-11-10 | 2017-11-10 | Method and device for intelligent crawler tasks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107885843A (en) | 2018-04-06 |
Family
ID=61780198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711106956.XA Pending CN107885843A (en) | Method and device for intelligent crawler tasks | 2017-11-10 | 2017-11-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885843A (en) |
- 2017-11-10 CN CN201711106956.XA (CN107885843A), active, Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359232A (en) * | 2018-08-21 | 2019-02-19 | 中国平安人寿保险股份有限公司 | Obtain method, apparatus, computer equipment and the storage medium of room rate |
CN110968756A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Webpage crawling method and device |
CN110968756B (en) * | 2018-09-29 | 2023-05-12 | 北京国双科技有限公司 | Webpage crawling method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180406 |