CN107885843A - Method and device for intelligent crawler tasks - Google Patents

Method and device for intelligent crawler tasks

Info

Publication number
CN107885843A
Authority
CN
China
Prior art keywords
crawler
task
rule
target network
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711106956.XA
Other languages
Chinese (zh)
Inventor
郭建辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TVMining Beijing Media Technology Co Ltd
Original Assignee
TVMining Beijing Media Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TVMining Beijing Media Technology Co Ltd filed Critical TVMining Beijing Media Technology Co Ltd
Priority to CN201711106956.XA priority Critical patent/CN107885843A/en
Publication of CN107885843A publication Critical patent/CN107885843A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for intelligent crawler tasks. The method includes: generating a first crawler task according to a preconfigured first crawler rule; obtaining a target network link according to the first crawler task; using the target network link as the input of a second crawler rule; and storing the corresponding data content through a second crawler task generated from the second crawler rule. Through intelligent crawler task management, the invention can obtain the data content reached after two or more network jumps, thereby improving the degree of intelligence of crawler tasks.

Description

Method and device for intelligent crawler tasks
Technical field
The present invention relates to the field of crawler technology, and in particular to a method and device for intelligent crawler tasks.
Background
A crawler obtains the resource content of a target webpage and can acquire, in a targeted manner, the resource content a user needs according to the user's specific purpose. However, users often face the following problem: the resource content the user actually needs is not on the initially input target webpage but on the webpage pointed to by a target network link within that webpage, so the user cannot obtain the needed resource content directly through the crawler. How to properly solve this problem has become an urgent issue in the industry.
Summary of the invention
The present invention provides a method and device for intelligent crawler tasks which, through intelligent crawler task management, can obtain the data content reached after two or more network jumps, thereby improving the degree of intelligence of crawler tasks.
According to a first aspect of the embodiments of the present invention, a method for intelligent crawler tasks is provided, including:
Generating a first crawler task according to a preconfigured first crawler rule;
Obtaining a target network link according to the first crawler task;
Using the target network link as the input of a second crawler rule;
Storing the corresponding data content through a second crawler task generated from the second crawler rule.
In one embodiment, generating the first crawler task according to the preconfigured first crawler rule includes:
Generating, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
Sending the first crawler job to the control centre;
Generating, at the control centre, the corresponding first crawler task according to the first crawler job.
In one embodiment, obtaining the target network link according to the first crawler task includes:
The control centre sending the first crawler task to an executor;
The executor obtaining the page content of at least one webpage to be crawled;
Parsing the target network link of the first crawler task from the page content;
Sending the target network link of the first crawler task to the control centre.
In one embodiment, using the target network link as the input of the second crawler rule includes:
The control centre resolving the target webpage from the target network link;
Using the target webpage as the webpage to be crawled under the second crawler rule.
In one embodiment, storing the corresponding data content through the second crawler task generated from the second crawler rule includes:
The executor obtaining the webpage to be crawled under the second crawler rule;
Parsing out the target storage content using the second crawler rule;
Downloading the target storage content and storing it in a preset database.
According to a second aspect of the embodiments of the present invention, a device for intelligent crawler tasks is provided, including:
A generation module, configured to generate a first crawler task according to a preconfigured first crawler rule;
An acquisition module, configured to obtain a target network link according to the first crawler task;
An input module, configured to use the target network link as the input of a second crawler rule;
A storage module, configured to store the corresponding data content through a second crawler task generated from the second crawler rule.
In one embodiment, the generation module includes:
A first generation submodule, configured to generate, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
A sending submodule, configured to send the first crawler job to the control centre;
A second generation submodule, configured to generate, at the control centre, the corresponding first crawler task according to the first crawler job.
In one embodiment, the acquisition module includes:
A first sending submodule, configured for the control centre to send the first crawler task to an executor;
A first acquisition submodule, configured for the executor to obtain the page content of at least one webpage to be crawled;
A first parsing submodule, configured to parse the target network link of the first crawler task from the page content;
A second sending submodule, configured to send the target network link of the first crawler task to the control centre.
In one embodiment, the input module includes:
A second parsing submodule, configured for the control centre to resolve the target webpage from the target network link;
An input submodule, configured to use the target webpage as the webpage to be crawled under the second crawler rule.
In one embodiment, the storage module includes:
A second acquisition submodule, configured for the executor to obtain the webpage to be crawled under the second crawler rule;
A third parsing submodule, configured to parse out the target storage content using the second crawler rule;
A storage submodule, configured to download the target storage content and store it in a preset database.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practising the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention they serve to explain the invention, and they do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 2 is a flowchart of step S11 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of step S12 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 4 is a flowchart of step S13 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 5 is a flowchart of step S14 of the method for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 6 is a block diagram of a device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 7 is a block diagram of the generation module 61 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 8 is a block diagram of the acquisition module 62 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 9 is a block diagram of the input module 63 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention;
Fig. 10 is a block diagram of the storage module 64 of the device for intelligent crawler tasks according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a method for intelligent crawler tasks according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps S11-S14:
In step S11, a first crawler task is generated according to a preconfigured first crawler rule;
In step S12, a target network link is obtained according to the first crawler task;
In step S13, the target network link is used as the input of a second crawler rule;
In step S14, the corresponding data content is stored through a second crawler task generated from the second crawler rule.
In one embodiment, a crawler obtains the resource content of a target webpage and can acquire, in a targeted manner, the resource content a user needs according to the user's specific purpose. However, users often face the following problem: the resource content the user actually needs is not on the initially input target webpage but on the webpage pointed to by a target network link within that webpage, so the user cannot obtain the needed resource content directly through the crawler. The technical solution of this embodiment properly solves this problem.
A first crawler task is generated according to a preconfigured first crawler rule. Specifically, the front end generates a corresponding first crawler job according to the preconfigured first crawler rule, the first crawler job is sent to the control centre, and the control centre generates the corresponding first crawler task according to the first crawler job.
A target network link is obtained according to the first crawler task. Specifically, the control centre sends the first crawler task to the executor, the executor obtains the page content of at least one webpage to be crawled, the target network link of the first crawler task is parsed from the page content, and the target network link of the first crawler task is sent to the control centre.
The target network link is used as the input of a second crawler rule. Specifically, the control centre resolves the target webpage from the target network link, and the target webpage is used as the webpage to be crawled under the second crawler rule.
The corresponding data content is stored through a second crawler task generated from the second crawler rule. Specifically, the executor obtains the webpage to be crawled under the second crawler rule, the target storage content is parsed out using the second crawler rule, and the target storage content is downloaded and stored in a preset database.
Through intelligent crawler task management, the technical solution of this embodiment can obtain the data content reached after two or more network jumps, thereby improving the degree of intelligence of crawler tasks.
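The four steps S11-S14 can be sketched as a small two-stage pipeline. This is only an illustrative sketch and not the implementation described in the application: the names `CrawlerRule`, `apply_rule` and `two_stage_crawl` are hypothetical, a rule is assumed to be nothing more than "collect an attribute or the text of every matching tag", and the `fetch` callable stands in for whatever page download the executor actually performs.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin


@dataclass(frozen=True)
class CrawlerRule:
    """Hypothetical rule: collect an attribute (or the text) of every `tag`."""
    tag: str
    attr: Optional[str] = None  # None means "collect the element text"


class _RuleParser(HTMLParser):
    """Applies one CrawlerRule while the page content is being parsed."""

    def __init__(self, rule: CrawlerRule) -> None:
        super().__init__()
        self.rule = rule
        self.results: List[str] = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        if tag != self.rule.tag:
            return
        if self.rule.attr is None:
            self._capturing = True
            self.results.append("")
        else:
            value = dict(attrs).get(self.rule.attr)
            if value:
                self.results.append(value)

    def handle_endtag(self, tag):
        if tag == self.rule.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.results[-1] += data


def apply_rule(rule: CrawlerRule, page_content: str) -> List[str]:
    parser = _RuleParser(rule)
    parser.feed(page_content)
    parser.close()
    return [item.strip() for item in parser.results]


def two_stage_crawl(
    start_url: str,
    first_rule: CrawlerRule,
    second_rule: CrawlerRule,
    fetch: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # S11/S12: the first task crawls the start page and yields target links.
    links = [urljoin(start_url, href)
             for href in apply_rule(first_rule, fetch(start_url))]
    # S13/S14: each target link becomes the page to crawl under the second rule.
    stored = []
    for link in links:
        for item in apply_rule(second_rule, fetch(link)):
            stored.append((link, item))
    return stored
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; in a test it can simply be a dictionary lookup over canned pages.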
In one embodiment, as shown in Fig. 2, step S11 includes the following steps S21-S23:
In step S21, the front end generates a corresponding first crawler job according to the preconfigured first crawler rule;
In step S22, the first crawler job is sent to the control centre;
In step S23, the control centre generates the corresponding first crawler task according to the first crawler job.
In one embodiment, the user can configure the corresponding first crawler rule at the front end according to the user's own intention. Once configuration is complete, the front end generates a first crawler job from the configured first crawler rule; the first crawler job represents the first crawling requirement. The front end sends the first crawler job to the control centre, and the control centre generates from it a first crawler task that the executor can run.
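The hand-off from rule to job to task could be modelled roughly as follows. The field names (`start_urls`, `link_pattern`) and the one-task-per-start-page policy are assumptions made for illustration; the application does not specify the contents of a rule, a job or a task.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class CrawlerRule:
    """What the user configures at the front end (hypothetical fields)."""
    name: str
    start_urls: Tuple[str, ...]
    link_pattern: str


@dataclass(frozen=True)
class CrawlerJob:
    """The crawling requirement the front end sends to the control centre."""
    rule: CrawlerRule


@dataclass(frozen=True)
class CrawlerTask:
    """A unit of work that an executor can run."""
    task_id: int
    url: str
    link_pattern: str


def front_end_submit(rule: CrawlerRule) -> CrawlerJob:
    # S21: wrap the finished rule into a job.
    return CrawlerJob(rule=rule)


class ControlCentre:
    def __init__(self) -> None:
        self._ids = count(1)

    def generate_tasks(self, job: CrawlerJob) -> List[CrawlerTask]:
        # S23: assumed policy of one runnable task per configured start page.
        return [
            CrawlerTask(next(self._ids), url, job.rule.link_pattern)
            for url in job.rule.start_urls
        ]
```

Keeping the job separate from the task means the front end only describes what is wanted, while the control centre decides how that requirement is split into runnable units.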
In one embodiment, as shown in Fig. 3, step S12 includes the following steps S31-S34:
In step S31, the control centre sends the first crawler task to the executor;
In step S32, the executor obtains the page content of at least one webpage to be crawled;
In step S33, the target network link of the first crawler task is parsed from the page content;
In step S34, the target network link of the first crawler task is sent to the control centre.
In one embodiment, the control centre sends the first crawler task to the executor, and the executor obtains the page content of at least one webpage to be crawled specified by the user, for example video website A, video website B and video website C. The control centre may generate three first crawler tasks, one each for video website A, video website B and video website C, and send all three to the executor. The executor first obtains the page content of the webpages to be crawled on video websites A, B and C, then parses the target network links of those first crawler tasks from that page content, and sends those target network links to the control centre.
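Step S33, parsing target network links out of fetched page content, might look like the sketch below. It assumes that a first crawler rule boils down to a regular expression over link URLs, which the application does not state; `LinkCollector` and `extract_target_links` are hypothetical names, and the standard-library `html.parser` is used instead of any specific crawling framework.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element in the page content."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_target_links(page_url: str, page_content: str,
                         link_pattern: str) -> List[str]:
    """Return absolute, de-duplicated links that match the rule's pattern."""
    collector = LinkCollector()
    collector.feed(page_content)
    seen = set()
    targets = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if re.search(link_pattern, absolute) and absolute not in seen:
            seen.add(absolute)
            targets.append(absolute)
    return targets
```

The returned list is what the executor would send back to the control centre in step S34; de-duplicating here avoids creating the same second-stage task twice.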
In one embodiment, as shown in Fig. 4, step S13 includes the following steps S41-S42:
In step S41, the control centre resolves the target webpage from the target network link;
In step S42, the target webpage is used as the webpage to be crawled under the second crawler rule.
In one embodiment, the control centre resolves the target webpage pointed to by the target network link of the first crawler task, and that target webpage is used as the webpage to be crawled under the second crawler rule. For example, the control centre resolves the target webpage α pointed to by the target network link "http://v.youku.com/v_show/id_XMzEyNjc2OTk2MA==.htmlSpm= a2hww.20027244.m_250003.5~5~5~5~A" and uses target webpage α as the webpage to be crawled under the second crawler rule.
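The application does not say what resolving a link into a target webpage involves. One plausible reading, sketched here purely as an assumption, is that the control centre validates and normalizes the link before handing it to the second crawler rule as the page to crawl. The helper name `resolve_target_page` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_target_page(target_link: str) -> str:
    """Validate a target network link and return a normalized page URL.

    Assumed behaviour: only http(s) links are accepted, the host is
    lower-cased, an empty path becomes "/", and the fragment is dropped
    because it does not change which page is fetched.
    """
    parts = urlsplit(target_link.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a crawlable link: {target_link!r}")
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
```

Normalizing before the second stage means two links that differ only in host case or fragment map to the same webpage to be crawled.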
In one embodiment, as shown in Fig. 5, step S14 includes the following steps S51-S53:
In step S51, the executor obtains the webpage to be crawled under the second crawler rule;
In step S52, the target storage content is parsed out using the second crawler rule;
In step S53, the target storage content is downloaded and stored in a preset database.
In one embodiment, the executor obtains the webpage to be crawled under the second crawler rule, generates a tree structure of that webpage, and at the same time shows the original page of the webpage to be crawled to the user. The target storage content is parsed out using the second crawler rule, then downloaded and stored in the preset database. For example, the executor obtains the tree structure of the webpage to be crawled under the second crawler rule and parses out the target storage content as the cast list of the video "Emergency Department Doctor: Episode 19"; it then downloads the specific content of that cast list and stores it in the preset database.
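A minimal sketch of steps S51-S53 follows, under several assumptions that are not in the application: the second crawler rule is taken to be "every `li` element carrying the class `actor`", the preset database is an SQLite database, and the table name `cast_list` is invented for the example. The tree builder is deliberately simple and does not handle every quirk of real-world HTML.

```python
import sqlite3
from html.parser import HTMLParser
from typing import List


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple tree structure of the webpage to be crawled."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("document", [])
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self._current)
        self._current.children.append(node)
        self._current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then close it.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent

    def handle_data(self, data):
        self._current.text += data.strip()


def find_all(node: Node, tag: str, css_class: str) -> List[Node]:
    found = []
    for child in node.children:
        classes = (child.attrs.get("class") or "").split()
        if child.tag == tag and css_class in classes:
            found.append(child)
        found.extend(find_all(child, tag, css_class))
    return found


def store_cast(db: sqlite3.Connection, video: str, page_content: str) -> List[str]:
    builder = TreeBuilder()
    builder.feed(page_content)
    names = [n.text for n in find_all(builder.root, "li", "actor")]
    db.execute("CREATE TABLE IF NOT EXISTS cast_list (video TEXT, actor TEXT)")
    db.executemany("INSERT INTO cast_list VALUES (?, ?)",
                   [(video, name) for name in names])
    db.commit()
    return names
```

Calling `store_cast(sqlite3.connect(":memory:"), title, html)` is enough to try the sketch without touching a real database file.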
In one embodiment, Fig. 6 is a block diagram of a device for intelligent crawler tasks according to an exemplary embodiment. As shown in Fig. 6, the device includes a generation module 61, an acquisition module 62, an input module 63 and a storage module 64.
The generation module 61 is configured to generate a first crawler task according to a preconfigured first crawler rule;
The acquisition module 62 is configured to obtain a target network link according to the first crawler task;
The input module 63 is configured to use the target network link as the input of a second crawler rule;
The storage module 64 is configured to store the corresponding data content through a second crawler task generated from the second crawler rule.
As shown in Fig. 7, the generation module 61 includes a first generation submodule 71, a sending submodule 72 and a second generation submodule 73.
The first generation submodule 71 is configured to generate, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
The sending submodule 72 is configured to send the first crawler job to the control centre;
The second generation submodule 73 is configured to generate, at the control centre, the corresponding first crawler task according to the first crawler job.
As shown in Fig. 8, the acquisition module 62 includes a first sending submodule 81, a first acquisition submodule 82, a first parsing submodule 83 and a second sending submodule 84.
The first sending submodule 81 is configured for the control centre to send the first crawler task to the executor;
The first acquisition submodule 82 is configured for the executor to obtain the page content of at least one webpage to be crawled;
The first parsing submodule 83 is configured to parse the target network link of the first crawler task from the page content;
The second sending submodule 84 is configured to send the target network link of the first crawler task to the control centre.
As shown in Fig. 9, the input module 63 includes a second parsing submodule 91 and an input submodule 92.
The second parsing submodule 91 is configured for the control centre to resolve the target webpage from the target network link;
The input submodule 92 is configured to use the target webpage as the webpage to be crawled under the second crawler rule.
As shown in Fig. 10, the storage module 64 includes a second acquisition submodule 101, a third parsing submodule 102 and a storage submodule 103.
The second acquisition submodule 101 is configured for the executor to obtain the webpage to be crawled under the second crawler rule;
The third parsing submodule 102 is configured to parse out the target storage content using the second crawler rule;
The storage submodule 103 is configured to download the target storage content and store it in a preset database.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by that processor produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above embodiments may be freely combined. Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

  1. A method for intelligent crawler tasks, characterized by including:
    Generating a first crawler task according to a preconfigured first crawler rule;
    Obtaining a target network link according to the first crawler task;
    Using the target network link as the input of a second crawler rule;
    Storing the corresponding data content through a second crawler task generated from the second crawler rule.
  2. The method according to claim 1, characterized in that generating the first crawler task according to the preconfigured first crawler rule includes:
    Generating, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
    Sending the first crawler job to the control centre;
    Generating, at the control centre, the corresponding first crawler task according to the first crawler job.
  3. The method according to claim 1, characterized in that obtaining the target network link according to the first crawler task includes:
    The control centre sending the first crawler task to an executor;
    The executor obtaining the page content of at least one webpage to be crawled;
    Parsing the target network link of the first crawler task from the page content;
    Sending the target network link of the first crawler task to the control centre.
  4. The method according to claim 1, characterized in that using the target network link as the input of the second crawler rule includes:
    The control centre resolving the target webpage from the target network link;
    Using the target webpage as the webpage to be crawled under the second crawler rule.
  5. The method according to claim 1, characterized in that storing the corresponding data content through the second crawler task generated from the second crawler rule includes:
    The executor obtaining the webpage to be crawled under the second crawler rule;
    Parsing out the target storage content using the second crawler rule;
    Downloading the target storage content and storing it in a preset database.
  6. A device for intelligent crawler tasks, characterized by including:
    A generation module, configured to generate a first crawler task according to a preconfigured first crawler rule;
    An acquisition module, configured to obtain a target network link according to the first crawler task;
    An input module, configured to use the target network link as the input of a second crawler rule;
    A storage module, configured to store the corresponding data content through a second crawler task generated from the second crawler rule.
  7. The device according to claim 6, characterized in that the generation module includes:
    A first generation submodule, configured to generate, at the front end, a corresponding first crawler job according to the preconfigured first crawler rule;
    A sending submodule, configured to send the first crawler job to the control centre;
    A second generation submodule, configured to generate, at the control centre, the corresponding first crawler task according to the first crawler job.
  8. The device according to claim 6, characterized in that the acquisition module includes:
    A first sending submodule, configured for the control centre to send the first crawler task to an executor;
    A first acquisition submodule, configured for the executor to obtain the page content of at least one webpage to be crawled;
    A first parsing submodule, configured to parse the target network link of the first crawler task from the page content;
    A second sending submodule, configured to send the target network link of the first crawler task to the control centre.
  9. The device according to claim 6, characterized in that the input module includes:
    A second parsing submodule, configured for the control centre to resolve the target webpage from the target network link;
    An input submodule, configured to use the target webpage as the webpage to be crawled under the second crawler rule.
  10. The device according to claim 6, characterized in that the storage module includes:
    A second acquisition submodule, configured for the executor to obtain the webpage to be crawled under the second crawler rule;
    A third parsing submodule, configured to parse out the target storage content using the second crawler rule;
    A storage submodule, configured to download the target storage content and store it in a preset database.
CN201711106956.XA 2017-11-10 2017-11-10 Method and device for intelligent crawler tasks Pending CN107885843A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711106956.XA CN107885843A (en) 2017-11-10 2017-11-10 Method and device for intelligent crawler tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711106956.XA CN107885843A (en) 2017-11-10 2017-11-10 Method and device for intelligent crawler tasks

Publications (1)

Publication Number Publication Date
CN107885843A (en) 2018-04-06

Family

ID=61780198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711106956.XA Pending CN107885843A (en) 2017-11-10 2017-11-10 Method and device for intelligent crawler tasks

Country Status (1)

Country Link
CN (1) CN107885843A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359232A (en) * 2018-08-21 2019-02-19 中国平安人寿保险股份有限公司 Method, apparatus, computer device and storage medium for obtaining housing prices
CN110968756A (en) * 2018-09-29 2020-04-07 北京国双科技有限公司 Webpage crawling method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359232A (en) * 2018-08-21 2019-02-19 中国平安人寿保险股份有限公司 Method, apparatus, computer device and storage medium for obtaining housing prices
CN110968756A (en) * 2018-09-29 2020-04-07 北京国双科技有限公司 Webpage crawling method and device
CN110968756B (en) * 2018-09-29 2023-05-12 北京国双科技有限公司 Webpage crawling method and device

Similar Documents

Publication Publication Date Title
CN109614102A (en) Code automatic generation method, device, electronic equipment and storage medium
CN109344170B (en) Stream data processing method, system, electronic device and readable storage medium
CN110096689A (en) Template type legal documents information fill method and device
US10635435B2 (en) Collection of API documentations
US20100100872A1 (en) Methods and systems for implementing a test automation framework for testing software applications on unix/linux based machines
CN109710703A (en) A kind of generation method and device of genetic connection network
CN104424018B (en) Distributed Calculation transaction methods and device
CN107729106A (en) It is a kind of that the method and apparatus quickly redirected are realized between application component
CN112328671A (en) Data format conversion method, system, storage medium and equipment
CN108776610A (en) A kind of interface configuration method and device
CN112036577B (en) Method and device for applying machine learning based on data form and electronic equipment
CN109582289B (en) Method, system, storage medium and processor for processing rule flow in rule engine
CN102929646B (en) Application program generation method and device
CN109840083A (en) Web pages component template construction method, device, computer equipment and storage medium
CN111984669B (en) Functional SQL query method, device, equipment and medium supporting dynamic variables
US10380233B2 (en) Launching workflow processes based on annotations in a document
US8756258B2 (en) Generating references to reusable code in a schema
CN107885843A (en) Method and device for intelligent crawler tasks
Gulwani et al. StriSynth: synthesis for live programming
CN110019315A (en) A kind of method and apparatus for the parsing of data blood relationship
CN111310005A (en) Network request processing method and device, server and storage medium
CN106484488B (en) Integrated cloud Compilation Method and system
US20130080876A1 (en) Using a template processor to determine context nodes
CN107817972B (en) Cache code processing method and device, storage medium and electronic equipment
CN113495723B (en) Method, device and storage medium for calling functional component

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180406