CN107133217A - Target topic intelligent grabbing method, system and computer-readable recording medium - Google Patents


Publication number
CN107133217A
Authority
CN
China
Prior art keywords
target topic
search result
search
control centre
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710385603.1A
Other languages
Chinese (zh)
Inventor
张程伟
刘顺峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hui Xing Xing Xing Network Technology Co Ltd
Original Assignee
Beijing Hui Xing Xing Xing Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hui Xing Xing Xing Network Technology Co Ltd
Priority to CN201710385603.1A
Publication of CN107133217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

Disclosed are a target topic intelligent grabbing method, system and computer-readable recording medium. The method may include: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a dispatching algorithm; multiple analyzers simultaneously fetch the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues. Through distributed search, the invention achieves efficient and stable intelligent grabbing of target topics.

Description

Target topic intelligent grabbing method, system and computer-readable recording medium
Technical field
The present invention relates to the computer field, and more particularly to a target topic intelligent grabbing method, system and computer-readable recording medium.
Background
In the computer field, a crawler is a program that automatically downloads web pages. According to a set crawl target, it selectively visits web pages and related links on the World Wide Web to obtain the required information. Such a crawler does not pursue broad coverage; instead, it aims to crawl the web pages related to a particular topic, preparing data resources for topic-oriented user queries. At present, some websites implement logic that controls access, that is, anti-crawl strategies. It is therefore necessary to develop a target topic intelligent grabbing method, system and computer-readable recording medium.
The information disclosed in this Background section is intended only to deepen understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to those skilled in the art.
Summary of the invention
The present invention proposes a target topic intelligent grabbing method, system and computer-readable recording medium that achieve efficient and stable intelligent grabbing of target topics through distributed search.
According to one aspect of the invention, a target topic intelligent grabbing method is proposed. The method may include: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a dispatching algorithm; multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
Preferably, the control centre analyzing the search results and saving the final search results includes: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and if it is valid, saving the final search result.
Preferably, the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
Preferably, the target topics to be searched and matched are uploaded to the database by a user through a client.
Preferably, the method further includes: the control centre returns the final search results to the client.
According to another aspect of the invention, a target topic intelligent grabbing system is proposed. The system may include: a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a dispatching algorithm, analyzes search results, and saves the final search results; a database, which stores the target topics to be searched and matched; queues, the multiple queues receiving the target topics to be searched and matched that the control centre distributes, and passing them to the corresponding analyzers; and analyzers, the multiple analyzers simultaneously fetching the target topics awaiting analysis from the corresponding queues, performing keyword extraction, simultaneously applying the keywords to the search interfaces of their corresponding internet sites to perform searches, and returning the search results to the control centre, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
Preferably, the control centre analyzing the search results and saving the final search results includes: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and if it is valid, saving the final search result.
Preferably, the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
Preferably, the target topics to be searched and matched are uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
According to a third aspect of the invention, a computer-readable recording medium is proposed, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a dispatching algorithm; multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
The method and apparatus of the present invention have other features and advantages, which will be apparent from, or are set out in detail in, the accompanying drawing incorporated herein and the following detailed description, which together serve to explain certain principles of the invention.
Brief description of the drawings
The above and other objects, features and advantages of the invention will become more apparent from the following more detailed description of exemplary embodiments of the invention with reference to the accompanying drawing, in which the same reference numerals generally denote the same parts.
Fig. 1 shows a flow chart of the steps of the target topic intelligent grabbing method according to the present invention.
Detailed description of the embodiments
The present invention is described more fully below with reference to the accompanying drawing. Although the drawing shows preferred embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the invention will be thorough and complete, and will fully convey its scope to those skilled in the art.
Embodiment 1
Fig. 1 shows a flow chart of the steps of the target topic intelligent grabbing method according to the present invention.
In this embodiment, the target topic intelligent grabbing method according to the present invention may include: step 101, a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a dispatching algorithm; step 102, multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; step 103, the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre; and step 104, the control centre analyzes the search results and saves the final search results, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
The specific steps of the target topic intelligent grabbing method according to the present invention are described in detail below.
In one example, the control centre reads the target topics to be searched and matched from the database and may distribute them to multiple queues by means of a dispatching algorithm.
In one example, the target topics to be searched and matched may be uploaded to the database by a user through a client.
Specifically, the user uploads the target topics to be searched and matched to the database through the client; the control centre reads them from the database and distributes them to the queues by means of a dispatching algorithm. The dispatching algorithm may distribute dynamically according to parameters such as the system load, the number of tasks waiting in each queue, and the processing time of each website. Those skilled in the art can select a suitable dispatching algorithm according to the specific situation.
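The text leaves the choice of dispatching algorithm open. As one possible illustration only, the sketch below assumes that each topic is placed on a single queue and that the queue is chosen by the smallest estimated wait, computed from two of the parameters named above (tasks waiting in the queue and per-site processing time); system load is not modeled. The names SiteQueue, avg_seconds_per_task and dispatch are hypothetical and do not come from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SiteQueue:
    """One queue per internet site (hypothetical structure, not from the patent)."""
    site: str
    avg_seconds_per_task: float  # assumed: measured processing time for this site
    pending: deque = field(default_factory=deque)

    def estimated_wait(self) -> float:
        # Backlog cost: tasks already waiting times the site's processing time.
        return len(self.pending) * self.avg_seconds_per_task


def dispatch(topic: str, queues: list[SiteQueue]) -> SiteQueue:
    """Put the topic on the queue with the smallest estimated wait (ties: first queue)."""
    target = min(queues, key=lambda q: q.estimated_wait())
    target.pending.append(topic)
    return target


queues = [SiteQueue("site-a", 2.0), SiteQueue("site-b", 0.5)]
assigned = [dispatch(t, queues).site for t in ["t1", "t2", "t3", "t4"]]
print(assigned)
```

With these assumed processing times, the slower site receives one topic and the faster site absorbs the rest, which is the kind of dynamic distribution the paragraph describes.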
In one example, multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction, whereby keywords are obtained.
In one example, the multiple analyzers are in one-to-one correspondence with the multiple queues.
In one example, the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and may return the search results obtained to the control centre.
In one example, the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
Specifically, the analyzers correspond one-to-one with the queues, and each analyzer in the group corresponds to one queue and one internet site. Some websites implement logic that controls access, that is, anti-crawl strategies; by configuring an analyzer for each website, adaptation and optimization are carried out individually, site by site. Those skilled in the art can configure the corresponding analyzer according to the specifics of each website.
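The patent does not say what a per-site analyzer configuration contains. A minimal sketch, assuming each configuration records the site's search URL, the name of its query parameter and a request delay, might look as follows. Every field name, URL and value here is hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class AnalyzerConfig:
    """Per-site settings (all field names and values are hypothetical)."""
    site: str
    search_url: str           # the site's search interface
    query_param: str          # query-string parameter that carries the keyword
    min_delay_seconds: float  # pause between requests, to stay within the site's access limits

    def build_search_url(self, keyword: str) -> str:
        return f"{self.search_url}?{urlencode({self.query_param: keyword})}"


# One configuration per site; the site name also identifies the analyzer's queue.
ANALYZERS = {
    cfg.site: cfg
    for cfg in [
        AnalyzerConfig("site-a", "https://site-a.example/search", "q", 1.0),
        AnalyzerConfig("site-b", "https://site-b.example/s", "keyword", 3.0),
    ]
}

url = ANALYZERS["site-b"].build_search_url("red shoes")
print(url)
```

Keeping the site-specific details in data rather than in code is one way to adapt to each website individually without changing the analyzer logic itself.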
The multiple analyzers may simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers then simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre.
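The concurrent behaviour described above can be sketched with one worker thread per analyzer, each draining its own queue and reporting to a shared results queue that stands in for the control centre. This is an illustration under assumptions: extract_keywords is a placeholder (the patent does not specify the extraction technique), and fake_search replaces the real call to a site's search interface so that the example needs no network access.

```python
import queue
import threading


def extract_keywords(topic: str) -> list[str]:
    # Placeholder extraction: lowercase words longer than three characters.
    return [w for w in topic.lower().split() if len(w) > 3]


def fake_search(site: str, keywords: list[str]) -> dict:
    # Stand-in for a call to the site's search interface; no network access.
    return {"site": site, "keywords": keywords, "items": [f"{site}:{k}" for k in keywords]}


def analyzer_worker(site: str, tasks: queue.Queue, results: queue.Queue) -> None:
    """Drain this analyzer's own queue and report results to the control centre."""
    while True:
        topic = tasks.get()
        if topic is None:  # sentinel: no more work for this analyzer
            break
        results.put(fake_search(site, extract_keywords(topic)))


sites = ["site-a", "site-b"]
task_queues = {s: queue.Queue() for s in sites}  # one queue per analyzer
results: queue.Queue = queue.Queue()             # read by the control centre

workers = [
    threading.Thread(target=analyzer_worker, args=(s, task_queues[s], results))
    for s in sites
]
for w in workers:
    w.start()

task_queues["site-a"].put("Wireless noise cancelling headphones")
task_queues["site-b"].put("Blue running shoes")
for q in task_queues.values():
    q.put(None)
for w in workers:
    w.join()

collected = sorted((results.get() for _ in sites), key=lambda r: r["site"])
print(collected)
```

Because each analyzer owns exactly one queue, a slow site delays only its own worker, which is one plausible reading of why the analyzers and queues are kept in one-to-one correspondence.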
In one example, the control centre analyzes the search results and may save the final search results.
In one example, the control centre analyzing the search results and saving the final search results may include: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and if it is valid, saving the final search result.
In one example, the method may further include: the control centre returns the final search results to the client.
Specifically, the control centre analyzes whether a search result is valid, that is, whether the commodity information has been parsed normally. If it is invalid, the distribution to the queues continues and the keyword extraction and search are repeated; if a repeated keyword is encountered, the time at which that keyword was most recently executed is checked, and if it was processed within the last 7 days it is ignored. If the result is valid, the final search result is saved. The control centre returns the final search results to the client, where the user can view them.
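The validity check and the 7-day rule can be sketched as below. Several points are assumptions rather than statements from the patent: a result is treated as valid when it has at least one item and every item carries a title and a price; the "most recent execution" of a keyword is taken to be the time it was last requeued; and handle_result, last_run and retry_queue are hypothetical names.

```python
from datetime import datetime, timedelta

REPEAT_WINDOW = timedelta(days=7)  # from the text: ignore keywords handled within 7 days


def is_valid(result: dict) -> bool:
    # Assumed validity test: at least one item, each with a title and a price.
    items = result.get("items", [])
    return bool(items) and all("title" in i and "price" in i for i in items)


def handle_result(result, last_run, saved, retry_queue, now):
    """Save a valid result; otherwise requeue it unless the keyword ran recently."""
    keyword = result["keyword"]
    if is_valid(result):
        saved.append(result)
        return "saved"
    previous = last_run.get(keyword)
    if previous is not None and now - previous < REPEAT_WINDOW:
        return "ignored"  # repeated keyword processed within the window
    last_run[keyword] = now
    retry_queue.append(keyword)
    return "requeued"


now = datetime(2017, 5, 26, 12, 0)
last_run, saved, retry_queue = {}, [], []
good = {"keyword": "shoes", "items": [{"title": "Runner", "price": 59.0}]}
bad = {"keyword": "phones", "items": []}
outcomes = [
    handle_result(good, last_run, saved, retry_queue, now),
    handle_result(bad, last_run, saved, retry_queue, now),
    handle_result(bad, last_run, saved, retry_queue, now + timedelta(days=1)),
    handle_result(bad, last_run, saved, retry_queue, now + timedelta(days=8)),
]
print(outcomes)
```

Under this reading, the 7-day window bounds how often a keyword that keeps producing invalid results is sent back through the queues.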
Application example
To facilitate understanding of the scheme of the embodiments of the invention and its effects, a specific application example is given below. Those skilled in the art will understand that this example is provided only to aid understanding of the invention, and that none of its specific details is intended to limit the invention in any way.
The user uploads the target topics to be searched and matched to the database through the client. The control centre reads the target topics to be searched and matched from the database and, according to dynamic parameters such as the system load, the number of tasks waiting in each queue, and the processing time of each website, distributes them to the queues by means of the dispatching algorithm. The analyzers correspond one-to-one with the queues, and each analyzer in the group corresponds to one queue and one internet site; by configuring an analyzer for each website, adaptation and optimization are carried out individually, site by site.
The multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers then simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre. The control centre analyzes whether the search results are valid, that is, whether the commodity information has been parsed normally; on judging that the search results are valid and have been parsed normally, it saves the final search results and returns them to the client, where the user views them.
In summary, the present invention achieves efficient and stable intelligent grabbing of target topics through distributed search.
Those skilled in the art will understand that the above description of the embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
Embodiment 2
According to an embodiment of the present invention, a target topic intelligent grabbing system is provided. The system may include: a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a dispatching algorithm, analyzes search results, and saves the final search results; a database, which stores the target topics to be searched and matched; queues, the multiple queues receiving the target topics to be searched and matched that the control centre distributes, and passing them to the corresponding analyzers; and analyzers, the multiple analyzers simultaneously fetching the target topics awaiting analysis from the corresponding queues, performing keyword extraction, simultaneously applying the keywords to the search interfaces of their corresponding internet sites to perform searches, and returning the search results to the control centre, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
In one example, the control centre analyzing the search results and saving the final search results may include: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and if it is valid, saving the final search result.
In one example, the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
In one example, the target topics to be searched and matched may be uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
Those skilled in the art will understand that the above description of the embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
Embodiment 3
According to an embodiment of the present invention, a computer-readable recording medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a dispatching algorithm; multiple analyzers simultaneously fetch the target topics awaiting analysis from the corresponding queues and perform keyword extraction to obtain keywords; the multiple analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites to perform searches, and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
Those skilled in the art will understand that the above description of the embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
The embodiments of the present invention have been described above. The description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. A target topic intelligent grabbing method, comprising:
a control centre reading target topics to be searched and matched from a database and distributing them to multiple queues by means of a dispatching algorithm;
multiple analyzers simultaneously fetching the target topics awaiting analysis from the corresponding queues and performing keyword extraction to obtain keywords;
the multiple analyzers simultaneously applying the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and returning the search results obtained to the control centre; and
the control centre analyzing the search results and saving the final search results,
wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
2. The target topic intelligent grabbing method according to claim 1, wherein the control centre analyzing the search results and saving the final search results comprises:
judging whether a search result is valid and, if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and
if it is valid, saving the final search result.
3. The target topic intelligent grabbing method according to claim 1, wherein the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
4. The target topic intelligent grabbing method according to claim 1, wherein the target topics to be searched and matched are uploaded to the database by a user through a client.
5. The target topic intelligent grabbing method according to claim 4, further comprising: the control centre returning the final search results to the client.
6. A target topic intelligent grabbing system, comprising:
a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a dispatching algorithm, analyzes search results, and saves the final search results;
a database, which stores the target topics to be searched and matched;
queues, the multiple queues receiving the target topics to be searched and matched that the control centre distributes, and passing them to corresponding multiple analyzers; and
analyzers, the multiple analyzers simultaneously fetching the target topics awaiting analysis from the corresponding queues, performing keyword extraction, simultaneously applying the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and returning the search results to the control centre,
wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
7. The target topic intelligent grabbing system according to claim 6, wherein the control centre analyzing the search results and saving the final search results comprises:
judging whether a search result is valid and, if it is invalid, continuing the distribution to the queues and repeating the keyword extraction and search; and
if it is valid, saving the final search result.
8. The target topic intelligent grabbing system according to claim 6, wherein the multiple analyzers are in one-to-one correspondence with the multiple internet sites.
9. The target topic intelligent grabbing system according to claim 6, wherein the target topics to be searched and matched are uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
10. A computer-readable recording medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
a control centre reading target topics to be searched and matched from a database and distributing them to multiple queues by means of a dispatching algorithm;
multiple analyzers simultaneously fetching the target topics awaiting analysis from the corresponding queues and performing keyword extraction to obtain keywords;
the multiple analyzers simultaneously applying the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and returning the search results obtained to the control centre; and
the control centre analyzing the search results and saving the final search results,
wherein the multiple analyzers are in one-to-one correspondence with the multiple queues.
CN201710385603.1A 2017-05-26 2017-05-26 Target topic intelligent grabbing method, system and computer-readable recording medium Pending CN107133217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710385603.1A CN107133217A (en) 2017-05-26 2017-05-26 Target topic intelligent grabbing method, system and computer-readable recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710385603.1A CN107133217A (en) 2017-05-26 2017-05-26 Target topic intelligent grabbing method, system and computer-readable recording medium

Publications (1)

Publication Number Publication Date
CN107133217A true CN107133217A (en) 2017-09-05

Family

ID=59734057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710385603.1A Pending CN107133217A (en) 2017-05-26 2017-05-26 Target topic intelligent grabbing method, system and computer-readable recording medium

Country Status (1)

Country Link
CN (1) CN107133217A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184227B (en) * 2011-05-10 2013-05-08 北京邮电大学 General crawler engine system used for WEB service and working method thereof
US20130226897A1 (en) * 2004-08-30 2013-08-29 Anton P.T. Carver Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents
CN106649362A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Webpage crawling method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170905