CN107133217A - Target topic intelligent grabbing method, system and computer-readable recording medium - Google Patents
- Publication number
- CN107133217A (application number CN201710385603.1A)
- Authority
- CN
- China
- Prior art keywords
- target topic
- search result
- search
- control centre
- queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
Disclosed are a target topic intelligent grabbing method, a system, and a computer-readable recording medium. The method may include: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a scheduling algorithm; multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the analyzers correspond one-to-one with the queues. Through distributed search, the invention achieves efficient and stable intelligent grabbing of target topics.
Description
Technical field
The present invention relates to the field of computers, and more particularly to a target topic intelligent grabbing method, a system, and a computer-readable recording medium.
Background art
In the field of computers, a crawler is a program that downloads web pages automatically. It selectively visits web pages and related links on the World Wide Web according to a predefined crawl target in order to obtain the required information. Such a crawler does not pursue broad coverage; instead, it aims to crawl web pages related to the content of a particular topic, preparing data resources for topic-oriented user queries. At present, some websites may implement logic that controls access, that is, an anti-crawling strategy. It is therefore necessary to develop a target topic intelligent grabbing method, system, and computer-readable recording medium.
The information disclosed in this background section is intended only to improve understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to a person skilled in the art.
Summary of the invention
The present invention proposes a target topic intelligent grabbing method, system, and computer-readable recording medium that achieve efficient and stable intelligent grabbing of target topics through distributed search.
According to one aspect of the invention, a target topic intelligent grabbing method is proposed. The method may include: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a scheduling algorithm; multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the analyzers correspond one-to-one with the queues.
Preferably, the control centre analyzing the search results and saving the final search results includes: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and if it is valid, saving the final search result.
Preferably, the analyzers correspond one-to-one with the internet sites.
Preferably, the target topics to be searched and matched are uploaded to the database by a user through a client.
Preferably, the method further includes: the control centre returns the final search results to the client.
According to another aspect of the invention, a target topic intelligent grabbing system is proposed. The system may include: a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a scheduling algorithm, analyzes search results, and saves the final search results; a database, which stores the target topics to be searched and matched; multiple queues, which receive the target topics distributed by the control centre and pass them to the corresponding analyzers; and multiple analyzers, which simultaneously obtain the target topics awaiting analysis from their corresponding queues, perform keyword extraction, simultaneously apply the keywords to the search interfaces of their corresponding internet sites, and return the search results to the control centre, wherein the analyzers correspond one-to-one with the queues.
Preferably, the control centre analyzing the search results and saving the final search results includes: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and if it is valid, saving the final search result.
Preferably, the analyzers correspond one-to-one with the internet sites.
Preferably, the target topics to be searched and matched are uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
According to a third aspect of the invention, a computer-readable recording medium is proposed, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a scheduling algorithm; multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the analyzers correspond one-to-one with the queues.
The method and apparatus of the invention have other features and advantages that will be apparent from, or are set forth in detail in, the accompanying drawings incorporated herein and the detailed description that follows, which together serve to explain certain principles of the invention.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become more apparent from the following more detailed description of exemplary embodiments of the invention in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 shows a flowchart of the steps of the target topic intelligent grabbing method according to the invention.
Detailed description of the embodiments
The invention is described more fully below with reference to the accompanying drawings. Although preferred embodiments of the invention are shown in the drawings, it should be understood that the invention may be realized in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
Embodiment 1
Fig. 1 shows a flowchart of the steps of the target topic intelligent grabbing method according to the invention.
In this embodiment, the target topic intelligent grabbing method according to the invention may include: step 101, a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a scheduling algorithm; step 102, multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; step 103, the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre; and step 104, the control centre analyzes the search results and saves the final search results, wherein the analyzers correspond one-to-one with the queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
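As a rough illustration of steps 101 to 104, the following Python sketch wires a control centre, one queue per analyzer, and analyzer threads together. It is not the patented implementation: the round-robin distribution, the whitespace-based `extract_keywords`, the stub returned by `make_stub_search`, and the site names are all assumptions made for the example, and no real website is contacted.

```python
import queue
import threading


def extract_keywords(topic):
    """Stand-in keyword extraction (assumption): lowercase words longer than two characters."""
    return [w for w in topic.lower().split() if len(w) > 2]


def make_stub_search(site):
    """Return a fake search function for `site`; a real analyzer would call the site's search interface."""
    def search(keywords):
        return {"site": site, "keywords": keywords,
                "items": [f"{site}:{k}" for k in keywords]}
    return search


def analyzer(task_queue, search, results):
    """Steps 102-103: read topics from this analyzer's own queue, extract keywords, search, report back."""
    while True:
        topic = task_queue.get()
        if topic is None:  # sentinel meaning "no more work"
            break
        results.put(search(extract_keywords(topic)))


def run_pipeline(topics, sites):
    """Control centre: distribute topics, collect results, keep the non-empty ones."""
    queues = {site: queue.Queue() for site in sites}  # one queue per analyzer
    results = queue.Queue()
    workers = [
        threading.Thread(target=analyzer,
                         args=(queues[s], make_stub_search(s), results))
        for s in sites
    ]
    for w in workers:
        w.start()
    # Step 101: round-robin is only a placeholder for the scheduling algorithm.
    for i, topic in enumerate(topics):
        queues[sites[i % len(sites)]].put(topic)
    for s in sites:
        queues[s].put(None)
    for w in workers:
        w.join()
    # Step 104: treat a result with at least one item as valid and keep it.
    final = []
    while not results.empty():
        result = results.get()
        if result["items"]:
            final.append(result)
    return final
```

Because each analyzer owns exactly one queue, the one-to-one correspondence described above falls out of the data structure: adding a site means adding one queue and one worker.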
The specific steps of the target topic intelligent grabbing method according to the invention are described in detail below.
In one example, the control centre reads the target topics to be searched and matched from the database and may distribute them to multiple queues by means of a scheduling algorithm.
In one example, the target topics to be searched and matched may be uploaded to the database by a user through a client.
Specifically, a user uploads the target topics to be searched and matched to the database through a client; the control centre reads them from the database and distributes them to the queues by means of a scheduling algorithm. The scheduling algorithm may distribute dynamically according to parameters such as the system load, the number of tasks waiting in each queue, and the processing time of each website. A person skilled in the art can select the scheduling algorithm required according to the specific circumstances.
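The text leaves the choice of scheduling algorithm open. One possible reading, sketched below in Python, picks the queue with the smallest estimated wait (backlog multiplied by the recent processing time of the paired website) and defers dispatch when the system load is too high. The `QueueStats` fields, the `max_load` threshold, and the wait formula are assumptions introduced for illustration, not details from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueStats:
    """Snapshot of one queue (all fields are illustrative assumptions)."""
    name: str
    pending_tasks: int        # tasks currently waiting in this queue
    avg_site_seconds: float   # recent average processing time of the paired website


def estimated_wait(stats: QueueStats) -> float:
    """Estimate how long a newly added task would take to complete on this queue."""
    return (stats.pending_tasks + 1) * stats.avg_site_seconds


def pick_queue(queues: List[QueueStats], system_load: float,
               max_load: float = 0.9) -> Optional[QueueStats]:
    """Return the queue with the smallest estimated wait, or None to defer dispatch.

    `system_load` is assumed to be a fraction between 0 and 1.
    """
    if not queues:
        raise ValueError("at least one queue is required")
    if system_load > max_load:
        return None  # back off until the load drops
    return min(queues, key=estimated_wait)
```

A weighted round-robin or a priority scheme would fit the description equally well; the point is only that the decision depends on load, queue backlog, and per-site processing time.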
In one example, multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction, so that keywords can be obtained.
In one example, the analyzers correspond one-to-one with the queues.
In one example, the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites, and the search results obtained can be returned to the control centre.
In one example, the analyzers correspond one-to-one with the internet sites.
Specifically, the analyzers correspond one-to-one with the queues, and a group of analyzers corresponds one-to-one with the queues and the internet sites. Some websites have logic that controls access, that is, an anti-crawling strategy; by configuring a dedicated analyzer for each website, adaptation and optimization are carried out individually, website by website. A person skilled in the art can configure the corresponding analyzer according to the specific conditions of the website.
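The source does not say what a per-site analyzer configuration contains. As a hedged sketch, the Python below assumes two settings that commonly differ between websites: the format of the search URL and a minimum interval between requests that keeps the analyzer within a site's access limits. `SiteConfig`, `SiteAnalyzer`, and the `shop.example` URL are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass
class SiteConfig:
    """Per-site settings (assumed; the source does not enumerate them)."""
    name: str
    search_url_template: str     # must contain the placeholder "{query}"
    min_interval_seconds: float  # minimum spacing between requests to this site


class SiteAnalyzer:
    """One analyzer adapted to one website."""

    def __init__(self, config, clock=time.monotonic):
        self.config = config
        self._clock = clock          # injectable so the pacing logic can be tested
        self._last_request = None

    def build_search_url(self, keywords):
        """Encode the keywords and place them in this site's search URL."""
        query = quote_plus(" ".join(keywords))
        return self.config.search_url_template.format(query=query)

    def seconds_until_allowed(self):
        """How long to wait before the next request respects the site's interval."""
        if self._last_request is None:
            return 0.0
        elapsed = self._clock() - self._last_request
        return max(0.0, self.config.min_interval_seconds - elapsed)

    def mark_request(self):
        """Record that a request was just sent."""
        self._last_request = self._clock()
```

Keeping these settings in data rather than in code means that adapting to a new website is a matter of adding one `SiteConfig`, which matches the website-by-website adaptation described above.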
The analyzers may simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers then simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre.
In one example, the control centre analyzes the search results and may save the final search results.
In one example, the control centre analyzing the search results and saving the final search results may include: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and if it is valid, saving the final search result.
In one example, the method may further include: the control centre returns the final search results to the client.
Specifically, the control centre analyzes whether a search result is valid, that is, whether the merchandise information has been parsed normally. If the result is invalid, distribution to the queues continues and keyword extraction and search are repeated; if a duplicate keyword is encountered, the time at which that keyword was most recently executed is checked, and the keyword is ignored if it has been processed within the last 7 days. If the result is valid, the final search result is saved. The control centre returns the final search results to the client, and the user can view them through the client.
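The validity check and the 7-day rule for duplicate keywords can be sketched in Python as follows. The 7-day window comes from the text; everything else is an assumption, in particular the `REQUIRED_FIELDS` used to decide that merchandise information was parsed normally and the in-memory `KeywordHistory` (a real system would presumably persist this in the database).

```python
from datetime import datetime, timedelta
from typing import Dict, List

DEDUP_WINDOW = timedelta(days=7)      # window stated in the description
REQUIRED_FIELDS = ("title", "price")  # assumed fields; the source does not list them


def is_valid(result: dict) -> bool:
    """A result counts as valid if at least one item has every required field filled in."""
    items = result.get("items") or []
    return any(all(item.get(f) for f in REQUIRED_FIELDS) for item in items)


class KeywordHistory:
    """Remembers when each keyword was last executed (in memory, for illustration)."""

    def __init__(self) -> None:
        self._last_run: Dict[str, datetime] = {}

    def filter_fresh(self, keywords: List[str], now: datetime) -> List[str]:
        """Drop keywords processed within the last 7 days; record the rest as executed now."""
        fresh = []
        for kw in keywords:
            last = self._last_run.get(kw)
            if last is not None and now - last < DEDUP_WINDOW:
                continue  # duplicate keyword handled recently: ignore it
            self._last_run[kw] = now
            fresh.append(kw)
        return fresh
```

In this reading, the control centre would call `is_valid` on each returned result and, for invalid ones, pass the re-extracted keywords through `filter_fresh` before queueing them again, so that repeated failures on the same keyword do not cause endless retries.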
Application example
To facilitate understanding of the scheme of the embodiments of the invention and its effects, a specific application example is given below. Those skilled in the art will understand that the example is provided only to aid understanding of the invention, and that none of its specific details is intended to limit the invention in any way.
A user uploads the target topics to be searched and matched to the database through a client. The control centre reads them from the database and, by means of a scheduling algorithm, distributes them to the queues dynamically according to parameters such as the system load, the number of tasks waiting in each queue, and the processing time of each website. The analyzers correspond one-to-one with the queues, and a group of analyzers corresponds one-to-one with the queues and the internet sites; by configuring a dedicated analyzer for each website, adaptation and optimization are carried out individually, website by website.
The analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers then simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre. The control centre analyzes whether each search result is valid, that is, whether the merchandise information has been parsed normally. When a search result is judged valid, meaning that it has been parsed normally, the control centre saves the final search result and returns it to the client, where the user views it.
In summary, the invention achieves efficient and stable intelligent grabbing of target topics through distributed search.
Those skilled in the art will understand that the above description of embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
Embodiment 2
According to an embodiment of the invention, a target topic intelligent grabbing system is provided. The system may include: a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a scheduling algorithm, analyzes search results, and saves the final search results; a database, which stores the target topics to be searched and matched; multiple queues, which receive the target topics distributed by the control centre and pass them to the corresponding analyzers; and multiple analyzers, which simultaneously obtain the target topics awaiting analysis from their corresponding queues, perform keyword extraction, simultaneously apply the keywords to the search interfaces of their corresponding internet sites, and return the search results to the control centre, wherein the analyzers correspond one-to-one with the queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
In one example, the control centre analyzing the search results and saving the final search results may include: judging whether a search result is valid; if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and if it is valid, saving the final search result.
In one example, the analyzers correspond one-to-one with the internet sites.
In one example, the target topics to be searched and matched may be uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
Those skilled in the art will understand that the above description of embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
Embodiment 3
According to an embodiment of the invention, a computer-readable recording medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: a control centre reads target topics to be searched and matched from a database and distributes them to multiple queues by means of a scheduling algorithm; multiple analyzers simultaneously obtain the target topics awaiting analysis from their corresponding queues and perform keyword extraction to obtain keywords; the analyzers simultaneously apply the keywords to the search interfaces of their corresponding internet sites and return the search results obtained to the control centre; and the control centre analyzes the search results and saves the final search results, wherein the analyzers correspond one-to-one with the queues.
This embodiment achieves efficient and stable intelligent grabbing of target topics through distributed search.
Those skilled in the art will understand that the above description of embodiments of the invention is intended only to illustrate by example the beneficial effects of those embodiments, and is not intended to limit the embodiments of the invention to any of the examples given.
The embodiments of the invention have been described above. The description is exemplary rather than exhaustive, and the invention is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described.
Claims (10)
1. A target topic intelligent grabbing method, comprising:
a control centre reading target topics to be searched and matched from a database and distributing them to multiple queues by means of a scheduling algorithm;
multiple analyzers simultaneously obtaining the target topics awaiting analysis from the corresponding queues and performing keyword extraction to obtain keywords;
the multiple analyzers simultaneously applying the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and returning the search results obtained to the control centre; and
the control centre analyzing the search results and saving final search results,
wherein the multiple analyzers correspond one-to-one with the multiple queues.
2. The target topic intelligent grabbing method according to claim 1, wherein the control centre analyzing the search results and saving the final search results comprises:
judging whether a search result is valid and, if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and
if it is valid, saving the final search result.
3. The target topic intelligent grabbing method according to claim 1, wherein the multiple analyzers correspond one-to-one with the multiple internet sites.
4. The target topic intelligent grabbing method according to claim 1, wherein the target topics to be searched and matched are uploaded to the database by a user through a client.
5. The target topic intelligent grabbing method according to claim 4, further comprising: the control centre returning the final search results to the client.
6. A target topic intelligent grabbing system, comprising:
a control centre, which reads target topics to be searched and matched from a database, distributes them to multiple queues by means of a scheduling algorithm, analyzes search results, and saves final search results;
a database, which stores the target topics to be searched and matched;
multiple queues, which receive the target topics to be searched and matched distributed by the control centre and pass them to corresponding multiple analyzers; and
multiple analyzers, which simultaneously obtain the target topics awaiting analysis from the corresponding queues, perform keyword extraction, simultaneously apply the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and return the search results to the control centre,
wherein the multiple analyzers correspond one-to-one with the multiple queues.
7. The target topic intelligent grabbing system according to claim 6, wherein the control centre analyzing the search results and saving the final search results comprises:
judging whether a search result is valid and, if it is invalid, continuing the distribution to the queues and repeating keyword extraction and search; and
if it is valid, saving the final search result.
8. The target topic intelligent grabbing system according to claim 6, wherein the multiple analyzers correspond one-to-one with the multiple internet sites.
9. The target topic intelligent grabbing system according to claim 6, wherein the target topics to be searched and matched are uploaded to the database by a user through a client, and the control centre returns the final search results to the client.
10. A computer-readable recording medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
a control centre reading target topics to be searched and matched from a database and distributing them to multiple queues by means of a scheduling algorithm;
multiple analyzers simultaneously obtaining the target topics awaiting analysis from the corresponding queues and performing keyword extraction to obtain keywords;
the multiple analyzers simultaneously applying the keywords to the search interfaces of corresponding multiple internet sites to perform searches, and returning the search results obtained to the control centre; and
the control centre analyzing the search results and saving final search results,
wherein the multiple analyzers correspond one-to-one with the multiple queues.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385603.1A CN107133217A (en) | 2017-05-26 | 2017-05-26 | Target topic intelligent grabbing method, system and computer-readable recording medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385603.1A CN107133217A (en) | 2017-05-26 | 2017-05-26 | Target topic intelligent grabbing method, system and computer-readable recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107133217A true CN107133217A (en) | 2017-09-05 |
Family
ID=59734057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710385603.1A Pending CN107133217A (en) | 2017-05-26 | 2017-05-26 | Target topic intelligent grabbing method, system and computer-readable recording medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107133217A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184227B (en) * | 2011-05-10 | 2013-05-08 | 北京邮电大学 | General crawler engine system used for WEB service and working method thereof |
US20130226897A1 (en) * | 2004-08-30 | 2013-08-29 | Anton P.T. Carver | Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents |
CN106649362A (en) * | 2015-10-30 | 2017-05-10 | 北京国双科技有限公司 | Webpage crawling method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170905 |