CN108038221A - Information extraction method and device - Google Patents
Information extraction method and device
- Publication number
- CN108038221A (application number CN201711407157.6A)
- Authority
- CN
- China
- Prior art keywords
- information
- crawl
- public platform
- keyword
- target
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The present invention provides an information crawling method and device. The method includes: obtaining an information crawl request; determining, according to a crawl keyword carried in the information crawl request, at least one public account corresponding to the crawl keyword; and crawling, from each determined public account, the information corresponding to the information crawl request. This solution can crawl information that meets user demand.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to an information crawling method and device.
Background
With the continuous development of Internet technology, more and more enterprises and organizations have become accustomed to crawling the information they need from the Internet.
At present, data is mainly crawled from the web pages of portal websites. For example, if a web page of a website contains an entertainment news display module and a news display module, the corresponding information can be crawled from each of the two modules separately.
However, with the rapid development of the mobile Internet, more and more users are accustomed to publishing and viewing information through the mobile Internet. Correspondingly, less and less information is displayed on the web pages of portal websites, and it is difficult to crawl information that meets user demand from what those pages display. How to obtain information that meets user demand has therefore become an urgent problem.
Summary of the invention
Embodiments of the present invention provide an information crawling method and device that can crawl information meeting user demand.
In a first aspect, an embodiment of the present invention provides an information crawling method, including:
obtaining an information crawl request;
determining, according to a crawl keyword carried in the information crawl request, at least one public account corresponding to the crawl keyword;
crawling, from each determined public account, the information corresponding to the information crawl request.
Preferably,
The determining, according to the crawl keyword carried in the information crawl request, of at least one public account corresponding to the crawl keyword includes:
determining, according to the crawl keyword, at least one candidate public account and the identification information corresponding to each candidate public account;
for each candidate public account, performing:
parsing at least one feature word from the identification information;
determining, from the at least one feature word, the target words corresponding to the crawl keyword, and determining the number of target words;
when the number of target words is greater than a preset number threshold, taking the candidate public account as a public account corresponding to the crawl keyword.
Preferably,
The crawling, from each determined public account, of the information corresponding to the information crawl request includes:
determining at least one piece of historical information published by each public account;
determining, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword;
crawling the at least one piece of crawl information.
Preferably,
The crawling of the at least one piece of crawl information includes:
determining the publication time of each piece of crawl information;
determining target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication times;
crawling the determined target crawl information.
Preferably,
The crawling of the at least one piece of crawl information includes:
determining the publication time of each piece of crawl information published by each public account;
determining, according to the publication times, the publishing frequency of crawl information for each public account;
selecting, according to the publishing frequency, at least one target public account with a higher publishing frequency from the public accounts;
crawling the crawl information published by each target public account.
Preferably,
The crawling of the at least one piece of crawl information includes:
for each piece of crawl information, performing:
parsing at least one sample word from the crawl information;
parsing at least one exclusion keyword from the information crawl request;
determining whether any of the at least one sample word is identical to an exclusion keyword, and if not, crawling the crawl information.
In a second aspect, an embodiment of the present invention provides an information crawling device, including a request acquisition unit, a public account determination unit and an information crawling unit, wherein:
the request acquisition unit is configured to obtain an information crawl request;
the public account determination unit is configured to determine, according to the crawl keyword carried in the information crawl request, at least one public account corresponding to the crawl keyword;
the information crawling unit is configured to crawl, from each determined public account, the information corresponding to the information crawl request.
Preferably,
The public account determination unit is configured to determine, according to the crawl keyword, at least one candidate public account and the identification information corresponding to each candidate public account, and to perform, for each candidate public account: parsing at least one feature word from the identification information; determining, from the at least one feature word, the target words corresponding to the crawl keyword, and determining the number of target words; and, when the number of target words is greater than the preset number threshold, taking the candidate public account as a public account corresponding to the crawl keyword.
Preferably,
The information crawling unit is configured to determine at least one piece of historical information published by each public account, determine from it at least one piece of crawl information corresponding to the crawl keyword, and crawl the at least one piece of crawl information.
Preferably,
The information crawling unit is configured to determine the publication time of each piece of crawl information, determine target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication times, and crawl the determined target crawl information.
Preferably,
The information crawling unit is configured to determine the publication time of each piece of crawl information published by each public account; determine, according to the publication times, the publishing frequency of crawl information for each public account; select, according to the publishing frequency, at least one target public account with a higher publishing frequency from the public accounts; and crawl the crawl information published by each target public account.
Preferably,
The information crawling unit is configured to perform, for each piece of crawl information: parsing at least one sample word from the crawl information; parsing at least one exclusion keyword from the information crawl request; and determining whether any of the at least one sample word is identical to an exclusion keyword, and if not, crawling the crawl information.
In the information crawling method and device provided by embodiments of the present invention, the corresponding public accounts are determined according to the obtained crawl keyword, and the corresponding information is crawled from the determined public accounts. Because public accounts publish a large amount of information and are generally managed by professionals, their published content is carefully curated and highly professional, so information that meets user demand can be crawled from them.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an information crawling method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of an information crawling method provided by another embodiment of the present invention;
Fig. 3 is a structural diagram of an information crawling device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of an information crawling device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an information crawling method, which may include the following steps:
Step 101: Obtain an information crawl request;
Step 102: According to the crawl keyword carried in the information crawl request, determine at least one public account corresponding to the crawl keyword;
Step 103: From each determined public account, crawl the information corresponding to the information crawl request.
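The three steps of Fig. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: account discovery and post retrieval are passed in as callables (`find_accounts` and `fetch_posts` are hypothetical hooks), and "information corresponding to the request" is assumed to mean posts that contain the crawl keyword.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class CrawlRequest:
    """Hypothetical information crawl request; only the crawl keyword is modelled."""
    keyword: str


def crawl_information(
    request: CrawlRequest,
    find_accounts: Callable[[str], Iterable[str]],
    fetch_posts: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Step 101 is the `request` argument; steps 102 and 103 follow."""
    crawled: Dict[str, List[str]] = {}
    # Step 102: public accounts corresponding to the crawl keyword.
    for account in find_accounts(request.keyword):
        # Step 103 (assumed rule): keep only posts that mention the crawl keyword.
        crawled[account] = [
            post for post in fetch_posts(account) if request.keyword in post
        ]
    return crawled
```

Passing the data sources in as callables keeps the sketch independent of any particular platform API; in practice they would wrap whatever search and retrieval interface is available.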
In the above embodiment, the corresponding public accounts are determined according to the obtained crawl keyword, and the corresponding information is crawled from the determined public accounts. Because public accounts publish a large amount of information and are generally managed by professionals, their published content is carefully curated and highly professional, so information that meets user demand can be crawled from them.
In one embodiment of the invention, step 102 may be implemented as follows:
determining, according to the crawl keyword, at least one candidate public account and the identification information corresponding to each candidate public account;
for each candidate public account, performing:
parsing at least one feature word from the identification information;
determining, from the at least one feature word, the target words corresponding to the crawl keyword, and determining the number of target words;
when the number of target words is greater than the preset number threshold, taking the candidate public account as a public account corresponding to the crawl keyword.
For example, if the crawl keyword parsed from the information crawl request is "natural gas", a search for "natural gas" returns multiple pieces of information containing "natural gas". Each piece of information corresponds to a public account, and these public accounts are the candidate public accounts for the crawl keyword "natural gas". Then the identification information of each candidate public account, such as its name, function introduction and account owner, is determined, and feature words are parsed from it: the identification information is split into multiple words, meaningless function words, pronouns and the like are removed, and the remaining words are used as feature words. Next, the target words corresponding to the crawl keyword "natural gas" among these feature words, and their number, are determined. A target word may be "natural gas" itself, a near-synonym of "natural gas", or the pinyin or foreign-language equivalent of "natural gas". When the number of target words is greater than the preset number threshold, the candidate public account is highly relevant to the crawl keyword "natural gas" and is taken as a public account corresponding to "natural gas". In this way, a candidate public account is selected only when its identification information contains more target words than the preset threshold, which indicates high relevance to the crawl keyword; crawling information from such accounts makes the crawled information more consistent with the crawl keyword and better meets user demand.
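The selection rule in this example (split the identification information into words, drop function words, count the words that match the keyword or its variants, and compare the count with a threshold) can be sketched as below. The stop-word list, the regular-expression tokenizer and the `variants` set (standing in for near-synonyms, pinyin or foreign-language forms) are illustrative assumptions; Chinese text would need a proper word segmenter instead of the regex used here.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop-word list standing in for "meaningless function words and pronouns".
STOP_WORDS = {"the", "a", "an", "of", "and", "on", "we", "our", "it", "for"}


def feature_words(identification_info: str) -> List[str]:
    """Split identification information into lowercase words and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", identification_info.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def count_target_words(words: List[str], variants: Set[str]) -> int:
    """Count occurrences of each keyword variant, matched as consecutive words."""
    total = 0
    for variant in variants:
        target = variant.lower().split()
        n = len(target)
        if n == 0:
            continue  # ignore empty variants
        total += sum(
            1 for i in range(len(words) - n + 1) if words[i:i + n] == target
        )
    return total


def select_public_accounts(
    candidates: Dict[str, str], variants: Set[str], threshold: int
) -> List[str]:
    """Keep candidates whose target-word count is strictly greater than the threshold."""
    return [
        account
        for account, info in candidates.items()
        if count_target_words(feature_words(info), variants) > threshold
    ]
```

Because "natural gas" spans two English words, variants are matched as consecutive word sequences; overlapping variants such as "gas" and "natural gas" would be counted twice, which a real implementation might want to avoid.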
In one embodiment of the invention, step 103 may be implemented as follows:
determining at least one piece of historical information published by each public account;
determining, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword;
crawling the at least one piece of crawl information.
Here, each public account has published multiple pieces of historical information. The crawl information corresponding to the crawl keyword is determined from them, and only that crawl information is crawled, so that the crawled information is consistent with the crawl keyword and better meets user demand.
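Determining the crawl information from an account's historical information could be as simple as the keyword test sketched below. The `Post` type and the case-insensitive substring match are assumptions made for illustration; the patent does not specify how the correspondence with the crawl keyword is established.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Post:
    """Hypothetical piece of historical information published by a public account."""
    title: str
    body: str


def select_crawl_information(history: Iterable[Post], variants: Set[str]) -> List[Post]:
    """Return the historical posts whose title or body mentions any keyword variant."""
    lowered = [v.lower() for v in variants]
    return [
        post
        for post in history
        if any(v in post.title.lower() or v in post.body.lower() for v in lowered)
    ]
```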
In one embodiment of the invention, crawling the at least one piece of crawl information includes:
determining the publication time of each piece of crawl information;
determining target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication times;
crawling the determined target crawl information.
Here, the user may limit the crawl time when entering the information crawl request; for example, the crawl time in the crawl request is a period beginning on January 1, 2017. After the crawl information corresponding to the crawl keyword is determined, each piece is checked against the crawl time according to its publication time. For example, if a piece of crawl information was published on September 1, 2017, its publication time falls within the crawl time range, so it is crawled. The crawled information then corresponds not only to the crawl keyword but also to the crawl time, which further meets user demand.
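The time check can be sketched as follows, assuming the crawl time is an inclusive date range written in the "2017.1.1-2017.10.1" form used in step 201 of Fig. 2. The function names and the representation of crawl information as `(title, publication date)` tuples are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def parse_crawl_time(spec: str) -> Tuple[date, date]:
    """Parse a range written like "2017.1.1-2017.10.1" into (start, end) dates."""
    start_text, end_text = spec.split("-")

    def to_date(text: str) -> date:
        year, month, day = (int(part) for part in text.split("."))
        return date(year, month, day)

    return to_date(start_text), to_date(end_text)


def select_target_crawl_information(
    items: List[Tuple[str, date]], crawl_time: str
) -> List[str]:
    """Keep items whose publication date lies in the crawl time range (both ends inclusive)."""
    start, end = parse_crawl_time(crawl_time)
    return [title for title, published in items if start <= published <= end]
```

With the range "2017.1.1-2017.10.1", a piece published on September 1, 2017 is kept, matching the example above.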
In one embodiment of the invention, crawling the at least one piece of crawl information includes:
determining the publication time of each piece of crawl information published by each public account;
determining, according to the publication times, the publishing frequency of crawl information for each public account;
selecting, according to the publishing frequency, at least one target public account with a higher publishing frequency from the public accounts;
crawling the crawl information published by each target public account.
Here, the publishing frequency of crawl information for a public account can be determined from the publication times of the crawl information it has published. For example, if public account A published 10 pieces of crawl information in one year, its publishing frequency is 10 per year; similarly, if public account B published 100 pieces in one year, its publishing frequency is 100 per year. B's publishing frequency is clearly higher than A's, which indicates that B pays more attention to the field corresponding to the crawl keyword and is more professional. Using public accounts with a higher publishing frequency as the target public accounts for crawling therefore makes the crawled information more professional and better aligned with user demand.
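The frequency calculation and account selection can be sketched as below. It assumes frequency is measured as pieces per 365-day year over a half-open period `[start, end)` and that "higher frequency" means the `top_k` highest-ranked accounts; the patent leaves both the period and the selection rule open, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict, List


def publishing_frequency(
    publication_times: Dict[str, List[date]], start: date, end: date
) -> Dict[str, float]:
    """Pieces of crawl information per 365-day year for each account within [start, end)."""
    years = (end - start).days / 365.0
    return {
        account: sum(start <= t < end for t in times) / years
        for account, times in publication_times.items()
    }


def select_target_accounts(frequency: Dict[str, float], top_k: int) -> List[str]:
    """Pick the `top_k` accounts with the highest publishing frequency."""
    return sorted(frequency, key=frequency.get, reverse=True)[:top_k]
```

Counting 10 pieces for account A and 100 for account B over calendar year 2017 gives 10.0 and 100.0 per year, so selecting the single highest-ranked account picks B, as in the example. An empty period would divide by zero, so callers should pass `end > start`.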
In one embodiment of the invention, crawling the at least one piece of crawl information includes:
for each piece of crawl information, performing:
parsing at least one sample word from the crawl information;
parsing at least one exclusion keyword from the information crawl request;
determining whether any of the at least one sample word is identical to an exclusion keyword, and if not, crawling the crawl information.
To avoid interference from other information, the user may add exclusion keywords to the information crawl request. For example, if the user needs information related only to "natural gas" and not to "coal gas", the information crawl request uses "natural gas" as the crawl keyword and "coal gas" as an exclusion keyword. The information crawling device then splits the title, body and other content of each piece of crawl information into multiple sample words and determines whether any sample word is identical to the exclusion keyword "coal gas"; the piece is crawled only when none is. In this way, crawl information is screened by exclusion keywords and pieces containing an exclusion keyword are not crawled, so that the crawled information better matches the crawl keyword and user demand.
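The exclusion check can be sketched as below, assuming the title and body are split into lowercase words with a regular expression and that a multi-word exclusion keyword such as "coal gas" must appear as consecutive sample words. The helper names are hypothetical, and Chinese text would again require a word segmenter.

```python
import re
from typing import Iterable, List


def sample_words(title: str, body: str) -> List[str]:
    """Split the title and body of a piece of crawl information into lowercase words."""
    return re.findall(r"[a-z0-9]+", f"{title} {body}".lower())


def contains_phrase(words: List[str], phrase: str) -> bool:
    """True if the (possibly multi-word) phrase occurs as consecutive words."""
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return False  # an empty exclusion keyword excludes nothing
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def should_crawl(title: str, body: str, exclusion_keywords: Iterable[str]) -> bool:
    """Crawl only when no exclusion keyword appears among the sample words."""
    words = sample_words(title, body)
    return not any(contains_phrase(words, kw) for kw in exclusion_keywords)
```

Under this rule a post mentioning "coal" and "gas" separately is still crawled; only the exact phrase "coal gas" blocks it.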
The information crawling method provided by an embodiment of the present invention is described in detail below, taking the crawling of information corresponding to "natural gas" as an example. As shown in Fig. 2, the method may include the following steps:
Step 201: Obtain an information crawl request, which includes the crawl keyword "natural gas", the exclusion keyword "coal gas" and the crawl time "2017.1.1-2017.10.1".
Step 202: According to the crawl keyword "natural gas", determine at least one candidate public account and the identification information corresponding to each candidate public account.
Here, a search for "natural gas" returns multiple pieces of information containing "natural gas", each corresponding to a public account; these public accounts are the candidate public accounts for the crawl keyword "natural gas". Then the identification information of each candidate public account, such as its name, function introduction and account owner, is determined.
Step 203: For each candidate public account, parse at least one feature word from its identification information, determine the target words corresponding to "natural gas" among the feature words, and determine the number of target words.
Here, the identification information of the candidate public account, such as its name, function introduction and account owner, is split into multiple words; meaningless function words, pronouns and the like are removed, and the remaining words are used as feature words. Then the target words corresponding to the crawl keyword "natural gas" among these feature words, and their number, are determined. A target word may be "natural gas" itself, a near-synonym of "natural gas", or the pinyin or foreign-language equivalent of "natural gas".
Step 204: Determine whether the number of target words is greater than the preset number threshold; if so, perform step 205; otherwise, end the current process.
Step 205: Determine that the candidate public account is a public account corresponding to "natural gas", determine at least one piece of historical information published by this public account, and determine from it at least one piece of crawl information corresponding to "natural gas".
When the number of target words is greater than the preset number threshold, the candidate public account is highly relevant to the crawl keyword "natural gas" and is therefore taken as a public account corresponding to "natural gas".
Step 206: Determine the publication time of each piece of crawl information published by each public account, and determine, according to the publication times, the publishing frequency of crawl information for each public account.
For example, if public account A published 10 pieces of crawl information in one year, its publishing frequency is 10 per year; similarly, if public account B published 100 pieces in one year, its publishing frequency is 100 per year.
Step 207: According to the publishing frequency, select from the public accounts at least one target public account with a higher publishing frequency.
A higher publishing frequency indicates that the public account pays more attention to the field corresponding to the crawl keyword and is more professional. In this example, public account B is therefore taken as the target public account.
Step 208: Determine the publication time of each piece of crawl information published by the target public account, and determine, from these pieces, the target crawl information matching the crawl time "2017.1.1-2017.10.1".
A piece of crawl information is crawled only when its publication time falls within the crawl time range, so that the crawled information corresponds not only to the crawl keyword but also to the crawl time, further meeting user demand.
Step 209: For each piece of target crawl information, parse at least one sample word from it.
Step 210: Determine whether any of the at least one sample word is identical to the exclusion keyword "coal gas"; if so, end the current process; otherwise, perform step 211.
Step 211: Crawl the target crawl information.
A piece of crawl information is crawled only when "coal gas" does not appear among its sample words. In this way, crawl information is screened by the exclusion keyword and pieces containing it are not crawled, so that the crawled information better matches the crawl keyword and user demand.
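Steps 201 to 211 can be combined into one end-to-end sketch. Everything beyond the step order is an assumption made for illustration: the in-memory `Post` records, the regex tokenizer and stop-word list, the default `threshold`, `top_k` and `years` values, and the use of phrase matching for both the crawl keyword and the exclusion keyword.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Hypothetical stop words standing in for "meaningless function words and pronouns".
STOP_WORDS = {"the", "a", "an", "of", "and", "on", "we", "our", "for"}


@dataclass
class Post:
    account: str
    title: str
    body: str
    published: date


@dataclass
class CrawlRequest:
    keyword: str           # step 201: crawl keyword, e.g. "natural gas"
    exclusions: List[str]  # step 201: exclusion keywords, e.g. ["coal gas"]
    start: date            # step 201: crawl time range, assumed inclusive
    end: date


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens: List[str], phrase: str) -> int:
    """Number of times the phrase occurs as consecutive tokens."""
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return 0
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def crawl_fig2(
    req: CrawlRequest,
    identification: Dict[str, str],
    posts: List[Post],
    threshold: int = 1,
    top_k: int = 1,
    years: float = 1.0,
) -> List[Post]:
    def text(p: Post) -> List[str]:
        return words(f"{p.title} {p.body}")

    # Step 202: candidates are accounts that published posts containing the keyword.
    candidates = {p.account for p in posts if count_phrase(text(p), req.keyword)}
    # Steps 203-205: keep candidates with more target words than the threshold.
    accounts = set()
    for account in candidates:
        feats = [w for w in words(identification.get(account, "")) if w not in STOP_WORDS]
        if count_phrase(feats, req.keyword) > threshold:
            accounts.add(account)
    # Step 205: crawl information = the selected accounts' posts mentioning the keyword.
    matching = [p for p in posts if p.account in accounts and count_phrase(text(p), req.keyword)]
    # Steps 206-207: publishing frequency per account; keep the top_k accounts.
    freq = {a: sum(p.account == a for p in matching) / years for a in accounts}
    targets = set(sorted(freq, key=freq.get, reverse=True)[:top_k])
    # Step 208: crawl-time filter; steps 209-211: exclusion-keyword filter.
    return [
        p for p in matching
        if p.account in targets
        and req.start <= p.published <= req.end
        and not any(count_phrase(text(p), kw) for kw in req.exclusions)
    ]
```

Running it on a handful of sample posts keeps only the posts of the most prolific qualifying account that fall inside the crawl time range and do not mention "coal gas".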
As shown in Fig. 3, an embodiment of the present invention provides an information crawling device, including a request acquisition unit 301, a public account determination unit 302 and an information crawling unit 303, wherein:
the request acquisition unit 301 is configured to obtain an information crawl request;
the public account determination unit 302 is configured to determine, according to the crawl keyword carried in the information crawl request, at least one public account corresponding to the crawl keyword;
the information crawling unit 303 is configured to crawl, from each determined public account, the information corresponding to the information crawl request.
As shown in Fig. 4, in one embodiment of the invention, the public account determination unit 302 may include an identification information determination subunit 3021, a quantity determination subunit 3022 and a quantity comparison subunit 3023, wherein:
the identification information determination subunit 3021 is configured to determine, according to the crawl keyword, at least one candidate public account and the identification information corresponding to each candidate public account;
the quantity determination subunit 3022 is configured to perform, for each candidate public account: parsing at least one feature word from the identification information; determining, from the at least one feature word, the target words corresponding to the crawl keyword; and determining the number of target words;
the quantity comparison subunit 3023 is configured to take the candidate public account as a public account corresponding to the crawl keyword when the number of target words is greater than the preset number threshold.
In one embodiment of the invention, the information crawling unit 303 is configured to determine at least one piece of historical information published by each public account, determine from it at least one piece of crawl information corresponding to the crawl keyword, and crawl the at least one piece of crawl information.
In one embodiment of the invention, the information crawling unit 303 is configured to determine the publication time of each piece of crawl information, determine target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication times, and crawl the determined target crawl information.
In one embodiment of the invention, the information crawling unit 303 is configured to determine the publication time of each piece of crawl information published by each public account; determine, according to the publication times, the publishing frequency of crawl information for each public account; select, according to the publishing frequency, at least one target public account with a higher publishing frequency from the public accounts; and crawl the crawl information published by each target public account.
In one embodiment of the invention, the information crawling unit 303 is configured to perform, for each piece of crawl information: parsing at least one sample word from the crawl information; parsing at least one exclusion keyword from the information crawl request; and determining whether any of the at least one sample word is identical to an exclusion keyword, and if not, crawling the crawl information.
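The unit structure of Figs. 3 and 4 can be mirrored by a few small classes, as in the sketch below. The class and method names, the dictionary-based request, and the plain substring counting used for unit 302 are hypothetical simplifications chosen for brevity, not the claimed implementation.

```python
from typing import Callable, Dict, List


class RequestAcquisitionUnit:
    """Unit 301: obtains the information crawl request (here from a callable)."""

    def __init__(self, source: Callable[[], Dict[str, str]]):
        self.source = source

    def obtain(self) -> Dict[str, str]:
        return self.source()


class PublicAccountDeterminationUnit:
    """Unit 302, with subunits 3021-3023 folded into one method for brevity."""

    def __init__(self, identification: Dict[str, str], threshold: int):
        self.identification = identification  # 3021: account -> identification info
        self.threshold = threshold

    def determine(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        # 3022 counts keyword occurrences (plain substring count, a simplification);
        # 3023 compares the count with the preset number threshold.
        return [
            account
            for account, info in self.identification.items()
            if info.lower().count(kw) > self.threshold
        ]


class InformationCrawlingUnit:
    """Unit 303: crawls each account's historical posts that mention the keyword."""

    def __init__(self, history: Dict[str, List[str]]):
        self.history = history

    def crawl(self, accounts: List[str], keyword: str) -> List[str]:
        kw = keyword.lower()
        return [
            post
            for account in accounts
            for post in self.history.get(account, [])
            if kw in post.lower()
        ]


class InformationCrawlingDevice:
    """Wires units 301, 302 and 303 together in the order shown in Fig. 3."""

    def __init__(self, acquisition, determination, crawling):
        self.acquisition = acquisition
        self.determination = determination
        self.crawling = crawling

    def run(self) -> List[str]:
        request = self.acquisition.obtain()
        accounts = self.determination.determine(request["keyword"])
        return self.crawling.crawl(accounts, request["keyword"])
```

Keeping each unit behind its own small interface means that, for example, the threshold logic of subunits 3022 and 3023 can be replaced without touching units 301 and 303.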
Because the information exchange between the units of the above device, their execution processes and similar details are based on the same concept as the method embodiments of the present invention, see the description of the method embodiments for specifics; they are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium including execution instructions; when a processor of a storage controller executes the execution instructions, the storage controller performs the method provided by any of the above embodiments of the present invention.
An embodiment of the present invention further provides a storage controller, including a processor, a memory and a bus. The memory is configured to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs the method provided by any of the above embodiments of the present invention.
In summary, the embodiments of the present invention have at least the following beneficial effects:
1. In the embodiments of the present invention, the corresponding public accounts are determined according to the obtained crawl keyword, and the corresponding information is crawled from the determined public accounts. Because public accounts publish a large amount of information and are generally managed by professionals, their published content is carefully curated and highly professional, so information that meets user demand can be crawled from them.
2. In the embodiments of the present invention, at least one candidate public platform is determined according to the crawl keyword, and multiple feature characters are parsed from the identification information of each candidate public platform. When the number of target characters (feature characters that correspond to the crawl keyword) exceeds a preset threshold, the candidate public platform is taken as a public platform corresponding to the crawl keyword. A count above the threshold indicates that the public platform is highly relevant to the crawl keyword, so information crawled from it is more consistent with the keyword and better meets the user's needs.
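Benefit 2 relies on counting how many of a candidate's feature characters match the crawl keyword. The following Python sketch illustrates that rule under stated assumptions: the identification information of a candidate public platform (account) is treated as plain text, such as an account name plus its profile; every non-whitespace character counts as a feature character; and a target character is any feature character that also occurs in the crawl keyword. `CandidateAccount`, `select_accounts`, and the threshold value are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateAccount:
    # Hypothetical record: a candidate public platform and its identification text.
    name: str
    identification: str


def parse_feature_characters(identification: str) -> List[str]:
    # Assumption: every non-whitespace character is a feature character.
    return [ch for ch in identification if not ch.isspace()]


def count_target_characters(identification: str, keyword: str) -> int:
    # A target character is a feature character that also occurs in the keyword.
    keyword_chars = set(keyword)
    return sum(1 for ch in parse_feature_characters(identification) if ch in keyword_chars)


def select_accounts(candidates: List[CandidateAccount], keyword: str,
                    threshold: int) -> List[CandidateAccount]:
    # Keep a candidate only when its target-character count exceeds the preset threshold.
    return [c for c in candidates
            if count_target_characters(c.identification, keyword) > threshold]


candidates = [
    CandidateAccount("ml_daily", "机器学习每日精选"),
    CandidateAccount("food_tips", "美食推荐"),
]
selected = select_accounts(candidates, keyword="机器学习", threshold=2)
print([c.name for c in selected])  # ['ml_daily']
```

Character-level matching suits Chinese account names; for space-delimited languages a word-level comparison would probably be more meaningful.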
3. In the embodiments of the present invention, after the historical information published by each public platform is determined, the crawl information corresponding to the crawl keyword is identified among the historical information items and then crawled. This keeps the crawled information consistent with the crawl keyword, so it better meets the user's needs.
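The keyword screening in benefit 3 can be sketched as a filter over each account's historical posts. This minimal Python version assumes that a post "corresponds" to the crawl keyword when the keyword appears as a substring of its title or body; the `Post` type and `find_crawl_information` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Post:
    # Hypothetical representation of one historical message published by an account.
    account: str
    title: str
    body: str


def find_crawl_information(history: Dict[str, List[Post]], keyword: str) -> List[Post]:
    # Walk every account's history and keep posts whose title or body
    # contains the crawl keyword (substring matching is an assumption).
    return [
        post
        for posts in history.values()
        for post in posts
        if keyword in post.title or keyword in post.body
    ]


history = {
    "ml_daily": [
        Post("ml_daily", "Intro to deep learning", "Neural networks explained"),
        Post("ml_daily", "Team outing", "Photos from our trip"),
    ],
    "data_weekly": [
        Post("data_weekly", "Weekly digest", "This week: deep learning on tabular data"),
    ],
}
matches = find_crawl_information(history, "deep learning")
print([p.title for p in matches])  # ['Intro to deep learning', 'Weekly digest']
```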
4. In the embodiments of the present invention, target crawl information whose publishing time matches the crawl time is determined from the crawl information items, according to the crawl time carried in the crawl request and the publishing time of each item, and only the target crawl information is crawled. The crawled information therefore corresponds not only to the crawl keyword but also to the crawl time, which further meets the user's needs.
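Benefit 4 narrows the keyword matches by publishing time. The sketch below assumes that the crawl time carried in the request is a closed interval `[start, end]`; the patent does not specify the format, so a single date or a relative period (for example "the last 7 days") would be equally plausible. `TimedPost` and `select_by_crawl_time` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TimedPost:
    # Hypothetical crawl information item with its publishing time.
    title: str
    published: datetime


def select_by_crawl_time(posts: List[TimedPost], start: datetime,
                         end: datetime) -> List[TimedPost]:
    # Target crawl information: items published within [start, end], inclusive.
    return [p for p in posts if start <= p.published <= end]


posts = [
    TimedPost("Early post", datetime(2017, 12, 1)),
    TimedPost("Mid-month post", datetime(2017, 12, 15)),
    TimedPost("New-year post", datetime(2018, 1, 5)),
]
targets = select_by_crawl_time(posts, datetime(2017, 12, 10), datetime(2017, 12, 31))
print([p.title for p in targets])  # ['Mid-month post']
```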
5. In the embodiments of the present invention, the crawl-information publishing frequency of each public platform is determined from the publishing times of the crawl information items it has published. Target public platforms with a higher publishing frequency are selected, and the crawl information they publish is crawled. A higher publishing frequency indicates that the public platform pays more attention to the field of the crawl keyword and is more specialized in it. Using such public platforms as crawl targets therefore makes the crawled information more professional and better suited to the user's needs.
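The frequency-based selection in benefit 5 can be sketched as follows. The patent defines neither the frequency formula nor how many accounts count as "higher", so this Python example assumes frequency = number of keyword-matching posts per day inside a fixed look-back window, and keeps the `top_k` accounts with the highest frequency. All names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def publish_frequency(times: List[datetime], now: datetime, window_days: int) -> float:
    # Assumption: frequency = posts per day within the last `window_days` days.
    start = now - timedelta(days=window_days)
    recent = [t for t in times if start <= t <= now]
    return len(recent) / window_days


def select_target_accounts(publish_times: Dict[str, List[datetime]], now: datetime,
                           window_days: int, top_k: int) -> List[str]:
    freqs = {acct: publish_frequency(times, now, window_days)
             for acct, times in publish_times.items()}
    # Highest frequency first; ties broken by account name for a stable result.
    ranked = sorted(freqs, key=lambda acct: (-freqs[acct], acct))
    return ranked[:top_k]


now = datetime(2017, 12, 22)
publish_times = {
    "a": [now - timedelta(days=d) for d in (1, 2, 3, 4, 5, 6)],
    "b": [now - timedelta(days=d) for d in (1, 10)],
    "c": [now - timedelta(days=d) for d in (1, 2, 3, 40)],  # day 40 falls outside the window
}
print(select_target_accounts(publish_times, now, window_days=30, top_k=2))  # ['a', 'c']
```

A fixed window keeps old, inactive accounts from ranking highly on the strength of a long archive; a threshold on frequency instead of `top_k` would be an equally valid reading of the claim.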
6. In the embodiments of the present invention, multiple sample words are parsed from each crawl information item, and the item is crawled only when none of its sample words matches a mutually exclusive (exclusion) keyword carried in the information crawl request. Crawl information items can thus be screened against the exclusion keywords, and items containing an exclusion keyword are not crawled, so the crawled information matches the crawl keyword more closely and better meets the user's needs.
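The exclusion check in benefit 6 amounts to a set-membership test between an item's sample words and the request's mutually exclusive keywords. This sketch assumes case-insensitive, regex-based word tokenization, which suits space-delimited text; Chinese text would instead need a word segmenter (jieba is one commonly used option). `parse_sample_words` and `should_crawl` are hypothetical names.

```python
import re
from typing import Iterable, List


def parse_sample_words(text: str) -> List[str]:
    # Assumption: sample words are runs of word characters, lower-cased.
    return re.findall(r"\w+", text.lower())


def should_crawl(text: str, exclusion_keywords: Iterable[str]) -> bool:
    # Crawl only when no sample word is identical to an exclusion keyword.
    excluded = {k.lower() for k in exclusion_keywords}
    return not any(word in excluded for word in parse_sample_words(text))


print(should_crawl("New deep learning course released", ["advertisement"]))  # True
print(should_crawl("Advertisement: buy our course now", ["advertisement"]))  # False
```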
It should be noted that herein, such as first and second etc relational terms are used merely to an entity
Or operation is distinguished with another entity or operation, is existed without necessarily requiring or implying between these entities or operation
Any actual relationship or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non-
It is exclusive to include, so that process, method, article or equipment including a series of elements not only include those key elements,
But also including other elements that are not explicitly listed, or further include solid by this process, method, article or equipment
Some key elements.In the absence of more restrictions, the key element limited by sentence " including one ", is not arranged
Except in the process, method, article or apparatus that includes the element also in the presence of other identical factor.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by program instructions controlling the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and is intended merely to illustrate its technical solutions, not to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
- 1. An information crawling method, characterized by comprising: obtaining an information crawl request; determining, according to a crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword; and crawling, from each determined public platform, information corresponding to the information crawl request.
- 2. The method according to claim 1, characterized in that the determining, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword comprises: determining, according to the crawl keyword, at least one candidate public platform and identification information corresponding to each candidate public platform; and performing, for each candidate public platform: parsing at least one feature character from the identification information; determining, from the at least one feature character, target characters corresponding to the crawl keyword, and determining the number of target characters; and when the number of target characters is greater than a preset number threshold, taking the candidate public platform as a public platform corresponding to the crawl keyword.
- 3. The method according to claim 1, characterized in that the crawling, from each determined public platform, information corresponding to the information crawl request comprises: determining at least one item of historical information published by each public platform; determining, from the at least one item of historical information, at least one item of crawl information corresponding to the crawl keyword; and crawling the at least one item of crawl information.
- 4. The method according to claim 3, characterized in that the crawling the at least one item of crawl information comprises: determining a publishing time corresponding to each item of crawl information; determining target crawl information from the at least one item of crawl information according to a crawl time carried in the crawl request and the publishing times; and crawling the determined target crawl information.
- 5. The method according to claim 3, characterized in that the crawling the at least one item of crawl information comprises: determining the publishing time of each item of crawl information published by each public platform; determining, according to the publishing times, a crawl-information publishing frequency corresponding to each public platform; selecting, according to the publishing frequencies, at least one target public platform with a higher crawl-information publishing frequency from the public platforms; and crawling the crawl information published by each target public platform.
- 6. The method according to claim 3, characterized in that the crawling the at least one item of crawl information comprises performing, for each item of crawl information: parsing at least one sample word from the crawl information; parsing at least one mutually exclusive keyword from the information crawl request; determining whether the at least one sample word contains at least one target sample word identical to a mutually exclusive keyword; and if not, crawling the crawl information.
- 7. An information crawling device, characterized by comprising a request acquisition unit, a public platform determination unit, and an information crawl unit, wherein: the request acquisition unit is configured to obtain an information crawl request; the public platform determination unit is configured to determine, according to a crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword; and the information crawl unit is configured to crawl, from each determined public platform, information corresponding to the information crawl request.
- 8. The device according to claim 7, characterized in that the public platform determination unit is configured to: determine, according to the crawl keyword, at least one candidate public platform and identification information corresponding to each candidate public platform; and perform, for each candidate public platform: parse at least one feature character from the identification information; determine, from the at least one feature character, target characters corresponding to the crawl keyword, and determine the number of target characters; and when the number of target characters is greater than a preset number threshold, take the candidate public platform as a public platform corresponding to the crawl keyword.
- 9. The device according to claim 7, characterized in that the information crawl unit is configured to determine at least one item of historical information published by each public platform, determine, from the at least one item of historical information, at least one item of crawl information corresponding to the crawl keyword, and crawl the at least one item of crawl information.
- 10. The device according to claim 9, characterized in that: the information crawl unit is configured to determine a publishing time corresponding to each item of crawl information, determine target crawl information from the at least one item of crawl information according to a crawl time carried in the crawl request and the publishing times, and crawl the determined target crawl information; and/or the information crawl unit is configured to determine the publishing time of each item of crawl information published by each public platform, determine, according to the publishing times, a crawl-information publishing frequency corresponding to each public platform, select, according to the publishing frequencies, at least one target public platform with a higher crawl-information publishing frequency from the public platforms, and crawl the crawl information published by each target public platform; and/or the information crawl unit is configured to perform, for each item of crawl information: parse at least one sample word from the crawl information; parse at least one mutually exclusive keyword from the information crawl request; determine whether the at least one sample word contains at least one target sample word identical to a mutually exclusive keyword; and if not, crawl the crawl information.
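Claim 7 divides the device into three units. The following Python sketch shows one way those units could be composed; it is an illustrative assumption rather than the patented implementation. To keep it short, the public platform determination unit uses a plain substring test on the identification text instead of the character-count threshold of claim 8, and the information crawl unit reads from an in-memory `history` mapping instead of fetching from the network. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrawlRequest:
    # Hypothetical information crawl request carrying the crawl keyword.
    keyword: str


class RequestAcquisitionUnit:
    def acquire(self, raw: Dict[str, str]) -> CrawlRequest:
        # Obtain the information crawl request from a raw input mapping.
        return CrawlRequest(keyword=raw["keyword"])


class PublicPlatformDeterminationUnit:
    def __init__(self, directory: Dict[str, str]):
        # directory: account name -> identification text (hypothetical data source).
        self.directory = directory

    def determine(self, request: CrawlRequest) -> List[str]:
        # Simplified relevance test: the keyword appears in the identification text.
        return [name for name, ident in self.directory.items() if request.keyword in ident]


class InformationCrawlUnit:
    def __init__(self, history: Dict[str, List[str]]):
        # history: account name -> published post texts (stands in for real crawling).
        self.history = history

    def crawl(self, request: CrawlRequest, accounts: List[str]) -> List[str]:
        # Collect keyword-matching posts from the determined accounts only.
        return [post for acct in accounts
                for post in self.history.get(acct, [])
                if request.keyword in post]


class InformationCrawlDevice:
    def __init__(self, directory: Dict[str, str], history: Dict[str, List[str]]):
        self.acquisition = RequestAcquisitionUnit()
        self.determination = PublicPlatformDeterminationUnit(directory)
        self.crawler = InformationCrawlUnit(history)

    def handle(self, raw: Dict[str, str]) -> List[str]:
        request = self.acquisition.acquire(raw)
        accounts = self.determination.determine(request)
        return self.crawler.crawl(request, accounts)


device = InformationCrawlDevice(
    directory={"ml_daily": "daily machine learning news",
               "food_tips": "recipes and restaurants"},
    history={"ml_daily": ["machine learning tips", "office party"],
             "food_tips": ["machine learning for chefs"]},
)
print(device.handle({"keyword": "machine learning"}))  # ['machine learning tips']
```

Note that `food_tips` is skipped even though one of its posts mentions the keyword, because crawling is limited to the accounts chosen in the determination step.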
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711407157.6A CN108038221B (en) | 2017-12-22 | 2017-12-22 | Information capturing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108038221A | 2018-05-15 |
CN108038221B | 2021-10-15 |
Family ID: 62100785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711407157.6A Active CN108038221B (en) | 2017-12-22 | 2017-12-22 | Information capturing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038221B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657977A (en) * | 2021-10-21 | 2021-11-16 | 广州市格利网络技术有限公司 | Intelligent purchasing recommendation method and device based on industrial Internet |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101150802A (en) * | 2006-09-19 | 2008-03-26 | 北京三星通信技术研究有限公司 | Search method for mobile communication terminal and device using this method |
CN101236566A (en) * | 2008-03-06 | 2008-08-06 | 宇龙计算机通信科技(深圳)有限公司 | Designation inquiry method and system |
CN104182488A (en) * | 2014-08-08 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Search method, server and client |
CN104391846A (en) * | 2014-04-28 | 2015-03-04 | 腾讯科技(深圳)有限公司 | Method and system for searching social application public account numbers |
CN105205140A (en) * | 2015-09-17 | 2015-12-30 | 小米科技有限责任公司 | Message pushing method and device |
CN105320740A (en) * | 2015-09-22 | 2016-02-10 | 清华大学 | WeChat article and official account acquisition method and acquisition system |
CN106355507A (en) * | 2016-09-05 | 2017-01-25 | 北京蓝色光标品牌管理顾问股份有限公司 | Official account activity level ranking method and ranking system |
CN106789559A (en) * | 2016-12-02 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | Information processing method, device and system for wechat public platform |
- 2017-12-22: Application CN201711407157.6A filed in CN; granted as CN108038221B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN108038221B (en) | 2021-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104408093B (en) | A kind of media event key element abstracting method and device | |
WO2017036047A1 (en) | Information extraction method and information extraction device | |
US10366154B2 (en) | Information processing device, information processing method, and computer program product | |
CN110377908B (en) | Semantic understanding method, semantic understanding device, semantic understanding equipment and readable storage medium | |
CN110334241A (en) | Quality detecting method, device, equipment and the computer readable storage medium of customer service recording | |
CN104426944B (en) | Information feedback method, device and terminal | |
CN111899829A (en) | Full-text retrieval matching engine based on ICD9/10 participle lexicon | |
CN110288190A (en) | Event notification method, event notification server, storage medium and device | |
CN110427453A (en) | Similarity calculating method, device, computer equipment and the storage medium of data | |
CN110232126A (en) | Hot spot method for digging and server and computer readable storage medium | |
CN106294717A (en) | Based on intelligent terminal search topic method and device | |
CN109903122A (en) | House prosperity transaction information processing method, device, equipment and storage medium | |
CN110275938B (en) | Knowledge extraction method and system based on unstructured document | |
CN109657043B (en) | Method, device and equipment for automatically generating article and storage medium | |
CN105045882A (en) | Hot word processing method and device | |
CN108875050B (en) | Text-oriented digital evidence-obtaining analysis method and device and computer readable medium | |
CN108038221A (en) | A kind of information extraction method and device | |
CN109036506A (en) | Monitoring and managing method, electronic device and the readable storage medium storing program for executing of internet medical treatment interrogation | |
CN103309993B (en) | The extracting method of a kind of key word and device | |
CN108038220A (en) | A kind of keyword methods of exhibiting and device | |
US20190303364A1 (en) | Searching method and apparatus, device and non-volatile computer storage medium | |
CN112529627B (en) | Method and device for extracting implicit attribute of commodity, computer equipment and storage medium | |
CN114693435A (en) | Intelligent return visit method and device for collection list, electronic equipment and storage medium | |
CN107391741A (en) | Searching method, searcher and the terminal device of sound bite | |
CN109284364B (en) | Interactive vocabulary updating method and device for voice microphone-connecting interaction |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |