CN108038221A - Information crawling method and device - Google Patents

Information crawling method and device

Info

Publication number
CN108038221A
Authority
CN
China
Prior art keywords
information
crawl
public platform
keyword
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711407157.6A
Other languages
Chinese (zh)
Other versions
CN108038221B (en)
Inventor
温煦峰
翟素校
张静静
郝景坡
闵剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Austrian (china) Gas Investment Co Ltd
Original Assignee
New Austrian (china) Gas Investment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Austrian (china) Gas Investment Co Ltd
Priority to CN201711407157.6A
Publication of CN108038221A
Application granted
Publication of CN108038221B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The present invention provides an information crawling method and device. The method includes: obtaining an information crawl request; determining, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword; and capturing, from each determined public platform, the information corresponding to the information crawl request. This scheme can capture information that meets user needs.

Description

Information crawling method and device
Technical field
The present invention relates to the field of computer technology, and in particular to an information crawling method and device.
Background
With the continuous development of Internet technology, more and more enterprises and organizations have become accustomed to capturing the information they need from the Internet.
At present, data is mainly captured from the web pages of portal websites. For example, if a web page of a website includes an entertainment news display module and a news display module, the corresponding information can be captured from each of the two modules.
However, with the rapid development of the mobile Internet, more and more users are accustomed to publishing and viewing information through the mobile Internet. Correspondingly, the web pages of portal websites display less and less information, and it is difficult to capture information that meets user needs from what those pages show. How to obtain information that meets user needs has therefore become an urgent problem.
Summary of the invention
Embodiments of the present invention provide an information crawling method and device that can capture information meeting user needs.
In a first aspect, an embodiment of the present invention provides an information crawling method, including:
Obtaining an information crawl request;
Determining, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword;
Capturing, from each determined public platform, the information corresponding to the information crawl request.
Preferably,
Determining, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword includes:
Determining, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform;
For each candidate public platform, performing:
Parsing at least one feature character from the identification information;
Determining, from the at least one feature character, the target characters corresponding to the crawl keyword, and determining the quantity of the target characters;
When the quantity of the target characters is greater than a preset quantity threshold, taking the candidate public platform as a public platform corresponding to the crawl keyword.
Preferably,
Capturing, from each determined public platform, the information corresponding to the information crawl request includes:
Determining at least one piece of historical information published by each public platform;
Determining, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword;
Capturing the at least one piece of crawl information.
Preferably,
Capturing the at least one piece of crawl information includes:
Determining the publication time corresponding to each piece of crawl information;
Determining target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time;
Capturing the determined target crawl information.
Preferably,
Capturing the at least one piece of crawl information includes:
Determining the publication time corresponding to each piece of crawl information published by each public platform;
Determining, according to the publication time, the crawl-information publication frequency corresponding to each public platform;
Selecting, according to the crawl-information publication frequency, at least one target public platform with a larger crawl-information publication frequency from the public platforms;
Capturing the crawl information published by each target public platform.
Preferably,
Capturing the at least one piece of crawl information includes:
For each piece of crawl information, performing:
Parsing at least one sample word from the crawl information;
Parsing at least one mutual-exclusion keyword from the information crawl request;
Determining whether the at least one sample word contains at least one target sample word identical to the mutual-exclusion keyword, and if not, capturing the crawl information.
In a second aspect, an embodiment of the present invention provides an information crawling device, including: a request acquisition unit, a public platform determination unit and an information capture unit; wherein,
The request acquisition unit is configured to obtain an information crawl request;
The public platform determination unit is configured to determine, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword;
The information capture unit is configured to capture, from each determined public platform, the information corresponding to the information crawl request.
Preferably,
The public platform determination unit is configured to determine, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform; and, for each candidate public platform, to perform: parsing at least one feature character from the identification information; determining, from the at least one feature character, the target characters corresponding to the crawl keyword, and determining the quantity of the target characters; and, when the quantity of the target characters is greater than a preset quantity threshold, taking the candidate public platform as a public platform corresponding to the crawl keyword.
Preferably,
The information capture unit is configured to determine at least one piece of historical information published by each public platform, determine from the at least one piece of historical information at least one piece of crawl information corresponding to the crawl keyword, and capture the at least one piece of crawl information.
Preferably,
The information capture unit is configured to determine the publication time corresponding to each piece of crawl information, determine target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time, and capture the determined target crawl information.
Preferably,
The information capture unit is configured to determine the publication time corresponding to each piece of crawl information published by each public platform; determine, according to the publication time, the crawl-information publication frequency corresponding to each public platform; select, according to the crawl-information publication frequency, at least one target public platform with a larger crawl-information publication frequency from the public platforms; and capture the crawl information published by each target public platform.
Preferably,
The information capture unit is configured to perform, for each piece of crawl information: parsing at least one sample word from the crawl information; parsing at least one mutual-exclusion keyword from the information crawl request; and determining whether the at least one sample word contains at least one target sample word identical to the mutual-exclusion keyword, and if not, capturing the crawl information.
Embodiments of the present invention provide an information crawling method and device in which the corresponding public platforms are determined according to the obtained crawl keyword, and the corresponding information is captured from the determined public platforms. Because public platforms publish a large amount of information and are generally managed by professionals, the content they publish has been comprehensively curated and is more professional, so information that meets user needs can be captured from public platforms.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an information crawling method provided by one embodiment of the present invention;
Fig. 2 is a flow chart of an information crawling method provided by another embodiment of the present invention;
Fig. 3 is a structure diagram of an information crawling device provided by one embodiment of the present invention;
Fig. 4 is a structure diagram of an information crawling device provided by one embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an information crawling method, which may include the following steps:
Step 101: Obtain an information crawl request;
Step 102: According to the crawl keyword carried in the information crawl request, determine at least one public platform corresponding to the crawl keyword;
Step 103: From each determined public platform, capture the information corresponding to the information crawl request.
In the above embodiment, the corresponding public platforms are determined according to the obtained crawl keyword, and the corresponding information is captured from the determined public platforms. Because public platforms publish a large amount of information and are generally managed by professionals, the content they publish has been comprehensively curated and is more professional, so information that meets user needs can be captured from public platforms.
In one embodiment of the present invention, step 102 may be implemented as follows:
Determine, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform;
For each candidate public platform, perform:
Parse at least one feature character from the identification information;
Determine, from the at least one feature character, the target characters corresponding to the crawl keyword, and determine the quantity of the target characters;
When the quantity of the target characters is greater than a preset quantity threshold, take the candidate public platform as a public platform corresponding to the crawl keyword.
For example, suppose the crawl keyword parsed from the information crawl request is "natural gas". A search for "natural gas" returns multiple pieces of information containing "natural gas", each of which belongs to a public platform; these public platforms are the candidate public platforms corresponding to the crawl keyword "natural gas". Next, identification information such as the name, function introduction and account owner of each candidate public platform is determined, and feature characters are parsed from it. During parsing, the identification information can be split into multiple words, and words without substantive meaning, such as auxiliary words and pronouns, are removed; the remaining words are the feature words, that is, the feature characters. The target characters corresponding to the crawl keyword "natural gas" are then identified among these feature characters and counted. A target character can be "natural gas" itself, a near-synonym of "natural gas", or the pinyin or foreign-language word corresponding to "natural gas". When the quantity of target characters is greater than the preset quantity threshold, the candidate public platform is highly relevant to the crawl keyword "natural gas" and is taken as a public platform corresponding to "natural gas". Capturing information from such a public platform helps make the captured information more consistent with the crawl keyword and therefore better suited to the user's needs.
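The matching step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` fields, the `STOP_WORDS` set, the whitespace tokenizer, and the variant spellings `natural-gas` and `tianranqi` are all assumptions made for the example, and a Chinese-language system would need a real word segmenter.

```python
from dataclasses import dataclass

# Hypothetical stand-in for "auxiliary words and pronouns" with no real meaning.
STOP_WORDS = {"the", "a", "of", "and", "on", "for", "we", "our"}


@dataclass
class Candidate:
    name: str   # platform name
    intro: str  # function introduction
    owner: str  # account owner


def feature_words(candidate):
    """Split the identification information into words and drop stop words."""
    text = " ".join([candidate.name, candidate.intro, candidate.owner]).lower()
    return [word for word in text.split() if word not in STOP_WORDS]


def count_targets(candidate, variants):
    """Count feature words that match the keyword or one of its variants."""
    return sum(1 for word in feature_words(candidate) if word in variants)


def select_platforms(candidates, variants, threshold):
    """Keep candidates whose target-word count exceeds the preset threshold."""
    return [c for c in candidates if count_targets(c, variants) > threshold]
```

The strict `>` comparison follows the text ("greater than a preset quantity threshold"); the threshold value itself is left to the caller because the source does not specify one.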
In one embodiment of the present invention, step 103 may be implemented as follows:
Determine at least one piece of historical information published by each public platform;
Determine, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword;
Capture the at least one piece of crawl information.
Here, each public platform has published multiple pieces of historical information. The crawl information corresponding to the crawl keyword is determined from that historical information, and only the determined crawl information is captured, so that the captured information is consistent with the crawl keyword and better meets the user's needs.
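A minimal sketch of this filtering step is shown below. The source does not say how "corresponding to the crawl keyword" is decided, so the sketch assumes a simple case-insensitive substring match; the dictionary keys `title` and `body` are likewise hypothetical.

```python
def select_crawl_information(history, keyword):
    """Return the history items whose title or body mentions the keyword.

    Assumption: an item "corresponds" to the keyword if the keyword occurs
    as a case-insensitive substring of its title or body.
    """
    keyword = keyword.lower()
    return [
        item
        for item in history
        if keyword in item["title"].lower() or keyword in item["body"].lower()
    ]
```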
In one embodiment of the present invention, capturing the at least one piece of crawl information includes:
Determining the publication time corresponding to each piece of crawl information;
Determining target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time;
Capturing the determined target crawl information.
Here, the user can limit the crawl time when entering the information crawl request. For example, suppose the crawl time in the crawl request is a range beginning on January 1, 2017 (the detailed example below uses 2017.1.1-2017.10.1). After the crawl information corresponding to the crawl keyword has been determined, the publication time of each piece of crawl information is checked against the crawl time. If a piece of crawl information was published on September 1, 2017, its publication time falls within the crawl time range, and only then is it captured. The captured information thus corresponds not only to the crawl keyword but also to the crawl time, and so further matches the user's needs.
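The time check can be sketched as follows, assuming the crawl time is a closed date range and each item carries a `published` date. Both are assumptions of this example; the source only says that the publication time must be "consistent with" the crawl time.

```python
from datetime import date


def within_crawl_time(published, start, end):
    """True if the publication date falls inside the closed range [start, end]."""
    return start <= published <= end


def select_target_information(crawl_items, start, end):
    """Keep only the crawl items published inside the requested crawl time."""
    return [
        item for item in crawl_items
        if within_crawl_time(item["published"], start, end)
    ]
```

With the range 2017.1.1-2017.10.1 used in the detailed example, an item published on September 1, 2017 is kept, while one published outside that range is skipped.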
In one embodiment of the present invention, capturing the at least one piece of crawl information includes:
Determining the publication time corresponding to each piece of crawl information published by each public platform;
Determining, according to the publication time, the crawl-information publication frequency corresponding to each public platform;
Selecting, according to the crawl-information publication frequency, at least one target public platform with a larger crawl-information publication frequency from the public platforms;
Capturing the crawl information published by each target public platform.
Here, the crawl-information publication frequency of a public platform can be determined from the publication times of the crawl information it has published. For example, if public platform A published 10 pieces of crawl information in one year, its crawl-information publication frequency is 10 items per year; similarly, if public platform B published 100 pieces of crawl information in one year, its crawl-information publication frequency is 100 items per year. The frequency of public platform B is clearly larger than that of public platform A, which indicates that public platform B pays more attention to the field corresponding to the crawl keyword and is more professional. Taking the public platforms with a larger crawl-information publication frequency as the target public platforms therefore helps make the captured information more professional and better suited to user needs.
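The frequency comparison can be sketched as below. The source does not define what counts as a "larger" frequency, so this example assumes the top `top_n` platforms are chosen; the `platform` key and the `years` observation window are also assumptions.

```python
from collections import Counter


def publication_frequency(crawl_items, years):
    """Items per year for each platform over an observation window of `years`."""
    counts = Counter(item["platform"] for item in crawl_items)
    return {platform: n / years for platform, n in counts.items()}


def pick_target_platforms(frequency, top_n=1):
    """Return the `top_n` platforms with the highest publication frequency."""
    ranked = sorted(frequency, key=frequency.get, reverse=True)
    return ranked[:top_n]
```

A fixed frequency cutoff would be an equally plausible reading of "larger"; the top-N rule is used here only because it needs no extra parameter beyond the count.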
In one embodiment of the present invention, capturing the at least one piece of crawl information includes:
For each piece of crawl information, performing:
Parsing at least one sample word from the crawl information;
Parsing at least one mutual-exclusion keyword from the information crawl request;
Determining whether the at least one sample word contains at least one target sample word identical to the mutual-exclusion keyword, and if not, capturing the crawl information.
To avoid interference from other information, the user can add a mutual-exclusion keyword to the information crawl request. For example, if the user needs information that relates only to "natural gas" and not to "coal gas", the request carries "natural gas" as the crawl keyword and "coal gas" as the mutual-exclusion keyword. The information crawling device can then split the title, body text and other content of each piece of crawl information into multiple sample words and check whether any sample word is identical to the mutual-exclusion keyword "coal gas". A piece of crawl information is captured only when "coal gas" does not appear among its sample words. Each piece of crawl information is thus screened against the mutual-exclusion keyword, and crawl information containing it is not captured, so the captured information matches the crawl keyword more closely and better meets user needs.
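A minimal sketch of the mutual-exclusion screening follows. The whitespace tokenizer and the hyphenated tokens `natural-gas` and `coal-gas` are assumptions that keep each keyword a single word in English; the source splits Chinese text, where each term is already one word.

```python
def sample_words(item):
    """Split the title and body of a crawl item into lowercase sample words."""
    return set((item["title"] + " " + item["body"]).lower().split())


def filter_excluded(crawl_items, excluded_keywords):
    """Drop every item whose sample words include a mutual-exclusion keyword."""
    excluded = {keyword.lower() for keyword in excluded_keywords}
    return [item for item in crawl_items if not (sample_words(item) & excluded)]
```

Note that punctuation attached to a word (for example a trailing period) would prevent a match with this simple tokenizer; a production tokenizer would need to strip it.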
The information crawling method provided by the embodiments of the present invention is described in detail below, taking the capture of information corresponding to "natural gas" as an example. As shown in Fig. 2, the method may include the following steps:
Step 201: Obtain an information crawl request, which includes the crawl keyword "natural gas", the mutual-exclusion keyword "coal gas" and the crawl time "2017.1.1-2017.10.1".
Step 202: According to the crawl keyword "natural gas", determine at least one candidate public platform and the identification information corresponding to each candidate public platform.
Here, a search for "natural gas" returns multiple pieces of information containing "natural gas", each of which belongs to a public platform; these public platforms are the candidate public platforms corresponding to the crawl keyword "natural gas". Identification information such as the name, function introduction and account owner of each candidate public platform is then determined.
Step 203: For each candidate public platform, parse at least one feature character from the identification information, determine from the at least one feature character the target characters corresponding to "natural gas", and determine the quantity of the target characters.
Here, the identification information of the candidate public platform, such as its name, function introduction and account owner, is split into multiple words, and words without substantive meaning, such as auxiliary words and pronouns, are removed; the remaining words are the feature words, that is, the feature characters. The target characters corresponding to the crawl keyword "natural gas" are then identified among these feature characters and counted. A target character can be "natural gas" itself, a near-synonym of "natural gas", or the pinyin or foreign-language word corresponding to "natural gas".
Step 204: Judge whether the quantity of the target characters is greater than the preset quantity threshold; if so, perform step 205, otherwise end the current process.
Step 205: Determine that the candidate public platform is a public platform corresponding to "natural gas", determine at least one piece of historical information published by the public platform, and determine from the at least one piece of historical information at least one piece of crawl information corresponding to "natural gas".
When the quantity of target characters is greater than the preset quantity threshold, the candidate public platform is highly relevant to the crawl keyword "natural gas" and is taken as a public platform corresponding to "natural gas".
Step 206: Determine the publication time corresponding to each piece of crawl information published by each public platform, and determine, according to the publication time, the crawl-information publication frequency corresponding to each public platform.
For example, if public platform A published 10 pieces of crawl information in one year, its crawl-information publication frequency is 10 items per year; similarly, if public platform B published 100 pieces of crawl information in one year, its crawl-information publication frequency is 100 items per year.
Step 207: According to the crawl-information publication frequency, select at least one target public platform with a larger crawl-information publication frequency from the public platforms.
A larger crawl-information publication frequency indicates that the public platform pays more attention to the field corresponding to the crawl keyword and is more professional. Here, public platform B is taken as the target public platform.
Step 208: Determine the publication time corresponding to each piece of crawl information published by the target public platform, and determine, from the pieces of crawl information, the target crawl information corresponding to the crawl time "2017.1.1-2017.10.1".
A piece of crawl information is captured only when its publication time falls within the crawl time range. The captured information thus corresponds not only to the crawl keyword but also to the crawl time, and so further matches the user's needs.
Step 209: For each piece of target crawl information, parse at least one sample word from the target crawl information.
Step 210: Judge whether the at least one sample word contains at least one target sample word identical to the mutual-exclusion keyword "coal gas"; if so, end the current process, otherwise perform step 211.
Step 211: Capture the target crawl information.
A piece of crawl information is captured only when "coal gas" does not appear among its sample words. Each piece of crawl information is thus screened against the mutual-exclusion keyword, and crawl information containing it is not captured, so the captured information matches the crawl keyword more closely and better meets user needs.
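Steps 201 to 211 can be strung together as in the sketch below. It is an illustration under stated assumptions rather than the method as implemented by the applicant: the `Item`, `Platform` and `CrawlRequest` types, the whitespace tokenizer, exact-word matching, the top-N rule for "larger" frequency, and the hyphenated keywords are all hypothetical, and the candidate search of step 202 is assumed to have already produced `candidates`.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    text: str
    published: date


@dataclass
class Platform:
    name: str
    identification: str  # name, introduction and owner joined into one string
    history: list = field(default_factory=list)


@dataclass
class CrawlRequest:  # step 201: keyword, mutual-exclusion keyword, crawl time
    keyword: str
    excluded: str
    start: date
    end: date


def words(text):
    return text.lower().split()


def crawl(request, candidates, threshold=0, top_n=1):
    # Steps 203-205: keep candidates whose identification information
    # contains the keyword more often than the preset threshold.
    matched = [p for p in candidates
               if words(p.identification).count(request.keyword) > threshold]
    # Step 205: crawl information is the history that mentions the keyword.
    crawl_info = {p.name: [i for i in p.history if request.keyword in words(i.text)]
                  for p in matched}
    # Steps 206-207: rank platforms by item count (same time window assumed).
    targets = sorted(crawl_info, key=lambda n: len(crawl_info[n]), reverse=True)[:top_n]
    results = []
    for name in targets:
        for item in crawl_info[name]:
            if not (request.start <= item.published <= request.end):
                continue  # step 208: outside the crawl time
            if request.excluded in words(item.text):
                continue  # steps 209-210: contains the mutual-exclusion keyword
            results.append(item)  # step 211: capture
    return results
```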
As shown in Fig. 3, an embodiment of the present invention provides an information crawling device, including: a request acquisition unit 301, a public platform determination unit 302 and an information capture unit 303; wherein,
The request acquisition unit 301 is configured to obtain an information crawl request;
The public platform determination unit 302 is configured to determine, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword;
The information capture unit 303 is configured to capture, from each determined public platform, the information corresponding to the information crawl request.
As shown in Fig. 4, in one embodiment of the present invention, the public platform determination unit 302 may include an identification information determination subunit 3021, a quantity determination subunit 3022 and a quantity comparison subunit 3023; wherein,
The identification information determination subunit 3021 is configured to determine, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform;
The quantity determination subunit 3022 is configured to perform, for each candidate public platform: parsing at least one feature character from the identification information; determining, from the at least one feature character, the target characters corresponding to the crawl keyword; and determining the quantity of the target characters;
The quantity comparison subunit 3023 is configured to take the candidate public platform as a public platform corresponding to the crawl keyword when the quantity of the target characters is greater than the preset quantity threshold.
In one embodiment of the present invention, the information capture unit 303 is configured to determine at least one piece of historical information published by each public platform, determine from the at least one piece of historical information at least one piece of crawl information corresponding to the crawl keyword, and capture the at least one piece of crawl information.
In one embodiment of the present invention, the information capture unit 303 is configured to determine the publication time corresponding to each piece of crawl information, determine target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time, and capture the determined target crawl information.
In one embodiment of the present invention, the information capture unit 303 is configured to determine the publication time corresponding to each piece of crawl information published by each public platform; determine, according to the publication time, the crawl-information publication frequency corresponding to each public platform; select, according to the crawl-information publication frequency, at least one target public platform with a larger crawl-information publication frequency from the public platforms; and capture the crawl information published by each target public platform.
In one embodiment of the present invention, the information capture unit 303 is configured to perform, for each piece of crawl information: parsing at least one sample word from the crawl information; parsing at least one mutual-exclusion keyword from the information crawl request; and determining whether the at least one sample word contains at least one target sample word identical to the mutual-exclusion keyword, and if not, capturing the crawl information.
The information exchange between the units of the above device and their execution processes are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
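The division of the device into three units can be pictured as the composition below. It is only a structural sketch: the class name `CrawlDevice` and the choice to model each unit as a plain callable are assumptions of this example, not something the source prescribes.

```python
class CrawlDevice:
    """Chains the three units of Fig. 3 in the order 301 -> 302 -> 303."""

    def __init__(self, acquire_request, determine_platforms, capture_information):
        self.acquire_request = acquire_request          # unit 301
        self.determine_platforms = determine_platforms  # unit 302
        self.capture_information = capture_information  # unit 303

    def run(self, raw_request):
        request = self.acquire_request(raw_request)
        platforms = self.determine_platforms(request)
        return self.capture_information(request, platforms)
```

Keeping the units as injected callables lets each refinement described above (time filtering, frequency ranking, mutual-exclusion screening) be swapped into the capture unit without changing the other two.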
An embodiment of the present invention further provides a computer-readable medium containing executable instructions; when a processor of a storage controller executes the executable instructions, the storage controller performs the method provided by any of the above embodiments of the present invention.
An embodiment of the present invention further provides a storage controller, including a processor, a memory and a bus. The memory is used to store executable instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the executable instructions stored in the memory, so that the storage controller performs the method provided by any of the above embodiments of the present invention.
In conclusion more than the present invention each embodiment at least has the advantages that:
1. In the embodiments of the present invention, the corresponding public platforms are determined according to the obtained crawl keyword, and the corresponding information is crawled from the determined public platforms. Because public platforms publish a large volume of information and are generally managed by professionals, the content they publish has been comprehensively organized and is more professional, so information that meets the user's needs can be crawled from the public platforms.
2. In the embodiments of the present invention, at least one candidate public platform is determined according to the crawl keyword, and multiple characteristic characters are parsed from the identification information of each candidate public platform. When the number of parsed characteristic characters that are target characters corresponding to the crawl keyword exceeds a preset threshold, the candidate public platform is taken as a public platform corresponding to the crawl keyword. A count above the preset quantity threshold indicates that the public platform is highly relevant to the crawl keyword, so crawling information from that public platform makes the crawled information more consistent with the crawl keyword and better able to meet the user's needs.
3. In the embodiments of the present invention, after the historical information published by each public platform is determined, the crawl information corresponding to the crawl keyword is determined from the historical information and then crawled. The crawled information is therefore consistent with the crawl keyword and better meets the user's needs.
4. In the embodiments of the present invention, target crawl information that matches the crawl time is determined from the crawl information according to the crawl time carried in the crawl request and the publication time of each piece of crawl information, and the target crawl information is then crawled. The crawled information therefore corresponds not only to the crawl keyword but also to the crawl time, and so further meets the user's needs.
5. In the embodiments of the present invention, the crawl-information publication frequency of each public platform is determined according to the publication time of each piece of crawl information it has published. Target public platforms with a relatively high crawl-information publication frequency are selected from the public platforms, and the crawl information published by the selected target public platforms is crawled. A higher crawl-information publication frequency indicates that the public platform pays more attention to the field corresponding to the crawl keyword and is more professional in it. Taking the public platforms with a relatively high publication frequency as the crawl targets therefore helps make the crawled information more professional and better suited to the user's needs.
6. In the embodiments of the present invention, multiple sample words are parsed from each piece of crawl information, and a piece of crawl information is crawled only when none of its sample words is a mutual exclusion keyword carried in the information crawl request. The crawl information can thus be screened by the mutual exclusion keywords: crawl information that contains a mutual exclusion keyword is not crawled, so the crawled information matches the crawl keyword more closely and better meets the user's needs.
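The platform matching of effect 2 and the crawl-time filtering of effect 4 can be sketched together as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, characteristic characters are taken to be the distinct non-whitespace characters of a platform's identification information (for example its name), a target character is taken to be one that also occurs in the crawl keyword, and the crawl time is assumed to be a closed time range.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def count_target_characters(identification: str, keyword: str) -> int:
    """Count distinct characters of the identification info that occur in the keyword."""
    feature_chars = {c for c in identification.lower() if not c.isspace()}
    keyword_chars = {c for c in keyword.lower() if not c.isspace()}
    return len(feature_chars & keyword_chars)


def match_platforms(candidates: Dict[str, str], keyword: str, threshold: int) -> List[str]:
    """Return candidate platforms whose target-character count exceeds the threshold.

    `candidates` maps a platform id to its identification information.
    """
    return [
        platform_id
        for platform_id, identification in candidates.items()
        if count_target_characters(identification, keyword) > threshold
    ]


def filter_by_crawl_time(
    items: Iterable[Tuple[str, datetime]], start: datetime, end: datetime
) -> List[Tuple[str, datetime]]:
    """Keep (title, published_at) items published within the requested crawl time range."""
    return [(title, published_at) for title, published_at in items
            if start <= published_at <= end]
```

Character-level matching suits Chinese platform names, where each character carries meaning; for alphabetic languages, word-level matching would be a more sensible choice.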
It should be noted that herein, such as first and second etc relational terms are used merely to an entity Or operation is distinguished with another entity or operation, is existed without necessarily requiring or implying between these entities or operation Any actual relationship or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non- It is exclusive to include, so that process, method, article or equipment including a series of elements not only include those key elements, But also including other elements that are not explicitly listed, or further include solid by this process, method, article or equipment Some key elements.In the absence of more restrictions, the key element limited by sentence " including one ", is not arranged Except in the process, method, article or apparatus that includes the element also in the presence of other identical factor.
One of ordinary skill in the art will appreciate that:Realizing all or part of step of above method embodiment can pass through The relevant hardware of programmed instruction is completed, and foregoing program can be stored in computer-readable storage medium, the program Upon execution, the step of execution includes above method embodiment;And foregoing storage medium includes:ROM, RAM, magnetic disc or light Disk etc. is various can be with the medium of store program codes.
It is last it should be noted that:The foregoing is merely presently preferred embodiments of the present invention, is merely to illustrate skill of the invention Art scheme, is not intended to limit the scope of the present invention.Any modification for being made within the spirit and principles of the invention, Equivalent substitution, improvement etc., are all contained in protection scope of the present invention.

Claims (10)

  1. An information extraction method, characterized by comprising:
    obtaining an information crawl request;
    determining, according to a crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword; and
    crawling, from each determined public platform, information corresponding to the information crawl request.
  2. The method according to claim 1, characterized in that
    determining, according to the crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword comprises:
    determining, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform; and
    performing, for each candidate public platform:
    parsing at least one characteristic character from the identification information;
    determining, from the at least one characteristic character, the target characters corresponding to the crawl keyword, and determining the number of target characters; and
    when the number of target characters is greater than a preset quantity threshold, taking the candidate public platform as a public platform corresponding to the crawl keyword.
  3. The method according to claim 1, characterized in that
    crawling, from each determined public platform, information corresponding to the information crawl request comprises:
    determining at least one piece of historical information published by each public platform;
    determining, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword; and
    crawling the at least one piece of crawl information.
  4. The method according to claim 3, characterized in that
    crawling the at least one piece of crawl information comprises:
    determining the publication time corresponding to each piece of crawl information;
    determining target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time; and
    crawling the determined target crawl information.
  5. The method according to claim 3, characterized in that
    crawling the at least one piece of crawl information comprises:
    determining the publication time corresponding to each piece of crawl information published by each public platform;
    determining, according to the publication times, the crawl-information publication frequency corresponding to each public platform;
    selecting, from the public platforms and according to the crawl-information publication frequency, at least one target public platform with a relatively high crawl-information publication frequency; and
    crawling the crawl information published by each target public platform.
  6. The method according to claim 3, characterized in that
    crawling the at least one piece of crawl information comprises:
    performing, for each piece of crawl information:
    parsing at least one sample word from the crawl information;
    parsing at least one mutual exclusion keyword from the information crawl request; and
    determining whether any of the at least one sample word is identical to a mutual exclusion keyword and, if not, crawling the crawl information.
  7. An information extraction device, characterized by comprising a request acquisition unit, a public platform determination unit, and an information crawling unit, wherein
    the request acquisition unit is configured to obtain an information crawl request;
    the public platform determination unit is configured to determine, according to a crawl keyword carried in the information crawl request, at least one public platform corresponding to the crawl keyword; and
    the information crawling unit is configured to crawl, from each determined public platform, information corresponding to the information crawl request.
  8. The device according to claim 7, characterized in that
    the public platform determination unit is configured to: determine, according to the crawl keyword, at least one candidate public platform and the identification information corresponding to each candidate public platform; and perform, for each candidate public platform: parsing at least one characteristic character from the identification information; determining, from the at least one characteristic character, the target characters corresponding to the crawl keyword, and determining the number of target characters; and, when the number of target characters is greater than a preset quantity threshold, taking the candidate public platform as a public platform corresponding to the crawl keyword.
  9. The device according to claim 7, characterized in that
    the information crawling unit is configured to: determine at least one piece of historical information published by each public platform; determine, from the at least one piece of historical information, at least one piece of crawl information corresponding to the crawl keyword; and crawl the at least one piece of crawl information.
  10. The device according to claim 9, characterized in that
    the information crawling unit is configured to: determine the publication time corresponding to each piece of crawl information; determine target crawl information from the at least one piece of crawl information according to the crawl time carried in the crawl request and the publication time; and crawl the determined target crawl information;
    and/or
    the information crawling unit is configured to: determine the publication time corresponding to each piece of crawl information published by each public platform; determine, according to the publication times, the crawl-information publication frequency corresponding to each public platform; select, from the public platforms and according to the crawl-information publication frequency, at least one target public platform with a relatively high crawl-information publication frequency; and crawl the crawl information published by each target public platform;
    and/or
    the information crawling unit is configured to perform, for each piece of crawl information: parsing at least one sample word from the crawl information; parsing at least one mutual exclusion keyword from the information crawl request; and determining whether any of the at least one sample word is identical to a mutual exclusion keyword and, if not, crawling the crawl information.
CN201711407157.6A 2017-12-22 2017-12-22 Information capturing method and device Active CN108038221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711407157.6A CN108038221B (en) 2017-12-22 2017-12-22 Information capturing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711407157.6A CN108038221B (en) 2017-12-22 2017-12-22 Information capturing method and device

Publications (2)

Publication Number Publication Date
CN108038221A true CN108038221A (en) 2018-05-15
CN108038221B CN108038221B (en) 2021-10-15

Family

ID=62100785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711407157.6A Active CN108038221B (en) 2017-12-22 2017-12-22 Information capturing method and device

Country Status (1)

Country Link
CN (1) CN108038221B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657977A (en) * 2021-10-21 2021-11-16 广州市格利网络技术有限公司 Intelligent purchasing recommendation method and device based on industrial Internet

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150802A (en) * 2006-09-19 2008-03-26 北京三星通信技术研究有限公司 Search method for mobile communication terminal and device using this method
CN101236566A (en) * 2008-03-06 2008-08-06 宇龙计算机通信科技(深圳)有限公司 Designation inquiry method and system
CN104182488A (en) * 2014-08-08 2014-12-03 腾讯科技(深圳)有限公司 Search method, server and client
CN104391846A (en) * 2014-04-28 2015-03-04 腾讯科技(深圳)有限公司 Method and system for searching social application public account numbers
CN105205140A (en) * 2015-09-17 2015-12-30 小米科技有限责任公司 Message pushing method and device
CN105320740A (en) * 2015-09-22 2016-02-10 清华大学 WeChat article and official account acquisition method and acquisition system
CN106355507A (en) * 2016-09-05 2017-01-25 北京蓝色光标品牌管理顾问股份有限公司 Official account activity level ranking method and ranking system
CN106789559A (en) * 2016-12-02 2017-05-31 上海智臻智能网络科技股份有限公司 Information processing method, device and system for wechat public platform

Also Published As

Publication number Publication date
CN108038221B (en) 2021-10-15

Similar Documents

Publication Publication Date Title
CN104408093B (en) A kind of media event key element abstracting method and device
WO2017036047A1 (en) Information extraction method and information extraction device
US10366154B2 (en) Information processing device, information processing method, and computer program product
CN110377908B (en) Semantic understanding method, semantic understanding device, semantic understanding equipment and readable storage medium
CN110334241A (en) Quality detecting method, device, equipment and the computer readable storage medium of customer service recording
CN104426944B (en) Information feedback method, device and terminal
CN111899829A (en) Full-text retrieval matching engine based on ICD9/10 participle lexicon
CN110288190A (en) Event notification method, event notification server, storage medium and device
CN110427453A (en) Similarity calculating method, device, computer equipment and the storage medium of data
CN110232126A (en) Hot spot method for digging and server and computer readable storage medium
CN106294717A (en) Based on intelligent terminal search topic method and device
CN109903122A (en) House prosperity transaction information processing method, device, equipment and storage medium
CN110275938B (en) Knowledge extraction method and system based on unstructured document
CN109657043B (en) Method, device and equipment for automatically generating article and storage medium
CN105045882A (en) Hot word processing method and device
CN108875050B (en) Text-oriented digital evidence-obtaining analysis method and device and computer readable medium
CN108038221A (en) A kind of information extraction method and device
CN109036506A (en) Monitoring and managing method, electronic device and the readable storage medium storing program for executing of internet medical treatment interrogation
CN103309993B (en) The extracting method of a kind of key word and device
CN108038220A (en) A kind of keyword methods of exhibiting and device
US20190303364A1 (en) Searching method and apparatus, device and non-volatile computer storage medium
CN112529627B (en) Method and device for extracting implicit attribute of commodity, computer equipment and storage medium
CN114693435A (en) Intelligent return visit method and device for collection list, electronic equipment and storage medium
CN107391741A (en) Searching method, searcher and the terminal device of sound bite
CN109284364B (en) Interactive vocabulary updating method and device for voice microphone-connecting interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant