CN103186633B - A structured information extraction method, search method, and device - Google Patents

A structured information extraction method, search method, and device

Info

Publication number
CN103186633B
Authority
CN
China
Prior art keywords
property value
word
pattern
seed
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110459457.5A
Other languages
Chinese (zh)
Other versions
CN103186633A (en)
Inventor
李永强
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110459457.5A
Publication of CN103186633A
Application granted
Publication of CN103186633B
Active
Anticipated expiration

Abstract

The invention provides a structured information extraction method, a search method, and corresponding devices. The structured information extraction method comprises: S1, obtaining from a corpus a sentence set containing a preset attribute word; S2, obtaining from the sentence set the sentences that contain a preset attribute value and using them as seeds to form a seed set; S3, extracting a pattern from each seed to form the pattern set corresponding to the preset attribute word; S4, using each pattern to extract attribute values from the sentence set obtained in step S1, yielding the attribute value set corresponding to the preset attribute word; S5, matching the attribute value set against the sentence set to extract new seeds and returning to step S3 until the algorithm converges, yielding the structured information. Compared with the prior art, the invention builds templates and mines attribute values automatically to extract structured information, which saves human resources and improves the efficiency and recall of the search engine.

Description

A structured information extraction method, search method, and device
[Technical Field]
The present invention relates to the field of information retrieval and natural language processing, and in particular to a structured information extraction method, a search method, and corresponding devices.
[Background]
With the development of communication technology and networks, the ever-growing mass of information resources across every field poses a huge challenge to organizing and retrieving information. Search engines are an important way for people to obtain information: given an input search term (query), the search engine extracts the required information from web pages and returns search results. As the scale of data keeps growing, factors such as unstructured information, the wide variety of information, and the broad coverage of document content lower the search engine's efficiency and recall, and it becomes hard to find the desired answer directly in a search engine.
Structured data search, also called vertical search, is a new search mode proposed in response to the excessive volume, imprecise queries, and insufficient depth of general search. To support it, web page information must undergo structured extraction: unstructured data in web pages is extracted into specific structured data, which is stored in a database and indexed for search. Web page information consists of sentences, and structured data is extracted from those sentences. A single sentence often expresses that an entity is associated with an attribute value through some action. For example: "In 1978, Liu Dehua was admitted to the wireless performer training class." Here the entity word is "Liu Dehua", the attribute word is "was admitted to", and "wireless performer training class" is the attribute value. Entity words in web pages are usually fairly fixed, whereas an attribute word can take many forms; for example, "was admitted to" can also be expressed with synonyms such as "got into" or "went to school at". Structured extraction analyzes a sentence to extract the attribute value that corresponds to the entity word and the attribute word in that sentence.
In existing methods for extracting structured data, entity words and attribute words are recognized by matching against a preset entity word dictionary and a preset attribute word dictionary: the matching entity word or attribute word is located in the web page, and the corresponding attribute value is then extracted. If the attribute word dictionary lacks a particular form of expression, the corresponding attribute value cannot be recognized, which lowers recall. Other synonymous attribute words can be added to the attribute word dictionary manually or with the help of a synonym dictionary, but manual addition consumes human resources, is inefficient, and still gives low recall; using a synonym dictionary likewise suffers from low recall.
[Summary of the Invention]
The present invention provides a structured information extraction method, a search method, and corresponding devices that can automatically build templates and mine attribute values to extract structured information, saving human resources and improving the efficiency and recall of the search engine.
The specific technical solution is as follows:
A structured information extraction method, comprising:
S1, obtaining from a corpus a sentence set containing a preset attribute word;
S2, obtaining from the sentence set the sentences containing a preset attribute value as seeds, forming a seed set;
S3, extracting a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, wherein the pattern comprises: the character strings adjacent to the attribute value, and the feature values in the seed that meet a preset feature requirement;
S4, using each pattern in the pattern set to extract attribute values from the sentence set obtained in step S1, obtaining the attribute value set corresponding to the preset attribute word.
According to a preferred embodiment of the present invention, the method further comprises, after step S4:
S5, matching the attribute value set against the sentence set to extract new seeds, and returning to step S3 until the algorithm converges, obtaining structured information that comprises: the attribute word and the attribute values.
According to a preferred embodiment of the present invention, in step S3, the character strings adjacent to the attribute value include: words, phrases, or symbols;
the feature values in the seed that meet the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
According to a preferred embodiment of the present invention, step S3 further comprises filtering the pattern set corresponding to the preset attribute word, using at least one of the following methods: filtering by the occurrence frequency of each pattern, or filtering by the semantic relevance of each pattern.
According to a preferred embodiment of the present invention, step S4 further comprises filtering the attribute value set corresponding to the preset attribute word, using one or a combination of the following methods:
A, filtering by term frequency: attribute values whose term frequency is below the preset requirement are filtered out;
B, filtering by part-of-speech features: attribute values whose part of speech does not match the features of an attribute value are filtered out;
C, filtering by the number of characters: attribute values whose character count exceeds the preset requirement are filtered out;
D, filtering by the number of stop words found when the attribute value is segmented: attribute values containing more stop words than the preset requirement are filtered out.
According to a preferred embodiment of the present invention, before filtering by term frequency, it is first determined whether the attribute values corresponding to the preset attribute word are repeatable. The rule is: check whether an extracted attribute value of the preset attribute word occurs in multiple sentences of the sentence set; if so, the attribute values of this preset attribute word are considered repeatable;
for repeatable attribute values, filtering uses the term frequency of the attribute value itself, and attribute values whose term frequency is below the preset attribute value frequency threshold are filtered out;
for non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings, and attribute values whose context string frequency is below the preset frequency threshold are filtered out.
A structured information search method, comprising the following steps:
S6, obtaining an attribute word from the query input by a user;
S7, matching the attribute word obtained in step S6 against the structured information to obtain the corresponding attribute value information, and including the obtained attribute value information in the search results returned to the user;
wherein the structured information is obtained with the structured information extraction method of the present invention.
According to a preferred embodiment of the present invention, step S6 further comprises: obtaining an entity word from the query input by the user;
the matching in step S7 is specifically: matching the attribute word obtained in step S6 against the structured information corresponding to the entity word to obtain the corresponding attribute value information.
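The lookup in steps S6 and S7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the store layout (a dictionary keyed by entity word and attribute word), the substring-based query analysis, and all function and variable names are assumptions made for the example.

```python
def parse_query(query, entity_words, attribute_words):
    """S6 (sketch): pick the first known entity word and attribute word in the query."""
    entity = next((e for e in entity_words if e in query), None)
    attribute = next((a for a in attribute_words if a in query), None)
    return entity, attribute


def search(query, store, entity_words, attribute_words):
    """S7 (sketch): match the attribute word against the structured information."""
    entity, attribute = parse_query(query, entity_words, attribute_words)
    if attribute is None:
        return set()
    if entity is not None:
        # Preferred embodiment: restrict the match to this entity's structured information.
        return set(store.get((entity, attribute), set()))
    # No entity word in the query: collect values for the attribute word across entities.
    matches = set()
    for (_, attr), values in store.items():
        if attr == attribute:
            matches |= values
    return matches


# Hypothetical structured information: (entity word, attribute word) -> attribute values.
store = {
    ("Fan Bingbing", "film"): {"Guanyin Mountain"},
    ("Yang Lixin", "film"): {"The First Secretary"},
}
entities = ["Fan Bingbing", "Yang Lixin"]
attributes = ["film"]

result = search("Fan Bingbing film", store, entities, attributes)
all_films = search("film", store, entities, attributes)
```

Keying the store by the (entity word, attribute word) pair keeps the entity-restricted lookup a single dictionary access; a real system would more likely use an index over a database.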
A structured information extraction device, comprising:
a sentence set acquisition module, configured to obtain from a corpus a sentence set containing a preset attribute word;
a seed set acquisition module, configured to obtain from the sentence set the sentences containing a preset attribute value as seeds, forming a seed set;
a pattern extraction module, configured to extract a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, wherein the pattern comprises: the character strings adjacent to the attribute value, and the feature values in the seed that meet a preset feature requirement;
an attribute value extraction module, configured to use each pattern in the pattern set to extract attribute values from the sentence set obtained by the sentence set acquisition module, obtaining the attribute value set corresponding to the preset attribute word.
According to a preferred embodiment of the present invention, the seed set acquisition module matches the attribute value set obtained by the attribute value extraction module against the sentence set, extracts new seeds, and adds them to the seed set;
after the algorithm converges, the attribute value extraction module obtains structured information that comprises: the attribute word and the attribute values.
According to a preferred embodiment of the present invention, in the pattern extraction module the character strings adjacent to the attribute value include: words, phrases, or symbols;
the feature values in the seed that meet the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
According to a preferred embodiment of the present invention, the device further comprises a pattern filtering module, configured to filter the pattern set extracted by the pattern extraction module and provide the result to the attribute value extraction module, using at least one of the following methods: filtering by the occurrence frequency of each pattern, or filtering by the semantic relevance of each pattern.
According to a preferred embodiment of the present invention, the device further comprises an attribute value filtering module, configured to filter the attribute value set corresponding to the preset attribute word and provide the result to the seed set acquisition module, using one or a combination of the following methods:
A, filtering by term frequency: attribute values whose term frequency is below the preset requirement are filtered out;
B, filtering by part-of-speech features: attribute values whose part of speech does not match the features of an attribute value are filtered out;
C, filtering by the number of characters: attribute values whose character count exceeds the preset requirement are filtered out;
D, filtering by the number of stop words found when the attribute value is segmented: attribute values containing more stop words than the preset requirement are filtered out.
According to a preferred embodiment of the present invention, before filtering by term frequency, the attribute value filtering module first determines whether the attribute values corresponding to the preset attribute word are repeatable. The rule is: check whether an extracted attribute value of the preset attribute word occurs in multiple sentences of the sentence set; if so, the attribute values of this preset attribute word are considered repeatable;
for repeatable attribute values, filtering uses the term frequency of the attribute value itself, and attribute values whose term frequency is below the preset attribute value frequency threshold are filtered out;
for non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings, and attribute values whose context string frequency is below the preset frequency threshold are filtered out.
A structured information search device, comprising:
an analysis module, configured to obtain an attribute word from the query input by a user;
a matching module, configured to match the attribute word obtained by the analysis module against the structured information to obtain the corresponding attribute value information, so that the obtained attribute value information is included in the search results returned to the user;
wherein the structured information is obtained with the structured information extraction device of the present invention.
According to a preferred embodiment of the present invention, the analysis module further obtains an entity word from the query input by the user;
the matching module specifically matches the attribute word obtained by the analysis module against the structured information corresponding to the entity word to obtain the corresponding attribute value information.
As can be seen from the above technical solution, the structured information extraction method and device provided by the present invention extract templates (patterns) from a limited number of seeds, use the resulting patterns to extract attribute values, and iterate in a loop. They can therefore build templates and mine an attribute value dictionary automatically to extract structured information, saving human resources and improving the efficiency and recall of the search engine.
[Brief Description of the Drawings]
Fig. 1 is a flowchart of the structured information extraction method provided by Embodiment 1 of the present invention;
Fig. 2 is a structural diagram of the structured information extraction device provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the structured information search method provided by Embodiment 3 of the present invention;
Fig. 4 is a structural diagram of the structured information search device provided by Embodiment 4 of the present invention.
[Detailed Description]
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Fig. 1 is a flowchart of the structured information extraction method provided by this embodiment. As shown in Fig. 1, the method comprises:
S101, obtain from a corpus a sentence set containing a preset attribute word.
The corpus is a resource collection of a certain scale and can be chosen according to the application scenario. For example, the entries and article content of an entire encyclopedia can serve as the corpus, or the entries and article content within a particular range or under certain categories of the encyclopedia.
A preset attribute word is a term that expresses an action of an entity or one aspect of its attributes. It may be a verb or a noun, for example the nouns "film" and "TV series" or the verbs "was admitted to" and "got into".
Preset attribute words can be set manually, or they can form a dynamically generated list built by word frequency statistics over a corpus of a certain scale in a machine learning manner. The corpus is first segmented and filtered; after stop words and terms whose part of speech is unsuitable for an attribute word are removed, the top N terms by frequency are taken as preset attribute words, forming the preset attribute word list. With this list, sentence sets containing the different preset attribute words are obtained from the corpus used for sentence set acquisition. Note that the corpus used to obtain the preset attribute words may differ from the corpus used to obtain the sentence sets; for example, the preset attribute words may be obtained in advance from a natural language database.
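The frequency-based construction of the preset attribute word list can be sketched as follows. This is a minimal sketch under stated assumptions: the corpus is assumed to be already segmented and part-of-speech tagged (the source does not name a segmenter or tagger), and the function name, the tag labels, and the toy corpus are hypothetical.

```python
from collections import Counter


def build_attribute_word_list(tagged_corpus, stop_words, allowed_pos, top_n):
    """Count candidate terms and keep the top N as preset attribute words.

    tagged_corpus: list of sentences, each a list of (word, part_of_speech) pairs.
    """
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        # Drop stop words and terms whose part of speech is unsuitable.
        if word not in stop_words and pos in allowed_pos
    )
    return [word for word, _ in counts.most_common(top_n)]


# Toy, pre-tagged corpus (hypothetical data).
corpus = [
    [("the", "DET"), ("film", "NOUN"), ("won", "VERB"), ("an", "DET"), ("award", "NOUN")],
    [("she", "PRON"), ("promoted", "VERB"), ("the", "DET"), ("film", "NOUN")],
    [("the", "DET"), ("film", "NOUN"), ("won", "VERB")],
]
attribute_words = build_attribute_word_list(
    corpus, stop_words={"the", "an", "she"}, allowed_pos={"NOUN", "VERB"}, top_n=2
)
```

Here "film" occurs three times and "won" twice, so those two terms form the list when N is 2.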
For example: extract the sentence set containing "film" from the article content under the entertainment figures category of an encyclopedia. Here the corpus is the entries and text under the encyclopedia's entertainment figures category, and "film" is the preset attribute word.
Entertainment figures have encyclopedia entries. The text of every entry under the entertainment figures category is obtained and first split into sentences, forming a sentence collection; all sentences under the whole category constitute the corpus. The sentence set containing the preset attribute word "film" is obtained from this corpus. For example, "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》." is one sentence in the sentence set.
Note that when the preset attribute word list is obtained, the preset attribute words may also be normalized by semantic relation. For example, verbs such as "got into" and "went to school at" are synonyms of "was admitted to" and are placed in the same attribute word list. Likewise, "film", "movie", and "blockbuster" share the same semantic relation and are placed in the same attribute word list.
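Sentence splitting and sentence set acquisition with synonym normalization might look like the following. This is an illustrative sketch only: the punctuation-based splitter, the substring match, the synonym table, and the sample text are all assumptions, and a Chinese corpus would need a proper word segmenter. It relies on `re.split` accepting patterns that can match the empty string, which requires Python 3.7 or later.

```python
import re

# Hypothetical synonym groups: canonical attribute word -> surface forms.
SYNONYMS = {"film": ["film", "movie", "blockbuster"]}


def split_sentences(text):
    """Split after Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_set(texts, attribute_word):
    """Collect every sentence that contains the attribute word or one of its synonyms."""
    variants = SYNONYMS.get(attribute_word, [attribute_word])
    return [
        s
        for text in texts
        for s in split_sentences(text)
        if any(v in s for v in variants)
    ]


entry_text = (
    "Fan Bingbing won Best Actress for the film 《Guanyin Mountain》. "
    "She was born in Qingdao. "
    "The movie 《The First Secretary》 won a prize."
)
film_sentences = sentence_set([entry_text], "film")
```

The second sentence is dropped because it contains none of the surface forms, while the third is kept through the synonym "movie".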
S102, obtain from the sentence set the sentences containing a preset attribute value as seeds, forming a seed set.
The sentence set is matched against an attribute value set that is either manually labeled or obtained by machine, and sentences containing both the preset attribute word and an attribute value are taken as seeds. In the initial pass, a limited number of manually labeled attribute values, or the attribute values that an existing dictionary gives for the preset attribute word, are matched against the sentence set; the sentences containing both the preset attribute word and an attribute value become seeds and form the seed set.
In the seed set, the attribute value in each sentence is labeled, so each seed is stored in the form "sentence \t attribute value". For example: "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》. \t Guanyin Mountain". In this seed, "Guanyin Mountain" is the attribute value corresponding to the preset attribute word "film". Another example: "In 1978, Liu Dehua was admitted to the wireless performer training class. \t wireless performer training class". In this seed, "wireless performer training class" is the attribute value corresponding to the preset attribute word "was admitted to".
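Seed construction in the "sentence \t attribute value" form can be sketched as below. The function name, the substring matching, and the sample sentences are assumptions for illustration; the source does not specify how matching is performed.

```python
def build_seeds(sentences, attribute_word_variants, known_values):
    """Return seeds stored as 'sentence<TAB>attribute value' strings."""
    seeds = []
    for sentence in sentences:
        # A seed must contain the preset attribute word (or a synonym of it).
        if not any(word in sentence for word in attribute_word_variants):
            continue
        # It must also contain a known (labeled or dictionary) attribute value.
        for value in known_values:
            if value in sentence:
                seeds.append(f"{sentence}\t{value}")
    return seeds


sentences = [
    "Fan Bingbing won Best Actress for the film 《Guanyin Mountain》.",
    "The film festival opened in Beijing.",
    "《Guanyin Mountain》 was released in 2011.",
]
seeds = build_seeds(sentences, ["film"], ["Guanyin Mountain"])
```

Only the first sentence becomes a seed: the second has the attribute word but no known value, and the third has the value but not the attribute word.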
S103, extract a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word.
A pattern comprises: the character strings adjacent to the attribute value, and the feature values in the seed that meet a preset feature requirement. The character strings adjacent to the attribute value may include but are not limited to words, phrases, and symbols.
In the seed "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》. \t Guanyin Mountain", the character strings adjacent to the attribute value "Guanyin Mountain" are the preceding token "《" and the following token "》". In the seed "In 1978, Liu Dehua was admitted to the wireless performer training class. \t wireless performer training class", the two tokens adjacent to the attribute value "wireless performer training class" are "was admitted to" and the sentence-ending period.
The feature values that meet the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
In the seed "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》. \t Guanyin Mountain", the feature values can include the following verb "won" and the nearest noun "film"; the part of speech of the attribute value is noun, the number of nouns in the attribute value is 1, and the attribute value has 3 characters (in the original Chinese).
The tokens adjacent to the attribute value, together with the feature values in the seed that meet the preset feature requirement, constitute the pattern for that seed. If the preset feature requirement selects the nearest noun and the following verb, the pattern obtained from the seed above is "film" "won". The present invention places no restriction on how patterns are stored; for example, they may be stored in tabular form.
Extracting the adjacent character strings and the qualifying feature values from different seeds yields different patterns, which together form the pattern set. For example, in seeds whose preset attribute word is "film", the verbs before and after the labeled attribute value may also include "won ... award", "attended", "promoted", "promoted for", and "took part in"; the resulting patterns correspondingly include "《" "》 won ... award", "attended the film 《" "》", "《" "》 promotion", "promoted for the film 《" "》", and "took part in the film 《" "》".
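The adjacent-string part of pattern extraction can be sketched as follows. The sketch covers only the tokens immediately before and after the attribute value; the part-of-speech feature values described above would need a tagger and are left out. The regex tokenizer and the function name are assumptions for illustration.

```python
import re

# Words, or single punctuation marks such as the title marks 《 and 》.
TOKEN = re.compile(r"\w+|[^\w\s]")


def extract_pattern(seed):
    """Return (token before value, token after value) for a 'sentence\\tvalue' seed."""
    sentence, value = seed.rsplit("\t", 1)
    start = sentence.find(value)
    if start < 0:
        return None  # the labeled value does not occur in the sentence
    left = TOKEN.findall(sentence[:start])
    right = TOKEN.findall(sentence[start + len(value):])
    return (left[-1] if left else "", right[0] if right else "")


seed_list = [
    "Fan Bingbing won Best Actress for the film 《Guanyin Mountain》.\tGuanyin Mountain",
    "Yang Lixin won an award for the film 《The First Secretary》.\tThe First Secretary",
]
# Identical patterns from different seeds collapse into one entry of the set.
pattern_set = {extract_pattern(seed) for seed in seed_list}
```

Both seeds give the same pair of title marks, so the resulting pattern set has a single element.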
In addition, if the feature values obtained from the seeds in this step satisfy a preset attribute word characteristic, those feature values may, after filtering steps such as term frequency or part-of-speech filtering, be added to the preset attribute word list. The preset attribute word characteristic may be: being the verb preceding the attribute value, the verb following it, or the nearest noun.
S104, filter the pattern set corresponding to the preset attribute word.
The pattern set obtained in step S103 is filtered by the occurrence frequency of each pattern, yielding the patterns whose frequency meets the preset requirement. For example, the number of occurrences of each pattern is counted; patterns that occur rarely are considered low quality and are removed from the pattern set.
Alternatively, filtering is based on the semantic relevance of each pattern: the character strings adjacent to the attribute value and the qualifying feature values in an extracted pattern are semantically matched, and patterns with poor semantic relevance are filtered out. For example, the semantic relatedness of the words in a pattern can be computed and patterns with low relatedness filtered out. An existing dictionary can also be used: if the collocation in a pattern appears in the dictionary, the pattern is kept; otherwise it is filtered out.
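The frequency-based variant of pattern filtering reduces to counting. The sketch below assumes each pattern is a hashable tuple and that the "preset requirement" is a minimum occurrence count; both choices, and the names used, are assumptions. Semantic relevance filtering is not shown.

```python
from collections import Counter


def filter_patterns(extracted_patterns, min_count):
    """Keep patterns that were extracted at least min_count times."""
    counts = Counter(extracted_patterns)
    return {pattern for pattern, n in counts.items() if n >= min_count}


# One entry per seed; duplicates mean several seeds produced the same pattern.
extracted = [
    ("《", "》"),
    ("《", "》"),
    ("film", "won"),
    ("the", "."),
]
kept_patterns = filter_patterns(extracted, min_count=2)
```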
S105, use each pattern in the pattern set to extract attribute values from the sentence set obtained in step S101, obtaining the attribute value set corresponding to the preset attribute word.
For example, applying the pattern "film" "won" to the sentence set can match sentences such as "Yang Lixin won the 'Teenagers' Favorite Actor' award for the film 《The First Secretary》", from which the attribute value "The First Secretary" is extracted. Similarly, applying the pattern "《" "》 won ... award" to the sentence set can match sentences such as "In 1983, Lei's 《Strange Friend》 won a special award at the 33rd West Berlin International Film Festival", from which the attribute value "Strange Friend" is extracted. The attribute values extracted in this way form the attribute value set corresponding to the preset attribute word "film".
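Applying an adjacent-string pattern to the sentence set can be sketched with a regular expression. This assumes a pattern is a (left string, right string) pair with both sides non-empty, and that the attribute value is the shortest text between them; the source does not prescribe a matching mechanism, so the regex approach and the names are illustrative.

```python
import re


def extract_values(sentences, pattern):
    """Extract the text between the pattern's left and right strings."""
    left, right = pattern
    if not left or not right:
        raise ValueError("this sketch needs a non-empty string on both sides")
    # Non-greedy capture so each match stops at the first right-hand string.
    regex = re.compile(re.escape(left) + r"\s*(.+?)\s*" + re.escape(right))
    return {m.group(1) for s in sentences for m in regex.finditer(s)}


sentence_list = [
    "Yang Lixin won an award for the film 《The First Secretary》.",
    "In 1983, 《Strange Friend》 won a special award.",
    "The festival opened on Monday.",
]
film_values = extract_values(sentence_list, ("《", "》"))
```

Escaping both sides with `re.escape` keeps punctuation in a pattern from being read as regex syntax.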
S106, filter the attribute value set corresponding to the preset attribute word.
The attribute value set can be filtered with one or a combination of the following methods:
A, filter by term frequency (TF): attribute values whose term frequency is below the preset requirement are filtered out.
For some preset attribute words the corresponding attribute values are repeatable, and for others they are not. Therefore, before term frequency filtering, it is first determined whether the attribute values are repeatable. The rule is: check whether an extracted attribute value of the preset attribute word occurs in multiple sentences of the sentence set; if so, the attribute values of this preset attribute word are considered repeatable. For example, the preset attribute word is "film" and its attribute values include "Guanyin Mountain", "The First Secretary", and "Strange Friend". In the sentence set for "film", "Guanyin Mountain" may occur in several sentences, such as "Fan Bingbing won Best Actress at the 23rd Tokyo International Film Festival for the film 《Guanyin Mountain》." and "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》." "Guanyin Mountain" is therefore considered to repeat, and accordingly the attribute values of the preset attribute word "film" are repeatable. As long as any one attribute value of a preset attribute word repeats, its attribute values are treated as repeatable.
For repeatable attribute values, filtering uses the term frequency of the attribute value itself, and attribute values whose term frequency is below the preset attribute value frequency threshold are filtered out.
For non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings, and attribute values whose context string frequency is below the preset frequency threshold are filtered out. The context strings of an attribute value include the shared suffix information in the sentence or the character strings adjacent to the attribute value; adjacent character strings may include but are not limited to words, phrases, and symbols.
Suppose the attribute values of the preset attribute word "film" are judged to be non-repeating. Filtering can then use the occurrence count of the shared suffix information in the sentences. Shared suffix information means that several attribute values are followed by the same suffix. For example, the sentence set contains "Ruan Jingtian, with 《Monga》, won the 47th Taiwan Golden Horse Award for Best Actor.", "In 2004 Liu Dehua, with the film 《Infernal Affairs III》, won the Taiwan Golden Horse Award for Best Actor.", and "In 2007 Liang Chaowei, with 《Lust, Caution》, won the Taiwan Golden Horse Award for Best Actor." That is, the attribute values "Monga", "Infernal Affairs III", and "Lust, Caution" share the suffix information "Taiwan Golden Horse Award for Best Actor", so filtering is based on the occurrence count of this shared suffix. If the occurrence count of "Taiwan Golden Horse Award for Best Actor" exceeds the preset suffix frequency threshold, the attribute values "Monga", "Infernal Affairs III", and "Lust, Caution" are kept; otherwise they are filtered out.
Alternatively, filtering can use the term frequency of the character strings adjacent to the attribute value, which comprise the tokens immediately before and after it. For example, in "In 1982 he played the lead for the first time in the TV series 《Falcon》, produced by Li Tiansheng, and became an overnight success", the tokens before and after the attribute value "Falcon" are "《" and "》" respectively, and filtering is based on the term frequency of these two tokens.
B, filtering based on part-of-speech features: property values whose part of speech does not match the characteristics of a property value are filtered out.
The part of speech of a property value is usually a noun or a noun phrase, such as the property values "Guanyin Mountain", "The First Secretary" and "Strange Friend" corresponding to the preset attribute word "film". Property values whose part of speech is a conjunction, an auxiliary word or the like are filtered out.
C, filtering based on the number of characters of the property value: property values whose number of characters does not meet the preset requirement are filtered out.
The property values corresponding to some preset attribute words may all have the same number of characters, or a number of characters within a certain range; property values that do not meet the preset number-of-characters requirement are filtered out. For example, when the preset attribute word is "×× date", its property values have a fixed format, and their number of characters also falls within a certain range.
D, filtering based on the number of stop words found when the property value is segmented into words: property values containing more stop words than the preset requirement are filtered out.
When an extracted property value contains too many stop words, it generally carries little practical meaning, so property values containing more stop words than the preset stop-word number threshold are filtered out.
For example, the pattern "admitted to ..." extracts the property value "that place's best school" from the statement "was admitted to that place's best school". The stop words in this property value are "that", "best" and the possessive particle, three in total. When the preset stop-word number threshold is set to 1, the number of stop words in "that place's best school" is greater than 1, so the property value is filtered out.
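Filter A above (the repeatability judgment followed by either value-frequency or context-frequency filtering) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the representation of each extraction as a `(statement, value, suffix)` triple, the inclusive threshold comparison and the English film titles are all hypothetical.

```python
from collections import Counter

def is_repeatable(extractions):
    """Return True if any property value was extracted from more than one statement."""
    statements_per_value = {}
    for statement, value, _suffix in extractions:
        statements_per_value.setdefault(value, set()).add(statement)
    return any(len(found) > 1 for found in statements_per_value.values())

def filter_by_frequency(extractions, value_threshold, context_threshold):
    """Filter A: use the frequency of the value itself, or of its shared suffix."""
    if is_repeatable(extractions):
        # Repeatable values: keep those whose own frequency reaches the threshold.
        counts = Counter(value for _, value, _ in extractions)
        return {v for _, v, _ in extractions if counts[v] >= value_threshold}
    # Non-repeating values: keep those whose suffix is frequent enough.
    # (Treating the threshold as inclusive is an assumption of this sketch.)
    suffix_counts = Counter(suffix for _, _, suffix in extractions)
    return {v for _, v, s in extractions if suffix_counts[s] >= context_threshold}

# Hypothetical extractions: (statement id, property value, suffix after the value).
awards = [
    ("s1", "Monga", "Golden Horse Award for Best Leading Actor"),
    ("s2", "Infernal Affairs III", "Golden Horse Award for Best Leading Actor"),
    ("s3", "Lust, Caution", "Golden Horse Award for Best Leading Actor"),
    ("s4", "yesterday", "was sunny"),
]
kept = filter_by_frequency(awards, value_threshold=2, context_threshold=2)
```

Here no value occurs in more than one statement, so the suffix branch is taken and the three film titles survive while the noise value is dropped.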
S107, match the filtered property value set against the statement set obtained in step S101, extract new seeds, and go to step S103 until the algorithm converges, obtaining structured information, the structured information including: attribute words and property values.
The property values in the filtered property value set are matched against the statement set, the statements containing a property value are extracted, and new seeds are formed in the form "statement \t property value". The process then returns to step S103 for the next iteration, until the algorithm converges. The algorithm has converged when no new pattern or new property value is extracted.
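The overall iteration of steps S103 to S107 can be sketched as a small bootstrapping loop. This is a simplified illustration, not the patented implementation: a pattern is reduced to the single characters on either side of the property value, the filtering steps are omitted, and the function names, the `width` and `max_rounds` parameters and the English example statements are assumptions.

```python
import re

def extract_patterns(seeds, width=1):
    """A pattern here is just the (left, right) strings adjacent to the value."""
    patterns = set()
    for statement, value in seeds:
        start = statement.find(value)
        if start < 0:
            continue
        end = start + len(value)
        left = statement[max(0, start - width):start]
        right = statement[end:end + width]
        if left and right:
            patterns.add((left, right))
    return patterns

def extract_values(statements, patterns):
    """Apply every pattern to every statement and collect the matched values."""
    values = set()
    for left, right in patterns:
        regex = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for statement in statements:
            values.update(regex.findall(statement))
    return values

def bootstrap(statements, initial_values, max_rounds=10):
    values, patterns = set(initial_values), set()
    for _ in range(max_rounds):
        # New seeds: statements that contain a known property value.
        seeds = [(s, v) for s in statements for v in values if v in s]
        new_patterns = patterns | extract_patterns(seeds)
        new_values = values | extract_values(statements, new_patterns)
        if new_patterns == patterns and new_values == values:
            break  # converged: no new pattern and no new property value
        patterns, values = new_patterns, new_values
    return patterns, values

statements = [
    "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress.",
    "Yang Lixin, for the film 《The First Secretary》, won an award.",
    "The film 《Strange Friend》 received a special award.",
]
patterns, values = bootstrap(statements, ["Guanyin Mountain"])
```

Starting from one labeled value, the first round learns the title-mark pattern and the second round finds nothing new, so the loop stops. A fuller version would apply the pattern and property value filters between rounds.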
The above is a detailed description of the method provided by the present invention. The structured information extraction device provided by the present invention is described in detail below.
Embodiment two,
Fig. 2 is a structural diagram of the structured information extraction device provided by this embodiment. As shown in Fig. 2, the device includes: a statement set acquisition module 201, a seed set acquisition module 202, a pattern extraction module 203, a pattern filtering module 204, a property value extraction module 205 and a property value filtering module 206.
The statement set acquisition module 201 is configured to obtain, from a corpus, a statement set containing a preset attribute word.
The corpus is a resource collection of a certain scale and may be selected according to the actual application scenario. For example, the entries and article content of an entire encyclopedia may be selected as the corpus, or only the entries and article content within a particular scope or under certain categories of the encyclopedia.
A preset attribute word is a lexical item that expresses an action of an entity or an attribute of it in some respect, and may be a verb or a noun, for example the nouns "film" and "TV series" or the verbs "admitted to" and "enrolled in".
The preset attribute words may be set manually, or may be a dynamically formed list obtained by machine learning through word-frequency statistics over a corpus of a certain scale. That corpus is first segmented into words and filtered; after stop words and lexical items whose part of speech is unsuitable for an attribute word are removed, the top N remaining lexical items are taken as preset attribute words and form a preset attribute word list. Using this list, statement sets containing the different preset attribute words are obtained from the corpus used for obtaining statement sets. It should be noted that the corpus used to obtain the preset attribute words by machine learning may differ from the corpus used to obtain the statement sets; for example, the preset attribute words may be obtained in advance from a natural language database.
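The frequency-based construction of the preset attribute word list might look like the sketch below. It assumes the corpus has already been segmented and part-of-speech tagged by some external tool; the function name, the tag labels and the toy corpus are hypothetical.

```python
from collections import Counter

def build_attribute_word_list(tagged_corpus, stop_words, allowed_pos, top_n):
    """tagged_corpus: iterable of sentences, each a list of (word, pos) pairs."""
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        # Drop stop words and parts of speech unsuitable for attribute words.
        if word not in stop_words and pos in allowed_pos
    )
    return [word for word, _ in counts.most_common(top_n)]

tagged_corpus = [
    [("the", "det"), ("film", "noun"), ("won", "verb"), ("award", "noun")],
    [("the", "det"), ("film", "noun"), ("premiered", "verb")],
    [("she", "pron"), ("won", "verb"), ("the", "det"), ("film", "noun")],
]
attribute_words = build_attribute_word_list(
    tagged_corpus, stop_words={"the", "she"}, allowed_pos={"noun", "verb"}, top_n=2
)
```

With these counts ("film" three times, "won" twice), the two most frequent candidates become the preset attribute words.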
For example, the statement set acquisition module 201 needs to extract the statement set containing "film" from the text of the entries under the entertainment-figure category of the encyclopedia. In this case, the corpus is the entries and text under the entertainment-figure category of the encyclopedia, and "film" is the preset attribute word.
Entertainment figures have encyclopedia entries. The text of each entry under the entertainment-figure category is obtained and first split into sentences, forming a sentence set; all the sentences under the whole entertainment-figure category constitute the corpus. The statement set containing the preset attribute word "film" is obtained from this corpus. For example, "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 18th Beijing College Student Film Festival." is one statement in the statement set.
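The sentence splitting and attribute word matching described above can be sketched as follows. The punctuation-based splitter is a deliberate simplification, and the function names and sample text are assumptions rather than part of the described device.

```python
import re

def split_sentences(text):
    """Naive splitter: cut after Western or Chinese sentence-ending punctuation."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [part.strip() for part in parts if part.strip()]

def build_statement_set(entry_texts, attribute_words):
    """Keep every sentence that contains at least one of the attribute words."""
    statements = []
    for text in entry_texts:
        for sentence in split_sentences(text):
            if any(word in sentence for word in attribute_words):
                statements.append(sentence)
    return statements

entry_texts = [
    "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress. "
    "She later attended the ceremony."
]
statement_set = build_statement_set(entry_texts, {"film"})
```

Passing several synonymous attribute words (for example "film" and "movie") in the same set is one simple way to reflect a normalized attribute word list.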
It should be noted that, when the preset attribute word list is obtained, the preset attribute words may also be normalized according to their semantic relations. For example, verbs such as "admitted to", "enrolled in", "went to school at" and "tested into" are synonymous and are grouped into the same attribute word list. Likewise, "film", "movie" and "blockbuster" have the same semantic relation and are grouped into the same attribute word list.
The seed set acquisition module 202 is configured to obtain, from the statement set, statements containing a preset property value as seeds, forming a seed set.
A manually labeled property value set, or a property value set obtained by machine, is matched against the statement set, and the statements containing both the preset attribute word and a property value are taken as seeds. In the initial pass, a limited number of manually labeled property values, or the property values that an existing dictionary gives for the preset attribute word, are matched against the statement set, and the statements containing the preset attribute word and a property value are taken as seeds, forming the seed set.
In the seed set, the property value in each statement is labeled; for example, each seed is stored in the form "statement \t property value". For example: "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 18th Beijing College Student Film Festival. \t Guanyin Mountain". In this seed, "Guanyin Mountain" is the property value corresponding to the preset attribute word "film". As another example: "In 1978 Liu Dehua was admitted to the wireless performer training class. \t wireless performer training class". In this seed, "wireless performer training class" is the property value corresponding to the preset attribute word "admitted to".
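A minimal sketch of building and reading seeds in the tab-separated form described above is given below; the function names are hypothetical, and matching is plain substring search for illustration only.

```python
def build_seeds(statements, known_values):
    """Pair each statement with every known property value it contains."""
    seeds = []
    for statement in statements:
        for value in known_values:
            if value in statement:
                seeds.append(f"{statement}\t{value}")
    return seeds

def parse_seed(seed):
    """Split a stored seed back into (statement, property value)."""
    statement, value = seed.rsplit("\t", 1)
    return statement, value

statements = [
    "In 1978 Liu Dehua was admitted to the wireless performer training class.",
    "He was admitted to a school that is not in the labeled set.",
]
seeds = build_seeds(statements, ["wireless performer training class"])
```

Only the first statement contains a labeled property value, so only one seed is produced.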
The pattern extraction module 203 is configured to extract a pattern from each seed, forming the pattern set corresponding to the preset attribute word. The pattern includes: the character strings adjacent to the property value and the feature values in the seed that meet preset feature requirements. The character strings adjacent to the property value may include, but are not limited to, words, phrases and symbols.
In the seed "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 18th Beijing College Student Film Festival. \t Guanyin Mountain", the character strings adjacent to the property value "Guanyin Mountain" are the preceding word "《" and the following word "》". In the seed "In 1978 Liu Dehua was admitted to the wireless performer training class. \t wireless performer training class", the two words adjacent to the property value "wireless performer training class" are "admitted to" and ".".
The feature values in the seed that meet the preset feature requirements include at least one of the following: the verb preceding the property value, the verb following it, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters of the property value.
In the seed "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 18th Beijing College Student Film Festival. \t Guanyin Mountain", the feature values may include the following verb "won" and the nearest noun "film"; the part of speech of the property value is noun, the number of nouns in the property value is 1, and the number of characters of the property value is 3 (counted in the original Chinese title).
The character strings adjacent to the property value and the feature values in the seed that meet the preset feature requirements, as obtained by the pattern extraction module 203, constitute the pattern corresponding to that seed. If the preset feature requirements select the nearest noun and the following verb, the pattern obtained from the above seed is "film 《》 won". The present invention does not limit the storage format of patterns; for example, they may also be stored as a table.
Extracting the adjacent character strings and the feature values that meet the preset feature requirements from different seeds yields different patterns, which form the pattern set. For example, in the seeds for the preset attribute word "film", the verbs before and after the labeled property value may also be "received ... award", "attended", "promoted", "promoted for ..." and "took part in", and the corresponding patterns include "《》 received ... award", "attended the film 《》", "promoted 《》", "promoted for the film 《》" and "took part in the film 《》".
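Extracting a pattern from one seed can be illustrated as follows. The sketch assumes the seed statement has already been segmented and part-of-speech tagged, with the token order of the original Chinese sentence; the function name, the tag labels and the dictionary layout of the pattern are hypothetical.

```python
def extract_pattern(tagged_tokens, value_index):
    """tagged_tokens: list of (token, pos); value_index: position of the property value."""
    before = tagged_tokens[:value_index]
    after = tagged_tokens[value_index + 1:]
    # Character strings immediately adjacent to the property value.
    previous_string = before[-1][0] if before else ""
    next_string = after[0][0] if after else ""
    # Feature values: the first verb after the value and the closest noun.
    following_verb = next((tok for tok, pos in after if pos == "verb"), None)
    nouns = [(abs(i - value_index), tok)
             for i, (tok, pos) in enumerate(tagged_tokens)
             if pos == "noun" and i != value_index]
    nearest_noun = min(nouns)[1] if nouns else None
    return {
        "previous_string": previous_string,
        "next_string": next_string,
        "nearest_noun": nearest_noun,
        "following_verb": following_verb,
    }

seed_tokens = [
    ("Fan Bingbing", "noun"), ("for", "prep"), ("film", "noun"), ("《", "punct"),
    ("Guanyin Mountain", "noun"), ("》", "punct"), ("won", "verb"),
    ("Best Actress", "noun"),
]
pattern = extract_pattern(seed_tokens, value_index=4)
```

The resulting dictionary corresponds to the pattern written in the text as "film 《》 won".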
In addition, if the feature values meeting the preset feature requirements that the pattern extraction module 203 obtains from the seeds also satisfy a preset attribute word characteristic, those feature values may, after filtering steps such as word-frequency or part-of-speech filtering, be added to the preset attribute word list. The preset attribute word characteristic may be: being the verb preceding the property value, the verb following it, or the nearest noun.
The pattern filtering module 204 is configured to filter the pattern set extracted by the pattern extraction module 203 and provide the filtered set to the property value extraction module 205.
The pattern set may be filtered based on the frequency of occurrence of the patterns, yielding a pattern set whose patterns meet a preset frequency requirement. For example, the number of occurrences of each pattern is counted, and a pattern that occurs only a few times is regarded as a low-quality pattern and filtered out of the pattern set.
Alternatively, filtering may be based on the semantic relevance of the pattern: the character strings adjacent to the property value and the feature values meeting the preset feature requirements in an extracted pattern are semantically matched, and patterns with poor semantic relevance are filtered out. For example, the semantic relevance of the words in a pattern may be calculated and patterns with low relevance filtered out. Filtering may also use an existing dictionary: a pattern is retained if its collocation appears in the dictionary and filtered out otherwise.
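The two pattern filters can be sketched in a few lines. Both functions and the sample data are hypothetical, and the dictionary check stands in for whatever semantic-relevance resource is actually used.

```python
from collections import Counter

def filter_patterns_by_frequency(pattern_occurrences, min_count):
    """Keep patterns that were extracted at least min_count times."""
    counts = Counter(pattern_occurrences)
    return {pattern for pattern, n in counts.items() if n >= min_count}

def filter_patterns_by_dictionary(patterns, known_collocations):
    """Keep patterns whose collocation appears in an existing dictionary."""
    return {pattern for pattern in patterns if pattern in known_collocations}

occurrences = (
    ["film 《》 won"] * 3
    + ["《》 received ... award"] * 2
    + ["attended 《》 yesterday"]
)
frequent = filter_patterns_by_frequency(occurrences, min_count=2)
confirmed = filter_patterns_by_dictionary(frequent, {"film 《》 won"})
```

The pattern seen only once is treated as low quality and dropped by the frequency filter.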
The property value extraction module 205 is configured to use each pattern in the pattern set to extract property values from the statement set obtained by the statement set acquisition module 201, obtaining the property value set corresponding to the preset attribute word.
For example, using the pattern "film 《》 won" to extract property values from the statement set yields statements such as "Yang Lixin, for the film 《The First Secretary》, won the 'Teenagers' Favorite Actor' award", from which the property value "The First Secretary" is extracted. Similarly, using the pattern "《》 received ... award" yields statements such as "In 1983 Xu Lei's 《Strange Friend》 received a special award at the 33rd West Berlin International Film Festival", from which the property value "Strange Friend" is extracted. The property values extracted in this way form the property value set corresponding to the preset attribute word "film".
The property value filtering module 206 is configured to filter the property value set corresponding to the preset attribute word and provide the filtered set to the seed set acquisition module 202. The filtering may use one or a combination of the following:
A, filtering based on word frequency: property values whose word frequency is below the preset requirement are filtered out.
Before filtering based on word frequency, the property value filtering module first judges whether the property values corresponding to the preset attribute word can occur repeatedly. The judgment rule is: determine whether an extracted property value corresponding to the preset attribute word occurs in multiple statements of the statement set; if so, the property values corresponding to this preset attribute word are considered repeatable. For example, the preset attribute word is "film" and its property values include "Guanyin Mountain", "The First Secretary" and "Strange Friend". In the statement set for "film", "Guanyin Mountain" may occur in multiple statements, such as "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 23rd Tokyo International Film Festival." and "Fan Bingbing, for the film 《Guanyin Mountain》, won Best Actress at the 18th Beijing College Student Film Festival." "Guanyin Mountain" is therefore considered to occur repeatedly and, accordingly, the property values corresponding to the preset attribute word "film" are repeatable. As long as any one of the property values corresponding to the preset attribute word occurs repeatedly, the property values are considered repeatable.
For repeatable property values, filtering is based on the word frequency of the property value: property values whose word frequency is below the preset property value word-frequency threshold are filtered out.
For non-repeating property values, filtering is based on the word frequency of the context character strings of the property value: property values whose context character strings have a word frequency below the preset word-frequency threshold are filtered out. The context character strings of a property value include the shared suffix information in the statements or the character strings adjacent to the property value; the adjacent character strings may include, but are not limited to, words, phrases and symbols.
If the property values of the preset attribute word "film" are judged to be non-repeating, filtering may be based on the number of occurrences of shared suffix information in the statements. Shared suffix information is an identical suffix that follows several property values. For example, the statement set contains statements such as "Ruan Jingtian, for 《Monga》, won the 47th Taiwan Golden Horse Award for Best Leading Actor.", "In 2004 Liu Dehua, for the film 《Infernal Affairs III》, won the Taiwan Golden Horse Award for Best Leading Actor." and "In 2007 Liang Chaowei, for 《Lust, Caution》, won the Taiwan Golden Horse Award for Best Leading Actor." That is, the property values "Monga", "Infernal Affairs III" and "Lust, Caution" share the suffix information "Taiwan Golden Horse Award for Best Leading Actor". Filtering is then based on the number of occurrences of this shared suffix: if it exceeds the preset suffix word-frequency threshold, the property values "Monga", "Infernal Affairs III" and "Lust, Caution" are retained; otherwise they are filtered out.
Alternatively, filtering may be based on the word frequency of the character strings adjacent to the property value. The adjacent character strings include the two words immediately before and after the property value. For example, in "In 1982 he played the leading role for the first time in the TV series 《Falcon》, supervised by Li Tiansheng, and became famous overnight", the words before and after the property value "Falcon" are the title marks "《" and "》" respectively, and filtering is based on the word frequency of these two words.
B, filtering based on part-of-speech features: property values whose part of speech does not match the characteristics of a property value are filtered out.
The part of speech of a property value is usually a noun or a noun phrase, such as the property values "Guanyin Mountain", "The First Secretary" and "Strange Friend" corresponding to the preset attribute word "film". Property values whose part of speech is a conjunction, an auxiliary word or the like are filtered out.
C, filtering based on the number of characters of the property value: property values whose number of characters does not meet the preset requirement are filtered out.
The property values corresponding to some preset attribute words may all have the same number of characters, or a number of characters within a certain range; property values that do not meet the preset number-of-characters requirement are filtered out. For example, when the preset attribute word is "×× date", its property values have a fixed format, and their number of characters also falls within a certain range.
D, filtering based on the number of stop words found when the property value is segmented into words: property values containing more stop words than the preset requirement are filtered out.
When an extracted property value contains too many stop words, it generally carries little practical meaning, so property values containing more stop words than the preset stop-word number threshold are filtered out.
For example, the pattern "admitted to ..." extracts the property value "that place's best school" from the statement "was admitted to that place's best school". The stop words in this property value are "that", "best" and the possessive particle, three in total. When the preset stop-word number threshold is set to 1, the number of stop words in "that place's best school" is greater than 1, so the property value is filtered out.
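Filters C and D can be sketched together as below. This is an illustration under assumptions: property values are given as pre-segmented token lists, length is measured in tokens as a stand-in for the character count used with Chinese text, and the function names, limits and English tokens are hypothetical. Filter B is not shown because it depends on an external part-of-speech tagger.

```python
def passes_length(value_tokens, min_tokens, max_tokens):
    """Filter C: the value's length must fall inside the preset range."""
    return min_tokens <= len(value_tokens) <= max_tokens

def passes_stop_words(value_tokens, stop_words, max_stop_words):
    """Filter D: the value may contain at most max_stop_words stop words."""
    return sum(1 for token in value_tokens if token in stop_words) <= max_stop_words

def filter_values(candidates, stop_words, max_stop_words, min_tokens=1, max_tokens=6):
    return [tokens for tokens in candidates
            if passes_length(tokens, min_tokens, max_tokens)
            and passes_stop_words(tokens, stop_words, max_stop_words)]

candidates = [
    ["wireless", "performer", "training", "class"],
    ["that", "place", "'s", "best", "school"],  # three stop words
]
kept = filter_values(candidates, stop_words={"that", "'s", "best"}, max_stop_words=1)
```

With the stop-word threshold set to 1, the second candidate is removed, mirroring the example in the text.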
After the property value filtering module 206 has produced the filtered property value set, control passes to the seed set acquisition module 202, which matches the property value set obtained by the property value filtering module 206 against the statement set, extracts the statements containing a property value, generates new seeds in the form "statement \t property value", and adds them to the seed set. The loop iterates until the algorithm converges, yielding structured information that includes attribute words and property values. The algorithm is considered to have converged when no new pattern or new property value is extracted.
The present invention starts from a limited number of manually labeled seeds, extracts patterns from the seeds, uses those patterns to extract property values, and then uses the extracted property values to iterate in a loop, obtaining structured information containing attribute words and their corresponding property values. The method and device provided by the present invention can be used to structurally parse natural language data such as encyclopedia entries and Zhidao (question-and-answer) content into a structured database, which a search engine can then use to implement structured search (that is, vertical search). Structured search is described below through embodiment three and embodiment four.
Embodiment three,
Fig. 3 is a flowchart of the structured information search method provided by embodiment three of the present invention. As shown in Fig. 3, the method performs the following steps based on the obtained structured information:
Step S301, determine an attribute word from the query input by the user.
When a user inputs a search term (query) through a search engine, an attribute word needs to be determined from the query. This determination may use the prior art, for example a dictionary-based or template-based approach, and is not described further here.
Step S302, match the attribute word obtained in step S301 in the structured information to obtain the corresponding property value information, and include the obtained property value information in the search results returned to the user. The structured information is obtained by the method described in embodiment one.
It should be noted here that, in structured search, the query usually also contains an entity word. In step S301 both the entity word and the attribute word are in fact determined, and in step S302 the structured information queried is in fact the structured information corresponding to that entity word. That is, the structured information corresponding to each entity word needs to be established in advance. Since extracting the relation between entity words and attribute words is prior art, combining it with the way the present invention extracts the correspondence between attribute words and property values makes it possible to establish the structured information corresponding to an entity word, which includes attribute words and their corresponding property values.
For example, for the query "what school was Liu Dehua admitted to", analysis gives the entity word "Liu Dehua" and the attribute word "admitted to". The attribute word "admitted to" is looked up in the structured information established by the present invention, namely the structured information corresponding to the entity word "Liu Dehua", and the corresponding property value information "wireless performer training class" is obtained. Structured search thus returns the property value information corresponding to the attribute word. Of course, in addition to the structured search result, the results of an ordinary general search may also be returned to the user.
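The lookup in steps S301 and S302 can be sketched as a nested dictionary query. Everything here is an assumption for illustration: the function name, the substring-based recognition of entity and attribute words (the text leaves this to prior-art dictionary or template methods), and the alias table that maps surface forms to a normalized attribute word.

```python
def structured_search(query, structured_info, attribute_aliases):
    """Return the property value for the entity and attribute found in the query."""
    # S301: determine the entity word and the attribute word from the query.
    entity = next((name for name in structured_info if name in query), None)
    if entity is None:
        return None
    for alias, attribute in attribute_aliases.items():
        if alias in query:
            # S302: match the attribute word in the entity's structured information.
            return structured_info[entity].get(attribute)
    return None

structured_info = {
    "Liu Dehua": {"admitted to": "wireless performer training class"},
}
attribute_aliases = {"admitted to": "admitted to", "enrolled in": "admitted to"}
answer = structured_search(
    "what school was Liu Dehua admitted to", structured_info, attribute_aliases
)
```

A query with no known entity or attribute word yields `None`, in which case only the ordinary search results would be returned.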
The corresponding search device is described in embodiment four.
Embodiment four,
Fig. 4 is a schematic structural diagram of the device provided by embodiment four of the present invention. As shown in Fig. 4, the device may specifically include: an analysis module 401 and a matching module 402.
The analysis module 401 obtains an attribute word from the query input by the user.
The matching module 402 matches the attribute word obtained by the analysis module 401 in the structured information to obtain the corresponding property value information, and includes the obtained property value information in the search results returned to the user. The structured information is obtained by the device described in embodiment two.
It should be noted here that, in structured search, the query usually also contains an entity word, and the structured information queried is in fact the structured information corresponding to that entity word. That is, the structured information corresponding to each entity word needs to be established in advance. Since extracting the relation between entity words and attribute words is prior art, combining it with the way the present invention extracts the correspondence between attribute words and property values makes it possible to establish the structured information corresponding to an entity word, which includes attribute words and their corresponding property values. The analysis module 401 may further obtain the entity word from the query input by the user. In that case, the matching module 402 specifically matches the attribute word obtained by the analysis module 401 in the structured information corresponding to the entity word, obtaining the corresponding property value information.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A structured information extraction method, characterized by comprising:
S1, obtaining, from a corpus, a statement set containing a preset attribute word;
S2, obtaining, from the statement set, statements containing a preset property value as seeds, forming a seed set;
S3, extracting a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, the pattern including: the character strings adjacent to the property value and the feature values in the seed that meet preset feature requirements;
S4, using each pattern in the pattern set to extract property values from the statement set obtained in step S1, obtaining the property value set corresponding to the preset attribute word;
S5, matching the property value set against the statement set, extracting new seeds, and going to step S3 until the algorithm converges, obtaining structured information, the structured information including: attribute words and property values;
wherein step S3 further includes filtering the pattern set corresponding to the preset attribute word, the filter method being at least one of the following: filtering based on the frequency of occurrence of the patterns, or filtering based on the semantic relevance of the patterns.
2. The method according to claim 1, characterized in that, in step S3, the character strings adjacent to the property value include: words, phrases or symbols;
the feature values in the seed that meet the preset feature requirements include at least one of the following: the verb preceding the property value, the verb following it, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters of the property value.
3. The method according to claim 1, characterized in that step S4 further includes filtering the property value set corresponding to the preset attribute word, the filter method being one or a combination of the following:
A, filtering based on word frequency: property values whose word frequency is below the preset requirement are filtered out;
B, filtering based on part-of-speech features: property values whose part of speech does not match the characteristics of a property value are filtered out;
C, filtering based on the number of characters of the property value: property values whose number of characters does not meet the preset requirement are filtered out;
D, filtering based on the number of stop words found when the property value is segmented into words: property values containing more stop words than the preset requirement are filtered out.
4. The method according to claim 3, characterized in that, before the filtering based on word frequency, it is first judged whether the property values corresponding to the preset attribute word can occur repeatedly, the judgment rule being: determining whether an extracted property value corresponding to the preset attribute word occurs in multiple statements of the statement set and, if so, considering the property values corresponding to this preset attribute word to be repeatable;
for repeatable property values, filtering is based on the word frequency of the property value, and property values whose word frequency is below the preset property value word-frequency threshold are filtered out;
for non-repeating property values, filtering is based on the word frequency of the context character strings of the property value, and property values whose context character strings have a word frequency below the preset word-frequency threshold are filtered out.
5. A structured information search method, characterized in that the method comprises the following steps:
S6, obtaining an attribute word from the query input by a user;
S7, matching the attribute word obtained in step S6 in structured information to obtain the corresponding property value information, and including the obtained property value information in the search results returned to the user;
wherein the structured information is obtained by the method according to claim 1.
6. The search method according to claim 5, characterized in that step S6 further includes: obtaining an entity word from the query input by the user;
the matching in step S7 is specifically: matching the attribute word obtained in step S6 in the structured information corresponding to the entity word, obtaining the corresponding property value information.
7. A structured information extraction device, characterized by comprising:
a statement set acquisition module, configured to obtain, from a corpus, a statement set containing a preset attribute word;
a seed set acquisition module, configured to obtain, from the statement set, statements containing a preset property value as seeds, forming a seed set;
a pattern extraction module, configured to extract a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, the pattern including: the character strings adjacent to the property value and the feature values in the seed that meet preset feature requirements;
a property value extraction module, configured to use each pattern in the pattern set to extract property values from the statement set obtained by the statement set acquisition module, obtaining the property value set corresponding to the preset attribute word; the seed set acquisition module further matching the property value set obtained by the property value extraction module against the statement set, extracting new seeds and adding them to the seed set;
the property value extraction module obtaining structured information after the algorithm converges, the structured information including: attribute words and property values;
the device further comprising:
a pattern filtering module, configured to filter the pattern set extracted by the pattern extraction module and provide the filtered set to the property value extraction module, the filter method being at least one of the following: filtering based on the frequency of occurrence of the patterns, or filtering based on the semantic relevance of the patterns.
8. The device according to claim 7, characterized in that, in the pattern extraction module, the character strings adjacent to the property value include: words, phrases or symbols;
the feature values in the seed that meet the preset feature requirements include at least one of the following: the verb preceding the property value, the verb following it, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters of the property value.
9. The device according to claim 7, characterized in that the device further includes a property value filtering module, configured to filter the property value set corresponding to the preset attribute word and supply the filtered set to the seed set acquisition module, the filtering using one or a combination of the following:
A. filtering based on word frequency: property values whose word frequency is below a preset requirement are filtered out;
B. filtering based on part-of-speech features: property values whose part of speech does not match the property value features are filtered out;
C. filtering based on the number of words in the property value: property values whose number of words exceeds a preset limit are filtered out;
D. filtering based on the number of stop words in the segmented property value: property values containing more stop words than a preset requirement allows are filtered out.
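Filters A to D above can be combined in a single pass, sketched here for illustration. The thresholds, the stop-word list, the allowed part-of-speech set and the `toy_pos` tagger are all hypothetical stand-ins; the claim permits using any one filter or any combination, whereas this sketch simply applies all four.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "is"}  # hypothetical stop-word list

def toy_pos(value):
    """Stand-in tagger for the example; a real system would use a POS tagger."""
    first = value.split()[0]
    if first.isdigit():
        return "NUM"
    if first.endswith("ly"):
        return "ADV"
    return "NOUN"

def filter_values(candidates, pos_of, min_freq=2,
                  allowed_pos=frozenset({"NUM", "NOUN"}),
                  max_words=3, max_stop_words=1):
    """Return candidate values that survive filters A, B, C and D."""
    freq = Counter(candidates)
    kept = []
    for value, count in freq.items():
        tokens = value.split()
        if count < min_freq:                      # A: word frequency too low
            continue
        if pos_of(value) not in allowed_pos:      # B: unsuitable part of speech
            continue
        if len(tokens) > max_words:               # C: too many words
            continue
        if sum(t in STOP_WORDS for t in tokens) > max_stop_words:  # D
            continue
        kept.append(value)
    return kept

CANDIDATES = [
    "180 m", "180 m",
    "the tower is 226 m", "the tower is 226 m",
    "quickly", "quickly",
    "95 m",
    "the a of", "the a of",
]
kept = filter_values(CANDIDATES, toy_pos)
```

Here "95 m" falls to filter A, "quickly" to B, "the tower is 226 m" to C, and "the a of" to D, leaving only "180 m".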
10. The device according to claim 9, characterized in that, before filtering based on word frequency, the property value filtering module first judges whether the property values corresponding to the preset attribute word can occur repeatedly, the judgment rule being: judge whether the property values extracted for the preset attribute word occur in multiple statements of the statement set, and if so, consider the property values corresponding to this preset attribute word to be repeatable;
For property values that can occur repeatedly, filtering is performed according to the word frequency of the property value, and property values whose word frequency is below a preset property value word frequency threshold are filtered out;
For property values that do not occur repeatedly, filtering is performed according to the word frequency of the context character string of the property value, and property values whose context character string has a word frequency below a preset word frequency threshold are filtered out.
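The two-branch word frequency filter in this claim might look roughly like the sketch below. The record layout `(statement_id, value, context)`, the function names and both thresholds are assumptions for illustration; the source does not specify data structures or threshold values.

```python
from collections import Counter

def is_repeatable(extractions):
    """True if some extracted value occurs in more than one statement."""
    statements_per_value = {}
    for statement_id, value, _context in extractions:
        statements_per_value.setdefault(value, set()).add(statement_id)
    return any(len(ids) > 1 for ids in statements_per_value.values())

def frequency_filter(extractions, value_threshold=2, context_threshold=2):
    """Keep values by their own frequency, or by their context's frequency."""
    if is_repeatable(extractions):
        # Repeatable attribute: count the values themselves.
        counts = Counter(value for _, value, _ in extractions)
        return {v for v, n in counts.items() if n >= value_threshold}
    # Non-repeatable attribute: each value is unique, so count its context.
    context_counts = Counter(context for _, _, context in extractions)
    return {v for _, v, c in extractions if context_counts[c] >= context_threshold}

# An attribute such as a color, where the same value recurs across statements.
COLOR = [(0, "red", "color is"), (1, "red", "color is"), (2, "blu", "color :")]
# An attribute such as a serial number, where every value is unique.
SERIAL = [(0, "A-1001", "serial is"), (1, "A-1002", "serial is"),
          (2, "call", "serial to")]

color_kept = frequency_filter(COLOR)
serial_kept = frequency_filter(SERIAL)
```

The second branch is what keeps unique values such as identifiers from being discarded wholesale: their own frequency is always one, so the frequency of the surrounding context string is used as the evidence instead.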
11. A search device for structured messages, characterized in that the device includes:
Analysis module, configured to obtain an attribute word from the query input by a user;
Matching module, configured to match the attribute word obtained by the analysis module against the structured message, obtain the corresponding property value information, and return the obtained property value information to the user as part of the search results;
Wherein the structured message is obtained using the device according to claim 7.
12. The search device according to claim 11, characterized in that the analysis module further obtains an entity word from the query input by the user;
The matching module specifically uses the attribute word obtained by the analysis module to match against the structured message corresponding to the entity word, and obtains the corresponding property value information.
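The analysis and matching modules of claims 11 and 12 can be illustrated with a small lookup sketch. The data, the entity names and the substring-based query analysis are all hypothetical; a production system would use proper query parsing and entity recognition rather than substring tests.

```python
# Hypothetical structured messages: entity word -> {attribute word: value}.
STRUCTURED = {
    "Acme Phone X": {"price": "$199", "weight": "150 g"},
    "Acme Phone Y": {"price": "$299"},
}

def analyze(query, structured):
    """Analysis module: find an entity word and an attribute word in the query."""
    attribute_words = sorted({a for attrs in structured.values() for a in attrs})
    entity = next((e for e in structured if e in query), None)
    attribute = next((a for a in attribute_words if a in query), None)
    return entity, attribute

def search(query, structured):
    """Matching module: return (entity, attribute, value) triples for the query."""
    entity, attribute = analyze(query, structured)
    if attribute is None:
        return []
    if entity is not None:
        # Claim 12: match only within the structured message of that entity.
        value = structured[entity].get(attribute)
        return [(entity, attribute, value)] if value is not None else []
    # Claim 11: no entity word, so match the attribute word across all entries.
    return [(e, attribute, attrs[attribute])
            for e, attrs in structured.items() if attribute in attrs]

hits = search("Acme Phone X price", STRUCTURED)
```

The returned triples would then be rendered inside the search results page; how they are presented is outside the scope of this sketch.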
CN201110459457.5A 2011-12-31 2011-12-31 A kind of structured message abstracting method, searching method and device Active CN103186633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110459457.5A CN103186633B (en) 2011-12-31 2011-12-31 A kind of structured message abstracting method, searching method and device

Publications (2)

Publication Number Publication Date
CN103186633A CN103186633A (en) 2013-07-03
CN103186633B true CN103186633B (en) 2016-08-17

Family

ID=48677802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110459457.5A Active CN103186633B (en) 2011-12-31 2011-12-31 A kind of structured message abstracting method, searching method and device

Country Status (1)

Country Link
CN (1) CN103186633B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488105B (en) * 2015-11-19 2019-11-05 百度在线网络技术(北京)有限公司 The treating method and apparatus of the method for building up of information extraction template, knowledge data
CN106407377B (en) * 2016-09-12 2020-03-03 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN107341171B (en) * 2017-05-03 2021-07-27 刘洪利 Method for extracting data feature template and method and system for applying template
CN107632975A (en) * 2017-08-09 2018-01-26 联动优势科技有限公司 A kind of dictionary method for building up and equipment
CN110245329A (en) * 2018-03-07 2019-09-17 珠海金山办公软件有限公司 Text managemant method, apparatus, electronic equipment and computer readable storage medium
CN110162786B (en) * 2019-04-23 2024-02-27 百度在线网络技术(北京)有限公司 Method and device for constructing configuration file and extracting structured information
CN111309853B (en) * 2019-09-03 2024-03-22 东南大学 Code searching method based on structured information
CN111666417B (en) * 2020-04-13 2023-06-23 百度在线网络技术(北京)有限公司 Method, device, electronic equipment and readable storage medium for generating synonyms
CN111695518B (en) * 2020-06-12 2023-09-29 北京百度网讯科技有限公司 Method and device for labeling structured document information and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702167A (en) * 2009-11-03 2010-05-05 上海第二工业大学 Method for extracting attribution and comment word with template based on internet
CN101937433A (en) * 2009-06-29 2011-01-05 天津一度搜索网络科技有限公司 Real-time searching method of product
CN102200983A (en) * 2010-03-25 2011-09-28 日电(中国)有限公司 Attribute extraction device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5372536B2 (en) * 2009-01-28 2013-12-18 ソニー株式会社 Information processing apparatus, information processing method, and program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
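Automatic generation of Chinese entity relations based on bootstrapping (基于Boot Strapping 的中文实体关系自动生成); Zhang Suxiang et al.; Microelectronics & Computer (《微电子学与计算机》); 20061205; Vol. 23 (No. 12); pp. 15-18 *
Automatic acquisition of information extraction templates based on similarity computation (基于相似计算的信息抽取模板自动获取方法); Ye Na et al.; Proceedings of the 2nd National Student Workshop on Computational Linguistics (《第二届全国学生计算语言学研讨会论文集》); 20040801; pp. 434-439 *
Research on automatic acquisition of person attributes from Wikipedia (维基百科人物属性自动获取方法研究); Meng Xinping et al.; Proceedings of the 5th National Youth Workshop on Computational Linguistics (《第五届全国青年计算语言学研讨会论文集》); 20101011; pp. 452-458 *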

Similar Documents

Publication Publication Date Title
CN103186633B (en) A kind of structured message abstracting method, searching method and device
CN103886063B (en) A kind of text searching method and device
CN103927358B (en) text search method and system
KR101661198B1 (en) Method and system for searching by using natural language query
CN104281702B (en) Data retrieval method and device based on electric power critical word participle
CN105447080B (en) A kind of inquiry complementing method in community's question and answer search
CN104021198B (en) The relational database information search method and device indexed based on Ontology
RU2004108667A (en) SEARCH FOR A RANDOM TEXT AND SEARCH FOR ATTRIBUTES IN THE DATA OF THE ELECTRONIC GUIDE FOR PROGRAMS
CN106446018B (en) Query information processing method and device based on artificial intelligence
CN106202211A (en) A kind of integrated microblogging rumour recognition methods based on microblogging type
CN102073729A (en) Relationship knowledge sharing platform and implementation method thereof
CN101196898A (en) Method for applying phrase index technology into internet search engine
CN105718585B (en) Document and label word justice correlating method and its device
CN105528437A (en) Question-answering system construction method based on structured text knowledge extraction
CN103123624A (en) Method of confirming head word, device of confirming head word, searching method and device
CN111190900A (en) JSON data visualization optimization method in cloud computing mode
CN101788988A (en) Information extraction method
CN111475625A (en) News manuscript generation method and system based on knowledge graph
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
KR20210130976A (en) Device, method and computer program for deriving response based on knowledge graph
JP5504097B2 (en) Binary relation classification program, method and apparatus for classifying semantically similar word pairs into binary relation
CN103514289A (en) Method and device for building interest entity base
CN105956158A (en) Automatic extraction method of network neologism on the basis of mass microblog texts and use information
CN109284362A (en) A kind of content search method and system
Menaha et al. Question answering system using web snippets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant