CN103186633B - Structured information extraction method, search method and device - Google Patents
- Publication number: CN103186633B
- Application number: CN201110459457.5A
- Authority: CN (China)
- Prior art keywords
- property value
- word
- pattern
- seed
- value
- Legal status: Active (Google Patents notes that the listed status is an assumption, not a legal conclusion)
Landscapes
- Machine Translation (AREA)
Abstract
The invention provides a structured information extraction method, a search method, and corresponding devices. The structured information extraction method includes: S1, obtaining from a corpus a sentence set containing a preset attribute word; S2, obtaining from the sentence set the sentences that contain a preset attribute value and using them as seeds to form a seed set; S3, extracting a pattern from each seed to form the pattern set corresponding to the preset attribute word; S4, using each pattern to extract attribute values from the sentence set obtained in step S1, which yields the attribute value set corresponding to the preset attribute word; S5, matching the attribute value set against the sentence set to extract new seeds, then returning to step S3 until the algorithm converges, which yields structured information. Compared with the prior art, the invention can automatically build templates and mine attribute values to extract structured information. This saves human effort and improves the efficiency and recall of a search engine.
Description
[technical field]
The present invention relates to the fields of information retrieval and natural language processing. In particular, it relates to a structured information extraction method, a search method, and corresponding devices.
[background art]
With the development of communication technology and networks, the ever-growing mass of information resources in every field poses a huge challenge for organizing and retrieving information. Search engines are an important way for people to obtain information. Given an input search term (query), a search engine extracts the required information from web pages and returns search results. As the volume of information keeps growing, several factors make search engines inefficient and lower their recall: the information is unstructured, it comes in a wide variety of forms, and document content covers a broad scope. As a result, it becomes increasingly difficult for users to find the answers they want directly in a search engine.
Structured data search, also called vertical search, is a newer search mode. It addresses the shortcomings of general search, such as an excess of information, imprecise queries, and insufficient depth. Structured data search requires web page information to be extracted into structured form: the unstructured data of a web page is converted into specific structured data, stored in a database, and indexed for search. Web page information consists of sentences, and structured data is extracted from those sentences. In a single sentence, some expressions associate an entity with an attribute value through an action. For example: "In 1978, Liu Dehua was admitted to the Wireless Artiste Training Class." Here the entity word is "Liu Dehua", the attribute word is "was admitted to", and "Wireless Artiste Training Class" is the attribute value. Entity words in web pages are usually fairly fixed, whereas attribute words can take many forms. For example, "was admitted to" can also be expressed by synonyms such as "got into" or "enrolled at". Structured extraction analyzes a sentence in order to extract the attribute value that corresponds to the entity word and the attribute word in it.
Existing structured data extraction methods identify entity words and attribute words by matching against a preset entity word dictionary and a preset attribute word dictionary, respectively. Matching entity words or attribute words are found in a web page, and the corresponding attribute values are then extracted. If the attribute word dictionary does not contain a particular form of expression, the corresponding attribute value cannot be identified, which lowers recall. Other synonymous attribute words can be added to the attribute word dictionary, either manually or with the help of a synonym dictionary. Manual addition, however, consumes human resources, is inefficient, and still yields relatively low recall. The synonym-dictionary approach also suffers from relatively low recall.
[summary of the invention]
The invention provides a structured information extraction method, a search method, and corresponding devices. They can automatically build templates and mine attribute values in order to extract structured information, which saves human resources and improves the efficiency and recall of search engines.
The specific technical solution is as follows:
A structured information extraction method, the method including:
S1, obtaining from a corpus a sentence set containing a preset attribute word;
S2, obtaining from the sentence set the sentences containing a preset attribute value and using them as seeds to form a seed set;
S3, extracting a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, where a pattern includes the character strings adjacent to the attribute value and the feature values in the seed that satisfy a preset feature requirement;
S4, using each pattern in the pattern set to extract attribute values from the sentence set obtained in step S1, which yields the attribute value set corresponding to the preset attribute word.
According to a preferred embodiment of the present invention, the method further includes, after step S4:
S5, matching the attribute value set against the sentence set to extract new seeds, then returning to step S3 until the algorithm converges, which yields structured information that includes attribute words and attribute values.
According to a preferred embodiment of the present invention, in step S3 the character strings adjacent to the attribute value include words, phrases, or symbols;
the feature values in the seed that satisfy the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
According to a preferred embodiment of the present invention, step S3 further includes filtering the pattern set corresponding to the preset attribute word, using at least one of the following methods: filtering based on how often each pattern occurs, or filtering based on the semantic relevance of each pattern.
According to a preferred embodiment of the present invention, step S4 further includes filtering the attribute value set corresponding to the preset attribute word, using one or a combination of the following methods:
A, filtering based on term frequency: attribute values whose term frequency is below a preset requirement are filtered out;
B, filtering based on part-of-speech features: attribute values whose part of speech does not match the features of an attribute value are filtered out;
C, filtering based on the number of characters in the attribute value: attribute values whose character count exceeds a preset requirement are filtered out;
D, filtering based on the number of stop words in the segmented attribute value: attribute values containing more stop words than a preset requirement allows are filtered out.
According to a preferred embodiment of the present invention, before filtering based on term frequency, the method first determines whether the attribute values corresponding to the preset attribute word occur repeatedly. The rule is: check whether the extracted attribute values corresponding to the preset attribute word occur in multiple sentences of the sentence set; if so, the attribute values corresponding to this preset attribute word are regarded as repeatedly occurring;
for repeatedly occurring attribute values, filtering uses the term frequency of the attribute value itself: attribute values whose term frequency is below a preset attribute value term frequency threshold are filtered out;
for non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings: attribute values whose context strings have a term frequency below a preset term frequency threshold are filtered out.
A structured information search method, the method including the following steps:
S6, obtaining an attribute word from the query entered by a user;
S7, matching the attribute word obtained in step S6 against the structured information to obtain the corresponding attribute value information, and returning that attribute value information to the user as part of the search results;
wherein the structured information is obtained with the structured information extraction method of the present invention.
According to a preferred embodiment of the present invention, step S6 further includes obtaining an entity word from the query entered by the user;
the matching in step S7 specifically consists of matching the attribute word obtained in step S6 against the structured information corresponding to the entity word, which yields the corresponding attribute value information.
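Steps S6 and S7 can be illustrated with a minimal Python sketch. It assumes the extracted structured information sits in an in-memory dictionary of the form entity word → attribute word → attribute values, and that words are detected in the query by plain substring matching. `STORE`, `ATTRIBUTE_WORDS` and `search` are hypothetical names used only for illustration; they are not the patent's implementation.

```python
# Hypothetical store assumed to be produced by the extraction method:
# entity word -> attribute word -> list of attribute values.
STORE = {
    "Fan Bingbing": {"film": ["Guanyin Mountain"]},
    "Liu Dehua": {"was admitted to": ["Wireless Artiste Training Class"]},
}
ATTRIBUTE_WORDS = ["film", "was admitted to"]


def search(query, store=STORE, attribute_words=ATTRIBUTE_WORDS):
    """S6: find an attribute word (and an entity word, if any) in the query.
    S7: look up the matching attribute values in the structured store."""
    attribute = next((a for a in attribute_words if a in query), None)
    if attribute is None:
        return []
    entity = next((e for e in store if e in query), None)
    # With an entity word, restrict matching to that entity's records;
    # without one, match the attribute word across all entities.
    entities = [entity] if entity else list(store)
    return [v for e in entities for v in store[e].get(attribute, [])]


print(search("Fan Bingbing film"))      # ['Guanyin Mountain']
print(search("which film"))             # ['Guanyin Mountain']
print(search("Fan Bingbing birthday"))  # []
```

A production system would use segmentation and an index instead of substring scans. The sketch only shows how the entity word narrows the match.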
A structured information extraction device, the device including:
a sentence set acquisition module, configured to obtain from a corpus a sentence set containing a preset attribute word;
a seed set acquisition module, configured to obtain from the sentence set the sentences containing a preset attribute value and use them as seeds to form a seed set;
a pattern extraction module, configured to extract a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word, where a pattern includes the character strings adjacent to the attribute value and the feature values in the seed that satisfy a preset feature requirement;
an attribute value extraction module, configured to use each pattern in the pattern set to extract attribute values from the sentence set obtained by the sentence set acquisition module, which yields the attribute value set corresponding to the preset attribute word.
According to a preferred embodiment of the present invention, the seed set acquisition module matches the attribute value set obtained by the attribute value extraction module against the sentence set, extracts new seeds, and adds them to the seed set;
after the algorithm converges, the attribute value extraction module obtains structured information, which includes attribute words and attribute values.
According to a preferred embodiment of the present invention, in the pattern extraction module the character strings adjacent to the attribute value include words, phrases, or symbols;
the feature values in the seed that satisfy the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
According to a preferred embodiment of the present invention, the device further includes a pattern filtering module. It filters the pattern set extracted by the pattern extraction module before supplying it to the attribute value extraction module, using at least one of the following methods: filtering based on how often each pattern occurs, or filtering based on the semantic relevance of each pattern.
According to a preferred embodiment of the present invention, the device further includes an attribute value filtering module. It filters the attribute value set corresponding to the preset attribute word before supplying it to the seed set acquisition module, using one or a combination of the following methods:
A, filtering based on term frequency: attribute values whose term frequency is below a preset requirement are filtered out;
B, filtering based on part-of-speech features: attribute values whose part of speech does not match the features of an attribute value are filtered out;
C, filtering based on the number of characters in the attribute value: attribute values whose character count exceeds a preset requirement are filtered out;
D, filtering based on the number of stop words in the segmented attribute value: attribute values containing more stop words than a preset requirement allows are filtered out.
According to a preferred embodiment of the present invention, before filtering based on term frequency, the attribute value filtering module first determines whether the attribute values corresponding to the preset attribute word occur repeatedly. The rule is: check whether the extracted attribute values corresponding to the preset attribute word occur in multiple sentences of the sentence set; if so, the attribute values corresponding to this preset attribute word are regarded as repeatedly occurring;
for repeatedly occurring attribute values, filtering uses the term frequency of the attribute value itself: attribute values whose term frequency is below a preset attribute value term frequency threshold are filtered out;
for non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings: attribute values whose context strings have a term frequency below a preset term frequency threshold are filtered out.
A structured information search device, the device including:
an analysis module, configured to obtain an attribute word from the query entered by a user;
a matching module, configured to match the attribute word obtained by the analysis module against the structured information to obtain the corresponding attribute value information, and to return that attribute value information to the user as part of the search results;
wherein the structured information is obtained with the structured information extraction device of the present invention.
According to a preferred embodiment of the present invention, the analysis module further obtains an entity word from the query entered by the user;
the matching module specifically matches the attribute word obtained by the analysis module against the structured information corresponding to the entity word, which yields the corresponding attribute value information.
As the above technical solutions show, the structured information extraction method and device provided by the present invention extract templates (patterns) from a limited number of seeds and use the resulting patterns to extract attribute values. They then repeat this cycle iteratively. In this way they can automatically build templates and mine an attribute value dictionary to extract structured information, which saves human resources and improves the efficiency and recall of search engines.
[description of drawings]
Fig. 1 is a flowchart of the structured information extraction method provided in Embodiment 1;
Fig. 2 is a structural diagram of the structured information extraction device provided in Embodiment 2;
Fig. 3 is a flowchart of the structured information search method provided in Embodiment 3;
Fig. 4 is a structural diagram of the structured information search device provided in Embodiment 4.
[detailed description of the invention]
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Fig. 1 is a flowchart of the structured information extraction method provided in this embodiment. As shown in Fig. 1, the method includes:
S101, obtaining from a corpus a sentence set containing a preset attribute word.
The corpus is a resource collection of a certain size and can be chosen according to the actual application scenario. For example, the entries and article contents of an entire encyclopedia can serve as the corpus. Alternatively, only the entries and article contents within a particular range, or under certain categories of the encyclopedia, can be used.
A preset attribute word is a lexical item that expresses an action of an entity or some aspect of its attributes. It can be a verb or a noun, for example the nouns "film" and "TV drama" or the verbs "was admitted to" and "got into".
Preset attribute words can be set manually. They can also form a dynamically generated list, built by machine learning from word frequency statistics over a corpus of a certain size. The corpus is first segmented and filtered to remove stop words and lexical items whose parts of speech are unsuitable for attribute words. The top N remaining lexical items by frequency are then taken as preset attribute words, forming the preset attribute word list.
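The frequency-based construction of the preset attribute word list can be sketched in Python as follows. This is a minimal illustration under several assumptions: segmentation and POS tagging happen upstream, the tag labels "n" and "v" are a hypothetical tag set, and `build_attribute_word_list` is an illustrative name rather than part of the patent.

```python
from collections import Counter

# Assumption: each sentence is already segmented and POS-tagged as
# (word, tag) pairs. "n" (noun) and "v" (verb) are hypothetical labels
# standing in for "parts of speech suitable for attribute words".
ATTRIBUTE_WORD_TAGS = {"n", "v"}


def build_attribute_word_list(tagged_sentences, stop_words, top_n):
    """Count nouns/verbs that are not stop words and keep the top_n most frequent."""
    counts = Counter(
        word
        for sentence in tagged_sentences
        for word, tag in sentence
        if word not in stop_words and tag in ATTRIBUTE_WORD_TAGS
    )
    return [word for word, _ in counts.most_common(top_n)]


corpus = [
    [("the", "u"), ("film", "n"), ("won", "v")],
    [("film", "n"), ("the", "u"), ("admitted", "v")],
    [("film", "n"), ("won", "v"), ("quickly", "d")],
]
print(build_attribute_word_list(corpus, {"the"}, 2))  # ['film', 'won']
```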
Using this preset attribute word list, sentence sets containing the different preset attribute words are obtained from the corpus chosen for sentence-set acquisition. Note that the corpus used to learn the preset attribute words can differ from the corpus used to obtain the sentence sets. For example, the preset attribute words can be obtained in advance from a natural language database.
For example, consider extracting the sentence set containing "film" from the article contents in the entertainment-figure category of an encyclopedia. Here the corpus consists of the entries and texts under that category, and "film" is the preset attribute word.
The entertainment-figure category contains many encyclopedia entries. The text of every entry under all entertainment-figure categories is collected and first split into sentences. All sentences under the whole category together form the corpus, and the sentence set containing the preset attribute word "film" is obtained from it. For example, "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival." is one sentence in that sentence set.
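Splitting entry texts into sentences and keeping those that contain the preset attribute word can be sketched as below. The split rule, which breaks after Chinese or ASCII sentence-final punctuation, is an assumption for illustration. `split_sentences` and `sentence_set` are hypothetical helper names.

```python
import re

# Split after Chinese or ASCII sentence-final punctuation. This is a
# zero-width split, which re.split supports since Python 3.7.
SENTENCE_END = re.compile(r"(?<=[。！？.!?])")


def split_sentences(text):
    """Break a text into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def sentence_set(texts, attribute_word):
    """All sentences from the corpus texts that contain the attribute word."""
    return [s for text in texts for s in split_sentences(text) if attribute_word in s]


entries = ["With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress. She is an actress."]
print(sentence_set(entries, "film"))
```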
Note that when the preset attribute word list is built, the preset attribute words can also be normalized by semantic relation. For example, verbs such as "was admitted to", "got into", and "went to school at" belong to the same semantic group and are placed in the same attribute word list. Likewise, "film", "motion picture", and "blockbuster" share the same semantic relation and are placed in the same attribute word list.
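The normalization can be sketched as a lookup table that maps every synonym to one canonical attribute word. The synonym groups below are hypothetical examples taken from the paragraph above. In practice they would come from a synonym resource, which this sketch does not model.

```python
# Hypothetical synonym groups; each canonical key stands for its whole group.
SYNONYM_GROUPS = {
    "was admitted to": ["was admitted to", "got into", "went to school at"],
    "film": ["film", "motion picture", "blockbuster"],
}


def build_normalizer(groups):
    """Map every synonym to its canonical attribute word."""
    return {syn: canon for canon, syns in groups.items() for syn in syns}


NORMALIZER = build_normalizer(SYNONYM_GROUPS)


def normalize(word):
    # Words outside every group are returned unchanged.
    return NORMALIZER.get(word, word)


print(normalize("motion picture"))  # film
print(normalize("TV drama"))        # TV drama
```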
S102, obtaining from the sentence set the sentences containing a preset attribute value and using them as seeds to form a seed set.
The sentence set is matched against an attribute value set, either labeled manually or obtained by machine. Sentences containing both the preset attribute word and an attribute value are taken as seeds. In the initial round, the matching uses a limited number of manually labeled attribute values, or the attribute values listed for the preset attribute word in an existing dictionary. The sentences that contain both the preset attribute word and an attribute value then form the seed set.
In the seed set, the attribute value in each sentence is marked, and each seed is stored in the form "sentence\tattribute value". For example: "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival.\tGuanyin Mountain". In this seed, "Guanyin Mountain" is the attribute value corresponding to the preset attribute word "film". Another example: "In 1978, Liu Dehua was admitted to the Wireless Artiste Training Class.\tWireless Artiste Training Class". In this seed, "Wireless Artiste Training Class" is the attribute value corresponding to the preset attribute word "was admitted to".
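Seed construction and the "sentence\tattribute value" storage form can be sketched as follows. It assumes plain substring matching and assumes that sentences contain no tab characters. `build_seeds` and `parse_seed` are hypothetical names.

```python
def build_seeds(sentences, attribute_word, known_values):
    """Pair each sentence containing the attribute word with every known value it contains."""
    seeds = []
    for sentence in sentences:
        if attribute_word not in sentence:
            continue
        for value in known_values:
            if value in sentence:
                # Stored as "sentence<TAB>attribute value".
                seeds.append(f"{sentence}\t{value}")
    return seeds


def parse_seed(seed):
    # Assumes the sentence itself contains no tab character.
    sentence, value = seed.split("\t")
    return sentence, value


sentences = [
    "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress.",
    "Yang Lixin starred in a TV drama.",
]
seeds = build_seeds(sentences, "film", ["Guanyin Mountain"])
print(parse_seed(seeds[0])[1])  # Guanyin Mountain
```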
S103, extracting a template (pattern) from each seed to form the pattern set corresponding to the preset attribute word.
A pattern includes the character strings adjacent to the attribute value and the feature values in the seed that satisfy a preset feature requirement. The character strings adjacent to the attribute value can include, but are not limited to, words, phrases, and symbols.
Take the seed "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival.\tGuanyin Mountain". The character strings adjacent to the attribute value "Guanyin Mountain" are the preceding title mark "《" and the following title mark "》". In the seed "In 1978, Liu Dehua was admitted to the Wireless Artiste Training Class.\tWireless Artiste Training Class", the two strings adjacent to the attribute value "Wireless Artiste Training Class" are "was admitted to" and ".".
The feature values that satisfy the preset feature requirement include at least one of the following: the verb preceding the attribute value, the verb following it, the nearest noun, the part of speech of the attribute value, the number of nouns in the attribute value, and the number of characters in the attribute value.
Take again the seed "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival.\tGuanyin Mountain". Its feature values can include the following verb "won" and the nearest noun "film". The part of speech of the attribute value is noun, and it contains 1 noun. Its character count is 3, counting the three Chinese characters of the original title.
The words adjacent to the attribute value, together with the feature values in the seed that satisfy the preset feature requirement, constitute the pattern for that seed. Suppose the preset feature requirement selects the nearest noun and the following verb. The pattern obtained from the seed above is then "film" + "won". The present invention does not restrict the storage format of patterns; for example, they can also be stored in a table.
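Building a pattern from one seed can be sketched in Python as follows. The sketch assumes the seed is already segmented and POS-tagged and that the attribute value's token span is known. The tag labels ("n" noun, "v" verb, "w" punctuation, "nr"/"nz" names) and the four-field `Pattern` layout are hypothetical choices covering a subset of the features listed above.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    left: str          # token immediately before the attribute value
    right: str         # token immediately after the attribute value
    nearest_noun: str  # closest noun before the value ("" if none)
    next_verb: str     # first verb after the value ("" if none)


def extract_pattern(tokens, start, end):
    """tokens: list of (word, tag); the attribute value is tokens[start:end]."""
    left = tokens[start - 1][0] if start > 0 else ""
    right = tokens[end][0] if end < len(tokens) else ""
    noun = next((w for w, t in reversed(tokens[:start]) if t == "n"), "")
    verb = next((w for w, t in tokens[end:] if t == "v"), "")
    return Pattern(left, right, noun, verb)


seed_tokens = [("With", "p"), ("the", "u"), ("film", "n"), ("《", "w"),
               ("Guanyin Mountain", "nz"), ("》", "w"), (",", "w"),
               ("Fan Bingbing", "nr"), ("won", "v"), ("Best Actress", "n")]
print(extract_pattern(seed_tokens, 4, 5))
```

For this seed, the sketch yields the title marks as adjacent strings, "film" as the nearest noun, and "won" as the following verb, matching the example above.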
Extracting the adjacent character strings and the qualifying feature values from different seeds yields different patterns, which together form the pattern set. For example, in seeds whose preset attribute word is "film", the verbs before and after the marked attribute value may also be "won ... award", "attended", "promoted", "did promotion for", "took part in", and so on. The corresponding patterns pair these verbs with an adjacent title mark or with the noun "film". Examples are a title mark + "won ... award", "attended" + "film", a title mark + "promoted", "promoted for" + "film", and "took part in" + "film".
In addition, some feature values obtained from the seeds in this step may meet a preset attribute word characteristic. These feature values can be passed through filtering steps, such as term frequency or part-of-speech filtering, and then added to the preset attribute word list. The preset attribute word characteristic can be: being the verb preceding the attribute value, the verb following it, or the nearest noun.
S104, filtering the pattern set corresponding to the preset attribute word.
The pattern set obtained in step S103 is filtered by how often each pattern occurs, so that only patterns whose frequency meets a preset requirement remain. For example, the occurrences of each pattern are counted. Patterns that occur only a few times are regarded as low-quality and removed from the pattern set.
Alternatively, filtering is based on the semantic relevance of each pattern. The adjacent character strings and the qualifying feature values in each extracted pattern are matched semantically, and patterns with poor semantic relevance are filtered out. For example, the semantic relevance between the words in a pattern can be computed, and patterns with low relevance are filtered out. Filtering can also rely on an existing dictionary: a pattern is kept if its collocation appears in the dictionary and filtered out otherwise.
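The two S104 filters can be sketched as below. Patterns are simplified to (nearest noun, following verb) tuples. The semantic-relevance variant is represented only by the dictionary-lookup option described above, and the collocation dictionary is hypothetical. The thresholds and function names are illustrative.

```python
from collections import Counter


def filter_by_frequency(patterns, min_count):
    """Keep patterns that occur at least min_count times across all seeds."""
    counts = Counter(patterns)
    return {p for p, c in counts.items() if c >= min_count}


def filter_by_dictionary(patterns, collocations):
    """Keep patterns whose (noun, verb) collocation appears in the dictionary."""
    return {p for p in patterns if p in collocations}


extracted = [("film", "won")] * 3 + [("film", "attended")] * 2 + [("film", "ate")]
frequent = filter_by_frequency(extracted, 2)
print(frequent)
print(filter_by_dictionary(frequent, {("film", "won")}))
```

Keeping the two filters as separate functions lets them be chained or used alone, which matches the "at least one of" wording of the method.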
S105, using each pattern in the pattern set to extract attribute values from the sentence set obtained in step S101, which yields the attribute value set corresponding to the preset attribute word.
Such as, pattern " film " " is won " is utilized to extract property value in statement set, permissible
Obtain " Yang Lixin wins ' the favorite actor of teenager ' prize by film " the first secretary " " etc.
Statement, therefrom extracts property value " the first secretary ".Similarly, utilize pattern " " " is obtained ...
Prize " to statement set is extracted property value, can obtain that " about nineteen eighty-three, thunder " strange friend " obtains the
33 western Berlin International Film Festival special award " etc. statement, therefrom extract property value " strange friend ".
The property value these extractions obtained is as the property value set of preset attribute word " film " correspondence.
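Applying patterns to the sentence set can be sketched with regular expressions. As a simplifying assumption, each pattern is reduced to a pair of literal left and right context strings, and the attribute value is taken to be the shortest text between them. The patent's patterns also carry feature values such as the nearest noun, which this sketch ignores. `extract_values` is a hypothetical name.

```python
import re


def extract_values(sentences, patterns):
    """patterns: (left_context, right_context) pairs; the value is the text between them."""
    values = set()
    for left, right in patterns:
        # Non-greedy group: shortest span between the two literal contexts.
        regex = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for sentence in sentences:
            values.update(m.group(1).strip() for m in regex.finditer(sentence))
    return values


sentences = [
    "With the film 《The First Secretary》, Yang Lixin won an award.",
    "In 1983, Xu Lei's 《Strange Friend》 won a special award.",
]
print(extract_values(sentences, [("film 《", "》"), ("《", "》 won")]))
```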
S106, filtering the attribute value set corresponding to the preset attribute word.
The attribute value set can be filtered with one or a combination of the following methods:
A, filtering based on term frequency (TF): attribute values whose term frequency is below a preset requirement are filtered out.
The attribute values of some preset attribute words occur repeatedly, while those of others do not. Therefore, before term frequency filtering, the method first determines whether the attribute values occur repeatedly, using this rule: check whether the extracted attribute values corresponding to the preset attribute word occur in multiple sentences of the sentence set. If so, the attribute values corresponding to this preset attribute word are regarded as repeatedly occurring. For example, the preset attribute word "film" has attribute values such as "Guanyin Mountain", "The First Secretary", and "Strange Friend". The sentence set for "film" may contain several sentences mentioning "Guanyin Mountain". Examples are "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 23rd Tokyo International Film Festival." and "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival.". "Guanyin Mountain" is therefore considered repeated, and so the attribute values of the preset attribute word "film" are regarded as repeatedly occurring. If even one attribute value of a preset attribute word is repeated, all the attribute values of that word are treated as repeatedly occurring.
For repeatedly occurring attribute values, filtering uses the term frequency of the attribute value itself: attribute values whose term frequency is below a preset attribute value term frequency threshold are filtered out.
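The repeatability check and the value-level term frequency filter can be sketched as follows. As a simplification, term frequency is counted as the number of sentences that contain the value. The threshold and function names are hypothetical.

```python
def occurrences(value, sentences):
    """Number of sentences that mention the value (a simplified TF)."""
    return sum(value in s for s in sentences)


def is_repeatable(values, sentences):
    # Per the rule above: one value seen in several sentences is enough.
    return any(occurrences(v, sentences) > 1 for v in values)


def filter_by_value_tf(values, sentences, min_tf):
    """Drop values whose term frequency is below min_tf."""
    return {v for v in values if occurrences(v, sentences) >= min_tf}


sentences = [
    "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress in Tokyo.",
    "With the film 《Guanyin Mountain》, Fan Bingbing won Best Actress in Beijing.",
    "With the film 《The First Secretary》, Yang Lixin won an award.",
]
values = {"Guanyin Mountain", "The First Secretary"}
if is_repeatable(values, sentences):
    print(filter_by_value_tf(values, sentences, 2))  # {'Guanyin Mountain'}
```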
For non-repeating attribute values, filtering uses the term frequency of the attribute value's context strings: attribute values whose context strings have a term frequency below a preset term frequency threshold are filtered out. The context strings of an attribute value include shared suffix information in the sentences, or the character strings adjacent to the attribute value. Adjacent character strings can include, but are not limited to, words, phrases, and symbols.
Suppose the attribute values of the preset attribute word "film" are judged to be non-repeating. They can then be filtered by how often the shared suffix information occurs in the sentences. Shared suffix information means identical suffix text that follows several attribute values. For example, suppose the sentence set contains the following sentences:
- "Ruan Jingtian, with 《Monga》, won the 47th Taiwan Golden Horse Award for Best Leading Actor."
- "In 2004, Liu Dehua, with the film 《Infernal Affairs III》, won the Taiwan Golden Horse Award for Best Leading Actor."
- "In 2007, Liang Chaowei, with 《Lust, Caution》, won the Taiwan Golden Horse Award for Best Leading Actor."

The attribute values "Monga", "Infernal Affairs III", and "Lust, Caution" thus share the suffix "Taiwan Golden Horse Award for Best Leading Actor". Filtering then uses the number of occurrences of this shared suffix. If that number exceeds a preset suffix term frequency threshold, the attribute values "Monga", "Infernal Affairs III", and "Lust, Caution" are kept; otherwise they are filtered out.
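Shared-suffix filtering can be sketched as below. The sketch assumes the suffix following each attribute value has already been isolated upstream (it does not show how the suffix is detected). A value passes when its suffix occurs at least `min_count` times. Whether the comparison should be strict ("exceeds") is a tunable choice, and all names here are hypothetical.

```python
from collections import Counter


def filter_by_shared_suffix(value_suffixes, min_count):
    """value_suffixes: {attribute value: suffix text that follows it}."""
    suffix_counts = Counter(value_suffixes.values())
    return {v for v, suffix in value_suffixes.items() if suffix_counts[suffix] >= min_count}


GOLDEN_HORSE = "Taiwan Golden Horse Award for Best Leading Actor"
value_suffixes = {
    "Monga": GOLDEN_HORSE,
    "Infernal Affairs III": GOLDEN_HORSE,
    "Lust, Caution": GOLDEN_HORSE,
    "Falcon": "he played the lead for the first time",
}
print(sorted(filter_by_shared_suffix(value_suffixes, 2)))
# ['Infernal Affairs III', 'Lust, Caution', 'Monga']
```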
Alternatively, filtering can use the term frequency of the character strings adjacent to the attribute value, which include the words immediately before and after it. For example, consider "In 1982, in the TV drama 《Falcon》 produced by Li Tiansheng, he played the lead for the first time and became famous overnight". The words before and after the attribute value "Falcon" are "《" and "》", so filtering is based on the term frequency of these two words.
B, filtering based on part-of-speech features: attribute values whose part of speech does not match the features of an attribute value are filtered out.
Attribute values are usually nouns or noun phrases. For example, the attribute values "Guanyin Mountain", "The First Secretary", and "Strange Friend" of the preset attribute word "film" are nouns or noun phrases. Attribute values whose part of speech is a conjunction, an auxiliary word, or similar are filtered out.
C, filtering based on the number of characters in the attribute value: attribute values whose character count exceeds a preset requirement are filtered out.
The attribute values of some preset attribute words may all have the same number of characters, or a number within a certain range. Attribute values that do not meet the preset character-count requirement are filtered out. For example, when the preset attribute word is "×× date", its attribute values have a fixed format, and their character counts fall within a certain range.
D. Filtering based on the number of stop words obtained by segmenting the property value: property values whose stop-word count exceeds the preset requirement are filtered out.
An extracted property value that contains too many stop words usually has little practical meaning, so property values whose stop-word count exceeds a preset stop-word threshold are filtered out.
For example, applying the pattern "was admitted to ..." to the sentence "He was admitted to the best school in that place" yields the property value "the best school in that place" (那个地方最好的学校). The stop words in this value are "那个" (that), "最好" (best) and "的" (a structural particle), 3 in total. With the stop-word threshold set to 1, the stop-word count of this property value exceeds 1, so it is filtered out.
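The filters above reduce to simple predicates. The following Python sketch is illustrative only: it assumes property values have already been segmented and tagged by some external tool, reads the "adjacent string" of filter A as the single character on each side and its "word frequency" as how often that character pair surrounds an extracted value, and uses hypothetical tag names ("c", "u") and thresholds that the patent does not specify.

```python
from collections import Counter

# Hypothetical tags for conjunctions ("c") and auxiliary words ("u"); the patent
# names these parts of speech but not a tag set.
REJECTED_POS = {"c", "u"}


def adjacent_chars(sentence, value):
    """Return the (before, after) characters around the first occurrence of value, or None."""
    i = sentence.find(value)
    j = i + len(value)
    if i <= 0 or j >= len(sentence):
        return None  # value not found, or no character on one side
    return sentence[i - 1], sentence[j]


def filter_by_context(pairs, min_count):
    """Filter A, adjacent-string variant.

    pairs: iterable of (sentence, value). A value survives if its (before, after)
    character pair surrounds at least min_count extracted values.
    """
    contexts = {(s, v): adjacent_chars(s, v) for s, v in pairs}
    counts = Counter(c for c in contexts.values() if c is not None)
    return {v for (s, v), c in contexts.items() if c is not None and counts[c] >= min_count}


def passes_pos(tagged_value):
    """Filter B: reject values whose last (head) word is a conjunction or auxiliary word."""
    return tagged_value[-1][1] not in REJECTED_POS


def passes_length(value, min_len, max_len):
    """Filter C: keep values whose character count lies inside the preset range."""
    return min_len <= len(value) <= max_len


def passes_stop_words(tagged_value, stop_words, max_stop_words):
    """Filter D: reject values that contain more stop words than the threshold."""
    return sum(word in stop_words for word, _ in tagged_value) <= max_stop_words


# The stop-word example from the text: "那个地方最好的学校" (the best school in that place).
school = [("那个", "r"), ("地方", "n"), ("最好", "d"), ("的", "u"), ("学校", "n")]
print(passes_stop_words(school, {"那个", "最好", "的"}, max_stop_words=1))  # False
```

Each predicate can be applied independently, which matches the text's statement that the filters may be used alone or in combination.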
S107. Match the filtered property value set against the sentence set obtained in step S101 to extract new seeds, then return to step S103, until the algorithm converges and structured information is obtained. The structured information comprises attribute words and property values.
Each property value in the filtered property value set is matched against the sentence set; the sentences containing the property value are extracted and form new seeds in the form "sentence\tproperty value" (the sentence and the property value separated by a tab). The method then returns to step S103 for the next iteration, until the algorithm converges. The algorithm is considered to have converged when no new pattern or new property value is extracted.
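The iteration of steps S103 to S107 can be sketched as a bootstrapping loop. In this minimal Python sketch, a pattern is simplified to the one character on each side of the property value (for example the title marks 《 and 》), feature values are ignored and pattern filtering is omitted; `keep_value` is a hypothetical hook standing in for the property value filters, and `max_iter` is an added safety bound not mentioned in the text.

```python
import re


def patterns_from_seeds(seeds, width=1):
    """Hypothetical pattern: the `width` characters on each side of the marked value."""
    patterns = set()
    for sentence, value in seeds:
        start = sentence.find(value)
        if start < 0:
            continue
        end = start + len(value)
        left, right = sentence[max(0, start - width):start], sentence[end:end + width]
        if left and right:
            patterns.add((left, right))
    return patterns


def values_from_patterns(sentences, patterns):
    """Apply each (left, right) pattern and collect the text it encloses."""
    values = set()
    for left, right in patterns:
        regex = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for sentence in sentences:
            values.update(regex.findall(sentence))
    return values


def seeds_from_values(sentences, values):
    """Every sentence containing a known value becomes a (sentence, value) seed."""
    return {(s, v) for s in sentences for v in values if v in s}


def bootstrap(sentences, initial_values, keep_value=lambda v: True, max_iter=20):
    """Repeat pattern extraction, value extraction and seed regeneration until
    an iteration yields neither a new pattern nor a new value."""
    values = set(initial_values)
    patterns = set()
    for _ in range(max_iter):
        seeds = seeds_from_values(sentences, values)
        new_patterns = patterns_from_seeds(seeds) - patterns
        patterns |= new_patterns
        extracted = {v for v in values_from_patterns(sentences, patterns) if keep_value(v)}
        new_values = extracted - values
        if not new_patterns and not new_values:
            break  # converged
        values |= new_values
    return patterns, values
```

Starting from one annotated value, the loop discovers the pattern ("<", ">") and, through it, further values in the same sentence set, then stops once nothing new appears.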
The above is a detailed description of the method provided by the present invention. The structured information extraction device provided by the present invention is described in detail below.
Embodiment Two
Fig. 2 is a structural diagram of the structured information extraction device provided in this embodiment. As shown in Fig. 2, the device comprises: a sentence set acquisition module 201, a seed set acquisition module 202, a pattern extraction module 203, a pattern filtering module 204, a property value extraction module 205 and a property value filtering module 206.
The sentence set acquisition module 201 is configured to obtain, from the corpus, the set of sentences containing the preset attribute word.
The corpus is a resource collection of a certain scale and can be chosen according to the actual application scenario. For example, the entries and article content of an entire encyclopedia can be used as the corpus, or only the entries and article content within a particular scope or under certain encyclopedia categories.
A preset attribute word is a lexical item that expresses an action of an entity or one of its attributes. It may be a verb or a noun, for example the nouns "film" and "TV series" or the verbs "was admitted to" and "enrolled in".
Preset attribute words can be set manually, or they can form a dynamically generated list obtained by machine learning from word frequency statistics over a corpus of a certain scale. The corpus is first segmented and filtered to remove stop words and lexical items whose part of speech makes them unsuitable as attribute words; the top N lexical items by frequency are then taken as preset attribute words, forming the preset attribute word list.
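A possible sketch of building the preset attribute word list, assuming the corpus has already been segmented into (word, part-of-speech) pairs by an unspecified segmenter; the tag set, stop-word list and N are hypothetical inputs rather than values from the patent.

```python
from collections import Counter


def build_attribute_word_list(segmented_sentences, stop_words, allowed_pos, top_n):
    """Count candidate words and return the top_n most frequent as preset attribute words.

    segmented_sentences: list of sentences, each a list of (word, pos_tag) pairs.
    Stop words and words whose tag is not in allowed_pos are discarded first.
    """
    counts = Counter(
        word
        for sentence in segmented_sentences
        for word, pos in sentence
        if word not in stop_words and pos in allowed_pos
    )
    return [word for word, _ in counts.most_common(top_n)]
```

Restricting `allowed_pos` to nouns and verbs mirrors the text's remark that attribute words are usually verbs or nouns.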
Using this preset attribute word list, sets of sentences containing the different preset attribute words are obtained from the corpus used for sentence acquisition. Note that the corpus from which the preset attribute words are learned may differ from the corpus from which the sentence set is obtained; for example, the preset attribute words can be obtained in advance from a natural language database.
For example, the sentence set acquisition module 201 needs to extract the set of sentences containing "film" from the body text of the entries in the entertainment-personality category of an encyclopedia. Here the corpus is the entries and body text under the encyclopedia's entertainment-personality category, and "film" is the preset attribute word.
Each entertainment personality has an encyclopedia entry. The body text of every entry in the entertainment-personality category is obtained and first split into sentences, and all sentences in the category together constitute the corpus. The set of sentences containing the preset attribute word "film" is then obtained from this corpus. For example, "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》." is one sentence in the sentence set.
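Splitting entry text into sentences and keeping those that contain the preset attribute word might look like the following. The set of sentence-final punctuation marks is an assumption, and a production system would need a proper sentence segmenter (for example, to avoid splitting on abbreviations).

```python
import re

# Split after Chinese or Western sentence-final punctuation. Python 3.7+ allows
# re.split on a zero-width (lookbehind) match.
_SENTENCE_END = re.compile(r"(?<=[。！？.!?])")


def split_sentences(text):
    """Break one entry's body text into stripped, non-empty sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def sentence_set(entry_texts, attribute_word):
    """All sentences from all entries that contain the preset attribute word."""
    return [s for text in entry_texts for s in split_sentences(text) if attribute_word in s]
```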
Note that when the preset attribute word list is obtained, the preset attribute words can also be normalized by semantic relation. For example, the verbs "was admitted to", "enrolled in" and "went to school at" have the same meaning as "got into" and are grouped into the same attribute word list; likewise, "film", "movie" and "blockbuster" have the same semantic relation and are grouped into the same attribute word list.
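Semantic normalization can be represented as a mapping from each surface form to a canonical attribute word. The groups below are hypothetical, built only from the example words in the text, and the choice of canonical head word is arbitrary.

```python
# Hypothetical synonym groups; the patent gives these words only as examples.
SYNONYM_GROUPS = {
    "admitted to": ["admitted to", "enrolled in", "went to school at", "got into"],
    "film": ["film", "movie", "blockbuster"],
}

# Invert once so that each lookup is a single dictionary access.
CANONICAL = {member: head for head, members in SYNONYM_GROUPS.items() for member in members}


def normalize_attribute_word(word):
    """Map an attribute word to its group's canonical form; unknown words map to themselves."""
    return CANONICAL.get(word, word)
```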
The seed set acquisition module 202 is configured to obtain, from the sentence set, the sentences containing preset property values as seeds, forming the seed set.
The sentence set is matched against a manually annotated property value set or a machine-obtained property value set, and the sentences containing both the preset attribute word and a property value are taken as seeds. In the initial iteration, a limited number of manually annotated property values, or the property values that an existing dictionary lists for the preset attribute word, are matched against the sentence set, and the sentences containing both the preset attribute word and a property value are taken as seeds, forming the seed set.
In the seed set, the property value in each sentence is marked; that is, each seed is stored in the form "sentence\tproperty value", with a tab separating the two fields. For example: "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》.\tGuanyin Mountain". In this seed, "Guanyin Mountain" is the property value corresponding to the preset attribute word "film". As another example: "In 1978, Liu Dehua was admitted to the TVB artiste training class.\tTVB artiste training class". In this seed, "TVB artiste training class" is the property value corresponding to the preset attribute word "admitted to".
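Writing and reading seeds in the "sentence\tproperty value" format could be done as follows. The helper names are hypothetical, and parsing splits on the last tab so that a stray tab inside the sentence does not corrupt the value.

```python
def make_seeds(sentences, values):
    """Store each (sentence, value) match as one 'sentence<TAB>value' line."""
    return [f"{s}\t{v}" for s in sentences for v in sorted(values) if v in s]


def parse_seed(line):
    """Split a stored seed back into (sentence, value); the value follows the last tab."""
    sentence, value = line.rsplit("\t", 1)
    return sentence, value
```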
The pattern extraction module 203 is configured to extract patterns from each seed, forming the pattern set corresponding to the preset attribute word. A pattern comprises the character strings adjacent to the property value and the feature values in the seed that meet the preset feature requirements. The character strings adjacent to the property value may include, but are not limited to, words, phrases and symbols.
In the seed "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》.\tGuanyin Mountain", the character strings adjacent to the property value "Guanyin Mountain" are the preceding character "《" and the following character "》". In the seed "In 1978, Liu Dehua was admitted to the TVB artiste training class.\tTVB artiste training class", the two strings adjacent to the property value "TVB artiste training class" are "admitted to" and the full stop "。".
The feature values in the seed that meet the preset feature requirements include at least one of the following: the verb immediately before the property value, the verb immediately after it, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters in the property value.
In the seed "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》.\tGuanyin Mountain", the feature values can include the following verb "won", the nearest noun "film", the part of speech of the property value (noun), the number of nouns in the property value (1) and the number of characters in the property value (3 in the original Chinese).
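The feature values listed above can be computed from a part-of-speech-tagged seed. This sketch assumes the seed is already segmented and tagged with tags beginning with "n" for nouns and "v" for verbs (a convention shared by several Chinese taggers); taking the last word's tag as the property value's part of speech and measuring "nearest" in tokens are both assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class SeedFeatures:
    prev_verb: Optional[str]
    next_verb: Optional[str]
    nearest_noun: Optional[str]
    value_pos: str
    value_nouns: int
    value_length: int


def is_noun(tag: str) -> bool:
    return tag.startswith("n")


def is_verb(tag: str) -> bool:
    return tag.startswith("v")


def seed_features(tokens: List[Token], start: int, end: int) -> SeedFeatures:
    """Features of a seed whose property value occupies tokens[start:end]."""
    before, value, after = tokens[:start], tokens[start:end], tokens[end:]
    prev_verb = next((w for w, t in reversed(before) if is_verb(t)), None)
    next_verb = next((w for w, t in after if is_verb(t)), None)
    # (token distance, word) for nouns outside the value; the smallest distance wins.
    nouns = [(start - i, w) for i, (w, t) in enumerate(before) if is_noun(t)]
    nouns += [(k + 1, w) for k, (w, t) in enumerate(after) if is_noun(t)]
    return SeedFeatures(
        prev_verb=prev_verb,
        next_verb=next_verb,
        nearest_noun=min(nouns)[1] if nouns else None,
        value_pos=value[-1][1],  # assumption: the head (last) word's tag
        value_nouns=sum(is_noun(t) for _, t in value),
        value_length=sum(len(w) for w, _ in value),
    )
```

For the seed in the text (范冰冰 凭借 电影 《 观音山 》 获得 最佳女主角), this yields the following verb 获得 ("won"), the nearest noun 电影 ("film"), one noun in the value and a length of 3 characters.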
The character strings adjacent to the property value and the feature values meeting the preset feature requirements that the pattern extraction module 203 obtains from a seed together constitute the pattern corresponding to that seed. If the preset feature requirements select the nearest noun and the following verb, the pattern obtained from the above seed is "film" ... "won". The present invention does not restrict the storage format of patterns; for example, they may also be stored in a table.
Extracting the adjacent character strings and the feature values that meet the preset feature requirements from different seeds yields different patterns, which constitute the pattern set. For example, in seeds whose preset attribute word is "film", the verbs around the marked property value may also be "won ... award", "attended", "promoted", "was promoted by ...", "took part in" and so on, and the corresponding patterns include "... won ... award", "attended the film ...", "... publicity", "did publicity for the film ..." and "took part in the film ...".
In addition, among the feature values meeting the preset feature requirements that the pattern extraction module 203 obtains from the seeds, those that satisfy a preset attribute-word characteristic can, after filtering steps such as word-frequency or part-of-speech filtering, be added to the preset attribute word list. The preset attribute-word characteristic can be: being the verb immediately before the property value, the verb immediately after it, or the nearest noun.
The pattern filtering module 204 is configured to filter the pattern set extracted by the pattern extraction module and then supply it to the property value extraction module 205.
The pattern set can be filtered by pattern frequency, keeping the patterns whose frequency of occurrence meets a preset requirement. For example, the number of occurrences of each pattern is counted, and patterns that occur only rarely are regarded as low-quality patterns and removed from the pattern set.
Alternatively, the patterns can be filtered by semantic relevance: the character strings adjacent to the property value and the feature values meeting the preset feature requirements in each extracted pattern are semantically matched, and patterns with poor semantic relevance are filtered out. For example, the semantic relevance of the words within a pattern can be computed and patterns with low relevance removed; or an existing dictionary can be used, retaining a pattern if its collocation appears in the dictionary and filtering it out otherwise.
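Both the frequency filter and the dictionary-based variant of the semantic filter can be sketched with set operations. Patterns are represented here as hypothetical (noun, verb) tuples, the dictionary is an assumed set of known collocations, and the relevance-scoring variant is not shown because the text does not specify how relevance is computed.

```python
from collections import Counter


def filter_by_frequency(pattern_occurrences, min_count):
    """Keep patterns seen at least min_count times; rarer ones are treated as low quality."""
    counts = Counter(pattern_occurrences)
    return {p for p, c in counts.items() if c >= min_count}


def filter_by_dictionary(patterns, known_collocations):
    """Keep only patterns whose collocation appears in an existing dictionary."""
    return {p for p in patterns if p in known_collocations}
```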
The property value extraction module 205 is configured to use each pattern in the pattern set to extract property values from the sentence set obtained by the sentence set acquisition module 201, yielding the property value set corresponding to the preset attribute word.
For example, applying the pattern "film" ... "won" to the sentence set can retrieve sentences such as "Yang Lixin won the Teenagers' Favorite Actor award for the film 《The First Secretary》", from which the property value "The First Secretary" is extracted. Similarly, applying the pattern "... won ... award" to the sentence set can retrieve sentences such as "Around 1983, 《Strange Friend》 won a special award at the 33rd West Berlin International Film Festival", from which the property value "Strange Friend" is extracted. The extracted property values form the property value set corresponding to the preset attribute word "film".
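Applying a feature-based pattern such as "film" ... "won" can be sketched with a regular expression. The pattern shape (the nearest noun, a bounded gap, a title enclosed in 《》, a bounded gap, then the verb) and the `max_gap` limit are assumptions for illustration; the English test sentences keep the Chinese word order, in which the verb follows the title.

```python
import re


def compile_pattern(noun, verb, max_gap=15):
    """Hypothetical pattern shape: noun, short gap, 《value》, short gap, verb."""
    return re.compile(
        re.escape(noun)
        + r".{0,%d}?《([^《》]+)》.{0,%d}?" % (max_gap, max_gap)
        + re.escape(verb)
    )


def extract_values(sentences, pattern):
    """Collect every title captured by the pattern across the sentence set."""
    values = set()
    for sentence in sentences:
        values.update(pattern.findall(sentence))
    return values
```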
The property value filtering module 206 is configured to filter the property value set corresponding to the preset attribute word and then supply it to the seed set acquisition module 202. The filtering can use one of the following methods or a combination of several:
A. Word-frequency-based filtering: property values whose word frequency is below the preset requirement are filtered out.
Before word-frequency filtering, the property value filtering module first judges whether the property values corresponding to the preset attribute word are repeatable. The rule is: judge whether the extracted property values corresponding to the preset attribute word occur in multiple sentences of the sentence set; if so, the property values of this preset attribute word are considered repeatable. For example, for the preset attribute word "film", the corresponding property values include "Guanyin Mountain", "The First Secretary" and "Strange Friend". "Guanyin Mountain" may appear in several sentences of the sentence set for "film", such as "Fan Bingbing won Best Actress at the 23rd Tokyo International Film Festival for the film 《Guanyin Mountain》." and "Fan Bingbing won Best Actress at the 18th Beijing College Student Film Festival for the film 《Guanyin Mountain》."; "Guanyin Mountain" is therefore considered repeated, and accordingly the property values of the preset attribute word "film" are repeatable. As long as some of the property values of a preset attribute word occur repeatedly, the property values are considered repeatable.
For repeatable property values, filtering is based on the word frequency of the property value itself: property values whose word frequency is below a preset property value frequency threshold are filtered out.
For non-repeatable property values, filtering is based on the word frequency of the property value's context strings: property values whose context-string frequency is below a preset word frequency threshold are filtered out. The context strings of a property value include the common suffix information in the sentences or the character strings adjacent to the property value; the adjacent strings may include, but are not limited to, words, phrases and symbols.
Suppose the property values of the preset attribute word "film" are judged not to be repeatable. Filtering can then be based on the number of occurrences of common suffix information in the sentences, where common suffix information means identical text following several different property values. For example, the sentence set contains sentences such as "Ruan Jingtian won the 47th Taiwan Golden Horse Award for Best Leading Actor for 《Monga》.", "In 2004 Liu Dehua won the Taiwan Golden Horse Award for Best Leading Actor for the film 《Infernal Affairs III》" and "In 2007 Liang Chaowei won the Taiwan Golden Horse Award for Best Leading Actor for 《Lust, Caution》"; that is, the property values "Monga", "Infernal Affairs III" and "Lust, Caution" share the common suffix "won the Taiwan Golden Horse Award for Best Leading Actor" (in the original Chinese sentences, this text follows each title). Filtering is then based on the number of occurrences of this common suffix: if it exceeds a preset suffix frequency threshold, the property values "Monga", "Infernal Affairs III" and "Lust, Caution" are retained; otherwise, they are filtered out.
Alternatively, filtering can be based on the word frequency of the character strings adjacent to the property value. These adjacent strings are the characters immediately before and after the property value. For example, in the sentence "In 1982 he played his first leading role in the TV series 《Falcon》 produced by Li Tiansheng and became famous overnight", the characters immediately before and after the property value "Falcon" are the title marks "《" and "》", so filtering is based on the word frequency of these two characters.
B. Filtering based on part-of-speech features: property values whose part of speech does not match the features of a property value are filtered out.
Property values are usually nouns or noun phrases, such as the property values "Guanyin Mountain", "The First Secretary" and "Strange Friend" corresponding to the preset attribute word "film". Property values whose part of speech is a conjunction, an auxiliary word or the like are filtered out.
C. Filtering based on the number of characters in the property value: property values whose character count does not meet the preset length requirement are filtered out.
The property values of some preset attribute words all have the same number of characters, or a number within a certain range, so property values that do not meet the preset length requirement are filtered out. For example, when the preset attribute word is "×× date", the corresponding property values have a fixed format and their length falls within a certain range.
D. Filtering based on the number of stop words obtained by segmenting the property value: property values whose stop-word count exceeds the preset requirement are filtered out.
An extracted property value that contains too many stop words usually has little practical meaning, so property values whose stop-word count exceeds a preset stop-word threshold are filtered out.
For example, applying the pattern "was admitted to ..." to the sentence "He was admitted to the best school in that place" yields the property value "the best school in that place" (那个地方最好的学校). The stop words in this value are "那个" (that), "最好" (best) and "的" (a structural particle), 3 in total. With the stop-word threshold set to 1, the stop-word count of this property value exceeds 1, so it is filtered out.
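Filter A, with its repeatability check and its two branches, could be sketched as below. The sketch assumes that "word frequency" means the number of sentences containing the value and that the common suffix is the remainder of the sentence after the value with title marks and punctuation stripped; both readings, and the thresholds, are assumptions. The English test sentences keep the Chinese word order so that the shared text actually follows each title.

```python
from collections import Counter


def sentence_count(value, sentences):
    """Assumed word frequency: the number of sentences containing the value."""
    return sum(value in s for s in sentences)


def is_repeatable(values, sentences):
    """Per the text: repeatable if any value occurs in more than one sentence."""
    return any(sentence_count(v, sentences) > 1 for v in values)


def suffix_after(sentence, value):
    """Assumed suffix: the rest of the sentence, minus title marks and punctuation."""
    i = sentence.find(value)
    if i < 0:
        return None
    return sentence[i + len(value):].strip(" 》,.。") or None


def filter_by_shared_suffix(values, sentences, min_suffix_count):
    """Keep values whose suffix is shared by more than min_suffix_count values."""
    suffix_of = {}
    for v in values:
        for s in sentences:
            suffix = suffix_after(s, v)
            if suffix:
                suffix_of[v] = suffix
                break
    counts = Counter(suffix_of.values())
    return {v for v, suffix in suffix_of.items() if counts[suffix] > min_suffix_count}


def word_frequency_filter(values, sentences, min_value_freq, min_suffix_count):
    """Value frequency for repeatable value sets, shared-suffix frequency otherwise."""
    if is_repeatable(values, sentences):
        return {v for v in values if sentence_count(v, sentences) >= min_value_freq}
    return filter_by_shared_suffix(values, sentences, min_suffix_count)
```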
After the property value filtering module 206 produces the filtered property value set, processing passes to the seed set acquisition module 202, which matches the property value set obtained by the property value filtering module 206 against the sentence set, extracts the sentences containing the property values, generates new seeds in the form "sentence\tproperty value" and adds them to the seed set. The loop iterates until the algorithm converges, yielding structured information that comprises attribute words and property values. The algorithm is considered to have converged when no new pattern or new property value is extracted.
In the present invention, a limited number of seeds are annotated manually, patterns are extracted from each seed, property values are extracted with those patterns, and the extracted property values are used to iterate in a loop, yielding structured information comprising attribute words and their corresponding property values. The method and device provided by the present invention can be used to parse natural language data, such as encyclopedia and question-and-answer ("Knows") content, into a structured database, which can then be used by a search engine to implement structured search (i.e., vertical search). The structured search process is described below through Embodiments Three and Four.
Embodiment Three
Fig. 3 is a flowchart of the structured information search method provided in Embodiment Three of the present invention. As shown in Fig. 3, the method can perform the following steps based on the obtained structured information:
Step S301: determine the attribute word from the query input by the user.
When a user enters a search term (query) into the search engine, the attribute word must be determined from the query. This can be done with existing techniques, for example using a dictionary or templates, and is not described further here.
Step S302: match the attribute word obtained in step S301 against the structured information to obtain the corresponding property value information, and include the obtained property value information in the search results returned to the user.
The structured information is obtained using the method described in Embodiment One.
It should be noted that the query entered during structured search usually also contains an entity word. In step S301, both the entity word and the attribute word can in fact be determined, and in step S302 the structured information queried is actually the structured information corresponding to that entity word. In other words, the structured information corresponding to each entity word must be built in advance. Since extracting the relation between entity words and attribute words is prior art, combining it with the approach of the present invention for extracting the correspondence between attribute words and property values makes it possible to build the structured information corresponding to an entity word, which includes the attribute words and their corresponding property values.
For example, for the query "Which school was Liu Dehua admitted to?", analysis yields the entity word "Liu Dehua" and the attribute word "admitted to". The attribute word "admitted to" is then looked up in the structured information built by the present invention, here the structured information corresponding to the entity word "Liu Dehua", yielding the corresponding property value information "TVB artiste training class". Structured search can thus return the property value information corresponding to the attribute word. In addition to the structured search results, ordinary general search results can of course also be returned to the user.
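The lookup in steps S301 and S302 can be sketched with a nested dictionary keyed by entity word and then by attribute word. The database contents, the dictionary-based query analysis and all names below are hypothetical; the text leaves entity and attribute recognition to existing techniques.

```python
# Hypothetical structured database: entity word -> attribute word -> property values.
STRUCTURED_INFO = {
    "Liu Dehua": {"admitted to": ["TVB artiste training class"]},
}
# Hypothetical attribute-word dictionary: surface form -> canonical attribute word.
ATTRIBUTE_WORDS = {"admitted to": "admitted to", "enrolled in": "admitted to"}


def analyze_query(query, entities, attribute_words):
    """Dictionary lookup: the first known entity and attribute word found in the query."""
    entity = next((e for e in entities if e in query), None)
    attribute = next((canon for surface, canon in attribute_words.items() if surface in query), None)
    return entity, attribute


def structured_search(query, info=STRUCTURED_INFO, attribute_words=ATTRIBUTE_WORDS):
    """Return the property values for the query's entity and attribute, or [] if none."""
    entity, attribute = analyze_query(query, info, attribute_words)
    if entity is None or attribute is None:
        return []  # caller falls back to ordinary search results
    return info.get(entity, {}).get(attribute, [])
```

An empty result corresponds to the case where only ordinary search results are returned.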
The corresponding search device is described in Embodiment Four.
Embodiment Four
Fig. 4 is a schematic structural diagram of the device provided in Embodiment Four of the present invention. As shown in Fig. 4, the device may specifically include an analysis module 401 and a matching module 402.
The analysis module 401 obtains the attribute word from the query input by the user.
The matching module 402 matches the attribute word obtained by the analysis module 401 against the structured information to obtain the corresponding property value information, and includes the obtained property value information in the search results returned to the user. The structured information is obtained using the device described in Embodiment Two.
It should be noted that the query entered during structured search usually also contains an entity word, and the structured information queried is actually the structured information corresponding to that entity word. In other words, the structured information corresponding to each entity word must be built in advance. Since extracting the relation between entity words and attribute words is prior art, combining it with the approach of the present invention for extracting the correspondence between attribute words and property values makes it possible to build the structured information corresponding to an entity word, which includes the attribute words and their corresponding property values. The analysis module 401 can further obtain the entity word from the query input by the user; in that case, the matching module 402 specifically matches the attribute word obtained by the analysis module 401 against the structured information corresponding to that entity word to obtain the corresponding property value information.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (12)
1. A structured information extraction method, characterized by comprising:
S1. obtaining, from a corpus, a sentence set containing a preset attribute word;
S2. obtaining, from the sentence set, sentences containing preset property values as seeds, forming a seed set;
S3. extracting templates (patterns) from each seed to form a pattern set corresponding to the preset attribute word, wherein a pattern comprises the character strings adjacent to the property value and the feature values in the seed that meet preset feature requirements;
S4. using each pattern in the pattern set to extract property values from the sentence set obtained in step S1, obtaining a property value set corresponding to the preset attribute word;
S5. matching the property value set against the sentence set to extract new seeds, and returning to step S3 until the algorithm converges, to obtain structured information, the structured information comprising attribute words and property values;
wherein step S3 further comprises filtering the pattern set corresponding to the preset attribute word using at least one of the following methods: filtering based on the frequency of occurrence of the patterns, or filtering based on the semantic relevance of the patterns.
2. The method according to claim 1, characterized in that, in step S3, the character strings adjacent to the property value comprise words, phrases or symbols;
the feature values in the seed that meet the preset feature requirements comprise at least one of the following: the verb immediately before the property value, the verb immediately after the property value, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters of the property value.
3. The method according to claim 1, characterized in that step S4 further comprises filtering the property value set corresponding to the preset attribute word, using one or a combination of the following filtering methods:
A. word-frequency-based filtering: filtering out property values whose word frequency is below a preset requirement;
B. part-of-speech-based filtering: filtering out property values whose part of speech does not match the features of a property value;
C. filtering based on the number of characters of the property value: filtering out property values whose character count exceeds a preset requirement;
D. filtering based on the number of stop words in the segmented property value: filtering out property values whose stop-word count exceeds a preset requirement.
4. The method according to claim 3, characterized in that, before word-frequency-based filtering, it is first judged whether the property values corresponding to the preset attribute word are repeatable, the rule being: judging whether the extracted property values corresponding to the preset attribute word occur in multiple sentences of the sentence set, and if so, regarding the property values corresponding to this preset attribute word as repeatable;
for repeatable property values, filtering according to the word frequency of the property value, filtering out property values whose word frequency is below a preset property value word frequency threshold;
for non-repeatable property values, filtering according to the word frequency of the property value's context strings, filtering out property values whose context-string word frequency is below a preset word frequency threshold.
5. A structured information search method, characterized in that the method comprises the following steps:
S6. obtaining an attribute word from a query input by a user;
S7. matching the attribute word obtained in step S6 against structured information to obtain corresponding property value information, and including the obtained property value information in search results returned to the user;
wherein the structured information is obtained by the method according to claim 1.
6. The search method according to claim 5, characterized in that step S6 further comprises: obtaining an entity word from the query input by the user;
the matching in step S7 is specifically: matching the attribute word obtained in step S6 against the structured information corresponding to the entity word to obtain the corresponding property value information.
7. A structured information extraction device, characterized by comprising:
a sentence set acquisition module, configured to obtain, from a corpus, a sentence set containing a preset attribute word;
a seed set acquisition module, configured to obtain, from the sentence set, sentences containing preset property values as seeds, forming a seed set;
a pattern extraction module, configured to extract templates (patterns) from each seed to form a pattern set corresponding to the preset attribute word, wherein a pattern comprises the character strings adjacent to the property value and the feature values in the seed that meet preset feature requirements;
a property value extraction module, configured to use each pattern in the pattern set to extract property values from the sentence set obtained by the sentence set acquisition module, obtaining a property value set corresponding to the preset attribute word;
wherein the seed set acquisition module further matches the property value set obtained by the property value extraction module against the sentence set, extracts new seeds and adds them to the seed set;
after the algorithm converges, the property value extraction module obtains structured information, the structured information comprising attribute words and property values;
the device further comprising:
a pattern filtering module, configured to filter the pattern set extracted by the pattern extraction module and then supply it to the property value extraction module, using at least one of the following filtering methods: filtering based on the frequency of occurrence of the patterns, or filtering based on the semantic relevance of the patterns.
8. The device according to claim 7, characterized in that, in the pattern extraction module, the character strings adjacent to the property value comprise words, phrases or symbols;
the feature values in the seed that meet the preset feature requirements comprise at least one of the following: the verb immediately before the property value, the verb immediately after the property value, the nearest noun, the part of speech of the property value, the number of nouns in the property value, and the number of characters of the property value.
9. The device according to claim 7, characterized in that the device further comprises a property value filtering module, configured to filter the property value set corresponding to the preset attribute word and then supply it to the seed set acquisition module, using one or a combination of the following filtering methods:
A. word-frequency-based filtering: filtering out property values whose word frequency is below a preset requirement;
B. part-of-speech-based filtering: filtering out property values whose part of speech does not match the features of a property value;
C. filtering based on the number of characters of the property value: filtering out property values whose character count exceeds a preset requirement;
D. filtering based on the number of stop words in the segmented property value: filtering out property values whose stop-word count exceeds a preset requirement.
10. The device according to claim 9, characterised in that, before performing word-frequency-based filtering, the property value filtering module first determines whether the property values corresponding to the preset attribute word occur repeatedly, using the following rule: it determines whether an extracted property value corresponding to the preset attribute word occurs in multiple sentences of the sentence set; if so, that property value is regarded as repeatable;
for repeatable property values, filtering is based on the word frequency of the property value itself: property values whose word frequency is below a preset property value word-frequency threshold are filtered out;
for non-repeating property values, filtering is based on the word frequency of the property value's context strings: property values whose context-string word frequency is below a preset word-frequency threshold are filtered out.
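The repeatability rule of claim 10 is sketched below. It makes these assumptions:

- Each extracted value is paired with a single context string, meaning the text adjacent to it in the pattern that matched.
- Frequencies are raw substring counts over the sentence set.
- The function name and both thresholds are hypothetical.

```python
from typing import Dict, List


def filter_by_frequency(
    sentences: List[str],
    contexts: Dict[str, str],
    value_threshold: int = 3,
    context_threshold: int = 3,
) -> List[str]:
    """Hypothetical claim-10 filter. `contexts` maps each extracted property
    value to the context string it was extracted with (an assumption)."""
    kept = []
    for value, context in contexts.items():
        # Repeatable = the value occurs in more than one sentence of the set.
        n_sentences = sum(1 for s in sentences if value in s)
        if n_sentences > 1:
            # Repeatable: judge the value by its own word frequency.
            freq = sum(s.count(value) for s in sentences)
            if freq >= value_threshold:
                kept.append(value)
        else:
            # Non-repeating: judge it by how common its context string is.
            ctx_freq = sum(s.count(context) for s in sentences)
            if ctx_freq >= context_threshold:
                kept.append(value)
    return kept
```

This design lets a rare but well-anchored value, such as one seen only once after a frequent pattern like "blood type: ", survive. A one-off value with a rare context is still discarded.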
11. A search device for structured information, characterised in that the device comprises:
an analysis module, configured to obtain an attribute word from the query input by a user;
a matching module, configured to match the attribute word obtained by the analysis module against the structured information to obtain the corresponding attribute value information, and to include the obtained attribute value information in the search results returned to the user;
wherein the structured information is obtained using the device according to claim 7.
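The two modules of claim 11 are sketched below. It makes these assumptions:

- The structured information is stored as a mapping from attribute word to the property values mined for it.
- The analysis module picks the longest known attribute word that occurs in the query.

The store layout, the longest-match rule and the names `analyze` and `search` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical structured store: attribute word -> property values mined for it.
Structured = Dict[str, List[str]]


def analyze(query: str, attribute_words: List[str]) -> Optional[str]:
    """Analysis module (sketch): longest known attribute word in the query."""
    hits = [w for w in attribute_words if w in query]
    return max(hits, key=len) if hits else None


def search(query: str, structured: Structured, base_results: List[str]) -> List[str]:
    """Matching module (sketch): put matched attribute value info first."""
    attr = analyze(query, list(structured))
    if attr is None:
        return base_results  # no attribute word: ordinary results only
    return [f"{attr}: {', '.join(structured[attr])}"] + base_results
```

Longest match is used so that a query containing "blood type" is not matched on the shorter attribute word "type".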
12. The search device according to claim 11, characterised in that the analysis module further obtains an entity word from the query input by the user;
the matching module specifically matches the attribute word obtained by the analysis module against the structured information corresponding to the entity word, to obtain the corresponding attribute value information.
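Claim 12 narrows the match to the structured information of one entity. The sketch below illustrates this under the following assumptions:

- The structured information is a nested mapping: entity word, then attribute word, then property value.
- Both words are found by longest substring match.

The layout, the names `longest_match` and `lookup`, and the toy data are illustrative only.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical store: entity word -> {attribute word -> property value}.
Structured = Dict[str, Dict[str, str]]


def longest_match(query: str, candidates: Iterable[str]) -> Optional[str]:
    """Longest candidate string that occurs in the query, or None."""
    hits = [c for c in candidates if c in query]
    return max(hits, key=len) if hits else None


def lookup(query: str, structured: Structured) -> Optional[Tuple[str, str, str]]:
    entity = longest_match(query, structured)  # analysis: entity word
    if entity is None:
        return None
    attributes = structured[entity]            # restrict to this entity
    attr = longest_match(query, attributes)    # analysis: attribute word
    if attr is None:
        return None
    return entity, attr, attributes[attr]      # matching result
```

With the toy store `{"Yao Ming": {"height": "2.26m", "birthplace": "Shanghai"}}`, the query "Yao Ming height" resolves to `("Yao Ming", "height", "2.26m")`. A query naming no known entity returns `None`.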
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110459457.5A CN103186633B (en) | 2011-12-31 | 2011-12-31 | A kind of structured message abstracting method, searching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103186633A CN103186633A (en) | 2013-07-03 |
CN103186633B true CN103186633B (en) | 2016-08-17 |
Family
ID=48677802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110459457.5A Active CN103186633B (en) | 2011-12-31 | 2011-12-31 | A kind of structured message abstracting method, searching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103186633B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488105B (en) * | 2015-11-19 | 2019-11-05 | 百度在线网络技术(北京)有限公司 | The treating method and apparatus of the method for building up of information extraction template, knowledge data |
CN106407377B (en) * | 2016-09-12 | 2020-03-03 | 北京百度网讯科技有限公司 | Search method and device based on artificial intelligence |
CN107341171B (en) * | 2017-05-03 | 2021-07-27 | 刘洪利 | Method for extracting data feature template and method and system for applying template |
CN107632975A (en) * | 2017-08-09 | 2018-01-26 | 联动优势科技有限公司 | A kind of dictionary method for building up and equipment |
CN110245329A (en) * | 2018-03-07 | 2019-09-17 | 珠海金山办公软件有限公司 | Text managemant method, apparatus, electronic equipment and computer readable storage medium |
CN110162786B (en) * | 2019-04-23 | 2024-02-27 | 百度在线网络技术(北京)有限公司 | Method and device for constructing configuration file and extracting structured information |
CN111309853B (en) * | 2019-09-03 | 2024-03-22 | 东南大学 | Code searching method based on structured information |
CN111666417B (en) * | 2020-04-13 | 2023-06-23 | 百度在线网络技术(北京)有限公司 | Method, device, electronic equipment and readable storage medium for generating synonyms |
CN111695518B (en) * | 2020-06-12 | 2023-09-29 | 北京百度网讯科技有限公司 | Method and device for labeling structured document information and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101702167A (en) * | 2009-11-03 | 2010-05-05 | 上海第二工业大学 | Method for extracting attribution and comment word with template based on internet |
CN101937433A (en) * | 2009-06-29 | 2011-01-05 | 天津一度搜索网络科技有限公司 | Real-time searching method of product |
CN102200983A (en) * | 2010-03-25 | 2011-09-28 | 日电(中国)有限公司 | Attribute extraction device and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5372536B2 (en) * | 2009-01-28 | 2013-12-18 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
Non-Patent Citations (3)
Title |
---|
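Automatic generation of Chinese entity relations based on bootstrapping (基于Boot Strapping 的中文实体关系自动生成); 张素香 et al.; 《微电子学与计算机》 (Microelectronics & Computer); 2006-12-05; Vol. 23, No. 12; pp. 15-18 * |
Automatic acquisition of information extraction templates based on similarity computation (基于相似计算的信息抽取模板自动获取方法); 叶娜 et al.; 《第二届全国学生计算语言学研讨会论文集》 (Proceedings of the 2nd National Student Workshop on Computational Linguistics); 2004-08-01; pp. 434-439 * |
Research on automatic acquisition of person attributes from Wikipedia (维基百科人物属性自动获取方法研究); 孟新萍 et al.; 《第五届全国青年计算语言学研讨会论文集》 (Proceedings of the 5th National Youth Workshop on Computational Linguistics); 2010-10-11; pp. 452-458 * |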
Also Published As
Publication number | Publication date |
---|---|
CN103186633A (en) | 2013-07-03 |
Similar Documents
Publication | Title |
---|---|
CN103186633B (en) | A kind of structured message abstracting method, searching method and device | |
CN103886063B (en) | A kind of text searching method and device | |
CN103927358B (en) | text search method and system | |
KR101661198B1 (en) | Method and system for searching by using natural language query | |
CN104281702B (en) | Data retrieval method and device based on electric power critical word participle | |
CN105447080B (en) | A kind of inquiry complementing method in community's question and answer search | |
CN104021198B (en) | The relational database information search method and device indexed based on Ontology | |
RU2004108667A (en) | SEARCH FOR A RANDOM TEXT AND SEARCH FOR ATTRIBUTES IN THE DATA OF THE ELECTRONIC GUIDE FOR PROGRAMS | |
CN106446018B (en) | Query information processing method and device based on artificial intelligence | |
CN106202211A (en) | A kind of integrated microblogging rumour recognition methods based on microblogging type | |
CN102073729A (en) | Relationship knowledge sharing platform and implementation method thereof | |
CN101196898A (en) | Method for applying phrase index technology into internet search engine | |
CN105718585B (en) | Document and label word justice correlating method and its device | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
CN103123624A (en) | Method of confirming head word, device of confirming head word, searching method and device | |
CN111190900A (en) | JSON data visualization optimization method in cloud computing mode | |
CN101788988A (en) | Information extraction method | |
CN111475625A (en) | News manuscript generation method and system based on knowledge graph | |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field | |
KR20210130976A (en) | Device, method and computer program for deriving response based on knowledge graph | |
JP5504097B2 (en) | Binary relation classification program, method and apparatus for classifying semantically similar word pairs into binary relation | |
CN103514289A (en) | Method and device for building interest entity base | |
CN105956158A (en) | Automatic extraction method of network neologism on the basis of mass microblog texts and use information | |
CN109284362A (en) | A kind of content search method and system | |
Menaha et al. | Question answering system using web snippets |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |