CN1794240A - Computer information retrieval system based on natural speech understanding and its searching method - Google Patents

Computer information retrieval system based on natural speech understanding and its searching method

Info

Publication number
CN1794240A
Authority
CN
China
Prior art keywords
sentence
semantic
answer
target
semantic relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200610032725
Other languages
Chinese (zh)
Inventor
梁威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School
Priority to CN 200610032725
Publication of CN1794240A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This invention relates to a computer information retrieval system based on natural language understanding and to its search method. A search is started by an interrogative sentence entered by the user, and the system outputs answers ranked by semantic relevance. First, articles from the internet and data from a content database are processed by the HNC sentence category analysis module to build an annotated knowledge base of candidate answer sentences. Next, the user's interrogative sentence is processed by the same module to obtain its HNC sentence class structure, which is passed to the interrogative sentence analysis module to generate a sequence of semantically equivalent target sentence patterns. The sentence pattern matching module then computes concept similarity over the words and semantic chunks of each candidate answer sentence and of the target answer sentence patterns, compares candidate and target sentences, and produces the sentence pattern matching result, the semantic relation structure recognition and matching result, and an answer accuracy score. Finally, the answers are ranked by correctness and returned.

Description

Computer information retrieval system based on natural language understanding and its search method
Technical field
The present invention relates to a computer information retrieval system, and in particular to a computer system that performs information retrieval from questions posed in natural language.
Background
Computer information retrieval systems help us find the information we want in a vast ocean of data. Current retrieval tools, such as query software and search engines like Google, rely mainly on keyword matching and web link analysis. But it is hard to define a search intent accurately with a simple combination of keywords, and keyword matching ignores semantic factors such as how word meanings combine and the semantic relations inside a sentence. As a result, people often spend a great deal of time manually reading through a huge list of result pages to find the answer they want.
Users want a more natural and more precise way to define their search requests, and they want answers that match the query intent at the level of meaning and knowledge, not merely a list of results that match keywords.
The present invention uses natural language processing to let the user submit a search request as a natural language question. The system analyzes the interrogative sentence, extracts and recognizes the sentence class pattern and the semantic relation structure of the target answer, identifies the content closest to the target answer at each semantic level, and returns highly accurate answers to the user.
Summary of the invention
The object of the invention is to establish an efficient, unified model of knowledge processing and to produce a computer system that builds a natural language knowledge base.
A computer information retrieval system based on natural language understanding, in which retrieval is started by an interrogative sentence entered by the user and the system outputs answers ranked by semantic relevance. The system comprises an HNC sentence category analysis (SCA) module, a sentence class pattern matching module, and a ranking module. The HNC sentence category analysis module analyzes articles and content from the internet or other content sources to build a knowledge base of annotated candidate answer sentences; it also performs sentence class structure analysis on the interrogative sentence that starts the retrieval to obtain the target sentence class pattern and generate a sequence of semantically equivalent target sentence patterns. The sentence class pattern matching module matches these against the candidate answer sentences in the knowledge base, and the ranking module then ranks the matching results.
A computer information retrieval method based on natural language understanding, in which retrieval is started by an interrogative sentence entered by the user and the system outputs answers ranked by semantic relevance, comprises the following steps. Step 1: articles from the internet and data in a content database are processed by the HNC sentence category analysis (SCA) module to obtain an annotated knowledge base of candidate answer sentences. Step 2: the interrogative sentence entered by the user is first processed by the HNC sentence category analysis module to obtain its HNC sentence class structure; it then enters the interrogative sentence analysis module and afterwards the query center analysis module, and on this basis the target answer sentence pattern is extracted and a sequence of semantically equivalent target sentence patterns is generated. Step 3: the sentence class pattern matching module computes concept similarity over the words and semantic chunks of the annotated candidate answer sentences in the knowledge base and of the target answer sentence pattern (sequence), compares each candidate sentence with the target sentence, and obtains the sentence class pattern matching result, the semantic relation structure recognition and matching result, and the answer accuracy score. Step 4: the answers are ranked by correctness and the results are returned.
In Step 3, for cases where sentence class pattern matching is hard to apply, the semantic relation structure from HNC sentence category analysis is used to extract the matrix of semantic relations that hold between the concept elements (or combinations of concept elements) of the target sentence. The hypothesis-verification mechanism module then uses the system's linguistic knowledge to verify and compute the confidence that a candidate sentence in the knowledge base contains a given target semantic relation, and in this way finds the similarity of the deep semantics implied under different forms of expression.
Compared with earlier similar techniques, the invention uses HNC natural language understanding to perform deep semantic pattern recognition and semantic relation extraction on both the interrogative sentence and the candidate sentences. This overcomes the lack of semantic association between query words in earlier information query techniques, and the pattern matching method can determine the semantic similarities and differences between the user's interrogative sentence and a candidate sentence. The semantic relation recognition method extracts the semantic relations of the interrogative sentence at several levels (word concepts, word collocations inside a semantic chunk, relations between semantic chunks, and relations between constituents of different granularity) and checks whether a candidate sentence has the corresponding meaning or semantic relation, so the system can give the user answers that fit semantically and are more accurate. Because the system accepts queries posed as natural language questions, users can define their query intent conveniently and precisely; because the system analyzes the question, it can recognize the question's semantic relations and its requirements on the target answer.
Description of drawings
The invention includes the following drawings:
Fig. 1 is a flowchart of the HNC concept similarity computation;
Fig. 2 shows the generation of the target answer sentence pattern sequence in the pattern matching method;
Fig. 3 is a block diagram of the pattern matching algorithm;
Fig. 4 is a flowchart of the processing steps of the semantic relation recognition method;
Fig. 5 shows the system structure and operating principle.
Detailed description of embodiments
The invention is described in further detail below with reference to the drawings.
The invention is a technique that applies natural language understanding to information retrieval. The system accepts a query posed by the user in natural language, performs target-answer-oriented natural language analysis on the candidate sentences, and returns the most accurate answers to the user.
The invention uses HNC natural language processing to perform sentence category analysis on natural language text obtained from the internet or other content sources, and stores the analyzed sentences, annotated with their HNC sentence classes, in a knowledge base (KB) as candidate answer sentences.
After the system accepts a query posed as a natural language question, it first analyzes the interrogative word and the query center of the interrogative sentence, and then looks for the best target answer in two ways.
1. Pattern matching method: the system obtains the sentence class pattern of the interrogative sentence (the target sentence class pattern) through HNC sentence category analysis. For candidate sentences with the same (or a similar) sentence class pattern (a candidate may be a simple sentence of a different sentence class form, a mixed sentence class, or a compound sentence), the system computes the concept similarity between each pair of corresponding semantic chunks of the target sentence and the candidate sentence to obtain the candidate's accuracy with respect to the target answer.
2. Semantic relation recognition method: for candidate sentences whose sentence class differs greatly, pattern matching cannot be used, and the target answer can only be sought through semantic relation recognition.
The system discovers and recognizes the semantic relations between the semantic units of the interrogative sentence (characters, words, semantic chunks, sentences) or combinations of such units, and builds a target semantic relation matrix from them. It then tries to find the corresponding semantic relations in the candidate sentence. A hypothesis-verification mechanism computes the confidence that the candidate sentence contains a given target semantic relation, and the answer accuracy of the candidate sentence with respect to the target answer is finally obtained by combining the confidence of each relation with its weight.
The invention uses HNC natural language understanding to perform deep semantic pattern recognition and semantic relation extraction on both the interrogative sentence and the candidate sentences. This overcomes the lack of semantic association between query words in earlier information query techniques, and the pattern matching method can determine the semantic similarities and differences between the user's interrogative sentence and a candidate sentence. The semantic relation recognition method extracts the semantic relations of the interrogative sentence at several levels (word concepts, word collocations inside a semantic chunk, relations between semantic chunks, and relations between constituents of different granularity) and checks whether a candidate sentence has the corresponding meaning or semantic relation, so the system can give the user answers that fit semantically and are more accurate.
Interrogative sentence characteristic analysis: interrogative sentence structures led by different interrogative words place different requirements on the target answer. To make the analysis easier, the system defines two notions: the query center and the query center word.
Query center word: the word that the interrogative word governs or modifies.
Query center: the structure formed by the interrogative word and the query center word.
By analyzing the query center and the query center word, the system obtains the concept and the semantic structure required of the target answer, and compares them with the corresponding structure of the candidate sentence; the result is a key factor in computing the answer accuracy of the candidate sentence.
The statistics obtained by analyzing interrogative words and query centers from the HNC point of view are shown in the table below. Codes such as J111 and JK are the concept symbols that HNC defines for describing language semantics; their meanings are defined in patent CN98101921.8.
Interrogative | Frequency | Typical structure | Query center and target answer
what | 913 | [statement expression J] [is: j111] what [category, country, time, content, etc.]? | The interrogative "what" and the query center word [category, country, time, content, etc.] together serve as a JK of the sentence and express a query about the semantic chunk they replace. Target answer: must meet the concept similarity requirement with the query center word.
who | 214 | [modifier] [h$141, h$ug] [person: p class] [is: j111] who? | "Who" serves as a JK in the sentence. Target answer: p, pe.
how much | 166 | [modifier] [quantitative attribute concept: length, height, speed] [has, is: j111] how much [unit-of-quantity concept zz]? | The interrogative "how much" replaces the quantity modifier and expresses a query about quantity. Target answer: number j3.
how (+ attribute) | 112 | [modifier] [Jkn] [has, is: j111] how [attribute concept: long, tall, large, long-lasting, fast, etc.: u]? | The interrogative stands in for the quantity description that modifies the query center word and expresses a query about quantity or degree. Target answer: number j3, or a concept expressing amount (j41, jzu41); the unit-of-quantity concept of the target answer must correspond to what the query center requires.
which | 58 | Which [measure word zz] [p, pe, w, pw, jw class; or static g or effect r concept; or category concept] [statement expression J]? | The interrogative "which", combined with the query center word it modifies, often serves as a JK of the question. Target answer: the query center word led by "which" usually denotes a concept range or a category concept; the target answer is usually a concrete concept, a proper noun, etc.
which (plural) |  | Which [plural] [concept with a category meaning]? | A special interrogative: it requires not a single answer but all answers that satisfy the requirement.
where | 105 | [statement expression J] [at: v50001] where? | The query center serves as the place auxiliary chunk FK. Target answer: a concept of the wj2 class.
why | 15 | [why] [statement expression J]? | The interrogative "why" replaces the modification of the E chunk for reason, purpose, etc., and expresses a query about reason Pr or purpose Rt. Target answer: the semantic component that has the corresponding semantic relation structure with the question.
how (manner) | 35 | [how] [statement expression J]? | Expresses a query about the means Ms, approach Wy, instrument In, condition Cn, etc. that modify the E chunk. Target answer: the semantic component that has the corresponding semantic relation structure with the question.
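As an illustration only, the target-answer requirements in the table above could be held in a small lookup structure. The following Python sketch is an assumption made for this example, not part of the specification: the names AnswerRequirement, REQUIREMENTS, and requirement_for are hypothetical, and only the HNC codes (p, pe, j3, wj2, Pr, Rt, Ms, Wy, In, Cn) are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnswerRequirement:
    # "concept": the answer must be a concept of one of the listed classes.
    # "relation": the answer must fill one of the listed semantic roles.
    kind: str
    codes: Tuple[str, ...]  # HNC codes as quoted in the table


# Hypothetical lookup keyed by interrogative; only rows whose target answer
# is stated as explicit codes in the table are included.
REQUIREMENTS = {
    "who": AnswerRequirement("concept", ("p", "pe")),
    "how much": AnswerRequirement("concept", ("j3",)),
    "where": AnswerRequirement("concept", ("wj2",)),
    "why": AnswerRequirement("relation", ("Pr", "Rt")),
    "how": AnswerRequirement("relation", ("Ms", "Wy", "In", "Cn")),
}


def requirement_for(interrogative: str) -> Optional[AnswerRequirement]:
    """Return the target-answer requirement, or None if it is not listed."""
    return REQUIREMENTS.get(interrogative.strip().lower())
```

A matcher could consult such a table to decide whether a candidate's concept class or semantic role is acceptable before any similarity is computed.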
Strategies for finding the target answer:
Based on the HNC analysis of interrogative sentences and candidate sentences, two strategies are proposed for finding the target answer of an interrogative sentence: the pattern matching method and the semantic relation recognition method.
In the HNC theoretical system, the HNC concept symbol is the basic element for expressing the formalized semantics of natural language, so the method for comparing the similarity of HNC concept symbols is discussed first, before the two strategies.
HNC concept similarity comparison:
For ambiguous words, HNC sentence category analysis selects, from the several possibilities, the specific semantic concept the word has in the sentence. The similarity between two word concepts can then be obtained by comparing their HNC concept symbols.
The structure of an HNC concept symbol [1]:
((class code string) (level symbol string) (unitized construction symbol) (class code string) (level symbol string))
The HNC concept similarity computation is shown in Fig. 1:
First, the concept categories of the two concepts (the target concept t and the candidate concept b) are compared. If their concept category symbols differ, the concept similarity is 0.0 and the calculation ends.
If the concept category symbols are the same, the five-tuple symbols and the grammatical symbols are compared separately, and the system determines whether the concept level symbols use the attachment mode or the high-middle-low layer combination mode.
In the high-middle-low layer combination mode, the system first determines whether the high-layer symbols are identical and then compares the middle-layer symbol sequences and the low-layer symbol sequences separately; the concept similarity is computed from these results and the calculation ends.
In the attachment mode, the system first determines whether the body-layer symbols are identical and then compares the attachment-layer symbol sequences; the concept similarity is computed from these results and the calculation ends.
HNC concept similarity computing method are:
\[
\mathrm{simConcept}(t,b) = \mathrm{simCat}(t,b)\,\beta_{cat} + \sum \mathrm{simFiv}(t,b)\,\beta_{fiv} + \mathrm{simSynt}(t,b)\,\beta_{syn}
\]
[Formula image A20061003272500093, not reproduced in this text]
Meaning of the symbols in the formula:
simConcept: concept similarity of the candidate concept b with respect to the target concept t.
simCat: similarity of the concept categories.
simFiv: concept similarity of the five-tuple symbol sequences.
simSyn (written simSynt in the formula): similarity of the grammatical symbols.
simNou: body-layer concept similarity.
simRe: attachment-layer concept similarity.
simHigh: high-layer concept similarity.
simMid: middle-layer concept similarity.
simLow: low-layer concept similarity.
β: weight parameter for the corresponding part of the concept symbol.
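By way of illustration only, the weighted sum above might be sketched in Python as follows. The Concept record, the position-by-position comparison of five-tuple symbols, the even split of the five-tuple weight across positions, and the default weight values are all assumptions made for this sketch; the specification does not fix them, and the layer-level similarities (simNou, simRe, simHigh, simMid, simLow) are not modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    category: str                 # concept category symbol
    five_tuple: Tuple[str, ...]   # five-tuple symbol sequence (placeholder values)
    syntax: str                   # grammatical symbol


def sim_concept(t: Concept, b: Concept,
                beta_cat: float = 0.4,
                beta_fiv: float = 0.4,
                beta_syn: float = 0.2) -> float:
    """Similarity of candidate concept b with respect to target concept t."""
    # Gate from the flowchart: different categories give similarity 0.0.
    if t.category != b.category:
        return 0.0
    sim_cat = 1.0

    # Sum over five-tuple positions; each position carries an equal share
    # of beta_fiv (an assumption of this sketch).
    positions = max(len(t.five_tuple), len(b.five_tuple), 1)
    matches = sum(1.0 for x, y in zip(t.five_tuple, b.five_tuple) if x == y)
    fiv_term = (matches / positions) * beta_fiv

    sim_syn = 1.0 if t.syntax == b.syntax else 0.0
    return sim_cat * beta_cat + fiv_term + sim_syn * beta_syn
```

With these default weights, two identical concepts score close to 1.0, and a candidate that differs in one of two five-tuple positions loses half of the five-tuple share.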
Pattern matching method:
HNC sentence category analysis of the interrogative sentence yields a target sentence class pattern that contains the query center. Semantically equivalent sentence class form conversion turns this target sentence class pattern into a target sentence class pattern sequence. Matching each part of the target sentence class pattern against the corresponding part of the candidate answer sentence pattern shows how semantically similar the two are and to what degree the candidate answer sentence contains the target answer. The pattern matching method is suitable when the sentence class of the interrogative sentence is the same as, or close to, that of the candidate answer sentence.
Fig. 2 shows how the target answer sentence pattern sequence is generated in the pattern matching method (dashed parts may be omitted). Depending on the sentence class form, semantic chunk indicators are added to or removed from the target answer sentence and the positions of semantic chunks are adjusted, which produces target answer sentence patterns that have the same meaning but different forms of expression. First, the query center is generated from the query center semantic chunk JK or FK; it comprises the interrogative word and the query center word that immediately follows the interrogative word or stands in a modifying relation with it. At the same time, the accumulated auxiliary chunks FK before the query center and the accumulated main chunks JK after the query center can be generated. If these are joined end to end with the statement expression J (or several parts of J), the target sentence class pattern sequence produced by semantically equivalent sentence class form conversion comprises, in order, the accumulated auxiliary chunks FK and main chunks JK with their semantic chunk indicators, and the query center semantic chunk JK or FK with its semantic chunk indicator and accumulated auxiliary chunks FK and main chunks JK.
Fig. 3 shows the pattern matching algorithm. It first determines whether the sentence class of the target pattern and the sentence class of the candidate answer sentence are the same. If they are completely different, the candidate is handed to the semantic relation recognition module and pattern matching ends. For candidates that are mixed or compound sentences and are only partly identical, the semantic chunks of the matching part are identified and then processed in the same way as a part whose sentence class is identical. If the sentence class is the same, semantic chunk similarity is compared chunk by chunk: the concept similarity of the GBK chunk core words, the concept similarity of the GBK chunk modifiers, and the concept similarity of each FK with its corresponding FK, together with the analysis and calculation for the query center and the target answer concept. The answer accuracy score of the candidate answer sentence is then computed from these results.
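The first decision of the algorithm in Fig. 3, namely which strategy handles a given candidate, can be pictured with a short Python sketch. This is a simplified reading under stated assumptions: sentence classes are treated as opaque labels ("S1", "S2" are placeholders, not HNC codes), a sentence is represented by the set of classes it contains, and the function name and return labels are hypothetical.

```python
from typing import Iterable, Set, Tuple


def choose_strategy(target_classes: Iterable[str],
                    candidate_classes: Iterable[str]) -> Tuple[str, Set[str]]:
    """Route a candidate sentence by the overlap of its sentence classes
    with those of the target pattern. Returns (strategy, shared classes)."""
    target = set(target_classes)
    candidate = set(candidate_classes)
    shared = target & candidate

    if not shared:
        # Completely different sentence classes: pattern matching ends and
        # the semantic relation recognition method takes over.
        return "semantic_relation", shared
    if target == candidate:
        # Same sentence class: compare semantic chunks one by one.
        return "pattern_match", shared
    # Mixed or compound candidate that is only partly identical: match the
    # chunks of the shared part as if the sentence class were identical.
    return "partial_pattern_match", shared
```

For example, a simple target of class "S1" and a compound candidate containing "S1" and "S2" would be routed to partial pattern matching on the shared "S1" part.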
The similarity of a candidate sentence semantic chunk with respect to the corresponding target sentence semantic chunk is computed as:
\[
\mathrm{simChunk}(\mathrm{Chunk}_t, \mathrm{Chunk}_b) = \frac{\sum_i \mathrm{simConcept}(M_{ti}, M_{bi})\,\beta_m + \sum_i \mathrm{simConcept}(K_{ti}, K_{bi})\,\beta_k}{T_t}
\]
The answer accuracy of the pattern matching method is computed as:
\[
\mathrm{correctness}(S_t, S_b) = \sum_{i=1}^{n} \mathrm{simChunk}(\mathrm{Chunk}_{ti}, \mathrm{Chunk}_{bi}) + \mathrm{answFitness}(S_t, S_b)
\]
Meaning of the symbols in the formulas:
answFitness: the degree to which the candidate sentence fits the target answer.
Tt: the number of concept elements of the target semantic chunk that take part in the comparison.
M: a modifier of the semantic chunk.
K: a core word of the semantic chunk.
correctness: the answer accuracy of the candidate sentence.
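The two formulas above can be written out as a minimal Python sketch. Several things here are assumptions for illustration: the Chunk record, the exact-match stand-in for simConcept, the pairing of modifiers and core words by position, and the default values of the weights beta_m and beta_k. The specification gives none of these.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Chunk:
    modifiers: Tuple[str, ...]  # M: modifier concepts of the semantic chunk
    cores: Tuple[str, ...]      # K: core-word concepts of the semantic chunk


def exact_match(t: str, b: str) -> float:
    """Stand-in for simConcept: 1.0 for identical concepts, else 0.0."""
    return 1.0 if t == b else 0.0


def sim_chunk(target: Chunk, candidate: Chunk,
              sim: Callable[[str, str], float] = exact_match,
              beta_m: float = 0.5, beta_k: float = 1.0) -> float:
    """simChunk: weighted modifier and core similarities divided by T_t."""
    t_count = len(target.modifiers) + len(target.cores)  # T_t
    if t_count == 0:
        return 0.0
    mod = sum(sim(t, b) for t, b in zip(target.modifiers, candidate.modifiers))
    core = sum(sim(t, b) for t, b in zip(target.cores, candidate.cores))
    return (mod * beta_m + core * beta_k) / t_count


def correctness(pairs: Iterable[Tuple[Chunk, Chunk]],
                answ_fitness: float = 0.0,
                sim: Callable[[str, str], float] = exact_match) -> float:
    """Answer accuracy: sum of simChunk over chunk pairs plus answFitness."""
    return sum(sim_chunk(t, b, sim) for t, b in pairs) + answ_fitness
```

Passing the similarity function as a parameter keeps the sketch independent of how concept similarity is actually computed; any function returning a value in [0, 1] could be substituted for exact_match.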
Semantic relation recognition method:
The basic idea of the semantic relation recognition method is to discover and extract, as far as possible, the various semantic relations among the different constituents of the target sentence at different levels of granularity, and then to try to discover and recognize similar semantic relations between the corresponding concepts in the candidate sentence. The basic semantic relations are: concept combination relations such as action, effect, object, content, inclusion, polarization, subject-predicate, and logic; relations internal to the sentence class structure; and relations that express world knowledge.
Because natural language expression is so varied, one semantic relation can be expressed by many simple or complex structures, such as nested (sloughed-off) sentences, chunk expansion, separated semantic chunks, simple sentences, mixed sentences, and compound sentences. The semantic relation recognition method therefore adopts a hypothesis-verification mechanism oriented to the target semantic relations. It uses the system's linguistic knowledge to verify and compute the confidence that a candidate sentence contains a given target semantic relation (even if it agrees only partly with the target semantic relation structure), and in this way finds the deep semantics implied under different forms of expression.
Fig. 4 shows the processing steps of the semantic relation recognition method. First, concepts in the candidate sentence that are identical or similar to those of the target sentence are found. Next, the matrix of semantic relations that hold between the concept elements (or combinations of concept elements) of the target sentence is analyzed and extracted, giving semantic relations based on concept collocation, on the sentence class structure, on modification, and on world knowledge. Hypotheses about the various semantic relations are then verified and computed using the results of the current sentence analysis: for each relation in the semantic relation matrix, the corresponding relation is sought in the candidate sentence and the hypothesis is verified, and the similarity of the match between the query center and the target answer concept is computed. Finally, the similarities of all relations in the semantic relation matrix are combined to obtain the answer accuracy of the candidate sentence.
The answer accuracy of semantic relation recognition is computed as:
\[
\mathrm{correctness}(S_t, S_b) = \sum_{i=1}^{n} \mathrm{simSynR}(R_{ti}, R_{bi})\,\mathrm{confid}(\mathrm{confidR}_{ti}, \mathrm{confidR}_{bi})\,\beta_i + \mathrm{answFitness}(S_t, S_b)
\]
Meaning of the symbols:
n: the number of semantic relations between the semantic units (or combinations of semantic units) of the target sentence in the semantic relation matrix.
simSynR: semantic relation similarity.
R: a semantic relation of the target sentence (or of the candidate sentence).
confid: the confidence in the similarity of two semantic relations, derived from the confidence of each relation.
confidR: the confidence of a semantic relation.
βi: the weight parameter of semantic relation i.
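The weighted sum above can be sketched in Python as follows. The specification does not say how confid combines the two confidences; the product used here is purely an assumption of this sketch, as are the RelationMatch record and all names. Each record holds the values for one relation i of the semantic relation matrix.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RelationMatch:
    sim: float             # simSynR(R_ti, R_bi): similarity of the two relations
    conf_target: float     # confidR for the relation in the target sentence
    conf_candidate: float  # confidR for the relation in the candidate sentence
    weight: float          # beta_i: weight of semantic relation i


def confid(conf_target: float, conf_candidate: float) -> float:
    """Combine two relation confidences. The product is an assumption;
    the source only says the value is derived from both confidences."""
    return conf_target * conf_candidate


def correctness_relations(matches: Iterable[RelationMatch],
                          answ_fitness: float = 0.0) -> float:
    """Answer accuracy: sum of sim * confid * weight, plus answFitness."""
    total = sum(m.sim * confid(m.conf_target, m.conf_candidate) * m.weight
                for m in matches)
    return total + answ_fitness
```

Under this assumption, a relation that is found with low confidence in the candidate sentence contributes proportionally less to the score, even when its structural similarity is high.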
Fig. 5 shows the system structure and operating principle, describing how data flows between the databases and in what order the modules execute. Articles from the internet or other content sources and data in the content database are processed by the HNC sentence category analysis (SCA) module to obtain the annotated knowledge base of candidate answer sentences. The interrogative sentence entered by the user first enters the interrogative sentence analysis module and then the query center analysis module. The interrogative sentence is also processed by the HNC sentence category analysis module to obtain its HNC sentence class structure data. The query center analysis module produces the query center data, which include the requirements that the query center places on the target answer; combining the query center data with the HNC sentence class structure data of the interrogative sentence yields the target answer sentence pattern (sequence). The semantic relation structure extraction module, working on the output of the query center analysis module together with the target answer sentence pattern (sequence), generates the semantic relation matrix of the target sentence. This matrix and the annotated candidate answer sentences in the knowledge base enter the hypothesis-verification mechanism module, which recognizes the target answer semantic relation matrix in each candidate answer sentence. The annotated candidate answer sentences are also processed, together with the target answer sentence pattern (sequence), by the sentence class pattern matching module. Combined with the hypothesis-verification result, this yields the sentence class pattern matching result, the semantic relation structure recognition and matching result, and the answer accuracy score; the answers are then ranked by correctness to obtain the ranked answer list.
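The final ranking step described for Fig. 5 can be illustrated with a few lines of Python. This is a sketch under assumptions: rank_answers is a hypothetical name, candidates are arbitrary objects, and the score function stands in for whichever answer accuracy computation (pattern matching or semantic relation recognition) applies to a candidate.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

C = TypeVar("C")


def rank_answers(candidates: Iterable[C],
                 score: Callable[[C], float]) -> List[Tuple[float, C]]:
    """Score every candidate answer sentence and return (score, candidate)
    pairs ordered from the highest answer accuracy to the lowest."""
    scored = [(score(c), c) for c in candidates]
    # Sort on the score only, so candidates themselves need not be comparable;
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Keeping the scoring function separate from the ranking mirrors the separation between the matching modules and the ranking module in the system description.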

Claims (10)

1. A computer information retrieval system based on natural language understanding, in which retrieval is started by an interrogative sentence entered by a user and the system outputs answers ranked by semantic relevance, characterized in that it comprises an HNC sentence category analysis (SCA) module, a sentence class pattern matching module, and a ranking module; the HNC sentence category analysis module analyzes articles and content from the internet or other content sources to build a knowledge base of annotated candidate answer sentences, and also performs sentence class structure analysis on the interrogative sentence that starts the retrieval to obtain a target sentence class pattern and generate a sequence of semantically equivalent target sentence patterns; the sentence class pattern matching module matches these against the candidate answer sentences in the knowledge base, and the ranking module then ranks the matching results.
2. The computer information retrieval system based on natural language understanding according to claim 1, characterized in that, for candidate answer sentences in the knowledge base that have the same or a similar sentence class pattern as the target sentence class pattern, the system obtains the accuracy of the candidate sentence with respect to the target answer by computing the concept similarity between each pair of corresponding semantic chunks of the target sentence and the candidate sentence:
\[
\mathrm{simConcept}(t,b) = \mathrm{simCat}(t,b)\,\beta_{cat} + \sum \mathrm{simFiv}(t,b)\,\beta_{fiv} + \mathrm{simSynt}(t,b)\,\beta_{syn}
\]
Meaning of the symbols in the formula: simConcept: concept similarity of the candidate concept b with respect to the target concept t; simCat: similarity of the concept categories; simFiv: concept similarity of the five-tuple symbol sequences; simSyn (written simSynt in the formula): similarity of the grammatical symbols; simNou: body-layer concept similarity; simRe: attachment-layer concept similarity; simHigh: high-layer concept similarity; simMid: middle-layer concept similarity; simLow: low-layer concept similarity; β: weight parameter for the corresponding part of the concept symbol.
3. The computer information retrieval system based on natural language understanding according to claim 1, characterized in that a target sentence-category pattern sequence is obtained from the target sentence-category pattern by semantically equivalent sentence-category format conversion; each part of the target sentence-category pattern is matched and compared with the corresponding part of the candidate answer sentence pattern to judge the semantic similarity of the two and the degree to which the candidate answer sentence contains the target answer; the similarity of a semantic chunk of the candidate sentence with respect to the corresponding semantic chunk of the target sentence is computed as:
$$\mathrm{simChunk}(Chunk_t, Chunk_b) = \frac{\sum \mathrm{simConcept}(M_{ti}, M_{bi})\,\beta_m + \sum \mathrm{simConcept}(K_{ti}, K_{bf})\,\beta_k}{T_t}$$
The answer accuracy under the pattern matching method is computed as:
$$\mathrm{correctness}(S_t, S_b) = \sum_{i=1}^{n} \mathrm{simChunk}(Chunk_{ti}, Chunk_{bi}) + \mathrm{answFitness}(S_t, S_b)$$
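The two formulas of claim 3 can be sketched as follows. The claim does not define M, K, or Tt, so the sketch simply treats the M and K terms as two lists of precomputed simConcept values and Tt as a positive normalizing term for the target chunk; the function names and the sample weights are assumptions for illustration only.

```python
from typing import Sequence


def sim_chunk(m_sims: Sequence[float],
              k_sims: Sequence[float],
              beta_m: float,
              beta_k: float,
              t_t: float) -> float:
    """simChunk = (sum(M terms)*beta_m + sum(K terms)*beta_k) / Tt.

    m_sims and k_sims hold precomputed simConcept values for the M and K
    components of one chunk pair; t_t is assumed to be a positive normalizer.
    """
    if t_t <= 0:
        raise ValueError("t_t must be positive")
    return (sum(m_sims) * beta_m + sum(k_sims) * beta_k) / t_t


def pattern_correctness(chunk_sims: Sequence[float], answ_fitness: float) -> float:
    """correctness = sum of simChunk over the n chunk pairs + answFitness."""
    return sum(chunk_sims) + answ_fitness


if __name__ == "__main__":
    first = sim_chunk([1.0, 0.5], [0.5], beta_m=0.6, beta_k=0.4, t_t=2)
    second = sim_chunk([0.5], [1.0], beta_m=0.6, beta_k=0.4, t_t=2)
    print(pattern_correctness([first, second], answ_fitness=0.2))
```

Because answFitness is added rather than multiplied, a candidate sentence with weak chunk overlap can still rank well if it fits the query center closely, which appears to be the intent of the additive term.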
4. The computer information retrieval system based on natural language understanding according to claim 1, characterized in that it further comprises a hypothesis-verification mechanism module; the semantic relation matrix of the target sentence is recognized in the candidate answer sentences, against the semantic relation matrix of the target answer, through analysis by the hypothesis-verification mechanism module; the hypothesis-verification mechanism module uses the linguistic knowledge of the system to verify and compute the confidence that a candidate sentence in the knowledge base contains a given target semantic relation, and thereby discovers the deep semantics implied under different forms of linguistic expression; the answer accuracy under semantic relation recognition is computed as:
$$\mathrm{correctness}(S_t, S_b) = \sum_{i=1}^{n} \mathrm{simSynR}(R_{ti}, R_{bi})\,\mathrm{confid}(\mathrm{confidR}_{ti}, \mathrm{confidR}_{bi})\,\beta_i + \mathrm{answFitness}(S_t, S_b)$$
Meaning of each symbol:
n: the number of semantic relations among the semantic elements (or combinations of semantic elements) of the target sentence in the semantic relation matrix;
simSynR: semantic relation similarity;
R: a semantic relation of the target sentence (Rt) or of the candidate sentence (Rb);
confid: the confidence in the similarity of two semantic relations, derived from the confidences of the two relations;
confidR: the confidence of a semantic relation;
βi: the weight parameter of semantic relation i.
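A minimal Python sketch of the semantic relation accuracy in claim 4 follows. The claim does not say how confid combines the two relation confidences, so the product used in `product_confid` is an assumption, as are the `RelationPair` class and all sample values; a different combiner can be passed in.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RelationPair:
    """One target relation R_ti matched with a candidate relation R_bi."""
    sim_syn_r: float         # simSynR(R_ti, R_bi)
    confid_target: float     # confidR_ti
    confid_candidate: float  # confidR_bi
    beta: float              # beta_i, weight of relation i


def product_confid(a: float, b: float) -> float:
    """Assumed combiner: the claim does not specify how confid is computed."""
    return a * b


def relation_correctness(pairs: Sequence[RelationPair],
                         answ_fitness: float,
                         confid: Callable[[float, float], float] = product_confid) -> float:
    """correctness = sum_i simSynR_i * confid_i * beta_i + answFitness."""
    total = sum(p.sim_syn_r * confid(p.confid_target, p.confid_candidate) * p.beta
                for p in pairs)
    return total + answ_fitness


if __name__ == "__main__":
    pairs = [
        RelationPair(1.0, 1.0, 0.5, 0.5),  # strong match, uncertain candidate relation
        RelationPair(0.5, 1.0, 1.0, 0.5),  # weaker match, both relations certain
    ]
    print(relation_correctness(pairs, answ_fitness=0.1))
```

Passing the combiner as a parameter keeps the sketch neutral about the confidence model, since a hypothesized relation with low confidence should contribute less regardless of how that confidence was obtained.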
5. The computer information retrieval system based on natural language understanding according to claim 1, characterized in that the semantic relation structure extraction module analyzes and extracts the matrix of mutually intersecting semantic relations among the concept elements (or combinations of concept elements) of the target sentence, obtaining respectively semantic relations based on concept collocation, semantic relations based on sentence-category structure, semantic relations based on modification, and semantic relations that conform to world knowledge; each hypothesized semantic relation is then verified and scored using the analysis results of the sentence currently being processed.
6. A computer information retrieval method based on natural language understanding, in which retrieval is started by an interrogative sentence input by the user and the system outputs answers ranked by semantic relevance, characterized in that it comprises the following processing steps: in the first step, articles from the Internet and data in the content database are processed by the HNC sentence category analysis (SCA) module to obtain an annotated knowledge base of candidate answer sentences; in the second step, the interrogative sentence input by the user is first processed by the HNC sentence category analysis module to obtain its HNC sentence-category structure, is then analyzed by the interrogative sentence analysis module and afterwards by the query center analysis module, and on this basis the target answer sentence pattern is extracted and a sequence of semantically equivalent target sentence patterns is generated; in the third step, the sentence-category pattern matching module computes concept similarity over the words and semantic chunks of the annotated candidate answer sentences in the knowledge base and of the target answer sentence pattern (sequence), and compares each candidate sentence with the target sentence to obtain the sentence-category pattern matching result, the semantic relation structure recognition and matching result, and the answer accuracy score; in the fourth step, the answers are ranked by correctness and the results are returned.
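The four steps of claim 6 can be sketched as a small pipeline. HNC analysis itself is not reproduced here: `analyse`, `expand_patterns`, and `score` are hypothetical callables that stand in for the HNC sentence category analysis module, the semantically equivalent pattern conversion, and the accuracy scoring, and all function names are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")  # type of an analysed (annotated) sentence


def build_knowledge_base(sentences: Iterable[str],
                         analyse: Callable[[str], A]) -> List[Tuple[str, A]]:
    """Step 1: annotate every source sentence once, ahead of any query."""
    return [(s, analyse(s)) for s in sentences]


def retrieve(question: str,
             knowledge_base: List[Tuple[str, A]],
             analyse: Callable[[str], A],
             expand_patterns: Callable[[A], List[A]],
             score: Callable[[A, A], float]) -> List[Tuple[float, str]]:
    """Steps 2 to 4: analyse the question, score candidates, rank them."""
    # Step 2: target pattern plus its semantically equivalent variants.
    patterns = expand_patterns(analyse(question))
    # Step 3: each candidate keeps its best score over all target patterns.
    ranked = [
        (max((score(p, annotation) for p in patterns), default=0.0), sentence)
        for sentence, annotation in knowledge_base
    ]
    # Step 4: highest answer accuracy first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    # Toy stand-ins: a bag of lowercased words replaces real HNC analysis.
    toy_analyse = lambda s: frozenset(s.lower().split())
    toy_expand = lambda target: [target]
    toy_score = lambda p, c: len(p & c) / len(p) if p else 0.0

    kb = build_knowledge_base(
        ["Paris is the capital of France", "Berlin is in Germany"], toy_analyse)
    for accuracy, sentence in retrieve("capital of France", kb,
                                       toy_analyse, toy_expand, toy_score):
        print(accuracy, sentence)
```

Annotating the knowledge base ahead of time, as the first step of the claim describes, means only the question has to be analysed at query time.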
7. The computer information retrieval method based on natural language understanding according to claim 6, characterized in that, in the third step, for cases where sentence-category pattern matching is difficult to apply, the semantic relation structure extraction of the HNC sentence category analysis (SCA) extracts the matrix of mutually intersecting semantic relations among the concept elements (or combinations of concept elements) of the target sentence; the hypothesis-verification mechanism module then uses the linguistic knowledge of the system to verify and compute the confidence that a candidate sentence in the knowledge base contains a given target semantic relation, and thereby discovers the similarity of the deep semantics implied under different forms of linguistic expression.
8. The computer information retrieval method based on natural language understanding according to claim 6, characterized in that the process of generating the target answer sentence pattern sequence comprises the following steps: the query center is generated from the query-center semantic chunk JK or FK and comprises the interrogative word together with the query head word that immediately follows or modifies the interrogative word; at the same time, the accumulated value of the auxiliary chunks FK before the query center and the accumulated value of the main chunks JK after the query center are generated; if the accumulated value of the auxiliary chunks FK and the accumulated value of the main chunks JK, joined end to end, together express J (or several parts of J), then the target sentence-category pattern sequence produced by semantically equivalent sentence-category format conversion comprises, in order, the accumulated values of the auxiliary chunks FK and main chunks JK, the semantic chunk designator, and the query-center semantic chunk JK or FK; and also the query-center semantic chunk JK or FK, the semantic chunk designator, and the accumulated values of the auxiliary chunks FK and main chunks JK.
9. The computer information retrieval method based on natural language understanding according to claim 6, characterized in that the third step comprises the following steps: first, it is judged whether the sentence category of the target pattern is the same as the sentence category of the candidate answer sentence; if they are completely different, pattern matching ends and the sentence is handled by the semantic relation recognition module; for a mixed sentence category, where part of the candidate sentence's mixed category is the same, the semantic chunks belonging to that part of the mixed sentence category are identified and then handled in the same way as the case of identical sentence categories described next; if the sentence categories are the same, semantic chunk similarity is compared chunk by chunk, comparing respectively the concept similarity of the core words of the GBK chunks, the concept similarity of the modifier parts of the GBK chunks, and the concept similarity of each FK with its corresponding FK, together with the analysis and scoring of the query center against the target answer concept; the answer accuracy score of the candidate answer sentence is then computed from all of these.
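The branching described in claim 9 can be sketched as a small dispatch function. Sentence categories are modelled here as sets of category labels so that a mixed sentence category is simply a set with more than one label; the labels "A", "B", "C" and the strategy names are hypothetical placeholders, not real HNC category codes.

```python
from typing import FrozenSet

SEMANTIC_RELATION = "semantic_relation_recognition"
FULL_MATCH = "chunk_by_chunk_match"
PARTIAL_MATCH = "match_shared_part_only"


def choose_strategy(target: FrozenSet[str], candidate: FrozenSet[str]) -> str:
    """Pick the processing path for one target/candidate pair.

    - no category in common: leave pattern matching and use
      semantic relation recognition instead;
    - identical categories: compare every semantic chunk;
    - partial overlap (mixed categories): compare only the chunks
      that belong to the shared categories.
    """
    shared = target & candidate
    if not shared:
        return SEMANTIC_RELATION
    if target == candidate:
        return FULL_MATCH
    return PARTIAL_MATCH


if __name__ == "__main__":
    print(choose_strategy(frozenset({"A"}), frozenset({"A"})))
    print(choose_strategy(frozenset({"A", "B"}), frozenset({"B", "C"})))
    print(choose_strategy(frozenset({"A"}), frozenset({"C"})))
```

Treating the partially shared case as its own path mirrors the claim, which handles the shared part of a mixed sentence category like an identical category and leaves the rest out of the chunk comparison.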
10. The computer information retrieval method based on natural language understanding according to claim 6, characterized in that the fourth step comprises the following steps: first, the concepts in the candidate sentence that are identical or similar to those of the target sentence are found; next, the matrix of mutually intersecting semantic relations among the concept elements (or combinations of concept elements) of the target sentence is analyzed and extracted, obtaining respectively semantic relations based on concept collocation, semantic relations based on sentence-category structure, semantic relations based on modification, and semantic relations that conform to world knowledge; each hypothesized semantic relation is then verified and scored using the analysis results of the sentence currently being processed; for each semantic relation in the semantic relation matrix, the corresponding relation in the candidate sentence is discovered and verified by hypothesis, and the similarity of the paired matches is computed for the query center and the target answer concept; the similarities of all semantic relations in the semantic relation matrix are then combined to obtain the answer accuracy of the candidate sentence.
CN 200610032725 2006-01-09 2006-01-09 Computer information retrieval system based on natural speech understanding and its searching method Pending CN1794240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610032725 CN1794240A (en) 2006-01-09 2006-01-09 Computer information retrieval system based on natural speech understanding and its searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610032725 CN1794240A (en) 2006-01-09 2006-01-09 Computer information retrieval system based on natural speech understanding and its searching method

Publications (1)

Publication Number Publication Date
CN1794240A true CN1794240A (en) 2006-06-28

Family

ID=36805674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610032725 Pending CN1794240A (en) 2006-01-09 2006-01-09 Computer information retrieval system based on natural speech understanding and its searching method

Country Status (1)

Country Link
CN (1) CN1794240A (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339551B (en) * 2007-07-05 2013-01-30 日电(中国)有限公司 Natural language query demand extension equipment and its method
US8180628B2 (en) 2007-07-05 2012-05-15 Nec (China) Co., Ltd. Apparatus and method for expanding natural language query requirement
CN101763401B (en) * 2009-12-30 2012-05-30 暨南大学 Network public sentiment hotspot prediction and analysis method
US9245049B2 (en) 2011-02-16 2016-01-26 Empire Technology Development Llc Performing queries using semantically restricted relations
CN103380426A (en) * 2011-02-16 2013-10-30 英派尔科技开发有限公司 Performing queries using semantically restricted relations
CN103380426B (en) * 2011-02-16 2017-09-22 英派尔科技开发有限公司 Inquiry is performed using semantic restriction relation
WO2012109786A1 (en) * 2011-02-16 2012-08-23 Empire Technology Development Llc Performing queries using semantically restricted relations
CN103718173B (en) * 2011-07-29 2016-11-30 英派尔科技开发有限公司 BUT reasoning in inconsistent knowledge storehouse
WO2013016854A1 (en) * 2011-07-29 2013-02-07 Empire Technology Development Llc But reasoning in inconsistent knowledge base
US8738561B2 (en) 2011-07-29 2014-05-27 Empire Technology Development Llc But reasoning in inconsistent knowledge base
KR101568623B1 (en) 2011-07-29 2015-11-11 엠파이어 테크놀로지 디벨롭먼트 엘엘씨 But reasoning in inconsistent knowledge base
CN102662930A (en) * 2012-04-16 2012-09-12 乐山师范学院 Corpus tagging method and corpus tagging device
CN102662930B (en) * 2012-04-16 2015-04-22 乐山师范学院 Corpus tagging method and corpus tagging device
CN104182386A (en) * 2013-05-27 2014-12-03 华东师范大学 Word pair relation similarity calculation method
CN103577558B (en) * 2013-10-21 2017-04-26 北京奇虎科技有限公司 Device and method for optimizing search ranking of frequently asked question and answer pairs
CN103577558A (en) * 2013-10-21 2014-02-12 北京奇虎科技有限公司 Device and method for optimizing search ranking of frequently asked question and answer pairs
CN105378729A (en) * 2013-11-27 2016-03-02 Ntt都科摩公司 Generating resources for support of online services
CN107340999A (en) * 2017-01-09 2017-11-10 北京理工大学 Software automation method and system and the method in structure natural language understanding storehouse
CN107122421A (en) * 2017-04-05 2017-09-01 北京大学 Information retrieval method and device
CN107526727A (en) * 2017-07-31 2017-12-29 苏州大学 language generation method based on statistical machine translation
WO2019080648A1 (en) * 2017-10-26 2019-05-02 华为技术有限公司 Retelling sentence generation method and apparatus
US11586814B2 (en) 2017-10-26 2023-02-21 Huawei Technologies Co., Ltd. Paraphrase sentence generation method and apparatus
CN110516157A (en) * 2019-08-30 2019-11-29 盈盛智创科技(广州)有限公司 A kind of document retrieval method, equipment and storage medium
CN111079641A (en) * 2019-12-13 2020-04-28 科大讯飞股份有限公司 Answering content identification method, related device and readable storage medium
CN111079641B (en) * 2019-12-13 2024-04-16 科大讯飞股份有限公司 Answer content identification method, related device and readable storage medium

Similar Documents

Publication Publication Date Title
CN1794240A (en) Computer information retrieval system based on natural speech understanding and its searching method
Yu et al. Typesql: Knowledge-based type-aware neural text-to-sql generation
Unger et al. Question answering over linked data (QALD-4)
US10503828B2 (en) System and method for answering natural language question
DE69932044T2 (en) LANGUAGE-BASED INFORMATION AND LANGUAGE RECOGNITION
CN1252876A (en) Information retrieval utilizing semantic presentation of text
Pattaniyil et al. Combining TF-IDF Text Retrieval with an Inverted Index over Symbol Pairs in Math Expressions: The Tangent Math Search Engine at NTCIR 2014.
CN1845104A (en) System and method for intelligent retrieval and processing of information
CN1335574A (en) Intelligent semantic searching method
CN1916905A (en) Method for carrying out retrieval hint based on inverted list
CN101051311A (en) Method for extracting central term of headword through central term dictionary and information search system of the same
CN1145899C (en) Method for automatic generating abstract from word or file
CN105760462B (en) Man-machine interaction method and device based on associated data inquiry
CN1652106A (en) Machine translation method and apparatus based on language knowledge base
CN1492367A (en) Inquire/response system and inquire/response method
CN102339294A (en) Searching method and system for preprocessing keywords
CN1949211A (en) New Chinese characters spoken language analytic method and device
CN1629837A (en) Method and apparatus for processing, browsing and classified searching of electronic document and system thereof
CN103885985A (en) Real-time microblog search method and device
CN1916904A (en) Method of abstracting single file based on expansion of file
CN113761162B (en) Code searching method based on context awareness
WO1998049632A1 (en) System and method for entity-based data retrieval
CN105677684A (en) Method for making semantic annotations on content generated by users based on external data sources
TWI446191B (en) Word matching and information query method and device
CN102508920B (en) Information retrieval method based on Boosting sorting algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication