CN1521661A - Method for information retrieval by using natural language processing function - Google Patents

Publication number
CN1521661A
Authority
CN
China
Prior art keywords
sublist
semantic
website
natural language
information retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA031020755A
Other languages
Chinese (zh)
Inventor
黄致辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method for information retrieval using natural language processing, comprising the steps of: the user enters the sentence to be queried in a query interface; the input sentence is decomposed into several parse tree sublists; all sublists are transmitted to each semantic website in the network; parse tree sublists that contain ambiguity undergo follow-up syntactic analysis with the help of the semantic website's information; and the semantic website searches the data stored in it. The invention can effectively eliminate the ambiguity problem in natural language analysis.

Description

A method for information retrieval using natural language processing
Technical field
The present invention relates to information retrieval technology, and in particular to an information retrieval method that uses natural language processing throughout the information extraction process to minimize the loss of useful information, thereby improving the accuracy of information extraction.
Background art
In the prior art, when a search engine is used to retrieve information, documents are normally retrieved by entering a query statement.
One approach is the statistical method. The whole system is based on keywords, and a Boolean expression of keywords serves as the query statement. For the document database, a dictionary of "keyword plus the positions where the keyword occurs" is used. By comparing the keywords in the query statement with the keywords in the dictionary of the document database, the positions of each word in the files can be found.
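The keyword dictionary described above can be illustrated with a small positional index. This is only a sketch under assumptions: the function names, the whitespace tokenizer and the two sample documents are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_index(docs):
    """Map each keyword to the (document id, word position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def docs_with(index, word):
    """Return the set of documents that contain the keyword."""
    return {doc_id for doc_id, _ in index.get(word.lower(), [])}

# Hypothetical sample documents.
docs = {"d1": "arsenal won the match", "d2": "the arsenal stores weapons"}
index = build_index(docs)

# Boolean query "arsenal AND won" expressed as a set intersection.
both = docs_with(index, "arsenal") & docs_with(index, "won")
```

The query matches on word form only, which is exactly the limitation the background section goes on to criticize: both documents contain "arsenal", although they use it in different senses.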
For a text information extraction/retrieval system, this method has two technical indicators:
Recall (association rate) = number of relevant documents extracted / number of actually relevant documents in the database;
Precision (accuracy rate) = number of relevant documents extracted / total number of documents extracted.
The recall and precision of all commercial systems are below 40%. Moreover, recall and precision have an inherent trade-off (cancellation effect). Words with the same meaning but different forms cannot be found; words with the same form but different meanings are confused.
The trade-off works as follows: when recall is high, more files are extracted, but the number of irrelevant files among them also increases, so precision is low; when precision is high, many relevant files are not extracted, so there is no guarantee that enough relevant files are retrieved, and recall is low.
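The two indicators and their trade-off can be checked with a few lines of code. The sketch below assumes the standard definitions of recall and precision; the document identifiers are invented for illustration.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved and a set of relevant documents."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical collection: documents 1-4 are the relevant ones.
relevant = {1, 2, 3, 4}
narrow = recall_and_precision({1, 2}, relevant)                  # few documents extracted
broad = recall_and_precision({1, 2, 3, 4, 5, 6, 7, 8}, relevant)  # many documents extracted
```

In this toy case the narrow query has perfect precision but misses half of the relevant documents, while the broad query finds them all at the cost of precision.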
The theoretical defect behind the low precision is that only the outward form of a word is considered, not its actual meaning, whereas in fact words commonly have multiple meanings.
The theoretical limitations of the Structured Query Language (SQL) are that it suits only the relational model, not object-oriented models involving inheritance and abstraction, and that it can be used only for simple objects and cannot handle complex objects.
Some improvements to keyword-based information extraction have appeared recently, such as synonyms, statistical methods, fuzzy logic and concept groups; but none of them essentially overcomes the defect of using a Boolean expression of keywords as the query statement.
The semantic network is a very important idea in the field of artificial intelligence, and several classification strategies exist for it. Its basic idea is to connect the meaning of a word with all related concepts. The relationships between concepts form a network structure, which is organized into a hierarchy by the "isa" relation. Starting from the top level, the exact position of a word's meaning is located, and the meaning of the word is thus unfolded layer by layer.
The defects of semantic network theory are: it is not tied to the meanings of particular words; it is not based on psycholinguistic research; it is not tied to the syntactic properties of words; and it cannot be unfolded during syntactic analysis.
At present no representation strategy for natural language, including any form of semantic network, can solve the thorny problem of expressing all the concepts of natural language exhaustively. Although some small semantic networks have been built, no semantic network has ever been large enough to serve as the basis of a natural language system.
To sum up, all the above methods share these shortcomings: there is a gap between the meaning of the query and the meaning the person wants to describe; the trade-off between recall and precision cannot be effectively eliminated; complexity at the feature level cannot be effectively controlled; and thorny common-sense reasoning cannot be solved.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for information retrieval using natural language processing. The invention combines a high-precision syntactic analyzer with the latest semantic web technology to improve the accuracy of information retrieval significantly.
The method of the invention comprises the steps:
Step 1, the user enters the sentence to be queried in the query interface;
Step 2, the input sentence is decomposed into several parse tree sublists;
Step 3, all parse tree sublists are delivered to each semantic website in the network;
Step 4, parse tree sublists that contain ambiguity undergo follow-up syntactic analysis in combination with the semantic website's information;
Step 5, the semantic website uses semantic knowledge and rules to search the data stored in the semantic website according to the semantics of the parse tree sublists; if there is no result, the process ends; if there is a result, the process ends after the result is returned to the query interface.
The invention uses semantic web technology and relies on the semantic knowledge and reasoning ability of each semantic website itself to eliminate ambiguity in natural language analysis effectively, thereby achieving high accuracy. At the same time, the query statement entered by the user is syntactically analyzed with the "expansion plus exclusion" (expand-and-exclude) method, which guarantees an exhaustive analysis of every possible result. All of these syntactic analysis results are passed directly to every semantic website on the network, rather than being converted into a Boolean expression of keywords as in a traditional information extraction system, or into the output of a domain-specific or limited-scale semantic system. This ensures that the original useful information entered by the user reaches each semantic website without loss, whereas traditional methods lose a large amount of the original useful information before retrieval even begins.
Description of drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows how a football semantic website processes the user's input;
Fig. 3 is a structural diagram of a natural language analysis device that performs syntactic analysis;
Fig. 4 is a concrete example of how a parse tree is generated.
Embodiment
The method of the invention combines a high-precision syntactic analyzer with the latest semantic web technology to improve the search accuracy of information retrieval significantly.
It is well known that semantic knowledge and reasoning must be used to eliminate ambiguity in natural language parsing effectively. However, no existing semantic processing method can describe all the semantic knowledge in natural language exhaustively. Existing methods can use semantic knowledge and reasoning to eliminate parsing ambiguity only within a specific field of knowledge, or for general purposes but within a semantic scope of limited scale. This not only restricts the range of application of information extraction but also causes a large loss of the original useful information in the natural language input.
Because the method of the invention is based on the latest semantic web technology, the related technology of semantic websites is introduced below in order to explain the invention better.
Consider a complex query such as "I want to take an airline flight from Beijing to Washington on May first 2002, which should start from Beijing in the afternoon." An ordinary website cannot handle such a query. To solve this kind of problem, a system must go beyond keywords and determine the meaning of the resources described on the website.
A semantic website system can define a website's ontology and its associated knowledge base. Ontology is a term borrowed from philosophy, where it refers to the science of describing the kinds of entities and how they are related. An ontology is a set of definitions of classes and attributes, together with the rules that constrain those classes and attributes. It comprises:
the taxonomic relations between classes; datatype attributes (descriptions of the attributes of a class); object attributes (descriptions of the relations between classes); instances of classes; and instances of attributes. The datatype attributes and object attributes together make up the attributes of a class.
A set of assertions loaded into an inference system is called a knowledge base. These assertions may include facts about individuals that are members of classes, as well as derived facts, that is, facts that are not literally present in the ontology but are entailed by its semantics. The assertions may be based on a single ontology or on multiple distributed ontologies linked together by the mechanisms of the semantic website system.
An ontology differs from ad hoc rules within a specific domain, which cannot handle reasoning outside that domain at all. For example, a website that lists products has no general mechanism for concluding that, because a product is a kind of Maotai, it must be a liquor. One advantage of an ontology is that it can reason about such cases. Ontologies provide general support rather than being confined to one specific field.
A semantic website deals with website content, so it is inherently distributed and open. That is, a resource description is not confined to one file or scope but has object-oriented inheritance and extensibility. For example, class C1 is first defined in ontology A and is extended in ontology B; ontology C then inherits the class from ontology B, so that class C1 is available in ontology C.
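The A, B, C example can be made concrete with a toy model in which each ontology may extend or inherit from one other ontology. The Ontology class, its fields and the attribute names "name" and "price" are assumptions made for illustration; real ontology languages are far richer than this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    name: str
    classes: dict = field(default_factory=dict)  # class name -> set of attribute names
    parent: "Ontology | None" = None             # ontology being extended or inherited from

    def lookup(self, class_name):
        """Collect the attributes of a class from this ontology and all its ancestors."""
        attributes, found, onto = set(), False, self
        while onto is not None:
            if class_name in onto.classes:
                attributes |= onto.classes[class_name]
                found = True
            onto = onto.parent
        if not found:
            raise KeyError(class_name)
        return attributes

onto_a = Ontology("A", {"C1": {"name"}})                 # C1 first defined in A
onto_b = Ontology("B", {"C1": {"price"}}, parent=onto_a)  # B extends C1
onto_c = Ontology("C", parent=onto_b)                    # C inherits C1 from B
c1_in_c = onto_c.lookup("C1")
```

Ontology C declares nothing itself, yet a lookup of C1 through C sees both the original definition and the extension.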
The latest semantic web technology gives every website general semantic expression and common-sense reasoning abilities that are not limited to specific fields of knowledge or specific rules. The whole system is completely open, and software tools make it easy for users to build ontologies; that is, different users can independently build entirely different ontologies. In addition, the architecture of the system is inherently distributed, can be extended hierarchically, and fully supports object-oriented inheritance and extensibility.
The basic idea of the invention is to use semantic web technology, relying on the semantic knowledge and reasoning ability of each semantic website itself, to eliminate ambiguity in natural language analysis effectively and thereby achieve high accuracy.
On the one hand, the query statement entered by the user is syntactically analyzed with the "expansion plus exclusion" (expand-and-exclude) method. This parsing method guarantees an exhaustive analysis of every possible result. After analysis results that are absolutely impossible have been excluded, each remaining result is either a correct result or a possibly correct result that needs semantic knowledge and reasoning to disambiguate. All of these syntactic analysis results are passed directly to every semantic website on the network, rather than being converted into a Boolean expression of keywords as in a traditional information retrieval system, or into the output of a domain-specific or limited-scale semantic system. This ensures that the original useful information entered by the user reaches each semantic website without loss, whereas traditional methods lose a large amount of the original useful information before retrieval even begins.
On the other hand, when a semantic website receives a search request and finds that some of the incoming syntactic analysis results need semantic knowledge and reasoning to disambiguate, it carries out follow-up natural language parsing in combination with its own information. If the follow-up analysis eliminates the syntactic ambiguity, or if the incoming syntactic analysis result had no ambiguity in the first place, the website performs query reasoning with its own semantic knowledge and inference system. If the conditions are met and a result is obtained, it sends a response containing the result to the query interface; if the conditions are not met, it does not respond.
Because the structure and function of semantic websites are distributed and open, a semantic website can reason with its own information, can call information from associated semantic websites distributed on the network and reason with it internally, or can pass all or part of the input conditions to associated semantic websites distributed on the network, let them reason internally, and have the result either returned to itself or returned directly to the query interface by the associated website.
Because semantic websites allow users to build ontologies independently, and their architecture is distributed and open, thousands of people can build thousands of semantic websites with different ontologies on the network. Even though the semantic knowledge in natural language is inexhaustible, as long as a semantic website whose ontology contains the relevant semantic knowledge exists on the network, the above method is certain to find that website. Websites that do not satisfy the query conditions send no response to the query interface, so everything returned is a correct result. Accuracy is therefore very high.
In addition, the latest semantic web technology gives every website general semantic expression and common-sense reasoning abilities that are not limited to specific fields of knowledge or specific rules. The user can therefore enter query conditions from any field and of any scope in the query interface. As long as the input text follows general syntactic rules and information satisfying the conditions really exists on the network, a correct result is certain to be obtained.
Within the basic idea of the invention, the following supplements and variations are also possible.
1, when the user's input is syntactically analyzed, it may first be processed roughly once in combination with a knowledge-based inference system containing common, basic semantic knowledge and rules. Sublists excluded by this inference system's semantic exclusion rules are not discarded, however; they are transmitted to all semantic websites together with the other resulting sublists, and the knowledge-based inference system in each semantic website finally decides whether the exclusion stands. Because part of the knowledge-reasoning computation has already been done at the query interface, the total computation across all semantic websites is reduced.
2, instead of passing the results of expand-and-exclude analysis directly to every website address on the network, other techniques are first used to pre-process once while ensuring high recall. For example, an ordinary keyword retrieval method is first used to find as many websites with matching keywords as possible. Although the precision of such a result is very low, the websites in the result are only a small fraction of all websites on the Internet. The expand-and-exclude results are then sent only to the websites in this preliminary result. Because these are only a small fraction of all websites on the Internet, the total network traffic caused by sending the syntactic analysis results is reduced.
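Variation 2 amounts to a cheap, high-recall keyword filter that selects the websites to which the parse tree sublists are then sent. The sketch below assumes each website advertises a set of keywords; the site names and keyword sets are invented and are not part of the patent.

```python
def prefilter_sites(site_keywords, query_words):
    """Keep every site sharing at least one keyword with the query (high recall, low precision)."""
    wanted = {word.lower() for word in query_words}
    return [site for site, keywords in site_keywords.items()
            if wanted & {keyword.lower() for keyword in keywords}]

# Hypothetical semantic websites and their advertised keywords.
sites = {
    "weapons.example": {"arsenal", "rifle"},
    "football.example": {"Arsenal", "league"},
    "cooking.example": {"recipe"},
}
targets = prefilter_sites(sites, ["Did", "Arsenal", "win", "last", "week"])
```

Both the weapons site and the football site survive the filter, so the imprecise keyword step only narrows where the sublists are sent; disambiguation is still left to each site's own reasoning.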
3, a background process periodically processes the semantic knowledge and inference systems of semantic websites and keeps some of the relevant content in the local computing environment of the query interface. The information and functions of some or all semantic websites can then be used in that local environment. By performing a degree of semantic website processing on the query statement locally, part of the data that would otherwise be sent over the network as syntactic analysis results is saved, which speeds up the whole query.
4, the invention applies not only to the Internet but also to information retrieval systems on small networks or in stand-alone environments. The semantic website is one technology of this kind; the invention applies equally to other semantic knowledge technologies that provide functions equivalent to those of a semantic website.
Fig. 1 is a flowchart of information retrieval according to the method of the invention, explained as follows:
101, the user enters a query sentence in the query interface;
102, the input query sentence is syntactically analyzed with the expand-and-exclude method to obtain parse tree sublists;
103, among all the parse tree sublists produced in 102, one sublist is certain to contain the correct parse tree, and some sublists may contain incomplete parse trees; all these sublists are delivered from the query interface to each semantic website in the network;
104, on entering a semantic website, the following decomposition and expansion is carried out;
141, look up and determine the class names stored in the semantic website;
142, look up and determine the attributes of the classes stored in the semantic website;
143, look up and determine the relations between the classes stored in the semantic website;
144, look up and determine the assertions stored in the semantic website, then carry out knowledge reasoning so that parse tree sublists containing ambiguity can undergo follow-up syntactic analysis;
145, the inference engine uses semantic knowledge and rules to search the data stored in the semantic website according to the semantics of the parse tree sublists;
105, judge whether any transmitted sublist has not yet been analyzed by the semantic website; if so, enter 106; otherwise all sublists have been analyzed, and 100 is entered;
106, take out the parse tree of this sublist;
107, judge whether this parse tree is ambiguous, that is, whether it is a complete parse tree; if it is ambiguous (that is, not a complete parse tree), enter 108; otherwise enter 110;
108, use the semantic knowledge and inference engine of the semantic website to help the expand-and-exclude method analyze this sublist syntactically again;
109, judge whether 108 obtained a correct parse tree; if so, enter 110; otherwise enter 105;
110, at this point the parse tree is correct;
When the semantic knowledge and inference engine of the semantic website are used to analyze this parse tree, a syntactically correct parse tree may have several different meanings. If the parse tree is semantically wrong within the scope of this semantic website's semantic knowledge, no further search for deeper data is made. Otherwise, data that meet the conditions are searched for according to the semantic knowledge and inference engine of the semantic website. There may be a result, or there may be none.
111, judge whether 110 obtained a result; if not, enter 105; if so, enter 112;
112, return the result to the query interface, then enter 105;
100, end; no message is returned to the query interface.
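The per-website loop of steps 105 to 112 can be outlined as a small driver function. In this sketch the three callbacks stand in for the website's own completeness check, follow-up parsing and semantic search; their names and signatures are assumptions, not interfaces defined by the patent.

```python
def process_on_site(sublists, is_complete, reparse, search):
    """Run steps 105-112 of Fig. 1 for one semantic website and collect its answers."""
    results = []
    for tree in sublists:                 # 105/106: take the next unprocessed sublist
        if not is_complete(tree):         # 107: ambiguous, i.e. not a complete parse tree
            tree = reparse(tree)          # 108: follow-up analysis with site knowledge
            if tree is None:              # 109: no correct parse tree obtained
                continue
        answer = search(tree)             # 110: reason and search the site's data
        if answer is not None:            # 111: was a result obtained?
            results.append(answer)        # 112: result to be returned to the query interface
    return results                        # 100: an empty list means no response is sent
```

A website for which the list comes back empty simply stays silent, which is how the method avoids returning irrelevant hits.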
The method of the invention is described in detail below with a practical example, in conjunction with Fig. 1.
First, at 101, the user enters the sentence "Did Arsenal win last week?" in the query interface.
Corresponding to 102, the sentence entered by the user is analyzed with the expand-and-exclude method as follows:
After syntactic analysis of this sentence with the expand-and-exclude method, a correct parse tree is obtained; how the expand-and-exclude method produces the parse tree is described with reference to Fig. 3 and Fig. 4.
This parse tree is a general question introduced by the auxiliary verb "do". It consists of "did", the past tense of the auxiliary verb "do", the noun "Arsenal" as subject, and the predicate verb phrase "win last week". The predicate verb phrase consists of the verb "win" and the noun phrase of time "last week". The noun phrase consists of the adjective "last" and the time noun "week", and modifies the preceding verb "win" as an adverbial of time;
Then, at 103, this parse tree is sent to all semantic websites on the network.
The following takes a weapons semantic website as an example to describe how it processes the parse tree of the user's input:
104, a weapons semantic website receives the query information and begins the decomposition and expansion;
141, the weapons semantic website first finds the names of some of the classes it contains: "arsenal" (munitions factory), "house", "non-life object", "human activity", "win";
142, it finds some of the attributes of the class "human activity" that it contains: "agent", "time", "place";
143, it finds some of the relations between the classes that it contains:
"arsenal" Is_A_Kind_Of "house";
"house" Is_A_Kind_Of "non-life object";
"win" Is_A_Kind_Of "human activity";
144, it finds some of the assertions that it contains:
"non-life object" can not be the agent of "human activity";
145, it carries out reasoning, part of which is:
"arsenal" Is_A_Kind_Of "non-life object";
"arsenal" can not be the agent of "win";
Get no answer (no result);
105-106, this parse tree is obtained;
107, judge whether this parse tree is ambiguous, that is, whether it is a complete parse tree. It turns out to have no ambiguity and is a complete parse tree, so 110 is entered;
110, reasoning is carried out with the information of the weapons semantic website, that is, the process of 141-145 above; 111 is then entered;
111, judge whether 110 produced a result. In this example, within the semantic scope of weapons, the noun "arsenal" (munitions factory) cannot be the subject of the verb "win", so no data matching the query conditions can be derived. The process returns to 105; because only one sublist was input, processing ends here and no message is returned to the query interface.
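The weapons website's reasoning in 143 to 145 can be reproduced with a tiny Is_A_Kind_Of hierarchy and one negative assertion. This is a minimal sketch: the dictionary encoding and function names are assumptions, and treating the absence of a prohibition as permission is a simplification made here for brevity.

```python
def is_a(concept, target, kind_of):
    """Follow Is_A_Kind_Of links upward from concept until target or the top is reached."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == target:
            return True
        seen.add(concept)
        concept = kind_of.get(concept)
    return False

def can_be_agent(noun, verb, kind_of, forbidden):
    """Return False if any assertion forbids noun's class from being the agent of verb's class."""
    for agent_class, activity_class in forbidden:
        if is_a(noun, agent_class, kind_of) and is_a(verb, activity_class, kind_of):
            return False
    return True

# Relations and assertion taken from steps 143 and 144 of the weapons example.
WEAPON_KIND_OF = {"arsenal": "house", "house": "non-life object", "win": "human activity"}
FORBIDDEN = [("non-life object", "human activity")]

weapon_site_allows = can_be_agent("arsenal", "win", WEAPON_KIND_OF, FORBIDDEN)
```

Here weapon_site_allows evaluates to False, matching the "Get no answer" outcome: the website derives that an arsenal is a non-life object and therefore cannot be the agent of "win".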
The query on the weapons semantic website above produced no result. The next example shows what happens when the same parse tree is delivered to a football semantic website.
Fig. 2 shows how a football semantic website processes the user's input:
2102, the semantic website receives the query information, namely the input sentence above, "Did Arsenal win last week?"
2104, the query is delivered to a football semantic website for analysis; the football semantic website carries out the decomposition and expansion as follows:
Corresponding to 141 of Fig. 1, it finds the names of some of the classes that it contains:
"arsenal" (munitions factory), "football club", "football team", "life object", "human activity", "win";
Corresponding to 142 of Fig. 1, it finds some of the attributes of the class "human activity" that it contains: "agent", "time", "place";
Corresponding to 143 of Fig. 1, it finds some of the relations between the classes that it contains:
"arsenal" Is_One_Of "football club";
"arsenal" Is_One_Of "football team";
"football team" Is_A_Kind_Of "life object";
"win" Is_A_Kind_Of "human activity";
Corresponding to 144 of Fig. 1, it finds some of the assertions that it contains:
"non-life object" can not be the agent of "human activity";
"life object" can be the agent of "human activity";
Corresponding to 145 of Fig. 1, it carries out reasoning, part of which is:
"arsenal" Is_One_Of "life object";
"arsenal" can be the agent of "win";
Search the database with QUERY=("football team"="arsenal") AND ("human_activity"="win") AND ("time"="last week");
Get answer="Arsenal won Livepool with 3:2 last week.";
2106, this parse tree is obtained;
2107, judge whether this parse tree 4101 is ambiguous, that is, whether it is a complete parse tree; it turns out to have no ambiguity and is a complete parse tree; enter 2108;
2108, reasoning is carried out with the information of the football semantic website, that is, the process of 141-145 in Fig. 1; 2111 is then entered;
2111, judge whether 2108 produced a result; in this example, within the semantic scope of football, the noun "arsenal" can be the subject of the verb "win", so the keyword database query QUERY=("football team"="arsenal") AND ("human_activity"="win") AND ("time"="last week") is derived, and the database search yields the result "Arsenal won Livepool with 3:2 last week."; enter 2112;
2112, the result "Arsenal won Livepool with 3:2 last week." is returned to the query interface;
Because only one sublist was input, processing ends here.
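Once the football website has established that "arsenal" can be the agent of "win", the remaining work is an ordinary structured lookup. The sketch below builds a query string in the form shown in the example and evaluates the same three conditions against a toy in-memory table; the record layout and function names are assumptions made for illustration.

```python
def build_query(team, activity, time):
    """Format the three conditions in the style of the QUERY shown in the example."""
    return (f'("football team"="{team}") AND '
            f'("human_activity"="{activity}") AND ("time"="{time}")')

def search(records, team, activity, time):
    """Return the answer of the first record that satisfies all three conditions, else None."""
    for record in records:
        if (record["football team"] == team
                and record["human_activity"] == activity
                and record["time"] == time):
            return record["answer"]
    return None

# Hypothetical stored data; the answer text is the one quoted in the example.
RECORDS = [{"football team": "arsenal", "human_activity": "win", "time": "last week",
            "answer": "Arsenal won Livepool with 3:2 last week."}]

query = build_query("arsenal", "win", "last week")
answer = search(RECORDS, "arsenal", "win", "last week")
```

The conditions are filled from roles in the parse tree (agent, activity, time), not from raw keywords, which is why the lookup only happens after the semantic check has succeeded.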
Fig. 3 shows the modular structure of the natural language analysis device of the invention, which comprises a database 10, an input module 20, a sublist control module 30 and a calculation processing module 40.
Database 10: the database contains various dictionary resources and is stored on the hard disk; part of it can be loaded into memory when the program runs. The syntax dictionary 11 in it provides the processing modules with a part-of-speech lookup function and is used to determine all possible parts of speech of each word in the dictionary.
Input module 20: this module resides in memory and comprises an input buffer 21 and a sublist expansion unit 22. First, the input sentence to be processed is stored temporarily in the input buffer 21; the sublist expansion unit 22 then queries the syntax dictionary 11 in database 10 to obtain all possible parts of speech of each word; finally, the sublist expansion unit 22 decomposes the input sentence into individual sublists. Each sublist is handed to the sublist control module 30 for storage.
Sublist control module 30: comprises a sublist control program 31 residing in memory and a sublist file 32 residing in storage. The sublist control program 31 handles saving, calling, sorting, searching and indexing of sublists, saving of sublist states, and calling and allocation of computer system resources; the results are saved in the sublist file 32.
Calculation processing module 40: this module resides in memory and comprises a rule base, an analysis control program 41 and a current sublist processing unit 42. The analysis control program 41 takes a sublist that still needs syntactic analysis out of the sublist control module and puts it into the current sublist processing unit 42, then compares the sentence content of the current sublist with the rules in the rule base. If an obviously wrong language phenomenon is found, the sublist is discarded; otherwise, combination analysis is carried out according to the generation rules, and the combined result and its status flag are put back into the sublist control module 30.
If nodes can be merged according to a generation rule, the sublist with the merged nodes is sent back to the sublist list. At this point the attributes of the merged sublist can also be marked, for example with the generation rule that was applied. There can then be as many kinds of mark as there are applicable generation rules, which speeds up the next round of analysis. The advantage of this approach is that every sublist is first analyzed broadly, and the next round is then carried out repeatedly on that basis. A "depth analysis" approach can also be adopted: if nodes can be merged, the sublist with the merged nodes is sent back into the current sublist processing unit and the logical operations are carried out again. The sublist is thus excluded and merged repeatedly until nothing more can be merged, at which point it is updated and marked "end", the updated sublist is sent to the tail of the sublist list, and the current sublist processing unit is emptied.
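The breadth-first variant described above can be pictured as a work queue: each round takes one sublist, discards it if an exclusion rule fires, otherwise applies one merge and returns it to the tail of the queue. The rule tables below are toy assumptions chosen for the example sentence of Fig. 4, not the patent's actual rule base.

```python
from collections import deque

# Hypothetical exclusion rules: tag sequences that are obviously wrong.
EXCLUDE = [("article", "verb")]
# Hypothetical generation rules: (pattern, label of the merged node), tried in order.
MERGE = [(("article", "noun"), "NP"),
         (("verb", "noun", "NP"), "VP"),
         (("noun", "VP"), "S")]

def violates(tags):
    """True if any exclusion pattern occurs as a contiguous run in tags."""
    return any(tuple(tags[i:i + len(p)]) == p for p in EXCLUDE for i in range(len(tags)))

def merge_once(tags):
    """Apply the first applicable generation rule once; return None if none applies."""
    for pattern, label in MERGE:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                return tags[:i] + [label] + tags[i + n:]
    return None

def analyse(sublists):
    """Breadth-first expand-and-exclude loop over a queue of tag sequences."""
    queue, finished = deque(sublists), []
    while queue:
        tags = queue.popleft()
        if violates(tags):
            continue                      # obviously wrong: discard the sublist
        merged = merge_once(tags)
        if merged is None:
            finished.append(tags)         # nothing left to merge: mark as "end"
        else:
            queue.append(merged)          # back to the tail for the next round
    return finished

trees = analyse([["noun", "verb", "noun", "article", "noun"],
                 ["noun", "verb", "noun", "article", "verb"]])
```

With these toy rules the first tag sequence reduces to a single sentence node and the second is discarded because an article is followed by a verb. The depth-first variant would keep merging the same sublist in place of returning it to the queue after each merge.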
Fig. 4 illustrates, with a concrete example, how a complete parse tree is generated in the present invention; the input example sentence is "they considered him a fool".
As shown in Fig. 4-1, the input module 20 first stores the input sentence "they considered him a fool" temporarily in the input buffer 21, then looks up each word of the sentence in the syntactic dictionary 11 of the database 10 and determines all possible parts of speech of each word according to the syntactic dictionary 11. Only the word "fool" has two parts of speech, namely noun or verb.
Subsequently, in Fig. 4-2, the input module 20 expands all part-of-speech combinations of the whole sentence into individual sublists, obtaining two sublists, sublist 1 and sublist 2, and ensures that each word in each sublist has only one part of speech.
Sublist 1 is: they (noun) considered (verb) him (noun) a (article) fool (noun);
Sublist 2 is: they (noun) considered (verb) him (noun) a (article) fool (verb);
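The expansion into sublists amounts to taking every combination of the possible parts of speech of each word. A minimal sketch, assuming a small hypothetical dictionary `LEXICON` and a hypothetical function `expand_sublists` (neither name appears in the patent; the tags follow the example above, where "they" and "him" are tagged as nouns):

```python
from itertools import product

# Hypothetical mini-dictionary: word -> every part of speech it may take.
LEXICON = {
    "they": ["noun"],
    "considered": ["verb"],
    "him": ["noun"],
    "a": ["article"],
    "fool": ["noun", "verb"],
}

def expand_sublists(sentence, lexicon):
    """Return one sublist per combination of parts of speech.

    Each sublist contains every word of the sentence with exactly one tag.
    """
    words = sentence.split()
    choices = [lexicon[word] for word in words]
    return [list(zip(words, combo)) for combo in product(*choices)]

sublists = expand_sublists("they considered him a fool", LEXICON)
for number, sublist in enumerate(sublists, start=1):
    print(number, " ".join(f"{word} ({tag})" for word, tag in sublist))
```

The number of sublists is the product of the number of tags per word, so a sentence with several ambiguous words expands quickly; this is why the exclusion step that follows matters.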
All expanded sublists are managed by the sublist control module 30, which is responsible for control functions such as saving, retrieving, sorting, searching and indexing sublists, saving sublist state parameters, and invoking and allocating computer system resources. As can be seen from Fig. 3-2, immediately after the expansion into sublists there are two sublists in the sublist control module 30, sublist 1 and sublist 2, and the status attribute of both is the initial state. The current sublist processing unit 42 is empty.
From Fig. 4-3 onward, the cyclic process begins. Each time, the first sublist in the sublist control module 30 that needs grammatical analysis is taken out and placed in the current sublist processing unit 42 for grammatical analysis.
First, sublist 1 is taken out of the sublist control module 30 and placed in the current sublist processing unit 42 for grammatical analysis. Only sublist 2 then remains in the sublist control module 30, and its state is the initial state. The current sublist processing unit performs grammatical analysis on sublist 1 and finally obtains a complete parse tree, namely: noun phrase + verb phrase.
Fig. 4-4 shows that, after the grammatical analysis in the current sublist processing unit 42 has finished, the state of the analyzed sublist 1 is marked as "end" and the sublist is added back to the tail of the sublist control module 30. The current sublist processing unit 42 thus becomes empty again, and there are two sublists in the sublist control module: sublist 2, whose state is the initial state, and sublist 1, whose state is the end state.
In Fig. 4-5, the current first sublist in the sublist control module 30, namely sublist 2, is taken out and placed in the current sublist processing unit 42 for grammatical analysis. Only sublist 1 then remains in the sublist control module 30, and its state is "end"; sublist 2 in the current sublist processing unit 42 is in the initial state. While it is being analyzed, a syntactic exclusion rule is found to apply, namely "article + verb" (DET+V). This rule states that an "article + verb" sequence can never form part of any correct complete parse tree, so there is no need to carry out any further grammatical analysis on the sublist. The current sublist processing unit 42 therefore excludes sublist 2, which is not added back to the sublist control module.
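A syntactic exclusion rule of this kind can be checked by scanning neighbouring tags. In the sketch below, the rule set `EXCLUSION_RULES` and the function `is_excluded` are hypothetical names; only the "article + verb" rule itself is taken from the text.

```python
# Pairs of adjacent tags that can never occur in a well-formed parse.
# Only ("article", "verb") comes from the text; further rules would be added here.
EXCLUSION_RULES = {("article", "verb")}

def is_excluded(tags, rules=EXCLUSION_RULES):
    """Return True if any two neighbouring tags match an exclusion rule."""
    return any(pair in rules for pair in zip(tags, tags[1:]))

sublist_1 = ["noun", "verb", "noun", "article", "noun"]
sublist_2 = ["noun", "verb", "noun", "article", "verb"]
print(is_excluded(sublist_1))  # False
print(is_excluded(sublist_2))  # True
```

Because a single matching pair is enough to discard the whole sublist, this check is cheap compared with attempting a full parse.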
Fig. 4-6 shows that, after the current sublist processing unit 42 has become empty, only sublist 1 remains in the sublist control module 30, and its state is "end". No sublist remains that needs grammatical analysis, so all grammatical analysis is complete and the whole grammatical analysis process ends. If all erroneous ambiguities were eliminated during the preceding analysis, the result contains a single correct complete parse tree. If not all erroneous ambiguities were eliminated, the result may contain several complete parse trees, but one of them is guaranteed to be the correct complete parse tree.
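The cycle walked through in Fig. 4-3 to Fig. 4-6 can be summarized as a small control loop. This is a sketch under stated assumptions: the dictionary layout of a sublist, the function names `run_analysis` and `demo_analyse`, and the use of a plain Python list in place of the sublist control module are all hypothetical, and the analysis step is reduced to the single exclusion rule from the example.

```python
def run_analysis(sublists, analyse):
    """Process sublists in the "initial" state until none remain.

    `analyse` returns the (possibly updated) sublist, or None to exclude it.
    Finished sublists are marked "end" and appended to the tail.
    """
    pending = list(sublists)
    while True:
        # Find the first sublist that still needs grammatical analysis.
        index = next(
            (i for i, s in enumerate(pending) if s["state"] == "initial"), None
        )
        if index is None:
            return pending  # nothing left to analyse
        current = pending.pop(index)  # move it to the "current" unit
        updated = analyse(current)
        if updated is not None:
            pending.append({**updated, "state": "end"})

def demo_analyse(sublist):
    """Stand-in analysis: apply only the "article + verb" exclusion rule."""
    tags = sublist["tags"]
    if ("article", "verb") in zip(tags, tags[1:]):
        return None
    return sublist

sublists = [
    {"id": 1, "tags": ["noun", "verb", "noun", "article", "noun"], "state": "initial"},
    {"id": 2, "tags": ["noun", "verb", "noun", "article", "verb"], "state": "initial"},
]
survivors = run_analysis(sublists, demo_analyse)
print([(s["id"], s["state"]) for s in survivors])
```

In this sketch sublist 2 is dropped and sublist 1 is the only survivor, mirroring the end state shown in Fig. 4-6.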
Because the sublist control module 30 manages system resources effectively throughout the grammatical analysis, there is no concern that the analysis will stop because system resources are exhausted. The grammar rules can therefore be applied with a fully exhaustive search, which guarantees that the correct complete parse tree will be generated.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified or replaced with equivalents, and that any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims (15)

1. A method for information retrieval using a natural language processing function, characterized in that it comprises the following steps:
Step 1: the user inputs a query sentence at the query interface;
Step 2: the input sentence is decomposed into several parse tree sublists;
Step 3: all parse tree sublists are delivered to each semantic website in the network;
Step 4: follow-up grammatical analysis is performed on ambiguous parse tree sublists in combination with semantic website information;
Step 5: the semantic website uses semantic knowledge and rules to search the data stored in the semantic website according to the semantics of the parse tree sublists; if there is no result, the process ends; if there is a result, the result is returned to the query interface and the process ends.
2. The method for information retrieval using a natural language processing function according to claim 1, characterized in that decomposing the input sentence into several parse tree sublists in said step 2 further comprises:
Establishing a dictionary and a rule base;
Looking up in the dictionary all parts of speech of each word of the input sentence and generating a sublist list, wherein each sublist in said sublist list contains all the words of the input sentence and each word has only one part of speech;
Placing the sublists one by one into the calculation process module, comparing them against the rules in the rule base, and performing the processing, both using exclusion rules to eliminate erroneous sublists and using generation rules to merge new intermediate nodes;
Judging whether the processed sublist can be excluded: a sublist that is certainly wrong is excluded; a sublist that cannot be excluded is updated and marked, then put back at the tail of the sublist list.
3. The method for information retrieval using a natural language processing function according to claim 1, characterized in that, in the process of generating the sublist list, the sublist content can be compared with the exclusion rules in said rule base and clearly erroneous sublists discarded.
4. The method for information retrieval using a natural language processing function according to any one of claims 1-3, characterized in that, if the processed sublist cannot be excluded, it is compared with the generation rules in said rule base to judge whether nodes can be merged; if they can be merged, the sublist is, after merging, either sent to the sublist list or sent directly to the calculation process module for further comparison.
5. The method for information retrieval using a natural language processing function according to claim 4, characterized in that, after the nodes of said sublist are merged, the method further comprises marking the attributes of the merged sublist.
6. The method for information retrieval using a natural language processing function according to claim 5, characterized in that the attributes of the merged sublist are marked according to the generation rule applied when the sublist was merged.
7. The method for information retrieval using a natural language processing function according to claim 4, characterized in that, if it is judged that the sublist cannot be merged, the sublist is updated according to the operation result and marked "end", and the updated sublist is sent to the tail of the sublist list of the first step.
8. The method for information retrieval using a natural language processing function according to claim 1 or 7, characterized in that the query sentence input by the user is grammatically analyzed according to the principle of expanding first and then excluding, every possible result is analyzed exhaustively, and all of these grammatical analysis results are passed directly to each semantic website on the network, thereby ensuring that the original useful information input by the user reaches each semantic website without loss.
9. The method for information retrieval using a natural language processing function according to claim 1, characterized in that said semantic website carries out the following steps:
Looking up and determining the class names stored in the semantic website;
Looking up and determining the attributes of the classes stored in the semantic website;
Looking up and determining the relations between the classes stored in the semantic website;
Looking up and determining the assertions stored in the semantic website, then carrying out knowledge reasoning so as to perform follow-up grammatical analysis on ambiguous parse tree sublists;
The inference engine uses semantic knowledge and rules to search the data stored in the semantic website according to the semantics of the parse tree sublists.
10. The method for information retrieval using a natural language processing function according to claim 1 or 9, characterized in that said step 4 comprises: judging whether the analyzed parse tree is ambiguous; if so, performing follow-up grammatical analysis in combination with semantic website information according to the principle of expanding first and then excluding, and carrying out reasoning with the semantic website on the correct parse tree obtained.
11. The method for information retrieval using a natural language processing function according to claim 1, characterized in that the processing by the semantic website in said step 4 comprises: the semantic website may carry out reasoning using its own information; it may also retrieve into itself the information of associated semantic websites distributed over the network and carry out the reasoning internally; or it may pass all or part of the input conditions to associated semantic websites distributed over the network, which carry out the reasoning internally and then either return the result to the originating website or return it directly to the query interface.
12. The method for information retrieval using a natural language processing function according to claim 1, characterized in that, in the process of decomposing the input sentence into several parse tree sublists, said step 2 may further comprise: first processing the sublists once with a knowledge-based inference system containing common, basic semantic knowledge and rules, and transmitting the sublists excluded by the semantic exclusion rules of this inference system, together with the other resulting sublists, to all semantic websites, where the knowledge-based inference system in each semantic website finally determines whether they should be excluded.
13. The method for information retrieval using a natural language processing function according to claim 1 or 12, characterized in that said step 3 may further comprise: first finding as many websites with matching keywords as possible using a general keyword retrieval method, then transmitting the results of the expand-then-exclude method to the websites in these preliminary results.
14. The method for information retrieval using a natural language processing function according to claim 1, characterized in that said step 2 further comprises: a background program periodically subjects the semantic knowledge and inference systems of the semantic websites to specific processing and retains the relevant content in the local computing environment in which the query interface resides.
15. The method for information retrieval using a natural language processing function according to claim 1 or 14, characterized in that said method is applicable not only to the Internet architecture but also to information retrieval systems in small networks or stand-alone environments.
CNA031020755A 2003-01-29 2003-01-29 Method for information retrieval by using natural language processing function Pending CN1521661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA031020755A CN1521661A (en) 2003-01-29 2003-01-29 Method for information retrieval by using natural language processing function

Publications (1)

Publication Number Publication Date
CN1521661A true CN1521661A (en) 2004-08-18

Family

ID=34281589

Country Status (1)

Country Link
CN (1) CN1521661A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication