CN1141660C - Natural language statement analyzing method simulating brain's language sensing process - Google Patents



Publication number
CN1141660C
Authority
CN
China
Prior art keywords
sentence
semantic chunk
semantic
class
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB981019218A
Other languages
Chinese (zh)
Other versions
CN1236138A (en
Inventor
黄曾旸
张全
刘志文
晋耀红
杜燕玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to a computer analysis method that simulates the human brain's process of perceiving natural-language sentences. The method comprises a sentence category analysis method and a hierarchical network of concepts language knowledge base. Its approach to natural language processing is to activate the network of conceptual associations formed by sentence knowledge, so that ambiguity and fuzziness can be handled at both the concept level and the language level. The knowledge base is centered on the expression of sentence category knowledge and uses the hierarchical network of concepts symbol system to represent sentences. The method is concise and efficient, and it greatly reduces storage requirements.

Description

Natural language sentence analysis method simulating the brain's language perception process
Technical field
The present invention relates to a computer-based natural language processing method and, more particularly, to a computer analysis method that simulates the human brain's perception of natural-language sentences.
Background art
Since the birth of the computer in the 1940s, people have continually tried to use computers to analyze and process everyday human natural language. The main approaches are described below.
In the 1950s, N. Chomsky proposed transformational generative grammar, which gave rise to a syntactic analysis method based on it. In this grammar Chomsky posited that language has deep structure, but he did not resolve how deep structure should be represented, how many kinds of deep structure there are, or whether the deep structures of natural language are finite or infinite in number. Therefore, although Chomsky's transformational generative grammar rests on a very rigorous process of sentence generation, the grammar and its syntactic analysis method are not powerful enough to handle the highly complex linguistic phenomena that arise naturally in human language. Moreover, because deep structure plays no part in the processing, the grammar's excessive generative power also made syntactic analysis based on it largely unsuccessful.
As research deepened, a number of grammar theories better suited to computer implementation of natural language processing gradually took shape. They mainly include syntactic analysis methods guided by augmented transition network (ATN) grammar, systemic functional grammar, and various phrase structure grammars. These methods are far more convenient to implement on a computer than transformational generative grammar, but they have all abandoned the analysis of the deep structure of language, and natural language analysis is not merely a matter of grammar. It is therefore evident that these methods cannot solve the problem of computer analysis of natural language well.
As phrase structure grammar developed further, it was recognized that the many kinds of knowledge contained in natural language must be exploited to obtain better analysis results. In recent years, building on phrase structure grammar and introducing complex feature sets as the knowledge representation and unification as the algorithm, several grammars have emerged: Lexical Functional Grammar (LFG), Functional Unification Grammar (FUG), Generalized Phrase Structure Grammar (GPSG), and Head-Driven Phrase Structure Grammar (HPSG). These methods touch on some deeper semantic analysis, but because they lack a complete system for expressing concepts, they too have failed to discover and use the real deep semantic structure of natural language; they still rely on syntactic structure and treat semantic processing as a supplement to syntactic analysis. They therefore cannot thoroughly solve the problems encountered when natural language is analyzed by grammar.
While grammar-based processing methods prevailed, processing methods that rely closely on semantic analysis also appeared: the case grammar proposed by Fillmore and the conceptual dependency theory proposed by Schank. Although case grammar contributed to the exploration of deep semantic structure, it never formed a complete system; it could not settle even such basic questions as which cases exist in natural language, or whether the number of cases is finite or infinite. Conceptual dependency theory, lacking foundations such as a complete system for expressing concepts and a deep semantic structure, plunged directly into the understanding of the common-sense and specialized knowledge in natural language. Like a mansion built on sand, it cannot bear the burden of natural language processing. Processing methods based on conceptual dependency theory were mired in an ocean of knowledge from the very start and could not extricate themselves. As a result, faced with the unlimited phenomena of natural language, such methods remain permanently at a stage where more knowledge must be added, and they cannot be put to practical use.
Computer applications are now almost ubiquitous, and the software industry is set to become a leading industry of the 21st century, marking the arrival of the information age. Natural language is the main carrier of information and knowledge, so computers urgently need the ability to process the semantic knowledge of natural language. A natural language sentence analysis method suited to computer operation must therefore be established first, so that computers can better grasp the deep semantic structure of natural language.
Summary of the invention
The purpose of the present invention is to provide a complete computer method for analyzing natural-language sentences that is applicable to all natural languages and simulates the human brain's language perception process.
A method for analyzing natural-language sentences that simulates the brain's language perception process, the method comprising two steps: establishing a hierarchical network of concepts word-level knowledge base, and carrying out the concrete processing of sentence category analysis (SCA). The knowledge base divides natural-language sentences into basic sentence classes and subclasses. For each basic sentence class and its subclasses it provides a corresponding physical expression, using semantic chunk physical expressions as the semantic primitives. The expressions come in four basic formats: standard, canonical, fault, and omission. Each basic format in turn has corresponding variant formats that can be exhaustively enumerated mathematically;
(1) The knowledge in the hierarchical network of concepts word-level knowledge base is expressed in the following steps:
(a) Sentences are divided by meaning into 7 basic sentence classes: action sentences, process sentences, transfer sentences, effect sentences, relation sentences, state sentences, and judgment sentences. According to how strongly a semantic chunk depends on the sentence class, semantic chunks are divided into main semantic chunks and auxiliary semantic chunks. The auxiliary semantic chunks are: condition, means, instrument, way, reference, cause, and result. By their common features, the main semantic chunks are divided into: feature semantic chunk, actor, object, and content. The general physical expression of a semantic chunk is established as: SK = individuality + commonality = sentence class information + semantic chunk type information. When the feature semantic chunk of a sentence carries the features of two basic sentence classes, the sentence belongs to a mixed sentence class; when two or more feature semantic chunks in a sentence express the features of two or more basic sentence classes, the sentence belongs to a compound sentence class. The above information is expressed in symbolic form to build the knowledge base;
(b) For each word in the knowledge base whose concept category contains v, its own semantic knowledge is used to determine which of the following nodes represents its main content: action Φ0, process Φ1, transfer Φ2, effect Φ3, relation Φ4, state Φ5, the general judgment class Φ8, or j11 for the other judgment subclasses. The word is assigned accordingly to one of the 7 basic sentence classes. The concrete sentence category code, one of 57 subclasses, is then determined from how the word forms sentences when it serves as the feature semantic chunk in that basic sentence class. If the main content of the word's semantic knowledge covers two of the above nodes, the word is treated as belonging to a mixed sentence class. By convention of the present invention, the code of a mixed sentence class is written as the sentence category codes E1 and E2 of the two constituent basic sentence classes, followed by * and three digits kmn. Here E1 and E2 are the sentence category codes of the basic sentence classes; k is the total number of non-E semantic chunks; m is the number of semantic chunks taken from basic sentence class E1, counting from its first semantic chunk and excluding the E chunk; and n is the starting index of the semantic chunks taken from the second basic sentence class E2. When n = m + 1, n may be omitted. For a word that gives rise to a compound sentence class, the sentence class information is filled in as the codes E1 and E2 of the two constituent basic sentence classes with * between them. During analysis, the format representations of the two sentence classes can be retrieved from the concept-level sentence class expression knowledge base according to E1 and E2;
(c) When the sentence category code is valid, it is determined from the sentence category code in (b) whether the sentence class is a two-, three-, or four-main-chunk sentence. This is done as follows. The general mathematical expression of a sentence can be written as:
the first generalized object semantic chunk JK, followed by the feature semantic chunk of the sentence, then the second generalized object semantic chunk, then the third, with any remaining generalized object semantic chunks listed in order;
The expression does not limit the number of generalized object semantic chunks JK, but for the basic sentence classes only the cases of 1, 2, or 3 JK chunks need to be considered in practical natural language; these correspond to two-, three-, and four-main-chunk sentences respectively;
For a four-main-chunk sentence, JK2 is necessarily based on the object B and JK3 on the content C; for a three-main-chunk sentence, either B or C can serve as the core of JK. A two-main-chunk sentence may lack E, but JK2 must then be based on C. The concrete format code is looked up in Fig. 3 according to the order in which the word usually arranges its main semantic chunks when forming a sentence. The format the word usually adopts when forming a sentence is given as a code; with this code, the concrete format can be retrieved from the concept-level format conversion knowledge base. When there are several formats, they are labeled [1], [2], ... so that the following items can give the corresponding information for each format. If the word usually forms sentences in the standard format and the canonical formats, this item may be left blank;
(d) When the sentence category code is valid and the word forms a sentence according to the sentence category code in (b), there may be an expectation relation between the word and a generalized object semantic chunk; that is, the word requires particular concepts to serve as one of its generalized object semantic chunks. These particular concepts, which preferentially collocate with the word, are given in the notation F = ∑(letter string)(digit string). The expectation may also concern a particular component within the structure of the generalized object semantic chunk; in that case the composition of the semantic chunk is described first, and the preferred concepts of the relevant components are then given. The composition knowledge of semantic chunks and the preferred concepts of each component are represented by @S. The composition knowledge of a JK semantic chunk is written in this item using = and +; the preferred-concept knowledge of each part is written using : and is also filled in this item. A sentence formed around a v concept often requires that one of its semantic chunks itself be a sentence. If a word has this property, it is recorded in the knowledge base as JK=J or JK:=J, meaning respectively that a given semantic chunk JK must be expanded into a sentence or may be expanded into a sentence. For details see (i).
A semantic chunk, or a component of a semantic chunk, can be divided by content into an object part B and a content part C, or by form into a front part Q and a back part H. These divisions are conventions and need not be written out as explicit expressions. Appending one of the four letters B, C, Q, H to the semantic chunk or component and giving its preferred concept both indicates that the division exists and specifies the preferred concept of that part;
(e) If one component of the generalized object semantic chunk described in (d) is not adjacent to the other components but appears at a separate position in the sentence, this is indicated in the semantic chunk composition using H and ( ), which mark, respectively, that the semantic chunk may be separated and which part of the semantic chunk is necessarily separated;
(f) The name of the semantic chunk that the word serves as is written out explicitly, giving the semantic role knowledge of the word. The semantic role of a word is represented by @CA. That a v concept serves as the E semantic chunk is a convention and is not filled in here; but when a v concept forms only part of an E semantic chunk, this must be filled in explicitly;
(g) If the word itself may also provide knowledge about associations between sentences, that is, if an explicit result exists, this context knowledge is given explicitly and is represented by @CT;
(h) To meet the needs of expression, a sentence class is often converted into another sentence class while the semantic association information from before the conversion is retained; the present invention calls this sentence class conversion and provides a notation for it. For a v concept that can undergo conversion, (E1, E2)J is filled in under "sentence class format" in the knowledge base, where E1 is the sentence class usually adopted when the v concept forms the E semantic chunk (the normal, original sentence class) and E2 is the sentence class adopted after conversion. For a v concept that causes conversion, E1J<=E2J is filled in, where E1J is the sentence class converted to and E2J is the sentence class converted from. When the sentence category code is valid and the word, in forming a sentence, expresses the content of its own sentence class using a sentence of another class, this item indicates the situation and gives the sentence category code of that other class;
(i) When the sentence category code is valid and the word, in forming a sentence according to the given sentence category code, requires a sentence to serve as one of its generalized object semantic chunks, this is indicated; that is, the knowledge that the word causes certain semantic chunks to expand into sentences is given, as in step (d).
(2) The concrete processing steps of sentence category analysis (SCA) are as follows:
(a) Match the input sentence against the dictionary, segment it into the words it contains, and retrieve the semantic knowledge of these words from the knowledge base;
(b) Guided by the concept category information, and based on the l0 concepts (the semantic chunk distinguishing markers) and the verb v concepts, form rough semantic chunks and form E hypotheses;
(c) If no E hypothesis could be formed, go to (i); otherwise continue;
(d) Screen and rank all E hypotheses, mainly using the sentence category code, the format code, and the word frequency and context knowledge;
(e) Carry out the sentence class check on the selected E hypotheses in ranked order, mainly using the concept preference knowledge of the semantic chunk cores. If all checks fail, go to (k); otherwise continue;
(f) Carry out the semantic chunk composition check, mainly using the semantic chunk composition knowledge and the preferred-concept knowledge for each part of the semantic chunk. If all checks fail, go to (k); otherwise continue;
(g) Where necessary, carry out the sentence class conversion check, mainly using the sentence class conversion knowledge triggered by words. If all checks fail, go to (k); otherwise go to (l);
(h) Where necessary, carry out the semantic chunk separation check, mainly using the semantic chunk separation and conversion knowledge. If all checks fail, go to (k); otherwise go to (j);
(i) Carry out the check for sentence classes without an E semantic chunk. If it fails, continue; otherwise go to (l);
(j) Re-form the E hypotheses. On success, go to (d); otherwise go to (k);
(k) Human-computer interaction;
(l) Collect the context material; processing ends.
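The control flow of steps (a) to (l) can be traced with a small transition table. The following Python sketch only transcribes the branches as listed; the names `TRANSITIONS` and `trace` are hypothetical, the outcome labels "ok" and "fail" are an illustrative simplification, and the move from (k) to (l) is an assumption based on the listing order, since the text gives no explicit target for (k).

```python
# Transition table transcribed from steps (a)-(l).
# "fail" at (c) means that no E hypothesis could be formed.
TRANSITIONS = {
    "a": {"ok": "b"},
    "b": {"ok": "c"},
    "c": {"ok": "d", "fail": "i"},
    "d": {"ok": "e"},
    "e": {"ok": "f", "fail": "k"},
    "f": {"ok": "g", "fail": "k"},
    "g": {"ok": "l", "fail": "k"},
    "h": {"ok": "j", "fail": "k"},  # kept as listed; no listed branch leads here
    "i": {"ok": "l", "fail": "j"},
    "j": {"ok": "d", "fail": "k"},
    "k": {"ok": "l"},               # assumption: interaction is followed by (l)
}


def trace(outcomes, max_steps=50):
    """Return the sequence of steps visited, starting at (a).

    outcomes maps a step letter to "ok" or "fail"; steps that are
    not mentioned are treated as "ok".
    """
    step = "a"
    path = [step]
    while step != "l" and len(path) < max_steps:
        result = outcomes.get(step, "ok")
        next_step = TRANSITIONS[step].get(result)
        if next_step is None:
            raise ValueError(f"step ({step}) has no branch for {result!r}")
        step = next_step
        path.append(step)
    return path
```

For instance, `trace({"e": "fail"})` follows the path through the failed sentence class check to human-computer interaction, while `trace({"c": "fail", "i": "fail"})` follows the path that re-forms the E hypotheses at (j) and re-enters the ranking step (d).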
The present invention is a computer method for analyzing natural-language sentences that simulates the human brain's language perception process. In perceiving natural language, people draw jointly on knowledge at the concept level, the word level, and the level of common-sense and specialized knowledge; of these, concept-level and word-level knowledge are the key to human perceptual processing. Concept-level knowledge is independent of any particular language: it is the knowledge for processing natural language that all humans share. Word-level knowledge is the language-specific knowledge used in perception. At the concept level, the present invention takes natural language as a whole as its object, divides sentences completely into sentence classes, provides the sentence class expressions and the format conversion table of natural language, and thereby establishes the deep semantic structure of natural-language sentences.
In traditional grammar, the term sentence class refers to declarative, imperative, interrogative, and exclamatory sentences, which is mainly a pragmatic classification of sentences; in the present invention, sentence class refers to the semantic category of a sentence. The present invention divides sentences by meaning into 7 basic sentence classes: action sentences, process sentences, transfer sentences, effect sentences, relation sentences, state sentences, and judgment sentences.
A semantic chunk is a semantic constituent unit of a sentence; in form it can be a word, a phrase, or a sentence. The concept of the semantic chunk is introduced to make it easier to describe sentences at the semantic level. According to how strongly they depend on the sentence class, semantic chunks are divided into main semantic chunks, which depend strongly on the sentence class, and auxiliary semantic chunks, which depend on it weakly. There are 7 kinds of auxiliary semantic chunk: condition, means, instrument, way, reference, cause, and result. By their common features, main semantic chunks are divided into: feature semantic chunk, actor, object, and content. The individuality of a semantic chunk is its sentence class attribute. The commonality and individuality of semantic chunks should be regarded as two orthogonal basis vectors of a two-dimensional sentence space. The general physical expression of a semantic chunk is therefore:
SK = "individuality + commonality" = "sentence class information + semantic chunk type information"    (1)
Expression (1) shows that a semantic chunk is a function of the sentence class. The sentence class of a sentence is determined by its feature semantic chunk. When the feature semantic chunk of a sentence carries the features of two basic sentence classes, the sentence belongs to a mixed sentence class; when two or more feature semantic chunks in a sentence express the features of two or more basic sentence classes, the sentence belongs to a compound sentence class.
For a computer to use this knowledge, the information must be expressed in symbolic form and assembled into knowledge bases. At the concept level, the sentence class expressions and the format conversion table must be provided; at the word level, knowledge centered on the sentence class must be provided for the vocabulary of a specific language. The construction of these two kinds of knowledge base is described below.
The four main semantic chunk primitives are denoted: feature E, actor A, object B, and content C. The 7 auxiliary semantic chunks are: condition Cn (Condition), means Ms (Means), instrument In (Instrument), way Wy (Way), reference Re (Refer), cause Pr (Premise), and result Rt (Result). The basic sentence classes are denoted: action X, process P, transfer T, effect Y, relation R, state S, and judgment D. In the precise expression of a main semantic chunk, each of the two kinds of information in expression (1) is written as an uppercase letter followed by digits. In the sentence class information, the letter denotes the basic sentence class and the digits denote the subclass; in the semantic chunk type information, the letter denotes the semantic chunk type and the digits denote the subtype. A semantic chunk that carries only sentence class information is called a feature semantic chunk, denoted E; a semantic chunk that carries both sentence class information and semantic chunk type information is called a generalized object semantic chunk, denoted JK.
For example, X2, X2B, XAC, and X26 denote four semantic chunks of the reaction sentence (a subclass of the action sentence class): the reaction, the reactor, the reaction trigger and its manifestation, and the reactor's subsequent manifestation, respectively. Here X2 is the E chunk and the others are all generalized object semantic chunks. Likewise, TB and TC are the object and content of a transfer sentence; the object and content of an information transfer sentence (a subclass of the transfer sentence class) are denoted T3B and T3C; the two parties of a relation are denoted RB1 and RB2; and so on.
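The letter-plus-digit convention above can be illustrated with a small parser. This is a minimal sketch, not the patent's implementation: the function name `parse_chunk`, the dictionary names, and the regular expression are assumptions, and the pattern only covers the simple shape "class letter, optional subclass digit, optional chunk-type letter, optional subtype digit". Symbols such as XAC and X26, whose internal structure the text does not spell out, are deliberately rejected.

```python
import re

# Basic sentence class letters and main chunk type letters from the text.
SENTENCE_CLASSES = {
    "X": "action", "P": "process", "T": "transfer", "Y": "effect",
    "R": "relation", "S": "state", "D": "judgment",
}
CHUNK_TYPES = {"A": "actor", "B": "object", "C": "content"}

# Assumed shape: class letter, optional subclass digit,
# then optionally a chunk-type letter with an optional subtype digit.
_SYMBOL = re.compile(r"^([XPTYRSD])(\d?)(?:([ABC])(\d?))?$")


def parse_chunk(symbol):
    """Split a chunk symbol into sentence class and chunk type information."""
    match = _SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"unsupported chunk symbol: {symbol!r}")
    cls, subclass, chunk_type, subtype = match.groups()
    return {
        "sentence_class": SENTENCE_CLASSES[cls],
        "subclass": subclass or None,
        # A symbol with no chunk-type letter carries only sentence class
        # information, so it is a feature chunk (E).
        "chunk_type": CHUNK_TYPES[chunk_type] if chunk_type else "feature",
        "chunk_subtype": subtype or None,
    }
```

Under these assumptions, `parse_chunk("X2")` yields a feature chunk of the action class with subclass 2, and `parse_chunk("RB1")` yields an object chunk of the relation class with subtype 1.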
The general mathematical expression J of a sentence can be written as:

$$J_{n+1} = JK_1 + E + \sum_{j=2}^{n} JK_j \qquad (2)$$
JK1 is called generalized object semantic chunk No. 1, and so on. Expression (2) does not limit the number of JK chunks, but for the basic sentence classes only the cases of 1, 2, or 3 JK chunks need to be considered in practical natural language; these correspond to two-, three-, and four-main-chunk sentences respectively.
For a four-main-chunk sentence, JK2 is necessarily based on the object B and JK3 on the content C; for a three-main-chunk sentence, either B or C can serve as the core of JK. A two-main-chunk sentence may lack E, but JK2 must then be based on C; this often occurs in Chinese state sentences.
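As a rough illustration of expression (2), the sketch below assembles the standard chunk order JK1 + E + JK2 + ... from a list of generalized object chunks and counts the main chunks. The function names `statement_expression` and `main_chunk_count` are hypothetical; passing `feature=None` models the case in which a sentence lacks E.

```python
def statement_expression(jk_chunks, feature="E"):
    """Arrange chunks as JK1 + E + JK2 + ... following expression (2).

    jk_chunks: generalized object chunk symbols in order (JK1, JK2, ...).
    feature:   symbol of the feature chunk, or None if the sentence has no E.
    """
    if not jk_chunks:
        raise ValueError("at least one generalized object chunk is required")
    parts = [jk_chunks[0]]
    if feature is not None:
        parts.append(feature)
    parts.extend(jk_chunks[1:])
    return "+".join(parts)


def main_chunk_count(jk_chunks, feature="E"):
    """Number of main chunks: the JK chunks plus the feature chunk, if any."""
    return len(jk_chunks) + (0 if feature is None else 1)
```

With the sentence class T3J = TA+T3+TB+TC quoted later in the description, `statement_expression(["TA", "TB", "TC"], "T3")` reproduces that four-main-chunk form, and `main_chunk_count(["A", "B"], "X")` gives 3 for XJ = A+X+B.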
Replacing E and JK in expression (2) with semantic chunk physical expressions yields the physical expression of a sentence. These physical expressions are semantic representations of the deep structure of sentences. The present invention provides the sentence class expressions of the 57 basic sentence classes and their subclasses. The sentence class expressions of mixed sentence classes can be derived from those of the basic sentence classes and need not be stored separately in the knowledge base.
The four format conversion types are explained as follows:
Standard format: the main chunks are arranged in the natural logical order of the language. The sentence class expression base represents the order of semantic chunks in this format.
Canonical format: the order of the main chunks violates the natural logical order of the language and thus departs from the standard format, but cue markers must be inserted between the generalized object semantic chunks. Three-main-chunk sentences have 4 canonical formats; four-main-chunk sentences have 23.
Fault format: the cue markers between generalized object semantic chunks are partly or wholly omitted. Three-main-chunk sentences have 4 fault formats; four-main-chunk sentences have 47.
Omission format: some semantic chunks are omitted from the sentence.
The language knowledge base describes the semantics and sentence class knowledge of the words of a specific language. The present invention uses the hierarchical network of concepts symbol system to express this knowledge, so the language knowledge base is also called the hierarchical network of concepts knowledge base. Specifically, it provides knowledge for sentence analysis in the following respects, described here using Chinese as an example for ease of understanding:
1. Semantic knowledge. This is given in the concept expression system of natural language. Concepts in natural language fall into two classes, concept primitives and compound concepts. A concept primitive is a concept whose meaning can be expressed directly by the definition of a semantic network node given in Fig. 1; a compound concept cannot be expressed directly by a semantic network node and must be expressed through combination. The semantic expression of a concept primitive is:
F = ∑(letter string)(digit string)    (3)
F denotes the symbolic representation of a concept primitive. The letter string uses lowercase letters and the digit string uses the hexadecimal digits 0-f. The letter string is formed from the five-tuple {v (dynamic aspect of a concept), g (static aspect of a concept), u (attribute of a concept), z (value of a concept), r (effect of a concept)}, the concrete concept categories {p (person), w (thing)}, the comprehensive concept categories spanning primitive and basic concepts {e (comprehensive understanding across concepts), x (physical property)}, and the semantic network symbols {Φ (primitive concept semantic network), j (basic concept semantic network), l (language logic concept semantic network), jl (basic logic concept semantic network), jw (basic substance concept semantic network)}. Because primitive concepts are the most numerous, Φ is omitted in writing. The digit string is the hierarchy symbol.
The semantic expression of a compound concept is: F = ∑F(K)    (4)
Each F(K) is an F of expression (3). The F(K) are connected by the following combination symbols:
Action: #    Effect: $
Object: &    Content: |
Logical conjunction: ,    Logical selection: ;    Logical combination: (, L)
Modifier-head: /    Subject-predicate: ‖
Negation: !    Opposition: [symbol missing]
Priority grouping: ( )    Attachment: +
2. Concept category. This is the external form of the concept a word expresses, namely the letter string described in item 1. When a word expresses a concept primitive, this symbol is identical to the letter string in its semantic knowledge (see item 1). When a word expresses a compound concept, this item gives the external behavior of the word after combination, which may differ from the category codes of the individual concept primitives that make up the combination. The item thus describes the complete external behavior of the word. Giving the concept category directly lets the computer begin its analysis with category knowledge.
3. Word frequency and context. The present invention expresses this knowledge with a hexadecimal digit from 0 to b, assigned according to how the word is used in a given sense. The digits are defined as: 0, extremely high frequency; 1, common; 2, common in specialized use; 3, uncommon; 4, spoken; 5, dialect; 6, archaic; 7, modern-era; 8, rare; 9, uncommon in specialized use; a, extremely rare; b, rare in specialized use.
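The twelve usage codes lend themselves to a simple lookup table. The sketch below is illustrative only: the names `WORD_USAGE_CODES` and `describe_usage` are hypothetical, and the English labels are one possible rendering of the definitions listed above.

```python
# Usage codes 0-b as listed in item 3 (labels are an English rendering).
WORD_USAGE_CODES = {
    "0": "extremely high frequency",
    "1": "common",
    "2": "common in specialized use",
    "3": "uncommon",
    "4": "spoken",
    "5": "dialect",
    "6": "archaic",
    "7": "modern-era",
    "8": "rare",
    "9": "uncommon in specialized use",
    "a": "extremely rare",
    "b": "rare in specialized use",
}


def describe_usage(code):
    """Return the usage label for a single hexadecimal code digit (0-b)."""
    key = code.lower()
    if key not in WORD_USAGE_CODES:
        raise ValueError(f"usage code must be one of 0-b, got {code!r}")
    return WORD_USAGE_CODES[key]
```

A ranking step such as (d) in the analysis procedure could, for example, prefer senses with lower codes, although the text does not specify how the codes are weighed.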
4. Sentence class code. When a word carries definite sentence class information, that information is filled in as a code. This applies mainly to verb (v) concepts that can serve as the core of an E semantic chunk. The sentence class codes of the basic sentence classes are shown in Fig. 2. For mixed sentence classes (in natural language the overwhelming majority of mixed sentence classes are pairwise mixtures, so in this knowledge base a mixed sentence class always means a pairwise mixture), the invention adopts the form E1E2*kmn. E1 and E2 are the sentence class codes of the two basic sentence classes being mixed; k is the total number of non-E semantic chunks; m is the number of semantic chunks taken from basic sentence class E1, counting from its first semantic chunk and excluding the E semantic chunk; n is the starting index of the semantic chunks taken from the second basic sentence class E2. When n = m+1, n may be omitted. For example, given the sentence classes T3J = TA+T3+TB+TC and XJ = A+X+B, the format of T3X*21 is TA+T3X+B, that of XT3*21 is A+XT3+TB, and that of XT3*213 is A+XT3+TC. For how this is filled in in the knowledge base, see the entry "freedom" in Fig. 4. For a word that gives rise to a compound sentence class, the sentence class information is filled in in the form E1*E2, where E1 and E2 are the sentence class codes of basic sentence classes. During analysis, the formats of the two sentence classes can be retrieved from the concept-level sentence class representation knowledge base as indicated by E1 and E2.
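The E1E2*kmn convention can be checked against the three examples in the text with a short Python sketch. The function name, the BASIC_FORMATS table and the decision to place the merged E chunk directly after the first chunk are assumptions made for this illustration; the text only states the resulting formats.

```python
def compose_mixed_format(e1: str, e2: str, digits: str,
                         formats: dict[str, list[str]]) -> str:
    """Build the chunk sequence of the mixed sentence class E1E2*kmn."""
    k, m = int(digits[0]), int(digits[1])
    # n may be omitted when it equals m + 1.
    n = int(digits[2]) if len(digits) > 2 else m + 1
    # Non-E semantic chunks of each basic sentence class, in order.
    chunks1 = [c for c in formats[e1] if c != e1]
    chunks2 = [c for c in formats[e2] if c != e2]
    # m chunks from E1, then the remaining k - m chunks from E2 starting at index n.
    taken = chunks1[:m] + chunks2[n - 1:n - 1 + (k - m)]
    if len(taken) != k:
        raise ValueError(f"cannot take {k} chunks for {e1}{e2}*{digits}")
    # Assumption: the merged E chunk follows the first chunk, as in the examples.
    return "+".join(taken[:1] + [e1 + e2] + taken[1:])


# The two basic sentence classes used as examples in the text.
BASIC_FORMATS = {
    "T3": ["TA", "T3", "TB", "TC"],
    "X": ["A", "X", "B"],
}

for e1, e2, digits in [("T3", "X", "21"), ("X", "T3", "21"), ("X", "T3", "213")]:
    print(f"{e1}{e2}*{digits} ->", compose_mixed_format(e1, e2, digits, BASIC_FORMATS))
```

Passing E1 and E2 separately avoids having to split a string such as "XT3" into its two codes, which would require the full list of basic sentence class codes from Fig. 2.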
5. Format conversion knowledge. When the "sentence class code" is valid, this item gives, as a code, the format that the word often adopts when it forms a sentence. Following this indication, the concrete format can be obtained from the concept-level format conversion knowledge base. For example, if the sentence class code contains XJ and the format conversion knowledge contains !12, the sentence is often formed in the format B+A+X. When there are several formats, they are labelled [1], [2], ... so that the later items can state the different cases under each format. If the word usually adopts the standard format or the canonical format when forming sentences, this item may be left blank.
To meet expressive needs, one sentence class is often converted into another while the semantic associations that held before the conversion are preserved. The invention calls this phenomenon sentence class conversion and provides a notation for it. For a v concept that can undergo conversion, (E1, E2)J is filled in under "sentence class format" in the knowledge base, where E1 is the sentence class usually adopted when the v concept forms an E semantic chunk (it can be regarded as the normal, original sentence class) and E2 is the sentence class adopted after conversion. See the entry "predation" in Fig. 4. For a v concept that triggers conversion, E1J<=E2J is filled in, where E1J is the sentence class converted to and E2J is the sentence class converted from. For example, the knowledge for "love and esteem" is (X20, X10)J, meaning that it can be converted from its original response sentence into a bearing sentence. The word "being subjected to" carries the knowledge X10J<=X20J, meaning that it can lead the conversion of a response sentence into a bearing sentence. Thus for "love and esteem" there is the sentence "We love and esteem Premier Zhou", which can be converted, led by "being subjected to", into "Premier Zhou is loved and esteemed by us".
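The two conversion notations, (E1, E2)J and E1J<=E2J, can be read mechanically. The following Python sketch is an illustration under assumptions: the function name, the regular expressions and the dictionary keys are hypothetical, and only the two forms quoted in the text are recognised.

```python
import re

# "(E1, E2)J": the word itself can be converted from class E1 to class E2.
_UNDERGOES = re.compile(r"^\(\s*(\w+)\s*,\s*(\w+)\s*\)J$")
# "E1J<=E2J": the word triggers a conversion from class E2 to class E1.
_TRIGGERS = re.compile(r"^(\w+)J<=(\w+)J$")


def parse_conversion(entry: str) -> dict[str, str]:
    """Return the kind of conversion and its source and target sentence classes."""
    entry = entry.strip()
    match = _UNDERGOES.match(entry)
    if match:
        return {"kind": "undergoes", "source": match.group(1), "target": match.group(2)}
    match = _TRIGGERS.match(entry)
    if match:
        # In E1J<=E2J the class on the left is the one converted to.
        return {"kind": "triggers", "source": match.group(2), "target": match.group(1)}
    raise ValueError(f"unrecognised conversion entry: {entry!r}")


print(parse_conversion("(X20, X10)J"))
print(parse_conversion("X10J<=X20J"))
```

Both sample entries describe the same direction, from X20 to X10, which is why the pair of words in the example can cooperate in one converted sentence.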
6. Constitution knowledge of semantic chunks and the preferred concepts of their components, denoted by @S in the knowledge base. When the "sentence class code" is valid and a JK semantic chunk in the sentence class format has constitution knowledge, it is filled in here using "=" and "+". If the components of the semantic chunk have preferred-concept knowledge, this is expressed with ":" and is also filled in here. For example, for XJ, whose B semantic chunk consists of YB and YC, one writes B=YB+YC; if YB is usually a "thing", this is also written here, as YB:w (w is the concept class symbol introduced above, meaning "thing"). In the sentences formed by some v concepts, a sentence is often required to serve as one of the semantic chunks. If the word has this knowledge, this item of the knowledge base uses JK=J and JK:=J to express, respectively, that a semantic chunk JK must be expanded into a sentence or may be expanded into a sentence. For example, for "think", DC=J must be filled in here, meaning that the DC semantic chunk is necessarily expanded into a sentence.
A semantic chunk, or a component of one, can be divided by content into an object part (B) and a content part (C), or by form into a front part (Q) and a rear part (H). This division is a convention, so no explicit expression needs to be written. It suffices to append one of these four letters (B, C, Q, H) to the semantic chunk or component and give its preferred concept; doing so both indicates that the division exists and states the preferred concept of that part.
The semantic chunks that make up a sentence can be separated; that is, to meet expressive needs, one deep semantic chunk is split and expressed in two places. The knowledge base also provides an explicit notation for this phenomenon: "[ ]" and "[( )]" indicate, respectively, a part that may be separated from the semantic chunk and a part that must be separated. For example, "break" has "B=XB+[YB]" in this item, indicating that its B semantic chunk may be separated. An example sentence is "Li Si had his leg broken by Zhang San." Here "leg", a part of the semantic chunk "Li Si's leg", has been separated out; without separation the sentence would be "Li Si's leg was broken."
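The constitution notation of item 6, including the separation brackets, can be parsed in a few lines. The Python sketch below is an assumption-laden illustration: the Part class and the function name are hypothetical, and the sketch handles only "=", "+", "[ ]" and "[( )]". Entries that use ":" (preferred concepts, or the JK:=J form) are rejected rather than guessed at.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    separation: str  # "none", "may" for [X], "must" for [(X)]


def parse_constitution(entry: str) -> tuple[str, list[Part]]:
    """Parse an entry such as 'B=XB+[YB]' into the chunk name and its parts."""
    if ":" in entry or "=" not in entry:
        raise ValueError(f"not a plain constitution entry: {entry!r}")
    chunk, _, rhs = entry.partition("=")
    parts = []
    for token in rhs.split("+"):
        token = token.strip()
        if token.startswith("[(") and token.endswith(")]"):
            parts.append(Part(token[2:-2], "must"))   # must be separated
        elif token.startswith("[") and token.endswith("]"):
            parts.append(Part(token[1:-1], "may"))    # may be separated
        else:
            parts.append(Part(token, "none"))
    return chunk.strip(), parts


print(parse_constitution("B=XB+[YB]"))
print(parse_constitution("B=YB+YC"))
```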
7. Knowledge for when the word forms a semantic chunk, denoted by @K. For non-v concepts, this item records the collocation knowledge needed when the word forms a semantic chunk. When building the knowledge base, in order to capture pragmatic differences conveniently, Chinese characters may be given directly after "|:", with Q and H indicating a front collocation and a rear collocation respectively. For example, for "signature", {ug, H|: motion} is filled in here, meaning that when "signature" is used as a ug concept it often takes "motion" as its rear collocation. For v concepts, this item also gives the verbs frequently used together with the word when it forms an E semantic chunk. If a v concept shows separation when forming an E semantic chunk, that is also expressed here, in the same notation as in the item "constitution knowledge of semantic chunks and the preferred concepts of their components".
When the word can form part of a semantic chunk, this is denoted by FK. FK can also be divided in the natural way described in item 6 (B, C, Q, H), under the same convention. See the entry "freedom" in Fig. 4. The part that the word preferentially serves as is explained in item 8.
8. Semantic roles the word often serves, denoted by @CA. When a word often appears in one or more sentence classes and often serves as a particular semantic chunk, the name of that semantic chunk is filled in here. For example, "clever" often serves as the SC semantic chunk of a state sentence, so "SC" is filled in here. That a v concept serves as the E semantic chunk is a convention and is not filled in here. When a v concept forms only part of an E semantic chunk, however, this must be stated explicitly. See the entry "predation" in Fig. 4.
9. Context knowledge, denoted by @CT. This is the context knowledge provided by the word itself, that is, associative knowledge between sentences. It is filled in with the name of an auxiliary semantic chunk and a concept symbol. For example, the context knowledge of "earthquake" is that it causes disastrous consequences, so this item for the word is filled in as Rt:r322.
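Taken together, the nine items describe one record per word. One way to picture such a record is the Python sketch below. The class name and field names are hypothetical, the patent's own storage layout is the one shown in Fig. 4, and only the field values quoted in the text are filled in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexicalEntry:
    word: str
    semantics: Optional[str] = None           # item 1: concept symbol string
    concept_class: Optional[str] = None       # item 2: letter string
    usage_code: Optional[str] = None          # item 3: one digit 0-b
    sentence_class: Optional[str] = None      # item 4: e.g. "XJ" or "E1E2*kmn"
    format_conversion: Optional[str] = None   # item 5: e.g. "!12"
    chunk_constitution: list[str] = field(default_factory=list)  # item 6, @S
    collocation: Optional[str] = None         # item 7, @K
    roles: list[str] = field(default_factory=list)               # item 8, @CA
    context: Optional[str] = None             # item 9, @CT


# Only values stated in the description are used; everything else stays empty.
entries = [
    LexicalEntry("think", chunk_constitution=["DC=J"]),
    LexicalEntry("clever", roles=["SC"]),
    LexicalEntry("earthquake", context="Rt:r322"),
]

for entry in entries:
    print(entry.word, entry.chunk_constitution, entry.roles, entry.context)
```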
Compared with prior art, the present invention has following advantage:
The invention simulates the human brain's perception of the deep semantic structure of natural language and establishes the deep semantic structure of natural language sentences, the sentence class. With the sentence class as the core, it constructs a knowledge base and a sentence analysis method, forming the sentence category analysis (SCA) technique. This technique closely combines the representation of concepts with the deep semantic structure of natural language sentences, describes that structure completely, and yields a natural language processing method organized around sentence category analysis. In addition, the invention processes natural language level by level, which enables the computer to grasp the deep semantic structure.
The result of the analysis is exactly the source-language analysis result required in machine translation; combined with target-language generation, it forms a machine translation system. For Chinese, where one sound may correspond to many characters and one character to many sounds, the above processing steps can solve the "sound-to-character" and "character-to-sound" conversion problems well.
The invention exhaustively enumerates the deep semantic structures of natural language sentences, forming a complete system of sentence deep semantic structures. It therefore also resolves the problems that the prior art suffers because its deep semantic structures are incomplete.
The knowledge base is centred on the representation of sentence class knowledge and expresses semantics with the concept symbol system. Compared with using complex feature sets or expressing semantics directly in natural language, this is concise and efficient. The knowledge base is organized closely around the deep semantic structure of natural language and expresses that structure in coded form, which can significantly reduce the storage space required.
The above and other features and advantages of the invention will become clear from the following more detailed description of the preferred embodiments, as illustrated in the accompanying drawings.
Description of drawings
Fig. 1 is the concept node representation diagram of the invention.
Fig. 2 is the sentence class representation diagram of the invention.
Fig. 3 is the format conversion representation diagram of the invention.
Fig. 4 is a sample of filled-in knowledge base entries of the invention.
Embodiment
To convert pinyin into Chinese characters, a Chinese vocabulary knowledge base (including monosyllabic words) must first be built as described above. Second, software must be written that uses the knowledge base to process the input pinyin stream according to the processing method of the invention. For convenience, the explanation below focuses on the words that correspond to the pinyin "wei ji": "microcomputer, crisis, jeopardize, great feats". One function word is specified directly at input time and is entered as "l", as it appears in the pinyin streams below.
Embodiment 1: zi ran zai hai wei ji l nong ye sheng chan. (the input pinyin stream)
nature* disaster crisis* agriculture production*
wild
The words under the pinyin are the results of dictionary matching; * indicates that several words correspond to the pinyin, that is, that there is an ambiguity set. The words with several candidates are: nature {spontaneous combustion}, crisis {microcomputer, jeopardize, great feats}, produce {abound}. For convenience, only the concept class of each word involved and its semantics in the concept representation system of the invention are given here; the other items of the knowledge base are omitted. Multiple senses are separated by ";".
nature: rw508:ru307+(g711; gva32); (u51; u65311; u65232)+ju600; jluv13c43
spontaneous combustion: v009+u305
disaster: r322
crisis: r53322
microcomputer: pw+jv30
jeopardize: v53322; v53322+v341
great feats: rc30al
agriculture: ga21
wild: u508
produce: (va21; v660)+v3119
abound: v311; rw311
After software processing, the computer obtains the following result: sentence class XS*22; A: disaster; B: agricultural production; XS: jeopardize. Finally, the computer gives the result of the sound-to-character conversion: "The disaster has jeopardized agricultural production."
Embodiment 2: wo guo bang zhu ya zhou guo jia du guo jing rong wei ji
China help Asia country spend* finance crisis*
weigh
The new ambiguity set is: spend {tide over}. Semantics:
China: pj2+g4001-
weigh: jvz518
help: v9431
Asia: fwj2
country: pj2
spend: v50010
tide over: v229
finance: ga24
After software processing, the computer obtains the following result:
Sentence class R31 1X*21; RB1: China; B: Asian countries get through the financial crisis (chunk expansion); RX: help.
Finally, the computer gives the result of the sound-to-character conversion: "China helps Asian countries get through the financial crisis."
Embodiment 3: wo men xiu li l zhe tai wei ji.
The new words encountered in the sentence:
we: p4001-
repair: v65351a
beautiful: u51+j831
this: 1914005
After software processing, the computer obtains the following result:
Sentence class X; A: we; B: this microcomputer; X: repair. Finally, the computer gives the result of the sound-to-character conversion: "We have repaired this microcomputer."
Embodiment 4: deng xiao ping tong zhi kai ll ge wan xiao
New words:
people: p-+ga101
folk song: (pj01*+gc402)/gwa32
eulogize: (v7115,12, ra32u)
he: 192+p4003-0+pj711
she: 192+p4003-0+pj712
great achievement: rc30a l+jzr41c44
flatter: (v7117u, v9711u)+j862
surround and protect: vc3219+jv4212
After software processing, the computer obtains the following result: sentence class X20; X2B: the people; XBC: his great achievement; X2: eulogize. Finally, the computer gives the result of the sound-to-character conversion: "The people eulogize his (her) great achievements."

Claims (1)

1. A method for analysing natural language sentences that simulates the brain's perception of language, characterized in that the method comprises two steps: building a Hierarchical Network of Concepts word-level knowledge base, and the concrete processing that determines the sentence category analysis (SCA); wherein the Hierarchical Network of Concepts word-level knowledge base divides natural language sentences into basic sentence classes and subclasses and, for each basic sentence class and its subclasses, gives the corresponding physical representation expression with the semantic chunk as the semantic unit, covering four basic formats: standard, canonical, irregular and elliptical; each basic format in turn has corresponding different formats that can be enumerated exhaustively in the mathematical sense;
(1) The representation steps of the Hierarchical Network of Concepts word-level knowledge base are as follows:
(a) Sentences are divided by semantics into 7 basic sentence classes: action sentence, process sentence, transfer sentence, effect sentence, relation sentence, state sentence and judgment sentence; according to how strongly they depend on the sentence class, semantic chunks are divided into main semantic chunks and auxiliary semantic chunks, where the auxiliary semantic chunks comprise: condition, means, instrument, approach, reference, cause and result; according to their common features, main semantic chunks are divided into: feature semantic chunk, actor, object and content; a general physical representation expression for semantic chunks is established: SK = individuality + commonality = sentence class information + semantic chunk type information; when the feature semantic chunk of a sentence contains the features of two basic sentence classes, a mixed sentence is formed; when a sentence expresses the features of two or more basic sentence classes with two or more feature semantic chunks, a compound sentence class is formed; the above information is expressed in symbolic form to build the knowledge base;
(b) For a word in the knowledge base whose concept class contains v, its semantic knowledge is used to determine which node forms the main content it represents: action Φ0, process Φ1, transfer Φ2, effect Φ3, relation Φ4, state Φ5, the general judgment sentence class Φ8, or the other judgment sentence subclass j11; on that basis it is determined which of the 7 basic sentence classes the word belongs to; according to how the word forms sentences when it serves as the feature semantic chunk in the corresponding basic sentence class, the specific sentence class code is determined among the 57 corresponding subclasses; if the main content of the word's semantic knowledge covers two of the above nodes, it is handled as a mixed sentence class; for the code of a mixed sentence class the invention adopts the following form: the sentence class codes E1 and E2 of the two basic sentence classes that make up the mixed sentence class, followed by * and the three digits kmn, where E1 and E2 are the sentence class codes of basic sentence classes, k is the total number of non-E semantic chunks, m is the number of semantic chunks taken from basic sentence class E1, counting from its first semantic chunk and excluding the E semantic chunk, and n is the starting index of the semantic chunks taken from the second basic sentence class E2; when n = m+1, n may be omitted; for a word that gives rise to a compound sentence class, the sentence class information is filled in as the codes E1 and E2 of the two basic sentence classes that make up the compound sentence class, with * between them; during analysis, the formats of the two sentence classes can be retrieved from the concept-level sentence class representation knowledge base as indicated by E1 and E2;
(c) When the sentence class code is valid, it is determined from the sentence class code in (b) whether the sentence class is a two-main-chunk, three-main-chunk or four-main-chunk sentence; this is determined as follows: the general mathematical expression of a sentence can be written as:
the first generalized object semantic chunk JK is followed by the feature semantic chunk of the sentence, then by the second generalized object semantic chunk, then by the third generalized object semantic chunk, with the remaining generalized object semantic chunks listed in order;
The expression does not limit the number of generalized object semantic chunks JK, but for the basic sentence classes of actual natural language only the cases with 1, 2 or 3 JK need be considered; these correspond respectively to two-main-chunk sentences, three-main-chunk sentences and four-main-chunk sentences;
For a four-main-chunk sentence, JK2 is necessarily based on the object B and JK3 on the content C; for a three-main-chunk sentence, either B or C can be the main body of JK; a two-main-chunk sentence may lack E, in which case JK2 must be based on C; according to the narrative order that the main semantic chunks usually follow when the word forms a sentence, the concrete format code is found in Fig. 3, and the format that the word often adopts when forming a sentence is given as a code; following this indication, the concrete format can be obtained from the concept-level format conversion knowledge base; when there are several formats, they are labelled [1], [2], ... so that the later items can state the different cases under each format; if the word usually adopts the standard format or the canonical format when forming sentences, this item may be left blank;
(d) When the sentence class code is valid and the word forms a sentence according to the sentence class code in (b), if there is an expectation relation between the word and a generalized object semantic chunk, that is, the word requires a specific concept to serve as one of its generalized object semantic chunks, then this specific concept that preferentially collocates with the word is given by the method F = ∑ (letter string)(digit string); the expectation includes expectations about a particular component within the structure of the generalized object semantic chunk; in that case the constitution of the semantic chunk is described first and the preferred concept of the relevant component is given afterwards; the constitution knowledge of semantic chunks and the preferred concepts of their components are denoted by @S in the knowledge base; the constitution knowledge of a JK semantic chunk is filled in with = and +; the preferred-concept knowledge of each component is expressed with : and is also filled in here; if the sentence formed by a v concept often requires a sentence to serve as one of its semantic chunks, and the word has this knowledge, this item of the knowledge base uses JK=J and JK:=J to express, respectively, that a semantic chunk JK must or may be expanded into a sentence; for details see (i);
A semantic chunk, or a component of one, can be divided by content into an object part B and a content part C, or by form into a front part Q and a rear part H; this division is a convention, so no explicit expression needs to be written; it suffices to append one of the four letters B, C, Q, H to the semantic chunk or component and give its preferred concept, which both indicates that the division exists and states the preferred concept of that part;
(e) If one part of the structure of the generalized object semantic chunk described in (d) is not adjacent to the other parts but appears at a separate position in the sentence, this is expressed in the semantic chunk constitution with [] and (), which indicate respectively a part that may be separated from the semantic chunk and a part that must be separated;
(f) When the word serves as a semantic chunk, the name of that semantic chunk is written out explicitly, giving the semantic role knowledge of the word; the semantic roles a word serves are denoted by @CA; that a v concept serves as the E semantic chunk is a convention and is not filled in here; when a v concept forms only part of an E semantic chunk, however, this must be stated explicitly;
(g) If the word itself can also provide associative knowledge between sentences, that is, an explicit consequence exists, this context knowledge is given explicitly and denoted by @CT;
(h) To meet expressive needs, one sentence class is often converted into another while the semantic associations that held before the conversion are preserved; the invention calls this sentence class conversion and provides a notation for it; for a v concept that can undergo conversion, (E1, E2)J is filled in under "sentence class format" in the knowledge base, where E1 is the sentence class usually adopted when the v concept forms an E semantic chunk (it can be regarded as the normal, original sentence class) and E2 is the sentence class adopted after conversion; for a v concept that triggers conversion, E1J<=E2J is filled in, where E1J is the sentence class converted to and E2J is the sentence class converted from; when the sentence class code is valid but the word, in forming sentences, uses a sentence class other than the one corresponding to this code, this situation is indicated and the sentence class code of the other class is given;
(i) When the sentence class code is valid and, in forming a sentence according to the given sentence class code, the word requires a sentence to serve as one of its generalized object semantic chunks, this is indicated; that is, the knowledge that certain semantic chunks triggered by the word expand into sentences is given; see step (d).
(2) The concrete processing steps that determine the sentence category analysis (SCA) are as follows:
(a) For the input sentence, perform dictionary matching, segment out the words found in the sentence, and obtain the semantic knowledge of these words from the knowledge base;
(b) Following the concept class information, and taking the semantic chunk distinguishing markers (class l0) and verb (v) concepts as the basis, form semantic chunk blanks and form E hypotheses;
(c) If no E hypothesis can be formed, go to (i); otherwise continue;
(d) Screen and rank all E hypotheses; the main information used is: sentence class code, format code, and word frequency and context knowledge;
(e) Perform the sentence class check on the selected E hypotheses in ranked order; the main information used is: knowledge of the preferred concepts of the semantic chunk cores; if all checks fail, go to (k); otherwise continue;
(f) Perform the semantic chunk constitution check; the main information used is: semantic chunk constitution knowledge and knowledge of the preferred concepts of the components of each semantic chunk; if all checks fail, go to (k); otherwise continue;
(g) Where necessary, perform the sentence class conversion check; the main information used is: the sentence class conversion knowledge triggered by words; if all checks fail, go to (k); otherwise go to (l);
(h) Where necessary, perform the semantic chunk separation check; the main information used is: semantic chunk separation and conversion knowledge; if all checks fail, go to (k); otherwise go to (i);
(i) Perform the sentence class check for sentences without an E semantic chunk; if it fails, continue; otherwise go to (l);
(j) Redo the E hypotheses; if successful, go to (d); otherwise go to (k);
(k) Human-machine interaction;
(l) Collect the context material; processing ends.
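The control flow of the processing steps can be pictured with a simplified Python sketch. It is one reading of the steps under stated assumptions, not the patented procedure itself: the function name, the outcome strings and the callback signatures are hypothetical, the E hypotheses are assumed to arrive already screened and ranked, and the re-forming of hypotheses in step (j) is not modelled (a failed E-less check falls straight through to human-machine interaction).

```python
from typing import Callable, Optional, Sequence

Check = Callable[[str], bool]


def analyse(hypotheses: Sequence[str],
            checks: Sequence[Check],
            no_e_check: Callable[[], bool]) -> tuple[str, Optional[str]]:
    """Return (outcome, accepted E hypothesis) for one sentence."""
    if not hypotheses:
        # Step (c): no E hypothesis, so try the E-less sentence class check, step (i).
        if no_e_check():
            return "collect_context", None
        # Steps (j)/(k): redoing the hypotheses is not modelled in this sketch.
        return "interaction", None
    survivors = list(hypotheses)  # assumed already ranked, step (d)
    for check in checks:          # the successive checks of steps (e) to (h)
        survivors = [h for h in survivors if check(h)]
        if not survivors:         # every hypothesis failed this check: step (k)
            return "interaction", None
    # The best-ranked surviving hypothesis is accepted; the final step collects context.
    return "collect_context", survivors[0]


# Toy usage: of two candidate E cores, only "jeopardize" passes the single check.
print(analyse(["jeopardize", "crisis"], [lambda h: h == "jeopardize"], lambda: False))
```

Modelling each check as a filter over the ranked hypotheses reflects the wording "if all checks fail, go to (k); otherwise continue": a stage ends in interaction only when no hypothesis survives it.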
CNB981019218A 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process Expired - Fee Related CN1141660C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB981019218A CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB981019218A CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Publications (2)

Publication Number Publication Date
CN1236138A CN1236138A (en) 1999-11-24
CN1141660C true CN1141660C (en) 2004-03-10

Family

ID=5217018

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB981019218A Expired - Fee Related CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Country Status (1)

Country Link
CN (1) CN1141660C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310171C (en) * 2004-09-29 2007-04-11 上海交通大学 Method for semantic analyzer bead on grammar model
CN1838159B (en) * 2006-02-14 2010-08-11 北京未名博思生物智能科技开发有限公司 Cognition logic machine and its information processing method
US10249297B2 (en) * 2015-07-13 2019-04-02 Microsoft Technology Licensing, Llc Propagating conversational alternatives using delayed hypothesis binding
CN107422691B (en) * 2017-08-11 2020-05-12 山东省计算中心(国家超级计算济南中心) Collaborative PLC programming language construction method

Also Published As

Publication number Publication date
CN1236138A (en) 1999-11-24


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20040310

Termination date: 20100518