CN1236138A - Natural language statement analyzing method simulating brain's language sensing process - Google Patents

Info

Publication number
CN1236138A
Authority
CN
China
Prior art keywords
sentence
knowledge
semantic
semantic chunk
vocabulary
Prior art date
Legal status
Granted
Application number
CN98101921A
Other languages
Chinese (zh)
Other versions
CN1141660C (en)
Inventor
黄曾旸
张全
刘志文
晋耀红
杜燕玲
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date
Filing date
Publication date
Application filed by Institute of Acoustics CAS
Priority to CNB981019218A (patent/CN1141660C/en)
Publication of CN1236138A (patent/CN1236138A/en)
Application granted
Publication of CN1141660C (patent/CN1141660C/en)
Anticipated expiration
Expired - Fee Related (current legal status)

Landscapes

  • Machine Translation (AREA)

Abstract

The computer analysis method comprises a sentence category analysis method and a hierarchical network of concepts language knowledge base. The invention processes natural language by using sentence category knowledge to activate the conceptual associations of a sentence and by resolving ambiguity and vagueness at the concept level and the language level. The knowledge base is centered on the expression of sentence category knowledge and expresses sentences in the symbol system of the hierarchical network of concepts. The method is simple and efficient and reduces memory requirements.

Description

Natural language statement analysis method simulating the brain's language perception
The present invention relates to a computer natural language processing method and, more particularly, to a computer analysis method that simulates the human brain's perception of natural-language sentences.
Ever since the computer was born in the 1940s, people have continually tried to use it to analyze and process the natural language that humans use every day. The main approaches are described below.
In the 1950s, N. Chomsky proposed Transformational Generative Grammar, from which the syntactic analysis method of transformational generative grammar was developed. Chomsky posited that language has a deep structure, but he did not resolve how deep structure should be represented, how many kinds of deep structure exist, or whether the deep structures of a natural language are finite or infinite. Therefore, although Chomsky's transformational generative grammar rests on a very strict sentence-generation process, neither it nor its syntactic analysis method has enough power to handle the highly complex language phenomena that humans produce naturally. Moreover, because deep structure plays no role in processing, the grammar's excessively strong generative capacity has also made syntactic analysis based on it largely unsuccessful.
As research deepened, a set of syntactic theories better suited to computer implementation of natural language processing gradually emerged, chiefly syntactic analysis methods guided by Augmented Transition Network (ATN) grammar, systemic functional grammar, and various phrase structure grammars. These methods are far easier to implement on a computer than transformational generative grammar, but they all abandoned the pursuit of analyzing the deep structure of language, whereas the analysis of natural language is not merely a grammatical problem. It is therefore evident that these methods cannot adequately solve the problem of computer analysis of natural language.
As phrase structure grammar developed further, researchers recognized the need to draw on the many kinds of knowledge contained in natural language to obtain better analysis results. In recent years, building on phrase structure grammar and introducing complex-feature-set knowledge representation and unification algorithms, Lexical Functional Grammar (LFG), Functional Unification Grammar (FUG), Generalized Phrase Structure Grammar (GPSG) and Head-Driven Phrase Structure Grammar (HPSG) were developed. These methods touch on deep semantic analysis to some extent, but because they lack a complete conceptual representation system and have not truly discovered and used the deep semantic structure of natural language, they still rely on syntactic structure and use semantic processing only as a supplement to grammatical analysis. They therefore cannot fully solve the problems encountered when natural language is analyzed grammatically.
While grammar-based processing methods prevailed, analysis methods that rely closely on semantics also appeared, notably the Case Grammar proposed by Fillmore and the Conceptual Dependency theory proposed by Schank. Although case grammar made some contribution to exploring deep semantic structure, it never formed a complete system; it could not settle such basic questions as which cases exist in natural language, or even whether the number of cases is finite or infinite. Conceptual dependency theory, lacking foundations such as a complete conceptual representation system and a deep semantic structure, plunged directly into the understanding of common-sense and domain knowledge in natural language. Like a mansion built on sand, it cannot really bear the burden of natural language processing. Processing methods based on conceptual dependency theory were mired in an ocean of knowledge from the outset, which has kept them permanently at the stage of adding knowledge in the face of unlimited natural-language phenomena, unable to reach practical use.
Computer applications are now nearly ubiquitous, and the software industry is set to become a leading industry of the 21st century, signalling the arrival of the information age. Faced with natural language, the main carrier of information and knowledge, computers urgently need the ability to process natural-language semantic knowledge. The first step is therefore to establish a natural-language sentence analysis method suited to computer operation, so that computers can better grasp the deep semantic structure of natural language.
The purpose of this invention is to provide a complete computer method for analyzing natural-language sentences that applies to all natural languages and simulates the brain's language perception.
A natural language statement analysis method simulating brain language perception, characterized in that the method comprises a sentence category analysis method and a language-level knowledge base of the hierarchical network of concepts. Based on the synthesis, induction and deduction of natural-language sentences, the sentence category analysis method divides sentences into 7 basic sentence categories and 57 subcategories. For each basic sentence category and its subcategories, it gives a corresponding sentence expression whose semantic primitives are semantic chunk expressions. These expressions have 4 basic formats: standard, canonical, fault and omission; each basic format in turn has a set of distinct variants that can be enumerated exhaustively by mathematical means.
The language-level knowledge base of the hierarchical network of concepts provides the following knowledge for each word:
(1) the sentence category to which the word belongs, given as a sentence category code;
(2) the actual orderings of semantic chunks when the word forms a sentence, given as format codes;
(3) the composition knowledge of semantic chunks, and the preferred concepts of each part of a semantic chunk, when the word forms a sentence;
(4) the separation and transformation knowledge of semantic chunks when the word forms a semantic chunk;
(5) the semantic roles the word serves when it forms a sentence;
(6) the context knowledge triggered by the word;
(7) the sentence category conversion knowledge triggered by the word;
(8) the knowledge of which semantic chunks triggered by the word expand into sentences.
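The eight kinds of knowledge above can be pictured as the fields of one lexicon record. The sketch below is a hypothetical Python layout, not the patent's storage format: the class name `LexiconEntry`, its field names and the rule in `can_head_e_chunk` are assumptions for illustration, and the example strings reuse notations quoted later in this description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexiconEntry:
    """Hypothetical record for one word of the language-level knowledge base.

    Each field corresponds to one numbered kind of knowledge, (1)-(8).
    """
    word: str
    category_code: Optional[str] = None                          # (1) e.g. "XJ"
    format_codes: List[str] = field(default_factory=list)        # (2) chunk orderings
    composition: List[str] = field(default_factory=list)         # (3) e.g. "B=YB+YC", "YB:w"
    separation: List[str] = field(default_factory=list)          # (4) e.g. "B=XB+[YB]"
    roles: List[str] = field(default_factory=list)               # (5) e.g. "SC"
    context: List[str] = field(default_factory=list)             # (6) e.g. "Rt:r322"
    conversion: Optional[str] = None                             # (7) e.g. "(X20,X10)J"
    sentence_expansion: List[str] = field(default_factory=list)  # (8) e.g. "DC=J"

    def can_head_e_chunk(self) -> bool:
        # Assumption: only words carrying a sentence category code
        # (chiefly v concepts) are candidates for the core of an E chunk.
        return self.category_code is not None


clever = LexiconEntry("clever", roles=["SC"])
earthquake = LexiconEntry("earthquake", context=["Rt:r322"])
```

Keeping every kind of knowledge in its own field mirrors the list: the analysis steps below can then consult only the fields each check needs.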
The specific steps of sentence category analysis are as follows:
(1) Perform dictionary matching on the input sentence, segment it into the words it contains, and retrieve the semantic knowledge of these words from the knowledge base;
(2) As indicated by the concept classification information, form semantic chunk frames and E hypotheses on the basis of the l0-class semantic chunk discrimination markers and the v (verb) concepts;
(3) If no E hypothesis can be formed, go to (9); otherwise continue;
(4) Screen and rank all E hypotheses, mainly using sentence category codes, format codes, word frequency and context knowledge;
(5) Check the sentence category of each selected E hypothesis in ranked order, mainly using the preferred-concept knowledge of semantic chunk cores; if all checks fail, go to (11); otherwise continue;
(6) Check semantic chunk composition, mainly using the composition knowledge of semantic chunks and the preferred concepts of their parts; if all checks fail, go to (11); otherwise continue;
(7) Where necessary, check sentence category conversion, mainly using the sentence category conversion knowledge triggered by words; if all checks fail, go to (11); otherwise go to (12);
(8) Where necessary, check semantic chunk separation, mainly using semantic chunk separation and transformation knowledge; if all checks fail, go to (11); otherwise go to (10);
(9) Check for sentence categories without an E chunk; if this fails, continue; otherwise go to (12);
(10) Re-form the E hypotheses; if successful, go to (4); otherwise go to (11);
(11) Human-machine interaction;
(12) Collect context material; processing ends.
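The control flow of steps (1) to (12) can be traced mechanically. The following is a minimal Python sketch, not the patent's implementation: it copies the listed jumps into a transition table and replays them for a given set of check outcomes. The name `run_analysis`, the outcome dictionary, treating a step with no recorded outcome (including a "where necessary" check that is not needed) as a success, and stopping at (11) or (12) are all assumptions. The steps as listed do not say which step jumps into (8), so the sketch enters it only when started there.

```python
from typing import Dict, List, Optional, Tuple

# step -> (next step on success, next step on failure), copied from (1)-(12).
TRANSITIONS: Dict[int, Tuple[int, int]] = {
    1: (2, 2),     # dictionary matching and word segmentation
    2: (3, 3),     # form semantic chunk frames and E hypotheses
    3: (4, 9),     # E hypotheses formed? yes -> (4), no -> (9)
    4: (5, 5),     # screen and rank E hypotheses
    5: (6, 11),    # sentence category check
    6: (7, 11),    # semantic chunk composition check
    7: (12, 11),   # sentence category conversion check (where necessary)
    8: (10, 11),   # semantic chunk separation check (where necessary)
    9: (12, 10),   # check for sentence categories without an E chunk
    10: (4, 11),   # re-form E hypotheses
}
TERMINAL = {11, 12}  # (11) human-machine interaction, (12) collect context and end


def run_analysis(outcomes: Optional[Dict[int, bool]] = None,
                 start: int = 1, max_steps: int = 100) -> List[int]:
    """Return the steps visited; outcomes[s] is the hypothetical result at step s."""
    outcomes = outcomes or {}
    path = [start]
    step = start
    while step not in TERMINAL:
        if len(path) >= max_steps:
            raise RuntimeError("step limit reached; the transition table may loop")
        on_success, on_failure = TRANSITIONS[step]
        step = on_success if outcomes.get(step, True) else on_failure
        path.append(step)
    return path


print(run_analysis())                      # -> [1, 2, 3, 4, 5, 6, 7, 12]
print(run_analysis({3: False, 9: False}))  # -> [1, 2, 3, 9, 10, 4, 5, 6, 7, 12]
```

A table-driven form keeps the jumps in one place, so a reader can compare each row directly with the numbered step it encodes.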
The present invention is a computer method for analyzing natural-language sentences that simulates the brain's language perception. In perceiving natural language, people make integrated use of knowledge at the concept level, at the language level, and at the level of common-sense and domain knowledge; of these, concept-level and language-level knowledge are the key to human perceptual processing. Concept-level knowledge is language-independent knowledge for processing natural language that all humans share, whereas language-level knowledge is the language-specific knowledge used in perception. At the concept level, the invention takes natural language as a whole as its object, divides sentence categories completely, gives the sentence category expressions and format conversion tables of natural language, and establishes the deep semantic structure of natural-language sentences.
In traditional grammar, the notion of sentence type refers to declarative, imperative, interrogative and exclamatory sentences, a mainly pragmatic classification; the sentence category of the present invention is instead a semantic classification. The invention divides sentences semantically into 7 basic sentence categories: action sentences, process sentences, transfer sentences, effect sentences, relation sentences, state sentences and judgement sentences.
A semantic chunk is a semantic constituent unit of a sentence; in form it may be a word, a phrase or a sentence. The concept of the semantic chunk is introduced to make it easier to describe sentences at the semantic level. According to how strongly they depend on the sentence category, semantic chunks are divided into main chunks and auxiliary chunks: main chunks depend strongly on the sentence category, auxiliary chunks weakly. There are 7 kinds of auxiliary chunk: condition, means, instrument, way, reference, cause and result. By their common features, main chunks are divided into the feature chunk, actor, object and content. The individual characteristic of a semantic chunk is its sentence category attribute. The commonality and the individuality of semantic chunks should be regarded as the two orthogonal bases of a two-dimensional representation space. Hence the general expression of a semantic chunk is:
$$SK = \text{individuality} + \text{commonality} = \text{sentence category information} + \text{semantic chunk type information} \qquad (1)$$
Formula (1) shows that a semantic chunk is a function of the sentence category. The category of a sentence is determined by its feature chunk. When the feature chunk of a sentence combines the features of two basic sentence categories, the sentence belongs to a mixed sentence category; when a sentence expresses the features of two or more basic categories through two or more feature chunks, it belongs to a compound sentence category.
For a computer to use this knowledge, the knowledge must be expressed symbolically and organized into knowledge bases. At the concept level, the sentence category expressions and format conversion tables must be provided; at the language level, knowledge centered on sentence categories must be provided for the words of each specific language. The construction of these two kinds of knowledge base is described below.
The four main chunk primitives are denoted: feature E, actor A, object B and content C. The 7 auxiliary chunks are denoted: condition Cn (Condition), means Ms (Means), instrument In (Instrument), way Wy (Way), reference Re (Refer), cause Pr (Premise) and result Rt (Result). The basic sentence categories are denoted: action X, process P, transfer T, effect Y, relation R, state S and judgement D. In the precise expression of a main chunk, the two kinds of information in formula (1) are written as a concatenation of capital letters and digits. In the sentence category part, the letter denotes the basic sentence category and the digit its subcategory; in the chunk type part, the letter denotes the chunk type and the digit its subtype. A chunk that carries only sentence category information is called a feature chunk, denoted E; a chunk that carries both sentence category information and chunk type information is called a generalized object chunk, denoted JK.
For example, in the response sentence (one of the action sentence categories), X2, X2B, XAC and X2C denote 4 semantic chunks: the response, the responder, the party that triggers the response together with its behaviour, and the responder's follow-up behaviour, respectively. Here X2 is the E chunk and the others are generalized object chunks. Likewise, TB and TC are the object and content of a transfer sentence; the object and content of an information transfer sentence (one of the transfer categories) are denoted T3B and T3C; the two parties of a relation are denoted RB1 and RB2; and so on.
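The naming convention above is regular enough to parse. The Python sketch below splits a basic-category main chunk symbol into the two kinds of information of formula (1) and classifies it as E or JK. The regular expression, the names `parse_chunk`, `ChunkSymbol` and `describe`, and reading a multi-letter type such as the AC of XAC as a single chunk-type field are assumptions; symbols of mixed categories (such as T3X) and auxiliary chunks are outside its scope.

```python
import re
from typing import NamedTuple

CATEGORIES = {"X": "action", "P": "process", "T": "transfer", "Y": "effect",
              "R": "relation", "S": "state", "D": "judgement"}
CHUNK_TYPES = {"A": "actor", "B": "object", "C": "content"}

# <category letter><subclass digits><chunk-type letters><chunk-type digits>
_SYMBOL = re.compile(r"([XPTYRSD])(\d*)([ABC]*)(\d*)")


class ChunkSymbol(NamedTuple):
    category: str    # basic sentence category letter, e.g. "X"
    subclass: str    # subclass digits, "" if absent
    chunk_type: str  # chunk-type letters, "" for a feature chunk
    type_index: str  # chunk-type digits such as the 1 of RB1, "" if absent

    @property
    def is_feature(self) -> bool:
        # Sentence category information only -> feature chunk E; otherwise JK.
        return self.chunk_type == ""


def parse_chunk(symbol: str) -> ChunkSymbol:
    m = _SYMBOL.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a basic-category main chunk symbol: {symbol!r}")
    return ChunkSymbol(*m.groups())


def describe(symbol: str) -> str:
    c = parse_chunk(symbol)
    if c.is_feature:
        kind = "E"
    else:
        kind = "JK (" + "/".join(CHUNK_TYPES[t] for t in c.chunk_type) + ")"
    return f"{symbol}: {CATEGORIES[c.category]} sentence, {kind}"


for s in ["X2", "X2B", "XAC", "T3C", "RB1"]:
    print(describe(s))
```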
The general mathematical expression J of a sentence can be written as:
$$J_{n+1} = JK_1 + E + \sum_{j=2}^{n} JK_j \qquad (2)$$
JK1 is called generalized object chunk No. 1, and so on for the others. Expression (2) does not limit the number of JKs, but for the basic sentence categories, practical natural language only requires JK counts of 1, 2 and 3, corresponding to two-main-chunk, three-main-chunk and four-main-chunk sentences respectively.
In a four-main-chunk sentence, JK2 is necessarily based on the object B and JK3 on the content C; in a three-main-chunk sentence, either B or C can serve as the main body of a JK. A two-main-chunk sentence may lack E, but JK2 must then be based on C; this situation often occurs in Chinese state sentences.
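Formula (2) and the chunk counts can be illustrated with a short Python sketch. The function names are hypothetical; the sketch only orders chunk symbols as formula (2) prescribes and counts main chunks, allowing E to be absent as in the two-main-chunk case just described. It does not check the B/C constraints on JK2 and JK3. With three JKs it reproduces the transfer expression T3J = TA + T3 + TB + TC quoted under item 4 of the language knowledge base.

```python
from typing import List, Optional


def assemble(jks: List[str], e: Optional[str]) -> str:
    """Order chunks as J = JK1 + E + JK2 + ... + JKn, per formula (2)."""
    if not jks:
        raise ValueError("formula (2) needs at least JK1")
    parts = jks[:1] + ([e] if e else []) + jks[1:]
    return " + ".join(parts)


def main_chunk_count(jks: List[str], e: Optional[str]) -> int:
    # n generalized object chunks plus E gives the n + 1 main chunks of J_{n+1}.
    return len(jks) + (1 if e else 0)


print(assemble(["TA", "TB", "TC"], "T3"))  # -> TA + T3 + TB + TC
```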
Replacing E and JK in formula (2) with the corresponding semantic chunk expressions yields the sentence expression, which is the semantic representation of the sentence's deep structure. The invention gives the sentence category expressions for the basic sentence categories and their 57 subcategories. The expressions of mixed sentence categories can be derived from the basic expressions and need no separate database.
The four format types are explained as follows:
Standard format: the main chunks are arranged in the natural logical order of the language. The chunk order in the sentence category expression database is given in this format.
Canonical format: the order of the main chunks departs from the natural logical order of the language, and hence from the standard format, but cue markers must then be added between the generalized object chunks. Three-main-chunk sentences have 4 canonical formats; four-main-chunk sentences have 23.
Fault format: the cue markers between generalized object chunks are partly or entirely omitted. Three-main-chunk sentences have 4 fault formats; four-main-chunk sentences have 47.
Omission format: one or more semantic chunks are omitted from the sentence.
The language knowledge base describes the semantics and sentence category knowledge of the words of a specific language. The invention expresses this knowledge in the symbol system of the hierarchical network of concepts, so this language knowledge base is also called the hierarchical network of concepts knowledge base. Specifically, it provides the knowledge needed to analyze sentences in the following respects, illustrated with Chinese for ease of understanding:
1. Semantic knowledge, given in the conceptual representation system of natural language. Concepts in natural language are of two kinds: concept primitives and compound concepts. A concept primitive is a concept whose meaning can be expressed directly by the definition of a node of the semantic network shown in Figure 1; a compound concept cannot be expressed directly by a semantic network node and needs combination to express its meaning. The semantic expression of a concept primitive is:
$$F = \sum (\text{alphabetic string})(\text{numeric string}) \qquad (3)$$
Here F denotes the symbol of a concept primitive. The alphabetic string uses lowercase letters and the numeric string uses the hexadecimal digits 0-f. The alphabetic string is formed from the five-tuple {v (dynamic aspect of a concept), g (static aspect of a concept), u (attribute of a concept), z (value of a concept), r (effect of a concept)}, the concrete concept classes {p (person), w (thing)}, the comprehensive concept classes {e (concepts synthesizing primitive and basic concepts), x (physical nature)}, and the semantic network symbols {Φ (primitive concept semantic network), j (basic concept semantic network), l (language logic concept semantic network), jl (basic logic concept semantic network), jw (basic material concept semantic network)}. Because primitive concepts are the most numerous, Φ is omitted in writing. The numeric string is the hierarchy (level) symbol.
The semantic expression of a compound concept is:
$$F = \sum F(K) \qquad (4)$$
Each F(K) is an F of formula (3), and the F(K) are connected by the following combination symbols:
Action: #    Effect: $
Object: &    Content: |
Logical AND: ,    Logical OR (choice): ;    Logical combination: (, L)
Polarization: /    Subject-predicate: ‖
Negation: !    Inversion: ^
Priority grouping: ( )    Attachment: +
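Formula (3) can be read mechanically: letters first, then hexadecimal level digits. A minimal Python sketch, assuming (this is not stated above) that the numeric string begins with a decimal digit 0-9, so that the hex letters a-f are not confused with the alphabetic string; the name `split_primitive` is hypothetical, and compound concepts joined by the symbols above are not handled. The example r322 is the concept symbol quoted under context knowledge (item 9).

```python
import re
from typing import List, Tuple

# Alphabetic string, then an optional hex numeric string (assumed to start with 0-9).
_PRIMITIVE = re.compile(r"([a-z]+)([0-9][0-9a-f]*)?")


def split_primitive(symbol: str) -> Tuple[str, List[int]]:
    """Split a concept primitive such as 'r322' into ('r', [3, 2, 2])."""
    m = _PRIMITIVE.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a concept primitive symbol: {symbol!r}")
    letters, digits = m.group(1), m.group(2) or ""
    # Each hex digit is one level of the hierarchy symbol.
    return letters, [int(d, 16) for d in digits]
```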
2. Concept classification: the external manifestation of the concept a word expresses, i.e. the alphabetic string of item 1. When a word expresses a concept primitive, this symbol is identical to the alphabetic string of its semantic knowledge (see item 1); when a word expresses a compound concept, the external manifestation of the combined word may differ from the class codes of the concept primitives that make up the combination. This item describes the complete external manifestation of the word. Giving the concept classification directly lets the computer use category knowledge first during analysis.
3. Word frequency and context. The invention expresses this knowledge with one hexadecimal digit from 0 to b, estimated from the semantic usage of the word. The digits are defined as: 0 very high frequency; 1 common; 2 common in a specialty; 3 uncommon; 4 colloquial; 5 dialect; 6 archaic; 7 modern; 8 rare; 9 uncommon in a specialty; a very rare; b rare in a specialty.
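Because the scale uses a single hexadecimal digit, decoding it is a table lookup. A minimal Python sketch; the English labels paraphrase the definitions above and the name `frequency_label` is hypothetical.

```python
FREQUENCY_LABELS = [
    "very high frequency", "common", "common in a specialty", "uncommon",
    "colloquial", "dialect", "archaic", "modern", "rare",
    "uncommon in a specialty", "very rare", "rare in a specialty",
]  # list index = value of the hex digit 0-b


def frequency_label(code: str) -> str:
    if len(code) != 1:
        raise ValueError("expected a single hexadecimal digit")
    value = int(code, 16)  # raises ValueError for non-hex characters
    if value >= len(FREQUENCY_LABELS):
        raise ValueError(f"code {code!r} is outside 0-b")
    return FREQUENCY_LABELS[value]
```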
4. Sentence category code. When a word carries definite sentence category information, that information is entered as a code; this applies mainly to v (verb) concepts that can serve as the core of an E chunk. The sentence category codes of the basic sentence categories are shown in Figure 2. For mixed sentence categories (in natural language the vast majority of mixing is pairwise, so a mixed sentence category in this invention means a pairwise mixture), the invention adopts the convention E1E2*kmn. E1 and E2 are the sentence category codes of the two mixed basic categories; k is the total number of non-E chunks; m is the number of chunks taken from the start of basic category E1, not counting the E chunk; and n is the starting index of the chunks taken from the second basic category E2, which may be omitted when n = m + 1. For example, given the categories T3J = TA + T3 + TB + TC and XJ = A + X + B, the mixed form T3X*21 is TA + T3X + B, XT3*21 is A + XT3 + TB, and XT3*213 is A + XT3 + TC. For how this is entered in the knowledge base, see "freedom" in Figure 4. For words that give rise to compound sentence categories, the category information is entered as E1*E2, where E1 and E2 are basic sentence category codes. During analysis, the formats of the two categories can be retrieved from the concept-level sentence category expression database as E1 and E2 indicate.
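The E1E2*kmn convention can be checked against the three examples just given. The Python sketch below is an illustration under stated assumptions: the table holds only the two category expressions quoted above (T3J and XJ), E1 and E2 are passed separately because splitting a string such as T3X into its parts is not specified here, k, m and n are taken to be single digits, and the mixed E chunk is placed after the first chunk as in formula (2). The name `expand_mixed` is hypothetical.

```python
from typing import Dict, List

# Non-E chunks of the two basic category expressions quoted in the text:
# T3J = TA + T3 + TB + TC and XJ = A + X + B.
BASIC_CHUNKS: Dict[str, List[str]] = {
    "T3": ["TA", "TB", "TC"],
    "X": ["A", "B"],
}


def expand_mixed(e1: str, e2: str, kmn: str) -> str:
    """Expand the mixed sentence category E1E2*kmn into a chunk sequence."""
    if len(kmn) not in (2, 3):
        raise ValueError(f"expected 'km' or 'kmn', got {kmn!r}")
    k, m = int(kmn[0]), int(kmn[1])
    n = int(kmn[2]) if len(kmn) == 3 else m + 1    # n may be omitted when n = m + 1
    if k < 1 or not 0 <= m <= k or n < 1:
        raise ValueError(f"invalid kmn: {kmn!r}")
    first = BASIC_CHUNKS[e1][:m]                      # first m chunks of E1, E excluded
    second = BASIC_CHUNKS[e2][n - 1:n - 1 + (k - m)]  # k - m chunks of E2 from index n
    if len(first) != m or len(second) != k - m:
        raise ValueError(f"{e1}{e2}*{kmn} needs chunks that are not defined")
    chunks = first + second
    # The mixed feature chunk E1E2 follows JK1, as in formula (2).
    return " + ".join(chunks[:1] + [e1 + e2] + chunks[1:])


for code in ["21", "213"]:
    print(f"XT3*{code} ->", expand_mixed("X", "T3", code))
```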
5. Format conversion knowledge. When the sentence category code is present, the formats the word frequently takes when forming a sentence are given as codes, by which the concrete format can be retrieved from the concept-level format conversion knowledge base. For example, if the sentence category code contains XJ and the format conversion knowledge contains 112, the word frequently forms sentences in the format B + A + X. When there are several formats, they are labelled [1], [2], ..., so that the entries in each of the following items correspond to the different cases. If the word usually forms sentences in the standard or canonical format, this item can be left blank.
For expressive purposes, one sentence category is often converted into another, while the semantic association information from before the conversion is retained; the invention calls this phenomenon sentence category conversion and also gives a way to represent it. For a v concept that undergoes conversion, (E1,E2)J is entered in the "sentence category format" field of the knowledge base, where E1 is the category the v concept usually takes when it forms an E chunk (which can also be regarded as the normal, original category) and E2 is the category adopted after conversion; see "predation" in Figure 4. For a v concept that causes conversion, E1J<=E2J is entered, where E1J is the category converted into and E2J the category converted from. For example, "love and esteem" has the knowledge (X20,X10)J, meaning that it can be converted from the original response sentence into a bearing sentence; the word meaning "be subjected to" has the knowledge X10J<=X20J, meaning that it can guide the conversion of a response sentence into a bearing sentence. Thus "love and esteem" yields the sentence "We love and esteem Premier Zhou", which can be converted under the guidance of "be subjected to" into "Premier Zhou is the object of our love and esteem".
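Matching a convertible verb against a word that triggers conversion amounts to comparing the two annotations. A minimal Python sketch, assuming the annotations are stored exactly as the strings (E1,E2)J and E1J<=E2J shown above; the regular expressions and function names are hypothetical. It reproduces the "love and esteem" example: (X20,X10)J on the verb matches X10J<=X20J on the triggering word.

```python
import re
from typing import Tuple

_CONVERTIBLE = re.compile(r"\(([A-Z]\w*),([A-Z]\w*)\)J")  # (E1,E2)J on a v concept
_CONVERTER = re.compile(r"([A-Z]\w*)J<=([A-Z]\w*)J")      # E1J<=E2J on a triggering word


def convertible(note: str) -> Tuple[str, str]:
    """(E1,E2)J -> (original category, category after conversion)."""
    m = _CONVERTIBLE.fullmatch(note)
    if m is None:
        raise ValueError(f"not an (E1,E2)J annotation: {note!r}")
    return m.group(1), m.group(2)


def converter(note: str) -> Tuple[str, str]:
    """E1J<=E2J -> (category converted from, category converted to)."""
    m = _CONVERTER.fullmatch(note)
    if m is None:
        raise ValueError(f"not an E1J<=E2J annotation: {note!r}")
    return m.group(2), m.group(1)


def triggers_conversion(verb_note: str, trigger_note: str) -> bool:
    # The trigger applies when it converts from the verb's original category
    # into the verb's converted category.
    return convertible(verb_note) == converter(trigger_note)
```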
6. Composition knowledge of semantic chunks and preferred concepts of each component, represented in the database by @S. When the sentence category code is present and a JK chunk in the sentence category format has composition knowledge, that knowledge is entered in this item using "=" and "+"; if the parts of the chunk have preferred concepts, these are expressed with ":" and also entered here. For example, for XJ, if its B chunk consists of YB and YC, write B=YB+YC; if YB is often a "thing", also write YB:w in this item (w being the concept classification symbol for "thing" described above). The sentences formed by some v concepts often require one of their semantic chunks to be a sentence; if a word has this knowledge, it is written in the knowledge base as JK=J or JK:=J, meaning that chunk JK must, or may, expand into a sentence. For example, for "thinking", DC=J is entered in this item, meaning that the DC chunk always expands into a sentence.
A semantic chunk, or a component of one, can be divided in meaning into an object (B) part and a content (C) part, or in form into a front (Q) part and a back (H) part. Because this kind of composition is a convention, no explicit expression is needed: appending one of the four letters (B, C, Q, H) to a chunk or component and giving its preferred concept indicates both that the division exists and what the preferred concept of that part is.
The semantic chunks that form a sentence can be separated: for expressive purposes, one deep semantic chunk is split across two places. The knowledge base also gives an explicit representation for this phenomenon, using "[ ]" and "[( )]" to mark the part of a chunk that may be separated and the part that must be separated, respectively. For example, "break" has "B=XB+[YB]" in this item, showing that its B chunk may be separated, as in the sentence "Li Si had his leg broken by Zhang San", where the part "leg" is separated out of the chunk "Li Si's leg". Without separation the sentence would read "Li Si's leg was broken".
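The notations of item 6 (=, +, :, =J, :=J, [ ] and [( )]) are regular enough to parse. A minimal Python sketch, assuming each knowledge item is stored as a separate string such as B=XB+[YB]; the function names and the returned tuples are hypothetical, not the knowledge base's actual format. The patterns are tried from most to least specific so that DC:=J is not mistaken for a preferred-concept entry.

```python
import re
from typing import List, Tuple, Union

Part = Tuple[str, str]  # (component name, separation: "none" | "may" | "must")


def parse_part(part: str) -> Part:
    m = re.fullmatch(r"\[\((\w+)\)\]", part)  # [(X)]: must be separated
    if m:
        return m.group(1), "must"
    m = re.fullmatch(r"\[(\w+)\]", part)      # [X]: may be separated
    if m:
        return m.group(1), "may"
    return part, "none"


def parse_item(item: str) -> Tuple[str, str, Union[str, List[Part]]]:
    if (m := re.fullmatch(r"(\w+):=J", item)):   # JK:=J, may expand into a sentence
        return ("expand", m.group(1), "may")
    if (m := re.fullmatch(r"(\w+)=J", item)):    # JK=J, must expand into a sentence
        return ("expand", m.group(1), "must")
    if (m := re.fullmatch(r"(\w+):(\w+)", item)):  # YB:w, preferred concept
        return ("prefer", m.group(1), m.group(2))
    if (m := re.fullmatch(r"(\w+)=(.+)", item)):   # B=YB+YC, composition
        return ("compose", m.group(1), [parse_part(p) for p in m.group(2).split("+")])
    raise ValueError(f"unrecognized @S item: {item!r}")
```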
7. Knowledge used when the word forms a semantic chunk, represented by @K. For non-v concepts, the collocation knowledge needed when the word forms a semantic chunk is entered. When building the database, to reflect pragmatic differences conveniently, the collocating Chinese words can be given directly after "|:", preceded by Q or H to mark a front or a back collocation respectively. For example, "signature" has {ug, H|: motion} in this item, meaning that when "signature" is used as a ug concept it often takes "motion" as a back collocation. For v concepts, this item also gives the verbs frequently used together with the word when it forms an E chunk. If a v concept shows separation when it forms an E chunk, this is also expressed in this item, in the same way as in item 6 (composition knowledge of semantic chunks and preferred concepts of each component).
When the word can form part of a semantic chunk, this is represented with FK. FK also uses the natural decomposition (B, C, Q, H) described in item 6, under the same convention; see "freedom" in Figure 4. Which part the word preferentially serves as is described in item 8.
8. Semantic role the word often serves, represented by @CA. When a word often appears in one or more sentence categories and often serves as a certain semantic chunk, the name of that chunk is entered in this item. For example, "clever" often serves as the SC chunk of a state sentence, so "SC" is entered. A v concept serving as the E chunk is a convention and is not entered here; but when a v concept forms only part of an E chunk, this must be entered explicitly. See "predation" in Figure 4.
9. Context knowledge, represented by @CT: the context knowledge the word itself provides, i.e. associative knowledge between sentences, entered using the names of auxiliary chunks and concept symbols. For example, the context knowledge of "earthquake" is that it causes a catastrophic result, so this item of the word is entered as Rt:r322.
Compared with the prior art, the present invention has the following advantages:
By simulating the human brain's mechanism for perceiving the deep semantic structure of natural language, the invention establishes the sentence class as the deep semantic structure of natural-language sentences, and builds its knowledge base and its sentence analysis and processing method around the sentence class, forming the sentence category analysis (SCA) technique. This technique tightly and organically combines the representation of concepts with the deep semantic structure of natural-language sentences, describes that deep semantic structure completely, and yields a natural language processing method centered on sentence category analysis. In addition, the invention processes natural language level by level, giving the computer a way to grasp deep semantic structure.
The result of the analysis serves, in machine translation, as the analysis of the source language; combined with target-language generation, it can form a machine translation system. For Chinese, where one syllable corresponds to many characters and one character may have several pronunciations, the processing steps above handle the "sound to character" and "character to sound" conversions fairly well.
The invention delimits the deep semantic structures of natural-language sentences and forms a complete system of them. It therefore largely solves the prior-art problems caused by incomplete deep semantic structures.
The knowledge base is centered on the representation of sentence-class knowledge and expresses semantics with concept-classification symbols; compared with using complex feature sets or expressing semantics directly in natural language, this is concise and efficient. Because the knowledge base is organized closely around the deep semantic structure of natural language and expresses that structure as codes, it significantly reduces storage requirements.
The above and other features and advantages of the present invention will become fully apparent from the following more detailed description of the preferred embodiments, as illustrated in the accompanying drawings.
Fig. 1 is a diagram illustrating the concept nodes of the present invention.
Fig. 2 is a diagram illustrating the sentence-class expressions of the present invention.
Fig. 3 is a diagram illustrating format conversion in the present invention.
Fig. 4 is a sample of filled-in knowledge-base entries of the present invention.
To convert pinyin into Chinese characters, one must first build the Chinese lexical knowledge base of the present invention described above (including monosyllabic words), and then write software that uses the knowledge base to process the input pinyin stream according to the processing method described above. For ease of explanation, the examples below focus on the words corresponding to the pinyin "wei ji": "microcomputer", "crisis", "jeopardize", and "great feats". The word "" is a designated input word and is entered as "l".
Embodiment 1: zi ran zai hai wei ji l nong ye sheng chan. (the input pinyin stream)
Nature* disaster crisis* agricultural production*
Wild
The Chinese words under the pinyin are the results of dictionary matching; * marks a position to which several words correspond, i.e. a fuzzy set. The corresponding alternatives are: nature {spontaneous combustion}, crisis {microcomputer, jeopardize, great feats}, production {abound with}. For convenience, only the concept classification of the relevant words, i.e. their semantics in the concept representation system of the present invention, is given here; the other knowledge-base fields are omitted. Multiple senses are separated by ";".
Nature rw508:ru307+ (g711; gva32); (u51; u65311; u65232)+ju600; jluv13c43
Spontaneous combustion v009+u305
Disaster r322
Crisis r53322
Microcomputer pw+jv30
Jeopardize v53322; v53322+v341
Great feats rc30al
Agricultural ga21
Wild u5508
Produce (va21; v660)+v3119
Abound with v311; rw311
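The dictionary-matching step can be sketched as building a word lattice: every span of syllables found in the lexicon is recorded, and a span with several candidates is a fuzzy set. The toy lexicon `LEXICON`, the function `match_lattice`, and the two-syllable span limit are hypothetical; the words are taken from the table above, and the designated-input marker "l" is simply skipped rather than matched.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: pinyin key -> candidate words (from Embodiment 1).
LEXICON: Dict[str, List[str]] = {
    "zi ran": ["nature", "spontaneous combustion"],
    "zai hai": ["disaster"],
    "wei ji": ["crisis", "microcomputer", "jeopardize", "great feats"],
    "nong ye": ["agricultural"],
    "ye sheng": ["wild"],
    "sheng chan": ["produce", "abound with"],
}
MAX_WORD_LEN = 2  # longest key in the toy lexicon, in syllables (assumption)


def match_lattice(stream: str) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, candidates) for every lexicon hit over the syllables."""
    syllables = [s for s in stream.rstrip(".").split() if s != "l"]
    hits: List[Tuple[int, int, List[str]]] = []
    for start in range(len(syllables)):
        last = min(start + MAX_WORD_LEN, len(syllables))
        for end in range(start + 1, last + 1):
            key = " ".join(syllables[start:end])
            if key in LEXICON:
                hits.append((start, end, LEXICON[key]))
    return hits


hits = match_lattice("zi ran zai hai wei ji l nong ye sheng chan.")
fuzzy = [(start, words) for start, _, words in hits if len(words) > 1]
```

On this input the lattice contains six spans, including the overlapping alternative "wild" (ye sheng), and the fuzzy sets start at syllables 0, 4 and 8, which correspond to the three positions marked * above. Choosing a path through the lattice is left to the later analysis steps.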
After software processing, the computer obtains the following result:
Sentence class XS*22; A: disaster; B: agricultural production; XS: jeopardize.
Finally, the computer outputs the pinyin-to-character conversion result: natural disasters have jeopardized agricultural production.
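Step (2) of the processing method in claim 3 forms E hypotheses from verbal v concepts. The sketch below applies only that criterion to the Embodiment 1 candidates, under the simplifying assumption that a sense counts as a v concept when its code starts with "v"; the class-10 chunk markers mentioned in step (2) are not modeled. The names `CODES`, `senses`, and `e_hypotheses` are hypothetical.

```python
from typing import Dict, List

# Concept codes copied from the Embodiment 1 table; ";" separates senses.
CODES: Dict[str, str] = {
    "nature": "rw508:ru307+ (g711; gva32); (u51; u65311; u65232)+ju600; jluv13c43",
    "spontaneous combustion": "v009+u305",
    "disaster": "r322",
    "crisis": "r53322",
    "microcomputer": "pw+jv30",
    "jeopardize": "v53322; v53322+v341",
    "great feats": "rc30al",
    "agricultural": "ga21",
    "wild": "u5508",
    "produce": "(va21; v660)+v3119",
    "abound with": "v311; rw311",
}


def senses(code: str) -> List[str]:
    """Split a code into senses, dropping spaces and a leading "("."""
    return [s.strip().lstrip("(").strip() for s in code.split(";")]


def e_hypotheses(codes: Dict[str, str]) -> List[str]:
    """Words with at least one sense whose code starts with "v" (assumed test)."""
    return [w for w, c in codes.items() if any(s.startswith("v") for s in senses(c))]


candidates = e_hypotheses(CODES)
```

This yields four E hypotheses: spontaneous combustion, jeopardize, produce, and abound with. The v-concept test alone therefore does not settle the sentence; the screening and ranking of step (4) and the checks that follow it are what select "jeopardize" as the XS chunk in the result above.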
Embodiment 2: wo guo bang zhu ya zhou guo jia du guo jing rong wei ji
China helps Asian countries spend* finance crisis*
weighs
The new fuzzy set is: spend {tide over}. Semantics:
China pj2+g4001-
Weighs jvz518
Help v9431
Asia fwj2
Country pj2
Spend v50010
Tide over v229
Finance ga24
After software processing, the computer obtains the following result:
Sentence class R311X*21; RB1: China; B: Asian countries spend (get through) the financial crisis (chunk expansion); RX: help.
Finally, the computer outputs the pinyin-to-character conversion result: China helps Asian countries get through ("spend") the financial crisis.
Embodiment 3: wo men xiu li l zhe tai wei ji.
New words encountered in the sentence:
We p4001-
Repair v65351a
Beautiful u51+j831
This (zhe tai) 1914005
After software processing, the computer obtains the following result:
Sentence class X; A: we; B: this microcomputer; X: repair.
Finally, the computer outputs the pinyin-to-character conversion result: we have repaired this microcomputer.
Embodiment 4: deng xiao ping tong zhi kai l l ge wan xiao
New words:
People p-+ga101
Folk song (pj01*+gc402)/gwa32
Eulogize (v7115,12, ra32u)
His 192+p4003-0+pj711
Her 192+p4003-0+pj712
Great achievement rc30a1+jzr41c44
Flatter (v7117u, v9711u)+j862
Surround and protect vc3219+jv4212
After software processing, the computer obtains the following result:
Sentence class X20; X2B: the people; XBC: his great achievements; X2: eulogize.
Finally, the computer outputs the pinyin-to-character conversion result: the people eulogize his (her) great achievements. ("His" and "her" cannot be distinguished in this example, because both pronouns are pronounced "ta".)

Claims (3)

1. A method for analyzing natural-language sentences by simulating the brain's language perception, characterized in that: the method comprises a sentence category analysis (SCA) method and a word-level knowledge base built on a hierarchical network of concepts, wherein the SCA method, based on synthesis, induction and deduction over natural-language sentences, divides sentences into 7 basic sentence classes and 57 subclasses; for each basic sentence class and each of its subclasses, it gives a corresponding sentence expression that uses semantic-chunk expressions as semantic primitives; these expressions have 4 basic formats: standard, canonical, irregular, and elliptical (omission); and each basic format in turn has corresponding variant formats that can be exhaustively enumerated mathematically.
2. The method according to claim 1, characterized in that the word-level knowledge base of the hierarchical network of concepts expresses the following:
(1) the knowledge base gives, in the form of sentence-class codes, the sentence classes to which a word belongs;
(2) the various actual arrangement orders of semantic chunks when a word forms a sentence are expressed as format codes;
(3) it gives the composition knowledge of semantic chunks, and the preferred concepts for each part of a semantic chunk, when a word forms a sentence;
(4) it gives the separation and conversion knowledge of semantic chunks when a word forms a semantic chunk;
(5) it gives knowledge of the semantic roles a word serves when forming a sentence;
(6) it gives the context knowledge triggered by a word;
(7) it gives the sentence-class conversion knowledge triggered by a word;
(8) it gives the knowledge for expanding certain semantic chunks triggered by a word into sentences.
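Claim 1 states that the variant formats of a sentence expression can be enumerated exhaustively, and item (2) of claim 2 expresses the arrangement orders of semantic chunks as format codes. A minimal sketch, assuming that the variants are plain permutations of the chunks and that a format code is simply the enumeration index, is shown below using the chunks of Embodiment 3 (A: we; X: repair; B: this microcomputer). The numbering scheme and the names `format_codes` and `render` are hypothetical, not the patent's actual codes.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def format_codes(chunks: List[str]) -> Dict[int, Tuple[str, ...]]:
    """Assign a hypothetical numeric format code to every arrangement order of the chunks."""
    return {code: order for code, order in enumerate(permutations(chunks))}


def render(order: Tuple[str, ...], fillers: Dict[str, str]) -> str:
    """Lay out the chunk fillers in the given order."""
    return " ".join(fillers[chunk] for chunk in order)


codes = format_codes(["A", "X", "B"])
fillers = {"A": "we", "X": "repair", "B": "this microcomputer"}
```

Three chunks give 3! = 6 orders, and code 0 is the order A X B, which renders as "we repair this microcomputer". Plain permutations overgenerate (many orders are not acceptable Chinese), and they do not model the elliptical format, where a chunk is omitted; the sketch only shows why the set of formats is finite and enumerable.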
3. The method according to claims 1 and 2, characterized in that the specific processing steps of sentence category analysis are as follows:
(1) for an input sentence, perform dictionary matching, segment out the words occurring in the sentence, and obtain the semantic knowledge of these words from the knowledge base;
(2) as indicated by the concept-classification information, form semantic-chunk slots on the basis of the class-10 semantic-chunk distinguishing markers and the verbal v concepts, and form E hypotheses;
(3) if no E hypothesis can be formed, go to (9); otherwise continue;
(4) screen and rank all E hypotheses, mainly using sentence-class codes, format codes, word frequency, and context knowledge;
(5) following the ranking of the selected E hypotheses, perform sentence-class checks in turn, mainly using the concept-preference knowledge of the semantic-chunk cores; if all checks fail, go to (11); otherwise continue;
(6) perform the semantic-chunk composition check, mainly using semantic-chunk composition knowledge and the knowledge of the preferred concepts for each part of a semantic chunk; if all checks fail, go to (11); otherwise continue;
(7) where necessary, perform the sentence-class transformation check, mainly using the sentence-class conversion knowledge triggered by words; if all checks fail, go to (11); otherwise go to (12);
(8) where necessary, perform the semantic-chunk separation check, mainly using semantic-chunk separation and conversion knowledge; if all checks fail, go to (11); otherwise go to (10);
(9) perform the check for sentence classes without an E semantic chunk; if it fails, continue; otherwise go to (12);
(10) recast the E hypothesis; if this succeeds, go to (4); otherwise go to (11);
(11) man-machine interaction;
(12) collect context material; processing ends.
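The control flow of claim 3 can be read as a goto table. The sketch below encodes the transitions as written, with the outcome of each step supplied by a caller-provided function, so the linguistic checks themselves are left out. Some parts are assumptions: no transition in the claim leads to step (8), so the sketch lets step (7) hand over to step (8) when the transformation check is not needed ("skip"), and lets step (8) go to step (12) when the separation check is not needed; step (11) is assumed to fall through to step (12); and a step limit guards the recast loop (10) to (4). The names `run_sca`, `TRANSITIONS`, and `SKIPPABLE` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

END = 0  # sentinel: processing has finished

# Step -> (next step if the step succeeds, next step if it fails), as written in claim 3.
TRANSITIONS: Dict[int, Tuple[int, int]] = {
    1: (2, 2),       # dictionary matching; fetch lexical knowledge
    2: (3, 3),       # build semantic-chunk slots and E hypotheses
    3: (4, 9),       # was an E hypothesis formed? if not, go to (9)
    4: (5, 5),       # screen and rank the E hypotheses
    5: (6, 11),      # sentence-class check
    6: (7, 11),      # semantic-chunk composition check
    7: (12, 11),     # sentence-class transformation check
    8: (10, 11),     # semantic-chunk separation check
    9: (12, 10),     # check for sentence classes without an E chunk
    10: (4, 11),     # recast the E hypothesis
    11: (12, 12),    # man-machine interaction (assumed to fall through to 12)
    12: (END, END),  # collect context material
}

# Assumption: where a "where necessary" check is not needed, jump here instead.
SKIPPABLE: Dict[int, int] = {7: 8, 8: 12}


def run_sca(outcome: Callable[[int], str], max_steps: int = 100) -> List[int]:
    """Walk the steps; outcome(step) returns "ok", "fail", or "skip". Returns the visited steps."""
    trace: List[int] = []
    step = 1
    while step != END:
        trace.append(step)
        if len(trace) > max_steps:
            raise RuntimeError("step limit exceeded; the recast loop did not terminate")
        result = outcome(step)
        if result == "skip" and step in SKIPPABLE:
            step = SKIPPABLE[step]
        else:
            on_ok, on_fail = TRANSITIONS[step]
            step = on_ok if result == "ok" else on_fail
    return trace
```

For example, `run_sca(lambda step: "ok")` visits steps 1, 2, 3, 4, 5, 6, 7 and 12, while a sentence for which no E hypothesis can be formed, but which passes the check in step (9), visits 1, 2, 3, 9 and 12. Keeping the table separate from the checks makes it easy to compare against the claim line by line.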
CNB981019218A 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process Expired - Fee Related CN1141660C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB981019218A CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB981019218A CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Publications (2)

Publication Number Publication Date
CN1236138A true CN1236138A (en) 1999-11-24
CN1141660C CN1141660C (en) 2004-03-10

Family

ID=5217018

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB981019218A Expired - Fee Related CN1141660C (en) 1998-05-18 1998-05-18 Natural language statement analyzing method simulating brain's language sensing process

Country Status (1)

Country Link
CN (1) CN1141660C (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310171C (en) * 2004-09-29 2007-04-11 上海交通大学 Method for semantic analyzer bead on grammar model
CN1838159B (en) * 2006-02-14 2010-08-11 北京未名博思生物智能科技开发有限公司 Cognition logic machine and its information processing method
CN107422691A (en) * 2017-08-11 2017-12-01 山东省计算中心(国家超级计算济南中心) One kind collaboration PLC programming language building methods
CN107924679A (en) * 2015-07-13 2018-04-17 微软技术许可有限责任公司 Delayed binding during inputting understanding processing in response selects

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310171C (en) * 2004-09-29 2007-04-11 上海交通大学 Method for semantic analyzer bead on grammar model
CN1838159B (en) * 2006-02-14 2010-08-11 北京未名博思生物智能科技开发有限公司 Cognition logic machine and its information processing method
CN107924679A (en) * 2015-07-13 2018-04-17 微软技术许可有限责任公司 Delayed binding during inputting understanding processing in response selects
CN107924679B (en) * 2015-07-13 2021-11-05 微软技术许可有限责任公司 Computer-implemented method, input understanding system and computer-readable storage device
CN107422691A (en) * 2017-08-11 2017-12-01 山东省计算中心(国家超级计算济南中心) One kind collaboration PLC programming language building methods

Also Published As

Publication number Publication date
CN1141660C (en) 2004-03-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20040310

Termination date: 20100518