CN100498932C - Universal Chinese dialogue generating method using two-stage compound template - Google Patents

Universal Chinese dialogue generating method using two-stage compound template

Info

Publication number
CN100498932C
Authority
CN
China
Prior art keywords
sentence
template
phrase
groove
speech
Legal status
Expired - Fee Related
Application number
CNB031570046A
Other languages
Chinese (zh)
Other versions
CN1595496A (en)
Inventor
Du Limin (杜利民)
Yu Shuiyuan (于水源)
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Application filed by Institute of Acoustics CAS
Priority to CNB031570046A
Publication of CN1595496A
Application granted
Publication of CN100498932C
Anticipated expiration
Expired - Fee Related (current legal status)

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a general two-stage hybrid-template method for generating Chinese spoken-dialogue language. It belongs to natural language generation in artificial intelligence and, in particular, generates Chinese with spoken-language characteristics from an internal representation of the utterance. The method decomposes a Chinese sentence into two levels, sentence and phrase, and generates each level with a different kind of template: phrases are generated from phrase templates, and the phrases are then combined into a sentence as required by the sentence template. The input, a speech-act expression, supplies the information needed to generate the various sentence types. Contextual information is extracted to assist generation, so the generated sentences fit the dialogue context better. The algorithm is task-independent and therefore easy to port.

Description

General two-stage hybrid-template method for generating Chinese spoken-dialogue language
Technical field
The invention relates to natural language generation in artificial intelligence, and in particular to generating Chinese with spoken-language characteristics from an internal representation of the utterance.
Background art
A spoken-dialogue language generation method is a kind of computer software: a component of a task-oriented dialogue system that generates natural language suited to spoken dialogue from an internal representation of the utterance.
In summary, existing natural language generation methods fall into four classes:
1. Canned text: a canned text is a predefined string, written in advance at system design time. The system stores a set of strings together with the trigger condition of each; when a condition is triggered, the corresponding string is displayed. Such strings are static and are presented to the user unchanged.
2. Template-based methods: a template is a predefined framework that the user or the application fills with information at run time. A template has two main parts: template slots and pattern rules. Template slots are parameters or variables to which the user can assign values; pattern rules express how a surface element is realized.
3. Phrase-based methods: a phrase is a word or group of words that builds a clause or sentence in natural language. Phrase-based methods define a set of general templates expressing the various phrases of the language, such as noun phrases (NP) and verb phrases (VP). These general templates (phrases) are related by a set of production rules; a production rule is a constraint specifying how a phrase is replaced by a word or another phrase. The approach is based on phrase structure grammar, which describes how phrases are composed from smaller phrases and how phrases combine into sentences.
4. Feature-based methods: features represent properties of natural language, and each feature has a limited set of possible values; for example, number can be singular or plural. The value of a feature describes the form of a word, clause, or sentence structure; for example, when the subject of a sentence is a singular noun, the subject itself is not inflected but the verb changes. In a feature-based realization system every grammatical property, such as tense, number, and person, is expressed by a feature, and generation consists of collecting a suitable feature set for each input in turn.
From the viewpoint of sentence generation, these four classes can be grouped into two: template-based methods (canned text and templates) and generation-based methods (phrase-based and feature-based). The former fill prefabricated templates, whereas the latter generate from linguistic rules.
In terms of technique, the former are non-linguistic and the latter linguistic. Non-linguistic methods deal only with the surface of the sentence, whereas linguistic methods exploit its linguistic properties, such as tense, number, and subject-predicate agreement. Linguistic methods produce more flexible sentences, but Chinese is not an inflected language and lacks morphological variation and surface constraints on sentence form, so linguistic methods are clearly ill-suited to Chinese generation. Non-linguistic methods, on the other hand, produce inflexible sentences that hardly reflect the characteristics of spoken language, and their templates are laborious to maintain.
Summary of the invention
The object of the invention is to provide a general two-stage hybrid-template method for generating Chinese spoken-dialogue language that uses highly expressive speech-act expressions, fits the discourse context better, keeps the algorithm independent of the task, and is easy to port.
To achieve this object, the technical solution of the invention is a general two-stage hybrid-template method for generating Chinese spoken-dialogue language in which a Chinese sentence is decomposed into two levels, sentence and phrase, and each level is generated with a different kind of template: phrases are generated from phrase templates, and the phrases are then combined into a sentence as required by the sentence template.
The method comprises the following steps:
1. Design phrase templates for generating phrases whose structure is fixed;
2. Design sentence templates that express the basic word order;
3. Using the phrases as components, fill the sentence template to generate a sentence;
4. Design sentence-template transformation rules for generating different sentence patterns.
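The two levels can be illustrated with a minimal Python sketch. It assumes phrase templates are strings whose only slot is the entity, and a sentence template is an ordered list of (semantic role, phrase template) pairs with the main verb stored as ("Verb", verb). The template names P_Train, P_Time, P_Source, the roles, and the train-timetable sentence are hypothetical illustrations, not taken from the patent.

```python
# Hypothetical phrase templates: the entity is the slot {entity}; other words are literals.
PHRASE_TEMPLATES = {
    "P_Train": "{entity}次",   # 次 marks a train number
    "P_Time": "{entity}",
    "P_Source": "从{entity}",  # 从 = "from"
}

# Hypothetical sentence template in basic word order: (semantic role, phrase template),
# with the main verb stored as ("Verb", <verb>).
S_DEPART = [("Agent", "P_Train"), ("Time", "P_Time"),
            ("Source", "P_Source"), ("Verb", "出发")]


def fill_phrase(name, entity):
    """Level 1: realize one phrase by inserting the entity into its phrase template."""
    return PHRASE_TEMPLATES[name].format(entity=entity)


def generate(sentence_template, values):
    """Level 2: combine the phrases in the order the sentence template prescribes."""
    parts = []
    for role, spec in sentence_template:
        if role == "Verb":
            parts.append(spec)  # the main verb is emitted as written
        else:
            parts.append(fill_phrase(spec, values[role]))
    return "".join(parts)  # Chinese text is written without spaces


print(generate(S_DEPART, {"Agent": "T21", "Time": "8点", "Source": "北京"}))
# T21次8点从北京出发
```

Keeping phrase templates separate from the sentence template means a phrase such as 从北京 can be reused by any sentence template that has a Source slot.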
In the described method, all templates are derived from a real corpus. A template consists of a template name and several semantic slots; each slot is a semantic role enclosed in "()" and consists of the slot's semantic-role name and the name of the phrase template that realizes that role. The main verb is the exception: its slot consists of the verb symbol and the verb itself. Because the templates are closely tied to semantics, they suit the characteristics of Chinese.
In the described method, the templates further include two classes: static templates and canned text.
Static templates are used for sentences that do not need to be generated by combining a sentence template with phrase templates: their entities do not take part in logical reasoning, their content is simple, and they are used very rarely.
Canned text covers sentences in a dialogue that cannot be decomposed, i.e. whose meaning is not a function of the meanings and structure of their parts. Such sentences are mostly phatic expressions such as greetings; they do not obey Frege's principle of compositionality and involve no entities or predicates, so they are defined directly as canned text and output verbatim when needed.
A phrase template is a basic, fixed template with a specific semantic meaning. A separate template is built for each phrase: the entity becomes the slot, and all remaining words are embedded in the template as literals.
A sentence template is obtained by dividing a basic sentence into phrases centred on its entities and naming each phrase; the main verb of the sentence is excluded from this division. Each slot also indicates its semantic role.
The input data structure of the algorithm is a CSL speech-act expression.
The algorithm proceeds as follows:
Step 1, template selection: look up the template mapping table by the name of the sentence's predicate and determine the type of the template found. If it is canned text, call that template, obtain the returned data, output it, and stop; otherwise, obtain the template.
Step 2, determining the sentence elements to generate: obtaining the sentence template yields the slots, each representing a specific semantic meaning, that must be filled in the sentence.
Step 3, assigning a fill value to each slot: once the sentence elements to generate are known, i.e. once it is known which slots must be filled, each slot is assigned its value. Because the semantic items in the predicate expression correspond to those in the template, they only need to be matched one to one.
Step 4, pre-generation of sentence elements: the work falls into two classes. The first uses other generation processes to produce phrases that replace semantic slots in this template; the second fills the slots of this template.
The first class covers descriptions, collective terms, and pronouns. A description is also generated from a template: the template mapping table is accessed with the predicate symbol of the description's primitive formula, and the description is generated according to its template transformation rules. A collective term is generated with a dedicated template that concatenates the words and outputs them. A pronoun is generated according to the pronoun-use conditions: when they are satisfied, the pronoun "it" replaces the entity word.
Step 5, phrase generation: the second class of sentence elements from Step 4 are the semantic slots of this template. Each phrase template is filled with its assigned information, and the result is returned to this template.
Step 6, generation of the different sentence patterns: the type of sentence to generate is selected from the sentence-pattern mapping table according to the relation among the three parts C, S, and L of the speech-act expression. The types include:
a. Interrogative sentences: (1) Alternative questions: the slot of the query item is generated once for each option in turn, the results are joined with "还是" ("or"), and the final result is output. (2) Yes-no questions: if the focus of the question is on the predicate, "?" is appended at the end of the sentence; if it is not on the predicate, "是" is added before the query item and "?" is appended at the end; the final result is output. (3) Wh-questions: according to the position of the question focus in the predicate formula, the predicate definition is accessed to obtain the interrogative word from the set definition of the argument at that position; the basic sentence template mapping table is accessed by the predicate name to obtain the corresponding basic template; the other parts of the template are generated as for a basic sentence, the slot at the focus is filled with the interrogative obtained above, and the result is output.
b. Negative sentences: if the first word of the phrase corresponding to the negated item is a pronoun, noun, numeral, or measure word, the negative is formed by adding "不是" ("is not") before that phrase; otherwise "不" ("not") is added. If the negation is on the predicate, the negator "不" is added directly before the main verb of the generated basic sentence.
c. Elliptical sentences: the original predicate has a template that can be used for generation. A proposition in which some elements are omitted is still generated with that original template, in the same way as when nothing is omitted, except that the slots of the omitted items are left unfilled; when the result is output, the items before and after each omitted slot are joined directly.
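Rules b and c can be sketched in Python as follows. The sketch assumes the basic sentence has already been generated as a list of (semantic role, phrase text, part of speech of the phrase's first word) triples; the roles, texts, and part-of-speech labels are hypothetical. Predicate negation needs no special case here because the main verb is not nominal and therefore receives 不.

```python
# Parts of speech for which the negator is 不是 rather than 不 (rule b).
NOMINAL = {"pronoun", "noun", "numeral", "measure"}

# A generated basic sentence as (semantic role, phrase text, POS of the phrase's first word).
# The roles, texts, and POS labels are hypothetical illustrations.
SENTENCE = [("Agent", "T21次", "noun"), ("Time", "8点", "numeral"),
            ("Source", "从北京", "preposition"), ("Verb", "出发", "verb")]


def negate(phrases, target):
    """Rule b: prefix the negated phrase with 不是 if it starts with a pronoun, noun,
    numeral or measure word, else with 不 (this also covers the main verb)."""
    out = []
    for role, text, pos in phrases:
        if role == target:
            out.append(("不是" if pos in NOMINAL else "不") + text)
        else:
            out.append(text)
    return "".join(out)


def elide(phrases, omitted):
    """Rule c: omitted slots stay unfilled, so their neighbours are joined directly."""
    return "".join(text for role, text, _ in phrases if role not in omitted)


print(negate(SENTENCE, "Time"))   # T21次不是8点从北京出发
print(negate(SENTENCE, "Verb"))   # T21次8点从北京不出发
print(elide(SENTENCE, {"Time"}))  # T21次从北京出发
```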
In Step 1, the key used to access the template mapping table is the relation predicate symbol.
In Step 2, not all sentence elements always need to be generated. For example, when a short answer is needed, only the item corresponding to the query item and the main verb are generated. Nor is every element generated from its own template: when a pronoun is needed, the element it replaces is not generated, and a pronoun simply stands in for a noun item in the sentence. A decision must therefore be made before the sentence is generated.
In Step 3, omitted items also occupy a position.
In Step 6, the other parts of the sentence template are generated by the basic-sentence generation process.
The input of the algorithm is a speech-act expression, which is highly expressive and provides the information needed to generate the various sentence types.
The algorithm extracts contextual information to assist generation, so that the generated sentences fit the dialogue context better.
The algorithm follows this principle: only general regularities of the language are written into the algorithm, while domain knowledge and task definitions are all written into configuration files.
The algorithm is task-independent and therefore easy to port.
The algorithm is extensible.
The extensions are as follows: the sentence types, the phrase types, and the sentence elements generated by the algorithm can all be extended.
The spoken-dialogue language generation method of the invention uses speech-act expressions, which are highly expressive and provide the information needed to generate the various sentence types. Contextual information is extracted to assist generation, so the generated sentences fit the dialogue context better. The algorithm is task-independent and therefore easy to port.
Embodiment
Frege's principle of compositionality states: "The meaning of a whole sentence is a function of the meanings of its parts and of the way they are combined."
The order of constituents in Chinese sentences is flexible, especially in spoken language. These word-order variations are not arbitrary, however; they serve particular pragmatic purposes.
One class of word orders is the most basic: these orders can be regarded as context-neutral and are called canonical word orders. They present the most common, most basic structural pattern, which carries no formal marker of any special pragmatic meaning. All other word orders are transformations of the canonical order made for specific pragmatic purposes.
Chinese sentence patterns are related by simple transformations and can all be derived from the basic declarative sentence:
Basic declarative → negative sentence: add a negator before the negated item.
Basic declarative → alternative question: repeat the query item with each option and append "?".
Basic declarative → yes-no question: repeat the focus in affirmative-negative form, or add "是"; append "?".
Basic declarative → wh-question: replace the query item with an interrogative pronoun and append "?".
Basic declarative → elliptical sentence: join the elements before and after the omitted item; everything else is unchanged.
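The wh-question transformation in the list above can be sketched in Python. The sketch assumes the interrogative replaces the entity inside its phrase template (so 从北京 becomes 从哪里) and uses a hypothetical role-to-interrogative table; the template names, roles, and interrogative choices are illustrations, not the patent's tables.

```python
PHRASE_TEMPLATES = {"P_Train": "{entity}次", "P_Time": "{entity}", "P_Source": "从{entity}"}
S_DEPART = [("Agent", "P_Train"), ("Time", "P_Time"), ("Source", "P_Source"), ("Verb", "出发")]

# Hypothetical interrogative pronoun per semantic role.
INTERROGATIVES = {"Time": "几点", "Source": "哪里"}


def realize(template, values):
    """Generate the basic declarative sentence from phrase-filled slots."""
    return "".join(spec if role == "Verb"
                   else PHRASE_TEMPLATES[spec].format(entity=values[role])
                   for role, spec in template)


def to_wh_question(template, values, query_role):
    """Basic declarative -> wh-question: the query item becomes an interrogative, then '?'."""
    asked = {**values, query_role: INTERROGATIVES[query_role]}
    return realize(template, asked) + "?"


values = {"Agent": "T21", "Time": "8点", "Source": "北京"}
print(to_wh_question(S_DEPART, values, "Time"))    # T21次几点从北京出发?
print(to_wh_question(S_DEPART, values, "Source"))  # T21次8点从哪里出发?
```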
Phrases have a special grammatical status in Chinese: the principles by which Chinese phrases are constructed are essentially the same as those for sentences. Under the phrase-centred view of Chinese grammar, a Chinese sentence is not built directly from words; words first form phrases, and phrases are then realized as sentences. Chinese phrases have strict, fixed structures, and when the structure changes, the meaning of the phrase changes with it.
Based on these characteristics of Chinese word order, sentence patterns, and phrases, the following generation method was designed:
Design phrase templates for generating phrases whose structure is fixed.
Design sentence templates that express the basic word order.
Using the phrases as components, fill the sentence templates to generate sentences.
Design sentence-template transformation rules for generating different sentence patterns.
This is the hybrid template method. In light of the findings of Chinese linguistics, the hybrid template method is not only a sentence generation method but also a particular way of viewing Chinese sentences, that is, a method for analysing them.
All templates are derived from a real corpus. At present they are extracted manually, as follows:
Cluster the sentences by the meaning of the verb. Each cluster represents one meaning and has a corresponding predicate, which comes from the categorization process of the task model.
Within each cluster, select as the basic sentence an affirmative declarative sentence that involves the most entities and has no special pragmatic purpose. If no sentence satisfies these conditions, one can be obtained by transformation according to the sentence-pattern transformation rules.
Divide this basic sentence into phrases centred on the entities and give each phrase a name; the main verb of the sentence is excluded. The result is the sentence template.
Build a separate template for each phrase: the entity becomes the slot, and all remaining words are embedded in the template as literals. This is the phrase template.
A template consists of a template name and several semantic slots, each of which is a semantic role enclosed in "()". Each slot consists of the slot's semantic-role name and the name of the phrase template that realizes that role; the exception is the main verb, whose slot consists of the verb symbol Verb and the verb itself.
The main difference between this sentence template and conventional templates is that each slot also indicates its semantic role.
A phrase template realizes a phrase and carries a specific semantic meaning. Phrase templates are basic templates and are fixed.
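The slot notation described above can be read with a small parser. The patent does not give a concrete textual syntax, so this Python sketch assumes one: "Name: (Role PhraseTemplate)...(Verb verb)", with a space between the role name and the phrase-template name. The template S_Depart and its slots are hypothetical.

```python
import re

# One slot: "(" role, whitespace, phrase-template name or verb ")".  The syntax is an assumption.
SLOT = re.compile(r"\((\w+)\s+([^()\s]+)\)")


def parse_template(line):
    """Split 'Name: (Role P_Phrase)...(Verb verb)' into the template name and its slots."""
    name, body = line.split(":", 1)
    slots = SLOT.findall(body)  # list of (role, phrase template or verb) tuples
    if not slots:
        raise ValueError("template has no slots: " + line)
    return name.strip(), slots


name, slots = parse_template("S_Depart: (Agent P_Train)(Time P_Time)(Source P_Source)(Verb 出发)")
print(name, slots)
```

Because each slot carries its semantic role, the generator can match slots to the arguments of a predicate expression by role rather than by surface position.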
For implementation reasons, the method also defines the following two classes of templates:
1. Static templates: sentences whose content is simple, that are used very rarely, and whose entities do not take part in logical reasoning do not need to be generated by combining a sentence template with phrase templates. Static templates were designed for them.
2. Canned text: some sentences in a dialogue cannot be decomposed; their meaning is not a function of the meanings and structure of their parts. Such sentences are mostly phatic expressions, such as greetings, and do not obey Frege's principle of compositionality. Because they involve no entities or predicates, the method defines them directly as canned text and outputs them verbatim when needed.
Algorithm description
The input data structure of the algorithm is a CSL speech-act expression. The algorithm proceeds as follows:
Step 1: template selection
The first step of the algorithm is to obtain the template used to generate the sentence. The template mapping table is accessed by the name of the predicate, and the type of the template found is determined. If it is canned text, that template is called, its data is output, and the algorithm stops; otherwise the template is obtained. The key for accessing the template mapping table is the relation predicate symbol. The template mapping table is one of the data tables and is built in advance; its structure is very simple, consisting of predicate names and template names.
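Step 1 can be sketched in Python as two dictionary lookups: predicate to template name, then template name to a typed template. The predicate symbols Greet and Depart, the template names, and the greeting text are hypothetical; only the mapping-table idea and the canned-text early exit come from the description.

```python
# Hypothetical template mapping table: relation predicate symbol -> template name.
TEMPLATE_MAP = {"Greet": "T_Greet", "Depart": "S_Depart"}

# Hypothetical template store: each template is tagged with its type.
TEMPLATES = {
    "T_Greet": ("canned", "您好！"),
    "S_Depart": ("sentence", [("Agent", "P_Train"), ("Time", "P_Time"),
                              ("Source", "P_Source"), ("Verb", "出发")]),
}


def select_template(predicate):
    """Step 1: look up the template by predicate; canned text ends the algorithm at once."""
    kind, body = TEMPLATES[TEMPLATE_MAP[predicate]]
    if kind == "canned":
        return "output", body  # output directly; the algorithm stops here
    return "template", body    # continue with Step 2 using this sentence template


print(select_template("Greet"))
```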
Step 2: determining the sentence elements to generate
Obtaining the sentence template yields the slots, each representing a specific semantic meaning, that must be filled in the sentence. Sometimes, however, not all of these elements need to be generated. When a short answer is needed, for instance, only the item corresponding to the query item and the main verb are generated, and the rest is not. Nor is every element generated from its own template: when a pronoun is needed, the element replaced by the pronoun is not generated, and a pronoun simply stands in for a noun item in the sentence. A decision must therefore be made before the sentence is generated.
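The decision made in Step 2 can be sketched in Python as a pass over the template slots that returns a generation plan. The marker string "PRONOUN", the role names, and the keyword parameters are hypothetical; the two rules (short answers keep only the query item and the main verb, and a pronoun-substituted slot is not generated from its phrase template) follow the paragraph above.

```python
S_DEPART = [("Agent", "P_Train"), ("Time", "P_Time"), ("Source", "P_Source"), ("Verb", "出发")]

PRONOUN = "PRONOUN"  # hypothetical marker: fill this slot with a pronoun, not its phrase template


def plan_slots(slots, short_answer=False, query_role=None, pronoun_role=None):
    """Step 2: decide which slots are generated, and which are replaced by a pronoun."""
    plan = []
    for role, spec in slots:
        if short_answer and role not in (query_role, "Verb"):
            continue  # short answer: keep only the query item and the main verb
        plan.append((role, PRONOUN if role == pronoun_role else spec))
    return plan


print(plan_slots(S_DEPART, short_answer=True, query_role="Time"))
```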
Generation of short answers
A sentence that answers an interrogative sentence is called an answer. Because the question has already mentioned most of the entities involved, and the question and the answer share the same context, the answer usually need not repeat every element; listing only the main information items is enough, and this is called a short answer. An answer can therefore be very simple: the simplest form, containing only the item that corresponds to the query item, is called the minimal form. To observe the politeness principle, however, answers usually repeat a few other sentence elements as well. For a natural-language dialogue system, this method does not use minimal-form short answers but lists some of the other sentence elements. This overcomes two shortcomings. First, a short answer containing only the queried information sounds insufficiently polite to the hearer. Second, after speech synthesis a minimal-form sentence has few words and the resulting speech is short, so the hearer easily misses it: the utterance ends before it is caught, the hearer asks for a repetition, and the number of dialogue turns increases.
Conditions for using a short answer:
The speech-act expression of the previous sentence is a Direct_Question and the current sentence is of the Represent class, i.e. the previous speech act is a question and the current one is a statement; and the predicate expression of the interrogative sentence is the same as that of the current sentence. Specifically, for alternative and yes-no questions, a short answer may be used if the expression of the answer is one of the expressions in the question. For wh-questions, a short answer is used if the semantic formulas of the answer and the question are identical except at the query point, where the question has a question mark.
Generation process for short answers
Interrogative sentences fall into three classes, and short answers are correspondingly divided into three classes.
Alternative questions: the question offers two or more options and the answer chooses among them, so the short answer can contain only the chosen option.
Yes-no questions can be regarded as a special kind of alternative question whose two options are mutually negating propositions; the answer selects one of them. The short answer can be affirmative or negative: the affirmative short answer is "yes" and the negative one is "no".
Wh-questions: no options are given, and the information must be supplied by the answer. The interrogative word used in the question is provided by a mapping table in the system. To access it, the variable name of the query item is obtained from the position of the query point, and the interrogative mapping table is looked up by that variable name.
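The short-answer conditions and the three answer classes can be sketched in Python. The sketch assumes semantic formulas are tuples (predicate, arg1, ...), that an alternative or yes-no question is represented as the list of its candidate formulas, and that a wh-question marks its query point with "?". Following Step 2, the non-yes-no short answer is the query-item phrase plus the main verb rather than the minimal form. The Chinese strings chosen for "yes" and "no" are assumptions.

```python
# Assumed Chinese realizations of the affirmative and negative short answers.
YES, NO = "是的", "不是"


def short_answer_allowed(prev_act, act, qtype, question, answer):
    """Check the short-answer conditions. Formulas are tuples (predicate, arg1, ...)."""
    if prev_act != "Direct_Question" or act != "Represent":
        return False
    if qtype in ("alternative", "yes_no"):
        return answer in question  # question: list of candidate formulas
    # wh-question: formulas agree everywhere except one argument, which is "?" in the question
    if len(question) != len(answer):
        return False
    diffs = [q for q, a in zip(question, answer) if q != a]
    return diffs == ["?"]


def short_answer(qtype, phrase="", verb="", affirmative=True):
    """Generate the short answer: yes/no, or the query-item phrase plus the main verb."""
    if qtype == "yes_no":
        return YES if affirmative else NO
    return phrase + verb


q = ("Depart", "T21", "?", "北京")
a = ("Depart", "T21", "8点", "北京")
if short_answer_allowed("Direct_Question", "Represent", "wh", q, a):
    print(short_answer("wh", "8点", "出发"))  # 8点出发
```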
Use of pronouns
A pronoun is a word that replaces a noun, or a phrase or clause functioning as a noun. Pronouns are used in the generation system to make sentences more natural and closer to human language, so that they sound comfortable to the hearer, and also to make the generated language more concise and to highlight new information.
Conditions for using a pronoun
A pronoun may be used if the previous sentence contains only one entity and it is the same as the entity being talked about in the current sentence (the other party, oneself).
Once the pronoun conditions are met, "it" replaces the entity slot in this template.
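A minimal Python sketch of the pronoun condition and substitution. It assumes the entity slot is the hypothetical Agent role and that "it" is realized as 它; the whole slot is replaced, so its phrase template is never filled (otherwise P_Train would wrongly yield 它次). Template names and roles are illustrations.

```python
PHRASE_TEMPLATES = {"P_Train": "{entity}次", "P_Time": "{entity}", "P_Source": "从{entity}"}
S_DEPART = [("Agent", "P_Train"), ("Time", "P_Time"), ("Source", "P_Source"), ("Verb", "出发")]
PRONOUN = "它"  # Chinese "it"; the exact character is an assumption


def can_use_pronoun(prev_entities, entity):
    """Condition: the previous sentence has exactly one entity, and it is the current one."""
    return len(prev_entities) == 1 and prev_entities[0] == entity


def realize(template, values, prev_entities, entity_role="Agent"):
    """Generate the sentence, substituting the pronoun for the entity slot when allowed."""
    use_pronoun = can_use_pronoun(prev_entities, values[entity_role])
    parts = []
    for role, spec in template:
        if role == "Verb":
            parts.append(spec)
        elif role == entity_role and use_pronoun:
            parts.append(PRONOUN)  # whole slot replaced; its phrase template is never filled
        else:
            parts.append(PHRASE_TEMPLATES[spec].format(entity=values[role]))
    return "".join(parts)


values = {"Agent": "T21", "Time": "8点", "Source": "北京"}
print(realize(S_DEPART, values, ["T21"]))         # 它8点从北京出发
print(realize(S_DEPART, values, ["T21", "K15"]))  # T21次8点从北京出发
```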
Step 3: assigning the fill value of each slot
Once the sentence elements to be generated have been determined, i.e. once it is known which slots must be filled, fill information is assigned to those slots. This is very simple: because the semantic items in the relation predicate expression correspond to those in the template, they only need to be matched one to one. Note, however, that omitted items also occupy a position.
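Step 3 amounts to zipping the predicate's arguments with the template's non-verb slots. This Python sketch assumes positional arguments and represents an omitted item as None; the template and roles are hypothetical.

```python
S_DEPART = [("Agent", "P_Train"), ("Time", "P_Time"), ("Source", "P_Source"), ("Verb", "出发")]


def assign(slots, args):
    """Step 3: pair the predicate's semantic items with the template's non-verb slots in order.
    An omitted item is passed as None so that it still occupies its position."""
    roles = [role for role, _ in slots if role != "Verb"]
    if len(roles) != len(args):
        raise ValueError("predicate arity does not match the template")
    return dict(zip(roles, args))


print(assign(S_DEPART, ["T21", None, "北京"]))
```

The placeholder is what keeps the one-to-one correspondence intact: if the omitted time were simply dropped, 北京 would shift into the Time slot (here the arity check rejects that case instead).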
Step 4: pre-generation of sentence elements
Next, the sentence elements are generated. The work at this stage falls into two classes: the first uses other generation processes to produce phrases that replace semantic slots in this template; the second fills the slots of this template.
The first class uses other generation processes to produce phrases that replace semantic slots in this template; the elements involved are descriptions, collective terms, and pronouns. A description is also generated from a template: the template mapping table is accessed with the predicate symbol of the description's primitive formula to obtain a template, and the description is then generated according to its template transformation rules. This process is essentially the same as sentence generation and amounts to a recursive call of the algorithm. A collective term is generated with the dedicated template P_QuantifierNoun, which concatenates the words and outputs them. A pronoun is generated according to the pronoun-use conditions: when they are met, the pronoun "it" replaces the entity word.
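The recursive generation of descriptions and the concatenation performed by P_QuantifierNoun can be sketched in Python. Expressions are assumed to be (predicate, args) pairs in which an argument is either a string or a nested expression. The templates DepartDesc and ModNoun, including the attributive 的, are hypothetical; P_QuantifierNoun is named in the description, but its body here is only a stand-in that joins the words in order.

```python
# Hypothetical templates keyed by predicate symbol; {0}, {1} are argument positions.
TEMPLATES = {
    "DepartDesc": "从{0}出发的",  # description: "departing from {0}" (attributive 的)
    "ModNoun": "{0}{1}",          # a description followed by the noun it modifies
}


def generate(expr):
    """Generate text for (predicate, args); nested expressions are generated recursively."""
    pred, args = expr
    if pred == "P_QuantifierNoun":
        return "".join(args)  # collective term: concatenate the words in order
    filled = [generate(a) if isinstance(a, tuple) else a for a in args]
    return TEMPLATES[pred].format(*filled)


print(generate(("ModNoun", [("DepartDesc", ["北京"]), ("P_QuantifierNoun", ["三", "趟", "列车"])])))
# 从北京出发的三趟列车
```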
Step 5: phrase generation
The second class of sentence elements are the semantic slots of this template. This part is relatively simple: each phrase template is filled with its assigned information, and the result is returned to this template.
Step 6: generation of the different sentence patterns
The type of sentence to generate is selected according to the relation among the three parts C, S, and L of the speech-act expression; the correspondence between this relation and the sentence type is given by a sentence-pattern mapping table.
Note that this sentence-pattern mapping table records which sentence pattern is used to express each speech act. Different subclasses of the same speech act carry different pragmatic force: some require a change of word order, others require a different sentence pattern. The table's structure is a simple two-dimensional lookup table.
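A two-dimensional lookup table of this kind maps naturally onto a dictionary keyed by pairs. In this Python sketch the rows are speech acts and the columns are subclasses; Direct_Question and Represent are the speech-act names used in the text, while the subclass labels, the pattern names, and the fallback to the basic declarative sentence are assumptions, since the patent does not give the table's contents or the meaning of C, S, and L.

```python
# Hypothetical two-dimensional sentence-pattern table: (speech act, subclass) -> pattern.
SENTENCE_PATTERN = {
    ("Direct_Question", "choice"): "alternative_question",
    ("Direct_Question", "polar"): "yes_no_question",
    ("Direct_Question", "open"): "wh_question",
    ("Represent", "affirm"): "declarative",
    ("Represent", "deny"): "negative",
}


def choose_pattern(act, subclass):
    """Step 6: pick the sentence pattern; unknown combinations fall back to the basic sentence."""
    return SENTENCE_PATTERN.get((act, subclass), "declarative")


print(choose_pattern("Direct_Question", "polar"))  # yes_no_question
```

Keeping this table in configuration rather than in code matches the stated principle that task knowledge lives in configuration files.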
Generation of interrogative sentences
Interrogative sentences fall into three classes: alternative questions, yes-no questions and specific (wh-) questions.
1. Alternative questions:
An alternative question is generated as follows:
Look up the template mapping table by the predicate name of the semantic expression to obtain the basic sentence template.
Determine the name of the slot for the alternative item in the template, and the contents of the options given in the expression.
Generate the other parts of the template following the basic-sentence generation process. Generate the slot corresponding to the alternative item repeatedly, once with each option's content in turn, joining the generated results with 还是 ("or"). Output the final result.
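The alternative-question steps above can be sketched as follows, assuming (hypothetically) that the basic sentence template is a Python format string with named slots and that the other slots have already been generated. Appending "?" is also an assumption; the source does not state the end punctuation here.

```python
def alternative_question(template: str, fillers: dict, choice_slot: str, options: list) -> str:
    """Fill the queried slot once per option, joined with 还是 ("or").

    `template` uses hypothetical {slot} placeholders; `fillers` holds the
    already-generated text for every other slot.
    """
    if not options:
        raise ValueError("an alternative question needs at least one option")
    choice_text = "还是".join(options)
    # Assumed end punctuation; the source only says to output the result.
    return template.format_map({**fillers, choice_slot: choice_text}) + "?"
```

For example, with options 8点 and 9点 for the time slot, the sketch produces T21次8点还是9点从北京出发?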
2. Yes-no questions:
Determining the query point (focus) of a yes-no question is a comparatively complex problem: sentences made of the same sequence of characters can have different query points and therefore different meanings. To keep the generated sentence from being ambiguous, this method places an explicit marker at the query point. A yes-no question is generated as follows:
Generate the sentence in the same way as the corresponding basic sentence.
If the query point is on the predicate, append "?" at the end of the sentence.
If the query point is not on the predicate, add 是 ("is it") before the queried item and append "?" at the end of the sentence.
Note that in spoken natural language the query point of a sentence such as "Does train T21 leave Beijing at 8 o'clock?" is marked by stress: whatever the stress falls on is what is being questioned. If the stress falls on "8 o'clock", for example, the question is about "8 o'clock". Therefore, to emphasize the query point, this method adds the word 是 at the query point, so that the sentence becomes "Is it at 8 o'clock that train T21 leaves Beijing?", which in speech is read with the emphasis on "is it 8 o'clock". The method thus uses emphatic questioning, mainly to make the system's output more readable and less ambiguous; it also reduces the demands placed on the speech synthesis module.
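The yes-no rule can be sketched as below. The sentence is assumed, hypothetically, to be a list of `(slot, text)` segments already produced by basic-sentence generation, with the predicate in a slot named `"verb"`; ASCII "?" mirrors the text above.

```python
def yes_no_question(segments: list, query_slot: str, predicate_slot: str = "verb") -> str:
    """Mark the query point: 是 before a non-predicate query item, then append "?"."""
    if query_slot not in {slot for slot, _ in segments}:
        raise ValueError(f"unknown query slot: {query_slot}")
    out = []
    for slot, text in segments:
        if slot == query_slot and slot != predicate_slot:
            out.append("是")  # emphatic marker on the queried item
        out.append(text)
    return "".join(out) + "?"
```

With segments for T21次 / 8点 / 从北京 / 出发, querying the time slot yields T21次是8点从北京出发?, while querying the predicate yields T21次8点从北京出发?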
3. Specific (wh-) questions:
A specific question is still generated from the corresponding basic sentence template; the key issue is how to obtain the interrogative word. For this purpose, this method builds an interrogative mapping table, set up per application-task domain, which defines which interrogative word should be used to question each entity set. Its structure is a simple two-dimensional lookup table.
A specific question is generated as follows:
According to the position of the query point in the predicate formula, access the predicate definition and obtain the interrogative word from the definition of the set to which that argument belongs.
Look up the basic sentence template mapping table by the predicate name to obtain the corresponding basic template.
Generate the other parts of the template following the basic-sentence generation process, fill the slot corresponding to the query point with the interrogative word obtained above (sub-step a of interrogative-sentence generation in Step 6), and output the result.
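A sketch of specific-question generation with an interrogative mapping table follows. Every table here is a hypothetical example: `PREDICATE_ARGS` stands in for the predicate definition (argument position to entity set), `INTERROGATIVES` for the interrogative mapping table, and `BASIC_TEMPLATES` for the basic sentence template mapping table. The trailing "?" is an assumption.

```python
# Hypothetical predicate definition: argument position -> entity set.
PREDICATE_ARGS = {"depart": ["Train", "Time", "Station"]}
# Hypothetical interrogative mapping table: entity set -> interrogative word.
INTERROGATIVES = {"Train": "哪趟车", "Time": "几点", "Station": "哪里"}
# Hypothetical basic sentence templates keyed by predicate name.
BASIC_TEMPLATES = {"depart": "{0}{1}从{2}出发"}


def specific_question(predicate: str, args: list, query_pos: int) -> str:
    """Fill the query-point slot with the interrogative for its entity set."""
    arg_set = PREDICATE_ARGS[predicate][query_pos]  # access the predicate definition
    filled = list(args)
    filled[query_pos] = INTERROGATIVES[arg_set]
    # Assumed end punctuation for a question.
    return BASIC_TEMPLATES[predicate].format(*filled) + "?"
```

Querying the time argument of `depart` gives T21次几点从北京出发?; only the tables change when the system is moved to a new task domain.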
4. Generation of negative sentences
The negative expression states the position of the negation directly. In general, negation is marked, and its focus should lie on the negated part. To highlight the negated part, if the first word of the phrase corresponding to the negation is a nominal (体词, the general term for nouns, pronouns, numerals and measure words), this method adds 不是 ("is not") before that phrase; otherwise it adds 不 ("not"). When the negation point is on the predicate, this method directly adds the negative word 不 before the main verb of the generated basic sentence.
The following describes how this method generates a negative sentence whose negation point is not on the predicate:
Look up the basic sentence template mapping table by the predicate name to obtain the corresponding template.
Fill each slot as in basic-sentence generation, including the slot containing the negated item.
Concatenate and output. If the first element of the phrase template corresponding to the negation is a slot, insert 不是 before this phrase, otherwise insert 不; all other items remain unchanged and are concatenated in order for output.
5. Generation of elliptical sentences
An elliptical sentence is one obtained by applying argument-default (omission) rules. The original predicate has a corresponding template that can be used for generation. For a proposition in which some components have been omitted, this method still uses the original template, and generation proceeds exactly as for the unelided sentence, except that the slots corresponding to the omitted items are left unfilled; when the result is finally output, the items before and after each such slot are simply joined directly.
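Elliptical output can be sketched as follows. The template is assumed, hypothetically, to be a token list in which `{role}` tokens are slots; an omitted item keeps its slot but carries `None`, so the neighbouring items are joined directly when the result is output.

```python
def elliptical_sentence(template: list, fillers: dict) -> str:
    """Generate from the original template, skipping slots of omitted items."""
    out = []
    for token in template:
        if token.startswith("{") and token.endswith("}"):
            value = fillers[token[1:-1]]  # omitted items still occupy a position (None)
            if value is not None:
                out.append(value)
            # A None slot is left unfilled; its neighbours are joined directly.
        else:
            out.append(token)  # literal word embedded in the template
    return "".join(out)
```

Omitting the train item from ["{train}", "{time}", "{origin}", "出发"] gives 8点从北京出发; omitting the origin instead gives T21次8点出发.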
Characteristics of the algorithm
The input of the algorithm is a speech-act expression, which is highly expressive and can supply all the information needed to generate the various sentence types.
Context information is extracted to assist generation, so that the generated sentences fit the discourse context better.
The algorithm is extensible:
The sentence types generated by the algorithm can be extended.
The phrase types generated by the algorithm can be extended.
The sentence elements generated by the algorithm can be extended.
The templates suit the characteristic of Chinese that form is closely tied to meaning.
The algorithm adopts the following approach:
Only the general rules of the language are written into the algorithm; customary knowledge is written into configuration files, and all task definitions are written into configuration files.
Because the present invention makes the algorithm independent of the task, it is easy to port.

Claims (13)

1. A general Chinese two-stage hybrid-template spoken dialogue language generation method, characterized in that a Chinese sentence is decomposed into two levels, the sentence and the phrase, and each level is generated with a different template: phrases are generated from phrase templates, and the phrases are then combined into a sentence according to the requirements of the sentence template, thereby generating the sentence; the algorithm adopted by this method comprises the following parts:
One, designing phrase templates for generating structurally fixed phrases;
Two, designing sentence templates that express the basic word order;
Three, using phrases as components to fill the sentence templates and generate sentences;
Four, designing sentence-template transformation rules for generating different sentence patterns.
2. The spoken dialogue language generation method as claimed in claim 1, characterized in that all phrase templates and sentence templates are derived from a real corpus; a phrase template consists of several semantic slots, each of which is a semantic role enclosed in brackets; a sentence template consists of a template name and several slots, each slot consisting of the semantic role name of the slot and the name of the phrase template that realizes that semantic role, except for the main verb of the sentence, which consists of the verb symbol and the verb.
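The template structure of claim 2, together with the two-stage combination of claim 1, can be modelled with a few data classes. This is an illustrative sketch only: the class names, the square-bracket slot notation, and the example templates P_Train, P_Time and P_From are hypothetical and not taken from the patent.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class PhraseTemplate:
    name: str
    pattern: str  # embedded words plus [role] slots (assumed notation)


@dataclass
class Slot:
    role: str    # semantic role name of the slot
    phrase: str  # name of the phrase template realizing this role


@dataclass
class MainVerb:
    symbol: str  # verb symbol
    verb: str    # surface verb


@dataclass
class SentenceTemplate:
    name: str
    items: list[Union[Slot, MainVerb]]


ROLE = re.compile(r"\[([^\]]+)\]")


def realize(st: SentenceTemplate, phrases: dict, values: dict) -> str:
    """Stage 1: fill each phrase template; stage 2: join them in sentence order."""
    out = []
    for item in st.items:
        if isinstance(item, MainVerb):
            out.append(item.verb)
        else:
            inner = values[item.role]  # role -> text for this phrase's slots
            pattern = phrases[item.phrase].pattern
            out.append(ROLE.sub(lambda m: inner[m.group(1)], pattern))
    return "".join(out)
```

With phrase templates "[number]次", "[hour]点" and "从[station]" and the main verb 出发, the sketch realizes T21次8点从北京出发.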
3. The spoken dialogue language generation method as claimed in claim 1 or 2, characterized in that the sentence templates comprise two classes, static templates and encapsulated text:
the static template covers sentences that need not be generated by combining a sentence template with phrase templates; their main content does not take part in logical reasoning, the content they involve is simple, and they are used very infrequently;
the encapsulated text covers sentences in a dialogue that cannot be decomposed, whose meaning is not a function of the meanings of their parts and their structure; such sentences are generally phatic (communicative-function) expressions, including greetings and salutations, which do not obey Frege's principle of compositionality and involve no entities or predicates; they are defined directly as encapsulated text and output directly when needed.
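The handling of encapsulated text at template selection can be sketched as a lookup that short-circuits. The mapping table, its `(kind, body)` entries and the greeting strings are hypothetical, and the `format` call merely stands in for the full slot-filling pipeline.

```python
# Hypothetical template mapping table: predicate -> (template kind, body).
TEMPLATE_MAP = {
    "greet": ("encapsulated", "您好！"),
    "thanks": ("encapsulated", "不客气。"),
    "depart": ("template", "{0}{1}从{2}出发"),
}


def select_and_generate(predicate: str, args: list) -> str:
    """Output encapsulated text directly; otherwise fill the template."""
    kind, body = TEMPLATE_MAP[predicate]
    if kind == "encapsulated":
        return body  # not decomposable: output as-is and stop
    return body.format(*args)  # stands in for the slot-filling steps
```

A greeting never touches slots or phrase templates, which matches the claim that such sentences involve no entities or predicates.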
4. The spoken dialogue language generation method as claimed in claim 1 or 2, characterized in that the phrase template is a basic template, is fixed and has a specific semantic meaning; a separate template is built for each phrase by taking the entities as slots, all other words being words embedded in the template.
5. The spoken dialogue language generation method as claimed in claim 1 or 2, characterized in that the sentence template divides a basic sentence into phrases centered on entities, each phrase being defined by a name, except for the main verb of the sentence; each of its slots also indicates a semantic role.
6. The spoken dialogue language generation method as claimed in claim 1, characterized in that the steps of the algorithm are as follows:
Step 1, template selection: look up the template mapping table by the name of the sentence predicate and determine the type of the template accessed; if the accessed template is encapsulated text, call this template, obtain the returned data, output it and return, and the algorithm terminates; otherwise obtain the template;
Step 2, determining the sentence elements to be generated: once the sentence template has been obtained, the slots in the sentence that represent specific semantic meanings and need to be filled are known;
Step 3, assigning the fill value of each slot: once the sentence elements to be generated have been determined, that is, once it is known which slots need to be filled, since the semantic items in the predicate expression correspond to the semantic items in the template, it suffices to insert them one by one at the corresponding positions;
Step 4, pre-generation of sentence elements: the tasks in this stage fall into two classes: the first class uses separate generation processes to generate phrases that replace semantic slots in this template, and the second class fills the slots of this template;
First, the components involved in the first class are descriptors, collective-term phrases and pronouns; a descriptor is likewise generated by calling a template: the template mapping table is looked up by the predicate symbol of the descriptor's primitive formula to obtain a template, and the descriptor is then generated according to the descriptor's template transformation rules; a collective-term phrase is generated with a dedicated template that concatenates the words and outputs them; a pronoun is generated according to the usage conditions for pronouns: when the conditions are satisfied, the pronoun 它 ("it") replaces the subject word;
Step 5, phrase generation: the second class of sentence elements in Step 4 are the semantic slots of this template; each assigned piece of information is filled into the corresponding semantic-item template, and the filled result is returned to this template;
Step 6, generation of the various sentence types: according to the sentence-pattern mapping table, the type of sentence to generate is selected from the relation among the parameters of the speech-act expression, including:
a, generation of interrogative sentences: (1) alternative questions: the slot corresponding to the alternative item is generated repeatedly with the content of each option in turn, the generated results are joined with 还是 ("or"), and the final result is output; (2) yes-no questions: if the query point is on the predicate, "?" is appended at the end of the sentence; if the query point is not on the predicate, 是 is added before the queried item and "?" is appended at the end of the sentence; the final result is output; (3) specific questions: according to the position of the query point in the predicate formula, the predicate definition is accessed to obtain the interrogative word from the definition of the set to which that argument belongs; the basic sentence template mapping table is looked up by the predicate name to obtain the corresponding basic template; the other parts of the template are generated following the basic-sentence generation process, the slot corresponding to the query point is filled with the interrogative word obtained in a), and the result is output;
b, generation of negative sentences: if the first word of the phrase corresponding to the negation is a nominal (the general term for nouns, pronouns, numerals and measure words), the generated negative adds 不是 ("is not") before this phrase, otherwise it adds 不 ("not"); if the negation point is on the predicate, the negative word 不 is added directly before the main verb of the generated basic sentence;
c, generation of elliptical sentences: the original predicate has a corresponding template that can be used for generation; for a proposition in which some components have been omitted, the original template is still used and generation proceeds as for the unelided sentence, except that the slots corresponding to the omitted items are left unfilled; when the result is finally output, the items before and after each such slot are joined directly.
7. The spoken dialogue language generation method as claimed in claim 6, characterized in that in Step 1 the entry key of the template mapping table is the relation predicate symbol.
8. The spoken dialogue language generation method as claimed in claim 6, characterized in that in Step 2, when a short answer is needed, not all components of the sentence need to be generated, and only the item corresponding to the queried item of the sentence and the main verb are generated; when a pronoun is needed, the component replaced by the pronoun need not be generated, and a pronoun simply replaces a noun item in the sentence; all of these decisions must be made before the sentence is generated.
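The "decide before generating" behaviour of claim 8 can be sketched with lazy generators: each element is a zero-argument callable, so components dropped from a short answer or replaced by a pronoun are never generated at all. The role names and the `"verb"` default are hypothetical.

```python
from typing import Callable, Optional


def realize_planned(
    segments: list[tuple[str, Callable[[], str]]],
    verb_role: str = "verb",
    short_answer_for: Optional[str] = None,
    pronoun_for: Optional[str] = None,
) -> str:
    """Apply short-answer and pronoun decisions before any element is generated."""
    out = []
    for role, gen in segments:
        if short_answer_for is not None and role not in (short_answer_for, verb_role):
            continue  # short answer: keep only the queried item and the main verb
        # A pronoun replaces the noun item, so its generator is never called.
        out.append("它" if role == pronoun_for else gen())
    return "".join(out)
```

Answering a question about the time slot yields 8点出发, and pronominalizing the train yields 它8点从北京出发.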
9. The spoken dialogue language generation method as claimed in claim 6, characterized in that in Step 3 omitted items also occupy a position (placeholder).
10. The spoken dialogue language generation method as claimed in claim 6, characterized in that in Step 6 the other parts of the sentence template are generated according to the basic-sentence generation process.
11. The spoken dialogue language generation method as claimed in claim 1 or 6, characterized in that the algorithm extracts context information to assist in generating the spoken dialogue language, so that the generated sentences fit the discourse context better.
12. The spoken dialogue language generation method as claimed in claim 1 or 6, characterized in that the algorithm adopts the following approach: only the general rules of the language are written into the algorithm, while customary knowledge and task definitions are all written into configuration files.
13. The spoken dialogue language generation method as claimed in claim 1 or 6, characterized in that the algorithm is extensible, i.e. the sentence types, the phrase types and the sentence elements generated by the algorithm can all be extended.
CNB031570046A 2003-09-08 2003-09-08 Universal Chinese dialogue generating method using two-stage compound template Expired - Fee Related CN100498932C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031570046A CN100498932C (en) 2003-09-08 2003-09-08 Universal Chinese dialogue generating method using two-stage compound template

Publications (2)

Publication Number Publication Date
CN1595496A CN1595496A (en) 2005-03-16
CN100498932C true CN100498932C (en) 2009-06-10

Family

ID=34660168

Country Status (1)

Country Link
CN (1) CN100498932C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193798B (en) * 2017-05-17 2019-06-04 南京大学 A kind of examination question understanding method in rule-based examination question class automatically request-answering system
CN108563617B (en) * 2018-03-12 2021-09-21 云知声智能科技股份有限公司 Method and device for mining Chinese sentence mixed template
CN114417807B (en) * 2022-01-24 2023-09-22 中国电子科技集团公司第五十四研究所 Human-like language description expression method for collaboration scene of presence or absence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1179587A (en) * 1996-09-30 1998-04-22 微软公司 Prosodic databases holding fundamental frequency templates for use in speech synthesis
JP2000163088A (en) * 1998-11-30 2000-06-16 Matsushita Electric Ind Co Ltd Speech synthesis method and device
CN1417707A (en) * 2002-12-02 2003-05-14 刘莎 Natural language semantic information united-coding method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090610

Termination date: 20120908