JP4537755B2

JP4537755B2 - Spoken dialogue system

Info

Publication number: JP4537755B2
Application number: JP2004135631A
Authority: JP
Inventors: Hirohiko Sagawa; Teruko Mitamura; Eric Nyberg
Original assignee: Carnegie Mellon University
Current assignee: Carnegie Mellon University
Priority date: 2004-04-30
Filing date: 2004-04-30
Publication date: 2010-09-08
Anticipated expiration: 2024-04-30
Also published as: JP2005316247A

Abstract

PROBLEM TO BE SOLVED: To provide a technology that lets a user and a system proceed smoothly with a dialogue even when the system misrecognizes the user's utterance, by detecting correction utterances flexibly across various dialogue situations and forms of correction utterance, and by correcting the system's misrecognition.

SOLUTION: The system is provided with a dialogue history that records changes in the state of the dialogue, such as the kind of dialogue, the rules used to recognize the user's speech, and the messages output by the system. Rules for recognizing the user's correction utterances are generated from the information in the dialogue history and templates prepared in advance. When a user's utterance is recognized using the generated rules, it is regarded as a correction utterance and processing shifts to correcting the misrecognition. The correction uses both the recognition result of the correction utterance and the recognition results of the user's utterances up to that point.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a spoken dialogue system, user interface, or software that provides a service to a user through spoken dialogue between the user and the system.

Techniques for providing a service to a user through spoken dialogue include “Patent Document 1” and “Patent Document 2”. Techniques that handle correction utterances, which a user speaks to correct the system's misrecognition when the system has misrecognized the user's utterance in a spoken dialogue, include “Patent Document 3” and “Patent Document 4”. “Patent Document 3” describes a technique that detects and corrects the system's misrecognition by exploiting the fact that the characteristic parameters of a correction utterance differ from those of a normal utterance. “Patent Document 4” describes a technique in which, when a correction utterance is received from the user while the system is responding to the user's utterance, the content of the system's response is changed based on the recognition result of the correction utterance.

Patent Document 1: Patent application 2002-24799, “Voice dialogue apparatus and method, voice dialogue program and recording medium thereof”
Patent Document 2: Patent application 2002-122491, “Voice dialogue system and voice dialogue method”
Patent Document 3: Patent application 2000-45442, “Method and apparatus for correcting errors in speech recognition results”
Patent Document 4: Patent application 2002-4552, “Voice dialogue method and apparatus”

In many conventional spoken dialogue systems, such as those in “Patent Document 1” and “Patent Document 2”, when the system confirms the result of recognizing the user's utterance, the only user utterances assumed are often “yes” or “no”. However, it is known that when the system issues a confirmation containing misrecognized content, the user often responds not just with “no” but with an utterance that includes content for correcting the error. It has therefore been difficult for the user and the system to hold a smooth dialogue when the system misrecognizes the user's utterance.

“Patent Document 3”, on the other hand, describes a method of detecting correction utterances using their characteristic parameters. However, across the various forms a dialogue can take, there are cases where no difference is observed between correction utterances and normal utterances, so the method is not sufficient. In addition, it assumes that a correction utterance contains only the correcting word or phrase, whereas correction utterances can take various forms, such as an utterance containing both the misrecognized word or phrase and the correcting word or phrase, or an utterance that merely negates the misrecognized word. The method therefore cannot handle all correction utterances correctly.

“Patent Document 4” does not describe how to recognize the various forms of correction utterance described above, that is, how to prepare the rules needed to recognize them. Besides the various forms a correction utterance can take, in some styles of dialogue the words or phrases corrected by a correction utterance change with the situation. For example, in a dialogue system that allows free movement from one dialogue to another, if a transition to another dialogue resulted from misrecognizing an utterance intended to continue the original dialogue, the correction utterance that immediately follows depends on the situation of the original dialogue. Preparing rules for recognizing correction utterances in advance therefore requires considering every situation that can arise in the dialogue. But achieving more flexible dialogue makes the dialogue more complex, so it becomes difficult to anticipate every situation and prepare the necessary rules in advance.

An object of the present invention is to provide a technology that lets the user and the system proceed smoothly with a dialogue even when the system misrecognizes the user's utterance, by detecting correction utterances flexibly across various dialogue situations and forms of correction utterance and correcting the system's misrecognition.

To achieve this object, the present invention provides a dialogue history that records changes in the state of the dialogue, such as the progress of the dialogue, the rules used to recognize the user's speech, and the results of recognizing the user's speech, and generates rules for recognizing the user's correction utterances from the information in the dialogue history and templates prepared in advance. When the user's speech is recognized using a generated rule, it is regarded as a correction utterance and processing shifts to correcting the misrecognition. The misrecognition is corrected using both the recognition result of the correction utterance and the recognition results of the user's utterances up to that point.

According to the present invention, rules for recognizing correction utterances are generated by referring to a dialogue history that stores the various information used during the dialogue, and the presence or absence of a correction utterance is determined based on the generated rules. This makes it possible to detect correction utterances flexibly across various dialogue situations and forms of correction utterance. In addition, correcting misrecognition using both the recognition result of the correction utterance and the recognition results of the user's earlier utterances allows the correction to be made accurately. As a result, the user and the spoken dialogue system can proceed smoothly with the dialogue even when the system misrecognizes the user's utterance.

An embodiment of the present invention will be described below with reference to FIGS. 1 to 25.

FIG. 1 is a block diagram of a spoken dialogue system to which the present invention is applied. In FIG. 1, a microphone 101 is a device for inputting the user's speech, and a voice input unit 102 is a device that converts the user's speech into a digital signal. A speaker 103 is a device that outputs questions, confirmations, processing results, and the like from the spoken dialogue system as speech, and a voice output unit 104 is a device that converts digital signals into analog signals that can be output as speech from the speaker 103. An information processing unit 105 is a device that performs the various processes of the spoken dialogue system. A storage unit 106 is a device that stores the programs for the various processes the spoken dialogue system requires. It stores a speech recognition program 107 for recognizing what the user said from the speech signal the user input; a speech synthesis program 108 for outputting messages from the spoken dialogue system, such as questions, confirmations, and processing results, as speech; a dialogue control program 109 for controlling the progress of the dialogue between the user and the spoken dialogue system; a task execution program 110 for performing processing in response to the user's requests; and a correction utterance rule generation program 111 for generating the grammar rules used to recognize correction utterances, which the user speaks to correct misrecognition by the spoken dialogue system. A storage unit 112 is a device that stores a dialogue history 113, which is information on changes in the dialogue state. A dialogue scenario 114 describes the information on the progress of the dialogue used by the dialogue control program 109. Speech recognition grammar rules 115 are the grammar rules used to recognize the utterances the user inputs during a dialogue, and are used by the speech recognition program 107. Correction utterance templates 116 are the templates used to generate the grammar rules for recognizing correction utterances, and are used by the correction utterance rule generation program 111. All of the above devices can be general-purpose devices such as those used in personal computers. The speech recognition program 107 can use a well-known technique, for example the one described in “Speech Recognition by Probabilistic Models, Corona Publishing, 1988”. Likewise, the speech synthesis program 108 can use a well-known technique, for example the one described in “Speech Information Processing, Electronic Information and Communication Engineering Series, Morikita Publishing, 1998”. The task execution program 110 can be any program executable from the spoken dialogue system, and can make use of functions available on ordinary computers, such as database searches, various calculations, and device control.

FIG. 2 shows the format of the dialogue information stored in the dialogue scenario 114. In FIG. 2, a dialogue name 201 is a character string that identifies each dialogue. Slot names 202 and 205 are character strings giving the names of slots, which are the items the user must input for the task to be executed; the slot value, the information corresponding to each slot, is input by the user by voice. System prompts 203 and 206 are character strings representing the messages output to prompt the user to input information for slot names 202 and 205. For example, if the slot concerns the start time of a meeting, a sentence such as “What time does the meeting start?” can be used as the system prompt. User utterance recognition grammar names 204 and 207 are the names of the speech recognition grammar rules used to recognize the user's utterances. A confirmation prompt 208 is a character string representing the message output to confirm the slot values input by the user. This character string can contain special strings for embedding slot names and slot values. For example, if the string for embedding a slot value is written as the slot name enclosed in (), a string such as “Is a start time of (slot name 1) and an end time of (slot name 2) correct?” can be used. A task execution command 209 is a command for executing the task execution program 110, in a format the task execution program 110 can interpret. For example, if the task execution program 110 runs a search for available conference rooms, taking a start time and an end time as arguments, through a command named “SearchRoom”, the command can be written as “SearchRoom (slot name 1) (slot name 2)”. Here too, slot values are assumed to be embedded via the slot names enclosed in (). A user confirmation utterance grammar name 210 is the name of the speech recognition grammar rule for recognizing the user's response to the confirmation prompt 208. A result prompt 211 is a character string for reporting the task execution result to the user, and can contain special strings for embedding the result. For example, if the task execution result is stored in a string array variable Result[n], a sentence such as “The available conference rooms are (Result[1]), (Result[2]), and (Result[3])” can be used. Here each result string is assumed to be embedded via the array variable name enclosed in (). Slot names and slot values can also be embedded.
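As an illustration only, the scenario format of FIG. 2 and the embedding of values into strings containing (name) placeholders might be sketched in Python as follows. The class names, field names, the `embed` helper, and the sample slot names are assumptions made for this sketch and do not appear in the specification.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slot:
    name: str            # slot name (202, 205)
    system_prompt: str   # system prompt (203, 206)
    grammar_name: str    # user utterance recognition grammar name (204, 207)


@dataclass
class DialogueScenario:
    dialogue_name: str               # 201
    slots: List[Slot]
    confirmation_prompt: str         # 208
    task_command: str                # 209
    confirmation_grammar_name: str   # 210
    result_prompt: str               # 211


# A placeholder is any run of characters enclosed in ( ), e.g. (start_time).
PLACEHOLDER = re.compile(r"\(([^()]+)\)")


def embed(template: str, values: Dict[str, str]) -> str:
    """Replace each (name) with values[name]; unknown names are left as they are."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


scenario = DialogueScenario(
    dialogue_name="ReserveRoom",  # hypothetical dialogue name
    slots=[
        Slot("start_time", "What time does the meeting start?", "StartTimeGrammar"),
        Slot("end_time", "What time does the meeting end?", "EndTimeGrammar"),
    ],
    confirmation_prompt="Is a start time of (start_time) and an end time of (end_time) correct?",
    task_command="SearchRoom (start_time) (end_time)",
    confirmation_grammar_name="ConfirmGrammar",
    result_prompt="The available conference rooms are (Result[1]) and (Result[2])",
)

slot_values = {"start_time": "10:00", "end_time": "12:00"}
command = embed(scenario.task_command, slot_values)
report = embed(scenario.result_prompt, {"Result[1]": "Room A", "Result[2]": "Room B"})
```

Leaving unknown placeholders untouched is a design choice of the sketch; it lets the same helper serve the confirmation prompt, the task command, and the result prompt without each needing the full set of values.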

FIG. 3 shows an example of a dialogue carried out in the spoken dialogue system. The dialogue in FIG. 3 assumes three slots: the start time of the conference room use, the end time, and the conference room name. 301, 303, and 305 are system prompts that prompt the user to input each slot value, and 302, 304, and 306 are the user's utterances. 307 is a confirmation prompt for confirming the information the user has input, and 308 is the user's response to it. 309 is a result prompt reporting the task execution result.

The speech recognition grammar rules 115 are expressed as state transition networks with nodes and arcs. FIG. 4 shows an example of a speech recognition grammar rule expressed as a state transition network. In FIG. 4, 401, 404, 409, and 414 are nodes; 403, 406, 408, 411, and 413 are arcs representing transitions from one node to another; and 402, 405, 407, 410, and 412 are the words or phrases in the user's utterance that are the conditions for the transitions shown by the arcs. For example, suppose the speech recognition program 107 recognizes the user's utterance and outputs “10 o'clock”, “from”, and “please reserve”. Applying this result in order to the grammar rule in FIG. 4, starting from node 401, traces the path node 401, arc 403, node 404, arc 408, node 409, arc 413, node 414. The speech recognition program 107 outputs as a speech recognition result only word (phrase) sequences that can be traced in this way from the first node of the grammar rule to the last.
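The acceptance test described above, in which a word sequence is kept only if it can be traced from the first node to the last, can be sketched roughly as follows. The node names N1 to N4 stand in for nodes 401, 404, 409, and 414; the dictionary layout, the `accepts` function, and the alternative words marked hypothetical are assumptions for this sketch, since the contents of FIG. 4 are not reproduced in the text. The sketch also assumes at most one matching arc per node and word.

```python
# Each arc is (source node, word or phrase, destination node).
GRAMMAR = {
    "start": "N1",
    "final": "N4",
    "arcs": [
        ("N1", "10 o'clock", "N2"),
        ("N1", "11 o'clock", "N2"),      # hypothetical alternative word
        ("N2", "from", "N3"),
        ("N2", "until", "N3"),           # hypothetical alternative word
        ("N3", "please reserve", "N4"),
    ],
}


def accepts(grammar, words):
    """Return True if the words trace a path from the start node to the final node."""
    node = grammar["start"]
    for word in words:
        # Look for an arc leaving the current node whose condition is this word.
        destination = next(
            (dst for src, cond, dst in grammar["arcs"] if src == node and cond == word),
            None,
        )
        if destination is None:
            return False      # no transition is possible, so the sequence is rejected
        node = destination
    return node == grammar["final"]
```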

FIG. 5 shows the format for describing the speech recognition grammar rule of FIG. 4 in the spoken dialogue system. In FIG. 5, a grammar name 501 is a name that identifies each grammar rule; it is what is written as the grammar name 204, 207, or 210 in FIG. 2. A node name 502 is a character string that identifies a node, and corresponds to the name of 401, 404, 409, or 414 in FIG. 4. A slot name 503 holds the slot name when the user's utterance content that is the condition for the transition to the next node corresponds to a slot name in the dialogue scenario of FIG. 2; if there is no corresponding slot name, it is left blank. Word names 504 to 505 describe the words or phrases that are the conditions for the transition to the next node. A blank word name means the transition to the destination node is unconditional. A next node name 506 is the name of the destination node. In this way, a grammar rule is described in units of transitions from one node to the next.

FIG. 6 shows the format for describing the speech recognition result output after the speech recognition grammar rule of FIG. 5 is applied. In FIG. 6, a confidence 601 is a numerical value indicating how likely the speech recognition result is to be correct. When the speech recognition program can output multiple word (phrase) sequences, multiple speech recognition results of the form in FIG. 6 are output, and the confidence 601 is used to rank them. This confidence can be determined from, for example, the sum or average of the per-word confidences described below. Slot names 602 and 605 correspond to the slot name 503 in FIG. 5. Word names 603 and 606 are the names of the words adopted as transition conditions; each holds one of the word names 504 to 505 in FIG. 5. Word confidences 604 and 607 are numerical values indicating how likely each word output by the speech recognition program is to be correct; a speech recognition program normally outputs them together with the word (phrase) sequence. In FIG. 6, items whose slot name is blank can be omitted.
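One possible reading of the result format in FIG. 6 is sketched below, taking the overall confidence as the average of the word confidences (the text also allows the sum). The class names, the `rank` helper, and the sample values are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RecognizedWord:
    slot_name: str      # 602, 605; empty string when the word fills no slot
    word: str           # 603, 606
    confidence: float   # 604, 607


@dataclass
class RecognitionResult:
    words: List[RecognizedWord]

    @property
    def confidence(self) -> float:
        """Overall confidence (601), here the mean of the word confidences."""
        return mean(w.confidence for w in self.words)

    def slot_values(self) -> Dict[str, str]:
        """Keep only the words tied to a slot, as items with blank slot names may be omitted."""
        return {w.slot_name: w.word for w in self.words if w.slot_name}


def rank(results: List[RecognitionResult]) -> List[RecognitionResult]:
    """Order competing hypotheses from most to least confident."""
    return sorted(results, key=lambda r: r.confidence, reverse=True)


first = RecognitionResult([
    RecognizedWord("start_time", "10 o'clock", 0.5),
    RecognizedWord("", "from", 0.75),
])
second = RecognitionResult([
    RecognizedWord("start_time", "11 o'clock", 0.25),
    RecognizedWord("", "from", 0.75),
])
best = rank([second, first])[0]
```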

FIG. 7 shows the format for describing the correction utterance template 116. The template in FIG. 7 takes the form of grammar rules prepared in advance with other grammar rules inserted between them. Template rules 701, 703, and 705 are the grammar rules prepared in advance, and are described in the same format as the speech recognition grammar rules of FIG. 5. However, when an inserted grammar rule is connected immediately before a template rule, a special node name, for example “X”, is written as the name of the template rule's first node. When an inserted grammar rule is connected after a template rule, a special node name indicating the insertion point, for example “X”, is written as the next node of the node that transitions to the inserted grammar rule. To insert a grammar rule into a template, it is then enough to search the template for the special node name and join the grammar rule there. Insertion rule types 702 and 704 give the type of grammar rule inserted into the template. The insertion rule types provided are “sentence”, “phrase”, “slot value”, and “slot name”. Each type is used when inserting another grammar rule into the template. FIG. 7 shows a template in which inserted grammar rules are sandwiched between template rules, but a template can also begin with an inserted grammar rule or end with one.
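The joining step, in which the special node name “X” is located in the template and the inserted grammar rule is connected there, might look like the following sketch. It uses the per-transition format of FIG. 5 (node, slot, word, next node). The `Arc` type, the `splice` function, the label prefixes used to keep node names distinct, and the sample template words are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Arc(NamedTuple):
    node: str
    slot: str
    word: Optional[str]   # None means an unconditional transition
    next_node: str


MARK = "X"   # special node name marking the insertion point


def _relabel(arcs: List[Arc], label: str, x_target: Optional[str] = None) -> List[Arc]:
    """Prefix node names with a label; map the marker node to x_target when given."""
    def rename(name: str) -> str:
        if name == MARK and x_target is not None:
            return x_target
        return f"{label}:{name}"
    return [Arc(rename(a.node), a.slot, a.word, rename(a.next_node)) for a in arcs]


def splice(prefix: List[Arc], inserted: List[Arc], start: str, final: str,
           suffix: List[Arc]) -> List[Arc]:
    """Join prefix template rule, inserted rule, and suffix template rule at the markers."""
    return (
        _relabel(prefix, "pre", f"ins:{start}")      # "X" after the prefix becomes the inserted start
        + _relabel(inserted, "ins")
        + _relabel(suffix, "post", f"ins:{final}")   # "X" opening the suffix becomes the inserted end
    )


prefix = [Arc("P0", "", "no", MARK)]                 # hypothetical leading template rule
inserted = [
    Arc("A", "start_time", "10 o'clock", "B"),
    Arc("B", "", "please reserve", "C"),
]
suffix = [Arc(MARK, "", None, "End")]                # unconditional closing transition
rule = splice(prefix, inserted, start="A", final="C", suffix=suffix)
```

Prefixing node names is one simple way to avoid clashes between the template's node names and those of the inserted rule; the specification does not say how such clashes are handled.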

FIG. 8 shows the format of the information on dialogue state changes stored in the dialogue history 113. In FIG. 8, a dialogue name 801 is the name of the dialogue this information relates to, and holds the same content as the dialogue name 201 in FIG. 2. A history type 802 indicates whether the recorded information concerns the spoken dialogue system or the user. With the dialogue scenario of FIG. 2, the dialogue proceeds as a repetition of prompts output by the spoken dialogue system and utterances by the user. The history type 802 therefore holds “system” for information on a system prompt and “user” for information on a user utterance. Dialogue information 803 holds the information on the system prompt or the user's utterance. In the dialogue history 113, the information on the spoken dialogue system and on the user in the form of FIG. 8 is recorded as a time series that follows the progress of the dialogue.

FIG. 9 shows the format of the information on a system prompt. In FIG. 9, a prompt type 901 is an item giving the type of system prompt, and holds “question”, “confirmation”, or “response”. The type of a system prompt can easily be determined by referring to the dialogue scenario of FIG. 2. FIG. 10 shows the format of the information on a user utterance. In FIG. 10, a user utterance recognition grammar name 1001 is the name of the speech recognition grammar rule used to recognize the user's utterance, and holds the grammar name 204, 207, or 210 in FIG. 2. A slot name 1002 is the name of the slot the user's utterance relates to, and holds the slot name 202 or 205 in FIG. 2. A user utterance recognition result 1003 holds the result of recognizing the user's utterance, in the same format as FIG. 6.
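The history records of FIGS. 8 to 10 could be modeled along the following lines. This is only a sketch: the class and method names are invented here, the recognition result is reduced to a plain dictionary, and only the fields named in the text are included.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class SystemInfo:
    prompt_type: str           # 901: "question", "confirmation", or "response"


@dataclass
class UserInfo:
    grammar_name: str          # 1001
    slot_name: str             # 1002
    recognition_result: dict   # 1003, simplified here to a plain dict


@dataclass
class HistoryEntry:
    dialogue_name: str                   # 801
    history_type: str                    # 802: "system" or "user"
    info: Union[SystemInfo, UserInfo]    # 803


class DialogueHistory:
    """Time-ordered list of entries, standing in for dialogue history 113."""

    def __init__(self) -> None:
        self.entries: List[HistoryEntry] = []

    def add_system(self, dialogue_name: str, prompt_type: str) -> None:
        self.entries.append(HistoryEntry(dialogue_name, "system", SystemInfo(prompt_type)))

    def add_user(self, dialogue_name: str, grammar_name: str, slot_name: str,
                 result: dict) -> None:
        info = UserInfo(grammar_name, slot_name, result)
        self.entries.append(HistoryEntry(dialogue_name, "user", info))

    def latest_user(self) -> Optional[UserInfo]:
        """Return the most recent user entry, or None if the user has not spoken yet."""
        for entry in reversed(self.entries):
            if entry.history_type == "user":
                return entry.info
        return None
```

A lookup such as `latest_user` is the kind of query rule generation would need, since the grammar name recorded with the latest user utterance identifies the grammar rule to copy or insert.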

図１１は、対話制御プログラム１０９における処理の流れ図である。図１１の流れ図では、ある種類の対話が選択された直後の状態を想定している。この状態では、全てのスロットの値が空白の状態である。図１１において、ステップ１１０１では値が空白であるスロットを一つ選択し、対象スロットとする。スロットの選択方法は、対話シナリオの最初から順にスロットを検索し、値が空白であるかどうかを調べれば良い。また、図２における対話シナリオで各スロットに対して優先順位を追加することも可能であり、その場合は、優先順位の高い順にスロットを調べれば良い。ステップ１１０２では、対象とするスロットに対応するシステムプロンプトを対話シナリオから読み出し、音声として出力する。ステップ１１０３では、対象とするスロットに対応する利用者発話認識用文法名に示される文法ルールを音声認識プログラムに送り、音声認識プログラム１０７では指定された文法ルールに基づいて利用者発話の認識を行い、認識結果を対話制御プログラムに返す。対話制御プログラムでは、認識結果からスロットに関連する値を抽出し、スロット値として設定する。ステップ１１０４では、ステップ１１０２およびステップ１１０３の内容に基づいて、対話履歴１１３の内容を更新する。ステップ１１０５では、値が空白のスロットがあるかどうかを調べ、ある場合はステップ１１０１に戻る。全てのスロット値が設定されている場合は、ステップ１１０６に移る。ステップ１１０６では、対話シナリオから確認用プロンプトを読み込み、必要に応じてスロット値やスロット名の埋め込みを行った後、音声として出力する。ステップ１１０７では、対話履歴の内容と音声認識用文法ルールを元に、訂正発話用文法ルールを生成する。訂正発話用文法ルールの生成方法については後述する。ステップ１１０８では、対話シナリオ中の利用者確認用文法名に示される文法ルールと、生成された訂正発話用文法ルールを音声認識プログラムに送り、音声認識プログラムでは指定された文法に基づいて利用者発話の認識を行い、認識結果を対話制御プログラムに返す。ステップ１１０９では、ステップ１１０６およびステップ１１０８の内容に基づいて、対話履歴の内容を更新する。ステップ１１１０では、ステップ１１０８で得られた利用者発話の認識結果が、ステップ１１０６における確認内容の否定であればステップ１１１１でスロットの値を全て空白にして、ステップ１１０１に戻る。ステップ１１１０において、利用者の発話が訂正発話用文法によって認識された結果である場合は、ステップ１１１４においてスロット値の修正を行い、ステップ１１０６に戻る。スロット値の修正方法についても後述する。ステップ１１１０において、利用者の発話がステップ１１０６における確認内容を肯定する発話であればステップ１１１２に進み、対話シナリオからタスク実行用コマンドを読み込み、必要に応じてスロット値を埋め込んだ後、タスク実行プログラム１１０でコマンドを実行し、結果を対話制御プログラムは受け取る。ステップ１１１３では、対話シナリオから結果プロンプトを読み込み、必要に応じてコマンドの実行結果を埋め込んだ後、音声として出力する。上記の処理では、ステップ１１１４で修正を行った後、全体の確認であるステップ１１０６に戻っているが、ステップ１１１４で修正を行った値についてのみ確認を行い、利用者から肯定の応答が得られた場合にステップ１１０６に戻るようにしても良い。この場合、修正値の確認用プロンプトを図２に示す対話シナリオに追加し、使用するようにすれば良い。また、修正値に対して利用者から肯定の応答が得られなかった場合は、修正対象となっているスロット値の入力を利用者に促すプロンプトを出力し、利用者からの入力を受けるようにしても良い。この場合に必要となるプロンプトも図２に示す対話シナリオに容易に追加することができる。修正値に対して利用者から肯定の応答が得られなかった場合、スロットを修正前の状態に戻した後、ステップ１１０６に戻るようにすることもできる。さらに、修正を行った後、確認に戻らず、直接次の対話状態に移行するようにしても良い。すなわち、図１１に示す流れ図の場合、ステップ１１１４の後、ステップ１１１２に進むようにすることができる。 FIG. 11 is a flowchart of processing in the dialog control program 109. In the flowchart of FIG. 11, a state immediately after a certain type of dialogue is selected is assumed. 
In this state, all slot values are blank. In FIG. 11, step 1101 selects one slot whose value is blank and sets it as the target slot. To select a slot, it suffices to examine the slots in order from the beginning of the dialogue scenario and check whether each value is blank. Priorities may also be assigned to the slots in the dialogue scenario of FIG. 2, in which case the slots are examined in descending order of priority. In step 1102, the system prompt corresponding to the target slot is read from the dialogue scenario and output as speech. In step 1103, the grammar rule indicated by the user utterance recognition grammar name corresponding to the target slot is sent to the speech recognition program 107, which recognizes the user's utterance based on the specified grammar rule and returns the recognition result to the dialogue control program. The dialogue control program extracts the value for the slot from the recognition result and sets it as the slot value. In step 1104, the contents of the dialogue history 113 are updated based on steps 1102 and 1103. In step 1105, it is checked whether any slot still has a blank value. If all slot values are set, the process proceeds to step 1106.

In step 1106, the confirmation prompt is read from the dialogue scenario, slot values and slot names are embedded as necessary, and the prompt is output as speech. In step 1107, a correction utterance grammar rule is generated based on the contents of the dialogue history and the speech recognition grammar rule. The method for generating the correction utterance grammar rule is described later. In step 1108, the grammar rule indicated by the user confirmation grammar name in the dialogue scenario and the generated correction utterance grammar rule are sent to the speech recognition program, which recognizes the user's utterance based on the specified grammars and returns the recognition result to the dialogue control program. In step 1109, the contents of the dialogue history are updated based on steps 1106 and 1108. In step 1110, if the recognition result obtained in step 1108 negates the confirmation content of step 1106, all slot values are reset to blank in step 1111 and the process returns to step 1101. If step 1110 determines that the user's utterance was recognized by the correction utterance grammar, the slot value is corrected in step 1114 and the process returns to step 1106. The method for correcting the slot value is also described later. If, in step 1110, the user's utterance affirms the confirmation content of step 1106, the process proceeds to step 1112, in which the task execution command is read from the dialogue scenario and slot values are embedded as necessary. The command is executed at 110, and the dialogue control program receives the result. In step 1113, the result prompt is read from the dialogue scenario, the command execution result is embedded as necessary, and the prompt is output as speech.

In the processing above, the process returns to step 1106, the overall confirmation, after the correction in step 1114. Alternatively, only the value corrected in step 1114 may be confirmed, with the process returning to step 1106 once the user gives an affirmative response. In this case, a correction value confirmation prompt may be added to the dialogue scenario shown in FIG. 2 and used. If the user does not affirm the corrected value, a prompt asking the user to input the slot value to be corrected may be output and the user's input accepted. The prompts required in this case can easily be added to the dialogue scenario shown in FIG. 2. If the user does not affirm the corrected value, the slot may also be restored to its state before correction, after which the process returns to step 1106. Further, after the correction it is possible to move directly to the next dialogue state without returning to the confirmation. That is, in the flowchart of FIG. 11, the process may proceed to step 1112 after step 1114.
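
The slot selection of step 1101 and the three-way branch of step 1110 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Slot` class, its `priority` field, and the string labels for the recognition outcome are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    name: str
    value: Optional[str] = None  # None models a blank slot value
    priority: int = 0            # hypothetical field: larger is examined first


def select_target_slot(slots: List[Slot], use_priority: bool = False) -> Optional[Slot]:
    """Step 1101: return the first blank slot, or None when every slot is set."""
    # sorted() is stable, so slots with equal priority keep scenario order.
    ordered = sorted(slots, key=lambda s: -s.priority) if use_priority else slots
    for slot in ordered:
        if slot.value is None:
            return slot
    return None


def next_step(confirmation_result: str) -> int:
    """Step 1110: choose the next step from the kind of recognized utterance."""
    branches = {
        "negative": 1111,     # blank all slots, then return to step 1101
        "correction": 1114,   # correct the slot value, then return to step 1106
        "affirmative": 1112,  # read and execute the task execution command
    }
    return branches[confirmation_result]
```

Because the sort is stable, enabling priorities changes the order only for slots whose priorities actually differ.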

Next, the method for generating the correction utterance grammar rule is described. For simplicity, this description assumes that the dialogue history stores only the most recent information about the system prompt and the user's utterance. The method described below applies equally when the dialogue history also stores earlier information.

There are two ways to generate a correction utterance rule: with a correction utterance template and without one. In the method that does not use a template, the grammar rule specified by the user utterance recognition grammar name in the dialogue history is copied as the correction utterance grammar rule. If the retrieved grammar rule is the one shown in FIG. 4, it is used as is as the correction utterance grammar rule.

Next, the method for generating a correction utterance grammar rule using a correction utterance template is described. The method differs depending on the insertion rule type contained in the template. Assume that the template rules in the correction utterance template are the grammar rules shown in FIGS. 12 and 13, and that another grammar rule is to be inserted between them. In FIG. 13, “null” 1302 denotes an unconditional transition to the next node. Assume further that the insertion rule type in the template is “sentence” and that the grammar rule specified by the user utterance recognition grammar name in the dialogue history is the grammar rule shown in FIG. 4. In this case, the entire grammar rule of FIG. 4 is inserted between the template rules of FIGS. 12 and 13, producing the correction utterance grammar rule shown in FIG. 14. In FIG. 14, the first node 401 and the last node 414 of the grammar rule of FIG. 4 are connected to the nodes 1201 and 1301 marked “X” in FIGS. 12 and 13, respectively. The grammar rule from 1401 to 1405 corresponds to the template rule of FIG. 12, the grammar rule from 1405 to 1413 to the grammar rule of FIG. 4, and the portion from 1413 to 1415 to the template rule of FIG. 13. When a grammar rule is inserted into the correction utterance template, it may first be adjusted by inflecting its ending or by deleting or adding appropriate words. For example, if the grammar rule to be inserted ends with 「お願いします」 (“please”) and the template rule that follows is 「ですが」 (“but”), changing 「お願いします」 to 「お願いしたいの」 (“I would like to ask”) before connecting them yields a natural expression. Such adjustments are easy to implement by preparing a conversion rule for each expression.
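
The “sentence” insertion can be pictured as splicing three state-transition graphs at their “X” nodes. The sketch below is only an illustration under stated assumptions: the `Rule` class, the integer node ids, and the English words standing in for the contents of FIGS. 12, 4, and 13 are all hypothetical, since the figures themselves are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Arc = Tuple[int, int, Optional[str]]  # (source, destination, word); None is a "null" arc


@dataclass
class Rule:
    start: int
    end: int
    arcs: List[Arc]


def concat(first: Rule, second: Rule) -> Rule:
    """Join two rules by identifying first.end with second.start (the "X" nodes)."""
    # Assumes node ids lie between start and end, so shifted ids cannot collide.
    offset = first.end - second.start
    moved = [(s + offset, d + offset, w) for s, d, w in second.arcs]
    return Rule(first.start, second.end + offset, first.arcs + moved)


def insert_sentence(prefix: Rule, grammar: Rule, suffix: Rule) -> Rule:
    """Insertion rule type "sentence": splice the whole grammar between the templates."""
    return concat(concat(prefix, grammar), suffix)


def accepts(rule: Rule, words: List[str]) -> bool:
    """Check whether the word sequence leads from rule.start to rule.end."""
    pending, seen = [(rule.start, 0)], set()
    while pending:
        node, i = pending.pop()
        if (node, i) in seen:
            continue
        seen.add((node, i))
        if node == rule.end and i == len(words):
            return True
        for src, dst, word in rule.arcs:
            if src != node:
                continue
            if word is None:
                pending.append((dst, i))
            elif i < len(words) and words[i] == word:
                pending.append((dst, i + 1))
    return False


# Hypothetical stand-ins for FIG. 12, FIG. 4, and FIG. 13.
prefix = Rule(0, 1, [(0, 1, "no")])
grammar = Rule(0, 2, [(0, 1, "10 o'clock"), (0, 1, "11 o'clock"), (1, 2, "please")])
suffix = Rule(0, 1, [(0, 1, None)])
correction_rule = insert_sentence(prefix, grammar, suffix)
```

With these stand-ins, the spliced rule accepts an utterance only when it begins with the template prefix, which is what lets the recognizer tell a correction apart from an ordinary answer.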

The following describes how the correction utterance grammar rule is generated when the insertion rule type between template rules in the correction utterance template is “phrase”. Assume that the template rules are the grammar rules shown in FIGS. 12 and 15, that the slot name in the user utterance information in the dialogue history is “start time”, and that the grammar rule specified by the user utterance recognition grammar name in the dialogue history is that of FIG. 4. Assume further that the slot name “start time” is written in the grammar rule for the transition from node 1 (401) to node 2 (402) in FIG. 4. In this case, only the rule corresponding to the slot name in the dialogue history is extracted from the grammar rule of FIG. 4 and inserted into the template. That is, the grammar rule from node 1 (401) to node 2 (402) of FIG. 4 is taken out, its first node is connected to node 1201 marked “X” in the template rule of FIG. 12, and its last node is connected to node 1501 marked “X” in the template rule of FIG. 15, producing the correction utterance grammar rule shown in FIG. 16. In FIG. 16, the grammar rule from 1601 to 1605 is the template rule of FIG. 12, the grammar rule from 1607 to 1611 is the template rule of FIG. 15, and the grammar rule from 1605 to 1607 corresponds to the grammar rule extracted based on the dialogue history.
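
A simplified view of the “phrase” case treats a grammar as a sequence of segments, each optionally tagged with a slot name, and keeps only the tagged segment. The segment representation, the function names, and the English words below are assumptions made for illustration; the actual rules are the node graphs of FIGS. 4, 12, and 15.

```python
from itertools import product


def extract_phrase(grammar, slot_name):
    """Keep only the segments of the grammar annotated with slot_name."""
    return [seg for seg in grammar if seg["slot"] == slot_name]


def build_phrase_correction(prefix, grammar, suffix, slot_name):
    """Place the extracted phrase between the two template rules."""
    return prefix + extract_phrase(grammar, slot_name) + suffix


def sentences(grammar):
    """Enumerate every word sequence the segment list accepts."""
    options = [seg["words"] for seg in grammar]
    return [" ".join(w for w in combo if w) for combo in product(*options)]


# Hypothetical stand-ins for FIG. 4 and the template rules of FIGS. 12 and 15.
grammar = [
    {"slot": "start time", "words": ["10 o'clock", "11 o'clock"]},
    {"slot": None, "words": ["please"]},
]
prefix = [{"slot": None, "words": ["no"]}]
suffix = [{"slot": None, "words": ["", "is what I said"]}]  # "" models a null transition
correction = build_phrase_correction(prefix, grammar, suffix, "start time")
```

Compared with the “sentence” case, the untagged remainder of the original grammar (here the word "please") is dropped, so the resulting rule covers short corrections that name only the slot content.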

When the insertion rule type between template rules in the correction utterance template is “phrase” and the grammar rule indicated by the user utterance recognition grammar name in the dialogue history contains several slot names specified by the dialogue history, correction utterance grammar rules covering their combinations are generated. Assume that the grammar rule specified by the dialogue history is that of FIG. 17, and that the slot name corresponding to the rule from node 1701 to 1703 and the slot name corresponding to nodes 1705 to 1707 are the slot names specified by the dialogue history. If the correction utterance template consists, as above, of the template rules shown in FIGS. 12 and 15, the grammar rules shown in FIGS. 18 and 19 are generated as correction utterance grammar rules containing the grammar rule for each slot name. The grammar rule from node 1801 to 1803 in FIG. 18 and the grammar rule from node 1901 to 1903 in FIG. 19 are the grammar rules corresponding to the slot names. In FIGS. 18 and 19, the grammar rules for the particles that follow the slot name rules, from node 1803 to 1805 and from node 1903 to 1905, have also been added. Because the kinds of particles are limited, they can be recorded in advance in the correction utterance rule generation program 113. When the grammar rule corresponding to a slot name is extracted, the grammar rule that follows it is looked up, and comparing the recorded particles with the word names in that rule makes it easy to determine whether a particle follows the slot. If a particle follows, the grammar rule for that particle is extracted together with the slot information and inserted into the correction utterance template. Both a transition with the particle grammar rule connected after the slot name rule and a transition without it may be inserted. Instead of being looked up in the grammar rule indicated by the user utterance recognition grammar name in the dialogue history, the particle grammar rule may be generated with a particle stored in advance as its transition condition and then inserted. The particle information may also be stored on a storage device separately from the program.

As the correction utterance grammar rule containing the rules for both slot names, the rule shown in FIG. 20 is generated. In FIG. 20, the grammar rules from node 2001 to 2003 and from node 2005 to 2007 correspond to the respective slot names, and the grammar rules from node 2003 to 2005 and from node 2007 to 2009 are the particle rules that follow each slot rule. In FIG. 20, the slot name rules appear in the same order as in the grammar rule of FIG. 17. Only one combination is inserted in FIG. 20 because the order is determined by the particles; when the slots are followed by particles that do not constrain the order, arbitrary combinations can be generated and inserted into the correction utterance template. Because the order relation is determined by the kind of particle, information about each particle's kind and order can be recorded as attributes of the particle, and the order of the grammar rules to be inserted can easily be determined from that information. If the dialogue history also records information earlier than the latest system prompt and user utterance, all grammar rules corresponding to each slot name are first extracted from all grammar rules indicated by the user utterance recognition grammar names, and the combinations of slots are then computed.
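
The particle check and the enumeration of slot combinations might look like the sketch below. The particle list, the second slot name “end time”, and the tuple representation of a phrase are assumptions; the source states only that the kinds of particles are limited and can be recorded in advance.

```python
from itertools import combinations

# Hypothetical particle list; the source does not enumerate the particles.
PARTICLES = {"から", "まで", "に", "を"}


def phrase_for_slot(slot_name, following_word):
    """Return the slot phrase, including the next word when it is a known particle."""
    if following_word in PARTICLES:
        return (slot_name, following_word)
    return (slot_name,)


def slot_combinations(phrases):
    """All non-empty selections of phrases, kept in their original grammar order."""
    result = []
    for size in range(1, len(phrases) + 1):
        result.extend(combinations(phrases, size))
    return result


phrases = [phrase_for_slot("start time", "から"), phrase_for_slot("end time", "まで")]
combos = slot_combinations(phrases)
```

For two slots this yields three selections, matching the pattern of one rule per single slot (as in FIGS. 18 and 19) plus one rule containing both in grammar order (as in FIG. 20). Orderings other than the grammar order, which the text allows for particles that do not constrain order, are not generated by this sketch.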

The following describes how the correction utterance grammar rule is generated when the insertion rule type between template rules in the correction utterance template is “slot value”. Here the grammar rules shown in FIGS. 12, 21, and 15 are contained in the correction utterance template in that order as template rules; the insertion rule type between the template rules of FIGS. 12 and 21 is “slot value”, and the insertion rule type between the template rules of FIGS. 21 and 15 is “phrase”. In FIG. 21, because grammar rules are connected both before and after it, the start node 2101 and the end node 2102 are both named “X”. The grammar rule indicated by the user utterance recognition grammar name in the dialogue history is the one shown in FIG. 22, and the grammar rule from node 2201 to 2205 in FIG. 22 corresponds to the slot name contained in the dialogue history. Assume further that the user utterance recognition result contained in the dialogue history is

start time = 10 o'clock

where “start time” is the slot name contained in the dialogue history and “10 o'clock” is the slot value. To generate the correction utterance grammar rule in this case, first, at the position whose insertion rule type is “slot value”, a grammar rule whose transition condition is the slot value corresponding to the slot name in the dialogue history is generated and inserted. Next, at the position whose insertion rule type is “phrase”, the grammar rule corresponding to the slot name in the dialogue history is extracted from the grammar rule specified by the user utterance recognition grammar name in the dialogue history and inserted into the correction utterance template. These operations produce the grammar rule shown in FIG. 23 as the correction utterance grammar rule. In FIG. 23, the rule from 2301 to 2303 is the grammar rule corresponding to the slot value extracted from the user utterance recognition result, and the grammar rule from 2305 to 2309 corresponds to the grammar rule extracted from the grammar rule specified by the dialogue history. In FIG. 23 the extracted grammar rule is inserted unchanged, but a rule from which the transition corresponding to the slot value extracted from the user utterance recognition result has been removed may be inserted instead. That is, “10 o'clock” 2306 in FIG. 23 is the same as the slot value extracted from the user utterance recognition result and can be excluded as a correction utterance, so the rule can use only “11 o'clock” 2307 and “12 o'clock” 2308 as transition conditions.
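
The optional exclusion of the already recognized value can be sketched as below, using the values named for FIG. 23. The function name, its return shape, and the `exclude_recognized` flag are assumptions introduced for illustration.

```python
def build_slot_value_rule(recognized_value, grammar_values, exclude_recognized=True):
    """Return the transition conditions for the two insertion positions."""
    # "slot value" position: the value the system recognized earlier.
    slot_value_part = [recognized_value]
    # "phrase" position: values from the original grammar, optionally
    # without the value the user is presumably correcting.
    phrase_part = [
        value for value in grammar_values
        if not (exclude_recognized and value == recognized_value)
    ]
    return slot_value_part, phrase_part


old_part, new_part = build_slot_value_rule(
    "10 o'clock", ["10 o'clock", "11 o'clock", "12 o'clock"]
)
```

Dropping the recognized value shrinks the search space for the recognizer, at the cost of not covering a user who repeats the same value.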

The following describes how the correction utterance grammar rule is generated when the insertion rule type between template rules in the correction utterance template is “slot name”. The rules shown in FIGS. 12, 24, and 15 are contained in the correction utterance template in that order as template rules; the insertion rule type between the template rules of FIGS. 12 and 24 is “slot name”, and the insertion rule type between the template rules of FIGS. 24 and 15 is “phrase”. In FIG. 24, because inserted rules are connected both before and after it, the start node 2401 and the end node 2402 are both named “X”. The grammar rule indicated by the user utterance recognition grammar name in the dialogue history is the grammar rule of FIG. 22, the grammar rule from node 2201 to 2205 in FIG. 22 corresponds to the slot name contained in the dialogue history, and that slot name is “start time”. To generate the correction utterance grammar rule in this case, first, at the position whose insertion rule type is “slot name”, a grammar rule whose transition condition is the slot name specified by the dialogue history is generated and inserted. Next, at the position whose insertion rule type is “phrase”, the grammar rule corresponding to the slot name in the dialogue history is extracted from the grammar rule specified by the user utterance recognition grammar in the dialogue history and inserted into the correction utterance template. These operations produce the grammar rule shown in FIG. 25 as the correction utterance grammar rule. In FIG. 25, the grammar rule from 2501 to 2503 is the rule generated from the slot name, and the rule from 2505 to 2509 corresponds to the grammar rule extracted from the grammar rule specified by the user utterance recognition grammar name in the dialogue history. A user may utter a slot name in several ways. For example, 「開始時刻」 (“start time”) may be spoken as 「開始時刻」 (“start time”), 「開始」 (“start”), or 「始まり」 (“beginning”). To handle such variants, the expressions corresponding to each slot name may be recorded separately on a storage device, and grammar rules using those expressions as transition conditions may be generated.

The methods above assume that the correction utterance template contains each insertion rule type only once. When the same insertion rule type is specified at several positions, all possible combinations for assigning the grammar rules extracted based on the dialogue history to the insertion positions are computed, and a correction utterance grammar rule is generated for each combination. If fewer grammar rules are extracted than there are insertion positions, the same grammar rule is inserted at more than one position. The order of the insertion rule types can also be taken into account when assigning extracted grammar rules to insertion positions. For example, when a “slot value” or “slot name” has been inserted, normally only the grammar rule related to the immediately preceding “slot value” or “slot name” can be inserted at the following “phrase” position. If the insertion rule type is also annotated with information about the grammar rules inserted at the neighboring positions, the combination of grammar rules to insert can be determined more easily.
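
The enumeration of assignments, together with the ordering constraint between a “slot value” or “slot name” position and the “phrase” position that follows it, can be sketched as follows. Identifying each extracted grammar rule by its slot name, and the second slot name “end time”, are assumptions made for illustration.

```python
from itertools import product


def assignments(slots, position_types):
    """Every assignment of a slot's rule to each insertion position that
    satisfies the ordering constraint described in the text."""
    result = []
    for combo in product(slots, repeat=len(position_types)):
        # A "phrase" directly after a "slot value" or "slot name" position
        # must refer to the same slot as that preceding position.
        consistent = all(
            combo[i] == combo[i - 1]
            for i in range(1, len(combo))
            if position_types[i] == "phrase"
            and position_types[i - 1] in ("slot value", "slot name")
        )
        if consistent:
            result.append(combo)
    return result
```

With a single extracted rule and two positions, `product` repeats that rule at both positions, which corresponds to the case where fewer rules are extracted than there are insertion positions.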

A correction utterance grammar rule can also be generated by inserting the grammar rule corresponding to the extracted slot name into the grammar rule specified by the user utterance recognition grammar name that will be used next. This is achieved by treating the position corresponding to a slot name in the next grammar rule as the insertion position and applying the same method used to insert the grammar rule for a slot name extracted based on the dialogue history into a template. It is also possible to insert into a template both the grammar rule for the slot name extracted based on the dialogue history and the grammar rule for the slot name extracted from the grammar rule specified by the next user utterance recognition grammar name. As in the case of multiple grammar rules corresponding to slot names, this is achieved by computing the possible combinations of slots, generating the grammar rule for each combination, and inserting it into the template. Combinations that contain only the grammar rule for the slot name extracted from the next grammar rule can be omitted. When the user's utterance is recognized with a correction utterance grammar rule generated in this way, step 1114 of FIG. 11 corrects the slot value using the recognition result for the slot name contained in the dialogue history and also assigns the recognition result for the slot name extracted from the next grammar rule to the corresponding slot. In step 1106, a confirmation prompt covering both results must then be output. This is easily achieved by adding confirmation prompts for the possible combinations of slots to the dialogue scenario.

In a correction utterance grammar rule generated by the methods above, the grammar rules extracted from the grammar rule specified by the user utterance recognition grammar in the dialogue history carry slot names, just as the original grammar rule does. However, so that a recognition result from the speech recognition program can be identified as coming either from the original grammar rule or from the correction utterance grammar rule, a special symbol is appended to the slot names in the correction utterance grammar rule. For example, for the slot name “start time”, “_correction” is appended, giving the slot name “start time_correction”. Provided the appended symbol is never used in ordinary slot names, checking for its presence is enough to determine whether the user's utterance is a correction utterance.
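
The marker check amounts to a suffix test on the slot name. In this sketch the constant `CORRECTION_SUFFIX` stands in for the 「＿訂正」 marker, and the function names are assumptions.

```python
CORRECTION_SUFFIX = "_correction"  # stands in for the marker appended to slot names


def mark_as_correction(slot_name):
    """Slot name to use inside a correction utterance grammar rule."""
    return slot_name + CORRECTION_SUFFIX


def is_correction(slot_name):
    """True when the recognition result came from a correction utterance rule."""
    return slot_name.endswith(CORRECTION_SUFFIX)


def base_slot_name(slot_name):
    """Strip the marker so the result can be matched to the original slot."""
    if is_correction(slot_name):
        return slot_name[: -len(CORRECTION_SUFFIX)]
    return slot_name
```

The check is reliable only under the condition stated in the text, namely that no ordinary slot name ends with the marker.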

Next, the method of correcting a slot value when a correction utterance is recognized is described. Let the user utterance recognition result in the dialogue history be

start time = {10 o'clock, 12 o'clock}

and let the recognition result of the correction utterance be

start time_correction = {11 o'clock}

Here “start time” is the slot name and “_correction” is the symbol indicating that the result was recognized with the correction utterance grammar rule. Each recognition result lists several candidates in descending order of confidence, although the confidence values are omitted in this example. In this case, the value of the slot “start time” before correction is “10 o'clock”, and the correction replaces it with the candidate “11 o'clock” from the correction utterance recognition result. The corrected slot value is therefore

start time = 11 o'clock

In this case it is also possible to select the second-ranked candidate of the user utterance recognition result in the dialogue history, regardless of the recognition result of the correction utterance. If the recognition result of the correction utterance is instead

start time_correction = {10 o'clock, 12 o'clock}

the slot value before correction and the first-ranked candidate in the correction utterance recognition result are both “10 o'clock”, so the correction replaces the value with the second-ranked candidate “12 o'clock” of the correction utterance recognition result. The corrected slot value is therefore

start time = 12 o'clock

Here too, the correction may instead use the second-ranked candidate of the user utterance recognition result in the dialogue history. It is also possible to select whichever has the higher confidence: the second-ranked candidate in the correction utterance recognition result or the second-ranked candidate of the user utterance recognition result in the dialogue history. Furthermore, a candidate common to the correction utterance recognition result and the user utterance recognition result in the dialogue history may be selected. In this case, the sum of the confidences is computed for each candidate that appears in both results, and the candidate with the largest sum is selected. Alternatively, candidates common to both results are given the sum of their two confidences as a new confidence, while candidates that appear in only one result keep their original confidence. All candidates in both results are then ranked, and the correction uses the candidate with the highest confidence that differs from the slot value already selected. For candidates that appear in both results, the higher of the two confidences may be used as the new confidence instead. When the new confidence is computed, weighting based on the rank in each recognition result may also be applied. For example, coefficients that take larger values for higher ranks can be prepared and multiplied by the original confidences.
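
One of the merging strategies, in which candidates shared by both results receive a combined confidence and the best candidate that differs from the current slot value is chosen, can be sketched as below. The numeric confidence values are invented for illustration because the source omits them, and the function names are assumptions.

```python
def merge_candidates(history, correction, combine=sum):
    """Merge two n-best lists of (value, confidence) pairs.

    Candidates present in both lists get combine([c1, c2]) as their new
    confidence; candidates present in only one list keep their confidence.
    """
    scores = {}
    for value, confidence in history + correction:
        scores.setdefault(value, []).append(confidence)
    merged = {value: combine(confs) for value, confs in scores.items()}
    return sorted(merged.items(), key=lambda item: -item[1])


def correct_slot(current_value, history, correction, combine=sum):
    """Pick the highest-ranked merged candidate that differs from the current value."""
    for value, _ in merge_candidates(history, correction, combine):
        if value != current_value:
            return value
    return current_value  # no alternative candidate is available


# Confidence values below are hypothetical.
history = [("10 o'clock", 0.6), ("12 o'clock", 0.3)]
first = correct_slot("10 o'clock", history, [("11 o'clock", 0.8)])
second = correct_slot("10 o'clock", history, [("10 o'clock", 0.5), ("12 o'clock", 0.4)])
```

Passing `combine=max` gives the variant in which the higher of the two confidences is used; rank-based weighting could be added by scaling each confidence before merging.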

If the dialogue history also records information earlier than the latest system prompt and user utterance, it is further possible to determine whether correction utterances about the same content are being repeated and, if so, to select a candidate that is not among those already selected. For example, if the history of user utterance recognition results is

start time = {10 o'clock, 11 o'clock, 12 o'clock}
start time_correction = {10 o'clock, 12 o'clock}

the slot value before correction is “10 o'clock” and the slot value after the first correction is “12 o'clock”. If the recognition result of the next correction utterance is

start time_correction = {10 o'clock, 12 o'clock, 11 o'clock}

then “10 o'clock” and “12 o'clock” have already been selected, so “11 o'clock” is selected as the new slot value. The candidates that have been used as slot values can easily be determined from the history of user utterance recognition results, but to make processing more efficient, the history of selected slot values may be recorded in the dialogue history. Whether correction utterances are continuing can easily be determined by checking whether the slot names, system prompts, user utterance recognition grammar names, and so on contained in the recognition results in the dialogue history remain the same.
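
The repeated-correction case reduces to skipping candidates that have already served as the slot value. The sketch below replays the example from the text; the function name and the use of a set for the selection history are assumptions.

```python
def next_candidate(correction_nbest, already_selected):
    """Return the best-ranked candidate that has not been used as the slot value."""
    for value in correction_nbest:
        if value not in already_selected:
            return value
    return None  # every candidate has already been tried


# Values taken from the example in the text: "10 o'clock" was the original
# slot value and "12 o'clock" was chosen by the first correction.
selected = {"10 o'clock", "12 o'clock"}
new_value = next_candidate(["10 o'clock", "12 o'clock", "11 o'clock"], selected)
```

Keeping the selected values in a separate set corresponds to recording the history of selected slot values in the dialogue history for efficiency.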

In the slot value correction above, the correction may be skipped when the confidence of the candidate in the correction utterance is below a predetermined value. The correction may also be performed only when the difference or ratio between the confidence of the slot value already selected and the confidence of the candidate chosen as the corrected value exceeds a predetermined value.

The embodiment above generates correction utterance grammar rules from speech recognition grammar rules in order to detect and correct misrecognition. The same approach applies not only to speech recognition grammar rules but to any dialogue system that analyzes input speech, character strings, or gesture sequences using rules based on state transitions, such as the rules for natural language processing performed after speech recognition.

The embodiment above is an example of a dialogue in which the spoken dialogue system confirms and executes the task after the user has answered all of its questions, but the present invention can also be used with other forms of dialogue. Examples include a dialogue in which confirmation takes place each time the user answers a question from the spoken dialogue system, a dialogue in which confirmation of the user's answer is included in the system's next question, and a dialogue in which the user may utter anything. By changing the range of information stored in the dialogue history and the points at which correction utterance grammar rules are generated and used to suit the dialogue form, the present invention can be applied in the same manner as described in the embodiment above.

A diagram showing the configuration of one embodiment of the present invention.
A diagram showing the format of a dialogue scenario.
A diagram showing an example of a dialogue carried out in the spoken dialogue system.
A diagram showing an example of a speech recognition grammar rule.
A diagram showing the format for describing speech recognition grammar rules.
A diagram showing the format of a speech recognition result.
A diagram showing the format for describing a correction utterance template.
A diagram showing the format for describing the dialogue history.
A diagram showing the format for describing a system prompt in the dialogue history.
A diagram showing the format for describing information on a user's utterance in the dialogue history.
A flowchart of the processing in the dialogue control program.
A diagram showing an example of a template rule for correction utterance rules.
A diagram showing an example of a template rule for correction utterance rules.
A diagram showing an example of a correction utterance grammar rule generated by inserting the entire user utterance recognition grammar rule into the correction utterance rule.
A diagram showing an example of a template rule for correction utterance rules.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule only the rules corresponding to the slot names extracted from the user utterance recognition grammar rule.
A diagram showing an example of a user utterance recognition grammar rule.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule only the rules corresponding to the slot names extracted from the user utterance recognition grammar rule.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule only the rules corresponding to the slot names extracted from the user utterance recognition grammar rule.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule a combination of the rules corresponding to the slot names extracted from the user utterance recognition grammar rule.
A diagram showing an example of a template rule for correction utterance rules.
A diagram showing an example of a user utterance recognition grammar rule.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule the slot values extracted from the recognition result of the user's utterance and the rules corresponding to the slot names extracted from the user utterance recognition grammar rule.
A diagram showing an example of a template rule for correction utterance rules.
A diagram showing an example of a correction utterance grammar rule generated by inserting into the correction utterance rule the slot names extracted from the user utterance recognition grammar rule and the rules corresponding to those slot names.

Explanation of symbols

101 Microphone
102 Voice input unit
103 Speaker
104 Voice output unit
105 Information processing unit
106 Storage unit
107 Speech recognition program
108 Speech synthesis program
109 Dialogue control program
110 Task execution program
111 Correction utterance rule generation program
112 Storage unit
113 Dialogue history
114 Dialogue scenario
115 Speech recognition grammar rules
116 Correction utterance template

Claims

1. A spoken dialogue system having:
means for inputting at least a user's voice;
means for recognizing the input user's voice;
means for converting a message from the system to the user into voice and outputting it;
speech recognition grammar rule storage means for storing speech recognition grammar rules, which are rules for recognizing the user's voice;
dialogue scenario storage means for storing a dialogue scenario, which is information on the content of the dialogue between the user and the system;
dialogue control means for realizing the dialogue by controlling recognition of the user's voice, voice output of messages from the system, and the like, based on the information stored in the dialogue scenario storage means; and
task execution means for executing a task, which is the processing requested by the user, and obtaining its result,
the spoken dialogue system further comprising:
dialogue history storage means for storing a dialogue history, which is a time series of information consisting of the progress of the dialogue, the speech recognition grammar rules used to recognize the user's voice, the results of recognizing the user's voice, and the like;
correction utterance template storage means for storing a correction utterance template, which is used to generate correction utterance grammar rules, that is, the speech recognition grammar rules used to recognize a correction utterance that the user makes in order to correct the system when it has misrecognized the user's voice; and
means for generating the correction utterance grammar rules using the information in the dialogue history and the correction utterance template,
wherein, when the user's voice is to be recognized, the correction utterance grammar rules are generated using the information in the dialogue history and the correction utterance template, and the generated correction utterance grammar rules are used to recognize the user's voice, and
when the user's utterance is recognized using the correction utterance grammar rules, the user's voice is determined to be a correction utterance, and the system's misrecognition is corrected based on the information in the dialogue history and the recognition result of the correction utterance.

2. In the spoken dialogue system according to claim 1, as a method of generating the correction utterance grammar rules:
a method of duplicating the speech recognition grammar rules recorded in the dialogue history and using the copies as the correction utterance grammar rules;
a method of embedding the speech recognition grammar rules recorded in the dialogue history in the correction utterance template;
a method of extracting, from the speech recognition grammar rules recorded in the dialogue history, the speech recognition grammar rules for the words or phrases on which the system is focusing, based on the dialogue scenario, and embedding them in the correction utterance template;
a method of extracting, from the speech recognition grammar rules recorded in the dialogue history, the speech recognition grammar rules for the words or phrases on which the system is focusing, based on the dialogue scenario, and embedding in the correction utterance template the extracted speech recognition grammar rules together with names representing the types of the words or phrases on which the system is focusing;
a method of extracting, from the speech recognition grammar rules recorded in the dialogue history, the speech recognition grammar rules for the words or phrases on which the system is focusing, based on the dialogue scenario, and embedding in the correction utterance template the extracted speech recognition grammar rules together with the words or phrases on which the system is focusing that are contained in the result of recognizing the user's voice;
a method of extracting, from the speech recognition grammar rules recorded in the dialogue history, the speech recognition grammar rules for the words or phrases on which the system is focusing, based on the dialogue scenario, and embedding the extracted speech recognition grammar rules in the speech recognition grammar rules used subsequently in the dialogue; and
a method of extracting, from the speech recognition grammar rules recorded in the dialogue history, the speech recognition grammar rules for the words or phrases on which the system is focusing, based on the dialogue scenario, also extracting, from the speech recognition grammar rules used subsequently in the dialogue, the speech recognition grammar rules for the words or phrases on which the system is focusing, and embedding the extracted speech recognition grammar rules in the correction utterance template,
the spoken dialogue system being characterized in that the correction utterance grammar rules are generated using at least one of these methods.
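As an informal illustration of the third method listed in claim 2 (extracting the rules for the focused words or phrases and embedding them in the template), a minimal sketch might look like the following. The grammar representation (a dictionary from rule name to a list of alternatives), the `{slot}` placeholder syntax, and the function name `generate_correction_rules` are hypothetical; the actual rule and template formats are those shown in the drawings and are not reproduced here.

```python
from typing import Dict, List

# Hypothetical grammar format: rule name -> list of alternative expansions.
Grammar = Dict[str, List[str]]


def generate_correction_rules(history_grammar: Grammar,
                              focused_slots: List[str],
                              templates: List[str]) -> Grammar:
    """Build correction utterance rules for the slots the system is focusing on.

    history_grammar: rules recorded in the dialogue history.
    focused_slots:   slot names taken from the dialogue scenario.
    templates:       template strings containing a "{slot}" placeholder.
    """
    rules: Grammar = {}
    for slot in focused_slots:
        if slot not in history_grammar:
            continue  # no rule for this slot was recorded in the history
        # Copy the extracted rule so the history itself is left unchanged.
        rules[slot] = list(history_grammar[slot])
        # Embed a reference to the extracted rule in each template.
        rules[f"correction_{slot}"] = [
            template.format(slot=f"<{slot}>") for template in templates
        ]
    return rules


history = {"city": ["Boston", "Austin"], "date": ["today", "tomorrow"]}
correction = generate_correction_rules(
    history, ["city"], ["no {slot}", "i said {slot}"]
)
```

Here only the `city` rule is carried over, because it is the only slot passed as focused; the `date` rule stays out of the generated grammar, which keeps the correction grammar small.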

3. In the spoken dialogue system according to claim 1, as a method of correcting the misrecognition when it is determined that a user's correction utterance has been input:
a method of taking the second candidate in the recognition result of the user's voice recorded in the dialogue history as the corrected word or phrase;
a method of comparing the first candidate word or phrase in the recognition result of the user's voice recorded in the dialogue history with the first candidate in the recognition result of the correction utterance and, if the two differ, taking the first candidate in the recognition result of the correction utterance as the corrected word or phrase, or, if they are the same, taking the second candidate in the recognition result of the correction utterance as the corrected word or phrase;
a method of comparing the first candidate word or phrase in the recognition result of the user's voice recorded in the dialogue history with the first candidate in the recognition result of the correction utterance and, if the two differ, taking the first candidate in the recognition result of the correction utterance as the corrected word or phrase, or, if they are the same, taking the second candidate in the recognition result of the user's voice recorded in the dialogue history as the corrected word or phrase; and
a method of computing, for each candidate word or phrase, an evaluation value that integrates whether the candidate is contained in both the recognition result of the user's voice recorded in the dialogue history and the recognition result of the correction utterance, the reliability and rank of the candidate, and the like, and taking as the corrected word or phrase the candidate that differs from the first candidate in the recognition result of the user's voice recorded in the dialogue history and has the highest evaluation value,
the spoken dialogue system being characterized in that the misrecognition is corrected using at least one of these methods.
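Two of the correction methods in claim 3 can be sketched as follows, purely as an illustration. The N-best representation (a list of word and reliability pairs, best first) and the function names are assumptions. In particular, the claim does not fix how the evaluation value is computed, so summing reliabilities across both results is only one possible choice, picked here because it naturally favors candidates that appear in both lists.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical N-best list: (word or phrase, reliability), best candidate first.
NBest = List[Tuple[str, float]]


def correct_by_comparison(original: NBest, correction: NBest) -> Optional[str]:
    """Compare the two first candidates; fall back to the correction's second."""
    if not original or not correction:
        return None
    if original[0][0] != correction[0][0]:
        return correction[0][0]
    # Same first candidate again: use the second candidate of the correction.
    return correction[1][0] if len(correction) > 1 else None


def correct_by_evaluation(original: NBest, correction: NBest) -> Optional[str]:
    """Pick the best-scoring candidate other than the original first candidate."""
    scores: Dict[str, float] = {}
    for nbest in (original, correction):
        for word, reliability in nbest:
            # Assumed evaluation value: sum of reliabilities over both results.
            scores[word] = scores.get(word, 0.0) + reliability
    if original:
        scores.pop(original[0][0], None)  # the misrecognized value is excluded
    if not scores:
        return None
    return max(scores, key=lambda word: scores[word])


first_pass = [("Boston", 0.6), ("Austin", 0.5), ("Houston", 0.2)]
second_pass = [("Boston", 0.55), ("Austin", 0.5)]
```

With these inputs both functions select "Austin": the first because the top candidates coincide, the second because "Austin" has the highest combined score once "Boston" is excluded.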

4. The spoken dialogue system according to claim 3, characterized in that, when correction utterances by the user are recognized in succession, the words or phrases that were selected as the correct candidate or as the corrected word or phrase within the range of the successive correction utterances and the immediately preceding user utterance are excluded, and the corrected word or phrase is then selected.
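The exclusion described in claim 4 can be illustrated with a small sketch. It assumes that each recognition result is a plain ranked list of words and that the set of excluded words starts with the first candidate of the utterance preceding the corrections; the function names are hypothetical.

```python
from typing import List, Optional, Set


def select_excluding(candidates: List[str], excluded: Set[str]) -> Optional[str]:
    """Return the highest-ranked candidate that has not been excluded."""
    for word in candidates:
        if word not in excluded:
            return word
    return None


def run_consecutive_corrections(first_result: List[str],
                                correction_results: List[List[str]]
                                ) -> List[Optional[str]]:
    """Select a corrected word for each correction utterance in a row."""
    # Start by excluding the value the system chose for the preceding utterance.
    excluded: Set[str] = set(first_result[:1])
    chosen: List[Optional[str]] = []
    for nbest in correction_results:
        word = select_excluding(nbest, excluded)
        chosen.append(word)
        if word is not None:
            # A value the user corrects again must not be offered a second time.
            excluded.add(word)
    return chosen


selected = run_consecutive_corrections(
    ["Boston", "Austin"],
    [["Boston", "Austin", "Houston"], ["Austin", "Boston", "Houston"]],
)
```

In this run the first correction yields "Austin" and the second yields "Houston", since both "Boston" and "Austin" have already been rejected by the user.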

5. In the spoken dialogue system according to claim 1, as a method of returning to the original dialogue after detecting a user's correction utterance and correcting the misrecognition:
a method of returning to the dialogue state in which the correction utterance was detected;
a method of proceeding to the state following the dialogue state in which the correction utterance was detected;
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, returning to the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, repeating the output of a message prompting the user to re-enter the corrected word or phrase, the recognition of the user's voice, and the confirmation of the recognized word or phrase until an affirmative response is obtained from the user;
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, returning to the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, cancelling the correction of the word or phrase and returning to the dialogue state in which the correction utterance was detected;
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, returning to the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, cancelling the correction of the word or phrase and then returning to the dialogue state for entering the target word or phrase;
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, proceeding to the state following the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, repeating the output of a message prompting the user to re-enter the corrected word or phrase, the recognition of the user's voice, and the confirmation of the recognized word or phrase until an affirmative response is obtained from the user;
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, proceeding to the state following the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, cancelling the correction of the word or phrase and returning to the dialogue state in which the correction utterance was detected; and
a method of outputting to the user a message confirming the corrected word or phrase and, if an affirmative response is obtained from the user, proceeding to the state following the dialogue state in which the correction utterance was detected, or, if a negative response is obtained, cancelling the correction of the word or phrase and then returning to the dialogue state for entering the target word or phrase,
the spoken dialogue system being characterized in that at least one of these methods is used to return to the original dialogue.

6. The spoken dialogue system according to claim 5, characterized in that, when a correction utterance is input by the user again after a message confirming the corrected word or phrase has been output to the user, the word or phrase is corrected using the newly recognized word or phrase, and a confirmation message is then output to the user again.