JP2001109493A - Voice interactive device

Info

Publication number: JP2001109493A
Application number: JP28931699A
Authority: JP
Inventors: Keisuke Watanabe (渡邉 圭輔); Akito Nagai (永井 明人); Yasushi Ishikawa (石川 泰)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-10-12
Filing date: 1999-10-12
Publication date: 2001-04-20
Anticipated expiration: 2019-10-12
Also published as: JP3941299B2

Abstract

PROBLEM TO BE SOLVED: Conventional voice interactive devices do not consider the relationships among keywords that span multiple utterances. To raise the dialog success rate they must confirm the recognition result with the user after every utterance, so confirmation exchanges multiply and both the user's convenience and the naturalness of the dialog suffer. SOLUTION: The voice interactive device is provided with: a dialog procedure storage part that stores, for each dialog state, the recognition target vocabulary, the system response, the answers expected to that system response, and the transition destination dialog state corresponding to each answer; a voice recognition part that recognizes speech using the recognition target vocabulary of each dialog state in the dialog procedure storage part and outputs plural results; a transition destination dialog state determination operation deciding part that derives transition destination dialog state hypotheses from the recognition results of the voice recognition part and the contents of the dialog procedure storage part, decides to finalize them to one state when the hypotheses satisfy a prescribed condition, decides to hold the decision when they do not, and outputs the transition destination dialog state hypotheses; and a dialog operation execution part that outputs a system response confirming the recognition result of the transition destination dialog state hypothesis when the hypothesis is to be finalized, and outputs the system response of the transition destination dialog state hypothesis when the decision is held.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech dialog processing device used in natural-language man-machine interfaces.

[0002]

2. Description of the Related Art Voice interactive devices, with which a user obtains needed information through spoken dialog, are becoming increasingly important. In such a device it is important to recognize the keywords contained in the user's input with high accuracy, and to control the dialog so that the user obtains the needed information efficiently.

[0003] To recognize keywords with a high accuracy rate, methods have been proposed that exploit the relationships among multiple keywords contained in the input speech. For example, FIG. 14 shows the recognition candidate extraction device disclosed in Japanese Patent Laid-Open No. 7-92994. In this conventional recognition candidate extraction device, the speech recognition device recognizes multiple keywords from continuous speech that contains multiple related keywords, and outputs several recognition results for each keyword in descending order of recognition likelihood.

[0004] From the recognition results output by the speech recognition device, the candidate extraction processing device uses predetermined combination information between keywords to extract only the keyword sets that match the combination information, and outputs them as recognition candidates. Rejecting combinations of unrelated keywords improves the accuracy rate of the recognition candidates.
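The combination filter described in this paragraph can be sketched as follows. This is a minimal illustration, not the method of the cited publication: the function name `extract_candidates`, the data layout, and ranking a combination by the sum of its likelihoods are all assumptions, and the sample data is hypothetical.

```python
from itertools import product

def extract_candidates(keyword_results, allowed_combinations):
    """Keep only keyword tuples that appear in the combination information.

    keyword_results: one list per keyword slot, each [(word, likelihood), ...].
    allowed_combinations: set of word tuples that are permitted together.
    """
    candidates = []
    for combo in product(*keyword_results):
        words = tuple(word for word, _ in combo)
        if words in allowed_combinations:
            # Assumption: a combination is ranked by the sum of its likelihoods.
            candidates.append((words, sum(score for _, score in combo)))
    return sorted(candidates, key=lambda c: c[1], reverse=True)

# Hypothetical data: prefecture and city candidates for one utterance.
results = [[("Shiga", 0.88), ("Saga", 0.87)], [("Imari", 0.91), ("Yokaichi", 0.11)]]
allowed = {("Saga", "Imari"), ("Shiga", "Yokaichi")}
ranked = extract_candidates(results, allowed)
```

Unrelated pairs such as ("Shiga", "Imari") never reach the ranked list, which is the rejection step the paragraph describes.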

[0005] Further, the confirmation processing device finalizes the recognition candidate output from the candidate extraction processing device by reading it back to the user for confirmation. If the read-back candidate is judged to be incorrect, the speech recognition device recognizes the first of the keywords and the confirmation processing device finalizes that result. In recognizing the second and subsequent keywords, only keywords that can be combined with the finalized keyword are extracted as recognition candidates, which improves the accuracy rate of the recognition candidates.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION However, the conventional candidate extraction processing device described above does not consider the relationships among keywords that span multiple utterances. To improve the recognition rate over all the input items obtained through several exchanges with the user, and thereby raise the dialog success rate, the device must confirm with the user after every utterance and advance the dialog deterministically. Confirmation exchanges therefore multiply, which impairs the user's convenience and the naturalness of the dialog.

[0007] SUMMARY OF THE INVENTION The present invention has been made to solve the above problem, and its object is to obtain a voice interactive device that can improve the recognition rate without performing a finalizing process after every utterance.

[0008]

MEANS FOR SOLVING THE PROBLEMS A voice interactive device according to the present invention comprises a speech recognition unit, a dialog procedure storage unit, a transition destination dialog state determination operation determining unit, and a dialog operation execution unit, and provides the user with required information through spoken dialog. The dialog procedure storage unit defines and stores, for each dialog state, the recognition target vocabulary, the system response, the answers expected to that system response, and the transition destination dialog state corresponding to each answer. The speech recognition unit performs speech recognition on the input speech using the recognition target vocabulary for each dialog state stored in the dialog procedure storage unit, and outputs a plurality of recognition results. The transition destination dialog state determination operation determining unit derives transition destination dialog states from the recognition results of the speech recognition unit and the contents of the dialog procedure storage unit. When the transition destination dialog state hypotheses satisfy a predetermined condition, it decides to finalize them to a single state; when they do not, it decides to suspend finalization. In either case it outputs the transition destination dialog state hypotheses. When the hypotheses are to be finalized to one, the dialog operation execution unit outputs a system response that confirms the recognition result of the transition destination dialog state hypothesis received from the determining unit; when finalization is suspended, it outputs the system response of the transition destination dialog state hypothesis.

[0009] The voice interactive device according to the present invention may further include a dialog state transition storage unit, a transition destination dialog state determination unit, and a provisional transition destination dialog state determination unit. The transition destination dialog state determination operation determining unit decides whether to finalize to one, or to suspend finalization of, the transition destination dialog state hypotheses determined from the recognition results of the speech recognition unit and the contents of the dialog state transition storage unit or the dialog procedure storage unit, and outputs the hypotheses. When the hypotheses are to be finalized to one, the transition destination dialog state determination unit receives them, finalizes the transition destination dialog state by confirming the recognition result with the user, outputs the finalized state, and rewrites the transition destination dialog state hypotheses stored in the dialog state transition storage unit. When finalization is suspended, the provisional transition destination dialog state determination unit receives the hypotheses, determines and outputs a provisional transition destination dialog state, and rewrites the transition destination dialog state hypotheses in the dialog state transition storage unit. The dialog state transition storage unit stores the dialog state transition history from the start of the dialog and the transition destination dialog state hypotheses received from the transition destination dialog state determination unit or the provisional transition destination dialog state determination unit. The dialog operation execution unit receives the transition destination dialog state from either of those two units, outputs the system response defined for that state, and outputs the recognition target vocabulary defined for that state to the speech recognition unit. The speech recognition unit performs speech recognition on the input speech using the recognition target vocabulary received from the dialog operation execution unit, and outputs a plurality of recognition results.

[0010] Further, in the voice interactive device according to the present invention, the speech recognition unit is configured to output a plurality of recognition results together with their scores, and the transition destination dialog state determination operation determining unit decides whether to perform the finalizing operation according to the scores of the recognition results input from the speech recognition unit.

[0011] Further, in the voice interactive device according to the present invention, each dialog state stored in the dialog procedure storage unit describes whether a finalizing operation must be performed beforehand in order to transition from another dialog state into that state. The transition destination dialog state determination operation determining unit decides to perform the finalizing operation when the transition destination dialog state hypothesis determined from the recognition results input from the speech recognition unit, the contents of the dialog state transition storage unit, and the dialog procedure is a state that requires a prior finalizing operation.

[0012] Further, in the voice interactive device according to the present invention, the transition destination dialog state determination operation determining unit decides to perform the finalizing operation when, even though the user has not yet entered all the input items, finalizing the recognition result from the speech recognition unit would uniquely determine the item values of the items not yet entered.

[0013] Further, in the voice interactive device according to the present invention, the transition destination dialog state determination operation determining unit decides whether to perform the finalizing operation according to the system responses defined for the transition destination dialog state hypotheses.

[0014] Further, in the voice interactive device according to the present invention, the transition destination dialog state determination operation determining unit decides to perform the finalizing operation when the transition destination dialog state hypotheses have no system response in common. When a common system response exists, it outputs only the hypotheses that share the common system response as the transition destination dialog state hypotheses.
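One possible reading of this rule is sketched below. The text does not say how "common" is computed when only some hypotheses share a response, so this sketch assumes the most widely shared response is used and that it must be shared by at least two hypotheses; the function name and data layout are hypothetical.

```python
from collections import Counter

def filter_by_common_response(hypotheses, responses):
    """hypotheses: list of state ids; responses: state id -> set of system responses.

    Returns ("finalize", all hypotheses) when no response is shared, otherwise
    ("suspend", hypotheses that have the most widely shared response).
    """
    counts = Counter(r for h in hypotheses for r in set(responses[h]))
    if not counts:
        return "finalize", list(hypotheses)
    response, n = counts.most_common(1)[0]
    if n < 2:
        # Assumption: "common" means shared by at least two hypotheses.
        return "finalize", list(hypotheses)
    return "suspend", [h for h in hypotheses if response in responses[h]]

# Hypothetical response identifiers for three hypotheses.
responses = {"S15": {"ask_city"}, "S16": {"ask_city"}, "S17": {"ask_town"}}
decision = filter_by_common_response(["S15", "S16", "S17"], responses)
```

Keeping only hypotheses with a shared response means a single system utterance remains valid whichever of the surviving hypotheses turns out to be correct.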

[0015] Further, in the voice interactive device according to the present invention, each dialog state stored in the dialog procedure storage unit can describe a plurality of system responses. When a transition destination dialog state is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs, from among the system responses defined for the input state, one that is shared with the system responses defined for the transition destination dialog state hypotheses stored in the dialog state transition storage unit.

[0016] Further, in the voice interactive device according to the present invention, the transition destination dialog state determination operation determining unit decides to perform the finalizing operation when the size of the combined recognition target vocabulary of all the transition destination dialog state hypotheses is larger than a predetermined criterion.

[0017] Further, in the voice interactive device according to the present invention, the transition destination dialog state determination operation determining unit refers to the dialog state transition storage unit and decides to perform the finalizing operation when the length of the transition sequence from the last finalized dialog state to the transition destination dialog state hypothesis is greater than or equal to a predetermined reference value.
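The two size-based criteria (combined vocabulary size in [0016], transition sequence length in [0017]) can be sketched as one check. Combining them in a single function, the function name, and the parameter names are assumptions made for illustration; the source presents them as separate variations.

```python
def should_finalize(hypothesis_vocabs, path_length, max_vocab_size, max_path_length):
    """Return True when either size-based criterion calls for finalization.

    hypothesis_vocabs: one set of words per transition destination hypothesis.
    path_length: number of unconfirmed transitions since the last finalized state.
    """
    combined = set().union(*hypothesis_vocabs)
    # [0016]: combined vocabulary strictly larger than the criterion.
    too_many_words = len(combined) > max_vocab_size
    # [0017]: transition sequence at or above the reference value.
    too_long = path_length >= max_path_length
    return too_many_words or too_long
```

Both limits bound the cost of keeping hypotheses open: a large combined vocabulary makes recognition harder, and a long unconfirmed sequence makes a late correction more expensive for the user.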

[0018]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiment 1. FIG. 1 is a block diagram of Embodiment 1 of the voice interactive device of the present invention. Reference numeral 1 denotes a speech recognition unit that performs speech recognition on the input speech using the recognition target vocabulary received from the dialog operation execution unit described later, and outputs a plurality of recognition results together with their scores. 2 denotes a dialog procedure storage unit that defines, for each dialog state, the recognition target vocabulary, the system response, and the transition destination dialog state for each speech recognition result. 3 denotes a dialog state transition storage unit that stores the dialog state transition history from the start of the dialog and the transition destination dialog state hypotheses. 4 denotes a transition destination dialog state determination operation determining unit that receives the recognition results from the speech recognition unit, decides whether to finalize to one the transition destination dialog state hypotheses determined from the recognition results, the contents of the dialog state transition storage unit, and the dialog procedure, and outputs the hypotheses to the transition destination dialog state determination unit described later when finalizing, or to the provisional transition destination dialog state determination unit when suspending finalization.

[0019] 5 denotes a transition destination dialog state determination unit that receives the transition destination dialog state hypotheses from the transition destination dialog state determination operation determining unit, finalizes the transition destination dialog state by confirming the recognition result with the user, and outputs the finalized state. It also deletes all the transition destination dialog state hypotheses stored in the dialog state transition storage unit and writes the finalized transition destination dialog state to that unit.

[0020] 6 denotes a provisional transition destination dialog state determination unit that receives the transition destination dialog state hypotheses from the transition destination dialog state determination operation determining unit, determines and outputs a provisional transition destination dialog state based on the scores of the recognition results, and adds the transition destination dialog state hypotheses to the dialog state transition storage unit. 7 denotes a dialog operation execution unit that receives the transition destination dialog state from the transition destination dialog state determination unit or the provisional transition destination dialog state determination unit, outputs the system response defined for that state, and outputs to the speech recognition unit both the recognition target vocabulary defined for that state and the recognition target vocabularies defined for the transition destination dialog state hypotheses stored in the dialog state transition storage unit.

[0021] The operation is described concretely below for the case where the present invention is used as a telephone directory assistance voice interactive device. In such a device, the user converses with the device by voice to enter the item values needed for directory assistance, such as the prefecture name, the municipality name, the business type, and the target name. The device searches for a telephone number based on the entered item values and gives the number to the user.

[0022] FIG. 2 shows an example of the dialog states held in the dialog procedure storage unit. For example, dialog state S1 defines R1 "Please name a prefecture" as its system response and the prefecture names as its recognition target vocabulary V1. It also defines S2 as the transition destination dialog state when the recognition result is "Hokkaido". The following description assumes that the number N of recognition results output by the speech recognition unit is 5, that the threshold applied by the transition destination dialog state determination operation determining unit to the score of the top-ranked recognition result is 0.5, and that the dialog start state is S1.
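A dialog state as described here (system response, recognition target vocabulary, transition table) could be represented as below. This is a hedged sketch of one possible data layout, not the structure shown in FIG. 2: the class name `DialogState`, the field names, and the truncated vocabulary are assumptions. Only the "Hokkaido" to S2 entry and the values N = 5, threshold 0.5, and start state S1 come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    response: str                 # system response, e.g. R1
    vocabulary: set               # recognition target vocabulary, e.g. V1
    transitions: dict = field(default_factory=dict)  # recognition result -> next state

# Dialog procedure storage: state id -> definition (only S1 is sketched).
procedure = {
    "S1": DialogState(
        response="Please name a prefecture",
        vocabulary={"Hokkaido", "Shiga", "Saga"},  # truncated for illustration
        transitions={"Hokkaido": "S2"},
    ),
}

N_BEST = 5             # number of recognition results per utterance
SCORE_THRESHOLD = 0.5  # threshold on the top-ranked score
START_STATE = "S1"
```

Keeping the transition table inside each state lets the determining unit look up destination hypotheses with a single dictionary access per recognition result.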

[0023] Based on the dialog start state S1, the dialog operation execution unit starts the dialog by outputting the system response R1 "Please name a prefecture" to the user and outputting the recognition target vocabulary V1 to the speech recognition unit.

[0024] When the user speaks, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 and outputs the recognition results and their scores. For example, if the user says "It's Saga", the unit outputs five candidates as the recognition results: "Shiga (0.88), Saga (0.87), Kagawa (0.73), Kanagawa (0.52), Kagoshima (0.50)". The value in parentheses is the score of each recognition candidate; the closer to 1, the better the score.

[0025] When the recognition results are input, the transition destination dialog state determination operation determining unit refers to the transition table T1 defined for the current dialog state S1 and obtains five dialog states, S15, S16, S17, S18, and S19, as the transition destination dialog state hypotheses for the five recognition results. Next, because the score of the top-ranked recognition result "Shiga" is 0.88, which is at or above the threshold of 0.5, it decides to suspend finalization and outputs the five transition destination dialog state hypotheses to the provisional transition destination dialog state determination unit.
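The first-turn decision can be sketched as a threshold test on the best score. The function name is hypothetical, and the table `T1` below is partly inferred: the text states only that "Shiga" leads to S16, so the other four assignments are assumptions chosen to be consistent with the later city examples.

```python
def decide_first_turn(results, transition_table, threshold=0.5):
    """results: [(word, score), ...]; returns (action, hypothesis states)."""
    hypotheses = [transition_table[word] for word, _ in results]
    top_score = max(score for _, score in results)
    # At or above the threshold: keep all hypotheses open (suspend).
    action = "suspend" if top_score >= threshold else "finalize"
    return action, hypotheses

# Assumed contents of transition table T1 (only Shiga -> S16 is stated in the text).
T1 = {"Kanagawa": "S15", "Shiga": "S16", "Kagawa": "S17", "Saga": "S18", "Kagoshima": "S19"}
results = [("Shiga", 0.88), ("Saga", 0.87), ("Kagawa", 0.73),
           ("Kanagawa", 0.52), ("Kagoshima", 0.50)]
action, hypotheses = decide_first_turn(results, T1)
```

With the example scores the action is "suspend", so the misrecognized top candidate "Shiga" is not yet committed and "Saga" remains available as a hypothesis.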

[0026] The provisional transition destination dialog state determination unit selects one provisional transition destination dialog state hypothesis from the input hypotheses based on the scores of the recognition results. For example, from the five transition destination dialog state hypotheses S15, S16, S17, S18, and S19 described above, it selects S16, the hypothesis for "Shiga", which has the best score, and outputs it to the dialog operation execution unit. It also adds all the transition destination dialog state hypotheses to the dialog state transition storage unit. For example, FIG. 3 shows the result of adding the five hypotheses S15, S16, S17, S18, and S19 to the dialog state transition storage unit.
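The provisional selection step amounts to picking the best-scoring hypothesis while recording all of them. In this sketch the storage unit is modeled as a plain list and the state assignments other than Shiga to S16 are assumed; the function name is hypothetical.

```python
def select_provisional(hypotheses, store):
    """hypotheses: [(state, word, score), ...]; store: list modeling the storage unit.

    Returns the state with the best score and appends every hypothesis to store.
    """
    best_state = max(hypotheses, key=lambda h: h[2])[0]
    store.extend(state for state, _, _ in hypotheses)
    return best_state

store = ["S1"]  # transition history so far: only the start state
hypotheses = [("S16", "Shiga", 0.88), ("S18", "Saga", 0.87), ("S17", "Kagawa", 0.73),
              ("S15", "Kanagawa", 0.52), ("S19", "Kagoshima", 0.50)]
provisional = select_provisional(hypotheses, store)
```

The dialog proceeds as if S16 were correct, but the other four states stay in the store so a later utterance can still select them.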

[0027] When dialog state S16 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R16 "Please say the city name" to the user. It also outputs to the speech recognition unit the recognition target vocabulary V16 defined for dialog state S16, together with the recognition target vocabularies V15, V17, V18, and V19 defined for the four transition destination dialog state hypotheses S15, S17, S18, and S19 stored in the dialog state transition storage unit.
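The vocabulary handed to the recognizer for the next turn is the union of the selected state's vocabulary and those of the stored hypotheses. A minimal sketch follows; the function name and the vocabulary contents are hypothetical ("Takamatsu" in particular does not appear in the source and is only a placeholder entry for V17).

```python
def vocabulary_for_next_turn(selected, stored_hypotheses, vocab_of):
    """Union of the selected state's vocabulary and every stored hypothesis's."""
    combined = set(vocab_of[selected])
    for state in stored_hypotheses:
        combined |= vocab_of[state]
    return combined

# Hypothetical, heavily truncated vocabularies V15 to V19.
vocab_of = {
    "S15": {"Isehara", "Hiratsuka"},
    "S16": {"Yokaichi"},
    "S17": {"Takamatsu"},
    "S18": {"Imari"},
    "S19": {"Izumi"},
}
active = vocabulary_for_next_turn("S16", ["S15", "S17", "S18", "S19"], vocab_of)
```

Because "Imari" is in the active vocabulary even though S16 (Shiga) was provisionally selected, the next utterance can reveal that the first turn was misrecognized.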

[0028] If the user answers "It's Imari City" to the system response "Please say the city name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V15, V16, V17, V18, and V19, and outputs "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)" as the recognition results.

[0029] The transition destination dialog state determination operation determining unit refers to the transition tables T15, T16, T17, T18, and T19 defined for the transition destination dialog state hypotheses S15, S16, S17, S18, and S19, and obtains five dialog states, S152, S153, S163, S182, and S192, as the transition destination dialog state hypotheses for the recognition results. The recognition result "Yokaichi", which corresponds to S163, the hypothesis reached from the currently selected dialog state S16, has a score of 0.11, which is at or below the threshold. The unit therefore decides to finalize the transition destination dialog state and outputs the hypotheses S152, S153, S163, S182, and S192 to the transition destination dialog state determination unit.
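The second-turn decision looks only at hypotheses reachable from the currently selected state. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the assignment of "Isehara", "Hiratsuka", and "Izumi" to S152, S153, and S192 is inferred rather than stated, and treating a score exactly equal to the threshold as "suspend" is an arbitrary tie-breaking choice.

```python
def expand_hypotheses(results, tables):
    """Map each recognition result through every open state's transition table."""
    return [(src, table[word], word, score)
            for word, score in results
            for src, table in tables.items() if word in table]

def decide(selected_state, hypotheses, threshold=0.5):
    """Finalize when the selected state's best continuation scores below threshold."""
    best = max((s for src, _, _, s in hypotheses if src == selected_state), default=0.0)
    return "suspend" if best >= threshold else "finalize"

# Assumed transition tables T15 to T19 (only Yokaichi -> S163 and Imari -> S182 are stated).
tables = {"S15": {"Isehara": "S152", "Hiratsuka": "S153"}, "S16": {"Yokaichi": "S163"},
          "S17": {}, "S18": {"Imari": "S182"}, "S19": {"Izumi": "S192"}}
results = [("Imari", 0.91), ("Izumi", 0.76), ("Isehara", 0.30),
           ("Yokaichi", 0.11), ("Hiratsuka", 0.09)]
hyps = expand_hypotheses(results, tables)
action = decide("S16", hyps)
```

Here the only continuation of S16 is "Yokaichi" at 0.11, so the action is "finalize" and the confirmation step is triggered.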

[0030] When the transition destination dialog state hypotheses are input, the transition destination dialog state determination unit finalizes the transition destination dialog state by, for example, asking the user to confirm the recognition results in order, starting from the one with the best score. When the hypotheses S152, S153, S163, S182, and S192 are input, it first asks the user "Is it Imari City?", and when the user answers "Yes", the transition destination dialog state is finalized as S182.
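The confirmation loop can be sketched as iterating over hypotheses in descending score order until the user accepts one. The callback `ask`, the prompt wording, and returning `None` when every candidate is rejected are assumptions; the text does not describe the all-rejected case.

```python
def confirm(hypotheses, ask):
    """hypotheses: [(state, word, score), ...]; ask: callable(prompt) -> bool.

    Returns the first state the user accepts, or None if all are rejected.
    """
    for state, word, _ in sorted(hypotheses, key=lambda h: h[2], reverse=True):
        if ask(f"Is it {word}?"):
            return state
    return None

hypotheses = [("S152", "Isehara", 0.30), ("S182", "Imari", 0.91), ("S192", "Izumi", 0.76),
              ("S163", "Yokaichi", 0.11), ("S153", "Hiratsuka", 0.09)]
asked = []

def ask(prompt):
    # Stand-in for the user: records the prompt and says yes only to Imari.
    asked.append(prompt)
    return "Imari" in prompt

confirmed = confirm(hypotheses, ask)
```

In the example a single question suffices, because the correct answer also has the best score.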

[0031] After the transition destination dialog state is finalized, the transition destination dialog state determination unit outputs the finalized dialog state S182 to the dialog operation execution unit and writes S182 to the dialog state transition storage unit. It also deletes from the dialog state transition storage unit the transition destination dialog state hypotheses S15, S16, S17, and S19 that had been stored there. FIG. 4 shows the dialog state transition storage unit after these operations.
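The storage update can be sketched as follows. The layout of the storage unit (a history list, a set of open hypotheses, and a parent map) is an assumption, since FIG. 4 is not reproduced here; the function name is hypothetical. The sketch keeps S18 because it lies on the path to the confirmed state S182, which matches the text's list of deleted hypotheses (S15, S16, S17, S19).

```python
def finalize(history, hypotheses, parent_of, confirmed):
    """Commit the path to `confirmed` and discard every other open hypothesis.

    history: list of finalized states; hypotheses: set of open hypothesis states;
    parent_of: state -> the state it was reached from.
    """
    path = [confirmed]
    # Walk back through open hypotheses that lead to the confirmed state.
    while parent_of.get(path[-1]) in hypotheses:
        path.append(parent_of[path[-1]])
    history.extend(reversed(path))
    hypotheses.clear()

history = ["S1"]
open_hypotheses = {"S15", "S16", "S17", "S18", "S19"}
parent_of = {"S18": "S1", "S182": "S18"}
finalize(history, open_hypotheses, parent_of, "S182")
```

After the call the history reads S1, S18, S182 and no hypotheses remain open, so the next turn starts from a single confirmed state.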

[0032] When dialog state S182 is input, the dialog operation execution unit outputs the system response R182 "Please say the town name" to the user, outputs the recognition target vocabulary V182 defined for dialog state S182 to the speech recognition unit, and continues the dialog.

[0033] With the above operation, the dialog state transition storage unit holds multiple hypotheses of the dialog state transition for the user's input, and the transition destination dialog state determination operation determining unit finalizes the transition destination dialog state to one only when the recognition score for the currently selected hypothesis falls below the threshold. The recognition rate can therefore be improved without confirming with the user after every utterance and advancing the dialog deterministically. Because the number of confirmation exchanges also decreases, a natural dialog between the user and the device is realized and the user's convenience improves.

[0034] The dialog state transition storage unit, the transition destination dialog state determination unit, and the provisional transition destination dialog state determination unit may be omitted from the configuration of Embodiment 1, so that the device consists of the speech recognition unit, the dialog procedure storage unit, the transition destination dialog state determination operation determining unit, and the dialog operation execution unit. In this voice interactive device, the dialog procedure storage unit defines and stores, for each dialog state, the recognition target vocabulary, the system response, the answers expected to that system response, and the transition destination dialog state corresponding to each answer.

[0035] The speech recognition unit performs speech recognition on the input speech using the recognition target vocabulary for each dialog state stored in the dialog procedure storage unit, and outputs a plurality of recognition results. The transition destination dialog state determination operation determining unit derives transition destination dialog states from the recognition results of the speech recognition unit and the contents of the dialog procedure storage unit. When the transition destination dialog state hypotheses satisfy a predetermined condition, it decides to finalize them to one; when they do not, it decides to suspend finalization. In either case it outputs the transition destination dialog state hypotheses. When the hypotheses are to be finalized to one, the dialog operation execution unit outputs a system response that confirms the recognition result of the transition destination dialog state hypothesis received from the determining unit; when finalization is suspended, it outputs the system response of the transition destination dialog state hypothesis.

【００３６】In the spoken dialog device configured as above, when the user provides speech input, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 stored in the dialog procedure storage unit and outputs the recognition results. For example, when the user says "It is Saga", five candidates are output as recognition results: "Shiga (0.88), Saga (0.87), Kagawa (0.73), Kanagawa (0.52), Kagoshima (0.50)". The number in parentheses is the score of each recognition candidate, which is used as the predetermined condition; the closer the score is to 1, the better.

【００３７】When the recognition results are input, the transition destination dialog state determination operation determination unit refers to the transition table T1 defined for the current dialog state S1, obtains the five dialog states S15, S16, S17, S18, and S19 as transition destination dialog state hypotheses for the five recognition results, and sets a flag for each. Next, taking the predetermined condition to be the recognition score, the unit decides to hold the determination, because the score of the top-ranked recognition result "Shiga", 0.88, is at or above the threshold of the predetermined condition.

【００３８】Next, the transition destination dialog state determination operation determination unit selects one provisional transition destination dialog state hypothesis from the hypotheses based on the recognition scores. For example, from the five hypotheses S15, S16, S17, S18, and S19 above, it selects S16, the hypothesis for the best-scoring result "Shiga", and outputs it to the dialog operation execution unit.

【００３９】The dialog operation execution unit outputs the system response R16 of the transition destination dialog state hypothesis S16, "Please say the city name", to the user.

【００４０】When the user answers "It is Imari City" to the system response "Please say the city name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V15, V16, V17, V18, and V19 of the dialog procedure storage unit and outputs "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)" as recognition results.

【００４１】The transition destination dialog state determination operation determination unit refers to the transition tables T15, T16, T17, T18, and T19 defined for the transition destination dialog state hypotheses S15, S16, S17, S18, and S19, and obtains the five dialog states S152, S153, S163, S182, and S192 as transition destination dialog state hypotheses for the recognition results.
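The table lookup in this step can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the device: the text states that "Yokaichi" leads to S163 and "Imari" to S182, while the remaining table entries are inferred here from the state numbering (for example, S152 is assumed to be reached from S15), and the name `expand_hypotheses` is hypothetical.

```python
from typing import Dict, List, Tuple

# Transition tables T15..T19: recognized word -> destination dialog state.
# Entries other than Yokaichi -> S163 and Imari -> S182 are assumptions
# inferred from the state numbering, not taken from the figures.
TRANSITION_TABLES: Dict[str, Dict[str, str]] = {
    "S15": {"Isehara": "S152", "Hiratsuka": "S153"},
    "S16": {"Yokaichi": "S163"},
    "S17": {},
    "S18": {"Imari": "S182"},
    "S19": {"Izumi": "S192"},
}

Hypothesis = Tuple[str, str, float]  # (destination state, word, score)


def expand_hypotheses(
    active_states: List[str], results: List[Tuple[str, float]]
) -> List[Hypothesis]:
    """Look up every recognition result in the table of every held state."""
    hypotheses: List[Hypothesis] = []
    for word, score in results:
        for state in active_states:
            destination = TRANSITION_TABLES.get(state, {}).get(word)
            if destination is not None:
                hypotheses.append((destination, word, score))
    return hypotheses


ACTIVE = ["S15", "S16", "S17", "S18", "S19"]
RESULTS = [("Imari", 0.91), ("Izumi", 0.76), ("Isehara", 0.30),
           ("Yokaichi", 0.11), ("Hiratsuka", 0.09)]
HYPOTHESES = expand_hypotheses(ACTIVE, RESULTS)
```

Because all held states are searched, a result such as "Imari" is still found even though the provisionally selected state is S16.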

【００４２】The score of the recognition result "Yokaichi" for S163, the transition destination dialog state hypothesis reachable from the current dialog state S16, is 0.11, which is at or below the threshold of the predetermined condition. The transition destination dialog state determination operation determination unit therefore decides to fix the transition destination dialog state.
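The hold-or-fix decision described so far can be sketched as follows. The sketch rests on several assumptions: the text gives no threshold value, so 0.5 is used purely for illustration; the comparison is taken as "at or above the threshold means hold"; and the parent states of "Izumi", "Isehara", and "Hiratsuka" are inferred from the state numbering. The names `Result` and `decide` are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    word: str
    score: float
    parent: str  # dialog state whose vocabulary contains this word


THRESHOLD = 0.5  # assumed value; the text only says "a threshold"


def decide(results: List[Result], current_state: str,
           threshold: float = THRESHOLD) -> str:
    """Return "hold" while the current state is still supported, else "fix"."""
    # Only results that follow from the currently selected state count.
    tracked = [r.score for r in results if r.parent == current_state]
    best = max(tracked, default=0.0)
    return "hold" if best >= threshold else "fix"


TURN1 = [Result("Shiga", 0.88, "S1"), Result("Saga", 0.87, "S1"),
         Result("Kagawa", 0.73, "S1"), Result("Kanagawa", 0.52, "S1"),
         Result("Kagoshima", 0.50, "S1")]
TURN2 = [Result("Imari", 0.91, "S18"), Result("Izumi", 0.76, "S19"),
         Result("Isehara", 0.30, "S15"), Result("Yokaichi", 0.11, "S16"),
         Result("Hiratsuka", 0.09, "S15")]

print(decide(TURN1, "S1"))   # first utterance, current state S1
print(decide(TURN2, "S16"))  # second utterance, provisional state S16
```

With these inputs the first call holds the determination (0.88 for "Shiga") and the second call decides to fix it (0.11 for "Yokaichi"), matching the walkthrough in the text.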

【００４３】The dialog operation execution unit fixes the transition destination dialog state by, for example, confirming with the user in descending order of recognition score. When the transition destination dialog state hypotheses S152, S153, S163, S182, and S192 are input, it first asks the user "Is it Imari City?", and when the user answers "Yes", the transition destination dialog state is fixed to S182.
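Confirmation in descending score order can be sketched as a simple loop. This is a minimal sketch, assuming the user's yes/no answer is available through a callback; the names `confirm_in_score_order` and `ask` are hypothetical, and the question wording follows the example in the text.

```python
from typing import Callable, List, Optional, Tuple

Hypothesis = Tuple[str, str, float]  # (destination state, word, score)


def confirm_in_score_order(
    hypotheses: List[Hypothesis], ask: Callable[[str], bool]
) -> Optional[str]:
    """Ask about each hypothesis, best score first, until the user says yes."""
    ranked = sorted(hypotheses, key=lambda h: h[2], reverse=True)
    for state, word, _score in ranked:
        if ask(f"Is it {word} City?"):
            return state
    return None  # every candidate was rejected


HYPOTHESES = [("S152", "Isehara", 0.30), ("S153", "Hiratsuka", 0.09),
              ("S163", "Yokaichi", 0.11), ("S182", "Imari", 0.91),
              ("S192", "Izumi", 0.76)]

# Simulated user who wants Imari City.
fixed = confirm_in_score_order(HYPOTHESES, lambda q: q == "Is it Imari City?")
```

What the device should do when every candidate is rejected is not described in this passage, so the sketch simply returns `None` in that case.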

【００４４】When the dialog state S182 is input, the dialog operation execution unit outputs the system response R182, "Please say the town name", to the user, outputs the recognition target vocabulary V182 defined for the dialog state S182 to the speech recognition unit, and continues the dialog. As described above, the operation is repeated until the transition destination dialog state determination operation determination unit fixes the transition destination dialog state hypotheses to one.

【００４５】Embodiment 2. Embodiment 2 differs from Embodiment 1 described above in the operation of the transition destination dialog state determination operation determination unit, and is otherwise the same as Embodiment 1. The operations of the dialog procedure storage unit and the transition destination dialog state determination operation determination unit of FIG. 1 are described below.

【００４６】FIGS. 2, 5, and 6 show examples of the dialog states held in the dialog procedure storage unit. The vocabularies V18231, V18241, V18251, and V18281 defined for the dialog states S18231, S18241, S18251, and S18281 in FIG. 6 are large, and it is undesirable to run speech recognition on them together with the recognition target vocabularies of other dialog states. These dialog states therefore carry the condition that the dialog state determination operation must be performed before transitioning to them.

【００４７】The following description takes as an example the case where the device and the user start the dialog from the dialog state S1, proceed as in Embodiment 1, and the current dialog state is S182.

【００４８】When the user answers "It is Kurokawa" to the system response "Please say the town name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabulary V182 and outputs "Kurokawa (0.95), Okawa (0.88), Okawachi (0.70), Otsubo (0.11), Tachibana (0.03)" as recognition results.

【００４９】The transition destination dialog state determination operation determination unit refers to the transition table T182 and obtains the transition destination dialog state hypotheses S1825, S1822, S1823, S1824, and S1828. Next, because the score of the top-ranked recognition result "Kurokawa" is at or above the threshold, it decides to hold the determination of the transition destination dialog state and outputs the hypotheses to the provisional transition destination dialog state determination unit.

【００５０】The provisional transition destination dialog state determination unit selects S1825, the transition destination dialog state hypothesis for the best-scoring result "Kurokawa", and outputs it to the dialog operation execution unit. It also adds all the transition destination dialog state hypotheses to the dialog state transition storage unit. After this processing, the dialog state transition storage unit is as shown in FIG. 7.

【００５１】When the dialog state S1825 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R1825, "Please say the type of business", to the user. It also outputs to the speech recognition unit the recognition target vocabulary V1825 defined for the dialog state S1825, together with the recognition target vocabularies V1822, V1823, V1824, and V1828 defined for the four transition destination dialog state hypotheses stored in the dialog state transition storage unit.

【００５２】When the user answers "It is a ryokan" (a Japanese inn) to the system response "Please say the type of business", the speech recognition unit performs speech recognition using the recognition target vocabularies V1825, V1822, V1823, V1824, and V1828 and outputs "ryokan (0.95), barber (0.62), travel agency (0.51), rheumatology (0.27), sanatorium (0.10), hunting guns (0.02)" as recognition results.

【００５３】The transition destination dialog state determination operation determination unit refers to the transition tables T1825, T1822, T1823, T1824, and T1828 and obtains the nine dialog states S18231, S18232, S18241, S18242, S18243, S18251, S18252, S18281, and S18282 as transition destination dialog state hypotheses for the recognition results. The score of the best recognition result "ryokan", 0.95, is at or above the threshold, but the transition destination dialog state hypotheses for "ryokan", namely S18231, S18241, S18251, and S18281, are all dialog states that require the determination operation to be performed beforehand. The transition destination dialog state determination operation determination unit therefore decides to fix the transition destination dialog state and outputs the transition destination dialog state hypotheses to the transition destination dialog state determination unit.
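The additional condition of Embodiment 2 can be sketched as follows. The sketch makes explicit assumptions: the threshold value 0.5 is illustrative, and the check triggers when any destination for the top result requires prior determination, whereas the example in the text only covers the case where all of them do. The names `REQUIRES_PRIOR_FIX` and `decide` are hypothetical.

```python
from typing import Iterable, Set

# Dialog states whose vocabularies are too large to recognize together with
# others, so the state must be fixed before entering them (from FIG. 6 text).
REQUIRES_PRIOR_FIX: Set[str] = {"S18231", "S18241", "S18251", "S18281"}

THRESHOLD = 0.5  # assumed value; not given in the text


def decide(top_score: float, top_destinations: Iterable[str],
           threshold: float = THRESHOLD) -> str:
    """Hold only if the score is good AND no destination needs a prior fix."""
    if top_score < threshold:
        return "fix"
    # Assumption: one flagged destination is enough to force the fix.
    if any(state in REQUIRES_PRIOR_FIX for state in top_destinations):
        return "fix"
    return "hold"


# "Kurokawa" (0.95) leads to S1825, which carries no such condition.
print(decide(0.95, ["S1825"]))
# "ryokan" (0.95) leads only to states that require a prior fix.
print(decide(0.95, ["S18231", "S18241", "S18251", "S18281"]))
```

The first call holds the determination and the second decides to fix it, even though both scores are above the threshold.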

【００５４】The transition destination dialog state determination unit asks the user "Is the type of business a ryokan?", and when the user answers "Yes", the transition destination dialog state is narrowed to S18231, S18241, S18251, and S18281, so the transition destination dialog state hypothesis S1822 stored in the dialog state transition storage unit is deleted. Because the transition dialog state hypotheses S1823, S1824, S1825, and S1828 still remain, the transition destination dialog state determination unit then fixes the hypothesis by confirming the town name with the user. It first asks the user about the recognition result for the best-scoring dialog state S1825: "Is the town name Kurokawa?" The user answers "Yes" to this confirmation, and the dialog state S182 is fixed. The final transition destination dialog state is thereby determined to be S18251.

【００５５】When the dialog state S18251 is input, the dialog operation execution unit outputs the recognition target vocabulary V18251 to the speech recognition unit, outputs the system response R18251, "What is the name of the ryokan?", to the user, and continues the dialog.

【００５６】With the above operation, for a dialog state whose defined recognition target vocabulary is so large that recognizing it together with the vocabularies of other dialog states is undesirable, and which therefore requires the determination operation immediately before transitioning to it, the transition destination dialog state determination operation determination unit decides to execute the determination operation and the transition destination dialog state determination unit fixes the transition destination dialog state. The recognition target vocabulary can thus be restricted, and the recognition rate improves.

【００５７】Embodiment 3. Embodiment 3 differs from Embodiment 1 described above in the operation of the transition destination dialog state determination operation determination unit, and is otherwise the same as Embodiment 1. The operation of the transition destination dialog state determination operation determination unit of FIG. 1 is described below, taking as an example the case where the dialog states stored in the dialog procedure storage unit are those of FIG. 8, the telephone number database is that of FIG. 9, and the number N of recognition results output by the speech recognition unit is 3.

【００５８】Based on the dialog start state S1, the dialog operation execution unit starts the dialog by outputting the system response R1, "Which telephone number are you looking for?", to the user and outputting the recognition target vocabulary V1 to the speech recognition unit.

【００５９】When the user provides speech input, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 and outputs the recognition results and their scores. For example, when the user says "It is Amataro", three candidates are output as recognition results: "Amata (0.88), Amataro (0.87), Amagi (0.73)".

【００６０】When the recognition results are input, the transition destination dialog state determination operation determination unit refers to the transition table T1 and obtains S2, S3, and S4 as transition destination dialog state hypotheses. It then checks whether fixing a recognition result from the speech recognition unit would uniquely determine the values of the items that have not yet been input. In this embodiment, the user's input items are the prefecture name, the municipality name, the type of business, and the target name. At this point only the target name has been input, so it suffices to check, by referring to the telephone number database of FIG. 9, whether the uninput items are uniquely determined from the recognized target name alone. For the recognition candidate "Amata", two records exist, data numbers 5 and 6, but the uninput prefecture name is not uniquely determined merely by fixing the recognition result. The same holds for the other recognition results "Amataro" and "Amagi". Because the uninput items are not uniquely determined for any of the recognition candidates, the unit decides to hold the determination of the transition destination dialog state and outputs the three transition destination dialog state hypotheses to the provisional transition destination dialog state determination unit.

【００６１】The provisional transition destination dialog state determination unit selects, for example, S3, the transition destination dialog state hypothesis for the best-scoring result "Amata", and outputs it to the dialog operation execution unit. It also adds all the transition destination dialog state hypotheses to the dialog state transition storage unit.

【００６２】When the dialog state S3 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R3, "Which prefecture?", to the user. It also outputs to the speech recognition unit the recognition target vocabulary V3 defined for the dialog state S3, together with the recognition target vocabularies V2 and V4 defined for the two transition destination dialog state hypotheses S2 and S4 stored in the dialog state transition storage unit.

【００６３】When the user answers "It is Kanagawa Prefecture" to the system response "Which prefecture?" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V2, V3, and V4 and outputs "Kanagawa (0.95), Kagawa (0.72), Saga (0.41)" as recognition results.

【００６４】The transition destination dialog state determination operation determination unit refers to the transition tables T2, T3, and T4 and obtains the nine dialog states S22, S23, S24, S32, S33, S34, S42, S43, and S44 as transition destination dialog state hypotheses for the recognition results. It then checks whether fixing the recognition result from the speech recognition unit would uniquely determine the values of the uninput items. At this point the target name and the prefecture name have been input. According to the telephone number database of FIG. 9, two records, data numbers 5 and 6, have the name "Amata" and the prefecture name "Kanagawa", but the uninput city name is not uniquely determined merely by fixing the prefecture name. Likewise, for data 1, 2, and 3, whose name is "Amataro" and whose prefecture name is "Kanagawa", the city name is not uniquely determined. The unit therefore decides to hold the determination of the transition destination dialog state and outputs the nine transition destination dialog state hypotheses to the provisional transition destination dialog state determination unit.

【００６５】The provisional transition destination dialog state determination unit selects S32, the transition destination dialog state hypothesis from the current dialog state S2 for the best-scoring result "Kanagawa", and outputs it to the dialog operation execution unit. It also adds all nine transition destination dialog state hypotheses to the dialog state transition storage unit.

【００６６】When the dialog state S32 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R32, "Which city?", to the user. It also outputs to the speech recognition unit the recognition target vocabulary V32 defined for the dialog state S32, together with the recognition target vocabularies V22, V42, V23, V33, V43, V24, V34, and V44 defined for the eight transition destination dialog state hypotheses S22, S42, S23, S33, S43, S24, S34, and S44 stored in the dialog state transition storage unit.

【００６７】When the user answers "It is Kamakura City" to the system response "Which city?" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V22, V32, V42, V23, V33, V43, V24, V34, and V44 and outputs "Kamakura (0.87), Kawasaki (0.66), Karatsu (0.28)" as recognition results.

【００６８】The transition destination dialog state determination operation determination unit refers to the transition tables T22, T32, T42, T23, T33, T43, T24, T34, and T44 and obtains the nine dialog states S222, S223, S322, S323, S422, S423, S243, S343, and S443 as transition destination dialog state hypotheses for the recognition results. It then checks whether fixing the recognition result from the speech recognition unit would uniquely determine the values of the uninput items. At this point the target name, the prefecture name, and the city name have been input. According to the telephone number database of FIG. 9, no record exists with the name "Amata", the prefecture name "Kanagawa", and the city name "Kamakura".

【００６９】On the other hand, for the name "Amataro", the prefecture name "Kanagawa", and the city name "Kamakura", the record with data number 1 exists, and fixing the city name uniquely determines the uninput items, namely the town name and the type of business. The unit therefore decides to fix the transition destination dialog state and outputs the transition destination dialog state hypotheses S222, S223, S322, S323, S422, S423, S243, S343, and S443 to the transition destination dialog state determination unit.
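The database check of Embodiment 3 can be sketched as follows. FIG. 9 is not reproduced in this text, so the table below is an assumption: only data number 1 is filled with values stated in the description, the other rows use placeholder strings, and rows not mentioned in the text are omitted. As a simplification, the uninput items are treated as uniquely determined when exactly one record matches the items input so far. All function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def row(no: int, name: str, city: str, town: str = "?",
        business: str = "?", phone: str = "?") -> Record:
    # All rows mentioned in the text are in Kanagawa; "?" marks unknown cells.
    return {"no": no, "name": name, "prefecture": "Kanagawa", "city": city,
            "town": town, "business": business, "phone": phone}


# Placeholder reconstruction; city-2, city-3, city-5, city-6 are NOT from FIG. 9.
DB: List[Record] = [
    row(1, "Amataro", "Kamakura", "Ofuna", "izakaya", "0467-00-0000"),
    row(2, "Amataro", "city-2"),
    row(3, "Amataro", "city-3"),
    row(5, "Amata", "city-5"),
    row(6, "Amata", "city-6"),
]


def unique_record(db: List[Record], filled: Dict[str, str]) -> Optional[Record]:
    """Return the record if exactly one matches the items input so far."""
    hits = [r for r in db if all(r[k] == v for k, v in filled.items())]
    return hits[0] if len(hits) == 1 else None


def decide(db: List[Record],
           candidates: List[Dict[str, str]]) -> Tuple[str, Optional[Record]]:
    """Fix as soon as some candidate path pins down a single record."""
    for filled in candidates:
        record = unique_record(db, filled)
        if record is not None:
            return "fix", record
    return "hold", None


after_name = decide(DB, [{"name": "Amata"}, {"name": "Amataro"},
                         {"name": "Amagi"}])
after_city = decide(DB, [
    {"name": "Amata", "prefecture": "Kanagawa", "city": "Kamakura"},
    {"name": "Amataro", "prefecture": "Kanagawa", "city": "Kamakura"},
])
```

With this placeholder table, the decision after the first utterance is to hold, and after the city name it is to fix on data number 1, in line with the walkthrough.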

【００７０】When the transition destination dialog state hypotheses are input, the transition destination dialog state determination unit fixes the transition destination dialog state to S223 by confirming "Kamakura", the best-scoring recognition result, and outputs the dialog state S223 to the dialog operation execution unit.

【００７１】When the dialog state S223 is input, the dialog operation execution unit determines, from the dialog transition sequence stored in the dialog state transition storage unit and the telephone number database shown in the figure, that the values of all input items are the name "Amataro", the prefecture name "Kanagawa", the city name "Kamakura", the town name "Ofuna", and the type of business "izakaya" (a Japanese-style pub). Because the telephone number "0467-00-0000" is thereby uniquely determined, it reports that telephone number to the user.

【００７２】With the above operation, a plurality of dialog state transition hypotheses for the user's input are held, and even when not all input items have been provided by the user, the transition destination dialog state is fixed to one when fixing the recognition result uniquely determines the values of the uninput items. The recognition rate can therefore be improved without confirming with the user after every utterance and advancing the dialog deterministically. Because fewer confirmation exchanges are needed, a natural dialog between the user and the device is realized and user convenience improves.

【００７３】Embodiment 4. Embodiment 4 differs from Embodiment 1 described above in the operation of the transition destination dialog state determination operation determination unit, and is otherwise the same as Embodiment 1. The operation of the transition destination dialog state determination operation determination unit of FIG. 1 is described below, taking as an example the case where the dialog states of FIG. 10 are held in the dialog procedure storage unit.

【００７４】Based on the dialog start state S1, the dialog operation execution unit starts the dialog by outputting the system response R1, "Please say the prefecture name", to the user and outputting the recognition target vocabulary V1 to the speech recognition unit.

【００７５】When the user provides speech input, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 and outputs the recognition results and their scores. For example, when the user says "It is Saga", five candidates are output as recognition results: "Saga (0.92), Shiga (0.80), Kagawa (0.73), Kanagawa (0.52), Kagoshima (0.50)".

【００７６】When the recognition results are input, the transition destination dialog state determination operation determination unit refers to the transition table T1 defined for the current dialog state S1 and obtains the five dialog states S15, S16, S17, S18, and S19 as transition destination dialog state hypotheses for the five recognition results. Next, because the score of the top-ranked recognition result "Saga", 0.92, is at or above the threshold, it decides to hold the determination and outputs all five transition destination dialog state hypotheses, which share a common system response, to the provisional transition destination dialog state determination unit.

【００７７】The provisional transition destination dialog state determination unit selects S18, the transition destination dialog state hypothesis for the best-scoring result "Saga", outputs it to the dialog operation execution unit, and adds all the transition destination dialog state hypotheses to the dialog state transition storage unit.

【００７８】When the dialog state S18 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R18, "Please say the city name", to the user. It also outputs to the speech recognition unit the recognition target vocabulary V18, together with the recognition target vocabularies V15, V16, V17, and V19 defined for the four transition destination dialog state hypotheses S15, S16, S17, and S19 stored in the dialog state transition storage unit.

【００７９】When the user answers "It is Imari City" to the system response "Please say the city name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V15, V16, V17, V18, and V19 and outputs "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)" as recognition results.

【００８０】The transition destination dialog state determination operation determination unit refers to the transition tables T15, T16, T17, T18, and T19 and obtains the five dialog states S152, S153, S163, S182, and S192 as transition destination dialog state hypotheses for the recognition results. The score of the best recognition result "Imari" is at or above the threshold, but no system response is common to the transition destination dialog state hypotheses S152, S153, S163, S182, and S192. The transition destination dialog state determination operation determination unit therefore decides to fix the transition destination dialog state and outputs the transition destination dialog state hypotheses S152, S153, S163, S182, and S192 to the transition destination dialog state determination unit.
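The common-response condition of Embodiment 4 can be sketched as follows. FIG. 10 is not reproduced in this text, so the response strings for S152 through S192 are hypothetical placeholders chosen only to differ from one another; the shared response for S15 through S19 follows the "Please say the city name" example. The threshold 0.5 and the function names are likewise assumptions.

```python
from typing import Dict, Iterable

RESPONSES: Dict[str, str] = {
    # Prefecture-level hypotheses share one system response.
    "S15": "Please say the city name",
    "S16": "Please say the city name",
    "S17": "Please say the city name",
    "S18": "Please say the city name",
    "S19": "Please say the city name",
    # Hypothetical city-specific responses (placeholders, not from FIG. 10).
    "S152": "Which town in Isehara City?",
    "S153": "Which town in Hiratsuka City?",
    "S163": "Which town in Yokaichi City?",
    "S182": "Which town in Imari City?",
    "S192": "Which town in Izumi City?",
}

THRESHOLD = 0.5  # assumed value; not given in the text


def has_common_response(states: Iterable[str],
                        responses: Dict[str, str]) -> bool:
    """True if every hypothesis would prompt the user with the same text."""
    return len({responses[s] for s in states}) == 1


def decide(top_score: float, states: Iterable[str],
           responses: Dict[str, str] = RESPONSES,
           threshold: float = THRESHOLD) -> str:
    """Hold only while one prompt can serve all hypotheses at once."""
    if top_score >= threshold and has_common_response(states, responses):
        return "hold"
    return "fix"


print(decide(0.92, ["S15", "S16", "S17", "S18", "S19"]))
print(decide(0.91, ["S152", "S153", "S163", "S182", "S192"]))
```

The first call holds the determination because a single prompt covers all five hypotheses; the second decides to fix it because the next prompt would depend on which city was actually spoken.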

【００８１】遷移先対話状態確定部は実施例1と同様に
動作し，遷移先対話状態を S182に確定して対話動作実
行部に出力する。遷移先対話状態 S182 が入力される
と，対話動作実行部も実施例1と同様に動作して利用者
との対話を継続する。The transition destination dialog state determination unit operates as in the first embodiment, fixes the transition destination dialog state to S182, and outputs it to the dialog operation execution unit. When the transition destination dialog state S182 is input, the dialog operation execution unit also operates as in the first embodiment and continues the dialog with the user.

【００８２】以上の動作により，対話状態遷移記憶部が
利用者の入力に対する対話状態遷移の仮説を複数保持
し，遷移先対話状態確定動作決定部が，遷移先対話状態
仮説に共通のシステム発話が存在しなくなった場合に遷
移先対話状態を一つに確定するため，一発話毎に利用者
へ確認を行なって確定的に対話を進めなくても認識率を
向上でき，さらに確認対話の回数が減るため利用者と装
置との自然な対話が実現でき利用者の利便性が向上す
る。Through the above operation, the dialog state transition storage unit holds multiple dialog state transition hypotheses for the user's input, and the transition destination dialog state determination operation determining unit fixes the transition destination dialog state to a single state once no system utterance common to the transition destination dialog state hypotheses remains. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【００８３】実施の形態５.実施の形態５は上述の実施
の形態１とは遷移先対話状態確定動作決定部の動作が異
なるものであり、他は上述の実施の形態１と同様であ
る。以下，図１の遷移先対話状態確定動作決定部の動作
を，対話手順記憶部に図１１の対話状態が保持されてい
る場合を例に説明する。図１１に示した例では，対話状
態 S152 において複数のシステム応答 R152-1「伊勢原
市の何町ですか」とR152-2「町名をどうぞ」が規定され
ている。Fifth Embodiment. The fifth embodiment differs from the first embodiment in the operation of the transition destination dialog state determination operation determining unit and is otherwise the same as the first embodiment. The operation of the transition destination dialog state determination operation determining unit in FIG. 1 is described below, taking as an example the case where the dialog states of FIG. 11 are held in the dialog procedure storage unit. In the example shown in FIG. 11, dialog state S152 defines multiple system responses: R152-1 "Which town in Isehara City?" and R152-2 "Please enter the town name".

【００８４】まず，実施の形態４と同様に，対話開始状
態S1から対話を開始し，対話状態 S18 に至り，利用者
がシステム応答 R18 「市名をどうぞ」に対し「伊万里
市です」と応答して音声認識部が認識結果「伊万里(0.9
1)，出水(0.76)，伊勢原(0.30)，八日市(0.11)，平塚
(0.09)」を出力した場合について説明する。First, as in the fourth embodiment, consider the case where the dialog starts from the dialog start state S1 and reaches dialog state S18, the user answers "It is Imari City" to the system response R18 "Please enter the city name", and the speech recognition unit outputs the recognition results "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)".

【００８５】遷移先対話状態確定動作決定部は，遷移テ
ーブル T15, T16, T17, T18, T19を参照して，実施の形
態４と同様に認識結果に対する遷移先対話状態の仮説と
して5つの対話状態 S152, S153, S163, S182, S192 を
得る。スコアの最も良い認識結果「伊万里」のスコアは
閾値以上であり，かつ遷移先対話状態仮説 S152,S153,
S163, S182, S192 に共通のシステム応答「町名をどう
ぞ」が存在するため，遷移先対話状態確定動作決定部は
遷移先対話状態の確定を保留すると決定し，遷移先対話
状態仮説 S152, S153, S163, S182, S192 を暫定遷移先
対話状態決定部に出力する。Referring to the transition tables T15, T16, T17, T18, and T19, the transition destination dialog state determination operation determining unit obtains, as in the fourth embodiment, the five dialog states S152, S153, S163, S182, and S192 as hypotheses of the transition destination dialog state for the recognition results. The score of the best recognition result "Imari" is at or above the threshold, and the system response "Please enter the town name" is common to the transition destination dialog state hypotheses S152, S153, S163, S182, and S192. The unit therefore decides to defer determination of the transition destination dialog state and outputs the hypotheses S152, S153, S163, S182, and S192 to the provisional transition destination dialog state determination unit.

【００８６】暫定遷移先対話状態決定部は遷移先対話状
態仮説 S152, S153, S163, S182,S192 が入力される
と，最もスコアのよい認識結果「伊万里」に対する対話
状態S182 を暫定遷移先対話状態と決定して対話動作実
行部に出力する。When the transition destination dialog state hypotheses S152, S153, S163, S182, and S192 are input, the provisional transition destination dialog state determination unit selects dialog state S182, which corresponds to the best-scoring recognition result "Imari", as the provisional transition destination dialog state and outputs it to the dialog operation execution unit.

【００８７】対話動作実行部は対話状態 S182 が入力さ
れると，対話状態 S182 に規定された複数のシステム応
答のなかから，遷移先対話状態仮説 S152, S153, S163,
S192に規定されたシステム応答と共通の R182-2「町名
をどうぞ」をシステム応答として出力して対話を継続す
る。When dialog state S182 is input, the dialog operation execution unit selects, from the multiple system responses defined for dialog state S182, the response R182-2 "Please enter the town name", which is common to the system responses defined for the transition destination dialog state hypotheses S152, S153, S163, and S192, outputs it as the system response, and continues the dialog.

【００８８】一方，実施の形態１と同様に，対話開始状
態S1から対話を開始し，対話状態 S16 に至り，システ
ム応答 R16「市名をどうぞ」に対し利用者が「伊万里市
です」と応答したため，遷移先対話状態決定部が利用者
に確認を行い，遷移先対話状態を S182 に決定した場合
について説明する。Next, consider the case where, as in the first embodiment, the dialog starts from the dialog start state S1 and reaches dialog state S16, the user answers "It is Imari City" to the system response R16 "Please enter the city name", and the transition destination dialog state determination unit therefore confirms with the user and determines the transition destination dialog state to be S182.

【００８９】対話状態 S182 が入力されると，対話動作
実行部は対話状態 S182 に規定されたシステム応答 R18
2-1「伊万里市の何町ですか」およびR182-2「町名をど
うぞ」のうち，例えば，最初に定義されている R182-1
をシステム応答として出力し対話を継続する。When dialog state S182 is input, the dialog operation execution unit outputs, from the system responses R182-1 "Which town in Imari City?" and R182-2 "Please enter the town name" defined for dialog state S182, for example the first-defined response R182-1 as the system response, and continues the dialog.

【００９０】以上の動作により，対話手順記憶部に記憶
された各対話状態に複数のシステム応答を記述すること
で，遷移先対話状態仮説に共通のシステム発話が存在す
る場合は，遷移先対話状態確定動作決定部は確認による
確定動作を行わず，各遷移先対話状態仮説に共通のシス
テム発話を出力して対話を継続し，一方，遷移先対話状
態確定部で遷移先対話状態が確定した場合には，確定し
た対話状態に固有のシステム応答を行えるため，一発話
毎に利用者へ確認を行なって確定的に対話を進めなくて
も認識率を向上でき，さらに対話状態遷移に応じた自然
な応答を行えるため，利用者と装置との自然な対話が実
現でき利用者の利便性が向上する。Through the above operation, multiple system responses are described for each dialog state stored in the dialog procedure storage unit. When a system utterance common to the transition destination dialog state hypotheses exists, the transition destination dialog state determination operation determining unit does not perform the determination operation by confirmation, and the system utterance common to the hypotheses is output to continue the dialog. When the transition destination dialog state determination unit has fixed the transition destination dialog state, a system response specific to the fixed dialog state can be given. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because responses suited to the dialog state transition can be given, a natural dialog between the user and the device is realized and user convenience improves.

【００９１】実施の形態６．実施の形態６は上述の実施
の形態１とは遷移先対話状態確定動作決定部の動作が異
なるものであり、他は上述の実施の形態１と同様であ
る。以下，図１の遷移先対話状態確定動作決定部の動作
について，対話手順記憶部に図２，５の対話状態が保持
されている場合を例に，遷移先対話状態確定動作決定部
で確定動作決定に用いる語彙規模の閾値が 300の場合に
ついて説明する。Sixth Embodiment. The sixth embodiment differs from the first embodiment in the operation of the transition destination dialog state determination operation determining unit and is otherwise the same as the first embodiment. The operation of the transition destination dialog state determination operation determining unit in FIG. 1 is described below for the case where the dialog states of FIGS. 2 and 5 are held in the dialog procedure storage unit and the vocabulary size threshold that the unit uses to decide on the determination operation is 300.

【００９２】対話開始状態 S1 に基づいて，対話動作実
行部がシステム応答 R1 「都道府県名をどうぞ」を利用
者に出力し，認識対象語彙 V1 を音声認識部に出力する
ことにより対話を開始する。Based on the dialog start state S1, the dialog operation execution unit starts the dialog by outputting the system response R1 "Please enter the prefecture name" to the user and outputting the recognition target vocabulary V1 to the speech recognition unit.

【００９３】利用者が音声入力を行うと，音声認識部は
認識対象語彙V1を用いて音声認識処理を行い認識結果と
スコアを出力する。例えば利用者が「佐賀です」と入力
した場合，認識結果として「佐賀(0.92)，滋賀(0.80)，
香川(0.73)，神奈川(0.52)，鹿児島(0.50)」の5つの候
補を出力する。When the user makes a speech input, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 and outputs recognition results with scores. For example, when the user says "It is Saga", it outputs the five candidates "Saga (0.92), Shiga (0.80), Kagawa (0.73), Kanagawa (0.52), Kagoshima (0.50)" as the recognition results.

【００９４】認識結果が入力されると，遷移先対話状態
確定動作決定部は現在の対話状態S1に規定された遷移テ
ーブルT1を参照して，前述の5つの認識結果に対する遷
移先対話状態の仮説として5つの対話状態 S15, S16, S1
7, S18, S19 を得る。次に，遷移先対話状態仮説の全て
の認識対象語彙 V15, V16, V17, V18, V19 を合計した
語彙を求める。V15, V16, V17, V18, V19 はそれぞれ，
神奈川県の市名，滋賀県の市名，香川県の市名，佐賀県
の市名，鹿児島県の市名のため，合計の語彙はこれら5
県のすべての市名であり，これらの異なる5県で同一の
市名は存在しないため，その語彙の規模は 52である。When the recognition results are input, the transition destination dialog state determination operation determining unit refers to the transition table T1 defined for the current dialog state S1 and obtains the five dialog states S15, S16, S17, S18, and S19 as hypotheses of the transition destination dialog state for the five recognition results above. Next, it computes the combined vocabulary of the recognition target vocabularies V15, V16, V17, V18, and V19 of all the transition destination dialog state hypotheses. V15, V16, V17, V18, and V19 are the city names of Kanagawa, Shiga, Kagawa, Saga, and Kagoshima Prefectures, respectively, so the combined vocabulary consists of all city names in these five prefectures. Because no city name is shared among these five different prefectures, the size of the combined vocabulary is 52.

【００９５】これは閾値の 300 より小さいため，遷移
先対話状態確定動作決定部は遷移先対話状態仮説の確定
を保留すると決定し，共通のシステム応答を持つ5つの
遷移先対話状態仮説すべてを暫定遷移先対話状態決定部
に出力する。Since this is smaller than the threshold of 300, the transition destination dialog state determination operation determining unit decides to defer determination of the transition destination dialog state hypotheses and outputs all five transition destination dialog state hypotheses, which share a common system response, to the provisional transition destination dialog state determination unit.

【００９６】暫定遷移先対話状態決定部は，スコアの最
も良い「佐賀」に対する遷移先対話状態仮説 S18 を選
択し対話動作実行部へ出力し，すべての遷移先対話状態
仮説を対話状態遷移記憶部に書き加える。The provisional transition destination dialog state determination unit selects the transition destination dialog state hypothesis S18, which corresponds to the best-scoring result "Saga", outputs it to the dialog operation execution unit, and adds all the transition destination dialog state hypotheses to the dialog state transition storage unit.

【００９７】暫定遷移先対話状態決定部から対話状態 S
18 が入力されると，対話動作実行部はシステム応答 R1
8 「市名をどうぞ」を利用者に出力するとともに，認識
対象語彙 V18 と，対話状態遷移記憶部に記憶された4つ
の遷移先対話状態仮説 S15, S16, S17, S19 に規定され
た認識対象語彙 V15, V16, V17, V19 を音声認識部に出
力する。When dialog state S18 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R18 "Please enter the city name" to the user, and outputs to the speech recognition unit the recognition target vocabulary V18 together with the recognition target vocabularies V15, V16, V17, and V19 defined for the four transition destination dialog state hypotheses S15, S16, S17, and S19 stored in the dialog state transition storage unit.

【００９８】対話動作実行部が出力したシステム応答
「市名をどうぞ」に対して，利用者が「伊万里市です」
と入力した場合，音声認識部は認識対象語彙 V15, V16,
V17, V18, V19 を用いて音声認識処理を行い，認識結果
として「伊万里(0.91)，出水(0.76)，伊勢原(0.30)，八
日市(0.11)，平塚(0.09)」を出力する。When the user answers "It is Imari City" to the system response "Please enter the city name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V15, V16, V17, V18, and V19 and outputs "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)" as the recognition results.

【００９９】遷移先対話状態確定動作決定部は，遷移テ
ーブル T15, T16, T17, T18, T19を参照して，認識結果
に対する遷移先対話状態の仮説として5つの対話状態 S1
52, S153, S163, S182, S192 を得る。次に，遷移先対
話状態仮説の全ての認識対象語彙 V152, V153, V163, V
182, V192 を合計した語彙を求める。V152,V153, V163,
V182, V192 はそれぞれ，神奈川県伊勢原市の町名，神
奈川県平塚市の町名，滋賀県八日市市の町名，佐賀県伊
万里市の町名，鹿児島県出水市の町名のため，合計の語
彙はこれら5市のすべての町名である。その総数は 332
であるが，これら異なる5市に同一の町名が存在するた
め異なり語数は 327 である。Referring to the transition tables T15, T16, T17, T18, and T19, the transition destination dialog state determination operation determining unit obtains the five dialog states S152, S153, S163, S182, and S192 as hypotheses of the transition destination dialog state for the recognition results. Next, it computes the combined vocabulary of the recognition target vocabularies V152, V153, V163, V182, and V192 of all the transition destination dialog state hypotheses. V152, V153, V163, V182, and V192 are the town names of Isehara City in Kanagawa Prefecture, Hiratsuka City in Kanagawa Prefecture, Yokaichi City in Shiga Prefecture, Imari City in Saga Prefecture, and Izumi City in Kagoshima Prefecture, respectively, so the combined vocabulary consists of all town names in these five cities. The total count is 332, but because some town names occur in more than one of these five cities, the number of distinct words is 327.

【０１００】これは閾値の 300 より大きいため，遷移
先対話状態確定動作決定部は遷移先対話状態の確定を行
うと決定し，遷移先対話状態仮説 S152, S153, S163, S
182,S192 を遷移先対話状態確定部に出力する。Since this is larger than the threshold of 300, the transition destination dialog state determination operation determining unit decides to determine the transition destination dialog state and outputs the transition destination dialog state hypotheses S152, S153, S163, S182, and S192 to the transition destination dialog state determination unit.

【０１０１】遷移先対話状態確定部は実施例1と同様に
動作し，遷移先対話状態を S182に確定して対話動作実
行部に出力する。遷移先対話状態 S182 が入力される
と，対話動作実行部も実施例1と同様に動作して利用者
との対話を継続する。The transition destination dialog state determination unit operates as in the first embodiment, fixes the transition destination dialog state to S182, and outputs it to the dialog operation execution unit. When the transition destination dialog state S182 is input, the dialog operation execution unit also operates as in the first embodiment and continues the dialog with the user.

【０１０２】以上の動作により，遷移先対話状態仮説に
規定された認識対象語彙の規模が大きく，認識率が低下
する恐れがある場合に遷移先対話状態確定動作決定部が
確定動作実行を決定し，遷移先対話状態決定部が遷移先
対話状態を確定するため，認識対象語彙を限定でき認識
率が向上する。Through the above operation, when the recognition target vocabulary defined for the transition destination dialog state hypotheses is large and the recognition rate may fall, the transition destination dialog state determination operation determining unit decides to execute the determination operation and the transition destination dialog state determination unit fixes the transition destination dialog state. The recognition target vocabulary can thus be restricted and the recognition rate improves.

【０１０３】実施の形態７．実施の形態７は上述の実施
の形態１とは遷移先対話状態確定動作決定部の動作が異
なるものであり、他は上述の実施の形態１と同様であ
る。以下，図１の遷移先対話状態確定動作決定部の動作
について，対話手順記憶部に図２および図５の対話状態
が保持されており，遷移先対話状態確定動作決定部が確
定動作決定に用いる遷移系列の長さの閾値が 2 である
場合について説明する。Seventh Embodiment. The seventh embodiment differs from the first embodiment in the operation of the transition destination dialog state determination operation determining unit and is otherwise the same as the first embodiment. The operation of the transition destination dialog state determination operation determining unit in FIG. 1 is described below for the case where the dialog states of FIGS. 2 and 5 are held in the dialog procedure storage unit and the transition sequence length threshold that the unit uses to decide on the determination operation is 2.

【０１０４】対話開始状態S1に基づいて，対話動作実行
部がシステム応答R1「都道府県名をどうぞ」を利用者に
出力し，認識対象語彙V1を音声認識部に出力することに
より対話を開始する。Based on the dialog start state S1, the dialog operation execution unit starts the dialog by outputting the system response R1 "Please enter the prefecture name" to the user and outputting the recognition target vocabulary V1 to the speech recognition unit.

【０１０５】利用者が音声入力を行うと，音声認識部は
認識対象語彙V1を用いて音声認識処理を行い認識結果と
スコアを出力する。例えば利用者が「佐賀です」と入力
した場合，認識結果として「佐賀(0.92)，滋賀(0.80)，
香川(0.73)，神奈川(0.52)，鹿児島(0.50)」の5つの候
補を出力する。When the user makes a speech input, the speech recognition unit performs speech recognition using the recognition target vocabulary V1 and outputs recognition results with scores. For example, when the user says "It is Saga", it outputs the five candidates "Saga (0.92), Shiga (0.80), Kagawa (0.73), Kanagawa (0.52), Kagoshima (0.50)" as the recognition results.

【０１０６】認識結果が入力されると，遷移先対話状態
確定動作決定部は現在の対話状態S1に規定された遷移テ
ーブルT1を参照して，前述の5つの認識結果に対する遷
移先対話状態の仮説として5つの対話状態 S15, S16, S1
7, S18, S19 を得る。遷移先対話状態確定動作決定部
は，遷移先対話状態仮説の遷移系列の長さを対話状態遷
移記憶部を参照して得る。この時点では対話開始状態 S
1 からの遷移先対話状態仮説は対話状態遷移記憶部には
なにも記憶されていないため，遷移系列の長さは0 であ
り閾値の 2 より小さい。したがって，遷移先対話状態
確定動作決定部は遷移先対話状態の確定を保留すると決
定し，5つの遷移先対話状態仮説すべてを暫定遷移先対
話状態決定部に出力する。When the recognition results are input, the transition destination dialog state determination operation determining unit refers to the transition table T1 defined for the current dialog state S1 and obtains the five dialog states S15, S16, S17, S18, and S19 as hypotheses of the transition destination dialog state for the five recognition results above. The unit then obtains the length of the transition sequence of the transition destination dialog state hypotheses by referring to the dialog state transition storage unit. At this point no transition destination dialog state hypothesis from the dialog start state S1 is stored in the dialog state transition storage unit, so the length of the transition sequence is 0, which is smaller than the threshold of 2. The unit therefore decides to defer determination of the transition destination dialog state and outputs all five transition destination dialog state hypotheses to the provisional transition destination dialog state determination unit.

【０１０７】暫定遷移先対話状態決定部は，スコアの最
も良い「佐賀」に対する遷移先対話状態仮説 S18 を選
択し対話動作実行部へ出力し，すべての遷移先対話状態
仮説を対話状態遷移記憶部に書き加える。以上の動作の
結果，対話状態遷移記憶部の内容は図１２に示すものと
なる。The provisional transition destination dialog state determination unit selects the transition destination dialog state hypothesis S18, which corresponds to the best-scoring result "Saga", outputs it to the dialog operation execution unit, and adds all the transition destination dialog state hypotheses to the dialog state transition storage unit. As a result of the above operation, the contents of the dialog state transition storage unit become those shown in FIG. 12.

【０１０８】暫定遷移先対話状態決定部から対話状態 S
18 が入力されると，対話動作実行部はシステム応答 R1
8 「市名をどうぞ」を利用者に出力するとともに，認識
対象語彙V18と，対話状態遷移記憶部に記憶された4つの
遷移先対話状態仮説 S15,S16, S17, S19 に規定された
認識対象語彙 V15, V16, V17, V19 を音声認識部に出力
する。When dialog state S18 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R18 "Please enter the city name" to the user, and outputs to the speech recognition unit the recognition target vocabulary V18 together with the recognition target vocabularies V15, V16, V17, and V19 defined for the four transition destination dialog state hypotheses S15, S16, S17, and S19 stored in the dialog state transition storage unit.

【０１０９】対話動作実行部が出力したシステム応答
「市名をどうぞ」に対して，利用者が「伊万里市です」
と入力した場合，音声認識部は認識対象語彙 V15, V16,
V17, V18, V19 を用いて音声認識処理を行い，認識結果
として「伊万里(0.91)，出水(0.76)，伊勢原(0.30)，八
日市(0.11)，平塚(0.09)」を出力する。When the user answers "It is Imari City" to the system response "Please enter the city name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V15, V16, V17, V18, and V19 and outputs "Imari (0.91), Izumi (0.76), Isehara (0.30), Yokaichi (0.11), Hiratsuka (0.09)" as the recognition results.

【０１１０】遷移先対話状態確定動作決定部は，遷移テ
ーブル T15, T16, T17, T18, T19を参照して，認識結果
に対する遷移先対話状態の仮説として5つの対話状態 S1
52, S153, S163, S182, S192 を得る。次に，図１２に
示す対話状態遷移記憶部の内容を参照すると，対話開始
状態S1から現在の対話状態 S18 までの遷移系列の長さ
は 1 であり，閾値である 2 より小さい。したがって，
遷移先対話状態確定動作決定部は遷移先対話状態の確定
を保留すると決定し，5つの遷移先対話状態仮説すべて
を暫定遷移先対話状態決定部に出力する。Referring to the transition tables T15, T16, T17, T18, and T19, the transition destination dialog state determination operation determining unit obtains the five dialog states S152, S153, S163, S182, and S192 as hypotheses of the transition destination dialog state for the recognition results. Next, referring to the contents of the dialog state transition storage unit shown in FIG. 12, the length of the transition sequence from the dialog start state S1 to the current dialog state S18 is 1, which is smaller than the threshold of 2. The unit therefore decides to defer determination of the transition destination dialog state and outputs all five transition destination dialog state hypotheses to the provisional transition destination dialog state determination unit.

【０１１１】暫定遷移先対話状態決定部は遷移先対話状
態仮説 S152, S153, S163, S182,S192 が入力される
と，最もスコアのよい認識結果「伊万里」に対する対話
状態 S182 を暫定遷移先対話状態と決定して対話動作実
行部に出力する。さらに，すべての遷移先対話状態仮説
を対話状態遷移記憶部に書き加え，対話状態遷移記憶部
の内容は図13に示すものとなる。When the transition destination dialog state hypotheses S152, S153, S163, S182, and S192 are input, the provisional transition destination dialog state determination unit selects dialog state S182, which corresponds to the best-scoring recognition result "Imari", as the provisional transition destination dialog state and outputs it to the dialog operation execution unit. It also adds all the transition destination dialog state hypotheses to the dialog state transition storage unit, whose contents become those shown in FIG. 13.

【０１１２】暫定遷移先対話状態決定部から対話状態 S
182 が入力されると，対話動作実行部はシステム応答 R
182 「町名をどうぞ」を利用者に出力するとともに，認
識対象語彙 V182 と，対話状態遷移記憶部に記憶された
4つの遷移先対話状態仮説 S152, S153, S163, S192 に
規定された認識対象語彙 V152, V153, V163, V192 を音
声認識部に出力する。When dialog state S182 is input from the provisional transition destination dialog state determination unit, the dialog operation execution unit outputs the system response R182 "Please enter the town name" to the user, and outputs to the speech recognition unit the recognition target vocabulary V182 together with the recognition target vocabularies V152, V153, V163, and V192 defined for the four transition destination dialog state hypotheses S152, S153, S163, and S192 stored in the dialog state transition storage unit.

【０１１３】対話動作実行部が出力したシステム応答
「町名をどうぞ」に対して，利用者が「黒川です」と入
力した場合，音声認識部は認識対象語彙 V182, V152, V
153,V163, V192 を用いて音声認識処理を行い，認識結
果として「黒川(0.90)，広川(0.64)，大川(0.42)，串橋
(0.13)，黒部丘(0.11)」を出力する。When the user answers "It is Kurokawa" to the system response "Please enter the town name" output by the dialog operation execution unit, the speech recognition unit performs speech recognition using the recognition target vocabularies V182, V152, V153, V163, and V192 and outputs "Kurokawa (0.90), Hirokawa (0.64), Okawa (0.42), Kushibashi (0.13), Kurobe-oka (0.11)" as the recognition results.

【０１１４】遷移先対話状態確定動作決定部は遷移テー
ブル T182 を参照して遷移先対話状態仮説 S1825, S182
2, S1823, S1824, S1828 を得る。次に，図１３に示す
対話状態遷移記憶部の内容を参照すると，対話開始対話
状態 S1 から現在の対話状態S182 までの遷移系列の長
さは 2 であり閾値と等しい。したがって，遷移先対話
状態確定動作決定部は遷移先対話状態の確定を行うと決
定し，5つの遷移先対話状態仮説を遷移先対話状態確定
部に出力する。Referring to the transition table T182, the transition destination dialog state determination operation determining unit obtains the transition destination dialog state hypotheses S1825, S1822, S1823, S1824, and S1828. Next, referring to the contents of the dialog state transition storage unit shown in FIG. 13, the length of the transition sequence from the dialog start state S1 to the current dialog state S182 is 2, which equals the threshold. The unit therefore decides to determine the transition destination dialog state and outputs the five transition destination dialog state hypotheses to the transition destination dialog state determination unit.

【０１１５】遷移先対話状態確定部は遷移先対話状態仮
説 S1825, S1822, S1823, S1824,S1828 が入力される
と，最もスコアのよい「黒川」を利用者に確認すること
で遷移先対話状態を S1825 に確定し，対話動作実行部
に出力する。When the transition destination dialog state hypotheses S1825, S1822, S1823, S1824, and S1828 are input, the transition destination dialog state determination unit confirms the best-scoring result "Kurokawa" with the user, thereby fixing the transition destination dialog state to S1825, and outputs it to the dialog operation execution unit.

【０１１６】対話動作実行部は，対話状態 S1825 が入
力されると，認識対象語彙 V1825 を音声認識部に出力
し，システム応答 R1825 「業種をどうぞ」を利用者に
出力して対話を継続する。When dialog state S1825 is input, the dialog operation execution unit outputs the recognition target vocabulary V1825 to the speech recognition unit, outputs the system response R1825 "Please enter the type of business" to the user, and continues the dialog.

【０１１７】以上の動作により，対話状態遷移記憶部が
利用者の入力に対する対話状態遷移の仮説を複数保持
し，遷移先対話状態確定動作決定部が，最も最近に確定
した対話状態からの対話状態遷移仮説系列の長さが閾値
以上になった場合に遷移先対話状態を一つに確定するた
め，一発話毎に利用者へ確認を行なって確定的に対話を
進めなくても認識率を向上でき，さらに確認対話の回数
が減るため利用者と装置との自然な対話が実現でき利用
者の利便性が向上する。Through the above operation, the dialog state transition storage unit holds multiple dialog state transition hypotheses for the user's input, and the transition destination dialog state determination operation determining unit fixes the transition destination dialog state to a single state when the length of the sequence of dialog state transition hypotheses from the most recently determined dialog state reaches or exceeds the threshold. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【０１１８】[0118]

【Effects of the Invention】

【０１１９】以上のように、この発明によれば，遷移先
対話状態確定動作決定部が，現在選択している仮説が所
定の条件を満たすときに遷移先対話状態を一つに確定す
るため，一発話毎に利用者へ確認を行なって確定的に対
話を進めなくても認識率を向上でき，さらに確認対話の
回数が減るため利用者と装置との自然な対話が実現でき
利用者の利便性が向上する。As described above, according to the present invention, the transition destination dialog state determination operation determining unit fixes the transition destination dialog state to a single state when the currently selected hypothesis satisfies a predetermined condition. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【０１２０】また、この発明によれば，対話状態遷移記
憶部が利用者の入力に対する対話状態遷移の仮説を複数
保持し，遷移先対話状態確定動作決定部が，現在選択し
ている仮説に対する認識スコアが閾値より悪くなったと
きに遷移先対話状態を一つに確定するため，一発話毎に
利用者へ確認を行なって確定的に対話を進めなくても認
識率を向上でき，さらに確認対話の回数が減るため利用
者と装置との自然な対話が実現でき利用者の利便性が向
上する。Further, according to the present invention, the dialog state transition storage unit holds multiple dialog state transition hypotheses for the user's input, and the transition destination dialog state determination operation determining unit fixes the transition destination dialog state to a single state when the recognition score for the currently selected hypothesis becomes worse than the threshold. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【０１２１】また、この発明によれば，規定された認識
対象語彙が大きいため他の対話状態の認識対象語彙と同
時に音声認識処理を行うことが望ましくなく，該対話状
態に遷移する直前に予め確定動作を行う必要がある対話
状態に対して，遷移先対話状態確定動作決定部が確定動
作実行を決定し，遷移先対話状態決定部が遷移先対話状
態を確定するため，認識対象語彙を限定でき認識率が向
上する。Further, according to the present invention, for a dialog state whose defined recognition target vocabulary is so large that performing speech recognition simultaneously with the recognition target vocabularies of other dialog states is undesirable, and for which the determination operation must therefore be performed immediately before the transition to that dialog state, the transition destination dialog state determination operation determining unit decides to execute the determination operation and the transition destination dialog state determination unit fixes the transition destination dialog state. The recognition target vocabulary can thus be restricted and the recognition rate improves.

【０１２２】また、この発明によれば，利用者の入力に
対する対話状態遷移の仮説を複数保持し，利用者からの
入力項目がすべて入力されていなくても，認識結果を確
定することにより未入力項目に対する項目値が一意に定
まる場合に遷移先対話状態を一つに確定するため，一発
話毎に利用者へ確認を行なって確定的に対話を進めなく
ても認識率を向上でき，さらに確認対話の回数が減るた
め利用者と装置との自然な対話が実現でき利用者の利便
性が向上する。Further, according to the present invention, multiple dialog state transition hypotheses for the user's input are held, and even when the user has not yet entered all input items, the transition destination dialog state is fixed to a single state if fixing the recognition result uniquely determines the values of the items not yet entered. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【０１２３】また、この発明によれば，対話状態遷移記
憶部が利用者の入力に対する対話状態遷移の仮説を複数
保持し，遷移先対話状態確定動作決定部が，遷移先対話
状態仮説に共通のシステム発話が存在しなくなった場合
に遷移先対話状態を一つに確定するため，一発話毎に利
用者へ確認を行なって確定的に対話を進めなくても認識
率を向上でき，さらに確認対話の回数が減るため利用者
と装置との自然な対話が実現でき利用者の利便性が向上
する。Further, according to the present invention, the dialog state transition storage unit holds multiple dialog state transition hypotheses for the user's input, and the transition destination dialog state determination operation determining unit fixes the transition destination dialog state to a single state once no system utterance common to the transition destination dialog state hypotheses remains. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because the number of confirmation dialogs decreases, a natural dialog between the user and the device is realized and user convenience improves.

【０１２４】また、この発明によれば，対話手順記憶部
に記憶された各対話状態に複数のシステム応答を記述す
ることで，遷移先対話状態仮説に共通のシステム発話が
存在する場合は，遷移先対話状態確定動作決定部は確認
による確定動作を行わず，各遷移先対話状態仮説に共通
のシステム発話を出力して対話を継続し，一方，遷移先
対話状態確定部で遷移先対話状態が確定した場合には，
確定した対話状態に固有のシステム応答を行えるため，
一発話毎に利用者へ確認を行なって確定的に対話を進め
なくても認識率を向上でき，さらに対話状態遷移に応じ
た自然な応答を行えるため，利用者と装置との自然な対
話が実現でき利用者の利便性が向上する。Further, according to the present invention, multiple system responses are described for each dialog state stored in the dialog procedure storage unit. When a system utterance common to the transition destination dialog state hypotheses exists, the transition destination dialog state determination operation determining unit does not perform the determination operation by confirmation, and the system utterance common to the hypotheses is output to continue the dialog. When the transition destination dialog state determination unit has fixed the transition destination dialog state, a system response specific to the fixed dialog state can be given. The recognition rate can therefore be improved even without confirming with the user after every utterance and advancing the dialog deterministically, and because responses suited to the dialog state transition can be given, a natural dialog between the user and the device is realized and user convenience improves.

[0125] Further, according to the present invention, when the recognition target vocabulary defined in the transition destination dialog state hypotheses is large and the recognition rate may therefore fall, the transition destination dialog state determination operation deciding unit decides to execute a determination operation and the transition destination dialog state determination unit determines the transition destination dialog state. The recognition target vocabulary can thus be limited and the recognition rate improved.
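As a rough sketch of this vocabulary-size condition, the following Python fragment triggers a determination operation when the combined vocabulary of the pending hypotheses exceeds a limit. The function names and the `max_words` parameter are hypothetical, and counting distinct words across hypotheses (rather than summing the list sizes) is an assumption; the text only speaks of the size of the vocabulary.

```python
from typing import Iterable, Set


def combined_vocabulary(hypothesis_vocabularies: Iterable[Iterable[str]]) -> Set[str]:
    """Distinct words the recognizer would need to cover all hypotheses."""
    words: Set[str] = set()
    for vocabulary in hypothesis_vocabularies:
        words.update(vocabulary)
    return words


def should_determine_by_vocabulary(
    hypothesis_vocabularies: Iterable[Iterable[str]], max_words: int
) -> bool:
    """True when the combined vocabulary is larger than the assumed limit."""
    return len(combined_vocabulary(hypothesis_vocabularies)) > max_words


# Illustrative vocabularies for two pending hypotheses.
example_vocabularies = [["yes", "no", "hotel"], ["yes", "no", "temple", "beach"]]
```

Here the two hypotheses together require five distinct words, so a limit of four would trigger determination and a limit of five would not.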

[0126] Further, according to the present invention, the dialog state transition storage unit holds a plurality of dialog state transition hypotheses for the user's input, and the transition destination dialog state determination operation deciding unit determines the transition destination dialog state to one when the length of the dialog state transition hypothesis sequence from the most recently determined dialog state reaches or exceeds a threshold. The recognition rate can therefore be improved without confirming with the user at every utterance and proceeding deterministically, and because the number of confirmation dialogs is reduced, a natural dialog between the user and the device can be realized and user convenience is improved.
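The length condition can be sketched as follows. This is only an illustration under assumptions: the history is modelled as a list of `(candidate state names, determined flag)` pairs, and `pending_length`, `should_determine_by_length`, and `threshold` are hypothetical names rather than terms from the patent.

```python
from typing import List, Tuple

# Each history entry: (candidate state names, whether the step was determined).
Step = Tuple[List[str], bool]


def pending_length(history: List[Step]) -> int:
    """Number of undetermined steps since the most recently determined state."""
    count = 0
    for _states, determined in reversed(history):
        if determined:
            break
        count += 1
    return count


def should_determine_by_length(history: List[Step], threshold: int) -> bool:
    """True when the undetermined hypothesis sequence has reached the threshold."""
    return pending_length(history) >= threshold


# Illustrative history: one determined state followed by two undetermined steps.
example_history: List[Step] = [
    (["start"], True),
    (["kamakura", "kamagaya"], False),
    (["kamakura_hotels", "kamagaya_hotels"], False),
]
```

With this history the pending sequence has length two, so a threshold of two would trigger determination while a threshold of three would let the dialog continue.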

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a voice interaction device according to a first embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of a dialog state held in the dialog procedure storage unit according to the first embodiment.

FIG. 3 is an explanatory diagram of the result of writing transition destination dialog state hypotheses to the dialog state transition storage unit according to the first embodiment.

FIG. 4 is an explanatory diagram of the dialog state transition storage unit after the transition destination dialog state hypotheses have been determined according to the first embodiment.

FIG. 5 is an explanatory diagram showing an example of a dialog state held in the dialog procedure storage unit according to the second embodiment.

FIG. 6 is an explanatory diagram showing an example of a dialog state held in the dialog procedure storage unit according to the second embodiment.

FIG. 7 is an explanatory diagram of the result of writing transition destination dialog state hypotheses to the dialog state transition storage unit according to the second embodiment.

FIG. 8 is an explanatory diagram of a dialog state stored in the dialog procedure storage unit according to the third embodiment.

FIG. 9 is an explanatory diagram of a telephone number database according to the third embodiment.

FIG. 10 is an explanatory diagram of a dialog state stored in the dialog procedure storage unit according to the fourth embodiment.

FIG. 11 is an explanatory diagram of a dialog state stored in the dialog procedure storage unit according to the fifth embodiment.

FIG. 12 is an explanatory diagram of the result of writing transition destination dialog state hypotheses to the dialog state transition storage unit according to the seventh embodiment.

FIG. 13 is an explanatory diagram of the transition sequence from the dialog start state to the current dialog state in the dialog state transition storage unit according to the seventh embodiment.

FIG. 14 is a configuration diagram of a conventional recognition candidate extraction device.

[Explanation of symbols]

1: speech recognition unit; 2: dialog procedure storage unit; 3: dialog state transition storage unit; 4: transition destination dialog state determination operation deciding unit; 5: transition destination dialog state determination unit; 6: provisional transition destination dialog state determination unit; 7: dialog operation execution unit.

Continuation of the front page: (72) Inventor: Yasushi Ishikawa, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-term (reference): 5D015 AA05 JJ00 LL11

Claims

[Claims]

1. A voice interaction device for acquiring information required by a user through spoken dialog, comprising a speech recognition unit, a dialog procedure storage unit, a transition destination dialog state determination operation deciding unit, and a dialog operation execution unit, wherein: the dialog procedure storage unit defines and stores, for each dialog state, a recognition target vocabulary, a system response, answers assumed for the system response, and transition destination dialog states corresponding to the answers; the speech recognition unit performs speech recognition on input speech using the recognition target vocabulary corresponding to each dialog state stored in the dialog procedure storage unit and outputs a plurality of recognition results; the transition destination dialog state determination operation deciding unit determines transition destination dialog states from the recognition results of the speech recognition unit and the contents of the dialog procedure storage unit, decides to determine the transition destination dialog state hypotheses to one when the hypotheses satisfy a predetermined condition, decides to suspend determination when they do not satisfy the condition, and outputs the transition destination dialog state hypotheses; and the dialog operation execution unit outputs a system response for confirming the recognition result of the transition destination dialog state hypotheses when the deciding unit has decided to determine the hypotheses to one, and outputs the system response of the transition destination dialog state hypotheses when determination has been suspended.

2. The voice interaction device according to claim 1, further comprising a dialog state transition storage unit, a transition destination dialog state determination unit, and a provisional transition destination dialog state determination unit, wherein: the transition destination dialog state determination operation deciding unit decides whether to determine or to suspend the transition destination dialog state hypotheses obtained from the recognition results of the speech recognition unit and the contents of the dialog state transition storage unit or the dialog procedure storage unit, and outputs the transition destination dialog state hypotheses; the transition destination dialog state determination unit, when the deciding unit has decided to determine the hypotheses to one, receives the transition destination dialog state hypotheses, confirms the recognition result with the user to determine and output the transition destination dialog state, and rewrites the transition destination dialog state hypotheses stored in the dialog state transition storage unit; the provisional transition destination dialog state determination unit, when the deciding unit has decided to suspend determination, receives the transition destination dialog state hypotheses, decides and outputs a provisional transition destination dialog state, and rewrites the transition destination dialog state hypotheses in the dialog state transition storage unit; the dialog state transition storage unit stores the dialog state transition history from the start of the dialog and the transition destination dialog state hypotheses from the transition destination dialog state determination unit or the provisional transition destination dialog state determination unit; the dialog operation execution unit receives the transition destination dialog state from the transition destination dialog state determination unit or the provisional transition destination dialog state determination unit, outputs the system response defined in that transition destination dialog state, and outputs the recognition target vocabulary defined in that transition destination dialog state to the speech recognition unit; and the speech recognition unit performs speech recognition on input speech using the recognition target vocabulary input from the dialog operation execution unit and outputs a plurality of recognition results.

3. The voice interaction device according to claim 1, wherein the speech recognition unit outputs a plurality of recognition results together with scores of the recognition results, and the transition destination dialog state determination operation deciding unit decides whether to perform a determination operation according to the scores of the recognition results input from the speech recognition unit.

4. The voice interaction device according to any one of claims 1 to 3, wherein each dialog state stored in the dialog procedure storage unit describes whether a determination operation must be performed in order to make a state transition from another dialog state to that dialog state, and the transition destination dialog state determination operation deciding unit decides to perform a determination operation when a transition destination dialog state hypothesis obtained from the recognition results input from the speech recognition unit, the contents of the dialog state transition storage unit, and the dialog procedure storage unit requires that a determination operation be performed in advance.

5. The voice interaction device according to claim 1, wherein the transition destination dialog state determination operation deciding unit decides to perform a determination operation when it judges, from the recognition results of the speech recognition unit, that the item values of items not yet input are uniquely determined even though not all input items have been input by the user.

6. The voice interaction device according to any one of claims 1 to 5, wherein the transition destination dialog state determination operation deciding unit decides whether to perform a determination operation according to the system responses defined in the transition destination dialog state hypotheses.

7. The voice interaction device according to claim 6, wherein the transition destination dialog state determination operation deciding unit decides to perform a determination operation when no common system response exists among the transition destination dialog state hypotheses, and, when a common system response exists among the transition destination dialog state hypotheses, outputs only the hypotheses having the common system utterance as the transition destination dialog state hypotheses.

8. The voice interaction device according to claim 5 or 6, wherein a plurality of system responses can be described in each dialog state stored in the dialog procedure storage unit, and the dialog operation execution unit, when a transition destination dialog state is input from the provisional transition destination dialog state determination unit, outputs, from among the system responses defined in the input transition destination dialog state, the system response that is common to the system responses defined in the transition destination dialog state hypotheses stored in the dialog state transition storage unit.

9. The voice interaction device according to claim 1, wherein the transition destination dialog state determination operation deciding unit decides to perform a determination operation when the total size of the recognition target vocabularies of all the transition destination dialog state hypotheses is larger than a predetermined reference value.

10. The voice interaction device according to claim 1, wherein the transition destination dialog state determination operation deciding unit refers to the dialog state transition storage unit and decides to perform a determination operation when the length of the transition sequence from the determined dialog state to the transition destination dialog state hypotheses is equal to or greater than a predetermined reference value.
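
As an informal illustration of the dialog procedure storage unit described in the claims, the following Python sketch models one dialog state entry and derives transition destination hypotheses from several recognition results. The `DialogStateEntry` class, its field names, the `destination_hypotheses` function, and the example words are assumptions introduced for illustration and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogStateEntry:
    vocabulary: List[str]         # recognition target vocabulary for this state
    system_response: str          # prompt spoken by the system in this state
    transitions: Dict[str, str]   # assumed answer -> transition destination state


def destination_hypotheses(entry: DialogStateEntry,
                           recognition_results: List[str]) -> List[str]:
    """Map recognition results to distinct destination states, keeping rank order."""
    hypotheses: List[str] = []
    for word in recognition_results:
        destination = entry.transitions.get(word)
        if destination is not None and destination not in hypotheses:
            hypotheses.append(destination)
    return hypotheses


# Illustrative entry; place names and state names are hypothetical.
place_entry = DialogStateEntry(
    vocabulary=["Kamakura", "Kamagaya", "Hakone"],
    system_response="Which place are you interested in?",
    transitions={
        "Kamakura": "kamakura_menu",
        "Kamagaya": "kamagaya_menu",
        "Hakone": "hakone_menu",
    },
)
```

Given the recognition results "Kamakura" and "Kamagaya" for one utterance, the lookup yields two destination hypotheses, which a deciding step could then either determine to one or hold.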